Feature #4385: Unify empty arrays and NULLs - Mentat - Homeproj: Redmine for CESNET

Feature #4385

closed

Unify empty arrays and NULLs

Added by Pavel Kácha about 6 years ago. Updated almost 6 years ago.

Status:

Rejected

Priority:

Normal

Assignee:

Radko Krkoš

Category:

Design

Target version:

Rejected

Start date:

10/19/2018

Due date:

% Done:

Estimated time:

To be discussed:

Description

Arrays in the metadata table are inconsistent in their use of NULLs and empty arrays. We might benefit from unifying them (we don't distinguish between nonexistent and empty arrays anyway). The question is which is "better":

Empty array may have a space overhead, but the column then never needs to be NULL. Also, some queries might be simpler with no need for a NULL special case.
NULL value may have an overhead for the whole column; on the other hand, NULLs seem to be stored as a per-row bitmap, so one additional bit is negligible if the row already has other NULL columns. What about special-casing in queries?
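
On the special-casing question, one place where the two representations can behave differently is a negated array predicate. The following is only a sketch that emulates SQL three-valued logic in Python, with None standing in for NULL. The `tags` column and the sample rows are hypothetical, and `overlap` merely mimics the PostgreSQL `&&` operator; none of this is taken from Mentat.

```python
# Emulates SQL three-valued logic for an array-overlap predicate.
# None stands in for SQL NULL; all names and rows are illustrative assumptions.

def overlap(column_value, wanted):
    """Like `column && wanted`: True/False, or None (unknown) if the column is NULL."""
    if column_value is None:
        return None
    return bool(set(column_value) & set(wanted))

def sql_not(value):
    """SQL NOT: unknown stays unknown."""
    return None if value is None else not value

def where(rows, predicate):
    """A WHERE clause keeps only rows whose predicate is exactly true."""
    return [row for row in rows if predicate(row) is True]

# Hypothetical rows with an array-valued column called "tags".
rows = [
    {"id": 1, "tags": ["scan"]},
    {"id": 2, "tags": []},    # empty array
    {"id": 3, "tags": None},  # NULL
]

# WHERE tags && ARRAY['scan']
positive = where(rows, lambda r: overlap(r["tags"], ["scan"]))
# WHERE NOT (tags && ARRAY['scan'])
negative = where(rows, lambda r: sql_not(overlap(r["tags"], ["scan"])))
# WHERE tags IS NULL OR NOT (tags && ARRAY['scan'])
negative_null_safe = where(
    rows,
    lambda r: r["tags"] is None or sql_not(overlap(r["tags"], ["scan"])),
)

print([r["id"] for r in positive])            # [1]
print([r["id"] for r in negative])            # [2]
print([r["id"] for r in negative_null_safe])  # [2, 3]
```

With empty arrays the plain negation already returns the "has no such tag" rows; with NULLs the same intent needs an explicit IS NULL branch (or a COALESCE), which is the kind of special case meant above.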

Partially stems from #4348.

Related issues

Updated by Pavel Kácha about 6 years ago

Related to Feature #4348: Better support for sparse columns added

Updated by Radko Krkoš about 6 years ago

Tests reveal that converting empty arrays to NULLs has no impact on table size and a negligible impact on the combined GIN index (around 1MB). No performance testing was done to assess the impact of the somewhat differently organized new index.
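
The bitmap argument from the description can be put into rough numbers for the header side of a row. This is a back-of-the-envelope sketch, assuming the 23-byte fixed heap tuple header described in the PostgreSQL documentation and 8-byte alignment (typical for 64-bit builds). The column counts are arbitrary examples, not the actual metadata table, and the sketch says nothing about the array data itself or about the measurement above.

```python
import math

HEAP_TUPLE_HEADER_BYTES = 23  # fixed part of the PostgreSQL heap tuple header
MAXALIGN = 8                  # assumed alignment; typical for 64-bit builds

def header_size(n_columns, row_has_null):
    """Row header size in bytes, including the optional NULL bitmap and padding.

    The bitmap has one bit per column and is present only if the row
    contains at least one NULL, so its size does not depend on how many
    columns are NULL.
    """
    bitmap_bytes = math.ceil(n_columns / 8) if row_has_null else 0
    raw = HEAP_TUPLE_HEADER_BYTES + bitmap_bytes
    return math.ceil(raw / MAXALIGN) * MAXALIGN

# Example column counts (assumptions, chosen to show the padding steps).
for n in (8, 9, 72, 73):
    print(n, header_size(n, False), header_size(n, True))
```

Under these assumptions, a row that already contains a NULL pays nothing extra in the header for further NULL columns, and for tables of up to 8 columns the bitmap fits into the padding that is there anyway.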

Updated by Radko Krkoš about 6 years ago

Radko Krkoš wrote:

around 1MB

This was tested on mentat-dev, where the combined GIN index is about 450MB in size.

Updated by Pavel Kácha about 6 years ago

Consulting my crystal ball, I presume the performance impact would also be negligible.

Are there any other strong pro/con arguments, save for consistency?

Updated by Radko Krkoš about 6 years ago

Pavel Kácha wrote:

Consulting my crystal ball, I presume the performance impact would also be negligible.

The (manual) testing has progressed. It seems there is a performance improvement. Of course not a large one (originally in the 1.5s - 2s range, now 60ms - 500ms), but an interesting one, I would say (and with somewhat lower overall load, i.e. fewer workers).

Disclaimer: These are not final numbers; the impact of ENUMs must be ruled out first. Nevertheless, the improvement seems to be universal for all newly NULLed columns.

Are there any other strong pro/con arguments, save for consistency?

I do not see any strong ones. I fear we have collected almost all the low-hanging fruit by now (maybe except for #4275, but that is WiP). Low-hanging here means a considerable performance benefit, not ease of implementation. Expect no further silver bullets.
