Feature #4385

closed

Unify empty arrays and NULLs

Added by Pavel Kácha over 5 years ago. Updated about 5 years ago.

Status: Rejected
Priority: Normal
Assignee:
Category: Design
Target version:
Start date: 10/19/2018
Due date:
% Done: 0%
Estimated time:
To be discussed:

Description

Arrays in the metadata table are not used consistently: some are NULL and some are empty. We might benefit from unifying them, as we do not differentiate between nonexistent and empty arrays. The question is which is "better":

  • An empty array may have some space overhead, but the column then never needs to be NULL. Also, some queries might be simpler without a special case for NULL.
  • A NULL value may add overhead for the whole column; on the other hand, NULLs seem to be stored as a per-row bitmap, so an additional bit is negligible if there are already other nullable columns. What about special-casing in queries?

Partially stems from #4348.
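
To make the query special-casing question concrete, here is a minimal Python sketch that models SQL three-valued logic for an array overlap test (`&&` in PostgreSQL). It is only an illustration, not code from the project: the `rows` data, the `tags` column and all function names are hypothetical, and `None` stands in for SQL NULL.

```python
from typing import List, Optional, Sequence

# None stands in for SQL NULL throughout this sketch.
Tristate = Optional[bool]


def overlaps(column: Optional[Sequence[str]], wanted: Sequence[str]) -> Tristate:
    """Model of `column && wanted`: a NULL column yields NULL, otherwise a boolean."""
    if column is None:
        return None
    return bool(set(column) & set(wanted))


def sql_not(value: Tristate) -> Tristate:
    """SQL NOT under three-valued logic: NOT NULL is still NULL."""
    return None if value is None else not value


def select_ids(rows, predicate) -> List[int]:
    """WHERE keeps a row only when the predicate is exactly TRUE."""
    return [row["id"] for row in rows if predicate(row["tags"]) is True]


# Hypothetical rows: a populated array, an empty array, and a NULL.
rows = [
    {"id": 1, "tags": ["a", "b"]},
    {"id": 2, "tags": []},
    {"id": 3, "tags": None},
]

# WHERE tags && ARRAY['a']
positive = select_ids(rows, lambda tags: overlaps(tags, ["a"]))
# WHERE NOT (tags && ARRAY['a'])
negated = select_ids(rows, lambda tags: sql_not(overlaps(tags, ["a"])))
# WHERE tags IS NULL OR NOT (tags && ARRAY['a'])
special_cased = select_ids(
    rows, lambda tags: tags is None or sql_not(overlaps(tags, ["a"]))
)
```

In this model the positive filter treats the empty array and the NULL alike (neither matches), while the negated filter keeps the empty-array row but drops the NULL row unless `IS NULL` is handled explicitly. That asymmetry is the special case the second bullet asks about.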


Related issues

Related to Mentat - Feature #4348: Better support for sparse columns (Closed, Radko Krkoš, 10/05/2018)

Actions #1

Updated by Pavel Kácha over 5 years ago

  • Related to Feature #4348: Better support for sparse columns added
Actions #2

Updated by Radko Krkoš over 5 years ago

Tests reveal that converting empty arrays to NULLs has no impact on table size and a negligible impact on the combined GIN index (around 1MB). No performance testing was done to assess the impact of the new, somewhat differently organized index.
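
For reference, a conversion of this kind could be scripted roughly as follows. This is a sketch under assumptions, not the statements actually used in the test: the table name `example_table` and the column names are hypothetical, and the helper only builds SQL text without executing it.

```python
from typing import List


def null_out_empty_arrays(table: str, columns: List[str]) -> List[str]:
    """Build one UPDATE per column that replaces empty arrays with NULL.

    The table and column names are placeholders supplied by the caller;
    they are checked to be plain identifiers before being interpolated.
    """
    for name in (table, *columns):
        if not name.isidentifier():
            raise ValueError(f"not a plain identifier: {name!r}")
    return [
        f"UPDATE {table} SET {col} = NULL WHERE {col} = '{{}}';"
        for col in columns
    ]


# Hypothetical usage: two array columns in a made-up table.
statements = null_out_empty_arrays("example_table", ["tags", "labels"])
```

Each generated statement compares the column against the empty array literal `'{}'`, so rows that are already NULL or non-empty are left untouched.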

Actions #3

Updated by Radko Krkoš over 5 years ago

Radko Krkoš wrote:

around 1MB

This was tested on mentat-dev, where the combined GIN index is about 450MB in size.

Actions #4

Updated by Pavel Kácha over 5 years ago

By using my crystal sphere I presume the performance impacts would be also negligible.

Are there any other strong pro/con arguments, save for consistency?

Actions #5

Updated by Radko Krkoš over 5 years ago

Pavel Kácha wrote:

By using my crystal sphere I presume the performance impacts would be also negligible.

The (manual) testing has progressed, and there seems to be a performance improvement. Of course not a large one (originally in the 1.5s - 2s range, now 60ms - 500ms), but an interesting one, I would say (and with somewhat lower overall load, i.e. fewer workers).

Disclaimer: These are not final numbers; the impact of ENUMs must still be ruled out. Nevertheless, this seems to be universal for all newly NULLed columns.

Are there any other strong pro/con arguments, save for consistency?

I do not see any strong ones. I fear we have collected almost all the low-hanging fruit by now (maybe except for #4275, but that is WiP). Here, "low-hanging" means a considerable performance benefit, not ease of implementation. Expect no further silver bullets.

Actions #6

Updated by Pavel Kácha over 5 years ago

  • Status changed from New to Rejected

After discussion, this seems like a mostly futile effort. Closing.

Actions #7

Updated by Jan Mach about 5 years ago

  • Target version changed from Backlog to Rejected