Bug #4367: Enumeration tables not consistent for non array columns that can contain NULL - Mentat - Homeproj: Redmine for CESNET

Actions

Copy link

Bug #4367

closed

Bug #4252: Slow queries for list precalculation (heavy parallel sequential scan 8-20 sec per each 10 mins)

Enumeration tables not consistent for non array columns that can contain NULL

Added by Radko Krkoš about 6 years ago. Updated almost 6 years ago.

Status:

Closed

Priority:

Normal

Assignee:

Pavel Kácha

Category:

Development - Core

Target version:

2.3

Start date:

10/15/2018

Due date:

% Done:

100%

Estimated time:

To be discussed:

Description

During the research of the new enumeration mechanism an edge case was omitted. The enum table update mechanism does not work correctly for columns that are not arrays and can contain NULL values, because in PostgreSQL (and SQL in general) NULL != NULL.
This does not seem to affect anything so it is just an internal consistency issue. The problem has arisen for columns cesnet_eventclass and cesnet_eventseverity.
The solution would be adding an additional UNIQUE index aimed especially at NULL values for the affected enum tables. An alternative solution would be disallowing NULL for data in enum tables, but this might not be ideal depending on the implementation (it might require further code alterations). Please discuss if you like the second solution better.

The proposed solution is as follows, running this on production systems:

DELETE FROM cesnet_eventclass WHERE data IS NULL;
CREATE UNIQUE INDEX enum_cesnet_eventclass_data_null_key ON enum_cesnet_eventclass ((data IS NULL)) WHERE data IS NULL;

DELETE FROM cesnet_eventseverity WHERE data IS NULL;
CREATE UNIQUE INDEX enum_cesnet_eventseverity_data_null_key ON enum_cesnet_eventseverity ((data IS NULL)) WHERE data IS NULL;

and adding the UNIQUE indices to DB creation/migration scripts:

CREATE UNIQUE INDEX enum_cesnet_eventclass_data_null_key ON enum_cesnet_eventclass ((data IS NULL)) WHERE data IS NULL;
CREATE UNIQUE INDEX enum_cesnet_eventseverity_data_null_key ON enum_cesnet_eventseverity ((data IS NULL)) WHERE data IS NULL;

The non-script part was already performed on dev and alt during development and testing.

Files

0001-Fixed-enumeration-precalculation-for-non-arrays.patch (2.89 KB) 0001-Fixed-enumeration-precalculation-for-non-arrays.patch

Radko Krkoš, 01/07/2019 04:00 PM

Related issues

Actions

Copy link

Updated by Radko Krkoš about 6 years ago

Assignee changed from Jan Mach to Radko Krkoš

Please postpone the implementation, this needs further development.

Actions

Copy link

Updated by Pavel Kácha about 6 years ago

Target version changed from 2.2 to 2.3

Actions

Copy link

Updated by Radko Krkoš almost 6 years ago

File 0001-Fixed-enumeration-precalculation-for-non-arrays.patch added
Status changed from New to Resolved
Assignee changed from Radko Krkoš to Jan Mach
% Done changed from 0 to 90

As the mechanism to present enumerations is different from what was originally anticipated, an easy and elegant solution to this problem is available: disallowing NULLs in enum tables altogether. The attached patch fixes this for Mentat source, for fix on production the following should be used (preferably on all enum tables for consistency, these two are required for the fix):

DELETE FROM enum_cesnet_eventseverity WHERE data is NULL;
ALTER TABLE enum_cesnet_eventseverity ALTER COLUMN data SET NOT NULL;
ALTER TABLE enum_cesnet_eventseverity ALTER COLUMN last_seen SET NOT NULL;

DELETE FROM enum_cesnet_eventclass WHERE data is NULL;
ALTER TABLE enum_cesnet_eventclass ALTER COLUMN data SET NOT NULL;
ALTER TABLE enum_cesnet_eventclass ALTER COLUMN last_seen SET NOT NULL;

Actions

Copy link

Updated by Pavel Kácha almost 6 years ago

I'd suggest go for it, and go for it for all enum columns (severity, class, category, ?). However, that also means new migration, so Mek's opinion is also important. At least this means that NULLs sometimes came from somewhere - where from? Won't removing NULL have some side effects there? Unexpected exceptions?

Actions

Copy link

Updated by Radko Krkoš almost 6 years ago

Pavel Kácha wrote:

I'd suggest go for it, and go for it for all enum columns (severity, class, category, ?).

Good.

Pavel Kácha wrote:

However, that also means new migration, so Mek's opinion is also important.

Actually it does not. The provided SQL command take care of the change, all is fixed online in a very short time. The attached patch fixes the problem for new installations.

Pavel Kácha wrote:

At least this means that NULLs sometimes came from somewhere - where from?

They came from being there. For example some events are not marked by any severity, that is represented as NULL in the database. The problem is that a mechanism based on INSERT INTO ... ON CONFLICT DO UPDATE was used to update the enum tables and as NULL != NULL, multiple NULL records have been created, one every time the script ran and an event with no severity was observed, instead of updating the one NULL record in the enum table. This is not an issue for arrays because unnest() on empty array results in empty set (0 rows), not NULL.

Pavel Kácha wrote:

Won't removing NULL have some side effects there? Unexpected exceptions?

As `without value` is added for every enum anyways, the NULL records in enum tables serve no purpose. This was already tested on mentat-alt and everything works normally there.

Actions

Copy link

Updated by Pavel Kácha almost 6 years ago

Radko Krkoš wrote:

Pavel Kácha wrote:

At least this means that NULLs sometimes came from somewhere - where from?

They came from being there. For example some events are not marked by any severity, that is represented as NULL in the database. The problem is that a mechanism based on INSERT INTO ... ON CONFLICT DO UPDATE was used to update the enum tables and as NULL != NULL, multiple NULL records have been created, one every time the script ran and an event with no severity was observed, instead of updating the one NULL record in the enum table. This is not an issue for arrays because unnest() on empty array results in empty set (0 rows), not NULL.

I mean - where they came from in the first place? Why are there no empty arrays already, instead of NULLs? How come changing schema to 'not null' does not break the place, where NULL is inserted?

Actions

Copy link

Updated by Radko Krkoš almost 6 years ago

File deleted (~~0001-Fixed-enumeration-precalculation-for-non-arrays.patch~~)

Actions

Copy link

Updated by Radko Krkoš almost 6 years ago

File 0001-Fixed-enumeration-precalculation-for-non-arrays.patch 0001-Fixed-enumeration-precalculation-for-non-arrays.patch added
Assignee changed from Jan Mach to Radko Krkoš

Pavel Kácha wrote:

I mean - where they came from in the first place? Why are there no empty arrays already, instead of NULLs? How come changing schema to 'not null' does not break the place, where NULL is inserted?

Oh, now I see what you meant. The query to populate the enumeration tables has to be changed also, see the updated patch. Thx for pointing that out. Does it make sense like this?

Actions

Copy link

Updated by Pavel Kácha almost 6 years ago

Status changed from Resolved to New
Assignee changed from Radko Krkoš to Jan Mach

No further inquiries from me. Mek, please review and consider application at your convenience.

Actions

Copy link

#10

Updated by Radko Krkoš almost 6 years ago

Radko Krkoš wrote:

Oh, now I see what you meant. The query to populate the enumeration tables has to be changed also, see the updated patch. Thx for pointing that out. Does it make sense like this?

Actually it still might be a bit confusing. The added filtering would only be required for non-arrays, it is added for both for consistency.

As you responded while I was typing, and I see that the point I was missing lied elsewhere, the text below is just for future reference:

Pavel Kácha wrote:

I mean - where they came from in the first place? Why are there no empty arrays already, instead of NULLs?

Short answer: Because of unnest().
Long answer: Columns cesnet_eventseverity and cesnet_eventclass in the events table are of type text, all other columns where we need enumerations are of type text[]. The query to enumerate all values is different for each group, for arrays SELECT unnest(column) ... is used, for non-arrays only SELECT column .... The unnest() function expands an array into multiple rows, one for each array element. An empty array is therefore expanded into 0 rows -> nothing to insert. This is different for SELECT column ... if the column can contain NULLs. Those would be then returned also and INSERT@ed into the enumeration table. As @NULL != NULL, those insertions would not create a conflict, resulting in multiple @NULL@s in the enum tables (for non-arrays).

Actions

Copy link

#11

Updated by Radko Krkoš almost 6 years ago

There is still the question of migration. This might be another albeit related topic. As far as I know, we have no mechanism to alter the DB schema between Mentat versions, only (migration from MongoDB to PostgreSQL) and (PostgreSQL upgrade from 10 to 11) are supported. I do not see any support to run any ALTER TABLE or such commands for example on migration from 2.2 to 2.3, except for database reinit, which destroys all data (events). Am I wrong here?

Actions

Copy link

#12

Updated by Pavel Kácha almost 6 years ago

Radko Krkoš wrote:

There is still the question of migration. This might be another albeit related topic. As far as I know, we have no mechanism to alter the DB schema between Mentat versions, only (migration from MongoDB to PostgreSQL) and (PostgreSQL upgrade from 10 to 11) are supported. I do not see any support to run any ALTER TABLE or such commands for example on migration from 2.2 to 2.3, except for database reinit, which destroys all data (events). Am I wrong here?

Mek, please advise Radko how to do that properly? Also, if transitions are invasive (need change in python code), they could be batched into some other db updates, if there are any in the queue.

Actions

Copy link

#13