Bug #4367: Enumeration tables not consistent for non-array columns that can contain NULL
Status: Closed
% Done: 100%
Related to Bug #4252: Slow queries for list precalculation (heavy parallel sequential scans, 8-20 s each, every 10 min)
Description
During the design of the new enumeration mechanism, an edge case was overlooked: the enum table update mechanism does not work correctly for non-array columns that can contain NULL values, because in PostgreSQL (and SQL in general) NULL != NULL.
This does not seem to affect anything visible, so it is only an internal consistency issue. The problem has arisen for the columns cesnet_eventclass and cesnet_eventseverity.
The proposed solution is to add an additional UNIQUE index aimed specifically at NULL values to the affected enum tables. An alternative would be to disallow NULL for data in enum tables, but this might not be ideal depending on the implementation (it might require further code changes). Please discuss if you prefer the second solution.
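The inconsistency is easy to reproduce. A minimal sketch using Python's sqlite3 module as a stand-in for PostgreSQL (both treat NULLs as distinct under a plain UNIQUE constraint); the table and column names are illustrative only:

```python
import sqlite3

# SQL treats NULL as distinct from NULL, so a plain UNIQUE constraint
# does not stop duplicate NULL rows. "enum_demo" is a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enum_demo (data TEXT UNIQUE)")
conn.execute("INSERT INTO enum_demo (data) VALUES (NULL)")
conn.execute("INSERT INTO enum_demo (data) VALUES (NULL)")  # no violation raised
count = conn.execute(
    "SELECT count(*) FROM enum_demo WHERE data IS NULL").fetchone()[0]
print(count)  # 2 -- two "identical" NULL rows coexist
```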
The proposed solution is as follows. Run this on production systems:
DELETE FROM enum_cesnet_eventclass WHERE data IS NULL;
CREATE UNIQUE INDEX enum_cesnet_eventclass_data_null_key ON enum_cesnet_eventclass ((data IS NULL)) WHERE data IS NULL;
DELETE FROM enum_cesnet_eventseverity WHERE data IS NULL;
CREATE UNIQUE INDEX enum_cesnet_eventseverity_data_null_key ON enum_cesnet_eventseverity ((data IS NULL)) WHERE data IS NULL;
and add the UNIQUE indices to the DB creation/migration scripts:
CREATE UNIQUE INDEX enum_cesnet_eventclass_data_null_key ON enum_cesnet_eventclass ((data IS NULL)) WHERE data IS NULL;
CREATE UNIQUE INDEX enum_cesnet_eventseverity_data_null_key ON enum_cesnet_eventseverity ((data IS NULL)) WHERE data IS NULL;
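For illustration, the partial-index trick can be sketched against a throwaway SQLite table (SQLite also supports expression and partial indexes; table and index names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enum_demo (data TEXT UNIQUE)")
# The partial UNIQUE index from the proposal: it indexes the constant
# expression (data IS NULL) for NULL rows only, so at most one NULL row fits.
conn.execute(
    "CREATE UNIQUE INDEX enum_demo_data_null_key "
    "ON enum_demo ((data IS NULL)) WHERE data IS NULL"
)
conn.execute("INSERT INTO enum_demo (data) VALUES (NULL)")
rejected = False
try:
    conn.execute("INSERT INTO enum_demo (data) VALUES (NULL)")
except sqlite3.IntegrityError:
    rejected = True  # the partial index makes a second NULL row impossible
print(rejected)  # True
```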
The non-script part was already performed on dev and alt during development and testing.
Files
Related issues
Updated by Radko Krkoš about 6 years ago
- Assignee changed from Jan Mach to Radko Krkoš
Please postpone the implementation, this needs further development.
Updated by Radko Krkoš almost 6 years ago
- File 0001-Fixed-enumeration-precalculation-for-non-arrays.patch added
- Status changed from New to Resolved
- Assignee changed from Radko Krkoš to Jan Mach
- % Done changed from 0 to 90
As the mechanism for presenting enumerations differs from what was originally anticipated, an easy and elegant solution to this problem is available: disallowing NULLs in enum tables altogether. The attached patch fixes this in the Mentat source; for a fix on production, the following should be used (preferably on all enum tables for consistency; these two are required for the fix):
DELETE FROM enum_cesnet_eventseverity WHERE data IS NULL;
ALTER TABLE enum_cesnet_eventseverity ALTER COLUMN data SET NOT NULL;
ALTER TABLE enum_cesnet_eventseverity ALTER COLUMN last_seen SET NOT NULL;
DELETE FROM enum_cesnet_eventclass WHERE data IS NULL;
ALTER TABLE enum_cesnet_eventclass ALTER COLUMN data SET NOT NULL;
ALTER TABLE enum_cesnet_eventclass ALTER COLUMN last_seen SET NOT NULL;
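A minimal sketch of what the NOT NULL constraint buys, using SQLite (which cannot ALTER COLUMN ... SET NOT NULL, so the constraint is declared at table creation; names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# With NOT NULL declared, a NULL enum value can never enter the table,
# so the duplicate-NULL problem cannot arise at all.
conn.execute(
    "CREATE TABLE enum_demo (data TEXT NOT NULL UNIQUE, last_seen TEXT NOT NULL)"
)
rejected = False
try:
    conn.execute("INSERT INTO enum_demo (data, last_seen) VALUES (NULL, 'today')")
except sqlite3.IntegrityError:
    rejected = True  # NOT NULL constraint violation
print(rejected)  # True
```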
Updated by Pavel Kácha almost 6 years ago
I'd suggest going for it, and going for it for all enum columns (severity, class, category, ?). However, that also means a new migration, so Mek's opinion is also important. At least this means that NULLs sometimes came from somewhere - where from? Won't removing NULL have some side effects there? Unexpected exceptions?
Updated by Radko Krkoš almost 6 years ago
Pavel Kácha wrote:
I'd suggest going for it, and going for it for all enum columns (severity, class, category, ?).
Good.
Pavel Kácha wrote:
However, that also means a new migration, so Mek's opinion is also important.
Actually, it does not. The provided SQL commands take care of the change; everything is fixed online in a very short time. The attached patch fixes the problem for new installations.
Pavel Kácha wrote:
At least this means that NULLs sometimes came from somewhere - where from?
They came from being there: for example, some events are not marked with any severity, which is represented as NULL in the database. The problem is that a mechanism based on INSERT INTO ... ON CONFLICT DO UPDATE was used to update the enum tables, and as NULL != NULL, multiple NULL records were created (one every time the script ran and an event with no severity was observed) instead of updating the one NULL record in the enum table. This is not an issue for arrays because unnest() on an empty array yields an empty set (0 rows), not NULL.
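The failure mode described above can be sketched with SQLite's compatible upsert syntax (requires SQLite >= 3.24; the table and helper names are hypothetical stand-ins for the real enum-update query):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enum_demo (data TEXT UNIQUE, last_seen TEXT)")

def record(value, seen):
    # Simplified stand-in for the enum-table update described above.
    conn.execute(
        "INSERT INTO enum_demo (data, last_seen) VALUES (?, ?) "
        "ON CONFLICT (data) DO UPDATE SET last_seen = excluded.last_seen",
        (value, seen),
    )

for run in ("run1", "run2", "run3"):
    record("Low", run)   # conflict fires: the single row is updated in place
    record(None, run)    # NULL never conflicts: a fresh row on every run

low_rows = conn.execute(
    "SELECT count(*) FROM enum_demo WHERE data = 'Low'").fetchone()[0]
null_rows = conn.execute(
    "SELECT count(*) FROM enum_demo WHERE data IS NULL").fetchone()[0]
print(low_rows, null_rows)  # 1 3
```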
Pavel Kácha wrote:
Won't removing NULL have some side effects there? Unexpected exceptions?
As `without value` is added for every enum anyway, the NULL records in the enum tables serve no purpose. This was already tested on mentat-alt and everything works normally there.
Updated by Pavel Kácha almost 6 years ago
Radko Krkoš wrote:
Pavel Kácha wrote:
At least this means that NULLs sometimes came from somewhere - where from?
They came from being there: for example, some events are not marked with any severity, which is represented as NULL in the database. The problem is that a mechanism based on INSERT INTO ... ON CONFLICT DO UPDATE was used to update the enum tables, and as NULL != NULL, multiple NULL records were created (one every time the script ran and an event with no severity was observed) instead of updating the one NULL record in the enum table. This is not an issue for arrays because unnest() on an empty array yields an empty set (0 rows), not NULL.
I mean - where did they come from in the first place? Why are there no empty arrays already, instead of NULLs? How come changing the schema to NOT NULL does not break the place where the NULL is inserted?
Updated by Radko Krkoš almost 6 years ago
- File deleted (0001-Fixed-enumeration-precalculation-for-non-arrays.patch)
Updated by Radko Krkoš almost 6 years ago
- File 0001-Fixed-enumeration-precalculation-for-non-arrays.patch added
- Assignee changed from Jan Mach to Radko Krkoš
Pavel Kácha wrote:
I mean - where did they come from in the first place? Why are there no empty arrays already, instead of NULLs? How come changing the schema to NOT NULL does not break the place where the NULL is inserted?
Oh, now I see what you meant. The query to populate the enumeration tables has to be changed also, see the updated patch. Thx for pointing that out. Does it make sense like this?
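The exact content of the updated patch is not reproduced here, but a plausible shape of the corrected populate query, sketched against SQLite with hypothetical names, would be:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (cesnet_eventseverity TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?)",
    [("Low",), (None,), ("High",), (None,), ("Low",)],
)
conn.execute("CREATE TABLE enum_demo (data TEXT NOT NULL UNIQUE)")
# Hypothetical shape of the corrected populate query: NULLs are filtered
# out before insertion, so the result fits a NOT NULL enum table.
conn.execute(
    "INSERT INTO enum_demo (data) "
    "SELECT DISTINCT cesnet_eventseverity FROM events "
    "WHERE cesnet_eventseverity IS NOT NULL"
)
values = sorted(row[0] for row in conn.execute("SELECT data FROM enum_demo"))
print(values)  # ['High', 'Low']
```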
Updated by Pavel Kácha almost 6 years ago
- Status changed from Resolved to New
- Assignee changed from Radko Krkoš to Jan Mach
No further inquiries from me. Mek, please review and consider application at your convenience.
Updated by Radko Krkoš almost 6 years ago
Radko Krkoš wrote:
Oh, now I see what you meant. The query to populate the enumeration tables has to be changed also, see the updated patch. Thx for pointing that out. Does it make sense like this?
Actually, it still might be a bit confusing: the added filtering is only required for non-arrays; it is added to both for consistency.
As you responded while I was typing, and I see that the point I was missing lay elsewhere, the text below is just for future reference:
Pavel Kácha wrote:
I mean - where they came from in the first place? Why are there no empty arrays already, instead of NULLs?
Short answer: Because of unnest().
Long answer: The columns cesnet_eventseverity and cesnet_eventclass in the events table are of type text; all other columns where we need enumerations are of type text[]. The query to enumerate all values differs for each group: for arrays SELECT unnest(column) ... is used, for non-arrays only SELECT column .... The unnest() function expands an array into multiple rows, one for each array element. An empty array is therefore expanded into 0 rows -> nothing to insert. This is different for SELECT column ... if the column can contain NULLs: those would also be returned and INSERTed into the enumeration table. As NULL != NULL, those insertions would not create a conflict, resulting in multiple NULLs in the enum tables (for non-arrays).
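Since unnest() is PostgreSQL-specific, an analogous demonstration can use SQLite's json_each() table-valued function, which likewise expands an empty array into zero rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# json_each() plays the role of unnest() here: an empty array expands to
# zero rows, so nothing would reach the enum table.
empty_rows = conn.execute("SELECT value FROM json_each('[]')").fetchall()
# A scalar column, by contrast, yields its NULL as one row.
scalar_rows = conn.execute("SELECT NULL AS data").fetchall()
print(len(empty_rows), len(scalar_rows))  # 0 1
```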
Updated by Radko Krkoš almost 6 years ago
There is still the question of migration. This might be another, albeit related, topic. As far as I know, we have no mechanism to alter the DB schema between Mentat versions; only the migration from MongoDB to PostgreSQL and the PostgreSQL upgrade from 10 to 11 are supported. I do not see any support for running ALTER TABLE or similar commands, for example on migration from 2.2 to 2.3, except for database reinit, which destroys all data (events). Am I wrong here?
Updated by Pavel Kácha almost 6 years ago
Radko Krkoš wrote:
There is still the question of migration. This might be another, albeit related, topic. As far as I know, we have no mechanism to alter the DB schema between Mentat versions; only the migration from MongoDB to PostgreSQL and the PostgreSQL upgrade from 10 to 11 are supported. I do not see any support for running ALTER TABLE or similar commands, for example on migration from 2.2 to 2.3, except for database reinit, which destroys all data (events). Am I wrong here?
Mek, please advise Radko on how to do that properly. Also, if transitions are invasive (i.e. need changes in the Python code), they could be batched with some other DB updates, if there are any in the queue.
Updated by Jan Mach almost 6 years ago
- Blocked by Feature #4230: Make use of existing or implement own mechanism for handling SQL schema migrations. added
Updated by Jan Mach almost 6 years ago
- Status changed from New to In Progress
- % Done changed from 90 to 100
I have applied the attached patch to the source code. The code was then deployed to the mentat-alt server.
Updated by REST Automat Admin almost 6 years ago
To close this issue, the migration must be created; see the related issue.
Updated by Jan Mach almost 6 years ago
- Status changed from In Progress to Resolved
- Assignee changed from Jan Mach to Pavel Kácha
- Target version changed from 2.4 to 2.3
This issue is now resolved by attached patches and can be closed once the blocking issue is closed.
Updated by Pavel Kácha almost 6 years ago
- Status changed from Resolved to Closed