Bug #7215: Wrong usage of DB array in mentat_main/settings_reporting/emails - Mentat - Homeproj: Redmine for CESNET

Actions

Copy link

Bug #7215

closed

Wrong usage of DB array in mentat_main/settings_reporting/emails

Added by Radko Krkoš over 3 years ago. Updated almost 3 years ago.

Status:

Closed

Priority:

Normal

Assignee:

Rajmund Hruška

Category:

Development - Database

Target version:

2.9

Start date:

04/15/2021

Due date:

% Done:

100%

Estimated time:

To be discussed:

Description

As was discussed in VC, the emails column of table settings_reporting in the database mentat_main is declared as a character varying array, however the upper layer implementation based on Python/SQLAlchemy stores a single string value inside the array, containing comma separated emails. This was identified as a result of an oversight and should be fixed.
In addition, there are both NULL values and empty arrays present in this column, while logically equivalent, they are still distinct values; and although this might be abstracted in the upper layer, it could be improved for consistency in the DB.

Related issues

Actions

Copy link

Updated by Radko Krkoš over 3 years ago

Assignee deleted (~~Jan Mach~~)

Actions

Copy link

Updated by Radko Krkoš over 3 years ago

Category set to Development - Database

Actions

Copy link

Updated by Radko Krkoš over 3 years ago

Related to Feature #7221: Referential integrity for database arrays added

Actions

Copy link

Updated by Pavel Kácha over 3 years ago

Blocked by Bug #7220: Erroneous values in mentat_main/reports_events/mail_to added

Actions

Copy link

Updated by Pavel Kácha over 3 years ago

Blocked by deleted (Bug #7220: Erroneous values in mentat_main/reports_events/mail_to)

Actions

Copy link

Updated by Pavel Kácha over 3 years ago

Blocks Bug #7220: Erroneous values in mentat_main/reports_events/mail_to added

Actions

Copy link

Updated by Rajmund Hruška over 3 years ago

Assignee set to Rajmund Hruška
Target version changed from Backlog to 2.9

Actions

Copy link

Updated by Rajmund Hruška over 3 years ago

Status changed from New to In Progress
Priority changed from Low to Normal

In the code the validity of emails field in the form is already checked, so the only necessary thing to change in the code is changing attribute emails of SettingsReportingModel to not null. So the real work to do is writing a migration to fix existing data.

Actions

Copy link

Updated by Rajmund Hruška over 3 years ago

Status changed from In Progress to Feedback
% Done changed from 0 to 80
To be discussed changed from No to Yes

Similarly to #7220, I came up with the following code. Radko Krkoš could you please review this code?

CREATE OR REPLACE FUNCTION alembic_33cf3078976b_trim_array(character varying[])
RETURNS character varying[]
AS
$$
DECLARE
   arr ALIAS FOR $1;
   element character varying;
   retVal character varying[];
BEGIN
   FOREACH element IN ARRAY arr
   LOOP
    retVal := array_append(retVal, trim(element)::varchar);
   END LOOP;
RETURN retVal;
END;    
$$
LANGUAGE plpgsql 
   STABLE 
RETURNS NULL ON NULL INPUT;

CREATE OR REPLACE FUNCTION alembic_33cf3078976b_split_strings(character varying[])
RETURNS character varying[]
AS
$$
DECLARE
   arr ALIAS FOR $1;
   element character varying;
   retVal character varying[];
BEGIN
   FOREACH element IN ARRAY arr
   LOOP
    retVal := retVal || alembic_33cf3078976b_trim_array(string_to_array(element, ',')::varchar[]);
   END LOOP;
RETURN retVal;
END;    
$$
LANGUAGE plpgsql 
   STABLE 
RETURNS NULL ON NULL INPUT;

UPDATE settings_reporting SET emails = alembic_33cf3078976b_split_strings(emails);
UPDATE settings_reporting SET emails = ARRAY[]::varchar[] WHERE emails IS NULL;
ALTER TABLE settings_reporting ALTER COLUMN emails SET NOT NULL;
DROP FUNCTION alembic_33cf3078976b_split_strings(character varying[]);
DROP FUNCTION alembic_33cf3078976b_trim_array(character varying[]);
CLUSTER settings_reporting;

Actions

Copy link

#10

Updated by Radko Krkoš over 3 years ago

This is just a variation on the fix for #7220, it looks good to me.
As these are closely related and there is even one central migration function that both of these migrations share, I would go for one migration to fix both issues. Fixing only one does not make sense as for each it would only be a partial fix. The same goes for reverting.

Actions

Copy link

#11

Updated by Rajmund Hruška over 3 years ago

Status changed from Feedback to In Progress
To be discussed changed from Yes to No

I agree, I put the code in the same migration file, code can be found in the branch hruska-bugfix-#7220-mail_to.

There is some issue with unit tests, it seems like NULL value is being inserted into column with NOT NULL constraint, so I am still looking into that.

Actions

Copy link

#12

Updated by Rajmund Hruška over 3 years ago

Status changed from In Progress to Feedback
% Done changed from 80 to 90
To be discussed changed from No to Yes

Rajmund Hruska wrote in #note-11:

I agree, I put the code in the same migration file, code can be found in the branch hruska-bugfix-#7220-mail_to.

There is some issue with unit tests, it seems like NULL value is being inserted into column with NOT NULL constraint, so I am still looking into that.

I fixed the unit tests so the only thing left is to discuss whether or not to put CLUSTER in the migration.

Actions

Copy link

#13

Updated by Pavel Kácha over 3 years ago

In the Upgrading docs we have "step 4", where we want admin to run VACUUM/CLUSTER/ANALYZE unconditionally on any upgrade - which we think solves it and we do not need CLUSTER in the migration.

However, we might make "step 4" conditional - "run this only if Alembic runs any migrations".

Radko, could you please confirm?

Actions

Copy link

#14

Updated by Pavel Kácha over 3 years ago

To be discussed deleted (~~Yes~~)

Actions

Copy link

#15

Updated by Radko Krkoš over 3 years ago

Pavel Kácha wrote in #note-13:

In the Upgrading docs we have "step 4", where we want admin to run VACUUM/CLUSTER/ANALYZE unconditionally on any upgrade - which we think solves it and we do not need CLUSTER in the migration.

However, we might make "step 4" conditional - "run this only if Alembic runs any migrations".

Radko, could you please confirm?

It is more complicated. The first part of step 4 of the "Upgrading" section indeed runs alembic migrations, first for mentat_main through flask's CLI, then on mentat_events through a helper script, then sets CLUSTER-ing indices and autovacuum parameters. All of that is idempotent and effectively no-op if there are no changes.
The second part instructs the user to run VACUUM FREEZE, CLUSTER and ANALYZE by hand. At least VACUUM FREEZE is either useful or quick, if it was not or was run recently, respectively. I would keep that one. The last one, ANALYZE is also useful and takes a short time, so skipping it adds less value than it takes.
The only non-straightforward one is the CLUSTER. That is never no-op, it can and will take a lot of time. Also, one can CLUSTER a freshly CLUSTER-ed table, running the whole process again with the same run time but no benefits. The trouble is that it is hard to decide whether a CLUSTER might be useful, based on information that the administrator has available. First, it is not obvious whether any alembic migrations will be run until the above mentioned commands are run. Next, even if there are any migrations, they might not affect the tables in a way that CLUSTER would be beneficial. Last, it is a good idea to run CLUSTER from time to time on workloads such as Mentat, where data are piped (as in running window) through the database. The need for CLUSTER should be determined based on table/index bloat.
I would go with extending the sqldb-optimize.sh script and adding code for conditionally CLUSTER-ing only tables with excess bloat. A bloat level detection code does exist, it is reported by the system-status script.

Actions

Copy link

#16

Updated by Pavel Kácha over 3 years ago

Mmmkay. After discussion - I guess we might revisit tuning section of Database doc - akin to "add recommended CLUSTER run based on high bloat". Or, as you mentioned, we could consider adding this into sqldb-optimize.sh (however I'm unsure about that, this script only sets recommended values for arbitrary runs of VACUUM/CLUSTER/ANALYZE).

Current bloat reporting script is 11k of dense Perl code from Nagios tools, Radko, isn't there something ligher on stomach?

Actions

Copy link

#17