Project

General

Profile

Actions

Bug #7215

closed

Wrong usage of DB array in mentat_main/settings_reporting/emails

Added by Radko Krkoš over 3 years ago. Updated almost 3 years ago.

Status:
Closed
Priority:
Normal
Category:
Development - Database
Target version:
Start date:
04/15/2021
Due date:
% Done:

100%

Estimated time:
To be discussed:

Description

As was discussed in VC, the emails column of table settings_reporting in the database mentat_main is declared as a character varying array, however the upper layer implementation based on Python/SQLAlchemy stores a single string value inside the array, containing comma separated emails. This was identified as a result of an oversight and should be fixed.
In addition, there are both NULL values and empty arrays present in this column, while logically equivalent, they are still distinct values; and although this might be abstracted in the upper layer, it could be improved for consistency in the DB.


Related issues

Related to Mentat - Feature #7221: Referential integrity for database arraysNewRadko Krkoš04/16/2021

Actions
Related to Mentat - Task #7538: Streamline documentation concerning VACUUM/CLUSTER/ANALYZENewRadko Krkoš11/18/2021

Actions
Blocks Mentat - Bug #7220: Erroneous values in mentat_main/reports_events/mail_toClosedRajmund Hruška04/16/2021

Actions
Actions #1

Updated by Radko Krkoš over 3 years ago

  • Assignee deleted (Jan Mach)
Actions #2

Updated by Radko Krkoš over 3 years ago

  • Category set to Development - Database
Actions #3

Updated by Radko Krkoš over 3 years ago

  • Related to Feature #7221: Referential integrity for database arrays added
Actions #4

Updated by Pavel Kácha over 3 years ago

  • Blocked by Bug #7220: Erroneous values in mentat_main/reports_events/mail_to added
Actions #5

Updated by Pavel Kácha over 3 years ago

  • Blocked by deleted (Bug #7220: Erroneous values in mentat_main/reports_events/mail_to)
Actions #6

Updated by Pavel Kácha over 3 years ago

  • Blocks Bug #7220: Erroneous values in mentat_main/reports_events/mail_to added
Actions #7

Updated by Rajmund Hruška over 3 years ago

  • Assignee set to Rajmund Hruška
  • Target version changed from Backlog to 2.9
Actions #8

Updated by Rajmund Hruška over 3 years ago

  • Status changed from New to In Progress
  • Priority changed from Low to Normal

In the code the validity of emails field in the form is already checked, so the only necessary thing to change in the code is changing attribute emails of SettingsReportingModel to not null. So the real work to do is writing a migration to fix existing data.

Actions #9

Updated by Rajmund Hruška over 3 years ago

  • Status changed from In Progress to Feedback
  • % Done changed from 0 to 80
  • To be discussed changed from No to Yes

Similarly to #7220, I came up with the following code. Radko Krkoš could you please review this code?

CREATE OR REPLACE FUNCTION alembic_33cf3078976b_trim_array(character varying[])
RETURNS character varying[]
AS
$$
DECLARE
   arr ALIAS FOR $1;
   element character varying;
   retVal character varying[];
BEGIN
   FOREACH element IN ARRAY arr
   LOOP
    retVal := array_append(retVal, trim(element)::varchar);
   END LOOP;
RETURN retVal;
END;    
$$
LANGUAGE plpgsql 
   STABLE 
RETURNS NULL ON NULL INPUT;

CREATE OR REPLACE FUNCTION alembic_33cf3078976b_split_strings(character varying[])
RETURNS character varying[]
AS
$$
DECLARE
   arr ALIAS FOR $1;
   element character varying;
   retVal character varying[];
BEGIN
   FOREACH element IN ARRAY arr
   LOOP
    retVal := retVal || alembic_33cf3078976b_trim_array(string_to_array(element, ',')::varchar[]);
   END LOOP;
RETURN retVal;
END;    
$$
LANGUAGE plpgsql 
   STABLE 
RETURNS NULL ON NULL INPUT;

UPDATE settings_reporting SET emails = alembic_33cf3078976b_split_strings(emails);
UPDATE settings_reporting SET emails = ARRAY[]::varchar[] WHERE emails IS NULL;
ALTER TABLE settings_reporting ALTER COLUMN emails SET NOT NULL;
DROP FUNCTION alembic_33cf3078976b_split_strings(character varying[]);
DROP FUNCTION alembic_33cf3078976b_trim_array(character varying[]);
CLUSTER settings_reporting;
Actions #10

Updated by Radko Krkoš over 3 years ago

This is just a variation on the fix for #7220, it looks good to me.
As these are closely related and there is even one central migration function that both of these migrations share, I would go for one migration to fix both issues. Fixing only one does not make sense as for each it would only be a partial fix. The same goes for reverting.

Actions #11

Updated by Rajmund Hruška over 3 years ago

  • Status changed from Feedback to In Progress
  • To be discussed changed from Yes to No

I agree, I put the code in the same migration file, code can be found in the branch hruska-bugfix-#7220-mail_to.

There is some issue with unit tests, it seems like NULL value is being inserted into column with NOT NULL constraint, so I am still looking into that.

Actions #12

Updated by Rajmund Hruška over 3 years ago

  • Status changed from In Progress to Feedback
  • % Done changed from 80 to 90
  • To be discussed changed from No to Yes

Rajmund Hruska wrote in #note-11:

I agree, I put the code in the same migration file, code can be found in the branch hruska-bugfix-#7220-mail_to.

There is some issue with unit tests, it seems like NULL value is being inserted into column with NOT NULL constraint, so I am still looking into that.

I fixed the unit tests so the only thing left is to discuss whether or not to put CLUSTER in the migration.

Actions #13

Updated by Pavel Kácha over 3 years ago

In the Upgrading docs we have "step 4", where we want admin to run VACUUM/CLUSTER/ANALYZE unconditionally on any upgrade - which we think solves it and we do not need CLUSTER in the migration.

However, we might make "step 4" conditional - "run this only if Alembic runs any migrations".

Radko, could you please confirm?

Actions #14

Updated by Pavel Kácha over 3 years ago

  • To be discussed deleted (Yes)
Actions #15

Updated by Radko Krkoš over 3 years ago

Pavel Kácha wrote in #note-13:

In the Upgrading docs we have "step 4", where we want admin to run VACUUM/CLUSTER/ANALYZE unconditionally on any upgrade - which we think solves it and we do not need CLUSTER in the migration.

However, we might make "step 4" conditional - "run this only if Alembic runs any migrations".

Radko, could you please confirm?

It is more complicated. The first part of step 4 of the "Upgrading" section indeed runs alembic migrations, first for mentat_main through flask's CLI, then on mentat_events through a helper script, then sets CLUSTER-ing indices and autovacuum parameters. All of that is idempotent and effectively no-op if there are no changes.
The second part instructs the user to run VACUUM FREEZE, CLUSTER and ANALYZE by hand. At least VACUUM FREEZE is either useful or quick, if it was not or was run recently, respectively. I would keep that one. The last one, ANALYZE is also useful and takes a short time, so skipping it adds less value than it takes.
The only non-straightforward one is the CLUSTER. That is never no-op, it can and will take a lot of time. Also, one can CLUSTER a freshly CLUSTER-ed table, running the whole process again with the same run time but no benefits. The trouble is that it is hard to decide whether a CLUSTER might be useful, based on information that the administrator has available. First, it is not obvious whether any alembic migrations will be run until the above mentioned commands are run. Next, even if there are any migrations, they might not affect the tables in a way that CLUSTER would be beneficial. Last, it is a good idea to run CLUSTER from time to time on workloads such as Mentat, where data are piped (as in running window) through the database. The need for CLUSTER should be determined based on table/index bloat.
I would go with extending the sqldb-optimize.sh script and adding code for conditionally CLUSTER-ing only tables with excess bloat. A bloat level detection code does exist, it is reported by the system-status script.

Actions #16

Updated by Pavel Kácha over 3 years ago

Mmmkay. After discussion - I guess we might revisit tuning section of Database doc - akin to "add recommended CLUSTER run based on high bloat". Or, as you mentioned, we could consider adding this into sqldb-optimize.sh (however I'm unsure about that, this script only sets recommended values for arbitrary runs of VACUUM/CLUSTER/ANALYZE).

Current bloat reporting script is 11k of dense Perl code from Nagios tools, Radko, isn't there something ligher on stomach?

Actions #17

Updated by Pavel Kácha about 3 years ago

  • Related to Task #7538: Streamline documentation concerning VACUUM/CLUSTER/ANALYZE added
Actions #18

Updated by Pavel Kácha about 3 years ago

Regarding #7215#note-12, we have agreed that CLUSTER (or other big operations) does not belong to migration, the rest is moved into #7538.

Actions #19

Updated by Rajmund Hruška about 3 years ago

  • Status changed from Feedback to In Review
  • % Done changed from 90 to 100

Merged into devel.

Actions #20

Updated by Pavel Kácha almost 3 years ago

  • Status changed from In Review to Closed
Actions

Also available in: Atom PDF