Bug #7538


Streamline documentation concerning VACUUM/CLUSTER/ANALYZE

Added by Pavel Kácha over 2 years ago. Updated 12 months ago.

Target version:
Start date:
Due date:
% Done:


Estimated time:
To be discussed:


Stemmed from #7215:

Rajmund Hruska wrote in #note-12:

I fixed the unit tests so the only thing left is to discuss whether or not to put CLUSTER in the migration.

Pavel Kácha wrote in #note-13:

In the Upgrading docs we have "step 4", where we want admin to run VACUUM/CLUSTER/ANALYZE unconditionally on any upgrade - which we think solves it and we do not need CLUSTER in the migration.

However, we might make "step 4" conditional - "run this only if Alembic runs any migrations".

Radko, could you please confirm?

Radko Krkoš wrote in #note-15:

It is more complicated. The first part of step 4 of the "Upgrading" section indeed runs alembic migrations, first for mentat_main through flask's CLI, then on mentat_events through a helper script, then sets CLUSTER-ing indices and autovacuum parameters. All of that is idempotent and effectively no-op if there are no changes.
The second part instructs the user to run VACUUM FREEZE, CLUSTER and ANALYZE by hand. At least VACUUM FREEZE is either useful or quick, if it was not or was run recently, respectively. I would keep that one. The last one, ANALYZE is also useful and takes a short time, so skipping it adds less value than it takes.
The only non-straightforward one is the CLUSTER. That is never no-op, it can and will take a lot of time. Also, one can CLUSTER a freshly CLUSTER-ed table, running the whole process again with the same run time but no benefits. The trouble is that it is hard to decide whether a CLUSTER might be useful, based on information that the administrator has available. First, it is not obvious whether any alembic migrations will be run until the above mentioned commands are run. Next, even if there are any migrations, they might not affect the tables in a way that CLUSTER would be beneficial. Last, it is a good idea to run CLUSTER from time to time on workloads such as Mentat, where data are piped (as in running window) through the database. The need for CLUSTER should be determined based on table/index bloat.
I would go with extending the script and adding code for conditionally CLUSTER-ing only tables with excess bloat. A bloat level detection code does exist, it is reported by the system-status script.

Pavel Kácha wrote in #note-16:

Mmmkay. After discussion - I guess we might revisit tuning section of Database doc - akin to "add recommended CLUSTER run based on high bloat". Or, as you mentioned, we could consider adding this into (however I'm unsure about that, this script only sets recommended values for arbitrary runs of VACUUM/CLUSTER/ANALYZE).

Current bloat reporting script is 11k of dense Perl code from Nagios tools, Radko, isn't there something ligher on stomach?

Related issues

Related to Mentat - Bug #7215: Wrong usage of DB array in mentat_main/settings_reporting/emailsClosedRajmund Hruška04/15/2021


Also available in: Atom PDF