Task #7538: Streamline documentation concerning VACUUM/CLUSTER/ANALYZE - Mentat - Homeproj: Redmine for CESNET

Actions

Copy link

Task #7538

open

Streamline documentation concerning VACUUM/CLUSTER/ANALYZE

Added by Pavel Kácha about 3 years ago. Updated 6 months ago.

Status:

New

Priority:

Normal

Assignee:

Radko Krkoš

Category:

Documentation

Target version:

Backlog

Start date:

11/18/2021

Due date:

% Done:

Estimated time:

To be discussed:

Description

Stemmed from #7215:

Rajmund Hruska wrote in #note-12:

I fixed the unit tests so the only thing left is to discuss whether or not to put CLUSTER in the migration.

Pavel Kácha wrote in #note-13:

In the Upgrading docs we have "step 4", where we want admin to run VACUUM/CLUSTER/ANALYZE unconditionally on any upgrade - which we think solves it and we do not need CLUSTER in the migration.

However, we might make "step 4" conditional - "run this only if Alembic runs any migrations".

Radko, could you please confirm?

Radko Krkoš wrote in #note-15:

It is more complicated. The first part of step 4 of the "Upgrading" section indeed runs alembic migrations, first for mentat_main through flask's CLI, then on mentat_events through a helper script, then sets CLUSTER-ing indices and autovacuum parameters. All of that is idempotent and effectively no-op if there are no changes.
The second part instructs the user to run VACUUM FREEZE, CLUSTER and ANALYZE by hand. At least VACUUM FREEZE is either useful or quick, if it was not or was run recently, respectively. I would keep that one. The last one, ANALYZE is also useful and takes a short time, so skipping it adds less value than it takes.
The only non-straightforward one is the CLUSTER. That is never no-op, it can and will take a lot of time. Also, one can CLUSTER a freshly CLUSTER-ed table, running the whole process again with the same run time but no benefits. The trouble is that it is hard to decide whether a CLUSTER might be useful, based on information that the administrator has available. First, it is not obvious whether any alembic migrations will be run until the above mentioned commands are run. Next, even if there are any migrations, they might not affect the tables in a way that CLUSTER would be beneficial. Last, it is a good idea to run CLUSTER from time to time on workloads such as Mentat, where data are piped (as in running window) through the database. The need for CLUSTER should be determined based on table/index bloat.
I would go with extending the sqldb-optimize.sh script and adding code for conditionally CLUSTER-ing only tables with excess bloat. A bloat level detection code does exist, it is reported by the system-status script.

Pavel Kácha wrote in #note-16:

Mmmkay. After discussion - I guess we might revisit tuning section of Database doc - akin to "add recommended CLUSTER run based on high bloat". Or, as you mentioned, we could consider adding this into sqldb-optimize.sh (however I'm unsure about that, this script only sets recommended values for arbitrary runs of VACUUM/CLUSTER/ANALYZE).

Current bloat reporting script is 11k of dense Perl code from Nagios tools, Radko, isn't there something ligher on stomach?

Related issues