Streamline documentation concerning VACUUM/CLUSTER/ANALYZE
Stemmed from #7215:
Rajmund Hruska wrote in #note-12:
I fixed the unit tests so the only thing left is to discuss whether or not to put
CLUSTERin the migration.
Pavel Kácha wrote in #note-13:
In the Upgrading docs we have "step 4", where we want admin to run VACUUM/CLUSTER/ANALYZE unconditionally on any upgrade - which we think solves it and we do not need CLUSTER in the migration.
However, we might make "step 4" conditional - "run this only if Alembic runs any migrations".
Radko, could you please confirm?
Radko Krkoš wrote in #note-15:
It is more complicated. The first part of step 4 of the "Upgrading" section indeed runs
alembicmigrations, first for
flask's CLI, then on
mentat_eventsthrough a helper script, then sets
CLUSTER-ing indices and autovacuum parameters. All of that is idempotent and effectively no-op if there are no changes.
The second part instructs the user to run
ANALYZEby hand. At least
VACUUM FREEZEis either useful or quick, if it was not or was run recently, respectively. I would keep that one. The last one,
ANALYZEis also useful and takes a short time, so skipping it adds less value than it takes.
The only non-straightforward one is the
CLUSTER. That is never no-op, it can and will take a lot of time. Also, one can
CLUSTER-ed table, running the whole process again with the same run time but no benefits. The trouble is that it is hard to decide whether a
CLUSTERmight be useful, based on information that the administrator has available. First, it is not obvious whether any
alembicmigrations will be run until the above mentioned commands are run. Next, even if there are any migrations, they might not affect the tables in a way that
CLUSTERwould be beneficial. Last, it is a good idea to run
CLUSTERfrom time to time on workloads such as Mentat, where data are piped (as in running window) through the database. The need for
CLUSTERshould be determined based on table/index bloat.
I would go with extending the
sqldb-optimize.shscript and adding code for conditionally
CLUSTER-ing only tables with excess bloat. A bloat level detection code does exist, it is reported by the
Pavel Kácha wrote in #note-16:
Mmmkay. After discussion - I guess we might revisit tuning section of Database doc - akin to "add recommended CLUSTER run based on high bloat". Or, as you mentioned, we could consider adding this into sqldb-optimize.sh (however I'm unsure about that, this script only sets recommended values for arbitrary runs of VACUUM/CLUSTER/ANALYZE).
Current bloat reporting script is 11k of dense Perl code from Nagios tools, Radko, isn't there something ligher on stomach?