Project

General

Profile

Bug #6210

Index usage in mentat_main database

Added by Jan Mach about 2 months ago. Updated 15 days ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Development - Database
Target version:
Start date:
02/03/2020
Due date:
% Done:

0%

Estimated time:
To be discussed:

Description

I have noticed, that there are following tables in mentat_main database containing time sequential data without index and clustering on createtime column. Should we add them?

  • reports_events
  • statistics_events
  • changelogs_items

History

#1 Updated by Radko Krkoš about 1 month ago

  • To be discussed changed from No to Yes

That depends. I do not know the access characteristics. Are the rows therein accessed by createtime? Either way, those tables are rather small, the index would probably not help much (a filtering query on the largest one - statistics_events with a month worth of data takes under 200ms). I am also puzzled by their meaning. They seem to contain some generated data since (Mentat’s) big bang. Are those tables expected to just grow forever, with no purge over time?

#2 Updated by Jan Mach about 1 month ago

Yes, these tables are accessed according to the createtime. The statistics_events table contains precalculated IDEA event statistics for 5 minute time windows and they are used on /events/dashboard view. Worst case scenario is generating statistics for the whole previous year, that are required by our manager annually.

So far we haven`t been deleting anything to keep history, which may prove useful one day to generate some long term overview. Once we reach certain (yet unknown) threshold, we will have to start purging.

#3 Updated by Radko Krkoš 29 days ago

  • To be discussed deleted (Yes)

#4 Updated by Radko Krkoš 15 days ago

  • To be discussed set to Yes

The reports_events and statistics_events tables seem to be accessed by interval. That is implemented as a text field. I advise to drop it and instead use a combined index on (dt_from, dt_to). The attribute delta is also redundant as it is the difference of dt_to - dt_from, createtime was discussed and should be kept for bookkeeping’s sake. The id seems to serve no purpose than being an artificial primary key, this could be switched to the interval pair. Also, the JSON documents seem to only be stored, not used in queries, therefore I advise json instead of jsonb.
So, for statistics_events I would use a structure like this:

CREATE TABLE statistics_events (
  dt_from datetime NOT NULL,
  dt_to datetime NOT NULL CHECK (dt_to > dt_from),
  count integer NOT NULL,
  createtime datetime NOT NULL CHECK (createtime > dt_to),
  stats_overall json,
  stats_internal json,
  stats external json,
  PRIMARY KEY (dt_from, dt_to)
);

#5 Updated by Pavel Kácha 15 days ago

  • Status changed from New to Closed

Seems current state does not need additional indices.

The plan is to look into #6256, #6257 and changes will be more substantial, so let’s not go into cosmetics now.

#6 Updated by Pavel Kácha 15 days ago

  • To be discussed deleted (Yes)

Also available in: Atom PDF