Project

General

Profile

Feature #6413

Feature #6256: Review possibilities of support of timeline calculation on db

Autarkic DB queries for timeline

Added by Radko Krkoš 12 months ago. Updated 7 months ago.

Status:
Deferred
Priority:
Normal
Assignee:
-
Category:
Development - Database
Target version:
Start date:
07/07/2020
Due date:
% Done:

0%

Estimated time:
To be discussed:

Description

The current implementation of timeline calculation, based on #6308, is divided into the database part and Python part. The main issue is that there are multiple DB queries performed for any of the views, combined afterwards in Python, what is unfit for caching (#6310, #6332). The queries should be self-contained to support those use-cases. The common table expressions (CTE) syntax should help with this.


Files

timeline_cte.txt (204 KB) timeline_cte.txt Radko Krkoš, 07/07/2020 02:56 PM

Related issues

Related to Mentat - Feature #6308: Reimplement timeline calculations on databaseClosedJan Mach04/09/2020

Actions
Related to Mentat - Feature #6257: Review calculated statistics in timelineNewPavel Kácha03/09/2020

Actions
Related to Mentat - Config #6251: Rethink the indices and planner cost optimizations in light of the current usage patternsIn ProgressRadko Krkoš03/05/2020

Actions
Precedes Mentat - Feature #6310: Implement result cachingNew07/08/202007/08/2020

Actions
Precedes Mentat - Feature #6332: Improve searching with caching and JavaScriptNew07/08/202007/08/2020

Actions
#1

Updated by Radko Krkoš 12 months ago

#2

Updated by Radko Krkoš 12 months ago

  • Precedes Feature #6332: Improve searching with caching and JavaScript added
#3

Updated by Radko Krkoš 12 months ago

  • Related to Feature #6308: Reimplement timeline calculations on database added
#4

Updated by Radko Krkoš 12 months ago

  • Related to Feature #6257: Review calculated statistics in timeline added
#5

Updated by Radko Krkoš 12 months ago

Using CTE, the basic query was extended as follows:

WITH
timeline AS (SELECT generate_series(\since, \until - INTERVAL '1ms', \interval) AS bucket),
total AS (SELECT COUNT(*) AS total FROM events WHERE "detecttime" >= \since AND "detecttime" < \until),
raw AS (SELECT \since + \interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM timeline)) - 1) AS bucket, unnest(\set) AS set, COUNT(*) AS count FROM events WHERE "detecttime" >= \since AND "detecttime" < \until GROUP BY bucket, set),
sums AS (SELECT raw.set AS set, SUM(raw.count) AS sum FROM raw GROUP BY raw.set ORDER BY sum DESC)
(SELECT bucket, set, COALESCE(count, 0) AS count, NULL AS share FROM (timeline FULL JOIN sums ON TRUE) LEFT JOIN raw USING (bucket, set) ORDER BY bucket ASC, sums.sum DESC) UNION ALL SELECT NULL, sums.set, sums.sum, sums.sum/total.total::decimal FROM sums, total UNION ALL SELECT NULL, NULL, total.total, NULL FROM total;

with parameters \since, \until, \interval and \set, which generates all the data required for the specific view, with ARRAY and no LIMITing (e.g. category), zero-filled and sorted. See attachment timeline_cte.txt for example output. An optimization is possible for non-array column statistics and for LIMITing the query must be extended.
The runtime (on mentat-alt) is about 21.5s (median estimate) for 6 days and 1 hour interval as compared to the original query that took ~20.2s (median estimate).
Also an error is fixed here, where the interval upper bound was inclusive, resulting in several events getting in with detecttime = \until, quite commonly as some detectors report times with second granularity.

#6

Updated by Radko Krkoš 12 months ago

  • Subject changed from Autarchic DB queries for timeline to Autarkic DB queries for timeline
#7

Updated by Radko Krkoš 12 months ago

The total count query would be as follows:

WITH
timeline AS (SELECT generate_series(\since, \until - INTERVAL '1ms', \interval) AS bucket),
raw AS (SELECT \since + \interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM timeline)) - 1) AS bucket, COUNT(*) AS count FROM events WHERE "detecttime" >= \since AND "detecttime" < \until GROUP BY bucket)
(SELECT bucket, COALESCE(count, 0) AS count FROM timeline LEFT JOIN raw USING (bucket) ORDER BY bucket ASC) UNION ALL SELECT NULL, SUM(count) FROM raw;

The run times are indistinguishable from the original version (both about 2s) on mentat-alt.

#8

Updated by Radko Krkoš 12 months ago

The limiting query should be as follows;

WITH
timeline AS (SELECT generate_series(\since, \until - INTERVAL '1ms', \interval) AS bucket),
toplist AS (SELECT unnest(\set) AS set, COUNT(*) AS sum FROM events WHERE "detecttime" >= \since AND "detecttime" < \until GROUP BY set ORDER BY sum DESC LIMIT \limit),
total AS (SELECT COUNT(*) AS total FROM events WHERE "detecttime" >= \since AND "detecttime" < \until),
raw AS (SELECT \since + \interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM timeline)) - 1) AS bucket, unnest(\set) AS set, COUNT(*) AS count FROM events WHERE "detecttime" >= \since AND "detecttime" < \until GROUP BY bucket, set)
(SELECT bucket, set, COALESCE(count, 0) AS count, NULL AS share FROM (timeline FULL JOIN toplist ON TRUE) LEFT JOIN raw USING (bucket, set) ORDER BY bucket ASC, toplist.sum DESC) UNION ALL SELECT NULL, toplist.set, toplist.sum, toplist.sum/total.total::decimal FROM toplist, total UNION ALL SELECT NULL, NULL, total.total, NULL FROM total;

This adds the parameter \limit. The run time grew from about 46.5s to about 49s (median estimate) for 6 days, 1 hour intervals on mentat-alt.

#9

Updated by Radko Krkoš 8 months ago

  • Related to Config #6251: Rethink the indices and planner cost optimizations in light of the current usage patterns added
#10

Updated by Radko Krkoš 8 months ago

  • To be discussed changed from Yes to No
#11

Updated by Radko Krkoš 7 months ago

  • Status changed from New to Deferred
  • Assignee deleted (Radko Krkoš)
  • To be discussed deleted (No)

The design is mostly done. This should be implemented one day, but currently there are more pressing issues.

Also available in: Atom PDF