Project

General

Profile

Actions

Feature #6413

open

Feature #6256: Review possibilities of support of timeline calculation on db

Autarkic DB queries for timeline

Added by Radko Krkoš over 2 years ago. Updated about 2 months ago.

Status:
Deferred
Priority:
Normal
Category:
Development - Database
Target version:
Start date:
07/07/2020
Due date:
% Done:

0%

Estimated time:
To be discussed:

Description

The current implementation of timeline calculation, based on #6308, is divided into the database part and Python part. The main issue is that there are multiple DB queries performed for any of the views, combined afterwards in Python, what is unfit for caching (#6310, #6332). The queries should be self-contained to support those use-cases. The common table expressions (CTE) syntax should help with this.


Files

timeline_cte.txt (204 KB) timeline_cte.txt Radko Krkoš, 07/07/2020 02:56 PM

Related issues

Related to Mentat - Feature #6308: Reimplement timeline calculations on databaseClosedJan Mach04/09/2020

Actions
Related to Mentat - Feature #6257: Review calculated statistics in timelineIn ReviewJakub Maloštik03/09/2020

Actions
Related to Mentat - Config #6251: Rethink the indices and planner cost optimizations in light of the current usage patternsDeferredRadko Krkoš03/05/2020

Actions
Precedes Mentat - Feature #6310: Implement result cachingNew07/08/202007/08/2020

Actions
Precedes Mentat - Feature #6332: Improve searching with caching and JavaScriptNew07/08/202007/08/2020

Actions
Actions #1

Updated by Radko Krkoš over 2 years ago

Actions #2

Updated by Radko Krkoš over 2 years ago

  • Precedes Feature #6332: Improve searching with caching and JavaScript added
Actions #3

Updated by Radko Krkoš over 2 years ago

  • Related to Feature #6308: Reimplement timeline calculations on database added
Actions #4

Updated by Radko Krkoš over 2 years ago

  • Related to Feature #6257: Review calculated statistics in timeline added
Actions #5

Updated by Radko Krkoš over 2 years ago

Using CTE, the basic query was extended as follows:

WITH
timeline AS (SELECT generate_series(\since, \until - INTERVAL '1ms', \interval) AS bucket),
total AS (SELECT COUNT(*) AS total FROM events WHERE "detecttime" >= \since AND "detecttime" < \until),
raw AS (SELECT \since + \interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM timeline)) - 1) AS bucket, unnest(\set) AS set, COUNT(*) AS count FROM events WHERE "detecttime" >= \since AND "detecttime" < \until GROUP BY bucket, set),
sums AS (SELECT raw.set AS set, SUM(raw.count) AS sum FROM raw GROUP BY raw.set ORDER BY sum DESC)
(SELECT bucket, set, COALESCE(count, 0) AS count, NULL AS share FROM (timeline FULL JOIN sums ON TRUE) LEFT JOIN raw USING (bucket, set) ORDER BY bucket ASC, sums.sum DESC) UNION ALL SELECT NULL, sums.set, sums.sum, sums.sum/total.total::decimal FROM sums, total UNION ALL SELECT NULL, NULL, total.total, NULL FROM total;

with parameters \since, \until, \interval and \set, which generates all the data required for the specific view, with ARRAY and no LIMITing (e.g. category), zero-filled and sorted. See attachment timeline_cte.txt for example output. An optimization is possible for non-array column statistics and for LIMITing the query must be extended.
The runtime (on mentat-alt) is about 21.5s (median estimate) for 6 days and 1 hour interval as compared to the original query that took ~20.2s (median estimate).
Also an error is fixed here, where the interval upper bound was inclusive, resulting in several events getting in with detecttime = \until, quite commonly as some detectors report times with second granularity.

Actions #6

Updated by Radko Krkoš over 2 years ago

  • Subject changed from Autarchic DB queries for timeline to Autarkic DB queries for timeline
Actions #7

Updated by Radko Krkoš over 2 years ago

The total count query would be as follows:

WITH
timeline AS (SELECT generate_series(\since, \until - INTERVAL '1ms', \interval) AS bucket),
raw AS (SELECT \since + \interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM timeline)) - 1) AS bucket, COUNT(*) AS count FROM events WHERE "detecttime" >= \since AND "detecttime" < \until GROUP BY bucket)
(SELECT bucket, COALESCE(count, 0) AS count FROM timeline LEFT JOIN raw USING (bucket) ORDER BY bucket ASC) UNION ALL SELECT NULL, SUM(count) FROM raw;

The run times are indistinguishable from the original version (both about 2s) on mentat-alt.

Actions #8

Updated by Radko Krkoš over 2 years ago

The limiting query should be as follows;

WITH
timeline AS (SELECT generate_series(\since, \until - INTERVAL '1ms', \interval) AS bucket),
toplist AS (SELECT unnest(\set) AS set, COUNT(*) AS sum FROM events WHERE "detecttime" >= \since AND "detecttime" < \until GROUP BY set ORDER BY sum DESC LIMIT \limit),
total AS (SELECT COUNT(*) AS total FROM events WHERE "detecttime" >= \since AND "detecttime" < \until),
raw AS (SELECT \since + \interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM timeline)) - 1) AS bucket, unnest(\set) AS set, COUNT(*) AS count FROM events WHERE "detecttime" >= \since AND "detecttime" < \until GROUP BY bucket, set)
(SELECT bucket, set, COALESCE(count, 0) AS count, NULL AS share FROM (timeline FULL JOIN toplist ON TRUE) LEFT JOIN raw USING (bucket, set) ORDER BY bucket ASC, toplist.sum DESC) UNION ALL SELECT NULL, toplist.set, toplist.sum, toplist.sum/total.total::decimal FROM toplist, total UNION ALL SELECT NULL, NULL, total.total, NULL FROM total;

This adds the parameter \limit. The run time grew from about 46.5s to about 49s (median estimate) for 6 days, 1 hour intervals on mentat-alt.

Actions #9

Updated by Radko Krkoš about 2 years ago

  • Related to Config #6251: Rethink the indices and planner cost optimizations in light of the current usage patterns added
Actions #10

Updated by Radko Krkoš about 2 years ago

  • To be discussed changed from Yes to No
Actions #11

Updated by Radko Krkoš almost 2 years ago

  • Status changed from New to Deferred
  • Assignee deleted (Radko Krkoš)
  • To be discussed deleted (No)

The design is mostly done. This should be implemented one day, but currently there are more pressing issues.

Actions #12

Updated by Pavel Kácha about 2 months ago

  • Assignee set to Jakub Maloštik
Actions

Also available in: Atom PDF