Project

General

Profile

Actions

Feature #6413

closed

Feature #6256: Review possibilities of support of timeline calculation on db

Autarkic DB queries for timeline

Added by Radko Krkoš over 4 years ago. Updated over 1 year ago.

Status:
Closed
Priority:
Normal
Category:
Development - Database
Target version:
Start date:
07/07/2020
Due date:
% Done:

0%

Estimated time:
To be discussed:

Description

The current implementation of timeline calculation, based on #6308, is divided into the database part and Python part. The main issue is that there are multiple DB queries performed for any of the views, combined afterwards in Python, what is unfit for caching (#6310, #6332). The queries should be self-contained to support those use-cases. The common table expressions (CTE) syntax should help with this.


Files

timeline_cte.txt (204 KB) timeline_cte.txt Radko Krkoš, 07/07/2020 02:56 PM

Related issues

Related to Mentat - Feature #6308: Reimplement timeline calculations on databaseClosedJan Mach04/09/2020

Actions
Related to Mentat - Feature #6257: Review calculated statistics in timelineClosedJakub Maloštik03/09/2020

Actions
Related to Mentat - Config #6251: Rethink the indices and planner cost optimizations in light of the current usage patternsDeferredRadko Krkoš03/05/2020

Actions
Precedes Mentat - Feature #6310: Implement result cachingClosed07/08/202007/08/2020

Actions
Precedes Mentat - Feature #6332: Improve searching with caching and JavaScriptFeedbackJakub Maloštik07/08/202007/08/2020

Actions
Actions #1

Updated by Radko Krkoš over 4 years ago

Actions #2

Updated by Radko Krkoš over 4 years ago

  • Precedes Feature #6332: Improve searching with caching and JavaScript added
Actions #3

Updated by Radko Krkoš over 4 years ago

  • Related to Feature #6308: Reimplement timeline calculations on database added
Actions #4

Updated by Radko Krkoš over 4 years ago

  • Related to Feature #6257: Review calculated statistics in timeline added
Actions #5

Updated by Radko Krkoš over 4 years ago

Using CTE, the basic query was extended as follows:

WITH
timeline AS (SELECT generate_series(\since, \until - INTERVAL '1ms', \interval) AS bucket),
total AS (SELECT COUNT(*) AS total FROM events WHERE "detecttime" >= \since AND "detecttime" < \until),
raw AS (SELECT \since + \interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM timeline)) - 1) AS bucket, unnest(\set) AS set, COUNT(*) AS count FROM events WHERE "detecttime" >= \since AND "detecttime" < \until GROUP BY bucket, set),
sums AS (SELECT raw.set AS set, SUM(raw.count) AS sum FROM raw GROUP BY raw.set ORDER BY sum DESC)
(SELECT bucket, set, COALESCE(count, 0) AS count, NULL AS share FROM (timeline FULL JOIN sums ON TRUE) LEFT JOIN raw USING (bucket, set) ORDER BY bucket ASC, sums.sum DESC) UNION ALL SELECT NULL, sums.set, sums.sum, sums.sum/total.total::decimal FROM sums, total UNION ALL SELECT NULL, NULL, total.total, NULL FROM total;

with parameters \since, \until, \interval and \set, which generates all the data required for the specific view, with ARRAY and no LIMITing (e.g. category), zero-filled and sorted. See attachment timeline_cte.txt for example output. An optimization is possible for non-array column statistics and for LIMITing the query must be extended.
The runtime (on mentat-alt) is about 21.5s (median estimate) for 6 days and 1 hour interval as compared to the original query that took ~20.2s (median estimate).
Also an error is fixed here, where the interval upper bound was inclusive, resulting in several events getting in with detecttime = \until, quite commonly as some detectors report times with second granularity.

Actions #6

Updated by Radko Krkoš over 4 years ago

  • Subject changed from Autarchic DB queries for timeline to Autarkic DB queries for timeline
Actions #7

Updated by Radko Krkoš over 4 years ago

The total count query would be as follows:

WITH
timeline AS (SELECT generate_series(\since, \until - INTERVAL '1ms', \interval) AS bucket),
raw AS (SELECT \since + \interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM timeline)) - 1) AS bucket, COUNT(*) AS count FROM events WHERE "detecttime" >= \since AND "detecttime" < \until GROUP BY bucket)
(SELECT bucket, COALESCE(count, 0) AS count FROM timeline LEFT JOIN raw USING (bucket) ORDER BY bucket ASC) UNION ALL SELECT NULL, SUM(count) FROM raw;

The run times are indistinguishable from the original version (both about 2s) on mentat-alt.

Actions #8

Updated by Radko Krkoš over 4 years ago

The limiting query should be as follows;

WITH
timeline AS (SELECT generate_series(\since, \until - INTERVAL '1ms', \interval) AS bucket),
toplist AS (SELECT unnest(\set) AS set, COUNT(*) AS sum FROM events WHERE "detecttime" >= \since AND "detecttime" < \until GROUP BY set ORDER BY sum DESC LIMIT \limit),
total AS (SELECT COUNT(*) AS total FROM events WHERE "detecttime" >= \since AND "detecttime" < \until),
raw AS (SELECT \since + \interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM timeline)) - 1) AS bucket, unnest(\set) AS set, COUNT(*) AS count FROM events WHERE "detecttime" >= \since AND "detecttime" < \until GROUP BY bucket, set)
(SELECT bucket, set, COALESCE(count, 0) AS count, NULL AS share FROM (timeline FULL JOIN toplist ON TRUE) LEFT JOIN raw USING (bucket, set) ORDER BY bucket ASC, toplist.sum DESC) UNION ALL SELECT NULL, toplist.set, toplist.sum, toplist.sum/total.total::decimal FROM toplist, total UNION ALL SELECT NULL, NULL, total.total, NULL FROM total;

This adds the parameter \limit. The run time grew from about 46.5s to about 49s (median estimate) for 6 days, 1 hour intervals on mentat-alt.

Actions #9

Updated by Radko Krkoš about 4 years ago

  • Related to Config #6251: Rethink the indices and planner cost optimizations in light of the current usage patterns added
Actions #10

Updated by Radko Krkoš about 4 years ago

  • To be discussed changed from Yes to No
Actions #11

Updated by Radko Krkoš about 4 years ago

  • Status changed from New to Deferred
  • Assignee deleted (Radko Krkoš)
  • To be discussed deleted (No)

The design is mostly done. This should be implemented one day, but currently there are more pressing issues.

Actions #12

Updated by Pavel Kácha about 2 years ago

  • Assignee set to Jakub Maloštik
Actions #13

Updated by Jakub Maloštik about 2 years ago

  • Status changed from Deferred to In Progress
Actions #14

Updated by Jakub Maloštik about 2 years ago

After updating the queries, with variables set as follows:

\set since '\'2022-12-01T00:00:00\'::timestamp'
\set first_step '\'2022-12-01T00:00:00\'::timestamp'
\set until '\'2022-12-07T00:00:00\'::timestamp'
\set interval 'INTERVAL \'1h\''
\set set '"category"'
\set limit 100

The timings were:
WITH
timeline AS (SELECT generate_series(:first_step, :until - INTERVAL '1 microsecond', :interval) AS bucket),
total AS (SELECT COUNT(*) AS total FROM events WHERE "detecttime" >= :since AND "detecttime" < :until),
raw AS (SELECT GREATEST(:since, :first_step + :interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM timeline)) - 1)) AS bucket, unnest(:set) AS set, COUNT(*) AS count FROM events WHERE "detecttime" >= :since AND "detecttime" < :until GROUP BY bucket, set),
sums AS (SELECT raw.set AS set, SUM(raw.count) AS sum FROM raw GROUP BY raw.set ORDER BY sum DESC)
(SELECT bucket, set, COALESCE(count, 0) AS count, NULL AS share FROM (timeline FULL JOIN sums ON TRUE) LEFT JOIN raw USING (bucket, set) ORDER BY bucket ASC, sums.sum DESC) UNION ALL SELECT NULL, sums.set, sums.sum, sums.sum/total.total::decimal FROM sums, total UNION ALL SELECT NULL, NULL, total.total, NULL FROM total;

-- ~7.9s

SELECT GREATEST(:since, :first_step + :interval * (width_bucket(detecttime, (SELECT array_agg(buckets) FROM generate_series(:first_step, :until - INTERVAL '1 microsecond', :interval) AS buckets)) - 1)) AS bucket, unnest(:set) AS set, COUNT(*) FROM events WHERE "detecttime" >= :since AND "detecttime" < :until GROUP BY bucket, set ORDER BY bucket ASC;

-- ~10.9s

--------------------------------

WITH
timeline AS (SELECT generate_series(:first_step, :until - INTERVAL '1 microsecond', :interval) AS bucket),
raw AS (SELECT GREATEST(:since, :first_step + :interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM timeline)) - 1)) AS bucket, COUNT(*) AS count FROM events WHERE "detecttime" >= :since AND "detecttime" < :until GROUP BY bucket)
(SELECT bucket, COALESCE(count, 0) AS count FROM timeline LEFT JOIN raw USING (bucket) ORDER BY bucket ASC) UNION ALL SELECT NULL, SUM(count) FROM raw;

-- ~3.0s

SELECT GREATEST(:since, :first_step + :interval * (width_bucket(detecttime, (SELECT array_agg(buckets) FROM generate_series(:first_step, :until - INTERVAL '1 microsecond', :interval) AS buckets)) - 1)) AS bucket, COUNT(*) FROM events WHERE "detecttime" >= :since AND "detecttime" < :until GROUP BY bucket ORDER BY bucket ASC;

-- ~3.0s

--------------------------------

WITH
timeline AS (SELECT generate_series(:first_step, :until - INTERVAL '1 microsecond', :interval) AS bucket),
toplist AS (SELECT unnest(:set) AS set, COUNT(*) AS sum FROM events WHERE "detecttime" >= :since AND "detecttime" < :until GROUP BY set ORDER BY sum DESC LIMIT :limit),
total AS (SELECT COUNT(*) AS total FROM events WHERE "detecttime" >= :since AND "detecttime" < :until),
raw AS (SELECT GREATEST(:since, :first_step + :interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM timeline)) - 1)) AS bucket, unnest(:set) AS set, COUNT(*) AS count FROM events WHERE "detecttime" >= :since AND "detecttime" < :until GROUP BY bucket, set)
(SELECT bucket, set, COALESCE(count, 0) AS count, NULL AS share FROM (timeline FULL JOIN toplist ON TRUE) LEFT JOIN raw USING (bucket, set) ORDER BY bucket ASC, toplist.sum DESC) UNION ALL SELECT NULL, toplist.set, toplist.sum, toplist.sum/total.total::decimal FROM toplist, total UNION ALL SELECT NULL, NULL, total.total, NULL FROM total;

-- ~10.9s

SELECT dist.bucket AS bucket, dist.set AS set, dist.count AS count FROM (SELECT GREATEST(:since, :first_step + :interval * (width_bucket(detecttime, (SELECT array_agg(buckets) FROM generate_series(:first_step, :until - INTERVAL '1 microsecond', :interval) AS buckets)) - 1)) AS bucket, unnest(:set) AS set, COUNT(*) FROM events WHERE "detecttime" >= :since AND "detecttime" < :until GROUP BY bucket, set ORDER BY bucket ASC) AS dist INNER JOIN (SELECT unnest(:set) AS set, COUNT(*) FROM events WHERE "detecttime" >= :since AND "detecttime" < :until GROUP BY set ORDER BY COUNT(*) DESC LIMIT :limit) AS toplist USING (set);

-- ~14.7s

Actions #15

Updated by Radko Krkoš about 2 years ago

  • To be discussed set to Yes

A follow-up from the VC:
Regarding the incompleteness of the query, that was my mistake, the relative counts are only included for the last few columns - only where needed for the current timeline output.
So, only the total times should be measured. Probably the complete Python code with DB calls versus sum of the new DB calls.

Actions #16

Updated by Jakub Maloštik over 1 year ago

  • Status changed from In Progress to Closed

After couple of revisions, the queries are as follows:

-- Only counts
WITH
timeline AS (SELECT * FROM (SELECT '2023-04-01T00:00:00'::timestamp AS bucket UNION SELECT generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket) AS t ORDER BY bucket),
raw AS (SELECT GREATEST('2023-04-01T00:00:00'::timestamp, '2023-04-01T00:00:00'::timestamp + '0 days 3600.000000 seconds'::interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket)) - 1)) AS bucket, COUNT(*) AS count FROM "events" WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp GROUP BY bucket)
(SELECT timeline.bucket, COALESCE(count, 0) AS count FROM "timeline" LEFT JOIN "raw" ON timeline.bucket = raw.bucket ORDER BY bucket ASC) UNION ALL SELECT NULL, SUM(count)::bigint FROM "raw";

-- Simple grouping by set without toplist

WITH
timeline AS (SELECT * FROM (SELECT '2023-04-01T00:00:00'::timestamp AS bucket UNION SELECT generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket) AS t ORDER BY bucket),
total_events AS (SELECT COUNT(*) AS total FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp),
raw AS (SELECT GREATEST('2023-04-01T00:00:00'::timestamp, '2023-04-01T00:00:00'::timestamp + '0 days 3600.000000 seconds'::interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket)) - 1)) AS bucket, "eventclass" AS set, COUNT(*) AS count FROM "events" WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp GROUP BY bucket, set),
sums AS (SELECT raw.set::text AS set, SUM(raw.count)::bigint AS sum FROM raw GROUP BY raw.set ORDER BY sum DESC)
(SELECT timeline.bucket, "sums".set, COALESCE(count, 0) AS count FROM ("timeline" FULL JOIN "sums" ON TRUE) LEFT JOIN "raw" ON "timeline".bucket = "raw".bucket AND "sums".set IS NOT DISTINCT FROM "raw".set ORDER BY bucket ASC, "sums".sum DESC) UNION ALL SELECT NULL, "sums".set, "sums".sum FROM "sums" UNION ALL SELECT NULL, NULL, total FROM "total_events" UNION ALL SELECT NULL, NULL, total FROM total_events;

-- Grouping of set that is an array without toplist
WITH
timeline AS (SELECT * FROM (SELECT '2023-04-01T00:00:00'::timestamp AS bucket UNION SELECT generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket) AS t ORDER BY bucket),
total AS (SELECT COALESCE(SUM(CARDINALITY("category")), 0) AS total FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp),
total_events AS (SELECT COUNT(*) AS total FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp),
raw AS (SELECT GREATEST('2023-04-01T00:00:00'::timestamp, '2023-04-01T00:00:00'::timestamp + '0 days 3600.000000 seconds'::interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket)) - 1)) AS bucket, unnest("category") AS set, COUNT(*) AS count FROM "events" WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp GROUP BY bucket, set),
sums AS (SELECT raw.set::text AS set, SUM(raw.count)::bigint AS sum FROM raw GROUP BY raw.set ORDER BY sum DESC)
(SELECT timeline.bucket, "sums".set, COALESCE(count, 0) AS count FROM ("timeline" FULL JOIN "sums" ON TRUE) LEFT JOIN "raw" ON "timeline".bucket = "raw".bucket AND "sums".set IS NOT DISTINCT FROM "raw".set ORDER BY bucket ASC, "sums".sum DESC) UNION ALL SELECT NULL, "sums".set, "sums".sum FROM "sums" UNION ALL SELECT NULL, NULL, total FROM "total" UNION ALL SELECT NULL, NULL, total FROM total_events

-- Grouping of set that is an array with toplist
WITH
timeline AS (SELECT * FROM (SELECT '2023-04-01T00:00:00'::timestamp AS bucket UNION SELECT generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket) AS t ORDER BY bucket),
total AS (SELECT COALESCE(SUM(CARDINALITY("target_port")), 0) AS total FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp),
total_events AS (SELECT COUNT(*) AS total FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp),
toplist AS (SELECT unnest("target_port") AS set, COUNT(*) AS sum FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp GROUP BY set ORDER BY sum DESC LIMIT 100),
toplist_with_rest AS (SELECT set::text, sum FROM toplist UNION (SELECT '__REST__' as set, total - SUM(sum)::bigint as sum FROM total, toplist GROUP BY total HAVING total - SUM(sum)::bigint > 0) ORDER BY sum DESC),
raw AS (SELECT GREATEST('2023-04-01T00:00:00'::timestamp, '2023-04-01T00:00:00'::timestamp + '0 days 3600.000000 seconds'::interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket)) - 1)) AS bucket, set, COUNT(*) AS count FROM (SELECT detecttime, unnest("target_port") AS set FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp) top_events INNER JOIN toplist USING (set) GROUP BY bucket, set),
raw_with_rest AS (SELECT bucket, set::text, count FROM raw UNION ALL SELECT bucket, '__REST__' AS set, raw_totals.count - raw_sums.count AS count FROM (SELECT GREATEST('2023-04-01T00:00:00'::timestamp, '2023-04-01T00:00:00'::timestamp + '0 days 3600.000000 seconds'::interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket)) - 1)) AS bucket, SUM(CARDINALITY("target_port")) AS count FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp GROUP BY bucket) raw_totals FULL JOIN (SELECT bucket, SUM(count)::bigint AS count FROM raw GROUP BY bucket) raw_sums USING (bucket))
(SELECT timeline.bucket, "toplist_with_rest".set, COALESCE(count, 0) AS count FROM ("timeline" FULL JOIN "toplist_with_rest" ON TRUE) LEFT JOIN "raw_with_rest" ON "timeline".bucket = "raw_with_rest".bucket AND "toplist_with_rest".set IS NOT DISTINCT FROM "raw_with_rest".set ORDER BY bucket ASC, "toplist_with_rest".sum DESC) UNION ALL SELECT NULL, "toplist_with_rest".set, "toplist_with_rest".sum FROM "toplist_with_rest" UNION ALL SELECT NULL, NULL, total FROM "total" UNION ALL SELECT NULL, NULL, total FROM total_events

Actions #17

Updated by Jakub Maloštik over 1 year ago

After couple of revisions, the queries are as follows:

-- Only counts
WITH
timeline AS (SELECT * FROM (SELECT '2023-04-01T00:00:00'::timestamp AS bucket UNION SELECT generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket) AS t ORDER BY bucket),
raw AS (SELECT GREATEST('2023-04-01T00:00:00'::timestamp, '2023-04-01T00:00:00'::timestamp + '0 days 3600.000000 seconds'::interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket)) - 1)) AS bucket, COUNT(*) AS count FROM "events" WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp GROUP BY bucket)
(SELECT timeline.bucket, COALESCE(count, 0) AS count FROM "timeline" LEFT JOIN "raw" ON timeline.bucket = raw.bucket ORDER BY bucket ASC) UNION ALL SELECT NULL, SUM(count)::bigint FROM "raw";

-- Simple grouping by set without toplist
WITH
timeline AS (SELECT * FROM (SELECT '2023-04-01T00:00:00'::timestamp AS bucket UNION SELECT generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket) AS t ORDER BY bucket),
total_events AS (SELECT COUNT(*) AS total FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp),
raw AS (SELECT GREATEST('2023-04-01T00:00:00'::timestamp, '2023-04-01T00:00:00'::timestamp + '0 days 3600.000000 seconds'::interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket)) - 1)) AS bucket, "eventclass" AS set, COUNT(*) AS count FROM "events" WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp GROUP BY bucket, set),
sums AS (SELECT raw.set::text AS set, SUM(raw.count)::bigint AS sum FROM raw GROUP BY raw.set ORDER BY sum DESC)
(SELECT timeline.bucket, "sums".set, COALESCE(count, 0) AS count FROM ("timeline" FULL JOIN "sums" ON TRUE) LEFT JOIN "raw" ON "timeline".bucket = "raw".bucket AND "sums".set IS NOT DISTINCT FROM "raw".set ORDER BY bucket ASC, "sums".sum DESC) UNION ALL SELECT NULL, "sums".set, "sums".sum FROM "sums" UNION ALL SELECT NULL, NULL, total FROM "total_events" UNION ALL SELECT NULL, NULL, total FROM total_events;

-- Grouping of set that is an array without toplist
WITH
timeline AS (SELECT * FROM (SELECT '2023-04-01T00:00:00'::timestamp AS bucket UNION SELECT generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket) AS t ORDER BY bucket),
total AS (SELECT COALESCE(SUM(CARDINALITY("category")), 0) AS total FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp),
total_events AS (SELECT COUNT(*) AS total FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp),
raw AS (SELECT GREATEST('2023-04-01T00:00:00'::timestamp, '2023-04-01T00:00:00'::timestamp + '0 days 3600.000000 seconds'::interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket)) - 1)) AS bucket, unnest("category") AS set, COUNT(*) AS count FROM "events" WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp GROUP BY bucket, set),
sums AS (SELECT raw.set::text AS set, SUM(raw.count)::bigint AS sum FROM raw GROUP BY raw.set ORDER BY sum DESC)
(SELECT timeline.bucket, "sums".set, COALESCE(count, 0) AS count FROM ("timeline" FULL JOIN "sums" ON TRUE) LEFT JOIN "raw" ON "timeline".bucket = "raw".bucket AND "sums".set IS NOT DISTINCT FROM "raw".set ORDER BY bucket ASC, "sums".sum DESC) UNION ALL SELECT NULL, "sums".set, "sums".sum FROM "sums" UNION ALL SELECT NULL, NULL, total FROM "total" UNION ALL SELECT NULL, NULL, total FROM total_events

-- Grouping of set that is an array with toplist
WITH
timeline AS (SELECT * FROM (SELECT '2023-04-01T00:00:00'::timestamp AS bucket UNION SELECT generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket) AS t ORDER BY bucket),
total AS (SELECT COALESCE(SUM(CARDINALITY("target_port")), 0) AS total FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp),
total_events AS (SELECT COUNT(*) AS total FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp),
toplist AS (SELECT unnest("target_port") AS set, COUNT(*) AS sum FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp GROUP BY set ORDER BY sum DESC LIMIT 100),
toplist_with_rest AS (SELECT set::text, sum FROM toplist UNION (SELECT '__REST__' as set, total - SUM(sum)::bigint as sum FROM total, toplist GROUP BY total HAVING total - SUM(sum)::bigint > 0) ORDER BY sum DESC),
raw AS (SELECT GREATEST('2023-04-01T00:00:00'::timestamp, '2023-04-01T00:00:00'::timestamp + '0 days 3600.000000 seconds'::interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket)) - 1)) AS bucket, set, COUNT(*) AS count FROM (SELECT detecttime, unnest("target_port") AS set FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp) top_events INNER JOIN toplist USING (set) GROUP BY bucket, set),
raw_with_rest AS (SELECT bucket, set::text, count FROM raw UNION ALL SELECT bucket, '__REST__' AS set, raw_totals.count - raw_sums.count AS count FROM (SELECT GREATEST('2023-04-01T00:00:00'::timestamp, '2023-04-01T00:00:00'::timestamp + '0 days 3600.000000 seconds'::interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket)) - 1)) AS bucket, SUM(CARDINALITY("target_port")) AS count FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp GROUP BY bucket) raw_totals FULL JOIN (SELECT bucket, SUM(count)::bigint AS count FROM raw GROUP BY bucket) raw_sums USING (bucket))
(SELECT timeline.bucket, "toplist_with_rest".set, COALESCE(count, 0) AS count FROM ("timeline" FULL JOIN "toplist_with_rest" ON TRUE) LEFT JOIN "raw_with_rest" ON "timeline".bucket = "raw_with_rest".bucket AND "toplist_with_rest".set IS NOT DISTINCT FROM "raw_with_rest".set ORDER BY bucket ASC, "toplist_with_rest".sum DESC) UNION ALL SELECT NULL, "toplist_with_rest".set, "toplist_with_rest".sum FROM "toplist_with_rest" UNION ALL SELECT NULL, NULL, total FROM "total" UNION ALL SELECT NULL, NULL, total FROM total_events

Actions #18

Updated by Rajmund Hruška over 1 year ago

  • Status changed from Closed to In Progress
  • Target version changed from Backlog to 2.11
Actions #19

Updated by Rajmund Hruška over 1 year ago

  • Status changed from In Progress to In Review
Actions #20

Updated by Rajmund Hruška over 1 year ago

  • To be discussed deleted (Yes)
Actions #21

Updated by Rajmund Hruška over 1 year ago

  • Status changed from In Review to Closed
Actions

Also available in: Atom PDF