Feature #6413: Autarkic DB queries for timeline - Mentat - Homeproj: Redmine for CESNET

Actions

Copy link

Feature #6413

closed

Feature #6256: Review possibilities of support of timeline calculation on db

Autarkic DB queries for timeline

Added by Radko Krkoš over 4 years ago. Updated over 1 year ago.

Status:

Closed

Priority:

Normal

Assignee:

Jakub Maloštik

Category:

Development - Database

Target version:

2.11

Start date:

07/07/2020

Due date:

% Done:

Estimated time:

To be discussed:

Description

The current implementation of timeline calculation, based on #6308, is divided into the database part and Python part. The main issue is that there are multiple DB queries performed for any of the views, combined afterwards in Python, what is unfit for caching (#6310, #6332). The queries should be self-contained to support those use-cases. The common table expressions (CTE) syntax should help with this.

Files

timeline_cte.txt (204 KB) timeline_cte.txt

Radko Krkoš, 07/07/2020 02:56 PM

Related issues

Actions

Copy link

Updated by Radko Krkoš over 4 years ago

Precedes Feature #6310: Implement result caching added

Actions

Copy link

Updated by Radko Krkoš over 4 years ago

Precedes Feature #6332: Improve searching with caching and JavaScript added

Actions

Copy link

Updated by Radko Krkoš over 4 years ago

Related to Feature #6308: Reimplement timeline calculations on database added

Actions

Copy link

Updated by Radko Krkoš over 4 years ago

Related to Feature #6257: Review calculated statistics in timeline added

Actions

Copy link

Updated by Radko Krkoš over 4 years ago

File timeline_cte.txt timeline_cte.txt added

Using CTE, the basic query was extended as follows:

WITH
timeline AS (SELECT generate_series(\since, \until - INTERVAL '1ms', \interval) AS bucket),
total AS (SELECT COUNT(*) AS total FROM events WHERE "detecttime" >= \since AND "detecttime" < \until),
raw AS (SELECT \since + \interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM timeline)) - 1) AS bucket, unnest(\set) AS set, COUNT(*) AS count FROM events WHERE "detecttime" >= \since AND "detecttime" < \until GROUP BY bucket, set),
sums AS (SELECT raw.set AS set, SUM(raw.count) AS sum FROM raw GROUP BY raw.set ORDER BY sum DESC)
(SELECT bucket, set, COALESCE(count, 0) AS count, NULL AS share FROM (timeline FULL JOIN sums ON TRUE) LEFT JOIN raw USING (bucket, set) ORDER BY bucket ASC, sums.sum DESC) UNION ALL SELECT NULL, sums.set, sums.sum, sums.sum/total.total::decimal FROM sums, total UNION ALL SELECT NULL, NULL, total.total, NULL FROM total;

with parameters \since, \until, \interval and \set, which generates all the data required for the specific view, with ARRAY and no LIMITing (e.g. category), zero-filled and sorted. See attachment timeline_cte.txt for example output. An optimization is possible for non-array column statistics and for LIMITing the query must be extended.
The runtime (on mentat-alt) is about 21.5s (median estimate) for 6 days and 1 hour interval as compared to the original query that took ~20.2s (median estimate).
Also an error is fixed here, where the interval upper bound was inclusive, resulting in several events getting in with detecttime = \until, quite commonly as some detectors report times with second granularity.

Actions

Copy link

Updated by Radko Krkoš over 4 years ago

Subject changed from Autarchic DB queries for timeline to Autarkic DB queries for timeline

Actions

Copy link

Updated by Radko Krkoš over 4 years ago

The total count query would be as follows:

WITH
timeline AS (SELECT generate_series(\since, \until - INTERVAL '1ms', \interval) AS bucket),
raw AS (SELECT \since + \interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM timeline)) - 1) AS bucket, COUNT(*) AS count FROM events WHERE "detecttime" >= \since AND "detecttime" < \until GROUP BY bucket)
(SELECT bucket, COALESCE(count, 0) AS count FROM timeline LEFT JOIN raw USING (bucket) ORDER BY bucket ASC) UNION ALL SELECT NULL, SUM(count) FROM raw;

The run times are indistinguishable from the original version (both about 2s) on mentat-alt.

Actions

Copy link

Updated by Radko Krkoš over 4 years ago

The limiting query should be as follows;

WITH
timeline AS (SELECT generate_series(\since, \until - INTERVAL '1ms', \interval) AS bucket),
toplist AS (SELECT unnest(\set) AS set, COUNT(*) AS sum FROM events WHERE "detecttime" >= \since AND "detecttime" < \until GROUP BY set ORDER BY sum DESC LIMIT \limit),
total AS (SELECT COUNT(*) AS total FROM events WHERE "detecttime" >= \since AND "detecttime" < \until),
raw AS (SELECT \since + \interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM timeline)) - 1) AS bucket, unnest(\set) AS set, COUNT(*) AS count FROM events WHERE "detecttime" >= \since AND "detecttime" < \until GROUP BY bucket, set)
(SELECT bucket, set, COALESCE(count, 0) AS count, NULL AS share FROM (timeline FULL JOIN toplist ON TRUE) LEFT JOIN raw USING (bucket, set) ORDER BY bucket ASC, toplist.sum DESC) UNION ALL SELECT NULL, toplist.set, toplist.sum, toplist.sum/total.total::decimal FROM toplist, total UNION ALL SELECT NULL, NULL, total.total, NULL FROM total;

This adds the parameter \limit. The run time grew from about 46.5s to about 49s (median estimate) for 6 days, 1 hour intervals on mentat-alt.

Actions

Copy link

Updated by Radko Krkoš about 4 years ago

Related to Config #6251: Rethink the indices and planner cost optimizations in light of the current usage patterns added

Actions

Copy link

#10

Updated by Radko Krkoš about 4 years ago

To be discussed changed from Yes to No

Actions

Copy link

#11

Updated by Radko Krkoš about 4 years ago

Status changed from New to Deferred
Assignee deleted (~~Radko Krkoš~~)
To be discussed deleted (No)

The design is mostly done. This should be implemented one day, but currently there are more pressing issues.

Actions

Copy link

#12

Updated by Pavel Kácha about 2 years ago

Assignee set to Jakub Maloštik

Actions

Copy link

#13

Updated by Jakub Maloštik about 2 years ago

Status changed from Deferred to In Progress

Actions

Copy link

#14

Updated by Jakub Maloštik about 2 years ago

After updating the queries, with variables set as follows:

\set since '\'2022-12-01T00:00:00\'::timestamp'
\set first_step '\'2022-12-01T00:00:00\'::timestamp'
\set until '\'2022-12-07T00:00:00\'::timestamp'
\set interval 'INTERVAL \'1h\''
\set set '"category"'
\set limit 100

The timings were:

WITH
timeline AS (SELECT generate_series(:first_step, :until - INTERVAL '1 microsecond', :interval) AS bucket),
total AS (SELECT COUNT(*) AS total FROM events WHERE "detecttime" >= :since AND "detecttime" < :until),
raw AS (SELECT GREATEST(:since, :first_step + :interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM timeline)) - 1)) AS bucket, unnest(:set) AS set, COUNT(*) AS count FROM events WHERE "detecttime" >= :since AND "detecttime" < :until GROUP BY bucket, set),
sums AS (SELECT raw.set AS set, SUM(raw.count) AS sum FROM raw GROUP BY raw.set ORDER BY sum DESC)
(SELECT bucket, set, COALESCE(count, 0) AS count, NULL AS share FROM (timeline FULL JOIN sums ON TRUE) LEFT JOIN raw USING (bucket, set) ORDER BY bucket ASC, sums.sum DESC) UNION ALL SELECT NULL, sums.set, sums.sum, sums.sum/total.total::decimal FROM sums, total UNION ALL SELECT NULL, NULL, total.total, NULL FROM total;

-- ~7.9s

SELECT GREATEST(:since, :first_step + :interval * (width_bucket(detecttime, (SELECT array_agg(buckets) FROM generate_series(:first_step, :until - INTERVAL '1 microsecond', :interval) AS buckets)) - 1)) AS bucket, unnest(:set) AS set, COUNT(*) FROM events WHERE "detecttime" >= :since AND "detecttime" < :until GROUP BY bucket, set ORDER BY bucket ASC;

-- ~10.9s

--------------------------------

WITH
timeline AS (SELECT generate_series(:first_step, :until - INTERVAL '1 microsecond', :interval) AS bucket),
raw AS (SELECT GREATEST(:since, :first_step + :interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM timeline)) - 1)) AS bucket, COUNT(*) AS count FROM events WHERE "detecttime" >= :since AND "detecttime" < :until GROUP BY bucket)
(SELECT bucket, COALESCE(count, 0) AS count FROM timeline LEFT JOIN raw USING (bucket) ORDER BY bucket ASC) UNION ALL SELECT NULL, SUM(count) FROM raw;

-- ~3.0s

SELECT GREATEST(:since, :first_step + :interval * (width_bucket(detecttime, (SELECT array_agg(buckets) FROM generate_series(:first_step, :until - INTERVAL '1 microsecond', :interval) AS buckets)) - 1)) AS bucket, COUNT(*) FROM events WHERE "detecttime" >= :since AND "detecttime" < :until GROUP BY bucket ORDER BY bucket ASC;

-- ~3.0s

--------------------------------

WITH
timeline AS (SELECT generate_series(:first_step, :until - INTERVAL '1 microsecond', :interval) AS bucket),
toplist AS (SELECT unnest(:set) AS set, COUNT(*) AS sum FROM events WHERE "detecttime" >= :since AND "detecttime" < :until GROUP BY set ORDER BY sum DESC LIMIT :limit),
total AS (SELECT COUNT(*) AS total FROM events WHERE "detecttime" >= :since AND "detecttime" < :until),
raw AS (SELECT GREATEST(:since, :first_step + :interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM timeline)) - 1)) AS bucket, unnest(:set) AS set, COUNT(*) AS count FROM events WHERE "detecttime" >= :since AND "detecttime" < :until GROUP BY bucket, set)
(SELECT bucket, set, COALESCE(count, 0) AS count, NULL AS share FROM (timeline FULL JOIN toplist ON TRUE) LEFT JOIN raw USING (bucket, set) ORDER BY bucket ASC, toplist.sum DESC) UNION ALL SELECT NULL, toplist.set, toplist.sum, toplist.sum/total.total::decimal FROM toplist, total UNION ALL SELECT NULL, NULL, total.total, NULL FROM total;

-- ~10.9s

SELECT dist.bucket AS bucket, dist.set AS set, dist.count AS count FROM (SELECT GREATEST(:since, :first_step + :interval * (width_bucket(detecttime, (SELECT array_agg(buckets) FROM generate_series(:first_step, :until - INTERVAL '1 microsecond', :interval) AS buckets)) - 1)) AS bucket, unnest(:set) AS set, COUNT(*) FROM events WHERE "detecttime" >= :since AND "detecttime" < :until GROUP BY bucket, set ORDER BY bucket ASC) AS dist INNER JOIN (SELECT unnest(:set) AS set, COUNT(*) FROM events WHERE "detecttime" >= :since AND "detecttime" < :until GROUP BY set ORDER BY COUNT(*) DESC LIMIT :limit) AS toplist USING (set);

-- ~14.7s

Actions

Copy link

#15

Updated by Radko Krkoš about 2 years ago

To be discussed set to Yes

A follow-up from the VC:
Regarding the incompleteness of the query, that was my mistake, the relative counts are only included for the last few columns - only where needed for the current timeline output.
So, only the total times should be measured. Probably the complete Python code with DB calls versus sum of the new DB calls.

Actions

Copy link

#16

Updated by Jakub Maloštik over 1 year ago

Status changed from In Progress to Closed

After couple of revisions, the queries are as follows:

-- Only counts
WITH
timeline AS (SELECT * FROM (SELECT '2023-04-01T00:00:00'::timestamp AS bucket UNION SELECT generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket) AS t ORDER BY bucket),
raw AS (SELECT GREATEST('2023-04-01T00:00:00'::timestamp, '2023-04-01T00:00:00'::timestamp + '0 days 3600.000000 seconds'::interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket)) - 1)) AS bucket, COUNT(*) AS count FROM "events" WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp GROUP BY bucket)
(SELECT timeline.bucket, COALESCE(count, 0) AS count FROM "timeline" LEFT JOIN "raw" ON timeline.bucket = raw.bucket ORDER BY bucket ASC) UNION ALL SELECT NULL, SUM(count)::bigint FROM "raw";

-- Simple grouping by set without toplist

WITH
timeline AS (SELECT * FROM (SELECT '2023-04-01T00:00:00'::timestamp AS bucket UNION SELECT generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket) AS t ORDER BY bucket),
total_events AS (SELECT COUNT(*) AS total FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp),
raw AS (SELECT GREATEST('2023-04-01T00:00:00'::timestamp, '2023-04-01T00:00:00'::timestamp + '0 days 3600.000000 seconds'::interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket)) - 1)) AS bucket, "eventclass" AS set, COUNT(*) AS count FROM "events" WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp GROUP BY bucket, set),
sums AS (SELECT raw.set::text AS set, SUM(raw.count)::bigint AS sum FROM raw GROUP BY raw.set ORDER BY sum DESC)
(SELECT timeline.bucket, "sums".set, COALESCE(count, 0) AS count FROM ("timeline" FULL JOIN "sums" ON TRUE) LEFT JOIN "raw" ON "timeline".bucket = "raw".bucket AND "sums".set IS NOT DISTINCT FROM "raw".set ORDER BY bucket ASC, "sums".sum DESC) UNION ALL SELECT NULL, "sums".set, "sums".sum FROM "sums" UNION ALL SELECT NULL, NULL, total FROM "total_events" UNION ALL SELECT NULL, NULL, total FROM total_events;

-- Grouping of set that is an array without toplist
WITH
timeline AS (SELECT * FROM (SELECT '2023-04-01T00:00:00'::timestamp AS bucket UNION SELECT generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket) AS t ORDER BY bucket),
total AS (SELECT COALESCE(SUM(CARDINALITY("category")), 0) AS total FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp),
total_events AS (SELECT COUNT(*) AS total FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp),
raw AS (SELECT GREATEST('2023-04-01T00:00:00'::timestamp, '2023-04-01T00:00:00'::timestamp + '0 days 3600.000000 seconds'::interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket)) - 1)) AS bucket, unnest("category") AS set, COUNT(*) AS count FROM "events" WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp GROUP BY bucket, set),
sums AS (SELECT raw.set::text AS set, SUM(raw.count)::bigint AS sum FROM raw GROUP BY raw.set ORDER BY sum DESC)
(SELECT timeline.bucket, "sums".set, COALESCE(count, 0) AS count FROM ("timeline" FULL JOIN "sums" ON TRUE) LEFT JOIN "raw" ON "timeline".bucket = "raw".bucket AND "sums".set IS NOT DISTINCT FROM "raw".set ORDER BY bucket ASC, "sums".sum DESC) UNION ALL SELECT NULL, "sums".set, "sums".sum FROM "sums" UNION ALL SELECT NULL, NULL, total FROM "total" UNION ALL SELECT NULL, NULL, total FROM total_events

-- Grouping of set that is an array with toplist
WITH
timeline AS (SELECT * FROM (SELECT '2023-04-01T00:00:00'::timestamp AS bucket UNION SELECT generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket) AS t ORDER BY bucket),
total AS (SELECT COALESCE(SUM(CARDINALITY("target_port")), 0) AS total FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp),
total_events AS (SELECT COUNT(*) AS total FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp),
toplist AS (SELECT unnest("target_port") AS set, COUNT(*) AS sum FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp GROUP BY set ORDER BY sum DESC LIMIT 100),
toplist_with_rest AS (SELECT set::text, sum FROM toplist UNION (SELECT '__REST__' as set, total - SUM(sum)::bigint as sum FROM total, toplist GROUP BY total HAVING total - SUM(sum)::bigint > 0) ORDER BY sum DESC),
raw AS (SELECT GREATEST('2023-04-01T00:00:00'::timestamp, '2023-04-01T00:00:00'::timestamp + '0 days 3600.000000 seconds'::interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket)) - 1)) AS bucket, set, COUNT(*) AS count FROM (SELECT detecttime, unnest("target_port") AS set FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp) top_events INNER JOIN toplist USING (set) GROUP BY bucket, set),
raw_with_rest AS (SELECT bucket, set::text, count FROM raw UNION ALL SELECT bucket, '__REST__' AS set, raw_totals.count - raw_sums.count AS count FROM (SELECT GREATEST('2023-04-01T00:00:00'::timestamp, '2023-04-01T00:00:00'::timestamp + '0 days 3600.000000 seconds'::interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket)) - 1)) AS bucket, SUM(CARDINALITY("target_port")) AS count FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp GROUP BY bucket) raw_totals FULL JOIN (SELECT bucket, SUM(count)::bigint AS count FROM raw GROUP BY bucket) raw_sums USING (bucket))
(SELECT timeline.bucket, "toplist_with_rest".set, COALESCE(count, 0) AS count FROM ("timeline" FULL JOIN "toplist_with_rest" ON TRUE) LEFT JOIN "raw_with_rest" ON "timeline".bucket = "raw_with_rest".bucket AND "toplist_with_rest".set IS NOT DISTINCT FROM "raw_with_rest".set ORDER BY bucket ASC, "toplist_with_rest".sum DESC) UNION ALL SELECT NULL, "toplist_with_rest".set, "toplist_with_rest".sum FROM "toplist_with_rest" UNION ALL SELECT NULL, NULL, total FROM "total" UNION ALL SELECT NULL, NULL, total FROM total_events

Actions

Copy link

#17

Updated by Jakub Maloštik over 1 year ago

After couple of revisions, the queries are as follows:

-- Only counts
WITH
timeline AS (SELECT * FROM (SELECT '2023-04-01T00:00:00'::timestamp AS bucket UNION SELECT generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket) AS t ORDER BY bucket),
raw AS (SELECT GREATEST('2023-04-01T00:00:00'::timestamp, '2023-04-01T00:00:00'::timestamp + '0 days 3600.000000 seconds'::interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket)) - 1)) AS bucket, COUNT(*) AS count FROM "events" WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp GROUP BY bucket)
(SELECT timeline.bucket, COALESCE(count, 0) AS count FROM "timeline" LEFT JOIN "raw" ON timeline.bucket = raw.bucket ORDER BY bucket ASC) UNION ALL SELECT NULL, SUM(count)::bigint FROM "raw";

-- Simple grouping by set without toplist
WITH
timeline AS (SELECT * FROM (SELECT '2023-04-01T00:00:00'::timestamp AS bucket UNION SELECT generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket) AS t ORDER BY bucket),
total_events AS (SELECT COUNT(*) AS total FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp),
raw AS (SELECT GREATEST('2023-04-01T00:00:00'::timestamp, '2023-04-01T00:00:00'::timestamp + '0 days 3600.000000 seconds'::interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket)) - 1)) AS bucket, "eventclass" AS set, COUNT(*) AS count FROM "events" WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp GROUP BY bucket, set),
sums AS (SELECT raw.set::text AS set, SUM(raw.count)::bigint AS sum FROM raw GROUP BY raw.set ORDER BY sum DESC)
(SELECT timeline.bucket, "sums".set, COALESCE(count, 0) AS count FROM ("timeline" FULL JOIN "sums" ON TRUE) LEFT JOIN "raw" ON "timeline".bucket = "raw".bucket AND "sums".set IS NOT DISTINCT FROM "raw".set ORDER BY bucket ASC, "sums".sum DESC) UNION ALL SELECT NULL, "sums".set, "sums".sum FROM "sums" UNION ALL SELECT NULL, NULL, total FROM "total_events" UNION ALL SELECT NULL, NULL, total FROM total_events;

-- Grouping of set that is an array without toplist
WITH
timeline AS (SELECT * FROM (SELECT '2023-04-01T00:00:00'::timestamp AS bucket UNION SELECT generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket) AS t ORDER BY bucket),
total AS (SELECT COALESCE(SUM(CARDINALITY("category")), 0) AS total FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp),
total_events AS (SELECT COUNT(*) AS total FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp),
raw AS (SELECT GREATEST('2023-04-01T00:00:00'::timestamp, '2023-04-01T00:00:00'::timestamp + '0 days 3600.000000 seconds'::interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket)) - 1)) AS bucket, unnest("category") AS set, COUNT(*) AS count FROM "events" WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp GROUP BY bucket, set),
sums AS (SELECT raw.set::text AS set, SUM(raw.count)::bigint AS sum FROM raw GROUP BY raw.set ORDER BY sum DESC)
(SELECT timeline.bucket, "sums".set, COALESCE(count, 0) AS count FROM ("timeline" FULL JOIN "sums" ON TRUE) LEFT JOIN "raw" ON "timeline".bucket = "raw".bucket AND "sums".set IS NOT DISTINCT FROM "raw".set ORDER BY bucket ASC, "sums".sum DESC) UNION ALL SELECT NULL, "sums".set, "sums".sum FROM "sums" UNION ALL SELECT NULL, NULL, total FROM "total" UNION ALL SELECT NULL, NULL, total FROM total_events

-- Grouping of set that is an array with toplist
WITH
timeline AS (SELECT * FROM (SELECT '2023-04-01T00:00:00'::timestamp AS bucket UNION SELECT generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket) AS t ORDER BY bucket),
total AS (SELECT COALESCE(SUM(CARDINALITY("target_port")), 0) AS total FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp),
total_events AS (SELECT COUNT(*) AS total FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp),
toplist AS (SELECT unnest("target_port") AS set, COUNT(*) AS sum FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp GROUP BY set ORDER BY sum DESC LIMIT 100),
toplist_with_rest AS (SELECT set::text, sum FROM toplist UNION (SELECT '__REST__' as set, total - SUM(sum)::bigint as sum FROM total, toplist GROUP BY total HAVING total - SUM(sum)::bigint > 0) ORDER BY sum DESC),
raw AS (SELECT GREATEST('2023-04-01T00:00:00'::timestamp, '2023-04-01T00:00:00'::timestamp + '0 days 3600.000000 seconds'::interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket)) - 1)) AS bucket, set, COUNT(*) AS count FROM (SELECT detecttime, unnest("target_port") AS set FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp) top_events INNER JOIN toplist USING (set) GROUP BY bucket, set),
raw_with_rest AS (SELECT bucket, set::text, count FROM raw UNION ALL SELECT bucket, '__REST__' AS set, raw_totals.count - raw_sums.count AS count FROM (SELECT GREATEST('2023-04-01T00:00:00'::timestamp, '2023-04-01T00:00:00'::timestamp + '0 days 3600.000000 seconds'::interval * (width_bucket(detecttime, (SELECT array_agg(bucket) FROM generate_series('2023-04-01T00:00:00'::timestamp, '2023-04-07T00:00:00'::timestamp - INTERVAL '1 microsecond', '0 days 3600.000000 seconds'::interval) AS bucket)) - 1)) AS bucket, SUM(CARDINALITY("target_port")) AS count FROM events WHERE "detecttime" >= '2023-04-01T00:00:00'::timestamp AND "detecttime" < '2023-04-07T00:00:00'::timestamp GROUP BY bucket) raw_totals FULL JOIN (SELECT bucket, SUM(count)::bigint AS count FROM raw GROUP BY bucket) raw_sums USING (bucket))
(SELECT timeline.bucket, "toplist_with_rest".set, COALESCE(count, 0) AS count FROM ("timeline" FULL JOIN "toplist_with_rest" ON TRUE) LEFT JOIN "raw_with_rest" ON "timeline".bucket = "raw_with_rest".bucket AND "toplist_with_rest".set IS NOT DISTINCT FROM "raw_with_rest".set ORDER BY bucket ASC, "toplist_with_rest".sum DESC) UNION ALL SELECT NULL, "toplist_with_rest".set, "toplist_with_rest".sum FROM "toplist_with_rest" UNION ALL SELECT NULL, NULL, total FROM "total" UNION ALL SELECT NULL, NULL, total FROM total_events