Feature #6256 (open)

Review possibilities of supporting timeline calculation in the DB

Added by Pavel Kácha about 4 years ago. Updated 10 months ago.

Status: Feedback
Priority: Normal
Assignee:
Category: Design
Target version:
Start date: 07/07/2020
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
To be discussed:

Description

  • Partitioning?
  • TimescaleDB?
  • Support for hypercubes in PostgreSQL itself?
  • More fine-grained result "cubes" for the statistician? In multiple tables, one per "cube" size?

Subtasks (1 total: 0 open, 1 closed)

Feature #6413: Autarkic DB queries for timeline (Closed, Jakub Maloštik, 07/07/2020)

Related issues

Related to Mentat - Feature #4609: Arbitrary grouping and sorting in Events (Closed, Jan Mach, 01/30/2019)

Actions #1

Updated by Pavel Kácha about 4 years ago

  • Related to Feature #4609: Arbitrary grouping and sorting in Events added
Actions #2

Updated by Radko Krkoš about 4 years ago

  • Status changed from New to Feedback
  • Assignee changed from Radko Krkoš to Pavel Kácha
  • % Done changed from 0 to 90
  • To be discussed changed from No to Yes

Based on research and experimentation, the simplest approach, offering the best immediate results, is to use the DB's aggregation functions such as width_bucket() and perform the time-based binning inside the DB, reading back only the aggregated counts. This approach is illustrated in #6257.
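
For illustration, a minimal sketch of such in-DB binning, assuming a hypothetical events table with a detecttime timestamp column (the table/column names and the psycopg2 wrapper are illustrative, not Mentat's actual schema or code):

```python
# Minimal sketch: time-based binning done inside PostgreSQL via width_bucket(),
# so only the per-bucket counts cross the wire. Schema names are hypothetical.
import psycopg2

TIMELINE_QUERY = """
    SELECT width_bucket(
               extract(epoch FROM detecttime),
               extract(epoch FROM %(t_from)s::timestamp),
               extract(epoch FROM %(t_to)s::timestamp),
               %(buckets)s
           ) AS bucket,
           count(*) AS cnt
    FROM events
    WHERE detecttime >= %(t_from)s AND detecttime < %(t_to)s
    GROUP BY bucket
    ORDER BY bucket
"""

def timeline_counts(dsn, t_from, t_to, buckets=100):
    """Return (bucket_number, event_count) pairs aggregated by the DB."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(TIMELINE_QUERY,
                    {"t_from": t_from, "t_to": t_to, "buckets": buckets})
        return cur.fetchall()
```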

As for the proposed options:
  • Partitioning is just a minor optimization (though possibly with a huge performance gain), no different from an index as far as query mechanics are concerned.
  • TimescaleDB is an external module; it looks very capable, but we had better stay away from expanding our external dependencies (a rough sketch of what its use would look like follows this list).
  • Hypercubes (specialized aggregation cuboids) are a general tool for implementing fast multidimensional aggregations. In our case the aggregation is either 1D, or at most 2D with the second dimension always changing. This is a different use case; hypercubes would probably not help much, and the overhead of building and maintaining them would probably not be worth the cost. This was not tested, however; it is based on available sources only.
  • Extending the statistician does not seem viable, as even its original author does not like it very much. The most probable course of action is replacing the statistician analytics altogether with in-DB aggregations.
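
For comparison, a rough sketch of what the TimescaleDB variant would look like, assuming the same hypothetical events/detecttime schema as above; create_hypertable() and time_bucket() are TimescaleDB functions, the rest is illustrative:

```python
# Rough sketch of the TimescaleDB alternative. Requires the external
# timescaledb extension (the main objection); schema names are hypothetical.
import psycopg2

SETUP = """
    -- Assumes the timescaledb extension is already installed in the DB.
    -- Conversion to a hypertable is normally done once, at schema creation.
    SELECT create_hypertable('events', 'detecttime', if_not_exists => TRUE);
"""

TIMELINE_QUERY = """
    SELECT time_bucket('5 minutes', detecttime) AS bucket,
           count(*) AS cnt
    FROM events
    WHERE detecttime >= %(t_from)s AND detecttime < %(t_to)s
    GROUP BY bucket
    ORDER BY bucket
"""

def timescale_timeline(dsn, t_from, t_to):
    """Return per-bucket event counts computed by TimescaleDB's time_bucket()."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(SETUP)
        cur.execute(TIMELINE_QUERY, {"t_from": t_from, "t_to": t_to})
        return cur.fetchall()
```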
Actions #3

Updated by Pavel Kácha almost 4 years ago

  • To be discussed deleted (Yes)

After heated discussion:

  • Partitioning seems the cheapest option; nowadays it is quite usable in PostgreSQL.
  • TimescaleDB may be worth testing, but the contender for comparison is partitioned access (we should check that first so we are at least able to compare).
  • Cubes know only float64 (all our data would need translating to float codepoints) and work with multidimensional codepoints (which is a problem for multivalue data: Categories and Tags are solvable via Category/Tag sets, however IPs and ports lead to cartesian products of codepoints). It might be a working solution, but definitely not low-hanging fruit, and the implementation may be comparable in scale to an enhanced statistician.
  • Statistician: we would rather aim to replace, simplify, or ditch the statistician altogether.

So, leave this for the future; let's try partitioning first, maybe Timescale after that.
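
As a pointer for that first step, a minimal sketch of native PostgreSQL declarative range partitioning by event time, again using the hypothetical events/detecttime names (monthly ranges chosen arbitrarily):

```python
# Minimal sketch of declarative range partitioning by event time.
# Queries constrained by detecttime then scan only the relevant partitions.
# Table/column names are hypothetical, not Mentat's actual schema.
import psycopg2

PARTITION_DDL = """
    CREATE TABLE events_part (
        id         bigserial,
        detecttime timestamp NOT NULL,
        data       jsonb
    ) PARTITION BY RANGE (detecttime);

    CREATE TABLE events_part_2020_07 PARTITION OF events_part
        FOR VALUES FROM ('2020-07-01') TO ('2020-08-01');

    CREATE TABLE events_part_2020_08 PARTITION OF events_part
        FOR VALUES FROM ('2020-08-01') TO ('2020-09-01');
"""

def create_partitioned_events(dsn):
    """Create the partitioned table and two monthly partitions."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(PARTITION_DDL)
```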

Actions #4

Updated by Pavel Kácha almost 4 years ago

  • Target version changed from Backlog to Future