Review possibilities of support of timeline calculation on db
- Support for hypercubes in PQSQL itself?
- More fine grained result “cubes” for statistician? In multiple tables per “cube” size?
Updated by Radko Krkoš 4 months ago
- Status changed from New to Feedback
- Assignee changed from Radko Krkoš to Pavel Kácha
- % Done changed from 0 to 90
- To be discussed changed from No to Yes
Based on research and experimentation, the simplest way, offering the best immediate results is using the DB’s aggregation functions like
width_bucket() and performing the time-based binning inside the DB, reading only the aggregated counts. This way is illustrated in #6257.
- Partitioning is just a minor optimization (with possible huge performance gain nevertheless), not different from an index as far as query mechanics are considered.
- TimescaleDB is an external module, it looks very potent, but we should better stay away from expanding external dependencies.
- Hypercubes (specialized aggregation cuboids) are a general tool to implement multidimensional fast aggregations. In our case, the aggregation is either 1D, or 2D at max with the second dimension always changing. This is a different use-case, hypercubes would probably not help much and the overhead of building and maintaining them would probably not be worth the costs. This was not tested however, it is based on available sources only.
- Extending statistician does not seem viable, as even the original author does not like it very much. The most probable course of action is replacing the statistician analytics altogether with in-DB aggregations.
Updated by Pavel Kácha 3 months ago
- To be discussed deleted (
After heated discussion:
- partitioning seems the cheapest, nowadays in PQ quite usable
- Timescale - may be worth testing, but contender for comparison is partitioned access (we should check that first to be able to at least compare)
- Cubes - know only float64 (needs translation of all our data to float codepoints), works with multidimensional codepoints (means problem with multivalue data, Categories and Tags are solvable by Category/Tag sets, however IPs and ports lead to cartesian products of codepoints). It might be working solution, but definitely not low hanging, and implementation may be comparable in scale to enhanced statistician).
- Statistician - we’d rather head to replace, simplify or ditch statistician altogether.
So, leave it for future, let’s try partitioning first, maybe Timescale then.