Minimize whole JSON IDEA events usage (jsonb column)
Mentat relies too heavily on the whole original JSON data (stored and encoded in the jsonb column). Many queries spend cycles decoding data that is already available in the metadata columns, and PostgreSQL wastes RAM on the jsonb column.
The first step of the solution is to implement a lightweight API that serves results satisfiable from the metadata columns directly, without resorting to jsonb. However, a couple of features depend on the IDEA format, namely filtering. Sidestepping this could mean:
- creating and returning incomplete, lightweight IDEA-like events on the fly from the metadata columns, and letting the caller ask only for the data it needs (this frees the database from fetching whole rows into memory and pushing jsonb data to the application, and frees the caller from ingesting, parsing and converting JSON)
- creating a complementary part of the API that allows "extending" the data, i.e. "upgrading" incomplete lightweight IDEA events to full-blown data fetched from the database by ID
- converting as much of the code using the current API as possible to work with the minimum data it needs (incomplete events), extending to full data only after all the hard work is done
This alone would help the majority of the bigger queries and would probably mostly solve the 'big events' problem: a set of simple deterministic conversions from the metadata table will replace costly JSON unmarshalling. It might even speed up the parts where the 'extend' step is necessary, provided all the costly processing and filtering is done beforehand on the incomplete events and the number of complete events is trimmed to tens or hundreds (Hawat 'show' event, reporter creating mails).
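The flow described above (build incomplete events from metadata, filter on them, extend only the survivors by ID) could look roughly like the following sketch. All names here are hypothetical: the column names, the key mapping, the `_incomplete` marker and the `fetch_full` callback are illustration only and do not reflect the actual Mentat schema or API.

```python
import json
from typing import Any, Callable, Dict, Iterable, List, Mapping

# Hypothetical mapping of metadata columns to IDEA-like keys (assumption).
SIMPLE_KEYS = {"id": "ID", "detecttime": "DetectTime", "category": "Category"}
NODE_KEYS = {"source_ip": "Source", "target_ip": "Target"}


def make_light_event(row: Mapping[str, Any], wanted: Iterable[str]) -> Dict[str, Any]:
    """Build an incomplete IDEA-like event from the requested columns only."""
    event: Dict[str, Any] = {"_incomplete": True}
    for column in wanted:
        if column in SIMPLE_KEYS:
            event[SIMPLE_KEYS[column]] = row[column]
        elif column in NODE_KEYS:
            event[NODE_KEYS[column]] = [{"IP": list(row[column])}]
    return event


def extend_events(
    light_events: List[Dict[str, Any]],
    fetch_full: Callable[[List[str]], Dict[str, Dict[str, Any]]],
) -> List[Dict[str, Any]]:
    """Upgrade incomplete events to full ones, fetched by ID in one batch."""
    ids = [event["ID"] for event in light_events]
    full_by_id = fetch_full(ids)
    return [full_by_id[event_id] for event_id in ids]


# Made-up rows standing in for the metadata columns.
ROWS = [
    {"id": "a1", "category": ["Recon.Scanning"], "source_ip": ["192.0.2.1"]},
    {"id": "b2", "category": ["Attempt.Login"], "source_ip": ["198.51.100.7"]},
]
# Stand-in for the stored whole events; JSON is decoded only when extending.
FULL_STORE = {
    row["id"]: json.dumps({"ID": row["id"], "Category": row["category"], "Note": "full event"})
    for row in ROWS
}


def fetch_full(ids: List[str]) -> Dict[str, Dict[str, Any]]:
    return {event_id: json.loads(FULL_STORE[event_id]) for event_id in ids}


# The caller asks only for what it needs, filters, then extends the survivors.
light = [make_light_event(row, ["id", "category"]) for row in ROWS]
selected = [event for event in light if "Attempt.Login" in event["Category"]]
full = extend_events(selected, fetch_full)
```

In this sketch the JSON decoding cost is paid only for the events that survive filtering, which is the point of doing the hard work on incomplete events first.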
Updated by Pavel Kácha about 3 years ago
Note from Radko at #4253:
iprange types are returned as string representations in single-address, prefix or arbitrary-range (min-max) form (best match). This is because iprange is a nonstandard extension and the standard connector does not understand its data type. Anyway, there is no native/standard library data type that would fit iprange (not the arbitrary range part at least).
Which is cool, as it might work as direct input to ipranges.
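For illustration, the three string forms mentioned in the note could be handled on the client side as in the sketch below. It uses only the standard `ipaddress` module rather than the ipranges library, assumes the min-max form is separated by a hyphen, and `parse_iprange` is a hypothetical helper name.

```python
import ipaddress
from typing import Tuple, Union

Address = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def parse_iprange(text: str) -> Tuple[Address, Address]:
    """Return (first, last) address for a single address, prefix or min-max range."""
    text = text.strip()
    if "-" in text:
        # Arbitrary range, assumed to be written as "min-max".
        low_text, high_text = text.split("-", 1)
        low = ipaddress.ip_address(low_text.strip())
        high = ipaddress.ip_address(high_text.strip())
    elif "/" in text:
        # Prefix (CIDR) form.
        network = ipaddress.ip_network(text, strict=False)
        low, high = network.network_address, network.broadcast_address
    else:
        # Single address.
        low = high = ipaddress.ip_address(text)
    if low.version != high.version or low > high:
        raise ValueError(f"invalid range: {text!r}")
    return low, high
```

A caller could then compare the returned pair against other ranges without ever touching the JSON event.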
Updated by Radko Krkoš almost 2 years ago
- Status changed from Feedback to Resolved
There are obvious differences in performance between the former and the new version. Right now, each is deployed on a different server, but looking at mentat-alt (new) and mentat-hub (former), the generated plans are the same. The only difference therefore comes from reading, keeping in cache, processing and marshaling the event BYTEA, which is omitted in the new version. The runtime went from 16s to 9s on mentat-alt for a pathological query. Based on the information above, a similar speedup is expected in production.