Feature #4275
closed
Split jsonb column into its own table
Added by Pavel Kácha over 6 years ago.
Updated almost 5 years ago.
Category:
Development - Core
Description
JSONB data occupy a lot of precious RAM (even though compressed by pq), especially because they have to be fetched into RAM with the rest of the db tuples during table scan and index scan queries (which we unfortunatelly rely on).
After usage of JSONB data is minimized within Mentat code (see #4274), events table can be split, and 'extend' part of the API can be modified accordingly. Only small parts of full length jsonb IDEA data will be hot in the memory, presumably only single events, which are viewed by user in the Hawat 'show' interface, and hundreds (tops) of events from reporting - which are however usually hours old, max days if relapse kicks in.
From telco brainstorming with Radko - upgrade of the db and code could be done with relatively short downtime (the longest will be CLUSTER, which can be delayed to off-hours).
- create new empty table
- enable API 'extend' code to be able to deal with jsonb in separate table, but resort to jsonb in the original table (to be able to cope with the transition period)
- stop incoming data
- run the last run of the reporter and statistician
- alter table jsonb NULL (allow NULL data in the old table)
- set up new storage code (to split into metadata table and jsonb data table)
- enable incoming data (last column in metadata table now gets NULLs)
- enable reporter and statistician
- repeat: select old jsonb into new jsonb limit XXX; until done (limit is to prevent lock congestion)
- alter table drop old jsonb; vacuum
- CLUSTER when suitable
- get rid of dual code
- Related to Bug #4253: Handling of too big events added
- Blocks Feature #4274: Minimize whole JSON IDEA events usage (jsonb column) added
- Blocks deleted (Feature #4274: Minimize whole JSON IDEA events usage (jsonb column))
- Precedes Feature #4274: Minimize whole JSON IDEA events usage (jsonb column) added
Statistics gathering was performed to better understand this. Following numbers are for mentat-hub on 20181024.
Current table size: 171GB
Number of rows: 125.3M (125 314 536)
Aggregate size of event BYTEA: 110GB (118 092 392 057) - obtained by SELECT sum(octet_length(event)) FROM events;
Aggregate indices size: 34.8GB (REM: No REINDEX was performed for a long time so the indices are inflated considerably)
ID index size: 11GB (this would be duplicated for the data table)
That leaves us with (171-110) + 34.8 + 11 = 106.8GB per 125.3M of events, effectively 0.85GB per 1M rows. With current physical memory and long term memory load, this would get us to at least 260M events fully cached in RAM. A considerable improvement.
- Blocked by Feature #4230: Make use of existing or implement own mechanism for handling SQL schema migrations. added
- Target version changed from Backlog to 2.6
- Status changed from New to In Progress
- Assignee changed from Jan Mach to Radko Krkoš
Hi Radko,
I would like to ask you to provide me with following, so I can proceed to implement this feature in Mentat system:
- SQL code for creating new table.
- Migration SQL function for transfering data from events.event column to new split table.
Thank you for your cooperation, I will handle the rest.
- Assignee changed from Radko Krkoš to Jan Mach
Here you go:
CREATE TABLE IF NOT EXISTS events_idea(
id text PRIMARY KEY REFERENCES events(id),
event bytea
);
INSERT INTO events_idea(id, event) SELECT id, event FROM events;
ALTER TABLE events DROP COLUMN event;
VACUUM FREEZE;
CLUSTER events USING events_detecttime_idx;
The VACUUM
and CLUSTER
commands are very important as those are the points where the events
table is compacted. They should be part of the automatic migration.
- Status changed from In Progress to Resolved
- Assignee changed from Jan Mach to Pavel Kácha
- % Done changed from 90 to 100
- To be discussed changed from No to Yes
I think this issue can be considered as resolved. In related issue #4274 there is a question of assessing the performance gains after this update.
- Related to Task #6054: Explore the use of PostgreSQL views for easier event storing and querying added
- Status changed from Resolved to Closed
- To be discussed changed from Yes to No
Also available in: Atom
PDF