Feature #4275
Split jsonb column into its own table
Status: Closed, 100% done
Description
JSONB data occupy a lot of precious RAM (even though compressed by PostgreSQL), especially because they have to be fetched into RAM with the rest of the db tuples during table scan and index scan queries (which we unfortunately rely on).
After usage of JSONB data is minimized within the Mentat code (see #4274), the events table can be split and the 'extend' part of the API modified accordingly. Only small parts of the full-length jsonb IDEA data will be hot in memory: presumably only single events, which are viewed by the user in the Hawat 'show' interface, and at most hundreds of events from reporting, which are however usually hours old, days at most if relapse kicks in.
Related issues
Updated by Pavel Kácha over 6 years ago
From telco brainstorming with Radko - upgrade of the db and code could be done with relatively short downtime (the longest will be CLUSTER, which can be delayed to off-hours).
- create new empty table
- enable API 'extend' code to be able to deal with jsonb in separate table, but resort to jsonb in the original table (to be able to cope with the transition period)
- stop incoming data
- run the last run of the reporter and statistician
- ALTER TABLE to allow NULLs in the old jsonb column (DROP NOT NULL)
- set up new storage code (to split into metadata table and jsonb data table)
- enable incoming data (last column in metadata table now gets NULLs)
- enable reporter and statistician
- repeat: copy a batch of old jsonb rows into the new table (LIMIT XXX) until done (the limit is to prevent lock congestion)
- ALTER TABLE to drop the old jsonb column; VACUUM
- CLUSTER when suitable
- get rid of dual code
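The batched-copy part of the plan above can be sketched in SQL roughly as follows. This is a hedged illustration only: the table and column names (events, event, events_idea) and the index name are taken from later comments in this thread, the batch size is arbitrary, and the anti-join used to find not-yet-copied rows is an assumption, not the actual migration code.

```sql
-- One batch of the "repeat ... until done" copy step:
INSERT INTO events_idea (id, event)
SELECT e.id, e.event
FROM events e
WHERE e.event IS NOT NULL
  AND NOT EXISTS (SELECT 1 FROM events_idea i WHERE i.id = e.id)
LIMIT 10000;  -- repeat until the INSERT reports 0 rows

-- Once all batches are done:
ALTER TABLE events DROP COLUMN event;
VACUUM events;                               -- must run outside a transaction
CLUSTER events USING events_detecttime_idx;  -- when suitable (off-hours); takes an exclusive lock
```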
Updated by Pavel Kácha over 6 years ago
- Related to Bug #4253: Handling of too big events added
Updated by Pavel Kácha over 6 years ago
- Blocks Feature #4274: Minimize whole JSON IDEA events usage (jsonb column) added
Updated by Pavel Kácha over 6 years ago
- Blocks deleted (Feature #4274: Minimize whole JSON IDEA events usage (jsonb column))
Updated by Pavel Kácha over 6 years ago
- Precedes Feature #4274: Minimize whole JSON IDEA events usage (jsonb column) added
Updated by Radko Krkoš about 6 years ago
Statistics gathering was performed to better understand this. The following numbers are for mentat-hub on 2018-10-24.
Current table size: 171GB
Number of rows: 125.3M (125 314 536)
Aggregate size of event BYTEA: 110GB (118 092 392 057) - obtained by SELECT sum(octet_length(event)) FROM events;
Aggregate indices size: 34.8GB (note: no REINDEX was performed for a long time, so the indices are considerably inflated)
ID index size: 11GB (this would be duplicated for the data table)
That leaves us with (171 - 110) + 34.8 + 11 = 106.8GB per 125.3M events, effectively 0.85GB per 1M rows. With the current physical memory and long-term memory load, this would allow at least 260M events to be fully cached in RAM. A considerable improvement.
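For reference, numbers like the above can be gathered with standard PostgreSQL administration functions. This is a hedged sketch: only the octet_length query is confirmed by the thread; the other queries and the primary-key index name events_pkey are assumptions.

```sql
SELECT pg_size_pretty(pg_table_size('events'));               -- table size (incl. TOAST)
SELECT count(*) FROM events;                                  -- number of rows
SELECT pg_size_pretty(sum(octet_length(event))) FROM events;  -- aggregate size of event BYTEA
SELECT pg_size_pretty(pg_indexes_size('events'));             -- aggregate size of all indices
SELECT pg_size_pretty(pg_relation_size('events_pkey'));       -- ID index size (assumed index name)
```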
Updated by Radko Krkoš almost 6 years ago
- Related to Bug #4515: Runaway query? added
Updated by Radko Krkoš almost 6 years ago
- Blocks Feature #4276: Test database conversion time added
Updated by Pavel Kácha almost 6 years ago
- Blocked by Feature #4230: Make use of existing or implement own mechanism for handling SQL schema migrations. added
Updated by Pavel Kácha about 5 years ago
- Target version changed from Backlog to 2.6
Updated by Jan Mach about 5 years ago
- Status changed from New to In Progress
- Assignee changed from Jan Mach to Radko Krkoš
Hi Radko,
I would like to ask you to provide me with the following, so I can proceed with implementing this feature in the Mentat system:
- SQL code for creating the new table.
- A migration SQL function for transferring data from the events.event column to the new split table.
Thank you for your cooperation, I will handle the rest.
Updated by Radko Krkoš about 5 years ago
- Assignee changed from Radko Krkoš to Jan Mach
Here you go:
CREATE TABLE IF NOT EXISTS events_idea (
    id text PRIMARY KEY REFERENCES events(id),
    event bytea
);
INSERT INTO events_idea (id, event) SELECT id, event FROM events;
ALTER TABLE events DROP COLUMN event;
VACUUM FREEZE;
CLUSTER events USING events_detecttime_idx;
The VACUUM and CLUSTER commands are very important, as those are the points where the events table is compacted. They should be part of the automatic migration.
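One operational caveat for that automatic migration (an added note, not stated in the thread): PostgreSQL refuses to run VACUUM inside a transaction block, so the migration runner must issue it as a standalone statement in autocommit mode, for example:

```sql
-- Must be executed outside any explicit transaction (autocommit mode):
VACUUM FREEZE events;
-- CLUSTER takes an ACCESS EXCLUSIVE lock on the table, so schedule it off-hours:
CLUSTER events USING events_detecttime_idx;
```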
Updated by Jan Mach about 5 years ago
- Status changed from In Progress to Resolved
- Assignee changed from Jan Mach to Pavel Kácha
- % Done changed from 90 to 100
- To be discussed changed from No to Yes
I think this issue can be considered resolved. In the related issue #4274 there is the question of assessing the performance gains after this update.
Updated by Jan Mach about 5 years ago
- Related to Task #6054: Explore the use of PostgreSQL views for easier event storing and querying added
Updated by Pavel Kácha about 5 years ago
- Status changed from Resolved to Closed