Feature #4275


Split jsonb column into its own table

Added by Pavel Kácha over 5 years ago. Updated about 4 years ago.

Status: Closed
Priority: Normal
Category: Development - Core
% Done: 100%
To be discussed: No

Description

JSONB data occupy a lot of precious RAM (even though compressed by pg), especially because they have to be fetched into RAM along with the rest of the database tuples during table scan and index scan queries (on which we unfortunately rely).

After usage of JSONB data is minimized within the Mentat code (see #4274), the events table can be split and the 'extend' part of the API modified accordingly. Only small parts of the full-length JSONB IDEA data will be hot in memory: presumably only single events viewed by a user in the Hawat 'show' interface, plus at most hundreds of events from reporting, which are usually hours old (days at most if relapse kicks in).
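A sketch of how query patterns would look after the split, assuming the metadata stays in events and the raw IDEA payload moves to a separate events_idea table (as in the migration SQL in comment #12); the detecttime column is inferred from the events_detecttime_idx index mentioned later:

```sql
-- Metadata-only search (the common, fully cached path):
SELECT id, detecttime
FROM events
WHERE detecttime > now() - interval '1 hour';

-- Full IDEA payload for a single event (Hawat 'show' view);
-- only here is the large payload table touched:
SELECT e.id, i.event
FROM events e
JOIN events_idea i ON i.id = e.id
WHERE e.id = 'some-event-id';
```

Under this scheme, table and index scans over events no longer pull the large payloads into the buffer cache.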


Related issues

  • Related to Mentat - Bug #4253: Handling of too big events (Closed, Jan Mach, 08/09/2018)
  • Related to Mentat - Bug #4515: Runaway query? (Closed, Pavel Kácha, 01/14/2019)
  • Related to Mentat - Task #6054: Explore the use of PostgreSQL views for easier event storing and querying (Rejected, Jan Mach, 11/12/2019)
  • Blocks Mentat - Feature #4276: Test database conversion time (Closed, Radko Krkoš, 08/21/2018)
  • Blocked by Mentat - Feature #4230: Make use of existing or implement own mechanism for handling SQL schema migrations. (Closed, Pavel Kácha, 07/27/2018)
  • Precedes Mentat - Feature #4274: Minimize whole JSON IDEA events usage (jsonb column) (Closed, Radko Krkoš, 08/22/2018 - 08/22/2018)
Actions #1

Updated by Pavel Kácha over 5 years ago

From a telco brainstorming with Radko: the upgrade of the DB and code could be done with relatively short downtime (the longest step will be CLUSTER, which can be delayed to off-hours).

  • create new empty table
  • enable API 'extend' code to be able to deal with jsonb in separate table, but resort to jsonb in the original table (to be able to cope with the transition period)
  • stop incoming data
  • run the last run of the reporter and statistician
  • alter table jsonb NULL (allow NULL data in the old table)
  • set up new storage code (to split into metadata table and jsonb data table)
  • enable incoming data (last column in metadata table now gets NULLs)
  • enable reporter and statistician
  • repeat: copy old jsonb rows into the new table with LIMIT XXX, until done (the limit is to prevent lock congestion)
  • alter table drop old jsonb; vacuum
  • CLUSTER when suitable
  • get rid of dual code
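The batched copy step above can be sketched as follows; the batch size stands in for the unspecified "LIMIT XXX", and using event IS NULL as the "already migrated" marker is an assumption, not part of the agreed plan:

```sql
-- One migration batch: move up to 10000 not-yet-copied payloads into
-- events_idea, then NULL them out in events so the next batch skips them.
-- Run repeatedly (e.g. from a shell loop) until it affects 0 rows.
WITH batch AS (
    SELECT id
    FROM events
    WHERE event IS NOT NULL
    LIMIT 10000
), moved AS (
    INSERT INTO events_idea (id, event)
    SELECT e.id, e.event
    FROM events e
    JOIN batch b ON b.id = e.id
    RETURNING id
)
UPDATE events
SET event = NULL
WHERE id IN (SELECT id FROM moved);
```

Keeping each batch in its own transaction holds row locks only briefly, which is the point of the LIMIT in the plan above.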
Actions #2

Updated by Pavel Kácha over 5 years ago

  • Related to Bug #4253: Handling of too big events added
Actions #3

Updated by Pavel Kácha over 5 years ago

  • Blocks Feature #4274: Minimize whole JSON IDEA events usage (jsonb column) added
Actions #4

Updated by Pavel Kácha over 5 years ago

  • Blocks deleted (Feature #4274: Minimize whole JSON IDEA events usage (jsonb column))
Actions #5

Updated by Pavel Kácha over 5 years ago

  • Precedes Feature #4274: Minimize whole JSON IDEA events usage (jsonb column) added
Actions #6

Updated by Radko Krkoš over 5 years ago

Statistics gathering was performed to better understand this. The following numbers are for mentat-hub on 2018-10-24.

Current table size: 171 GB
Number of rows: 125.3M (125 314 536)
Aggregate size of event BYTEA: 110 GB (118 092 392 057) - obtained by SELECT sum(octet_length(event)) FROM events;
Aggregate indices size: 34.8 GB (note: no REINDEX has been performed for a long time, so the indices are considerably inflated)
ID index size: 11 GB (this would be duplicated for the data table)

That leaves us with (171 - 110) + 34.8 + 11 = 106.8 GB per 125.3M events, effectively 0.85 GB per 1M rows. With the current physical memory and long-term memory load, this would get us to at least 260M events fully cached in RAM. A considerable improvement.
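The figures above can be reproduced with standard PostgreSQL size functions; a sketch (the octet_length query is the one quoted above, the rest are assumed equivalents, and the events_pkey index name is a guess):

```sql
-- On-disk size of the table including TOAST, excluding indexes:
SELECT pg_size_pretty(pg_table_size('events'));

-- Exact row count (may be slow on a table this large):
SELECT count(*) FROM events;

-- Aggregate size of the raw event payloads, as quoted above:
SELECT sum(octet_length(event)) FROM events;

-- Aggregate size of all indexes on the table:
SELECT pg_size_pretty(pg_indexes_size('events'));

-- Size of one particular index (assumed primary-key index name):
SELECT pg_size_pretty(pg_relation_size('events_pkey'));
```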

Actions #7

Updated by Radko Krkoš about 5 years ago

Actions #8

Updated by Radko Krkoš about 5 years ago

Actions #9

Updated by Pavel Kácha about 5 years ago

  • Blocked by Feature #4230: Make use of existing or implement own mechanism for handling SQL schema migrations. added
Actions #10

Updated by Pavel Kácha over 4 years ago

  • Target version changed from Backlog to 2.6
Actions #11

Updated by Jan Mach over 4 years ago

  • Status changed from New to In Progress
  • Assignee changed from Jan Mach to Radko Krkoš

Hi Radko,

I would like to ask you to provide me with the following, so I can proceed to implement this feature in the Mentat system:

  • SQL code for creating new table.
  • Migration SQL function for transferring data from the events.event column to the new split table.

Thank you for your cooperation, I will handle the rest.

Actions #12

Updated by Radko Krkoš over 4 years ago

  • Assignee changed from Radko Krkoš to Jan Mach

Here you go:

-- New payload table; one row per event, keyed by the event ID:
CREATE TABLE IF NOT EXISTS events_idea(
    id text PRIMARY KEY REFERENCES events(id),
    event bytea
);

-- Copy all payloads into the new table:
INSERT INTO events_idea(id, event) SELECT id, event FROM events;

-- Drop the payload column from the metadata table:
ALTER TABLE events DROP COLUMN event;

VACUUM FREEZE;

CLUSTER events USING events_detecttime_idx;

The VACUUM and CLUSTER commands are very important, as those are the points where the events table is compacted. They should be part of the automatic migration.

Actions #13

Updated by Jan Mach over 4 years ago

  • Status changed from In Progress to Resolved
  • Assignee changed from Jan Mach to Pavel Kácha
  • % Done changed from 90 to 100
  • To be discussed changed from No to Yes

I think this issue can be considered resolved. The related issue #4274 raises the question of assessing the performance gains after this update.

Actions #14

Updated by Jan Mach over 4 years ago

  • Related to Task #6054: Explore the use of PostgreSQL views for easier event storing and querying added
Actions #15

Updated by Pavel Kácha over 4 years ago

  • Status changed from Resolved to Closed
Actions #16

Updated by Jan Mach about 4 years ago

  • To be discussed changed from Yes to No