Bug #6252

Timeline graphing is causing mayhem on production

Added by Radko Krkoš about 4 years ago. Updated almost 4 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: Research and analysis
Target version:
Start date: 03/05/2020
Due date:
% Done: 0%
Estimated time:
To be discussed: No

Description

The current implementation of timeline graphing, which runs a broad SELECT against the database and post-processes the result in Python inside Apache, is causing serious problems. It leads to the Apache process being OOM-killed and, in effect, to the disk cache being flushed, which degrades the performance and user experience of the whole system.
The processing required in Python is currently extensive and does not scale to non-trivial time intervals. The kernel log shows numerous cases of the Apache process allocating all available memory (250GB) only to be OOM-killed after 30+ minutes of work. Recovery from this takes an extremely long time, because effectively the whole disk cache is evicted and we rely on it heavily for performance.

We need to decrease the amount of work done in Python. There are several ways to reach that target, for example:
1) Identify non-useful outputs and stop calculating them.
2) Split the one large calculation of everything into parts, as the user is very rarely interested in all possible known outputs.
3) Move the calculation into the DB, which will save a lot of duplicated iteration over the data. The DB is designed to answer analytical queries and the most efficient way to use it is to query for exactly the results required, not source data to be processed afterwards.

Option 1 can be done at any time; options 2 and 3 are best done together, after option 1 is finished.
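To illustrate the difference that moving the calculation into the DB makes, here is a minimal sketch. It is not the actual Mentat code: the `events` table, the `detecttime` and `category` columns, and the 60-second bucket width are assumptions made up for the example, and `sqlite3` is used only as a self-contained stand-in for the production database.

```python
import sqlite3
from collections import Counter

# Hypothetical schema: one row per event, with a Unix timestamp and a category.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (detecttime INTEGER, category TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        (0, "Recon.Scanning"),
        (10, "Recon.Scanning"),
        (70, "Attempt.Login"),
        (75, "Recon.Scanning"),
        (130, "Attempt.Login"),
    ],
)

BUCKET = 60  # assumed timeline bucket width in seconds


def timeline_in_python(conn):
    """Broad SELECT: every row is transferred and counted in Python."""
    counts = Counter()
    for ts, category in conn.execute("SELECT detecttime, category FROM events"):
        counts[(ts // BUCKET * BUCKET, category)] += 1
    return dict(counts)


def timeline_in_db(conn):
    """Targeted query: the database returns only the aggregated buckets."""
    rows = conn.execute(
        "SELECT (detecttime / ?) * ? AS bucket, category, COUNT(*) "
        "FROM events GROUP BY bucket, category",
        (BUCKET, BUCKET),
    )
    return {(bucket, category): n for bucket, category, n in rows}


# Both approaches produce the same timeline; only the second keeps
# the per-row work out of the Python process.
assert timeline_in_python(conn) == timeline_in_db(conn)
```

With the second variant the memory held by the Python process grows with the number of buckets and categories rather than with the number of source rows. Splitting the calculation (option 2) would then amount to issuing only the aggregate queries the user actually asked for.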


Related issues

Related to Mentat - Feature #4609: Arbitrary grouping and sorting in Events (Closed, Jan Mach, 01/30/2019)

Actions #1

Updated by Pavel Kácha about 4 years ago

At the meeting, Mek mentioned that the patches introducing default search limits are not yet on Mentat-hub. If that is true, we should check again once they are there to see whether we still need a more immediate solution.

(But all Radko's points of course still hold.)

Actions #2

Updated by Pavel Kácha about 4 years ago

  • Related to Feature #4609: Arbitrary grouping and sorting in Events added
Actions #3

Updated by Pavel Kácha almost 4 years ago

  • Status changed from New to Closed
  • Target version changed from Backlog to 2.7

Solved for now.
