Bug #6252

Timeline graphing is causing mayhem on production

Added by Radko Krkoš 3 months ago. Updated 12 days ago.

Status: Closed
Priority: Normal
Assignee:
Category: Research and analysis
Target version:
Start date: 03/05/2020
Due date:
% Done: 0%
Estimated time:
To be discussed: No

Description

The current implementation of timeline graphing issues a broad SELECT against the database and post-processes the results in Python inside Apache. This is causing serious problems: the Apache process gets OOM-killed and, in effect, the disk cache is flushed, which impacts the performance and user experience of the whole system.
The required processing in Python is currently extensive and does not scale to non-trivial time intervals. The kernel log shows numerous cases of the Apache process allocating all available memory (250 GB), only to be OOM-killed after 30+ minutes of work. Recovery from this takes an extremely long time, because effectively the whole disk cache is vacated and we rely on it heavily for performance.

We need to decrease the amount of work done in Python. There are several ways to reach that target, for example:
1) Identify outputs that are not useful and stop calculating them.
2) Split the one large calculation of everything into parts, as the user is very rarely truly interested in all possible known outputs.
3) Move the calculation into the DB, which will save a lot of duplicated iteration over the data. The DB is designed to answer analytical queries, and the most efficient way to use it is to query for exactly the results required, not for source data to be processed afterwards.

Item 1 can be done at any time; items 2 and 3 are best done together, after item 1 is finished.
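To illustrate point 3, here is a minimal sketch contrasting the two approaches. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for the production database; the `events` table, its `detecttime` and `category` columns, and the one-hour `BUCKET` step are hypothetical and not taken from the actual schema. The first function pulls every matching row into Python and counts them there (the pattern that blows up memory); the second asks the DB for exactly the aggregated counts, so only one row per (bucket, category) pair ever reaches Python.

```python
import sqlite3
from collections import Counter

# Hypothetical timeline step in seconds (assumed: one hour).
BUCKET = 3600


def make_db():
    """Build a tiny in-memory stand-in for the events table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (detecttime INTEGER, category TEXT)")
    rows = [
        (0, "Recon.Scanning"),
        (30, "Recon.Scanning"),
        (3600, "Abusive.Spam"),
        (3700, "Recon.Scanning"),
        (7300, "Abusive.Spam"),
    ]
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    return conn


def timeline_in_python(conn, dt_from, dt_to):
    """Broad SELECT of source rows, then aggregation in Python (the costly pattern)."""
    counts = Counter()
    cur = conn.execute(
        "SELECT detecttime, category FROM events "
        "WHERE detecttime >= ? AND detecttime < ?",
        (dt_from, dt_to),
    )
    # Every matching event is materialized and iterated in the application.
    for detecttime, category in cur:
        counts[(detecttime // BUCKET, category)] += 1
    return dict(counts)


def timeline_in_db(conn, dt_from, dt_to):
    """Ask the DB for exactly the aggregated result it is designed to compute."""
    # Integer division of two INTEGER values truncates in SQLite, which matches
    # Python's // for the non-negative timestamps used here.
    cur = conn.execute(
        "SELECT detecttime / ? AS bucket, category, COUNT(*) "
        "FROM events WHERE detecttime >= ? AND detecttime < ? "
        "GROUP BY bucket, category",
        (BUCKET, dt_from, dt_to),
    )
    # Only one row per (bucket, category) pair is returned to Python.
    return {(bucket, category): n for bucket, category, n in cur}


if __name__ == "__main__":
    conn = make_db()
    print(timeline_in_db(conn, 0, 3 * BUCKET))
```

Both functions return the same counts; the difference is where the iteration happens and how much data crosses into the Apache process. A server database would typically use its own date/time bucketing functions instead of plain integer division, but the principle of returning aggregates rather than source rows is the same.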


Related issues

Related to Mentat - Feature #4609: Arbitrary grouping and sorting in Events (Closed, 01/30/2019)

Associated revisions

Revision ece4af50 (diff)
Added by Jan Mach 3 months ago

Added more insistent default time value for 'dt_from' in certain views.

Endpoints events.search, timeline.search and reports.search now have more insistent default value for dt_from parameter to make default queries less demanding on resources. (Redmine issue: #4609,#6252)

History

#1 Updated by Pavel Kácha 3 months ago

At the meeting, Mek mentioned that the patches enforcing default search limits are not yet present on Mentat-hub. If that is true, we should check again once they are in place to see whether we need a more immediate solution.

(But all of Radko’s points of course still hold.)

#2 Updated by Pavel Kácha 3 months ago

  • Related to Feature #4609: Arbitrary grouping and sorting in Events added

#3 Updated by Pavel Kácha 12 days ago

  • Status changed from New to Closed
  • Target version changed from Backlog to 2.7

Solved for now.
