Project

General

Profile

Actions

Bug #7648

closed

Enricher and Storage modules are too slow

Added by Rajmund Hruška over 1 year ago. Updated 8 months ago.

Status:
Closed
Priority:
High
Category:
-
Target version:
Start date:
04/26/2023
Due date:
% Done:

0%

Estimated time:
To be discussed:
No

Description

Approximately since the warden outage, the real-time processing modules mentat-storage and mentat.enricher are too slow to process the incoming messages. This happens irregularly, probably based on the amount of events coming from warden.


Files

difference.txt (32 KB) difference.txt Rajmund Hruška, 05/05/2023 02:56 PM
Screenshot from 2023-05-09 10-16-46.png (49.3 KB) Screenshot from 2023-05-09 10-16-46.png Rajmund Hruška, 05/09/2023 10:32 AM
Actions #1

Updated by Rajmund Hruška over 1 year ago

The first module in the pipeline is actually mentat-inspector. Surprisingly, this module doesn't have such performance issues.

Actions #2

Updated by Rajmund Hruška over 1 year ago

  • Assignee set to Rajmund Hruška
Actions #3

Updated by Rajmund Hruška over 1 year ago

  • Status changed from New to In Progress

The bottleneck is actually mentat-storage.py. In mentat-enricher, when 5000 messages limit in the output queue (input queue of mentat-storage) is reached, the enricher module is waiting until the amount of messages decreases below this limit. Therefore, mentat-enricher is mostly sleeping.

Actions #4

Updated by Rajmund Hruška over 1 year ago

Sometimes, mentat-storage.py is idle for a couple of minutes.

2023-05-05 10:43:44,656 mentat-storage.py [23692] INFO: Component 'storage': Stored message 'mentat-alt.3017.1683275939.364001.42.20144884.idea':'e53e18c4-9c86-4300-9bc1-5d122ea4ab06' into database (bulk mode).
2023-05-05 10:43:44,665 mentat-storage.py [23692] INFO: Component 'storage': Stored message 'mentat-alt.3017.1683275939.363383.42.20144878.idea':'dd8c921d-4808-4b48-8f15-c7e276a56efc' into database (bulk mode).
2023-05-05 10:43:44,707 mentat-storage.py [23692] INFO: Component 'storage': Stored message 'mentat-alt.3017.1683275939.361595.42.20144868.idea':'5f6e1099-c194-4296-921e-215cd5692232' into database (bulk mode).
2023-05-05 10:43:44,744 mentat-storage.py [23692] INFO: Component 'storage': Stored message 'mentat-alt.3017.1683275939.361945.42.20144869.idea':'0b45605d-8e55-4f0b-be37-346beb65732f' into database (bulk mode).
2023-05-05 10:43:44,786 mentat-storage.py [23692] INFO: Component 'storage': Stored message 'mentat-alt.3017.1683275939.364102.42.20144885.idea':'e82575bc-2859-4680-9f7d-9504ae329d61' into database (bulk mode).
2023-05-05 10:43:44,816 mentat-storage.py [23692] INFO: Component 'storage': Stored message 'mentat-alt.3017.1683275939.362930.42.20144876.idea':'09e5da1f-161b-4cdf-bc3b-2427ab0adb7e' into database (bulk mode).
2023-05-05 10:43:44,856 mentat-storage.py [23692] INFO: Component 'storage': Stored message 'mentat-alt.3017.1683275939.364796.42.20144887.idea':'887b5059-1f25-453d-b034-30654a86a43a' into database (bulk mode).
2023-05-05 10:43:44,891 mentat-storage.py [23692] INFO: Component 'storage': Stored message 'mentat-alt.3017.1683275939.365145.42.20144888.idea':'0eaf85ad-56e4-4370-8d43-cb37a20e3e8e' into database (bulk mode).
2023-05-05 10:43:44,973 mentat-storage.py [23692] INFO: Component 'storage': Stored message 'mentat-alt.3017.1683275939.364449.42.20144886.idea':'ec8bad9d-bf71-423f-9384-6d9dbcdab4ea' into database (bulk mode).
2023-05-05 10:43:44,973 mentat-storage.py [23692] INFO: Component 'storage': Commit timeout '5' elapsed, performing commit.
2023-05-05 10:43:45,022 mentat-storage.py [23692] INFO: Component 'storage': Stored message 'mentat-alt.3017.1683275939.365507.42.20144889.idea':'4687d6cb-8a3b-46a9-aee2-92328134c02f' into database (bulk mode).
2023-05-05 10:43:45,052 mentat-storage.py [23692] INFO: Component 'storage': Stored message 'mentat-alt.3017.1683275939.365849.42.20144890.idea':'91198035-2d1e-4ce5-9513-08cfb413deb8' into database (bulk mode).
2023-05-05 10:43:45,083 mentat-storage.py [23692] INFO: Component 'storage': Stored message 'mentat-alt.3017.1683275939.366309.42.20144891.idea':'326635c4-5e3e-4b5c-9b36-75010be88591' into database (bulk mode).
2023-05-05 10:46:59,537 mentat-storage.py [23692] INFO: Component 'storage': Stored message 'mentat-alt.3017.1683275939.366818.42.20144892.idea':'7612666a-1cc8-4e34-9206-d8cabcdd86d4' into database (bulk mode).
2023-05-05 10:46:59,537 mentat-storage.py [23692] INFO: Component 'storage': Commit timeout '5' elapsed, performing commit.
2023-05-05 10:46:59,539 mentat-storage.py [23692] INFO: Component 'filer': *** Processing statistics ***
    cnt_enqueued                       0 (        +0,     0.00 #/s)
    cnt_fetched                8,750,612 (      +508,     2.50 #/s)
    cnt_updated                        0 (        +0,     0.00 #/s)
    cnt_committed              8,750,611 (      +508,     2.50 #/s)
    cnt_banished                       0 (        +0,     0.00 #/s)
    cnt_canceled                       0 (        +0,     0.00 #/s)
    cnt_dispatched                     0 (        +0,     0.00 #/s)
    cnt_duplicated                     0 (        +0,     0.00 #/s)
    cnt_errors                         0 (        +0,     0.00 #/s)
2023-05-05 10:46:59,540 mentat-storage.py [23692] INFO: Component 'parser': *** Processing statistics ***
    cnt_parsed                 8,750,612 (      +508,     2.50 #/s)
    cnt_encoded                        0 (        +0,     0.00 #/s)
    cnt_errors                         0 (        +0,     0.00 #/s)
2023-05-05 10:46:59,540 mentat-storage.py [23692] INFO: Component 'storage': *** Processing statistics ***
    cnt_stored                 8,750,612 (      +508,     2.50 #/s)
    cnt_errors                         0 (        +0,     0.00 #/s)
    cnt_eci_timeout            8,495,612 (      +611,     3.01 #/s)
    cnt_eci_threshold            255,000 (        +0,     0.00 #/s)
    cnt_cis_timeout               26,918 (        +3,     0.01 #/s)
    cnt_cis_threshold                510 (        +0,     0.00 #/s)
2023-05-05 10:46:59,540 mentat-storage.py [23692] INFO: Component 'commiter': *** Processing statistics ***
    cnt_commits                8,750,612 (      +508,     2.50 #/s)
2023-05-05 10:46:59,541 mentat-storage.py [23692] INFO: Application runlog saved to file '/var/mentat/run/mentat-storage.py/202305031329.05c86.runlog'
2023-05-05 10:46:59,582 mentat-storage.py [23692] INFO: Component 'storage': Stored message 'mentat-alt.3017.1683275939.367627.42.20144894.idea':'ab5c7467-7e49-4d0f-b227-c90706c4b952' into database (bulk mode).
2023-05-05 10:46:59,624 mentat-storage.py [23692] INFO: Component 'storage': Stored message 'mentat-alt.3017.1683275939.367974.42.20144895.idea':'daba791d-e99a-4f12-8770-37abf34f71f6' into database (bulk mode).
2023-05-05 10:46:59,663 mentat-storage.py [23692] INFO: Component 'storage': Stored message 'mentat-alt.3017.1683275939.367277.42.20144893.idea':'9f24ec86-d28e-4179-bd50-cb9d9ee587a8' into database (bulk mode).
Actions #5

Updated by Rajmund Hruška over 1 year ago

Mentat-hub also has such problem. I checked the log from the past 6 days and I looked for commits which took longer than 5 seconds.

Here is a couple of the longer ones (the full list is in the attached file):

Difference: 0:04:13.735000
2023-04-30 18:38:04,090 mentat-storage.py [4248] INFO: Component 'storage': Stored message 'mentat-hub.3937.1682872674.215313.42.95108556.idea':'5d26766f-1b6c-4464-9a45-abbb2b1f0d8b' into database (bulk mode).
2023-04-30 18:42:17,825 mentat-storage.py [4248] INFO: Component 'storage': Stored message 'mentat-hub.3937.1682872674.213048.42.95108554.idea':'d8f041ee-1f2a-4c8c-a923-37db9db0e37a' into database (bulk mode).

Difference: 0:00:37.145000
2023-05-02 18:07:35,152 mentat-storage.py [4248] INFO: Component 'storage': Stored message 'mentat-hub.3937.1683043546.844466.42.104630444.idea':'9996e774-4258-4e34-bb48-7a8718b20e2d' into database (bulk mode).
2023-05-02 18:08:12,297 mentat-storage.py [4248] INFO: Component 'storage': Stored message 'mentat-hub.3937.1683043546.844523.42.104630445.idea':'7008923c-99cf-415f-9021-6312f003bf7e' into database (bulk mode).

Difference: 0:01:14.004000
2023-05-02 18:10:57,553 mentat-storage.py [4248] INFO: Component 'storage': Stored message 'mentat-hub.3937.1683043765.774413.42.104638710.idea':'5bc7bbf9-c9e9-49e4-b4f6-18dec1698ddf' into database (bulk mode).
2023-05-02 18:12:11,557 mentat-storage.py [4248] INFO: Component 'storage': Stored message 'mentat-hub.3937.1683043765.772501.42.104638709.idea':'0b2834ce-cf75-4586-8ce3-2465f48e8205' into database (bulk mode).

Difference: 0:03:15.174000
2023-05-04 06:24:21,431 mentat-storage.py [4248] INFO: Component 'storage': Stored message 'mentat-hub.3937.1683174131.527206.42.112273032.idea':'fc24b1bc-935f-4b50-824d-7b9c38f752cb' into database (bulk mode).
2023-05-04 06:27:36,605 mentat-storage.py [4248] INFO: Component 'storage': Stored message 'mentat-hub.3937.1683174131.529052.42.112273033.idea':'86b913c1-7b9e-4066-88d5-34e18f1339d2' into database (bulk mode).

Difference: 0:02:13.985000
2023-05-04 06:28:06,507 mentat-storage.py [4248] INFO: Component 'storage': Stored message 'mentat-hub.3937.1683174168.153324.42.112256350.idea':'78c428bd-dcbc-4785-a50e-353842d30c39' into database (bulk mode).
2023-05-04 06:30:20,492 mentat-storage.py [4248] INFO: Component 'storage': Stored message 'mentat-hub.3937.1683174168.153397.42.112256351.idea':'e327639f-f183-45c9-96ce-47fed564f5cf' into database (bulk mode).

Difference: 0:01:08.315000
2023-05-04 11:54:07,147 mentat-storage.py [4248] INFO: Component 'storage': Stored message 'mentat-hub.3937.1683194042.539961.42.113508866.idea':'4d2310e2-c85b-4691-8765-6964013cbd27' into database (bulk mode).
2023-05-04 11:55:15,462 mentat-storage.py [4248] INFO: Component 'storage': Stored message 'mentat-hub.3937.1683194042.540401.42.113508874.idea':'884da1eb-9e3a-4268-87bd-aa317fa16ec1' into database (bulk mode).

The respective IDEA events are:
https://mentat-hub.cesnet.cz/mentat/events/d8f041ee-1f2a-4c8c-a923-37db9db0e37a/show
https://mentat-hub.cesnet.cz/mentat/events/7008923c-99cf-415f-9021-6312f003bf7e/show
https://mentat-hub.cesnet.cz/mentat/events/0b2834ce-cf75-4586-8ce3-2465f48e8205/show
https://mentat-hub.cesnet.cz/mentat/events/86b913c1-7b9e-4066-88d5-34e18f1339d2/show
https://mentat-hub.cesnet.cz/mentat/events/e327639f-f183-45c9-96ce-47fed564f5cf/show
https://mentat-hub.cesnet.cz/mentat/events/884da1eb-9e3a-4268-87bd-aa317fa16ec1/show

Actions #6

Updated by Rajmund Hruška over 1 year ago

I think it's actually not related to the size of the stored event. For example event with id '049c4bc7-6b9f-4fa4-9067-5973ff79d194' was stored both on mentat-hub and mentat-alt.

I checked logs on mentat-hub

2023-05-09 09:35:59,147 mentat-storage.py [67933] INFO: Component 'storage': Stored message 'mentat-hub.3937.1683617735.348093.42.135677343.idea':'7c810a13-346d-48da-9f4b-9e074c30bacb' into database (bulk mode).
2023-05-09 09:35:59,162 mentat-storage.py [67933] INFO: Component 'storage': Stored message 'mentat-hub.3937.1683617735.348316.42.135677344.idea':'049c4bc7-6b9f-4fa4-9067-5973ff79d194' into database (bulk mode).
2023-05-09 09:35:59,177 mentat-storage.py [67933] INFO: Component 'storage': Stored message 'mentat-hub.3937.1683617735.348763.42.135677346.idea':'53f9accd-179f-4cfe-ae06-c939cdaffa92' into database (bulk mode).

and mentat-alt

2023-05-09 09:36:11,572 mentat-storage.py [5435] INFO: Component 'storage': Stored message 'mentat-alt.3017.1683617733.891316.42.48657867.idea':'3afcc9cc-b36c-44ea-a99f-a7180e1c477e' into database (bulk mode).
2023-05-09 09:36:11,592 mentat-storage.py [5435] INFO: Component 'storage': Stored message 'mentat-alt.3017.1683617733.896053.42.48657898.idea':'1a7c73b4-be52-4f0c-9de7-fc5b92c44bd5' into database (bulk mode).
2023-05-09 09:36:11,592 mentat-storage.py [5435] INFO: Component 'filer': Scheduling next queue check after '3' seconds
2023-05-09 09:36:11,593 mentat-storage.py [5435] INFO: Waiting for '2' seconds until next scheduled event
2023-05-09 09:36:13,593 mentat-storage.py [5435] INFO: Received wakeup signal (14)
2023-05-09 09:36:13,593 mentat-storage.py [5435] INFO: Waiting for '1' seconds until next scheduled event
2023-05-09 09:36:14,593 mentat-storage.py [5435] INFO: Received wakeup signal (14)
2023-05-09 09:36:14,593 mentat-storage.py [5435] INFO: Component 'storage': Commit timeout '5' elapsed, performing commit.

and later, a few minutes after stopping the module with mentat-controller

2023-05-09 10:18:14,746 mentat-storage.py [5435] WARNING: Received signal 'SIGINT' (2)
2023-05-09 10:18:14,770 mentat-storage.py [5435] INFO: Component 'storage': Stored message 'mentat-alt.3017.1683617733.896821.42.48657900.idea':'049c4bc7-6b9f-4fa4-9067-5973ff79d194' into database (bulk mode).
2023-05-09 10:18:14,770 mentat-storage.py [5435] INFO: Component 'storage': Commit timeout '5' elapsed, performing commit.
2023-05-09 10:18:14,771 mentat-storage.py [5435] INFO: Component 'filer': *** Processing statistics ***
2023-05-09 10:18:14,772 mentat-storage.py [5435] INFO: Component 'parser': *** Processing statistics ***
2023-05-09 10:18:14,772 mentat-storage.py [5435] INFO: Component 'storage': *** Processing statistics ***
2023-05-09 10:18:14,772 mentat-storage.py [5435] INFO: Component 'commiter': *** Processing statistics ***
2023-05-09 10:18:14,773 mentat-storage.py [5435] INFO: Application runlog saved to file '/var/mentat/run/mentat-storage.py/202305051102.01535.runlog'

I was also lucky to see the query in the database as shown in the attached screenshot.

Actions #7

Updated by Rajmund Hruška over 1 year ago

So, the issue was with the database. I should still check enricher, though, which seems to be the bottleneck. Maybe there is some low hanging fruit.

Actions #8

Updated by Rajmund Hruška over 1 year ago

I looked for some easy ways to improve enricher but I found next to nothing. The only reasonable thing I saw is using country database instead of city database in geoip plugin. It only gives about 0.5 miliseconds of improvement per event, but it's a trivial change which also makes more sense, so I would go for it. And in a day it can save about 25 minutes.

Actions #9

Updated by Pavel Kácha over 1 year ago

  • Target version changed from 2.11 to Backlog
Actions #10

Updated by Rajmund Hruška 8 months ago

  • Status changed from In Progress to Closed

No longer an issue on the new mentat-hub.

Actions

Also available in: Atom PDF