Bug #7624: precache module throws error when database is shutting down - Mentat - Homeproj: Redmine for CESNET


Bug #7624

closed

precache module throws error when database is shutting down

Added by Rajmund Hruška almost 2 years ago. Updated over 1 year ago.

Status: Closed
Priority: Normal
Assignee: Rajmund Hruška
Category: Development - Core
Target version: 2.11
Start date: 01/31/2023
Due date:
% Done: 100%
Estimated time:
To be discussed:

Description

This happens quite often when I turn on maintenance mode while upgrading Mentat on mentat-alt.

2022-11-11 13:31:39,218 ERROR: Unable to fetch item set 'itemset-stat-detectors' from database: connection to server at "localhost" (::1), port 5432 failed: FATAL:  the database system is shutting down
2022-11-11 13:31:39,224 ERROR: Unable to fetch item set 'itemset-stat-detectortypes' from database: connection to server at "localhost" (::1), port 5432 failed: FATAL:  the database system is shutting down
2022-11-11 13:31:39,230 ERROR: Unable to fetch item set 'itemset-stat-protocols' from database: connection to server at "localhost" (::1), port 5432 failed: FATAL:  the database system is shutting down
2022-11-11 13:31:39,236 ERROR: Unable to fetch item set 'itemset-stat-groups' from database: connection to server at "localhost" (::1), port 5432 failed: FATAL:  the database system is shutting down
2022-11-11 13:31:39,242 ERROR: Unable to fetch item set 'itemset-stat-classes' from database: connection to server at "localhost" (::1), port 5432 failed: FATAL:  the database system is shutting down
2022-11-11 13:31:39,247 ERROR: Unable to fetch item set 'itemset-stat-severities' from database: connection to server at "localhost" (::1), port 5432 failed: FATAL:  the database system is shutting down
2022-11-11 13:31:39,253 ERROR: Unable to fetch item set 'itemset-stat-inspectionerrors' from database: connection to server at "localhost" (::1), port 5432 failed: FATAL:  the database system is shutting down

I think it's quite annoying to get these emails.
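One way to quiet these emails at the logging level, which is not the fix adopted in this ticket, would be to log a shutdown-related connection failure at WARNING instead of ERROR. The sketch below is only an illustration: `SHUTDOWN_MARKER`, `is_shutdown_error` and `log_fetch_failure` are hypothetical names, not part of Mentat, and it assumes that only ERROR-level records trigger the emails.

```python
import logging

# Message fragment copied from the log lines above.
SHUTDOWN_MARKER = "the database system is shutting down"


def is_shutdown_error(exc: Exception) -> bool:
    """Return True if the exception text reports a server shutdown."""
    return SHUTDOWN_MARKER in str(exc)


def log_fetch_failure(logger: logging.Logger, itemset: str, exc: Exception) -> int:
    """Log a failed item set fetch and return the level that was used.

    Hypothetical helper, not part of Mentat: a shutdown is expected
    during maintenance, so it is logged as WARNING rather than ERROR.
    """
    level = logging.WARNING if is_shutdown_error(exc) else logging.ERROR
    logger.log(level, "Unable to fetch item set '%s' from database: %s", itemset, exc)
    return level
```

Matching on the message text is fragile; it is used here only because that text is the one detail the log lines above confirm.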


Updated by Rajmund Hruška almost 2 years ago

Subject changed from prechace module throws error when database is shutting down to precache module throws error when database is shutting down


Updated by Rajmund Hruška over 1 year ago

Status changed from New to In Progress
Assignee set to Rajmund Hruška
To be discussed changed from No to Yes

I encounter this error when upgrading mentat-alt. I use the conf/scripts/maintenance-mode.sh script. In addition to steps 0 and 1 of the upgrading recipe at https://alchemist.cesnet.cz/mentat/doc/production/html/_doclib/upgrading.html#upgrading-mentat-system, it also restarts the database. Restarting the database is usually not required for upgrading, so a simple solution would be to remove the database restart from the script.


Updated by Rajmund Hruška over 1 year ago

precache is a post-processing module that is run by cron. mentat-controller --command stop only stops real-time modules, so mentat-controller doesn't wait for precache.py to finish.

The precache module usually runs for about 7.5 minutes, but sometimes it runs for more than 10 minutes. The cron job runs every 10 minutes, and I think this module (precache) is not protected against running multiple times, so there might be some race conditions.

This might be an issue for multiple post-processing modules. For example, mentat-reporter can also run for more than 10 minutes, but it is protected by a file mutex, so the overlapping run fails with an error:

Traceback (most recent call last):
  File "/var/mentat/venv/bin/mentat-reporter.py", line 51, in <module>
    MentatReporterScript().run()
  File "/var/mentat/venv/lib/python3.7/site-packages/pyzenkit/baseapp.py", line 1559, in run
    self._stage_process()
  File "/var/mentat/venv/lib/python3.7/site-packages/pyzenkit/baseapp.py", line 1472, in _stage_process
    self._sub_stage_process()
  File "/var/mentat/venv/lib/python3.7/site-packages/pyzenkit/zenscript.py", line 353, in _sub_stage_process
    self.execute_script_command(cmdname)
  File "/var/mentat/venv/lib/python3.7/site-packages/pyzenkit/zenscript.py", line 410, in execute_script_command
    self.runlog[command_name] = cbk()  # pylint: disable=locally-disabled,not-callable
  File "/var/mentat/venv/lib/python3.7/site-packages/mentat/module/reporter.py", line 343, in cbk_command_report
    with SimpleFlock("/var/tmp/mentat-reporter.py", 5):
  File "/var/mentat/venv/lib/python3.7/site-packages/mentat/module/reporter.py", line 119, in __enter__
    fcntl.flock(self._fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
BlockingIOError: [Errno 11] Resource temporarily unavailable
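The traceback shows a non-blocking exclusive `fcntl.flock` call raising `BlockingIOError` when a previous run still holds the lock. A minimal sketch of the same locking technique that skips the overlapping run instead of crashing could look like this. `AlreadyRunning`, `single_instance` and `run_once` are hypothetical names for illustration, not Mentat's `SimpleFlock` implementation, and the sketch assumes a POSIX system.

```python
import fcntl
import os
from contextlib import contextmanager


class AlreadyRunning(Exception):
    """Raised when another run already holds the lock file."""


@contextmanager
def single_instance(lock_path):
    """Hold an exclusive, non-blocking flock on lock_path for the block."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            # LOCK_NB makes flock fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError as exc:
            raise AlreadyRunning(lock_path) from exc
        yield
    finally:
        # Closing the descriptor also releases the lock if it was taken.
        os.close(fd)


def run_once(lock_path, job):
    """Run job() under the lock; return False if another run holds it."""
    try:
        with single_instance(lock_path):
            job()
    except AlreadyRunning:
        return False
    return True
```

A cron-driven script could call `run_once` with its own lock path and exit quietly when it returns `False`, so a slow previous run would not produce a traceback.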


Updated by Rajmund Hruška over 1 year ago

Related to Bug #7644: Some post-processing modules are not protected against race conditions added


Updated by Rajmund Hruška over 1 year ago

Status changed from In Progress to Resolved
Target version changed from Backlog to 2.11
% Done changed from 0 to 100
To be discussed deleted (~~Yes~~)

I have removed the database restart from the maintenance script.

I have created a new issue #7644 for the race conditions.

I will change the precache cron to run every hour instead of every 10 minutes. At first, I will just change it manually on mentat-alt. But I don't think this will help with the total execution time. I might be wrong though.


Updated by Rajmund Hruška over 1 year ago

Status changed from Resolved to In Review


Updated by Rajmund Hruška over 1 year ago

Rajmund Hruška wrote in #note-5:

I will change the precache cron to run every hour instead of every 10 minutes. At first, I will just change it manually on mentat-alt. But I don't think this will help with the total execution time. I might be wrong though.

I was indeed wrong. On mentat-alt, the module now runs every hour and each run takes about 10 minutes. So I think it would be a good idea to change the configuration file in the repository.


Updated by Rajmund Hruška over 1 year ago

Status changed from In Review to Closed
