Bug #7624
Closed
precache module throws error when database is shutting down
100%
Description
This happens quite often when I turn on maintenance mode while upgrading Mentat on mentat-alt.
2022-11-11 13:31:39,218 ERROR: Unable to fetch item set 'itemset-stat-detectors' from database: connection to server at "localhost" (::1), port 5432 failed: FATAL: the database system is shutting down
2022-11-11 13:31:39,224 ERROR: Unable to fetch item set 'itemset-stat-detectortypes' from database: connection to server at "localhost" (::1), port 5432 failed: FATAL: the database system is shutting down
2022-11-11 13:31:39,230 ERROR: Unable to fetch item set 'itemset-stat-protocols' from database: connection to server at "localhost" (::1), port 5432 failed: FATAL: the database system is shutting down
2022-11-11 13:31:39,236 ERROR: Unable to fetch item set 'itemset-stat-groups' from database: connection to server at "localhost" (::1), port 5432 failed: FATAL: the database system is shutting down
2022-11-11 13:31:39,242 ERROR: Unable to fetch item set 'itemset-stat-classes' from database: connection to server at "localhost" (::1), port 5432 failed: FATAL: the database system is shutting down
2022-11-11 13:31:39,247 ERROR: Unable to fetch item set 'itemset-stat-severities' from database: connection to server at "localhost" (::1), port 5432 failed: FATAL: the database system is shutting down
2022-11-11 13:31:39,253 ERROR: Unable to fetch item set 'itemset-stat-inspectionerrors' from database: connection to server at "localhost" (::1), port 5432 failed: FATAL: the database system is shutting down
I think it's quite annoying to get these emails.
Related issues
Updated by Rajmund Hruška almost 2 years ago
- Subject changed from prechace module throws error when database is shutting down to precache module throws error when database is shutting down
Updated by Rajmund Hruška over 1 year ago
- Status changed from New to In Progress
- Assignee set to Rajmund Hruška
- To be discussed changed from No to Yes
I encounter this error when upgrading mentat-alt. I use the conf/scripts/maintenance-mode.sh script. In addition to steps 0 and 1 of the upgrading recipe at https://alchemist.cesnet.cz/mentat/doc/production/html/_doclib/upgrading.html#upgrading-mentat-system, it also restarts the database. Restarting the database is usually not required for upgrading, so a simple solution would be to remove the database restart from the script.
Updated by Rajmund Hruška over 1 year ago
precache is a post-processing module that is run by cron. mentat-controller --command stop only stops real-time modules, so mentat-controller doesn't wait for precache.py to finish.
The precache module usually runs for about 7.5 minutes, but sometimes it runs for more than 10 minutes. The cron job runs every 10 minutes, and I think this module (precache) is not protected against running multiple times, so there might be some race conditions.
This might be an issue for multiple post-processing modules. For example, mentat-reporter can also run for more than 10 minutes, but there is a file mutex, so the other run results in an error:
Traceback (most recent call last):
  File "/var/mentat/venv/bin/mentat-reporter.py", line 51, in <module>
    MentatReporterScript().run()
  File "/var/mentat/venv/lib/python3.7/site-packages/pyzenkit/baseapp.py", line 1559, in run
    self._stage_process()
  File "/var/mentat/venv/lib/python3.7/site-packages/pyzenkit/baseapp.py", line 1472, in _stage_process
    self._sub_stage_process()
  File "/var/mentat/venv/lib/python3.7/site-packages/pyzenkit/zenscript.py", line 353, in _sub_stage_process
    self.execute_script_command(cmdname)
  File "/var/mentat/venv/lib/python3.7/site-packages/pyzenkit/zenscript.py", line 410, in execute_script_command
    self.runlog[command_name] = cbk() # pylint: disable=locally-disabled,not-callable
  File "/var/mentat/venv/lib/python3.7/site-packages/mentat/module/reporter.py", line 343, in cbk_command_report
    with SimpleFlock("/var/tmp/mentat-reporter.py", 5):
  File "/var/mentat/venv/lib/python3.7/site-packages/mentat/module/reporter.py", line 119, in __enter__
    fcntl.flock(self._fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
BlockingIOError: [Errno 11] Resource temporarily unavailable
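For illustration, here is a minimal sketch of how a cron-driven module could take the same kind of non-blocking file lock and skip the run quietly instead of failing with a traceback. This is not the actual Mentat or SimpleFlock code: the names single_instance, AlreadyRunning and run_once, and the lock path, are hypothetical. Only fcntl.flock with LOCK_EX | LOCK_NB is taken from the traceback above.

```python
import fcntl
import os
import sys
from contextlib import contextmanager


class AlreadyRunning(Exception):
    """Raised when another run already holds the lock file (hypothetical name)."""


@contextmanager
def single_instance(lock_path):
    """Hold an exclusive, non-blocking advisory lock while the block runs."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # Same call as in the traceback: fail immediately if the lock is taken.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError as exc:
            raise AlreadyRunning(lock_path) from exc
        yield
    finally:
        # Closing the descriptor also releases the lock if we held it.
        os.close(fd)


def run_once(lock_path, work):
    """Run work() under the lock; return False if a previous run is still active."""
    try:
        with single_instance(lock_path):
            work()
    except AlreadyRunning:
        print("previous run still in progress, skipping", file=sys.stderr)
        return False
    return True


if __name__ == "__main__":
    # Hypothetical lock path, chosen only for this example.
    run_once("/var/tmp/precache-example.lock", lambda: None)
```

With something like this, an overlapping cron run would log one line and exit instead of producing an error email. Whether skipping is the right behaviour for every post-processing module is a separate question.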
Updated by Rajmund Hruška over 1 year ago
- Related to Bug #7644: Some post-processing modules are not protected against race conditions added
Updated by Rajmund Hruška over 1 year ago
- Status changed from In Progress to Resolved
- Target version changed from Backlog to 2.11
- % Done changed from 0 to 100
- To be discussed deleted (Yes)
I have removed the database restart from the maintenance script.
I have created a new issue #7644 for the race conditions.
I will change the precache cron to run every hour instead of every 10 minutes. At first, I will just change it manually on mentat-alt. But I don't think this will help with the total execution time. I might be wrong though.
Updated by Rajmund Hruška over 1 year ago
- Status changed from Resolved to In Review
Updated by Rajmund Hruška over 1 year ago
Rajmund Hruška wrote in #note-5:
I will change the precache cron to run every hour instead of every 10 minutes. At first, I will just change it manually on mentat-alt. But I don't think this will help with the total execution time. I might be wrong though.
I was indeed wrong. On mentat-alt, the module is now running every hour and the runtime is about 10 minutes. So, I think it would be a good idea to change the configuration file in the repository.
Updated by Rajmund Hruška over 1 year ago
- Status changed from In Review to Closed