Bug #7624

closed

precache module throws error when database is shutting down

Added by Rajmund Hruška almost 2 years ago. Updated over 1 year ago.

Status: Closed
Priority: Normal
Category: Development - Core
Target version:
Start date: 01/31/2023
Due date:
% Done: 100%
Estimated time:
To be discussed:

Description

This happens quite often when I turn on maintenance mode while upgrading Mentat on mentat-alt:

2022-11-11 13:31:39,218 ERROR: Unable to fetch item set 'itemset-stat-detectors' from database: connection to server at "localhost" (::1), port 5432 failed: FATAL:  the database system is shutting down
2022-11-11 13:31:39,224 ERROR: Unable to fetch item set 'itemset-stat-detectortypes' from database: connection to server at "localhost" (::1), port 5432 failed: FATAL:  the database system is shutting down
2022-11-11 13:31:39,230 ERROR: Unable to fetch item set 'itemset-stat-protocols' from database: connection to server at "localhost" (::1), port 5432 failed: FATAL:  the database system is shutting down
2022-11-11 13:31:39,236 ERROR: Unable to fetch item set 'itemset-stat-groups' from database: connection to server at "localhost" (::1), port 5432 failed: FATAL:  the database system is shutting down
2022-11-11 13:31:39,242 ERROR: Unable to fetch item set 'itemset-stat-classes' from database: connection to server at "localhost" (::1), port 5432 failed: FATAL:  the database system is shutting down
2022-11-11 13:31:39,247 ERROR: Unable to fetch item set 'itemset-stat-severities' from database: connection to server at "localhost" (::1), port 5432 failed: FATAL:  the database system is shutting down
2022-11-11 13:31:39,253 ERROR: Unable to fetch item set 'itemset-stat-inspectionerrors' from database: connection to server at "localhost" (::1), port 5432 failed: FATAL:  the database system is shutting down

I think it's quite annoying to get these emails.


Related issues

Related to Mentat - Bug #7644: Some post-processing modules are not protected against race conditions (Closed, Rajmund Hruška, 04/17/2023)

Actions #1

Updated by Rajmund Hruška over 1 year ago

  • Subject changed from prechace module throws error when database is shutting down to precache module throws error when database is shutting down
Actions #2

Updated by Rajmund Hruška over 1 year ago

  • Status changed from New to In Progress
  • Assignee set to Rajmund Hruška
  • To be discussed changed from No to Yes

I encounter this error when upgrading mentat-alt with the conf/scripts/maintenance-mode.sh script. In addition to performing steps 0 and 1 of the upgrading recipe at https://alchemist.cesnet.cz/mentat/doc/production/html/_doclib/upgrading.html#upgrading-mentat-system, the script also restarts the database. Restarting the database is usually not required for upgrading, so a simple solution would be to remove the database restart from the script.

Actions #3

Updated by Rajmund Hruška over 1 year ago

precache is a post-processing module that is run by cron. mentat-controller --command stop only stops real-time modules, so mentat-controller doesn't wait for precache.py to finish.

The precache module usually runs for about 7.5 minutes, but sometimes it runs for more than 10 minutes. The cron job runs every 10 minutes, and I think precache is not protected against running multiple times concurrently, so there might be some race conditions.

This might be an issue for multiple post-processing modules. For example, mentat-reporter can also run for more than 10 minutes, but it uses a file mutex, so the overlapping second run fails with an error:

Traceback (most recent call last):
  File "/var/mentat/venv/bin/mentat-reporter.py", line 51, in <module>
    MentatReporterScript().run()
  File "/var/mentat/venv/lib/python3.7/site-packages/pyzenkit/baseapp.py", line 1559, in run
    self._stage_process()
  File "/var/mentat/venv/lib/python3.7/site-packages/pyzenkit/baseapp.py", line 1472, in _stage_process
    self._sub_stage_process()
  File "/var/mentat/venv/lib/python3.7/site-packages/pyzenkit/zenscript.py", line 353, in _sub_stage_process
    self.execute_script_command(cmdname)
  File "/var/mentat/venv/lib/python3.7/site-packages/pyzenkit/zenscript.py", line 410, in execute_script_command
    self.runlog[command_name] = cbk()  # pylint: disable=locally-disabled,not-callable
  File "/var/mentat/venv/lib/python3.7/site-packages/mentat/module/reporter.py", line 343, in cbk_command_report
    with SimpleFlock("/var/tmp/mentat-reporter.py", 5):
  File "/var/mentat/venv/lib/python3.7/site-packages/mentat/module/reporter.py", line 119, in __enter__
    fcntl.flock(self._fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
BlockingIOError: [Errno 11] Resource temporarily unavailable
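
For illustration, the kind of protection discussed above can be sketched with a non-blocking flock guard that skips a run instead of raising when a previous run still holds the lock. This is only a minimal sketch, not Mentat's SimpleFlock: the RunLock class, the run_exclusively helper and the lock path in the usage note are hypothetical names, and only the fcntl.flock call with LOCK_EX | LOCK_NB is taken from the traceback.

```python
import fcntl
import os


class RunLock:
    """Hypothetical single-instance guard built on flock (not Mentat's SimpleFlock)."""

    def __init__(self, path):
        self._path = path
        self._fd = None

    def acquire(self):
        """Try to take an exclusive lock without waiting; return True on success."""
        fd = os.open(self._path, os.O_CREAT | os.O_RDWR, 0o644)
        try:
            # Same flags as in the traceback: exclusive and non-blocking.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            # Another open file description already holds the lock.
            os.close(fd)
            return False
        self._fd = fd
        return True

    def release(self):
        """Drop the lock if this object holds it."""
        if self._fd is not None:
            fcntl.flock(self._fd, fcntl.LOCK_UN)
            os.close(self._fd)
            self._fd = None


def run_exclusively(lock_path, job):
    """Run job() only if the lock is free; return (ran, result)."""
    lock = RunLock(lock_path)
    if not lock.acquire():
        return False, None
    try:
        return True, job()
    finally:
        lock.release()
```

A cron-driven script could wrap its main work as run_exclusively("/var/tmp/some-module.lock", main) (the path is an assumption) and exit quietly when the first element of the result is False, so an overlapping run would neither race with the previous one nor produce an error email.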
Actions #4

Updated by Rajmund Hruška over 1 year ago

  • Related to Bug #7644: Some post-processing modules are not protected against race conditions added
Actions #5

Updated by Rajmund Hruška over 1 year ago

  • Status changed from In Progress to Resolved
  • Target version changed from Backlog to 2.11
  • % Done changed from 0 to 100
  • To be discussed deleted (Yes)

I have removed the database restart from the maintenance script.

I have created a new issue #7644 for the race conditions.

I will change the precache cron to run every hour instead of every 10 minutes. At first, I will just change it manually on mentat-alt. But I don't think this will help with the total execution time. I might be wrong though.

Actions #6

Updated by Rajmund Hruška over 1 year ago

  • Status changed from Resolved to In Review
Actions #7

Updated by Rajmund Hruška over 1 year ago

Rajmund Hruška wrote in #note-5:

I will change the precache cron to run every hour instead of every 10 minutes. At first, I will just change it manually on mentat-alt. But I don't think this will help with the total execution time. I might be wrong though.

I was indeed wrong. On mentat-alt, the module now runs every hour and its runtime is about 10 minutes. So, I think it would be a good idea to change the configuration file in the repository.

Actions #8

Updated by Rajmund Hruška over 1 year ago

  • Status changed from In Review to Closed