Project

General

Profile

Actions

Bug #7733

closed

mentat-storage may break during shutdown

Added by Pavel Kácha 8 months ago. Updated 6 months ago.

Status:
Closed
Priority:
Normal
Category:
Development - Core
Target version:
Start date:
05/02/2024
Due date:
% Done:

0%

Estimated time:
To be discussed:
No

Description

Found (and forgotten) during server upgrade - when shutting down, storage lost (or reused after remove) a savepoint.

2024-03-11 10:11:32,577 mentat-storage.py [1499] INFO: Component 'storage': Committing all pending messages and switching to immediate commit mode.
2024-03-11 10:11:32,577 mentat-storage.py [1499] ERROR: Exception <class 'mentat.services.eventstorage.EventStorageException'>: Savepoint does not exist in
transaction.
2024-03-11 10:11:32,581 mentat-storage.py [1499] ERROR:
  File "/var/mentat/venv/lib/python3.11/site-packages/pyzenkit/zendaemon.py", line 1209, in _sub_stage_process
        self._event_loop()
  File "/var/mentat/venv/lib/python3.11/site-packages/pyzenkit/zendaemon.py", line 1188, in _event_loop
        (flag, args) = handler(self, args)
                                   ^^^^^^^^^^^^^^^^^^^
  File "/var/mentat/venv/lib/python3.11/site-packages/mentat/daemon/component/storage.py", line 238, in cbk_event_stop
        self._commit_pending()
  File "/var/mentat/venv/lib/python3.11/site-packages/mentat/daemon/component/storage.py", line 323, in _commit_pending
        self.event_service.commit_bulk()
  File "/var/mentat/venv/lib/python3.11/site-packages/mentat/services/eventstorage.py", line 1010, in wrapped_f
        return func(other_self, *args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/mentat/venv/lib/python3.11/site-packages/mentat/services/eventstorage.py", line 1097, in commit_bulk
        self.savepoint_release()
  File "/var/mentat/venv/lib/python3.11/site-packages/mentat/services/eventstorage.py", line 1010, in wrapped_f
        return func(other_self, *args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/mentat/venv/lib/python3.11/site-packages/mentat/services/eventstorage.py", line 1131, in savepoint_release
        raise EventStorageException("Savepoint does not exist in transaction.")

Actions #1

Updated by Rajmund Hruška 6 months ago

  • Status changed from New to In Progress
  • Assignee set to Rajmund Hruška
  • Target version changed from Backlog to 2.13

I think I know what the issue was and luckily, we didn't lose any events.

To reproduce this bug, try stopping the storage just after this line appears in the log:

2024-06-13 11:28:50,289 mentat-storage.py [5269] INFO: Component 'storage': Commit timeout '5' elapsed, performing commit.

That is produced by this function:

def cbk_event_database_commit(self, daemon, args):
    """ 
    Check, whether there are any messages waiting to be committed for greater than
    configured time period. Commit them if necessary to avoid loss of data.
    """ 
    daemon.logger.debug(
        "Component '{}': Checking whether commit needs to be enforced".format(
            self.cid
        )
    )
    if self.events_uncommitted and ((time.time() - self.last_commit) > self.commit_bulkintv):
        daemon.logger.info(
            "Component '{}': Commit timeout '{}' elapsed, performing commit.".format(
                self.cid,
                self.commit_bulkintv
            )
        )
        self._commit_pending()

here is _commit_pending:

def _commit_pending(self):
    self.event_service.commit_bulk()
    self.inc_statistic(self.STATS_CNT_COMMIT_TIMEOUT, self.events_uncommitted)
    self.inc_statistic(self.STATS_CNT_COMMITS_TIMEOUT)
    self.events_uncommitted = 0
    self.last_commit = time.time()

and commit_bulk:

def commit_bulk(self):
    """ 
    Release and commit currently pending savepoint changes.
    """ 
    self.savepoint_release()
    self.commit()

Now in the function handling stopping of the storage:

def cbk_event_stop(self, daemon, args):
    """ 
    Stop the component.
    """ 
    daemon.logger.debug(
        "Component '{}': Stopping the component".format(
            self.cid
        )
    )
    # In case we are running in bulk commit mode.
    if self.commit_bulk:
        daemon.logger.info(
            "Component '{}': Committing all pending messages and switching to immediate commit mode.".format(
                self.cid
            )
        )
        # Commit all currently pending IDEA messages.
        self._commit_pending()
        # Switch to immediate commit mode for the rest of the messages in the queue.
        self._setup_insert_now(daemon)

one can see that _commit_pending is once again called resulting in calling savepoint_release twice.

Actions #2

Updated by Rajmund Hruška 6 months ago

  • Status changed from In Progress to Resolved
Actions #3

Updated by Rajmund Hruška 6 months ago

  • Status changed from Resolved to In Review
Actions #4

Updated by Rajmund Hruška 6 months ago

  • Status changed from In Review to Closed
Actions

Also available in: Atom PDF