Bug #7121: Spool dir is sometimes created with wrong privileges on start - Mentat - Homeproj: Redmine for CESNET



Added by Pavel Kácha almost 4 years ago. Updated 8 months ago.

Status: Closed
Priority: Normal
Assignee: Rajmund Hruška
Category: Development - Core
Target version: Backlog
Start date: 03/11/2021
Due date:
% Done: 100%
Estimated time:
To be discussed:

Description

After a cold start (after a reboot), when /var/mentat/spool is empty, the mentat-enricher.py directory was created with the wrong ownership: root:root.

All the other directories were fine (mentat:mentat).

This causes an outage during startup, because the previous daemon in the processing chain (mentat-inspector-b in our case) cannot write its events into that directory.
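To check whether a host is in the state described above, one could list the per-daemon queue directories and compare their owner and group with the expected account. The sketch below is a hypothetical diagnostic helper, not part of Mentat; it assumes each daemon has its own directory directly under the spool root (e.g. /var/mentat/spool/mentat-enricher.py) and that the expected owner is passed in as a numeric UID/GID.

```python
import os


def find_misowned_dirs(spool_root, expected_uid, expected_gid):
    """Return (path, uid, gid) for every queue directory directly under
    spool_root whose owner or group differs from the expected IDs.

    Hypothetical diagnostic helper; not part of Mentat itself.
    """
    misowned = []
    if not os.path.isdir(spool_root):
        # Nothing to check, e.g. the spool root was not created yet.
        return misowned
    with os.scandir(spool_root) as entries:
        # Sort by name so the report is stable between runs.
        for entry in sorted(entries, key=lambda e: e.name):
            # Assumption: each daemon's queue is a plain directory, so
            # files and symlinks are skipped.
            if not entry.is_dir(follow_symlinks=False):
                continue
            st = entry.stat(follow_symlinks=False)
            if st.st_uid != expected_uid or st.st_gid != expected_gid:
                misowned.append((entry.path, st.st_uid, st.st_gid))
    return misowned
```

On a real system the expected IDs could be resolved with `pwd.getpwnam("mentat").pw_uid` and `grp.getgrnam("mentat").gr_gid`, assuming both the account and the group are named mentat. Any entry returned, for example one owned by UID 0 (root), would point at the problem reported here.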

Related issues


Updated by Radko Krkoš over 3 years ago

Related to Feature #4447: System status monitor should also report on warden-filer status added


Updated by Radko Krkoš over 3 years ago

Related to Config #4723: Access permisions prevent warden-filer start after system reboot added


Updated by Jan Mach about 3 years ago

Category set to Development - Core
Status changed from New to In Progress
Target version changed from Backlog to 2.9


Updated by Jan Mach about 3 years ago

Status changed from In Progress to Feedback
% Done changed from 0 to 100
To be discussed changed from No to Yes

I was unable to reproduce the problem locally, so I have chosen a different approach to fixing this bug:

I have enforced correct user/group ownership and permissions on the queue work directories by calling chown and chmod on them.
I have enhanced the logging around the creation of all queue work directories, so that if this happens again in the future we can investigate the problem better. There is an intentional unhandled exception with a traceback to help us locate the source of the problem. Both EUID and EGID are logged.
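The two changes above (enforcing ownership and permissions, and logging the effective IDs with a traceback on failure) could look roughly like the following. This is a minimal sketch, not the code that was merged: the function name `ensure_queue_dir`, the logger name and the default mode 0o775 are assumptions made for illustration. On failure it logs the traceback together with EUID and EGID and then re-raises, so the error still propagates as an unhandled exception.

```python
import logging
import os

# Hypothetical logger name; Mentat's real logging setup may differ.
LOG = logging.getLogger("mentat.queue")


def ensure_queue_dir(path, uid, gid, mode=0o775):
    """Create a queue work directory if missing and enforce its ownership
    and permissions. Hypothetical helper; the default mode is an assumption.
    """
    LOG.info("Preparing queue directory %s (EUID=%d, EGID=%d)",
             path, os.geteuid(), os.getegid())
    try:
        # exist_ok=True tolerates the directory already existing, including
        # when another daemon creates it concurrently.
        os.makedirs(path, exist_ok=True)
        # Enforce ownership and permissions even if the directory existed
        # before, so a wrongly owned leftover is corrected when possible.
        os.chown(path, uid, gid)
        os.chmod(path, mode)
    except OSError:
        # Log the traceback with the effective IDs, then re-raise so the
        # failure is not silently swallowed.
        LOG.exception("Unable to prepare queue directory %s (EUID=%d, EGID=%d)",
                      path, os.geteuid(), os.getegid())
        raise
    st = os.stat(path)
    LOG.info("Queue directory %s ready: uid=%d gid=%d mode=%o",
             path, st.st_uid, st.st_gid, st.st_mode & 0o7777)
    return path
```

Calling chown unconditionally, not only for a freshly created directory, means a leftover root:root directory is fixed on the next start. Changing ownership generally requires root, though, so this repair only works if the code runs before the daemon drops privileges; otherwise chown raises PermissionError, which is logged together with EUID and EGID.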

Unless someone can think of anything else to help us cover our a**es, I suggest we merge this into the devel branch ASAP and deploy it to mentat-alt, so that it runs in a live environment and hopefully catches the next occurrence of this problem.

We might consider this bug resolved and close the task until the problem emerges again. If it does, I will gather as much information as possible, including the relevant log lines, and create a new issue.


Updated by Pavel Kácha about 3 years ago

From today's meeting:

As this is not reproducible in the dev environment, please use mentat-alt and try to hunt it down on real iron. (Do the first reboot, or a couple of them, without your patches to confirm it is reproducible there.)


Updated by Jan Mach about 3 years ago

As agreed, I have tried to reproduce the bug on mentat-alt using multiple restarts. I was not able to reproduce it, either before or after updating the code with the attached patch. The system booted correctly and both Mentat and the Warden client launched perfectly every time (except for the first try, when a minor bug in the patch, unrelated to the original problem, prevented mentat-storage from starting).

If this bug reappears in the future, the enhanced logging might give us a better understanding of the problem, but at the moment I am not sure what to do next with this issue.


Updated by Pavel Kácha about 3 years ago

So let's set to deferred and reopen if issue reappears?


Updated by Jan Mach about 3 years ago

Pavel Kácha wrote in #note-7:

So let's set to deferred and reopen if issue reappears?

Your call. I suggest closing it, because I feel optimistic: enforcing the queue directory ownership should work. If the issue reappears, we can gather more evidence and log information, file a new issue and link it back to this one. If we just set it to deferred, we will keep pushing it ahead of us for god knows how long.


Updated by Pavel Kácha about 3 years ago

Status changed from Feedback to Closed

Jan Mach wrote in #note-8:

Pavel Kácha wrote in #note-7:

So let's set to deferred and reopen if issue reappears?

Your call. I suggest closing it, because I feel optimistic: enforcing the queue directory ownership should work. If the issue reappears, we can gather more evidence and log information, file a new issue and link it back to this one. If we just set it to deferred, we will keep pushing it ahead of us for god knows how long.

No strong opinion. It's merged and deployed, so closing it.


Updated by Pavel Kácha about 3 years ago

To be discussed deleted (Yes)


Updated by Pavel Kácha over 2 years ago

Status changed from Closed to New
Assignee changed from Jan Mach to Rajmund Hruška
Target version changed from 2.9 to Backlog

It seems the issue still persists: it happened at least on 2022-07-19, somewhere between 11:00 and 12:00, when the enricher was unable to create the /var/mentat/spool/mentat-enricher.py directory.


Updated by Pavel Kácha 8 months ago

Status changed from New to Closed

No occurrence since; let's reopen if it crops up again.
