Bug #7759: Reporter doesn't update thresholds in some cases - Mentat - Homeproj: Redmine for CESNET

Bug #7759

closed

Reporter doesn't update thresholds in some cases

Added by Rajmund Hruška 5 months ago. Updated 4 months ago.

Status:

Closed

Priority:

Normal

Assignee:

Jakub Judiny

Category:

Development - Core

Target version:

2.13.2

Start date:

07/12/2024

Description

If some events are already thresholded, the TTL for events with the same source and event class will not be updated.

Files

relapsed.txt (8.9 KB)

Rajmund Hruška, 07/18/2024 11:12 AM

Updated by Rajmund Hruška 5 months ago

Subject changed from Reporter doesn't update thresholds in some cases to Reporter always reports relapsed events
Description updated (diff)
Status changed from In Progress to New
Assignee deleted (~~Rajmund Hruška~~)

Updated by Rajmund Hruška 5 months ago

File relapsed.txt added

Updated by Pavel Kácha 5 months ago

An attempt to describe the reporting mechanism is here.

Attempt in my words:

When a report is sent for a particular severity:class:ip combination, that combination gets a record in the thresholds table. All subsequent matching events during the threshold period are not reported, but are recorded in the events_thresholded table.

When no events arrive during the relapse period (which starts some time before the end of the threshold period and ends along with it), all the corresponding events are silently flushed from the events_thresholded table once the threshold period ends.

When some events do arrive during the relapse period, then on the first reporter run after the threshold period all the corresponding thresholded events are reported, and a new threshold period should start.
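
The three cases above can be written down as a small decision function. This is only a sketch of the mechanism as described in this comment, not the actual Mentat reporter code. The names `Threshold`, `reporter_action` and the returned strings are hypothetical, and the comparison at the exact end of the period is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass
class Threshold:
    """Hypothetical record for one severity:class:ip key."""
    relapse_start: datetime  # start of the relapse period
    period_end: datetime     # end of the threshold (and relapse) period


def reporter_action(threshold: Optional[Threshold],
                    thresholded: Sequence[datetime],
                    now: datetime) -> str:
    """Decide what one reporter run does for a single key.

    `thresholded` holds the times of events already suppressed for this key.
    """
    if threshold is None:
        # No active record: report and start a threshold period.
        return "report"
    if now <= threshold.period_end:
        # Threshold period still active: store the event, send nothing.
        return "threshold"
    # Threshold period is over: did anything arrive during the relapse period?
    relapsed = any(threshold.relapse_start <= t <= threshold.period_end
                   for t in thresholded)
    # Relapse: report the stored events and start a new threshold period.
    # No relapse: silently flush the stored events.
    return "report_relapse" if relapsed else "flush"


# Illustrative values only.
th = Threshold(relapse_start=datetime(2024, 8, 10, 1, 30),
               period_end=datetime(2024, 8, 12, 1, 30))
print(reporter_action(th, [datetime(2024, 8, 11)], datetime(2024, 8, 12, 3, 30)))
# prints "report_relapse"
```

The point to check in a review is the last branch: after "report_relapse" the caller has to create a new threshold record, otherwise the next event is reported again immediately.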

It seems we need to review whether this algorithm is indeed implemented correctly (first idea: is the repeated thresholding forgotten when thresholded events are reported?).

Updated by Pavel Kácha 5 months ago

Assignee set to Jakub Judiny

Updated by Jakub Judiny 5 months ago

Subject changed from Reporter always reports relapsed events to Reporter doesn't update thresholds in some cases
Description updated (diff)
Status changed from New to In Progress
Target version changed from 2.13.1 to Backlog

Updated by Jakub Judiny 5 months ago

Status changed from In Progress to Resolved

Updated by Jakub Judiny 5 months ago

Status changed from Resolved to In Progress
Target version changed from Backlog to 2.13.2

Updated by Jakub Judiny 5 months ago

Status changed from In Progress to Resolved

Updated by Jakub Judiny 5 months ago

In the case described by the relapsed.txt file, some events were thresholded and then reported as a relapse a few seconds later. That means the thresholding period was still active, and while thresholding is active, the caching mechanism ensures that the thresholding time is not set again (because it was already set in this reporter run).

The problem was simple: events were reported as relapsed even when the current time was equal to the end of the relapse time. Combined with the caching mechanism described above, this caused the problem described in this issue. So I changed it to report events as relapsed only AFTER the thresholding (relapse) time is over (< instead of <=).

#10

Updated by Jakub Judiny 5 months ago

So before my change, if the thresholding period ended at 2:20 and the reporter script was called at 2:20, events were thresholded and then reported a few seconds later as a relapse.

Now, the events will be thresholded (2:20) and reported in the next reporter run (4:20 for medium severity). Caching will work correctly in this case, because the thresholding will be long over, when the relapse is reported.
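
The boundary change and the 2:20 / 4:20 timeline can be illustrated with a short sketch. It is not taken from the Mentat source. The function names, the calendar date, and the 2-hour run interval for medium severity (inferred from the 2:20 to 4:20 step above) are assumptions.

```python
from datetime import datetime, timedelta


def relapse_due(period_end: datetime, now: datetime, strict: bool = True) -> bool:
    """True when thresholded events may be reported as a relapse.

    strict=False models the old `<=` check, strict=True the new `<` check.
    """
    return period_end < now if strict else period_end <= now


def first_relapse_run(period_end, first_run, interval, strict=True, max_runs=100):
    """Return the first scheduled run at which the relapse is reported."""
    run = first_run
    for _ in range(max_runs):
        if relapse_due(period_end, run, strict):
            return run
        run += interval
    return None


# Period ends at 2:20 and a reporter run starts at exactly 2:20.
# The date is arbitrary; the 2-hour interval is an assumption.
end = datetime(2024, 7, 18, 2, 20)
two_hours = timedelta(hours=2)
print(first_relapse_run(end, end, two_hours, strict=False))  # 2024-07-18 02:20:00
print(first_relapse_run(end, end, two_hours, strict=True))   # 2024-07-18 04:20:00
```

With the inclusive check the relapse is reported in the same run in which the events were thresholded. With the strict check it moves to the following run.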

#11

Updated by Rajmund Hruška 5 months ago

Status changed from Resolved to Feedback

Jakub Judiny wrote in #note-10:

Now, the events will be thresholded (2:20) and reported in the next reporter run (4:20 for medium severity). Caching will work correctly in this case, because the thresholding will be long over, when the relapse is reported.

I think those events would be deleted before they could be reported.

Also, somewhat unrelated: it only takes the events from the relapse period, but it should take all the thresholded events (even those from before the start of the relapse period) if there is any event during the relapse period.
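
A minimal sketch of the selection rule proposed here, assuming thresholded events are represented only by their timestamps. The function name and the inclusive window bounds are hypothetical and are not taken from the Mentat code.

```python
from datetime import datetime
from typing import List, Sequence


def events_for_relapse_report(event_times: Sequence[datetime],
                              relapse_start: datetime,
                              period_end: datetime) -> List[datetime]:
    """Return ALL thresholded events if at least one of them falls into
    the relapse period, otherwise return an empty list."""
    has_relapse = any(relapse_start <= t <= period_end for t in event_times)
    return sorted(event_times) if has_relapse else []


# Illustrative values: one event before the relapse period, one inside it.
relapse_start = datetime(2024, 8, 10, 1, 30)
period_end = datetime(2024, 8, 12, 1, 30)
events = [datetime(2024, 8, 7, 12, 0), datetime(2024, 8, 11, 9, 0)]
print(len(events_for_relapse_report(events, relapse_start, period_end)))  # 2
```

The event from before the relapse period is included because the event inside it triggers the relapse. Filtering by the relapse window alone would return only one event.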

#12

Updated by Rajmund Hruška 5 months ago

Status changed from Feedback to In Review

#13

Updated by Pavel Kácha 5 months ago

Another example. Is it the same issue?

https://mentat-hub.cesnet.cz/mentat/events/search?dt_from=2024-07-04+05%3A26%3A26&dt_to=&source_addrs=158.194.5.37&source_ports=&groups=abuse%40upol.cz&not_groups=&not_protocols=&description=&categories=Test&not_categories=True&not_severities=&classes=vulnerable-config-ipmi&not_classess=&submit=Hledat

https://mentat-hub.cesnet.cz/mentat/reports/205924/show
https://mentat-hub.cesnet.cz/mentat/reports/206007/show (relapse)
https://mentat-hub.cesnet.cz/mentat/reports/206072/show (relapse)
https://mentat-hub.cesnet.cz/mentat/reports/206093/show (NO relapse)
https://mentat-hub.cesnet.cz/mentat/reports/206183/show (NO relapse)
https://mentat-hub.cesnet.cz/mentat/reports/206285/show (NO relapse)

#14

Updated by Jakub Judiny 5 months ago

Pavel Kácha wrote in #note-13:

Another example. Is it the same issue?

https://mentat-hub.cesnet.cz/mentat/events/search?dt_from=2024-07-04+05%3A26%3A26&dt_to=&source_addrs=158.194.5.37&source_ports=&groups=abuse%40upol.cz&not_groups=&not_protocols=&description=&categories=Test&not_categories=True&not_severities=&classes=vulnerable-config-ipmi&not_classess=&submit=Hledat

https://mentat-hub.cesnet.cz/mentat/reports/205924/show
https://mentat-hub.cesnet.cz/mentat/reports/206007/show (relapse)
https://mentat-hub.cesnet.cz/mentat/reports/206072/show (relapse)
https://mentat-hub.cesnet.cz/mentat/reports/206093/show (NO relapse)
https://mentat-hub.cesnet.cz/mentat/reports/206183/show (NO relapse)
https://mentat-hub.cesnet.cz/mentat/reports/206285/show (NO relapse)

Yes, this looks like the same issue - the third report did not prolong the thresholding period, because of this interval problem. So the fourth report should not have been created. Fifth and sixth reports are OK, because the events did not arrive during the relapse period (they arrived after the thresholding period ended).

#15

Updated by Rajmund Hruška 5 months ago

Seems to be working well on mentat-alt - https://mentat-alt.cesnet.cz/mentat/reports/211576/show.

And it created new threshold record:

                   id                    |    thresholdtime    |     relapsetime     |       ttltime
 vulnerable-config-xxx+++195.113.xxx.xxx | 2024-08-06 01:30:00 | 2024-08-10 01:30:00 | 2024-08-12 01:30:00
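
For readers unfamiliar with the record layout, the sketch below rebuilds the row shown above. It assumes the id is the event class and source IP joined by `+++` (as the row suggests), a 6-day threshold period and a 2-day relapse period. Both durations are inferred from the three timestamps, and `make_threshold_record` is a hypothetical helper, not a Mentat function.

```python
from datetime import datetime, timedelta


def make_threshold_record(event_class: str, source_ip: str,
                          threshold_time: datetime,
                          threshold_period: timedelta,
                          relapse_period: timedelta) -> dict:
    """Build a threshold record; the relapse period is the tail of the
    threshold period and ends together with it."""
    ttl = threshold_time + threshold_period
    return {
        "id": f"{event_class}+++{source_ip}",
        "thresholdtime": threshold_time,
        "relapsetime": ttl - relapse_period,
        "ttltime": ttl,
    }


record = make_threshold_record(
    "vulnerable-config-xxx", "195.113.xxx.xxx",   # masked as in the row above
    datetime(2024, 8, 6, 1, 30),
    threshold_period=timedelta(days=6),           # assumed duration
    relapse_period=timedelta(days=2),             # assumed duration
)
print(record["relapsetime"], record["ttltime"])
# 2024-08-10 01:30:00 2024-08-12 01:30:00
```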

#16

Updated by Rajmund Hruška 4 months ago

Status changed from In Review to Closed

#17

Updated by Jakub Judiny 4 months ago

Related to Bug #7775: Event aggregation in reports seems broken (recurrence mechanism) added
