Bug #7759
Reporter doesn't update thresholds in some cases (Closed)
Added by Rajmund Hruška 6 months ago. Updated 4 months ago.
Description
If some events have already been thresholded, the TTL for events with the same source and event class is not updated.
Files
relapsed.txt (8.9 KB), Rajmund Hruška, 07/18/2024 11:12 AM
Related issues
Updated by Rajmund Hruška 6 months ago
- Subject changed from Reporter doesn't update thresholds in some cases to Reporter always reports relapsed events
- Description updated (diff)
- Status changed from In Progress to New
- Assignee deleted (Rajmund Hruška)
Updated by Pavel Kácha 6 months ago
An attempt to describe the reporting mechanism is here.
My attempt, in my own words:
- When a report is sent for a particular severity:class:ip, a record for it is created in the thresholds table. All subsequent events during the threshold period are not reported but are recorded in the events_thresholded table.
- When no events arrive during the relapse period (which starts some time before the end of the threshold period and ends together with it), all the corresponding events are silently flushed from the events_thresholded table after the threshold period ends.
- When some events do arrive during the relapse period, all the corresponding thresholded events are reported on the first reporter run after the threshold period, and a new threshold period should start.
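To make the three rules above easier to check against the code, here is a minimal in-memory sketch of the described behaviour. It is not Mentat's implementation: the class names, `on_event`, `run_after_threshold` and the fixed period lengths are hypothetical. It follows the description as stated (all thresholded events are reported on a relapse, and a new threshold period starts), and it treats "after the threshold period" as strictly after the end time.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Threshold:
    start: datetime          # when the original report was sent
    relapse_start: datetime  # the relapse period runs from here until `end`
    end: datetime            # end of the threshold period


@dataclass
class ThresholdModel:
    """Hypothetical model of the described mechanism, not Mentat's schema or API."""
    threshold_period: timedelta
    relapse_period: timedelta
    thresholds: dict = field(default_factory=dict)          # key -> Threshold
    events_thresholded: dict = field(default_factory=dict)  # key -> [event times]

    def _open(self, key, now):
        # Start a threshold period; the relapse period ends together with it.
        end = now + self.threshold_period
        self.thresholds[key] = Threshold(now, end - self.relapse_period, end)

    def on_event(self, key, t):
        """Return True if the event is reported, False if it is thresholded."""
        thr = self.thresholds.get(key)
        if thr is not None and t < thr.end:
            self.events_thresholded.setdefault(key, []).append(t)
            return False
        self._open(key, t)
        return True

    def run_after_threshold(self, key, now):
        """Reporter run for `key`; returns the events reported as a relapse."""
        thr = self.thresholds.get(key)
        if thr is None or now <= thr.end:
            return []  # no record, or the threshold period is still active
        events = self.events_thresholded.pop(key, [])
        del self.thresholds[key]
        if any(thr.relapse_start <= t < thr.end for t in events):
            self._open(key, now)  # reporting a relapse starts a new threshold period
            return events
        return []  # no relapse: thresholded events are flushed silently
```

A key is meant to stand for the severity:class:ip triple; the sketch does not parse it.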
It seems we need to review whether this algorithm is actually implemented correctly (first idea: is the repeated thresholding forgotten when thresholded events are reported?).
Updated by Jakub Judiny 5 months ago
- Subject changed from Reporter always reports relapsed events to Reporter doesn't update thresholds in some cases
- Description updated (diff)
- Status changed from New to In Progress
- Target version changed from 2.13.1 to Backlog
Updated by Jakub Judiny 5 months ago
- Status changed from In Progress to Resolved
Updated by Jakub Judiny 5 months ago
- Status changed from Resolved to In Progress
- Target version changed from Backlog to 2.13.2
Updated by Jakub Judiny 5 months ago
- Status changed from In Progress to Resolved
Updated by Jakub Judiny 5 months ago
In the case described in the relapsed.txt file, some events were thresholded and then reported as a relapse a few seconds later. That means the thresholding period was still active, and while thresholding is active, the caching mechanism ensures that the thresholding time is not set again (because it was already set in this reporter run).
The problem was simple: events were reported as relapsed even when the time was equal to the end of the relapse time. Combined with the caching mechanism described above, this caused the problem described in this issue. So I changed it to report events as relapsed only AFTER the thresholding (relapse) time is over (< instead of <=).
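The boundary change can be illustrated with two predicates. This is a hypothetical sketch, not the reporter's actual code: the function names are invented, and it assumes the comparison is made against the reference time of the reporter run (`run_time`).

```python
from datetime import datetime


def is_relapse_due_old(threshold_end: datetime, run_time: datetime) -> bool:
    """Before the fix: a run at exactly the end time already reports the relapse."""
    return threshold_end <= run_time


def is_relapse_due_new(threshold_end: datetime, run_time: datetime) -> bool:
    """After the fix: only a run strictly after the end time reports the relapse."""
    return threshold_end < run_time
```

The two predicates differ only when `run_time` equals `threshold_end`. That is exactly the case where the old check let the same run both threshold events and report them as a relapse.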
Updated by Jakub Judiny 5 months ago
So before my change, if the thresholding period ended at 2:20 and the reporter script was called at 2:20, events were thresholded and then reported as a relapse a few seconds later.
Now, the events will be thresholded (2:20) and reported in the next reporter run (4:20 for medium severity). Caching will work correctly in this case, because the thresholding will be long over when the relapse is reported.
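The timeline above can be reproduced by stepping through a fixed reporter schedule. The 2-hour interval for medium severity is only inferred from the 2:20 and 4:20 example, and `first_reporting_run` is a hypothetical helper rather than part of the reporter.

```python
from datetime import datetime, timedelta


def first_reporting_run(threshold_end: datetime, first_run: datetime,
                        interval: timedelta, strict: bool = True) -> datetime:
    """Return the first scheduled run at which the relapse would be reported.

    strict=True mirrors the fixed comparison (threshold_end < run),
    strict=False mirrors the old one (threshold_end <= run).
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    run = first_run
    while not (threshold_end < run if strict else threshold_end <= run):
        run += interval
    return run
```

With the threshold ending at 2:20 and runs at 0:20, 2:20, 4:20, and so on, the old comparison picks the 2:20 run. The fixed comparison picks 4:20.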
Updated by Rajmund Hruška 5 months ago
- Status changed from Resolved to Feedback
Jakub Judiny wrote in #note-10:
Now, the events will be thresholded (2:20) and reported in the next reporter run (4:20 for medium severity). Caching will work correctly in this case, because the thresholding will be long over, when the relapse is reported.
I think those events would be deleted before they could be reported.
Also, somewhat unrelated: it only takes the events from the relapse period, but if there is any event during the relapse period, it should take all the thresholded events (including those from before the start of the relapse period).
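The difference between the current and the proposed event selection can be sketched as two filters over the thresholded events of one key. The function names are hypothetical. The half-open window `[relapse_start, threshold_end)` is an assumption, because the note does not say whether the end is inclusive.

```python
from datetime import datetime


def in_relapse(t: datetime, relapse_start: datetime, threshold_end: datetime) -> bool:
    # Assumed half-open relapse window [relapse_start, threshold_end).
    return relapse_start <= t < threshold_end


def relapse_events_current(events, relapse_start, threshold_end):
    """Behaviour described in the note: only events from the relapse period."""
    return [t for t in events if in_relapse(t, relapse_start, threshold_end)]


def relapse_events_proposed(events, relapse_start, threshold_end):
    """Proposal: all thresholded events, but only if any fell into the relapse period."""
    if any(in_relapse(t, relapse_start, threshold_end) for t in events):
        return sorted(events)
    return []
```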
Updated by Rajmund Hruška 5 months ago
- Status changed from Feedback to In Review
Updated by Pavel Kácha 5 months ago
Another example. Is it the same issue?
https://mentat-hub.cesnet.cz/mentat/reports/205924/show
https://mentat-hub.cesnet.cz/mentat/reports/206007/show (relapse)
https://mentat-hub.cesnet.cz/mentat/reports/206072/show (relapse)
https://mentat-hub.cesnet.cz/mentat/reports/206093/show (NO relapse)
https://mentat-hub.cesnet.cz/mentat/reports/206183/show (NO relapse)
https://mentat-hub.cesnet.cz/mentat/reports/206285/show (NO relapse)
Updated by Jakub Judiny 5 months ago
Pavel Kácha wrote in #note-13:
Another example. Is it the same issue?
https://mentat-hub.cesnet.cz/mentat/reports/205924/show
https://mentat-hub.cesnet.cz/mentat/reports/206007/show (relapse)
https://mentat-hub.cesnet.cz/mentat/reports/206072/show (relapse)
https://mentat-hub.cesnet.cz/mentat/reports/206093/show (NO relapse)
https://mentat-hub.cesnet.cz/mentat/reports/206183/show (NO relapse)
https://mentat-hub.cesnet.cz/mentat/reports/206285/show (NO relapse)
Yes, this looks like the same issue: the third report did not prolong the thresholding period because of this interval problem, so the fourth report should not have been created. The fifth and sixth reports are OK, because their events did not arrive during the relapse period (they arrived after the thresholding period ended).
Updated by Rajmund Hruška 5 months ago
Seems to be working well on mentat-alt - https://mentat-alt.cesnet.cz/mentat/reports/211576/show.
And it created a new threshold record:
id | thresholdtime | relapsetime | ttltime
vulnerable-config-xxx+++195.113.xxx.xxx | 2024-08-06 01:30:00 | 2024-08-10 01:30:00 | 2024-08-12 01:30:00
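The gaps between the timestamps in the record above can be checked directly. The parsing below only splits the pasted row. How the three columns map onto the threshold and relapse periods described earlier in this issue is an assumption and is not confirmed here.

```python
from datetime import datetime

# The threshold record from this comment, as a single pipe-separated row.
ROW = ("vulnerable-config-xxx+++195.113.xxx.xxx | 2024-08-06 01:30:00 | "
       "2024-08-10 01:30:00 | 2024-08-12 01:30:00")

key, *stamps = [cell.strip() for cell in ROW.split("|")]
thresholdtime, relapsetime, ttltime = (
    datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps
)

print(key)
print(relapsetime - thresholdtime)  # 4 days, 0:00:00
print(ttltime - relapsetime)        # 2 days, 0:00:00
```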
Updated by Jakub Judiny 4 months ago
- Related to Bug #7775: Event aggregation in reports seems broken (recurrence mechanism) added