Feature #6227

open

Incorporate new info from Negistry into group db and reporting

Added by Pavel Kácha over 1 year ago. Updated 1 day ago.

Status: Feedback
Priority: Normal
Category: Development - Core
Target version:
Start date: 04/29/2021
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
To be discussed:

Description

Negistry will be able to hold information about (some) local subblocks not in RIPE, and also about some overlaying blocks (like Metacentrum machines, CESNET machines on foreign IPs, etc.). It also exports abuse contacts together with the severity of incidents each abuse contact expects to receive.

There are two exports. The first one, which is very similar to the current one, simply exports all the information. The second, nonoverlapping one provides calculated information about the abuse contact (and related block handles without detailed info).

See full examples in attachments.


Subtasks 2 (2 open, 0 closed)

Feature #7052: Link report to each group which owns it (Resolved, Rajmund Hruska, 04/29/2021)
Feature #7257: Store email addresses to which the report was sent (Resolved, Rajmund Hruska, 04/29/2021)

Related issues

Related to Mentat - Task #6239: Simplify/remove too detailed user settings for reporting (Resolved, Rajmund Hruska, 02/25/2020)
Related to Mentat - Bug #6209: Reenable Metacentrum network list update (Deferred, 01/31/2020)

Actions #2

Updated by Pavel Kácha over 1 year ago

  • Related to Task #6239: Simplify/remove too detailed user settings for reporting added
Actions #3

Updated by Pavel Kácha over 1 year ago

If we need to send reports to more recipients (To, Cc), we have a clash in reporting preferences (which takes precedence and how), and also in working with summary reports.

Settings adaptation gets tracked in #6239.

After discussion, summary reports can be solved by a slightly different split. We now have nonoverlapping abuse contact resolution, so let's create summary reports for the most specific networks. That may mean more summaries, but only for a handful of organisations for which we have more specific data, and these will most probably benefit from it. Also, it is possible to "merge" summaries across boundaries in overlapping networks (GRID).

Actions #4

Updated by Jan Žerdík over 1 year ago

  • To be discussed changed from Yes to No
Actions #5

Updated by Pavel Kácha about 1 year ago

  • Target version changed from Backlog to 2.8
Actions #6

Updated by Jan Žerdík about 1 year ago

  • To be discussed changed from No to Yes
Actions #7

Updated by Pavel Kácha about 1 year ago

Nets without abuse contact:
  • we need new config key which will get these stray reports
  • error line in report (thus in template)
Nets which cannot compute abuse contact (Cc low, To critical, we are not able to direct report low/med/high)
  • let it also go to stray report contact with error line

Needs solution in Negistry (to not allow "unsolvable" situations).

Actions #8

Updated by Pavel Kácha 9 months ago

  • Assignee changed from Jan Žerdík to Rajmund Hruska
Actions #9

Updated by Pavel Kácha 9 months ago

  • To be discussed deleted (Yes)
Actions #10

Updated by Pavel Kácha 8 months ago

  • To be discussed set to Yes
Actions #11

Updated by Pavel Kácha 7 months ago

  • To be discussed deleted (Yes)
Some clarifications from 2021-01-11 meeting:
  • whois_type (source in db) does not need to be deduced from the data, but command line importer could have an argument to specify it (however we currently use only Negistry as the source, so this is not high prio)
  • there is no need for backward compatibility, as the new Negistry, which implements this, also supports the old format (albeit incomplete).
Also - there are three mostly orthogonal parts of this - in order of importance (so these might get implemented and rolled out gradually and separately).
  • "feed" rank, which decides relative priority between overlapping feeds (say higher priority for Cesnet appliances over RIPE data)
  • network hierarchy (subblocks contained within parent blocks)
  • "severity" - what severity of incidents particular abuse contact accepts (for example faculty abuse contact handles everything whereas top level university abuse contact handles only high and critical)
Actions #12

Updated by Pavel Kácha 6 months ago

From the 2021-01-22 meeting:

Note to events_tresholded table:
  • event id, ip+class (or sort+concat category), group id, timestamp
  • a record for each event, so we can just join in thresholded events in case of relapse
Rajmund Hruska noted that we will have to change the reporting algorithm to accommodate more detailed abuse contact data.
Currently the algorithm goes through all the groups and tries to fetch events to report for each of them. That's because there are group-specific settings for reporting windows. However, we have already decided to remove per-group settings and make them systemwide in #6239, so we can change the algorithm in two steps:
  • fetch the list of related events for the reporting window, then apply reporting cycle only for affected groups (result should be still the same as before)
    • after the meeting Radko correctly noted that fetching all the events is not a good idea from the memory standpoint (Python does not give the memory back), and we already have the abuse field parsed out in the metatable, so we might just select ... distinct abuse groups for the reporting time window and the rest of the algorithm stays the same
  • then make the changes necessary to accommodate Negistry (more granular abuse contact info in Enricher) and change the algorithm to iterate over those more specific sets of abuse contacts
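
The first step (selecting only the distinct abuse groups that have events in the reporting window, instead of loading all events) could look roughly like the sketch below. It uses an in-memory SQLite database purely for illustration; the table name `events`, the columns `storagetime` and `abuse_group`, and the function `groups_in_window` are assumptions made for this example and not Mentat's actual schema or API.

```python
import sqlite3


def groups_in_window(conn, t_from, t_to):
    """Return distinct abuse groups with at least one event in [t_from, t_to)."""
    rows = conn.execute(
        "SELECT DISTINCT abuse_group FROM events "
        "WHERE storagetime >= ? AND storagetime < ? "
        "ORDER BY abuse_group",
        (t_from, t_to),
    )
    # Only group names are fetched, never the events themselves.
    return [row[0] for row in rows]


# Hypothetical sample data: three events inside the window, one outside.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id TEXT, storagetime INTEGER, abuse_group TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("e1", 10, "abuse@a.example"),
        ("e2", 20, "abuse@a.example"),
        ("e3", 30, "abuse@b.example"),
        ("e4", 99, "abuse@c.example"),
    ],
)
affected = groups_in_window(conn, 0, 50)
```

The reporting cycle would then run only for the groups in `affected`, which keeps memory usage independent of the number of events in the window.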

However, due to personnel changes #6239 hasn't reached devel correctly, so we need to merge d282e3a0 (and nothing after) and work on top of that.

Actions #13

Updated by Rajmund Hruska 6 months ago

  • To be discussed set to Yes
Actions #14

Updated by Rajmund Hruska 6 months ago

Based on the meeting from 2021-02-12:
  • Currently, my solution assigns emails to cc header after filtering the events for the most specific group, just before the email is sent. However, determining groups in cc header should be done before the filtering, so that some groups won't get events which would be otherwise filtered out.
  • The database should store each group to which the report is sent. This is tracked in a separate issue #7052
Actions #15

Updated by Pavel Kácha 6 months ago

  • Target version changed from 2.8 to 2.9
Actions #16

Updated by Pavel Kácha 6 months ago

  • Related to Bug #6209: Reenable Metacentrum network list update added
Actions #17

Updated by Rajmund Hruska 5 months ago

  • Status changed from New to Feedback

As one network can now be linked to multiple abuse groups based on the severity, I think it makes sense to create a new table which will be mapping this relation between network and a group. Also, networks table will have one more column - rank.

If I understand the schema correctly, deleting all network records beforehand (by dropping the old table in alembic and creating 2 new tables) and creating the records during the first run of netmngr module, should be alright. It does seem pretty rough though and maybe there is some issue that I can't see, so I would like to ask your opinion.

Actions #18

Updated by Rajmund Hruska 5 months ago

  • Status changed from Feedback to In Progress
  • To be discussed changed from Yes to No

From the 2021-03-05 call:
My idea was to save the network hierarchy in the database. It turns out that Mentat supports the hierarchy of groups rather than the hierarchy of the networks. So, I will find a different approach to save necessary information needed for more specific reporting based on the severity.

Actions #19

Updated by Pavel Kácha 5 months ago

Rajmund Hruska wrote in #note-18:

From the 2021-03-05 call:
My idea was to save the network hierarchy in the database. It turns out that Mentat supports the hierarchy of groups rather than the hierarchy of the networks. So, I will find a different approach to save necessary information needed for more specific reporting based on the severity.

Please take a look at whether the hierarchy could be created from the netblock subset/superset relations. I have theorized about using the Organisation field from RIPE, however this field is not set for non-RIPE feeds, and it also does not say anything about the parent/child block relation, as I originally thought.

Also - careful with regenerating the db. Groups have a lot of associated settings, which would be lost. Deleting and reassigning only the networks db might be ok, though. (However, I would be careful and compare the result for possible differences or anomalies.)

Actions #20

Updated by Rajmund Hruska 5 months ago

  • To be discussed changed from No to Yes
Actions #21

Updated by Rajmund Hruska 5 months ago

I made a script which compares resolved abuses for IP addresses from negistry-multifeed-abuses-segmented-ipranges.json and from new Mentat resolving. I ran across a problem of having two almost identical networks, which only differ in the rank and the feed. I thought there was some problem in having both records in the database, but it seems that it's actually ok, so I will change importer to allow this.

Other than that, there is a broken unit test regarding reporting, but current version allows sending reports to multiple groups and takes rank into consideration. The only feature left is considering severity level in Negistry data for reporting.

Actions #22

Updated by Rajmund Hruska 5 months ago

  • To be discussed changed from Yes to No
Actions #23

Updated by Rajmund Hruska 5 months ago

I made a change which allows having multiple networks with the same primary key. Because of that, I have changed the way the source in the networks table is calculated. Now, if a network from the input file contains the feed attribute, then this attribute becomes part of the source of that particular network (e.g. whois/grid_devices).
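
A minimal sketch of that source calculation, assuming the network record is a plain dict with an optional `feed` key. The function name `network_source` and the parameter `base_source` are hypothetical and only illustrate the described behaviour.

```python
def network_source(base_source, network):
    """Build the 'source' value for one imported network record."""
    feed = network.get("feed")
    # Without a feed attribute the source stays unchanged.
    return f"{base_source}/{feed}" if feed else base_source


with_feed = network_source("whois", {"feed": "grid_devices"})
without_feed = network_source("whois", {})
```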

I ran the comparing script again and I found out that there is still quite an issue. Look at these records from negistry-multifeed-abuses-full-ipranges.json:
  • feed: cuni_subnets
  • rank: 500
  • primary_key: 195.113.149.128 - 195.113.149.131
  • resolved_abuses: {low: }
  • feed: ripe_cesnet
  • rank: 100
  • primary_key: 195.113.149.128 - 195.113.149.131
  • resolved_abuses: {low: }

The resolved abuse from the cuni_subnets feed should be in the 'abuse_to' header and the other one should be in the 'abuse_cc' header, but in negistry-multifeed-abuses-segmented-ipranges.json it's the opposite. Isn't there a mistake in that segmented file?

I am also attaching the file which contains list of abuses for a given IP address, as calculated by Mentat.

Actions #24

Updated by Pavel Kácha 5 months ago

Rajmund Hruska wrote in #note-23:

I ran the comparing script again and I found out that there is still quite an issue. Look at these records from negistry-multifeed-abuses-full-ipranges.json:
  • feed: cuni_subnets
  • rank: 500
  • primary_key: 195.113.149.128 - 195.113.149.131
  • resolved_abuses: {low: }
  • feed: ripe_cesnet
  • rank: 100
  • primary_key: 195.113.149.128 - 195.113.149.131
  • resolved_abuses: {low: }

The resolved abuse from the cuni_subnets feed should be in the 'abuse_to' header and the other one should be in the 'abuse_cc' header, but in negistry-multifeed-abuses-segmented-ipranges.json it's the opposite. Isn't there a mistake in that segmented file?

Is this really so? Excerpt from negistry-multifeed-abuses-segmented-ipranges.json:

{
    "ip4_start": "195.113.149.128", 
    "ip4_end": "195.113.149.131", 
    "cidrs": [
      "195.113.149.128/30" 
    ], 
    "first": 3279000960, 
    "last": 3279000963, 
    "ranges": [
      {
        "feed": "cuni_subnets", 
        "primary_key": "195.113.149.128 - 195.113.149.131" 
      }, 
      {
        "feed": "ripe_cesnet", 
        "primary_key": "195.113.149.128 - 195.113.149.131" 
      }, 
      {
        "feed": "ripe_cesnet", 
        "primary_key": "195.113.0.0 - 195.113.255.255" 
      }
    ], 
    "abuses": {
      "high": {
        "cc": [
          "abuse@cuni.cz", 
          "abuse@cesnet.cz" 
        ], 
        "to": [
          "vhor@cuni.cz" 
        ]
      }, 
      "medium": {
        "cc": [
          "abuse@cuni.cz", 
          "abuse@cesnet.cz" 
        ], 
        "to": [
          "vhor@cuni.cz" 
        ]
      }, 
      "critical": {
        "cc": [
          "abuse@cuni.cz", 
          "abuse@cesnet.cz" 
        ], 
        "to": [
          "vhor@cuni.cz" 
        ]
      }, 
      "low": {
        "cc": [
          "abuse@cuni.cz", 
          "abuse@cesnet.cz" 
        ], 
        "to": [
          "vhor@cuni.cz" 
        ]
      }
    }
  }, 

vhor@cuni.cz is in To (from cuni_subnets); abuse@cuni.cz and abuse@cesnet.cz (both from ripe_cesnet) follow in Cc.
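
For comparison, the To/Cc split for this range can be reproduced with the sketch below. The ordering rule used here (higher rank first, then the smaller network) is an assumption that happens to match the excerpt; it is not taken from Negistry's code. The function `split_to_cc` and the record layout are hypothetical, while the feeds, ranks and addresses come from the records quoted in this thread.

```python
from ipaddress import ip_address


def split_to_cc(ranges, severity):
    """Split contacts of overlapping ranges into To and Cc for one severity."""

    def sort_key(rng):
        size = int(ip_address(rng["last"])) - int(ip_address(rng["first"]))
        # Assumed rule: higher rank wins, then the more specific (smaller) block.
        return (-rng["rank"], size)

    to, cc = [], []
    for rng in sorted(ranges, key=sort_key):
        emails = [
            e for e in rng["resolved_abuses"].get(severity, [])
            if e not in to and e not in cc
        ]
        if not emails:
            continue
        # The first range with contacts fills To, all later ones go to Cc.
        (cc if to else to).extend(emails)
    return to, cc


RANGES = [
    {"feed": "ripe_cesnet", "rank": 100,
     "first": "195.113.149.128", "last": "195.113.149.131",
     "resolved_abuses": {"low": ["abuse@cuni.cz"]}},
    {"feed": "cuni_subnets", "rank": 500,
     "first": "195.113.149.128", "last": "195.113.149.131",
     "resolved_abuses": {"low": ["vhor@cuni.cz"]}},
    {"feed": "ripe_cesnet", "rank": 100,
     "first": "195.113.0.0", "last": "195.113.255.255",
     "resolved_abuses": {"low": ["abuse@cesnet.cz"]}},
]
to, cc = split_to_cc(RANGES, "low")
```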

Actions #25

Updated by Rajmund Hruska 5 months ago

Oh, so the file attached to this issue is probably outdated, because here is the record I was referring to.

  {
    "ip4_start": "195.113.149.128", 
    "ip4_end": "195.113.149.131", 
    "cidrs": [
      "195.113.149.128/30" 
    ], 
    "first": 3279000960, 
    "last": 3279000963, 
    "feeds": [
      "ripe_cesnet", 
      "cuni_subnets", 
      "ripe_cesnet" 
    ], 
    "ranks": [
      100, 
      500, 
      100
    ], 
    "primary_keys": [
      "195.113.149.128 - 195.113.149.131", 
      "195.113.149.128 - 195.113.149.131", 
      "195.113.0.0 - 195.113.255.255" 
    ], 
    "netnames": [
      [
        "CUNI-UJOP-TCZ" 
      ], 
      [
        "UJOP-PODEBRADY" 
      ], 
      [
        "CZ-TEN-34-970317" 
      ]
    ], 
    "client_ids": [
      "C010000", 
      null, 
      "A010000" 
    ], 
    "abuse_to": {
      "low": [
        "abuse@cuni.cz" 
      ]
    }, 
    "abuse_cc": [
      {
        "low": [
          "vhor@cuni.cz" 
        ]
      }, 
      {
        "low": [
          "abuse@cesnet.cz" 
        ]
      }
    ]
  }
Actions #26

Updated by Rajmund Hruska 4 months ago

Rajmund Hruska wrote in #note-23:

I made a change which allows having multiple networks with the same primary key. Because of that, I have changed the way the source in the networks table is calculated. Now, if a network from the input file contains the feed attribute, then this attribute becomes part of the source of that particular network (e.g. whois/grid_devices).

I realized that this change will make every network in the current database (from old Negistry) different from the networks from the new Negistry even with the same primary key, because the source attribute is different. Networks in the database don't have special indispensable attributes so I think it's OK to create them anew. I can try to think of another way of solving this though, if necessary.

Actions #27

Updated by Rajmund Hruska 4 months ago

I fixed the issue with some events being filtered out even though they shouldn't be - it was the same issue as in #4489. I thought I had this case covered, but apparently I had not.

I have also fixed the broken unit tests.

Actions #28

Updated by Rajmund Hruska 4 months ago

  • To be discussed changed from No to Yes
Actions #29

Updated by Rajmund Hruska 4 months ago

  • Status changed from Feedback to In Progress
  • To be discussed changed from Yes to No
From the 2021-03-26 call:
  • Deleting networks and creating them anew shouldn't be a problem. I should create some script to test current networks from mentat-hub and those new ones.
  • It turns out that there is a bug in the old netmngr.py script, which I thought was a feature. If a network has multiple resolved abuse emails, a new abuse group is created for each of those emails. Instead, a network should belong to only one abuse group. The proposed solution is to sort the emails and create a new abuse group whose name equals all the emails joined together. Most networks have only one email in resolved abuses, so they would stay the same. Then I should look into having an option to change the name of the abuse group while preserving functionality.
  • Also, if is in resolved abuses with some other emails, then this email should be ignored. This currently only affects network 2001:718::/29.
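
The proposed group naming can be sketched as follows. The function name `abuse_group_name` and the comma separator are assumptions for illustration; the thread only says that the emails are sorted and joined together.

```python
def abuse_group_name(emails, separator=","):
    """Derive one abuse group name from all resolved abuse emails of a network."""
    # Sorting makes the name independent of the order in the input file;
    # the set removes accidental duplicates.
    return separator.join(sorted(set(emails)))


single = abuse_group_name(["abuse@a.example"])
joined = abuse_group_name(["b@x.example", "a@x.example", "b@x.example"])
```

A network with a single email keeps its current group name, so only networks with several resolved emails end up in a differently named group.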
Actions #30

Updated by Rajmund Hruska 3 months ago

I finished the last part of this task - reporting by severity. There are probably tons of bugs, so it should be thoroughly tested.

I dumped groups, networks and settings_reporting tables from mentat-alt, copied the data to my local machine and ran the netmngr.py script. It seems like there are quite a lot of changes, but it seems OK to me. I am attaching the output.

Actions #31

Updated by Rajmund Hruska about 1 month ago

  • To be discussed changed from No to Yes

I changed the event generator to generate events, where source IP addresses can be resolved to any group stored in the database and I tried reporting those events.

I noticed that at one point Mentat tried sending a report with empty to and cc headers. I checked the target abuse group and found out that this group only has abuse contacts for events with severity high or critical and the event Mentat was reporting had severity medium. I think it was a test group but this type of issue should probably be handled.

The other issue is with the shown IP address in the reports. If I remember correctly, if the event has source IP addresses from multiple (unrelated) abuse groups, the report should only show IP addresses which belong to the particular group which owns the report. Also, the web interface in the tab Statistics -> # IPs only shows IP4 addresses and does not show IP6 addresses.

Actions #32

Updated by Rajmund Hruska about 1 month ago

Based on the meeting on 2021-06-24:

Rajmund Hruska wrote in #note-31:

I noticed that at one point Mentat tried sending a report with empty to and cc headers. I checked the target abuse group and found out that this group only has abuse contacts for events with severity high or critical and the event Mentat was reporting had severity medium. I think it was a test group but this type of issue should probably be handled.

Mentat now filters out the groups from resolved abuse groups list where the reported severity is lower than the expected severity.
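
A rough sketch of such a filter is below. It assumes each group maps severities to lists of contacts and that a group "expects" everything from its lowest configured severity upwards. The names `SEVERITIES` and `groups_for_severity` and the data layout are hypothetical, not Mentat's actual model.

```python
SEVERITIES = ["low", "medium", "high", "critical"]


def groups_for_severity(groups, event_severity):
    """Keep only groups whose lowest expected severity is <= the event severity."""
    level = SEVERITIES.index(event_severity)
    kept = []
    for name, contacts in groups.items():
        levels = [SEVERITIES.index(s) for s in contacts if s in SEVERITIES]
        # A group without any known severity can never receive the report.
        if levels and min(levels) <= level:
            kept.append(name)
    return kept


groups = {
    "faculty": {"low": ["abuse@faculty.example"]},
    "university": {"high": ["abuse@uni.example"], "critical": ["abuse@uni.example"]},
}
for_medium = groups_for_severity(groups, "medium")
for_high = groups_for_severity(groups, "high")
```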

I did not write code for checking that any IP address from any network stored in the database can be reported for any severity. First of all, it's not related to this issue. Secondly, do we really want to enforce that any IP address belonging to the network of some abuse group can be reported for any severity?

The other issue is with the shown IP address in the reports. If I remember correctly, if the event has source IP addresses from multiple (unrelated) abuse groups, the report should only show IP addresses which belong to the particular group which owns the report. Also, the web interface in the tab Statistics -> # IPs only shows IP4 addresses and does not show IP6 addresses.

I should check how it is displayed in the current version of Mentat (mentat-alt), by using an artificial IDEA event with multiple source IP addresses which belong to multiple groups. This is yet to be done.

Actions #33

Updated by Rajmund Hruska about 1 month ago

Rajmund Hruska wrote in #note-32:

The other issue is with the shown IP address in the reports. If I remember correctly, if the event has source IP addresses from multiple (unrelated) abuse groups, the report should only show IP addresses which belong to the particular group which owns the report. Also, the web interface in the tab Statistics -> # IPs only shows IP4 addresses and does not show IP6 addresses.

I should check how it is displayed in the current version of Mentat (mentat-alt), by using an artificial IDEA event with multiple source IP addresses which belong to multiple groups. This is yet to be done.

I created artificial events (1, 2 and 3) with multiple sources and stored them at mentat-alt. The behaviour in my branch implementing this feature is the same as in the current version of Mentat (2.8). In the report message, only the relevant IP addresses are shown, which is expected. But the "Metadata" tab of the report counts all IP4 addresses across all reported events and doesn't count IP6 addresses. Similarly, the "Statistics" tab, section "# IPs", only shows all IP4 addresses. The reports for each group can be found here: 1, 2, 3, 4

Actions #34

Updated by Rajmund Hruska about 1 month ago

At 2021-06-24 meeting, I mentioned that when creating a name of abuse group which consists of multiple emails I assume that those emails stay in the same severity across all networks.

Basically, I assume that situations like this don't occur:

{
...
"resolved_abuses": {
      "medium": ["aaa@bbb.cz"],
      "low": ["aaa@bbb2.cz"]
    }
}
{
...
"resolved_abuses": {
      "high": ["aaa@bbb.cz"],
      "medium": ["aaa@bbb2.cz"]
    }
}

During the meeting it was said that this assumption should be checked. I wrote a function called in netmngr.py module which checks that and logs inconsistencies. There is no inconsistency in the data from Negistry.
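
One possible shape of such a check is sketched below. It interprets the assumption as "each email appears under the same set of severities in every network where it occurs", which is my reading of the example above and may be stricter or looser than the actual function in netmngr.py. The function name `find_inconsistent_emails` and the logger name are hypothetical.

```python
import logging
from collections import defaultdict

logger = logging.getLogger("netmngr_sketch")


def find_inconsistent_emails(networks):
    """Return emails that appear under different severity sets across networks."""
    variants = defaultdict(set)
    for net in networks:
        per_email = defaultdict(set)
        for severity, emails in net.get("resolved_abuses", {}).items():
            for email in emails:
                per_email[email].add(severity)
        for email, severities in per_email.items():
            variants[email].add(frozenset(severities))
    inconsistent = sorted(e for e, seen in variants.items() if len(seen) > 1)
    for email in inconsistent:
        # Only log, so that the import itself is not interrupted.
        logger.warning("Email %s is used with differing severities", email)
    return inconsistent


networks = [
    {"resolved_abuses": {"medium": ["aaa@bbb.cz"], "low": ["aaa@bbb2.cz"]}},
    {"resolved_abuses": {"high": ["aaa@bbb.cz"], "medium": ["aaa@bbb2.cz"]}},
]
problems = find_inconsistent_emails(networks)
```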

Actions #35

Updated by Rajmund Hruska 19 days ago

  • To be discussed changed from Yes to No

Apparently, there is a new fallback option in the data from Negistry, so I should look into that.

Also, at some point I tried comparing resolved abuses from Mentat (with data from Negistry) with those from negistry-multifeed-abuses-segmented-ipranges.json and for some IP addresses 'to' emails and 'cc' emails differ (Mentat says that email should be in 'cc' header and negistry-multifeed-abuses-segmented-ipranges.json says that the email should be in 'to' header). I should run this comparison again.

Actions #36

Updated by Rajmund Hruska 9 days ago

  • Status changed from In Progress to Feedback
  • To be discussed changed from No to Yes

Rajmund Hruska wrote in #note-34:

At 2021-06-24 meeting, I mentioned that when creating a name of abuse group which consists of multiple emails I assume that those emails stay in the same severity across all networks.

Basically, I assume that situations like this don't occur:
[...]

During the meeting it was said that this assumption should be checked. I wrote a function called in netmngr.py module which checks that and logs inconsistencies. There is no inconsistency in the data from Negistry.

With the new fallback option this is no longer true. There is a network with only one resolved_abuses item (fallback: ) and there is another network with only one resolved_abuses item (low: ). It is still just one group with one resolved email, but the email occurs in different 'severities'.

Currently, in the database there are 3 entities: groups, networks and reporting settings. Each group has exactly one reporting settings, where the resolved abuses are stored.

There are probably multiple ways how to handle fallback option. The simplest one I can think of is storing the fallback in the networks table. The disadvantage of that is that some emails would be stored in the networks table and some in the reporting settings table.

What is your opinion?

Actions #37

Updated by Pavel Kácha 6 days ago

Rajmund Hruska wrote in #note-36:

Rajmund Hruska wrote in #note-34:

At 2021-06-24 meeting, I mentioned that when creating a name of abuse group which consists of multiple emails I assume that those emails stay in the same severity across all networks.

Basically, I assume that situations like this don't occur:
[...]

During the meeting it was said that this assumption should be checked. I wrote a function called in netmngr.py module which checks that and logs inconsistencies. There is no inconsistency in the data from Negistry.

With the new fallback option this is no longer true. There is a network with only one resolved_abuses item (fallback: ) and there is another network with only one resolved_abuses item (low: ). It is still just one group with one resolved email, but the email occurs in different 'severities'.

Currently, in the database there are 3 entities: groups, networks and reporting settings. Each group has exactly one reporting settings, where the resolved abuses are stored.

There are probably multiple ways how to handle fallback option. The simplest one I can think of is storing the fallback in the networks table. The disadvantage of that is that some emails would be stored in the networks table and some in the reporting settings table.

What is your opinion?

After discussion we came to three options:

  • storing the fallback in the networks table (needs model change, migration, maybe UI changes, etc.)
  • split the concerned group to two with different attributes, akin to as now + for fallback networks (makes UI cluttered for group admin, need to manage two groups)
  • those base ranges where fallback attribute is used are specific for CESNET/Negistry, we could rely on global fallback (which would need to be implemented) and basically ignore fallback attribute (maybe check whether the value is expected - , to nail down the case when model on the side of Negistry somehow changes)

Third seems to be the least invasive and work fine for now.

Actions #38

Updated by Pavel Kácha 6 days ago

Pavel Kácha wrote in #note-37:

  • those base ranges where fallback attribute is used are specific for CESNET/Negistry, we could rely on global fallback (which would need to be implemented) and basically ignore fallback attribute (maybe check whether the value is expected - , to nail down the case when model on the side of Negistry somehow changes)

Third seems to be the least invasive and work fine for now.

So we need to:
  • create global fallback, which will be used if we are unable to resolve any abuse contact (possibly more addresses)
  • this creates ambiguity as to who should get the ownership of the report when nothing resolves. This is a deeper problem, and we decided to change the behavior to assign the report to ALL related groups, but use low/medium/high only when sending mails - and use the global fallback only when no mail would be sent
  • maybe we could look into getting this information into reporting templates, to be able to clearly signal in the report that it's "orphaned", so the fallback recipient clearly knows what happened
  • update Negistry importer script to just check the fallback attribute for and quack when it's not the case (to underpin possible Negistry model change)
Actions #39

Updated by Pavel Kácha 6 days ago

  • To be discussed deleted (Yes)
Actions #40

Updated by Rajmund Hruska 1 day ago

I chose to implement a global fallback. In the netmngr.py script which imports abuse groups and networks, if network has fallback it is never stored in the database. If I stored such networks I would successfully resolve IP address and the issue wouldn't be solved. The networks with fallback option are stored temporarily to check that this option is the same as the global fallback, which is stored in core/reporting.json.conf. If those fallbacks differ, an error is logged but the script continues without stopping.
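
The import-time handling described above could be sketched like this. It assumes the fallback appears as a `fallback` key inside `resolved_abuses` and that the global fallback has already been read from the configuration as a list of addresses. The function name `drop_fallback_networks`, the logger name and the sample addresses are made up for the example.

```python
import logging

logger = logging.getLogger("netmngr_sketch")


def drop_fallback_networks(networks, global_fallback):
    """Return (networks to store, number of fallback mismatches)."""
    to_store, mismatches = [], 0
    for net in networks:
        fallback = net.get("resolved_abuses", {}).get("fallback")
        if fallback is None:
            to_store.append(net)
            continue
        # Networks with a fallback are only checked, never stored.
        if sorted(fallback) != sorted(global_fallback):
            mismatches += 1
            logger.error(
                "Fallback %s of network %s differs from global fallback %s",
                fallback, net.get("primary_key"), global_fallback,
            )
    return to_store, mismatches


nets = [
    {"primary_key": "a", "resolved_abuses": {"low": ["abuse@a.example"]}},
    {"primary_key": "b", "resolved_abuses": {"fallback": ["fallback@example.org"]}},
    {"primary_key": "c", "resolved_abuses": {"fallback": ["other@example.org"]}},
]
stored, mismatches = drop_fallback_networks(nets, ["fallback@example.org"])
```

A mismatch is only logged and counted, so the import continues, matching the behaviour described above.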

Finally, filtering abuse groups based on severity is done after the report is created and if there is no email address to send the report to, the global fallback is used.
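
As an illustration of that last step, the sketch below picks the final recipients once the severity filtering has produced the To and Cc lists. The function name `final_recipients`, the returned dict and the `orphaned` flag (which a template could use to signal a fallback report) are assumptions, not the implemented interface.

```python
def final_recipients(to, cc, global_fallback):
    """Use the global fallback only when severity filtering left no address."""
    if to or cc:
        return {"to": list(to), "cc": list(cc), "orphaned": False}
    # Nobody would receive the report, so it goes to the fallback contact.
    return {"to": list(global_fallback), "cc": [], "orphaned": True}


normal = final_recipients(["abuse@a.example"], [], ["fallback@example.org"])
orphan = final_recipients([], [], ["fallback@example.org"])
```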

I haven't made any change in reporting template yet.
