Feature #4571

Aggregation of IP address lists to ranges within one event

Added by Pavel Kácha over 5 years ago. Updated 11 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: Development - Core
Target version:
Start date: 01/18/2019
Due date:
% Done: 100%
Estimated time:
To be discussed:

Description

Some detectors (namely LaBrea) send large lists of IP addresses. It would save database I/O and space (and hence speed up querying) to collapse runs of consecutive IP addresses into ranges right before saving to the database (in the storage daemon). For example:

"192.0.2.5", "192.0.2.6", "192.0.2.7", "192.0.2.8" -> "192.0.2.5-192.0.2.8"
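A minimal sketch of the idea, using only the Python standard library ipaddress module rather than the project's own classes. The function names collapse_consecutive and _fmt are hypothetical, and the sketch assumes all inputs are single addresses of one IP family:

```python
import ipaddress


def _fmt(low, high):
    """Render a single address, or a "low-high" range string."""
    return str(low) if low == high else f"{low}-{high}"


def collapse_consecutive(addresses):
    """Collapse runs of consecutive single IPs into range strings.

    Assumes every item is a single address of the same IP family.
    """
    ips = sorted({ipaddress.ip_address(a) for a in addresses})
    result = []
    start = prev = None
    for ip in ips:
        if prev is not None and int(ip) == int(prev) + 1:
            # Still inside the current run, extend it.
            prev = ip
            continue
        if start is not None:
            result.append(_fmt(start, prev))
        start = prev = ip
    if start is not None:
        result.append(_fmt(start, prev))
    return result


print(collapse_consecutive(["192.0.2.5", "192.0.2.6", "192.0.2.7", "192.0.2.8"]))
```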

Actions #1

Updated by Rajmund Hruška over 3 years ago

  • Subject changed from Aggregation os IP address lists to ranges within one event to Aggregation of IP address lists to ranges within one event
  • Status changed from New to Resolved
  • Assignee changed from Jan Mach to Pavel Kácha
  • % Done changed from 0 to 100
Actions #2

Updated by Pavel Kácha over 3 years ago

  • Assignee changed from Pavel Kácha to Rajmund Hruška

If I understand the code correctly, you assume that IDEA.Source.IP4/IP6 contain only single IPs (also, you are testing only cases of single IPs). However, there can be instances of IP4Range("192.0.2.64-192.0.2.127") or IP6Net("2001:db8:230::25c8:1946/32") (or IP4Net or IP6Range) already coming from IDEA.

I'd suggest working with them uniformly as ranges (all have .low() and .high() methods, including the single IP4 and IP6 classes); it might also simplify the code.

Also, partitioning a mixed list into two lists of IP4 and IP6 addresses respectively should be doable quite efficiently with the Python filter builtin.
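A rough illustration of the filter-based partitioning, with the standard library ipaddress module standing in for the IP4/IP6 classes mentioned above. The helper name split_by_version is hypothetical, and only single addresses are handled here:

```python
import ipaddress


def split_by_version(items):
    """Partition a mixed list of address strings into (IPv4, IPv6) lists."""
    parsed = [ipaddress.ip_address(item) for item in items]
    ip4 = list(filter(lambda ip: ip.version == 4, parsed))
    ip6 = list(filter(lambda ip: ip.version == 6, parsed))
    return ip4, ip6


ip4, ip6 = split_by_version(["192.0.2.1", "2001:db8::1", "192.0.2.2"])
print(ip4, ip6)
```

With the project's own classes, the predicate would presumably be a type check instead of the version attribute.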

Actions #3

Updated by Pavel Kácha over 3 years ago

  • Status changed from Resolved to Feedback
Actions #4

Updated by Rajmund Hruška over 3 years ago

  • Assignee changed from Rajmund Hruška to Pavel Kácha

I am not sure what you mean. Currently, I copy nets and ranges to the output list. I try to join the remaining single IP addresses into ranges and after that I append this list of ranges to the output. Should I try to join nets and ranges into a bigger range?

Also, I didn't handle the case when some IP address in IDEA event is a part of a range which is also in IDEA event, as this case wasn't handled in the original code.

Actions #5

Updated by Pavel Kácha over 3 years ago

  • Assignee changed from Pavel Kácha to Rajmund Hruška

Rajmund Hruska wrote in #note-4:

I am not sure what you mean. Currently, I copy nets and ranges to the output list. I try to join the remaining single IP addresses into ranges and after that I append this list of ranges to the output. Should I try to join nets and ranges into a bigger range?

Well, that's pretty much what I'd do - it doesn't matter whether the inputs are single IPs or ranges; what we care about is that the result is represented as the tersest possible set of ranges or IPs that exactly covers the input. The search on the db always goes over the ranges (even for a single-IP search), so we want to compress them as much as possible.

Example: 1-5, 2-8, 10-15, 12, 15, 17 can be compacted as 1-8, 10-15, 17.
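The compaction in this example can be sketched as a classic sort-and-merge over (low, high) pairs. This is an illustrative sketch on plain integers, not the code from the branch; the name compact is hypothetical, and a single value n is written as (n, n). Ranges that overlap or are directly adjacent are joined:

```python
def compact(ranges):
    """Merge overlapping or adjacent (low, high) integer ranges."""
    merged = []
    for low, high in sorted(ranges):
        if merged and low <= merged[-1][1] + 1:
            # Overlaps or touches the previous range, so extend it.
            merged[-1][1] = max(merged[-1][1], high)
        else:
            merged.append([low, high])
    return [tuple(r) for r in merged]


print(compact([(1, 5), (2, 8), (10, 15), (12, 12), (15, 15), (17, 17)]))
# → [(1, 8), (10, 15), (17, 17)]
```

For real addresses, the pairs could be built from the integer values of each item's low and high ends, one IP family at a time.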

Also, I didn't handle the case when some IP address in IDEA event is a part of a range which is also in IDEA event, as this case wasn't handled in the original code.

That would be nice of course, but depends on the amount of work or refactoring needed.

Actions #6

Updated by Rajmund Hruška over 3 years ago

  • Assignee changed from Rajmund Hruška to Pavel Kácha

I pushed a new branch with a new version, because I used a slightly different approach. Now I also try to join ranges and nets. The code also handles the case when some IP address in the IDEA event is part of a range or a net which is also in the IDEA event.

The IP addresses are now displayed in descending order. IP nets are displayed as ranges (e.g. 192.168.0.0/27 -> 192.168.0.0-192.168.0.31), but in the database they are stored as nets.

https://homeproj.cesnet.cz/projects/mentat/repository/mentat-ng/revisions/3bfc67521efc95b6f30871e467b07a1ec062b43a
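For reference, the low and high ends of a net can be checked with the standard library ipaddress module. This is only a sketch for verifying such conversions by hand, not the code from the revision above; net_to_range is a hypothetical helper:

```python
import ipaddress


def net_to_range(cidr):
    """Return the "low-high" range string covered by a CIDR network."""
    net = ipaddress.ip_network(cidr)
    return f"{net.network_address}-{net.broadcast_address}"


print(net_to_range("192.0.2.64/26"))
# → 192.0.2.64-192.0.2.127
```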

Actions #7

Updated by Radko Krkoš over 3 years ago

  • To be discussed set to Yes

An addendum to VC discussion regarding interpretation of CIDR prefix notation.

The CIDR prefix notation is defined in RFC 4632, Section 3.1, as follows:

In the simplest sense, the change from Class A/B/C network numbers to
classless prefixes is to make explicit which bits in a 32-bit IPv4
address are interpreted as the network number (or prefix) associated
with a site and which are the used to number individual end systems
within the site.

This implies that it is possible to represent a specific end system using this notation.

In CIDR notation, a prefix is shown as a 4-octet
quantity, just like a traditional IPv4 address or network number,
followed by the "/" (slash) character, followed by a decimal value
between 0 and 32 that describes the number of significant bits.

Note the explicit "IPv4 address or network number".

For example, the legacy "Class B" network 172.16.0.0, with an implied
network mask of 255.255.0.0, is defined as the prefix 172.16.0.0/16,
the "/16" indicating that the mask to extract the network portion of
the prefix is a 32-bit value where the most significant 16 bits are
ones and the least significant 16 bits are zeros.

Again, this implies that the prefix notation is a concatenation of an IPv4 address (any: network, host/end system, broadcast), a slash, and the subnet length, as the instructions to extract the network portion are given.

Unfortunately all examples given are of address blocks (as the RFC deals with routing), so no hard evidence. Let us look at interpretations of this RFC.

Wikipedia - example of CIDR notation:

192.168.100.14/24 represents the IPv4 address 192.168.100.14 and its associated routing prefix 192.168.100.0, or equivalently, its subnet mask 255.255.255.0, which has 24 leading 1-bits.

Cisco - example in IP addressing and subnetting tutorial:

DeviceA: 172.16.17.30/20
DeviceB: 172.16.28.15/20

IONOS - a web hosting company has an in-depth CIDR tutorial, stating this:

Creating subnets is about creating commonalities. 201.105.7.34/24 is in the same network as 201.105.7.1/24. The suffix signals that only the first 24 bits of the network component are counted.

Et cetera.

My interpretation is that an address in prefix form with host portion bits set represents a host with subnetting information and should not be translated as an erroneous network specification.
For example, 192.168.1.10/24 is a host 192.168.1.10 in the subnetwork 192.168.1.0/24. The subnetwork information is not important from the point of view of a SIEM (but may be important to the administrator of the device, hence the inclusion in the IDEA alert).

My proposal is to ignore the subnet size information for non-network addresses for storage and search, and just provide it in the reports (as this information might be important to the end admin), or ignore it altogether if this turns out to be hard to implement. As for the subnets in prefix form (with no host bits set), these should continue to be interpreted as subnets.
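To illustrate this interpretation, here is a small sketch using the standard library ipaddress module, whose ip_interface type models exactly the host-plus-subnet pair. The function name interpret and its return shape are hypothetical:

```python
import ipaddress


def interpret(text):
    """Classify an address in prefix form as a host or a subnet.

    Host bits set (or a bare address) -> ("host", address).
    No host bits set and more than one address -> ("subnet", network).
    """
    iface = ipaddress.ip_interface(text)
    net = iface.network
    if iface.ip == net.network_address and net.num_addresses > 1:
        return ("subnet", str(net))
    return ("host", str(iface.ip))


print(interpret("192.168.1.10/24"))  # → ('host', '192.168.1.10')
print(interpret("192.168.1.0/24"))   # → ('subnet', '192.168.1.0/24')
```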

Actions #8

Updated by Pavel Kácha over 3 years ago

That holds only if everyone sending into Warden knows and respects all these standards, and consciously applies this distinction (bare IP vs CIDR network vs CIDR interface=IP+network) to Idea events. Realistically - Idea events are usually generated from templates or usual libraries. In templates you get whatever comes, and for senders it makes no sense to put CIDR interfaces in. In code you usually use IPranges, which do not support CIDR interfaces and strip IP bits (arguably it should raise an error), or the Python ipaddress module (which does not support arbitrary ranges, so they use an IPv4Address tuple, ignoring CIDR altogether).

Also note that the Idea docs (https://idea.cesnet.cz/en/definition#net4) mention only networks - which is again an omission on the Idea side, so CIDR interfaces make no sense here. Of course, we should clarify single IPs here, and I'm all for clarifying here that CIDR nets are only meant for networks and make no sense as CIDR interface pairs.

So - realistically - we should not deduce, but clarify docs before someone REALLY starts using it and create correct checks on our side.

Actions #9

Updated by Radko Krkoš over 3 years ago

  • To be discussed changed from Yes to No

Pavel Kácha wrote in #note-8:

That holds only if everyone sending into Warden knows and respects all these standards, and consciously applies this distinction (bare IP vs CIDR network vs CIDR interface=IP+network) to Idea events.

This is a void argument that you can make about anything. If you default on ignorance of the other side, all communication is futile and impossible.

Realistically - Idea events are usually generated from templates or usual libraries. In templates you get whatever comes, and for senders it makes no sense to put CIDR interfaces in.

I see your point, but the precise address must have come from somewhere (I doubt it was randomly made up as a special case of the subnet that was really meant). To me it makes more sense to ignore the prefix information than the host bits. Then I tried the most frowned-upon research method, a questionnaire (popularity contest). It seems that these CIDRs are interpreted as host IPs only by people with a networking background; others (the overwhelming majority) interpret IPs with a prefix (mostly because of the slash) as a subnet range, as the host part is not obvious to them. So this can probably be closed as a case of fachidiotism on my side.

In code you usually use IPranges, which do not support CIDR interfaces and strip IP bits (arguably it should raise an error), or the Python ipaddress module (which does not support arbitrary ranges, so they use an IPv4Address tuple, ignoring CIDR altogether).

I support the case of raising an error.
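For comparison, the standard library ipaddress module already behaves this way by default: ip_network() raises ValueError when host bits are set, and only strips them when strict=False is passed. A minimal sketch (parse_network_strict is a hypothetical wrapper, not part of IPranges):

```python
import ipaddress


def parse_network_strict(text):
    """Return the network, or raise ValueError if host bits are set."""
    return ipaddress.ip_network(text, strict=True)


try:
    parse_network_strict("192.168.1.10/24")
except ValueError as err:
    print("rejected:", err)

# With strict=False the host bits are silently dropped instead.
print(ipaddress.ip_network("192.168.1.10/24", strict=False))
# → 192.168.1.0/24
```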

Also note that the Idea docs (https://idea.cesnet.cz/en/definition#net4) mention only networks - which is again an omission on the Idea side, so CIDR interfaces make no sense here. Of course, we should clarify single IPs here, and I'm all for clarifying here that CIDR nets are only meant for networks and make no sense as CIDR interface pairs.

I support this idea.

So - realistically - we should not deduce, but clarify docs before someone REALLY starts using it and create correct checks on our side.

Again, I do not see this as some far-fetched deduction (as I have hopefully shown, it is the canonical meaning), but I also do not find it productive to continue arguing. I agree that clarifying the documentation is the best course of action.

Thanks for the discussion.

Actions #10

Updated by Pavel Kácha about 3 years ago

  • Target version changed from Backlog to 2.8
Actions #11

Updated by Pavel Kácha about 3 years ago

  • Status changed from Feedback to In Review
Actions #12

Updated by Jan Mach about 3 years ago

Merged into devel.

Actions #13

Updated by Jan Mach about 3 years ago

  • Status changed from In Review to Closed
Actions #14

Updated by Rajmund Hruška 11 months ago

  • Status changed from Closed to Feedback
  • Target version changed from 2.8 to 2.11

Seems like it's not finished yet. The IP addresses should also be sorted.

Actions #15

Updated by Rajmund Hruška 11 months ago

Rajmund Hruška wrote in #note-14:

Seems like it's not finished yet. The IP addresses should also be sorted.

Radko Krkoš, the IP addresses are actually aggregated and stored in the database in descending order. The GUI just shows them in the order in which they appear in the JSON we received.

Actions #16

Updated by Rajmund Hruška 11 months ago

  • To be discussed changed from No to Yes
Actions #17

Updated by Rajmund Hruška 11 months ago

  • Status changed from Feedback to Closed
  • Target version changed from 2.11 to Backlog
  • To be discussed deleted (Yes)