Feature #4277

Test split ranges in metadata table

Added by Pavel Kácha over 6 years ago. Updated about 5 years ago.

Status: Closed
Priority: High
Assignee:
Category: Development - Core
Target version:
Start date: 08/21/2018
Due date:
% Done: 100%
Estimated time:
To be discussed:

Description

Queries involving ranges often fall back to a table scan or index scan. It may help to split out covering ranges from the events themselves into the metadata table to get them indexed, and to extend the Source/Target queries by ANDing these covering ranges first. Queries then might end up using an index for the covering range (which would work for the vast majority of events), and resort to double-checking the array of IP ranges the hard way only on the result.

We count on the PostgreSQL planner's cleverness here: it should use the index to trim down the amount of hard-scanned data whenever its data statistics suggest it makes sense.

It needs some tests.
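
A minimal sketch of the intended query shape (the source_aggr column and the literal address are hypothetical placeholders, not a final schema):

-- covering range checked first (indexable), exact array check second
SELECT * FROM events
WHERE source_aggr && '192.0.2.1'        -- indexed covering range
  AND '192.0.2.1' && ANY(source_ip)     -- hard check on the ranges array
LIMIT 100;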


Files

ip_aggr_test.py (1.42 KB), Radko Krkoš, 08/09/2019 10:03 AM

Related issues

Related to Mentat - Bug #4515: Runaway query? (Closed), Pavel Kácha, 01/14/2019

Actions #1

Updated by Radko Krkoš about 6 years ago

  • % Done changed from 0 to 10

This is a bit tricky as there could be events using both IPv4 and IPv6 addresses in either source or target. There is no reasonable way to aggregate IPv4 and IPv6 addresses into one supernet range.

Possible solutions:
  1. Split {source,target}_aggr into {source,target}_ip{4,6}_aggr; this complicates the searching logic, as IPv4 and IPv6 addresses must be distinguished and the query constructed accordingly. Also, the metadata table will grow more, and all four new columns will be sparse (less so for the IPv4 versions, but still).
  2. As most events use one address family exclusively, aggregate over the family used. This has the disadvantage of failing for mixed events, either at aggregation or by providing wrong results for searches.

I am not strongly in favor of either. Alternative 1 is less wrong from the business logic point of view, but will require a lot of additional changes in query generation (an already delicate part) and is not particularly DB-friendly.
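
For illustration, a rough DDL sketch of alternative 1 (the column names follow the later discussion; the ip4r/ip6r types are an assumption):

ALTER TABLE events ADD COLUMN source_ip4_aggr ip4r;
ALTER TABLE events ADD COLUMN source_ip6_aggr ip6r;
ALTER TABLE events ADD COLUMN target_ip4_aggr ip4r;
ALTER TABLE events ADD COLUMN target_ip6_aggr ip6r;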

Actions #2

Updated by Pavel Kácha about 6 years ago

Are sparse tables that much of a hassle? (I mean, is, dunno, table space utilization overhead too big? Index space? Sparse indices bad idea?)

Actions #3

Updated by Pavel Kácha about 6 years ago

Split out ranges have the drawback of PQ having to check the accessible and indexed range and ALSO always check the array. Like:

... query && source_aggr AND query && ANY(source_ip)

How about letting PQ know (in indexed field) that info in source_aggr is complete? Like:

... (source_item_count=1 AND query && source_aggr) OR (source_item_count>1 AND query && source_aggr AND query && ANY(source_ip))

If PQ is clever enough, it might short-circuit on the first part of the query, and only go for source_ip in multi-value events.

Actions #4

Updated by Pavel Kácha about 6 years ago

Radko Krkoš wrote:

  1. As most events use one address family exclusively, aggregate over the family used. This has the disadvantage of failing for mixed events, either at aggregation or by providing wrong results for searches.

Seems to me that this variant would need to be well thought out so we don't get wrong results for searches (and PQ just falls back to slower array intersection).

Do we have ANY mixed family events at all?

Actions #5

Updated by Radko Krkoš about 6 years ago

Pavel Kácha wrote:

Radko Krkoš wrote:

  1. As most events use one address family exclusively, aggregate over the family used. This has the disadvantage of failing for mixed events, either at aggregation or by providing wrong results for searches.

Seems to me that this variant would need to be well thought out so we don't get wrong results for searches (and PQ just falls back to slower array intersection).

That would be possible with sparse columns; the condition would actually be quite universal. I will address the sparse column question in another reply later.

Do we have ANY mixed family events at all?

We can in principle. In the past we have always respected such possibilities. Ignoring this case would make this issue much simpler, but there is no reasonable way to avoid wrong results for an eventual mixed event.

Actions #6

Updated by Radko Krkoš about 6 years ago

Pavel Kácha wrote:

Are sparse tables that much of a hassle?

Apart from being actively discouraged, as sparse column use means suboptimal Normal Form decomposition, they are quite well supported in PostgreSQL. The issue in fact is that those columns (Source_IP4, Source_IP6) are almost orthogonal, at least for the current data. That does not look like good design from any angle.

(I mean, is, dunno, table space utilization overhead too big?

Actual size increase depends on the data type, but for 125M rows the overhead is not negligible in any case. I would advise against adding columns extensively.

Index space?

Indices can be created quite efficiently for sparse columns if the NULL rows are not interesting. Otherwise a full BTREE index on one column of our data is about 5GB in size. The benefits of partial index also disappear quickly for multicolumn indices (well, for our data) due to orthogonality.

Sparse indices bad idea?)

As mentioned above, they are harder to combine into multicolumn ones efficiently. For GIN indices, the planner currently cannot even use sparse ones, as mentioned (including a solution) in #4348. The downside is also a considerably more complicated query filtering condition.
There is a difference between NULL and the empty array '{}'. In Mentat the latter is used; most of the metadata table columns are arrays. For the IP address aggregation the data type would be iprange, so a sparse GIN index could be created with the condition "IS NOT NULL".
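
A sketch of such a sparse (partial) index, assuming a hypothetical source_ip_aggr column and an available GIN operator class for its type:

CREATE INDEX events_source_aggr_sparse_idx ON events
    USING GIN (source_ip_aggr)
    WHERE source_ip_aggr IS NOT NULL;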

Actions #7

Updated by Radko Krkoš about 6 years ago

Pavel Kácha wrote:

Split out ranges have the drawback of PQ having to check the accessible and indexed range and ALSO always check the array. Like:

Have you tested this? It should not be the case. The idea is that the aggregation can be efficiently indexed, while no index can be used for ANY. Therefore the query planner should use an index search over the aggregation index and a recheck condition with ANY. As the index scan will often exclude the non-matching rows, the recheck should be unnecessary.
Am I wrong?

Actions #8

Updated by Pavel Kácha about 6 years ago

Radko Krkoš wrote:

Pavel Kácha wrote:

Radko Krkoš wrote:

  1. As most events use one address family exclusively, aggregate over the family used. This has the disadvantage of failing for mixed events, either at aggregation or by providing wrong results for searches.

Seems to me that this variant would need to be well thought out so we don't get wrong results for searches (and PQ just falls back to slower array intersection).

That would be possible with sparse columns; the condition would actually be quite universal. I will address the sparse column question in another reply later.

Do we have ANY mixed family events at all?

We can in principle. In the past we have always respected such possibilities. Ignoring this case would make this issue much simpler, but there is no reasonable way to avoid wrong results for an eventual mixed event.

Sure we can, but do we? We are shooting for a good-enough solution; even this whole source_aggr dance is just sidestepping the problem and heading for the common case (one range or at least close ranges), which may come back to shoot us in the future.

However, I'm still not convinced by any of the alternatives, or maybe I'm more swayed to solution 1; I'm still just trying to figure out how severe the real drawbacks may be (performance vs space vs complexity). I don't see complexity as a big problem, as the (though complex) queries will be written just once in the code and not touched again.

But, if we don't have combined IPv4/IPv6 events (or only a very minimal number) right now, and if we can get solution 2 working in a way that yields correct results (though suboptimally for IPv4/IPv6 combined events, and by suboptimally I mean as slowly/quickly as now), 1 may be overkill for now.

Actions #9

Updated by Pavel Kácha about 6 years ago

Radko Krkoš wrote:

Pavel Kácha wrote:

Are sparse tables that much of a hassle?

Apart from being actively discouraged, as sparse column use means suboptimal Normal Form decomposition, they are quite well supported in PostgreSQL. The issue in fact is that those columns (Source_IP4, Source_IP6) are almost orthogonal, at least for the current data. That does not look like good design from any angle.

Yes, alas the whole metadata table is not a good design from any angle, however it just happens to work for us. :)

Yup, they're orthogonal, distinct datatypes, and I'm not sure clenching them together is a good idea (that is why I kept them split in Idea, to not mess together their integer representation). They are distinct namespaces, it just happens everybody shovels them together to simplify things... most of the time.

(I mean, is, dunno, table space utilization overhead too big?

Actual size increase depends on the data type, but for 125M rows the overhead is not negligible in any case. I would advise against adding columns extensively.

Index space?

Indices can be created quite efficiently for sparse columns if the NULL rows are not interesting. Otherwise a full BTREE index on one column of our data is about 5GB in size. The benefits of partial index also disappear quickly for multicolumn indices (well, for our data) due to orthogonality.

Sparse indices bad idea?)

As mentioned above, they are harder to combine into multicolumn ones efficiently. For GIN indices, the planner currently cannot even use sparse ones, as mentioned (including a solution) in #4348. The downside is also a considerably more complicated query filtering condition.
There is a difference between NULL and the empty array '{}'. In Mentat the latter is used; most of the metadata table columns are arrays. For the IP address aggregation the data type would be iprange, so a sparse GIN index could be created with the condition "IS NOT NULL".

I see, multicolumns didn't strike me. Makes sense.

Yes, I sense the problem, that's why I revived #4348; I'm just not sure I grasp it fully and we have to talk it over live, possibly with Mek.

Agreed on adding columns cautiously, that's why we need to do some perf tests, to weigh the benefits/drawbacks.

Actions #10

Updated by Pavel Kácha about 6 years ago

Radko Krkoš wrote:

Pavel Kácha wrote:

Split out ranges have the drawback of PQ having to check the accessible and indexed range and ALSO always check the array. Like:

Have you tested this? It should not be the case. The idea is that the aggregation can be efficiently indexed, while no index can be used for ANY. Therefore the query planner should use an index search over the aggregation index and a recheck condition with ANY. As the index scan will often exclude the non-matching rows, the recheck should be unnecessary.
Am I wrong?

Wrong wording on my part; what I tried to say is pretty much what you summarized. PQ has to check the accessible and indexed range (aggr) and THEN ALSO check the array for the resulting set. My idea tried to go further, to mitigate the need to go to the array if the indexed range (aggr) is exactly the same as the array (array length 1).

Actions #11

Updated by Radko Krkoš about 6 years ago

Pavel Kácha wrote:

Radko Krkoš wrote:

Pavel Kácha wrote:

Radko Krkoš wrote:

  1. As most events use one address family exclusively, aggregate over the family used. This has the disadvantage of failing for mixed events, either at aggregation or by providing wrong results for searches.

Seems to me that this variant would need to be well thought out so we don't get wrong results for searches (and PQ just falls back to slower array intersection).

That would be possible with sparse columns; the condition would actually be quite universal. I will address the sparse column question in another reply later.

Do we have ANY mixed family events at all?

We can in principle. In the past we have always respected such possibilities. Ignoring this case would make this issue much simpler, but there is no reasonable way to avoid wrong results for an eventual mixed event.

Sure we can, but do we? We are shooting for a good-enough solution; even this whole source_aggr dance is just sidestepping the problem and heading for the common case (one range or at least close ranges), which may come back to shoot us in the future.

I see, OK, thanks for clarification.

However, I'm still not convinced by any of the alternatives, or maybe I'm more swayed to solution 1; I'm still just trying to figure out how severe the real drawbacks may be (performance vs space vs complexity). I don't see complexity as a big problem, as the (though complex) queries will be written just once in the code and not touched again.

Then there is the issue of reading the queries.

But, if we don't have combined IPv4/IPv6 events (or only a very minimal number) right now, and if we can get solution 2 working in a way that yields correct results (though suboptimally for IPv4/IPv6 combined events, and by suboptimally I mean as slowly/quickly as now), 1 may be overkill for now.

This cannot be done with 2 the way you described. If you only create one column, you can only store one range; IPv4 and IPv6 cannot be joined together, so you will never get a good aggregate for mixed events. The only way would be to use an array, but that would provide no benefit over the current solution, as no index could be used. Remember that even in #4253, the database search time was not the problem, so an array would gain us nothing.

Pavel Kácha wrote:

Radko Krkoš wrote:

Pavel Kácha wrote:

Are sparse tables that much of a hassle?

Apart from being actively discouraged, as sparse column use means suboptimal Normal Form decomposition, they are quite well supported in PostgreSQL. The issue in fact is that those columns (Source_IP4, Source_IP6) are almost orthogonal, at least for the current data. That does not look like good design from any angle.

Yes, alas the whole metadata table is not a good design from any angle, however it just happens to work for us. :)

I would respectfully disagree. The metadata table is the most direct way to tackle the problem. Yes, the data is somehow encoded in the BLOB payload, but not in a straightforward way that can be "explained" to the DB. The same principle would apply to a full-text research paper database with metadata: already present in the full text, yet reasonable to store in separate columns.

Yup, they're orthogonal, distinct datatypes, and I'm not sure clenching them together is a good idea (that is why I kept them split in Idea, to not mess together their integer representation). They are distinct namespaces, it just happens everybody shovels them together to simplify things... most of the time.

A good point.

(I mean, is, dunno, table space utilization overhead too big?

Actual size increase depends on the data type, but for 125M rows the overhead is not negligible in any case. I would advise against adding columns extensively.

Index space?

Indices can be created quite efficiently for sparse columns if the NULL rows are not interesting. Otherwise a full BTREE index on one column of our data is about 5GB in size. The benefits of partial index also disappear quickly for multicolumn indices (well, for our data) due to orthogonality.

Sparse indices bad idea?)

As mentioned above, they are harder to combine into multicolumn ones efficiently. For GIN indices, the planner currently cannot even use sparse ones, as mentioned (including a solution) in #4348. The downside is also a considerably more complicated query filtering condition.
There is a difference between NULL and the empty array '{}'. In Mentat the latter is used; most of the metadata table columns are arrays. For the IP address aggregation the data type would be iprange, so a sparse GIN index could be created with the condition "IS NOT NULL".

I see, multicolumns didn't strike me. Makes sense.

Yes, I sense the problem, that's why I revived #4348; I'm just not sure I grasp it fully and we have to talk it over live, possibly with Mek.

Obviously. We need a live talk about these issues.

Agreed on adding columns cautiously, that's why we need to do some perf tests, to weigh the benefits/drawbacks.

Pavel Kácha wrote:

Radko Krkoš wrote:

Pavel Kácha wrote:

Split out ranges have the drawback of PQ having to check the accessible and indexed range and ALSO always check the array. Like:

Have you tested this? It should not be the case. The idea is that the aggregation can be efficiently indexed, while no index can be used for ANY. Therefore the query planner should use an index search over the aggregation index and a recheck condition with ANY. As the index scan will often exclude the non-matching rows, the recheck should be unnecessary.
Am I wrong?

Wrong wording on my part; what I tried to say is pretty much what you summarized. PQ has to check the accessible and indexed range (aggr) and THEN ALSO check the array for the resulting set. My idea tried to go further, to mitigate the need to go to the array if the indexed range (aggr) is exactly the same as the array (array length 1).

I see. That would require us to use arrays for the aggregated ranges. As mentioned above, that would provide no benefit at all. Or am I missing something?

Actions #12

Updated by Pavel Kácha about 6 years ago

Radko Krkoš wrote:

Pavel Kácha wrote:

However, I'm still not convinced by any of the alternatives, or maybe I'm more swayed to solution 1; I'm still just trying to figure out how severe the real drawbacks may be (performance vs space vs complexity). I don't see complexity as a big problem, as the (though complex) queries will be written just once in the code and not touched again.

Then there is the issue of reading the queries.

You mean readability of the SQL code? Sure, but we can mitigate this with good commenting/docs.

But, if we don't have combined IPv4/IPv6 events (or only a very minimal number) right now, and if we can get solution 2 working in a way that yields correct results (though suboptimally for IPv4/IPv6 combined events, and by suboptimally I mean as slowly/quickly as now), 1 may be overkill for now.

This cannot be done with 2 the way you described. If you only create one column, you can only store one range; IPv4 and IPv6 cannot be joined together, so you will never get a good aggregate for mixed events. The only way would be to use an array, but that would provide no benefit over the current solution, as no index could be used. Remember that even in #4253, the database search time was not the problem, so an array would gain us nothing.

Not sure why arrays would be necessary?

What was your idea of 2? Yielding wrong answers I consider a no-go.

Using 2 means using it for IPv4 solely, or thinking out some magic (that's what I meant by "if we can get solution 2 working in a way...") to sidestep it (for example an explicit NULL for "don't use 'aggr' fields" and a more complex query). That however might mean even more complexity, and then source_ip4_aggr + source_ip6_aggr might be the only way.

Pavel Kácha wrote:

Radko Krkoš wrote:

Pavel Kácha wrote:

Are sparse tables that much of a hassle?

Apart from being actively discouraged, as sparse column use means suboptimal Normal Form decomposition, they are quite well supported in PostgreSQL. The issue in fact is that those columns (Source_IP4, Source_IP6) are almost orthogonal, at least for the current data. That does not look like good design from any angle.

Yes, alas the whole metadata table is not a good design from any angle, however it just happens to work for us. :)

I would respectfully disagree. The metadata table is the most direct way to tackle the problem. Yes, the data is somehow encoded in the BLOB payload, but not in a straightforward way that can be "explained" to the DB. The same principle would apply to a full-text research paper database with metadata: already present in the full text, yet reasonable to store in separate columns.

Ok... maybe. Or the EAV model perhaps (flexible, but not very performant).

Pavel Kácha wrote:

Radko Krkoš wrote:

Pavel Kácha wrote:

Split out ranges have the drawback of PQ having to check the accessible and indexed range and ALSO always check the array. Like:

Have you tested this? It should not be the case. The idea is that the aggregation can be efficiently indexed, while no index can be used for ANY. Therefore the query planner should use an index search over the aggregation index and a recheck condition with ANY. As the index scan will often exclude the non-matching rows, the recheck should be unnecessary.
Am I wrong?

Wrong wording on my part; what I tried to say is pretty much what you summarized. PQ has to check the accessible and indexed range (aggr) and THEN ALSO check the array for the resulting set. My idea tried to go further, to mitigate the need to go to the array if the indexed range (aggr) is exactly the same as the array (array length 1).

I see. That would require us to use arrays for the aggregated ranges. As mentioned above, that would provide no benefit at all. Or am I missing something?

Not sure why arrays? What I meant was to have indexable information somewhere about whether source_whatever_aggr is exact, or merely encompassing. If complete, there is no need for PQ to double-check in the source_ip array.

Actions #13

Updated by Radko Krkoš about 6 years ago

Pavel Kácha wrote:

You mean readability of the SQL code? Sure, but we can mitigate this with good commenting/docs.

I prefer self-documenting code but agreed.

But, if we don't have combined IPv4/IPv6 events (or only a very minimal number) right now, and if we can get solution 2 working in a way that yields correct results (though suboptimally for IPv4/IPv6 combined events, and by suboptimally I mean as slowly/quickly as now), 1 may be overkill for now.

This cannot be done with 2 the way you described. If you only create one column, you can only store one range; IPv4 and IPv6 cannot be joined together, so you will never get a good aggregate for mixed events. The only way would be to use an array, but that would provide no benefit over the current solution, as no index could be used. Remember that even in #4253, the database search time was not the problem, so an array would gain us nothing.

Not sure why arrays would be necessary?

What was your idea of 2? Yielding wrong answers I consider a no-go.

Oh, good. I was a bit worried. I was not sure how to tackle this, but...

Using 2 means using it for IPv4 solely, or thinking out some magic (that's what I meant by "if we can get solution 2 working in a way...") to sidestep it (for example an explicit NULL for "don't use 'aggr' fields" and a more complex query). That however might mean even more complexity, and then source_ip4_aggr + source_ip6_aggr might be the only way.

the variant for IPv4 solely sounds the most reasonable from the DB structure standpoint. It would nevertheless lead to complications in the code / different queries based on IP family. I doubt all that is worth it. Let's discuss live.

Pavel Kácha wrote:
I would respectfully disagree. The metadata table is the most direct way to tackle the problem. Yes, the data is somehow encoded in the BLOB payload, but not in a straightforward way that can be "explained" to the DB. The same principle would apply to a full-text research paper database with metadata: already present in the full text, yet reasonable to store in separate columns.

Ok... maybe. Or the EAV model perhaps (flexible, but not very performant).

Like building a different simplified JSON IDEA and storing/querying that one? Intriguing idea. Or a sub-IDEA variant for sparse entities only... Anyway, I see little benefit; most would be almost the same, with querying a bit more complicated. Probably not worth it. I will try to test that in my spare time, but assessing the performance will require quite a large DB.

Pavel Kácha wrote:
I see. That would require us to use arrays for the aggregated ranges. As mentioned above, that would provide no benefit at all. Or am I missing something?

Not sure why arrays? What I meant was to have indexable information somewhere about whether source_whatever_aggr is exact, or merely encompassing. If complete, there is no need for PQ to double-check in the source_ip array.

I see, the length of the original IP address array. Sorry for being stupid. Anyway, not worth adding the new column IMHO. And as discussed above, the potential improvement is negligible over the added aggregated column. It would not be possible to index this together with the aggregated value (that would require GIN; for these sizes BTREE would be almost a requirement).

Actions #14

Updated by Pavel Kácha about 6 years ago

Radko Krkoš wrote:

Pavel Kácha wrote:

Pavel Kácha wrote:
I would respectfully disagree. The metadata table is the most direct way to tackle the problem. Yes, the data is somehow encoded in the BLOB payload, but not in a straightforward way that can be "explained" to the DB. The same principle would apply to a full-text research paper database with metadata: already present in the full text, yet reasonable to store in separate columns.

Ok... maybe. Or the EAV model perhaps (flexible, but not very performant).

Like building a different simplified JSON IDEA and storing/querying that one? Intriguing idea. Or a sub-IDEA variant for sparse entities only... Anyway, I see little benefit; most would be almost the same, with querying a bit more complicated. Probably not worth it. I will try to test that in my spare time, but assessing the performance will require quite a large DB.

Don't waste time on that, 'twas just a reaction to storing research papers. The EAV model is useful for objects with many different attributes, and always needs joining. (We're using EAV in Negistry, for RIPE-like objects.)

Actions #15

Updated by Pavel Kácha almost 6 years ago

So, does it make a difference if those encompassing range fields are in the GIN index and queries use them along with source_ip and target_ip?

Actions #16

Updated by Radko Krkoš almost 6 years ago

Actions #17

Updated by Radko Krkoš almost 6 years ago

  • Status changed from New to Feedback
  • % Done changed from 10 to 90

TLDR: Let's discuss this in VC.

Pavel Kácha wrote:

So, does it make a difference if those encompassing range fields are in the GIN index and queries use them along with source_ip and target_ip?

The GIN index variant was pursued, aggregating the lists of source/target addresses into CIDR ranges. The data were saved in this form:

ALTER TABLE events ADD COLUMN source_ip_inet_ip4 inet;
ALTER TABLE events ADD COLUMN source_ip_inet_ip6 inet;
ALTER TABLE events ADD COLUMN target_ip_inet_ip4 inet;
ALTER TABLE events ADD COLUMN target_ip_inet_ip6 inet;

The conversion was done using plpgsql custom functions:

CREATE OR REPLACE FUNCTION cidr_ip4(iprange[]) RETURNS cidr
AS
$$
DECLARE
    retVal cidr;
    r_ip iprange;
    r_ip4 ip4r;
    cidr_range cidr;
BEGIN
    FOREACH r_ip IN ARRAY $1 LOOP
        IF family(r_ip) = 4 THEN
            r_ip4 = r_ip::ip4r;
            FOR cidr_range IN SELECT cidr_split(r_ip4) LOOP
                IF retVal IS NULL THEN
                    retVal := cidr_range;
                ELSE
                    retVal := inet_merge(retVal, cidr_range);
                END IF;
            END LOOP;
        END IF;
    END LOOP;
RETURN retVal;
END;
$$
LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION cidr_ip6(iprange[]) RETURNS cidr
AS
$$
DECLARE
    retVal cidr;
    r_ip iprange;
    r_ip6 ip6r;
    cidr_range cidr;
BEGIN
    FOREACH r_ip IN ARRAY $1 LOOP
        IF family(r_ip) = 6 THEN
            r_ip6 = r_ip::ip6r;
            FOR cidr_range IN SELECT cidr_split(r_ip6) LOOP
                IF retVal IS NULL THEN
                    retVal := cidr_range;
                ELSE
                    retVal := inet_merge(retVal, cidr_range);
                END IF;
            END LOOP;
        END IF;
    END LOOP;
RETURN retVal;
END;
$$
LANGUAGE plpgsql;

and an UPDATE statement:

UPDATE events SET source_ip_inet_ip4 = cidr_ip4(source_ip), source_ip_inet_ip6 = cidr_ip6(source_ip), target_ip_inet_ip4 = cidr_ip4(target_ip), target_ip_inet_ip6 = cidr_ip6(target_ip);

A GIN index could then be created, but unfortunately I was unable to make PostgreSQL use the index for queries.
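
For reference, a sketch of the attempted index; the exact definition is an assumption (the btree_gin extension was loaded, see #21 below):

CREATE EXTENSION IF NOT EXISTS btree_gin;
CREATE INDEX events_ip_inet_idx ON events
    USING GIN (source_ip_inet_ip4, target_ip_inet_ip4, source_ip_inet_ip6, target_ip_inet_ip6);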

An alternative approach was also tested, resulting in ip4r/ip6r aggregates. These are arbitrary ranges so generally somewhat more selective. Table schema change & aggregation functions:

ALTER TABLE events ADD COLUMN source_ip_aggr_ip4 ip4r;
ALTER TABLE events ADD COLUMN source_ip_aggr_ip6 ip6r;
ALTER TABLE events ADD COLUMN target_ip_aggr_ip4 ip4r;
ALTER TABLE events ADD COLUMN target_ip_aggr_ip6 ip6r;

CREATE OR REPLACE FUNCTION aggr_ip4(iprange[]) RETURNS ip4r
AS
$$
DECLARE
    retVal ip4r;
    r_ip iprange;
    r_ip4 ip4r;
    _min ip4;
    _max ip4;
BEGIN
    FOREACH r_ip IN ARRAY $1 LOOP
        IF family(r_ip) = 4 THEN
            r_ip4 = r_ip::ip4r;
            IF retVal IS NULL THEN
                retVal := r_ip4;
            ELSE
                _min := lower(retVal);
                _max := upper(retVal);
                IF _min > lower(r_ip4) THEN
                    _min := lower(r_ip4);
                END IF;
                IF _max < upper(r_ip4) THEN
                    _max := upper(r_ip4);
                END IF;
                retVal := ip4r(_min, _max);
            END IF;
        END IF;
    END LOOP;
RETURN retVal;
END;
$$
LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION aggr_ip6(iprange[]) RETURNS ip6r
AS
$$
DECLARE
    retVal ip6r;
    r_ip iprange;
    r_ip6 ip6r;
    _min ip6;
    _max ip6;
BEGIN
    FOREACH r_ip IN ARRAY $1 LOOP
        IF family(r_ip) = 6 THEN
            r_ip6 = r_ip::ip6r;
            IF retVal IS NULL THEN
                retVal := r_ip6;
            ELSE
                _min := lower(retVal);
                _max := upper(retVal);
                IF _min > lower(r_ip6) THEN
                    _min := lower(r_ip6);
                END IF;
                IF _max < upper(r_ip6) THEN
                    _max := upper(r_ip6);
                END IF;
                retVal := ip6r(_min, _max);
            END IF;
        END IF;
    END LOOP;
RETURN retVal;
END;
$$
LANGUAGE plpgsql;

For these, a GIST index can be created:

CREATE INDEX events_ip_aggr_idx ON events USING GIST (source_ip_aggr_ip4, target_ip_aggr_ip4, source_ip_aggr_ip6, target_ip_aggr_ip6);

Such an index can be used to greatly improve an IP address search:

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM events WHERE source_ip_aggr_ip4 && '195.113.214.167' AND ('195.113.214.167' && ANY(source_ip)) AND detecttime>'2018-07-23' ORDER BY detecttime DESC LIMIT 100;
                                                                        QUERY PLAN                                                                        
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=59854.53..59912.18 rows=100 width=750) (actual time=101.214..101.227 rows=22 loops=1)
   Buffers: shared hit=9899
   ->  Gather Merge  (cost=59854.53..61215.09 rows=2360 width=750) (actual time=101.211..112.411 rows=22 loops=1)
         Workers Planned: 4
         Workers Launched: 4
         Buffers: shared hit=20627
         ->  Sort  (cost=58852.67..58899.87 rows=590 width=750) (actual time=87.701..87.702 rows=4 loops=5)
               Sort Key: detecttime DESC
               Sort Method: quicksort  Memory: 29kB
               Worker 0:  Sort Method: quicksort  Memory: 27kB
               Worker 1:  Sort Method: quicksort  Memory: 26kB
               Worker 2:  Sort Method: quicksort  Memory: 28kB
               Worker 3:  Sort Method: quicksort  Memory: 26kB
               Buffers: shared hit=20627
               ->  Parallel Bitmap Heap Scan on events  (cost=4816.46..58131.09 rows=590 width=750) (actual time=71.880..87.617 rows=4 loops=5)
                     Recheck Cond: (source_ip_aggr_ip4 && '195.113.214.167'::ip4r)
                     Filter: ((detecttime > '2018-07-23 00:00:00'::timestamp without time zone) AND ('195.113.214.167'::iprange && ANY (source_ip)))
                     Rows Removed by Filter: 3398
                     Heap Blocks: exact=6003
                     Buffers: shared hit=20595
                     ->  Bitmap Index Scan on events_ip_aggr_idx  (cost=0.00..4797.60 rows=48236 width=0) (actual time=61.973..61.973 rows=17014 loops=1)
                           Index Cond: (source_ip_aggr_ip4 && '195.113.214.167'::ip4r)
                           Buffers: shared hit=3896
 Planning Time: 1.273 ms
 Execution Time: 112.595 ms

versus:

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM events WHERE ('195.113.214.167' && ANY(source_ip)) AND detecttime>'2018-07-23' ORDER BY detecttime DESC LIMIT 100;
                                                                            QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=13.92..1174.53 rows=100 width=750) (actual time=9176.079..13275.647 rows=22 loops=1)
   Buffers: shared hit=7494800 read=636646
   ->  Index Scan Backward using events_detecttime_idx on events  (cost=13.92..5474000.92 rows=471648 width=750) (actual time=9176.075..13275.638 rows=22 loops=1)
         Index Cond: (detecttime > '2018-07-23 00:00:00'::timestamp without time zone)
         Filter: ('195.113.214.167'::iprange && ANY (source_ip))
         Rows Removed by Filter: 9647138
         Buffers: shared hit=7494800 read=636646
 Planning Time: 1.191 ms
 Execution Time: 13275.732 ms

On the other hand, the network prefix address form is problematic: with the detecttime index, the runtime is inversely proportional to the prefix size (with a larger prefix, more IPs match the filter and less disk reading is required), while with the aggr GIST index it is directly proportional (a smaller prefix means less data to read, analyze and sort). The optimum method switch happens at about /12. As iprange is an extension and the planner does not understand the role of the prefix (the estimated cost is the same regardless of prefix length), the choice of the best version cannot currently be made automatically, forcing us to construct the query manually to include/exclude the filtering on these new columns based on prefix length.

The other downside is that the new columns cannot be included in the combined GIN index, requiring a new GIST index. Those are rather large; the estimate for two months of stored data is ~15GB. Also, it took a lot of tinkering with costs for the planner to correctly pick the GIST-index-based plan.
Nevertheless, this would allow us to speed up the single badly performing query type left, and with all query types supported by indices, the impact of an out-of-cache situation as happened recently would be minimized.

Actions #18

Updated by Radko Krkoš almost 6 years ago

The optimum method switch happens at about /12.

Actually the cutoff is somewhere around /6, as the times are of the same order of magnitude until then.

Actions #19

Updated by Pavel Kácha almost 6 years ago

Radko Krkoš wrote:

On the other hand, the network prefix address form is problematic: with the detecttime index, the runtime is inversely proportional to the prefix size (with a larger prefix, more IPs match the filter and less disk reading is required), while with the aggr GIST index it is directly proportional (a smaller prefix means less data to read, analyze and sort). The optimum method switch happens at about /12. As iprange is an extension and the planner does not understand the role of the prefix (the estimated cost is the same regardless of prefix length), the choice of the best version cannot currently be made automatically, forcing us to construct the query manually to include/exclude the filtering on these new columns based on prefix length.

Now that's interesting. I hadn't anticipated that large an impact, big thanks for testing. However, I'm not sure what you mean by prefix when the GIST version uses arbitrary ranges?

It's unfortunate the GIN CIDR version doesn't work. So it seems GIN supports && only on arrays, not on CIDR ranges. Bad luck.

The other downside is that the new columns cannot be included in the combined GIN index, requiring a new GIST index. Those are rather large; the estimate for two months of stored data is ~15GB. Also, it took a lot of tinkering with costs for the planner to correctly pick the GIST-index-based plan.
Nevertheless, this would allow us to speed up the single badly performing query type left, and with all query types supported by indices, the impact of an out-of-cache situation as happened recently would be minimized.

Yes, I would at least give it a try (if the heuristic of choosing the query is straightforward, ergo not that much work). We can drop the index anytime.

No need to answer, we'll talk IVC after I'm back and done sightseeing. *grin*

Actions #20

Updated by Pavel Kácha almost 6 years ago

Pavel Kácha wrote:

It's unfortunate the GIN CIDR version doesn't work. So it seems GIN supports && only on arrays, not on CIDR ranges. Bad luck.

Quick, maybe wrong, hint: do you have the btree_gin extension loaded for the inet type?

Actions #21

Updated by Radko Krkoš almost 6 years ago

Pavel Kácha wrote:

Pavel Kácha wrote:

It's unfortunate the GIN CIDR version doesn't work. So it seems GIN supports && only on arrays, not on CIDR ranges. Bad luck.

Quick, maybe wrong, hint: do you have the btree_gin extension loaded for the inet type?

Yes, otherwise it would not be possible to even create the index.

Actions #22

Updated by Radko Krkoš almost 6 years ago

Pavel Kácha wrote:
However, I'm not sure what you mean by prefix when the GIST version uses arbitrary ranges?

Yes, technically you can use arbitrary ranges; I was testing this using prefix ranges. They are logically equivalent for whole subnets. Keep in mind that these are not hard numbers; the important measure is the number of matching rows, not the IP range size.

Actions #23

Updated by Pavel Kácha almost 6 years ago

Radko Krkoš wrote:

Pavel Kácha wrote:
However, I'm not sure what you mean by prefix when the GIST version uses arbitrary ranges?

Yes, technically you can use arbitrary ranges; I was testing this using prefix ranges. They are logically equivalent for whole subnets. Keep in mind that these are not hard numbers; the important measure is the number of matching rows, not the IP range size.

Ack, so I understood this right, but I'm lost in something else. Let's not bury ourselves in characters, talk will be more productive.

Actions #24

Updated by Radko Krkoš almost 6 years ago

Pavel Kácha wrote:

Ack, so I understood this right, but I'm lost in something else. Let's not bury ourselves in characters, talk will be more productive.

Yeah, I tried reducing the amount of text, but the topic is too extensive.

Actions #25

Updated by Pavel Kácha almost 6 years ago

Partial wrap-up from 2019-02-07 meeting:

The planner does not take into consideration the size of the queried range,
  • so we need to choose the right query ourselves (configuration option?),
  • and some surgical changes in planner weights have been necessary.

Radko, please prepare example search and insert queries and the index creation/migration. Also, it will be necessary to document the whats and whys of the planner weight changes and configuration (breakpoint range size).

Actions #26

Updated by Radko Krkoš almost 6 years ago

Based on the online discussion, the following approach was chosen as a solution:

Accelerating IP searches using aggregated ranges and GIST indices

The procedure requires a table alteration with a calculation-based conversion.
1. Alter the table
The following four new columns will be added:

ALTER TABLE events ADD COLUMN source_ip_aggr_ip4 ip4r;
ALTER TABLE events ADD COLUMN source_ip_aggr_ip6 ip6r;
ALTER TABLE events ADD COLUMN target_ip_aggr_ip4 ip4r;
ALTER TABLE events ADD COLUMN target_ip_aggr_ip6 ip6r;

2. Populate the new columns with the respective aggregated values
The idea is to create a single IPv4/IPv6 range containing all source/target addresses in the event. To aggregate the IP addresses, the following two plpgsql functions are used. This allows for higher speed, as the data is processed entirely inside the database system, thus avoiding random latencies. The processing is then disk-bound.

CREATE OR REPLACE FUNCTION aggr_ip4(iprange[]) RETURNS ip4r
AS
$$
DECLARE
    retVal ip4r;
    r_ip iprange;
    r_ip4 ip4r;
    _min ip4;
    _max ip4;
BEGIN
    FOREACH r_ip IN ARRAY $1 LOOP
        IF family(r_ip) = 4 THEN
            r_ip4 = r_ip::ip4r;
            IF retVal IS NULL THEN
                retVal := r_ip4;
            ELSE
                _min := lower(retVal);
                _max := upper(retVal);
                IF _min > lower(r_ip4) THEN
                    _min := lower(r_ip4);
                END IF;
                IF _max < upper(r_ip4) THEN
                    _max := upper(r_ip4);
                END IF;
                retVal := ip4r(_min, _max);
            END IF;
        END IF;
    END LOOP;
RETURN retVal;
END;
$$
LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION aggr_ip6(iprange[]) RETURNS ip6r
AS
$$
DECLARE
    retVal ip6r;
    r_ip iprange;
    r_ip6 ip6r;
    _min ip6;
    _max ip6;
BEGIN
    FOREACH r_ip IN ARRAY $1 LOOP
        IF family(r_ip) = 6 THEN
            r_ip6 = r_ip::ip6r;
            IF retVal IS NULL THEN
                retVal := r_ip6;
            ELSE
                _min := lower(retVal);
                _max := upper(retVal);
                IF _min > lower(r_ip6) THEN
                    _min := lower(r_ip6);
                END IF;
                IF _max < upper(r_ip6) THEN
                    _max := upper(r_ip6);
                END IF;
                retVal := ip6r(_min, _max);
            END IF;
        END IF;
    END LOOP;
RETURN retVal;
END;
$$
LANGUAGE plpgsql;

The column population is then performed as follows:
UPDATE events SET source_ip_aggr_ip4 = aggr_ip4(source_ip), source_ip_aggr_ip6 = aggr_ip6(source_ip), target_ip_aggr_ip4 = aggr_ip4(target_ip), target_ip_aggr_ip6 = aggr_ip6(target_ip);

The aggregation functions are needed for the migration only and can safely be deleted afterwards.
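
For example, once the migration has been verified:

DROP FUNCTION IF EXISTS aggr_ip4(iprange[]);
DROP FUNCTION IF EXISTS aggr_ip6(iprange[]);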

3. Create GIST indices to accelerate the searches

A GIST index can be used to accelerate the && (overlaps) operation of ip4r. All four new columns can be combined into one index, which saves space; nevertheless, the index is quite big anyway. The combined GIST index can be created as follows:

CREATE INDEX events_ip_aggr_idx ON events USING GIST (source_ip_aggr_ip4, target_ip_aggr_ip4, source_ip_aggr_ip6, target_ip_aggr_ip6);

That concludes the architectural changes to the DB structure.

4. Altered queries for accelerated filtering
For the applied changes to be useful, querying has to be altered as well, as this optimization is based on a data reinterpretation far beyond the scope of the database query planner or its understanding of data types. A typical query for a single address looks like this:

SELECT * FROM events WHERE (__IP__ && ANY(source_ip)) AND detecttime>__DATE__ ORDER BY detecttime DESC LIMIT 100;

where __IP__ is an IP address and __DATE__ is a lower bound datetime, can be altered as follows:
SELECT * FROM events WHERE source_ip_aggr_ipX && __IP__ AND (__IP__ && ANY(source_ip)) AND detecttime>__DATE__ ORDER BY detecttime DESC LIMIT 100;

where source_ip_aggr_ipX is source_ip_aggr_ip4 for IPv4 or source_ip_aggr_ip6 for IPv6. Effectively, this part is added right before the current IP filter:
source_ip_aggr_ipX && __IP__

The process is analogous for target filtering.
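
For illustration, the target variant with the same placeholders:

SELECT * FROM events WHERE target_ip_aggr_ipX && __IP__ AND (__IP__ && ANY(target_ip)) AND detecttime>__DATE__ ORDER BY detecttime DESC LIMIT 100;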

5. Execution plan optimization (by altering the query wording)
The query modification described in the previous section offers a very good performance improvement for single IP addresses, which is a quite common use case. For range searches (generally in CIDR form), which are also used frequently, the situation is unfortunately not as straightforward. Choosing the optimal plan should be based on filter selectivity: if the condition filters out only a few rows (as happens for queries covering many addresses of the continuum, like '195.0.0.0/8'), a quasi-sequential scan ordered by detecttime should be more efficient. On the other hand, searches for class C (/24) ranges should be accelerated by applying the index to evaluate the condition. Unfortunately the query planner does not understand the prefix (its length has no impact on plan step cost), therefore we have to manually decide whether to alter the query (as described in the previous section) or not, based on filter selectivity. The best proxy we have for this is range size (prefix length = (int)log2(max-min)? max-min?), although in reality the relative frequency is the truly meaningful estimate. Such statistics are gathered by ANALYZE; using those would be a bit too low-level for now (it should nevertheless be possible).
Maybe two different cutoff limits, one for general IP ranges and one for constituency addresses, would be better, but that would complicate the logic considerably. One limit should therefore be sufficient. The exact value should be determined experimentally; estimates show a value around a prefix length of 8. This does not have to be exact (and cannot be, for reasons described above), as there is a range of prefix lengths where the two alternative plans are comparable in performance.
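
To illustrate the intended cutoff logic (the /8 cutoff and the concrete ranges are assumptions for the example):

-- selective range (prefix longer than the cutoff, e.g. /24): include the
-- aggregate filter so the GIST index can drive the scan
SELECT * FROM events WHERE source_ip_aggr_ip4 && '195.113.214.0/24' AND ('195.113.214.0/24' && ANY(source_ip)) AND detecttime>__DATE__ ORDER BY detecttime DESC LIMIT 100;

-- broad range (prefix at or below the cutoff, e.g. /8): omit the aggregate
-- filter and let the detecttime index drive the scan
SELECT * FROM events WHERE ('195.0.0.0/8' && ANY(source_ip)) AND detecttime>__DATE__ ORDER BY detecttime DESC LIMIT 100;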

6. Execution plan optimization (by improving the execution planner)

This is out of the scope of the current effort.

The method described in the previous section is a hack around the limitations of the current query planner. A true fix would be to improve the planner/estimator. This might be a good topic of discussion for the next PostgreSQL developers conference.

Actions #27

Updated by Radko Krkoš almost 6 years ago

  • Assignee changed from Radko Krkoš to Jan Mach
Actions #28

Updated by Radko Krkoš almost 6 years ago

Pavel Kácha wrote:

Also, it will be necessary to document whats and whys of planner weight changes ...

This is already documented in general. Specific values and their explanations can only be added after deployment and some subsequent analysis.

Actions #29

Updated by Pavel Kácha over 5 years ago

  • Target version changed from Backlog to 2.5
Actions #30

Updated by Jan Mach over 5 years ago

  • Status changed from Feedback to In Progress
  • % Done changed from 90 to 100

Implementation is done. Currently deploying on the mentat-alt test server.

Actions #31

Updated by Jan Mach over 5 years ago

The Mentat system on the mentat-alt server was just successfully upgraded to the latest version with support for address search using aggregated ranges.

Database schema was successfully migrated:

mentat_events=# \d events
                                 Table "public.events" 
         Column          |            Type             | Collation | Nullable | Default 
-------------------------+-----------------------------+-----------+----------+---------
 id                      | text                        |           | not null | 
 detecttime              | timestamp without time zone |           | not null | 
 category                | text[]                      |           | not null | 
 description             | text                        |           |          | 
 source_ip               | iprange[]                   |           |          | 
 target_ip               | iprange[]                   |           |          | 
 source_port             | integer[]                   |           |          | 
 target_port             | integer[]                   |           |          | 
 source_type             | text[]                      |           |          | 
 target_type             | text[]                      |           |          | 
 protocol                | text[]                      |           |          | 
 node_name               | text[]                      |           | not null | 
 node_type               | text[]                      |           |          | 
 cesnet_storagetime      | timestamp without time zone |           | not null | 
 cesnet_resolvedabuses   | text[]                      |           |          | 
 cesnet_eventclass       | text                        |           |          | 
 cesnet_eventseverity    | text                        |           |          | 
 cesnet_inspectionerrors | text[]                      |           |          | 
 event                   | bytea                       |           |          | 
 source_ip_aggr_ip4      | ip4r                        |           |          | 
 target_ip_aggr_ip4      | ip4r                        |           |          | 
 source_ip_aggr_ip6      | ip6r                        |           |          | 
 target_ip_aggr_ip6      | ip6r                        |           |          | 
Indexes:
    "events_pkey" PRIMARY KEY, btree (id)
    "events_cesnet_eventseverity_idx" btree (cesnet_eventseverity) WHERE cesnet_eventseverity IS NOT NULL
    "events_cesnet_storagetime_idx" btree (cesnet_storagetime) CLUSTER
    "events_combined_idx" gin (category, node_name, protocol, source_port, target_port, source_type, target_type, node_type, cesnet_resolvedabuses, cesnet_inspectionerrors)
    "events_detecttime_idx" btree (detecttime)
    "events_ip_aggr_idx" gist (source_ip_aggr_ip4, target_ip_aggr_ip4, source_ip_aggr_ip6, target_ip_aggr_ip6)

Migration took:

(venv) root@mentat-alt:~# time /etc/mentat/scripts/sqldb-migrate.sh upgrade head
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade a9e28583cc3b -> 3fb6b209a5cd, Added support for aggregated address ranges to improve event search performance.

real    252m9,914s
user    0m0,560s
sys    0m0,036s

(venv) root@mentat-alt:~# free -h
              total        used        free      shared  buff/cache   available
Mem:           188G        1,2G        915M         16G        186G        169G
Swap:          9,3G        442M        8,9G

mentat_events=# select count(id) from events;
  count   
----------
 69744636
(1 row)

So on average the migration speed was roughly 4,600 events per second, including index generation (69,744,636 rows / ~15,130 s ≈ 4,610 events/s).

Actions #32

Updated by Jan Mach over 5 years ago

  • Status changed from In Progress to Feedback
  • Assignee changed from Jan Mach to Radko Krkoš
  • Priority changed from Normal to High

Because this work has the potential to cause big problems if anything goes wrong, I would like to ask both Pavel and Radko to double-check my commit:

https://homeproj.cesnet.cz/projects/mentat/repository/revisions/b6f56980c61f296103289ea6a45d38f2cb6b4e3a/diff/

The most crucial is the _aggr_iplist() method on line 198 of the diff, which is responsible for calculating the aggregated range before insertion. If there was an error, the whole database on production server would need to be recalculated afterwards.

Also, I have tried searching according to the source IP and it takes horribly long on the mentat-alt server. I tried searching for a single IP address that was sure to be in the database:

https://mentat-alt.cesnet.cz/mentat/events/search?submit=Search&source_addrs=147.251.21.105

When I try the similar search on my local development machine, the produced query is for example:

SELECT "id", "detecttime", "category", "description", "source_ip", "target_ip", "source_ip_aggr_ip4", "source_ip_aggr_ip6", "target_ip_aggr_ip4", "target_ip_aggr_ip6", "source_port", "target_port", "source_type", "target_type", "protocol", "node_name", "node_type", "cesnet_resolvedabuses", "cesnet_storagetime", "cesnet_eventclass", "cesnet_eventseverity", "cesnet_inspectionerrors", "event" FROM events WHERE ("source_ip_aggr_ip4" && '192.168.0.5' AND '192.168.0.5' && ANY("source_ip")) ORDER BY "detecttime" DESC LIMIT 100
Actions #33

Updated by Jan Mach over 5 years ago

Jan Mach wrote:

Because this work has the potential to cause big problems if anything goes wrong, I would like to ask both Pavel and Radko to double-check my commit:

https://homeproj.cesnet.cz/projects/mentat/repository/revisions/b6f56980c61f296103289ea6a45d38f2cb6b4e3a/diff/

The most crucial is the _aggr_iplist() method on line 198 of the diff, which is responsible for calculating the aggregated range before insertion. If there was an error, the whole database on production server would need to be recalculated afterwards.

Also, I have tried searching according to the source IP and it takes horribly long on the mentat-alt server. I tried searching for a single IP address that was sure to be in the database:

https://mentat-alt.cesnet.cz/mentat/events/search?submit=Search&source_addrs=147.251.21.105

When I try the similar search on my local development machine, the produced query is for example:

[...]

The query finally returned some data. It took 1:03:34.349000 and the generated SQL was:

SELECT "id", "detecttime", "category", "description", "source_ip", "target_ip", "source_ip_aggr_ip4", "source_ip_aggr_ip6", "target_ip_aggr_ip4", "target_ip_aggr_ip6", "source_port", "target_port", "source_type", "target_type", "protocol", "node_name", "node_type", "cesnet_resolvedabuses", "cesnet_storagetime", "cesnet_eventclass", "cesnet_eventseverity", "cesnet_inspectionerrors", "event" FROM events WHERE ("source_ip_aggr_ip4" && '147.251.21.105' AND '147.251.21.105' && ANY("source_ip")) ORDER BY "detecttime" DESC LIMIT 100
Actions #34

Updated by Radko Krkoš over 5 years ago

Jan Mach wrote:

Also, I have tried searching according to the source IP and it takes horribly long on the mentat-alt server. I tried searching for a single IP address that was sure to be in the database:

https://mentat-alt.cesnet.cz/mentat/events/search?submit=Search&source_addrs=147.251.21.105

The reason is that the server was thrashing heavily. The upgrade procedure led to removing and reinserting all rows in the table (that is how UPDATE is implemented in PostgreSQL). The deleted rows are reclaimed during VACUUM, but if all rows are UPDATEd, the autovacuum falls behind, leading to a grossly enlarged table and indices. On mentat-alt this led to ~60M invalidated but unreclaimed rows, in addition to ~70M valid ones. The sizes of the events table and indices were as follows:

          tablename           |                 indexname                 |  num_rows   | table_size | index_size | unique | number_of_scans | tuples_read | tuples_fetched
------------------------------+-------------------------------------------+-------------+------------+------------+--------+-----------------+-------------+----------------
 events                       | events_cesnet_eventseverity_idx           | 7.29255e+07 | 184 GB     | 6791 MB    | N      |        21612155 |   128517430 |       32503100
 events                       | events_cesnet_storagetime_idx             | 7.29255e+07 | 184 GB     | 8860 MB    | N      |        47515916 | 74851464923 |    55793911310
 events                       | events_combined_idx                       | 7.29255e+07 | 184 GB     | 7443 MB    | N      |          960588 | 14799142452 |              0
 events                       | events_detecttime_idx                     | 7.29255e+07 | 184 GB     | 10 GB      | N      |            1350 |  1466805083 |     1368616934
 events                       | events_ip_aggr_idx                        | 7.29255e+07 | 184 GB     | 7710 MB    | N      |               0 |           0 |              0
 events                       | events_pkey                               | 7.29255e+07 | 184 GB     | 9968 MB    | Y      |          744757 |   117755302 |        1362582

These no longer fit in the available RAM. To correct the situation, VACUUM FREEZE (to reclaim entries in the table and indices) and CLUSTER (to compact the table and indices) are required, plus ANALYZE to recalculate the statistics. The runtimes on mentat-alt:

VACUUM FREEZE;    52m
CLUSTER;        1h20m
ANALYZE;           1m

Table & index sizes after the maintenance:

          tablename           |                 indexname                 |  num_rows   | table_size | index_size | unique | number_of_scans | tuples_read | tuples_fetched
------------------------------+-------------------------------------------+-------------+------------+------------+--------+-----------------+-------------+----------------
 events                       | events_cesnet_eventseverity_idx           | 7.02752e+07 | 96 GB      | 1279 MB    | N      |        21612155 |   128517430 |       32503100
 events                       | events_cesnet_storagetime_idx             | 7.02752e+07 | 96 GB      | 1505 MB    | N      |        47515917 | 74921729217 |    55864175604
 events                       | events_combined_idx                       | 7.02752e+07 | 96 GB      | 2428 MB    | N      |          960588 | 14799142452 |              0
 events                       | events_detecttime_idx                     | 7.02752e+07 | 96 GB      | 1505 MB    | N      |            1350 |  1466805083 |     1368616934
 events                       | events_ip_aggr_idx                        | 7.02752e+07 | 96 GB      | 4770 MB    | N      |               0 |           0 |              0
 events                       | events_pkey                               | 7.02752e+07 | 96 GB      | 3958 MB    | Y      |          744757 |   117755302 |        1362582

This fits in RAM, so no thrashing. Performing VACUUM and CLUSTER after the UPDATE is the recommended way to handle the bloat.
The performance should be back to normal.
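
A sketch of that maintenance sequence, restricted to the events table (CLUSTER reuses the index previously marked with the CLUSTER option):

VACUUM FREEZE events;
CLUSTER events;
ANALYZE events;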

Actions #35

Updated by Jan Mach over 5 years ago

Thank you, Radko, for the database fix. I have updated the documentation accordingly to note the necessity of these database maintenance tasks. I can confirm that the search that took more than one hour yesterday now finishes in ~20 seconds.

Actions #36

Updated by Jan Mach over 5 years ago

From my point of view the implementation part of this task is now done.

Actions #37

Updated by Pavel Kácha over 5 years ago

Radko Krkoš wrote:

This fits in RAM, so no thrashing. Performing VACUUM and CLUSTER after the UPDATE is the recommended way to handle the bloat.
The performance should be back to normal.

Well, twenty seconds for one IP is still quite unsatisfying according to expectations, or am I missing something?

Actions #38

Updated by Jan Mach over 5 years ago

Pavel Kácha wrote:

Radko Krkoš wrote:

This fits in RAM, so no thrashing. Performing VACUUM and CLUSTER after the UPDATE is the recommended way to handle the bloat.
The performance should be back to normal.

Well, twenty seconds for one IP is still quite unsatisfying according to expectations, or am I missing something?

It was without any time frame specification, so the whole database needed to be searched. I do not have the numbers to tell whether that is good or bad in comparison with the previous implementation; I just stated that the database response got better after your maintenance tasks.

Actions #39

Updated by Pavel Kácha over 5 years ago

Jan Mach wrote:

Pavel Kácha wrote:

Radko Krkoš wrote:

This fits in RAM, so no thrashing. Performing VACUUM and CLUSTER after the UPDATE is the recommended way to handle the bloat.
The performance should be back to normal.

Well, twenty seconds for one IP is still quite unsatisfying according to expectations, or am I missing something?

It was without any time frame specification, so the whole database needed to be searched. I do not have the numbers to tell whether that is good or bad in comparison with the previous implementation; I just stated that the database response got better after your maintenance tasks.

Sure, twas more of a question to Radko.

Actions #40

Updated by Radko Krkoš over 5 years ago

Pavel Kácha wrote:

Jan Mach wrote:

Pavel Kácha wrote:

Radko Krkoš wrote:

This fits in RAM, so no thrashing. Performing VACUUM and CLUSTER after the UPDATE is the recommended way to handle the bloat.
The performance should be back to normal.

Well, twenty seconds for one IP is still quite unsatisfying according to expectations, or am I missing something?

It was without any time frame specification, so the whole database needed to be searched. I do not have the numbers to tell whether that is good or bad in comparison with the previous implementation; I just stated that the database response got better after your maintenance tasks.

Sure, twas more of a question to Radko.

It is the same time as before, as the same plan is used. We need to fine-tune the planner parameters for the optimization to actually work; I did not have enough time to do that yet. If the new plans are used, the time seems to be around 8s. That is a considerable improvement over 20s, although less than we expected from the original measurements. Judging from the numbers obtained relative to the amount of data on mentat-dev, the scaling seems to be non-linear.

Actions #41

Updated by Pavel Kácha over 5 years ago

Radko Krkoš wrote:

It is the same time as before, as the same plan is used. We need to fine-tune the planner parameters for the optimization to actually work; I did not have enough time to do that yet. If the new plans are used, the time seems to be around 8s. That is a considerable improvement over 20s, although less than we expected from the original measurements. Judging from the numbers obtained relative to the amount of data on mentat-dev, the scaling seems to be non-linear.

Mmmkay, let's see after you get into it. Scaling is what I was afraid of.

Actions #42

Updated by Radko Krkoš over 5 years ago

The attached script is running on mentat-alt now, testing for formal equality of the IP arrays and the aggregated values. I think letting this run for some time is enough to confirm correctness.
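
A hypothetical SQL spot check of a related property, that every IPv4 source range is covered by its aggregate (the attached Python script remains the authoritative test):

SELECT id FROM events
WHERE source_ip_aggr_ip4 IS NOT NULL
  AND EXISTS (SELECT 1 FROM unnest(source_ip) AS r
              WHERE CASE WHEN family(r) = 4
                         THEN NOT (r::ip4r <<= source_ip_aggr_ip4)
                         ELSE false END)
LIMIT 10;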

Actions #43

Updated by Jan Mach over 5 years ago

Due to a bug in the algorithm for calculating aggregated IP ranges, it is necessary to recalculate the source|target_ip_aggr_ columns and redo the database maintenance tasks.

Mentat is now down on mentat-alt and I have just launched the column recalculation in a tmux terminal.

Actions #44

Updated by Jan Mach over 5 years ago

Everything is up and running again on mentat-alt.

Actions #45

Updated by Jan Mach about 5 years ago

  • Status changed from Feedback to Closed

The code for this issue has been running for a long time on our development server without any problems. I now consider this issue closed.
