Project

General

Profile

Knowledge Base PostgreSQL

ToDo v ramci PostgreSQL

  • zjistit za jakych okolnosti se pouziji indexy
  • otestovat vytvoreni slozenych indexu
  • zmerit query s a bez Btree/GIN indexu
  • porovnani IP s rozsahem

JSON/JSONB Datove typy

JSON primitive type PostgreSQL type Notes
string text \u0000 is disallowed, as are non-ASCII Unicode escapes if database encoding is not UTF8
number number NaN and infinity values are disallowed
boolean boolean Only lowercase true and false spellings are accepted
null (none) SQL NULL is a different concept
  • Postgres - Numeric Types
    Name Storage Size Description Range
    numeric variable user-specified precision, exact up to 131072 digits before the decimal point; up to 16383 digits after the decimal point

Jelikoz datove typy JSON a JSONB podporuji pouze 4 primitivni JSON typy budeme se muset vyporadat s konverzi IP adres (a pripadne i data a casu) do typu 'number' (ekvivalent Postgresiho typu numeric). Pro konverzi lze ale primo vyuzit funkce a prikazy PostgreSQL.

Funkce a operatory

Porovnavaci operatory

The standard comparison operators:

Operator Description
< less than
> greater than
<= less than or equal to
>= greater than or equal to
= equal
<> or != not equal

are available for jsonb, but not for json.

Operatory datovych typu json a jsonb

Operator Right Operand Type Description Example Example Result
-> int Get JSON array element (indexed from zero) '[{"a":"foo"},{"b":"bar"},{"c":"baz"}]'::json -> 2 {"c":"baz"}
-> text Get JSON object field by key '{"a": {"b":"foo"}}'::json -> 'a' {"b":"foo"}
->> int Get JSON array element as text '[1,2,3]'::json ->> 2 3
->> text Get JSON object field as text '{"a":1,"b":2}'::json ->> 'b' 2
#> text[] Get JSON object at specified path '{"a": {"b":{"c": "foo"}}}'::json #> '{a,b}' {"c": "foo"}
#>> text[] Get JSON object at specified path as text '{"a":[1,2,3],"b":[4,5,6]}'::json #>> '{a,2}' 3

Dalsi jsonb operatory

Operator Right Operand Type Description Example
@> jsonb Does the left JSON value contain within it the right value? '{"a":1, "b":2}'::jsonb @> '{"b":2}'::jsonb
<@ jsonb Is the left JSON value contained within the right value? '{"b":2}'::jsonb <@ '{"a":1, "b":2}'::jsonb
? text Does the key/element string exist within the JSON value? '{"a":1, "b":2}'::jsonb ? 'b'
?|
text[] Do any of these key/element strings exist? '{"a":1, "b":2, "c":3}'::jsonb
?|
array['b', 'c']
?& text[] Do all of these key/element strings exist? '["a", "b"]'::jsonb ?& array['a', 'b']

GIN Index

Typ JSONB podporuje GIN index dvou typu:

  1. komplexni json_ops, podporujici vsechny JSON operatory (@>, <@, ?, ?|, ?&)
  2. rychly a usporny jsonb_path_ops, podporujici pouze operator @>

GIN index json_ops se vytvori nad daty pomoci:

CREATE INDEX jsonb_test_gin on jsonb_test using gin (data);

EXPLAIN SELECT * FROM jsonb_test WHERE data @> '{
"Description":"Connection attempt" }';
                             QUERY PLAN
--------------------------------------------------------------------
 Seq Scan on jsonb_test  (cost=0.00..1.07 rows=1 width=36)
   Filter: (data @> '{"Description": "Connection attempt"}'::jsonb)
(2 řádky)

Index jsonb_path_ops pak pomoci:

CREATE INDEX jsonb_test_path_gin on jsonb_test using gin (data
jsonb_path_ops);

EXPLAIN SELECT * FROM jsonb_test WHERE data @> '{
"Description":"Connection attempt" }';
                             QUERY PLAN
--------------------------------------------------------------------
 Seq Scan on jsonb_test  (cost=0.00..1.07 rows=1 width=36)
   Filter: (data @> '{"Description": "Connection attempt"}'::jsonb)
(2 řádky)

Zakladni prace s PostgreSQL

Vytvoreni databaze, uzivatele, nastaveni prav

CREATE DATABASE jsonb_db;
CREATE USER plesnik;
GRANT ALL ON DATABASE jsonb_db TO plesnik;
CREATE TABLE jsonb_test(id serial, data jsonb);

Schema tabulky jsonb_test

jsonb_db=> \d+ jsonb_test;
                                             Table "public.jsonb_test" 
 Column |  Type   |                        Modifiers                        | Storage  | Stats target | Description
--------+---------+---------------------------------------------------------+----------+--------------+-------------
 id     | integer | not null default nextval('jsonb_test_id_seq'::regclass) | plain    |              |
 data   | jsonb   |                                                         | extended |              |

INSERT JSON dokumentu

INSERT INTO jsonb(data) VALUES ('{JSON_doc}');

SELECT JSON dokumentu

SELECT * FROM jsonb_test WHERE data @> '{ "Description":"Connection attempt" }';
  • EXPLAIN
    EXPLAIN SELECT * FROM jsonb_test WHERE data @> '{"Description":"Connection attempt" }';
                                 QUERY PLAN
    --------------------------------------------------------------------
     Seq Scan on jsonb_test  (cost=0.00..25.38 rows=1 width=36)
       Filter: (data @> '{"Description": "Connection attempt"}'::jsonb)
    (2 řádky)
    

Testovani operatoru

!=, =, >, <, >=, <=, AND, OR

SELECT COUNT(*) FROM alerts WHERE alert #> '{DetectTime}' >= '1465182258' AND alert #> '{DetectTime}' <= '1465192258';
 count
--------
 177079

SELECT COUNT(*) FROM alerts WHERE alert #> '{DetectTime}' >= '1465182258' AND alert #> '{DetectTime}' <= '1465192258' AND alert #> '{Category,0}' = '"Recon.Scanning"';
 count
--------
 167868

SELECT COUNT(*) FROM alerts WHERE alert #> '{DetectTime}' >= '1465182258' AND alert #> '{DetectTime}' <= '1465192258' AND alert #> '{Category,0}' != '"Recon.Scanning"';
 count
-------
  9211

SELECT COUNT(*) FROM alerts WHERE (alert #> '{DetectTime}' >= '1465182258' AND alert #> '{DetectTime}' <= '1465192258')
                               AND (alert #> '{Category,0}' = '"Recon.Scanning"' OR alert #> '{Category,0}' = '"Attempt.Login"');
 count
--------
 176432

BETWEEN x AND y

SELECT * FROM alerts WHERE alert #> '{DetectTime}' BETWEEN '1465182257' AND '1465182259' limit 1;

{"ID": "1-1465185939.235388-Qb5U3LG6i8dK", "ts": 1465186081, "Node": [{"Name": "cz.cesnet.mentat.warden_filer", "Type": ["Relay"]}, {"Name": "cz.cesnet.au1.warden_filer", "Type": ["Relay"]}, {"SW": ["LaBrea"], "Name": "cesnet.au1", "Type": ["Connection", "Tarpit"]}], "ts_u": 1465186081.570976, "Attach": [{"Type": ["Syslog"],
 "Content": "1465182258 Initial Connect - tarpitting: 47.17.234.136 62992 -> 195.113.252.177 930", "ContentType": "text/plain"}], "Format": "IDEA0", "Source": [{"IP4": [{"ip": 789703304, "max": 789703304, "min": 789703304}], "Port": [62992]}], "Target": [{"IP4": [{"ip": 3279027377, "max": 3279027377, "min": 3279027377}], "Port":
 [930]}], "class2": "idea", "_CESNET": {"Impact": "Remote host 47.17.234.136:62992 connected to honeypot to port 930", "StorageTime": 1465186081, "EventTemplate": "labrea-001"}, "Category": ["Recon.Scanning"], "CeaseTime": 1465185837, "ConnCount": 226, "CreateTime": 1465185939, "DetectTime": 1465182258, "Description":
 "Connection attempt"}

IN

  • Sequential scan:
select * from alerts where alert@>'{"Node":[{"Type":["Connection"]}]}' limit 1;

{"ID": "1-1465578504.908653-x2SZKfGOkXNj", "ts": 1465578603, "Node": [{"Name": "cz.cesnet.mentat.warden_filer", "Type": ["Relay"]}, {"Name": "cz.cesnet.au1.warden_filer", "Type": ["Relay"]}, {"SW": ["LaBrea"], "Name": "cesnet.au1", "Type":
 ["Connection", "Tarpit"]}], "ts_u": 1465578603.376266, "Attach": [{"Type": ["Syslog"], "Content": "1465575581 Initial Connect - tarpitting: 171.88.84.104 58014 -> 195.113.252.33 666", "ContentType": "text/plain"}], "Format": "IDEA0", "Source": [{"IP4": 
[{"ip": 2874692712, "max": 2874692712, "min": 2874692712}], "Port": [58014]}], "Target": [{"IP4": [{"ip": 3279027233, "max": 3279027233, "min": 3279027233}], "Port": [666]}], "class2": "idea", "_CESNET": {"Impact": "Remote host 171.88.84.104:58014
connected to honeypot to port 666", "StorageTime": 1465578603, "EventTemplate": "labrea-001"}, "Category": ["Recon.Scanning"], "CeaseTime": 1465577204, "ConnCount": 14, "CreateTime": 1465578504, "DetectTime": 1465575581, "Description":
 "Connection attempt"}
  • GIN Index scan:
SELECT count(*) FROM alerts WHERE alert->'Category' ? 'Intrusion.Botnet';

CREATE INDEX - GIN

CREATE INDEX categoryGin ON alerts USING GIN ((alert->'Category'));

CREATE INDEX - BTREE

CREATE INDEX categoryBtree ON alerts USING BTREE ((alert->'Category' ::text));

Filter by IP

select count(*) from alerts where alert @> '{"Source":[{"IP4":[{"ip": 3642164482}]}]}';

Poznatky

Predtim

  • umime inteligentni IN

23.11.2016

  • umime inteligentni IN i s indexy
  • neni mozne prohledavat pole JSON objektu obsahujici pole: {JSON_Object[{JSON Object[neco]},…]}
  • pri velkem mnozstvi dat (vice jako 90% tabulky) je paralelni sekvencni sken rychlejsi nez MongoDB s indexem
  • pri malem poctu dat je PostgreSQL s GIN indexem ekvivalentni MongoDB s indexem

29.11.2016

  • zamerit se na format zprav, ktere do systemu chodi
    • ipblacklist filter reportuje obe adresy (bot, cc) v Source
    • ftas reportuje dlouhe, neformatovane udalosti bez vyuziti IDEA identifikatoru (UID: 1465309283_gc15.cesnet.cz_1212_src_ip_217.23.5.21)

26.04.2017

  • typovanie JSON objektu pre index iné ako text je možné len po vložení dát, t.j. na prázdnej tabuľke CREATE INDEX on alerts(alert->'ts'::integer); zlyhá, CREATE INDEX on alerts(alert->'ts'::text); prebehne

Zaver

  1. chybi relacni operatory (>,>=,<=,<) pro praci s JSON dokumenty v sobe kombinujici pole a objekty
  2. ruzne zdroje vyuzivaji odlisne koncepty prace s volnou strukturou IDEA formatu (viz poznamky z 29.11.2016)
  3. srovnani rychlosti viz tabulka https://homeproj.cesnet.cz/projects/mentat/wiki/AnalysisDatabaseTesting#Mereni-vykonu

Odkazy