Knowledge Base PostgreSQL¶
- Table of contents
- Knowledge Base PostgreSQL
ToDo v ramci PostgreSQL¶
- zjistit za jakych okolnosti se pouziji indexy
- otestovat vytvoreni slozenych indexu
- zmerit query s a bez Btree/GIN indexu
- porovnani IP s rozsahem
JSON/JSONB Datove typy¶
JSON primitive type | PostgreSQL type | Notes |
string |
text | \u0000 is disallowed, as are non-ASCII Unicode escapes if database encoding is not UTF8 |
number |
number | NaN and infinity values are disallowed |
boolean |
boolean | Only lowercase true and false spellings are accepted |
null |
(none) | SQL NULL is a different concept |
- Postgres - Numeric Types
Name Storage Size Description Range numeric variable user-specified precision, exact up to 131072 digits before the decimal point; up to 16383 digits after the decimal point
Jelikoz datove typy JSON a JSONB podporuji pouze 4 primitivni JSON typy budeme se muset vyporadat s konverzi IP adres (a pripadne i data a casu) do typu 'number' (ekvivalent Postgresiho typu numeric). Pro konverzi lze ale primo vyuzit funkce a prikazy PostgreSQL.
Funkce a operatory¶
Porovnavaci operatory¶
The standard comparison operators:
Operator | Description |
< |
less than |
> |
greater than |
<= |
less than or equal to |
>= |
greater than or equal to |
= |
equal |
<> or != |
not equal |
are available for jsonb, but not for json.
Operatory datovych typu json
a jsonb
¶
Operator | Right Operand Type | Description | Example | Example Result |
-> |
int | Get JSON array element (indexed from zero) | '[{"a":"foo"},{"b":"bar"},{"c":"baz"}]'::json -> 2 |
{"c":"baz"} |
-> |
text | Get JSON object field by key | '{"a": {"b":"foo"}}'::json -> 'a' |
{"b":"foo"} |
->> |
int | Get JSON array element as text | '[1,2,3]'::json ->> 2 |
3 |
->> |
text | Get JSON object field as text | '{"a":1,"b":2}'::json ->> 'b' |
2 |
#> |
text[] | Get JSON object at specified path | '{"a": {"b":{"c": "foo"}}}'::json #> '{a,b}' |
{"c": "foo"} |
#>> |
text[] | Get JSON object at specified path as text | '{"a":[1,2,3],"b":[4,5,6]}'::json #>> '{a,2}' |
3 |
Dalsi jsonb
operatory¶
Operator | Right Operand Type | Description | Example |
@> |
jsonb | Does the left JSON value contain within it the right value? | '{"a":1, "b":2}'::jsonb @> '{"b":2}'::jsonb |
<@ | jsonb | Is the left JSON value contained within the right value? | '{"b":2}'::jsonb <@ '{"a":1, "b":2}'::jsonb |
? |
text | Does the key/element string exist within the JSON value? | '{"a":1, "b":2}'::jsonb ? 'b' |
?| |
text[] | Do any of these key/element strings exist? | '{"a":1, "b":2, "c":3}'::jsonb ?|array['b', 'c'] |
?& |
text[] | Do all of these key/element strings exist? | '["a", "b"]'::jsonb ?& array['a', 'b'] |
GIN Index¶
Typ JSONB podporuje GIN index dvou typu:
- komplexni
json_ops
, podporujici vsechny JSON operatory (@>
, <@, ?, ?|, ?&) - rychly a usporny
jsonb_path_ops
, podporujici pouze operator@>
GIN index json_ops
se vytvori nad daty pomoci:
CREATE INDEX jsonb_test_gin on jsonb_test using gin (data); EXPLAIN SELECT * FROM jsonb_test WHERE data @> '{ "Description":"Connection attempt" }'; QUERY PLAN -------------------------------------------------------------------- Seq Scan on jsonb_test (cost=0.00..1.07 rows=1 width=36) Filter: (data @> '{"Description": "Connection attempt"}'::jsonb) (2 řádky)
Index jsonb_path_ops
pak pomoci:
CREATE INDEX jsonb_test_path_gin on jsonb_test using gin (data jsonb_path_ops); EXPLAIN SELECT * FROM jsonb_test WHERE data @> '{ "Description":"Connection attempt" }'; QUERY PLAN -------------------------------------------------------------------- Seq Scan on jsonb_test (cost=0.00..1.07 rows=1 width=36) Filter: (data @> '{"Description": "Connection attempt"}'::jsonb) (2 řádky)
Zakladni prace s PostgreSQL¶
Vytvoreni databaze, uzivatele, nastaveni prav¶
CREATE DATABASE jsonb_db; CREATE USER plesnik; GRANT ALL ON DATABASE jsonb_db TO plesnik; CREATE TABLE jsonb_test(id serial, data jsonb);
Schema tabulky jsonb_test
¶
jsonb_db=> \d+ jsonb_test; Table "public.jsonb_test" Column | Type | Modifiers | Storage | Stats target | Description --------+---------+---------------------------------------------------------+----------+--------------+------------- id | integer | not null default nextval('jsonb_test_id_seq'::regclass) | plain | | data | jsonb | | extended | |
INSERT JSON dokumentu¶
- dokument: test.json (JSON_doc)
INSERT INTO jsonb(data) VALUES ('{JSON_doc}');
SELECT JSON dokumentu¶
SELECT * FROM jsonb_test WHERE data @> '{ "Description":"Connection attempt" }';
- EXPLAIN
EXPLAIN SELECT * FROM jsonb_test WHERE data @> '{"Description":"Connection attempt" }'; QUERY PLAN -------------------------------------------------------------------- Seq Scan on jsonb_test (cost=0.00..25.38 rows=1 width=36) Filter: (data @> '{"Description": "Connection attempt"}'::jsonb) (2 řádky)
Testovani operatoru¶
!=
, =
, >
, <
, >=
, <=
, AND
, OR
¶
SELECT COUNT(*) FROM alerts WHERE alert #> '{DetectTime}' >= '1465182258' AND alert #> '{DetectTime}' <= '1465192258'; count -------- 177079 SELECT COUNT(*) FROM alerts WHERE alert #> '{DetectTime}' >= '1465182258' AND alert #> '{DetectTime}' <= '1465192258' AND alert #> '{Category,0}' = '"Recon.Scanning"'; count -------- 167868 SELECT COUNT(*) FROM alerts WHERE alert #> '{DetectTime}' >= '1465182258' AND alert #> '{DetectTime}' <= '1465192258' AND alert #> '{Category,0}' != '"Recon.Scanning"'; count ------- 9211 SELECT COUNT(*) FROM alerts WHERE (alert #> '{DetectTime}' >= '1465182258' AND alert #> '{DetectTime}' <= '1465192258') AND (alert #> '{Category,0}' = '"Recon.Scanning"' OR alert #> '{Category,0}' = '"Attempt.Login"'); count -------- 176432
BETWEEN x AND y¶
SELECT * FROM alerts WHERE alert #> '{DetectTime}' BETWEEN '1465182257' AND '1465182259' limit 1; {"ID": "1-1465185939.235388-Qb5U3LG6i8dK", "ts": 1465186081, "Node": [{"Name": "cz.cesnet.mentat.warden_filer", "Type": ["Relay"]}, {"Name": "cz.cesnet.au1.warden_filer", "Type": ["Relay"]}, {"SW": ["LaBrea"], "Name": "cesnet.au1", "Type": ["Connection", "Tarpit"]}], "ts_u": 1465186081.570976, "Attach": [{"Type": ["Syslog"], "Content": "1465182258 Initial Connect - tarpitting: 47.17.234.136 62992 -> 195.113.252.177 930", "ContentType": "text/plain"}], "Format": "IDEA0", "Source": [{"IP4": [{"ip": 789703304, "max": 789703304, "min": 789703304}], "Port": [62992]}], "Target": [{"IP4": [{"ip": 3279027377, "max": 3279027377, "min": 3279027377}], "Port": [930]}], "class2": "idea", "_CESNET": {"Impact": "Remote host 47.17.234.136:62992 connected to honeypot to port 930", "StorageTime": 1465186081, "EventTemplate": "labrea-001"}, "Category": ["Recon.Scanning"], "CeaseTime": 1465185837, "ConnCount": 226, "CreateTime": 1465185939, "DetectTime": 1465182258, "Description": "Connection attempt"}
IN¶
- Sequential scan:
select * from alerts where alert@>'{"Node":[{"Type":["Connection"]}]}' limit 1; {"ID": "1-1465578504.908653-x2SZKfGOkXNj", "ts": 1465578603, "Node": [{"Name": "cz.cesnet.mentat.warden_filer", "Type": ["Relay"]}, {"Name": "cz.cesnet.au1.warden_filer", "Type": ["Relay"]}, {"SW": ["LaBrea"], "Name": "cesnet.au1", "Type": ["Connection", "Tarpit"]}], "ts_u": 1465578603.376266, "Attach": [{"Type": ["Syslog"], "Content": "1465575581 Initial Connect - tarpitting: 171.88.84.104 58014 -> 195.113.252.33 666", "ContentType": "text/plain"}], "Format": "IDEA0", "Source": [{"IP4": [{"ip": 2874692712, "max": 2874692712, "min": 2874692712}], "Port": [58014]}], "Target": [{"IP4": [{"ip": 3279027233, "max": 3279027233, "min": 3279027233}], "Port": [666]}], "class2": "idea", "_CESNET": {"Impact": "Remote host 171.88.84.104:58014 connected to honeypot to port 666", "StorageTime": 1465578603, "EventTemplate": "labrea-001"}, "Category": ["Recon.Scanning"], "CeaseTime": 1465577204, "ConnCount": 14, "CreateTime": 1465578504, "DetectTime": 1465575581, "Description": "Connection attempt"}
- GIN Index scan:
SELECT count(*) FROM alerts WHERE alert->'Category' ? 'Intrusion.Botnet';
CREATE INDEX - GIN¶
CREATE INDEX categoryGin ON alerts USING GIN ((alert->'Category'));
CREATE INDEX - BTREE¶
CREATE INDEX categoryBtree ON alerts USING BTREE ((alert->'Category' ::text));
Filter by IP¶
select count(*) from alerts where alert @> '{"Source":[{"IP4":[{"ip": 3642164482}]}]}';
Poznatky¶
Predtim¶
- umime inteligentni IN
23.11.2016¶
- umime inteligentni IN i s indexy
- neni mozne prohledavat pole JSON objektu obsahujici pole: {JSON_Object[{JSON Object[neco]},…]}
- pri velkem mnozstvi dat (vice jako 90% tabulky) je paralelni sekvencni sken rychlejsi nez MongoDB s indexem
- pri malem poctu dat je PostgreSQL s GIN indexem ekvivalentni MongoDB s indexem
29.11.2016¶
- zamerit se na format zprav, ktere do systemu chodi
- ipblacklist filter reportuje obe adresy (bot, cc) v Source
- ftas reportuje dlouhe, neformatovane udalosti bez vyuziti IDEA identifikatoru (UID: 1465309283_gc15.cesnet.cz_1212_src_ip_217.23.5.21)
26.04.2017¶
- typovanie JSON objektu pre index iné ako text je možné len po vložení dát, t.j. na prázdnej tabuľke CREATE INDEX on alerts(alert->'ts'::integer); zlyhá, CREATE INDEX on alerts(alert->'ts'::text); prebehne
Zaver¶
- chybi relacni operatory (>,>=,<=,<) pro praci s JSON dokumenty v sobe kombinujici pole a objekty
- ruzne zdroje vyuzivaji odlisne koncepty prace s volnou strukturou IDEA formatu (viz poznamky z 29.11.2016)
- srovnani rychlosti viz tabulka https://homeproj.cesnet.cz/projects/mentat/wiki/AnalysisDatabaseTesting#Mereni-vykonu
Odkazy¶