Project

General

Profile

Actions

Task #4100

closed

Move datetime evaluations to DB where appropriate

Added by Radko Krkoš almost 6 years ago. Updated over 5 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Development - Core
Target version:
Start date:
05/22/2018
Due date:
% Done:

100%

Estimated time:
To be discussed:

Description

There are queries for all events from a time window <T-4h, T), which use externally supplied timestamp, e.g.:

2018-05-21 09:55:04.113 CEST statement: SELECT * FROM events WHERE "cesnet_storagetime" >= '2018-05-21T05:55:01.362722'::timestamp

Such queries are currently planned based on ANALYZE statistics which get out of date very quickly, leading to suboptimal plans.
Replacing those queries by:

SELECT * FROM events WHERE "cesnet_storagetime" >= NOW() - INTERVAL '4h'

would allow the planner to understand the true meaning and choose the optimal plan (single threaded sequential scan over storagetime index).
  • The origin of these queries is currently unknown, reporter is the suspect.
Actions #1

Updated by Radko Krkoš almost 6 years ago

  • This query seems to be ran every hour at n:55, n=0,1,2,...,23.
Actions #2

Updated by Radko Krkoš almost 6 years ago

After more investigation (by Jan Mach), watchdog was confirmed to be the source of the query. The purpose is to detect whether any data were stored in the last 2 hours (stored in GMT).
The most efficient way to do this is:

SELECT MAX > CURRENT_TIMESTAMP AT TIME ZONE 'GMT' - INTERVAL '2h' FROM events;

returning boolean TRUE/FALSE, with run time of <1ms, therefore an improvement of 4 orders of magnitude.

Actions #3

Updated by Radko Krkoš almost 6 years ago

Unmangled query, sorry:

SELECT MAX(cesnet_storagetime) > CURRENT_TIMESTAMP AT TIME ZONE 'GMT’ - INTERVAL '2h’ FROM events;

Actions #4

Updated by Jan Mach almost 6 years ago

  • Target version set to Backlog
Actions #5

Updated by Radko Krkoš almost 6 years ago

  • Status changed from New to Resolved

This query was used by the watchdog and is being migrated to a more efficient one like:

SELECT max(cesnet_storagetime) > NOW() AT TIME ZONE 'GMT' - INTERVAL '2h' FROM events;

returning the boolean value that watchdog is interested in, with runtime of <1ms.

Actions #6

Updated by Jan Mach almost 6 years ago

  • Status changed from Resolved to In Progress

This issue is not resolved yet, implementation was not yet done.

Actions #7

Updated by Jan Mach over 5 years ago

  • Target version changed from Backlog to 2.1
Actions #8

Updated by Jan Mach over 5 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Attached commit successfully resolves the issue. Accepted solution:

params = ('{:d}s'.format(interval),)
query = psycopg2.sql.SQL("SELECT max({}) > NOW() AT TIME ZONE 'GMT' - INTERVAL %s AS watchdog FROM events").\
    format(psycopg2.sql.Identifier('cesnet_storagetime'))

Issue is in the resolved state, waiting for the verification after building and deploying the packages on the server before closing it for good.

Actions #9

Updated by Pavel Kácha over 5 years ago

  • Assignee changed from Jan Mach to Radko Krkoš
Actions #10

Updated by Radko Krkoš over 5 years ago

  • Status changed from Resolved to Closed

Accepted solution matches the recommendation. If the watchdog works as intended, then it is an improvement from the DB point of view. "Less is less". Closing the issue.

Actions

Also available in: Atom PDF