Project

General

Profile

Actions

Bug #4384

closed

Possibility of DoS by repeating long query

Added by Pavel Kácha over 5 years ago. Updated about 4 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Development - Core
Target version:
Start date:
10/19/2018
Due date:
% Done:

100%

Estimated time:
To be discussed:
No

Description

Long query does not stop when user closes or resubmits the page. This can cause several versions of one or more long queries running in vain. Possibilities:

  1. Limit number of queries for one user.
  2. Run the same query for the same user just once (if identical query arrives from the same user, kill the previous one).
  3. Cache queries, for second identical query don't run the new one, but return results from the first one, when ready.

1 and 2 might be reasonable to implement by means of named transactions (as we could find previous or "identical" transaction and its owner by name and kill it, or give to the user possibility to choose what to kill).
3 is hard to implement, needs connection pool/caching, own data transfers on result. Probably not worth the hassle.


Related issues

Related to Mentat - Bug #4515: Runaway query?ClosedPavel Kácha01/14/2019

Actions
Related to Mentat - Feature #4609: Arbitrary grouping and sorting in EventsClosedJan Mach01/30/2019

Actions
Related to Mentat - Bug #6191: Implement thread safety in eventstorage service moduleClosedPavel Kácha01/14/2020

Actions
Actions #1

Updated by Pavel Kácha over 5 years ago

  • Assignee changed from Jan Mach to Radko Krkoš

Assigning to Radko for initial exploration.

Actions #2

Updated by Pavel Kácha over 5 years ago

We can also kill the query by JS when associated web page is closed. Something by means of:

window.addEventListener("beforeunload", function(e){
   navigator.sendBeacon("pageclosed?queryid=slimy");
}, false);

Actions #3

Updated by Radko Krkoš over 5 years ago

The named transactions we talked about in telco were only a proposal and are not implemented in PostgreSQL. Sorry for confusion.
A workaround would be using comments for naming, for example like this:

SELECT * FROM events; -- User42

Actions #4

Updated by Radko Krkoš over 5 years ago

  • % Done changed from 0 to 10

Unfortunately, the comments are removed and cannot be seen in pg_stat_activity output.

Alternative way would be the use of in-query renaming, something along the line of:

SELECT * FROM events AS User42;

Then, it is possible to read the list of running queries:

SELECT pid, query, state FROM pg_stat_activity;
  pid  |                      query                      | state  
-------+-------------------------------------------------+--------
 27934 |                                                 | 
 22545 | SELECT pid, query, state FROM pg_stat_activity; | active
 22581 | SELECT * FROM events AS User42;                 | idle
 24354 |                                                 | idle
 27931 |                                                 | 
 27930 |                                                 | 
 27932 |                                                 | 
(7 rows)

(The state column must also be considered, especially if transactions are used).

This of course renames the table for the purposes of the query so might not be a good idea if you need to reference the table name elsewhere in the query. A workaround might be defining an empty but named windowing function, which is thrown away by the planner (so it has no effect at all), but nevertheless still part of the query, like this:

SELECT * FROM events WINDOW User42 AS ();

Neither option is particularly pretty. Let's discuss this sometimes.

Actions #5

Updated by Pavel Kácha over 5 years ago

Another shot:

SELECT 'query_bflm_1234' AS query_name, * FROM events WHERE ...;

If API reads dicts, dicts will now have one more key - query_name: "query_bflm_1234".

Actions #6

Updated by Radko Krkoš over 5 years ago

Actions #7

Updated by Radko Krkoš about 5 years ago

  • Status changed from New to Feedback

Several options for naming the queries have been found. It is up to Mek to pick his favorite. Let's discuss this online.

Actions #8

Updated by Radko Krkoš about 5 years ago

  • Status changed from Feedback to In Progress
  • Assignee changed from Radko Krkoš to Jan Mach

Reassigning, option SELECT * FROM events AS User42; from #4384#note-4 was chosen for implementation.

Actions #9

Updated by Pavel Kácha about 5 years ago

Not creating subsequent issues yet, but when time comes, it will be probably necessary to

  • create web API to signal, that page has been closed (which will consequently kill the related query)
  • check for too many queries run by user (and stop him politely)
  • nice to have - query manager as companion to "Database status"
Actions #10

Updated by Pavel Kácha about 5 years ago

  • Related to Feature #4609: Arbitrary grouping and sorting in Events added
Actions #11

Updated by Jan Mach over 4 years ago

  • To be discussed changed from No to Yes
Actions #12

Updated by Jan Mach over 4 years ago

  • % Done changed from 10 to 90

Groundwork for working with named queries should now be in place. There are following features:

  • Each query submitted via /events/search or /api/events/search endpoint gets assigned unique identifier like '{:d}_{:s}'.format(current_user_id, random_str(6, ascii_lowercase)).
  • The mentat.services.eventstorage service now supports querying pg_stat_activity table for these named queries and canceling them with pg_cancel_background().
  • The /dbstatus/view endpoint now contains new tab with list of all currently running named queries. There are also endpoints /dbstatus/query/<id>/stop> and /api/dbstatus/query/<id>/stop> for stopping the query with mandatory confirmation dialog.

To test this feature you may execute following long running query:

https://mentat-alt.cesnet.cz/mentat/events/search?dt_from=&dt_to=&source_addrs=&source_ports=&submit=Search&limit=25000

Then immediatelly go to the DB status overview page:

https://mentat-alt.cesnet.cz/mentat/dbstatus/view

Things to discuss:
  1. How to efficiently name the query before executing the search. Currently the unique identifier is generated on the backend prior to the search. This approach is bulletproof, because every search will get unique identifier. However by using this approach the query request was already submitted by the user and the HTTP request is blocking, client waiting for response. There is no straightforward way how to pass the unique id back to the client, so that for example the query might get canceled on browser tab close. Obvious solution would be to generate the ID when displaying the form, so that the id is known by the client before the search is initiated. However there are many pitfalls in this approach, for example: how to handle form resubmission with respect to the uniqueness of the ID (there are valid cases); how to deal with search queries initiated with stored/prepared queries (in this case the form was never initially shown, so here is no query ID); etc.
Actions #13

Updated by Pavel Kácha over 4 years ago

Jan Mach wrote:

Groundwork for working with named queries should now be in place. There are following features:

Cool stuff!

To test this feature you may execute following long running query:

...

Then immediatelly go to the DB status overview page:

https://mentat-alt.cesnet.cz/mentat/dbstatus/view

https://mentat-alt.cesnet.cz/mentat/dbstatus/query/97_jiwmzm/stop, clicked from dbstatus/view Kill query, returns 404.

Things to discuss:
  1. How to efficiently name the query before executing the search. Currently the unique identifier is generated on the backend prior to the search. This approach is bulletproof, because every search will get unique identifier. However by using this approach the query request was already submitted by the user and the HTTP request is blocking, client waiting for response. There is no straightforward way how to pass the unique id back to the client, so that for example the query might get canceled on browser tab close. Obvious solution would be to generate the ID when displaying the form, so that the id is known by the client before the search is initiated. However there are many pitfalls in this approach, for example: how to handle form resubmission with respect to the uniqueness of the ID (there are valid cases); how to deal with search queries initiated with stored/prepared queries (in this case the form was never initially shown, so here is no query ID); etc.

What are valid cases of resubmission?

Actions #14

Updated by Jan Mach over 4 years ago

Pavel Kácha wrote:

Jan Mach wrote:

Groundwork for working with named queries should now be in place. There are following features:

Cool stuff!

To test this feature you may execute following long running query:

...

Then immediatelly go to the DB status overview page:

https://mentat-alt.cesnet.cz/mentat/dbstatus/view

https://mentat-alt.cesnet.cz/mentat/dbstatus/query/97_jiwmzm/stop, clicked from dbstatus/view Kill query, returns 404.

That is because the query to the database is still too short. Before you confirm the stop, it is no longer running.

Things to discuss:
  1. How to efficiently name the query before executing the search. Currently the unique identifier is generated on the backend prior to the search. This approach is bulletproof, because every search will get unique identifier. However by using this approach the query request was already submitted by the user and the HTTP request is blocking, client waiting for response. There is no straightforward way how to pass the unique id back to the client, so that for example the query might get canceled on browser tab close. Obvious solution would be to generate the ID when displaying the form, so that the id is known by the client before the search is initiated. However there are many pitfalls in this approach, for example: how to handle form resubmission with respect to the uniqueness of the ID (there are valid cases); how to deal with search queries initiated with stored/prepared queries (in this case the form was never initially shown, so here is no query ID); etc.

What are valid cases of resubmission?

What if I have browser tab with search results open and just want to search again to see latest snapshot?

Actions #15

Updated by Pavel Kácha over 4 years ago

Jan Mach wrote:

Pavel Kácha wrote:

https://mentat-alt.cesnet.cz/mentat/dbstatus/query/97_jiwmzm/stop, clicked from dbstatus/view Kill query, returns 404.

That is because the query to the database is still too short. Before you confirm the stop, it is no longer running.

I see. You're right, query was not in the db anymore, but in Apache/Python. Maybe we should show something like "Query has already finished or unknown."

Things to discuss:
  1. How to efficiently name the query before executing the search. Currently the unique identifier is generated on the backend prior to the search. This approach is bulletproof, because every search will get unique identifier. However by using this approach the query request was already submitted by the user and the HTTP request is blocking, client waiting for response. There is no straightforward way how to pass the unique id back to the client, so that for example the query might get canceled on browser tab close. Obvious solution would be to generate the ID when displaying the form, so that the id is known by the client before the search is initiated. However there are many pitfalls in this approach, for example: how to handle form resubmission with respect to the uniqueness of the ID (there are valid cases); how to deal with search queries initiated with stored/prepared queries (in this case the form was never initially shown, so here is no query ID); etc.

What are valid cases of resubmission?

What if I have browser tab with search results open and just want to search again to see latest snapshot?

The query has already finished (as I see the results), and I'm free to reuse the identifier?

Actions #16

Updated by Pavel Kácha over 4 years ago

Today meeting conclusions:

  • seems we actually do not know how to to kill query on close unless we go full javascript, so let's not overcomplicate
  • so it seems web page does not need to know query id beforehand then, so let's stay on already implemented model
  • limit for queries per user will suffice, "you have too much queries on fly, wait for them to finish first"
  • new query with the same identifier kills existing one - this crudely solves multiple clicks
    • we might as well put small piece of JS there, which deactivates the Submit button (or the form altogether) after submission (do we actually want this? different ticket perhaps?)
  • would be nice to show all queries (even not named) for admin

P.S.: Mek, feel free to implement user id's within query identifier however you see it proper/convenient. My apologies for pushing implementation details on you.

Actions #17

Updated by Jan Mach over 4 years ago

  • Target version changed from Backlog to 2.6
Actions #18

Updated by Jan Mach over 4 years ago

  • To be discussed changed from Yes to No
Actions #19

Updated by Jan Mach over 4 years ago

  • Status changed from In Progress to Feedback
  • % Done changed from 90 to 100
  • To be discussed changed from No to Yes

Remarks from last meeting were addressed.

Actions #20

Updated by Jan Mach over 4 years ago

  • Related to Bug #6191: Implement thread safety in eventstorage service module added
Actions #21

Updated by Jan Mach over 4 years ago

  • Status changed from Feedback to Closed

I have found out, that the final issues I have been trying to work out are direct cause of the fact, that direct database access to IDEA events is not implemented in thread-safe manner. I have created new issue #6191 to address this problem, so this one can be now closed.

Actions #22

Updated by Jan Mach about 4 years ago

  • To be discussed changed from Yes to No
Actions

Also available in: Atom PDF