Bug #4384: Possibility of DoS by repeating long query - Mentat - Homeproj: Redmine for CESNET

Actions

Copy link

Bug #4384

closed

Possibility of DoS by repeating long query

Added by Pavel Kácha about 6 years ago. Updated almost 5 years ago.

Status:

Closed

Priority:

Normal

Assignee:

Jan Mach

Category:

Development - Core

Target version:

2.6

Start date:

10/19/2018

Due date:

% Done:

100%

Estimated time:

To be discussed:

Description

Long query does not stop when user closes or resubmits the page. This can cause several versions of one or more long queries running in vain. Possibilities:

Limit number of queries for one user.
Run the same query for the same user just once (if identical query arrives from the same user, kill the previous one).
Cache queries, for second identical query don't run the new one, but return results from the first one, when ready.

1 and 2 might be reasonable to implement by means of named transactions (as we could find previous or "identical" transaction and its owner by name and kill it, or give to the user possibility to choose what to kill).
3 is hard to implement, needs connection pool/caching, own data transfers on result. Probably not worth the hassle.

Related issues

Actions

Copy link

Updated by Pavel Kácha about 6 years ago

Assignee changed from Jan Mach to Radko Krkoš

Assigning to Radko for initial exploration.

Actions

Copy link

Updated by Pavel Kácha about 6 years ago

We can also kill the query by JS when associated web page is closed. Something by means of:

window.addEventListener("beforeunload", function(e){
   navigator.sendBeacon("pageclosed?queryid=slimy");
}, false);

Actions

Copy link

Updated by Radko Krkoš about 6 years ago

The named transactions we talked about in telco were only a proposal and are not implemented in PostgreSQL. Sorry for confusion.
A workaround would be using comments for naming, for example like this:

SELECT * FROM events; -- User42

Actions

Copy link

Updated by Radko Krkoš about 6 years ago

% Done changed from 0 to 10

Unfortunately, the comments are removed and cannot be seen in pg_stat_activity output.

Alternative way would be the use of in-query renaming, something along the line of:

SELECT * FROM events AS User42;

Then, it is possible to read the list of running queries:

SELECT pid, query, state FROM pg_stat_activity;
  pid  |                      query                      | state  
-------+-------------------------------------------------+--------
 27934 |                                                 | 
 22545 | SELECT pid, query, state FROM pg_stat_activity; | active
 22581 | SELECT * FROM events AS User42;                 | idle
 24354 |                                                 | idle
 27931 |                                                 | 
 27930 |                                                 | 
 27932 |                                                 | 
(7 rows)

(The state column must also be considered, especially if transactions are used).

This of course renames the table for the purposes of the query so might not be a good idea if you need to reference the table name elsewhere in the query. A workaround might be defining an empty but named windowing function, which is thrown away by the planner (so it has no effect at all), but nevertheless still part of the query, like this:

SELECT * FROM events WINDOW User42 AS ();

Neither option is particularly pretty. Let's discuss this sometimes.

Actions

Copy link

Updated by Pavel Kácha almost 6 years ago

Another shot:

SELECT 'query_bflm_1234' AS query_name, * FROM events WHERE ...;

If API reads dicts, dicts will now have one more key - query_name: "query_bflm_1234".

Actions

Copy link

Updated by Radko Krkoš almost 6 years ago

Related to Bug #4515: Runaway query? added

Actions

Copy link

Updated by Radko Krkoš almost 6 years ago

Status changed from New to Feedback

Several options for naming the queries have been found. It is up to Mek to pick his favorite. Let's discuss this online.

Actions

Copy link

Updated by Radko Krkoš almost 6 years ago

Status changed from Feedback to In Progress
Assignee changed from Radko Krkoš to Jan Mach

Reassigning, option SELECT * FROM events AS User42; from #4384#note-4 was chosen for implementation.

Actions

Copy link

Updated by Pavel Kácha almost 6 years ago

Not creating subsequent issues yet, but when time comes, it will be probably necessary to

create web API to signal, that page has been closed (which will consequently kill the related query)
check for too many queries run by user (and stop him politely)
nice to have - query manager as companion to "Database status"

Actions

Copy link

#10

Updated by Pavel Kácha almost 6 years ago

Related to Feature #4609: Arbitrary grouping and sorting in Events added

Actions

Copy link

#11

Updated by Jan Mach about 5 years ago

To be discussed changed from No to Yes

Actions

Copy link

#12

Updated by Jan Mach about 5 years ago

% Done changed from 10 to 90

Groundwork for working with named queries should now be in place. There are following features:

Each query submitted via /events/search or /api/events/search endpoint gets assigned unique identifier like '{:d}_{:s}'.format(current_user_id, random_str(6, ascii_lowercase)).
The mentat.services.eventstorage service now supports querying pg_stat_activity table for these named queries and canceling them with pg_cancel_background().
The /dbstatus/view endpoint now contains new tab with list of all currently running named queries. There are also endpoints /dbstatus/query/<id>/stop> and /api/dbstatus/query/<id>/stop> for stopping the query with mandatory confirmation dialog.

To test this feature you may execute following long running query:

https://mentat-alt.cesnet.cz/mentat/events/search?dt_from=&dt_to=&source_addrs=&source_ports=&submit=Search&limit=25000

Then immediatelly go to the DB status overview page:

https://mentat-alt.cesnet.cz/mentat/dbstatus/view

Things to discuss:

How to efficiently name the query before executing the search. Currently the unique identifier is generated on the backend prior to the search. This approach is bulletproof, because every search will get unique identifier. However by using this approach the query request was already submitted by the user and the HTTP request is blocking, client waiting for response. There is no straightforward way how to pass the unique id back to the client, so that for example the query might get canceled on browser tab close. Obvious solution would be to generate the ID when displaying the form, so that the id is known by the client before the search is initiated. However there are many pitfalls in this approach, for example: how to handle form resubmission with respect to the uniqueness of the ID (there are valid cases); how to deal with search queries initiated with stored/prepared queries (in this case the form was never initially shown, so here is no query ID); etc.

Actions

Copy link

#13

Updated by Pavel Kácha about 5 years ago

Jan Mach wrote:

Groundwork for working with named queries should now be in place. There are following features:

Cool stuff!

To test this feature you may execute following long running query:

...

Then immediatelly go to the DB status overview page:

https://mentat-alt.cesnet.cz/mentat/dbstatus/view

https://mentat-alt.cesnet.cz/mentat/dbstatus/query/97_jiwmzm/stop, clicked from dbstatus/view Kill query, returns 404.

Things to discuss:

How to efficiently name the query before executing the search. Currently the unique identifier is generated on the backend prior to the search. This approach is bulletproof, because every search will get unique identifier. However by using this approach the query request was already submitted by the user and the HTTP request is blocking, client waiting for response. There is no straightforward way how to pass the unique id back to the client, so that for example the query might get canceled on browser tab close. Obvious solution would be to generate the ID when displaying the form, so that the id is known by the client before the search is initiated. However there are many pitfalls in this approach, for example: how to handle form resubmission with respect to the uniqueness of the ID (there are valid cases); how to deal with search queries initiated with stored/prepared queries (in this case the form was never initially shown, so here is no query ID); etc.

What are valid cases of resubmission?

Actions

Copy link

#14

Updated by Jan Mach about 5 years ago

Pavel Kácha wrote:

Jan Mach wrote:

Groundwork for working with named queries should now be in place. There are following features:

Cool stuff!

To test this feature you may execute following long running query:

...

Then immediatelly go to the DB status overview page:

https://mentat-alt.cesnet.cz/mentat/dbstatus/view

https://mentat-alt.cesnet.cz/mentat/dbstatus/query/97_jiwmzm/stop, clicked from dbstatus/view Kill query, returns 404.

That is because the query to the database is still too short. Before you confirm the stop, it is no longer running.

Things to discuss:

How to efficiently name the query before executing the search. Currently the unique identifier is generated on the backend prior to the search. This approach is bulletproof, because every search will get unique identifier. However by using this approach the query request was already submitted by the user and the HTTP request is blocking, client waiting for response. There is no straightforward way how to pass the unique id back to the client, so that for example the query might get canceled on browser tab close. Obvious solution would be to generate the ID when displaying the form, so that the id is known by the client before the search is initiated. However there are many pitfalls in this approach, for example: how to handle form resubmission with respect to the uniqueness of the ID (there are valid cases); how to deal with search queries initiated with stored/prepared queries (in this case the form was never initially shown, so here is no query ID); etc.

What are valid cases of resubmission?

What if I have browser tab with search results open and just want to search again to see latest snapshot?

Actions

Copy link

#15

Updated by Pavel Kácha about 5 years ago

Jan Mach wrote:

Pavel Kácha wrote:

https://mentat-alt.cesnet.cz/mentat/dbstatus/query/97_jiwmzm/stop, clicked from dbstatus/view Kill query, returns 404.

That is because the query to the database is still too short. Before you confirm the stop, it is no longer running.

I see. You're right, query was not in the db anymore, but in Apache/Python. Maybe we should show something like "Query has already finished or unknown."

Things to discuss:

How to efficiently name the query before executing the search. Currently the unique identifier is generated on the backend prior to the search. This approach is bulletproof, because every search will get unique identifier. However by using this approach the query request was already submitted by the user and the HTTP request is blocking, client waiting for response. There is no straightforward way how to pass the unique id back to the client, so that for example the query might get canceled on browser tab close. Obvious solution would be to generate the ID when displaying the form, so that the id is known by the client before the search is initiated. However there are many pitfalls in this approach, for example: how to handle form resubmission with respect to the uniqueness of the ID (there are valid cases); how to deal with search queries initiated with stored/prepared queries (in this case the form was never initially shown, so here is no query ID); etc.

What are valid cases of resubmission?

What if I have browser tab with search results open and just want to search again to see latest snapshot?

The query has already finished (as I see the results), and I'm free to reuse the identifier?

Actions

Copy link

#16

Updated by Pavel Kácha about 5 years ago

Today meeting conclusions:

seems we actually do not know how to to kill query on close unless we go full javascript, so let's not overcomplicate
so it seems web page does not need to know query id beforehand then, so let's stay on already implemented model
limit for queries per user will suffice, "you have too much queries on fly, wait for them to finish first"
new query with the same identifier kills existing one - this crudely solves multiple clicks
- we might as well put small piece of JS there, which deactivates the Submit button (or the form altogether) after submission (do we actually want this? different ticket perhaps?)

would be nice to show all queries (even not named) for admin

P.S.: Mek, feel free to implement user id's within query identifier however you see it proper/convenient. My apologies for pushing implementation details on you.

Actions

Copy link

#17

Updated by Jan Mach about 5 years ago

Target version changed from Backlog to 2.6

Actions

Copy link

#18

Updated by Jan Mach about 5 years ago

To be discussed changed from Yes to No

Actions

Copy link

#19

Updated by Jan Mach almost 5 years ago

Status changed from In Progress to Feedback
% Done changed from 90 to 100
To be discussed changed from No to Yes

Remarks from last meeting were addressed.

Actions

Copy link

#20

Updated by Jan Mach almost 5 years ago

Related to Bug #6191: Implement thread safety in eventstorage service module added

Actions

Copy link

#21

Updated by Jan Mach almost 5 years ago

Status changed from Feedback to Closed

I have found out, that the final issues I have been trying to work out are direct cause of the fact, that direct database access to IDEA events is not implemented in thread-safe manner. I have created new issue #6191 to address this problem, so this one can be now closed.

Actions

Copy link

#22

Updated by Jan Mach almost 5 years ago

To be discussed changed from Yes to No

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

Mentat

Custom queries

Bug #4384

Possibility of DoS by repeating long query

Updated by Pavel Kácha about 6 years ago

Updated by Pavel Kácha about 6 years ago

Updated by Radko Krkoš about 6 years ago

Updated by Radko Krkoš about 6 years ago

Updated by Pavel Kácha almost 6 years ago

Updated by Radko Krkoš almost 6 years ago

Updated by Radko Krkoš almost 6 years ago

Updated by Radko Krkoš almost 6 years ago

Updated by Pavel Kácha almost 6 years ago

Updated by Pavel Kácha almost 6 years ago

Updated by Jan Mach about 5 years ago

Updated by Jan Mach about 5 years ago

Updated by Pavel Kácha about 5 years ago

Updated by Jan Mach about 5 years ago

Updated by Pavel Kácha about 5 years ago

Updated by Pavel Kácha about 5 years ago

Updated by Jan Mach about 5 years ago

Updated by Jan Mach about 5 years ago

Updated by Jan Mach almost 5 years ago

Updated by Jan Mach almost 5 years ago

Updated by Jan Mach almost 5 years ago

Updated by Jan Mach almost 5 years ago