Bug #4384
closedPossibility of DoS by repeating long query
100%
Description
Long query does not stop when user closes or resubmits the page. This can cause several versions of one or more long queries running in vain. Possibilities:
- Limit number of queries for one user.
- Run the same query for the same user just once (if identical query arrives from the same user, kill the previous one).
- Cache queries, for second identical query don't run the new one, but return results from the first one, when ready.
1 and 2 might be reasonable to implement by means of named transactions (as we could find previous or "identical" transaction and its owner by name and kill it, or give to the user possibility to choose what to kill).
3 is hard to implement, needs connection pool/caching, own data transfers on result. Probably not worth the hassle.
Related issues
Updated by Pavel Kácha almost 6 years ago
- Assignee changed from Jan Mach to Radko Krkoš
Assigning to Radko for initial exploration.
Updated by Pavel Kácha almost 6 years ago
We can also kill the query by JS when associated web page is closed. Something by means of:
window.addEventListener("beforeunload", function(e){ navigator.sendBeacon("pageclosed?queryid=slimy"); }, false);
Updated by Radko Krkoš almost 6 years ago
The named transactions we talked about in telco were only a proposal and are not implemented in PostgreSQL. Sorry for confusion.
A workaround would be using comments for naming, for example like this:
SELECT * FROM events; -- User42
Updated by Radko Krkoš almost 6 years ago
- % Done changed from 0 to 10
Unfortunately, the comments are removed and cannot be seen in pg_stat_activity
output.
Alternative way would be the use of in-query renaming, something along the line of:
SELECT * FROM events AS User42;
Then, it is possible to read the list of running queries:
SELECT pid, query, state FROM pg_stat_activity; pid | query | state -------+-------------------------------------------------+-------- 27934 | | 22545 | SELECT pid, query, state FROM pg_stat_activity; | active 22581 | SELECT * FROM events AS User42; | idle 24354 | | idle 27931 | | 27930 | | 27932 | | (7 rows)
(The state column must also be considered, especially if transactions are used).
This of course renames the table for the purposes of the query so might not be a good idea if you need to reference the table name elsewhere in the query. A workaround might be defining an empty but named windowing function, which is thrown away by the planner (so it has no effect at all), but nevertheless still part of the query, like this:
SELECT * FROM events WINDOW User42 AS ();
Neither option is particularly pretty. Let's discuss this sometimes.
Updated by Pavel Kácha almost 6 years ago
Another shot:
SELECT 'query_bflm_1234' AS query_name, * FROM events WHERE ...;
If API reads dicts, dicts will now have one more key - query_name: "query_bflm_1234".
Updated by Radko Krkoš over 5 years ago
- Related to Bug #4515: Runaway query? added
Updated by Radko Krkoš over 5 years ago
- Status changed from New to Feedback
Several options for naming the queries have been found. It is up to Mek to pick his favorite. Let's discuss this online.
Updated by Radko Krkoš over 5 years ago
- Status changed from Feedback to In Progress
- Assignee changed from Radko Krkoš to Jan Mach
Reassigning, option SELECT * FROM events AS User42;
from #4384#note-4 was chosen for implementation.
Updated by Pavel Kácha over 5 years ago
Not creating subsequent issues yet, but when time comes, it will be probably necessary to
- create web API to signal, that page has been closed (which will consequently kill the related query)
- check for too many queries run by user (and stop him politely)
- nice to have - query manager as companion to "Database status"
Updated by Pavel Kácha over 5 years ago
- Related to Feature #4609: Arbitrary grouping and sorting in Events added
Updated by Jan Mach almost 5 years ago
- % Done changed from 10 to 90
Groundwork for working with named queries should now be in place. There are following features:
- Each query submitted via
/events/search
or/api/events/search
endpoint gets assigned unique identifier like'{:d}_{:s}'.format(current_user_id, random_str(6, ascii_lowercase))
. - The mentat.services.eventstorage service now supports querying
pg_stat_activity
table for these named queries and canceling them withpg_cancel_background()
. - The
/dbstatus/view
endpoint now contains new tab with list of all currently running named queries. There are also endpoints/dbstatus/query/<id>/stop>
and/api/dbstatus/query/<id>/stop>
for stopping the query with mandatory confirmation dialog.
To test this feature you may execute following long running query:
Then immediatelly go to the DB status overview page:
https://mentat-alt.cesnet.cz/mentat/dbstatus/view
Things to discuss:- How to efficiently name the query before executing the search. Currently the unique identifier is generated on the backend prior to the search. This approach is bulletproof, because every search will get unique identifier. However by using this approach the query request was already submitted by the user and the HTTP request is blocking, client waiting for response. There is no straightforward way how to pass the unique id back to the client, so that for example the query might get canceled on browser tab close. Obvious solution would be to generate the ID when displaying the form, so that the id is known by the client before the search is initiated. However there are many pitfalls in this approach, for example: how to handle form resubmission with respect to the uniqueness of the ID (there are valid cases); how to deal with search queries initiated with stored/prepared queries (in this case the form was never initially shown, so here is no query ID); etc.
Updated by Pavel Kácha almost 5 years ago
Jan Mach wrote:
Groundwork for working with named queries should now be in place. There are following features:
Cool stuff!
To test this feature you may execute following long running query:
...
Then immediatelly go to the DB status overview page:
https://mentat-alt.cesnet.cz/mentat/dbstatus/query/97_jiwmzm/stop, clicked from dbstatus/view Kill query, returns 404.
Things to discuss:
- How to efficiently name the query before executing the search. Currently the unique identifier is generated on the backend prior to the search. This approach is bulletproof, because every search will get unique identifier. However by using this approach the query request was already submitted by the user and the HTTP request is blocking, client waiting for response. There is no straightforward way how to pass the unique id back to the client, so that for example the query might get canceled on browser tab close. Obvious solution would be to generate the ID when displaying the form, so that the id is known by the client before the search is initiated. However there are many pitfalls in this approach, for example: how to handle form resubmission with respect to the uniqueness of the ID (there are valid cases); how to deal with search queries initiated with stored/prepared queries (in this case the form was never initially shown, so here is no query ID); etc.
What are valid cases of resubmission?
Updated by Jan Mach almost 5 years ago
Pavel Kácha wrote:
Jan Mach wrote:
Groundwork for working with named queries should now be in place. There are following features:
Cool stuff!
To test this feature you may execute following long running query:
...
Then immediatelly go to the DB status overview page:
https://mentat-alt.cesnet.cz/mentat/dbstatus/query/97_jiwmzm/stop, clicked from dbstatus/view Kill query, returns 404.
That is because the query to the database is still too short. Before you confirm the stop, it is no longer running.
Things to discuss:
- How to efficiently name the query before executing the search. Currently the unique identifier is generated on the backend prior to the search. This approach is bulletproof, because every search will get unique identifier. However by using this approach the query request was already submitted by the user and the HTTP request is blocking, client waiting for response. There is no straightforward way how to pass the unique id back to the client, so that for example the query might get canceled on browser tab close. Obvious solution would be to generate the ID when displaying the form, so that the id is known by the client before the search is initiated. However there are many pitfalls in this approach, for example: how to handle form resubmission with respect to the uniqueness of the ID (there are valid cases); how to deal with search queries initiated with stored/prepared queries (in this case the form was never initially shown, so here is no query ID); etc.
What are valid cases of resubmission?
What if I have browser tab with search results open and just want to search again to see latest snapshot?
Updated by Pavel Kácha almost 5 years ago
Jan Mach wrote:
Pavel Kácha wrote:
https://mentat-alt.cesnet.cz/mentat/dbstatus/query/97_jiwmzm/stop, clicked from dbstatus/view Kill query, returns 404.
That is because the query to the database is still too short. Before you confirm the stop, it is no longer running.
I see. You're right, query was not in the db anymore, but in Apache/Python. Maybe we should show something like "Query has already finished or unknown."
Things to discuss:
- How to efficiently name the query before executing the search. Currently the unique identifier is generated on the backend prior to the search. This approach is bulletproof, because every search will get unique identifier. However by using this approach the query request was already submitted by the user and the HTTP request is blocking, client waiting for response. There is no straightforward way how to pass the unique id back to the client, so that for example the query might get canceled on browser tab close. Obvious solution would be to generate the ID when displaying the form, so that the id is known by the client before the search is initiated. However there are many pitfalls in this approach, for example: how to handle form resubmission with respect to the uniqueness of the ID (there are valid cases); how to deal with search queries initiated with stored/prepared queries (in this case the form was never initially shown, so here is no query ID); etc.
What are valid cases of resubmission?
What if I have browser tab with search results open and just want to search again to see latest snapshot?
The query has already finished (as I see the results), and I'm free to reuse the identifier?
Updated by Pavel Kácha almost 5 years ago
Today meeting conclusions:
- seems we actually do not know how to to kill query on close unless we go full javascript, so let's not overcomplicate
- so it seems web page does not need to know query id beforehand then, so let's stay on already implemented model
- limit for queries per user will suffice, "you have too much queries on fly, wait for them to finish first"
- new query with the same identifier kills existing one - this crudely solves multiple clicks
- we might as well put small piece of JS there, which deactivates the Submit button (or the form altogether) after submission (do we actually want this? different ticket perhaps?)
- would be nice to show all queries (even not named) for admin
P.S.: Mek, feel free to implement user id's within query identifier however you see it proper/convenient. My apologies for pushing implementation details on you.
Updated by Jan Mach almost 5 years ago
- Target version changed from Backlog to 2.6
Updated by Jan Mach almost 5 years ago
- Status changed from In Progress to Feedback
- % Done changed from 90 to 100
- To be discussed changed from No to Yes
Remarks from last meeting were addressed.
Updated by Jan Mach over 4 years ago
- Related to Bug #6191: Implement thread safety in eventstorage service module added
Updated by Jan Mach over 4 years ago
- Status changed from Feedback to Closed
I have found out, that the final issues I have been trying to work out are direct cause of the fact, that direct database access to IDEA events is not implemented in thread-safe manner. I have created new issue #6191 to address this problem, so this one can be now closed.