Project

General

Profile

Actions

Task #6205

closed

Review encapsulation of SQLAlchemy and Psycopg2 DBALs

Added by Jan Mach over 2 years ago. Updated 4 months ago.

Status:
Closed
Priority:
Normal
Category:
Development - Core
Target version:
Start date:
01/24/2020
Due date:
% Done:

100%

Estimated time:
To be discussed:

Description

Working on issue #4251 revealed, that the SQLAlchemy and/or Psycopg2 DBALs may not be optimally used within the project. For example cursors or sessions my be abused in a way that goes against best practices, which can cause troubles like those in #4251, because sessions/transaction can get stuck.

It would be wise to review current implementation.


Files

test_flask_sqlalchemy.py (1.33 KB) test_flask_sqlalchemy.py Radko Krkoš, 01/04/2021 12:43 PM

Related issues

Related to Mentat - Bug #4251: User data access through SQLAlchemy keeps a transaction openClosedJan Mach08/09/2018

Actions
Actions #1

Updated by Jan Mach over 2 years ago

  • Related to Bug #4251: User data access through SQLAlchemy keeps a transaction open added
Actions #2

Updated by Jan Mach about 2 years ago

  • Target version changed from 2.7 to 2.8
Actions #3

Updated by Rajmund Hruska over 1 year ago

  • Assignee changed from Jan Mach to Rajmund Hruska
Actions #4

Updated by Rajmund Hruska over 1 year ago

  • To be discussed changed from No to Yes
Actions #5

Updated by Radko Krkoš over 1 year ago

This issue stems from #4251, where a workaround for a specific transaction handling bug was found and recommended. Based on the videocall discussion a more streamlined approach is required, possibly by managing the transaction lifetime by limiting the life of a Session object.
Following the official documentation [1], the upcoming version of SQLAlchemy (v1.4), currently in beta, includes a Python's context manager based workflow with Session as a transaction manager and Engine as some form of a connection manager.
Based on [1], the best approach would be to keep a reference to Engine, instantiate a Session from it using a context manager and perform the data-focused actions inside that context. This is perhaps better documented by a code excerpt from [1]:

# create session and add objects
with Session(engine) as session, session.begin():
    session.add(some_object)
    session.add(some_other_object)
# inner context calls session.commit(), if there were no exceptions
# outer context calls session.close()

The current version 1.3 does not support the context manager approach while supporting the Engine/Session distinction, so a similar solution is possible [2], but the user (programmer) must both keep a reference to sessionmaker and carefully manage the lifetime of the Session object, what is error prone. It should nevertheless be a feasible approach.

Waiting for the next version might make sense here, although it is of note that the minimum required version of Python is bumped to 3.6 there [3] (not an issue for us as we are currently using 3.7.3, but might cause problems elsewhere). The migration document also promises various performance improvements in the next version.

The required changes to the code are non-trivial for both versions and the migration to v1.4 also seems to require some rewrites. I advise to do both at the same time and postpone this issue until after the SQLAlchemy v1.4 release.

[1] https://docs.sqlalchemy.org/en/14/orm/session_basics.html
[2] https://docs.sqlalchemy.org/en/13/orm/session_basics.html
[3] https://docs.sqlalchemy.org/en/14/changelog/migration_14.html

Actions #6

Updated by Radko Krkoš over 1 year ago

As for the flask-sqlalchemy, it is just a simple adaptor [1] and seems to take the sessionmaker route [2]. It also seems to manage the lifetime of the session subobject. Although it is not obvious from the example code [3] (why would an object's attribute lifetime be different from the object itself?), the documentation refers to the session as scoped and declares [4]:

You have to commit the session, but you don’t have to remove it at the end of the request, Flask-SQLAlchemy does that for you.

So the way of flask-sqlalchemy should be a viable approach to correctly access the database and if not, it is a bug somewhere in the flask-sqlalchemy\SQLAlchemy\psycopg2 stack and should be treated as such.

This is all based on code review and study of the documentation, not tested hard evidence.

[1] https://github.com/pallets/flask-sqlalchemy/blob/master/src/flask_sqlalchemy/model.py
[2] https://flask-sqlalchemy.palletsprojects.com/en/2.x/api/#flask_sqlalchemy.SQLAlchemy
[3] https://flask-sqlalchemy.palletsprojects.com/en/2.x/quickstart/#a-minimal-application
[4] https://flask-sqlalchemy.palletsprojects.com/en/2.x/quickstart/#road-to-enlightenment

Actions #7

Updated by Radko Krkoš over 1 year ago

Further investigating the flask-sqlalchemy, a simple testing script was prepared (attached), to determine the session management behaviour. The output, especially section "Commit - transaction start", shows that commit() in fact does not immediately start another transaction and that is deferred until the next query. Therefore, a viable approach using flask-sqlalchemy is:

from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
db = SQLAlchemy(app)

# New transaction
db.session.query(...)
db.session.commit()

# New transaction
db.session.query(...)
db.session.commit()

...

as opposed to rollback() that is required using the current architecture based on pure SQLAlchemy.

Actions #8

Updated by Pavel Kácha over 1 year ago

Consensus on 2021-01-11 meeting was that the correct way most probably is switching to flask-sqlalchemy, which hides some sqlalchemy idiosyncracies. We would than be able to remove weird rollback introduced in #4251.

Actions #9

Updated by Pavel Kácha over 1 year ago

  • To be discussed deleted (Yes)
Actions #10

Updated by Pavel Kácha over 1 year ago

  • Target version changed from 2.8 to 2.9
Actions #11

Updated by Rajmund Hruska 9 months ago

  • Status changed from New to Feedback
  • To be discussed set to Yes

I tried using flask_sqlalchemy to get rid of that rollback in 0d12bd68. Although the code is executable and all tests passed, when I check the running queries I see that the query from #4251 is still running. So the original problem is not solved yet.

From my understanding, I used the flask_sqlalchemy in the same way as in the example posted in #note-7 so it should be working. That is not the case though.

Actions #12

Updated by Radko Krkoš 9 months ago

Rajmund Hruska wrote in #note-11:

From my understanding, I used the flask_sqlalchemy in the same way as in the example posted in #note-7 so it should be working.

Not quite. You need to signal an end of transaction to the ORM somehow. In #note-7, it is done by calling db.session.commit(). In whois.py, after being done with the loop of queries (somewhere about where previously the storage.session.rollback() was), the storage.session.commit() method should be called. That should close the running transaction, without it, the transaction will remain open. The symptoms are the same as for #4251, but the cause is not a deficiency in the new ORM (as was with the prior one), but in the way it is used, see #note-6 and original the source [4,#note-6] for more details.

I did not run any code now, so take all this with a grain of salt. Is my reasoning understandable?

Actions #13

Updated by Radko Krkoš 9 months ago

As was discussed on the meeting, with regards also to #4569, access to SQLAlchemy's internals is needed even in the flask_sqlalchemy. Based on the documentation [1], this might work:

from flask_sqlalchemy import BaseQuery

[1] https://flask-sqlalchemy.palletsprojects.com/en/2.x/customizing/#query-class

Actions #14

Updated by Rajmund Hruska 9 months ago

  • Status changed from Feedback to In Progress
  • % Done changed from 0 to 50
  • To be discussed changed from Yes to No

Radko Krkoš wrote in #note-13:

As was discussed on the meeting, with regards also to #4569, access to SQLAlchemy's internals is needed even in the flask_sqlalchemy. Based on the documentation [1], this might work:

[...]

[1] https://flask-sqlalchemy.palletsprojects.com/en/2.x/customizing/#query-class

I found out that Query can be found at flask_sqlalchemy.orm.Query, so I used that class.

I can't fully remove sqlalchemy module because sqlalchemy.exc.OperationalError is used and exc module is not present in the flask_sqlalchemy module.

In 7c1dce70 I also added the RetryingQuery back, which I removed in the previous commit just for test purposes.

As it was discussed during the meeting, I will check if flask_sqlalchemy can be used at other places where insert/delete queries are used.

Actions #15

Updated by Radko Krkoš 9 months ago

Rajmund Hruska wrote in #note-14:

I can't fully remove sqlalchemy module because sqlalchemy.exc.OperationalError is used and exc module is not present in the flask_sqlalchemy module.

I have found:

flask_sqlalchemy.orm.exc.sa_exc.OperationalError

Not thoroughly tested.

Also, there seems to be proper support for model definition:
https://flask-sqlalchemy.palletsprojects.com/en/2.x/models/

Lastly, I would advise to redo the patches from scratch, the commit history is getting hard to understand.

Actions #16

Updated by Rajmund Hruska 9 months ago

Radko Krkoš wrote in #note-15:

Rajmund Hruska wrote in #note-14:

I can't fully remove sqlalchemy module because sqlalchemy.exc.OperationalError is used and exc module is not present in the flask_sqlalchemy module.

I have found:

[...]

Not thoroughly tested.

I tried running the following code:

import flask_sqlalchemy
import sqlalchemy

def func():
    raise sqlalchemy.exc.OperationalError('Test', orig=None, params=None)

def func2():
    try:
        func()
    except flask_sqlalchemy.orm.exc.sa_exc.OperationalError:
        print('caught')

func2()

After executing the code, 'caught' was printed on the screen so I assume it is possible to use flask_sqlalchemy.orm.exc.sa_exc.OperationalError instead of sqlalchemy.exc.OperationalError.

Also, there seems to be proper support for model definition:
https://flask-sqlalchemy.palletsprojects.com/en/2.x/models/

I will look into that in more detail.

Lastly, I would advise to redo the patches from scratch, the commit history is getting hard to understand.

I will create a new branch.

Actions #17

Updated by Rajmund Hruska 9 months ago

I checked the eventstorage.py service, which uses Psycopg2. I don't have any experience with Psycopg2, I only read the official documentation. From my point of view, the Psycopg2 is used correctly and I wouldn't change anything there.

Actions #18

Updated by Radko Krkoš 9 months ago

Rajmund Hruska wrote in #note-17:

I checked the eventstorage.py service, which uses Psycopg2. I don't have any experience with Psycopg2, I only read the official documentation. From my point of view, the Psycopg2 is used correctly and I wouldn't change anything there.

Yes, this is about an idiosyncrasy of the SQLAlchemy ORM. With psycopg2, the transactions are managed differently (manually), so there should be no gotchas there.

Actions #19

Updated by Rajmund Hruska 9 months ago

Radko Krkoš wrote in #note-18:

Rajmund Hruska wrote in #note-17:

I checked the eventstorage.py service, which uses Psycopg2. I don't have any experience with Psycopg2, I only read the official documentation. From my point of view, the Psycopg2 is used correctly and I wouldn't change anything there.

Yes, this is about an idiosyncrasy of the SQLAlchemy ORM. With psycopg2, the transactions are managed differently (manually), so there should be no gotchas there.

I thought so but I checked psycopg2 because it was mentioned in the title and description of the issue. Anyway, at least I understand it better now

Actions #20

Updated by Rajmund Hruska 7 months ago

  • To be discussed changed from No to Yes
Actions #21

Updated by Rajmund Hruska 6 months ago

  • Status changed from In Progress to Feedback
  • % Done changed from 50 to 100

So, in ed1b8f23 I started using flask_sqlalchemy for sessions in lib/mentat. In lib/hawat the flask_sqlalchemy is already used. And lib/vial will be merged into lib/hawat. I guess the issue is resolved now, isn't it?

Actions #22

Updated by Radko Krkoš 6 months ago

Rajmund Hruska wrote in #note-21:

So, in ed1b8f23 I started using flask_sqlalchemy for sessions in lib/mentat. In lib/hawat the flask_sqlalchemy is already used. And lib/vial will be merged into lib/hawat. I guess the issue is resolved now, isn't it?

From a quick glance, not all issues from the previous review have been dealt with. At least I can see the sqlalchemy.exc.OperationalError one unaddressed.
In addition, I do not like the inconsistency of going to full path in code on lines 31 & 44 (after diff) while going in the other direction on lines 32 & 53. Why?

Actions #23

Updated by Rajmund Hruska 6 months ago

Radko Krkoš wrote in #note-22:

Rajmund Hruska wrote in #note-21:

So, in ed1b8f23 I started using flask_sqlalchemy for sessions in lib/mentat. In lib/hawat the flask_sqlalchemy is already used. And lib/vial will be merged into lib/hawat. I guess the issue is resolved now, isn't it?

From a quick glance, not all issues from the previous review have been dealt with. At least I can see the sqlalchemy.exc.OperationalError one unaddressed.

I think we talked about that in one of the meetings but I don't remember the reason why I didn't the OperationalError from flask_sqlalchemy. I guess I can use the other one, as it seems to be working anyway (#note-16).

In addition, I do not like the inconsistency of going to full path in code on lines 31 & 44 (after diff) while going in the other direction on lines 32 & 53. Why?

There wasn't really any particular reason at all it was just how it was convenient at the time. I only needed the OperationalError in lines 32 & 53 and not full sqlalchemy as it was before diff. And I needed multiple objects from flask_sqlalchemy (lines 76, 78).

Actions #24

Updated by Radko Krkoš 6 months ago

Rajmund Hruska wrote in #note-23:

Radko Krkoš wrote in #note-22:

Rajmund Hruska wrote in #note-21:

So, in ed1b8f23 I started using flask_sqlalchemy for sessions in lib/mentat. In lib/hawat the flask_sqlalchemy is already used. And lib/vial will be merged into lib/hawat. I guess the issue is resolved now, isn't it?

From a quick glance, not all issues from the previous review have been dealt with. At least I can see the sqlalchemy.exc.OperationalError one unaddressed.

I think we talked about that in one of the meetings but I don't remember the reason why I didn't the OperationalError from flask_sqlalchemy. I guess I can use the other one, as it seems to be working anyway (#note-16).

In addition, I do not like the inconsistency of going to full path in code on lines 31 & 44 (after diff) while going in the other direction on lines 32 & 53. Why?

There wasn't really any particular reason at all it was just how it was convenient at the time. I only needed the OperationalError in lines 32 & 53 and not full sqlalchemy as it was before diff. And I needed multiple objects from flask_sqlalchemy (lines 76, 78).

OK.

Actions #25

Updated by Pavel Kácha 6 months ago

As the most important aim of this task was to get rid transaction dancing kludge to sidestep bare SQLAlchemy idiosyncracies, I guess we're done here - time for polishing and merging.

There are still cases of questionable *SQLAlchemy usage, I've created another ticket for long winter evenings: #7545.

Actions #26

Updated by Rajmund Hruska 6 months ago

  • Status changed from Feedback to In Review
  • To be discussed deleted (Yes)

Merged into devel.

Actions #27

Updated by Pavel Kácha 4 months ago

  • Status changed from In Review to Closed
Actions

Also available in: Atom PDF