Across Process Boundary In Scoped_session
Solution 1:
To understand why this happens, you need to understand what scoped_session and Pool actually do. scoped_session keeps a registry of sessions so that the following happens:
- the first time you call DBSession, it creates a Session object for you in the registry
- subsequently, if the necessary conditions are met (i.e. same thread, session has not been closed), it does not create a new Session object and instead returns the previously created Session object
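The registry behaviour described above can be sketched without SQLAlchemy at all. The following is a simplified stand-in, not SQLAlchemy's actual implementation: the Session and ScopedRegistry classes are hypothetical, and the real scoped_session does considerably more. It only illustrates the "create once per thread, then hand back the same object" idea.

```python
import threading


class Session:
    """Stand-in for a database session (hypothetical, not SQLAlchemy's class)."""

    def __init__(self):
        self.closed = False


class ScopedRegistry:
    """Simplified thread-local registry, loosely modelled on scoped_session."""

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()

    def __call__(self):
        # Create a session on first use in this thread, then reuse it.
        session = getattr(self._local, "session", None)
        if session is None:
            session = self._factory()
            self._local.session = session
        return session

    def remove(self):
        # Discard this thread's session so the next call creates a new one.
        session = getattr(self._local, "session", None)
        if session is not None:
            session.closed = True
            del self._local.session


DBSession = ScopedRegistry(Session)

first = DBSession()
second = DBSession()
same_in_thread = first is second  # same thread: same object

seen_in_other_thread = []
worker = threading.Thread(target=lambda: seen_in_other_thread.append(DBSession()))
worker.start()
worker.join()
different_across_threads = seen_in_other_thread[0] is not first

DBSession.remove()
fresh_after_remove = DBSession() is not first  # a new object after removal
```

Note that the registry is keyed by thread, not by process, so nothing in it notices a fork: a forked child simply inherits whatever the registry held at that moment.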
When you create a Pool, it creates the workers in the __init__ method. (Note that there's nothing fundamental about starting the worker processes in __init__. An equally valid implementation could wait until workers are first needed before starting them, which would exhibit different behavior in your example.) When this happens (on Unix, with the fork start method), the parent process forks itself for every worker process, which involves the operating system copying the memory of the currently running process into a new process, so you literally get the exact same objects in the exact same places.
Putting these two together, in the first example you are creating a Session before forking, which gets copied over to all worker processes during the creation of the Pool, resulting in the same identity, while in the second example you delay the creation of the Session object until after the worker processes have started, resulting in different identities.
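The "same identity" effect can be reproduced with the standard library alone. This is a minimal sketch, assuming a Unix system where the fork start method is available. The Session class here is a hypothetical stand-in, and a plain Process is used instead of Pool to keep the example short; the fork step it relies on is the same.

```python
import multiprocessing as mp
import os


class Session:
    """Stand-in object (hypothetical); any Python object would do."""


session = Session()  # created in the parent, before any worker exists


def report(conn):
    # Runs in the forked child: send back its pid and the address of its copy.
    conn.send((os.getpid(), id(session)))
    conn.close()


def ids_seen_by_children(n=2):
    ctx = mp.get_context("fork")  # Unix only
    seen = []
    for _ in range(n):
        parent_conn, child_conn = ctx.Pipe()
        worker = ctx.Process(target=report, args=(child_conn,))
        worker.start()  # the fork happens here
        seen.append(parent_conn.recv())
        worker.join()
    return seen


if __name__ == "__main__":
    print("parent:", os.getpid(), id(session))
    for pid, address in ids_seen_by_children():
        print("child: ", pid, address)
```

Every child reports a different pid but the same id() as the parent, because each one received a copy of the parent's memory at the same addresses.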
It's important to note that while the Session objects share the same id, they are not the same object, in the sense that if you change anything about the Session in the parent process, the change will not be reflected in the child processes. They just happen to all share the same memory address due to the fork. However, OS-level resources like connections are shared, so if you had run a query on session before Pool(), a connection would have been created for you in the connection pool and subsequently forked into the child processes. If you then attempt to perform queries in the child processes, you will run into weird errors because your processes are all clobbering the exact same connection!
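To see that equal ids do not mean a shared object, here is a small sketch under the same assumptions as before (Unix, fork start method, a hypothetical Session stand-in). The parent changes an attribute after the fork, and the child is then asked what it sees.

```python
import multiprocessing as mp


class Session:
    """Stand-in object (hypothetical) with one mutable attribute."""

    def __init__(self):
        self.label = "before fork"


session = Session()


def child(conn):
    conn.recv()               # wait until the parent has mutated its copy
    conn.send(session.label)  # report what THIS process sees
    conn.close()


def label_seen_by_child():
    ctx = mp.get_context("fork")  # Unix only
    parent_conn, child_conn = ctx.Pipe()
    worker = ctx.Process(target=child, args=(child_conn,))
    worker.start()                       # the fork happens here
    session.label = "changed in parent"  # mutate after the fork
    parent_conn.send("go")
    seen = parent_conn.recv()
    worker.join()
    return seen


if __name__ == "__main__":
    print(label_seen_by_child())
```

The child still reports the value from before the fork, since each process works on its own copy of the memory.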
The above is moot on Windows because Windows does not have fork().
Solution 2:
TCP connections are represented as file descriptors, which usually work across process boundaries, meaning that a pooled connection carried across a fork will be accessed concurrently, through the same file descriptor, by two or more entirely independent Python interpreter states.
https://docs.sqlalchemy.org/en/13/core/pooling.html#using-connection-pools-with-multiprocessing
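The file descriptor sharing can be illustrated with a plain socket pair. This is only a sketch, assuming a Unix system with os.fork(). The socket pair is a stand-in for a database connection and its server, and the function name is made up for the example.

```python
import os
import socket


def shared_socket_demo():
    """Show that a socket opened before fork() is writable from both processes."""
    conn, server_side = socket.socketpair()  # stand-in for client connection and server
    pid = os.fork()
    if pid == 0:
        # Child: writes to the inherited descriptor; nothing was reopened here.
        try:
            conn.sendall(b"child ")
        finally:
            os._exit(0)
    os.waitpid(pid, 0)       # the child has written and exited
    conn.sendall(b"parent")  # the parent writes to the very same connection
    conn.close()             # last open copy of this end, so the reader sees EOF
    chunks = []
    while True:
        data = server_side.recv(1024)
        if not data:
            break
        chunks.append(data)
    server_side.close()
    return b"".join(chunks)


if __name__ == "__main__":
    print(shared_socket_demo())
```

The "server" end receives a single byte stream containing the writes of both processes. The order is deterministic here only because the parent waits for the child; with real concurrent queries the two protocol streams would interleave unpredictably, which is where the weird errors come from.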