Across Process Boundary In Scoped_session

July 08, 2024

I'm using SQLAlchemy and multiprocessing. I also use scoped_session since it avoids sharing the same session, but I've found an error and its solution and I don't understand why do

Solution 1:

To understand why this happens, you need to understand what scoped_session and Pool actually do. scoped_session keeps a registry of sessions so that the following happens:

The first time you call DBSession, it creates a Session object for you in the registry.
Subsequently, if the necessary conditions are met (i.e. same thread, and the session has not been removed from the registry with remove()), it does not create a new Session object and instead returns the previously created Session object.
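The registry behaviour described above can be sketched with the standard library alone. This is a toy stand-in, not SQLAlchemy's implementation: ToyScopedRegistry is a hypothetical class, and a plain object() plays the role of a Session.

```python
import threading


class ToyScopedRegistry:
    """Hypothetical stand-in for scoped_session; not SQLAlchemy's code."""

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()  # one slot per thread

    def __call__(self):
        # Create on first use in this thread, then keep returning the same object.
        if not hasattr(self._local, "value"):
            self._local.value = self._factory()
        return self._local.value

    def remove(self):
        # Discard this thread's object so the next call creates a fresh one.
        if hasattr(self._local, "value"):
            del self._local.value


DBSession = ToyScopedRegistry(object)  # object() stands in for a Session

first = DBSession()
second = DBSession()
same_thread_reuses = first is second  # same thread: same object

seen_in_thread = []
worker = threading.Thread(target=lambda: seen_in_thread.append(DBSession()))
worker.start()
worker.join()
other_thread_differs = seen_in_thread[0] is not first  # other thread: new object

DBSession.remove()
fresh_after_remove = DBSession() is not first  # after remove(): new object
```

The point of the sketch is only the lookup rule: one object per thread, reused until it is explicitly removed.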

When you create a Pool, it creates the workers in the __init__ method. (Note that there's nothing fundamental about starting the worker processes in __init__. An equally valid implementation could wait until workers are first needed before it starts them, which would exhibit different behavior in your example.) When this happens (on Unix, with the fork start method), the parent process forks itself for every worker process, which involves the operating system copying the memory of the currently running process into a new process, so you will literally get the exact same objects in the exact same places.

Putting these two together, in the first example you are creating a Session before forking, which gets copied over to all worker processes during the creation of the Pool, resulting in the same identity, while in the second example you delay the creation of the Session object until after the worker processes have started, resulting in different identities.

It's important to note that while the Session objects share the same id, they are not the same object, in the sense that if you change anything about the Session in the parent process, the change will not be reflected in the child processes. They just happen to all share the same memory address due to the fork. However, OS-level resources like connections are shared, so if you had run a query on session before Pool(), a connection would have been created for you in the connection pool and subsequently forked into the child processes. If you then attempt to perform queries in the child processes, you will run into weird errors because your processes are clobbering each other on the exact same connection!
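A minimal sketch of the "same id, different object" point, assuming a Unix system where os.fork() is available. FakeSession and inspect_in_child are hypothetical names used only for illustration; no SQLAlchemy is involved.

```python
import os


class FakeSession:
    """Hypothetical stand-in for a Session object."""

    def __init__(self):
        self.label = "created in parent"


def inspect_in_child(session):
    """Fork, mutate the session in the child, and report what the child saw."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: works on its own copy of the parent's memory.
        os.close(read_fd)
        session.label = "changed in child"
        os.write(write_fd, f"{id(session)}|{session.label}".encode())
        os.close(write_fd)
        os._exit(0)
    # Parent process: read the child's report, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        report = reader.read()
    os.waitpid(pid, 0)
    child_id, child_label = report.split("|")
    return int(child_id), child_label


session = FakeSession()
child_id, child_label = inspect_in_child(session)

print(child_id == id(session))  # the child reports the same id
print(child_label)              # the child saw its own change
print(session.label)            # the parent's object is untouched
```

The child reports the same id because it inherited a copy of the parent's address space, yet its assignment to label never reaches the parent.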


The above is moot for Windows because Windows does not have fork().

Solution 2:

TCP connections are represented as file descriptors, which usually work across process boundaries, meaning this will cause concurrent access to the file descriptor on behalf of two or more entirely independent Python interpreter states.

https://docs.sqlalchemy.org/en/13/core/pooling.html#using-connection-pools-with-multiprocessing
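To see a file descriptor crossing a process boundary, here is a small sketch, assuming a Unix system with os.fork(). A socket.socketpair() stands in for the TCP connection to a database (that substitution is an assumption made to keep the example self-contained), and child_writes_on_inherited_socket is a hypothetical helper.

```python
import os
import socket


def child_writes_on_inherited_socket():
    """Show that a forked child can write to a socket the parent opened."""
    # parent_end plays the role of the database connection held by the parent.
    parent_end, peer_end = socket.socketpair()
    expected_length = len(b"from child + from parent")

    pid = os.fork()
    if pid == 0:
        # Child: the socket was never handed over explicitly; it came with the fork.
        parent_end.sendall(b"from child")
        os._exit(0)

    os.waitpid(pid, 0)  # wait so the ordering in this demo is deterministic
    parent_end.sendall(b" + from parent")

    # Both writes arrive on the same stream at the other end.
    received = b""
    while len(received) < expected_length:
        received += peer_end.recv(1024)

    parent_end.close()
    peer_end.close()
    return received


stream = child_writes_on_inherited_socket()
print(stream)
```

Here the parent waits for the child, so the bytes arrive in a predictable order. With a real database connection and no such coordination, two processes writing to one descriptor would interleave their protocol messages, which is the concurrent access the quoted passage warns about.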
