-
-
Save tylerneylon/a7ff6017b7a1f9a506cf75aa23eacfd6 to your computer and use it in GitHub Desktop.
# -*- coding: utf-8 -*- | |
""" rwlock.py | |
A class to implement read-write locks on top of the standard threading | |
library. | |
This is implemented with two mutexes (threading.Lock instances) as per this | |
wikipedia pseudocode: | |
https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock#Using_two_mutexes | |
Code written by Tyler Neylon at Unbox Research. | |
This file is public domain. | |
""" | |
# _______________________________________________________________________ | |
# Imports | |
from contextlib import contextmanager | |
from threading import Lock | |
# _______________________________________________________________________ | |
# Class | |
class RWLock(object): | |
""" RWLock class; this is meant to allow an object to be read from by | |
multiple threads, but only written to by a single thread at a time. See: | |
https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock | |
Usage: | |
from rwlock import RWLock | |
my_obj_rwlock = RWLock() | |
# When reading from my_obj: | |
with my_obj_rwlock.r_locked(): | |
do_read_only_things_with(my_obj) | |
# When writing to my_obj: | |
with my_obj_rwlock.w_locked(): | |
mutate(my_obj) | |
""" | |
def __init__(self): | |
self.w_lock = Lock() | |
self.num_r_lock = Lock() | |
self.num_r = 0 | |
# ___________________________________________________________________ | |
# Reading methods. | |
def r_acquire(self): | |
self.num_r_lock.acquire() | |
self.num_r += 1 | |
if self.num_r == 1: | |
self.w_lock.acquire() | |
self.num_r_lock.release() | |
def r_release(self): | |
assert self.num_r > 0 | |
self.num_r_lock.acquire() | |
self.num_r -= 1 | |
if self.num_r == 0: | |
self.w_lock.release() | |
self.num_r_lock.release() | |
@contextmanager | |
def r_locked(self): | |
""" This method is designed to be used via the `with` statement. """ | |
try: | |
self.r_acquire() | |
yield | |
finally: | |
self.r_release() | |
# ___________________________________________________________________ | |
# Writing methods. | |
def w_acquire(self): | |
self.w_lock.acquire() | |
def w_release(self): | |
self.w_lock.release() | |
@contextmanager | |
def w_locked(self): | |
""" This method is designed to be used via the `with` statement. """ | |
try: | |
self.w_acquire() | |
yield | |
finally: | |
self.w_release() |
Oh, that's a good point.
Updated.
the try catch is needed because its used with yield?
I have a simply script read and write a global variable.
import threading
from rwlock import RWLock
marker = RWLock()
WEEKDAYS = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
today = 0
def calendar_reader(id_number):
global today
name = 'Reader-' + str(id_number)
while today < len(WEEKDAYS)-1:
# print(today)
with marker.r_locked():
print(name, 'sees that today is', WEEKDAYS[today])
def calendar_writer(id_number):
global today
name = 'Writer-' + str(id_number)
while today < len(WEEKDAYS)-1:
with marker.w_locked():
today = (today + 1) % 7
print(name, 'updated date to ', WEEKDAYS[today])
if __name__ == '__main__':
for i in range(10):
threading.Thread(target=calendar_reader, args=(i,)).start()
for i in range(2):
threading.Thread(target=calendar_writer, args=(i,)).start()
It works on the builtin Lock and readerwriterlock 1.0.4, but to make it work using your script, I have to uncomment print(today)
on line 12. I didn't figure out why.
Hi @nickyfoto — this is actually correct behavior. What's happening is that the read locks are starving the write threads, so today
never gets a chance to change. If you have both readers and writers, then it's up to you to ensure that at some point all of the readers will stop reading in order to allow the writers to write. A simple way to do this is to only have a single reader thread that regularly releases the read lock. Another option is to somehow fence the readers so that every once in a while they will all simultaneously let go of the read lock.
The reason this same code may work with other libraries is that their locking stage may be slower than this code. In that case it's less likely to starve the write lock. However, that does not make the other libraries better — the opposite, those libraries are actually slower, and even for those slow libraries it's bad practice to count on them being slow as a means to ensure your write threads are not starved.
The reason it terminates when you uncomment the print()
line is that the print()
line is slow enough that it dramatically increases the probability that at some point all of the read threads will not be holding on to the read lock.
Hi @tylerneylon , thank you for the clear explanation. I learned a lot.
I wonder about other options in realizing this task. I want to create a library for https://eduzaurus.com/free-essay-samples/sociology/ and I wonder how to do it. Thanks for the idea.
You should make sure that https://gist.github.com/tylerneylon/a7ff6017b7a1f9a506cf75aa23eacfd6#file-rwlock-py-L87 does not get released if a read lock was acquired.
Hi @IceflowRE — If a read lock is active, then the write lock cannot be acquired until that read lock is complete. So w_release()
should not be called unless there are zero readers.
You might be saying that incorrect code could call w_release()
when a read lock is active. In that case, it is incorrect usage of this class. I am not trying to protect against bad usage of the class — only to support correct usage of it. The same philosophy is true of many concurrency tools.
Hi @IceflowRE — If a read lock is active, then the write lock cannot be acquired until that read lock is complete. So
w_release()
should not be called unless there are zero readers.You might be saying that incorrect code could call
w_release()
when a read lock is active. In that case, it is incorrect usage of this class. I am not trying to protect against bad usage of the class — only to support correct usage of it. The same philosophy is true of many concurrency tools.
Ah ok thanks :) because there are other implementation which prevents those.
Hi, in the member functions 'r_acquire' and 'r_release', a different reader thread can end up releasing the write lock than the one that acquired it, how does that work?
Hi @nidhimad , there is not really an ownership between readers and the write lock. Instead, the write lock is locked when either (there is a writer, and there can only be one writer) or (there are any readers, and there can be many simultaneous readers).
For example, consider this sequence of events for readers A, B, C: [A acquires] [B acquires] [A releases] [C acquires] [B releases] [C releases]. This has interesting overlap between the readers, but we don't actually care about the order -- we only care that the write lock is held until all readers are done. So this is the sequence of events in the example:
[start]
num_r = 0
w_lock is unlocked
[A acquires]
num_r = 1
w_lock is locked
[B acquires]
num_r = 2
[A releases]
num_r = 1
[C acquires]
num_r = 2
[B releases]
num_r = 1
[C releases]
num_r = 0
w_lock is unlocked
This implementation is not safe at all in my opinion. Someone who uses this read write lock can call w_release to release the write lock that's currently held by another thread. Since write lock should only be released by the thread holding the lock, there should be a variable holding the current thread ID and w_release needs to check whether calling thread is the same before actually releasing it.
Edit:
I see a previous comment that says the implementation is not trying to protect against incorrect usage, in that case, nvm
@tylerneylon, I think your implementation is based on the assumption that a mutex acquired by one thread can be released by another.
Please look at the first implementation in the below link, it is similar to yours:
https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock
@nidhimad, you are correct that this code may release a lock from a different thread than the acquiring thread. This is supported by Python's threading library. See: https://docs.python.org/3/library/threading.html#lock-objects
Specifically, the documentation for Lock.release()
says "Release a lock. This can be called from any thread, not only the thread which has acquired the lock." In other words, this implementation will not have a problem due to using a lock this way.
You are also correct that this implementation is like one listed on Wikipedia. This one may be useful for anyone who wants an already-written Python implementation. :)
So if I understand correctly, as long as any readers are reading, more can be added on even if there is a writer waiting. Perhaps you could avoid this potentially infinite wait by having w_acquire acquire num_r_lock, and w_release release it?
If so, then I believe it becomes more similar to this other solution: https://www.oreilly.com/library/view/python-cookbook/0596001673/ch06s04.html which, as soon as a writer starts waiting, causes any new readers to have to wait (Edit: I figured out this is not correct. It releases the lock while waiting, so other new readers could still pick it up, so this example is also not the behavior I want). With this behavior of new readers having to wait, it may be slower in some cases, but is much more fair to the writers so it is faster in other cases, and may be less likely to cause your writer thread to hang if readers are consistently run.
However, I'm not sure I quite grasp whether this increases or decreases the risk of deadlock, though I know it depends on the situation. Anyone have any insights there?
Edit: I believe I was correct, that for what I'm looking for, which allows the writer to block new readers until it has had its own turn, can be done by having the reader incrementer lock also be held by the writer upon its turn. Of course, this could create deadlock if readers and writers end up waiting on each other, but that is also always a risk with the other given solutions.
Edit 2: However, I haven't been able to test that, because my situation apparently has a reader requesting to read multiple times in a row before releasing the writer lock, and I deadlock in between recursion levels if I try to have the writer grab the reader lock. Or it just deadlocks and I haven't figured out why. For now, the given solution is good enough, till I learn more.
Hi @Nibbetts , thanks for the link to the alternative approach, which looks useful! I'll call that other approach a queued
lock and this one (the gist above) queueless
. (I just made up those names, but they make sense to me.)
When do we want a write request to take precedence over later read requests? (ie use a queued
lock vs a queueless
lock?)
The advantage of a queueless
lock is that readers don't have to wait for each other. The advantage of a queued
lock is that writers won't be starved.
Consider the situation where we have a number of incoming readers, and each reader and each writer holds the lock for 1 full second (a long time for a cpu). Suppose we receive requests very quickly in this order, R for read, W for write: R R R W R R R
, and then nothing for a while.
- If we have a
queueless
lock, then all 6 reads happen in about 1 second, since they are simultaneous; then the write occurs. The average read wait is 0 seconds, and the total time is 2 seconds. - If we have a
queued
lock, then the first 3 reads happen together in a concurrent second, then the write occurs, then the next 3 reads. The average read wait is 1 second (avg of 0,0,0,2,2,2), and the total time is 3 seconds. - Note that the writer has to wait about 1 second in both cases. I'm assuming reads don't slow each other down (which, if they do, changes the timing but also affects both kinds of locks).
If we wanted, we could look at specific R/W orderings like R W*n R*m
for large values of n
and m
and produce more pronounced differences.
All of the above is to say that each lock type may be better to use depending on your use case. My take on how to choose between them is:
- What are the possible time intervals between being in a zero-reader state?
- If those time intervals are too long between writes, then use a
queued
lock; otherwise goqueueless
.
Just another note: In general, it would be nice if a low-level Lock
object had the ability to shout, "Hey! I've been waiting 300 years!" if it was waiting a while. I'd use that while building a system to help gain visibility into any bad behavior, which could help find deadlock cases, or in general to make good decisions like that between a queued and queueless lock.
A forked version with support for with
blocks https://gist.github.com/Eboubaker/6a0b807788088a764b2a4c156fda0e4b
Hi @Eboubaker ! As a friendly fyi, this gist does include support for with
blocks (this is done via contextmanager
).
Example usage:
from rwlock import RWLock
rwlock = RWLock() # This is a lock for my_obj.
# When reading from my_obj:
with rwlock.r_locked():
do_read_only_things_with(my_obj)
# When writing to my_obj:
with rwlock.w_locked():
mutate(my_obj)
def update_things():
with rwlock.w_locked():
# update state
with rwlock.r_locked():
# read state
if some_confition:
update_things()
this example will hang in your implementation if the same thread was doing the work,
you have to move lock from update_things
, but this method is being called from multiple places, and you have to keep adding with rwlock.w_locked():
before each call to the method, code will be less readable
Thanks for explaining. You've designed your lock to be reentrant, which is cool. If I have time I may add this feature here, but for now it is not a reentrant lock.
@tylerneylon, I think your implementation is based on the assumption that a mutex acquired by one thread can be released by another. Please look at the first implementation in the below link, it is similar to yours: https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock
I believe my implementation does not have this problem I have a list of thread ids that are reading, and my release method will not do anything if the current thread did not call acquire_read before
@Nibbetts So if I understand correctly, as long as any readers are reading, more can be added on even if there is a writer waiting. Perhaps you could avoid this potentially infinite wait by having w_acquire acquire num_r_lock, and w_release release it?
in my implementation i used a condition lock for the read_ready_state
any writer or reader will need to acquire the state before they can do read or write requests, so there will be a queue which is managed by the languge and will give the lock by the order of callers. so when the writer requests the state other read requests will have to be after when the writer has released the state
use time.sleep(1) to give them a chance to gain the lock
import threading
import time
# from rwlock import RWLock
marker = RWLock()
WEEKDAYS = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
today = 0
def calendar_reader(id_number):
global today
name = 'Reader-' + str(id_number)
while today < len(WEEKDAYS)-1:
# print(today)
with marker.r_locked():
print(name, 'sees that today is', WEEKDAYS[today])
time.sleep(1)
def calendar_writer(id_number):
global today
name = 'Writer-' + str(id_number)
while today < len(WEEKDAYS)-1:
with marker.w_locked():
today = (today + 1) % 7
print(name, 'updated date to ', WEEKDAYS[today])
time.sleep(1)
if __name__ == '__main__':
for i in range(10):
threading.Thread(target=calendar_reader, args=(i,)).start()
for i in range(2):
threading.Thread(target=calendar_writer, args=(i,)).start()
result:
Reader-0 sees that today is Sunday
Reader-1 sees that today is Sunday
Reader-2 sees that today is Sunday
Reader-3 sees that today is Sunday
Reader-4 sees that today is Sunday
Reader-5 sees that today is Sunday
Reader-6 sees that today is Sunday
Reader-7 sees that today is Sunday
Reader-8 sees that today is Sunday
Reader-9 sees that today is Sunday
Writer-0 updated date to Monday
Writer-1 updated date to Tuesday
Reader-4 sees that today is Tuesday
Reader-6 sees that today is Tuesday
Reader-0 sees that today is Tuesday
Reader-7 sees that today is Tuesday
Reader-3 sees that today is Tuesday
Reader-5 sees that today is Tuesday
Reader-2 sees that today is Tuesday
Reader-8 sees that today is Tuesday
Reader-1 sees that today is Tuesday
Reader-9 sees that today is Tuesday
Writer-0 updated date to Wednesday
Writer-1 updated date to Thursday
Reader-9 sees that today is Thursday
Reader-4 sees that today is Thursday
Reader-2 sees that today is Thursday
Reader-7 sees that today is Thursday
Reader-1 sees that today is Thursday
Reader-0 sees that today is Thursday
Reader-5 sees that today is Thursday
Reader-6 sees that today is Thursday
Reader-3 sees that today is Thursday
Reader-8 sees that today is Thursday
Writer-1 updated date to Friday
Writer-0 updated date to Saturday
The advantage of a queueless lock is that readers don't have to wait for each other. The advantage of a queued lock is that writers won't be starved.
@tylerneylon, If you put queue on r_acquire
only (not the whole operation), readers will not block each other. This is called "fair" order.
Here is example with more detailed explanation https://en.wikipedia.org/wiki/Readers%e2%80%93writers_problem#Third_readers%E2%80%93writers_problem.
must add try... finally.. or it may be dead lock when exception happened!