-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add unit tests for github auth provider #149
Conversation
Setting any threshold for the lowest value for the auth cache size might still not guarantee (in some rough timing conditions), the entry won't disappear before it's read for the first time by the core logic. The change adds a short "hot/proxy" cache for new entries, that are not yet ever read by the core logic via is_authorized(). When that's done, only then the entry is moved to the regular TTL/LRU cache.
Also provide getter for tests
Hey, @athornton long time no see :) Sorry to keep you busy, but since you proved so much helpful on my previous endeavors... I hope it's pretty much on a sliver plate. All tests clean and tidy. The unit tests aren't hardcore exhaustive, but provide full line&branch coverage. |
giftless/auth/github.py
Outdated
from operator import attrgetter, itemgetter | ||
from threading import Condition, Lock, RLock | ||
from typing import Any, cast, overload | ||
from threading import Lock, RLock, _RLock |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ew, but if you need it you need it. Honestly it looks like a bug in threading that something public exposes _RLock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed 🤮😞 It's mostly a shutup-treat for mypy which for some reason rejects type(RLock())
as a type. This is the only workaround I found for a dynamic low-level type coming from a factory function RLock()
. I'll try to dig a bit more, but my hopes are low.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yissss 🐍 I fixed this with a little protocol instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! Now I've actually seen Protocol in the wild, and it makes sense.
@@ -280,17 +278,28 @@ def __eq__(self, other: object) -> bool: | |||
def __hash__(self) -> int: | |||
return hash((self.login, self.id)) | |||
|
|||
def permissions(self, org: str, repo: str) -> set[Permission] | None: | |||
def permissions( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not entirely sure I understand the purpose of "authoritative".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is related to caching and thread safety, in the boundaries of the current design.
The "problem" is in the way how this whole authentication is performed in BaseView._is_authorized
, which first creates the authenticator, __call__
s it to get the Identity
and only then it verifies the right permissions with Identity.is_authorized
.
These two consecutive steps have no locks, so they can get mixed up with other threads simultaneously doing the same thing. Because I'm authorizing the user via GitHub API calls, my goal is to prevent those from repeating not only by caching that for some TTL, but also by blocking concurrent identical auth requests from different threads, so only a single thread gets to do the API calls and other just sit blocking and get only the result.
Because I wanted to do all the API calls within the GithubAuthenticator.__call__
method not to drag the requests
Session
object around (which is otherwise helpful in reusing a single TLS connection the to API), I'm storing only the resulting permissions on the GithubIdentity
object, which can only simply expire.
in a different design with a shared
requests.Session
, theGithubIdentity
would know itself how to get the user's permissions (and refresh its permission cache), but theSession
should have a limited lifetime and as such, I'd have to manage it externally, which seemed even more complicated.
With all this, and the fact all the caches not only have configurable TTL, but also a rather low maximum size, I was obsessed about a scenario where (likely) an attacker would DoS the permission cache by doing concurrent calls for the same user, but different org/repo. In that scenario, the cached entry that got stored in the GithubAuthenticator.__call__
gets evicted sooner than it's being finally accessed via Identity.is_authorized
.
Therefore I couldn't eventually resist to introduce a tiny in-between cache to remember the permissions for a user obtained in __call__
till it's being successfully read in is_authorized
😮💨
This permissions
method is called in two contexts:
- in the
__call__
to peek if a certain user's permissions are cached - from
is_authorized
, which isn't used internally, but mean the obtained permissions have been read by the core logic for good and that I'm free to finally forget it.
The authoritative
flag is just for that latter case, telling the logic the permissions have been properly read and now can sit and expire in the regular cache.
I admit this design is covering a really niche scenario, but once I realized it's there, my "OCD" couldn't let it stay. If you think there's a chance to considerably simplify the design, I'm all ears, but since this works and it's already done... 😇
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good to me. My approach would just (if I'd noticed it at all, which I probably wouldn't because I wouldn't have tested high concurrency) have been to throw a mutex around the critical section and accept the performance hit. This seems much more amenable to many callers at once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. I wish we didn't have to import a private symbol from threading, but that feels like a problem in threading, not with your code.
@athornton, is it merge-ready now? ;) |
Hey @vit-zikmund post-merge I'm now getting a bunch of failures in the github auth tests. They're of the form:
I'm starting in on this but I suspect you'll be able to fix it trivially. |
Oh yeah, lemme fix that real quick! |
fixed in #155 |
This PR add 100% (well 99% due to typing clutter) coverage unit tests for the github auth provider.
It fixes a handful of mostly cosmetic things in the original code, that popped up during the work.
One considerably more significant change is the introduction of the authorization "proxy" cache, which ensures that the core giftless code will always get a valid
is_authorized()
response for the user calling giftless, even if the usual auth proxy is configured too low to withstand some harsh hammering conditions.Closes: #148