-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
change solrpy library to pysolr #2352
Conversation
c239c3b
to
2f52a83
Compare
(needs to have the legacy tests / new tests stuff fixed.) |
@joetsoi This looks good, what's left to do with the tests? It seems like basic auth could be reimplmemented if necessary, but I agree that we shouldn't worry too much about it for now |
I meant I need to fix the branch for the new_tests being moved to tests and tests being moved to legacy tests. |
if solr_user is not None and solr_password is not None: | ||
return SolrConnection(solr_url, http_user=solr_user, | ||
http_pass=solr_password) | ||
if decode_dates: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@joetsoi can you briefly explain why this is needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I recall correctly
- solrpy requests xml from solr and parses and allows you to search in two different ways
search
(returns datetime types) andraw_search
(returns text) - pysolr parses the json (which does not provide a datetime type) and returns that in its search results
Since this is ckan, we like being difficult so we obviously use both raw and non raw searches in solrpy in differeent places and expect the results from solr to be a datetime and text other times.
So I had to add the decoding of dates to pysolr for the functions expecting datetime objects and some for when it wasn't
if i recall I lifted the datetime regex from pysolr (https://github.com/toastdriven/pysolr/blob/master/pysolr.py#L68) I think I should probably import it instead.
@amercader how do you feel about merging this for 2.5? |
Although this is a pretty old PR, I think updating to maintained deps is probably a good thing. Should we unassign this and see if anyone is willing to volunteer to push this through? |
IIRC the only thing that hold me back was the dates magic going on on this PR. But if that makes sense to someone else who has time to look into it, let's go for it |
If you want me to update this to master I can
|
@joetsoi That was be amazing, thank you. |
@TkTech since you're interested ckan/ideas#174 (comment) , would you like to provide an update for this PR so I can merge? |
ba595ba
to
786ba55
Compare
def solr_datetime_decoder(d): | ||
for k, v in d.items(): | ||
if isinstance(v, basestring): | ||
possible_datetime = re.search(pysolr.DATETIME_REGEX, v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed so that we are importing the regex instead of copy pasting it.
pysolr currently always provides a string as the first argument to SolrError exceptions. An additional check has been added to future proof the code.
solrpy is unmaintained and by some handwaving not actually realistic benchmarks, slower, even the author of solrpy has said use something else like pysolr
https://groups.google.com/forum/#!searchin/solrpy/pysolr/solrpy/OPuV-jxvtRU/ZyHJ5mONIPoJ
from what I recall, solrpy does more xml transformation where as pysolr just requests the json.
pysolr also uses requests instead of handling the underlying connection itself, so our code gets to be a bit cleaner as well.
We do lose the basic http auth config options for solr as pysolr does not support this, I don't mind as this was actually undocumented on solrpy if I recall anyway)
(also because i'm doing this https://github.com/joetsoi/fedora-rpms-for-ckan)