Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change solrpy library to pysolr #2352

Merged
merged 9 commits into from
May 5, 2016

Conversation

joetsoi
Copy link
Contributor

@joetsoi joetsoi commented Mar 17, 2015

solrpy is unmaintained and by some handwaving not actually realistic benchmarks, slower, even the author of solrpy has said use something else like pysolr

Leonardo Santagada  30/08/2014

I don't think solrpy supports collection api. Talking about not supporting stuff... 
I'm one of the authors of solrpy and I'm thinking why don't we just put a link to pysolr 
and declare this library deprecated? Has someone looked at pysolr and can comment 
if there is any advantage that we still have over it?

https://groups.google.com/forum/#!searchin/solrpy/pysolr/solrpy/OPuV-jxvtRU/ZyHJ5mONIPoJ

from what I recall, solrpy does more xml transformation where as pysolr just requests the json.
pysolr also uses requests instead of handling the underlying connection itself, so our code gets to be a bit cleaner as well.

We do lose the basic http auth config options for solr as pysolr does not support this, I don't mind as this was actually undocumented on solrpy if I recall anyway)

(also because i'm doing this https://github.com/joetsoi/fedora-rpms-for-ckan)

@joetsoi
Copy link
Contributor Author

joetsoi commented Mar 17, 2015

(needs to have the legacy tests / new tests stuff fixed.)

@amercader
Copy link
Member

@joetsoi This looks good, what's left to do with the tests?

It seems like basic auth could be reimplmemented if necessary, but I agree that we shouldn't worry too much about it for now

@joetsoi
Copy link
Contributor Author

joetsoi commented Apr 9, 2015

I meant I need to fix the branch for the new_tests being moved to tests and tests being moved to legacy tests.

if solr_user is not None and solr_password is not None:
return SolrConnection(solr_url, http_user=solr_user,
http_pass=solr_password)
if decode_dates:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joetsoi can you briefly explain why this is needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I recall correctly

  • solrpy requests xml from solr and parses and allows you to search in two different ways
    search (returns datetime types) and raw_search (returns text)
  • pysolr parses the json (which does not provide a datetime type) and returns that in its search results

Since this is ckan, we like being difficult so we obviously use both raw and non raw searches in solrpy in differeent places and expect the results from solr to be a datetime and text other times.

So I had to add the decoding of dates to pysolr for the functions expecting datetime objects and some for when it wasn't

if i recall I lifted the datetime regex from pysolr (https://github.com/toastdriven/pysolr/blob/master/pysolr.py#L68) I think I should probably import it instead.

@wardi
Copy link
Contributor

wardi commented Nov 13, 2015

@amercader how do you feel about merging this for 2.5?

@rossjones
Copy link
Contributor

Although this is a pretty old PR, I think updating to maintained deps is probably a good thing. Should we unassign this and see if anyone is willing to volunteer to push this through?

@amercader
Copy link
Member

IIRC the only thing that hold me back was the dates magic going on on this PR. But if that makes sense to someone else who has time to look into it, let's go for it

@joetsoi
Copy link
Contributor Author

joetsoi commented Feb 12, 2016

If you want me to update this to master I can
On 12 Feb 2016 11:32 am, "Adrià Mercader" [email protected] wrote:

IIRC the only thing that hold me back was the dates magic going on on this
PR. But if that makes sense to someone else who has time to look into it,
let's go for it


Reply to this email directly or view it on GitHub
#2352 (comment).

@rossjones
Copy link
Contributor

@joetsoi That was be amazing, thank you.

@wardi
Copy link
Contributor

wardi commented Apr 24, 2016

@TkTech since you're interested ckan/ideas#174 (comment) , would you like to provide an update for this PR so I can merge?

joetsoi added a commit to joetsoi/ckan that referenced this pull request Apr 24, 2016
@joetsoi joetsoi force-pushed the solrpy-is-dead-long-live-pysolr branch from ba595ba to 786ba55 Compare April 24, 2016 22:44
def solr_datetime_decoder(d):
for k, v in d.items():
if isinstance(v, basestring):
possible_datetime = re.search(pysolr.DATETIME_REGEX, v)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed so that we are importing the regex instead of copy pasting it.

pysolr currently always provides a string as the first argument to
SolrError exceptions. An additional check has been added to future proof
the code.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants