Feature Request Template
Please include the following information in your feature request:
Headline
- "CiteGeist gets better again: CourtListener's search engine is now enhanced to provide the best results to all your queries"
What is the Feature?
"CiteGeist" is low-key the name of our search engine's ranking algorithm. There are a number of ways we can make it better:
- Add network-based ranking to search results for enhanced relevancy courtlistener#4436
- Boost search relevancy by jurisdiction courtlistener#4381
- Enhance case name search relevancy courtlistener#4366
- Implement Relevance Decay based on filing date courtlistener#558
Each of these uses the metadata of the case to improve ranking, and together they should make results more relevant.
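As a rough illustration of how two of these signals (filing-date decay and jurisdiction boosting) might adjust a text-relevance score, here is a minimal sketch. Everything in it is an assumption for discussion: the function names, the half-life, the floor, the boost factor, and the court identifiers are hypothetical and are not taken from the linked issues or from CourtListener's code.

```python
from datetime import date


def decayed_score(base_score, date_filed, today, half_life_years=20.0, floor=0.5):
    """Scale a text-relevance score down as a case gets older.

    Hypothetical parameters: the score loses half of its decayable part
    every `half_life_years`, but never drops below `floor * base_score`,
    so old landmark cases are not buried entirely.
    """
    age_years = max((today - date_filed).days, 0) / 365.25
    decay = 0.5 ** (age_years / half_life_years)
    return base_score * (floor + (1.0 - floor) * decay)


def jurisdiction_boost(score, case_court, preferred_courts, boost=1.5):
    """Multiply the score when the case comes from a court the user cares about."""
    return score * boost if case_court in preferred_courts else score


# Illustrative usage with made-up scores and court identifiers.
today = date(2020, 1, 1)
recent = decayed_score(10.0, date(2020, 1, 1), today)
old = decayed_score(10.0, date(2000, 1, 1), today)
boosted = jurisdiction_boost(old, "ca9", {"ca9", "scotus"})
```

Multiplicative adjustments like these keep the text score as the primary signal. Whether that is the right combination rule, and what the constants should be, would have to be settled by evaluating real queries.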
What Problem Might it Solve?
Currently, searching uses field-based boosting, phrase boosting, and TF-IDF relevancy to order results. It's fine, but sometimes you really have to wonder why it didn't find the case you're looking for.
Describe a Scenario in Which the Feature Might be Used
Whenever people search, this would improve the results they get.
Technical Requirements
- How hard is it to make, subjectively? Medium
- Best guess, how long would it take to make, roughly? 4 weeks
- What would it require that we do technically?
We would have to figure out the different ranking algorithms and how to apply them to our results in a performant way that can be updated as new content comes in (difficult with network-based algos). We would then have to calculate the network scores and associate them with the 10M cases we have.
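To make the "updated as new content comes in" concern concrete, here is a minimal sketch of a network score that is cheap to maintain incrementally. It uses a log-damped inbound citation count purely as a stand-in, and it is not the network algorithm this issue proposes. The class and method names are hypothetical. A real network-based score would likely need periodic recomputation over the whole graph, which is what makes it hard.

```python
import math
from collections import defaultdict


class CitationScores:
    """Keeps a per-case inbound citation count that can be updated per new case."""

    def __init__(self):
        # case_id -> number of distinct cases citing it
        self.cited_by_count = defaultdict(int)

    def add_case(self, case_id, cited_ids):
        """Register a newly ingested case and the cases it cites."""
        for cited in set(cited_ids):
            if cited != case_id:  # ignore self-citations
                self.cited_by_count[cited] += 1

    def score(self, case_id):
        """Log-damped count, so heavily cited cases do not swamp the text score."""
        return math.log1p(self.cited_by_count.get(case_id, 0))


# Illustrative usage with made-up case IDs.
scores = CitationScores()
scores.add_case("case-b", ["case-a"])
scores.add_case("case-c", ["case-a", "case-a", "case-b"])
```

Each new case touches only the cases it cites, so the update cost is proportional to that case's citation list, not to the full 10M-case corpus.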
Existing Systems or Alternatives?
There are plenty of case law search tools, but I think the target we currently have is Google Scholar. If we do this, I expect our results should compete with theirs.
Any Additional Information?
- We're also working on semantic search, which would partly obviate this, but I think we'll have keyword search for the foreseeable future.
- We're in a cool position with the network-based search because one of the folks in our orbit is doing PhD-level research on that topic. If we can make it work, that'd be special.
- In the past, we used PageRank for this, but we stopped a few years ago and nobody noticed. It's not the best ranking algo for something like court cases.