This presentation summarizes a performance comparison of Elasticsearch and SolrCloud. The study found that SolrCloud was slightly faster at indexing and querying large datasets, and could support a significantly higher number of queries per second. However, the presentation notes limitations of the study and concludes that both Elasticsearch and SolrCloud showed acceptable performance, so the best option depends on the requirements of the specific search application.
Solr and Elasticsearch, a performance study
1. Elasticsearch and SolrCloud
a performance comparison
Tom Mortimer - Technical Director
27th November 2014
[email protected]
www.flax.co.uk/blog
+44 (0) 8700 118334
Twitter: @FlaxSearch
5. Who are Flax?
We design, build and support open source powered
search applications
Based in Cambridge U.K., technology agnostic &
independent – but open source exponents & committers
UK Authorized Partner of
Customers include Reed Specialist Recruitment, Mydeco,
NLA, Gorkana, Financial Times, News UK, EMBL-EBI,
Accenture, University of Cambridge, UK Government...
6. Who are Flax?
We design, build and support open source powered
search applications
Based in Cambridge U.K., technology agnostic &
independent – but open source exponents & committers
UK Authorized Partner of
Customers in recruitment, government, e-commerce,
news & media, bioinformatics, consulting, law...
8. Solr
Open source search server based on Lucene
Created in 2004 by Yonik Seeley
Became an Apache project in 2006
Merged with Lucene in 2011
Web API
XML config, XML/JSON data formats
SolrCloud features added in 2012
Uses Apache ZooKeeper for cluster management
9. Elasticsearch
Open source search server based on Lucene
Created in 2010 by Shay Banon
RESTful Web API
Everything is JSON
Distributed and NRT by design
Own Zen Discovery module for cluster management
11. Solr vs. Elasticsearch
Both have large, dynamic communities
Well-funded commercial backing
Widely used in many diverse projects
Elasticsearch easier to set up and configure
Elasticsearch query DSL
But: is Elasticsearch as tolerant of network faults?
(Jepsen tests by Kyle Kingsbury)
How does performance compare?
Note that we don't have a preference...we use both!
13. Why does performance matter?
Won't it be the same, as they both use Lucene?
Can't you just throw hardware at it?
Hardware is cheaper than developers
Well, no.
14. Why does performance matter?
There's a lot more to them than just a web API on top of
Lucene.
Several of our customers have fixed hardware budgets
May have to use limited internal resources
With large indexes or complex queries, need to squeeze
every last bit of performance out of the hardware
16. What performance studies are out there?
Not many found by a Google search.
http://blog.socialcast.com/realtime-search-solr-vs-elasticsearch/
Solr much faster than Elasticsearch, except for NRT
searches with concurrent indexing (where situation was
reversed).
But: This was over 3 years ago, before SolrCloud
17. Our experience
Client with complex filtering requirements for content
licensing, tens of millions of documents, limited hardware budget,
no NRT requirement.
Performed tests 18 months ago on EC2.
Solr was approximately 20 times faster!
More recently, Solr was 4 times faster for a project
requiring geospatial filtering
What about now?
18. This study
Recent versions of Elasticsearch (1.4.0) and Solr (4.10.2)
Concentrated on indexing performance, query times with
and without concurrent indexing, QPS, filters and facets.
Hardware kindly provided by BigStep.com
Full Metal Cloud (real instances, not VMs)
Optimised for high performance
Can be faster than your own dedicated hardware!
21. The results?
Not really very interesting
SolrCloud and Elasticsearch were both very fast
Similar performance with concurrent indexing or not
Solr could handle higher QPS
22. Cluster configuration
Two machines, each with 96GB RAM
Two instances of SolrCloud or Elasticsearch on
each
Each instance has 24GB JVM heap
Four shards
No replicas
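The four-shard, no-replica layout above could be requested roughly as follows. This is a sketch, not the study's actual setup: the collection name `docs` is hypothetical, and the functions only build the payloads (an Elasticsearch index-creation body and a SolrCloud Collections API query string) rather than sending them.

```python
import json
from urllib.parse import urlencode

NUM_SHARDS = 4  # "Four shards" from the slide


def es_create_index_body():
    """Body for PUT /<index> on Elasticsearch: four primaries, no replicas."""
    return json.dumps({
        "settings": {
            "number_of_shards": NUM_SHARDS,
            "number_of_replicas": 0,
        }
    })


def solr_create_collection_query(name="docs"):
    """Query string for the SolrCloud Collections API CREATE action.

    replicationFactor=1 means a single copy of each shard, i.e. no
    additional replicas. The collection name is a hypothetical placeholder.
    """
    return urlencode({
        "action": "CREATE",
        "name": name,
        "numShards": NUM_SHARDS,
        "replicationFactor": 1,
    })
```

Note the differing conventions: Elasticsearch counts replicas in addition to the primary, while Solr's replicationFactor counts total copies.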
24. Data
40M documents created by using a Markov chain on a
seed document (on Stoicism) from gutenberg.org
“Below planets. this Below lay this the lay infinite the void infinite without
void beginning, without middle, beginning, or middle, end, or this end
occupied...”
Small (5-20 word) and larger (200-1000 word) docs
Randomly assigned ints for “source” and “level”, to
simulate licensing filters and for facets.
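A minimal sketch of this kind of generator, assuming a first-order (one-word context) Markov chain. The study's actual script, chain order, field names and the value ranges for "source" and "level" are not given in the slides, so everything named here is illustrative.

```python
import random
from collections import defaultdict


def build_chain(words):
    """Map each word to the list of words that follow it in the seed text."""
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate_text(chain, length, rng):
    """Random-walk the chain to produce `length` words."""
    starts = sorted(chain)
    word = rng.choice(starts)
    out = [word]
    while len(out) < length:
        followers = chain.get(word)
        # Dead end (word only seen at the end of the seed): restart anywhere.
        word = rng.choice(followers) if followers else rng.choice(starts)
        out.append(word)
    return " ".join(out)


def make_doc(doc_id, chain, rng, min_words=5, max_words=20):
    """One synthetic document with random ints for the filter/facet fields."""
    return {
        "id": str(doc_id),
        "text": generate_text(chain, rng.randint(min_words, max_words), rng),
        "source": rng.randint(0, 99),  # value range is an assumption
        "level": rng.randint(0, 9),    # value range is an assumption
    }
```

Calling `make_doc` with `min_words=200, max_words=1000` would give the larger document size described above.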
25. Indexing
Python script and requests library
Single process for small index, four processes for
larger index
Single process for indexing concurrent with search
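As an illustration of batch indexing over HTTP, the sketch below builds a bulk request body for each engine and posts it with requests. It is not the study's script: the index name, type name and batch handling are assumptions, and the `_type` field reflects Elasticsearch 1.x, the version tested here.

```python
import json


def es_bulk_body(docs, index="docs", doc_type="doc"):
    """Newline-delimited body for Elasticsearch's POST /_bulk endpoint.

    Each document is preceded by an action line, and the body must end
    with a newline. Index and type names are hypothetical.
    """
    lines = []
    for doc in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


def solr_update_body(docs):
    """JSON array body for Solr's /update handler (sent as application/json)."""
    return json.dumps(docs)


def post_batch(url, body, content_type="application/json"):
    """Send one batch; requests is imported lazily so the builders work without it."""
    import requests
    resp = requests.post(url, data=body, headers={"Content-Type": content_type})
    resp.raise_for_status()
    return resp
```

Running several copies of such a script in parallel, each on its own slice of the documents, would correspond to the four-process setup used for the larger index.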
26. Searching
Python and requests
Each query time logged for analysis
Single process for query time testing
Multiple processes to test QPS
All tests performed warm
Queries consisted of three randomly chosen terms
combined with OR
Filters randomly generated
Facets / Elasticsearch aggregations
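The query shapes described above might look something like this for each engine. This is a sketch under assumptions: the field names `text`, `source` and `level` and the aggregation name `levels` are hypothetical, and the Elasticsearch body uses the `filtered` query from the 1.x DSL, which later versions replaced.

```python
import random


def random_terms(vocabulary, rng, n=3):
    """Pick n distinct terms from a list of words."""
    return rng.sample(vocabulary, n)


def solr_params(terms, source=None, facet=False):
    """Request parameters for a Solr /select query."""
    params = {"q": " OR ".join(terms), "df": "text", "wt": "json"}
    if source is not None:
        params["fq"] = "source:%d" % source  # filter query, not scored
    if facet:
        params["facet"] = "true"
        params["facet.field"] = "level"
    return params


def es_body(terms, source=None, facet=False):
    """JSON-serializable body for an Elasticsearch 1.x _search request."""
    query = {"match": {"text": {"query": " ".join(terms), "operator": "or"}}}
    if source is not None:
        # `filtered` attaches a non-scoring filter in the 1.x query DSL.
        query = {"filtered": {"query": query,
                              "filter": {"term": {"source": source}}}}
    body = {"query": query}
    if facet:
        body["aggs"] = {"levels": {"terms": {"field": "level"}}}
    return body
```

Timing each request on the client, for example with `time.perf_counter()` around the HTTP call, would give the per-query log mentioned above.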
27. 40M Small documents
Elasticsearch indexed them in 30 minutes
Total index size was 8.8 GB (easily cacheable)
Solr indexed them in 43 minutes
Total index size was 7.6 GB
29. 40M Large documents
Elasticsearch indexed them in 179 minutes
Total index size was 363 GB (not completely
cacheable)
Solr indexed them in 119 minutes
Total index size was 226 GB
30. 40M Large documents (search with facets)
Elasticsearch: 0.21s mean, 99% < 0.75s
Solr: 0.25s mean, 99% < 0.84s
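Figures like "mean" and "99% <" can be derived from the logged query times along these lines. The sketch assumes a nearest-rank percentile; the slides do not say how the study computed its 99% figure.

```python
def summarize(times):
    """Return (mean, p99) for a list of query times in seconds.

    p99 uses the nearest-rank method: the smallest logged value such that
    at least 99% of the samples are less than or equal to it.
    """
    if not times:
        raise ValueError("no query times logged")
    ordered = sorted(times)
    n = len(ordered)
    mean = sum(ordered) / n
    rank = (99 * n + 99) // 100  # integer ceil(0.99 * n), 1-based
    return mean, ordered[rank - 1]
```

Reporting a high percentile alongside the mean shows the slow tail that an average on its own would hide.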
34. Conclusions
SolrCloud seems to be slightly faster. However,
performance was acceptable in all cases.
SolrCloud can apparently support a significantly
higher number of queries per second (tested without
concurrent indexing, however).
35. Limitations and problems
Validity of generated documents?
Validity of random queries?
Searches did not fetch any document data
Did not test highlighting, range facets, geolocation,
etc.
Only tested one type of cluster configuration
(Elasticsearch is very flexible about node role).
Did not tune JVM parameters
Did not perform profiling to identify reasons for
differences
36. What's next
Would have also liked to have compared BigStep with
Amazon EC2.
If there is any interest, I hope to address some of
these problems in the near future.
We'll open source the code (next week?) on
www.github.com/flaxsearch
38. What to take away from this?
Elasticsearch and Solr are both awesome
They currently seem very close in terms of
performance (according to this limited study)
However, all search applications are different
Solr and Elasticsearch may have quite different
performance characteristics in certain cases.
Hard to predict.
If performance is important to you, it will pay to try
both.