Elasticsearch and SolrCloud 
a performance comparison 
Tom Mortimer - Technical Director 
27th November 2014 
charlie@flax.co.uk 
www.flax.co.uk/blog 
+44 (0) 8700 118334 
Twitter: @FlaxSearch
Who are Flax? 
We design, build and support open source powered 
search applications 
Based in Cambridge U.K., technology agnostic & 
independent – but open source exponents & committers 
UK Authorized Partner of 
Customers include Reed Specialist Recruitment, Mydeco, 
NLA, Gorkana, Financial Times, News UK, EMBL-EBI, 
Accenture, University of Cambridge, UK Government...
Customers in recruitment, government, e-commerce, 
news & media, bioinformatics, consulting, law...
Solr 
 Open source search server based on Lucene 
 Created in 2004 by Yonik Seeley 
 Became an Apache project in 2006 
 Merged with Lucene in 2010 (first joint release in 2011) 
 Web API 
 XML config, XML/JSON data formats 
 SolrCloud features added in 2012 
 Uses Apache ZooKeeper for cluster management
Elasticsearch 
 Open source search server based on Lucene 
 Created in 2010 by Shay Banon 
 RESTful Web API 
 Everything is JSON 
 Distributed and NRT by design 
 Own Zen Discovery module for cluster management
Solr vs. Elasticsearch 
 Both have large, dynamic communities 
 Well-funded commercial backing 
 Widely used in many diverse projects 
 Elasticsearch is easier to set up and configure 
 Elasticsearch query DSL 
 But: is Elasticsearch as tolerant of network faults? 
(Jepsen tests by Kyle Kingsbury) 
 How does performance compare? 
 Note that we don't have a preference...we use both!
Why does performance matter? 
 Won't it be the same, as they both use Lucene? 
 Can't you just throw hardware at it? 
 Hardware is cheaper than developers 
 Well, no.
Why does performance matter? 
 There's a lot more to them than just a web API on top of 
Lucene. 
 Several of our customers have fixed hardware budgets 
 May have to use limited internal resources 
 With large indexes or complex queries, need to squeeze 
every last bit of performance out of the hardware
What performance studies are 
out there? 
 Not many found by a Google search. 
http://blog.socialcast.com/realtime-search-solr-vs-elasticsearch/ 
 Solr was much faster than Elasticsearch, except for NRT 
searches with concurrent indexing (where the situation was 
reversed). 
 But: This was over 3 years ago, before SolrCloud
Our experience 
 Client with complex filtering requirements for content 
licensing, tens of millions of documents, limited hardware 
budget, no NRT requirement. 
 Performed tests 18 months ago on EC2. 
 Solr was approximately 20 times faster! 
 More recently, Solr was 4 times faster for a project 
requiring geospatial filtering 
 What about now?
This study 
 Recent versions of Elasticsearch (1.4.0) and Solr (4.10.2) 
 Concentrated on indexing performance, query times with 
and without concurrent indexing, QPS, filters and facets. 
 Hardware kindly provided by BigStep.com 
 Full Metal Cloud (real instances, not VMs) 
 Optimised for high performance 
 Can be faster than your own dedicated hardware!
The results? 
 Not really very interesting 
 SolrCloud and Elasticsearch were both very fast 
 Similar performance with concurrent indexing or not 
 Solr could handle higher QPS
Cluster configuration 
 Two machines, each with 96GB RAM 
 Two instances of SolrCloud or Elasticsearch on 
each 
 Each instance has 24GB JVM heap 
 Four shards 
 No replicas
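
A minimal sketch of how a four-shard, no-replica layout like the one above could be requested from each engine. The host name, index name and collection name are hypothetical, and the slides do not say how the cluster was actually created. The sketch only builds the requests: the Elasticsearch body would be sent with PUT to the index URL, and the Solr URL would be fetched with GET against the Collections API.

```python
import json
from urllib.parse import urlencode


def es_create_index_request(host, index, shards=4, replicas=0):
    """Return (url, json_body) for creating an Elasticsearch index.

    number_of_shards / number_of_replicas are standard index settings;
    the host and index name are placeholders for illustration.
    """
    url = "http://%s:9200/%s" % (host, index)
    body = json.dumps({
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    })
    return url, body


def solr_create_collection_url(host, collection, shards=4, replication_factor=1):
    """Return a Collections API CREATE URL for SolrCloud.

    replicationFactor=1 means one copy of each shard, i.e. no extra replicas.
    """
    params = urlencode([
        ("action", "CREATE"),
        ("name", collection),
        ("numShards", shards),
        ("replicationFactor", replication_factor),
    ])
    return "http://%s:8983/solr/admin/collections?%s" % (host, params)
```

Note the different conventions: Elasticsearch counts replicas in addition to the primary shard, while Solr's replicationFactor counts total copies.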
Cluster configuration in BigStep
Data 
 40M documents created by using a Markov chain on a 
seed document (on Stoicism) from gutenberg.org 
“Below planets. this Below lay this the lay infinite the void infinite without 
void beginning, without middle, beginning, or middle, end, or this end 
occupied...” 
 Small (5-20 word) and larger (200-1000 word) docs 
 Randomly assigned ints for “source” and “level”, to 
simulate licensing filters and for facets.
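
The generator itself is not shown in the slides, so the following is only a sketch of the general idea, assuming a simple first-order (word-to-next-word) Markov chain. The field name `text` and the value ranges for `source` and `level` are assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_chain(seed_text):
    """Map each word in the seed text to the list of words that follow it."""
    words = seed_text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return words, chain


def generate_doc(words, chain, min_words, max_words, rng):
    """Random-walk the chain to produce one synthetic document."""
    length = rng.randint(min_words, max_words)
    word = rng.choice(words)
    out = [word]
    while len(out) < length:
        followers = chain.get(word)
        # Restart from a random word when the walk reaches a dead end.
        word = rng.choice(followers) if followers else rng.choice(words)
        out.append(word)
    return {
        "text": " ".join(out),
        "source": rng.randint(0, 99),  # assumed range for the licensing filter
        "level": rng.randint(0, 9),    # assumed range for the facet field
    }
```

Calling `generate_doc(words, chain, 5, 20, rng)` would give a "small" document and `generate_doc(words, chain, 200, 1000, rng)` a "larger" one, matching the two size bands above.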
Indexing 
 Python script and requests library 
 Single process for small index, four processes for 
larger index 
 Single process for indexing concurrent with search
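
The indexing script is not reproduced in the slides. As a hedged sketch, batches of documents could be serialised for each engine as below and then POSTed with the requests library; the index name, type name and batch shape are hypothetical.

```python
import json


def es_bulk_body(docs, index="bench", doc_type="doc"):
    """Build a newline-delimited body for the Elasticsearch _bulk endpoint.

    docs is an iterable of (doc_id, fields) pairs. Each document becomes
    an action line followed by a source line, and the body must end with
    a newline.
    """
    lines = []
    for doc_id, fields in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(fields))
    return "\n".join(lines) + "\n"


def solr_update_body(docs):
    """Build a JSON array of documents for Solr's /update handler."""
    return json.dumps([dict(fields, id=str(doc_id)) for doc_id, fields in docs])
```

The Elasticsearch body would go to `http://<host>:9200/_bulk` and the Solr body to `http://<host>:8983/solr/<collection>/update` with a `Content-Type: application/json` header. Batch size and commit policy affect indexing times considerably and are not stated in the slides.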
Searching 
 Python and requests 
 Each query time logged for analysis 
 Single process for query time testing 
 Multiple processes to test QPS 
 All tests performed warm 
 Queries consisted of three randomly chosen terms 
combined with OR 
 Filters randomly generated 
 Facets / Elasticsearch aggregations
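
To make the query shapes concrete, here is a sketch of how equivalent requests might be built for the two engines. It assumes a text field named `text`, treats a filter as a set of allowed `source` values, and returns no document data (size/rows of 0); none of these details are confirmed by the slides. The Elasticsearch body uses the 1.x `filtered` query, which later major versions removed.

```python
import random


def random_terms(vocabulary, rng, n=3):
    """Pick n distinct terms at random from a vocabulary list."""
    return rng.sample(vocabulary, n)


def es_query(terms, sources=None, facet=False):
    """Build an Elasticsearch 1.x search body: OR query, optional filter/aggregation."""
    query = {"match": {"text": {"query": " ".join(terms), "operator": "or"}}}
    if sources:
        query = {"filtered": {"query": query,
                              "filter": {"terms": {"source": sources}}}}
    body = {"query": query, "size": 0}
    if facet:
        body["aggs"] = {"levels": {"terms": {"field": "level"}}}
    return body


def solr_params(terms, sources=None, facet=False):
    """Build Solr /select parameters for the same search.

    Terms are assumed to be plain words with no query-syntax characters.
    """
    params = {"q": "text:(%s)" % " OR ".join(terms), "rows": 0, "wt": "json"}
    if sources:
        params["fq"] = "source:(%s)" % " OR ".join(str(s) for s in sources)
    if facet:
        params["facet"] = "true"
        params["facet.field"] = "level"
    return params
```

The Elasticsearch body would be POSTed as JSON to `/<index>/_search`, and the Solr parameters passed as the query string of `/solr/<collection>/select`.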
40M Small documents 
 Elasticsearch indexed them in 30 minutes 
 Total index size was 8.8 GB (easily cacheable) 
 Solr indexed them in 43 minutes 
 Total index size was 7.6 GB
40M Small documents (concurrent indexing) 
Elasticsearch: 0.01s mean, 99% < 0.06s 
Solr: 0.01s mean, 99% < 0.10s
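
Figures of the form "mean, 99% < x" can be derived from the logged per-query times. The sketch below shows one way to do it, using the nearest-rank definition of the 99th percentile; the slides do not say which percentile method was actually used, and the helper names are hypothetical.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def summarize(times):
    """Return (mean, p99) for a non-empty list of query times in seconds.

    p99 is the nearest-rank 99th percentile: the smallest logged value
    such that at least 99% of the values are less than or equal to it.
    """
    if not times:
        raise ValueError("no query times logged")
    ordered = sorted(times)
    n = len(ordered)
    mean = sum(ordered) / n
    rank = -(-99 * n // 100)  # integer ceil(0.99 * n), always >= 1
    return mean, ordered[rank - 1]
```

Reporting a high percentile alongside the mean matters here because a handful of slow queries can dominate user-perceived latency while barely moving the average.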
40M Large documents 
 Elasticsearch indexed them in 179 minutes 
 Total index size was 363 GB (not completely 
cacheable) 
 Solr indexed them in 119 minutes 
 Total index size was 226 GB
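
For comparison across runs, the reported indexing times can be converted into average throughput. This is plain arithmetic on the figures quoted in the slides (40M documents; 30 and 43 minutes for small documents; 179 and 119 minutes for large ones), and it ignores any warm-up or commit overhead.

```python
NUM_DOCS = 40_000_000

# Indexing times in minutes, as reported on the slides.
INDEXING_MINUTES = {
    ("Elasticsearch", "small"): 30,
    ("Solr", "small"): 43,
    ("Elasticsearch", "large"): 179,
    ("Solr", "large"): 119,
}


def docs_per_second(num_docs, minutes):
    """Average indexing throughput over the whole run."""
    return num_docs / (minutes * 60.0)


def throughput_table():
    """Return {(engine, doc_size): docs_per_second} for every reported run."""
    return {key: docs_per_second(NUM_DOCS, minutes)
            for key, minutes in INDEXING_MINUTES.items()}
```

On these numbers Elasticsearch indexed the small documents faster, while Solr indexed the large documents faster. The large-document runs used four indexing processes and the small-document runs one, so the two size bands are not directly comparable.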
40M Large documents (search with facets) 
Elasticsearch: 0.21s mean, 99% < 0.75s 
Solr: 0.25s mean, 99% < 0.84s
40M Large documents (with 10 filters) 
Elasticsearch: 0.21s mean, 99% < 0.72s 
Solr: 0.09s mean, 99% < 0.50s
40M Large documents (concurrent indexing) 
Elasticsearch: 0.16s mean, 99% < 0.86s 
Solr: 0.09s mean, 99% < 0.46s
40M Large documents (QPS)
Conclusions 
 SolrCloud seems to be slightly faster. However, 
performance was acceptable in all cases. 
 SolrCloud can apparently support a significantly 
higher number of queries per second (tested without 
concurrent indexing, however).
Limitations and problems 
 Validity of generated documents? 
 Validity of random queries? 
 Searches did not fetch any document data 
 Did not test highlighting, range facets, geolocation, 
etc. 
 Only tested one type of cluster configuration 
(Elasticsearch is very flexible about node role). 
 Did not tune JVM parameters 
 Did not perform profiling to identify reasons for 
differences
What's next 
 Would also have liked to compare BigStep with 
Amazon EC2. 
 If there is any interest, I hope to address some of 
these problems in the near future. 
 We'll open source the code (next week?) on 
www.github.com/flaxsearch
What to take away from this? 
 Elasticsearch and Solr are both awesome 
 They currently seem very close in terms of 
performance (according to this limited study) 
 However, all search applications are different 
 Solr and Elasticsearch may have quite different 
performance characteristics in certain cases. 
 Hard to predict. 
 If performance is important to you, it will pay to try 
both.
Thanks! 
 To you, for listening 
 To BigStep, for the use of Full Metal Cloud 
 Any questions? - tom@flax.co.uk
