Lucandra
Lucene + Cassandra
http://github/tjake/Lucandra
http://twitter.com/tjake
Jake Luciani
What we'll cover today:
  • Search use-cases
  • Problems scaling and maintaining Lucene/Solr
  • Cassandra
  • Lucandra
  • Lucandra in Action
  • Q&A
Types of search apps:  
Lucene/Solr Scaling Problems
  • Writes are expensive on a live system
  • Merge, Reopen, Optimize, Sorting
  • "Too many open files"
  • Solr replication: too many moving parts
  • Scaling writes requires client-side sharding
  • Lots of grid management -> ZooKeeper?
  • Backups? Monitoring? Failures? Ops Team? Oh my!
  • This sounds a lot like MySQL, doesn't it?
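To make the "client-side sharding" point concrete, here is a minimal sketch of what the client ends up owning when writes are spread over several independent indexes. The function names and the modulo-hash scheme are assumptions chosen for illustration; they are not taken from Solr or from the talk.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard number from a stable hash of the document ID."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_fraction(doc_ids, old_shards: int, new_shards: int) -> float:
    """Fraction of documents whose shard changes when the shard count changes."""
    doc_ids = list(doc_ids)
    if not doc_ids:
        return 0.0
    moved = sum(
        1 for d in doc_ids if shard_for(d, old_shards) != shard_for(d, new_shards)
    )
    return moved / len(doc_ids)


if __name__ == "__main__":
    ids = [f"doc-{i}" for i in range(10_000)]  # hypothetical document IDs
    print("doc-42 goes to shard", shard_for("doc-42", 4))
    print("moved when growing 4 -> 5 shards:", moved_fraction(ids, 4, 5))
```

With plain modulo hashing, changing the shard count relocates a large share of the documents, so the client (or an ops team) has to plan reindexing and rebalancing itself. That is the kind of grid management the slide is complaining about.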
Cassandra - Love Child of BigTable and Dynamo
  • Peer to peer (easy to add new nodes)
  • CAP Configurable
  • Multi-level TreeMap (sorta)
  • Pluggable replication/sorting
  • Writes are very fast!
  • Low latency
  • Integrates with Hadoop
  • Major adoption and development
Cassandra's Data Model
{ "bloghost.com" :                                        // Keyspace
    { "Posts" :                                           // ColumnFamily
        { "tjake.bloghost.com" :                          // Key
            { "20100426-Lucandra" : "lucandra talk today!" }      // Columns
        }
    },
    { "Comments" :                                        // SuperColumnFamily
        { "tjake.bloghost.com" :                          // Key
            { "20100426-Lucandra-1" :                     // SuperColumn
                { "From" : "Otis", "Comment" : "Don't Suck!" }    // Columns
            },
            { "20100426-Lucandra-2" :                     // SuperColumn
                { "From" : "Jake", "Comment" : "O.K." }           // Columns
            }
        }
    }
}
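The "multi-level TreeMap" view of this data model can be mimicked with nested dictionaries. The sketch below is only an in-memory analogy, not Cassandra client code; `column_slice` is a hypothetical helper that imitates reading a sorted range of column names from one row.

```python
# Keyspace "bloghost.com": ColumnFamily name -> row key -> columns.
keyspace = {
    "Posts": {  # ColumnFamily: row key -> {column name: value}
        "tjake.bloghost.com": {
            "20100426-Lucandra": "lucandra talk today!",
        },
    },
    "Comments": {  # SuperColumnFamily: row key -> {supercolumn: {column: value}}
        "tjake.bloghost.com": {
            "20100426-Lucandra-1": {"From": "Otis", "Comment": "Don't Suck!"},
            "20100426-Lucandra-2": {"From": "Jake", "Comment": "O.K."},
        },
    },
}


def column_slice(row: dict, start: str, finish: str) -> dict:
    """Return the columns named in [start, finish], ordered by column name."""
    return {name: row[name] for name in sorted(row) if start <= name <= finish}


if __name__ == "__main__":
    comments = keyspace["Comments"]["tjake.bloghost.com"]
    # All comments on posts dated 2010-04-26, in column-name order.
    for name, columns in column_slice(comments, "20100426", "20100427").items():
        print(name, columns["From"], columns["Comment"])
```

Because column names sort, prefixing them with a date (as the slide does) turns "all comments for a day" into a single range read on one row.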
Cassandra - Partitioning
Cassandra - Scale Up / Scale Down
Cassandra - Replication
Solr/Lucene Components
Lucandra Components
How is an index stored?
{ "Lucandra" :
    { "Docs" :
        { "Index1/Doc1" : { "Field1" : "T1 T2 T1", ... } },
        { "Index1/Doc2" : { "Field1" : "T3 T1", ... } }
    },
    { "TermVectors" :
        { "Index1/Field1/T1" : { "Doc1" : [0, 2], "Doc2" : [1] } },
        { "Index1/Field1/T2" : { "Doc1" : [1] } },
        { "Index1/Field1/T3" : { "Doc2" : [0] } }
    }
}
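The layout above can be sketched as a small in-memory indexer. This is an illustration of the key scheme only, not Lucandra's actual code: the function names are hypothetical, whitespace splitting stands in for a real Lucene analyzer, and plain dictionaries stand in for the two column families.

```python
from collections import defaultdict


def index_documents(index_name: str, docs: dict):
    """Build Docs and TermVectors rows keyed the way the slide shows.

    docs maps a document ID to {field name: field text}.
    """
    docs_cf = {}
    term_vectors_cf = defaultdict(dict)
    for doc_id, fields in docs.items():
        # Docs row: "<index>/<doc>" -> stored fields.
        docs_cf[f"{index_name}/{doc_id}"] = dict(fields)
        for field, text in fields.items():
            # Assumption: whitespace tokenization, zero-based token positions.
            for position, term in enumerate(text.split()):
                row_key = f"{index_name}/{field}/{term}"
                term_vectors_cf[row_key].setdefault(doc_id, []).append(position)
    return docs_cf, dict(term_vectors_cf)


def search_term(term_vectors_cf: dict, index_name: str, field: str, term: str):
    """Return the IDs of documents containing the term, via one row lookup."""
    return sorted(term_vectors_cf.get(f"{index_name}/{field}/{term}", {}))


if __name__ == "__main__":
    docs_cf, tv_cf = index_documents(
        "Index1",
        {"Doc1": {"Field1": "T1 T2 T1"}, "Doc2": {"Field1": "T3 T1"}},
    )
    print(tv_cf["Index1/Field1/T1"])
    print(search_term(tv_cf, "Index1", "Field1", "T1"))
```

Keying each term row by index, field, and term means a single-term query is one row read, and adding a document only appends columns to the rows of the terms it contains, with no segment merge or reopen step.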
Lucandra Deployed
Lucandra In Action
  • Sparse.ly and Wikassandra
sparse.ly - Twitter search for friends only
  • ~4k indexes on 2 boxes
Wikassandra - Search Wikipedia
  • 4-node cluster
  • 3k writes per sec (over Thrift from a single node)
  • Solr interface

