Apache Cassandra
Gary Dusbabek, Rackspace
Silicon Valley Cloud Computing Group • 17 June 2010
Outline
- History
- Scaling
- Replication Model
- Data Model
- Tuning
- Write Path
- Read Path
- Client Access
- Practical Considerations
Why Cassandra?
- 2006: 161 EB (322 million 500 GB drives)
- 2010: 988 EB (1.98 billion 500 GB drives)
- 6-fold growth in 4 years
Source: http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf
Why Cassandra?
SQL
- Specialized data structures (think B-trees)
- Shines with complicated queries
- Focus on fast query and analysis
  - Not necessarily on large datasets
Ever tried scaling an RDBMS?
- For reads? Memcache etc.
- For writes? Oh noes!
Vertical scaling is hard (image credit: janetmck via flickr)
No, really: vertical scaling is hard
Enter Cassandra
- Amazon Dynamo
  - Consistent hashing
  - Partitioning
  - Replication
  - One-hop routing
- Google BigTable
  - Column Families
  - Memtables
  - SSTables
Origins: pre-2008
Moving Along: 2008
Landed: 2009
Distributed and Scalable
- Horizontal!
- All nodes are identical
- No master or SPOF
- Adding nodes is simple
- Automatic cluster maintenance
Replication
- Replication factor
  - How many nodes data is replicated on
- Consistency level
  - Zero, One, Quorum, All
  - Sync or async for writes
  - Reliability of reads
- Read repair
Ring Topology (RF=3)
- Conceptual ring
- One token per node
- Multiple ranges per node
(Diagram: ring with nodes a, d, g, j)
Ring Topology (RF=2)
- Conceptual ring
- One token per node
- Multiple ranges per node
(Diagram: ring with nodes a, d, g, j)
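The two ring slides can be illustrated with a minimal sketch. It assumes the simplest placement rule: a key belongs to the first node whose token is at or after the key's token, and the remaining replicas are the next nodes clockwise. The single-letter tokens mirror the diagram labels; the function name `replicas` is hypothetical and not part of any Cassandra API.

```python
from bisect import bisect_left

def replicas(tokens, key_token, rf):
    """Return the rf nodes (named by token) that hold key_token."""
    ring = sorted(tokens)
    # First node whose token is >= the key's token; wrap past the last node.
    start = bisect_left(ring, key_token) % len(ring)
    # Walk clockwise to pick the remaining replicas.
    return [ring[(start + i) % len(ring)] for i in range(min(rf, len(ring)))]

ring = ["a", "d", "g", "j"]
print(replicas(ring, "e", rf=3))  # ['g', 'j', 'a']
print(replicas(ring, "e", rf=2))  # ['g', 'j']
print(replicas(ring, "k", rf=3))  # ['a', 'd', 'g'] (wraps around the ring)
```

With RF=3 every node ends up holding its own range plus the two ranges before it, which is one way to read "multiple ranges per node".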
New Node (RF=3)
- Token assignment
- Range adjustment
- Bootstrap
- Arrival only affects immediate neighbors
(Diagram: ring with nodes a, d, g, j and new node m)
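A small sketch of why a new node only disturbs its neighbors. It looks at primary ownership only (replicas are ignored for brevity) and reuses the single-letter tokens from the diagram; `owner` is a hypothetical helper, not a Cassandra call.

```python
from bisect import bisect_left
import string

def owner(tokens, key_token):
    """Primary owner: first node whose token is >= the key's token."""
    ring = sorted(tokens)
    return ring[bisect_left(ring, key_token) % len(ring)]

before = ["a", "d", "g", "j"]
after = before + ["m"]  # node m joins the ring

# Keys whose primary owner changes when m arrives.
moved = [k for k in string.ascii_lowercase if owner(before, k) != owner(after, k)]
print(moved)                              # ['k', 'l', 'm']
print({owner(before, k) for k in moved})  # {'a'}: only one existing node gives up data
```

Only the range between j and m changes hands, and it all comes from a single existing node.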
Ring Partition (RF=3)
- Node dies
- Available?
  - Hinting
  - Handoff
- Achtung! Plan for this
(Diagram: ring with nodes a, d, g, j)
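Hinting and handoff can be sketched roughly as follows. This is a toy model under stated assumptions, not how Cassandra implements it: the `Node` class, `write`, and `handoff` are hypothetical names, and the hint is simply parked on the first live replica.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}
        self.hints = []  # (target node name, key, value) held for dead replicas

def write(replica_nodes, key, value):
    """Write to every live replica; leave a hint for each dead one."""
    live = [n for n in replica_nodes if n.up]
    for node in live:
        node.data[key] = value
    if live:
        for dead in (n for n in replica_nodes if not n.up):
            live[0].hints.append((dead.name, key, value))
    return len(live)  # number of replicas that accepted the write

def handoff(holder, recovered):
    """Replay the hints destined for a recovered node, then drop them."""
    keep = []
    for target, key, value in holder.hints:
        if target == recovered.name and recovered.up:
            recovered.data[key] = value
        else:
            keep.append((target, key, value))
    holder.hints = keep

a, d, g = Node("a"), Node("d"), Node("g")
g.up = False                          # node g dies
acks = write([a, d, g], "user1", "alice")
g.up = True                           # node g comes back
handoff(a, g)
print(acks, g.data)                   # 2 {'user1': 'alice'}
```

The write still succeeds on two of three replicas while g is down, and g catches up once the hint is replayed.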
Schema-free, Sparse-table
- Flexible column naming
- You define the sort order
- A row is not required to have a specific column just because another row does
Data Model
- Keyspace
  - ColumnFamily
    - Row (indexed)
      - Key
      - Columns
        - Name (sorted)
        - Value
Easier to show from the bottom up
Data Model: a single column
Data Model: a single row
Data Model
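One way to picture the model from the bottom up is as nested maps. The sketch below is only an analogy: the keyspace contents, the `Users` column family, and the `columns` helper are made-up examples, and sorting by column name stands in for the sort order you would define.

```python
# keyspace -> column family -> row key -> {column name: value}
keyspace = {
    "Users": {                                # column family
        "alice": {                            # row key
            "email": "alice@example.com",     # a single column: name and value
            "name": "Alice",
        },
        "bob": {
            "name": "Bob",
            "phone": "555-0100",              # sparse: alice has no "phone" column
        },
    }
}

def columns(keyspace, column_family, key):
    """Return a row's (name, value) pairs ordered by column name."""
    row = keyspace[column_family][key]
    return sorted(row.items())

print(columns(keyspace, "Users", "bob"))
# [('name', 'Bob'), ('phone', '555-0100')]
```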
Eventually Consistent
- CAP Theorem
  - Consistency
  - Availability
  - Partition Tolerance
- Choose two
- Cassandra chooses A and P
- But…
Eventually Consistent
- I got a fever! And the only prescription is MORE CONSISTENCY!
Tunable Consistency
- Give up a little A and P to get more C
- Ratchet up the consistency level
- R + W > N: strong consistency
- More to come
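The R + W > N rule can be checked with a few lines of arithmetic. Here R is the number of replicas a read waits for, W the number a write waits for, and N the replication factor; the function names are illustrative only.

```python
def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1

def is_strongly_consistent(r, w, n):
    """True when every read set must overlap every write set."""
    return r + w > n

n = 3
print(quorum(n))                                        # 2
print(is_strongly_consistent(quorum(n), quorum(n), n))  # True: Quorum reads + Quorum writes
print(is_strongly_consistent(1, n, n))                  # True: read One, write All
print(is_strongly_consistent(1, 1, n))                  # False: read One, write One
```

When the inequality holds, at least one replica in any read set has seen the latest acknowledged write.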
Inserting: Overview
- Simple: put(key, col, value)
- Complex: put(key, [col:value, …, col:value])
- Batch: multi key.
Inserting: Writes
- Commit log for durability
  - Configurable fsync
  - Sequential writes only
- Memtable – no disk access (no reads or seeks)
- SSTables are final (become read-only)
  - Indexes
  - Bloom filter
  - Raw data
- Bottom line: FAST!!!
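The write path above can be sketched as a toy in-memory store. Everything here is an assumption made for illustration: the `Store` and `BloomFilter` classes, the flush threshold, and the use of MD5 for the filter are not taken from Cassandra's code, and the "commit log" is just a Python list.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        return all((self.bits >> p) & 1 for p in self._positions(key))

class Store:
    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only; stands in for the on-disk log
        self.memtable = {}     # in memory; writes need no disk reads or seeks
        self.sstables = []     # immutable (bloom filter, sorted rows) pairs
        self.memtable_limit = memtable_limit

    def put(self, key, col, value):
        self.commit_log.append((key, col, value))        # 1. log for durability
        self.memtable.setdefault(key, {})[col] = value   # 2. update memory
        if len(self.memtable) >= self.memtable_limit:    # 3. flush when full
            self.flush()

    def flush(self):
        bloom = BloomFilter()
        for key in self.memtable:
            bloom.add(key)
        rows = tuple(sorted((k, tuple(sorted(c.items())))
                            for k, c in self.memtable.items()))
        self.sstables.append((bloom, rows))  # never modified after this point
        self.memtable = {}

store = Store()
store.put("a", "x", 1)
store.put("b", "y", 2)     # second row reaches the limit and triggers a flush
bloom, rows = store.sstables[0]
print(rows)                # (('a', (('x', 1),)), ('b', (('y', 2),)))
```

Every step is an append or an in-memory update, which is the reason the slide's bottom line is "FAST".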
Querying: Overview
- You need a key or keys:
  - Single: key=‘a’
  - Range: key=‘a’ through ’f’
- And columns to retrieve:
  - Slice: cols={bar through kite}
  - By name: key=‘b’ cols={bar, cat, llama}
- Nothing like SQL “WHERE col=‘faz’”
  - But secondary indices are being worked on (see CASSANDRA-749)
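The two ways of picking columns can be sketched against a single row held as a dict. The helper names `get_slice` and `get_by_name` are hypothetical, the sample row is invented, and treating both slice bounds as inclusive is an assumption.

```python
from bisect import bisect_left, bisect_right

row = {"ant": 1, "bar": 2, "cat": 3, "kite": 4, "llama": 5}

def get_slice(row, start, finish):
    """Columns whose names fall in [start, finish], in sorted order."""
    names = sorted(row)
    lo, hi = bisect_left(names, start), bisect_right(names, finish)
    return [(n, row[n]) for n in names[lo:hi]]

def get_by_name(row, wanted):
    """Only the named columns; names the row lacks are skipped."""
    return [(n, row[n]) for n in wanted if n in row]

print(get_slice(row, "bar", "kite"))
# [('bar', 2), ('cat', 3), ('kite', 4)]
print(get_by_name(row, ["bar", "cat", "llama"]))
# [('bar', 2), ('cat', 3), ('llama', 5)]
```

Both lookups rely on column names being sorted; neither can filter on a column's value, which is the point of the "nothing like SQL WHERE" bullet.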
Querying: Reads
- Practically lock free
- SSTable proliferation
- New in 0.6:
  - Row cache (avoid SSTable lookup, not write-through)
  - Key cache (avoid index scan)
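A rough sketch of why SSTable proliferation hurts reads and how a row cache helps. It assumes each SSTable is a plain dict and that list order stands in for write time (newer tables overwrite older ones); the `read` function and its return shape are hypothetical.

```python
def read(key, row_cache, memtable, sstables):
    """Return (row, number of sstables consulted)."""
    if key in row_cache:
        return row_cache[key], 0          # cache hit: no sstable lookup at all
    merged, checked = {}, 0
    for table in sstables:                # oldest first, so newer values win
        checked += 1
        merged.update(table.get(key, {}))
    merged.update(memtable.get(key, {}))  # the memtable holds the newest data
    if merged:
        row_cache[key] = merged
    return merged, checked

sstables = [{"k": {"a": 1}}, {"k": {"a": 2, "b": 3}}]
memtable = {"k": {"c": 4}}
cache = {}
first = read("k", cache, memtable, sstables)
second = read("k", cache, memtable, sstables)
print(first)   # ({'a': 2, 'b': 3, 'c': 4}, 2)
print(second)  # ({'a': 2, 'b': 3, 'c': 4}, 0)
```

The uncached read touches every SSTable, so its cost grows with the number of tables. In this toy a later write would leave the cached row stale, which is why a cache that is not write-through has to be invalidated on writes.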
Client API (Low Level)
- Fat Client
  - Live non-storage node
  - Reduced RPC overhead
- Thrift (12 language bindings!)
  - http://incubator.apache.org/thrift/
  - No streaming
- Avro
  - Work in progress
Client API (High Level)
- http://wiki.apache.org/cassandra/ClientOptions
- Feature rich
  - Connection pooling
  - Load balancing/failover
  - Simplified APIs
  - Version opaque
Practical Considerations
- Partitioner: Random or Order Preserving
  - Range queries
- Provisioning
  - Virtual or bare metal
  - Cluster size
- Data model
  - Think in terms of access
  - Giving up transactions, ad-hoc queries, arbitrary indexes and joins (you may already do this with an RDBMS!)
Practical Considerations
- Wide rows
- Data life-span
- Cluster planning
- Bootstrapping
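The partitioner trade-off can be sketched by comparing two token functions. This is an illustration under assumptions: the function names are hypothetical, and MD5 is used here simply as a convenient hash for the "random" case.

```python
import hashlib

def order_preserving_token(key):
    """Token is the key itself, so ring order matches key order."""
    return key

def random_token(key):
    """Token is a hash of the key, so ring order is unrelated to key order."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

keys = ["apple", "banana", "cherry", "date"]
by_op = sorted(keys, key=order_preserving_token)
by_random = sorted(keys, key=random_token)

print(by_op)      # keys in alphabetical order: a key range maps to a contiguous ring segment
print(by_random)  # hash order: adjacent keys are generally scattered around the ring
```

An order-preserving layout makes range queries over keys practical, while hashing tends to spread load more evenly at the cost of those range scans.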


Introduction to Cassandra (June 2010)

Editor's Notes

  1. The volume of data keeps growing.
  2. Historical industry leaders
  3. 32-core machines are expensive. Costs go way up when you try to scale these databases. Also: instability.
  4. Terabytes of data. ~1,000,000 ops/second. Schema changes are difficult (impossible). Manual sharding takes a lot of effort. Automated sharding + replication is difficult.
  5. 100 M users, 25 TB data
  6. Horizontal – commodity hardware, not specialized boxes
  7. Cluster is a logical storage ring. Node placement divides the ring into ranges that represent start/stop points for keys. Automatic or manual token assignment (use another slide for that). Nodes placed closer together have less responsibility and data.
  8. Token
  9. Bootstrapping
  10. Hinting not designed for long failures.
  11. RDBMS focus on consistency. Limits scale.
  12. No multi-key transactions
  13. SSTable proliferation degrades performance.
  14. Distributed, scalable, schema-free, sparse table, eventually consistent, tunable (throughput and fault-tolerance).