SlideShare a Scribd company logo
Apache Cassandra
Anti-Patterns

Matthew F. Dennis // @mdennis
C* on a SAN
●   fact: C* was designed, from the start, for
    commodity hardware
●   more than just not requiring a SAN, C*
    actually performs better without one
●   SPOF
●   unnecessary (large) cost
●   “(un)coordinated” IO from nodes
●   SANs were designed to solve problems C*
    doesn’t have
Commit Log + Data Directory
                (on the same volume)


●   conflicting IO patterns
    ●   commit log is 100% sequential append only
    ●   data directory is (usually) random on reads
●   commit log is essentially serialized
●   massive difference in write
    throughput under load
●   NB: does not apply to SSDs or EC2
Oversize JVM Heaps
●   4 – 8 GB is good
    (assuming sufficient ram on your boxen)
●   10 – 12 GB is not bad
    (and often “correct”)
●   16GB == max
●   > 16GB => badness
●   heap >= boxen RAM => badness
oversized JVM heaps
 (for the visually oriented)
not using -pr on scheduled repairs


●   -pr is kind of new
●   only applies to scheduled repairs
●   reduces work to 1/RF (e.g. 1/3)
low file handle limit
●   C* requires lots of file handles
    (sorry, deal with it)
●   Sockets and SSTables mostly
●   1024 (common default) is not sufficient
●   fails in horrible miserably unpredictable ways
    (though clear from the logs after the fact)
●   32K - 128K is common
●   unlimited is also common, but personally I
    prefer some sort of limit ...
Load Balacners
                   (in front of C*)


●   clients will load balance
    (C* has no master so this can work reliably)
●   SPOF
●   performance bottle neck
●   unneeded complexity
●   unneeded cost
restricting clients to a single node


●   why?
●   no really, I don’t understand how
    this was thought to be a good idea
●   thedailywtf.com territory
Unbalanced Ring
●   used to be the number one
    problem encountered
●   OPSC automates the resolution of
    this to two clicks (do it + confirm)
    even across multiple data centers
●   related: don’t let C* auto pick your
    tokens, always specify initial_token
Row Cache + Slice Queries

●   the row cache is a row cache, not a query cache or
    slice cache or magic cache or WTF-ever-you-thought-
    it-was cache
●   for the obvious impaired: that’s why we called it a
    row cache – because it caches rows
●   laughable performance difference in some extreme
    cases (e.g. 100X increase in throughput, 10X drop in
    latency, maxed cpu to under 10% average)
Row Cache + Large Rows


●   2GB row? yeah, lets cache that !!!

●   related: wtf are you doing trying to
    read a 2GB row all at once anyway?
OPP/BOP
●   if you think you need BOP, check
    again
●   no seriously, you’re doing it wrong
●   if you use BOP anyway:
    ●   IRC will mock you
    ●   your OPS team will plan your disappearance
    ●   I will setup a auto reply for your entire domain
        that responds solely with “stop using BOP”
Unbounded Batches
●   batches are sent as a single message
●   they must fit entirely in memory
    (both server side and client side)
●   best size is very much an empirical
    exercise depending on your HW, load,
    data model, moon phase, etc
    (start with 10 – 100 and tune)
●   NB: streaming transport will address
    this in future releases
Bad Rotational Math

●   rotational disks require seek time
●   5ms is a fast seek time for a rotational disk
●   you cannot get thousands of random seeks
    per second from rotational disks
●   caches/memory alleviate this, SSDs solve it
●   maths are teh hard? buy SSDs
●   everything fits in memory? I don’t care what
    disks you buy
32 Bit JVMs

●   C* deals (usually) with BigData
●   32 bits cannot address BigData
●   mmap, file offsets, heaps, caches
●   always wrong? no, I guess not ...
EBS volumes
●   nice in theory, but ...
●   not predictable
●   freezes common
●   outages common
●   stripe ephemeral drives instead
●   provisioned IOPS EBS?
    future hazy, ask again later
Non-Sun (err, Oracle) JVM

●   at least u22, but in general the
    latest release
    (unless you have specific reasons otherwise)
●   this is changing
●   some people (successfully) use
    OpenJDK anyway
Super Columns
●   10-15 percent overhead on reads and writes
●   entire super column is always held in memory
    at all stages
●   most C* devs hate working on them
●   C* and DataStax is committed to maintaining
    the API going forward, but they should be
    avoided for new projects
●   composite columns are an alternative
Not Running OPSC

●   extremely useful postmortem
●   trivial (usually) to setup
●   DataStax offers a free version
    (you have no excuse now)
Q?
Matthew F. Dennis // @mdennis
Thank You!
 Matthew F. Dennis // @mdennis
 http://slideshare.net/mattdennis

More Related Content

What's hot (20)

Logical replication with pglogical
Logical replication with pglogicalLogical replication with pglogical
Logical replication with pglogical
Umair Shahid
 
Shootout at the AWS Corral
Shootout at the AWS CorralShootout at the AWS Corral
Shootout at the AWS Corral
PostgreSQL Experts, Inc.
 
Cassandra NYC 2011 Data Modeling
Cassandra NYC 2011 Data ModelingCassandra NYC 2011 Data Modeling
Cassandra NYC 2011 Data Modeling
Matthew Dennis
 
PostgreSQL Replication in 10 Minutes - SCALE
PostgreSQL Replication in 10  Minutes - SCALEPostgreSQL Replication in 10  Minutes - SCALE
PostgreSQL Replication in 10 Minutes - SCALE
PostgreSQL Experts, Inc.
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
ScyllaDB
 
PostgreSQL worst practices, version PGConf.US 2017 by Ilya Kosmodemiansky
PostgreSQL worst practices, version PGConf.US 2017 by Ilya KosmodemianskyPostgreSQL worst practices, version PGConf.US 2017 by Ilya Kosmodemiansky
PostgreSQL worst practices, version PGConf.US 2017 by Ilya Kosmodemiansky
PostgreSQL-Consulting
 
Elephant Roads: a tour of Postgres forks
Elephant Roads: a tour of Postgres forksElephant Roads: a tour of Postgres forks
Elephant Roads: a tour of Postgres forks
Command Prompt., Inc
 
92 grand prix_2013
92 grand prix_201392 grand prix_2013
92 grand prix_2013
PostgreSQL Experts, Inc.
 
P99CONF — What We Need to Unlearn About Persistent Storage
P99CONF — What We Need to Unlearn About Persistent StorageP99CONF — What We Need to Unlearn About Persistent Storage
P99CONF — What We Need to Unlearn About Persistent Storage
ScyllaDB
 
Shootout at the PAAS Corral
Shootout at the PAAS CorralShootout at the PAAS Corral
Shootout at the PAAS Corral
PostgreSQL Experts, Inc.
 
Ndb cluster 80_ycsb_mem
Ndb cluster 80_ycsb_memNdb cluster 80_ycsb_mem
Ndb cluster 80_ycsb_mem
mikaelronstrom
 
Responding rapidly when you have 100+ GB data sets in Java
Responding rapidly when you have 100+ GB data sets in JavaResponding rapidly when you have 100+ GB data sets in Java
Responding rapidly when you have 100+ GB data sets in Java
Peter Lawrey
 
HighLoad Solutions On MySQL / Xiaobin Lin (Alibaba)
HighLoad Solutions On MySQL / Xiaobin Lin (Alibaba)HighLoad Solutions On MySQL / Xiaobin Lin (Alibaba)
HighLoad Solutions On MySQL / Xiaobin Lin (Alibaba)
Ontico
 
OOPs, OOMs, oh my! Containerizing JVM apps
OOPs, OOMs, oh my! Containerizing JVM appsOOPs, OOMs, oh my! Containerizing JVM apps
OOPs, OOMs, oh my! Containerizing JVM apps
Sematext Group, Inc.
 
Avoiding Data Hotspots at Scale
Avoiding Data Hotspots at ScaleAvoiding Data Hotspots at Scale
Avoiding Data Hotspots at Scale
ScyllaDB
 
Unikraft: Fast, Specialized Unikernels the Easy Way
Unikraft: Fast, Specialized Unikernels the Easy WayUnikraft: Fast, Specialized Unikernels the Easy Way
Unikraft: Fast, Specialized Unikernels the Easy Way
ScyllaDB
 
Low latency for high throughput
Low latency for high throughputLow latency for high throughput
Low latency for high throughput
Peter Lawrey
 
Fail over fail_back
Fail over fail_backFail over fail_back
Fail over fail_back
PostgreSQL Experts, Inc.
 
Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)
Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)
Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)
Masao Fujii
 
7 Ways To Crash Postgres
7 Ways To Crash Postgres7 Ways To Crash Postgres
7 Ways To Crash Postgres
PostgreSQL Experts, Inc.
 
Logical replication with pglogical
Logical replication with pglogicalLogical replication with pglogical
Logical replication with pglogical
Umair Shahid
 
Cassandra NYC 2011 Data Modeling
Cassandra NYC 2011 Data ModelingCassandra NYC 2011 Data Modeling
Cassandra NYC 2011 Data Modeling
Matthew Dennis
 
PostgreSQL Replication in 10 Minutes - SCALE
PostgreSQL Replication in 10  Minutes - SCALEPostgreSQL Replication in 10  Minutes - SCALE
PostgreSQL Replication in 10 Minutes - SCALE
PostgreSQL Experts, Inc.
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
ScyllaDB
 
PostgreSQL worst practices, version PGConf.US 2017 by Ilya Kosmodemiansky
PostgreSQL worst practices, version PGConf.US 2017 by Ilya KosmodemianskyPostgreSQL worst practices, version PGConf.US 2017 by Ilya Kosmodemiansky
PostgreSQL worst practices, version PGConf.US 2017 by Ilya Kosmodemiansky
PostgreSQL-Consulting
 
Elephant Roads: a tour of Postgres forks
Elephant Roads: a tour of Postgres forksElephant Roads: a tour of Postgres forks
Elephant Roads: a tour of Postgres forks
Command Prompt., Inc
 
P99CONF — What We Need to Unlearn About Persistent Storage
P99CONF — What We Need to Unlearn About Persistent StorageP99CONF — What We Need to Unlearn About Persistent Storage
P99CONF — What We Need to Unlearn About Persistent Storage
ScyllaDB
 
Ndb cluster 80_ycsb_mem
Ndb cluster 80_ycsb_memNdb cluster 80_ycsb_mem
Ndb cluster 80_ycsb_mem
mikaelronstrom
 
Responding rapidly when you have 100+ GB data sets in Java
Responding rapidly when you have 100+ GB data sets in JavaResponding rapidly when you have 100+ GB data sets in Java
Responding rapidly when you have 100+ GB data sets in Java
Peter Lawrey
 
HighLoad Solutions On MySQL / Xiaobin Lin (Alibaba)
HighLoad Solutions On MySQL / Xiaobin Lin (Alibaba)HighLoad Solutions On MySQL / Xiaobin Lin (Alibaba)
HighLoad Solutions On MySQL / Xiaobin Lin (Alibaba)
Ontico
 
OOPs, OOMs, oh my! Containerizing JVM apps
OOPs, OOMs, oh my! Containerizing JVM appsOOPs, OOMs, oh my! Containerizing JVM apps
OOPs, OOMs, oh my! Containerizing JVM apps
Sematext Group, Inc.
 
Avoiding Data Hotspots at Scale
Avoiding Data Hotspots at ScaleAvoiding Data Hotspots at Scale
Avoiding Data Hotspots at Scale
ScyllaDB
 
Unikraft: Fast, Specialized Unikernels the Easy Way
Unikraft: Fast, Specialized Unikernels the Easy WayUnikraft: Fast, Specialized Unikernels the Easy Way
Unikraft: Fast, Specialized Unikernels the Easy Way
ScyllaDB
 
Low latency for high throughput
Low latency for high throughputLow latency for high throughput
Low latency for high throughput
Peter Lawrey
 
Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)
Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)
Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)
Masao Fujii
 

Viewers also liked (20)

Cassandra nice use cases and worst anti patterns
Cassandra nice use cases and worst anti patternsCassandra nice use cases and worst anti patterns
Cassandra nice use cases and worst anti patterns
Duyhai Doan
 
Cassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patternsCassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patterns
Dave Gardner
 
DZone Cassandra Data Modeling Webinar
DZone Cassandra Data Modeling WebinarDZone Cassandra Data Modeling Webinar
DZone Cassandra Data Modeling Webinar
Matthew Dennis
 
durability, durability, durability
durability, durability, durabilitydurability, durability, durability
durability, durability, durability
Matthew Dennis
 
Cassandra Data Modeling
Cassandra Data ModelingCassandra Data Modeling
Cassandra Data Modeling
Matthew Dennis
 
Cassandra, Modeling and Availability at AMUG
Cassandra, Modeling and Availability at AMUGCassandra, Modeling and Availability at AMUG
Cassandra, Modeling and Availability at AMUG
Matthew Dennis
 
The Future Of Big Data
The Future Of Big DataThe Future Of Big Data
The Future Of Big Data
Matthew Dennis
 
Learning Cassandra
Learning CassandraLearning Cassandra
Learning Cassandra
Dave Gardner
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandra
Patrick McFadin
 
Cassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ NetflixCassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ Netflix
nkorla1share
 
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
DataStax
 
Acunu Analytics: Simpler Real-Time Cassandra Apps
Acunu Analytics: Simpler Real-Time Cassandra AppsAcunu Analytics: Simpler Real-Time Cassandra Apps
Acunu Analytics: Simpler Real-Time Cassandra Apps
Acunu
 
Virtual nodes: Operational Aspirin
Virtual nodes: Operational AspirinVirtual nodes: Operational Aspirin
Virtual nodes: Operational Aspirin
Acunu
 
Tuberculosis abdominal
Tuberculosis abdominalTuberculosis abdominal
Tuberculosis abdominal
Humberto Blas
 
Webinar Cassandra Anti-Patterns
Webinar Cassandra Anti-PatternsWebinar Cassandra Anti-Patterns
Webinar Cassandra Anti-Patterns
Christopher Batey
 
Fears, misconceptions, and accepted anti patterns of a first time cassandra a...
Fears, misconceptions, and accepted anti patterns of a first time cassandra a...Fears, misconceptions, and accepted anti patterns of a first time cassandra a...
Fears, misconceptions, and accepted anti patterns of a first time cassandra a...
Kinetic Data
 
Денис Нелюбин, "Тамтэк"
Денис Нелюбин, "Тамтэк"Денис Нелюбин, "Тамтэк"
Денис Нелюбин, "Тамтэк"
Ontico
 
Cassandra Day Chicago 2015: Top 5 Tips/Tricks with Apache Cassandra and DSE
Cassandra Day Chicago 2015: Top 5 Tips/Tricks with Apache Cassandra and DSECassandra Day Chicago 2015: Top 5 Tips/Tricks with Apache Cassandra and DSE
Cassandra Day Chicago 2015: Top 5 Tips/Tricks with Apache Cassandra and DSE
DataStax Academy
 
Cap Theorem
Cap TheoremCap Theorem
Cap Theorem
Michał Łomnicki
 
No sql system_survey
No sql system_surveyNo sql system_survey
No sql system_survey
Henrique Dias
 
Cassandra nice use cases and worst anti patterns
Cassandra nice use cases and worst anti patternsCassandra nice use cases and worst anti patterns
Cassandra nice use cases and worst anti patterns
Duyhai Doan
 
Cassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patternsCassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patterns
Dave Gardner
 
DZone Cassandra Data Modeling Webinar
DZone Cassandra Data Modeling WebinarDZone Cassandra Data Modeling Webinar
DZone Cassandra Data Modeling Webinar
Matthew Dennis
 
durability, durability, durability
durability, durability, durabilitydurability, durability, durability
durability, durability, durability
Matthew Dennis
 
Cassandra Data Modeling
Cassandra Data ModelingCassandra Data Modeling
Cassandra Data Modeling
Matthew Dennis
 
Cassandra, Modeling and Availability at AMUG
Cassandra, Modeling and Availability at AMUGCassandra, Modeling and Availability at AMUG
Cassandra, Modeling and Availability at AMUG
Matthew Dennis
 
The Future Of Big Data
The Future Of Big DataThe Future Of Big Data
The Future Of Big Data
Matthew Dennis
 
Learning Cassandra
Learning CassandraLearning Cassandra
Learning Cassandra
Dave Gardner
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandra
Patrick McFadin
 
Cassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ NetflixCassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ Netflix
nkorla1share
 
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
DataStax
 
Acunu Analytics: Simpler Real-Time Cassandra Apps
Acunu Analytics: Simpler Real-Time Cassandra AppsAcunu Analytics: Simpler Real-Time Cassandra Apps
Acunu Analytics: Simpler Real-Time Cassandra Apps
Acunu
 
Virtual nodes: Operational Aspirin
Virtual nodes: Operational AspirinVirtual nodes: Operational Aspirin
Virtual nodes: Operational Aspirin
Acunu
 
Tuberculosis abdominal
Tuberculosis abdominalTuberculosis abdominal
Tuberculosis abdominal
Humberto Blas
 
Webinar Cassandra Anti-Patterns
Webinar Cassandra Anti-PatternsWebinar Cassandra Anti-Patterns
Webinar Cassandra Anti-Patterns
Christopher Batey
 
Fears, misconceptions, and accepted anti patterns of a first time cassandra a...
Fears, misconceptions, and accepted anti patterns of a first time cassandra a...Fears, misconceptions, and accepted anti patterns of a first time cassandra a...
Fears, misconceptions, and accepted anti patterns of a first time cassandra a...
Kinetic Data
 
Денис Нелюбин, "Тамтэк"
Денис Нелюбин, "Тамтэк"Денис Нелюбин, "Тамтэк"
Денис Нелюбин, "Тамтэк"
Ontico
 
Cassandra Day Chicago 2015: Top 5 Tips/Tricks with Apache Cassandra and DSE
Cassandra Day Chicago 2015: Top 5 Tips/Tricks with Apache Cassandra and DSECassandra Day Chicago 2015: Top 5 Tips/Tricks with Apache Cassandra and DSE
Cassandra Day Chicago 2015: Top 5 Tips/Tricks with Apache Cassandra and DSE
DataStax Academy
 
No sql system_survey
No sql system_surveyNo sql system_survey
No sql system_survey
Henrique Dias
 
Ad

Similar to strangeloop 2012 apache cassandra anti patterns (20)

JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
Speedment, Inc.
 
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
Malin Weiss
 
M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentation
Edward Capriolo
 
5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance
Command Prompt., Inc
 
Five steps perform_2009 (1)
Five steps perform_2009 (1)Five steps perform_2009 (1)
Five steps perform_2009 (1)
PostgreSQL Experts, Inc.
 
OpenDS_Jazoon2010
OpenDS_Jazoon2010OpenDS_Jazoon2010
OpenDS_Jazoon2010
Ludovic Poitou
 
NYJavaSIG - Big Data Microservices w/ Speedment
NYJavaSIG - Big Data Microservices w/ SpeedmentNYJavaSIG - Big Data Microservices w/ Speedment
NYJavaSIG - Big Data Microservices w/ Speedment
Speedment, Inc.
 
Which Freaking Database Should I Use?
Which Freaking Database Should I Use?Which Freaking Database Should I Use?
Which Freaking Database Should I Use?
Great Wide Open
 
CassandraSummit2015_Cassandra upgrades at scale @ NETFLIX
CassandraSummit2015_Cassandra upgrades at scale @ NETFLIXCassandraSummit2015_Cassandra upgrades at scale @ NETFLIX
CassandraSummit2015_Cassandra upgrades at scale @ NETFLIX
Vinay Kumar Chella
 
Big Data and its emergence
Big Data and its emergenceBig Data and its emergence
Big Data and its emergence
koolkalpz
 
Tweaking performance on high-load projects
Tweaking performance on high-load projectsTweaking performance on high-load projects
Tweaking performance on high-load projects
Dmitriy Dumanskiy
 
Key Challenges in Cloud Computing and How Yahoo! is Approaching Them
Key Challenges in Cloud Computing and How Yahoo! is Approaching ThemKey Challenges in Cloud Computing and How Yahoo! is Approaching Them
Key Challenges in Cloud Computing and How Yahoo! is Approaching Them
Yahoo Developer Network
 
Cassandra implementation for collecting data and presenting data
Cassandra implementation for collecting data and presenting dataCassandra implementation for collecting data and presenting data
Cassandra implementation for collecting data and presenting data
Chen Robert
 
Challenges in Maintaining a High Performance Search Engine Written in Java
Challenges in Maintaining a High Performance Search Engine Written in JavaChallenges in Maintaining a High Performance Search Engine Written in Java
Challenges in Maintaining a High Performance Search Engine Written in Java
lucenerevolution
 
Voldemort Nosql
Voldemort NosqlVoldemort Nosql
Voldemort Nosql
elliando dias
 
SQL or NoSQL, that is the question!
SQL or NoSQL, that is the question!SQL or NoSQL, that is the question!
SQL or NoSQL, that is the question!
Andraz Tori
 
Nyc summit intro_to_cassandra
Nyc summit intro_to_cassandraNyc summit intro_to_cassandra
Nyc summit intro_to_cassandra
zznate
 
Cassandra from the trenches: migrating Netflix
Cassandra from the trenches: migrating NetflixCassandra from the trenches: migrating Netflix
Cassandra from the trenches: migrating Netflix
Jason Brown
 
Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)
Rick Branson
 
Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databases
jbellis
 
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
Speedment, Inc.
 
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
Malin Weiss
 
M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentation
Edward Capriolo
 
NYJavaSIG - Big Data Microservices w/ Speedment
NYJavaSIG - Big Data Microservices w/ SpeedmentNYJavaSIG - Big Data Microservices w/ Speedment
NYJavaSIG - Big Data Microservices w/ Speedment
Speedment, Inc.
 
Which Freaking Database Should I Use?
Which Freaking Database Should I Use?Which Freaking Database Should I Use?
Which Freaking Database Should I Use?
Great Wide Open
 
CassandraSummit2015_Cassandra upgrades at scale @ NETFLIX
CassandraSummit2015_Cassandra upgrades at scale @ NETFLIXCassandraSummit2015_Cassandra upgrades at scale @ NETFLIX
CassandraSummit2015_Cassandra upgrades at scale @ NETFLIX
Vinay Kumar Chella
 
Big Data and its emergence
Big Data and its emergenceBig Data and its emergence
Big Data and its emergence
koolkalpz
 
Tweaking performance on high-load projects
Tweaking performance on high-load projectsTweaking performance on high-load projects
Tweaking performance on high-load projects
Dmitriy Dumanskiy
 
Key Challenges in Cloud Computing and How Yahoo! is Approaching Them
Key Challenges in Cloud Computing and How Yahoo! is Approaching ThemKey Challenges in Cloud Computing and How Yahoo! is Approaching Them
Key Challenges in Cloud Computing and How Yahoo! is Approaching Them
Yahoo Developer Network
 
Cassandra implementation for collecting data and presenting data
Cassandra implementation for collecting data and presenting dataCassandra implementation for collecting data and presenting data
Cassandra implementation for collecting data and presenting data
Chen Robert
 
Challenges in Maintaining a High Performance Search Engine Written in Java
Challenges in Maintaining a High Performance Search Engine Written in JavaChallenges in Maintaining a High Performance Search Engine Written in Java
Challenges in Maintaining a High Performance Search Engine Written in Java
lucenerevolution
 
SQL or NoSQL, that is the question!
SQL or NoSQL, that is the question!SQL or NoSQL, that is the question!
SQL or NoSQL, that is the question!
Andraz Tori
 
Nyc summit intro_to_cassandra
Nyc summit intro_to_cassandraNyc summit intro_to_cassandra
Nyc summit intro_to_cassandra
zznate
 
Cassandra from the trenches: migrating Netflix
Cassandra from the trenches: migrating NetflixCassandra from the trenches: migrating Netflix
Cassandra from the trenches: migrating Netflix
Jason Brown
 
Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)
Rick Branson
 
Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databases
jbellis
 
Ad

Recently uploaded (20)

Your startup on AWS - How to architect and maintain a Lean and Mean account
Your startup on AWS - How to architect and maintain a Lean and Mean accountYour startup on AWS - How to architect and maintain a Lean and Mean account
Your startup on AWS - How to architect and maintain a Lean and Mean account
angelo60207
 
Domino IQ – Was Sie erwartet, erste Schritte und Anwendungsfälle
Domino IQ – Was Sie erwartet, erste Schritte und AnwendungsfälleDomino IQ – Was Sie erwartet, erste Schritte und Anwendungsfälle
Domino IQ – Was Sie erwartet, erste Schritte und Anwendungsfälle
panagenda
 
vertical-cnc-processing-centers-drillteq-v-200-en.pdf
vertical-cnc-processing-centers-drillteq-v-200-en.pdfvertical-cnc-processing-centers-drillteq-v-200-en.pdf
vertical-cnc-processing-centers-drillteq-v-200-en.pdf
AmirStern2
 
National Fuels Treatments Initiative: Building a Seamless Map of Hazardous Fu...
National Fuels Treatments Initiative: Building a Seamless Map of Hazardous Fu...National Fuels Treatments Initiative: Building a Seamless Map of Hazardous Fu...
National Fuels Treatments Initiative: Building a Seamless Map of Hazardous Fu...
Safe Software
 
Kubernetes Security Act Now Before It’s Too Late
Kubernetes Security Act Now Before It’s Too LateKubernetes Security Act Now Before It’s Too Late
Kubernetes Security Act Now Before It’s Too Late
Michael Furman
 
Providing an OGC API Processes REST Interface for FME Flow
Providing an OGC API Processes REST Interface for FME FlowProviding an OGC API Processes REST Interface for FME Flow
Providing an OGC API Processes REST Interface for FME Flow
Safe Software
 
Oracle Cloud Infrastructure AI Foundations
Oracle Cloud Infrastructure AI FoundationsOracle Cloud Infrastructure AI Foundations
Oracle Cloud Infrastructure AI Foundations
VICTOR MAESTRE RAMIREZ
 
TrustArc Webinar - 2025 Global Privacy Survey
TrustArc Webinar - 2025 Global Privacy SurveyTrustArc Webinar - 2025 Global Privacy Survey
TrustArc Webinar - 2025 Global Privacy Survey
TrustArc
 
“State-space Models vs. Transformers for Ultra-low-power Edge AI,” a Presenta...
“State-space Models vs. Transformers for Ultra-low-power Edge AI,” a Presenta...“State-space Models vs. Transformers for Ultra-low-power Edge AI,” a Presenta...
“State-space Models vs. Transformers for Ultra-low-power Edge AI,” a Presenta...
Edge AI and Vision Alliance
 
“Solving Tomorrow’s AI Problems Today with Cadence’s Newest Processor,” a Pre...
“Solving Tomorrow’s AI Problems Today with Cadence’s Newest Processor,” a Pre...“Solving Tomorrow’s AI Problems Today with Cadence’s Newest Processor,” a Pre...
“Solving Tomorrow’s AI Problems Today with Cadence’s Newest Processor,” a Pre...
Edge AI and Vision Alliance
 
Domino IQ – What to Expect, First Steps and Use Cases
Domino IQ – What to Expect, First Steps and Use CasesDomino IQ – What to Expect, First Steps and Use Cases
Domino IQ – What to Expect, First Steps and Use Cases
panagenda
 
TimeSeries Machine Learning - PyData London 2025
TimeSeries Machine Learning - PyData London 2025TimeSeries Machine Learning - PyData London 2025
TimeSeries Machine Learning - PyData London 2025
Suyash Joshi
 
Your startup on AWS - How to architect and maintain a Lean and Mean account J...
Your startup on AWS - How to architect and maintain a Lean and Mean account J...Your startup on AWS - How to architect and maintain a Lean and Mean account J...
Your startup on AWS - How to architect and maintain a Lean and Mean account J...
angelo60207
 
Cisco ISE Performance, Scalability and Best Practices.pdf
Cisco ISE Performance, Scalability and Best Practices.pdfCisco ISE Performance, Scalability and Best Practices.pdf
Cisco ISE Performance, Scalability and Best Practices.pdf
superdpz
 
Bridging the divide: A conversation on tariffs today in the book industry - T...
Bridging the divide: A conversation on tariffs today in the book industry - T...Bridging the divide: A conversation on tariffs today in the book industry - T...
Bridging the divide: A conversation on tariffs today in the book industry - T...
BookNet Canada
 
Introduction to Internet of things .ppt.
Introduction to Internet of things .ppt.Introduction to Internet of things .ppt.
Introduction to Internet of things .ppt.
hok12341073
 
Artificial Intelligence in the Nonprofit Boardroom.pdf
Artificial Intelligence in the Nonprofit Boardroom.pdfArtificial Intelligence in the Nonprofit Boardroom.pdf
Artificial Intelligence in the Nonprofit Boardroom.pdf
OnBoard
 
Trends Artificial Intelligence - Mary Meeker
Trends Artificial Intelligence - Mary MeekerTrends Artificial Intelligence - Mary Meeker
Trends Artificial Intelligence - Mary Meeker
Clive Dickens
 
Viral>Wondershare Filmora 14.5.18.12900 Crack Free Download
Viral>Wondershare Filmora 14.5.18.12900 Crack Free DownloadViral>Wondershare Filmora 14.5.18.12900 Crack Free Download
Viral>Wondershare Filmora 14.5.18.12900 Crack Free Download
Puppy jhon
 
AI Agents in Logistics and Supply Chain Applications Benefits and Implementation
AI Agents in Logistics and Supply Chain Applications Benefits and ImplementationAI Agents in Logistics and Supply Chain Applications Benefits and Implementation
AI Agents in Logistics and Supply Chain Applications Benefits and Implementation
Christine Shepherd
 
Your startup on AWS - How to architect and maintain a Lean and Mean account
Your startup on AWS - How to architect and maintain a Lean and Mean accountYour startup on AWS - How to architect and maintain a Lean and Mean account
Your startup on AWS - How to architect and maintain a Lean and Mean account
angelo60207
 
Domino IQ – Was Sie erwartet, erste Schritte und Anwendungsfälle
Domino IQ – Was Sie erwartet, erste Schritte und AnwendungsfälleDomino IQ – Was Sie erwartet, erste Schritte und Anwendungsfälle
Domino IQ – Was Sie erwartet, erste Schritte und Anwendungsfälle
panagenda
 
vertical-cnc-processing-centers-drillteq-v-200-en.pdf
vertical-cnc-processing-centers-drillteq-v-200-en.pdfvertical-cnc-processing-centers-drillteq-v-200-en.pdf
vertical-cnc-processing-centers-drillteq-v-200-en.pdf
AmirStern2
 
National Fuels Treatments Initiative: Building a Seamless Map of Hazardous Fu...
National Fuels Treatments Initiative: Building a Seamless Map of Hazardous Fu...National Fuels Treatments Initiative: Building a Seamless Map of Hazardous Fu...
National Fuels Treatments Initiative: Building a Seamless Map of Hazardous Fu...
Safe Software
 
Kubernetes Security Act Now Before It’s Too Late
Kubernetes Security Act Now Before It’s Too LateKubernetes Security Act Now Before It’s Too Late
Kubernetes Security Act Now Before It’s Too Late
Michael Furman
 
Providing an OGC API Processes REST Interface for FME Flow
Providing an OGC API Processes REST Interface for FME FlowProviding an OGC API Processes REST Interface for FME Flow
Providing an OGC API Processes REST Interface for FME Flow
Safe Software
 
Oracle Cloud Infrastructure AI Foundations
Oracle Cloud Infrastructure AI FoundationsOracle Cloud Infrastructure AI Foundations
Oracle Cloud Infrastructure AI Foundations
VICTOR MAESTRE RAMIREZ
 
TrustArc Webinar - 2025 Global Privacy Survey
TrustArc Webinar - 2025 Global Privacy SurveyTrustArc Webinar - 2025 Global Privacy Survey
TrustArc Webinar - 2025 Global Privacy Survey
TrustArc
 
“State-space Models vs. Transformers for Ultra-low-power Edge AI,” a Presenta...
“State-space Models vs. Transformers for Ultra-low-power Edge AI,” a Presenta...“State-space Models vs. Transformers for Ultra-low-power Edge AI,” a Presenta...
“State-space Models vs. Transformers for Ultra-low-power Edge AI,” a Presenta...
Edge AI and Vision Alliance
 
“Solving Tomorrow’s AI Problems Today with Cadence’s Newest Processor,” a Pre...
“Solving Tomorrow’s AI Problems Today with Cadence’s Newest Processor,” a Pre...“Solving Tomorrow’s AI Problems Today with Cadence’s Newest Processor,” a Pre...
“Solving Tomorrow’s AI Problems Today with Cadence’s Newest Processor,” a Pre...
Edge AI and Vision Alliance
 
Domino IQ – What to Expect, First Steps and Use Cases
Domino IQ – What to Expect, First Steps and Use CasesDomino IQ – What to Expect, First Steps and Use Cases
Domino IQ – What to Expect, First Steps and Use Cases
panagenda
 
TimeSeries Machine Learning - PyData London 2025
TimeSeries Machine Learning - PyData London 2025TimeSeries Machine Learning - PyData London 2025
TimeSeries Machine Learning - PyData London 2025
Suyash Joshi
 
Your startup on AWS - How to architect and maintain a Lean and Mean account J...
Your startup on AWS - How to architect and maintain a Lean and Mean account J...Your startup on AWS - How to architect and maintain a Lean and Mean account J...
Your startup on AWS - How to architect and maintain a Lean and Mean account J...
angelo60207
 
Cisco ISE Performance, Scalability and Best Practices.pdf
Cisco ISE Performance, Scalability and Best Practices.pdfCisco ISE Performance, Scalability and Best Practices.pdf
Cisco ISE Performance, Scalability and Best Practices.pdf
superdpz
 
Bridging the divide: A conversation on tariffs today in the book industry - T...
Bridging the divide: A conversation on tariffs today in the book industry - T...Bridging the divide: A conversation on tariffs today in the book industry - T...
Bridging the divide: A conversation on tariffs today in the book industry - T...
BookNet Canada
 
Introduction to Internet of things .ppt.
Introduction to Internet of things .ppt.Introduction to Internet of things .ppt.
Introduction to Internet of things .ppt.
hok12341073
 
Artificial Intelligence in the Nonprofit Boardroom.pdf
Artificial Intelligence in the Nonprofit Boardroom.pdfArtificial Intelligence in the Nonprofit Boardroom.pdf
Artificial Intelligence in the Nonprofit Boardroom.pdf
OnBoard
 
Trends Artificial Intelligence - Mary Meeker
Trends Artificial Intelligence - Mary MeekerTrends Artificial Intelligence - Mary Meeker
Trends Artificial Intelligence - Mary Meeker
Clive Dickens
 
Viral>Wondershare Filmora 14.5.18.12900 Crack Free Download
Viral>Wondershare Filmora 14.5.18.12900 Crack Free DownloadViral>Wondershare Filmora 14.5.18.12900 Crack Free Download
Viral>Wondershare Filmora 14.5.18.12900 Crack Free Download
Puppy jhon
 
AI Agents in Logistics and Supply Chain Applications Benefits and Implementation
AI Agents in Logistics and Supply Chain Applications Benefits and ImplementationAI Agents in Logistics and Supply Chain Applications Benefits and Implementation
AI Agents in Logistics and Supply Chain Applications Benefits and Implementation
Christine Shepherd
 

strangeloop 2012 apache cassandra anti patterns

  • 2. C* on a SAN ● fact: C* was designed, from the start, for commodity hardware ● more than just not requiring a SAN, C* actually performs better without one ● SPOF ● unnecessary (large) cost ● “(un)coordinated” IO from nodes ● SANs were designed to solve problems C* doesn’t have
  • 3. Commit Log + Data Directory (on the same volume) ● conflicting IO patterns ● commit log is 100% sequential append only ● data directory is (usually) random on reads ● commit log is essentially serialized ● massive difference in write throughput under load ● NB: does not apply to SSDs or EC2
  • 4. Oversize JVM Heaps ● 4 – 8 GB is good (assuming sufficient ram on your boxen) ● 10 – 12 GB is not bad (and often “correct”) ● 16GB == max ● > 16GB => badness ● heap >= boxen RAM => badness
  • 5. oversized JVM heaps (for the visually oriented)
  • 6. not using -pr on scheduled repairs ● -pr is kind of new ● only applies to scheduled repairs ● reduces work to 1/RF (e.g. 1/3)
  • 7. low file handle limit ● C* requires lots of file handles (sorry, deal with it) ● Sockets and SSTables mostly ● 1024 (common default) is not sufficient ● fails in horrible miserably unpredictable ways (though clear from the logs after the fact) ● 32K - 128K is common ● unlimited is also common, but personally I prefer some sort of limit ...
  • 8. Load Balacners (in front of C*) ● clients will load balance (C* has no master so this can work reliably) ● SPOF ● performance bottle neck ● unneeded complexity ● unneeded cost
  • 9. restricting clients to a single node ● why? ● no really, I don’t understand how this was thought to be a good idea ● thedailywtf.com territory
  • 10. Unbalanced Ring ● used to be the number one problem encountered ● OPSC automates the resolution of this to two clicks (do it + confirm) even across multiple data centers ● related: don’t let C* auto pick your tokens, always specify initial_token
  • 11. Row Cache + Slice Queries ● the row cache is a row cache, not a query cache or slice cache or magic cache or WTF-ever-you-thought- it-was cache ● for the obvious impaired: that’s why we called it a row cache – because it caches rows ● laughable performance difference in some extreme cases (e.g. 100X increase in throughput, 10X drop in latency, maxed cpu to under 10% average)
  • 12. Row Cache + Large Rows ● 2GB row? yeah, lets cache that !!! ● related: wtf are you doing trying to read a 2GB row all at once anyway?
  • 13. OPP/BOP ● if you think you need BOP, check again ● no seriously, you’re doing it wrong ● if you use BOP anyway: ● IRC will mock you ● your OPS team will plan your disappearance ● I will setup a auto reply for your entire domain that responds solely with “stop using BOP”
  • 14. Unbounded Batches ● batches are sent as a single message ● they must fit entirely in memory (both server side and client side) ● best size is very much an empirical exercise depending on your HW, load, data model, moon phase, etc (start with 10 – 100 and tune) ● NB: streaming transport will address this in future releases
  • 15. Bad Rotational Math ● rotational disks require seek time ● 5ms is a fast seek time for a rotational disk ● you cannot get thousands of random seeks per second from rotational disks ● caches/memory alleviate this, SSDs solve it ● maths are teh hard? buy SSDs ● everything fits in memory? I don’t care what disks you buy
  • 16. 32 Bit JVMs ● C* deals (usually) with BigData ● 32 bits cannot address BigData ● mmap, file offsets, heaps, caches ● always wrong? no, I guess not ...
  • 17. EBS volumes ● nice in theory, but ... ● not predictable ● freezes common ● outages common ● stripe ephemeral drives instead ● provisioned IOPS EBS? future hazy, ask again later
  • 18. Non-Sun (err, Oracle) JVM ● at least u22, but in general the latest release (unless you have specific reasons otherwise) ● this is changing ● some people (successfully) use OpenJDK anyway
  • 19. Super Columns ● 10-15 percent overhead on reads and writes ● entire super column is always held in memory at all stages ● most C* devs hate working on them ● C* and DataStax is committed to maintaining the API going forward, but they should be avoided for new projects ● composite columns are an alternative
  • 20. Not Running OPSC ● extremely useful postmortem ● trivial (usually) to setup ● DataStax offers a free version (you have no excuse now)
  • 21. Q? Matthew F. Dennis // @mdennis
  • 22. Thank You! Matthew F. Dennis // @mdennis http://slideshare.net/mattdennis