第17回Cassandra勉強会: MyCassandra
(24)
•  @sunsuk7tp
•          /P.A. WORKS              /
•                CS M2
•          :
       : HPC
          TSUBAME
          MPI, Cell B.E., GPU CUDA, Hadoop on
              :
         
                          , P2P
                          NoSQL Afternoon in Japan (10.11.1,           )
          SACSIS 2011
•  Web                                    6
     PHP, Perl, JavaScript
    
          Apache Solr, MySQL
          NoSQL
                  NoSQL
•        Jazz,     trumpet
•  Cassandra 0.6.0
     @railute   @yutuki_r                    @techmemo
                                                    Itmedia
                                                      3
                                                    http://lab.jibun.atmarkit.co.jp/entries/1058
+

    NoSQL, Key-Value Store (KVS), Document-Oriented DB, GraphDB
       : memcached, Google Bigtable, Amazon Dynamo, Amazon SimpleDB, Apache Cassandra,
     Voldemort, Ringo, Vpork, MongoDB, CouchDB, Tokyo Tyrant, Flare, ROMA, kumofs, Kai, Redis,
     LevelDB, Hadoop HBase, Hypertable, Yahoo! PNUTS, Scalaris, Dynomite, ThruDB, Neo4j, IBM
     ObjectGrid, Oracle Coherence,       100
                                        :               ↔
      • 
      •  join, transaction
      •                      /
/DC




                                •  decentralized
                                • 

     •  master/slave
     •  data/meta/proxy
     • 
•    •  Map Reduce
• 
  SPOF
     DC
              dc1          dc2



               rack/dc
           region
                     dc3
 
       • 
   
       •  (   )                          <<                       &
       •                    , correlated failure
      SPOF = “                  ”
   
       •          : 1
       •                :            /
( ) Daniel Ford et al. (Google), “Availability in Globally Distributed Storage Systems”, OSDI 2010
 


     ⇒               !!

         SPOF



                ~
 SPOF
         decentralized
     •    proxy/master/slave
 
Consistent Hashing (                                       )
    
(A~Z                )
           N := 3                      ID

           A                F
       Z                                        •  request proxy
                        secondary 1
                                                •          primary node
                         Q                      •             secondary node
          V                           N
       primary                    secondary 2
                           hash(key) = Q
                        key   values
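The ring diagram above can be sketched in a few lines of Python. This is only an illustration of consistent hashing with N := 3 replicas, not MyCassandra's actual partitioner: the MD5-based `ring_position` helper, the `Ring` class, and the rule that the primary is the first node at or after the key's position (with the secondaries following clockwise) are assumptions made for the example. The node IDs A, F, N, Q, V, Z are taken from the diagram.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string to a position on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical consistent-hashing ring with N replicas per key."""

    def __init__(self, node_ids, n_replicas=3):
        self.n_replicas = n_replicas
        # Keep nodes sorted by ring position so lookups can use binary search.
        self._ring = sorted((ring_position(n), n) for n in node_ids)
        self._positions = [p for p, _ in self._ring]

    def replicas(self, key: str):
        """Return [primary, secondary 1, secondary 2, ...] for a key."""
        # First node at or after the key's position; wraps around the ring.
        start = bisect.bisect_left(self._positions, ring_position(key))
        count = min(self.n_replicas, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = Ring(["A", "F", "N", "Q", "V", "Z"], n_replicas=3)
owners = ring.replicas("some-key")
print(owners)  # primary first, then the two secondaries
```

Any node can act as the request proxy in this scheme, because every node can compute the same replica list from the key alone.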
MyCassandra
SQL                                     map


                    Megastore
                     library
   relational
  data model              table
                    (multi-dimensional
                      sorted map)


(sorted) records     (sorted) map
                                         (sorted) map
   + indices           + indices

    RDB            Bigtable                KVS
 NoSQL
PNUTS (VLDB ‘08): MySQL NoSQL   YCSB (SOCC ’10):
Write-Heavy       Read-Heavy



                                  write-
                                optimized
Better




                 read-                        read-
               optimized                    optimized



                 write-
               optimized
     Apache HBase       write optimized   Bigtable like   centralized
     Apache Cassandra   write optimized   Bigtable like   decentralized
     Sharded MySQL      read optimized    MySQL           centralized
     Yahoo! Sherpa      read optimized    MySQL           centralized



       :


⇒              Cassandra                             MySQL
MyCassandra
= Dynamo + Bigtable
= Dynamo + Bigtable
      (P2P/decentralized)
= Dynamo +
     (P2P/decentralized)
                         RDBMS
            Table


               •      /
               • 
               • 




              NoSQL               !!
     query
MyCassandra
= Dynamo +
     (P2P/decentralized)
MySQL
= Dynamo +   Bigtable
              Redis
                :
1


    (master/worker, sharding,
       consistent hashing)




                     •  cache / persistence
                     •  index
                     •  write/read-optimized
                     • 
+




    MyCassandra
  InnoDB (MySQL 5.1~     )
  MyISAM
  Memory
  Merge
  Archive
  Federated
  NDB
  CSV
  Blackhole ( )
  FALCON
  MariaDB
  Drizzle              InnoDB/MyISAM
  solidDB
                        MySQL Cluster
  :
  MySQL:
  Bigtable:   Cassandra
  Redis:       /      snapshot
  MongoDB:                       DB




     
     
                                 decentralized
     • 
            RDB (MySQL / PostgreSQL)
     •  master/slave     decentralized
          MongoDB / Redis

 
     •  MapReduce
                 MySQL                                 Bigtable
          MySQL (InnoDB) INSERT
             Bigtable                     INSERT/GET
     • 
                 /              /

  EC2+RDS                 MyCassandra
            /
 I/O
     •  Bigtable (LSM-tree)
     •  MySQL (B-trees/ )
     •  Redis (Hash)
     •  MongoDB (B-tree)
     •  KyotoCabinet (B+ tree/hash)
hash           B-Trees          LSM-Tree
 write                  1   random I/O   append
 read                   1   random I/O   N    random I/O
                                         + merge
                cache
         Memcached,     MySQL,           Cassandra,
         Redis,         MongoDB,         HBase,
         KyotoCabinet   KyotoCabinet     LevelDB

 


 
+
                                : O(1)
                                     sequential write
                                I/O
      Always   writable
                                write-lock               memory
                                      sync               <k1, obj (v1+v2)> async flush
         write path                                 Memtable
     LSM-Tree [P. O’Neil ‘96]
                                                        disk
                                                    <k1, v1>, <k1, v2>
                                                   Commit Log
                                   sequential
         disk          mem                                                 <k1,obj1>
                                   write             SSTable 1
                                                                           <k1,obj2>
                                                     SSTable 2
                                                                           <k1,obj3>
                                                     SSTable 3
    SSTable
+

    Key
      •  Memtable           value
      •  SSTable                value
                                  I/O
     disk                                                      memory
                                                 <k1,obj>
                                                             Memtable

             disk               mem                             disk
                               <k1,obj+obj1~3>
                                                             Commit Log
                   client            merge
                                                 <k1,obj1>
                                                              SSTable 1
                               I/O               <k1,obj2>
                                                              SSTable 2
                                                 <k1,obj3>
                                                              SSTable 3
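The write and read paths in the two diagrams above can be illustrated with a toy LSM-tree. This is a minimal sketch under simplifying assumptions, not Cassandra's implementation: `TinyLSM` and `memtable_limit` are hypothetical names, the commit log and SSTables are plain in-memory Python objects standing in for on-disk files, and compaction is left out.

```python
class TinyLSM:
    """Toy LSM-tree: commit log + memtable + immutable SSTables."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # stand-in for the sequential on-disk log
        self.memtable = {}     # in-memory map: key -> {column: value}
        self.sstables = []     # flushed, immutable tables, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, columns):
        # Write path: append to the log, then update the memtable.
        # No existing data is read or rewritten, so writes stay cheap.
        self.commit_log.append((key, dict(columns)))
        self.memtable.setdefault(key, {}).update(columns)
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable as a key-sorted SSTable and start a new one.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()

    def get(self, key):
        # Read path: merge the fragments of the row from every SSTable
        # (oldest to newest) and finally from the memtable.
        merged = {}
        for table in self.sstables:
            merged.update(table.get(key, {}))
        merged.update(self.memtable.get(key, {}))
        return merged or None


db = TinyLSM(memtable_limit=2)
db.put("k1", {"gender": "male"})
db.put("k2", {"age": 21})           # memtable full: flushed as SSTable 1
db.put("k1", {"age": 17})
db.put("k3", {"plan": "Gold"})      # flushed as SSTable 2
db.put("k1", {"region": "Tokyo"})   # still in the memtable
row = db.get("k1")
print(row)
```

Reading `k1` has to touch both SSTables and the memtable, which is why reads on this structure cost more than writes.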
+
(Figure: histograms of number of queries vs. latency (ms) for write and read; lower latency is better. Write: avg. 0.69 ms, 99.9 percentile 2.0 ms. Read: avg. 6.16 ms, 99.9 percentile 86.9 ms. Average write latency is about 1/9 of average read latency.)
(Figure: bar chart of max. QPS for 40 clients with the Bigtable, MySQL, and Redis engines across Write Only, Write Heavy, Read Heavy, and Read Only workloads; y-axis 0 to 40000 qps; higher is better.)
           /            /
        /99%/Max/
 
 
                     ( KB~       MB)
    HDD/SSD
                  (zipfian, uniform, latest)
 
     •  Embedded InnoDB, KyotoCabinet

#                               ( )
select
proxy
  client
                                client
    •  o.a.c.cli
    •  o.a.c.avro/thrift                                      server
  proxy
    •  o.a.c.service.StorageProxy
  server                                                     engine
   •  o.a.c.service.StorageService
   •  o.a.c.db.ReadVerbHandler/RowMutationVerbHandler
  engine
   •  o.a.c.db.Table (keyspace       )
        o.a.c.db.commitlog
        o.a.c.db.ColumnFamilyStore (columnfamily       )
         o.a.c.db.engine.StorageEngineInterface
         o.a.c.db.engine.MySQLInstance, RedisInstance, MongoDBInstance, …
 
     •  put (key, cf)
                                               OK
     •  get (key)
     •  getRangeSlice (startWith, endWith, maxResults)
     •  truncate/dropTable/dropDB
 
     •  secondaryIndex
     •  expire
     •  counter (Cassandra-0.8    )
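The operations listed above (put, get, getRangeSlice, truncate) suggest a small pluggable interface. The sketch below shows what such an interface could look like; it is not the actual `o.a.c.db.engine.StorageEngineInterface` signature, which is Java and is not reproduced in these slides. `StorageEngine` and `InMemoryEngine` are hypothetical names, and the inclusive range bounds are an assumption.

```python
from abc import ABC, abstractmethod


class StorageEngine(ABC):
    """Hypothetical minimal contract a pluggable engine would satisfy."""

    @abstractmethod
    def put(self, key, cf):
        """Store a (serialized) column family under a row key."""

    @abstractmethod
    def get(self, key):
        """Return the stored column family, or None if absent."""

    @abstractmethod
    def get_range_slice(self, start_with, end_with, max_results):
        """Return up to max_results (key, cf) pairs in key order."""

    @abstractmethod
    def truncate(self):
        """Remove all rows."""


class InMemoryEngine(StorageEngine):
    """Stand-in for a MySQL/Redis/MongoDB-backed engine."""

    def __init__(self):
        self._rows = {}

    def put(self, key, cf):
        self._rows[key] = cf

    def get(self, key):
        return self._rows.get(key)

    def get_range_slice(self, start_with, end_with, max_results):
        # Assumption: both bounds are inclusive.
        keys = sorted(k for k in self._rows if start_with <= k <= end_with)
        return [(k, self._rows[k]) for k in keys[:max_results]]

    def truncate(self):
        self._rows.clear()


engine = InMemoryEngine()
engine.put("sato", {"gender": "male", "age": 17})
engine.put("suzuki", {"gender": "female", "age": 21})
engine.put("tanaka", {"gender": "male", "age": 30})
print(engine.get_range_slice("sato", "tanaka", 2))
```

Keeping the contract this small is what lets engines with very different internals sit behind the same interface.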
    Cassandra
     •         : keyspace – columnfamily – column
     •              key/value(             )
     • 
            ColumnFamily        SSTable <key, value>
            value: columnFamily
     Keyspace
                  ColumnFamily A                      ColumnFamily B
      key col gender     age      region     key       col visits plan
      sato      male     17       [null]     sato         18     Gold
      suzuki    female   21       Tokyo      suzuki       214    Bronze

                              Bigtable (Cassandra)
          Cassandra
     •  Super Column
 SSTable              key-value
     • 

                                  KVS
 key prefix
     • 
Cassandra       MySQL      Redis
keyspace        database   db
column family   table      record
column          field
database                                                                   db
             table A                             table B                  key            values
key        values                     key      values
                                                                          A:sato         …
sato       gender;male;age;17         sato     visits;18;plan;Gold
                                                                          B:ito          …
suzuki     gender;female;age;         suzuki visits;
                                                                          A:suzuki       …
           21;region;Tokyo                   214;plan;Bronze
                                                                          B:tanaka       …
                           RDB (MySQL)
                                                                             KVS (Redis)
       keyspace
                       columnfamily A                        columnfamily B
        key col gender       age      region        key       col visits plan
        sato      male       17       [null]        sato             18           Gold
        suzuki    female     21       Tokyo         suzuki           214          Bronze

                                  Bigtable (Cassandra)
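The mapping in the diagram above, where a row's columns are flattened into one value such as `gender;male;age;17` and a KVS key is prefixed with the column family name such as `A:sato`, can be sketched as follows. The helper names are hypothetical, and the real serialization format used by MyCassandra may differ; this toy format breaks if a name or value contains `;`.

```python
def serialize_columns(columns):
    """Flatten {name: value} into 'name;value;name;value'. Null columns are skipped."""
    parts = []
    for name, value in columns.items():
        if value is None:
            continue
        parts.extend([name, str(value)])
    return ";".join(parts)


def deserialize_columns(blob):
    """Inverse of serialize_columns; all values come back as strings."""
    fields = blob.split(";")
    return dict(zip(fields[0::2], fields[1::2]))


def kvs_key(column_family, row_key):
    """Key-prefix scheme for a flat KVS: '<column family>:<row key>'."""
    return f"{column_family}:{row_key}"


# A plain dict stands in for a Redis db in this sketch.
kvs = {}
kvs[kvs_key("A", "sato")] = serialize_columns(
    {"gender": "male", "age": 17, "region": None}
)
kvs[kvs_key("B", "sato")] = serialize_columns({"visits": 18, "plan": "Gold"})

print(kvs["A:sato"])
print(deserialize_columns(kvs["B:sato"]))
```

For an RDB engine the same serialized value would go into a row of the table that corresponds to the column family, so no key prefix is needed there.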
 
      •  MySQL database = keyspace :=>
           MyCassandra (MySQL)
      •  MySQL table = keyspace :=>
           Cassandra               Bigtable (Cassandra)
keyspace
                 columnfamily A                           columnfamily B
  key col gender          age     region         key       col visits plan
  sato       male         17      [null]         sato            18    Gold
  suzuki     female       21      Tokyo          suzuki          214   Bronze

                                                                  MySQL
                           gender          age   region    visits      plan
                 sato      male            17    [null]    18          Gold
         Table
                 suzuki    female          21    Tokyo     214         Bronze
 


                                         1
 
 secondary index
     rowKey CF          counter   secondary   token
                                  index
           Serialized
           Object
     Key     Value

                         Key-Value KVS                …
 
     • 

     • 
     • 

                write query            read query

            sync         async    async         sync


            W             R        W                R
          Bigtable       MySQL   Bigtable      MySQL
•  W:
                                •  R:
                                •  RW:
 
 
                                    write query

                                sync              async


                                W                  R
 
Quorum Protocol: W (writes) + R (reads) > N (replicas)
     • 

                                write             read



                                W        RW       R
•  :
                                                               •  R:
                                                               •  RW:

 =3, =2
                                    Client
W:RW:R = 1:1:1              Proxy
                                             1) 


                                             2)    W, RW

                      ACK
                                                                        ACK

                                             3a)
      W          RW           R
                                             3b)           R

                                                                          ACK
                 : max (W, RW)
•  :
                                                             •  R:
                                                             •  RW:
 =3, =2
W:RW:R = 1:1:1                   Client
                     Proxy
                                          1) 


                                          2)    R, RW

                                          3a)
                                          3b)       or
                                                W
     W     RW         R
                                          4) 
                                                  .
                                                (Cassandra read repair   )
                 : max (R, RW)
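The write and read flows on the two slides above can be approximated with a small simulation. This is a sketch under stated assumptions, not MyCassandra's code: it assumes a write is acknowledged once the W and RW nodes have it while the R node is updated asynchronously, and that a read asks the R and RW nodes first and falls back to the W node only when they disagree, followed by a read repair. `Node`, `Cluster`, and all latency numbers are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str        # "W" (write-optimized), "RW", or "R" (read-optimized)
    write_ms: float  # illustrative latency, not a measurement
    read_ms: float   # illustrative latency, not a measurement
    data: dict = field(default_factory=dict)  # key -> (timestamp, value)


class Cluster:
    def __init__(self, nodes):
        self.nodes = nodes
        self.pending = []  # async writes not yet applied to R nodes

    def write(self, key, value, ts):
        for n in self.nodes:
            if n.kind in ("W", "RW"):
                n.data[key] = (ts, value)                  # synchronous
            else:
                self.pending.append((n, key, ts, value))   # asynchronous
        # The client waits only for the W and RW acknowledgements.
        return max(n.write_ms for n in self.nodes if n.kind in ("W", "RW"))

    def apply_pending(self):
        for n, key, ts, value in self.pending:
            n.data[key] = (ts, value)
        self.pending.clear()

    def read(self, key):
        fast = [n for n in self.nodes if n.kind in ("R", "RW")]
        answers = [n.data.get(key) for n in fast]
        latency = max(n.read_ms for n in fast)
        if all(a == answers[0] for a in answers):
            return answers[0], latency                     # consistent
        # Inconsistent: consult the W node too, keep the newest version.
        slow = [n for n in self.nodes if n.kind == "W"]
        answers += [n.data.get(key) for n in slow]
        latency = max([latency] + [n.read_ms for n in slow])
        newest = max((a for a in answers if a is not None), key=lambda a: a[0])
        for n in self.nodes:
            n.data[key] = newest  # read repair, done inline for brevity
        return newest, latency


cluster = Cluster([Node("W", 1, 9), Node("RW", 3, 3), Node("R", 9, 1)])
write_latency = cluster.write("k1", "v1", ts=1)
first_read = cluster.read("k1")    # R node is stale, so the W node is consulted
second_read = cluster.read("k1")   # repaired, so R and RW agree
print(write_latency, first_read, second_read)
```

In the common case the slow side of each engine stays off the critical path: writes wait for max(W, RW) and consistent reads wait for max(R, RW).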
(Figure: bar chart of max. qps (query/sec) for 40 clients, Cassandra vs. MyCassandra Cluster, across Write-Only [100:0], Write-Heavy [50:50], Read-Heavy [5:95], and Read-Only [0:100] [write:read] workloads; y-axis 0 to 20000; higher is better. Bar annotations, in workload order: 0.90, 0.93, 1.54, 6.49.)

                    Write Heavy                    Read Heavy
                  • YCSB / Zipfian
                  •                                     6.49
                  • 
  https://github.com/sunsuk7tp/MyCassandra
  MyCassandra-0.2.0 (      )
     •  based on Cassandra-0.7.5
     •  Basic CRUD on a simple record
     •  RangeSlice
     •  keyspace
1.         cassandra.yaml
      •       engine host, port, …
      •     default engine
2.                                         (           )
3.         MyCassandra               (Cassandra   )
4.                            or           keyspace,
           columnfamily
      •     engine              (keyspace   )
      •                   (column family  )
    Embedded InnoDB
     •  HailDB:                      …
     •  Handler Socket:                            …
     •  ExtraDB
     •  API
    DBM (KyotoCabinet)
     •  KyotoCassandra/Kyossandra/ ssandra (   )
     • 
     •  NoSQL
     •  QDBM, TC       Hash or B+Tree db
•            /
•  hash/B+tree
• 
class            persistence   algorithm        lock unit
ProtoHashDB      volatile      hash             whole (rwlock)
ProtoTreeDB      volatile      red black tree   whole (rwlock)
StashDB          volatile      hash             record (rwlock)
CacheDB          volatile      hash             record (mutex)
GrassDB          volatile      B+ tree          page (rwlock)
HashDB           persistent    hash             record (rwlock)
TreeDB           persistent    B+ tree          page (rwlock)
DirDB            persistent    undefined        record (rwlock)
ForestDB         persistent    B+ tree          page (rwlock)
 MyCassandra-0.2.2
 •  secondaryIndex
      MySQL MongoDB
 MyCassandra-0.3.0
 •  Based on Cassandra-0.8
 •  Atomic counter
 •  Brisk (Hadoop + Cassandra)…
1. 
2. 
3. 
    Cassandra             /expire
     •  tombstone
     •                SSTable
     •  Bigtable like

 MyCassandra            Bigtable
  • 
  •  expire
  • 
       1  Table
 




 


 instance        instance   instance

         ping                        detect
engine          engine      engine            instance   ?
                                                             ?

                node down                                ?
 
     • 

            Redis
            MongoDB
           
     • 
                 key
            Join

 
 
     • 
 
     •  Cassandra-0.6         :
               GC
           
     •  Cassandra-0.7, 0.8:
         
         
         
         

                                 …
 Issue
  •  https://github.com/sunsuk7tp/MyCassandra/issues
 Twitter
  •  @MyCassandraJP
  •  @_MyCassandra # @MyCassandra                orz
  •  @sunsuk7tp #


 Google    Groups
  •  https://groups.google.com/group/my-cassandra
               / @railute
     •                       Cassandra
    Gemini Mobile Technologies / @geminimobile
     •              Hibari
               / @yutuki_r
     •  Cassandra               twitter
    dann / @techmemo
     •  Cassandra
               / @tatsuya6502
     •  YCSB         , Hibari
               / @mikio1978 / @fallabs
     •  KyotoCabinet
             / @muga_nishizawa
           / @Nakata_itpro
             / @shudo
    Cassandra
 
    UST                        (         )
第17回Cassandra勉強会: MyCassandra

More Related Content

What's hot (19)

Key-Value-Stores -- The Key to Scaling?
Key-Value-Stores -- The Key to Scaling?Key-Value-Stores -- The Key to Scaling?
Key-Value-Stores -- The Key to Scaling?
Tim Lossen
 
Boosting Machine Learning with Redis Modules and Spark
Boosting Machine Learning with Redis Modules and SparkBoosting Machine Learning with Redis Modules and Spark
Boosting Machine Learning with Redis Modules and Spark
Dvir Volk
 
Evaluating NoSQL Performance: Time for Benchmarking
Evaluating NoSQL Performance: Time for BenchmarkingEvaluating NoSQL Performance: Time for Benchmarking
Evaluating NoSQL Performance: Time for Benchmarking
Sergey Bushik
 
Cassandra vs. Redis
Cassandra vs. RedisCassandra vs. Redis
Cassandra vs. Redis
Tim Lossen
 
MyRocks Deep Dive
MyRocks Deep DiveMyRocks Deep Dive
MyRocks Deep Dive
Yoshinori Matsunobu
 
Scaling HDFS to Manage Billions of Files
Scaling HDFS to Manage Billions of FilesScaling HDFS to Manage Billions of Files
Scaling HDFS to Manage Billions of Files
Haohui Mai
 
Millions of Regions in HBase: Size Matters
Millions of Regions in HBase: Size MattersMillions of Regions in HBase: Size Matters
Millions of Regions in HBase: Size Matters
DataWorks Summit
 
Cassandra Talk: Austin JUG
Cassandra Talk: Austin JUGCassandra Talk: Austin JUG
Cassandra Talk: Austin JUG
Stu Hood
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
Arnab Mitra
 
On Rails with Apache Cassandra
On Rails with Apache CassandraOn Rails with Apache Cassandra
On Rails with Apache Cassandra
Stu Hood
 
Bluestore
BluestoreBluestore
Bluestore
Patrick McGarry
 
NoSQL: Cassadra vs. HBase
NoSQL: Cassadra vs. HBaseNoSQL: Cassadra vs. HBase
NoSQL: Cassadra vs. HBase
Antonio Severien
 
NewSQL
NewSQLNewSQL
NewSQL
hyeongchae lee
 
Ceph and RocksDB
Ceph and RocksDBCeph and RocksDB
Ceph and RocksDB
Sage Weil
 
Benchmarking MongoDB and CouchBase
Benchmarking MongoDB and CouchBaseBenchmarking MongoDB and CouchBase
Benchmarking MongoDB and CouchBase
Christopher Choi
 
NewSQL vs NoSQL for New OLTP
NewSQL vs NoSQL for New OLTPNewSQL vs NoSQL for New OLTP
NewSQL vs NoSQL for New OLTP
DATAVERSITY
 
Developers summit cassandraで見るNoSQL
Developers summit cassandraで見るNoSQLDevelopers summit cassandraで見るNoSQL
Developers summit cassandraで見るNoSQL
Ryu Kobayashi
 
Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++...
Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++...Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++...
Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++...
CUBRID
 
Intro to big data choco devday - 23-01-2014
Intro to big data   choco devday - 23-01-2014Intro to big data   choco devday - 23-01-2014
Intro to big data choco devday - 23-01-2014
Hassan Islamov
 
Key-Value-Stores -- The Key to Scaling?
Key-Value-Stores -- The Key to Scaling?Key-Value-Stores -- The Key to Scaling?
Key-Value-Stores -- The Key to Scaling?
Tim Lossen
 
Boosting Machine Learning with Redis Modules and Spark
Boosting Machine Learning with Redis Modules and SparkBoosting Machine Learning with Redis Modules and Spark
Boosting Machine Learning with Redis Modules and Spark
Dvir Volk
 
Evaluating NoSQL Performance: Time for Benchmarking
Evaluating NoSQL Performance: Time for BenchmarkingEvaluating NoSQL Performance: Time for Benchmarking
Evaluating NoSQL Performance: Time for Benchmarking
Sergey Bushik
 
Cassandra vs. Redis
Cassandra vs. RedisCassandra vs. Redis
Cassandra vs. Redis
Tim Lossen
 
Scaling HDFS to Manage Billions of Files
Scaling HDFS to Manage Billions of FilesScaling HDFS to Manage Billions of Files
Scaling HDFS to Manage Billions of Files
Haohui Mai
 
Millions of Regions in HBase: Size Matters
Millions of Regions in HBase: Size MattersMillions of Regions in HBase: Size Matters
Millions of Regions in HBase: Size Matters
DataWorks Summit
 
Cassandra Talk: Austin JUG
Cassandra Talk: Austin JUGCassandra Talk: Austin JUG
Cassandra Talk: Austin JUG
Stu Hood
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
Arnab Mitra
 
On Rails with Apache Cassandra
On Rails with Apache CassandraOn Rails with Apache Cassandra
On Rails with Apache Cassandra
Stu Hood
 
Ceph and RocksDB
Ceph and RocksDBCeph and RocksDB
Ceph and RocksDB
Sage Weil
 
Benchmarking MongoDB and CouchBase
Benchmarking MongoDB and CouchBaseBenchmarking MongoDB and CouchBase
Benchmarking MongoDB and CouchBase
Christopher Choi
 
NewSQL vs NoSQL for New OLTP
NewSQL vs NoSQL for New OLTPNewSQL vs NoSQL for New OLTP
NewSQL vs NoSQL for New OLTP
DATAVERSITY
 
Developers summit cassandraで見るNoSQL
Developers summit cassandraで見るNoSQLDevelopers summit cassandraで見るNoSQL
Developers summit cassandraで見るNoSQL
Ryu Kobayashi
 
Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++...
Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++...Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++...
Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++...
CUBRID
 
Intro to big data choco devday - 23-01-2014
Intro to big data   choco devday - 23-01-2014Intro to big data   choco devday - 23-01-2014
Intro to big data choco devday - 23-01-2014
Hassan Islamov
 

Similar to 第17回Cassandra勉強会: MyCassandra (20)

MyCassandra (Full English Version)
MyCassandra (Full English Version)MyCassandra (Full English Version)
MyCassandra (Full English Version)
Shun Nakamura
 
Spring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataSpring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_data
Roger Xia
 
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
jbellis
 
Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop
Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoopJava one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop
Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop
srisatish ambati
 
Drop acid
Drop acidDrop acid
Drop acid
Mike Feltman
 
Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)
Chris Richardson
 
High-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and JavaHigh-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and Java
sunnygleason
 
Accelerating NoSQL
Accelerating NoSQLAccelerating NoSQL
Accelerating NoSQL
sunnygleason
 
No sql solutions - 공개용
No sql solutions - 공개용No sql solutions - 공개용
No sql solutions - 공개용
Byeongweon Moon
 
20140614 introduction to spark-ben white
20140614 introduction to spark-ben white20140614 introduction to spark-ben white
20140614 introduction to spark-ben white
Data Con LA
 
MyRocks introduction and production deployment
MyRocks introduction and production deploymentMyRocks introduction and production deployment
MyRocks introduction and production deployment
Yoshinori Matsunobu
 
MongoDB, RabbitMQ y Applicaciones en Nube
MongoDB, RabbitMQ y Applicaciones en NubeMongoDB, RabbitMQ y Applicaciones en Nube
MongoDB, RabbitMQ y Applicaciones en Nube
Socialmetrix
 
Cassandra Explained
Cassandra ExplainedCassandra Explained
Cassandra Explained
Eric Evans
 
Qubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant ConferenceQubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant Conference
Joydeep Sen Sarma
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
Yan Cui
 
Spark After Dark - LA Apache Spark Users Group - Feb 2015
Spark After Dark - LA Apache Spark Users Group - Feb 2015Spark After Dark - LA Apache Spark Users Group - Feb 2015
Spark After Dark - LA Apache Spark Users Group - Feb 2015
Chris Fregly
 
Spark after Dark by Chris Fregly of Databricks
Spark after Dark by Chris Fregly of DatabricksSpark after Dark by Chris Fregly of Databricks
Spark after Dark by Chris Fregly of Databricks
Data Con LA
 
Mongodb lab
Mongodb labMongodb lab
Mongodb lab
Bas van Oudenaarde
 
Scaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value StoresScaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value Stores
DataWorks Summit
 
MySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion QueriesMySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion Queries
Bernd Ocklin
 
MyCassandra (Full English Version)
MyCassandra (Full English Version)MyCassandra (Full English Version)
MyCassandra (Full English Version)
Shun Nakamura
 
Spring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataSpring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_data
Roger Xia
 
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
jbellis
 
Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop
Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoopJava one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop
Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop
srisatish ambati
 
Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)
Chris Richardson
 
High-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and JavaHigh-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and Java
sunnygleason
 
Accelerating NoSQL
Accelerating NoSQLAccelerating NoSQL
Accelerating NoSQL
sunnygleason
 
No sql solutions - 공개용
No sql solutions - 공개용No sql solutions - 공개용
No sql solutions - 공개용
Byeongweon Moon
 
20140614 introduction to spark-ben white
20140614 introduction to spark-ben white20140614 introduction to spark-ben white
20140614 introduction to spark-ben white
Data Con LA
 
MyRocks introduction and production deployment
MyRocks introduction and production deploymentMyRocks introduction and production deployment
MyRocks introduction and production deployment
Yoshinori Matsunobu
 
MongoDB, RabbitMQ y Applicaciones en Nube
MongoDB, RabbitMQ y Applicaciones en NubeMongoDB, RabbitMQ y Applicaciones en Nube
MongoDB, RabbitMQ y Applicaciones en Nube
Socialmetrix
 
Cassandra Explained
Cassandra ExplainedCassandra Explained
Cassandra Explained
Eric Evans
 
Qubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant ConferenceQubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant Conference
Joydeep Sen Sarma
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
Yan Cui
 
Spark After Dark - LA Apache Spark Users Group - Feb 2015
Spark After Dark - LA Apache Spark Users Group - Feb 2015Spark After Dark - LA Apache Spark Users Group - Feb 2015
Spark After Dark - LA Apache Spark Users Group - Feb 2015
Chris Fregly
 
Spark after Dark by Chris Fregly of Databricks
Spark after Dark by Chris Fregly of DatabricksSpark after Dark by Chris Fregly of Databricks
Spark after Dark by Chris Fregly of Databricks
Data Con LA
 
Scaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value StoresScaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value Stores
DataWorks Summit
 
MySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion QueriesMySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion Queries
Bernd Ocklin
 

More from Shun Nakamura (8)

HBase at LINE
MyCassandra: A Cloud Storage Supporting both Read Heavy and Write Heavy Workl...
I Went to Silicon Valley!
MyCassandra
A Cloud Storage That Achieves Both Read and Write Performance (OS-117-24)
A Cloud Storage with Selectable Read and Write Performance (DEIM2011-C3-3)
Cassandra Study Group
ComSys WIP


17th Cassandra Study Group: MyCassandra

  • 2. (24)
      • @sunsuk7tp
      • /P.A. WORKS /
      • CS M2
      • HPC: TSUBAME; MPI, Cell B.E., GPU CUDA, Hadoop on
      • P2P; NoSQL Afternoon in Japan (10.11.1, ); SACSIS 2011
      • Web 6: PHP, Perl, JavaScript; Apache Solr, MySQL; NoSQL
      • Jazz, trumpet
      • Cassandra 0.6.0: @railute, @yutuki_r, @techmemo; Itmedia 3; http://lab.jibun.atmarkit.co.jp/entries/1058
  • 3. NoSQL, Key-Value Store (KVS), Document-Oriented DB, GraphDB
      • Examples: memcached, Google Bigtable, Amazon Dynamo, Amazon SimpleDB, Apache Cassandra, Voldemort, Ringo, Vpork, MongoDB, CouchDB, Tokyo Tyrant, Flare, ROMA, kumofs, Kai, Redis, LevelDB, Hadoop HBase, Hypertable, Yahoo! PNUTS, Scalaris, Dynomite, ThruDB, Neo4j, IBM ObjectGrid, Oracle Coherence, (100)
      • join, transaction
  • 4. /DC
      • decentralized
      • master/slave
      • data/meta/proxy
      • Map Reduce
  • 5. SPOF, DC (Figure: data centers dc1, dc2, dc3; rack/dc; region.)
  • 6.
      • correlated failure
      • SPOF = “ ”
      Source: Daniel Ford et al. (Google), “Availability in Globally Distributed Storage Systems”, OSDI 2010
  • 7.   ⇒  !!   SPOF   ~ SPOF
  • 8. decentralized
      • proxy/master/slave
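  • 9. Consistent Hashing
      • ID space: A~Z; N := 3
      • primary node: Q
      • secondary node: V
      • hash(key) = Q
      (Figure: ring with IDs A, F, Q, V, Z. A request goes through a proxy to the primary and to secondary 1 and secondary 2, N nodes in total, which hold the key and its values.)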
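The replica lookup on the ring can be sketched as follows. This is a minimal illustration, not Cassandra's partitioner: the `Ring` class, the letter-based hash in `position_of`, and the clockwise "next N nodes" placement rule are assumptions chosen to mirror the slide's A~Z ring with N := 3.

```python
import bisect
import hashlib


class Ring:
    """Toy consistent-hash ring; node IDs are letters marking ring positions."""

    def __init__(self, node_ids, replicas=3):
        self.node_ids = sorted(node_ids)
        self.replicas = replicas

    def position_of(self, key):
        # Hypothetical hash: map a key onto one of the letters A..Z.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return chr(ord("A") + digest[0] % 26)

    def replica_nodes(self, position):
        # Primary: first node whose ID is >= position, wrapping around the ring.
        start = bisect.bisect_left(self.node_ids, position) % len(self.node_ids)
        count = min(self.replicas, len(self.node_ids))
        # Secondaries: the following nodes in clockwise order.
        return [self.node_ids[(start + i) % len(self.node_ids)] for i in range(count)]

    def lookup(self, key):
        return self.replica_nodes(self.position_of(key))


ring = Ring(["A", "F", "Q", "V", "Z"], replicas=3)
print(ring.replica_nodes("Q"))  # primary first, then the secondaries
print(ring.lookup("sato"))
```

Adding or removing one node only changes the replica sets of keys whose positions fall next to that node on the ring, which is the property that makes the scheme attractive for a decentralized cluster.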
  • 11. (Figure labels: SQL; map; Megastore library; relational data model; table (multi-dimensional sorted map); (sorted) records + indices; (sorted) map + indices; (sorted) map; RDB; Bigtable; KVS; NoSQL.)
  • 12.
      • PNUTS (VLDB ’08): MySQL, NoSQL
      • YCSB (SoCC ’10):
  • 13. (Figure: write-optimized vs. read-optimized systems under Write-Heavy and Read-Heavy workloads; the arrow marks the "Better" direction.)
  • 14.
      System           | Optimized for   | Storage       | Architecture
      Apache HBase     | write optimized | Bigtable like | centralized
      Apache Cassandra | write optimized | Bigtable like | decentralized
      Sharded MySQL    | read optimized  | MySQL         | centralized
      Yahoo! Sherpa    | read optimized  | MySQL         | centralized
      ⇒ Cassandra, MySQL
  • 16. = Dynamo + Bigtable
  • 17. = Dynamo + Bigtable (P2P/decentralized)
  • 18. = Dynamo + (P2P/decentralized)
  • 19.   RDBMS   Table •  / •  •  NoSQL !! query
  • 21. = Dynamo + (P2P/decentralized)
  • 22. MySQL = Dynamo + Bigtable Redis :
  • 24. 1 (master/worker, sharding, consistent hashing)
      • cache / persistence
      • index
      • write/read-optimized
  • 25. + MyCassandra
  • 26.
      • InnoDB (MySQL 5.1~ )
      • MyISAM
      • Memory
      • Merge
      • Archive
      • Federated
      • NDB
      • CSV
      • Blackhole ( )
      • FALCON
      • MariaDB
      • Drizzle InnoDB/MyISAM
      • solidDB MySQL Cluster
  • 27.
      • MySQL:
      • Bigtable: Cassandra
      • Redis: / snapshot
      • MongoDB: DB
  • 28.
      decentralized
        • RDB (MySQL / PostgreSQL)
        • master/slave decentralized
      MongoDB / Redis
        • MapReduce
      MySQL Bigtable
      MySQL (InnoDB) INSERT
      Bigtable INSERT/GET
      EC2+RDS MyCassandra
  • 29. / I/O
      • Bigtable (LSM-tree)
      • MySQL (B-trees/ )
      • Redis (Hash)
      • MongoDB (B-tree)
      • KyotoCabinet (B+ tree/hash)
  • 30. hash / B-Trees / LSM-Tree
      • write: 1 random I/O; append (LSM-Tree)
      • read: 1 random I/O; N random I/O + merge (LSM-Tree)
      • cache
      • hash: Memcached, Redis, KyotoCabinet
      • B-Trees: MySQL, MongoDB, KyotoCabinet
      • LSM-Tree: Cassandra, HBase, LevelDB
  • 31. LSM-Tree [P. O’Neil ’96]: write path
      • O(1)
      • sequential write I/O
      • Always writable; write-lock
      (Figure: write path. Writes <k1, v1>, <k1, v2> are appended sequentially to the Commit Log on disk (sync) and merged in memory into the Memtable as <k1, obj (v1+v2)>. The Memtable is flushed asynchronously to SSTables on disk: SSTable 1 <k1,obj1>, SSTable 2 <k1,obj2>, SSTable 3 <k1,obj3>.)
  • 32. Read path
      • Key
        • Memtable value
        • SSTable value: I/O
      (Figure: for a client read, <k1,obj> from the Memtable in memory is merged with <k1,obj1>, <k1,obj2>, <k1,obj3> from SSTables 1 to 3 on disk, giving <k1,obj+obj1~3>; each SSTable costs disk I/O. The Commit Log is on disk.)
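The write and read paths on the two slides above can be sketched as a toy in-memory model. This is an illustrative assumption-laden sketch, not Cassandra code: `TinyLSM`, `memtable_limit`, and the synchronous flush are hypothetical simplifications (the slide shows the flush as asynchronous, and real SSTables live on disk).

```python
class TinyLSM:
    """Toy LSM-tree: commit log + memtable + immutable 'SSTables' (all in memory)."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # stands in for the sequential on-disk log
        self.memtable = {}     # key -> dict of columns, held in memory
        self.sstables = []     # flushed tables, oldest first, never modified
        self.memtable_limit = memtable_limit

    def put(self, key, columns):
        # 1. Append to the commit log (sequential write, no read needed).
        self.commit_log.append((key, dict(columns)))
        # 2. Merge into the memtable in memory.
        self.memtable.setdefault(key, {}).update(columns)
        # 3. Flush when the memtable is full (done inline here for simplicity).
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        if self.memtable:
            self.sstables.append(self.memtable)
            self.memtable = {}
            self.commit_log.clear()

    def get(self, key):
        # A read must consult every SSTable plus the memtable and merge them,
        # newest data winning; this is why reads cost more than writes.
        merged = {}
        for table in self.sstables:
            merged.update(table.get(key, {}))
        merged.update(self.memtable.get(key, {}))
        return merged or None


db = TinyLSM(memtable_limit=2)
db.put("k1", {"a": 1})
db.put("k2", {"b": 2})          # memtable full: flushed to the first SSTable
db.put("k1", {"a": 3, "c": 4})  # newer value for k1 stays in the memtable
print(db.get("k1"))
```

The asymmetry is visible in the code: `put` touches only the log and the memtable, while `get` loops over every flushed table.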
  • 33. (Figure: latency distribution, number of queries vs. latency (ms), for read and write. Average: read 6.16 ms, write 0.69 ms (1/9). 99.9 percentile: write 2.0 ms, read 86.9 ms.)
  • 34. (Figure: Max. QPS for 40 Clients (qps, axis 0 to 40000) for Bigtable, MySQL, and Redis under Write Only, Write Heavy, Read Heavy, and Read Only workloads.)
  • 35.
      • /99%/Max/
      • ( KB~ MB)
      • HDD/SSD
      • (zipfian, uniform, latest)
      • Embedded InnoDB, KyotoCabinet
  • 37. (Figure: client → proxy → server → engine)
      client
        • o.a.c.cli
        • o.a.c.avro/thrift
      proxy
        • o.a.c.service.StorageProxy
      server
        • o.a.c.service.StorageService
        • o.a.c.db.ReadVerbHandler/RowMutationVerbHandler
      engine
        • o.a.c.db.Table (keyspace )
        • o.a.c.db.commitlog
        • o.a.c.db.ColumnFamilyStore (columnfamily )
        • o.a.c.db.engine.StorageEngineInterface
        • o.a.c.db.engine.MySQLInstance, RedisInstance, MongoDBInstance, …
  • 38.
      OK
        • put (key, cf)
        • get (key)
        • getRangeSlice (startWith, engWith, maxResults)
        • truncate/dropTable/dropDB
      (heading missing)
        • secondaryIndex
        • expire
        • counter (Cassandra-0.8 )
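A pluggable engine layer with the operations listed above might look roughly like this. The sketch is an assumption, not the actual `o.a.c.db.engine.StorageEngineInterface`: the class names, the snake_case method names, the inclusive key range, and the in-memory engine are all hypothetical, and the real interface is Java.

```python
from abc import ABC, abstractmethod


class StorageEngine(ABC):
    """Hypothetical engine contract mirroring put/get/getRangeSlice/truncate."""

    @abstractmethod
    def put(self, key, cf):
        """Store the serialized column family value under key."""

    @abstractmethod
    def get(self, key):
        """Return the stored value, or None if the key is absent."""

    @abstractmethod
    def get_range_slice(self, start_with, end_with, max_results):
        """Return up to max_results (key, value) pairs in key order."""

    @abstractmethod
    def truncate(self):
        """Remove every row."""


class InMemoryEngine(StorageEngine):
    """Stand-in for a MySQL/Redis/MongoDB-backed engine."""

    def __init__(self):
        self.rows = {}

    def put(self, key, cf):
        self.rows[key] = cf
        return True

    def get(self, key):
        return self.rows.get(key)

    def get_range_slice(self, start_with, end_with, max_results):
        # Assumption: both range bounds are inclusive.
        keys = sorted(k for k in self.rows if start_with <= k <= end_with)
        return [(k, self.rows[k]) for k in keys[:max_results]]

    def truncate(self):
        self.rows.clear()


engine = InMemoryEngine()
for name in ("sato", "suzuki", "tanaka"):
    engine.put(name, name.upper().encode("utf-8"))
print(engine.get_range_slice("sato", "tanaka", 2))
```

Keeping the contract this small is what would let the same proxy and server layers sit on top of engines with very different internals.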
  • 39. Cassandra
      • keyspace – columnfamily – column
      • key/value( )
      • ColumnFamily: SSTable <key, value>; value: columnFamily

      Bigtable (Cassandra), Keyspace:

      ColumnFamily A
      key    | gender | age | region
      sato   | male   | 17  | [null]
      suzuki | female | 21  | Tokyo

      ColumnFamily B
      key    | visits | plan
      sato   | 18     | Gold
      suzuki | 214    | Bronze
  • 40. Cassandra
      • Super Column
      • SSTable key-value
      • KVS key prefix
  • 41. (Table: Cassandra / MySQL / Redis.) keyspace / database / db; column family, table, record, column, field
  • 42.
      RDB (MySQL), database:

      table A
      key    | values
      sato   | gender;male;age;17
      suzuki | gender;female;age;21;region;Tokyo

      table B
      key    | values
      sato   | visits;18;plan;Gold
      suzuki | visits;214;plan;Bronze

      KVS (Redis), db:

      key      | values
      A:sato   | …
      B:ito    | …
      A:suzuki | …
      B:tanaka | …

      Bigtable (Cassandra), keyspace:

      columnfamily A
      key    | gender | age | region
      sato   | male   | 17  | [null]
      suzuki | female | 21  | Tokyo

      columnfamily B
      key    | visits | plan
      sato   | 18     | Gold
      suzuki | 214    | Bronze
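The mapping in the tables above can be sketched as two small functions. This follows only what the slide shows: a `name;value;...` string per row for MySQL and a `columnfamily:key` prefix for Redis keys. The function names are hypothetical, skipping null columns is inferred from the `sato` row, and reusing the same string as the Redis value is an assumption, since the slide elides Redis values as "…".

```python
def to_mysql_row(key, columns):
    """Flatten a column-family row into (key, 'name;value;name;value')."""
    parts = []
    for name, value in columns.items():
        if value is not None:  # null columns do not appear in the flattened value
            parts += [name, str(value)]
    return key, ";".join(parts)


def to_redis_pair(column_family, key, columns):
    """Prefix the key with the column family name, as in 'A:sato'."""
    # Assumption: the value reuses the flattened string from to_mysql_row.
    return f"{column_family}:{key}", to_mysql_row(key, columns)[1]


sato_a = {"gender": "male", "age": 17, "region": None}
suzuki_b = {"visits": 214, "plan": "Bronze"}

print(to_mysql_row("sato", sato_a))
print(to_redis_pair("B", "suzuki", suzuki_b))
```

Because a flat key-value store has no notion of tables, the prefix is what keeps rows from column families A and B apart inside a single Redis db.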
  • 43.
      • MySQL database = keyspace :=> MyCassandra (MySQL)
      • MySQL table = keyspace :=> Cassandra

      Bigtable (Cassandra), keyspace:

      columnfamily A
      key    | gender | age | region
      sato   | male   | 17  | [null]
      suzuki | female | 21  | Tokyo

      columnfamily B
      key    | visits | plan
      sato   | 18     | Gold
      suzuki | 214    | Bronze

      MySQL, Table:

      key    | gender | age | region | visits | plan
      sato   | male   | 17  | [null] | 18     | Gold
      suzuki | female | 21  | Tokyo  | 214    | Bronze
  • 44. (Figure labels: rowKey; token; CF (Serialized Object); counter; secondary index; Key-Value / KVS: Key, Value; …)
  • 45. (Figure: a write query goes synchronously to W (Bigtable) and asynchronously to R (MySQL); a read query goes asynchronously to W (Bigtable) and synchronously to R (MySQL).)
  • 46.
      • W:
      • R:
      • RW:
      Quorum Protocol: ( ) + ( ) > ( )
      • write, read
      (Figure: a write query is sent sync to W and async to R; nodes W, RW, R.)
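The quorum condition can be checked mechanically. The three terms of the inequality were lost in extraction; the sketch below assumes the classic form, (replicas written synchronously) + (replicas read synchronously) > (total replicas), and the function name is hypothetical.

```python
def is_quorum_consistent(write_replicas, read_replicas, total_replicas):
    """True when every read set must overlap every write set."""
    return write_replicas + read_replicas > total_replicas


# With 3 replicas, list the (write, read) sizes that guarantee an overlap.
total = 3
valid = [
    (w, r)
    for w in range(1, total + 1)
    for r in range(1, total + 1)
    if is_quorum_consistent(w, r, total)
]
print(valid)
```

With three replicas and a quorum of two, any two nodes that acknowledged a write share at least one node with any two nodes answering a read, so a read can always see the latest write.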
  • 47. (write)
      • W:
      • R:
      • RW:
      • =3, =2; W:RW:R = 1:1:1
      1)
      2) W, RW: ACK
      3a)
      3b) R: ACK
      : max (W, RW)
      (Figure: Client → Proxy → W, RW, R.)
  • 48. (read)
      • W:
      • R:
      • RW:
      • =3, =2; W:RW:R = 1:1:1
      1)
      2) R, RW
      3a)
      3b) or W
      4) (Cassandra read repair )
      : max (R, RW)
      (Figure: Client → Proxy → W, RW, R.)
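The routing on the two slides above, where writes wait on W and RW nodes and reads wait on R and RW nodes, can be sketched as below. The names `SYNC_TYPES`, `plan`, and `latency_ms`, and the sample latencies, are hypothetical; the sketch assumes one replica of each type (W:RW:R = 1:1:1) and ignores failures and read repair.

```python
# Node types the proxy waits for (sync); the rest are updated in the background.
SYNC_TYPES = {"write": {"W", "RW"}, "read": {"R", "RW"}}


def plan(operation, replicas):
    """Split replicas (node name -> node type) into (sync, async) name sets."""
    sync = {name for name, kind in replicas.items() if kind in SYNC_TYPES[operation]}
    return sync, set(replicas) - sync


def latency_ms(operation, replicas, latencies):
    """Client-visible latency: the slowest of the nodes waited on synchronously."""
    sync, _ = plan(operation, replicas)
    return max(latencies[name] for name in sync)


replicas = {"n1": "W", "n2": "RW", "n3": "R"}

# Hypothetical per-node latencies for one write and one read.
write_latencies = {"n1": 0.7, "n2": 0.3, "n3": 5.0}
read_latencies = {"n1": 6.0, "n2": 0.3, "n3": 1.0}

print(plan("write", replicas), latency_ms("write", replicas, write_latencies))
print(plan("read", replicas), latency_ms("read", replicas, read_latencies))
```

The slow node for each operation type (R for writes, W for reads) falls outside the synchronous set, so it does not contribute to the latency the client observes.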
  • 49. (Figure: max. qps for 40 clients (query/sec, axis 0 to 20000), Cassandra vs. MyCassandra Cluster, for [write:read] ratios [100:0] Write-Only, [50:50] Write-Heavy, [5:95] Read-Heavy, [0:100] Read-Only. Ratio labels shown on the chart: 0.90, 6.49, 1.54, 0.93.)
      • YCSB / Zipfian
      • 6.49
  • 50. https://github.com/sunsuk7tp/MyCassandra
      MyCassandra-0.2.0 ( )
        • based on Cassandra-0.7.5
        • Basic CRUD on a simple record
        • RangeSlice
        • keyspace
  • 51.
      1. cassandra.yaml
         • engine host, port, …
         • default engine
      2. ( )
      3. MyCassandra (Cassandra )
      4. or keyspace, columnfamily
         • engine (keyspace )
         • (column family )
  • 52.
      Embedded InnoDB
        • HailDB: …
        • Handler Socket: …
        • ExtraDB
        • API
      DBM (KyotoCabinet)
        • KyotoCassandra/Kyossandra/ ssandra ( )
        • NoSQL
        • QDBM, TC Hash or B+Tree db
  • 53.
      • hash/B+tree

      class       | persistence | algorithm      | lock unit
      ProtoHashDB | volatile    | hash           | whole (rwlock)
      ProtoTreeDB | volatile    | red black tree | whole (rwlock)
      StashDB     | volatile    | hash           | record (rwlock)
      CacheDB     | volatile    | hash           | record (mutex)
      GrassDB     | volatile    | B+ tree        | page (rwlock)
      HashDB      | persistent  | hash           | record (rwlock)
      TreeDB      | persistent  | B+ tree        | page (rwlock)
      DirDB       | persistent  | undefined      | record (rwlock)
      ForestDB    | persistent  | B+ tree        | page (rwlock)
  • 54.
      MyCassandra-0.2.2
        • secondaryIndex: MySQL, MongoDB
      MyCassandra-0.3.0
        • Based on Cassandra-0.8
        • Atomic counter
        • Brisk (Hadoop + Cassandra)…
  • 56.
      Cassandra /expire
        • tombstone
        • SSTable
        • Bigtable like
      MyCassandra Bigtable
        • expire
        • 1 Table
  • 57. (Figure: three instances, each with its own engine; ping, detect. Open questions: instance ? node down ?)
  • 58.   •    Redis   MongoDB   •    key   Join  
  • 59.   •    •  Cassandra-0.6 :   GC   •  Cassandra-0.7, 0.8:           …
  • 60.
      Issue
        • https://github.com/sunsuk7tp/MyCassandra/issues
      Twitter
        • @MyCassandraJP
        • @_MyCassandra # @MyCassandra orz
        • @sunsuk7tp #
      Google Groups
        • https://groups.google.com/group/my-cassandra
  • 61.
      / @railute
        • Cassandra
      Gemini Mobile Technologies / @geminimobile
        • Hibari
      / @yutuki_r
        • Cassandra twitter
      dann / @techmemo
        • Cassandra
      / @tatsuya6502
        • YCSB, Hibari
      / @mikio1978 / @fallabs
        • KyotoCabinet
      / @muga_nishizawa
      / @Nakata_itpro
      / @shudo
      Cassandra
      UST ( )