Google’s BigTable
Out of the Slipstream :: July 3, 2008
The BigTable Goals
  • Wide applicability: used in more than 60 Google products
  • Scalability
  • High performance
  • High availability
The BigTable Arena: Internet Scale
  • Google :: BigTable and GFS
  • Apache :: HBase and HDFS
  • Amazon :: SimpleDB and S3
  • Facebook :: Cachr and Haystack
The BigTable Features
  • Dynamic control over data layout and format
  • Data is uninterpreted strings
  • “Does not support a full relational model”
  • Locality of data
  • Dynamic control over serving data from memory or disk
  • Sparse, distributed, persistent multidimensional sorted map
  • The map is indexed by:
      • a row key
      • a column name
      • a timestamp
  • Each value in the map is an uninterpreted array of bytes
  • Column oriented
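The sorted-map description above can be made concrete with a small sketch. This is a toy in-memory stand-in written in plain Python, not BigTable's API: the class name `ToyTable` and its methods are hypothetical, and the real system is distributed and persistent. It only illustrates the indexing by (row key, column name, timestamp), the byte-string values, and why sorted row keys give locality for range scans.

```python
import bisect


class ToyTable:
    """Toy stand-in for the map (row, column, timestamp) -> bytes (hypothetical)."""

    def __init__(self):
        self._keys = []   # sorted list of (row, column, -timestamp)
        self._cells = {}  # key tuple -> value bytes

    def put(self, row, column, timestamp, value):
        key = (row, column, -timestamp)  # negate so newer versions sort first
        if key not in self._cells:
            bisect.insort(self._keys, key)
        self._cells[key] = value

    def get_latest(self, row, column):
        """Return the newest value stored for (row, column), or None if absent."""
        i = bisect.bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._cells[self._keys[i]]
        return None

    def scan_rows(self, start_row, end_row):
        """Yield (row, column, timestamp, value) for start_row <= row < end_row."""
        i = bisect.bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < end_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._cells[self._keys[i]]
            i += 1
```

Because the keys are kept sorted, rows with adjacent keys sit next to each other, so a range scan reads one contiguous slice. The map is sparse in the sense that only cells that were actually written take up space.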
Architecture
  • GFS
  • SSTables
  • Tables
  • Chubby
  • Clusters
  • Tablets
  • Tablet Servers
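As a rough illustration of how tables, tablets, and tablet servers relate: a table is split by row range into tablets, and each tablet is served by a tablet server. The sketch below assumes a made-up layout; the end keys, the server names, and the functions `locate_tablet` and `server_for` are all hypothetical. It shows only the lookup idea and none of the real metadata handling.

```python
import bisect

# Hypothetical layout: each tablet covers rows up to and including its end key;
# the last tablet has no end key and covers every row after the final boundary.
TABLET_END_KEYS = ["g", "p"]                            # sorted boundaries
TABLET_SERVERS = ["server-a", "server-b", "server-c"]   # one server per tablet


def locate_tablet(row_key):
    """Return the index of the tablet whose row range contains row_key."""
    return bisect.bisect_left(TABLET_END_KEYS, row_key)


def server_for(row_key):
    """Return the (made-up) tablet server responsible for row_key."""
    return TABLET_SERVERS[locate_tablet(row_key)]
```

Since rows are sorted, finding the responsible tablet is a binary search over the tablet boundaries rather than a scan of the whole table.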
Table Structure Columns Timestamp / version Key Table Indexes Column Families Expando Columns
Google App Engine
App Engine
  • BigTable + Python + App Engine SDK
  • Choice of web frameworks:
      • webapp (pre-installed)
      • Django
      • CherryPy
      • Pylons
      • Web.py
  • Google Accounts integration
  • App Engine SDK for offline development
  • Offline development environment
  • Online runtime environment
  • Free to get started
  • Priced similar to Amazon S3
Getting Started
  • Sign up for an account
  • Download Python 2.5
  • Download the App Engine SDK:
      • Local version of BigTable
      • Web server
      • Google user account simulator
      • webapp framework
  • Getting started tutorial
  • Write your application
  • Upload to Google
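Class Definition
Python code to declare a datastore class:

    from google.appengine.ext import db

    class Patient(db.Model):
        # Text fields use StringProperty; UserProperty holds a Google Account user
        firstName = db.StringProperty()
        lastName = db.StringProperty()
        dateOfBirth = db.DateTimeProperty()
        sex = db.StringProperty()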
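Create
Python code to create and store an object:

    import datetime

    patient = Patient()
    patient.firstName = "George"
    patient.lastName = "James"
    # DateTimeProperty requires a datetime value, not a string
    patient.dateOfBirth = datetime.datetime(2008, 1, 1)
    patient.sex = "Male"  # same value the queries filter on
    patient.put()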
Query
Python code to query a class:

    patients = Patient.all()
    for patient in patients:
        # write() takes one string, so format it with % first
        self.response.out.write('Name %s %s.' % (patient.firstName, patient.lastName))
More complex query
Python code to select the 100 youngest male patients:

    allPatients = Patient.all()
    allPatients.filter('sex =', 'Male')
    # Descending date of birth puts the youngest patients first
    allPatients.order('-dateOfBirth')
    patients = allPatients.fetch(100)
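To make the intent of "the 100 youngest male patients" explicit, here is the same selection written against an ordinary Python list, with no App Engine SDK involved. `PatientRecord` and `youngest` are hypothetical names used only for this sketch. The point is the ordering: the youngest patients have the latest dates of birth, so the sort must be descending.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical plain-Python record standing in for a datastore entity.
PatientRecord = namedtuple("PatientRecord", "firstName lastName dateOfBirth sex")


def youngest(patients, sex, limit):
    """Return up to `limit` patients of the given sex, youngest first."""
    matching = [p for p in patients if p.sex == sex]          # the filter step
    matching.sort(key=lambda p: p.dateOfBirth, reverse=True)  # latest birth first
    return matching[:limit]                                   # the fetch(limit) step


records = [
    PatientRecord("George", "James", datetime(2008, 1, 1), "Male"),
    PatientRecord("Anna", "Smith", datetime(2007, 5, 1), "Female"),
    PatientRecord("Tom", "Brown", datetime(1990, 3, 2), "Male"),
]
youngest_males = youngest(records, "Male", 100)
```

An ascending sort on `dateOfBirth` would return the oldest patients first instead.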
Query using GQL
GQL = Google Query Language
GQL code to select the 100 youngest male patients:

    SELECT * FROM Patient WHERE sex = 'Male' ORDER BY dateOfBirth DESC LIMIT 100

  • Cannot select specific columns
  • No joins
Indexes
  • Development SDK: index definitions are generated automatically based on data access within your application
  • Index definitions are uploaded to the Google server

    - kind: Patient
      properties:
      - name: dateOfBirth
        direction: asc
      - name: sex
        direction: desc
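One way to picture why a query needs a matching index is sketched below. It is a simplification in plain Python, assuming an index whose entries are ordered by `sex` first and then by `dateOfBirth`; it is not the datastore's actual storage format, and `build_index` and `youngest_keys` are hypothetical names. With entries in that order, a filter on `sex` plus an ordering on `dateOfBirth` becomes a read of one contiguous slice.

```python
import bisect
from datetime import date


def build_index(entities):
    """entities maps key -> {'sex': str, 'dateOfBirth': date}.

    Returns a sorted list of (sex, dateOfBirth, key) index entries.
    """
    return sorted((e["sex"], e["dateOfBirth"], k) for k, e in entities.items())


def youngest_keys(index, sex, limit):
    """Return up to `limit` entity keys with the given sex, youngest first."""
    lo = bisect.bisect_left(index, (sex,))
    hi = bisect.bisect_left(index, (sex + "\x00",))
    # index[lo:hi] holds every entry with this sex, in ascending date order;
    # walking the slice backwards yields the latest dates of birth first.
    return [key for _, _, key in reversed(index[lo:hi])][:limit]


entities = {
    "p1": {"sex": "Male", "dateOfBirth": date(2008, 1, 1)},
    "p2": {"sex": "Female", "dateOfBirth": date(2007, 5, 1)},
    "p3": {"sex": "Male", "dateOfBirth": date(1990, 3, 2)},
}
index = build_index(entities)
```

The query never examines entries for other values of `sex`, which is why such a lookup stays cheap as the table grows.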
Indexes
Data Viewer
Conclusions
  • BigTable is an Internet Scale solution
  • Conventional databases are not up to the job
  • Home grown solutions
  • Increasing demand
  • ???
  • Profit
Thank you
Questions?
