SlideShare a Scribd company logo
Mongo & Ecommerce
Mongo & Ecommerce
    A perfect combination
SQL Dilemma
   Pick One
SQL Dilemma
                 Pick One


• Stick to one vertical (Sane schema)
SQL Dilemma
                 Pick One


• Stick to one vertical (Sane schema)
• Flexiblity (Insane schema)
Sane Schema
Sane Schema


• Works fine for a few types of products
Sane Schema


• Works fine for a few types of products
• Not possible when more product types
  introduced.
Let’s Use an Example
Let’s Use an Example
   How about we start with books
Book Product Schema
id:
sku:
product dimensions:                   General Product
shipping weight:
MSRP:
                                      attributes
price:
description:
...

author:       Orson Scott Card
title:      Enders Game
binding:       Hardcover
publication date: July 15, 1994       Book Specific
publisher name: Tor Science Fiction   attributes
number of pages: 352
ISBN:        0812550706
language:       English
...
Seems simple enough
Seems simple enough

What happens when we add another vertical...
            say music albums
Album Product Schema
id:
sku:
product dimensions:                General Product
shipping weight:
MSRP:
                                   attributes stay the
price:                             same
description:
...

artist:      MxPx
title:      Panic                  Album Specific
release date: June 7, 2005
label:       Side One Dummy        attributes are
track listing: [ The Darkest ...   different
language:        English
format:        CD
Okay, it’s getting hairy but
is still manageable, right?
Okay, it’s getting hairy but
is still manageable, right?

    Now the business want to sell jeans
Jeans Product Schema
id:
sku:                         General Product
product dimensions:
shipping weight:             attributes stay the
MSRP:                        same
price:
description:
...

brand:       Lucky
gender:       Mens           Jeans specific
make:        Vintage         attributes are totally
style:      Straight Cut     different ... and not
length:      34
width:       34              consistent across
color:      Hipster          brands & make
material:     Cotten Blend
...
Now we’re screwed
We need a flexible
schema in RDBMS
    We got this ... right?
Common approaches dealing with
unknown unknowns in RDBMS world
Common approaches dealing with
unknown unknowns in RDBMS world
•   EAV (Entity Attribute Value popularized by
    Magento)
Common approaches dealing with
unknown unknowns in RDBMS world
•   EAV (Entity Attribute Value popularized by
    Magento)
•   Single table inheritance aka. Insanely wides tables
    (field 1, field 2, ...)
Common approaches dealing with
unknown unknowns in RDBMS world
•   EAV (Entity Attribute Value popularized by
    Magento)
•   Single table inheritance aka. Insanely wides tables
    (field 1, field 2, ...)
•   Generic columns ( ‘attribute1’, ‘attribute2’, ...)
Common approaches dealing with
unknown unknowns in RDBMS world
•   EAV (Entity Attribute Value popularized by
    Magento)
•   Single table inheritance aka. Insanely wides tables
    (field 1, field 2, ...)
•   Generic columns ( ‘attribute1’, ‘attribute2’, ...)
•   Serialized data in blob (or similar)
Common approaches dealing with
unknown unknowns in RDBMS world
•   EAV (Entity Attribute Value popularized by
    Magento)
•   Single table inheritance aka. Insanely wides tables
    (field 1, field 2, ...)
•   Generic columns ( ‘attribute1’, ‘attribute2’, ...)
•   Serialized data in blob (or similar)
•   Concrete table inheritance
Common approaches dealing with
unknown unknowns in RDBMS world
•   EAV (Entity Attribute Value popularized by
    Magento)
•   Single table inheritance aka. Insanely wides tables
    (field 1, field 2, ...)
•   Generic columns ( ‘attribute1’, ‘attribute2’, ...)
•   Serialized data in blob (or similar)
•   Concrete table inheritance
•   Class table inheritance
None Work Well
    Let’s look at why
EAV
            as popularized by Magento
“For purposes of flexibility, the Magneto database heavily utilizes
         an Entity-Attribute-Value (EAV) data model.

   As is often the case, the cost of flexibility is complexity -
                   Magento is no exception.

  The process of manipulating data in Magento is often more
  “involved” than that typically experienced using traditional
                       relational tables.”
                                          - Varien
EAV
•   Crazy SQL queries.

•   Hundreds of joins in a
    query... or

•   Hundreds of queries
    joined in the application

•   No database enforced
    integrity
Did I say crazy SQL
(this is a single query)
Did I say crazy SQL
(this is a single query)




  You may have trouble reading this in the back
Selecting a single
    product
Single Table Inheritance
       (insanely wide tables)
•   No data integrity
    enforcement

•   Only can use FK for
    common elements

•   Very wasteful (but disk is
    cheap!)

•   Can’t effectively index
Generic Columns
•   No data integrity
    enforcement

•   No data type enforcement

•   Only can use FK for
    common elements

•   Wasteful (but disk is
    cheap!)

•   Can’t index
Serialized in Blob
•   Not searchable

•   No integrity

•   All the disadvantages of a
    document store, but none
    of the advantages

•   Never should be used

•   One exception is Oracle
    XML which operates
    similar to a document store
Concrete Table Inheritance
(a table for each product attribute set)

•   Allows for data integrity

•   Querying across
    attribute sets quite hard
    to do (lots of joins, OR
    statements and full table
    scanning)

•   New table needs to be
    created for each new
    attribute set
Class table inheritance
              (single product table,
         each attribute set in own table)
•   Likely best solution within the
    constraint of SQL

•   Supports data type enforcement

•   No data integrity enforcement

•   Easy querying across categories
    (for browse pages) since
    common data in single table

•   Every set needs a new table

•   Requires a ton of forsight, as
    changes are very complicated
Square Peg Round Hole
Mongo to the Rescue

• Flexible (and sane) Schema
• Easily searchable
• Easily accessible
• Fast
Examples

• grab products.js from http://github.com/
  spf13/mongoProducts
• mongo --shell products.js
• > use mongoProducts
Schema
Flexible Structure




             DVD
CD
Basic Queries
•   db.products.find( { 'details.title': "The
    Matrix" } );
•   db.products.find( { 'details.director': "Frank
    Capra" } );
•   db.products.find( { 'details.actor': "Groucho
    Marx" } );
•   db.products.find( { 'details.tracks': "Violins" } );
•   db.order.find( { 'items.sku': '00e8da9f' } );
Advanced Queries

• db.products.find( { 'pricing.pct_savings':
  { $gt : 15 } } );
• db.products.find( { 'details.actor': 'James
  Stewart', 'details.actor': 'Donna Reed' })
• db.order.find( { 'items.details.actor': /reed$/
  i }, {'items.sku': 1} );
Embedded documents
  great for orders
• Ordered items need to be fixed at the time
  of purchase
• Embed them right in the order
• db.order.find( { 'items.sku': '00e8da9f' } );
• db.order.find( { 'items.details.actor': 'James
  Stewart' } ).count()
What about the $$?
Stricter data
requirements for $$
Stricter data
    requirements for $$
•   For financial systems any data inconsistency is
    unacceptable
Stricter data
    requirements for $$
•   For financial systems any data inconsistency is
    unacceptable
•   Mongo supports atomic single document writes
Stricter data
    requirements for $$
•   For financial systems any data inconsistency is
    unacceptable
•   Mongo supports atomic single document writes
•   Mongo doesn’t support multi-document
    transactions
Stricter data
    requirements for $$
•   For financial systems any data inconsistency is
    unacceptable
•   Mongo supports atomic single document writes
•   Mongo doesn’t support multi-document
    transactions
•   If you need atomic writes across multi-document
    transactions don’t use Mongo.
Stricter data
    requirements for $$
•   For financial systems any data inconsistency is
    unacceptable
•   Mongo supports atomic single document writes
•   Mongo doesn’t support multi-document
    transactions
•   If you need atomic writes across multi-document
    transactions don’t use Mongo.
•   Many ecommerce transactions could be
    accomplished within a single document write.
What about ACID?
What about ACID?
• ACID compliance is difficult to implement
  with a distributed system. For the most
  sensitive data you may want the reliability
  ACID guarantees.
What about ACID?
• ACID compliance is difficult to implement
  with a distributed system. For the most
  sensitive data you may want the reliability
  ACID guarantees.
• Mongo is ACI for single collection writes.
What about ACID?
• ACID compliance is difficult to implement
  with a distributed system. For the most
  sensitive data you may want the reliability
  ACID guarantees.
• Mongo is ACI for single collection writes.
• For many systems ACID compliance for
  recording a purchase is overkill and
  unnecessary.
About Me

• VP Engineering @ OpenSky
• @spf13
• github.com/spf13
• Hacker, Father, Husband, Entrepreneur,
  Skate Punk
About OpenSky
OpenSky is a better way to shop where you buy from people you trust.
OpenSky is a free online platform that helps people discover
amazing products; share them with their friends, family and followers;
and earn money.

Launching in July 2010, the OpenSky is the first ecommerce platform built
on MongoDB.  The OpenSky Platform has been created using MongoDB,
Symfony 2, Doctrine 2, PHP 5.3, jQuery, node.js, Git (with gitflow) and a
touch of Java and Python. We support and contribute to many of these
open source projects.  

OpenSky was founded by John Caplan, former CEO of Ford Models and
former President of the About.com Network. OpenSky is well funded by
Highland Capital Partners, Canaan Partners and Ron Conway and is based
in NYC near Union Square.
Interested in what
        you saw?

• We’re Hiring
• Contact me at jobs@shopOpenSky.com
Questions
Appendix
OpenSky currently
uses a hybrid approach
• OpenSky uses a 3rd party library to
  process transactions it is written to only
  use SQL for transactions. In 2011 we will
  transition off this library.
• OpenSky uses Solr for search.
• All other functions of the site currently on
  MongoDB.
Note on Ecommerce
         and Mongo
  Concerning transactions, it is true that MongoDB doesn’t support
the multi-object kind; however, it does support atomic operations on
    individual documents. And this, combined with the document-
oriented modeling just described and a little creativity, is adequate to
many e-commerce problems. Certainly, if we needed to literally debit
   one account and credit another in the same operation, or if we
  wanted rollback, we’d need full-fledged transactions. However, the
 transactionality provided by MongoDB may be sufficient for most, if
                    not all, e-commerce operations.
                                         - Kyle Banker 10gen

More Related Content

MongoDB and Ecommerce : A perfect combination

  • 2. Mongo & Ecommerce A perfect combination
  • 3. SQL Dilemma Pick One
  • 4. SQL Dilemma Pick One • Stick to one vertical (Sane schema)
  • 5. SQL Dilemma Pick One • Stick to one vertical (Sane schema) • Flexiblity (Insane schema)
  • 7. Sane Schema • Works fine for a few types of products
  • 8. Sane Schema • Works fine for a few types of products • Not possible when more product types introduced.
  • 9. Let’s Use an Example
  • 10. Let’s Use an Example How about we start with books
  • 11. Book Product Schema id: sku: product dimensions: General Product shipping weight: MSRP: attributes price: description: ... author: Orson Scott Card title: Enders Game binding: Hardcover publication date: July 15, 1994 Book Specific publisher name: Tor Science Fiction attributes number of pages: 352 ISBN: 0812550706 language: English ...
  • 13. Seems simple enough What happens when we add another vertical... say music albums
  • 14. Album Product Schema id: sku: product dimensions: General Product shipping weight: MSRP: attributes stay the price: same description: ... artist: MxPx title: Panic Album Specific release date: June 7, 2005 label: Side One Dummy attributes are track listing: [ The Darkest ... different language: English format: CD
  • 15. Okay, it’s getting hairy but is still manageable, right?
  • 16. Okay, it’s getting hairy but is still manageable, right? Now the business want to sell jeans
  • 17. Jeans Product Schema id: sku: General Product product dimensions: shipping weight: attributes stay the MSRP: same price: description: ... brand: Lucky gender: Mens Jeans specific make: Vintage attributes are totally style: Straight Cut different ... and not length: 34 width: 34 consistent across color: Hipster brands & make material: Cotten Blend ...
  • 19. We need a flexible schema in RDBMS We got this ... right?
  • 20. Common approaches dealing with unknown unknowns in RDBMS world
  • 21. Common approaches dealing with unknown unknowns in RDBMS world • EAV (Entity Attribute Value popularized by Magento)
  • 22. Common approaches dealing with unknown unknowns in RDBMS world • EAV (Entity Attribute Value popularized by Magento) • Single table inheritance aka. Insanely wides tables (field 1, field 2, ...)
  • 23. Common approaches dealing with unknown unknowns in RDBMS world • EAV (Entity Attribute Value popularized by Magento) • Single table inheritance aka. Insanely wides tables (field 1, field 2, ...) • Generic columns ( ‘attribute1’, ‘attribute2’, ...)
  • 24. Common approaches dealing with unknown unknowns in RDBMS world • EAV (Entity Attribute Value popularized by Magento) • Single table inheritance aka. Insanely wides tables (field 1, field 2, ...) • Generic columns ( ‘attribute1’, ‘attribute2’, ...) • Serialized data in blob (or similar)
  • 25. Common approaches dealing with unknown unknowns in RDBMS world • EAV (Entity Attribute Value popularized by Magento) • Single table inheritance aka. Insanely wides tables (field 1, field 2, ...) • Generic columns ( ‘attribute1’, ‘attribute2’, ...) • Serialized data in blob (or similar) • Concrete table inheritance
  • 26. Common approaches dealing with unknown unknowns in RDBMS world • EAV (Entity Attribute Value popularized by Magento) • Single table inheritance aka. Insanely wides tables (field 1, field 2, ...) • Generic columns ( ‘attribute1’, ‘attribute2’, ...) • Serialized data in blob (or similar) • Concrete table inheritance • Class table inheritance
  • 27. None Work Well Let’s look at why
  • 28. EAV as popularized by Magento “For purposes of flexibility, the Magneto database heavily utilizes an Entity-Attribute-Value (EAV) data model. As is often the case, the cost of flexibility is complexity - Magento is no exception. The process of manipulating data in Magento is often more “involved” than that typically experienced using traditional relational tables.” - Varien
  • 29. EAV • Crazy SQL queries. • Hundreds of joins in a query... or • Hundreds of queries joined in the application • No database enforced integrity
  • 30. Did I say crazy SQL (this is a single query)
  • 31. Did I say crazy SQL (this is a single query) You may have trouble reading this in the back
  • 33. Single Table Inheritance (insanely wide tables) • No data integrity enforcement • Only can use FK for common elements • Very wasteful (but disk is cheap!) • Can’t effectively index
  • 34. Generic Columns • No data integrity enforcement • No data type enforcement • Only can use FK for common elements • Wasteful (but disk is cheap!) • Can’t index
  • 35. Serialized in Blob • Not searchable • No integrity • All the disadvantages of a document store, but none of the advantages • Never should be used • One exception is Oracle XML which operates similar to a document store
  • 36. Concrete Table Inheritance (a table for each product attribute set) • Allows for data integrity • Querying across attribute sets quite hard to do (lots of joins, OR statements and full table scanning) • New table needs to be created for each new attribute set
  • 37. Class table inheritance (single product table, each attribute set in own table) • Likely best solution within the constraint of SQL • Supports data type enforcement • No data integrity enforcement • Easy querying across categories (for browse pages) since common data in single table • Every set needs a new table • Requires a ton of forsight, as changes are very complicated
  • 39. Mongo to the Rescue • Flexible (and sane) Schema • Easily searchable • Easily accessible • Fast
  • 40. Examples • grab products.js from http://github.com/ spf13/mongoProducts • mongo --shell products.js • > use mongoProducts
  • 43. Basic Queries • db.products.find( { 'details.title': "The Matrix" } ); • db.products.find( { 'details.director': "Frank Capra" } ); • db.products.find( { 'details.actor': "Groucho Marx" } ); • db.products.find( { 'details.tracks': "Violins" } ); • db.order.find( { 'items.sku': '00e8da9f' } );
  • 44. Advanced Queries • db.products.find( { 'pricing.pct_savings': { $gt : 15 } } ); • db.products.find( { 'details.actor': 'James Stewart', 'details.actor': 'Donna Reed' }) • db.order.find( { 'items.details.actor': /reed$/ i }, {'items.sku': 1} );
  • 45. Embedded documents great for orders • Ordered items need to be fixed at the time of purchase • Embed them right in the order • db.order.find( { 'items.sku': '00e8da9f' } ); • db.order.find( { 'items.details.actor': 'James Stewart' } ).count()
  • 48. Stricter data requirements for $$ • For financial systems any data inconsistency is unacceptable
  • 49. Stricter data requirements for $$ • For financial systems any data inconsistency is unacceptable • Mongo supports atomic single document writes
  • 50. Stricter data requirements for $$ • For financial systems any data inconsistency is unacceptable • Mongo supports atomic single document writes • Mongo doesn’t support multi-document transactions
  • 51. Stricter data requirements for $$ • For financial systems any data inconsistency is unacceptable • Mongo supports atomic single document writes • Mongo doesn’t support multi-document transactions • If you need atomic writes across multi-document transactions don’t use Mongo.
  • 52. Stricter data requirements for $$ • For financial systems any data inconsistency is unacceptable • Mongo supports atomic single document writes • Mongo doesn’t support multi-document transactions • If you need atomic writes across multi-document transactions don’t use Mongo. • Many ecommerce transactions could be accomplished within a single document write.
  • 54. What about ACID? • ACID compliance is difficult to implement with a distributed system. For the most sensitive data you may want the reliability ACID guarantees.
  • 55. What about ACID? • ACID compliance is difficult to implement with a distributed system. For the most sensitive data you may want the reliability ACID guarantees. • Mongo is ACI for single collection writes.
  • 56. What about ACID? • ACID compliance is difficult to implement with a distributed system. For the most sensitive data you may want the reliability ACID guarantees. • Mongo is ACI for single collection writes. • For many systems ACID compliance for recording a purchase is overkill and unnecessary.
  • 57. About Me • VP Engineering @ OpenSky • @spf13 • github.com/spf13 • Hacker, Father, Husband, Entrepreneur, Skate Punk
  • 58. About OpenSky OpenSky is a better way to shop where you buy from people you trust. OpenSky is a free online platform that helps people discover amazing products; share them with their friends, family and followers; and earn money. Launching in July 2010, the OpenSky is the first ecommerce platform built on MongoDB.  The OpenSky Platform has been created using MongoDB, Symfony 2, Doctrine 2, PHP 5.3, jQuery, node.js, Git (with gitflow) and a touch of Java and Python. We support and contribute to many of these open source projects.   OpenSky was founded by John Caplan, former CEO of Ford Models and former President of the About.com Network. OpenSky is well funded by Highland Capital Partners, Canaan Partners and Ron Conway and is based in NYC near Union Square.
  • 59. Interested in what you saw? • We’re Hiring • Contact me at [email protected]
  • 62. OpenSky currently uses a hybrid approach • OpenSky uses a 3rd party library to process transactions it is written to only use SQL for transactions. In 2011 we will transition off this library. • OpenSky uses Solr for search. • All other functions of the site currently on MongoDB.
  • 63. Note on Ecommerce and Mongo Concerning transactions, it is true that MongoDB doesn’t support the multi-object kind; however, it does support atomic operations on individual documents. And this, combined with the document- oriented modeling just described and a little creativity, is adequate to many e-commerce problems. Certainly, if we needed to literally debit one account and credit another in the same operation, or if we wanted rollback, we’d need full-fledged transactions. However, the transactionality provided by MongoDB may be sufficient for most, if not all, e-commerce operations. - Kyle Banker 10gen

Editor's Notes

  1. actually, just the first 1/3 of it.
  2. Ironically this is how magento solves the performance problems associated with EAV, by caching the data into insanely wide tables.
  3. Can’t create a FK as each set references a different table. “Key” really made of attribute table name id and attribute table name
  4. Notice how it doesn’t matter if it’s an array or not.
  5. Whenever you use a inter system coordination you need to implement your own atomic checks in the application... But SOAP does have transactions.. so not quite accurate. kyle idea... but we are fairly atomic with authorize.net
  6. Whenever you use a inter system coordination you need to implement your own atomic checks in the application... But SOAP does have transactions.. so not quite accurate. kyle idea... but we are fairly atomic with authorize.net
  7. Whenever you use a inter system coordination you need to implement your own atomic checks in the application... But SOAP does have transactions.. so not quite accurate. kyle idea... but we are fairly atomic with authorize.net
  8. Whenever you use a inter system coordination you need to implement your own atomic checks in the application... But SOAP does have transactions.. so not quite accurate. kyle idea... but we are fairly atomic with authorize.net
  9. Whenever you use a inter system coordination you need to implement your own atomic checks in the application... But SOAP does have transactions.. so not quite accurate. kyle idea... but we are fairly atomic with authorize.net
  10. atomicity, consistency, isolation, durability.
  11. atomicity, consistency, isolation, durability.
  12. atomicity, consistency, isolation, durability.