NoSQL
This page is a brief introduction to NoSQL. It offers a set of definitions of the term NoSQL and of NoSQL databases, and explains the reasons behind them.
For guides and tutorials on getting started with NoSQL databases, see the NoSQL: Guides, Tutorials, Books, Papers page. To find out who is using NoSQL solutions, check the Powered by NoSQL reference. Then choose your NoSQL database and NoSQL library.
What is NoSQL?
A list of (possible) definitions for NoSQL (also referred to as NoSQL databases or NoSQL stores):
NoSQL is a movement promoting a loosely defined class of non-relational data stores that break with a long history of relational databases. These data stores may not require fixed table schemas, usually avoid join operations and typically scale horizontally. Academics and papers typically refer to these databases as structured storage.
Non-relational next generation operational datastores and databases
Dwight Merriman, CEO 10gen
Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable.
NoSQL is a term coined by Carlo Strozzi and repurposed by Eric Evans to refer to “some” storage systems. The term should be read as Not Only SQL, not as No to SQL or Never SQL.
NoSQL is about choice
NoSQL is not about any one feature of any of the projects. NoSQL is not about scaling, NoSQL is not about performance, NoSQL is not about hating SQL, NoSQL is not about ease of use, NoSQL is not about sharding, NoSQL is not about throughput, NoSQL is not about speed, NoSQL is not about dropping ACID, NoSQL is not about Eventual Consistency, NoSQL is not about CAP, NoSQL is not about open standards, NoSQL is not about Open Source and NoSQL is most likely not about whatever else you want NoSQL to be about. NoSQL is about choice
Why NoSQL?
- Handling massive amounts of data
- Exponential growth of newly created digital content
- More value around data
- Build value around data by connecting the dots
- Connectedness
- Information format
- Data usage scenarios (plus open data)
Fundamental papers
- Google BigTable
- Amazon Dynamo
- BASE: An ACID Alternative
- Brewer’s CAP theorem (pdf). Julian Browne’s article can be helpful too.
NoSQL databases
Columnar Stores or Wide Column Stores
- BigTable:
- Cassandra: a highly scalable second-generation distributed database, bringing together Dynamo’s fully distributed design and Bigtable’s ColumnFamily-based data model.
- HBase:
- Hypertable
Document stores or Document databases
- Colayer
- CouchDB
- FleetDB
- Jackrabbit
- Lotus Notes
- MongoDB
- OrientDB
- Raven DB
- ThruDB
- Terrastore
Graph databases
- AllegroGraph
- Bigdata
- Core Data
- DEX: a high-performance graph database written in Java and C++. Its main characteristic is its storage and retrieval performance for large graphs, in the order of billions of nodes, edges and attributes, implemented with specialized structures.
- Filament
- FlockDB
- HyperGraphDB
- InfiniteGraph
- InfoGrid
- Neo4j
- OpenLink Virtuoso
- Sones
- VertexDB
- Trinity: a graph database and computation platform over a distributed memory cloud. As a database, it provides features such as highly concurrent query processing, transactions, and consistency control. As a computation platform, it provides synchronous and asynchronous batch-mode computations on large-scale graphs.
Key-Value Stores
- Amazon SimpleDB
- Azure Table Storage
- Berkeley DB
- Chordless
- Dynomite
- GenieDB: GenieDB is designed to be a pragmatic solution to a widespread class of data storage problems, with a high-performance native API alongside compatibility with MySQL.
- GT.M / M.DB
- HamsterDB
- Hibari: Hibari is a production-ready, distributed, key-value, big data store. Hibari uses chain replication for strong consistency, high-availability, and durability. Hibari has excellent performance especially for read and large value operations.
- KAI
- KaTree
- Kumofs
- LightCloud
- Membase
- Memcachedb
- Mnesia
- NorthScale
- Orient Key/Value Server
- Pincaster
- PNUTS/Sherpa
- Project Voldemort: LinkedIn open source implementation of Amazon Dynamo key-value store
- Redis
- Riak: Dynamo-inspired key/value store that scales predictably and easily.
- Scalaris
- ScalienDB / Scalien Keyspace: a distributed, consistent key-value store
- Tokyo Cabinet
Multi-value databases
- OpenQM
- Rocket U2
Object databases
- Db4o
- GemStone/S
- KiokuDB
- InterSystems Caché
- Neo
- Objectivity/DB
- Perst
- Progress
- Versant
- ZODB
XML databases
- Berkeley DB XML
- EMC Documentum xDB
- eXist
- MarkLogic Server
- Sausalito: Sausalito powers XQuery in the Cloud
- Sedna
- Tamino
- Xindice
Unclassified
- CloudKit
- FluidDB
- Moneta
- Persevere
Cassandra ¶
- Project description
- The Apache Cassandra Project develops a highly scalable second-generation distributed database, bringing together Dynamo’s fully distributed design and Bigtable’s ColumnFamily-based data model.
- Project home
- cassandra.apache.org
- Data model
- wide column
- Distribution model
- masterless cluster (inspired by Amazon Dynamo)
- Persistence model
- Disk
- Client/network protocol(s)
- Custom
- Elasticity
- Yes
- License
- Apache
- Implementation language/supported OS
- Java
- Any other exciting features
- Fault tolerant, durable
- Contributed by
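The wide column (ColumnFamily-based) data model named in the entry above can be pictured as nested maps: a row key leads to column families, and each family holds an open-ended set of named columns. The following is a minimal in-memory sketch of that idea only. The class `WideColumnTable` and its methods are hypothetical names chosen for illustration and are not part of Cassandra's client API.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy wide-column table: row key -> column family -> {column name: value}.

    Illustration of the data model only; this is not Cassandra's API.
    """

    def __init__(self):
        # Rows and families are created lazily on first write.
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        # Each row may hold a different set of columns in the same family.
        self._rows[row_key][family][column] = value

    def get(self, row_key, family, column, default=None):
        # Use .get at every level so that reads never create empty rows.
        return self._rows.get(row_key, {}).get(family, {}).get(column, default)

    def columns(self, row_key, family):
        # Return (name, value) pairs sorted by column name for a stable order.
        return sorted(self._rows.get(row_key, {}).get(family, {}).items())


if __name__ == "__main__":
    table = WideColumnTable()
    table.put("user:1", "profile", "name", "Ada")
    table.put("user:1", "profile", "email", "ada@example.org")
    table.put("user:2", "profile", "name", "Bob")  # no email column for this row
    print(table.columns("user:1", "profile"))
    print(table.get("user:2", "profile", "email"))
```

Note that rows in the same family do not have to share columns, which is the main difference from a fixed relational schema.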
DEX ¶
- Project description
- DEX is a high-performance graph database written in Java and C++. Its main characteristic is its storage and retrieval performance for large graphs, in the order of billions of nodes, edges and attributes, implemented with specialized structures.
- Project home
- sparsity-technologies.com
- Data model
- Labeled Directed Attributed Multigraph
- Labeled
- nodes and edges belong to types
- Directed
- directed edges
- Attributed
- Nodes and Edges with attributes
- Multigraph
- multiple edges between the same nodes even from the same edge type.
- Distribution model
- not distributed
- Persistence model
- Disk
- Client/network protocol(s)
- None
- Elasticity
- not applicable
- License
- Free evaluation version available at http://sparsity-technologies.com/dex_downloads.php (limited to 1 million nodes, no restriction on edges, and one concurrent user session). The unrestricted version is licensed by Sparsity Technologies; contact them for more information.
- Implementation language/supported OS
- Java and C++ / Windows and Linux
- Any other exciting features
- Quick answering time for complex queries, multiple graph algorithms, regular expression querying, large char objects for attributes, materialized neighbors, indexed or non-indexed attributes, CSV import/export, script loaders, and many more.
- Contributed by
- Dàmaris Coll
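The "Labeled Directed Attributed Multigraph" data model listed in the DEX entry above can be made concrete with a small sketch. It shows the four properties in turn: typed nodes and edges, directed edges, attributes on both, and parallel edges of the same type. All class and method names here (`Multigraph`, `add_node`, `add_edge`, `out_edges`) are hypothetical and are not the DEX API.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    id: int
    type_name: str                              # Labeled: every node has a type
    attrs: dict = field(default_factory=dict)   # Attributed


@dataclass
class Edge:
    id: int
    type_name: str                              # Labeled: every edge has a type
    tail: int                                   # Directed: source node id
    head: int                                   # Directed: target node id
    attrs: dict = field(default_factory=dict)   # Attributed


class Multigraph:
    """Toy labeled, directed, attributed multigraph (not the DEX API)."""

    def __init__(self):
        self.nodes = {}
        self.edges = {}
        self._ids = count(1)  # one id sequence shared by nodes and edges

    def add_node(self, type_name, **attrs):
        node = Node(next(self._ids), type_name, attrs)
        self.nodes[node.id] = node
        return node.id

    def add_edge(self, type_name, tail, head, **attrs):
        # Multigraph: no uniqueness check, so several edges of the same
        # type may connect the same pair of nodes.
        edge = Edge(next(self._ids), type_name, tail, head, attrs)
        self.edges[edge.id] = edge
        return edge.id

    def out_edges(self, node_id, type_name=None):
        # Outgoing edges of a node, optionally filtered by edge type.
        return [e for e in self.edges.values()
                if e.tail == node_id
                and (type_name is None or e.type_name == type_name)]


if __name__ == "__main__":
    g = Multigraph()
    ada = g.add_node("person", name="Ada")
    bob = g.add_node("person", name="Bob")
    g.add_edge("calls", ada, bob, minutes=3)
    g.add_edge("calls", ada, bob, minutes=7)   # parallel edge, same type
    print(len(g.out_edges(ada, "calls")))
```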
GenieDB ¶
- Project description
- GenieDB is designed to be a pragmatic solution to a widespread class of data storage problems, with a high-performance native API alongside compatibility with MySQL.
- Data model
- k/v, document, and column (we’re considering adding support for graph, too)
- Distribution model
- Masterless cluster, supporting geographically dispersed operation.
- Persistence model
- Disk
- Client/network protocol(s)
- Native API C library (accessible from PHP etc), or through MySQL
- Elasticity
- Sure, you can grow and shrink clusters on-line. We provide a full replica on every server, so you can scale read capacity this way, but you can only write as fast as the slowest server can keep up with, and store as much data as your smallest server.
- License
- Closed-source (for now)
- Implementation language/supported OS
- The core’s in C; the MySQL plugin is in C++ out of necessity. Primary development is on Linux, but a Solaris port exists, and BSD is in the pipeline.
- Any other exciting features
- We’ve aimed for pragmatism, so there are lots of little things that avoid failure cases or reduce administrative overhead, such as the write flow control system (to avoid snowballing replication queues), the ability to trade off performance, cost and semantics at various levels, and so on.
- Contributed by
- Alaric Snell-Pym
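The elasticity note in the GenieDB entry above implies a simple capacity rule for a cluster where every server holds a full replica: read capacity adds up across servers, while write throughput and usable storage are bounded by the weakest server. The sketch below only restates that arithmetic. The `Server` fields and the numbers in the usage example are made-up illustrations, not GenieDB measurements or settings.

```python
from dataclasses import dataclass


@dataclass
class Server:
    reads_per_sec: int    # hypothetical per-server read throughput
    writes_per_sec: int   # hypothetical per-server write throughput
    storage_gb: int       # hypothetical per-server disk capacity


def cluster_capacity(servers):
    """Capacity of a cluster in which every server is a full replica."""
    if not servers:
        raise ValueError("a cluster needs at least one server")
    return {
        # Any replica can answer a read, so read capacity adds up.
        "reads_per_sec": sum(s.reads_per_sec for s in servers),
        # Every write must reach every replica: the slowest server sets the pace.
        "writes_per_sec": min(s.writes_per_sec for s in servers),
        # Every server stores all the data: the smallest disk sets the limit.
        "storage_gb": min(s.storage_gb for s in servers),
    }


if __name__ == "__main__":
    cluster = [Server(1000, 200, 500), Server(800, 150, 250)]
    print(cluster_capacity(cluster))
```

Adding a server under this rule raises read capacity but can lower write throughput or storage if the new machine is the weakest one.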
Hibari ¶
- Project description
- Hibari is a production-ready, distributed, key-value, big data store. Hibari uses chain replication for strong consistency, high-availability, and durability. Hibari has excellent performance especially for read and large value operations.
- Data model
- key-value
- Distribution model
- Chain replication between data nodes; Cluster’s global hash managed by master/slave admin nodes
- Persistence model
- disk, disk+memory
- Client/network protocol(s)
- Client protocol
- A native Erlang API, via Erlang’s native message-passing mechanism
- Amazon S3 protocol, via HTTP
- UBF, Joe Armstrong’s “Universal Binary Format” protocol, via TCP
- UBF via several minor variations of TCP transport
- UBF over JSON-RPC, via HTTP
- JSON-encoded UBF, via TCP
- Memcached, via TCP
- UBF over Thrift, via TCP
- UBF over Protocol Buffers, via TCP
- Elasticity
- ?
- License
- Apache License, Version 2.0
- Implementation language/supported OS
- Erlang/OTP. RedHat, CentOS, and Fedora Linux distributions.
- Any other exciting features
- Chain Replication
- Contributed by
- Shinya Motohashi
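The Hibari entry above names chain replication as its replication scheme. As a rough illustration of the general technique (not of Hibari's Erlang implementation, whose details are not given here), the sketch below arranges replicas in a line: writes enter at the head and are forwarded node by node to the tail, and reads are answered by the tail, so a read only sees writes that every replica has applied. The names `ChainNode` and `Chain` are hypothetical, and failure handling and reconfiguration are left out.

```python
class ChainNode:
    """One replica in the chain (illustrative, single-process model)."""

    def __init__(self, name):
        self.name = name
        self.store = {}
        self.next = None  # successor in the chain; None for the tail

    def write(self, key, value):
        # Apply locally, then forward down the chain. The call returns
        # only after the tail has applied the write.
        self.store[key] = value
        if self.next is not None:
            return self.next.write(key, value)
        return self.name  # the tail acknowledges the write


class Chain:
    def __init__(self, names):
        if not names:
            raise ValueError("a chain needs at least one node")
        self.nodes = [ChainNode(n) for n in names]
        # Link each node to its successor.
        for node, successor in zip(self.nodes, self.nodes[1:]):
            node.next = successor

    @property
    def head(self):
        return self.nodes[0]

    @property
    def tail(self):
        return self.nodes[-1]

    def put(self, key, value):
        # Writes always enter at the head.
        return self.head.write(key, value)

    def get(self, key):
        # Reads go to the tail, which holds only fully replicated writes.
        return self.tail.store.get(key)


if __name__ == "__main__":
    chain = Chain(["a", "b", "c"])
    acked_by = chain.put("k", "v")
    print(acked_by, chain.get("k"))
```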
Riak ¶
- Project description
- Riak is a Dynamo-inspired key/value store that scales predictably and easily. A truly fault-tolerant system, Riak has no single point of failure. No machines are special or central in Riak, so developers and operations professionals can decide exactly how fault-tolerant they want and need their applications to be.
- Project home
- basho.com
- Data model
- key-value
- Distribution model
- masterless cluster (inspired by Amazon Dynamo)
- Persistence model
- Disk (supports multiple persistence engines)
- Client/network protocol(s)
- HTTP, Protocol buffers API
- Elasticity
- yes
- License
- Implementation language/supported OS
- Erlang
- Any other exciting features
- Pre-commit hooks, post-commit hooks, Links
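The "masterless cluster (inspired by Amazon Dynamo)" distribution model in the Riak entry above relies on every node being able to work out where a key lives without asking a central coordinator. The Amazon Dynamo paper does this with consistent hashing, and the sketch below illustrates that technique in its simplest form. It is not Riak's implementation: the class `HashRing`, the `vnodes` parameter and the node names are assumptions made for this example.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Map a string to a position on the ring (a large integer).
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest(), "big")


class HashRing:
    """Toy consistent-hash ring with virtual nodes (not Riak's code)."""

    def __init__(self, nodes, vnodes=8):
        # Each physical node is placed on the ring at several points.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def preference_list(self, key, n=3):
        """Return up to n distinct nodes found clockwise from the key's position."""
        start = bisect.bisect_right(self._points, _hash(key))
        owners = []
        for i in range(len(self._ring)):
            node = self._ring[(start + i) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == n:
                    break
        return owners


if __name__ == "__main__":
    ring = HashRing(["n1", "n2", "n3", "n4"])
    # Every node computes the same answer locally; no master is consulted.
    print(ring.preference_list("user:42"))
```

Because placement depends only on the hash positions, removing a node that is not in a key's preference list leaves that list unchanged, which is why nodes can join and leave with limited data movement.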
Project Voldemort ¶
- Project description
- LinkedIn implementation of Amazon Dynamo key-value store
- Project home
- project-voldemort.com
- Data model
- key-value store
- Distribution model
- masterless cluster (inspired by Amazon Dynamo)
- Persistence model
- disk (pluggable storage engines)
- Client/network protocol(s)
- custom
- Elasticity
- License
- Apache 2.0
- Implementation language/supported OS
- Java
- Any other exciting features
- Pluggable serialization, pluggable storage engines
- Contributed by
Sausalito ¶
- Project description
- Sausalito powers XQuery in the Cloud. It is an integrated database and application server designed to run on cloud infrastructures.
- Project home
- 28msec.com
- Data model
- XML
- Distribution model
- Masterless cluster
- Persistence model
- Disk
- Client/network protocol(s)
- XQuery / HTTP
- Elasticity
- Auto Scaling + Elastic Load Balancing
- License
- Closed source
- Implementation language/supported OS
- MacOS X, Linux, Windows
- Any other exciting features
- Sausalito provides an integrated stack to build web applications. It leverages Amazon AWS to scale applications up and down. Applications are written entirely in XQuery. This language has the following benefits:
- It is a unified framework for all tiers: database, application logic and presentation. This property makes it possible to provide a single-tiered application and database architecture.
- It is a functional programming language which can automatically be optimized and parallelized. This property is particularly important in cloud computing infrastructures.
- Contributed by
- 28msec Inc.
ScalienDB ¶
- Project description
- ScalienDB (the successor of Keyspace) is a distributed, consistent key-value store. You can define quorums, which are sets of ScalienDB nodes that consistently replicate data between each other. You can define an arbitrary number of databases and tables. Data stored in a table is automatically partitioned into shards, and shards can be assigned to quorums. Access to tables is coordinated by a set of controllers, so there is no single point of failure. During development, specific effort was made to ensure that data is not lost no matter what happens, as long as at least one replica of the data remains accessible.
- Project home
- scalien.com
- Data model
- key-value
- Distribution model
- Paxos
- Persistence model
- Disk
- Client/network protocol(s)
- Custom
- Elasticity
- Nodes can be added to or removed from the system on the fly.
- License
- AGPL (server), BSD (client)
- Implementation language/supported OS
- ScalienDB is written in C++, with client interfaces for C, C++, Java, Python, C# and PHP. ScalienDB can run on Linux, Windows and Mac OS X.
- Any other exciting features
- Web interface for administration, various consistency levels
- Contributed by
- Peter Schonhofen
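The ScalienDB entry above describes tables that are partitioned into shards, with each shard assigned to a quorum of nodes. The sketch below shows one way such a lookup could work. It assumes that shards cover contiguous key ranges, which the entry does not state, and the names `ShardedTable`, `split_keys` and the quorum labels are hypothetical rather than taken from ScalienDB.

```python
import bisect


class ShardedTable:
    """Toy key -> shard -> quorum lookup (assumes range partitioning)."""

    def __init__(self, split_keys, shard_to_quorum):
        # split_keys are the sorted keys at which the key space is cut,
        # so N split keys produce N + 1 shards.
        if list(split_keys) != sorted(split_keys):
            raise ValueError("split_keys must be sorted")
        if len(shard_to_quorum) != len(split_keys) + 1:
            raise ValueError("need exactly one quorum per shard")
        self.split_keys = list(split_keys)
        self.shard_to_quorum = list(shard_to_quorum)

    def shard_for(self, key):
        # Keys below the first split key fall in shard 0; a key equal to
        # a split key belongs to the shard that starts at that key.
        return bisect.bisect_right(self.split_keys, key)

    def quorum_for(self, key):
        # Several shards may share one quorum.
        return self.shard_to_quorum[self.shard_for(key)]


if __name__ == "__main__":
    table = ShardedTable(["g", "p"], ["quorum-A", "quorum-B", "quorum-A"])
    for key in ("apple", "kiwi", "zebra"):
        print(key, table.shard_for(key), table.quorum_for(key))
```

Keeping the shard-to-quorum mapping separate from the key-to-shard mapping means a shard can be moved to another quorum without changing how keys are partitioned.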
Trinity ¶
- Project description
- Trinity is a graph database and computation platform over a distributed memory cloud. As a database, it provides features such as highly concurrent query processing, transactions, and consistency control. As a computation platform, it provides synchronous and asynchronous batch-mode computations on large-scale graphs. Trinity can be deployed on one machine or hundreds of machines.
- Project home
- research.microsoft.com
- Data model
- graph database, hypergraph
- Distribution model
- one machine or hundreds of machines
- Persistence model
- memory-based graph store
- Client/network protocol(s)
- Elasticity
- License
- N/A
- Implementation language/supported OS
- N/A
- Any other exciting features
- Trinity supports large-scale offline batch processing, with both synchronous and asynchronous batch computation.
Please remember that your submission should include at least one of the following points
- Project description
- a short one-liner description of the project
- Project home
- Data model
- k/v, document, column, graph, xml, object, etc.
- Distribution model
- Single server, master/slave, p2p replication, masterless cluster, etc.
- Persistence model
- Disk, memory, memory with snapshotting, etc.
- Client/network protocol(s)
- Elasticity
- License
- Implementation language/supported OS
- Any other exciting features
- Contributed by