NoSQL

This page is a brief introduction to NoSQL. It offers a set of definitions of the term NoSQL and of NoSQL databases, and explains the reasons behind NoSQL databases.

For various guides and tutorials for getting started with NoSQL databases, check out the NoSQL: Guides, Tutorials, Books, Papers page. If you are interested in finding out who is using NoSQL solutions, check the Powered by NoSQL reference. Then choose your NoSQL database and NoSQL library.

What is NoSQL?

A list of (possible) definitions for NoSQL (also referred to as NoSQL databases or NoSQL stores):

NoSQL is a movement promoting a loosely defined class of non-relational data stores that break with a long history of relational databases. These data stores may not require fixed table schemas, usually avoid join operations and typically scale horizontally. Academics and papers typically refer to these databases as structured storage.
Wikipedia

Non-relational next generation operational datastores and databases
Dwight Merriman, CEO 10gen

Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable.
nosql-databases.org

NoSQL is a term coined by Carlo Strozzi and repurposed by Eric Evans to refer to “some” storage systems. The NoSQL term should be used as in the Not-Only-SQL and not as No to SQL or Never SQL.

NoSQL is about choice

NoSQL is not about any one feature of any of the projects. NoSQL is not about scaling, NoSQL is not about performance, NoSQL is not about hating SQL, NoSQL is not about ease of use, NoSQL is not about sharding, NoSQL is not about throughput, NoSQL is not about speed, NoSQL is not about dropping ACID, NoSQL is not about Eventual Consistency, NoSQL is not about CAP, NoSQL is not about open standards, NoSQL is not about Open Source and NoSQL is most likely not about whatever else you want NoSQL to be about. NoSQL is about choice
Jan Lehnardt, CouchDB

Why NoSQL?

Handling massive amounts of data
- Exponential growth of newly created digital content
- More value around data
- Build value around data by connecting the dots
Connectedness
Information format
Data usage scenarios (plus open data)

Fundamental papers

Google BigTable
Amazon Dynamo
BASE: An ACID Alternative
Brewer’s CAP theorem (pdf). Julian Browne’s article can be helpful too.

NoSQL databases

Columnar Stores or Wide Column Stores

BigTable
Cassandra: a highly scalable second-generation distributed database, bringing together Dynamo’s fully distributed design and Bigtable’s ColumnFamily-based data model.
HBase
Hypertable

Document stores or Document databases

Colayer
CouchDB
FleetDB
Jackrabbit
Lotus Notes
MongoDB
OrientDB
Raven DB
ThruDB
Terrastore

Graph databases

AllegroGraph
Bigdata
Core Data
DEX: a high-performance graph database written in Java and C++. Its main characteristic is high-performance storage and retrieval of large graphs, on the order of billions of nodes, edges and attributes, implemented with specialized structures.
Filament
FlockDB
HyperGraphDB
InfiniteGraph
InfoGrid
Neo4j
OpenLink Virtuoso
Sones
VertexDB
Trinity: a graph database and computation platform over a distributed memory cloud. As a database, it provides features such as highly concurrent query processing, transactions, and consistency control. As a computation platform, it provides synchronous and asynchronous batch-mode computations on large-scale graphs.

Key-Value Stores

Amazon SimpleDB
Azure Table Storage
Berkeley DB
Chordless
Dynomite
GenieDB: GenieDB is designed to be a pragmatic solution to a widespread class of data storage problems, with a high-performance native API alongside compatibility with MySQL.
GT.M / M.DB
HamsterDB
Hibari: Hibari is a production-ready, distributed, key-value, big data store. Hibari uses chain replication for strong consistency, high-availability, and durability. Hibari has excellent performance especially for read and large value operations.
KAI
KaTree
Kumofs
LightCloud
Membase
Memcachedb
Mnesia
NorthScale
Orient Key/Value Server
Pincaster
PNUTS/Sherpa
Project Voldemort: LinkedIn open source implementation of Amazon Dynamo key-value store
Redis
Riak: Dynamo-inspired key/value store that scales predictably and easily.
Scalaris
ScalienDB / Scalien Keyspace: a distributed, consistent key-value store
Tokyo Cabinet

Multi-value databases

OpenQM
Rocket U2

Object databases

Db4o
GemStone/S
KiokuDB
InterSystems Caché
Neo
Objectivity/DB
Perst
Progress
Versant
ZODB

XML databases

Berkeley DB XML
EMC Documentum xDB
eXist
MarkLogic Server
Sausalito: Sausalito powers XQuery in the Cloud
Sedna
Tamino
Xindice

Unclassified

CloudKit
FluidDB
Moneta
Persevere

Cassandra

Project description: The Apache Cassandra Project develops a highly scalable second-generation distributed database, bringing together Dynamo’s fully distributed design and Bigtable’s ColumnFamily-based data model.
Project home: cassandra.apache.org
Data model: wide column
Distribution model: masterless cluster (inspired by Amazon Dynamo)
Persistence model: Disk
Client/network protocol(s): Custom
Elasticity: Yes
License: Apache
Implementation language/supported OS: Java
Any other exciting features: Fault tolerant, durable
Contributed by

DEX

Project description

DEX is a high-performance graph database written in Java and C++. Its main characteristic is high-performance storage and retrieval of large graphs, on the order of billions of nodes, edges and attributes, implemented with specialized structures.

Project home

sparsity-technologies.com

Data model

Labeled Directed Attributed Multigraph

Labeled: nodes and edges belong to types
Directed: directed edges
Attributed: Nodes and Edges with attributes
Multigraph: multiple edges between the same nodes even from the same edge type.
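
To make the four properties concrete, here is a minimal Python sketch of a labeled, directed, attributed multigraph. It is an illustration only and is not the DEX API; the class and method names (`Multigraph`, `add_node`, `add_edge`, `neighbors`) and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import count


class Multigraph:
    """Toy labeled, directed, attributed multigraph (hypothetical, not the DEX API)."""

    def __init__(self):
        self._ids = count(1)
        self.nodes = {}  # node id -> (type label, attribute dict)
        self.edges = {}  # edge id -> (type label, tail id, head id, attribute dict)
        self._out = defaultdict(list)  # tail id -> ids of edges leaving that node

    def add_node(self, node_type, **attrs):
        # Labeled: every node belongs to a type. Attributed: it carries key/value pairs.
        node_id = next(self._ids)
        self.nodes[node_id] = (node_type, attrs)
        return node_id

    def add_edge(self, edge_type, tail, head, **attrs):
        # Edges have their own ids, so parallel edges of the same type are allowed.
        edge_id = next(self._ids)
        self.edges[edge_id] = (edge_type, tail, head, attrs)
        self._out[tail].append(edge_id)
        return edge_id

    def neighbors(self, node_id, edge_type):
        """Heads of outgoing edges of the given type, one entry per edge."""
        return [self.edges[e][2] for e in self._out[node_id]
                if self.edges[e][0] == edge_type]


g = Multigraph()
alice = g.add_node("person", name="Alice")
bob = g.add_node("person", name="Bob")
g.add_edge("emailed", alice, bob, year=2010)
g.add_edge("emailed", alice, bob, year=2011)  # second edge, same type, same nodes

print(g.neighbors(alice, "emailed"))  # Bob's id appears once per edge
print(g.neighbors(bob, "emailed"))    # empty, because edges are directed
```

Giving each edge its own identifier, rather than keying edges by their endpoint pair, is what allows several edges of the same type between the same two nodes.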

Distribution model

not distributed

Persistence model

Disk

Client/network protocol(s)

None

Elasticity

not applicable

License

Free evaluation version available at http://sparsity-technologies.com/dex_downloads.php (limited to 1 million nodes, no restriction on edges, and one concurrent user session). The unrestricted version is licensed by Sparsity-Technologies; contact them for more information.

Implementation language/supported OS

Java and C++ / Windows and Linux

Any other exciting features

Quick answering time for complex queries, multiple graph algorithms available, regular expression querying, large char objects for attributes, materialized neighbors, indexed or non-indexed attributes, CSV import/export, script loaders, and many more features.

Contributed by

Dàmaris Coll

GenieDB

Project description: GenieDB is designed to be a pragmatic solution to a widespread class of data storage problems, with a high-performance native API alongside compatibility with MySQL.
Data model: k/v, document, and column (we’re considering adding support for graph, too)
Distribution model: Masterless cluster, supporting geographically dispersed operation.
Persistence model: Disk
Client/network protocol(s): Native API C library (accessible from PHP etc), or through MySQL
Elasticity: Sure, you can grow and shrink clusters on-line. We provide a full replica on every server, so you can scale read capacity this way, but you can only write as fast as the slowest server can keep up with, and store as much data as your smallest server.
License: Closed-source (for now)
Implementation language/supported OS: The core’s in C; the MySQL plugin is in C++ out of necessity. Primary development is on Linux, but a Solaris port exists, and BSD is in the pipeline.
Any other exciting features: We’ve aimed for pragmatism, so there are lots of little things that avoid failure cases or reduce administrative overhead, such as the write flow control system (to avoid snowballing replication queues), the ability to trade off performance, cost and semantics at various levels, and so on.
Contributed by: Alaric Snell-Pym
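
The scaling trade-off described under Elasticity can be written out as simple arithmetic. The sketch below uses made-up server figures and a hypothetical `cluster_limits` helper, and it assumes, as an idealization, that read capacity adds up perfectly across replicas. It illustrates the statement above and is not GenieDB's API or measured behavior.

```python
def cluster_limits(servers):
    """Rough limits of a cluster in which every server holds a full replica.

    servers: list of dicts with reads_per_s, writes_per_s and storage_gb
    (hypothetical figures used only for illustration).
    """
    return {
        # Any server can answer a read, so read capacity is (ideally) the sum.
        "reads_per_s": sum(s["reads_per_s"] for s in servers),
        # Every write must reach every server, so the slowest one sets the pace.
        "writes_per_s": min(s["writes_per_s"] for s in servers),
        # The whole data set must fit on each server, so the smallest one sets the cap.
        "storage_gb": min(s["storage_gb"] for s in servers),
    }


servers = [
    {"reads_per_s": 5000, "writes_per_s": 2000, "storage_gb": 500},
    {"reads_per_s": 5000, "writes_per_s": 1500, "storage_gb": 250},
    {"reads_per_s": 4000, "writes_per_s": 2500, "storage_gb": 1000},
]
limits = cluster_limits(servers)
print(limits)
```

With these figures, adding a fourth server would raise the read total but could only lower, never raise, the write and storage limits.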

Hibari

Project description

Hibari is a production-ready, distributed, key-value, big data store. Hibari uses chain replication for strong consistency, high-availability, and durability. Hibari has excellent performance especially for read and large value operations.

Data model

key-value

Distribution model

Chain replication between data nodes; Cluster’s global hash managed by master/slave admin nodes
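
The idea behind chain replication can be sketched in a few lines of Python. The sketch below is a generic, in-process illustration of the technique (writes enter at the head and are passed along the chain, reads are served by the tail). It is not Hibari's code, it ignores failures and reconfiguration, and the names `Replica`, `Chain`, `put` and `get` are hypothetical.

```python
class Replica:
    """One data node in the chain; it holds a full copy of the chain's keys."""

    def __init__(self, name):
        self.name = name
        self.store = {}
        self.successor = None  # next replica in the chain, None for the tail

    def write(self, key, value):
        # Apply locally, then forward down the chain. The call returns only
        # after the tail has applied the write, which acts as the acknowledgement.
        self.store[key] = value
        if self.successor is not None:
            self.successor.write(key, value)


class Chain:
    def __init__(self, names):
        self.replicas = [Replica(n) for n in names]
        for node, nxt in zip(self.replicas, self.replicas[1:]):
            node.successor = nxt

    def put(self, key, value):
        self.replicas[0].write(key, value)  # all writes enter at the head

    def get(self, key):
        return self.replicas[-1].store.get(key)  # all reads are served by the tail


chain = Chain(["head", "middle", "tail"])
chain.put("user:1", "alice")
print(chain.get("user:1"))
```

Because the tail only sees a write after every earlier replica has applied it, a read from the tail never returns a value that some replica is still missing.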

Persistence model

disk, disk+memory

Client/network protocol(s)

Client protocol

A native Erlang API, via Erlang’s native message-passing mechanism
Amazon S3 protocol, via HTTP
UBF, Joe Armstrong’s “Universal Binary Format” protocol, via TCP
UBF via several minor variations of TCP transport
UBF over JSON-RPC, via HTTP
JSON-encoded UBF, via TCP

Protocols under development:

Memcached, via TCP
UBF over Thrift, via TCP
UBF over Protocol Buffers, via TCP

Elasticity

License

Apache License, Version 2.0

Implementation language/supported OS

Erlang/OTP. RedHat, CentOS, and Fedora Linux distributions.

Any other exciting features

Chain Replication

Contributed by

Shinya Motohashi

Riak

Project description: Riak is a Dynamo-inspired key/value store that scales predictably and easily. A truly fault-tolerant system, Riak has no single point of failure. No machines are special or central in Riak, so developers and operations professionals can decide exactly how fault-tolerant they want and need their applications to be.
Project home: basho.com
Data model: key-value
Distribution model: masterless cluster (inspired by Amazon Dynamo)
Persistence model: Disk (supports multiple persistence engines)
Client/network protocol(s): HTTP, Protocol buffers API
Elasticity: yes
License
Implementation language/supported OS: Erlang
Any other exciting features: Pre-commit hooks, post-commit hooks, Links
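
One common way for a Dynamo-style masterless cluster to decide which nodes own a key is consistent hashing. The following Python sketch illustrates the general idea only. It is not Riak's implementation, and the names (`Ring`, `preference_list`, the node names, and the parameters `vnodes` and `n`) are assumptions made for the example.

```python
import bisect
import hashlib


def _hash(text):
    # Map a string to a point on the ring (a large integer).
    return int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, vnodes=8):
        # Each physical node claims several points on the ring.
        self._points = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._points]

    def preference_list(self, key, n=3):
        """First n distinct nodes found walking clockwise from the key's hash."""
        start = bisect.bisect(self._keys, _hash(key))
        owners = []
        for i in range(len(self._points)):
            node = self._points[(start + i) % len(self._points)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == n:
                break
        return owners


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
print(ring.preference_list("user:42"))
```

Every node can compute the same list from the key alone, so no central coordinator is needed to route a request.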

Project Voldemort

Project description: LinkedIn implementation of Amazon Dynamo key-value store
Project home: project-voldemort.com
Data model: key-value store
Distribution model: masterless cluster (inspired by Amazon Dynamo)
Persistence model: disk (pluggable storage engines)
Client/network protocol(s): custom
Elasticity
License: Apache 2.0
Implementation language/supported OS: Java
Any other exciting features: Pluggable serialization, pluggable storage engines
Contributed by

Sausalito

Project description

Sausalito powers XQuery in the Cloud. It is an integrated database and application server designed to run on cloud infrastructures.

Project home

28msec.com

Data model

XML

Distribution model

Masterless cluster

Persistence model

Disk

Client/network protocol(s)

XQuery / HTTP

Elasticity

Auto Scaling + Elastic Load Balancing

License

Closed source

Implementation language/supported OS

MacOS X, Linux, Windows

Any other exciting features

Sausalito provides an integrated stack to build web applications. It leverages Amazon AWS to scale up and down applications. Applications are entirely written in XQuery. This language has the following benefits:

It is a unified framework for all tiers: database, application logic and presentation. This property makes it possible to provide a single-tiered application and database architecture.
It is a functional programming language which can automatically be optimized and parallelized. This property is particularly important in cloud computing infrastructures.

Contributed by

28msec Inc.

ScalienDB

Project description: ScalienDB (the successor of Keyspace) is a distributed, consistent key-value store. You can define quorums, which are sets of ScalienDB nodes that consistently replicate data between each other. You can define an arbitrary number of databases and tables. Data stored in a table is automatically partitioned into shards, and shards can be assigned to quorums. Access to tables is coordinated by a set of controllers, so there is no single point of failure. During development, specific effort was made to ensure that data is not lost no matter what happens, as long as at least one replica of the data remains accessible.
Project home: scalien.com
Data model: key-value
Distribution model: Paxos
Persistence model: Disk
Client/network protocol(s): Custom
Elasticity: Nodes can be added to or removed from the system on the fly.
License: AGPL (server), BSD (client)
Implementation language/supported OS: ScalienDB is written in C++, with client interfaces for C, C++, Java, Python, C# and PHP. ScalienDB can run on Linux, Windows and OS X.
Any other exciting features: Web interface for administration, various consistency levels
Contributed by: Peter Schonhofen

Trinity

Project description: Trinity is a graph database and computation platform over a distributed memory cloud. As a database, it provides features such as highly concurrent query processing, transactions, and consistency control. As a computation platform, it provides synchronous and asynchronous batch-mode computations on large-scale graphs. Trinity can be deployed on one machine or hundreds of machines.
Project home: research.microsoft.com
Data model: graph database, hypergraph
Distribution model: one machine or hundreds of machines
Persistence model: memory-based graph store
Client/network protocol(s)
Elasticity
License: N/A
Implementation language/supported OS: N/A
Any other exciting features: Trinity supports large-scale, offline batch processing. Both synchronous and asynchronous batch computation are supported.

Please remember that your submission should include at least one of the following points

Project description: a short one-liner description of the project
Project home
Data model: k/v, document, column, graph, xml, object, etc.
Distribution model: Single server, master/slave, p2p replication, masterless cluster, etc.
Persistence model: Disk, memory, memory with snapshotting, etc.
Client/network protocol(s)
Elasticity
License
Implementation language/supported OS
Any other exciting features
Contributed by

by Alex Popescu & Ana-Maria Bacalu
