Posts about NoSQL databases and Polyglot persistence from Monday, 18 April 2011
NoSQL Speed Bumps
Dr. Dobb’s Ken North writes a history of NoSQL failures:
Organizations using the new NoSQL data stores are like the settlers who often traveled uncharted territory in the 19th-century American West. Some days involve the beauty of the journey; but on other days, you’re dodging arrows.
I’m not disputing that NoSQL databases are still young, or that people using them run into bugs and sometimes misuse them. But aren’t both of these issues still present in the relational database world after 30 years?
Original title and link: NoSQL Speed Bumps (NoSQL databases © myNoSQL)
Open Large Datasets
A Quora thread lists over 50 sources of open large datasets. Combined with datasciencetoolkit.org and NoSQL databases, these may lead to interesting experiments. Remember: BigData doesn’t necessarily need big budgets.
Scaling an RDBMS in 6 Steps
From Gavin Heavyside’s slides:
- Launch successful service
- Read saturation: add caching
- Write saturation: add hardware
- Queries slow down: denormalize
- Reads still too slow: prematerialise common queries, stop joining
- Writes too slow: drop secondary indexes and triggers
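The "read saturation: add caching" step can be sketched as a cache-aside read path. This is only an illustration and not taken from the slides: the `users` table, the in-process dict standing in for an external cache, and the function names are all assumptions.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
db.commit()

cache = {}                # stands in for an external cache (assumption)
stats = {"db_reads": 0}   # counts how often a read reaches the database


def get_user_name(user_id):
    """Cache-aside read: try the cache first, fall back to the database."""
    if user_id in cache:
        return cache[user_id]
    stats["db_reads"] += 1
    row = db.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    name = row[0] if row else None
    cache[user_id] = name  # populate the cache for later reads
    return name


def rename_user(user_id, new_name):
    """Write path: update the database, then invalidate the cached entry."""
    db.execute("UPDATE users SET name = ? WHERE id = ?", (new_name, user_id))
    db.commit()
    cache.pop(user_id, None)
```

Repeated calls to `get_user_name(1)` reach the database only once until `rename_user` invalidates the entry. Note that the cache only relieves reads; every write still goes to the database, which is why the next step in the list is about write saturation.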
Multi-tenancy and Cloud Storage Performance
Adrian Cockcroft has a great explanation of the impact of multi-tenancy on cloud storage performance. The connection with NoSQL databases is not necessarily in the Amazon EBS and SSD Price, Performance, QoS comparison, but in:
- Reddit’s story of running Cassandra & PostgreSQL on Amazon EBS (nb: their setup led to a prolonged downtime)
- MongoDB in the Amazon Cloud
and
If you ever see public benchmarks of AWS that only use m1.small, they are useless, it shows that the people running the benchmark either didn’t know what they were doing or are deliberately trying to make some other system look better. You cannot expect to get consistent measurements of a system that has a very high probability of multi-tenant interference.
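One way to check whether a benchmark environment is giving consistent measurements is to repeat the run and look at the spread, not just the average. The following is a minimal sketch under that assumption; the helper names are hypothetical, and how much variation is "too much" is a judgment call this code does not make for you.

```python
import statistics
import time


def measure(fn, runs=20):
    """Time fn() `runs` times and return the samples in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Return mean, sample standard deviation and coefficient of variation."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # needs at least two samples
    cv = stdev / mean if mean else 0.0
    return {"mean": mean, "stdev": stdev, "cv": cv}
```

A high coefficient of variation (`cv`) across identical runs suggests the numbers are dominated by interference rather than by the system under test, which is the situation the quote warns about on small shared instances.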
via: http://perfcap.blogspot.com/2011/03/understanding-and-using-amazon-ebs.html
Amazon EC2 Cassandra Cluster with DataStax AMI
This AMI does the following:
- installs Cassandra 0.7.4 on an Ubuntu 10.10 image
- configures ephemeral disks in raid0, if applicable (EBS is a bad fit for Cassandra)
- configures Cassandra to use the root volume for the commitlog and the ephemeral disks for data files
- configures Cassandra to use the local interface for intra-cluster communication
- configures all Cassandra nodes with the same seed for gossip discovery
Note the “EBS is a bad fit for Cassandra” remark. That’s what Adrian Cockcroft explains in Multi-tenancy and Cloud Storage Performance.
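The per-node layout described in the list above can be sketched as a small settings generator. This is not the AMI's actual script: the key names are modeled on `cassandra.yaml` options as I understand them for the 0.7 line (check the file shipped with your version), and the directory paths and IP addresses are made-up placeholders.

```python
def node_settings(listen_address, seed_address,
                  commitlog_dir="/var/lib/cassandra/commitlog",  # hypothetical path on the root volume
                  data_dirs=("/raid0/cassandra/data",)):         # hypothetical path on the raid0 ephemeral disks
    """Build the settings for one node; key names are assumptions."""
    return {
        "seeds": [seed_address],            # same seed on every node
        "listen_address": listen_address,   # local interface for intra-cluster traffic
        "commitlog_directory": commitlog_dir,
        "data_file_directories": list(data_dirs),
    }


# Placeholder private addresses; the first node doubles as the seed.
addresses = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
cluster = [node_settings(addr, seed_address=addresses[0]) for addr in addresses]
```

Keeping the commit log and the data files on separate devices means sequential commit log writes do not compete with data file I/O, which is the usual motivation for this split.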
via: http://www.datastax.com/dev/blog/setting-up-a-cassandra-cluster-with-the-datastax-ami
Why would I use Document Database?
Fitzchak Yitzchaki:
Using a document database like RavenDB will give you the following benefits:
- Better performance in your application
- Faster development time
- Better maintenance experience
Broad claims. To me, the main strengths of document databases are:
- data modeling flexibility
- preserving querying capabilities
Everything else is either a consequence of these or an implementation-specific improvement, which most of the time is the result of another tradeoff.
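To make the two strengths concrete, here is a toy sketch using plain Python dicts. It is not RavenDB's API or that of any real document database; the sample documents and the `get_path` and `find` helpers are hypothetical. Documents in the same collection have different shapes, yet they can still be queried by field, including nested ones.

```python
# Documents in one collection do not have to share a schema.
posts = [
    {"title": "NoSQL Speed Bumps", "tags": ["nosql"]},
    {"title": "Open Large Datasets", "tags": ["bigdata", "nosql"],
     "source": {"site": "Quora"}},
    {"title": "Untitled draft"},
]


def get_path(doc, path):
    """Follow a dotted path such as 'source.site'; None if any part is missing."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def find(collection, path, value):
    """Return the documents whose field at `path` equals `value`."""
    return [doc for doc in collection if get_path(doc, path) == value]


def with_tag(collection, tag):
    """Return the documents whose 'tags' list contains `tag`."""
    return [doc for doc in collection if tag in doc.get("tags", [])]
```

Here `find(posts, "source.site", "Quora")` matches only the second document, and `with_tag(posts, "nosql")` matches the first two; the third document, which has neither field, is simply skipped. A real document database adds indexes on top of this so such queries do not scan every document.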