
NoSQL databases, Hadoop, Big Data: Pinned tabs Nov.4th

01: SlamData is an open source project that allows running SQL against MongoDB. The projection syntax for deep documents is what you’d expect: foo.bar, foo[2]. There’s also support for a few special functions: array_length, flatten_array, flatten_object.
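To make the projection syntax concrete, here is a toy Python sketch (not SlamData code; the `project` helper is hypothetical) of how a path like foo.bar or foo[2] resolves against a nested, MongoDB-style document:

```python
# Toy illustration (not SlamData code): resolving a projection path
# such as foo.bar or foo[2] against a nested MongoDB-style document.
import re

def project(doc, path):
    """Walk a dotted/indexed path ('foo.bar', 'xs[2]') into a nested dict."""
    # Tokenize 'foo[2].bar' into ['foo', 2, 'bar']
    steps = []
    for part in path.split("."):
        m = re.match(r"(\w+)((?:\[\d+\])*)$", part)
        steps.append(m.group(1))
        for idx in re.findall(r"\[(\d+)\]", m.group(2)):
            steps.append(int(idx))
    for step in steps:
        doc = doc[step]
    return doc

doc = {"foo": {"bar": 1}, "xs": [10, 20, 30]}
print(project(doc, "foo.bar"))  # -> 1
print(project(doc, "xs[2]"))    # -> 30
```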


02: 4 presentations recorded at GraphConnect, Neo4j’s conference (you can read about the conference here and here), all focused on visualizing data.

  • “Creating interactive graph visualizations” from Cambridge Intelligence
  • “Business intelligence and analytics with the graph” from GraphAlchemist
  • “Visualizing the Neo4j graph database with Tom Sawyer Perspectives” from TS Software
  • “6 degrees of Kevin Bacon: How to use Linkurious to explore and visualize graphs” from Linkurious.

03: A tutorial on how to use Amazon EMR, Impala and Tableau to analyze and visualize data stored in Amazon S3. Tableau connects to Hive or Impala through an ODBC driver for EMR, but “you can contact Tableau to find out how to activate Amazon EMR as a data source”.

Tableau is a tool for interactive visualization of data. I’ve always wondered what that means when combining it with big data sources.


04: Daniel Gutierrez published a good article that compares in-memory databases with in-memory data grids, their strengths and weaknesses, and how they fit in existing environments.


05: Netflix announced Dynomite, a new open source project that implements a Dynamo-style distributed engine that can be used together with single-server data storage engines (Redis, Memcached, LevelDB, etc.).

Dynomite works by deploying a process co-located with the storage engine that fulfills the following roles:

  1. proxy (i.e. it receives client traffic)
  2. coordinator of requests
  3. gossiper (i.e. it exchanges cluster membership and state with peer nodes)

If you are familiar with Riak, you’ll see that’s very similar to its architecture: a generic distributed layer with pluggable storage engines. As far as I can tell from the blog post, the main differences are:

  1. Dynomite preserves the storage engine protocol
  2. Replication to replica nodes is done asynchronously and the ack is sent back once a write happens on a local node (i.e. weaker write acks)
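The write path with its weaker ack can be sketched in a few lines of Python (names and structure are my own, not Dynomite’s actual implementation): the node writes locally, acks immediately, and a background thread fans the write out to replicas.

```python
# Minimal sketch (hypothetical names, not Dynomite's code) of the write
# path described above: ack after the local write, replicate to peer
# nodes asynchronously in the background.
import queue
import threading

class DynomiteStyleNode:
    def __init__(self, peers):
        self.store = {}              # stands in for the local Redis/Memcached
        self.peers = peers           # replica nodes for this token range
        self._q = queue.Queue()
        threading.Thread(target=self._replicate_loop, daemon=True).start()

    def set(self, key, value):
        self.store[key] = value      # 1. local write
        self._q.put((key, value))    # 2. enqueue for async replication
        return "OK"                  # 3. ack immediately (weaker write ack)

    def _replicate_loop(self):
        while True:
            key, value = self._q.get()
            for peer in self.peers:  # best-effort fan-out to replicas
                peer.store[key] = value
            self._q.task_done()

replica = DynomiteStyleNode(peers=[])
node = DynomiteStyleNode(peers=[replica])
assert node.set("user:1", "alice") == "OK"  # ack returns before replication
node._q.join()                              # wait for background replication
print(replica.store["user:1"])              # -> alice
```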

Last but not least, the Dynomite client features are very similar to those offered by the DataStax clients for Cassandra:

  1. persistent connection pooling
  2. smart request routing
  3. automatic (and configurable) failover
  4. retry policies
  5. connection pooling metrics
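The failover and retry-policy pieces of such a client can be sketched like this (a hypothetical illustration in the spirit of the list above, not the actual Dynomite or DataStax client APIs):

```python
# Hypothetical sketch of client-side failover with a bounded retry
# policy, in the spirit of the feature list above (not the actual
# Dynomite/DataStax client code).
import itertools

class SimpleFailoverClient:
    def __init__(self, nodes, max_retries=3):
        self.nodes = itertools.cycle(nodes)  # round-robin request routing
        self.max_retries = max_retries

    def execute(self, request):
        last_error = None
        for _ in range(self.max_retries):
            node = next(self.nodes)
            try:
                return node(request)         # a "node" is just a callable here
            except ConnectionError as e:
                last_error = e               # failover: try the next node
        raise last_error

def dead_node(request):
    raise ConnectionError("node down")

def healthy_node(request):
    return f"handled {request}"

client = SimpleFailoverClient([dead_node, healthy_node])
print(client.execute("GET user:1"))  # -> handled GET user:1
```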

The current Dynomite implementation supports Redis and Memcached, which I think are the engines Netflix is planning to use. Source code is on GitHub.


06: @antirez continues the investigation of the latency spikes identified by Stripe’s chaos monkey test on a Redis cluster. He looks into the fork implementation in the Linux kernel and into libc malloc vs. jemalloc usage.
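The cost being investigated is easy to observe yourself: even with copy-on-write, fork() must duplicate the parent’s page tables, so fork latency grows with the process’s resident memory. A rough, POSIX-only measurement (this times fork plus child exit and wait, not fork alone, so treat the number as an upper bound):

```python
# Rough fork-latency probe (POSIX only). Allocates ~100 MB so the
# parent has a non-trivial page table; fork() must copy page-table
# entries even with copy-on-write, which is the cost being measured.
# Note: this times fork + child exit + wait, not fork() in isolation.
import os
import time

buf = bytearray(100 * 1024 * 1024)  # ~100 MB resident in the parent

start = time.monotonic()
pid = os.fork()
if pid == 0:
    os._exit(0)                      # child exits immediately
os.waitpid(pid, 0)
elapsed_ms = (time.monotonic() - start) * 1000
print(f"fork+exit with ~100 MB resident: {elapsed_ms:.2f} ms")
```

On a Redis node with tens of gigabytes resident, the same page-table copy is what shows up as a latency spike during background saves.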


07: A long, very long article about Hadoop, its components, and how it might or might not fit into the architecture of your data platform despite its clear cost advantage.


08: If you are interested in or using Neo4j and need a cheatsheet, Michael Hunger has published one with DZone’s help. It starts with a brief definition of what a graph data model is and then spends the rest of the pages on Cypher, Neo4j’s graph query language.


09: According to Joe Caserta, author of the book “The Data Warehouse ETL Toolkit“, Hadoop’s adoption went from the academic space in 2009-2010 to POCs in 2011-2012, with the expectation that many of these will go into production in 2014-2015.


10: An interesting idea from MapR: use labels to categorize the different nodes in a cluster, thus allowing jobs to run on subsets of nodes. A patch is available for YARN, but it hasn’t been applied yet.


11: According to the new report “Predictions 2015: Hadoop Will Become A Cornerstone Of Your Business Technology Agenda” from Forrester Research, Hadoop will become a must-have for large enterprises, SQL will be the top application, and most of the Hadoop clusters will live in the cloud. We’ll see how these predictions stand the test of time.


12: What file format should I use for storing data in Hadoop? Gwen Shapira’s answer is Avro. The main reason? Avro stores the data schema in the file.
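The value of that property is easy to demonstrate with a toy sketch (this illustrates the idea only; it is not Avro’s actual binary container format, and real code would use the avro or fastavro libraries): because the writer’s schema travels inside the file, a reader needs no out-of-band schema registry or agreement.

```python
# Toy illustration of the Avro property discussed above: the writer's
# schema is stored in the file itself, so a reader can always decode it.
# (A sketch of the idea only -- NOT Avro's actual container format.)
import io
import json

def write_container(fo, schema, records):
    fo.write((json.dumps(schema) + "\n").encode())   # schema header first
    for rec in records:
        fo.write((json.dumps(rec) + "\n").encode())  # then the data

def read_container(fo):
    lines = fo.read().decode().splitlines()
    schema = json.loads(lines[0])                    # recovered from the file
    records = [json.loads(line) for line in lines[1:]]
    return schema, records

schema = {"type": "record", "name": "User",
          "fields": [{"name": "id", "type": "long"},
                     {"name": "name", "type": "string"}]}
buf = io.BytesIO()
write_container(buf, schema, [{"id": 1, "name": "Ada"}])

buf.seek(0)
recovered_schema, records = read_container(buf)  # no external schema needed
print(recovered_schema["name"], records)
```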

Original title and link: NoSQL databases, Hadoop, Big Data: Pinned tabs Nov.4th (NoSQL database©myNoSQL)