Evaluation of Spark (on YARN) at Twitter, and lessons learnt from our ongoing experiments. Read less

For a class project we developed a whole set of Pig scripts for TPC-H. Our goals are: 1) identifying the bottlenecks of Pig's performance especially of its relational operators, 2) studying how to write efficient scripts by making full use of Pig Latin's features, 3) comparing with Hive's TPC-H results for verifying both 1) and 2)Read less
This document discusses flexible indexing in Hadoop. It describes how Twitter uses Elephant-Twin, an open source library they developed, to create indexes at the block level or record level in Hadoop. Elephant-Twin allows minimal changes to jobs/scripts, indexes data without copying it, supports post-factum indexing, and indexes can be used to efficiently retrieve relevant data through an IndexedI
Yahoo! has begun evaluating Hive for use as part of its Hadoop stack. Since, in many peoples' minds, Hive and Pig are roughly equivalent and Pig Latin is very close to SQL, this has led to some confusion. Why are we interested in using both technologies? As we have looked at our workloads and analyzed our use cases, we have come to the conclusion that the different use cases require different tool
We are creating infrastructure to support ad-hoc analysis of very large data sets. Parallel processing is the name of the game. Our system runs on a cluster computing architecture, on top of which sit several layers of abstraction that ultimately bring the power of parallel computing into the hands of ordinary users. The layers in between automatically translate user queries into efficient paralle
ãªãªã¼ã¹ãé害æ å ±ãªã©ã®ãµã¼ãã¹ã®ãç¥ãã
ææ°ã®äººæ°ã¨ã³ããªã¼ã®é ä¿¡
å¦çãå®è¡ä¸ã§ã
j次ã®ããã¯ãã¼ã¯
kåã®ããã¯ãã¼ã¯
lãã¨ã§èªã
eã³ã¡ã³ãä¸è¦§ãéã
oãã¼ã¸ãéã
{{#tags}}- {{label}}
{{/tags}}