[B! presto] yass's bookmarks

yass id:yass

yass's bookmarks on presto (27)

Amazon Athena, a Presto-based managed service
yass 2017/08/26
Athena

presto
A Benchmark Test on Presto, Spark Sql and Hive on Tez
We examined the performance of Presto, Spark SQL, and Hive on Tez, including the execution speed of common query patterns, on data ranging from tens of thousands to billions of rows. We conducted a benchmark test on mainstream big data SQL engines including Presto, Spark SQL, and Hive on Tez. We focused on the performance over medium data (from tens of GB to 1 TB), which is the major case used in most services.
yass 2016/11/26
presto

Spark SQL

tez

Hive

benchmark

hadoop
Presto anatomy
Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes. It is written in Java and uses a pluggable backend. Presto is fast due to code generation and runtime compilation techniques. It provides a library and framework for building distributed services and fast Java collections. Plugins all
yass 2015/09/23
presto
Miscellaneous thoughts on Presto - wyukawa's diary
I'd like to write down what I have noticed after operating Presto for about a year. Presto is without doubt an excellent OSS product, and I think anyone using Hive loses nothing by installing it. The advantages are as follows. It processes in memory, so it is faster than Hive and well suited to ad hoc queries. It is stable. Its architecture has no storage of its own, so updates are easy. Development is active: the release pace has slowed compared with before, but a new version still comes out about once every three weeks. When you report a bug, a fixed version is released within a few days. Development is open: pull requests are accepted and code reviews are thorough. The code is clean, and I personally consider it a prime example of modern Java. Judging from recent changes, Presto appears to prioritize stability, which for an administrator like me means a lower operational load...
yass 2015/09/09
presto

hadoop
SQL on Hadoop comparative evaluation [evaluation report as of November 2014]
Presentation material from Impala Meetup 2014/10/31 @Tokyo. [Note] The evaluation results presented in this material date from 2014. The software in question grows and improves quickly, and current versions differ greatly in features and performance. If you are interested in services or system integration based on the latest SQL on Hadoop information, please contact NTT Data, System Platforms Sector, OSS Professional Services (email: hadoop [AT] kits.nttdata.co.jp).
yass 2014/11/05
Hadoop

benchmark

comparison

hive

Impala

sql

presto

impala

tez
SQL on Hadoop in Taiwan
This document discusses SQL engines for Hadoop, including Hive, Presto, and Impala. Hive is best for batch jobs due to its stability. Presto provides interactive queries across data sources and is easier to manage than Hive with Tez. Presto's distributed architecture allows queries to run in parallel across nodes. It supports pluggable connectors to access different data stores and has language bi
yass 2014/09/27
presto

hadoop

sql
A few light notes on Presto, Ansible, and related topics - wyukawa's diary
Today I'll write a few light notes on Presto, Ansible, and related topics. I can't go into much depth, so please bear with me. In my environment we use Presto, and it is co-located with the DataNode and NodeManager. The main use case is running ad hoc queries: when we want to build some report, we use it to check what the data looks like. Previously we used Hive for this, but Hive runs as MapReduce and is slow (although local mode sometimes suffices), so Presto's speed is welcome. That said, this is partly because my environment holds small data; I expect even Presto would be slow when running a select against several hundred GB of compressed data. Another quietly nice point is that the Presto CLI displays column names, so you can immediately tell which data belongs to which column...
yass 2014/08/03
" I used to think an RDBMS for aggregation was necessary, but for cases where you simply select aggregated data, Presto is fast enough that I'm starting to think we may not need one. "

ansible

presto

hive

hadoop
War of the Hadoop SQL engines. And the winner is ...? - Sonra
War of the Hadoop SQL engines. And the winner is …? You may have wondered why we were quiet over the last couple of weeks? Well, we locked ourselves into the basement and did some research and a couple of projects and PoCs on Hadoop, Big Data, and distributed processing frameworks in general. We were also looking at Clickstream data and Web Analytics solutions. Over the next couple of weeks we wil
yass 2014/07/28
" Right now I would run both batch style queries (ETL) and interactive queries on Hive Tez as Hive offers the richest SQL feature set, especially analytic functions and supports a wide set of file formats. "

hadoop

sql

hive

tez

impala

presto

spark

infinidb

drill
MPP on Hadoop, Redshift, BigQuery - Go ahead!
There is intense pressure on Twitter to "hurry up and write about the rough differences in how the currently popular MPP systems are used!", so I'll write something informal. This article is based on my own experience and on what I have heard from users at meetups and the like, so not all of it is my own experience (especially BigQuery). If you ask the SAs at each vendor, they may tell you about better approaches or more details. I have never used an on-premises commercial MPP, so no comment there. Presto is the main MPP on Hadoop discussed because it is the one I use most right now; I think other MPP-on-Hadoop-style systems such as Impala are similar. There are of course implementation differences, so fill those in yourself as appropriate. Premise: you are developing an application and building an analytics platform for it from scratch. Brief summary: if you can set up a place to store the data, then something that can query it directly, Pre...
yass 2014/07/24
BigQuery

RedShift

Impala

Presto

Hadoop

mpp
Cloudera Blog
We are thrilled to announce the general availability of the Cloudera AI Inference service, powered by NVIDIA NIM microservices, part of the NVIDIA AI Enterprise platform, to accelerate generative AI deployments for enterprises. This service supports a range of optimized AI models, enabling seamless and scalable AI inference. Background The generative AI landscape is evolving […]
yass 2014/05/30
" Shark required more memory than available in the cluster to run the Reporting and Deep Analytics queries on RDDs (and thus those queries could not be completed) "

impala

hive

tez

shark

spark

presto

parquet

orcfile

benchmark

hadoop
Read Data in Parquet File Format by zhenxiao · Pull Request #1147 · prestodb/presto
yass 2014/05/16
presto

parquet
Netflix running Presto in the AWS Cloud
Netflix runs Presto in its AWS cloud environment to enable low-latency ad-hoc queries on petabyte-scale data stored in S3. Some key things Netflix did include optimizing Presto to read from and write directly to S3, fixing bugs, integrating Presto with its EMR and Ganglia monitoring, and deploying a 100+ node Presto cluster that handles over 1000 queries per day. Performance testing showed Presto
yass 2014/05/16
Netflix

presto

SequenceFile

parquet

hadoop
Presto in the cloud
yass 2014/05/16
Qubole

presto

hadoop

cloud
Announcing General Availability of Presto-as-a-Service | Qubole
yass 2014/04/30
Qubole

hadoop

presto

prestodb
CrateDB | The Database for Real-Time Analytics and Hybrid Search
Real-time Analytics Execute ad-hoc queries on billions of records in milliseconds. Columnar storage guarantees ultra-fast aggregations, enabling instant data-driven decisions. Begin with a simple query and delve into complex data relationships, revealing trends and patterns across diverse data types. Learn more > Effortless search across structured, semi-structured, geospatial, and vector data. Pe
yass 2014/04/19
" Crate Data is a distributed system that runs on one machine or a cluster of machines. Crate comes in one complete install package. It includes solid established open source components (Presto, Elasticsearch, Lucene, Netty) "

sql

presto

netty

lucene

elasticsearch

distributed
Presto Performance - Qubole Engineering Posts - Quora
Presto is an open source distributed SQL query engine, developed by Facebook. Presto was designed and written from the ground up for interactive analytics and approaches the speed of commercial data warehouses. Qubole started its Presto-as-a-Service program a few weeks ago to make it easily acces...
yass 2014/04/14
" Presto showed a speedup of 2-7.5x over Hive for these queries. "

presto

hadoop

hive

Qubole
What are the main differences between Facebook, Presto, and Amplab Shark?
Answer (1 of 2): 1. Primary Use Case: While both are intended for analytics, Shark's primary use case is providing SQL to an (extremely fast) in-memory database, with support also for on-disk (or abstract) data sources. Presto is designed to be a fast SQL engine for the latter, and does not have ...
yass 2014/02/14
" Presto has implemented some approximate aggregation operators with hard-coded characteristics (99% confidence intervals, fixed sampling, see BlinkDB "

presto

shark

spark

impala

comparison

blinkdb
Presto-as-a-Service: Interactive SQL Execution on AWS
Practical Guide to Building an API Back End with Spring Boot, 2nd Edition. Thousands of developers have learned the basics of building REST APIs with Spring Boot from the InfoQ mini-book "Practical Guide to Building an API Back End with Spring Boot". The book uses Spring Boot 2, the version newly released at the time of publication. However, Spring Boot 3 was recently released, and important cha...
yass 2014/02/10
" It already has 2,000 stars and 350 forks on GitHub, making it more popular than similar projects such as Impala. "

presto

hadoop

qubole
https://www.xtendsys.net/blog/post-1
yass 2014/02/09
presto

facebook

hadoop

hive
Hardware requirements for Presto
Most people are running Trino (formerly PrestoSQL) on the Hadoop nodes they already have. At Facebook we typically run Presto on a few nodes within the Hadoop cluster to spread out the network load. Generally, I'd go with the industry standard ratios for a new cluster: 2 cores and 2-4 gig of memory for each disk, with 10 gigabit networking if you can afford it. After you have a few machines (4+),
yass 2014/01/24
" At Facebook / We run our JVMs with a 16 gigabyte heap to leave most memory available for OS buffers / On the machines we run Presto we don't run MapReduce tasks / Most of the Presto machines we are on have 16 real cores and we use processor affinity to limit Presto to 12 cores "

presto

Facebook

hardware

server