Hivemall is an open source machine learning library built as a collection of Hive UDFs. It provides over 100 machine learning algorithms and functions for tasks like feature engineering, evaluation, and recommendation. Hivemall entered the Apache Incubator in 2016, and the first Apache release (v0.5.0) is upcoming. It supports platforms like Hive, Spark, and Pig for scalable parallel processing.
The Event Collector system was one of the legacy systems at Treasure Data. As usage grew, it ran into several performance and scalability problems. Engineers addressed these problems through optimizations such as increasing socket backlogs, caching parsers, running processes in parallel, and moving deduplication to a separate thread so that it no longer blocked the input pipeline. These changes helped…
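The summary does not show Treasure Data's actual code, so the following is only a minimal Python sketch of the general pattern it describes, under stated assumptions: the input path parses and enqueues events without waiting, a separate worker thread performs deduplication, and parser construction is cached. The `Collector` class, the JSON input format, and the `id` field used as the deduplication key are all hypothetical.

```python
import json
import queue
import threading
from functools import lru_cache


@lru_cache(maxsize=None)
def get_parser(fmt: str):
    # Hypothetical parser cache: build the parser once per format and reuse it.
    if fmt == "json":
        return json.loads
    raise ValueError(f"unsupported format: {fmt}")


class Collector:
    """Hypothetical collector: input never blocks on deduplication."""

    def __init__(self):
        self._pending = queue.Queue()
        self._seen = set()   # touched only by the worker thread
        self.accepted = []   # read it only after close() has returned
        self._worker = threading.Thread(target=self._dedup_loop, daemon=True)
        self._worker.start()

    def receive(self, raw: str, fmt: str = "json") -> None:
        # Input path: parse with a cached parser and enqueue, then return.
        self._pending.put(get_parser(fmt)(raw))

    def _dedup_loop(self) -> None:
        # Worker thread: drop events whose (assumed) "id" was already seen.
        while True:
            event = self._pending.get()
            if event is None:  # sentinel sent by close()
                break
            if event["id"] not in self._seen:
                self._seen.add(event["id"])
                self.accepted.append(event)

    def close(self) -> None:
        # Stop the worker after it drains everything enqueued before the sentinel.
        self._pending.put(None)
        self._worker.join()


c = Collector()
for raw in ['{"id": 1, "v": "a"}', '{"id": 1, "v": "a"}', '{"id": 2, "v": "b"}']:
    c.receive(raw)
c.close()
print([e["id"] for e in c.accepted])  # [1, 2]
```

Because the queue is FIFO and a single worker consumes it, events are deduplicated in arrival order; a real system would also need to bound the `seen` set (for example with a time window), which this sketch omits.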
Bigdam is a planet-scale data ingestion pipeline. It addresses several issues with the traditional pipeline: an imperfect queue, throughput limitations, latency from event collectors to queries, event collector code that is difficult to maintain, and many small temporary and imported files. The redesigned pipeline includes Bigdam-Gateway for HTTP endpoints, Bigdam-Pool for d…
User-defined partitioning is a new partitioning strategy in Treasure Data that lets users specify which column to use for partitioning, in addition to the default "time" column. This gives more flexible partitioning that better fits customer data platform workloads. Users can define partitioning rules through Presto or Hive to improve query performance by enabling colocated joins and fi…
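How Treasure Data stores partitions internally is not described here, so this is a hypothetical in-memory Python illustration of why choosing the partitioning column helps. It assumes rows are hash-bucketed on the user-chosen column with a fixed bucket count (the use of `zlib.crc32` and the helper names `partition`, `colocated_join`, and `filter_eq` are illustrative assumptions). When two tables are bucketed on the same column with the same bucket count, matching keys always share a bucket index, so each bucket pair can be joined independently, and an equality filter on that column only needs to scan one bucket.

```python
import zlib
from collections import defaultdict


def bucket_of(value, num_buckets: int) -> int:
    # Deterministic hash so every table maps the same key to the same bucket.
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


def partition(rows, key: str, num_buckets: int):
    # Assumed scheme: hash-bucket rows on the user-chosen partitioning column.
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_of(row[key], num_buckets)].append(row)
    return buckets


def colocated_join(left, right, key: str, num_buckets: int):
    # Join bucket b of `left` only with bucket b of `right`; no cross-bucket shuffle.
    out = []
    for b in range(num_buckets):
        index = defaultdict(list)
        for rrow in right.get(b, []):
            index[rrow[key]].append(rrow)
        for lrow in left.get(b, []):
            for rrow in index[lrow[key]]:
                out.append({**lrow, **rrow})
    return out


def filter_eq(buckets, key: str, value, num_buckets: int):
    # An equality filter on the partitioning column scans a single bucket.
    target = buckets.get(bucket_of(value, num_buckets), [])
    return [row for row in target if row[key] == value]


N = 4
users = partition([{"user_id": 1, "name": "a"}, {"user_id": 2, "name": "b"}], "user_id", N)
events = partition(
    [{"user_id": 1, "event": "click"}, {"user_id": 1, "event": "view"},
     {"user_id": 3, "event": "click"}],
    "user_id", N,
)
joined = colocated_join(users, events, "user_id", N)
print(sorted(r["event"] for r in joined))       # ['click', 'view']
print(filter_eq(events, "user_id", 3, N))       # [{'user_id': 3, 'event': 'click'}]
```

With time-only partitioning, the same join would generally have to redistribute both tables by `user_id` first; partitioning on the join column up front is what makes the bucket-by-bucket plan possible.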