A rough overview of hyperparameter search using Bayesian optimization.
The paper on which this presentation is based:
Bergstra, James, et al. "Algorithms for hyper-parameter optimization." 25th annual conference on neural information processing systems (NIPS 2011). Vol. 24. Neural Information Processing Systems Foundation, 2011.
https://hal.inria.fr/hal-00642998/
The document discusses graph databases and their properties. Graph databases are structured to store graph-based data by using nodes and edges to represent entities and their relationships. They are well-suited for applications with complex relationships between entities that can be modeled as graphs, such as social networks. Key graph database technologies mentioned include Neo4j, OrientDB, and TinkerPop which provides graph traversal capabilities.
The document discusses the paper "t-vMF Similarity for Regularizing Intra-Class Feature Distribution" presented at CVPR2021. The paper proposes a new similarity measure called t-vMF similarity that can control the width of the peak and skirt of the cosine similarity. This allows intra-class variance to be reduced while preventing gradient vanishing, especially for imbalanced or small-scale datasets where maximizing discrimination is more important than minimizing intra-class variance. The t-vMF similarity is implemented by considering the von Mises-Fisher distribution in the process of the softmax cross-entropy loss, making it simple to implement.
This document summarizes Masahiro Nakagawa's presentation on Fluentd and Embulk. Fluentd is a data collector for unified logging that allows for streaming data transfer based on JSON. It is written in Ruby and uses plugins to collect, process, and output data. Embulk is a bulk loading tool that allows high performance parallel processing of data to load it into various databases and storage systems. Both tools use a pluggable architecture to provide flexibility in handling different data sources and targets.
This document introduces Hivemall, an open-source machine learning library built as Hive UDFs. It summarizes new features in version 0.4, including Random Forest and Factorization Machine algorithms. The speaker then outlines the development roadmap, with plans to add Gradient Tree Boosting, Field-aware Factorization Machines, Online LDA, and a Mix server in upcoming versions. Real-world use cases of Hivemall are also briefly mentioned.
Embulk, an open-source plugin-based parallel bulk data loader (Sadayuki Furuhashi)
The document discusses Embulk, an open-source parallel bulk data loader that uses plugins. Embulk loads records from various sources ("A") to various targets ("B") using plugins for different source and target types. This makes the painful process of data integration more relaxed. Embulk executes in parallel, validates data, handles errors, behaves deterministically, and allows for idempotent retries of bulk loads.
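PDF version: PlayFab, which has analyzed games around the world, has evolved significantly! Let's take a look behind the scenes at its latest data exploration mechanism. db tech showcase 2020 (Daisuke Masubuchi)
PlayFab has analyzed online games and smartphone apps around the world. Recently, in addition to its existing event analytics, it gained cloud analytics capabilities that cover a wide range of telemetry. This session introduces the architecture and mechanisms behind those capabilities, built on Azure Data Explorer, a.k.a. Kusto. It includes mechanisms shared with the back ends of Windows telemetry analysis and the Azure log analysis platform, so stay tuned! We hope it will be useful not only to the game industry but also to large-scale SaaS and IoT businesses that are considering big data operations.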
at db tech showcase ONLINE 2020 https://db-tech-showcase.com/dbts/2020/online #dbts2020 #gamestackjp
*This material is the session deck of the same title presented at the DB Tech Showcase event held on November 11, 2020.
2. WHO AM I ?
• Toru Takahashi (@nora96o)
• Treasure Data, Inc.
• Support Engineering Manager
• Handling email and chat, writing blog posts, writing code, and so on
• http://qiita.com/toru-takahashi
• Before I knew it, I had entered my fourth year of working life...
47. What is Digdag?
Digdag is a simple tool that helps you to build, run, schedule, and monitor
complex pipelines of tasks. It handles dependency resolution so that tasks
run in order or in parallel.
Digdag fits simple replacement of cron, IT operations automation, data
analytics batch jobs, machine learning pipelines, and many more by using
Directed Acyclic Graphs (DAG) as the infrastructure.
http://www.digdag.io/index.html
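The idea of dependency resolution over a DAG, as described on the slide above, can be illustrated with a minimal Python sketch. This is not Digdag's API or workflow format: the task names, the `pipeline` mapping, and the `execution_batches` helper are all hypothetical, and the standard-library `graphlib` module (Python 3.9+) stands in for whatever scheduler Digdag actually uses. It only shows how a dependency graph determines which tasks must run in order and which could run in parallel.

```python
from graphlib import TopologicalSorter


def execution_batches(dependencies):
    """Group tasks into ordered batches.

    `dependencies` maps each task name to the set of tasks it depends on.
    Tasks in the same batch have all their dependencies satisfied by
    earlier batches, so they could run in parallel.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock successors
    return batches


# Hypothetical pipeline: task -> tasks it depends on (not a real Digdag file).
pipeline = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "train_model": {"clean"},
    "report": {"aggregate", "train_model"},
}

for step, batch in enumerate(execution_batches(pipeline), start=1):
    print(step, batch)
```

In this sketch, `aggregate` and `train_model` end up in the same batch because both depend only on `clean`, while `report` waits for both of them.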