In the first half, we give an introduction to modern serialization systems: Protocol Buffers, Apache Thrift, and Apache Avro. Which one meets your needs?
In the second half, we show an example of data ingestion system architecture using Apache Avro.
This document summarizes a microservices meetup hosted by @mosa_siru. Key points include:
1. @mosa_siru is an engineer at DeNA and CTO of Gunosy.
2. The meetup covered Gunosy's architecture with over 45 GitHub repositories, 30 stacks, 10 Go APIs, and 10 Python batch processes using AWS services like Kinesis, Lambda, SQS and API Gateway.
3. Challenges discussed were managing 30 microservices, ensuring API latency below 50ms across availability zones, and handling 10 requests per second with nginx load balancing across 20 servers.
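NoSQL has entered the practical-use phase, yet many people may still not know how it can actually be used. In this session, we explain how MapR-DB (an HBase-compatible NoSQL database) is being used in enterprises, drawing on India's national ID number case and domestic Japanese cases to give a concrete picture of real-world usage and its technical underpinnings. These are the slides from a talk given at db tech showcase Tokyo 2015, held June 10-12, 2015.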
This document discusses strategies for optimizing access to large "master data" files in PHP applications. It describes converting master data files from PHP arrays to tab-separated value (TSV) files to reduce loading time. Benchmark tests show the TSV format reduces file size by over 50% and loading time from 70 milliseconds to 7 milliseconds without OPcache. Accessing rows as arrays by splitting on tabs is 3 times slower but still very fast at over 350,000 gets per second. The TSV optimization has been used successfully in production applications.
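The idea of keeping master data as TSV and splitting a row on tabs only when it is requested can be sketched roughly as follows. This is a minimal illustration in Python rather than the PHP used in the original talk, and it is not the author's implementation: the function names `load_master` and `get_row`, the column names, and the sample rows are all hypothetical.

```python
# Hypothetical column layout and sample data, for illustration only.
COLUMNS = ["id", "name", "attack"]
SAMPLE_TSV = "1\tslime\t3\n2\tdragon\t250\n"


def load_master(tsv_text):
    """Index raw TSV lines by their first column.

    Rows are stored as unsplit strings, so loading does not pay the
    cost of building one array per row up front.
    """
    rows = {}
    for line in tsv_text.splitlines():
        if not line:
            continue  # skip blank lines
        key, _, _ = line.partition("\t")
        rows[key] = line
    return rows


def get_row(rows, key, columns):
    """Split a single row on tabs at access time and label its fields."""
    line = rows.get(key)
    if line is None:
        return None
    return dict(zip(columns, line.split("\t")))


master = load_master(SAMPLE_TSV)
```

With this layout, `get_row(master, "2", COLUMNS)` returns the fields of the second row as strings; any type conversion is left to the caller. The trade-off mirrors the one described above: loading is cheap because rows stay as raw strings, while each lookup pays a small cost for the split.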
Effective DBMS
Efficient Use of Databases
(Team study 2018, Japanese)
* This document was written for non-profit purposes. If there is a copyright problem, please contact me.
The document outlines the divisions and focus areas of a large company, with most resources allocated to technology R&D, including deep learning and CNN. Other divisions include infrastructure, promotions, UI design, SEO, big data, IT, recruiting, staffing services, and administration.
Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...Recruit Technologies
This document describes a method for recommending suitable companies to new graduates based on their browsing history and other data. It proposes using implicit matrix factorization of browsing data along with Bayesian optimization of hyperparameters to focus recommendations on less popular, low-browsed companies. An evaluation on Japanese student and company data showed this approach achieved higher recall of suitable matches for low-browsed companies compared to other methods, especially when incorporating additional student and company profile information.
The document discusses the BIG DATA department of Recruit Technologies. It describes how the department has used Amazon's Elastic MapReduce (EMR) and Elastic Compute Cloud (EC2) services since 2010 to perform big data analytics on datasets using Hadoop. It provides details on how the department configures and manages EMR clusters on EC2 spot instances to perform tasks like log analysis and recommendation algorithms in a cost-effective manner. Various strategies are discussed around optimizing the use of spot instances and EMR to reduce costs while managing reliability.