Notes on the Google Marketing talk on computer science and statistics at SIG-KBS (Special Interest Group on Knowledge-Based Systems)
There was a talk by Google at SIG-KBS and I took notes, so I'm posting them here. I had heard Google-related talks a few times before, but never one about statistics, so it was very interesting.
The stance was "please don't stream it on Ust or photograph the slides and put them on the web ><", but apparently writing it up on a blog is fine, so here it is. These are my notes as taken, with no supplementary explanation added.
The explanations of the in-house-only optimized language and of the statistical models used for statistical analysis were interesting.
- CS and statistics
- Computer Science and Statistics at Google Marketing
- Joined Google after graduating from the University of Tokyo
- 2009 Quantitative Marketing Manager @ Tokyo
- PhD in Engineering from the University of Tokyo (focused on computational science)
- modeling, computational and simulation analysis of complex networks
- Visualization of large-scale complex graphs!
- Only a very small number of people
- Member
- Web search
- Statistics
- 1 Let others speak for you
- 2 and 3 are the important ones
- 2 Data not hype
- 3 Results must be trackable
- 4 Promote trial
- 5 You're smart and your time matters.
- 6 We're serious. Except when we're not.
- 7 Big ideas move us.
- Seven principles of Google Marketing
- Every employee in the company can access quite a lot of data. Well, not exactly.
- Source code access is for software engineers only, but apparently the statistics people can view it too
- Restrictions are strict
- How should non-engineers look at log data?
- Tools for reporting
- analysis skills
- engineering skills
- product and market-specific knowledge and expertise
- extensive analytical and statistical skills.
- analyses to help inform marketing strategies for key products
- We have the same data/logs access privileges as software engineers
- We are supposed to be data analysis professionals
- Two case studies
- Display Ad Experiments
- Rises from 100 to 120
- That alone doesn't tell you anything
- ããã
- Test & control ads
- Measuring the advertising effect of video is difficult
- Comparison experiment
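The "100 to 120" note can be made concrete with a small sketch. It assumes a hypothetical metric index measured before and after a campaign, once for a group shown the ad (test) and once for a group not shown it (control). The numbers and function names are invented for illustration and are not from the talk.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change of a metric, e.g. 100 -> 120 gives 0.20."""
    return (after - before) / before


def incremental_lift(test_before, test_after, control_before, control_after):
    """Change in the test group beyond what the control group also saw."""
    return relative_change(test_before, test_after) - relative_change(
        control_before, control_after
    )


# Hypothetical numbers: the test group's index goes from 100 to 120,
# but the control group (not shown the ad) also rises from 100 to 115.
raw = relative_change(100, 120)
lift = incremental_lift(100, 120, 100, 115)
print(f"raw change: {raw:.2f}, lift over control: {lift:.2f}")
```

Under these made-up numbers most of the raw increase also shows up in the control group, which is why the rise on its own says little about the ad.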
- Media Mix
- Statistical models
- how does data analysis work at Google
- Programming languages
- Sawzall
- Python
- JavaScript
- Statistical Analysis
- R
- SQL
- What they do resembles university research
- Quite a few people inside Google write R libraries
- Data analysis procedure
- datapull from various logs
- datapull from other data sources
- aggregate and process
- statistical analysis -- apply statistical models on the data
- visualize and publish as presentation or report
- Logs(70%)
- access log
- client download/update log
- We don't use an RDBMS at this stage
- The data is simply too large
- Requires distributed computing with many machines
- Often no complex data manipulation is needed
- The goal of data analysis is often rather simple
- sum, histogram, max, min, top-N, filtering
- The analysis methods themselves are often simple
- Terabytes are handled in one go
- An RDBMS is not needed
- The hard part is joins
- There is no need to do that kind of thing
- Structure of logs
- request URL
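The simple goals listed above (sum, histogram, max, min, top-N, filtering) can be sketched in a single pass over log records. This is only an illustration in plain Python: the record layout (request URL plus response size), the `/healthz` filter, and the bucket width are assumptions made for the example, not the log format described in the talk.

```python
from collections import Counter

# Hypothetical access-log records: (request URL, response size in bytes).
LOG = [
    ("/search?q=r", 512),
    ("/maps", 2048),
    ("/search?q=sawzall", 640),
    ("/healthz", 16),
    ("/maps", 1900),
]


def summarize(records, top_n=2, bucket=1024):
    """One pass over the records: filter, then sum/min/max/histogram/top-N."""
    total, smallest, largest = 0, None, None
    histogram = Counter()  # lower edge of size bucket -> count
    paths = Counter()      # path without query string -> count
    for url, size in records:
        if url == "/healthz":  # filtering: drop monitoring traffic
            continue
        total += size
        smallest = size if smallest is None else min(smallest, size)
        largest = size if largest is None else max(largest, size)
        histogram[size // bucket * bucket] += 1
        paths[url.split("?", 1)[0]] += 1
    return {
        "sum": total,
        "min": smallest,
        "max": largest,
        "histogram": dict(histogram),
        "top": paths.most_common(top_n),
    }


summary = summarize(LOG)
print(summary)
```

None of these aggregates needs a join or an index, which fits the note that an RDBMS is not needed at this stage. At real log volumes the same per-record logic would run across many machines.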
- They use a programming language called Sawzall to make MapReduce easy to use.
- Sawzall
- Query Geo Distribution
- datum: table sum[t: time][lat: int][lon: int] of int;
- proto "querylog.proto"
- log_record: QueryLogProto
- A language that can be learned within days
- It hides the MapReduce processing, so it can be used casually
- Heavily optimized for in-house use
- 90-something %
- éã
- Failure-oblivious
- Discard and recalculate the record with an error rather than stall the whole computation.
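To show the shape of the "Query Geo Distribution" example without Sawzall, here is a rough Python sketch of the same idea: each record is handled independently, emits a count into a table keyed by hour, latitude, and longitude, and a record that fails to parse is skipped instead of stopping the run. The text record format and the function names are assumptions for illustration. The sketch only discards bad records and does not recalculate them, and it is not how Sawzall or MapReduce is implemented.

```python
from collections import defaultdict

# Hypothetical raw records: "timestamp,latitude,longitude".
# The third one is malformed on purpose.
RAW = [
    "1262304000,35.68,139.69",
    "1262304060,35.10,139.20",
    "not-a-record",
    "1262307600,37.42,-122.08",
]


def map_record(line):
    """Per-record step: parse one line and emit ((hour, lat, lon), 1)."""
    ts, lat, lon = line.split(",")
    hour = int(ts) // 3600 * 3600  # truncate the timestamp to the hour
    # int() truncates toward zero, so -122.08 becomes -122.
    yield (hour, int(float(lat)), int(float(lon))), 1


def run(lines):
    """Aggregate emitted values into a sum table, skipping bad records."""
    table = defaultdict(int)
    skipped = 0
    for line in lines:
        try:
            for key, value in map_record(line):
                table[key] += value
        except ValueError:
            skipped += 1  # discard the bad record instead of aborting
    return dict(table), skipped


table, skipped = run(RAW)
print(table, skipped)
```

Because the per-record function shares no state between records, the loop in `run` is the part a distributed framework could split across machines.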
- MySQL-like database engine
- Need to parse and aggregate different data sources
- Build statistical models
- Apply appropriate statistical methods for given problems. Some examples:
- Time-series(seasonal ARIMA) model
- Linear mixed effects (LME)
- Random forest models
- DiD, propensity scoring
- Experimental design
- 20+ statisticians and quant analysts on the team.
- R is most commonly used
- Analysis of clients that are likely to grow going forward: decision tree
- Can be understood with autocorrelation and moving averages; decades of this and that
- Time-series analysis!!!
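The two ingredients mentioned in the notes, autocorrelation and moving averages, can be computed by hand on a toy series. This is a minimal sketch in plain Python with made-up data. It is not a seasonal ARIMA fit and not the team's model (the notes say they mostly use R); it only shows what the two quantities measure.

```python
def moving_average(series, window):
    """Simple moving average; returns len(series) - window + 1 points."""
    return [
        sum(series[i : i + window]) / window
        for i in range(len(series) - window + 1)
    ]


def autocorrelation(series, lag):
    """Sample autocorrelation at the given lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


# Made-up series with a period of 4, repeated three times.
data = [10, 20, 30, 20] * 3
print(moving_average(data, 4)[:3])
print(autocorrelation(data, 4), autocorrelation(data, 2))
```

On this toy series a window equal to the period flattens the cycle, while the autocorrelation is positive at the period and negative at half the period. That is the kind of signal a seasonal model picks up.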
- Visualization and Presentation
- Animation of time-series correlation
- Adhering to Engineering standards
- Sharing all source code with all other software engineers
- Check code into a single repository for the whole company
- Your code may be used or edited by someone in the future
- All code has to follow coding styles
- All code has to be reviewed by peers before check-in
- Sharing computing resources with all other engineers.
- K distributed machines.
- same infrastructure as production.
- Even throwaway code gets reviewed
- The range of work covered is broad
- The problems are not defined
- Can YouTube traffic be used for something?
- A project starts from thinking about what kind of approach is needed
- Challenges we are facing
- Complex questions without simple solutions
- Large volumes of data
- can't achieve w/o sophisticated computing infrastructure
- analysts need to have necessary technical / engineering and quantitative skills
- Limited resources (hiring & training)
- Privacy
- There are hardly any statisticians geared toward in-house work.
- A CS + statistics + math background is hard to find.
- They get out statistics textbooks and study.
- There are not many statistics majors in Japan.
- Google has many data analyst teams, including us QM
- We are NOT software engineers but are equipped with either engineering or statistics backgrounds and adhere to engineering standards at Google
- We undertake complex research and modeling projects that involve large-scale data processing and intensive statistical analysis.
- We are hoping
- Google-related papers
- http://labs.google.com/papers
- QM also presents some papers at the JSM (Joint Statistical Meetings) conference
- http://www.amstat.org/meetings/jsm/2010/
- The direction is to use whatever can be obtained
- Raw data is available too
- Balance against the monetary cost