I read "You Are Where You Go: Inferring Demographic Attributes from Location Check-ins" (WSDM 2015)
You Are Where You Go: Inferring Demographic Attributes from Location Check-ins
Overview
Using Weibo check-in data together with information about POIs (review data from dianping), the paper predicts a user's age, gender, education, sexual orientation, and relationship status (single, married, and so on), and even blood type and zodiac sign. Most attributes are predicted with high accuracy, and even for blood type and zodiac sign the results improve on random prediction.
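I had never seen a paper that tries to predict those last two. This is no time to be fussing over blood-type fortune telling. I could not work out what was going on, so I read it.
Roughly speaking, the paper processes each user's check-in data into a feature vector and then runs binary or multi-class classification for each attribute.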
Approach
According to the authors, the following three aspects of check-in data are related to a user's profile.
- Temporality : People's behavior is strongly tied to time (e.g. office workers commute from home to work on weekday mornings, retirees go shopping on weekday afternoons, taxi drivers work late at night on weekends). The category of place being checked into is also strongly tied to time (restaurants around noon and in the evening, transit early in the morning, and so on).
- Spatiality : Unlike actions such as like/follow/reply/RT, check-ins are subject to geographic constraints (you cannot check in at Stanford University and then, 30 minutes later, at Northeastern University 3,000 miles away). In the data, the distance between consecutive check-ins is almost always 20 km or less. Comparing where locals and tourists check in also shows that tourists check in within a very narrow area (they only visit the sightseeing spots).
- Location knowledge : The function of the place being checked into (the POI) matters too. Students go to school to study, and business people head to business districts to work. POIs are tied to a category hierarchy, so the paper makes use of it. Beyond categories, there are also reviews that rate a POI from various angles (five-point ratings and the like), and those matter as well.
This information is extracted through the following procedure.
Feature extraction
Preparation
The authors crawled Weibo for data on 3,354,918 users living in Beijing and Shanghai, 81,781,544 check-ins in total. After cleaning the data into a usable form, 159,530 users remain for the analysis.
Spatiality
The city is partitioned into regions, and the feature is how many times each user checked in within each region. The partitioning is a segmentation that combines road network information with check-in information (Discovering Regions of Different Functions in a City Using Human Mobility and POIs (KDD 2012)).
Temporality
To capture weekday/weekend patterns as well, the paper defines time bins with 48 elements (24 hours x weekday/weekend) and ties them to POI check-ins.
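As a minimal sketch of the 48-element time bin, the function below maps a timestamp to a bin index. The function name, the bin ordering (weekday hours first, then weekend hours), and the choice to treat only Saturday and Sunday as the weekend are assumptions made for illustration; how the paper orders its bins or handles public holidays is not covered in these notes.

```python
from datetime import datetime

HOURS_PER_DAY = 24


def time_bin(ts: datetime) -> int:
    """Map a timestamp to one of 48 bins.

    Bins 0-23 are weekday hours and bins 24-47 are weekend hours.
    This ordering is an assumption for illustration only.
    """
    # datetime.weekday() returns 0 for Monday, so 5 and 6 are Saturday and Sunday.
    is_weekend = ts.weekday() >= 5
    return ts.hour + (HOURS_PER_DAY if is_weekend else 0)


print(time_bin(datetime(2015, 2, 2, 9, 30)))   # a Monday morning
print(time_bin(datetime(2015, 2, 7, 23, 5)))   # a Saturday night
```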
Location knowledge
This is the most involved part. Roughly, the flow is:
- Cross domain POI conflation : Link the POIs in the Weibo check-in data to the POIs that have reviews on dianping
  - Done by sheer effort, using name, address, phone number, and distance
  - 35% of Weibo POIs get linked
- Lexicon creation : Done in two stages
  - After sorting POIs into a number of categories, take the top 200 words by log(term frequency) from the dianping review text as the lexicon
  - Then collect Weibo posts about each POI and extract the words that appear in the lexicon
  - This yields (Weibo-based) keywords tied to each POI
- Location knowledge transferring : Only 35% of POIs are linked to dianping, so the review information is incomplete. This step fills the gap
  - Estimate the reviews using the keywords tied to each POI as features
  - Each review aspect is a five-point rating, so it is treated as classification; the overall score is a natural number, so it is treated as regression
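The two-stage lexicon creation in the list above can be sketched as follows. Everything here is a hypothetical illustration: the function names, the assumption that reviews and posts arrive already tokenized (word segmentation for Chinese text is out of scope), and the alphabetical tie-breaking are not taken from the paper. Since the logarithm is monotonic, ranking by log(term frequency) gives the same order as ranking by raw frequency; the log is kept only to mirror the description.

```python
import math
from collections import Counter


def build_lexicon(reviews_by_category, top_k=200):
    """Stage 1: per category, keep the top_k terms ranked by log(term frequency).

    reviews_by_category maps a category name to a list of tokenized reviews
    (each review is a list of terms). Ties are broken alphabetically.
    """
    lexicon = {}
    for category, reviews in reviews_by_category.items():
        counts = Counter(term for review in reviews for term in review)
        scores = {term: math.log(freq) for term, freq in counts.items()}
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        lexicon[category] = ranked[:top_k]
    return lexicon


def poi_keywords(posts, lexicon_terms):
    """Stage 2: count the tokens of POI-related posts that appear in the lexicon."""
    vocabulary = set(lexicon_terms)
    return Counter(tok for post in posts for tok in post if tok in vocabulary)


# Toy, made-up data: three tokenized reviews and two tokenized posts.
reviews = {"restaurant": [["spicy", "noodles", "cheap"],
                          ["spicy", "hotpot"],
                          ["noodles", "spicy"]]}
lexicon = build_lexicon(reviews, top_k=2)
keywords = poi_keywords([["great", "spicy", "noodles"], ["spicy", "queue"]],
                        lexicon["restaurant"])
print(lexicon["restaurant"], dict(keywords))
```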
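Tensor decomposition
The information obtained this way is then fed into a tensor decomposition.
Concretely, it is arranged as a third-order tensor of user x location knowledge x contextual feature.
For the contextual feature, the time bin (24 x weekday/weekend) and the region are combined so that "which region was checked into in which time bin" is expressed along a single dimension.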
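One simple way to express a (region, time bin) pair along a single dimension is row-major flattening, sketched below. The index formula, the function names, and the use of raw counts as cell values are assumptions for illustration; the paper may order or weight the axis differently.

```python
N_TIME_BINS = 48  # 24 hours x weekday/weekend


def context_index(region_id, time_bin, n_time_bins=N_TIME_BINS):
    """Flatten (region, time bin) into one index (row-major; an assumed layout)."""
    return region_id * n_time_bins + time_bin


def context_vector(checkins, n_regions, n_time_bins=N_TIME_BINS):
    """Count one user's check-ins per flattened (region, time bin) cell.

    checkins is an iterable of (region_id, time_bin) pairs.
    """
    vec = [0] * (n_regions * n_time_bins)
    for region_id, time_bin in checkins:
        vec[context_index(region_id, time_bin, n_time_bins)] += 1
    return vec


# Made-up check-ins: one in region 0 and two in region 2, all in time bin 9.
vec = context_vector([(0, 9), (2, 9), (2, 9)], n_regions=3)
print(len(vec), vec[9], vec[context_index(2, 9)])
```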
For location knowledge, the reviews and the keywords are first clustered separately with k-means, producing vectors of N_R and N_K dimensions. A vector encoding the POI category information as 1-of-k is then added, and everything is laid out side by side so that, as with the contextual feature, it is expressed along a single dimension. These operations are hard to explain in words, so please refer to the original paper.
Decomposing all of this without setting up multiple tensors shows real determination.
Finally, a Tucker decomposition gives a low-dimensional representation for each user.
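To make the idea of a per-user low-dimensional representation concrete, here is a minimal sketch that computes a Tucker decomposition with a truncated higher-order SVD (HOSVD) in NumPy. HOSVD is one standard way to obtain a Tucker decomposition and is used here only as a stand-in; these notes do not say which solver the paper uses, and a real user x location knowledge x contextual feature tensor would be far larger and sparse. All function names, the tensor shape, and the ranks are assumptions.

```python
import numpy as np


def unfold(tensor, mode):
    """Matricize: rows index the chosen mode, columns index all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_dot(tensor, matrix, mode):
    """n-mode product: multiply `tensor` by `matrix` (J x I_mode) along `mode`."""
    moved = np.moveaxis(tensor, mode, -1)   # shape (..., I_mode)
    result = moved @ matrix.T               # shape (..., J)
    return np.moveaxis(result, -1, mode)


def hosvd(tensor, ranks):
    """Truncated HOSVD: returns (core, factors) of a Tucker decomposition."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of each unfolding form the factor matrix.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, factor in enumerate(factors):
        core = mode_dot(core, factor.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Multiply the core back by every factor matrix."""
    tensor = core
    for mode, factor in enumerate(factors):
        tensor = mode_dot(tensor, factor, mode)
    return tensor


# Toy random tensor standing in for user x location knowledge x contextual feature.
rng = np.random.default_rng(0)
X = rng.random((6, 4, 5))
core, factors = hosvd(X, ranks=(2, 3, 3))
user_embeddings = factors[0]   # one 2-dimensional row per user
print(core.shape, user_embeddings.shape)
```

With these assumptions, each row of `user_embeddings` is the kind of low-dimensional user vector that a downstream classifier or regressor could take as input.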
Estimation and experiments
Profile estimation
The user attributes are estimated from the features obtained by the Tucker decomposition. The prediction targets are as follows.
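Attribute | Share of users with the attribute [%] | Values |
---|---|---|
Gender | 94.02 | male, female |
Age | 33.16 | integer |
Education | 36.72 | university graduate or not |
Sexual orientation | 2.55 | heterosexual, bisexual, gay, lesbian |
Relationship status | 2.64 | single, looking, in a relationship, married |
Blood type | 1.64 | O, A, B, AB |
Zodiac sign | 58.16 | the 12 signs (apparently the English term is "zodiac sign") |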
Integer-valued attributes are handled by regression, everything else by binary or multi-class classification. Various classifiers are tried.
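Experimental results
The first thing to watch out for is the evaluation metric. Binary classification aside, the multi-class results report precision and recall averaged over the classes. This is important.
In the experiments, most attributes are predicted reasonably well. The trouble is blood type and zodiac sign.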
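Blood type comes in at precision 0.3012 and recall 0.3103. At first glance you might think that, whether the types are uniformly distributed or type O is the most common, predicting O for everyone would score about this well. But because the metric is the per-class average, predicting O for every user yields a recall of only (1.0 + 0.0 + 0.0 + 0.0) / 4 = 0.25 (type O is fully recalled, the other three types not at all), and the averaged precision of that baseline can be no higher than 0.25 either. In other words, the result does improve on random prediction.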
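The effect of class averaging on a constant "everyone is type O" prediction can be checked with a few lines of code. The label distribution below is made up purely for illustration and is not the paper's data; the function name and the convention of scoring a never-predicted class with precision 0 are also assumptions.

```python
def macro_precision_recall(y_true, y_pred, labels):
    """Average per-class precision and recall over all classes (macro average).

    A class that is never predicted gets precision 0 here (a convention choice).
    """
    precisions, recalls = [], []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        actual = sum(1 for t in y_true if t == label)
        precisions.append(tp / predicted if predicted else 0.0)
        recalls.append(tp / actual if actual else 0.0)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n


labels = ["O", "A", "B", "AB"]
# Made-up distribution: 4 x O, 3 x A, 2 x B, 1 x AB.
y_true = ["O"] * 4 + ["A"] * 3 + ["B"] * 2 + ["AB"]
y_pred = ["O"] * len(y_true)   # constant baseline: always predict O

precision, recall = macro_precision_recall(y_true, y_pred, labels)
print(precision, recall)
```

On this toy data the macro recall is (1.0 + 0.0 + 0.0 + 0.0) / 4 regardless of how common type O is, while the macro precision is the share of type O divided by 4, so a majority-class guess cannot score well under this metric.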
The same goes for zodiac sign, where the precision is 0.1303. Predicting a single sign for everyone scores only about 0.083 (1/12) under the same per-class averaging, so here too the result improves on random prediction.
The authors argue that "the accuracy is certainly low, and it is the worst of all the attributes, but it is still better than random."
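Honestly, I do not understand why this works. More to the point, I do not understand by what mechanism blood type or zodiac sign would be tied to a user's behavior. For zodiac sign, someone suggested that "behavior might change depending on one's birthday," but the time information used here does not include the month or year, so that seems unlikely to explain it.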