DOI: 10.1145/2684822.2685287
research-article

You Are Where You Go: Inferring Demographic Attributes from Location Check-ins

Published: 02 February 2015

Abstract

User profiling is crucial to many online services. Several recent studies suggest that demographic attributes are predictable from different kinds of online behavioral data, such as users' "Likes" on Facebook, friendship relations, and the linguistic characteristics of tweets. But location check-ins, which bridge users' offline and online lives, have by and large been overlooked in inferring user profiles. In this paper, we investigate the predictive power of location check-ins for inferring users' demographics and propose a simple yet general location-to-profile (L2P) framework. More specifically, we extract rich semantics of users' check-ins in terms of spatiality, temporality, and location knowledge, where the location knowledge is enriched with semantics mined from heterogeneous domains, including both online customer review sites and social networks. Additionally, tensor factorization is employed to draw out low-dimensional representations of users' intrinsic check-in preferences considering the above factors. The extracted features are used to train predictive models for inferring various demographic attributes.
We collect a large dataset consisting of profiles of 159,530 verified users from an online social network. Extensive experiments on this dataset show that: 1) location check-ins are diagnostic of a variety of demographic attributes, such as gender, age, education background, and marital status; 2) the proposed framework substantially outperforms the compared models for profile inference in terms of various evaluation metrics, such as precision, recall, F-measure, and AUC.
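
To make the tensor-factorization step concrete, here is a minimal sketch under stated assumptions, not the paper's implementation. The abstract does not specify the tensor's modes or the factorization algorithm, so this example assumes a user x location-category x time-bin count tensor and takes the user embeddings from a truncated SVD of the mode-1 unfolding (the user factor of a higher-order SVD). The function names, the tensor layout, and the toy check-ins are all hypothetical.

```python
import numpy as np


def build_checkin_tensor(checkins, n_users, n_categories, n_time_bins):
    """Count check-ins into a user x category x time-bin tensor.

    Assumed layout (not taken from the paper): each check-in is a
    (user_index, category_index, time_bin_index) triple.
    """
    tensor = np.zeros((n_users, n_categories, n_time_bins))
    for user, category, time_bin in checkins:
        tensor[user, category, time_bin] += 1.0
    return tensor


def user_factors(tensor, rank):
    """Return rank-dimensional user embeddings.

    Unfolds the tensor along the user mode and keeps the leading
    `rank` left singular vectors, scaled by their singular values.
    This is a stand-in for whichever factorization the paper uses.
    """
    n_users = tensor.shape[0]
    unfolded = tensor.reshape(n_users, -1)  # users x (categories * time bins)
    u, s, _ = np.linalg.svd(unfolded, full_matrices=False)
    return u[:, :rank] * s[:rank]


# Hypothetical toy data: 3 users, 3 location categories, 3 time bins.
checkins = [(0, 0, 0), (0, 0, 1), (1, 1, 2), (1, 1, 2), (2, 0, 0), (2, 2, 1)]
tensor = build_checkin_tensor(checkins, n_users=3, n_categories=3, n_time_bins=3)
features = user_factors(tensor, rank=2)
print(features.shape)
```

Each row of `features` could then be passed, alongside other check-in features, to any standard classifier for an attribute such as gender or age group; the choice of classifier is likewise left open here.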

    Published In

    WSDM '15: Proceedings of the Eighth ACM International Conference on Web Search and Data Mining
    February 2015
    482 pages
    ISBN: 9781450333177
    DOI: 10.1145/2684822

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. demographics
    2. location knowledge
    3. prediction
    4. spatiality
    5. temporality
    6. tensor factorization

    Conference

    WSDM 2015

    Acceptance Rates

    WSDM '15 Paper Acceptance Rate 39 of 238 submissions, 16%;
    Overall Acceptance Rate 498 of 2,863 submissions, 17%
