Description:


Emergence of the web, social media and online social networking websites gave rise to detailed traces of human social activity. This offers many opportunities to analyze and model behaviors of millions of people. For example, we can now study ""planetary scale"" dynamics of a full Microsoft Instant Messenger network of 240 million people, with more than 255 billion exchanged messages per month.
Many types of data, especially web and "social" data, come in a form of a network or a graph. This tutorial will cover several aspects of such network data: macroscopic properties of network datasets [2,8,29,31]; statistical models for modeling large scale network structure of static [1,4,15,18,28] and dynamic [14,22,25] networks; properties and models of network structure and evolution at the level of groups of nodes [3,26] and algorithms for extracting such structures [6,9]. I will also present several applications and case studies of blogs, instant messaging, Wikipedia and web search [17,19,21,30,32].
Machine learning as a topic will be present throughout the tutorial. The idea of the tutorial is to introduce the machine learning community to recent developments in the area of social and information networks that underpin the Web and other on-line media.

Slides:


Slides for the tutorial. (PDF, 17mb)
(final version)

Outline:


  1. Introduction [15 min]
    • Examples of networks and motivating applications
    • Why should machine learning community care about networks?
  2. Network structure and models [45 min]
  • Characteristics of large networks, differences and similarities between different types of networks [8, 27, 29, 31]
  • Models of macroscopic network structure [1, 2, 4, 15, 28, 32]
  • Estimating models from network data [2, 16, 33]
  • Navigation and search in networks [16, 32]
  • Link prediction in networks [30 min]
    • Microscopic and macroscopic properties of evolving networks [13, 14, 22, 25]
    • Models of network evolution: [10, 12, 22, 25]
    • Link prediction in networks [6, 19]
  • Network structure at the level of groups of nodes [45 min]
    • Evolution of groups of nodes in networks: formation, evolution and group membership in networks [3, 34]
    • Algorithms and models for community structure in networks [6, 9, 26, 30]
    • Properties of community structure in large networks and implications for machine learning in graphs [26]
  • Conclusion [15 min]
    • Current research directions
    • Open problems and research opportunities
    • Connections to economics, social and cognitive sciences, and game theory.

  • Materials:


    Presentation slides will be posted prior the tutorial.

    Who should attend:


    Since network data arises in so many different areas of data mining and machine learning, this tutorial should be of theoretical and practical interest to a large part of the machine learning community. The tutorial will not require prior knowledge beyond the basic concepts covered in introductory machine learning and algorithms classes.

    Presenter:


    Jure Leskovec is an assistant professor in Computer Science at Stanford University currently on leave at Cornell University. Jure received his PhD in Machine Learning from Carnegie Mellon University in 2008. He received the ACM KDD 2005 and ACM KDD 2007 best paper awards, won the ACM KDD cup in 2003 and topped the Battle of the Sensor Networks 2007 competition. Jure holds three patents. His research focuses on the analysis and modeling of large real-world social and information networks as the study of phenomena across the social, technological, and natural worlds.

    References:


    • [1] Aiello, W., Chung, F., and Lu, L. (2000). A random graph model for massive graphs. In STOC "00: Proceedings of the 32nd annual ACM symposium on Theory of computing, pages 171-180.
    • [2] Albert, R. and Barabasi, A.-L. (1999). Emergence of scaling in random networks. Science, 286:509-512.
    • [3] Backstrom L., D. Huttenlocher, J. Kleinberg, X. Lan. Group Formation in Large Social Networks: Membership, Growth, and Evolution (2006). In KDD "06: Proc. 12th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining.
    • [4] Blum, A., Chan, H., and Rwebangira, M. (2006). A random-surfer web-graph model. In ANALCO "06: Proceedings of the 3rd Workshop on Analytic Algorithmics and Combinatorics.
    • [5] Chakrabarti, D. and Faloutsos, C. (2006). Graph mining: Laws, generators, and algorithms. ACM Computing Survey, 38(1):2.
    • [6] Clauset, A., C. Moore, and M. E. J. Newman. Hierarchical structure and the prediction of missing links in networks. Nature, 453(7191): 98-101, 2008.
    • [7] Erdos, P. and Renyi, A. (1960). On the evolution of random graphs. Publication of the Mathematical Institute of the Hungarian Acadamy of Science, 5:17-67.
    • [8] Faloutsos, M., Faloutsos, P., and Faloutsos, C. (1999). On power-law relationships of the internet topology. In SIGCOMM "99: Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication, pages 251-262.
    • [9] Girvan, M. and Newman, M. E. J. (2002). Community structure in social and biological networks. In Proceedings of the National Academy of Sciences, volume 99, pages 7821-7826.
    • [10] Guo F., S. Hanneke, W. Fu and E. P. Xing. Recovering Temporally Rewiring Networks: A model-based approach (2007). In Proceedings of the 24th International Conference on Machine Learning.
    • [11] Kempe, D., Kleinberg, J. M., and Tardos E (2003). Maximizing the spread of influence through a social network. In KDD "03: Proceedings of the 9th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 137-146.
    • [12] Krapivsky, P. L. and Redner, S. (2005). Network growth by copying. Physical Review E, 71(036118):036118.
    • [13] Kumar, R., Novak, J., Raghavan, P., and Tomkins, A. (2003). On the bursty evolution of blogspace. In WWW "02: Proceedings of the 11th international conference on World Wide Web, pages 568-576.
    • [14] Kumar, R., Novak, J., and Tomkins, A. (2006). Structure and evolution of online social networks. In KDD "06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 611-617.
    • [15] Kumar, R., Raghavan, P., Rajagopalan, S., Sivakumar, D., Tomkins, A., and Upfal, E. (2000). Stochastic models for the web graph. In FOCS "00: Proceedings of the 41st Annual Symposium on Foundations of Computer Science, page 57.
    • [16] Leskovec, J. and C. Faloutsos. Scalable modeling of real graphs using kronecker multiplication. In ICML "07: Proceedings of the 24th International Conference on Machine Learning, 2007.
    • [17] Kleinberg, J. M. (2000) Navigation in a Small World. In Nature, 406.
    • [18] Li L., D. Alderson, John C. Doyle, and W. Willinger. Towards a theory of scale-free graphs: Definition, properties, and implications. Internet Mathematics, 2(4):431-523, 2005.
    • [19] Liben-Nowell, D., J. Kleinberg (2003). The Link Prediction Problem for Social Networks. In Proc. 12th International Conference on Information and Knowledge Management (CIKM).
    • [20] Leskovec, J., Adamic, L. A., and Huberman, B. A. (2007). The dynamics of viral marketing. ACM Transactions on the Web (TWEB), 1(1):2.
    • [21] Leskovec, J., Horvitz, E., and Dumais, S. (2007). Web projections: Learning from contextual subgraphs of the web. In WWW "07: Proceedings of the 16th international conference on World Wide Web.
    • [22] Leskovec, J., Kleinberg, J. M., and Faloutsos, C. (2007). Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1):2.
    • [23] Leskovec, J., Krause, A., Guestrin, C., Faloutsos, C., VanBriesen, J., and Glance, N. (2007). Cost-effective outbreak detection in networks. In KDD "07: Proceeding of the 13th ACM SIGKDD international conference on Knowledge discovery in data mining.
    • [24] Leskovec and E. Horvitz. Planetary-scale views on a large instant-messaging network. In WWW "08: Proceedings of the 17th International Conference on World Wide Web, 2008.
    • [25] Leskovec, L. Backstrom, R. Kumar, and A. Tomkins. Microscopic evolution of social networks. In KDD "08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 462-470, 2008.
    • [26] Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Statistical properties of community structure in large social and information networks. In WWW "08: Proceedings of the 17th International Conference on World Wide Web, 2008.
    • [27] Newman, M.E. The structure and function of complex networks. In SIAM Review, 45:167-256.
    • [28] Pennock, D. M., Flake, G. W., Lawrence, S., Glover, E. J., and Giles, C. L. (2002). Winners don"t take all: Characterizing the competition for links on the Web. Proceedings of the National Academy of Sciences, 99(8):5207-5211.
    • [29] Ravasz, E. and Barabasi, A.-L. (2003). Hierarchical organization in complex networks. Physical Review E, 67(2):026112.
    • [30] Tauro, S. L., Palmer, C., Siganos, G., and Faloutsos, M. (2001). A simple conceptual model for the internet topology. In GLOBECOM "01: Global Telecommunications Conference, volume 3, pages 1667 - 1671.
    • [31] Watts, D. J. and Strogatz, S. H. (1998). Collective dynamics of "small-world" networks. Nature, 393:440-442.
    • [32] Watts, D. J., Dodds, P. S., and Newman, M. E. J. (2002). Identity and search in social networks. Science, 296:1302-1305.
    • [33] Wasserman, S. and Pattison, P. Logit models and logistic regressions for social networks. Psychometrika, 60:401-425, 1996.
    • [34] Hopcroft, J., Khan, O., Kulis, B., and Selman, B. Tracking evolving communities in large linked networks. Proceedings of the National Academy of Sciences, 101:5249-5253, 2004.
    • [35] Crandall, D., Cosley, D., Huttenlocher, D., Kleinberg J., and Suri, S. Feedback effects between similarity and social influence in online communities. In KDD "08: Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, 2008.