ABSTRACT
Prediction tasks over nodes and edges in networks require careful effort in engineering features used by learning algorithms. Recent research in the broader field of representation learning has led to significant progress in automating prediction by learning the features themselves. However, present feature learning approaches are not expressive enough to capture the diversity of connectivity patterns observed in networks. Here we propose node2vec, an algorithmic framework for learning continuous feature representations for nodes in networks. In node2vec, we learn a mapping of nodes to a low-dimensional space of features that maximizes the likelihood of preserving network neighborhoods of nodes. We define a flexible notion of a node's network neighborhood and design a biased random walk procedure, which efficiently explores diverse neighborhoods. Our algorithm generalizes prior work which is based on rigid notions of network neighborhoods, and we argue that the added flexibility in exploring neighborhoods is the key to learning richer representations.
We demonstrate the efficacy of node2vec over existing state-of-the-art techniques on multi-label classification and link prediction in several real-world networks from diverse domains. Taken together, our work represents a new way for efficiently learning state-of-the-art task-independent representations in complex networks.
- L. A. Adamic and E. Adar. Friends and neighbors on the web. Social networks, 25(3):211--230, 2003.Google ScholarCross Ref
- L. Backstrom and J. Leskovec. Supervised random walks: predicting and recommending links in social networks. In WSDM, 2011. Google ScholarDigital Library
- M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In NIPS, 2001. Google ScholarDigital Library
- Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. IEEE TPAMI, 35(8):1798--1828, 2013. Google ScholarDigital Library
- B.-J. Breitkreutz, C. Stark, T. Reguly, L. Boucher, A. Breitkreutz, M. Livstone, R. Oughtred, D. H. Lackner, J. Bahler, V. Wood, et al. The BioGRID interaction database. Nucleic acids research, 36:D637--D640, 2008.Google Scholar
- S. Cao, W. Lu, and Q. Xu. GraRep: Learning Graph Representations with global structural information. In CIKM, 2015. Google ScholarDigital Library
- S. Fortunato. Community detection in graphs. Physics Reports, 486(3--5):75 -- 174, 2010.Google Scholar
- B. Gallagher and T. Eliassi-Rad. Leveraging label-independent features for classification in sparsely labeled networks: An empirical study. In Lecture Notes in Computer Science: Advances in Social Network Mining and Analysis. Springer, 2009. Google ScholarDigital Library
- Z. S. Harris. Word. Distributional Structure, 10(23):146--162, 1954.Google Scholar
- K. Henderson, B. Gallagher, T. Eliassi-Rad, H. Tong, S. Basu, L. Akoglu, D. Koutra, C. Faloutsos, and L. Li. RolX: structural role extraction & mining in large graphs. In KDD, 2012. Google ScholarDigital Library
- K. Henderson, B. Gallagher, L. Li, L. Akoglu, T. Eliassi-Rad, H. Tong, and C. Faloutsos. It's who you know: graph mining using recursive structural features. In KDD, 2011. Google ScholarDigital Library
- P. D. Hoff, A. E. Raftery, and M. S. Handcock. Latent space approaches to social network analysis. J. of the American Statistical Association, 2002.Google Scholar
- D. E. Knuth. The Stanford GraphBase: a platform for combinatorial computing, volume 37. Addison-Wesley Reading, 1993. Google Scholar
- J. Leskovec and A. Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.Google Scholar
- K. Li, J. Gao, S. Guo, N. Du, X. Li, and A. Zhang. LRBM: A restricted boltzmann machine based approach for representation learning on linked data. In ICDM, 2014. Google ScholarDigital Library
- X. Li, N. Du, H. Li, K. Li, J. Gao, and A. Zhang. A deep learning approach to link prediction in dynamic networks. In ICDM, 2014.Google ScholarCross Ref
- Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel. Gated graph sequence neural networks. In ICLR, 2016.Google Scholar
- D. Liben-Nowell and J. Kleinberg. The link-prediction problem for social networks. J. of the American society for information science and technology, 58(7):1019--1031, 2007. Google ScholarDigital Library
- A. Liberzon, A. Subramanian, R. Pinchback, H. Thorvaldsdóttir, P. Tamayo, and J. P. Mesirov. Molecular signatures database (MSigDB) 3.0. Bioinformatics, 27(12):1739--1740, 2011. Google ScholarDigital Library
- M. Mahoney. Large text compression benchmark. www.mattmahoney.net/dc/textdata, 2011.Google Scholar
- T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. In ICLR, 2013.Google Scholar
- T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS, 2013.Google ScholarDigital Library
- J. Pennington, R. Socher, and C. D. Manning. GloVe: Global vectors for word representation. In EMNLP, 2014.Google ScholarCross Ref
- B. Perozzi, R. Al-Rfou, and S. Skiena. DeepWalk: Online learning of social representations. In KDD, 2014. Google ScholarDigital Library
- P. Radivojac, W. T. Clark, T. R. Oron, A. M. Schnoes, T. Wittkop, A. Sokolov, K. Graim, C. Funk, Verspoor, et al. A large-scale evaluation of computational protein function prediction. Nature methods, 10(3):221--227, 2013.Google ScholarCross Ref
- B. Recht, C. Re, S. Wright, and F. Niu. Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. In NIPS, 2011.Google Scholar
- S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323--2326, 2000.Google ScholarCross Ref
- J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei. LINE: Large-scale Information Network Embedding. In WWW, 2015. Google ScholarDigital Library
- L. Tang and H. Liu. Leveraging social media networks for classification. Data Mining and Knowledge Discovery, 23(3):447--478, 2011. Google ScholarDigital Library
- J. B. Tenenbaum, V. De Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319--2323, 2000.Google ScholarCross Ref
- F. Tian, B. Gao, Q. Cui, E. Chen, and T.-Y. Liu. Learning deep representations for graph clustering. In AAAI, 2014. Google ScholarDigital Library
- K. Toutanova, D. Klein, C. D. Manning, and Y. Singer. Feature-rich part-of-speech tagging with a cyclic dependency network. In NAACL, 2003. Google ScholarDigital Library
- G. Tsoumakas and I. Katakis. Multi-label classification: An overview. Dept. of Informatics, Aristotle University of Thessaloniki, Greece, 2006.Google Scholar
- A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani. Global protein function prediction from protein-protein interaction networks. Nature biotechnology, 21(6):697--700, 2003.Google ScholarCross Ref
- S. Yan, D. Xu, B. Zhang, H.-J. Zhang, Q. Yang, and S. Lin. Graph embedding and extensions: a general framework for dimensionality reduction. IEEE TPAMI, 29(1):40--51, 2007. Google ScholarDigital Library
- J. Yang and J. Leskovec. Overlapping communities explain core-periphery organization of networks. Proceedings of the IEEE, 102(12):1892--1902, 2014.Google ScholarCross Ref
- S.-H. Yang, B. Long, A. Smola, N. Sadagopan, Z. Zheng, and H. Zha. Like like alike: joint friendship and interest propagation in social networks. In WWW, 2011. Google ScholarDigital Library
- R. Zafarani and H. Liu. Social computing data repository at ASU, 2009.Google Scholar
- S. Zhai and Z. Zhang. Dropout training of matrix factorization and autoencoder for link prediction in sparse graphs. In SDM, 2015.Google ScholarCross Ref
Index Terms
- node2vec: Scalable Feature Learning for Networks
Recommendations
A network representation learning method based on topology
AbstractIn the network, nodes with similar neighborhood topology often have similar functions and play similar roles. Accurately identifying the role of nodes is the key to our understanding of the network and then facilitates the completion ...
Higher-order Network Representation Learning
WWW '18: Companion Proceedings of the The Web Conference 2018This paper describes a general framework for learning Higher-Order Network Embeddings (HONE) from graph data based on network motifs. The HONE framework is highly expressive and flexible with many interchangeable components. The experimental results ...
VERSE: Versatile Graph Embeddings from Similarity Measures
WWW '18: Proceedings of the 2018 World Wide Web ConferenceEmbedding a web-scale information network into a low-dimensional vector space facilitates tasks such as link prediction, classification, and visualization. Past research has addressed the problem of extracting such embeddings by adopting methods from ...
Comments