How do we identify the topics and levels of expertise of individuals? This problem is of particular interest in large, distributed organizations. Fortunately, adoption of social media inside firms has increased over the last decade. User activity on such platforms yields rich user-generated content from which much can be learned about the users. This project aims to develop methods to identify experts and their expertise from such datasets. The key idea is that by analyzing the network of communication among individuals together with the content of their conversations, we can identify the central people in the various conversations occurring in the organization.
Working Paper
Title: “Socio-temporal analysis of conversations in intra-organizational blogs” (PDF), N. Sahoo, R. Krishnan, C. Faloutsos
Abstract: Blogs have been popular on the Internet for a number of years and are becoming increasingly popular within organizations as well. The analysis of blog posts is a useful way to understand the nature of expertise within the firm. In this paper we are interested in understanding the topics of conversations that evolve through blog posts and replies. While keywords within blog posts can be used to characterize the topic being discussed, their timestamps permit one to monitor how the intensity of the topic has changed over time, and the author information permits the social nature of the topics to be monitored. Based on this observation we define topics of conversation using keywords, author & recipient, and timestamps of the blog posts & replies. We use tensors to capture these multiple modes of the blog data. With this rich representation of the multi-modal data we identify significant topics and key entities in those topics. This is done by generalizing the idea of significance by association, which has been extensively used in social network analysis, to multi-modal network data. We show that such significance in blogs can be calculated by tensor factorization. This method is illustrated by applying it to a dataset extracted from the blog network within a large globally distributed IT services firm. We discuss implications of this work for monitoring opinion developments and detecting opinion leaders within the organization. We find that the central bloggers identified by tensor factorization are more “on topic” in their responses, with respect to the topic of discussion, than the central bloggers identified by the HITS algorithm. Finally, a tensor factorization based clustering method is designed to discover communities from the online social conversations. The effectiveness of this method is measured with the help of author-provided community labels on the conversations.
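To make the idea of significance by tensor factorization concrete, here is a minimal sketch in Python. It is not the paper's implementation: it assumes a simplified three-mode tensor (author, recipient, keyword) with the timestamp mode omitted, uses a plain rank-1 higher-order power iteration as the factorization, and runs on invented toy data. The names `records` and `rank1_scores` are hypothetical.

```python
import numpy as np

# Toy reply records (author, recipient, keyword, count); invented for illustration.
records = [
    ("ann", "bob", "java", 5),
    ("ann", "cy", "java", 4),
    ("bob", "ann", "java", 3),
    ("cy", "bob", "travel", 1),
]
authors = ["ann", "bob", "cy"]
keywords = ["java", "travel"]

# Build the author x recipient x keyword tensor of reply counts.
T = np.zeros((len(authors), len(authors), len(keywords)))
for src, dst, word, count in records:
    T[authors.index(src), authors.index(dst), keywords.index(word)] += count


def rank1_scores(T, iters=200, tol=1e-12):
    """Rank-1 higher-order power iteration on a non-negative 3-mode tensor.

    Each mode's score vector is updated from the other two, so an author is
    significant when replying to significant recipients on significant
    keywords (a three-mode analogue of the HITS hub/authority updates).
    """
    a, b, c = (np.ones(n) / np.sqrt(n) for n in T.shape)
    for _ in range(iters):
        a_new = np.einsum("ijk,j,k->i", T, b, c)
        a_new /= np.linalg.norm(a_new)
        b_new = np.einsum("ijk,i,k->j", T, a_new, c)
        b_new /= np.linalg.norm(b_new)
        c_new = np.einsum("ijk,i,j->k", T, a_new, b_new)
        c_new /= np.linalg.norm(c_new)
        delta = max(
            np.abs(a_new - a).max(),
            np.abs(b_new - b).max(),
            np.abs(c_new - c).max(),
        )
        a, b, c = a_new, b_new, c_new
        if delta < tol:
            break
    return a, b, c


a, b, c = rank1_scores(T)
print("top author:   ", authors[int(np.argmax(a))])
print("top recipient:", authors[int(np.argmax(b))])
print("top keyword:  ", keywords[int(np.argmax(c))])
```

In this sketch the three score vectors play the roles of author, recipient, and keyword significance within a single dominant topic. Finding several topics would require a higher-rank factorization, and tracking topic intensity over time would require adding the timestamp mode.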
Acknowledgement
This project is partially supported by a grant from the Rafik B. Hariri Institute for Computing and Computational Science & Engineering.