CAOSS: Computational and Online Social Science
Friday, October 12, 2012
Altschul Auditorium
Columbia University
417 International Affairs Building
New York City
With an explosion of data on every aspect of our everyday existence (from what we buy, to where we travel, to whom we know), we are able to observe human behavior at a granularity largely thought impossible just a decade ago. The growth of such online activity has further facilitated the design of web-based experiments, enhancing both the scale and efficiency of traditional methods. Together, these advances have created an unprecedented opportunity to address longstanding questions in the social sciences, ranging from sociology to political science to economics and beyond.
The inaugural 2012 workshop on Computational and Online Social Science (CAOSS, pronounced “chaos”) aims to bring together diverse research communities that work at the intersection of computer science and social sciences to build a lasting foundation for this emerging discipline.
As space is limited, we ask that you please complete the free registration form if you plan to attend.
Please contact us if you have any additional questions.
Steering Committee
- Sharad Goel, Microsoft Research
- Jake Hofman, Microsoft Research
- David Park, Columbia University
- Sergei Vassilvitskii, Google
Sponsors
- Microsoft Research
- eBay Research
- Applied Statistics Center, Columbia University
- ACM SIGecom
- Johnson Research Lab
CAOSS will be held at the Altschul Auditorium at Columbia University on October 12, 2012. The workshop consists of five plenary talks by invited speakers, along with shorter contributed talks and a poster session.
| 8:50am-9:00am | Welcome |
| 9:00am-9:50am | Measuring and Propagating Influence in Networks (Sinan Aral, NYU Stern). Measuring influence and finding influential people in social networks is now all the rage. But true estimates of influence are fraught with statistical difficulties that naïve scoring methods cannot address. So how can we robustly measure influence and identify influentials in networks? Whether in the spread of disease, the diffusion of information, the propagation of social contagions, the effectiveness of viral marketing, or the magnitude of peer effects in a variety of settings, a key problem is understanding whether and when the statistical relationships we observe can be interpreted causally. Sinan will review what we know and where work might go with respect to identifying causal peer influence in social networks, and the importance of causal inference for understanding the spread of products, political views, and public health behaviors through society. He will provide examples from large-scale observational and experimental studies in online social media networks and describe a new project to spread HIV testing using peer-to-peer influence and mobile messaging in South Africa, a project that forms the basis for a new documentary film entitled "The Social Cure." |
| 10:00am-10:50am | Inferring Causality in Observational Data about Social Networks (David Jensen, University of Massachusetts Amherst). Over the past decade, researchers have increasingly recognized the potential for a fruitful synthesis of machine learning and social science. One area of particularly high potential is the connection between large data sets and the desire to understand the deep causal structure of social systems. Over the past several decades, computer scientists and others have developed theoretical infrastructure to formally express causal models and to reason about the connections between a given causal model and its observable consequences in data. This work has resulted in highly effective algorithms for learning causal models that are consistent with a given set of observational data, often allowing strong inferences about the direction and size of specific causal dependencies. Unfortunately, most of this theoretical infrastructure assumes that data records are statistically independent and identically distributed, whereas many of the most interesting social science problems concern interacting sets of heterogeneous people, places, and things. I will discuss recent progress in extending the formal theories of causal inference and discovery to data about the behavior of social systems. I will also identify several key challenges that remain unsolved. |
| 11:00am-11:30am | Break |
| 11:30am-12:20pm | How users evaluate each other in social media (Jure Leskovec, Stanford University). In a variety of domains, mechanisms for evaluation allow one user to say whether he or she trusts another user, likes the content they produced, or wants to confer special levels of authority or responsibility on them. We investigate a number of fundamental ways in which user and item characteristics affect evaluations in online settings. For example, evaluations are not unidimensional but include multiple aspects that together contribute to a user's overall rating. We investigate methods for modeling attitudes and attributes from online reviews that help us better understand users' individual preferences. We also examine how to create a composite description of evaluations that accurately reflects some type of cumulative opinion of a community. Natural applications of these investigations include predicting evaluation outcomes from user characteristics and estimating the chance of a favorable overall evaluation from a group knowing only the attributes of the group's members, but not their expressed opinions. |
| 12:30pm-1:45pm | Lunch & Poster Session, Wein Hall |
| 1:45pm-2:45pm | Short talks |
| 2:45pm-3:00pm | Break |
| 3:00pm-3:50pm | "Which Half is Wasted?": Controlled Experiments to Measure Online-Advertising Effectiveness (David Reiley, Google). The department-store retailer John Wanamaker famously stated, “Half the money I spend on advertising is wasted; I just don’t know which half.” Compared with the measurement of advertising effectiveness in traditional media, online advertisers and publishers have considerable data advantages, including individual-level data on advertising exposures, clicks, searches, and other online user behaviors. However, as I shall discuss in this talk, the science of advertising effectiveness requires more than just quantity of data: even more important is the quality of the data. In particular, in many cases, using various statistical techniques with observational data leads to incorrect measurements. To measure the true causal effects, we run controlled experiments that suppress advertising to a control group, much like the placebo in a drug trial. With experiments to determine the ground truth, we can show that in many circumstances, observational-data techniques rely on identifying assumptions that prove to be incorrect, and they produce estimates differing wildly from the truth. Despite increases in data availability, Wanamaker's complaint remains just as true for online advertising as it was for print advertising a century ago. In this talk, I will discuss recent advances in running randomized experiments online, measuring the impact of online display advertising on consumer behavior. Interesting results include the measurable effects of online advertising on offline transactions, the impact on viewers who do not click the ads, the surprisingly large effects of frequency of exposure, and the heterogeneity of advertising effectiveness across users in different demographic groups or geographic locations. I also show that sample sizes of a million or more customers may be necessary to get enough precision for statistical significance of economically important effects, so we have just reached the cusp of being able to measure effects precisely with present technology. (By comparison, previous controlled experiments using split-cable TV systems, with sample sizes in the mere thousands, have lacked statistical power to measure precise effects for a given campaign.) As I show with several examples that establish the ground truth using controlled experiments, the bias in observational studies can be extremely large, over- or underestimating the true causal effects by an order of magnitude. I will discuss the (implicit or explicit) modeling assumptions made by researchers using observational data, and identify several reasons why these assumptions are violated in practice. I will also discuss future directions in using experiments to measure advertising effectiveness. |
| 4:00pm-4:50pm | The Virtual Lab (Duncan Watts, Microsoft Research). Crowdsourcing sites like Amazon's Mechanical Turk are increasingly being used by researchers to construct "virtual labs" in which they can conduct behavioral experiments. In this talk, I describe some recent experiments that showcase the advantages of virtual over traditional physical labs, as well as some of the limitations. I then discuss how this relatively new experimental capability may unfold in the near future, along with some implications for social and behavioral science. |