
A Study of social influence in diffusion of innovation over Facebook
Shaomei Wu
Information Science, Cornell University
Information Science Breakfast, Dec 5, 2008

Diffusion of Innovation
“Diffusion is the process in which an innovation is communicated through certain channels over time among the members of a social system.” (Everett M. Rogers*)
  “innovation”: Friendship Quiz, a Facebook application
  “communicated”: invitations among Facebook friends
  “time”: September 25, 2008 to now
  “social system”: Facebook
* Rogers, Everett M. (2003). Diffusion of Innovations, 5th ed. New York, NY: Free Press, pp. 5-6.

Basic Diffusion Models
  Threshold Model
  Cascade Model
The two models are statistically equivalent.*
* David Kempe, Jon Kleinberg, Eva Tardos. Maximizing the Spread of Influence through a Social Network. KDD, 2003.

Cascade Model
Each recommendation succeeds with a certain probability.
[Figure: example network of nodes a through l. Legend: adopter, non-adopter, social link, recommendation. Each recommendation edge carries its own probability, e.g. p_ab, p_ac, p_ad, p_ae, p_af, p_ag, p_di, p_dj, p_gk, p_gl.]
Question: how to estimate p_uv?
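To make the cascade model concrete, here is a minimal simulation sketch. It assumes the independent-cascade variant, in which every new adopter u gets exactly one chance to recommend to each neighbor v and succeeds with probability p_uv. The function name `simulate_cascade` and the edge-dictionary input format are illustrative assumptions, not something taken from the slides.

```python
import random


def simulate_cascade(edges, seeds, rng=None):
    """Run one independent-cascade simulation.

    edges: dict mapping (u, v) -> p_uv, the probability that a
           recommendation from u to v succeeds (hypothetical format).
    seeds: the initial adopters.
    Returns the set of all adopters once the cascade dies out.
    """
    if rng is None:
        rng = random.Random()

    # Group outgoing recommendations by sender for quick lookup.
    outgoing = {}
    for (u, v), p in edges.items():
        outgoing.setdefault(u, []).append((v, p))

    frontier = list(seeds)
    adopters = set(frontier)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in outgoing.get(u, []):
                # A node enters the frontier only once, so each edge
                # (u, v) is tried at most once.
                if v not in adopters and rng.random() < p:
                    adopters.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return adopters


if __name__ == "__main__":
    demo_edges = {("a", "b"): 0.5, ("a", "c"): 0.5, ("b", "d"): 0.3}
    print(simulate_cascade(demo_edges, ["a"], rng=random.Random(0)))
```

Running the simulation many times and averaging the size of the returned set gives an estimate of the expected spread for a given assignment of p_uv values.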

Question: how to estimate p_uv?
Current practice:
  Constant [1]
  Based ONLY on network structure (e.g., in/out-degree) [2]
Do individuals and the social relationships among them matter?
[1] Jure Leskovec, Mary McGlohon, Christos Faloutsos, Natalie Glance, Matthew Hurst. Cascading Behavior in Large Blog Graphs. SDM 2007.
[2] Jure Leskovec, Lada Adamic, Bernardo Huberman. The Dynamics of Viral Marketing. ACM Conference on Electronic Commerce (EC) 2006.

Theories from Empirical Diffusion Research
  Opinion leaders have “greater exposure to mass media than their followers”, “are more cosmopolite”, “have greater social participation”, “have higher socioeconomic status”, and “are more innovative” [Rogers, 2003, pp. 316-318].
  Heterophily between participants on certain attributes (e.g., education and socioeconomic status) is important in determining the efficiency of diffusion, despite the fact that “more effective communication occurs when two or more individuals are homophilous” [Rogers, 2003, p. 19].

This project aims to:
  Model the p_uv values for the cascade model
  Identify the most influential factors in determining p_uv
  Predict the success of contagion
Exploit Facebook data:
  A real-world, ongoing diffusion instance
  Rich and (most of the time) trustworthy profile information about individuals and their social connections/activities
  A precisely timestamped diffusion process with a complete log of events

Status
Launched: Sep 25, 2008.
Data used so far runs through Nov 25, 2008.
  216 adopters, 375 individuals
  737 edges between 266 pairs of people
  90 successful infections, 178 failed infections
[Figure: Network evolution in the first month after release]

[Figure: four bar charts comparing adopters and non-adopters. Panels: Political view distribution (conservative, moderate, liberal, Libertarian, Democratic Party, Republican Party, Apathetic, other); Gender distribution (female, male); Religious view distribution (Christian, Muslim, Other); Age distribution. Vertical axes: # of people.]

Predict the success of an invitation with SVM
A binary classifier: each invitation either succeeds or fails.
Features:
  Individual features
  Pair features (homophily/heterophily)

Individual Features
Categories: Social Activeness, Innovativeness, Socioeconomics, Education
Features:
  # of events attended/invited
  # of photos tagged
  # of wall posts
  # of networks
  # of groups participated in
  # of notes
  Religion
  Political View
  Gender
  Age
  Cultural Background
  Relationship Status
  Work Info
  Education Info

Pair-wise Features
Categories: Biological traits, Belief, Socioeconomics, Proximity
Features:
  Age difference
  Same gender?
  Same political view?
  Same religion?
  Same cultural background?
  # of same networks
  # of photos both tagged in
  # of groups both participated in
  # of events both attended
  Same education level?
  Same high school?
  Same college?
  Same workplace?
  Same current city?
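The pair-wise features can be sketched as a function of two profiles. The slides do not say how profiles are stored, so the dictionary layout and every field name below (`gender`, `groups`, `current_city`, and so on) are hypothetical; only a subset of the listed features is shown.

```python
def pair_features(sender, receiver):
    """Build pair-wise (homophily/heterophily) features from two
    profile dicts. All field names are illustrative assumptions."""

    def same(key):
        # 1 if both profiles have the field and the values match, else 0.
        a, b = sender.get(key), receiver.get(key)
        return int(a is not None and a == b)

    def shared(key):
        # Number of items (e.g. group ids) present in both profiles.
        return len(set(sender.get(key, ())) & set(receiver.get(key, ())))

    features = {
        "same_gender": same("gender"),
        "same_political_view": same("political_view"),
        "same_religion": same("religion"),
        "same_college": same("college"),
        "same_workplace": same("workplace"),
        "same_current_city": same("current_city"),
        "common_networks": shared("networks"),
        "common_groups": shared("groups"),
        "common_events": shared("events"),
    }
    # Age difference is only defined when both ages are known.
    if sender.get("age") is not None and receiver.get("age") is not None:
        features["age_difference"] = abs(sender["age"] - receiver["age"])
    return features
```

Treating a missing field as "not the same" is one possible design choice; a separate "unknown" indicator would be another.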

Each invitation is a training example for machine learning.
Training Data
Columns: time, sender, receiver, class, then sender features, receiver features, pair features
  2008-09-25 18:25:41  589483260  3621185    1   1:22 2:1 3:0 4:0 5:0 6:1 35:1 47:0 48:0 49:0 50:0 51:0 68:0 69:0 70:0 74:1 76:1
  2008-09-25 18:25:49  3621185    571023231  -1
  2008-11-24 02:40:34  768059413  81405257   -1
* All numerical features are normalized across examples.
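A record in this layout can be read with a few lines of code. The sketch below assumes whitespace-separated fields in the order shown on the slide, followed by sparse `index:value` pairs. The slide only says the features are "normalized", so the min-max scaling here is an assumption, as are the function names.

```python
def parse_example(line):
    """Parse one invitation record: date, time, sender id, receiver id,
    class (+1 or -1), then zero or more sparse 'index:value' features."""
    parts = line.split()
    date, clock, sender, receiver, label = parts[:5]
    features = {}
    for token in parts[5:]:
        index, value = token.split(":")
        features[int(index)] = float(value)
    return {
        "time": f"{date} {clock}",
        "sender": sender,
        "receiver": receiver,
        "label": int(label),
        "features": features,
    }


def min_max_normalize(examples):
    """Rescale each feature to [0, 1] across all examples, in place.
    Min-max scaling is an assumption. A feature missing from an example
    counts as 0 when computing the range and stays missing afterwards."""
    indices = {i for ex in examples for i in ex["features"]}
    for i in indices:
        values = [ex["features"].get(i, 0.0) for ex in examples]
        lo, hi = min(values), max(values)
        for ex in examples:
            if i in ex["features"]:
                raw = ex["features"][i]
                ex["features"][i] = (raw - lo) / (hi - lo) if hi > lo else 0.0
    return examples
```

For example, `parse_example("2008-09-25 18:25:49 3621185 571023231 -1")` yields a record with label -1 and an empty feature dictionary.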

AdaBoost (with DecisionStump)
A popular way to do feature selection.
Selected features:
  sender wall post count
  sender group count
  sender network count
  receiver age
  receiver group count
  sender & receiver common group count
Performance (10-fold cross-validation)
Accuracy: 83.6%
  Class  Precision  Recall
  -1     83.5%      93.8%
  1      83.8%      63.3%
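Boosting decision stumps doubles as feature selection because each stump tests a single feature, so the features the ensemble ends up using are the "selected" ones. The sketch below is a generic discrete AdaBoost in plain Python, written only to illustrate that idea; it is not the implementation used for the numbers on this slide, and all function names are illustrative.

```python
import math


def stump_predict(row, feature, threshold, sign):
    # Predict `sign` when the feature exceeds the threshold, else -sign.
    return sign if row[feature] > threshold else -sign


def train_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for sign in (1, -1):
                err = sum(
                    wi for row, yi, wi in zip(X, y, w)
                    if stump_predict(row, f, t, sign) != yi
                )
                if best is None or err < best[0]:
                    best = (err, f, t, sign)
    return best


def adaboost(X, y, rounds=10):
    """Discrete AdaBoost; labels must be +1 or -1.
    Returns a list of (alpha, feature, threshold, sign) stumps."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, f, t, sign = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep alpha finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, t, sign))
        # Up-weight misclassified examples, then renormalize.
        w = [
            wi * math.exp(-alpha * yi * stump_predict(row, f, t, sign))
            for row, yi, wi in zip(X, y, w)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, row):
    score = sum(a * stump_predict(row, f, t, s) for a, f, t, s in model)
    return 1 if score >= 0 else -1


def selected_features(model):
    """Indices of features used by at least one stump."""
    return sorted({f for _, f, _, _ in model})
```

On a toy dataset where only one column separates the classes, `selected_features` returns just that column's index.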

SVM performance
SVM-light (10-fold cross-validation)
  fold     accuracy  precision  recall
  1        80.77     100        58.33
  2        80.77     100        44.44
  3        88.46     100        62.5
  4        76.92     50         33.33
  5        73.08     100        30
  6        84.62     100        50
  7        69.23     50         50
  8        76.92     100        53.85
  9        88.46     100        66.67
  10       88.24     80         57.14
  average  80.747    88         50.626
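The "average" row is the plain mean of the ten per-fold values. The sketch below recomputes it from the numbers on this slide and also shows one simple way to cut a dataset into k folds. The slides do not describe how the folds were actually formed, so `k_fold_indices` is an illustrative assumption.

```python
def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ
    by at most one. Shuffling beforehand is left to the caller."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def mean(values):
    return sum(values) / len(values)


# Per-fold percentages copied from the slide.
accuracy = [80.77, 80.77, 88.46, 76.92, 73.08, 84.62, 69.23, 76.92, 88.46, 88.24]
precision = [100, 100, 100, 50, 100, 100, 50, 100, 100, 80]
recall = [58.33, 44.44, 62.5, 33.33, 30, 50, 50, 53.85, 66.67, 57.14]

if __name__ == "__main__":
    for name, values in [("accuracy", accuracy),
                         ("precision", precision),
                         ("recall", recall)]:
        print(name, round(mean(values), 3))
```

The recomputed means agree with the reported averages of 80.747, 88 and 50.626.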

Weights from SVM
[Figure: feature weight distribution. Bar chart with weight on the vertical axis (about -0.4 to 1.8) and feature on the horizontal axis.]
Features shown: receiver age, receiver groupCount, receiver isChristian, sender isWorking, sender isModerate, sender isCollege, sender isInARelationship, sender isChristian, sender isMarried, sender isOther, sender age, receiver photoTagged, receiver isMiddleEastern, receiver isMuslim, sameReligion, sameCollege, receiver isAtheist/Agnostic, sender networkCount, receiver eventCount, sameWorkPlace, receiver isWorking, receiver noteCount, sender wallCount, receiver isRepublic

Result
SVM-light performance
209 records split into 5 folds: 4 for training, 1 for testing.
Performance on the testing set:
  Accuracy: 71.43% (30 correct, 12 incorrect, 42 total)
  Precision/recall: 55.56%/38.46%
Top weighted features (feature index, name):
  8, sender events invited
  4, sender friend count
  11, sender gender
  35, receiver is It's Complicated
  5, sender wall post count
  9, sender note count
  27, sender is In a Relationship
[Figure: feature weights distribution. Weight (about -0.6 to 1.4) plotted against feature index (0 to 40).]
So, the story can be: when a sender who has been invited to a greater number of events in Facebook, has more friends, wrote more Facebook notes (blog entries), is female, has fewer wall posts, and is in a relationship tries to infect a person whose relationship status is “it’s complicated”, the infection is more likely to happen than in other cases.
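Accuracy, precision and recall all follow from the four confusion-matrix counts. The slide does not give those counts; the values used below (5 true positives, 4 false positives, 8 false negatives, 25 true negatives) are inferred here as an assumption because they reproduce the reported figures for 42 test records.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall for the positive class."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Inferred counts (an assumption, not stated on the slide).
metrics = classification_metrics(tp=5, fp=4, fn=8, tn=25)

if __name__ == "__main__":
    for name, value in metrics.items():
        print(f"{name}: {value:.2%}")
```

With these counts, 30 of 42 predictions are correct (71.43%), precision is 5/9 (55.56%) and recall is 5/13 (38.46%).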

SVM with features selected by AdaBoost
  fold     accuracy  precision  recall
  1        80.77     100        58.33
  2        80.77     83.33      55.56
  3        88.46     100        62.5
  4        73.08     0          0
  5        76.92     100        40
  6        84.62     83.33      62.5
  7        76.92     66.67      50
  8        80.77     100        61.54
  9        96.15     100        88.89
  10       91.18     83.33      71.43
  average  82.96     81.67      55.075

Background
Diffusion of Innovation
Questions:
  How does it work in large online social networks?
  What are the key factors in determining the success of infection?
  Can we predict the propagation path?

Hypothesis
Social influence depends on 5 dimensions of similarity:
  geographical distance: current location (country/state/city), current school, current major, year of class, current workplace, current courses enrolled;
  background similarity: sex, sexual preference, dating interest, relationship interest, relationship status, birthday, political view, religious view, hometown address, previous school, previous workplace;
  social similarity: number of mutual networks they belong to, number of mutual friends;
  interest similarity: activities, favorite books, favorite music, favorite movies, favorite TV shows, favorite quotes;
  social status distance: difference in number of friends, difference in wall post counts, difference in counts of messages sent and received, difference in counts of notes.

Project Description
Objectives:
  Identify the key factors for social influence;
  Predict the occurrence of adoption based on the key factors.
Friendship Quiz:
  A Facebook application we developed;
  Enables users to make quizzes and send them to their friends (take a peek!);
  We track the spread of the application.

Highlights
  A real-world diffusion of innovation;
  Rich and (most of the time) trustworthy profile information about individuals and their social connections/activities;
  A precisely timestamped diffusion process with a complete log of events;
  An ongoing diffusion process.

Backup: Threshold Model