Data Mining for Moderation of Social Data
TRANSCRIPT
![Page 2: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/2.jpg)
![Page 3: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/3.jpg)
3 © 2011 SolidQ
![Page 4: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/4.jpg)
Introductions
• Fernando G. Guerrero
• Global CEO of SolidQ
• [email protected]
• Microsoft Regional Director for Spain since 2004
• SQL Server MVP from 2000 to 2007
• Usual suspect at many international conferences
![Page 5: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/5.jpg)
SolidQ 2012… 10th anniversary
• 160 people in 23 countries:
  • Argentina, Australia, Austria, Bulgaria, Canada, Chile, Costa Rica, Croatia, Denmark, France, Germany, India, Israel, Italy, Mexico, Saudi Arabia, Serbia, Slovakia, Slovenia, Spain, Sweden, UK, USA
• 50 current or former RDs or MVPs
• Authors of many books, articles, and whitepapers
• Research collaboration with:
  • Universidad de Alicante
  • Universidad de les Illes Balears
  • Universidad de Santiago de Compostela
  • The European Union
  • The Spanish Ministry of Economy and Innovation
![Page 6: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/6.jpg)
Agenda
• Social Data
• Market Research
• Sentiment Analysis, Text Mining
• Moderation, Data Mining
• SolidQ Research Lines in Social Data
![Page 7: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/7.jpg)
Social data is everywhere
![Page 8: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/8.jpg)
![Page 9: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/9.jpg)
Social data is about everything
Music
![Page 10: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/10.jpg)
Social is there
• Is your organization promoting social content about you?
Products, Services, Stories
![Page 11: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/11.jpg)
Social is there, reputation
• What is social saying about you?
  • Product
  • Services
  • Decisions
  • Image
![Page 12: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/12.jpg)
Market Research
• What is social requesting from you?
  • Future services
  • Product updates
• Can you ask social questions?
  • Is this service going to succeed?
  • How can I fix the current problem?
  • Is society ready for this law?
![Page 13: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/13.jpg)
Sentiment Analysis, Text Mining
• The movie was fabulous! [ Sentimental ]
• The movie stars Mr. X [ Factual ]
• The movie was horrible! [ Sentimental ]
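As a rough illustration of the factual versus sentimental split on this slide, a minimal Python sketch could label a sentence as sentimental when it contains a word from a small opinion lexicon. The lexicon and the function name `label_sentence` are assumptions made for this example; the slides do not describe how the actual system makes this distinction.

```python
import re

# Hypothetical opinion lexicon; a real system would need a far larger resource.
OPINION_WORDS = {"fabulous", "horrible", "great", "awful", "terrible", "wonderful"}


def label_sentence(sentence: str) -> str:
    """Return 'Sentimental' if any opinion word appears, else 'Factual'."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    return "Sentimental" if words & OPINION_WORDS else "Factual"


for s in ["The movie was fabulous!", "The movie stars Mr. X", "The movie was horrible!"]:
    print(f"{s} -> {label_sentence(s)}")
```

A lexicon lookup like this ignores negation and sarcasm, so it only shows the idea of the classification, not a workable sentiment analyzer.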
![Page 14: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/14.jpg)
![Page 15: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/15.jpg)
What is Data Mining?
• Inform actionable business decisions
• Contrasts with “machine learning”
![Page 16: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/16.jpg)
Media Case Study
• Millions of posts per year (different moderation scenarios)
• About 25% are human moderated
• About 10% of the moderated posts fail
• No Business Intelligence applications for analysis or reporting
![Page 17: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/17.jpg)
Moderation, Data Mining
• Contextual information
  • Time
  • Location
  • User
• At 10 AM comments are safer than at 2 AM.
• A user may be safe talking about science but dangerous talking about sports.
• If a thread is hot (dangerous), a new comment in it may be hot too.
• By combining context patterns, the system assigns risk to posts without going into the text.
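The idea of scoring a post from its context alone can be sketched in a few lines of Python. This is only a hand-weighted stand-in, not the data mining model the slides refer to: the `Context` fields, the `context_risk` function, and the weights are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Context:
    hour_fail_rate: float    # historical share of failed posts at this hour of day
    user_fail_rate: float    # historical share of failed posts for this user on this topic
    thread_fail_rate: float  # share of failed posts so far in this thread


def context_risk(ctx: Context, weights=(0.2, 0.4, 0.4)) -> float:
    """Weighted average of context fail rates; the weights are illustrative only."""
    rates = (ctx.hour_fail_rate, ctx.user_fail_rate, ctx.thread_fail_rate)
    return sum(w * r for w, r in zip(weights, rates))


# Made-up rates: a quiet morning post versus a late-night post in a hot thread.
calm = Context(hour_fail_rate=0.02, user_fail_rate=0.01, thread_fail_rate=0.03)
hot = Context(hour_fail_rate=0.10, user_fail_rate=0.30, thread_fail_rate=0.40)
print(context_risk(calm) < context_risk(hot))  # True
```

Note that the score never reads the post text, which is what lets the same approach apply to links, pictures, or comments in any language.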
![Page 18: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/18.jpg)
Solution – Logical Model
• Post Context (behavior analysis)
  • Patterns, data mining
• Post Content (text analysis)
  • Profanity, low-score sentences, text mining, mood or tone (sentiment analysis)
![Page 19: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/19.jpg)
Typically Available Data on Posts
• Historical and real-time data for:
  • User (e.g. userid, email, nationalid)
  • Location (e.g. Life & Style Fashion)
  • Time (e.g. 12 March 2011 18:56)
  • Content (e.g. text, link, picture, video)
  • Moderation result
• Other attributes such as geography, age, and education could be used
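One possible way to hold the fields listed on this slide is a small record type. The sketch below is an assumption about shape only: the class name `Post`, the field names, and the sample values marked as hypothetical do not come from the original system.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Post:
    user_id: str                   # slide example: userid
    location: str                  # slide example: "Life & Style Fashion"
    posted_at: datetime            # slide example: 12 March 2011 18:56
    content: str                   # text, or a reference to a link, picture, or video
    failed: Optional[bool] = None  # moderation result; None while not yet moderated


example = Post(
    user_id="u123",                           # hypothetical value
    location="Life & Style Fashion",
    posted_at=datetime(2011, 3, 12, 18, 56),
    content="Nice article!",                  # hypothetical value
)
print(example.posted_at.hour)  # 18
```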
![Page 20: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/20.jpg)
Post context, Patterns, Data Mining
• User behavior
• Time behavior
• Location behavior
20 © 2012 Solid Quality Mentors
![Page 21: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/21.jpg)
Building useful attributes
• 1.- Thread (% Fails in a certain thread)
• 2.- User (% Fails per User)
• 3.- Diff Hour Forum Created (TimeDatePosted - TimeForumCreated)
• 4.- User Forum (% Fails in a certain forum)
• 5.- Diff Last for User (TimeDatePosted - TimeLastFailUser)
• 6.- Hour of the day
• 7.- Diff hour UserJoined-Now (TimeDatePosted - TimeUserJoined)
• 8.- User Thread (% Fails per User in a thread)
• 9.- Diff Hour Thread Created (TimeDatePosted - TimeThreadCreated)
• 10.- Day of Week
• More than 100 attributes.
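A few of these attributes can be derived from moderation history with very little code. The following Python sketch assumes a history of already moderated posts stored as dictionaries; the helper names `fail_rate` and `build_attributes`, the dictionary keys, and the sample data are hypothetical.

```python
from datetime import datetime


def fail_rate(history, key, value):
    """Share of failed posts among moderated history rows where row[key] == value."""
    rows = [r for r in history if r[key] == value]
    return sum(r["failed"] for r in rows) / len(rows) if rows else 0.0


def build_attributes(post, history, thread_created):
    """Compute a handful of the context attributes named on the slide."""
    return {
        "thread_fail_pct": fail_rate(history, "thread", post["thread"]),
        "user_fail_pct": fail_rate(history, "user", post["user"]),
        "hour_of_day": post["posted"].hour,
        "day_of_week": post["posted"].weekday(),  # Monday == 0
        "diff_hour_thread_created": (post["posted"] - thread_created).total_seconds() / 3600,
    }


# Made-up moderation history, for illustration only.
history = [
    {"user": "ana", "thread": "t1", "failed": True},
    {"user": "ana", "thread": "t1", "failed": False},
    {"user": "bob", "thread": "t1", "failed": False},
    {"user": "bob", "thread": "t2", "failed": False},
]
post = {"user": "ana", "thread": "t1", "posted": datetime(2011, 3, 12, 18, 56)}
attrs = build_attributes(post, history, thread_created=datetime(2011, 3, 12, 12, 56))
print(attrs)
```

In practice such rates would be computed over a chosen time window and refreshed as new moderation results arrive.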
![Page 22: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/22.jpg)
Hard Work
• Periods.
• Algorithms.
• Algorithms' parameters.
• Model refreshing.
• Attribute analysis.
• Outliers.
• Overpopulating.
• Behavior after this system is in production.
![Page 23: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/23.jpg)
Data Mining Algorithms
• Decision Trees/Linear Regression
• Sequence Analysis
• Neural Networks/Logistic Regression
• Clustering
• Text Mining (Words and Phrases)
![Page 24: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/24.jpg)
Conclusion on Context
• Risk based on context of the post
  • Time
  • User’s history
  • Publish location
• Enables risk analysis for all types of content
  • Comments (in any language)
  • Links
  • Pictures
  • Videos
![Page 25: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/25.jpg)
Logical Model: Post content
• Profanity Analysis
• Text Mining
  Example: "The first minister and his secretary found sleeping together last night. They got drunk at a nearby pub."
• Sentiment Analysis
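The simplest of these content checks, profanity analysis, can be sketched as a word-list lookup. The block list and the function name `has_profanity` below are assumptions for illustration, with mild placeholder words standing in for a real list.

```python
import re

# Hypothetical block list; placeholder tokens only.
BLOCKED_WORDS = {"idiot", "moron", "scum"}


def has_profanity(text: str) -> bool:
    """True if any whole word in the text is on the block list."""
    return any(w in BLOCKED_WORDS for w in re.findall(r"[a-z']+", text.lower()))


sample = ("The first minister and his secretary found sleeping together "
          "last night. They got drunk at a nearby pub.")
print(has_profanity(sample))
print(has_profanity("You are a moron"))
```

The example post from the slide contains none of the listed words, so a word-list check alone would not flag it; catching that kind of risky statement is left to the text mining and sentiment analysis steps.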
![Page 26: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/26.jpg)
![Page 27: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/27.jpg)
Moderation, Data Mining System
![Page 28: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/28.jpg)
![Page 29: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/29.jpg)
Analysis and Reporting
• Published through integrated web application
  • Moderation statistics
  • User statistics
  • News and stories statistics
  • Peaks
![Page 30: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/30.jpg)
Conclusion: Benefits
• By moderating half of the total posts, the solution captures 90% of failing posts. The remaining 10% appear likely to be safe posts.
• Using Intelligent Moderation, media companies scan the whole universe of posts at a comparatively low cost.
• At peak times, Intelligent Moderation works perfectly.
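The first benefit is a capture rate: the share of failing posts that land in the reviewed portion when posts are ranked by risk. A minimal Python sketch of that calculation follows; the function name `capture_rate` and the scores and outcomes are made up, and the resulting figure is not a result from the project.

```python
def capture_rate(posts, review_fraction):
    """Share of failing posts caught when the riskiest fraction is reviewed.

    posts: list of (risk_score, failed) pairs.
    """
    ranked = sorted(posts, key=lambda p: p[0], reverse=True)
    cutoff = int(len(ranked) * review_fraction)
    total_fails = sum(failed for _, failed in ranked)
    caught = sum(failed for _, failed in ranked[:cutoff])
    return caught / total_fails if total_fails else 0.0


# Made-up scores and moderation outcomes, for illustration only.
posts = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.5, False),
         (0.4, False), (0.3, False), (0.2, True), (0.1, False), (0.05, False)]
print(capture_rate(posts, 0.5))  # 0.75
```

With this toy data, reviewing the riskiest half catches three of the four failing posts; the slide reports 90% for the real system at the same review fraction.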
![Page 31: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/31.jpg)
Football night in Europe
• On January 25th, 2012:
  • Liverpool defeated Manchester City in the Carling Cup
  • Barcelona defeated Real Madrid in Copa del Rey
• More than 100,000 comments arrived at the different BBC sites over 10 hours
• All comments were filtered through our system
• No problems observed during that time
![Page 32: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/32.jpg)
SolidQ Team in this project
• Project Managers
  • Francisco Gonzalez, Javier Torrenteras, Alejandro Leguizamo
• Developers
  • Itzik Ben-Gan, Enrique Puig, Ruben Pertusa, Carlos Martinez, Fernando G. Guerrero
• Technical Reviewers
  • Mark Tabladillo, Dejan Sarka
• Social Media Specialists
  • Jose Quinto, Rocio Díaz
![Page 33: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/33.jpg)
SolidQ Research
• Incomplete Grammar Analysis
• Human interaction with IT systems
  • Collaboration
  • Contextual analysis
• Sentiment Analysis
  • Market Research
  • Reputation
• Data Mining of Social Context
  • Moderation
  • Market Research
  • Reputation
![Page 34: Data Mining for Moderation of Social Data](https://reader033.vdocument.in/reader033/viewer/2022060112/55700502d8b42a84618b5313/html5/thumbnails/34.jpg)
Invisible computing…
… Driven by Social Data