© Hoffman and Novak 2017 | http://postsocial.gwu.edu
Visualizing Emergent Identity of Assemblages in the Internet of Things:
A Topological Data Analysis Approach
Paper presented at 2017 INFORMS Marketing Science Conference, Los Angeles, CA, June 10, 2017
Tom Novak and Donna Hoffman, The George Washington University
Agenda
Use Topological Data Analysis (TDA) to operationalize DeLanda’s (2002, 2006, 2011, 2016) concept of an assemblage’s possibility space (a.k.a. the market structure of underlying consumer needs).
› The Internet of Things (IoT) and IFTTT
› Topological Data Analysis (TDA)
› Analysis of IFTTT Recipes with TDA
› Future Directions
The IoT and IFTTT
IFTTT Codes IoT and Web Interactions
Internet of Things (IoT): The wide range of everyday objects and products in the real world that are enhanced with programmable sensors and actuators that communicate with other devices and consumers through the Internet. -- Hoffman and Novak 2016
IFTTT.com
Anatomy of an IFTTT Recipe
IFTTT (If-This-Then-That) Recipe
If any new post on Blogger then create a link post on Facebook
› Trigger Event: any new post
› Trigger Channel: Blogger
› Action Event: create a link post
› Action Channel: Facebook
5 Years of IFTTT
Each day, 20 million IFTTT recipes are run by IFTTT users (Lunden 2015).
Users can choose to make their recipes public by publishing them. From 2011 to 2016, a total of 331,391 IFTTT recipes were published.
Of these published recipes, 20,675 are unique. Variants of these unique recipes have been published anywhere from 1 to 9,273 times.
The 20,675 unique published IFTTT recipes use:
• 347 different trigger channels using 1,110 different trigger events
• 297 different action channels using 591 different action events
Research Question: What is the topological structure of the 20,675 published IFTTT recipes?
An Assemblage Emerges from the Interaction of its Components
Figure: Interacting components (Consumer, Alexa, Hue, IFTTT) and the assemblage that emerges from them.
The Possibility Space and Assemblages
TDA provides an empirical way of visualizing the mechanism-independent possibility space, given a population of individual assemblages.
Two requirements for emergence of an assemblage (DeLanda 2011):
› Mechanism-dependent: ongoing recurrent processes involve interaction among components of an assemblage, through their exercised paired capacities.
› Mechanism-independent: a mathematical topological structure, the possibility space, contains points of attraction that guide the recurrent processes of assemblage. This leads to the emergence of populations of assemblages that wind up in the same place, although “different trajectories may be attracted to the same final state” (DeLanda 2002, p. 7).
Topological Data Analysis (TDA)
Topological Data Analysis (TDA)
TDA (Carlsson 2009; Lum et al. 2012; Singh, Mémoli, and Carlsson 2007) applies computational topology techniques to complex high-dimensional data to produce a three-dimensional topology of simplicial complexes (discrete, combinatorial objects). In the resulting model, each node represents a group of rows that are similar to each other in the high-dimensional space, and an edge connects two nodes that share rows.
How TDA Creates Topological Models
Figure: points forming a circle in the (x, y) plane, sorted by the lens into overlapping Bins 1 to 4.
Slide images adapted courtesy of Ayasdi, Inc. http://ayasdi.com/
Step 1
Rectangular data. Many rows, 2 columns x and y. When plotted, these data define a circle.
Step 2
Map data onto a single number using a function (or lens). Here, the lens is the y coordinate.
Step 3
Sort data into overlapping bins, based upon the value of the lens. Then, look at values of the original variables in these bins.
Step 4
Cluster the original data (i.e. x and y values) within each bin. Bins 1 and 4 have one cluster. Bins 2 and 3 have two clusters. This results in 6 nodes.
Step 5
Connect nodes by an edge if they have rows in common. The shape of the topological model has meaning and represents the shape of the data.
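The five steps above can be sketched in a few lines of Python. This is a minimal illustration and not Ayasdi's implementation: the function names, the overlap fraction (0.2), and the single-linkage distance threshold (eps = 0.1) are assumptions chosen so that a toy circle behaves like the example on this slide.

```python
import math
from itertools import combinations


def make_circle(n=200):
    """Step 1: rectangular data, n rows and 2 columns (x, y) on a unit circle."""
    return [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
            for i in range(n)]


def overlapping_bins(values, n_bins, overlap):
    """Step 3: cover the lens range with n_bins intervals that overlap.

    `overlap` is an assumed padding, expressed as a fraction of the bin width.
    """
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    pad = overlap * step
    return [(lo + i * step - pad, lo + (i + 1) * step + pad)
            for i in range(n_bins)]


def single_linkage(points, ids, eps):
    """Step 4: cluster rows in one bin; rows within eps of each other are linked."""
    unseen = set(ids)
    clusters = []
    while unseen:
        seed = unseen.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            cur = frontier.pop()
            near = {j for j in unseen
                    if math.dist(points[cur], points[j]) <= eps}
            unseen -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(cluster)
    return clusters


def mapper(points, lens, n_bins=4, overlap=0.2, eps=0.1):
    # Step 2: map every row onto a single number with the lens.
    values = [lens(p) for p in points]
    nodes = []
    for lo, hi in overlapping_bins(values, n_bins, overlap):
        ids = [i for i, v in enumerate(values) if lo <= v <= hi]
        nodes.extend(single_linkage(points, ids, eps))
    # Step 5: connect two nodes by an edge if they have rows in common.
    edges = [(a, b) for a, b in combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]]
    return nodes, edges


points = make_circle()
nodes, edges = mapper(points, lens=lambda p: p[1])  # lens = y coordinate
print(len(nodes), "nodes,", len(edges), "edges")
```

With these assumed settings the toy circle yields six nodes joined in a single loop, which is the shape described in Steps 4 and 5.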
Analysis of IFTTT Recipes with TDA
TDA Implementation
Open source
› Python Mapper (Müllner and Babu 2013; Singh, Mémoli, and Carlsson 2007)
› Kepler Mapper in Python (Triskelion 2015, proof of concept for Ayasdi flavor of TDA)
› Dionysus (in C++ with Python bindings) based on Zomorodian and Carlsson (2005) and Edelsbrunner, Letscher and Zomorodian (2000) on computing persistent homology
› Package TDA - R interface for GUDHI, Dionysus, PHAT
› TDA Mapper – R package using Mapper
› JavaPlex library implements persistent homology for MATLAB and Java-based systems
› CTL = C++ library for computational topology
› And others, see GitHub
Commercial
› Ayasdi (Workbench Web platform and Python SDK)
IFTTT Recipes – Preparation for TDA
20,675 unique IFTTT published recipe text strings from 2011-2016 of the form:
IF say a specific phrase using_channel Amazon Alexa THEN boost your hot water using_channel Hive Active Heating
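A recipe string in this form can be split into its four parts (trigger event, trigger channel, action event, action channel). The sketch below is one possible way to do that; the `Recipe` type, the `parse_recipe` function, and the regular expression are hypothetical helpers, not part of the original analysis.

```python
import re
from typing import NamedTuple


class Recipe(NamedTuple):
    """Hypothetical container for the four parts of an IFTTT recipe."""
    trigger_event: str
    trigger_channel: str
    action_event: str
    action_channel: str


# Assumes the literal keywords IF, THEN and using_channel shown in the example.
PATTERN = re.compile(
    r"^IF\s+(?P<te>.+?)\s+using_channel\s+(?P<tc>.+?)"
    r"\s+THEN\s+(?P<ae>.+?)\s+using_channel\s+(?P<ac>.+)$"
)


def parse_recipe(text):
    """Split one recipe text string into its trigger and action parts."""
    match = PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not in the expected recipe form: {text!r}")
    return Recipe(match["te"], match["tc"], match["ae"], match["ac"])


example = ("IF say a specific phrase using_channel Amazon Alexa "
           "THEN boost your hot water using_channel Hive Active Heating")
recipe = parse_recipe(example)
print(recipe)
```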
Created binary variables for ngrams that had a frequency > 50, ignoring standard stop words:
› 1 grams (triggers n=364 and actions n=264): “Alexa”
› 2 grams (triggers n=451 and actions n=353): “Amazon Alexa”
› 3 grams (triggers n=407 and actions n=324): “Amazon Alexa add”
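The construction of binary n-gram variables can be illustrated roughly as follows. This is a sketch under stated assumptions: the stop-word list is a tiny placeholder rather than a standard list, "frequency" is taken to mean the number of texts that contain an n-gram, and the function names are made up for illustration. In the analysis the threshold was 50, and triggers and actions were processed separately.

```python
from collections import Counter

# Placeholder stop words; a standard stop-word list would be used in practice.
STOP_WORDS = {"a", "an", "the", "your", "to", "on", "in", "of"}


def ngrams(text, n):
    """Return the set of n-grams in a text after removing stop words."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def binary_ngram_matrix(texts, max_n=3, min_count=50):
    """Build 0/1 variables for every 1..max_n gram found in > min_count texts."""
    per_text = [set().union(*(ngrams(t, n) for n in range(1, max_n + 1)))
                for t in texts]
    counts = Counter(g for grams in per_text for g in grams)
    vocab = sorted(g for g, c in counts.items() if c > min_count)
    rows = [[1 if g in grams else 0 for g in vocab] for grams in per_text]
    return vocab, rows


# Toy trigger texts with a low threshold so that something survives.
trigger_texts = [
    "say a specific phrase amazon alexa",
    "add item amazon alexa",
    "new post blogger",
]
vocab, rows = binary_ngram_matrix(trigger_texts, min_count=1)
print(vocab)
print(rows)
```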
PCA on 2163 ngrams (of these, 880 were non-redundant)
› 362 eigenvalues > 1
› First 100 eigenvalues explained 51.04% of variance (first 2 explained 1.91%)
› Obtained scores on first 100 components (variance = eigenvalue)
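The PCA step can be sketched with NumPy as below. This is an illustration only: it assumes PCA on the correlation matrix, treats "non-redundant" as "not constant and not an exact duplicate of another column", and uses random toy data in place of the n-gram matrix. The function name `pca_scores` is hypothetical.

```python
import numpy as np


def pca_scores(X, n_components):
    """PCA on the correlation matrix; score variance equals the eigenvalue."""
    X = np.asarray(X, dtype=float)
    X = X[:, X.std(axis=0) > 0]          # drop constant columns
    X = np.unique(X, axis=1)             # drop exact duplicate columns (assumption)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    R = (Z.T @ Z) / len(Z)               # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]    # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs[:, :n_components]
    explained = eigvals[:n_components].sum() / eigvals.sum()
    return eigvals, scores, explained


rng = np.random.default_rng(0)
toy = (rng.random((200, 12)) < 0.3).astype(int)   # toy binary data
eigvals, scores, explained = pca_scores(toy, n_components=3)
n_above_one = int((eigvals > 1).sum())            # eigenvalues > 1
print(n_above_one, round(100 * explained, 2))
```

Because the scores are left unscaled, the variance of each score column equals its eigenvalue, which is the convention noted in the last bullet.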
Assumptions for TDA of Our IFTTT Data
Euclidean distance was used as the metric defining distance between the rows of the data matrix.
› Why used? Each row contains scores on the first 100 principal components of ngrams, so Euclidean distance has a natural interpretation.
Neighborhood Lenses 1 & 2 (coordinates of a k-nearest neighbors graph of the data embedded in two dimensions)
› Why used? Magnifies differentiation among groups by locally adapting distance/closeness.
Note: other metrics and lenses were also tried, but produced less clearly interpretable topological models.
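As a rough illustration of the ingredients named above, the sketch below computes Euclidean distances between rows and builds a k-nearest neighbors graph from them. It is not Ayasdi's neighborhood lens: the two-dimensional embedding of the graph is not shown, and the function name `knn_graph` and the toy rows are assumptions.

```python
import math


def knn_graph(rows, k):
    """Map each row index to the indices of its k nearest rows (Euclidean)."""
    graph = {}
    for i, row in enumerate(rows):
        dists = sorted((math.dist(row, other), j)
                       for j, other in enumerate(rows) if j != i)
        graph[i] = [j for _, j in dists[:k]]
    return graph


# Toy rows standing in for principal component scores.
toy_rows = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
graph = knn_graph(toy_rows, k=1)
print(graph)
```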
Topological Model of 5 years of Published IFTTT Recipes from 2011-2016
› 71% of IFTTT recipes are in nodes of one large connected component
› 10% of IFTTT recipes are in nodes that are singletons
› 19% of IFTTT recipes are in nodes of 13 small connected components
Ayasdi 6.7 Workbench and Python SDK used to generate topological models
Topological Model of 5 years of Published IFTTT Recipes from 2011-2016
367 nodes contain 14,692 IFTTT recipes (71% of all recipes).
Recipes can be in more than one node.
Nodes are connected by an edge if they have a recipe in common.
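The way nodes, shared recipes, and connected components relate can be sketched as follows. The node contents here are toy data and the function names are hypothetical; the sketch only shows how a share such as "71% of all recipes" could be computed once nodes are known.

```python
from itertools import combinations


def connected_components(nodes):
    """nodes: list of sets of recipe ids. Nodes sharing a recipe are linked."""
    adjacency = {i: set() for i in range(len(nodes))}
    for a, b in combinations(range(len(nodes)), 2):
        if nodes[a] & nodes[b]:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            cur = stack.pop()
            if cur in comp:
                continue
            comp.add(cur)
            stack.extend(adjacency[cur] - comp)
        seen |= comp
        components.append(comp)
    return components


def recipe_share(nodes, component, n_recipes):
    """Fraction of all recipes that appear in the nodes of one component."""
    recipes = set().union(*(nodes[i] for i in component))
    return len(recipes) / n_recipes


toy_nodes = [{1, 2}, {2, 3}, {4}, {5, 6}, {6, 7}]
components = connected_components(toy_nodes)
share = recipe_share(toy_nodes, components[0], n_recipes=7)
print(components, share)
```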
Topological Models – Varying Resolution (number of bins used by TDA to generate nodes)
Low Resolution Model (resolution = 25)
High Resolution Model (resolution = 60)
Topological Models – Varying Gain (degree of overlap of IFTTT recipes within nodes)
Low Gain Model (gain = 1.4)
High Gain Model (gain = 2.8)
Interpretation via IFTTT Channels
Labels on groups of nodes in the topological model (T = trigger channels, A = action channels):
› T = Date Time; A = RescueTime, LIFX, Wemo
› T = Android Device, Reddit, Android Battery; A = Android Device, Android Wear
› T = Facebook Pages, Facebook, Tumblr, Twitter; A = Facebook, Dropbox, OneDrive, Box, Tumblr, WordPress, Blogger
› T = Gmail; A = IF, Hue, Boxcar, Wemo
› T = Alexa, Wemo, SmartThings, Ring, Scout Alarm, Dlink Motion, Arlo; A = SmartThings, Harmony, Manything
› T = Location
› T = Google Calendar, Stocks, Square; A = Skype, Pushbullet
› T = Fitbit, Jawbone UP, Misfit, Withings, Nike+; A = Slack, Hue, IF, Gmail, SMS
› T = Pocket, WordPress, Tumblr, Blogger, Ebay, Instapaper, Craigslist; A = Evernote, Delicious, Diigo, Pinboard, Tumblr, WordPress, Blogger
› A = OneNote
› T = Android SMS
› A = Android Device
› T = Youtube, Foursquare, Vimeo, Flickr, DailyMotion; A = Pocket, Instapaper, Bitly, Tumblr, FB Pages, Blogger
› T = Inoreader, Tumblr, Flickr, Twitter; A = Buffer
› A = Google Drive
› T = Twitter, Life360; A = Ecobee, Tumblr
› T = Automatic, Spotify, Withings, Dash, Toodledo, Fitbit; A = Google Cal, Google Glass, Jawbone UP, SMS
› A = Todoist, Beeminder, Toodledo
› T = Email; A = Hue, Wemo, Manything
› T = Weather, Space; A = Hue, Nest, LittleBits, LIFX, Wemo, IF
› T = IFTTT, GitHub, Wemo, Yo, Particle, Stripe, Smappee; A = email, Gmail, Pushbullet, IF notifications, SMS
› T = Alexa (say specific phrase)
› T = 500px; A = Feedly
Groups of nodes identified with network clustering using Ayasdi’s Community Algorithm (Louvain modularity optimization).
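Louvain-style methods search for the grouping of nodes that maximizes modularity. The sketch below only computes that modularity score for a given grouping on an undirected, unweighted toy graph; it is not the Louvain search itself and not Ayasdi's Community Algorithm. The function name and the toy graph are assumptions.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  e_c / m - (d_c / (2 m)) ** 2,
    where m is the number of edges, e_c the number of edges inside c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for a, b in edges:
        degree_sum[community_of[a]] += 1
        degree_sum[community_of[b]] += 1
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy graph: two triangles joined by a single edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {n: "A" for n in range(6)}
q_two = modularity(toy_edges, two_groups)
q_one = modularity(toy_edges, one_group)
print(q_two, q_one)
```

Splitting the toy graph into its two triangles scores higher than keeping everything in one group, which is the kind of comparison a modularity optimizer makes when it assigns nodes to groups.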
TDA Identifies 6 Broad Groups of IFTTT Recipes
› IMAGE AND VIDEO SOCIAL CONTENT
› SOCIAL MEDIA
› HOME IOT
› INFORMATION
› TASKS
› WEARABLE IOT
Figure panels: Year 1, Year 3, Year 5
487 Amazon Alexa IFTTT Recipes
› “Social Alexa”: Consumer talks to Alexa
› “Alexa Automation”: Alexa triggers other actions
Red shading indicates that >50% of recipes in a node use Amazon Alexa.
TDA identifies two categories of Alexa IFTTT recipes occupying different locations in the topological model.
Figure region labels: SOCIAL MEDIA, HOME IOT
TDA of Subset of 487 Alexa IFTTT Recipes
› Social Alexa: 247 Recipes
› Alexa Automation: 168 Recipes
› Other Alexa Recipes: 72 Recipes
Future Directions
Removing Noise in Data Provides a Clearer Topological Model
› TDA of 2163 IFTTT ngrams (Hamming Metric)
› TDA of 100 principal components of 2163 IFTTT ngrams (Euclidean Distance Metric)
Preliminary Analysis of Manually Coded IFTTT Rules
TDA of 455 IFTTT codes (Hamming Metric)
› Trigger noun and verb phrase codes
› Action noun and verb phrase codes
Analysis of Manually Coded IFTTT Rules
› THING TRIGGERS, THING ACTIONS
› WEB TRIGGERS, WEB ACTIONS
› WEB TRIGGERS, THING ACTIONS
› THING TRIGGERS, WEB ACTIONS
Future Research Directions
Methodological
› Comparison of alternative approaches for processing structured text data (IFTTT recipes): N-grams, word2vec, doc2vec, topic models, latent semantic analysis, human coding.
› Comparison of alternative visualization approaches: TDA, network analysis, PCA, MDS, hierarchical and k-means clustering
Substantive
› Why are certain IFTTT recipes more frequently published, favorited and added?
› What are the dynamics of how new IFTTT recipes emerge over time?
Acknowledgments
The Ayasdi 6.7 software platform for topological data analysis (ayasdi.com) was used to construct all topological models of the IFTTT data. The authors acknowledge the support of Devi Ramanan, Global Head – Product Collaborations, Ayasdi Inc., Menlo Park, CA.
IFTTT public recipe data from 2011-2016 were provided by, and used with permission of, IFTTT.com, San Francisco, CA.
postsocial.gwu.edu