
Unsupervised Streaming Feature Selection in Social Media. Arizona State University, Data Mining and Machine Learning Lab. CIKM 2015

Unsupervised Streaming Feature Selection in Social Media

Jundong Li¹, Xia Hu², Jiliang Tang³ and Huan Liu¹

¹Arizona State University, ²Texas A&M University, ³Yahoo! Labs

Outline

• Background and Motivation

• Problem Statement

• Proposed USFS Framework

• Experimental Results

• Conclusions and Future Work


Social Media

• Rapid growth of social media provides a platform for people to perform online social activities

• Massive amounts of high dimensional data are user generated and quickly disseminated

• Reducing the dimensionality of social media data is desirable because of the curse of dimensionality

Feature Selection

• Feature selection is an effective way to prepare high-dimensional data: it selects a subset of relevant features to obtain a compact and accurate representation

Feature Selection in Social Media

• Traditional feature selection assumes that all features are static and known in advance

• Features in social media are usually generated dynamically in a streaming fashion

– Twitter produces more than 500 million tweets every day, and a large number of slang words (features) are continuously generated by users

– In disaster relief, topics (features) like "Chile Earthquake" emerge and quickly become hot

Streaming Feature Selection

• It is more appealing to perform streaming feature selection, so that relevant features are captured in a timely manner

Challenges, Opportunities and Target

• Challenges

– Label information is costly

– Data are not i.i.d.

• Opportunities

– Link information is abundant and may be helpful

• Target

– Propose an unsupervised streaming feature selection algorithm for social media data

No existing unsupervised streaming feature selection algorithms!


Problem Statement

• Given n linked instances, let the adjacency matrix M denote their link information. Assume that features arrive dynamically, one at a time; at time step t, each instance is associated with a set of streaming features X(t) = {f1, f2, …, ft}

• We want to select a subset of relevant features at each time step, effectively and efficiently, by using the link information M and the content information X(t)

Illustration

[Figure: features arrive at time steps t, t+1, …, t+i. For each arriving feature, the framework asks two questions about the selected feature set: "Accept the new feature?" and "Reject existing feature?"]
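The two questions in the illustration suggest a simple control loop. The sketch below is only a skeleton under stated assumptions: `stream_select`, `accept` and `prune` are hypothetical names introduced here, and the two callbacks stand in for the actual tests described on later slides.

```python
from typing import Callable, Iterable, List, Sequence

Feature = Sequence[float]  # one feature column: a value per instance


def stream_select(
    features: Iterable[Feature],
    accept: Callable[[List[int], Feature], bool],
    prune: Callable[[List[int]], List[int]],
) -> List[int]:
    """Process features one at a time; return indices of the kept ones.

    `accept` and `prune` are hypothetical callbacks standing in for the
    two questions in the illustration; they are not from the talk.
    """
    selected: List[int] = []
    for t, f_t in enumerate(features):
        # Question 1: accept the new feature?
        if not accept(selected, f_t):
            continue
        selected.append(t)
        # Question 2: reject any existing feature now that f_t is in?
        selected = prune(selected)
    return selected


# Toy policies, chosen only to make the loop observable.
def has_variance(_selected, f):
    return max(f) > min(f)


def keep_two_newest(selected):
    return selected[-2:]


cols = [[1, 1, 1], [0, 1, 2], [3, 1, 0], [2, 2, 2], [5, 0, 1]]
print(stream_select(cols, has_variance, keep_two_newest))
```

Note that pruning only runs after an acceptance, which mirrors the order of the two questions in the figure.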


Modeling Link Information

• Social media users connect for a variety of reasons, e.g., as movie fans, sports enthusiasts, or colleagues

• Users with similar hidden factors are similar

• Hidden factors are helpful to steer unsupervised streaming feature selection

• We use the mixed membership stochastic blockmodel (MMSB) [Blei+NIPS2009] to extract hidden social factors from link information

Modeling Link Information (cont'd)

• At time step t:

• Hidden social factors serve as regression targets

• The L1-norm can be used for feature selection
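To see why an L1 penalty doubles as feature selection, here is a minimal plain-Python sketch that regresses one hidden-factor column on the features with coordinate descent. The function names and the toy data are assumptions made for illustration, and this is not the solver used in the talk.

```python
def soft_threshold(z, a):
    """Shrink z toward zero by a; values inside [-a, a] become exactly 0."""
    if z > a:
        return z - a
    if z < -a:
        return z + a
    return 0.0


def lasso_cd(X, y, alpha, n_iter=200):
    """Minimise 0.5*||X w - y||^2 + alpha*||w||_1 by coordinate descent.

    X is a list of rows (instances x features); y holds one hidden
    factor per instance. Illustrative only.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(d)]
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that ignores j.
            rho = 0.0
            for i in range(n):
                pred_wo_j = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_wo_j)
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
    return w


# Feature 0 tracks the hidden factor; feature 1 is small noise.
X = [[1.0, 0.1], [2.0, -0.1], [3.0, 0.1], [4.0, -0.1]]
pi = [1.0, 2.0, 3.0, 4.0]
w = lasso_cd(X, pi, alpha=0.5)
print(w)
```

In this toy run the noise feature's weight is driven to exactly zero, so it would be dropped, while the informative feature keeps a slightly shrunken weight.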

Modeling Content Information

• If two users are similar in the original feature space, they should also be similar in the selected feature space.
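One common way to encode this kind of similarity preservation is a graph-Laplacian smoothness penalty. The slide does not show the formula, so the sketch below is an assumption: `S` is a hypothetical user-user affinity matrix built from the original features, and `z` is a per-user value in the selected feature space.

```python
def laplacian(S):
    """Unnormalised graph Laplacian L = D - S for a symmetric affinity S."""
    n = len(S)
    return [[(sum(S[i]) if i == j else 0.0) - S[i][j] for j in range(n)]
            for i in range(n)]


def smoothness(z, L):
    """Return z^T L z, which equals 0.5 * sum_ij S_ij * (z_i - z_j)^2."""
    n = len(z)
    return sum(z[i] * L[i][j] * z[j] for i in range(n) for j in range(n))


# Users 0 and 1 are similar in the original space; user 2 is unrelated.
S = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
L = laplacian(S)
print(smoothness([1.0, 1.0, 5.0], L))  # similar users stay close
print(smoothness([1.0, 4.0, 5.0], L))  # similar users pulled apart
```

The penalty is zero when the two similar users receive the same value and grows with the squared gap between them, so minimising it favours feature subsets that keep similar users close.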


Optimization Formulation at Time t

• By combining network information and content information

• Decompose into a set of sub-problems
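The formula on this slide did not survive in the transcript. As a hedged reconstruction only, one per-factor sub-problem consistent with the surrounding slides (hidden factors as regression targets, an L1 penalty, content similarity preservation, and a later reference to four terms of which the 2nd grows when a feature is added) could look like the following. The symbols π^i (the i-th hidden factor), L (a graph Laplacian over content similarity) and the weights α, β, γ are assumptions, and the ridge term in third position is the least certain part.

```latex
\min_{\mathbf{w}^{i}} \;
  \tfrac{1}{2}\,\bigl\lVert X^{(t)}\mathbf{w}^{i} - \boldsymbol{\pi}^{i} \bigr\rVert_2^2
  \;+\; \alpha\,\lVert \mathbf{w}^{i} \rVert_1
  \;+\; \tfrac{\beta}{2}\,\lVert \mathbf{w}^{i} \rVert_2^2
  \;+\; \tfrac{\gamma}{2}\,\bigl(X^{(t)}\mathbf{w}^{i}\bigr)^{\top} L \,\bigl(X^{(t)}\mathbf{w}^{i}\bigr),
  \qquad i = 1,\dots,k
```

Under this reading, "decompose into a set of sub-problems" means solving one such problem per hidden factor i, each over a single weight vector rather than the full coefficient matrix.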

Testing New Feature

• At time step t+1, when the new feature arrives:

• The objective function decreases if the reduction in the 1st, 3rd, and 4th terms outweighs the increase in the 2nd term

• Therefore, the condition to accept the new feature is
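The acceptance condition itself is missing from the transcript. A gradient-based test in the style of grafting is one plausible reading, sketched below under explicit assumptions: the objective for one hidden factor is a squared-error fit, an L1 penalty weighted by alpha, a ridge term, and a Laplacian smoothness term weighted by gamma. All names and parameters here are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(M, v):
    return [dot(row, v) for row in M]


def accept_new_feature(f_new, pred, target, L, alpha, gamma):
    """Gradient test for one hidden factor (assumed form, see lead-in).

    f_new  : values of the arriving feature, one per instance
    pred   : current predictions X w for this factor
    target : hidden-factor values for this factor
    L      : graph Laplacian of the content-similarity graph
    """
    residual = [p - y for p, y in zip(pred, target)]
    # Derivative of the smooth terms w.r.t. the new weight, taken at 0.
    # The ridge term contributes nothing there because its gradient is
    # proportional to the new weight, which is still 0.
    grad = dot(f_new, residual) + gamma * dot(f_new, matvec(L, pred))
    # A non-zero weight costs alpha per unit in the L1 term, so the
    # objective can only drop if the smooth part falls faster than that.
    return abs(grad) > alpha


# No feature selected yet, so the current predictions are all zero.
L0 = [[1.0, -1.0, 0.0], [-1.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
target = [1.0, 2.0, 3.0]
pred = [0.0, 0.0, 0.0]
print(accept_new_feature([1.0, 2.0, 3.0], pred, target, L0, 1.0, 0.1))
print(accept_new_feature([1.0, -2.0, 1.0], pred, target, L0, 1.0, 0.1))
```

With several hidden factors, a natural extension would be to run the test once per factor and accept the feature if any factor passes; that rule is also an assumption rather than something stated on the slide.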

Testing Existing Features

• Test existing features when a new feature is added

• When the new feature is accepted, we optimize the following w.r.t. the current variables, which forces some feature coefficients to zero

• This is a convex optimization problem; we solve it with Broyden-Fletcher-Goldfarb-Shanno (BFGS)


Feature Selection by USFS

• If the new feature is accepted, we obtain a sparse coefficient matrix by solving all sub-problems

• For each feature j, if any of its k corresponding feature weights is nonzero, the feature is included in the final model; the feature score is defined as

• Features are ranked in a descending order by their feature scores
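The score formula is missing from the transcript. The sketch below assumes one natural choice, the largest absolute weight of a feature across the k hidden factors; this definition and the function names are assumptions, not something the slide confirms.

```python
def feature_scores(W):
    """W[j][i] is the weight of feature j for hidden factor i.

    Assumed score: the largest absolute weight across the k factors.
    """
    return [max(abs(w) for w in row) for row in W]


def rank_features(W):
    """Keep features with any nonzero weight, ranked by descending score."""
    scores = feature_scores(W)
    kept = [j for j, s in enumerate(scores) if s > 0.0]
    return sorted(kept, key=lambda j: scores[j], reverse=True)


# Three features, two hidden factors; feature 1 has all-zero weights.
W = [[0.5, 0.0],
     [0.0, 0.0],
     [0.2, -0.9]]
print(rank_features(W))
```

Feature 1 is excluded because none of its weights is nonzero, which matches the inclusion rule stated on the slide regardless of which score definition is used.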


Questions to Investigate

• Q1: What is the quality of the features selected by the USFS framework?

• Q2: How efficient is the proposed USFS framework?

Datasets

• BlogCatalog (social blog directory)

• Flickr (image sharing website)

• Assume features arrive in a random order; take {20%, 30%, …, 90%, 100%} of all features

Experimental Settings

• Evaluation

– Clustering: K-means

– Metrics: Accuracy and NMI

• Baseline batch-mode methods

– Laplacian Score [He et al., NIPS 2005]

– SPEC [Zhao and Liu, ICML 2007]

– NDFS [Li et al., AAAI 2012]

– LUFS [Tang and Liu, KDD 2012]
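For readers unfamiliar with NMI, a minimal plain-Python sketch follows. Several normalisations are in use; this one divides by the geometric mean of the two entropies, which is an assumption since the slides do not say which variant was used.

```python
from collections import Counter
from math import log, sqrt


def nmi(labels_true, labels_pred):
    """Normalised mutual information, normalised by sqrt(H(U) * H(V))."""
    n = len(labels_true)
    cu = Counter(labels_true)
    cv = Counter(labels_pred)
    cuv = Counter(zip(labels_true, labels_pred))
    # Mutual information from the joint and marginal label counts.
    mi = sum((c / n) * log(c * n / (cu[u] * cv[v]))
             for (u, v), c in cuv.items())
    hu = -sum((c / n) * log(c / n) for c in cu.values())
    hv = -sum((c / n) * log(c / n) for c in cv.values())
    if hu == 0.0 or hv == 0.0:
        # Degenerate case: at least one labelling has a single cluster.
        return 1.0 if hu == hv else 0.0
    return mi / sqrt(hu * hv)


print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, labels swapped
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))  # independent partitions
```

NMI is invariant to how cluster ids are named, which is why it suits K-means output; clustering accuracy, by contrast, needs an explicit best matching between cluster ids and class labels.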


Performance on Flickr


Performance on BlogCatalog


Cumulative Running Time

• In BlogCatalog, USFS is 7x, 20x, 29x, 76x faster

• In Flickr, USFS is 5x, 11x, 20x, 75x faster


Conclusion

• Goals:

– Perform unsupervised streaming feature selection for social media data

• Solutions:

– Leverage link information as constraints

– Stagewise algorithm for streaming features

• Results:

– Achieve better feature selection performance in terms of clustering

– Reduce running time compared with batch-mode methods

Future Work

• In this work, we assume that link information is relatively stable compared with dynamic content information; we will investigate streaming feature selection in dynamic networks

• Streaming features come from different sources; we will investigate how to fuse heterogeneous feature sources for streaming feature selection


Acknowledgement: This material is, in part, supported by National Science Foundation (NSF) under grant number IIS-1217466. Comments and suggestions from DMML members and reviewers are greatly appreciated.

Questions
