learning from instagram

Learning from Instagram Nikita Pestrov Oleg Grinchuk Nikita Kluchnikov Sergey Muratov


Post on 11-Aug-2015




Learning from Instagram

Nikita Pestrov Oleg Grinchuk

Nikita Kluchnikov Sergey Muratov

Problem Formulation

• GOAL: Predict the number of likes a picture is likely to receive before the user posts it.

• MEANS: Instagram Data, Deep and Shallow learning techniques

Data Collection

We used the Instagram API to collect photos and their attributes:
– likes count
– comments
– hashtags
– geotags
– etc…

We also collected the likes count for each photo over timeframes of 6, 12 and 24 hours.

Result: 300k photos (in 4 days) — thanks to Kirill Potekhin
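A collected record could be organized roughly as follows. This is only a sketch: the class and field names (`PhotoRecord`, `photo_id`, `likes_at` and so on) are hypothetical and are not the Instagram API's own names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hours at which the likes count was sampled (6, 12 and 24, as on the slide).
SNAPSHOT_HOURS = (6, 12, 24)


@dataclass
class PhotoRecord:
    """One collected photo; every field name here is an illustrative assumption."""
    photo_id: str
    hashtags: List[str] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)
    geotag: Optional[Tuple[float, float]] = None  # (latitude, longitude)
    likes_at: Dict[int, int] = field(default_factory=dict)  # hours -> likes count

    def record_likes(self, hours: int, count: int) -> None:
        """Store the likes count observed at one of the snapshot times."""
        if hours not in SNAPSHOT_HOURS:
            raise ValueError(f"unexpected snapshot time: {hours}h")
        self.likes_at[hours] = count

    def is_complete(self) -> bool:
        """True once all three snapshots have been collected."""
        return all(h in self.likes_at for h in SNAPSHOT_HOURS)
```

Keeping the three snapshots in one mapping makes it easy to drop photos whose 24-hour count was never collected.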

Data Analysis

We used a reduced, pre-trained ImageNet deep network to extract photo features.

t-SNE

Dark = liked by more than 20% of subscribers
Azure = liked by 10-20%
Yellow = liked by <10%

No meaningful data here…
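The three color classes in the legend can be written down as a small function. This is a sketch, not the authors' code: the function name is hypothetical, and assigning the exact 10% and 20% boundaries to the middle class is an assumption, since the slide does not say where they belong.

```python
def like_class(likes: int, subscribers: int) -> str:
    """Map a photo to the color class used in the legend.

    The ratio is likes divided by the poster's subscriber count.
    """
    if subscribers <= 0:
        raise ValueError("subscribers must be positive")
    ratio = likes / subscribers
    if ratio > 0.20:
        return "dark"    # liked by more than 20% of subscribers
    if ratio >= 0.10:
        return "azure"   # liked by 10-20% (boundaries assumed inclusive)
    return "yellow"      # liked by less than 10%
```

For example, `like_class(30, 100)` falls in the "dark" class and `like_class(5, 100)` in the "yellow" class.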

Architecture

• We designed our own deep network that takes a photo as input and returns its «score». The «score» is a value that represents how many likes the photo is expected to receive.

• Network structure: 32CONV3-MP2-64CONV3-MP2-128CONV3-MP2-256CONV3-MP2-1000FC-1FC
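To make the structure string easier to read, here is a small sketch that walks through it layer by layer and reports output shapes and parameter counts. Several things are assumptions, not facts from the slides: the reading of `NCONVk` as N filters of size k×k, `MPk` as k×k max pooling and `NFC` as a fully connected layer with N units; stride-1 convolutions with "same" padding; and the 3×64×64 input size used in the demo.

```python
import re
from typing import List, Tuple

SPEC = "32CONV3-MP2-64CONV3-MP2-128CONV3-MP2-256CONV3-MP2-1000FC-1FC"


def trace_network(spec: str, channels: int, height: int,
                  width: int) -> List[Tuple[str, tuple, int]]:
    """Return (layer, output shape, parameter count) for each layer in spec.

    Assumes stride-1 convolutions with "same" padding and non-overlapping
    max pooling; neither detail is given on the slide.
    """
    layers = []
    flat = None  # number of units once the first FC layer has been reached
    for token in spec.split("-"):
        conv = re.fullmatch(r"(\d+)CONV(\d+)", token)
        pool = re.fullmatch(r"MP(\d+)", token)
        fc = re.fullmatch(r"(\d+)FC", token)
        if conv:
            filters, k = int(conv.group(1)), int(conv.group(2))
            params = filters * (channels * k * k + 1)  # weights + biases
            channels = filters
            shape = (channels, height, width)
        elif pool:
            k = int(pool.group(1))
            height, width = height // k, width // k
            params = 0  # pooling has no trainable parameters
            shape = (channels, height, width)
        elif fc:
            units = int(fc.group(1))
            inputs = flat if flat is not None else channels * height * width
            params = units * (inputs + 1)  # weights + biases
            flat = units
            shape = (units,)
        else:
            raise ValueError(f"unknown layer token: {token}")
        layers.append((token, shape, params))
    return layers


if __name__ == "__main__":
    # The 3x64x64 input is a hypothetical choice for illustration only.
    for name, shape, params in trace_network(SPEC, 3, 64, 64):
        print(f"{name:>10}  {str(shape):<16} {params:>10,d}")
```

Under these assumptions the final layer has a single output unit, which matches the idea of one «score» per photo.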

• «Score» formulas we have tried:

And neither worked.

Pivot!

• Learned scores were not significantly different between photos.

• This is probably because the «likeability» of a photo is not captured by the features the ImageNet network extracts.

• Trendiness? Right place, right time? Mere luck?

• «Can’t predict likes… Predict hashtags! #deeplearning #YOLO»

New Data Analysis


New Architecture

Pipeline:

• choose the most significant and distinct hashtags
  GOOD: #food, #flowers, #sunset
  BAD: #cool, #like4like, #follow4follow

• form photo datasets based on these hashtags in photo comments (some cherry-picking is required)

• extract features using the reduced ImageNet network without its last 2 layers (N=4096)

• train an SVM on those features
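The dataset-forming step of the pipeline above might look like the sketch below. It is an illustration under assumptions: the function name and the `(photo_id, caption)` input format are hypothetical, and skipping photos that match more than one selected hashtag is only a crude stand-in for the manual cherry-picking mentioned on the slide.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# The "GOOD" example hashtags from the slide.
GOOD_TAGS = {"food", "flowers", "sunset"}


def build_datasets(photos: Iterable[Tuple[str, str]],
                   tags: Set[str]) -> Dict[str, List[str]]:
    """Group photo ids by the selected hashtag found in their caption.

    photos yields (photo_id, caption) pairs. Photos with no selected
    hashtag, or with more than one, are left out to keep classes clean.
    """
    datasets = defaultdict(list)
    for photo_id, caption in photos:
        found = set(re.findall(r"#(\w+)", caption.lower()))
        matched = found & tags
        if len(matched) == 1:
            datasets[matched.pop()].append(photo_id)
    return dict(datasets)
```

Hashtags outside the selected set, such as #like4like, are simply ignored, so a caption like "Lunch #food #like4like" still lands in the #food dataset.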

Results

We tested the quality of the trained classifiers on the remainder of our dataset:

#food: ~94%
#flowers: ~85%
#sunset: ~92%

Choosing these hashtags by hand can be avoided by using the t-SNE representation.
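Training a classifier and testing it on held-out data can be sketched as follows. The slides do not say which SVM implementation or kernel was used, so this is a swapped-in illustration: a linear SVM without a bias term, trained with a Pegasos-style stochastic sub-gradient method, on made-up 2-D points that stand in for the 4096-dimensional features. All function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def train_linear_svm(xs: Sequence[Sequence[float]], ys: Sequence[int],
                     lam: float = 0.01, epochs: int = 20,
                     seed: int = 0) -> List[float]:
    """Pegasos-style sub-gradient descent on the hinge loss (labels +1/-1)."""
    rng = random.Random(seed)
    w = [0.0] * len(xs[0])
    order = list(range(len(xs)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = ys[i] * sum(wj * xj for wj, xj in zip(w, xs[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * ys[i] * xj for wj, xj in zip(w, xs[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


def accuracy(w: Sequence[float], xs: Sequence[Sequence[float]],
             ys: Sequence[int]) -> float:
    """Fraction of samples classified correctly."""
    hits = sum(predict(w, x) == y for x, y in zip(xs, ys))
    return hits / len(xs)


def make_toy_data(n: int, seed: int = 1) -> Tuple[List[List[float]], List[int]]:
    """Two well-separated clusters; a stand-in for real image features."""
    rng = random.Random(seed)
    xs, ys = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        xs.append([label * rng.uniform(1.0, 3.0) for _ in range(2)])
        ys.append(label)
    return xs, ys


if __name__ == "__main__":
    xs, ys = make_toy_data(200)
    split = 150  # train on the first part, test on the remainder
    w = train_linear_svm(xs[:split], ys[:split])
    print("held-out accuracy:", accuracy(w, xs[split:], ys[split:]))
```

The toy clusters are trivially separable, so the accuracy here says nothing about the figures reported on the slide; the point is only the train/held-out split.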

Application

• Hashtag Prediction: When a user uploads a photo, they are presented with a list of the most relevant hashtags (which are popular now)

• Relevant Instagram Search: Separate meaningful hashtags from «like-hunters'» posts.

• Hashtag-Specific Geosearch: Extract photos by their location and find its current characteristics in terms of hashtags (food, parties, etc…)
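The hashtag prediction idea could be sketched like this, assuming each hashtag has its own classifier that produces a score for the uploaded photo. The function name, the score dictionary, the `trending` set and the threshold are all hypothetical; the slides do not describe how the suggestion list would be built.

```python
from typing import Dict, List, Set


def suggest_hashtags(scores: Dict[str, float], trending: Set[str],
                     top_k: int = 3, threshold: float = 0.0) -> List[str]:
    """Return up to top_k currently popular hashtags, best score first.

    scores maps a hashtag to its classifier score for one photo; only
    hashtags scoring above threshold and present in trending are kept.
    """
    candidates = [(score, tag) for tag, score in scores.items()
                  if score > threshold and tag in trending]
    # Sort by descending score, breaking ties alphabetically.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return ["#" + tag for _, tag in candidates[:top_k]]
```

With scores `{"food": 1.4, "sunset": -0.3, "flowers": 0.2}` and all three hashtags trending, the suggestions would be #food followed by #flowers.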

Conclusion

• Worked on estimating the likeability of a photo with RDL techniques.

• Successfully extracted hashtags from photos with RDL techniques.

• Put the knowledge of Deep and Shallow Learning to practice.

Thank you

Kudos to Victor Lempitsky and Kirill Potekhin