Learning from Instagram

Nikita Pestrov, Oleg Grinchuk, Nikita Kluchnikov, Sergey Muratov

Problem Formulation

• GOAL: Predict the likely number of likes for a picture the user is about to post.

• MEANS: Instagram Data, Deep and Shallow learning techniques

Data Collection

We used the Instagram API to collect photos and their attributes:
– likes count
– comments
– hashtags
– geotags
– etc…

We also recorded the likes count for each photo over 6-, 12- and 24-hour time frames.

Result: 300k photos (in 4 days) — thanks to Kirill Potekhin

Data Analysis

We used a reduced, pre-trained ImageNet deep network to extract photo features.

tSNE

Dark = liked by more than 20% of subscribers
Azure = liked by 10-20%
Yellow = liked by <10%

No meaningful data here…
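The colour legend of the tSNE plot amounts to bucketing each photo by its like-to-subscriber ratio. A minimal sketch of that bucketing follows; the function name `like_bucket` is hypothetical, and the handling of the exact 10% and 20% boundaries is an assumption, since the legend does not say which bucket they belong to.

```python
def like_bucket(likes, subscribers):
    """Map a photo to its legend colour by like-to-subscriber ratio.

    Assumption: exactly 20% falls into "azure" and exactly 10% into
    "azure" as well; the slides do not specify the boundary cases.
    """
    if subscribers <= 0:
        raise ValueError("subscribers must be positive")
    ratio = likes / subscribers
    if ratio > 0.20:
        return "dark"      # liked by more than 20% of subscribers
    if ratio >= 0.10:
        return "azure"     # liked by 10-20%
    return "yellow"        # liked by <10%


if __name__ == "__main__":
    for likes, subs in [(30, 100), (15, 100), (5, 100)]:
        print(likes, subs, like_bucket(likes, subs))
```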

Architecture

• We designed our own deep network that takes a photo as input and returns its «score». The «score» is a value that represents how many likes a photo is expected to receive.

• Network structure: 32CONV3-MP2-64CONV3-MP2-128CONV3-MP2-256CONV3-MP2-1000FC-1FC

• «Score» formulas we have tried:

And neither worked.
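To make the network structure string above easier to read, here is a small sketch that walks it layer by layer and reports output shapes and parameter counts. It reads `NCONVk` as N filters of size k×k, `MPk` as k×k max-pooling and `NFC` as a fully connected layer with N units. The 3×64×64 input size, "same" padding, stride 1 and the function name `trace` are all assumptions made for illustration; the slides do not state them.

```python
import re

SPEC = "32CONV3-MP2-64CONV3-MP2-128CONV3-MP2-256CONV3-MP2-1000FC-1FC"


def trace(spec, channels=3, size=64):
    """Return (layer, output_shape, n_params) for each token in the spec.

    Assumptions: square input, 'same' padding and stride 1 for convolutions,
    pooling stride equal to the pooling window, biases on every layer.
    """
    rows = []
    flat = None  # width of the previous FC layer, once we have one
    for token in spec.split("-"):
        conv = re.fullmatch(r"(\d+)CONV(\d+)", token)
        pool = re.fullmatch(r"MP(\d+)", token)
        fc = re.fullmatch(r"(\d+)FC", token)
        if conv:
            out_ch, k = int(conv.group(1)), int(conv.group(2))
            params = out_ch * (channels * k * k + 1)
            channels = out_ch
            rows.append((token, (channels, size, size), params))
        elif pool:
            size //= int(pool.group(1))
            rows.append((token, (channels, size, size), 0))
        elif fc:
            units = int(fc.group(1))
            fan_in = flat if flat is not None else channels * size * size
            rows.append((token, (units,), units * (fan_in + 1)))
            flat = units
        else:
            raise ValueError(f"unrecognised layer token: {token}")
    return rows


rows = trace(SPEC)
total_params = sum(p for _, _, p in rows)

if __name__ == "__main__":
    for name, shape, params in rows:
        print(f"{name:>10}  {str(shape):>16}  {params:>9}")
    print("total parameters:", total_params)
```

Under these assumptions the final single-unit layer is what produces the scalar «score»; a different input resolution would only change the size of the first fully connected layer.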

Pivot!

• Learned scores were not significantly different between photos.

• The likely reason is that the «likeability» of a photo is not captured by the features the ImageNet network extracts.

• Trendiness? Right place, right time? Mere luck?

• «Can’t predict likes… Predict hashtags! #deeplearning #YOLO»

New Data Analysis

New Architecture

Pipeline:

• choose the most significant and distinct hashtags

GOOD: #food, #flowers, #sunset
BAD: #cool, #like4like, #follow4follow

• form photo datasets based on these hashtags in photo comments
* some cherry-picking is required

• extract features using the reduced ImageNet network without its last 2 layers (N=4096)

• train an SVM on those features
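The last step of the pipeline can be illustrated with a minimal sketch of a per-hashtag binary classifier. The slides do not say which SVM variant or library was used, so this is a plain linear SVM trained by stochastic gradient descent on the hinge loss, written in pure Python. The 2-D synthetic points stand in for the 4096-D network features, and all names (`train_linear_svm`, `predict`, `make_toy_features`) and hyperparameters are hypothetical.

```python
import random


def train_linear_svm(X, y, lr=0.01, lam=0.01, epochs=50, seed=0):
    """SGD on the L2-regularised hinge loss; labels must be +1 or -1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # Shrink the weights (gradient of the L2 regulariser).
            w = [wj * (1.0 - lr * lam) for wj in w]
            # Hinge-loss step only for points inside or beyond the margin.
            if y[i] * score < 1.0:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def predict(w, b, x):
    """Return +1 (photo matches the hashtag) or -1 (it does not)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def make_toy_features(n, seed):
    """Synthetic stand-in for extracted features: two Gaussian clusters."""
    rng = random.Random(seed)
    X, y = [], []
    for k in range(n):
        label = 1 if k % 2 == 0 else -1
        X.append([rng.gauss(2.0 * label, 0.5) for _ in range(2)])
        y.append(label)
    return X, y


X_train, y_train = make_toy_features(200, seed=1)
X_test, y_test = make_toy_features(50, seed=2)
w, b = train_linear_svm(X_train, y_train)
accuracy = sum(predict(w, b, x) == t for x, t in zip(X_test, y_test)) / len(y_test)

if __name__ == "__main__":
    print(f"held-out accuracy on toy data: {accuracy:.2f}")
```

One such classifier would be trained per hashtag (for example #food vs. not #food). The toy accuracy says nothing about real photos; it only shows the train-then-test flow.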

Results

We tested the quality of the trained classifiers on the remainder of our dataset:

#food: ~94%
#flowers: ~85%
#sunset: ~92%

Using the tSNE representation, it is possible to avoid choosing these hashtags by hand.

Application

• Hashtag Prediction: When a user uploads a photo, they are presented with a list of the most relevant hashtags (which are popular now)

• Relevant Instagram Search: Separate meaningful hashtags from «like-hunters'» posts.

• Hashtag-Specific Geosearch: Extract photos by their location and find that location's current characteristics in terms of hashtags: food, parties, etc…
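As an illustration of the hashtag prediction idea, the sketch below ranks hashtags by per-hashtag classifier score and keeps only those that are currently popular. The function name `suggest_hashtags`, the score values, the threshold and the `trending` set are all hypothetical; the slides do not describe how the application would be implemented.

```python
def suggest_hashtags(scores, trending, threshold=0.0, top_k=3):
    """Return up to top_k trending hashtags whose score exceeds threshold.

    scores:   dict mapping hashtag -> classifier decision value for one photo
    trending: set of hashtags considered popular right now (assumed given)
    """
    candidates = [
        (tag, score)
        for tag, score in scores.items()
        if score > threshold and tag in trending
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [tag for tag, _ in candidates[:top_k]]


if __name__ == "__main__":
    photo_scores = {"#food": 1.4, "#flowers": -0.7, "#sunset": 0.3}
    popular_now = {"#food", "#sunset", "#flowers"}
    print(suggest_hashtags(photo_scores, popular_now))
```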

Conclusion

• Worked on estimating likeability of the photo with RDL techniques.

• Successfully extracted hashtags from photos with RDL techniques.

• Put the knowledge of Deep and Shallow Learning to practice.

Thank you

Kudos to Victor Lempitsky and Kirill Potekhin
