Chair of Software Engineering for Business Information Systems (sebis)
Faculty of Informatics
Technische Universität München
wwwmatthes.in.tum.de
Master Thesis: Imputation of missing Product Information using Deep Learning
A Use Case on Amazon Product Catalogue
Aamna Najmi, 18.01.2019, Kickoff
Advisor: Ahmed Elnaggar
Outline
▪ Introduction
▪ Motivation
▪ Objective
▪ Approach
▪ Research Questions
▪ Dataset
▪ Methodology
▪ Timeline of the Project
© sebis, Kickoff Master Thesis – Aamna Najmi
Introduction
▪ Global retail ecommerce sales will reach about $4 trillion in 2020, accounting for 14.6% of total retail
spending worldwide [1]
▪ 20% of purchase failures are potentially a result of missing or unclear product information [2]
▪ Detailed product information = improved customer experience and company profit
Motivation
▪ Machine Learning: Transfer Learning, Multi-Task Learning, NLP, Computer Vision
▪ Customer Experience: Reliability, Informed Decision Making, Enhanced Website Navigation
▪ Organizational Benefits: Inventory Management, Delivery and Transportation, Company Profit
Objective
Predict missing information (Category, Color and Gender) of Amazon products belonging to the
Fashion department in the European market using textual information in five languages,
namely English, German, French, Spanish and Italian, and product images as inputs to the
model.
Approach
• Multi-task learning: Train one model to perform multiple tasks concurrently [3]
• Transfer learning: Reuse the weights and biases of a pre-trained network in a different domain [4]
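The two ideas above can be illustrated with a minimal, forward-pass-only sketch in plain Python. It is not the thesis architecture: the layer sizes, class counts, feature vector, and all names (`MultiTaskModel`, `trunk`, `heads`, `trunk_frozen`) are assumptions made for illustration. A shared trunk feeds one softmax head per task, and the joint loss is taken here as the plain sum of per-task cross-entropies. Transfer learning is shown by handing the trunk of one model to a second model that gets a fresh head.

```python
import math
import random


def init_layer(rng, n_out, n_in):
    """Return (weights, bias) for a dense layer with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer, x):
    """Apply a dense layer: one dot product plus bias per output unit."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class MultiTaskModel:
    """Shared trunk with one classification head per task (hypothetical design)."""

    def __init__(self, n_features, n_hidden, task_sizes, seed=0, pretrained_trunk=None):
        rng = random.Random(seed)
        # Transfer learning: reuse an existing trunk instead of a random one.
        # The flag only marks that a training loop should leave it untouched.
        self.trunk_frozen = pretrained_trunk is not None
        self.trunk = pretrained_trunk if self.trunk_frozen else init_layer(rng, n_hidden, n_features)
        self.heads = {task: init_layer(rng, n_classes, n_hidden)
                      for task, n_classes in task_sizes.items()}

    def forward(self, x):
        hidden = [max(0.0, h) for h in dense(self.trunk, x)]  # ReLU activation
        return {task: softmax(dense(head, hidden)) for task, head in self.heads.items()}

    def joint_loss(self, x, labels):
        """Sum of per-task cross-entropies for one example (equal weights assumed)."""
        probs = self.forward(x)
        return sum(-math.log(probs[task][label]) for task, label in labels.items())


TASK_SIZES = {"category": 4, "color": 3, "gender": 2}  # class counts are made up
features = [0.2, -0.1, 0.5, 0.0, 0.3]  # stand-in for combined text and image features

# Multi-task learning: one model, three tasks, one joint loss.
source_model = MultiTaskModel(5, 8, TASK_SIZES, seed=1)
loss = source_model.joint_loss(features, {"category": 2, "color": 0, "gender": 1})

# Transfer learning: a new model reuses the trunk and adds its own head.
target_model = MultiTaskModel(5, 8, {"color": 3}, seed=2, pretrained_trunk=source_model.trunk)
```

The sketch omits training entirely. In a real setup the joint loss would be minimized by gradient descent, and the task weights in the sum would be a design choice.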
Research Questions
1. Could multi-task training and transfer learning outperform training on each task independently on the Amazon Product Catalog dataset?
2. Which architecture choices and hyperparameters should we use in both multi-task and transfer learning to obtain better performance?
3. Can multi-task learning and transfer learning be useful in the ecommerce domain to enhance user experience?
Dataset
▪ Source: Scraped data from five regional Amazon websites (UK, DE, FR, IT, ES)
▪ Records: 200k from each website, 1 million in total
▪ Attributes: Product ID, Product Title, Product Description, Color, Category, Product Summary, Product
Specifications, Product Image
▪ Sample:
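As a hedged illustration of the attribute list above (not the sample from the slides), one scraped record could be modelled as a small data class in which missing labels are `None`. The field names, the `marketplace` field, the helper `missing_targets`, and every value in the example record are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProductRecord:
    """One scraped product; field names are hypothetical."""
    product_id: str
    title: str
    description: str
    summary: str
    specifications: str
    image_path: str
    marketplace: str  # assumed extra field: one of UK, DE, FR, IT, ES
    color: Optional[str] = None      # None marks a value to impute
    category: Optional[str] = None   # None marks a value to impute


def missing_targets(record: ProductRecord) -> List[str]:
    """Names of label attributes that are absent from the record."""
    return [name for name in ("category", "color") if getattr(record, name) is None]


# Invented example record with a missing color.
example = ProductRecord(
    product_id="X000000001",
    title="Leather ankle boots",
    description="Ankle boots with side zip.",
    summary="Boots",
    specifications="Material: leather",
    image_path="images/X000000001.jpg",
    marketplace="UK",
    category="Shoes",
)
to_impute = missing_targets(example)
```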
Dataset (contd.)
Note: The sample images on the Dataset (contd.) slides (not reproduced here) are representative of the data scraped from www.amazon.co.uk
Methodology
1. Web scraping
2. Preparing the dataset
3. Integrate dataset
4. Train the model on MTL and TL architecture
5. Evaluation of the results
6. Verify and validate the research questions
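The evaluation step could, for example, report one score per predicted attribute. The slides do not name a metric, so the sketch below assumes plain accuracy, and the function name `per_task_accuracy` and the toy predictions are hypothetical. Records without a ground-truth value for a task are skipped for that task.

```python
def per_task_accuracy(predictions, references):
    """Accuracy per task, ignoring entries whose reference label is None."""
    correct = {}
    total = {}
    for predicted, reference in zip(predictions, references):
        for task, truth in reference.items():
            if truth is None:
                continue  # no ground truth available for this task
            total[task] = total.get(task, 0) + 1
            correct[task] = correct.get(task, 0) + int(predicted.get(task) == truth)
    return {task: correct[task] / total[task] for task in total}


# Toy data, invented for illustration.
predictions = [
    {"category": "shoes", "color": "red", "gender": "women"},
    {"category": "shirts", "color": "blue", "gender": "men"},
]
references = [
    {"category": "shoes", "color": "black", "gender": "women"},
    {"category": "shirts", "color": None, "gender": "men"},
]
scores = per_task_accuracy(predictions, references)
```

Comparing such per-task scores between a multi-task model and single-task baselines is one way the first research question could be approached.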
Timeline
Milestones on the chart: Thesis start, Kick-off presentation, Thesis end, Hand in Thesis
Chart axis: 26 Nov 18, 5 Jan 19, 14 Feb 19, 26 Mar 19, 5 May 19, 14 Jun 19

| Phase | Start Date | Duration (days) | End Date |
| --- | --- | --- | --- |
| Literature Review | December 1, 2018 | 120 | March 31, 2019 |
| Implementation | January 1, 2019 | 119 | April 30, 2019 |
| Evaluation | February 1, 2019 | 88 | April 30, 2019 |
| Documentation | March 1, 2019 | 75 | May 15, 2019 |
| Review | April 1, 2019 | 60 | May 31, 2019 |
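As a quick arithmetic check on the timeline, the sketch below recomputes each phase's duration as the number of days between its start and end date. Reading the duration row as 120, 119, 88, 75 and 60 days is an interpretation of the slide, and the variable names are illustrative.

```python
from datetime import date

# Start and end dates per phase, as listed on the timeline slide.
PHASES = {
    "Literature Review": (date(2018, 12, 1), date(2019, 3, 31)),
    "Implementation": (date(2019, 1, 1), date(2019, 4, 30)),
    "Evaluation": (date(2019, 2, 1), date(2019, 4, 30)),
    "Documentation": (date(2019, 3, 1), date(2019, 5, 15)),
    "Review": (date(2019, 4, 1), date(2019, 5, 31)),
}

# Days between start and end; the end day itself is not counted.
durations = {name: (end - start).days for name, (start, end) in PHASES.items()}

for name, days in durations.items():
    print(f"{name}: {days} days")
```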
References
[1] https://www.emarketer.com/Article/Worldwide-Retail-Ecommerce-Sales-Will-Reach-1915-Trillion-This-Year/1014369
[2] https://www.nngroup.com/reports/ecommerce-user-experience/
[3] Sebastian Ruder. An overview of Multi-Task Learning in Deep Neural Networks, 2017
[4] Sebastian Ruder. http://ruder.io/transfer-learning/
Technische Universität München
Faculty of Informatics
Chair of Software Engineering for Business
Information Systems
Boltzmannstraße 3
85748 Garching bei München
Aamna Najmi
Imputation of missing Product Information
using Deep Learning