Data Science Challenges in Flight Search
#SkyscannerSofia
Agenda
• Introduction
• Why is it hard to do meta-search for flights?
• A few applications of flights meta-search data
Image from mastersindatascience.org
Who are we?
Konstantin Halachev
• Data science for bioinformatics (PhD with
focus on epigenetic data)
• Joined the new Skyscanner office in Sofia
nine months ago
Plamen Aleksandrov
• Worked on flights search engine
• Principal software engineer and squad lead at Skyscanner
What is Skyscanner?
Skyscanner is a leading travel search site offering:
• unbiased
• comprehensive
• free
search services
Skyscanner in numbers
- 9 global offices
- Sofia is the latest: started with 7 people, now at 16 and growing fast
- 700+ employees
- 40+ million app downloads
- 40+ million unique monthly visitors
- 13+ million searches per day
How do you plan your travel?
by destination and dates
by destination, choose dates
by dates, choose destination
Online Travel Search - Flights?
4000+ airports served by commercial airlines
700+ airlines in the world; 25,000+ aircraft
40 million scheduled commercial flights in 2014
100,000 flights per day - i.e. >1 per second
40% of flights within US and Canada
79% average airplane fill rate
3 billion passengers in 2014
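The headline figures on this slide hang together. A quick Python check, using only the numbers above and assuming a 365-day year, gives the per-day and per-second flight rates and the implied average number of passengers per flight:

```python
# Back-of-the-envelope checks using the 2014 figures quoted on the slide.
FLIGHTS_PER_YEAR = 40_000_000
PASSENGERS_PER_YEAR = 3_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60

flights_per_day = FLIGHTS_PER_YEAR / 365
flights_per_second = flights_per_day / SECONDS_PER_DAY
passengers_per_flight = PASSENGERS_PER_YEAR / FLIGHTS_PER_YEAR

print(f"flights per day:       {flights_per_day:,.0f}")       # 109,589
print(f"flights per second:    {flights_per_second:.2f}")     # 1.27
print(f"passengers per flight: {passengers_per_flight:.0f}")  # 75
```

So "100,000 flights per day" is a conservative rounding, and an average flight carries roughly 75 passengers.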
Flights Frequency
Source: http://www.iata.org/publications/economics/Pages/Air-Passenger-Monthly-Analysis.aspx
Profitability
Source: http://www.iata.org/publications/economics/Pages/Air-Passenger-Monthly-Analysis.aspx
Airline profit: $8.27 per passenger
Distribution is where the money is
Profitability is growing thanks to lower oil prices
Routes
Source: https://www.itasoftware.com
One Ways and Round Trips
Multi-leg: Open Jaws, Circle Trips
Fares, Fare components, Pricing Units, Tickets
Itinerary Structure
[Diagram: example itinerary structures over airports A, B and C]
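To make the itinerary shapes concrete, here is a minimal Python sketch that labels a journey by its leg structure. The definitions are simplified assumptions for illustration (real fare rules distinguish many more cases), each leg is an origin-destination pair rather than an individual flight, and `classify_itinerary` is a hypothetical helper, not a Skyscanner API.

```python
from typing import List, Tuple

Leg = Tuple[str, str]  # (origin airport, destination airport) of one journey leg


def classify_itinerary(legs: List[Leg]) -> str:
    """Classify an itinerary by its leg structure (simplified definitions)."""
    if not legs:
        raise ValueError("an itinerary needs at least one leg")
    if len(legs) == 1:
        return "one-way"
    if len(legs) == 2:
        (a, b), (c, d) = legs
        if b == c and d == a:
            return "round trip"  # A -> B, then B -> A
        # Two legs that do not close up exactly: the "jaw" is open at the
        # turnaround point (A -> B, C -> A), at the origin (A -> B, B -> C),
        # or at both.
        return "open jaw"
    if legs[-1][1] == legs[0][0]:
        return "circle trip"  # three or more legs returning to the origin
    return "multi-city"


print(classify_itinerary([("A", "B"), ("C", "A")]))  # open jaw
```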
Take American Airlines (AA) flights/fares on the SFO-BOS route:
a total of 25,401,415 valid AA solutions
for this one airline and route only
Example Route
[Diagram: SFO to BOS via ORD or DFW; segment labels on the slide: 5 * 36 = 85 fc, 19 * 32 = 109 fc, 41 * 32 = 162 fc, 9 * 32 = 87 fc]
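Why do the counts explode? Each path through a connection point multiplies the options on its segments, and a round trip multiplies the outbound and return totals again. The Python sketch below shows that arithmetic on the same SFO/ORD/DFW/BOS network; the per-segment option counts are invented for illustration and are not the slide's figures, and `paths` and `count_one_way` are hypothetical helpers.

```python
from math import prod
from typing import Dict, List, Tuple

# Hypothetical number of (flight, fare) options per directed segment.
# Illustrative values only; they are not the figures from the slide.
OPTIONS: Dict[Tuple[str, str], int] = {
    ("SFO", "ORD"): 40, ("ORD", "BOS"): 60,
    ("SFO", "DFW"): 30, ("DFW", "BOS"): 50,
}
# Assume the return direction offers the same number of options.
OPTIONS.update({(b, a): n for (a, b), n in list(OPTIONS.items())})


def paths(origin: str, dest: str, max_segments: int = 2) -> List[List[str]]:
    """All simple airport paths from origin to dest with <= max_segments hops."""
    found: List[List[str]] = []
    stack = [[origin]]
    while stack:
        path = stack.pop()
        if path[-1] == dest:
            found.append(path)
            continue
        if len(path) - 1 == max_segments:
            continue
        for a, b in OPTIONS:
            if a == path[-1] and b not in path:
                stack.append(path + [b])
    return found


def count_one_way(origin: str, dest: str) -> int:
    """Sum over paths of the product of per-segment option counts."""
    return sum(
        prod(OPTIONS[(a, b)] for a, b in zip(p, p[1:]))
        for p in paths(origin, dest)
    )


outbound = count_one_way("SFO", "BOS")
inbound = count_one_way("BOS", "SFO")
print(outbound)            # 3900 (40*60 + 30*50)
print(outbound * inbound)  # 15210000 round-trip pairings
```

This toy count ignores the rules deciding which fares may legally combine; it only shows how quickly multiplication across segments and directions reaches the millions.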
Dates
Even exact dates are complicated
time until departure changes the price
weekend stay and seasonality
advance purchase
Dates give interesting features and patterns
day of week
stay duration
age of quote/price
seasons: Christmas, Easter, holidays
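The date features listed above can be derived directly from a quote. The Python sketch below is a minimal illustration under assumed names: the `Quote` record and its fields are hypothetical, and the Christmas window (20 December to 6 January) is an arbitrary choice. Easter is left out because its date moves every year; a real pipeline would use a holiday calendar.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Optional, Union

DAY_NAMES = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


@dataclass
class Quote:
    """Hypothetical minimal quote record; field names are assumptions."""
    quote_date: date           # when the price was observed
    outbound: date             # departure date
    inbound: Optional[date]    # return date, None for a one-way
    price: float


def is_christmas_period(d: date) -> bool:
    """Hypothetical holiday window: 20 December to 6 January."""
    return (d.month == 12 and d.day >= 20) or (d.month == 1 and d.day <= 6)


def date_features(q: Quote, today: date) -> Dict[str, Union[str, int, bool]]:
    """Derive the date-based features listed on the slide from one quote."""
    feats: Dict[str, Union[str, int, bool]] = {
        "day_of_week": DAY_NAMES[q.outbound.weekday()],
        "advance_days": (q.outbound - q.quote_date).days,  # advance purchase
        "quote_age_days": (today - q.quote_date).days,      # age of quote
        "christmas_period": is_christmas_period(q.outbound),
    }
    if q.inbound is not None:
        stay = (q.inbound - q.outbound).days
        feats["stay_days"] = stay
        # A Saturday-night stay means one of the nights away is a Saturday.
        feats["saturday_night_stay"] = any(
            (q.outbound + timedelta(days=i)).weekday() == 5 for i in range(stay)
        )
    return feats


features = date_features(
    Quote(date(2015, 4, 1), date(2015, 5, 13), date(2015, 5, 18), 59.0),
    today=date(2015, 4, 3),
)
print(features)
```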
Prices
Airlines use seat availability to adjust prices
prices are volatile – 26 booking classes
Airlines do Variable Pricing for fare portfolios
your flight neighbour paid a different price
15,000,000 availability questions per sec
no price lock between search and booking
Prices for the same seat can still be different
who sells your ticket? – codeshare, agency, OTA
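A simplified picture of how seat availability moves the price: each flight's seats are split into booking classes, and the quoted fare is the cheapest class still open. The Python sketch below assumes cheaper classes sell first; the class letters, prices and seat counts are hypothetical, and real airline revenue management is far more involved.

```python
from typing import List, Optional, Tuple

# (booking class, price, seats allocated): hypothetical values.
Bucket = Tuple[str, float, int]

FLIGHT_BUCKETS: List[Bucket] = [("Y", 300.0, 20), ("M", 150.0, 30), ("Q", 80.0, 50)]


def current_price(buckets: List[Bucket], sold: int) -> Optional[Tuple[str, float]]:
    """Cheapest booking class still open after `sold` seats have been sold.

    Simplified model: cheaper buckets are sold first, so each sale eats into
    the cheapest open bucket until it closes and the next one opens.
    """
    remaining = sold
    for cls, price, seats in sorted(buckets, key=lambda b: b[1]):
        if remaining < seats:
            return cls, price
        remaining -= seats
    return None  # every bucket is closed: the flight is sold out


for sold in (0, 49, 50, 80, 100):
    print(sold, current_price(FLIGHT_BUCKETS, sold))
```

Two people looking at the same flight a few bookings apart can therefore see different prices, which is one reason a search result cannot be locked until booking.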
Searches and Exits
40m unique monthly visitors
120m visits on web and mobile per month
13m searches per day
results are up-to-date user experiences on the web
Searches on Month view and Browse view
Exits by redirects to providers
we don’t take ownership of the booking
we stay true to our users, our providers and our own values
Data
2bn quotes per day => 700bn quotes per year
each quote contains the entire itinerary and price
data can be easily processed and/or extracted
prices are up to date, but we also keep historical data
200GB gzipped data per day => 80TB per year
95% world coverage of airlines and OTAs
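The volume figures on this slide can be cross-checked with a few lines of Python. This assumes decimal units (1 GB = 10^9 bytes) and a constant daily rate over 365 days; both are assumptions, and the slide's yearly figures are rounded estimates.

```python
# Figures from the slide; decimal units (1 GB = 10**9 bytes) are an assumption.
QUOTES_PER_DAY = 2_000_000_000
GZIPPED_BYTES_PER_DAY = 200 * 10**9  # 200 GB gzipped per day
DAYS_PER_YEAR = 365

bytes_per_quote = GZIPPED_BYTES_PER_DAY / QUOTES_PER_DAY
quotes_per_year = QUOTES_PER_DAY * DAYS_PER_YEAR
terabytes_per_year = GZIPPED_BYTES_PER_DAY * DAYS_PER_YEAR / 10**12

print(f"{bytes_per_quote:.0f} bytes per gzipped quote")  # 100
print(f"{quotes_per_year:,} quotes per year")             # 730,000,000,000
print(f"{terabytes_per_year:.0f} TB per year")            # 73
```

On average, then, a compressed quote with its entire itinerary and price takes about 100 bytes.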
How much data is that?
A small list of technologies used:
• Thrift / RabbitMQ / Ruby / Fluentd
• Scala / Spark / Hive
• AWS (S3, Glacier, EC2, Elastic MapReduce, DynamoDB)
• Elasticsearch / Kibana
• Python / Flask
Image from vicchi.org
2,000,000,000 quotes per day
What can we do with these data?
1. Dynamics of flight prices
2. Travel Insights for airlines and airports
3. Inspiration – finding good deals
4. A small analysis
Dynamics of flight prices
[Chart filters: Route LON - MAD, direct only, one way]
1. Too many routes -> let’s select a popular route (London - Madrid)
2. Let’s focus on direct connections only
3. Let’s focus on one-way only
Dynamics of flight prices
[Chart filters: Route LON - MAD, direct only, one way; carrier: Ryanair; travelling on Wednesday]
Dynamics of flight prices
[Chart filters: Route LON - MAD, direct only, one way; carrier: Ryanair; travelling on Wednesday; by month of travel]
Dynamics of flight prices
[Chart filters: Route LON - MAD, direct only, one way; carrier: Ryanair; travelling on Wednesday; month of travel: May]
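The progressive filtering on these slides (route, direct only, one way, carrier, weekday, month) followed by "cheapest price by days before departure" can be sketched in plain Python. This is a minimal illustration, not the pipeline behind the charts: the `Quote` record, its field names and the sample prices are invented, and at 2bn quotes per day the same logic would need a distributed engine.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable


@dataclass(frozen=True)
class Quote:
    """Hypothetical flattened quote record; field names are assumptions."""
    quote_date: date
    departure: date
    origin: str
    destination: str
    carrier: str
    direct: bool
    one_way: bool
    price: float


def price_curve(quotes: Iterable[Quote], origin: str = "LON", destination: str = "MAD",
                carrier: str = "Ryanair", weekday: int = 2,
                month: int = 5) -> Dict[int, float]:
    """Cheapest observed price by days before departure, for the slide's filters.

    `weekday` follows Python's convention (Monday=0, so Wednesday=2).
    """
    curve: Dict[int, float] = defaultdict(lambda: float("inf"))
    for q in quotes:
        if (q.origin, q.destination, q.carrier) != (origin, destination, carrier):
            continue
        if not (q.direct and q.one_way):
            continue
        if q.departure.weekday() != weekday or q.departure.month != month:
            continue
        days_ahead = (q.departure - q.quote_date).days
        curve[days_ahead] = min(curve[days_ahead], q.price)
    return dict(sorted(curve.items()))


quotes = [
    Quote(date(2015, 4, 1), date(2015, 5, 13), "LON", "MAD", "Ryanair", True, True, 52.0),
    Quote(date(2015, 4, 1), date(2015, 5, 13), "LON", "MAD", "Ryanair", True, True, 45.0),
    Quote(date(2015, 5, 6), date(2015, 5, 13), "LON", "MAD", "Ryanair", True, True, 89.0),
    # Thursday departure: excluded by the Wednesday filter.
    Quote(date(2015, 4, 1), date(2015, 5, 14), "LON", "MAD", "Ryanair", True, True, 10.0),
]
print(price_curve(quotes))
```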
What can we do with these data?
1. Dynamics of flight prices
2. Travel Insights for airlines and airports
3. Inspiration – finding good deals
4. A small analysis
Travel Insights – for airlines
Another small list of technologies used:
• Python, .Net
• AWS (S3, Redshift, EC2), MS SQL
• Tableau
What can we do with these data?
1. Dynamics of flight prices
2. Travel Insights for airlines and airports
3. Inspiration – finding good deals
• Where?
• When?
• Which deal is good?
4. A small analysis
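The three inspiration questions (Where? When? Which deal is good?) map naturally onto simple aggregations over cached fares from one origin. The Python sketch below is one hypothetical way to answer them: `Fare`, `where`, `when` and `is_good_deal` are illustrative names, and the "at or below the 25th percentile of historical prices" rule is an assumed heuristic, not Skyscanner's definition of a good deal.

```python
from statistics import quantiles
from typing import Dict, List, NamedTuple


class Fare(NamedTuple):
    destination: str
    month: int      # month of travel, 1..12
    price: float


def where(fares: List[Fare]) -> Dict[str, float]:
    """Where? Cheapest price per destination, cheapest first."""
    best: Dict[str, float] = {}
    for f in fares:
        best[f.destination] = min(best.get(f.destination, f.price), f.price)
    return dict(sorted(best.items(), key=lambda kv: kv[1]))


def when(fares: List[Fare], destination: str) -> Dict[int, float]:
    """When? Cheapest price per month of travel for one destination."""
    best: Dict[int, float] = {}
    for f in fares:
        if f.destination == destination:
            best[f.month] = min(best.get(f.month, f.price), f.price)
    return dict(sorted(best.items()))


def is_good_deal(price: float, history: List[float]) -> bool:
    """Which deal is good? Hypothetical rule: at or below the 25th
    percentile of historical prices for the same route (needs >= 2 points)."""
    first_quartile = quantiles(history, n=4)[0]
    return price <= first_quartile


fares = [Fare("MAD", 5, 60.0), Fare("MAD", 6, 80.0), Fare("BCN", 5, 45.0),
         Fare("MAD", 5, 55.0), Fare("ATH", 7, 120.0)]
history = [100.0, 120.0, 140.0, 160.0, 180.0, 200.0, 220.0, 240.0]
print(where(fares))
print(when(fares, "MAD"))
print(is_good_deal(110.0, history), is_good_deal(150.0, history))
```

A percentile threshold adapts to each route's own price level, so a cheap fare on an expensive long-haul route and on a budget short-haul route can both be flagged fairly.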
Travel Inspiration - Skyscanner API
Technologies used: Google Maps, Python, Flask, AWS Redshift, Skyscanner API
You want to do better? http://business.skyscanner.net/
You can get a trial API key by filling in the feedback form at the end of the event: http://goo.gl/forms/i4C2VcSGyW
What can we do with this data?
1. Dynamics of flight prices
2. Travel Insights for airlines and airports
3. Inspiration – finding good deals
4. A small analysis, or: how did demand for trips to Greece change at the height of the crisis, and what do the Danes know about it?
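One way to frame the Greece question is to compare, per origin market, the share of searches going to Greece before and during the crisis. The Python sketch below does exactly that; the market codes and search counts are invented toy numbers that only demonstrate the computation and say nothing about actual Greek or Danish demand, and `share_change` is a hypothetical helper.

```python
from typing import Dict

Counts = Dict[str, int]  # destination -> number of searches


def _share(counts: Counts, destination: str) -> float:
    """Fraction of a market's searches that go to `destination`."""
    total = sum(counts.values())
    return counts.get(destination, 0) / total if total else 0.0


def share_change(before: Dict[str, Counts], during: Dict[str, Counts],
                 destination: str) -> Dict[str, float]:
    """Percentage-point change in each origin market's search share for
    `destination` between two periods (markets present in both periods)."""
    return {
        market: 100 * (_share(during[market], destination)
                       - _share(before[market], destination))
        for market in sorted(before.keys() & during.keys())
    }


# Invented toy counts, purely to show the computation.
before = {"DK": {"GR": 50, "ES": 150}, "UK": {"GR": 100, "ES": 300}}
during = {"DK": {"GR": 80, "ES": 120}, "UK": {"GR": 90, "ES": 310}}
print(share_change(before, during, "GR"))
```

Using the share of searches rather than raw counts separates a change in interest in the destination from overall growth or decline of traffic in each market.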
What we know we did not talk about
• What is the best way to get the cheapest deals?
• Recommendations
• Personalization
• A/B testing
• Sorting of flight results
• Infrastructure
• Ahum, “Travel”…
Image credit: jangosteve.com
Thank you!
Please give us feedback or apply for API keys here: http://goo.gl/forms/i4C2VcSGyW
• Konstantin [email protected]
• Plamen [email protected]
We are hiring!!!