data science challenges in flight search

Konstantin Halachev, Plamen Aleksandrov

29th July 2015

Upload: datasciencesociety

Post on 21-Aug-2015


#SkyscannerSofia

Agenda

• Introduction

• Why is it hard to do meta-search for flights?

• A few applications of flights meta-search data

Image from mastersindatascience.org

Who are we?

Konstantin Halachev

• Data science for bioinformatics (PhD with focus on epigenetic data)

• Joined the new Skyscanner office in Sofia nine months ago

Plamen Aleksandrov

• Worked on the flights search engine

• Principal software engineer and squad lead at Skyscanner

What is Skyscanner?

Skyscanner is a leading travel search site offering:

• unbiased

• comprehensive

• free

search services

Skyscanner in numbers

- 9 global offices

- Sofia is the latest: started with 7 people, now at 16 and growing fast

- 700+ employees

- 40+ million app downloads

- 40+ million unique monthly visitors

- 13+ million searches per day


Why is it hard to do meta-search for flights?

How do you plan your travel?

by destination and dates

by destination, choose dates

by dates, choose destination

Online Travel Search - Flights?


Airline industry

4000+ airports served by commercial airlines

700+ airlines in the world; 25,000+ aircraft

40 million scheduled commercial flights in 2014

100,000 flights per day - i.e. >1 per second

40% of flights within US and Canada

79% average airplane fill rate

3 billion passengers in 2014
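The ">1 per second" figure follows from simple arithmetic. The sketch below also scales the daily figure to a year, assuming a constant daily rate (an assumption, since flight volumes are seasonal); the result lands in the same range as the 40 million scheduled flights quoted for 2014.

```python
# Back-of-the-envelope check of the flight figures on the slide above.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

flights_per_day = 100_000  # figure from the slide
flights_per_second = flights_per_day / SECONDS_PER_DAY
flights_per_year = flights_per_day * 365  # assumes a flat daily rate all year

print(f"{flights_per_second:.2f} flights per second")
print(f"{flights_per_year / 1e6:.1f} million flights per year")
```

This prints about 1.16 flights per second and 36.5 million flights per year.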

Flight Frequency

Source: http://www.iata.org/publications/economics/Pages/Air-Passenger-Monthly-Analysis.aspx

Profitability

Source: http://www.iata.org/publications/economics/Pages/Air-Passenger-Monthly-Analysis.aspx

Profit of $8.27 per passenger

Distribution is where the money is

Profitability is growing due to falling oil prices


Dimensionality of flights

One Ways and Round Trips

Multi-leg: Open Jaws, Circle Trips

Fares, Fare components, Pricing Units, Tickets

Itinerary Structure

[Figure: itinerary structures for trips between airports A, B and C]

Take AA flights/fares on an SFO-BOS route

A total of 25,401,415 valid AA solutions

Only this particular airline and route

Example Route

[Figure: AA itineraries from SFO to BOS connecting via ORD or DFW, each leg annotated with its count of fare components (fc)]
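Part of the difficulty is combinatorial: every choice on one leg can pair with every choice on the next, so counts multiply along a connection and add across alternative connections. The sketch below uses made-up per-leg flight and fare counts (hypothetical, not the figures from the example route) and ignores minimum connection times and fare-combination rules, so it only illustrates how quickly the space grows.

```python
from math import prod

# Hypothetical inputs, invented for illustration (not taken from the slide).
# For each connection point: number of flights on (first leg, second leg).
FLIGHTS_VIA_HUB = {
    "ORD": (12, 10),  # SFO-ORD flights, ORD-BOS flights
    "DFW": (8, 9),    # SFO-DFW flights, DFW-BOS flights
}
FARES_PER_FLIGHT = 4  # hypothetical number of fares applicable to each flight


def one_way_options(flights_via_hub, fares_per_flight):
    """Count one-way priced itineraries.

    Within a connection, every (flight, fare) choice on the first leg pairs
    with every (flight, fare) choice on the second leg, so the counts
    multiply; the alternative connections then add up. Minimum connection
    times and fare-combinability rules are deliberately ignored.
    """
    return sum(
        prod(n_flights * fares_per_flight for n_flights in legs)
        for legs in flights_via_hub.values()
    )


one_way = one_way_options(FLIGHTS_VIA_HUB, FARES_PER_FLIGHT)
round_trip = one_way * one_way  # assume the return direction looks the same
print(f"one way: {one_way:,}  round trip: {round_trip:,}")
```

With these toy numbers a single direction already has 3,072 options and a round trip 9,437,184, within a factor of three of the 25,401,415 AA solutions quoted above.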

Even exact dates are complicated

time to travel changes price

weekend stay and seasonality

advance purchase

Dates give interesting features and patterns

day of week

stay duration

age of quote/price

seasons: Christmas, Easter, holidays

Dates
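Most of the date patterns listed above can be derived directly from the travel dates and the quote timestamp. The sketch below is one hypothetical feature set (the function and key names are illustrative assumptions); seasonal flags such as Christmas or Easter would additionally need a holiday calendar, which is left out.

```python
from datetime import date, datetime, timedelta
from typing import Optional

SATURDAY = 5  # date.weekday() numbering: Monday == 0 ... Sunday == 6


def date_features(outbound: date, inbound: Optional[date],
                  quoted_at: datetime, now: datetime) -> dict:
    """Illustrative date features for one flight quote (names are hypothetical)."""
    features = {
        "day_of_week": outbound.strftime("%A"),
        # Advance purchase: days between the quote and the departure date.
        "advance_purchase_days": (outbound - quoted_at.date()).days,
        # Age of the quote: how old the cached price is at time `now`.
        "quote_age_hours": (now - quoted_at).total_seconds() / 3600,
        "stay_days": None,            # stays None for one-way trips
        "saturday_night_stay": None,  # stays None for one-way trips
    }
    if inbound is not None:
        stay = (inbound - outbound).days
        features["stay_days"] = stay
        # A Saturday-night stay means a Saturday is one of the nights spent
        # away: any date from departure up to (not including) the return day.
        features["saturday_night_stay"] = any(
            (outbound + timedelta(days=i)).weekday() == SATURDAY
            for i in range(stay)
        )
    return features


print(date_features(date(2015, 7, 29), date(2015, 8, 3),
                    quoted_at=datetime(2015, 7, 1, 9, 0),
                    now=datetime(2015, 7, 1, 21, 0)))
```

For a Wednesday departure returning the following Monday, quoted four weeks ahead, this yields a 5-day stay that includes a Saturday night, 28 days of advance purchase and a 12-hour-old quote.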

Prices

Airlines use seat availability to adjust prices

prices are volatile – 26 booking classes

Airlines do Variable Pricing for fare portfolios

your flight neighbour paid a different price

15,000,000 availability questions per sec

no lock-down between search and booking

Prices for the same seat can still be different

who sells your ticket? – codeshare, agency, OTA

All tickets are booked via a website or a GDS

Distribution Providers
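To see how seat availability can move the price of the same flight, here is a deliberately simplified sketch: hypothetical fare buckets are sold cheapest first, and the quoted price is that of the cheapest bucket with seats left. The bucket codes, prices and seat counts are invented, and real revenue-management systems (nested booking limits, demand forecasting, up to 26 booking classes) are far more involved.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FareBucket:
    code: str     # hypothetical booking-class code
    price: float  # fare charged while this bucket is open
    seats: int    # seats allocated to this bucket (partitioned, not nested)


def current_price(buckets: List[FareBucket], seats_sold: int) -> Optional[float]:
    """Price of the cheapest bucket that still has seats, or None if sold out."""
    remaining_sold = seats_sold
    for bucket in sorted(buckets, key=lambda b: b.price):
        if remaining_sold < bucket.seats:
            return bucket.price
        remaining_sold -= bucket.seats  # this bucket is full; move up a fare level
    return None


# Invented example: three buckets on one flight.
BUCKETS = [
    FareBucket("Q", 49.0, 20),
    FareBucket("M", 89.0, 30),
    FareBucket("Y", 199.0, 50),
]

for sold in (0, 19, 20, 50, 99, 100):
    print(sold, current_price(BUCKETS, sold))
```

Two passengers on the same flight can therefore pay different prices simply because one booked before a bucket filled up.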


Data and Scale at Skyscanner

40m unique monthly visitors

120m visits on web and mobile per month

13m searches per day

results are up-to-date user experiences on the web

Searches on Month view and Browse view

Exits by redirects

we don’t take ownership of the booking

we stay true to our users, our providers and our own values

Searches and Exits

2bn quotes per day => 700bn quotes per year

quotes contain entire itinerary and price

data can be easily processed and/or extracted

prices are up to date, but we also keep historical data

200GB gzipped data per day => 80TB per year

95% world coverage of airlines and OTAs

Data
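The per-year figures above are straightforward scalings of the daily ones. A rough sketch, assuming decimal units and a flat daily volume with no growth (both assumptions): 2bn quotes a day gives about 730bn a year, which the slide rounds to 700bn, and 200 GB a day gives about 73 TB, somewhat below the slide's 80 TB, so treat both as orders of magnitude. Dividing volume by count also gives the average compressed size of a quote.

```python
# Rough arithmetic on the data volumes quoted above.
# Assumption: decimal units (1 GB = 10**9 bytes, 1 TB = 10**12 bytes)
# and a flat daily volume with no growth over the year.
QUOTES_PER_DAY = 2_000_000_000
GZIPPED_BYTES_PER_DAY = 200 * 10**9  # 200 GB
SECONDS_PER_DAY = 86_400

quotes_per_year = QUOTES_PER_DAY * 365
terabytes_per_year = GZIPPED_BYTES_PER_DAY * 365 / 10**12
bytes_per_quote = GZIPPED_BYTES_PER_DAY / QUOTES_PER_DAY  # average, compressed
quotes_per_second = QUOTES_PER_DAY / SECONDS_PER_DAY

print(f"{quotes_per_year / 1e9:.0f}bn quotes per year")
print(f"{terabytes_per_year:.0f} TB gzipped per year")
print(f"{bytes_per_quote:.0f} bytes per quote (gzipped)")
print(f"{quotes_per_second:,.0f} quotes per second")
```

Under these assumptions each quote averages about 100 bytes gzipped, arriving at roughly 23,000 quotes per second.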

How much data is that?

A small list of technologies used:

• Thrift / RabbitMQ / Ruby / Fluentd

• Scala / Spark / Hive

• AWS (S3, Glacier, EC2, Elastic MapReduce, DynamoDB)

• Elasticsearch / Kibana

• Python / Flask

Image from vicchi.org

2,000,000,000 quotes per day


What can we do with these data?

Search

What can we do with these data?

1. Dynamics of flight prices

2. Travel Insights for airlines and airports

3. Inspiration – finding good deals

4. A small analysis

Dynamics of flight prices

Route: LON - MAD

Direct only

One way

1. Too many routes -> let's select a popular route (London - Madrid)

2. Let's focus on direct connections only

3. Let's focus on one-way only

Dynamics of flight prices

Route: LON - MAD, direct only, one way, narrowed further one filter at a time:

Carrier: Ryanair

Travelling on Wednesday

Month of travel: May
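The slides narrow the data one filter at a time before looking at how prices evolve. A minimal sketch of that filter-then-aggregate step, assuming a hypothetical flat `Quote` record and taking the median price per day-before-departure as the summary (the slides do not say which statistic the charts use):

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass(frozen=True)
class Quote:
    """Hypothetical flattened quote record; the field names are assumptions."""
    origin: str        # city code, e.g. "LON"
    destination: str   # city code, e.g. "MAD"
    carrier: str
    direct: bool
    one_way: bool
    departure: date    # travel date
    observed: date     # date on which this price was quoted
    price: float


def price_curve(quotes, origin="LON", destination="MAD", carrier="Ryanair",
                weekday=2, month=5):
    """Apply the filters from the slides, then return the median price for
    each number of days before departure (furthest out first).

    `weekday` follows date.weekday(), so Wednesday == 2; `month` 5 is May.
    """
    prices_by_days_out = defaultdict(list)
    for q in quotes:
        if (q.origin == origin and q.destination == destination
                and q.carrier == carrier and q.direct and q.one_way
                and q.departure.weekday() == weekday
                and q.departure.month == month):
            days_out = (q.departure - q.observed).days
            prices_by_days_out[days_out].append(q.price)
    return {days: median(prices)
            for days, prices in sorted(prices_by_days_out.items(), reverse=True)}


# Made-up quotes: two Wednesdays in May 2015, plus one quote that the
# carrier filter drops.
sample = [
    Quote("LON", "MAD", "Ryanair", True, True, date(2015, 5, 6), date(2015, 4, 6), 40.0),
    Quote("LON", "MAD", "Ryanair", True, True, date(2015, 5, 13), date(2015, 4, 13), 60.0),
    Quote("LON", "MAD", "Ryanair", True, True, date(2015, 5, 6), date(2015, 5, 5), 120.0),
    Quote("LON", "MAD", "easyJet", True, True, date(2015, 5, 6), date(2015, 4, 6), 30.0),
]
print(price_curve(sample))
```

Each extra filter simply tightens the condition; keeping the filters as keyword arguments makes it easy to reproduce each step of the slide sequence.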


Travel Insights – for airlines and airports


Travel Insights – for airlines

Another small list of technologies used:

• Python, .Net

• AWS (S3, Redshift, EC2), MS SQL

• Tableau


What can we do with these data?

1. Dynamics of flight prices

2. Travel Insights for airlines and airports

3. Inspiration – finding good deals

• Where?

• When?

• Which deal is good?

4. A small analysis

Travel Inspiration - When and Where

Travel Inspiration - a hack day project

Travel Inspiration – is it a good deal?
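One simple way to answer "is it a good deal?" is to compare a price with the prices previously seen for the same route, which the historical quote data described earlier makes possible. The sketch below is an assumed heuristic, not the hack day project's actual logic: it scores a price by the share of historical prices it beats and uses an arbitrary 80% threshold.

```python
from bisect import bisect_right


def deal_score(price, historical_prices):
    """Share of historical prices for the same route and dates that are
    strictly higher than `price` (1.0 means cheaper than anything seen)."""
    if not historical_prices:
        raise ValueError("need at least one historical price")
    ordered = sorted(historical_prices)
    at_or_below = bisect_right(ordered, price)  # number of prices <= `price`
    return (len(ordered) - at_or_below) / len(ordered)


def is_good_deal(price, historical_prices, threshold=0.8):
    """Hypothetical rule: a deal is 'good' if it beats at least 80% of history."""
    return deal_score(price, historical_prices) >= threshold


history = [50, 60, 70, 80, 90, 100, 110, 120, 130, 140]  # made-up prices
print(deal_score(55, history), is_good_deal(55, history))
print(deal_score(100, history), is_good_deal(100, history))
```

Here 55 beats 90% of the made-up history and counts as a good deal, while 100 beats only 40%. A percentile is robust to a few extreme fares, which a comparison against the mean would not be.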

Travel Inspiration - Skyscanner API

Technologies used: Google Maps, Python, Flask, AWS Redshift, Skyscanner API

You want to do better? http://business.skyscanner.net/

You can get a trial API key by filling in the feedback form at the end of the event: http://goo.gl/forms/i4C2VcSGyW


What can we do with these data?

1. Dynamics of flight prices

2. Travel Insights for airlines and airports

3. Inspiration – finding good deals

4. A small analysis, or: how did demand for trips to Greece change in the heat of the crisis, and what do the Danish know about it?

Analysis - Greece

Red represents a week-on-week decrease; green represents an increase.

Data for 2015

Data for 2014
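The colouring in the charts corresponds to the sign of the week-on-week change. A minimal sketch, assuming a plain list of weekly demand figures (the slides do not say which metric is plotted, for example searches or redirects; the numbers below are made up, and the "none" label for no change is a hypothetical addition):

```python
def week_on_week(weekly_demand):
    """Percentage change versus the previous week, with the colour used in
    the charts: red for a decrease, green for an increase, and 'none'
    (a hypothetical third case) when demand is unchanged."""
    changes = []
    for prev, curr in zip(weekly_demand, weekly_demand[1:]):
        if prev == 0:
            raise ValueError("previous week has zero demand; change undefined")
        change = 100 * (curr - prev) / prev
        if change < 0:
            colour = "red"
        elif change > 0:
            colour = "green"
        else:
            colour = "none"
        changes.append((round(change, 1), colour))
    return changes


print(week_on_week([1000, 1200, 900, 900]))
```

The colour is decided on the unrounded change, so a tiny decrease is still shown as red even if it rounds to 0.0%.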

What we know we did not talk about:

• What is the best way to get the cheapest deals?

• Recommendations

• Personalization

• A/B testing

• Sorting of flight results

• Infrastructure

• Ahum, “Travel”…

Image credit: jangosteve.com


Thank you!

Please give us feedback or apply for API keys here: http://goo.gl/forms/i4C2VcSGyW

• Konstantin

• Plamen

We are hiring!!!