Extracting Value from Big Data - The Case of Vehicular Traffic Data, by Christian S. Jensen
Christian S. Jensen
www.cs.aau.dk/~csj
Extracting Value from Big Data –
The Case of Vehicular Traffic Data
Roadmap
• Big data
Hype or substance?
Instrumentation of reality and digitization
The digital universe
Moore’s Law generalized
Big data challenges
• Big data in traffic
Motivation
Data and systems
Eco driving and routing
Traffic analytics examples
Hype or Substance?
• We have been pushing the boundaries for decades
How much data we can handle
How fast
Data integration
• Examples
VLDB: International Conference on Very Large Data Bases
TODS: ACM Transactions on Database Systems
• So is it all hype?
No
Instrumentation and Digitization
• Instrumentation of reality
Notably, smartphones
• Digitization of processes
E.g., e-commerce, public services, communications, social
interactions
The Vatican, 2005
The Vatican, 2013
2005 vs. 2013
The Digital Universe
• The digital universe
Doubling every ~18-24 months
Grew 60% in 2009, 50% in 2010
2009: 0.8 zettabyte, 2010: 1.2 zettabyte, 2020: 35 zettabytes
2009-20: growth by a factor of 44
http://www.emc.com/collateral/demos/microsites/idc-digital-universe/iview.htm
1 zettabyte = 1024 exabytes
1 exabyte = 1024 petabytes = 2^60 bytes = 1,152,921,504,606,846,976 ≈ 10^18 bytes
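The figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes binary prefixes as on the slide (1 exabyte = 2^60 bytes) and constant exponential growth between 2009 and 2020. The variable names are illustrative and do not come from the talk.

```python
import math

# Sizes of the digital universe in zettabytes, as quoted on the slide.
ZB_2009 = 0.8
ZB_2020 = 35.0

# Binary prefixes, as used on the slide.
EXABYTE = 2 ** 60            # bytes
ZETTABYTE = 1024 * EXABYTE   # bytes, i.e. 2**70

growth_factor = ZB_2020 / ZB_2009
years = 2020 - 2009

# Assumption: constant exponential growth.
# Solve 2 ** (years / T) = growth_factor for the doubling time T.
doubling_time_years = years * math.log(2) / math.log(growth_factor)

print(f"1 exabyte = {EXABYTE:,} bytes")
print(f"growth factor 2009-2020: {growth_factor:.2f}")
print(f"implied doubling time: {doubling_time_years * 12:.1f} months")
```

Under these assumptions the implied doubling time comes out at roughly two years, the upper end of the 18-24 month range quoted above.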
Moore’s Law – The Bicycle Analogy
• Moore’s Law: computers double in speed every 24
months.
Applies also to quality-adjusted microprocessor prices, memory
capacity, disks, networks, sensors, and the number and size of
pixels in cameras.
• How fast would a bicyclist be if Moore’s Law applied?
50 years of doubling every 24 months
30 km/h originally
~1 billion km/h now
• Three lessons
Growth rates in computing are dramatic and difficult to imagine.
Hardware advances are important information technology drivers.
Humans don’t really improve – they are the constants.
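The bicycle analogy is a plain compound-doubling calculation. A minimal sketch, using only the three numbers given on the slide (50 years, one doubling every 24 months, 30 km/h at the start):

```python
# Numbers taken from the slide.
YEARS = 50
DOUBLING_PERIOD_MONTHS = 24
START_SPEED_KMH = 30

# 50 years at one doubling per 24 months gives 25 doublings.
doublings = YEARS * 12 // DOUBLING_PERIOD_MONTHS
speed_now_kmh = START_SPEED_KMH * 2 ** doublings

print(f"{doublings} doublings -> {speed_now_kmh:,} km/h")
```

Twenty-five doublings multiply the starting speed by 2^25, about 33.5 million, which is where the figure of roughly 1 billion km/h comes from.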
Big Data – Synthesis
• The result is new opportunity.
• Lots of data and unprecedented computing infrastructure
combine to offer potential for creating value from data.
• To be competitive, society and businesses must be able to
create value from data
• Data-based decisions and data-driven processes
Decisions based on good data beat decisions based on feelings or
opinions
• A finer granularity of services
• Entirely new services
Big Data – Data-Driven Society, Business
ITS – Motivation
• A safer, greener, and more efficient and cost-effective
transportation infrastructure
• Greenhouse gas emissions reductions via eco-routing
• Congestion, greater Copenhagen region
~10 billion DKK/year (2004)
• Poorly configured signalized intersections in Denmark
~9.3 billion DKK/year (2012)
Data, Software and Hardware
• Experimental infrastructure
• Data
4+ billion GPS records, 17,000+ vehicles
350+ million CAN Bus records
Conventional and electric vehicles (GPS/CAN bus data)
17 data sources, ~3 million rows per day from 3,500 vehicles
• Software
A complete software stack for handling traffic data
Map-matching, data cleansing, multiple map support
• Hardware
Very modern server farm
Newest machine has 2TB main memory
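Map-matching, listed under Software above, assigns each GPS record to a road segment. The talk does not describe how its software stack does this, so the following is only a naive geometric sketch: it snaps a single point to the nearest segment, assuming planar coordinates in metres. The function names and the toy network are hypothetical. Practical map-matchers typically also use the order of the points and the network topology, which this sketch ignores.

```python
import math

def project_onto_segment(p, a, b):
    """Return the point on segment a-b closest to p (planar coordinates)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: a == b
        return a
    # Parameter t of the orthogonal projection, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)

def match_point(p, segments):
    """Snap p to the nearest segment: (segment_id, snapped_point, distance)."""
    best = None
    for seg_id, (a, b) in segments.items():
        q = project_onto_segment(p, a, b)
        d = math.dist(p, q)
        if best is None or d < best[2]:
            best = (seg_id, q, d)
    return best

# Hypothetical toy network in metres (not real data).
segments = {
    "main_st": ((0.0, 0.0), (100.0, 0.0)),
    "side_st": ((100.0, 0.0), (100.0, 80.0)),
}
gps_point = (40.0, 6.0)
print(match_point(gps_point, segments))
```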
Travel Speeds, Alssundbroen
Eco-Routing Framework
[Diagram: eco-routing framework]
• Inputs
3D laser scan point cloud, 2D road network, historical GPS data, real-time GPS data
• Processing
Road network lifting (2D road network to 3D road network)
Eco-weight initialization and eco-weight maintenance
Result: eco-weighted road network
• Queries (source, target, time)
Basic eco-routing: basic eco-routes
Skyline eco-routing: skyline eco-routes
Personalized eco-routing: personalized eco-routes
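The framework distinguishes basic, skyline, and personalized eco-routes. A skyline, in the usual sense of the term, is the set of candidates that no other candidate beats on every criterion at once. The sketch below filters a handful of candidate routes this way. The cost criteria (travel time, distance, fuel) and the route values are assumptions made for illustration; the talk does not list the criteria its skyline queries use.

```python
def dominates(a, b):
    """True if cost vector a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(routes):
    """Keep the routes whose cost vectors are not dominated by any other route."""
    return {
        name: cost
        for name, cost in routes.items()
        if not any(dominates(other, cost) for other in routes.values())
    }

# Hypothetical candidates: (travel time in min, distance in km, fuel in litres).
routes = {
    "r1": (17.0, 8.1, 0.80),
    "r2": (18.0, 13.6, 1.23),   # worse than r1 on all three criteria
    "r3": (15.0, 9.5, 0.95),    # faster than r1, but longer and thirstier
}
print(sorted(skyline(routes)))
```

Here r2 drops out because r1 is better on every criterion, while r1 and r3 both survive: neither is better on all three.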
3D Spatial Network
• Spatial network lifting
2D spatial network: OpenStreetMap
Aerial laser scan of Denmark (1+ point per m²; 2.5 TB for Denmark)
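Lifting turns the 2D network into a 3D one by attaching elevations from the laser scan, which makes road slopes available. The talk does not say how the lifting is done. The sketch below is one simple possibility, stated as an assumption: each node takes the mean height of the laser points within a small radius, and an edge's grade is rise over horizontal run. All names, the radius, and the data are hypothetical.

```python
import math

def lift_node(node_xy, points_xyz, radius=2.0):
    """Estimate a node's elevation as the mean z of laser points within radius (m)."""
    x, y = node_xy
    zs = [pz for (px, py, pz) in points_xyz
          if math.hypot(px - x, py - y) <= radius]
    if not zs:
        return None  # no laser points nearby; the node stays unlifted
    return sum(zs) / len(zs)

def grade(a_xyz, b_xyz):
    """Slope of edge a->b as rise over horizontal run (assumes a != b in the plane)."""
    run = math.hypot(b_xyz[0] - a_xyz[0], b_xyz[1] - a_xyz[1])
    return (b_xyz[2] - a_xyz[2]) / run

# Hypothetical toy data: two 2D nodes and a tiny point cloud (x, y, z in metres).
nodes_2d = {"A": (0.0, 0.0), "B": (100.0, 0.0)}
cloud = [
    (0.5, 0.5, 10.0), (-0.5, 0.0, 10.2),    # near A
    (100.0, 1.0, 13.0), (99.0, 0.0, 13.2),  # near B
    (50.0, 50.0, 99.0),                     # far from both nodes, ignored
]

nodes_3d = {name: (x, y, lift_node((x, y), cloud))
            for name, (x, y) in nodes_2d.items()}
print(nodes_3d)
print(f"grade A->B: {grade(nodes_3d['A'], nodes_3d['B']):.3f}")
```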
Basic Eco Routing – CPH to the Train
Route   Time    Distance   Fuel
Good    17:08   8.06 km    0.80 l
Bad     18:06   13.64 km   1.23 l
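One way to read basic eco-routing is as a shortest-path search in which each edge weight is the fuel needed to traverse that edge. The sketch below runs Dijkstra's algorithm over such weights. It is offered under that assumption only and is not the talk's implementation; the graph and its weights are hypothetical.

```python
import heapq

def eco_route(graph, source, target):
    """Dijkstra over per-edge fuel costs; returns (total_fuel_litres, path)."""
    queue = [(0.0, source, [source])]
    settled = set()
    while queue:
        fuel, node, path = heapq.heappop(queue)
        if node == target:
            return fuel, path
        if node in settled:
            continue
        settled.add(node)
        for neighbour, edge_fuel in graph.get(node, {}).items():
            if neighbour not in settled:
                heapq.heappush(
                    queue, (fuel + edge_fuel, neighbour, path + [neighbour])
                )
    return None  # target not reachable from source

# Hypothetical eco-weighted network: edge weights are litres of fuel.
graph = {
    "A": {"B": 0.30, "C": 0.50},
    "B": {"D": 0.50},
    "C": {"D": 0.20},
    "D": {},
}
print(eco_route(graph, "A", "D"))
```

In the slide's own example, the good route uses 0.80 l against 1.23 l for the bad one, roughly 35% less fuel.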
Napoleon’s Russian Campaign
Digression: Methodology
• The same methodology underlies the studies.
• Define precisely a problem of (perceived) real-world
interest.
• Develop solutions
Concepts, data structures, algorithms
• Carry out mathematical analyses
Correctness, complexity, storage size
• Prototype the solutions and perform empirical studies
Often, real data is needed
Offers detailed insight into the design properties of the solutions
• Iterate!
The Future
• Much more data
Inductive loop detectors
Bus data
Rejsekortet
• Many more connected vehicles
• New services
Routing
Safety and warnings
Parking, fees, insurance, road pricing
Car sharing, multi-modality
• Driverless vehicles
Thank you for your attention.
Four Prototypes
• http://daisy.aau.dk/its
Point based
Travel-time map, congestion map, and eco route
• http://daisy.aau.dk/its/spqdemo
Trajectory based, Strict-Path Queries
Trips (historical travel-time), route choice, Napoleon (road usage)
• http://daisy.aau.dk/its/sheaf
Trajectory based
Traffic sheaf (advanced, high-performance)
• http://daisy.aau.dk/its/eco
Point-based skyline queries
Advanced weights