Tomasz Szymański, Adam Warski
SoftwareMill
Open source big data landscape and possible ITS applications
Big Data? Fast Data?
• No clear definition
• Big Data
– 100s+ of GB?
– Time frame?
• Fast Data
– Real-time
– Single-node vs multi-node
Why Open Source?
• Large developer base
– Easy to learn
• Projects usually backed by a commercial entity
– Support
• Cost efficiency
– Leverage latest developments
• Future-proofing
– Tools with a large user base will be around for longer
Apache Spark / Cassandra / Kafka
• Data ingestion: Kafka
• Data processing: Spark
• Data storage: Cassandra
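The division of roles on this slide can be illustrated with a minimal sketch. It uses in-memory stand-ins only: a `deque` plays the role of a Kafka topic, a plain function plays the role of a Spark job, and a `dict` plays the role of a Cassandra table. All names (`ingest`, `process`, `store`, `run_once`, the `sensor` field) are hypothetical and none of this is the real client API of those projects.

```python
from collections import defaultdict, deque

# In-memory stand-ins (assumptions for illustration, not real client APIs):
# the deque stands in for a Kafka topic, the dict for a Cassandra table.
topic = deque()
table = {}

def ingest(event):
    """Ingestion stage: append a raw event to the log."""
    topic.append(event)

def process(batch):
    """Processing stage: count events per sensor in one batch."""
    counts = defaultdict(int)
    for event in batch:
        counts[event["sensor"]] += 1
    return dict(counts)

def store(results):
    """Storage stage: add the batch counts to the rows keyed by sensor id."""
    for sensor, count in results.items():
        table[sensor] = table.get(sensor, 0) + count

def run_once():
    """Drain the topic, process the drained batch, and persist the result."""
    batch = []
    while topic:
        batch.append(topic.popleft())
    store(process(batch))

for sensor in ["s1", "s2", "s1"]:
    ingest({"sensor": sensor})
run_once()
print(table)
```

The point of the sketch is only that each stage can be swapped or scaled independently, which is what the three separate tools provide in a real deployment.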
Apache Spark / Cassandra / Kafka
• Spark: largest cluster 8k nodes, eBay, Baidu, NASA, Amazon
• Cassandra: over 75k nodes storing 10PB of data at Apple
• Kafka: over 1.1 trillion messages per day at LinkedIn
Possible ITS applications
Hotspot detection
Computed using New York open taxi data, Akka & Apache Flink
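The slide does not say how the hotspots were computed. One common approach, shown here purely as an assumed sketch and not as the authors' Akka/Flink implementation, is to bucket pickup coordinates into a fixed grid and keep the cells whose count reaches a threshold. The function names, the 0.01-degree cell size, the threshold, and the sample coordinates are all hypothetical.

```python
from collections import Counter
import math

def grid_cell(lat, lon, cell_deg=0.01):
    """Map a coordinate to an integer grid cell of cell_deg degrees per side."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def hotspots(pickups, cell_deg=0.01, min_count=3):
    """Return the grid cells containing at least min_count pickups."""
    counts = Counter(grid_cell(lat, lon, cell_deg) for lat, lon in pickups)
    return {cell: n for cell, n in counts.items() if n >= min_count}

# Made-up sample points: three close together, one far away.
sample = [
    (40.7512, -73.9943),
    (40.7518, -73.9948),
    (40.7515, -73.9941),
    (40.6413, -73.7781),
]
print(hotspots(sample))
```

In a streaming setting the same counting would run per time window, so that a cell is reported as a hotspot only while it is busy.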
Architecture of a traffic-jam detection system
Leveraging Apache Kafka, Hadoop, Spark, Cassandra & Akka
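The transcript does not preserve the architecture diagram, so the detection logic itself is not known. As a rough sketch of what the processing stage of such a system might do, the following keeps a sliding window of recent speed readings per road segment and flags a segment when the windowed average drops below a threshold. The class name `JamDetector`, the window size, and the 20 km/h threshold are assumptions chosen for illustration.

```python
from collections import defaultdict, deque

class JamDetector:
    """Flags a road segment when its recent average speed is low."""

    def __init__(self, window=3, jam_speed_kmh=20.0):
        self.window = window
        self.jam_speed_kmh = jam_speed_kmh
        # One bounded deque per segment; old readings fall off automatically.
        self.readings = defaultdict(lambda: deque(maxlen=window))

    def observe(self, segment, speed_kmh):
        """Record a speed reading; return True if the segment looks jammed."""
        recent = self.readings[segment]
        recent.append(speed_kmh)
        if len(recent) < self.window:
            return False  # not enough data to decide yet
        return sum(recent) / len(recent) < self.jam_speed_kmh

detector = JamDetector(window=3, jam_speed_kmh=20.0)
for speed in [50, 10, 12, 8]:
    print(speed, detector.observe("segment-A", speed))
```

In the architecture named on the slide, readings like these would presumably arrive through Kafka and the flagged segments would be written to Cassandra, but that mapping is an inference from the tool list rather than something the slide states.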
Summing up and the future
• Open source has a lot to offer
• Open data?
• Fast-evolving field
– Rapid development, rapid data insights
– Leverage in ITS!
technical expertise
‘s ITS domain experts
• Founded in 2009
• Bespoke software development services
• Various domains, including logistics & transport
• Big data a common theme in our projects