paul singman insight
Fantasy Football Fisher
Paul Singman
Insight Data Engineering Fellow
October 2016
Motivation
• Leverage distributed, open-source technologies to fill gaps in current offerings
• Lifetime leagues (handle large volume: hundreds of millions of play records)
• 10-second micro-leagues (low-latency updates to millions of users)
Data Simulation and Ingestion
• 500 plays per second are simulated from JSON files of real NFL plays from the SportRadar API
• JSON play data is parsed for relevant info and sent through a keyed Kafka producer
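The parse-and-key step above might look roughly like the sketch below. It assumes the message key is the player_id field (the slides say only that the producer is keyed, not which field is used). The send function and the "plays" topic name are hypothetical stand-ins for a real Kafka client call, and the field names follow the sample input record in this deck.

```python
import json

# Fields kept from each raw play; names follow the deck's sample input record.
RELEVANT_FIELDS = ("player_id", "player_name", "position", "yards",
                   "touchdown", "timestamp", "epoch_timestamp")


def to_keyed_message(raw_play: str) -> tuple[bytes, bytes]:
    """Parse one JSON play and return a (key, value) pair of bytes.

    Assumption: the key is player_id, so every play for a given player
    would be routed to the same partition.
    """
    play = json.loads(raw_play)
    slim = {field: play[field] for field in RELEVANT_FIELDS if field in play}
    key = slim["player_id"].encode("utf-8")
    value = json.dumps(slim, sort_keys=True).encode("utf-8")
    return key, value


def send(topic: str, key: bytes, value: bytes, log: list) -> None:
    """Hypothetical stand-in for a Kafka producer; it only records the message."""
    log.append((topic, key, value))


# A raw play with one extra field that the parser drops.
raw = ('{"player_name": "Marcus Mariota", "yards": 9, "touchdown": 0, '
       '"player_id": "7c16c04c-04de-41f3-ac16-ad6a9435e3f7", '
       '"position": "QB", "extra_field": "ignored"}')
sent = []
key, value = to_keyed_message(raw)
send("plays", key, value, sent)
```

Keying by player would keep each player's plays in order within one partition, which is one plausible reason to use a keyed producer here.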
Fantasy Football Fisher Architecture
Instances        Cost         Total
2 x 4 m4.large   $0.0131/hr   $0.1048/hr
Input Data
{"player_name": "Marcus Mariota", "timestamp": "2016-09-29_02:19:23", "epoch_timestamp": 1475101163, "touchdown": 0, "yards": 9, "player_id": "7c16c04c-04de-41f3-ac16-ad6a9435e3f7", "position": "QB"}
{"player_name": "Brent Celek", "user_id": 4}
{"player_name": "Kelvin Benjamin", "user_id": 4}
Output Tables
 user_id | user_points
---------+-------------
  674594 |       26.56
 1199431 |        20.3
  990425 |       28.39
Plays
Users - players
Lifetime user scores
Latest player points
Micro-league winners
Spark windowed stream with a 10-second window
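To illustrate how a 10-second window could yield micro-league winners, here is a minimal plain-Python sketch. It is not Spark code: it assumes a tumbling (non-overlapping) window, and the scoring rule (0.1 points per yard, 6 per touchdown) and the roster entry for user 7 are illustrative assumptions that do not come from the slides.

```python
from collections import defaultdict

WINDOW_SECONDS = 10
# Assumed scoring rule (not from the slides).
POINTS_PER_YARD = 0.1
POINTS_PER_TOUCHDOWN = 6


def play_points(play):
    """Score a single play under the assumed rule."""
    return play["yards"] * POINTS_PER_YARD + play["touchdown"] * POINTS_PER_TOUCHDOWN


def window_winners(plays, rosters):
    """Return {window_start: (user_id, points)} per 10-second tumbling window.

    plays: dicts with player_name, epoch_timestamp, yards, touchdown.
    rosters: dicts with player_name and user_id.
    """
    # Index the users-players mapping by player for a simple join.
    owners = defaultdict(list)
    for entry in rosters:
        owners[entry["player_name"]].append(entry["user_id"])

    # Accumulate points per user inside each window.
    scores = defaultdict(lambda: defaultdict(float))
    for play in plays:
        window_start = play["epoch_timestamp"] // WINDOW_SECONDS * WINDOW_SECONDS
        for user_id in owners.get(play["player_name"], []):
            scores[window_start][user_id] += play_points(play)

    # Highest scorer wins; on a tie, the first user seen in the window wins.
    return {start: max(users.items(), key=lambda item: item[1])
            for start, users in scores.items()}


rosters = [
    {"player_name": "Brent Celek", "user_id": 4},
    {"player_name": "Kelvin Benjamin", "user_id": 4},
    {"player_name": "Marcus Mariota", "user_id": 7},  # invented for the example
]
plays = [
    {"player_name": "Marcus Mariota", "epoch_timestamp": 1475101163, "yards": 9, "touchdown": 0},
    {"player_name": "Brent Celek", "epoch_timestamp": 1475101165, "yards": 20, "touchdown": 1},
    {"player_name": "Marcus Mariota", "epoch_timestamp": 1475101172, "yards": 30, "touchdown": 0},
]
winners = window_winners(plays, rosters)
```

In this sample the first two plays fall in the same window and the third falls in the next one, so each window produces its own winner.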
Challenges and Other Considerations
● Ensuring the stream application is stable
● Stable at 1,000,000 users and 500 plays per second
● At higher rates, instability occurred
● Could be further improved via better parallelization of streams (one for each Kafka partition)
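The idea of one stream per Kafka partition can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the partition count is hypothetical, CRC32 stands in for Kafka's actual partitioner, and a thread pool stands in for independent stream consumers.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_for(key: str) -> int:
    """Map a key to a partition. CRC32 is a stand-in, not Kafka's partitioner."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def total_yards(plays):
    """Per-partition work: sum yards for each player_id."""
    totals = defaultdict(int)
    for play in plays:
        totals[play["player_id"]] += play["yards"]
    return dict(totals)


def process_in_parallel(plays):
    """Group plays by partition, then process each partition with its own worker."""
    by_partition = defaultdict(list)
    for play in plays:
        by_partition[partition_for(play["player_id"])].append(play)

    # A key never spans two partitions, so partial results merge without conflicts.
    merged = {}
    with ThreadPoolExecutor(max_workers=NUM_PARTITIONS) as pool:
        for partial in pool.map(total_yards, by_partition.values()):
            merged.update(partial)
    return merged


sample_plays = [
    {"player_id": "a", "yards": 9},
    {"player_id": "b", "yards": 7},
    {"player_id": "a", "yards": 5},
    {"player_id": "c", "yards": 3},
]
totals = process_in_parallel(sample_plays)
```

Because all plays for a key land in one partition, each worker can aggregate independently, which is what makes per-partition parallelism attractive for a keyed topic.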
• Bachelor of Science in Stats from Penn
• Shelf full of O’Reilly books
• Serial online course taker (and completer)
• Jr Data Engineer experience at early-stage startup (Mighty)
• Enjoy movies, backgammon, and rooftop yoga