How to Be a Productive Data Engineer
Rafal Wojdyla: My views are my own and don't necessarily represent those of Spotify.
• Operations
• Development
• Organization
• Culture
What is Spotify?
For everyone:
• Streaming Service
• Launched in October 2008
• 60 Million Monthly Users
• 15 Million Paid Subscribers
+ and for me:
• 1.3K-node Hadoop cluster
Automation
ME
ADAM
Apache Ambari
Cloudera Manager
+ Puppet
Not Invented Here
Never Invented Here
Wild Wild West
Apache Bigtop
Enable log aggregation
To enable log aggregation
yarn.log-aggregation-enable = true
yarn.log-aggregation.retain-seconds = ?
+ <property>
+   <name>yarn.log-aggregation-enable</name>
+   <value>true</value>
+ </property>
+
+ <property>
+   <name>yarn.log-aggregation.retain-seconds</name>
+   <value>315569260</value>
+   <!-- retention: 10 years -->
+ </property>
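The retention value in the diff above can be derived rather than typed by hand. A minimal Python sketch, assuming a year of 365.2422 days (the length that reproduces 315569260 for 10 years); the helper names `retain_seconds` and `yarn_property` are hypothetical and not part of any Hadoop tooling:

```python
import xml.etree.ElementTree as ET

SECONDS_PER_DAY = 86_400
# Assumption: mean tropical year, which reproduces the slide's 315569260.
DAYS_PER_YEAR = 365.2422


def retain_seconds(years):
    """Convert a retention period in years to whole seconds (truncated)."""
    return int(years * DAYS_PER_YEAR * SECONDS_PER_DAY)


def yarn_property(name, value):
    """Render one <property> element in the layout used by yarn-site.xml."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(prop, encoding="unicode")


def log_aggregation_properties(years):
    """Build both properties from the slide for a given retention period."""
    return [
        yarn_property("yarn.log-aggregation-enable", "true"),
        yarn_property("yarn.log-aggregation.retain-seconds", retain_seconds(years)),
    ]


if __name__ == "__main__":
    for fragment in log_aggregation_properties(10):
        print(fragment)
```

Generating the fragment from a number of years keeps the intent (the retention period) visible next to the raw seconds value that YARN expects.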
Heap Memory used is 97%
Hellelephant
Custom logs
• Profiling
• Garbage collection
Right tool for the job
Right abstraction for the job
Scaling machines is easy, scaling people is hard
• Map split size
• Number of reducers
• HDFS data retention
• User feedback (ongoing)
Automation
Organization
• Ownerless
• Squad
• Upgrades
• Getting there
Culture
Experiment
Fail Fast
Embrace Failure
Spark
But we have tried!
Non grata
Spark
spark.storage.memoryFraction (0.6)
spark.shuffle.memoryFraction (0.2)
In shuffle-heavy algorithms, reduce the cache fraction in favour of shuffle.
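As a rough illustration of that trade-off, the Python sketch below moves part of the heap fraction from storage (cache) to shuffle and renders the result as `--conf key=value` arguments for `spark-submit`. The defaults are the ones quoted on the slide; the shift of 0.2 and the helper names are assumptions for illustration, not recommended values.

```python
# Defaults quoted on the slide. These are legacy Spark 1.x settings; newer
# releases use a unified memory manager, so check the docs for your version.
SLIDE_DEFAULTS = {
    "spark.storage.memoryFraction": 0.6,
    "spark.shuffle.memoryFraction": 0.2,
}


def shift_to_shuffle(amount, conf=SLIDE_DEFAULTS):
    """Move `amount` of the heap fraction from cache (storage) to shuffle."""
    storage = conf["spark.storage.memoryFraction"]
    shuffle = conf["spark.shuffle.memoryFraction"]
    if not 0 <= amount <= storage:
        raise ValueError("amount must be between 0 and the storage fraction")
    return {
        "spark.storage.memoryFraction": round(storage - amount, 2),
        "spark.shuffle.memoryFraction": round(shuffle + amount, 2),
    }


def to_submit_args(conf):
    """Render settings as `--conf key=value` pairs, sorted by key."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Hypothetical shift of 0.2 for a shuffle-heavy job.
    print(" ".join(to_submit_args(shift_to_shuffle(0.2))))
```

The sum of the two fractions stays constant, so the sketch only redistributes memory between caching and shuffling instead of claiming more of the heap.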
Spark
spark.executor.heartbeatInterval (10K)
spark.core.connection.ack.wait.timeout (60)
Increase these in case of long GC pauses.
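One way to apply this is to scale both timeouts by the same factor and write them out in the whitespace-separated `key value` layout of `spark-defaults.conf`. This is only a sketch: the factor of 3 is a hypothetical choice, the helper names are made up for illustration, and the units noted in the comments follow the Spark 1.x documentation as I understand it, so verify them for your version.

```python
# Defaults quoted on the slide.
SLIDE_DEFAULTS = {
    "spark.executor.heartbeatInterval": 10_000,    # milliseconds (the slide's "10K")
    "spark.core.connection.ack.wait.timeout": 60,  # seconds
}


def scale_timeouts(factor, conf=SLIDE_DEFAULTS):
    """Multiply every timeout by `factor` to tolerate longer GC pauses."""
    if factor < 1:
        raise ValueError("factor should be >= 1; the slide suggests increasing")
    return {key: int(value * factor) for key, value in conf.items()}


def render_spark_defaults(conf):
    """Render settings as `key value` lines, one property per line."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


if __name__ == "__main__":
    # Hypothetical factor of 3 for a job with long GC pauses.
    print(render_spark_defaults(scale_timeouts(3)))
```

Longer timeouts hide GC pauses from the driver but also delay detection of executors that are really gone, so it is worth raising them gradually.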
Learnings
• Operations: Automation
• Development: Abstraction
• Organization: Team
• Culture: Experiment
Join the band
Engineers wanted in NYC & Stockholm
http://spotify.com/jobs