productive data engineer
TRANSCRIPT
![Page 1: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/1.jpg)
How to Be a Productive Data Engineer
Rafal Wojdyla - [email protected]
My views are my own and don't necessarily represent those of Spotify.
![Page 2: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/2.jpg)
• Operations
• Development
• Organization
• Culture
![Page 3: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/3.jpg)
What is Spotify?
For everyone:
• Streaming Service
• Launched in October 2008
• 60 Million Monthly Users
• 15 Million Paid Subscribers
+ and for me:
• 1.3K nodes Hadoop cluster
![Page 4: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/4.jpg)
Automation
![Page 5: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/5.jpg)
ME
ADAM
![Page 6: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/6.jpg)
Apache Ambari
Cloudera Manager
![Page 7: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/7.jpg)
+ Puppet
![Page 8: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/8.jpg)
Not Invented Here
![Page 9: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/9.jpg)
Never Invented Here
![Page 10: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/10.jpg)
Wild Wild West
![Page 11: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/11.jpg)
Apache Bigtop
![Page 12: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/12.jpg)
Enable log aggregation
![Page 13: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/13.jpg)
To enable log aggregation
yarn.log-aggregation-enable = true
yarn.log-aggregation.retain-seconds = ?
![Page 14: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/14.jpg)
+ <property>
+   <name>yarn.log-aggregation-enable</name>
+   <value>true</value>
+ </property>
+
+ <property>
+   <name>yarn.log-aggregation.retain-seconds</name>
+   <value>315569260</value>
+   <!--retention: 10 years-->
+ </property>
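The retention value on the slide can be reproduced with a little arithmetic. The following is a minimal sketch, not the tooling used in the talk: the helper names are hypothetical, and the year length of 365.2422 days is an assumption chosen because it yields the slide's figure when truncated to whole seconds.

```python
import xml.etree.ElementTree as ET

SECONDS_PER_DAY = 86400


def retain_seconds(years, days_per_year=365.2422):
    """Convert a retention period in years to whole seconds (truncated).

    days_per_year is an assumption; 365.2422 reproduces the slide's value.
    """
    return int(years * days_per_year * SECONDS_PER_DAY)


def yarn_property(name, value):
    """Build one <property> element in the yarn-site.xml layout."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = str(value)
    return prop


def log_aggregation_properties(years):
    """Return the two properties from the slide for a given retention period."""
    return [
        yarn_property("yarn.log-aggregation-enable", "true"),
        yarn_property("yarn.log-aggregation.retain-seconds", retain_seconds(years)),
    ]


if __name__ == "__main__":
    for prop in log_aggregation_properties(10):
        print(ET.tostring(prop, encoding="unicode"))
```

Computing the value from a stated period makes it easier to review what a retention setting actually means before it is rolled out.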
![Page 15: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/15.jpg)
Heap Memory used is 97%
![Page 16: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/16.jpg)
Hellelephant
![Page 17: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/17.jpg)
Custom logs
• Profiling
• Garbage collection
![Page 18: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/18.jpg)
Right tool for the job
![Page 19: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/19.jpg)
![Page 20: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/20.jpg)
![Page 21: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/21.jpg)
Right abstraction for the job
![Page 22: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/22.jpg)
Scaling machines is easy, scaling people is hard
![Page 23: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/23.jpg)
• Map split size
• Number of reducers
• HDFS data retention
• User feedback (ongoing)
Automation
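As one illustration of the kind of setting in the list above that could be derived automatically, here is a minimal sketch of picking a reducer count from the job's input size. The slides do not describe how this was done; the function name, the 1 GiB-per-reducer target, and the cap of 1000 reducers are all assumptions made for the example.

```python
def estimate_reducers(input_bytes, bytes_per_reducer=1 << 30, max_reducers=1000):
    """Suggest a reducer count so each reducer handles roughly bytes_per_reducer.

    Both the per-reducer target and the upper cap are hypothetical defaults.
    """
    if input_bytes <= 0:
        return 1
    # Integer ceiling division avoids floating-point rounding on large inputs.
    wanted = -(-input_bytes // bytes_per_reducer)
    return max(1, min(max_reducers, wanted))


if __name__ == "__main__":
    ten_gib = 10 * (1 << 30)
    print(estimate_reducers(ten_gib))
```

A heuristic like this removes one knob that every job author would otherwise have to tune by hand.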
![Page 24: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/24.jpg)
Organization
![Page 25: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/25.jpg)
![Page 26: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/26.jpg)
Ownerless
![Page 27: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/27.jpg)
Ownerless Squad
![Page 28: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/28.jpg)
Ownerless
Squad Upgrades
![Page 29: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/29.jpg)
Ownerless
Squad Upgrades Getting there
![Page 30: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/30.jpg)
Culture
![Page 31: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/31.jpg)
Experiment
Fail Fast
Embrace Failure
![Page 32: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/32.jpg)
Spark
But we have tried!
Non grata
![Page 33: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/33.jpg)
Spark
spark.storage.memoryFraction (0.6)
spark.shuffle.memoryFraction (0.2)
In shuffle-heavy algorithms, reduce the cache fraction in favour of shuffle.
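A minimal sketch of the trade-off, assuming the defaults shown on the slide (0.6 for storage, 0.2 for shuffle). The helper names and the size of the shift are hypothetical; the right split depends on the job. Note that these two properties belong to Spark's older (pre-1.6) memory model.

```python
def rebalance_memory_fractions(storage=0.6, shuffle=0.2, shift=0.2):
    """Move `shift` of the heap fraction from cache (storage) to shuffle."""
    if not 0 <= shift <= storage:
        raise ValueError("shift must be between 0 and the storage fraction")
    return {
        # round() hides binary floating-point noise such as 0.39999999999999997
        "spark.storage.memoryFraction": round(storage - shift, 2),
        "spark.shuffle.memoryFraction": round(shuffle + shift, 2),
    }


def as_submit_args(conf):
    """Render settings as `--conf key=value` pairs for spark-submit."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(as_submit_args(rebalance_memory_fractions())))
```

The sum of the two fractions stays the same, so the change only redistributes memory between caching and shuffle buffers.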
![Page 34: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/34.jpg)
Spark
spark.executor.heartbeatInterval (10K)
spark.core.connection.ack.wait.timeout (60)
Increase both in case of long GC pauses.
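A minimal sketch of scaling both settings together, assuming the slide's values are the defaults and reading them as 10,000 milliseconds for the heartbeat interval and 60 seconds for the ack timeout. The helper name and the scaling factor are hypothetical; how much headroom is needed depends on the GC pauses actually observed.

```python
# Values as shown on the slide; the units in the comments are an assumption.
DEFAULTS = {
    "spark.executor.heartbeatInterval": 10_000,      # milliseconds ("10K")
    "spark.core.connection.ack.wait.timeout": 60,    # seconds
}


def scaled_timeouts(factor):
    """Multiply both settings by `factor` to tolerate longer GC pauses."""
    if factor < 1:
        raise ValueError("factor must be at least 1; the advice is to increase")
    return {key: int(value * factor) for key, value in DEFAULTS.items()}


if __name__ == "__main__":
    for key, value in scaled_timeouts(3).items():
        print(f"--conf {key}={value}")
```

Raising these values trades slower detection of truly dead executors for fewer false failures during long pauses.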
![Page 35: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/35.jpg)
Learnings
• Operations: Automation
• Development: Abstraction
• Organization: Team
• Culture: Experiment
![Page 36: Productive data engineer](https://reader034.vdocument.in/reader034/viewer/2022042716/55a92b931a28ab793e8b463d/html5/thumbnails/36.jpg)
Join the band
Engineers wanted in NYC & Stockholm
http://spotify.com/jobs