Clean Code in Jupyter Notebook


Upload: volodymyr-kazantsev

Post on 14-Apr-2017


TRANSCRIPT

Page 1: Clean Code in Jupyter notebook

@KNerush @Volodymyrk

Clean Code in Jupyter notebooks, using Python

5th of July, 2016

Page 2: Clean Code in Jupyter notebook


Volodymyr (Vlad) Kazantsev

Head of Data @ Product Madness

Product Manager

MBA @LBS

Graphics programming

Writes code for money since 2002

Math degree

Kateryna (Katya) Nerush

Mobile Dev @ Octopus Labs

Dev Lead in Finance

Data Engineer

Web Developer

Writes code for money since 2003

CS degree

Page 3: Clean Code in Jupyter notebook

Why do we end up with messy IPython notebooks?

[Venn diagram: Coding, Stats, Business]

Page 4: Clean Code in Jupyter notebook

Who are Data Scientists, really?

[Venn diagram: Coding, Stats, Business]

“In a nutshell, coding is telling a computer to do something using a language it understands.”

Data Science with Python

Page 5: Clean Code in Jupyter notebook

It is not going to production anyway!

Page 6: Clean Code in Jupyter notebook

“Any fool can write code that a computer can understand. Good programmers write code that humans can understand.” - Martin Fowler, Refactoring (1999)

WTF! How am I supposed to validate this??

Sorry, but how can I calculate 7-day retention?

Page 7: Clean Code in Jupyter notebook

From Prototype to ... The Data Science Spiral

[Spiral diagram: Ideas & Questions → Data Analysis → Insights → Impact]

Page 8: Clean Code in Jupyter notebook

You do it for your own good..

Re-run all A/B test analyses for the last months, by tomorrow

[Spiral diagram: Ideas & Questions → Data Analysis → Insights → Impact]

Page 9: Clean Code in Jupyter notebook

Part 2: What can Data Scientists learn from Software Engineers?

Page 10: Clean Code in Jupyter notebook

Robert C. Martin, a.k.a. “Uncle Bob”

https://cleancoders.com/

Page 11: Clean Code in Jupyter notebook

“Clean Code” ?

“Pleasingly graceful and stylish in appearance or manner”
Bjarne Stroustrup, inventor of C++

“Clean code reads like well written prose”
Grady Booch, creator of UML

“.. each routine turns out to be pretty much what you expected”
Ward Cunningham, inventor of Wiki and XP

Page 12: Clean Code in Jupyter notebook

One does not simply start writing clean code..

“First make it work, then make it right, then make it fast and small”
Kent Beck, co-inventor of XP and TDD

- Runs all the tests
- Contains no duplicate code
- Expresses all ideas...
- Minimizes classes and methods
Ron Jeffries, author of Extreme Programming Installed

“Leave the campground cleaner than you found it”
The Boy Scouts of America (applied to programming by Uncle Bob)

Page 13: Clean Code in Jupyter notebook

“I'm not a great programmer; I'm just a good programmer with great habits.” - Kent Beck

Page 14: Clean Code in Jupyter notebook

“There are only two hard problems in Computer Science: cache invalidation and naming things” - Phil Karlton

● long_descriptive_names
  ○ Avoid: x, i, stuff, do_blah()

● Pronounceable and Searchable
  ○ revenue_per_payer vs. arpdpu

● Avoid encodings, abbreviations, prefixes, suffixes.. if possible
  ○ bonus_points_on_iphone vs. cns_crm_dip

● Add meaningful context
  ○ daily_revenue_per_payer

● Don’t be lazy.
  ○ Spend time naming and renaming things.
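A minimal sketch of the naming advice above, assuming a made-up list of purchase records (the data and both function bodies are illustrative, not taken from the talk). The two functions compute the same number; only the names differ.

```python
# Hypothetical purchase records: (day, user_id, amount_usd)
purchases = [
    ("2016-07-01", "u1", 4.99),
    ("2016-07-01", "u2", 9.99),
    ("2016-07-01", "u1", 0.99),
    ("2016-07-02", "u3", 1.99),
]


# Hard to read: what are d, x and s, and what does "arpdpu" stand for?
def arpdpu(d, x):
    s = [r for r in x if r[0] == d]
    return sum(r[2] for r in s) / len({r[1] for r in s})


# Same calculation with pronounceable, searchable names and added context.
def daily_revenue_per_payer(day, purchases):
    purchases_on_day = [p for p in purchases if p[0] == day]
    revenue = sum(amount for _, _, amount in purchases_on_day)
    payers = {user_id for _, user_id, _ in purchases_on_day}
    return revenue / len(payers)
```

A reader can search a project for `daily_revenue_per_payer` and guess what it returns; `arpdpu` needs a glossary.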

Page 15: Clean Code in Jupyter notebook

“each routine turns out to be pretty much what you expected” - Ward Cunningham

● Small

● Do one thing

● One Level of Abstraction

● Have only few arguments (one is the best)
  ○ Less important in Python, with named arguments.
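One way the points above could look in practice is sketched below. The CSV layout and all function names are assumptions made for illustration. Each function is small, does one thing, takes one argument, and the top-level function reads at a single level of abstraction.

```python
import csv
import io


def parse_purchases(csv_text):
    """Turn CSV text into a list of dicts with a numeric revenue field."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [{"user_id": r["user_id"], "revenue": float(r["revenue"])} for r in rows]


def keep_payers(purchases):
    """Drop users who paid nothing."""
    return [p for p in purchases if p["revenue"] > 0]


def average_revenue(purchases):
    """Mean revenue over the given purchases."""
    return sum(p["revenue"] for p in purchases) / len(purchases)


def average_revenue_per_payer(csv_text):
    """Each step is one named routine, so this reads like a sentence."""
    return average_revenue(keep_payers(parse_purchases(csv_text)))


example_csv = "user_id,revenue\nu1,10\nu2,0\nu3,20\n"
result = average_revenue_per_payer(example_csv)
```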

Page 16: Clean Code in Jupyter notebook

● Use good names

● Avoid obvious comments.

● Avoid dead, commented-out code

● Avoid ToDo, licenses, history, markup for documentation and other nonsense

● But there are exceptions..

“When you feel the need to write a comment, first try to refactor the code so that any comment becomes superfluous” - Kent Beck
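The quote above can be illustrated with a small before/after sketch. The user records and the retention rule used here (a user "returned" if last seen at least 7 days after install) are assumptions chosen for the example.

```python
from datetime import date


# Before: the condition needs a comment to be understood.
def check(u):
    # user came back at least 7 days after install
    return (u["last_seen"] - u["installed"]).days >= 7


# After: names carry the meaning, so the comment becomes superfluous.
RETENTION_WINDOW_DAYS = 7


def days_since_install(user):
    return (user["last_seen"] - user["installed"]).days


def returned_after_retention_window(user):
    return days_since_install(user) >= RETENTION_WINDOW_DAYS


loyal_user = {"installed": date(2016, 7, 1), "last_seen": date(2016, 7, 8)}
churned_user = {"installed": date(2016, 7, 1), "last_seen": date(2016, 7, 5)}
```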

Page 17: Clean Code in Jupyter notebook

// When I wrote this, only God and I understood what I was doing
// Now, God only knows

Page 18: Clean Code in Jupyter notebook

// sometimes I believe compiler ignores all my comments

Page 19: Clean Code in Jupyter notebook

/**
 * Always returns true.
 */
public boolean isAvailable() {
    return false;
}

Page 20: Clean Code in Jupyter notebook

“Long functions are where classes are trying to hide” - Robert C. Martin

● Small

● Do one thing

● SOLID, Design Patterns, etc.

Page 21: Clean Code in Jupyter notebook

Code conventions

● The team should produce code in one style, as if it were written by one person

● Team conventions take priority over language conventions, which take priority over personal ones

● Automate style formatting

Page 22: Clean Code in Jupyter notebook

Part 3: How to write Clean Code in Python?

(i.e. this is not Java)

Page 23: Clean Code in Jupyter notebook

PEP 8 -- Style Guide for Python Code

https://www.python.org/dev/peps/pep-0008/

● Indentation
● Tabs or Spaces?
● Maximum Line Length
● Should a line break before or after a binary operator?
● Blank Lines
● Imports
● Comments
● Naming Conventions

Example:

Good:

    foo = long_function_name(var_one, var_two,
                             var_three, var_four)

Bad:

    foo = long_function_name(var_one, var_two,
        var_three, var_four)

Page 24: Clean Code in Jupyter notebook

Google Python Style Guide

https://google.github.io/styleguide/pyguide.html

Page 25: Clean Code in Jupyter notebook

My favourite !

This is not Java or C++

● Functions are first-class objects
● Duck-typing as an interface
● No setters/getters
● Itertools, zip, enumerate
● etc.
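A short sketch of several of the idioms listed above, using invented `Player` and `Bot` types (both hypothetical): plain attributes instead of getters and setters, a function passed as a value, two unrelated types used interchangeably because both expose `name` and `level`, and `zip` / `enumerate` instead of index arithmetic.

```python
from collections import namedtuple


class Player:
    """Plain attributes; no Java-style getName()/setName()."""

    def __init__(self, name, level):
        self.name = name
        self.level = level


# A different type that happens to expose the same attributes (duck typing).
Bot = namedtuple("Bot", ["name", "level"])


def by_level(participant):
    """Works for anything that has a .level attribute."""
    return participant.level


names = ["ann", "bob"]
levels = [3, 7]

# zip pairs the two lists without manual indexing.
participants = [Player(name, level) for name, level in zip(names, levels)]
participants.append(Bot("hal", 5))

# Functions are first-class objects: by_level is passed as the sort key.
ranked = sorted(participants, key=by_level, reverse=True)

# enumerate supplies the rank counter.
leaderboard = [f"{rank}. {p.name}" for rank, p in enumerate(ranked, start=1)]
```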

Page 26: Clean Code in Jupyter notebook

Part 4: How to write Clean Python Code in Jupyter Notebook?

Page 27: Clean Code in Jupyter notebook

Typical structure of the ipynb

1. Imports

2. Get Data

3. Transform Data

4. Modelling

5. Visualisation

6. Making sense of the data
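The six stages can be sketched as one tiny script, where each commented section stands for one or more notebook cells. The inline CSV, the trivial "model" and the text output are all placeholders invented for this illustration.

```python
# 1. Imports
import csv
import io
import statistics

# 2. Get data (a hypothetical inline CSV stands in for a real data source)
raw_csv = "day,revenue\n1,10\n2,12\n3,14\n"
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# 3. Transform data
revenue = [float(row["revenue"]) for row in rows]

# 4. Modelling (deliberately trivial: the average day-over-day change)
daily_changes = [later - earlier for earlier, later in zip(revenue, revenue[1:])]
average_daily_change = statistics.mean(daily_changes)

# 5. Visualisation (a text table stands in for a chart)
for row, value in zip(rows, revenue):
    print(f"day {row['day']}: {'#' * int(value)}")

# 6. Making sense of the data
conclusion = f"Revenue grows by about {average_daily_change:.1f} per day"
print(conclusion)
```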

Page 28: Clean Code in Jupyter notebook

How big should a notebook file be?

Page 29: Clean Code in Jupyter notebook

How big should a notebook file be?

Hypothesis - Data - Interpretation

Page 30: Clean Code in Jupyter notebook

Keep your notebooks small!

(4-10 cells each)

Page 31: Clean Code in Jupyter notebook

Tip 1: break a fat notebook into many small ones

Example:

1_data_preparation.ipynb

    df.to_pickle('clean_data_1.pkl')

2_linear_model.py

    df = pd.read_pickle('clean_data_1.pkl')

3_ensemble.py

    df = pd.read_pickle('clean_data_1.pkl')
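The hand-off between notebooks can be tried without pandas. The sketch below uses the standard-library `pickle` module as a stand-in for `df.to_pickle` / `pd.read_pickle`, writes to a temporary directory, and uses invented sample rows; the two commented sections represent two separate notebooks.

```python
import pickle
import tempfile
from pathlib import Path

# A temporary directory stands in for the shared project folder.
work_dir = Path(tempfile.mkdtemp())
clean_data_path = work_dir / "clean_data_1.pkl"

# --- 1_data_preparation: clean the data once and save the result ---
raw_rows = [
    {"user": "u1", "revenue": "4.99"},
    {"user": "u2", "revenue": None},  # incomplete record, dropped below
]
clean_rows = [
    {"user": row["user"], "revenue": float(row["revenue"])}
    for row in raw_rows
    if row["revenue"] is not None
]
with open(clean_data_path, "wb") as f:
    pickle.dump(clean_rows, f)

# --- 2_linear_model: start from the saved artefact, not from raw data ---
with open(clean_data_path, "rb") as f:
    loaded_rows = pickle.load(f)
```

Each downstream notebook then starts with a single load step and stays short.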

Page 32: Clean Code in Jupyter notebook

Tip 2: shared library

● Data access
● Common plotting functionality
● Report generation
● Misc. utils

acme_data_utils/
    data_access.py
    plotting.py
    setup.py
    tests/
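As an illustration of what a data-access helper in such a library might contain, here is a minimal sketch. The function name `query_rows` and the use of SQLite are assumptions; the talk does not say what the real module holds.

```python
import sqlite3


# Hypothetical contents of a data_access module in the shared library.
def query_rows(connection, sql, params=()):
    """Run a query and return the rows as a list of dicts keyed by column name."""
    cursor = connection.execute(sql, params)
    columns = [column[0] for column in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Usage, as it might appear in a notebook (an in-memory database for the demo).
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE purchases (user_id TEXT, revenue REAL)")
connection.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("u1", 4.99), ("u2", 9.99)],
)
big_spenders = query_rows(
    connection,
    "SELECT user_id, revenue FROM purchases WHERE revenue > ?",
    (5,),
)
```

Keeping this kind of plumbing in one tested place is what stops several notebooks from re-implementing the same function.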

Page 33: Clean Code in Jupyter notebook

Tip 3: Don’t just be pythonic. Be IPythonic

Don’t hide “secret sauce” inside imported module

BAD: [notebook screenshot]

Good: [notebook screenshot]
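The screenshots are not part of the transcript, so the following is only a guess at the contrast they showed. All names are hypothetical. Plumbing such as loading data is fine to import; the step that produces the insight stays visible in the notebook.

```python
# Stand-in for plumbing that would normally be imported from a shared library.
def load_sessions():
    return [
        {"user": "u1", "day": 0},
        {"user": "u1", "day": 7},
        {"user": "u2", "day": 0},
    ]


# BAD (sketch): the reader cannot see how the number is produced.
#   from acme_data_utils import compute_retention   # hypothetical helper
#   compute_retention()

# Good (sketch): the loading is delegated, the "secret sauce" is in the cell.
sessions = load_sessions()
installed_users = {s["user"] for s in sessions if s["day"] == 0}
returned_on_day_7 = {s["user"] for s in sessions if s["day"] == 7}
day_7_retention = len(returned_on_day_7 & installed_users) / len(installed_users)
```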

Page 34: Clean Code in Jupyter notebook

“Clean code reads like well written prose” - Grady Booch

Page 35: Clean Code in Jupyter notebook

A good Jupyter notebook reads like well written prose

Page 36: Clean Code in Jupyter notebook

How big should one Cell be?

Page 37: Clean Code in Jupyter notebook

Tip 4: each cell should have one logical output

● One “idea - execution - output” triplet per cell
● Import Cell: expected output is no import errors
● CMD+SHIFT+P

Page 38: Clean Code in Jupyter notebook

Tip 5: write tests .. in jupyter notebooks

https://pypi.python.org/pypi/pytest-ipynb
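The sketch below shows the general idea with plain `assert` statements in a cell. It does not use the conventions of the pytest-ipynb plugin linked above, and the retention function and its definition are assumptions made for the example.

```python
# Cell 1: the function under test.
def seven_day_retention(installed_users, users_seen_on_day_7):
    """Share of installed users who were seen again on day 7."""
    if not installed_users:
        return 0.0
    retained = installed_users & users_seen_on_day_7
    return len(retained) / len(installed_users)


# Cell 2: a test cell. Its expected output is "no AssertionError".
def test_seven_day_retention():
    installs = {"u1", "u2", "u3", "u4"}
    # u9 never installed, so it must not count as retained.
    assert seven_day_retention(installs, {"u1", "u9"}) == 0.25
    assert seven_day_retention(installs, set()) == 0.0
    assert seven_day_retention(set(), {"u1"}) == 0.0


test_seven_day_retention()
```

Because the test is a normal cell, Restart & Run All re-checks it every time the notebook is executed.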

Page 39: Clean Code in Jupyter notebook

Tip 6: ..to the cloud

Page 40: Clean Code in Jupyter notebook

Code Smells .. in ipynb

- Cells can’t be executed in order (with Run All and Restart & Run All)
- Prototype (check ideas) code is mixed with “analysis” code
- Debugging cells
- Copy-paste cells
- Duplicate code (in general)
- Multiple notebooks that re-implement the same function

Page 41: Clean Code in Jupyter notebook

Tip 7: Run notebook from another notebook!

analysis.ipynb
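The slide content for this tip is missing from the transcript, so the mechanism the speakers used is unknown. As one possible sketch, a notebook file is JSON, and its code cells can be executed in order with the standard library alone. The helper name `run_notebook` is invented, and the approach assumes nbformat 4 cells containing plain Python (no IPython magics). Inside a live IPython session, the `%run` magic may be a more direct route.

```python
import json
import tempfile
from pathlib import Path


def run_notebook(path):
    """Execute the code cells of a notebook in order; return the resulting namespace.

    Assumption: cells contain plain Python. Lines starting with % or ! would fail.
    """
    notebook = json.loads(Path(path).read_text(encoding="utf-8"))
    namespace = {}
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue
        source = cell["source"]
        if isinstance(source, list):  # source may be stored as a list of lines
            source = "".join(source)
        exec(source, namespace)
    return namespace


# Demo: write a minimal, hypothetical analysis.ipynb and run it.
demo_notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Analysis"]},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["revenue = [10, 12, 14]\n", "total_revenue = sum(revenue)"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 0,
}
notebook_path = Path(tempfile.mkdtemp()) / "analysis.ipynb"
notebook_path.write_text(json.dumps(demo_notebook), encoding="utf-8")

results = run_notebook(notebook_path)
```

The calling notebook can then read values such as `results["total_revenue"]` instead of copy-pasting cells.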

Page 42: Clean Code in Jupyter notebook

Make a Data Product from notebooks!

Page 43: Clean Code in Jupyter notebook

Summary: How to organise a Jupyter project

1. Notebook should have one Hypothesis-Data-Interpretation loop
2. Make a multi-project utils library
3. A good Jupyter notebook reads like well written prose
4. Each cell should have one and only one output
5. Write tests in notebooks
6. Deploy a shared Jupyter server
7. Try to keep code inside notebooks. Avoid refactoring to modules, if possible.