Extending Ruby by harnessing other languages
Brendon McLean (@brendon9x)
What if Ruby doesn’t do everything well?
But you want to use Ruby anyway
A long time ago, in a startup far, far away…
The Challenge a.k.a “The value proposition”
To bring meaningful insight and intelligence to market research*
*In two weeks please
Solution: MVP on Ruby on Rails
The Requirement a.k.a “The Problem”
a.k.a Numerical Ruby Please
Dimensionality* *lots of it
RESP AGE GENDER USE_IOS USE_ANDROID CAT_PERSON
1 18-24 M 1 1 N
2 25-35 M N
3 35-50 F 1 1 Y
4 35-50 F Y
5 18-24 M 1 1 Y
6 25-35 M 1 Y
7 25-35 M 1 1 Y
8 35-50 F 1 N
9 18-24 F 1 Y
10 18-24 F 1 N
11 35-50 F 1 N
12 35-50 F 1 N
13 18-24 F 1 Y
14 35-50 M 1 N
15 18-24 F 1 N
16 35-50 M 1 N
17 35-50 M 1 1 N
N 25-35 M Y
up to 40GB* *per dataset
Loosely structured
Interrogate Anything* *goodbye clever caching strategy
Long story short…
[Chart: approximate FLOPS, log scale from 1,000 to 1,000,000,000,000, by approach: Plain Ruby (2011), Bitset (2012), GSL (2012), NArray (2013)]
Complexity inflection point
Performance Abstraction Power Complexity
Time to consider other options
If, hypothetically, we weren’t using Ruby, what else is out there?
Haskell: Comes with free beard*
*beard must compile
NOBODY EXPECTED PYTHON!
Why Python?
Size of scientific community
Depth of ecosystem
Similar* attitude to usability and expressiveness first
*ish
Performance
Numpy
• Lineage goes back to 1995
• Array computing — vectorised operations for Python
• NArray is based on Numpy
• Is the bedrock upon which the rest of scientific Python is built
Vectorisation
$> array.reduce(&:+)
$> a.zip(b).map do |l, r| l * r end
$> array.sum
$> a * b
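The contrast above can be sketched in plain Ruby. The `Vec` class below is hypothetical and only illustrates the shape of a vectorised API: one call on the whole array instead of an explicit loop or block. It does not reproduce the speed of NArray or Numpy, which run these loops in native code.

```ruby
# Hypothetical Vec class: shows the API shape of vectorised operations only.
class Vec
  attr_reader :values

  def initialize(values)
    @values = values
  end

  # Element-wise multiplication, equivalent to a.zip(b).map { |l, r| l * r }.
  def *(other)
    Vec.new(values.zip(other.values).map { |l, r| l * r })
  end

  # Sum of all elements, equivalent to array.reduce(&:+).
  def sum
    values.reduce(0, :+)
  end
end

a = Vec.new([1, 2, 3])
b = Vec.new([4, 5, 6])

product = a * b
puts product.values.inspect  # [4, 10, 18]
puts product.sum             # 32
```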
Pandas
• Built on Numpy
• Basically ports the best bits of R into Python
• Fast
• Cognitively simpler for general programmers
• Munging!
Bonus extras
• Scipy: Linear Algebra, FFT, Clustering, Stats
• IPython Notebooks
• Sympy: Computer Algebra System
• nltk: Natural Language Toolkit
• scikit-learn: Machine Learning
Strength of community
[Chart: Total commits. Series: NArray, GSL, Pandas, Numpy. Values: 12,588; 10,865; 193; 141]
[Chart: Contributors. Series: NArray, GSL, Numpy, Pandas. Values: 310; 249; 44]
[Chart: Issues, open and closed. Series: GSL, NArray, Numpy, Pandas. Values: 1051; 651; 181; 4681; 2691; 185]
Using Python from Ruby ❤️
Problem statement
Flexibility of Pandas
Speed of Numpy
Scales horizontally
Ruby ❤️ API
API Inspiration
ActiveRecord scopes: Deferred, composable
API implementation problem: Getting Ruby to run Python === Getting Python to run Ruby
Ruby => Data => Python*
*or other
“Code is data”— people with LISP personality disorder
S-Expressions: (function arg1 arg2 arg3 ...)
Simple s-expression example
$> (+ 1 1) => 2
$> (find User 1 2)
$> 1 + 1 => 2
$> User.find(1, 2)
$> User.send(:find, 1, 2)
Example with nesting
$> (+ 1 (* 2 2)) => 5
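A minimal sketch of how such expressions could be evaluated in Ruby, assuming they are written as nested arrays of the form `[operator, receiver, *arguments]`. The `evaluate` helper is hypothetical; it relies on the same correspondence the slides draw between `(find User 1 2)` and `User.send(:find, 1, 2)`.

```ruby
# Evaluate an s-expression written as nested Ruby arrays:
#   [operator, receiver, *arguments]
# Anything that is not an array is treated as a literal value.
def evaluate(expr)
  return expr unless expr.is_a?(Array)

  operator, *operands = expr
  # Evaluate inner expressions first, then dispatch on the first operand.
  receiver, *args = operands.map { |operand| evaluate(operand) }
  receiver.public_send(operator, *args)
end

puts evaluate([:+, 1, 1])           # 2
puts evaluate([:+, 1, [:*, 2, 2]])  # 5
```

Nesting costs nothing extra: the inner `[:*, 2, 2]` is reduced to `4` before the outer `+` is dispatched.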
ActiveLISP
User.select(:state).where(no_spam: false).group(:state).count
(count (group :state (where :no_spam false (select :state User))))
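One way the translation from a chained Ruby call to an s-expression might look. This is a sketch under assumptions, not the implementation shown in the talk: the `Query` class and its methods are hypothetical, and the data source is passed in as the plain string `"User"`. Each call wraps the previous expression in a new node instead of executing anything, which is what makes the API deferred and composable.

```ruby
# Hypothetical deferred query: every call wraps the previous expression in a
# new s-expression (a nested array) instead of running anything.
class Query
  attr_reader :sexp

  def initialize(sexp)
    @sexp = sexp
  end

  def select(*columns)
    wrap(:select, *columns)
  end

  def where(conditions)
    wrap(:where, *conditions.flatten) # {no_spam: false} becomes :no_spam, false
  end

  def group(*columns)
    wrap(:group, *columns)
  end

  def count
    wrap(:count)
  end

  # Render the nested arrays in Lisp notation.
  def to_s
    render(sexp)
  end

  private

  def wrap(operator, *args)
    Query.new([operator, *args, sexp])
  end

  # The head of each list is printed as a bare name, other symbols keep
  # their leading colon, everything else is printed via to_s.
  def render(node, head: false)
    case node
    when Array
      items = node.each_with_index.map { |child, i| render(child, head: i.zero?) }
      "(#{items.join(' ')})"
    when Symbol
      head ? node.to_s : node.inspect
    else
      node.to_s
    end
  end
end

query = Query.new("User").select(:state).where(no_spam: false).group(:state).count
puts query
# (count (group :state (where :no_spam false (select :state User))))
```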
Ruby => S-Expressions
S-Expressions => Python
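Because an s-expression is only nested data, it can cross the language boundary in any plain data format. The sketch below assumes JSON as that format, which the slides do not specify; a Python process could then read the payload with its standard `json` module and walk the resulting lists.

```ruby
require "json"

# An s-expression as nested Ruby arrays (same shape as the example above).
sexp = [:count, [:group, :state, [:where, :no_spam, false, [:select, :state, "User"]]]]

# Serialise to plain data. Symbols become JSON strings, so the receiving side
# cannot tell :state from "state" unless an extra convention is added.
payload = JSON.generate(sexp)
puts payload
# ["count",["group","state",["where","no_spam",false,["select","state","User"]]]]

# Round trip: the structure survives, the symbol/string distinction does not.
restored = JSON.parse(payload)
puts restored.first # count
```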
Does have limitations*
Added benefits
Optimisation (tree rewrites)
Automatic query sharding
Target multiple backends through a common API
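To illustrate what a tree rewrite can look like, here is a small sketch that folds constant arithmetic inside an s-expression before it is sent anywhere. The `fold_constants` helper and the choice of constant folding are assumptions made for illustration; the talk does not say which rewrites were actually applied.

```ruby
# Rewrite an s-expression bottom-up, replacing arithmetic on literal numbers
# with its result and leaving every other node structurally unchanged.
def fold_constants(expr)
  return expr unless expr.is_a?(Array)

  foldable = %i[+ - *]
  operator, *operands = expr
  operands = operands.map { |operand| fold_constants(operand) }

  if foldable.include?(operator) && operands.size == 2 &&
     operands.all? { |operand| operand.is_a?(Numeric) }
    operands[0].public_send(operator, operands[1])
  else
    [operator, *operands]
  end
end

p fold_constants([:+, 1, [:*, 2, 2]])                  # 5
p fold_constants([:where, :age, [:+, 18, 3], "User"])  # [:where, :age, 21, "User"]
```

The same traversal pattern would apply to other rewrites, since each one is a function from one tree to another.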
Enough talking…
Live Demo
Thanks
• Min RK for the initial iRuby Kernel
• Daniel Mendler for continued work on iRuby Kernel
• My Team @ Intellection
• And all the gems!
London, Cape Town
http://www.public-domain-image.com/architecture/bridge/slides/bridge-london-england.html
intellection