![Page 1: Hadoop + Clojure · Clojure a new Lisp, neither Common Lisp nor Scheme Dynamic, Functional Immutability and concurrency Hosted on the JVM Open Source (Eclipse Public License)](https://reader031.vdocument.in/reader031/viewer/2022022016/5b5fe4827f8b9a2e618b511f/html5/thumbnails/1.jpg)
Hadoop + Clojure
Hadoop World NYC
Friday, October 2, 2009
Stuart Sierra, AltLaw.org
![Page 2: Hadoop + Clojure · Clojure a new Lisp, neither Common Lisp nor Scheme Dynamic, Functional Immutability and concurrency Hosted on the JVM Open Source (Eclipse Public License)](https://reader031.vdocument.in/reader031/viewer/2022022016/5b5fe4827f8b9a2e618b511f/html5/thumbnails/2.jpg)
JVM Languages
Object Oriented
● Native to the JVM: Groovy
● Ported to the JVM: JRuby, Jython, Rhino
Functional
● Native to the JVM: Clojure
● Ported to the JVM: Armed Bear CL, Kawa
Scala
Java is dead, long live the JVM
![Page 3: Hadoop + Clojure · Clojure a new Lisp, neither Common Lisp nor Scheme Dynamic, Functional Immutability and concurrency Hosted on the JVM Open Source (Eclipse Public License)](https://reader031.vdocument.in/reader031/viewer/2022022016/5b5fe4827f8b9a2e618b511f/html5/thumbnails/3.jpg)
Clojure
● A new Lisp, neither Common Lisp nor Scheme
● Dynamic, Functional
● Immutability and concurrency
● Hosted on the JVM
● Open Source (Eclipse Public License)
![Page 4: Hadoop + Clojure · Clojure a new Lisp, neither Common Lisp nor Scheme Dynamic, Functional Immutability and concurrency Hosted on the JVM Open Source (Eclipse Public License)](https://reader031.vdocument.in/reader031/viewer/2022022016/5b5fe4827f8b9a2e618b511f/html5/thumbnails/4.jpg)
Clojure Primitive Types
String "Hello, World!\n"
Integer 42
Double 2.0e64
BigInteger 9223372036854775808
BigDecimal 1.0M
Ratio 3/4
Boolean true, false
Symbol foo
Keyword :foo
null nil
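One way to check these literals at a REPL is to ask each one for its class. This is a small sketch that assumes Clojure 1.3 or later, where integer literals read as `Long` and oversized integers as `clojure.lang.BigInt`; the `Integer` and `BigInteger` labels above match the reader as it behaved in 2009. The name `literal-classes` is only for illustration.

```clojure
;; Classes of the literals from the table (Clojure 1.3+ assumed).
(def literal-classes
  {:string  (class "Hello, World!\n")     ; java.lang.String
   :integer (class 42)                     ; java.lang.Long
   :double  (class 2.0e64)                 ; java.lang.Double
   :big     (class 9223372036854775808)    ; clojure.lang.BigInt
   :decimal (class 1.0M)                   ; java.math.BigDecimal
   :ratio   (class 3/4)                    ; clojure.lang.Ratio
   :boolean (class true)                   ; java.lang.Boolean
   :symbol  (class 'foo)                   ; clojure.lang.Symbol
   :keyword (class :foo)                   ; clojure.lang.Keyword
   :nil     (class nil)})                  ; nil, since nil has no class
```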
![Page 5: Hadoop + Clojure · Clojure a new Lisp, neither Common Lisp nor Scheme Dynamic, Functional Immutability and concurrency Hosted on the JVM Open Source (Eclipse Public License)](https://reader031.vdocument.in/reader031/viewer/2022022016/5b5fe4827f8b9a2e618b511f/html5/thumbnails/5.jpg)
Clojure Collections
List (print :hello "NYC")
Vector [:eat "Pie" 3.14159]
Map {:lisp 1 "The Rest" 0}
Set #{2 1 3 5 "Eureka"}
Homoiconicity
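Homoiconicity means the `List` example above is both a function call and a plain list. A minimal sketch of what that allows, using only core functions (the names `form` and `sum-form` are made up for this example):

```clojure
;; A quoted form is an ordinary list: a symbol followed by arguments.
(def form '(print :hello "NYC"))

(list? form)    ; => true
(first form)    ; => print (a symbol)
(count form)    ; => 3

;; Because code is data, a form can be built with ordinary
;; sequence functions and then evaluated.
(def sum-form (cons '+ [1 2 3]))   ; the form (+ 1 2 3)
(eval sum-form)                     ; => 6
```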
![Page 6: Hadoop + Clojure · Clojure a new Lisp, neither Common Lisp nor Scheme Dynamic, Functional Immutability and concurrency Hosted on the JVM Open Source (Eclipse Public License)](https://reader031.vdocument.in/reader031/viewer/2022022016/5b5fe4827f8b9a2e618b511f/html5/thumbnails/6.jpg)
(defn greet [name]
  (println "Hello," name))

(greet "New York")
Hello, New York

public void greet(String name) {
  System.out.println("Hi, " + name);
}

greet("New York");
Hi, New York
![Page 7: Hadoop + Clojure · Clojure a new Lisp, neither Common Lisp nor Scheme Dynamic, Functional Immutability and concurrency Hosted on the JVM Open Source (Eclipse Public License)](https://reader031.vdocument.in/reader031/viewer/2022022016/5b5fe4827f8b9a2e618b511f/html5/thumbnails/7.jpg)
(defn average [& nums]
  (/ (reduce + nums) (count nums)))

(average 1 2 3 4)
5/2

public double average(double[] nums) {
  double total = 0;
  for (int i = 0; i < nums.length; i++) {
    total += nums[i];
  }
  return total / nums.length;
}
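The `5/2` result is worth a second look: dividing integers in Clojure yields an exact `Ratio` rather than a truncated or floating-point value. A short sketch of how the same function behaves with different inputs:

```clojure
(defn average [& nums]
  (/ (reduce + nums) (count nums)))

(average 1 2 3 4)            ; => 5/2, an exact Ratio
(double (average 1 2 3 4))   ; => 2.5
(average 1.0 2 3 4)          ; => 2.5, one double argument makes the result a double
(average 2 4)                ; => 3, a whole-number result comes back as an integer
```

Calling `(average)` with no arguments divides zero by zero and throws an `ArithmeticException`, so a production version would probably want a guard for the empty case.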
![Page 8: Hadoop + Clojure · Clojure a new Lisp, neither Common Lisp nor Scheme Dynamic, Functional Immutability and concurrency Hosted on the JVM Open Source (Eclipse Public License)](https://reader031.vdocument.in/reader031/viewer/2022022016/5b5fe4827f8b9a2e618b511f/html5/thumbnails/8.jpg)
(def m {:f "foo" :b "bar"})

(m :f)
"foo"

(:b m)
"bar"

(def s #{1 5 3})

(s 3)
3

(s 7)
nil
Data Structures as Functions
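Treating maps, keywords, and sets as functions pays off most when they are passed to higher-order functions. A brief sketch using the same `m` and `s`:

```clojure
(def m {:f "foo" :b "bar"})
(def s #{1 5 3})

;; A map looks up its argument; an optional second argument is the default.
(m :f)            ; => "foo"
(m :zzz "none")   ; => "none"

;; A set returns the element itself when present and nil otherwise,
;; which makes it usable as a predicate.
(s 3)                     ; => 3
(s 7)                     ; => nil
(filter s [1 2 3 4 5])    ; => (1 3 5)

;; A keyword used as a function extracts a field from each map.
(map :b [{:b "bar"} {:b "baz"}])   ; => ("bar" "baz")
```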
![Page 9: Hadoop + Clojure · Clojure a new Lisp, neither Common Lisp nor Scheme Dynamic, Functional Immutability and concurrency Hosted on the JVM Open Source (Eclipse Public License)](https://reader031.vdocument.in/reader031/viewer/2022022016/5b5fe4827f8b9a2e618b511f/html5/thumbnails/9.jpg)
(import '(com.example.package MyClass YourClass))
(. object method arguments)
(new MyClass arguments)
(.method object arguments)
(MyClass. arguments)
(MyClass/staticMethod)
Syntactic Sugar
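The forms above use placeholder names. The same patterns with real JDK classes might look like this (a sketch; `a` and `b` are throwaway names):

```clojure
(import '(java.util ArrayList StringTokenizer))

;; Method call: the general form and the sugared form are equivalent.
(. "hadoop" toUpperCase)      ; => "HADOOP"
(.toUpperCase "hadoop")       ; => "HADOOP"

;; Constructor: general form and sugared form.
(def a (new ArrayList))
(def b (ArrayList.))
(.add b "clojure")
(.size b)                     ; => 1

;; Static method.
(Math/abs -42)                ; => 42

;; Java objects work with Clojure's sequence functions.
(enumeration-seq (StringTokenizer. "to be or not"))   ; => ("to" "be" "or" "not")
```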
![Page 10: Hadoop + Clojure · Clojure a new Lisp, neither Common Lisp nor Scheme Dynamic, Functional Immutability and concurrency Hosted on the JVM Open Source (Eclipse Public License)](https://reader031.vdocument.in/reader031/viewer/2022022016/5b5fe4827f8b9a2e618b511f/html5/thumbnails/10.jpg)
...open a stream...
try {
  ...do stuff with the stream...
} finally {
  stream.close();
}

(defmacro with-open [args & body]
  `(let ~args
     (try
       ~@body
       (finally (.close ~(first args))))))

(with-open [stream (...open a stream...)]
  ...do stuff with stream...)
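To try the macro without shadowing the real `clojure.core/with-open`, the sketch below copies it under a hypothetical name, `with-closing`, and runs it against a stand-in resource that records whether `close` was called. `open-resource` and `closed?` are also invented for the example.

```clojure
;; Hypothetical name, chosen so clojure.core/with-open is left untouched.
(defmacro with-closing [args & body]
  `(let ~args
     (try
       ~@body
       (finally
         (.close ~(first args))))))

;; A stand-in resource that records whether close was called.
(def closed? (atom false))

(defn open-resource []
  (reify java.io.Closeable
    (close [_] (reset! closed? true))))

(def result
  (with-closing [r (open-resource)]
    :done))

result     ; => :done
@closed?   ; => true
```

The macro expands into the same `let` plus `try`/`finally` shape as the Java snippet, so the caller never writes the cleanup code by hand.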
![Page 11: Hadoop + Clojure · Clojure a new Lisp, neither Common Lisp nor Scheme Dynamic, Functional Immutability and concurrency Hosted on the JVM Open Source (Eclipse Public License)](https://reader031.vdocument.in/reader031/viewer/2022022016/5b5fe4827f8b9a2e618b511f/html5/thumbnails/11.jpg)
coordinated, synchronous: ref
independent, synchronous: atom
independent, asynchronous: agent
unshared: var
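A minimal sketch of the four reference types in use, assuming Clojure 1.3 or later (for the `^:dynamic` marker). All names here are made up for illustration.

```clojure
;; ref: coordinated, synchronous. Both changes commit in one transaction.
(def checking (ref 100))
(def savings  (ref 0))
(dosync
  (alter checking - 25)
  (alter savings + 25))

;; atom: independent, synchronous.
(def counter (atom 0))
(swap! counter inc)

;; agent: independent, asynchronous. send returns immediately;
;; await blocks until the queued action has run.
(def log (agent []))
(send log conj "started")
(await log)

;; var: per-thread rebinding, invisible to other threads.
(def ^:dynamic *verbose* false)
(def seen-inside (binding [*verbose* true] *verbose*))

[@checking @savings @counter @log seen-inside *verbose*]
;; => [75 25 1 ["started"] true false]
```

In a standalone script, calling `(shutdown-agents)` at the end lets the JVM exit once agents have been used.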
![Page 12: Hadoop + Clojure · Clojure a new Lisp, neither Common Lisp nor Scheme Dynamic, Functional Immutability and concurrency Hosted on the JVM Open Source (Eclipse Public License)](https://reader031.vdocument.in/reader031/viewer/2022022016/5b5fe4827f8b9a2e618b511f/html5/thumbnails/12.jpg)
(map function values) → list of values
(reduce function values) → single value
mapper(key, value) → list of key-value pairs
reducer(key, values) → list of key-value pairs
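The difference between functional `map`/`reduce` and Hadoop's mapper/reducer can be sketched in a single process. This is only an illustration of the data flow, not how Hadoop is implemented; `run-job` is a hypothetical driver that maps every input, groups the pairs by key (the shuffle step), and reduces each group.

```clojure
(require '[clojure.string :as str])

;; mapper: one (key, value) in, a list of [key value] pairs out.
(defn mapper [_offset line]
  (for [word (str/split line #"\s+")]
    [word 1]))

;; reducer: one key plus all its values in, a list of [key value] pairs out.
(defn reducer [word counts]
  [[word (reduce + counts)]])

;; Hypothetical single-process driver: map, group by key, reduce.
(defn run-job [mapper reducer inputs]
  (let [pairs   (mapcat (fn [[k v]] (mapper k v)) inputs)
        grouped (group-by first pairs)]
    (into (sorted-map)
          (mapcat (fn [[k kvs]] (reducer k (map second kvs))) grouped))))

(run-job mapper reducer [[0 "to be or"] [9 "not to be"]])
;; => {"be" 2, "not" 1, "or" 1, "to" 2}
```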
![Page 13: Hadoop + Clojure · Clojure a new Lisp, neither Common Lisp nor Scheme Dynamic, Functional Immutability and concurrency Hosted on the JVM Open Source (Eclipse Public License)](https://reader031.vdocument.in/reader031/viewer/2022022016/5b5fe4827f8b9a2e618b511f/html5/thumbnails/13.jpg)
public static class MapClass extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output,
                  Reporter reporter) throws IOException {
    String line = value.toString();
    StringTokenizer itr = new StringTokenizer(line);
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      output.collect(word, one);
    }
  }
}

public static class Reduce extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {

  public void reduce(Text key, Iterator<IntWritable> values,
                     OutputCollector<Text, IntWritable> output,
                     Reporter reporter) throws IOException {
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();
    }
    output.collect(key, new IntWritable(sum));
  }
}
![Page 14: Hadoop + Clojure · Clojure a new Lisp, neither Common Lisp nor Scheme Dynamic, Functional Immutability and concurrency Hosted on the JVM Open Source (Eclipse Public License)](https://reader031.vdocument.in/reader031/viewer/2022022016/5b5fe4827f8b9a2e618b511f/html5/thumbnails/14.jpg)
(mapper key value)
(reducer key values)
list of key-value pairs
list of key-value pairs
![Page 15: Hadoop + Clojure · Clojure a new Lisp, neither Common Lisp nor Scheme Dynamic, Functional Immutability and concurrency Hosted on the JVM Open Source (Eclipse Public License)](https://reader031.vdocument.in/reader031/viewer/2022022016/5b5fe4827f8b9a2e618b511f/html5/thumbnails/15.jpg)
Clojure-Hadoop 1
(defn mapper-map [this key val out reporter]
  (doseq [word (enumeration-seq (StringTokenizer. (str val)))]
    (.collect out (Text. word) (IntWritable. 1))))

(defn reducer-reduce [this key vals out reporter]
  (let [sum (reduce + (map (fn [w] (.get w)) (iterator-seq vals)))]
    (.collect out key (IntWritable. sum))))
(gen-job-classes)
![Page 16: Hadoop + Clojure · Clojure a new Lisp, neither Common Lisp nor Scheme Dynamic, Functional Immutability and concurrency Hosted on the JVM Open Source (Eclipse Public License)](https://reader031.vdocument.in/reader031/viewer/2022022016/5b5fe4827f8b9a2e618b511f/html5/thumbnails/16.jpg)
Clojure-Hadoop 2
(defn my-map [key value]
  (map (fn [token] [token 1])
       (enumeration-seq (StringTokenizer. value))))

(def mapper-map (wrap-map my-map int-string-map-reader))

(defn my-reduce [key values]
  [[key (reduce + values)]])

(def reducer-reduce (wrap-reduce my-reduce))
(gen-job-classes)
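One consequence of this style is that `my-map` and `my-reduce` are plain functions over strings, numbers, and vectors, so they can be exercised at a REPL with no Hadoop classes on the classpath. A sketch with made-up sample input:

```clojure
(import 'java.util.StringTokenizer)

(defn my-map [key value]
  (map (fn [token] [token 1])
       (enumeration-seq (StringTokenizer. value))))

(defn my-reduce [key values]
  [[key (reduce + values)]])

;; No Hadoop types are involved, so both can be called directly.
(my-map 0 "a rose is a rose")
;; => (["a" 1] ["rose" 1] ["is" 1] ["a" 1] ["rose" 1])

(my-reduce "rose" [1 1])
;; => [["rose" 2]]
```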
![Page 17: Hadoop + Clojure · Clojure a new Lisp, neither Common Lisp nor Scheme Dynamic, Functional Immutability and concurrency Hosted on the JVM Open Source (Eclipse Public License)](https://reader031.vdocument.in/reader031/viewer/2022022016/5b5fe4827f8b9a2e618b511f/html5/thumbnails/17.jpg)
Clojure print/read
DATA → print → STRING
STRING → read → DATA
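The round trip between data and strings can be sketched with `pr-str` and a reader function. This example assumes Clojure 1.5 or later for `clojure.edn`; the sample map and the names `data`, `text`, and `round-tripped` are invented for illustration.

```clojure
(require '[clojure.edn :as edn])

(def data {:word "clojure" :counts [1 2 3] :tags #{:lisp :jvm}})

;; Data -> string.
(def text (pr-str data))

;; String -> data. edn/read-string only reads data and never evaluates
;; code, which is the safer choice for untrusted input.
(def round-tripped (edn/read-string text))

(= data round-tripped)   ; => true
(pr-str [:a "b" 3/4])    ; => "[:a \"b\" 3/4]"
```

Because any Clojure data structure can be printed and read back, plain text is enough to carry structured keys and values between steps.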
![Page 18: Hadoop + Clojure · Clojure a new Lisp, neither Common Lisp nor Scheme Dynamic, Functional Immutability and concurrency Hosted on the JVM Open Source (Eclipse Public License)](https://reader031.vdocument.in/reader031/viewer/2022022016/5b5fe4827f8b9a2e618b511f/html5/thumbnails/18.jpg)
Clojure-Hadoop 3
(defn my-map [key val]
  (map (fn [token] [token 1])
       (enumeration-seq (StringTokenizer. val))))

(defn my-reduce [key values]
  [[key (reduce + values)]])

(defjob job
  :map my-map
  :map-reader int-string-map-reader
  :reduce my-reduce
  :inputformat :text)
![Page 21: Hadoop + Clojure · Clojure a new Lisp, neither Common Lisp nor Scheme Dynamic, Functional Immutability and concurrency Hosted on the JVM Open Source (Eclipse Public License)](https://reader031.vdocument.in/reader031/viewer/2022022016/5b5fe4827f8b9a2e618b511f/html5/thumbnails/21.jpg)
More
● http://clojure.org/
● Google Groups: Clojure
● #clojure on irc.freenode.net
● #clojure on Twitter
● http://richhickey.github.com/clojure-contrib
● http://stuartsierra.com/
● http://github.com/stuartsierra
● http://www.altlaw.org/