Lebanon SoftShore Artificial Intelligence Seminar - March 28, 2014
Lebanon SoftShore organized a seminar on Artificial Intelligence at USEK on March 28, 2014. This is the presentation of Dr. Hayssam Serhan.
Artificial Intelligence
Presented by Dr. Hayssam Serhan
Outline
Overview of AI
Neural Networks
Fuzzy Logic
Expert Systems
R Language (Introduction)
AI Computing
Caution: AI is NOT magic. AI is a unique approach to programming computers. A thinking or conscious computer is still far off on the digital horizon.
AI Objectives: making machines more useful by making them SMARTER.
Understanding intelligence should be our first goal.
Intelligent Behavior
Learn from experience
Apply knowledge acquired from experience
Handle complex situations
Solve problems when important information is missing
React quickly and correctly to a new situation
Be creative and imaginative
Use heuristics
Major Branches of AI
Robotics & Perceptive Systems: mechanical and computer devices that perform tedious tasks with high precision.
Game Playing: programming computers to play games. The greatest advances have occurred in the field of game playing.
Natural Language Processing (NLP): computers understand and react to statements and commands made in a “natural” language.
Major Branches of AI (continued)
Expert Systems (ES): programming computers to make decisions in real-life situations.
Neural Networks: computer systems that can act like or simulate the functioning of the human brain; they are trained by supervised or unsupervised learning.
Machine Learning / Learning Systems: machine learning is the study of computer algorithms that improve automatically through experience. The computer changes how it functions or reacts to situations based on feedback.
“A computer program is said to learn from experience E with respect to some task T and some performance
measure P, if its performance on T, as measured by P, improves with experience E”
Tom Mitchell (1997)
Human VS Artificial Intelligence - Pros
Human Intelligence
Intuition, Common sense, Judgment, Creativity, etc.
The ability to demonstrate their intelligence by communicating effectively
Reasoning and Critical thinking
Artificial Intelligence
Ability to simulate human behavior and cognitive processes
Capture and preserve human expertise
Fast Response.
Human VS Artificial Intelligence - Cons
Human Intelligence Humans are fallible
They have limited knowledge
Serial information processing proceeds very slowly in the brain
Humans are unable to retain large amounts of data
Artificial Intelligence
No "common sense"
Cannot readily deal with "mixed" knowledge
May have high development costs
Raises legal and ethical concerns
Conventional Computing VS Artificial Intelligence
Artificial Intelligence
AI software uses the techniques of search and pattern matching
Programmers design AI software to give the computer only the problem, not the steps necessary to solve it
Conventional computing
Conventional computer software follows a logical series of steps to reach a conclusion
Computer programmers originally designed software that accomplished tasks by executing algorithms
Knowledge Representation & Limits
The number of atomic facts that the average person knows is astronomical. Building a complete knowledge base of commonsense requires enormous amounts of engineering. Much of what people know is not represented as "facts" that they could express verbally
Conclusion Intelligent Agents must be able to set goals and achieve them. They need a way to visualize the future and be able to make choices. Currently, no computers exhibit full artificial intelligence. Early AI researchers developed algorithms that require enormous computational resources. The search for more efficient problem-solving algorithms is a high priority for AI research.
Neural Networks Traditional computers cannot work around the failure of even a single transistor. With the biological designs, the algorithms are ever changing, allowing the system to continuously adapt and work around failures to complete tasks.
“We’re moving from engineering computing systems to something that
has many of the characteristics of biological computing”
Larry Smarr, an astrophysicist who directs the California Institute for
Telecommunications and Information Technology
“The new approach, used in both hardware and software, is being driven
by the explosion of scientific knowledge about the brain. But scientists are still far from fully
understanding how brains function” Kwabena Boahen,
a computer scientist who leads Stanford’s Brains in Silicon research program
“The largest class this fall at Stanford was a graduate level machine-learning course covering both statistical and biological
approaches, taught by the computer scientist Andrew Ng. More than 760 students enrolled”
“Everyone knows there is something big happening, and they’re trying find out what it is.”
Terry Sejnowski, a computational neuroscientist at the Salk Institute
Human Brain Movie
Nervous Systems: The human brain contains ~10^11 neurons, and each neuron is connected to ~10^4 others. Neurons are slower than logic gates: 10^-9 s for semiconductors vs. 10^-3 s for biological neurons.
The energy efficiency of the brain is estimated at 10^-16 joules per operation per second, while the best energy efficiency of computers is about 10^-6 joules per operation per second.
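A back-of-the-envelope reading of the order-of-magnitude figures above (plain arithmetic on the slide's own estimates, not new measurements):

```latex
\begin{aligned}
\text{connections} &\approx 10^{11}\ \text{neurons} \times 10^{4}\ \tfrac{\text{connections}}{\text{neuron}} = 10^{15} \\
\frac{t_{\text{neuron}}}{t_{\text{gate}}} &\approx \frac{10^{-3}\ \text{s}}{10^{-9}\ \text{s}} = 10^{6} \\
\frac{E_{\text{computer}}}{E_{\text{brain}}} &\approx \frac{10^{-6}}{10^{-16}} = 10^{10}
\end{aligned}
```

So a single neuron is roughly a million times slower than a logic gate, yet by these estimates the brain is about ten orders of magnitude more energy efficient; its speed comes from massive parallelism across ~10^15 connections.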
Nervous Systems: It takes on average between 100 and 200 ms for the brain to recognize a familiar face, whereas conventional computers can take days to process much simpler tasks. Some scientists have compared the brain to a “complex, nonlinear, parallel computer”.
IBM Supercomputer: Compass
In 2012, I.B.M. announced that it had built a supercomputer simulation of the brain (Compass). It encompassed roughly 10 billion neurons and ran about 1,500 times more slowly than an actual brain. Further, it required several megawatts of power, compared with just 20 watts used by the biological brain. “attempting to simulate a brain, at the same speed would require a flow of electricity in a conventional computer that is equivalent to what is needed to power both San Francisco and New York,” Dr. Dharmendra Modha of I.B.M. said.
Google & DeepMind: Google has acquired DeepMind for a reported $400M. DeepMind has not yet developed any commercial products; its main asset appears to be its personnel. DeepMind claims that it combines “the best techniques from machine learning and systems neuroscience to build powerful general-purpose learning algorithms.”
Google & AI: Google researchers were able to get a machine-learning algorithm based on neural networks to perform an identification task. The network scanned a database of 10 million images and, in doing so, trained itself to recognize cats. In June, Google said it had used those neural network techniques to develop a new search service to help customers find specific photos more accurately.
Applications
Pattern classification, object recognition, function approximation, data compression, time series analysis and forecasting, ...
Neurons
The main purpose of neurons is to receive, analyze and transmit information further in the form of signals (electric pulses). When a neuron sends information, we say that the neuron “fires”.
Structure of a Biological Neuron
Artificial Neuron
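The Artificial Neuron slide is a figure. As a minimal sketch of the usual textbook model (an assumption, not taken from the slide), an artificial neuron computes a weighted sum of its inputs plus a bias and passes it through an activation function. The names `neuron`, `sigmoid` and `step` and the numeric values are illustrative only.

```r
# A single artificial neuron: net input = weighted sum of inputs + bias,
# output = activation(net input). Logistic sigmoid is assumed by default.
sigmoid <- function(z) 1 / (1 + exp(-z))
step <- function(z) as.numeric(z >= 0)   # hard-threshold alternative ("fires" or not)

neuron <- function(x, w, b, activation = sigmoid) {
  z <- sum(w * x) + b                    # net input
  activation(z)                          # output signal
}

x <- c(1, 0, 1)        # hypothetical inputs
w <- c(0.5, -0.3, 0.8) # hypothetical weights
b <- -1                # hypothetical bias

out_sigmoid <- neuron(x, w, b)        # net input 0.3, sigmoid(0.3) is about 0.574
out_step    <- neuron(x, w, b, step)  # net input >= 0, so the neuron fires: 1
```

With a step activation the neuron behaves like the biological "fire / don't fire" picture; the smooth sigmoid is what makes gradient-based training possible later.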
Artificial Neural Networks Movie
Multilayer Perceptron
[Figure: multilayer perceptron with an input layer (x1, x2, ..., xm), a hidden layer, and an output layer (y1, y2, ..., yn)]
Knowledge and Memory
The output behavior of a network is determined by its weights: the weights are the memory of an NN, and knowledge is distributed across the network. A large number of nodes increases the storage “capacity” and makes the knowledge robust (fault tolerance).
New information is stored by changing the weights.
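To make "the weights are the memory" concrete, here is a minimal R sketch of a forward pass through a network with one hidden layer, assuming sigmoid activations. The layer sizes, the random weights and the function name `mlp_forward` are hypothetical.

```r
# Forward pass of a multilayer perceptron with one hidden layer.
sigmoid <- function(z) 1 / (1 + exp(-z))

mlp_forward <- function(x, W1, b1, W2, b2) {
  h <- sigmoid(W1 %*% x + b1)   # hidden layer activations (H x 1)
  y <- sigmoid(W2 %*% h + b2)   # output layer activations (n x 1)
  as.vector(y)
}

set.seed(1)
m <- 3; H <- 4; n <- 2                       # inputs x1..xm, hidden units, outputs y1..yn
W1 <- matrix(rnorm(H * m, sd = 0.5), H, m)   # input -> hidden weights
b1 <- rep(0, H)
W2 <- matrix(rnorm(n * H, sd = 0.5), n, H)   # hidden -> output weights
b2 <- rep(0, n)

x <- c(0.2, -1, 0.5)
y <- mlp_forward(x, W1, b1, W2, b2)

# "Storing new information" means changing weights: same input, same
# architecture, different output.
W2_new <- W2 + 0.5
y_new <- mlp_forward(x, W1, b1, W2_new, b2)
```

Because the hidden activations are positive, adding 0.5 to every hidden-to-output weight raises each output's net input, so every component of `y_new` is larger than in `y`.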
Example: Pattern Classification
[Figure: a network mapping an input pattern x = (x1, ..., xm) to an output pattern y = (y1, ..., yn); function: x → y]
The NN’s output is used to distinguish between and recognize different input patterns. Different output patterns correspond to particular classes of input patterns. Networks with hidden layers can be used to solve more complex problems than just linear pattern classification.
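A network without hidden layers can only separate classes with a straight line (a hyperplane). The sketch below assumes the classic Rosenblatt perceptron update rule with a step activation (the function names are hypothetical) and learns the linearly separable AND function; the same rule cannot learn XOR, which is not linearly separable, and that is where hidden layers come in.

```r
# Perceptron learning rule for a single neuron with a step activation.
perceptron_train <- function(X, y, eta = 0.1, epochs = 100) {
  w <- rep(0, ncol(X)); b <- 0
  for (e in seq_len(epochs)) {
    for (i in seq_len(nrow(X))) {
      pred <- as.numeric(sum(w * X[i, ]) + b >= 0)
      err <- y[i] - pred               # 0 if correct, +1 or -1 if wrong
      w <- w + eta * err * X[i, ]      # move the boundary toward the correct side
      b <- b + eta * err
    }
  }
  list(w = w, b = b)
}

perceptron_predict <- function(model, X) as.numeric(X %*% model$w + model$b >= 0)

X <- matrix(c(0, 0,
              0, 1,
              1, 0,
              1, 1), ncol = 2, byrow = TRUE)
y_and <- c(0, 0, 0, 1)

model <- perceptron_train(X, y_and)
pred_and <- perceptron_predict(model, X)
```

For linearly separable data such as AND, the perceptron convergence theorem guarantees that the rule stops making mistakes after finitely many updates, so `pred_and` reproduces `y_and`.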
Neural Networks Learning Rules Learning Rules for Multiple-Layered Perceptron Networks
Supervised Learning Goals The goal of any supervised learning algorithm is to find a function that best maps a set of inputs to its correct output. An example would be a simple classification task, where the input is an image of an animal (or the characteristics of this animal), and the correct output would be the name of the animal.
Training Neural Network: Back-Propagation
Backpropagation is a supervised learning method: it requires a dataset of the desired outputs for many inputs, making up the training set. It also requires that the activation function used by the artificial neurons (or "nodes") be differentiable.
A multi-layered network can create internal representations and learn different features per layer. The first layer may be responsible for learning the orientations of lines using the inputs from the individual pixels in the image. The second layer may combine the features learned in the first layer and learn to identify simple shapes. Each higher layer learns more and more abstract features that can be used to classify the image. Each layer finds patterns in the layer below it, and it is this ability to create internal representations that are independent of outside input that gives multi-layered networks their power.
Motivation
Backpropagation Learning Algorithm
The learning algorithm can be divided into two phases.
Phase 1: Propagation
1. Forward propagation of a training pattern's input through the neural network to generate the output activations.
2. Backward propagation of the output activations through the neural network, using the training pattern's target, to generate the deltas of all output and hidden neurons.
Phase 2: Weight update
For each weight, subtract a ratio (percentage) of its gradient from the weight. This ratio influences the speed and quality of learning; it is called the learning rate. The greater the ratio, the faster the neuron trains; the lower the ratio, the more accurate the training is.
Algorithm
  initialize network weights (often small random values)
  do
    for each training example ex
      prediction = neural-net-output(network, ex)   // forward pass
      actual = teacher-output(ex)
      compute error (prediction - actual) at the output units
      compute Δw_h for all weights from hidden layer to output layer   // backward pass
      compute Δw_i for all weights from input layer to hidden layer    // backward pass continued
      update network weights
  until all examples are classified correctly or another stopping criterion is satisfied
  return the network
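The two phases above can be sketched in R for a tiny network learning XOR. This is a minimal full-batch version (weights updated once per pass over all examples) under stated assumptions: sigmoid units, squared-error loss, a 2-4-1 architecture, learning rate 0.5, 10,000 epochs and a fixed seed. It is an illustration, not the presenter's implementation.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

X <- matrix(c(0, 0, 0, 1, 1, 0, 1, 1), ncol = 2, byrow = TRUE)  # 4 x 2 inputs
Y <- matrix(c(0, 1, 1, 0), ncol = 1)                             # 4 x 1 XOR targets

set.seed(42)
H <- 4
W1 <- matrix(runif(2 * H, -1, 1), 2, H); b1 <- runif(H, -1, 1)  # input -> hidden
W2 <- matrix(runif(H, -1, 1), H, 1);     b2 <- runif(1, -1, 1)  # hidden -> output
eta <- 0.5                                                      # learning rate

forward <- function() {
  A1 <- sigmoid(sweep(X %*% W1, 2, b1, "+"))  # hidden activations (4 x H)
  A2 <- sigmoid(A1 %*% W2 + b2)               # output activations (4 x 1)
  list(A1 = A1, A2 = A2)
}
loss <- function(A2) mean((A2 - Y)^2)

initial_loss <- loss(forward()$A2)
for (epoch in 1:10000) {
  f <- forward()                                # Phase 1: forward pass
  # Phase 1: backward pass; deltas are gradients of 0.5 * sum of squared
  # errors with respect to each neuron's net input
  d2 <- (f$A2 - Y) * f$A2 * (1 - f$A2)          # output deltas (4 x 1)
  d1 <- (d2 %*% t(W2)) * f$A1 * (1 - f$A1)      # hidden deltas (4 x H)
  # Phase 2: subtract eta times the gradient from every weight and bias
  W2 <- W2 - eta * t(f$A1) %*% d2
  b2 <- b2 - eta * sum(d2)
  W1 <- W1 - eta * t(X) %*% d1
  b1 <- b1 - eta * colSums(d1)
}
final_loss  <- loss(forward()$A2)
predictions <- round(forward()$A2)
```

Whether `predictions` matches XOR exactly depends on the initialization: a network this small can stall in a local minimum with an unlucky seed, but gradient descent with a modest learning rate should still drive the loss well below its starting value.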
Neural Network: Simulation
Neuromorphic Processors: These new processors consist of electronic components that can be connected by wires that mimic biological synapses. Because they are based on large groups of neuron-like elements, they are known as neuromorphic processors. They are not “programmed.” Instead, the connections between circuits are “weighted” according to correlations in data that the processor has already “learned.” Those weights are then altered as data flows into the chip, causing them to change their values and to “spike.” That generates a signal that travels to other components and, in reaction, changes the neural network.
Conclusion Neural Network technology offers more natural interaction with the real world. Neural Networks can: learn and adapt to changes in a problem’s environment, establish patterns in situations where rules are not known, deal with fuzzy or incomplete information.
However, they lack explanation facilities and usually act as a black box. The process of training neural networks with current technologies is still slow.
Motion and Manipulation: Robotics
The field of robotics is closely related to AI. Intelligence is required for robots to handle tasks such as object manipulation and navigation, with sub-problems of localization, mapping and motion planning.
Robot Quick Description
Each leg has 7 DOFs (degrees of freedom): 3 active DOFs at the hip, 1 active DOF at the knee, 2 active DOFs at the ankle, and 1 passive DOF at the foot.
Robot Control Algorithm
Université de Versailles St Quentin
Neural Network A More Complicated Design (Muscle Modelling)
[Figure: block diagram of the muscle-modelling neural network; signals labeled e(t), y(t), r(t) and Id(t)]
Learning with plant Identification
Université de Versailles St Quentin – Université Libanaise
[Figure labels: Extension, Plantarflexion]
Robot: Walking – Movies & Stability
Fuzzy Logic
A very important technology for dealing with vague, imprecise and uncertain knowledge and data
Fuzzy Logic: Fuzzy logic, or fuzzy set theory, was introduced by Professor Lotfi Zadeh. Human experts do not usually think in probability values, but in terms such as often, generally, sometimes, occasionally and rarely. At the heart of fuzzy logic lies the concept of a linguistic variable, whose values are words rather than numbers. Fuzzy logic provides a way to break through the computational bottlenecks of traditional expert systems. Fuzzy theory, long ignored in the West, was eventually taken seriously in the East, by the Japanese.
Fuzzy Logic: Motivation
Modeling of imprecise concepts: age, weight, height, ...
Modeling of imprecise dependencies: if temperature is low and oil is cheap, then crank up the heating system.
Origin of information: modeling of expert knowledge; representation of information extracted from inherently imprecise data.
Characteristic Functions: Crisp Sets
Classical sets can be described by a characteristic function: $\chi_A(x) = 1$ if $x \in A$, and $\chi_A(x) = 0$ if $x \notin A$.
Example: A = {x | a ≤ x ≤ b}
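A direct R transcription of the characteristic function for the interval example; the function name `chi_A` and the sample values are illustrative.

```r
# Characteristic function of the crisp interval A = {x | a <= x <= b}:
# every x is either fully in A (1) or fully out of A (0).
chi_A <- function(x, a, b) as.numeric(x >= a & x <= b)

crisp_example <- chi_A(c(1, 5, 10, 12), a = 5, b = 10)   # 0 1 1 0
```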
Characteristic Functions: Fuzzy Sets
Fuzzy sets are described by a membership function $\mu_A : X \to [0, 1]$, which gives each element its degree of membership in A.
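A fuzzy set replaces the 0/1 answer with a degree between 0 and 1. As a sketch, assuming a triangular membership function (one common shape among several) for a hypothetical fuzzy set "height about 190 cm":

```r
# Triangular membership function: 0 at the feet a and c, 1 at the peak b (a < b < c).
tri_mf <- function(x, a, b, c) {
  pmax(pmin((x - a) / (b - a), (c - x) / (c - b)), 0)
}

heights <- c(160, 175, 185, 190, 200)                  # cm, illustrative
mu_about_190 <- tri_mf(heights, a = 170, b = 190, c = 210)
# degrees of membership: 0.00 0.25 0.75 1.00 0.50
```

Unlike the crisp interval, 185 cm and 200 cm are partial members of the set rather than being forced to 0 or 1.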
Linguistic Variables and Values
Linguistic values & Context
Fuzzy Rule System
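Basic Elements of a Fuzzy Logic System: 1- Fuzzification, 2- Fuzzy-Inference, 3- Defuzzification. Fuzzification maps numerical inputs to the linguistic level, fuzzy inference works at the linguistic level, and defuzzification maps the result back to the numerical level.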
Fuzzy Rule Systems: Example 1
Center Of gravity
Application of Fuzzy Logic
Term Definitions:
Distance := {far, medium, close, zero, neg_close}
Angle := {pos_big, pos_small, zero, neg_small, neg_big}
Power := {pos_high, pos_medium, zero, neg_medium, neg_high}
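1. Fuzzification: Linguistic Variables
Membership Function Definition:
[Figure: membership functions µ for Angle (-90° to 90°; terms neg_big, neg_small, zero, pos_small, pos_big), where the measured angle of 4° has degrees 0.8 and 0.2 in two adjacent terms; and for Distance [yards] (-10 to 30; terms neg_close, zero, close, medium, far), where the measured distance of 12 has degrees 0.9 and 0.1]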
2. Fuzzy-Inference: “IF-THEN”-Rules
Computation of the “IF-THEN”-Rules:
#1: IF Distance = medium AND Angle = pos_small THEN Power = pos_medium
#2: IF Distance = medium AND Angle = zero THEN Power = zero
#3: IF Distance = far AND Angle = zero THEN Power = pos_medium
#4: …
Aggregation: computing the “IF” part.
Composition: computing the “THEN” part.
The rules of a fuzzy logic system are the “laws” it executes!
2. Fuzzy-Inference: Composition
Result for the linguistic variable "Power":
pos_high with the degree 0.0
pos_medium with the degree 0.8 (= max{0.8, 0.1})
zero with the degree 0.2
neg_medium with the degree 0.0
neg_high with the degree 0.0
Composition computes how each rule influences the output variables!
3. Defuzzification: Finding a Compromise Using “Center-of-Maximum”
[Figure: membership functions µ for Power [kilowatts] (-30 to 30; terms neg_high, neg_medium, zero, pos_medium, pos_high); the compromise output is 6.4 kW]
“Balancing” out the result!
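The three steps can be sketched in R for rules #1 to #3: AND as minimum (aggregation), maximum over rules per output term (composition), and Center-of-Maximum (defuzzification). The fuzzified input degrees below are assumptions, chosen so that these three rules reproduce the composition result listed above (pos_medium 0.8 = max{0.8, 0.1}, zero 0.2). The position of each Power term's maximum is also assumed (placed at the axis ticks -30, -15, 0, 15, 30 kW); the slide's actual membership shapes are not recoverable, so the crisp value here differs from its 6.4 kW.

```r
# Fuzzified inputs (assumed degrees, not read from the slide's figure)
distance <- c(neg_close = 0, zero = 0, close = 0, medium = 0.9, far = 0.1)
angle    <- c(neg_big = 0, neg_small = 0, zero = 0.2, pos_small = 0.8, pos_big = 0)

# Rule base #1-#3 from the slide
rules <- data.frame(
  distance = c("medium",     "medium", "far"),
  angle    = c("pos_small",  "zero",   "zero"),
  power    = c("pos_medium", "zero",   "pos_medium"),
  stringsAsFactors = FALSE
)

# Aggregation: degree of each IF part, with AND = minimum
firing <- pmin(distance[rules$distance], angle[rules$angle])   # 0.8 0.2 0.1

# Composition: each Power term takes the maximum over rules concluding it
power_terms <- c("neg_high", "neg_medium", "zero", "pos_medium", "pos_high")
power <- sapply(power_terms, function(term) max(c(0, firing[rules$power == term])))

# Defuzzification: Center-of-Maximum = degree-weighted mean of term maxima
power_max_kw <- c(neg_high = -30, neg_medium = -15, zero = 0, pos_medium = 15, pos_high = 30)
crisp_power <- sum(power * power_max_kw[power_terms]) / sum(power)   # 12 kW here
```

The compromise lies between the "zero" and "pos_medium" maxima, pulled toward "pos_medium" because that term has the larger degree.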
Fuzzy Logic: Simulation
Improved Computational Power: Fuzzy rule-based systems perform faster than conventional expert systems and require fewer rules. A fuzzy expert system merges the rules, making them more powerful. Lotfi Zadeh believes that in a few years most expert systems will use fuzzy logic to solve highly nonlinear and computationally difficult problems.
Summary: Fuzzy systems allow expert knowledge to be expressed in a more natural way, but they still depend on the rules extracted from the experts, and thus might be smart or dumb. Some experts can provide very clever fuzzy rules, but some just guess and may even get them wrong. Therefore, all rules must be tested and tuned, which can be a prolonged and tedious process. It took Hitachi engineers several years to test and tune only 54 fuzzy rules to guide the Sendai Subway System.
Expert Systems
An expert system is a computer program that is designed to hold the accumulated knowledge of one or more domain experts
ES imitate the expert’s reasoning processes to solve specific problems
Overview of Expert Systems
Expert systems can:
Explain their reasoning or suggested decisions
Display intelligent behavior
Draw conclusions from complex relationships
Provide portable knowledge
Expert system shell A collection of software packages and tools used
to develop expert systems
IBM & Expert Systems: It has been two years since Watson, the artificial intelligence program created by I.B.M., won the quiz show Jeopardy! against human champions. Watson has access to roughly 200 million pages of information, and is able to understand natural language queries and answer questions. The computer maker had initially planned to test the system as an expert adviser to doctors; the idea was that Watson’s encyclopedic knowledge of medical conditions could aid a human expert in diagnosing illnesses.
IBM & Watson In May, I.B.M. announced a general-purpose version of its software, the “I.B.M. Watson Engagement Advisor.” The idea is to make the company’s question-answering system available in a wide range of call center, technical support and telephone sales applications. The company says that as many as 61 percent of all telephone support calls currently fail because human support-center employees are unable to give people correct or complete information.
When to Use an Expert System
Capture and preserve irreplaceable human expertise
Provide expertise needed at a number of locations at the same time
Provide expertise needed in a hostile environment that is dangerous to human health
Provide expertise that is expensive or rare
Develop a solution faster than human experts
Provide a high potential payoff or significantly reduced downside risk
Limitations of Expert Systems Limited to relatively narrow problems
May have high development costs
May raise legal and ethical concerns
Cannot readily deal with “mixed” knowledge
Possibility of error
Difficult to maintain
Legal and Ethical Issues Who is responsible if the advice is wrong? The user? The domain expert? The knowledge engineer? The programmer of the expert system shell? The company selling the software?
Transferring Expertise
Objective of an expert system: to transfer expertise from an expert to a computer system, and then on to other humans (nonexperts).
Activities
Knowledge acquisition
Knowledge representation
Knowledge inferencing
Knowledge transfer to the user
Knowledge is stored in a knowledge base
An Expert System Example: General Electric (GE)
GE's top locomotive field service engineer was nearing retirement. The traditional solution is apprenticeship, but GE wanted a more effective and dependable way to disseminate expertise, to prevent valuable knowledge from retiring, and to minimize extensive travel or moving the locomotives.
The goal was to MODEL the way a human troubleshooter works: months of knowledge acquisition and 3 years of prototyping.
The result: a novice engineer or technician can perform at an expert’s level, on a personal computer installed at every railroad repair shop served by GE.
Participants in Expert Systems
Domain expert: the individual or group whose expertise and knowledge is captured for use in an expert system.
Knowledge user: the individual or group who uses and benefits from the expert system.
Knowledge engineer: someone trained or experienced in the design, development, implementation, and maintenance of an expert system.
Determining requirements
Identifying experts
Construct expert system components
Implementing results
Maintaining and reviewing system
Expert Systems Development
Domain: the area of knowledge addressed by the expert system.
Expert System Components
[Figure: experts feed the knowledge base through the knowledge base acquisition facility; the inference engine and the explanation facility use the knowledge base and interact with the user through the user interface]
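To make the split between knowledge base, inference engine and explanation facility concrete, here is a minimal forward-chaining sketch in R. Forward chaining is one common inference strategy (backward chaining is another). The rules and fact names are hypothetical, loosely in the spirit of the GE troubleshooting example, and are not GE's actual knowledge base.

```r
# Knowledge base: IF-THEN rules (all IF facts must hold for the rule to fire)
rules <- list(
  list(if_all = c("engine_cranks", "no_start"),              then = "check_fuel_system"),
  list(if_all = c("check_fuel_system", "fuel_pressure_low"), then = "replace_fuel_filter"),
  list(if_all = c("no_start", "battery_low"),                then = "charge_battery")
)

# Inference engine: keep firing rules until no new fact can be derived
forward_chain <- function(facts, rules) {
  fired <- character(0)                  # trace for an explanation facility
  repeat {
    new_fact <- FALSE
    for (r in rules) {
      if (all(r$if_all %in% facts) && !(r$then %in% facts)) {
        facts <- c(facts, r$then)
        fired <- c(fired, paste(paste(r$if_all, collapse = " AND "), "=>", r$then))
        new_fact <- TRUE
      }
    }
    if (!new_fact) break
  }
  list(facts = facts, explanation = fired)
}

result <- forward_chain(c("engine_cranks", "no_start", "fuel_pressure_low"), rules)
```

Here `result$explanation` lists the two rules that fired, in order, which is exactly the kind of "why" answer an explanation facility gives the user; the battery rule never fires because `battery_low` is not a known fact.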
Evolution of Expert Systems Software
Expert system shell Collection of software packages & tools to design,
develop, implement, and maintain expert systems
[Figure: ease of use (low to high) over time: traditional programming languages before 1980, special and 4th generation languages in the 1980s, expert system shells in the 1990s]
Expert Systems Shells Software Development Packages
Exsys InstantTea K-Vision KnowledgePro
Applications of Expert Systems
PROSPECTOR: Used by geologists to identify sites for drilling or mining
PUFF: Medical system for diagnosis of
respiratory conditions
DESIGN ADVISOR: Gives advice to
designers of processor chips
MYCIN: Medical system for
diagnosing blood disorders. First used in 1979
DENDRAL: Used to identify the structure of chemical compounds.
First used in 1965
LITHIAN: Gives advice to archaeologists
examining stone tools
Expert Systems Development Alternatives
[Figure: development costs (low to high) versus time to develop the expert system (low to high); in increasing cost and time: use an existing package, develop from a shell, develop from scratch]
Expert Systems Benefits Enhancement of Problem Solving and Decision Making
Improved Product and Decision Quality
Increased Output and Productivity
Decreased Decision Making Time
Capture Scarce Expertise
Can Work with Incomplete or Uncertain Information
Knowledge Transfer to Remote Locations
Problems and Limitations of Expert Systems
Domain experts are not always able to explain their logic and reasoning
ES work well only in a narrow domain of knowledge
Knowledge engineers are rare and expensive
Expert system users have natural cognitive limits
Lack of trust by end-users
ES may not be able to arrive at valid conclusions
ES may sometimes produce incorrect recommendations
Lack of common sense
Cannot respond as creatively as a human expert
Cannot adapt to changing environments
Conclusion Classic expert systems are especially good for closed-system applications with precise inputs and logical outputs. They use expert knowledge in the form of rules and, if required, can interact with the user to establish a particular fact. A major drawback is that human experts cannot always express their knowledge in terms of rules or explain the line of their reasoning. This can prevent the expert system from accumulating the necessary knowledge, and consequently lead to its failure.
Summary Expert, neural and fuzzy systems have now matured and been applied to a broad range of different problems, mainly in engineering, medicine, finance, business and management. Each technology handles the uncertainty and ambiguity of human knowledge differently, and each technology has found its place in knowledge engineering. They no longer compete; rather they complement each other. A synergy of expert systems with fuzzy logic and neural computing improves adaptability, robustness, fault-tolerance and speed of knowledge-based systems. Besides, computing with words makes them more “human”.
R Language: statistical analysis on the fly; built-in mathematical functions and graphics modules; FREE and open source!
R Tops Data Mining Software Poll
For the past 12 years, KDnuggets has conducted an annual poll asking "What analytics/data mining software you used in the past 12 months for a real project (not just evaluation)". In this year's poll, R was the top-ranked data mining solution, selected by 30.7% of poll respondents. Microsoft Excel was second, at 29.8%. RapidMiner, which took the #1 spot over R in 2011 and 2010, ranked third. As Bob Muenchen notes, four of the top five ranked data mining solutions in this year's poll are open source. R was also ranked in this poll as the most popular language for implementing data mining applications, beating out SQL and Java.
Important Problems in Data Mining
Prediction Finding patterns (Apriori) Clustering Classification
Regression Ranking Density Estimation
Prediction: For most of the following algorithms (as well as linear regression), we would in practice first generate the model using training data, and then predict values for test data. To make predictions, we use the predict function. Typically, the first argument is the variable in which you saved the model, and the second argument is a matrix or data frame of test data. For instance, to predict with a linear regression model lm_model fitted on predictors x1 and x2, where x1_test and x2_test are vectors containing test data, we can use the following command (the column names of newdata must match the model's predictor names):
> predicted_values <- predict(lm_model, newdata = data.frame(x1 = x1_test, x2 = x2_test))
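A self-contained version of the train-then-predict workflow, assuming synthetic data (the true coefficients 3, 2 and -1 and the noise level are made up for illustration):

```r
# Fit a linear regression on training data, then predict for test data.
set.seed(1)
x1 <- runif(50); x2 <- runif(50)
y  <- 3 + 2 * x1 - 1 * x2 + rnorm(50, sd = 0.01)   # synthetic training data
lm_model <- lm(y ~ x1 + x2)

x1_test <- c(0.5, 1)
x2_test <- c(0, 0.5)
# newdata columns carry the same names as the predictors in the formula
predicted_values <- predict(lm_model, newdata = data.frame(x1 = x1_test, x2 = x2_test))
```

With such low noise the predictions land very close to the true values 4.0 and 4.5. If the newdata columns were named differently (for example x1_test), predict would not find x1 and x2 in newdata and would silently fall back to the training vectors.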
Finding patterns (Apriori): finding patterns in large datasets, e.g. (Diapers → Beer). Use Apriori! To run the Apriori algorithm, first install the arules package and load it. Note that the dataset must be a binary incidence matrix; the column names should correspond to the “items” that make up the “transactions.” The following commands print out a summary of the results and a list of the generated rules.
> library(arules)
> dataset <- read.csv("C:\\Datasets\\mushroom.csv", header = TRUE)
> mushroom_rules <- apriori(as.matrix(dataset), parameter = list(supp = 0.8, conf = 0.9))
> summary(mushroom_rules)
> inspect(mushroom_rules)
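The supp and conf thresholds passed to apriori are the support and confidence of a rule. As a self-contained base-R sketch (made-up transactions, hypothetical helper names `support` and `confidence`, no arules needed), here is what those two numbers mean for Diapers → Beer:

```r
# Binary incidence matrix: rows = transactions, columns = items (made-up data)
baskets <- matrix(c(
  1, 1, 1, 0,
  1, 1, 0, 1,
  1, 0, 1, 0,
  1, 1, 1, 1,
  0, 1, 0, 1
), ncol = 4, byrow = TRUE,
dimnames = list(NULL, c("diapers", "beer", "milk", "bread")))

# Support: fraction of transactions containing all the given items
support <- function(items) mean(apply(baskets[, items, drop = FALSE] == 1, 1, all))
# Confidence of lhs => rhs: support(lhs and rhs) / support(lhs)
confidence <- function(lhs, rhs) support(c(lhs, rhs)) / support(lhs)

supp_rule <- support(c("diapers", "beer"))   # 3 of 5 transactions: 0.6
conf_rule <- confidence("diapers", "beer")   # 0.6 / 0.8 = 0.75
```

apriori searches for all rules whose support and confidence exceed the given thresholds; with supp = 0.8 and conf = 0.9, a rule like this one (0.6 and 0.75) would not be reported.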
Clustering: grouping data into clusters that “belong” together; objects within a cluster are more similar to each other than to those in other clusters. Algorithms: k-means, k-medians. Input: $\{x_i\}_{i=1}^{m}$, $x_i \in X \subset \mathbb{R}^n$. Output: $f : X \to \{1, \dots, K\}$ (K clusters). Applications: clustering consumers for market research, clustering genes into families, image segmentation (medical imaging). If X is the data matrix and K is the number of clusters, then the command is:
> kmeans_model <- kmeans(x = X, centers = K)
Classification
Identifying to which of a set of categories a new observation belongs, on the basis of a training set of data. Applications: automatic handwriting recognition, speech recognition, biometrics, document classification.
Input: $\{(x_i, y_i)\}_{i=1}^{m}$ (“examples,” “instances with labels,” “observations”), $x_i \in X$, $y_i \in \{-1, 1\}$ (“binary”). Let X_train and X_test be matrices of the training and test data respectively, and labels be a binary vector of class attributes for the training examples. For k equal to K, the command (from the class package) is:
> library(class)
> knn_model <- knn(train = X_train, test = X_test, cl = as.factor(labels), k = K)
Decision trees: rpart, party
Random forest: randomForest, party
SVM: e1071, kernlab
Neural networks: nnet, neuralnet, RSNNS
Performance evaluation: ROCR
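Regression: Input: $\{(x_i, y_i)\}_{i=1}^{m}$, $x_i \in X$, $y_i \in \mathbb{R}$. Output: $f : X \to \mathbb{R}$. Examples: predicting an individual’s income, house prices, stock prices or test scores. For a real-valued y, a linear regression can be fitted with:
> lm_mod <- lm(y ~ x1 + x2, data = as.data.frame(cbind(y, x1, x2)))
With family = binomial(link = "logit"), glm fits a logistic regression instead, which applies when y is binary and models P(y = 1 | x):
> glm_mod <- glm(y ~ x1 + x2, family = binomial(link = "logit"), data = as.data.frame(cbind(y, x1, x2)))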
Ranking: in between classification and regression. Search engines use ranking methods.
Density Estimation: predict conditional probabilities. Input: $\{(x_i, y_i)\}_{i=1}^{m}$, $x_i \in X$, $y_i \in \{-1, 1\}$. Output: $f : X \to [0, 1]$, as “close” to $P(y = 1 \mid x)$ as possible. Examples: estimating the probability of failure, or the probability of defaulting on a loan.
Training and Testing for supervised learning
Training: the training data are the input, and the model f is the output.
Testing: you want to predict y for a new x, where (x, y) comes from the same distribution as the training data. Compute f(x) and compare it to y: how well does f(x) match y? Measure the goodness of f using a loss function; the resulting $R_{test}(f)$ is also called the true risk or the test error. We want $R_{test}$ to be small, indicating that f(x) would be a good predictor (“estimator”) of y.
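A minimal sketch of estimating the test error with a 0-1 loss on held-out data. The synthetic data, the 300/100 split and the choice of logistic regression (glm with family = binomial) as the classifier are all assumptions for illustration; labels are coded 0/1 rather than -1/1.

```r
set.seed(7)
n <- 400
x <- rnorm(n)
y <- rbinom(n, 1, plogis(2 * x))       # labels drawn from P(y = 1 | x)
d <- data.frame(x = x, y = y)

train_idx <- sample(n, 300)            # hold out 100 examples for testing
train <- d[train_idx, ]
test  <- d[-train_idx, ]

f <- glm(y ~ x, family = binomial, data = train)        # training: data in, model f out
p_hat <- predict(f, newdata = test, type = "response")  # estimates of P(y = 1 | x)
y_hat <- as.numeric(p_hat > 0.5)                        # classify by thresholding

zero_one_loss <- function(y_true, y_pred) mean(y_true != y_pred)
train_error <- zero_one_loss(train$y, as.numeric(fitted(f) > 0.5))
test_error  <- zero_one_loss(test$y, y_hat)   # empirical estimate of R_test(f)
```

The test error is computed only on examples the model never saw, which is why it, and not the training error, estimates how well f(x) will predict y for new x.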
Time series decomposition: decomp(), decompose(), arima(), stl()
Time series forecasting: forecast Time Series Clustering: TSclust Dynamic Time Warping (DTW): dtw
Time Series Analysis with R
Packages: igraph, sna
Centrality measures: degree(), betweenness(), closeness(), transitivity()
Clusters: clusters(), no.clusters()
Cliques: cliques(), largest.cliques(), maximal.cliques(), clique.number()
Community detection: fastgreedy.community(), spinglass.community()
Social Network Analysis with R
Scatter plot
dataset <- read.csv('fbgood.txt', header = TRUE, sep = '\t', row.names = 1)
x <- dataset$friends
y <- dataset$getgoods
plot(x, y)
Linear fit
fit <- lm(y ~ x)
abline(fit, col = 'red', lwd = 3)
2nd order polynomial fit
plot(x, y)
polyfit2 <- lm(y ~ poly(x, 2))
lines(sort(x), fitted(polyfit2)[order(x)], col = 2, lwd = 3)
3rd order polynomial fit
plot(x, y)
polyfit3 <- lm(y ~ poly(x, 3))
lines(sort(x), fitted(polyfit3)[order(x)], col = 2, lwd = 3)
Packages: RHadoop, RHive
RHadoop (https://github.com/RevolutionAnalytics/RHadoop/wiki) is a collection of 3 R packages:
rmr2 - perform data analysis with R via MapReduce on a Hadoop cluster
rhdfs - connect to the Hadoop Distributed File System (HDFS)
rhbase - connect to the NoSQL HBase database
You can experiment with it on a single PC (in standalone or pseudo-distributed mode), and code developed there will also work on a cluster of PCs (in fully distributed mode)!
Step-by-step guide to setting up a first R Hadoop system: http://www.rdatamining.com/tutorials/rhadoop
R and Hadoop
An Example of MapReducing with R
library(rmr2)
# map: split each line into words and emit (word, 1) pairs
map <- function(k, lines) {
  words.list <- strsplit(lines, "\\s")
  words <- unlist(words.list)
  return(keyval(words, 1))
}
# reduce: sum the counts emitted for each word
reduce <- function(word, counts) {
  keyval(word, sum(counts))
}
wordcount <- function(input, output = NULL) {
  mapreduce(input = input, output = output, input.format = "text",
            map = map, reduce = reduce)
}
## Submit job
out <- wordcount(in.file.path, out.file.path)
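To see the MapReduce data flow without a Hadoop cluster, here is a local base-R simulation of the same word count: map emits (word, 1) pairs, a shuffle step groups values by key, and reduce sums each group. This only mimics the flow; it does not use rmr2, and the helper names are hypothetical.

```r
# map: emit one (key = word, value = 1) pair per word
map_fn <- function(lines) {
  words <- unlist(strsplit(lines, "\\s+"))
  words <- words[words != ""]
  data.frame(key = words, value = 1, stringsAsFactors = FALSE)
}

# reduce: combine all values emitted for one key
reduce_fn <- function(key, values) {
  data.frame(key = key, value = sum(values), stringsAsFactors = FALSE)
}

local_wordcount <- function(lines) {
  pairs  <- map_fn(lines)
  groups <- split(pairs$value, pairs$key)               # shuffle: group values by key
  do.call(rbind, Map(reduce_fn, names(groups), groups)) # reduce each group
}

counts <- local_wordcount(c("the cat sat", "the cat ran"))
```

On Hadoop, the map and reduce calls run in parallel on different nodes and the framework performs the shuffle; the functions themselves keep the same shape.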