Hosting and Using “R” with Azure ML
November 18, 2014
Meetup Azure Machine Learning
http://www.meetup.com/New-York-Azure-Machine-Learning-Meetup/
Why host “R” in Azure ML
• R has great depth and breadth in many areas
• Very high value but not always easy to transition to a pluggable solution callable by other processes or services
• Can be combined with existing Azure ML modules (mix and match)
• Or simply host a working R solution as a web service
Common challenge
Focus on right hand side…
One Azure ML module to learn and use
More input ports than usual
Output ports for “data” and R device output
Adding additional “R” packages/scripts…
From script to running web service
Using the web service…
References
• https://marketbasket.cloudapp.net/
• https://datamarket.azure.com/dataset/amla/mba
• https://azureinfo.microsoft.com/CO-Azure-WBNR-FY15-11Nov-OperationalizingRasaWebService-ThankYou.html?aliId=8371905
• http://azure.microsoft.com/en-us/documentation/articles/machine-learning-r-csharp-web-service-examples/
Next meetup – will be mid-January 2015
• Slides and script from tonight will be posted to http://www.meetup.com/New-York-Azure-Machine-Learning-Meetup/files/
• Remaining slides have R script and illustrations.
R scripts used last night
• Loading and referencing an external R package
• Note: make sure to follow the steps on slide 8
# this is trivial and just used to show package load and test
install.packages("src/slam_0.1-32.zip", lib = ".", repos = NULL, verbose = TRUE)
install.packages("src/clue_0.3-48.zip", lib = ".", repos = NULL, verbose = TRUE)
install.packages("src/skmeans_0.2-6.zip", lib = ".", repos = NULL, verbose = TRUE)
library(skmeans, lib.loc = ".", verbose = TRUE)
# our package and libraries should be loaded up
# stuff <- packages.installed(skmeans)
# dataset1 <- maml.mapInputPort(1) # class: data.frame
# dataset2 <- maml.mapInputPort(2) # class: data.frame
# 20 x 500 matrix of random integers (sample size matches the matrix dimensions)
samp <- matrix(sample.int(1000, size = 20 * 500, replace = TRUE), nrow = 20, ncol = 500, dimnames = list(1:20, 1:500))
fit <- skmeans(samp, 5)
result <- data.frame(list(rownames(samp), fit$cluster), row.names = NULL)
colnames(result) <- c("sample row", "cluster")
print(result) # R console output
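The install calls in this script only work if the package archives were actually unpacked into the src/ folder. A minimal sketch of a pre-flight check is shown here; the helper name missing_archives is hypothetical and is not part of Azure ML or base R.

```r
# Hypothetical helper (not an Azure ML function): return the names of the
# expected package archives that are NOT present in the bundle directory.
missing_archives <- function(archives, bundle_dir = "src") {
  paths <- file.path(bundle_dir, archives)
  archives[!file.exists(paths)]
}

# Archive names taken from the script above; install order matters because
# skmeans depends on slam and clue.
expected <- c("slam_0.1-32.zip", "clue_0.3-48.zip", "skmeans_0.2-6.zip")
absent <- missing_archives(expected)
if (length(absent) > 0) {
  print(paste("Missing from src/:", paste(absent, collapse = ", ")))
}
```

Printing the missing names to the R console output makes a packaging mistake easier to spot than a failed library() call further down.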
Simple kmeans cluster
mydata <- maml.mapInputPort(1) # get our data from the R script input module instead of inline – this is the web service input signature
# parse and structure the input data to become a dataframe for the clustering
data.split <- strsplit(mydata[1,1], ",")[[1]]
data.split <- sapply(data.split, strsplit, ";", simplify = TRUE)
data.split <- sapply(data.split, strsplit, ";", simplify = TRUE)
data.split <- as.data.frame(t(data.split))
data.split <- data.matrix(data.split)
data.split <- data.frame(data.split)
# K-Means Cluster Analysis
fit <- kmeans(data.split, mydata$k) # k-cluster solution
# get cluster means
aggregate(data.split,by=list(fit$cluster),FUN=mean)
# append cluster assignment
mydatafinal <- data.frame(t(fit$cluster))
n_col=ncol(mydatafinal)
colnames(mydatafinal) <- paste("V",1:n_col,sep="")
mydatafinal
maml.mapOutputPort("mydatafinal") # this will become the web service publishing port, i.e. what is returned; the output must be a data frame, passed by name as a string
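The maml.* port functions exist only inside Azure ML, so the script cannot be run as-is on a local machine. The sketch below shows one way to try the same flow locally, under some assumptions: the two port functions are replaced with hypothetical stand-ins, the sample values are made up, and the parsing is a simplified variant rather than the exact steps used in the slide.

```r
# Hypothetical local stand-ins for the Azure ML port functions
# (these are NOT the real implementations).
ports <- new.env()
maml.mapInputPort <- function(port) get(paste0("in", port), envir = ports)
maml.mapOutputPort <- function(name) {
  # Look up the named data frame in the caller's environment and store it
  assign("out", get(name, envir = parent.frame()), envir = ports)
}

# Made-up sample input: rows separated by ",", cells by ";"
assign("in1", data.frame(value = "1; 3; 5, 2; 4; 6, 10; 30; 50, 11; 29; 52",
                         k = 2, stringsAsFactors = FALSE), envir = ports)

mydata <- maml.mapInputPort(1)

# Simplified parsing: split into rows, then cells, then bind into a numeric matrix
rows <- strsplit(mydata$value[1], ",", fixed = TRUE)[[1]]
cells <- strsplit(rows, ";", fixed = TRUE)
points <- do.call(rbind, lapply(cells, function(x) as.numeric(trimws(x))))

set.seed(42)
fit <- kmeans(points, centers = mydata$k[1], nstart = 5)

# One row, one column per input point, holding its cluster number
mydatafinal <- data.frame(t(fit$cluster))
colnames(mydatafinal) <- paste("V", seq_len(ncol(mydatafinal)), sep = "")
maml.mapOutputPort("mydatafinal")

result <- get("out", envir = ports)
print(result)
```

The cluster numbers themselves are arbitrary labels; what matters is which points share a label.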
Input schema and sample data for kmeans
This is hosted in the “R” script module
mydata <- data.frame(value = "1; 3; 5; 6; 7; 7, 5; 5; 6; 7; 2; 1, 3; 7; 2; 9; 56; 6, 1; 4; 5; 26; 4; 23, 15; 35; 6; 7; 12; 1, 32; 51; 62; 7; 21; 1", k=5, stringsAsFactors=FALSE)
maml.mapOutputPort("mydata") # this is the key, as it wires the sample schema above (the mydata data frame, passed by name) to the downstream receiver (see the next illustration)
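The value string above packs a table into one field: rows are separated by "," and the numbers within a row by ";". As an illustration only, a caller could build that string from a numeric matrix with a small helper such as the one below; encode_points is a hypothetical name, not something provided by Azure ML.

```r
# Hypothetical helper: encode a numeric matrix in the "value" string format,
# joining the cells of each row with "; " and the rows with ", ".
encode_points <- function(points) {
  rows <- apply(points, 1, function(r) paste(r, collapse = "; "))
  paste(rows, collapse = ", ")
}

# Made-up example: two points with three coordinates each
pts <- matrix(c(1, 3, 5,
                5, 5, 6), nrow = 2, byrow = TRUE)
mydata <- data.frame(value = encode_points(pts), k = 2, stringsAsFactors = FALSE)
print(mydata$value)
```

Keeping the whole table in a single string means the web service input signature stays at two fields (value and k) regardless of how many points are sent.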
A model that is ready to be published as a web service. Note the publishing icons on the lower module's input and output ports.
Input Schema
Simple kmeans cluster script