How to Extend RapidMiner 5

Upload: majdoline19

Post on 10-Feb-2018


  • 7/22/2019 How to Extend RapidMiner 5


    How to Extend RapidMiner 5
    White Paper

    Rapid-I
    www.rapid-i.com


    © 2012 by Rapid-I GmbH. All rights reserved.

    No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of Rapid-I GmbH.


    Contents

    1 Introduction

    2 Using the Scripting Operator
      2.1 Writing the Script
      2.2 Connecting with other Operators

    3 The RapidMiner data storage strategy
      3.1 The Example Table
      3.2 The ExampleSet and its Attributes
      3.3 More than one ExampleSet
      3.4 Changing data on the fly
      3.5 The ExampleSet layer stack

    4 Creating your own Extension

    5 Building Operators
      5.1 Our first operator
      5.2 Adding Ports
      5.3 Declaring operators to RapidMiner
      5.4 Adding preconditions to input ports
      5.5 Adding generation rules to the output ports
      5.6 Adding documentation to the operators
      5.7 Creating super operators
      5.8 Adding a PortExtender
      5.9 Adding meta data transformation rules
      5.10 Doing the work
      5.11 Defining parameters
      5.12 Using Parameters
      5.13 Adding dependencies to parameters

    6 Building special data objects
      6.1 Defining the object class
      6.2 Processing your own IOObjects
      6.3 Taking a look into your IOObject
      6.4 Leaving the 80s

    7 Publishing a RapidMiner Extension
      7.1 The extension bundle
      7.2 The ant build file

    8 Using advanced Extension mechanism
      8.1 The PluginInit class
      8.2 Adding custom configurators
        8.2.1 Usage
        8.2.2 Customizing the configuration panel
      8.3 Adding custom GUI elements
      8.4 Adding custom actions to the GUI


    1 Introduction

    If you are reading this tutorial, you have probably already installed RapidMiner 5 and gained some experience by playing around with the enormous set of operators. Chances are that you have been part of the RapidMiner Community for some time and that it has been quite a while since you last developed your own extension. Back then you might have developed for RapidMiner 4.x, in which case you will immediately notice the great number of changes from version 4.6 to 5.0:

    • The new flow layout gives a completely new quality of insight into your processes, even for untrained users.

    • The typed ports give detailed information about what kind of input is desired and make process design a much simpler game.

    • Where you had to remember the names of attributes in earlier versions, you can now select them from a drop-down menu, even if the process has never been run!

    These and several other improvements make the life of today's data analysts much easier, and they can spend much more time with their families instead of waiting for a restarted process because of a typo in an attribute's name.

    But even with the huge number of functions provided by RapidMiner, you will sometimes have a problem at hand that is unsolvable, or only solvable with what seems to be an overly complex process. Then you have two choices:


    On the one hand, you could use the built-in scripting operator to write a quick and dirty hack. If this solves your problem, very well, go ahead. The chapter Using the Scripting Operator will illustrate how to access the RapidMiner API without even starting an IDE.

    The other solution is to build your own extension to RapidMiner, providing new operators and new data objects with all the functionality of RapidMiner 5. This option is more heavyweight, so whether it is worth going this way really depends on the task at hand and the need for reusability.

    If it's a more general problem, or if you are going to implement something like a new learning scheme, building an extension is definitely the best way to let the community participate in your work: you let all members profit from your achievements, and they will give you valuable feedback. And always keep in mind that it's a good feeling to know that your piece of software is still used by someone and that you didn't waste all the time you spent hunting bugs.

    As a more experienced user, you might already have written a plug-in for the old versions of RapidMiner. Then you will be confronted with the downside of all the advantages of version 5: we unfortunately had to break backward compatibility with 4.x. All these features simply didn't fit into the old plug-in framework, and so we decided to publish a new extension mechanism rather than artificially limit its possibilities. That's why you will have to change some code in order to port your old plug-ins to RapidMiner 5.

    Where we thought it helpful, there will be short hints. To make them easy to recognize, these paragraphs are shaded light gray, so that you can skip uninteresting parts without missing valuable information.


    2 Using the Scripting Operator

    Let's assume we have the following situation: we get data from a machine that counts the seconds since it was switched on. Each entry in its log file carries this time stamp. Unfortunately, the other data sources we are going to use have an absolute time stamp, so we have to transform the relative format into a regular date and time format. Since RapidMiner doesn't provide an operator solving this particular problem, we decide to write a small script. This problem doesn't seem to be worth the effort of building a complete extension: we can't believe there are many other machines around that don't have an integrated clock, so we don't expect to be able to reuse an extension. Hence we prefer to build a simple process, which should do the trick:

    Figure 2.1: A simple process for applying a script

    As a first step we are going to load the data and then directly apply our script. As a last step we will do some date adjustment, but we will come back to this later. After loading, we have an ExampleSet consisting of a number of attributes describing the machine's state. They are called att1, att2 to att500. The time stamp is contained in an attribute named relative time. During scripting we can ignore the state attributes; we just want to focus on the single attribute relative time.

    Next we insert an Execute Script operator. It lets us implement a simple program written in the Groovy scripting language. This script can be entered in the script parameter of the operator. The language is very similar to Java, but if you need further documentation, you may refer to the Groovy homepage at http://groovy.codehaus.org/ .

    2.1 Writing the Script

    In the first step we have to get access to the ExampleSet that's delivered to the first port by the Retrieve operator.

    ExampleSet exampleSet = input[0];

    We now have the ExampleSet stored in a local variable and can use the whole RapidMiner API for accessing data. Since we are going to transform the relative time attribute, we use the Attributes object of the example set to retrieve this Attribute:

    Attributes attributes = exampleSet.getAttributes();
    Attribute sourceAttribute = attributes.get("relative time");

    We now have access to the attribute and its values stored inside the single examples. But we want to create a new date attribute, and we cannot change the type of an existing attribute. So we have to create a new one. We could give it any arbitrary name, but for now it seems reasonable to just wrap a date() around the old name. Therefore we extract the old name and create a new Attribute object:

    String newName = "date(" + sourceAttribute.getName() + ")";
    Attribute targetAttribute = AttributeFactory.createAttribute(newName, Ontology.DATE_TIME);


    If we execute this script, it will crash, because it doesn't know the Ontology class, which defines the value types of RapidMiner's attributes. To solve this problem, we have to import it manually, as we would have to do with any class that's not part of the standard imports. So we add the following line at the top of the script:

    import com.rapidminer.tools.Ontology;

    To put it all together, we should have a script like this:

    import com.rapidminer.tools.Ontology;

    ExampleSet exampleSet = input[0];
    Attributes attributes = exampleSet.getAttributes();
    Attribute sourceAttribute = attributes.get("relative time");
    String newName = "date(" + sourceAttribute.getName() + ")";
    Attribute targetAttribute = AttributeFactory.createAttribute(newName, Ontology.DATE_TIME);

    Now we have created a new attribute, but it has not yet been attached to any of the underlying data columns. What we have to do now is connect the new Attribute with the values of the old one. We could insert a new column into the data table, or just reuse the old one. Since reusing saves copying the data, we take this approach here. The mechanics of the data storage will be described in detail in the next chapter.

    targetAttribute.setTableIndex(sourceAttribute.getTableIndex());

    Now the new date attribute will use the old integer values as if they were dates. The problem is that the formats are not compatible: the date attribute stores dates as milliseconds since the 1st of January 1970, whereas the integer in our attribute contains the seconds since the first start-up of the machine. First we will tackle the problem of the wrong unit: we have to multiply each entry by 1000 to convert the seconds to milliseconds. The problem is that we cannot access the new attribute yet, because it isn't part of the example set. We will change that by adding it to the example set's attributes and removing the old attribute:

    attributes.addRegular(targetAttribute);
    attributes.remove(sourceAttribute);


    The only thing we have to do now is iterate over all examples, get the value of the attribute, multiply it by 1000 and write it back. This is fairly easy:

    for (Example example : exampleSet) {
        double timeStampValue = example.getValue(targetAttribute);
        example.setValue(targetAttribute, timeStampValue * 1000);
    }

    All we have to do now is return the example set. If we wanted to return more than one data object, we could wrap them in an array. The outgoing ports of the script operator then deliver the corresponding objects of the array: the first port the first element, the second port the second, and so on. This time we can simply return the single object, because we only have one output. The complete code now looks like this:

    import com.rapidminer.tools.Ontology;

    ExampleSet exampleSet = input[0];
    Attributes attributes = exampleSet.getAttributes();
    Attribute sourceAttribute = attributes.get("relative time");
    String newName = "date(" + sourceAttribute.getName() + ")";
    Attribute targetAttribute = AttributeFactory.createAttribute(newName, Ontology.DATE_TIME);
    targetAttribute.setTableIndex(sourceAttribute.getTableIndex());
    attributes.addRegular(targetAttribute);
    attributes.remove(sourceAttribute);

    for (Example example : exampleSet) {
        double timeStampValue = example.getValue(targetAttribute);
        example.setValue(targetAttribute, timeStampValue * 1000);
    }

    return exampleSet;

    2.2 Connecting with other Operators

    If you take a look at the screenshot above showing the process, you will see that we have connected the first port of the scripting operator with the following operator. We want to use this operator to adjust the date: we have written a script to transform the seconds since start-up into a date format, but this date is now relative to the 1st of January 1970 and not to the start-up time. So we want to use the Adjust Date operator to correct this. With the correct parameter settings, it will add the difference between the start-up time of the machine and the 1st of January 1970. But when trying to select the correct attribute, we notice one of the limitations of the scripting operator: it doesn't take care of the meta data of data objects. All information in the meta data is lost, so one cannot select the attribute in the drop-down list but has to type its name manually. The process then works, but if you have become used to the benefits of the meta data transformation, you probably won't like to lose them, especially not in a more complex process setup. The only way of not losing them when writing your own code is to build your own Extension to RapidMiner. The next chapters will show how this works and how meta data can be treated correctly.
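The arithmetic behind the two steps, the multiplication done in the script and the offset added afterwards by the date adjustment, can be sketched in plain Java outside of RapidMiner. This is only an illustration under stated assumptions: the class RelativeTimeConverter, the method toEpochMillis and the start-up time used in main are made up for this example and are not part of the RapidMiner API.

```java
import java.time.Instant;

// Hypothetical helper for illustration only; not part of the RapidMiner API.
class RelativeTimeConverter {

    /** Converts seconds since machine start-up into milliseconds since 1970-01-01T00:00:00Z. */
    static long toEpochMillis(double secondsSinceStart, Instant machineStart) {
        // Step 1 (done by the script): seconds -> milliseconds.
        long relativeMillis = Math.round(secondsSinceStart * 1000);
        // Step 2 (done by the date adjustment): shift by the start-up time.
        return machineStart.toEpochMilli() + relativeMillis;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2012-03-01T08:00:00Z"); // assumed start-up time
        long millis = toEpochMillis(90, start);
        System.out.println(Instant.ofEpochMilli(millis)); // prints 2012-03-01T08:01:30Z
    }
}
```

Keeping the two steps separate mirrors the process design above: the script only fixes the unit, while the offset stays a parameter that can be changed without touching the script.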


    3 The RapidMiner data storage strategy

    Chances are that you made your first contact with the RapidMiner API for accessing data in the script above. If you are already an experienced RapidMiner developer and have written plug-ins for RapidMiner 4.x, you are already familiar with the underlying data structures and may skip this part. Although there have been several improvements in the details, the concepts haven't changed.

    If you are still reading, you might ask why there's a complete section about such a simple thing as storing data. But storing data isn't as simple as it sounds if we have certain requirements, as they occur frequently in data mining tasks:

    • High data volume, with both a high number of rows, which might grow into the millions, and at the same time a high number of columns. Especially in text mining tasks, working on over 100,000 columns is very common.

    • Data might be sparse, which means that only a very small fraction of the entries differs from a default value.

    • Data is accessed in many different ways: sequentially or in random order, read or written or both.

    • Data manipulation is crucial, but not only single values have to be altered. In many applications whole columns or rows must be added or removed. For cross-validation, complete folds have to be selected or deselected.

    • Data might be of different types like numbers, dates, times, words or whole texts.

    • Some columns might have a different meaning, in reality as well as for the analysis. One might be the classification, others might be input from sensors.

    • The order of rows must be changeable; some algorithms need a random sequence, others a special ordering.

    These requirements need special treatment, and this makes everything a little more complex. What you have seen in the script example above was the surface of a layer concept, which we will now describe in detail. In the next section we will begin with the foundation: the ExampleTable.

    3.1 The Example Table

    The ExampleTable is designed for storing the actual raw data. On this first level, the data doesn't have any meaning yet and is always saved as numbers. It is organized row-wise, which means that the single values are first bundled into their rows and these rows are then combined into a table. Hence each row must have exactly the same number of columns.

    Figure 3.1: The inner structure of an ExampleTable. Columns exist only logically, as indicated by the dotted lines.


    We see this in the image above, where the single numerical values are shown as black boxes inside the grey boxes of the rows. The columns are logically present, which means each value can be addressed using the column index, but since the columns are not represented by objects, they are only indicated by the dotted lines. The ExampleTable combines an arbitrary number of these rows, which are represented by the DataRow interface.

    There are several implementations of the DataRow interface, which either use different Java number types like double, float or int for data storage or save the row in a sparse manner: values different from zero are stored together with an index, so if one retrieves the value of column x, the array of indices is searched for x and, if it is found, the respective value is returned. The different data types can reduce memory consumption, since a float only consumes four bytes and thus saves four bytes compared to a double. But this is paid for with a loss of precision: rounding errors might occur or, if you switch to integer representation, the fractional part is lost.
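The sparse lookup and the precision trade-off described above can be sketched in a few lines of plain Java. This is a simplified illustration, not the actual RapidMiner implementation: the class name SparseRowSketch, the use of a binary search over sorted indices and the default value of 0.0 are assumptions made for this example.

```java
import java.util.Arrays;

// Simplified illustration; the real DataRow implementations in RapidMiner differ in detail.
class SparseRowSketch {
    private final int[] indices;   // sorted column indices of the stored entries
    private final double[] values; // values[i] belongs to column indices[i]

    SparseRowSketch(int[] sortedIndices, double[] values) {
        this.indices = sortedIndices;
        this.values = values;
    }

    /** Returns the stored value of the column, or the default 0.0 if nothing is stored. */
    double get(int column) {
        int position = Arrays.binarySearch(indices, column);
        return position >= 0 ? values[position] : 0.0;
    }

    public static void main(String[] args) {
        SparseRowSketch row = new SparseRowSketch(new int[] {2, 7}, new double[] {1.5, -3.0});
        System.out.println(row.get(7)); // -3.0
        System.out.println(row.get(4)); // 0.0, nothing stored for column 4

        // A float has fewer significant digits than a double, so 0.1 does not survive the round trip.
        System.out.println((double) (float) 0.1 == 0.1); // false
    }
}
```

Only two of the (conceptually arbitrary many) columns occupy memory here, which is the point of the sparse representation when most entries equal the default value.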

    3.2 The ExampleSet and its Attributes

    Semantics and typing are introduced in the next layer, the ExampleSet layer. An ExampleSet is built on top of an ExampleTable and represents the ExampleTable's columns and rows as Attributes and Examples. For the sake of simplicity we will first stick to numerical Attributes. The following image shows how a simple ExampleSet is connected with an underlying ExampleTable. It consists of only four attributes called att1, att2, att3 and att4 and has a size of only two Examples.

    In this image, the long dashed lines are references, while the dotted lines stand for implicit logical content that's not really stored there. We can see that the examples are not materialized; they just consist of a reference to the respective row in the table and to the ExampleSet's Attributes. That's the reason why one should under no circumstances try to keep references to examples: they are only views on the underlying row of the table. If the row's values are changed or the complete row is discarded, accessing the values will fail or, even worse, deliver unexpected wrong results!


    Figure 3.2: A simple ExampleSet built on top of an ExampleTable. References are shown by the long dashed lines.

    The Attributes are used to access the correct column in the table. As depicted, att3 references column four in the table, while att4 references the third column. There's no specific guarantee on the ordering; the attributes keep track of the columns they refer to. The mechanism to retrieve a value by calling getValue(Attribute) on an example is as follows:

    1. The Example will retrieve the corresponding DataRow from its ExampleSet's parent ExampleTable.

    2. The Example will ask the DataRow to deliver the value of the Attribute by calling get(Attribute).

    3. The DataRow will ask the Attribute to retrieve the value from the correct column of itself by invoking getValue(DataRow).
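The three steps above can be mimicked with a toy model in plain Java. The classes ToyAttribute, ToyDataRow and ToyExample are hypothetical stand-ins invented for this sketch; they only borrow the method names mentioned in the text and are far simpler than the real RapidMiner classes.

```java
// Toy model of the three-step lookup; all class names are hypothetical stand-ins.
class LookupChainSketch {

    static class ToyAttribute {
        final String name;
        final int tableIndex; // column of the table this attribute refers to

        ToyAttribute(String name, int tableIndex) {
            this.name = name;
            this.tableIndex = tableIndex;
        }

        // Step 3: the attribute picks its own column out of the row.
        double getValue(ToyDataRow row) {
            return row.data[tableIndex];
        }
    }

    static class ToyDataRow {
        final double[] data;

        ToyDataRow(double... data) {
            this.data = data;
        }

        // Step 2: the row hands the request over to the attribute.
        double get(ToyAttribute attribute) {
            return attribute.getValue(this);
        }
    }

    static class ToyExample {
        private final ToyDataRow row; // Step 1: the row fetched from the table

        ToyExample(ToyDataRow row) {
            this.row = row;
        }

        double getValue(ToyAttribute attribute) {
            return row.get(attribute);
        }
    }

    public static void main(String[] args) {
        ToyDataRow row = new ToyDataRow(10.0, 20.0, 30.0, 40.0);
        ToyAttribute att3 = new ToyAttribute("att3", 3); // fourth column
        ToyAttribute att4 = new ToyAttribute("att4", 2); // third column
        ToyExample example = new ToyExample(row);
        System.out.println(example.getValue(att3)); // 40.0
        System.out.println(example.getValue(att4)); // 30.0
    }
}
```

Because the attribute, not the row, decides how a value is read, an attribute can later be replaced by one that computes its value, without the example or the row noticing.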

    The same path is taken when writing values into an Example. Although this mechanism seems more complex than it needs to be, we will see that it allows a flexible view concept that wouldn't be possible otherwise. We are now familiar with how to retrieve values but, as mentioned above, we have concentrated on numerical values. How are nominal values stored and accessed? The underlying ExampleTable only stores numbers, so how is this possible? The key to this is the Attribute object. It stores not only a name, which is printed bold in the picture above, and a type like numerical, nominal or date, but may also contain a NominalMapping. This object is a map translating the numerical values into Strings and vice versa. So if you want to set an Example's value of a nominal attribute, you can call:

    example.setValue(attribute, "new value");

    And for getting the nominal value:

    String value = example.getNominalValue(attribute);

    If the value is unknown, a new entry in the mapping will be created. The index of this mapping entry will be stored as the numerical value in the ExampleTable. So be careful when directly manipulating the ExampleTable or when accessing the indices behind the nominal values! Changes might result in undesired behaviour. The methods for manipulating the numerical values look quite different, and we have already used them in the script example. Nevertheless, we will describe them again in more detail:

    double value = 9d;
    example.setValue(attribute, value);

    And for getting the numerical value:

    double value = example.getValue(attribute);

    One special value is the missing value. There are several possible reasons why a specific value might be missing, and we have to cope with that. In RapidMiner several operators handle missing values, but what do we do during programming? Missing values are simply encoded as Double.NaN. So you will receive a NaN when getting the value and have to pass a NaN when you want to set a value to unknown. On nominal attributes you could simply pass null as the String for the nominal value.
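How a mapping between strings and numerical codes can work, and how a NaN should be tested, is sketched below in plain Java. The class NominalMappingSketch and its methods mapString and mapIndex are assumptions made for this illustration and are not the RapidMiner NominalMapping class; only Double.NaN and Double.isNaN are standard Java.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for a nominal mapping; not the RapidMiner NominalMapping class.
class NominalMappingSketch {
    private final Map<String, Integer> indexOf = new HashMap<>();
    private final List<String> valueOf = new ArrayList<>();

    /** Returns the numerical code of the string and creates a new entry if it is unknown. */
    double mapString(String value) {
        if (value == null) {
            return Double.NaN; // a missing nominal value is stored as NaN
        }
        Integer index = indexOf.get(value);
        if (index == null) {
            index = valueOf.size();
            indexOf.put(value, index);
            valueOf.add(value);
        }
        return index;
    }

    /** Translates a stored number back into its string; NaN is reported as null. */
    String mapIndex(double stored) {
        return Double.isNaN(stored) ? null : valueOf.get((int) stored);
    }

    public static void main(String[] args) {
        NominalMappingSketch mapping = new NominalMappingSketch();
        double stored = mapping.mapString("yes");     // first unknown value gets code 0
        System.out.println(stored);                   // 0.0
        System.out.println(mapping.mapIndex(stored)); // yes

        double missing = Double.NaN;
        System.out.println(missing == Double.NaN);    // false: NaN never equals anything
        System.out.println(Double.isNaN(missing));    // true
    }
}
```

The last two lines show the usual pitfall: a missing value has to be detected with Double.isNaN, because a comparison with == is false even for NaN itself.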


    Besides being used for accessing the data, the Attribute object holds additional information about the column. We have already seen that an Attribute is of a certain type, which is depicted by the small n in the graphic: n for numerical attributes, nom for nominals. There are a few other types like date and time, and the subtypes of nominal: text, polynominal and binominal.

    How the attribute is used during analysis is controlled by its role. There are several predefined roles like label, prediction, cluster, weight, batch and several more. You are free to set user-defined roles in RapidMiner using the Set Role operator, but these are not interpreted by RapidMiner operators. All attributes with a role have in common that they are not treated as regular attributes and hence are not used for analysis, unless they are required in their special role, like the label for learning from examples. The Attributes object of an ExampleSet manages the special roles. It offers several methods for manipulating these roles. Please keep in mind that iterating over the single Attributes of an Attributes object only iterates over the regular attributes! If you want all attributes, the allAttributes() method must be used.

    3.3 More than one ExampleSet

    We have seen how an ExampleSet works on top of its ExampleTable. In RapidMiner one is frequently confronted with situations where more than one ExampleSet is processed at a time. Does every ExampleSet have its own ExampleTable, even if they differ only in the presence of some Attributes? No, they don't. Multiple ExampleSets can share one ExampleTable, although each ExampleSet can only refer to one ExampleTable. So it is possible that there are different attribute sets giving a view on the same underlying data.

    In the image above, two ExampleSets share a common table. The first attribute of the second ExampleSet even shares a complete column with the other ExampleSet, although this doesn't have to be the case, as seen in the other columns. The columns are kept until no ExampleSet references them and are then removed from memory.


    Figure 3.3: Two ExampleSets sharing an ExampleTable

    The setting above is frequently used, for example, in an attribute selection process. We don't want to remove the column from memory each time we deselect an attribute to test the performance of the remaining set. Most of the time we have to re-add it later, and it would not be efficient to reload the complete ExampleSet. Instead, we can simply use a copy of the original ExampleSet or add the Attribute again.

    One potential danger, which one always has to keep in mind, is marked by the red cells. They are now shared by two ExampleSets. If we change the value in one of the ExampleSets, it will be changed in the other one, too, because the underlying data is changed. This can be very confusing, especially if the attributes have different names (here att1 and kunde). Please take care of this by either building a materialized copy in your RapidMiner process or using on-the-fly calculations for the changed values.
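The danger of shared cells, and the effect of a materialized copy, can be demonstrated with a small toy model in plain Java. ToyView and materialize are hypothetical names invented for this sketch; a plain two-dimensional array stands in for the ExampleTable.

```java
// Toy model, not RapidMiner code: two views sharing one table.
class SharedTableSketch {

    /** A view that exposes selected columns of a table without copying it. */
    static class ToyView {
        private final double[][] table; // shared reference, not a copy
        private final int[] columns;    // table column behind each attribute

        ToyView(double[][] table, int... columns) {
            this.table = table;
            this.columns = columns;
        }

        double get(int row, int attribute) {
            return table[row][columns[attribute]];
        }

        void set(int row, int attribute, double value) {
            table[row][columns[attribute]] = value;
        }
    }

    /** Builds an independent (materialized) copy of the table. */
    static double[][] materialize(double[][] table) {
        double[][] copy = new double[table.length][];
        for (int i = 0; i < table.length; i++) {
            copy[i] = table[i].clone();
        }
        return copy;
    }

    public static void main(String[] args) {
        double[][] table = { {1.0, 2.0, 3.0}, {4.0, 5.0, 6.0} };
        ToyView first = new ToyView(table, 0, 1);  // its first attribute uses column 0
        ToyView second = new ToyView(table, 0, 2); // so does the first attribute here
        ToyView copy = new ToyView(materialize(table), 0, 2);

        first.set(0, 0, 99.0);
        System.out.println(second.get(0, 0)); // 99.0, changed through the other view
        System.out.println(copy.get(0, 0));   // 1.0, the materialized copy is unaffected
    }
}
```

The copy costs memory for every cell, which is why sharing is the default and materializing is a deliberate decision.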

    3.4 Changing data on the fly

    There are many situations where you want to change all values of a column in the same way but don't want to alter the underlying data. Take normalization as an example, where each value is transformed in the same way, but you must use the unchanged data elsewhere in the process. In this case you can perform the calculation each time a value is requested. This might even save computation time and memory if the values are requested only once, as is frequently the case when applying a model or even during training for some models.

    The class that does this is the ViewAttribute. It wraps around another Attribute, which can even be another ViewAttribute, to retrieve the value and then delegates the actual computation to a ViewModel. The computed value is then returned as the result. One Attribute can be shared by several ViewAttributes. The image below depicts this.

    Figure 3.4: Two binominal ViewAttributes indicate if the numerical att3 was either 1 or 2
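The wrapping idea can be sketched in plain Java as follows. The interface ToyAttribute and the classes ColumnAttribute and ToyViewAttribute are hypothetical simplifications invented for this example; a java.util.function.DoubleUnaryOperator plays the role that the ViewModel has in RapidMiner.

```java
import java.util.function.DoubleUnaryOperator;

// Toy model of on-the-fly computation; class names are hypothetical stand-ins.
class ViewAttributeSketch {

    interface ToyAttribute {
        double getValue(double[] row);
    }

    /** Reads a value directly from a column of the row. */
    static class ColumnAttribute implements ToyAttribute {
        private final int tableIndex;

        ColumnAttribute(int tableIndex) {
            this.tableIndex = tableIndex;
        }

        public double getValue(double[] row) {
            return row[tableIndex];
        }
    }

    /** Wraps another attribute and transforms its value each time it is requested. */
    static class ToyViewAttribute implements ToyAttribute {
        private final ToyAttribute parent;
        private final DoubleUnaryOperator model; // stands in for the ViewModel

        ToyViewAttribute(ToyAttribute parent, DoubleUnaryOperator model) {
            this.parent = parent;
            this.model = model;
        }

        public double getValue(double[] row) {
            return model.applyAsDouble(parent.getValue(row));
        }
    }

    public static void main(String[] args) {
        double[] row = {0.0, 0.0, 2.0};
        ToyAttribute att3 = new ColumnAttribute(2);
        // Two views share the same parent attribute, as in Figure 3.4.
        ToyAttribute isOne = new ToyViewAttribute(att3, v -> v == 1.0 ? 1.0 : 0.0);
        ToyAttribute isTwo = new ToyViewAttribute(att3, v -> v == 2.0 ? 1.0 : 0.0);
        System.out.println(isOne.getValue(row)); // 0.0
        System.out.println(isTwo.getValue(row)); // 1.0
        System.out.println(row[2]);              // 2.0, the stored data is untouched
    }
}
```

Since a view is itself an attribute in this model, views can be nested, which matches the remark that a ViewAttribute may wrap another ViewAttribute.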

    3.5 The ExampleSet layer stack

    With all the functionality described above, we still can't solve problems like sampling or sorting. This is achieved by stacking ExampleSets of different functionality. One might reorder the examples by storing an array for translating the indices. Another might skip some examples of the underlying set and realize a sampling this way. All this can be done with different subclasses of ExampleSet. Please refer to the JavaDoc for further information. Each of them delegates the functions it does not implement itself to the parent ExampleSet. So an ExampleSet that realizes a sampling delegates the attribute handling to its parent. The principle is shown in the image below, where the attributes are drawn with dotted lines to indicate that they are only logically present.

    Figure 3.5: The stacking of two ExampleSets to realize a sampling. The attributes are taken from the parent.
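The stacking principle can be sketched with an index mapping in plain Java. The interface ToySet and the classes BaseSet and MappedSet are hypothetical names invented for this illustration and do not correspond to concrete RapidMiner classes.

```java
// Toy model of stacked example sets; all names are hypothetical stand-ins.
class LayerStackSketch {

    interface ToySet {
        int size();
        int attributeCount();
        double getValue(int example, int attribute);
    }

    /** Bottom layer: reads directly from a (non-empty) table. */
    static class BaseSet implements ToySet {
        private final double[][] table;

        BaseSet(double[][] table) {
            this.table = table;
        }

        public int size() { return table.length; }
        public int attributeCount() { return table[0].length; }
        public double getValue(int example, int attribute) { return table[example][attribute]; }
    }

    /** Exposes only the parent's examples listed in the mapping, in that order. */
    static class MappedSet implements ToySet {
        private final ToySet parent;
        private final int[] mapping; // own example index -> parent example index

        MappedSet(ToySet parent, int[] mapping) {
            this.parent = parent;
            this.mapping = mapping;
        }

        public int size() { return mapping.length; }
        public int attributeCount() { return parent.attributeCount(); } // delegated to the parent
        public double getValue(int example, int attribute) {
            return parent.getValue(mapping[example], attribute);
        }
    }

    public static void main(String[] args) {
        ToySet base = new BaseSet(new double[][] { {1.0}, {2.0}, {3.0}, {4.0} });
        ToySet reversed = new MappedSet(base, new int[] {3, 2, 1, 0}); // reordering
        ToySet sample = new MappedSet(reversed, new int[] {0, 2});     // sampling on top
        System.out.println(sample.size());         // 2
        System.out.println(sample.getValue(0, 0)); // 4.0
        System.out.println(sample.getValue(1, 0)); // 2.0
    }
}
```

The same class serves for reordering and for sampling here; only the content of the index array differs, and no data is copied in either case.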


    4 Creating your own Extension

    When you are going to build your own Extension, you will need Java version 1.6 or above as well as an IDE like Eclipse. The example projects that come with this tutorial are Eclipse projects, so we strongly recommend using Eclipse, which is freely available at Eclipse.org. On our website you will find a tutorial on how to check out the latest version of RapidMiner from the svn repository. Please test whether it starts by creating a debug configuration and starting the RapidMinerGUI class.

    If started from Eclipse, RapidMiner will only allocate as much RAM as is the default for any Java program: 64 MB. Since this is really insufficient for most real data mining applications, you will have to increase this. Select Run / Debug Configurations... and select the one for RapidMiner. Go to the Arguments tab and enter -Xmx256m. You may enter any number after -Xmx, but ensure that this many megabytes of RAM are available. Especially on 32-bit systems the maximum is relatively low, around 1.5 GB.
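Whether the heap setting actually reached the JVM can be checked with a few lines of plain Java. This is a small sketch independent of RapidMiner; the class name HeapCheck is made up for this example, while Runtime.maxMemory() is part of the standard Java library.

```java
// Small stand-alone check; HeapCheck is a hypothetical name, not part of RapidMiner.
class HeapCheck {

    /** Maximum amount of heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Started with the same arguments as the RapidMiner debug configuration, the printed value should be close to the number entered after -Xmx; the JVM may report slightly less than the configured amount.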

    After you have done this, we will add two additional projects. One is the tutorial extension, which already contains everything described in the next chapters; whenever you are not sure, there is example code. The other one is an Extension template, where you only have to change a few file names and entries to adapt it for your own Extension. You can use it while reading to experiment with your own implementations of what is described here.

    Together with this tutorial you received two zip files. Each of them contains one of the projects, which we will now import into Eclipse.


    1. Select Import... from the File menu.

    2. When the selection menu for the project type opens, select Existing Projects into Workspace from the General folder and click Next.

    3. The Import Projects page appears. Select the radio button in front of Select archive file: and select one of the two zip files with the Browse button.

    4. The project will be listed in the Projects window. Select it by checking it and click Finish.

    5. The project will show up in the Package Explorer. Repeat the steps for the second zip file.

    After this, you should have three projects, and the Package Explorer should look like the picture below.

    Figure 4.1: Our three projects

    Now you can start implementing. If you want to deploy your Extension to RapidMiner for testing purposes, you can execute the install target of the ant file build.xml. Please make sure that the RapidMiner_Vega project is named exactly as above, because the ant file references RapidMiner by this name. Otherwise the deployment will not work without changing the file. We will go into detail later on how to adapt the build file.


    5 Building Operators

    There are two types of operators in RapidMiner: normal operators and those which contain one or more subprocesses. We call the second type super operators to differentiate them from the normal operators. To get some training, we will start by implementing a normal operator. Once finished, we will show how to transfer these techniques to super operators and which special concerns might arise there.

    5.1 Our first operator

    What do we have to do if we want to implement the above functionality in an Extension instead of a script? Basically we have to write a special class. If you have decided on an IDE other than Eclipse, create a new project and make sure that RapidMiner is on the class path, either as a .jar file or checked out from sourceforge.net into a separate project. Our website contains additional information and a guide on how to check out RapidMiner.

    The next step is to create the new class. Each normal operator has to extend Operator or a subclass of Operator. There are many subclasses for more specialized operators like learning or preprocessing operators, but we will focus on the simplest case. If you are interested in more, take a look at the type hierarchy of Operator in the API documentation or in the IDE itself.

    Once you have created your class, you must implement a one-argument constructor receiving an OperatorDescription as parameter. RapidMiner needs this in order to create the operator. The class file will look like this:


        package com.rapidminer.operator.preprocessing.transformation;

        import com.rapidminer.operator.Operator;
        import com.rapidminer.operator.OperatorDescription;

        /**
         * This is the Numerical2Date tutorial operator.
         *
         * @author Sebastian Land
         */
        public class Numerical2DateOperator extends Operator {

            /**
             * Constructor
             */
            public Numerical2DateOperator(OperatorDescription description) {
                super(description);
            }
        }

    5.2 Adding Ports

    Before writing the working part of the operator, we want to define ports for getting input from the process and for delivering results. Operators without any ports are not recommended, since their execution order in the process would be undefined.

    How do we define these ports? You simply add them as private variables using the following lines of code:

        private InputPort exampleSetInput = getInputPorts().createPort("example set");
        private OutputPort exampleSetOutput = getOutputPorts().createPort("example set");

    Please note that you have to set unique names for the ports of one operator. If you want to follow the naming convention, write the names in lower case and use blanks to separate words. If you added this operator to your process, you would see that the two ports are already attached. Here's how it would look:

    Figure 5.1: Your new operator

    But in contrast to the usual ports of RapidMiner operators, they are simply white. Normally a port is drawn in the color of the object type that has to be fed into it. If it is not connected to a port generating an object of the desired type, half of the port will be drawn in a warning red. We will come to this. For now, we just want to see how we can add some functionality to the operator.

    For this we have to override the following method:

        @Override
        public void doWork() throws OperatorException {

        }

    The default implementation simply does nothing, but we can now add the functionality described in detail in the Scripting chapter above. We just have to change the way of getting the input and delivering the result. Take a look at the first and the last line:

        @Override
        public void doWork() throws OperatorException {
            ExampleSet exampleSet = exampleSetInput.getData();
            Attributes attributes = exampleSet.getAttributes();
            Attribute sourceAttribute = attributes.get("relative time");
            String newName = "date(" + sourceAttribute.getName() + ")";
            Attribute targetAttribute = AttributeFactory.createAttribute(newName, Ontology.DATE_TIME);
            targetAttribute.setTableIndex(sourceAttribute.getTableIndex());
            attributes.addRegular(targetAttribute);
            attributes.remove(sourceAttribute);

            for (Example example : exampleSet) {
                double timeStampValue = example.getValue(targetAttribute);
                example.setValue(targetAttribute, timeStampValue * 1000);
            }

            exampleSetOutput.deliver(exampleSet);
        }

    We see that one call suffices to retrieve the ExampleSet from the input port, and the single deliver call in the last line passes the result to the output port. We could execute this operator and would receive the same output as with the scripting operator above.
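
    The multiplication by 1000 in the loop can be tried out in isolation. The sketch below assumes that the source values are timestamps in seconds and that the date attribute expects milliseconds, as java.util.Date does; the class TimestampDemo and the sample value are hypothetical, and no RapidMiner classes are used.

```java
import java.util.Date;

/** Hypothetical stand-alone illustration; no RapidMiner classes are used. */
class TimestampDemo {

    /** Converts a timestamp in seconds to milliseconds, as the loop body does. */
    static double toMillis(double seconds) {
        return seconds * 1000;
    }

    public static void main(String[] args) {
        double relativeTime = 90.0; // assumed input value in seconds
        double millis = toMillis(relativeTime);
        // java.util.Date interprets its argument as milliseconds since the epoch
        System.out.println(new Date((long) millis));
    }
}
```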

    If you have already written operators for previous RapidMiner versions, you will remember the two methods getInputClasses and getOutputClasses, which defined the input and output classes back then. The simplest way to migrate is to delete these now needless methods and create one port per input object. If your operator doesn't use a fixed number of objects, you could insert a PortExtender, but we will come back to this when describing super operators.

    Besides this, you will have to exchange the main working method. Instead of the deprecated apply method you now have to implement the doWork method. Since it doesn't receive anything as input and is of type void, you are forced to use the ports for retrieving input and delivering output.

    5.3 Declaring operators to RapidMiner

    Once we have implemented an operator, we want to test it in RapidMiner. Unfortunately RapidMiner isn't prophetic (actually it could be, but using data mining methods for guessing class usage would be overkill), so we will have to specify the operator in a file. The file in the template project is called OperatorsTemplate.xml, but in general we will refer to this file as the operator descriptor. RapidMiner knows which file is the descriptor because it is linked by a property in the manifest file of the Extension's jar. We don't have to bother now with how this works, but we will take care of it later on. So let's take a look at how to specify operators to RapidMiner:

        <?xml version="1.0" encoding="UTF-8" standalone="no"?>
        <operators name="template" version="5.0" docbundle="com/rapidminer/resources/i18n/OperatorsDocTemplate">
            <group key="">
                <group key="data_transformation">
                    ...
                    <operator>
                        <key>numerical_to_date</key>
                        <class>com.rapidminer.operator.preprocessing.transformation.Numerical2DateOperator</class>
                        <replaces>Numerical2Date</replaces>
                    </operator>
                    ...
                </group>
            </group>
        </operators>

    While the first line only contains information about the XML format used, the second line contains several important properties. The name attribute must be the namespace as specified in the manifest; version must currently be fixed at 5.0. The most important attribute, docbundle, must link to another XML file, which contains the documentation for the operators. There the behavior of each operator should be described in detail to guide other users when utilizing an extension.

    The child tags of operators reflect the group structure in RapidMiner's New Operators tree. The group with the empty key corresponds to the invisible root of the operator tree. Custom operators and groups might be inserted only as children of this root. Each group and operator has a key that should consist only of lower case letters, digits and underscores. In RapidMiner these keys are translated to a language dependent name using one of the documentation bundles. As you might see from the above example, operators are simply inserted as child tags of groups. They must contain two child tags: Besides the key tag, there must be a class tag containing the qualified class name of the implementing class.
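
    The key rule stated above (lower case letters, digits and underscores only) is easy to check before a descriptor is deployed. The following is a minimal sketch under that rule; the class OperatorKeys is hypothetical and not part of the RapidMiner API, and the empty key of the invisible root group is deliberately treated as a special case that this check rejects.

```java
import java.util.regex.Pattern;

/** Hypothetical helper, not part of RapidMiner: checks descriptor keys. */
class OperatorKeys {

    // one or more lower case letters, digits or underscores
    private static final Pattern KEY_PATTERN = Pattern.compile("[a-z0-9_]+");

    /** Returns true if the key follows the naming rule for groups and operators. */
    static boolean isValidKey(String key) {
        return key != null && KEY_PATTERN.matcher(key).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidKey("numerical_to_date"));
        System.out.println(isValidKey("Numerical2Date"));
    }
}
```

    The first call accepts the key used in the descriptor above, while the second rejects the old 4.x style name because it contains upper case letters.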


    Optionally there might be a replaces tag. It specifies what this operator was called in 4.x versions of RapidMiner. If it is set, each operator with that name will automatically be replaced with this new operator when a 4.x process is imported. That might be important when renaming operators to obey the new naming schema.

    Once we have saved a file like this, which adds an operator to RapidMiner, we only need to execute the ant target install to deploy the Extension to RapidMiner. The ant target should be executed and its status messages should be logged to the Console view. They should look like this:

        createJar:
             [echo] Creating jar...
             [echo] Manifest Classpath:
            [mkdir] Created dir: C:\RapidMiner_Vega\release\libfiles
              [jar] Building jar: C:\RapidMiner_Vega\release\rapidminer-TemplateExtension-5.0.jar
           [delete] Deleting directory C:\RapidMiner_Vega\release\libfiles
        install:
             [move] Moving 1 file to C:\RapidMiner_Vega\lib\plugins
        BUILD SUCCESSFUL
        Total time: 5 seconds

    Now there should be a rapidminer-TemplateExtension-5.0.jar file in the lib/plugins directory of the RapidMiner project. RapidMiner will load all Extensions on the next start-up.

    Again, for this to work, RapidMiner needs to be stored in the same workspace and under the same name as depicted above. Otherwise the path entries in the build.xml of the Extension project must be adapted!

    5.4 Adding preconditions to input ports

    As we have seen after restarting RapidMiner, the operator already works, but it does not alert the user if nothing is connected or if a port delivering an object of the wrong type is connected to the input port. We probably want to change this behavior to ease the use of the operator. This can be done by adding preconditions to the ports. These preconditions register errors if they are not fulfilled, and they are registered at construction time of the operator. So we will have to add a few code fragments to the constructor. For example, this precondition will check if a compatible IOObject is delivered:

        public Numerical2DateOperator(OperatorDescription description) {
            super(description);

            exampleSetInput.addPrecondition(new SimplePrecondition(exampleSetInput, new MetaData(ExampleSet.class)));
        }

    Since this is one of the most common cases, there is a shortcut to achieve this. We can specify the target IOObject class already when constructing the input port:

        private InputPort exampleSetInput = getInputPorts().createPort("example set", ExampleSet.class);

    There are many more special preconditions, which for example test if an example set satisfies some conditions, if it contains a special attribute with a specific role, or if an attribute with a given name is present. In this case, we could add a precondition that tests if the attribute relative time is part of the input example set.

        exampleSetInput.addPrecondition(new ExampleSetPrecondition(exampleSetInput, new String[] { "relative time" }, Ontology.ATTRIBUTE_VALUE));

    The ExampleSetPrecondition is more powerful than required here. In fact, it can check not only whether fixed names are part of the example set, but also whether the regular attributes are of a certain type, which special attributes have to be contained and of which type they must be. We don't need this here, so we chose a constructor ignoring most options and passed the most general value type so that no condition is imposed. If we insert the operator into a process without connecting an example set output port to our input port, an error is shown. If we attach an example set without the relative time attribute, the following warning is shown:


    Figure 5.2: A warning is shown if the precondition is not fulfilled.

    In contrast to the getInputClasses / getOutputClasses approach of 4.x, much more detailed conditions can now be formulated. You might even write your own precondition to check any information that is part of the meta data. You could even create your own errors with special error messages and Quick Fixes.

    5.5 Adding generation rules to the output ports

    If we take a look at our process, there is still something missing. Although we now get the behavior on the input port that we know from RapidMiner's operators, we still have an uncolored output port, and the subsequent operator alerts us that it doesn't receive the correct object.

    Figure 5.3: Half the way done

    The problem is that our operator still doesn't do any transformation of the meta data. It already makes use of the meta data to check the preconditions, but doesn't deliver any meta data to the output port. We can change this by adding generation rules in the constructor:

        public Numerical2DateOperator(OperatorDescription description) {
            super(description);

            exampleSetInput.addPrecondition(new ExampleSetPrecondition(exampleSetInput, new String[] { "relative time" }, Ontology.ATTRIBUTE_VALUE));

            getTransformer().addPassThroughRule(exampleSetInput, exampleSetOutput);
        }

    This rule will simply pass the received meta data to the output port. This will cause the warning to vanish, but then the meta data doesn't reflect the data that is actually delivered: As you remember, we change not only the name of one attribute, but also its value type. This should be reflected in the meta data, and that's why we have to implement a more specialized transformation rule. We can do this using an anonymous class, so it will look like this:

        getTransformer().addRule(new ExampleSetPassThroughRule(exampleSetInput, exampleSetOutput, SetRelation.EQUAL) {
            @Override
            public ExampleSetMetaData modifyExampleSet(ExampleSetMetaData metaData) throws UndefinedParameterError {
                return metaData;
            }
        });

    Of course this won't do anything except pass the received meta data to the output port, as long as we don't change the meta data. But we now have a hook where we can grab the meta data and change it, so that it reflects the changes made to the data during execution of this operator. After adding some meaningful code, the method will look like this:

        public ExampleSetMetaData modifyExampleSet(ExampleSetMetaData metaData) throws UndefinedParameterError {
            AttributeMetaData timeAMD = metaData.getAttributeByName("relative time");
            if (timeAMD != null) {
                timeAMD.setType(Ontology.DATE_TIME);
                timeAMD.setName("date(" + timeAMD.getName() + ")");
                timeAMD.setValueSetRelation(SetRelation.UNKNOWN);
            }
            return metaData;
        }


    If we insert the operator into a process, we will see that the meta data is now correctly transformed and every alert vanishes. We are now even able to select the attribute for the Adjust Date operator in the drop down list.

    Figure 5.4: The result of our work: The meta data correctly describes the resulting data.

    5.6 Adding documentation to the operators

    Of course it should be natural to add documentation to the program code. But this does not help the end user, who never sees any part of the program code. So we must offer other help. Unfortunately that will cost us some effort and might double the work for documentation. But users and developers have completely different perspectives, so it would not be helpful to give the user any developer's comments anyway. It is time to introduce the new documentation mechanism of RapidMiner 5.

    As we have mentioned above, there's a link to an operator documentation bundle in the operator descriptor file. This file is called OperatorsDocTemplate.xml in the template project we created above. It not only offers the possibility to enter a full length description of the operator, but also assigns a more readable and explanatory name than the key, as well as a synopsis of the help. The structure this file must have is quite simple:

        <?xml version="1.0" encoding="windows-1252" standalone="no"?>
        <operatorHelp>
            <group>
                <key>data_transformation</key>
                <name>Data Transformation</name>
            </group>
            <operator>
                <name>ExperimentEmbedder</name>
                <synopsis>...</synopsis>
                <help>...</help>
            </operator>
        </operatorHelp>

    The second line contains the XML root node operatorHelp. A sequence of two kinds of tags might be added as children of this element: the group tag and the operator tag. The group tag translates the key of a group into a language specific name. The operator tag offers three child tags. The name tag does the translation of the key, while synopsis and help might contain arbitrary escaped HTML text for documenting the operator's behaviour, as one would enter into the body tag of an HTML page. To escape the text, each < and > must be replaced by the corresponding XML entities &lt; and &gt;. Please keep in mind that the rendering capacity of the help window is limited. One should stick to rather simple HTML.
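
    Escaping the help text by hand is error prone, so it can be done with a few lines of code before pasting the result into the documentation bundle. This is a minimal sketch: the class HelpTextEscaper is hypothetical and not part of RapidMiner. Besides the two replacements named above it also escapes the ampersand, which is an addition assumed here because a bare & is not allowed in XML text either.

```java
/** Hypothetical helper, not part of RapidMiner: escapes HTML for the help XML. */
class HelpTextEscaper {

    /** Replaces the XML special characters &, < and > by their entities. */
    static String escape(String html) {
        // the ampersand must be replaced first, otherwise the entities
        // produced by the later replacements would be escaped again
        return html.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(escape("<b>Converts</b> a numerical attribute to a date."));
    }
}
```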

    5.7 Creating super operators

    Sometimes an operator relies on the execution of other operators. And sometimes these operators should be user defined. Take the cross-validation as an example: The user might specify the learner and the way performance is measured, and the cross-validation then executes these subprocesses as needed. This section describes how you can implement your own super operators.

    Let's assume we have a process that should be executed once every minute, checking something inside a database. If you had the RapidMiner Enterprise Analytics Server, this would be only two clicks away. But the order is stuck somewhere inside another department and you need a solution really fast. So let's build a super operator that re-executes its inner operators every minute. In order to do this, we again have to create a new class, but this time it has to extend the OperatorChain class. The name of the super class is somewhat misleading, because there is no chain anymore, but we stick to this name for historical reasons. As with a simple operator, we have to implement a constructor. The empty class looks like this:

        /**
         * This super operator will execute its inner process infinitely
         * once every minute.
         * @author Sebastian Land
         */
        public class LoopInfinitely extends OperatorChain {

            /**
             * Constructor
             */
            public LoopInfinitely(OperatorDescription description) {
                super(description, "Executed Process");
            }
        }

    In contrast to the simple operator, we must give the super constructor the names of the subprocesses we are going to create inside our super operator. The number of names we pass to the super constructor determines the number of subprocesses created. If you want to follow the naming convention, you should start each word in upper case and use blanks to separate words. Later we can access these subprocesses by index to execute them. But let's first define some ports to pass data to the super operator.


    5.8 Adding a PortExtender

    We could do this in exactly the same manner as we did with the simple operator. But since we don't know which data should be passed to the inner process, we want to do it in a more general way, so that the user is able to pass any number and any type of objects to the inner process. You might know this behavior from the Loop operator of RapidMiner. The code for adding this PortPairExtender looks like this:

        private final PortPairExtender inputPortPairExtender = new PortPairExtender("input", getInputPorts(), getSubprocess(0).getInnerSources());

    Besides the PortPairExtender there's also a PortExtender available, but we want an equal number of input and output ports. The PortPairExtender takes care of this, so we don't have to do anything else. Let's take a closer look at the constructor. In addition to the name, we have to specify to which input ports the extender should attach. The getInputPorts method delivers the input ports of the current operator, so the port extender is attached on the left side of the operator box.

    The paired ports are added to the inner sources of the first subprocess. You see that you can access the subprocesses via the getSubprocess method. If you are familiar with RapidMiner's integrated super operators like the Loop operator, you know that there are always input ports on the left and output ports on the right of the subprocess. To distinguish these ports from the input and output ports of the super operator, we call them inner sources and inner sinks. In fact an inner source is technically an output port of the super operator, because the super operator has to deliver data to this port, while an inner sink is an input port of the super operator from which it can retrieve the output of the subprocess.

    If we wanted to deliver outputs from our loop, we could add the following second variant of the PortPairExtender to collect the outputs from all iterations and pass them as a collection to the output of our super operator:

        private final CollectingPortPairExtender outExtender = new CollectingPortPairExtender("output", getSubprocess(0).getInnerSinks(), getOutputPorts());


    This would result in something like this:

    Figure 5.5: Our port extenders which return a collection on the right

    But since we want to run infinitely, we will never return anything. So we omit this change and get back to the first PortPairExtender. In order to make a port extender work, we have to initialize it at construction time of the operator. You simply have to add the following line to the constructor:

        inputPortPairExtender.start();

    5.9 Adding meta data transformation rules

    To have proper meta data available at the output ports, we have to add some rules. The problem is that we don't know the number of ports, since they are created at process design time. To cope with that, the port extender itself is able to generate the correct pass through rules:

        getTransformer().addRule(inputPortPairExtender.makePassThroughRule());

    If we take a look inside our operator, we see a strange behaviour. Although there is meta data information present at the sources, the inner operators don't seem to recognize it. They don't do anything with the information.

    Figure 5.6: The meta data transformation of the inner operators seems to be dead.

    The reason for this is that we have to add a rule defining when the meta data of the subprocess has to be transformed. The ordering of the rule definitions is crucial, because if the meta data isn't forwarded to the inner ports, there's nothing the meta data transformation of the inner operators can do. This line will add the rule:

        getTransformer().addRule(new SubprocessTransformRule(getSubprocess(0)));

    Finally, with the rules in the correct order, our operator looks like this:

        public class LoopInfinitely extends OperatorChain {

            private final PortPairExtender inputPortPairExtender = new PortPairExtender("input", getInputPorts(), getSubprocess(0).getInnerSources());

            /**
             * Constructor
             */
            public LoopInfinitely(OperatorDescription description) {
                super(description, "Executed Process");

                inputPortPairExtender.start();

                getTransformer().addRule(inputPortPairExtender.makePassThroughRule());
                getTransformer().addRule(new SubprocessTransformRule(getSubprocess(0)));
            }
        }


    5.10 Doing the work

    What's still missing in our operator is code that calls the subprocess. The idea is pretty simple: First pass the input data to the inner sources. Since it never changes, we can do this outside the loop. Then loop infinitely and execute the inner process. To ensure that we can stop the process using the stop button, we should call the method checkForStop inside the loop. A better alternative, especially for looping operators, is the inApplyLoop method. It not only checks if the process must be stopped, but also resets the loop time of this operator, so that it can be accessed by the Log operator. So we decide on the latter:

        @Override
        public void doWork() throws OperatorException {
            inputPortPairExtender.passDataThrough();
            while (true) {
                inApplyLoop();
                getSubprocess(0).execute();
            }
        }

    You see that we have full control over which subprocess is executed when. In contrast to the old RapidMiner versions, where the subprocesses were rather implicitly defined by the position of the child operators inside the chain, they are now clearly separated. This not only eases process design and makes a process easier to understand, but makes writing super operators easier, too.

    Moreover, the old and complex method for defining which operator has to deliver which class is gone; this is now handled the same way as for all operators. All you have to do is reformulate the old getInnerOperatorCondition method as a new input port precondition.

    5.11 Defining parameters

    That's already very nice and does the infinite execution. But we have the problem that we want the process to be executed every minute. And since this interval might change or be different in other settings, we want to avoid hard coding it. It's now time to define our first parameter. Parameters are presented to the users in the parameter tab of RapidMiner, where they can alter the parameters' values. There are several types of parameters available for defining real or integer numbers, strings, and collections of strings in comboboxes, either editable or not. Special types for selecting an attribute or several attributes are available, too. The most complex parameter type might even define its own GUI component as a configuration wizard.

    Parameters might be either normal or expert parameters. The latter aren't shown unless the user has switched to expert mode. So it's good practice to define as expert those parameters whose effect is only understandable by those who have deeper knowledge of the underlying algorithm. All of these parameters must have default values; otherwise the user is bothered with defining a parameter he cannot understand. That would be even worse than showing it with a reasonable default value.

    Further guidance might be offered to the user by defining parameter dependencies. Some parameters are only used if other parameters are set to specific values. A simple and well known example is the use of a local random seed. Many of RapidMiner's operators offer the possibility to take random numbers from a local random generator instead of using the global random number sequence. This is useful for ensuring reproducible results in sub parts of your process. If you want to use such a local random generator, it must be initialized with a so called seed. So if you check the parameter use local random seed of the X-Validation operator, a field is shown to insert such a seed. Technically the field is shown because all its dependencies are satisfied. This time there is only one, namely that the use local random seed parameter has to be checked, but in general there might be arbitrary conditions.

    Using these dependencies shows the user in each situation which parameters will have an effect, and he isn't bothered with irrelevant parameters. If you are familiar with the great number of parameters that kernel based methods like the SVM offer, you will probably understand immediately why this is important.

    Let's do something practical and add parameters to our operator. In fact, we just have to override one method:

        @Override
        public List<ParameterType> getParameterTypes() {
            return super.getParameterTypes();
        }

    We see that we must return a list of ParameterTypes. If we are extending another operator or some abstract class providing basic functionality, we have to call the super method in order to retrieve the parameters defined there. Otherwise the functionality provided by the super class might fail, because we haven't defined the needed parameters.

    For now, we want to add a parameter defining the number of seconds between the starts of subprocess executions. Using an integer for that, it would look like this:

        @Override
        public List<ParameterType> getParameterTypes() {
            List<ParameterType> types = super.getParameterTypes();
            types.add(new ParameterTypeInt(PARAMETER_FREQUENCY, "This parameter defines the number of seconds between the start of two subsequent subprocess executions.", 1, Integer.MAX_VALUE, 5, false));
            return types;
        }

    First of all we retrieve the list of ParameterTypes of the super class and then add our own parameter. It is of type integer and is named by the public constant PARAMETER_FREQUENCY. The following string describes the functionality of this parameter type and is shown in the tool tip of this parameter. The three integer values define the minimal, the maximal and the default value. The last argument determines whether the parameter is expert or not. In this case we decided that this parameter is quite understandable.

    Before we can take a look at the result, we have to add the constant to the class. This is important to give API users access to the parameters if they want to use this operator internally. Otherwise they would have to retype the string, and if the parameter name were then changed for any reason, be it a typo or something similar, each class using it would have to be adapted, too. To avoid this, simply define a public constant:

        public static final String PARAMETER_FREQUENCY = "frequency";

    The Parameters tab now would look like this:

    Figure 5.7: The parameter tab showing our new parameter

    5.12 Using Parameters

    After we have defined the parameter, we want to use it to avoid executing our subprocess too frequently. At first we have to retrieve the value the user has entered and store it in a local variable:

        int secondsBetweenStarts = getParameterAsInt(PARAMETER_FREQUENCY);

    Now we are going to use the sleep functionality of Java's threads to ensure that we pause. Since this isn't RapidMiner specific, it will not be explained in detail, but the code finally looks like this:

        @Override
        public void doWork() throws OperatorException {
            int secondsBetweenStarts = getParameterAsInt(PARAMETER_FREQUENCY);

            inputPortPairExtender.passDataThrough();
            while (true) {
                checkForStop();
                long start = System.currentTimeMillis();
                getSubprocess(0).execute();
                long end = System.currentTimeMillis();

                // 1000L forces long arithmetic, so large parameter values cannot overflow
                long wait = (secondsBetweenStarts * 1000L) - (end - start);
                if (wait > 0) { // if we have to wait anyway
                    try {
                        Thread.sleep(wait);
                    } catch (InterruptedException e) {
                        // Don't do anything: Only executing too early
                    }
                }
            }
        }
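The waiting time computed inside the loop is the requested interval minus the time the subprocess itself consumed. The following self-contained sketch isolates this arithmetic; the class WaitTimeSketch and the method remainingWaitMillis are hypothetical names chosen for illustration and do not appear in the operator above.

```java
public class WaitTimeSketch {

    /**
     * Returns the number of milliseconds to sleep so that two subsequent
     * starts are at least secondsBetweenStarts apart, or 0 if the last
     * execution already took longer than that.
     */
    static long remainingWaitMillis(int secondsBetweenStarts, long startMillis, long endMillis) {
        long elapsed = endMillis - startMillis;
        // 1000L keeps the multiplication in long arithmetic
        long wait = secondsBetweenStarts * 1000L - elapsed;
        return Math.max(0L, wait);
    }

    public static void main(String[] args) {
        // execution took 1.2 s, 5 s between starts requested: sleep 3.8 s
        System.out.println(remainingWaitMillis(5, 10000L, 11200L)); // prints 3800
        // execution took 7 s: the next start is already overdue
        System.out.println(remainingWaitMillis(5, 10000L, 17000L)); // prints 0
    }
}
```

Keeping the computation in a small method of its own makes it easy to check the boundary cases without starting a process.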

    5.13 Adding dependencies to parameters

    Chances are we want the process to re-execute as fast as possible. We could enter something like zero into the parameter field to achieve this, but this isn't very self-explanatory. To avoid this, we are going to add a Boolean parameter determining if there's any time restriction for the execution. Only if this one is checked do we want the user to see the parameter field for the seconds. So we introduce another parameter with its constant:

        public static final String PARAMETER_RESTRICT_FREQUENCY = "restrict_frequency";

        ...

        @Override
        public List<ParameterType> getParameterTypes() {
            List<ParameterType> types = super.getParameterTypes();
            types.add(new ParameterTypeBoolean(PARAMETER_RESTRICT_FREQUENCY, "If checked, the frequency of subprocess execution might be restricted.", false, false));

            ParameterType type = new ParameterTypeInt(PARAMETER_FREQUENCY, "This parameter defines the number of seconds between the start of two subsequent subprocess executions.", 1, Integer.MAX_VALUE, 5, false);
            type.registerDependencyCondition(new BooleanParameterCondition(this, PARAMETER_RESTRICT_FREQUENCY, true, true));
            types.add(type);

            return types;
        }

    To register the condition, we had to keep the type in a local variable, which must be added to the list separately. But then it's fairly easy to add a condition. Here we add a BooleanParameterCondition, which needs a reference to a ParameterHandler. For operators, this is the operator itself. The second argument is the name of the referenced parameter. Of the two Boolean values, the first indicates whether the parameter becomes mandatory when the condition is satisfied, and the second defines the value the referenced parameter must have for the condition to be satisfied.
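The idea behind such a condition can be sketched in plain Java. This is a simplified model and not RapidMiner's implementation: the class BooleanCondition, the settings map and the key string "restrict_frequency" are assumptions made for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class DependencySketch {

    /** Hypothetical stand-in for a condition on a Boolean parameter. */
    static class BooleanCondition {
        private final Map<String, Boolean> values;
        private final String referencedKey;
        private final boolean requiredValue;

        BooleanCondition(Map<String, Boolean> values, String referencedKey, boolean requiredValue) {
            this.values = values;
            this.referencedKey = referencedKey;
            this.requiredValue = requiredValue;
        }

        /** The dependent parameter is only relevant while this returns true. */
        boolean isFulfilled() {
            Boolean current = values.get(referencedKey);
            return current != null && current.booleanValue() == requiredValue;
        }
    }

    public static void main(String[] args) {
        Map<String, Boolean> settings = new HashMap<String, Boolean>();
        settings.put("restrict_frequency", false);
        BooleanCondition condition = new BooleanCondition(settings, "restrict_frequency", true);

        System.out.println(condition.isFulfilled()); // prints false: hide the frequency field
        settings.put("restrict_frequency", true);
        System.out.println(condition.isFulfilled()); // prints true: show the frequency field
    }
}
```

Note that the condition only controls what the user sees. The operator's doWork method would still have to read the Boolean parameter itself and skip the waiting when the restriction is switched off.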

    The resulting parameter tab now looks like this, depending on the parameter settings:

    Figure 5.8: The parameter tab without restrict frequency checked

    Now you already have all the basic knowledge you need to write your first own operator for RapidMiner. For further details about the classes available in RapidMiner, you might refer to the API documentation, which is available as a download on our website at rapid-i.com. The next chapter will show how you can extend the functionality of RapidMiner not only by adding operators, but also by adding new data objects to pass between the operators.


    Figure 5.9: The parameter tab with restrict frequency checked: The conditioned parameter is shown


    6 Building special data objects

    If you are from the scientific community or trying to integrate RapidMiner with another program, you will sooner or later face the problem that the standard data objects don't fulfill all your requirements. Let's assume, for example, that you are going to analyze data recorded from some sort of game engine. You are planning to use machine learning algorithms to make the characters played by the computer a little bit smarter. The format in which the original data comes can't be expressed directly as a table. So you have to write some preprocessing steps anyway, and you decide to do this in RapidMiner. The plan is to make everything as modular as possible. Although you could simply write one operator that reads the data from a file and does all the translation and feature extraction, you decide that it would be best to split it up. With this modularity, it will be much easier to extend the mechanism later on and to optimize the steps separately.

    This can be achieved as follows. Users who know the time series or the text processing extension are already familiar with this approach. We have one super operator which loads the data and passes it to an inner subprocess. Inside this subprocess, a special data object representing the current data is passed from one operator to the next, each one changing the data or adding new information. This added data is finally written into a table, which is returned as an ExampleSet to the subsequent RapidMiner operators, which then do the actual learning. We have already learned how to build operators, both normal and super operators, and how to pass data between them. Now we are going to define a new data object.


    6.1 Defining the object class

    First of all, we have to define a new class that holds the information we need. This class must implement the interface IOObject, but it is recommended to extend ResultObjectAdapter instead. This abstract class already implements much of the generic functionality and is suitable for most cases. Only in special circumstances, where you already have a class that holds the game data and provides some important functionality, might it be a better idea to extend that class and let it implement the interface. An empty implementation would look like this:

        package com.rapidminer.game;

        import com.rapidminer.operator.ResultObjectAdapter;

        /**
         * This class contains the game data, recorded during
         * runtime of the game.
         *
         * @author Sebastian Land
         */
        public class GameDataIOObject extends ResultObjectAdapter {

            private static final long serialVersionUID = 1725159059797569345L;
        }

    This is only an empty object that doesn't hold any information. We will add some content now:

        package com.rapidminer.game;

        import com.rapidminer.operator.ResultObjectAdapter;

        /**
         * This class contains the game data, recorded during
         * runtime of the game.
         *
         * @author Sebastian Land
         */
        public class GameDataIOObject extends ResultObjectAdapter {

            private static final long serialVersionUID = 1725159059797569345L;

            private GameData data;

            public GameDataIOObject(GameData data) {
                this.data = data;
            }

            public GameData getGameData() {
                return data;
            }
        }

    This class already gives access to an object of the class GameData, which shall stand for everything we want to access. This might be more complex in real-world applications, but it shows how things work in general. Now we want to extract attribute values from the game data, which the super operator can store in a table. This data table might then be returned as an example set for learning. The extraction should be done by operators contained in the super operator's subprocess. Each of them could retrieve the GameData from the GameDataIOObject and attach one or more attributes. Only one GameData is treated per execution of the subprocess, and each becomes a single example of the resulting ExampleSet.

    So we need a mechanism to add data to the IOObject. To keep things simple, we assume that we only have numerical attributes. This way we save the effort of remembering the correct types of the data. Let's add a Map as a member variable for storing the values by identifier:

        private Map<String, Double> valueMap = new HashMap<String, Double>();

    Then we extend the GameDataIOObject with two methods for accessing the map:

        /**
         * This sets a value of this GameDataIOObject, which is later on extracted
         * as an attribute in the resulting ExampleSet.
         */
        public void setValue(String identifier, double value) {
            valueMap.put(identifier, value);
        }

        /**
         * For extracting all identifiers / values
         */
        public Map<String, Double> getValueMap() {
            return valueMap;
        }
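How these two accessors behave can be tried out in isolation. The sketch below uses a hypothetical ValueHolder class instead of the real GameDataIOObject, so it runs without RapidMiner. It also swaps the HashMap from the text for a LinkedHashMap, which is an assumption of this sketch: it keeps the identifiers in insertion order, which makes the later attribute order predictable. The identifier "Health" is invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ValueHolderSketch {

    /** Hypothetical stand-in for the value-related part of GameDataIOObject. */
    static class ValueHolder {
        // LinkedHashMap keeps insertion order, so identifiers are listed
        // in the order in which the extraction steps added them.
        private final Map<String, Double> valueMap = new LinkedHashMap<String, Double>();

        public void setValue(String identifier, double value) {
            valueMap.put(identifier, value);
        }

        public Map<String, Double> getValueMap() {
            return valueMap;
        }
    }

    public static void main(String[] args) {
        ValueHolder holder = new ValueHolder();
        holder.setValue("Age", 42.0);
        holder.setValue("Health", 0.75);
        holder.setValue("Age", 43.0); // a second call with the same identifier overwrites

        System.out.println(holder.getValueMap()); // prints {Age=43.0, Health=0.75}
    }
}
```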

    6.2 Processing your own IOObjects

    Using these methods, we can now implement our first operator, which extracts properties of the GameData. Let's assume each situation in the game is about a character of a specific age. We might want to extract its age as an attribute. To do that, we are going to build an ExtractAgeOperator. The idea is that this operator is executed in the subprocess, attaches the age as a value to the GameDataIOObject it receives, and returns the object again. From there it is passed to the next operator and so on. To implement this logic, we will first exercise what we have learned in the section "Creating super operators" and implement the super operator:

        import java.util.LinkedList;
        import java.util.List;

        import com.rapidminer.example.ExampleSet;
        import com.rapidminer.operator.OperatorChain;
        import com.rapidminer.operator.OperatorDescription;
        import com.rapidminer.operator.OperatorException;
        import com.rapidminer.operator.ports.InputPort;
        import com.rapidminer.operator.ports.OutputPort;
        import com.rapidminer.operator.ports.metadata.SubprocessTransformRule;

        /**
         * This operator will feed all GameData objects to its inner subprocess and
         * will execute it in order to build an example set from the extracted
         * key-value pairs.
         *
         * @author Sebastian Land
         */
        public class ProcessGameDataOperator extends OperatorChain {

            private OutputPort innerGameDataSource = getSubprocess(0).getInnerSources().createPort("game data");
            private InputPort innerGameDataSink = getSubprocess(0).getInnerSinks().createPort("game data");
            private OutputPort exampleSetOutput = getOutputPorts().createPort("example set");

            public ProcessGameDataOperator(OperatorDescription description) {
                super(description, "Property Extraction");

                /*
                 * Very short and insufficient meta data transformation: Should be much
                 * more sophisticated.
                 */
                getTransformer().addGenerationRule(innerGameDataSource, GameDataIOObject.class);
                getTransformer().addRule(new SubprocessTransformRule(getSubprocess(0)));
                getTransformer().addGenerationRule(exampleSetOutput, ExampleSet.class);
            }

            @Override
            public void doWork() throws OperatorException {
                List<GameData> loadedData = new LinkedList<GameData>();
                loadedData.add(new GameData());
                /*
                 * Iterate over all GameData objects and feed them through the subprocess one by one.
                 * Extending ExampleSet each time by one example.
                 */
                ExampleSet resultSet = null;
                for (GameData gameData : loadedData) {
                    innerGameDataSource.deliver(new GameDataIOObject(gameData));
                    getSubprocess(0).execute();
                    GameDataIOObject result = innerGameDataSink.getData();

                    if (resultSet == null)
                        resultSet = createInitialExampleSet(result);
                    else
                        extendExampleSet(resultSet, result);
                }

                exampleSetOutput.deliver(resultSet);
            }

            /**
             * This method has to extend the given resultSet by the example extracted from
             * the result object.
             */
            private void extendExampleSet(ExampleSet resultSet, GameDataIOObject result) {
            }

            /**
             * This will create the first initial example set from the result object.
             * At first the MemoryExampleTable will be created for storing the data, then
             * for each entry in the map an attribute is created and put together into an
             * example set.
             */
            private ExampleSet createInitialExampleSet(GameDataIOObject result) {
                return null;
            }
        }


    Figure 6.1: The above operator after inserting it into a process

    Of course this operator still lacks all real functionality, which would consist of reading the game data from a source of some kind, probably depending on some parameter settings specifying the location. But the previous sections should have made clear which steps to take for such a task.
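The two methods createInitialExampleSet and extendExampleSet are left empty in the listing. Their logic can be sketched independently of RapidMiner: the first result defines the columns, and every result contributes one row. The class SimpleTable below is a hypothetical stand-in for the example table, and filling missing identifiers with NaN is an assumption of this sketch, not something the listing prescribes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TableBuilderSketch {

    /** Hypothetical stand-in for the example table: column names plus numeric rows. */
    static class SimpleTable {
        final List<String> columns = new ArrayList<String>();
        final List<double[]> rows = new ArrayList<double[]>();
    }

    /** Creates the table from the first result: every identifier becomes one column. */
    static SimpleTable createInitialTable(Map<String, Double> values) {
        SimpleTable table = new SimpleTable();
        table.columns.addAll(values.keySet());
        extendTable(table, values);
        return table;
    }

    /** Appends one row; identifiers missing in this result are stored as NaN. */
    static void extendTable(SimpleTable table, Map<String, Double> values) {
        double[] row = new double[table.columns.size()];
        for (int i = 0; i < row.length; i++) {
            Double value = values.get(table.columns.get(i));
            row[i] = (value == null) ? Double.NaN : value.doubleValue();
        }
        table.rows.add(row);
    }

    public static void main(String[] args) {
        Map<String, Double> first = new LinkedHashMap<String, Double>();
        first.put("Age", 30.0);
        first.put("Health", 0.5);
        Map<String, Double> second = new LinkedHashMap<String, Double>();
        second.put("Age", 41.0); // "Health" was not extracted this time

        SimpleTable table = createInitialTable(first);
        extendTable(table, second);

        System.out.println(table.columns);                      // prints [Age, Health]
        System.out.println(table.rows.size());                  // prints 2
        System.out.println(Double.isNaN(table.rows.get(1)[1])); // prints true
    }
}
```

In the real operator, the columns would become attributes of the MemoryExampleTable mentioned in the code comment, and each row would become one example.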

    Now we want to build one of the inner operators:

        package com.rapidminer.operator.game.extractors;

        import com.rapidminer.game.GameDataIOObject;
        import com.rapidminer.operator.Operator;
        import com.rapidminer.operator.OperatorDescription;
        import com.rapidminer.operator.OperatorException;
        import com.rapidminer.operator.ports.InputPort;
        import com.rapidminer.operator.ports.OutputPort;

        /**
         * A simple extractor of properties of a game data object.
         *
         * @author Sebastian Land
         */
        public class ExtractAgeOperator extends Operator {

            /** defining the ports */
            private InputPort gameDataInput = getInputPorts().createPort("game data", GameDataIOObject.class);
            private OutputPort gameDataOutput = getOutputPorts().createPort("game data");

            /**
             * The default constructor needed in exactly this signature
             */
            public ExtractAgeOperator(OperatorDescription description) {
                super(description);

                /* Adding a rule for meta data transformation: GameData will be passed through */
                getTransformer().addPassThroughRule(gameDataInput, gameDataOutput);
            }

            @Override
            public void doWork() throws OperatorException {
                GameDataIOObject input = gameDataInput.getData();

                extractValues(input);

                gameDataOutput.deliver(input);
            }

            /**
             * This method could extract arbitrary properties from the GameData and put them as key-value pairs into
             * the GameDataIOObject. Each pair will become a single attribute in the resulting ExampleSet and hence
             * each execution of the subprocess must result in exactly the same number of pairs.
             * Otherwise for some examples there are undefined attributes.
             */
            private void extractValues(GameDataIOObject input) {
                input.setValue("Age", input.getGameData().getAge());
            }
        }

    This is just a simple example of extracting one attribute, adding it, and passing the object on. Of course it is a good idea to let this operator inherit from an AbstractExtractionOperator which already provides all functionality that is shared among all extraction operators. Then only the method extractValues has to be implemented, and one can concentrate on the real problem of extracting the values. The image below shows a subprocess with four extraction operators.
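The suggested split between shared and specific logic is the classic template method pattern. The following sketch shows it in plain Java under simplifying assumptions: GameRecord, AbstractExtractor and AgeExtractor are hypothetical stand-ins for the data object, the AbstractExtractionOperator and a concrete extraction operator, and the port handling is reduced to a simple method call.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ExtractionTemplateSketch {

    /** Hypothetical stand-in for the data object passed between the extractors. */
    static class GameRecord {
        final double age;
        final Map<String, Double> values = new LinkedHashMap<String, Double>();

        GameRecord(double age) {
            this.age = age;
        }
    }

    /** Shared logic: receive the object, let the subclass extract, pass it on. */
    static abstract class AbstractExtractor {
        public final GameRecord process(GameRecord input) {
            extractValues(input);
            return input;
        }

        protected abstract void extractValues(GameRecord input);
    }

    /** A concrete extractor only has to implement extractValues. */
    static class AgeExtractor extends AbstractExtractor {
        @Override
        protected void extractValues(GameRecord input) {
            input.values.put("Age", input.age);
        }
    }

    public static void main(String[] args) {
        GameRecord record = new AgeExtractor().process(new GameRecord(27.0));
        System.out.println(record.values); // prints {Age=27.0}
    }
}
```

Declaring process as final keeps the receive, extract and deliver sequence identical for all extractors, so a new extractor cannot accidentally forget to pass the object on.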


    Figure 6.2: The sub process containing several extraction operators like the one described above

    Of course it's possible to build more complex constructions. You might think of splitting and merging the GameDataIOObject, or building loops and conditions inside the subprocess. The latter might be achieved by creating new super operators. Every way of treating your own IOObjects is possible by combining what we have learned.

    6.3 Taking a look into your IOObject

    When building a process for your own IOObjects, you will notice that it's an incredibly valuable feature to set breakpoints within the process and take a look at what's contained in the objects. To continue our example above, it would be interesting to see which values have been extracted. If we set a breakpoint, RapidMiner will display the result of the toString method as the default fallback.

    Figure 6.3: If nothing else is defined, RapidMiner will return the default String representation as result.

    There's plenty of space one could fill with information about the object. How could we do this? The simplest approach would be to override the toString method of the IOObject. However, it's more suitable to override the toResultString method, which by default only calls the toString method. Anybody who has debugged a complex program with huge data objects knows the problems arising when the toString method is too chatty: The IDE will hang for seconds until the huge string is built. This can be avoided by implementing it in the following way:

        @Override
        public String toResultString() {
            StringBuilder builder = new StringBuilder();
            builder.append("The following values have been extracted:\n");
            for (String key : getValueMap().keySet()) {
                builder.append(key + ":\t" + getValueMap().get(key) + "\n");
            }

            builder.append("\n\nThe data:\n");
            builder.append(data.toString());

            return builder.toString();
        }
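The division of labour between the two methods can be tried out without RapidMiner. In the sketch below, Holder is a hypothetical stand-in for the IOObject: toString stays cheap, so a debugger can evaluate it at any time, while toResultString builds the detailed listing only when it is explicitly requested.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ResultStringSketch {

    /** Hypothetical stand-in for an IOObject with extracted values. */
    static class Holder {
        private final Map<String, Double> valueMap = new LinkedHashMap<String, Double>();

        void setValue(String identifier, double value) {
            valueMap.put(identifier, value);
        }

        /** Cheap summary, safe to evaluate in a debugger. */
        @Override
        public String toString() {
            return "Holder with " + valueMap.size() + " extracted values";
        }

        /** Detailed listing, built only when a result view asks for it. */
        public String toResultString() {
            StringBuilder builder = new StringBuilder();
            builder.append("The following values have been extracted:\n");
            for (Map.Entry<String, Double> entry : valueMap.entrySet()) {
                builder.append(entry.getKey()).append(":\t").append(entry.getValue()).append("\n");
            }
            return builder.toString();
        }
    }

    public static void main(String[] args) {
        Holder holder = new Holder();
        holder.setValue("Age", 27.0);
        System.out.println(holder);                // short summary
        System.out.print(holder.toResultString()); // full listing
    }
}
```

Iterating over entrySet instead of keySet avoids a second map lookup per entry, which is a minor design choice of this sketch.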


    Figure 6.4: The result of overwriting the method.

    6.4 Leaving the 80s

    Although text output has its advantages, writing Courier characters on screen has seemed a little bit outdated since the late eighties. How do we add nice representations to the output, as is done with nearly all core IOObjects of RapidMiner?

    RapidMiner uses a renderer concept for displaying the various types of IOObjects. There's a configuration file specifying which renderers are used for which classes of IOObjects. We will see how to extend this XML file later, but for now we want to concentrate on implementing a renderer for our GameDataIOObject.

    The interface Renderer must be implemented for this purpose. Here we extend the AbstractRenderer, which already implements most of the methods for us. Most of the methods are used for handling parameters, since renderers might have parameters as operators do. They are used during automatic reporting of objects and control the output. The handling of these parameters and their values is done by the abstract class; all we have to do is to take their values into account when rendering. Here are the methods we have to implement:

        public class GameDataRenderer extends AbstractRenderer {

            @Override
            public Reportable createReportable(Object renderable, IOContainer ioContainer, int desiredWidth, int desiredHeight) {
                return null;