RapidMiner Operator Reference Manual
© 2014 by RapidMiner. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without prior written permission of RapidMiner.
Preface
Welcome to the RapidMiner Operator Reference, the final result of a long working process. When we first started to plan this reference, we had an extensive discussion about the purpose of this book. Would anybody want to read the whole book, starting with Ada Boost and reading the description of every single operator up to X-Validation? Or would it only serve for looking up particular operators, although that is also possible in the program interface itself?

We decided on the latter, and as the book progressed we realized how futile this entire discussion had been. It was not long until the book reached the 600-page limit, and now we have nearly hit 1000 pages, which is far more than anybody would like to read in its entirety. Even if there were a great structure, with explanations of how to use groups of operators serving as guiding transitions between the descriptions of single operators, nobody could comprehend all that. The reader would have long forgotten about the Loop Clusters operator by the time he gets to know about cross-validation. So we didn't put any effort into that, and hence the book has become a pure reference.

This is not a suitable document for getting to know RapidMiner itself. For that, we recommend reading the manual as a starting point. There are other documents available for particular scenarios, like using RapidMiner as a researcher or extending its functionality. Please take a look at our website, rapidminer.com, for an overview of the available documentation.
From that fact, we can draw some suggestions about how to read this book:
Whenever you want to know about a particular operator, just open the index
at the end of this book, and directly jump to the operator. The order of the
operators in this book is determined by the group structure in the operator tree,
as you will immediately see when taking a look at the contents. As operators for
similar tasks are grouped together in RapidMiner, these operators are also near
to each other in this book. So if you are interested in broadening your perspective
of RapidMiner beyond an already known operator, you can continue reading a
few pages before and after the operator you picked from the index.
Once you have read the description of an operator, you can jump to the tutorial
process, which explains a possible use case. Often the functionality of an operator
can be understood more easily in the context of a complete process. All these
processes are also available in RapidMiner. You simply need to open the description
of the operator in the help view and scroll down. After clicking the respective
link, the process will be opened and you can inspect the details, execute it, and
analyse the results at breakpoints. Apart from that, the explanation of the
parameters will give you good insight into what the operator is capable of and
what it can be configured for.
I think there's nothing left to say except to wish you a lot of illustrative encounters
with the various operators. And if you really read it from start to end, please
tell us, as we have bets running on that. Of course we will verify that by checking
if you found all the Easter eggs...
Sebastian Land
Contents
1 Process Control 1
Remember . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Recall . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Multiply . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Join Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Handle Exception . . . . . . . . . . . . . . . . . . . . . . . 12
Throw Exception . . . . . . . . . . . . . . . . . . . . . . . . 15
1.1 Parameter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Set Parameters . . . . . . . . . . . . . . . . . . . . . . . . . 17
Optimize Parameters (Grid) . . . . . . . . . . . . . . . . . . 21
Optimize Parameters (Evolutionary) . . . . . . . . . . . . . 26
1.2 Loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Loop Attributes . . . . . . . . . . . . . . . . . . . . . . . . 36
Loop Values . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Loop Examples . . . . . . . . . . . . . . . . . . . . . . . . . 46
Loop Clusters . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Loop Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . 51
Loop and Average . . . . . . . . . . . . . . . . . . . . . . . 54
Loop Parameters . . . . . . . . . . . . . . . . . . . . . . . . 57
Loop Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
X-Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . 65
1.3 Branch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Branch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Select Subprocess . . . . . . . . . . . . . . . . . . . . . . . . 75
1.4 Collections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Collect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Select . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Loop Collection . . . . . . . . . . . . . . . . . . . . . . . . . 85
2 Utility 89
Subprocess . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
2.1 Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Set Macro . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Set Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
Generate Macro . . . . . . . . . . . . . . . . . . . . . . . . 102
Extract Macro . . . . . . . . . . . . . . . . . . . . . . . . . 110
2.2 Logging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
Log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
Provide Macro as Log Value . . . . . . . . . . . . . . . . . . 125
Log to Data . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
2.3 Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
Execute Process . . . . . . . . . . . . . . . . . . . . . . . . 132
Execute Script . . . . . . . . . . . . . . . . . . . . . . . . . 135
Execute SQL . . . . . . . . . . . . . . . . . . . . . . . . . . 141
Execute Program . . . . . . . . . . . . . . . . . . . . . . . . 145
2.4 Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
Write as Text . . . . . . . . . . . . . . . . . . . . . . . . . . 149
Copy File . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
Rename File . . . . . . . . . . . . . . . . . . . . . . . . . . 153
Delete File . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
Move File . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
Create Directory . . . . . . . . . . . . . . . . . . . . . . . . 161
2.5 Data Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
Generate Data . . . . . . . . . . . . . . . . . . . . . . . . . 163
Generate Nominal Data . . . . . . . . . . . . . . . . . . . . 165
Generate Direct Mailing Data . . . . . . . . . . . . . . . . . 167
Generate Sales Data . . . . . . . . . . . . . . . . . . . . . . 169
Add Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
2.6 Miscellaneous . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
Materialize Data . . . . . . . . . . . . . . . . . . . . . . . . 177
Free Memory . . . . . . . . . . . . . . . . . . . . . . . . . . 179
3 Repository Access 183
Retrieve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
Store . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
4 Import 189
4.1 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
Read csv . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
Read Excel . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
Read SAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
Read Access . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
Read AML . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
Read ARFF . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
Read Database . . . . . . . . . . . . . . . . . . . . . . . . . 210
Stream Database . . . . . . . . . . . . . . . . . . . . . . . . 215
Read SPSS . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
4.2 Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
Read Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
4.3 Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
Read Weights . . . . . . . . . . . . . . . . . . . . . . . . . . 224
5 Export 227
Write . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
5.1 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
Write AML . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
Write Arff . . . . . . . . . . . . . . . . . . .