RapidMiner Operator Reference Manual


  • © 2014 by RapidMiner. All rights reserved.

    No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without prior written permission of RapidMiner.

  • Preface

    Welcome to the RapidMiner Operator Reference, the final result of a long working process. When we first started to plan this reference, we had an extensive discussion about the purpose of this book. Would anybody want to read the whole book, starting with Ada Boost and reading the description of every single operator up to X-Validation? Or would it only serve for looking up particular operators, although that is also possible in the program interface itself?

    We decided on the latter, and as the book progressed we realized how futile this entire discussion had been. It was not long until the book reached the 600-page limit, and now we have nearly hit 1000 pages, which is far more than anybody would like to read in full. Even with a great structure, in which explanations of how to use groups of operators served as guiding transitions between the descriptions of single operators, nobody could comprehend all that. The reader would have long forgotten about the Loop Clusters operator by the time he got to cross validation. So we didn't invest any effort in that, and hence the book has become a pure reference.

    This is not a suitable document for getting to know RapidMiner itself. For that, we recommend reading the manual as a starting point. There are other documents available for particular scenarios, like using RapidMiner as a researcher or extending its functionality. Please take a look at our website rapidminer.com to get an overview of the available documentation.

    From that fact, we can draw some suggestions about how to read this book:

    Whenever you want to know about a particular operator, just open the index at the end of this book and jump directly to the operator. The order of the operators in this book is determined by the group structure in the operator tree, as you will see immediately when you take a look at the contents. As operators for similar tasks are grouped together in RapidMiner, these operators are also near each other in this book. So if you are interested in broadening your perspective of RapidMiner beyond an already known operator, you can continue reading a few pages before and after the operator you picked from the index.

    Once you have read the description of an operator, you can jump to the tutorial process, which explains a possible use case. Often the functionality of an operator is easier to understand in the context of a complete process. All these processes are also available in RapidMiner. You simply need to open the description of the operator in the help view and scroll down. After clicking the respective link, the process will be opened and you can inspect the details, execute it and analyse the results at breakpoints. Apart from that, the explanation of the parameters will give you good insight into what the operator is capable of and what it can be configured for.

    I think there's nothing left to say except to wish you a lot of illustrative encounters with the various operators. And if you really read it from start to end, please tell us, as we have bets running on that. Of course we will verify that by checking if you found all the Easter eggs...

    Sebastian Land


  • Contents

    1 Process Control
        Remember
        Recall
        Multiply
        Join Paths
        Handle Exception
        Throw Exception
        1.1 Parameter
            Set Parameters
            Optimize Parameters (Grid)
            Optimize Parameters (Evolutionary)
        1.2 Loop
            Loop
            Loop Attributes
            Loop Values
            Loop Examples
            Loop Clusters
            Loop Data Sets
            Loop and Average
            Loop Parameters
            Loop Files
            X-Prediction
        1.3 Branch
            Branch
            Select Subprocess
        1.4 Collections
            Collect
            Select
            Loop Collection
    2 Utility
        Subprocess
        2.1 Macros
            Set Macro
            Set Macros
            Generate Macro
            Extract Macro
        2.2 Logging
            Log
            Provide Macro as Log Value
            Log to Data
        2.3 Execution
            Execute Process
            Execute Script
            Execute SQL
            Execute Program
        2.4 Files
            Write as Text
            Copy File
            Rename File
            Delete File
            Move File
            Create Directory
        2.5 Data Generation
            Generate Data
            Generate Nominal Data
            Generate Direct Mailing Data
            Generate Sales Data
            Add Noise
        2.6 Miscellaneous
            Materialize Data
            Free Memory
    3 Repository Access
        Retrieve
        Store
    4 Import
        4.1 Data
            Read csv
            Read Excel
            Read SAS
            Read Access
            Read AML
            Read ARFF
            Read Database
            Stream Database
            Read SPSS
        4.2 Models
            Read Model
        4.3 Attributes
            Read Weights
    5 Export
        Write
        5.1 Data
            Write AML
            Write Arff