STAT Requirement Analysis
![Page 1: STAT Requirement Analysis](https://reader034.vdocument.in/reader034/viewer/2022042607/5585a21ad8b42ae3228b46b2/html5/thumbnails/1.jpg)
Requirement Analysis
THE STAT PROJECT
Milestone 1 Report
![Page 2: STAT Requirement Analysis](https://reader034.vdocument.in/reader034/viewer/2022042607/5585a21ad8b42ae3228b46b2/html5/thumbnails/2.jpg)
QUESTIONS
To design a framework, how many variations do we need to protect against? How many functionalities do we need to provide to support all these variations?
![Page 3: STAT Requirement Analysis](https://reader034.vdocument.in/reader034/viewer/2022042607/5585a21ad8b42ae3228b46b2/html5/thumbnails/3.jpg)
Variations for importing datasets (file sources)
![Page 4: STAT Requirement Analysis](https://reader034.vdocument.in/reader034/viewer/2022042607/5585a21ad8b42ae3228b46b2/html5/thumbnails/4.jpg)
Variations for importing datasets (file formats)
![Page 5: STAT Requirement Analysis](https://reader034.vdocument.in/reader034/viewer/2022042607/5585a21ad8b42ae3228b46b2/html5/thumbnails/5.jpg)
Variations for importing datasets (schemas)
Even if we only consider datasets in XML, each dataset may have its own schema.
![Page 6: STAT Requirement Analysis](https://reader034.vdocument.in/reader034/viewer/2022042607/5585a21ad8b42ae3228b46b2/html5/thumbnails/6.jpg)
Reuters dataset example
![Page 7: STAT Requirement Analysis](https://reader034.vdocument.in/reader034/viewer/2022042607/5585a21ad8b42ae3228b46b2/html5/thumbnails/7.jpg)
Simplified approach
One approach: high-level reader classes
- ReutersReader
- RCV1Reader

Once written, these readers can be shared by the community.
Observation: for the sake of comparison, researchers usually work with a few well-known datasets (e.g., Reuters, RCV1).
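The reader-class idea can be sketched as follows. This is a minimal sketch under stated assumptions. `DatasetReader`, `NewsRawDocument` and `SimpleReutersReader` are hypothetical names. The reader only extracts `<TITLE>` and `<BODY>` from `<REUTERS ...>` records using regular expressions. It is not a complete Reuters-21578 SGML parser: it does no entity decoding and no TOPICS handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical minimal raw document: all fields are Strings.
class NewsRawDocument {
    String title;
    String body;
}

// Hypothetical reader abstraction: one implementation per well-known dataset.
interface DatasetReader {
    List<NewsRawDocument> read(String content);
}

// Simplified Reuters-style reader (an assumption, not the real format spec):
// extracts <TITLE> and <BODY> from each <REUTERS ...> ... </REUTERS> record.
class SimpleReutersReader implements DatasetReader {
    private static final Pattern RECORD =
            Pattern.compile("<REUTERS[^>]*>(.*?)</REUTERS>", Pattern.DOTALL);

    @Override
    public List<NewsRawDocument> read(String content) {
        List<NewsRawDocument> docs = new ArrayList<>();
        Matcher m = RECORD.matcher(content);
        while (m.find()) {
            String record = m.group(1);
            NewsRawDocument doc = new NewsRawDocument();
            doc.title = extract(record, "TITLE");
            doc.body = extract(record, "BODY");
            docs.add(doc);
        }
        return docs;
    }

    // Returns the trimmed text between <tag> and </tag>, or "" if the tag is absent.
    private static String extract(String record, String tag) {
        Matcher m = Pattern.compile("<" + tag + ">(.*?)</" + tag + ">", Pattern.DOTALL)
                .matcher(record);
        return m.find() ? m.group(1).trim() : "";
    }
}
```

Once such a reader exists for a dataset, every user of that dataset can call `read` instead of writing format-specific parsing code. That reuse is the point of the observation above.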
![Page 8: STAT Requirement Analysis](https://reader034.vdocument.in/reader034/viewer/2022042607/5585a21ad8b42ae3228b46b2/html5/thumbnails/8.jpg)
Able to persist in-memory objects and read them back
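One straightforward way to meet this requirement in Java is built-in object serialization (`ObjectOutputStream` / `ObjectInputStream`). The slides do not say which persistence mechanism STAT uses, so treat this as a minimal sketch under that assumption:
- persisted classes are assumed to implement `Serializable`;
- `StoredDocument` and `ObjectStore` are hypothetical names;
- the sketch writes to a byte array only to stay self-contained. A `FileOutputStream` would work the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical in-memory object to persist; it must implement Serializable.
class StoredDocument implements Serializable {
    private static final long serialVersionUID = 1L;
    final String title;
    final String body;

    StoredDocument(String title, String body) {
        this.title = title;
        this.body = body;
    }
}

// Hypothetical helper: persists objects to bytes and reads them back.
final class ObjectStore {
    private ObjectStore() {}

    static byte[] persist(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj); // writes the whole reachable object graph
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // the stream is closed (and flushed) at this point
    }

    static <T> T readBack(byte[] data, Class<T> type) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return type.cast(in.readObject()); // throws ClassCastException on a type mismatch
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Java serialization ties the stored bytes to the class definitions. A text format such as XML or JSON may be preferable if persisted objects must be readable by other tools or survive class changes.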
![Page 9: STAT Requirement Analysis](https://reader034.vdocument.in/reader034/viewer/2022042607/5585a21ad8b42ae3228b46b2/html5/thumbnails/9.jpg)
Able to visualize in-memory objects
![Page 10: STAT Requirement Analysis](https://reader034.vdocument.in/reader034/viewer/2022042607/5585a21ad8b42ae3228b46b2/html5/thumbnails/10.jpg)
STAT (brief) Domain Model
Note: Labels on connectors are omitted for brevity, and some connections are not drawn because of space limitations.
![Page 11: STAT Requirement Analysis](https://reader034.vdocument.in/reader034/viewer/2022042607/5585a21ad8b42ae3228b46b2/html5/thumbnails/11.jpg)
STAT framework sample code (conceptual)
![Page 12: STAT Requirement Analysis](https://reader034.vdocument.in/reader034/viewer/2022042607/5585a21ad8b42ae3228b46b2/html5/thumbnails/12.jpg)
![Page 13: STAT Requirement Analysis](https://reader034.vdocument.in/reader034/viewer/2022042607/5585a21ad8b42ae3228b46b2/html5/thumbnails/13.jpg)
Domain Concept: RawCorpus
A collection of RawDocument objects, supporting collection operations:
- Add a new RawDocument element
- Remove an existing RawDocument element
- Access elements in the collection
- …
![Page 14: STAT Requirement Analysis](https://reader034.vdocument.in/reader034/viewer/2022042607/5585a21ad8b42ae3228b46b2/html5/thumbnails/14.jpg)
Domain Concept: RawCorpus
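```java
import java.util.ArrayList;
import java.util.List;

// A collection of RawDocument objects (see Domain Concept: RawDocument).
abstract class RawCorpus {
    protected final List<RawDocument> rawDocuments = new ArrayList<>();

    RawDocument getDocument(int index) { return rawDocuments.get(index); }
    void setDocument(int index, RawDocument doc) { rawDocuments.set(index, doc); }
    void removeDocument(int index) { rawDocuments.remove(index); }
}
```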
![Page 15: STAT Requirement Analysis](https://reader034.vdocument.in/reader034/viewer/2022042607/5585a21ad8b42ae3228b46b2/html5/thumbnails/15.jpg)
Domain Concept: RawDocument
An object with one or more string fields, serving as an unprocessed, in-memory representation of a document unit:
- Like a Java bean, with getters and setters
- All fields must be of type String, even for numbers
![Page 16: STAT Requirement Analysis](https://reader034.vdocument.in/reader034/viewer/2022042607/5585a21ad8b42ae3228b46b2/html5/thumbnails/16.jpg)
Domain Concept: RawDocument
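```java
// Base type for all raw (unprocessed) documents.
abstract class RawDocument {
    public RawDocument() {}
}

// A dataset-specific raw document: every field is a String, even numeric ones.
class MyRawDocument extends RawDocument {
    String title;
    String author;
    String body;
    String date;
    String numOfClicks; // a count, but still stored as a String
    String topicType;
    // …
}
```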
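The all-strings rule has a practical payoff. A generic reader can fill any RawDocument subclass from name/value pairs without per-type conversion. Below is a minimal sketch using Java reflection. `NewsDocument` and `RawDocuments.fromFields` are hypothetical names, and the slides do not say that STAT populates documents this way.

```java
import java.lang.reflect.Field;
import java.util.Map;

abstract class RawDocument {
    public RawDocument() {}
}

// Hypothetical subclass with a subset of the fields from the slide.
class NewsDocument extends RawDocument {
    String title;
    String body;
    String numOfClicks; // a number, but stored as a String
}

final class RawDocuments {
    private RawDocuments() {}

    // Hypothetical helper: creates an instance of `type` and copies each
    // map entry into the declared String field with the same name.
    static <T extends RawDocument> T fromFields(Class<T> type, Map<String, String> values) {
        try {
            T doc = type.getDeclaredConstructor().newInstance();
            for (Map.Entry<String, String> e : values.entrySet()) {
                Field field = type.getDeclaredField(e.getKey());
                field.setAccessible(true);
                field.set(doc, e.getValue()); // no conversion needed: every field is a String
            }
            return doc;
        } catch (ReflectiveOperationException ex) {
            throw new IllegalArgumentException("Cannot populate " + type.getName(), ex);
        }
    }
}
```

If fields could have other types, the helper would need a converter per type. Keeping everything as String pushes that conversion to later processing steps.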
![Page 17: STAT Requirement Analysis](https://reader034.vdocument.in/reader034/viewer/2022042607/5585a21ad8b42ae3228b46b2/html5/thumbnails/17.jpg)
Domain Concept: Processor
An object that processes a RawCorpus and produces a Corpus. Kinds of processors:
- Linguistic: Tokenizer, Stemmer, StopRemover, PosTagger, …
- Machine learning: feature-specific, document-specific
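Below is a minimal sketch of chaining linguistic processors. It assumes a hypothetical token-level `TokenProcessor` interface, where each step maps a list of strings to a list of strings. `Tokenizer` and `StopRemover` reuse the slide's names, but their behavior here is an illustrative assumption: lowercase the text, split it on non-alphanumeric characters, and drop listed stop words. A full Processor would apply such a chain to every RawDocument of a RawCorpus to build a Corpus.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical processing step: list of strings in, list of strings out.
interface TokenProcessor {
    List<String> process(List<String> input);
}

// Lowercases each input text and splits it on runs of non-letter/non-digit characters.
final class Tokenizer implements TokenProcessor {
    @Override
    public List<String> process(List<String> texts) {
        List<String> tokens = new ArrayList<>();
        for (String text : texts) {
            for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!t.isEmpty()) tokens.add(t); // split can yield a leading ""
            }
        }
        return tokens;
    }
}

// Drops every token contained in a fixed stop-word set.
final class StopRemover implements TokenProcessor {
    private final Set<String> stopWords;

    StopRemover(Set<String> stopWords) { this.stopWords = stopWords; }

    @Override
    public List<String> process(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!stopWords.contains(t)) kept.add(t);
        }
        return kept;
    }
}

// Runs the steps in order, feeding each step's output into the next.
final class ProcessorChain {
    private final List<TokenProcessor> steps;

    ProcessorChain(TokenProcessor... steps) { this.steps = Arrays.asList(steps); }

    List<String> run(String rawText) {
        List<String> current = List.of(rawText);
        for (TokenProcessor step : steps) current = step.process(current);
        return current;
    }
}
```

Usage: `new ProcessorChain(new Tokenizer(), new StopRemover(Set.of("the", "of"))).run("The price of OIL, rose!")` yields `[price, oil, rose]`. A Stemmer or PosTagger could be slotted into the same chain.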
![Page 18: STAT Requirement Analysis](https://reader034.vdocument.in/reader034/viewer/2022042607/5585a21ad8b42ae3228b46b2/html5/thumbnails/18.jpg)
Domain Concept: Corpus
An object representing a collection of Document objects for use by the machine learning side of the framework. It provides the commonly used notion of splits (e.g., train, test).
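The notion of splits could look like the minimal sketch below. It rests on several assumptions:
- `Document` is reduced to a token list plus a gold label;
- splits are stored by name in a map;
- the hypothetical `trainTestSplit` helper uses a seeded shuffle, so the same seed always gives the same split.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical processed document: tokens plus a gold label for evaluation.
final class Document {
    final List<String> tokens;
    final String label;

    Document(List<String> tokens, String label) {
        this.tokens = tokens;
        this.label = label;
    }
}

// Corpus holding named splits such as "train" and "test".
final class Corpus {
    private final Map<String, List<Document>> splits = new HashMap<>();

    void putSplit(String name, List<Document> docs) {
        splits.put(name, List.copyOf(docs)); // immutable snapshot, not a view
    }

    List<Document> getSplit(String name) {
        List<Document> docs = splits.get(name);
        if (docs == null) throw new IllegalArgumentException("Unknown split: " + name);
        return docs;
    }

    // Shuffles with a fixed seed, then puts the first trainFraction of the
    // documents into "train" and the remainder into "test".
    static Corpus trainTestSplit(List<Document> all, double trainFraction, long seed) {
        if (trainFraction < 0.0 || trainFraction > 1.0)
            throw new IllegalArgumentException("trainFraction must be in [0, 1]");
        List<Document> shuffled = new ArrayList<>(all);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(shuffled.size() * trainFraction);
        Corpus corpus = new Corpus();
        corpus.putSplit("train", shuffled.subList(0, cut));
        corpus.putSplit("test", shuffled.subList(cut, shuffled.size()));
        return corpus;
    }
}
```

Fixing the seed matters for the comparison use case noted earlier. Two researchers using the same seed evaluate on the same test documents.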
![Page 19: STAT Requirement Analysis](https://reader034.vdocument.in/reader034/viewer/2022042607/5585a21ad8b42ae3228b46b2/html5/thumbnails/19.jpg)
Domain Concept: Trainer
A representation of a machine learning algorithm, which can learn from a Corpus and produce a Model.
![Page 20: STAT Requirement Analysis](https://reader034.vdocument.in/reader034/viewer/2022042607/5585a21ad8b42ae3228b46b2/html5/thumbnails/20.jpg)
Domain Concept: Model
An object that a machine learning algorithm (i.e., a Trainer) creates to store the parameters "learned" from the data (i.e., a Corpus).
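The Trainer/Model relationship can be sketched with a multinomial naive Bayes trainer. This is one possible choice of algorithm; the slides do not name any.
- Here the Model is simply the counts learned from the training documents.
- `Trainer<M>`, `LabeledDoc`, `CountModel` and `NaiveBayesTrainer` are hypothetical names.
- The trainer takes the training split as a plain list rather than a full Corpus.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical labeled training document.
final class LabeledDoc {
    final List<String> tokens;
    final String label;

    LabeledDoc(List<String> tokens, String label) {
        this.tokens = tokens;
        this.label = label;
    }
}

// Model: the parameters learned from data (here, the counts naive Bayes needs).
final class CountModel {
    final Map<String, Integer> docsPerLabel = new HashMap<>();
    final Map<String, Map<String, Integer>> tokenCounts = new HashMap<>(); // label -> token -> count
    final Set<String> vocabulary = new HashSet<>();
}

// Trainer: an ML algorithm that learns from documents and produces a Model.
interface Trainer<M> {
    M train(List<LabeledDoc> trainingDocs);
}

final class NaiveBayesTrainer implements Trainer<CountModel> {
    @Override
    public CountModel train(List<LabeledDoc> trainingDocs) {
        CountModel model = new CountModel();
        for (LabeledDoc doc : trainingDocs) {
            model.docsPerLabel.merge(doc.label, 1, Integer::sum);
            Map<String, Integer> counts =
                    model.tokenCounts.computeIfAbsent(doc.label, k -> new HashMap<>());
            for (String token : doc.tokens) {
                counts.merge(token, 1, Integer::sum);
                model.vocabulary.add(token);
            }
        }
        return model;
    }
}
```

The generic parameter `M` lets each Trainer declare the Model type it produces. A Classifier can then require a matching Model type at compile time.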
![Page 21: STAT Requirement Analysis](https://reader034.vdocument.in/reader034/viewer/2022042607/5585a21ad8b42ae3228b46b2/html5/thumbnails/21.jpg)
Domain Concept: Classifier
An object that maps Documents to target values (label, number, probability). It takes a Corpus and a Model as inputs and produces a Prediction associated with the Corpus according to the Model.
![Page 22: STAT Requirement Analysis](https://reader034.vdocument.in/reader034/viewer/2022042607/5585a21ad8b42ae3228b46b2/html5/thumbnails/22.jpg)
Domain Concept: Prediction
A collection of target values (labels, numbers, or probabilities) associated with a Corpus, i.e., with a collection of Document objects.
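This sketch continues the same assumption: multinomial naive Bayes, which the slides do not specify. A Classifier takes documents plus a Model and returns a Prediction holding one label per document, in corpus order.
- The block is self-contained, so it re-declares a count-based model. An `add` method stands in for a Trainer.
- All names are hypothetical.
- The corpus is simplified to a list of token lists.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical Model: per-label counts, as a Trainer would have learned them.
final class CountModel {
    final Map<String, Integer> docsPerLabel = new HashMap<>();
    final Map<String, Map<String, Integer>> tokenCounts = new HashMap<>();
    final Map<String, Integer> tokensPerLabel = new HashMap<>();
    final Set<String> vocabulary = new HashSet<>();
    int totalDocs = 0;

    // Stand-in for a Trainer: records one labeled document.
    void add(List<String> tokens, String label) {
        totalDocs++;
        docsPerLabel.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = tokenCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
            tokensPerLabel.merge(label, 1, Integer::sum);
            vocabulary.add(t);
        }
    }
}

// Prediction: one target value per document, in corpus order.
final class Prediction {
    final List<String> labels;

    Prediction(List<String> labels) { this.labels = List.copyOf(labels); }
}

// Classifier: multinomial naive Bayes with add-one (Laplace) smoothing.
final class NaiveBayesClassifier {
    Prediction classify(List<List<String>> corpus, CountModel model) {
        List<String> out = new ArrayList<>();
        for (List<String> doc : corpus) out.add(bestLabel(doc, model));
        return new Prediction(out);
    }

    // Picks the label maximizing log P(label) + sum of log P(token | label).
    private static String bestLabel(List<String> tokens, CountModel model) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        int v = model.vocabulary.size();
        for (String label : model.docsPerLabel.keySet()) {
            double score = Math.log(model.docsPerLabel.get(label) / (double) model.totalDocs);
            Map<String, Integer> counts = model.tokenCounts.get(label);
            int total = model.tokensPerLabel.getOrDefault(label, 0);
            for (String t : tokens) {
                if (!model.vocabulary.contains(t)) continue; // skip tokens never seen in training
                score += Math.log((counts.getOrDefault(t, 0) + 1.0) / (total + v));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }
}
```

Scoring in log space turns a product of many small probabilities into a sum. This avoids floating-point underflow on long documents.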
![Page 23: STAT Requirement Analysis](https://reader034.vdocument.in/reader034/viewer/2022042607/5585a21ad8b42ae3228b46b2/html5/thumbnails/23.jpg)
Domain Concept: Evaluator
An object that compares a Prediction against its associated Corpus and generates an Evaluation.
![Page 24: STAT Requirement Analysis](https://reader034.vdocument.in/reader034/viewer/2022042607/5585a21ad8b42ae3228b46b2/html5/thumbnails/24.jpg)
Domain Concept: Evaluation
A summarized representation of the evaluation result produced by an Evaluator.
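Below is a minimal Evaluator sketch. It assumes accuracy as the metric, since the slides do not name one, and assumes the Corpus supplies gold labels in the same order as the Prediction. `AccuracyEvaluator` and this `Evaluation` class are hypothetical. The Evaluation keeps only summary counts, in line with the "summarized" description of the concept.

```java
import java.util.List;
import java.util.Locale;

// Evaluation: summarized result (counts and accuracy), not per-document detail.
final class Evaluation {
    final int correct;
    final int total;

    Evaluation(int correct, int total) {
        this.correct = correct;
        this.total = total;
    }

    double accuracy() { return total == 0 ? 0.0 : (double) correct / total; }

    @Override
    public String toString() {
        return String.format(Locale.ROOT, "accuracy = %d/%d = %.3f", correct, total, accuracy());
    }
}

// Evaluator: compares predicted labels with gold labels, position by position.
final class AccuracyEvaluator {
    Evaluation evaluate(List<String> predicted, List<String> gold) {
        if (predicted.size() != gold.size())
            throw new IllegalArgumentException("Prediction and corpus sizes differ");
        int correct = 0;
        for (int i = 0; i < gold.size(); i++) {
            if (gold.get(i).equals(predicted.get(i))) correct++;
        }
        return new Evaluation(correct, gold.size());
    }
}
```

Other metrics, such as per-label precision and recall, would fit the same split. The Evaluator computes the metric and the Evaluation holds the summary.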
![Page 25: STAT Requirement Analysis](https://reader034.vdocument.in/reader034/viewer/2022042607/5585a21ad8b42ae3228b46b2/html5/thumbnails/25.jpg)
Thanks
![Page 26: STAT Requirement Analysis](https://reader034.vdocument.in/reader034/viewer/2022042607/5585a21ad8b42ae3228b46b2/html5/thumbnails/26.jpg)
STAT (brief) Domain Model (diagram). Concepts shown: Reader, RawCorpus, Processor, Corpus, Vocabulary, Trainer, Model, Classifier, Prediction, Evaluator, Evaluation, Writer.
Note: Labels on connectors are omitted for brevity, and some connections are not drawn because of space limitations.
![Page 27: STAT Requirement Analysis](https://reader034.vdocument.in/reader034/viewer/2022042607/5585a21ad8b42ae3228b46b2/html5/thumbnails/27.jpg)
STAT Domain Model (diagram). Concepts shown: Reader, RawCorpus, Processor, Corpus, Trainer, Model, Classifier, Prediction, Evaluator, Evaluation, Writer.
Note: Labels on connectors are omitted for brevity.
![Page 28: STAT Requirement Analysis](https://reader034.vdocument.in/reader034/viewer/2022042607/5585a21ad8b42ae3228b46b2/html5/thumbnails/28.jpg)
STAT Domain Model (diagram). Concepts shown: Reader, RawCorpus, RawDocument, Processor, Corpus, Document, Trainer, Model, Classifier, Prediction, Evaluator, Evaluation.
Note: Labels on connectors are omitted for brevity.