Report Data Mining

Name: Samah Ziad Abudaia
Student ID: 220133612

Upload: samoha-samoha

Post on 07-Oct-2015




Introduction: This report shows the mechanics of data mining on a large database: obtaining the data, preparing it, reducing its size, and applying methods that extract useful relationships. The steps are described in detail and the results are explained.

The first step: obtaining the data
The data were taken from the site https://archive.ics.uci.edu/ml/datasets.html

Explanation of the data: The extracted data are a diabetes database. An algorithm from the literature was used on it to predict the onset of diabetes in women. The result is a binary variable, 0 or 1, where 1 means a positive test. The outcome depends on several variables:
1. Number of times pregnant
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-Hour serum insulin (mu U/ml)
6. Body mass index (weight in kg/(height in m)^2)
7. Diabetes pedigree function
8. Age (years)
9. Class variable (0 or 1)

1- Save the data in a file and load the file into the program so that operations can be run on it.
2- After importing the data into the program, name the columns and save the process, then drag the data set into the process area.

3- Run the process. The program shows the following table, which contains the data.
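The loading step above can be sketched outside the program as well. This is a minimal sketch, assuming the data file is a header-less CSV with the nine attributes in the order listed; the short column names and the three sample rows are illustrative assumptions, not values taken from the report.

```python
import csv
import io

# Short column names are our own; the order follows the attribute list above.
COLUMNS = ["pregnancies", "glucose", "blood_pressure", "skin_thickness",
           "insulin", "bmi", "pedigree", "age", "label"]

# Illustrative rows only; a real run would open the downloaded file instead.
SAMPLE = """6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
"""

def load_rows(handle):
    """Read header-less CSV records and attach the column names."""
    rows = []
    for record in csv.reader(handle):
        if not record:  # skip blank lines
            continue
        values = [float(v) for v in record]
        rows.append(dict(zip(COLUMNS, values)))
    return rows

rows = load_rows(io.StringIO(SAMPLE))
print(len(rows), "rows,", len(COLUMNS), "columns")
```

Passing an open file object instead of `io.StringIO(SAMPLE)` would read the saved data file in the same way.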

Step two: preparing the data
The data are made ready for use with three processes.

The first process: removing duplicates
1- In the operators box, look under Filtering, choose Remove Duplicates, run the process, and view the results.

2- After running the program: the data contain no duplicates.

The second process: replacing missing values
1- In the operators box, look under Data Transformation, then Data Cleansing, choose Replace Missing Values, run the process, and view the results.

2- After running the program: the data contain no missing values.
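The first two preparation processes can be illustrated with a short sketch. It assumes records are plain lists, that a missing value is stored as `None`, and that missing values are replaced by the column mean; the report does not say which replacement rule was used, so the mean is only an assumption. The sample records are made up.

```python
from statistics import mean

def remove_duplicates(rows):
    """Keep the first occurrence of each record, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def replace_missing(rows):
    """Replace None entries with the mean of the known values in that column."""
    fills = []
    for column in zip(*rows):
        known = [v for v in column if v is not None]
        fills.append(mean(known) if known else None)
    return [[fills[i] if v is None else v for i, v in enumerate(row)]
            for row in rows]

# Made-up records: one exact duplicate and one missing value.
data = [
    [6, 148, 33.6],
    [1, 85, 26.6],
    [6, 148, 33.6],
    [8, None, 23.3],
]
unique = remove_duplicates(data)
clean = replace_missing(unique)
print(len(data), "->", len(unique), "records")
print(clean[-1])
```

On a data set with no duplicates and no missing values, as reported here, both functions return the records unchanged.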

The third process: outliers
1- In the operators box, look under Data Transformation, then Data Cleansing, choose Detect Outliers (Distances), run the process, and view the results.

2- After running the program: the data do not have missing values.
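One common way to detect outliers by distance is to score each record by the distance to its k-th nearest neighbour and flag the records with the largest scores. The sketch below follows that idea; it is not necessarily the exact rule the program applies, and the parameters `k` and `n_outliers` and the sample points are assumptions chosen for illustration.

```python
import math

def knn_distance_outliers(points, k=2, n_outliers=1):
    """Flag the n_outliers points with the largest distance to their k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_outliers])
    return [i in flagged for i in range(len(points))]

# Made-up points: four close together and one far away.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2), (8.0, 9.0)]
flags = knn_distance_outliers(points, k=2, n_outliers=1)
print(flags)
```

Each record ends up with a true or false flag, which is the same form of result the outlier operator produces.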

After these three processes the data are clean and ready for applying association rules, two classification methods, clustering, and outlier detection.

We will apply association rules to the existing data.
Process: Drag the data set into the process area. Look under Data Transformation, then Type Conversion, and drag Numerical to Binominal and Numerical to Numerical into the process area and connect them to the data set. Then look under Modeling, choose Association and Item Set Mining, choose FP-Growth, and drag it into the process area.

The association rules explain the specific relationships between the attributes.
Result: This table shows the specific relationships between the variables.

This chart shows the association rules:
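To make the idea of support and confidence concrete, here is a minimal sketch that finds frequent item sets and derives rules from them. It enumerates item sets by brute force rather than using FP-Growth; FP-Growth finds the same frequent item sets more efficiently on large data. The item names (`high_glucose`, `high_bmi`, `positive`) and the transactions are hypothetical stand-ins for the binominal attributes.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every item set meeting min_support (brute force)."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            support = sum(cset <= t for t in transactions) / n
            if support >= min_support:
                result[cset] = support
    return result

def association_rules(itemsets, min_confidence):
    """Derive rules A -> B with confidence = support(A and B) / support(A)."""
    found = []
    for itemset, support in itemsets.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                a = frozenset(antecedent)
                # Every subset of a frequent item set is itself frequent,
                # so its support is already in the dictionary.
                confidence = support / itemsets[a]
                if confidence >= min_confidence:
                    found.append((sorted(a), sorted(itemset - a), confidence))
    return found

# Each transaction lists the attributes that are true for one record (made up).
transactions = [
    {"high_glucose", "high_bmi", "positive"},
    {"high_glucose", "positive"},
    {"high_bmi"},
    {"high_glucose", "high_bmi", "positive"},
]
itemsets = frequent_itemsets(transactions, min_support=0.5)
found = association_rules(itemsets, min_confidence=0.9)
for premise, conclusion, confidence in found:
    print(premise, "->", conclusion, round(confidence, 2))
```

A rule is kept only when its premise and conclusion occur together often enough (support) and the conclusion holds in most records where the premise holds (confidence).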

We will apply classification to the existing data.
1- Process: Drag the data set into the process area and add Numerical to Binominal. Then define the label: look under Data Transformation, then Name and Role Modification, and drag Set Role into the process area. The properties on the right of the page are where we define the label.

2- We create a splitter to split the data into training and testing data. Look in the evaluation and validation list and choose Split Validation. Double-clicking the validation operator shows that it is divided into two parts. The first part is training: look under Modeling, then Classification and Regression, then Tree Induction, and drag in the Decision Tree. The second part is testing: look under Model Application, then Confidences, and drag in Apply Model. Then look in the performance evaluation list, then Classification and Regression, and drag in Performance (Classification) to measure the accuracy. Finally, run the process.
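The split-validation idea (train on one part, measure accuracy on the other) can be sketched as follows. For brevity the model is a decision stump, that is, a decision tree with a single split, not the full decision tree used in the report. The 70/30 ratio, the seed, and the sample records are assumptions.

```python
import random

def split(rows, train_ratio=0.7, seed=0):
    """Shuffle the records and cut them into a training part and a testing part."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def train_stump(train):
    """Pick the (feature, threshold) whose rule 'value > threshold means class 1' fits the training part best."""
    best = None
    for f in range(len(train[0][0])):
        for threshold in sorted({x[f] for x, _ in train}):
            correct = sum((x[f] > threshold) == bool(y) for x, y in train)
            if best is None or correct > best[0]:
                best = (correct, f, threshold)
    return best[1], best[2]

def predict(model, x):
    f, threshold = model
    return int(x[f] > threshold)

def accuracy(model, rows):
    """Share of records whose predicted class equals the true class."""
    return sum(predict(model, x) == y for x, y in rows) / len(rows)

# Made-up records: ((glucose, bmi), label).
data = [((85, 26.6), 0), ((89, 28.1), 0), ((78, 31.0), 0), ((110, 37.6), 0),
        ((116, 25.6), 0), ((148, 33.6), 1), ((183, 23.3), 1), ((137, 43.1), 1),
        ((197, 30.5), 1), ((168, 38.0), 1)]
train, test = split(data)
model = train_stump(train)
print("accuracy on the testing part:", accuracy(model, test))
```

The accuracy is always computed on records the model did not see during training, which is the point of splitting the data first.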

Result: New data are classified based on old data, and the accuracy is measured. Classification analyzes the input data to develop an accurate description or model for each class using the features present in the data.
After running this process, the accuracy was measured on the old and new data, and the average equals 81%.
Chart:

Naive Bayes: The second classification method follows the same steps, with Naive Bayes used in place of the decision tree.
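For readers unfamiliar with the method, a Gaussian Naive Bayes classifier can be sketched in a few lines: it estimates a mean and variance per class and per attribute, then picks the class with the highest posterior probability. This is a minimal sketch, assuming numeric attributes and the made-up records shown; it is not the program's implementation.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows):
    """Estimate the prior, attribute means, and attribute variances for each class."""
    by_class = defaultdict(list)
    for x, y in rows:
        by_class[y].append(x)
    model = {}
    for y, xs in by_class.items():
        n = len(xs)
        means = [sum(col) / n for col in zip(*xs)]
        # A tiny constant keeps the variance from being exactly zero.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*xs), means)]
        model[y] = (n / len(rows), means, variances)
    return model

def predict_nb(model, x):
    """Return the class with the highest log posterior (up to a constant)."""
    def log_posterior(y):
        prior, means, variances = model[y]
        total = math.log(prior)
        for v, m, var in zip(x, means, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)

# Made-up records: ((glucose, bmi), label).
data = [((85, 26.6), 0), ((89, 28.1), 0), ((78, 31.0), 0), ((110, 37.6), 0),
        ((116, 25.6), 0), ((148, 33.6), 1), ((183, 23.3), 1), ((137, 43.1), 1),
        ((197, 30.5), 1), ((168, 38.0), 1)]
nb_model = fit_gaussian_nb(data)
print(predict_nb(nb_model, (190, 35.0)), predict_nb(nb_model, (80, 27.0)))
```

The method is called naive because it treats the attributes as independent of each other within each class.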

Clustering: Drag the data set into the process area and add Numerical to Binominal and Remove Duplicates. Then define the label: look under Data Transformation, then Name and Role Modification, and drag in Set Role. The properties on the right of the page are where we define the label.
Look in the performance evaluation list and choose the clustering performance operator (performance clustering).
Look under Modeling, then Clustering and Segmentation, and drag in K-Means.
Then look under Data Transformation, then Attribute Set Reduction and Transformation, then Transformation, and choose Singular Value Decomposition.

Result: The data are split into two clusters.
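The two-cluster result comes from K-Means, which repeatedly assigns each record to its nearest centroid and then moves each centroid to the mean of its records. A minimal sketch follows, assuming k = 2 and using the first k points as the starting centroids (a simplification; implementations usually pick starting centroids at random). The sample points are made up.

```python
import math

def kmeans(points, k=2, iterations=20):
    """Plain k-means with the first k points as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Made-up (glucose, bmi) points.
points = [(85, 26.6), (183, 23.3), (89, 28.1), (197, 30.5), (78, 31.0), (168, 38.0)]
assignment, centroids = kmeans(points, k=2)
print(assignment)
```

The cluster numbers themselves carry no meaning; only the grouping of the records matters.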

The last process: outliers
Drag the data set into the process area. Look under Data Transformation, then Data Cleansing, then Outlier Detection, and drag in Detect Outliers (Distances). Find SVD and drag it in as well.

Result: Each record was classified as an outlier (outlier = true) or not an outlier (outlier = false).

This image shows the outlier statistics. There are only ten outliers.
Outliers = 10
Not outliers = 758

This chart shows the percentages of outlier and non-outlier values. Red represents the few outliers, while blue represents the non-outlier values, which make up the largest percentage.
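The percentages behind the chart follow directly from the two counts reported above. A short worked calculation, using only those counts:

```python
# Counts taken from the result above.
outliers = 10
non_outliers = 758

total = outliers + non_outliers
outlier_pct = 100 * outliers / total
non_outlier_pct = 100 * non_outliers / total

print(f"{total} records: {outlier_pct:.2f}% outliers, "
      f"{non_outlier_pct:.2f}% non-outliers")
# prints: 768 records: 1.30% outliers, 98.70% non-outliers
```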

Conclusion: This report showed how to obtain data and how to process it so that it is usable for data mining techniques, including association rules, outlier detection, clustering, and classification. Each technique has a different mechanism and gives a different result about the existing data set. Large data sets need these simple methods in order to be understood and to identify their statistics.