Horizontal Aggregations in SQL to Prepare Dataset Using Split-SPJ Method

Post on 13-Dec-2014

Category:

Technology

DESCRIPTION

Preparing a data set for analysis is generally the most time consuming task in a data mining project, requiring many complex SQL queries, joining tables, and aggregating columns. Existing SQL aggregations have limitations to prepare data sets because they return one column per aggregated group. In general, a significant manual effort is required to build data sets, where a horizontal layout is required. We propose simple, yet powerful, methods to generate SQL code to return aggregated columns in a horizontal tabular layout, returning a set of numbers instead of one number per row. This new class of functions is called horizontal aggregations in Split-SPJ method. Horizontal aggregations build data sets with a horizontal de-normalized layout (e.g., point-dimension, observation variable, instance-feature), which is the standard layout required by most data mining algorithms.

TRANSCRIPT

Welcome To

Thesis Presentation

Presentation On

Horizontal Aggregations in SQL to Prepare Dataset Using Split-SPJ Method

A THESIS & PROJECT

BY

Arifur Rahman (074051)
Md. Taz Uddin (074044)
Md. Tareq Imran (074050)

Supervised BY

Sumaya Kazary
Assistant Professor, Dept. of CSE, DUET

Overview

1. Introduction
2. Analysis
3. Experimental Overview
4. Compare Performance
5. Future Plans

April 10, 2023

Introduction

Preparing a data set for analysis is generally the most time consuming task in a data mining project, requiring many complex SQL queries, joining tables, and aggregating columns. Existing SQL aggregations have limitations to prepare data sets because they return one column per aggregated group. In general, a significant manual effort is required to build data sets, where a horizontal layout is required. We propose simple, yet powerful, methods to generate SQL code to return aggregated columns in a horizontal tabular layout, returning a set of numbers instead of one number per row. This new class of functions is called horizontal aggregations in Split-SPJ method. Horizontal aggregations build data sets with a horizontal de-normalized layout (e.g., point-dimension, observation variable, instance-feature), which is the standard layout required by most data mining algorithms.

Introduction (Contd)

Data Mining: Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information.

Introduction (Contd)

Dataset : A dataset (or data set) is a collection of data, usually presented in tabular form. Each column represents a particular variable. Each row corresponds to a given member of the dataset in question.

Vertical Aggregation: It arranges the data set from the database vertically, according to the query (such as a GROUP BY clause in SQL). Relational database systems generally return aggregations in this vertical layout.
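As a minimal sketch of a vertical aggregation, the following runs a GROUP BY query through Python's built-in sqlite3 module. The table name images, the lowercase column names, and the four sample rows are assumptions made for illustration, loosely modeled on the experimental data shown later in this presentation.

```python
import sqlite3

# Hypothetical sample rows (facebook_id, image_name, character_length).
rows = [
    ("User1", "Pic1", 31),
    ("User1", "Pic2", 27),
    ("User2", "Pic1", 14),
    ("User2", "Pic2", 17),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images (facebook_id TEXT, image_name TEXT, character_length INTEGER)"
)
conn.executemany("INSERT INTO images VALUES (?, ?, ?)", rows)

# Vertical aggregation: GROUP BY returns one row per group,
# with a single aggregated value in each row.
vertical = conn.execute(
    "SELECT facebook_id, SUM(character_length) "
    "FROM images GROUP BY facebook_id ORDER BY facebook_id"
).fetchall()
print(vertical)  # [('User1', 58), ('User2', 31)]
```

Each user appears once, with one number per row, which is the vertical layout described above.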

Introduction (Contd)

Horizontal Aggregation: Here we introduce a new class of aggregations that behave like standard SQL aggregations but produce tables with a horizontal layout. In contrast, we call standard SQL aggregations vertical aggregations, since they produce tables with a vertical layout. Horizontal aggregations require only a small syntax extension to aggregate functions called in a SELECT statement.
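To make the horizontal layout concrete, here is a rough sketch that produces one column per distinct value using plain SQL CASE expressions. This is a swapped-in, widely used technique (conditional aggregation), not the syntax extension mentioned above and not necessarily the authors' method. The table images and its sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample rows (facebook_id, image_name, character_length).
rows = [
    ("User1", "Pic1", 31),
    ("User1", "Pic2", 27),
    ("User2", "Pic1", 14),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images (facebook_id TEXT, image_name TEXT, character_length INTEGER)"
)
conn.executemany("INSERT INTO images VALUES (?, ?, ?)", rows)

# Each distinct image_name becomes one output column.
names = [r[0] for r in conn.execute(
    "SELECT DISTINCT image_name FROM images ORDER BY image_name")]

# One SUM(CASE ...) per distinct value. The values are interpolated into the
# SQL text, which is acceptable only because this demo data is trusted.
cols = ", ".join(
    f"SUM(CASE WHEN image_name = '{n}' THEN character_length END) "
    f"AS image_name_{n.lower()}"
    for n in names
)
sql = f"SELECT facebook_id, {cols} FROM images GROUP BY facebook_id ORDER BY facebook_id"

horizontal = conn.execute(sql).fetchall()
print(horizontal)  # [('User1', 31, 27), ('User2', 14, None)]
```

Each user now occupies a single row holding a set of numbers; a missing combination (User2, Pic2) shows up as NULL.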

Analysis

Problem of Horizontal Aggregation: The number of result columns may exceed the number of columns the DBMS allows in one table. Two limits can be reached: the maximum number of columns in one table and, when columns are named automatically, the maximum column name length.

To elaborate, a horizontal aggregation can return a table that goes beyond the maximum number of columns in the DBMS when the set of columns {R1, ..., Rk} has a large number of distinct combinations of values, or when there are multiple horizontal aggregations in the same query.
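The column count described above can be estimated before running the aggregation. The sketch below counts the distinct combinations of the transposing columns and multiplies by the number of horizontal aggregations. The helper horizontal_column_count, the table f, its columns r1 and r2, and the limit of 255 are all assumptions chosen for illustration.

```python
import sqlite3

MAX_COLUMNS = 255  # assumed limit, e.g. the Microsoft Access value

def horizontal_column_count(conn, table, transposing_cols, n_aggregations=1):
    """Estimate how many aggregated columns a horizontal aggregation creates."""
    col_list = ", ".join(transposing_cols)
    (combos,) = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT DISTINCT {col_list} FROM {table})"
    ).fetchone()
    # One column per distinct combination, for every aggregation in the query.
    return combos * n_aggregations

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE f (id TEXT, r1 TEXT, r2 TEXT, a INTEGER)")
conn.executemany(
    "INSERT INTO f VALUES (?, ?, ?, ?)",
    [("u1", "x", "p", 1), ("u1", "x", "q", 2), ("u2", "y", "p", 3), ("u2", "x", "p", 4)],
)

needed = horizontal_column_count(conn, "f", ["r1", "r2"], n_aggregations=2)
exceeds_limit = needed > MAX_COLUMNS
print(needed, exceeds_limit)  # 6 False
```

Here three distinct (r1, r2) combinations and two aggregations give six columns, well under the assumed limit.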

Analysis (Contd)

Column limit of different database systems:

Database               Maximum permitted columns
Microsoft Access       255
Microsoft SQL Server   1024
MySQL                  4096
Oracle                 Default 1000, but it can be increased by command

Analysis (Contd)

Introducing the Split-SPJ Method

Suppose the vertical attributes of a table are:
ID, VA1, VA2, VA3, VA4, ..., VA255, VA256, VA257, ..., VA272, VA273
(With a 255-column limit, the SPJ method cannot aggregate these into one table.)

The output of the Split-SPJ method:
Table-1: ID, VA1, VA2, VA3, VA4, VA5, VA6, VA7, ..., VA255
Table-2: ID, VA256, VA257, ..., VA270, VA271, VA272, VA273
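The split shown above can be sketched as a simple chunking step. The function split_columns and its parameters are hypothetical names; the chunk size of 255 and the repetition of ID in every table follow the example on this slide.

```python
def split_columns(columns, key="ID", max_per_table=255):
    """Split aggregated columns into tables of at most max_per_table columns,
    repeating the key column in each table so the tables can be joined later."""
    return [
        [key] + columns[i:i + max_per_table]
        for i in range(0, len(columns), max_per_table)
    ]

# The 273 vertical attributes from the example: VA1 ... VA273.
columns = [f"VA{i}" for i in range(1, 274)]
tables = split_columns(columns)

for n, table in enumerate(tables, start=1):
    print(f"Table-{n}: {table[0]}, {table[1]}, ..., {table[-1]}")
# Table-1: ID, VA1, ..., VA255
# Table-2: ID, VA256, ..., VA273
```

Because every table carries the same ID column, the pieces can be joined back on ID whenever a consumer is able to handle the full width.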

Experimental Overview

Vertical aggregation of experimental data:

Facebook_id   Image_name   Character_length
User1         Pic1         31
User1         Pic2         27
User1         Pic4         20
User1         Pic10        30
...           ...          ...
User4         Pic200       10
User4         Pic220       26
User4         Pic299       15
User4         Pic340       25
User4         Pic360       35

Experimental Overview (Contd)

Horizontal aggregation in SPJ method:

Facebook_id   Image_name_pic1   Image_name_pic2   Image_name_pic3   ...   Image_name_pic255
User1         31                31                20                ...
User2         14                17                14                ...
User3         17                15                13                ...
User4         10                5                 8                 ...
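The slides do not spell out the SPJ method, so the following is a sketch under the assumption that SPJ means select-project-join: one vertical aggregation per distinct value, each left outer joined to the list of ids. The table images, its sample rows, and the aliases f0, f1, ... are hypothetical.

```python
import sqlite3

# Hypothetical sample rows (facebook_id, image_name, character_length).
rows = [
    ("User1", "Pic1", 31), ("User1", "Pic2", 27),
    ("User2", "Pic1", 14), ("User2", "Pic3", 9),
]
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images (facebook_id TEXT, image_name TEXT, character_length INTEGER)"
)
conn.executemany("INSERT INTO images VALUES (?, ?, ?)", rows)

names = [r[0] for r in conn.execute(
    "SELECT DISTINCT image_name FROM images ORDER BY image_name")]

selects = ["f0.facebook_id AS facebook_id"]
joins = []
for i, name in enumerate(names, start=1):
    # Project the aggregate of subquery f{i} as one output column.
    selects.append(f"f{i}.total AS image_name_{name.lower()}")
    # Select rows for one value, aggregate vertically, then join on the id.
    # Values are interpolated only because this demo data is trusted.
    joins.append(
        f"LEFT OUTER JOIN (SELECT facebook_id, SUM(character_length) AS total"
        f" FROM images WHERE image_name = '{name}' GROUP BY facebook_id) AS f{i}"
        f" ON f{i}.facebook_id = f0.facebook_id"
    )

select_list = ", ".join(selects)
join_list = " ".join(joins)
sql = (
    f"SELECT {select_list}"
    f" FROM (SELECT DISTINCT facebook_id FROM images) AS f0"
    f" {join_list} ORDER BY f0.facebook_id"
)
spj_result = conn.execute(sql).fetchall()
print(spj_result)  # [('User1', 31, 27, None), ('User2', 14, None, 9)]
```

The query grows by one join per output column, which is why the result width runs directly into the DBMS column limit.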

Experimental Overview (Contd)

Horizontal aggregation in proposed Split-SPJ method:

Table-1

Facebook_id   Image_name_pic1   Image_name_pic2   Image_name_pic3   ...   Image_name_pic255
User1         31                31                20                ...
User2         14                17                14                ...
User3         17                15                13                ...
User4         10                5                 8                 ...

Table-2

Facebook_id   Image_name_pic256   Image_name_pic267   Image_name_pic258   ...   Image_name_pic360
User1         31 31 60 50
User2         14 45 40
User3         17 15
User4         10 5 80
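A rough end-to-end sketch of the two-table output above: split the distinct values into chunks, then build one horizontally aggregated table per chunk. It assumes SPJ means one aggregated subquery per value joined on the id. MAX_COLS is set to 2 purely to keep the demo small (the slides use 255), and the names images, spj_sql, table_1 and table_2 are hypothetical.

```python
import sqlite3

MAX_COLS = 2  # tiny stand-in for a real limit such as 255

# Hypothetical sample rows (facebook_id, image_name, character_length).
rows = [
    ("User1", "Pic1", 31), ("User1", "Pic2", 27), ("User1", "Pic3", 20),
    ("User2", "Pic1", 14), ("User2", "Pic3", 9),
]
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images (facebook_id TEXT, image_name TEXT, character_length INTEGER)"
)
conn.executemany("INSERT INTO images VALUES (?, ?, ?)", rows)

def spj_sql(names):
    """Build an SPJ-style query for the given values (trusted demo data only)."""
    selects = ["f0.facebook_id AS facebook_id"]
    joins = []
    for i, name in enumerate(names, start=1):
        selects.append(f"f{i}.total AS image_name_{name.lower()}")
        joins.append(
            f"LEFT OUTER JOIN (SELECT facebook_id, SUM(character_length) AS total"
            f" FROM images WHERE image_name = '{name}' GROUP BY facebook_id) AS f{i}"
            f" ON f{i}.facebook_id = f0.facebook_id"
        )
    select_list = ", ".join(selects)
    join_list = " ".join(joins)
    return (
        f"SELECT {select_list}"
        f" FROM (SELECT DISTINCT facebook_id FROM images) AS f0 {join_list}"
    )

names = [r[0] for r in conn.execute(
    "SELECT DISTINCT image_name FROM images ORDER BY image_name")]

# Split step: at most MAX_COLS aggregated columns per output table.
chunks = [names[i:i + MAX_COLS] for i in range(0, len(names), MAX_COLS)]
for k, chunk in enumerate(chunks, start=1):
    conn.execute(f"CREATE TABLE table_{k} AS {spj_sql(chunk)}")

table_1 = conn.execute("SELECT * FROM table_1 ORDER BY facebook_id").fetchall()
table_2 = conn.execute("SELECT * FROM table_2 ORDER BY facebook_id").fetchall()
print(table_1)  # [('User1', 31, 27), ('User2', 14, None)]
print(table_2)  # [('User1', 20), ('User2', 9)]
```

Every output table keeps facebook_id, so no single table exceeds the limit while the rows stay aligned across tables.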

Experimental Overview

Compare Performance:

When the number of aggregated columns is below 255, performance is the same for the SPJ and Split-SPJ methods.

Experimental Overview (Contd)

Compare Performance:

When the number of aggregated columns exceeds 255, the SPJ method cannot aggregate beyond 255 columns.

Experimental Overview (Contd)

Compare Performance:

When the number of aggregated columns exceeds 255, the Split-SPJ method can aggregate them into multiple tables.

Future Plan

If the length of an aggregated object's name exceeds the column name length allowed by the database, an error occurs, which may be overcome by using aliases. This also means that aggregation is very complex when data fields contain images or files (such as BLOB data).
