linux dm lab manual

Upload: seravana-kumar

Post on 14-Apr-2018


  • 7/29/2019 Linux DM Lab Manual

    1/68

    Contents

    1. List of Linux Programs

    2. List of Data Mining Programs

    3. Week 1
       1. Write a shell script that accepts a file name, starting and ending line numbers as arguments and displays all the lines between the given line numbers.
       2. Write a shell script that deletes all lines containing a specified word in one or more files supplied as arguments to it.
       3. Write a shell script that displays a list of all the files in the current directory to which the user has read, write and execute permissions.
       4. Write a shell script that receives any number of file names as arguments, checks if every argument supplied is a file or a directory and reports accordingly. Whenever the argument is a file, the number of lines in it is also reported.

    4. Week 2
       5. Write a shell script that accepts a list of file names as its arguments, counts and reports the occurrence of each word that is present in the first argument file on other argument files.
       6. Write a shell script to list all of the directory files in a directory.
       7. Write a shell script to find factorial of a given integer.

    5. Week 3
       8. Write an awk script to count the number of lines in a file that do not contain vowels.
       9. Write an awk script to find the number of characters, words and lines in a file.
       10. Write a C program that makes a copy of a file using standard I/O and system calls.

    6. Week 4
       11. Implement in C the following UNIX commands using system calls: A. cat B. ls C. mv
       12. Write a program that takes one or more file/directory names as command line input and reports the following information on the file: A. File type. B. Number of links. C. Time of last access. D. Read, Write and Execute permissions.

    7. Week 5
       13. Write a C program to emulate the UNIX ls -l command.
       14. Write a C program to list for every file in a directory, its inode number and file name.
       15. Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

    8. Week 6
       16. Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen.
       17. Write a C program to create a Zombie process.
       18. Write a C program that illustrates how an orphan is created.

    9. Week 7
       19. Write a C program that illustrates how to execute two commands concurrently with a command pipe. Ex: ls -l | sort
       20. Write C programs that illustrate communication between two unrelated processes using named pipe.
       21. Write a C program to create a message queue with read and write permissions to write 3 messages to it with different priority numbers.
       22. Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them.

    10. Week 8
       23. Write a C program to allow cooperating processes to lock a resource for exclusive use, using a) Semaphores b) flock or lockf system calls.
       24. Write a C program that illustrates suspending and resuming processes using signals.

    11. Week 9
       25. Write a C program that implements a producer-consumer system with two processes (using semaphores).
       26. Write client and server programs (using C) for interaction between server and client processes using Unix Domain sockets.

    12. Week 10
       27. Write client and server programs (using C) for interaction between server and client processes using Internet Domain sockets.
       28. Write a C program that illustrates two processes communicating using shared memory.

    13. Listing of categorical attributes and the real-valued attributes separately.

    14. Rules for identifying attributes.

    15. Training a decision tree.

    16. Test on classification of decision tree.

    17. Testing on the training set.

    18. Using cross validation for training.

    19. Significance of attributes in decision tree.

    20. Trying generation of decision tree with various number of decision tree.

    21. Find out differences in results using decision tree and cross-validation on a data set.

    22. Decision trees.

    23. Reduced error pruning for training Decision Trees using cross-validation.

    24. Convert a Decision Tree into "if-then-else" rules.

    List of Linux Programs

    1. Write a shell script that accepts a file name, starting and ending line numbers as arguments and displays all the lines between the given line numbers.

    2. Write a shell script that deletes all lines containing a specified word in one or more files supplied as arguments to it.

    3. Write a shell script that displays a list of all the files in the current directory to which the user has read, write and execute permissions.

    4. Write a shell script that receives any number of file names as arguments, checks if every argument supplied is a file or a directory and reports accordingly. Whenever the argument is a file, the number of lines in it is also reported.

    5. Write a shell script that accepts a list of file names as its arguments, counts and reports the occurrence of each word that is present in the first argument file on other argument files.

    6. Write a shell script to list all of the directory files in a directory.

    7. Write a shell script to find factorial of a given integer.

    8. Write an awk script to count the number of lines in a file that do not contain vowels.

    9. Write an awk script to find the number of characters, words and lines in a file.

    10. Write a C program that makes a copy of a file using standard I/O and system calls.

    11. Implement in C the following UNIX commands using system calls: A. cat B. ls C. mv

    12. Write a program that takes one or more file/directory names as command line input and reports the following information on the file: A. File type. B. Number of links. C. Time of last access. D. Read, Write and Execute permissions.

    13. Write a C program to emulate the UNIX ls -l command.

    14. Write a C program to list for every file in a directory, its inode number and file name.

    15. Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

    16. Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen.

    17. Write a C program to create a Zombie process.

    18. Write a C program that illustrates how an orphan is created.

    19. Write a C program that illustrates how to execute two commands concurrently with a command pipe. Ex: ls -l | sort

    20. Write C programs that illustrate communication between two unrelated processes using named pipe.

    21. Write a C program to create a message queue with read and write permissions to write 3 messages to it with different priority numbers.

    22. Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them.

    23. Write a C program to allow cooperating processes to lock a resource for exclusive use, using a) Semaphores b) flock or lockf system calls.

    24. Write a C program that illustrates suspending and resuming processes using signals.

    25. Write a C program that implements a producer-consumer system with two processes (using semaphores).

    26. Write client and server programs (using C) for interaction between server and client processes using Unix Domain sockets.

    27. Write client and server programs (using C) for interaction between server and client processes using Internet Domain sockets.

    28. Write a C program that illustrates two processes communicating using shared memory.

    List of Data Mining Programs

    S.No.  Task Description

    1. List all the categorical (or nominal) attributes and the real-valued attributes separately.

    2. What attributes do you think might be crucial in making the credit assessment? Come up with some simple rules in plain English using your selected attributes.

    3. One type of model that you can create is a Decision Tree. Train a Decision Tree using the complete dataset as the training data. Report the model obtained after training.

    4. Suppose you use your above model trained on the complete dataset, and classify credit good/bad for each of the examples in the dataset. What % of examples can you classify correctly? (This is also called testing on the training set.) Why do you think you cannot get 100% training accuracy?

    5. Is testing on the training set as you did above a good idea? Why or why not?

    6. One approach for solving the problem encountered in the previous question is using cross-validation. Describe briefly what cross-validation is. Train a Decision Tree again using cross-validation and report your results. Does your accuracy increase/decrease? Why? (10 marks)

    7. Check to see if the data shows a bias against "foreign workers" (attribute 20), or "personal-status" (attribute 9). One way to do this (perhaps rather simple minded) is to remove these attributes from the dataset and see if the decision tree created in those cases is significantly different from the full dataset case which you have already done. To remove an attribute you can use the preprocess tab in Weka's GUI Explorer. Did removing these attributes have any significant effect? Discuss.

    8. Another question might be, do you really need to input so many attributes to get good results? Maybe only a few would do. For example, you could try just having attributes 2, 3, 5, 7, 10, 17 (and 21, the class attribute (naturally)). Try out some combinations. (You had removed two attributes in problem 7. Remember to reload the arff data file to get all the attributes initially before you start selecting the ones you want.)

    9. Sometimes, the cost of rejecting an applicant who actually has a good credit (case 1) might be higher than accepting an applicant who has bad credit (case 2). Instead of counting the misclassifications equally in both cases, give a higher cost to the first case (say cost 5) and lower cost to the second case. You can do this by using a cost matrix in Weka. Train your Decision Tree again and report the Decision Tree and cross-validation results. Are they significantly different from results obtained in problem 6 (using equal cost)?

    10. Do you think it is a good idea to prefer simple decision trees instead of having long complex decision trees? How does the complexity of a Decision Tree relate to the bias of the model?

    11. You can make your Decision Trees simpler by pruning the nodes. One approach is to use Reduced Error Pruning. Explain this idea briefly. Try reduced error pruning for training your Decision Trees using cross-validation (you can do this in Weka) and report the Decision Tree you obtain. Also, report your accuracy using the pruned model. Does your accuracy increase?

    12. (Extra Credit): How can you convert a Decision Tree into "if-then-else" rules? Make up your own small Decision Tree consisting of 2-3 levels and convert it into a set of rules. There also exist different classifiers that output the model in the form of rules. One such classifier in Weka is rules.PART; train this model and report the set of rules obtained. Sometimes just one attribute can be good enough in making the decision, yes, just one! Can you predict what attribute that might be in this dataset? The OneR classifier uses a single attribute to make decisions (it chooses the attribute based on minimum error). Report the rule obtained by training a OneR classifier.


    Week 1

    1. Write a shell script that accepts a file name, starting and ending line numbers as
    arguments and displays all the lines between the given line numbers.

    Aim: To write a shell script that accepts a file name, starting and ending line numbers as
    arguments and displays all the lines between the given line numbers.

    Script:

    # usage: sh 11b.sh file start end
    if [ $# -ne 3 ]
    then
        echo "Error : Invalid number of arguments."
        exit 1
    fi
    if [ $2 -gt $3 ]
    then
        echo "Error : Invalid range value."
        exit 1
    fi
    # number of lines to print, counting both ends of the range
    l=`expr $3 - $2 + 1`
    # tail -n +N starts at line N; head -n keeps the first l of those lines
    tail -n +$2 "$1" | head -n $l

    Output:

    $ sh 11b.sh test 5 7
    abc 1234
    def 5678
    ghi 91011

    Description:

    head command: displays the lines at the beginning of one or more files. By default it
    displays the first 10 lines of a file.

    head [ -n count ] filename

    tail command: displays the lines at the end of a file. By default it displays the last
    10 lines of a file.

    tail [ -n +/-start ] filename

    start is the starting line number.

    tail -n 5 filename : displays the last 5 lines of the file.

    tail -n +5 filename : displays all the lines from line number 5 to the end of the file.

    The older forms tail -5 and tail +5 have the same meaning, but tail +5 is not accepted by all current implementations.

    2. Write a shell script that deletes all lines containing a specified word in one or more

    files supplied as arguments to it.

    Aim: To write a shell script that deletes all lines containing a specified word in one or

    more files supplied as arguments to it.

    Script:

    clear
    if [ $# -eq 0 ]
    then
        echo no arguments passed
        exit
    fi
    echo the contents before deleting
    for i in "$@"
    do
        echo "$i"
        cat "$i"
    done
    echo enter the word to be deleted
    read word
    for i in "$@"
    do
        # -v keeps only the lines that do not match; -i ignores case
        grep -vi "$word" "$i" > temp
        mv temp "$i"
        echo after deleting
        cat "$i"
    done

    Output:

    $ sh 8b.sh test1
    the contents before deleting
    test1
    hello
    hello
    bangalore
    mysore city
    enter the word to be deleted
    city
    after deleting
    hello
    hello
    bangalore

    $ sh 8b.sh
    no arguments passed

    3. Write a shell script that displays a list of all the files in the current directory to

    which the user has read, write and execute permissions.

    Aim: To write a shell script that displays a list of all the files in the current directory to

    which the user has read, write and execute permissions.

    Script:

    echo "enter the directory name"
    read dir
    if [ -d "$dir" ]
    then
        cd "$dir"
        # save the listing, then read it back one name per line
        ls > f
        exec < f
        while read line
        do
            if [ -f "$line" ]
            then
                if [ -r "$line" -a -w "$line" -a -x "$line" ]
                then
                    echo "$line has all permissions"
                else
                    echo "$line does not have all permissions"
                fi
            fi
        done
    fi

    4. Write a shell script that receives any number of file names as arguments, checks if
    every argument supplied is a file or a directory and reports accordingly. Whenever
    the argument is a file, the number of lines in it is also reported.

    Aim: To write a shell script that receives any number of file names as arguments, checks
    if every argument supplied is a file or a directory and reports accordingly.

    Script:

    for x in "$@"
    do
        if [ -f "$x" ]
        then
            echo " $x is a file "
            echo " no of lines in the file are "
            wc -l "$x"
        elif [ -d "$x" ]
        then
            echo " $x is a directory "
        else
            echo " enter valid filename or directory name "
        fi
    done

    Week 2

    5. Write a shell script that accepts a list of file names as its arguments, counts and

    reports the occurrence of each word that is present in the first argument file on

    other argument files.

    Aim : To write a shell script that accepts a list of file names as its arguments, counts

    and reports the occurrence of each word that is present in the first argument file on

    other argument files.

    Script:

    if [ $# -ne 2 ]

    then

    echo "Error : Invalid number of arguments."

    exit

    fi

    str=`cat $1 | tr '\n' ' '`

    for a in $str

    do

    echo "Word = $a, Count = `grep -c "$a" $2`"

    done

    Output :

    $ cat test

    hello AEC

    $ cat test1

    hello AEC

    hello AEC

    hello

    $ sh 1.sh test test1

    Word = hello, Count = 3

    Word = AEC, Count = 2

    6. Write a shell script to list all of the directory files in a directory.

    Script:

    #!/bin/bash
    echo "enter directory name"
    read dir
    if [ -d "$dir" ]
    then
        echo "list of files in the directory"
        ls "$dir"
    else
        echo "enter proper directory name"
    fi

    Output:

    enter directory name
    AEC
    list of files in the directory
    CSE.txt
    ECE.txt

    7. Write a shell script to find factorial of a given integer.

    Script:

    #!/bin/bash
    echo "enter a number"
    read num
    n=$num          # keep the input; num is counted down to 0 below
    fact=1
    while [ $num -ge 1 ]
    do
        fact=`echo "$fact * $num" | bc`
        let num--
    done
    echo "factorial of $n is $fact"

    Output:

    enter a number
    5
    factorial of 5 is 120

    Week 3

    8. Write an awk script to count the number of lines in a file that do not contain

    vowels.

    9. Write an awk script to find the number of characters, words and lines in a file.

    10. Write a C program that makes a copy of a file using standard I/O and system calls.

    Aim: To write an awk script to find the number of characters, words and lines in a file.

    Script:

    BEGIN { print "record\tcharacters\twords" }
    # BODY section: runs once per input line
    {
        len = length($0)
        total_len += len
        print NR ":\t" len ":\t" NF "\t" $0
        words += NF
    }
    END {
        print "\ntotal"
        print "characters :\t" total_len
        print "words :\t" words
        print "lines :\t" NR
    }

    Week 4

    11. Implement in C the following UNIX commands using System calls

    A. cat B. ls C. mv

    12. Write a program that takes one or more file/directory names as command line input and reports the following information on the file.

    A. File type. B. Number of links.

    C. Time of last access. D. Read, Write and Execute permissions.

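    AIM: Implement in C the cat Unix command using system calls

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int fd1;
        int n;
        char buf;

        if (argc < 2) {
            fprintf(stderr, "usage: %s file\n", argv[0]);
            return 1;
        }
        fd1 = open(argv[1], O_RDONLY);
        if (fd1 < 0) {
            perror(argv[1]);
            return 1;
        }
        printf("Welcome to AEC\n");
        fflush(stdout);     /* flush stdio output before using write() */
        /* copy the file to standard output one byte at a time */
        while ((n = read(fd1, &buf, 1)) > 0)
            write(1, &buf, 1);
        close(fd1);
        return (0);
    }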

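    AIM: Implement in C the ls Unix command using system calls

    Algorithm:

    1. Start.
    2. Open the directory using opendir().
    3. Read the directory using readdir().
    4. Print the d_name and d_ino fields of the entry.
    5. Repeat steps 3 and 4 until the end of the directory.
    6. End.

    The program below uses scandir(), which performs the open, read and sort steps in one call.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <dirent.h>
    #include <unistd.h>
    #include <sys/param.h>

    #define FALSE 0
    #define TRUE 1

    char pathname[MAXPATHLEN];

    int file_select(const struct dirent *entry);

    int main(void)
    {
        int count, i;
        struct dirent **files;

        if (getcwd(pathname, sizeof pathname) == NULL) {
            printf("Error getting path\n");
            exit(0);
        }
        printf("Current Working Directory = %s\n", pathname);
        count = scandir(pathname, &files, file_select, alphasort);
        if (count <= 0) {
            printf("No files in this directory\n");
            exit(0);
        }
        printf("Number of files = %d\n", count);
        for (i = 0; i < count; i++)
            printf("%s  ", files[i]->d_name);
        printf("\n");
        return 0;
    }

    /* scandir() filter: skip the "." and ".." entries */
    int file_select(const struct dirent *entry)
    {
        if ((strcmp(entry->d_name, ".") == 0) || (strcmp(entry->d_name, "..") == 0))
            return (FALSE);
        else
            return (TRUE);
    }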

    AIM: Implement in C the Unix command mv using system calls

    Algorithm:

    1. Start.
    2. Open the existing file using open() and create the new file using creat().
    3. Read the contents of the existing file using the read() system call.
    4. Write these contents into the new file using the write() system call.
    5. Repeat the above 2 steps until end of file.
    6. Close both files using the close() system call.
    7. Delete the existing file using the unlink() system call.
    8. End.

    Program:

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/stat.h>

    int main(int argc, char **argv)
    {
        int fd1, fd2;
        ssize_t n;
        char buf[4096];

        if (argc != 3) {
            fprintf(stderr, "usage: %s source target\n", argv[0]);
            return 1;
        }
        fd1 = open(argv[1], O_RDONLY);
        fd2 = creat(argv[2], S_IRUSR | S_IWUSR);
        if (fd1 < 0 || fd2 < 0) {
            perror("open");
            return 1;
        }
        while ((n = read(fd1, buf, sizeof buf)) > 0)    /* copy until EOF */
            write(fd2, buf, n);
        close(fd1);
        close(fd2);
        unlink(argv[1]);    /* remove the source; rename(argv[1], argv[2]) would move it in one call */
        printf("file is moved\n");
        return (0);
    }

    Week 5

    13. Write a C program to emulate the UNIX ls -l command.

    ALGORITHM:

    Step 1: Include necessary header files for manipulating directory.
    Step 2: Declare and initialize required objects.
    Step 3: Read the directory name from the user.
    Step 4: Open the directory using opendir() and report an error if the directory is not available.
    Step 5: Read the next entry available in the directory.
    Step 6: Display the directory entry, i.e., the name of the file or sub directory.
    Step 7: Repeat steps 5 and 6 until all the entries have been read.

    /* 1. Simulation of ls command */
    #include <stdio.h>
    #include <stdlib.h>
    #include <dirent.h>

    int main(void)
    {
        char dirname[256];
        DIR *p;
        struct dirent *d;

        printf("Enter directory name ");
        scanf("%255s", dirname);
        p = opendir(dirname);
        if (p == NULL) {
            perror("Cannot find dir.");
            exit(-1);
        }
        /* readdir() returns NULL after the last entry */
        while ((d = readdir(p)) != NULL)
            printf("%s\n", d->d_name);
        closedir(p);
        return 0;
    }

    SAMPLE OUTPUT:

    enter directory name iii

    ...

    f2


    14. Write a C program to list for every file in a directory, its inode number and file

    name.

    15. Write a C program that demonstrates redirection of standard output to a file.

    Ex: ls > f1

    Description:

    An Inode number points to an Inode. An Inode is a data structure that stores

    the following information about a file :

    Size of file

    Device ID

    User ID of the file

    Group ID of the file

    The file mode information and access privileges for owner, group and others

    File protection flags

    The timestamps for file creation, modification etc

    link counter to determine the number of hard links

    Pointers to the blocks storing files contents
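    The inode number of each directory entry is available in the d_ino field that readdir() returns. The following is a minimal sketch for exercise 14 under that assumption; the helper name list_inodes is hypothetical and not part of the manual.

```c
#include <dirent.h>
#include <stdio.h>

/* Print "inode  name" for every entry in dirpath.
 * Returns the number of entries printed, or -1 if the directory
 * cannot be opened. list_inodes is a hypothetical helper name. */
int list_inodes(const char *dirpath)
{
    DIR *dp = opendir(dirpath);
    struct dirent *entry;
    int count = 0;

    if (dp == NULL) {
        perror(dirpath);
        return -1;
    }
    while ((entry = readdir(dp)) != NULL) {
        /* d_ino holds the inode number, d_name the file name */
        printf("%10lu  %s\n", (unsigned long)entry->d_ino, entry->d_name);
        count++;
    }
    closedir(dp);
    return count;
}
```

    Calling list_inodes(".") from main() lists the current directory. The other inode fields in the list above (size, owner, link count, timestamps) can be read with stat() on each name.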


    Week 6

    16. Write a C program to create a child process and allow the parent to display

    parent and the child to display child on the screen.

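    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int childpid;

        if ((childpid = fork()) < 0) {
            printf("cannot fork\n");
        } else if (childpid > 0) {
            /* fork() returns the child's pid in the parent */
            printf("Parent process\n");
        } else {
            /* fork() returns 0 in the child */
            printf("Child process\n");
        }
        return 0;
    }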

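    17. Write a C program to create a Zombie process.

    If a child terminates before its parent has collected its exit status with wait(), the
    terminated child stays in the process table and is called a zombie process.

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        int childpid;

        if ((childpid = fork()) < 0) {
            printf("cannot fork\n");
        } else if (childpid == 0) {
            printf("child process\n");
            exit(0);        /* the child exits at once and becomes a zombie */
        } else {
            sleep(100);     /* the parent does not call wait(), so the zombie remains */
            printf("parent process\n");
        }
        return 0;
    }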

    18. Write a C program that illustrates how an orphan is created.

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int id;

        printf("Before fork()\n");
        id = fork();
        if (id == 0) {
            printf("Child has started: %d\n", getpid());
            printf("Parent of this child : %d\n", getppid());
            printf("child prints 1 item :\n");
            sleep(25);      /* the parent exits meanwhile, leaving the child an orphan */
            printf("child prints 2 item :\n");
        } else {
            printf("Parent has started: %d\n", getpid());
            printf("Parent of the parent proc : %d\n", getppid());
        }
        printf("After fork()\n");
        return 0;
    }

    Week 7

    19. Write a C program that illustrates how to execute two commands concurrently with

    a command pipe.

    Ex: ls -l | sort

    AIM: Implementing Pipes

    DESCRIPTION:

    A pipe is created by calling the pipe() function.

    int pipe(int filedesc[2]);

    It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is
    open for writing. The function returns 0 on success and -1 on error.

    ALGORITHM:

    The following is the simple algorithm for creating, writing to and reading from a pipe.

    1) Create a pipe through a pipe() function call.

    2) Use the write() function to write the data into the pipe. The syntax is as follows

    write(filedesc[1], ip_string, size);

    filedesc[1]  the write end of the pipe, where int filedesc[2] is the array passed to pipe().
    ip_string    the string to be written to the pipe.
    size         the number of bytes to write.

    3) Use the read() function to read the data that has been written to the pipe.
    The syntax is as follows

    read(filedesc[0], buffer, size);
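    The three steps above can be sketched in a few lines. This is a minimal illustration, assuming a child process writes one short message and the parent reads it; the function name pipe_roundtrip and its parameters are hypothetical and not part of the manual.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send msg from a child process to its parent through a pipe.
 * The parent stores what it read in out (NUL-terminated) and returns
 * the number of bytes received, or -1 on error.
 * pipe_roundtrip is a hypothetical helper name. */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t outsz)
{
    int filedesc[2];
    pid_t pid;
    ssize_t n;

    if (outsz == 0 || pipe(filedesc) == -1)     /* step 1: create the pipe */
        return -1;

    pid = fork();
    if (pid == -1) {
        close(filedesc[0]);
        close(filedesc[1]);
        return -1;
    }
    if (pid == 0) {                             /* child: writer */
        close(filedesc[0]);                     /* unused read end */
        write(filedesc[1], msg, strlen(msg));   /* step 2: write */
        close(filedesc[1]);
        _exit(0);
    }
    close(filedesc[1]);                         /* parent: unused write end */
    n = read(filedesc[0], out, outsz - 1);      /* step 3: read */
    close(filedesc[0]);
    waitpid(pid, NULL, 0);                      /* reap the child */
    if (n < 0)
        return -1;
    out[n] = '\0';
    return n;
}
```

    Each process closes the end of the pipe it does not use; otherwise a reader would never see end-of-file once the writer is done.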

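    PROGRAM:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/wait.h>

    void client(int readfd, int writefd);
    void server(int readfd, int writefd);

    int main(void)
    {
        int pipe1[2], pipe2[2], childpid;

        if (pipe(pipe1) < 0 || pipe(pipe2) < 0) {
            printf("cannot create pipes");
            exit(1);
        }
        if ((childpid = fork()) < 0) {
            printf("cannot fork");
        } else if (childpid > 0) {
            /* parent is the client: writes to pipe1, reads from pipe2 */
            close(pipe1[0]);
            close(pipe2[1]);
            client(pipe2[0], pipe1[1]);
            while (wait((int *) 0) != childpid)
                ;
            close(pipe1[1]);
            close(pipe2[0]);
            exit(0);
        } else {
            /* child is the server: reads from pipe1, writes to pipe2 */
            close(pipe1[1]);
            close(pipe2[0]);
            server(pipe1[0], pipe2[1]);
            close(pipe1[0]);
            close(pipe2[1]);
            exit(0);
        }
        return 0;
    }

    /* read a file name from stdin, send it to the server, print the reply */
    void client(int readfd, int writefd)
    {
        int n;
        char buff[1024];

        if (fgets(buff, 1024, stdin) == NULL) {
            printf("file name read error");
            return;
        }
        n = strlen(buff);
        if (buff[n-1] == '\n')
            n--;
        if (write(writefd, buff, n) != n)
            printf("file name write error");
        while ((n = read(readfd, buff, 1024)) > 0)
            if (write(1, buff, n) != n)
                printf("data write error");
        if (n < 0)
            printf("data read error");
    }

    /* receive a file name, then send back the contents of that file */
    void server(int readfd, int writefd)
    {
        char buff[1024];
        int n, fd;

        n = read(readfd, buff, 1023);
        if (n < 0)
            n = 0;
        buff[n] = '\0';
        if ((fd = open(buff, 0)) < 0) {
            strcpy(buff, "file does not exist\n");
            write(writefd, buff, strlen(buff));
        } else {
            while ((n = read(fd, buff, 1024)) > 0)
                write(writefd, buff, n);
        }
    }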
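    The exercise statement gives ls -l | sort as its example. A minimal sketch of that pipeline follows, assuming the ls and sort programs are on the PATH; the function name run_ls_sort is hypothetical and not part of the manual. Each command runs in its own child, with dup2() connecting the standard output of ls and the standard input of sort to the two ends of one pipe.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the equivalent of `ls -l | sort`.
 * Returns the exit status of sort, or -1 on failure.
 * run_ls_sort is a hypothetical helper name. */
int run_ls_sort(void)
{
    int fd[2];
    pid_t ls_pid, sort_pid;
    int status;

    if (pipe(fd) == -1)
        return -1;

    ls_pid = fork();
    if (ls_pid == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (ls_pid == 0) {                  /* first child: ls -l */
        dup2(fd[1], STDOUT_FILENO);     /* stdout now feeds the pipe */
        close(fd[0]);
        close(fd[1]);
        execlp("ls", "ls", "-l", (char *)NULL);
        _exit(127);                     /* reached only if exec failed */
    }

    sort_pid = fork();
    if (sort_pid == 0) {                /* second child: sort */
        dup2(fd[0], STDIN_FILENO);      /* stdin now drains the pipe */
        close(fd[0]);
        close(fd[1]);
        execlp("sort", "sort", (char *)NULL);
        _exit(127);
    }

    /* parent: close both ends so that sort sees EOF when ls finishes */
    close(fd[0]);
    close(fd[1]);
    waitpid(ls_pid, NULL, 0);
    if (sort_pid == -1)
        return -1;
    if (waitpid(sort_pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

    Both children run at the same time; sort starts consuming lines while ls is still producing them, which is what "concurrently" means in the exercise statement.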

    20. Write C programs that illustrate communication between two unrelated processes

    using named pipe.

    AIM: Implementing IPC using a FIFO (or) named pipe.

    DESCRIPTION:

    Another kind of IPC is the FIFO (First In First Out), sometimes also called a
    named pipe. It is like a pipe, except that it has a name. Here the name is that of a file
    that multiple processes can open(), read from and write to. A FIFO is created using the
    mknod() system call. The syntax is as follows

    int mknod(char *pathname, int mode, int dev);

    The pathname is a normal Unix pathname, and this is the name of the FIFO.
    The mode argument specifies the file type (S_IFIFO) and the access permissions. The dev
    value is ignored for a FIFO. POSIX also provides mkfifo(pathname, mode), which creates a
    FIFO without the S_IFIFO flag or the dev argument.

    Once a FIFO is created, it must be opened for reading (or) writing using either the
    open system call, or one of the standard I/O open functions, fopen or freopen.

    ALGORITHM:

    The following is the simple algorithm for creating, writing to and reading from a FIFO.

    1) Create a FIFO through a mknod() function call.

    2) Use the write() function to write the data into the FIFO. The syntax is as follows

    write(fd, ip_string, size);

    fd         the file descriptor returned by open() on the FIFO.
    ip_string  the string to be written to the FIFO.
    size       the number of bytes to write.

    3) Use the read() function to read the data that has been written to the FIFO.
    The syntax is as follows

    read(fd, buffer, size);
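    The steps above can be sketched as follows. This is a minimal illustration that uses mkfifo() in place of mknod() and, to stay self-contained, creates the writer with fork(); two unrelated programs would simply open the same path. The function name fifo_roundtrip and its parameters are hypothetical and not part of the manual.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create a FIFO at path, let a child write msg into it, and read the
 * message back in the parent. Returns the number of bytes read, or -1
 * on error. fifo_roundtrip is a hypothetical helper name. */
ssize_t fifo_roundtrip(const char *path, const char *msg,
                       char *out, size_t outsz)
{
    pid_t pid;
    int fd;
    ssize_t n;

    if (outsz == 0 || mkfifo(path, 0666) == -1)   /* step 1: create */
        return -1;

    pid = fork();
    if (pid == -1) {
        unlink(path);
        return -1;
    }
    if (pid == 0) {                         /* writer process */
        fd = open(path, O_WRONLY);          /* blocks until a reader opens */
        if (fd != -1) {
            write(fd, msg, strlen(msg));    /* step 2: write */
            close(fd);
        }
        _exit(0);
    }
    fd = open(path, O_RDONLY);              /* blocks until a writer opens */
    n = (fd == -1) ? -1 : read(fd, out, outsz - 1);   /* step 3: read */
    if (fd != -1)
        close(fd);
    waitpid(pid, NULL, 0);
    unlink(path);                           /* remove the FIFO's name */
    if (n < 0)
        return -1;
    out[n] = '\0';
    return n;
}
```

    Opening a FIFO for reading blocks until some process opens it for writing, and the reverse, so the order of the open() calls in the two processes matters when more than one FIFO is involved.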

    PROGRAM:

    #define FIFO1 "Fifo1"
    #define FIFO2 "Fifo2"

    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/stat.h>
    #include <sys/wait.h>

    void client(int readfd, int writefd);
    /* server() is not part of this listing; the server() function from
       program 19 can be used unchanged */
    void server(int readfd, int writefd);

    int main(void)
    {
        int childpid, wfd, rfd;

        mknod(FIFO1, 0666 | S_IFIFO, 0);
        mknod(FIFO2, 0666 | S_IFIFO, 0);
        if ((childpid = fork()) == -1) {
            printf("cannot fork");
        } else if (childpid > 0) {
            /* parent is the client: writes to FIFO1, reads from FIFO2 */
            wfd = open(FIFO1, O_WRONLY);
            rfd = open(FIFO2, O_RDONLY);
            client(rfd, wfd);
            while (wait((int *) 0) != childpid)
                ;
            close(rfd);
            close(wfd);
            unlink(FIFO1);
            unlink(FIFO2);
        } else {
            /* child is the server: reads from FIFO1, writes to FIFO2 */
            rfd = open(FIFO1, O_RDONLY);
            wfd = open(FIFO2, O_WRONLY);
            server(rfd, wfd);
            close(rfd);
            close(wfd);
        }
        return 0;
    }

    void client(int readfd, int writefd)
    {
        int n;
        char buff[1024];

        printf("enter a file name");
        if (fgets(buff, 1024, stdin) == NULL) {
            printf("file name read error");
            return;
        }
        n = strlen(buff);
        if (buff[n-1] == '\n')
            n--;
        if (write(writefd, buff, n) != n)
            printf("file name write error");
        while ((n = read(readfd, buff, 1024)) > 0)
            if (write(1, buff, n) != n)
                printf("data write error");
        if (n < 0)
            printf("data read error");
    }

    21. Write a C program to create a message queue with read and write permissions to

    write 3 messages to it with different priority numbers.

    22. Write a C program that receives the messages (from the above message queue as

    specified in (21)) and displays them.

    Aim: To create a message queue

    DESCRIPTION:

    Message passing between processes is a service of the operating system and is done through a
    message queue. Messages are stored in the kernel and are associated with a message queue
    identifier (msqid). Processes read and write messages to an arbitrary queue, so that one
    process can write a message to a queue and exit, and another process can read it at a later time.

    ALGORITHM:

    The ipc_perm structure must be defined before the message queue structures are used. This is
    done by including the following files.

    #include <sys/types.h>
    #include <sys/ipc.h>

    The kernel maintains a structure of information for each queue. It contains the following.

    struct msqid_ds {
        struct ipc_perm msg_perm;   /* operation permission */
        struct msg *msg_first;      /* ptr to first msg on queue */
        struct msg *msg_last;       /* ptr to last msg on queue */
        ushort msg_cbytes;          /* current bytes on queue */
        ushort msg_qnum;            /* current no of msgs on queue */
        ushort msg_qbytes;          /* max no of bytes on queue */
        ushort msg_lspid;           /* pid of last msg send */
        ushort msg_lrpid;           /* pid of last msg recvd */
        time_t msg_stime;           /* time of last msg snd */
        time_t msg_rtime;           /* time of last msg rcv */
        time_t msg_ctime;           /* time of last msg ctl */
    };

    To create a new message queue or access an existing message queue, the msgget() function is
    used.

    Syntax:

    int msgget(key_t key, int msgflag);

    msgflag values

    Numeric value   Symbolic value   Description
    0400            MSG_R            Read by owner
    0200            MSG_W            Write by owner
    0040            MSG_R >> 3       Read by group
    0020            MSG_W >> 3       Write by group

    msgget returns the msqid, or -1 on error.

    1. To put a message on the queue, the msgsnd() function is used.

    Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

    msqid is the message queue id, a unique id.

    ptr points to the actual content to send, a structure which contains the following

    struct msgbuf
    {
        long mtype;     /* message type > 0 */
        char mtext[1];  /* data */
    };

    length is the size of the message data in bytes.

    flag is either
    - IPC_NOWAIT, which makes the system call return immediately when there is no room on the
      queue; msgsnd then returns -1.
    - 0, in which case msgsnd waits until there is room on the queue.

    2. To receive a message, the msgrcv() function is used.

    Syntax:

    int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

    ptr is a pointer to the structure where the received message is to be stored.

    length is the size of the data area that ptr points to.

    msgtype selects which message is received: 0 takes the first message on the queue, a positive
    value takes the first message of that type, and a negative value takes the message with the
    lowest type that is less than or equal to its absolute value.

    flag can be MSG_NOERROR: if the data portion of the message is longer than length, the
    message is truncated and returned. Without this flag, msgrcv returns an error in that case.

    3. A variety of control operations on a message queue can be done through the msgctl() function.

    int msgctl(int msqid, int cmd, struct msqid_ds *buff);

    Passing IPC_RMID as cmd removes the message queue from the system.

    Let us create a header file msgq.h with the following in it:
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/msg.h>
    #include <errno.h>   /* declares errno */
    #define MKEY1 1234L
    #define MKEY2 2345L
    #define PERMS 0666


    Server operation algorithm:
    #include "msgq.h"
    int main()
    {
        int readid, writeid;
        if ((readid = msgget(MKEY1, PERMS | IPC_CREAT))


    24. Write a C program that illustrates suspending and resuming processes using

    signals.

    23. a) AIM: C program that illustrates file locking using semaphores

    PROGRAM:

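    #include <stdio.h>
    #include <stdlib.h>
    #include <errno.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>
    /* On Linux the calling program must define union semun itself. */
    union semun {
        int val;
        struct semid_ds *buf;
        unsigned short *array;
    };
    int main(void)
    {
        key_t key;
        int semid;
        union semun arg;
        /* derive a key from an existing file and a project id */
        if ((key = ftok("semdemo.c", 'j')) == -1)
        {
            perror("ftok");
            exit(1);
        }
        /* create a semaphore set containing one semaphore */
        if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
            perror("semget");
            exit(1);
        }
        /* initialize semaphore 0 to 1 (unlocked) */
        arg.val = 1;
        if (semctl(semid, 0, SETVAL, arg) == -1)
        {
            perror("smctl");
            exit(1);
        }
        return 0;
    }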

    OUTPUT:

    semget

    smctl

    Week 9


    25. Write a C program that implements a producer-consumer system with two

    processes. (using Semaphores).

    26. Write client and server programs (using c) for interaction between server and client

    processes using Unix Domain sockets.

    Algorithm:

    1. Start

    2. create semaphore using semget( ) system call

    3. if successful it returns a non-negative semaphore set identifier

    4. create two new processes

    5. first process will produce

    6. until first process produces second process cannot consume

    7. End.

    Source code:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <time.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>
    #define num_loops 2
    int main(int argc, char *argv[])
    {
        int sem_set_id;
        int child_pid, i, sem_val;
        struct sembuf sem_op;
        int rc;
        struct timespec delay;
        /* create a private semaphore set, accessible only by the owner */
        sem_set_id = semget(IPC_PRIVATE, 2, 0600);
        if (sem_set_id == -1)
        {
            perror("main: semget");
            exit(1);
        }
        printf("semaphore set created, semaphore set id %d\n", sem_set_id);
        child_pid = fork();
        switch (child_pid)
        {
        case -1:


    perror("fork");

    exit(1);

    case 0:

    for(i=0;i


    27. Write client and server programs (using c) for interaction between server and client

    processes using Internet Domain sockets.

    28. Write a C program that illustrates two processes communicating using shared

    memory.

    DESCRIPTION:
    Shared memory is an efficient means of passing data between programs. One program creates a
    memory portion which other processes (if permitted) can access.
    The problem with pipes, FIFOs and message queues is that for two processes to exchange
    information, the information has to go through the kernel. Shared memory provides a way around
    this by letting two or more processes share a memory segment.
    Access to shared memory has to be synchronized: if one process is reading data into some
    shared memory, for example, other processes must wait for the read to finish before processing
    the data.
    A process creates a shared memory segment using shmget(). The original owner of a shared
    memory segment can assign ownership to another user with shmctl(). It can also revoke this
    assignment. Other processes with proper permission can perform various control functions on the
    shared memory segment using shmctl(). Once created, a shared segment can be attached to a
    process address space using shmat(). It can be detached using shmdt() (see shmop()). The
    attaching process must have the appropriate permissions for shmat(). Once attached, the process
    can read or write to the segment, as allowed by the permission requested in the attach
    operation. A shared segment can be attached multiple times by the same process.
    A shared memory segment is described by a control structure with a unique ID that points to an
    area of physical memory. The identifier of the segment is called the shmid. The structure
    definition for the shared memory segment control structures and prototypes can be found in
    <sys/shm.h>.

    shmget() is used to obtain access to a shared memory segment. It is prototyped by:
    int shmget(key_t key, size_t size, int shmflg);
    The key argument is an access value associated with the shared memory segment ID. The size
    argument is the size in bytes of the requested shared memory. The shmflg argument specifies the
    initial access permissions and creation control flags.

    When the call succeeds, it returns the shared memory segment ID. This call is also used to get

    the ID of an existing shared segment (from a process requesting sharing of some existing

    memory portion).

    The following code illustrates shmget():
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>
    ...
    key_t key;   /* key to be passed to shmget() */
    int shmflg;  /* shmflg to be passed to shmget() */
    int shmid;   /* return value from shmget() */
    int size;    /* size to be passed to shmget() */
    ...
    key = ...
    size = ...
    shmflg = ...
    if ((shmid = shmget(key, size, shmflg)) == -1) {
        perror("shmget: shmget failed");
        exit(1);
    } else {
        (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
        exit(0);
    }
    ...

    Controlling a Shared Memory Segment
    shmctl() is used to alter the permissions and other characteristics of a shared memory
    segment. It is prototyped as follows:
    int shmctl(int shmid, int cmd, struct shmid_ds *buf);
    The process must have an effective user ID of owner, creator or superuser to perform this
    command. The cmd argument is one of the following control commands:
    SHM_LOCK
    -- Lock the specified shared memory segment in memory. The process must have the effective ID
    of superuser to perform this command.
    SHM_UNLOCK
    -- Unlock the shared memory segment. The process must have the effective ID of superuser to
    perform this command.
    IPC_STAT
    -- Return the status information contained in the control structure and place it in the buffer
    pointed to by buf. The process must have read permission on the segment to perform this command.
    IPC_SET
    -- Set the effective user and group identification and access permissions. The process must
    have an effective ID of owner, creator or superuser to perform this command.
    IPC_RMID
    -- Remove the shared memory segment.
    The buf argument is a structure of type struct shmid_ds, which is defined in <sys/shm.h>.


    The following code illustrates shmctl():
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>
    ...
    int cmd;    /* command code for shmctl() */
    int shmid;  /* segment ID */
    int rtrn;   /* return value from shmctl() */
    struct shmid_ds shmid_ds; /* shared memory data structure to hold results */
    ...
    shmid = ...
    cmd = ...
    if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
        perror("shmctl: shmctl failed");
        exit(1);
    }
    ...

    Attaching and Detaching a Shared Memory Segment

    shmat() and shmdt() are used to attach and detach shared memory segments. They are
    prototyped as follows:
    void *shmat(int shmid, const void *shmaddr, int shmflg);
    int shmdt(const void *shmaddr);
    shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid
    shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr.
    The following code illustrates calls to shmat() and shmdt():

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>
    static struct state {   /* Internal record of attached segments. */
        int shmid;          /* shmid of attached segment */
        char *shmaddr;      /* attach point */
        int shmflg;         /* flags used on attach */
    } ap[MAXnap];           /* State of current attached segments. */
    int nap;                /* Number of currently attached segments. */
    ...
    char *addr;               /* address work variable */
    register int i;           /* work area */
    register struct state *p; /* ptr to current state entry */
    ...
    p = &ap[nap++];
    p->shmid = ...
    p->shmaddr = ...
    p->shmflg = ...
    p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
    if (p->shmaddr == (char *)-1) {
        perror("shmop: shmat failed");
        nap--;
    } else
        (void) fprintf(stderr, "shmop: shmat returned %p\n", (void *)p->shmaddr);
    ...
    i = shmdt(addr);
    if (i == -1) {
        perror("shmop: shmdt failed");
    } else {
        (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
        /* drop the detached segment from the record of attached segments */
        for (p = ap, i = nap; i--; p++)
            if (p->shmaddr == addr) *p = ap[--nap];
    }
    ...

    Algorithm:
    1. Start
    2. Create shared memory using the shmget( ) system call
    3. If successful, it returns a non-negative segment identifier
    4. Attach the created shared memory using the shmat( ) system call
    5. Write to shared memory through the pointer returned by shmat( )
    6. Read the contents from shared memory through the attached pointer
    7. End.

    Source Code:

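    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>
    #define shm_size 1024
    int main(int argc, char *argv[])
    {
        key_t key;
        int shmid;
        char *data;
        if (argc > 2) {
            fprintf(stderr, "usage: stdemo [data_to_write]\n");
            exit(1);
        }
        /* make the key; the path "." and id 'R' are assumptions, any existing file works */
        if ((key = ftok(".", 'R')) == -1) {
            perror("ftok");
            exit(1);
        }
        /* connect to (and possibly create) the segment */
        if ((shmid = shmget(key, shm_size, 0644 | IPC_CREAT)) == -1)
        {
            perror("shmget");
            exit(1);
        }
        /* attach to the segment to get a pointer to it */
        data = shmat(shmid, (void *)0, 0);
        if (data == (char *)(-1))
        {
            perror("shmat");
            exit(1);
        }
        /* copy the command-line argument into the segment */
        if (argc == 2) {
            printf("writing to segment: \"%s\"\n", argv[1]);
            strncpy(data, argv[1], shm_size - 1);
            data[shm_size - 1] = '\0';
        }
        /* detach from the segment */
        if (shmdt(data) == -1)
        {
            perror("shmdt");
            exit(1);
        }
        return 0;
    }
    Input:
    # ./a.out swarupa
    Output:
    writing to segment: "swarupa"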

    Data Mining Lab
    Credit Risk Assessment
    Description: The business of banks is making loans. Assessing the creditworthiness of an
    applicant is of crucial importance. You have to develop a system to help a loan officer decide
    whether the credit of a customer is good or bad. A bank's business rules regarding loans must
    consider two opposing factors. On the one hand, a bank wants to make as many loans as possible.
    Interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to
    make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's
    loan policy must involve a compromise: not too strict, and not too lenient.


    To do the assignment, you first and foremost need some knowledge about the world of credit.
    You can acquire such knowledge in a number of ways.
    1. Knowledge Engineering. Find a loan officer who is willing to talk. Interview her and try to
    represent her knowledge in the form of production rules.
    2. Books. Find some training manuals for loan officers or perhaps a suitable textbook on
    finance. Translate this knowledge from text form to production rule form.
    3. Common sense. Imagine yourself as a loan officer and make up reasonable rules which can be
    used to judge the creditworthiness of a loan applicant.
    4. Case histories. Find records of actual cases where competent loan officers correctly judged
    when, and when not to, approve a loan application.

    The German Credit Data:
    Actual historical credit data is not always easy to come by because of confidentiality rules.
    Here is one such dataset, the German credit data, in its original form and as an Excel
    spreadsheet version (download from the web).
    In spite of the fact that the data is German, you should probably make use of it for this
    assignment (unless you really can consult a real loan officer!).
    A few notes on the German dataset:
    DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks
    and acts like a quarter).
    Owns_telephone. German phone rates are much higher than in Canada, so fewer people own
    telephones.
    Foreign_worker. There are millions of these in Germany (many from Turkey). It is very hard to
    get German citizenship if you were not born of German parents.
    There are 20 attributes used in judging a loan applicant. The goal is to classify the
    applicant into one of two categories, good or bad.

    Subtasks : (Turn in your answers to the following tasks)

    Laboratory Manual For Data Mining

    EXPERIMENT-1

    Aim: To list all the categorical (or nominal) attributes and the real-valued attributes using
    the Weka mining tool.
    Tools/ Apparatus: Weka mining tool.


    Procedure:

    1) Open the Weka GUI Chooser.

    2) Select EXPLORER present in Applications.

    3) Select Preprocess Tab.

    4) Click Open file and browse to the file bank.csv that is already stored in the system.
    5) Clicking on any attribute in the left panel shows the basic statistics for the selected attribute.
    Sample Output:

    EXPERIMENT-2

    Aim: To identify the rules with some of the important attributes a) manually and b) using Weka.
    Tools/ Apparatus: Weka mining tool.
    Theory:
    Association rule mining is defined as follows. Let $I = \{i_1, i_2, \dots, i_n\}$ be a set of
    $n$ binary attributes called items. Let $D = \{t_1, t_2, \dots, t_m\}$ be a set of transactions
    called the database. Each transaction in $D$ has a unique transaction ID and contains a subset
    of the items in $I$. A rule is defined as an implication of the form $X \Rightarrow Y$ where
    $X, Y \subseteq I$ and $X \cap Y = \emptyset$. The sets of items (for short, itemsets) $X$ and
    $Y$ are called the antecedent (left-hand side or LHS) and the consequent (right-hand side or
    RHS) of the rule respectively.
    To illustrate the concepts, we use a small example from the supermarket domain.

    To illustrate the concepts, we use a small example from the supermarket domain.


    The set of items is $I = \{\text{milk}, \text{bread}, \text{butter}, \text{beer}\}$ and a small
    database containing the items (1 codes presence and 0 absence of an item in a transaction) is
    shown in the accompanying table. An example rule for the supermarket could be
    $\{\text{milk}, \text{bread}\} \Rightarrow \{\text{butter}\}$, meaning that if milk and bread
    are bought, customers also buy butter.
    Note: this example is extremely small. In practical applications, a rule needs a support of
    several hundred transactions before it can be considered statistically significant, and
    datasets often contain thousands or millions of transactions.
    To select interesting rules from the set of all possible rules, constraints on various
    measures of significance and interest can be used. The best-known constraints are minimum
    thresholds on support and confidence. The support $\mathrm{supp}(X)$ of an itemset $X$ is
    defined as the proportion of transactions in the data set which contain the itemset. In the
    example database, the itemset $\{\text{milk}, \text{bread}\}$ has a support of 2 / 5 = 0.4
    since it occurs in 40% of all transactions (2 out of 5 transactions).
    The confidence of a rule is defined as
    $\mathrm{conf}(X \Rightarrow Y) = \mathrm{supp}(X \cup Y) / \mathrm{supp}(X)$. For example, the
    rule $\{\text{milk}, \text{bread}\} \Rightarrow \{\text{butter}\}$ has a confidence of
    0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk
    and bread the rule is correct.
    Confidence can be interpreted as an estimate of the probability $P(Y \mid X)$, the probability
    of finding the RHS of the rule in transactions under the condition that these transactions also
    contain the LHS.

    ALGORITHM:
    Association rule mining finds the association rules that satisfy the predefined minimum
    support and confidence in a given database. The problem is usually decomposed into two
    subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in
    the database; those itemsets are called frequent or large itemsets. The second problem is to
    generate association rules from those large itemsets under the constraint of minimal confidence.
    Suppose one of the large itemsets is $L_k = \{I_1, I_2, \dots, I_k\}$. Association rules with
    this itemset are generated in the following way: the first rule is
    $\{I_1, I_2, \dots, I_{k-1}\} \Rightarrow \{I_k\}$; by checking its confidence, this rule can
    be determined to be interesting or not. Other rules are then generated by deleting the last
    item in the antecedent and inserting it into the consequent, and the confidences of the new
    rules are checked to determine their interestingness. This process iterates until the
    antecedent becomes empty. Since the second subproblem is quite straightforward, most research
    focuses on the first subproblem. The Apriori algorithm finds the frequent sets $L$ in database $D$.
    - Find the frequent set $L_{k-1}$.
    - Join step.
      o $C_k$ is generated by joining $L_{k-1}$ with itself.
    - Prune step.
      o Any $(k-1)$-itemset that is not frequent cannot be a subset of a frequent $k$-itemset,
        hence should be removed.
    where $C_k$ is the candidate itemset of size $k$ and $L_k$ is the frequent itemset of size $k$.


    Apriori Pseudocode

    Apriori (T,)

    L


    2) Select EXPLORER present in Applications.
    3) Select the Preprocess tab.
    4) Click Open file and browse to the file bank.csv that is already stored in the system.
    5) Go to the Classify tab.
    6) The C4.5 algorithm, which is implemented in Weka as J48, is used here; it can be selected by
    clicking the Choose button
    7) and selecting trees, then J48.
    9) In Test options, select Use training set.
    10) If needed, select the class attribute.
    11) Click Start.
    12) The output details now appear in the Classifier output panel.
    13) Right-click the entry in the Result list and select the Visualize tree option.

    Sample output:


    The decision tree constructed by using the implemented C4.5 algorithm
    (http://en.wikipedia.org/wiki/C4.5_algorithm)

    EXPERIMENT-4

    Aim: To find the percentage of examples that are classified correctly by the decision tree
    model created above, i.e. testing on the training set.
    Tools/ Apparatus: Weka mining tool.
    Theory:
    A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a
    class is unrelated to the presence (or absence) of any other feature. For example, a fruit may
    be considered to be an apple if it is red, round, and about 4" in diameter. Even though these
    features depend on the existence of the other features, a naive Bayes classifier considers all
    of these properties to independently contribute to the probability that this fruit is an apple.
    An advantage of the naive Bayes classifier is that it requires a small amount of training data
    to estimate the parameters (means and variances of the variables) necessary for classification.
    Because independent variables are assumed, only the variances of the variables for each class
    need to be determined and not the entire covariance matrix.
    The naive Bayes probabilistic model:
    The probability model for a classifier is a conditional model $P(C \mid F_1, \dots, F_n)$ over
    a dependent class variable $C$ with a small number of outcomes or classes, conditional on
    several feature variables $F_1$ through $F_n$. The problem is that if the number of features
    $n$ is large or when a feature can take on a large number of values, then basing such a model
    on probability tables is infeasible. We therefore reformulate the model to make it more tractable.
    Using Bayes' theorem, we write
    $$P(C \mid F_1, \dots, F_n) = \frac{p(C)\, p(F_1, \dots, F_n \mid C)}{p(F_1, \dots, F_n)}$$


    In plain English the above equation can be written as
    $$\text{posterior} = \frac{\text{prior} \times \text{likelihood}}{\text{evidence}}$$
    In practice we are only interested in the numerator of that fraction, since the denominator
    does not depend on $C$ and the values of the features $F_i$ are given, so that the denominator
    is effectively constant. The numerator is equivalent to the joint probability model
    $p(C, F_1, \dots, F_n)$, which can be rewritten as follows, using repeated applications of the
    definition of conditional probability:
    $$\begin{aligned}
    p(C, F_1, \dots, F_n) &= p(C)\, p(F_1, \dots, F_n \mid C) \\
    &= p(C)\, p(F_1 \mid C)\, p(F_2, \dots, F_n \mid C, F_1) \\
    &= p(C)\, p(F_1 \mid C)\, p(F_2 \mid C, F_1)\, p(F_3, \dots, F_n \mid C, F_1, F_2) \\
    &= p(C)\, p(F_1 \mid C)\, p(F_2 \mid C, F_1) \cdots p(F_n \mid C, F_1, F_2, \dots, F_{n-1})
    \end{aligned}$$
    Now the "naive" conditional independence assumptions come into play: assume that each feature
    $F_i$ is conditionally independent of every other feature $F_j$ for $j \neq i$. This means that
    $$p(F_i \mid C, F_j) = p(F_i \mid C)$$
    and so the joint model can be expressed as
    $$p(C, F_1, \dots, F_n) = p(C)\, p(F_1 \mid C)\, p(F_2 \mid C) \cdots = p(C) \prod_{i=1}^{n} p(F_i \mid C)$$
    This means that under the above independence assumptions, the conditional distribution over
    the class variable $C$ can be expressed like this:
    $$p(C \mid F_1, \dots, F_n) = \frac{1}{Z}\, p(C) \prod_{i=1}^{n} p(F_i \mid C)$$
    where $Z$ is a scaling factor dependent only on $F_1, \dots, F_n$, i.e., a constant if the
    values of the feature variables are known.
    Models of this form are much more manageable, since they factor into a so-called class prior
    $p(C)$ and independent probability distributions $p(F_i \mid C)$. If there are $k$ classes and
    if a model for each $p(F_i \mid C = c)$ can be expressed in terms of $r$ parameters, then the
    corresponding naive Bayes model has $(k - 1) + n r k$ parameters. In practice, often $k = 2$
    (binary classification) and $r = 1$ (Bernoulli variables as features) are common, and so the
    total number of parameters of the naive Bayes model is $2n + 1$, where $n$ is the number of
    binary features used for prediction.
    $$P(h \mid D) = \frac{P(D \mid h)\, P(h)}{P(D)}$$
    P(h) : Prior probability of hypothesis h
    P(D) : Prior probability of training data D
    P(h|D) : Probability of h given D
    P(D|h) : Probability of D given h


    17 309 | b = NO

    EXPERIMENT-5

    Aim: To determine whether testing on the training set is a good idea.
    Tools/ Apparatus: Weka mining tool
    Procedure:
    1) In Test options, select the Supplied test set radio button.
    2) Click Set.
    3) Choose the file which contains records that were not in the training set we used to create
    the model.
    4) Click Start (Weka will run this test data set through the model we already created).
    5) Compare the output results with those of the 4th experiment.

    Sample output:

    The exact figures depend on the problem being solved and will differ from one practice run to
    another.
    The important numbers to focus on here are the numbers next to "Correctly Classified
    Instances" (92.3 percent) and "Incorrectly Classified Instances" (7.6 percent). Another
    important number is in the "ROC Area" column, in the first row (0.936). Finally, the "Confusion
    Matrix" shows the number of false positives and false negatives. The false positives are 29,
    and the false negatives are 17 in this matrix.
    Based on our accuracy rate of 92.3 percent, we say that upon initial analysis, this is a good model.

    One final step in validating our classification tree is to run our test set through the model
    and check the accuracy of the model on data it has not seen.
    Comparing the "Correctly Classified Instances" from this test set with the "Correctly
    Classified Instances" from the training set shows how far the two accuracies differ. If they
    are close, this indicates that the model will not break down with unknown data, or when future
    data is applied to it.

    EXPERIMENT-6

    Aim: To create a decision tree by cross-validating the training data set using the Weka mining tool.
    Tools/ Apparatus: Weka mining tool.
    Theory:
    Decision tree learning, used in data mining and machine learning, uses a decision tree as a
    predictive model which maps observations about an item to conclusions about the item's target
    value. In these tree structures, leaves represent classifications and branches represent
    conjunctions of features that lead to those classifications. In decision analysis, a decision
    tree can be used to visually and explicitly represent decisions and decision making. In data
    mining, a decision tree describes data but not decisions; rather, the resulting classification
    tree can be an input for decision making. This section deals with decision trees in data mining.
    Decision tree learning is a common method used in data mining. The goal is to create a model
    that predicts the value of a target variable based on several input variables. Each interior
    node corresponds to one of the input variables; there are edges to children for each of the
    possible values of that input variable. Each leaf represents a value of the target variable
    given the values of the input variables represented by the path from the root to the leaf.
    A tree can be "learned" by splitting the source set into subsets based on an attribute value
    test. This process is repeated on each derived subset in a recursive manner called recursive
    partitioning. The recursion is completed when all examples in the subset at a node have the
    same value of the target variable, or when splitting no longer adds value to the predictions.

    In data mining, trees can also be described as the combination of mathematical and
    computational techniques to aid the description, categorisation and generalization of a given
    set of data.
    Data comes in records of the form:
    $$(\mathbf{x}, Y) = (x_1, x_2, x_3, \dots, x_k, Y)$$


    The dependent variable, Y, is the target variable that we are trying to understand, classify
    or generalise. The vector x is composed of the input variables, x1, x2, x3 etc., that are used
    for that task.

    Procedure:
    1) Given the Bank database for mining.
    2) Use the Weka GUI Chooser.
    3) Select EXPLORER present in Applications.
    4) Select the Preprocess tab.
    5) Click Open file and browse to the file bank.csv that is already stored in the system.
    6) Go to the Classify tab.
    7) Choose Classifier, then trees.
    8) Select J48.
    9) In Test options, select Cross-validation.
    10) Set Folds, e.g. 10.
    11) If needed, select the class attribute.
    12) Click Start.
    13) The output details now appear in the Classifier output panel.
    14) Compare the output results with those of the 4th experiment.
    15) Check whether the accuracy increased or decreased.

    Sample output:


    === Stratified cross-validation ===
    === Summary ===
    Correctly Classified Instances         539               89.8333 %
    Incorrectly Classified Instances        61               10.1667 %
    Kappa statistic                          0.7942
    Mean absolute error                      0.167
    Root mean squared error                  0.305
    Relative absolute error                 33.6511 %
    Root relative squared error             61.2344 %
    Total Number of Instances              600
    === Detailed Accuracy By Class ===


    EXPERIMENT-8

    Aim: To select some attributes from the GUI Explorer, perform classification and see the effect
    using the Weka mining tool.
    Tools/ Apparatus: Weka mining tool.
    Procedure:
    1) Given the Bank database for mining.
    2) Use the Weka GUI Chooser.
    3) Select EXPLORER present in Applications.
    4) Select the Preprocess tab.
    5) Click Open file and browse to the file bank.csv that is already stored in the system.
    6) Select the attributes to be removed from the attributes list and remove them. After this
    step only the attributes necessary for classification are left in the attributes panel.
    7) Then go to the Classify tab.
    8) Choose Classifier, then trees.
    9) Select J48.
    10) In Test options, select Use training set.
    11) If needed, select the class attribute.
    12) Click Start.
    13) The output details now appear in the Classifier output panel.
    14) Right-click the entry in the Result list and select the Visualize tree option.
    15) Compare the output results with those of the 4th experiment.
    16) Check whether the accuracy increased or decreased.
    17) Check whether removing these attributes has any significant effect.
    Sample output:


    EXPERIMENT-9


    Aim: To create a decision tree by cross-validating the training data set after changing the
    cost matrix in the Weka mining tool.
    Tools/ Apparatus: Weka mining tool.
    Procedure:
    1) Given the Bank database for mining.
    2) Use the Weka GUI Chooser.
    3) Select EXPLORER present in Applications.
    4) Select the Preprocess tab.
    5) Click Open file and browse to the file bank.csv that is already stored in the system.
    6) Go to the Classify tab.
    7) Choose Classifier, then trees.
    8) Select J48.
    9) In Test options, select Use training set.
    10) Click on More options.
    11) Select Cost-sensitive evaluation and click on the Set button.
    12) Set the matrix values and click on Resize. Then close the window.
    13) Click OK.
    14) Click Start.
    15) The output details appear in the Classifier output panel.
    16) In Test options, select Cross-validation.
    17) Set Folds, e.g. 10.
    18) If needed, select the class attribute.
    19) Click Start.
    20) The output details now appear in the Classifier output panel.
    21) Compare the results of the 15th and 20th steps.
    22) Compare the results with those of experiment 6.

    Sample output:


    Tools/ Apparatus: Weka mining tool.
    Procedure:
    This depends on the attribute set and on the relationships among attributes that we want to
    study. It can be decided based on the database and the user requirements.

    EXPERIMENT-11

    Aim: To create a decision tree by using Prune mode and Reduced error pruning and show the
    accuracy for the cross-validation trained data set using the Weka mining tool.
    Tools/ Apparatus: Weka mining tool.
    Theory:
    Reduced-error pruning
    - Each node of the (over-fit) tree is examined for pruning.
    - A node is pruned (removed) only if the resulting pruned tree performs no worse than the
      original over the validation set.
    - Pruning a node consists of
      o removing the sub-tree rooted at the pruned node,
      o making the pruned node a leaf node,
      o assigning the pruned node the most common classification of the training instances
        attached to that node.
    - Pruning nodes iteratively
      o Always select a node whose removal most increases the DT accuracy over the validation set.
      o Stop when further pruning decreases the DT accuracy over the validation set.

    IF (Children=yes) AND (income>=30000)
    THEN (car=Yes)

    Procedure:

    1) Given the Bank database for mining.

    2) Use the Weka GUI Chooser.

    3) Select EXPLORER present in Applications.


    4) Select the Preprocess tab.
    5) Click Open file and browse to the file bank.csv that is already stored in the system.
    6) Select some of the attributes from the attributes list.
    7) Go to the Classify tab.
    8) Choose Classifier, then trees.
    9) Select NBTree, i.e. the Naive Bayesian tree.
    10) In Test options, select Use training set.
    11) Right-click on the text box beside the Choose button and select Show properties.
    12) Change the unpruned mode from false to true.
    13) Change the reduced error pruning setting as needed.
    14) If needed, select the class attribute.
    15) Click Start.
    16) The output details now appear in the Classifier output panel.
    17) Right-click the entry in the Result list and select the Visualize tree option.

    Sample output:


    One R

    PART