Data Processing
Meaning
MIS is an information system which provides the management of an organization with the information used for decision making.
A management information system (MIS) is a subset of the overall internal controls of a
business, covering the application of people, documents, technologies, and procedures by management accountants to solve business problems such as costing a product, service or a
business-wide strategy. Management information systems are distinct from regular information
systems in that they are used to analyze other information systems applied in operational
activities in the organization. Academically, the term is commonly used to refer to the group of
information management methods tied to the automation or support of human decision making,
e.g. Decision Support Systems, Expert systems, and Executive information systems.
Before the industrial revolution, most data processing was done manually; it was only after the industrial revolution that computers slowly started replacing manual labour. The modern digital computer was basically designed to handle scientific calculations. During the period 1940 to 1960, computers were used commercially for census and payroll work, which involved large amounts of data and its processing. Since then, commercial applications have exceeded the scientific applications for which the computer was mainly intended.
The Basic characteristics of an effective Management Information System are as follows:
I. Management-oriented: The basic objective of MIS is to provide information support to the management of the organization for decision making. An effective MIS should therefore start its journey from an appraisal of management needs and the mission and goals of the business organization; these may be individual or collective goals of the organization. The MIS should serve all levels of management in the organization, i.e. top, middle and lower.
II. Management-directed: Because MIS is management-oriented, it should be directed by the management, since it is the management who can state their needs and requirements more effectively than anybody else. Managers should guide the MIS professionals not only at the planning stage but also through the development, review and implementation stages, so that an effective system is the end product of the whole exercise.
III. Integrated: This means a comprehensive or complete view of all the subsystems in the organization. Development of information must be integrated so that all the operational and functional information subsystems work together as a single entity. This integration is necessary because it leads to the retrieval of more meaningful and useful information.
IV. Common data flows: The integration of different subsystems leads to a common data flow, which in turn helps avoid duplication and redundancy in data collection, storage and processing. For example, customer orders are the basis for many activities in an organization, viz. billing, sales forecasting, etc. Data is collected by a systems analyst from its original source only once; the analyst then processes the data with the minimum number of processing procedures, uses the information to produce output documents and reports in small numbers, and eliminates undesirable data. This eliminates duplication, simplifies operations and produces an efficient information system.
Data processing is, broadly, "the collection and manipulation of items of data to produce meaningful information." In this sense it can be considered a subset of information processing, "the change (processing) of information in any manner detectable by an observer."
The term is often used more specifically in the context of a business or other organization to refer to the class of commercial data processing applications.
Data-Processing System
a system of interrelated techniques and means of collecting and processing the data needed to organize control of some system. Automatic, or electronic, data-processing systems make use of electronic computers or other modern information-processing equipment. Without a computer, a data-processing system can be constructed for only a small controlled system. The use of a computer means that the data-processing system can perform not just individual information-processing and computing tasks but a set of tasks that are interconnected and can be carried out in a single sequence of operations.
Data-processing systems should be distinguished from automated control systems. The primary function of the latter is the performance of calculations associated, for example, with the solution of problems of control and with the selection of optimal variants of plans on the basis of models and the techniques of mathematical economics. The chief purpose of automated control systems is to increase the efficiency of control or management. The functions of data-processing systems, however, are to collect, store, retrieve, and process the data needed to carry out the calculations at the lowest possible cost. When an automatic data-processing system is constructed, efforts are made to identify and automate laborious, regularly repeating routine operations on large files of data. A data-processing system is usually a part of an automated control system and represents the first stage in the development of an automated control system. Data-processing systems, however, also function as independent systems. In many cases it is more efficient to use a single system to process similar data for a large number of control problems handled by different automated control systems; that is, to use a shared-access data-processing system.
The first data-processing systems were constructed in the USA in the 1950s, when it became clear that the use of a computer to solve individual problems, such as calculating wages or keeping track of stocks of goods and materials, was inefficient. It was seen at that time that integrated processing of the data fed to the computer was necessary.
The USSR has a number of large data-processing systems, most of which are the bases of automated control systems. Examples are the systems that have been set up at such large industrial enterprises as Frezer, Kalibr, the Likhachev Automotive Plant, the L’vov Television Plant, and the 15th Anniversary of the Ukrainian Lenin Komsomol Donetsk Plant. Data-processing systems are coming into use not only in industrial enterprises but also in planning bodies, statistical agencies, ministries, and banking institutions. They are finding application in trade and in the supply of materials and equipment. The introduction of a data-processing system is a prerequisite for the development of an automated control system.
The experience that has been gained with data-processing systems permits identification of the basic principles of the construction and techniques of development of such systems. The most important principle is the principle of integration, which requires that the raw data undergoing processing be fed to the data-processing system once. The problems being solved in the data-processing system are coordinated in such a way that the raw data and the data resulting from the solution of some problems are used as the initial data for as many of the other problems as possible. This coordination eliminates duplication of the operations of collection, preparation, and checking of data and ensures the integrated use of the data. As a result, the costs of obtaining the necessary information are reduced, and the efficiency of the data-processing system is increased.
Closely related to the principle of integration is the principle of centralization of data processing. When a data-processing system is constructed, many information-processing tasks are removed from the control of the respective subdivisions and are concentrated at a single computing center or at a small number of such centers. Large data files are established at these centers; the files are available for integrated processing. Special information retrieval systems, called automatic data banks, are set up in the data-processing system to manage and make optimal use of the files. The automatic data bank receives data that are subject to repeated use, and, in conformity with the operating schedule of the data-processing system, the data are used to form the work files for the problems being solved. The data bank also supplies information in response to inquiries. The centralization of data processing in constructing a data-processing system usually assumes a reorganization of the structure of control.
The principle of the systems approach to the organization of the sequence of operations consists in the following: When the data-processing system is constructed, there must be integrated mechanization and automation of the operations at all stages of data collection and processing, and the hardware used must be self-consistent with respect to throughput and other parameters. If this is not done, the unity of the sequence of operations is disrupted, and the efficiency of the data-processing system drops sharply.
Before a data-processing system is constructed, the following are subjected to thorough investigation and analysis: the controlled system, the control problems, the structure of control, the content of the information, and the information flows. On the basis of the analysis of the results of the investigation, an information model of the data-processing system is developed that establishes new information flows and the relation between the data-processing tasks. The hardware is chosen, and the sequence of operations of the data-processing system is worked out, according to the volumes of data being processed, stored, and transmitted as determined from the information model of the data-processing system. The successful construction of a data-processing system requires the participation not only of specialists but also of managers and other personnel who are directly involved in the solution of control problems at all stages of the development and introduction of the system.
Manual data processing
Although widespread use of the term data processing dates only from the nineteen-fifties, data processing functions have been performed manually for millennia. For example, bookkeeping involves functions such as posting transactions and producing reports like the balance sheet and the cash flow statement. Completely manual methods were augmented by the application of mechanical or electronic calculators. A person whose job it was to perform calculations manually or using a calculator was called a "computer."
The 1850 United States Census schedule was the first to gather data by individual rather
than household. A number of questions could be answered by making a check in the appropriate
box on the form. From 1850 through 1880 the Census Bureau employed "a system of tallying,
which, by reason of the increasing number of combinations of classifications required, became
increasingly complex. Only a limited number of combinations could be recorded in one tally, so
it was necessary to handle the schedules 5 or 6 times, for as many independent tallies. It took
over 7 years to publish the results of the 1880 census" using manual processing methods.
Automatic data processing
The term automatic data processing was applied to operations performed by means of unit record equipment, such as Herman Hollerith's application of punched card equipment for the 1890 United States Census. "Using Hollerith's punchcard equipment, the Census Office was able to complete tabulating most of the 1890 census data in 2 to 3 years, compared with 7 to 8 years for the 1880 census. ... It is also estimated that using Herman Hollerith's system saved some $5 million in processing costs"[5] (in 1890 dollars), even with twice as many questions as in 1880.
Electronic data processing
Computerized data processing, or electronic data processing, represents the further evolution, with the computer taking the place of several independent pieces of equipment. The Census Bureau first made limited use of electronic computers for the 1950 United States Census, using a UNIVAC I system delivered in 1952.
Further evolution
"Data processing (DP)" has also previously been used to refer to the department within an
organization responsible for the operation of data processing applications. The term data
processing has mostly been subsumed under the newer and somewhat more general
term information technology (IT). "Data processing" has acquired a negative connotation,
suggesting use of older technologies. As an example, in 1996 the Data Processing Management Association (DPMA) changed its name to the Association of Information Technology Professionals. Nevertheless, the terms are roughly synonymous.
Processing of data: editing, coding, classification and tabulation
Editing
What is editing?
Editing is the process of correcting faulty data, in order to allow the production of reliable
statistics.
Data editing does not exist in isolation from the rest of the collection processing cycle and the
nature and extent of any editing and error treatment will be determined by the aims of the
collection. In many cases it will not be necessary to pay attention to every error.
Errors in the data may have come from respondents or have been introduced during data entry
or data processing. Editing aims to correct a number of non-sampling errors, which are those
errors that may occur in both censuses and sample surveys; for example, non-sampling errors
include those errors introduced by misunderstanding questions or instructions, interviewer bias,
miscoding, non-availability of data, incorrect transcription, non-response and non-contact. But
editing will not reveal all non-sampling errors - for example, while an editing system could be
designed to detect transcription errors, missing values and inconsistent responses, other
problems such as interviewer bias may easily escape detection.
Editing should aim:
to ensure that outputs from the collection are mutually consistent, for example, a component
should not exceed an aggregate value; two different methods of deriving the same value should
give the same answer;
to detect major errors, which could have a significant effect on the outputs;
to find any unusual outputs and their causes.
The required level of editing
The function of editing is to help achieve the aims of a collection so, before edits are created or
modified, it is important to know these aims - since these have a major say in the nature of the
editing system created for the given collection. We need to know about features such as:
the outputs from the collection
the level at which outputs are required
their required accuracy
how soon after the reference period the outputs are needed
the users and uses of the collection. A collection may be simple (with limited data collected) and
designed to meet the requirements of only one type of user (e.g., Survey of New Motor Vehicle Registration and Retail Trade) or it may collect more complex data and aim to meet the
needs of many different types of users (e.g. Agricultural Finance Survey, Household Expenditure
Survey, etc.). If there are many types of users there is a likelihood of conflicting requirements
amongst the users, which can lead to stresses on the collection.
the reliability of each item (e.g., is the definition easily understood, or is the item sensitive?)
While the goal of editing is to produce data that represent as closely as possible the activity
being measured, there are usually a number of constraints (such as the time and number of
people available for data editing) within which editing is conducted. These constraints will also
influence the design of the editing system for the given collection.
The structure of an edit
An edit is defined by specifying:
the test to be applied,
the domain, which is a description of the set of data that the test should be applied to, and
the follow-up action if the test is failed.
The test
This is a statement of something that is expected to be true for good data. A test typically
consists of data items connected by arithmetic or comparison operators. Ideas for suitable tests
may arise from people with a knowledge of the subject matter, the aims of the collection or
relationships that should hold between items.
Examples
this item should not be missing
the sum of these items equals that item
The Domain
The domain is defined by specifying the conditions which the data must satisfy before the test
can be applied.
Example
A test may only be relevant to those businesses in a certain industry and the domain will
therefore consist of all records belonging to that industry.
The Follow-up
The edit designer must also consider the appropriate follow-up action if a test is failed. Some edit failures are minor and simply require human attention, but no amendment; others are major failures that require both human attention and an amendment. The treatment given to an edit failure is commonly determined by classifying edits into grades of severity, such as fatal, query and warning.
Example
Where a record lacks critical information essential for further processing, a fatal error should be displayed.
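The three parts of an edit (test, domain and follow-up severity) map naturally onto a small data structure. The sketch below is illustrative only, not taken from the source: the record fields ("industry", "income") and the way severity grades are attached are assumptions for the example.

```python
# A minimal sketch of an edit rule with a test, a domain and a severity
# grade (fatal / query / warning), as described above. Field names are
# hypothetical.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Edit:
    name: str
    domain: Callable[[dict], bool]   # which records the test applies to
    test: Callable[[dict], bool]     # statement expected to be True for good data
    severity: str                    # "fatal", "query" or "warning"

    def apply(self, record: dict) -> Optional[str]:
        # Return the severity grade if the record is in the domain and fails the test.
        if self.domain(record) and not self.test(record):
            return self.severity
        return None

# Example: for retail-industry records, income must not be missing (a fatal error).
income_reported = Edit(
    name="income reported",
    domain=lambda r: r.get("industry") == "retail",
    test=lambda r: r.get("income") is not None,
    severity="fatal",
)

print(income_reported.apply({"industry": "retail", "income": None}))  # -> fatal
print(income_reported.apply({"industry": "mining", "income": None}))  # -> None (outside domain)
```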
It is important to note that even if we go through comprehensive editing processes, errors may still occur, as editing can identify only noticeable errors. Information wrongly given by respondents or wrongly transcribed by interviewers can only be corrected when there are clues that point to the error and provide the solution. Thus, the final computer file will not be error-free, but it should at least be internally consistent.
Generally different levels of editing are carried out at several stages during data processing.
Some of the stages involved are provided below.
Clerical Coding
This stage includes mark-in of the forms as they are returned, all manual coding (e.g., country and industry coding) and manual data conversion (e.g., miles to kilometres).
Clerical Editing
This stage includes all editing done manually by clerks before the unit data are loaded into a
computer file.
Input Editing
Input editing deals with each respondent independently and includes all "within record" edits
which are applied to unit records. It is carried out before any aggregates for the production of
estimates are done. An important consideration in input editing is the setting of tolerances for responses: setting low tolerances will generate large numbers of edit failures and will directly impact resources and the meeting of timetables.
Ideally, an input edit system has been designed after carefully considering and setting edit
tolerances, clerical scrutiny levels, resource costs (against benefits), respondent load and timing
implications.
Output Editing
Output editing includes all edits applied to the data once it has been weighted and aggregated in
preparation for publication. If a unit contributes a large amount to a cell total, then the response
for that unit should be checked with a follow-up.
Output editing is not restricted to examination of aggregates within the confines of a survey. A
good output edit system will incorporate comparisons against other relevant statistical indicators.
Types of Edits Commonly Used
Validation Edit
Checks the validity or legality of basic identification or classificatory items in unit data.
Examples
the respondent's reference number is of a legal form
state code is within the legal range
sex is coded as either M or F
Missing Data Edit
Checks that data that should have been reported were in fact reported. An answer to one question
may determine which other questions are to be answered and the editing system needs to ensure
that the right sequence of questions has been answered.
Examples
in an employment survey a respondent should report a value for employment
a respondent who has replied NO to the question: Do you have any children? should not have
answered any of the questions about the ages, sexes and education of any children
Logical Edit
Ensures that two or more categorical items in a record do not have contradictory values.
Example
a respondent claiming to be 16 years old and receiving the age pension would clearly fail an edit
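To make the three edit types above concrete, here is a minimal sketch of within-record checks built from the examples given. All field names, the legal state-code range and the pension age threshold are assumptions for illustration, not from the source.

```python
# Illustrative validation, missing-data and logical edits. All field names
# and legal ranges are assumptions.

def validation_edit(record):
    """Validation edit: sex must be M or F; state code within an assumed legal range."""
    errors = []
    if record.get("sex") not in ("M", "F"):
        errors.append("invalid sex code")
    if not 1 <= record.get("state_code", 0) <= 8:   # assumed range of legal state codes
        errors.append("state code out of legal range")
    return errors

def missing_data_edit(record):
    """Missing-data edit: child questions must match the has-children answer."""
    errors = []
    if record.get("has_children") == "NO" and record.get("child_ages"):
        errors.append("child questions answered despite NO")
    if record.get("has_children") == "YES" and not record.get("child_ages"):
        errors.append("child questions unanswered despite YES")
    return errors

def logical_edit(record):
    """Logical edit: a 16-year-old should not receive the age pension (threshold assumed)."""
    errors = []
    if record.get("receives_age_pension") and record.get("age", 0) < 65:
        errors.append("age inconsistent with age pension")
    return errors

record = {"sex": "F", "state_code": 3, "has_children": "NO",
          "child_ages": [4], "age": 16, "receives_age_pension": True}
for check in (validation_edit, missing_data_edit, logical_edit):
    print(check.__name__, check(record))
```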
Consistency (or reconciliation) edits
Checks that precise arithmetical relationships hold between continuous numeric variables that
are subject to such relationships. Consistency edits could involve the checking of totals or
products.
Examples
totals: a reported total should equal the sum of the reported components
totals of different breakdowns of the same item should be equal (e.g., the Australian estimate should be the same whether obtained by summing state estimates or industry estimates)
products: if one item is a known percentage of another, then this can be checked (e.g., land tax paid should equal the product of the taxable value and the land tax rate)
income from the sales of a commodity should equal the product of the unit price and the amount
sold
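As a sketch, the two consistency checks described above (totals and products) can be written directly; the sample values and the small rounding tolerance in the product check are illustrative assumptions.

```python
# Consistency (reconciliation) edits: a reported total must equal the sum of
# its components, and a product relationship must hold (a small tolerance is
# allowed here for rounding in monetary values). Values are illustrative.

def total_edit(components, reported_total):
    return sum(components) == reported_total

def product_edit(taxable_value, land_tax_rate, land_tax_paid, tol=0.005):
    return abs(taxable_value * land_tax_rate - land_tax_paid) <= tol

print(total_edit([120, 80, 50], 250))          # True: components sum to the total
print(product_edit(200_000, 0.015, 3_000.0))   # True: tax paid = value x rate
```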
Range Edit
Checks that approximate relationships hold between numeric variables that are subject to such
relationships. A range edit can be thought of as a loosening of a consistency edit, and its definition will include a description of the range of acceptable values (or tolerance range).
Examples
If a company's value for number of employees changes by more than a certain predefined amount (or proportion), then the unit will fail. Note that both the absolute change and the proportional change should be considered, since a change from 500 to 580 may be less significant than a change from 6 to 10. So, if the edit were defined to accept the record when the current value is within 20% of the previous value, a change from 500 to 580 would be accepted while the change from 6 to 10 would be queried.
In a survey which collects total amount loaned for houses and total number of housing loans
from each lending institution it would probably be sensible to check that the derived item
average housing loan is within an acceptable range.
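The 500-to-580 versus 6-to-10 example can be expressed as a small proportional-tolerance check. The sketch below assumes the 20% tolerance used in the text; in practice an absolute tolerance would be combined with it, since small bases inflate proportional changes.

```python
# A range edit with a proportional tolerance, mirroring the example above.

def range_edit(previous, current, max_prop=0.20):
    """Accept the record if the current value is within 20% of the previous value."""
    if previous == 0:
        return current == 0
    return abs(current - previous) / abs(previous) <= max_prop

print(range_edit(500, 580))  # True:  a 16% change, accepted
print(range_edit(6, 10))     # False: a 67% change, queried
```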
After collecting data comes the process of converting raw data into meaningful statements; this includes data processing, data analysis, and data interpretation and presentation.
Data reduction or processing mainly involves various manipulations necessary for preparing the
data for analysis. The process (of manipulation) could be manual or electronic. It involves
editing, categorizing the open-ended questions, coding, computerization and preparation of
tables and diagrams.
Editing data:
Information gathered during data collection may lack uniformity. For example, data collected through questionnaires and schedules may have answers that are not ticked in the proper places, or some questions may be left unanswered. Sometimes information may be given in a form which needs reconstruction into a category designed for analysis, e.g., converting daily/monthly income into annual income, and so on. The researcher has to decide how to edit it.
Editing also ensures that data are relevant and appropriate and that errors are corrected. Occasionally, the investigator makes a mistake and records an impossible answer. "How much red chili do you use in a month?" The answer is written as "4 kilos". Can a family of three members use four kilos of chili in a month? The correct answer could be "0.4 kilo".
Care should be taken in editing (re-arranging) answers to open-ended questions. Example:
Sometimes a "don't know" answer is edited as "no response". This is wrong. "Don't know" means that the respondent is not sure, is in two minds about his reaction, or considers the question personal and does not want to answer it. "No response" means that the respondent is not familiar with the situation/object/event/individual about which he is asked.
Coding of data:
Coding is translating answers into numerical values or assigning numbers to the various
categories of a variable to be used in data analysis. Coding is done by using a code book, code
sheet, and a computer card. Coding is done on the basis of the instructions given in the
codebook. The code book gives a numerical code for each variable.
Nowadays, codes are assigned before going to the field, while constructing the questionnaire/schedule. Post data collection, pre-coded items are fed to the computer for processing and analysis. For open-ended questions, however, post-coding is necessary: all answers to open-ended questions are placed in categories and each category is assigned
a code.
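A minimal sketch of post-coding with a codebook follows; the variable, its categories and the numerical codes are hypothetical.

```python
# Post-coding open-ended answers using a codebook that maps each category
# of a variable to a numerical code. The codebook contents are hypothetical.

CODEBOOK = {
    "occupation": {"farmer": 1, "teacher": 2, "trader": 3, "other": 9},
}

def code_answer(variable, answer):
    """Translate a verbatim answer into its numerical code (unlisted answers -> other)."""
    codes = CODEBOOK[variable]
    return codes.get(answer.strip().lower(), codes["other"])

print(code_answer("occupation", "Teacher"))    # 2
print(code_answer("occupation", "fisherman"))  # 9 (coded as other)
```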
Manual processing is employed when qualitative methods are used or when in quantitative
studies, a small sample is used, or when the questionnaire/schedule has a large number of open-
ended questions, or when accessibility to computers is difficult or inappropriate. However,
coding is done in manual processing also.
Data editing and coding
Many data processing activities that are typically completed when collecting data via paper
questionnaires were unnecessary in these studies because the questionnaires were computer
administered. For example, making sure that questions were asked in the correct sequence,
checking for out-of-range or inconsistent responses, and filling in the appropriate question text
based on a respondent's previous answers were all controlled by the interviewing application
software. Inconsistent responses that failed the programme's edit checks were brought to the
attention of the interviewer who could resolve the inconsistency with the respondent during the
interview, improving the quality of data and minimizing the need for back-end editing. Although
these software programs automatically performed many of the decisions formerly made by
interviewers using paper questionnaires, the data for each study did require some additional
editing and coding. Editing operations included processing each interview through a series of
programming routines that evaluated question responses and assigned codes to indicate the
presence or absence of each mental health disorder assessed by the study. A number of other
summary variables based on individual question items were also created in preparation for the
project's analysis phase. In addition, each study included several open-ended questions, which
were coded.
Data classification/distribution:
Sarantakos (1998: 343) defines distribution of data as a form of classification of scores obtained for the various categories of a particular variable. There are four types of distributions:
1. Frequency distribution
2. Percentage distribution
3. Cumulative distribution
4. Statistical distributions
Frequency distribution:
In social science research, frequency distribution is very common. It presents the frequency of
occurrences of certain categories. This distribution appears in two forms:
Ungrouped: Here, the scores are not collapsed into categories, e.g., distribution of ages of the
students of a BJ (MC) class, each age value (e.g., 18, 19, 20, and so on) will be presented
separately in the distribution.
Grouped: Here, the scores are collapsed into categories, so that 2 or 3 scores are presented together as a group. For example, in the above age distribution, groups like 18-20, 21-22, etc., can be formed.
Percentage distribution:
It is also possible to give frequencies not in absolute numbers but in percentages. For instance, instead of saying that 200 respondents out of a total of 2,000 had a monthly income of less than Rs. 500, we can say that 10% of the respondents had a monthly income of less than Rs. 500.
Cumulative distribution:
It tells how often the value of the random variable is less than or equal to a particular reference
value.
Statistical data distribution:
In this type of data distribution, some measure of average is found for a sample of respondents. Several kinds of averages are available (mean, median, mode) and the researcher must decide which is most suitable for his purpose. Once the average has been calculated, the question arises: how representative a figure is it, i.e., how closely are the answers bunched around it? Are most of them very close to it, or is there a wide range of variation?
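The four distributions can be computed directly from a list of scores. The sketch below uses a small, made-up sample of ages in the spirit of the ungrouped example above.

```python
# Frequency, percentage, cumulative and a simple statistical distribution
# (the mean) for an illustrative sample of ages.

from collections import Counter

ages = [18, 19, 19, 20, 18, 21, 19, 20, 22, 19]   # made-up data
n = len(ages)

freq = dict(sorted(Counter(ages).items()))        # frequency distribution
pct = {k: 100 * v / n for k, v in freq.items()}   # percentage distribution

cum, running = {}, 0                              # cumulative distribution
for k, v in freq.items():
    running += v
    cum[k] = running

mean = sum(ages) / n                              # one measure of average

print(freq)   # {18: 2, 19: 4, 20: 2, 21: 1, 22: 1}
print(pct)    # {18: 20.0, 19: 40.0, 20: 20.0, 21: 10.0, 22: 10.0}
print(cum)    # {18: 2, 19: 6, 20: 8, 21: 9, 22: 10}
print(mean)   # 19.5
```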
Tabulation of data:
After editing, which ensures that the information on the schedule is accurate and categorized in a
suitable form, the data are put together in some kinds of tables and may also undergo some other
forms of statistical analysis.
Tables can be prepared manually and/or by computer. For a small study of 100 to 200 persons, there may be little point in tabulating by computer, since this necessitates putting the data on punched cards. But for a survey analysis involving a large number of respondents and requiring cross tabulation involving more than two variables, hand tabulation would be inappropriate and time consuming.
Usefulness of tables:
Tables are useful to the researchers and the readers in three ways:
1. They present an overall view of findings in a simpler way.
2. They identify trends.
3. They display relationships in a comparable way between parts of the findings.
By convention, the dependent variable is presented in the rows and the independent variable in
the columns.
Data Processing/Tabulation
Data Processing/Tabulation - Services offered by a company that has over the years earned a reputation as the industry's most experienced service provider, offering accurate data processing services to market research companies. These services are solicited by leading agencies and independent market researchers alike who require tabulation using Quantum, SPSS, etc.
Data processing
Experts in data processing/tabulation services! We offer high-quality services with minimum hassle and within the client's budget. We provide data processing/tabulation services for various market research clients and management consultancies across the globe. Our most esteemed clients are from Europe, North America, the Middle East, South East Asia and Japan.
Our competitive research designs, analytics and presentations help our clients focus on business development, client servicing and project management, enabling them to be more focused and business oriented. We are highly experienced in helping clients with basic tabulations as well as complex ones like bivariate and multivariate analysis.
Data tabulation refers to generating tables from data (collected in a survey, for example) after it has been validated and analyzed. This helps researchers and businesses to interpret survey results, or any other data.
Capital Typing has extensive experience handling hundreds of data tabulation projects for clients across the globe every year. Our data analysts have expertise in data tabulation tools such as Quantum, SPSS, SPSS Dimensions, Wincross and others. These applications support all tabulation requirements, from cross tabulation and generating pivot tables to complex weighting. We can provide all your tabulation services, from simple projects to large-scale multi-country, multi-wave data collected in different file formats.
By outsourcing your data tabulation needs to Capital Typing, you can free valuable resources from a rather time consuming process which requires uncompromising attention to detail.
Our data processing professionals will create tabulations for analysis from physical documents such as mail or onsite surveys, or from respondent data collected over the phone or online.
Depending on client requirements, our Cross-Tabulation Reports may include:
Table of Contents
Banner tables (including labeling, flexible basing and statistics)
Volumetric and Sigma bases
Up to 20 columns per banner
Descriptive statistics (mean, median, standard error, standard deviation, mode)
Minimum and maximum values
Data weighting
Vertical and/or horizontal percentages
Rankings
Selection of statistical testing (t-test/chi-square, ANOVA …)
Customized formatting options for final output (Word, Excel or PowerPoint)
Reports are delivered as hard copy or are e-mailed immediately as files (Word, PDF, Excel, etc.)
Some of the benefits of entrusting data tabulation requirements to Capital Typing are:
All data tabulation is cross-referenced and quality checked (quality control is an integral part of our processes and procedures).
We are flexible enough to work with your schedule and meet your delivery time needs. No matter how many changes and/or corrections to the tables or the specifications, our
rates remain unchanged.
Data Processing and Tabulation
Constructing a frequency distribution table:
The procedure we use to describe a set of data in a frequency table is as follows:
1. Decide on the number of classes.
2. Determine the class interval or width.
3. Set the individual class limits.
4. Tally the classes.
5. Count the number of items in each class.
EXAMPLE:
The following data represent the heights in centimeters of 50 patients in an outpatient clinic.
145 95 148 112 132
140 162 118 170 144
145 127 148 165 138
173 113 104 141 142
116 178 123 141 138
127 143 134 136 137
155 93 102 154 142
134 165 123 124 124
138 160 157 138 131
114 135 151 138 157
1. What are the lowest and highest heights?
Lowest height = 93 = L; highest height = 178 = H
2. How many classes are you going to use?
Since n = 50, we use the 2^k rule: choose k such that 2^k ≥ n.
If k = 5, then 2^5 = 32 < 50; if k = 6, then 2^6 = 64 ≥ 50.
Choose k = 6 as the number of classes.
3. What is the width of each class?
Class interval = width = (H – L) / k = (178 – 93) / 6 = 14.2,
rounded up to a width of 15.
4. Tally the heights into the classes:
Class Tally Frequency
____________________________________________
91 – 105 //// 4
106 – 120 ///// 5
121 – 135 ///// ///// / 11
136 – 150 ///// ///// ///// /// 18
151 – 165 ///// //// 9
166 – 180 /// 3
(each ///// represents a tally group of five)
5. Prepare tabular summaries of the height data (frequency table):
Class interval Frequency R. Frequency P. Frequency C. Frequency
_________________________________________________________________
91 – 105 4 0.08 8 4
106 – 120 5 0.10 10 9
121 – 135 11 0.22 22 20
136 – 150 18 0.36 36 38
151 – 165 9 0.18 18 47
166 – 180 3 0.06 6 50
_________________________________________________________________
Total 50 1.00 100
R = Relative, P = Percentage, C = Cumulative
6. What percentage of the heights are more than 150?
24% (12 of the 50 heights fall in the classes above 150)
7. What proportion of the heights are 135 or less?
20/50 = 40%
8. Determine class boundaries for your classes.
Class interval Class boundaries
_________________________________
91 – 105 90.5 – 105.5
106 – 120 105.5 – 120.5
121 – 135 120.5 – 135.5
136 – 150 135.5 – 150.5
151 – 165 150.5 – 165.5
166 – 180 165.5 – 180.5
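The whole worked example can be checked in a few lines of code. This is a sketch, not part of the original exercise; starting the first class at 91 (two below the minimum of 93) follows the choice made in the text.

```python
# Reproduce the height example: choose k with 2**k >= n, compute the class
# width, and tally the 50 heights into the six classes used above.

import math

heights = [145, 95, 148, 112, 132, 140, 162, 118, 170, 144,
           145, 127, 148, 165, 138, 173, 113, 104, 141, 142,
           116, 178, 123, 141, 138, 127, 143, 134, 136, 137,
           155, 93, 102, 154, 142, 134, 165, 123, 124, 124,
           138, 160, 157, 138, 131, 114, 135, 151, 138, 157]

n = len(heights)
k = math.ceil(math.log2(n))                           # smallest k with 2**k >= n -> 6
width = math.ceil((max(heights) - min(heights)) / k)  # (178 - 93) / 6 = 14.2 -> 15

low = min(heights) - 2   # start the first class at 91, as the text does
classes = [(low + i * width, low + i * width + width - 1) for i in range(k)]
freq = [sum(lo <= h <= hi for h in heights) for lo, hi in classes]

for (lo, hi), f in zip(classes, freq):
    print(f"{lo} - {hi}: {f}")
# 91 - 105: 4, 106 - 120: 5, 121 - 135: 11,
# 136 - 150: 18, 151 - 165: 9, 166 - 180: 3
```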
EXPERIMENT:
The following data are the hours of personal computer usage during one week for a sample of
30 persons.
4 1 10 5 3 5 1 6 3 3
3 4 2 14 5 4 3 4 11 3
4 4 8 5 4 3 7 10 6 7
Construct a frequency table showing:
1. What are the highest and lowest weekly usages of personal computers in this sample? What is the range?
2. How many classes are you going to choose? Why?
3. What is the width of each class? Why?
4. Tally the usage hours of personal computers into the classes.
5. Construct the frequency distribution table, showing the relative frequency, percentage frequency, cumulative frequency, and class boundaries.
6. What proportion of weekly personal computer usage is 3.5 hours or more?
7. What percentage of weekly personal computer usage is less than 2 hours?
8. What did you understand from the word frequency (define)?
How to start the process of data classification
TITUS was the first entrant into the data classification industry and continues to be the market and thought leader, with a wide range of solutions ranging from data classification in Microsoft Outlook to data classification for iOS and Android mobile devices.
In the field of data management, data classification as a part of the Information Lifecycle Management (ILM) process can be defined as a tool for the categorization of data, to enable/help an organization to effectively answer the following questions:
What data types are available?
Where are certain data located?
What access levels are implemented?
What protection level is implemented and does it adhere to compliance regulations?
When implemented, it provides a bridge between IT professionals and process or application owners. IT staff are informed about the value of the data and, on the other hand, management (usually the application owners) understands better which segments of the data centre need investment to keep operations running effectively. This can be of particular importance in risk management, legal discovery, and compliance with government regulations. Data classification is typically a manual process; however, there are many tools from different vendors that can help gather information about the data.
Note that this classification structure is written from a data management perspective and therefore focuses on text and text-convertible binary data sources. Images, videos, and audio files are highly structured formats built for industry-standard APIs and do not readily fit within the classification scheme outlined below.
The first step is to evaluate and divide the various applications and data into their respective categories, as follows:
Relational or tabular data (around 15% of non-audio/video data)
Generally describes proprietary data which is accessible only through an application or application programming interfaces (APIs).
Applications that produce structured data are usually database applications.
This type of data usually brings complex procedures of data evaluation and migration between storage tiers.
To ensure adequate quality standards, the classification process has to be monitored by subject matter experts.
Semi-structured or poly-structured data (all other non-audio/video data that does not conform to a system- or platform-defined relational or tabular form)
Generally describes data files that have a dynamic or non-relational semantic structure (e.g. documents, XML, JSON, device or system log output, sensor output).
Data classification is a relatively simple process of criteria assignment.
Data migration between assigned segments of predefined storage tiers is a simple process.
Types of data classification: note that this designation is entirely orthogonal to the application-centric designation outlined above. Regardless of the structure inherited from the application, data may be of the types below:
1. Geographical: i.e. according to area (e.g., the rice production of a state or country)
2. Chronological: i.e. according to time (e.g., sales of the last 3 months)
3. Qualitative: i.e. according to distinct categories (e.g., population on the basis of poor and rich)
4. Quantitative: i.e. according to magnitude: (a) discrete and (b) continuous
Basic criteria for semi-structured or poly-structured data classification
Time criteria are the simplest and most commonly used, where different types of data are evaluated by time of creation, time of access, time of update, etc.
Metadata criteria such as type, name, owner, location and so on can be used to create a more advanced classification policy.
Content criteria, which involve the use of advanced content classification algorithms, are the most advanced form of unstructured data classification.
Note that any of these criteria may also apply to tabular or relational data as "basic criteria". These criteria are application specific, rather than inherent aspects of the form in which the data is presented.
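As one concrete illustration of a time criterion, the sketch below buckets files into storage tiers by age of last access; the tier names and day thresholds are assumptions, not from the source.

```python
# Time-criteria classification for semi-structured data: assign a storage
# tier from the file's last-access time. Thresholds are illustrative.

import os
import time

def classify_by_age(path, hot_days=30, warm_days=365):
    """Return a storage-tier label based on how recently the file was accessed."""
    age_days = (time.time() - os.stat(path).st_atime) / 86400
    if age_days <= hot_days:
        return "hot"    # keep on fast storage
    if age_days <= warm_days:
        return "warm"   # candidate for a cheaper tier
    return "cold"       # candidate for archive

# Usage (hypothetical path):
# print(classify_by_age("/data/logs/app.log"))
```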
Basic criteria for relational or tabular data classification
These criteria are usually initiated by application requirements such as:
Disaster recovery and Business Continuity rules
Data centre resources optimization and consolidation
Hardware performance limitations and possible improvements by reorganization
Note that any of these criteria may also apply to semi/poly structured data as "Basic Criteria".
These criteria are application specific, rather than inherent aspects of the form in which the data
is presented.
Benefits of data classification
Effective implementation of appropriate data classification can significantly improve the ILM process and save data centre storage resources. If implemented systemically, it can generate improvements in data centre performance and utilization. Data classification can also reduce costs and administration overhead. "Good enough" data classification can produce these results:
Data compliance and easier risk management: data are located where expected, on a predefined storage tier, at a given point in time.
Simplification of data encryption, because not all data need to be encrypted; this saves valuable processor cycles and the associated overhead.
Data indexing to improve user access times.
Data protection is redefined so that the RTO (Recovery Time Objective) is improved.
Research streamlines data processing to solve problems more efficiently
Researchers at North Carolina State University have developed a new analytical method that
opens the door to faster processing of large amounts of information, with applications in fields as
diverse as the military, medical diagnostics and homeland security.
"The problem we address here is this: When faced with a large amount of data, how do you
determine which pieces of that information are relevant for solving a specific problem," says Dr.
Joel Trussell, a professor of electrical and computer engineering at NC State and co-author of a
paper describing the research. "For example, how would you select the smallest number of
features that would allow a robot to differentiate between water and solid ground, based on
visual data collected by video?"
This is important, because the more data you need to solve a problem, the more expensive it is to
collect the data and the longer it will take to process the data. "The work we've done here allows
for a more efficient collection of data by targeting exactly what information is most important to
the decision-making process," Trussell says. "Basically, we've created a new algorithm that can
be used to determine how much data is needed to make a decision with a minimal rate of error."
One application for the new algorithm, discussed in the paper, is for the development of
programs that can analyze hyperspectral data from military cameras in order to identify potential
targets. Hyperspectral technology allows for finer resolution of the wavelengths of light that are
visible to the human eye, though it can also collect information from the infrared spectrum,
which can be used to identify specific materials, among other things. The algorithm could be
used to ensure that such a program would operate efficiently, minimizing data storage needs and
allowing the data to be processed more quickly.
But Trussell notes that "there are plenty of problems out there where people are faced with a vast
amount of data, visual or otherwise, such as medical situations, where doctors may have the
results from multiple imaging tests. For example, the algorithm would allow the development of
a more efficient screening process for evaluating medical images -- such as mammograms --
from a large group of people."
Another potential application would be for biometrics, such as homeland security efforts to
identify terrorists and others on the Department of Homeland Security watchlist based on video
and camera images.
The research, "Constrained Dimensionality Reduction Using A Mixed-Norm Penalty Function
With Neural Networks," was funded by the U.S. Army Research Office and co-authored by
Trussell and former NC State Ph.D. student Huiwen Zeng. The work is published in the March
issue of IEEE Transactions on Knowledge and Data Engineering.
NC State's Department of Electrical and Computer Engineering is part of the university's College
of Engineering.