data mining,cobol,memory
TRANSCRIPT
7/29/2019 Data Mining,Cobol,Memory
http://slidepdf.com/reader/full/data-miningcobolmemory 1/54
Data mining
An Introduction to Data Mining
Discovering hidden value in your data warehouse
Overview
Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.
Most companies already collect and refine massive quantities of data. Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems as they are brought on-line. When implemented on high performance client/server or parallel processing computers, data mining tools can analyze massive databases to deliver answers to questions such as, "Which clients are most likely to respond to my next promotional mailing, and why?"
This white paper provides an introduction to the basic technologies of data mining. Examples of profitable applications illustrate its relevance to today's business environment, and a basic description shows how data warehouse architectures can evolve to deliver the value of data mining to end users.
The Foundations of Data Mining
Data mining techniques are the result of a long process of research and product development. This evolution began when business data was first stored on computers, continued with improvements in data access, and more recently, generated technologies that allow users to navigate through their data in real time.
Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery. Data mining is ready for application in the business community because it is supported by three technologies that are now sufficiently mature:

- Massive data collection
- Powerful multiprocessor computers
- Data mining algorithms
Commercial databases are growing at unprecedented rates. A recent META Group survey of data warehouse projects found that 19% of respondents are beyond the 50 gigabyte level, while 59% expect to be there by second quarter of 1996.[1] In some industries, such as retail, these numbers can be much larger. The accompanying need for improved computational engines can now be met in a cost-effective manner with parallel multiprocessor computer technology. Data mining algorithms embody techniques that have existed for at least 10 years, but have only recently been implemented as mature, reliable, understandable tools that consistently outperform older statistical methods.
In the evolution from business data to business information, each new step has built upon the previous one. For example, dynamic data access is critical for drill-through in data navigation applications, and the ability to store large databases is critical to data mining. From the user's point of view, the four steps listed in Table 1 were revolutionary because they allowed new business questions to be answered accurately and quickly.
Data Collection (1960s)
  Business question: "What was my total revenue in the last five years?"
  Enabling technologies: computers, tapes, disks
  Product providers: IBM, CDC
  Characteristics: retrospective, static data delivery

Data Access (1980s)
  Business question: "What were unit sales in New England last March?"
  Enabling technologies: relational databases (RDBMS), Structured Query Language (SQL), ODBC
  Product providers: Oracle, Sybase, Informix, IBM, Microsoft
  Characteristics: retrospective, dynamic data delivery at record level

Data Warehousing & Decision Support (1990s)
  Business question: "What were unit sales in New England last March? Drill down to Boston."
  Enabling technologies: on-line analytic processing (OLAP), multidimensional databases, data warehouses
  Product providers: Pilot, Comshare, Arbor, Cognos, Microstrategy
  Characteristics: retrospective, dynamic data delivery at multiple levels

Data Mining (Emerging Today)
  Business question: "What's likely to happen to Boston unit sales next month? Why?"
  Enabling technologies: advanced algorithms, multiprocessor computers, massive databases
  Product providers: Pilot, Lockheed, IBM, SGI, numerous startups (nascent industry)
  Characteristics: prospective, proactive information delivery

Table 1. Steps in the Evolution of Data Mining.
The core components of data mining technology have been under development for decades, in research areas such as statistics, artificial intelligence, and machine learning. Today, the maturity of these techniques, coupled with high-performance relational database engines and broad data integration efforts, makes these technologies practical for current data warehouse environments.
The Scope of Data Mining
Data mining derives its name from the similarities between searching for valuable business information in a large database — for example, finding linked products in gigabytes of store scanner data — and mining a mountain for a vein of valuable ore. Both processes require either sifting through an immense amount of material, or intelligently probing it to find exactly where the value resides. Given databases of sufficient size and quality, data mining technology can generate new business opportunities by providing these capabilities:
- Automated prediction of trends and behaviors. Data mining automates the process of finding predictive information in large databases. Questions that traditionally required extensive hands-on analysis can now be answered directly from the data — quickly. A typical example of a predictive problem is targeted marketing. Data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings. Other predictive problems include forecasting bankruptcy and other forms of default, and identifying segments of a population likely to respond similarly to given events.

- Automated discovery of previously unknown patterns. Data mining tools sweep through databases and identify previously hidden patterns in one step. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together. Other pattern discovery problems include detecting fraudulent credit card transactions and identifying anomalous data that could represent data entry keying errors.
Data mining techniques can yield the benefits of automation on existing software and hardware platforms, and can be implemented on new systems as existing platforms are upgraded and new products developed. When data mining tools are implemented on high performance parallel processing systems, they can analyze massive databases in minutes. Faster processing means that users can automatically experiment with more models to understand complex data. High speed makes it practical for users to analyze huge quantities of data. Larger databases, in turn, yield improved predictions.
Databases can be larger in both depth and breadth:
- More columns. Analysts must often limit the number of variables they examine when doing hands-on analysis due to time constraints. Yet variables that are discarded because they seem unimportant may carry information about unknown patterns. High performance data mining allows users to explore the full depth of a database, without preselecting a subset of variables.
- More rows. Larger samples yield lower estimation errors and variance, and allow users to make inferences about small but important segments of a population.
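The benefit of more rows can be seen directly: the sampling error of an estimate shrinks roughly with the square root of the sample size. The sketch below is a purely illustrative Python simulation using a made-up uniform population, not real warehouse data:

```python
import random
import statistics

random.seed(0)

def estimate_error(sample_size, trials=200):
    """Spread (standard deviation) of the sample mean of a uniform(0, 100)
    population, measured across many repeated samples of the given size."""
    means = [
        statistics.mean(random.uniform(0, 100) for _ in range(sample_size))
        for _ in range(trials)
    ]
    return statistics.stdev(means)

# The error of the estimated mean shrinks roughly as 1/sqrt(n):
# quadrupling the number of rows roughly halves the error.
for n in (100, 400, 1600):
    print(n, round(estimate_error(n), 2))
```

The same intuition explains why inferences about small population segments need large databases: a segment that is 1% of the data contributes only 1% of the rows to its own estimate.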
A recent Gartner Group Advanced Technology Research Note listed data mining and artificial intelligence at the top of the five key technology areas that "will clearly have a major impact across a wide range of industries within the next 3 to 5 years."[2] Gartner also listed parallel architectures and data mining as two of the top 10 new technologies in which companies will invest during the next 5 years. According to a recent Gartner HPC Research Note, "With the rapid advance in data capture, transmission and storage, large-systems users will increasingly need to implement new and innovative ways to mine the after-market value of their vast stores of detail data, employing MPP [massively parallel processing] systems to create new sources of business advantage (0.9 probability)."[3]
The most commonly used techniques in data mining are:
- Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.

- Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID).

- Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of evolution.

- Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). Sometimes called the k-nearest neighbor technique.

- Rule induction: The extraction of useful if-then rules from data based on statistical significance.
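As an illustration of the nearest neighbor method listed above, here is a minimal Python sketch. The two-variable customer history and its labels are invented for the example; a real tool would work against warehouse tables with many more variables:

```python
import math
from collections import Counter

def knn_classify(record, history, k=3):
    """Classify `record` by majority vote among the k most similar
    records in a historical, already-labeled dataset."""
    by_distance = sorted(
        history,
        key=lambda h: math.dist(record, h[0]),  # Euclidean distance as similarity
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical history: (income in $k, monthly long distance $) -> label.
history = [
    ((65, 90), "responder"), ((70, 85), "responder"), ((80, 95), "responder"),
    ((25, 10), "non-responder"), ((30, 15), "non-responder"), ((20, 5), "non-responder"),
]

print(knn_classify((68, 88), history))  # → responder
print(knn_classify((28, 12), history))  # → non-responder
```

Each new record is simply placed next to the k historical records it most resembles, which is why the technique needs no explicit model-fitting step.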
Many of these technologies have been in use for more than a decade in specialized analysis tools that work with relatively small volumes of data. These capabilities are now evolving to integrate directly with industry-standard data warehouse and OLAP platforms. The appendix to this white paper provides a glossary of data mining terms.
How Data Mining Works
How exactly is data mining able to tell you important things that you didn't know, or predict what is going to happen next? The technique used to perform these feats in data mining is called modeling. Modeling is simply the act of building a model in one situation where you know the answer and then applying it to another situation where you don't. For instance, if you were looking for a sunken Spanish galleon on the high seas, the first thing you might do is research the times when Spanish treasure had been found by others in the past. You might note that these ships often tend to be found off the coast of Bermuda, that there are certain characteristics to the ocean currents, and that certain routes were likely taken by the ships' captains in that era. You note these similarities and build a model that includes the characteristics common to the locations of these sunken treasures. With this model in hand you sail off looking for treasure where your model indicates it is most likely to be, given similar situations in the past. Hopefully, if you've got a good model, you find your treasure.
This act of model building is thus something that people have been doing for a long time, certainly before the advent of computers or data mining technology. What happens on computers, however, is not much different from the way people build models. Computers are loaded up with lots of information about a variety of situations where an answer is known, and then the data mining software on the computer must run through that data and distill the characteristics of the data that should go into the model. Once the model is built it can then be used in similar situations where you don't know the answer. For example, say that you are the director of marketing for a telecommunications company and you'd like to acquire some new long distance phone customers. You could just randomly go out and mail coupons to the general population, just as you could randomly sail the seas looking for sunken treasure. In neither case would you achieve the results you desired; of course, you have the opportunity to do much better than random: you could use the business experience stored in your database to build a model.
As the marketing director you have access to a lot of information about all of your customers: their age, sex, credit history and long distance calling usage. The good news is that you also have a lot of information about your prospective customers: their age, sex, credit history etc. Your problem is that you don't know the long distance calling usage of these prospects (since they are most likely now customers of your competition). You'd like to concentrate on those prospects who have large amounts of long distance usage. You can accomplish this by building a model. Table 2 illustrates the data used for building a model for new customer prospecting in a data warehouse.
                                                       Customers   Prospects
General information (e.g. demographic data)            Known       Known
Proprietary information (e.g. customer transactions)   Known       Target

Table 2 - Data Mining for Prospecting
The goal in prospecting is to make some calculated guesses about the information in the lower right hand quadrant based on the model that we build going from Customer General Information to Customer Proprietary Information. For instance, a simple model for a telecommunications company might be:
98% of my customers who make more than $60,000/year spend more than $80/month on long distance
This model could then be applied to the prospect data to try to tell something about the proprietary information that this telecommunications company does not currently have access to. With this model in hand, new customers can be selectively targeted.
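Applying a mined rule of this kind to the prospect data can be sketched in a few lines of Python. The prospect records, the field names, and the $60,000 threshold are hypothetical stand-ins for the warehouse data discussed above:

```python
# Hypothetical prospect records: general information is known,
# long distance usage (the proprietary information) is the unknown target.
prospects = [
    {"name": "A", "income": 75_000},
    {"name": "B", "income": 42_000},
    {"name": "C", "income": 61_000},
]

def likely_heavy_user(prospect, income_threshold=60_000):
    """Apply the mined rule: customers earning > $60,000/year spend
    > $80/month on long distance 98% of the time."""
    return prospect["income"] > income_threshold

targeted = [p["name"] for p in prospects if likely_heavy_user(p)]
print(targeted)  # → ['A', 'C']
```

The known general information (income) acts as a proxy for the unknown proprietary information (usage), which is exactly the quadrant-filling exercise Table 2 describes.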
Test marketing is an excellent source of data for this kind of modeling. Mining the results of a test market representing a broad but relatively small sample of prospects can provide a foundation for identifying good prospects in the overall market. Table 3 shows another common scenario for building models: predicting what is going to happen in the future.
                                                       Yesterday   Today   Tomorrow
Static information and current plans
(e.g. demographic data, marketing plans)               Known       Known   Known
Dynamic information (e.g. customer transactions)       Known       Known   Target

Table 3 - Data Mining for Predictions
If someone told you that he had a model that could predict customer usage, how would you know if he really had a good model? The first thing you might try would be to ask him to apply his model to your customer base, where you already knew the answer. With data mining, the best way to accomplish this is by setting aside some of your data in a vault to isolate it from the mining process. Once the mining is complete, the results can be tested against the data held in the vault to confirm the model's validity. If the model works, its observations should hold for the vaulted data.
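The vault idea can be sketched in Python. The dataset, the one-rule "mining" step, and the 25% vault fraction below are all illustrative assumptions; a real tool would mine far richer models, but the validation logic is the same:

```python
import random

random.seed(1)

# Hypothetical labeled history: (annual income in $k, heavy long distance user?).
# The label is a deterministic function of income, so a rule recovered from
# the mining set should also hold for the vaulted data.
records = [(income, income > 60) for income in range(20, 100)]
random.shuffle(records)

# Set aside 25% of the data in a "vault", isolated from the mining process.
vault_size = len(records) // 4
vault, mining_set = records[:vault_size], records[vault_size:]

def accuracy(cutoff, data):
    """Share of records where the rule `income > cutoff` matches the label."""
    return sum((income > cutoff) == heavy for income, heavy in data) / len(data)

# "Mine" a one-rule model on the mining set only (a stand-in for a real
# mining tool): pick the income cutoff that best separates heavy users.
best_cutoff = max(range(20, 100), key=lambda c: accuracy(c, mining_set))

# Confirm the model's validity against the vaulted data it never saw.
print(f"cutoff: {best_cutoff}, vault accuracy: {accuracy(best_cutoff, vault):.0%}")
```

Because the vault never influences the mining step, high accuracy on the vaulted records is evidence the model generalizes rather than merely memorizes.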
An Architecture for Data Mining
To best apply these advanced techniques, they must be fully integrated with a data warehouse as well as flexible interactive business analysis tools. Many data mining tools currently operate outside of the warehouse, requiring extra steps for extracting, importing, and analyzing the data. Furthermore, when new insights require operational implementation, integration with the warehouse simplifies the application of results from data mining. The resulting analytic data warehouse can be applied to improve business processes throughout the organization, in areas such as promotional campaign management, fraud detection, new product rollout, and so on. Figure 1 illustrates an architecture for advanced analysis in a large data warehouse.
Figure 1 - Integrated Data Mining Architecture
The ideal starting point is a data warehouse containing a combination of internal data tracking all customer contact coupled with external market data about competitor activity. Background information on potential customers also provides an excellent basis for prospecting. This warehouse can be implemented in a variety of relational database systems: Sybase, Oracle, Redbrick, and so on, and should be optimized for flexible and fast data access.
An OLAP (On-Line Analytical Processing) server enables a more sophisticated end-user business model to be applied when navigating the data warehouse. The multidimensional structures allow the user to analyze the data as they want to view their business – summarizing by product line, region, and other key perspectives of their business. The Data Mining Server must be integrated with the data warehouse and the OLAP server to embed ROI-focused business analysis directly into this infrastructure. An advanced, process-centric metadata template defines the data mining objectives for specific business issues like campaign management, prospecting, and promotion optimization. Integration with the data warehouse enables operational decisions to be directly implemented and tracked. As the warehouse grows with new decisions and results, the organization can continually mine the best practices and apply them to future decisions.
This design represents a fundamental shift from conventional decision support systems. Rather than simply delivering data to the end user through query and reporting software, the Advanced Analysis Server applies users' business models directly to the warehouse and returns a proactive analysis of the most relevant information. These results enhance the metadata in the OLAP Server by providing a dynamic metadata layer that represents a distilled view of the data. Reporting, visualization, and other analysis tools can then be applied to plan future actions and confirm the impact of those plans.
Profitable Applications
A wide range of companies have deployed successful applications of data mining. While early adopters of this technology have tended to be in information-intensive industries such as financial services and direct mail marketing, the technology is applicable to any company looking to leverage a large data warehouse to better manage its customer relationships. Two critical factors for success with data mining are a large, well-integrated data warehouse and a well-defined understanding of the business process within which data mining is to be applied (such as customer prospecting, retention, campaign management, and so on).
Some successful application areas include:
- A pharmaceutical company can analyze its recent sales force activity and its results to improve targeting of high-value physicians and determine which marketing activities will have the greatest impact in the next few months. The data needs to include competitor market activity as well as information about the local health care systems. The results can be distributed to the sales force via a wide-area network that enables the representatives to review the recommendations from the perspective of the key attributes in the decision process. The ongoing, dynamic analysis of the data warehouse allows best practices from throughout the organization to be applied in specific sales situations.

- A credit card company can leverage its vast warehouse of customer transaction data to identify customers most likely to be interested in a new credit product. Using a small test mailing, the attributes of customers with an affinity for the product can be identified. Recent projects have indicated more than a 20-fold decrease in costs for targeted mailing campaigns over conventional approaches.

- A diversified transportation company with a large direct sales force can apply data mining to identify the best prospects for its services. Using data mining to analyze its own customer experience, this company can build a unique segmentation identifying the attributes of high-value prospects. Applying this segmentation to a general business database such as those provided by Dun & Bradstreet can yield a prioritized list of prospects by region.

- A large consumer packaged goods company can apply data mining to improve its sales process to retailers. Data from consumer panels, shipments, and competitor activity can be applied to understand the reasons for brand and store switching. Through this analysis, the manufacturer can select promotional strategies that best reach its target customer segments.
Each of these examples has a clear common ground. They leverage the knowledge about customers implicit in a data warehouse to reduce costs and improve the value of customer relationships. These organizations can now focus their efforts on the most important (profitable) customers and prospects, and design targeted marketing strategies to best reach them.
Conclusion
Comprehensive data warehouses that integrate operational data with customer, supplier, and market information have resulted in an explosion of information. Competition requires timely and sophisticated analysis on an integrated view of the data. However, there is a growing gap between more powerful storage and retrieval systems and the users' ability to effectively analyze and act on the information they contain. Both relational and OLAP technologies have tremendous capabilities for navigating massive data warehouses, but brute force navigation of data is not enough. A new technological leap is needed to structure and prioritize information for specific end-user problems. Data mining tools can make this leap. Quantifiable business benefits have been proven through the integration of data mining with current information systems, and new products are on the horizon that will bring this integration to an even wider audience of users.
[1] META Group Application Development Strategies: "Data Mining for Data Warehouses: Uncovering Hidden Patterns," 7/13/95.

[2] Gartner Group Advanced Technologies and Applications Research Note, 2/1/95.

[3] Gartner Group High Performance Computing Research Note, 1/31/95.
Glossary of Data Mining Terms
analytical model: A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.

anomalous data: Data that result from errors (for example, data entry keying errors) or that represent unusual events. Anomalous data should be examined carefully because they may carry important information.

artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.

CART: Classification and Regression Trees. A decision tree technique used for classification of a dataset. Provides a set of rules that you can apply to a new (unclassified) dataset to predict which records will have a given outcome. Segments a dataset by creating 2-way splits. Requires less data preparation than CHAID.

CHAID: Chi Square Automatic Interaction Detection. A decision tree technique used for classification of a dataset. Provides a set of rules that you can apply to a new (unclassified) dataset to predict which records will have a given outcome. Segments a dataset by using chi square tests to create multi-way splits. Preceded, and requires more data preparation than, CART.

classification: The process of dividing a dataset into mutually exclusive groups such that the members of each group are as "close" as possible to one another, and different groups are as "far" as possible from one another, where distance is measured with respect to specific variable(s) you are trying to predict. For example, a typical classification problem is to divide a database of companies into groups that are as homogeneous as possible with respect to a creditworthiness variable with values "Good" and "Bad."

clustering: The process of dividing a dataset into mutually exclusive groups such that the members of each group are as "close" as possible to one another, and different groups are as "far" as possible from one another, where distance is measured with respect to all available variables.

data cleansing: The process of ensuring that all values in a dataset are consistent and correctly recorded.

data mining: The extraction of hidden predictive information from large databases.

data navigation: The process of viewing different dimensions, slices, and levels of detail of a multidimensional database. See OLAP.

data visualization: The visual interpretation of complex relationships in multidimensional data.

data warehouse: A system for storing and delivering massive quantities of data.

decision tree: A tree-shaped structure that represents a set of decisions. These decisions generate rules for the classification of a dataset. See CART and CHAID.

dimension: In a flat or relational database, each field in a record represents a dimension. In a multidimensional database, a dimension is a set of similar entities; for example, a multidimensional sales database might include the dimensions Product, Time, and City.

exploratory data analysis: The use of graphical and descriptive statistical techniques to learn about the structure of a dataset.

genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution.

linear model: An analytical model that assumes linear relationships in the coefficients of the variables being studied.

linear regression: A statistical technique used to find the best-fitting linear relationship between a target (dependent) variable and its predictors (independent variables).

logistic regression: A linear regression that predicts the proportions of a categorical target variable, such as type of customer, in a population.

multidimensional database: A database designed for on-line analytical processing. Structured as a multidimensional hypercube with one axis per dimension.

multiprocessor computer: A computer that includes multiple processors connected by a network. See parallel processing.

nearest neighbor: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). Sometimes called a k-nearest neighbor technique.

non-linear model: An analytical model that does not assume linear relationships in the coefficients of the variables being studied.

OLAP: On-line analytical processing. Refers to array-oriented database applications that allow users to view, navigate through, manipulate, and analyze multidimensional databases.

outlier: A data item whose value falls outside the bounds enclosing most of the other corresponding values in the sample. May indicate anomalous data. Should be examined carefully; may carry important information.

parallel processing: The coordinated use of multiple processors to perform computational tasks. Parallel processing can occur on a multiprocessor computer or on a network of workstations or PCs.

predictive model: A structure and process for predicting the values of specified variables in a dataset.

prospective data analysis: Data analysis that predicts future trends, behaviors, or events based on historical data.

RAID: Redundant Array of Inexpensive Disks. A technology for the efficient parallel storage of data for high-performance computer systems.

retrospective data analysis: Data analysis that provides insights into trends, behaviors, or events that have already occurred.

rule induction: The extraction of useful if-then rules from data based on statistical significance.

SMP: Symmetric multiprocessor. A type of multiprocessor computer in which memory is shared among the processors.

terabyte: One trillion bytes.

time series analysis: The analysis of a sequence of measurements made at specified time intervals. Time is usually the dominating dimension of the data.
Data mining
From Wikipedia, the free encyclopedia

Not to be confused with information extraction.
Data mining (the analysis step of the Knowledge Discovery in Databases process, or KDD), a relatively young and interdisciplinary field of computer science,[1][2] is the process of extracting patterns from large data sets by combining methods from statistics and artificial intelligence with database management.[3]
With recent technical advances in processing power, storage capacity, and interconnectivity of computer technology, data mining is seen as an increasingly important tool by modern business to transform unprecedented quantities of digital data into business intelligence, giving an informational advantage. It is currently used in a wide range of profiling practices, such as marketing, surveillance, fraud detection, and scientific discovery. The growing consensus that data mining can bring real value has led to an explosion in demand for novel data mining technologies.[4]
The related terms data dredging, data fishing and data snooping refer to the use of data mining methods to sample parts of a larger population data set that are (or may be) too small for reliable statistical inferences to be made about the validity of any patterns discovered. These methods can, however, be used in creating new hypotheses to test against the larger data populations.
Background
The manual extraction of patterns from data has occurred for centuries. Early methods of identifying patterns in data include Bayes' theorem (1700s) and regression analysis (1800s). The proliferation, ubiquity and increasing power of computer technology has increased data collection, storage and manipulation. As data sets have grown in size and complexity, direct hands-on data analysis has increasingly been augmented with indirect, automatic data processing. This has been aided by other discoveries in computer science, such as neural networks, clustering, genetic algorithms (1950s), decision trees (1960s) and support vector machines (1990s). Data mining is the process of applying these methods to data with the intention of uncovering hidden patterns.[5] It has been used for many years by businesses, scientists and governments to sift through volumes of data such as airline passenger trip records, census data and supermarket scanner data to produce market research reports. (Note, however, that reporting is not always considered to be data mining.)
A primary reason for using data mining is to assist in the analysis of collections of observations of behavior. Such data are vulnerable to collinearity because of unknown interrelations. An unavoidable fact of data mining is that the (sub-)set(s) of data being analyzed may not be representative of the whole domain, and therefore may not contain examples of certain critical relationships and behaviors that exist across other parts of the domain. To address this sort of issue, the analysis may be augmented using experiment-based and other approaches, such as choice modelling for human-generated data. In these situations, inherent correlations can be either controlled for, or removed altogether, during the construction of the experimental design.
There have been some efforts to define standards for data mining, for example the 1999 European Cross Industry Standard Process for Data Mining (CRISP-DM 1.0) and the 2004 Java Data Mining standard (JDM 1.0). These are evolving standards; later versions are under development. Independent of these standardization efforts, freely available open-source software systems like the R language, Weka, KNIME, RapidMiner, jHepWork and others have become an informal standard for defining data-mining processes. Notably, all these systems are able to import and export models in PMML (Predictive Model Markup Language), which provides a standard way to represent data mining models so that these can be shared between different statistical applications.[6] PMML is an XML-based language developed by the Data Mining Group (DMG),[7] an independent group composed of many data mining companies. PMML version 4.0 was released in June 2009.[7][8][9]
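As a rough illustration of the kind of interchange PMML enables, the fragment below sketches the general shape of a PMML document: a data dictionary describing the fields, followed by a model element. The field names, values, and the tiny decision tree here are invented for illustration; real exports from the tools above are far more detailed.

```xml
<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0">
  <Header description="Illustrative sketch only"/>
  <DataDictionary numberOfFields="2">
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="response" optype="categorical" dataType="string">
      <Value value="yes"/>
      <Value value="no"/>
    </DataField>
  </DataDictionary>
  <TreeModel modelName="ResponseTree" functionName="classification">
    <MiningSchema>
      <MiningField name="income"/>
      <MiningField name="response" usageType="predicted"/>
    </MiningSchema>
    <Node score="no">
      <True/>
      <Node score="yes">
        <SimplePredicate field="income" operator="greaterThan" value="50000"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>
```

Because the model travels as plain XML, it can be produced by one application and scored by another, which is the portability the text describes.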
Research and evolution
The premier professional body in the field is the Association for Computing Machinery's Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD).[citation needed] Since 1989 they have hosted an annual international conference and published its proceedings,[10] and since 1999 have published a biannual academic journal titled "SIGKDD Explorations".[11]
Other computer science conferences on data mining include:
DMIN – International Conference on Data Mining[12]
DMKD – Research Issues on Data Mining and Knowledge Discovery
ECDM – European Conference on Data Mining
ECML-PKDD – European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
EDM – International Conference on Educational Data Mining
ICDM – IEEE International Conference on Data Mining[13]
MLDM – Machine Learning and Data Mining in Pattern Recognition
PAKDD – The annual Pacific-Asia Conference on Knowledge Discovery and Data Mining
PAW – Predictive Analytics World[14]
SDM – SIAM International Conference on Data Mining
Process
The CRoss Industry Standard Process for Data Mining (CRISP-DM)[15] is a data mining process model that describes commonly used approaches that expert data miners use to tackle problems. It defines six phases: (1) Business Understanding, (2) Data Understanding, (3) Data Preparation, (4) Modeling, (5) Evaluation, and (6) Deployment.[16]
Alternatively, other process models may define three phases: (1) Pre-processing, (2) Data mining, and (3) Results validation.
Pre-processing
Before data mining algorithms can be used, a target data set must be assembled. As data mining can only uncover patterns already present in the data, the target data set must be large enough to contain these patterns while remaining concise enough to be mined in an acceptable timeframe. A common source of data is a data mart or data warehouse. Pre-processing is essential to analyse multivariate data sets before data mining.
The target set is then cleaned. Data cleaning removes observations containing noise or missing data.
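As a minimal sketch of this cleaning step, the snippet below drops rows with missing or implausible values. The field names, values, and range checks are invented for illustration; a real pipeline would use the checks appropriate to its domain.

```python
# Raw rows as they might arrive from a data mart; fields are hypothetical.
raw_rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},   # missing value -> dropped
    {"age": 29, "income": -5},        # noise: negative income -> dropped
    {"age": 45, "income": 88000},
]

def clean(rows):
    """Keep only complete rows whose values pass simple range checks."""
    kept = []
    for row in rows:
        if any(v is None for v in row.values()):
            continue                  # remove rows with missing data
        if row["income"] < 0 or not (0 <= row["age"] <= 120):
            continue                  # remove implausible (noisy) values
        kept.append(row)
    return kept

target_set = clean(raw_rows)
print(len(target_set))  # 2 rows survive cleaning
```

Only the two complete, plausible rows reach the mining stage.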
Data mining
Data mining commonly involves four classes of tasks:[17]
Association rule learning – Searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.
Clustering – The task of discovering groups and structures in the data that are in some way "similar", without using known structures in the data.
Classification – The task of generalizing known structure to apply to new data. For example, an email program might attempt to classify an email as legitimate or spam. Common algorithms include decision tree learning, nearest neighbor, naive Bayes classification, neural networks and support vector machines.
Regression – Attempts to find a function that models the data with the least error.
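To make the classification task concrete, here is a toy nearest-neighbor classifier: known labeled points are generalized to a new point by copying the label of its closest neighbor. The two-feature points and the spam/legitimate labels are invented; a real system would use many more features and a proper evaluation.

```python
import math

# Labeled training points: (features, label). Data invented for illustration.
train = [((1.0, 1.0), "spam"), ((1.2, 0.8), "spam"),
         ((4.0, 4.2), "legit"), ((3.8, 4.0), "legit")]

def classify(point):
    """Label a new point with the label of its nearest training point."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])
    nearest = min(train, key=lambda item: dist(item[0], point))
    return nearest[1]

print(classify((1.1, 0.9)))  # "spam"  - closest to the spam cluster
print(classify((4.1, 3.9)))  # "legit" - closest to the legit cluster
```

The same generalize-from-known-structure idea underlies the more sophisticated algorithms listed above.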
Results validation
The final step of knowledge discovery from data is to verify that the patterns produced by the data mining algorithms occur in the wider data set. Not all patterns found by the data mining algorithms are necessarily valid. It is common for data mining algorithms to find patterns in the training set which are not present in the general data set. This is called overfitting. To overcome this, the evaluation uses a test set of data on which the data mining algorithm was not trained. The learned patterns are applied to this test set and the resulting output is compared to the desired output. For example, a data mining algorithm trying to distinguish spam from legitimate emails would be trained on a training set of sample emails. Once trained, the learned patterns would be applied to the test set of emails on which it had not been trained. The accuracy of these patterns can then be measured from how many emails they correctly classify. A number of statistical methods may be used to evaluate the algorithm, such as ROC curves.
If the learned patterns do not meet the desired standards, it is necessary to reevaluate and change the pre-processing and data mining steps. If the learned patterns do meet the desired standards, the final step is to interpret them and turn them into knowledge.
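The train/test procedure described above can be sketched as follows. The one-feature data and the deliberately simple threshold "model" are invented for illustration; the point is that the pattern is learned on the training set only and then scored on held-out data.

```python
# Hand-made labeled data: the true rule is "label 1 when x > 0.5".
train = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
test = [(0.25, 0), (0.75, 1)]   # held-out set, never used for fitting

def accuracy(threshold, rows):
    """Fraction of rows the rule 'x > threshold' labels correctly."""
    return sum((x > threshold) == bool(y) for x, y in rows) / len(rows)

def fit_threshold(rows):
    """Learn the pattern: pick the candidate threshold that best
    separates the labels, using the training set only."""
    return max((x for x, _ in rows), key=lambda t: accuracy(t, rows))

t = fit_threshold(train)
print(t)                   # 0.3: separates the training labels perfectly
print(accuracy(t, test))   # 1.0: the learned pattern also holds on unseen data
```

If test accuracy had been much lower than training accuracy, that gap would be the overfitting the text warns about, and the pre-processing and mining steps would need to be revisited.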
Notable uses
Games
Since the early 1960s, with the availability of oracles for certain combinatorial games, also called tablebases (e.g. for 3x3-chess with any beginning configuration, small-board dots-and-boxes, small-board hex, and certain endgames in chess, dots-and-boxes, and hex), a new area for data mining has opened: the extraction of human-usable strategies from these oracles. Current pattern recognition approaches do not seem to fully acquire the high level of abstraction required to be applied successfully. Instead, extensive experimentation with the tablebases, combined with an intensive study of tablebase answers to well designed problems and with knowledge of prior art, i.e. pre-tablebase knowledge, is used to yield insightful patterns. Berlekamp in dots-and-boxes etc. and John Nunn in chess endgames are notable examples of researchers doing this work, though they were not and are not involved in tablebase generation.
Business
Data mining in customer relationship management applications can contribute significantly to the bottom line.[citation needed] Rather than randomly contacting a prospect or customer through a call center or sending mail, a company can concentrate its efforts on prospects that are predicted to have a high likelihood of responding to an offer. More sophisticated methods may be used to optimize resources across campaigns so that one may predict to which channel and to which offer an individual is most likely to respond, across all potential offers. Additionally, sophisticated applications could be used to automate the mailing. Once the results from data mining (potential prospect/customer and channel/offer) are determined, this "sophisticated application" can automatically send either an e-mail or regular mail. Finally, in cases where many people will take an action without an offer, uplift modeling can be used to determine which people will have the greatest increase in responding if given an offer. Data clustering can also be used to automatically discover the segments or groups within a customer data set.
Businesses employing data mining may see a return on investment, but they also recognize that the number of predictive models can quickly become very large. Rather than building one model to predict how many customers will churn, a business could build a separate model for each region and customer type. Then, instead of sending an offer to all people that are likely to churn, it may want to send offers only to those customers. Finally, it may want to determine which customers are going to be profitable over a window of time, and send offers only to those that are likely to be profitable. In order to maintain this quantity of models, businesses need to manage model versions and move to automated data mining.
Data mining can also be helpful to human-resources departments in identifying the characteristics of their most successful employees. Information obtained, such as universities attended by highly successful employees, can help HR focus recruiting efforts accordingly. Additionally, Strategic Enterprise Management applications help a company translate corporate-level goals, such as profit and margin share targets, into operational decisions, such as production plans and workforce levels.[18]
Another example of data mining, often called market basket analysis, relates to its use in retail sales. If a clothing store records the purchases of customers, a data-mining system could identify those customers who favor silk shirts over cotton ones. Although some explanations of relationships may be difficult, taking advantage of them is easier. The example deals with association rules within transaction-based data. Not all data are transaction based, and logical or inexact rules may also be present within a database.
Market basket analysis has also been used to identify the purchase patterns of the Alpha consumer. Alpha consumers are people who play a key role in connecting with the concept behind a product, then adopting that product, and finally validating it for the rest of society. Analyzing the data collected on this type of user has allowed companies to predict future buying trends and forecast supply demands.[citation needed]
Data mining is a highly effective tool in the catalog marketing industry.[citation needed] Catalogers have a rich history of customer transactions on millions of customers dating back several years. Data mining tools can identify patterns among customers and help identify the most likely customers to respond to upcoming mailing campaigns.
Data mining for business applications is a component which needs to be integrated into a complex modelling and decision-making process. Reactive Business Intelligence (RBI) advocates a holistic approach that integrates data mining, modeling and interactive visualization into an end-to-end discovery and continuous innovation process powered by human and automated learning.[19] In the area of decision making, the RBI approach has been used to mine the knowledge which is progressively acquired from the decision maker and self-tune the decision method accordingly.[20]
An example of data mining related to an integrated-circuit production line is described in the paper "Mining IC Test Data to Optimize VLSI Testing."[21] In this paper, the application of data mining and decision analysis to the problem of die-level functional test is described. Experiments mentioned in this paper demonstrate the ability of applying a system of mining historical die-test data to create a probabilistic model of patterns of die failure. These patterns are then utilized to decide in real time which die to test next and when to stop testing. This system has been shown, based on experiments with historical test data, to have the potential to improve profits on mature IC products.
Science and engineering
In recent years, data mining has been used widely in the areas of science and engineering, such as bioinformatics, genetics, medicine, education and electrical power engineering.
In the study of human genetics, an important goal is to understand the mapping relationship between the inter-individual variation in human DNA sequences and variability in disease susceptibility. In lay terms, it is to find out how the changes in an individual's DNA sequence affect the risk of developing common diseases such as cancer. This is very important to help improve the diagnosis, prevention and treatment of these diseases. The data mining method that is used to perform this task is known as multifactor dimensionality reduction.[22]
In the area of electrical power engineering, data mining methods have been widely used for condition monitoring of high voltage electrical equipment. The purpose of condition monitoring is to obtain valuable information on the health status of the equipment's insulation. Data clustering techniques such as the self-organizing map (SOM) have been applied to the vibration monitoring and analysis of transformer on-load tap changers (OLTCs). Using vibration monitoring, it can be observed that each tap change operation generates a signal that contains information about the condition of the tap changer contacts and the drive mechanisms. Obviously, different tap positions will generate different signals. However, there was considerable variability amongst normal condition signals for exactly the same tap position. SOM has been applied to detect abnormal conditions and to estimate the nature of the abnormalities.[23]
Data mining methods have also been applied for dissolved gas analysis (DGA) on power transformers. DGA, as a diagnostic for power transformers, has been available for many years. Methods such as SOM have been applied to analyze the data and to determine trends which are not obvious to the standard DGA ratio methods, such as the Duval Triangle.[23]
A fourth area of application for data mining in science/engineering is within educational research, where data mining has been used to study the factors leading students to choose to engage in behaviors which reduce their learning[24] and to understand the factors influencing university student retention.[25] A similar example of the social application of data mining is its use in expertise finding systems, whereby descriptors of human expertise are extracted, normalized and classified so as to facilitate the finding of experts, particularly in scientific and technical fields. In this way, data mining can facilitate institutional memory.
Other examples of applying data mining methods include biomedical data analysis facilitated by domain ontologies,[26] mining clinical trial data,[27] traffic analysis using SOM,[28] et cetera.
In adverse drug reaction surveillance, the Uppsala Monitoring Centre has, since 1998, used data mining methods to routinely screen for reporting patterns indicative of emerging drug safety issues in the WHO global database of 4.6 million suspected adverse drug reaction incidents.[29] Recently, similar methodology has been developed to mine large collections of electronic health records for temporal patterns associating drug prescriptions to medical diagnoses.[30]
Spatial data mining
Spatial data mining is the application of data mining methods to spatial data. Spatial data mining follows the same functions as data mining, with the end objective of finding patterns in geography. So far, data mining and Geographic Information Systems (GIS) have existed as two separate technologies, each with its own methods, traditions and approaches to visualization and data analysis. In particular, most contemporary GIS have only very basic spatial analysis functionality. The immense explosion in geographically referenced data occasioned by developments in IT, digital mapping, remote sensing, and the global diffusion of GIS emphasises the importance of developing data-driven inductive approaches to geographical analysis and modeling.
Data mining, which is the partially automated search for hidden patterns in large databases, offers great potential benefits for applied GIS-based decision-making. Recently, the task of integrating these two technologies has become critical, especially as various public and private sector organizations possessing huge databases with thematic and geographically referenced data begin to realise the huge potential of the information hidden there. Among those organizations are:
offices requiring analysis or dissemination of geo-referenced statistical data
public health services searching for explanations of disease clusters
environmental agencies assessing the impact of changing land-use patterns on climate change
geo-marketing companies doing customer segmentation based on spatial location.
Challenges
Geospatial data repositories tend to be very large. Moreover, existing GIS datasets are often splintered into feature and attribute components that are conventionally archived in hybrid data management systems. Algorithmic requirements differ substantially for relational (attribute) data management and for topological (feature) data management.[31] Related to this is the range and diversity of geographic data formats, which also presents unique challenges. The digital geographic data revolution is creating new types of data formats beyond the traditional "vector" and "raster" formats. Geographic data repositories increasingly include ill-structured data such as imagery and geo-referenced multimedia.[32]
There are several critical research challenges in geographic knowledge discovery and data mining. Miller and Han[33] offer the following list of emerging research topics in the field:
Developing and supporting geographic data warehouses – Spatial properties are often reduced to simple aspatial attributes in mainstream data warehouses. Creating an integrated GDW requires solving issues in spatial and temporal data interoperability, including differences in semantics, referencing systems, geometry, accuracy and position.
Better spatio-temporal representations in geographic knowledge discovery – Current geographic knowledge discovery (GKD) methods generally use very simple representations of geographic objects and spatial relationships. Geographic data mining methods should recognize more complex geographic objects (lines and polygons) and relationships (non-Euclidean distances, direction, connectivity and interaction through attributed geographic space such as terrain). Time needs to be more fully integrated into these geographic representations and relationships.
Geographic knowledge discovery using diverse data types – GKD methods should be developed that can handle diverse data types beyond the traditional raster and vector models, including imagery and geo-referenced multimedia, as well as dynamic data types (video streams, animation).
In four annual surveys of data miners (2007-2010),[34][35][36][37] data mining practitioners consistently identified three key challenges they faced more than any others:
Dirty Data
Explaining Data Mining to Others
Unavailability of Data / Difficult Access to Data
In the 2010 survey, data miners also shared their experiences in overcoming these challenges.[38]
Surveillance
Prior U.S. government data mining programs aimed at stopping terrorism include the Total Information Awareness (TIA) program, Secure Flight (formerly known as the Computer-Assisted Passenger Prescreening System (CAPPS II)), Analysis, Dissemination, Visualization, Insight, Semantic Enhancement (ADVISE),[39] and the Multi-state Anti-Terrorism Information Exchange (MATRIX).[40] These programs have been discontinued due to controversy over whether they violate the Fourth Amendment to the US Constitution, although many programs that were formed under them continue to be funded by different organizations, or under different names.[41]
Two plausible data mining methods in the context of combating terrorism include"pattern mining" and "subject-based data mining".
Pattern mining
"Pattern mining" is a data mining method that involves finding existing patterns in data. In this context, patterns often mean association rules. The original motivation for searching for association rules came from the desire to analyze supermarket transaction data, that is, to examine customer behavior in terms of the purchased products. For example, an association rule "beer ⇒ potato chips (80%)" states that four out of five customers that bought beer also bought potato chips.
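The 80% figure is the rule's confidence, which can be computed directly from raw transactions. The toy basket data below is invented so that four of the five beer buyers also bought chips, matching the example rule:

```python
# Each transaction is the set of items in one customer's basket (invented data).
transactions = [
    {"beer", "chips", "bread"},
    {"beer", "chips"},
    {"beer", "chips", "milk"},
    {"beer", "chips"},
    {"beer", "diapers"},          # the one beer buyer without chips
    {"milk", "bread"},
]

def confidence(antecedent, consequent):
    """Estimate P(consequent | antecedent) over the transaction set."""
    has_a = [t for t in transactions if antecedent <= t]
    has_both = [t for t in has_a if consequent <= t]
    return len(has_both) / len(has_a)

print(confidence({"beer"}, {"chips"}))  # 0.8, i.e. the "(80%)" in the rule
```

Real association rule miners (e.g. Apriori-style algorithms) search the space of such rules efficiently rather than enumerating them one at a time, but the confidence arithmetic is exactly this.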
In the context of pattern mining as a tool to identify terrorist activity, the National Research Council provides the following definition: "Pattern-based data mining looks for patterns (including anomalous data patterns) that might be associated with terrorist activity — these patterns might be regarded as small signals in a large ocean of noise."[42][43][44] Pattern mining includes new areas such as Music Information Retrieval (MIR), where patterns seen in both the temporal and non-temporal domains are imported to classical knowledge discovery search methods.
Subject-based data mining
"Subject-based data mining" is a data mining method involving the search for associations between individuals in data. In the context of combating terrorism, the National Research Council provides the following definition: "Subject-based data mining uses an initiating individual or other datum that is considered, based on other information, to be of high interest, and the goal is to determine what other persons or financial transactions or movements, etc., are related to that initiating datum."[43]
Privacy concerns and ethics
Some people believe that data mining itself is ethically neutral.[45] It is important to note that the term data mining has no ethical implications. The term is often associated with the mining of information in relation to people's behavior. However, data mining is a statistical method that is applied to a set of information, or a data set. Associating these data sets with people is an extreme narrowing of the types of data that are available in today's technological society. Examples could range from a set of crash test data for passenger vehicles to the performance of a group of stocks. These types of data sets make up a great proportion of the information available to be acted on by data mining methods, and rarely have ethical concerns associated with them. However, the ways in which data mining can be used can raise questions regarding privacy, legality, and ethics.[46] In particular, data mining government or commercial data sets for national security or law enforcement purposes, such as in the Total Information Awareness Program or in ADVISE, has raised privacy concerns.[47][48]
Data mining requires data preparation which can uncover information or patterns which may compromise confidentiality and privacy obligations. A common way for this to occur is through data aggregation. Data aggregation is when the data are accrued, possibly from various sources, and put together so that they can be analyzed.[49] This is not data mining per se, but a result of the preparation of data before and for the purposes of the analysis. The threat to an individual's privacy comes into play when the data, once compiled, cause the data miner, or anyone who has access to the newly compiled data set, to be able to identify specific individuals, especially when originally the data were anonymous.
It is recommended that an individual be made aware of the following before data are collected:

the purpose of the data collection and any data mining projects,
how the data will be used,
who will be able to mine the data and use them,
the security surrounding access to the data, and
how collected data can be updated.[49]
In the United States, privacy concerns have been somewhat addressed by Congress via the passage of regulatory controls such as the Health Insurance Portability and Accountability Act (HIPAA). HIPAA requires individuals to be given "informed consent" regarding any information that they provide and its intended future uses by the facility receiving that information. According to an article in Biotech Business Week, "In practice, HIPAA may not offer any greater protection than the longstanding regulations in the research arena, says the AAHC. More importantly, the rule's goal of protection through informed consent is undermined by the complexity of consent forms that are required of patients and participants, which approach a level of incomprehensibility to average individuals."[50] This underscores the necessity for data anonymity in data aggregation practices.
One may additionally modify the data so that they are anonymous, so that individuals may not be readily identified.[49] However, even de-identified data sets can contain enough information to identify individuals, as occurred when journalists were able to find several individuals based on a set of search histories that were inadvertently released by AOL.[51]
Data Mining: What is Data Mining?
Overview
Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information - information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.
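As a minimal sketch of "finding correlations among fields", the snippet below computes the Pearson correlation between two columns of a small relational-style table. The field names and numbers are invented for illustration; real tools compute the same statistic across many field pairs at once.

```python
import math

# Two hypothetical fields from a relational table: (advertising_spend, sales).
rows = [
    (10, 110), (20, 125), (30, 142), (40, 160), (50, 171),
]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs, ys = zip(*rows)
print(round(pearson(xs, ys), 3))  # close to 1: the two fields move together
```

A coefficient near 1 or -1 flags a field pair worth investigating; a coefficient near 0 suggests no linear relationship.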
Continuous Innovation
Although data mining is a relatively new term, the technology is not. Companies have used powerful computers to sift through volumes of supermarket scanner data and analyze market research reports for years. However, continuous innovations in computer processing power, disk storage, and statistical software are dramatically increasing the accuracy of analysis while driving down the cost.
Example
For example, one Midwest grocery chain used the data mining capacity of Oracle software to analyze local buying patterns. They discovered that when men bought diapers on Thursdays and Saturdays, they also tended to buy beer. Further analysis showed that these shoppers typically did their weekly grocery shopping on Saturdays. On Thursdays, however, they only bought a few items. The retailer concluded that they purchased the beer to have it available for the upcoming weekend. The grocery chain could use this newly discovered information in various ways to increase revenue. For example, they could move the beer display closer to the diaper display. And, they could make sure beer and diapers were sold at full price on Thursdays.
Data, Information, and Knowledge
Data
Data are any facts, numbers, or text that can be processed by a computer. Today, organizations are accumulating vast and growing amounts of data in different formats and different databases. This includes:

operational or transactional data, such as sales, cost, inventory, payroll, and accounting
nonoperational data, such as industry sales, forecast data, and macroeconomic data
metadata - data about the data itself, such as logical database design or data dictionary definitions
Information
The patterns, associations, or relationships among all this data can provide information. For example, analysis of retail point-of-sale transaction data can yield information on which products are selling and when.
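A minimal sketch of turning transaction data into this kind of information: point-of-sale records (invented here) aggregated into units sold per product per weekday.

```python
from collections import defaultdict

# Hypothetical point-of-sale records: (product, weekday, quantity).
sales = [
    ("diapers", "Thu", 2), ("beer", "Thu", 3), ("diapers", "Sat", 1),
    ("beer", "Sat", 2), ("beer", "Thu", 1),
]

# Aggregate raw transactions into "what sells, and when".
totals = defaultdict(int)
for product, day, qty in sales:
    totals[(product, day)] += qty

print(totals[("beer", "Thu")])     # 4 units of beer sold on Thursdays
print(totals[("diapers", "Sat")])  # 1 unit of diapers sold on Saturdays
```

The aggregated totals are the "information" layer; interpreting them against context (promotions, seasons) is what the next section calls knowledge.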
Knowledge
Information can be converted into knowledge about historical patterns and future trends. For example, summary information on retail supermarket sales can be analyzed in light of promotional efforts to provide knowledge of consumer buying behavior. Thus, a manufacturer or retailer could determine which items are most susceptible to promotional efforts.
Data Warehouses
Dramatic advances in data capture, processing power, data transmission, and storage capabilities are enabling organizations to integrate their various databases into data warehouses. Data warehousing is defined as a process of centralized data management and retrieval. Data warehousing, like data mining, is a relatively new term, although the concept itself has been around for years. Data warehousing represents an ideal vision of maintaining a central repository of all organizational data. Centralization of data is needed to maximize user access and analysis. Dramatic technological advances are making this vision a reality for many companies. And, equally dramatic advances in data analysis software are allowing users to access this data freely. The data analysis software is what supports data mining.
What can data mining do?
Data mining is primarily used today by companies with a strong consumer focus - retail, financial, communication, and marketing organizations. It enables these companies to determine relationships among "internal" factors such as price, product positioning, or staff skills, and "external" factors such as economic indicators, competition, and customer demographics. And, it enables them to determine the impact on sales, customer satisfaction, and corporate profits. Finally, it enables them to "drill down" into summary information to view detail transactional data.
With data mining, a retailer could use point-of-sale records of customer purchases to send targeted promotions based on an individual's purchase history. By mining demographic data from comment or warranty cards, the retailer could develop products and promotions to appeal to specific customer segments.
For example, Blockbuster Entertainment mines its video rental history database to recommend rentals to individual customers. American Express can suggest products to its cardholders based on analysis of their monthly expenditures.
WalMart is pioneering massive data mining to transform its supplier relationships. WalMart captures point-of-sale transactions from over 2,900 stores in 6 countries and continuously transmits this data to its massive 7.5 terabyte Teradata data warehouse. WalMart allows more than 3,500 suppliers to access data on their products and perform data analyses. These suppliers use this data to identify customer buying patterns at the store display level. They use this information to manage local store inventory and identify new merchandising opportunities. In 1995, WalMart computers processed over 1 million complex data queries.
The National Basketball Association (NBA) is exploring a data mining application that can be used in conjunction with image recordings of basketball games. The Advanced Scout software analyzes the movements of players to help coaches orchestrate plays and strategies. For example, an analysis of the play-by-play sheet of the game played between the New York Knicks and the Cleveland Cavaliers on January 6, 1995 reveals that when Mark Price played the Guard position, John Williams attempted four jump shots and made each one! Advanced Scout not only finds this pattern, but explains that it is interesting because it differs considerably from the average shooting percentage of 49.30% for the Cavaliers during that game.
By using the NBA universal clock, a coach can automatically bring up the video clips showing each of the jump shots attempted by Williams with Price on the floor, without needing to comb through hours of video footage. Those clips show a very successful pick-and-roll play in which Price draws the Knicks' defense and then finds Williams for an open jump shot.
How does data mining work?
While large-scale information technology has been evolving separate transaction and analytical systems, data mining provides the link between the two. Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries. Several types of analytical software are available: statistical, machine learning, and neural networks. Generally, any of four types of relationships are sought:
Classes: Stored data is used to locate data in predetermined groups. For example, a restaurant chain could mine customer purchase data to determine when customers visit and what they typically order. This information could be used to increase traffic by having daily specials.
Clusters: Data items are grouped according to logical relationships or consumer preferences. For example, data can be mined to identify market segments or consumer affinities.
Associations: Data can be mined to identify associations. The well-known beer-and-diapers correlation is a classic example of association mining.
Sequential patterns: Data is mined to anticipate behavior patterns and trends. For example, an outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes.
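The association relationship above can be illustrated with a minimal co-occurrence miner. This is a hedged sketch, not any vendor's actual algorithm: the baskets, the support threshold, and the `mine_pairs` helper are all invented for illustration. It counts how often item pairs appear together, then reports each qualifying pair's support (fraction of transactions containing both items) and the confidence of the rule first-item → second-item.

```python
from itertools import combinations
from collections import Counter

def mine_pairs(transactions, min_support=0.4):
    """Return {(a, b): (support, confidence of a -> b)} for pairs
    whose support meets the minimum threshold."""
    n = len(transactions)
    pair_counts = Counter()
    item_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))        # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            rules[(a, b)] = (support, count / item_counts[a])
    return rules

# Hypothetical point-of-sale baskets.
baskets = [
    ["beer", "diapers", "chips"],
    ["beer", "diapers"],
    ["diapers", "milk"],
    ["beer", "chips"],
    ["beer", "diapers", "milk"],
]
print(mine_pairs(baskets))
```

On these toy baskets the beer-diapers pair surfaces with the highest support (0.6), mirroring the classic example; real association miners such as Apriori extend the same counting idea to larger itemsets.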
Data mining consists of five major elements:
Extract, transform, and load transaction data onto the data warehouse system.
Store and manage the data in a multidimensional database system.
Provide data access to business analysts and information technology professionals.
Analyze the data with application software.
Present the data in a useful format, such as a graph or table.
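The five elements above can be sketched end to end with Python's built-in sqlite3 standing in for the warehouse. Everything here is hypothetical (the table layout, column names, and sample records are invented for illustration); a real deployment would use a dedicated multidimensional database rather than an in-memory SQLite table.

```python
import sqlite3

# 1. Extract: hypothetical raw point-of-sale records.
raw_sales = [
    ("2024-01-05", "store_1", "widget", 3, 9.99),
    ("2024-01-05", "store_2", "gadget", 1, 24.50),
    ("2024-01-06", "store_1", "widget", 2, 9.99),
]

# 2-3. Transform/load into the "warehouse" and provide access via SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (day TEXT, store TEXT, product TEXT, qty INTEGER, price REAL)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", raw_sales)

# 4. Analyze: aggregate revenue by product.
rows = conn.execute(
    "SELECT product, SUM(qty * price) FROM sales GROUP BY product ORDER BY product"
).fetchall()

# 5. Present: a simple two-column table.
for product, revenue in rows:
    print(f"{product:10s} {revenue:8.2f}")
```

The same extract-store-access-analyze-present flow scales up to the terabyte-class systems the paper describes; only the storage and query engines change.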
Different levels of analysis are available:
Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.
Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution.
Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID). Both provide a set of rules that you can apply to a new (unclassified) dataset to predict which records will have a given outcome. CART segments a dataset by creating 2-way splits, while CHAID segments using chi square tests to create multi-way splits. CART typically requires less data preparation than CHAID.
Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k records most similar to it in a historical dataset (where k ≥ 1). Sometimes called the k-nearest neighbor technique.
Rule induction: The extraction of useful if-then rules from data based on
statistical significance.
Data visualization: The visual interpretation of complex relationships in multidimensional data. Graphics tools are used to illustrate data relationships.
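Of the analysis levels listed above, the nearest neighbor method is simple enough to sketch in a few lines. This is an illustrative sketch only: the history records, the feature meanings, and the `knn_classify` helper are all invented, and a production tool would add feature scaling and distance weighting. Each new record is assigned the majority class among its k closest records (by Euclidean distance) in a historical dataset.

```python
import math
from collections import Counter

def knn_classify(history, new_record, k=3):
    """Classify a record by majority vote among the k most similar
    records (Euclidean distance) in a historical dataset."""
    neighbors = sorted(
        history,
        key=lambda rec: math.dist(rec[0], new_record),
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical history: (features, class) pairs, e.g.
# (monthly spend, visits per month) -> responded to the mailing?
history = [
    ((120.0, 8), "responder"),
    ((115.0, 7), "responder"),
    ((30.0, 1), "non-responder"),
    ((25.0, 2), "non-responder"),
    ((110.0, 9), "responder"),
]
print(knn_classify(history, (118.0, 8), k=3))
```

A customer whose behavior resembles past responders is classified as a likely responder, which is exactly the kind of question ("who will respond to my next mailing?") posed earlier in the paper.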
What technological infrastructure is required?
Today, data mining applications are available on systems of all sizes, from mainframe and client/server to PC platforms. System prices range from several thousand dollars for the smallest applications up to $1 million per terabyte for the largest. Enterprise-wide applications generally range in size from 10 gigabytes to over 11 terabytes. NCR has the capacity to deliver applications exceeding 100 terabytes. There are two critical technological drivers:
Size of the database: the more data being processed and maintained, the more powerful the system required.
Query complexity: the more complex the queries and the greater the number of queries being processed, the more powerful the system required.
Relational database storage and management technology is adequate for many data mining applications of less than 50 gigabytes. However, this infrastructure needs to be significantly enhanced to support larger applications. Some vendors have added extensive indexing capabilities to improve query performance. Others use new hardware architectures such as Massively Parallel Processors (MPP) to achieve order-of-magnitude improvements in query time. For example, MPP systems from NCR link hundreds of high-speed Pentium processors to achieve performance levels exceeding those of the largest supercomputers.
Memory
From Wikipedia, the free encyclopedia
For other uses, see Memory (disambiguation).
Overview of the forms and functions of memory in the sciences
In psychology, memory is an organism's ability to store, retain, and recall information and experiences. Traditional studies of memory began in the fields of philosophy, including techniques for artificially enhancing memory. During the late nineteenth and early twentieth centuries, scientists placed memory within the paradigm of cognitive psychology. In recent decades, it has become one of the principal pillars of a branch of science called cognitive neuroscience, an interdisciplinary link between cognitive psychology and neuroscience.
Contents

1 Processes
   1.1 Sensory memory
   1.2 Short-term
   1.3 Long-term
2 Models
   2.1 Atkinson-Shiffrin model
   2.2 Working memory
   2.3 Levels of processing
3 Classification by information type
4 Classification by temporal direction
5 Physiology
6 Genetics
7 Disorders
8 Methods
9 Memory and aging
10 Improving memory
11 Memory tasks
12 See also
13 Footnotes
14 References
15 External links
Processes
From an information processing perspective there are three main stages in the formation and
retrieval of memory:
Encoding or registration (receiving, processing and combining of received information)
Storage (creation of a permanent record of the encoded information)
Retrieval, recall or recollection (calling back the stored information in response to some cue for use in a process or activity)
Sensory memory
Main article: Sensory memory
Sensory memory corresponds approximately to the initial 200–500 milliseconds after an item is perceived. The ability to look at an item, and remember what it looked like with just a second of observation, or memorisation, is an example of sensory memory. With very short presentations, participants often report that they seem to "see" more than they can actually report. The first experiments exploring this form of sensory memory were conducted by George Sperling (1960) using the "partial report paradigm". Subjects were presented with a grid of 12 letters, arranged into three rows of four. After a brief presentation, subjects were then played either a high, medium or low tone, cuing them which of the rows to report. Based on these partial report experiments, Sperling was able to show that the capacity of sensory memory was approximately 12 items, but that it degraded very quickly (within a few hundred milliseconds). Because this form of memory degrades so quickly, participants would see the display, but be unable to report all of the items (12 in the "whole report" procedure) before they decayed. This type of memory cannot be prolonged via rehearsal.
Short-term
Main article: Short-term memory
Short-term memory allows recall for a period of several seconds to a minute without rehearsal. Its capacity is also very limited: George A. Miller (1956), when working at Bell Laboratories, conducted experiments showing that the store of short-term memory was 7±2 items (the title of his famous paper, "The Magical Number Seven, Plus or Minus Two"). Modern estimates of the capacity of short-term memory are lower, typically on the order of 4–5 items;[1] however, memory capacity can be increased through a process called chunking.[2] For example, in recalling a ten-digit telephone number, a person could chunk the digits into three groups: first, the area code (such as 215), then a three-digit chunk (123) and lastly a four-digit chunk (4567). This method of remembering telephone numbers is far more effective than attempting to remember a string of 10 digits; this is because we are able to chunk the information into meaningful groups of numbers. Herbert Simon showed that the ideal size for chunking letters and numbers, meaningful or not, was three.[citation needed] This may be reflected in some countries in the tendency to remember telephone numbers as several chunks of three numbers, with the final four-number group generally broken down into two groups of two.
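The 3-3-4 telephone-number chunking described above can be expressed directly in code. This is a toy illustration only; `chunk_phone_number` is a made-up name, and the function simply splits a digit string into the stated group sizes.

```python
def chunk_phone_number(digits, sizes=(3, 3, 4)):
    """Split a digit string into chunks of the given sizes,
    e.g. area code, three-digit group, four-digit group."""
    chunks, start = [], 0
    for size in sizes:
        chunks.append(digits[start:start + size])
        start += size
    return chunks

print(chunk_phone_number("2151234567"))  # ['215', '123', '4567']
```

Three short, meaningful groups are easier to hold in short-term memory than one unbroken ten-digit string, which is the point of the chunking studies cited above.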
Short-term memory is believed to rely mostly on an acoustic code for storing information, and to a lesser extent a visual code. Conrad (1964)[3] found that test subjects had more difficulty recalling collections of letters that were acoustically similar (e.g. E, P, D). Confusion with recalling acoustically similar letters rather than visually similar letters implies that the letters were encoded acoustically. Conrad's (1964) study, however, deals with the encoding of written text; thus, while memory of written language may rely on acoustic components, generalisations to all forms of memory cannot be made.
However, some individuals have been reported to be able to remember large amounts of information quickly, and to be able to recall that information in seconds.[citation needed]
Long-term
Olin Levi Warner, Memory (1896). Library of Congress Thomas Jefferson Building, Washington, D.C.
Main article: Long-term memory
The stores in sensory memory and short-term memory generally have a strictly limited capacity and duration, which means that information is available only for a certain period of time and is not retained indefinitely. By contrast, long-term memory can store much larger quantities of information for potentially unlimited duration (sometimes a whole life span). Its capacity is immeasurably large. For example, given a random seven-digit number we may remember it for only a few seconds before forgetting, suggesting it was stored in our short-term memory. On the other hand, we can remember telephone numbers for many years through repetition; this information is said to be stored in long-term memory.
While short-term memory encodes information acoustically, long-term memory encodes it semantically: Baddeley (1966)[4] discovered that after 20 minutes, test subjects had the most difficulty recalling a collection of words that had similar meanings (e.g. big, large, great, huge).
Short-term memory is supported by transient patterns of neuronal communication, dependent on regions of the frontal lobe (especially dorsolateral prefrontal cortex) and the parietal lobe. Long-term memories, on the other hand, are maintained by more stable and permanent changes in neural connections widely spread throughout the brain. The hippocampus is essential to the consolidation of information from short-term to long-term memory when learning new information, although it does not seem to store information itself. Without the hippocampus, new memories are unable to be stored into long-term memory, and there will be a very short attention span. Furthermore, the hippocampus may be involved in changing neural connections for a period of three
months or more after the initial learning. One of the primary functions of sleep is thought to be improving the consolidation of information, as several studies have demonstrated that memory depends on getting sufficient sleep between training and test. Additionally, data obtained from neuroimaging studies have shown activation patterns in the sleeping brain which mirror those recorded during the learning of tasks from the previous day, suggesting that new memories may be solidified through such rehearsal.
Models
Models of memory provide abstract representations of how memory is believed to work. Below are several models proposed over the years by various psychologists. Note that there is some controversy as to whether there are several memory structures; for example, Tarnow (2005) finds that it is likely that there is only one memory structure between 6 and 600 seconds.
Atkinson-Shiffrin model
See also: Memory consolidation
The multi-store model (also known as the Atkinson-Shiffrin memory model) was first proposed in 1968 by Atkinson and Shiffrin.
The multi-store model has been criticised for being too simplistic. For instance, long-term memory is believed to be actually made up of multiple subcomponents, such as episodic and procedural memory. It also proposes that rehearsal is the only mechanism by which information eventually reaches long-term storage, but evidence shows us capable of remembering things without rehearsal.
The model also shows all the memory stores as being a single unit, whereas research into this shows differently. For example, short-term memory can be broken up into different units such as visual information and acoustic information. Patient KF demonstrates this. Patient KF was brain damaged and had problems with his short-term memory. He had problems with things such as spoken numbers, letters and words and with significant sounds (such as doorbells and cats meowing). Other parts of short-term memory were unaffected, such as visual (pictures).[5]
It also shows the sensory store as a single unit, whilst we know that the sensory store is split up into several different parts such as taste, vision, and hearing.
Working memory
The working memory model.
Main article: working memory
In 1974 Baddeley and Hitch proposed a working memory model which replaced the concept of general short-term memory with specific, active components. In this model, working memory consists of three basic stores: the central executive, the phonological loop and the visuo-spatial sketchpad. In 2000 this model was expanded with the multimodal episodic buffer.[6]
The central executive essentially acts as attention. It channels information to the three component processes: the phonological loop, the visuo-spatial sketchpad, and the episodic buffer.
The phonological loop stores auditory information by silently rehearsing sounds or words in a continuous loop: the articulatory process (for example, the repetition of a telephone number over and over again). A short list of data is then easier to remember.
The visuospatial sketchpad stores visual and spatial information. It is engaged when performing spatial tasks (such as judging distances) or visual ones (such as counting the windows on a house or imagining images).
The episodic buffer is dedicated to linking information across domains to form integrated units of visual, spatial, and verbal information with chronological ordering (e.g., the memory of a story or a movie scene). The episodic buffer is also assumed to have links to long-term memory and semantic meaning.
The working memory model explains many practical observations, such as why it is easier to do two different tasks (one verbal and one visual) than two similar tasks (e.g., two visual), and the aforementioned word-length effect. However, the concept of a central executive as noted here has been criticised as inadequate and vague.[citation needed]
Levels of processing
Main article: Levels-of-processing effect
Craik and Lockhart (1972) proposed that it is the method and depth of processing that affects how an experience is stored in memory, rather than rehearsal.
Organization - Mandler (1967) gave participants a pack of word cards and asked them to sort them into any number of piles using any system of categorisation they liked. When they were later asked to recall as many of the words as they could, those who used more categories remembered more words. This study suggested that the act of organising information makes it more memorable.
Distinctiveness - Eysenck and Eysenck (1980) asked participants to say words in a distinctive way, e.g. spell the words out loud. Such participants recalled the words better than those who simply read them off a list.
Effort - Tyler et al. (1979) had participants solve a series of anagrams, some easy (FAHTER) and some difficult (HREFAT). The participants recalled the difficult anagrams better, presumably because they put more effort into them.
Elaboration - Palmere et al. (1983) gave participants descriptive paragraphs of a fictitious African nation. There were some short paragraphs and some with extra sentences elaborating the main idea. Recall was higher for the ideas in the elaborated paragraphs.
Classification by information type
Anderson (1976)[7] divides long-term memory into declarative (explicit) and procedural
(implicit) memories.
Declarative memory requires conscious recall, in that some conscious process must call back theinformation. It is sometimes called explicit memory, since it consists of information that isexplicitly stored and retrieved.
Declarative memory can be further sub-divided into semantic memory, which concerns facts taken independent of context, and episodic memory, which concerns information specific to a particular context, such as a time and place. Semantic memory allows the encoding of abstract knowledge about the world, such as "Paris is the capital of France". Episodic memory, on the other hand, is used for more personal memories, such as the sensations, emotions, and personal associations of a particular place or time. Autobiographical memory - memory for particular events within one's own life - is generally viewed as either equivalent to, or a subset of, episodic memory. Visual memory is part of memory preserving some characteristics of our senses pertaining to visual experience. One is able to place in memory information that resembles objects, places, animals or people as a sort of mental image. Visual memory can result in priming, and it is assumed some kind of perceptual representational system underlies this phenomenon.[2]
In contrast, procedural memory (or implicit memory) is not based on the conscious recall of information, but on implicit learning. Procedural memory is primarily employed in learning motor skills and should be considered a subset of implicit memory. It is revealed when one does better in a given task due only to repetition - no new explicit memories have been formed, but one is unconsciously accessing aspects of those previous experiences. Procedural memory involved in motor learning depends on the cerebellum and basal ganglia.
Topographic memory is the ability to orient oneself in space, to recognize and follow an itinerary, or to recognize familiar places.[8] Getting lost when traveling alone is an example of the failure of topographic memory. This is often reported among elderly patients who are evaluated for dementia. The disorder could be caused by multiple impairments, including difficulties with perception, orientation, and memory.[9]
Classification by temporal direction
A further major way to distinguish different memory functions is whether the content to be remembered is in the past, retrospective memory, or whether the content is to be remembered in the future, prospective memory. Thus, retrospective memory as a category includes semantic, episodic and autobiographical memory. In contrast, prospective memory is memory for future intentions, or remembering to remember (Winograd, 1988). Prospective memory can be further broken down into event- and time-based prospective remembering. Time-based prospective memories are triggered by a time-cue, such as going to the doctor (action) at 4pm (cue). Event-based prospective memories are intentions triggered by cues, such as remembering to post a letter (action) after seeing a mailbox (cue). Cues do not need to be related to the action (as the mailbox example is), and lists, sticky-notes, knotted handkerchiefs, or string around the finger are all examples of cues that people produce as a strategy to enhance prospective memory.
Physiology
Brain areas involved in the neuroanatomy of memory, such as the hippocampus, the amygdala, the striatum, or the mammillary bodies, are thought to be involved in specific types of memory. For example, the hippocampus is believed to be involved in spatial learning and declarative learning, while the amygdala is thought to be involved in emotional memory. Damage to certain areas in patients and animal models, and subsequent memory deficits, is a primary source of information. However, rather than implicating a specific area, it could be that damage to adjacent areas, or to a pathway traveling through the area, is actually responsible for the observed deficit. Further, it is not sufficient to describe memory, and its counterpart, learning, as solely dependent on specific brain regions. Learning and memory are attributed to changes in neuronal synapses, thought to be mediated by long-term potentiation and long-term depression.
Hebb distinguished between short-term and long-term memory. He postulated that any memory that stayed in short-term storage for a long enough time would be consolidated into a long-term memory. Later research showed this to be false. Research has shown that direct injections of cortisol or epinephrine help the storage of recent experiences. This is also true for stimulation of the amygdala. This suggests that excitement enhances memory through the stimulation of hormones that affect the amygdala. Excessive or prolonged stress (with prolonged cortisol) may hurt memory storage. Patients with amygdalar damage are no more likely to remember emotionally charged words than nonemotionally charged ones. The hippocampus is important for explicit memory. The hippocampus is also important for memory consolidation. The hippocampus receives input from different parts of the cortex and sends its output out to different parts of the brain as well. The input comes from secondary and tertiary sensory areas that have processed the information a lot
already. Hippocampal damage may also cause memory loss and problems with memory storage.[10]
Genetics
Study of the genetics of human memory is in its infancy. A notable initial success was the association of APOE with memory dysfunction in Alzheimer's Disease. The search for genes associated with normally-varying memory continues. One of the first candidates for normal variation in memory is the gene KIBRA,[11] which appears to be associated with the rate at which material is forgotten over a delay period.
Disorders
Much of the current knowledge of memory has come from studying memory disorders. Loss of memory is known as amnesia. There are many sorts of amnesia, and by studying their different forms, it has become possible to observe apparent defects in individual sub-systems of the brain's memory systems, and thus hypothesize their function in the normally working brain. Other neurological disorders such as Alzheimer's disease and Parkinson's disease[12] can also affect memory and cognition. Hyperthymesia, or hyperthymesic syndrome, is a disorder which affects an individual's autobiographical memory, essentially meaning that they cannot forget small details that otherwise would not be stored.[13] Korsakoff's syndrome, also known as Korsakoff's psychosis or amnesic-confabulatory syndrome, is an organic brain disease that adversely affects memory.
While not a disorder, a common temporary failure of word retrieval from memory is the tip-of-the-tongue phenomenon. Sufferers of Nominal Aphasia (also called Anomia), however, do experience the tip-of-the-tongue phenomenon on an ongoing basis, due to damage to the frontal and parietal lobes of the brain.
Methods
Methods to optimize memorization
Memorization is a method of learning that allows an individual to recall information verbatim. Rote learning is the method most often used. Methods of memorizing things have been the subject of much discussion over the years, with some writers, such as Cosmos Rossellius, using visual alphabets. The spacing effect shows that an individual is more likely to remember a list of items when rehearsal is spaced over an extended period of time. In contrast to this is cramming, which is intensive memorisation in a short period of time. Also relevant is the Zeigarnik effect, which states that people remember uncompleted or interrupted tasks better than completed ones. The so-called Method of loci uses spatial memory to memorize non-spatial information.
Interference from previous knowledge
At the Center for Cognitive Science at Ohio State University, researchers have found that the memory accuracy of adults is hurt by the fact that they know more than children and tend to apply this knowledge when learning new information. The findings appeared in the August 2004 edition of the journal Psychological Science.
Interference can hamper memorisation and retrieval. There is retroactive interference, when learning new information causes forgetting of old information, and proactive interference, where learning one piece of information makes it harder to learn similar new information.[14]
Influence of odors and emotions
In March 2007 German researchers found they could use odors to re-activate new memories in the brains of people while they slept, and the volunteers remembered better later.[15] Emotion can have a powerful impact on memory. Numerous studies have shown that the most vivid autobiographical memories tend to be of emotional events, which are likely to be recalled more often and with more clarity and detail than neutral events.[16]
Memory and aging
Main article: Memory and aging
One of the key concerns of older adults is the experience of memory loss, especially as it is one of the hallmark symptoms of Alzheimer's disease. However, memory loss in normal aging is qualitatively different from the kind of memory loss associated with a diagnosis of Alzheimer's (Budson & Price, 2005).
Improving memory
Main article: Improving memory
A UCLA research study published in the June 2006 issue of the American Journal of Geriatric Psychiatry found that people can improve cognitive function and brain efficiency through simple lifestyle changes such as incorporating memory exercises, healthy eating, physical fitness and stress reduction into their daily lives. This study examined 17 subjects (average age 53) with normal memory performance. Eight subjects were asked to follow a "brain healthy" diet, relaxation, physical, and mental exercise (brain teasers and verbal memory training techniques). After 14 days, they showed greater word fluency (not memory) compared to their baseline performance. No long-term follow-up was conducted; it is therefore unclear whether this intervention has lasting effects on memory.[17]
There is a loosely associated group of mnemonic principles and techniques, known as the Art of memory, that can be used to vastly improve memory.
The International Longevity Center released in 2001 a report[18] which includes, on pages 14-16, recommendations for keeping the mind in good functionality until advanced age. Some of the
recommendations are to stay intellectually active through learning, training or reading, to keep physically active so as to promote blood circulation to the brain, to socialize, to reduce stress, to keep sleep time regular, to avoid depression or emotional instability, and to observe good nutrition.
Memory tasks
Paired associate learning - when one learns to associate one specific word with another. For example, when given a word such as "safe" one must learn to say another specific word, such as "green". This is stimulus and response.[19]
Free recall - during this task a subject is asked to study a list of words and then, sometime later, to recall or write down as many words as they can remember.[20]
Recognition - subjects are asked to remember a list of words or pictures, after which point they are asked to identify the previously presented words or pictures from among a list of alternatives that were not presented in the original list.[21]
COBOL
From Wikipedia, the free encyclopedia
For other uses, see COBOL (disambiguation).
COBOL

Paradigm: procedural, object-oriented
Appeared in: 1959
Designed by: Grace Hopper, William Selden, Gertrude Tierney, Howard Bromberg, Howard Discount, Vernon Reeves, Jean E. Sammet
Stable release: COBOL 2002 (2002)
Typing discipline: strong, static
Major implementations: OpenCOBOL, Micro Focus International (e.g. the Eclipse plug-in Micro Focus Net Express)
Dialects: HP3000 COBOL/II, COBOL/2, IBM OS/VS COBOL, IBM COBOL/II, IBM COBOL SAA, IBM Enterprise COBOL, IBM COBOL/400, IBM ILE COBOL, Unix COBOL X/Open, Micro Focus COBOL, Microsoft COBOL, Ryan McFarland RM/COBOL, Ryan McFarland RM/COBOL-85, DOSVS COBOL, UNIVAC COBOL, Realia COBOL, Fujitsu COBOL, ICL COBOL, ACUCOBOL-GT, COBOL-IT, DEC COBOL-10, DEC VAX COBOL, Wang VS COBOL, Visual COBOL, Tandem (NonStop) COBOL85, Tandem (NonStop) SCOBOL (a COBOL74 variant for creating screens on text-based terminals)
Influenced by: FLOW-MATIC, COMTRAN, FACT
Influenced: PL/I, CobolScript, ABAP
COBOL (/ˈkoʊbɒl/) is one of the oldest programming languages. Its name is an acronym for COmmon Business-Oriented Language, defining its primary domain in business, finance, and administrative systems for companies and governments.
The COBOL 2002 standard includes support for object-oriented programming and other modern language features.[1]
Contents

1 History and specification
   1.1 ANS COBOL 1968
   1.2 COBOL 1974
   1.3 COBOL 1985
   1.4 COBOL 2002 and object-oriented COBOL
   1.5 History of COBOL standards
   1.6 Legacy
2 Features
   2.1 Self-modifying code
   2.2 Syntactic features
   2.3 Data types
   2.4 Hello, world
3 Criticism and defense
   3.1 Lack of structurability
   3.2 Verbose syntax
   3.3 Other defenses
4 See also
5 References
6 Sources
7 External links
History and specification
The COBOL specification was created by a committee of researchers from private industry, universities, and government during the second half of 1959. The specifications were to a great extent inspired by the FLOW-MATIC language invented by Grace Hopper, commonly referred to as "the mother of the COBOL language". The IBM COMTRAN language invented by Bob Bemer was also drawn upon, but the FACT language specification from Honeywell was not distributed to committee members until late in the process and had relatively little impact. FLOW-MATIC's status as the only one of the three languages to have actually been implemented made it particularly attractive to the committee.[2]
The scene was set on April 8, 1959 at a meeting of computer manufacturers, users, and university people at the University of Pennsylvania Computing Center. The United States Department of Defense subsequently agreed to sponsor and oversee the next activities. A meeting chaired by Charles A. Phillips was held at the Pentagon on May 28 and 29 of 1959 (exactly one year after the Zürich ALGOL 58 meeting); there it was decided to set up three committees: short, intermediate, and long range (the last one was never actually formed). It was the Short Range Committee, chaired by Joseph Wegstein of the US National Bureau of Standards, that during the following months created a description of the first version of COBOL.[3] The committee was formed to recommend a short-range approach to a common business language and was made up of members representing six computer manufacturers and three government agencies. The six computer manufacturers were Burroughs Corporation, IBM, Minneapolis-Honeywell (Honeywell Labs), RCA, Sperry Rand, and Sylvania Electric Products. The three government agencies were the US Air Force, the Navy's David Taylor Model Basin, and the National Bureau of Standards (now the National Institute of Standards and Technology). The intermediate-range committee was formed but never became operational. In the end, a sub-committee of the Short Range Committee developed the specifications of the COBOL language. This sub-committee was made up of six individuals:
William Selden and Gertrude Tierney of IBM
Howard Bromberg and Howard Discount of RCA
Vernon Reeves and Jean E. Sammet of Sylvania Electric Products[4]
The decision to use the name "COBOL" was made at a meeting of the committee held on 18 September 1959. The subcommittee completed the specifications for COBOL in December 1959.
The first compilers for COBOL were subsequently implemented in 1960, and on December 6 and 7, essentially the same COBOL program ran on two different computer makes, an RCA computer and a Remington-Rand Univac computer, demonstrating that compatibility could be achieved.
ANS COBOL 1968
After 1959 COBOL underwent several modifications and improvements. In an attempt to overcome the problem of incompatibility between different versions of COBOL, the American National Standards Institute (ANSI) developed a standard form of the language in 1968. This version was known as American National Standard (ANS) COBOL.
COBOL 1974
In 1974, ANSI published a revised version of (ANS) COBOL, containing a number of featuresthat were not in the 1968 version.
COBOL 1985
In 1985, ANSI published still another revised version that had new features not in the 1974 standard, most notably structured language constructs ("scope terminators"), including END-IF, END-PERFORM, END-READ, etc.
COBOL 2002 and object-oriented COBOL
The language continues to evolve today. In the early 1990s it was decided to add object-orientation in the next full revision of COBOL. The initial estimate was to have this revision completed by 1997, and an ISO CD (Committee Draft) was available by 1997. Some implementers (including Micro Focus, Fujitsu, Veryant, and IBM) introduced object-oriented syntax based on the 1997 or other drafts of the full revision. The final approved ISO standard (adopted as an ANSI standard by INCITS) was approved and made available in 2002.
As with the C++ and Java programming languages, object-oriented COBOL compilers became available even as the language moved toward standardization. Fujitsu and Micro Focus currently support object-oriented COBOL compilers targeting the .NET framework.[5]
The 2002 standard (the 4th revision of COBOL) included many other features beyond object-orientation. These included (but are not limited to):
National Language support (including but not limited to Unicode support)
Locale-based processing
User-defined functions
CALL (and function) prototypes (for compile-time parameter checking)
Pointers and syntax for getting and freeing storage
Calling conventions to and from non-COBOL languages such as C
Support for execution within framework environments such as Microsoft's .NET and Java (including COBOL instantiated as Enterprise JavaBeans)
Bit and Boolean support
"True" binary support (up until this enhancement, binary items were truncated based on the (base-10) specification within the Data Division)
Floating-point support
Standard (or portable) arithmetic results
XML generation and parsing
History of COBOL standards
The specifications produced by the full Short Range Committee were approved by the Executive Committee on January 3, 1960, and sent to the Government Printing Office, which edited and printed them as COBOL 60.
The American National Standards Institute (ANSI) produced several revisions of the COBOL standard, including:
COBOL-68
COBOL-74
COBOL-85
Intrinsic Functions Amendment (1989)
Corrections Amendment (1991)
After the Amendments to the 1985 ANSI Standard (which were adopted by ISO), primary development and ownership was taken over by ISO. The following editions and TRs (Technical Reports) have been issued by ISO (and adopted as ANSI) standards:
COBOL 2002
Finalizer Technical Report (2003)
Native XML syntax Technical Report (2006)
Object Oriented Collection Class Libraries (pending final approval)
Since 2002, the ISO standard has also been available to the public, coded as ISO/IEC 1989.
Work progresses on the next full revision of the COBOL standard. Approval and availability were expected in the early 2010s. For information on this revision, to see its latest draft, or to see what other work is happening with the COBOL standard, see the COBOL Standards Website.
Legacy
COBOL programs are in use globally in governmental and military agencies and in commercial enterprises, and are running on operating systems such as IBM's z/OS, the POSIX families (Unix/Linux etc.), and Microsoft's Windows, as well as ICL's VME operating system and Unisys' OS 2200. In 1997, the Gartner Group reported that 80% of the world's business ran on COBOL, with over 200 billion lines of code in existence and an estimated 5 billion lines of new code annually.[6]
Near the end of the twentieth century, the year 2000 problem was the focus of significant COBOL programming effort, sometimes by the same programmers who had designed the systems decades before. The particular level of effort required for COBOL code has been attributed both to the large amount of business-oriented COBOL, as COBOL is by design a business language and business applications use dates heavily, and to constructs of the COBOL language such as the PICTURE clause, which can be used to define fixed-length numeric fields, including two-digit fields for years.[citation needed] Because of the clean-up effort put into these COBOL programs for Y2K, many of them have been kept in use for years since then.[citation needed]
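The two-digit-year problem can be illustrated outside COBOL. Below is a minimal Python sketch of the "windowing" technique many Y2K remediation projects applied to expand PIC 99 year fields to four digits; the pivot value of 50 is an illustrative assumption, not something the article specifies.

```python
def expand_year(yy, pivot=50):
    """Expand a two-digit year (as stored in a PIC 99 field) to four
    digits with a sliding-window pivot: values below the pivot are
    treated as 20xx, the rest as 19xx. The pivot is an assumption."""
    if not 0 <= yy <= 99:
        raise ValueError("two-digit year expected")
    return 2000 + yy if yy < pivot else 1900 + yy

# A PIC 99 field by itself cannot distinguish 1999 from 2099;
# windowing picks one interpretation.
assert expand_year(99) == 1999
assert expand_year(5) == 2005
```

The window only defers the problem, which is one reason many remediated systems were eventually migrated to four-digit year fields instead.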
Features
COBOL as defined in the original specification included a PICTURE clause for detailed field specification. It did not support local variables, recursion, dynamic memory allocation, or structured programming constructs. Support for some or all of these features has been added in later editions of the COBOL standard. COBOL has many reserved words (over 400), called keywords.
Self-modifying code
The original COBOL specification supported self-modifying code via the infamous "ALTER X TO PROCEED TO Y" statement: X and Y are paragraph labels, and any "GO TO X" statements executed after such an ALTER statement have the meaning "GO TO Y" instead. Most compilers still support it,[citation needed] but it should not be used in new programs.
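The effect of ALTER can be modeled as an indirect jump through a mutable table. The following Python sketch is an illustration only: the paragraph labels X and Y come from the statement above, but the dispatch-table representation is an assumption about how to model the semantics, not how real compilers implement it.

```python
# Model of ALTER semantics: every "GO TO X" is an indirect jump
# through a mutable table, and ALTER rewrites that table at run time.
jump_table = {"X": "X-ORIGINAL-TARGET"}  # initial target of GO TO X

def alter(label, new_target):
    """ALTER label TO PROCEED TO new_target."""
    jump_table[label] = new_target

def go_to(label):
    """Return the paragraph a GO TO label would actually reach."""
    return jump_table[label]

assert go_to("X") == "X-ORIGINAL-TARGET"
alter("X", "Y")           # ALTER X TO PROCEED TO Y
assert go_to("X") == "Y"  # every later GO TO X now means GO TO Y
```

The model makes clear why the construct is discouraged: the meaning of a GO TO depends on run-time history, not on the program text.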
Syntactic features
COBOL provides an update-in-place syntax; for example:
ADD YEARS TO AGE
The equivalent construct in many procedural languages would be
age = age + years
This syntax is similar to the compound assignment operator later adopted by C:
age += years
The abbreviated conditional expression
IF SALARY > 8000 OR SUPERVISOR-SALARY OR = PREV-SALARY
is equivalent to
IF SALARY > 8000
OR SALARY > SUPERVISOR-SALARY
OR SALARY = PREV-SALARY
COBOL provides "named conditions" (so-called 88-levels). These are declared as sub-items of another item (the conditional variable). The named condition can be used in an IF statement, and tests whether the conditional variable is equal to any of the values given in the named condition's VALUE clause. The SET statement can be used to make a named condition TRUE (by assigning the first of its values to the conditional variable).
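Named conditions can be emulated as predicates over the conditional variable. A minimal Python sketch, using the 88 IS-RETIRED-AGE VALUES 65 THRU 150 example that appears in the data-types table; the enclosing AGE field declaration is hypothetical.

```python
# Emulating an 88-level named condition. In COBOL (AGE is hypothetical):
#     05 AGE  PIC 999.
#         88 IS-RETIRED-AGE  VALUES 65 THRU 150.
RETIRED_AGE_VALUES = range(65, 151)   # VALUES 65 THRU 150

def is_retired_age(age):
    """IF IS-RETIRED-AGE: test the conditional variable's value."""
    return age in RETIRED_AGE_VALUES

def set_is_retired_age_true():
    """SET IS-RETIRED-AGE TO TRUE assigns the first listed value."""
    return RETIRED_AGE_VALUES[0]      # 65

assert not is_retired_age(64)
assert is_retired_age(65) and is_retired_age(150)
assert is_retired_age(set_is_retired_age_true())
```

The design keeps the condition's meaning in the data declaration rather than scattered through IF statements, which is part of COBOL's self-documenting style.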
COBOL allows identifiers up to 30 characters long. When COBOL was introduced, much shorter lengths (e.g., 6 characters for FORTRAN) were prevalent.
COBOL introduced the concept of copybooks: chunks of code that can be inserted into a larger program. COBOL does this with the COPY statement, which also allows other code to replace parts of the copybook's code with other code (using the REPLACING ... BY ... clause).
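Copybook expansion is essentially textual inclusion with substitution. A hedged Python sketch of COPY ... REPLACING semantics; the copybook name CUSTREC and its contents are hypothetical, and real compilers match pseudo-text tokens rather than doing raw string replacement.

```python
# Sketch of copybook expansion: COPY inserts shared source text, and
# REPLACING ... BY ... rewrites parts of it on the way in.
COPYBOOKS = {
    "CUSTREC": "05 CUST-LAST  PIC X(20).\n05 CUST-FIRST PIC X(20).",
}

def copy(name, replacing=None):
    """COPY name [REPLACING old BY new ...] as text substitution."""
    text = COPYBOOKS[name]
    for old, new in (replacing or {}).items():
        text = text.replace(old, new)
    return text

# Roughly: COPY CUSTREC REPLACING ==CUST-== BY ==VENDOR-==.
expanded = copy("CUSTREC", replacing={"CUST-": "VENDOR-"})
assert "VENDOR-LAST" in expanded and "VENDOR-FIRST" in expanded
```

This is what lets many programs share one record layout while renaming its fields to avoid clashes.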
Data types
Standard COBOL provides the following data types:
Character
    Declaration: PIC X(20); PIC A(4)9(5)X(7)
    Notes: alphanumeric and alphabetic-only; single-byte character set (SBCS).

Edited character
    Declaration: PIC X99BAXX
    Notes: formatted and inserted characters.

Numeric fixed-point binary
    Declaration: PIC S999V99 [USAGE] COMPUTATIONAL or BINARY
    Notes: binary, 16, 32, or 64 bits (2, 4, or 8 bytes); signed or unsigned. Conforming compilers limit the maximum value of variables based on the picture clause and not the number of bits reserved for storage.

Numeric fixed-point packed decimal
    Declaration: PIC S999V99 PACKED-DECIMAL
    Notes: 1 to 18 decimal digits (1 to 10 bytes); signed or unsigned.

Numeric fixed-point zoned decimal
    Declaration: PIC S999V99 [USAGE DISPLAY]
    Notes: 1 to 18 decimal digits (1 to 18 bytes); signed or unsigned; leading or trailing sign, overpunch or separate.

Numeric floating-point
    Declaration: PIC S9V999ES99
    Notes: binary floating-point.

Edited numeric
    Declaration: PIC +Z,ZZ9.99; PIC $***,**9.99CR
    Notes: formatted characters and digits.

Group (record)
    Declaration:
        01 CUST-NAME.
           05 CUST-LAST  PIC X(20).
           05 CUST-FIRST PIC X(20).
    Notes: aggregated elements.

Table (array)
    Declaration: OCCURS 12 TIMES
    Notes: fixed-size array, row-major order; up to 7 dimensions.

Variable-length table
    Declaration: OCCURS 0 TO 12 TIMES DEPENDING ON CUST-COUNT
    Notes: variable-sized array, row-major order; up to 7 dimensions.

Renames (variant or union data)
    Declaration: 66 RAW-RECORD RENAMES CUST-RECORD
    Notes: character data overlaying other variables.

Condition name
    Declaration: 88 IS-RETIRED-AGE VALUES 65 THRU 150
    Notes: Boolean value dependent upon another variable.

Array index
    Declaration: [USAGE] INDEX
    Notes: array subscript.
Most vendors provide additional types, such as:
Numeric floating-point single precision
    Declaration: PIC S9V9999999ES99 [USAGE] COMPUTATIONAL-1
    Notes: binary floating-point (32 bits, 7+ digits). (IBM extension)

Numeric floating-point double precision
    Declaration: PIC S9V999ES99 [USAGE] COMPUTATIONAL-2
    Notes: binary floating-point (64 bits, 16+ digits). (IBM extension)

Numeric fixed-point packed decimal
    Declaration: PIC S9V999 [USAGE] COMPUTATIONAL-3
    Notes: same as PACKED-DECIMAL. (IBM extension)

Numeric fixed-point binary
    Declaration: PIC S999V99 [USAGE] COMPUTATIONAL-4
    Notes: same as COMPUTATIONAL or BINARY. (IBM extension)

Numeric fixed-point binary (native binary)
    Declaration: PIC S999V99 [USAGE] COMPUTATIONAL-5
    Notes: binary, 16, 32, or 64 bits (2, 4, or 8 bytes); signed or unsigned. The maximum value of variables is based on the number of bits reserved for storage and not on the picture clause. (IBM extension)

Numeric fixed-point binary in native byte order
    Declaration: PIC S999V99 [USAGE] COMPUTATIONAL-4
    Notes: binary, 16, 32, or 64 bits (2, 4, or 8 bytes); signed or unsigned.

Numeric fixed-point binary in big-endian byte order
    Declaration: PIC S999V99 [USAGE] COMPUTATIONAL-5
    Notes: binary, 16, 32, or 64 bits (2, 4, or 8 bytes); signed or unsigned.

Wide character
    Declaration: PIC G(20)
    Notes: alphanumeric; double-byte character set (DBCS).

Edited wide character
    Declaration: PIC G99BGGG
    Notes: formatted and inserted wide characters.

Edited floating-point
    Declaration: PIC +9.9(6)E+99
    Notes: formatted characters, decimal digits, and exponent.

Data pointer
    Declaration: [USAGE] POINTER
    Notes: data memory address.

Code pointer
    Declaration: [USAGE] PROCEDURE-POINTER
    Notes: code memory address.

Bit field
    Declaration: PIC 1(n) [USAGE] COMPUTATIONAL-5
    Notes: n can be from 1 to 64, defining an n-bit integer; signed or unsigned.

Index
    Declaration: [USAGE] INDEX
    Notes: binary value corresponding to an occurrence of a table element; may be linked to a specific table using INDEXED BY.
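The distinction drawn in the notes between COMPUTATIONAL/BINARY (maximum value bounded by the picture clause) and COMPUTATIONAL-5 (bounded by the storage actually reserved) can be made concrete with a small Python sketch; the 16-bit halfword size assumed here for PIC S999 is illustrative.

```python
def comp_limit(pic_digits):
    """Maximum value of a signed conforming COMP item:
    bounded by the PICTURE clause (base-10 digits)."""
    return 10 ** pic_digits - 1

def comp5_limit(bits):
    """Maximum value of a signed COMP-5 item:
    bounded by the bits reserved for storage."""
    return 2 ** (bits - 1) - 1

# PIC S999: both usages reserve a 16-bit halfword, but the
# conforming COMP item tops out at 999 while COMP-5 uses the
# full range of the halfword.
assert comp_limit(3) == 999
assert comp5_limit(16) == 32767
```

The same contrast explains the "true binary support" item in the COBOL 2002 feature list above: earlier binary items were truncated by their base-10 specification.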
Hello, world
An example of the "Hello, world" program in COBOL:
IDENTIFICATION DIVISION.
PROGRAM-ID. HELLO-WORLD.
PROCEDURE DIVISION.
DISPLAY 'Hello, world'.
STOP RUN.
Criticism and defense
Lack of structurability
In his 1975 letter to an editor titled "How do we tell truths that might hurt?", which was critical of several programming languages contemporaneous with COBOL, computer scientist and Turing Award recipient Edsger Dijkstra remarked that "The use of COBOL cripples the mind; its teaching should, therefore, be regarded as a criminal offense."[7]
In his dissenting response to Dijkstra's article and the above "offensive statement," computer scientist Howard E. Tompkins defended structured COBOL: "COBOL programs with convoluted control flow indeed tend to 'cripple the mind'," but this was because "There are too many such business application programs written by programmers that have never had the benefit of structured COBOL taught well..."[8]
Additionally, the introduction of OO-COBOL has added support for object-oriented code as well as user-defined functions and user-defined data types to COBOL's repertoire.
Verbose syntax
COBOL 85 was not fully compatible with earlier versions, resulting in the "cesarean birth" of COBOL 85. Joseph T. Brophy, CIO of Travelers Insurance, spearheaded an effort to inform COBOL users of the heavy reprogramming costs of implementing the new standard. As a result, the ANSI COBOL Committee received more than 3,200 letters from the public, mostly negative, requiring the committee to make changes. On the other hand, conversion to COBOL 85 was thought to increase productivity in future years, thus justifying the conversion costs.[9]
COBOL syntax has often been criticized for its verbosity. However, proponents note that this was intentional in the language design, and many consider it one of COBOL's strengths. One of the design goals of COBOL was that non-programmers (managers, supervisors, and users) could read and understand the code. This is why COBOL has an English-like syntax and structural elements, including nouns, verbs, clauses, sentences, sections, and divisions. Consequently, COBOL is considered by at least one source to be "the most readable, understandable and self-documenting programming language in use today. [...] Not only does this readability generally assist the maintenance process but the older a program gets the more valuable this readability becomes."[10] On the other hand, the mere ability to read and understand a few lines of COBOL code does not grant an executive or end user the experience and knowledge needed to design, build, and maintain large software systems.[citation needed]
Other defenses
Additionally, traditional COBOL is a simple language with a limited scope of function (no pointers, no user-defined types, and no user-defined functions), encouraging a straightforward coding style. This has made it well-suited to its primary domain of business computing, where the program complexity lies in the business rules that need to be encoded rather than in sophisticated algorithms or data structures. And because the standard does not belong to any particular vendor, programs written in COBOL are highly portable: the language can be used on a wide variety of hardware platforms and operating systems. The rigid hierarchical structure also restricts the definition of external references to the Environment Division, which simplifies platform changes.[10]