final submission format instructions for proceedings of business

Using Data Mining Technology to Build a Quality Improvement System

Ruey-Shun Chen
Institute of Information Management, China University of Technology, Taiwan,
No. 56, Sec. 3, Shinglung Rd., Wenshan Chiu, Taipei City 116, Taiwan [email protected]

R. C. Wu and C. C. Chen
Institute of Information Management, National Chiao Tung University, Taiwan,
1001 Ta Hsueh Road, Hsinchu, Taiwan 300, ROC

ABSTRACT

Data mining technology helps enterprises make decisions efficiently. In this paper, we describe the design and implementation of a quality improvement system that uses data warehouse and data mining technology to discover the most significant variables in packaging manufacturing plants. By comparing the classification results of the proposed methods, we build an improvement system that provides an efficient tool for analyzing data and detecting problems, with a view to identifying the causes of problems and eventually enhancing the yield. Moreover, several real-world problems are identified and solved in this research by analyzing the related data. The experimental results show that the predictions made by decision tree analysis are more accurate than those made by the other methods, i.e. neural network, Bayesian, clustering and association rules. The decision tree algorithm therefore helps increase yields and is more powerful for detecting hidden patterns of problems in the packaging industry.

Keywords: data mining, information system, quality improvement

INTRODUCTION

Modern semiconductor manufacturing companies continue to play an important role in meeting market demand for ever-increasing chip production. This paper shows in detail how the packaging flows run and how the critical problems are discovered. Each independent station captures, stores, integrates, and reports the data generated by each of its machines. Based on our experience, we develop a system to analyze the main variables (Alex et al., 2000; Michael et al., 1999). We also discuss experience with the practical use of data mining as a tool to improve the productivity of problem solving in yield enhancement. Enterprises engaged in keen competition attach great importance to improving product quality and delivering products on time to meet customers' requirements; whoever obtains accurate information and makes decisions faster than rivals do will have a chance of being successful (Robert, 2000). Moreover, we detect the hidden causes of product quality problems and solve them. Another key issue that a corporate information system faces nowadays is to capture correct information at the right time and deliver it to the right executives. In short, an intelligent quality improvement system based on data warehouse and data mining should be implemented with a view to achieving the following four goals.

1. Constructing a data warehouse of quality problems from the legacy systems that hold semiconductor product quality data, and establishing applicable procedures to analyze and solve the quality problems.

2. Exploring the critical variables that cause product quality problems and finding solutions for the vital variables of each problem.

3. Identifying the most accurate of the proposed algorithms.

4. Decreasing chip defects and increasing production yields.

LITERATURE REVIEW

2.1 Data warehouse

A data warehouse brings together all decision-making support techniques. It assists knowledge workers in making decisions better and faster (Jiawei Han and Micheline Kamber, 2001) and it is the core of a decision-making support system. A data warehouse should not only provide database functions but also have the following four features:

1. Integration – a data warehouse combines an enterprise's information sources, including various computer systems, databases, and application programs. The information sources may be discrete and inconsistent.

2. Subject orientation – combinations of data are created at will to answer questions raised by specific companies or organizations.

3. Time variance – unlike conventional operating data, a data warehouse attaches great importance to dynamic data that vary with time (week, month, year) and to data acquired from outside the enterprise.

4. Invariability – once data are stored in a data warehouse, they are preserved and no longer changed, so they are read-only. New data accumulate over time and are continually added to the data warehouse for use by decision-makers.

In short, by creating a centralized data warehouse, using appropriate data analysis tools, and quickly developing software that supports decision-making, an enterprise enables its decision-makers to acquire the information they need at any time and to use it as an important reference for their decisions (J. Ross Quinlan, 1993; Surajit Chaudhuri, Umeshwar Dayal, 1997).
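As an illustration only, the time variance and invariability features listed above can be sketched in a few lines of Python. The `Record` fields and the `Warehouse` class are hypothetical names introduced for this sketch; they are assumptions and not part of the system described in this paper.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)  # frozen: a stored row cannot be modified (invariability)
class Record:
    source: str   # originating system, e.g. "ERP" or "WIP" (integration)
    subject: str  # subject area, e.g. "quality" (subject orientation)
    as_of: date   # time stamp of the row (time variance)
    value: float


class Warehouse:
    """Append-only store: rows are added over time, never updated or deleted."""

    def __init__(self) -> None:
        self._rows: List[Record] = []

    def load(self, rows: List[Record]) -> None:
        # New data are only appended to what is already stored.
        self._rows.extend(rows)

    def query(self, subject: str, year: int) -> Tuple[Record, ...]:
        # Read-only access by subject and time period.
        return tuple(
            r for r in self._rows if r.subject == subject and r.as_of.year == year
        )


warehouse = Warehouse()
warehouse.load([
    Record("ERP", "quality", date(2005, 1, 3), 1.0),
    Record("WIP", "quality", date(2004, 6, 1), 2.0),
])
rows_2005 = warehouse.query("quality", 2005)
```

Making the row type immutable is one simple way to express the read-only property; a production warehouse would enforce it at the database level instead.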

2.2 Data mining

Data mining is the process whereby knowledge is discovered in a database and implicit, previously unknown and potentially useful information is extracted from it (Frawley et al., 1991). It enables the discovery of potentially useful information in voluminous data in order to provide references for decision-makers. The whole process of data mining comprises data selection, preprocessing, conversion, data analysis, and interpretation and evaluation (Yu, 1999).

Having defined data mining and its objective, we now look into the steps leading to the discovery of knowledge. Kleissner (1998) suggests that a knowledge discovery cycle should comprise the following four steps:

1. data selection

2. data cleaning

3. data conversion and meaning-giving

4. data mining

The aforesaid steps lead to the discovery of knowledge, the essence of which lies in mining the target data. Brachman et al. (1996) believe that all the activities and processes connected with knowledge exploration are intended to find useful patterns in the data, so that important causes of problems can be identified and the problems solved, using the data mining algorithm as well as subsequent processing or re-processing of knowledge. After knowledge is discovered, domain experts have to evaluate and explain the extracted knowledge to ensure that it has genuine value. A complete process of knowledge discovery is shown in Fig. 1 (Jiawei Han, Micheline Kamber, 2001).

Fig. 1. Data collection and relevant procedures in the course of data mining
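The four steps of the knowledge discovery cycle (selection, cleaning, conversion, mining) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions: the function names, the field names `station`, `defect` and `operator`, and the sample rows are all hypothetical, and the "mining" step is reduced to finding the most frequent defect per station.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def select(rows: List[Row], fields: List[str]) -> List[Row]:
    """Step 1, data selection: keep only the fields relevant to the goal."""
    return [{f: r.get(f) for f in fields} for r in rows]


def clean(rows: List[Row]) -> List[Row]:
    """Step 2, data cleaning: drop rows with missing or empty values."""
    return [r for r in rows if all(v not in (None, "") for v in r.values())]


def convert(rows: List[Row]) -> List[Row]:
    """Step 3, data conversion: normalize spelling so equal values compare equal."""
    return [{k: v.strip().lower() for k, v in r.items()} for r in rows]


def mine(rows: List[Row], key: str, target: str) -> Dict[str, str]:
    """Step 4, data mining (toy version): most frequent target value per key."""
    groups: Dict[str, Counter] = defaultdict(Counter)
    for r in rows:
        groups[r[key]][r[target]] += 1
    return {k: counts.most_common(1)[0][0] for k, counts in groups.items()}


# Hypothetical run card records.
raw = [
    {"station": "ILB ", "defect": "Broken lead", "operator": "A"},
    {"station": "ilb", "defect": "broken lead", "operator": "B"},
    {"station": "Potting", "defect": None, "operator": "A"},
    {"station": "potting", "defect": "shrinkage", "operator": "C"},
]

patterns = mine(convert(clean(select(raw, ["station", "defect"]))), "station", "defect")
print(patterns)  # {'ilb': 'broken lead', 'potting': 'shrinkage'}
```

Each step consumes the output of the previous one, which mirrors the order of the cycle; the expert evaluation that follows mining is not modeled here.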

2.3 LCD driver IC packaging process

LCD driver IC back-end manufacturing processes can be categorized by IC packaging type. Three types can be identified: TCP (Tape Carrier Package), COF (Chip on Film) and COG (Chip on Glass). Currently, LCD driver ICs mostly use the TCP package, driver ICs for mobile phone LCD panel modules mostly use the COG package, and the COF package is the future trend. Fig. 2 (Tsai, 2001) shows the main LCD driver IC back-end process (Yang et al., 2001), and Table 1 (Industrial Technology Research Institute Material Center, 2004) compares the advantages of TAB (Tape Automated Bonding, i.e. TCP), COG and COF.

Fig. 2. LCD driver IC back-end packaging process.

Table 1: Advantage comparisons among TAB, COG and COF

PROBLEM STATEMENT

3.1 TCP packaging technology

The original purpose of TCP's tape packaging technology was to replace wire bonding packaging. With the increase in IC I/O counts and the trend toward automated production, TCP technology has matured and is currently the mainstream technology for large-sized LCD driver IC packaging (Yang et al., 2001), as illustrated in Fig. 3. The LCD driver IC packaging procedure covered in our study is as follows.

1. Ultraviolet irradiation:

After the wafers are cut into individual chips, the chips remain attached to the original UV tape film. Irradiation with UV light softens the tape film, allowing the attached chips to be conveniently detached.

2. Inner Lead Bonding (ILB):

The inner lead bonding process takes the inner leads of the tape and the gold bumps on the chip and uses a heated press to bond them together on the tape, forming connection points. The process thus connects the chip to the circuit on the tape.

3. Inner Lead Bonding QC:

After inner lead bonding, quality control checks the inner leads' completeness, pitch, lead and tape bond, etc.

4. Potting:

The potting process applies a resin sealing coat that protects the chips from moisture damage, increases support of the lead frames, and helps heat exchange.

5. Curing:

After potting, in addition to the brief heating of the resin on the machine, the product needs to be placed in an oven for further heating so that the resin becomes completely moisture-free and hardens.

6. Marking:

The marking process mainly prints text onto the IC product's package so that the product specifications and the original production process can be identified. The marking method is determined by the client's needs.

7. Final testing:

Final testing is done after the packaging process is complete. Probes are connected to the product's outer leads, and electronic testing equipment is used to ensure that the TCP-packaged product meets specification and to sift out defects.

8. RW:

The tape bonding machine is used to wind up the tape and press the leads together. Fixed rollers transport the soft chip carrier; when the inner leads attached to the tape reach position, the bonding head lowers to press and bond the leads, completing a chip's RW process.

9. Packing:

The finished products are classified and packed into client-specified packing containers, and labels and logos are affixed.

Fig. 3. TCP packaging process.

3.2 Problem definition

This study discusses data mining in the semiconductor packaging industry with the aim of identifying unknown but useful knowledge. The manufacturing industry commonly uses run cards to record quality-related problems raised by customers but lacks an effective way to make the most of this information, which wastes time and incurs unnecessary cost on investigation and analysis when a problem recurs. Furthermore, because the data are often large and complicated, the personnel in charge of quality-related problems can hardly identify the discrepancy factors or generalize the characteristics or types of the problems rapidly and correctly. For these two reasons, the challenge addressed in this study is to draw conclusions about the problems arising in the semiconductor manufacturing plant.

Applications of data mining have been reported in the literature in a number of fields, including manufacturing, finance and telecommunications. Among the available data mining tools, the common classification methods are decision tree, Naïve Bayesian, neural network, clustering and association rules, as shown in Table 2 (Chien, 2004).

Table 2: Classification of data mining technology

In applied telecommunications cases, Naïve Bayesian predicts better than decision tree (Hsieh, 2005). This study therefore applies the data mining methods of decision tree, neural network, Bayesian, clustering and association rules to historical run card data and identifies, based on the implementation outcomes, which algorithm gives superior results in the semiconductor packaging industry. In this way, we can draw reasonable conclusions about the patterns of incidents, build a problem diagnosis and analysis system and, through discussion with domain experts, help the personnel in charge of quality-related problems reduce the diagnosis time and narrow the scope of an incident.

SYSTEM DESIGN AND IMPLEMENTATION

4.1 Proposed system architecture

Based upon the data mining system, the complete design framework for the intelligent quality improvement system is illustrated in Fig. 4 (Lee, 2001). The function of each design element is described in the following steps.

1. Experts, domain knowledge: First, determine the goals to achieve with data mining; these guide the relevant data collection, data pre-processing, selection of data attributes and choice of data mining methods.

2. Data collection: The conversion of historical data from the existing systems, i.e. WIP and ERP, into the processing area should be considered.

3. Data standardization: After the data acquisition sources are determined, standardize the data types to ensure consistency between subsequently collected data and pre-processed data.

4. Data preprocess: When data have been collected in the processing area but not yet stored in the data warehouse, some fields may contain missing or inaccurate data. To enhance processing efficiency and accuracy, it is essential to carry out data integration, conversion, extraction, and cleaning.

5. Data warehouse, OLAP (Online Analytical Processing): As the data to be processed are distributed across different databases and are always large in size, reducing data search time is the key to the whole data mining process. Data warehousing is applied to address these challenges. In addition, OLAP operations on data cubes include roll-up, drill-down, slice, dice, and pivot.

6. Select attributes: Data analysis typically proceeds after the domain expert has selected the proper attributes for the specific analysis target, because either insufficient or excessive attributes prevent correct analysis results.

7. Data mining engine: The data mining engine is the core and the critical part of the system framework. The most commonly used classification methods are decision tree, Bayesian, neural network, clustering, and association rules.

8. Results evaluation: A tremendous amount of mined data and patterns may exist; the mined results become more usable and interpretable only through parameter setup. In addition, restrictive conditions may be set to retrieve more significant outcomes, and the expert may then participate in or assist with interpreting and assessing the mined rules or patterns. If the assessment results are unsatisfactory, one may return to earlier steps and adjust the methods or parameters until a proper outcome is retrieved.

9. Results display: The mined results may be presented according to user preference.

10. Knowledge base: The knowledge base, which stores expert knowledge and the rules obtained from data mining, can be updated from time to time to serve as the basis for various decision-making supports.

[Figure 4 diagram. Numbered elements: 1 Experts, domain knowledge (application goal is determined); 2 Data collection (ERP, WIP); 3 Data standardization; 4 Data preprocess; 5 Data warehouse, OLAP; 6 Select attributes; 7 Data mining engine (decision tree, neural network, Bayesian, clustering, association); 8 Evaluate results; 9 Results display; 10 Knowledge base (decision tree mined knowledge).]

Fig. 4. Complete design framework for intelligent quality improvement system.
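The OLAP operations named in step 5 (roll-up, drill-down, slice, dice, pivot) can be illustrated on a tiny in-memory fact table. This is a sketch under stated assumptions: the dimensions `quarter`, `month` and `station`, the measure `defects`, the helper names, and all values are hypothetical, and a real data cube would be served by the data warehouse rather than by Python lists.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set, Tuple

Fact = Dict[str, Any]

# Hypothetical fact rows: three dimensions and one measure ("defects").
FACTS: List[Fact] = [
    {"quarter": "Q1", "month": "Jan", "station": "ILB", "defects": 5},
    {"quarter": "Q1", "month": "Feb", "station": "ILB", "defects": 2},
    {"quarter": "Q1", "month": "Feb", "station": "Potting", "defects": 4},
    {"quarter": "Q2", "month": "Apr", "station": "ILB", "defects": 3},
    {"quarter": "Q2", "month": "May", "station": "Potting", "defects": 1},
]


def aggregate(facts: List[Fact], dims: Tuple[str, ...]) -> Dict[Tuple, int]:
    """Sum the measure grouped by the given dimensions.

    Fewer or coarser dimensions give a roll-up; more or finer ones a drill-down.
    """
    totals: Dict[Tuple, int] = defaultdict(int)
    for f in facts:
        totals[tuple(f[d] for d in dims)] += f["defects"]
    return dict(totals)


def slice_(facts: List[Fact], dim: str, value: Any) -> List[Fact]:
    """Slice: fix a single dimension to one value."""
    return [f for f in facts if f[dim] == value]


def dice(facts: List[Fact], allowed: Dict[str, Set[Any]]) -> List[Fact]:
    """Dice: restrict several dimensions to sets of allowed values."""
    return [f for f in facts if all(f[d] in vals for d, vals in allowed.items())]


def pivot(facts: List[Fact], row_dim: str, col_dim: str) -> Dict[Any, Dict[Any, int]]:
    """Pivot: cross-tabulate the measure with one dimension as rows, one as columns."""
    table: Dict[Any, Dict[Any, int]] = defaultdict(lambda: defaultdict(int))
    for f in facts:
        table[f[row_dim]][f[col_dim]] += f["defects"]
    return {r: dict(cols) for r, cols in table.items()}


rolled_up = aggregate(FACTS, ("quarter",))             # {('Q1',): 11, ('Q2',): 4}
drilled_down = aggregate(FACTS, ("quarter", "month"))  # per-month totals
ilb_only = slice_(FACTS, "station", "ILB")
q1_ilb = dice(FACTS, {"quarter": {"Q1"}, "station": {"ILB"}})
by_station = pivot(FACTS, "station", "quarter")
```

Roll-up and drill-down share one helper here because they differ only in the granularity of the grouping dimensions.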

To construct the proposed system architecture, we set up the following elements:

1. Setting up a data warehouse, including the following steps:

(1) Setting up the data warehouse architecture

(2) Setting up the data warehouse procedures

(3) Setting up the data warehouse schema

(4) Setting up the fact table

(5) Setting up the dimension table

(6) Setting up the multidimensional model

2. Setting up the decision analysis and data mining system

After the data cubes are constructed, decision-making analysis can be integrated with the data mining system. The goal of integration is to allow OLAP analysis results to supply the knowledge base within the data mining system, thus providing analysis information to the data mining system and creating a point of reference for data mining tasks. OLAP technology blends people's observations and intelligence into the data mining system, improving the speed and depth at which data are mined. Furthermore, the knowledge discovered by the data mining system acts as a guide for OLAP analysis tasks, increasing the depth of analysis. As a result, the information left unearthed by OLAP is extremely complex and delicate in nature.

3. Setting up the data mining system

Data classification basically consists of the following two-step process (J. Ross Quinlan, 1993; Jiawei Han and Micheline Kamber, 2001):

(1) Training model: A training data set is determined by collecting items from the database. This set is analyzed with the algorithm used to classify data, for example decision tree or clustering. The learned model, or classifier, is represented in the form of classification rules.

(2) Classification: A test data set is established by collecting items from the database and is entered into the classifier. After deviations within the classification model have been rectified, unknown data are entered into the revised classifier to predict subsequent results.
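The two-step process above can be sketched with a one-level decision tree (a decision stump) that picks its split attribute by information gain. This is an illustrative assumption, not the SQL Server 2005 algorithm used in this study; the attribute names `temperature` and `pressure`, the label `defect`, and all sample rows are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, str]
Model = Tuple[str, Dict[str, str], str]  # (split attribute, value -> label, default)


def entropy(labels: List[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows: List[Row], attr: str, target: str) -> float:
    """Entropy reduction obtained by splitting the rows on one attribute."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[target] for r in rows]) - remainder


def train_stump(rows: List[Row], attrs: List[str], target: str) -> Model:
    """Step 1, training: learn classification rules from the training set."""
    best = max(attrs, key=lambda a: information_gain(rows, a, target))
    groups: Dict[str, List[str]] = defaultdict(list)
    for r in rows:
        groups[r[best]].append(r[target])
    rules = {v: Counter(g).most_common(1)[0][0] for v, g in groups.items()}
    default = Counter(r[target] for r in rows).most_common(1)[0][0]
    return best, rules, default


def classify(model: Model, row: Row) -> str:
    """Step 2, classification: apply the learned rules to a row."""
    attr, rules, default = model
    return rules.get(row[attr], default)


def accuracy(model: Model, rows: List[Row], target: str) -> float:
    """Fraction of test rows whose predicted label matches the true label."""
    return sum(classify(model, r) == r[target] for r in rows) / len(rows)


train = [
    {"temperature": "high", "pressure": "high", "defect": "yes"},
    {"temperature": "high", "pressure": "normal", "defect": "yes"},
    {"temperature": "normal", "pressure": "high", "defect": "no"},
    {"temperature": "normal", "pressure": "normal", "defect": "no"},
]
test = [
    {"temperature": "high", "pressure": "normal", "defect": "yes"},
    {"temperature": "normal", "pressure": "high", "defect": "no"},
    {"temperature": "high", "pressure": "high", "defect": "no"},
]

model = train_stump(train, ["temperature", "pressure"], "defect")
test_accuracy = accuracy(model, test, "defect")
prediction = classify(model, {"temperature": "normal", "pressure": "normal"})
```

On this toy data the stump splits on `temperature`, classifies two of the three test rows correctly, and predicts `no` for the unseen row. A full decision tree would repeat the same split selection recursively on each branch.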

4.2 System implementation architecture

The system's environment and framework are shown in Fig. 5. They include the data warehouse server, the data mining server, the web server, and the data mining and quality improvement front-end PCs. Microsoft SQL Server 2005 provides several kinds of data mining algorithms for various applications. When building the classification engines, we adopt the algorithms of SQL Server 2005 for calculation (Hsieh, 2005).

[Figure 5 diagram. Components: front-end PCs, Internet, firewall, DMZ, web server, intranet, data warehouse, data mining server, WIP server, ERP server.]

Fig. 5. The environment and framework of intelligent quality improvement system.

4.3 Experimental results analysis

Quality problem data (25,150 entries) are predicted, classified, and analyzed with the decision tree, neural network, Bayesian, clustering and association rules algorithms. The data are randomly divided into a training set and a testing set at a ratio of 3:1 (training set: 18,862 entries; testing set: 6,288 entries). With the decision tree, 5,653 of the testing entries are classified correctly, representing a success rate of 89.9%.
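The split sizes and the success rate quoted above can be checked with a short calculation. The sketch below assumes a simple shuffled index split; the function name `split_3_to_1` and the seed are hypothetical, and the paper does not state how the randomization was actually performed.

```python
import random
from typing import List, Tuple


def split_3_to_1(n_rows: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle row indices and cut them 3:1 into training and testing sets."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    cut = n_rows * 3 // 4                 # three quarters go to training
    return indices[:cut], indices[cut:]


train_idx, test_idx = split_3_to_1(25_150)
correct = 5_653                           # correctly classified test entries
success_rate = correct / len(test_idx)

print(len(train_idx), len(test_idx), f"{success_rate:.1%}")
# 18862 6288 89.9%
```

The integer cut reproduces the reported set sizes, and 5,653 correct entries out of 6,288 gives the stated 89.9%.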

To establish the database, we collected data from January to December 2005. Using the decision tree and clustering provided by the intelligent quality improvement system as a reference for manual analysis and problem resolution from January to December 2004, the four major factors that influence quality are: broken/bending/delamination of internal pins, shrinkage, short, and resin wrapping drawn-in object/tape indent. Before using the decision tree algorithm, the improvement rates are 8.0%, 7.8%, 7.9%, 7.6%, 7.9%, 8.2%, 7.9%, 7.3%, 7.9%, and 7.6% for each quality problem, and the overall average improvement rate is 7.8%. After improvement with the decision tree, the improvement rates are 13.0%, 13.9%, 13.1%, 14.5%, 12.7%, 12.6%, 12.8%, 13.6%, 13.2%, and 13.3% for each quality problem, and the overall average improvement rate is 13.3%. The total average improvement rates are shown in Table 3, and the curves before vs. after improvement are shown in Fig. 6. Hence, the decision tree method is more effective and accurate than the other methods when applied to quality problems in the semiconductor packaging industry.
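The overall averages can be reproduced directly from the ten improvement rates listed above. The variable names in this sketch are ours; the figures are those reported in the text.

```python
from statistics import mean

# Improvement rates (%) per quality problem, as listed in the text.
before = [8.0, 7.8, 7.9, 7.6, 7.9, 8.2, 7.9, 7.3, 7.9, 7.6]
after = [13.0, 13.9, 13.1, 14.5, 12.7, 12.6, 12.8, 13.6, 13.2, 13.3]

avg_before = mean(before)       # 7.81, reported as 7.8%
avg_after = mean(after)         # 13.27, reported as 13.3%
gain = avg_after - avg_before   # difference in percentage points

print(f"{avg_before:.1f}% -> {avg_after:.1f}% (+{gain:.1f} points)")
# 7.8% -> 13.3% (+5.5 points)
```

Every individual rate after improvement is also higher than the corresponding rate before, so the gain is not driven by a single quality problem.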

Table 3: Comparisons of data mining results

Fig. 6. Diagram of overall improvement rate curves for each quality problem yearly.

CONCLUSIONS

In order to meet the target mentioned above, our research uses data warehouse, OLAP, decision tree, neural network, Bayesian, clustering and association rules algorithms to perform classification analysis of the causes that affect yield in the manufacturing process of a semiconductor packaging plant. It compares the correctness and applicability of the proposed algorithms and provides a decision-making policy for executives, with a view to identifying the causes of problems and the solutions for their main variables, making decisions quickly, and eventually reducing the time taken to solve quality problems. The results and contributions of this research are as follows.

Among the proposed classification algorithms, predictions made by decision tree have an accuracy of 89.9%, whereas predictions made by neural network, Bayesian, clustering and association rules have accuracies of 84.3%, 83.1%, 82.6% and 80.7% respectively. The decision tree algorithm is more effective and appropriate than the other algorithms compared for analyzing quality problems in the semiconductor packaging industry.

The experimental results show that, among the four attributes of man, machine, material and method, the first priority to explore in the semiconductor packaging industry is machine, the second is material, the third is method and the fourth is man.

We have also found solutions for the major variables, pressure and temperature, of the inner lead bonding and potting flows at the packaging level.

References

Alex Berson, Stephen Smith, Kurt Thearling. (2000). Building data mining

applications for CRM. McGraw-Hill.

Atsumi, K., N. Kashima, Y. Maehara, T. Mitsuhashi, T. Komatsu, and N. Ochiai.

(1989). Inner lead bonding techniques for 500 lead dies having a 90 um lead

pitch. Proc. 39th Electronic Components Conference, 171-176.

Brachman, R.J., T. Khabaza, W. Kloesgen, G. Piatetsky-Shapiro, E. Simoudis. (1996). Mining business databases. Communications of the ACM, 39(11), 42-48.

Chien H.H. (2004). Using data mining techniques for analysis of manufacturing

process quality and improvement – using LCD drive IC packaging as example.

National Chiao Tung University Masters in Management.

Frawley, W.J., G. Piatetsky-Shapiro, C.J. Matheus. (1991). Knowledge discovery in databases: an overview. Knowledge Discovery in Databases, AAAI/MIT Press, 1-30.

Hsieh C.B. (2005). Data Mining and Business intelligence: SQL Server 2005. Ting

Mao Publish Company.

Hsu G.H. (1999). An Advanced Packaging Technology: Wafer Packaging Technology.

Materials Magazine, 151, 86-91.

Ikeya, Y., K. Atsumi, N. Kashima, Y. Maehara, K. Okano. (1989). High-accuracy

inner lead bonding technique. Proc. IEMT-Japan, 71-74.

Industrial Technology Research Institute Material Center, (2004). Industrial

Economics and Knowledge Center Project.

J. Ross Quinlan. (1993). C4.5: Programs for machine learning. Morgan Kaufmann

Publishers.

Jiawei Han, Micheline Kamber. (2001). Data mining: concepts and techniques.

Morgan Kaufmann Publishers.

Kleissner, C. (1998). Data mining for the enterprise. IEEE Proceedings of the 31st

Annual Hawaii International Conference on System Sciences, 7, 295-304.

Lee J. F. (2001). Research and exploration of data mining. Information and Education

Magazine.

Michael J.A. Berry, Gordon S. Linoff. (1997). Data mining techniques: for marketing,

sales, and customer support. John Wiley & Sons.

Michael J.A. Berry, Gordon S. Linoff. (1999). Mastering data mining: the art &

science of customer relationship management. John Wiley & Sons.

Peter F. Drucker, Ikujiro Nonaka, David A. Garvin. (1998). Harvard business review

on knowledge management. Harvard Business School Press.

Robert Groth. (2000). Data mining: building competitive advantage. Prentice-Hall

Inc.

Scharr, T.A. (1983). TAB bonding a 200 lead die. Proc. ISHM Symposium, 561-565.

Surajit Chaudhuri, Umeshwar Dayal. (1997). An overview of data warehousing and

OLAP technology. SIGMOD, 26, 65-74.

Tsai Tsan-Lian. (2001). Research in Taiwan LCD driver IC finishing process optimum

work distribution model. National Chiao Tung University Masters in High

Executive Management.

Vivek R. Gupta. (1997). An introduction to data warehousing. System Services

Corporation.

Yang et al. (2001). Analysis and reliability assessment in inner lead welding machine

characteristics for tape carrier package IC. Electronics and Materials Magazine,

132-142.

Yu, P.S. (1999). Data mining and personalization technologies. IBM T. J. Watson

Research Center, IEEE.

Zhengxin Chen. (2001). Data mining and uncertain reasoning: an integrated approach.

John Wiley & Sons.
