project documentation carol george


RESEARCH PROPOSAL

TITLE:

EFFECTS OF EMPLOYING

DATA MINING IN A HOTEL MANAGEMENT SYSTEM.

A CASE STUDY OF Norfolk Hotel Nairobi (Fairmont Hotel).

PRESENTED BY:

GEORGE NJUGUNA

REG.NO.

BMIT/OO54/05/09

A research proposal to be submitted in partial fulfillment of the requirements for the Degree of Bachelor of Management & Information Technology



DECLARATION.

I declare that this is my own work and that it has not previously been presented for marking at this or any other university. It is the product of my own ideas from beginning to end.

Student GEORGE NJUGUNA

Signature ……………………………………

Date ……………………………………

Supervisor

Signature ……………………………………

Date ……………………………………



ABSTRACT

The overall goal of this research is to develop a computer system with data mining capabilities that can automate the operations of a hotel.

The case study is the Norfolk Hotel (Fairmont), a hotel dedicated to providing the best services to its customers. It currently uses a system that does not incorporate a data mining feature.

With this in mind, research was necessary to investigate data mining issues in management computer systems.



DEDICATION.

I affectionately dedicate this work to all the generous and helpful people, men and women, who have the gift of sharing and helping others, and to two of them in particular: my parents, Mr. and Mrs. Anthony Kamau, without whom I could not have made it this far.

To my siblings Willy, Trevor, Emma and Mitchell, who have given me a reason to believe in myself.



ACKNOWLEDGEMENT.

With the completion of this research proposal, I wish to acknowledge with thanks the help of those without whom I would not have accomplished half as much in this project.

First, I thank the Almighty God for his strength, grace and mercy, which have been unconditionally given to me from the start up to now.

Sincere thanks to the friends who in one way or another helped me see this through. Their support, financial, moral or otherwise, is well appreciated.

Exceptional thanks also go to my various lecturers, who have given me inspiration, support and criticism.

Individual thanks to my supervisor, Mr. Ngeno, for his guidance throughout the research, and to my friends Mr. N. Karie, Warui, Melvin Mwangi and Peris Wanjiru for their endless encouragement.

God Bless you all.



TABLE OF CONTENTS.

CHAPTER 1.

1.0 INTRODUCTION.



1.1 BACKGROUND INFORMATION

Before and during the colonial period there were few if any large hotels in Kenya. Early British

settlers in Kenya often lived in the cities for part of the year but they usually rented a house from

their British predecessors, if they did not own one, rather than staying in a hotel. Numbers of

business and foreign visitors were very small by modern standards. The accommodation

available to them included lodging houses and coaching inns. Lodging houses were more like

private homes with rooms to let than commercial hotels, and were often run by widows.

Coaching inns served passengers from the stage coaches which were the main means of long-

distance passenger transport before railways began to develop in the 1830s. The last surviving

galleried coaching inn in London is the George Inn which now belongs to the National Trust.

A few hotels of a more modern variety began to be built in the early 1900s, the Norfolk Hotel among them. The precursor of Claridge's opened its doors in 1812 but, by the mid-19th century, had closed down due to bankruptcy and gone into receivership. The Norfolk Hotel (Fairmont) is one of the most characterful hotels in East Africa.

Older than the London Ritz, the Norfolk started life on Christmas Day, 1904. Apart from being one of the oldest hotels in Kenya, and indeed in the whole African region, the Norfolk has gone through various transitions over the decades, from restyling and renovations to changes of ownership. It is dedicated to offering services to tourists from all over the world, including local tourists. It offers a wide range of services: conference rooms, accommodation for tourists, hearty meals, entertainment such as live performances by local and international singers, and sports such as golf (the major sport), boat riding and camel riding, to name but a few. It is managed by experienced personnel and has about 500 employees. Because of tourists' thirst to explore Kenya and the beautiful, cool nature of the country, especially the Rift Valley, the hotel business has picked up. The Norfolk is among the top hotels in the region, hence the need to deliver the best services efficiently. The company currently uses a system that does not have data mining features. The directors also intend to expand the hotel to other towns in the near future.

The aims of the company are:

To provide employment.

To promote both domestic and foreign tourism.

To help tourists explore nature and the different cultures of Kenya.

1.2 STATEMENT OF THE PROBLEM.

Norfolk Hotel Nairobi (Fairmont Hotel) deals with hundreds of customers and carries out thousands of computerized transactions a year. How to use this data has become a complex problem for managers, who ask what should come next after all this accumulation. As the scope of the hotel widens, there is a need to control costs and understand the service mixes of different customer groups. Knowing how to price services and how to target certain services at particular groups of people is very important for the growth of the hotel.

Each customer has different buying habits, which may also vary with the seasons. Tracking these habits is as complex as inferring information from them.

1.2 PROBLEM SOLUTION.

The above problems can adequately be solved by developing a data mining system. The system

will be able to perform among others the following functions:

1. Use the data mining capacity of the software to analyze tourist request patterns.

2. Provide information from the patterns, associations, or relationships among all this data. For example, analysis of the hotel's point-of-sale transaction data can yield information on which services are selling and when.

3. Convert information into knowledge about historical patterns and future trends. For example, summary information on service sales can be analyzed in light of promotional efforts to provide knowledge of tourist buying behavior, and thus help determine which services are most susceptible to promotional efforts.

4. Enable Norfolk Hotel Nairobi (Fairmont Hotel) to determine relationships among "internal" factors such as price, service delivery, or staff skills, and "external" factors such as economic indicators, competition, and customer demographics.

5. Enable the hotel to determine the impact on sales, customer satisfaction, and the hotel's profits.

6. Finally, enable management to "drill down" into summary information to view detailed transactional data.

1.3 RESEARCH OBJECTIVES.

1.3.1 General Objective

To develop a data mining system capable of mining a hotel's large, complex dataset and inferring conclusions from it.



1.3.2 Specific Objectives

i. To research data mining as an aid to recognizing patterns in the hotel's data.

ii. To investigate the recent state of the tourist market in Kenya and its growth capacity.

1.4 RESEARCH QUESTIONS

i. What, in general terms, is data mining?

ii. What are the foundations of data mining?

iii. What can data mining do for an underlying business venture?

iv. What are the most commonly used techniques in data mining?

v. What is the essence of visualizing data mining models?

1.5 Justification of the study

This study is intended to help management deal with the ever-growing problem of acquiring large volumes of data from complex databases in their organizations. It aims to simplify the work of top-level management in handling such data and to minimize the time wasted in trying to maneuver through the said systems.

The study will also help employees learn how the system will enable Norfolk Hotel Nairobi (Fairmont Hotel) to determine relationships among "internal" factors such as price, service delivery, or staff skills, and "external" factors such as economic indicators, competition, and customer demographics.

In addition, present and future researchers will have a starting point on ways of handling large amounts of information to their advantage.

1.6 Scope of the study

The research will be carried out at Norfolk Hotel Nairobi (Fairmont Hotel) in the Nairobi CBD area and at adjacent hotels and restaurants.

1.7 Limitations and delimitations of the study

Budget – due to the distance from my learning institution to the research venue, finances will be a major setback, but with some funds from my parents I will manage.

Insufficient time to obtain research data – the findings will therefore give a short-run overview and not the general trend of events.

Time – the short time allowed for completing the research proposal will be a great hindrance.

Primary data.

I intend to use:

Questionnaires.

Interviews.

Observations.

Secondary Sources

Internet - a useful source of information in the research area, because it documents many systems already in use.

Books - these give adequate information on topics such as how hotel automation would enhance the services provided to users.

1.7 REQUIREMENTS

Hardware requirement

Pentium 4 PC.

Mouse and keyboard

Printer.

2 GB flash disk for backup.

128 MB of



CHAPTER 2

2.0 DATA MINING

2.1 WHAT IS DATA MINING? OVERVIEW

1. Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.

2. Data mining is the process of sorting through large amounts of data and picking out relevant information. It is usually used by business intelligence organizations and financial analysts, but is increasingly being used in the sciences to extract information from the enormous data sets generated by modern experimental and observational methods. W. Frawley and G. Piatetsky-Shapiro (1992) described data mining as "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data", and D. Hand, H. Mannila and P. Smyth (2001) described it as "the science of extracting useful information from large data sets or databases". Kantardzic, Mehmed (2003) defined data mining, in relation to enterprise resource planning, as the statistical and logical analysis of large sets of transaction data, looking for patterns that can aid decision making.

3. Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.

Most companies already collect and refine massive quantities of data. Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems as they are brought on-line. When implemented on high performance client/server or parallel processing computers, data mining tools can analyze massive databases to deliver answers to questions such as, "Which clients are most likely to respond to my next promotional mailing, and why?"

2.2 DATA, INFORMATION, AND KNOWLEDGE

2.2.1 Data

Data are any facts, numbers, or text that can be processed by a computer. Today, organizations are accumulating vast and growing amounts of data in different formats and different databases. This includes:

operational or transactional data, such as sales, cost, inventory, payroll, and accounting

non-operational data, such as industry sales, forecast data, and macroeconomic data

metadata: data about the data itself, such as logical database design or data dictionary definitions

2.2.2 Information

The patterns, associations, or relationships among all this data can provide information. For example, analysis of retail point of sale transaction data can yield information on which products are selling and when.
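As a minimal sketch of how point-of-sale records might be turned into "what is selling and when", the following Python fragment counts sales per service per month. The record layout, the service names and the function `sales_by_month` are hypothetical and chosen only for illustration; they do not describe the hotel's actual system.

```python
from collections import Counter

# Hypothetical point-of-sale records: (month, service) pairs.
transactions = [
    ("2019-01", "accommodation"),
    ("2019-01", "conference room"),
    ("2019-01", "accommodation"),
    ("2019-02", "golf"),
    ("2019-02", "accommodation"),
    ("2019-02", "golf"),
]

def sales_by_month(records):
    """Count how many times each service was sold in each month."""
    counts = Counter(records)
    summary = {}
    for (month, service), n in counts.items():
        summary.setdefault(month, {})[service] = n
    return summary

summary = sales_by_month(transactions)
for month in sorted(summary):
    # The best seller is the service with the highest count in that month.
    best = max(summary[month], key=summary[month].get)
    print(month, "best seller:", best, summary[month])
```

The raw pairs are the data; the per-month counts are the information derived from them.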

2.2.3 Knowledge

Information can be converted into knowledge about historical patterns and future trends. For example, summary information on retail supermarket sales can be analyzed in light of promotional efforts to provide knowledge of consumer buying behavior. Thus, a manufacturer or retailer could determine which items are most susceptible to promotional efforts.

2.2.4 Data Warehouses

Dramatic advances in data capture, processing power, data transmission, and storage capabilities are enabling organizations to integrate their various databases into data warehouses. Data warehousing is defined as a process of centralized data management and retrieval. Data warehousing, like data mining, is a relatively new term although the concept itself has been around for years. Data warehousing represents an ideal vision of maintaining a central repository of all organizational data. Centralization of data is needed to maximize user access and analysis. Dramatic technological advances are making this vision a reality for many companies. And, equally dramatic advances in data analysis software are allowing users to access this data freely. The data analysis software is what supports data mining.

2.3 THE FOUNDATIONS OF DATA MINING



Data mining techniques are the result of a long process of research and product development. This evolution began when business data was first stored on computers, continued with improvements in data access, and more recently, generated technologies that allow users to navigate through their data in real time. Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery. Data mining is ready for application in the business community because it is supported by three technologies that are now sufficiently mature:

Massive data collection

Powerful multiprocessor computers

Data mining algorithms

Commercial databases are growing at unprecedented rates. A survey of data warehouse projects found that 19% of respondents are beyond the 200 gigabyte level, while 59% expect to be there by the start of the new millennium. [1] In some industries, such as retail, these numbers can be much larger. The accompanying need for improved computational engines can now be met in a cost-effective manner with parallel multiprocessor computer technology. Data mining algorithms embody techniques that have existed for at least 10 years, but have only recently been implemented as mature, reliable, understandable tools that consistently outperform older statistical methods.

In the evolution from business data to business information, each new step has built upon the previous one. For example, dynamic data access is critical for drill-through in data navigation applications, and the ability to store large databases is critical to data mining. From the user's point of view, the four steps listed in Table 1 were revolutionary because they allowed new business questions to be answered accurately and quickly.

| Evolutionary Step | Business Question | Enabling Technologies | Product Providers | Characteristics |
|---|---|---|---|---|
| Data Collection (1960s) | "What was my total revenue in the last five years?" | Computers, tapes, disks | IBM, CDC | Retrospective, static data delivery |
| Data Access (1980s) | "What were unit sales in New England last March?" | Relational databases (RDBMS), Structured Query Language (SQL), ODBC | Oracle, Sybase, Informix, IBM, Microsoft | Retrospective, dynamic data delivery at record level |
| Data Warehousing & Decision Support (1990s) | "What were unit sales in New England last March? Drill down to Boston." | On-line analytic processing (OLAP), multidimensional databases, data warehouses | Pilot, Comshare, Arbor, Cognos, Microstrategy | Retrospective, dynamic data delivery at multiple levels |
| Data Mining (Emerging Today) | "What's likely to happen to Boston unit sales next month? Why?" | Advanced algorithms, multiprocessor computers, massive databases | Pilot, Lockheed, IBM, SGI, numerous startups (nascent industry) | Prospective, proactive information delivery |

Table 1. Steps in the Evolution of Data Mining.

The core components of data mining technology have been under development for decades, in research areas such as statistics, artificial intelligence, and machine learning. Today, the maturity of these techniques, coupled with high-performance relational database engines and broad data integration efforts, makes these technologies practical for current data warehouse environments.

2.4 WHAT CAN DATA MINING DO?

Data mining is primarily used today by companies with a strong consumer focus: retail, financial, communication, and marketing organizations. It enables these companies to determine relationships among "internal" factors such as price, product positioning, or staff skills, and "external" factors such as economic indicators, competition, and customer demographics. It also enables them to determine the impact on sales, customer satisfaction, and corporate profits. Finally, it enables them to "drill down" into summary information to view detailed transactional data.

2.4.1 The Scope of Data Mining

Data mining derives its name from the similarities between searching for valuable business information in a large database (for example, finding linked products in gigabytes of store scanner data) and mining a mountain for a vein of valuable ore. Both processes require either sifting through an immense amount of material, or intelligently probing it to find exactly where the value resides. Given databases of sufficient size and quality, data mining technology can generate new business opportunities by providing these capabilities:

Automated prediction of trends and behaviors. Data mining automates the process of finding predictive information in large databases. Questions that traditionally required extensive hands-on analysis can now be answered directly and quickly from the data. A typical example of a predictive problem is targeted marketing. Data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings. Other predictive problems include forecasting bankruptcy and other forms of default, and identifying segments of a population likely to respond similarly to given events.

Automated discovery of previously unknown patterns. Data mining tools sweep through databases and identify previously hidden patterns in one step. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together. Other pattern discovery problems include detecting fraudulent credit card transactions and identifying anomalous data that could represent data entry keying errors.

Data mining techniques can yield the benefits of automation on existing software and hardware platforms, and can be implemented on new systems as existing platforms are upgraded and new products developed. When data mining tools are implemented on high-performance parallel processing systems, they can analyze massive databases in minutes. Faster processing means that users can automatically experiment with more models to understand complex data. High speed makes it practical for users to analyze huge quantities of data. Larger databases, in turn, yield improved predictions.

Databases can be larger in both depth and breadth:

More columns. Analysts must often limit the number of variables they examine when doing hands-on analysis due to time constraints. Yet variables that are discarded because they seem unimportant may carry information about unknown patterns. High-performance data mining allows users to explore the full depth of a database, without pre-selecting a subset of variables.

More rows. Larger samples yield lower estimation errors and variance, and allow users

to make inferences about small but important segments of a population.
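The claim that more rows reduce estimation error can be made concrete with the textbook result for a sample mean. The symbols below are introduced here for illustration only and assume n independent observations with a common variance σ²:

```latex
\operatorname{Var}(\bar{x}) = \frac{\sigma^{2}}{n},
\qquad
\operatorname{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}}
```

Under these assumptions, quadrupling the number of rows halves the standard error of the estimate, which is why a rare customer segment can only be studied reliably once the database is large enough to contain many of its members.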

The most commonly used techniques in data mining are:

Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.

Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID).

Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of evolution.

Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). Sometimes called the k-nearest neighbor technique.

Rule induction: The extraction of useful if-then rules from data based on statistical significance.
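To illustrate the nearest neighbor method from the list above, here is a minimal sketch that classifies a new record by majority vote among its k most similar historical records. The guest data, the two features and the use of Euclidean distance as the similarity measure are assumptions made for this example, not part of the proposed system.

```python
import math
from collections import Counter

def knn_classify(history, query, k=3):
    """Classify `query` by majority vote among its k nearest historical records.

    history: list of (features, label) pairs; features are numeric tuples.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Sort historical records by Euclidean distance to the query, keep the k closest.
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical guest records: (nights stayed, spend per night in USD) -> segment.
history = [
    ((1, 120), "business"),
    ((2, 150), "business"),
    ((1, 140), "business"),
    ((7, 80), "leisure"),
    ((10, 90), "leisure"),
    ((8, 70), "leisure"),
]

print(knn_classify(history, (2, 130), k=3))
print(knn_classify(history, (9, 85), k=3))
```

In practice the features would usually be rescaled first, since a raw distance lets the feature with the largest numeric range (here, spend) dominate the comparison.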

Many of these technologies have been in use for more than a decade in specialized analysis tools that work with relatively small volumes of data. These capabilities are now evolving to integrate directly with industry-standard data warehouse and OLAP platforms. The appendix to this white paper provides a glossary of data mining terms.

2.5 HOW DOES DATA MINING WORK?

While large-scale information technology has been evolving separate transaction and analytical systems, data mining provides the link between the two. Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries. Several types of analytical software are available: statistical, machine learning, and neural networks. Generally, any of four types of relationships are sought:

Classes: Stored data is used to locate data in predetermined groups. For example, a restaurant chain could mine customer purchase data to determine when customers visit and what they typically order. This information could be used to increase traffic by having daily specials.

Clusters: Data items are grouped according to logical relationships or consumer preferences. For example, data can be mined to identify market segments or consumer affinities.

Associations: Data can be mined to identify associations. The beer-diaper example, in which shoppers who buy diapers are found to often buy beer as well, is the classic illustration of associative mining.

Sequential patterns: Data is mined to anticipate behavior patterns and trends. For example, an outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes.
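The association relationship described above is commonly quantified with two measures, support and confidence. The sketch below computes both for a handful of made-up shopping baskets; the basket contents and function names are hypothetical and serve only to show the arithmetic.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Estimated share of baskets with `antecedent` that also hold `consequent`."""
    base = support(baskets, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs in the baskets")
    return support(baskets, set(antecedent) | set(consequent)) / base

# Hypothetical baskets, each a set of purchased items.
baskets = [
    {"beer", "diapers", "crisps"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "crisps"},
    {"beer", "diapers", "milk"},
]

print("support(beer, diapers):", support(baskets, {"beer", "diapers"}))
print("confidence(diapers -> beer):", confidence(baskets, {"diapers"}, {"beer"}))
```

Here three of the five baskets contain both items, and three of the four baskets with diapers also contain beer. An association mining tool searches for item combinations where both measures exceed chosen thresholds.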

Data mining consists of five major elements:

Extract, transform, and load transaction data onto the data warehouse system.

Store and manage the data in a multidimensional database system.

Provide data access to business analysts and information technology professionals.

Analyze the data by application software.

Present the data in a useful format, such as a graph or table.

Different levels of analysis are available:



Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.

Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution.

Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID). CART and CHAID are decision tree techniques used for classification of a dataset. They provide a set of rules that you can apply to a new (unclassified) dataset to predict which records will have a given outcome. CART segments a dataset by creating 2-way splits while CHAID segments using chi square tests to create multi-way splits. CART typically requires less data preparation than CHAID.

Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). Sometimes called the k-nearest neighbor technique.

Rule induction: The extraction of useful if-then rules from data based on statistical significance.

Data visualization: The visual interpretation of complex relationships in multidimensional data. Graphics tools are used to illustrate data relationships.
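The 2-way splits mentioned for CART above can be illustrated with a small sketch that searches one numeric variable for the threshold giving the purest pair of groups, measured by Gini impurity. This is a simplified, single-split illustration under assumed data, not a full CART implementation (there is no recursion, pruning, or handling of categorical variables), and the variable names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_binary_split(values, labels):
    """Return (threshold, weighted_gini) of the best 2-way split `value <= threshold`."""
    best = (None, gini(labels))  # start from the impurity of the unsplit data
    n = len(values)
    # Splitting at the largest value would leave the right-hand group empty.
    for t in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical data: nights stayed and whether the guest booked a conference room.
nights = [1, 2, 2, 3, 7, 8, 10]
booked = ["yes", "yes", "yes", "yes", "no", "no", "no"]

threshold, impurity = best_binary_split(nights, booked)
print("split at nights <=", threshold, "with weighted Gini", impurity)
```

The resulting threshold reads directly as an if-then rule of the kind the text describes: guests staying at most that many nights fall in one class, longer stays in the other.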

How exactly is data mining able to tell you important things that you didn't know or what is going to happen next? The technique that is used to perform these feats in data mining is called modeling. Modeling is simply the act of building a model in one situation where you know the answer and then applying it to another situation where you don't. For instance, if you were looking for a sunken Spanish galleon on the high seas, the first thing you might do is research the times when Spanish treasure had been found by others in the past. You might note that these ships often tend to be found off the coast of Bermuda and that there are certain characteristics to the ocean currents, and certain routes that have likely been taken by the ships' captains in that era. You note these similarities and build a model that includes the characteristics that are common to the locations of these sunken treasures. With these models in hand you sail off looking for treasure where your model indicates it most likely might be, given a similar situation in the past. Hopefully, if you've got a good model, you find your treasure.

This act of model building is thus something that people have been doing for a long time, certainly before the advent of computers or data mining technology. What happens on computers, however, is not much different from the way people build models. Computers are loaded up with lots of information about a variety of situations where an answer is known, and then the data mining software on the computer must run through that data and distill the characteristics of the data that should go into the model. Once the model is built it can then be used in similar situations where you don't know the answer. For example, say that you are the director of marketing for a telecommunications company and you'd like to acquire some new long distance phone customers. You could just randomly go out and mail coupons to the general population, just as you could randomly sail the seas looking for sunken treasure. In neither case would you achieve the results you desired, and of course you have the opportunity to do much better than random: you could use your business experience stored in your database to build a model.

As the marketing director you have access to a lot of information about all of your customers: their age, sex, credit history and long distance calling usage. The good news is that you also have a lot of information about your prospective customers: their age, sex, credit history etc. Your problem is that you don't know the long distance calling usage of these prospects (since they are most likely now customers of your competition). You'd like to concentrate on those prospects who have large amounts of long distance usage. You can accomplish this by building a model. Table 2 illustrates the data used for building a model for new customer prospecting in a data warehouse.

| | Customers | Prospects |
|---|---|---|
| General information (e.g. demographic data) | Known | Known |
| Proprietary information (e.g. customer transactions) | Known | Target |

Table 2 - Data Mining for Prospecting

The goal in prospecting is to make some calculated guesses about the information in the lower right hand quadrant based on the model that we build going from Customer General Information to Customer Proprietary Information. For instance, a simple model for a telecommunications company might be:

98% of my customers who make more than $60,000/year spend more than $80/month on long distance

This model could then be applied to the prospect data to try to tell something about the proprietary information that this telecommunications company does not currently have access to. With this model in hand new customers can be selectively targeted.
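A rule of this form can be checked against customer records and then applied to prospects, as sketched below. The field names, the sample figures and both functions are hypothetical; the percentage this toy data produces has no connection to the 98% quoted in the text.

```python
def rule_confidence(customers, min_income, min_spend):
    """Share of customers above `min_income` who also spend more than `min_spend`."""
    matching = [c for c in customers if c["income"] > min_income]
    if not matching:
        return 0.0
    hits = sum(c["monthly_spend"] > min_spend for c in matching)
    return hits / len(matching)

def select_prospects(prospects, min_income):
    """Names of prospects whose known income satisfies the rule's condition."""
    return [p["name"] for p in prospects if p["income"] > min_income]

# Hypothetical customers: both income and long distance spend are known.
customers = [
    {"income": 75000, "monthly_spend": 95},
    {"income": 82000, "monthly_spend": 110},
    {"income": 64000, "monthly_spend": 60},
    {"income": 41000, "monthly_spend": 20},
]

# Hypothetical prospects: income is known, spend is the unknown target.
prospects = [
    {"name": "prospect A", "income": 90000},
    {"name": "prospect B", "income": 35000},
]

print("rule holds for", rule_confidence(customers, 60000, 80), "of matching customers")
print("prospects to target:", select_prospects(prospects, 60000))
```

The first function works on the "Known/Known" column of Table 2, and the second uses only the general information available for prospects, which is exactly the division the table describes.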

Test marketing is an excellent source of data for this kind of modeling. Mining the results of a test market representing a broad but relatively small sample of prospects can provide a foundation for identifying good prospects in the overall market. Table 3 shows another common scenario for building models: predict what is going to happen in the future.

| | Yesterday | Today | Tomorrow |
|---|---|---|---|
| Static information and current plans (e.g. demographic data, marketing plans) | Known | Known | Known |
| Dynamic information (e.g. customer transactions) | Known | Known | Target |

Table 3 - Data Mining for Predictions

If someone told you that he had a model that could predict customer usage, how would you know if he really had a good model? The first thing you might try would be to ask him to apply his model to your customer base, where you already knew the answer. With data mining, the best way to accomplish this is by setting aside some of your data in a vault to isolate it from the mining process. Once the mining is complete, the results can be tested against the data held in the vault to confirm the model's validity. If the model works, its observations should hold for the vaulted data.
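The "vault" idea corresponds to what is usually called a holdout set. The sketch below sets aside a quarter of some made-up records, fits a deliberately simple threshold model on the rest, and then scores it on the vaulted records. The data, the 25% fraction, the fixed random seed and the toy fitting step are all assumptions for illustration.

```python
import random

def holdout_split(records, holdout_fraction=0.25, seed=0):
    """Shuffle a copy of `records` and set aside a fraction as the vault."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_vault = int(len(shuffled) * holdout_fraction)
    return shuffled[n_vault:], shuffled[:n_vault]  # (mining set, vault)

def fit_threshold(mining_set):
    """Toy 'mining' step: the smallest income seen among known heavy users."""
    return min(income for income, heavy in mining_set if heavy)

def accuracy(model, records):
    """Fraction of (income, answer) records the model predicts correctly."""
    return sum(model(x) == y for x, y in records) / len(records)

# Hypothetical records: (income, whether the customer is a heavy user).
records = [(income, income > 60000) for income in range(20000, 100000, 4000)]

mining_set, vault = holdout_split(records)
threshold = fit_threshold(mining_set)  # the vault is never seen here
model = lambda income: income >= threshold

print("accuracy on mining set:", accuracy(model, mining_set))
print("accuracy on vault:", accuracy(model, vault))
```

The score on the mining set is perfect by construction, which is why it says little about the model; only the score on the vaulted records indicates how the model would behave on data it has not seen.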

Today, data mining applications are available on all size systems for mainframe, client/server, and PC platforms. System prices range from several thousand dollars for the smallest applications up to $1 million a terabyte for the largest. Enterprise-wide applications generally range in size from 10 gigabytes to over 11 terabytes. NCR has the capacity to deliver applications exceeding 100 terabytes. There are two critical technological drivers:

Size of the database: the more data being processed and maintained, the more powerful the system required.

Query complexity: the more complex the queries and the greater the number of queries

being processed, the more powerful the system required.

Relational database storage and management technology is adequate for many data mining applications less than 50 gigabytes. However, this infrastructure needs to be significantly enhanced to support larger applications. Some vendors have added extensive indexing capabilities to improve query performance. Others use new hardware architectures such as Massively Parallel Processors (MPP) to achieve order-of-magnitude improvements in query time. For example, MPP systems from NCR link hundreds of high-speed Pentium processors to achieve performance levels exceeding those of the largest supercomputers.

2.7 AN ARCHITECTURE FOR DATA MINING

To best apply these advanced techniques, they must be fully integrated with a data warehouse as well as flexible interactive business analysis tools. Many data mining tools currently operate outside of the warehouse, requiring extra steps for extracting, importing, and analyzing the data. Furthermore, when new insights require operational implementation, integration with the warehouse simplifies the application of results from data mining. The resulting analytic data warehouse can be applied to improve business processes throughout the organization, in areas such as promotional campaign management, fraud detection, new product rollout, and so on. Figure 1 illustrates an architecture for advanced analysis in a large data warehouse.

Figure 1 - Integrated Data Mining Architecture

The ideal starting point is a data warehouse containing a combination of internal data tracking all customer contact coupled with external market data about competitor activity. Background information on potential customers also provides an excellent basis for prospecting. This warehouse can be implemented in a variety of relational database systems (Sybase, Oracle, Redbrick, and so on) and should be optimized for flexible and fast data access.

An OLAP (On-Line Analytical Processing) server enables a more sophisticated end-user business model to be applied when navigating the data warehouse. The multidimensional structures allow users to analyze the data as they want to view their business, summarizing by product line, region, and other key perspectives of their business. The Data Mining Server must be integrated with the data warehouse and the OLAP server to embed ROI-focused business analysis directly into this infrastructure. An advanced, process-centric metadata template defines the data mining objectives for specific business issues like campaign management, prospecting, and promotion optimization. Integration with the data warehouse enables operational decisions to be directly implemented and tracked. As the warehouse grows with new decisions and results, the organization can continually mine the best practices and apply them to future decisions.
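The summarizing by product line and region mentioned above can be sketched as a simple roll-up over warehouse rows. This is only an illustration in Python: the field names (`product_line`, `region`, `revenue`) and all amounts are hypothetical, and a real OLAP server would precompute and index such aggregates rather than scan rows.

```python
from collections import defaultdict

def roll_up(facts, dimensions, measure):
    """Sum `measure` over every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[d] for d in dimensions)
        totals[key] += fact[measure]
    return dict(totals)

# Hypothetical warehouse rows; all field names and amounts are made up.
sales = [
    {"product_line": "Rooms",      "region": "Kenya",  "revenue": 500},
    {"product_line": "Rooms",      "region": "Europe", "revenue": 900},
    {"product_line": "Restaurant", "region": "Kenya",  "revenue": 200},
    {"product_line": "Restaurant", "region": "Europe", "revenue": 150},
    {"product_line": "Rooms",      "region": "Kenya",  "revenue": 300},
]

by_line = roll_up(sales, ["product_line"], "revenue")
by_line_and_region = roll_up(sales, ["product_line", "region"], "revenue")
print(by_line)
print(by_line_and_region)
```

Passing a longer list of dimensions gives a finer view of the same data, which is the sense in which the user chooses the perspective from which to view the business.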



This design represents a fundamental shift from conventional decision support systems. Rather than simply delivering data to the end user through query and reporting software, the Advanced Analysis Server applies users’ business models directly to the warehouse and returns a proactive analysis of the most relevant information. These results enhance the metadata in the OLAP Server by providing a dynamic metadata layer that represents a distilled view of the data. Reporting, visualization, and other analysis tools can then be applied to plan future actions and confirm the impact of those plans.

2.8 VISUALIZING DATA MINING MODELS

The point of data visualization is to let the user understand what is going on. Since data mining usually involves extracting "hidden" information from a database, this understanding process can get somewhat complicated. In most standard database operations, nearly everything the user sees is something that they knew existed in the database already. A report showing the breakdown of sales by product and region is straightforward for the user to understand because they intuitively know that this kind of information already exists in the database. If the company sells different products in different regions of the country, there is no problem translating a display of this information into a relevant understanding of the business process.

Data mining, on the other hand, extracts information from a database that the user did not already know about. Useful relationships between variables that are non-intuitive are the jewels that data mining hopes to locate. Since the user does not know beforehand what the data mining process has discovered, it is a much bigger leap to take the output of the system and translate it into an actionable solution to a business problem. Since there are usually many ways to graphically represent a model, the visualizations that are used should be chosen to maximize the value to the viewer. This requires that we understand the viewer's needs and design the visualization with that end-user in mind. If we assume that the viewer is an expert in the subject area but not in data modeling, we must translate the model into a more natural representation for them. For this purpose we suggest the use of orienteering principles as a template for our visualizations.

2.9 DATA MINING AND CUSTOMER RELATIONSHIP MANAGEMENT

Customer relationship management (CRM) is a process that manages the interactions between a company and its customers. The primary users of CRM software applications are database marketers who are looking to automate the process of interacting with customers.

To be successful, database marketers must first identify market segments containing customers or prospects with high-profit potential. They then build and execute campaigns that favorably impact the behavior of these individuals.

The first task, identifying market segments, requires significant data about prospective customers and their buying behaviors. In theory, the more data the better. In practice, however, massive data stores often impede marketers, who struggle to sift through the minutiae to find the nuggets of valuable information.

Recently, marketers have added a new class of software to their targeting arsenal. Data mining applications automate the process of searching the mountains of data to find patterns that are good predictors of purchasing behaviors.



After mining the data, marketers must feed the results into campaign management software that, as the name implies, manages the campaign directed at the defined market segments.

In the past, the link between data mining and campaign management software was mostly manual. In the worst cases, it involved "sneaker net," creating a physical file on tape or disk, which someone then carried to another computer and loaded into the marketing database.

This separation of the data mining and campaign management software introduces considerable inefficiency and opens the door for human errors. Tightly integrating the two disciplines presents an opportunity for companies to gain competitive advantage.

2.9.1 How Data Mining Helps Database Marketing

Data mining helps marketing users to target marketing campaigns more accurately, and also to align campaigns more closely with the needs, wants, and attitudes of customers and prospects.

If the necessary information exists in a database, the data mining process can model virtually any customer activity. The key is to find patterns relevant to current business problems.

Typical questions that data mining addresses include the following:

Which customers are most likely to drop their cell phone service?

What is the probability that a customer will purchase at least Ksh1000 worth of merchandise from a particular mail-order catalog?

Which prospects are most likely to respond to a particular offer?

Answers to these questions can help retain customers and increase campaign response rates, which, in turn, increase buying, cross-selling, and return on investment (ROI).
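As a rough illustration of the last question, segments can be ranked by how often they responded to past offers. The Python sketch below is a descriptive ranking, not a predictive model, and the segment names and counts are hypothetical.

```python
def rank_segments(campaign_history):
    """Order segments by past response rate, highest first."""
    rates = {
        segment: responded / contacted
        for segment, (responded, contacted) in campaign_history.items()
        if contacted > 0  # skip segments that were never contacted
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)

# Hypothetical counts: segment -> (responses received, offers sent)
campaign_history = {
    "business": (40, 200),
    "leisure": (30, 300),
    "conference": (25, 100),
}

ranking = rank_segments(campaign_history)
for segment, rate in ranking:
    print(f"{segment}: {rate:.0%}")
```

A data mining tool would go further by learning which customer attributes explain the differences in response, but the ranking already shows where a campaign budget is likely to be best spent.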



CHAPTER 3

3.0 DATA COLLECTION AND ANALYSIS.

3.1 Data Collection Techniques.

These were the fact-finding techniques that I used to collect data about the requirements of the system that I proposed.

The methods I used included:

Questionnaire.

Observation

Interview

The main technique used was the questionnaire, as most of the Green Park Golf and Country Club staff members are usually very busy and did not have time for an interview. Interviews were used to ensure an accurate and comprehensive investigation. I mainly interviewed the hotel manager.

3.1.1 Questionnaire.

Questionnaires were prepared to collect information about various aspects of the system from various respondents. They were designed and issued to a selected population. The use of a standardized questionnaire helped to yield more reliable data than other fact-finding techniques, and the wide distribution ensured greater anonymity for respondents, which led to more honest responses.

The questionnaires included both open-ended and closed-ended questions, which were filled in by the respondents. Open-ended questions were used to learn about feelings, opinions, and general experiences, or to explore a process or a problem. Closed-ended questions controlled the frame of reference by presenting respondents with specific responses from which to choose.

The responses were not subjected to any influence, as the respondents filled in the questionnaires themselves, and nearly all the respondents responded in time.



3.1.2 Observation.

Observation enables the analyst to have an inside view of the system operations rather than an outside view of the system.

In this case, observation was used to capture requirements that might be overlooked by both the interview and the questionnaire, and to tell achievable requirements apart from non-achievable ones. The areas that the developer observed were the physical performance of tasks and the day-to-day activities that go on there, including the customer registration task. The other reason for using observation is that some workers may not be able to describe some work experience in exact terms and may distort some facts about the system.

Advantages

Fast response time.

Enables the analyst to verify information obtained by other methods.

Disadvantages.

Time consuming.

Exaggeration may occur.

Summary of the results gathered from the questionnaires and analysis using pie charts and bar charts

3.1.3 Interview

An interview was conducted with one of the company’s managers, and the following needs were identified:

a) There was a need to analyze tourist request patterns.

b) There was a need to find ways to expand the customer base.

After the interview, I realized that the proposed system would enhance the company’s business activity in the following ways:

a) The proposed system would use the data mining capacity of the software to analyze tourist request patterns.

b) The patterns, associations, or relationships among all this data can provide information which can be converted into knowledge about historical patterns and future trends.

c) It would lead to improved profitability due to the enlarged customer base.

d) It would lead to enhanced stock variety, leading directly to improved profitability.



Advantages of using interviews:

The analyst can frame questions differently for individuals depending on their levels of understanding, which allows detailed fact gathering.

The analyst can observe non-verbal communication from the respondents or interviewees.

The response rate tends to be high.

Interviews provide an immediate response.

3.2 ANALYSIS OF THE COLLECTED DATA.

The research findings were analyzed to get the actual information about Green Park Hotel and Country Club. The table below shows the people who were interviewed, the roles they play in the company, and the number interviewed in each role.

Person interviewed | Role in the company | Number of interviewees
Manager | Manages the hotel’s resources | 1
Chief Chef | In charge of the food department | 2
Cashier | Takes the money and issues receipts | 3
Waiters | Offer services in the hotel | 7
Housekeepers | In charge of room maintenance | 5
Receptionist | Front office personnel | 2
Total | | 20



Summary

The majority of the employees that responded were the waiters. They complained about the slowness of services due to data accumulation.

The cashiers also helped to analyze the system, since they use the current system in day-to-day operations.

For the manager, the main concern was the lack of knowledge about customers’ future trends, because the current system lacks mining tools that could help in decision making.

The Chief Chef was also concerned because the current system could not show which food the customers preferred and ordered, and at which time, so as to know their trends.

3.3 FINDINGS AND RESULTS ANALYSIS

The respondents answered all the questions and yielded the following:

No. of respondents | Question | Answer | Conclusion
20 | Are you an employee of Green Park Hotel and Country Club? | All the respondents were employees of the hotel. | All were employees.
20 | If yes, which position do you hold in Green Park Hotel and Country Club? | Manager: 1; Chief chef: 2; Cashier: 8; Others: 9 | -
20 | How long have you been working in Green Park Hotel and Country Club? | Less than 1 year: 1; 1-3 years: 10; More than 3 years: 9 | Majority of them have been in the hotel for less than 3 years.
20 | Are you computer literate? | Yes: 17; No: 3 | Many of them were computer literate.
20 | For how long has Green Park Hotel and Country Club accumulated data? | Most of them felt that it was for quite a long time. | -
20 | Which season do you get many customers? | Summer: 12; Winter: 2; Spring: 3; Autumn: 2 | Most respondents selected summer as the season with most customers.
18 | On average how many customers do you serve per day? | Less than 100: 0; Between 100-500: 6; Between 500-1000: 12; More than 1000: 1 | Majority selected between 500-1000.
9 | How do you store your data? | In files and cabinets: 2; In a DBMS: 7; Others: 0 | Most chose DBMS.
20 | How many transactions does the hotel make per day? | Less than 100: 0; Between 100-500: 6; Between 500-1000: 12; More than 1000: 1 | -
20 | From which region do most of your customers come? | Within Kenya: 2; Within Eastern Africa: 3; Within the African continent: 9; Worldwide: 6 | Within the African continent.
20 | How fast do you think the employees will adapt to the data mining system? | Very fast: 14; Fast: 3; Slow: 1; Too slow: 0 | The respondents felt that it will be adapted to very fast.
20 | Which of the problems do you think will be solved by the data mining system? | Accumulation of data: 0; Analyzing of customers’ habits: 0; Understanding the customer mix: 3; Pricing and focusing of services to a group of customers: 0; All the above: 20; None of the above: 0 | Employees thought it will solve all the problems that were provided in the question.
19 | Do you think implementing a data mining system will be cheaper or will it bring additional expenses? | It will be cheaper: 12; There are cost implications: 4; Not sure: 3 | Majority thought it will be cheap.
20 | Will the data mining system make good use of resources available in the hotel? | Yes: 17; No: 3 | 85% felt that it will make good use of the resources.
15 | Would you recommend a data mining system for the hotel? | Yes: 12; No: 3 | Majority of them felt that it would be better to implement one.
15 | If yes, which features of the current system need to be improved? | Majority felt that the data storage and also data mining should be implemented. | -
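The percentages quoted in the conclusions can be re-derived from the raw counts. The short Python sketch below does this for two rows of the table above; the helper name `shares` is an illustrative assumption, and each percentage is taken over the answers actually given to that question.

```python
def shares(tally):
    """Convert answer counts into percentages of the responses actually given."""
    total = sum(tally.values())
    return {answer: round(100 * count / total, 1) for answer, count in tally.items()}

# Counts copied from two rows of the findings table.
good_use_of_resources = {"Yes": 17, "No": 3}
recommend_system = {"Yes": 12, "No": 3}

print(shares(good_use_of_resources))
print(shares(recommend_system))
```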

3.4 FINDINGS.



From the respondents’ feedback, it can be concluded that most of the respondents recommended the development of a data mining system that will be able to analyze tourist requests, convert the information into knowledge that helps to anticipate the future trends of the hotel, and enable the management to determine the impact on sales, customer satisfaction and the hotel’s profits.

3.5 Secondary sources of information included:-

i) Internet – Material provided on the internet was used to develop a clear understanding of the use of data mining techniques to analyze tourist request patterns and determine the impact on sales, customer satisfaction, etc. This provided a wide variety of information on the development of a data mining system.

ii) Books – Books, magazines and journals on human-computer interaction also contributed to the collection of the required information.

iii) Analyzing documents – The organization’s documents that were analyzed during the data collection process were grouped into three categories:

Documents that might describe the problem: those analyzed included customer complaints, interoffice memos, suggestion box notes, reports, work measurement reviews and accounting records.

Documents that might describe the business functions: those analyzed included organizational policies, departmental objectives and standard operating procedures.

3.6 FEASIBILITY STUDY

A feasibility study was carried out to determine whether the proposed system was worthwhile.

3.6.1 Economic Feasibility

An economic feasibility study of the system was carried out to establish whether the benefits of the proposed system outweigh the cost of implementing it. It was found that the system is viable, since the organization has the resources required for the implementation of the system. The system will be able to determine the impact on sales and the hotel’s profits.
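The benefit and cost figures reported in this section can be cross-checked with a short calculation. The Python sketch below copies the amounts from the benefits and cost tables. The single-period comparison and the benefit-cost ratio are illustrative assumptions, since the proposal does not state the currency or separate one-off costs (such as development) from recurring ones.

```python
# Amounts copied from the benefits and cost tables of this section.
benefits = {
    "file buying and maintenance saved (yearly)": 35_000,
    "salaries saved (yearly)": 55_000,
    "loss avoidance (yearly)": 20_000,
}
costs = {
    "maintenance": 9_000,
    "training": 10_000,
    "operational": 15_400,
    "development": 27_000,
}

total_benefits = sum(benefits.values())
total_costs = sum(costs.values())
net_benefit = total_benefits - total_costs          # assumed single-period view
benefit_cost_ratio = total_benefits / total_costs   # above 1 means benefits exceed costs

print("total benefits:", total_benefits)
print("total costs:   ", total_costs)
print("net benefit:   ", net_benefit)
print("benefit/cost:  ", round(benefit_cost_ratio, 2))
```

On these figures the benefits exceed the costs by 48,600, which is consistent with the finding that the system is economically viable.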

BENEFITS ANALYSIS TABLE

Benefit | Amount
Costs saved on file buying and maintenance (yearly) | 35,000
Costs saved on salaries (yearly) | 55,000
Loss avoidance (yearly) | 20,000
Total | 110,000

COST ANALYSIS TABLE

Cost | Amount
Maintenance cost | 9,000
Training cost | 10,000
Operational cost | 15,400
Development cost | 27,000
Total | 61,400

3.6.2 Legal Feasibility

The management is keen on the legality of the system and on its registration, if any is required. The use of valid development tools and software is of utmost importance to ensure the system passes all the legal requirements and tests. The work is also copyrighted as original work.

3.6.3 Social Feasibility.

In addition to analyzing tourist request patterns, the system will enhance communication and interaction between the employees and customers.

3.6.4 Operational Feasibility.

An operational feasibility study was carried out to address the question of whether the new system’s operations are acceptable to the users. The study indicated that the intended users support implementation of the system; there was no resistance to the new system, and the users will be able to use it with little training. The system will not affect the existing organizational structure.

The following were the main areas touched on:

The effect of the system on the current organization structure

Implication of the system on existing staff development programmes

Redundancy and retrenchment implications for the employees as a result of the new system.

3.6.5 Technical Feasibility.

A technical feasibility study was conducted to determine whether the proposed system can be implemented using the available hardware, software and technical resources. The study indicated that the institution has enough resources in terms of equipment, personnel and technology, and that there is a high likelihood that the system can be developed.

The technical feasibility study was thus aimed at evaluating the following:

The hardware required for the new system

The software required for the new system

Determination of whether the current facilities are adequate or inadequate for the new system after implementation.

Evaluation of the current technology and how applicable it is to the new system.

The inputs, outputs, files and procedures that the proposed system should have as compared to the outputs, files and procedures for the current system.

3.7 SYSTEM REQUIREMENT SPECIFICATION

Functional Requirements

1. The system should provide an easy and user-friendly interface for the employees’ convenient use.

2. The system should be able to capture and store all the useful data.

3. It should provide full functionality in the sense that it should be able to analyze tourist request patterns.

4. The system should also be able to convert the customers’ patterns, associations, or relationships among all data into knowledge, so as to help in knowing historical patterns and future trends of the customers.

5. The system should be able to determine relationships among internal factors such as price or service delivery and external factors such as economic indicators and competition.

6. The system should enable the management to drill down into summary information to view detailed transactional data.

7. The system should also enable the management to determine the impact on sales, customer satisfaction and the hotel’s profits.
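Requirements 3 and 4 above refer to finding patterns and associations in tourist requests. One simple way to illustrate the idea is to count how often pairs of requests occur together in the same stay. The Python sketch below does this; the request names and the data are hypothetical, and the proposal does not prescribe this particular technique.

```python
from collections import Counter
from itertools import combinations

def pair_support(baskets):
    """Share of baskets in which each pair of requests appears together."""
    counts = Counter()
    for basket in baskets:
        # Sort and de-duplicate so each pair is counted once per stay.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}

# Hypothetical guest requests, one list per stay.
stays = [
    ["airport transfer", "safari booking"],
    ["airport transfer", "safari booking", "spa"],
    ["spa", "late checkout"],
    ["airport transfer", "safari booking"],
]

support = pair_support(stays)
for pair, share in sorted(support.items(), key=lambda item: -item[1])[:3]:
    print(pair, share)
```

Pairs with high support, such as airport transfers requested together with safari bookings in this made-up data, are candidates for bundled offers, which is the kind of knowledge requirement 4 describes.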

Non-Functional Requirements.

1. Reliability

The system will be available to the user whenever required.



2. Maintainability

The system can handle changes in the future at minimal costs and effort.

3. Re-Usability

Modules in the system can be used more than once throughout the system and in future versions

of the same.

4. Resource Utilization.

The system should ensure that all resources are effectively used.

3.8 Legal Requirements

The system shall comply with all legal requirements set out in government law, and all copyrights must be adhered to, which also protects the system from being copied by competitors.

3.8.1 User requirement

To enhance usability, which is an important aspect of web navigation, the website interface should maintain consistency of colors and language and remain simple for users to understand. Users can readily adapt to the system with the aid of the help manual. To reduce repetitive strain injuries, appropriate work design should be implemented, with anti-glare screens in place to protect the users’ eyes and regular breaks for relaxation.

3.8.2 Data Requirements

Input data

The system will require the user to input data through the keyboard and the mouse so that he/she can be authenticated.

Output data

The system will enable querying the database for all the details the system stores and will display them to the user.