marat analysis


Upload: cyrus-bondo

Post on 29-Nov-2014


UNIVERSITY OF DAR ES SALAAM

RESEARCH PROPOSAL FOR THE MASTERS OF SCIENCE IN

COMPUTER SCIENCE DEGREE BY THESIS

STAGE: I & II

1.0. NAME OF CANDIDATE: LUNGO, JUMA H.

Reg.No: HD/TP.1/2000

B.Sc. (Comp.) (Hons.)

(DAR)

2.0. NAME OF SUPERVISOR: 1. Dr. S. C. N. Kitinya

2. Dr. H. M. Twaakyondo

3.0. DEPARTMENT AND FACULTY: DEPARTMENT OF

COMPUTER SCIENCE – FACULTY OF SCIENCE.

4.0. PROPOSED DEGREE: M.Sc. (COMPUTER SCIENCE)

5.0. TITLE:

Design And Implementation of a Data Warehouse

Prototype For The Chief Academic Officer, University Of

Dar Es Salaam Within The Context of Relational Online

Analytical Processing (Data Analysis).


6.0 INTRODUCTION

6.1 GENERAL INTRODUCTION

IBM first published a technical article on information warehouse strategy in 1988 (Ballard, Chuck. 1996). This is a strategy for satisfying business needs for complex queries and insightful information with a managed database. In 1990, William Inmon (Inmon, W. H. 1997) coined the phrase “data warehouse”. The ultimate goal of data warehousing is the creation of a single, logical view of data that may reside in many physically disparate databases (Butler Group. 1996). “…traditional database systems are good at recording and reporting what happened. A data warehouse shows why” (Fisher, Lawrence. 1996).

Data warehouses represent the latest major paradigm in database management. The earliest data management systems were hierarchical, ran on massive mainframes, and were used primarily for archival purposes. The first big change came in the early 1980s with the adoption of relational database systems, which serve primarily operational applications. These systems, typically run on minicomputers, are used for online transaction processing (OLTP), for example to operate networks of automated teller machines. Now come data warehouses, commonly run on client/server networks of personal computers and more powerful server machines. These latest systems are used for online analytical processing (OLAP), an essentially strategic application.


Data warehouses organize and store data from the operational environment over a long historical time perspective. Consequently, they make data originating in the operational environment available for analysis. A data warehouse allows users to recognize the data they want and, using simple query tools, to create their own queries against a solid repository of integrated, historical data.

The concept of a data warehouse is that it is a place where data extracted from production systems in the enterprise is stored (Warner, Tim. 1995). The University of Dar es Salaam, as a big organization, runs operational systems such as the admission system, the accommodation system, the examination record system and the master timetable. All of these systems generate data that are vital to the University's decision makers. A data warehouse is required to organize all of these data so that they are readily accessible and meaningful to the Chief Academic Office in support of its decision making.

This study is divided into two main parts. The first part will involve a literature study and documentation of the architecture, planning and design methods, implementation techniques and layout options for a data warehouse. This part of the research will be documented for future reference. The second part will be laboratory work: the actual development of a data warehouse prototype within the context of Relational Online Analytical Processing (ROLAP).
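To make the ROLAP idea concrete, the following is a minimal sketch, not the prototype's design. It assumes a hypothetical star schema (one fact table, `fact_enrolment`, and two dimension tables) held in an ordinary relational database, here SQLite through Python's standard `sqlite3` module. The table names, column names and figures are invented for illustration only.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny, hypothetical star schema and load sample rows."""
    conn.executescript("""
        CREATE TABLE dim_faculty (faculty_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dim_year (year_id INTEGER PRIMARY KEY, academic_year TEXT);
        CREATE TABLE fact_enrolment (
            faculty_id INTEGER REFERENCES dim_faculty(faculty_id),
            year_id INTEGER REFERENCES dim_year(year_id),
            students INTEGER
        );
    """)
    conn.executemany("INSERT INTO dim_faculty VALUES (?, ?)",
                     [(1, "Science"), (2, "Law")])
    conn.executemany("INSERT INTO dim_year VALUES (?, ?)",
                     [(1, "1999/2000"), (2, "2000/2001")])
    # Invented sample figures, for illustration only.
    conn.executemany("INSERT INTO fact_enrolment VALUES (?, ?, ?)",
                     [(1, 1, 120), (1, 2, 150), (2, 1, 80), (2, 2, 90)])


def enrolment_by_faculty(conn):
    """Roll the fact table up to faculty level with a plain SQL GROUP BY."""
    rows = conn.execute("""
        SELECT f.name, SUM(e.students)
        FROM fact_enrolment AS e
        JOIN dim_faculty AS f ON f.faculty_id = e.faculty_id
        GROUP BY f.name
        ORDER BY f.name
    """).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
totals = enrolment_by_faculty(conn)
```

The point of the sketch is that the analytical roll-up is expressed as ordinary relational SQL over fact and dimension tables, which is what distinguishes ROLAP from a dedicated multidimensional store.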

6.2. STATEMENT OF THE RESEARCH PROBLEM:

The frustrations of the 1970s are felt more keenly today because the technologies that facilitate the sharing of data (networks, communication protocols, sophisticated database management systems, decision support systems, etc.) are freely available, yet organizations still find that data is organized into functional silos, from which it is hard to extract what is needed in another, related function (Jack D. Doyle. 1997).

At the University of Dar es Salaam, despite the availability of increasingly powerful computers on everyone’s desk and of communication networks, a large number of executives and decision makers cannot get their hands on critical information that already exists in the University. One of these executives is the Chief Academic Officer. As an educational institution, the University creates data every day about students, supporting programmes, staff, etc. These data are important in supporting the daily work of the Chief Academic Office, but for the most part they are locked up in a myriad of manual and computer systems and are exceedingly difficult for the Chief Academic Officer to get at.

We intend to conduct a study to analyse, design and implement a data warehouse that will greatly improve information access for the Chief Academic Office.

According to Michael Haisten (1998), the most powerful justifications for investing in a data warehouse for the Chief Academic Office are therefore:
· Quality goals, since its typical objectives are improving information access,
· Bringing users in touch with their data,
· Enhancing the quality of their decisions, and
· Providing cross-function integration of operational systems within the organisation.


The results obtained will then be useful for the future development of a successful data warehouse for the Chief Academic Office of the University of Dar es Salaam.

6.3. RESEARCH OBJECTIVES

The general aim of the research is to study the architecture, design and implementation of data warehouses by developing a model for the Chief Academic Office data warehouse.

The proposed research objectives, derived from this general aim, are:
· To study and document the architecture of a data warehouse,
· To identify the aspects that play key roles in the design and implementation of a data warehouse,
· To develop a University system model (prototype) for a data warehouse,
· To test (validate) the model in real-life cases.

6.4 SIGNIFICANCE OF THE STUDY

The results obtained from this research will be used to develop a data warehouse for the Chief Academic Office of the University. The documentation (report) of the research will also serve as a reference for other studies on data warehousing, especially at the University of Dar es Salaam.

This study will also encourage and challenge many organisations to invest in data warehouses in order to improve information access within their firms, bring the users of their information in touch with their data, and provide cross-function integration of operational systems within their organisations. A data warehouse for the Chief Academic Office will enable decision makers to access the data, understand them and manipulate them while making decisions for the UDSM.

6.5 LITERATURE REVIEW

A data warehouse is defined as a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management’s decision-making process (Inmon, W.H. 1996). Subject-oriented means that the data warehouse focuses on the high-level concerns of the business, in contrast to operational systems, which deal with processes such as order processing or billing. Integrated implies that the data are stored in a consistent format. Time-variant means that each data point is associated with a point in time. Non-volatile means that the data do not change once they get into the warehouse (Jack D. Doyle. 1997).
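The time-variant and non-volatile properties can be illustrated with a small sketch. It assumes a hypothetical `student_snapshot` table in SQLite in which every row carries a snapshot date, and it uses triggers to reject updates and deletes. This is one possible way to enforce read-only history, not a method prescribed by the cited authors.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student_snapshot (
        student_id INTEGER,
        programme TEXT,
        snapshot_date TEXT   -- time-variant: every row is tied to a point in time
    );
    -- non-volatile: rows already loaded may not be changed or removed
    CREATE TRIGGER no_update BEFORE UPDATE ON student_snapshot
    BEGIN
        SELECT RAISE(ABORT, 'warehouse rows are read-only');
    END;
    CREATE TRIGGER no_delete BEFORE DELETE ON student_snapshot
    BEGIN
        SELECT RAISE(ABORT, 'warehouse rows are read-only');
    END;
""")

# New facts are appended as new dated rows instead of overwriting old ones.
conn.execute("INSERT INTO student_snapshot VALUES (1, 'B.Sc.', '2000-11-01')")
conn.execute("INSERT INTO student_snapshot VALUES (1, 'M.Sc.', '2001-11-01')")

try:
    conn.execute("UPDATE student_snapshot SET programme = 'X' WHERE student_id = 1")
    update_rejected = False
except sqlite3.DatabaseError:
    # The trigger aborts the statement, so the stored history is untouched.
    update_rejected = True

history = conn.execute(
    "SELECT programme, snapshot_date FROM student_snapshot ORDER BY snapshot_date"
).fetchall()
```

Because changes arrive only as new dated rows, a query can reconstruct the state of a student record at any past date, which an operational system that overwrites records in place cannot do.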

Ken Orr (1996) stated that data warehousing is a field that has grown out of the integration of a number of different technologies and experiences over the last two decades. A data warehouse is best represented as an enterprise-wide framework for managing informational data within the organization. There are two fundamentally different types of systems in all organizations, namely information systems and operational systems.

Operational systems are the systems that help us run the day-to-day activities of the enterprise (Ken Orr. 1996). The University of Dar es Salaam has systems such as the admission system, the examination record system, the accommodation system, payroll and the timetable. Because of their importance to the University, operational systems were almost always the first to be computerized. Indeed, most large organizations could not operate without their operational systems and the data that these systems maintain. Other functions within the organization have to do with planning, forecasting and managing the organization. These are the knowledge-based functions, which form the information system of the organization. Information systems have to do with analyzing data and making decisions, often major decisions, about how the enterprise will operate now and in the future. Informational data needs often span a number of different areas and require large amounts of different operational data in summary form.

Data warehouses provide information to the knowledge-based functions (decision support systems) within the organization. The operational systems generate the data that have to be loaded into and organized in the data warehouse (Vince Desio). See Fig. 1 below.

Fig.1: The concept of data warehouse.

(Source: http://www.datawarehouseconsulting.com/img2.gif)

A Data warehouse can be physically centralized, logically centralized

but physically distributed, or simply distributed. With today’s powerful

Local Area Network based Database servers, data warehouse can also

take advantage of the benefits of distributed computing.


Building a data warehouse is essentially a complex integration effort. Literally hundreds of system components must be brought together to work as an integrated application (Vince Desio. 1998). Fig. 2 represents only a high-level view of the basic components that comprise a data warehouse.

Fig.2: Data Warehouse Components

[Figure not reproduced. Labels recoverable from the diagram: Internal & External Operational Data (Warehouse Meta Data, System of Record, Models, Stewardship); Sourcing (Transformation, Metadata, Integration, Conditioning, Aggregation, Initial vs. Change Load); Data Warehouse Repository (RDBMS, Physical Meta Data, Multi-Dimensional); Information Access (Middleware, Performance Management, Abstractions, User Object, Preparations); Desktop Tools (Query, OLAP, WWW, Report, Graphics, Spreadsheet, Meta Data Catalog); Metadata; Administration.]

DATA WAREHOUSE ARCHITECTURE.

A Data warehouse architecture is a way of representing the overall structure of data, communication, processing and presentation that exists for end user computing within the enterprise. The architecture is made up of a number of interconnected parts (The Ken Orr Institute; revised edition, 2000):

· Operational Data Base / External Data Base Layer

· Information Access Layer

· Data Access Layer

· Data Directory (Metadata) Layer

· Process Management Layer

· Application Messaging Layer

· Data Warehouse Layer

· Data Staging Layer

Operational Data Base / External Data Base Layer

The goal of data warehousing is to free the information that is locked

up in the operational data bases and to mix it with information from

other, often external, sources of data. Increasingly, large organizations

are acquiring additional data from outside data bases. This information

includes demographic, econometric, competitive and purchasing

trends. The so-called "information superhighway" is providing access

to more data resources every day.

Information Access Layer

The Information Access layer of the Data Warehouse Architecture is

the layer that the end-user deals with directly. In particular, it

represents the tools that the end-user normally uses day to day, e.g.

Excel, Word, Access, PowerPoint, SAS, etc. This layer also includes the

hardware and software involved in displaying and printing reports,

spreadsheets, graphs and charts for analysis and presentation.

Data Access Layer

The Data Access layer of the Data Warehouse Architecture is involved with allowing the Information Access layer to talk to the Operational layer. In the network world today, the common data language that has emerged is SQL. The Data Access layer is therefore responsible for interfacing between Information Access tools and Operational Data Bases.

Data Directory (Metadata) Layer

In order to provide for universal data access, it is absolutely necessary to maintain some form of data directory or repository of meta-data information. Meta-data is the data about data within the enterprise. In order to have a fully functional warehouse, it is necessary to have a variety of meta-data available, including data about the end-user views of data and data about the operational data bases.
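As an illustration of what such a data directory might hold, here is a minimal sketch in Python. The `MetadataEntry` record, the view names and the source table names are all hypothetical. Only the operational system names are taken from the systems mentioned in this proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataEntry:
    """One directory record linking an end-user view to its operational source."""
    view_name: str      # name shown to the end user
    source_system: str  # operational system the data comes from
    source_table: str   # table inside that system (hypothetical name)
    description: str


# Hypothetical directory contents, for illustration only.
DIRECTORY = [
    MetadataEntry("Enrolment by faculty", "Admission system", "applicants",
                  "Students admitted per faculty and year"),
    MetadataEntry("Pass rates", "Examination record system", "results",
                  "Share of candidates passing each course"),
    MetadataEntry("Room occupancy", "Accommodation system", "allocations",
                  "Beds allocated per hall"),
]


def sources_for(view_name, directory=DIRECTORY):
    """Return the (system, table) pairs that feed the named end-user view."""
    return [(e.source_system, e.source_table)
            for e in directory if e.view_name == view_name]


def views_fed_by(source_system, directory=DIRECTORY):
    """Return the end-user views that depend on one operational system."""
    return sorted(e.view_name for e in directory
                  if e.source_system == source_system)


pass_rate_sources = sources_for("Pass rates")
admission_views = views_fed_by("Admission system")
```

The two lookups correspond to the two kinds of meta-data named above: data about end-user views and data about the operational data bases behind them.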

Process Management Layer

The Process Management layer is involved in scheduling the various tasks that must be

accomplished to build and maintain the data warehouse and data directory information.

The Process Management layer can be thought of as the scheduler or the high level job

control for the many processes (procedures) that must occur to keep the Data Warehouse

up-to-date.
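The scheduling role of this layer can be sketched as an ordering problem: each job may run only after the jobs it depends on. The example below is an assumption-laden illustration, with invented job names, that uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later) to derive a valid run order.

```python
from graphlib import TopologicalSorter

# Hypothetical refresh jobs, each mapped to the set of jobs it must wait for.
JOBS = {
    "extract_admissions": set(),
    "extract_examinations": set(),
    "stage_and_clean": {"extract_admissions", "extract_examinations"},
    "load_warehouse": {"stage_and_clean"},
    "refresh_metadata": {"load_warehouse"},
}


def run_order(jobs):
    """Return one order in which every job follows all of its prerequisites."""
    return list(TopologicalSorter(jobs).static_order())


order = run_order(JOBS)
```

A real process manager would also handle timing, retries and failures; the sketch covers only the dependency ordering.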

Application Messaging Layer

The Application Messaging layer has to do with transporting information around the enterprise computing network.

Data Warehouse (Physical) Layer

The (core) Data Warehouse is where the actual data used primarily for informational purposes resides.

Data Staging Layer

Data staging is also called copy management or replication management, but in fact it includes all of the processes necessary to select, edit, summarize, combine and load data warehouse and information access data from operational and/or external databases.
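The staging steps named above (select, edit, summarize, combine, load) can be sketched in a few lines of Python. The record layouts, registration numbers and marks below are invented, and the in-memory `warehouse` dictionary merely stands in for a real warehouse table.

```python
from collections import defaultdict

# Hypothetical extracts from two operational systems.
admissions = [
    {"reg_no": "A1", "faculty": " science "},
    {"reg_no": "A2", "faculty": "Law"},
    {"reg_no": "A3", "faculty": "SCIENCE"},
]
exam_results = [
    {"reg_no": "A1", "mark": 70},
    {"reg_no": "A2", "mark": 55},
    {"reg_no": "A3", "mark": 80},
    {"reg_no": "A9", "mark": 40},  # no matching admission record
]


def stage(admissions, exam_results):
    """Select, edit, combine and summarize the extracts into load-ready rows."""
    # Edit: bring faculty names into one consistent format.
    faculty_of = {a["reg_no"]: a["faculty"].strip().title() for a in admissions}
    # Combine and select: keep only results that match an admission record.
    marks = defaultdict(list)
    for result in exam_results:
        faculty = faculty_of.get(result["reg_no"])
        if faculty is not None:
            marks[faculty].append(result["mark"])
    # Summarize: one row per faculty.
    return {
        faculty: {"students": len(m), "mean_mark": sum(m) / len(m)}
        for faculty, m in sorted(marks.items())
    }


# Load: the dictionary stands in for the warehouse summary table.
warehouse = {}
warehouse.update(stage(admissions, exam_results))
```

Normalizing the faculty names before combining is what makes the two sources comparable, which is the "integrated" property of a data warehouse in practice.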


Data warehousing is a new field in Tanzania. Currently there is no known data warehouse in the country. This research will therefore create awareness among Tanzanian IT professionals, and society in general, of the power of data warehouses, especially at higher learning institutions such as universities, where all the facilities necessary for building data warehouses are present.

6.6 RESEARCH HYPOTHESIS

· The architecture of the data warehouse can be studied and documented so that it becomes standard and known to everyone developing a data warehouse.
· There are key issues playing roles in the design and implementation of a data warehouse that need to be determined.
· The existing expertise and computer facilities at the University are sufficient to develop a data warehouse.
· The resulting data warehouse model can be tested in a real case in order to evaluate its completeness.

7.0 METHODOLOGY

7.1 Study Area

The University of Dar es Salaam was born out of a decision taken on

March 25th, 1970, by the East African Authority, to split the then

University of East Africa into three independent universities for

Kenya, Uganda and Tanzania.

The University of Dar es Salaam consists of six faculties, five institutes

and two colleges: Faculty of Arts and Social Sciences; Faculty of

Commerce and Management; Faculty of Education; Faculty of


Engineering; Faculty of Law; Faculty of Science; Institute of

Development Studies; Institute of Kiswahili Research; Institute of

Marine Sciences; Institute of Production Innovation; Institute of

Resource Assessment; the University College of Lands and

Architectural Studies and the Muhimbili University College of Health

Sciences. The University also operates a Computing Centre, a Library

and four bureaus: the Economic Research Bureau in the Faculty of Arts

and Social Sciences; the Bureau for Educational Research and

Evaluation in the Faculty of Education; the Bureau for Industrial

Cooperation in the Faculty of Engineering and the University

Consultancy Bureau.

The University is situated on the west side of the city of Dar es Salaam, occupying 1,625 acres on Observation Hill, 13 km from the city centre.

For purposes of maintaining East African inter-university academic

cooperation and communication, an Inter-University Council for East

Africa was set up in 1970. The Council has established an

Inter-University Exchange Programme, through which the University

admits students from other East African countries mainly Kenya and

Uganda. The University also admits students from several other

countries the world-over through established links, exchange

programmes or individual applications. Most of these students receive

their bursaries from their respective governments. Students from other

countries are considered for admission to both undergraduate and

postgraduate studies, subject to the availability of vacancies.

7.2 Methodology


A short visit will be made to the Chief Academic Office. This visit is intended to familiarize the researcher with the stakeholders and will also enable an initial study of how information flows in and out of the Chief Academic Officer's (CACO’s) office.

7.2 Data Collection techniques

Observation

The aim of including this data collection technique is to make detailed notes on behaviours, events and the contexts surrounding the Chief Academic Office. To this end, physical observations will be made of the tools the CACO uses to collect, analyze and disseminate information.

Interviews

An interview will be held between the researcher and the Chief Academic Office staff. The purpose of an interview is to find out what is in or on someone else’s mind (John W. Best & James V. Kahn. 1993). Questions will be designed so as to capture most of the information needed for the research.

Case Study

A case study should help in “capturing the knowledge of practitioners and developing theories from it”. The case study methodology is well suited to identifying key events and actors and to linking them in a causal chain. The case strategy is particularly well suited to IS research because the technology is relatively new and interest has shifted to organizational rather than technical issues.

The case study is chosen because of its ability to:
· Make it possible to generate theories from practice (as a preparation stage for developing the model of the data warehouse);
· Allow an understanding of the nature and complexity of the processes taking place in a data warehouse;
· Research an area in which few previous studies have been carried out;
· Research an area in which it is necessary to measure variables, but there is no a priori knowledge of what the variables of interest will be. In this case the variables are aspects whose role must be determined and estimated.

7.3 EXPECTED RESULTS OF THE RESEARCH

Theoretical Results

The main theoretical result of the research will be the model, which

supports Design and implementation of Data warehouse. The model

should comply with the ongoing Information Plan Policy (IPP) at the

University of Dar es Salaam. The model could include methods,

techniques and/or instrumentation, which have to be able to support

the Design and Implementation of Data warehouses in Tanzania.

Practical results

The main practical result of the research should be the realization of

the Design and implementation of Chief Academic office Data

warehouse of the University of Dar es Salaam. The success of this

part of the research depends on the full support and willingness of the

technical staff and management of already installed systems to realize

that this research will help in their daily needs of information.

8.0 REFERENCE/BIBLIOGRAPHY:

1. Jack D. Doyle. (1997). Informed Decision Making Through Data Warehousing. http://dhrinfo.hr.state.or.us/intranet/tands/Dwpap/DWWHITEP.htm
2. Vince Desio. Data Warehouse Components. http://www.datawarehouseconsulting.com/page3.html
3. Ken Orr. (1996). Data Warehousing Technology. The Ken Orr Institute; revised edition, 2000.
4. Roger Burlton. (1998). Data Warehousing in the Knowledge Management Cycle. http://datawarehouse.dci.com/articles
5.
6. Ralph Kimball. The Data Warehouse Life Cycle Toolkit.
7. William H. Inmon. Building the Data Warehouse.
8. Christopher Adamson, Michael Venerable. Data Warehouse Design Solutions.
9. Michael Abbey, Ian Abramson, Larry Barner, Ben Taub, Michael J. Corey. SQL Server 7 Data Warehousing.
10. Donald Burleson. High Performance Oracle Data Warehousing.
11. Dorian Pyle. Data Preparation for Data Mining.
12. Mark Humphries, Michael W. Hawkins, Michelle C. Dy. Data Warehousing: Architecture and Implementation.
13. Butler Group. 1996. Business Case for Data Warehousing. Strategies and Technologies. October 1996, Butler Group, UK. http://www.butlergroup.co.uk/manguide/dwuk1096/contents.htm
14. Fisher, Lawrence. 1996. Along the Infobahn. Data Warehouses. Third Quarter, 1996. Strategy & Business, Booz Allen & Hamilton Inc. http://www.strategy-business.com/technology/96308/page1.html
15. Boar, Bernard (Bernie). 1996. Understanding Data Warehousing Strategically. White paper commissioned by NCR's Communication Industry Line of Business. June 14, 1996. The Data Warehousing Institute, Gaithersburg, MD. http://www.tekptnr.com/tpi/tdwi/review/bboar1.htm. pp. 25.
16. Imirie, Peggy. 1996. Your Data Warehouse: A Business Success or Science Project? Lesson from the Experts. December 29, 1996. The Data Warehousing Institute, Gaithersburg, MD. http://www.dw-institute.com/lessons/sciproj.htm. pp. 2.
17. Ballard, Chuck. 1996. Strategies to Make Your Data Warehouse a Success. Lesson from the Experts. December 29, 1996. The Data Warehousing Institute, Gaithersburg, MD. http://www.dw-institute.com/lessons/strateg.htm. pp. 2.
18. Byte. 1997. Architectural Distinctions. June 1997. http://www.byte.com/art/9706/sec20/art4.htm
19. Eckerson, Wayne W. 1994. Implementing Access to Distributed Data Using a Data Warehouse Strategy. Patricia Seybold Group, Distributed Computing Monitor Case Study, September 1994. http://www.psgroup.com/cases/1994/cs994d.htm
20. Barbara, Gaskin. 1998. Realizing the Strategic Value of Data Warehouses (Decision Support Technology).

9.0 OTHER INFORMATION:

9.1: Financial Requirement:

The proposed study is to be financed by the University of Dar es

Salaam. The technical assistance and equipment facilities will be

provided by the Department of Computer Science.

9.1.1: BUDGET:

(a). University costs:

DESCRIPTION                    YEAR 1        SUBSEQUENT YEAR   SPONSOR
Tuition fees                   950,000/=     950,000/=         UDSM
Application fee                10,000/=      -                 -do-
Registration fee               20,000/=      -                 -do-
Thesis Supervision             200,000/=     200,000/=         -do-
Medical capitation fee         100,000/=     100,000/=         -do-
Special Faculty Requirement    100,000/=     100,000/=         -do-
Research Field Cost            750,000/=     -                 -do-
TOTAL                          2,130,000/=   1,350,000/=       -do-

(b). Student costs:

DESCRIPTION                               YEAR 1        SUBSEQUENT YEAR   SPONSOR
Caution money                             2,000/=       -                 UDSM
Student Union                             1,200/=       1,200/=           -do-
Books                                     300,000/=     300,000/=         -do-
Stationery                                50,000/=      50,000/=          -do-
Thesis Production                         -             150,000/=         -do-
Stipend (based on 130,000/= per month)    1,560,000/=   1,560,000/=       -do-
TOTAL                                     1,913,200/=   2,061,200/=       -do-

9.1.2: RESEARCH/FIELD AND MATERIAL COSTS (Computer Lab.)

Up-keep allowance and transport                               530,000/=
Processing fee                                                120,000/=
Electrical and electronic components                          100,000/=
Subtotal                                                      750,000/=

9.1.3: RESEARCH PROPOSAL PRODUCTION:

Paper, 5 reams @ 5,000/=                                      25,000/=
Secretarial services, 30 pages @ 600/=                        18,000/=
Photocopy, Department level, 30 pages @ 40/=, 20 copies       24,000/=
Photocopy, Faculty level, 30 pages @ 40/=, 20 copies          24,000/=
Photocopy, Senate level, 30 pages @ 40/=, 20 copies           24,000/=
Subtotal                                                      115,000/=

9.1.4: THESIS PRODUCTION

Paper, 5 reams @ 5,000/=                                      15,000/=
Secretarial services, 250 pages @ 600/=                       150,000/=
Diskettes, 3 boxes @ 5,000/=                                  15,000/=
Photocopy, 250 pages @ 40/=, 4 copies                         40,000/=
Loose bound, 4 copies @ 5,000/=                               20,000/=
Final binding, 4 copies @ 6,000/=                             24,000/=
Subtotal                                                      264,000/=

TOTAL                                                         1,129,000/=


9.2: RESEARCH SCHEDULE

Period: November 2000 to September 2002 (academic years 2000/2001 and 2001/2002).

ACTIVITY
1. Registration, literature review, research proposal.
2. Data warehouse planning, analysis and design.
3. Data warehouse implementation and testing.
4. Thesis write-up, production and submission.

[Schedule chart not reproduced: the bars showing the months allotted to each activity are not recoverable from this transcript.]


9.3: COMMENTS

Date:...............................................................................Signature:...........................

Name: LUNGO, J. H. (Reg.No: HD/TP.1/2000)

(candidate)

Supervisor's Comments.

....................................................................................................................................

....................................................................................................................................

....................................................................................................................................

Date:..............................................................................Signature:............................

Name:

(Supervisor)

Head of Department's Comments

....................................................................................................................................

....................................................................................................................................

....................................................................................................................................

....................................................................................................................................

Date:.........................................................................Signature:.................................

Name: Dr. H. Twaakyondo

The Head, Department of Computer Science.
