EVALUATION REPORT YEAR 2: September, 2010 – August, 2011

National Science Foundation (NSF) Grant Project

CCLI DUE-0920335

PROJECT TITLE:

A Comprehensive Framework for Timely Introduction of

Emerging Data-Intensive Computing to STEM Audiences

REPORT DATE:

August 1, 2011

Submitted to:

Bina Ramamurthy, Ph.D., Grant Project Director

Submitted by:

Jeannette G. Neal, Ph.D., Evaluator

Table of Contents

1 Introduction
2 Project Description
3 Evaluation Approach
4 Planning Evaluation
5 Formative Evaluation
6 Project-Level Evaluation
7 Summative Evaluation
8 Appendix A. Letter from SUNY on Certificate Program Approval & Registration
9 Appendix B. Student Course Evaluation

Project Evaluation Report Year 2: 2010-2011

1 Introduction

EVALUATION INTRODUCTION

Evaluation Goals and Approach

In compliance with the evaluation goals and guidelines of the NSF, this evaluation is being performed not

only to determine whether the project’s goals are being met and how different aspects of the project

are working, but also with the intent that the evaluation help improve the project. The information on

how well project goals are being met and activities are progressing will be used throughout the project to

improve the process. In addition, it is hoped that the evaluation process will provide new insights or new

information that was not anticipated. It is well recognized that the “unanticipated consequences” of a

program are frequently among the most useful outcomes of the assessment endeavor.

We recognize the inherent interrelationships between evaluation and program implementation.

Evaluation is not separate from, or added to, a project, but rather it is part of the project from the

beginning. Planning, evaluation, and implementation are all parts of a whole, and they work best

when they work together.

Furthermore, we recognize that evaluation provides information for communicating to a variety of

stakeholders. It allows projects to better tell their story and prove their worth. It also gives managers

the data they need to report “up the line,” to inform senior decision makers about the outcomes of

their investments.

We recognize that, in response to requirements of the Government Performance and Results Act

(GPRA), NSF has chosen to focus on three general strategic outcomes and that projects will be asked

to provide data on their accomplishments in these areas:

• Developing a diverse, internationally competitive, and globally engaged workforce of

scientists, engineers, and well-prepared citizens;

• Enabling discoveries across the frontiers of science and engineering connected to learning,

innovations, and service to society; and

• Providing broadly accessible, state-of-the-art information bases and shared research and

education tools.

This evaluation will comprise the two major types of evaluation recognized in the educational

realm, namely formative evaluation and summative evaluation.

Formative - The purpose of a formative evaluation is to assess initial and ongoing project activities,

and provide information to monitor and improve the project. The components of formative

evaluation are:

1. Implementation evaluation - The purpose of implementation evaluation is to assess whether

the project is being conducted as planned.

2. Progress evaluation - The purpose of a progress evaluation is to assess progress in meeting

the goals of the program and the project.

Summative - The purpose of a summative evaluation is to assess the quality and impact of a fully

implemented project.

2 Project Description

PROJECT

Project Rationale

This Phase 2 Grant Project at UB (CCLI DUE 0920335) addresses the important need to educate

undergraduate students about the new technology area of data-intensive computing or big-data

computing. This critical area of study is currently focusing on novel methods for storing peta-scale (and

greater volume) data and algorithms for the efficient processing of these data.

The need to deal with huge volumes of data is being felt across industry, government, and academia. This

includes the needs of intelligence agencies to analyze data from satellite imagery, signal intercepts,

sensors, etc.; medical agencies to analyze CAT, MRI, genetic, and other diagnostic data; environmental

agencies to analyze sensor-collected data on air, water, and meteorological conditions; retail companies

such as Wal-Mart and JCPenney to analyze sales data; and organizations in general to exploit the

massive document databases on the WWW (several hundred trillion bytes of text); among others.

This project aims to improve the big-data preparedness of diverse Science, Technology, Engineering,

and Mathematics (STEM) audiences by defining a comprehensive framework called TIDE (Timely

Introduction of Emerging Data-intensive computing). The approach of the team for this Phase 2 project

leverages the results of their previous project (DUE CCLI A&I 0311473) by expanding its scope to form

a certificate program and also by providing methods to institutionalize the framework developed. TIDE is

a transformative education model that can be applied to other emerging sciences and technologies. Thus

we expect TIDE’s appeal to transcend domain and disciplinary boundaries.

This project also advances education in two areas: (1) big-data computing and (2) pedagogy for

introducing emerging technologies. This project is motivated by and thus will be guided by the team’s

collective experience in grid computing and high-performance computing [43, 51, 52, 63-68] and in

solving domain-specific scientific problems in life sciences [12, 13, 20, 26] and environmental

engineering [42, 58]. A highly qualified industrial and educational consultant will assist in the assessment

of the effectiveness of TIDE.

Data-intensive computing or big-data computing has been receiving much attention as a collective

solution to address the data deluge that has been brought about by tremendous advances in distributed

systems and Internet-based computing. An innovative programming model called MapReduce [16] and a

peta-scale distributed file system to support it have revolutionized and fundamentally changed approaches

to large scale data storage and processing. These data-intensive computing approaches are expected to

have a profound impact on any application domain that deals with large scale data, from healthcare

delivery to military intelligence. A new forum called the Big-Data Computing Group [25] has been

formed by a consortium of industrial stakeholders and agencies including NSF and CRA (Computing

Research Association) to promote wider dissemination of big-data issues and solutions and to transform

mainstream applications.

Given the omnipresent nature of large scale data and the tremendous impact they have on a wide variety

of application domains, it is imperative that we prepare our workforce to face the challenges in this area.

Motivation for this project arises from the fact that timely introduction of these concepts to our

undergraduates is important for them to remain competitive in a fast-moving global environment. Schools

such as the University of California at Berkeley [28] and the University of Washington [15] have

introduced the concepts in their curricula at various levels. However, there exists no systematic

approach to teach the big-data concepts that can be adopted widely for a diverse audience ranging from

typical undergraduates to non-traditional adult learners. There is no educational framework that supports a

cost-effective lab environment that can be assembled for experiential learning of the data-intensive

concepts.

This project aims to address these issues through the objectives listed in the next paragraphs.

Project Purpose and Basic Operation

This project aims to address the above-listed issues and help prepare a skilled workforce in data-intensive

(big-data) computing through the following objectives:

1. Define a set of core competencies that are required for research (to advance the field) and for practical

application design (to build systems) in the data-intensive computing area.

2. Define a certificate program that consists of five courses that effectively address the competencies

defined above: data structures & algorithms, distributed systems, data-intensive computing, domain-

specific course, and capstone project solving a data-intensive domain problem. These courses will be

developed from existing courses as part of this TIDE project.

3. Design and develop curriculum for the courses, including teaching materials such as laboratory

exercises and case studies for practical experimentation.

4. Provide multiple paths (multiple entry points) through the program, to allow maximum and diverse

participation from a variety of audiences, from research students to industrial workforce.

5. Broaden participation in the certificate program by including undergraduates from all departments of

the School of Engineering and Applied Sciences and the collaborating disciplines and industrial

workforce seeking retraining.

6. Mentor underrepresented minorities to enroll in the program and to benefit from it.

7. Disseminate details for implementation of a simple and low-cost laboratory environment for

supporting the courses and the data-intensive storage and computing.

8. Assess the progress and the effectiveness of the program using survey instruments prepared in

consultation with an experienced external evaluator and through periodic meetings.

9. Provide strategies for educators to effectively adopt the TIDE framework.

These objectives will be accomplished by building a comprehensive package of learning materials, from a

curriculum to lab infrastructure based on the strong foundation established by PI Ramamurthy’s earlier

CCLI A&I award [43] for learning grid computing. The current Phase 2 project addresses the “Creating

Learning Material and Teaching Strategies” component of the CCLI solicitation, specifically providing a

model for introduction of an emerging science and technology to STEM undergraduates. It demonstrates

the viability of the project with two significant case studies in diverse disciplines, biomedical and

environmental sciences and engineering.

For TIDE, the team has identified a set of competencies to support data-intensive computing based on

their collective experience in the area and the published literature. A set of five courses is recommended

to address these competencies. A certificate program will be established to cover the five courses. Any

undergraduate student or non-traditional adult learner will need to complete these courses to become

certified in data-intensive computing or as big-data competent. Five courses are needed to cover the topic

in depth and in breadth and to provide complete coverage from basic concepts to advanced research in

the field. This may appear to overload the already heavy curricular requirements of many programs. The

team creatively addresses this challenge by choosing these courses out of existing courses in the

curriculum and revising (modernizing) them appropriately. This approach will also reveal to the students

the context of data-intensive computing and its relationship to currently existing courses. It provides a

systematic approach to educating our workforce about an emerging field and the model established can be

adapted to teaching other emerging areas. The solutions and models developed for these applications will

serve as demonstration instruments during the training sessions we have planned for a diverse audience

from educators to advanced domain scientists.

Project Targeted Clients

The clients of the program are:

• Students in the Computer Science and Engineering curriculum at UB.

• Students in fields that need the technology such as Engineering (e.g., Electrical, Mechanical, etc.),

Biology and Biomedical fields (e.g., Biological Sciences, Bioinformatics), Chemistry (e.g., Quantum

Chemistry), Physics (e.g., Particle Physics).

• Underrepresented groups.

• Information technology workforce personnel who need training/retraining.

• People seeking new careers in industry.

Project Team and Responsibilities

Project Team personnel:

• Prof. Bina Ramamurthy, Ph.D., Project Director and Principal Investigator

o Responsibility for technical oversight and management, directing the overall implementation

of TIDE, project reporting, financial management, coordination of activities, and

accomplishment of the project objectives.

o Responsibility for design and implementation of the Data-Intensive Computing Certificate

Program, the individual courses and the course materials.

o Responsibility for providing computing infrastructure.

• Prof. Vipin Chaudhary, Ph.D., Co-Principal Investigator

o Responsibility for support with respect to curriculum and computing infrastructure.

• Associate Dean for Undergraduate Education, John Van Benschoten, Ph.D., Co-Principal Investigator

o Responsibility for administrative support with respect to curriculum.

• Jeannette G. Neal, Ph.D., Consultant

o Responsibility for project evaluation and support with respect to industry advisement.

PI Ramamurthy has an excellent background in distributed systems research [40, 41] and HDFS

projects. The details of the proof-of-concept big-data projects implemented by Ramamurthy’s students

can be found at [55]. She will direct the overall implementation of the TIDE Project.

Co-PI Chaudhary has over twenty-five years of research experience in distributed systems.

He has demonstrated his support for the integration of research, education, outreach and training

and is deeply committed to introducing the research results from his several projects into undergraduate

courses. He has led an IGERT program and an REU Site in the area of High Performance Computing

with significant minority and under-represented group participation.

As Associate Dean for Undergraduate Education, Co-PI Van Benschoten is uniquely suited to

assist in the integration of TIDE and the Data-Intensive Computing Certificate Program in the

undergraduate curriculum. Our engineering degree programs are ABET accredited, and part of the

accreditation criteria pertains to the use of techniques, skills, and modern engineering tools necessary for

engineering practice. Clearly, TIDE would help to satisfy this criterion. As an Environmental Engineer,

he also will be able to assist in identifying interesting domain-specific applications for the capstone

research experience.

Dr. Jeannette Neal played a critical role as an industrial advisor and assessment consultant for the

Phase 1 CCLI project of Dr. Ramamurthy. The collaboration was extremely successful with periodic

meetings and excellent analysis of data that can be found in the annual report of the previous project. She

is a retired Principal Scientist and Program Manager from a Fortune 500 technology organization,

General Dynamics, and is very knowledgeable about the needs of industrial stakeholders. She is highly

qualified to fulfill her advisory role on this project.

Project Background

In this section we discuss the driving forces behind the proposed project and the guiding principles that

led the way to its development: (i) large scale distributed system initiatives, (ii) the emerging model of

computation in MapReduce and the supporting distributed file system design and (iii) current educational

efforts to create learning material and courses for this area. We will also describe PI Ramamurthy’s prior

effort and educational experience in grid computing and Co-PI Chaudhary’s experience in high

performance computing and domain research in life sciences.

2.1 Large scale distributed system initiatives

The Internet has matured into mainstream computing at a faster pace than many evolutionary

technologies that came before it. Many organizations including NSF realized the importance of the

rapidly advancing technology and initiated programs such as Partnership for Advanced Computing

Infrastructures (PACI) [39] and NSF middleware initiative [36]. Several middleware tools and software

such as Condor [61] and Globus Toolkits [21, 22] were prototyped and released. Many data grids and

compute grids [4, 18, 19] have been successfully implemented and used.

A blue ribbon panel of experts on the current status of cyber infrastructure projects [2]

recommends that “sustained work is needed on software tools and infrastructure that enable general use of

computing at the highest end, as well as on discipline-specific codes and infrastructure.” These advances

are very nicely captured in the technical reports [3, 23] by the late Jim Gray. He discusses the evolution of

distributed computing systems from the SETI@home project to web services. He recommends designing

programming models to process information for human consumption and to focus on data-intensive

computing. These issues are directly addressed by distributed file systems such as Google File System

(GFS) [16] and Hadoop Distributed File System (HDFS) [24]. Grid computing focuses on infrastructure

issues and more specifically on facilitating moving data to the location of the computing. On the other

hand, the emerging area of data-intensive computing focuses on efficient storage structures for very large

scale data and on provisioning the computing where the data is stored. Managing peta-scale data is no

longer a problem unique to search engines and scientific experiments [5]. Storing and analyzing large

volumes of semi-structured and unstructured data have become a common problem for many domains,

especially with the prolific use of Web 2.0 applications. Meeting the demands of these state-of-the-art

applications depends heavily on scalable storage and processing power. This critical necessity has

resulted in the reincarnation of high performance computing and grid computing in the form of massively

parallel computing. However, the programming model used is still the traditional one, with no special

attention to the nature of the data. Next we discuss the MapReduce programming model that exploits the

parallelism offered by a class of data that is “write once and read-many”.

2.2 MapReduce Programming Model

A major issue facing many emerging applications is the large volume of data from

heterogeneous sources along with their semantic markup and metadata. Processing large volumes of data

in parallel has been enabled by the MapReduce programming model. It uses a map algorithm that

processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce algorithm that

merges all intermediate values associated with the same intermediate key. It exploits the parallelism

afforded by the “write-once and read-many” (WORM) characteristics. While “word count” and “page

rank” on web logs are common examples, there are numerous other domains such as healthcare patient data

archives, business analytics, and historical environmental weather data where MapReduce can be effectively

used. To understand this programming model, consider a very simple example in a paper that discusses its

application [69].

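Figure 1. MapReduce programming model, based on [16]: four input splits (Split 0 to Split 3) feed four map() tasks, whose results pass to two reduce() tasks that write the output parts part1 and part2.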

Consider a web log file with data collected over a period of time. It is a very large file that

records every hit that happens on the site. Our interest is the number of hits at a given time. We split our

task into two sub-tasks: (i) map: gathers the hit information for the timestamp on each line and (ii) reduce:

accumulates the hits that occurred for each timestamp using the timestamp as the key. The map algorithm

extracts the <timestamp hit> pairs (key/value pairs) from the original web log. The input to the map

algorithm is a very large file that contains lines of this form:

“192.168.0.5 - - [22/Aug/2005:22:07:52 +0000] "GET / HTTP/1.1" 200 1722”

The output of the map stage is pairs such as <1345 1>, <1345 1>, <1345 1>, <1345 1>, <1346 1> … that result in

<1345 4> <1346 1> after the reduce stage. The first number in the pair is the timestamp and the second

number indicates a hit. The reduce algorithm accumulates the number of hits for each timestamp. The

example is depicted in Figure 1. The figure shows the task split into four nodes running the map algorithm

and two nodes running the reduce algorithm. The output in this case is provided in two parts. It is possible to

process the outputs further using the MapReduce model.
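The web log example above can be sketched in a few lines of Python. This is a minimal single-machine illustration, not the project's code and not Hadoop's API: the function names (map_hits, shuffle, reduce_hits, count_hits) are hypothetical, the log lines are made up in the format quoted above, and the use of the hour and minute (HHMM) as the timestamp key is an assumption chosen to resemble keys such as 1345.

```python
import re
from collections import defaultdict

# Matches the bracketed timestamp of a log line such as
# [22/Aug/2005:22:07:52 +0000]. Using HHMM as the key is an assumption.
TIMESTAMP = re.compile(r"\[\d{2}/\w{3}/\d{4}:(\d{2}):(\d{2}):\d{2} [+-]\d{4}\]")


def map_hits(line):
    """Map step: emit one (timestamp, 1) pair for each log line with a hit."""
    match = TIMESTAMP.search(line)
    if match:
        yield match.group(1) + match.group(2), 1


def shuffle(pairs):
    """Group intermediate values by key, as a framework does between stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_hits(key, values):
    """Reduce step: accumulate the hits recorded for one timestamp."""
    return key, sum(values)


def count_hits(lines):
    """Run map, shuffle, and reduce in sequence on one machine."""
    pairs = (pair for line in lines for pair in map_hits(line))
    return dict(reduce_hits(key, values) for key, values in shuffle(pairs).items())


# Hypothetical log lines: four hits at 13:45 and one at 13:46.
sample_log = [
    '192.168.0.5 - - [22/Aug/2005:13:45:02 +0000] "GET / HTTP/1.1" 200 1722',
    '192.168.0.7 - - [22/Aug/2005:13:45:10 +0000] "GET / HTTP/1.1" 200 1722',
    '192.168.0.5 - - [22/Aug/2005:13:45:31 +0000] "GET / HTTP/1.1" 200 1722',
    '192.168.0.9 - - [22/Aug/2005:13:45:59 +0000] "GET / HTTP/1.1" 200 1722',
    '192.168.0.5 - - [22/Aug/2005:13:46:04 +0000] "GET / HTTP/1.1" 200 1722',
]

if __name__ == "__main__":
    print(count_hits(sample_log))
```

In a real deployment the map and reduce functions would run on many nodes and the grouping step would be performed by the framework; the sketch only shows how the two functions divide the work.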

A collection of the algorithms or functions used for processing the data for a specific application

domain will be written as Mappers and Reducers and will be provisioned at every node of the distributed

file system where the data is stored. The set of mappers and reducers to be used in a certain application is

configurable using special languages [24, 27, 38]. The distributed file storage on which MapReduce

algorithms work is explained next.

2.3 Peta-Scale Distributed File System

Google uses a proprietary distributed file system called the Google File System (GFS) that is a

virtual file system residing on racks and racks of storage. (We use the term peta-scale to refer to any very large

scale data larger than or equal to a terabyte.) Yahoo has an equivalent distributed file system initiative

called Hadoop [9, 24] that is an open source implementation of a file system similar to GFS. Currently

Hadoop is an open source project managed by the Apache Software Foundation. Hadoop implements the MapReduce

programming model for processing large scale WORM (write once, read many) data using the Hadoop

Distributed File System (HDFS).

HDFS has nodes made up of commodity PCs that operate in master-slave mode. The master node

stores the metadata and manages the file system and is called the NameNode. A number of slave

machines called DataNodes store the blocks of data in their local file systems. A typical block size is

128 MB or larger, and each block is stored in triplicate for reliability. To write a block of data, the NameNode

allocates the blocks for the three copies using a rack-aware placement policy [38]. To read a block of

data, the block closest to the processing code is selected. MapReduce operations that process the data are

provisioned at every node. In other words, in the Hadoop MapReduce model, computation is moved to where

the data is located because of the scale of the data. The HDFS-MapReduce combination provides a

significant departure from the traditional methods for handling large scale data, and it is imperative that

our educational programs respond to this major innovation.
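The block storage scheme described above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not HDFS code: the function names, the two-rack topology, and the node names are hypothetical, and the placement rule used here (first copy on the writer's node, second and third copies on two nodes of one other rack) is a simplified stand-in for the rack-aware policy cited in [38].

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the typical block size quoted above
REPLICATION = 3                 # each block is stored in triplicate


def block_count(file_size, block_size=BLOCK_SIZE):
    """Number of blocks needed to hold a file of file_size bytes."""
    return math.ceil(file_size / block_size)


def place_replicas(writer_rack, writer_node, topology):
    """Pick three DataNodes for one block (simplified, deterministic rule).

    Assumed rule: first copy on the writer's node, second and third copies
    on two different nodes of one other rack.
    """
    for rack in sorted(topology):
        nodes = topology[rack]
        if rack != writer_rack and len(nodes) >= 2:
            return [writer_node, nodes[0], nodes[1]]
    raise ValueError("need a second rack with at least two DataNodes")


def closest_replica(reader_rack, reader_node, replicas, rack_of):
    """Choose the replica nearest the reader: same node, then same rack."""
    def distance(node):
        if node == reader_node:
            return 0
        if rack_of[node] == reader_rack:
            return 1
        return 2
    return min(replicas, key=distance)


# Hypothetical two-rack cluster used only for illustration.
topology = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}

# NameNode metadata reduced to a map from block index to replica locations,
# for a 300 MB file written from node n1 on rack1.
block_map = {
    block: place_replicas("rack1", "n1", topology)
    for block in range(block_count(300 * 1024 * 1024))
}
```

Keeping two copies on a remote rack protects against the loss of a whole rack, while the closest-replica rule on reads reflects the idea of running the computation as near to the data as possible.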

2.4 Existing Education Efforts

How do people learn about MapReduce and HDFS? Google supported the educational effort through

lectures that are available on YouTube as a series of video presentations [33, 70]. Google’s education site

[15] also refers to a set of distributed systems courses by the University of Washington, Rutgers and

Swarthmore that cover MapReduce and Hadoop file systems. Cornell University’s Information Retrieval

course uses MapReduce and Hadoop for one of the projects assigned. Berkeley’s infusion of MapReduce

is in the introductory course [28]. Thus the MapReduce concept impacts a wide range of courses and is

introduced mostly at higher levels [31, 32]. The Special Interest Group in Computer Science Education

(SIGCSE) 2008 [59] featured lectures from Google and University of Washington. SIGCSE 2009 [60]

features a full day pre-conference workshop by the Cloudera organization. We need more than isolated

exposures like the above. A sustained and focused effort is needed to educate a diverse STEM audience

[6]. We also need an easy-to-use pedagogy, learning material and a transformative solution with real

applications.

3 Evaluation Approach

EVALUATION

Evaluation Goals and Objectives

The purpose of the evaluation is to:

• Determine whether the project is meeting its stated goals and objectives according to the proposed

timeline,

• Determine any identifiable strengths and/or weaknesses of the courses, course adaptations, and course

materials, and

• Determine if the project is proceeding as planned.

The evaluation should provide information to improve the project as it develops and progresses. That is,

the evaluation should be a tool that not only measures success, but also contributes to it.

Objects of Evaluation

The objects of evaluation will include the Core Competencies, the Certificate Program in Data-Intensive

Computing, the individual courses that comprise or can be used as credit within the certificate program,

the instructional materials developed for the courses, the promotional/publicity materials, the

dissemination vehicles, among others.

1. The set of Core Competencies that are required for research (to advance the field) and for practical

application design (to build systems) in the data-intensive computing area

2. The Certificate Program in Data-Intensive Computing

o Comprised of five (5) courses: 3 specified required courses and 2 elective courses

o To be offered starting Spring 2010

o The design of multiple paths (multiple entry points) through the program, to allow maximum and

diverse participation from a variety of audiences, from research students to industrial workforce.

3. Coverage and appropriateness of the content and example applications within the area of data-

intensive computing in the certificate program courses.

4. The Courses comprising the Certificate Program, including course materials for the five (5) courses:

o Course 1: CSE 250 Algorithms & Data Structures (CS2) – Enhancement of existing UB CSE

course

Additional lecture materials and handouts

o Course 2: CSE 486/586 Distributed Systems – Modification of existing UB CSE course

Modification of existing departmental course outline

Modification of course syllabus

Additional/supplementary lecture materials, materials/notes for students, case studies,

student lab exercises/projects

o Course 3: CSE 487/587 Data-Intensive Computing – New UB CSE course with the new course

materials listed below

Departmental course outline

Course syllabus

Lecture materials

Supplementary materials/notes and case studies for students

Student hands-on lab exercises/projects

o Course 4: Domain-Specific Course in the student’s major field

Supplementary materials/notes for students

Case studies

Hands-on student projects

o Course 5: Capstone Project

Supplementary materials/notes for students

Case studies

Hands-on student projects

5. Computational Infrastructure

o A stable implementation of a peta-byte distributed file system and computational environment

for student use

6. Dissemination and promotion to a wide audience via

o Website(s)

o Promotional/publicity materials, activities, and presentations within UB and external to UB,

including colleges, high schools, and industry

o Conference presentations

7. The broadening of participation in the Certificate Program by including

o Undergraduates at UB from

All UB departments within the School of Engineering & Applied Sciences (SEAS) and

Other non-SEAS collaborating disciplines

o Industrial workforce personnel seeking training/retraining

o Mentoring activities, especially for underrepresented minorities to enroll in the program and to

benefit from it via organizations such as: BEAM, LSAMP, CSTEP, UB Honors Program

8. Dissemination of the TIDE framework and Certificate Program to educators for their implementation

o Dissemination of the certificate program and framework materials including the Certificate

Program design and documents, course outlines, syllabi, course/lecture materials, handouts,

student lab exercises/projects, and details for implementation of a simple and low-cost

computing environment for supporting the courses.

o Dissemination vehicles including documentation in the form of manuals, instructions, &

strategies, conference presentations, and website

Type of Evaluation

This evaluation will consist of two types of evaluation:

1. Formative evaluation.

2. Summative evaluation.

Audiences and Stakeholders

• The ultimate client (recipient) of the evaluation is the National Science Foundation.

• Audience of the evaluation is SUNY at Buffalo.

• Stakeholders of the evaluation are (1) industry partners and companies that may benefit from a better

educated workforce and (2) students that will receive education in data-intensive computing.

4 Planning Evaluation

PLANNING EVALUATION

Purpose: To assess understanding of the project's goals, objectives, strategies, and timelines.

Questions (to be used as checklist and the foundation for formative and summative evaluation):

Answers were written and provided by the Grant Project PI during the early planning stage of the project.

(a) Why was the project developed? What is the problem or need it is attempting to address?

Answer:

This project aims to improve the big-data (data-intensive) preparedness of diverse STEM audiences. Terabytes of data have become the norm in most data environments, from scientific experiments to sensor networks to financial/business databases. Given the increasing need for industry, government, and academia to deal with large-scale data and the tremendous impact such data have on a wide variety of application domains, it is imperative that we ready our workforce to face the challenges in this area. Timely introduction of these concepts to our undergraduates is important for them to remain competitive in a fast-changing global environment. Schools such as the University of California at Berkeley and the University of Washington have introduced the concepts in their curricula at various levels. However, there exists no systematic approach to teaching big-data concepts that can be adopted widely for a diverse audience ranging from a typical undergraduate to a non-traditional adult learner. Nor is there an educational framework that supports a cost-effective lab environment that can be assembled for experiential learning of data-intensive concepts. This project aims to address these issues.

Data-intensive computing has been receiving much attention as a collective solution to the data deluge brought about by tremendous advances in distributed systems and Internet-based computing. An innovative programming model called MapReduce [16], together with a peta-scale distributed file system to support it, has fundamentally changed approaches to large-scale data storage and processing. These data-intensive computing approaches are expected to have a profound impact on any application domain that deals with large-scale data, from healthcare delivery to military intelligence. A new forum called the big-data computing group [25] has been formed by a consortium of industrial stakeholders and agencies, including NSF and CRA (Computing Research Association), to promote wider dissemination of big-data solutions and to transform mainstream applications.

(b) What is the model to be implemented?

Answer: The main objective of the project is to define a comprehensive framework called TIDE

(Timely Introduction of Emerging Data-intensive computing). The approach for this Phase 2 project

leverages the results of the earlier UB project (CCLI A&I) by expanding its scope to form a

certificate program and also by providing methods to institutionalize the framework developed. TIDE

is a transformative educational model that can be applied to teach/learn other emerging sciences and

technologies. Thus the expectation is for TIDE’s appeal to transcend domain and disciplinary

boundaries. It also advances education in two areas: big-data computing and pedagogy for

introducing emerging technologies. The new learning framework offered by the proposed TIDE

project is expected to inspire the Net Generation [37] technology users to be active researchers,

designers and technical innovators contributing to scientific advancement.

(c) How is the model to be implemented?

Answer:

TIDE will implement the model by pursuing the objectives listed below. These objectives also define the Objects of Evaluation listed in the previous section.

• Define a set of core competencies that are required for research (to advance the field) and for practical application design (to build systems) in the data-intensive computing area.

• Define a certificate program consisting of five courses that effectively address the competencies defined above: data structures & algorithms, distributed systems, data-intensive computing, a domain-specific course, and a capstone project solving a data-intensive domain problem. These courses will be developed from existing courses as part of this TIDE project.

• Design and develop curriculum for the courses, including teaching material such as laboratory

exercises and case studies for practical experimentations.

• Provide multiple paths (multiple entry points) through the program, to allow maximum and

diverse participation from a variety of audiences, from research students to industrial workforce.

• Broaden participation in the certificate program by including undergraduates from all departments of the School of Engineering and Applied Sciences and the collaborating disciplines, as well as industrial workforce personnel seeking retraining.

• Mentor underrepresented minorities to enroll in the program and to benefit from it.

• Disseminate details for implementation of a simple and low-cost laboratory environment for

supporting the courses and the data-intensive storage and computing.

• Assess the progress and the effectiveness of the program using survey instruments prepared in

consultation with an experienced external evaluator and through periodic meetings.

• Provide strategies for educators to effectively adopt the TIDE framework.

(d) Who are the stakeholders (those who have credibility, power or other capital involved in the project)?

Who are the people interested in the project who may not be involved?

Answer: Stakeholders include:

1. The National Science Foundation, which funded the project.

2. Industry partners and companies that may have interest in and may benefit from the

application of the content of the Certificate Program and courses, and that may benefit from a

better educated workforce

3. Students that will receive education in data-intensive computing.

(e) What do the stakeholders want to know? What questions are most important to which stakeholders?

What questions are secondary in importance? Where do concerns coincide? Where are they in

conflict?

Answer:

1. Is the course content & technology the most advanced, up-to-date, leading-edge, commonly

used material of its kind?

2. Will the courses prepare students to deal with anticipated future challenges and to keep up

with future technology advances in this area?

3. Will the students who complete the certificate program be prepared to apply the concepts and

technology to the real-world problems & challenges that they will encounter as part of their

jobs?

(f) Who are the participants to be served?

Answer:

1. Students in the Computer Science & Engineering (CSE) curriculum at UB

2. Students in fields that need the technology such as Engineering (e.g., Electrical, Mechanical,

etc.), Biology and Biomedical fields (e.g., Biological Sciences, Bioinformatics), Chemistry

(e.g., Quantum Chemistry), Physics (e.g., Particle Physics)

3. Underrepresented groups

4. Information technology workforce personnel who need training/retraining

5. People seeking new careers in industry

(g) What are the activities and strategies that will address the problem or need which was identified?

What is the intervention? How will participants benefit? What are the expected outcomes?

Answer:

1. Design the new certificate program and institutionalize it: The TIDE Certificate Program in

Data-Intensive Computing will need to be approved by the UB CSE Department and by the

higher level School of Engineering & Applied Sciences (SEAS) which includes the CSE

Department.

2. Develop new course materials concerning data-intensive computing and include & integrate

the materials into the five courses specified for the Data-Intensive Computing Certificate

Program.

3. Prepare the software & hardware infrastructure to support the certificate program and its

courses. A stable implementation of a peta-byte distributed file system will be prepared and

staged with the help of resources available in SEAS, CSE and Dr. Chaudhary’s Accelerated

Computing Laboratory.

4. Publicize the new certificate program among departments within the university and to

incoming students: Once the certificate program is approved, it will be featured on the

university’s public websites and on all the publicity material generated by the university. All

the investigators on the proposal are heavily involved in Discovery Days and Open House for

students where the Data-Intensive Computing Certificate Program will be publicized to non-

university students who are potential candidates for admission into the university and

program. All the departmental chairpersons and undergraduate advisors will be briefed about

the new certificate program.

5. Design and introduce the new course materials and student projects/labs into the individual

courses of the certificate program, with the projects/labs designed to enable students to have

hands-on experience. The plan for development of the certificate program includes the

following: For Course 1, CSE116/250 (Data Structures & Algorithms), integrate/add at least

one lecture on data-intensive computing and provide students with information and handouts

on the new Certificate Program. For CSE486 / 586 (Distributed Systems), integrate at least

one lab project directly related to data-intensive technology. For CSE 487 / 587, change the

name of the course to Data-Intensive Computing and re-design the entire course to focus on

data-intensive computing. For the Discipline-Specific Course, integrate course materials and

lab projects that focus on appropriate applications of the technology into various (probably

existing) project courses in different UB departments. For the Capstone Project course,

develop the materials and lab project(s).

6. Schedule the courses in proper sequence. (See item (i) below.)

7. Evaluate the effectiveness of the NSF grant project, the new TIDE Certificate Program in

Data-Intensive Computing, and the Program’s individual courses using objective surveys and

other techniques. We will evaluate the effectiveness of the individual courses, the overall

effect of the TIDE framework, and keep track of the number of STEM students who enroll in

the program & courses. Based on the evaluation results each year, we will intervene and

improve the content and methodology of the Program and courses.

8. Monitor milestones and adjust: All the participants, including supported students, will meet

periodically to assess the progress and monitor the milestones, and to make any adjustments

warranted. Continuous assessment and improvement will be an important activity to make

TIDE highly effective.

9. Ensure that participants benefit by getting a good understanding of data-intensive computing

and by acquiring a “hands-on” working knowledge of the technology.

10. Disseminate to a wider audience: TIDE program details will be publicized to educators and industrial stakeholders through workshops at prominent conferences, through publications, and through community portals such as Wikipedia and Facebook [17].

(h) Where will the program be located (educational level, geographical area)?

Answer:

The initial implementation of the Certificate Program and courses will be located at the University at

Buffalo, Buffalo, New York.

(i) How many months of the school year or calendar year will the program operate? When will the

program begin and end? When will the courses be offered?

Answer:

The program will operate 12 months of each year. The Certificate Program courses will be scheduled

in proper sequence. Offering of the first three courses will be evaluated after the first cycle and the

schedules may be modified depending on the demand.

1. Course 1: CS2/CSE116/CSE250 Data Structures & Algorithms is offered by the CSE

department in all semesters (Fall, Spring, Summer).

2. Course 2: CSE486 Distributed Systems is typically offered in at least each Spring

semester, and we will maintain this schedule.

3. Course 3: CSE487 Data-Intensive Computing is a critical new course and is a mandatory

course for all paths and will be offered in the Summer and Fall semesters of each year.

4. Course 4: Domain-Specific Course in the student’s major field. Scheduling depends on

each student’s particular department.

5. Course 5: Capstone Project Course is offered each semester.

(j) How much does it cost? What is the budget for the program? What human, material, and institutional

resources are needed? How much is needed for evaluation? For dissemination?

Answer: Please see the proposal budget submitted to NSF for this project.

Current: The proposal budget is broken down into these items.

Plan: As the project progresses, all expenses will be recorded by category to document actual expenses and to assess the effectiveness of the initial budget.

(k) What are the measurable outcomes which the project wants to achieve? What is the expected impact

of the project in the short run? The longer run?

Answer:

1. The TIDE Certificate Program in Data-Intensive Computing, comprised of a sequence of five

courses, that provides students with both theoretical knowledge of data-intensive computing

and practical hands-on experience with the application of the technology in the students’

major fields of study and specialization.

2. Objective surveys to improve the course content and delivery methods between subsequent

offerings of the courses.

3. All the materials recorded and disseminated for adoption by other educators and trainers.

(l) What arrangements have been made for data collection? What are the understandings regarding

record keeping, responding to surveys, and participation in testing?

Answer: We plan to conduct most of the surveys through the web and to store the data directly in a

database for analysis. The content and the software for these will be designed shortly. The privacy of

the participants will be strictly maintained.
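As an illustration only, the data-collection plan in item (l) could be sketched as follows. This is a minimal sketch, not the project's survey software, which had not yet been designed when the answer was written. The table name, column names, and function names are hypothetical, and the sketch assumes that privacy is supported by storing no student identifiers with the responses.

```python
import sqlite3

def store_response(conn, course, question, rating):
    """Store one anonymous survey response (no student identifier is kept)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS responses "
        "(course TEXT, question TEXT, rating INTEGER)"
    )
    conn.execute(
        "INSERT INTO responses VALUES (?, ?, ?)", (course, question, rating)
    )
    conn.commit()

def mean_rating(conn, course):
    """Return the average rating across all responses for one course."""
    row = conn.execute(
        "SELECT AVG(rating) FROM responses WHERE course = ?", (course,)
    ).fetchone()
    return row[0]

# Hypothetical usage with an in-memory database.
conn = sqlite3.connect(":memory:")
store_response(conn, "CSE 487/587", "Labs were helpful", 5)
store_response(conn, "CSE 487/587", "Labs were helpful", 3)
print(mean_rating(conn, "CSE 487/587"))
```

Keeping responses in a database table in this way would let results be compared between offerings of the same course.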

5 Formative Evaluation

FORMATIVE EVALUATION

Purpose: To assess ongoing project activities. The intent of the Formative Evaluation is to provide

information to improve the project. Formative Evaluation begins at project start-up and continues

throughout the life of the project. It typically consists of two segments:

1. Implementation Evaluation: To assess whether the project is being conducted as planned.

2. Progress Evaluation: To assess progress in meeting the project's goals.

Evaluation Questions – The Basis of the Evaluation

1. Has the set of Core Competencies for the Certificate Program been defined? These are the competencies that are required for research (to advance the field) and for practical application design (to build systems) in the data-intensive computing area.

2. Has the Certificate Program in Data-Intensive Computing been designed, developed and

implemented?

o Comprised of five (5) courses: 3 specified required courses and 2 elective courses

o To be offered starting Spring 2010

o Including the design of multiple paths (multiple entry points) through the program, to allow

maximum and diverse participation from a variety of audiences, from research students to

industrial workforce.

3. Does the certificate program, including its courses, provide good coverage and appropriate content

and example applications within the area of data-intensive computing?

4. Have the Courses comprising the Certificate Program, including course materials for the five (5)

courses, been designed, developed, and implemented?

o Course 1: CSE 250 Algorithms & Data Structures (CS2) – Enhancement of existing UB CSE

course

Additional lecture materials and handouts

o Course 2: CSE 486/586 Distributed Systems – Modification of existing UB CSE course

Modification of existing departmental course outline

Modification of course syllabus

Additional/supplementary lecture materials, materials/notes for students, case studies,

student lab exercises/projects

o Course 3: CSE 487/587 Data-Intensive Computing – New UB CSE course

New departmental course outline

New syllabus

New lecture materials

New supplementary materials/notes and case studies for students

New student hands-on lab exercises/projects

o Course 4: Domain-Specific Course in the student’s major field – Special version(s) of existing

UB course(s)

Supplementary materials/notes for students

Case studies

Hands-on student projects

o Course 5: Capstone Project – Special version(s) of existing UB course(s)

Supplementary materials/notes for students

Case studies

Hands-on student projects

5. Are the courses effective in facilitating student learning?

6. Has the computational infrastructure been designed and implemented?

o A stable implementation of a peta-byte distributed file system and computational environment

for student use

7. Has dissemination and promotion to a wide audience been achieved via means such as the following?

o Website(s)

o Promotional/publicity materials, activities, and presentations within UB and external to UB,

including colleges, high schools, and industry

o Conference presentations

8. Has participation in the Certificate Program been broadened by including

o Undergraduates at UB from

All UB departments within the School of Engineering & Applied Sciences (SEAS) and

Other non-SEAS collaborating disciplines

o Industrial workforce personnel seeking training/retraining

o Mentoring activities, especially for underrepresented minorities to enroll in the program and to

benefit from it via organizations such as: BEAM, LSAMP, CSTEP, UB Honors Program

9. Has the TIDE framework and Certificate Program in Data-Intensive Computing been disseminated to

educators for their implementation and use, including the following?

o Dissemination of the certificate program and framework materials including the Certificate

Program design and documents, course outlines, syllabi, course/lecture materials, handouts,

student lab exercises/projects, and details for implementation of a simple and low-cost

computing environment for supporting the courses.

o Dissemination vehicles including documentation in the form of manuals, instructions, &

strategies, conference presentations, and website

6 Project-Level Evaluation

1. Question: Has the set of Core Competencies for the Data-Intensive Computing Certificate Program

been defined?

o These are the competencies that are required for research (to advance the field) and for practical

application design (to build systems) in the data-intensive computing area.

Answer: Yes.

Evidence:

The Core Competencies for the Program are available at the UB Wiki CSE website for the TIDE NSF

Grant Project and Certificate Program in Data-Intensive Computing: https://wiki.cse.buffalo.edu/tide/

Click on “What is TIDE?” to pull down the list of links, and then click on “Core Competencies” to

view the set of competencies.

Comments:

None.

Recommendations:

• Update the Core Competencies of the Certificate Program as needed in Year 3 of this Grant Project,

and beyond, to keep them current and relevant.

• Keep the Certificate Program Core Competencies prominently available on the important DIC

Certificate program websites, especially:

o UB’s new website (undergoing development) on Data-Intensive Computing Research &

Education funded by this NSF Grant Project:

http://www.cse.buffalo.edu/~bina/DataIntensive/index.html

o Prof. Ramamurthy’s course website:

http://www.cse.buffalo.edu/~bina/cse486/spring2010/index.html

2. Question: Has the new Certificate Program in Data-Intensive Computing been designed, developed

and implemented?

o Comprised of five (5) courses: 3 specified required courses and 2 elective courses

o Courses to be offered starting Spring 2010

o Including the design of multiple paths (multiple entry points) through the program, to allow

maximum and diverse participation from a variety of audiences, from research students to

industrial workforce.

Answer: Yes. As part of this NSF Project, the new Certificate Program in Data-Intensive Computing (DIC) has been designed and developed and is being implemented at UB. It was submitted to SUNY, received SUNY approval, and is now a registered Certificate Program at the state level.

Certificate Program approvals by UB and SUNY: The new DIC Certificate Program

• Was reviewed and officially approved at UB by the office of the Provost for Undergraduate Education (by Vice Provost A. Scott Weber and Provost Satish K. Tripathi) in November 2010.

• Was submitted to the State University of New York for approval & registration at the state level on November 15, 2010 by UB Provost Tripathi’s office.

• Was approved by SUNY and is now a registered program at the state level, as stated in a letter received by UB dated February 18, 2011 (see Appendix A).

Year 1 Courses offered in Spring 2010:

Two courses, namely Courses 1 & 2, were developed and implemented in Year 1. More details on

the individual courses are covered in the next sub-section as part of the next Object of Evaluation.

Course 1: CSE 250 Algorithms & Data Structures (CS2) – Enhancement of existing UB CSE

course

Additional lecture materials and handouts

Course 2: CSE 486/586 Distributed Systems – Modification of existing UB CSE course

Modification of existing departmental course outline

Modification of course syllabus

Additional/supplementary lecture materials, materials/notes for students, case studies,

student lab exercises/projects

Year 2 Courses offered in Fall 2010:

Course 1: CSE 250 Algorithms & Data Structures (CS2) – Further enhancement/updating of

existing UB CSE course

Course 3: CSE 487/587 Data-Intensive Computing – New UB CSE course

Development of new departmental course outline

Development of new course syllabus

Development of lecture materials, materials/notes for students, case studies, student lab

exercises/projects, tests, etc.

Year 2 Courses offered in Spring 2011:

Course 1: CSE 250 Algorithms & Data Structures (CS2) – Further enhancement of existing UB

CSE course, including additional lecture materials and handouts

Course 2: CSE 486/586 Distributed Systems – Further modification & refinement of previously

existing UB CSE course, which was first modified in Year 1 for the DIC Certificate Program

Refinement of departmental course outline

Refinement of course syllabus

Additional/supplementary lecture materials, materials/notes for students, case studies,

student lab exercises/projects

Year 1 & 2:

Course 4: Domain-Specific Course in the Student’s Major Field - Existing courses in the different disciplines have been identified to serve as Course 4.

Multiple Paths through the Certificate Program:

This topic has been addressed in Years 1 & 2, and official documentation of this topic is being

developed for the public.

Evidence:

Detailed information on the new Certificate Program in Data-Intensive Computing at UB is available at:

UB Wiki CSE TIDE website: https://wiki.cse.buffalo.edu/tide/

UB Data-Intensive Computing Research and Education website:

http://www.cse.buffalo.edu/~bina/DataIntensive/index.html

UB Prof. Bina Ramamurthy’s personal faculty website:

http://www.cse.buffalo.edu/~bina/index.html

Letter from SUNY to UB Provost Lavallee stating that the DIC Certificate Program was approved

and is now a registered program at the state level (see Appendix A).

The UB Academic Catalog will list the new DIC Certificate Program and provide information on

the program.

The above-listed websites provide information on topics including:

What is data-intensive computing? Topics include: motivation and purpose, the approach being

developed as part of this NSF Grant Project, and the definition of data-intensive computing concepts

Background topics include: Large scale distributed systems, MapReduce programming, peta-scale

distributed file systems, and existing educational efforts

Project Plan. Topics include: implementation, sample programs, evaluation, mentoring activities,

dissemination activities, broader impact

Multiple Paths: This topic was addressed in Years 1 & 2. In fact, several industry workers (employed full-time in industry) enrolled in the Certificate Program courses and plan to complete the Certificate Program. These people are examples of students already utilizing different “non-traditional” paths through the program.

Recommendations:

• Continue the identification and/or development of Certificate Program Course 4 & Course 5 for the

various participating academic program departments.

• Update & publicize documentation of the multiple paths through the certificate program.

3. Question: Do the certificate courses at UB cover appropriate content and example applications

within the area of data-intensive computing?

Answer: Yes.

Evidence:

In Year 1, the course materials for existing courses, Course 1 CSE 116/250 and Course 2 CSE 486, were

modified and enhanced.

In Year 2, Fall Semester 2010, the course materials for Course 1 (developed in Year 1) were further updated & enhanced. The new Course 3, CSE 487/587, was developed & implemented, including its course outline, syllabus, and course materials.

In Year 2, Spring Semester 2011, the course materials for Course 1 were again updated & enhanced, and Course 2, CSE 486/586, was further developed & implemented, including its course outline, syllabus, and course materials.

All the course outlines for Courses 1, 2, & 3 cover content material that is consistent with state-of-the-

art journal articles, technical publications, conference presentations, books, etc.

All the lecture materials, demos, and lab exercises/projects cover example applications in a variety of

areas.

In Year 1, for Course 2, CSE 486 Distributed Systems, new course materials were developed to introduce the MapReduce programming model, the Hadoop Distributed File System (HDFS), Amazon EC2 cloud demos, virtualization demos, and a local Hadoop cluster architecture. The course was run in the Spring Semester 2010 of Year 1.
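For readers unfamiliar with the MapReduce programming model named above, the following is a minimal single-machine sketch of its map, shuffle, and reduce steps, using the customary word-count example. It is offered as an illustration only and is not taken from the course materials. The function names are hypothetical, and a real Hadoop job would distribute these steps across a cluster rather than run them in one process.

```python
from collections import defaultdict

def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single count."""
    return key, sum(values)

def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Hypothetical usage on two tiny "documents".
print(word_count(["big data", "Big clusters of data"]))
```

Because each map call and each reduce call depends only on its own input, a framework can run them in parallel on different machines, which is what makes the model suited to very large data sets.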

In Year 2, the new Course 3, CSE 487/587 Data-Intensive Computing, was developed and it ran in

the Fall Semester 2010. This course provides an intense coverage of the storage models, application

architectures, middleware, and programming models that address challenges in ultra-scale data.

Topics include: the motivating real-world problems, storage requirements of big data, organization of

big data repositories such as Google File System (GFS), characteristics of Write-Once-Read-Many

(WORM) data, semantic organization of data, data-intensive programming models such as

MapReduce, fault-tolerance and performance, services-based cloud computing middleware,

intelligence discovery methods, and scalable analytics and visualization. Tools that are used in the

new course include the Google App Engine (GAE), Amazon EC2, and Windows Azure.

In Year 2, the Course 2, CSE 486 Distributed Systems, was again offered and updated/refined in the

Spring semester 2011.

The updated course syllabus for CSE 486 for Spring 2011, is available at:

http://www.cse.buffalo.edu/~bina/cse486/spring2011/DescriptionJan19.pdf

The course syllabus and description for CSE 487/587 for Fall 2010 Semester, are available at:

http://www.cse.buffalo.edu/~bina/cse487/fall2010/

Comments:

All of the courses and course materials developed as part of this Grant Project for the Certificate

Program include in-depth coverage of topics, and a good spectrum of example applications. The lecture

materials include PowerPoint slides that are very professionally done. Beyond in-depth coverage of the topics, the materials also cover the motivation and rationale for Data-Intensive Computing, interesting examples of the technology and real-world applications, available resources & tools including those available on the Internet, how the technology is continuing to evolve and change, relevant challenging issues such as protecting massive data stores against security/privacy threats, opportunities for students, and how to get involved.

The lab projects, particularly for the significant courses Course 2 CSE 486 and Course 3 CSE 487/587, are well thought out & well designed, and include good explanatory material and good instructions for accomplishing the exercises. The lab exercises focus on critical topics including Web Services, dealing with peta+-sized data stores, and data-intensive programming. As mentioned above, the lab exercises provided students with the opportunity to implement real-world applications.

Recommendations:

• Continue to refine and improve all course materials based on lessons learned and student feedback.

• Include and maintain currency of example real-world problems & applications that require data-

intensive computing or cloud computing approaches in the Certificate Program courses.

• Include comparisons with traditional computing approaches to illustrate the shortfalls of traditional

approaches to the big-data problems/applications in all the Certificate Program courses.

• Ensure that the course content of all the Certificate Program courses is kept up-to-date with technical

developments in the field.

• Continue to expand and improve the variety and quality of example applications and case studies used

in lectures and projects in all the Certificate Program courses.

4. Question: Have the courses comprising the Certificate Program, including course materials for the

five (5) courses, been designed, developed and implemented?

o Course 1: CSE 116/250 (CS2) – Modification of existing UB CSE course

o Course 2: CSE 486/586 Distributed Systems – Modification of existing UB CSE course

o Course 3: CSE 487/587 Data-Intensive Computing – New UB CSE course

o Course 4: Domain-Specific Course in the student’s major field – New special version(s) of

existing UB course(s)

o Course 5: Capstone Project - New special version(s) of existing UB course(s)

Answer:

Year 1: Completion was achieved for the design, development and implementation of Course 1 &

Course 2 in Year 1. These courses ran in Year 1. Also, the course outline for Course 3, scheduled to

be taught for the first time in Year 2, was initially developed in Year 1.

Year 2: Completion was achieved for the design, development and implementation of Course 3 in

Year 2. Course 1 & Course 2 were also updated & enhanced in Year 2.

Year 1 & 2: Identification & specification of the remaining two courses, Course 4 & 5, scheduled to

run in upcoming Project Year 3, were addressed in Year 2.

Evidence:

Course materials were developed & delivered for each of Course 1, 2, & 3. These courses ran in

Years 1 & 2, and the materials were distributed to students and used in these classes.

Course 1: Implementation of Course 1 consisted of implementing modifications to existing course

CSE 116/250. These modifications were minor, namely the enhancement/replacement of one week’s

lectures and development of associated handouts.

Course 2: Implementation of Course 2 consisted of modifications that were implemented in CSE

486/586. These modifications were much more extensive. The new course materials for CSE 486/586

included:

Enhanced/modified course outline and course syllabus.

Additional lecture and presentation materials on data-intensive computing.

New data-intensive computing examples.

New lab exercises/projects on data-intensive computing.

Enhanced/modified final exam.

Enhanced/modified course and informational websites.

Course 3: CSE 487/587 Data-Intensive Computing is a new course at UB. The course outline and course materials were developed, and the course was offered for the first time in the Fall 2010 semester. New course materials for CSE 487/587 included:

New course outline and course syllabus.

New lecture and presentation materials on data-intensive computing.

New data-intensive computing examples & demos.

New lab exercises/projects on data-intensive computing.

New tests & final exam.

New course and informational websites.

Course 1, CSE 116/250, is scheduled to run every semester at UB, and indeed, ran every semester

during the time-frame of this grant project.

Course 2, CSE 486, is scheduled to run every Spring & Summer semester at UB, and indeed ran during

the Spring 2010 & Summer 2010 semesters, and during the Spring 2011 & Summer 2011 semesters.

Course 3 CSE 487/587 is scheduled to run every Spring semester at UB, and indeed, ran during the

Spring 2011 semester.

Course materials for CSE 486/586 and CSE 487/587, including course outline, syllabus, lecture

materials and lab exercises, are available at Prof. Ramamurthy’s course website:

http://www.cse.buffalo.edu/~bina/

The results of the completed CSE 486 and CSE 487/587 student course evaluation questionnaires are

reported in this Evaluation Report and included in Appendix B of this report.

Comments:

The Project Team plans to continue to update the developed materials for Courses 1, 2, & 3

during Year 3 of this Grant Project, based on faculty experience in using the materials and on student

evaluation feedback.

The project plan schedule for this Grant Project specifies further identification and implementation of Courses 4 & 5 in the next year of this Grant Project.

Recommendations:

• Continue to run the Data-Intensive Computing courses at UB and continue to improve and update the

courses & instructional materials to keep them current and relevant to solving the real-world

problems of industry.

• In Year 3, update the courses developed in Years 1 & 2 based on lessons learned, feedback from

students via the Student Course Evaluation Questionnaire, and interactions with faculty at other

institutions who are also implementing courses on data-intensive computing.

• In Year 3, implement Courses 4 & 5 scheduled for the next year of this Grant Project.

5. Question: Are the courses effective in facilitating student learning?

Answer: Yes.

Evidence:

In Year 1, Spring 2010 semester, performance of students in Course 2, CSE 486, on lab projects and

final exam.

In Year 1, Spring 2010 semester, feedback from students in Course 2, CSE 486 obtained from the

Student Course Evaluation Questionnaires.

In Year 2, Fall 2010 semester, performance of students in Course 3, CSE 487/587, on lab projects and

final exam.

In Year 2, Fall 2010 semester, feedback from students in Course 3, CSE 487/587, obtained from the

Student Course Evaluation Questionnaires.

For Year 1 & Year 2, the Student Course Evaluation Questionnaires, and the corresponding results,

for CSE 486 and CSE 487/587 are included in Appendix C of this Evaluation Report.

In Year 2, UB student Nicholas Demagio, a UB Physics major, took the CSE 487 Data-Intensive Computing course in Fall 2010. The NYC company Brilig LLC contacted UB, expressing interest in recruiting students from this CSE 487 Data-Intensive Computing course. Nicholas Demagio completed the CSE 487 course and his Bachelor’s Degree in December 2010, was hired by Brilig in December 2010, and has begun a career at the company.

In Year 2, additionally, three UB students (two undergraduates and one graduate) were selected

because of their coursework & education in the DIC Program courses, and were awarded internships

with funding support by CUBRC to work on one of the CUBRC R&D projects. The funding support

started in January 2011 and continued through the Spring 2011 and Summer 2011 semesters.

(CUBRC is a not-for-profit independent R&D company in Buffalo).

The three above-mentioned students plan to use their R&D work at CUBRC in their Capstone Project

& BS Thesis.

Comments:

Year 1:

Spring 2010 Semester – Course 2, CSE 486:

First, the students' performance in CSE 486 in Spring 2010 on the final exam indicated successful

mastery of the course content for the most part.

For CSE 486, there were 24 students enrolled. The final grade distribution for the class is as follows:

o 4 A, 7 A-, 4 B+, 4 B, 1 B-, 2 C+, 2 F

Second, the results of the Student Course Evaluation Questionnaire were very positive. See Appendix

C of this Evaluation Report for details.

Highly rated course features, as indicated by the results of the Student Course Evaluation

Questionnaire, for CSE 486:

• Students felt they got a good introduction to data-intensive computing and its future potential

• The course increased their interest in data-intensive computing and distributed systems.

• The lab exercises/projects helped them learn the course material.

• The grading of the lab exercises/projects was fair.

• The topics covered will be useful to them in the future, beyond CSE 486.

• The provided computer resources were adequate to do the lab exercises and assignments.

Year 2:

Fall 2010 Semester – Course 3, CSE 487/587:

The students' performance in CSE 487/587 in Fall 2010 on the final exam indicated successful

mastery of the course content.

For CSE 487/587, in the Fall 2010 semester, there were 22 students enrolled. The final grade

distribution for the class is as follows:

o 10 A, 8 A-, 3 B+, 1 B

Also, the results of the Student Course Evaluation Questionnaire were very positive. See Appendix C

of this Evaluation Report for details.
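As a quick consistency check, the two complete grade distributions reported in this section can be tallied against the stated enrollments. The short Python sketch below is an illustration only and is not part of the evaluation method; the "B or better" threshold and all variable names are assumptions introduced here.

```python
# Final grade counts as reported in this section.
CSE486_YEAR1 = {"A": 4, "A-": 7, "B+": 4, "B": 4, "B-": 1, "C+": 2, "F": 2}
CSE487_FALL2010 = {"A": 10, "A-": 8, "B+": 3, "B": 1}

# Assumption for illustration only: summarize by the share of "B or better".
B_OR_BETTER = {"A", "A-", "B+", "B"}


def summarize(grades):
    """Return (total students, share of students graded B or better)."""
    total = sum(grades.values())
    strong = sum(n for grade, n in grades.items() if grade in B_OR_BETTER)
    return total, strong / total


for name, grades in [("CSE 486, Year 1", CSE486_YEAR1),
                     ("CSE 487/587, Fall 2010", CSE487_FALL2010)]:
    total, share = summarize(grades)
    print(f"{name}: {total} students, {share:.0%} at B or better")
```

The totals come to 24 and 22 students, which agree with the enrollments reported for the two classes.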

Highly rated course features, as indicated by the results of the Student Course Evaluation

Questionnaire, for CSE 487/587 included:

• Students felt they learned a lot about data-intensive computing and its real-world value &

applications.

• The course increased their interest in the field.

• The lab exercises/projects helped them learn the course material.

• The grading of lab exercises/projects and tests was fair.

• The topics covered will be useful to them in their future educational & workplace endeavors.

• The computer resources were adequate to do the lab exercises and assignments.

Spring 2011 Semester – Course 2, CSE 486:

The students' performance in CSE 486 in Spring 2011 on the final exam indicated successful mastery

of the course content.

For CSE 486, in the Spring 2011 semester, there were 142 students initially enrolled and 112 students

completed the course. The final grade distribution for the class is as follows:

o ?? A, ?? A-, ?? B+, ?? B, ?? B-, ?? C, ?? C-, ?? D, ?? F

(** Awaiting receipt of the above data from Prof. Ramamurthy **)

Results of the Student Course Evaluation Questionnaire were very positive. See Appendix C of this

Evaluation Report for details.

Recommendations:

• Continue to refine and improve the course materials for Courses 1, 2, & 3 in subsequent semesters.

• Continue to improve TA support to students.

• Continue the good work of Years 1 & 2 into Year 3 when developing/identifying the remaining

certificate courses (Certificate Program Courses 4 & 5) scheduled in Year 3 of this Grant Project.

6. Question: Has the Computational Infrastructure been designed and implemented?

o A stable implementation of a peta-byte distributed file system and a computational environment

for student use

Answer: Yes.

Evidence:

In Year 1, CSE 486 students were provided with choices including the following:

(1) At UB, a NEXOS computing facility composed primarily of PCs with Java NetBeans, Tomcat, JSP, a

5-node cluster running HDFS on Ubuntu 8.04, and connectivity to the central Oracle DBMS server.

(2) A computer development environment comprised of Eclipse, Java Netbeans, Tomcat, JSP, Oracle or

MySQL, which a student could implement on their own personal computer.

In Year 2, CSE 486 & CSE 487/587 students were provided with choices including the following:

(1) The Google Application Engine Environment available via the Internet (at no cost to the student).

(2) The Amazon Elastic Compute Cloud (Amazon EC2) available via the Internet. The user must pay

for the cost of using these services. UB students were provided funding by Amazon grants.

(3) The Microsoft Windows Azure for Cloud Computing (the user must pay for the cost of services; UB

students were provided funding by an MS grant).

Comments:

The UB Project Team has investigated, and continues to investigate & evaluate, the advantages

and disadvantages of available existing and newly emerging computing environments for data-intensive

computing. The goal is to provide students with good computing platform/environment choices, along

with associated evaluation information.

Based on the results of the Team’s Year 1 investigations for CSE 486 and its continuing Year 2 investigations for CSE 486 and CSE 487/587, the Team plans to focus on the use of the computing environments identified in Year 2, namely the Google Application Engine Environment, Amazon Elastic Compute Cloud, and MS Windows Azure. The other choices listed above for Year 1 will still be available to students, but the Google, Amazon, and MS development environments will be recommended more highly.

With regard to cost, note that most of the alternative computing environments have an associated cost. For example, for the computing environment at UB, it would be UB’s cost of labor & equipment to provide and support the computing facility for students. For the person using their own suite of tools such as Eclipse, Java Netbeans, Tomcat, JSP, Oracle/MySQL, and others, on their personal computer, the cost includes the time they will need to invest in getting the tools installed on their computer and set up and integrated so as to work together, which may be no small feat. For the Internet-based tools, the Google Application Engine Environment is reportedly free of charge, but to use Amazon EC2 and MS Azure, the user is charged a fee for using the computational environment, including computing & data storage services (the user needs to have their credit card handy).

To help with the computing costs for students for Year 2, we are pleased to report that grant

awards were received by UB from Amazon and Microsoft:

An Amazon EC2 educational grant, in Fall 2010, provided approximately $2,500 worth of computing

support to our project at UB for educational purposes. This grant is to provide computing resources

for students in the Certificate Program in Data-Intensive Computing.

An Amazon EC2 educational grant, in Spring 2011, provided approximately $7,400 from Amazon for

students, at $50/student, providing ample funding for student usage of Amazon EC2 for coursework.

An initial Microsoft Azure account provided approximately $3,500 worth of cloud computing resources to our project for educational purposes. The account, obtained by way of an MSDN Premium Subscription, provided 750 hours of cloud compute and 10 GB of cloud storage per month for the first 8 months.
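The grant figures above lend themselves to a small back-of-the-envelope calculation. The following Python sketch is illustrative only: it assumes the $50/student figure applies to the entire Spring 2011 Amazon grant and treats the 750 monthly Azure hours as hours for a single compute instance. Neither assumption is stated in the grant details, and all dollar values are the approximate figures reported above.

```python
# Approximate figures as reported in this section.
AMAZON_FALL_2010 = 2500        # dollars of EC2 support, Fall 2010
AMAZON_SPRING_2011 = 7400      # dollars of EC2 support, Spring 2011
PER_STUDENT = 50               # dollars per student (Spring 2011 grant)
AZURE_VALUE = 3500             # dollars of Azure resources
AZURE_HOURS_PER_MONTH = 750    # compute hours per month

# Assumption: the whole Spring 2011 grant is allocated at $50 per student.
students_covered = AMAZON_SPRING_2011 // PER_STUDENT

# Combined approximate value of the three awards.
total_support = AMAZON_FALL_2010 + AMAZON_SPRING_2011 + AZURE_VALUE

# Assumption: the Azure hours are consumed by one continuously running instance.
hours_in_31_day_month = 24 * 31
runs_all_month = AZURE_HOURS_PER_MONTH >= hours_in_31_day_month

print(f"Students covered at ${PER_STUDENT} each: {students_covered}")
print(f"Combined approximate support: ${total_support:,}")
print(f"One instance can run all month: {runs_all_month}")
```

Under these assumptions, the Spring 2011 Amazon grant covers 148 students at $50 each, and the 750 monthly Azure hours are enough to keep a single instance running for an entire 31-day month (744 hours).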

Recommendations:

• Continue keeping abreast of and continue evaluating the existing & emerging data-intensive

computing environments and select the best available state-of-the-art choices for students to use when

performing the lab exercises/projects assigned as part of the Certificate Program courses.

• For the UB lab facility, continue to update the hardware & software and provide on-going

maintenance support.

7. Question: Have dissemination and promotion to a wide audience via the following been achieved?

o Development of promotional/publicity materials

o Promotional/publicity activities, and presentations within UB and external to UB, including

colleges, high schools, and industry

o Website(s)

o Conference presentations

Answer: Yes.

Evidence:

Development of promotional materials:

A flyer/handout was created in Year 1 and updated in Year 2 on the UB CSE Certificate Program

in Data-Intensive Computing. The flyer is designed for distribution to interested individuals &

students and to UB departments & personnel in response to individual inquiries and at

presentations, classes, workshops, conferences, meetings, etc.

A copy of the flyer/handout is available at:

http://www.cse.buffalo.edu/~bina/TideCertificateDec8.pdf

Since the DIC Certificate Program was approved at the state level by SUNY, the program is

included in the UB catalog, website, and other University-level promotional materials.

Certificate Program students:

In Year 1, we had two Certificate Program students: one from local industry (Andy Small) and

another, a female adult learner at UB (Eva Slate). These two students were each awarded one of

the 12 student stipends, $1,000 each, provided by this NSF Grant.

In Year 2, several students in the Certificate Program courses were being considered for

scholarship awards. These students include Mohit Bansal (awarded a funded CUBRC internship

working on a CUBRC project & graduating at the end of Summer 2011), Bethany Griswald (a

sophomore enrolled in CSE 486 in Spring 2011), Bich Vu (a sophomore enrolled in CSE 486 in

Summer 2011), and Edward Poon (a sophomore enrolled in CSE 486 in Spring 2011).

Promotional/publicity activities and presentations both within UB & external to UB:

UB CSE Department: Data-Intensive Computing and the DIC Certificate Program are discussed

in CSE 116 “Introduction to CS for Majors II” each semester and the promotional/informational

DIC Certificate Program handout is distributed. Note that CSE 116 is a pre-requisite for Course 1

CSE 250 of the Certificate Program.

UB Departments: During Years 1 & 2, Prof. Ramamurthy has met with many of the science and

technology departments at UB to discuss data-intensive computing, the DIC Certificate Program,

the individual courses, and the relevance of data-intensive computing to their field, program, and

needs. Several of these departments plan to collaborate with this NSF Grant Project team and the

CSE Department to make the Certificate Program and its individual courses available to their

students. Inter-departmental discussions have included courses, technology, computing and data-

storage environments, student projects, etc.

High Schools: In Year 1, Prof. Ramamurthy spent a day at Hutch Tech High School, Buffalo,

NY, serving as a panel member for a special day of programs organized by the state-wide STEP

(Science and Technology Entry Program) organization. STEP is an ongoing New York State

Education Department-supported college preparatory program for teenagers. STEP began in 1986

to encourage and prepare more underrepresented minorities and low income secondary school

students for entry into scientific, technical, health, and health related professions, including many

areas where licensure is required. During the Hutch Tech High School full-day event, three

sessions of the planned program were conducted to three different audiences, each comprised of

students in the 10th, 11th, and 12th grades. Each audience was composed of approximately 300

students, for a total of approximately 900 students for the day. Prof. Ramamurthy participated in

all three programs.

Industry: In Year 1, Northrop Grumman donated a gift of $500 to UB in support of the newly

developing Certificate Program. The funds were used to purchase items such as scientific

calculators, USB flash drives, etc., that were used as gifts/awards to winners of the STEM Quiz

at the special STEP event held at Hutch Tech High School.

High Schools: In Year 1, during the special STEP full-day of programs at Hutch Tech High

School, Prof. Ramamurthy also participated in a special lunch that was provided for female high

school students to enable them to interact with professional women working in science and

technology fields in order for them to gain more in-depth insight into issues, concepts, potential

concerns, etc., associated with a career in science and technology for a woman.

Industry: In Year 1, Prof. Ramamurthy served as a panel member for a special one-evening

meeting of the NSBE (National Society of Black Engineers) at UB. Fifty (50) people attended the

NSBE event.

Industry: In Year 2, Prof. Ramamurthy gave an industry presentation at a full-day meeting of our

NSF-UB DIC Grant Project Team, CUBRC (Calspan-UB Research Center), and General

Dynamics on November 4, 2010. The meeting focused on R&D topics, opportunities, and

pursuits of mutual interest. Prof. Ramamurthy’s presentation was entitled “Cloud Computing: Its

Capabilities and Limitations”. Prof. Ramamurthy interacted with meeting participants, gathered

relevant information, and “spread the word” regarding the NSF Grant Project work being

conducted at UB.

Industry: In Year 2, as a result of the above-mentioned meeting, three UB students (two

undergraduates and one graduate) were awarded funded internship positions at CUBRC on one of

the CUBRC R&D projects. The internships started in January 2011 and continued through the

Spring 2011 and Summer 2011 semesters (CUBRC is a not-for-profit independent R&D company

in Buffalo). Prof. Ramamurthy supervises the work of the students and weekly meetings are held

between the involved CUBRC and UB personnel.

Industry: In Year 2, Prof. Ramamurthy gave an industry presentation at a full-day meeting of our

NSF-UB DIC Grant Project Team and the Monroe County Library System staff on December 2,

2010, in Rochester, NY. Prof. Ramamurthy’s presentation was entitled “Cloud Computing and

Migrating Your Systems to the Cloud”. The purpose of the meeting was to explore the

applicability of data-intensive computing to the needs of the Monroe County Library System,

which includes Rochester, NY.

Industry: In Year 2, soon after Prof. Ramamurthy’s presentation to the Monroe County Library System staff on December 2, 2010 (listed above), a copy of her briefing slides was requested for Eastman Kodak, Rochester, NY, for consideration in their development of a disaster recovery plan.

Middle Schools: In Year 2, Prof. Ramamurthy visited Gaskill Middle School in Niagara County,

NY, on March 22, 2011, and gave a presentation that included coverage of the motivating

problems, solution approaches, commercially provided Internet-based solutions, live interactive

demos of cloud-computing features, and Q&A sessions. With the help of a small donation from

Northrop Grumman Corporation, Prof. Ramamurthy participated in the procurement of some

basic science kits for the school’s science labs. Dr. Jessica Poulin, Prof. Ramamurthy, and a

CSUGS student visited Gaskill and made a presentation & demonstrated PopWorld, the

evolutionary Biology tool being developed on another NSF grant project at UB.

CSTEP (Collegiate Science and Technology Entry Program): Prof. Ramamurthy has served as a

faculty mentor for under-represented minority students for more than 10 years. In Summer 2011,

Prof. Ramamurthy served as mentor for Damian Ogbonna, a sophomore CEN major.

NSBE (National Society of Black Engineers): Prof. Ramamurthy served as a Technology panel

member on October 22, and was honored at the Annual Awards event on April 22, 2011.

Websites:

Three websites were established in Year 1 and updated & expanded in Year 2 to disseminate and

share the course materials, courseware, and other products of this Grant project with faculty

within UB and at other institutions. These websites include:

The UB Wiki CSE website for the TIDE NSF Grant Project and Certificate Program in Data-

Intensive Computing: https://wiki.cse.buffalo.edu/tide/

Prof. Ramamurthy’s website which includes coverage of this NSF Grant Project and individual

courses: http://www.cse.buffalo.edu/~bina/ This website includes a link to the website for Course

2 of the Certificate Program: http://www.cse.buffalo.edu/~bina/cse486/spring2010/index.html

UB’s new website (a work in progress undergoing development) on Data-Intensive Computing

Research & Education funded by this NSF Grant Project:

http://www.cse.buffalo.edu/~bina/DataIntensive/index.html

Conference attendance and presentations:

The Project PI, Prof. Ramamurthy, has attended and participated in several conferences, workshops,

and meetings during Years 1 & 2 to "spread the word" regarding the accomplishments and products

of this grant project. These include:

Year 1:

Prof. Ramamurthy attended the SIGCSE 2010 conference during March 9-13 in Milwaukee, WI.

The conference included a full-day pre-conference workshop on the MS Azure Cloud and a 4-

hour workshop on the Amazon Cloud Environment with hands-on experience. Prof. Ramamurthy

interacted with conference participants, gathered relevant information and “spread the word”

regarding the Grant Project work being conducted at UB.

Prof. Ramamurthy attended the 1st Symposium on Cloud Computing held in Indianapolis, IN

during June 9-11, 2010. Again, Prof. Ramamurthy exploited this opportunity to expand her

knowledge in the area of cloud computing, gathered relevant information, and “spread the word”

regarding the Grant Project work being conducted at UB.

Prof. Ramamurthy attended and gave presentations at ICAET 2010 in Chennai, India:

Conference presentation given by Prof. Ramamurthy at the Women-in-Computing Conference at

ICEAT 2010, Chennai, India, on 6/23/2010 entitled “Data-intensive Computing”. This

presentation included coverage of the UB NSF Grant Project, the Certificate Program being

developed at UB as part of the Project, motivation and purpose, the TIDE approach, definition of

data-intensive computing concepts, solution approaches and technology, and examples. http://www.cse.buffalo.edu/~bina/DataIntensiveComputingJun24.pdf

Conference presentation given by Prof. Ramamurthy at ICEAT 2010, Chennai, India, on

6/24/2010 entitled “Cloud: The Next Generation Computer”. This presentation included coverage

of the UB NSF Grant Project, cloud computing and its relevance to big-data and data-intensive

computing, motivating problems and challenges, cloud computing concepts and models, and

solution technology. http://www.cse.buffalo.edu/~bina/TheCloudJune24.pdf

Year 2: Invited Presentations/Participation (Spring & Summer 2011)

Jan 26-28: AAAS: Transforming Undergraduate Education in STEM PI Conference, Washington

D.C. Poster on NSF CCLI NEXOS Project.

Jan 30- Feb 1: Computing Education for the 21st Century Community Meeting, New Orleans,

LA. Based on CI-TEAM Evolutionary Biology Tool POP!World Project

March 9 -12: SIGCSE 2011: Special Interest Group in Computer Science Education, Dallas, TX.

NSF Showcase presentation: Cloud-enabling STEM Education

May 24-26: CI-TEAM PI Conference: Invited Member of a Panel on Competencies, UIUC,

Urbana-Champaign, IL.

June 2-3: Cloud Futures 2011, Invited presentation, "Introducing Cloud Computing into STEM

Curriculum Using Microsoft Azure", Redmond, WA.

June 5-9: SemTech 2011, The Semantic Technology Conference, San Francisco, CA.

June 28: Wipro, Chennai, India: Cloud Computing: Concepts, Technologies and Business

Implications

August 11: University Day at Bloomberg LP, NY, USA, invited participant.

Recommendations:

The Grant Project team should continue and build on the good work performed in Project Years 1 & 2.

8. Question: Has participation in the Certificate Program been broadened by including the following?

o Undergraduates at UB from

All UB departments within the School of Engineering & Applied Sciences (SEAS) and

Other non-SEAS collaborating disciplines

o Industrial workforce personnel seeking training/retraining

o Mentoring activities, especially for underrepresented minorities to enroll in the program and to

benefit from it via organizations such as BEAM, LSAMP, STEP, UB Honors Program

Answer: Yes.

Evidence:

Prof. Ramamurthy has been working very actively to communicate and collaborate with other

departments at UB to publicize the new Certificate Program, enrich & strengthen the course

materials, and attract students into the Program and its courses. See the information provided for the

preceding Question 7 for more details.

Mentoring activities performed by Prof. Ramamurthy include supervision and encouragement of the

following undergraduate and graduate student research projects and resulting papers/presentations:

Undergraduate Research:

Austin Miller: UB Honors Program - Thesis entitled “A Methodology for Transforming

Common Algorithms to MapReduce Framework”

Regina May (CSTEP Student), Leslie Peirrot, & Mohit Bansal: “Extracting Information from

Large-Scale Data using Probabilistic Methods”, Poster presented at the Special Conference on

Academic Excellence, April 2010, University at Buffalo, NY.

Mohit Bansal (Senior) UB Honors Program – Performed research on text processing as an

intern at CUBRC during Spring & Summer 2011.

Eric Nagler (Junior) - Performed research on storing & analyzing semantic data on HBase

as an intern at CUBRC during Spring & Summer 2011.

Brian Rosenberg (Senior) - Performed research on entity extraction from text data as an

intern at CUBRC during Spring & Summer 2011.

Graduate Research:

Abhishek Agarwal: “Application Hadoop MapReduce Theory to Modern Portfolio

Analysis” (submitted to Cluster 2010)

Abhishek Agarwal: “MOPS: A Modified Priority Scheduler for Improved Resource

Utilization” (submitted to Cluster 2010)

Hingsik Kim: “Pop!World: An Evolutionary Biology Tool” (deployed on the Google App

Engine)

Amol Agarwal: “Hosting Applications on the Cloud”

Recommendations:

• Continue the active work in communicating and collaborating with other departments at UB.

• Continue work on publicizing & disseminating information on the new Certificate Program to

industry.

• Continue working actively to increase mentoring opportunities and activities.

9. Have the TIDE framework and Certificate Program been disseminated to educators for their

implementation?

o Dissemination of the certificate program and framework materials including the Certificate

Program design and documents, course outlines, syllabi, course/lecture materials, handouts,

student lab exercises/projects, and details for implementation of a simple and low-cost

computing environment for supporting the courses.

o Dissemination vehicles including documentation in the form of manuals, implementation

instructions, & strategies, conference presentations, and websites.

o Have strategies been provided for educators to effectively adopt the TIDE framework?

Answer: Yes.

Evidence:

Dissemination of materials:

The certificate program documents and related materials, developed so far in Years 1 & 2, have been

made available & disseminated via websites, conference presentations, and industry presentations.

These documents & related materials include the Certificate Program design & documents, course

outlines, syllabi, course/lecture materials, handouts, student lab exercises/projects, and information

regarding simple and low-cost computing environments for supporting the courses.

Dissemination vehicles include:

Websites including:

Prof. Ramamurthy’s website: http://www.cse.buffalo.edu/~bina/

The UB Wiki website for this NSF Grant Project: https://wiki.cse.buffalo.edu/tide/

UB’s new website (a work in progress) on Data-Intensive Computing

Research & Education funded by this NSF Grant Project:

http://www.cse.buffalo.edu/~bina/DataIntensive/index.html

Conference Presentations and Industry Presentations:

Please see the list of presentations provided as part of Question 7 above and on Prof.

Ramamurthy’s website at: http://www.cse.buffalo.edu/~bina/

Strategies to help educators adopt the program framework and certificate program:

Strategies have been included as part of the presentation material, slides, & handouts for the above

mentioned presentations.

Recommendations:

• Continue to improve & update the developed promotional/informational materials & handouts.

• Continue to expand dissemination of promotional/informational materials & handouts.

• Continue to improve, expand, and update the developed websites.

• Continue to attend/visit and give presentations at conferences, workshops, schools, and industries.

• For certificate program implementation strategies: Continue the documentation of relevant issues,

approaches, technology, etc., and write manuals to help faculty at other institutions adopt the

instructional framework and set up the DIC Certificate Program.

7 Summative Evaluation

SUMMATIVE EVALUATION

Purpose: To assess the project's success. Summative Evaluation takes place after ultimate modifications

and changes have been made, after the project is stabilized and after the impact of the project has had a

chance to be realized.

Note: This evaluation will be conducted at the end of this Grant Project.

Evaluation Questions – The Basis of the Evaluation

(1) Question: Was the project successful? To what extent did the project meet the overall goals?

Answer: TBD

(2) Question: What were its strengths and weaknesses?

Answer:

Strengths included:

1. TBD

Weaknesses included:

1. TBD

(3) Question: Did the participants benefit from the project? In what ways?

Answer: TBD

(4) Question: What components were the most effective?

Answer: TBD

(5) Question: Were the results worth the project's cost?

Answer: TBD

(6) Question: Is this project replicable and transportable?

Answer: TBD

8 Appendix A. Letter from SUNY on Certificate Program Approval & Registration

This appendix includes the letter received by UB from SUNY, dated February 18, 2011. The

letter informs UB that the Data-Intensive Computing (DIC) Certificate Program was approved by

SUNY and is now a SUNY registered program.

9 Appendix B. Student Course Evaluation

This appendix includes the results of the Student Course Evaluations for Year 2:

Fall Semester, December 2010

o Student Course Evaluation Questionnaire for CSE 487/587

o Student Course Evaluation Results for CSE 487/587

Spring Semester, May 2011

o Student Course Evaluation Questionnaire for CSE 486/586

o Student Course Evaluation Results for CSE 486/586