
Upload: dangdiep

Post on 14-Feb-2018


Page 1: file · Web viewData Mining—Why is it Important?

Data Mining—Why is it Important?

Data mining starts with the client. Clients naturally collect data simply by doing business, so that is where the entire process begins. But Customer Relationship Management (CRM) data is only one part of the puzzle. The other part is competitive data, industry survey data, blogs, and social media conversations. By themselves, CRM data and survey data can provide very good information, but combined with the other available data they become far more powerful. Data mining is the process of analyzing and exploring that data to discover patterns and trends.

The term Data Mining is used frequently in the research world, but it is often misunderstood. Sometimes people misuse the term to mean any kind of data extraction or data processing. However, data mining is much more than simple data analysis. According to Doug Alexander at the University of Texas, data mining is “the computer-assisted process of digging through and analyzing enormous sets of data and then extracting the meaning of the data. Data mining tools predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions. Data mining tools can answer business questions that traditionally were too time consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.”

Data mining consists of five major elements:

1) Extract, transform, and load transaction data onto the data warehouse system.

2) Store and manage the data in a multidimensional database system.

3) Provide data access to business analysts and information technology professionals.

4) Analyze the data by application software.


5) Present the data in a useful format, such as a graph or table.
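The five elements above can be sketched end to end on a toy scale. This is only an illustration under stated assumptions: the records, the `sales` table and every function name are hypothetical, and an in-memory SQLite table stands in for the warehouse (it is relational, not a true multidimensional database).

```python
import sqlite3

# Hypothetical raw transactions as they might arrive from an operational system.
RAW_TRANSACTIONS = [
    {"region": " boston ", "product": "Shirt", "amount": "19.99"},
    {"region": "Boston", "product": "shirt", "amount": "5.01"},
    {"region": "Hartford", "product": "Hat", "amount": "12.50"},
]

def transform(row):
    """Element 1 (transform): normalize text and convert the amount to cents."""
    return (
        row["region"].strip().title(),
        row["product"].strip().title(),
        round(float(row["amount"]) * 100),
    )

def load(conn, rows):
    """Elements 1-2: load the cleaned rows and store them in a warehouse table."""
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount_cents INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

def revenue_by_region(conn):
    """Elements 3-4: give analysts query access and analyze the data."""
    cursor = conn.execute(
        "SELECT region, SUM(amount_cents) FROM sales GROUP BY region ORDER BY region"
    )
    return cursor.fetchall()

def present(rows):
    """Element 5: present the result as a small text table."""
    return "\n".join(f"{region:<10}{cents / 100:>8.2f}" for region, cents in rows)

conn = sqlite3.connect(":memory:")
load(conn, [transform(r) for r in RAW_TRANSACTIONS])
summary = revenue_by_region(conn)
print(present(summary))
```

Amounts are stored as integer cents so that the totals do not pick up floating-point rounding noise.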

This technique is a game changer in the world of statistical analysis and business. It is important in this realm because it can make predictions that older analysis techniques were simply not capable of making. This table from thearling.com may help explain the evolution of data analysis through the years:

Evolutionary Step | Business Question | Enabling Technologies | Product Providers | Characteristics
Data Collection (1960s) | “What was my total revenue in the last five years?” | Computers, tapes, disks | IBM, CDC | Retrospective, static data delivery
Data Access (1980s) | “What were unit sales in New England last March?” | Relational databases (RDBMS), Structured Query Language (SQL), ODBC | Oracle, Sybase, Informix, IBM, Microsoft | Retrospective, dynamic data delivery at record level
Data Warehousing & Decision Support (1990s) | “What were unit sales in New England last March? Drill down to Boston.” | On-line analytic processing (OLAP), multidimensional databases, data warehouses | Pilot, Comshare, Arbor, Cognos, Microstrategy | Retrospective, dynamic data delivery at multiple levels
Data Mining (Emerging Today) | “What’s likely to happen to Boston unit sales next month? Why?” | Advanced algorithms, multiprocessor computers, massive databases | Pilot, Lockheed, IBM, SGI, numerous startups (nascent industry) | Prospective, proactive information delivery

Table 1. Steps in the Evolution of Data Mining.

Data Mining can be used in many different sectors of business to both predict and discover trends. It is a proactive solution for businesses looking to gain a competitive edge. In the past, we were only able to analyze what a company’s customers or clients HAD DONE, but now, with the help of Data Mining, we can predict what clientele WILL DO.

With Data Mining, companies can make better and more effective business decisions (marketing, advertising, etc.) that will help them grow.


3 Reasons Why Data Mining is (almost) Dead

Data Mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information. As the term suggests, the data is mined or queried for insight. For example, retailers use data mining techniques to do basket analysis (customers who bought this also bought that) and to further understand what other factors influence a purchase.
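As a rough illustration of basket analysis, the sketch below counts how often two items appear in the same basket and ranks the companions of one item. The baskets and the function names `pair_counts` and `also_bought` are made up for the example; real tools use more refined measures than raw co-occurrence counts.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; each set is one customer's purchase.
BASKETS = [
    {"shirt", "sunglasses", "sandals"},
    {"shirt", "sunglasses"},
    {"shirt", "hat"},
    {"sandals", "hat"},
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every pair one canonical (alphabetical) form.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

def also_bought(item, baskets):
    """Rank the items that most often share a basket with `item`."""
    ranked = Counter()
    for (a, b), n in pair_counts(baskets).items():
        if item == a:
            ranked[b] += n
        elif item == b:
            ranked[a] += n
    return ranked.most_common()

print(also_bought("shirt", BASKETS))
```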

Traditionally, data mining has consisted of analysts generating questions to feed to a database in the hope of finding an answer. This could be something like asking the data belonging to a clothing retailer, “Are customers buying Hawaiian shirts in Atlanta?” Sounds very applicable, especially when it comes to the hype around Big Data, doesn’t it?

Applicable, yes. Effective? Not so much.

Given today’s explosion of “Big Data,” companies need more advanced methods for leveraging their data – methods that don’t rely solely on tribal knowledge, personal experience or best guesses. What’s needed are new technologies and purpose-built solutions that reveal answers to questions no one even knew to ask.

That leads me to the three main reasons why traditional data mining methods are going the way of the dodo:

1. The current volume of data is unprecedented. In fact, 15 of 17 sectors in the U.S. have more data stored per company than the entire U.S. Library of Congress. According to IDC, in 2015, an estimated 7.9 zettabytes of data will be produced and replicated – the equivalent of 18 million Libraries of Congress. With these massive data sets, it’s close to impossible to figure out what to query. The number of possible queries explodes as the number of data elements grows. Should I query about customers buying shirts in Atlanta? Or in summer? Or in summer with a Coke? Or with a hot dog? The list is endless. As one of my customers said, “I do not know what questions to ask. Therein is the limitation!” The breadth and depth of this “big” data makes querying seem like trying to strike oil while digging with a toothpick.

2. Added to volume is the velocity of the data. The data is piling up faster and faster. A company encounters a continuous stream of real-time data – social media updates, customer feedback, sales figures, financial data, supply chain data, product quality data, product monitoring data and on and on. There’s simply not enough time to manually query the data – it’s like a physician trying to diagnose thousands of patients at the same time. The data must constantly inform the end-user – i.e. diagnose itself and recommend a treatment – for it to be of any strategic value.

3. As I’ve already discussed, conventional data mining techniques are driven by the analyst – or group of people – tasked with coming up with a hypothesis, which is subjective and vulnerable to personal bias and human error. Given the amount of information that’s out there, asking the right question every time is becoming more and more of a challenge because even the smartest, most experienced analysts “don’t know what they don’t know.” Querying methods are seriously biased by what the analyst thinks to ask. Again, going back to the striking oil analogy, if the analyst thinks there is oil under a certain rock, that is the only place he will dig. He could be sitting on a gold mine 50 feet away, but he’d completely miss it.
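To put a number on the first reason, here is a small sketch that counts how many distinct filter queries an analyst could pose. It assumes a simplified model in which each attribute is either ignored or pinned to exactly one of its values; the attribute sizes are hypothetical.

```python
from math import prod

def count_conjunctive_queries(values_per_attribute):
    """Count non-empty filters that fix at most one value per attribute.

    Each attribute is either left unconstrained or pinned to one of its
    values, so an attribute with v values contributes a factor of (v + 1).
    Subtracting 1 removes the single filter that constrains nothing.
    """
    return prod(v + 1 for v in values_per_attribute) - 1

# Hypothetical retailer: 50 cities, 4 seasons, 20 drinks, 30 snacks.
print(count_conjunctive_queries([50]))             # city only
print(count_conjunctive_queries([50, 4]))          # city and season
print(count_conjunctive_queries([50, 4, 20, 30]))  # plus drink and snack
```

Under this model every added attribute multiplies the number of candidate questions, which is why manual querying stops scaling long before the data does.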

Data mining is limited to manual endeavors – why limit company success to antiquated methods that by design fail to leverage the data for all it’s worth? It’s time to usher in new methods – new technologies – for transforming the enterprise from reactive – based on guesstimates, hunches, and flawed insight – to proactive – based on data-driven, actionable insight.

CMMI

Maturity Level 1, called "Initial", is characterized by "Heroic Efforts". The CMMI identifies no Process Areas at this level. You automatically achieve this level if you can design, develop, integrate, and test. Organizations at Maturity Level 1 are sometimes successful, and sometimes not.

Maturity Level 2, called "Managed", is characterized by "Basic Project Management". The seven Process Areas at Maturity Level 2 all deal with management, rather than technical issues:

Maturity Level 3, called "Defined", is characterized by "Process Standardization". This is where the bulk of the Process Areas reside in the CMMI. We find that these Process Areas fall into three main categories:

Technical – The first five Process Areas (Requirements Development, Technical Solution, Product Integration, Verification, and Validation) deal with the technical engineering work.

Process Management – The next three Process Areas (Organizational Process Focus, Organizational Process Definition, and Organizational Training) provide the infrastructure for maintaining and improving the organization's processes.

Management – The last six Process Areas (Integrated Product Management, Risk Management, Integrated Teaming, Integrated Supplier Management, Decision Analysis & Resolution, and Organizational Environment for Integration) all build more management discipline on top of the basic management Process Areas established at Maturity Level 2.

Maturity Level 4, called "Quantitatively Managed", is characterized by "Quantitative Management". With the disciplined processes established at Maturity Levels 2 and 3, the organization is now in the position to be able to gain a statistical, numbers-based understanding of its performance, and use that understanding to "manage by fact". The two Process Areas at Maturity Level 4 (Organizational Process Performance and Quantitative Project Management) apply this capability for statistical management to understand the quality of both the processes the organization uses and the products it produces.

Maturity Level 5, called "Optimizing", is characterized by "Continuous Process Improvement". Built on the disciplined processes of Maturity Levels 2 and 3, and the quantitative understanding of Maturity Level 4, the two Process Areas at Maturity Level 5 (Organizational Innovation & Deployment and Causal Analysis & Resolution) put the organization on the path of ever-improving performance by understanding and correcting the root causes of problems, and by fostering an environment of innovation and creativity.

Why Do People Believe the CMMI Has Little Value?

The CMM and CMMI have received a lot of bad press over the years. Most of that bad press can be traced to one of two things: misunderstandings and abuses.

Misunderstandings. Many people who open the CMMI book are immediately overwhelmed by the volume of information: five Maturity Levels, two Generic Goals, 12 Generic Practices, 25 Process Areas, 55 Specific Goals, 185 Specific Practices, hundreds of Sub-Practices—nearly a thousand pages in all! It is hard to blame them for feeling that this model must be way too restrictive to be applicable to a real-life organization.

Naturally, if your organization is not under a mandate to achieve a Maturity Level rating, then the Practices, and even the Goals in the CMMI take on more of a suggestive flavor. Of course, any organization would do well to take them as exceedingly strong suggestions, given the CMMI’s solid research basis!


Abuses. As we said at the beginning of this paper, the SEI designed the CMMI to be a roadmap for process improvement. But what we have seen in practice is organizations requiring their suppliers to achieve specific Maturity Level ratings. This in turn causes those suppliers to turn to the CMMI simply to achieve a rating, even if they have little or no interest in process improvement. When the CMMI is used by an organization that has no interest in process improvement, its use can (and often does) become abuse. Processes are written solely to satisfy a CMMI Appraiser, but with little or no thought for how they will affect the organization's work. Paperwork grows seemingly without bounds, and people feel that they are drowning in "process for process' sake".

Those five steps seem easy enough. But organizational change actually involves much more work than the simple mechanics of deciding to make a change. The key players in the organization must all agree on the need for change, as well as the strategy to be employed. Garnering the necessary agreement and establishing momentum are major challenges in and of themselves. But those are topics for another white paper.


How can CMMI help?

• CMMI provides a way to focus and manage hardware and software development from product inception through deployment and maintenance.
– ISO/TL9000 are still required. CMMI interfaces well with them. CMMI and TL are complementary - both are needed since they address different aspects.
    • ISO/TL9000 is a process compliance standard
    • CMMI is a process improvement model
• Behavioral changes are needed at both management and staff levels. Examples:
– Increased personal accountability
– Tighter links between Product Management, Development, SCN, etc.
• Initially a lot of investment is required – but, if properly managed, we will be more efficient and productive while turning out products with consistently higher quality.

CMMI Models within the Framework

• Models:
– Systems Engineering + Software Engineering (SE/SW)
– Systems Engineering + Software Engineering + Integrated Product and Process Development (IPPD)
– Systems Engineering + Software Engineering + Integrated Product and Process Development + Supplier Sourcing (SS)
– Software Engineering only
• Representation options:
– Staged
– Continuous
• The CMMI definition of “Systems Engineering” - “The interdisciplinary approach governing the total technical and managerial effort required to transform a set of customer needs, expectations and constraints into a product solution and to support that solution throughout the product’s life.” This includes both hardware and software.


Maturity Level 1: Initial

• Maturity Level 1 deals with performed processes.
• Processes are unpredictable, poorly controlled, reactive.
• The process performance may not be stable and may not meet specific objectives such as quality, cost, and schedule, but useful work can be done.

Maturity Level 2: Managed at the Project Level

• Maturity Level 2 deals with managed processes.
• A managed process is a performed process that is also:
– Planned and executed in accordance with policy
– Staffed with skilled people
– Provided with adequate resources
– Producing controlled outputs
– Involving stakeholders
– Reviewed and evaluated for adherence to requirements

[Figure: the five maturity levels]
Level 1, Initial: Processes are unpredictable, poorly controlled, reactive.
Level 2, Managed: Processes are planned, documented, performed, monitored, and controlled at the project level. Often reactive.
Level 3, Defined: Processes are well characterized and understood. Processes, standards, procedures, tools, etc. are defined at the organizational (Organization X) level. Proactive.
Level 4, Quantitatively Managed: Processes are controlled using statistical and other quantitative techniques.
Level 5, Optimizing: Process performance continually improved through incremental and innovative technological improvements.


• Processes are planned, documented, performed, monitored, and controlled at the project level. Often reactive.

• The managed process comes closer to achieving the specific objectives such as quality, cost, and schedule.

Maturity Level 3: Defined at the Organization Level

• Maturity Level 3 deals with defined processes.
• A defined process is a managed process that is:
– Well defined, understood, deployed and executed across the entire organization. Proactive.
– Based on processes, standards, procedures, tools, etc. that are defined at the organizational (Organization X) level. Project or local tailoring is allowed; however, it must be based on the organization’s set of standard processes and defined per the organization’s tailoring guidelines.
• Major portions of the organization cannot “opt out.”

Behaviors at the Five Levels

Maturity Level | Process Characteristics | Behaviors
Optimizing | Focus is on continuous quantitative improvement | Focus on "fire prevention"; improvement anticipated and desired, and impacts assessed.
Quantitatively Managed | Process is measured and controlled | Greater sense of teamwork and inter-dependencies
Defined | Process is characterized for the organization and is proactive | Reliance on defined process. People understand, support and follow the process.
Managed | Process is characterized for projects and is often reactive | Over reliance on experience of good people – when they go, the process goes. “Heroics.”
Initial | Process is unpredictable, poorly controlled, and reactive | Focus on "fire fighting"; effectiveness low – frustration high.


CMMI Components

• Within each of the 5 Maturity Levels, there are basic functions that need to be performed – these are called Process Areas (PAs).
• For Maturity Level 2 there are 7 Process Areas that must be completely satisfied.
• For Maturity Level 3 there are 11 Process Areas that must be completely satisfied.
• Given the interactions and overlap, it becomes more efficient to work the Maturity Level 2 and 3 issues concurrently.
• Within each PA there are Goals to be achieved and within each Goal there are Practices, work products, etc. to be followed that will support each of the Goals.

CMMI Process Areas


[Figure: CMMI model components. Maturity Levels (1-5) contain Process Area 1, Process Area 2, ... Process Area n. Each Process Area has Specific Goals supported by Specific Practices (required, specific for each process area) and Generic Goals supported by Generic Practices (required, common across all process areas). The Generic Practices are grouped by Common Features: Commitment to Perform, Ability to Perform, Directing Implementation, Verifying Implementation. Under both kinds of practice sit sub practices, typical work products, discipline amplifications, generic practice elaborations, goal and practice titles, goal and practice notes, and references.]

Example

For the Requirements Management Process Area:
An example Goal (required): “Manage Requirements”
An example Practice to support the Goal (required): “Maintain bi-directional traceability of requirements”
Examples (suggested, but not required) of typical Work Products might be a Requirements traceability matrix or a Requirements tracking system.
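To make "bi-directional traceability" concrete, the following sketch keeps two indexes so that a requirement can be traced forward to its work products and a work product back to its requirements. The class `TraceabilityMatrix`, its methods and the identifiers such as `REQ-1` are hypothetical; the CMMI does not prescribe any particular tool or data structure.

```python
from collections import defaultdict

class TraceabilityMatrix:
    """Links requirements to work products and supports lookups both ways."""

    def __init__(self):
        self._forward = defaultdict(set)   # requirement -> work products
        self._backward = defaultdict(set)  # work product -> requirements

    def link(self, requirement, work_product):
        """Record one link in both directions so the indexes never diverge."""
        self._forward[requirement].add(work_product)
        self._backward[work_product].add(requirement)

    def products_for(self, requirement):
        return sorted(self._forward.get(requirement, ()))

    def requirements_for(self, work_product):
        return sorted(self._backward.get(work_product, ()))

    def untraced(self, requirements):
        """Requirements that have no linked work product yet."""
        return [r for r in requirements if not self._forward.get(r)]

matrix = TraceabilityMatrix()
matrix.link("REQ-1", "design/login.md")
matrix.link("REQ-1", "tests/test_login.py")
matrix.link("REQ-2", "tests/test_login.py")
print(matrix.products_for("REQ-1"))
print(matrix.requirements_for("tests/test_login.py"))
print(matrix.untraced(["REQ-1", "REQ-2", "REQ-3"]))
```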

Yet another CMMI term: Institutionalization

• This is the most difficult part of CMMI implementation and the portion where managers play the biggest role and have the biggest impact

• Building and reinforcement of corporate culture that supports methods, practices and procedures so they are the ongoing way of business...

– Must be able to demonstrate institutionalization of all CMMI process areas for all organizations, technologies, etc.

• Required for all Process Areas


Scenario 1

ABC Pvt Ltd is a company with branches at Mumbai, Delhi, Chennai and Bangalore. The Sales Manager wants a quarterly sales report. Each branch has a separate operational system.

Solution 1: ABC Pvt Ltd.

• Extract sales information from each database.
• Store the information in a common repository at a single site.

[Figure: the Mumbai, Delhi, Chennai and Bangalore branch databases feed a common repository, from which the Sales Manager obtains sales per item type per branch for the first quarter.]
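Solution 1 might look like the sketch below, in which each branch database is an in-memory SQLite database and the common repository is a fourth one. The sample rows, table layout and function names are assumptions made purely for illustration.

```python
import sqlite3

# Hypothetical branch systems: each branch keeps its own operational database.
BRANCH_SALES = {
    "Mumbai": [("2018-01-15", "bread", 120), ("2018-04-02", "bread", 90)],
    "Delhi": [("2018-02-10", "dairy", 200), ("2018-03-05", "bread", 50)],
}

def make_branch_db(rows):
    """Build one branch's operational database with its own sales table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (sale_date TEXT, item_type TEXT, amount INTEGER)")
    db.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return db

def consolidate(branch_dbs):
    """Extract every branch's sales into one common repository."""
    repo = sqlite3.connect(":memory:")
    repo.execute(
        "CREATE TABLE sales (branch TEXT, sale_date TEXT, item_type TEXT, amount INTEGER)"
    )
    for branch, db in branch_dbs.items():
        rows = db.execute("SELECT sale_date, item_type, amount FROM sales").fetchall()
        repo.executemany(
            "INSERT INTO sales VALUES (?, ?, ?, ?)",
            [(branch, *row) for row in rows],
        )
    return repo

def first_quarter_report(repo, year):
    """Sales per item type per branch for January through March."""
    return repo.execute(
        "SELECT branch, item_type, SUM(amount) FROM sales "
        "WHERE sale_date >= ? AND sale_date < ? "
        "GROUP BY branch, item_type ORDER BY branch, item_type",
        (f"{year}-01-01", f"{year}-04-01"),
    ).fetchall()

branch_dbs = {name: make_branch_db(rows) for name, rows in BRANCH_SALES.items()}
repo = consolidate(branch_dbs)
report = first_quarter_report(repo, 2018)
print(report)
```

Dates are stored as ISO 8601 strings, so plain string comparison is enough to select the quarter.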


Scenario 2

One Stop Shopping Super Market has a huge operational database. Whenever executives want a report, the OLTP system becomes slow and data entry operators have to wait for some time.

Solution 2

• Extract the data needed for analysis from the operational database.
• Store it in the warehouse.
• Refresh the warehouse at regular intervals so that it contains up-to-date information for analysis.
• The warehouse will contain data with a historical perspective.

[Figure: before, the Data Entry Operator and Management share the Operational Database, and operators wait while reports run; after, data from Mumbai, Delhi, Chennai and Bangalore is loaded into a Data Warehouse, and the Sales Manager produces reports with query and analysis tools.]

Scenario 3

Cakes & Cookies is a small, new company. The President of the company wants the company to grow. He needs information so that he can make correct decisions.

Solution 3

• Improve the quality of data before loading it into the warehouse.
• Perform data cleaning and transformation before loading the data.
• Use query analysis tools to support ad hoc queries.

[Figure: Data Entry Operators record transactions in the operational database; data is extracted into the sales Data Warehouse, and the Manager obtains reports through a query and analysis tool.]

What is a Data Warehouse?

Inmon’s definition: A data warehouse is a
– subject-oriented,
– integrated,
– time-variant,
– nonvolatile
collection of data in support of management’s decision making process.

Subject-oriented

• A data warehouse is organized around subjects such as sales, product, customer.
• It focuses on modeling and analysis of data for decision makers.
• It excludes data not useful in the decision support process.

Integration

• A data warehouse is constructed by integrating multiple heterogeneous sources.
• Data preprocessing is applied to ensure consistency in terms of:
– encoding structures
– measurement of attributes
– physical attributes of data
– naming conventions
– data type formats

[Figure: RDBMS, legacy system and flat file sources pass through data processing and data transformation into the Data Warehouse.]

Time-variant

• Provides information from a historical perspective, e.g. the past 5-10 years.
• Every key structure contains, either implicitly or explicitly, an element of time.

Nonvolatile

• Data once recorded cannot be updated.
• A data warehouse requires two operations in data accessing:
– Initial loading of data
– Access of data

Operational v/s Information System

Features | Operational | Information
Characteristics | Operational processing | Informational processing
Orientation | Transaction | Analysis
User | Clerk, DBA, database professional | Knowledge workers
Function | Day to day operation | Decision support
Data | Current | Historical
View | Detailed, flat relational | Summarized, multidimensional
DB design | Application oriented | Subject oriented
Unit of work | Short, simple transaction | Complex query
Access | Read/write | Mostly read
Focus | Data in | Information out
No. of records accessed | Tens | Millions
Number of users | Thousands | Hundreds
DB size | 100 MB to GB | 100 GB to TB
Priority | High performance, high availability | High flexibility, end-user autonomy
Metric | Transaction throughput | Query throughput

Data Warehouse Architecture

[Figure: data from external sources is extracted, transformed, loaded and refreshed into the warehouse (reconciled data), alongside a metadata repository and monitoring & administration; OLAP servers serve the data to analysis and query/reporting tools.]

• Data Warehouse server
– almost always a relational DBMS, rarely flat files
• OLAP servers
– to support and operate on multi-dimensional data structures
• Clients
– Query and reporting tools
– Analysis tools
– Data mining tools

Data Warehouse Schema

• Star Schema
• Fact Constellation Schema
• Snowflake Schema

Star Schema

• A single, large, central fact table and one table for each dimension.
• Every fact points to one tuple in each of the dimensions and has additional attributes.
• Does not capture hierarchies directly.

Snowflake Schema

• Variant of the star schema model.
• A single, large, central fact table and one or more tables for each dimension.
• Dimension tables are normalized, i.e. dimension table data is split into additional tables.

Fact Constellation

• Multiple fact tables share dimension tables.
• This schema is viewed as a collection of stars, hence it is called a galaxy schema or fact constellation.
• Sophisticated applications require such a schema.
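A star schema can be tried out in a few lines with SQLite. The sketch below is only an assumed example: the table names (`fact_sales`, `dim_product`, `dim_region`, `dim_time`), the columns and the rows are invented, and a production warehouse would add keys, indexes and far more data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table per dimension.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, city TEXT, zone TEXT);
    CREATE TABLE dim_time    (time_id    INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);

    -- The central fact table: one foreign key per dimension plus the measures.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        region_id  INTEGER REFERENCES dim_region(region_id),
        time_id    INTEGER REFERENCES dim_time(time_id),
        units_sold INTEGER,
        amount     INTEGER
    );

    INSERT INTO dim_product VALUES (1, 'Milk', 'dairy'), (2, 'Loaf', 'bread');
    INSERT INTO dim_region  VALUES (1, 'Mumbai', 'West'), (2, 'Delhi', 'North');
    INSERT INTO dim_time    VALUES (1, 2018, 1), (2, 2018, 2);
    INSERT INTO fact_sales  VALUES
        (1, 1, 1, 10, 500), (2, 1, 1, 4, 120), (1, 2, 1, 7, 350), (1, 1, 2, 3, 150);
""")

def amount_by_category_and_zone(conn, year, quarter):
    """Join the fact table to its dimensions and aggregate one measure."""
    return conn.execute(
        """
        SELECT p.category, r.zone, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_region  AS r ON r.region_id  = f.region_id
        JOIN dim_time    AS t ON t.time_id    = f.time_id
        WHERE t.year = ? AND t.quarter = ?
        GROUP BY p.category, r.zone
        ORDER BY p.category, r.zone
        """,
        (year, quarter),
    ).fetchall()

result = amount_by_category_and_zone(conn, 2018, 1)
print(result)
```

Turning this into a snowflake schema would mean normalizing a dimension further, for example moving `zone` out of `dim_region` into its own table.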


Building Data Warehouse

• Data Selection
• Data Preprocessing
– Fill missing values
– Remove inconsistency
• Data Transformation & Integration
• Data Loading

Data in the warehouse is stored in the form of fact tables and dimension tables.
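The preprocessing step can be illustrated with a short sketch. It assumes one simple policy for each task: missing numeric values are filled with the mean of the known values, and inconsistent city spellings are resolved through a hand-written alias table. The records, the alias table and the function names are hypothetical.

```python
from statistics import mean

# Hypothetical raw records with a missing value and inconsistent spellings.
RAW = [
    {"city": "Mumbai", "units": 10},
    {"city": "mumbai ", "units": None},
    {"city": "Bombay", "units": 14},
    {"city": "Delhi", "units": 6},
]

# Assumed mapping of known aliases to one canonical name.
CITY_ALIASES = {"bombay": "Mumbai"}

def clean_city(name):
    """Remove inconsistency: trim, normalize case, and resolve aliases."""
    key = name.strip().lower()
    return CITY_ALIASES.get(key, key.title())

def preprocess(records):
    """Return cleaned copies with missing `units` filled by the mean."""
    known = [r["units"] for r in records if r["units"] is not None]
    fill = round(mean(known))
    return [
        {"city": clean_city(r["city"]),
         "units": fill if r["units"] is None else r["units"]}
        for r in records
    ]

cleaned = preprocess(RAW)
print(cleaned)
```

Mean imputation is only one possible choice; a median, a default value or dropping the record may suit other data better.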

Case Study

Afco Foods & Beverages is a new company which produces dairy, bread and meat products, with its production unit located at Baroda.

Their products are sold in the North, North West and Western regions of India.

They have sales units at Mumbai, Pune, Ahmedabad, Delhi and Baroda.

The President of the company wants sales information.

Sales Information


Sales Measures & Dimensions

Measures – Units sold, Amount.
Dimensions – Product, Time, Region.

Sales Data Warehouse Model


Online Analytical Processing (OLAP)

It enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.
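The "wide variety of possible views" can be pictured as the same facts aggregated along different combinations of dimensions. The sketch below is a toy stand-in for what an OLAP engine does at scale; the fact rows, the dimension names and the function `view` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, region, quarter, amount).
FACTS = [
    ("dairy", "West", "Q1", 500),
    ("bread", "West", "Q1", 120),
    ("dairy", "North", "Q1", 350),
    ("dairy", "West", "Q2", 150),
]
DIMENSIONS = ("product", "region", "quarter")

def view(facts, group_by):
    """Aggregate the amount over every dimension not named in `group_by`."""
    indexes = [DIMENSIONS.index(name) for name in group_by]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in indexes)
        totals[key] += row[-1]
    return dict(totals)

print(view(FACTS, ["product"]))            # rolled up to product
print(view(FACTS, ["product", "region"]))  # drilled down by region
print(view(FACTS, []))                     # grand total
```

Dropping a dimension from `group_by` is a roll-up and adding one is a drill-down.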

OLAP Server

An OLAP server is a high-capacity, multi-user data manipulation engine specifically designed to support and operate on multi-dimensional data structures.

Available OLAP servers:
– MOLAP server
– ROLAP server
– HOLAP server

Data Warehousing includes

• Building the data warehouse
• Online analytical processing (OLAP)
• Presentation

Need for Data Warehousing

• Industry has a huge amount of operational data.
• Knowledge workers want to turn this data into useful information.
• This information is used by them to support strategic decision making.
• It is a platform for consolidated historical data for analysis.
• It stores data of good quality so that knowledge workers can make correct decisions.

From a business perspective:
– it is the latest marketing weapon
– it helps to keep customers by learning more about their needs
– it is a valuable tool in today’s competitive, fast-evolving world

Data Warehousing Tools

• Data Warehouse
– SQL Server 2000 DTS
– Oracle 8i Warehouse Builder
• OLAP tools
– SQL Server Analysis Services
– Oracle Express Server
• Reporting tools
– MS Excel Pivot Chart
– VB Applications

• What is Crowdsourcing?
• How does Crowdsourcing work?
• Types of Crowdsourcing
• Applications of Crowdsourcing
• Benefits & Problems of Crowdsourcing

WHAT IS CROWDSOURCING?

• Crowdsourcing is the process of getting work or funding, usually online, from a crowd of people.
• The word Crowdsourcing is a combination of Crowd & Outsourcing.
• Definitions:
• Crowdsourcing is the act of outsourcing tasks, traditionally performed by an employee or contractor, to an undefined, large group of people or community (a "crowd"), through an open call.
• Crowdsourcing is an online, distributed problem solving and production model.
• The term crowdsourcing was first used by Jeff Howe in 2006 in an article for Wired magazine.

THE CROWDSOURCING PROCESS IN EIGHT STEPS

1- Company has a problem
2- Company broadcasts problem online
3- Online “crowd” is asked to give solutions
4- Crowd submits solutions
5- Crowd vets solutions
6- Company rewards winning solvers
7- Company owns winning solutions
8- Company profits

TYPES OF CROWDSOURCING

• Crowd funding
• The wisdom of the crowd
• Crowdsourcing creative work
• Microwork

CROWD FUNDING

• Crowd funding describes the collective effort of individuals who network and pool their money, usually via the Internet, to support efforts initiated by other people or organizations. This includes disaster relief, startup company funding, free software development, scientific research and many more.

THE WISDOM OF THE CROWD


• The wisdom of the crowd is the process of taking into account the collective opinion of a group of individuals rather than a single expert to answer a question.
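One simple way to "take into account the collective opinion" is to combine many independent guesses into a single figure. The sketch below uses the median, which is an assumption of this example rather than the only option, and the guesses are invented.

```python
from statistics import median

def crowd_estimate(guesses):
    """Combine individual guesses into one collective answer (the median)."""
    return median(guesses)

# Hypothetical guesses for the number of beans in a jar.
guesses = [380, 420, 455, 510, 900]
print(crowd_estimate(guesses))
```

The median is used here because a single wild guess (900 in this sample) moves it far less than it would move the mean.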

CROWDSOURCING CREATIVE WORK

• Creative crowdsourcing spans sourcing creative projects such as graphic design, architecture, apparel design, writing, illustration etc.

MICROWORK

• Microwork is a series of small tasks which together comprise a large unified project, and are completed by many people over the Internet. Microwork is considered the smallest unit of work in a virtual assembly line. It is often used where human intelligence is required to complete the task efficiently.
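A microwork pipeline can be sketched in two steps: cut the large job into small batches, then reconcile the answers several workers give for the same item. The image names, worker answers and both helper functions below are hypothetical, and majority voting is just one common way to reconcile answers.

```python
from collections import Counter

def split_into_microtasks(items, size):
    """Cut one large job into small batches that different workers can take."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def majority_label(answers):
    """Pick the most common answer among the workers for one item."""
    return Counter(answers).most_common(1)[0][0]

# Hypothetical image-labeling job with three workers per image.
images = ["img1", "img2", "img3", "img4", "img5"]
batches = split_into_microtasks(images, 2)
worker_answers = {
    "img1": ["cat", "cat", "dog"],
    "img2": ["dog", "dog", "dog"],
}
labels = {image: majority_label(a) for image, a in worker_answers.items()}
print(batches)
print(labels)
```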

APPLICATIONS OF CROWDSOURCING

• Testing & Refining a Product: Netflix, SellaBand
• Market Research: Threadless
• Knowledge Management: Accenture, Wikipedia
• Customer Service: My Starbucks ideas
• R & D: InnoCentive, P&G Connect & Develop
• Polling and Voting: InTrade
• Building a new city


The History / Genesis of Crowd sourcing

1714- Marine Pocket Clock invented
1936- Toyota Holds a Logo Contest
1955- Sydney Opera House Architecture Contest
2001- Wikipedia Launched
2002- American Idol Season 1
2005- YouTube Launched
2006- Crowdsourcing term coined

BENEFITS OF CROWDSOURCING

• Problems can be explored at comparatively little cost.
• Payment is by results.
• The organization can tap a wider range of talent than might be present in its own organization.
• Turn customers into designers.
• Turn customers into marketers.

PROBLEMS WITH CROWDSOURCING

• Quality
• Intellectual property leakage
• No time constraint
• Not much control over development or ultimate product
• Ill-will with own employees
• Choosing what to crowdsource & what to keep in-house

Benefits of Refactoring

The Summary: Refactoring is a huge aid in untangling production code without breaking it, and in improving its long-term maintainability.

Refactoring helps you achieve:

1. self-documenting code, for better readability and maintainability, which is pretty much the only kind of code documentation that ever seems to stay current (Extract Method and Introduce Local allow you to create function and variable names that are descriptive enough to rarely need comments). Until you experience readable, self-describing code, you don't know what you're missing.

2. fine-grained encapsulation, for easier debugging and code reuse: Extract Method automatically determines the parameters needed in order to create a method from the current selection, and handles them correctly. You then know exactly what external information the selected block requires in order to operate. This can be a great aid in untangling complex code during code reviews or debugging.

3. the generalization of existing code, to make it easier to apply existing code to a broader range of problems: as you Extract Method, you can easily replace things like hard-coded constants (perhaps a connection string or a table name) with parameters, thus allowing the application of proven code to new contexts.
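
The three points above can be seen in a small before/after sketch of Extract Method. All function names and the order data are hypothetical, and the extraction is done by hand here rather than by a refactoring tool.

```python
# Before: one function filters, totals, and formats, with a hard-coded threshold.
def report_before(orders):
    total = 0
    for order in orders:
        if order["amount"] >= 100:
            total += order["amount"]
    return "Large orders total: " + str(total)


# After Extract Method: the loop becomes a named function. Its parameter list
# shows exactly what the block needs, and the hard-coded 100 is now a parameter.
def total_of_orders_at_least(orders, minimum_amount):
    return sum(o["amount"] for o in orders if o["amount"] >= minimum_amount)


def report_after(orders, minimum_amount=100):
    total = total_of_orders_at_least(orders, minimum_amount)
    return "Large orders total: " + str(total)


# Made-up data: behaviour is unchanged for the original threshold.
orders = [{"amount": 50}, {"amount": 100}, {"amount": 250}]
assert report_before(orders) == report_after(orders)
```

Because the threshold is now a parameter, the same proven loop can be reused in new contexts, for example `total_of_orders_at_least(orders, 50)`.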

Understandability

More straightforward and well organized (factored) code is easier to understand.

Correctness

It's easier to identify defects by inspection in code that's easier to understand. Overly complex, poorly structured, Rube Goldberg style code is much more difficult to inspect for defects. Additionally, well-componentized code with high cohesion within components and loose coupling between components is vastly easier to put under test. Moreover, smaller, well-formed bits under test make for less overlap in code coverage between test cases, which makes for faster and more trustworthy tests (which becomes a self-reinforcing cycle driving toward better and better tests). As well, more straightforward code tends to be more predictable and reliable.

Ease of Maintenance and Evolution

Well-factored, high quality, easy to understand common components are easier to use, extend, and maintain. Many changes to the system are now easier to make because they have smaller impact and it's more obvious how to make the appropriate changes.

Refactoring code does have merit on its own just in terms of code quality and correctness issues, but where refactoring pays off the most is in maintenance and evolution of the design of the software. Often a good tactic when adding new features to old, poorly factored code is to refactor the target code then add the new feature. This often will take less development effort than trying to add the new feature without refactoring and it's a handy way to improve the quality of the code base without undertaking a lot of "pie in the sky" hypothetical advantage refactoring / redesign work that's hard to justify to management.

Cloud computing

• Definitions of Cloud computing
• Architecture of Cloud computing
• Benefits of Cloud computing
• Opportunities of Cloud Computing
• Cloud computing – Google Apps
• Grid computing vs Cloud computing

Definitions

• Cloud computing is using the internet to access someone else's software running on someone else's hardware in someone else's data center. (Lewis Cunningham [2])

• A large-scale distributed computing paradigm that is driven by economies of scale, in which a pool of abstracted, virtualized, dynamically scalable, managed computing power, storage, platforms, and services are delivered on demand to external customers over the Internet. (Ian Foster [9])

• A Cloud is a type of parallel and distributed system consisting of a collection of interconnected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources based on service-level agreements established through negotiation between the service provider and consumers. (Rajkumar Buyya [10])

Architecture of Cloud computing

Essential Characteristics [7]

On-demand self-service. A consumer can unilaterally provision computing capabilities such as server time and network storage as needed automatically, without requiring human interaction with a service provider.

Broad network access. Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, laptops, and PDAs) as well as other traditional or cloud-based software services.

Resource pooling. The provider’s computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand.

Rapid elasticity. Capabilities can be rapidly and elastically provisioned, in some cases automatically, to quickly scale out, and rapidly released to quickly scale in. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be purchased in any quantity at any time.
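
A minimal sketch of the scale-out / scale-in decision behind rapid elasticity follows. The function `desired_instances`, its parameters, and the numbers are assumptions for illustration; real providers apply their own policies and limits.

```python
import math


def desired_instances(current_load, capacity_per_instance,
                      min_instances=1, max_instances=10):
    """Return how many instances to run for the current load.

    Hypothetical rule: enough instances to cover the load, clamped to
    the range [min_instances, max_instances].
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(current_load / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))


# Load rises, then falls: the instance count follows it (scale out, then scale in).
for load in (80, 250, 990, 120):
    print(load, desired_instances(load, capacity_per_instance=100))
```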

Measured service. Cloud systems automatically control and optimize resource usage by leveraging a metering capability at some level of abstraction appropriate to the type of service. Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the service.
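
The metering idea above can be sketched as a small usage meter that records consumption per consumer and turns it into a usage-based charge. The class `UsageMeter`, the resource names, and the rates are all hypothetical and chosen only to keep the arithmetic simple.

```python
from collections import defaultdict


class UsageMeter:
    """Record resource usage per consumer and price it by unit rates."""

    def __init__(self, rates):
        self.rates = rates  # hypothetical price per unit of each metered resource
        self.usage = defaultdict(lambda: defaultdict(float))

    def record(self, consumer, resource, amount):
        if resource not in self.rates:
            raise KeyError("unmetered resource: " + resource)
        self.usage[consumer][resource] += amount

    def report(self, consumer):
        """Usage visible to both provider and consumer."""
        return dict(self.usage[consumer])

    def bill(self, consumer):
        """Charge only for what was actually used."""
        return sum(self.rates[r] * amount
                   for r, amount in self.usage[consumer].items())


meter = UsageMeter({"instance_hours": 0.5, "storage_gb": 0.25})
meter.record("acme", "instance_hours", 10)
meter.record("acme", "storage_gb", 4)
meter.record("acme", "instance_hours", 2)
print(meter.report("acme"), meter.bill("acme"))
```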

Cloud Service Models

SPI Model

• Cloud Software as a Service (SaaS)
• Cloud Platform as a Service (PaaS)
• Cloud Infrastructure as a Service (IaaS)

Infrastructure as a Service (IaaS)

• The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources.
• The consumer is able to deploy and run arbitrary software, which can include operating systems and applications.
• The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, deployed applications, and possibly limited control of select networking components (e.g., host firewalls).

Software as a Service (SaaS)

• The capability provided to the consumer is to use the provider’s applications running on a cloud infrastructure.
• The applications are accessible from various client devices through a thin client interface such as a web browser (e.g., web-based email).
• The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.

Cloud Deployment Models

• Public Cloud
• Private Cloud
• Community Cloud
• Hybrid Cloud

Public Cloud

The cloud infrastructure is made available to the general public or a large industry group and is owned by an organization selling cloud services.

Private Cloud

The cloud infrastructure is operated solely for a single organization. It may be managed by the organization or a third party, and may exist on-premises or off-premises.

Community Cloud

The cloud infrastructure is shared by several organizations and supports a specific community that has shared concerns (e.g., mission, security requirements, policy, or compliance considerations). It may be managed by the organizations or a third party and may exist on-premises or off-premises.

Hybrid Cloud

The cloud infrastructure is a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load-balancing between clouds).
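
Cloud bursting, mentioned above, can be sketched as a simple routing rule: fill the private cloud first and send only the overflow to a public cloud. The function `route_requests` and the capacity figures are assumptions for illustration, not a description of any particular product.

```python
def route_requests(incoming, private_capacity):
    """Split incoming requests between a private and a public cloud.

    Hypothetical rule: the private cloud is used up to its capacity,
    and anything beyond that "bursts" to the public cloud.
    """
    if incoming < 0 or private_capacity < 0:
        raise ValueError("counts must be non-negative")
    to_private = min(incoming, private_capacity)
    return {"private": to_private, "public": incoming - to_private}


print(route_requests(80, private_capacity=100))    # normal load stays private
print(route_requests(130, private_capacity=100))   # a peak overflows to public
```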

Benefits of Cloud Computing

Business Benefits

• Almost zero upfront infrastructure investment
• Just-in-time infrastructure
• More efficient resource utilization
• Usage-based costing
• Reduced time to market

Technical Benefits

• Automation – “Scriptable infrastructure”
• Auto-scaling
• Proactive scaling
• More efficient development lifecycle
• Improved testability
• Disaster recovery and business continuity

Opportunities of Cloud Computing

• End consumers
• Business customers
• Developers and Independent Software Vendors (ISVs)

Google App Engine

Google App Engine enables you to build web applications on the same scalable systems that power Google applications. App Engine applications are easy to build, easy to maintain, and easy to scale as your traffic and data storage needs grow.

Cost → Pay only for what you actually use once you exceed the free quota of 500 MB of storage and around 5M pageviews per month.

Trial? →

How to Create applications for Cloud computing?

• Build an App Engine application using standard Java web technologies, such as servlets and JSP.
• Create an App Engine Java project with Eclipse → use the Google Plugin for Eclipse for App Engine development.
• Use the App Engine datastore with the Java Data Objects (JDO) standard interface.
• Upload your app to App Engine.

Grid computing vs Cloud computing

• Collective: interactions across collections of resources, directory services.
• Platform: a collection of specialized tools, middleware and services on top of the unified resources to provide a development and/or deployment platform.
• Unified Resources: resources that have been abstracted/encapsulated.
• Resource: discovery, negotiation, monitoring, accounting and payment of sharing operations on individual resources.
• Connectivity: communication and authentication protocols.

Application

• Grid Computing emerged in eScience to solve scientific problems requiring HPC.
• Cloud Computing is rather oriented towards applications that run permanently and have varying demand for physical resources while running, such as the well-known CRM SaaS Salesforce.com.

Cloud: Increase computing. Increase storage. Consumption basis. IBM, Google, Microsoft. Hour, storage, view…

Grid: Increase computing. Increase storage. Project-oriented. Academia or gov. labs. Number of service units.