Enterprise Information Architecture using Data Mining
Reshmi Chakraborty
Digital information proliferation
Our proposal & future research areas
Healthcare Sector
Utility Sector
An integrated solution framework
Digital Information Growth and need for a scalable data mining solution
In 1986, 14% of the world’s data was stored on vinyl records.
In 2000, 25% of all information was in digital media form.
By 2007, 94% of all information storage capacity was digital, totaling 276 exabytes.
Computing storage capacity is growing at around 58% per year.
This growth is increasing infrastructure requirements and complexity, and straining IT resources.
Achieving analytics with traditional data mining strategies is becoming slow and/or beyond the financial means of many organizations.
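To make the growth figure concrete, here is a minimal sketch that compounds a starting capacity forward at a fixed annual rate. The 276 exabytes and the 58% rate are the figures quoted on this slide; the function names and the assumption that the rate stays constant are illustrative only.

```python
import math


def project_capacity(start_eb: float, annual_growth: float, years: int) -> float:
    """Compound a starting capacity (in exabytes) forward at a fixed annual rate."""
    return start_eb * (1.0 + annual_growth) ** years


def doubling_time_years(annual_growth: float) -> float:
    """Years needed for capacity to double at a fixed annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_growth)


if __name__ == "__main__":
    # Assumption: growth stays at 58% per year, which the slide does not claim.
    for years in (1, 3, 5):
        print(years, "years:", round(project_capacity(276.0, 0.58, years), 1), "EB")
    print("doubling time:", round(doubling_time_years(0.58), 2), "years")
```

At this rate capacity doubles roughly every year and a half, which is the scaling pressure the slide describes.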
Healthcare – state of the union
Spending is projected to reach around $4.0 trillion by 2015, while out-of-pocket (OOP) expenses will grow by only 9%.
Consumers are “shopping” for their healthcare needs.
The result is information proliferation and a need for data mining.
Managed-care organizations (MCOs) have become a merger-and-acquisition industry.
Each MCO holds large amounts of digital prescription claims (often redundant).
MCOs are working to use this information to improve and retain customer loyalty.
They need a Clinical Master Patient Index and an efficient data mining strategy to remain profitable.
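A rough sketch of the idea behind a master patient index: redundant claims from merged organizations are grouped under one patient key. The record fields (`name`, `dob`, `claim_id`) and the exact-match rule are assumptions made for illustration; a production index would typically rely on probabilistic matching rather than this simple normalization.

```python
from collections import defaultdict


def match_key(record: dict) -> tuple:
    """Build a normalized key so 'Doe, Jane' and 'jane doe' compare equal."""
    tokens = record["name"].lower().replace(",", " ").split()
    return (" ".join(sorted(tokens)), record["dob"])


def build_index(records: list) -> dict:
    """Group claim ids under the normalized patient key they belong to."""
    index = defaultdict(list)
    for rec in records:
        index[match_key(rec)].append(rec["claim_id"])
    return dict(index)


if __name__ == "__main__":
    # Hypothetical claims coming from two merged organizations.
    claims = [
        {"name": "Doe, Jane", "dob": "1980-02-14", "claim_id": "A1"},
        {"name": "jane doe", "dob": "1980-02-14", "claim_id": "B7"},
        {"name": "John Roe", "dob": "1975-09-30", "claim_id": "A2"},
    ]
    for key, claim_ids in build_index(claims).items():
        print(key, claim_ids)
```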
Demand response in Utility sector
The fundamental problem is rising utility expense.
There is a lack of capability to monitor and control consumption.
Past data needs to be studied, with algorithms that extrapolate it into possible future scenarios.
A smarter infrastructure is needed: an integration of electrical infrastructure and information infrastructure.
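One simple way to extrapolate past consumption is a least-squares linear trend. The sketch below is only an illustration of that idea, not the algorithm proposed in this deck; the sample readings and function names are hypothetical.

```python
def fit_linear_trend(values: list) -> tuple:
    """Least-squares fit of values against their positions 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(values: list, steps_ahead: int) -> list:
    """Extend the fitted trend line past the last observation."""
    slope, intercept = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(steps_ahead)]


if __name__ == "__main__":
    # Hypothetical monthly consumption in kWh.
    history = [100.0, 110.0, 120.0, 130.0]
    print(forecast(history, 2))
```

Real demand data is seasonal, so a straight line is only a starting point for comparison with richer models.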
A tentative solution approach
The integrated data mining solution (apart from industry-specific algorithms) needs to cater to the following:
Retain customer loyalty through Portals and Mash-ups
Master Data Management System
A Clinical Master Data Management System
A Meter data management system based on power-line-communication (PLC) architecture.
Actionable analytics based on data mining algorithms.
Unified identity and access management system
Secured content management system.
Canonical data model for all electronic information exchange.
Service Oriented Architecture as part of the enterprise wide information strategy.
To scale this architecture up, implement the solution in a SaaS/BPO/cloud (EC2/Azure/Google) model.
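As an illustration of the canonical data model item above, the sketch below maps records from two source systems into one shared shape before exchange. All field names (`member_no`, `account`, `holder`, and so on) are hypothetical and stand in for whatever the real systems expose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalCustomer:
    """Single shape that every system uses when exchanging customer data."""
    customer_id: str
    full_name: str
    sector: str


def from_claims_system(row: dict) -> CanonicalCustomer:
    """Adapter for a hypothetical healthcare claims record."""
    return CanonicalCustomer(
        customer_id=str(row["member_no"]),
        full_name=f'{row["first"]} {row["last"]}',
        sector="healthcare",
    )


def from_meter_system(row: dict) -> CanonicalCustomer:
    """Adapter for a hypothetical meter data management record."""
    return CanonicalCustomer(
        customer_id=str(row["account"]),
        full_name=row["holder"].strip(),
        sector="utility",
    )


if __name__ == "__main__":
    print(from_claims_system({"member_no": 42, "first": "Jane", "last": "Doe"}))
    print(from_meter_system({"account": "U-9", "holder": " John Roe "}))
```

Each new source system then needs only one adapter into the canonical shape, rather than a translation to every other system.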
Basic Data Mining Process
[Diagram: Operational Data Store (ODS) → ETL* → Data Mart → Pre-process → Star schema based dimension cubes → Data mining algorithms (select mining logic) → Presentation logic → Analytics]
* Extract, Transform and Load
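The stages in the diagram can be sketched end to end with an in-memory SQLite database. This is a toy illustration under stated assumptions: the sample rows, the table names (`dim_customer`, `fact_usage`) and the "above average" rule that stands in for a mining algorithm are all hypothetical.

```python
import sqlite3

# Hypothetical operational rows: (customer id, region, period, amount).
OPERATIONAL_ROWS = [
    ("c1", "North", "2011-01", 120.0),
    ("c1", "North", "2011-02", 130.0),
    ("c2", "South", "2011-01", 300.0),
    ("c2", "South", "2011-02", 310.0),
]


def load_data_mart(rows):
    """ETL step: load operational rows into a small star schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dim_customer ("
        "customer_key INTEGER PRIMARY KEY, customer_id TEXT UNIQUE, region TEXT)"
    )
    conn.execute(
        "CREATE TABLE fact_usage (customer_key INTEGER, period TEXT, amount REAL)"
    )
    for customer_id, region, period, amount in rows:
        # Insert the dimension row once, then look up its surrogate key.
        conn.execute(
            "INSERT OR IGNORE INTO dim_customer (customer_id, region) VALUES (?, ?)",
            (customer_id, region),
        )
        (key,) = conn.execute(
            "SELECT customer_key FROM dim_customer WHERE customer_id = ?",
            (customer_id,),
        ).fetchone()
        conn.execute("INSERT INTO fact_usage VALUES (?, ?, ?)", (key, period, amount))
    return conn


def average_by_region(conn):
    """Cube-style aggregation: average fact amount per region."""
    query = (
        "SELECT d.region, AVG(f.amount) FROM fact_usage f "
        "JOIN dim_customer d ON d.customer_key = f.customer_key "
        "GROUP BY d.region ORDER BY d.region"
    )
    return dict(conn.execute(query).fetchall())


def above_average_regions(averages):
    """Stand-in for the mining step: flag regions above the overall mean."""
    overall = sum(averages.values()) / len(averages)
    return [region for region, avg in averages.items() if avg > overall]


if __name__ == "__main__":
    mart = load_data_mart(OPERATIONAL_ROWS)
    averages = average_by_region(mart)
    print(averages, above_average_regions(averages))
```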
Healthcare version
Utility sector version
The solution consists of installing smart meters in individual apartments and controlling them from a single location.
The solution is intended to operate in a service cloud model, where different customer energy profiles can be managed from the same information cloud.
Customer information security will be maintained at two levels. First, information to and from smart meters will be communicated using a web service security framework, and each customer will have their own security algorithm. Second, at the database level, the data will be stored in partitioned tables.
Smart meters, RF links (the ZigBee protocol will be used to communicate data out of smart meters into redundant data collection units) and data collection units will be off-the-shelf products.
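The database-level separation can be sketched as follows. SQLite has no native table partitioning, so this example emulates it with one table per customer; the table naming scheme, the column names and the id validation rule are assumptions for illustration, not the deck's actual design.

```python
import re
import sqlite3


def partition_table(customer_id: str) -> str:
    """Map a customer id to its own table name, rejecting unsafe ids."""
    if not re.fullmatch(r"[A-Za-z0-9_]+", customer_id):
        raise ValueError("unsupported customer id")
    return f"readings_{customer_id}"


def store_reading(conn, customer_id: str, timestamp: str, kwh: float) -> None:
    """Write one meter reading into the customer's own partition."""
    table = partition_table(customer_id)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (ts TEXT, kwh REAL)")
    conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (timestamp, kwh))


def readings_for(conn, customer_id: str) -> list:
    """Read back only the rows that belong to one customer."""
    table = partition_table(customer_id)
    return conn.execute(f"SELECT ts, kwh FROM {table} ORDER BY ts").fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_reading(conn, "apt1", "2011-01-01T00:00", 1.2)
    store_reading(conn, "apt1", "2011-01-01T01:00", 0.9)
    store_reading(conn, "apt2", "2011-01-01T00:00", 2.4)
    print(readings_for(conn, "apt1"))
```

Because the table name is interpolated into SQL, the id is validated first; the values themselves go through bound parameters.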
Is this a viable approach?
A smart grid enabled demand response system
iPhone based consumer centric healthcare App – adoption trend
Let's look at a SWOT analysis
Strengths
• Scalable – can be sold in subscription model.
• Standardized – follows industry standard communication protocols
Opportunities
• Strategic alliances, partnerships with cloud providers like Amazon/Google.
• Can be built entirely using open source technologies.
Threats
• User hesitation.
• Could be considered “old wine in a new bottle”.
Weaknesses
• Additional industry specific components needed which may increase cost.
• Security and data privacy are still major concerns.
Our approach to implementing an integrated information framework
Learn - Investigate - Stimulate - Tabulate - Enumerate - Net
Three step business-technology strategy development process
Prometheus is our proposed utility sector solution
Future areas of research
The following are the future areas of research that we will focus on while developing our solutions and attempting to commercialize them:
• Keeping the techniques useful as information storage techniques change, i.e. as storage becomes a hybrid of relational and semantic models.
• Developing the solution in open source mode.
• Making the structure flexible so that the end consumer can choose from a set of data mining algorithms and compare results.
• Identifying the target set of customers and testing the prototype model in a viral networking environment.
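The idea of letting the end consumer pick algorithms and compare results could look something like the sketch below. The three entries in the registry are trivial baseline predictors used as stand-ins for real data mining algorithms, and all names here are hypothetical.

```python
import statistics

# Registry of selectable algorithms; each maps a history to one prediction.
ALGORITHMS = {
    "mean_baseline": lambda history: statistics.mean(history),
    "median_baseline": lambda history: statistics.median(history),
    "last_value": lambda history: history[-1],
}


def compare(chosen: list, history: list, actual: float) -> dict:
    """Run each chosen algorithm and report its absolute prediction error."""
    results = {}
    for name in chosen:
        prediction = ALGORITHMS[name](history)
        results[name] = abs(prediction - actual)
    return results


if __name__ == "__main__":
    # Hypothetical usage history and the value that was later observed.
    errors = compare(["mean_baseline", "median_baseline"], [10, 12, 50], 14)
    for name, error in sorted(errors.items(), key=lambda item: item[1]):
        print(name, error)
```

New algorithms are added by registering another callable, so the comparison logic stays unchanged.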
In summary:
1. Information proliferation is costly from a people, process and technology perspective.
2. A business decision is a function of information and intelligence.
3. Combining points 1 and 2, we find that an integrated, hosted data mining framework can improve the top line of an organization.
Thank you