The Data Deluge The Growth of Unstructured Data Dr Kevin McIsaac, IBRS www.ibrs.com.au

Upload: glenn-dangerfield

Post on 31-Mar-2015

Page 1: The Data Deluge “ The Growth of Unstructured Data ” Dr Kevin McIsaac, IBRS

The Data Deluge

“The Growth of Unstructured Data”

Dr Kevin McIsaac, IBRS
www.ibrs.com.au

© Copyright 2006 IBRS All rights reserved.

Overview

The Impact of Changes in Data Growth Rates

Exploiting Data Management Technologies

Taking Control Of E-mail

Conclusions

The Impact of Changes in Data Growth Rates

Data growth rates accelerate

The “unstructured data” tipping point

How big is the impact?

Data Growth Rates Accelerate

92% of all new data is stored on magnetic media, primarily hard disks.

That data grew about 30% pa between 1999 and 2002

Growth is forecast at 60% pa through 2011!

i.e., your storage capacity will double every 18 months!

2007: First 1TB disk!

So What’s New! Data Has Always Grown At High Rates.

Source: Computer World/IBRS Data Management Survey
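
The doubling claim on this slide follows from compound growth: a quantity growing at rate r per year doubles in ln 2 / ln(1 + r) years. The sketch below is only an arithmetic check of the slide's figures; the function name is illustrative and not part of the original deck.

```python
import math

def doubling_time_months(annual_growth_rate: float) -> float:
    """Months for a quantity to double at a compound annual growth rate."""
    return 12 * math.log(2) / math.log(1 + annual_growth_rate)

# Growth rates quoted on the slide: ~30% pa (1999-2002) and 60% pa (forecast).
for rate in (0.30, 0.60):
    months = doubling_time_months(rate)
    print(f"{rate:.0%} pa -> doubles every {months:.1f} months")
```

At 60% pa the doubling time comes out at just under 18 months, which matches the slide; at the earlier 30% pa it was closer to 32 months.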

The “Unstructured Data” Tipping Point

What is “Unstructured Data”?

We have reached a tipping point where more than ½ of all data managed by IT is unstructured
Merrill Lynch estimates 85% of business data is unstructured
Some of your largest data sets are unstructured, e.g., e-mail
Unstructured data growth rate of 65%-200%
But 38% of ITOs lack a document management system

Data Management Was Traditionally About Managing Structured Data. This Focus Needs to Change.

Source: Computer World/IBRS Data Management Survey

How Big Is The Impact?

Office workers spend an average of 9.5 hr/wk searching, gathering and analysing information, with 60% of that on the Internet (Outsell)
White collar workers spend 30%-40% of their time managing documents (Gartner)
Our survey highlights:
  Strong concerns with the rate of unstructured data growth
  Lack of systems to manage this
  Few concerns with the storage infrastructure

IT Must Learn To Manage Unstructured Data As Effectively As It Does Structured Data Today

70%  Our unstructured data is growing too rapidly
65%  We do not have adequate systems to manage our unstructured data
28%  We don't know our storage costs
22%  We have problems meeting our compliance requirements
19%  Our structured data is growing too rapidly
10%  Provisioning storage takes too long
10%  We are spending too much on people to manage storage
 4%  We are spending too much on storage hardware
 4%  We are spending too much on storage software

Source: Computer World/IBRS Data Management Survey

Exploiting Data Management Technologies

Advances in Storage Hardware

Commoditisation of Storage Arrays

Information Lifecycle Management

Document Management

Data Classification

Disaster Recovery Readiness

Advances in Storage Hardware

Shugart’s Law: $ per bit of magnetic storage halves every 18 months
  ~37% pa (10%/Q), recently 50% pa!
  Flat budget supports ~60% pa growth
SANs well established & a commodity
  Fully featured arrays reasonably priced
  iSCSI taking off as a complement to FC
  Bolt-on storage virtualisation not gaining traction
Content Addressable Storage
  Use for long term archive
  TCO benefits are in the long term management of data

Shugart’s Law Ensures Drive Costs Are Contained, But What About The System Costs?
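
The percentages quoted for Shugart's Law can be derived from the 18-month halving period alone. The sketch below is a back-of-envelope check, not a pricing model; the function and constant names are illustrative, and it assumes the decline is smooth and compounds continuously through the period.

```python
HALVING_PERIOD_MONTHS = 18  # from the slide: $ per bit halves every 18 months

def price_decline(months: float, halving_period: float = HALVING_PERIOD_MONTHS) -> float:
    """Fractional fall in $ per bit over the given number of months."""
    return 1 - 0.5 ** (months / halving_period)

def flat_budget_growth(annual_decline: float) -> float:
    """Capacity growth a constant budget buys when $ per bit falls by annual_decline."""
    return 1 / (1 - annual_decline) - 1

annual = price_decline(12)
quarterly = price_decline(3)
print(f"annual decline {annual:.1%}, quarterly decline {quarterly:.1%}")
print(f"flat budget supports {flat_budget_growth(annual):.1%} pa capacity growth")
print(f"at a 50% pa decline: {flat_budget_growth(0.50):.0%} pa capacity growth")
```

Under these assumptions the annual decline is about 37% and the quarterly decline about 11% (the slide rounds this to 10%), so a flat budget buys roughly 59% more capacity each year, in line with the ~60% pa figure. A 50% pa decline would let a flat budget double capacity every year.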

Commodity Storage Arrays

G1: Monolithic arrays
  Proprietary & very expensive
G2: Modular arrays
  Proprietary with commodity components, moderately expensive
G3: Commodity based arrays
  Commodity components, standards based, inexpensive
  SAS as high performance, lower cost alternative to FC-disk
  Freely mix SAS and SATA in same frame
  In-box virtualisation for simpler management and lower cost
  Thin provisioning is the next big virtualisation technology
Potential for new vendors to challenge established players
  e.g., Compellent, EqualLogic, 3-PAR etc.

Hardware Is Just A Small Part Of The Problem. Data Management Processes Are More Important

Information Lifecycle Management

Automate the management of your data lifecycle policy
  Retain, delete, migrate, archive
Defining and enforcing policy
  Who sets policy? Who has authority?
  IT is not the data owner, just the steward!
Start with tiered storage
  Balance price with service levels
Due to high growth rates focus on unstructured data
  Transactional stuff generally OK
  Archival of e-mail and documents
Don’t confuse backup & archival!
  Separate archive from backup

While ILM is the holy grail of storage vendors, it has not yet been widely adopted

Source: Computer World/IBRS Data Management Survey
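
As a rough illustration of what "automating the data lifecycle policy" could look like, the sketch below maps a data class and an age to one of the four actions named on the slide (retain, migrate, archive, delete). Everything in it is hypothetical: the class names, the thresholds and the `POLICIES` table are invented for illustration, and in practice the thresholds would be set by the business owner of the data, with IT applying them as steward.

```python
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    RETAIN = "retain"
    MIGRATE = "migrate"
    ARCHIVE = "archive"
    DELETE = "delete"

@dataclass(frozen=True)
class LifecyclePolicy:
    """Age thresholds in days (illustrative values, set by the data owner)."""
    migrate_after: int  # move to a cheaper storage tier
    archive_after: int  # move to the archive, kept separate from backup
    delete_after: int   # end of the retention period

def lifecycle_action(age_days: int, policy: LifecyclePolicy) -> Action:
    """Return the action the policy prescribes for data of the given age."""
    if age_days >= policy.delete_after:
        return Action.DELETE
    if age_days >= policy.archive_after:
        return Action.ARCHIVE
    if age_days >= policy.migrate_after:
        return Action.MIGRATE
    return Action.RETAIN

# Hypothetical policies keyed by data class.
POLICIES = {
    "email": LifecyclePolicy(migrate_after=90, archive_after=365, delete_after=7 * 365),
    "document": LifecyclePolicy(migrate_after=180, archive_after=730, delete_after=10 * 365),
}

for data_class, age in [("email", 30), ("email", 400), ("document", 200), ("document", 4000)]:
    print(data_class, age, lifecycle_action(age, POLICIES[data_class]).value)
```

The point of the sketch is that the logic is trivial once the policy exists; the hard part, as the slide says, is deciding who sets the thresholds and who has the authority to delete.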

Document Management

Document management can eliminate significant wasted time
  “White collar workers spend 30%-40% of their time managing documents”
  But 38% have no DM system and 50% only cover some documents
Document management needs to include e-mail
  E-mail is often the largest unstructured data repository
  But only 12% said document management includes e-mail

Document Management, ILM and Archiving Are All Predicated on Data Classification and Policy

Source: Computer World/IBRS Data Management Survey

Data Classification & Policy

Only 12% had clear, formal policy. Without this:
  IT can’t act responsibly as a steward. No mandate!
  ILM is nearly impossible, i.e., data can’t be deleted and archival is difficult.
Few had metadata or taxonomies, which hampers data use and reuse

53%  We have classified some or all of our data.
35%  IT is a steward, managing data using policies set by the business.
30%  We have assigned business owners for our data.
18%  The business has defined the value of our key data.
12%  We have clear, formal policies for data management.
 6%  We create metadata to help classify data.
 4%  We create taxonomies to help classify data.

Source: Computer World/IBRS Data Management Survey

Businesses Need to Invest in Data Classification & Policy

Disaster Recovery Readiness

Disaster recovery confidence levels are high, however…

44% said they have not tested their DR plan in the last 12 months.

35% said they had done only a limited disaster recovery test in the last 12 months.

Without Regular Testing Disaster Recovery Plans Are A Lottery

Source: Computer World/IBRS Data Management Survey

Taking Control Of E-mail

The Importance of E-mail

E-mail Data Management Challenges

Managing Users’ Mailboxes

The Importance of E-mail

80% say e-mail is more important than the telephone. 74% said being without e-mail is a greater hardship than losing the telephone. (META Group)
A typical business user sends and receives around 600 e-mails per week (Ferris Research)
The average office worker spends 49 min/day managing e-mail. Upper level managers spend up to 4 hrs/day. All that sending & receiving, responding & deleting takes an enormous toll on workplace productivity. (ePolicy Institute)

E-mail Is An Essential Business Tool But E-Mail Data Management Is Still A “Cottage Industry”

E-mail Data Management Challenges

57% Said Managing E-mail Was One Of Their Top DM Problems

Top Exchange DM challenges (Osterman Research)
  Managing Exchange disaster recovery
  Managing the size of Message Stores
  Protecting & searching individual .PST files
  Restoring individual mailboxes
  Responding to legal discovery and capturing all email for compliance

Managing Users’ Mailboxes Is Key To All These Challenges

Source: Computer World/IBRS Data Management Survey

Managing Users’ Mailboxes

The common solution is to use mailbox quotas
  40% use PSTs to limit growth but 37% said it caused problems
  Just shifts the problem elsewhere
E-mail archival can be a powerful solution but…
  Only 13% had successfully implemented e-mail archiving
  Another 13% tried and failed!
  Needs robust data management policy
  Only 2% implemented an e-discovery/compliance solution!

Getting E-mail Under Control Is An Important And Urgent Issue, But Proceed With Great Caution

Source: Computer World/IBRS Data Management Survey

Conclusions

We have reached a tipping point, where unstructured data volume and growth exceed those of structured data

Learn to manage unstructured data as effectively as structured data

Invest in data classification & policy before applying technology
