content lifecycle management - best practices for governance, archiving, compliance & mining...


Upload: alfresco-software

Post on 05-Dec-2014


DESCRIPTION

See the full webinar here: http://www.alfresco.com/about/events/ondemand This webinar discusses the possibility of a full content lifecycle management solution that addresses all lifecycle needs, from content creation to archiving and retention. Unstructured, file-based data is growing faster than any other data type in organizations. What is needed is a system that is easy to manage, scales to support the amount of file data being stored, and preserves it for the long term. Industries that are regulated by government for data retention and integrity need solutions that make compliance simpler, without significant overhead. CAStor is a Content Addressable Storage solution from Caringo that integrates with Alfresco Enterprise Edition. CAStor’s software approach creates high-performance, massively scalable clustered storage on standard x86 server hardware. This provides customers with affordable content storage that can start at one terabyte and scale seamlessly into petabytes as the business grows.

TRANSCRIPT

Page 1: Content Lifecycle Management - Best Practices for Governance, Archiving, Compliance & Mining Unstructured Data - Caringo and Alfresco Software


© 2009 Caringo, Inc.

Access

Store

Distribute

Caringo & Alfresco: Complete Content Lifecycle Solution

Page 2

Managing File-Based Data as Content

Storing file-based data is as much an information management problem as it is an issue with the storage technology

The point where business and IT needs converge

Business Need: Protecting and preserving intellectual property and business critical records for future benefit

IT Need: Implementing a cost-effective infrastructure that ensures the availability and integrity of file-based data

Page 3

Realities of File-Based Data

Unstructured data
• Over 95% is “unstructured” [1]

Massive file growth
• Up to 120% per year [2]

Low reuse of files [3]
• 90% never accessed after creation
• Of the files that are accessed, 65% are accessed only once

Aging files occupying expensive storage
• Software needed to migrate files to secondary storage
• Added cost and complexity

Must meet compliance mandates
• Secondary storage tier required

[Chart: 90% of files never accessed again; 10% accessed, of which 65% accessed once]

[1] IDC, The Expanding Digital Universe
[2] The Economic Impact of File Virtualization, IDC
[3] Measurement and Analysis of Large-Scale Network File System Workloads, UC Santa Cruz
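To get a feel for the growth figure on this slide, the compounding can be sketched in a few lines of Python. This is only an illustration: it assumes "120% per year" means capacity multiplies by 2.2 each year, and the function name and starting capacity are hypothetical, not taken from the presentation.

```python
def projected_capacity(start_tb: float, annual_growth: float, years: int) -> float:
    """Compound a starting capacity by a fixed annual growth rate.

    annual_growth is a fraction, so 1.2 means 120% growth per year
    (an assumed reading of the slide's figure).
    """
    return start_tb * (1 + annual_growth) ** years


# Hypothetical example: 1 TB of file data growing at the slide's upper bound.
three_year_tb = projected_capacity(1.0, 1.2, 3)
```

Under that reading, a single terabyte passes ten terabytes within three years, which is the kind of curve that makes manual migration between storage tiers hard to sustain.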

Page 4

File Storage Challenges

Today’s storage requirements are different
• Millions and billions of files on thousands of large disk drives

File systems simply cannot stretch any farther
• The weight of layers of complexity and virtualization makes them brittle
• They hit maximums on file size and number of files and servers
• They encounter folder and drive letter problems

Newer file systems are high-maintenance
• Even with layers of virtualization, underlying file systems must still be managed, migrated, backed up and maintained
• Requires highly skilled administrators

Volume of file data is a major information management problem
• Folder/sub-folder/file name becomes cryptic at scale (millions/billions)
• File systems provide no informational context for files

Page 5

Next Gen: Internet Scale and Object-Based

• Majority of capacity in commercial sector born as file-based, rich digital content

• 5 key infrastructure requirements (Enterprise Strategy Group)
  • Infinite scale: in real-time, dynamically, no human intervention
  • No boundaries: expand beyond walls of IT department
  • Operationally efficient: leverage commodity components, policy-based automation
  • Self-management: auto re-balance and optimize, no human intervention
  • Self-healing: withstand failures, automatically adjust/heal itself

• Object-based storage (IDC)
  • 4 tests/criteria for technology
  • Self-referencing: unique address for each file/object
  • Described by metadata: beyond standard file system
  • Location independence
  • Dynamic presentation: not fixed to a traditional tree format
  • Intelligent replication/distribution

Page 6

Convergence: ECM & Content Storage

• Effectively manage content from creation through storage and expiration

• Alfresco2Caringo interface available at Alfresco Forge
  • Developed by XeniT
  • Alfresco and Caringo Partner

• Alfresco ECM manages business process & workflow

• Caringo stores and protects business-critical content

• Ensure content integrity and preservation

• Preserve context of content for the long-term

• Accessible and available well into the future

Page 7

Covering the Complete Storage Workflow

Comprehensive solution to access, store and distribute
• HTTP access for cloud storage and Web 2.0
• Complete business solutions
• Continuous data availability
• Long-term data protection
• Intelligent data replication for content distribution and disaster recovery

[Diagram: Content File Server (CFS), Integrated Solutions, Native CAStor]

Page 8

CAStor Software Key Features

• Runs on affordable and standard x86 server hardware
  • Delivers flexibility and choice

• Massively scalable storage cluster
  • Start small and scale to billions of files or objects
  • As you grow from TBs to PBs, access bandwidth also grows

• Increase capacity seamlessly
  • No disruption in operations or data availability. No migration!

• Manages and repairs itself automatically and faster than RAID

• Local and Wide Area Replication for DR and backup

• Data protection for regulatory compliance and internal governance
  • WORM, integrity checking, authenticity, object-level retention, LifePoints

• Rich metadata support
  • Attach and persist descriptive metadata with objects
  • Content in Context

[Diagram: CAStor Cluster, nodes 1 through n connected over GigE]

Page 9

Early File Management Challenge

• 8dot3 in DOS days
  • Eight characters + extension
  • Example: C:\Directory\document.doc

• Organizational challenge for even hundreds of files
  • Significant position coding schemes
  • Include fully qualified path and name on documents
  • Law firms still do this today

• System metadata only, basic
  • Non-descriptive, not useful in organizing files
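The 8dot3 limit described on this slide can be illustrated with a small check in Python. This is a simplified sketch: it only enforces the eight-character name and three-character extension lengths with a conservative character set, and the names `EIGHT_DOT_THREE` and `is_8dot3` are hypothetical.

```python
import re

# Simplified 8.3 pattern: 1 to 8 name characters, then an optional dot and
# 1 to 3 extension characters. Real DOS rules permit a few more symbols.
EIGHT_DOT_THREE = re.compile(r"[A-Za-z0-9_~-]{1,8}(\.[A-Za-z0-9_~-]{1,3})?")


def is_8dot3(filename: str) -> bool:
    """Return True if a bare file name (no path) fits the simplified 8.3 form."""
    return EIGHT_DOT_THREE.fullmatch(filename) is not None


short_ok = is_8dot3("document.doc")            # fits: 8 characters + 3
long_ok = is_8dot3("This is my document.doc")  # too long, contains spaces
```

With so little room in the name itself, any descriptive information had to be squeezed into position-coded characters, which is the organizational challenge the slide points to.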

Page 10

Incremental Advancement

• Long file names introduced in Windows in the mid-90s
  • Promise of better identifying files for organization and finding

• 8dot3 turned into this:
  \My Documents\This is my document.doc
  and…
  \My Documents\This is my document v2.doc

• Folder/sub-folder hierarchy is cumbersome
  • File counts now in the millions and beyond

• Remains difficult to manage especially over time
  • Millions will turn into billions

• File names still lack informational value

Page 11

CAStor Content Storage Software: Object-Based

• CAStor ideally suited for file-based data storage

• Supports rich metadata tags
  • System generated metadata
  • Custom metadata
  • Descriptive information lives with the file

[Diagram: a stored object made up of a Content Address (UUID), Metadata, and File Data]

HTTP/1.1 200 OK
Date: Thu, 26 Jun 2008 21:26:34 GMT
Server: CAStor Cluster/2.2
CAStor-Application-Name: FinalCutPro
CAStor-Create-Date: 2008-06-26 21:26:14.687000
Castor-System-Cluster: Internet Demo Cluster
Castor-System-Created: Thu, 26 Jun 2008 21:26:20 GMT
Content-Disposition: inline; filename=Sports%20Segment%206-26-08.mxf
Content-Length: 8619354
Content-type: application/mxf
lifepoint: [Thu, 03 Jul 2008 21:26:14 GMT] reps=2, deletable=True
lifepoint: [] delete
Replica-Count: 2
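Because the metadata on this slide travels as ordinary HTTP response headers, a client can read it with plain string handling. The sketch below is a minimal illustration in Python, not Caringo's client library: `parse_headers` is a hypothetical helper, and the sample text is a subset of the headers shown on the slide. Values are collected into lists because `lifepoint` appears more than once.

```python
# A subset of the response headers shown on the slide.
SAMPLE_RESPONSE = """\
HTTP/1.1 200 OK
Date: Thu, 26 Jun 2008 21:26:34 GMT
Server: CAStor Cluster/2.2
CAStor-Application-Name: FinalCutPro
Content-Length: 8619354
Content-type: application/mxf
lifepoint: [Thu, 03 Jul 2008 21:26:14 GMT] reps=2, deletable=True
lifepoint: [] delete
Replica-Count: 2
"""


def parse_headers(raw: str) -> dict[str, list[str]]:
    """Map lower-cased header names to every value seen, in order."""
    headers: dict[str, list[str]] = {}
    lines = raw.strip().splitlines()
    for line in lines[1:]:  # skip the "HTTP/1.1 200 OK" status line
        # Split on the first colon only, so dates and times stay intact.
        name, _, value = line.partition(":")
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return headers


metadata = parse_headers(SAMPLE_RESPONSE)
```

After parsing, `metadata["content-type"]` holds the MIME type and `metadata["lifepoint"]` holds both lifepoint entries in the order they were sent.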

Page 12

CAStor Content Objects

• Supports all types of digital content

• Metadata values stored are specific to each individual type

• Vast 128-bit address space
  • Never run out of UUIDs
  • Billions of objects

• Define metadata values to drive replication and distribution

[Diagram: four objects (Video, Image, Audio, Doc), each with its own Content Address (UUID1 to UUID4) and Metadata]
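The "never run out of UUIDs" point follows from simple arithmetic on a 128-bit space, sketched here in Python. The object count is a hypothetical figure chosen for illustration, and `uuid.uuid4()` is Python's standard random UUID generator, used only to show what a 128-bit identifier looks like; how CAStor actually assigns addresses is not described in the slides.

```python
import uuid

ADDRESS_BITS = 128
total_addresses = 2 ** ADDRESS_BITS  # every possible 128-bit value

# Hypothetical cluster holding one trillion objects, far beyond "billions".
objects = 10 ** 12
fraction_used = objects / total_addresses

# A 128-bit identifier rendered as 32 hexadecimal digits.
example_address = uuid.uuid4()
```

Even at a trillion objects, `fraction_used` is on the order of 10^-27, so the address space is not a practical limit.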

Page 13

Metadata Enables Intelligence: Filter and Rules Engine

HTTP/1.1 200 OK
Date: Thu, 26 Jun 2008 21:26:34 GMT
Server: CAStor Cluster/2.2
CAStor-Application-Name: SimpleCASg
CAStor-Create-Date: 2008-06-26 21:26:14.687000
Castor-System-Cluster: Internet Demo Cluster
Castor-System-Created: Thu, 26 Jun 2008 21:26:20 GMT
Content-Disposition: inline; filename=Car%20Chase%206-26-08.mxf
Content-Length: 86193452
Content-type: application/mxf
lifepoint: [Thu, 03 Jul 2008 21:26:14 GMT] reps=2, deletable=True
lifepoint: [] delete
Replica-Count: 2

Metadata filtered for specific value(s)

Rule(s) fire when condition met

If Content-type = MXF then Replicate to DR Facility
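The filter-then-fire idea on this slide can be sketched as a tiny rules engine in Python. This is an illustration of the concept only, not the product's actual rule syntax: the `Rule` class, the `matching_actions` function, and the action string `replicate:dr-facility` are all hypothetical, and the slide's "MXF" condition is assumed to correspond to the `application/mxf` content type shown in the headers.

```python
from dataclasses import dataclass
from typing import Callable

Metadata = dict[str, str]


@dataclass
class Rule:
    name: str
    condition: Callable[[Metadata], bool]  # filter over an object's metadata
    action: str                            # label for what to do on a match


def matching_actions(metadata: Metadata, rules: list[Rule]) -> list[str]:
    """Return the actions of every rule whose condition the metadata meets."""
    return [rule.action for rule in rules if rule.condition(metadata)]


# The slide's rule: if Content-type is MXF, replicate to the DR facility.
rules = [
    Rule(
        name="mxf-to-dr",
        condition=lambda m: m.get("Content-type") == "application/mxf",
        action="replicate:dr-facility",
    ),
]

mxf_object = {"Content-type": "application/mxf", "Content-Length": "86193452"}
pdf_object = {"Content-type": "application/pdf"}

mxf_actions = matching_actions(mxf_object, rules)
pdf_actions = matching_actions(pdf_object, rules)
```

Only the MXF object triggers the replication action; the PDF object matches no rule and is left alone.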

Page 14

Intelligent Content Replication and Distribution

CAStor Content Router (CR)

• Policy-based replication to geographically separate sites

• Policies driven by administrator-defined rules for specific metadata

• Multiple replication and distribution topologies supported
  • 1:1, 1:M, M:1, M:M
  • Customize to meet specific needs

• Replicate some or all files

• Fully automated to reduce management effort for file replication
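The four topology labels on this slide can be made concrete by treating a replication setup as a list of source-to-target links. The Python sketch below is a hypothetical illustration: `classify_topology` and the site names are invented for the example and do not reflect how Content Router is configured.

```python
def classify_topology(links: list[tuple[str, str]]) -> str:
    """Label a set of (source, target) replication links as 1:1, 1:M, M:1 or M:M."""
    if not links:
        raise ValueError("at least one replication link is required")
    sources = {source for source, _ in links}
    targets = {target for _, target in links}
    left = "1" if len(sources) == 1 else "M"
    right = "1" if len(targets) == 1 else "M"
    return f"{left}:{right}"


# Hypothetical site names.
one_to_one = classify_topology([("hq", "dr")])
one_to_many = classify_topology([("hq", "dr-east"), ("hq", "dr-west")])
many_to_one = classify_topology([("branch-a", "hq"), ("branch-b", "hq")])
many_to_many = classify_topology([("site-a", "site-b"), ("site-b", "site-a")])
```

A 1:M layout suits content distribution from one origin, while M:1 suits consolidating branch offices into a central archive.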

Page 15

Content Relationships in Storage

• Relate elements of a specific project in an anchor stream
  • Simple list of UUIDs
  • Video, key image, audio, script

• Add elements through business process/workflow

• Persist relationships over the long term

[Diagram: an Anchor Stream with its own UUID and mutable metadata, listing UUID1, UUID2, UUID3, UUID4 … UUIDn; these point to the Video, Image, Audio, and Doc objects, each with its own Content Address and Metadata]
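Since an anchor stream is described here as a simple list of UUIDs, its behavior can be modeled in a few lines of Python. This is a conceptual sketch under that description only: the `AnchorStream` class, its fields, and the project name are hypothetical, and the UUIDs are generated locally rather than returned by a storage cluster.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class AnchorStream:
    """A mutable list of content addresses that belong to one project."""
    project: str
    members: list[uuid.UUID] = field(default_factory=list)

    def add(self, content_uuid: uuid.UUID) -> None:
        # Keep insertion order and ignore an element that is already related.
        if content_uuid not in self.members:
            self.members.append(content_uuid)


# Stand-ins for the addresses of the stored video, key image, audio and script.
video, image, audio, script = (uuid.uuid4() for _ in range(4))

anchor = AnchorStream(project="sports-segment")
for element in (video, image, audio):
    anchor.add(element)

anchor.add(script)  # added later, for example by a workflow step
anchor.add(video)   # already present, so the list is unchanged
```

The stored objects themselves never change; only the anchor's list grows, which is how the relationship can persist while elements are added over time.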

Page 16

Caringo: Unified Content Infrastructure

[Diagram: Content File Server (CFS), Integrated Solutions, Native CAStor]

Investment Protection
• Runs on standard x86 server hardware
• Add new generation server hardware at any time without disruption

Cost-effective Scaling
• Add capacity without interruption or need to provision storage
• Scale from Terabytes to Petabytes in a single cluster

Operational Efficiency
• Self-managing and self-healing cluster minimizes administrative intervention

High Performance Object Storage
• Easily address performance needs for small and/or large file workloads

Data Protection & Preservation
• Archive unstructured data for the long-term and address regulatory compliance

Page 17

A Winning Combination for Managing the Content Lifecycle

Experience the Solution Today

Get 4TB of CAStor Software Free
Go to http://www.caringo.com/downloadCAStor.html

Page 18

Thank You!

Access

Store

Distribute