content lifecycle management - best practices for governance, archiving, compliance & mining...
DESCRIPTION
See the full webinar here: http://www.alfresco.com/about/events/ondemand This webinar discusses a full content lifecycle management solution that addresses all lifecycle needs, from content creation to archiving and retention. Unstructured or file-based data is growing faster than any other data type in organizations. What is needed is a system that is easy to manage, scales to support the amount of file data being stored, and preserves it for the long term. Industries that are regulated by government for data retention and integrity need solutions that make compliance simpler, without significant overhead. CAStor is a Content Addressable Storage solution from Caringo that integrates with Alfresco Enterprise Edition. CAStor's software approach creates high-performance, massively scalable clustered storage on standard x86 server hardware. This gives customers affordable content storage that can start at one terabyte and scale seamlessly into petabytes as the business grows.
TRANSCRIPT
1
© 2009 Caringo, Inc.
Access
Store
Distribute
Caringo & Alfresco: Complete Content Lifecycle Solution
2
Managing File-Based Data as Content
Storing file-based data is as much an information management problem as it is an issue with the storage technology
The point where business and IT needs converge
Business Need: Protecting and preserving intellectual property and business critical records for future benefit
IT Need: Implementing a cost-effective infrastructure that ensures the availability and integrity of file-based data
3
Realities of File-Based Data
Unstructured data
• Over 95% is “unstructured” [1]
Massive file growth
• Up to 120% per year [2]
Low reuse of files [3]
• 90% never accessed after creation
• Of the files that are accessed, 65% are accessed only once
Aging files occupying expensive storage
• Software needed to migrate files to secondary storage
• Added cost and complexity
Must meet compliance mandates
• Secondary storage tier required
[Chart: 90% of files never accessed again; 10% accessed; 65% of accessed files accessed once]
Sources:
[1] IDC, The Expanding Digital Universe
[2] IDC, The Economic Impact of File Virtualization
[3] UC Santa Cruz, Measurement and Analysis of Large-Scale Network File System Workloads
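The access and growth figures on this slide can be turned into rough numbers. The Python sketch below is illustrative only. It assumes that the 65% figure applies to the 10% of files that are accessed at all, that "up to 120% per year" compounds at its upper bound, and that the starting point is a hypothetical 1,000,000 files and 1 TB. None of these starting values come from the webinar.

```python
# Rough arithmetic on the slide's figures. The file count, the 1 TB starting
# size, and the reading of the 65% figure are assumptions for illustration.

def access_breakdown(total_files: int) -> dict:
    """Split a file population using the 90% / 65% figures from the slide."""
    never_accessed = total_files * 90 // 100     # 90% never accessed after creation
    accessed = total_files - never_accessed      # the remaining 10%
    accessed_once = accessed * 65 // 100         # assumed: 65% of the accessed files
    accessed_repeatedly = accessed - accessed_once
    return {
        "never_accessed": never_accessed,
        "accessed": accessed,
        "accessed_once": accessed_once,
        "accessed_repeatedly": accessed_repeatedly,
    }


def projected_capacity_tb(start_tb: float, years: int, growth_pct: int = 120) -> float:
    """Compound growth: 120% growth per year multiplies capacity by 2.2."""
    return start_tb * (1 + growth_pct / 100) ** years


if __name__ == "__main__":
    print(access_breakdown(1_000_000))
    for year in range(1, 4):
        print(year, round(projected_capacity_tb(1.0, year), 2))
```

Under these assumptions only a small share of files is read more than once, which is the slide's argument for moving aging files off expensive primary storage.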
4
File Storage Challenges
Today’s storage requirements are different
• Millions and billions of files on thousands of large disk drives
File systems simply cannot stretch any farther
• The weight of layers of complexity and virtualization makes them brittle
• They hit maximums on file size and number of files and servers
• They encounter folder and drive letter problems
Newer file systems are high-maintenance
• Even with layers of virtualization, underlying file systems must still be managed, migrated, backed up and maintained
• Requires highly skilled administrators
Volume of file data is major information management problem
• Folder/Sub-folder/file name becomes cryptic at scale (millions/billions)
• File systems provide no informational context for files
5
Next Gen: Internet Scale and Object-Based
• Majority of capacity in commercial sector born as file-based, rich digital content
• 5 key infrastructure requirements (Enterprise Strategy Group)
  • Infinite scale – in real-time, dynamically, no human intervention
  • No boundaries – expand beyond walls of IT department
  • Operationally efficient – leverage commodity components, policy-based automation
  • Self-management – auto re-balance and optimize, no human intervention
  • Self-healing – withstand failures, automatically adjust/heal itself
• Object-based storage (IDC)
  • 4 tests/criteria for technology
    • Self-referencing – Unique address for each file/object
    • Described by metadata – Beyond standard file system
    • Location independence
    • Dynamic presentation – Not fixed to a traditional tree format
    • Intelligent replication/distribution
6
Convergence: ECM & Content Storage
• Effectively manage content from creation through storage and expiration
• Alfresco2Caringo interface available at Alfresco Forge
  • Developed by XeniT
  • Alfresco and Caringo Partner
• Alfresco ECM manages business process & workflow
• Caringo stores and protects business-critical content
• Ensure content integrity and preservation
• Preserve context of content for the long-term
• Accessible and available well into the future
7
Covering the Complete Storage Workflow
Comprehensive solution to access, store and distribute
• HTTP access for cloud storage and Web 2.0
• Complete business solutions
• Continuous data availability
• Long-term data protection
• Intelligent data replication for content distribution and disaster recovery
[Diagram labels: Content File Server (CFS), Integrated Solutions, Native CAStor]
8
CAStor Software Key Features
• Runs on affordable and standard x86 server hardware
  • Delivers flexibility and choice
• Massively scalable storage cluster
  • Start small and scale to billions of files or objects
  • As you grow from TBs to PBs, access bandwidth also grows
• Increase capacity seamlessly
  • No disruption in operations or data availability. No migration!
• Manages and repairs itself automatically and faster than RAID
• Local and Wide Area Replication for DR and backup
• Data protection for regulatory compliance and internal governance
  • WORM, integrity checking, authenticity, object-level retention, LifePoints
• Rich metadata support
  • Attach and persist descriptive metadata with objects
  • Content in Context
[Diagram: CAStor Cluster, nodes 1 through n connected by GigE]
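The slide lists WORM and integrity checking among the data protection features but does not say how CAStor implements them. As a generic illustration of the idea, the Python sketch below records a SHA-256 digest when an object is written, refuses later modification, and re-verifies the digest on demand. `WormStore` and its methods are hypothetical names for this example, not part of any CAStor API, and the choice of SHA-256 is an assumption.

```python
import hashlib
import uuid


class WormStore:
    """Toy write-once store: each object gets an address and a digest on write."""

    def __init__(self):
        self._objects = {}   # address -> stored bytes
        self._digests = {}   # address -> SHA-256 hex digest recorded at write time

    def write(self, data: bytes) -> str:
        """Store the data once and return its randomly generated address."""
        address = uuid.uuid4().hex
        self._objects[address] = data
        self._digests[address] = hashlib.sha256(data).hexdigest()
        return address

    def update(self, address: str, data: bytes) -> None:
        """Write-once semantics: any attempt to modify an object is rejected."""
        raise PermissionError("write-once store: objects cannot be modified")

    def read(self, address: str) -> bytes:
        return self._objects[address]

    def verify(self, address: str) -> bool:
        """Recompute the digest and compare it with the one recorded at write time."""
        current = hashlib.sha256(self._objects[address]).hexdigest()
        return current == self._digests[address]
```

A background process could call `verify` periodically and trigger a repair from another replica when it returns `False`. That repair step is outside this sketch.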
9
Early File Management Challenge
• 8dot3 in DOS days
  • Eight characters + extension
• Example: C:\Directory\document.doc
• Organizational challenge for even hundreds of files
  • Significant position coding schemes
  • Include fully qualified path and name on documents
  • Law firms still do this today
• System metadata only, basic
  • Non-descriptive, not useful in organizing files
10
Incremental Advancement
• Long file names introduced in Windows mid-90s
  • Promise of better identifying files for organization and finding
• 8dot3 turned into this:
  \My Documents\This is my document.doc
  and…
  \My Documents\This is my document v2.doc
• Folder/sub-folder hierarchy is cumbersome
  • File counts now in the millions and beyond
• Remains difficult to manage especially over time
  • Millions will turn into billions
• File names still lack informational value
11
CAStor Content Storage Software: Object-Based
• CAStor ideally suited for file-based data storage
• Supports rich metadata tags
  • System generated metadata
  • Custom metadata
  • Descriptive information lives with the file
[Diagram: a content object combines a UUID (content address), metadata, and file data (101000101010100111010101100010110100…110010)]
HTTP/1.1 200 OK
Date: Thu, 26 Jun 2008 21:26:34 GMT
Server: CAStor Cluster/2.2
CAStor-Application-Name: FinalCutPro
CAStor-Create-Date: 2008-06-26 21:26:14.687000
Castor-System-Cluster: Internet Demo Cluster
Castor-System-Created: Thu, 26 Jun 2008 21:26:20 GMT
Content-Disposition: inline; filename=Sports %Segment%206-26-08.mxf
Content-Length: 8619354
Content-type: application/mxf
lifepoint: [Thu, 03 Jul 2008 21:26:14 GMT] reps=2, deletable=True
lifepoint: [] delete
Replica-Count: 2
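The response on this slide shows metadata travelling with the object as ordinary HTTP headers. The Python sketch below parses such a header block into a dictionary and keeps repeated headers, such as the two `lifepoint` lines, as a list. `parse_object_headers` is a hypothetical helper written for this example. It relies only on the header syntax visible on the slide and is not a CAStor client.

```python
# A shortened copy of the slide's response headers, used as sample input.
SAMPLE = """\
HTTP/1.1 200 OK
Date: Thu, 26 Jun 2008 21:26:34 GMT
Server: CAStor Cluster/2.2
CAStor-Application-Name: FinalCutPro
Content-Length: 8619354
Content-type: application/mxf
lifepoint: [Thu, 03 Jul 2008 21:26:14 GMT] reps=2, deletable=True
lifepoint: [] delete
Replica-Count: 2
"""


def parse_object_headers(raw: str):
    """Return (status_line, headers); header names are lower-cased, values are lists."""
    lines = raw.strip().splitlines()
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, so dates and times in the value survive.
        name, _, value = line.partition(":")
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return status_line, headers


if __name__ == "__main__":
    status, headers = parse_object_headers(SAMPLE)
    print(status)
    for name, values in headers.items():
        print(name, values)
```

Storing every value as a list keeps both `lifepoint` entries in order, which matters if they describe successive stages of an object's retention policy.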
12
CAStor Content Objects
• Supports all types of digital content
• Metadata values stored are specific to each individual type
• Vast 128-bit address space
  • Never run out of UUIDs
  • Billions of objects
• Define metadata values to drive replication and distribution
[Diagram: four content objects, each with its own metadata and content address]
• UUID1: Video
• UUID2: Image
• UUID3: Audio
• UUID4: Doc
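To give a sense of scale for a 128-bit address space, the Python sketch below counts the possible addresses and compares them with a hypothetical population of one trillion objects. Python's `uuid.uuid4()` is used only as a stand-in for a 128-bit identifier. The slides do not describe how CAStor generates its UUIDs, and `new_content_address` is a name invented for this example.

```python
import uuid

ADDRESS_BITS = 128
address_space = 2 ** ADDRESS_BITS        # number of distinct 128-bit values

# Hypothetical workload: one trillion stored objects.
objects_stored = 10 ** 12
fraction_used = objects_stored / address_space


def new_content_address() -> str:
    """Return a random 128-bit identifier as 32 hex characters (uuid4 stand-in)."""
    return uuid.uuid4().hex


if __name__ == "__main__":
    print(f"{address_space:.3e} possible addresses")
    print(f"fraction used by 10^12 objects: {fraction_used:.3e}")
    print(new_content_address())
```

Even at a trillion objects, the used fraction of the address space is vanishingly small, which is the point behind "never run out of UUIDs".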
13
Metadata Enables Intelligence: Filter and Rules Engine
HTTP/1.1 200 OK
Date: Thu, 26 Jun 2008 21:26:34 GMT
Server: CAStor Cluster/2.2
CAStor-Application-Name: SimpleCASg
CAStor-Create-Date: 2008-06-26 21:26:14.687000
Castor-System-Cluster: Internet Demo Cluster
Castor-System-Created: Thu, 26 Jun 2008 21:26:20 GMT
Content-Disposition: inline; filename=Car%20Chase%206-26-08.mxf
Content-Length: 86193452
Content-type: application/mxf
lifepoint: [Thu, 03 Jul 2008 21:26:14 GMT] reps=2, deletable=True
lifepoint: [] delete
Replica-Count: 2
Metadata Filtered for specific value(s)
Rule(s) fire when condition met
If Content-type = MXF then Replicate to DR Facility
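The filter-and-rule idea on this slide can be sketched in a few lines of Python. The example below is a minimal illustration, not the actual CAStor rules engine: `Rule` and `fire_rules` are hypothetical names, and the rule syntax is invented. Only the condition (content type is MXF) and the action (replicate to the DR facility) come from the slide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    header: str                      # metadata header to filter on
    matches: Callable[[str], bool]   # condition applied to the header value
    action: str                      # label of the action to fire


def fire_rules(metadata: Dict[str, str], rules: List[Rule]) -> List[str]:
    """Return the actions of every rule whose condition is met."""
    # Header names are compared case-insensitively, as in HTTP.
    lowered = {name.lower(): value for name, value in metadata.items()}
    fired = []
    for rule in rules:
        value = lowered.get(rule.header.lower())
        if value is not None and rule.matches(value):
            fired.append(rule.action)
    return fired


# The slide's rule: if Content-type = MXF then replicate to the DR facility.
mxf_rule = Rule(
    header="Content-type",
    matches=lambda value: value.lower() == "application/mxf",
    action="replicate to DR facility",
)

if __name__ == "__main__":
    print(fire_rules({"Content-type": "application/mxf"}, [mxf_rule]))
    print(fire_rules({"Content-type": "image/png"}, [mxf_rule]))
```

Keeping the condition as a callable lets the same structure express other filters, for example on the application name or the creation date.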
14
Intelligent Content Replication and Distribution
CAStor Content Router (CR)
• Policy-based replication to geographically separate sites
• Policies driven by administrator-defined rules for specific metadata
• Multiple replication and distribution topologies supported
  • 1:1, 1:M, M:1, M:M
  • Customize to meet specific needs
• Replicate some or all files
• Fully automated to reduce management effort for file replication
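Policy-based replication as described above can be pictured as a mapping from metadata conditions to destination sites. The Python sketch below is an assumption-laden illustration: the site names, the policy format, and the `route` function are invented for this example and do not reflect the Content Router's real configuration.

```python
def route(metadata, policies):
    """Return the sorted list of sites an object should be replicated to."""
    sites = set()
    for matches, destinations in policies:
        if matches(metadata):
            sites.update(destinations)
    return sorted(sites)


# Hypothetical policies, each a (condition, destination sites) pair.
policies = [
    # 1:M distribution: video goes to the DR site and to a branch office.
    (lambda m: m.get("content-type") == "application/mxf", ["dr-site", "branch-a"]),
    # 1:1 replication: everything goes to the DR site.
    (lambda m: True, ["dr-site"]),
]

if __name__ == "__main__":
    print(route({"content-type": "application/mxf"}, policies))
    print(route({"content-type": "text/plain"}, policies))
```

Replicating "some or all files" then comes down to which conditions the administrator defines: a condition that always matches replicates everything, while a narrower one selects a subset.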
15
Content Relationships in Storage
• Relate elements of a specific project in an anchor stream
  • Simple list of UUIDs
  • Video, key image, audio, script
• Add elements through business process/workflow
• Persist relationships over the long term
[Diagram: an anchor stream, addressed by its own UUID and carrying mutable metadata, lists UUID1, UUID2, UUID3, UUID4 … UUIDn, which are the content addresses of the Video, Image, Audio and Doc objects]
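The anchor stream on this slide is described as a simple, mutable list of UUIDs that ties related objects together. A minimal Python sketch of that data structure follows. `AnchorStream` and its fields are hypothetical names, and the real on-disk format is not described in the slides.

```python
import uuid
from dataclasses import dataclass, field
from typing import List


@dataclass
class AnchorStream:
    """Mutable list of member object UUIDs, addressed by its own UUID."""
    project: str
    address: str = field(default_factory=lambda: uuid.uuid4().hex)
    members: List[str] = field(default_factory=list)

    def add(self, member_uuid: str) -> None:
        """Append an element, e.g. from a workflow step; duplicates are ignored."""
        if member_uuid not in self.members:
            self.members.append(member_uuid)


if __name__ == "__main__":
    stream = AnchorStream(project="Sports segment")
    # The member addresses below are placeholders for the video, key image,
    # audio and script objects; the objects themselves are never modified.
    for member in ("UUID1", "UUID2", "UUID3", "UUID4"):
        stream.add(member)
    print(stream.address, stream.members)
```

Because only the anchor stream changes, elements can be added through a business process over time while each content object stays immutable.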
16
Caringo: Unified Content Infrastructure
[Diagram labels: Content File Server (CFS), Integrated Solutions, Native CAStor]
Investment Protection
• Runs on standard x86 server hardware
• Add new generation server hardware at any time without disruption
Cost-effective Scaling
• Add capacity without interruption or need to provision storage
• Scale from Terabytes to Petabytes in a single cluster
Operational Efficiency
• Self-managing and self-healing cluster minimizes administrative intervention
High Performance Object Storage
• Easily address performance needs for small and/or large file workloads
Data Protection & Preservation
• Archive unstructured data for the long-term and address regulatory compliance
17
A Winning Combination for Managing the Content Lifecycle
Experience the Solution Today
Get 4TB of CAStor Software Free
Go to http://www.caringo.com/downloadCAStor.html
18
Thank You!
Access
Store
Distribute