RSCTC 2008 © 2008 ZL Technologies, Inc.
Email Archiving
Arvind Srinivasan
Gaurav Baone
Imagine this is what happens
to your business records
at the end of every month ….
If this looks absurd …
That’s exactly what we do to email!
Regulators now treat email like hard copy records
Practically every major transaction, project, and contract is recorded in email
SEC 17a-4
NASD 3010, 3110
HIPAA
FDA 21 CFR 11
DoD 5015.2
Sarbanes-Oxley
Non-compliance fines and legal liabilities are rising . . .
And the courts agree (FRCP, Dec 2006)
Just How Much Scalability Does Archiving Require?
Assume:
25,000 Employees averaging 70 mails/day
7 Years Retention
4.47 Billion Emails For Archive System To Index & Search
versus
4.28 Billion Web-Pages Indexed by Google
source: Google Press Release, Feb 17, 2004
Functionality needs to scale to these volumes
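The 4.47 billion figure follows directly from the stated assumptions. A minimal check, assuming 365-day years and mail sent every day of the year (neither is stated on the slide):

```python
# Inputs from the slide; DAYS_PER_YEAR is an added assumption.
EMPLOYEES = 25_000
MAILS_PER_EMPLOYEE_PER_DAY = 70
RETENTION_YEARS = 7
DAYS_PER_YEAR = 365


def archived_email_count(employees, mails_per_day, years, days_per_year=DAYS_PER_YEAR):
    """Total messages accumulated over the whole retention window."""
    return employees * mails_per_day * days_per_year * years


total = archived_email_count(EMPLOYEES, MAILS_PER_EMPLOYEE_PER_DAY, RETENTION_YEARS)
print(f"{total:,} emails (~{total / 1e9:.2f} billion)")
# prints: 4,471,250,000 emails (~4.47 billion)
```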
ZL Technologies, Inc.
CONFIDENTIAL
ZLTI Unified Archival
RSCTC 2008
Outline
Email Capture Methods
Business Drivers
Archive Functionality
Retention & Deletion
Surveillance & Compliance
E-Discovery
Conclusion
Email Capture Methods
Active Capture Methods – PRO-ACTIVE Archiving
– Journaling
– Mailbox crawling
– SMTP Gateway Capture
Historical Capture Methods – REACTIVE Archiving
– Restore from backup tapes
– Crawl for PST / NSF files from desktops
– Forensic captures
Journaling – 100% Capture
Mailbox Crawling – Policy Based
Reactive Archiving
Not Just Email
Primary Business Drivers - Regulations and Laws
Investment Advisers Act
Canada PIPEDA
Gramm-Leach-Bliley Act
NASD 3010
NASD 3011
HIPAA
SEC 17a-4
Sarbanes-Oxley Act
CA SB1386
Mutual Funds Rule 38a-1
Hedge Funds Rule 203(b)
UK Freedom of Information Act
US Freedom of Information Act
Japan Personal Information Protection Act
Florida Sunshine Law
Basel II
FRCP
DoD 5015.2
Functional Requirements
Retention
Surveillance and Compliance
E-Discovery
Common Theme - Classification
Real-time Categorization of Mail
Sender/Recipients
Content (Subject, body, attachment)
User Input (Folder in which the mail was found, Manual Tagging)
Retention & Deletion
Conflicting Requirements:
Laws & Regulation => Retain for “x” years.
Vs
Company Liability/Risk and Cost
Retention Periods and Policies
Regulation | Type of Record | Retention Period
Age Discrimination in Employment Act | Hiring Documents | One year from date of decision
Fair Labor Standards | Payroll, sales and Personal Records | Three Years
Rehabilitation Act | Handicap discrimination Records | Three Years
Civil Rights Act | Records | One Year
Occupational Safety and Health Act | Health Records | 30 Years
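The retention periods listed above can be treated as a lookup table that yields the earliest date on which a record may be deleted. The sketch below is illustrative only: the dictionary keys, the function names, and the idea of a single "trigger date" per record (for example the date of the hiring decision) are assumptions, not part of the slides.

```python
from datetime import date

# Retention periods in years, copied from the table above.
# The keys are hypothetical labels, not an official taxonomy.
RETENTION_YEARS = {
    "hiring_documents": 1,          # Age Discrimination in Employment Act
    "payroll_sales_records": 3,     # Fair Labor Standards
    "handicap_discrimination": 3,   # Rehabilitation Act
    "civil_rights_records": 1,      # Civil Rights Act
    "health_records": 30,           # Occupational Safety and Health Act
}


def add_years(start, years):
    """Same calendar day `years` later; Feb 29 falls back to Feb 28."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def earliest_deletion_date(record_type, trigger_date):
    """First day on which the record is no longer under retention."""
    return add_years(trigger_date, RETENTION_YEARS[record_type])


def may_delete(record_type, trigger_date, today):
    """True once the retention period for this record type has elapsed."""
    return today >= earliest_deletion_date(record_type, trigger_date)


print(earliest_deletion_date("health_records", date(2008, 10, 23)))
```

A real policy engine would also have to honor legal holds and event-driven triggers before deleting anything; this sketch covers only the fixed-period case.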
Retention & Deletion (cont’d)
"A priori" and "a posteriori" based Retention.
Event Driven – Deletion of mail from user folder, Reclassification of mail by end user
Legal Hold – Court Orders to retain evidence relating to certain subject matters.
Single Instance Storage
Same Email in Multiple Mailboxes
Same Attachment in Multiple Emails
Significant storage savings.
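One common way to get single-instance storage is content addressing: store each distinct byte sequence once under its hash and keep a list of who references it. This is a toy in-memory sketch under that assumption; the class and method names are hypothetical and do not describe the product's actual design.

```python
import hashlib


class SingleInstanceStore:
    """Toy content-addressed store: identical bytes are kept only once."""

    def __init__(self):
        self._blobs = {}       # digest -> content bytes
        self._references = {}  # digest -> set of (mailbox, message_id)

    def put(self, mailbox, message_id, content: bytes) -> str:
        """Record that a message holds this content; store the bytes if new."""
        digest = hashlib.sha256(content).hexdigest()
        self._blobs.setdefault(digest, content)
        self._references.setdefault(digest, set()).add((mailbox, message_id))
        return digest

    def stored_bytes(self):
        """Bytes actually kept, after de-duplication."""
        return sum(len(blob) for blob in self._blobs.values())

    def reference_count(self, digest):
        """How many (mailbox, message) pairs point at this content."""
        return len(self._references.get(digest, ()))


store = SingleInstanceStore()
attachment = b"Q3 forecast spreadsheet bytes"
d1 = store.put("alice", "msg-1", attachment)  # same attachment ...
d2 = store.put("bob", "msg-2", attachment)    # ... in a second mailbox
```

Here the attachment is referenced twice but stored once, which is where the storage savings come from.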
Surveillance
Conflicting Requirements:
Regulations require review of documents
Vs
Effort spent reviewing the documents.
Examples of Compliance Categories
Category | Content | Action
Adult | Offensive language | Post-Review
Confidential | SSNs, Bank Account numbers | Pre-Review to prevent confidential information from going out
Legal Issues | Words like attorney, charge*. Phrases like breach* and agreement within 6 words | Post/Pre Review
Compliance | Hype; Stocks and sell within 3 words of each other | Pre-Review in Financial Industries
Real-time Flagging of Mail
Lexical Based – Key words, word associations, wild-cards
Policy Based – E.g. mail from WallStreetJournal.com is a newsletter.
Custom Code – Detect Vacation Responses, Read Receipts, DSNs
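The three kinds of flagging rule can all be modeled as predicates over a message. The sketch below is an illustration, not the product's rule engine: the dict-based message format, the rule names, and the choice to detect automatic replies through the `Auto-Submitted` header (RFC 3834) are assumptions made for the example.

```python
import re
from fnmatch import fnmatchcase


def lexical_rule(patterns):
    """Lexical: flag if any word matches a keyword or wildcard like 'charge*'."""
    def check(mail):
        text = (mail["subject"] + " " + mail["body"]).lower()
        words = re.findall(r"[a-z0-9']+", text)
        return any(fnmatchcase(w, p) for w in words for p in patterns)
    return check


def sender_domain_rule(domain):
    """Policy: flag mail whose sender address is in the given domain."""
    def check(mail):
        return mail["sender"].lower().endswith("@" + domain.lower())
    return check


def auto_reply_rule(mail):
    """Custom code: treat any Auto-Submitted value other than 'no' as automatic."""
    value = mail.get("headers", {}).get("Auto-Submitted", "no")
    return value.lower() != "no"


RULES = [
    ("legal", lexical_rule(["attorney", "charge*"])),
    ("newsletter", sender_domain_rule("wallstreetjournal.com")),
    ("auto-reply", auto_reply_rule),
]


def flag(mail):
    """Return the names of all rules that match, in rule order."""
    return [name for name, rule in RULES if rule(mail)]
```

Keeping each rule a small, named predicate also makes it easy to show a reviewer exactly which rule fired on a message.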
Surveillance (cont’d)
Real-time Flagging is a categorization problem
Current systems suffer from many false positives.
Transparent and deterministic rules are preferred over black boxes.
Disclaimers (internal and external) tend to get flagged because they contain the very terms that we try to flag.
Use reviewer feedback to adapt the rules.
E-Discovery
Conflicting Requirements:
Produce electronic documents to satisfy court orders
Vs.
Providing insufficient, irrelevant, or privileged information
Search Type | Court-dictated Required Search
Full text | "acidosis"
Boolean | "cardiac" OR "respiratory"
Phrase | "in-custody death"
Proximity | "pre-existing" within 10 words of "condition"
Wildcard | "epilep*"
Wildcard proximity | "mental*" within 5 words of "condition"
Dual wildcard proximity | "continu*" within 10 words of "discharg*"
Wildcard sentence-level | "caus*" within same sentence as "death"
Source: Williams v. Taser Int’l, Inc., 2007 WL 1630875 (N.D. Ga. June 4, 2007)
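The proximity and sentence-level search types are the least obvious to implement. The following is a small sketch of one possible interpretation: "within N words" is taken to mean a token-position distance of at most N, and sentences are split naively on terminal punctuation. Both choices, and all function names, are assumptions; a court order or a search engine may define these operators differently.

```python
import re
from fnmatch import fnmatchcase


def tokenize(text):
    """Lower-case words; hyphenated terms such as 'pre-existing' stay whole."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def positions(tokens, pattern):
    """Indexes of tokens matching a term or wildcard pattern like 'continu*'."""
    return [i for i, tok in enumerate(tokens) if fnmatchcase(tok, pattern.lower())]


def within(text, a, b, max_words):
    """True if a match for `a` lies within `max_words` tokens of a match for `b`."""
    tokens = tokenize(text)
    pa, pb = positions(tokens, a), positions(tokens, b)
    return any(i != j and abs(i - j) <= max_words for i in pa for j in pb)


def same_sentence(text, a, b):
    """True if some sentence contains distinct matches for both `a` and `b`."""
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = tokenize(sentence)
        pa, pb = positions(tokens, a), positions(tokens, b)
        if any(i != j for i in pa for j in pb):
            return True
    return False


text = "The patient had a pre-existing heart condition. Treatment continued until discharge."
print(within(text, "pre-existing", "condition", 10))
print(within(text, "continu*", "discharg*", 10))
```

The naive sentence splitter will break on abbreviations such as "Inc." and would need replacing for real legal text.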
Discovery Request
Certain number of custodians
Date Range
Pertaining to certain subject matter; usually described by a set of Search terms.
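A discovery request with these three parts maps naturally onto a filter over archived mail. This is a hedged sketch: the `DiscoveryRequest` type, the dict-based message format, and plain substring matching for search terms are all simplifying assumptions (real requests use the richer search types negotiated between the parties).

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DiscoveryRequest:
    custodians: frozenset   # e-mail addresses of the named custodians
    start: date             # first day of the date range (inclusive)
    end: date               # last day of the date range (inclusive)
    terms: tuple            # lower-case search terms; any one match suffices here


def is_responsive(mail, request):
    """True if the mail involves a custodian, is in range, and hits a term."""
    participants = {mail["sender"], *mail["recipients"]}
    if not participants & request.custodians:
        return False
    if not (request.start <= mail["sent"] <= request.end):
        return False
    text = (mail["subject"] + " " + mail["body"]).lower()
    return any(term in text for term in request.terms)


request = DiscoveryRequest(
    custodians=frozenset({"alice@example.com"}),
    start=date(2007, 1, 1),
    end=date(2007, 12, 31),
    terms=("acidosis", "cardiac"),
)
mail = {
    "sender": "alice@example.com",
    "recipients": ["bob@example.com"],
    "sent": date(2007, 6, 4),
    "subject": "Lab results",
    "body": "Signs of acidosis noted.",
}
print(is_responsive(mail, request))
```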
E-Discovery (cont’d)
Landmark case: Zubulake v. UBS Warburg (2003)
Primarily driven by the amendments to the Federal Rules of Civil Procedure (FRCP) that took effect in December 2006.
Litigants are entitled to obtain electronic information from the adverse party.
Voluntary Initial Disclosures need to be made pertaining to each litigant
Today, almost all cases have some sort of electronic documents as evidence.
E-Discovery (cont’d)
Parties face sanctions if they do not provide all the relevant documents (numerous precedents, e.g. Metrokane vs. Built NY, 2008). Validation occurs when the receiving party can prove the existence of other documents through hard-copy printouts or other means.
Lawyers from both parties routinely negotiate keywords to define Search Concepts
Manual Review of Documents for Relevance and Privilege. Numerous products cluster similar documents (near-deduplication) so that reviewers see them together, which improves efficiency.
Chain of Custody – To prove that the document has not been tampered with or altered.
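Near-deduplication for review can be illustrated with word shingles and Jaccard similarity. This is only a sketch of the general idea: the shingle size, the 0.4 threshold, the greedy grouping, and the function names are assumptions, and it does not describe how any particular review product works (large collections would typically need something like MinHash rather than pairwise comparison).

```python
import re


def shingles(text, k=3):
    """Set of overlapping k-word windows for a document."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_near_duplicates(docs, threshold=0.4):
    """Greedy grouping: a doc joins the first group whose first member is similar enough."""
    groups = []  # list of (representative shingle set, [doc ids])
    for doc_id, text in docs.items():
        sh = shingles(text)
        for representative, members in groups:
            if jaccard(sh, representative) >= threshold:
                members.append(doc_id)
                break
        else:
            groups.append((sh, [doc_id]))
    return [members for _, members in groups]


docs = {
    "a": "Please review the attached contract before Friday and send comments",
    "b": "Please review the attached contract before Monday and send comments",
    "c": "Lunch at noon tomorrow in the cafeteria with the whole team",
}
print(group_near_duplicates(docs))
```

Documents "a" and "b" differ by one word and land in the same group, so a reviewer can decide on them together.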
Palin’s e-mail at $15m per request
NBC's price quote for e-mails sent to Todd Palin: $15 million.
AP's price quote for e-mails between state employees and the campaign headquarters of Sen. John McCain: $15 million.
AP's price quote for e-mails between state employees and the National Park Service: $15 million.
Cost to retrieve e-mail for 1 mailbox
6 Hours to assemble email for 1 employee mailbox
2 Hours for “security” checks
5 Hours to filter by requested keyword or topic
13 Total hours per mailbox
$73.87 Hourly rate
$960.31 Cost to retrieve e-mail for 1 mailbox
Cost to retrieve e-mail for all employees
$960.31 Cost to retrieve e-mail for 1 mailbox
16,000 Full-time employees
$15.3 million Cost to retrieve e-mail for all employees
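The quoted figures can be reproduced from the per-mailbox breakdown. A short check using only the numbers on the slide (the variable names are illustrative):

```python
from decimal import Decimal, ROUND_HALF_UP

# Figures from the slide.
HOURS_PER_MAILBOX = 6 + 2 + 5          # assemble + "security" checks + filter
HOURLY_RATE = Decimal("73.87")
EMPLOYEES = 16_000

# Round to cents, as a quoted price would be.
cost_per_mailbox = (HOURS_PER_MAILBOX * HOURLY_RATE).quantize(
    Decimal("0.01"), rounding=ROUND_HALF_UP
)
total_cost = cost_per_mailbox * EMPLOYEES

print(f"per mailbox: ${cost_per_mailbox:,.2f}")
print(f"all employees: ${total_cost:,.2f}")
```

The total works out to $15,364,960.00, which the slide reports as $15.3 million.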
Conclusion
Most challenges in archiving can be reduced to a Classification problem.
Segmentation Problems: Detect internal and external disclaimers
Detect change in Email behavior through email profile analysis
Understanding mails: Need to develop Analysis techniques to understand the contents
Visualization and Grouping Similar mails – Control the order in which mails and documents are viewed.
Consistent way of defining Subject Matters – Beyond just a set of keywords.
Extract more metadata about attachments such as images, audio and video files.
And all the above are required in multiple languages – English, Japanese, Spanish, Chinese, and others.