FreeEed presentation

Hadoop-based Open Source eDiscovery: FreeEed (Easy as popcorn)

Upload: markkerzner

Post on 25-May-2015

TRANSCRIPT

Page 1: FreeEed presentation

1+Hadoop-based Open Source eDiscovery: FreeEed (Easy as popcorn)

Page 2: FreeEed presentation

2+Business (legal) use case

• Duty to disclose information – FRCP Rule 26

• Preserve relevant information

• Produce information on request

• Keep the information for X years

• Sanctions for obstruction

• Sanctions for non-compliance

Page 3: FreeEed presentation

3+Before the thirties

• The courtroom was full of surprises

Page 4: FreeEed presentation

4+Civil discovery changes this

Page 5: FreeEed presentation

5+Discovery basics

• Obligations of the parties

• At the start of a lawsuit, or when litigation becomes possible, preserve relevant data

• Produce data on request, within timelines

• Review the data before production

• Can request eDiscovery from opponents

• Store and archive

Page 6: FreeEed presentation

6+Interesting facts about eDiscovery

• Most of these are proprietary or under NDA

• Representative case size: 5 GB to 500 GB

• Cost per GB of processing: $5-200, ~$100

• Takes 25-50% of litigation budget

• Days to process and months to review

• Preservation: 3-7 years

• 500 providers, with 10 majors
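
The size and cost figures on this slide can be turned into a rough budget range. The sketch below is only an illustration: the function and constant names are made up for this example, and reading "~$100" as a typical per-GB rate is an assumption, not something the slide states.

```python
# Rates in USD per GB, taken from the slide. Treating ~$100 as the
# "typical" rate is an assumption made for this illustration.
LOW_RATE_USD_PER_GB = 5
TYPICAL_RATE_USD_PER_GB = 100
HIGH_RATE_USD_PER_GB = 200


def processing_cost_range(case_size_gb):
    """Return (low, typical, high) processing cost in USD for a case."""
    if case_size_gb < 0:
        raise ValueError("case size must be non-negative")
    return (
        case_size_gb * LOW_RATE_USD_PER_GB,
        case_size_gb * TYPICAL_RATE_USD_PER_GB,
        case_size_gb * HIGH_RATE_USD_PER_GB,
    )


if __name__ == "__main__":
    # Representative case sizes from the slide: 5 GB and 500 GB.
    for size_gb in (5, 500):
        low, typical, high = processing_cost_range(size_gb)
        print(f"{size_gb} GB: ${low:,} to ${high:,} (${typical:,} at ~$100/GB)")
```

At these rates a 500 GB case lands between $2,500 and $100,000 for processing alone, which is the spread behind the "huge price tags" complaint later in the deck.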

Page 7: FreeEed presentation

7+Challenges of eDiscovery

• Data sizes in the TB

• Seasonal loads, tight deadlines

• Hundreds of file formats

• Heavy read/write load in review

• Text analytics is of paramount importance

• Huge price tags obstruct justice

Page 8: FreeEed presentation

8+FreeEed main features

• Open source Hadoop-based eDiscovery:

• As scalable as Hadoop

• Fast review with NoSQL

• Scales with the lawsuit - time and volume

• Data preservation and archiving with VMs

• Only possible with open source license

Page 9: FreeEed presentation

9+Design goals

• Built on open source components

• Big Data scalable

• Preservation, chain of custody, archiving

• Scalable technically and business-wise

• Stable (don’t laugh, people get different results on different runs)

• Closed-source compatible (MS + Azure too)

Page 10: FreeEed presentation

10+Packaging architecture

• Comes as VMs

• Grab as few or as many as you want

• No mixing of matters

• No ethical problems

• Preserve for as many years as you want

• 1 VM = 1 corn, FreeEed = free popcorn

Page 11: FreeEed presentation

11+FreeEed makes lawyers happy

Page 12: FreeEed presentation

12+FreeEed : Architecture

Page 13: FreeEed presentation

13+FreeEed popcorn is very popular with lawyers, legal techs, IT, etc.

Page 14: FreeEed presentation

14+FreeEed popcorn

• Deploy on laptops, servers or cloud

• One-node or any number of nodes

• Scalable storage

• Different cooking recipes

• No mixing of matters

• Easy archiving

• Easy deletion

Page 15: FreeEed presentation

15+Processing architecture

• Based on golden-image VM

• Controlled cluster start in any environment

• Index / cull on the fly or later

• Immediately searchable
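
Culling means narrowing a document set by criteria such as custodian, date range, and keywords. The sketch below shows the idea in plain Python. It is not FreeEed's code: the `Document` fields, the `cull` function, and the choice of criteria are all hypothetical, picked only to make the step concrete.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    # Hypothetical fields; real processing output carries far more metadata.
    doc_id: int
    custodian: str
    sent: date
    text: str


def cull(documents, custodians, start, end, keywords):
    """Keep documents that match a custodian, fall inside the date range,
    and contain at least one of the keywords (case-insensitive)."""
    wanted = {keyword.lower() for keyword in keywords}
    kept = []
    for doc in documents:
        if doc.custodian not in custodians:
            continue
        if not (start <= doc.sent <= end):
            continue
        words = set(doc.text.lower().split())
        if wanted & words:
            kept.append(doc)
    return kept
```

Because the filter is a pure function over the document set, the same criteria can be applied during processing ("on the fly") or to an already processed set ("later").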

Page 16: FreeEed presentation

16+Cluster start-up on EC2

Page 17: FreeEed presentation

17+Cloud integration

• Downloadable VMs

• Same VMs on Amazon AWS

• Amazon VMs are very convenient

• Immediate deployment

• Any hardware configuration you need

• Control lots of power from a limited-power laptop

• Azure – working with Microsoft

Page 18: FreeEed presentation

18+Review architecture

• Lucene

• Solr

• HBase

• Lucene indexes created in reducers and combined in Solr

• For small matters, write directly to Solr
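
The last two bullets describe two indexing paths: build partial indexes in parallel and combine them, or index directly when the matter is small. The toy sketch below mimics that shape with plain Python dictionaries. It does not use the Lucene, Solr, or Hadoop APIs, and the function names and the small-matter threshold are hypothetical.

```python
from collections import defaultdict

# Hypothetical cut-off for a "small matter"; not a FreeEed setting.
SMALL_MATTER_MAX_DOCS = 3


def build_index(docs):
    """Build an inverted index {term: sorted doc ids} from (doc_id, text) pairs."""
    postings = defaultdict(set)
    for doc_id, text in docs:
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def merge_indexes(shards):
    """Combine per-reducer partial indexes into one searchable index."""
    merged = defaultdict(set)
    for shard in shards:
        for term, ids in shard.items():
            merged[term].update(ids)
    return {term: sorted(ids) for term, ids in merged.items()}


def index_matter(docs, num_reducers=2):
    """Index a small matter directly; otherwise partition, index, and merge."""
    docs = list(docs)
    if len(docs) <= SMALL_MATTER_MAX_DOCS:
        return build_index(docs)  # small matter: write straight to one index
    # Stand-in for the shuffle step: assign each document to a "reducer".
    partitions = [[] for _ in range(num_reducers)]
    for doc_id, text in docs:
        partitions[doc_id % num_reducers].append((doc_id, text))
    return merge_indexes(build_index(part) for part in partitions)
```

Both paths yield the same index for the same documents, so the choice between them is about throughput on large matters rather than about search results.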

Page 19: FreeEed presentation

19+Review screen

Page 20: FreeEed presentation

20+Review capabilities

• Search

• Cull down

• View text and metadata

• Tag documents

• Export as images or as native files

Page 21: FreeEed presentation

21+Bird’s-eye view - EDRM

Page 22: FreeEed presentation

22+Left of EDRM – Legal Hold

• FreeEedCollect

• Architecture: https://github.com/markkerzner/FreeEedCollect

• ZooKeeper/MapReduce/Flume/HDFS

Page 23: FreeEed presentation

23+Right of EDRM – Org. charts

Partnership with Sintelix

Page 24: FreeEed presentation

24+Analytics – network of actors

Partnership with Sintelix

Page 25: FreeEed presentation

25+FreeEed and data governance

• Virtualization for data preservation

• Scalable processing

• Archiving

• Document groups do not mix

• Data format stored together with software that understands it

Page 26: FreeEed presentation

26+Hadoop & Big Data applications

• Other related applications

• Financial – text analytics

• Energy – document and procedure analytics

• Actual ongoing projects

Page 27: FreeEed presentation

27+FreeEed as a learning tool

• Hundreds of downloads

• Dozens of active users

• Real-world Hadoop application

• Many developers download to learn

• Complex, real, but manageable

Page 28: FreeEed presentation

28+FreeEed adoption – who is trying our “popcorn”?

• Large law firms

• Small law firms and solos

• Government agencies

• Universities

• Enterprises

• Developers learn Big Data

Page 29: FreeEed presentation

29+Looking forward

• Add

• Collection

• Analytics

• Community

• Integrations

• Implementations

Page 30: FreeEed presentation

30+How you can use FreeEed

• For its intended purpose

• Large law firms

• Small firms and solos

• Pro se

• Integrate into legal IT

• Start a similar document management project

Page 32: FreeEed presentation

32+Q&A

• Thank you!

• People usually ask:

• How can I put my data in the cloud?

• Is it safe?

• Do you do OCR, PST, OST, etc…?