FreeEed presentation
TRANSCRIPT
![Page 1: FreeEed presentation](https://reader036.vdocument.in/reader036/viewer/2022070317/55626457d8b42ae87d8b4f67/html5/thumbnails/1.jpg)
Hadoop-based Open Source eDiscovery: FreeEed
(Easy as popcorn)
![Page 2: FreeEed presentation](https://reader036.vdocument.in/reader036/viewer/2022070317/55626457d8b42ae87d8b4f67/html5/thumbnails/2.jpg)
Business (legal) use case
• Duty to disclose information: FRCP Rule 26
• Preserve relevant information
• Produce information on request
• Keep the information for X years
• Sanctions for obstruction
• Sanctions for non-compliance
![Page 3: FreeEed presentation](https://reader036.vdocument.in/reader036/viewer/2022070317/55626457d8b42ae87d8b4f67/html5/thumbnails/3.jpg)
Before the 1930s
• The courtroom was full of surprises
![Page 4: FreeEed presentation](https://reader036.vdocument.in/reader036/viewer/2022070317/55626457d8b42ae87d8b4f67/html5/thumbnails/4.jpg)
Civil discovery changes this
![Page 5: FreeEed presentation](https://reader036.vdocument.in/reader036/viewer/2022070317/55626457d8b42ae87d8b4f67/html5/thumbnails/5.jpg)
Discovery basics
• Obligations of the parties
• At the start of a lawsuit, or when litigation is anticipated, preserve relevant data
• Produce data on request, within deadlines
• Review the data before production
• Request eDiscovery from opponents
• Store and archive
![Page 6: FreeEed presentation](https://reader036.vdocument.in/reader036/viewer/2022070317/55626457d8b42ae87d8b4f67/html5/thumbnails/6.jpg)
Interesting facts about eDiscovery
• Most of these figures are proprietary or under NDA
• Representative case size: 5 GB to 500 GB
• Processing cost per GB: $5 to $200, typically ~$100
• Takes 25-50% of the litigation budget
• Days to process and months to review
• Preservation: 3-7 years
• 500 providers, 10 of them major
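A back-of-the-envelope estimate using only the figures on this slide (5-500 GB cases, $5-$200 per GB, ~$100 typical) shows how quickly processing alone adds up. This is a minimal arithmetic sketch; the function name `processing_cost` is hypothetical, and the result covers processing only, not the months of review.

```python
def processing_cost(case_size_gb: float, cost_per_gb: float) -> float:
    """Processing cost in dollars: case size times the per-GB rate."""
    return case_size_gb * cost_per_gb

# Figures from the slide: 5-500 GB cases, $5-200/GB, ~$100/GB typical.
SIZES_GB = (5, 500)
TYPICAL_RATE = 100
RATE_RANGE = (5, 200)

# Smallest and largest case at the typical rate.
typical = [processing_cost(size, TYPICAL_RATE) for size in SIZES_GB]
# Cheapest case at the lowest rate vs. largest case at the highest rate.
extremes = (processing_cost(SIZES_GB[0], RATE_RANGE[0]),
            processing_cost(SIZES_GB[1], RATE_RANGE[1]))

print(f"Typical rate: ${typical[0]:,.0f} to ${typical[1]:,.0f}")  # $500 to $50,000
print(f"Full range:   ${extremes[0]:,.0f} to ${extremes[1]:,.0f}")  # $25 to $100,000
```

At the typical rate, a single large matter costs about $50,000 just to process, which is the "huge price tag" the next slide refers to.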
![Page 7: FreeEed presentation](https://reader036.vdocument.in/reader036/viewer/2022070317/55626457d8b42ae87d8b4f67/html5/thumbnails/7.jpg)
Challenges of eDiscovery
• Data sizes in the terabytes
• Seasonal loads, tight deadlines
• Hundreds of file formats
• Heavy read/write load in review
• Text analytics is of paramount importance
• Huge price tags obstruct justice
![Page 8: FreeEed presentation](https://reader036.vdocument.in/reader036/viewer/2022070317/55626457d8b42ae87d8b4f67/html5/thumbnails/8.jpg)
FreeEed main features
• Open source Hadoop-based eDiscovery:
• As scalable as Hadoop
• Fast review with NoSQL
• Scales with the lawsuit in both time and volume
• Data preservation and archiving with VMs
• Only possible with an open source license
![Page 9: FreeEed presentation](https://reader036.vdocument.in/reader036/viewer/2022070317/55626457d8b42ae87d8b4f67/html5/thumbnails/9.jpg)
Design goals
• Built on open source components
• Big Data scalable
• Preservation, chain of custody, archiving
• Scalable both technically and as a business
• Stable (don’t laugh: people get different results on different runs)
• Compatible with closed-source platforms (Microsoft and Azure too)
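One common way to get run-to-run stability and a chain-of-custody record is to fingerprint every document with a cryptographic hash and emit results in a deterministic order. The sketch below assumes documents are already in memory as bytes; the names `fingerprint_collection` and `duplicates` are hypothetical, and it uses SHA-256 (the slide does not say which hash FreeEed itself uses).

```python
import hashlib

def fingerprint_collection(docs: dict[str, bytes]) -> list[tuple[str, str]]:
    """Return (sha256_hex, path) pairs in a stable, run-independent order.

    `docs` maps a file path to its raw bytes. Sorting by hash, then path,
    means two runs over the same input emit the same list regardless of
    the order in which files were read (or which worker saw them).
    """
    pairs = [(hashlib.sha256(data).hexdigest(), path) for path, data in docs.items()]
    return sorted(pairs)

def duplicates(fingerprints: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group paths whose content is byte-for-byte identical (same hash)."""
    groups: dict[str, list[str]] = {}
    for digest, path in fingerprints:
        groups.setdefault(digest, []).append(path)
    return {d: paths for d, paths in groups.items() if len(paths) > 1}

if __name__ == "__main__":
    run_a = {"a.txt": b"contract", "b.txt": b"memo", "c.txt": b"contract"}
    run_b = dict(reversed(list(run_a.items())))  # same files, different read order
    assert fingerprint_collection(run_a) == fingerprint_collection(run_b)
    print(duplicates(fingerprint_collection(run_a)))  # a.txt and c.txt share a hash
```

Recording the hash at collection time and again at production time also lets anyone verify that a document was not altered in between.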
![Page 10: FreeEed presentation](https://reader036.vdocument.in/reader036/viewer/2022070317/55626457d8b42ae87d8b4f67/html5/thumbnails/10.jpg)
Packaging architecture
• Comes as VMs
• Grab as few or as many as you want
• No mixing of matters
• No ethical problems
• Preserve for as many years as you want
• 1 VM = 1 corn kernel; FreeEed = free popcorn
![Page 11: FreeEed presentation](https://reader036.vdocument.in/reader036/viewer/2022070317/55626457d8b42ae87d8b4f67/html5/thumbnails/11.jpg)
FreeEed makes lawyers happy
![Page 12: FreeEed presentation](https://reader036.vdocument.in/reader036/viewer/2022070317/55626457d8b42ae87d8b4f67/html5/thumbnails/12.jpg)
FreeEed: Architecture
![Page 13: FreeEed presentation](https://reader036.vdocument.in/reader036/viewer/2022070317/55626457d8b42ae87d8b4f67/html5/thumbnails/13.jpg)
FreeEed popcorn is very popular with lawyers, legal techs, IT, etc.
![Page 14: FreeEed presentation](https://reader036.vdocument.in/reader036/viewer/2022070317/55626457d8b42ae87d8b4f67/html5/thumbnails/14.jpg)
FreeEed popcorn
• Deploy on laptops, servers or cloud
• One-node or any number of nodes
• Scalable storage
• Different cooking recipes
• No mixing of matters
• Easy archiving
• Easy deletion
![Page 15: FreeEed presentation](https://reader036.vdocument.in/reader036/viewer/2022070317/55626457d8b42ae87d8b4f67/html5/thumbnails/15.jpg)
Processing architecture
• Based on a golden-image VM
• Controlled cluster start in any environment
• Index and cull on the fly, or later
• Immediately searchable
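"Cull on the fly or later" is a trade-off between doing less work now and keeping options open. The toy sketch below, with hypothetical function names and a crude regex tokenizer standing in for real text extraction, shows that both routes select the same documents for a given keyword list.

```python
import re
from typing import Iterable, Iterator

def tokenize(text: str) -> set[str]:
    """Lower-case word tokens; a crude stand-in for real text extraction."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def matches(text: str, keywords: set[str]) -> bool:
    """True if the text contains at least one (lower-case) keyword."""
    return bool(tokenize(text) & keywords)

def process_and_cull(docs: Iterable[tuple[str, str]], keywords: set[str]) -> Iterator[str]:
    """Cull on the fly: drop non-matching documents during processing."""
    for doc_id, text in docs:
        if matches(text, keywords):
            yield doc_id

def process_all(docs: Iterable[tuple[str, str]]) -> dict[str, str]:
    """Process everything first; culling can run later against this store."""
    return dict(docs)

def cull_later(store: dict[str, str], keywords: set[str]) -> list[str]:
    """Cull later: filter the already-processed store."""
    return [doc_id for doc_id, text in store.items() if matches(text, keywords)]

corpus = [("d1", "Merger agreement draft"), ("d2", "Lunch plans"), ("d3", "Re: merger timeline")]
terms = {"merger"}
print(list(process_and_cull(corpus, terms)))  # ['d1', 'd3']
print(cull_later(process_all(corpus), terms))  # ['d1', 'd3']
```

Culling on the fly saves storage and indexing time; culling later costs more up front but lets the keyword list change without reprocessing the collection.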
![Page 16: FreeEed presentation](https://reader036.vdocument.in/reader036/viewer/2022070317/55626457d8b42ae87d8b4f67/html5/thumbnails/16.jpg)
Cluster start-up on EC2
![Page 17: FreeEed presentation](https://reader036.vdocument.in/reader036/viewer/2022070317/55626457d8b42ae87d8b4f67/html5/thumbnails/17.jpg)
Cloud integration
• Downloadable VMs
• Same VMs on Amazon AWS
• Amazon VMs are very convenient:
• Immediate deployment
• Any hardware configuration you need
• Control lots of computing power from a low-powered laptop
• Azure: working with Microsoft
![Page 18: FreeEed presentation](https://reader036.vdocument.in/reader036/viewer/2022070317/55626457d8b42ae87d8b4f67/html5/thumbnails/18.jpg)
Review architecture
• Lucene
• Solr
• HBase
• Lucene indexes created in reducers and combined in Solr
• For small matters, write directly to Solr
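The data flow on this slide (each reducer builds an index over its share of the documents, and the pieces are then combined) can be illustrated with plain Python dictionaries standing in for Lucene indexes. This is a sketch of the idea only, under that assumption: real Lucene merges index segments, not Python sets, and all names here are hypothetical.

```python
import re
import zlib
from collections import defaultdict

Index = dict[str, set[str]]  # term -> set of document ids (posting list)

def build_shard(docs: list[tuple[str, str]]) -> Index:
    """What one reducer does: build an inverted index over its documents."""
    shard: defaultdict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs:
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            shard[term].add(doc_id)
    return dict(shard)

def partition(docs: list[tuple[str, str]], n_reducers: int) -> list[list[tuple[str, str]]]:
    """Route each document to a reducer by a stable hash (CRC32) of its id."""
    buckets: list[list[tuple[str, str]]] = [[] for _ in range(n_reducers)]
    for doc_id, text in docs:
        buckets[zlib.crc32(doc_id.encode()) % n_reducers].append((doc_id, text))
    return buckets

def merge_shards(shards: list[Index]) -> Index:
    """Combine per-reducer shards into one index (Solr's role on the slide)."""
    merged: defaultdict[str, set[str]] = defaultdict(set)
    for shard in shards:
        for term, postings in shard.items():
            merged[term] |= postings
    return dict(merged)

docs = [("d1", "merger agreement"), ("d2", "merger memo"), ("d3", "lunch memo")]
shards = [build_shard(bucket) for bucket in partition(docs, n_reducers=2)]
big_matter = merge_shards(shards)
small_matter = build_shard(docs)  # small matters: index directly, no reducers
assert big_matter == small_matter
print(sorted(big_matter["memo"]))  # ['d2', 'd3']
```

Both routes yield the same index, which is why writing directly to Solr is a reasonable shortcut when a matter is too small to justify a MapReduce job.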
![Page 19: FreeEed presentation](https://reader036.vdocument.in/reader036/viewer/2022070317/55626457d8b42ae87d8b4f67/html5/thumbnails/19.jpg)
Review screen
![Page 20: FreeEed presentation](https://reader036.vdocument.in/reader036/viewer/2022070317/55626457d8b42ae87d8b4f67/html5/thumbnails/20.jpg)
Review capabilities
• Search
• Cull down
• View text and metadata
• Tag documents
• Export as images or as native files
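The review loop (search, cull down within the results, tag, then select tagged documents for export) can be modelled in a few lines. This is a toy in-memory sketch with hypothetical names, not FreeEed's review API, which the slides place on top of Solr.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Document:
    doc_id: str
    text: str
    metadata: dict[str, str] = field(default_factory=dict)
    tags: set[str] = field(default_factory=set)

def terms(text: str) -> set[str]:
    """Lower-case word tokens used for matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def search(docs: list[Document], query: str) -> list[Document]:
    """AND search: keep documents containing every query term."""
    wanted = terms(query)
    return [d for d in docs if wanted <= terms(d.text)]

def tag(docs: list[Document], label: str) -> None:
    """Attach a review tag to each document in place."""
    for d in docs:
        d.tags.add(label)

def tagged(docs: list[Document], label: str) -> list[str]:
    """Ids of documents carrying a tag, e.g. to select them for export."""
    return [d.doc_id for d in docs if label in d.tags]

review_set = [
    Document("d1", "Merger agreement, final draft", {"from": "alice"}),
    Document("d2", "Merger agreement comments", {"from": "bob"}),
    Document("d3", "Lunch on Friday?", {"from": "carol"}),
]
hits = search(review_set, "merger")  # broad search
hits = search(hits, "draft")  # cull down within the results
tag(hits, "responsive")
print(tagged(review_set, "responsive"))  # ['d1']
```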
![Page 21: FreeEed presentation](https://reader036.vdocument.in/reader036/viewer/2022070317/55626457d8b42ae87d8b4f67/html5/thumbnails/21.jpg)
Bird’s-eye view: EDRM (Electronic Discovery Reference Model)
![Page 22: FreeEed presentation](https://reader036.vdocument.in/reader036/viewer/2022070317/55626457d8b42ae87d8b4f67/html5/thumbnails/22.jpg)
Left of EDRM: legal hold
• FreeEedCollect
• Architecture: https://github.com/markkerzner/FreeEedCollect
• ZooKeeper/MapReduce/Flume/HDFS
![Page 23: FreeEed presentation](https://reader036.vdocument.in/reader036/viewer/2022070317/55626457d8b42ae87d8b4f67/html5/thumbnails/23.jpg)
Right of EDRM: org charts
Partnership with Sintelix
![Page 24: FreeEed presentation](https://reader036.vdocument.in/reader036/viewer/2022070317/55626457d8b42ae87d8b4f67/html5/thumbnails/24.jpg)
Analytics: network of actors
Partnership with Sintelix
![Page 25: FreeEed presentation](https://reader036.vdocument.in/reader036/viewer/2022070317/55626457d8b42ae87d8b4f67/html5/thumbnails/25.jpg)
FreeEed and data governance
• Virtualization for data preservation
• Scalable processing
• Archiving
• Document groups kept separate (no mixing)
• Data format stored together with the software that understands it
![Page 26: FreeEed presentation](https://reader036.vdocument.in/reader036/viewer/2022070317/55626457d8b42ae87d8b4f67/html5/thumbnails/26.jpg)
Hadoop & Big Data applications
• Other related applications
• Financial: text analytics
• Energy: document and procedure analytics
• Actual ongoing projects
![Page 27: FreeEed presentation](https://reader036.vdocument.in/reader036/viewer/2022070317/55626457d8b42ae87d8b4f67/html5/thumbnails/27.jpg)
FreeEed as a learning tool
• Hundreds of downloads
• Dozens of active users
• Real-world Hadoop application
• Many developers download to learn
• Complex, real, but manageable
![Page 28: FreeEed presentation](https://reader036.vdocument.in/reader036/viewer/2022070317/55626457d8b42ae87d8b4f67/html5/thumbnails/28.jpg)
FreeEed adoption: who is trying our “popcorn”?
• Large law firms
• Small law firms and solos
• Government agencies
• Universities
• Enterprises
• Developers learn Big Data
![Page 29: FreeEed presentation](https://reader036.vdocument.in/reader036/viewer/2022070317/55626457d8b42ae87d8b4f67/html5/thumbnails/29.jpg)
Looking forward
• Add:
• Collection
• Analytics
• Community
• Integrations
• Implementations
![Page 30: FreeEed presentation](https://reader036.vdocument.in/reader036/viewer/2022070317/55626457d8b42ae87d8b4f67/html5/thumbnails/30.jpg)
How you can use FreeEed
• For its intended purpose
• Large law firms
• Small firms and solos
• Pro se litigants
• Integrate into legal IT
• Start a similar document management project
![Page 32: FreeEed presentation](https://reader036.vdocument.in/reader036/viewer/2022070317/55626457d8b42ae87d8b4f67/html5/thumbnails/32.jpg)
Q&A
• Thank you!
• People usually ask:
• How can I put my data in the cloud?
• Is it safe?
• Do you handle OCR, PST, OST, etc.?