scientists see promise in deep-learning programs microsoft seeks an edge in analyzing big data jeff...

Upload: gladys-caldwell

Post on 16-Dec-2015

TRANSCRIPT

  • Slide 1
  • Scientists See Promise in Deep-Learning Programs; Microsoft Seeks an Edge in Analyzing Big Data; Jeff Hawkins Develops a Brainy Big Data Company; Google Offers Big-Data Analytics; The Age of Big Data; How Big Data Became So Big; Why Hire a Lawyer? Computers Are Cheaper; Armies of Expensive Lawyers, Replaced by Cheaper Software
  • Slide 2
  • The total amount of digital data in the world is estimated to exceed 1.8 zettabytes (1.8 trillion gigabytes). The digital universe is doubling every 2 years. 85% of that data is owned or controlled by corporations at some point in its lifecycle. Source: International Data Corporation (IDC) Study, 2012.
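The unit conversion on this slide can be checked with a few lines of arithmetic. The sketch below assumes decimal SI units (1 zettabyte = 10^21 bytes, 1 gigabyte = 10^9 bytes). The projection helper `projected_zettabytes` is a hypothetical illustration that simply assumes the stated two-year doubling rate stays constant; it is not a figure from the IDC study.

```python
# Unit check for the slide's figures, assuming decimal SI units.
ZETTABYTE = 10**21  # bytes
GIGABYTE = 10**9    # bytes

# 1.8 ZB expressed in whole bytes (integer arithmetic avoids float rounding).
total_bytes = 18 * ZETTABYTE // 10
total_gigabytes = total_bytes // GIGABYTE  # 1.8 trillion gigabytes


def projected_zettabytes(start_zb, years, doubling_period_years=2):
    """Extrapolate volume, assuming a constant doubling period (an assumption)."""
    return start_zb * 2 ** (years / doubling_period_years)


print(f"{total_gigabytes:,} GB")
print(projected_zettabytes(1.8, 4), "ZB after 4 years at the stated rate")
```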
  • Slide 3
  • Big Data is here, and it's coming soon to a litigation near you. What's changed?
  • Slide 4
  • The Great Commingling
  • Slide 5
  • Redefining scalability in eDiscovery: 1, 1,000, 1 × 10^12.
  • Slide 6
  • Predictive Coding is a Form of Machine Learning. What is Machine Learning?
  • Slide 7
  • It's already a part of our lives: voice recognition software (e.g., calling your bank or credit card company); handwriting, facial, or fingerprint recognition; analyzing market trends and guiding investment decisions; making decisions on applications for credit or loans; modeling and predicting severe weather patterns; filtering spam in your email inbox; targeted marketing on the internet; robotics.
  • Slide 8
  • KEY POINT: Predictive coding is just a part of a continuum of technology-assisted review (TAR) methods that we are already very familiar with in searching and analyzing data. The continuum: Key Words, Concept Clustering, Concept Search, Predictive Coding. Three supporting propositions: 1. Each successive approach incorporates the preceding approaches. 2. Each successive approach contains more supporting criteria. 3. All are ultimately based on the concept of pattern matching.
  • Slide 9
  • Key Words = Simple pattern matching. External input: wild, wolf, pet. Terms shown: dog, cat, rhino, ferret, goldfish, cow, wolf, domestic, wild, pet.
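Keyword search as simple pattern matching can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation; the function name `keyword_hits` and the tokenization rule (lowercase runs of letters) are assumptions made for the example.

```python
import re


def keyword_hits(text, keywords):
    """Return the keywords that appear as whole words in text (case-insensitive)."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(tokens & {k.lower() for k in keywords})


# Terms and external input taken from the slide.
terms = "dog cat rhino ferret goldfish cow wolf domestic wild pet"
hits = keyword_hits(terms, ["wild", "wolf", "pet"])
print(hits)

# Exact matching misses simple variants such as plurals.
missed = keyword_hits("Wolves are sometimes kept as exotic pets", ["wild", "wolf", "pet"])
print(missed)
```

The second call returns no hits even though the sentence is plainly about wolves kept as pets, which is the kind of gap the later approaches in the continuum try to close.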
  • Slide 10
  • Concept Clustering = Organization based on internal relationships. Terms shown: dog, cat, domesticated, wild, pet, rhino, ferret, goldfish, cow, wolf, tiger. Bit patterns: 01110111011010010110110001100100 (wild), 011001000110111101100111 (dog), 011100000110010101110100 (pet).
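The bit strings on this slide are the 8-bit ASCII encodings of the labels next to them, which is a reminder that the software sees only patterns of bits, not meanings. A small sketch to confirm this; the helper names `decode_bits` and `encode_bits` are hypothetical.

```python
def decode_bits(bits):
    """Decode a string of 0s and 1s, 8 bits per character, into text."""
    bits = bits.replace(" ", "")
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8))


def encode_bits(word):
    """Encode text as a string of 8-bit binary character codes."""
    return "".join(format(ord(c), "08b") for c in word)


print(decode_bits("01110111011010010110110001100100"))
print(decode_bits("011001000110111101100111"))
print(decode_bits("011100000110010101110100"))
```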
  • Slide 11
  • Concept Searching = Key words + Concept organization. External input: zoo, wild, domesticated. Terms shown: dog, cat, rhino, ferret, goldfish, cow, wolf, domesticated, wild, pet, tiger, farm, zoo. Bit patterns: 01111010011011110110111 (zoo), 01110111011010010110110001100100 (wild), 011001000110111101101101011001010111001101110100011010010110001101100001011101000110010101100100 (domesticated).
  • Slide 12
  • Predictive Coding = document-level input + probabilistic modeling. External input: human-coded documents. Output: doc-level probability rankings. Terms shown: dog, cat, rhino, ferret, goldfish, cow, wolf, domesticated, wild, pet, tiger, farm, zoo. Bit patterns: 01111010011011110110111 (zoo), 01110111011010010110110001100100 (wild), 011001000110111101101101011001010111001101110100011010010110001101100001011101000110010101100100 (domesticated).
  • Slide 13
  • Infer. Step 1: Sample documents from the entire set.
  • Slide 14
  • Step 2: Attorney review of sample documents to create training and control sets. Sample document: "In the European mind, wolves long stood as a symbol of baneful, uncontrollable nature. As far back as the time of Aesop in 500 BCE (Before the Common Era), wolves in literature are portrayed as wicked villains and long-fanged, terrible beasts. Before the Middle Ages, wolves were nearly always the greedy thief, criminal trickster, or cruel remorseless murderer. The wolf does not fare well in the European imagination." Sample document: "Can the wolf be domesticated? The domesticated dog is descended from the wolf found in the wild. While some people have occasionally attempted to raise wolves as pets, their 2-inch fangs and tendency to eat nearby small animals such as cats can create socially awkward situations with neighbors." Coding options: Responsive / Not Responsive.
  • Slide 15
  • Step 3: Create model from human-coded training set (responsive and not responsive). Sample document: "In the European mind, wolves long stood as a symbol of baneful, uncontrollable nature. As far back as the time of Aesop in 500 BCE (Before the Common Era), wolves in literature are portrayed as wicked villains and long-fanged, terrible beasts. Before the Middle Ages, wolves were nearly always the greedy thief, criminal trickster, or cruel remorseless murderer. The wolf does not fare well in the European imagination." Sample document: "Can the wolf be domesticated? The domesticated dog is descended from the wolf found in the wild. While some people have occasionally attempted to raise wolves as pets, their 2-inch fangs and tendency to eat nearby small animals such as cats can create socially awkward situations with neighbors." Extracted terms: wolves, wolf, pet. Word weights (Word: Pos., Neg.): wolf: .98, .08; dog: .56, .43; pet: .42, .28; raise: .61, .09. Word associations (Word pair: Assoc. %): wolf-pet: .73; dog-wolf: .43; pet-raise: .88; raise-wolf: .61. Other terms shown: costner, dances, raise, werewolf. Bit pattern: 01100100 01101111 01100111 (dog).
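The slide does not say which modeling technique produces the Pos./Neg. word weights. One simple way to get weights of that shape is a Bernoulli naive Bayes model, sketched below as an assumption rather than as the method any particular predictive coding product uses. The training documents, labels, and function names (`tokenize`, `train`, `probability_responsive`) are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its set of distinct words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def train(labeled_docs):
    """Build per-word (pos_rate, neg_rate) weights from (text, is_responsive) pairs."""
    n_pos = sum(1 for _, responsive in labeled_docs if responsive)
    n_neg = len(labeled_docs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("training set needs both responsive and non-responsive examples")
    pos_counts, neg_counts = Counter(), Counter()
    for text, responsive in labeled_docs:
        (pos_counts if responsive else neg_counts).update(tokenize(text))
    vocab = set(pos_counts) | set(neg_counts)
    # Add-one (Laplace) smoothing keeps every rate strictly between 0 and 1.
    weights = {
        w: ((pos_counts[w] + 1) / (n_pos + 2), (neg_counts[w] + 1) / (n_neg + 2))
        for w in vocab
    }
    prior = n_pos / len(labeled_docs)
    return weights, prior


def probability_responsive(text, model):
    """Score a document: estimated probability that it is responsive."""
    weights, prior = model
    tokens = tokenize(text)
    log_odds = math.log(prior / (1 - prior))
    for word, (p_pos, p_neg) in weights.items():
        if word in tokens:
            log_odds += math.log(p_pos / p_neg)
        else:
            log_odds += math.log((1 - p_pos) / (1 - p_neg))
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical human-coded training set (True = responsive).
training = [
    ("raise a wolf as a pet", True),
    ("the pet dog descended from the wolf", True),
    ("quarterly sales report for the board", False),
    ("board meeting agenda and sales forecast", False),
]
model = train(training)
print(model[0]["wolf"])  # (rate in responsive docs, rate in non-responsive docs)
print(probability_responsive("can the wolf be a pet", model))
print(probability_responsive("sales forecast for the board", model))
```

With only four training documents the numbers are toy values; the point is the shape of the model: each word carries a weight for the responsive and non-responsive classes, and a document's score combines the weights of the words it does and does not contain.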
  • Slide 16
  • Step 4: Test model against sample (human-coded) set. Sample document: "'Dances With Wolves' has the makings of a great work, one that recalls a variety of literary antecedents, everything from 'Robinson Crusoe' and 'Walden' to 'Tarzan of the Apes.' Michael Blake's screenplay touches both on man alone in nature and on the 19th-century white man's assuming his burden among the less privileged." Sample document: "Wolves are sometimes kept as exotic pets, and on some rarer occasions, as working animals. Although closely related to dogs (which are believed to have split from wolves between 10,000 and 100,000 years ago), wolves do not show the same tractability as dogs in living alongside humans. Wolves also need much more space than dogs, about 10-15 sq. miles."
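Testing the model against the human-coded control set usually comes down to comparing the model's calls with the attorneys' calls and summarizing the agreement as precision and recall. A minimal sketch follows; the prediction and label lists are made-up illustrative data, and `precision_recall` is a hypothetical helper name.

```python
def precision_recall(predicted, actual):
    """Compare model calls with human coding; True means responsive."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)
    # Precision: share of documents the model flagged that really are responsive.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    # Recall: share of truly responsive documents that the model found.
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Illustrative control set of eight documents (hypothetical data).
predicted = [True, True, True, False, False, True, False, True]
actual = [True, True, False, False, True, True, False, True]
precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.0%} recall={recall:.0%}")
```

Here the model flags five documents, four of them correctly, and misses one of the five truly responsive documents, so precision and recall are both 80%.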
  • Slide 17
  • Yes / No: Apply model to the remainder of documents that have not been reviewed. Outcomes: Responsive / Non-responsive.
  • Slide 18
  • Step 5: Apply model to entire set and rank documents (probability scale from 100% down to 0%).
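Once every document has a probability score, ranking is a sort, and a review cutoff is a filter on the sorted list. The sketch below uses hypothetical document IDs and scores; the 0.5 cutoff is only an example, since the slides do not specify where the line should be drawn.

```python
def rank_documents(scores):
    """Return (doc_id, probability) pairs, most likely responsive first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def review_queue(scores, cutoff):
    """Return IDs of documents scoring at or above the cutoff, in ranked order."""
    return [doc_id for doc_id, p in rank_documents(scores) if p >= cutoff]


# Hypothetical model output: document ID -> probability of responsiveness.
scores = {"DOC-001": 0.12, "DOC-002": 0.97, "DOC-003": 0.55, "DOC-004": 0.81}

for doc_id, p in rank_documents(scores):
    print(f"{doc_id}: {p:.0%}")
print(review_queue(scores, cutoff=0.5))
```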
  • Slide 19
  • PREDICTIVE CODING AND BIG DATA NYLJ/Pangea3 Webinar April 15, 2013
  • Slide 20
  • OUTLINE 1. Mitigating Big Data in E-Discovery 2. Stakeholder Analysis 3. The New Reality of Predictive Coding 4. Long-Term Trends
  • Slide 21
  • MITIGATING BIG DATA IN E-DISCOVERY: Predictive Coding and Big Data
  • Slide 22
  • BIG DATA IN E-DISCOVERY: Bigger haystack: more documents in general. Corporate data culture: more relevant documents. More sources: collection and preservation challenges.
  • Slide 23
  • MITIGATING BIG DATA IN E-DISCOVERY: Some mitigating factors: principles of proportionality and cooperation; information governance tools and document management; technology-assisted review and predictive coding.
  • Slide 24
  • STAKEHOLDER ANALYSIS Predictive Coding and Big Data
  • Slide 25
  • PREDICTIVE CODING STAKEHOLDER ANALYSIS: Judges: generally receptive. Clients: cost efficiencies vs. risk management. Lawyers: new model, building expertise.
  • Slide 26
  • THE NEW REALITY OF PREDICTIVE CODING Predictive Coding and Big Data
  • Slide 27
  • NEW REALITY OF PREDICTIVE CODING: Reduced Data Volumes; Increased Complexity and Density; Focused, High-Stakes Human Review; Battle of Expertise.
  • Slide 28
  • LONG-TERM TRENDS Predictive Coding and Big Data
  • Slide 29
  • LONG-TERM TRENDS: Over time, Big Data growth > predictive coding benefits. Some document-by-document human review necessary. Strategic nuances in a new discovery battleground.
  • Slide 30
  • CONTACT PANGEA3. NEW YORK: Pangea3 LLC, 530 5th Avenue, 7th FL, New York, NY 10036; Tel. (US Main): +1-212-689-3819; Fax: +1-212-820-9784. MUMBAI: Pangea3 Legal Database Systems Pvt. Ltd., 102-B, Ground Floor, Leela Business Park, Andheri-Kurla Road, Andheri East, Mumbai 400 059, India; U.S. Line: +1-877-311-8528; Tel.: +91-22-6191-7500; Fax: +91-22-6191-7600. DALLAS: Pangea3 LLC, 2395 Midway Road, Carrollton, TX 75006; Tel. (US Main): +1-212-689-3819; Fax: +1-212-820-9784. DELHI: Pangea3 Legal Database Systems Pvt. Ltd., B-23, Sector 58, Noida UP 20 301, India; U.S. Line: +1-877-311-8528; Tel: +91-120-425-5210/14/16; Fax: +212-820-9783.
  • Slide 31
  • SEARCH (1) How do we search for discoverable ESI? Manually? With automated assistance? Which is better and why? M.R. Grossman & G.V. Cormack, The Grossman-Cormack Glossary of Technology-Assisted Review, 7 Fed. Cts. Law R. 1 (2013). Maura R. Grossman & Gordon V. Cormack, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient than Exhaustive Manual Review, XVII Rich. J.L. & Tech. 11 (2011) (available at http://jolt.richmond.edu/v17i3/article11.pdf). For a shorter discussion, see Efficient E-Discovery, ABA Journal 31 (Apr. 2012).
  • Slide 32
  • SEARCH (2) Using search terms? How accurate are these? See In re National Ass'n of Music Merchants, Musical Instruments and Equipment Antitrust Litig., 2011 WL 6372826 (S.D. Cal. Dec. 19, 2011).
  • Slide 33
  • SEARCH (3) Automated review or predictive coding as an alternative to the use of search terms. For decisions which address automated review, see: EORHB, Inc. v. HOA Holdings LLC, C.A. No. 7409 (Del. Ct. Ch. Oct. 15, 2012); In re Actos (Pioglitazone) Prod. Liability Litig., MDL No. 6:11-md-2299 (W.D. La. July 27, 2012); Da Silva Moore v. Publicis Groupe SA, 2012 U.S. Dist. LEXIS 23350 (S.D.N.Y. Feb. 24), aff'd, 11 Civ. 1279 (ALC) (AJP) (S.D.N.Y. Apr. 26, 2012); Global Aerospace Inc. v. Landow Aviation, L.P., Consol. Case No. CL 61040 (VA Cir. Ct. Apr. 23, 2012).
  • Slide 34
  • SEARCH (4) WHAT LESSONS CAN BE DRAWN FROM THE DECISIONS? Judge approved automated search at a threshold level. Results may be subject to challenge and later rulings. Threshold superiority of automated vs. manual review recognized given volume of ESI and attorney review costs. Large volumes of ESI in issue. Party seeking to do automated review must offer transparency of process or something close to it. Reasonableness of methodology is key. Speculation by the opposing party is insufficient to defeat threshold approval.
  • Slide 35
  • SEARCH (5) LET'S TAKE A DEEP BREATH AND RECAP WHERE WE ARE TODAY, VENDOR HYPE NOTWITHSTANDING: We have yet to see a judicial analysis of process and results in a contested matter. Safe to assume that the proponent of a process will bear the burden of proof (whatever that burden might be). Safe to assume at least some transparency of process may/will be expected. If reasonableness is the standard, how reasonable must the results be? Is precision of 80% enough? 90%? Remember, there are no agreed-on standards.
  • Slide 36
  • INTERLUDE: Assume a party makes production of ESI based on search terms proposed by an adversary. Assume further that the adversary suspects something is missing. Is suspicion enough to warrant direct access to the party's databases by a consultant retained by the adversary? If not, what proofs should be required? Will an attorney's certification or affidavit suffice? Will/should the attorney become a witness? Will experts be needed? Note, with regard to proofs, S2 Automation LLC v. Micron Technology, Inc., No. 11-0884 (D.N.M. Aug. 9, 2012), where the court, relying on Rule 26(g)(1), required a party to disclose its search methodology.
  • Slide 37
  • INTERLUDE: A collision between search and ethics? Assume a party's attorney knows that search terms proposed by adversary counsel, if applied to the party's ESI, will not lead to the production of relevant (perhaps highly relevant) ESI. Absent a lack of candor to adversary counsel or the court under RPC 3.4 (which implies, if not requires, some affirmative statement), does not RPC 1.6 require the party's attorney to remain silent? What if the nonproduction is learned later? If nothing else, will the party's attorney suffer bad PR? If the party's attorney wants to advise the adversary, should the attorney secure her client's informed consent? What if the client says no? (With thanks to the Hon. John M. Facciola.)
  • Slide 38
  • INTERLUDE: AS WE THINK ABOUT SEARCH, THINK ABOUT THE ETHICS ISSUES THAT USE OF A NONPARTY VENDOR MAY LEAD TO!