supporting the legal reasoning process by classification

Chair of Software Engineering for Business Information Systems (sebis) Faculty of Informatics Technische Universität München wwwmatthes.in.tum.de Supporting the Legal Reasoning Process by Classification of Judgments Applying Active Machine Learning Linus Boehm, 07.05.18 Final Presentation, BA-Thesis


Page 1: Supporting the Legal Reasoning Process by Classification

Chair of Software Engineering for Business Information Systems (sebis)

Faculty of Informatics

Technische Universität München

wwwmatthes.in.tum.de

Supporting the Legal Reasoning Process by Classification of Judgments Applying Active Machine Learning

Linus Boehm, 07.05.18, Final Presentation, BA-Thesis

Page 2: Supporting the Legal Reasoning Process by Classification

Outline

• Motivation
• Legal Reasoning Process
• Approach
• Implementation
• Data Set
• Evaluation
• Conclusion

© sebis171120 Boehm BA KickOff 2

Page 3: Supporting the Legal Reasoning Process by Classification

Motivation

Legal Documents

• are growing and changing: more than 550 laws were adopted in the 2013-2017 period [1]

• are getting more complex

Document Discovery

• often done manually (knowledge-intensive)

• very time-consuming and costly [2]

Supporting the legal reasoning process by classification of judgments

• Common law relies heavily on precedents

• Are judgments relevant in (German) civil law?

• further development of the law by judges

• Case law is often applied in dynamic areas of life

• German landlord and tenant law is largely shaped by case law

© sebis180507 Boehm BA Final 3

Page 4: Supporting the Legal Reasoning Process by Classification

Legal Reasoning Process – Target


Supporting the legal reasoning process of a lawyer:

• Searching for relevant laws, precedent cases, and facts

• Aim: seeking evidence to resolve the issue in favor of the client

• Support lawyer by automatic recognition of relevant predicates

• e.g. including predicates about unlawfulness of contracts

Page 5: Supporting the Legal Reasoning Process by Classification

Legal Reasoning Process – Target


Characteristics of the legal reasoning process of a judge:

• Striving for consistency

• Further development of the law

• Attempts to regulate social relations by applying the legislature’s intention

Opportunity for Lexia to support the legal reasoning process of a judge:

• Automatic recognition of relevant predicates

Page 6: Supporting the Legal Reasoning Process by Classification

Approach

1. Manual Annotation of Judgments

• Sentences about the unlawfulness of contract clauses

2. Implementation of a BTC (binary text classification) for Judgments

• Adaptation of Lexia and LexML

3. Evaluation

• Evaluating common performance indicators such as F1 score, ROC, recall and precision

Page 7: Supporting the Legal Reasoning Process by Classification

Manual Annotation Process

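Annotation rule: contract clause + legal justification + unlawfulness

- Contract clause, e.g.: Vertragsklausel (contract clause), Formularklausel (standard-form clause), §x des Vertrages (§ x of the contract)

- Legal justification, e.g.: paragraph of the Civil Code (BGB), precedent of the BGH

- Unlawfulness, e.g.: unwirksam (invalid), nichtig (void), Nichtigkeit (nullity)

Example:

[…] entsprechende Formularklausel wegen unangemessener Benachteiligung des Mieters nach § 307 BGB unwirksam […]

(English: "[…] corresponding standard-form clause invalid under § 307 BGB because it unreasonably disadvantages the tenant […]")

Problems with the annotation rule:

- Sometimes one part is mentioned in the following or previous sentence -> partly causing the false positives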
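The three-part annotation rule on this slide can be approximated with a keyword check. The following is only a minimal sketch: the annotation in the thesis was done manually, and the regular expressions as well as the function name `matches_annotation_rule` are assumptions built solely from the example keywords listed here.

```python
import re

# Hypothetical keyword patterns derived from the slide's examples (not from the thesis).
CLAUSE = re.compile(
    r"Vertragsklausel|Formularklausel|§\s*\d+\s+des\s+Vertrages", re.IGNORECASE
)
JUSTIFICATION = re.compile(r"§\s*\d+[a-z]?\s+BGB|\bBGH\b", re.IGNORECASE)
# "nichtig" also matches inside "Nichtigkeit".
UNLAWFULNESS = re.compile(r"unwirksam|nichtig", re.IGNORECASE)


def matches_annotation_rule(sentence: str) -> bool:
    """Return True only if all three parts occur in the same sentence."""
    patterns = (CLAUSE, JUSTIFICATION, UNLAWFULNESS)
    return all(p.search(sentence) for p in patterns)


example = (
    "entsprechende Formularklausel wegen unangemessener Benachteiligung "
    "des Mieters nach § 307 BGB unwirksam"
)
print(matches_annotation_rule(example))
# The legal justification is missing here, so the rule does not fire.
print(matches_annotation_rule("Die Formularklausel ist unwirksam."))
```

Because the check works sentence by sentence, it shares the limitation named on the slide: a part that appears only in the previous or following sentence is not seen.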

Page 8: Supporting the Legal Reasoning Process by Classification

Implementation


Page 9: Supporting the Legal Reasoning Process by Classification

Implementation

[Figure: architecture of the two systems, labeled Lexia and LexML. Components in the order extracted: (1) REST Interface, User Interface, Importer, Data and Text Mining Engine, Data Access Layer, ES; (2) REST Interface, Data Access Layer, MDCSV, Exporter, Machine Learning Engine, Importer.]

Own illustration based on Muhr [3]

Page 10: Supporting the Legal Reasoning Process by Classification

Implementation

[Figure: Lexia and LexML architecture, same diagram as on page 9, with the changes below highlighted]

• Adaptation of the Sentence Segmenter

• Adaptation of the UI

  - Pipeline Configuration

  - Annotator for Binary Labels

Page 11: Supporting the Legal Reasoning Process by Classification

Implementation

[Figure: Lexia and LexML architecture, same diagram as on page 9, with the changes below highlighted]

• Adding Binary Classification

• Weighted NB and LR to counter unbalanced classes

• Exporter for Binary Evaluation Metrics
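One way to picture class weighting in Naive Bayes is to let every training sentence contribute to the counts in proportion to the weight of its class. The sketch below follows that idea for a binary word representation. It is an illustration under assumptions, not the LexML implementation: the slides do not say how the weights enter the model, and the names `train_weighted_nb`, `prob_true` and the toy sentences are hypothetical.

```python
import math
from collections import defaultdict


def train_weighted_nb(docs, labels, class_weight, alpha=1.0):
    """docs: list of token sets; labels: list of bool; both classes must occur."""
    vocab = sorted(set().union(*docs))
    weight_sum = defaultdict(float)  # total example weight per class
    word_sum = {True: defaultdict(float), False: defaultdict(float)}
    for tokens, label in zip(docs, labels):
        w = class_weight[label]
        weight_sum[label] += w
        for token in tokens:
            word_sum[label][token] += w
    total = weight_sum[True] + weight_sum[False]
    model = {}
    for c in (True, False):
        # Laplace smoothing with parameter alpha over the whole vocabulary.
        denom = sum(word_sum[c].values()) + alpha * len(vocab)
        model[c] = {
            "log_prior": math.log(weight_sum[c] / total),
            "log_like": {
                t: math.log((word_sum[c][t] + alpha) / denom) for t in vocab
            },
        }
    return model


def prob_true(model, tokens):
    """Posterior probability of the True class; unknown tokens are ignored."""
    score = {
        c: model[c]["log_prior"]
        + sum(model[c]["log_like"][t] for t in tokens if t in model[c]["log_like"])
        for c in (True, False)
    }
    top = max(score.values())
    e_true = math.exp(score[True] - top)
    e_false = math.exp(score[False] - top)
    return e_true / (e_true + e_false)


# Toy data: one True sentence among four (hypothetical, for illustration only).
docs = [{"klausel", "unwirksam"}, {"klausel", "wirksam"},
        {"mieter", "zahlt"}, {"mieter", "klausel"}]
labels = [True, False, False, False]

plain = train_weighted_nb(docs, labels, {True: 1.0, False: 1.0})
weighted = train_weighted_nb(docs, labels, {True: 20.0, False: 1.0})
print(prob_true(plain, {"klausel"}), prob_true(weighted, {"klausel"}))
```

With equal weights the rare True class loses on the ambiguous token; raising its weight shifts both the prior and the word likelihoods toward True, which is the intended effect on unbalanced data.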

Page 12: Supporting the Legal Reasoning Process by Classification

Implementation

[Figure: Lexia and LexML architecture, same diagram as on page 9, extended by a Statistical Evaluation component]

• R scripts for semi-automated evaluation

• Illustration of different performance measurements such as F1 score, PR curve, ROC, etc.

Page 13: Supporting the Legal Reasoning Process by Classification

Active Learning – Setting

AML Pipeline Stages:

• Preprocessing: Tokenizer, Stop-Word Removal

• Vector Representation: Binary Word Representation

• Classifier: Multinomial Naive Bayes (Weighted)

• Query Strategy: Uncertainty Sampling (Maximum Vote Entropy)

[Figure: active learning cycle connecting the machine learning model, the labeled training set ℒ, the unlabeled pool 𝑈 and the oracle (human annotator)]
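The query step of such a pool-based setting can be sketched as follows. This is a simplified illustration: it ranks sentences by the entropy of a single model's predicted probability, whereas the slide names Maximum Vote Entropy, and the function names `select_queries` and `run_round` as well as the callbacks `train` and `oracle` are assumptions, not part of Lexia or LexML.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli distribution with P(True) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_queries(pool_probs, query_size):
    """Return the ids of the query_size most uncertain pool items."""
    ranked = sorted(
        pool_probs, key=lambda sid: binary_entropy(pool_probs[sid]), reverse=True
    )
    return ranked[:query_size]


def run_round(labeled, pool, train, oracle, query_size):
    """One active learning round: train, query the oracle, grow the labeled set.

    labeled: list of (text, label); pool: dict id -> text;
    train: callable returning a function text -> P(True);
    oracle: callable text -> bool (the human annotator).
    """
    predict = train(labeled)
    probs = {sid: predict(text) for sid, text in pool.items()}
    for sid in select_queries(probs, query_size):
        text = pool.pop(sid)
        labeled.append((text, oracle(text)))
    return labeled, pool


# Items near 0.5 are the most uncertain and are queried first.
print(select_queries({"s1": 0.02, "s2": 0.48, "s3": 0.9, "s4": 0.6}, 2))
```

The seed set is whatever `labeled` contains before the first round, and `query_size` is the number of sentences sent to the annotator per round.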

Page 14: Supporting the Legal Reasoning Process by Classification

Data Set


Page 15: Supporting the Legal Reasoning Process by Classification

Data Set

Raw Documents

• Over 800 imported and annotated judgments

• BGH judgments of the VIII. Civil Senate

• specialized in the law on the sale of goods and in landlord and tenant law

Training & Test Set

• Using a subset of 82 documents

• Only Tenor and Reasoning part imported

• 3135 sentences, of which 72 are True (2.3%)

• 80/20 partition

Page 16: Supporting the Legal Reasoning Process by Classification

Evaluation


Page 17: Supporting the Legal Reasoning Process by Classification

Discover weight factor


Page 18: Supporting the Legal Reasoning Process by Classification

Discover weight factor

Weight 20/0.05 seems to perform well

Page 19: Supporting the Legal Reasoning Process by Classification

Discover weight factor


Page 20: Supporting the Legal Reasoning Process by Classification

Comparison of Seed Set Size and Query Size Influence

- True class weighted with factor 200 (w8)

- Seed set randomly sampled

Page 21: Supporting the Legal Reasoning Process by Classification

Influence of Seed Set


Missed class/cluster effect

Page 22: Supporting the Legal Reasoning Process by Classification

Influence of Seed Set


Page 23: Supporting the Legal Reasoning Process by Classification

Reasons

- Small randomly sampled seed sets are often not representative

- Assumption BC 3% labeled as True:

  - P(no True in seed set with n=20) ≈ 54%

- Unbalanced classes can lead to the missed class/cluster effect

- => a sufficient exploration phase during seed set generation is crucial
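The 54% figure can be checked with a short calculation. The sketch below assumes that the seed sentences are drawn independently with a 3% chance of being True (a binomial approximation; drawing without replacement from a finite pool would be hypergeometric). The function names are hypothetical.

```python
import math


def prob_no_positive(p_true, n):
    """Probability that a random seed set of size n contains no True sentence."""
    return (1.0 - p_true) ** n


def min_seed_size(p_true, max_miss_prob):
    """Smallest n for which the probability of missing the True class is below max_miss_prob."""
    return math.ceil(math.log(max_miss_prob) / math.log(1.0 - p_true))


print(round(prob_no_positive(0.03, 20), 3))  # about 0.544, i.e. roughly 54%
for n in (20, 50, 100, 200):
    print(n, round(prob_no_positive(0.03, n), 3))
print(min_seed_size(0.03, 0.05))
```

Under these assumptions a purely random seed set needs on the order of a hundred sentences before the chance of containing no True example drops below 5%, which supports the point about an exploration phase.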


Page 24: Supporting the Legal Reasoning Process by Classification

ROC with unbalanced classes


Page 25: Supporting the Legal Reasoning Process by Classification

ROC with unbalanced classes

All configurations seem to perform well

Page 26: Supporting the Legal Reasoning Process by Classification

N = 624       Predicted = 1   Predicted = 0
Actual = 1    TP = 8          FN = 6
Actual = 0    FP = 4          TN = 606

TPR (recall) = $\frac{TP}{TP + FN}$

FPR = $\frac{FP}{FP + TN}$

Reason: TN is dominant -> the FPR stays relatively stable even for low thresholds [4]
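Plugging the confusion matrix from this slide into the definitions makes the effect concrete. The following sketch only applies the standard formulas; the function name `binary_metrics` is an assumption.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard binary classification metrics from confusion matrix counts."""
    recall = tp / (tp + fn)          # TPR
    fpr = fp / (fp + tn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "fpr": fpr, "precision": precision, "f1": f1}


# Counts taken from the slide (N = 624).
metrics = binary_metrics(tp=8, fn=6, fp=4, tn=606)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The FPR stays below 1% because its denominator is dominated by the 606 true negatives, although the classifier misses 6 of the 14 True sentences. The ROC curve therefore looks favorable, while recall and F1 show the weakness.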

Page 27: Supporting the Legal Reasoning Process by Classification

Influence of Query Size


Page 28: Supporting the Legal Reasoning Process by Classification

Influence of Query Size


Page 29: Supporting the Legal Reasoning Process by Classification

Influence of Query Size


Page 30: Supporting the Legal Reasoning Process by Classification

Recall Heatmap

Recall = $\frac{TP}{TP + FN}$

Page 31: Supporting the Legal Reasoning Process by Classification

Precision Heatmap

Low FP causes high precision even when […]

Precision = $\frac{TP}{TP + FP}$

Page 32: Supporting the Legal Reasoning Process by Classification

F1Score Heatmap


Page 33: Supporting the Legal Reasoning Process by Classification

Demo


Page 34: Supporting the Legal Reasoning Process by Classification

Conclusion

• Binary Classification is a promising approach for identifying predicates in judgments

• Seed set generation and size are crucial

Limitations

• Evaluation with only one preprocessing configuration

• No Cross-Validation

• Strict Annotation rule misses some sentences

Future Work

• Evaluation of different preprocessing configurations in context of judgments

• Validate results with Cross-Validation

• Try different methods to counter unbalanced classes

• Use a combination of AML and rule-based learning for a better class balance

• Use predicates other than sentences, such as paragraphs or “n-grams” of sentences

Page 35: Supporting the Legal Reasoning Process by Classification

Linus Boehm

Advisor: Ingo Glaser

Technische Universität München

Faculty of Informatics

Chair of Software Engineering for Business Information Systems

Boltzmannstraße 3

85748 Garching bei München

Tel +49.89.289.17132

Fax +49.89.289.17136

[email protected]

wwwmatthes.in.tum.de

Page 36: Supporting the Legal Reasoning Process by Classification

References

[1] https://www.bundestag.de/blob/194870/7c8a01e16c98fc9c32ddb203d7bd88e0/gesetzgebung_wp18-data.pdf

[2] Gruner, Ronald H. "Anatomy of a lawsuit." (2008).

[3] Muhr, Johannes. "Design, Prototypical Implementation and Evaluation of an Active Machine Learning Service in the Context of Legal Text Classification." Master's thesis (2017).

[4] https://www.kaggle.com/lct14558/imbalanced-data-why-you-should-not-use-roc-curve

Page 37: Supporting the Legal Reasoning Process by Classification

Research Method

Design Science as most fitting method

- Focus is on the creation of an effective IT solution

- Description of the design problem

- Development of a new IT artefact

- Strict evaluation of the IT artefact to measure its utility

[Figure: design science research framework with the columns Environment, IS Research and Knowledge Base. The Environment supplies Business Needs and the Knowledge Base supplies Applicable Knowledge to IS Research, which alternates between Develop/Build and Justify/Evaluate (Assess, Refine). Results flow back as Application in the Appropriate Environment and Additions to the Knowledge Base.]

Entries in the figure:

• People/Organizations: Software Developers, Legal Experts

• Technology: Lexia, LexML

• Develop/Build: Extended Version of LexML, Adaptations to Lexia

• Further entries (Foundations/Methodologies, Justify/Evaluate): (Active) Machine Learning, Binary Text Classification in jurisdictions, Rule-based Approaches, ML Frameworks, Document Classification, Sentence Classification, Binary Text Classification

Page 38: Supporting the Legal Reasoning Process by Classification

Research Questions

How can ML support the legal reasoning process of lawyers and judges?

What challenges occur in a BTC of judgments?

What is the best combination of Seed Set and Query Size for a BTC in the context of jurisdiction?