bibletech2010.ppt


Andi Wu

Asia Bible Society

From Identical Strings to Similar Strings

Intelligent Search of Biblical Texts Based on Syntax and Semantics

Original Motivation

• Systematic approach to Bible translation

• To make the translation consistent, translators need to know not only the phrases that are identical but also the phrases that differ in form yet are similar in meaning.


Identical Strings vs. Similar Strings

• Traditional Search:
  • Based on matches in form
  • Same words
  • Same word order

• Intelligent Search:
  • Based on matches in meaning
  • Words can be different
  • Word order can be different

Example of Traditional Search: Concordance

Genesis 1:1

בראשית ברא אלהים את השמים ואת הארץ׃

Deuteronomy 31:28

הקהילו אלי את־כל־זקני שבטיכם ושטריכם ואדברה באזניהם את הדברים האלה ואעידה בם את־השמים ואת־הארץ׃

Jeremiah 32:17

אהה אדני יהוה הנה אתה עשית את־השמים ואת־הארץ בכחך הגדול ובזרעך הנטויה לא־יפלא ממך כל־דבר׃

Haggai 2:21

אמר אל־זרבבל פחת־יהודה לאמר אני מרעיש את־השמים ואת־הארץ׃


Example of Similar Strings:

• Same words in different orders

Jeremiah 2:1

Ezekiel 24:20

Example of Similar Strings:

• Different words in different orders

Proverbs 1:7

Psalms 111:10

Similar Strings

• Strings that are similar in meaning

• Similar words in similar syntactic relationships

• Need in Bible translation


The Importance of Syntactic Relations

• Similar strings != strings containing similar words

• The same words in different syntactic relations can mean very different things

An old man with a dog chased a young lady with an umbrella.

vs.

An old lady with a dog chased a young man with an umbrella.


Semantic Units of Sentences

Triples: dependency relationships between two words

e.g. In the beginning God created the heavens and the earth.

• God – create (subject-verb)

• create – heavens (verb-object)

• create – earth (verb-object)

• create – in the beginning (verb-adverbial)

• heavens – earth (conjunction)
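As a minimal sketch, the triples listed above could be held as a set of (left word, right word, relation) records. The `Triple` name and its field names are assumptions made for illustration; the slides do not say how Asia Bible Society stores triples.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One dependency relation between two words (hypothetical layout)."""
    left: str
    right: str
    relation: str


# Triples for "In the beginning God created the heavens and the earth."
genesis_1_1 = {
    Triple("God", "create", "subject-verb"),
    Triple("create", "heavens", "verb-object"),
    Triple("create", "earth", "verb-object"),
    Triple("create", "in the beginning", "verb-adverbial"),
    Triple("heavens", "earth", "conjunction"),
}

# Print the triples in a stable order, mirroring the slide's notation.
for t in sorted(genesis_1_1):
    print(f"{t.left} - {t.right} ({t.relation})")
```

Storing the triples in a set means word order in the sentence no longer matters, which is what later makes set intersection a natural comparison.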


Different Strings With the Same Triples

God created the heavens and the earth.

The heavens and the earth were created by God.

God created the heavens and He created the earth.

It is God who created the heavens and the earth.

• God – create (subject-verb)

• create – heavens (verb-object)

• create – earth (verb-object)

• heavens – earth (conjunction)


Different Strings With Similar Triples

God created man in his own image.

Adam is the man that God created.

Man was created by God on the sixth day.

I am a man created by God.

Triples in common:

• God – create (subject-verb)

• create – man (verb-object)
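The "triples in common" can be found with a plain set intersection. In the sketch below, the two shared triples come from the slide; every other triple is a hand-written guess at what a parser might produce, included only so the sets differ.

```python
# Hypothetical triple sets for the four sentences. Only the two shared
# triples are taken from the slide; the rest are illustrative guesses.
sentences = {
    "God created man in his own image.": {
        ("God", "create", "subject-verb"),
        ("create", "man", "verb-object"),
        ("create", "in his own image", "verb-adverbial"),
    },
    "Adam is the man that God created.": {
        ("God", "create", "subject-verb"),
        ("create", "man", "verb-object"),
        ("Adam", "be", "subject-verb"),
    },
    "Man was created by God on the sixth day.": {
        ("God", "create", "subject-verb"),
        ("create", "man", "verb-object"),
        ("create", "on the sixth day", "verb-adverbial"),
    },
    "I am a man created by God.": {
        ("God", "create", "subject-verb"),
        ("create", "man", "verb-object"),
        ("I", "be", "subject-verb"),
    },
}

# Triples present in every sentence.
common = set.intersection(*sentences.values())
print(sorted(common))
```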


Similar Triples With Different Words

His troops were annihilated.

His army was destroyed.

His forces were wiped out.

• annihilate – troops (verb-object)

• destroy – army (verb-object)

• wipe out – forces (verb-object)
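One way to make these three triples comparable is to replace each word with a label for its synonym class before comparing. This is only a sketch: the `SYNONYM_CLASS` table and the class labels are invented for illustration and stand in for a real synonym database.

```python
# Hypothetical synonym table: each word maps to a label for its synonym class.
SYNONYM_CLASS = {
    "annihilate": "DESTROY", "destroy": "DESTROY", "wipe out": "DESTROY",
    "troops": "ARMY", "army": "ARMY", "forces": "ARMY",
}


def normalize(triple):
    """Replace both words of a triple by their synonym-class labels."""
    left, right, relation = triple
    return (SYNONYM_CLASS.get(left, left), SYNONYM_CLASS.get(right, right), relation)


triples = [
    ("annihilate", "troops", "verb-object"),
    ("destroy", "army", "verb-object"),
    ("wipe out", "forces", "verb-object"),
]

# All three surface triples collapse to a single normalized triple.
normalized = {normalize(t) for t in triples}
print(normalized)
```

Words missing from the table are left unchanged, so unknown vocabulary still matches itself exactly.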


Data Requirement

To recognize similar strings in Biblical texts, we need

• Syntactic analysis of the original Hebrew and Greek texts

• Synonym database of Hebrew and Greek

Both have already been developed at Asia Bible Society.


Genesis 1:1

Hebrew Syntactic Treebank


John 1:11

Greek Syntactic Treebank

Triples

• Extracted from the trees

• Strings for comparison: text covered by each node/subtree

• Similar strings: subtrees containing similar triples
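A rough sketch of this step, assuming a simple dependency-style tree. The `Node` class, its fields, and the parent-first ordering of each triple are assumptions for illustration; the actual treebank format is not described in the slides.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set, Tuple


@dataclass
class Node:
    word: str
    position: int            # index of the word in the sentence
    relation: str = ""       # relation to the parent; empty for the root
    children: List["Node"] = field(default_factory=list)


def subtrees(node: Node) -> Iterator[Node]:
    """Yield the node itself and every node below it."""
    yield node
    for child in node.children:
        yield from subtrees(child)


def triples(node: Node) -> Set[Tuple[str, str, str]]:
    """Collect (parent, child, relation) triples inside the subtree."""
    return {
        (parent.word, child.word, child.relation)
        for parent in subtrees(node)
        for child in parent.children
    }


def covered_text(node: Node) -> str:
    """Return the words covered by the subtree, in sentence order."""
    nodes = sorted(subtrees(node), key=lambda n: n.position)
    return " ".join(n.word for n in nodes)


# Toy tree for "God created heavens earth" (function words omitted).
earth = Node("earth", 3, "conjunction")
heavens = Node("heavens", 2, "verb-object", [earth])
god = Node("God", 0, "subject-verb")
root = Node("created", 1, children=[god, heavens])

for sub in subtrees(root):
    print(covered_text(sub), triples(sub))
```

Each subtree yields both a string to display and a set of triples to compare, which matches the two roles the slide gives to nodes.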


Synonyms

• Database of Hebrew synonyms

• Database of Greek synonyms


Compute Similarities Between Subtrees

• Semantic space of a subtree: the set of triples (including their synonymous expansions) contained in the subtree

• Similar subtrees: subtrees whose semantic spaces overlap (non-empty set intersection)

• Degree of similarity: size of the set intersection / size of the set union
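The following sketch shows one possible reading of "synonymous expansion": each triple is expanded into every combination of its words' synonyms, and similarity is the ratio of intersection size to union size. The `synonyms` table and the function names are hypothetical.

```python
def semantic_space(triples, synonyms):
    """Expand each triple with every synonym combination of its two words."""
    space = set()
    for left, right, relation in triples:
        for l in synonyms.get(left, {left}):
            for r in synonyms.get(right, {right}):
                space.add((l, r, relation))
    return space


def similarity(space_a, space_b):
    """Size of the intersection divided by size of the union."""
    union = space_a | space_b
    return len(space_a & space_b) / len(union) if union else 0.0


# Hypothetical synonym table: every word maps to its full synonym set.
synonyms = {
    "destroy": {"destroy", "annihilate"},
    "annihilate": {"destroy", "annihilate"},
    "army": {"army", "troops"},
    "troops": {"army", "troops"},
}

a = semantic_space({("destroy", "army", "verb-object"),
                    ("army", "his", "possessive")}, synonyms)
b = semantic_space({("annihilate", "troops", "verb-object"),
                    ("troops", "their", "possessive")}, synonyms)

print(similarity(a, b))
```

Here each space holds six expanded triples, four of which are shared, so the similarity is 4/8. Mapping words to a single class label instead of expanding them would give different counts, so the choice of representation affects the score.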


Semantic Distance

= log( |Intersection| / |Union| ) * -1, where log is the natural logarithm and |S| is the number of elements in set S

Set A = { a, b, c } Set B = { b, c, d, e }

Intersection = { b, c }

Union = { a, b, c, d, e }

Distance(A,B) = log(2/5)* -1 = 0.9162907318742

Set C = { a, b, c, d } Set D = { c, e, f, g, h }

Intersection = { c }

Union = { a, b, c, d, e, f, g, h }

Distance(C,D) = log(1/8)* -1 = 2.0794415416798
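The two worked examples can be checked with a few lines of Python. The function name `semantic_distance` is an assumption, as is returning infinity for sets with no overlap; the slide does not say how that case is handled.

```python
import math


def semantic_distance(a, b):
    """Return -ln(|a & b| / |a | b|); infinite when the sets share nothing."""
    overlap = len(a & b)
    if overlap == 0:
        return math.inf  # assumption: disjoint sets are infinitely far apart
    return -math.log(overlap / len(a | b))


A, B = {"a", "b", "c"}, {"b", "c", "d", "e"}
C, D = {"a", "b", "c", "d"}, {"c", "e", "f", "g", "h"}

print(semantic_distance(A, B))  # intersection 2, union 5
print(semantic_distance(C, D))  # intersection 1, union 8
```

Identical sets give a distance of 0, and the distance grows as the overlap shrinks relative to the union.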

Joshua 18:3 vs. Deuteronomy 4:1


• Semantic Space of Joshua 18:3

= { fathers~you(Poss), God~fathers(Poss), Yahweh~God(Appos), Yahweh~give(S-V), give~you(V-O), land~give(Mod) }

• Semantic Space of Deuteronomy 4:1

= { fathers~you(Poss), God~fathers(Poss), Yahweh~God(Appos), Yahweh~give(S-V), give~you(V-O), land~give(Mod) }

• Intersection

= { fathers~you(Poss), God~fathers(Poss), Yahweh~God(Appos), Yahweh~give(S-V), give~you(V-O), land~give(Mod) }

• Union

= { fathers~you(Poss), God~fathers(Poss), Yahweh~God(Appos), Yahweh~give(S-V), give~you(V-O), land~give(Mod) }

• Semantic distance = log(6/6)* -1 = 0.0

Proverbs 24:12 vs. Psalms 62:13


• Semantic Space of Proverbs 24:12

= { repay~person(V-O), as~deed(P-O), deed~him(Poss), repay~as(V-PP) }

• Semantic Space of Psalms 62:13

= { reward~everyone(V-O), as~deed(P-O), deed~him(Poss), reward~as(V-PP), you~reward(S-V) }

• Intersection = { repay/reward~person/everyone(V-O), as~deed(P-O), deed~him(Poss), repay/reward~as(V-PP) }

• Union = { repay/reward~person/everyone(V-O), as~deed(P-O), deed~him(Poss), repay/reward~as(V-PP), you~reward(S-V) }
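The distance for this pair does not survive in the transcript. As a sketch, applying the formula log(Intersection / Union) * -1 to the sets listed above, with synonyms merged into shared labels, gives log(4/5) * -1, roughly 0.223. The `SYNONYMS` table below is a hypothetical stand-in for the synonym database.

```python
import math

# Hypothetical synonym table mapping each word to a shared label.
SYNONYMS = {
    "repay": "repay/reward", "reward": "repay/reward",
    "person": "person/everyone", "everyone": "person/everyone",
}


def canon(space):
    """Rewrite every triple so that synonyms share one label."""
    return {(SYNONYMS.get(l, l), SYNONYMS.get(r, r), rel) for l, r, rel in space}


proverbs_24_12 = {
    ("repay", "person", "V-O"), ("as", "deed", "P-O"),
    ("deed", "him", "Poss"), ("repay", "as", "V-PP"),
}
psalms_62_13 = {
    ("reward", "everyone", "V-O"), ("as", "deed", "P-O"),
    ("deed", "him", "Poss"), ("reward", "as", "V-PP"),
    ("you", "reward", "S-V"),
}

a, b = canon(proverbs_24_12), canon(psalms_62_13)
distance = -math.log(len(a & b) / len(a | b))
print(len(a & b), len(a | b), round(distance, 3))
```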

The computation

• Pair-wise comparison of all phrases

• Keep pairs with semantic distance < 9.0

• 1,607,721 pairs in the database

• More than 24 hours on a single machine for the computation
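The pair-wise comparison can be sketched as a loop over all unordered pairs, keeping those under the threshold. The phrase identifiers and the toy sets are invented; only the 9.0 threshold comes from the slide.

```python
import math
from itertools import combinations


def semantic_distance(a, b):
    """Return -ln(|a & b| / |a | b|); infinite when the sets share nothing."""
    overlap = len(a & b)
    if overlap == 0:
        return math.inf
    return -math.log(overlap / len(a | b))


def similar_pairs(spaces, threshold=9.0):
    """Compare every unordered pair of phrases and keep the close ones."""
    kept = []
    for (id_a, a), (id_b, b) in combinations(spaces.items(), 2):
        d = semantic_distance(a, b)
        if d < threshold:
            kept.append((id_a, id_b, d))
    return kept


# Toy semantic spaces keyed by hypothetical phrase identifiers.
spaces = {
    "p1": {"a", "b", "c"},
    "p2": {"b", "c", "d", "e"},
    "p3": {"x", "y"},
}

kept = similar_pairs(spaces)
print(kept)
```

With n phrases this loop performs n(n-1)/2 comparisons, which is consistent with the long running time reported for the full database.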


Matthew 22:32 vs. Acts 7:32


Linking OT and NT

Hebrew OT → Septuagint → Greek NT

• Automatic alignment

• Strong's number matching

• Greek Strong's numbers for all words in OT which occur in NT

• Match based on Greek Strong's numbers
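A much simplified sketch of the matching step: once OT words carry Greek Strong's numbers (via the Septuagint alignment), OT and NT verses can be compared by the numbers they share. The verse identifiers, the numbers (deliberately outside the real Strong's range), and the `min_shared` threshold are all placeholders, and the real system presumably matches triples rather than bare word sets.

```python
# Placeholder data: each verse is reduced to a set of Greek Strong's numbers.
# The G9xxx values are NOT real Strong's numbers; they only illustrate the idea.
ot_verses = {
    "OT-verse-1": {"G9001", "G9002", "G9003", "G9004"},
    "OT-verse-2": {"G9005", "G9006"},
}
nt_verses = {
    "NT-verse-1": {"G9001", "G9002", "G9003", "G9007"},
    "NT-verse-2": {"G9008"},
}


def link(ot, nt, min_shared=3):
    """Pair OT and NT verses that share at least min_shared numbers."""
    links = []
    for ot_ref, ot_nums in ot.items():
        for nt_ref, nt_nums in nt.items():
            shared = ot_nums & nt_nums
            if len(shared) >= min_shared:
                links.append((ot_ref, nt_ref, sorted(shared)))
    return links


print(link(ot_verses, nt_verses))
```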


Psalms 91:12 <--> Luke 4:11


Search in Bible translations

• Alignment between translations and original texts

• Queries in other languages → queries in Hebrew/Greek

• Search always done in Hebrew/Greek
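As an illustration of turning a translation query into an original-language query, the sketch below uses a hand-made word alignment for Genesis 1:1. The alignment table and the `to_original` function are assumptions, not the project's actual data or API.

```python
# English and Hebrew tokens of Genesis 1:1.
translation = ["In", "the", "beginning", "God", "created",
               "the", "heavens", "and", "the", "earth"]
original = ["בראשית", "ברא", "אלהים", "את", "השמים", "ואת", "הארץ"]

# Hand-made alignment: translation token index -> original token index.
alignment = {0: 0, 1: 0, 2: 0, 3: 2, 4: 1, 5: 4, 6: 4, 7: 5, 8: 6, 9: 6}


def to_original(query, translation, original, alignment):
    """Find the query in the translation and return the aligned original words."""
    words = query.split()
    n = len(words)
    for start in range(len(translation) - n + 1):
        if translation[start:start + n] == words:
            indices = sorted({alignment[i] for i in range(start, start + n)
                              if i in alignment})
            return [original[i] for i in indices]
    return []  # query not found in this verse


hebrew_query = to_original("the heavens and the earth",
                           translation, original, alignment)
print(hebrew_query)
```

The returned Hebrew words would then be used as the actual search query, so the search itself never runs on the translation.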


Further Improvements

The results will be better if

• All the references are annotated

• Better alignment between the Hebrew OT and Septuagint


Conclusion

Rich linguistic knowledge (syntactic and semantic knowledge) enables us to compare linguistic units on the basis of meaning rather than form, thus making the search of Biblical texts more intelligent.
