&&computer&science&department,&& hace7epe&university...

32
Pınar Duygulu Computer Science Department, Hace7epe University, Ankara, Turkey Damla Arifoglu Computer Science Department Sabanci University, Istanbul, Turkey presented by Damla Arifoglu

Upload: others

Post on 21-Jun-2021

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

Pınar  Duygulu      Computer  Science  Department,    

Hace7epe  University,  Ankara,  Turkey    

Damla  Arifoglu  Computer  Science  Department  

Sabanci  University,  Istanbul,  Turkey    

presented by Damla Arifoglu

Page 2: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

 •  OTAP   is   jointly   sponsored  by   the  University  of  Washington   in  Sea7le  and  Bilkent  

University   in  Ankara,  Turkey  under   the  direcHon  of  Professor  Walter  G.  Andrews  (U.W.)  and  Professor  Mehmet  Kalpaklı  (Bilkent).  

 •  OTAP   is  a  project  employing  computer   technology   to  make   transcribed  O7oman  

texts   and   resources   for   understanding   O7oman   texts   broadly   accessible   to  internaHonal  audiences.    

   

What  is  OTAP?  

h7p://www.bilkent.edu.tr/~otap/  

Page 3: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

-  The Ottoman Empire lasted for more than 6 centuries (1299-1922), spread over 3 continents

-  More than 150 million historical documents were produced

-  Huge collections in the archives, libraries, museums and private collections of almost forty nations

-  Interest of scholars from many disciplines and from many different countries

Why  O7oman?  

Page 4: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

•  Number   of   documents   in   Başbakanlık   Osmanlı  archives   is   150.000.000   (T.C.   Başbakanlık   Devlet  Arşivleri,  2008;  Özdemir,  2003;  Bardakçı,  1998)  

•  Süleymaniye   Library   in   Istanbul   has     65.000  handwri7en,    and    45.000  old  printed  documents    (Kalpaklı,  1991).  

Documents  in  Archives  

Page 5: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

Challenges  of  O7oman  

-  The Ottoman is a connected and cursive script -  The Ottoman alphabet is a subset of the Arabic alphabet -  Has additional vocals and characters from the Persian and

Turkish scripts -  No explicit word boundaries -  Most of the characters are only distinguished by adding

dots and zigzags (called as diacritics)

Page 6: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

O7oman  Documents  

Page 7: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

O7oman  Documents  

Page 8: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

O7oman  Documents  

Page 9: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

O7oman  Documents  

Page 10: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

Sample  O7oman  Words  

Page 11: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

Challenges  of  O7oman  

Main  difficul+es  with  O2oman  :    Meaning  is  content  dependent  Suffixes  in  Turkish  cause  mistakes  The  language/vocabulary  differences  in  Hme  

OCR  is  difficult  for  O7oman  compared  to  LaHn  alphabet  

Page 12: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

Challenges  of  documents  in  archives  

SegmentaHon  problems  in  handwri7en  documents  printed  

handwri7en  

Page 13: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

System  Layout  

Documents  

Representa+on  

Preprocessing  Segmenta+on   Components  

MATCHING  

Retrieval  

Recogni+on  

Page 14: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

ClassificaHon  of  Calligraphy  styles  

Page 15: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

Line  ExtracHon  

A  Hybrid  Approach  for  Line  Segmenta6on  in  Handwri8en  Documents,  Hande  Adiguzel,  Emre  Sahin,  Pinar  Duygulu,  13th  InternaHonal  Conference  on  FronHers  in  HandwriHng  RecogniHon  (ICFHR),  Bari,  Italy,  September  18-­‐20,  2012  

 

Page 16: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

Word  matching  

segmenta+on  

Word  matching  

How  to  represent  a  word?  

Page 17: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

•  p  

Keypoint-­‐based  representaHon  printed  

Rika  

Matching  O8oman  Words:  An  Image  Retrieval  Approach  to  Historical  Document  Indexing  ,  Esra  Ataer,  Pinar  Duygulu  In  Proceedings  of  ACM  InternaHonal  Conference  on  Image  and  Video  Retrieval,(CIVR)  ,    July  9-­‐11  2007,  Amsterdam,  The  Netherlands  

Page 18: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

Line-­‐based  representaHon  

A  line  based  representa6on  to  match  the  words  in  historical  manuscripts,  Ethem  FaHh  Can,  Pinar  Duygulu,  in  Pa7ern  RecogniHon  Le7ers,    Volume  32,  Issue  8,  Pages  1081-­‐1222,  June  2011    

original  

binarized  

contours  approximated    

lines  

Page 19: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

Redif  extracHon  

Redif  Extrac6on  in  Handwri8en  O8oman  Literary  Texts.  Ethem  F.  Can1,  Pınar  Duygulu,  Fazlı  Can,  Mehmet  Kalpaklı,    Int.  Conf.  on  Pa7ern  RecogniHon  (ICPR)  2010  

Page 20: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

Redif  extracHon  

Redif  Extrac6on  in  Handwri8en  O8oman  Literary  Texts.  Ethem  F.  Can1,  Pınar  Duygulu,  Fazlı  Can,  Mehmet  Kalpaklı,    Int.  Conf.  on  Pa7ern  RecogniHon  (ICPR)  2010  

Page 21: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

Cross-­‐document  word  matching  

Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for segmentation and retrieval of Ottoman divans”,Pattern Analysis and Applications,1-17,2014,Springer London

Page 22: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

Word-­‐segmentaHon  with  cross-­‐document  word  matching  

Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for segmentation and retrieval of Ottoman divans”,Pattern Analysis and Applications,1-17,2014,Springer London

Page 23: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

Kufic  

D Arifoglu, E Sahin, H Adiguzel, P Duygulu, M Kalpakli, “Matching Islamic Patterns in Kufic Images”, Pattern Analysis and Applications, 2015

Page 24: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

Kufic  

Page 25: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

Ottoman keyword search system Query word

Page 26: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

Ottoman keyword search system

Page 27: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

Manual  labeling  tool  

Page 28: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

OTAP  transliteraHon  alphabet  

Page 29: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

AutomaHc  labeling  

Page 30: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

Contributors  •  Emre  Şahin  •  Esra  Ataer  Cansızoğlu  •  Ethem  FaHh  Can  •  Damla  Arifoğlu  •  Hande  Adıgüzel  •  Ceyhun  Karbeyaz  •  Bilge  Köroğlu    

Acknowledgements  

Page 31: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

 

             Pinar  Duygulu  

             Associate  Professor  

             Hace7epe  University,  Department  of  Computer  Engineering  

             Ankara,  Turkey  

             [email protected]  

 

             Damla  Arifoglu  

             PhD  student  

             Computer  Engineering  Department  

             Sabanci  University,  Istanbul,  Turkey  

             [email protected]  

 

Contact  info  

Page 32: &&Computer&Science&Department,&& Hace7epe&University ...pinar/talks/OttomanWorkshop-2015.… · Duygulu, Pinar; Arifoglu, Damla; Kalpakli, Mehmet; "Cross-document word matching for

ANY QUESTIONS?

Thank  you!