Introduction to Information Retrieval
PostingList
Park Cheon Eum
awk - array
Ch. 1
Page 7
Introduction to Information Retrieval
Algorithm
start
doc1, …, 10
split(doc1, …, 10)
doc1, …, 10 < id (attach the document ID to each token)
append(docs, doc1,…,10)
sort, uniq
posting
posting result
End
Algorithm - Indexer steps: Token sequence
Split each document's content into tokens and assign each token its document ID.
Doc 1: I did enact Julius Caesar I was killed i' the Capitol; Brutus killed me.
Doc 2: So let it be with Caesar. The noble Brutus hath told you Caesar was ambitious
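The token-sequence step can be sketched as follows. This is a minimal illustration, not the deck's awk implementation: the names `DOCS` and `token_sequence` are hypothetical, and the tokenizer (lowercase, then keep runs of letters and apostrophes) is an assumption chosen so that "i'" survives as a token.

```python
import re

# Hypothetical input: the two example documents, keyed by document ID.
DOCS = {
    1: "I did enact Julius Caesar I was killed i' the Capitol; Brutus killed me.",
    2: "So let it be with Caesar. The noble Brutus hath told you Caesar was ambitious",
}

def token_sequence(docs):
    """Return (term, doc_id) pairs in document order.

    Assumption: terms are lowercased and split on anything that is not a
    letter or an apostrophe; real indexers normalize more carefully.
    """
    pairs = []
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z']+", text.lower()):
            pairs.append((term, doc_id))
    return pairs

if __name__ == "__main__":
    for term, doc_id in token_sequence(DOCS):
        print(term, doc_id)
```

Each printed line is one token tagged with the ID of the document it came from, which is the input the sort step expects.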
Algorithm - Indexer steps: Sort
Sort by term, then by document ID.
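A sketch of the sort step, assuming the tokens are already (term, doc_id) pairs. `PAIRS` is a hypothetical subset of the Doc 1 and Doc 2 tokens and `sort_pairs` is an illustrative name. Python compares tuples element by element, so a plain `sorted` orders by term first and breaks ties by document ID.

```python
# Hypothetical subset of the token sequence, in document order.
PAIRS = [("i", 1), ("did", 1), ("enact", 1), ("julius", 1), ("caesar", 1),
         ("i", 1), ("was", 1), ("killed", 1),
         ("so", 2), ("let", 2), ("caesar", 2), ("was", 2)]

def sort_pairs(pairs):
    # Tuple comparison: primary key is the term, secondary key is the doc ID.
    return sorted(pairs)

if __name__ == "__main__":
    for term, doc_id in sort_pairs(PAIRS):
        print(term, doc_id)
```

On a line-per-token text file the same ordering is roughly what `sort -k1,1 -k2,2n` produces (term as text, doc ID as a number).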
Algorithm - Indexer steps: Dictionary & Postings
Same term && same ID: keep only one entry (= frequency).
Same term && different IDs: add the ID to that term's posting list.
Sec. 1.2
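The two rules above can be sketched in one pass over the sorted pairs. This assumes the input is already sorted by term and then by doc ID; `SORTED_PAIRS` is a hypothetical subset of the example documents and `build_postings` is an illustrative name. The length of each posting list is the document frequency.

```python
from itertools import groupby

# Hypothetical sorted (term, doc_id) pairs taken from Doc 1 and Doc 2.
SORTED_PAIRS = [("brutus", 1), ("brutus", 2), ("caesar", 1), ("caesar", 2),
                ("caesar", 2), ("killed", 1), ("killed", 1), ("noble", 2)]

def build_postings(sorted_pairs):
    """Merge sorted (term, doc_id) pairs into term -> [doc_id, ...].

    Same term and same doc ID: kept once.
    Same term, different doc IDs: appended to that term's posting list.
    """
    postings = {}
    for term, group in groupby(sorted_pairs, key=lambda p: p[0]):
        doc_ids = []
        for _, doc_id in group:
            # Input is sorted, so duplicate doc IDs are adjacent.
            if not doc_ids or doc_ids[-1] != doc_id:
                doc_ids.append(doc_id)
        postings[term] = doc_ids
    return postings

if __name__ == "__main__":
    for term, doc_ids in build_postings(SORTED_PAIRS).items():
        # len(doc_ids) is the document frequency of the term.
        print(term, len(doc_ids), doc_ids)
```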
Algorithm
Processing
doc1, …, 10
split(doc1, …, 10)
doc1, …, 10 < id (attach the document ID to each token)
Processing
append(docs, doc1,…,10)
Processing
sort
Processing
frequency
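A sketch of the frequency count, assuming "frequency" here means how many times a term occurs in one document (what `uniq -c` reports for adjacent duplicate lines after `sort`). `SORTED_PAIRS` is a hypothetical subset of the example documents and `term_frequencies` is an illustrative name.

```python
from collections import Counter

# Hypothetical sorted (term, doc_id) pairs taken from Doc 1 and Doc 2.
SORTED_PAIRS = [("brutus", 1), ("brutus", 2), ("caesar", 1), ("caesar", 2),
                ("caesar", 2), ("killed", 1), ("killed", 1), ("noble", 2)]

def term_frequencies(sorted_pairs):
    # Counter maps each distinct (term, doc_id) pair to its number of
    # occurrences; it keeps first-seen order, so sorted input stays sorted.
    return Counter(sorted_pairs)

if __name__ == "__main__":
    for (term, doc_id), count in term_frequencies(SORTED_PAIRS).items():
        print(count, term, doc_id)
```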
Processing
Easy method
posting
Processing
Posting using an array
Processing
Using an array
posting
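An array-based version can avoid sorting the full token list. In awk this would be an associative array indexed by term; the sketch below uses a Python dict in the same role. It assumes tokens arrive in increasing doc-ID order (doc 1, then doc 2, …), so each posting list is built already sorted and a repeated ID can only be the one appended last. `TOKENS` and `postings_with_array` are hypothetical names.

```python
# Hypothetical token stream in document order (Doc 1 first, then Doc 2).
TOKENS = [("i", 1), ("did", 1), ("caesar", 1), ("i", 1), ("brutus", 1),
          ("so", 2), ("caesar", 2), ("brutus", 2), ("caesar", 2)]

def postings_with_array(tokens):
    """Build term -> [doc_id, ...] in one pass using a dict as the array."""
    postings = {}
    for term, doc_id in tokens:
        doc_ids = postings.setdefault(term, [])
        # Skip the ID if this term was already seen in the same document.
        if not doc_ids or doc_ids[-1] != doc_id:
            doc_ids.append(doc_id)
    # Only the dictionary keys need sorting at the end.
    return dict(sorted(postings.items()))

if __name__ == "__main__":
    for term, doc_ids in postings_with_array(TOKENS).items():
        print(term, doc_ids)
```

The trade-off: the sort-based method orders every token, while this version only orders the distinct terms, at the cost of keeping all posting lists in memory.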