Return to the Materials Digital Humanities Conference 2013




The Daedalus: A search engine for visualizing semantic relationships

Brent Kievit-Kylar, Sean Connolly, & Colin Allen
Indiana University, Bloomington

Background

Search engines make predictions. Given the set of words a user enters, a search engine predicts what information that user is seeking. The predictions are generally given as a list, with the highest-ranked prediction first. But what determines this ranking?

Internet search engines turn each webpage into a generic “bag” of words. The words in the bag carry no semantic or grammatical information. Each word is given a score based on how many times it is repeated in the document and on its connection to all the other words.
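The scoring described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used by Daedalus or by any particular search engine: the function names (`tokenize`, `bag_of_words`, `cooccurrence`) and the sample sentences are hypothetical, and "connection to the other words" is approximated here as the number of documents in which two words appear together.

```python
from collections import Counter
from itertools import combinations
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens; order is then ignored."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(text):
    """Score each word by how many times it is repeated in the document."""
    return Counter(tokenize(text))


def cooccurrence(documents):
    """Count, for each unordered word pair, the documents that contain both words."""
    pairs = Counter()
    for doc in documents:
        vocabulary = sorted(set(tokenize(doc)))
        pairs.update(combinations(vocabulary, 2))
    return pairs


# Hypothetical sample documents, invented for this example.
docs = [
    "The patronus of Harry Potter is a stag.",
    "A patronus takes the form of an animal.",
]
bag = bag_of_words(docs[0])
links = cooccurrence(docs)

# Word order carries no information in a bag of words.
same_bag = bag_of_words("potter patronus animal") == bag_of_words("animal patronus potter")
```

Because the bag discards order, a non-grammatical query and a well-formed one built from the same words produce identical bags, which is why such queries can still be matched.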

Semantic Linking

Search engines can’t always tell when different words mean the same thing. In the “chemist” example below, the search engine reveals that it doesn’t know that “qc” and “quality control” mean the same thing. It may never figure this out on its own. But to a human with expertise in a specific domain, the matter could be trivial.

Semantic linking could help with corpora compiled over many years or across different languages. Aristotle separated “plot” from “story”: “story” is the elements and events of a narrative, and “plot” is their ordering. The Russian Formalists used the words “fabula” and “syuzhet” to express a similar conceptualization of narrative. Researchers might want to find both. The tool lets a researcher build domain-specific knowledge into a generic search tool.
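One way to picture semantic linking is as query expansion over groups of terms that a domain expert has declared equivalent. The sketch below is a minimal illustration, not the Daedalus implementation: `build_links` and `expand_query` are hypothetical names, and pairing “story” with “fabula” and “plot” with “syuzhet” is an assumption made for the example.

```python
def build_links(groups):
    """Map every term to the full set of terms that its group treats as equivalent."""
    links = {}
    for group in groups:
        terms = {term.lower() for term in group}
        for term in terms:
            links.setdefault(term, set()).update(terms)
    return links


def expand_query(query_terms, links):
    """Return the query terms together with every term linked to them."""
    expanded = set()
    for term in query_terms:
        term = term.lower()
        # Unlinked terms pass through unchanged.
        expanded |= links.get(term, {term})
    return expanded


# Links supplied by a (hypothetical) domain expert.
LINKS = build_links([
    ["qc", "quality control"],
    ["story", "fabula"],
    ["plot", "syuzhet"],
])

chemist_query = expand_query(["qc", "chemist"], LINKS)
narrative_query = expand_query(["Plot"], LINKS)
```

Here a search for “qc” also retrieves documents that only say “quality control”, and a search for “plot” also reaches texts that use “syuzhet”.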

Future Work

We believe Daedalus, the “data list”, can perhaps help best in the querying and cataloging of archives held at universities. We believe the tool can help researchers dive deeper into texts with technology and see as-yet-unseen connections.

“You shall know a word by the company it keeps.” (Firth 1957)

Semantic Override

Do you know the search-weighting protocols of your data search tools? The way a tool is built affects its efficacy and limits its potential uses. Daedalus allows the re-weighting of terms so that users may “take over” the search and override the strength of the word-symbol relationships. Re-weighting the relationships of key words simultaneously refreshes the search with the new weights and generates a new search results page.
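The override can be illustrated with a small ranking sketch: each document is scored by its weighted term counts, and changing one weight re-ranks the results. This is a simplified model under stated assumptions, not the scoring used by Daedalus; the `score` and `rank` functions, the document ids, and all counts and weights are hypothetical.

```python
from collections import Counter


def score(document_bag, weights):
    """Sum each query term's count in the document, scaled by that term's weight."""
    return sum(weight * document_bag.get(term, 0) for term, weight in weights.items())


def rank(bags, weights):
    """Return document ids ordered from highest to lowest weighted score."""
    return sorted(bags, key=lambda doc_id: score(bags[doc_id], weights), reverse=True)


# Hypothetical bags of words for two documents.
bags = {
    "doc_a": Counter({"potter": 3, "patronus": 2}),
    "doc_b": Counter({"patronus": 2, "animal": 2}),
}

# With equal weights, doc_a scores 5.0 and doc_b scores 4.0.
default_weights = {"potter": 1.0, "patronus": 1.0, "animal": 1.0}
before = rank(bags, default_weights)

# The user overrides one weight; doc_b now scores 8.0 and moves to the top.
override = dict(default_weights, animal=3.0)
after = rank(bags, override)
```

Re-running `rank` after every weight change corresponds to refreshing the results page with the new weights.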

A visual representation of the weighted “bag of words” for the non-grammatical query “potter’s patronus animal” is shown at left (drawn from a real web query run with our Daedalus tool on 9/29/12).

Reweight key terms across and within texts, and give users greater control over digital research tools.

As part of the InPho project, the Daedalus represents each article of the Stanford Encyclopedia of Philosophy as a meta-object, showing the introduction in one domain and the rest of the article in another. Re-weighting and linking generates new search results.
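The meta-object idea can be sketched as a record that holds one bag of words per domain, so that the introduction and the rest of an article can be weighted separately. This is a hypothetical data structure for illustration only: the `MetaObject` class, the `split_article` helper, the rule that the first paragraph is the introduction, and the sample text are all assumptions, not the InPho project's actual representation.

```python
from collections import Counter
from dataclasses import dataclass, field
import re


@dataclass
class MetaObject:
    """One article split into named domains, each with its own bag of words."""
    title: str
    domains: dict = field(default_factory=dict)

    def score(self, term, domain_weights):
        """Weighted count of `term` across domains (unlisted domains weigh 1.0)."""
        return sum(
            domain_weights.get(name, 1.0) * bag.get(term, 0)
            for name, bag in self.domains.items()
        )


def split_article(title, text, intro_paragraphs=1):
    """Treat the first paragraph(s) as the introduction and the rest as the body."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    intro = " ".join(paragraphs[:intro_paragraphs]).lower()
    body = " ".join(paragraphs[intro_paragraphs:]).lower()
    return MetaObject(title, {
        "introduction": Counter(re.findall(r"[a-z']+", intro)),
        "body": Counter(re.findall(r"[a-z']+", body)),
    })


# Invented sample text: "plot" occurs once in the introduction, twice in the body.
article = split_article(
    "Example entry",
    "Plot orders events.\n\nStory is events. Plot is ordering. Plot matters.",
)
weighted = article.score("plot", {"introduction": 2.0, "body": 1.0})
unweighted = article.score("plot", {})
```

Keeping the domains separate lets a user decide, for example, that a term appearing in an article's introduction should count for more than the same term appearing later in the text.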