Using the Web
The use of the web during the act of translation
![Page 1: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/1.jpg)
Using the web as a linguistic resource.
Approaches and methods
![Page 2: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/2.jpg)
• Glossaries/Dictionaries
• Translation-Oriented Web Search
• Using traditional corpora online
• Concordancing the web
• The mega-corpus/mini-web approach
• Building and using monolingual/comparable corpora
![Page 3: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/3.jpg)
![Page 4: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/4.jpg)
Glossaries
![Page 5: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/5.jpg)
![Page 6: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/6.jpg)
![Page 7: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/7.jpg)
![Page 8: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/8.jpg)
![Page 9: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/9.jpg)
Translation-oriented Web Search
• Basic issues in web search
• Case studies:
- evidence of attested usage;
- investigating phraseology;
- testing translation candidates
![Page 10: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/10.jpg)
Problems and limitations of search engines
• limited number of results
• format
• ranking
• normalization, punctuation, special characters
• informational, navigational, transactional needs
![Page 11: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/11.jpg)
Google search syntax
+ > co-occurrence of two or more words
- > excludes a word
OR > any one of two or more words
“” > exact phrase
* > any single word in a given position
country (.uk; .it; .fr…)
institution (ac.uk; .edu; .org…)
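The operators above can be combined into a single search string. The following is a minimal sketch, not part of the original slides: `build_query` and its parameter names are hypothetical, and it assumes the domain restriction is expressed with the `site:` operator used elsewhere in this deck.

```python
def build_query(all_of=(), exact=None, any_of=(), exclude=(), site=None):
    """Compose a search string from the operators listed on the slide."""
    parts = list(all_of)                       # words that must co-occur
    if exact:
        parts.append(f'"{exact}"')             # exact phrase; may contain *
    if any_of:
        parts.append(" OR ".join(any_of))      # any one of these words
    parts.extend(f"-{word}" for word in exclude)  # words to leave out
    if site:
        parts.append(f"site:{site}")           # country or institution domain
    return " ".join(parts)


print(build_query(all_of=["cancer"], exact="site of onset", site="elsevier.com"))
print(build_query(exact="the key * success", exclude=["lyrics"]))
```

Building the string in code makes it easy to vary one operator at a time, which is the approach the case studies follow.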
![Page 12: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/12.jpg)
Web page evaluation (4)
Learning to read addresses:
• .it – Italy
• .uk – United Kingdom
• .us – United States
• .ca – Canada
• .au – Australia
• .ie – Ireland (Eire)
![Page 13: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/13.jpg)
Learning to read addresses:
• .edu – restricted use by educational sites (usually a university or college)
• .com – general use by commercial business sites
• .gov – restricted use by U.S. governmental/non-military sites
• .mil – restricted use by U.S. military sites and agencies
• .net – general use by networks, internet service providers, organizations
• .org – general use by non-profit organizations and others
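Reading an address in this way can be sketched in a few lines. This is an illustrative helper only: the function name `describe` and the two lookup tables are assumptions built from the suffixes listed on these slides, and the example URLs are the ones that appear later in the deck.

```python
from urllib.parse import urlparse

# Suffixes taken from the slides; the tables are deliberately incomplete.
COUNTRY = {"it": "Italy", "uk": "United Kingdom", "us": "United States",
           "ca": "Canada", "au": "Australia", "ie": "Ireland"}
SITE_TYPE = {"edu": "educational", "com": "commercial",
             "gov": "U.S. government", "mil": "U.S. military",
             "net": "network / service provider",
             "org": "non-profit and others",
             "ac.uk": "UK academic"}


def describe(url):
    """Return (country, site type) guessed from the end of the host name.

    The URL must include a scheme (http:// or https://), otherwise
    urlparse cannot isolate the host name.
    """
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    country = COUNTRY.get(labels[-1])
    # Try the two-label suffix first (e.g. ac.uk), then the last label.
    kind = SITE_TYPE.get(".".join(labels[-2:])) or SITE_TYPE.get(labels[-1])
    return country, kind


print(describe("http://corpus.byu.edu/bnc/"))
print(describe("http://www.webcorp.org.uk/live/wlse.jsp"))
```

A suffix only hints at provenance and reliability; it does not say anything about the quality of an individual page.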
![Page 14: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/14.jpg)
• Some specific portals:
• PubMed > site:ncbi.nlm.nih.gov/pubmed
• Elsevier > site:.elsevier.com
![Page 15: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/15.jpg)
The Kleene star
• By placing an asterisk (Kleene star) within quotes one can explore phrases:
e.g. the key to/for success?
“the key * success”
- for: 1,340,000
- to: 90,000,000
e.g. linee cellulari di/dell’epatoma umano (“human hepatoma cell lines”)
“linee cellulari * epatoma umano”
> linee cellulari di epatoma umano!
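The same wildcard idea can be tried offline on any collection of sentences. The sketch below is an assumption-laden illustration: it treats `*` as exactly one word, the function names are hypothetical, and the three sample sentences are invented for the demo rather than taken from the web.

```python
import re
from collections import Counter


def wildcard_to_regex(pattern):
    """Turn 'the key * success' into a regex where each * matches one word."""
    pieces = [r"(\w+)" if tok == "*" else re.escape(tok)
              for tok in pattern.split()]
    return re.compile(r"\b" + r"\s+".join(pieces) + r"\b", re.IGNORECASE)


def slot_fillers(pattern, texts):
    """Count which words fill the * slot(s) across the given texts."""
    regex = wildcard_to_regex(pattern)
    counts = Counter()
    for text in texts:
        for match in regex.finditer(text):
            counts[" ".join(g.lower() for g in match.groups())] += 1
    return counts


# Invented sample sentences, for illustration only.
sample = [
    "Persistence is the key to success.",
    "Many say the key to success is planning.",
    "Preparation is the key for success in exams.",
]
print(slot_fillers("the key * success", sample).most_common())
```

Note that `\w+` does not match an apostrophe, so an elided form such as dell’epatoma would need a wider character class.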
![Page 16: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/16.jpg)
1. Exploring collocation
PAESAGGI SUGGESTIVI
![Page 17: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/17.jpg)
![Page 18: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/18.jpg)
Exploring collocation
PAESAGGI SUGGESTIVI
![Page 19: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/19.jpg)
Complex query
• Geographical position: UK
• Exact phrase: «suggestive landscapes»
• Co-occurrence: travel OR tourism OR holiday
• Language: English
• Provenance: United Kingdom
• Domain: .uk
Google search…
![Page 20: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/20.jpg)
![Page 21: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/21.jpg)
2. Validating translation candidates
You are translating an Italian article on cancer into English and you encounter the phrase “sede di insorgenza”.
Sede = site
Insorgenza = onset
ONSET SITE vs SITE OF ONSET
![Page 22: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/22.jpg)
“onset site”
![Page 23: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/23.jpg)
cancer “onset site” (relevance)
![Page 24: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/24.jpg)
cancer “onset site” site:elsevier.com (relevance + reliability)
![Page 25: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/25.jpg)
cancer “site of onset” site:.elsevier.com
![Page 26: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/26.jpg)
• «onset site» > 8700
• cancer «onset site» > 1170
• cancer «onset site» site:.elsevier.com > 36
• «site of onset» > 29500
• cancer «site of onset» > 10100
• cancer «site of onset» site:.elsevier.com > 55
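The counts above are easier to compare as proportions at each filtering stage. This is a small worked sketch using only the figures from the slide; the stage labels and the `share` helper are made up for the example.

```python
# Hit counts copied from the slide, grouped by filtering stage.
counts = {
    "whole web": {"onset site": 8700, "site of onset": 29500},
    "+ cancer": {"onset site": 1170, "site of onset": 10100},
    "+ cancer, elsevier.com": {"onset site": 36, "site of onset": 55},
}


def share(hits, candidate):
    """Fraction of the combined hits that belong to one candidate."""
    return hits[candidate] / sum(hits.values())


for stage, hits in counts.items():
    pct = share(hits, "site of onset")
    print(f"{stage}: 'site of onset' accounts for {pct:.0%} of hits")
```

The candidate "site of onset" is ahead at every stage, although the margin narrows on the restricted domain, where the absolute numbers are small.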
![Page 27: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/27.jpg)
• «onset site» cancer Google Books
• «site of onset» cancer Google Books
![Page 28: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/28.jpg)
![Page 29: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/29.jpg)
![Page 30: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/30.jpg)
3. Exploring phraseology
LA FREQUENZA … È IN AUMENTO
• La frequenza del carcinoma a cellule squamose della mucosa orale è in rapido aumento; inoltre, il suo comportamento clinico è difficilmente prevedibile basandosi solo sui classici parametri istologici.
Draft translation
• The frequency of squamous cell oral cancer is rapidly increasing…
But…
• Do people really say that “the frequency of something is increasing”?
• Would people say this when talking about cancer?
![Page 31: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/31.jpg)
Step 1: input a string to be tested
“The frequency of * is increasing”
50,000 matches
Step 2: add context (“cancer”) to boost relevance
cancer “The frequency of * is increasing”
9000 matches
Step 3: add a domain (e.g. .ac.uk) to boost reliability
cancer “The frequency of * is increasing” site:.ac.uk
4 matches
![Page 32: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/32.jpg)
Step 4: search for a more specific pattern
“the frequency of * cancer is increasing” site:.ac.uk
0 hits
Step 5: move the Kleene star to further explore phraseology
e.g. “The * of * cancer is increasing”…
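Moving the star by hand can be made systematic. The sketch below, which is not from the original slides, lists every pattern obtained by replacing one more word of the failed Step 4 query with `*`; the function name `star_variants` is hypothetical.

```python
def star_variants(pattern):
    """Yield copies of `pattern` with one additional word replaced by *."""
    tokens = pattern.split()
    for i, tok in enumerate(tokens):
        if tok != "*":
            yield " ".join(tokens[:i] + ["*"] + tokens[i + 1:])


for variant in star_variants("the frequency of * cancer is increasing"):
    print(f'"{variant}"')
```

One of the generated patterns is the Step 5 query, in which the noun slot is opened up so that alternatives to "frequency" can surface.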
![Page 33: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/33.jpg)
“the * of * cancer is increasing”
![Page 34: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/34.jpg)
The incidence of * cancer is increasing
![Page 35: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/35.jpg)
![Page 36: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/36.jpg)
USING THE BNC
- Go to http://corpus.byu.edu/bnc/
- Register as: “Graduate student: not languages or linguistics”
![Page 37: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/37.jpg)
![Page 38: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/38.jpg)
![Page 39: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/39.jpg)
Collocation
![Page 40: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/40.jpg)
Colligation
![Page 41: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/41.jpg)
Colligation
![Page 42: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/42.jpg)
Semantic preference
![Page 43: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/43.jpg)
Semantic prosody
![Page 44: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/44.jpg)
Compare collocates
![Page 45: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/45.jpg)
![Page 46: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/46.jpg)
www.webcorp.org.uk
![Page 47: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/47.jpg)
![Page 48: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/48.jpg)
Results for LANDSCAPE from WebCorp
![Page 49: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/49.jpg)
![Page 50: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/50.jpg)
![Page 51: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/51.jpg)
![Page 52: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/52.jpg)
![Page 53: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/53.jpg)
![Page 54: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/54.jpg)
Exercises
• Explore collocation
Volatility
Spread
• Explore neologisms
Flexicurity
Staycation
• Explore phrasal creativity
Everyone and their * knows
• Explore Computer-Mediated Communication
l8r … dunno … IMHO
![Page 55: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/55.jpg)
Volatility - chemistry
![Page 56: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/56.jpg)
Volatility - market
![Page 57: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/57.jpg)
![Page 58: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/58.jpg)
![Page 59: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/59.jpg)
![Page 60: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/60.jpg)
![Page 61: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/61.jpg)
More examples from blogs:
• Dunno
• IMHO
![Page 62: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/62.jpg)
WebCorp Linguist’s Search Engine
![Page 63: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/63.jpg)
http://www.webcorp.org.uk/live/wlse.jsp
REGISTER FOR FREE
![Page 64: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/64.jpg)
• Synchronic English Web Corpus, a corpus consisting of 467,713,650 words from web-extracted texts covering the period 2000-2010;
• Diachronic English Web Corpus, consisting of 128,951,238 words covering the period Jan 2000 - Dec 2010, with each month containing approximately 1 million words;
• Birmingham Blog Corpus, consisting of 628,558,282 words from blog texts.
![Page 65: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/65.jpg)
Synchronic English Web Corpus
• Mini-web Sample > a 339,907,995-word corpus compiled from 100,000 randomly selected web pages to form a sample of the distribution of texts throughout the web.
• Domains > 127,805,655 words from 56,000 pages selected from 14 domains on the basis of the Open Directory classification of web pages.
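As a quick sanity check, the two sub-corpus sizes can be compared with the total given for the Synchronic English Web Corpus. The variable names below are made up for the example, and the per-domain figure assumes the pages were spread evenly over the 14 domains, which the slides do not state.

```python
# Figures copied from the slides.
mini_web_words, mini_web_pages = 339_907_995, 100_000
domain_words, domain_pages, n_domains = 127_805_655, 56_000, 14
stated_total = 467_713_650

# The two sub-corpora add up exactly to the stated corpus size.
total = mini_web_words + domain_words
print(total == stated_total)

# Average page length in each sub-corpus.
print(f"mini-web: {mini_web_words / mini_web_pages:.0f} words per page")
print(f"domains:  {domain_words / domain_pages:.0f} words per page")

# Assumption: an even split of pages across domains.
print(domain_pages // n_domains, "pages per domain")
```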
![Page 66: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/66.jpg)
Domains in the Synchronic English Web Corpus
![Page 67: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/67.jpg)
![Page 68: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/68.jpg)
SPREAD
• Spread > health
• Spread > business
![Page 69: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/69.jpg)
![Page 70: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/70.jpg)
![Page 71: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/71.jpg)
balkanis/zation
![Page 72: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/72.jpg)
![Page 73: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/73.jpg)
Birmingham Blog Corpus
• kinda
![Page 74: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/74.jpg)
kinda feel + adj
![Page 75: Using the Web](https://reader035.vdocument.in/reader035/viewer/2022070508/577c83fc1a28abe054b7167a/html5/thumbnails/75.jpg)
kinda sorta + verb