Ontologically-based Searching for Jobs in Linguistics

DLLS 2003

Deryle Lonsdale
[email protected]

Funded by:

The BYU Data Extraction Group
- Group of faculty (5) and students (15) from CS, Linguistics, and SOAIS
- Goal: ontology-based data extraction
- NSF funding: CISE/IIS/IDM, TIDIE
- Website: www.deg.byu.edu/
  - Papers, presentations
  - Tools
  - Demos

The BYU Data Extraction Group

Overview
- Ontology-based extraction
- Building knowledge sources
- Jobs in linguistics (Sproat)
- Putting it all together
- Some sample results

Ontologies and IE

[Figure: Source, Target]

Document-based IE

Conceptual modeling (OSM)

[Figure: OSM diagram of the car-ads model. Object sets: Car, Year, Make, Model, Mileage, Price, Feature, PhoneNr, Extension; relationship sets labeled "has" and "is for", with participation constraints such as 0..1, 0..*, and 1..*.]
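Participation constraints like those in the diagram can be checked mechanically against extracted data. The sketch below is a hypothetical illustration, not the DEG implementation: it reads relationship-set declarations in the textual syntax used for the car-ads ontology (e.g. "Car [0..1] has Year [1..*]"), assumes the bracket beside an object set gives how many times one of its objects may take part in that relationship, and flags extracted facts that exceed the maximum.

```python
import math
import re
from collections import Counter

# Matches declarations such as "Car [0..1] has Year [1..*]" or
# "PhoneNr [1..*] is for Car [0..*]" (textual ontology syntax).
DECL = re.compile(r"(\w+) \[(\d+)\.\.(\d+|\*)\] (has|is for) (\w+) \[(\d+)\.\.(\d+|\*)\]")

def max_participation(ontology):
    """Map (object set, related set) to the object set's maximum participation."""
    limits = {}
    for m in DECL.finditer(ontology):
        left, _, left_max, _, right, _, _ = m.groups()
        limits[(left, right)] = math.inf if left_max == "*" else int(left_max)
    return limits

def violations(facts, limits):
    """facts: (object set, object id, related set) triples; return over-limit ones."""
    counts = Counter(facts)
    return sorted(f for f, n in counts.items()
                  if n > limits.get((f[0], f[2]), math.inf))

limits = max_participation("Car [0..1] has Year [1..*]; Car [0..*] has Feature [1..*];")
# Hypothetical extraction result: car 0002 has three features (allowed),
# car 0001 was assigned two years (violates Car [0..1] has Year).
facts = [("Car", "0002", "Feature")] * 3 + [("Car", "0001", "Year")] * 2
print(violations(facts, limits))  # [('Car', '0001', 'Year')]
```

Under this reading, a car may have any number of features but at most one year, so only the doubled year assignment is reported.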

Recognition and Extraction

Car   Year  Make    Model      Mileage  Price   PhoneNr
0001  1989  Subaru  SW                  $1900   (336)835-8597
0002  1998          Elantra                     (336)526-5444
0003  1994  HONDA   ACCORD EX  100K             (336)526-1081

Car   Feature
0001  Auto
0001  AC
0002  Black
0002  4 door
0002  tinted windows
0002  Auto
0002  pb
0002  ps
0002  cruise
0002  am/fm
0002  cassette stereo
0002  a/c
0003  Auto
0003  jade green
0003  gold

Car-Ads Ontology (textual)
Car [->object];
Car [0..1] has Year [1..*];
Car [0..1] has Make [1..*];
Car [0..1] has Model [1..*];
Car [0..1] has Mileage [1..*];
Car [0..*] has Feature [1..*];
Car [0..1] has Price [1..*];
PhoneNr [1..*] is for Car [0..*];
PhoneNr [0..1] has Extension [1..*];
Year matches [4]
  constant { extract "\d{2}";
             context "([^\$\d]|^)[4-9]\d[^\d]";
             substitute "^" -> "19"; },
  … …
End;
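The Year rule combines three steps: the context pattern finds a two-digit number from 40 to 99 that is not part of a price or a longer number, the extract pattern pulls the digits out of that match, and the substitution prepends "19". A minimal Python sketch of just this one visible rule, assuming extract is applied inside the text matched by context (the other Year rules are elided on the slide and not reproduced):

```python
import re

# Patterns copied from the slide's Year data frame.
YEAR_CONTEXT = re.compile(r"([^\$\d]|^)[4-9]\d[^\d]")
YEAR_EXTRACT = re.compile(r"\d{2}")

def extract_years(text):
    years = []
    for ctx in YEAR_CONTEXT.finditer(text):
        # "extract" runs only inside the text matched by "context"
        value = YEAR_EXTRACT.search(ctx.group(0))
        if value:
            # substitute "^" -> "19": replacing the start anchor prepends "19"
            years.append(re.sub(r"^", "19", value.group(0)))
    return years

print(extract_years("89 Subaru SW $1900"))  # ['1989']
print(extract_years("$4500 firm"))          # []
```

The "[^\$\d]" guard is what keeps the "45" in a price such as "$4500" from being read as a model year.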

The data-frame library
- Low-level patterns implemented as regular expressions
- Match items such as email addresses, phone numbers, names, etc.

Mileage matches [8]
  constant { extract "\b[1-9]\d{0,2}k"; substitute "[kK]" -> "000"; },
    { extract "[1-9]\d{0,2}?,\d{3}"; context "[^\$\d][1-9]\d{0,2}?,\d{3}[^\d]"; substitute "," -> ""; },
    { extract "[1-9]\d{0,2}?,\d{3}"; context "(mileage\:\s*)[^\$\d][1-9]\d{0,2}?,\d{3}[^\d]"; substitute "," -> ""; },
    { extract "[1-9]\d{3,6}"; context "[^\$\d][1-9]\d{3,6}\s*mi(\.|les\b)"; },
    { extract "[1-9]\d{3,6}"; context "(mileage\:\s*)[^\$\d][1-9]\d{3,6}\b"; };
  keyword "\bmiles\b", "\bmi\.", "\bmi\b", "\bmileage\b";
end;
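To make the extract/context/substitute interplay concrete, here is a hypothetical Python sketch of the first two Mileage rules only. It assumes a context pattern restricts the search to the regions it matches and that matching is case-insensitive (the slide's first extract pattern uses a lowercase "k", while its substitution handles "[kK]"); the function name apply_data_frame is illustrative, not part of the DEG tools.

```python
import re

# Two of the slide's Mileage rules as (extract, context, substitutions).
MILEAGE_RULES = [
    (r"\b[1-9]\d{0,2}k", None, [(r"[kK]", "000")]),
    (r"[1-9]\d{0,2}?,\d{3}", r"[^\$\d][1-9]\d{0,2}?,\d{3}[^\d]", [(r",", "")]),
]

def apply_data_frame(text, rules, flags=re.IGNORECASE):
    """Return the normalized values found by each rule, in rule order."""
    values = []
    for extract, context, subs in rules:
        # With a context pattern, only the regions it matches are searched;
        # without one, the whole text is searched.
        regions = ([m.group(0) for m in re.finditer(context, text, flags)]
                   if context else [text])
        for region in regions:
            for m in re.finditer(extract, region, flags):
                value = m.group(0)
                for pattern, replacement in subs:
                    value = re.sub(pattern, replacement, value)
                values.append(value)
    return values

print(apply_data_frame("1994 HONDA ACCORD EX 100K", MILEAGE_RULES))          # ['100000']
print(apply_data_frame("Only 45,000 miles, asking $12,500", MILEAGE_RULES))  # ['45000']
```

As with the Year frame, the "[^\$\d]" in the context pattern is what excludes the price "$12,500" while still accepting "45,000".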

Lexicons
- Repositories of enumerable classes of lexical information
- FirstNames, LastNames, USstates, ProvoOremApts, CarMakes, Drugs, CampGroundFeats, etc.
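One straightforward way to use such a lexicon for recognition is to compile its entries into a single case-insensitive alternation. This is a sketch under that assumption, not the group's implementation; the CAR_MAKES entries below are a hypothetical sample, since the real lexicon contents are not shown.

```python
import re

def lexicon_pattern(entries):
    """Compile a lexicon into one case-insensitive alternation, longest entries first."""
    alternatives = [re.escape(e) for e in sorted(entries, key=len, reverse=True)]
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)

# Hypothetical sample of a CarMakes lexicon
CAR_MAKES = ["Honda", "Hyundai", "Subaru", "Land Rover"]
car_make = lexicon_pattern(CAR_MAKES)

print(car_make.findall("1994 HONDA ACCORD EX, or a Land Rover"))  # ['HONDA', 'Land Rover']
```

Ordering entries longest first lets multi-word entries such as "Land Rover" win over any shorter entry they contain.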

Accessing the output
- Extracted information is stored in a relational database
- Results can be queried using SQL
- A wide range of views is possible
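As an illustration of querying extracted results with SQL, the sketch below loads rows from the car-ads Car and Car/Feature tables into an in-memory SQLite database and builds one possible view. The schema (table and column names, types) is a hypothetical mirror of those tables, and only a subset of the feature rows is loaded.

```python
import sqlite3

# Hypothetical schema mirroring the car-ads output tables
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Car (
    Car TEXT PRIMARY KEY, Year INTEGER, Make TEXT, Model TEXT,
    Mileage TEXT, Price TEXT, PhoneNr TEXT)""")
conn.execute("CREATE TABLE CarFeature (Car TEXT REFERENCES Car(Car), Feature TEXT)")

conn.executemany("INSERT INTO Car VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("0001", 1989, "Subaru", "SW", None, "$1900", "(336)835-8597"),
    ("0002", 1998, None, "Elantra", None, None, "(336)526-5444"),
    ("0003", 1994, "HONDA", "ACCORD EX", "100K", None, "(336)526-1081"),
])
conn.executemany("INSERT INTO CarFeature VALUES (?, ?)", [
    ("0001", "Auto"), ("0001", "AC"), ("0002", "Black"), ("0002", "Auto"),
    ("0002", "a/c"), ("0003", "Auto"), ("0003", "jade green"),
])

# One possible view: automatic cars from 1990 on, with a contact number
rows = conn.execute("""
    SELECT c.Car, c.Year, c.PhoneNr
    FROM Car AS c JOIN CarFeature AS f ON f.Car = c.Car
    WHERE f.Feature = 'Auto' AND c.Year >= 1990
    ORDER BY c.Year""").fetchall()
print(rows)  # [('0003', 1994, '(336)526-1081'), ('0002', 1998, '(336)526-5444')]
```

Because features live in their own table, the many-valued Car/Feature relationship needs no schema change however many features an ad lists.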

Finding jobs in linguistics
- Linguistlist.org, LSA
- Email distribution lists (Corpora, Langage Naturel, CAAL/ACLA, etc.)
- Usual commercial sites (monster.com, flipdog.com, dice.com)
- Word-of-mouth sources

Sproat’s analysis
- Random sample (224 of 2250) of LinguistList postings, 1994-2001
- Development vs. research, academic vs. industrial
- Linguists are most often (approx. 80% of the time) offered development jobs
- Linguists are hired for specific tasks (e.g., grammar or lexicon development) more often than for general research-oriented tasks (e.g., creating new technological approaches)

The banner years

Year        Academia  Industry  % Industry
1994        27        2         7%
1995        45        5         10%
1996        52        3         5%
1997        48        3         6%
1998        57        3         5%
1999        56        14        20%
2000        55        43        39%
2001 (mid)  22        10        31%

- Dramatic rise in 1999 and 2000
- Steep drop-off since 2001
- Rising demand for technical, computational skills
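Assuming "% Industry" means industry postings as a share of all postings in that year, i.e. Industry / (Academia + Industry), the column can be recomputed from the two counts. Doing so reproduces the listed value for every year except 2000, where 43 / (55 + 43) rounds to 44% rather than 39%.

```python
# Academia and industry counts from the "banner years" table (2001 is mid-year)
BANNER_YEARS = {
    1994: (27, 2), 1995: (45, 5), 1996: (52, 3), 1997: (48, 3),
    1998: (57, 3), 1999: (56, 14), 2000: (55, 43), 2001: (22, 10),
}

def industry_share(academia, industry):
    """Industry postings as a whole-number percentage of all postings that year."""
    return round(100 * industry / (academia + industry))

for year, (academia, industry) in BANNER_YEARS.items():
    print(year, f"{industry_share(academia, industry)}%")
```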

Linguistic jobs ontology
- Why? User-specifiable constraints
- Follows existing ontologies (e.g., jobs, software) fairly closely

Data frames and lexicons
- Language names (Ethnologue)
- (Sub)fields of linguistics (Linguistlist.org)
- Tools, toolkits
- Software components, programming languages
- Linguistics-related job titles
- Activities
- Responsibilities
- Country names

The corpus
- 3237 postings (LinguistList, Corpora, LN, word of mouth):
  - 1998: 541
  - 1999: 575
  - 2000: 871
  - 2001: 952
  - 2002: 788
- Some noise (non-English, factored, program descriptions, attachments, etc.)
- Semi-automatic edits to strip boilerplate, publicity blurbs about institutions, etc.
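The per-year counts can be totaled and turned into shares directly. They sum to 3727 rather than the 3237 given for the corpus as a whole, so one of the two figures may contain transposed digits.

```python
# Postings per year, as listed on the corpus slide
POSTINGS_PER_YEAR = {1998: 541, 1999: 575, 2000: 871, 2001: 952, 2002: 788}

total = sum(POSTINGS_PER_YEAR.values())
print(total)  # 3727

# Each year's share of the summed total, to one decimal place
for year, n in POSTINGS_PER_YEAR.items():
    print(year, f"{100 * n / total:.1f}%")
```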

Sample output

Observations
- 270 postings don’t contain linguist* (!)
- Demand for knowledge of English equals that for all other languages combined (G, F, S, J, C)
- A computer/computational background is required in almost 1/3 of postings (1116)
- Noticeable amount of headhunting, particularly in the Seattle and DC areas
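A "linguist*" count like the one above amounts to a wildcard search over each posting. A minimal sketch, assuming a case-insensitive match on any word beginning with "linguist"; the sample postings are hypothetical.

```python
import re

# Any word starting with "linguist" (linguist, linguists, linguistics, ...).
# The leading \b means embedded forms such as "psycholinguist" are NOT
# counted; drop it to treat linguist* as a plain substring test instead.
LINGUIST_STAR = re.compile(r"\blinguist\w*", re.IGNORECASE)

def postings_without_linguist(postings):
    return [p for p in postings if not LINGUIST_STAR.search(p)]

# Hypothetical postings for illustration
sample = [
    "Computational Linguist wanted for MT project",
    "Speech engineer, C++ and Perl required",
    "Assistant professor of Linguistics",
]
print(postings_without_linguist(sample))  # ['Speech engineer, C++ and Perl required']
```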

Programming languages

[Bar chart: C/C++, CGI, HTML/SGML, Java/JScript, Lisp/Python, Perl, Prolog, SQL, Tcl, VB, XML/XSLT; scale 0 to 700]

Popular subfields

[Bar chart: IE/IR, Morpho, NLP, Phonetics, Phonology, Pragmatics, Speech, Syntax, Semantics, MT, TESOL/EFL, Translation; scale 0 to 700]

Subfields (another perspective)

[Bar chart: Psycho, Neuro, Historical, Typological, Acquisition, Cognition, Socioling, Lexicography, Philology, Philosophy, Anthropo; scale 0 to 800]

An engineering discipline?
- 160 linguistics job titles ending in “engineer”
- Software development cycle: research e., software design e., development e., software e., software quality e., linguistic test e., linguistic quality e., linguistic support e., user experience e., presales e., technical sales e.
- Specific subfields: web site e., speech e., voice recognition e., speech recognition application e., ASR tuning e., audio e., dialog e., tools e., AI e., NLP e., knowledge e., linguist e., natural language e., staff e., human factors e., user interface e.
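Selecting titles that end in "engineer", and abbreviating them the way the list above does, takes one anchored pattern. A sketch with hypothetical titles; the 160 actual titles are not reproduced here.

```python
import re

# "engineer" as the final word of a title, in any case
ENGINEER = re.compile(r"\bengineer\s*$", re.IGNORECASE)

# Hypothetical job titles for illustration
titles = [
    "Speech Recognition Application Engineer",
    "Computational Linguist",
    "Linguistic QA Engineer",
    "Lexicographer",
    "NLP engineer",
]

engineer_titles = [t for t in titles if ENGINEER.search(t)]
# Abbreviate "... Engineer" to "... e." as in the list above
abbreviations = [ENGINEER.sub("e.", t) for t in engineer_titles]
print(len(engineer_titles))  # 3
print(abbreviations)  # ['Speech Recognition Application e.', 'Linguistic QA e.', 'NLP e.']
```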

Paradigms

[Bar chart: Machine learning, Finite-state, Statistical, Stoch/Prob, Math, Generative, Field Methods; scale 0 to 300]

Other observations
- Often a job title is not even listed (!)
- More i18n (internationalization) of data frames needed (e.g., email, phone numbers)
- Great need for (preferably hierarchical) lexical repositories related to:
  - linguistics job titles
  - theoretical frameworks, subfields
  - typical linguist job activities
  - linguistic research/development venues
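Internationalizing a data frame typically means adding patterns for formats beyond the US ones the car-ads frames target. The sketch below pairs a US pattern for values like "(336)526-1081" with a deliberately loose, hypothetical international pattern ("+", a 1-3 digit country code, then digit groups separated by spaces, dots, or hyphens). It is an illustration of the idea only, not a validated phone-number grammar.

```python
import re

# US style, as in the car-ads PhoneNr values, e.g. "(336)526-1081"
US_PHONE = re.compile(r"\(\d{3}\)\s*\d{3}-\d{4}")
# Hypothetical international style: "+", country code, 2-6 digit groups
INTL_PHONE = re.compile(r"\+\d{1,3}(?:[ .-]?\d{1,4}){2,6}")

def find_phones(text):
    """Return US-style matches first, then international-style matches."""
    return US_PHONE.findall(text) + INTL_PHONE.findall(text)

print(find_phones("Call (336)526-1081 or +33 1 23 45 67 89."))
# ['(336)526-1081', '+33 1 23 45 67 89']
```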