statistical nlp

Upload: chandrashekar-ramaswamy

Post on 06-Jul-2018


  • 8/18/2019 Statistical NLP

    1/19

     

    Statistical Natural Language Processing

    What is NLP?

    Natural Language Processing (NLP), or Computational Linguistics, is concerned with theoretical and practical issues in the design and implementation of computer systems for processing human languages.

    It is an interdisciplinary field that draws on other areas of study such as computer science, artificial intelligence, linguistics and logic.

    Applications of NLP

    Natural language interfaces to databases

    Programs for classifying and retrieving documents by content

    Explanation generation for expert systems

    Machine translation

    Advanced word-processing tools

    What makes NLP a computational challenge?

    The ambiguous nature of natural language.

    There are varied applications for language technology.

    Knowledge representation is a difficult task.

    There are different levels of information encoded in our language.

    What is statistical NLP?

    Statistical NLP aims to perform statistical inference for the field of NLP.

    Statistical inference consists of taking some data generated in accordance with some unknown probability distribution and making inferences about that distribution.
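    As a minimal illustration, assume the unknown distribution is a distribution over words and the data is a small sample of tokens. The simplest inference is the maximum-likelihood estimate of each word's probability, which is its relative frequency in the sample. The sample sentence and the `mle_unigram` name are hypothetical.

```python
from collections import Counter


def mle_unigram(tokens):
    """Maximum-likelihood estimate of P(w): the relative frequency of each word."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical sample standing in for data drawn from the unknown distribution.
    sample = "the cat sat on the mat".split()
    for word, p in sorted(mle_unigram(sample).items()):
        print(f"{word}: {p:.3f}")
```

    Here "the" occurs twice in six tokens and gets probability 1/3, while every other word gets 1/6. Words absent from the sample receive probability zero, which is why practical systems add smoothing on top of the raw estimate.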

    Motivations for Statistical NLP

    Cognitive modeling of human language processing has not reached a stage where we can have a complete mapping between the language signal and its information content.

    A complete mapping is not always required.

    The statistical approach provides the flexibility required to model a language more accurately.

    Idea behind Statistical NLP

    View language processing as information transmission over a noisy channel.

    The approach requires a model that characterizes the transmission by giving, for every message, the probability of the observed output.
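    Under this view, decoding picks the message m that maximizes P(m) × P(o | m), where P(o | m) is the channel model described above and P(m) is a model of how likely each message is in the first place. The sketch below applies this to a hypothetical spelling-correction case; the candidate words, all probabilities, and the names `PRIOR`, `CHANNEL` and `decode` are invented for illustration.

```python
# Hypothetical source model P(m): how likely each candidate message is a priori.
PRIOR = {"the": 0.05, "ten": 0.002, "tea": 0.001}

# Hypothetical channel model P(o | m): probability that message m comes out as o.
CHANNEL = {("teh", "the"): 0.01, ("teh", "ten"): 0.005, ("teh", "tea"): 0.004}


def decode(observed):
    """Noisy-channel decoding: argmax over m of P(m) * P(observed | m), plus all scores."""
    scores = {m: p_m * CHANNEL.get((observed, m), 0.0) for m, p_m in PRIOR.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    best, scores = decode("teh")
    print(best, scores)
```

    With these numbers "the" wins with a score of 0.05 × 0.01 = 0.0005, because both its prior and its channel probability are the largest.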

    Statistical Modeling and Classification

    Primitive acoustic features

    Quantization

    Maximum likelihood and related rules

    Class conditional density function

    Hidden Markov Model methodology

    Details

    Primitive acoustic features are used to estimate the speech spectrum on the basis of its statistical properties.

    By means of quantization, a typical speech signal can be represented as a sequence of symbols and can be mapped, using statistical decision rules, into a multidimensional acoustic feature space, thus classifying the signal.
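    A minimal sketch of the quantization step, assuming a fixed codebook of reference vectors is already available (how it is built, for example by clustering training frames, is not specified here). Each feature vector is replaced by the index of its nearest codeword under Euclidean distance, so a sequence of frames becomes a sequence of symbols. The codebook values and the `quantize` name are hypothetical; `math.dist` needs Python 3.8 or later.

```python
import math

# Hypothetical 2-D codebook: each entry is a reference vector (codeword).
CODEBOOK = [(0.0, 0.0), (1.0, 1.0), (0.0, 2.0)]


def quantize(frames, codebook=CODEBOOK):
    """Replace each feature vector by the index of its nearest codeword (Euclidean distance)."""
    symbols = []
    for frame in frames:
        nearest = min(range(len(codebook)), key=lambda k: math.dist(frame, codebook[k]))
        symbols.append(nearest)
    return symbols


if __name__ == "__main__":
    frames = [(0.1, -0.2), (0.9, 1.2), (0.2, 1.8)]
    print(quantize(frames))
```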

    Maximum Likelihood

    Although there is no direct method for computing the probability of a phonetic unit given its acoustic features, we can use Bayes' rule to estimate the probability of a phonetic class given its features from the likelihood of the features given the class. When all classes are treated as equally probable a priori, this leads to the maximum likelihood classifier, which assigns an unknown vector to the class whose class-conditional probability density function has the maximum value at that vector.
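    The rule can be written as choosing the class c that maximizes p(x | c). Bayes' rule, P(c | x) = p(x | c) P(c) / p(x), shows that this coincides with picking the most probable class whenever the priors P(c) are equal. The sketch below assumes one-dimensional Gaussian class-conditional densities with made-up parameters for three hypothetical phonetic classes; the class labels, numbers, and function names are illustrative only.

```python
import math

# Hypothetical class-conditional densities p(x | c): one Gaussian (mean, std dev)
# per phonetic class over a single made-up acoustic feature x.
CLASSES = {"/a/": (700.0, 80.0), "/i/": (300.0, 50.0), "/u/": (350.0, 60.0)}


def gaussian_pdf(x, mean, std):
    """Normal probability density with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def ml_classify(x, classes=CLASSES):
    """Maximum likelihood rule: choose the class c that maximizes p(x | c)."""
    return max(classes, key=lambda c: gaussian_pdf(x, *classes[c]))


def posterior(x, priors, classes=CLASSES):
    """Bayes' rule: P(c | x) = p(x | c) P(c) / sum over c' of p(x | c') P(c')."""
    joint = {c: gaussian_pdf(x, *classes[c]) * priors[c] for c in classes}
    evidence = sum(joint.values())
    return {c: j / evidence for c, j in joint.items()}


if __name__ == "__main__":
    print(ml_classify(290.0))
    print(posterior(290.0, {"/a/": 1 / 3, "/i/": 1 / 3, "/u/": 1 / 3}))
```

    With uniform priors the class with the highest posterior is the same one `ml_classify` returns; unequal priors can shift the decision toward the more frequent class.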

    Another variant of the maximum likelihood methodology is clustering.

    Hidden Markov Models

    A Hidden Markov Model is a set of states (lexical categories in our case) with directed edges labeled with transition probabilities that indicate the probability of moving to the state at the end of the directed edge, given that one is now in the state at the start of the edge. The states are also labeled with a function that indicates the probabilities of outputting different symbols while in that state (while in a state, one outputs a single symbol before moving to the next state). In our case, the symbol output from a state (lexical category) is a word belonging to that lexical category.
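    How such a model assigns categories to words is not spelled out here; one standard choice is the Viterbi algorithm, which finds the most probable state (lexical category) sequence for an observed word sequence. The sketch below is a minimal version under that assumption, with a two-category model whose start, transition, and output probabilities are invented for illustration. Words are expected in lower case.

```python
# Toy HMM for part-of-speech tagging. States are lexical categories, emitted
# symbols are words. Every probability below is an invented, illustrative value.
STATES = ("N", "V")
START = {"N": 0.7, "V": 0.3}                             # P(first state)
TRANS = {"N": {"N": 0.3, "V": 0.7},                      # P(next state | current state)
         "V": {"N": 0.6, "V": 0.4}}
EMIT = {"N": {"john": 0.5, "walks": 0.2, "dogs": 0.3},   # P(word | state)
        "V": {"walks": 0.6, "dogs": 0.4}}


def viterbi(words):
    """Return the most probable state sequence for a non-empty word list, and its probability."""
    # best[t][s]: probability of the best path that ends in state s after emitting words[t]
    best = [{s: START[s] * EMIT[s].get(words[0], 0.0) for s in STATES}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(words)):
        best.append({})
        back.append({})
        for s in STATES:
            prev = max(STATES, key=lambda r: best[t - 1][r] * TRANS[r][s])
            best[t][s] = best[t - 1][prev] * TRANS[prev][s] * EMIT[s].get(words[t], 0.0)
            back[t][s] = prev
    # Pick the best final state, then follow the back-pointers to the start.
    last = max(STATES, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(words) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


if __name__ == "__main__":
    print(viterbi("john walks dogs".split()))
```

    With these numbers "john walks dogs" is tagged N V N. Real taggers estimate the probabilities from a tagged corpus and usually work with log probabilities to avoid numerical underflow on long sentences.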

    Hidden Markov Models (cont.)

    Class Conditional Density Function

    All statistical methods of speech recognition depend on the class conditional density function.

    Estimating it, in turn, depends on the existence of a sufficiently large, correctly labeled training set and on well-understood statistical estimation techniques.
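    As a concrete instance, assume (for illustration only) that each class conditional density is a one-dimensional Gaussian. The standard maximum-likelihood estimates from a labeled training set are then the per-class sample mean and variance. The `fit_gaussians` name and the training data are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussians(samples):
    """Maximum-likelihood (mean, std dev) of a 1-D Gaussian for each class.

    `samples` is an iterable of (feature_value, class_label) pairs.
    """
    by_class = defaultdict(list)
    for x, label in samples:
        by_class[label].append(x)
    params = {}
    for label, xs in by_class.items():
        mean = sum(xs) / len(xs)
        # The ML variance divides by n (not n - 1), so it is slightly biased low.
        variance = sum((x - mean) ** 2 for x in xs) / len(xs)
        params[label] = (mean, math.sqrt(variance))
    return params


if __name__ == "__main__":
    # Hypothetical labeled training set: (feature value, class label).
    training = [(1.0, "A"), (3.0, "A"), (10.0, "B"), (12.0, "B"), (14.0, "B")]
    print(fit_gaussians(training))
```

    A class seen only once gets a standard deviation of zero, which makes its density degenerate; this is one concrete reason a sufficiently large training set matters.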

    How does statistics help?

    Disambiguation may be achieved by using stochastic context-free grammars.

    It helps in providing:

    Degrees of grammaticality

    Naturalness

    Structural preference

    Error tolerance

    Example using a stochastic CFG

    Consider the sentence "John walks".

    The grammar is as follows:

    1 S -> NP V    0.7
    2 S -> NP      0.3
    3 NP -> N      0.8
    4 NP -> N N    0.2
    5 N -> John    0.6
    6 N -> walks   0.4
    7 V -> walks   1.0

    The numbers on the right represent the weights for each rule. The weight of an analysis is the product of the weights of the rules used in its derivation.

    Which analysis of the sentence is perceived as correct can be predicted from these weights: the analysis with the highest weight is preferred.
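    The weights of the competing analyses can be computed mechanically. The sketch below enumerates every derivation of a word sequence under this grammar and multiplies the rule weights along each one. It assumes rule 3 (NP -> N) has weight 0.8, the value that makes the two NP rules sum to 1 as the S and N rules do. For "John walks" it finds two analyses: [S [NP [N John]] [V walks]] with weight 0.7 × 0.8 × 0.6 × 1.0 = 0.336, and [S [NP [N John] [N walks]]] with weight 0.3 × 0.2 × 0.6 × 0.4 = 0.0144, so the first is preferred. The function names are hypothetical, and brute-force enumeration only suits toy grammars without empty or cyclic rules; a chart parser such as probabilistic CKY is the usual choice in practice.

```python
# Weighted grammar from the slide (rule 3's weight of 0.8 is an assumption).
GRAMMAR = {
    "S": [(("NP", "V"), 0.7), (("NP",), 0.3)],
    "NP": [(("N",), 0.8), (("N", "N"), 0.2)],
    "N": [(("John",), 0.6), (("walks",), 0.4)],
    "V": [(("walks",), 1.0)],
}


def parses(symbol, words):
    """Return every (tree, weight) deriving the tuple `words` from `symbol`."""
    if symbol not in GRAMMAR:  # terminal: must match exactly one word
        return [(symbol, 1.0)] if words == (symbol,) else []
    results = []
    for rhs, rule_weight in GRAMMAR[symbol]:
        for children, weight in parse_sequence(rhs, words):
            results.append(((symbol, *children), rule_weight * weight))
    return results


def parse_sequence(symbols, words):
    """Yield (children, weight) for each way of splitting `words` among `symbols`."""
    if not symbols:
        if not words:
            yield (), 1.0
        return
    first, rest = symbols[0], symbols[1:]
    # Each symbol covers at least one word (the grammar has no empty rules).
    for i in range(1, len(words) - len(rest) + 1):
        for first_tree, first_weight in parses(first, words[:i]):
            for rest_trees, rest_weight in parse_sequence(rest, words[i:]):
                yield (first_tree, *rest_trees), first_weight * rest_weight


if __name__ == "__main__":
    analyses = parses("S", ("John", "walks"))
    for tree, weight in sorted(analyses, key=lambda a: a[1], reverse=True):
        print(f"{weight:.4f}  {tree}")
```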

    Degrees of grammaticality

    Traditional approaches to NLP do not accommodate gradations of grammaticality: a sentence is either correct or not.

    In some cases, acceptability may vary with the structure and context of the sentence.

    Structural Preference

    Consider the sentence "The emergency crews hate most is domestic violence."

    The correct interpretation is: "The emergency [that the crews hate most] is domestic violence."

    These are better described as structural preferences than as parsing preferences. Statistical approaches can capture such preferences by assigning higher weights to the more likely structures.

    Error Tolerance

    A remarkable property of human language comprehension is error tolerance.

    Many sentences that the traditional approach classifies as ungrammatical can nevertheless be interpreted by statistical NLP techniques.

    Conclusions

    Free and commercial software that provides many NLP features is now available (e.g. Microsoft XP includes speech recognition software with which users can control menus and execute commands).

    A lot of research is going into developing new applications and investigating new techniques and approaches that will make statistical NLP more feasible in the near future.