3rd Progress Meeting For Sphinx 3.6 Development. Arthur Chan, David Huggins-Daines, Yitao Sun. Carnegie Mellon University. Jan 25, 2006

Page 1: 3rd Progress Meeting For Sphinx 3.6 Development

Arthur Chan, David Huggins-Daines, Yitao Sun
Carnegie Mellon University

Jan 25, 2006

Page 2: This meeting

- 3rd progress report on 3.6 development (40 pages)
- Agenda
  - What happened in Fall 2005? (4 slides)
  - Progress of Sphinx development in Fall 2005 (17 slides)
  - Summary of progress in 2005 (10 slides)
  - Discussion: Should we create one release candidate? (1 slide)

Page 3: What happened in Fall 2005?

Page 4: What happened in Fall 2005? Major Events in Sphinx Development

- We joined GALE in Oct 2005
  - Conformance of the recognizers (Sphinx 3 and Sphinx 4) became an issue
  - The lack of advanced acoustic modeling techniques became very glaring
  - Sphinx 3 and 4 have gone through bug fixes
- The CALO effort is now split in two
  - Off-line recognizer: requires major improvement in LM and AM. The AM issue is shared with GALE
  - On-line recognizer (CALO jargon: Smartnote)
    - Now has a new LM and AM
    - Requires significant development work

Page 5: Time distribution (Estimated)

- Arthur: 50% on GALE, 20% on CALO, 30% on Sphinx
- Dave: 65% on CALO, 30% on PocketSphinx, 5% on Sphinx
- Yitao: 90% on CALO, 10% on Sphinx

[Pie charts of the time split; legend: GALE, CALO, SphinxDev, PocketSphinx]

Page 6: The Two Funded Projects

- Upside:
  - They point to issues that need to be solved
  - Need significant reprioritization of tasks
  - Balance of effort on the 2 projects is now achieved
- Downside:
  - Code development of Sphinx becomes a slower process
  - Also, we haven’t released s3 for a while => Should we release the code now?
  - Tired students and staff can be found everywhere

Page 7: Progress of Sphinx 3.6 in Fall 2005

Page 8: Overview

- Work on second-stage processing
  - Merging of best-path search into the 2nd stage of tree search
  - IBM lattice generation
  - Word confidence estimation
- Behavior changes and bug fixes
  - Treatment of acoustic scores
  - Assertion in vithist.c
- Attempts at search algorithm improvements
  - Mode 3: Flat lexicon decoding
  - Mode 4: Tree lexicon decoding
- Sphinx on Mandarin and coded languages
- New tools: conf, dp

Page 9: Work Schedule

- Sep 1 to Oct 1: Implementation of triphones in the flat lexicon decoder
- Oct 1 to Nov 1: Implementation of triphones in the tree lexicon decoder (incomplete)
- Nov 1 to Dec 8:
  - IBM lattice generation
  - Confidence score generation
  - Fixed issues in scores
- Dec 8 to Jan 3: The concept of “vacation” was tried
- Jan 3 to now: Fixed bugs, preparing the release

Page 10: Second-stage Processing

- Best-path search can now be specified in decode
  - Implementation requires write back. (urgh.)
- The recognizer can now generate lattices in IBM format
  - The word is attached to the link
  - The Sphinx format attaches the word to the node
  - Scores are normalized with the best senone scores
- Rong’s confidence-based routine is now in Sphinx as conf
  - Goodies: it uses the Sphinx logs3 routine, which significantly reduces the alpha-beta score mismatch
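The difference between the two lattice layouts can be sketched in a few lines. This is only an illustration, not the IBM file format or the Sphinx writer: the `Node`, `Edge` and `Link` tuples and the function name are hypothetical, and the sketch assumes that in the node-labeled lattice the score on an edge belongs to the word of the edge's source node.

```python
from collections import namedtuple

# Hypothetical in-memory structures, for illustration only.
Node = namedtuple("Node", "node_id word start_frame")
Edge = namedtuple("Edge", "src dst score")       # word lives on the nodes
Link = namedtuple("Link", "src dst word score")  # word lives on the links


def node_to_link_lattice(nodes, edges, final_id, final_score=0):
    """Move each node's word onto its outgoing edges.

    Assumption: an edge's score is the score of the source node's word.
    The final node has no outgoing edge, so one extra link to a new end
    state is added to carry its word.
    """
    words = {n.node_id: n.word for n in nodes}
    links = [Link(e.src, e.dst, words[e.src], e.score) for e in edges]
    end_state = max(words) + 1
    links.append(Link(final_id, end_state, words[final_id], final_score))
    return links, end_state


nodes = [Node(0, "<s>", 0), Node(1, "hello", 5), Node(2, "hallo", 5), Node(3, "</s>", 40)]
edges = [Edge(0, 1, -100), Edge(0, 2, -120), Edge(1, 3, -900), Edge(2, 3, -950)]
links, end_state = node_to_link_lattice(nodes, edges, final_id=3)
for link in links:
    print(link.src, link.dst, link.word, link.score)
```

The node ids are reused as link end points, so the topology is unchanged; only the place where the word label is stored moves.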

Page 11: Second-stage Processing (cont.)

- Further work
  - Best-path generation doesn’t conform to the past 3.5 behavior -> bug caused by 3.6 development
  - Also, the best path is not always in the lattice -> legacy bug
  - Confidence-based method
    - Lattice-based: can currently be used only off-line
    - 10% of the data still has an alpha-beta mismatch
  - Consensus network generation needs special focus
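For readers unfamiliar with the term, a minimal sketch of lattice forward-backward follows. It assumes that "alpha-beta mismatch" means the forward total at the end state disagreeing with the backward total at the start state; the slides do not define it. The sketch uses floating-point natural logs rather than the integer logs3 domain, and the function names and the tiny lattice are hypothetical.

```python
import math

NEG_INF = float("-inf")


def log_add(a, b):
    """Return log(exp(a) + exp(b)) without leaving the log domain."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    hi, lo = (a, b) if a > b else (b, a)
    return hi + math.log1p(math.exp(lo - hi))


def forward_backward(num_states, links):
    """links: (src, dst, log_score) tuples with src < dst.

    State 0 is the start and state num_states - 1 is the end.
    Returns alpha, beta and the posterior probability of every link.
    """
    alpha = [NEG_INF] * num_states
    beta = [NEG_INF] * num_states
    alpha[0] = 0.0
    beta[num_states - 1] = 0.0
    # Forward pass: visit links by increasing source state.
    for src, dst, score in sorted(links, key=lambda l: l[0]):
        alpha[dst] = log_add(alpha[dst], alpha[src] + score)
    # Backward pass: visit links by decreasing destination state.
    for src, dst, score in sorted(links, key=lambda l: l[1], reverse=True):
        beta[src] = log_add(beta[src], score + beta[dst])
    total = alpha[num_states - 1]
    posteriors = [math.exp(alpha[s] + sc + beta[d] - total) for s, d, sc in links]
    return alpha, beta, posteriors


links = [(0, 1, math.log(0.6)), (0, 2, math.log(0.4)),
         (1, 3, math.log(0.5)), (2, 3, math.log(0.25))]
alpha, beta, posteriors = forward_backward(4, links)
mismatch = abs(alpha[3] - beta[0])
print(mismatch, posteriors)
```

With exact arithmetic the two totals are equal. When both passes use a coarse or inconsistent log-add, they can drift apart, which is one plausible reading of why sharing a single log routine helps.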

Page 12: Scores we see (Change 1)

- Tree search now truly generates unnormalized scores
  - Scores were previously normalized by the ending frame only
  - Caused by a bug introduced in mid-2005
- All 1st-stage searches use the same score logging functions
  - Includes align, allphone, decode_anytopo, decode
  - matchseg_write and match_write are the current versions
  - log_* is still used but will soon be totally replaced

Page 13: Scores we see (Change 2)

- Multi-stream GMM computation (ms_gauden)
  - By default, it no longer quantizes log pdfs to 8 bits
- Single-stream GMM computation
  - Vectors with zero means and variances are removed (-remove_zero_var_gau)
- Scores and performance will change
  - Testing resources have changed. (Evandro grins at this point)

Page 14: Scores we see (Change 3)

- Sphinx now supports generation of different hypseg formats (-hypseg_fmt)
  - SPHINX 2 format
  - SPHINX 3 format
  - ctm format
- These always require more processing, but it is better than nothing.
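As an illustration of the kind of post-processing involved, here is a small sketch that turns a word segment into a ctm-style line. The field order follows the usual NIST CTM convention (file, channel, start time, duration, word, optional confidence). The exact output Sphinx writes is not shown in these slides, and the function name, the channel value and the rate of 100 frames per second are assumptions.

```python
def ctm_line(utt_id, word, start_frame, num_frames,
             conf=None, channel="1", frame_rate=100):
    """Format one word segment as a ctm-style line.

    frame_rate (frames per second) is an assumed value used to convert
    frame indices into seconds.
    """
    start = start_frame / frame_rate
    duration = num_frames / frame_rate
    fields = [utt_id, channel, "%.2f" % start, "%.2f" % duration, word]
    if conf is not None:
        fields.append("%.3f" % conf)
    return " ".join(fields)


plain = ctm_line("utt001", "hello", 12, 35)
scored = ctm_line("utt001", "world", 47, 50, conf=0.9)
print(plain)
print(scored)
```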

Page 15: Scores – a summary

- Unnormalized (true) acoustic and language scores are generated (-hypsegscore_unscale) by
  - 1st-stage search, and
  - Best-path search right after the 1st stage
- Normalized acoustic scores are generated by
  - Lattice generation
- If developers want true scores in the lattice
  - They can get the best senone scores from the decoder (-bestsenscrdir) and do their own processing
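The "own processing" could look roughly like the sketch below. It assumes that normalization means subtracting the best senone score of each frame from the acoustic score, so the true score of a segment is recovered by adding those per-frame values back. The function name is hypothetical, and the best senone scores are assumed to be already loaded into a list indexed by frame; the file layout written for -bestsenscrdir is not covered here.

```python
def unnormalized_score(normalized_score, best_senone_scores, start_frame, end_frame):
    """Add back the per-frame best senone scores over [start_frame, end_frame].

    Assumption: the normalized score was obtained by subtracting the best
    senone score of every frame the segment covers.
    """
    return normalized_score + sum(best_senone_scores[start_frame:end_frame + 1])


best_senone_scores = [-50, -40, -60, -55]  # made-up values, one per frame
true_score = unnormalized_score(-30, best_senone_scores, start_frame=1, end_frame=2)
print(true_score)
```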

Page 16: Other important bug fixes

- Bug in vithist.c
  - It caused an assertion failure that stopped the recognizer
  - Now fixed: it returns an error message to the search abstraction routine

Page 17: Attempts in search algorithm improvements (Mode 3)

- Flat-lexicon decoder
  - Search implementation is completed
  - decode can now use flat-lexicon decoding (-op_mode 3)
- Decoder revamping is completed
  - Mode 2 (FST)
  - Mode 3 (Flat-lexicon)
  - Mode 4 (Ravi’s Tree-Lexicon)
  - Mode 5 (Arthur’s Tree-Lexicon)
- decode_anytopo is still there for backward compatibility
  - decode_anytopo = decode in mode 3

Page 18: No Further Re-factoring

- Avoid re-factoring before the next check-in
- align and allphone have different input/output file formats
  - It doesn’t make sense to stuff them into a single executable
  - Using an XML configuration and control file would be one option
  - But it takes too much time to implement

Page 19: Algorithmic Work - Flat Lexicon Decoder

- Full triphone is completed in flat-lexicon decoding
  - 2.5% relative improvement in accuracy
  - But it requires 100xRT (urgh)
  - Useful for debugging
- Also considered a full trigram implementation
  - It would result in another 5-10 times slowdown
- Conclusion
  - Flat lexicon search has come to its limit
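To make the numbers concrete, the arithmetic can be written out as below. xRT is taken in its usual sense of processing time divided by audio duration. The 100xRT figure and the 5-10 times slowdown come from the slide; the 10-second utterance and the 20.0% baseline error rate are made-up values, and reading "relative improvement" as a relative reduction in error rate is an assumption.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """xRT: how many seconds of compute are spent per second of audio."""
    return processing_seconds / audio_seconds


def relative_improvement(baseline_error, new_error):
    """Relative reduction in error rate (0.025 means 2.5% relative)."""
    return (baseline_error - new_error) / baseline_error


flat_triphone_xrt = 100                            # from the slide
trigram_xrt = (flat_triphone_xrt * 5, flat_triphone_xrt * 10)

audio_seconds = 10.0                               # hypothetical utterance length
decode_seconds = flat_triphone_xrt * audio_seconds

print(real_time_factor(decode_seconds, audio_seconds))
print(trigram_xrt)
print(relative_improvement(20.0, 19.5))            # hypothetical error rates
```

At 100xRT a 10-second utterance takes about 1000 seconds to decode, and a further 5-10 times slowdown would put the full trigram variant at 500-1000xRT.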

Page 20: Algorithmic Work - Tree Lexicon Decoder

- Current full triphone implementation
  - Has flaws in score propagation
- Tree copies
  - No time to do it at all; Q4’s workload nearly killed AC
- Benchmarking results
  - GALE results: Full Lexicon = Tree Lexicon
  - CALO/Communicator results: Tree Lexicon is 5% relative poorer
- Conclusion
  - Half a year on search is expected to give us another 5%

Page 21: Conclusion on Search

- Need to seriously consider: is working on search a good idea?
- In both CALO and GALE, the gains come from
  - SAT and cross adaptation
  - Second-stage processing
    - Confusion network
    - Confidence annotation
    - First-stage SD -> Second-stage SA
- VTLN also gives only 5% relative, but it takes only 5 days to implement

Page 22: Sphinx on Different Text Encodings

- There is already non-CMU work for
  - Spanish
  - French
- Big question mark
  - Could it work on other encodings?

Page 23: Sphinx on Mandarin (gb2312)

Page 24: Sphinx on Mandarin (cont.)

- Thanks to Ravi
- Bugs we fixed to get it through
  - 1236322: libutil\str2words special character bug
  - 1236166: special character wasn't supported
- This should give us a fairly good foundation to start on most languages
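The slides do not say what the two bugs actually were. As a general illustration of what encoding-safe word splitting needs to do, the sketch below splits a line on ASCII whitespace only and treats every other byte, including bytes with the high bit set, as part of a word. The function name `split_words` and the dictionary entry (including its pronunciation) are hypothetical; this is not the libutil code.

```python
ASCII_WHITESPACE = b" \t\r\n"


def split_words(line):
    """Split raw bytes into words on ASCII whitespace only.

    Bytes >= 0x80, such as the two-byte characters of GB2312, are never
    treated as separators, so multi-byte words stay intact.
    """
    words = []
    current = bytearray()
    for byte in line:
        if byte in ASCII_WHITESPACE:
            if current:
                words.append(bytes(current))
                current = bytearray()
        else:
            current.append(byte)
    if current:
        words.append(bytes(current))
    return words


# Hypothetical dictionary entry: a Mandarin word followed by made-up phones.
entry = "中国 zh ong g uo\n".encode("gb2312")
fields = split_words(entry)
print(fields[0].decode("gb2312"), [f.decode("ascii") for f in fields[1:]])
```

Working on raw bytes keeps the splitter independent of the text encoding, as long as the encoding never reuses ASCII whitespace values inside a multi-byte character, which holds for GB2312.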

Page 25: Summary of Sphinx in Fall 2005

- We have done something
- Strong focus on search research doesn’t seem to get us far
- There are fires to fight on the modeling side
- Sounds like the time to check in and move on

Page 26: Progress of Sphinx 3.X (From X=5 to X=6)

Page 27: Progress of Sphinx 3.X (From X=5 to X=6)

- New Features (4 slides)
  - Items that are significant
- Gentle, mild and simple re-factoring and its consequence (4 slides)
- Documentation (1 slide)
- Regression testing (1 slide)
- Pruned Features ?

Page 28: New Features (Search)

- Speed
  - Further enhancement of CIGMMS
  - BBI tree implementation (by Dave, in SphinxTrain)
- Search
  - FST search
  - Full triphone implementation in decode_anytopo
  - Separation of search abstraction/implementation in 3.X

Page 29: New Features (Adaptation)

- Adaptation
  - Multiple classes for MLLR (by Dave)
  - MAP adaptation (by Dave, in SphinxTrain)

Page 30: New Features (Others)

- New executables
  - lm_convert
    - lm3g2dmp++
  - dp
    - If Evandro asks, “Why do we need dp in sphinx 3?”
    - Say this: “I don’t know, we found the executable at ./s3/src/misc/dp.c”
  - conf
    - Off-line word-level confidence annotation program
- Dict-LM mismatch
  - Unmatched entries can be automatically generated (-lts_mismatch)
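The first half of that job, finding the mismatched entries, can be sketched as a set difference between the LM vocabulary and the dictionary. This is an illustration only: the function name and word lists are hypothetical, skipping tags written in angle brackets is an assumption, and the pronunciation generation itself (the flag name suggests letter-to-sound rules) is not shown.

```python
def find_mismatches(lm_words, dict_words):
    """Return LM words that have no dictionary entry, sorted.

    Assumption: tokens such as <s>, </s> and <UNK> are tags rather than
    words, so they are skipped.
    """
    known = set(dict_words)
    missing = set()
    for word in lm_words:
        is_tag = word.startswith("<") and word.endswith(">")
        if not is_tag and word not in known:
            missing.add(word)
    return sorted(missing)


lm_words = ["<s>", "</s>", "hello", "sphinx", "pittsburgh", "carnegie"]
dict_words = ["hello", "sphinx"]
missing = find_mismatches(lm_words, dict_words)
print(missing)
```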

Page 31: Gentle, mild and simple re-factoring (GMM computation)

- GMM computation is now shared among decode, decode_anytopo, align, allphone
- So, e.g.,
  - decode_anytopo can use fast GMM computation
  - decode can use SCHMM

Page 32: Gentle, mild and simple re-factoring (Search)

- Its consequence in search programming:
  - FST, Flat, and Tree search now share the same interface (decode)
    - Just like Sphinx 2 and 4
  - Writing a new search no longer means replacing an existing search
  - The 2nd stage now works for decode
    - Alright, not for FST search
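The idea of several searches behind one interface can be sketched as a table that maps an operation mode to an implementation. This is only an illustration in Python; the real code is C and its structures are not shown in these slides. The mode numbers 2, 3 and 4 follow the slide on -op_mode, while the class names, `register_search` and the placeholder `decode` bodies are hypothetical.

```python
class Search:
    """Interface that every search implementation is expected to provide."""
    name = "abstract"

    def decode(self, frames):
        raise NotImplementedError


class FstSearch(Search):
    name = "fst"

    def decode(self, frames):
        return "%s search over %d frames" % (self.name, len(frames))


class FlatLexiconSearch(Search):
    name = "flat-lexicon"

    def decode(self, frames):
        return "%s search over %d frames" % (self.name, len(frames))


class TreeLexiconSearch(Search):
    name = "tree-lexicon"

    def decode(self, frames):
        return "%s search over %d frames" % (self.name, len(frames))


SEARCH_MODES = {2: FstSearch, 3: FlatLexiconSearch, 4: TreeLexiconSearch}


def register_search(op_mode, search_class):
    """Add a new search without touching the existing ones."""
    if op_mode in SEARCH_MODES:
        raise ValueError("mode %d is already taken" % op_mode)
    SEARCH_MODES[op_mode] = search_class


def decode(op_mode, frames):
    """Single entry point: pick the search for op_mode and run it."""
    if op_mode not in SEARCH_MODES:
        raise ValueError("unknown op_mode %d" % op_mode)
    return SEARCH_MODES[op_mode]().decode(frames)


print(decode(3, [0.0] * 10))
```

A new search is added by calling `register_search` with an unused mode number, which matches the point that writing a new search does not require replacing an existing one.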

Page 33: Gentle, mild and simple re-factoring (Others)

- Score output is now rationalized
- Several bugs causing seg faults are eliminated
  - vithist.c bugs
- Class-based LM is now working correctly
- Command-line options among applications are now synchronized and re-factored

Page 34: Documentation/Tutorial

- Hieroglyph
  - Now writing the 2nd draft
- Doxygen documentation (by Evandro)
- Tutorial now works
- archive_s3
  - Sphinx 2
  - Sphinx 3
  - Sphinx 4

Page 35: Regression Testing

- Our weakest link
- Now daily
  - Standard regression test is done
  - Performance check on Communicator/TIDIGITS/TI46
  - Doxygen documentation will be made and tested
- make check now has 50 tests (3.5: 11)
  - Fairly robust to careless mistakes

Page 36: Expected Trimmed Features

- Search
  - Mode 0: alignment (?)
  - Mode 1: allphone
  - Mode 5: word tree copies
    - If full triphone in Ravi’s tree search can’t be done quickly, trim it as well (?)
- Yitao’s PCFG rescoring

Page 37: Conclusion of Sphinx 3.X (From X=5 to X=6)

- We have done something
  - Development last year has enriched the code
  - Niceified a lot of things internal to the code
- There are hiccups in our development
  - Not perfect
  - Well, compare this with NASDAQ.

Page 38: Discussion: What should we do now?

- Option 1: keep on working without a release
- Option 2: merge the crazy branch with the trunk without a release
- Option 3: merge the crazy branch with the trunk and create a release candidate, Sphinx 3.6 RCI

Page 39: End