
Page 1:

A Parallel Implementation of Conditional Random Fields

This was an AUSS/NIP project for the grant Developing an Entity Extractor for the Scalable Constructing of Semantically Rich Socio-Technical Network Data by Jana Diesner of UIUC.

Mostly I worked with Brent Fegley, her research assistant

This is machine learning, and thus NIP.

Page 2:

Motivations

For them: some of the problems they want to run take weeks with the serial code.

For us: machine learning is an obvious candidate for HPC.

Page 3:

What is CRF?

The original paper: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data by Lafferty, McCallum, and Pereira (2001)

Goal: (for example) add part-of-speech labels to text

Idea: use a hidden Markov process based on the previous labels. Optimize the transition weights in the model to get the predicted parts of speech as close to the correct values as possible, for example by maximizing the entropy of the solution.
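For reference, the linear-chain model from the Lafferty et al. paper can be written (with the notation slightly simplified) as

\[
p(y \mid x) \;=\; \frac{1}{Z(x)} \exp\Big( \sum_{t} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \Big),
\qquad
Z(x) \;=\; \sum_{y'} \exp\Big( \sum_{t} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \Big)
\]

where the f_k are feature functions over adjacent labels and the observed text, and training adjusts the weights \lambda_k to maximize the conditional log-likelihood of the labeled examples.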

Page 4:

Diesner specifics

The goal for Diesner's group is to identify entities within unlabeled text

95 training examples; roughly 2500 labeled paragraphs from WSJ

Two schemes: 'BOUNDARY' (5 labels) and 'CATEGORY' (95 labels) (much longer running)

<ENAMEX TYPE="ORGANIZATION:CORPORATION">Intel Corp.</ENAMEX> reported a <NUMEX TYPE="PERCENT">50 %</NUMEX> drop in <TIMEX TYPE="DATE:DATE">third-quarter</TIMEX> net income , partly because of a <NUMEX TYPE="CARDINAL">one</NUMEX>-time charge for discontinued operations . The big semiconductor and computer <ENAMEX TYPE="ORG_DESC:CORPORATION">maker</ENAMEX> , said it had net of <NUMEX TYPE="MONEY">$ 72 million</NUMEX> , or <NUMEX TYPE="MONEY">38 cents</NUMEX> , down<NUMEX TYPE="PERCENT">50 %</NUMEX> from <NUMEX TYPE="MONEY">$ 142.7 million</NUMEX> , or <NUMEX TYPE="MONEY">78 cents</NUMEX> a share . The lower net included a charge of <NUMEX TYPE="MONEY">$ 35 million</NUMEX> , equal to <NUMEX TYPE="MONEY">12 cents</NUMEX> a share on an after-tax basis , for the cost of abandoning a computer-systems joint venture with <ENAMEX TYPE="ORGANIZATION:CORPORATION">Siemens AG</ENAMEX> of <ENAMEX TYPE="GPE:COUNTRY">West Germany</ENAMEX> . Earning also fell from the <TIMEX TYPE="DATE:DATE">year-ago</TIMEX> period because of slowing microchip demand .

Page 5:

Sarawagi Implementation of CRF

Diesner et al. use a Java implementation developed by Sunita Sarawagi of IITB around 2006

DataIter iterates over DataSequence instances

– so a DataSequence is some labeled text (see the sketch below)

FeatureGenerator takes a DataSequence and provides a set of features

– e.g. text tags or parts of speech

CRF.Trainer optimizes the weights
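As a concrete (if simplified) picture of the data side, here is a minimal sketch of a DataSequence-style container. The interface below is an illustrative stand-in for the iitb.CRF types, not their exact method names or signatures; check Sarawagi's javadoc for the real API.

// Illustrative stand-in for Sarawagi's DataSequence contract, not the real interface.
import java.util.List;

interface LabeledSequence {
    int length();                           // number of tokens in the sequence
    Object token(int i);                    // observed item (e.g. word) at position i
    int label(int i);                       // label id at position i
    void setLabel(int i, int labelId);      // trainer/decoder writes predicted labels back
}

// One labeled paragraph: tokens plus one integer label per token.
class Paragraph implements LabeledSequence {
    private final List<String> tokens;
    private final int[] labels;

    Paragraph(List<String> tokens, int[] labels) {
        this.tokens = tokens;
        this.labels = labels.clone();
    }
    public int length()                 { return tokens.size(); }
    public Object token(int i)          { return tokens.get(i); }
    public int label(int i)             { return labels[i]; }
    public void setLabel(int i, int id) { labels[i] = id; }
}

A DataIter-style iterator would then simply walk a collection of such Paragraph objects and hand each one to the feature generator and the trainer.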

Page 6:

Illinois application

RoundRobinEstimator takes each set of 4 from the 5 training sets, trains the CRF, and tests the prediction against the remaining set.

Two cases: BOUNDARY (5 labels) and CATEGORY (95 labels)

Typical test case: 10 optimization steps for BOUNDARY, 3 for CATEGORY
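The round-robin evaluation is ordinary leave-one-fold-out cross-validation over the 5 training sets. A minimal sketch of the loop, with a hypothetical Learner interface standing in for the actual RoundRobinEstimator internals:

// Illustrative only: Learner.train/score are hypothetical stand-ins,
// not methods of the real RoundRobinEstimator.
import java.util.ArrayList;
import java.util.List;

class RoundRobinSketch {
    interface Learner<D, M> {
        M train(List<D> trainingData);
        double score(M model, List<D> heldOut);   // e.g. accuracy or F1 on the held-out fold
    }

    // Train on 4 of the 5 folds, test on the remaining one, average the scores.
    static <D, M> double crossValidate(List<List<D>> folds, Learner<D, M> learner) {
        double total = 0.0;
        for (int held = 0; held < folds.size(); held++) {
            List<D> train = new ArrayList<>();
            for (int f = 0; f < folds.size(); f++) {
                if (f != held) train.addAll(folds.get(f));   // everything except the held-out fold
            }
            M model = learner.train(train);
            total += learner.score(model, folds.get(held));
        }
        return total / folds.size();
    }
}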

I worked directly from their SVN repo using Eclipse

Maven for builds and dependency management

Page 7:

Serial Code Architecture

Basically, the problem is gradient descent in many dimensions.

Uses LBFGS

– A Java port of the good old Fortran code

– Has internal state, which frustrated Fegley's threading efforts

Calls alternate between LBFGS (setting next test point) and computeFunctionGradient (evaluating the sum)
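The driver loop looks roughly like the following. The solver interface is shown schematically; the actual Java L-BFGS port keeps internal state between calls and uses an iflag-style convention rather than this simplified step() method.

// Schematic only: Lbfgs is a stand-in for the real Java L-BFGS port.
class TrainingLoopSketch {
    interface Objective {
        // Fills grad in place and returns the objective value,
        // mirroring the role of Trainer.computeFunctionGradient().
        double valueAndGradient(double[] lambda, double[] grad);
    }
    interface Lbfgs {
        // Given the current point, value, and gradient, move lambda to the
        // next test point; returns false once converged.
        boolean step(double[] lambda, double f, double[] grad);
    }
    static void train(Objective obj, Lbfgs solver, double[] lambda, int maxIters) {
        double[] grad = new double[lambda.length];
        for (int iter = 0; iter < maxIters; iter++) {
            double f = obj.valueAndGradient(lambda, grad);  // sum over training examples
            if (!solver.step(lambda, f, grad)) break;       // L-BFGS picks the next lambda
        }
    }
}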

Page 8:

(Some of) Java's Support for Parallel Threads

A java.util.concurrent.ExecutorService maintains a pool of threads that take tasks from a parallel queue and return values via futures.

We can create one using newFixedThreadPool()

The threads and tasks have to be customized.

[Diagram: submitted tasks flow through the executor's work queue to the pool threads; each result comes back to the caller through a future (promise) queue.]
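A minimal, self-contained illustration of the pattern (not code from the project): submit Callables to a fixed pool and collect their results via Futures.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ExecutorSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);   // 4 worker threads
        List<Future<Double>> futures = new ArrayList<>();
        for (int task = 0; task < 8; task++) {
            final int id = task;
            Callable<Double> work = () -> id * 0.5;   // stand-in for a real per-task computation
            futures.add(pool.submit(work));           // task goes onto the work queue
        }
        double sum = 0.0;
        for (Future<Double> f : futures) {
            sum += f.get();                           // blocks until that task's result is ready
        }
        pool.shutdown();
        System.out.println("sum = " + sum);
    }
}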

Page 9:

Parallel Code Architecture

Introduce a new derived class:

ParallelTrainer extends Trainer

In ParallelTrainer.computeFunctionGradient(), training example terms are evaluated across threads.

Scalar values get returned as Futures.

The gradient vector must be returned via the calling parameter!

1. Make a new Thread class that accumulates gradient terms over its lifetime

2. Merge those values at the end of the iteration (see the sketch below)
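A minimal sketch of that accumulate-then-merge pattern. The names here (ExampleTerm, the chunking scheme) are hypothetical, and per-task local arrays stand in for the custom Thread class described above; the real ParallelTrainer integrates with Sarawagi's Trainer internals rather than this simplified interface.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelGradientSketch {
    // Per-example term: returns its value contribution and adds its gradient terms into grad.
    interface ExampleTerm {
        double addTo(double[] lambda, double[] grad);
    }

    static double computeFunctionGradient(List<ExampleTerm> examples, double[] lambda,
                                          double[] grad, int nThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        int chunk = (examples.size() + nThreads - 1) / nThreads;
        List<Future<Double>> partialValues = new ArrayList<>();
        List<double[]> partialGrads = new ArrayList<>();
        for (int start = 0; start < examples.size(); start += chunk) {
            final List<ExampleTerm> slice =
                examples.subList(start, Math.min(start + chunk, examples.size()));
            final double[] localGrad = new double[lambda.length];  // each worker accumulates privately
            partialGrads.add(localGrad);
            Callable<Double> work = () -> {
                double value = 0.0;
                for (ExampleTerm ex : slice) value += ex.addTo(lambda, localGrad);
                return value;                                      // scalar comes back via the Future
            };
            partialValues.add(pool.submit(work));
        }
        double value = 0.0;
        for (Future<Double> f : partialValues) value += f.get();   // wait for all workers
        java.util.Arrays.fill(grad, 0.0);
        for (double[] g : partialGrads)                            // merge gradients into the caller's array
            for (int k = 0; k < grad.length; k++) grad[k] += g[k];
        pool.shutdown();
        return value;
    }
}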

Page 10:

Results

Breakdown of time vs. task for the BOUNDARY and CATEGORY problems

Page 11:

Results (2)

Parallel speed-up vs. thread count for the two tasks. The BOUNDARIES task rapidly exhausts scalability, but the CATEGORIES task is still improving at 64 threads.

Training time vs. thread count for CATEGORIES. The blue line is for 16 cores/32 hyperthreads.

Page 12:

Drawbacks

The big issue: Sarawagi's API does not provide the FeatureGenerator until it is time to start training. (Very flexible.)

Internal state from Trainer is shared across the package.

This makes it difficult to create the threads early: they would have to have their internals replaced every iteration anyway.

Thus we create threads late, starting fresh every iteration. Very inefficient, but the overhead is tiny for realistic cases.

Page 13:

Where are we now?

I'm working with a new set of grad students to understand some variability.

Rounding error causes drift in the optimization trajectory: the parallel code accumulates the sums in a different order than the serial code, and floating-point addition is not associative.

The need to return the gradient vector makes a fully deterministic version very expensive
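A tiny, self-contained illustration (not project code) of that non-associativity: summing the same terms in a different grouping gives a different double.

class FloatOrderSketch {
    public static void main(String[] args) {
        double big = 1e16, small = 1.0;
        double leftToRight = (big + small) + small;    // the small terms are lost one at a time
        double smallFirst  = big + (small + small);    // grouping the small terms first keeps them
        System.out.println(leftToRight == smallFirst); // prints false: same terms, different sums
    }
}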

We've also shared ParallelTrainer with a group at CMU which uses Sarawagi's CRF implementation.