How To Model Text Like A Rock Star

Post on 17-Jul-2015

Category: Data & Analytics

Ben Taylor, Chief Data Scientist

How To Model Text Like A Rock Star

Chemical Engineering (BS/MS/PhD…*)

Twitter: @bentaylordata

LinkedIn: bentaylordata

Modeling Numeric Data Is Easy

Text Applications?

• Stock
• Resumes
• Cover letters
• Logs

The Basics Of Document Modeling

UNSTRUCTURED

STRUCTURED

Tokenized

Rich Text

From: Mamatha Devineni Ratnam <mr47+@andrew.cmu.edu>

Subject: Pens fans reactions

Organization: Post Office, Carnegie Mellon, Pittsburgh, PA

Lines: 12

NNTP-Posting-Host: po4.andrew.cmu.edu

I am sure some bashers of Pens fans are pretty confused about the lack

of any kind of posts about the recent Pens massacre of the Devils. Actually,

I am bit puzzled too and a bit relieved. However, I am going to put an end

to non-PIttsburghers' relief with a bit of praise for the Pens. Man, they

are killing those Devils worse than I thought. Jagr just showed you why

he is much better than his regular season stats. He is also a lot

fo fun to watch in the playoffs. Bowman should let JAgr have a lot of

fun in the next couple of games since the Pens are going to beat the pulp out of Jersey anyway.

I was very disappointed not to see the Islanders lose the final

regular season game. PENS RULE!!!

rec.sport.hockey

From: mblawson@midway.ecn.uoknor.edu (Matthew B Lawson)

Subject: Which high-performance VLB video card?

Summary: Seek recommendations for VLB video card

Nntp-Posting-Host: midway.ecn.uoknor.edu

Organization: Engineering Computer Network, University of Oklahoma, Norman, OK, USA

Keywords: orchid, stealth, vlb

Lines: 21

My brother is in the market for a high-performance video card that supports

VESA local bus with 1-2MB RAM. Does anyone have suggestions/ideas on:

- Diamond Stealth Pro Local Bus

- Orchid Farenheit 1280

- ATI Graphics Ultra Pro

- Any other high-performance VLB card

Please post or email. Thank you!

- Matt

--

| Matthew B. Lawson <------------> (mblawson@essex.ecn.uoknor.edu) |

--+-- "Now I, Nebuchadnezzar, praise and exalt and glorify the King --+--

| of heaven, because everything he does is right and all his ways |

| are just." - Nebuchadnezzar, king of Babylon, 562 B.C. |

comp.sys.ibm.pc.hardware

Weak Text Example

Load Example Dataset
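The posts quoted above come from the rec.sport.hockey and comp.sys.ibm.pc.hardware newsgroups, so one plausible way to load the example dataset is scikit-learn's `fetch_20newsgroups`. The slide's own code is not in the transcript; what follows is a minimal sketch under that assumption, and the `load_newsgroups` helper and `CATEGORIES` constant are illustrative names, not the speaker's.

```python
# Assumption: the example dataset is the 20 Newsgroups corpus, restricted to
# the two groups quoted on the slides. Helper and constant names are hypothetical.
CATEGORIES = ("comp.sys.ibm.pc.hardware", "rec.sport.hockey")


def load_newsgroups(subset="train", categories=CATEGORIES):
    """Return (texts, labels, label_names) for the chosen newsgroups."""
    # Imported lazily so this module loads even without scikit-learn installed;
    # the first call downloads the corpus and caches it locally.
    from sklearn.datasets import fetch_20newsgroups

    bunch = fetch_20newsgroups(subset=subset, categories=list(categories))
    return bunch.data, bunch.target, list(bunch.target_names)
```

Calling `texts, labels, names = load_newsgroups()` would give one raw post per entry of `texts` (headers included, as in the excerpts above) and one integer class label per post.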

CountVectorizer
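To make the idea behind `CountVectorizer` concrete, here is a small bag-of-words sketch in plain Python. It is not scikit-learn's implementation; it only mirrors two of that class's defaults (lowercasing, and keeping tokens of two or more word characters). The function names are illustrative, and the two documents are sentences taken from the posts quoted earlier.

```python
import re
from collections import Counter

# Tokens of two or more word characters, matched after lowercasing.
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def count_vectorize(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in doc i."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    column = {term: j for j, term in enumerate(vocabulary)}
    matrix = [[0] * len(vocabulary) for _ in docs]
    for i, doc_counts in enumerate(counts):
        for term, n in doc_counts.items():
            matrix[i][column[term]] = n
    return vocabulary, matrix


docs = [
    "Jagr just showed you why he is much better than his regular season stats.",
    "My brother is in the market for a high-performance video card",
]
vocabulary, matrix = count_vectorize(docs)
```

Each row of `matrix` is one document and each column is one vocabulary term, which is the structured form the unstructured text is being mapped into.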

Term Frequency
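The transcript does not say which normalization the slide uses, so the sketch below assumes the simplest definition of term frequency: a term's count divided by the total number of tokens in the document. The function name is illustrative, and the sample tokens come from one of the hockey post's sentences.

```python
from collections import Counter


def term_frequencies(tokens):
    """Map each term to count / total tokens (assumed definition of TF)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: n / total for term, n in counts.items()}


tokens = "the pens are going to beat the pulp out of jersey anyway".split()
tf = term_frequencies(tokens)  # "the" appears 2 times out of 12 tokens
```

Dividing by document length keeps a long post from dominating a short one simply because it contains more words.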

How Can I Be Amazing?

<notebook>

Weak Text Example

Now let's really mess this up: reduce one class by an order of magnitude.

Does this model have any value?
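The notebook itself is not part of the transcript, so the following is only a sketch of the experiment described: shrink one class tenfold, then ask what accuracy a model earns by always predicting the majority class. The class sizes and function names are hypothetical.

```python
import random
from collections import Counter


def downsample_class(labels, target, factor, seed=0):
    """Return indices keeping 1/factor of the `target` class and every other row."""
    rng = random.Random(seed)
    target_idx = [i for i, y in enumerate(labels) if y == target]
    kept = set(rng.sample(target_idx, len(target_idx) // factor))
    return [i for i, y in enumerate(labels) if y != target or i in kept]


def majority_baseline_accuracy(labels):
    """Accuracy of a model that always predicts the most common class."""
    return max(Counter(labels).values()) / len(labels)


# Hypothetical balanced problem: 1,000 documents per class.
labels = [0] * 1000 + [1] * 1000
reduced = [labels[i] for i in downsample_class(labels, target=1, factor=10)]

baseline = majority_baseline_accuracy(reduced)  # 1000 / 1100
minority_recall = 0 / reduced.count(1)  # the constant model never predicts class 1
```

With these numbers the do-nothing baseline already scores about 91% accuracy while finding none of the minority class, so accuracy alone cannot answer whether the model has value; it has to be compared against this baseline or judged on a per-class metric.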

Problem Setup

• Piecemeal the structuring: final outputs are scalars

[Diagram: Audio, Video, and Text inputs; processing blocks labeled Signal Processing, Personality, and Expression Signal Processing]

Legend: us = unstructured data, ts = time series data, s = scalar data

FeatureGen

Raw Audio Indicators

• Engagement
• Motivation
• Distress
• Aggression

Model

Personality Models

FeatureGen

Video Indicators

Signal Processing

F989 F990 F991

scalar

Combining All Features

X

56.341 -200.45 0 1 2 4 60.71 12 52.15 -350.12 1 1

Feature Mapping: As the features are produced, they are stored in a matrix where each column represents a feature and each row represents an interview.

2 4 60.71 12 52.15 -350.12 1 0

2 3 16.16 21 25.51 -105.21 0 0 NANA

NA

NANA
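The feature mapping described above can be sketched as follows, assuming each interview arrives as a dictionary of scalar features and that a feature that could not be produced is stored as NaN (the slide's "NA"). The column names and the helper are hypothetical; the first row reuses values shown on the slide, and the second row's missing cells are illustrative because the slide's NA positions are not recoverable.

```python
import math

# Hypothetical column names; the slide shows the columns without labels.
FEATURE_NAMES = ["f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8"]


def build_feature_matrix(interviews, feature_names):
    """One row per interview, one column per feature, NaN where a feature is missing."""
    return [
        [float(features.get(name, math.nan)) for name in feature_names]
        for features in interviews
    ]


interviews = [
    # Values copied from one row of the slide.
    dict(zip(FEATURE_NAMES, [2, 4, 60.71, 12, 52.15, -350.12, 1, 0])),
    # Illustrative interview for which only some features were produced.
    {"f1": 2, "f2": 3, "f3": 16.16, "f4": 21},
]
X = build_feature_matrix(interviews, FEATURE_NAMES)
```

Keeping the missing cells explicit, rather than dropping the interview, leaves the choice of imputation or a NaN-tolerant model to the later modeling step.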

Features Extracted

70% retention of top performers, 75% reduction in total interview volume

10,000 interviews, 75% reduction: 2,500 interviews reviewed

30% increase in sourcing to hit goals: 10,000 => 13,000, 2,500 => 3,250

Total savings: (10,000 - 3,250) / 10,000 = 67.50%
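The savings arithmetic on this slide can be checked with a few lines of Python. The function and parameter names are illustrative; the inputs are the slide's own figures.

```python
def review_savings(baseline_interviews, volume_reduction, sourcing_increase):
    """Return (sourced, reviewed, savings) for a screening scenario.

    sourced:  interviews collected after increasing sourcing
    reviewed: interviews still reviewed by people after the model's reduction
    savings:  fraction of the original review workload that is avoided
    """
    sourced = baseline_interviews * (1 + sourcing_increase)
    reviewed = sourced * (1 - volume_reduction)
    savings = (baseline_interviews - reviewed) / baseline_interviews
    return sourced, reviewed, savings


# Slide figures: 10,000 interviews, 75% reduction, 30% more sourcing.
sourced, reviewed, savings = review_savings(10_000, 0.75, 0.30)
summary = f"{sourced:,.0f} sourced, {reviewed:,.0f} reviewed, {savings:.2%} saved"
```

The extra sourcing raises the reviewed count from 2,500 to 3,250, which is why the net saving is 67.5% rather than the headline 75%.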

PEOPLE ARE WHO THEY ARE, NOT WHAT THEY WRITE

VOICES . EXPERIENCES . PASSION . POTENTIAL

1. Tyler Penman

2. Riley Kurts

3. Jace Kendall

4. Ric Fratus

5. Jennifer Lee

6. Benjamin Dickson

@bentaylordata

btaylor@hirevue.com

Questions?
