An intro to text analytics on big data with a use case

Posted on 15-Jan-2015


DESCRIPTION

An introduction to performing text analytics on input from Twitter, using the Emmys as the use-case example.

TRANSCRIPT

#TOSMAC

Toronto SMAC Meetup – Welcome!

An Intro to Text Analytics on Big Data with a use case


Toronto SMAC Team

© 2014 IBM Corporation

Lucas Silva, Felipe Mosquetta, Marcos de Mello


Twitter's numbers


As you know:

- 500 million Tweets are sent per day.

- Twitter supports 35+ languages.

- 255 million monthly active users.

That is a huge amount of data!
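Assuming, unrealistically, that Tweets arrive at an even rate over the whole day, the 500 million per day figure works out to several thousand Tweets every second. A minimal sketch of that arithmetic:

```python
# Average Tweet rate implied by the slide's "500 million Tweets per day".
# Assumption: a uniform rate over 24 hours (real traffic is bursty,
# especially during live events such as an awards show).
TWEETS_PER_DAY = 500_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_tweets_per_second(tweets_per_day: int = TWEETS_PER_DAY) -> float:
    """Return the mean number of Tweets per second for a daily volume."""
    return tweets_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    rate = average_tweets_per_second()
    print(f"about {rate:,.0f} Tweets per second on average")  # about 5,787
```

Peaks during live events will be well above this average, which is why the input has to be treated as a stream of big data rather than a handful of documents.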


Overview

Section 1, Section 2, Section 3, Section 4, Section 5


Let’s get started!


Input data


Section 2


Demo


Next section


Extractor: used to extract structured information from unstructured and semi-structured data.

AQL: Annotation Query Language, a rule language with a familiar SQL-like syntax.
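To make the extractor idea concrete, here is a minimal Python sketch (not AQL) of what a single extraction rule does: it scans unstructured text and emits structured rows of (begin, end, text), loosely like the spans an AQL view produces. The `Span` class, the `extract_regex` helper and the sample tweet are hypothetical illustrations; in AQL the same rule would be declared as a view over the document rather than written as a function.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Span:
    """One extracted row: character offsets into the text plus the covered text."""
    begin: int
    end: int
    text: str


def extract_regex(pattern: str, document: str) -> List[Span]:
    """Apply one regex rule to unstructured text and return structured rows."""
    return [Span(m.start(), m.end(), m.group()) for m in re.finditer(pattern, document)]


if __name__ == "__main__":
    tweet = "Watching the #Emmys tonight with @friend! #TV"
    # Hypothetical rule: pull hashtags out of a raw tweet.
    for row in extract_regex(r"#\w+", tweet):
        print(row.begin, row.end, row.text)
```

Keeping the offsets, not just the matched text, is what lets later rules combine and compare matches by position.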


Profiler: used to troubleshoot performance problems.
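As a rough, tool-independent illustration of what profiling an extractor means (this is not the profiler mentioned above), the Python sketch below times each rule over a batch of documents and lists the slowest first; that ranking is what points you to the rule causing a performance problem. The rule names, patterns and sample text are assumptions for illustration only.

```python
import re
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical extraction rules; each maps one document to its list of matches.
RULES: Dict[str, Callable[[str], List[str]]] = {
    "hashtag": lambda doc: re.findall(r"#\w+", doc),
    "mention": lambda doc: re.findall(r"@\w+", doc),
    "money": lambda doc: re.findall(r"\$\s?\d+(?:\.\d+)?(?:\s(?:thousand|million))?", doc),
}


def profile_rules(
    rules: Dict[str, Callable[[str], List[str]]],
    documents: List[str],
    repeat: int = 3,
) -> List[Tuple[str, float]]:
    """Time each rule over all documents; keep the best of `repeat` runs, slowest rule first."""
    timings: Dict[str, float] = {}
    for name, rule in rules.items():
        best = float("inf")
        for _ in range(repeat):
            start = time.perf_counter()
            for doc in documents:
                rule(doc)
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return sorted(timings.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    docs = ["Sample text #Emmys @friend $7.5 million"] * 10_000
    for name, seconds in profile_rules(RULES, docs):
        print(f"{name:10s} {seconds * 1000:8.2f} ms")
```

Taking the best of several runs reduces noise from other processes; absolute numbers will vary by machine, so only the relative ranking is meaningful.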


Main concepts


Types of extraction specifications:

- Dictionaries

- Regular expressions

- Part of speech
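A minimal Python sketch of the first two kinds of specification, assuming a tiny hand-made dictionary (`AWARD_TERMS`, hypothetical) and a simple number pattern. Part-of-speech specifications need a trained tagger, so they are left out here.

```python
import re
from typing import Iterable, List, Tuple

Match = Tuple[int, int, str]  # (begin, end, matched text)

# Dictionary specification: a hypothetical list of award-show terms.
AWARD_TERMS = ["Emmys", "red carpet", "best actress"]

# Regular-expression specification: whole or decimal numbers.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def dictionary_matches(text: str, terms: Iterable[str]) -> List[Match]:
    """Case-insensitive, whole-word lookup of every dictionary entry."""
    hits: List[Match] = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        hits += [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text, re.IGNORECASE)]
    return sorted(hits)


def regex_matches(text: str) -> List[Match]:
    """Every number found by the NUMBER pattern."""
    return [(m.start(), m.end(), m.group()) for m in NUMBER.finditer(text)]


if __name__ == "__main__":
    tweet = "Emmys night: 3 wins and 1.5 hours of Red Carpet"
    print(dictionary_matches(tweet, AWARD_TERMS))
    print(regex_matches(tweet))
```

Dictionaries suit closed lists of known terms, while regular expressions suit open-ended patterns such as numbers; real extractors usually combine both.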


Example numbers: 7.54, 13


Demo


AQL Guidelines

Basic feature AQL statements: develop the core building blocks of the extractor.
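For the money example used in this talk ($7.5 million, $4 thousand), the basic features could be a currency symbol, a number, and a unit word. The Python sketch below is an assumption, not the AQL used in the demo: it finds each building block independently as (begin, end, text) spans, and the helper names are hypothetical.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int, str]  # (begin, end, covered text)


def find(pattern: str, text: str, flags: int = 0) -> List[Span]:
    """Return every match of `pattern` in `text` as a (begin, end, text) span."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text, flags)]


# Basic features: small, reusable building blocks (hypothetical names).
def currency_symbols(text: str) -> List[Span]:
    return find(r"\$", text)


def numbers(text: str) -> List[Span]:
    return find(r"\d+(?:\.\d+)?", text)


def units(text: str) -> List[Span]:
    return find(r"\b(?:thousand|million|billion)\b", text, re.IGNORECASE)


if __name__ == "__main__":
    text = "$7.5 million and $4 thousand"
    for feature in (currency_symbols, numbers, units):
        print(feature.__name__, feature(text))
```

Each feature stays deliberately simple and knows nothing about money as a whole; combining them is left to the next layer of rules.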


Candidate generation AQL statements: combine the basic feature AQL statements.


Candidate generation AQL statements

$7.5 million, $4 thousand

$ 7.5 million


$7.5 million
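Candidate generation can be sketched in Python by joining basic features that follow one another, roughly what an AQL rule joining features with an adjacency condition would do. The `follows` helper and its `max_gap` of one whitespace character are assumptions; they accept both "$7.5 million" and "$ 7.5 million" style input, and the sketch intentionally also emits shorter overlapping candidates such as "$7.5", which a later consolidation step would have to resolve.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int, str]  # (begin, end, covered text)


def find(pattern: str, text: str, flags: int = 0) -> List[Span]:
    """Return every match of `pattern` in `text` as a (begin, end, text) span."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text, flags)]


def follows(left: Span, right: Span, text: str, max_gap: int = 1) -> bool:
    """True if `right` starts after `left` with at most `max_gap` whitespace characters between."""
    gap = text[left[1]:right[0]]
    return left[1] <= right[0] and len(gap) <= max_gap and gap.strip() == ""


def money_candidates(text: str) -> List[Span]:
    """Combine symbol, number and unit features into candidate money mentions."""
    symbols = find(r"\$", text)
    nums = find(r"\d+(?:\.\d+)?", text)
    units = find(r"\b(?:thousand|million|billion)\b", text, re.IGNORECASE)
    candidates: List[Span] = []
    for s in symbols:
        for n in nums:
            if not follows(s, n, text):
                continue
            candidates.append((s[0], n[1], text[s[0]:n[1]]))  # symbol + number, e.g. "$7.5"
            for u in units:
                if follows(n, u, text):
                    candidates.append((s[0], u[1], text[s[0]:u[1]]))  # e.g. "$7.5 million"
    return sorted(candidates)


if __name__ == "__main__":
    for candidate in money_candidates("$7.5 million and $ 4 thousand"):
        print(candidate)
```

Generating generously and cleaning up afterwards keeps each rule simple; the price is overlapping candidates that must be resolved later.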


AQL Guidelines

Filter and consolidate AQL statements:

- Refine results.

- Remove invalid annotations.

- Resolve overlap between annotations.
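A minimal Python sketch of this last step, assuming the candidates have already been generated (the list below is hand-written and hypothetical). A filter removes annotations that fail a validity rule (here, an assumed rule that an amount must contain a non-zero digit), and consolidation drops any span that lies inside a different, covering span, similar in spirit to AQL's `ContainedWithin` consolidation policy.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int, str]  # (begin, end, covered text)


def filter_spans(spans: List[Span], is_valid: Callable[[Span], bool]) -> List[Span]:
    """Remove invalid annotations according to a caller-supplied rule."""
    return [s for s in spans if is_valid(s)]


def consolidate_contained_within(spans: List[Span]) -> List[Span]:
    """Resolve overlap: drop every span that lies inside a different span covering it."""
    unique = sorted(set(spans))
    return [
        s for s in unique
        if not any(o != s and o[0] <= s[0] and s[1] <= o[1] for o in unique)
    ]


def has_nonzero_digit(span: Span) -> bool:
    """Hypothetical validity rule: an amount must contain a non-zero digit."""
    return any(ch in "123456789" for ch in span[2])


if __name__ == "__main__":
    # Hand-written candidates, as a candidate-generation step might produce them.
    candidates = [
        (0, 4, "$7.5"), (0, 12, "$7.5 million"),
        (17, 20, "$ 4"), (17, 29, "$ 4 thousand"),
        (35, 37, "$0"),
    ]
    valid = filter_spans(candidates, has_nonzero_digit)
    print(consolidate_contained_within(valid))
```

The pairwise containment check is quadratic, which is fine for a sketch; a production consolidation would sort by offsets and sweep once.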


Demo


Conclusion


Checkpoint


What we have done


Section 1, Section 2, Section 3


What are we going to do?


Section 4, Section 5


Demo


Also using R


What are we going to do?


Demo


So what?


Companies


Exporting to you


Thank you! Let's network!
