Getting the Results You Want From Unstructured Data: An Overview for Developers

Upload: alchemyapi

Post on 30-Jul-2015

Category: Data & Analytics


TRANSCRIPT

Page 1: Getting the Results You Want From Unstructured Data

Getting the Results You Want From Unstructured Data: An Overview for Developers

Page 2: Getting the Results You Want From Unstructured Data

Meet the Expert

Aaron Chavez, VP of Engineering

#datatoresults

Page 3: Getting the Results You Want From Unstructured Data

Pioneer of web services for real-time text and image analysis

•  Founded in 2005
•  40,000+ users
•  Used in 36+ countries
•  More than 1 billion API calls monthly
•  Deep learning experts
•  Recently acquired by IBM

Page 4: Getting the Results You Want From Unstructured Data

Poll

How many of you have an NLP/AI application planned, in development, or in use?

A.  We’re trying to get an idea how we could use deep learning.
B.  We have a napkin sketch and see potential.
C.  We have code written and are testing it.
D.  We have an application ready and in use.

Page 5: Getting the Results You Want From Unstructured Data

What You’ll Learn Today

1.  What types of problems are better solved by NOT using AI/NLP

2.  How to assess your approach from a qualitative and quantitative perspective

3.  The best way to test what you are developing

Page 6: Getting the Results You Want From Unstructured Data

Expect to Be Surprised

Page 7: Getting the Results You Want From Unstructured Data

Bad Surprises

•  Machines haven’t replaced us yet; every system has its quirks and shortcomings, and that’s OK!
•  If it has to be perfect, don’t use an intelligent system. If it can be reduced to a process, it doesn’t require intelligence.
•  Use the results in aggregate
•  Use the results in conjunction with human expertise
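
The last two points can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Prediction` record, its `confidence` field, and the 0.8 threshold are hypothetical and do not come from any particular API. The idea is to send low-confidence results to a person and to report label proportions over the whole batch rather than trusting any single result.

```python
from collections import Counter
from typing import NamedTuple


class Prediction(NamedTuple):
    # Hypothetical record; real services name and scale these fields differently.
    doc_id: str
    label: str
    confidence: float  # assumed to lie in [0, 1]


def triage(predictions, threshold=0.8):
    """Split predictions into auto-accepted ones and ones needing human review."""
    accepted = [p for p in predictions if p.confidence >= threshold]
    review = [p for p in predictions if p.confidence < threshold]
    return accepted, review


def label_shares(predictions):
    """Aggregate individual labels into overall proportions."""
    counts = Counter(p.label for p in predictions)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Made-up predictions, for illustration only.
predictions = [
    Prediction("a", "positive", 0.95),
    Prediction("b", "negative", 0.55),
    Prediction("c", "positive", 0.91),
    Prediction("d", "positive", 0.62),
]

accepted, review = triage(predictions)
shares = label_shares(predictions)
```

Here documents "b" and "d" would go to a reviewer, while the aggregate view reports the share of each label across the batch.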

Page 8: Getting the Results You Want From Unstructured Data

Good Surprises

You need to experiment because there is so much out there. Many services that you never would have thought possible are actually possible.

•  Question Answering: Get information through voice recognition capabilities. For example, ask your phone to navigate you to a location, or ask for data found on the web: “Who is the president of the United States?”
•  Websites that track infectious diseases or “crisis data” in real time
•  Less esoteric applications, such as learning about a sales prospect and their company’s recent business decisions

Roadmaps and trajectories are important. Even if something is impossible today, it might be just around the corner.

Page 9: Getting the Results You Want From Unstructured Data

IBM’s Watson for Oncology

Memorial Sloan Kettering and IBM are collaborating to train IBM Watson to help doctors identify treatment options for patients with cancer and assist in vital research.

Medical imaging analysis + Machine learning + Computer vision + Medical expertise

Page 10: Getting the Results You Want From Unstructured Data

Assess Qualitatively and Quantitatively

Page 11: Getting the Results You Want From Unstructured Data

Qualitative

•  Is this tool really trying to solve the same problem you want it to solve?
•  Is it feature-complete?
•  Informal testing will not bring you hard numbers, but you can work through the checklist

Page 12: Getting the Results You Want From Unstructured Data

Examples of Qualitative Exploration

•  Does it accept my data as-is?
•  Can I use existing data instead of finding it myself?
•  NLP: does it support the language(s) I need?
•  Is there scoring/ranking that allows for fine-tuning of results?
•  “Tagging” versus “Classifying” versus “More like this”
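
To make the scoring/ranking question concrete, here is a minimal sketch, assuming a tagging service returns each tag with a relevance score between 0 and 1. The tag names, scores, and the `filter_by_score` helper are all hypothetical. With scores available, you can choose a cutoff that suits your application instead of accepting every tag.

```python
def filter_by_score(tags, min_score):
    """Keep tags scoring at least min_score, ordered from highest score to lowest."""
    kept = [(name, score) for name, score in tags if score >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Made-up (tag, relevance score) pairs, for illustration only.
tags = [
    ("webinar", 0.33),
    ("machine learning", 0.92),
    ("napkin", 0.12),
    ("IBM", 0.71),
]

broad = filter_by_score(tags, 0.5)   # more tags, more noise
strict = filter_by_score(tags, 0.9)  # fewer tags, higher relevance
```

A tool that returns only bare labels, with no scores, leaves you without this knob.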

Page 13: Getting the Results You Want From Unstructured Data

Quantitative

Do not rely on your gut. Your product is more important than that.

–  Be scientific
–  Don’t get hung up on the “quirks” of these systems

You can’t “feel” the difference between 70% accurate and 80% accurate, but such margins can separate success and failure.
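
One way to be scientific about this is to score the system against labels you have checked yourself and to ask how much that score can be trusted. The sketch below is an assumption-laden illustration, not a prescribed method: the function names are hypothetical, and the interval uses the simple normal approximation, which is rough for small samples.

```python
import math


def accuracy(predicted, expected):
    """Fraction of predictions that match your own definition of correct."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def confidence_interval(acc, n, z=1.96):
    """Approximate 95% interval for an accuracy measured on n examples."""
    half_width = z * math.sqrt(acc * (1 - acc) / n)
    return max(0.0, acc - half_width), min(1.0, acc + half_width)


# Made-up labels, for illustration only.
predicted = ["sports", "tech", "tech", "politics", "sports"]
expected = ["sports", "tech", "politics", "politics", "sports"]
score = accuracy(predicted, expected)

# How well separated are 70% and 80% at different test-set sizes?
small_70 = confidence_interval(0.70, 50)
small_80 = confidence_interval(0.80, 50)
large_70 = confidence_interval(0.70, 1000)
large_80 = confidence_interval(0.80, 1000)
```

With 50 test examples the two intervals overlap, so a 70% system and an 80% system cannot be reliably told apart; with 1,000 examples they separate cleanly. The practical point is to test on enough of your own data for the difference to show.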

Page 14: Getting the Results You Want From Unstructured Data

#deeplearningseries

Test It EXACTLY How You Will Use It

Page 15: Getting the Results You Want From Unstructured Data

Your Data, Your Application

»  Use YOUR data, based on
»  YOUR definition of correct, for
»  YOUR use case

Page 16: Getting the Results You Want From Unstructured Data

Your Data, Your Application

–  Obviously, accuracy matters, but what else?

–  If you plan to run the service in live applications, test for reliability

–  If you need real-time results, test for latency

–  If you want to process high volumes of data, test for throughput
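
The three checks above can be measured with a small harness like the following sketch. Everything here is an assumption for illustration: `fake_analyze` is a hypothetical stand-in for whatever service call you are evaluating, and the harness issues calls one at a time, so throughput under concurrent load would have to be measured separately.

```python
import math
import time


def percentile(sorted_values, fraction):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(fraction * len(sorted_values)))
    return sorted_values[rank - 1]


def measure(call, inputs):
    """Run call() on each input; report reliability, latency, and throughput."""
    if not inputs:
        raise ValueError("need at least one input")
    latencies, failures = [], 0
    started = time.perf_counter()
    for item in inputs:
        t0 = time.perf_counter()
        try:
            call(item)
        except Exception:
            failures += 1  # count the failure and move on to the next input
            continue
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - started
    latencies.sort()
    successes = len(inputs) - failures
    return {
        "success_rate": successes / len(inputs),
        "p50_seconds": percentile(latencies, 0.50) if latencies else None,
        "p95_seconds": percentile(latencies, 0.95) if latencies else None,
        "throughput_per_second": successes / elapsed if elapsed > 0 else None,
    }


def fake_analyze(text):
    # Hypothetical stand-in for a real API call; fails on empty documents.
    if not text:
        raise ValueError("empty document")
    return {"length": len(text)}


report = measure(fake_analyze, ["first doc", "", "third doc", "fourth doc"])
```

Swapping `fake_analyze` for a real client call, and the sample list for your own documents, gives numbers that reflect how you will actually use the service.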

Page 17: Getting the Results You Want From Unstructured Data

Test Holistically

–  Don’t test an intermediate result when you can test the whole

–  What if your goal is to show a better ad using text classification?

•  Don’t just measure the accuracy of a text classifier

•  Measure the overall improvement in the system when you add text classification
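
For the ad example, a holistic test compares the end metric with and without the classifier in the loop. The sketch below assumes click-through rate is the metric you care about and that you have logged impressions for both variants; the function names and the numbers are made up for illustration.

```python
def click_through_rate(impressions):
    """impressions: list of booleans, True where the shown ad was clicked."""
    if not impressions:
        raise ValueError("need at least one impression")
    return sum(impressions) / len(impressions)


def relative_lift(baseline, variant):
    """Relative change in click-through rate from baseline to variant."""
    base = click_through_rate(baseline)
    if base == 0:
        raise ValueError("baseline has no clicks; relative lift is undefined")
    return (click_through_rate(variant) - base) / base


# Made-up logs: ad selection without and with text classification.
without_classifier = [True] * 2 + [False] * 98
with_classifier = [True] * 3 + [False] * 97

lift = relative_lift(without_classifier, with_classifier)
```

The classifier's own accuracy never appears in this calculation. A highly accurate classifier that does not move the end metric is not helping, and a mediocre one that does may be good enough.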

Page 18: Getting the Results You Want From Unstructured Data

Use Case

Spiderbook Redefines CRM to be Customer Relationship Discovery

“The problem I ran into was that most NLP and named entity recognition algorithms had been developed using pristine data sets, hand-curated for test suites.”

“Those algorithms are unable to accurately analyze the content you find on the Web, which is not perfectly written articles, blog posts or tweets.”

Aman Naimat, Spiderbook co-founder

Page 19: Getting the Results You Want From Unstructured Data

Next Steps

Page 20: Getting the Results You Want From Unstructured Data

Next Steps

•  Run the Alchemy demos for language, vision, face detection
   •  http://www.alchemyapi.com/products/demo
•  Let your imagination run!
•  Access these resources on our website:
   •  Get started with the guide: http://www.alchemyapi.com/developers/getting-started-guide/
   •  SDKs available at: https://github.com/AlchemyAPI
•  Test deep learning with your own applications:
   •  Free API Key: http://www.alchemyapi.com/api/register.html
•  Need help? Contact [email protected]

Page 21: Getting the Results You Want From Unstructured Data

What We’ve Covered

1.  What types of problems are better solved by NOT using AI/NLP

2.  How to assess your approach from a qualitative and quantitative perspective

3.  The best way to test what you are developing

Page 22: Getting the Results You Want From Unstructured Data

Q&A

Page 23: Getting the Results You Want From Unstructured Data

Contact us

[email protected]

www.alchemyapi.com

You will receive an email with a recording of this webinar, the slides and additional resources soon.

Thank you for attending!
