Building Tools to Data Mine Unstructured Text using a Machine Learning API

Upload: ai-one

Post on 19-Jun-2015

Category:

Technology


DESCRIPTION

ai-one's Topic-Mapper API enables programmers to develop software that can learn like a human. This presentation describes how to build an application to mine data from unstructured text using a combination of machine learning, natural language processing (NLP) and clustering technologies. The source code for this text analytics program is available as a reference design for others to modify to meet specific industry and use case requirements.

TRANSCRIPT

Page 1: Building Tools to Data Mine Unstructured Text using a Machine Learning API

Learn | Solve | Evolve | Inspire

© ai-one inc. 2011

Outline of discussion
• Topic-Mapper: ai-one for Text
• Overview of ai-Browser

ai-one™ biologically inspired intelligence

Building Tools to Data Mine Unstructured Text using a Machine Learning API

February 2012

Page 2: Building Tools to Data Mine Unstructured Text using a Machine Learning API

Agenda
• ai-one technology & company
• Topic-Mapper: A machine learning API
• ai-Browser: A prototype application
  – Research tool for knowledge workers
  – Combines NLP and other technologies
  – Enables machine-human collaboration
  – Reference design can be modified for specific domains
  – Source code included with Topic-Mapper SDK

Page 3: Building Tools to Data Mine Unstructured Text using a Machine Learning API

biologically inspired intelligence

creativity | logic

Page 4: Building Tools to Data Mine Unstructured Text using a Machine Learning API

Seeking early-adopter customers who will use technology to gain competitive advantage.

Positioned as an “easy step” to build mainstream artificial intelligence applications to understand unstructured text.

ai-one | where we are

Page 5: Building Tools to Data Mine Unstructured Text using a Machine Learning API

• Technology licensing ONLY
  – NO professional services
  – NO end-user applications

• We focus on continuous evolution of core technology

• Our consulting and OEM partners focus on application of technology to solve problems

ai-one | our business model

Page 6: Building Tools to Data Mine Unstructured Text using a Machine Learning API

ai-one’s biologically inspired intelligence

Our technology “learns” dynamically like you do, enabling you to extract the inherent intelligence from any content

Text docs, Twitter, RSS, FB feeds, data

associations | relevance | patterns

Learning | Intelligence | Reading

Page 7: Building Tools to Data Mine Unstructured Text using a Machine Learning API

… it provides answers to questions you didn't know you wanted to ask….

our technology | ai-one description

ai-one’s technology is an adaptive holosemantic dataspace (“biologically inspired intelligence”) that allows users to quickly analyze and discover meaningful patterns in interleaved text, time-related data, and images.

The holosemantic dataspace (HSDS) provides complex AI with reasoning and learning capability.

Page 8: Building Tools to Data Mine Unstructured Text using a Machine Learning API

ai-one HSDS “Sensors”: Text, Images, Signal Processing

API

Smallest Input = Data Quant

Topic-Mapper for Text
• Text analytics
• Genomics
Available NOW

UltraMatch for Computer Vision
• Image recognition
• Robotics
Estimated availability July 2012

Graphalizer for Signal Processing
• Financial markets
• N-dimensional time series
Planned for 2013

ai-one SDKs | APIs for building learning machines

Three products, each optimized for the unique “grammar” of its type of data.

Page 9: Building Tools to Data Mine Unstructured Text using a Machine Learning API

the fundamental theory | our universal sales proposition

• Self-optimized information processing

• Self-controlled content organization

• Multiple higher-order concept formation

• Autonomic learning via recognition of multiple contexts

• Self-generalization of learned concepts

biologically inspired intelligence

Page 10: Building Tools to Data Mine Unstructured Text using a Machine Learning API

big discovery | new form of machine learning

Holosemantic dataspace (HSDS)
• Operates at byte level
• Data agnostic
• Any language, any sensor, any type of digital image
• “Listens” to data
• Records every unique byte pattern only once
• Heterarchical and hierarchical structures, temporal & spatial
• Detects how every byte pattern relates to every other
• Autonomic: no training or human intervention
• Modeled on neurophysiology

Page 11: Building Tools to Data Mine Unstructured Text using a Machine Learning API

Artificial Intelligence
• Questionable reputation
• Widely used, nobody knows
• Mostly used in static areas where models, data & requests do not change fast
• Needs domain expertise to set up and host; needs a lot of mentoring
• Behavior of the solutions is not very consistent and not close to human behavior or decisions

Biologically Inspired Intelligence
• New approach; new to market
• First to market w/ SDK
• Useful in dynamic areas where models, data and requests change rapidly
• No domain expertise needed. Less than a day to train a developer. Less than a day to build apps.
• Behavior is very similar to how a human would behave or decide.

BII | a disruptive, revolutionary invention

Page 12: Building Tools to Data Mine Unstructured Text using a Machine Learning API

Topic-Mapper | our product for human languages

• Generates lightweight ontology

• Contextual learning

• Finds patterns

• Easy to combine with Natural Language Processing (NLP) and other technologies

• Provides inherent semantic associative search and phonetic analysis

• Human language independent

• Requires only basic structuring of input text (XML)

• Ongoing/incremental learning

• Works with and without external ontologies (RDF, OWL, etc.)

Page 13: Building Tools to Data Mine Unstructured Text using a Machine Learning API

… language is not math ….

1. Detects more words of higher relevance
2. Faster processing of the corpus
3. Much faster incremental updates
4. Enables NLP to find patterns and learn without human intervention
5. Works on very small data sets (e.g. Tweets)

= Faster implementation of semantic solutions

Topic-Mapper | benefits

Page 14: Building Tools to Data Mine Unstructured Text using a Machine Learning API

• ai-one core Text library (out-of-process COM server)
  – .NET 3.5 CLR wrapper (dll)
• Small footprint instantiation (<700k)
• API documentation & developer’s guide
• Code examples
  – REST & SOAP deployments
  – BrainBrowser application for pattern search on Internet
  – BrainView application to visualize lightweight ontologies (LWO)
• BrainBoard workbench application for rapid proof of concept development
• Text focused support libraries and tools to assist in text preparation, processing, parsing, and loading into ai-one

Topic-Mapper | technical description

Page 15: Building Tools to Data Mine Unstructured Text using a Machine Learning API

BrainBoard | Topic-Mapper prototyping tool

Page 16: Building Tools to Data Mine Unstructured Text using a Machine Learning API

Topic-Mapper | semantic commands

• Association: returns the associative network for semantic correlation with the (one or more) input words; referred to as “brainstorm”.

• AssociationReverse: the inverse of Association; referred to as “focus”.

• AssociationCheck: returns a list of all associative paths between two input words (source and target).
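
To make the semantics of these commands concrete, here is a minimal Python sketch. It is not the Topic-Mapper API: the functions `build_graph`, `association` and `association_check` are hypothetical names, and a plain sentence-level co-occurrence graph stands in for whatever ai-one actually learns. AssociationReverse is left out because the slide does not define its behavior beyond "the inverse of Association".

```python
from collections import defaultdict
from itertools import combinations

def build_graph(sentences):
    """Toy stand-in for learned associations: link words that share a sentence."""
    graph = defaultdict(set)
    for sentence in sentences:
        words = set(sentence.lower().split())
        for a, b in combinations(sorted(words), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def association(graph, *words):
    """'Brainstorm': every word linked to at least one of the input words."""
    found = set()
    for word in words:
        found |= graph.get(word, set())
    return found - set(words)

def association_check(graph, source, target, max_len=4):
    """All simple paths from source to target that use at most max_len words."""
    paths = []

    def walk(path):
        if path[-1] == target:
            paths.append(path)
            return
        if len(path) == max_len:
            return
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in path:  # never revisit a word on the same path
                walk(path + [nxt])

    walk([source])
    return paths

corpus = ["whisky contains alcohol", "alcohol harms liver", "liver filters blood"]
graph = build_graph(corpus)
print(association(graph, "alcohol"))
print(association_check(graph, "whisky", "liver"))
```

The path search is capped with `max_len` because the number of simple paths in a dense graph grows very quickly; a real system would presumably rank or prune paths rather than enumerate them all.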

Page 17: Building Tools to Data Mine Unstructured Text using a Machine Learning API

Topic-Mapper | semantic commands

• Statistic: returns frequency counts for the input word; counts total occurrences, gives subtotals by structure, and includes handles for each structure.

• KeyWords: given a pointer to a context, returns the words and a score indicating the semantic significance between the words and the information contained within the context.

• Phonetic: returns a list of words with phonetic similarity to the input word; includes a score for each word.
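
As a rough illustration of what Statistic and Phonetic return, the following Python sketch uses hypothetical functions (`statistic`, `phonetic_key`, `phonetic`) rather than the SDK's own calls. Structure handles are plain strings here, and the phonetic key is a deliberately crude consonant-grouping rule chosen for brevity; the slides do not say which phonetic algorithm Topic-Mapper uses. KeyWords is omitted because its scoring method is not described.

```python
from collections import Counter
from difflib import SequenceMatcher

def statistic(word, structures):
    """Count a word in total and per structure.

    structures maps a handle (here just a string id) to that structure's text.
    """
    word = word.lower()
    by_structure = {}
    for handle, text in structures.items():
        count = Counter(text.lower().split())[word]
        if count:
            by_structure[handle] = count
    return {"total": sum(by_structure.values()), "by_structure": by_structure}

# Consonants that sound alike are mapped to one representative letter.
_GROUPS = {**dict.fromkeys("bfpv", "b"), **dict.fromkeys("cgjkqsxz", "k"),
           **dict.fromkeys("dt", "d"), **dict.fromkeys("mn", "m")}

def phonetic_key(word):
    """Crude key: keep the first letter, drop later vowels and h/w/y,
    merge similar consonants, collapse repeats."""
    word = word.lower()
    key = word[:1]
    for ch in word[1:]:
        if ch in "aeiouhwy":
            continue
        code = _GROUPS.get(ch, ch)
        if not key.endswith(code):
            key += code
    return key

def phonetic(word, vocabulary, threshold=0.8):
    """Words whose phonetic key resembles the input's key, each with a 0..1 score."""
    key = phonetic_key(word)
    scored = [(w, SequenceMatcher(None, key, phonetic_key(w)).ratio())
              for w in vocabulary if w != word]
    return sorted([p for p in scored if p[1] >= threshold], key=lambda p: -p[1])

docs = {"doc1:p1": "Alcohol harms the liver",
        "doc1:p2": "The liver filters blood",
        "doc2:p1": "alcohol and more alcohol"}
print(statistic("alcohol", docs))
print(phonetic("smith", ["smyth", "jones", "smithe"]))
```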

Page 18: Building Tools to Data Mine Unstructured Text using a Machine Learning API

• StopWords{Get|Set|Erase}: maintenance of a stop word list. Stop words are words found in the dataspace but not used for any of the semantic commands.

• Context{Get|Set|Erase|Find}: maintenance of contexts; contexts are bags of words which, by definition, have a strong relation among themselves.

• ContextTighten: increases the semantic relation within the reference handle.

• Relation{Get|Set|Erase|Find}: maintenance of relational triples: subject, object and predicate. Used to teach explicit relationships from sources like thesauri, taxonomies, and ontologies.

Topic-Mapper SDK | teaching commands
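
The teaching commands manage three kinds of taught knowledge: a stop word list, contexts (bags of related words) and relational triples. A minimal Python sketch of those data structures follows. The class `TeachingStore` and its method names are hypothetical and only mirror the command families listed on the slide; they are not the SDK's data model, and ContextTighten is not modeled because its effect depends on the dataspace itself.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Relation:
    """One explicit triple, e.g. taken from a thesaurus or taxonomy."""
    subject: str
    predicate: str
    obj: str

@dataclass
class TeachingStore:
    """Hypothetical container for taught knowledge."""
    stop_words: set = field(default_factory=set)
    contexts: dict = field(default_factory=dict)  # handle -> bag of words
    relations: set = field(default_factory=set)

    def stop_words_set(self, words):
        self.stop_words |= {w.lower() for w in words}

    def context_set(self, handle, words):
        # A context is a bag of words that belong together; stop words are skipped.
        self.contexts[handle] = {w.lower() for w in words} - self.stop_words

    def context_find(self, word):
        # Handles of every context that contains the word.
        return sorted(h for h, bag in self.contexts.items() if word.lower() in bag)

    def relation_set(self, subject, predicate, obj):
        self.relations.add(Relation(subject, predicate, obj))

    def relation_find(self, subject=None, predicate=None, obj=None):
        # None acts as a wildcard for that position of the triple.
        return [r for r in self.relations
                if subject in (None, r.subject)
                and predicate in (None, r.predicate)
                and obj in (None, r.obj)]

store = TeachingStore()
store.stop_words_set(["the", "and"])
store.context_set("liver-health", ["the", "liver", "alcohol", "cirrhosis"])
store.relation_set("cirrhosis", "is_a", "liver disease")
print(store.context_find("alcohol"))
print(store.relation_find(predicate="is_a"))
```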

Page 19: Building Tools to Data Mine Unstructured Text using a Machine Learning API

ai-Browser | combines NLP with Topic-Mapper to find, compare, understand documents

Screenshot callouts: Select Article; Extract raw text; Display text for selected Keyword; Associations for selected Keyword

Page 20: Building Tools to Data Mine Unstructured Text using a Machine Learning API

ai-Browser | architecture

Topic-Mapper

NLP: works with any NLP API (e.g., OpenNLP or NLTK)

Ontology: works with any OWL, RDF or unstructured (free-form text) source as an ontology

Clustering: works with any tool that can read an XGMML or PMML file (Cytoscape, MATLAB, etc.)

“language is not math; rules-based and statistical approaches fail to deliver when the problem is complex, chaotic or data is very dynamic”

Combines four technologies to understand complex behavior and language.

Page 21: Building Tools to Data Mine Unstructured Text using a Machine Learning API

ai-Browser | Use Cases

Proves that the technology can find the single best answer from millions of choices by combining human knowledge (free text ontology) with a reference point.

• Medical research (PubMed)

• Finding the best job candidate (LinkedIn)

• Finding the ideal matching item in classifieds (Craigslist)

• Create searchable topic maps for conversations (Twitter, talk radio)

Transforms search into research.

Page 22: Building Tools to Data Mine Unstructured Text using a Machine Learning API

ai-Browser | What’s Inside?
Finds what you need, not what you ask for.

Add Knowledge (Free Form Text or Ontology)
• Define concepts that are important to you.
• Introduce additional knowledge.
• Learn from external sources.

Filter (OpenNLP)
• Categorize content based on rules
• NLP is trained to understand parts of speech
• Manually updated and developed

Find Relationships (Topic-Mapper)
• Get all connections between all words
• Identify Keywords
• Identify Associations

Page 23: Building Tools to Data Mine Unstructured Text using a Machine Learning API

NLP | Processes Structured Language

• Categorizing words into parts of speech
• Provides rules of grammar for language
• Enables machine to understand structure of language
• Provides named-entity extraction
• Filtering of Topic-Mapper results
• Domain lexicon

ai-Browser uses NLP to pre-process text to isolate nouns, verbs and modifiers
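
The pre-processing step can be sketched in Python as a filter over part-of-speech tagged tokens. This is an illustration only: the function `isolate_content_words` is hypothetical, the tokens are tagged by hand with Penn Treebank tags (the tag set used by OpenNLP's and NLTK's default English taggers), and treating "modifiers" as adjectives plus adverbs is an assumption.

```python
# Penn Treebank tag prefixes: NN* nouns, VB* verbs, JJ* adjectives, RB* adverbs.
CONTENT_TAG_PREFIXES = ("NN", "VB", "JJ", "RB")

def isolate_content_words(tagged_tokens):
    """Keep only nouns, verbs and modifiers from (word, tag) pairs."""
    return [(word, tag) for word, tag in tagged_tokens
            if tag.startswith(CONTENT_TAG_PREFIXES)]

# Hand-tagged example; in practice the tags would come from an NLP library.
tagged = [("The", "DT"), ("liver", "NN"), ("slowly", "RB"),
          ("filters", "VBZ"), ("toxic", "JJ"), ("blood", "NN"), (".", ".")]

print([word for word, _ in isolate_content_words(tagged)])
```

Dropping determiners, prepositions and punctuation before loading text reduces the number of words that later have to be handled as stop words.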

Page 24: Building Tools to Data Mine Unstructured Text using a Machine Learning API

ontology | provides domain expertise
• Enables faster incremental learning, precision on small data sets
• Add enterprise and public domain knowledge
• Add user generated knowledge to enhance desired patterns
• Generate LWOs from document(s)
• Increases pattern and term relevance for higher keyword rankings for search engines

ai-Browser uses ontologies to sharpen results – especially valuable for small texts (like tweets)

Page 25: Building Tools to Data Mine Unstructured Text using a Machine Learning API

clustering | finds similarity of meaning
• Enables customers to develop proprietary models
• Data mining applications
• Enables graph analysis with off-the-shelf tools via an XGMML and/or PMML representation of the HSDS
• Useful for:
  – Reporting
  – Visualizing light-weight ontology
  – Comparing multiple documents
  – Knowledge representation

ai-Browser works with many analytical tools to post-process into clusters for reporting, further analysis, etc.

We use Cytoscape.org! (but you can use others)
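
To show what handing results to a graph tool might look like, here is a small Python sketch that writes word associations as an XGMML document using only the standard library. The function `to_xgmml`, the sample edges and the `weight` attribute are assumptions made for illustration; the slides do not show ai-Browser's actual export code or schema.

```python
import xml.etree.ElementTree as ET

XGMML_NS = "http://www.cs.rpi.edu/XGMML"

def to_xgmml(name, edges):
    """Serialize (source_word, target_word, weight) triples as an XGMML graph."""
    edges = list(edges)
    ET.register_namespace("", XGMML_NS)
    graph = ET.Element(f"{{{XGMML_NS}}}graph", {"label": name, "directed": "0"})

    # First pass: one <node> per distinct word, with a numeric id.
    ids = {}
    for source, target, _ in edges:
        for word in (source, target):
            if word not in ids:
                ids[word] = str(len(ids) + 1)
                ET.SubElement(graph, f"{{{XGMML_NS}}}node",
                              {"id": ids[word], "label": word})

    # Second pass: one <edge> per association, with the weight as an <att>.
    for source, target, weight in edges:
        edge = ET.SubElement(graph, f"{{{XGMML_NS}}}edge",
                             {"source": ids[source], "target": ids[target],
                              "label": f"{source} - {target}"})
        ET.SubElement(edge, f"{{{XGMML_NS}}}att",
                      {"name": "weight", "type": "real", "value": str(weight)})
    return ET.tostring(graph, encoding="unicode")

associations = [("alcohol", "liver", 0.9), ("liver", "blood", 0.6)]
xgmml = to_xgmml("lightweight-ontology", associations)
print(xgmml)
```

Saving the returned string to a `.xgmml` file should be enough for a viewer such as Cytoscape to import it as a network.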

Page 26: Building Tools to Data Mine Unstructured Text using a Machine Learning API

Topic-Mapper | roadmap
• ai-one develops a series of minimally viable products (MVP) to generate customer interest
• ai-one licenses source code to others to modify
• Extends functionality by compensating for the “Fail Modes” of existing technologies
• Potential MVP Applications
  – Automated ontology builder
  – Data mining free form text (AI search)
  – Data aggregation
  – Data cleansing
  – Automated RDF tagging
  – Genome sequence assembly & analysis

Page 27: Building Tools to Data Mine Unstructured Text using a Machine Learning API

Topic-Mapper | technical limitations
• 32-bit, single thread with 4 GB capacity per instance
• Only knows what you feed it
  – Machine learning ≠ computer programming
  – Influence results… not control them
• Deployment options to overcome limitations:
  – Moving windows or full-capture
  – Series, parallel or single instance
  – REST or SOAP

Next Step: 64-bit, multithread version to be released in mid-2012 with 18 exabyte capacity/instance
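
The two capacity figures line up with the sizes of 32-bit and 64-bit address spaces, which a few lines of Python can confirm. This is plain arithmetic, not a statement about how Topic-Mapper allocates memory; the helper `address_space_bytes` is introduced here only for the calculation.

```python
def address_space_bytes(bits):
    """Number of distinct byte addresses reachable with a pointer of this width."""
    return 2 ** bits

GIB = 2 ** 30        # binary gigabyte
EXABYTE = 10 ** 18   # decimal exabyte

print(address_space_bytes(32) / GIB)      # 32-bit limit in GiB
print(address_space_bytes(64) / EXABYTE)  # 64-bit limit in decimal exabytes
```

A 32-bit pointer reaches 2^32 = 4,294,967,296 bytes, which is exactly 4 GiB. A 64-bit pointer reaches 2^64 bytes, about 18.4 × 10^18, which matches the "18 exabyte" figure when exabytes are counted in decimal.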

Page 28: Building Tools to Data Mine Unstructured Text using a Machine Learning API

ai-one | success stories
• Early customers
  – Manor, SwissPort, BKA, global telecom carrier, others.
• Significant funding to maintain long-term focus on transitioning invention into innovation
• Our technology shows promise to disrupt markets
  – Data mining
  – Text analytics
  – Bioinformatics
  – Knowledge management
  – Personalized medicine
  – Behavioral marketing

Page 29: Building Tools to Data Mine Unstructured Text using a Machine Learning API

ai-one | how to start using technology

1. Contact ai-one to schedule personalized demonstration

2. Develop use cases detailing data sources, problem sets and desired outcomes

3. Attend ai-one training seminar

4. Refine use cases and project plan

5. License ai-one technology & source code for sample application(s)

6. Build your app!

Most developers can build an app within a day.

Page 30: Building Tools to Data Mine Unstructured Text using a Machine Learning API

ai-one | summary

• Big idea: machines can learn like humans!

• Startup with solid funding & initial customers

• “Lean startup” model to develop customers

• Technology is a general use technology – not an end-user application.

• Extends capabilities of existing programming languages.

• Takes less than a day for a developer to start building apps – but requires a “different way of thinking”

Page 31: Building Tools to Data Mine Unstructured Text using a Machine Learning API

Contact

ai-one inc.
World Headquarters
5711 La Jolla Blvd., Bird Rock
La Jolla, CA 92037
United States of America
cell: +13232365938
direct: +18583815897
main: +18583641951

ai-one ag
Flughofstrasse 55, Zürich-Kloten
8152 Glattbrugg
Switzerland
cell: +41794000589
main: +41448284530

ai-one gmbh
Koenigsallee 35a, Grunewald
14193 Berlin
Germany
cell: +4915112830531
main: +493047890050

Olin Hyde
Business [email protected]

Page 32: Building Tools to Data Mine Unstructured Text using a Machine Learning API

ai-one | additional information & case studies

Page 33: Building Tools to Data Mine Unstructured Text using a Machine Learning API

YouTube Channel Demos & Videos: http://www.youtube.com/user/semsys

MIT Forum Jan 17, 2012 Presentation: http://prezi.com/k1hsog309uji/ai-one-presentation-to-mit-enterprise-forum/

Case Study and Technical Evaluation in International Journal of Knowledge Management (Sept 2011): http://www.irma-international.org/viewtitle/56362/

ai-one | additional information

Page 34: Building Tools to Data Mine Unstructured Text using a Machine Learning API

Case Study - SEMPER Project
Concept Based Retrieval and Lightweight Ontologies

The SEMPER Team is creating an interactive, web-based platform for out-patient assistance for alcohol dependency and work-related disorders.

"Learning a Lightweight Ontology for Semantic Retrieval in Patient-Centered Information Systems".

Prof. Dr. Ulrich Reimer, University of Applied Sciences St. Gallen et al.

In this paper, Prof. Reimer describes the use of ai-one (Association command) to learn associative nets of related terms to build “lightweight ontologies”, and then how the team created “seed concepts” of overlapping related terms with the teaching commands to give the content a notion of relevance. A keyword query then returned content that included related concepts.

The paper also describes the testing of the ai-one approach versus the classical cosine similarity measure on a tf-idf document term matrix.
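
For readers unfamiliar with the baseline mentioned here, the following Python sketch computes cosine similarity between tf-idf document vectors using only the standard library. It is a generic textbook version with one common weighting (relative term frequency times log inverse document frequency), not the paper's exact setup; the function names and sample documents are made up for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """One sparse tf-idf vector (dict term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
                        for term, count in counts.items()})
    return vectors

def cosine(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

docs = ["alcohol dependency therapy",
        "therapy for alcohol dependency",
        "shoe track database"]
vectors = tfidf_vectors(docs)
print(cosine(vectors[0], vectors[1]))  # shares three terms with document 0
print(cosine(vectors[0], vectors[2]))  # no shared terms
```

Because this measure only rewards shared terms, two documents about the same concept that use different vocabulary score zero, which is the gap that retrieval over related concepts aims to close.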

Page 35: Building Tools to Data Mine Unstructured Text using a Machine Learning API

ASTIS™: Automatic Shoe Track Information System. Followed by a test with the University of Lausanne, CSI Lab.

Page 36: Building Tools to Data Mine Unstructured Text using a Machine Learning API

Genome SQJV with ibionics / UNI Wildau

Improving the matching quality and dramatically increasing the speed per analysis by using ai-one™ technology for pattern analysis and matching.

In addition, the HSDS can be used as a cellular data space for medical prognostics.

The HSDS offers a perfect environment for modeling patient health and producing a “weather report” for it:

“In Silico Care Cycles”!