
Conversational A.I. Needs Meaning, Not Keywords: Part 1

John Ball · May 30

Understanding is deeper than just word matching.

As a cognitive scientist working on human-level conversational A.I., I often ask people why they use parts of speech in their model of language. After all, parts of speech duplicate definitions, belong to a parsing concept that has never been accurately implemented for any human language, and exclude meaning. Yet they are relentlessly taught to students instead of the meaning-based model.

Similarly, I ask people why they use word embeddings in their conversational systems, when embeddings take something we have exquisitely detailed knowledge of, a word, and convert it into a meaningless number in one or more dimensions. Worse, the numbers reflect the word's collocation with other words, not anything meaningful. They aren't accurate either, because languages don't work by collocation alone: phrases are key. And the numbers change depending on how and where the statistics are gathered. Do we really need to use out-of-context words in conversation?
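To make the collocation point concrete, here is a minimal sketch of how a static, word2vec-style embedding behaves: a fixed lookup table with one vector per word form, identical regardless of the sentence around it. The tiny three-dimensional vectors are invented purely for illustration.

import numpy as np

# One context-free vector per word form, covering every sense at once.
embeddings = {
    "bank":  np.array([0.21, -0.47, 0.05]),
    "river": np.array([0.18, -0.33, 0.40]),
    "money": np.array([-0.52, 0.11, 0.09]),
}

def embed(sentence: str) -> list[np.ndarray]:
    """Map each known word to its context-free vector."""
    return [embeddings[w] for w in sentence.lower().split() if w in embeddings]

v1 = embed("the river bank flooded")
v2 = embed("the bank raised money")
# "bank" gets exactly the same vector in both sentences:
assert np.array_equal(v1[1], v2[0])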

In science, when a model doesn't work, it is often tinkered with if no alternative is known. In the case of the solar system, circular planetary orbits were improved with epicycles, which were improved further with planets following multiple epicycles orbiting the Earth. Sadly, no matter how many epicycles were added, the model never accurately predicted planetary motion, because the model was fundamentally wrong.

In the world of conversational A.I., all platforms today seem enamored with intents. Intents allow a developer to map known strings of text to intents: the so-called "intent classification."
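As a hedged sketch of that pattern: the developer enumerates sample utterances per intent, and an incoming string is mapped to the intent whose samples share the most words with it. The intent names and samples below are invented for illustration, not taken from any particular platform.

# Toy intent classification by word overlap (ties resolved arbitrarily).
TRAINING = {
    "check_balance": ["what is my balance", "show my account balance"],
    "transfer_funds": ["send money to a friend", "transfer funds"],
}

def classify(utterance: str) -> str:
    """Return the intent whose sample utterances best overlap the input."""
    words = set(utterance.lower().split())
    def overlap(intent: str) -> int:
        return max(len(words & set(s.split())) for s in TRAINING[intent])
    return max(TRAINING, key=overlap)

print(classify("what is my balance please"))  # check_balance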


Why use intents? It comes back to word embeddings. We will see the parallels between the solar-system model and the use of word embeddings for conversation. It is fundamentally the wrong way to manage a conversation, for a number of reasons.

Welcome to the world of chatbots.

And welcome to the revolutionary world of conversational A.I., where it's claimed that "A.I. and sophisticated natural language processing" is in play (in the Figure 1 article)!

Figure 1. From www.consumersadvocate.org/chatbots. Note how the interactions are now "deeper than ever", presumably due to deep learning. But today's chatbots don't offer human-like interactions because they don't understand.

This series will explore the science and engineering behind today's disappointing conversational A.I., and how working without meaning results in a lack of generalization and a lack of understanding.

Introduction


The media reporting around artificial neural networks continues to push the idea that all manner of human-imitative problems are being solved by the "deep learning" breakthrough. But most of the scientists behind that technology acknowledge its severe limitations, such as Turing Award winner Yoshua Bengio:

"what is missing from current machine learning are understanding and generalizations beyond the training distribution"[i]

This is a severe limitation, because human language centers on communication: discourse encodes the meanings of words (semantics) in phrases (syntax) to convey them unambiguously in context.

The best model of semantics, syntax and context comes from Role and Reference Grammar (RRG), a linguistic model that was the first to explain the world's languages in a way that even computers can follow. My company, Pat Inc. (Pat), uses my brain theory (Patom theory) to eliminate the combinatorial explosion caused by the parsing model. Sets and lists alone, with phrase-template matching, are sufficient to understand human language. Patom theory resolves the combinatorial explosion of computational linguistics with different methods: converting rules to sets, decomposing everything possible, and resolving meaning separately from consolidating phrases. It all follows from the brain theory.
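One speculative toy reading of "sets and lists with phrase-template matching": word meanings live in sets rather than grammar rules, and a phrase is recognized when each slot of a template is satisfied by set membership. The sets and the single template here are invented; Pat's actual inventory and matching machinery are far richer.

# Phrase-template matching over sets instead of parse rules.
ANIMALS = {"cat", "rat", "dog"}
EAT_VERBS = {"ate", "eats", "devoured"}
DETERMINERS = {"the", "a"}

TEMPLATE = [DETERMINERS, ANIMALS, EAT_VERBS, DETERMINERS, ANIMALS]

def matches(sentence: str) -> bool:
    """True if each word falls in the corresponding template set."""
    words = sentence.lower().rstrip(".").split()
    return len(words) == len(TEMPLATE) and all(
        word in slot for word, slot in zip(words, TEMPLATE)
    )

print(matches("The cat ate the rat."))  # True
print(matches("The rat ate the cat."))  # True: same sets, a new sentence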


The combination of linguistics and computer science leads to what has been known as Natural Language Understanding (NLU), a term that has since been taken over by marketing to mean: "I think you meant this, maybe." Understanding is radically different from today's claims of NLU by state-of-the-art systems.

NLU is intended to support NLP (Natural Language Processing) by providing meaning, but sadly the concept of NLU is being used to describe "understanding" without understanding the meaning of words. This splits NLU into two concepts: (a) keyword-NLU, which 'understands' phrases without determining their words' meanings, and (b) meaning-NLU, which does.

Again, my brain theory expects common experiences to be decomposed, like words: words are typically composed of multiple meanings, each of which is to be validated.

Keyword-NLU (state-of-the-art)

In keyword-NLU, finding relevant keywords, or words that are statistically similar, is good enough, even when a person would strongly disagree with the classification of the sentence.

Meaning-NLU (Pat's model)

In meaning-NLU, the meaning of the words must be recognized, including validation from predicates. In meaning-NLU systems (like Pat's), "the meaning of a word in a sentence is determined by the meanings of the other words in the sentence."
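A deliberately crude sketch of the distinction. The keyword side fires on a surface token wherever it appears; the meaning side is only gestured at with a single validation check, since real predicate validation is far beyond a few lines. The intent name and the word classes are invented for illustration, not Pat's implementation.

def keyword_nlu(text: str) -> str | None:
    # Matches the keyword wherever it appears, whatever it means.
    return "order_food" if "eat" in text.lower().split() else None

EDIBLE = {"rat", "fish", "biscuit"}

def meaning_nlu(text: str) -> str | None:
    # Toy validation: accept "eat" only if its object is edible.
    words = text.lower().rstrip(".").split()
    if "eat" in words and words[-1] in EDIBLE:
        return "order_food"
    return None

print(keyword_nlu("I could eat my words"))            # order_food (misfire)
print(meaning_nlu("I could eat my words"))            # None: "words" isn't edible
print(meaning_nlu("the cat wants to eat the fish"))   # order_food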

Let's explore the state-of-the-art in "Conversational A.I.", which uses keyword-NLU, and contrast its limitations with meaning-NLU. It's not that keyword-NLU has no application. It does. It's just that it cannot scale from a simple chatbot to conversations, because it asks the developer to determine each and every intent in advance for all possible conversations. It leaves the problem of A.I. to the developer, while depriving the developer of meaning.

The Science: "good enough" isn't good enough

In science, near enough usually isn't good enough. Getting wrong answers is unacceptable in almost every science. To deal with ineffective theory in NLP, we need to refocus our aim at the real target: conversing with machines with speed and accuracy. That starts with meaning, not keywords, and it should be our primary interface with all devices, regardless of your source language.

When digital computers were first built, they were expected to get the right answers. If 1+1=3 were produced regularly, we would not have gone to the moon, because computers could not have controlled the spacecraft. We would not have iPhones either. By contrast, the frustration of getting NLP to work has led to an acceptance that errors are acceptable. They aren't.

The goal of NLP is to produce systems that are 100% accurate, not something else. Granted, noisy environments force clarifying questions; new or unknown words can too, as can ambiguity. But A.I. benchmarking tests continue to be built that the state-of-the-art technology cannot pass. Worse still, the best tools are considered a success even when inaccuracy makes them commercially unusable.

The General Language Understanding Evaluation (GLUE)[ii] benchmark consolidates a number of different tests. The related paper explains: "The human ability to understand language is general, flexible, and robust. In contrast, most NLU models above the word level are designed for a specific task and struggle with out-of-domain data."

But in the words of the organizers: "... the low absolute performance of our best model indicates the need for improved general NLU systems."

For NLU systems to progress, tests like those we propose at Pat are needed in addition. In a future article, I will explain how Pat provides a range of tests from simple to advanced, in which working NLU systems pass the easiest tests with 100% accuracy; keyword-NLU systems can't. Human children should pass those tests too.

Facebook A.I. Research: bAbI tasks


In our work on the Facebook A.I. Research team's bAbI tasks, our system scored 100% and found errors in the training datasets, a blind spot for the machine-learning methodology. It also answered in English, not with the artificial keywords that allow tests to be passed without an English solution (does that make any sense for an NLP test?).

This testing model, validating the system on easy examples before testing more complex ones, is the approach we advocate in order to provide human-level NLU across multiple languages in the future.

If companies release tools that can't get 100% correct on simple tests like bAbI, what does that say about the platform?

Along the same lines, at Stanford, near my office in Palo Alto, the SQuAD[iii] tests see results of up to 90%. You'd think that's OK, but it means the systems are inaccurate. Sure, the tests are complex, but they are tailored so that contestants can pass without understanding. Simply getting 90% accuracy without understanding is aiming at the wrong target.

What we are seeing is the difference between search technology (which points humans at web pages they may want) and conversation (where the user gets an answer). Conversation is more useful because it stops humans having to select the correct documents, and it should exclude bias. It requires a very different approach.
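For reference, the bAbI tasks mentioned above pair short stories with questions. Below is a story in the style of task 1 (single supporting fact), with a naive last-mention baseline: emphatically not Pat's approach, just a way to see what the test asks.

STORY = [
    "Mary moved to the bathroom.",
    "John went to the hallway.",
    "Mary travelled to the office.",
]

def where_is(person: str) -> str:
    """Return the location from the most recent sentence about `person`."""
    for sentence in reversed(STORY):
        if sentence.startswith(person):
            return sentence.rstrip(".").split()[-1]
    raise ValueError(f"no facts about {person}")

print(where_is("Mary"))  # office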


To be useful, conversational systems must pass a large array of tests with perfection. Anything less is intelligence augmentation, not human-imitative A.I. Intelligence augmentation should be checked by humans. That's OK, but the revolution in NLU only comes with understanding like a human, at least some of the time.

Setting the Scene: a brief history

NLP with meaning-NLU has long been considered the Holy Grail in Silicon Valley. It will be the last device interface we ever use, because it uses the fastest method people have to communicate. No more keyboards or keywords, not even the glowing holograms floating in midair for actors to press in sci-fi movies. Words are more than enough to do the job rapidly and accurately.

But NLP has always been a failure. It started with Noam Chomsky's linguistic revolution, in which he proposed a model based only on syntax. I call that syntax-first. The lack of success with the related rules-based models saw them replaced by a better statistical approach that, in turn, morphed into a connectionist approach with artificial neural networks, better still. Fundamentally, however, these approaches to parsing trace back to Chomsky's original 1957 model. The general lack of success has also led to the use of word embeddings, with the scientific justification that J.R. Firth advocated them. But he didn't, if his writings are to be believed.


Aiming at the right target, trying things, failing, and iterating to find the myriad ways not to do NLP: that is the key to assembling a working NLU solution. The science of NLP has failed because of Einstein's observation: "Everything should be made as simple as possible, but not simpler."

Syntax-first (like parsing) excludes meaning and context. It is too simple.

Distributional semantics (out-of-context only?) excludes meaning and syntax. It is also too simple.

Pat's model, based on RRG and Patom theory, is just right, like Baby Bear's bed in "Goldilocks." This system is the minimum needed to realize the goals of voice-first[iv] interaction or its next iteration, voice-only. The goal is captured clearly in that article: "We're finally able to communicate with our devices — phones, computers, wearable devices, smart speakers and more — the same way we talk to one another when we need to get things done." Sadly, the state-of-the-art can't deliver on this, but the sentiment remains.

Using Meaning instead of keywords

Patom theory models a brain with bidirectional elements. A representation remains in the sensory area where it arrived, while its object form connects back to its sensory pieces. With language, the sound of a word is auditory recognition, but the thing the word refers to lives in its own applicable area: vision (the occipital lobe) for color words, the temporal lobe for objects, and the frontal lobe for actions. Connecting different words to the same meaning creates synonyms, recognized independently, perhaps with some additional qualification. Isn't "whisper" the same as "speak" with a particular manner? It's bidirectional: words connect to meaning, and meaning connects to words.
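A toy sketch of that bidirectionality: word forms map to meaning symbols, the map is inverted so meanings connect back to word forms, and a synonym is carried as a shared meaning plus a qualification. The symbol names are invented for illustration.

WORD_TO_MEANING = {
    "speak": ("SPEAK",),
    "whisper": ("SPEAK", "manner=quiet"),  # "speak" plus a manner
}

# Invert the map so meanings also connect back to words.
MEANING_TO_WORDS: dict[str, set[str]] = {}
for word, (meaning, *_qualifiers) in WORD_TO_MEANING.items():
    MEANING_TO_WORDS.setdefault(meaning, set()).add(word)

print(MEANING_TO_WORDS["SPEAK"])  # {'speak', 'whisper'}: synonyms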

Machine-readable Meaning (RRG layers)

RRG provides a language-independent, layered model for a sentence. A semantic set maps the words in a sentence to such a representation (shortened here) for the example: "The cat ate the rat continuously slowly on the mat today evidently because it was hungry."

Figure 2. A semantic set showing the layered RRG model: nucleus, core and clause level.
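Since Figure 2 is an image, here is one hypothetical way to encode its layered semantic set as nested data: a nucleus (the predicate) inside a core (nucleus plus arguments) inside a clause, with periphery and operators attached per layer. The layer assignments follow the usual RRG treatment of aspectual, pace and evidential adverbs, but the field names are my own shorthand, not RRG's formal notation.

semantic_set = {
    "clause": {
        "operators": {"evidential": "evidently"},
        "periphery": ["today"],
        "reason": "because it was hungry",
        "core": {
            "arguments": {"actor": "the cat", "undergoer": "the rat"},
            "operators": {"pace": "slowly"},
            "periphery": ["on the mat"],
            "nucleus": {
                "predicate": "eat",
                "operators": {"aspect": "continuously"},
            },
        },
    }
}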

Machine-readable Meaning (RRG juncture)

Perhaps the second most intriguing feature of RRG is its treatment of junctures. How are arguments from the start of a sentence mapped to the other side of the juncture? The sentence "John promised the cat to eat the rat", broken down below, covers two lines (note that the juncture shows 'John promised' covering the second phrase with the shared argument, 'John').

Figure 3. A semantic set with an RRG juncture.

By providing the machine-readable meaning of the input, a developer now has the start of the tools needed to perform conversational A.I. In addition to the meaning of each sentence, tracking context is also important, because languages allow sentences that are ambiguous in isolation but unambiguous in context. If a brain can easily determine someone's meaning, there is no need to simplify an element further.

End of Part 1

In the upcoming articles, we will see how today's conversational A.I. platforms inhibit conversation through their design principles, embracing meaningless distributional semantics instead of focusing on meaning. We will also go into the detail of the solution: how semantic sets such as those above are arrived at, delivering the results of parsing without the uncontrolled combinatorial explosion caused by excluding meaning.

While today's platforms provide keyword-NLU, the minimum solution for conversation is meaning-NLU. Let's begin the journey.

[i] https://syncedreview.com/2019/04/16/bengio-and-marcus-at-world-ai-summit-in-montreal/ Synced, April 16, 2019.
[ii] https://gluebenchmark.com/ GLUE benchmark.
[iii] https://rajpurkar.github.io/SQuAD-explorer/ The Stanford Question Answering Dataset, version 2.0.
[iv] https://www.forbes.com/sites/ilkerkoksal/2018/02/01/voice-first-devices-are-the-next-big-thing-heres-why/#56a189de6873 Forbes, "Voice-First Devices Are The Next Big Thing", February 1, 2018.