SyNTHEMA Semantic Intelligence, Speech & Language Technologies


Uploaded by carrie on 23 February 2016.


DESCRIPTION

SyNTHEMA Semantic Intelligence, Speech & Language Technologies. Carlo Aliprandi – Mario Spoto, Synthema srl. Company Profile. PowerPoint presentation.

TRANSCRIPT

Page 1: SyNTHEMA Semantic Intelligence, Speech & Language Technologies

Carlo Aliprandi

SyNTHEMA Semantic Intelligence, Speech & Language Technologies

Carlo Aliprandi – Mario Spoto, Synthema srl

Page 2: Company Profile

Based in Pisa (Italy), SyNTHEMA is a high-technology SME established in 1993 by computer scientists from the IBM Research Center. Since then the company has grown rapidly and is now a leading provider of language and semantic solutions, with state-of-the-art technologies for applications such as Enterprise Search, Audio & Text Mining, Technology Watch, Competitive Intelligence, Speech Recognition, Respeaking and Speech Analytics.

Building its leadership on strong IT research and development, SyNTHEMA has pioneered a number of innovative applications and solutions, used daily by a large number of users for productivity tasks across markets and industries including Homeland Security, Intelligence and Law Enforcement, Public Administration and Government, Healthcare and Media.

Page 3: Structure and activities

Semantic Technology

Translation Technology

Speech Technology

• 30 People (20 IT, 10 Localisation Services)

Page 4: Language Technologies – What Are They?

• Language technology, often called Human Language Technology (HLT) or Natural Language Processing (NLP), has computational linguistics and speech technology at its core but also includes many application-oriented aspects of them. Language technology is closely connected to computer science and general linguistics.

• Bill Clinton: “Soon researchers will bring us devices that can translate foreign languages as fast as you can talk…”


Page 5: Language technologies, some examples

WRITTEN LANGUAGE
• Machine Translation
• Semantic Analysis
• Natural Language Search
• Information Retrieval
• Question Answering

SPOKEN LANGUAGE
• Speech Recognition – Speech to Text
• Automatic Transcription
• Assisted Subtitling
• Intelligent Speech Interfaces
• Spoken Language Understanding
• Dialogue Management (avatars, ...)


Page 6: Natural language

AIIA NLP Workshop 2010

Sources: Ethnologue; Netz-Tipp.de (2002), http://www.netz-tipp.de/languages.html

Page 7: Semantics

• Semantics is the branch of linguistics that studies the meaning of words, sentences and texts.

• Computationally, it means analysing text automatically, trying to understand it and to represent its deep meaning (Natural Language Understanding), which always depends on context.

Page 8: Core technologies, examples

interessi

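• “Questo prodotto è rivolto ai clienti cui interessi la qualità” (“This product is aimed at customers who care about quality”; here interessi is a form of the verb interessare)
• “Contattateci per il calcolo interessi addebitati nei conti correnti e nei mutui” (“Contact us to calculate the interest charged on current accounts and mortgages”; here interessi is the plural of the noun interesse)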

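borsa
• “La borsa di Milano oggi ha chiuso in rialzo” (“The Milan stock exchange closed higher today”)
• “La Treasure è una borsa in pelle di lusso” (“The Treasure is a luxury leather bag”)
• “Dal 2010 diventa obbligatoria la borsa della spesa biodegradabile” (“From 2010 the biodegradable shopping bag becomes mandatory”)
• “Il presidente ha le borse sotto agli occhi” (“The president has bags under the eyes”)
• “per ottenere la borsa di studio della Camera dei Deputati…” (“to obtain the Chamber of Deputies scholarship…”)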

Semantics: state of the art

[Diagram: analysis layers applied to the examples above: WSD; POS tagging (VERB / NOUN); lemmatisation (interessare / interesse); NER (ORG., DATE, PERS., LOC.)]
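The borsa examples above are a word sense disambiguation (WSD) problem: the same word needs a different sense in each sentence. As a minimal sketch, and not SyNTHEMA's method, the simplified Lesk algorithm below picks the sense whose signature words overlap most with the sentence. The sense labels and Italian signature words are hypothetical, hand-written for these five sentences; a real system would derive them from a lexical resource.

```python
import re

# Hypothetical sense inventory for "borsa": each sense maps to a few Italian
# signature words. Hand-written for illustration only, not from a real lexicon.
BORSA_SENSES: dict[str, set[str]] = {
    "stock_exchange": {"milano", "chiuso", "rialzo", "ribasso", "titoli", "azioni"},
    "bag": {"pelle", "lusso", "spesa", "biodegradabile", "tracolla"},
    "eye_bags": {"occhi", "sotto", "stanchezza"},
    "scholarship": {"studio", "ottenere", "studenti", "bando"},
}


def context_words(sentence: str) -> set[str]:
    """Lowercased word tokens; Python's regex word class also matches accented letters."""
    return set(re.findall(r"\w+", sentence.lower()))


def simplified_lesk(sentence: str, senses: dict[str, set[str]]) -> str:
    """Return the sense whose signature shares the most words with the sentence.

    Ties, including zero overlap, go to the first sense in dictionary order.
    """
    words = context_words(sentence)
    return max(senses, key=lambda sense: len(words & senses[sense]))


if __name__ == "__main__":
    for sentence in [
        "La borsa di Milano oggi ha chiuso in rialzo",
        "La Treasure è una borsa in pelle di lusso",
        "Dal 2010 diventa obbligatoria la borsa della spesa biodegradabile",
        "Il presidente ha le borse sotto agli occhi",
        "per ottenere la borsa di studio della Camera dei Deputati",
    ]:
        print(f"{simplified_lesk(sentence, BORSA_SENSES):15} {sentence}")
```

Overlap counting fails whenever a sentence shares no words with any signature, so it is a baseline rather than a complete WSD component.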

Page 9: Semantic Technology

The Italian market offers state-of-the-art technology for deep NLU:
• Lemmatisation
• POS Tagging
• MultiWord Detection (MWD)
• Named Entity Recognition (NER)
• Parsing (dependency – constituency)
• Word Sense Disambiguation (WSD)
• Sentiment Analysis (SA)
• Semantic Role Labeling (SRL)

Languages:
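Named Entity Recognition assigns labels such as ORG, PERS, LOC and DATE (the labels shown on page 8) to spans of text. The sketch below is a deliberately simple, assumed approach: a hypothetical hand-made gazetteer plus a year-only date pattern, with longer names matched first so that overlapping shorter names are skipped. Production NER is normally statistical; nothing here describes SyNTHEMA's engine.

```python
import re

# Hypothetical gazetteer (entries chosen for illustration). Longer names are
# tried first so that "borsa di Milano" wins over the shorter "Milano".
GAZETTEER: dict[str, str] = {
    "camera dei deputati": "ORG",
    "borsa di milano": "ORG",
    "synthema": "ORG",
    "carlo aliprandi": "PERS",
    "milano": "LOC",
    "pisa": "LOC",
}

# Simplification: only four-digit years from 1900 to 2099 count as dates.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def tag_entities(text: str) -> list[tuple[str, str]]:
    """Return (surface form, label) pairs in text order, without overlaps."""
    spans: list[tuple[int, int, str]] = []

    def free(start: int, end: int) -> bool:
        # True if [start, end) does not overlap any accepted span.
        return all(end <= s or start >= e for s, e, _ in spans)

    for name in sorted(GAZETTEER, key=len, reverse=True):
        pattern = r"\b" + re.escape(name) + r"\b"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            if free(m.start(), m.end()):
                spans.append((m.start(), m.end(), GAZETTEER[name]))
    for m in YEAR.finditer(text):
        if free(m.start(), m.end()):
            spans.append((m.start(), m.end(), "DATE"))
    return [(text[s:e], label) for s, e, label in sorted(spans)]


if __name__ == "__main__":
    print(tag_entities("Dal 2010 la borsa di Milano e la Camera dei Deputati"))
```

Matching case-insensitively on the original string keeps the returned surface forms in their original casing.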

Page 10: Semantics

• Is it a cool topic?
– Microsoft Bing: Powerset (linguistic processor)
– Google: Applied Semantics (ontology, or knowledge base of concepts and their relationships, coupled with a linguistic processing engine)
– Google Squared (structures the unstructured data on web pages)
– Hakia (meaning-based search engine, ontology and semantic lexicon, ontological parser)
– WolframAlpha
  + computational knowledge engine, distilled and revised knowledge, NL query, rich visualisation
  - knowledge engineering, language dependent
– IBM Watson (Jeopardy!)

• While waiting for the killer app, there is a latent demand for “Semantic Search”

Page 11: Speech Technology

The Italian market offers state-of-the-art speech technology for:

• Automatic Speech Recognition

• Automatic Transcription

• Dialogue Systems

• Speech Analytics

Languages:

Page 12: Speech Recognition

• Dictation
– Dictation is the interactive composition of text
– Medical reports, court and parliamentary proceedings

• Transcription
– Transcription is the conversion of speech into text (batch or online)

• Dialogue
– CRM, device control, navigation, call routing

• Speech Retrieval
– Searching audio and video using keywords
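One common way to search audio and video by keyword is to search the time-stamped text produced by a recogniser rather than the audio itself. The sketch below assumes a hypothetical ASR output of (word, start time in seconds) pairs per recording, and builds an inverted index from each word to the recordings and times where it occurs, so a query returns points to jump to in the media. The recording ids are invented for illustration.

```python
from collections import defaultdict

# Hypothetical ASR output format: one list of (word, start_seconds) per recording.
Transcript = list[tuple[str, float]]
Hit = tuple[str, float]  # (recording id, start time in seconds)


def build_index(transcripts: dict[str, Transcript]) -> dict[str, list[Hit]]:
    """Map each lowercased word to every (recording, start time) where it occurs."""
    index: defaultdict[str, list[Hit]] = defaultdict(list)
    for recording, words in transcripts.items():
        for word, start in words:
            index[word.lower()].append((recording, start))
    return dict(index)


def search(index: dict[str, list[Hit]], keyword: str) -> list[Hit]:
    """Single-word lookup; phrase queries would also need to check word adjacency."""
    return index.get(keyword.lower(), [])


if __name__ == "__main__":
    # Hypothetical recording ids and timings.
    transcripts = {
        "seduta_01": [("la", 0.0), ("camera", 0.4), ("approva", 0.9)],
        "tg_02": [("la", 12.1), ("Camera", 12.5), ("dei", 13.0), ("deputati", 13.2)],
    }
    index = build_index(transcripts)
    print(search(index, "Camera"))  # hits in both recordings
```

Building the index once keeps each query a dictionary lookup, independent of the total length of the archive.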

Page 13

– Q&A

Thank you

Carlo Aliprandi, Language and Speech Solutions Manager

Synthema srl, www.synthema.it

Page 14: From Core NLP & ASR Technologies to Products

Since 1997 we have been bringing R&D technology to the SR market (consumer and professional):

1997: first continuous speech recognition system for Italian (IBM ViaVoice – Rad, Pat)

2000: Chamber of Deputies: CameraVox, first speech reporting system based on respeaking

2001: SpeechTitle (VoiceSubTitle), first system for subtitling with live respeaking (broadcasting)

2002: Voice Suite, first professional system for distributed speech reporting

2005: Voice Suite (and Fabrizio G. Verruso) World Champion in Speech Reporting, with a (still current) world record of 174 wpm

2007: SpeechAligner, a system for automatic alignment between video/audio and text

2008: DictaSpeech, an all-in-one system for managing the digital audio and speech reporting workflow

2009: application to the judicial sector: Voice Suite Legal edition

2010: SpeechJive, speech recognition on the new Nuance Dragon NaturallySpeaking engines

2012: SpeechScribe.Server, speaker-independent ASR: automatic batch transcription of spontaneous speech in audio and video

2012: SpeechScribe.Live!, first online speaker-independent ASR system: live automatic transcription of spontaneous speech in audio and video for the Italian language

2013: SpeechScribe, new languages:
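SpeechAligner is described only as a system for automatic alignment between video/audio and text. One common way to build such an aligner, shown here as an assumption rather than SpeechAligner's actual method, is to run a recogniser over the audio and align its time-stamped words to the reference text with a word-level edit-distance (Levenshtein) alignment; each reference word then inherits the start time of the recognised word it is aligned with. The recogniser output format is hypothetical.

```python
from typing import Optional

# Hypothetical recogniser output: (word, start time in seconds).
TimedWord = tuple[str, float]


def align(reference: list[str], recognised: list[TimedWord]) -> list[tuple[str, Optional[float]]]:
    """Attach start times to reference words via a word-level Levenshtein alignment.

    A matched or substituted reference word gets the time of the recognised word
    it aligns with; a reference word the recogniser missed gets None.
    """
    ref = [w.lower() for w in reference]
    hyp = [w.lower() for w, _ in recognised]
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = i
    for j in range(m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub,  # match or substitution
                cost[i - 1][j] + 1,  # reference word not recognised (deletion)
                cost[i][j - 1] + 1,  # extra recognised word (insertion)
            )
    # Walk back from the bottom-right cell, recording times for aligned words.
    times: list[Optional[float]] = [None] * n
    i, j = n, m
    while i > 0 and j > 0:
        sub = 0 if ref[i - 1] == hyp[j - 1] else 1
        if cost[i][j] == cost[i - 1][j - 1] + sub:
            times[i - 1] = recognised[j - 1][1]
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    return list(zip(reference, times))


if __name__ == "__main__":
    text = ["la", "Camera", "approva", "il", "decreto"]
    asr = [("la", 0.0), ("camera", 0.3), ("approva", 0.8), ("decreto", 1.5)]
    for word, start in align(text, asr):
        print(word, start)
```

The cost table has (n + 1) × (m + 1) cells, so for long recordings it is practical to align segment by segment.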

Page 15: Main Clients – Italy

• Italian Chamber of Deputies
• RAI
• Università di Pisa
• Regione Emilia-Romagna
• Tribunale di Milano

• 200+ hospitals, 10,000+ physicians:
– Ospedale di Merano (German & Italian)
– Azienda Ospedaliera Pisana
– Ospedale di Viareggio
– Ospedale Savigliano
– Area vasta Toscana Sud-Est

Page 16: Market

• Healthcare (public and private)
• Media
• Local and central government
• Customer care

Page 17: R&D – current projects

• Mosaic (Multi-Modal Situation Assessment & Analytics Platform): automated detection, recognition, geo-location and mapping, to enhance situation awareness, surveillance targeting and camera handover

• Caper (Collaborative information, Acquisition, Processing, Exploitation and Reporting for the prevention of organised crime): a common platform for the prevention of organised crime through information sharing, exploitation and analysis of open and closed information sources

• Savas (Sharing AudioVisual language resources for Automatic Subtitling): collection and sharing of audiovisual resources to develop new automatic speech recognition (ASR) technology for multilingual live subtitling, specifically tuned to the needs of the broadcasting and new media industries

• OpenNER (Open Named Entity Recognition): providing enterprises and society with base technologies for cross-lingual named entity recognition and classification and sentiment analysis, through the reuse of existing resources