
Advancements in speech technology for OpenText™ Explore

White paper

This white paper discusses how various models of speech transcription and accuracy of speaker-separated audio can influence call center results. It also outlines the costs and benefits of 100% speech processing of all interactions and how to prepare for implementation of speech technology server specifications.


Contents

Overview

Speed and accuracy

Speaker-separated audio

Language support

Moving to 100% speech processing

Server specifications


Overview

Explore is a powerful, fast and accurate speech analytics solution that can quickly transform massive amounts of recorded audio into actionable assets. By applying highly accurate (up to 88%) transcription capabilities, Explore quickly and reliably produces search-optimized text output from multi-speaker audio files, allowing organizations to turn enormous volumes of audio into meaningful insights.

In addition to the languages and dialects listed in this paper, more languages are rolled out regularly, helping businesses increase efficiency and streamline operations around the world. Explore transcribes speech content with the highest accuracy in the industry at remarkable speed, making previously inaccessible unstructured data available for analysis, searching and indexing.

Speed and accuracy

Pre-recorded audio can be processed in one of three modes that balance speed, accuracy and hardware requirements:

• Accurate mode operates at 1x real-time factor (RTF), processing one minute of audio in 52.2 seconds with maximum accuracy (88%).

• Fast and accurate mode enables a trade-off between speed and accuracy, operating at 3x RTF and processing one minute of audio in 18.6 seconds with 87% accuracy.

• Warp mode operates at up to 10x RTF and enables processing of one minute of audio in 4.2 seconds with 82% accuracy.
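To make the trade-off between the three modes concrete, the following minimal Python sketch estimates how long a backlog of audio would take to process in each mode. It uses only the seconds-per-minute and accuracy figures quoted above. The function name `processing_hours`, the `MODES` table and the 100-hour backlog are illustrative assumptions, and the estimate assumes a single processing stream whose time scales linearly with audio length.

```python
# Seconds of processing per minute of audio, and accuracy, as quoted for each mode.
MODES = {
    "accurate": (52.2, 0.88),
    "fast_and_accurate": (18.6, 0.87),
    "warp": (4.2, 0.82),
}


def processing_hours(audio_hours: float, mode: str) -> float:
    """Estimate single-stream processing time for a backlog of audio.

    Assumption: processing time scales linearly with audio length,
    which real workloads may only approximate.
    """
    seconds_per_audio_minute, _accuracy = MODES[mode]
    return audio_hours * seconds_per_audio_minute / 60.0


if __name__ == "__main__":
    backlog_hours = 100.0  # illustrative backlog, not a figure from this paper
    for name, (_seconds, accuracy) in MODES.items():
        hours = processing_hours(backlog_hours, name)
        print(f"{name}: {hours:.1f} h to process {backlog_hours:.0f} h of audio "
              f"at {accuracy:.0%} accuracy")
```

Under these assumptions, a 100-hour backlog takes roughly 87 hours in accurate mode, 31 hours in fast and accurate mode and 7 hours in warp mode, which is the kind of difference that drives hardware footprint.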

With Explore, organizations can also increase transcription accuracy, for example through the ability to add words to existing classes. They can also overlay the out-of-the-box statistical language model (SLM) with a domain language model (DLM) that uses data related to a particular domain, to enhance Explore’s vocabulary and language model.

Explore contains a fast, highly accurate engine that produces search-optimized text output from an audio file. Accuracy of 80-88% can be achieved, the highest in the industry. Pre-recorded audio can be processed at 1x RTF (60 seconds of audio in 60 seconds, for maximum accuracy) or at up to 10x RTF (60 seconds of audio in just six seconds). Why does speed matter? Organizations get the data more quickly, of course. But more importantly, they save hardware costs, since faster speeds allow for a dramatically smaller footprint.

OpenText Explore leverages a highly accurate, automated transcription solution that unlocks the customer insights trapped in multi-speaker audio so that organizations can harness data across the OpenText™ Qfiniti platform. This white paper outlines the speed, accuracy and languages supported by the platform.


Speaker-separated audio

In many cases, Qfiniti will record an agent and caller on different channels and allow users to analyze results by speaker. In rare cases, when the files are not speaker-separated (also known as mono recording), the Explore engine can perform “diarization” of the audio, that is, it attributes segments of the single channel to individual speakers.

Moving to 100% speech processing

Over the last decade, most speech analytics solutions have been able to ingest only 20% of contact center interactions, partly due to the cost per interaction and partly due to the price of speech processing hardware. But reduced hardware costs and increased speed have decreased the total cost of ownership. As a result, Explore is available in a new, 100% named-agent model. This model allows organizations to process and ingest an unlimited number of call recordings into the system for every named agent licensed in the solution.
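As a rough illustration of what the move from 20% to 100% implies for processing load, the sketch below computes the increase in audio volume and the resulting daily processing time. The function names and the eight-hour baseline are hypothetical, chosen only to show the arithmetic. They are not OpenText sizing figures.

```python
def volume_multiplier(old_percent: float, new_percent: float) -> float:
    """Return how many times more audio is processed after the coverage change."""
    if old_percent <= 0:
        raise ValueError("old_percent must be positive")
    return new_percent / old_percent


def hours_needed(current_hours_per_day: float, multiplier: float, speed_gain: float) -> float:
    """Daily processing hours on the same hardware after the change.

    Assumption: processing time grows linearly with audio volume and
    shrinks linearly with engine speed.
    """
    if speed_gain <= 0:
        raise ValueError("speed_gain must be positive")
    return current_hours_per_day * multiplier / speed_gain


if __name__ == "__main__":
    multiplier = volume_multiplier(20, 100)  # 5x more audio
    # Hypothetical baseline: hardware busy 8 hours a day at 20% coverage.
    print(hours_needed(8.0, multiplier, speed_gain=1.0))  # no engine speed-up
    print(hours_needed(8.0, multiplier, speed_gain=5.0))  # engine five times faster
```

Under these assumptions, full coverage means five times the audio, so the same hardware keeps its original workload only if the engine runs about five times faster.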

Maximum flexibility: The new Explore license is a term contract based on the number of agents an organization wants to license. With a simple monthly agent fee, organizations can ingest 100% of each licensed agent’s interactions in any of the speed/accuracy modes described in this white paper. Tier-based pricing models are also available, based on the overall number of agents and the contract term (one, two or three years). An OpenText account executive can provide the specific details of each model or contract term.

Language support

• English: US, UK, Canada, Australia, South Africa, US Broadcast

• French: Canada, Europe

• German

• Portuguese: Brazil, Europe

• Spanish: Mexico, US, Spain, Colombia, Argentina, Chile, Guatemala

• Catalan

• Hebrew

• Italian

• Japanese

• Korean

• Arabic: World, Gulf

• Mandarin: China, Taiwan

• Cantonese: Hong Kong

• Dutch


Maximum value: 100% speech processing also positions an organization for a maximum return on investment. With Explore’s new speech engine, hardware costs are significantly reduced, enabling a lower pricing model to support the agent-based model. In most cases, the hardware configuration previously used to process 20% of interactions can now process 100%. With all of its calls available in Explore, an organization can take advantage of:

• Analytics-driven quality monitoring: Automatically score interactions using OpenText™ Qfiniti AutoScore and embed results in OpenText™ Qfiniti Advise scorecards.

• Compliance triggers and alerting: Alert compliance personnel to patterns within interactions.

• Increased sample sizes: Use Explore to analyze results across all interactions.

• Analytics-driven customer surveys: Use Explore results to trigger customer surveys based on words, phrases or sentiment.

Server specifications

Speech transcription rates vary based on factors such as file size, codec, length of call and location of ingestion relative to Qfiniti Observe. The OpenText solution architect team creates custom server configurations during the design phase of a deployment.

As an example, a server used to transcribe 25,000 recordings per day with an average talk time of 360 seconds includes:

• 12 cores total (fewer than 4 processor sockets)

• 20 GB RAM

• Windows Server 2016 64-bit Standard Edition

• Dual 1 Gbit NIC

• RAID controller: minimum 1 channel

• Local Disk RAID Arrays: 2x 72 GB HDs in RAID 1

• Redundant Power Supply

• Redundant Fans

• VMware supported

• AWS EC2 Instance Type: m4.4xlarge
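The example workload above can be translated into an audio volume and a required processing speed with a few lines of arithmetic. The sketch below is a back-of-the-envelope check, not an OpenText sizing method. The function names are hypothetical, and it assumes the server transcribes around the clock with the load spread evenly across all 12 cores.

```python
def daily_audio_hours(recordings_per_day: int, average_talk_time_s: float) -> float:
    """Total hours of audio arriving per day."""
    return recordings_per_day * average_talk_time_s / 3600.0


def required_speed_factor(audio_hours: float, processing_window_hours: float = 24.0) -> float:
    """Aggregate multiple of real time needed to clear the audio within the window."""
    if processing_window_hours <= 0:
        raise ValueError("processing_window_hours must be positive")
    return audio_hours / processing_window_hours


if __name__ == "__main__":
    cores = 12  # from the example configuration
    audio = daily_audio_hours(25_000, 360)
    factor = required_speed_factor(audio)  # assumes a 24-hour processing window
    print(f"Audio per day: {audio:.0f} hours")
    print(f"Aggregate speed needed: {factor:.1f}x real time")
    print(f"Per core, if spread evenly: {factor / cores:.1f}x real time")
```

With these inputs the workload is about 2,500 hours of audio per day, which requires an aggregate speed of roughly 104x real time, or about 8.7x per core. Actual rates depend on the factors listed at the start of this section.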

Copyright © 2018 Open Text. OpenText is a trademark or registered trademark of Open Text. The list of trademarks is not exhaustive of other trademarks. Registered trademarks, product names, company names, brands and service names mentioned herein are property of Open Text. All rights reserved. For more information, visit: http://www.opentext.com/2/global/site-copyright.html • 05/2018 • 09634

opentext.com/contact

Website: http://www.opentext.com/qfiniti

YouTube video: https://www.youtube.com/watch?v=-hF4WqEzWhw

Solution overview: http://www.opentext.com/file_source/OpenText/en_US/PDF/opentext-so-deliver-real-time-voice-customer-analytics-en.pdf

About OpenText

OpenText, The Information Company, enables organizations to gain insight through market-leading information management solutions, on-premises or in the cloud. For more information about OpenText (NASDAQ: OTEX, TSX: OTEX) visit: opentext.com.

Connect with us:

• OpenText CEO Mark Barrenechea’s blog: https://blogs.opentext.com/category/ceo-blog/

• Twitter: https://twitter.com/OpenText

• LinkedIn: http://www.linkedin.com/company/opentext

http://www.opentext.com
