TRANSFORM CONTENT TO ACTIONABLE DATA AND UNLOCK NEW BUSINESS OPPORTUNITIES THROUGH AI

The promise of AI has been a long time coming, but it is finally enabling content creators to streamline complex business processes, uncover rich insights, and deliver greater value.

Whitepaper: May 2019



© SPi Global Content Solutions


Artificial intelligence technology has come to dominate the business world in a relatively short period of time. As a concept, it emerged from a 1956 workshop at Dartmouth College, where a handful of computer scientists theorized how machines would learn, analyze language, and complete tasks. From this workshop, the term artificial intelligence (AI) was coined and scientists discussed the future of machine learning, neural networks, and deep learning, essentially laying the foundation for the automation revolution now underway.

Many AI applications remained theoretical or experimental for decades because most computers could not process information quickly enough or store the massive amount of data needed to complete more complex tasks. Only recently have computer processing speeds and storage capabilities advanced to a point where AI can tackle highly nuanced problems like language and content analysis. In just the last four years, these advancements have made a huge impact on a variety of businesses by streamlining processes that have been predominantly manual, such as content analysis and categorization.

The applications of AI are nearly limitless, and the technology is ushering in a new era of content creation and enrichment. Effectively, AI and machine learning are converting content into actionable data for businesses, helping professionals extract key concepts from medical research, meet intricate financial regulatory requirements, and streamline the production of academic journals.

Today, the field of artificial intelligence encompasses several solutions that help businesses quickly analyze and understand wide swathes of content. Those solutions include machine learning, natural language processing, computer vision, and deep learning.

Machine learning is the ability of machines to learn from a representative subset of data without being explicitly programmed to do so. Instead, algorithms help machines identify patterns in order to make predictions or decisions.

Natural language processing is an area of computer science that seeks to help computers “understand” human language. Advances in this field have helped develop tools such as auto-summarization engines and chatbots.

Deep learning is a machine learning methodology that relies on a huge amount of raw training data in order to learn a mapping between unstructured input data, such as images or text, and the corresponding outputs. The algorithm automatically identifies the features in the data. This is in contrast to traditional machine learning algorithms, where the machine learning engineer designs the learning features.

Computer vision uses deep learning models to teach computers to identify and analyze digital images and multi-dimensional data. The goal is to automate tasks that require visual understanding.

All of these capabilities empower professionals to analyze, understand, and improve content at scale. SPi Global, a technology provider that helps businesses enrich and transform their content, is leading the way in developing content-focused AI solutions. Using a combination of the AI technologies described above, SPi Global helps professionals in publishing, finance, and healthcare tackle unique content production and enrichment problems. Its solutions enable professionals to make business decisions based on rich data analysis and to troubleshoot issues ranging from email categorization for the highly regulated financial industry to concept extraction from medical research.

“With the rise of digital content, the pace of content creation has increased dramatically,” explains SPi Global Vice President of Analytics and Artificial Intelligence, Venkateshwarlu “Venky” Sonathi. “At the same time, more businesses than ever before rely on targeted, valuable content to drive positive experiences for their customers. In order for these content-driven businesses to succeed, AI tools have become indispensable.”

In this report, we’ll explore several AI solutions that SPi Global has developed to revolutionize how businesses work with and transform content into actionable data insights, and how businesses are using these tools to unlock new opportunities.


Improving Journal Acceptance Rates

Historically, evaluating a journal article for publication has required a great deal of editorial knowledge, subject matter expertise, and a significant amount of time. Authors make an educated guess as to which journals are the best fit for their content, while scholarly publishers must read every submission thoroughly to determine whether the paper meets their quality standards and aligns with their particular subject matter focus. This approach is time intensive and may not even result in a published article for the author or publisher.

A more efficient evaluation process is possible with artificial intelligence, explains Venky. “Usually, when a manuscript gets rejected, publishers risk losing revenue, the author, and his/her content to a competitive journal,” he says. “In order to address this issue, we developed our Transfer Desk Agent, which suggests alternative journals within the publisher’s catalog that may be a better fit.”

SPi Global developed the Transfer Desk Agent in December of 2017. It uses a machine learning framework to analyze all of the articles published within a publisher’s catalog of journals. The tool creates a unique fingerprint of each journal, which allows it to better match newly submitted articles to the appropriate journal based on the subject matter. The tool recommends alternative journals where a rejected article may align better, decreasing the likelihood of an unfruitful review process.
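The fingerprint-and-match idea can be illustrated with a minimal sketch. It assumes that a journal's fingerprint is an averaged bag-of-words vector over its published articles and that matching uses cosine similarity. The source does not describe the features or model the Transfer Desk Agent actually uses, and the journal names, sample texts, and function names below are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def fingerprint(articles):
    """Average the word counts of a journal's articles (assumed fingerprint)."""
    total = Counter()
    for article in articles:
        total.update(tokenize(article))
    return {term: count / len(articles) for term, count in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(manuscript, catalog, exclude=None):
    """Rank catalog journals by similarity to the manuscript, best first."""
    vector = dict(Counter(tokenize(manuscript)))
    scores = [
        (name, cosine(vector, fingerprint(articles)))
        for name, articles in catalog.items()
        if name != exclude  # skip the journal that rejected the manuscript
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical catalog: journal name -> texts of previously published articles.
catalog = {
    "Journal of Cardiology": [
        "heart failure outcomes after cardiac surgery",
        "blood pressure and heart disease risk",
    ],
    "Journal of Oncology": [
        "tumor growth and chemotherapy response",
        "cancer screening and tumor detection",
    ],
    "Journal of Neurology": [
        "brain imaging in stroke patients",
        "neuron signaling and memory",
    ],
}

# A manuscript rejected by the (hypothetical) neurology journal.
rejected = "chemotherapy dosing for tumor reduction in cancer patients"
ranking = recommend(rejected, catalog, exclude="Journal of Neurology")
best_journal = ranking[0][0]
```

Here `best_journal` is the alternative journal whose fingerprint shares the most vocabulary with the rejected manuscript. A production system would likely use richer features than raw word counts, but the routing logic has the same shape.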

“Many publishers’ rejection rates are fairly high,” explains Jishnu Gupta, Chief Technology Officer at SPi Global. “That means they’re losing a lot of opportunity and revenue when they reject articles that could be a fit for other journals.”

Using the Transfer Desk Agent, one journal publisher was able to increase its acceptance rate by 20%. “That also means 20% more revenue for that publisher,” says Jishnu.

In addition to helping publishers retain valuable content, this technology improves the scholarly research landscape. Publishers can disseminate information to the scholarly community faster by automatically surfacing the best journal submissions. Faster publication times also help publishers keep pace with the increased volume of research in the scholarly community and ensure researchers are up to date.

The applications of this technology are not limited to scholarly publishing, adds Venky. “As a concept, it can be expanded to any business operation that involves identifying incoming textual information and routing it to an appropriate channel. For example, a customer service team that receives complaints via emails can use the Transfer Desk Agent concept to route emails to the most appropriate team.”


Using AI to Meet New Financial Regulations

The ability to recognize concepts and categorize them using machine learning and natural language processing is a cornerstone of SPi Global’s AI solutions. SPi Global customizes this approach to meet a variety of industry-specific needs. For example, in the financial sector, this technology helps firms sift through research and weed out content that does not meet new regulatory requirements.

SPi Global has several financial clients who must adhere to far-reaching financial regulations that went into effect in the European Union in 2018. The EU passed the Markets in Financial Instruments Directive II (MiFID II) in order to create greater regulation and transparency around financial research content.

Under this law, buy-side firms are required to track inducement risk and consumption of research they receive from sell-side firms. This research provides financial advice that could influence markets, so managers document all the research information they receive and flag inducement risk. Inducements, in the financial sector, occur when brokers offer research for free to induce fund managers to trade with them.

Most of this financial research is delivered via email, so to ensure that its clients are meeting regulations, SPi Global developed an Email Classification System that analyzes and categorizes all incoming emails. The technology scans content within the email, as well as any attachments, identifying what research the company actually pays for and which incoming research may not be MiFID II compliant. Emails that are compliant are allowed to pass through to the appropriate financial professionals. Those that are not compliant are flagged for further analysis.

SPi Global built machine learning into the Email Classification System, and it is informed by business rules. “The system was developed using insights from subject matter experts who understand the nuances of MiFID II regulations minutely,” says Venky. “They developed a corpus of manually classified emails that were used to train the model. This follows the process of supervised learning.”

The automated solution saves companies millions of dollars in labor costs and legal fees, adds Jishnu. “Just to adhere to this regulation, firms would need to hire about 20 to 30 analysts to classify incoming research,” he says. “With our application, they only need a couple of analysts to sift through the data as the classification is done by the system. Our tool is 90% accurate, so it is simply a matter of the analysts validating the results. On top of that, our technology creates an audit trail that demonstrates our clients are MiFID II compliant.”

The same email classification approach is applicable to a variety of industries, says Venky. Publishers, for example, can use this system to better manage the publishing workflow with respect to author management.

Project managers receive thousands of emails daily related to ongoing publishing projects that are in different stages of production. The Email Classification System can analyze and categorize these emails so that project managers immediately know which publication they relate to, what stage of the production process they are in, and even the level of priority. The technology speeds up the publishing workflow and empowers project managers to respond to the most important emails first and improve the service levels provided to the authors.
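The supervised, rule-informed classification described in this section can be sketched as follows. This is only an illustration under stated assumptions: it swaps in a small multinomial Naive Bayes classifier, since the source does not name the model SPi Global uses, and the training emails, labels, sender list, and business rule are all hypothetical and are not MiFID II guidance.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (assumed model)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each word.
            score = math.log(doc_count / total_docs)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical corpus of manually classified emails (the supervised labels).
training = [
    ("quarterly equity research report attached per your subscription", "pass_through"),
    ("invoice for subscribed research service analyst note", "pass_through"),
    ("complimentary research offered free to encourage trading with us", "flag_for_review"),
    ("free sample analyst report no charge trade with our desk", "flag_for_review"),
]
model = NaiveBayes().fit([t for t, _ in training], [l for _, l in training])

# Hypothetical business rule: senders with a recorded paid subscription.
PAID_SENDERS = {"research@paidbroker.example"}


def classify(sender, body, model, paid_senders=PAID_SENDERS):
    """Apply the business rule first, then fall back to the trained model."""
    if sender not in paid_senders:
        return "flag_for_review"  # rule: unpaid research is always reviewed
    return model.predict(body)


results = [
    classify("research@paidbroker.example", "subscription research report attached", model),
    classify("research@paidbroker.example", "free research offered to encourage trading", model),
    classify("sales@unknownbroker.example", "subscription research report attached", model),
]
```

Keeping the rule outside the model means compliance staff can change it without retraining, while the model handles the cases the rule does not decide.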


Identifying Concepts to Improve Database Search & Discovery

For several years, SPi Global has provided Concept Identification services to scientific and medical information businesses. Historically, these publishers would rely on a team of highly qualified subject matter experts to read research, identify core concepts within that research, and link to related concepts in other content.

“Let’s say we extract from a research article a certain disease. We’ll also extract the symptoms associated with that disease, the medication for that disease, the chemical compounds of that medication,” explains Jishnu. “All of this data is organized and ingested into a taxonomy tree. That taxonomy tree powers search and discovery within a database so that medical or pharmaceutical professionals can find the most relevant information.”

In 2018, SPi Global augmented this process with a machine learning solution. “We created a supervised model to predict the concepts based on the document content,” says Venky.

The model is built on all of the research SPi Global’s subject matter experts have analyzed and tagged in the past. Those millions of data points teach the machine learning model how to recognize and extract the most important concepts within a research article. The technology is calibrated so that it can distinguish between concepts and recognize which concepts are most valuable and relevant to the researcher.

“Let’s say on average there are about 20 important concepts within an article,” says Jishnu. “Today, the machine is able to extract 16 out of 20 of those concepts automatically. It’s achieved 80% automation, meaning our subject matter experts can spend the majority of their time validating the machine’s work, as opposed to identifying concepts themselves and going through the entire article.”

The Concept Identification tool is improving the quality and relevancy of concepts, often identifying concepts that subject matter experts might have missed. That is improving the taxonomy tree, making search and discovery within medical databases more relevant and valuable for professionals. It also adds a level of consistency and reduces the deviation that is inherent when subject matter experts select concepts.

The application is also increasing the volume of research SPi Global can analyze for its clients, as well as the rate at which it can develop taxonomy trees. For example, one client has increased the volume of research it analyzes through SPi Global by 50% in the last year.

In the near future, SPi Global plans to employ deep learning technology to automate Concept Identification even further, says Venky. With these advancements, SPi Global will effectively eliminate the need to identify concepts manually, and subject matter experts will simply quality check the concept recommendations.
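The taxonomy tree Jishnu describes (a disease linked to its symptoms, its medication, and that medication's chemical compounds) can be pictured as a simple tree data structure. The sketch below is an assumption about the general shape only. The `Node` class, the `find_paths` helper, and the sample entries are illustrative and do not reflect SPi Global's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One concept in the taxonomy tree, with its kind and child concepts."""
    name: str
    kind: str
    children: list = field(default_factory=list)

    def add(self, name, kind):
        """Attach a child concept and return it so calls can be chained."""
        child = Node(name, kind)
        self.children.append(child)
        return child


def find_paths(node, query, path=()):
    """Return the root-to-node path of every concept whose name contains the query."""
    path = path + (node.name,)
    hits = [path] if query.lower() in node.name.lower() else []
    for child in node.children:
        hits.extend(find_paths(child, query, path))
    return hits


# Illustrative entries: disease -> symptoms and medication -> chemical compound.
root = Node("medical concepts", "root")
disease = root.add("type 2 diabetes", "disease")
disease.add("increased thirst", "symptom")
disease.add("fatigue", "symptom")
medication = disease.add("metformin", "medication")
medication.add("metformin hydrochloride", "chemical compound")

paths = find_paths(root, "metformin")
```

Each path shows how a search hit relates back to its parent concepts, which is what lets a search for a medication surface the disease it treats.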


Extracting Information from PDF Content

One of the most labor-intensive processes content businesses face today is translating PDFs into searchable, categorized information. To accomplish this previously, business professionals had to type out the text information within a PDF and then tag and categorize it. Now AI technology can scan searchable and non-searchable PDFs and extract the most important information from them.

SPi Global’s proprietary extraction tool, SPiZone™, is trained to recognize certain areas of a document and automatically extract their meaning. The technology identifies “zones” of a document, whether they are text, a table, or an image, and then pulls out the important information from those zones. It then normalizes the PDF information so that it can be ingested into the customer’s database.

A less advanced alternative is an Optical Character Recognition (OCR) engine. These engines have significant limitations, says Jishnu. OCR engines do not preserve relevancy or style; they provide raw text extraction. An OCR-based solution still needs a great deal of human intervention to correctly identify areas of a document, whereas SPiZone™ is fully automated.

What makes SPiZone™ particularly powerful is that once it recognizes a certain content zone, for example an address on an invoice, it can identify that type of content regardless of how it is presented. The address could appear in a different position or be aligned vertically instead of horizontally, and SPiZone™ will still recognize the address and pull the appropriate data.

Developed in 2011, the tool has evolved to address a wide variety of use cases. For example, one SPi Global customer analyzes public filings that businesses submit reporting their assets. This information is then compiled and sold as research to financial professionals. The client uses SPiZone™ to scan the PDF filings, extract the most important information, and import it into their database. That allows the company to provide highly accurate and timely research to its customers.

A risk assessment company also uses this tool to provide critical information to car insurance companies. The customer uses SPiZone™ to extract information from police reports on car accidents from multiple states in the U.S. That information is pulled directly into the company’s database so that it can precisely calculate the risk for car accidents in different regions of the country.

“Content extraction is a need for any industry,” says Venky. “Whether it’s pulling important information from invoices or legal documents, content extraction can be a significant bottleneck for businesses and SPiZone™ is removing that.”

With over 95% accuracy in the content it pulls from PDFs, SPiZone™ is essentially eliminating the need for human intervention. The technology has transformed a once highly manual process into immediate and precise content analysis.


AI Is Unlocking New Possibilities for Businesses

These are just a few of the ways that SPi Global is using AI technology to ingest businesses’ content, identify patterns, and extract valuable data insights. The applications are nearly limitless because every business needs to understand and act on its content. That is why Jishnu believes content businesses are just scratching the surface of what’s possible with AI.

“As a content technology company, I think there is a very interesting journey ahead of us,” says Jishnu. “This problem of analyzing and processing content will grow tenfold because content is at the heart of everyone’s business. With the internet age and the explosion of content creation, there is a greater need than ever to automate content analysis.”

Venky agrees and believes SPi Global is uniquely qualified to take on this challenge for businesses.

“Because of our deep subject matter expertise and knowledge of semantics within a professional domain, I think we’re in a unique position to solve these very industry-specific problems, whether those are in the financial, legal, medical, or scholarly sectors,” says Venky.

“AI is at the heart of our investment at SPi Global and will continue to be in the coming years,” says Jishnu. “At SPi Global, we see AI as an augmentation for our deep subject matter experts. AI will drive automation and organize huge volumes of unstructured data in a meaningful way for consumption. We will always need some level of human intervention, but with AI, professionals can accomplish more than they ever could before and improve the timeliness and quality of their content data.”

AI technology unlocks a new level of productivity and scale for content businesses. With SPi Global as a partner, businesses can disseminate massive amounts of content, eliminate bottlenecks, and make more informed and timely business decisions. That is empowering business professionals to achieve scale that was not possible prior to the AI boom.


About SPi Global

SPi Global is a market-leading content technology and content solutions enterprise that provides data services and subject matter expertise (SME) to multiple industries such as publishing, finance, healthcare and life sciences, media and retail, research, learning, and corporates. Leveraging its deep domain expertise and suite of proprietary technology platforms, the company brings forth cutting-edge innovation for the extraction, enrichment and transformation of structured and unstructured content and information assets.

With a client base spanning 30 countries worldwide, SPi Global delivers business transformation services from 19 centers across the globe. The company’s multi-geographical resource pool is strategically located in six countries: the Philippines, India, the US, China, Nicaragua and Vietnam.

For more information on how SPi Global can help you maximize your content online and offline, please contact:

Marketing Communications
T 632 855 8600 Ext. 29643
E [email protected]

www.spi-global.com