Dr. Gerhard Noelken The Lab Digitalization Loop picks up speed…



Dr. Gerhard Noelken

The Lab Digitalization Loop picks up speed…


TABLE OF CONTENTS

Executive Summary
Introduction
Problem Statement
1 Laboratory Workflow
2 Standard Data Format
3 Semantic Technology
4 Internet of Laboratory Things
5 Data are Your Biggest Asset
6 Artificial Intelligence or Deep Learning
Conclusions
What Is In It For Me? Or How To Best Explain "Return On Investment"
References


THE LAB DIGITALIZATION LOOP PICKS UP SPEED…

Digitalization, Internet of Things and Artificial Intelligence bring higher productivity and creativity to the Pharmaceutical Laboratory

Lab Digitalization Loop

EXECUTIVE SUMMARY

• Digitalization in Pharmaceutical labs has not progressed as quickly as in other industries.

The complexity of data and processes and the lack of consistency across different research approaches are some of the reasons for this.

• The reality in many labs is still defined by stand-alone analytical instruments, software systems that don’t follow a consistent workflow, and many manual data transactions.

• Organizations need to rethink how to improve the experience and efficiency in the lab. They need to broadly implement better usability and easier human-to-intelligent-system interaction while minimizing manual intervention.

• Labs need to capture and store data in a way that enables computers to consistently interpret and intelligently analyze data with all contextual information available.

• The arrival of Semantic Technology, the Internet of Things and Artificial Intelligence has created the opportunity to rethink Data Management now.

INTRODUCTION

We use the smartphone to manage our daily life. From communication to finances, from transportation to household infrastructure, everything depends on it.

Why is it so different in the scientific lab? While we know about some challenges like the hybrid paper/electronic environment and limited instrument connectivity, other reasons are not as obvious.

A lot of the technology that can help improve laboratory operations already exists and just needs to be combined and leveraged in the right way.

This whitepaper outlines how the collaboration of organizations across science-based industries and their leveraging of advanced technology will improve lab processes and productivity. Digitalization will accelerate scientific creativity, bringing better products to market faster. Together these trends will not just benefit science but enable better care for patients as well.

OVERCOMING LABORATORY INERTIA

Years of scientific training have given scientists the confidence to plan and execute the best experiments possible. They know how to set up experiments, assign and schedule resources and how to analyze the data to achieve valid results that move projects forward. Yet labs still struggle with some issues.

Following a test procedure or conducting an experiment for a targeted project is different from setting up an experiment to be reused to increase the broader scientific knowledgebase of your organization.

Having to adapt a proven experiment setup so that it adheres to a new standard, SOP or IT system can be cumbersome. Likewise, mandatory adherence to defined terms when documenting an experiment can be disruptive.

Collaboration becomes very difficult when teams work on very similar experiments, but small modifications, different system versions and different descriptive terms make it impossible to compare the data and draw meaningful conclusions.

Changing the behavior of scientific teams is extremely difficult, because most corporate incentives are driven by project progression. They do not value contributions to corporate knowledge to the same extent; capturing data for reuse is not rewarded.

But explaining the value of digitalization at a non-technical level will provide insight into the value of Intelligent Data Management in the new world of Digital Data and Big Data Analysis.

[Figure: The Lab Digitalization Loop: Laboratory Workflow, Standard Data Format, Semantic Technology, Internet of Laboratory Things, Data as Our Biggest Asset, Artificial Intelligence / Machine Learning]


1 LABORATORY WORKFLOW

[Figure: Fundamentals of a measurement workflow: Lab Planning & Management (request work; schedule, review; confirm requests), Resources (samples, materials, personnel, equipment), Recipes & Methods (develop, adapt and manage procedures), Lab Execution (prepare and perform tests), and Reporting (compile and interpret results; generate reports).]

Scientists who want to set up their next experiment create a list of ingredients, buffers, reaction vessels, the analytical instruments and maybe discuss with their managers or peers which statistical analysis would be best for the experiment.

Today most scientists no longer write into a paper notebook but rather use an ELN (Electronic Lab Notebook). For most of the steps, a template will be available that helps with the DOE (Design of Experiment) or a form to create a recipe.

Practically this is the first step towards the digitalization of the experimental record. Unfortunately, without some other steps described in this paper it is just creating a “Paper on Glass” version of what otherwise would have been recorded in a paper lab notebook.

Digitalization offers huge benefits, but also requires some preparation. Fortunately, all the steps do not have to happen simultaneously as long as there is a vision of what the aim should be.

Mapping the process from idea generation, through experimental set-up to data analysis and interpretation can be a very useful first step (1). It will clarify which functionality is needed at each step in terms of instructions for the instrument (input) and what should be captured in terms of metadata or results (output).

The specifics of an experiment will also define the software modules that support the transactions needed at each step. Defining the data flow and the workflow process separately makes it easy to decide which modules or functionalities of a laboratory system (ELN, LIMS or LES) would be best suited for the task.

But today’s systems often require difficult choices, because the functionality they provide, like instrument integration or reporting, can overlap significantly. Technology convergence of the systems will lead to a more integrated platform consisting of modules providing specific functionalities.

Conceptually this supports the data flow by separating it from the process steps traditionally required by your lab systems.

Given the data integrity, quality and completeness expected today, using one modular system that consistently gives instructions to instruments and gathers results and metadata at the end of each step would be ideal. In such a scenario, manual data capture or transfer and formatting steps can be removed from the process completely.

The Allotrope Foundation has used the following simplified workflow to develop the concept of the Allotrope data format:

• Contextual metadata accumulates along every step

• Distributed across multiple systems and records
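The idea that contextual metadata accumulates along every step can be sketched in a few lines of Python. The record structure, step names and fields below are invented for illustration; they are not the Allotrope or BIOVIA schema.

```python
def run_step(record, step_name, results, metadata):
    """Append one workflow step's output plus the context describing it."""
    record["steps"].append({
        "step": step_name,
        "results": results,
        "metadata": metadata,  # e.g. instrument settings, operator, timestamp
    })
    return record

record = {"experiment_id": "EXP-001", "steps": []}
run_step(record, "prepare_samples", {"vials": 8}, {"operator": "A. Smith"})
run_step(record, "acquire_data", {"raw_file": "run1.raw"},
         {"instrument": "HPLC-7", "column_temp_c": 35.0})

# Metadata has accumulated along every step and stays attached to the results.
all_metadata = [s["metadata"] for s in record["steps"]]
```

Because the context travels with the results rather than living in a separate system, later steps (or other systems) can read both together.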


[Figure: A simplified measurement workflow: Plan Analysis → Prepare Samples → Submit Samples → Control Instrument → Acquire Data → Process Data → Analyze Data → Report Results → Store/Archive Data. Each step produces its own output (sample prep data, instrument instructions, instrument data, processed data, analyzed data, reported results, stored data, analytical method), distributed across different systems (ELN, instrument software, file shares, databases, a request 'system', LIMS, analysis software), with the goal of capturing all metadata as records are authored, documented, analyzed, captured, accessed and reviewed across LIMS, SDMS, chemistry ELN, LES and inventories.]


A key aspect is that at each step all results are captured, including the metadata describing the experimental conditions, e.g., the settings of the instrument. Only with a complete set of metadata can the system reliably compare the results with data coming from a different source.
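As a sketch of why completeness matters, the comparison rule can be made explicit in code: two results are only comparable when both carry the full set of required contextual metadata and it matches. The field names here are invented for illustration.

```python
# Invented metadata fields for illustration only.
REQUIRED_META = {"instrument", "method", "column_temp_c"}

def comparable(result_a, result_b):
    """Only compare results whose required contextual metadata is complete and equal."""
    meta_a, meta_b = result_a["metadata"], result_b["metadata"]
    if not (REQUIRED_META <= meta_a.keys() and REQUIRED_META <= meta_b.keys()):
        return False  # incomplete metadata: no reliable comparison possible
    return all(meta_a[k] == meta_b[k] for k in REQUIRED_META)

a = {"value": 1.2, "metadata": {"instrument": "HPLC-7", "method": "M-12", "column_temp_c": 35.0}}
b = {"value": 1.3, "metadata": {"instrument": "HPLC-7", "method": "M-12", "column_temp_c": 35.0}}
c = {"value": 0.9, "metadata": {"instrument": "HPLC-7"}}  # settings were never recorded
```

A result like `c`, whose conditions were never recorded, is silently excluded rather than wrongly pooled with data measured under unknown conditions.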

• BIOVIA foundational management of the Allotrope Ontologies & Vocabularies

• BIOVIA foundational connection to laboratory instruments using the Allotrope ADF format

• BIOVIA Recipe and Methods in S88 format

• Customer-focused proof-of-concept projects in progress

Members of the Allotrope Partner Network like BIOVIA have already proven that their ELN and lab infor-matics solutions can successfully work with transactional data by importing the Allotrope Foundation Ontology (AFO) and using the Allotrope Data Format (ADF).


2 STANDARD DATA FORMAT

Certain conventions have been established to enable machines to deal with data. When Tim Berners-Lee designed the World Wide Web, an important step was to define the markup language HTML.

A markup language is a system for annotating a document. The idea and terminology evolved from the “marking up” of paper manuscripts, i.e., editorial revision instructions.

In digital media, these instructions were replaced by tags. Markup tells the software how to display a document in a browser.

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>An example of HTML</title>
</head>
<body>
  <h1>This is MAIN header</h1>
  <a href="http://3dsbiovia.com/">BIOVIA</a>
  <h2>This is SUB header</h2>
  <p>Plain text</p>
</body>
</html>

XML is the acronym for Extensible Markup Language. In XML, tags are not predefined as they are in HTML. For this reason, XML is much more flexible when it comes to describing data.

<?xml version="1.0" encoding="utf-8"?>
<!-- This is an example -->
<people>
  <person>
    <name>Stephen Hayward</name>
    <employment status="full time"/>
    <motto xmlns:h="https://www.w3.org/1999/xhtml">
      I <h:b>love</h:b> DATA!
    </motto>
  </person>
</people>
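Because the tags describe the data, any XML-aware program can read such a document without knowing its layout in advance. A minimal sketch using Python's standard library (a trimmed version of the XML above, with the namespaced motto element omitted for brevity):

```python
import xml.etree.ElementTree as ET

xml_doc = """<people>
  <person>
    <name>Stephen Hayward</name>
    <employment status="full time"/>
  </person>
</people>"""

root = ET.fromstring(xml_doc)        # parse the markup into an element tree
person = root.find("person")
name = person.find("name").text      # element content
status = person.find("employment").get("status")  # attribute value
```

The consumer navigates by tag name and attribute, not by position in the file; that is what makes the data self-describing.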

Data standards for different areas of science (e.g., analytical chemistry) have existed for many years. As data from different technologies differ, people started with technology-specific standards.

By implementing data standards, users benefit from simpler long-term storage and retrieval of analytical data along with more advanced techniques for data mining and knowledge generation.

Demonstrating conformity with regulatory compliance and record retention requirements is also far simpler when working with a single, vendor-neutral open-standard format.


Page 6: The Lab Digitalization Loop picks up speed… · THE LAB DIGITALIZATION LOOP ... and many manual data transactions. ... the Internet of Things and Artificial Intelligence has created

10 11

Analytical Information Markup Language (AnIML) was an early attempt to use XML as a markup language for analytical chemistry data generated by technologies such as chromatography and spectroscopy (2).

With the increasing use of process analytical technologies (PAT) and design-for-manufacturing strategies in the pharmaceutical sector, data from many diverse instrument types and vendors are brought together in a single data analysis package where the actual “results” are computed.

Unfortunately, building AnIML took a long time and concerns about usability and the ability to handle large data volumes drove another consortium of 13 large Pharma companies (the Allotrope Foundation) to sponsor a slightly different approach based on HDF (Hierarchical Data Format), ontologies and an API framework to reduce the implementation effort for end-users.

The Allotrope Data Format defines a highly flexible HDF data container that should be able to serve many use cases beyond just analytical chemistry. Its adoption at vendor and industry sites and ongoing implementation work indicate that a solution to a long overdue challenge has been found.

As a member of the Allotrope Partner Network, BIOVIA has many years of experience in managing laboratory instrument transactions, data transport, and archiving using the Allotrope ADF format, while building recipe and method information in the S88 format using Allotrope ontologies.

3 SEMANTIC TECHNOLOGY

When describing an experiment, scientists usually use the terms they traditionally use when communicating with colleagues, the terms proposed by a specific software tool, or the words used in an instrument manual. After years of scientific work, scientists tend to use the language they are most comfortable with.

But scientists also face situations where they need to share data while collaborating and comparing results with colleagues in another part of the company who are working on a similar experiment. Even after agreeing on units and experimental conditions, data can only be combined manually. Often the usage of certain terms differs between individuals. And in today’s world, there is also the “electronic colleague,” the computer or software that has yet another expectation about how certain data should be labelled.

IT departments often provide a defined list of terms, a controlled vocabulary, and emphasize the importance of taxonomies and ontologies. But the R&D departments or laboratory operations of most pharmaceutical companies have not yet managed to implement such ontologies holistically.

Beginning with a list of agreed terms like a glossary or dictionary, scientists might work towards building a taxonomy introducing a clear hierarchy of terms. In an ontology it is possible to represent complex relationships based on a formal logic. These relationships are what make the ontology valuable for the computational tools using this semantic power.
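The progression from a flat controlled vocabulary to a taxonomy can be sketched as simple data structures. The terms below are invented examples, not an actual laboratory vocabulary.

```python
# A controlled vocabulary: a flat list of agreed terms.
controlled_vocabulary = {"analytical method", "chromatography", "HPLC", "UV detector"}

# A taxonomy adds a hierarchy: each term points to its broader parent term.
taxonomy = {
    "HPLC": "chromatography",
    "chromatography": "analytical method",
}

def broader_terms(term):
    """Walk up the hierarchy and collect every broader term."""
    terms = []
    while term in taxonomy:
        term = taxonomy[term]
        terms.append(term)
    return terms

hierarchy = broader_terms("HPLC")
```

Here `broader_terms("HPLC")` yields `["chromatography", "analytical method"]`, so a query for chromatography data can automatically include HPLC results; an ontology then goes further by adding typed relationships beyond this single parent-child link.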


[Figure: Semantic Spectrum of Knowledge Organization Systems]

[Figure: The Allotrope Data Format (ADF): an HDF5-based, platform-independent file format specifically designed to store and organize large amounts of scientific data. It combines a Data Description (a semantic graph model holding descriptive metadata about method, instrument, sample, process, result, provenance and audit trail), Data Cubes (a universal data container for analytical data represented by one- or multidimensional arrays of homogeneous data structures) and a Data Package (a virtual file system for data in arbitrary formats, including native instrument formats, images, PDF and video), all accessed through APIs (Java & .NET class libraries).]


Definition: An ontology is a model that provides a formal description of entities, their attributes and all sorts of relationships that can hold between them. An ontology:

• uses a knowledge representation language based on formal logic

• can represent complex relationships among objects and include the rules and axioms missing, e.g., from semantic networks

• allows automated reasoning
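The "automated reasoning" point can be illustrated with a toy set of subject-predicate-object triples and a single transitivity rule. The terms and the rule are deliberately simplified; real ontologies use formal languages such as OWL rather than hand-written loops.

```python
# Toy triples (subject, predicate, object); not a real ontology file.
triples = {
    ("HPLC", "is_a", "chromatography"),
    ("chromatography", "is_a", "separation technique"),
    ("HPLC", "uses", "mobile phase"),
}

def infer_is_a(facts):
    """Apply the transitivity rule for 'is_a' until no new facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for (a, p1, b) in list(facts):
            for (c, p2, d) in list(facts):
                if p1 == p2 == "is_a" and b == c and (a, "is_a", d) not in facts:
                    facts.add((a, "is_a", d))
                    changed = True
    return facts

inferred = infer_is_a(triples)
```

After inference the machine "knows" that HPLC is a separation technique, although that triple was never stated explicitly; this is the kind of conclusion a reasoner draws from the formal relationships.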

The ontology concept has also been applied in the context of the modern Internet, where it extends the network of hyperlinked human-readable web pages by inserting machine-readable metadata about pages and how they relate to each other. This enables automated agents to access the Web more intelligently and perform more tasks on behalf of users.

BIOVIA has successfully demonstrated how an ontology can automatically be imported into an ELN.

Amgen has been the first pharmaceutical organization to use this capability, applying Semantic Web technologies to instrument integration through BIOVIA Lab Services. Amgen manages the installation of hundreds of new laboratory instruments each year by leveraging Semantic Web technologies and Allotrope Foundation standards. They have developed and deployed nearly ‘plug-and-play’ capabilities to connect new instruments to the BIOVIA lab solution.

4 INTERNET OF LAB THINGS

The Internet of Things (IoT) is a network of devices, all embedded with electronics, software, and sensors that enable them to exchange and analyze data. Soon probably every sensor in a lab will have an IP address and a dashboard to easily address each device or “laboratory thing.”

Some years ago, tools were available that allowed scientists to see the lab inventory and determine if a certain instrument was on or off, providing extensive functionality using Web services.

With the arrival of the Internet of Things (IoT) at home and in the manufacturing space, new functionalities are becoming available. Start-up companies are offering IoT platforms specifically tailored for the lab environment.

Functionality of these platforms is often still limited to the status parameters of individual instruments.

The crucial step will be to appropriately combine instrument status data with actual result data produced while the experiment runs on the instrument.
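One simple way to combine the two streams is to stamp every reading with a time and attach the latest instrument status to each result. The event structure below is a hypothetical sketch, not the API of any specific IoT platform.

```python
def annotate_results(results, status_events):
    """Attach the most recent instrument status at or before each result's timestamp."""
    status_events = sorted(status_events, key=lambda e: e["t"])
    annotated = []
    for r in sorted(results, key=lambda r: r["t"]):
        current = None
        for e in status_events:
            if e["t"] <= r["t"]:
                current = e["status"]
            else:
                break
        annotated.append({**r, "instrument_status": current})
    return annotated

status_events = [{"t": 0, "status": "idle"}, {"t": 5, "status": "running"}]
results = [{"t": 2, "value": 17.0}, {"t": 7, "value": 42.0}]
annotated = annotate_results(results, status_events)
```

A result acquired while the instrument was not in a "running" state can then be flagged automatically for review.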

Some IoT companies claim that their customers can already manage instruments, automate experiments and collect result data on the same dashboard while retrofitting “legacy” equipment to this environment. Ensuring this process is performed correctly is a substantial task that puts a high onus on both the IoT provider and the customer for validation.


[Figure: Options for linking instruments: protocol converter, file, database/SDMS, or instrument web services (e.g., Empower); metrology and procedure execution rely on file and database transfer or direct results transfer.]


Capabilities for the lab that IoT companies are working on:

• Integration of intelligent labelling of all components in the lab

• Better data capture into the lab workflow

Wearables range from the biometric bracelet that allows a scientist or technician to access the lab all the way to Augmented Reality accessories like Google Glass that display Standard Operating Procedures (SOPs) in the wearer's visual field or record the complete execution of an experiment.

Defining the added value of these components as well as their usability in the lab are challenges that still need more work.

Other challenges include data security in an environment where each instrument has its own IP address and our current lack of standards defining which IoT protocols should be used across different industries.

5 DATA ARE YOUR BIGGEST ASSET

For decades pharma companies regularly stated that their corporate knowledge, the “data residing in their databases,” is their biggest asset.

That statement might be generally correct, but unfortunately it is still quite difficult for many companies to leverage that knowledge. This is due to the many different formats for storing information, the different terminology used in each department and the missing links between structured and unstructured data.

Unstructured data describes all the documents, minutes, reports and presentations that are created during a project and often contain the crucial conclusions that have been drawn from a series of experiments.

While text analysis tools can help with the interpretation of unstructured data, automatically linking unstructured to structured data is a field still in its infancy. Today codes and identifiers are still the safest way to link free text with experimental data.
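A sketch of this identifier-based linking: free text is scanned for IDs that match an agreed pattern, and each hit is joined to the structured record it names. The ID scheme and records here are invented for illustration.

```python
import re

# Hypothetical sample-ID scheme, e.g. "S-2023-0042".
SAMPLE_ID = re.compile(r"\bS-\d{4}-\d{4}\b")

report = "Samples S-2023-0042 and S-2023-0077 showed unexpected activity."
structured = {
    "S-2023-0042": {"compound": "X1", "assay": "binding"},
    "S-2023-0077": {"compound": "X9", "assay": "binding"},
}

# Join every ID found in the free text to its structured record.
linked = {sid: structured[sid] for sid in SAMPLE_ID.findall(report) if sid in structured}
```

The link is only as reliable as the discipline with which identifiers are written into documents, which is exactly why codes remain the "safest" bridge between the two worlds.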

Structured data refers to any data that resides in a fixed field within a record or file and is typically contained in relational databases and spreadsheets; it therefore depends on an underlying data model. For new structured data, we have already discussed the methods required to allow the computer to interpret data, such as ontologies, data formats and contextual metadata.

The challenge is how to apply these principles consistently across the organisation. This is where guidance for good data management might not be good enough and “Data Governance” becomes important.

A Data Governance body must ensure that a common data architecture is used across all departments in an organisation. The value of good data management results from linking information across organisational boundaries. If departments use different data management approaches, that value might never materialise.

Even in an R&D organisation, it is not always easy to align the data management principles between Early Research, Clinical Research and Regulatory. The requirements in these departments are different and, therefore, they have developed different approaches towards their data management with good reason. Convincing those departments to move to an ontology-based approach with consistent data formats that are meaningful across the company is not an easy task.


[Figure: Wearables and mobile devices for the lab: biometric ID bracelets, heads-up display safety glasses, motion sensors, wearable displays, barcode scanners, mobile devices, RFID, NFC.]


Over the last three decades, there have been many mergers and acquisitions in Pharma. Information belonging to active projects of the individual companies is now merged into new, more consistent databases for the new organisation. All other data are archived, which means that those data are written to tape, stored in a secure environment—and with few exceptions never used again.

As previously discussed, there are good reasons for this outcome: lack of consistent formats, lack of metadata, no consistent terminology.

Data quality can be improved by adding additional metadata or just cleaning the data. Unfortunately, these exercises are very work intensive.

Of course, one last resort is to treat the “structured” legacy data like unstructured data and use semantic analysis tools to create the missing context.

There are two main categories of new data:

• Project-related data: Data generated to advance a specific drug project.

• Non-project data: Data created in a screening effort, for method development, for basic research or as a result of a project that has failed at a stage along the pharma R&D pipeline.

Non-project data are often considered the most valuable data, because they allow scientists as well as computers to learn how to differentiate between active and non-active compounds.

In the past, it has been difficult to access data even within the same pharmaceutical company. Accessing data from other departments was often not encouraged and accessing clinical data was often impossible for confidentiality reasons. And even when data could be accessed, data still required reformatting for reuse or further analysis.

Tools like BIOVIA Pipeline Pilot can simplify this task. Pipeline Pilot enables scientists to rapidly create, test and publish scientific services that automate the process of accessing, analysing and reporting scientific data, either for the scientist’s personal use or for sharing across the scientific community.

For today’s data scientists, many more sources like RWD (Real World Data), outcome data and well-structured literature data are available for Big Data Analysis.

6 ARTIFICIAL INTELLIGENCE

Most of what has been discussed so far serves one purpose: enabling data to make the computer a more intelligent and efficient partner in the lab. Expert systems, QSAR prediction tools, neural networks and the more recent artificial intelligence (AI) and Machine Learning (ML) tools have all suffered from two key problems: quality of the data input and usability of the software.

Scientists in BioPharma highly value the support of data scientists or the IT team to assist with data analysis, but they are often scarce commodities and their availability is limited.

It would be much more convenient to have data available in a self-descriptive standard data format, with access to an intuitive toolset that easily guides the scientist without requiring much additional assistance by specialists.

Quantitative structure-activity relationship (QSAR), in simplest terms, is a method for building computational or mathematical models which attempt to find a statistically significant correlation between structure and function using a chemometric technique (3). In terms of drug design, structure here refers to the properties or descriptors of the molecules, their substituents or interaction energy fields. Function corresponds to an experimental biological/biochemical endpoint like binding affinity, activity, toxicity or rate constants.
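In its simplest linear form, a QSAR model is an ordinary least-squares fit of activity against molecular descriptors. The descriptor values and activities below are invented toy numbers, not real measurements.

```python
import numpy as np

# Toy descriptor matrix: intercept column, molecular weight, logP (invented values).
X = np.array([
    [1.0, 300.0, 2.1],
    [1.0, 350.0, 3.0],
    [1.0, 280.0, 1.5],
    [1.0, 410.0, 3.8],
])
y = np.array([5.2, 6.1, 4.8, 7.0])  # measured activity, e.g. pIC50 (invented)

# Fit coefficients that minimize the squared prediction error.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
predicted = X @ coef
```

Real QSAR work adds descriptor selection, cross-validation and significance testing on far larger data sets; this sketch only shows the structure-to-function correlation at its core.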



Artificial Neural Networks are a class of machine learning algorithms inspired by biological neural networks; they are used to approximate a mapping from many inputs to a target output. Deep Learning networks are built with several layers in which the output of one layer serves as the input of the next layer (4).
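The layer-stacking idea can be shown in a few lines of NumPy: each dense layer feeds its output to the next. This is a sketch of the forward pass only, with random untrained weights, not a full training loop.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_layer(n_in, n_out):
    """One layer: a weight matrix and a bias vector."""
    return rng.normal(size=(n_in, n_out)), np.zeros(n_out)

def forward(x, layers):
    """The output of one layer serves as the input of the next."""
    for weights, bias in layers:
        x = np.maximum(0.0, x @ weights + bias)  # ReLU nonlinearity
    return x

# A small "deep" stack: 4 inputs -> 8 -> 8 -> 1 output.
network = [dense_layer(4, 8), dense_layer(8, 8), dense_layer(8, 1)]
output = forward(np.ones((1, 4)), network)
```

Training would adjust the weights by backpropagation; the depth comes simply from repeating the layer pattern.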

Early success stories increasingly result from the use of Deep Learning in Drug Discovery, which many startups are also leveraging (citations from 5).

Janssen uses AI in many of their projects. AI systems that are trained on various data sources, including preclinical data sets, have helped them make “significant performance improvements” by enabling “better selections of which compounds to…make and test” in the lab and by “flagging” whether compounds might have “toxic” or “unexpected favorable” effects.

The German pharmaceutical company Merck Group has developed two drugs using computer-vision software, which analyzes images of cells and tissues. They also have other AI systems capable of drawing insights from public databases of genetic and chemical information.

AI has helped scientists at Berg decide which cancers they were going to go after by helping them understand how a drug in clinical testing might work at the cellular level.

CONCLUSIONS

We have completed a journey from the processes in the lab, through data formats and IoT, to data analysis and AI. For scientists, it is critical to understand that treating data well at the source in the lab and capturing it in a consistent manner will make the reuse of the data significantly easier and R&D processes much more efficient.

What Is In It For Me? Return On Investment Is Key

Usually you would expect a summary chapter at this place. I prefer to call it the “Return on Investment” chapter, because without an investment of resources the digitalization loop is not going to gain speed quickly. Organizations need to describe the value of digitalization in terms specific to their own circumstances. Defining the business value will provide the resources and means that are needed for good data management.

Most probably, for several steps the business value will not become visible immediately. Using ontologies to describe experimental results will only become valuable when users start linking their data to new data sources, including literature data or data from a new collaboration.

However, by explaining the whole digitalization loop an organization can create a better understanding of this dynamic.

Once the data is in good shape, the organization will be able to work or “experiment” completely differently with its data. Combining data with data from collaborators, literature or “Real World Data” sources is an electronic experiment. Using clinical outcome data to create a hypothesis about the value of a new chemical entity can make a huge difference in the prioritization of a project. All this becomes feasible because a semantic approach enables scientists to combine different datasets much more easily. Computers can then interpret the data correctly and use algorithms to help scientists arrive at conclusions that would not have been possible otherwise.

REFERENCES

(1) John Joyce, “Managing Workflow,” Lab Manager, February 4, 2016

(2) Tony Davies, “Herding AnIMLs,” Chemistry International, vol. 29, no. 6, 2007

(3) J. Verma, V. M. Khedkar and E. C. Coutinho, “QSAR,” Curr. Top. Med. Chem., 2010, 10, 95

(4) Garrett B. Goh, Nathan O. Hodas and Abhinav Vishnu, “Deep Learning for Computational Chemistry,” J. Comput. Chem., 2017, 38, 1291

(5) Daniela Hernandez, “How AI Is Transforming Drug Creation,” The Wall Street Journal, June 25, 2017



Copyright Information: © 2017 Data4Lab Ltd. All rights reserved. For more information, please contact [email protected]