BIG DATA: FROM UTOPIA TO REALITY
Big data and AI: the challenge turned into success stories in the biotechnology and pharmaceutical industry
Tuesday, 6 February 2018
Toni Manzano, R&D Director and co-founder
AGENDA
1 What does big data really mean?
2 How do we extract knowledge from big data?
3 Real use cases in one of the most complex industries
1
What does big data really mean?
Artificial Intelligence
The new world of big data, internet connectivity and analytics is built from many pieces
Internet of Things
Machine Learning
What is big data
90% of the data in the world today has been created in the last 24 months alone
Source: Adeptia
Volume
Every day we create 5 quintillion bytes of data
= 5,000,000,000,000,000,000 characters / day
= 5 exabytes / day
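The conversion on this slide can be checked with a few lines of Python. This is a minimal sketch that assumes decimal (SI) units, where one exabyte is 10^18 bytes, and one byte per character; the variable names are illustrative only.

```python
# Assumption: SI units (1 EB = 10**18 bytes) and one byte per character.
BYTES_PER_EXABYTE = 10**18

bytes_per_day = 5 * 10**18  # 5 quintillion bytes, the figure quoted on the slide

exabytes_per_day = bytes_per_day / BYTES_PER_EXABYTE
bytes_per_second = bytes_per_day / (24 * 60 * 60)  # average rate over one day

print(f"{bytes_per_day:,} bytes/day")
print(f"{exabytes_per_day:.0f} EB/day")
print(f"{bytes_per_second / 10**12:.1f} TB/s on average")
```

Spread evenly over a day, the quoted volume corresponds to tens of terabytes every second.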
Big data and the magic of aggregation and granularity
STATISTIC CHALLENGES
Yet the adoption rate of big data and cloud technologies in pharma lags behind that of other industries
[Chart: Productivity Increase (%) and Sales Increase ($B) by industry. Industries shown: Retail, Consulting, Transportation, Construction, Food Products, Steel, Automobile, Industrial instruments, Publishing, Telecommunications. Productivity increases range from 17.0% to 49.0%; sales increases range from $0.4B to $9.6B.]
Source: Wipro
Companies investing in AI by industry
SOFTWARE, INFORMATION TECHNOLOGY SERVICES: 32%
INTERNET: 8.78%
TELECOM: 4.19%
RESEARCH: 3.37%
RETAIL: 2.66%
MARKETING AND ADVERTISING: 2.55%
FINANCIAL SERVICES: 2.35%
AUTOMOTIVE: 2.15%
… AND MORE
?
Source: Naimat, Aman “The New Artificial Intelligence Market” O’Reilly
2
How do we extract knowledge from big data?
Examples of emergent knowledge from big data
Square, a platform originally presented as a device plus a backend to process credit card payments, recently announced its focus on generating a new revenue stream: big data and analytics. Using its massive amount of transaction data, Square offers valuable information to a new range of customers.
UPS, one of the earliest adopters of business analytics, is moving to a new dynamic package routing program which will save the company tens of millions each year in fuel costs. “UPS executives don’t necessarily view Big Data as new,” Guest Columnist Thomas H. Davenport writes, “but they do view it as providing revolutionary benefits through evolutionary implementation.”
Nest has negotiated deals with multiple energy partners in the U.S. Some utility partners are willing to spend $30 to $60 per year and per thermostat to be able to turn the air conditioner up when it’s a hot day. This way, the utility can level the load on the grid. Partners don’t have direct access to the thermostats; they just sign a deal with Nest, and Nest has access to the thermostats. Moreover, it’s a recurring revenue stream.
Rio Tinto’s Pilbara region mines, railways and ports generate 2.4 terabytes of data a minute, and its new, state-of-the-art processing centre in Brisbane is working towards processing this valuable information. The company recently reported that the centre has already reduced its costs by US$80 million.
GETTING KNOWLEDGE
Kaiser Permanente: office visits reduced by 26.2%, telephone visits increased 8x.
Premier: 2,700 member hospitals and health systems, 90,000 non-acute facilities and 400,000 physicians, saving an estimated 29,000 lives and reducing healthcare spending by nearly $7 billion.
Asthmapolis has created a GPS-enabled tracker that monitors inhaler usage by asthmatics.
ginger.io offers a mobile application in which patients (such as those with diabetes) agree, in conjunction with their providers, to be tracked through their mobile phones and assisted with behavioral health therapies.
Johns Hopkins School of Medicine discovered how to predict sudden increases in flu-related emergency room visits from Google Flu Trends.
The analysis of Twitter updates was as accurate as (and two weeks ahead of) official reports at tracking the spread of cholera in Haiti after the January 2010 earthquake.
Examples of emergent knowledge in health
Drivers of change in health
Quality in processes and procedures is required
Main magnitudes in the health system in Catalonia
• 46 million primary care visits per year
• 760,000 hospital admissions per year
• 100,000 new public health members per year
• 2.7 million emergency visits per year
• 60 million electronic recipes per year
• 140 million electronic prescriptions per year
• 63 hospitals, 49 mental health
• 369 primary care teams
Source: AQuAS, Observatori de Salut de Catalunya, 2013
Data volumes generated
• 3 GB for each codified genotype
• 30 MB for each X-ray test
• 120 MB for each mammography
• 1 GB for each 3D CT scan
• 150 MB for each magnetic resonance scan
• 80% unstructured data
• 20-40% annual increase
• 665 TB of data generated per hospital
Source: NetApp, The Body as a Source of Big Data, 2013
Do you know how much data is being generated at your site?
Emergence of knowledge
10 TB, 500 TB, 100 PB per year
Volume, Variety, Velocity, Veracity
When statistical results have a huge impact on users
n-dimension
3
Real use cases in one of the most complex industries
$50 billion wasted by pharma manufacturers each year
PHARMA MANUFACTURING CHALLENGES
Source: W. Nicholson Price II, Making Do in Making Drugs: Innovation Policy and Pharmaceutical Manufacturing, 55 B.C.L. Rev. 491
70% of manufacturing data is unused
CHALLENGES
Source: Gartner
Mindset change
DATA SCIENCE CHALLENGES
[Diagram. Data sources: SAP, ERP, MES, LIMS, Legacy, IoT, IIoT, CLIMA, HVAC, QC, Users (…). Siloed data, 70% unused. Regulated Data Hub. Big data, Cloud technologies, Artificial Intelligence, Machine Learning. New knowledge: CQA, CPP, DS. Processes, Engineering.]
Something is changing in Pharma…
New statistical tools for a new age based on big data
Classical statistics
• ANOVA
• Chi-square
• Student's t-test
• p-value
• Linear Regression
• Gaussian Distribution
• Representative sample
Big data statistics
• Decision Trees
• Naive Bayes Classification
• Logistic Regression
• Neural Networks
• Principal Component Analysis
• Least Squares Regression
• NLP
• Clustering Algorithms
• Bayesian Networks
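To make the contrast between the two families concrete, the sketch below applies one tool from each to the same numbers: a two-sample t statistic (Welch's variant of the Student test, which asks whether two predefined groups differ) and a small clustering routine, which lets the groups emerge from the data. The measurements and function names are invented for illustration and are not taken from the talk.

```python
import math
import statistics


def welch_t(a, b):
    """Classical tool: Welch's two-sample t statistic for two predefined groups."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va / len(a) + vb / len(b))


def kmeans_1d(values, k=2, iterations=20):
    """Data-driven tool: plain k-means on one-dimensional data (requires k >= 2)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [statistics.mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical pH readings from two production lines.
line_a = [7.1, 7.0, 7.2, 6.9, 7.1]
line_b = [7.4, 7.5, 7.3, 7.6, 7.4]

t = welch_t(line_a, line_b)                       # groups are given in advance
centroids, clusters = kmeans_1d(line_a + line_b)  # groups are discovered

print(f"t statistic: {t:.2f}")
print(f"cluster centres: {[round(c, 2) for c in centroids]}")
```

On this toy data the clustering recovers the two lines without being told which reading came from where, which is the kind of question the second list of tools is suited to.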
Data management process
Data Acquisition
Data Preparation
Data Aggregation
Analysis and Models
Interpretation, Visualization
Knowledge generation
Decision making
Computation, Heterogeneity, Scalability, Velocity, Privacy, Security
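The first stages of this process can be sketched as a chain of small functions. This is only an illustration under stated assumptions: the records, the parameter names and the spec limits are hypothetical, and a real deployment would read from systems such as MES or LIMS rather than from an in-memory list.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw sensor records: (batch id, parameter, raw reading as text).
RAW_READINGS = [
    ("B001", "pH", "7.1"),
    ("B001", "pH", "7.3"),
    ("B001", "temperature", "36.9"),
    ("B001", "temperature", "n/a"),  # unusable reading
    ("B002", "pH", "6.2"),
    ("B002", "temperature", "37.1"),
]

# Hypothetical acceptance ranges used by the analysis step.
SPEC_LIMITS = {"pH": (6.8, 7.4), "temperature": (36.5, 37.5)}


def acquire():
    """Data acquisition: here just a copy of the in-memory list."""
    return list(RAW_READINGS)


def prepare(records):
    """Data preparation: parse numbers and drop unusable readings."""
    cleaned = []
    for batch, parameter, raw in records:
        try:
            cleaned.append((batch, parameter, float(raw)))
        except ValueError:
            continue
    return cleaned


def aggregate(records):
    """Data aggregation: mean value per (batch, parameter)."""
    grouped = defaultdict(list)
    for batch, parameter, value in records:
        grouped[(batch, parameter)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


def analyse(aggregated):
    """Analysis: flag aggregated values outside their spec limits."""
    findings = []
    for (batch, parameter), value in sorted(aggregated.items()):
        low, high = SPEC_LIMITS[parameter]
        if not low <= value <= high:
            findings.append((batch, parameter, round(value, 2)))
    return findings


findings = analyse(aggregate(prepare(acquire())))
for batch, parameter, value in findings:
    # Interpretation and decision making stay with the process expert.
    print(f"{batch}: {parameter} = {value} is out of range")
```

Keeping each stage as a separate function makes it easier to swap the in-memory source for a real one, or to scale a single stage, without touching the others.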
USE CASE 1 - Golden Batch
[Diagram: two batches from the same raw material (RM), one taking 12 days and the other 14 days, each with data from ERP, MES, CLIMA and Users]
?
CQA: All of them under specs
CPP: Equivalent results except culture duration
Neural Networks + Pattern recognition
CQA: pH, Temperature, O2, CO2, PID, …
CPP: Yield, Glucose, Flowrates, DOUR, % Addition, …
Others: Cleaning, BCO, Holding times, Climate, HVAC, …
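The slide names neural networks and pattern recognition as the technique used. As a much simpler stand-in (not the method from the talk), the sketch below ranks the parameters of a slow batch by how far they sit from a set of reference batches, measured in standard deviations. All batch values and parameter names are invented for illustration.

```python
from statistics import mean, stdev

# Hypothetical per-batch summaries; names and values are invented.
REFERENCE_BATCHES = [
    {"pH": 7.10, "temperature": 36.9, "holding_time_h": 4.0, "room_humidity": 45.0},
    {"pH": 7.12, "temperature": 37.0, "holding_time_h": 4.2, "room_humidity": 44.0},
    {"pH": 7.08, "temperature": 37.1, "holding_time_h": 3.9, "room_humidity": 46.0},
    {"pH": 7.11, "temperature": 37.0, "holding_time_h": 4.1, "room_humidity": 45.5},
]
SLOW_BATCH = {"pH": 7.10, "temperature": 37.0, "holding_time_h": 6.5, "room_humidity": 45.0}


def rank_deviations(reference, candidate):
    """Return (parameter, z-score) pairs, largest absolute deviation first."""
    scores = []
    for name, value in candidate.items():
        history = [batch[name] for batch in reference]
        spread = stdev(history)
        # A parameter with no variation in the reference set gets a neutral score.
        z = (value - mean(history)) / spread if spread > 0 else 0.0
        scores.append((name, z))
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


ranking = rank_deviations(REFERENCE_BATCHES, SLOW_BATCH)
for name, z in ranking:
    print(f"{name:>16}: {z:+.1f} standard deviations from the reference batches")
```

In this made-up example every monitored quality parameter looks normal, and the variable that stands out belongs to the "Others" group, which is the sort of hidden driver the use case is looking for.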
USE CASE 2 - OEE
Overall Equipment Effectiveness
Measuring the real-time batch activity and transversal operations
CPP, CQA and ambience measures
Downtimes < 26%
OEE > 35%
After a 6-month POC using bigengine, the 26% of unexpected downtimes was reduced and the OEE increased by 35%
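The slides do not spell out how OEE is computed. The sketch below uses the common definition, OEE = availability × performance × quality, with invented shift figures; they are not results from the POC.

```python
def oee(planned_minutes, downtime_minutes, ideal_cycle_minutes, total_units, good_units):
    """Return (availability, performance, quality, oee) as fractions between 0 and 1."""
    run_minutes = planned_minutes - downtime_minutes
    availability = run_minutes / planned_minutes
    performance = (ideal_cycle_minutes * total_units) / run_minutes
    quality = good_units / total_units
    return availability, performance, quality, availability * performance * quality


# Hypothetical shift: 480 planned minutes, 120 of them lost to downtime.
availability, performance, quality, score = oee(
    planned_minutes=480,
    downtime_minutes=120,
    ideal_cycle_minutes=1.0,
    total_units=300,
    good_units=285,
)
print(f"Availability {availability:.0%}, performance {performance:.0%}, "
      f"quality {quality:.0%}, OEE {score:.1%}")
```

Because the three factors multiply, cutting unexpected downtime raises availability and lifts OEE directly, even when performance and quality stay the same.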
USE CASE 3 - Predictive maintenance
Once-a-month data collection ≉ data for predictive maintenance
Classic procedure
• Failures between data collection rounds
• Inconsistent data collection
• Variable operating conditions
• Inaccessible machinery
• Manual Analysis
Unattended process
• Wireless Connectivity
• Inexpensive Sensors
• Big data in real time
• Cloud Computing
• Artificial Intelligence
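One way to picture the unattended process is a check that runs on every reading as it arrives, instead of once a month. The sketch below flags readings that deviate strongly from the recent past using a rolling z-score. It is a simple stand-in for the AI models mentioned on the slide, and the sensor values, window size and threshold are assumptions.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings far from the mean of the previous `window` readings."""
    recent = deque(maxlen=window)
    anomalies = []
    for index, value in enumerate(readings):
        if len(recent) == window:
            spread = stdev(recent)
            # Flag the reading if it is more than `threshold` standard deviations away.
            if spread > 0 and abs(value - mean(recent)) > threshold * spread:
                anomalies.append(index)
        recent.append(value)
    return anomalies


# Hypothetical vibration readings (mm/s) streamed from a wireless sensor.
vibration = [2.0, 2.1, 1.9, 2.0, 2.2, 2.1, 2.0, 4.8, 2.1, 2.0, 1.9, 2.1]
alerts = detect_anomalies(vibration)
print(f"Anomalous readings at positions: {alerts}")
```

A spike like the one in this series lasts for a single reading, so a monthly collection round would almost certainly miss it.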
USE CASE 4 - VOC, EL & Cooling process optimization
[Diagram. Labels: Solvents & Raw Material; Reactor 1, Reactor 2, (…), Reactor n; VOC; Emissions; Parallel processes; Sequential processes.]
Why?
USE CASE 5 - Defects in tablets
toni.manzano@bigfinite.com