
Chemoinformatics with artificial intelligence

Péter Antal

Computational Biomedicine (Combine) workgroup

Department of Measurement and Information Systems,

Budapest University of Technology and Economics

Overview

• Chemoinformatics

• Artificial intelligence/machine learning

• The data flood in life sciences

• The data and knowledge fusion challenge

• The semantic unification in chemoinformatics

• Artificial intelligence in drug discovery

• Examples:

– Drug repositioning

– Drug-target interaction prediction

Chemoinformatics

• Gasteiger, Johann, and Thomas Engel, eds. Chemoinformatics: a textbook. John Wiley & Sons, 2006.

• Bajorath, Jürgen. Chemoinformatics for Drug Discovery. John Wiley & Sons, 2013.

• Karthikeyan, Muthukumarasamy, and Renu Vyas. Practical Chemoinformatics. Springer, 2014.

• Brown, Nathan. In Silico Medicinal Chemistry: Computational Methods to Support Drug Design. No. 8. Royal Society of Chemistry, 2015.

MK&RV: Practical Chemoinformatics

1. Open-Source Tools, Techniques, and Data in Chemoinformatics

Semantic(-web) technologies

2. Chemoinformatics Approach for the Design & Screening of Focused Virtual Libraries

Prioritization methods using data and knowledge fusion

3. Machine Learning Methods in Chemoinformatics for Drug Discovery

Prediction methods using data and knowledge fusion

4. Docking and Pharmacophore Modelling for Virtual Screening

Priors from docking

5. Active Site-Directed Pose Prediction Programs for Efficient Filtering of Molecules

Binding sites, pockets and latent dimensions

6. Representation, Fingerprinting, and Modelling of Chemical Reactions

7. Predictive Methods for Organic Spectral Data Simulation

8. Chemical Text Mining for Lead Discovery

9. Integration of Automated Workflow in Chemoinformatics for Drug Discovery

Visual data analytics and workflow systems

10. Cloud Computing Infrastructure Development for Chemoinformatics

Karthikeyan, Muthukumarasamy, and Renu Vyas. Practical Chemoinformatics. Springer, 2014.

Automating drug discovery

Schneider, Gisbert. "Automating drug discovery." Nature Reviews Drug Discovery 17.2 (2018): 97.

Design cycle

Automated drug discovery facility

Active learning with microfluidics

Artificial intelligence

• IBM Grand Challenge

– 1997: Deep Blue defeats world chess champion G. Kasparov.

– 1999-2006 and beyond: Blue Gene, protein folding simulation

– 2011: Watson

• Natural language processing

• Inference

• Game theory

IBM Watson (2011): Jeopardy

Clinical decision support systems

Watson for Oncology – assessment and advice cycle

www.avanteoconsulting.com/machine-learning-accelerates-cancer-research-discovery-innovation/

• Google DeepMind: AlphaGo

• Monte Carlo tree search

• 2016: 9 dan

• 2017: wins against the human champion

• 2017: Carnegie Mellon University AI: Libratus

• Pittsburgh Supercomputing Center:

– 1.35 petaflops computation

– 274 terabytes of memory

Poker: Libratus

• Teaching + learning: learning from the manual and from practice

Machines playing Civilization

Proportion of wins

Playing computer games

• YOLO (you only look once)

Vision: YOLO

https://www.ted.com/talks/joseph_redmon_how_a_computer_learns_to_recognize_objects_instantly#t-409586

Emotion detection, sentiment analysis

https://www.ted.com/talks/rana_el_kaliouby_this_app_knows_how_you_feel_from_the_look_on_your_face

Walking, movements

Real-time translation

D. Adams: The Hitchhiker's Guide to the Galaxy

Pilot Translating Earpiece

• ~"Big data failed, AI correctly predicted the upset victory" (it correctly predicted the US presidential election 3 times in a row)

Political analytics: MogIA

Automated essay scoring (AES)

• Judicial decisions:

– Human experts: 66% identical decisions.

– Katz, D.M., Bommarito II, M.J. and Blackman, J., 2017. A general approach for predicting the behavior of the Supreme Court of the United States. PLoS ONE, 12(4), p.e0174698.

• Cases from 1816-2015

• Accuracy above 70%

– COMPAS CORE

Legal applications of AI

February 28, 2019 21

http://beauty.ai/

• A beauty contest was judged by AI and the robots didn't like dark skin, Guardian

• Another AI Robot Turned Racist, This Time At Beauty Contest, Unilad

Beauty.AI


• Turing test, Loebner Prize

• Tay was an artificial intelligence chatterbot released by Microsoft Corporation on March 23, 2016. Tay caused controversy on Twitter by releasing inflammatory tweets, and it was taken offline around 16 hours after its launch. Tay was accidentally reactivated on March 30, 2016, and then quickly taken offline again.

Chatbot: Tay

• Gatys, L.A., Ecker, A.S. and Bethge, M., 2015. A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576.

Reproduction of artistic style

Automated discovery systems

Langley, P. (1978). Bacon: A general discovery system. Proceedings of the Second Biennial Conference of the Canadian Society for Computational Studies of Intelligence (pp. 173-180). Toronto, Ontario.

Chrisman, L., Langley, P., & Bay, S. (2003). Incorporating biological knowledge into evaluation of causal regulatory hypotheses. Proceedings of the Pacific Symposium on Biocomputing (pp. 128-139). Lihue, Hawaii.

(Gene prioritization…)

R. D. King et al.: The Automation of Science, Science, 2009

"Machine science"

Swanson, Don R. "Fish oil, Raynaud's syndrome, and undiscovered public knowledge." Perspectives in biology and medicine 30.1 (1986): 7-18.

Smalheiser, Neil R., and Don R. Swanson. "Using ARROWSMITH: a computer-assisted approach to formulating and assessing scientific hypotheses." Computer methods and programs in biomedicine 57.3 (1998): 149-153.

D. R. Swanson et al.: An interactive system for finding complementary literatures: a stimulus to scientific discovery, Artificial Intelligence, 1997

James Evans and Andrey Rzhetsky: Machine science, Science, 2013

"Soon, computers could generate many useful hypotheses with little help from humans."
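Swanson's fish oil and Raynaud's syndrome case is usually summarized as an "ABC" pattern. A co-occurs with B in one body of literature and B co-occurs with C in another, but A and C never co-occur. The shared B terms therefore suggest an unpublished A-C hypothesis. The sketch below illustrates this idea on a hypothetical toy corpus of index-term sets. It is not the ARROWSMITH implementation, and the documents are made up.

```python
from collections import defaultdict

def build_index(documents):
    """Map each term to the set of terms it co-occurs with in at least one document."""
    cooc = defaultdict(set)
    for terms in documents:
        for t in terms:
            cooc[t].update(terms - {t})
    return cooc

def abc_candidates(cooc, a, c):
    """Return the B terms linking A and C, or an empty set if A and C already co-occur."""
    if c in cooc[a]:
        return set()  # the A-C link is already published: nothing "undiscovered"
    return cooc[a] & cooc[c]

# Hypothetical toy corpus: each document is reduced to a set of index terms.
docs = [
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"raynaud", "blood viscosity"},
    {"raynaud", "platelet aggregation"},
    {"raynaud", "cold exposure"},
]
index = build_index(docs)
print(sorted(abc_candidates(index, "fish oil", "raynaud")))
```

Real literature-based discovery systems also rank the B terms and filter out generic ones. The set intersection only shows the core linking step.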

State of the art

• Deep Blue defeated the reigning world chess champion Garry Kasparov in 1997

• Proved a mathematical conjecture (Robbins conjecture) unsolved for decades

• No Hands Across America (driving autonomously 98% of the time from Pittsburgh to San Diego)

• During the 1991 Gulf War, US forces deployed an AI logistics planning and scheduling program that involved up to 50,000 vehicles, cargo, and people

• NASA's on-board autonomous planning program controlled the scheduling of operations for a spacecraft

• Proverb solves crossword puzzles better than most humans

• Google search

• Object recognition…

Hallmarks of a new AI era?

Factors behind the "AI/learning hype"

• New theory?

– Unified theory of AI?

– A new machine learning approach?

• New hardware? (computing power...)

– Graphics cards (GPUs)?

– Quantum computers?

• New resources?

– Data?

– Knowledge?

– Money?

– Brains/Minds?

Milestones and phases in AI

• ~1930: Zuse, von Neumann, Turing, ...: "instruction is data":

– Laws of nature can be represented, "executed"/simulated with modifications, and learned

– Knowledge analogously: representation, execution, adaptation and learning

• 1943 McCulloch & Pitts: Boolean circuit model of brain

• 1950 Turing's "Computing Machinery and Intelligence"

• 1956 Dartmouth meeting: the term "Artificial Intelligence"

• 1950s Early AI programs (e.g. Newell & Simon's Logic Theorist)

• The physical symbol system hypothesis: search

• 1965 Robinson's complete algorithm for logical reasoning

• 1966-73 AI discovers computational complexity; neural network research almost disappears

• 1969-79 Early development of knowledge-based systems

• The knowledge system hypothesis: knowledge is power

• 1986-- Neural networks return to popularity

• 1988-- Probabilistic expert systems

• 1995-- Causality research

• The "big data" hypothesis: let data speak

• 1995-- Emergence of machine learning

• 2005/2015-- Emergence of autonomous adaptive decision systems ("robots", agents)

• The autonomy hypothesis??

(Figure: computational complexity, knowledge representation, thresholds of knowledge, statistical complexity, computer)

Reminder: Automating drug discovery

Schneider, Gisbert. "Automating drug discovery." Nature Reviews Drug Discovery 17.2 (2018): 97.

Design cycle

Automated drug discovery facility

Active learning with microfluidics

The data flood in life sciences

Heterogeneous data in biomedicine

Genome(s)

Phenome (disease, side effect)

Transcriptome

Proteome

Metabolome

Environment & lifestyle

Moore’s Law for Data Explosion (Carlson’s law)

(Figure: sequencing costs per mill. vs. publicly available genetic data)

NATURE, Vol 464, April 2010

• ×10 every 2-3 years

• Data volumes and complexity that IT has never faced before…
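The "×10 every 2-3 years" figure can be converted into a yearly growth factor and a doubling time. The function names in this small arithmetic sketch are illustrative.

```python
import math

def annual_growth_factor(years_per_tenfold):
    """Yearly multiplication factor implied by a x10 increase every `years_per_tenfold` years."""
    return 10 ** (1 / years_per_tenfold)

def doubling_time(years_per_tenfold):
    """Years needed to double: solve 2 = 10**(t / T), i.e. t = T * log(2) / log(10)."""
    return years_per_tenfold * math.log(2) / math.log(10)

for T in (2, 3):
    print(f"x10 every {T} years: x{annual_growth_factor(T):.2f} per year, "
          f"doubling every {doubling_time(T) * 12:.1f} months")
```

This gives a yearly factor of about 3.16 (×10 every 2 years) or 2.15 (×10 every 3 years). That corresponds to a doubling roughly every 7 to 11 months.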

Bioactivity databases I.

• Targets: 10,774

• Compound records: 1,715,667

• Distinct compounds: 1,463,270

• Activities: 13,520,737

• Publications: 59,610

ChEMBL is a database of bioactive drug-like small molecules. It contains 2-D structures, calculated properties (e.g. logP, molecular weight, Lipinski parameters) and abstracted bioactivities (e.g. binding constants, pharmacology and ADMET data).

https://www.ebi.ac.uk/chembl
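Lipinski parameters are among the calculated properties listed above. The sketch below checks Lipinski's rule of five on precomputed descriptor values, using the standard cut-offs: molecular weight ≤ 500 Da, logP ≤ 5, at most 5 hydrogen-bond donors and at most 10 hydrogen-bond acceptors. The `Descriptors` record and the compound values are hypothetical. They do not reflect ChEMBL's schema or data.

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    """Hypothetical container for the precomputed descriptors of one compound."""
    name: str
    mol_weight: float   # Da
    logp: float         # calculated octanol-water partition coefficient
    h_donors: int
    h_acceptors: int

def ro5_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four rule-of-five criteria are violated."""
    rules = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ]
    return sum(rules)

def passes_ro5(d: Descriptors, max_violations: int = 1) -> bool:
    """Common convention: flag a compound only if more than one criterion is violated."""
    return ro5_violations(d) <= max_violations

# Illustrative, made-up descriptor values.
library = [
    Descriptors("cmpd_A", 342.4, 2.1, 2, 5),
    Descriptors("cmpd_B", 612.7, 5.8, 3, 8),
    Descriptors("cmpd_C", 530.2, 4.4, 1, 7),
]
for d in library:
    print(d.name, ro5_violations(d), passes_ro5(d))
```

In a real pipeline, the descriptors would come from a cheminformatics toolkit or from the database's own calculated properties, not from hand-entered numbers.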

Bioactivity databases II.

Compounds: 97,127,348

Substances: 252,300,917

BioAssays: 1,067,565

Tested Compounds: 3,417,415

Tested Substances: 5,591,261

RNAi BioAssays: 173

BioActivities: 239,680,570

Protein Targets: 12,159

Gene Targets: 58,186

Bioactivity databases III: ExCAPE-DB

Sun, J., Jeliazkova, N., Chupakhin, V., Golib-Dzib, J.F., Engkvist, O., Carlsson, L., Wegner, J., Ceulemans, H., Georgiev, I., Jeliazkov, V. and Kochev, N., 2017. ExCAPE-DB: an integrated large scale dataset facilitating Big Data analysis in chemogenomics. Journal of Cheminformatics, 9(1), p.17.

Data: chemogenomics screening

• Justin Lamb: The Connectivity Map: a new tool for biomedical research, Nature Reviews Cancer, 7, pp. 54-60, 2007

(Figure: compounds × cell lines matrix; each cell is a transcriptional profile)

Repositories for gene expression

• Gene Expression Omnibus (NCBI)

• http://www.ncbi.nlm.nih.gov/geo/

STRING - Protein-Protein Interactions

• http://string-db.org/

Number of genome-wide association studies

(Figure: total publications per calendar quarter, 2005-2012)

NHGRI GWA Catalog

www.genome.gov/GWAStudie

www.ebi.ac.uk/fgpt/gwas/

Published Genome-Wide Associations through 12/2012

Published GWA at p ≤ 5×10⁻⁸ for 17 trait categories
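The p ≤ 5×10⁻⁸ cut-off is the conventional genome-wide significance threshold. It is commonly explained as a Bonferroni correction of α = 0.05 for roughly m = 10⁶ independent common-variant tests across the genome. This is the usual rationale, assumed here rather than stated on the slide. By the union bound, assuming every one of the m tests is a true null, the chance of at least one false positive stays below α:

```latex
\Pr(\text{at least one false positive})
  \le \sum_{i=1}^{m} \Pr\!\left(p_i \le \tfrac{\alpha}{m}\right)
  = m \cdot \frac{\alpha}{m} = \alpha,
\qquad
\frac{\alpha}{m} = \frac{0.05}{10^{6}} = 5 \times 10^{-8}.
```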

Genetic-overlap-based disease maps

A.-L. Barabási: PNAS, 2007, The human disease network
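In the human disease network, two diseases are linked when they share at least one associated gene. This is a projection of the disease-gene bipartite graph onto the diseases. The minimal sketch below uses a made-up disease-gene table. The disease and gene labels are placeholders, not curated associations.

```python
from itertools import combinations

def disease_network(disease_genes):
    """Project a disease -> genes mapping onto diseases.

    Returns {(d1, d2): n_shared} for every disease pair sharing at least one gene;
    n_shared can serve as an edge weight.
    """
    edges = {}
    for d1, d2 in combinations(sorted(disease_genes), 2):
        shared = disease_genes[d1] & disease_genes[d2]
        if shared:
            edges[(d1, d2)] = len(shared)
    return edges

# Hypothetical associations, for illustration only.
associations = {
    "disease_A": {"G1", "G2", "G3"},
    "disease_B": {"G3", "G4"},
    "disease_C": {"G5"},
    "disease_D": {"G1", "G3", "G6"},
}
print(disease_network(associations))
```

Here disease_C remains isolated because it shares no gene with the others. The A-D edge carries weight 2 (genes G1 and G3).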

Epidemiological disease maps

Marx, P., Antal, P., Bolgar, B., Bagdy, G., Deakin, B. and Juhasz, G., 2017. Comorbidities in the diseasome are more apparent than real: What Bayesian filtering reveals about the comorbidities of depression. PLoS computational biology, 13(6), p.e1005487.

Number of biomedical publications

Little Science, Big Science, by Derek J. de Solla Price, 1963

(Figure: number of annual papers, 1950-2010)

Unification of biology: Gene Ontology

• Ontologies:

– Gene Ontology (GO): http://www.geneontology.org/

– Enzyme Classification (EC)

– Unified Medical Language System (UMLS)

– OBO (Open Biological and Biomedical Ontologies)

The Human Phenotype Ontology

http://human-phenotype-ontology.github.io/
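Ontologies such as GO and HPO are directed acyclic graphs, so a term can have several parents. An annotation to a term implicitly holds for all of its ancestors (the "true path rule" in GO). The minimal sketch below propagates annotations along is_a links. The term IDs and gene names are hypothetical placeholders, not real GO or HPO identifiers.

```python
def ancestors(term, parents):
    """Return all ancestors of `term` by following is_a links upward (iterative DFS)."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen

def propagate(annotations, parents):
    """Extend each gene's direct annotations with every ancestor term."""
    return {
        gene: set(terms).union(*(ancestors(t, parents) for t in terms))
        for gene, terms in annotations.items()
    }

# Hypothetical mini-ontology: child -> list of is_a parents (T4 has two parents).
is_a = {
    "T4": ["T2", "T3"],
    "T2": ["T1"],
    "T3": ["T1"],
}
genes = {"geneX": ["T4"], "geneY": ["T3"]}
print(propagate(genes, is_a))
```

The `seen` set prevents a term reachable through two parents (T1 here) from being expanded twice.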

Semantic publishing: papers vs DBs/KBs

M. Gerstein: "E-publishing on the Web: Promises, pitfalls, and payoffs for bioinformatics," Bioinformatics, 1999

M. Gerstein: "Blurring the boundaries between scientific 'papers' and biological databases," Nature, 2001

P. Bourne: "Will a biological database be different from a biological journal?," PLoS Computational Biology, 2005

M. Gerstein et al.: "Structured digital abstract makes text mining easy," Nature, 2007

M. Seringhaus et al.: "Publishing perishing? Towards tomorrow's information architecture," BMC Bioinformatics

M. Seringhaus: "Manually structured digital abstracts: A scaffold for automatic text mining," FEBS Letters, 2008

D. Shotton: "Semantic publishing: the coming revolution in scientific journal publishing," Learned Publishing, 2009

The fusion challenge in drug discovery

(Figure: combination of elements: target, disease, binding site, target protein, product, pathway, transcription factor)

E. D. Green et al. Nature 470, 204-213 (2011) doi:10.1038/nature09764

Accomplishments of genomics research

Pharma productivity (~gap)

Mullard, A., 2017. 2016 FDA drug approvals. Nature Reviews Drug Discovery, 16(2), pp.73-76.

The fusion bottleneck (~limits of personal cognition)

Watson?

The Science Behind an Answer

• http://www-03.ibm.com/innovation/us/watson/what-is-watson/science-behind-an-answer.html

Network of databases in 2000

• More than 10,000 relevant biological databases and knowledge bases

• Petabytes of sequence and high-throughput gene/protein data

• ~10,000,000 concepts and relations explicitly in knowledge bases

Linked Open Data in 2017

Linking Open Data cloud diagram 2017, by Andrejs Abele, John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/

Approaches to fusion

• Encyclopedists:

– Wikipedia, Wikidata

– Linked Open Data (LOD)

– Semantic unification

• Automated cross-domain querying

– Forms

– Workflow systems

– Natural language understanding, Machine reading

• Automated reasoning

– Watson

• Automated discovery systems ("Automation of science")

– Adam, Eve

• Large-scale similarity-based fusion applied in repositioning
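The last approach above can be illustrated with the simplest form of similarity-based fusion. Several sources (here chemical, target and side-effect similarity to a query drug) are combined into one score, and candidates are ranked by it. The weighted average, the source names and all scores below are hypothetical simplifications, not a specific published repositioning method.

```python
def fused_ranking(source_scores, weights):
    """Rank candidate drugs by a weighted average of per-source similarity scores.

    source_scores: {source_name: {drug: similarity to the query drug, in [0, 1]}}
    weights:       {source_name: non-negative weight}
    A drug missing from a source contributes 0 for that source (a simplifying assumption).
    """
    total_weight = sum(weights.values())
    drugs = set().union(*source_scores.values())
    fused = {
        drug: sum(weights[src] * scores.get(drug, 0.0) for src, scores in source_scores.items())
        / total_weight
        for drug in drugs
    }
    # Highest fused score first; ties broken alphabetically for a stable order.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical similarity scores of three candidates to a query drug.
scores = {
    "chemical":    {"drug1": 0.9, "drug2": 0.2, "drug3": 0.5},
    "target":      {"drug1": 0.1, "drug2": 0.8, "drug3": 0.6},
    "side_effect": {"drug2": 0.7, "drug3": 0.4},
}
weights = {"chemical": 1.0, "target": 1.0, "side_effect": 1.0}
for drug, score in fused_ranking(scores, weights):
    print(f"{drug}: {score:.3f}")
```

Equal weights are only a placeholder. In practice the weights would typically be tuned or learned, and missing scores handled more carefully than by substituting 0.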

Semantic unification in chemoinformatics

Semantic Web

• Tim Berners-Lee, 1999, "I have a dream...", W3C

• Web of data, Web 3.0

• Share, reuse, querying, integration of data, automatic processing, reasoning

• From publishing data in human-readable HTML documents to machine-readable documents

• Linked Data

The Internet network: nodes are computers or post-PC devices, and links are wired or wireless connections between them.

https://users.dimi.uniud.it/~massimo.franceschet/netart/talk/netart.html

The Resource Description Framework

• The data model of the Semantic Web

• RDF statement

– subject: resource identified by an IRI

– predicate (property): resource identified by an IRI

– object: resource or literal (constant value)

• Graph databases of RDF triples
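To make the subject-predicate-object model concrete, the sketch below stores a few triples as Python tuples and answers a triple pattern in which `None` acts as a variable. This is roughly what a basic graph pattern in SPARQL matches. The `http://example.org/` IRIs and the facts are hypothetical. A real application would use an RDF library and a proper triplestore.

```python
EX = "http://example.org/"  # hypothetical namespace

# (subject, predicate, object): subjects and predicates are IRIs,
# objects are IRIs or literals (the float below is a literal).
triples = [
    (EX + "compound1", EX + "inhibits", EX + "targetA"),
    (EX + "compound1", EX + "inhibits", EX + "targetB"),
    (EX + "compound1", EX + "molecularWeight", 312.4),
    (EX + "targetB", EX + "participatesIn", EX + "pathway7"),
]

def match(s=None, p=None, o=None):
    """Return triples matching the pattern; None plays the role of a variable."""
    return [
        (ts, tp, to) for ts, tp, to in triples
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    ]

# Two-hop query: pathways of the targets inhibited by compound1 (a join on the target).
pathways = [
    path
    for _, _, target in match(EX + "compound1", EX + "inhibits")
    for _, _, path in match(target, EX + "participatesIn")
]
print(pathways)
```

The join works because targetB is the object of one triple and the subject of another. This shared identifier is what lets triples from separate sources connect.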

Relational databases vs. triplestores (graph databases)

Relational databases

• Relations are separated from the data (cases)

• Tables & keys define the formal model (syntax) for the data (cases)

• Model-based (~predefined)

• Meaning (semantics) is informal (out of scope of the DB)

• Singular databases (~they are separated)

Triplestores

• Unified representation of relations and data

• Triples ("graph database") store the dynamic model for the data together with the factual data

• Model-free (~relations as data)

• Meaning is defined by the (explicit) relations (~ontology)

• Linked open data space (using universal identifiers & ontologies)
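The contrast can be illustrated by turning one relational row into triples. The column names live in the schema, outside the data, in a relational database. In the triple form they become explicit predicate IRIs stored next to the values. This is a sketch under assumed names: the table, the columns and the `http://example.org/` namespace are all hypothetical.

```python
EX = "http://example.org/"  # hypothetical namespace for illustration

def row_to_triples(table, key_column, row):
    """Convert one relational row (a dict) into (subject, predicate, object) triples.

    The primary key becomes the subject IRI; every other column becomes a predicate IRI.
    Empty (None) cells produce no triple, so no NULLs are needed.
    """
    subject = f"{EX}{table}/{row[key_column]}"
    return [
        (subject, f"{EX}{table}#{column}", value)
        for column, value in row.items()
        if column != key_column and value is not None
    ]

activity_row = {"activity_id": 42, "compound": "cmpd_17", "target": "tgt_3",
                "potency": 7.2, "comment": None}
for t in row_to_triples("activity", "activity_id", activity_row):
    print(t)
```

In a Linked Data setting, the compound and target values would themselves be IRIs shared with other sources. That is how triples from separate databases end up in one connected graph.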

Semantic technologies for drug discovery

• Whitaker, B.J. and Rzepa, H.S., 1995. Chemical publishing via the Internet. In International chemical information conference (pp. 62-71).

• Murray-Rust, P., Rzepa, H.S., Wright, M. and Zara, S., 2000. A universal approach to web-based chemistry using XML and CML. Chemical Communications, (16), pp.1471-1472.

• Murray-Rust, P. and Rzepa, H.S., 2002. Scientific publications in XML-towards a global knowledge base. Data Science Journal, 1, pp.84-98.

• Murray-Rust, P., 2008. Chemistry for everyone. Nature, 451(7179), pp.648-651.

A problem with public data: parallel efforts on cleaning...integration

• Discovery Platform for cross-domain fusion.

• Public, curated, linked data.

– The data sources you already use, integrated and linked together: compounds, targets, pathways, diseases and tissues.

• Everything in triples: Subject-predicate-object

Open Pharmacological Space

Precursor: Gene Ontology: tool for the unification of biology, Nature Genetics, 2000

@gray_alasdair Big Data Integration 65

• Discovery Platform to cross barriers.

• The data sources you already use, integrated and linked together: compounds, targets, pathways, diseases and tissues.

• ChEBI, ChEMBL, ChemSpider, ConceptWiki, DisGeNET, DrugBank, Gene Ontology, neXtProt, UniProt and WikiPathways.

• For questions in drug discovery, answers from publications in peer-reviewed scientific journals.

OPS: scientific pharma questions
