Computational Methods for Biological Networks
Lecture 0:
Introduction to the Course
Shuigeng Zhou
School of Computer Science
September 10, 2019
Why Study Biological Networks?
The execution of complex biological processes requires
the precise interaction and regulation of thousands of
molecules
Complex biological systems can be represented and
analyzed as computable networks
High throughput biological experimental technologies
provide huge data for biological network construction
Biology/Bioinformatics has increasingly shifted its focus
from individual genes, mRNAs and proteins etc. to
large-scale biological networks of different molecules
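To make "computable networks" concrete, here is a minimal Python sketch of how a small interaction network could be stored and queried. The molecule names (geneA to geneD) and the edge list are hypothetical placeholders, not real interaction data, and the helper names build_network and degree are made up for this illustration.

```python
from collections import defaultdict

# Hypothetical undirected interactions between placeholder molecules.
# These pairs are illustrative only and do not describe real biology.
interactions = [
    ("geneA", "geneB"),
    ("geneA", "geneC"),
    ("geneB", "geneC"),
    ("geneC", "geneD"),
]


def build_network(edges):
    """Build an undirected adjacency-set representation from edge pairs."""
    network = defaultdict(set)
    for u, v in edges:
        network[u].add(v)
        network[v].add(u)
    return dict(network)


def degree(network):
    """Return the number of interaction partners of every node."""
    return {node: len(neighbors) for node, neighbors in network.items()}


network = build_network(interactions)
degrees = degree(network)

# The node with the most partners is a candidate "hub" in this toy network.
hub = max(degrees, key=degrees.get)
print(degrees)
print(hub)
```

Node degree is one of the simplest topological properties of a network; an adjacency-set layout keeps neighbor lookups cheap, which is why it is used in this sketch.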
2019/9/26 Biological Networks 2
What is Big Data?
It’s said that the term “Big data” in its current use was
coined by Roger Magoulas @ O'Reilly Media
There is no consensus on how to define big data
“Big data exceeds the reach of commonly used hardware
environments and software tools to capture, manage, and process it
within a tolerable elapsed time for its user population.” - Teradata Magazine article, 2011
“Big data refers to data sets whose size is beyond the ability of typical
database software tools to capture, store, manage and analyze.” - The McKinsey Global Institute, 2011
The Vs of Big Data
3Vs model: high-volume, high-velocity, and/or high-variety (Gartner, 2012)
4Vs models
Volume, velocity, variety and virtual (Courtney Lambert, 2012)
Volume, velocity, variety and veracity (IBM, 2012)
Volume, velocity, variety and value (DataStax, 2012)
5Vs model: volume, velocity, variety, veracity and value
Tmall Double 11 (Singles' Day) transaction value growth
2017: 168.2 billion yuan
Tmall Double 11 transaction count growth
2017: 1.48 billion transactions
Double 11 single-day data, 2016
Global big data market forecast
From Statista
China big data market forecast (unit: 100 million)
Why is Big Data Hot?
Applications
Sensor networks, social networks, Internet search indexing, astronomy,
atmospheric science, genomics, military surveillance, medical records,
video archives, and large-scale e-commerce
Market & industry
Government
In 2012, the Obama administration announced the Big Data Research
and Development Initiative
On August 31, 2015, the Chinese government issued the Action Outline for Promoting the Development of Big Data
Academia
Deal with Data, Science, Feb. 2011 issue
Big data, Nature, vol. 455, no. 7209, 2008
What’s Big Data for?
Gartner’s big data definition
“Big Data are high-volume, high-velocity, and/or high-variety
information assets that require new forms of processing to
enable enhanced decision making, insight discovery and
process optimization.”
Data -> Knowledge -> Business Intelligence
Big data -> “Big” Knowledge -> “Big” Intelligence
Big Biological Data
DNA sequencing is now improving faster than Moore's law!
The cost of DNA sequencing is now decreasing faster than Moore's law!
Where Does Big Biological Data Come From?
New technologies in biology
Next-Generation Sequencing (NGS)
Genome-sequencing
RNA-sequencing
ChIP-sequencing
Chromosome conformation capture
Single cell sequencing
Sophisticated imaging systems
Mass spectrometry-based flow cytometry
New projects (-omics)
Research publications
How Big is Big Data?
Absolutely big
Relatively big: compared to transmission, storage, and computation capacity
Why BIG?
To be bigger, to be more complete
Big is not ALL
Completeness is more important than volume
Quality is more important than volume
Two Types of Big Data
Small amounts of huge data: genomes
Huge amounts of small data: e-commerce data
Are Big Data New?
Michael Stonebraker (2014 Turing Award recipient)
Are Big Data New?
VLDB (Very Large Database)
Since 1975
Big data vs. Database
Volume: very large, massive
Velocity: streaming data management, real-time DB
Variety: RDB, XML database, multimedia database etc.
Veracity: data cleaning, uncertain data management
Value: data mining
Big Data Challenges
New applications
New computing architectures/platforms
Data security and privacy preservation
Major Techniques
Storage
Data access (Query processing)
Indexing, algorithms
Analysis
Statistics, AI, machine learning, data mining
Visualization
Security & Privacy
Instructor
Prof. Shuigeng Zhou
R502, Yifu Building, Fudan Handan campus
Tel: 55664967
Email: [email protected]
Homepage: http://admis.fudan.edu.cn/~sgzhou
The best way to reach me is email!
Lectures: Time and Venue
Time
8:55AM – 11:35AM, Tuesdays from September 10 to
December 24, 2019
There will be no class on October 1 (National Day)
So we have only 15 weeks of class in total
Venue: HY605
Prerequisites
Mathematics: basic knowledge of probability,
discrete math, graph theory, and linear algebra
Computer science: programming, basic data
structures/algorithms, machine learning and data
mining
Biology: Basic concepts of molecular biology (e.g.
DNA, RNA, proteins, etc.)
Students from CS or other non-biology backgrounds
should learn the basics of molecular biology
Grading
10% class participation
40% paper presentation
50% algorithm implementation and course
summary (a survey/research paper is preferred!)
Course Goal
We will introduce
Major types of biological networks (gene regulatory
networks, transcriptional regulatory networks,
protein-protein interaction networks, etc.)
Methods to infer / construct biological networks from
biological experimental data (especially high
throughput data)
Algorithms for analyzing and mining biological
networks
Major Topics
Basic concepts of biological networks
Topological properties and structure of biological networks
Alignment of biological networks
Protein-protein interaction networks
Network-based prediction of protein function
Network-based prediction of complexes
Network-based drug target detection
Matrix factorization, random walk, graph embedding, and deep learning
techniques for biological networks
Gene regulatory networks
Transcriptional regulatory networks
Network visualization
Recommended Texts
Luonan Chen, Rui-Sheng Wang, and Xiang-Sun Zhang, Biomolecular
Networks: Methods and Applications in Systems Biology, Wiley, 2009.
Xiaoli Li and See-Kiong Ng (eds.), Biological Data Mining in Protein
Interaction Networks, IGI Global, 2009.
Björn H. Junker, Falk Schreiber, Analysis of Biological Networks. Wiley,
2011.
F. Kepes (ed.), Biological Networks, World Scientific, 2007.
Jurisica and Wigle (Editors), Knowledge Discovery in Proteomics, CRC
Press, 2005.
Bornholdt and Schuster (Editors), Handbook of Graphs and Networks:
From the Genome to the Internet, Wiley, 2003.
Kurt Mehlhorn, Stefan Näher, LEDA: A Platform for Combinatorial and
Geometric Computing, Cambridge University Press, 1999.
Related articles from Nature, Science, PNAS, NAR, Bioinformatics, BMC
Bioinformatics / Systems Biology, etc.
Bioinformatics Journals & Conferences
Journals
Nature, Science
Nature Genetics, Nature Methods, PNAS
Genome Research, Genome Biology
NAR, Bioinformatics
BMC Genomics/Bioinformatics, IEEE/ACM TCBB
Conferences
RECOMB, ISMB
ECCB, WABI
GIW, APBC, ACM-BCB, TBC
BIBM, BIBE, PRIB
Thanks!
Questions?