Download - My comments are an informal communication and represent …...Apr 30, 2015 · Russia China Australia Zimbabwe India Pakistan Iran Turkey Kazakhstan Mongolia Arabia Myanmar Nepal

My comments are an informal communication and represent my own best judgment. These comments do not bind or obligate FDA.

FDA leverages big data: current efforts

•  openFDA: FDA launches the big data 'openFDA' initiative, giving the public easier access to safety information

•  MAQC consortium: FDA-led, community-wide consortium aimed at assessing the technical performance of next-generation sequencing platforms

•  HIVE: FDA partners with academia to develop a next-generation sequencing platform

•  CFSAN: FDA partners with CDC and NIH/NCBI to develop research infrastructure for a risk-based food safety system

Current focus:

•  Adverse events. FDA's publicly available drug adverse event and medication error reports, and medical device adverse event reports.

•  Recalls. Enforcement report data, containing information gathered from public notices about certain recalls of FDA-regulated products.

•  Labeling. Structured Product Labeling (SPL) data for FDA-regulated human prescription drug, OTC drug and biological product labeling.
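The three data categories listed above are served through the public openFDA REST API at api.fda.gov. As a minimal sketch, the helper below builds query URLs for them using the API's `search`, `count` and `limit` parameters. The helper name `build_query_url`, the category keys, and the example field names (`patient.drug.medicinalproduct`, `classification.exact`) are illustrative assumptions and not part of the slides; the code only constructs URLs and does not contact the service.

```python
from urllib.parse import urlencode

# Base URL of the public openFDA API.
OPENFDA_BASE = "https://api.fda.gov"

# Endpoint paths for the data categories on the slide.
# The dictionary keys are hypothetical labels chosen for this sketch.
ENDPOINTS = {
    "drug_adverse_events": "/drug/event.json",
    "device_adverse_events": "/device/event.json",
    "drug_recalls": "/drug/enforcement.json",
    "drug_labeling": "/drug/label.json",
}


def build_query_url(category, search=None, count=None, limit=None):
    """Return an openFDA query URL for one category (no request is sent)."""
    params = {}
    if search is not None:
        params["search"] = search   # field:term filter
    if count is not None:
        params["count"] = count     # field to tally values for
    if limit is not None:
        params["limit"] = limit     # maximum number of records returned
    url = OPENFDA_BASE + ENDPOINTS[category]
    return f"{url}?{urlencode(params)}" if params else url


if __name__ == "__main__":
    # Adverse event reports mentioning a given drug (illustrative field name).
    print(build_query_url("drug_adverse_events",
                          search="patient.drug.medicinalproduct:aspirin",
                          limit=5))
    # Tally recall enforcement reports by classification (illustrative field name).
    print(build_query_url("drug_recalls", count="classification.exact"))
```

The resulting URLs can be passed to any HTTP client; keeping URL construction separate from the request makes the query easy to inspect before it is sent.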

[Figure: world map; country labels omitted]

openFDA: Usage

•  >10M API calls

•  More than half of API calls were issued from outside the US

•  >12 new software (mobile or web) apps

•  >20,000 connected IP addresses

•  ~6,000 registered API users

MicroArray Quality Control (MAQC): an FDA-led, community-wide consortium effort to assess the technical performance and application of genomics technologies (microarrays, GWAS and next-gen sequencing)

MAQC-I (2005.2 – 2006.9)

•  Goal: assess reliability of microarrays (repeatability, reproducibility)

•  6 papers, 2006

•  137 participants, 51 organizations

MAQC-II (2006.9 – 2010.10)

•  Goal: assess microarray-based biomarkers (clinical use, safety evaluation)

•  13 papers, 2010

•  202 participants, 97 organizations

MAQC-III/SEQC (2008.8 – 2014)

•  Goal: assess reliability of next-gen sequencing (RNA-seq) and compare it with microarrays

•  11 manuscripts

•  >180 participants, 73 organizations

The 3rd Phase of MAQC: SEquencing Quality Control (SEQC)

•  180 participants from 73 organizations

•  Generated >10 Tb of data and >100 billion reads

•  Represented ~6% of the data in GEO (Jun 2014)

•  11 manuscripts: 3 in Nat Biotechnol, 2 in Nat Commun, 3 in Sci Data, 2 in Genome Biology (in revision) and 1 in Nat Methods (in revision)

[Figure panels: Objectives; Study designs; Datasets]

Food Safety Research Infrastructure – Publicly available data

[Diagram labels]

•  Data acquisition

•  Data assembly, storage and analysis

•  Additional data analysis: academia/industry

•  FDA, USDA, CDC; state, local and foreign public health agencies

•  NCBI, EMBL, DDBJ (INSDC): public access database

•  National network of sequencers; international network of sequencers

GenomeTrakr Strategy

•  Develop and test the performance of a distributed sequencing-based network, rather than a centralized model

•  Provide sequence and minimal metadata in a publicly accessible database

   –  Partner with NCBI for storing and serving data

   –  Cost prohibitive for FDA to establish its own high-capacity data site

   –  Cost savings from using NCBI allowed more labs to participate

   –  Industry, academia, and other government agencies have access to data for individual needs

GenomeTrakr: distributed sequencing network

~10,000 pre-registered strains; ~6,000 genomes

HIVE: High-performance Integrated Virtual Environment

•  Robust data loading: multiple sources, large data blobs, through complex handshaking procedures

•  Distributed storage: compressed, organized data and metadata

•  Hierarchical security: permission-based files, metadata, processes and algorithms in a collaborative environment

•  Distributed computations: private cloud-based platform running virtual services

•  Interface: customizable, web-driven unified interface with graphical visualizations

•  Expertise: in-house research and development team capable of responding to users' needs

HIVE: deployments

maxi-HIVE

location: White Oak/CDRH HPC
storage: ~2 Petabytes
cpu: 1500 cores, extensible to 3000-5000
wan: 10Gb ⇒ Internet2
lan: 40Gb ⇒ Infiniband
platform: metal + SunGrid
goal: regulatory next-gen support platform for long-term storage and large-scale computations; to support regulatory submissions for NGS and serve as a standardization portal for NGS evidence submissions

mini-HIVE

location: White Oak/CBER server room
storage: ~380 Terabytes
cpu: ~350 cores
wan: 1Gb
lan: 10-20Gb
platform: metal
goal: research and scientific NGS portal with cutting-edge, production-quality tools

Public-HIVE

location: GWU, Dr Mazumder's lab
storage: 200 Terabytes
cpu: ~350 cores
wan: 1Gb
lan: 10Gb
platform: metal
goal: support and integrate the wider community of researchers into the HIVE process; allow access to cutting-edge, regulatory-compliant tools and standards; run free pilot projects with academic, industry and government entities to promote and ease access to novel NGS techniques; incorporate HIVE into education

Public-elastic HIVE

location: ColonialOne/Ashburn datacenter
storage: extensible to Petabytes
cpu: 1000+ cores
wan: 10Gb ⇒ Internet2
lan: 10Gb ⇒ Infiniband
platform: Lustre open source
goal: to become an extensible platform for public HIVE users' large-scale computational needs in large clinical research projects

Amazon

FDA leverages big data: moving forward

•  Standardization: FDA is preparing to standardize big data submissions

•  Bioinformatics harmonization: ongoing efforts to build bioinformatics validation platforms

•  MAQC-IV: looking into personalized genome quality metrics projects

•  Cloud: FDA is preparing to make greater use of cloud services to store, manipulate and communicate big data

www.ncbi.nlm.nih.gov

Expectations

Acknowledgments

Thanks to all of those whose hard work and bright ideas made all of the above possible. Slide contributors: Taha Kass-Hout, Tong Weida, Errol Strain and Marc Allard
