14/2/2014 18mb eccb 2013method used on its analysis. in this work we show the variation of results...

1
14/2/2014 18MB ECCB 2013 I\.Ol,.;::)UVI'\.O, IUIIUKU UIIIVCI:::I,l.y,-'0I-l0II, Illo.:>ayul\.' yOlllOlllUl.V, IUIIUI'\.U UIIIVCI::JILy, .1°1-1011 , . Short Abstract: Tohoku UniversityTohoku Medical Megabank Organization (ToMMo; http://www.megabank.tohoku.ac.jp/)was founded to establish an advanced medical system to foster the reconstruction from the Great East Japan Earthquake. The organization will develop a biobank that combines medical and genome information during the process of rebuilding the community medical system and supporting hea Ith and welfa re in the Tohoku area. A blueprint for Tohoku University Tohoku Medical Megabank Organization is a ten-year project including three main activities: a biobank combining medical and genome information; an online platform for the coordination of community medical information; and training program designed for a varieties of highly specialized professionals and experts such as researchers of bioinformatics and science communicators. The biobank to be developed will be utilized to analyze the local heredity information 50 that it can establish an advanced medical system based on genome information with cutting-edge information and communication technology. The first goal of ToMMo is to understand the detailed genetic population background in this area including rare variants. Thus, this project applies deep coverage whole genome sequencing ofthousands people who joined to this prospective genome cohort project within years. This poster presents the oral presenter's organizing part "the data management and the bioinformatics analysis of massive amount of high throughput sequencing data", and the research position availabilities of this very exciting project as a graduate student or a research staff. IQ-º. Poste r N100 A statistical variant calling approach using pedigree information Kaname Kojima, Tohoku Medical Megabank Organization, Tohoku University, Japan Naoki Nariai, Tohoku Medical Megabank Organization, Tohoku University, Japan; Masao Nagasaki, Integrative Genomics, Japan Short Abstract: Due to the progress of next-generation sequencing technologies, whole genome sequencing for each individual becomes possible in practical time and with reasonable cost for the identification of disease associated mutations. Also, individual genome data from case-control study or study with pedigree analysis contribute to the elucidation of unknown disease mechanisms. Since accurate variant detection is required for the analysis of these genomes in a reliable manner, the development of accurate variant callers is demanded. In most of the variant callers, variants such as single nucleotide polymorphisms, insertions, and deletions are detected at each position in a reference genome from the information of mapped sequence reads. Since there exist positions with insufficient coverage of reads due to bias in the library preparation and mapping failures at short tandem repeat polymorphic sites or variable number oftandem repeat sites, reliable variant detection is challenging at these sites. Hence, variant callers that are robust to those errors even with low coverage data are demanded. We propose a new variant calling approach that considers pedigree information. Unlike variant callers considering individuais independently, our approach can use sequence read information of each individual for variant calling on other individuais by connecting read generation models of individuais based on pedigree information. Therefore, accurate variant calling and genotyping is expected even in insufficient read coverage sites. In performance evaluation with the HapMap CEU parent-offspring trio sequencing data, our approach outperformed existing approaches in accuracy on the agreement with SNP array genotyping results. Poster N10l Counting RNAseq reads: which way is better? Felipe Rodrigues da Silva, Embrapa Informática Agropecuária, Brazil Felipe Da Silva, Brazil; Roberto Willians Noda, Embrapa Milho e Sorgo, Brazil; Adhemar Zerlotini Neto, Embrapa Informática Agropecuária, Brazil; Francisco Pereira Lobo, Embrapa Informática Agropecuária, Brazil; Newton Partilho Carneiro, Embrapa Milho e Sorqo , Brazil Short Abstract: RNAseq presented a revolution on mRNA expression analysis. Microarrays where considered doomed and early experimental validation ofRNAseq ana·lysis findings has largely endorsed the common sense view ofthis technology would become the de facto standard on gene expression analysis. Some data RNAseq, however, seems to be very sensitive to the method used on its analysis. In this work we show the variation of results we've found while working with ~1 billion !Ilumina reads from drought tolerant Sorghum bicolor genotype in the presence and absence ofthe stress and compared results found for key genes already characterized. Poste r N102 In silico screening for Antimicrobial Resistance genes in NGS Sequenced Bacterial Strains Christian Rausch, Royal DSM, Netherlands Short Abstract: Royal D5M ís a global science-based company active in health, nutrition and materiais and a prominent player in industrial biotechnology. We have developed a dual approach to detect Antimicrobial Resistance (ARes) genes in bacterial genome sequences. The first part of the method implements a homology search at the DNA levei which relies on mapping of the sequencing reads to a collection of known ARes genes and is an update of a method that was previously described by Brennedsen et aI. (2011) accounting for 4 fold longer reads (150 bp) which are typically today obtained by Next gen sequencing machines like the Illumina MiSeq. In the second part ofthe method, the collection of ARes genes is searched at the protein levei in thede novo assembly ofthe genome using translated BLAST. Because de novo assemblies built from NGSdata today typically are not closed but consist of many contigs (in the order of 100 in our cases), the combination of both approaches allows to also assess genes that might be absent in the draft de novo assembly and takes advantage of increased sensitivity of homology searches at the protein instead of DNA leveI. We have applied our method to Lactobacillus bulgaricus strains, results will be given on the poster. Poster N103 Assembling the 20 Gb White Spruce Genome Shaun Jackman, British Columbia Cancer Agency, Canada Inanc Birol, Canada; Anthony Raymond, British Columbia Cancer Agency, Canada; Shaun D Jackman, British Columbia file:l//E:lnote_ nodas/ismbeccb2013/postersN .html#N 101 27/29

Upload: others

Post on 05-Sep-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 14/2/2014 18MB ECCB 2013method used on its analysis. In this work we show the variation of results we've found while working with ~1 billion!Ilumina reads from drought tolerant Sorghum

14/2/2014 18MB ECCB 2013I\.Ol,.;::)UVI'\.O, IUIIUKU UIIIVCI:::I,l.y,-'0I-l0II, Illo.:>ayul\.' yOlllOlllUl.V, IUIIUI'\.U UIIIVCI::JILy, .1°1-1011, .Short Abstract:Tohoku UniversityTohoku Medical Megabank Organization (ToMMo; http://www.megabank.tohoku.ac.jp/)was founded toestablish an advanced medical system to foster the reconstruction from the Great East Japan Earthquake. The organizationwill develop a biobank that combines medical and genome information during the process of rebuilding the communitymedical system and supporting hea Ith and welfa re in the Tohoku a rea. A blueprint for Tohoku University Tohoku MedicalMegabank Organization is a ten-year project including three main activities: a biobank combining medical and genomeinformation; an online platform for the coordination of community medical information; and training program designed for avarieties of highly specialized professionals and experts such as researchers of bioinformatics and science communicators.The biobank to be developed will be utilized to analyze the local heredity information 50 that it can establish an advancedmedical system based on genome information with cutting-edge information and communication technology. The first goal ofToMMo is to understand the detailed genetic population background in this area including rare variants. Thus, this projectapplies deep coverage whole genome sequencing ofthousands people who joined to this prospective genome cohortproject within years. This poster presents the oral presenter's organizing part "the data management and thebioinformatics analysis of massive amount of high throughput sequencing data", and the research position availabilities ofthis very exciting project as a graduate student or a research staff.

IQ-º.

Poste r N100

A statistical variant calling approach using pedigree information

Kaname Kojima, Tohoku Medical Megabank Organization, Tohoku University, JapanNaoki Nariai, Tohoku Medical Megabank Organization, Tohoku University, Japan; Masao Nagasaki, Integrative Genomics,Japan

Short Abstract:Due to the progress of next-generation sequencing technologies, whole genome sequencing for each individual becomespossible in practical time and with reasonable cost for the identification of disease associated mutations. Also, individualgenome data from case-control study or study with pedigree analysis contribute to the elucidation of unknown diseasemechanisms. Since accurate variant detection is required for the analysis of these genomes in a reliable manner, thedevelopment of accurate variant callers is demanded. In most of the variant callers, variants such as single nucleotidepolymorphisms, insertions, and deletions are detected at each position in a reference genome from the information ofmapped sequence reads. Since there exist positions with insufficient coverage of reads due to bias in the librarypreparation and mapping failures at short tandem repeat polymorphic sites or variable number oftandem repeat sites,reliable variant detection is challenging at these sites. Hence, variant callers that are robust to those errors even with lowcoverage data are demanded. We propose a new variant calling approach that considers pedigree information. Unlikevariant callers considering individuais independently, our approach can use sequence read information of each individual forvariant calling on other individuais by connecting read generation models of individuais based on pedigree information.Therefore, accurate variant calling and genotyping is expected even in insufficient read coverage sites. In performanceevaluation with the HapMap CEU parent-offspring trio sequencing data, our approach outperformed existing approaches inaccuracy on the agreement with SNP array genotyping results.

Poster N10l

Counting RNAseq reads: which way is better?

Felipe Rodrigues da Silva, Embrapa Informática Agropecuária, BrazilFelipe Da Silva, Brazil; Roberto Willians Noda, Embrapa Milho e Sorgo, Brazil; Adhemar Zerlotini Neto, Embrapa InformáticaAgropecuária, Brazil; Francisco Pereira Lobo, Embrapa Informática Agropecuária, Brazil; Newton Partilho Carneiro, EmbrapaMilho e So rqo , Brazil

Short Abstract:RNAseq presented a revolution on mRNA expression analysis. Microarrays where considered doomed and earlyexperimental validation ofRNAseq ana·lysis findings has largely endorsed the common sense view ofthis technology wouldbecome the de facto standard on gene expression analysis. Some data RNAseq, however, seems to be very sensitive to themethod used on its analysis. In this work we show the variation of results we've found while working with ~1 billion!Ilumina reads from drought tolerant Sorghum bicolor genotype in the presence and absence ofthe stress and comparedresults found for key genes already characterized.

Poste r N102

In silico screening for Antimicrobial Resistance genes in NGS Sequenced Bacterial Strains

Christian Rausch, Royal DSM, Netherlands

Short Abstract:Royal D5M ís a global science-based company active in health, nutrition and materiais and a prominent player in industrialbiotechnology. We have developed a dual approach to detect Antimicrobial Resistance (ARes) genes in bacterial genomesequences. The first part of the method implements a homology search at the DNA levei which relies on mapping of thesequencing reads to a collection of known ARes genes and is an update of a method that was previously described byBrennedsen et aI. (2011) accounting for 4 fold longer reads (150 bp) which are typically today obtained by Next gensequencing machines like the Illumina MiSeq. In the second part ofthe method, the collection of ARes genes is searched atthe protein levei in thede novo assembly ofthe genome using translated BLAST. Because de novo assemblies built fromNGS data today typically are not closed but consist of many contigs (in the order of 100 in our cases), the combination ofboth approaches allows to also assess genes that might be absent in the draft de novo assembly and takes advantage ofincreased sensitivity of homology searches at the protein instead of DNA leveI. We have applied our method to Lactobacillusbulgaricus strains, results will be given on the poster.

Poster N103

Assembling the 20 Gb White Spruce Genome

Shaun Jackman, British Columbia Cancer Agency, CanadaInanc Birol, Canada; Anthony Raymond, British Columbia Cancer Agency, Canada; Shaun D Jackman, British Columbia

file:l//E:lnote_ nodas/ismbeccb2013/postersN .html#N 101 27/29