T-BioInfo Methods and Approaches


Uploaded by elia-brodsky on 20-Jan-2017



Typical Mass-use Pipelines

NGS (Next Generation Sequencing)
1. Total-RNA Analysis (RNA-seq, Non-Coding RNA, Repeats)
2. Epigenetics (ChIP-seq and Bisulfite-Seq)
3. Variant Calling
4. Microbiome (Metagenomics)

Mass Spec
1. Proteomics
2. Metabolomics

Structural Biology
1. Libraries of Small Molecules (Query, Clustering)
2. Docking (Including large molecules)

Machine Learning
1. Phenotypic Analysis and Modeling
2. Analysis of visual data
3. Standard Statistical methods
4. Integration of heterogeneous data sets

Complex Challenges and Workflows

CirSeq Mutation Analysis
1. Analysis of viral CirSeq data for precise mutation identification
2. Fitness of mutations reflecting viral adaptation
3. Identification of viral quasi-species

Mass Spec
1. Protein-protein interactions between host and viral proteins
2. Post-translational modifications of host proteins

Structural Biology
1. Libraries of Small Molecules (Query, Clustering)
2. Docking (Including large molecules)

NGS host data
1. Host gene expression variations in response to infectious quasi-species

T-BioInfo is a user-friendly computational platform that enables analysis and integration of big data. As sequencing becomes cheaper and more precise, the challenge of mining -omics data for meaningful patterns that can be applied in biomedical and agricultural research keeps growing. At the same time, the complex networks of dependencies that define many conditions tend to require integration of huge heterogeneous data sets: SNPs, gene expression, epigenetic markers, proteomic and metabolomic profiles, and even structural biology data. Our company has developed innovative, user-friendly workflows for analysis and integration of these different datasets. We are now looking to test and commercialize a service that provides web access to the platform.


Simple, Flexible and Consistent Interface Across All Sections

Integration of analysis types

One environment for all types of data and analysis

“one-button” approach to most areas of analysis

• Flexible analysis pipelines in the platform sections and easy data input

• The platform assists the user in constructing meaningful algorithmic pipelines for processing data: modules that can continue the current pipeline are highlighted with a black background and a yellow title.


Analysis of Total RNA

Concept: raw total transcriptome reads contain information not only about expressed splice variants (isoforms) of genes, but also about expressed transposons and regulatory non-coding RNAs. The complete analysis consists of three steps. First, the reads are mapped to isoforms to obtain isoform expression levels. Second, the reads left unmapped are mapped to known repetitive elements (RE) and non-coding RNAs to obtain their expression levels. Third, the remaining reads are processed by a special clustering method (BiClustering) to identify newly expressed RE and non-coding RNAs, together with their expression levels under the applied biological conditions. At the next stage, data integration can be performed: the interplay between expressed isoforms, transposons, and regulatory RNAs.

1. Detection of expressed isoforms and their expression levels by mapping the reads to constructed transcripts
2. For unmapped reads: √
3. Detection of the most highly expressed repeats and regulatory RNAs from databases
4. BiClustering: associations of k-mers and reads as a bicluster, and generation of K chains of biclusters
5. Extensions of K chains ±
6. Mapping of NGS reads to the found K chains: detection of the most highly expressed novel transposons and regulatory RNAs
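The three-stage cascade described above (isoforms first, then known repeats and non-coding RNAs, then whatever is left) can be sketched as follows. This is a minimal illustration under stated assumptions, not T-BioInfo's implementation: exact substring matching stands in for a real read aligner, multi-mapping reads are simply assigned to the first matching reference, and the third stage stops at a k-mer count of the leftover reads instead of performing BiClustering. All function names and toy sequences are hypothetical.

```python
from collections import Counter


def map_exact(reads, references):
    """Assign each read to the first reference sequence that contains it exactly.

    Returns (read counts per reference name, list of reads that matched nothing).
    Exact substring search is a stand-in for a real aligner in this sketch.
    """
    counts = Counter()
    unmapped = []
    for read in reads:
        hit = next((name for name, seq in references.items() if read in seq), None)
        if hit is None:
            unmapped.append(read)
        else:
            counts[hit] += 1
    return counts, unmapped


def kmer_profile(reads, k):
    """Count every k-mer in the leftover reads (raw input for a clustering step)."""
    return Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))


def total_rna_cascade(reads, isoforms, known_elements, k=4):
    # Stage 1: reads -> isoforms, giving isoform expression counts.
    isoform_counts, rest = map_exact(reads, isoforms)
    # Stage 2: reads unmapped so far -> known repeats / non-coding RNAs.
    element_counts, leftover = map_exact(rest, known_elements)
    # Stage 3 (placeholder only): k-mer profile of the still-unexplained reads.
    return isoform_counts, element_counts, leftover, kmer_profile(leftover, k)


if __name__ == "__main__":
    isoforms = {"geneA-201": "ACGTACGTTTGA"}        # hypothetical transcript
    known_elements = {"repeat-1": "GGGCCCAAATTT"}  # hypothetical repeat / ncRNA
    reads = ["ACGTAC", "GTTTGA", "GCCCAA", "TTTTTT"]
    iso, elem, left, kmers = total_rna_cascade(reads, isoforms, known_elements)
    print(dict(iso), dict(elem), left, kmers.most_common(1))
```

The point of the ordering is that each stage only sees reads the previous stage could not explain, so isoform counts are not inflated by repeat-derived reads and the final pool is enriched for novel transcripts.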

Slide figures: T-Bioinfo RNA-seq/chip section; Algorithmic Approaches; Example: Expression of Repeats; Analysis of “Junk” RNA


Epigenetic Analysis: Bisulfite DNA Methylation and ChIP-Seq

Bisulfite Concept: bisulfite treatment converts unmethylated cytosines so that sequencing shows T instead of C in a read when the C at a genomic site (such as a CpG) is unmethylated; methylated cytosines are protected and are still read as C. Thus, detection of methylated sites and of genome fragments enriched or depleted in methylation is based on a special type of read mapping and on segmentation of the whole-genome methylation profile. The analysis objectives include special mapping algorithms that tolerate the T-to-C mismatch, statistical estimation of the per-site methylation level, allele specificity of DNA methylation, and detection of over-methylated and under-methylated genomic regions.
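The two core ideas of the bisulfite concept, mismatch scoring that tolerates a read T opposite a reference C, and a per-site methylation level with a confidence interval, can be sketched as below. This is a hedged illustration, not the platform's algorithm: it assumes reads are already aligned to the forward strand at the same offset as the reference (the reverse-strand G-to-A case is ignored), and it uses the Wilson score interval as one reasonable choice for the unspecified "confidence interval method". Function names are hypothetical.

```python
import math


def bisulfite_mismatches(read, ref):
    """Count mismatches between an aligned read and the reference, without
    penalising a read T opposite a reference C (a bisulfite-converted C)."""
    return sum(1 for r, g in zip(read, ref) if r != g and not (r == "T" and g == "C"))


def methylation_call(read, ref, pos):
    """At a reference C: 'M' if the read keeps C (methylated), 'U' if it shows T
    (converted, unmethylated); None for non-C sites or other bases."""
    if ref[pos] != "C":
        return None
    return {"C": "M", "T": "U"}.get(read[pos])


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n == 0:
        return 0.0, 1.0
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def site_methylation(reads, ref, pos):
    """Methylation level C/(C+T) at one site plus its Wilson interval."""
    calls = [methylation_call(r, ref, pos) for r in reads]
    m, u = calls.count("M"), calls.count("U")
    level = m / (m + u) if m + u else None
    return level, wilson_interval(m, m + u)


if __name__ == "__main__":
    ref = "ACGT"
    reads = ["ACGT", "ATGT", "ACGT", "ACGT"]  # one read shows the C->T conversion
    print(bisulfite_mismatches("ATGT", ref))   # the converted read is not penalised
    print(site_methylation(reads, ref, 1))
```

Note that the mismatch rule is deliberately asymmetric: a read C opposite a reference T is still a mismatch, because bisulfite conversion never produces it.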

ChIP-Seq Concept: detection of epigenetic signals, such as different types of histone modifications and DNA methylation events, as well as determination of protein/DNA binding sites (TF binding sites), is performed by ChIP-seq and ChIP-chip experiments. The profiles of these whole-genome signals are analyzed with genome segmentation algorithms. The analysis objectives include identifying signal-enriched genome fragments as putative epigenetic events, and treating a combination of enriched fragments on the positive and negative strands, separated by a certain distance, as a TF binding event. At the next analysis stage, data integration can be performed: the interplay between genome mutations and epigenetic signals on one side, and expressed isoforms, transposons, and regulatory RNAs on the other. The network of gene regulation by a transcription factor can be reconstructed from the whole-genome TF binding positions and the expression of the downstream genes. Microarray datasets are transformed into pseudo-NGS reads and analyzed by the same ChIP-seq pipelines.
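The strand-pairing rule for TF binding events can be illustrated with a short sketch. In ChIP-seq, reads from the plus strand pile up just upstream of a bound site and reads from the minus strand just downstream, so a plus-strand enrichment followed by a minus-strand enrichment at a plausible distance suggests a binding event between them. This is a sketch under assumptions: fixed-window thresholding replaces the platform's segmentation algorithms, the gap bounds are hypothetical parameters, and a minus-strand peak may be reused by more than one plus-strand peak.

```python
def enriched_windows(coverage, window, threshold):
    """Start positions of non-overlapping windows whose summed read coverage
    reaches the threshold (a crude stand-in for genome segmentation)."""
    return [i for i in range(0, len(coverage) - window + 1, window)
            if sum(coverage[i:i + window]) >= threshold]


def pair_strand_peaks(plus_peaks, minus_peaks, min_gap, max_gap):
    """Pair each plus-strand peak with the nearest minus-strand peak lying
    min_gap..max_gap bp downstream; report the midpoint as a candidate TF site."""
    events = []
    for p in sorted(plus_peaks):
        downstream = [m for m in minus_peaks if min_gap <= m - p <= max_gap]
        if downstream:
            events.append((p + min(downstream)) // 2)
    return events


if __name__ == "__main__":
    print(enriched_windows([0, 0, 5, 5, 0, 0, 9, 1], window=2, threshold=8))
    # Plus peak at 100 pairs with minus peak at 180; the peak at 500 has no partner.
    print(pair_strand_peaks([100, 500], [180, 900], min_gap=50, max_gap=300))
```

In practice `enriched_windows` would be run separately on plus-strand and minus-strand coverage, and its outputs fed to `pair_strand_peaks`.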

T-Bioinfo ChIP-seq section

1. Preprocessing of raw data √
2. Mapping of NGS reads with bisulfite mapping algorithms: no penalty for T (read)-to-C (genome) mismatches
3. Detection of methylated DNA positions and their scores by the confidence interval method
4. Allele specificity of the methylation at a position -
5. Detection of over-methylated and under-methylated genomic intervals by segmentation algorithms ±
6. Detection of differential DNA methylation (individual positions and intervals) between contrasting conditions ±
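The slides do not specify which segmentation algorithms are used to find over- and under-methylated intervals. As a stand-in, the sketch below uses the classic maximum-scoring-segment scan (Kadane's algorithm): each site is scored as its methylation level minus a baseline, and the highest-scoring contiguous run is the most over-methylated interval; negating the scores gives the most under-methylated one. The baseline of 0.5 and the function names are hypothetical, and the sketch reports only the single best interval in each direction, whereas a genome-wide tool would report all significant intervals.

```python
def max_scoring_segment(scores):
    """Kadane's algorithm: (start, end_exclusive, total) of the highest-scoring
    contiguous run; (0, 0, 0.0) if no run has a positive total."""
    best = (0, 0, 0.0)
    start, running = 0, 0.0
    for i, s in enumerate(scores):
        if running <= 0:
            start, running = i, s  # restart the run at this site
        else:
            running += s
        if running > best[2]:
            best = (start, i + 1, running)
    return best


def methylation_segments(levels, baseline=0.5):
    """Most over-methylated and most under-methylated intervals vs. a baseline."""
    over = max_scoring_segment([x - baseline for x in levels])
    under = max_scoring_segment([baseline - x for x in levels])
    return over, under


if __name__ == "__main__":
    levels = [0.5, 0.9, 0.95, 0.85, 0.5, 0.1, 0.05, 0.5]  # toy per-CpG levels
    print(methylation_segments(levels))  # over: sites 1-3, under: sites 5-6
```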


Virology Pipeline

Mutation Fitness

Genome-wide fitness calculations enabled by CirSeq, combined with structural information, can provide high-definition, bias-free insights into structure-function relationships, potentially revealing novel functions for viral proteins and RNA structures, as well as nuanced insights into a viral genome’s phenotypic space. Such analyses have the power to reveal protein residues or domains that directly correspond to viral functional plasticity and may significantly inform our structural and mechanistic understanding of host–pathogen interactions.
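The slide does not state the fitness model used with CirSeq data. To show how a precisely measured mutation frequency can be turned into a fitness value, the sketch below uses the textbook approximation for a deleterious mutation at haploid mutation-selection balance: the equilibrium frequency is about μ/s, so s ≈ μ/q and relative fitness w = 1 − μ/q. This is an assumption, not the platform's method: it holds only for deleterious mutations at equilibrium (neutral or beneficial mutations, whose frequencies are still changing, need a time-dependent model), and the mutation names and per-type rates below are hypothetical.

```python
def equilibrium_fitness(freq, mu):
    """Relative fitness w = 1 - mu/freq for a deleterious mutation at haploid
    mutation-selection balance (freq ~ mu/s, hence s ~ mu/freq).

    Frequencies at or below the mutation rate are treated as lethal (w = 0).
    """
    if freq <= 0:
        raise ValueError("frequency must be positive")
    return max(0.0, 1.0 - mu / freq)


def fitness_table(mutations, rates):
    """mutations: {name: (mutation_type, observed_frequency)};
    rates: {mutation_type: per-site mutation rate}. Returns {name: w}."""
    return {name: equilibrium_fitness(freq, rates[mtype])
            for name, (mtype, freq) in mutations.items()}


if __name__ == "__main__":
    rates = {"transition": 1e-4, "transversion": 1e-5}  # hypothetical rates
    mutations = {
        "A123G": ("transition", 1e-3),    # 10x above its mutation rate
        "C456A": ("transversion", 1e-5),  # at its mutation rate: lethal
    }
    print(fitness_table(mutations, rates))
```

Using a separate rate per mutation type matters because transitions typically arise more often than transversions; dividing by a single genome-wide rate would misjudge fitness for both.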


Integration of Heterogeneous Data Sets

Concept: mutual association of features across biological data sets is the most substantial part of integrating several analyses of a biological project into one story. We suggest several techniques for such associations.

Matching of metabolite and SNP profiles according to LB’s selection of SNPs
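One simple way to associate features of two data sets, such as SNP and metabolite profiles measured on the same samples, is to correlate every feature in one set with every feature in the other. The sketch below is one such technique, not necessarily one of those the slide refers to: it assumes genotypes are coded as 0/1/2 copies of an allele, uses Pearson correlation as the association measure, and takes an already selected SNP set as input because the slide does not describe how LB’s selection was made. The threshold and feature names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences
    (0.0 if either sequence is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def associate(snp_genotypes, metabolite_levels, min_abs_r):
    """Correlate every SNP (genotypes 0/1/2 per sample) with every metabolite
    (levels per sample, same sample order); keep |r| >= min_abs_r, strongest first."""
    pairs = []
    for snp, genotypes in snp_genotypes.items():
        for met, levels in metabolite_levels.items():
            r = pearson(genotypes, levels)
            if abs(r) >= min_abs_r:
                pairs.append((snp, met, r))
    return sorted(pairs, key=lambda t: -abs(t[2]))


if __name__ == "__main__":
    snps = {"rs_a": [0, 1, 2, 0, 1, 2], "rs_b": [0, 0, 0, 2, 2, 2]}
    metabolites = {"met1": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0], "met2": [3, 3, 3, 3, 3, 3]}
    print(associate(snps, metabolites, min_abs_r=0.5))
```

With many SNPs and metabolites, a real analysis would also need a multiple-testing correction before treating any pair as a finding.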


Patent Pending Technology for Drug Discovery

Fast screening and clustering of small molecules based on physico-chemical similarity (70-100 times faster than industry standard)
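The patented screening method itself is not described here, so the following is only a generic sketch of what clustering by physico-chemical similarity involves, and it makes no attempt to reproduce the speed claim. Each molecule is represented by a vector of descriptors (the three used below, molecular weight, logP and H-bond donor count, are hypothetical choices), the columns are standardised so that no descriptor dominates, and single-pass leader clustering groups molecules lying within a Euclidean radius of a cluster leader.

```python
import math


def zscore_columns(rows):
    """Standardise each descriptor column to mean 0, SD 1 so no single
    descriptor (e.g. molecular weight) dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def leader_cluster(names, vectors, radius):
    """Single-pass leader clustering: a molecule joins the first cluster whose
    leader lies within `radius` (Euclidean), otherwise it starts a new cluster."""
    leaders = []
    clusters = {}
    for name, vec in zip(names, vectors):
        for leader_name, leader_vec in leaders:
            if math.dist(vec, leader_vec) <= radius:
                clusters[leader_name].append(name)
                break
        else:
            leaders.append((name, vec))
            clusters[name] = [name]
    return clusters


if __name__ == "__main__":
    names = ["mol1", "mol2", "mol3"]
    # Hypothetical descriptors: [molecular weight, logP, H-bond donors]
    descriptors = [[300, 2.0, 1], [305, 2.1, 1], [500, 5.0, 4]]
    print(leader_cluster(names, zscore_columns(descriptors), radius=1.0))
```

Leader clustering is used here because it needs only one pass over the library, which suits large compound collections; its results do depend on input order, unlike hierarchical methods.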

Sample studies: Dengue (NS3, NS5) and Polio

• Analysis of Mass-Spec Proteomics Data: Ank1 (Adenylate kinase isoenzyme 1), increased expression during early infection
• Analysis of RNA sequences to reveal Mutation Fitness; Proteomic Mutation Fitness
• Proteins of Interest by Comparison; Small Molecule Candidate
• Identifying a biologically active molecule (Polio)

Patent Pending: Ref. P-78368-US | App. No. 14/625,785 entitled SYSTEMS AND METHODS OF IMPROVED MOLECULE SCREENING

Computational analysis of small molecules can be roughly divided into three sections: pre-processing analysis, virtual screening methods, and clustering. Pre-processing includes conformer generation, whose aim is to build a set of representative conformers that covers the conformational space of a given molecule. There are two main classes of virtual screening methods: similarity-based methods (descriptor-based screening, geometric querying, shape-based querying, fingerprints) and receptor-based methods (docking). One of the greatest challenges for docking software is accounting for protein flexibility: proteins are not static objects, and conformational changes are often key elements in ligand binding. T-Bioinfo provides a number of proprietary methods that can be combined into pipelines for drug discovery.
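Fingerprint-based similarity screening, one of the similarity-based methods listed above, reduces to comparing bit vectors. A minimal sketch, assuming each fingerprint is given as the set of its "on" bit positions: in practice these bits would come from a cheminformatics toolkit, but here they are written by hand, and the molecule names and threshold are hypothetical. The Tanimoto coefficient (shared bits divided by total distinct bits) is the standard similarity for such fingerprints; this is not a description of T-Bioinfo's proprietary methods.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def screen(query, library, min_similarity):
    """Rank library molecules by Tanimoto similarity to the query fingerprint,
    keeping only those at or above min_similarity."""
    hits = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted([h for h in hits if h[1] >= min_similarity], key=lambda h: -h[1])


if __name__ == "__main__":
    query = {1, 2, 3, 4}
    library = {"a": {1, 2, 3, 4}, "b": {1, 2, 5, 6}, "c": {7, 8}}
    print(screen(query, library, min_similarity=0.3))  # "a" then "b"; "c" shares no bits
```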


Tauber Bioinformatics Research Center

Tauber Bioinformatics Research Center at the University of Haifa has a proven track record in bioinformatics, with scientific collaborations with hospitals and top US universities, involvement in government-funded projects, and multiple publications in leading journals such as Science and Nature.

Pine Biotech holds an exclusive license to commercialize tools developed at the TBRC for research, industry applications, and education. The startup is located at the BioInnovation Center in New Orleans, LA. In collaboration with TBRC staff, Pine Biotech is completing several pilot projects to validate its approach.

Early Adopters and Collaborators: Aleph Therapeutics