probe selection for microarrays
Post on 14-Jan-2016
Probe Selection for Microarrays
Considerations and Pitfalls
Kay Hofmann, MEMOREC Stoffel GmbH, Cologne/Germany
Probe selection wish list
Probe selection strategy should ensure
Biologically meaningful results (The truth...)
Coverage, Sensitivity (... The whole truth...)
Specificity (... And nothing but the truth)
Annotation
Reproducibility
Technology
Probe immobilization
Oligonucleotide coupling: synthesis with linker, covalent coupling to surface
Oligonucleotide photolithography
ds-cDNA coupling: cDNA generated by PCR, nonspecific binding to surface
ss-cDNA coupling: PCR with one modified primer, covalent coupling, 2nd strand removal
Spotting
With contact (pin-based systems)
Without contact (ink jet technology)
Technology-specific requirements
General
Not too short (sensitivity, selectivity)
Not too long (viscosity, surface properties)
Not too heterogeneous (robustness)
Degree of importance depends on method
Single strand methods (Oligos, ss-cDNA)
Orientation must be known
ss-cDNA methods are not perfect
ds-cDNA methods don’t care
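The general requirements above (length bounds, limited heterogeneity, known orientation for single-strand methods) can be sketched as a simple filter. This is a minimal illustration only: the function names and all thresholds (`min_len`, `max_len`, `max_rel_dev`) are hypothetical and would have to be tuned for the actual platform.

```python
from statistics import mean

# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement, e.g. to flip a probe whose orientation is wrong."""
    return seq.translate(COMPLEMENT)[::-1]

def filter_probes(probes, min_len=200, max_len=1000, max_rel_dev=0.5):
    """Keep probes that are neither too short nor too long, and whose length
    does not deviate from the mean of the remaining set by more than
    max_rel_dev (a crude heterogeneity check).

    probes: dict mapping probe name -> sequence.
    All thresholds are assumptions, not recommendations.
    """
    in_range = {n: s for n, s in probes.items() if min_len <= len(s) <= max_len}
    if not in_range:
        return {}
    avg = mean(len(s) for s in in_range.values())
    return {n: s for n, s in in_range.items()
            if abs(len(s) - avg) <= max_rel_dev * avg}
```

For ds-cDNA methods the `reverse_complement` step is irrelevant; for oligos and ss-cDNA it matters whenever the source clone is in antisense orientation.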
Probe selection approaches
[Figure: probe selection approaches placed between accuracy and throughput: selected gene regions, selected genes, cluster representatives, ESTs, anonymous clones]
Non-Selective Approaches
EST spotting
Using clones from a library after sequencing
Little justification, since sequence availability allows selection
Anonymous (blind) spotting
Using clones from a library without prior sequencing
Only clones with interesting expression pattern are sequenced
Normalization of library highly recommended
Typical uses:
HT-arrays of ‘exotic’ organisms or tissues
Large-scale verification of differential display (DD) clones
Spotting of cluster representatives
Sequence Clustering
For human / mouse / rat EST clones: public cluster libraries
Unigene (NCBI)
THC (TIGR)
For custom sequence: clustering tools
STACK_PACK (SANBI)
JESAM (HGMP)
PCP (Paracel, commercial)
A benign clustering situation
In the absence of 5'-3' links: two clusters corresponding to one gene
Overlap too short: three clusters corresponding to one gene
Chimeric ESTs: one cluster corresponding to two genes
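The clustering failure modes can be reproduced with a toy single-linkage clustering. The sketch below assumes pairwise overlap lengths have already been computed by some alignment step; the EST names, overlap values and the `min_overlap` threshold are all invented for illustration and do not describe how UniGene, THC or any of the listed tools work internally.

```python
def cluster_ests(ests, overlaps, min_overlap=40):
    """Single-linkage clustering: two ESTs end up in the same cluster when
    they are connected by a chain of overlaps of at least min_overlap bp.

    ests: list of EST names.
    overlaps: dict mapping (est_a, est_b) -> overlap length in bp.
    Returns a sorted list of clusters, each a sorted list of names.
    """
    parent = {e: e for e in ests}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), length in overlaps.items():
        if length >= min_overlap:
            parent[find(a)] = find(b)

    clusters = {}
    for e in ests:
        clusters.setdefault(find(e), set()).add(e)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical data: gene 1 has three ESTs, gene 2 has one, plus one chimera.
ests = ["g1_5p", "g1_mid", "g1_3p", "g2_a", "chimera"]
overlaps = {
    ("g1_5p", "g1_mid"): 25,    # overlap too short: gene 1 is split
    ("g1_mid", "g1_3p"): 120,
    ("g1_3p", "chimera"): 90,   # chimera overlaps gene 1 ...
    ("chimera", "g2_a"): 150,   # ... and gene 2: the clusters are merged
}
clusters = cluster_ests(ests, overlaps)
```

With these numbers `g1_5p` forms its own cluster (one gene, two clusters), while the chimeric EST pulls `g2_a` into the gene 1 cluster (two genes, one cluster).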
Chimeric ESTs .. continued
Chimeric ESTs are quite common
Chimeric ESTs are a major nuisance for array probe selection
One of the fusion partners is usually a highly expressed mRNA
Double-picking of chimeric ESTs can fool even cautious clustering programs.
Unigene contains several chimeric clusters
The annotation of chimeric clusters is erratic
Chimeric ESTs can be detected by genome comparison
There is one particularly bad class of chimeric sequences that will be subject of the exercises.
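Detection by genome comparison can be sketched as follows. The sketch assumes the EST has already been aligned to the genome and that the hits are available as plain tuples; the tuple layout and the `max_gap` threshold are hypothetical, and genes with very long introns would need a larger threshold.

```python
def looks_chimeric(hits, max_gap=500_000):
    """Flag an EST as a chimera candidate based on its genome alignment.

    hits: list of (query_start, query_end, chromosome, genome_start, genome_end),
          one tuple per aligned segment of the EST.
    Returns True if neighbouring query segments map to different chromosomes
    or lie more than max_gap bp apart on the same chromosome.
    """
    ordered = sorted(hits)  # order segments along the EST
    for prev, curr in zip(ordered, ordered[1:]):
        if prev[2] != curr[2]:
            return True
        # Distance between the two genomic intervals (negative if they overlap).
        gap = max(curr[3], prev[3]) - min(curr[4], prev[4])
        if gap > max_gap:
            return True
    return False
```

A normally spliced EST produces several segments on one chromosome within intron distance and is not flagged; a fusion of two unrelated mRNAs typically is.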
How to select a cluster representative
If possible, pick a clone with completely known sequence
Avoid problematic regions: Alu repeats, B1, B2 and other SINEs; LINEs; endogenous retroviruses; microsatellite repeats
Avoid regions with high similarity to non-identical sequences
In many clusters, orientation and position relative to ORF are unknown and cannot be selected for.
Test selected clone for sequence correctness
Test selected clone for chimerism
Some commercial providers offer sequence verified UNIGENE cluster representatives
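Regions with high similarity to a related sequence can be located with a sliding-window identity scan. The sketch assumes the two sequences have already been aligned by an external tool and are passed in as equal-length strings with `-` for gaps; the function names and the 200 bp default window are illustrative choices, not part of any existing package.

```python
def window_identity(aln_a, aln_b, window=200):
    """Yield (start, identity) for every window over two pre-aligned,
    equal-length sequences. Gap columns ('-') count as mismatches."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    matches = [int(a == b and a != "-")
               for a, b in zip(aln_a.upper(), aln_b.upper())]
    if len(matches) < window:
        return
    current = sum(matches[:window])
    yield 0, current / window
    for start in range(1, len(matches) - window + 1):
        # Slide by one column: add the new column, drop the old one.
        current += matches[start + window - 1] - matches[start - 1]
        yield start, current / window

def least_similar_window(aln_a, aln_b, window=200):
    """Return (start, identity) of the window with the lowest identity,
    or None if the alignment is shorter than the window."""
    return min(window_identity(aln_a, aln_b, window),
               key=lambda t: t[1], default=None)
```

The window with the lowest identity is a candidate probe region; if even that window is highly similar, cross-hybridization between the two sequences is likely whatever fragment is chosen.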
Selection of genes
If possible, use all of them
Biased selection: by tissue, by topic, by visibility, by known expression properties, or from an unbiased pre-screen
Use sources of expression information: EST frequency, published array studies, SAGE data
Selection of gene regions
[Figure: gene regions: 5' UTR, ORF, 3' UTR]
Alternative polyadenylation
Constitutive polyA heterogeneity: 3'-fragments give reduced sensitivity but have no impact on the expression ratio
Regulated polyA heterogeneity: fragment choice influences the expression ratio; multiple fragments are necessary
Detection of cryptic polyA signals: prediction (AATAAA), polyadenylated ESTs, SAGE tags
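Prediction of candidate polyA signals amounts to a hexamer search in the 3' UTR. The sketch below looks for AATAAA only (other variants can be passed via `signals`); the `downstream` margin kept after the first signal is an assumed value, chosen because cleavage usually occurs a short distance downstream of the signal. A hexamer match alone does not prove that a site is used; polyadenylated ESTs or SAGE tags are needed for that.

```python
HEXAMER_LEN = 6  # AATAAA and its common variants are six bases long

def find_polya_signals(utr, signals=("AATAAA",)):
    """Return sorted 0-based start positions of candidate polyA signals
    in a 3' UTR given as sense-strand DNA."""
    seq = utr.upper()
    positions = set()
    for signal in signals:
        start = seq.find(signal)
        while start != -1:
            positions.add(start)
            start = seq.find(signal, start + 1)
    return sorted(positions)

def truncate_after_first_signal(utr, downstream=30, signals=("AATAAA",)):
    """Discard the UTR beyond the first candidate signal, keeping a
    hypothetical margin of `downstream` bases after it."""
    hits = find_polya_signals(utr, signals)
    if not hits:
        return utr
    return utr[: hits[0] + HEXAMER_LEN + downstream]
```

A probe restricted to the truncated UTR is present in both the short and the long transcript form, so it is not affected by a switch between the two sites.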
Alternative splicing
Constitutive splice form heterogeneity: a fragment in an alternative exon gives reduced sensitivity but has no impact on the expression ratio
Regulated splice form heterogeneity: fragment choice influences the expression ratio; multiple fragments are necessary
Detection of alternative splicing events: hard or impossible to predict; EST analysis (beware of pre-mRNA); literature
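The difference between constitutive and regulated heterogeneity can be checked with a little arithmetic. The transcript levels below are invented numbers for a gene with a long and a short form; the same reasoning applies to splice forms and to polyA site variants.

```python
def probe_signal(isoform_levels, detected):
    """Sum the levels of the isoforms that a probe fragment hybridizes to."""
    return sum(level for name, level in isoform_levels.items() if name in detected)

def expression_ratio(control, treated, detected):
    """Treated/control ratio as seen by a probe detecting the given isoforms."""
    return probe_signal(treated, detected) / probe_signal(control, detected)

control = {"long": 30, "short": 70}

# Constitutive heterogeneity: the gene is induced twofold, proportions unchanged.
constitutive = {"long": 60, "short": 140}
# Regulated heterogeneity: total level unchanged, but the isoform mix shifts.
regulated = {"long": 90, "short": 10}

common = {"long", "short"}   # fragment in a region shared by both forms
long_only = {"long"}         # fragment in the region unique to the long form

ratios = {
    "constitutive/common": expression_ratio(control, constitutive, common),
    "constitutive/long_only": expression_ratio(control, constitutive, long_only),
    "regulated/common": expression_ratio(control, regulated, common),
    "regulated/long_only": expression_ratio(control, regulated, long_only),
}
```

In the constitutive case both fragments report a ratio of 2.0, although the isoform-specific fragment sees only 30% of the control signal. In the regulated case the shared fragment reports 1.0 and the isoform-specific fragment 3.0, which is why more than one fragment is needed.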
Alternative promoter usage
What is the desired readout?
If promoter activity matters most: multiple fragments
If overall mRNA level matters most: downstream fragment
Detection of alternative promoter usage: prediction difficult (possible?); EST analysis; literature
[Figure: example of UDP-glucuronosyltransferases: UGT1A8, UGT1A7]
Selection of gene regions
Coding region (ORF)
Coding region (ORF): annotation relatively safe; no problems with alternative polyA sites; no repetitive elements or other unusual sequences; danger of close isoforms; danger of alternative splicing; might be missing in short RT products
3' untranslated region: annotation less safe; danger of alternative polyA sites; danger of repetitive elements; less likely to cross-hybridize with isoforms; little danger of alternative splicing
5' untranslated region: close linkage to promoter; frequently not available
A checklist
Pick a gene
Try to get a complete cDNA sequence
Verify sequence architecture (e.g. cross-species comparison)
Mask repetitive elements (and vector!)
If possible, discard 3’-UTR beyond first polyA signal
Look for alternative splice events
Use remaining region of interest for similarity searches
Mask regions that could cross-hybridize
Use the remaining region for probe amplification or EST selection
When working with ESTs, use sequence-verified clones
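The masking steps of the checklist leave a sequence in which only some stretches remain usable. A minimal sketch for picking the region for probe amplification is shown below; it assumes the masking tools produce soft-masked output (masked bases in lowercase, or replaced by N or X), and the 200 bp default minimum is an arbitrary illustrative value.

```python
import re

def unmasked_regions(seq, min_len=200):
    """Return (start, end, length) for each run of unmasked bases
    (uppercase A/C/G/T) that is at least min_len long.
    Assumes masked positions are lowercase letters, N or X."""
    return [(m.start(), m.end(), m.end() - m.start())
            for m in re.finditer(r"[ACGT]+", seq)
            if m.end() - m.start() >= min_len]

def best_probe_region(seq, min_len=200):
    """Return the longest unmasked region, or None if none is long enough."""
    return max(unmasked_regions(seq, min_len), key=lambda r: r[2], default=None)
```

If `best_probe_region` returns `None`, no stretch survives all masking steps, and either the length requirement or the choice of gene region has to be reconsidered.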
Exercises
1) Assume that you are interested in the p53 homolog p63, also known as Ket (TrEMBL: Q9UE10). What kind of fragment(s) would you use for expression analysis? Why?
2) The cytochrome P450 family is very important for toxicological microarray analysis, since most isoforms respond to different toxic compounds. Is it possible to design a cDNA fragment (minimal size 200 bp) that would be able to separate CYP2A6 and CYP2A7? What is the situation with CYP1A1 and CYP1A2? What region should be used?
3) Check whether probes for p53 (Swissprot: P53_HUMAN), p63 and p73 (P73_HUMAN) are available on the Affymetrix human 35K chip or the mouse 12K chip. Check whether there are sequence-verified clones available from Research Genetics.
4) Two (hypothetical) papers using different types of microarrays report very different results for the regulation of the thyroid receptor alpha-2 (Swissprot: THA2_HUMAN). Can you think of a possible explanation? What could you do to resolve this issue?
Tools for Exercises
1) Literature search with PubMed: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
2) Sequence search & retrieval (SwissProt, Entrez): http://www.expasy.ch/sprot/ and http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?db=Nucleotide
3) BLAST searches at SIB: http://www.ch.embnet.org/software/aBLAST.html (use a specific subdatabase; mind the 'repsim' filter)
4) Two-way sequence alignment: http://www.ch.embnet.org/software/LALIGN_form.html