Probe Selection for Microarrays

Considerations and Pitfalls

Kay Hofmann, MEMOREC Stoffel GmbH, Cologne/Germany

Probe selection wish list

Probe selection strategy should ensure

Biologically meaningful results (The truth...)

Coverage, Sensitivity (... The whole truth...)

Specificity (... And nothing but the truth)

Annotation

Reproducibility

Technology

Probe immobilization

Oligonucleotide coupling: synthesis with linker, covalent coupling to surface

Oligonucleotide photolithography

ds-cDNA coupling: cDNA generated by PCR, nonspecific binding to surface

ss-cDNA coupling: PCR with one modified primer, covalent coupling, 2nd strand removal

Spotting

With contact (pin-based systems)

Without contact (ink jet technology)

Technology-specific requirements

General

Not too short (sensitivity, selectivity)

Not too long (viscosity, surface properties)

Not too heterogeneous (robustness)

Degree of importance depends on method
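The general requirements above can be turned into a simple sanity check on a probe set. The following is a minimal sketch, not a validated protocol: the length window (`min_len`, `max_len`) and the use of GC content as a proxy for heterogeneity are assumptions made for illustration, and the function names are hypothetical.

```python
from statistics import pstdev


def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases in a sequence (case-insensitive)."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s) if s else 0.0


def check_probe_set(probes: dict[str, str],
                    min_len: int = 200,   # assumed lower bound, not from the slides
                    max_len: int = 600):  # assumed upper bound, not from the slides
    """Flag probes outside the length window and summarize set heterogeneity."""
    lengths = [len(s) for s in probes.values()]
    gcs = [gc_fraction(s) for s in probes.values()]
    return {
        "too_short": [n for n, s in probes.items() if len(s) < min_len],
        "too_long": [n for n, s in probes.items() if len(s) > max_len],
        # A large spread in length or GC content suggests a heterogeneous set.
        "length_sd": pstdev(lengths),
        "gc_sd": pstdev(gcs),
    }
```

How strictly to apply such thresholds depends on the immobilization and spotting method, as the last point above notes.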

Single strand methods (Oligos, ss-cDNA)

Orientation must be known

ss-cDNA methods are not perfect

ds-cDNA methods don’t care
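Why orientation matters for single-strand probes can be shown with a toy model. This is an idealized sketch that treats hybridization as exact complementarity; `probe_detects` is a hypothetical helper, and the labeled target is assumed to be first-strand cDNA (the reverse complement of the mRNA).

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def probe_detects(probe: str, target: str) -> bool:
    """Idealized check: a single-stranded probe binds the target only if
    the probe's reverse complement occurs in the target strand."""
    return reverse_complement(probe) in target


def ds_probe_detects(probe: str, target: str) -> bool:
    """A double-stranded probe presents both strands, so either orientation binds."""
    return probe_detects(probe, target) or probe_detects(reverse_complement(probe), target)
```

In this model a sense-strand probe detects labeled first-strand cDNA, the antisense probe does not, and a double-stranded probe detects it regardless of how the clone was oriented.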

Probe selection approaches

[Figure: trade-off between accuracy and throughput for four approaches: selected gene regions, selected genes, cluster representatives, anonymous spotting]

Non-Selective Approaches

EST spotting

Using clones from a library after sequencing

Little justification, since sequence availability allows selection

Anonymous (blind) spotting

Using clones from a library without prior sequencing

Only clones with interesting expression pattern are sequenced

Normalization of library highly recommended

Typical uses:

HT-arrays of ‘exotic’ organisms or tissues

Large-scale verification of DD clones

Spotting of cluster representatives

Sequence Clustering

For human / mouse / rat EST clones: public cluster libraries

UniGene (NCBI)

THC (TIGR)

For custom sequences: clustering tools

STACK_PACK (SANBI)

JESAM (HGMP)

PCP (Paracel, commercial)
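The clustering situations on the following slides can be reproduced with a toy single-linkage clustering. This sketch is only an illustration of the principle and does not reflect how the tools listed above work internally: two sequences are linked if they share at least one exact k-mer, and `k` (an assumed parameter) plays the role of the minimum overlap.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_by_shared_kmer(seqs: dict[str, str], k: int = 8) -> list[set[str]]:
    """Single-linkage clustering: sequences sharing any exact k-mer are merged."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    owner: dict[str, str] = {}  # k-mer -> first sequence in which it was seen
    for name, seq in seqs.items():
        for km in kmers(seq, k):
            if km in owner:
                parent[find(name)] = find(owner[km])
            else:
                owner[km] = name

    clusters: dict[str, set[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())
```

With this scheme, two ESTs from one gene whose overlap is shorter than `k` end up in separate clusters, and a single chimeric EST carrying parts of two genes is enough to merge both genes into one cluster.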

A benign clustering situation

In the absence of 5’-3’ links

Two clusters corresponding to one gene

Overlap too short

Three clusters corresponding to one gene

Chimeric ESTs!

One cluster corresponding to two genes

Chimeric ESTs (continued)

Chimeric ESTs are quite common

Chimeric ESTs are a major nuisance for array probe selection

One of the fusion partners is usually a highly expressed mRNA

Double-picking of chimeric ESTs can fool even cautious clustering programs.

UniGene contains several chimeric clusters

The annotation of chimeric clusters is erratic

Chimeric ESTs can be detected by genome comparison

There is one particularly bad class of chimeric sequences that will be the subject of the exercises.
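Detection by genome comparison can be sketched as follows. The example assumes that the EST has already been aligned to the genome by some external tool and that the result is available as a list of alignment blocks; the `Hit` record, the `looks_chimeric` helper and the `max_span` threshold are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One alignment block of an EST against the genome (assumed input format)."""
    est_start: int
    est_end: int
    chrom: str
    genome_start: int


def looks_chimeric(hits: list[Hit], max_span: int = 500_000) -> bool:
    """Flag an EST whose parts map to different chromosomes, or to loci on
    one chromosome that are further apart than max_span (assumed threshold)."""
    if len({h.chrom for h in hits}) > 1:
        return True
    positions = [h.genome_start for h in hits]
    return bool(positions) and max(positions) - min(positions) > max_span
```

A check like this cannot catch fusions between neighbouring genes, so a flagged clone is a strong hint while an unflagged clone is not proof of correctness.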

How to select a cluster representative

If possible, pick a clone with completely known sequence

Avoid problematic regions: Alu repeats, B1, B2 and other SINEs; LINEs; endogenous retroviruses; microsatellite repeats

Avoid regions with high similarity to non-identical sequences

In many clusters, orientation and position relative to ORF are unknown and cannot be selected for.

Test selected clone for sequence correctness

Test selected clone for chimerism

Some commercial providers offer sequence-verified UniGene cluster representatives
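Of the problematic regions listed above, microsatellite repeats are the only class that can be found without a repeat library. The following is a minimal sketch using a regular expression; `max_unit` and `min_copies` are assumed defaults, and SINEs, LINEs and retroviral elements would need a dedicated repeat-masking tool instead.

```python
import re


def mask_simple_repeats(seq: str, max_unit: int = 6, min_copies: int = 6) -> str:
    """Replace tandem repeats (unit length 1..max_unit, at least min_copies
    copies) with N characters, keeping the sequence length unchanged."""
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1),
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: "N" * len(m.group(0)), seq)
```

Because the unit length starts at 1, homopolymer runs such as polyA stretches are masked as well.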

Selection of genes

If possible, use all of them

Biased selection: by tissue; by topic; by visibility; by known expression properties; from an unbiased pre-screen

Use sources of expression information: EST frequency, published array studies, SAGE data

Selection of gene regions

3’ UTR

5’ UTR

Alternative polyadenylation

Constitutive polyA heterogeneity

3’ fragments: reduced sensitivity, no impact on expression ratio

Regulated polyA heterogeneity: fragment choice influences expression ratio; multiple fragments necessary

Detection of cryptic polyA signals: prediction (AATAAA); polyadenylated ESTs; SAGE tags
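The prediction approach amounts to scanning the 3’ UTR for the AATAAA hexamer. The sketch below does only that; the helper names are hypothetical, and the `downstream` margin kept after the first signal is an assumption for illustration. A hexamer match alone is weak evidence, which is why polyadenylated ESTs and SAGE tags are listed as complementary sources.

```python
def polya_signal_positions(seq: str, signal: str = "AATAAA") -> list[int]:
    """0-based start positions of every occurrence of the polyA signal."""
    s = seq.upper().replace("U", "T")
    positions = []
    i = s.find(signal)
    while i != -1:
        positions.append(i)
        i = s.find(signal, i + 1)
    return positions


def trim_after_first_signal(utr: str, downstream: int = 30) -> str:
    """Keep the UTR up to the first signal plus a short margin
    (downstream is an assumed value); return it unchanged if no signal is found."""
    positions = polya_signal_positions(utr)
    if not positions:
        return utr
    return utr[: positions[0] + len("AATAAA") + downstream]
```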

Alternative splicing

Constitutive splice form heterogeneity

Fragment in alternative exon: reduced sensitivity, no impact on expression ratio

Regulated splice form heterogeneity: fragment choice influences expression ratio; multiple fragments necessary

Detection of alternative splicing events: hard or impossible to predict; EST analysis (beware of pre-mRNA); literature
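The difference between constitutive and regulated heterogeneity (for splice forms as well as for polyA sites) comes down to simple arithmetic. The sketch below assumes an idealized, linear signal: a fragment only sees the fraction of transcripts that contain it. The function names and numbers are made up for illustration.

```python
def fragment_signal(total_mrna: float, fraction_with_fragment: float) -> float:
    """Signal seen by a fragment present in only part of the transcripts."""
    return total_mrna * fraction_with_fragment


def observed_ratio(total_control: float, total_treated: float,
                   frac_control: float, frac_treated: float) -> float:
    """Treated/control expression ratio as measured by that fragment."""
    return (fragment_signal(total_treated, frac_treated)
            / fragment_signal(total_control, frac_control))


# Constitutive heterogeneity: 25% of transcripts carry the fragment in both
# conditions. The signal is lower, but the true 2-fold change is recovered.
constitutive = observed_ratio(100, 200, 0.25, 0.25)

# Regulated heterogeneity: the fraction drops from 50% to 25% on treatment.
# The same 2-fold change in total mRNA now appears as no change at all.
regulated = observed_ratio(100, 200, 0.5, 0.25)
```

In the regulated case no single fragment reports the total mRNA level, which is why multiple fragments are necessary.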

Alternative promoter usage

Alternative promoter usage

What is the desired readout?

If promoter activity matters most: multiple fragments; if overall mRNA level matters most: downstream fragment

Detection of alternative promoter usage: prediction difficult (possible?); EST analysis; literature

Example: UDP-glucuronosyltransferases (UGT1A8, UGT1A7)

Selection of gene regions

Coding region (ORF): annotation relatively safe; no problems with alternative polyA sites; no repetitive elements or other funny sequences; danger of close isoforms; danger of alternative splicing; might be missing in short RT products

3’ untranslated region: annotation less safe; danger of alternative polyA sites; danger of repetitive elements; less likely to cross-hybridize with isoforms; little danger of alternative splicing

5’ untranslated region: close linkage to promoter; frequently not available

A checklist

Pick a gene

Try to get a complete cDNA sequence

Verify sequence architecture (e.g. cross-species comparison)

Mask repetitive elements (and vector!)

If possible, discard 3’-UTR beyond first polyA signal

Look for alternative splice events

Use remaining region of interest for similarity searches

Mask regions that could cross-hybridize

Use the remaining region for probe amplification or EST selection

When working with ESTs, use sequence-verified clones
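The last steps of the checklist, picking what remains after masking, can be sketched as a search for unmasked stretches. This is a minimal illustration that assumes masked positions are written as `N` or as lowercase letters; `candidate_regions` is a hypothetical helper, and the default `min_len` of 200 follows the minimal fragment size used in the exercises.

```python
import re


def candidate_regions(masked_seq: str, min_len: int = 200) -> list[tuple[int, int]]:
    """Return (start, end) of unmasked stretches (uppercase A/C/G/T only)
    that are at least min_len long, longest first. End is exclusive."""
    regions = [(m.start(), m.end()) for m in re.finditer(r"[ACGT]+", masked_seq)]
    regions = [r for r in regions if r[1] - r[0] >= min_len]
    return sorted(regions, key=lambda r: r[1] - r[0], reverse=True)
```

An empty result means that no fragment of the required size survives the masking, in which case the region of interest or the size requirement has to be reconsidered.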

Exercises

1) Assume that you are interested in the p53 homolog p63, also known as Ket (TrEMBL: Q9UE10). What kind of fragment(s) would you use for expression analysis? Why?

2) The cytochrome P450 family is very important for toxicological microarray analysis, since most isoforms respond to different toxic compounds. Is it possible to design a cDNA fragment (minimal size 200 bp) that would be able to separate CYP2A6 and CYP2A7? What is the situation with CYP1A1 and CYP1A2? What region should be used?

3) Check whether probes for p53 (Swissprot: P53_HUMAN), p63 and p73 (P73_HUMAN) are available on the Affymetrix human 35K chip or the mouse 12K chip. Check whether there are sequence-verified clones available from Research Genetics.

4) Two (hypothetical) papers using different types of microarrays report very different results for the regulation of the thyroid receptor alpha-2 (Swissprot: THA2_HUMAN). Can you think of a possible explanation? What could you do to resolve this issue?

Tools for Exercises

1) Literature search with PubMed: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed

2) Sequence search & retrieval (SwissProt, Entrez): http://www.expasy.ch/sprot/ and http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?db=Nucleotide

3) BLAST searches at SIB: http://www.ch.embnet.org/software/aBLAST.html (use a specific subdatabase and mind the ‘repsim’ filter)

4) Two-way sequence alignment: http://www.ch.embnet.org/software/LALIGN_form.html
