

DNA Research 10, 167–180 (2003) Short Communication

Prediction of the Coding Sequences of Mouse Homologues of KIAA Gene: III. The Complete Nucleotide Sequences of 500 Mouse KIAA-homologous cDNAs Identified by Screening of Terminal Sequences of cDNA Clones Randomly Sampled from Size-fractionated Libraries

Noriko Okazaki,1 Reiko Kikuno,1 Reiko Ohara,1 Susumu Inamoto,2,3 Haruhiko Koseki,4,5 Shuichi Hiraoka,4 Yumiko Saga,6 Takahiro Nagase,1 Osamu Ohara,1,5 and Hisashi Koga1,3,∗

Kazusa DNA Research Institute, 2-6-7 Kazusa-kamatari, Kisarazu, Chiba 292-0818, Japan,1 Institute of Research and Innovation, 1201 Takada, Kashiwa, Chiba 277-0861, Japan,2 Chiba Industry Advancement Center, 2-6 Nakase, Mihama-ku, Chiba 261-7126, Japan,3 Department of Molecular Embryology, Graduate School of Medicine, Chiba University, 1-8-1 Inohana, Chuo-ku, Chiba 260-8670, Japan,4 RIKEN Research Center for Allergy and Immunology, 1-7-22 Suehiro, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan,5 and Division of Mammalian Development, National Institute of Genetics, Yata 1111, Mishima 411-8540, Japan6

(Received 8 August 2003)

Abstract

We have conducted a human cDNA project to predict protein-coding sequences (CDSs) in large cDNAs (> 4 kb) since 1994, and the number of newly identified genes, known as KIAA genes, already exceeds 2000. The ultimate goal of this project is to clarify the physiological functions of the proteins encoded by KIAA genes. To this end, the project has recently been expanded to include isolation and characterization of mouse KIAA-counterpart genes. We herein present the entire sequences and the chromosome loci of 500 mKIAA cDNA clones and 13 novel cDNA clones that were incidentally identified during this project. The average size of the 513 cDNA sequences reached 4.3 kb and that of the deduced amino acid sequences from these cDNAs was 816 amino acid residues. By comparison of the predicted CDSs between mouse and human KIAAs, 12 mKIAA cDNA clones were assumed to be differently spliced isoforms of the human cDNA clones. The comparison of mouse and human sequences also revealed that four pairs of human KIAA cDNAs are derived from single genes. Notably, a homology search against the public database indicated that 4 out of 13 novel cDNA clones were homologous to the disease-related genes.

Key words: mKIAA; mouse; cDNA sequencing; large proteins; orthologue; novel genes; protein-coding region

One of the expected goals of the human genome project is to fully utilize genomic information for the development of effective diagnoses and/or medical treatments of various human diseases. As a result, many disease-related genes have been identified and registered in the public databases: the Human Gene Mutation Database (HGMD) (http://archive.uwcm.ac.uk/uwcm/mg/hgmd0.html),1 the Online Mendelian Inheritance in Man (OMIM)2 (www.ncbi.nlm.nih.gov/omim), and the Genome Database (GDB) (http://www.gdb.org).3 For example, OMIM contains 8733 mapped mutations and HGMD contains 34,682 mutations in 1390 genes as of July 20, 2003. However, the gap between genotype and phenotype has yet to be filled in many cases.

Communicated by Michio Oishi
∗ To whom correspondence should be addressed. Tel. +81-438-52-3919, Fax. +81-438-52-3918, E-mail: [email protected]

In addition to these efforts, since 1994 we have been independently conducting a human cDNA sequencing project to accumulate sequence information of protein coding regions (CDSs) in cDNAs, focusing on the long cDNAs (> 4 kb) encoding large proteins.4 Since positionally cloned genes frequently encode large multidomain proteins,5 we considered that focusing on long cDNAs encoding large proteins could be an efficient way to identify unknown human disease-related genes.6–10 Consequently, we have newly characterized more than 2000 human genes and systematically designated them using “KIAA” plus a 4-digit number.11 Among these genes, 38 KIAA genes were identified as disease-related genes and 193 KIAA genes were similar to disease-related genes (identities of the amino acid residues against OMIM database are 100% and ≥ 30%, respectively) although the molecular functions of most KIAA proteins have not been identified. The number of KIAA genes which link to some disease is certainly increasing as more information becomes available about KIAA genes.

In general, functional studies at the mRNA and the protein levels have the potential to bridge the gap between genotype and phenotype because it is mRNA that links genes and proteins. In this respect, our cDNA project will contribute to both the forward and reverse genetic approaches to human diseases. We also expect that this project will contribute to a better understanding of the etiology of human diseases at the molecular level. However, due to ethical considerations, the accumulation of experimental data using appropriate animal models is the most realistic means to characterize human gene products in vivo. Therefore, in 2001 we began to collect and characterize cDNAs encoding mouse counterparts of human KIAA proteins.12 There are several reasons why we select the mouse as our model of first choice: (1) the mouse is a widely used model animal for study of mammalian gene functions, and approximately 99% of mouse genes have homologues in the human genome;13 (2) a high-quality draft genome sequence of the mouse is publicly available;13 (3) genetic manipulation of mouse embryo has been completely established and provides great insight into the pathogenesis of human disease. We have already reported the cDNA sequences of approximately 500 mouse KIAA-homologues, and designated them using “mKIAA” plus a 4-digit number corresponding to the human KIAA cDNA.12,14 Sequence information was deposited in public databases and the results of the sequence analysis of these cDNA clones are freely available through the ROUGE database (http://www.kazusa.or.jp/rouge/). A similar approach using animal models is also being performed by other groups15 and their data are also freely available: the Mammalian Gene Mutation Database (MGMD) (http://lisntweb.swan.ac.uk/cmgt/index.htm)16 and Online Mendelian Inheritance in Animals (OMIA) (http://www.angis.org.au/omia).17 Because KIAA genes were first identified by our cDNA project, our analyses of mouse KIAA-homologous genes would well complement the efforts by these other groups.

As an extension of our preceding studies,12,14 we herein report the predicted coding sequences of 513 mKIAA genes from newly identified cDNA clones and the evaluation results of the integrity of the protein-coding sequences (CDSs) in these mKIAA cDNAs. Among the 513 cDNA clones, we incidentally isolated 13 novel cDNA clones which are not homologous to human KIAA. Nevertheless, it should be emphasized that 4 of the 13 genes are assigned as homologues of previously identified disease-related genes.

1. Assignment of CDS in Mouse KIAA-homologous cDNAs

The cDNA clones were isolated from the size-fractionated mouse cDNA libraries derived from three different mouse tissues: adult brain, fetal brain and embryonic tail. These libraries were previously constructed by the in vitro recombination-assisted method.18 The cDNA clones to be entirely sequenced were selected according to the procedures previously described.12,14 After careful confirmation with the DNA sequences registered in the public databases, we deposited only the newly identified mouse cDNA sequences to the DDBJ/EMBL/GenBank databases (accession numbers given in Table 1).12,14 Their structural features are shown in Table 1 and Figs. 1–4. We decided to designate the 500 mouse KIAA cDNA sequences homologous to human KIAA cDNA clones as “mKIAA” plus the same 4-digit number as the corresponding human KIAA cDNA clone. Whereas 13 cDNA clones were eventually found not to be homologous to any KIAA cDNAs, these were conventionally designated “mKIAA” plus a 4-digit number which has not been allocated for human KIAA genes (mKIAA3007, mKIAA3011–mKIAA3021, and mKIAA3023). The structural features of these new mKIAA cDNAs are described in a separate section in detail.

We first identified CDSs in the obtained cDNA sequences according to the results of GeneMark analysis.19 When the GeneMark analysis did not detect any possible CDSs, CDSs were tentatively assigned as the longest open reading frames (ORFs). The average size of the 513 mouse cDNA sequences reached 4.3 kb and that of the deduced amino acid sequences from their longest CDS in each cDNA was 786 amino acid residues. Multiple CDSs were predicted in 90 mouse cDNA sequences (Fig. 2), whereas the remaining 423 cDNAs carried single CDSs predicted by GeneMark analysis (Fig. 1). If we include multiple CDSs with more than 50 amino acid residues in each cDNA clone in the calculation, the average size of the deduced amino acid sequences becomes 816 amino acid residues. Such multiple predicted CDSs in a single cDNA sequence are most likely the result of artificial CDS splits caused by errors of reverse transcription, retained intron(s), or other cloning artifacts.20
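The fallback rule and the length bookkeeping described above can be illustrated with a short Python sketch. It is not the pipeline used in this study. GeneMark itself is not called: the `predicted` argument is a hypothetical list of CDS spans supplied by the caller. All function names are illustrative, only the three forward reading frames are scanned, and an ORF is assumed to be any stop-free stretch of codons, so that no start codon is required.

```python
from typing import List, Sequence, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_CDS_AA = 50  # only CDSs longer than 50 residues enter the summed length

Span = Tuple[int, int]  # 0-based start, exclusive end, in nucleotides


def longest_orf(seq: str) -> Span:
    """Longest stop-free run of whole codons over the three forward frames.

    Assumption: an ORF is any stretch between stop codons (a start codon is
    not required), since a cDNA clone may be truncated at its 5' end.
    """
    seq = seq.upper()
    best: Span = (0, 0)
    for frame in range(3):
        run_start = frame
        pos = frame
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] in STOP_CODONS:
                if pos - run_start > best[1] - best[0]:
                    best = (run_start, pos)
                run_start = pos + 3
            pos += 3
        # Run still open at the 3' end; pos is the end of the last whole codon.
        if pos - run_start > best[1] - best[0]:
            best = (run_start, pos)
    return best


def assign_cds(seq: str, predicted: Sequence[Span]) -> List[Span]:
    """Keep the gene finder's CDSs if it reported any, else use the longest ORF."""
    return list(predicted) if predicted else [longest_orf(seq)]


def summed_cds_length(cds_lengths_aa: Sequence[int]) -> int:
    """Sum the lengths (in residues) of the CDSs longer than MIN_CDS_AA."""
    return sum(n for n in cds_lengths_aa if n > MIN_CDS_AA)
```

For the toy sequence `ATGCTAACTGACCCCTAA`, `longest_orf` returns the span `(0, 15)`, that is, the five codons preceding the TAA stop in the first frame. A clone with hypothetical predicted CDSs of 786, 40 and 120 residues would be reported with a summed length of 906 residues, because the 40-residue CDS falls below the cutoff.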

Further confirmatory experiments are required to determine whether these predicted CDS interruptions are spurious or not. However, since we have already done many such confirmatory experiments for a considerable number of human KIAA cDNA clones, the comparison of structures of mKIAA cDNA clones with the corresponding human KIAA cDNAs provides us an alternative criterion by which to judge the authenticity of


Table 1. Information of sequence data and chromosomal number of the identified genes.

Columns (repeated for the left and right entry of each row, separated by "|"): Gene name; Accession number a); cDNA length (bp) b); CDS length (a.a.) c); Chromosomal number d)

mKIAA0007  AK129031  3368  665  17  |  mKIAA0258  AK129103  4386  415  4
mKIAA0008  AK129032  3247  773  14  |  mKIAA0259  AK129104  5228  1569  9
mKIAA0011  AK129033  4301  918  5  |  mKIAA0261  AK129105  4585  1285  14
mKIAA0013  AK129034  5978  1049  2  |  mKIAA0262  AK129106  3099  824  5
mKIAA0015  BC042570*  4284  313  16  |  mKIAA0270  AK129107  2391  350  10
mKIAA0016  BC002087*  997  162  2  |  mKIAA0272  AK129108  3314  740  14
mKIAA0017  AK129035  4044  1122  8  |  mKIAA0282  AK129109  4370  831  12
mKIAA0018  AK129036  4017  559  4  |  mKIAA0286  AK129110  5998  447  10
mKIAA0019  AK129037  4351  841  2  |  mKIAA0295  AK129111  5718  804  9
mKIAA0026  AK129038  1767  288  X  |  mKIAA0296  AK129112  6683  1541  7
mKIAA0030  AK129039  3251  907  6  |  mKIAA0297  AK129113  7781  1431  12
mKIAA0035  AK129040  2405  703  19  |  mKIAA0303  AK129114  5372  950  13
mKIAA0041  AK129041  6469  807  16  |  mKIAA0311  AK129115  5712  824  12
mKIAA0044  AK129042  2790  458  12  |  mKIAA0331  AK129116  3737  574  5
mKIAA0049  AK129043  4320  915  11  |  mKIAA0333  AK129117  4906  1201  2
mKIAA0051  AK129044  6100  1681  7  |  mKIAA0334  AK129118  8645  857  5
mKIAA0052  AK129045  4440  743  13  |  mKIAA0339  BC049883*  3108  822  7
mKIAA0056  AK129046  5021  1505  9  |  mKIAA0342  AK129119  6584  1337  9
mKIAA0058  BC014808*  1773  178  15  |  mKIAA0345  AK129120  4568  962  18
mKIAA0063  AK004913*  3356  212  15  |  mKIAA0346  AK129121  2278  708  11
mKIAA0064  BC023732*  1938  502  5  |  mKIAA0348  AK129122  5130  1299  17
mKIAA0069  AF172088*  2056  222  7  |  mKIAA0351  AK129123  6009  537  2
mKIAA0070  BC035324*  2131  632  8  |  mKIAA0354  AK129124  3416  674  4
mKIAA0074  AK129047  2226  611  2  |  mKIAA0355  AK129125  4788  856  7
mKIAA0075  AK129048  4294  321  13  |  mKIAA0363  BC029830*  793  219  12
mKIAA0077  AK129049  4416  147  11  |  mKIAA0364  AK129126  3532  1026  X
mKIAA0078  AK129050  2732  426  15  |  mKIAA0368  AK129127  3817  1229  4
mKIAA0081  BC014742*  1841  222  7  |  mKIAA0372  AK129128  4333  808  13
mKIAA0083  AK129051  4100  1078  10  |  mKIAA0383  AK129129  4839  211  14
mKIAA0090  AK129052  4843  992  4  |  mKIAA0398  AK129130  4941  467  18
mKIAA0092  AK129053  929  309  9  |  mKIAA0399  AK129131  4197  110  11
mKIAA0094  AK129054  4665  275  3  |  mKIAA0404  AK129132  6265  1931  19
mKIAA0095  AK129055  5986  792  8  |  mKIAA0407  AK129133  4320  1207  9
mKIAA0096  BC020189*  3085  750  9  |  mKIAA0410  AK129134  5222  544  14
mKIAA0098  AK129056  1682  542  15  |  mKIAA0415  AK129135  4174  623  5
mKIAA0101  AK011090*  899  140  9  |  mKIAA0421  AK129136  6284  683  7
mKIAA0102  AK017504*  1442  215  7  |  mKIAA0425  AK129137  5456  1578  4
mKIAA0108  AK084515*  1358  273  12  |  mKIAA0426  AK129138  4406  508  13
mKIAA0111  BC008132*  1413  410  6  |  mKIAA0429  AK129139  4923  773  15
mKIAA0116  AK129057  987  286  9  |  mKIAA0433  AK129140  5387  1132  1
mKIAA0123  AK004549*  3063  509  2  |  mKIAA0434  AK129141  7506  1170  9
mKIAA0124  AK129058  2449  737  15  |  mKIAA0436  AK129142  2196  484  17
mKIAA0127  AK129059  4674  129  11  |  mKIAA0437  AK129143  5427  1407  18
mKIAA0131  AK129060  3823  935  X  |  mKIAA0442  AK129144  1369  414  5
mKIAA0132  AK129061  2904  637  9  |  mKIAA0444  AK129145  5327  702  4
mKIAA0134  AK129062  3781  1167  7  |  mKIAA0448  AK088698*  2247  362  3
mKIAA0135  AK129063  4613  1389  1  |  mKIAA0449  AK129146  4559  171  1
mKIAA0140  BC035523*  4899  439  7  |  mKIAA0450  AK129147  5333  700  4
mKIAA0142  AK129064  4861  809  8  |  mKIAA0451  AK129148  4471  344  1
mKIAA0143  AK129065  3941  832  15  |  mKIAA0455  AK129149  5291  795  3
mKIAA0152  AK129066  5850  306  5  |  mKIAA0460  AK129150  4558  1385  3
mKIAA0156  AK129067  3959  952  5  |  mKIAA0466  AK129151  5128  1214  3
mKIAA0158  D49382*  3207  367  1  |  mKIAA0467  AK129152  6387  1478  4
mKIAA0159  AK129068  4478  1397  ND  |  mKIAA0468  AK129153  4972  486  4
mKIAA0162  AK129069  4249  1229  11  |  mKIAA0469  AK052203*  3869  604  4
mKIAA0163  AK129070  3925  466  16  |  mKIAA0475  AK129154  3672  268  1
mKIAA0164  AK129071  4736  763  10  |  mKIAA0479  AK129155  4610  287  1
mKIAA0165  AK129072  6597  2133  15  |  mKIAA0496  AK129156  2934  265  15
mKIAA0169  AK129073  5670  1761  2  |  mKIAA0520  U30602*  4360  1334  16
mKIAA0170  AK129074  5109  1015  17  |  mKIAA0533  AK129157  6161  1521  2
mKIAA0173  AK129075  6932  934  1  |  mKIAA0537  AK033672*  2817  652  1
mKIAA0175  AK129076  2423  648  4  |  mKIAA0540  AK129158  5065  1382  9
mKIAA0176  AK129077  2396  239  17  |  mKIAA0542  AK129159  3271  353  11
mKIAA0179  AK129078  4479  640  17  |  mKIAA0543  AK129160  6113  937  6
mKIAA0181  AK129079  4019  1169  2  |  mKIAA0544  AK129161  2655  455  16
mKIAA0185  AK129080  6040  1866  19  |  mKIAA0550  AK129162  5279  612  1
mKIAA0186  AK129081  868  168  2  |  mKIAA0562  AK129163  4644  906  4
mKIAA0187  AK129082  4246  1287  6  |  mKIAA0564  AK129164  4845  1246  14
mKIAA0190  AK129083  3244  818  8  |  mKIAA0570  AK129165  4221  1221  11
mKIAA0193  AK129084  4674  439  6  |  mKIAA0581  AK129166  5977  960  2
mKIAA0194  AK129085  4096  1080  18  |  mKIAA0583  BC043128*  3367  582  14
mKIAA0196  AK129086  5091  479  15  |  mKIAA0590  AK129167  5727  1265  17
mKIAA0198  AK129087  5474  506  2  |  mKIAA0593  AK129168  2195  731  11
mKIAA0200  AK129088  5518  1045  11  |  mKIAA0597  AK129169  4978  536  15
mKIAA0206  AK087506*  4314  236  13  |  mKIAA0601  AK129170  2942  879  4
mKIAA0211  AK129089  4821  1292  7  |  mKIAA0610  AK129171  3567  644  3
mKIAA0212  AK129090  5818  674  6  |  mKIAA0614  AK129172  5443  1159  5
mKIAA0215  AK129091  4751  822  X  |  mKIAA0616  AK129173  5664  634  8
mKIAA0222  AK129092  4828  423  18  |  mKIAA0617  AK129174  4359  792  1
mKIAA0225  AK129093  6411  2067  6  |  mKIAA0620  AK129175  6178  1746  6
mKIAA0229  AK129094  6784  1198  17  |  mKIAA0621  AK129176  7786  847  18
mKIAA0233  AK129095  5959  1480  8  |  mKIAA0625  AK129177  4704  778  2
mKIAA0234  AK129096  4867  1390  X  |  mKIAA0629  AK129178  5927  1023  14
mKIAA0239  AK129097  6094  842  11  |  mKIAA0631  AK129179  2683  724  9
mKIAA0242  AK129098  2411  194  1  |  mKIAA0645  AK129180  5211  1529  5
mKIAA0245  AK129099  4289  352  8  |  mKIAA0650  AK129181  4042  1066  17
mKIAA0246  AF290914*  5499  1789  14  |  mKIAA0655  AK129182  4332  1083  5
mKIAA0247  AK129100  4835  324  12  |  mKIAA0659  AK129183  2855  151  19
mKIAA0250  AK129101  5792  1157  1  |  mKIAA0661  AK129184  4059  1035  7
mKIAA0253  AK129102  2848  720  1  |  mKIAA0666  AK129185  5820  1087  12


Table 1. Continued.

Columns (repeated for the left and right entry of each row, separated by "|"): Gene name; Accession number a); cDNA length (bp) b); CDS length (a.a.) c); Chromosomal number d)

mKIAA0667  AK129186  4267  1243  6  |  mKIAA1033  AK129269  4977  1050  10
mKIAA0677  AK129187  4516  1080  4  |  mKIAA1034  AK129270  6271  727  1
mKIAA0679  AK129188  5024  1037  19  |  mKIAA1041  AK129271  4645  595  4
mKIAA0683  AK129189  4428  647  17  |  mKIAA1048  AK129272  6200  543  6
mKIAA0686  AK129190  5245  1069  13  |  mKIAA1055  AK129273  5906  979  9
mKIAA0687  AK129191  4301  1254  1  |  mKIAA1062  AK129274  4914  1416  2
mKIAA0692  AK129192  4369  727  5  |  mKIAA1064  AK129275  4920  912  7
mKIAA0697  AK129193  4660  1119  5  |  mKIAA1068  AK129276  2585  107  11
mKIAA0698  AK129194  4611  1169  X  |  mKIAA1071  AK129277  5197  539  X
mKIAA0709  AK129195  4227  1001  11  |  mKIAA1077  AK129278  4436  1078  1
mKIAA0710  AK129196  4911  1171  10  |  mKIAA1079  AK129279  6894  919  ND
mKIAA0711  AK129197  5924  634  8  |  mKIAA1080  AK129280  2806  612  7
mKIAA0720  AK129198  2050  498  4  |  mKIAA1082  AK129281  4167  1150  18
mKIAA0723  AK129199  3804  852  18  |  mKIAA1083  AK129282  4659  614  17
mKIAA0724  AK129200  3836  1049  4  |  mKIAA1084  AK129283  6646  1013  7
mKIAA0728  AK129201  5523  1589  1  |  mKIAA1085  AK129284  5427  512  18
mKIAA0731  AK129202  1786  364  11  |  mKIAA1087  AK129285  6565  723  7
mKIAA0732  AK129203  4414  972  11  |  mKIAA1089  AK129286  4426  1064  3
mKIAA0742  AK129204  4743  1334  6  |  mKIAA1090  AK129287  3381  857  5
mKIAA0743  AK129205  5583  1203  12  |  mKIAA1094  BC019543*  622  150  2
mKIAA0751  AK129206  4583  1297  15  |  mKIAA1098  AK129288  1045  193  14
mKIAA0756  AK129207  5071  1251  1  |  mKIAA1099  AK129289  3116  678  1
mKIAA0765  AK129208  4957  569  2  |  mKIAA1100  AK129290  5965  384  6
mKIAA0767  AK129209  6072  532  15  |  mKIAA1101  AK129291  4633  451  9
mKIAA0770  AK129210  4306  889  2  |  mKIAA1104  AK129292  3401  1033  13
mKIAA0776  AK129211  2687  717  4  |  mKIAA1113  AK129293  4871  1071  3
mKIAA0778  AK129212  3318  1022  1  |  mKIAA1123  AK129294  4944  1389  11
mKIAA0782  AK129213  4013  1147  7  |  mKIAA1128  AK129295  2019  672  14
mKIAA0784  AK129214  4930  1089  2  |  mKIAA1135  AK129296  1190  396  18
mKIAA0786  AK129215  3995  891  3  |  mKIAA1139  AK129297  4928  1224  11
mKIAA0790  AK129216  6475  1078  10  |  mKIAA1142  AK129298  2861  597  7
mKIAA0791  AK129217  6136  1423  15  |  mKIAA1152  AK129299  4311  500  12
mKIAA0794  AK129218  2924  367  16  |  mKIAA1153  AK129300  2788  556  2
mKIAA0797  AK129219  3853  1174  9  |  mKIAA1162  AK129301  4913  354  2
mKIAA0801  AK129220  5180  1042  13  |  mKIAA1166  AK129302  2075  243  X
mKIAA0805  AK129221  6217  1751  12  |  mKIAA1169  AK129303  3755  590  5
mKIAA0810  BC047928*  4021  920  5  |  mKIAA1171  AK129304  4178  586  17
mKIAA0812  AK129222  5085  1147  9  |  mKIAA1180  AK129305  5798  490  8
mKIAA0814  AK129223  977  116  11  |  mKIAA1185  AK129306  3204  521  4
mKIAA0816  AK129224  5802  1167  2  |  mKIAA1194  AK129307  3448  376  11
mKIAA0829  AK129225  4336  1332  10  |  mKIAA1196  AK129308  5340  883  2
mKIAA0834  AK129226  4732  453  5  |  mKIAA1204  AK129309  6147  1020  16
mKIAA0840  AK129227  4567  523  15  |  mKIAA1212  AK129310  5282  727  11
mKIAA0841  AK129228  3457  549  7  |  mKIAA1215  AK129311  3835  575  1
mKIAA0850  AK129229  3421  644  1  |  mKIAA1216  AK129312  4789  1385  10
mKIAA0857  AK129230  6119  1353  6  |  mKIAA1227  AK129313  5449  1074  16
mKIAA0858  AK129231  5774  1625  14  |  mKIAA1235  AK129314  1455  426  17
mKIAA0862  BC049775*  2588  584  19  |  mKIAA1236  AK129315  5042  1309  12
mKIAA0868  AK129232  6957  1399  6  |  mKIAA1247  AK129316  3925  948  2
mKIAA0871  BC049127*  4013  471  5  |  mKIAA1253  BC017148*  2767  461  10
mKIAA0876  AK129233  4360  1027  17  |  mKIAA1254  AK032645*  1227  219  9
mKIAA0878  AK129234  5018  633  13  |  mKIAA1259  AK129317  4521  1196  2
mKIAA0881  AK129235  4091  1070  7  |  mKIAA1266  AK129318  1658  522  17
mKIAA0884  AK129236  3472  1060  12  |  mKIAA1269  AK129319  4298  409  2
mKIAA0886  AY102283*  1406  172  11  |  mKIAA1277  AK081176*  3416  1074  6
mKIAA0904  AK129237  4608  1051  11  |  mKIAA1279  AK129320  2371  638  10
mKIAA0917  AK129238  6841  164  12  |  mKIAA1281  AK129321  2524  196  18
mKIAA0921  AK129239  5118  1522  19  |  mKIAA1286  AK129322  4524  649  17
mKIAA0922  AK129240  4177  1346  3  |  mKIAA1287  AK129323  4481  1202  11
mKIAA0928  AK129241  4724  1475  12  |  mKIAA1288  AK129324  6127  405  8
mKIAA0943  AK129242  4185  266  1  |  mKIAA1293  BC048497*  1187  376  X
mKIAA0945  AK129243  4577  806  2  |  mKIAA1301  AK129325  6415  1455  1
mKIAA0947  AK129244  6103  1732  13  |  mKIAA1303  AK129326  3047  373  11
mKIAA0948  AK129245  2267  655  11  |  mKIAA1306  AK129327  5498  1347  17
mKIAA0950  AK129246  2498  345  15  |  mKIAA1308  AK129328  3590  924  2
mKIAA0953  AK129247  5862  657  12  |  mKIAA1310  AK129329  4527  828  1
mKIAA0956  AK129248  4854  1210  3  |  mKIAA1311  AK129330  5164  735  18
mKIAA0966  AK129249  4530  1169  7  |  mKIAA1315  AF132726*  1919  449  4
mKIAA0970  AK129250  4089  622  14  |  mKIAA1323  AK129331  1349  143  18
mKIAA0971  AK129251  3470  601  1  |  mKIAA1333  AK129332  5119  731  12
mKIAA0974  AK129252  1828  369  14  |  mKIAA1334  AK129333  4750  992  15
mKIAA0980  AK129253  4665  1292  2  |  mKIAA1338  AK129334  4560  1397  2
mKIAA0982  AK129254  4412  504  13  |  mKIAA1340  AK129335  5481  282  6
mKIAA0989  BC035201*  4180  807  9  |  mKIAA1341  AK129336  2831  506  2
mKIAA0990  AK129255  4167  821  7  |  mKIAA1350  BC022221*  3750  613  3
mKIAA0994  D16580*  3825  389  4  |  mKIAA1352  AK129337  3805  1210  18
mKIAA0996  AK129256  4580  804  14  |  mKIAA1354  AK129338  4127  679  4
mKIAA0999  AK129257  5289  1052  9  |  mKIAA1355  AF317839*  4087  1189  1
mKIAA1007  AK129258  5409  1458  14  |  mKIAA1367  BC013628*  4815  809  12
mKIAA1008  AK129259  4047  428  14  |  mKIAA1374  AK129339  2608  446  3
mKIAA1009  AK129260  1863  620  9  |  mKIAA1375  AK129340  4410  909  9
mKIAA1010  AK129261  3901  836  19  |  mKIAA1377  AK129341  3440  1004  9
mKIAA1013  AK129262  4283  989  6  |  mKIAA1379  BC014698*  1305  391  ND
mKIAA1014  AK129263  5015  1052  2  |  mKIAA1382  AK129342  4590  571  15
mKIAA1016  AK129264  4695  777  14  |  mKIAA1386  AK129343  6429  1087  5
mKIAA1017  AK129265  4726  1060  7  |  mKIAA1387  AK129344  3351  865  11
mKIAA1020  AK129266  4490  642  16  |  mKIAA1390  BC039762*  1571  378  3
mKIAA1023  AK029333*  2962  257  5  |  mKIAA1391  AK129345  5854  1063  9
mKIAA1029  AK129267  4364  830  18  |  mKIAA1392  AK129346  4573  876  8
mKIAA1031  AK129268  4732  618  15  |  mKIAA1394  AK129347  3762  851  19


Table 1. Continued.

Columns (repeated for the left and right entry of each row, separated by "|"): Gene name; Accession number a); cDNA length (bp) b); CDS length (a.a.) c); Chromosomal number d)

mKIAA1395  AK129348  6484  1906  9  |  mKIAA1662  AK129417  3978  582  15
mKIAA1398  AK129349  2911  804  2  |  mKIAA1667  AK129418  2721  671  5
mKIAA1401  AK129350  3337  800  11  |  mKIAA1668  AK129419  6445  883  15
mKIAA1404  BC037702*  2347  472  2  |  mKIAA1669  AK129420  4709  304  15
mKIAA1406  AK129351  2660  838  2  |  mKIAA1676  AK129421  4361  621  10
mKIAA1412  AK129352  3545  969  19  |  mKIAA1684  AK129422  5080  937  11
mKIAA1417  AK129353  4527  1167  9  |  mKIAA1686  AK129423  4875  1205  6
mKIAA1418  AK129354  4600  857  16  |  mKIAA1696  AK129424  3969  660  2
mKIAA1422  AK129355  4161  1065  2  |  mKIAA1698  BC039214*  3207  1019  19
mKIAA1425  AF438610*  1513  485  4  |  mKIAA1705  AK129425  4430  562  14
mKIAA1426  AK129356  5709  712  17  |  mKIAA1708  AK129426  5046  1291  15
mKIAA1427  AK129357  6118  429  2  |  mKIAA1709  AK129427  4076  392  2
mKIAA1430  AK129358  4575  566  8  |  mKIAA1716  BC025556*  1786  277  4
mKIAA1433  AK129359  4834  655  3  |  mKIAA1717  AK129428  4996  384  3
mKIAA1437  BC048152*  4314  811  2  |  mKIAA1718  AK129429  4829  433  6
mKIAA1441  AK129360  4930  557  3  |  mKIAA1721  AK129430  4723  1149  14
mKIAA1443  AK129361  4611  564  14  |  mKIAA1734  AK129431  5192  1299  11
mKIAA1445  AK129362  2878  632  16  |  mKIAA1735  AK129432  4896  484  9
mKIAA1453  AK129363  3006  974  11  |  mKIAA1736  AK129433  4556  1175  2
mKIAA1457  AK129364  5866  1364  5  |  mKIAA1738  AK129434  4933  1134  11
mKIAA1458  AK129365  4315  642  5  |  mKIAA1740  AK129435  4322  978  6
mKIAA1460  AK129366  4311  265  7  |  mKIAA1741  AK129436  4436  1324  2
mKIAA1465  AK129367  3794  785  9  |  mKIAA1752  AK129437  4705  487  8
mKIAA1468  AK129368  5296  950  1  |  mKIAA1753  AK129438  3719  826  11
mKIAA1469  AK129369  3334  663  2  |  mKIAA1760  AK129439  4367  1332  13
mKIAA1470  AK129370  3469  484  4  |  mKIAA1785  AK129440  3583  617  18
mKIAA1478  BC010715*  2827  644  15  |  mKIAA1790  AK129441  6864  1898  7
mKIAA1481  AK129371  5769  1726  5  |  mKIAA1795  AK129442  4564  531  19
mKIAA1486  AK129372  6502  492  1  |  mKIAA1798  AK129443  3937  761  10
mKIAA1488  AK129373  1295  425  3  |  mKIAA1802  AK129444  2649  804  8
mKIAA1489  AK129374  3778  627  6  |  mKIAA1807  AK129445  4719  850  3
mKIAA1491  BC005482*  2018  393  4  |  mKIAA1813  AK129446  2649  675  19
mKIAA1499  AK129375  3663  662  11  |  mKIAA1815  AK129447  5011  895  19
mKIAA1504  AK129376  3286  480  7  |  mKIAA1819  AK129448  2265  443  9
mKIAA1506  AK129377  6432  1520  5  |  mKIAA1820  AK129449  5703  1518  11
mKIAA1507  AK129378  3605  896  11  |  mKIAA1822  AK129450  3292  636  12
mKIAA1514  AK129379  8188  1994  11  |  mKIAA1830  AK129451  4493  477  18
mKIAA1515  AK129380  2946  660  9  |  mKIAA1835  AK129452  2877  646  15
mKIAA1521  AK129381  5543  1275  2  |  mKIAA1840  AK129453  4830  1580  2
mKIAA1522  AK129382  5025  1048  4  |  mKIAA1841  AK129454  4107  719  11
mKIAA1523  AK129383  3685  686  11  |  mKIAA1845  BC010969*  1492  362  1
mKIAA1530  AK129384  6423  590  5  |  mKIAA1848  AK129455  3186  369  2
mKIAA1531  AF378759*  5035  1214  8  |  mKIAA1851  AK129456  4811  703  9
mKIAA1534  AK129385  2750  475  7  |  mKIAA1853  AK129457  6567  372  5
mKIAA1541  AK129386  2327  509  14  |  mKIAA1861  AK129458  3504  970  6
mKIAA1542  AK129387  4150  1068  7  |  mKIAA1863  BC043126*  4039  979  11
mKIAA1545  AK129388  4592  483  5  |  mKIAA1869  AK129459  3062  774  3
mKIAA1546  AK129389  6365  614  3  |  mKIAA1873  AK129460  2949  859  6
mKIAA1549  AK129390  4433  817  6  |  mKIAA1876  AK129461  1335  348  2
mKIAA1552  AK129391  2503  485  17  |  mKIAA1877  AK129462  4842  219  17
mKIAA1558  AK129392  5111  502  19  |  mKIAA1888  AK129463  4921  609  11
mKIAA1564  AK129393  3696  836  14  |  mKIAA1891  AK129464  4644  1086  8
mKIAA1565  AK129394  6708  2081  12  |  mKIAA1924  AK129465  5946  1782  17
mKIAA1567  AK129395  2789  703  X  |  mKIAA1940  AK129466  5084  865  6
mKIAA1568  AK129396  5898  792  16  |  mKIAA1948  AK129467  3516  463  2
mKIAA1573  AK129397  3972  813  4  |  mKIAA1954  AK032286*  2574  643  8
mKIAA1575  AK129398  6034  1724  X  |  mKIAA1968  AK129468  5948  1100  4
mKIAA1581  AK083135*  2734  347  15  |  mKIAA1970  AK129469  4901  337  7
mKIAA1584  AK129399  4001  746  X  |  mKIAA1974  AK129470  1284  149  2
mKIAA1589  AK129400  3929  774  12  |  mKIAA1977  AK129471  3400  574  16
mKIAA1590  AK129401  3290  585  2  |  mKIAA1990  AK129472  4938  885  8
mKIAA1593  BC043082*  3966  1058  10  |  mKIAA2014  AK129473  4326  1045  15
mKIAA1595  AK129402  2090  615  5  |  mKIAA2016  AK129474  6045  1731  17
mKIAA1601  AK129403  4501  856  14  |  mKIAA3007  AK129475  3743  734  3
mKIAA1604  AK129404  4555  903  X  |  mKIAA3011  AK129476  2559  496  6
mKIAA1610  AK129405  4757  554  4  |  mKIAA3012  AK129477  2819  253  11
mKIAA1614  AK129406  3797  909  1  |  mKIAA3013  AK129478  5592  1461  15
mKIAA1624  AK129407  3802  530  2  |  mKIAA3014  AK129479  4953  1304  5
mKIAA1626  AK129408  4297  1241  4  |  mKIAA3015  AK129480  5339  858  7
mKIAA1627  AK129409  5702  389  3  |  mKIAA3016  AK129481  2659  714  10
mKIAA1629  AK129410  5585  675  18  |  mKIAA3017  AK129482  2473  710  10
mKIAA1633  AK129411  5603  1855  4  |  mKIAA3018  AK129483  6526  732  5
mKIAA1635  AK129412  4864  399  3  |  mKIAA3019  AK129484  4903  285  3
mKIAA1638  AK129413  4767  903  5  |  mKIAA3020  AK129485  6083  1884  15
mKIAA1643  AK129414  3980  950  5  |  mKIAA3021  AK129486  5945  756  9
mKIAA1644  AK129415  5309  253  15  |  mKIAA3023  AK129487  4506  1305  11
mKIAA1646  AK129416  3661  409  15

a) Accession numbers of DDBJ, EMBL, and GenBank databases. The accession numbers with an asterisk are those deposited by other groups, because these cDNAs encoded proteins identical to those reported previously.
b) Values excluding poly(A) sequences.
c) Deduced amino acid length of the CDS in each cDNA. CDSs were identified according to the result of GeneMark analysis. When multiple CDSs were predicted, the summed length of the CDSs of more than 50 amino acid residues is shown.
d) Chromosome numbers were determined from the results of BLAT2 search of cDNA clones against the mouse draft sequence (ftp://ftp.ensembl.org/pub/mouse-7.3a/data/golden path/).11


[Figure 1 (first page): schematic structure maps of mKIAA cDNAs with a single predicted CDS, one row per clone from mKIAA0007 to mKIAA0590, arranged in three panels; horizontal axis: cDNA length, 0–9 kb.]

Figure 1. Schematic representation of structures of mKIAA cDNAs which had a single predicted CDS. A shotgun method was applied to determine the entire sequences of cDNA clones according to the method previously reported.12 The mKIAA gene numbers corresponding to respective cDNAs are given on the left. The horizontal scales represent the cDNA length in kb. The predicted CDSs and untranslated regions are shown by dark blue and open boxes, respectively. The positions of the first ATG codons with or without the contexts of Kozak’s rule are illustrated by solid and open triangles, respectively.34 SINEs and other repetitive sequences are displayed by dotted and hatched boxes, respectively. Although the corresponding human KIAA cDNAs are not shown in this figure, a canonical polyadenylation signal sequence (AATAAA) and its single-nucleotide variants are represented by red lines when they existed in the same or close (within 5 bases) positions on the aligned sequences between mKIAA and KIAA cDNAs and at least one of them was in the 35-bp upstream region of the 3′-extreme end. If either the canonical polyadenylation signal sequence or one of its single-nucleotide variants is found upstream from the possible polyadenylation signal sequence thus detected by the alignment of mKIAA and KIAA cDNA sequences, these upstream latent polyadenylation signals are represented by orange lines as an indication that these hexamer sequences were likely to be used for alternative polyadenylation. On the other hand, the polyadenylation signal hexamer sequences found at the 3′-end are represented by green lines when they were located differentially (their positions were more than 5 bases apart) on the aligned sequences of mKIAA and KIAA cDNAs or when they were present in only one of the mKIAA and KIAA cDNAs. The sequence alignments of mouse and human KIAA cDNAs represented in Fig. 1 are available through our web site (http://www.kazusa.or.jp/rouge/).
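The hexamer search described in this legend can be illustrated with a short Python sketch. It is not the authors' program: the names `single_nt_variants`, `SIGNALS`, and `find_signals` are hypothetical, the input is assumed to be a cDNA sequence with the poly(A) tail already removed, and "in the 35-bp upstream region of the 3′-extreme end" is read here as "the hexamer lies entirely within the last 35 bases". The cross-species step (comparing hexamer positions on the mouse and human alignment) is not covered.

```python
CANONICAL = "AATAAA"


def single_nt_variants(hexamer):
    """Return all sequences that differ from `hexamer` at exactly one position."""
    variants = set()
    for i, ref in enumerate(hexamer):
        for base in "ACGT":
            if base != ref:
                variants.add(hexamer[:i] + base + hexamer[i + 1:])
    return variants


# The canonical signal plus its 18 single-nucleotide variants.
SIGNALS = {CANONICAL} | single_nt_variants(CANONICAL)


def find_signals(cdna, window=35):
    """Scan a cDNA (poly(A) assumed removed) for candidate signal hexamers.

    Returns a list of (start, hexamer, near_end) tuples, where `start` is a
    0-based position and `near_end` is True when the hexamer lies within the
    last `window` bases of the sequence.
    """
    seq = cdna.upper()
    hits = []
    for start in range(len(seq) - 5):
        hexamer = seq[start:start + 6]
        if hexamer in SIGNALS:
            near_end = start >= len(seq) - window
            hits.append((start, hexamer, near_end))
    return hits
```

In this reading, a hit flagged `near_end` would be a candidate for the red or green lines, and an upstream hit would be a candidate for the orange lines, once the mouse and human positions have been compared on the alignment.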


[Figure 1, continued: schematic structures of mKIAA0593 to mKIAA1341, one bar per cDNA; horizontal scale 0 to 9 kb.]

the observed CDS interruptions;12,14 when a CDS split was predicted only in the mouse cDNA clone and the corresponding region was assigned to encode a continuous single CDS in the human cDNA, we considered that the predicted CDS interruption in the mouse cDNA clone was most likely spurious. Based on this assumption, we found 63 mKIAA cDNAs to contain spurious CDS interruption(s), and their spurious CDS interruptions were classified into the following 4 categories: 1) 8 mKIAA cDNA clones (mKIAA0533, mKIAA0687, mKIAA0697, mKIAA0770, mKIAA0816, mKIAA0921, mKIAA1468, and mKIAA1521) appeared to contain a nonsense mutation; 2) 12 mKIAA cDNA clones appeared to contain frame-shift errors (mKIAA0342, mKIAA0543, mKIAA0597, mKIAA0625, mKIAA1085, mKIAA1101, mKIAA1259, mKIAA1395, mKIAA1543, mKIAA1573, mKIAA1589, and mKIAA1696); 3) 40 mKIAA cDNA clones retained intron(s); and 4) 3 mKIAA cDNA clones (mKIAA0052, mKIAA0233, and mKIAA1017) carried both retained intron(s) and frame-shift errors (Fig. 2). These four categories of CDS interruptions in the cDNAs are shown in Fig. 2 using the symbols *, $, #, and


[Figure 1, continued: schematic structures of mKIAA1350 to mKIAA3023, one bar per cDNA; horizontal scale 0 to 9 kb.]

$/# for categories 1 to 4, respectively. Most of the frame-shifts were caused by a one- or two-nucleotide insertion/deletion, which is frequently found in regions with homopolymeric runs and is most likely due to errors in reverse transcription.20
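As a small illustration of the four categories described in the text, the following Python sketch checks that the per-category counts add up to the 63 clones reported and shows one way to locate homopolymeric runs, where the one- or two-nucleotide insertions/deletions tend to occur. The function name `homopolymer_runs` and the minimum run length of 4 are assumptions made for the example; the text does not state a threshold.

```python
from itertools import groupby

# Fig. 2 symbol -> (description, number of mKIAA clones), as stated in the text.
SPURIOUS_CATEGORIES = {
    "*": ("nonsense mutation", 8),
    "$": ("frame-shift error", 12),
    "#": ("retained intron", 40),
    "$/#": ("retained intron and frame-shift error", 3),
}


def total_spurious(categories):
    """Sum the clone counts over all categories."""
    return sum(count for _description, count in categories.values())


def homopolymer_runs(seq, min_len=4):
    """Return (start, base, length) for each run of one base of at least min_len.

    `min_len=4` is an illustrative threshold, not a value taken from the paper.
    """
    runs = []
    position = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((position, base, length))
        position += length
    return runs
```

A frame-shift candidate that falls inside one of the reported runs would be consistent with the reverse-transcription error mechanism mentioned above, although position alone does not prove it.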

In contrast, when a CDS split was predicted in a mouse cDNA clone and the human regions corresponding to its adjacent regions were predicted to form a longer continuous CDS with an additional CDS region, we assumed that the predicted CDS interruption in the mouse cDNA clone was caused by alternative splicing or occasionally by a splicing error. Twelve mKIAA cDNA clones (mKIAA0017, mKIAA0163, mKIAA0364, mKIAA0542, mKIAA0543, mKIAA0645, mKIAA0686, mKIAA0692, mKIAA0765, mKIAA1395, mKIAA1530, and mKIAA1590) were assumed to be isoforms caused by alternative splicing or to be products from aberrantly spliced mRNAs. This class of CDS interruptions in the cDNAs is indicated in Fig. 2 using “+”. Among these cDNAs, mKIAA0765, which was annotated as RNA binding motif protein 12, was assumed to be an isoform caused by splicing out of a relatively large genomic region (approximately 17 kb). Interestingly, the 3′-downstream region contained the sequence of a differently annotated gene (Copine I).21 Since a human cDNA (NCBI RefSeq ID no. NM_152838) encoded the same alternatively spliced form as our cDNA sequence of mKIAA0765, this mKIAA0765-type alternative splice form is unlikely to be caused by splicing errors and is probably used in a particular tissue or at a particular developmental stage.

[Figure 2 panels: paired mouse (M) and human (H) cDNA structures; horizontal scale -9 to 8 kb. mKIAA numbers shown, each followed by the symbol printed beside the mouse clone, if any: 0017 +; 0052 $/#; 0094 #; 0095 #; 0131 #; 0156 #; 0163 +; 0196 #; 0233 $/#; 0342 $; 0364 +; 0415 #/+; 0450; 0467 #; 0540 #; 0542 #/+; 0543 $/+; 0581 #; 0597 $; 0617 #; 0625 $; 0645 +; 0683 #; 0686 #/+; 0687 *; 0692 +; 0697 *; 0710 #; 0724 #; 0743 #; 0765 +; 0767 #; 0770 *; 0801 #; 0816 *; 0841 #; 0921 *; 0971 #; 1007 #; 1008 #; 1014 #; 1017 $/#; 1031 #; 1077; 1085 $; 1087; 1090; 1101 $; 1169 #; 1180; 1212; 1215; 1247; 1259 $; 1269 #; 1288 #; 1303; 1338 #; 1394 #; 1395 $/+; 1427 #; 1437; 1445; 1468 *; 1504; 1521 *; 1523 #; 1530 +; 1534 $; 1542 #; 1545 #; 1564 #; 1573 $; 1589 $; 1590 +; 1604 #; 1638 #; 1644 *; 1662 #; 1668; 1676 #; 1696 $; 1709 #; 1848 #; 1851 #; 1968; 1977 #.]

Figure 2. Schematic comparison of structures of mKIAA and KIAA cDNAs either of which had multiple predicted CDSs. Only CDSs encoding more than 50 amino acid residues predicted by GeneMark analysis are indicated. In particular, when multiple predicted CDSs overlap with each other in single cDNA sequences, only the shorter CDSs consisting of more than 50 amino acid residues are shown in this figure. The longest CDSs, the shorter CDSs and untranslated regions are shown by dark blue, light blue and open boxes, respectively. The mKIAA numbers highlighted by an asterisk (*), dollar sign ($), sharp (#) and plus sign (+) represent CDS interruptions that were supposed to result from a nonsense mutation, frame-shift error(s), retained intron(s), and alternative splicing or splicing error(s), respectively, as described in the text. The positions of the first ATG codon with or without the contexts of Kozak’s rule in the longest CDSs are illustrated by solid and open triangles, respectively. The start and end points of the aligned region(s) between mKIAA and KIAA CDSs are tied with thin lines on the basis of results of FASTA search of amino acid sequences predicted from corresponding CDSs.35 A canonical polyadenylation signal sequence (AATAAA) and its single-nucleotide variants are represented by red lines when they existed in the same or close (within 5 bases) positions on the aligned sequences between mKIAA and KIAA cDNAs and at least one of them was in the 35-bp upstream region of the 3′-extreme end. If either the canonical polyadenylation signal sequence or its single-nucleotide variants are found upstream from the possible polyadenylation signal sequence thus detected by the alignment of mKIAA and KIAA cDNA sequences, these upstream latent polyadenylation signals are represented by orange lines as an indication that these hexamer sequences were likely to be for alternative polyadenylation. On the other hand, the polyadenylation signal hexamer sequences found at the 3′-end are represented by green lines when they were located differentially (their positions differed by more than 5 bases) on the aligned sequences of mKIAA and KIAA cDNAs or they were present only in one of the mKIAA and KIAA cDNAs.

In previous reports, we discussed the presence and authenticity of a second short CDS in a single cDNA clone.12,14 In this study, we found such cryptic CDSs in 20 mKIAA cDNAs and evaluated their authenticity. Unfortunately, we could not evaluate 11 cryptic CDSs (in mKIAA0450, mKIAA1087, mKIAA1090, mKIAA1180, mKIAA1212, mKIAA1215, mKIAA1247, mKIAA1644, mKIAA1668, mKIAA3005, and mKIAA3006), because either no corresponding human cDNA sequences were present or no verified corresponding human cDNA sequences were available for the sequence comparison. According to the criteria in the previous reports,12,14 we compared 9 cryptic CDSs and judged that 8 cryptic CDSs (in mKIAA0163, mKIAA1259, mKIAA1303, mKIAA1437, mKIAA1455, mKIAA1504, mKIAA1534, and mKIAA1696) were falsely predicted.


[Figure 3 panels (horizontal scale -7 to 9 kb): mKIAA0297 (M) with KIAA0297 (H) and KIAA0329 (H); mKIAA0533 (M)* with KIAA0533 (H) and KIAA1907 (H); mKIAA0570 (M) with KIAA0570 (H)& and KIAA0729 (H); mKIAA0805 (M) with KIAA0805 (H)& and KIAA0995 (H).]

Figure 3. mKIAA cDNAs which bridge two human cDNAs with different KIAA numbers. The structures of mKIAA cDNAs corresponding to two different human KIAA cDNAs are shown together with those of the human KIAA cDNAs, which are illustrated in the manner described in the legend of Fig. 1. The KIAA and the mKIAA numbers are given on the left side of the schematic illustrations of cDNAs. The mKIAA number highlighted by an asterisk (*) was believed to contain a nonsense mutation or a frame-shift error. The KIAA numbers highlighted by an ampersand (&) represent revised clones that have longer sequences than the original clones, obtained by isolating additional cDNA clones and/or by connecting parts of additional cDNAs or PCR products of the missing portion to the original cDNA clone.36

As for the remaining cryptic CDS (in mKIAA1077), although an alternative splice form in which this cryptic CDS is actively used has not yet been reported, human KIAA1077 cDNA also carried such a cryptic CDS. Thus, future experimental validation might reveal that the predicted cryptic CDS in KIAA1077 is used in a particular tissue/cell or at a particular developmental stage.

As for mKIAA0170 and mKIAA1968 cDNAs, the sequence identities of the encoded proteins were only 32% and 49% to the corresponding human KIAAs, respectively. Although these identities fell below or near the lower end of the range of generally accepted sequence identities between mouse and human orthologous proteins (ranging from 41% to 100%21), mKIAA0170 and mKIAA1968 cDNA clones were the most homologous mouse cDNAs to the corresponding human cDNA clones at present. We therefore tentatively designated these cDNA clones as derived from the mKIAA0170 and mKIAA1968 genes, even though we cannot completely exclude the possibility that their mouse counterparts do not exist. Further accumulation of sequence data will verify the authenticity of these nomenclatures. Although an additional short CDS was found in mKIAA1968, we did not treat this ORF as a cryptic CDS because of the low sequence identity.

As for mKIAA0728, the encoded protein had low (29%) sequence similarity to the corresponding human KIAA0728. As human KIAA0728 is identical to the 3′-portion of a single very long bullous pemphigoid antigen 1 (BPAG1) gene and mKIAA0728 is homologous to the 5′-portion of the same gene, these portions of the BPAG1 gene hardly overlap each other. Various isoforms caused by alternative splicing of the BPAG1 gene have also been reported,23,24 thus mouse and human KIAA0728 might be different alternative splicing forms of the BPAG1 gene.

2. Chromosomal loci of mKIAA Genes

The currently available draft sequence of the mouse genome (ftp://ftp.ensembl.org/pub/mouse-7.3a/data/golden_path/)13 enabled us to predict the genomic structures of mouse genes by comparing the cDNA sequences with their corresponding genomic sequence.13,14 To assign the chromosomal localization of genes generating the 513 cDNAs identified in this study, the cDNA sequences were subjected to BLAST search against the mouse genome draft sequences, and the genome sequences that satisfied either of the following two conditions were selected: E-value = 0.0 and sequence identity of 90% or greater; or E-value ≤ 1e−10 and sequence identity of 99% or greater. They were then aligned with the cDNA sequences by SIM4.25 As shown in Table 1, we could successfully map 510 mouse KIAA cDNAs on the genome under these conditions, while three cDNAs (mKIAA0159, mKIAA1079, and mKIAA1379) could not be mapped. The results from the sequence analysis of mouse KIAA cDNAs, including the predicted genomic structures, are summarized in the ROUGE database and are available through the World Wide Web (http://www.kazusa.or.jp/rouge/).
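The two selection conditions above can be written as a simple predicate. The following Python sketch is only an illustration under stated assumptions: the `Hit` record and its field names are hypothetical, and the E-value and percent identity are assumed to have been parsed from the BLAST output beforehand. It is not the authors' pipeline, and the subsequent SIM4 alignment step is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One BLAST hit of a cDNA against a genomic sequence (hypothetical record)."""
    subject: str      # identifier of the genomic sequence
    evalue: float     # BLAST E-value
    identity: float   # sequence identity in percent


def passes_filter(hit):
    """Apply the two alternative conditions stated in the text."""
    # Condition 1: E-value of exactly 0.0 and identity of at least 90%.
    strong_hit = hit.evalue == 0.0 and hit.identity >= 90.0
    # Condition 2: E-value of at most 1e-10 and identity of at least 99%.
    near_exact_hit = hit.evalue <= 1e-10 and hit.identity >= 99.0
    return strong_hit or near_exact_hit


def select_hits(hits):
    """Keep only the hits that would be passed on to the alignment step."""
    return [hit for hit in hits if passes_filter(hit)]
```

Note that the second condition admits short but nearly exact matches, while the first tolerates more divergence only for hits with an E-value of zero.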

3. mKIAA cDNAs Which Bridge Two Human cDNAs with Different KIAA Numbers

In our previous study, multiple KIAA genes supposed to be independent were found to be portions of a very large single gene. In this study, we also identified four mouse cDNAs, each of which seemed to correspond to two different human KIAA cDNAs. Their cDNA and genomic structures were compared with those of the corresponding multiple human KIAA genes (data not shown). To assign the genome structures, the cDNA sequences were subjected to BLAST search against the mouse and human genome sequences


Table 2. Predicted function of novel genes based on homology search.a)

a) Homology search was performed by the Smith-Waterman algorithm, using BioView Toolkit and GeneMatcher (revision 3.3, Paracel Inc. USA) against the non-redundant amino acid sequence database, nr, that has been constructed by NCBI (http://www.ncbi.nlm.nih.gov/blast/db/nr.z). The homologous protein with the highest score was listed when it satisfied the following conditions: i) the protein was annotated, ii) the aligned region exceeded 200 amino acid residues, and iii) percent identity in the aligned region was 30% or greater. mKIAA3019 was not listed, since no annotated protein homologous to mKIAA3019 existed in the database.
b) The values mean the ratio of the length of the aligned region to the original length of the query sequence, in percentage.

(ftp://ftp.ncbi.nih.gov/genomes/H_sapiens). The genomic structures of the cDNAs were assigned by SIM4 under the same conditions described above.25

Four mKIAA cDNAs were found to be homologous to the following sets of human KIAA cDNAs: KIAA0297 and KIAA0329; KIAA0533 and KIAA1907; KIAA0570 and KIAA0729; and KIAA0805 and KIAA0995. Each cDNA set had a single genomic locus (Chromosome 14 [gi29824585] for KIAA0297/KIAA0329, Chromosome 20 [gi29824591] for KIAA0533/KIAA1907, Chromosome 2 [gi29824573] for KIAA0570/KIAA0729, and Chromosome 14 [gi29824585] for KIAA0805/KIAA0995), which was consistent with the merger of these pairs of human KIAA genes into single genes. As a matter of convenience, mouse cDNA clones homologous to respective sets of human KIAA cDNAs were designated as mKIAA plus the earlier number of human KIAA genes.
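The naming convention in the last sentence can be sketched in Python as follows. The function name `mouse_designation` is hypothetical, and "the earlier number" is read here as the numerically smaller KIAA number, which matches the four mKIAA names shown in Fig. 3.

```python
def mouse_designation(human_names):
    """Return the mKIAA name for a set of human KIAA names.

    Assumes each name has the form 'KIAA' followed by a four-digit number and
    that 'the earlier number' means the numerically smallest one.
    """
    prefix = "KIAA"
    numbers = []
    for name in human_names:
        if not name.startswith(prefix):
            raise ValueError(f"not a KIAA name: {name!r}")
        numbers.append(int(name[len(prefix):]))
    return f"mKIAA{min(numbers):04d}"


# The four pairs of human KIAA cDNAs listed in the text.
HUMAN_PAIRS = [
    ("KIAA0297", "KIAA0329"),
    ("KIAA0533", "KIAA1907"),
    ("KIAA0570", "KIAA0729"),
    ("KIAA0805", "KIAA0995"),
]

MOUSE_NAMES = [mouse_designation(pair) for pair in HUMAN_PAIRS]
```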

4. Incidental Identification of Novel cDNA Clones Which are not KIAA-orthologous cDNAs

In the beginning of this article, we indicated that 13 cDNA clones were eventually found not to be orthologous to any KIAA cDNAs. Most of the cDNAs (12 clones) encoded proteins similar to the corresponding human KIAA proteins to some extent, whereas the mKIAA3015 protein showed no significant sequence similarity to any human KIAA proteins. The possible functions of the gene products predicted by similarity searches against the public non-redundant amino acid sequences are shown in Table 2.
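The listing criteria given in the footnotes to Table 2 can be sketched in Python as below. This is one reading of those footnotes, not the authors' code: the `Alignment` record and its field names are hypothetical, and the sketch takes the highest-scoring hit among those that meet all three conditions, whereas the footnote could also be read as testing only the single top-scoring hit.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Alignment:
    """One database hit for a query protein (hypothetical record)."""
    subject: str             # name of the database protein
    annotated: bool          # condition i): the protein is annotated
    aligned_length: int      # aligned region, in amino acid residues
    percent_identity: float  # identity within the aligned region
    score: float             # Smith-Waterman score


def is_listable(hit):
    """Conditions i) to iii) of the Table 2 footnote."""
    return (
        hit.annotated
        and hit.aligned_length > 200
        and hit.percent_identity >= 30.0
    )


def best_listable(hits: Sequence[Alignment]) -> Optional[Alignment]:
    """Return the highest-scoring hit that meets all conditions, or None."""
    eligible = [hit for hit in hits if is_listable(hit)]
    return max(eligible, key=lambda hit: hit.score) if eligible else None


def coverage_percent(aligned_length, query_length):
    """Footnote b): aligned length as a percentage of the query length."""
    return 100.0 * aligned_length / query_length
```

Under this reading, a query such as mKIAA3019 would return `None` because no annotated homologue meets the conditions.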

Six genes had significant sequence similarity to functionally annotated proteins and were defined as their homologues. Among them, four genes (mKIAA3012, mKIAA3017, mKIAA3020, and mKIAA3023) were assigned as homologues of disease-related genes. mKIAA3012 is a mouse homologue of human RAB1A, which regulates vesicle transport from the endoplasmic reticulum to the Golgi apparatus, and overexpression of the RAB1A gene caused cardiomyopathy in mice.26 mKIAA3017 is a mouse homologue of the human BCR (breakpoint cluster region) gene. The fusion of 5′ parts of the BCR gene to the ABL (tyrosine kinase) gene, known as the Philadelphia (Ph) translocation, produces a Bcr-Abl oncoprotein and causes chronic myelogenous leukemia (CML).27 mKIAA3020 is a mouse homologue of the human pseudophosphatase SBF1 gene, and its deficiency in male mice causes infertility, impaired spermatogenesis, and azoospermia.28 The mKIAA3023 gene is a mouse homologue of the human HER-2/neu gene and encodes a receptor tyrosine kinase. Amplification and overexpression of the HER-2/neu gene are known to contribute to the occurrence of certain adenocarcinomas.29

In addition to the above mKIAAs, two genes (mKIAA3015 and mKIAA3018) were identified as homologues of genes whose functions were physiologically or biochemically defined. mKIAA3015 is a mouse homologue of the unc-53 gene of Caenorhabditis elegans, which is involved in longitudinal neuronal navigation.30 mKIAA3018 is a mouse homologue of the human AFAP gene, which has been defined as an actin filament-associated adaptor protein and modulates changes in actin filament integrity in response to cellular signals.31 Thus, the sequence information of these 6 mKIAA genes will be very useful to examine their relevance to diseases and the phenotype of animal models such as knock-out models.

Other newly identified mKIAA genes were not homologous to genes which encode functionally annotated proteins. Among them, one mKIAA (mKIAA3011) showed relatively high similarity (79%) to Zinc finger protein 212, whose function is predictable. Zinc finger protein 212 is a human KRAB domain-containing C2H2-type zinc finger protein.32 The C2H2-type zinc finger motif defines a large superfamily of specific DNA- and RNA-binding proteins, and these proteins have been demonstrated to carry important regulatory functions in embryogenesis.33 Therefore, the mKIAA3011 protein may also behave like other C2H2-type zinc finger proteins.

Lastly, it should be noted that the mKIAA3019 protein does not have any similarity either to the previously identified proteins, including model reference sequences predicted by automated computational analysis using the NCBI gene prediction method (http://www.ncbi.nlm.nih.gov/RefSeq/), or to protein motifs, and might be a mouse-specific protein. Further analyses such as Southern zoo blotting experiments are required to confirm whether or not the mKIAA3019 gene is, in fact, a mouse-specific gene.

Acknowledgements: We thank Hiroshi Kohga for establishment of a database for experimental management, and we also thank Tomomi Kato, Tomomi Tajino, Keishi Ozawa, Kazuhiro Sato, Akiko Ando, Takashi Watanabe, Kiyoe Sumi, Kyoko Watanabe, Hiroko Kinoshita, Noriko Utsumi, Nobue Kashima, and Masaki Takazawa for their technical assistance. This study was supported by the following grants: the CREATE Program (Collaboration of Regional Entities for the Advancement of Technological Excellence) from JST (Japan Science and Technology Corporation); a grant from Special Coordination Funds and a grant from Organized Research Combination System of the Ministry of Education, Culture, Sports, Science and Technology, the Japanese Government; and a grant from the Kazusa DNA Research Institute.

References

1. Stenson, P. D., Ball, E. V., Mort, M. et al. 2003, Human Gene Mutation Database (HGMD®): 2003 update, Hum. Mutat., 21, 577–581.

2. Hamosh, A., Scott, A. F., Amberger, J., Bocchini, C., Valle, D., and McKusick, V. A. 2002, Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders, Nucleic Acids Res., 30, 52–55.

3. Letovsky, S. I., Cottingham, R. W., Porter, C. J., and Li, P. W. 1998, GDB: the Human Genome Database, Nucleic Acids Res., 26, 94–99.

4. Nomura, N., Miyajima, N., Sazuka, T. et al. 1994, Prediction of the coding sequences of unidentified human genes. I. The coding sequences of 40 new genes (KIAA0001–KIAA0040) deduced by analysis of randomly sampled cDNA clones from human immature myeloid cell line KG-1, DNA Res., 1, 27–35.

5. Mushegian, A. R., Bassett, D. E., Jr., Boguski, M. S., Bork, P., and Koonin, E. V. 1997, Positionally cloned human disease genes: patterns of evolutionary conservation and functional motifs, Proc. Natl. Acad. Sci. USA, 94, 5831–5836.

6. Kraemer, D., Wozniak, R. W., Blobel, G., and Radu, A. 1994, The human CAN protein, a putative oncogene product associated with myeloid leukemogenesis, is a nuclear pore complex protein that faces the cytoplasm, Proc. Natl. Acad. Sci. USA, 91, 1519–1523.

7. Toh, K. L., Jones, C. R., He, Y. et al. 2001, An hPer2 phosphorylation site mutation in familial advanced sleep phase syndrome, Science, 291, 1040–1043.

8. Engert, J. C., Berube, P., Mercier, J. et al. 2000, ARSACS, a spastic ataxia common in northeastern Quebec, is caused by mutations in a new gene encoding an 11.5-kb ORF, Nat. Genet., 24, 120–125.

9. Bolz, H., von Brederlow, B., Ramirez, A. et al. 2001, Mutation of CDH23, encoding a new member of the cadherin gene family, causes Usher syndrome type 1D, Nat. Genet., 27, 108–112.

10. Vicart, P., Caron, A., Guicheney, P. et al. 1998, A missense mutation in the alphaB-crystallin chaperone gene causes a desmin-related myopathy, Nat. Genet., 20, 92–95.

11. Nagase, T., Kikuno, R., and Ohara, O. 2001, Prediction of the coding sequences of unidentified human genes. XXII. The complete sequences of 50 new cDNA clones which code for large proteins, DNA Res., 8, 319–327.

12. Okazaki, N., Kikuno, R., Ohara, R. et al. 2002, Prediction of the coding sequences of mouse homologues of KIAA gene: I. The complete nucleotide sequences of 100 mouse KIAA-homologous cDNAs identified by screening of terminal sequences of cDNA clones randomly sampled from size-fractionated libraries, DNA Res., 9, 179–188.

13. Waterston, R. H., Lindblad-Toh, K., Birney, E. et al. 2002, Initial sequencing and comparative analysis of the mouse genome, Nature, 420, 520–562.

14. Okazaki, N., Kikuno, R., Ohara, R. et al. 2003, Prediction of the coding sequences of mouse homologues of KIAA gene: II. The complete nucleotide sequences of 400 mouse KIAA-homologous cDNAs identified by screening of terminal sequences of cDNA clones randomly sampled from size-fractionated libraries, DNA Res., 10, 35–48.

15. Schriml, L. M., Hill, D. P., Blake, J. A. et al. 2003, Human disease genes and their cloned mouse orthologs: exploration of the FANTOM2 cDNA sequence data set, Genome Res., 13, 1496–1500.

16. Lewis, P. D., Harvey, J. S., Waters, E. M., and Parry, J. M. 2000, The mammalian gene mutation database, Mutagenesis, 15, 411–414.

17. Nicholas, F. W. 2003, Online Mendelian Inheritance in Animals (OMIA): a comparative knowledgebase of genetic disorders and other familial traits in non-laboratory animals, Nucleic Acids Res., 31, 275–277.

18. Ohara, O., Nagase, T., Mitsui, G. et al. 2002, Characterization of size-fractionated cDNA libraries generated by the in vitro recombination-assisted method, DNA Res., 9, 47–57.

19. Hirosawa, M., Nagase, T., Ishikawa, K., Kikuno, R., Nomura, N., and Ohara, O. 1999, Characterization of cDNA clones selected by the GeneMark analysis from size-fractionated cDNA libraries from human brain, DNA Res., 6, 329–336.

20. Hirosawa, M., Ishikawa, K., Nagase, T., and Ohara, O. 2000, Detection of spurious interruptions of protein-coding regions in cloned cDNA sequences by GeneMark analysis, Genome Res., 10, 1333–1341.

21. Ehringer, M. A., Thompson, J., Conroy, O. et al. 2001, High-throughput sequence identification of gene coding variants within alcohol-related QTLs, Mamm. Genome, 12, 657–663.

22. Makalowski, W. and Boguski, M. S. 1998, Evolutionary parameters of the transcribed mammalian genome: an analysis of 2,820 orthologous rodent and human sequences, Proc. Natl. Acad. Sci. USA, 95, 9407–9412.

23. Okumura, M., Yamakawa, H., Ohara, O., and Owaribe, K. 2002, Novel alternative splicings of BPAG1 (bullous pemphigoid antigen 1) including the domain structure closely related to MACF (microtubule actin cross-linking factor), J. Biol. Chem., 277, 6682–6687.

24. Leung, C. L., Zheng, M., Prater, S. M., and Liem, R. K. 2001, The BPAG1 locus: Alternative splicing produces multiple isoforms with distinct cytoskeletal linker domains, including predominant isoforms in neurons and muscles, J. Cell Biol., 154, 691–697.

25. Florea, L., Hartzell, G., Zhang, Z., Rubin, G. M., and Miller, W. 1998, A computer program for aligning a cDNA sequence with a genomic DNA sequence, Genome Res., 8, 967–974.

26. Wu, G., Yussman, M. G., Barrett, T. J. et al. 2001, Increased myocardial Rab GTPase expression: a consequence and cause of cardiomyopathy, Circ. Res., 89, 1130–1137.

27. Kurzrock, R., Shtalrid, M., Romero, P. et al. 1987, A novel c-abl protein product in Philadelphia-positive acute lymphoblastic leukaemia, Nature, 325, 631–635.

28. Firestein, R., Nagy, P. L., Daly, M., Huie, P., Conti, M., and Cleary, M. L. 2002, Male infertility, impaired spermatogenesis, and azoospermia in mice deficient for the pseudophosphatase Sbf1, J. Clin. Invest., 109, 1165–1172.

29. Peles, E. and Yarden, Y. 1993, Neu and its ligands: from an oncogene to neural factors, Bioessays, 15, 815–824.

30. Stringham, E., Pujol, N., Vandekerckhove, J., and Bogaert, T. 2002, unc-53 controls longitudinal migration in C. elegans, Development, 129, 3367–3379.

31. Baisden, J. M., Qian, Y., Zot, H. M., and Flynn, D. C. 2001, The actin filament-associated protein AFAP-110 is an adaptor protein that modulates changes in actin filament integrity, Oncogene, 20, 6435–6447.

32. Becker, K. G., Nagle, J. W., Canning, R. D., Biddison, W. E., Ozato, K., and Drew, P. D. 1995, Rapid isolation and characterization of 118 novel C2H2-type zinc finger cDNAs expressed in human brain, Hum. Mol. Genet., 4, 685–691.

33. Hollemann, T., Bellefroid, E., Stick, R., and Pieler, T. 1996, Zinc finger proteins in early Xenopus development, Int. J. Dev. Biol., 40, 291–295.

34. Nam, D. K., Lee, S., Zhou, G. et al. 2002, Oligo(dT) primer generates a high frequency of truncated cDNAs through internal poly(A) priming during reverse transcription, Proc. Natl. Acad. Sci. USA, 99, 6152–6156.

35. Brenner, S. E., Chothia, C., and Hubbard, T. J. 1998, Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships, Proc. Natl. Acad. Sci. USA, 95, 6073–6078.

36. Nakajima, D., Okazaki, N., Yamakawa, H., Kikuno, R., Ohara, O., and Nagase, T. 2002, Construction of expression-ready cDNA clones for KIAA genes: manual curation of 330 KIAA cDNA clones, DNA Res., 9, 99–106.
