Poult Sci 2006. 85:1438-1441
© 2006 Poultry Science Association
MOLECULAR, CELLULAR, AND DEVELOPMENTAL BIOLOGY: Research Note
An Expressed Sequence Tag Analysis of the Chicken Reproductive Tract Transcriptome
D. P. Froman*,1, J. D. Kirby, and D. D. Rhoads
* Department of Animal Sciences, Oregon State University, Corvallis 97330; and Departments of Poultry Science and Biological Science, University of Arkansas, Fayetteville 72701
1 Corresponding author: drhoads@uark.edu
ABSTRACT
Analysis of the chicken reproductive tract transcriptome is important in comparative biology for studying reproductive tract development and evolution. In addition, molecular analysis of the reproductive tract is important for identifying genes that affect fertility in the poultry industry. We sampled the chicken reproductive tract (ovary, oviduct, and testis) transcriptome, generating 5,328 expressed sequence tags (EST) that assembled into 4,518 contigs. We identified 475 contigs with no match in the current EST databases or in GenBank. Of these novel contigs, 31 had no match to the current assembly of the chicken genome; of the 428 that matched the assembly but no other database searched, 119 represented spliced transcripts and 309 were unspliced. More detailed molecular characterization of these 428 novel contigs will be important to gene discovery and annotation of the chicken and other vertebrate genomes.
Key Words: expressed sequence tag, chicken, reproductive tract, testis, ovary
INTRODUCTION
The draft genome assembly from the chicken genome project (Wallis et al., 2004) provides a wealth of information to extend the use of the chicken as a model organism in developmental and comparative biology. The assembly aids gene identification, gene mapping, and genetic analysis. The chicken has been instrumental in comparative studies, particularly in developmental biology, neurobiology, and immunology, adding tremendous insight into vertebrate systems. Comparative genomics will elucidate key conserved chromosomal and gene sequences, whereas comparisons of transcript profiles in various tissues will identify conserved and divergent developmental processes. One limitation of the assembly is that current gene recognition algorithms may miss as many as 10% of protein-coding genes, so complementary cDNA projects are required (Ota et al., 2003). We are interested in elucidating similarities and differences between mammalian and chicken gonadal development and gametogenesis to better understand these central processes in reproduction. For the chicken, expressed sequence tag (EST) databases have been developed for over 20 different adult and embryo tissues, including muscle, brain, lymphoid tissue, ovary, and testis (Boardman et al., 2002; Cogburn et al., 2003; Hubbard et al., 2005; Kim et al., 2005; Shin et al., 2005). However, the transcriptomes of the chicken male and female reproductive tracts remain largely uncharacterized. We therefore initiated a joint project to characterize a University of Delaware chicken reproductive tract library: we assembled and annotated an EST data set from this mixed reproductive tract library and compared it with other databases, including chicken EST databases. Our results suggest that further characterization of the reproductive tract transcriptome is essential for better gene annotation of the chicken genome.
MATERIALS AND METHODS
A reproductive tract library, provided by L. Cogburn (University of Delaware, Newark), was constructed from testis (25%), ovary (50%), and oviduct (25%) RNA. Tissue RNA was obtained from testis: embryonic (d 19), posthatch (5, 7, 13, 21, and 35 wk of age), and adult (1 yr old); ovary: posthatch (5, 7, and 8 wk of age) and adult (1 yr old); and oviduct: entire oviduct from an immature hen (15 wk of age) plus 3 segments of oviduct (magnum, uterus, white isthmus) from sexually mature adults (37 wk of age) at 3 or 16 h postovulation. The library was directionally cloned in pCMV-SPORT 6 and maintained in Escherichia coli EMDH10B, similar to other tissue cDNA libraries (see http://www.chickest.udel.edu; Cogburn et al., 2003). A portion of the library was normalized by Invitrogen (Invitrogen Inc., Carlsbad, CA). Individual bacterial clones from the normalized and nonnormalized libraries were cultured, and plasmid inserts were amplified from crude cell lysates. Specifically, 5 µL of overnight bacterial culture was diluted to 50 µL with sterile water, heated at 100°C for 10 min, and submitted to the University of Arkansas DNA Core Laboratory for PCR amplification and sequencing. Culture extract (5 µL) was amplified in a 50-µL PCR reaction with flanking M13 forward and reverse primers. The PCR reactions contained 1 µM primers, 200 µM dNTP, 50 mM Tris-Cl (pH 8.3), 1 mM MgCl₂, 100 mM KCl, 3 mg/mL of highly purified BSA, and 15 U Taq polymerase. The PCR parameters were 94°C for 3 min, then 25 cycles of 94°C for 30 s, 45°C for 30 s, and 72°C for 3 min, followed by 72°C for 3 min. The PCR products were purified on Qiagen 96-well manifolds (Qiagen Inc., Valencia, CA) and eluted into 100 µL of water, and an aliquot (5 µL) was sequenced using an internal primer specific for the 5'-end of the cloned mRNA.
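The lysate preparation and cycling program above reduce to simple arithmetic. The minimal Python sketch below is a reader's aid rather than the authors' procedure; the names `Step`, `total_hold_minutes`, and `net_dilution` are hypothetical. It computes the net dilution of the overnight culture in each reaction and the total hold time of the program, under the stated assumption that temperature ramps take no time, so an actual thermocycler run would be somewhat longer.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One thermocycler hold: temperature (deg C) and duration (seconds)."""
    temp_c: float
    seconds: int


# Cycling program as described in the text (ramp times are NOT modeled).
INITIAL = [Step(94, 180)]
CYCLE = [Step(94, 30), Step(45, 30), Step(72, 180)]
N_CYCLES = 25
FINAL = [Step(72, 180)]


def total_hold_minutes(initial, cycle, n_cycles, final):
    """Sum of all hold times in minutes, excluding temperature ramps."""
    seconds = (sum(s.seconds for s in initial)
               + n_cycles * sum(s.seconds for s in cycle)
               + sum(s.seconds for s in final))
    return seconds / 60


def net_dilution(culture_ul, lysate_ul, template_ul, reaction_ul):
    """Fold dilution of the overnight culture in the final PCR reaction."""
    return (lysate_ul / culture_ul) * (reaction_ul / template_ul)


if __name__ == "__main__":
    print(total_hold_minutes(INITIAL, CYCLE, N_CYCLES, FINAL))  # 106.0
    print(net_dilution(5, 50, 5, 50))                           # 100.0
```

Two successive 10-fold steps (5 µL of culture to 50 µL of lysate, then 5 µL of lysate into a 50-µL reaction) give a 100-fold net dilution of the culture. The 3-min extension inside each cycle dominates the roughly 106 min of hold time.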
Some PCR products were sequenced using Beckman CEQ chemistry and detected on a Beckman CEQ (Beckman-Coulter Agencourt Bioscience Corp., Beverly, MA), whereas others were sequenced using BigDye chemistry and detected on an ABI 3100 (Applied Biosystems, Foster City, CA). The EST sequences were then trimmed of vector sequence, and those EST with more than 100 bases of high-quality sequence were assembled into contigs using SeqMan (version 6.0, DNASTAR Inc., Madison, WI). Trimmed EST were deposited in the GenBank database of EST (dbEST; accession numbers DT654631 to DT659958). Contigs shorter than 100 bp were excluded from BLAST analyses. Consensus contig sequences were used in BLAST queries with a cutoff e-value of 10⁻²⁰ for GenBank nonredundant [(NR) 12/19/2003; 9,572,270,928 bases in 1,994,022 sequences], GenBank dbEST [7/29/2005; 28,161,526 entries, including 549,157 from Gallus gallus], and chicken EST [(ChickEST) 339,314 G. gallus EST from 64 cDNA libraries generated from 21 different embryonic and adult tissues assembled into 85,486 contigs and 19,626 annotated cDNA clones (Hubbard et al., 2005), obtained from http://www.chick.umist.ac.uk/], or with a cutoff of 10⁻³⁰ for the whole genome shotgun trace files of the chicken genome project [10,564,548,474 bases in 10,768,939 sequences]. Contigs with no match against the trace files were then used in a BLAST search of the whole genome shotgun contigs database at the National Center for Biotechnology Information (Bethesda, MD), representing the chicken genome assembly. Potentially spliced contigs were identified through BLAT searches on the University of California Santa Cruz genome browser (http://genome.ucsc.edu) and were defined as those with a BLAT score greater than 90% of the contig length and a BLAT span more than 100 bases longer than the contig.
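The inclusion thresholds and the splice criterion above can be written as simple predicates. The sketch below is a reader's aid, not the authors' pipeline (which used SeqMan, BLAST, and the UCSC BLAT server). All function and dictionary names are hypothetical, and we assume a hit counts as significant when its e-value is at or below the database cutoff.

```python
# Thresholds taken from the Methods text; all names here are hypothetical.
MIN_HQ_BASES = 100    # EST needed MORE than 100 high-quality bases
MIN_CONTIG_BP = 100   # contigs shorter than 100 bp were not BLAST-searched

# Per-database e-value cutoffs (assumption: significant if e-value <= cutoff).
EVALUE_CUTOFF = {
    "GenBank NR": 1e-20,
    "dbEST": 1e-20,
    "ChickEST": 1e-20,
    "WGS traces": 1e-30,
}


def keep_est(high_quality_bases: int) -> bool:
    """True if a trimmed EST has enough high-quality sequence for assembly."""
    return high_quality_bases > MIN_HQ_BASES


def blastable(contig_len: int) -> bool:
    """True if a contig is long enough to be used as a BLAST query."""
    return contig_len >= MIN_CONTIG_BP


def is_significant(database: str, evalue: float) -> bool:
    """True if a BLAST hit passes the cutoff chosen for that database."""
    return evalue <= EVALUE_CUTOFF[database]


def potentially_spliced(blat_score: int, blat_span: int, contig_len: int) -> bool:
    """BLAT score > 90% of contig length AND span > contig length + 100 bases."""
    return blat_score > 0.9 * contig_len and blat_span > contig_len + 100


if __name__ == "__main__":
    # Hypothetical 440-base contig with a BLAT score of 435.
    print(potentially_spliced(435, 5_284, 440))  # True: alignment spans introns
    print(potentially_spliced(435, 450, 440))    # False: colinear with genome
```

The span test is what separates spliced from colinear alignments: a contig whose aligned blocks cover nearly its full length but stretch over far more genomic sequence than its own length must contain gaps (candidate introns) in the genomic alignment.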
RESULTS AND DISCUSSION
We picked 1,000 colonies from the nonnormalized library and 6,200 colonies from the normalized library. We used a rapid PCR method to amplify inserts from boiled extracts of each culture. During the course of this project, we found that by adding extra (2 to 3×) Taq polymerase and lengthening the extension times, we could reliably amplify inserts of up to 6 kb with this method. The PCR products were purified and then directly subjected to DNA sequencing. From the 7,200 clones, we obtained high-quality sequence (>100 bp) for 5,328 EST, totaling 2,284,463 bases. These sequences assembled into 4,518 contigs comprising 1,943,134 bases.
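The sequencing yield reported above can be summarized with a few ratios. This short Python sketch only recomputes figures already stated in the text; the variable names are illustrative.

```python
clones_picked = 1_000 + 6_200   # nonnormalized + normalized colonies
est_obtained = 5_328            # EST with >100 bp of high-quality sequence
est_bases = 2_284_463
contigs = 4_518
contig_bases = 1_943_134

success_rate = est_obtained / clones_picked
mean_est_len = est_bases / est_obtained
mean_contig_len = contig_bases / contigs

print(f"clones picked: {clones_picked}")             # 7200
print(f"sequencing success: {success_rate:.1%}")     # 74.0%
print(f"mean EST length: {mean_est_len:.0f} bases")  # 429
print(f"mean contig length: {mean_contig_len:.0f} bases")  # 430
```

About three quarters of the picked clones yielded usable EST. Mean contig length is nearly identical to mean EST length, which is what one would expect when, as reported later in this section, most contigs (3,935 of 4,518) are singletons.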
We then used BLAST searches with stringent (low) e-value cutoffs to identify those contigs with significant matches in public gene databases. Stringent cutoffs were used to ensure that matches were significant and to minimize false hits. For example, in BLAST searches of the GenBank NR database, e-values above 10⁻²⁰ allowed matches of 85% identity over 50 bases, which included CR-1 repeats, whereas e-values below 10⁻²⁰ required more identities over more extensive regions (e.g., 90% over 120 bases). The results of the BLAST searches of 4 different databases are summarized in Table 1. For the GenBank NR database, 3,441 contigs (76%) identified significant homologs, whereas 3,891 (86%) of the contigs had matches in GenBank dbEST. BLAST searches of ChickEST with the assembled contigs indicated that 863 (19%) of our contigs had not been previously identified in this extensive collection of chicken EST data. More importantly, there were 614 contigs with no match in either dbEST or ChickEST (the intersection of the sets not in dbEST and not in ChickEST). However, of these 614 contigs not in EST databases, 139 matched sequences in GenBank NR. Of these 139 contigs with matches in GenBank, 105 are annotated in GenBank as predicted mRNA from automated computational analysis of genome assemblies. Our data therefore confirm these predicted transcripts, at least for the chicken. Of the remaining 475 contigs (those not in EST databases and not in GenBank NR), 444 have significant matches in the current assembly, whereas 31 have no match in the assembly. These 31 matchless contigs consist of 30 singletons and 1 contig of 2 EST. In total, 198 contigs (4.4% of 4,518) failed to align significantly with the current assembly, in close agreement with the estimate of 5% for missing sequence based on the 7.04× genome coverage (Wallis et al., 2004).
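The successive filters above are set operations on contig identifiers. The minimal Python sketch below assumes each database search has already been reduced to the set of contig IDs with a significant hit; the `funnel` function and the toy IDs are hypothetical. It then checks that the reported counts are internally consistent.

```python
def funnel(all_ids, dbest, chickest, nr, assembly):
    """Partition contigs as described in the text; every argument is a set of IDs."""
    not_in_est = all_ids - dbest - chickest  # no match in dbEST and none in ChickEST
    nr_hit = not_in_est & nr                 # matched GenBank NR (e.g., predicted mRNA)
    novel = not_in_est - nr                  # no match in any EST database or NR
    return {
        "not_in_est": not_in_est,
        "nr_hit": nr_hit,
        "novel": novel,
        "novel_in_assembly": novel & assembly,
        "novel_no_assembly": novel - assembly,
    }


# Toy example with hypothetical contig IDs.
result = funnel(
    {"c1", "c2", "c3", "c4", "c5"},
    dbest={"c1"},
    chickest={"c1", "c2"},
    nr={"c1", "c3"},
    assembly={"c1", "c2", "c3", "c4"},
)
print(sorted(result["novel"]))  # ['c4', 'c5']

# Counts reported in the text.
reported_not_in_est, reported_nr_hit = 614, 139
reported_novel = reported_not_in_est - reported_nr_hit  # 475
reported_in_assembly, reported_no_assembly = 444, 31
consistent = reported_novel == reported_in_assembly + reported_no_assembly
unaligned_fraction = 198 / 4_518                        # all contigs lacking an assembly hit
print(reported_novel, consistent, f"{unaligned_fraction:.1%}")  # 475 True 4.4%
```

Subtracting the dbEST and ChickEST hit sets from the full set is equivalent to intersecting the "not in dbEST" and "not in ChickEST" sets, which is the intersection the text refers to.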
To further examine whether the 444 novel chicken contigs (no match in dbEST, ChickEST, or GenBank NR but with matches in the assembly) represent coding sequences, we used these contigs in a BLAST search of the TIGR G. gallus gene index (GGGI, version 10.0, 2/15/2005) using a more relaxed BLAST cutoff of 10⁻¹⁵ to allow for matches in short exons. This identified 16 of the 444 contigs with entries in the GGGI, leaving 428 contigs with matches in the assembly but no match in the GGGI, dbEST, ChickEST, or GenBank NR. Of these 428 contigs, 1 contained 3 EST, 13 were doublets, and 414 were singletons. With 2 or more EST each, these 14 contigs are highly likely to be new mRNA, as yet uncharacterized in the chicken or other EST projects. The other 414 represent new candidate G. gallus genes or 5'-regions of transcripts with no overlap with 3'-regions from other EST projects.
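The split of the 428 contigs by EST support can be sketched as follows. This is an illustrative Python example, not the authors' analysis; `split_by_support` and the generated contig IDs are hypothetical, built only to reproduce the reported distribution (1 contig of 3 EST, 13 doublets, 414 singletons).

```python
def split_by_support(est_counts, min_ests=2):
    """Split contig IDs into multi-EST contigs and lower-support contigs."""
    multi = {c for c, n in est_counts.items() if n >= min_ests}
    single = {c for c, n in est_counts.items() if n < min_ests}
    return multi, single


remaining = 444 - 16  # novel contigs in the assembly minus GGGI hits

# Hypothetical IDs reproducing the reported distribution of the 428 contigs.
est_counts = {"triplet_0": 3}
est_counts.update({f"doublet_{i}": 2 for i in range(13)})
est_counts.update({f"singleton_{i}": 1 for i in range(414)})

multi, single = split_by_support(est_counts)
print(remaining, len(est_counts), len(multi), len(single))  # 428 428 14 414
```

Requiring at least 2 independent EST per contig is one simple way to rank candidates for follow-up, since a transcript seen in more than one clone is less likely to be a cloning artifact than a singleton.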
Others have reported that testis-derived chicken EST have a high incidence (76%) of unspliced EST, that is, contigs that are colinear with the genome assembly (Shin et al., 2005). The 428 novel reproductive tract contigs were therefore analyzed by BLAT searches to determine which represented probable splice products. This identified 309 of 428 contigs (72%) as unspliced, in close agreement with previous estimates for testis EST (Shin et al., 2005). The unspliced contigs included 9 contigs with multiple EST (1 contig of 3 EST and 8 contigs of 2 EST). A number of the unspliced contigs appeared to come from introns of predicted (NCBI RefSeq, http://www.ncbi.nlm.nih.gov/RefSeq/) genes. The 119 potentially spliced contigs included 4 contigs of 2 EST. Alignments of these contigs with the assembly show that most, but not all, conform to canonical splice junctions. Many of the contigs spanned multiple exons, ranging from 2 to 6 exons. For example, EST DT657601 includes 6 exonic sequences of 41, 40, 91, 42, 90, and 135 bases spanning 20,331 bases on GGA6. The 5'-end of our EST overlaps the 5'-end of an antisense EST, BU370869, from an adult (tissue not defined). In another example, DT657216 defines exons of 28, 139, 100, 90, and 83 bases spanning 5,284 bases of GGA5, with no other EST within nearly 50 kb. There was no apparent chromosome bias for novel EST: most chromosomes were represented, and no chromosome appeared to be overrepresented.
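The two spliced examples can be summarized by how much of each genomic span is exonic. The Python sketch below is illustrative (the `exon_summary` helper is hypothetical) and assumes the reported span runs from the start of the first exon to the end of the last, so that every non-exonic base in the span is intronic.

```python
def exon_summary(exon_sizes, genomic_span):
    """Exon/intron tally for one spliced contig aligned to the genome.

    Assumes the span runs from the first exon's start to the last exon's end,
    so every non-exonic base inside it is intronic.
    """
    exonic = sum(exon_sizes)
    return {
        "exons": len(exon_sizes),
        "introns": len(exon_sizes) - 1,
        "exonic_bases": exonic,
        "intronic_bases": genomic_span - exonic,
    }


dt657601 = exon_summary([41, 40, 91, 42, 90, 135], 20_331)  # GGA6
dt657216 = exon_summary([28, 139, 100, 90, 83], 5_284)      # GGA5
print(dt657601)  # 6 exons, 5 introns, 439 exonic, 19892 intronic bases
print(dt657216)  # 5 exons, 4 introns, 440 exonic, 4844 intronic bases
```

In both cases the EST itself is only about 440 bases, but its alignment stretches over 5 to 20 kb of genomic sequence, far more than the "contig length + 100 bases" span threshold used to flag potentially spliced contigs.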
To test whether the 428 novel transcripts contained protein-coding information, we used tblastx searches of the GenBank NR protein database with a relaxed cutoff (10⁻¹⁰) to identify short potential protein domains. Only 15 contigs had matches (5 spliced contigs and 10 unspliced contigs). Of these, 6 matched predicted proteins from G. gallus, but the region of conservation was limited (50 to 60% identity over 70 to 100 residues). Therefore, our contigs are only distantly related to these predicted proteins.
To assess the effectiveness of normalization, we examined the frequency of EST from the 2 libraries in those contigs with 7 or more EST (Table 2). In addition to the 7 contigs of at least 7 EST listed in Table 2, there were 5 contigs of 6 EST, 8 contigs of 5 EST, 28 contigs of 4 EST, 94 contigs of 3 EST, 441 contigs of 2 EST, and 3,935 contigs represented by a single EST. The most frequent EST, for ovalbumin, represented 15% (124/855) of the EST in the nonnormalized library but less than 0.2% in the normalized library, which is approximately the frequency of the most abundant EST in the normalized library. All of the mitochondrial cytochrome oxidase I EST came from the nonnormalized library (8/855); none were found among the 4,473 cDNA from the normalized library. Based on the different frequencies with which EST were isolated from the normalized vs. nonnormalized libraries, it appears that cDNA sequence may affect the efficiency of normalization. For example, normalization appears to have greatly reduced the frequency of ovalbumin cDNA, whereas no cytochrome oxidase I transcripts at all were recovered after normalization. Conversely, EST for the hypothetical protein AJ851805.1 were not found in the nonnormalized set but were among the most abundant EST in the normalized library.
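The normalization comparison rests on per-library frequencies. The Python sketch below recomputes the frequencies stated in the text and checks the reported contig-size distribution; the variable names are illustrative, and grouping all contigs of 7 or more EST under the key 7 is our simplification.

```python
NONNORM_TOTAL = 855    # EST from the nonnormalized library
NORM_TOTAL = 4_473     # EST from the normalized library


def freq(count, total):
    """Fraction of a library's EST that belong to one transcript."""
    return count / total


ovalbumin_nonnorm = freq(124, NONNORM_TOTAL)   # about 14.5%
max_norm_ovalbumin = 0.002 * NORM_TOTAL        # "<0.2%" means fewer than about 9 EST
coi_nonnorm = freq(8, NONNORM_TOTAL)           # about 0.94%
coi_norm = freq(0, NORM_TOTAL)                 # none recovered

# EST per contig -> number of contigs; key 7 stands for "7 or more EST".
contig_sizes = {7: 7, 6: 5, 5: 8, 4: 28, 3: 94, 2: 441, 1: 3_935}
total_contigs = sum(contig_sizes.values())

print(f"ovalbumin, nonnormalized: {ovalbumin_nonnorm:.1%}")     # 14.5%
print(f"ovalbumin, normalized: < {max_norm_ovalbumin:.1f} EST")  # < 8.9 EST
print(f"COI: {coi_nonnorm:.2%} vs {coi_norm:.2%}")               # 0.94% vs 0.00%
print(total_contigs, NONNORM_TOTAL + NORM_TOTAL)                 # 4518 5328
```

The two libraries together account for all 5,328 EST, and the size classes sum to the 4,518 contigs reported earlier, so the distribution in the text is complete.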
Comparison of our 4,518 contigs with 339,314 EST from adult and embryo chick tissues, including nearly 30,000 ovary EST (Boardman et al., 2002), revealed that 614 of our contigs identified genes previously uncharacterized as EST from the chicken (not in ChickEST or dbEST) and that 475 represented potentially new transcripts. It will be important to further characterize these 475 transcripts by generating finished, full-length cDNA sequence. Because these EST come from a mixture of embryonic, adult, male, and female tissues, it will also be important to determine the expression profiles of these novel transcripts. Finished cDNA sequences for these genes will be crucial for gene prediction models, as well as for improved annotation of the chicken genome (Wallis et al., 2004; Hubbard et al., 2005). Analysis of the mixed reproductive tract library clearly shows that the chicken reproductive tract transcriptome is poorly documented. Based on the number of singletons and the percentage of novel contigs identified in this project, we believe that additional sequencing of the normalized library is warranted to explore its depth. Development of additional EST databases for specific reproductive tract tissues will also be important.
ACKNOWLEDGMENTS
This work was supported by a grant to J. D. Kirby and D. D. Rhoads from the Arkansas Biosciences Institute. Special thanks to Larry Cogburn, University of Delaware, Newark, for providing the normalized cDNA library and clones from the nonnormalized library. Laura Thorne and Juliana Burns provided valuable assistance.
Received for publication October 26, 2005.
Accepted for publication March 9, 2006.
REFERENCES
Boardman, P., J. Sanz-Ezquerro, I. Overton, D. Burt, E. Bosch, W. Fong, C. Tickle, W. Brown, S. Wilson, and S. Hubbard. 2002. A comprehensive collection of chicken cDNAs. Curr. Biol. 12:1965–1969.
Cogburn, L. A., X. Wang, W. Carre, L. Rejto, T. E. Porter, S. E. Aggrey, and J. Simon. 2003. Systems-wide chicken DNA microarrays, gene expression profiling, and discovery of functional genes. Poult. Sci. 82:939–951.
Hubbard, S. J., D. V. Grafham, K. J. Beattie, I. M. Overton, S. R. McLaren, M. D. R. Croning, P. E. Boardman, J. K. Bonfield, J. Burnside, R. M. Davies, E. R. Farrell, M. D. Francis, S. Griffiths-Jones, S. J. Humphray, C. Hyland, C. E. Scott, H. Tang, R. G. Taylor, C. Tickle, W. R. A. Brown, E. Birney, J. Rogers, and S. A. Wilson. 2005. Transcriptome analysis for the chicken based on 19,626 finished cDNA sequences and 485,337 expressed sequence tags. Genome Res. 15:174–183.
Kim, D. K., D. Lim, B. R. Lee, J. H. Shin, H. Kim, and J. Y. Han. 2005. Analysis of testis-specific transcripts in the chicken. Anim. Genet. 36:232–234.
Ota, T., Y. Suzuki, T. Nishikawa, T. Otsuki, T. Sugiyama, R. Irie, A. Wakamatsu, K. Hayashi, H. Sato, K. Nagai, K. Kimura, H. Makita, M. Sekine, M. Obayashi, T. Nishi, T. Shibahara, T. Tanaka, S. Ishii, J.-I. Yamamoto, K. Saito, Y. Kawai, Y. Isono, Y. Nakamura, K. Nagahari, K. Murakami, T. Yasuda, T. Iwayanagi, M. Wagatsuma, A. Shiratori, H. Sudo, T. Hosoiri, Y. Kaku, H. Kodaira, H. Kondo, M. Sugawara, M. Takahashi, K. Kanda, T. Yokoi, T. Furuya, E. Kikkawa, Y. Omura, K. Abe, K. Kamihara, N. Katsuta, K. Sato, M. Tanikawa, M. Yamazaki, K. Ninomiya, T. Ishibashi, H. Yamashita, K. Murakawa, K. Fujimori, H. Tanai, M. Kimata, M. Watanabe, S. Hiraoka, Y. Chiba, S. Ishida, Y. Ono, S. Takiguchi, S. Watanabe, M. Yosida, T. Hotuta, J. Kusano, K. Kanehori, A. Takahashi-Fujii, H. Hara, T.-O. Tanase, Y. Nomura, S. Togiya, F. Komai, R. Hara, K. Takeuchi, M. Arita, N. Imose, K. Musashino, H. Yuuki, A. Oshima, N. Sasaki, S. Aotsuka, Y. Yoshikawa, H. Matsunawa, T. Ichihara, N. Shiohata, S. Sano, S. Moriya, H. Momiyama, N. Satoh, S. Takami, Y. Terashima, O. Suzuki, S. Nakagawa, A. Senoh, H. Mizoguchi, Y. Goto, F. Shimizu, H. Wakebe, H. Hishigaki, T. Watanabe, A. Sugiyama, M. Takemoto, B. Kawakami, M. Yamazaki, K. Watanabe, A. Kumagai, S. Itakura, Y. Fukuzumi, Y. Fujimori, M. Komiyama, H. Tashiro, A. Tanigami, T. Fujiwara, T. Ono, K. Yamada, Y. Fujii, K. Ozaki, M. Hirao, Y. Ohmori, A. Kawabata, T. Hikiji, N. Kobatake, H. Inagaki, Y. Ikema, S. Okamoto, R. Okitani, T. Kawakami, S. Noguchi, T. Itoh, K. Shigeta, T. Senba, K. Matsumura, Y. Nakajima, T. Mizuno, M. Morinaga, M. Sasaki, T. Togashi, M. Oyama, H. Hata, M. Watanabe, T. Komatsu, J. Mizushima-Sugano, T. Satoh, Y. Shirai, Y. Takahashi, K. Nakagawa, K. Okumura, T. Nagase, N. Nomura, H. Kikuchi, Y. Masuho, R. Yamashita, K. Nakai, T. Yada, Y. Nakamura, O. Ohara, T. Isogai, and S. Sugano. 2003. Complete sequencing and characterization of 21,243 full-length human cDNAs. Nat. Genet. 36:40–45.
Shin, J. H., H. Kim, K. D. Song, B. K. Han, T. S. Park, D. K. Kim, and J. Y. Han. 2005. A set of testis-specific novel genes collected from a collection of Korean native chicken ESTs. Anim. Genet. 36:346–348.
Wallis, J., J. Aerts, M. Groenen, R. Crooijmans, D. Layman, T. Graves, D. Scheer, C. Kremitzki, M. Fedele, N. Mudd, M. Cardenas, J. Higginbotham, J. Carter, R. Mcgrane, T. Gaige, K. Mead, J. Walker, D. Albracht, J. Davito, S.-P. Yang, S. Leong, A. Chinwalla, M. Sekhon, K. Wylie, J. Dodgson, M. Romanov, H. Cheng, P. D. Jong, K. Osoegawa, M. Nefedov, H. Zhang, J. Mcpherson, M. Krzywinski, J. Schein, L. Hillier, E. Mardis, R. Wilson, and W. Warren. 2004. A physical map of the chicken genome. Nature 432:761–764.