Laboratory for Genomics and Bioinformatics

 
 

Lee H. Pratt

Marie-Michèle Cordonnier-Pratt
 
EST Summary

In an initial collaboration at the University of Georgia, we have obtained high-quality sequence from 3,552 plasmids selected at random from three equine cDNA libraries made from monocyte, mesenteric lymph node, and liver mRNA. Liver cDNAs were sequenced from the 3’ end only. Most monocyte cDNAs were also sequenced from the 3’ end only, while 768 were sequenced from both the 3’ and 5’ ends. Because lymph node cDNAs were not cloned unidirectionally, they were sequenced from only one end. A total of 3,574 quality sequences were obtained, defined as those having a contiguous region of >100 base pairs (bp) exceeding an overall PHRED score of Q16 after removal of vector and polyT. These sequences average ~600 nucleotides (nt) in length. When they are evaluated instead with a moving window of Q20, average read lengths are reduced by only ~40 nt.

Clustering of 3’ ESTs establishes that they derive from ~1,400 different equine genes (unigenes), giving a gene discovery yield of ~60% (Table 1). BLASTX returns from SwissProt Plus yielded 284 or 195 hits to other equine sequences at an E-value of <1e-20 or <1e-50, respectively, and 1,838 or 1,135 hits overall at the same E-values. Thus, about 60% of the cDNAs that provided quality sequence could be identified in this way. Of those cDNAs for which sequence was obtained from both ends, 98% were annotated identically irrespective of whether the EST was 3’ or 5’. Our 3’ ESTs are therefore sufficiently long and accurate to provide annotation that is essentially as good as that provided by 5’ ESTs.

These quality sequences have been deposited in GenBank and can be viewed without trimming elsewhere on this web site. Clone requests are filled as described on another page.
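The quality criterion described above can be illustrated with a short Python sketch. It is not the pipeline used for these data. It assumes that per-base PHRED scores are already available as a list of integers (after vector and polyT removal), and that "a contiguous region exceeding Q16" means an unbroken run of bases each scoring at or above the threshold. The function names `longest_quality_run` and `is_quality_sequence` are hypothetical.

```python
from typing import List, Tuple


def longest_quality_run(quals: List[int], min_q: int = 16) -> Tuple[int, int]:
    """Return (start, length) of the longest unbroken run of bases
    whose PHRED score is >= min_q. Returns (0, 0) if there is none."""
    best_start, best_len = 0, 0
    run_start = None  # index where the current run began, if any
    for i, q in enumerate(quals):
        if q >= min_q:
            if run_start is None:
                run_start = i
            run_len = i - run_start + 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_start = None  # a low-quality base ends the current run
    return best_start, best_len


def is_quality_sequence(quals: List[int], min_q: int = 16, min_len: int = 100) -> bool:
    """Assumed reading of the criterion: the longest run must be
    strictly longer than min_len bases (">100 bp")."""
    _, length = longest_quality_run(quals, min_q)
    return length > min_len


if __name__ == "__main__":
    # Synthetic read: poor start, 150 good bases, a dip, then 50 good bases.
    read = [10] * 5 + [30] * 150 + [5] * 3 + [20] * 50
    print(longest_quality_run(read), is_quality_sequence(read))
```

Calling the same functions with `min_q=20` gives a rough feel for how a stricter threshold shortens the usable region, although a true moving-window evaluation averages scores over the window rather than testing each base.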

A summary of our current (as of 29 Jan. 2003) equine EST data is presented in an accompanying table.

Annotation:
These equine ESTs have been provisionally annotated by BLASTX and BLASTN against the SwissProt Plus and GenEMBL databases, respectively. Newly discovered equine cDNAs cover a wide range of functions (Figure 1), including cytokines, chemokines, enzyme inhibitors, growth factors, receptors, housekeeping genes, nuclear proteins, proto-oncogenes, and signal transducers. Of the cDNAs with an apparent match to known genes, the majority are involved in cell signaling/cell communication. All of these cDNAs have been archived in duplicate at -80°C and are available for distribution (http://fungen.botany.uga.edu/Sorghum/Clone.htm) at a cost that covers shipping and handling.

 

Figure 1. Cellular component ontology of a limited number of our equine ESTs based on provisional annotations.

 

Polymorphic Markers:
Our bioinformatics pipeline allows us to detect polymorphic markers, such as SNPs and SSRs, that might be useful in gene mapping studies or in studies of association with specific QTLs, such as those for susceptibility to disease. Current information about such markers is limited because our cDNA libraries have so far included only a small number of genotypes. The table below lists a few examples of the equine SSRs discovered. With additional funding, this activity will be expanded substantially.


Examples of microsatellite repeat motifs in our equine ESTs.
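As an illustration of how SSRs (microsatellites) can be located in EST sequences, the following Python sketch scans a nucleotide string for tandem repeats using regular expressions. It is not the detection method used in our pipeline. The function name `find_ssrs`, the motif length range of 2 to 6 nt, and the minimum of 5 repeats are all assumptions chosen for the example.

```python
import re
from typing import List, NamedTuple


class SSR(NamedTuple):
    start: int    # 0-based position of the first repeat unit
    motif: str    # repeat unit, e.g. "AC"
    repeats: int  # number of tandem copies


def _is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repetition of a shorter unit
    (so "ACAC" and "AAA" are rejected, "AC" and "GAT" are kept)."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq: str, min_motif: int = 2, max_motif: int = 6,
              min_repeats: int = 5) -> List[SSR]:
    """Report perfect tandem repeats of motifs min_motif..max_motif nt long
    that occur at least min_repeats times in a row."""
    seq = seq.upper()
    hits = []
    for k in range(min_motif, max_motif + 1):
        # One motif of length k followed by at least (min_repeats - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if _is_primitive(motif):
                hits.append(SSR(m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    # Synthetic sequence containing (AC)8 and (GAT)5.
    example = "GGCT" + "AC" * 8 + "TTGCA" + "GAT" * 5 + "CC"
    for ssr in find_ssrs(example):
        print(ssr)
```

This sketch finds only perfect repeats. Imperfect or compound microsatellites, which are common in real ESTs, would need a more tolerant approach.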