Laboratory for Genomics and Bioinformatics |
Lee H. Pratt |
Marie-Michèle Cordonnier-Pratt
Sorghum Milestone Unigene Set
The objective of the first three years of NSF funding was to obtain 10,000 or more unigenes from 10 sorghum cDNA libraries, each sampled to a depth of about 5,000 clones. Each clone was sequenced from both the 3' and 5' ends, yielding somewhat more than 50,000 cDNAs and 107,652 ESTs. These libraries are the first 10 listed in the overall summary of sorghum EST libraries.
A growing unigene set is assembled by clustering only 3' ESTs. This approach avoids the difficulty inherent in 5' ESTs, which do not necessarily overlap even when they come from the same gene. We have just completed construction of a SQL-verified milestone unigene set, which contains 16,801 members. Via this link one can find information by entering a unigene number, an anchor sequence, or a laboratory or GenBank identification. Alternatively, a versatile gene discovery viewing tool is available; it provides electronic expression profiles, a query function, contig sequence content, annotation, and more.
All cDNAs sequenced were selected randomly from unamplified cDNA libraries that were neither subtracted nor normalized. Thus, it has been possible to monitor the rate of gene discovery in close to real time, to predict the number of unigenes that will be identified at any future sequencing depth, and to predict the total number of genes expected if all libraries were sequenced to infinity. This last number is presently ~34,000. As new libraries are added it continues to increase, although the rate of increase is now rather small. Thus, a rough estimate of the total number of genes in sorghum based upon our EST data would be between 35,000 and 40,000. The first figure illustrates the number of unique genes ('unigenes') identified as a function of the number of 3' sorghum ESTs in our database.
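The kind of extrapolation described above can be illustrated with a small sketch. The page does not state which model is used for the prediction, so the code below assumes a simple hyperbolic saturation curve, U(n) = u_max * n / (k + n), fitted by least squares on its linearized form. The function names, the model choice, and the data points are all assumptions made for illustration; the points are synthetic and are not the laboratory's measurements.

```python
def predict_unigenes(n, u_max, k):
    """Unigenes expected after n 3' ESTs under the assumed hyperbolic model."""
    return u_max * n / (k + n)


def fit_saturation(points):
    """Fit U(n) = u_max * n / (k + n) to (n_ests, n_unigenes) pairs.

    Uses ordinary least squares on the linearized form
    1/U = 1/u_max + (k / u_max) * (1/n).
    Returns (u_max, k); u_max is the predicted plateau, i.e. the number of
    unigenes expected at infinite sequencing depth under this model.
    """
    xs = [1.0 / n for n, _ in points]
    ys = [1.0 / u for _, u in points]
    m = len(points)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    u_max = 1.0 / intercept
    k = slope * u_max
    return u_max, k


# Synthetic, noise-free points generated from arbitrary parameters.
# These are NOT sorghum data; they only exercise the fitting code.
TRUE_U_MAX, TRUE_K = 30000.0, 50000.0
synthetic = [
    (n, predict_unigenes(n, TRUE_U_MAX, TRUE_K))
    for n in range(5000, 55000, 5000)
]

u_max, k = fit_saturation(synthetic)
print("estimated plateau:", round(u_max))
print("predicted unigenes at 100,000 ESTs:",
      round(predict_unigenes(100_000, u_max, k)))
```

With real, noisy counts a nonlinear fit would usually be preferable to the linearized one, since taking reciprocals gives the shallowest sampling depths the most weight.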
The second figure displays the number of sequences in each unigene cluster as a function of an arbitrary cluster number. Each bar represents a 'unigene'; each library is represented by a different color. The scroll bar allows one to move through all 21,021 contigs. Clicking on a single bar brings up a pie chart as shown below this figure. The gene discovery viewing tool referred to above provides similar information, and more, for our Milestone Unigene Set.
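The per-cluster bar heights and per-library pie charts described above amount to counting ESTs by unigene and by library of origin. A minimal sketch of that tally follows, assuming each 3' EST is available as a (unigene identifier, library name) pair. The record layout, function names, identifiers, and library names are hypothetical and do not describe how the site's viewing tool is implemented.

```python
from collections import Counter, defaultdict


def expression_profiles(est_records):
    """Count ESTs per library within each unigene cluster.

    est_records: iterable of (unigene_id, library_name), one pair per 3' EST.
    Returns a dict mapping unigene_id -> Counter of library_name -> EST count.
    """
    profiles = defaultdict(Counter)
    for unigene_id, library in est_records:
        profiles[unigene_id][library] += 1
    return profiles


def library_fractions(profile):
    """Convert one cluster's counts to fractions (the pie-chart slices)."""
    total = sum(profile.values())
    return {library: count / total for library, count in profile.items()}


# Hypothetical records; the identifiers and library names are invented.
records = [
    ("U1", "LIB_A"),
    ("U1", "LIB_A"),
    ("U1", "LIB_B"),
    ("U2", "LIB_B"),
]

profiles = expression_profiles(records)

# Bar height in the cluster-size figure: total ESTs in each unigene cluster.
cluster_sizes = {uid: sum(counts.values()) for uid, counts in profiles.items()}

for uid in sorted(profiles):
    print(uid, cluster_sizes[uid], library_fractions(profiles[uid]))
```

Because the libraries were neither subtracted nor normalized, counts of this kind can be read as a rough electronic expression profile, although in practice they would need to be scaled by the number of ESTs sampled from each library.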