I just came across this article in Science magazine that I believe is an interesting advancement in the field of structural genomics.
A group from Scripps and the Burnham Institute, including Adam Godzik, took Thermotoga maritima, a thermophilic bacterium with a small genome, and modeled the structures of all the proteins in its central metabolic network, 478 proteins in total. This also included working out the function of a good number of those proteins and then reconstructing the metabolic pathways.
Of the 478 proteins, 120 already had experimentally determined structures. Of the remaining 358 proteins, about half could be modeled with pretty good confidence (i.e. better than 30% sequence identity to a protein of known structure). Only 3 of the proteins required some major tinkering to get at least a rough idea of what the fold looks like, and the analysis of the folds is what the group is focusing on.
First, I think the fact that roughly two thirds of the structures came either from experiment or from reliable homology modeling is pretty encouraging, though it depends on whether you are a glass-half-full or a glass-half-empty kind of person.
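For anyone who wants to check the "two thirds" figure, here is the back-of-the-envelope arithmetic as a small Python sketch. The only assumption is that "about half" of the 358 remaining proteins is taken as exactly half; the variable names are mine, not the paper's.

```python
# Rough coverage estimate from the numbers quoted above.
total_proteins = 478
experimental = 120                          # structures known from experiment
remaining = total_proteins - experimental   # 358 proteins without a structure

# Assumption: "about half" is treated as exactly half (integer division).
confidently_modeled = remaining // 2        # 179

covered = experimental + confidently_modeled
fraction = covered / total_proteins

print(f"{covered} of {total_proteins} proteins ({fraction:.1%})")
```

Under that assumption the coverage comes out at a bit under two thirds.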
Second, once they had the enzymatic pathways modeled this way, the group identified a minimal set of proteins essential for the bacterium to survive. They found three groups of proteins: "core essential", where taking out any single one means game over for Thermotoga; "synthetic lethal", where there is built-in redundancy in the pathways, so that no one protein alone is essential but taking out more than one is lethal; and "non-essential", which are, well, non-essential.
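To make the three categories concrete, here is a toy Python sketch. It is not the method used in the paper. The network, the gene names, and the rule that a reaction works if at least one of its alternative genes is present are all made up for illustration, and only pairwise knockouts are checked.

```python
from itertools import combinations

# Hypothetical toy network: each required reaction maps to the set of
# alternative genes that can catalyze it (one surviving gene is enough).
REQUIRED_REACTIONS = {
    "r1": {"geneA"},            # no backup
    "r2": {"geneB", "geneC"},   # two redundant enzymes
}
# Genes that only serve a reaction the cell can live without.
OPTIONAL_REACTIONS = {"r3": {"geneD"}}


def is_viable(knocked_out):
    """The toy cell survives if every required reaction keeps at least one gene."""
    return all(genes - knocked_out for genes in REQUIRED_REACTIONS.values())


def classify_genes():
    """Split genes into core essential, synthetic lethal (pairs only), and non-essential."""
    all_genes = set()
    for genes in list(REQUIRED_REACTIONS.values()) + list(OPTIONAL_REACTIONS.values()):
        all_genes |= genes

    # Core essential: a single knockout is already lethal.
    core = {g for g in all_genes if not is_viable({g})}

    # Synthetic lethal: individually dispensable, but lethal when knocked out together.
    synthetic = set()
    for a, b in combinations(sorted(all_genes - core), 2):
        if not is_viable({a, b}):
            synthetic |= {a, b}

    non_essential = all_genes - core - synthetic
    return core, synthetic, non_essential


core, synthetic, non_essential = classify_genes()
print("core essential:  ", sorted(core))
print("synthetic lethal:", sorted(synthetic))
print("non-essential:   ", sorted(non_essential))
```

In this made-up network geneA ends up core essential, geneB and geneC are synthetic lethal with each other, and geneD is non-essential. A real analysis would work on the full reconstructed network and would also have to consider knockouts of more than two genes.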
The level of detail that the modeling allowed is pretty impressive (I think). I wonder how long it is going to take to go from this to a human or mammalian cell. An analysis like this could have a really big impact on drug discovery, where the preferred approach of hitting a single target to cure a disease has been running into a bit of trouble lately, imho. Analyzing networks of proteins this way could go a long way toward figuring out how best to affect disease states using polypharmacology.
Of course, going from 478 to 20,000-25,000 proteins is not going to happen tomorrow (or ever?). Knowing how many proteins there are would be a good first step, I guess.
Did we finally figure out how many protein-coding sequences there are in the human genome? I have to check.
Zhang Y, Thiele I, Weekes D, Li Z, Jaroszewski L, Ginalski K, Deacon AM, Wooley J, Lesley SA, Wilson IA, Palsson B, Osterman A, & Godzik A (2009). Three-dimensional structural view of the central metabolic network of Thermotoga maritima. Science (New York, N.Y.), 325 (5947), 1544-1549. PMID: 19762644