What does this protein do? Ask FEATURE.
January 20, 2009
(Part of the “Dinner Table Science” series)
It’s a scenario that’s becoming more and more common in today’s technology-driven science: we discover a new gene or protein, but have nary an inkling of how it contributes to life as we know it. Knowing what proteins do can lead to improved medical treatments, better biofuels, and new insights into biology, but it does little good to discover proteins without studying them. If we’re lucky, the new protein happens to be similar to something we’ve already studied, and so we can guess with some confidence about its function, but what do we do when it’s not? Testing everything we can think of with laboratory experiments is clearly out of the question; there are far too many possibilities, and – let’s face it – far too few graduate students.
Well, you could try asking a computer program. Inspired by the fact that a protein’s amino acid sequence and three-dimensional structure often provide clues about what it does, computational scientists are developing algorithms to predict function for newly discovered proteins. FEATURE, developed by Russ Altman’s group at Stanford University, is one such algorithm. It uses a technique known as supervised machine learning, which derives distinguishing characteristics of a particular kind of object from known examples so that future instances can be classified automatically.
If it looks like a duck…
In FEATURE’s case, the objects are functional sites – specific locations in protein structures associated with behaviors like ion binding or enzymatic reactions. Even a fairly simple function like calcium binding is critical to the cell – calcium regulates the activity of many important proteins, controlling how cells respond to their environment, how neurons fire, and how muscles contract. FEATURE learns what calcium binding sites look like by comparing examples of sites known to bind calcium with examples of sites known not to bind it. Given a new protein structure, it can then predict whether that protein binds calcium. It is essentially the computer equivalent of a laboratory test for calcium binding, without the need for chemical reagents, physical quantities of the protein in question, or hours of human labor.
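For the curious, here is a toy sketch of the "learn from known examples, then classify new ones" idea. It is not FEATURE's actual code or feature set: the two descriptors (counts of oxygen and carbon atoms near a site), the numbers, and the function names are all made up for illustration, and the classifier is a bare-bones Gaussian naive Bayes written from scratch.

```python
import math

def train(examples):
    """Estimate a prior plus per-feature mean and variance for each label."""
    model = {}
    for label in {lbl for _, lbl in examples}:
        rows = [feats for feats, lbl in examples if lbl == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small constant keeps a zero variance from causing division by zero.
        variances = [sum((x - m) ** 2 for x in col) / n + 1e-6
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(examples), means, variances)
    return model

def log_score(feats, prior, means, variances):
    """Log prior plus the Gaussian log-likelihood of each feature."""
    score = math.log(prior)
    for x, m, v in zip(feats, means, variances):
        score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
    return score

def predict(model, feats):
    """Return the label whose learned profile best explains the new site."""
    return max(model, key=lambda label: log_score(feats, *model[label]))

# Hypothetical training data: (nearby oxygen atoms, nearby carbon atoms).
training_sites = [
    ((6, 2), "binds"), ((7, 3), "binds"), ((5, 2), "binds"),
    ((1, 8), "does not bind"), ((2, 7), "does not bind"), ((0, 9), "does not bind"),
]

model = train(training_sites)
print(predict(model, (6, 3)))  # a new, unlabeled site
```

A real model would draw on many more properties and far more examples, but the workflow is the same: summarize what the known positives and negatives look like, then ask which group a new site resembles more.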
Using the FEATURE algorithm, the Altman group has created models for many different protein functions, including calcium and zinc binding. In the works are projects that probe the dynamic nature of protein function – how functional sites change as proteins flex and wiggle – and efforts to discover new kinds of functional sites without prior knowledge.
Not a miracle cure, but useful
Automated classification has its own drawbacks, of course. FEATURE uses protein structure data to learn the distinguishing properties of functional sites, but only a fraction of proteins have structure data available. More generally, computational tools provide only predictions (some more accurate than others) and so do not eliminate the need for experimental verification.
Despite these problems, algorithms for predicting function are an important part of studying proteins in the post-genomic era. The growing popularity of large-scale projects such as metagenomics (sequencing all the DNA in samples of natural environments such as the oceans and our digestive systems) and structural genomics (solving the structures of all known proteins) means that the rate at which we discover new proteins is increasing much faster than the rate at which we acquire understanding about what they do. Armed with computational tools like FEATURE, we can narrow down the possibilities and generate testable hypotheses, making an initially intimidating task – figuring out what protein X does – more tractable.
Note: My Ph.D. work is based on FEATURE.