The home stretch, and next steps

Photo by zizzy on Flickr

I’ve been neglecting my blog lately, through no fault of its own. There’s been lots going on, and lots not going on as a result, so I just wanted to post a quick update.

I’m a little over a month away from handing in my dissertation, which means many hours that might have gone towards writing blog posts instead went towards writing the darn thing. Learning LaTeX wasn’t nearly as bad as I thought it would be, especially with a handy Stanford thesis template and some web resources. I just handed a copy of the draft to my committee for comments and hoped for a brief respite to take care of some random bits (like all those appendices, code documentation, etc…).

Of course, my advisor says, “Now that you have nothing to do, why don’t you write a paper?” So now I’m writing a paper… which is probably a good thing, since there does seem to be enough material for a paper, and I might as well write it while I’m in so-called “writing mode”. I have the feeling by the time this is over, though, even Twitter will seem like too much writing!

Most of the rest of the time is spent on activities related to Ultimate, which started up about a month ago. There have been two tournaments, some practices, various pickup games (which are pseudo-mandatory for me), and various workouts to get in shape for the season.

Even with those two things, there’s probably ample time to squeeze in a post or two. But the mental energy isn’t quite there. And maybe it’s also because I still spend a couple of hours cooking and get 8-9 hours of sleep most nights. Some things just can’t be sacrificed. Like homemade strawberry shortcake with freshly whipped cream. :)

So that’s mostly what I’ve been up to the last couple months. The next month will probably be much the same. And then?

Well, I think it’s official enough now that I can announce it: I’ve accepted an offer to join 23andMe as a scientific curator in late August! I’m very excited about working with them and hopefully will be able to contribute across multiple facets of the company.

The end – and a new beginning – is in sight!

A wordle says… a thousand words

… or, in this case, 75.

[Image: Wordle of my slide notes]

The day has finally come – the day before my thesis proposal dissertation oral defense exam whatsit. I’m feeling pretty good about it, mostly because after many late nights slaving over my slides, after writing and adding and changing and tweaking language endlessly to make sure I’m saying everything I need to say but not saying too much, after spending hours running new code to produce new calculations to produce new figures only to conclude not to include them in the talk, after slashing and burning more than half of the slides multiple times, once accompanied by a despondent “arrrgghh!! He HATES it!!”… I’m feeling unreasonably ok about tomorrow because my presentation is now, well, presentable.

For those of you who can’t hear me extol the virtues of machine learning methods for function recognition in protein structures, I composed a Wordle from my notes associated with each slide. I think it summarizes my thesis rather well in 75 words or less.

The beginning of the end: defending the dissertation, part 1

After many years (!?) and several false starts (and false ends), I can finally see the light at the end of the tunnel. In two weeks I’ll be defending my dissertation. Granted, it’s not the rah rah got my plane ticket and a bound copy of my thesis type of defense. My department does a “proposal defense” which takes place 6-9 months before you intend to finish, so it’s both more and less stressful than the traditional defense. I have to say, I think I like this better than the traditional arrangement because it forces you to crystallize your thoughts and be able to discuss them constructively with others. As part of the process, we have to turn in a dissertation proposal 2 weeks prior, and it was – besides a huge pain in the rear – extremely helpful for clarifying my thesis.

My fellow students are always very gracious with their time, so many of them sat through a practice talk and gave feedback. Several hours and plenty of Thai food later, I had many pages of suggestions. The most important advice I received revolved around the following:

  • Emphasize the problem you are solving and the context with regard to existing work
  • Use demonstrative, motivating examples
  • Have a clear structure to all parts of your talk so it is easy to follow
  • Help people focus in on what is important in each slide
  • Display equations extremely sparingly, use graphics when possible
  • Don’t use multiple examples when one clear one will suffice
  • Make fonts, axes, points, etc. as large as you can on graphs/plots
  • Return to your outline, specific aims, or framework periodically to re-orient the audience
  • Think about what is important for your audience to know, cut out all other detail (you can keep what you cut out in the back of the presentation in case someone asks)

Next week I’ll hopefully have incorporated all of this into a new version of the talk that is 25 minutes shorter (!!) (yes, it was over an hour…) for another round of feedback.

For anyone interested in what my dissertation is about, here’s an abstract:

Knowledge of protein function is essential for understanding biological processes and mechanisms, which can be manipulated to treat disease or engineer beneficial outputs such as disease therapeutics or biofuels. The emergence of high-throughput biological tools, however, has produced a significant bottleneck between protein identification and functional annotation. Structural genomics projects are generating many novel protein structures with little associated functional knowledge, and so computational function characterization methods that do not rely on strict sequence or structure conservation are needed. In this proposal, I present a method for building 3D models of protein functional sites automatically from sequence motifs, called SeqFEATURE, which we have used to construct a large library of functional site models. In particular, I show that SeqFEATURE performs more robustly than other methods when sequence and structural similarity are low.

Another problem in function prediction stems from the fact that most methods require examples of known functions and do not generalize to new functions. A recent study used unsupervised clustering to group together structurally and chemically similar FEATURE-based protein microenvironments, which could potentially represent novel functions. To annotate these clusters, I developed a set of methods for ranking important terms found in the literature associated with the proteins comprising the cluster. In addition, I have adapted the “neighbor divergence per gene” (NDPG) method to assess functional coherence of protein clusters. Preliminary analyses indicate that functional clusters have much greater functional coherence than random clusters, and that coherence decreases with the amount of signal in the cluster. The NDPG method will be combined with hierarchical clustering to refine and select optimal sub-clusters for annotation.

This work extends existing frameworks in the context of structural genomics: creating a pipeline for rapid construction of robust functional site models that can be applied in high-throughput, and defining an approach by which novel biological functions can be discovered and characterized.
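To give a flavor of what "ranking important terms found in the literature" for a cluster might look like, here is a toy sketch. It is not the method from the proposal: it assumes a simple TF-IDF-style score (how often a term shows up in the cluster's abstracts, down-weighted by how common it is across all abstracts), and the `rank_terms` function and the example abstracts are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Return the set of lowercase words in one abstract."""
    return set(re.findall(r"[a-z]+", text.lower()))

def rank_terms(cluster_docs, background_docs):
    """Rank terms by (fraction of cluster docs containing the term)
    times log(total docs / docs containing the term anywhere)."""
    all_docs = cluster_docs + background_docs
    doc_freq = Counter()
    for doc in all_docs:
        doc_freq.update(tokenize(doc))
    cluster_freq = Counter()
    for doc in cluster_docs:
        cluster_freq.update(tokenize(doc))
    n = len(all_docs)
    scores = {
        term: (count / len(cluster_docs)) * math.log(n / doc_freq[term])
        for term, count in cluster_freq.items()
    }
    # Highest-scoring terms first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Made-up abstracts, purely for illustration.
cluster = ["zinc binding site in protease", "zinc finger binding domain"]
background = [
    "kinase binding domain",
    "membrane transport protein",
    "atp hydrolysis by kinase",
    "membrane channel protein",
]

for term, score in rank_terms(cluster, background)[:3]:
    print(f"{term}: {score:.3f}")
```

In this toy data "zinc" comes out on top because it appears in every cluster abstract and nowhere else, while "binding" is penalized for also appearing in the background.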
