Breaking out of “the last bastion of indentured servitude”

(Note: I originally had this comic as an illustration but have removed it while I check whether I have permission to use it as the image is under copyright.)

The Stanford undergraduate program in biomedical computation and graduate program in biomedical informatics jointly hosted an industry panel tonight to highlight career paths outside of academia. The panel was diverse, including:

  • someone who had worked at a small biotech prior to enrolling in a Ph.D. program and co-founded a startup while still finishing his degree,
  • someone who had gone from a Ph.D. and a post-doc in bench science to working in venture capital, and
  • someone who worked in chemical engineering and toxicology before becoming head of a biomedical informatics division at a large pharmaceutical company and is soon starting an MBA.

They talked about how they got to where they are now, general advice for people considering whether to get a Ph.D. or an M.D., how to approach startups, and some differences between working in academia vs. large companies vs. small startups. Some general themes that came out of the discussion were:

  • Getting a Ph.D. is a good idea. As one panelist put it, “I hated it, it was the worst time in my life, but I’m so glad I did it.” Having a Ph.D. simply offers you more opportunity and removes the glass ceiling which unfortunately is present for those who do not have higher degrees. Even in less technical jobs, a Ph.D. is often useful. In venture capital, for example, you might be interacting with very technical people on very technical projects, and a Ph.D. not only gives you leverage to build rapport with them but also gives you the training to understand the details of those projects. (The same is also true for consulting.)
  • Seek out diverse experiences and broaden your view. Even in a Ph.D. where you’re focusing on a very specific area, you should be aware of what’s going on elsewhere because nothing is completely isolated from everything else. One panelist encouraged students to do multiple internships to explore industries and career options.
  • Learn how to fail, and be tenacious. You hear this everywhere but it really is true: don’t be afraid to fail, and to fail often. People are more likely to hire someone who has failed and learned from it than someone who has always succeeded. If you’ve always succeeded, it may mean you’ve never been tested, and similarly, that you’ve never taken risks. If you go to graduate school, however, you’ll definitely fail a lot.
  • There’s no “right” path. Each of the panelists started out thinking they were going to do one thing and ended up doing something different – sometimes something wildly different. None of them took a direct line to get to where they are now. You shouldn’t expect to, either. Explore, be flexible, and take advantage of opportunities when they present themselves. Just keep thinking about what you enjoy doing and what fires you up. This leads to the next theme.
  • Find your passion. You will be hired if you are passionate about that work. Your startup will be more likely to succeed if you are passionate about it. You will have more of an impact if you are passionate about it.

In addition to this general advice, the panelists fielded several questions surrounding startups – how to evaluate ideas and the pros and cons of startups vs. other types of companies or academia.

To startup or not to startup?

This depends on your goals. If you just want to make a lot of money, by all means start a company. If you want a specific experience, you may be better off joining an existing small company that does something you get excited about. When evaluating a startup idea, it can be helpful to ask whether the idea is an enabling technology for a tangible application. Even if the market isn’t quite there yet, your idea will create one if you can demonstrate that your product makes something possible that wasn’t possible before. An example of this is the technology that enabled high-throughput parallel assays of gene expression, which was later acquired by Affymetrix.

What makes a startup different from a large company or academia?

A major difference is pace. Things happen fast in the startup environment, and the attitude is very much “fail early and fail often.” Decisions are made and executed quickly. In contrast, decision-making can be incredibly slow in a large company, which often is much less willing to take risks. Ironically, large companies often have lots of money to throw around, which is something that startups and academic labs must work very hard to get. For a large company, the hardest resource to find is good personnel. Startups have an easier time attracting personnel but it’s still not cheap. Personnel, however, are plentiful and relatively cheap for academia (one panelist described graduate school as “the last bastion of indentured servitude”).

I hope the following diagrams (inspired by Indexed) are helpful in illustrating these differences*:

time_spent001

money_personnel

risk_reward

* Diagrams are from a peon’s perspective and not necessarily to scale.


Thoughts on leadership from John Hennessy

Photo by dunechaser on Flickr


John L. Hennessy is the current President of Stanford University and a Professor of Electrical Engineering and Computer Science. Prior to becoming President of the University, he served as the Chair of the Department of Computer Science, Dean of the School of Engineering, and Provost of the University after Condoleezza Rice. As part of a leadership development program geared towards graduate students, Hennessy participated in an interview-style seminar. His love for academic teaching was evident, and he reminisced fondly about his younger days, saying, “being a graduate student and a faculty member – they were some of the happiest times of my life.” These are some of the more interesting thoughts I took away from the interview.

What will graduate education look like in 25 years?

Hennessy believes we must train adults to be adept at living in the 21st century. This means reflecting the fact that there will be no fixed careers – people will shift roles often, which requires different skill sets. He foresees the emergence of new fields of study, much like the field of bioengineering today. The demand for Ph.D.-level graduates in non-academic roles – policy, industry, non-profit, etc. – will rise, and the more educated people there are, the better the world will be.

It’s OK to say ‘No’

One thing Hennessy wishes he’d known as a graduate student is that you should never participate in something unless you’re able and willing to do it 100%. This doesn’t just go for students, either – professors would do well not to take on more students than they can handle. Better to do a really good job on a few things, whether it be research projects or mentoring students, than to do a poor job on a lot of things, especially mentoring students.

Photo by pedrosimoes7 on Flickr


Leadership is NOT management!

A great leader is someone with vision who is able to make difficult decisions. When you’re younger or more junior, leadership often consists of simply forming groups and coming to any outcome; as you go higher up, the outcomes really start to matter, making the decisions more difficult. A great leader drives towards a consensus on difficult problems in the right way. Reminding people of the bigger picture helps to create an environment of shared responsibility.

OK, leadership does require some management

Those in academic labs are probably familiar with the brilliant scientist who is a terrible manager. He or she has great ideas and can really inspire people, but the lab is an administrative mess and being a student there is confusing and stressful. Bottom line? Management is also important. Academics are rarely coached in effective management, so in the absence of formal training it’s essential that researchers start out small and build up slowly to practice and expand their managerial skills. It’s a trial by fire, but you can learn a bit by taking courses and following the example of others. In the end, though, the best education is actually doing it.

Some suggestions for tackling day to day personnel and task management:

  • Delegate and empower
  • Prioritize – recognize what is actually important to get done
  • Find ways to add value to otherwise mundane tasks

And finally some general thoughts to keep in mind as you develop your leadership skills:

Go outside your comfort zone, and be willing to fail.

Making tough decisions means you can’t please everybody.

Don’t get bogged down in bureaucracy – find a way to make things work.

Learn from your peers and surround yourself with people who know a little bit more than you do.

Tips and tricks for software engineering in bioinformatics (talk by Joel Dudley)

Photo by ladyada on Flickr


Joel Dudley, co-founder of MacResearch and a student in the Stanford Biomedical Informatics program, gave a quick seminar today on how to program effectively for bioinformatics. The main point is that to be a good bioinformatician, you need to build up your toolbox, be aware of what’s out there, and use and integrate existing tools to do more powerful work. To do this, he gives the following suggestions:

1. Learn UNIX. It’s quick, it’s powerful, it’s easy to learn. What often takes several lines to code in a scripting language can usually be reduced to a single line on the command line.

2. Be a jack of all trades, but master of ONE. That is, be familiar with most programming languages, but be really good at one of them. In the hierarchy of languages, VB and C are more “primitive” while Ruby and Python are more “advanced” – he recommends starting with one of the more advanced languages if you are new to programming. Out of Ruby and Python, Python will probably give you more bang for your buck, due to the smorgasbord of libraries available and broad acceptance (e.g. academic labs, Google). In addition, there are lots of bridges between languages, such as Jython (Java and Python) and JRuby (Java and Ruby), so expert knowledge of one is usually sufficient for you to make a lot of things work practically everywhere.

3. Don’t reinvent the wheel. “Frameworks are your friends.” Take advantage of large existing projects like BioPython/Perl/Ruby/Java, Django, Rails, etc., which contain lots of ready-to-go code for practically everything. Use the internet to find existing code solutions – e.g. Koders is like a Google search for open source code on the web.


4. Learn one text editor really well. Take your pick of Emacs, vi, or a GUI-based editor like TextMate for Macs. The advantage of emacs and vi is that they will be installed on pretty much any system you come across.

5. “Don’t trust yourself”, i.e. use code versioning. Examples are Subversion, CVS, and git. You can even outsource your code hosting with github. Combine this with project management in GForge.

6. Don’t be afraid to use more than 3 letters to define a variable. Having short variable names won’t make the code run faster. It will, however, make the code more difficult for others (and you, 3 months from now) to understand!
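To make the point concrete, here is a small sketch in Python. The task (computing the GC fraction of a DNA sequence) and all of the names are my own illustration, not from Joel's talk. Both functions do exactly the same work; only the second one tells you what that work is.

```python
# Hypothetical example: fraction of G and C bases in a DNA sequence, written twice.

# Terse version: correct, but the reader has to reverse-engineer the intent.
def f(s):
    n = 0
    for c in s:
        if c in "GC":
            n += 1
    return n / len(s)


# Descriptive version: identical logic, but the names explain the code.
def gc_fraction(dna_sequence):
    gc_base_count = 0
    for base in dna_sequence:
        if base in "GC":
            gc_base_count += 1
    return gc_base_count / len(dna_sequence)
```

Calling `f("GGCA")` and `gc_fraction("GGCA")` gives the same answer; the difference only shows up when someone has to read the code.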

Photo by archeon on Flickr


7. Balance architecture and accomplishment. You may be tempted to create something that is complete, elegant, and perfectly structured. This will likely be a waste of time. It’s ok to sacrifice a little bit of structure to get something that actually works.

8. Automate documentation. Documentation is necessary, but it’s a pain to write. So come up with a convention for your headers and make it automatic. Use available tools like Doxygen, JavaDoc, and RDoc, many of which are free.
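As a rough illustration of "a convention plus automation", here is a sketch using Python docstrings and the standard library's `inspect` module rather than the tools named above. The function `reverse_complement` and the helper `summarize_docs` are hypothetical names I made up for the example; the idea is simply that if every function header follows the same pattern, a few lines of code can turn the headers into a reference page.

```python
import inspect


def reverse_complement(dna_sequence):
    """Return the reverse complement of a DNA sequence.

    Args:
        dna_sequence: string of uppercase A, C, G, T characters.

    Returns:
        The reverse-complemented sequence as a string.
    """
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(dna_sequence))


def summarize_docs(functions):
    """Build a one-line-per-function reference from the functions' docstrings."""
    lines = []
    for function in functions:
        signature = inspect.signature(function)
        # The first docstring line is the summary by convention.
        summary = inspect.getdoc(function).splitlines()[0]
        lines.append(f"{function.__name__}{signature}: {summary}")
    return "\n".join(lines)
```

For a whole module, `python -m pydoc your_module` renders the same docstrings without any extra code (`your_module` being a placeholder for whatever your file is called).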

The above are generic for academic-level software engineering. Some tips that more specifically address high-throughput biomedical computing:

9. Kill the flat file (sort of). This is the most common file format used in bioinformatics, but it hardly lends itself to efficient computation. A common task we want to do with the file is read in the data and store it keyed so that we can look up specific pieces of the data later. Hate databases? Cringe at SQL? If you can represent your data as key/value pairs, consider using an embeddable database like the open source BerkeleyDB (now licensed by Oracle), which requires no administration. If you don’t mind SQL, but hate the administration, SQLite allows you to create embedded, serverless databases. Other options that go beyond the relational database concept are CouchDB (“a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API”) and Hypertable (“a high performance distributed data storage system”).
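Here is a minimal sketch of the SQLite route using Python's built-in `sqlite3` module. The tab-separated layout, the table name `expression`, and the gene values are all assumptions I invented for the example; a real flat file would need its own parsing.

```python
import sqlite3

# Hypothetical flat-file content: tab-separated gene ID and expression level.
FLAT_FILE_LINES = [
    "BRCA1\t12.5",
    "TP53\t8.1",
    "EGFR\t3.3",
]


def load_into_sqlite(lines, database_path=":memory:"):
    """Parse tab-separated lines and store them in a keyed SQLite table."""
    connection = sqlite3.connect(database_path)
    connection.execute(
        "CREATE TABLE IF NOT EXISTS expression "
        "(gene_id TEXT PRIMARY KEY, level REAL)"
    )
    rows = []
    for line in lines:
        gene_id, level = line.rstrip("\n").split("\t")
        rows.append((gene_id, float(level)))
    # Parameterized insert; a repeated gene_id overwrites the earlier row.
    connection.executemany(
        "INSERT OR REPLACE INTO expression (gene_id, level) VALUES (?, ?)", rows
    )
    connection.commit()
    return connection


def lookup_level(connection, gene_id):
    """Return the stored level for gene_id, or None if it is absent."""
    row = connection.execute(
        "SELECT level FROM expression WHERE gene_id = ?", (gene_id,)
    ).fetchone()
    return row[0] if row else None
```

Passing a file name instead of `":memory:"` keeps the database on disk, so the flat file only has to be parsed once and later lookups go straight to the index.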

10. New ways to do parallel computing. Determine whether your tasks are loosely coupled (independent) or tightly coupled. Although personal computers and laptops are coming out with more cores, most programs only use one at a time. Find ways to utilize idle cores – e.g. there is a way to do this in R. Think in terms of MapReduce. Take advantage of cloud computing, like Amazon’s EC2. Use platforms like Hadoop and Disco to make parallel computing applications. A cool example of this is Cloudburst-Bio, a massively parallel project for genome assembly from next-generation sequencing that uses MapReduce.
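To show what "thinking in terms of MapReduce" can look like on a single multi-core machine, here is a small sketch using Python's standard `concurrent.futures` module. This is my own toy example (counting k-mers in short reads), not code from the talk and not how Hadoop or Disco work internally. The reads are independent, so the task is loosely coupled: each read is mapped to partial counts, and the partial counts are reduced into one total.

```python
from collections import Counter
from concurrent.futures import ProcessPoolExecutor
from functools import partial, reduce


def count_kmers(read, k):
    """Map step: count every length-k substring in one read."""
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def count_kmers_serial(reads, k):
    """Single-core reference version of the same map and reduce."""
    partial_counts = (count_kmers(read, k) for read in reads)
    return reduce(merge_counts, partial_counts, Counter())


def count_kmers_parallel(reads, k, workers=2):
    """Run the map step in separate processes, then reduce in the parent."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partial_counts = pool.map(partial(count_kmers, k=k), reads)
        return reduce(merge_counts, partial_counts, Counter())
```

`count_kmers_parallel` should be called from a script protected by `if __name__ == "__main__":`, since some platforms start worker processes by re-importing the main module. For inputs this small the process overhead outweighs any gain; the pattern only pays off when each map call does substantial work.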

via PCNews


11. Embrace hardware. New (and old) hardware is available that can give you significant speedups in biomedical computation, notably graphics processing units (GPUs), which have been used to accelerate molecular dynamics. Hardware vendors like NVIDIA are starting to respond; you can now get GPU workstations like NVIDIA’s Tesla personal supercomputer offering speedups of hundreds of times over traditional workstations. So if you don’t want to utilize the cloud, you can get an affordable and powerful cluster that fits on top of your desk. Aside from GPUs, there are field programmable gate arrays (FPGAs) – chips you can program after manufacturing.

12. Playing nice with others. Think a bit about data exchange formats – but definitely use them! Suggestions are JSON, YAML, and, of course, XML. When working in teams, use an “agile software development” strategy – mainly many fast iterations of the specification-development-feedback cycle. Use tools to automate the development process, such as unit testing and the granddaddy, “make”. Tools like BaseCamp (and perhaps Science 2.0 versions like Laboratree) can help with the more general project management aspects.
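As a small sketch of the data exchange idea, here is a JSON round trip using Python's built-in `json` module. The record layout (`gene_id`, `expression_levels`) and the function names are assumptions I made up for illustration; the point is that a collaborator using any language with a JSON parser can read what you write.

```python
import json

# Hypothetical record layout agreed on with collaborators.
REQUIRED_FIELDS = {"gene_id", "expression_levels"}


def export_record(gene_id, expression_levels):
    """Serialize one record to a JSON string that other tools can parse."""
    record = {"gene_id": gene_id, "expression_levels": expression_levels}
    # sort_keys makes the output stable, which helps when diffing files.
    return json.dumps(record, sort_keys=True)


def import_record(json_text):
    """Parse a JSON string and check that the agreed fields are present."""
    record = json.loads(json_text)
    missing = REQUIRED_FIELDS - set(record)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return record
```

A round-trip check such as `import_record(export_record("TP53", [8.1, 7.9]))` is also exactly the kind of small assertion that fits naturally into a unit test.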

————————————————-

In summary:

Focus on the goal (biology or medicine).
Don’t be clever (you’ll trick yourself).
Value your time.
Outsource everything but genius.
Use tools available to you.
And have fun. ;)

Many thanks to Joel for the tips! He mentioned uploading the presentation to Slideshare so I’ll include a link to the slides once they’re up.

Update: slides for Joel’s presentation are up on Slideshare.

Update 2/27/09: Episode #13 of the Coast 2 Coast Bio podcast discusses these points in much more depth. Thanks, Deepak and Hari!

Five thoughts on innovation

Photo by annais on Flickr


Another very interesting panel at the retreat I went to this week covered the topic of innovation: how do we generate ideas and see them through successfully?

Among the panelists were successful PIs, an NSF CAREER award recipient, and entrepreneurs (not mutually exclusive), and each one gave a personal reflection on what it means to be innovative and how one nurtures innovation, followed by an open forum. I can’t do the excellent discussion justice so I won’t even attempt to give a full summary, but will instead mention some of the points that made the strongest impression on me.


1. If innovation is the engine, then passion is the fuel.

For entrepreneurs and scientists alike, being creative and innovative is necessary for success, generating the ideas that keep the enterprise – whether it be business or research oriented – running. But it’s incredibly difficult to be innovative on problems you’re not passionate about. So be aware of what your own interests are, and let yourself nurture them.

2. “It’s better to be lucky than smart” + “Fortune favors the prepared”

Ideas often come when you least expect them, but they don’t really come out of the blue. Ideas come from the topics you think about, so let yourself grapple with the problems that ignite your curiosity, wrestle with them, play around with them. Then, your subconscious will pick up when you stop thinking about it, making connections that surface unexpectedly. Changing things up can often help trigger new connections, so look for new environments that might stimulate different areas of your brain.

Preparation isn’t just mental, either. Keep a notebook with you whenever you can so you can jot down ideas or thoughts that come to you. Don’t count on yourself remembering it later!

Photo by poportis on Flickr


3. Have a million ideas and throw out the bad ones.

This is actually a paraphrase of Linus Pauling‘s quote, “If you want to have good ideas you must have many ideas. Most of them will be wrong, and what you have to learn is which ones to throw away.” Just as most projects will fail, most ideas won’t work, so the key is to have as many ideas as you can, and then figure out which ones are worth pursuing.

Photo by pakgwei on Flickr


Experience, as well as sharing your ideas with others and getting critical feedback, will help you separate the good from the bad. There’s definitely a balance you need to strike between recognizing when an idea should be retired and having “finitiative” – the initiative to finish. Stay open-minded, listen to people, but if you have a real gut feeling about something, don’t be afraid to go with it. Some people have had to wait ten years before others validated their ideas.

4. Don’t be afraid of 80%.

Many computational scientists are familiar with the phenomenon where, to solve a particular problem, we produce more and more variations on algorithms with increasingly smaller gains in performance. (Good examples are speech recognition and other AI type tasks.) If anyone tried to publish an algorithm that only got 80% accuracy when all its contemporaries had, say, accuracies around 95%, they’d probably get dismissed by everyone else rather quickly. This is unfortunate, because while all the current methods are probably variations on the same theme, the new algorithm could be doing something truly different. So don’t judge something simply by how it compares to others. Something that is really innovative will likely seem inferior on first glance or first implementation.

5. Never underestimate the power of a good shower.

Almost all the panelists agreed – some of their best ideas came while they were in the shower. So get cleanin’, and start dreamin’!

——————————

Addendum: I brought up this point at the very end of the panel discussion because so far everything I’d heard indicated a one way flow of information – from outside to inside. But given the company I’ve started keeping, I now always think of information flow as two-way. You can certainly learn from others if you take seminars, ask questions, etc, but you will learn even more if you share your ideas with others. So the last uber-thought I want to add is:

*. “I’ve never had an idea that couldn’t be improved by sharing it with as many people as possible.”

This quote by Bill Hooker in his seminal essay on Open Science pretty much sums it up. After I mentioned this, one panelist told a story of how he might never have pursued the research field in which he is now a leader if he hadn’t found a supportive community of peers that understood what he was doing and could give him constructive feedback and encouragement; his academic career up until that point had been extremely lonely. So it’s important not only to find people who will support you, but people who you can bounce your ideas off of. For this, I find the FriendFeed community, and social web tools in general, to be a wonderful resource.

Some very academic advice

For many graduate students, the question of what to do next is a hard one. Do I want to do research? Teach? Get involved with business? Something completely different? What kind of lifestyle do I value and what are my priorities? Let’s say you have some idea of the answers to these questions, and have decided you want to pursue academic jobs. Then what?

Even though I’m fairly certain academia is not in my future, I found a panel discussion at a department retreat this week centered on these questions to be very informative. The panel was geared towards Ph.D. students thinking about either post-doc or faculty positions. Something to note is that a post-doc, if not mandatory, is at least heavily encouraged in the life sciences if you are pursuing a career in academia, but it is not necessarily required for fields like bioinformatics or computer science – that is, you can be hired as an assistant professor straight out of your Ph.D. This discussion covered both post-doc and faculty applications.

Each panel member was assigned one of the following topics to cover in 5-10 minutes of podium time:

  • Preparing your CV and choosing where to apply
  • Crafting a compelling teaching statement
  • Crafting a compelling research statement
  • The job talk

Each panelist fielded some questions from the audience during their talk but there was also an open discussion afterward. Below, I’ll try to summarize the main points each panelist made and then provide my own thoughts.

1. Preparing your CV and choosing where to apply.

This portion was heavily tailored towards biomedical informatics and towards our institution, which has a certain template for the CV, but I would guess that the main content is applicable at least to other biomedical informatics type people.

Structure of the CV:

  1. Name and contact information
  2. Education, most recent first. Your graduate training should include the title of your dissertation and the name(s) of your advisor(s).
  3. Relevant work experience, e.g. any appointments if applicable, teaching positions, industry positions, etc.
  4. Publications in peer-reviewed archival journals. Conference papers go here if they were peer-reviewed.
  5. Invited talks. If someone paid for you to come give the talk, it probably counts. These are great because it shows other people are interested in you or your work.
  6. Other publications, such as conference papers, book chapters, and popular articles.
  7. Service and leadership. This includes participation on review boards, program committees, and organizing committees.
  8. Awards and honors – fellowships, grants, and anything that was competitive in nature. Best Poster award? Check. Travel funding award? Check. Genius grant? Check.
  9. Other activities. Whatever you spend a significant amount of time on outside of your research and academics. If you’ve earned any distinctions in those activities, be sure to list them.
  10. Membership in professional societies or organizations, e.g. ISCB, AAAS, ACS, [insert acronym here].

Most people who review your CV will look first at sections 1, 2, and 4. Your publications – the titles, the number, and where they’re published – are likely to be what determine whether they look at the rest of the CV (for better or for worse). Then the rest of your CV is what will distinguish you from the rest of the pile.

Photo by SOCIALisBETTER on Flickr


Regarding where to apply, there were a few major points:

  • If you are applying for a post-doc, apply to places where you can learn something new, ideally at a different institution. Learning something new shows that you’re not a one trick pony. Going somewhere else shows that your success is due to YOU and not your graduate advisor.
  • Think about what you want to do and what your priorities are. Whether you prefer research or teaching or both will determine what kind of schools you should look for. Then, of course, there are considerations such as location, whether you have a significant other that factors into the equation, etc.
  • It’s similar to college or graduate admissions – you have your “reach” schools, your “target” schools, and your “safety” schools. Your target and reach schools should be institutions at the same level as your current one.
  • Don’t compare theoretical job offers, only actual ones. It’s pointless to spend time debating over job offers you don’t even have yet, so just apply to whatever you think you might be interested in. When you have offers, then you should spend time thinking about them.

And, one of the most important things to do is to leverage your network, in particular when applying for post-docs. Cold-calling a professor almost never works, so ask your advisors and your committee members to make some phone calls on your behalf – this is part of their job!

2. Crafting a compelling teaching statement.

If teaching is what you want to do, this statement is critical. First, you need to demonstrate your commitment to teaching. This means getting started as soon as possible actually teaching, whether it be tutoring, organizing workshops or talks, TA-ing and giving lectures, even writing review articles – whatever shows that you have engaged in activities requiring you to synthesize a lot of material and explain complicated concepts succinctly and effectively. Whenever possible, seek evaluative feedback from your audience so that you can figure out what you’re doing well and what you need to do better.

Photo by foundphotoslj on Flickr


As part of this, you need to make it obvious why you want to teach. What motivates you? Why is teaching important to you? Maybe it’s the satisfaction you get seeing someone improve their understanding due to your efforts. Whatever it is, use actual experiences to make your motivations concrete.

The second part is showing that you are a successful teacher. This could be through examples of how you helped people learn, through being invited to give lectures or talks, through awards or honors, etc. Videos were not mentioned but I can imagine these being useful material for the committee reviewing your application. SciVee or even JoVE, if you can publish there, could come in handy for this.

Even if you are applying to a research-heavy institution, you usually still need to include a teaching statement; it might just be shorter.

3. Crafting a compelling research statement.

What problems are you trying to solve and why? Again, they want to hear your motivations. What have you done so far to solve these problems and what are your research plans for the future? You need to convince them not only that you have a track record of investigating difficult and important questions, but that you have ideas you will explore successfully if they hire you. In essence, they want to hear what your first few grants will be.

Then, make sure you are describing everything for a broad scientific audience – don’t assume that everyone on your committee has an intimate knowledge of your problem area. You should also be able to step back and provide an overall vision for your work, placing it in context and recognizing its impact. With all things written, keep it as short as possible while still getting your points across! (This goes for the teaching statement, too.)

4. The job talk

So let’s say they liked your CV, your teaching and research statements, and your recommendation letters. At this point, they will invite you to visit them for a day or two, wherein you will give a talk and meet with multiple faculty and probably some students.

The #1 rule: Know your audience. The more you know about who you’re presenting for, the better you can plan for the scope and content of your talk and the better prepared you will be for questions that come up.

#2: You must know your work and your talk so well that you can adapt your talk on the fly and interact with the audience effectively. A necessary skill is the ability to field questions with confidence, grace, and the appropriate amount of humility. Most people don’t do this well and it is primarily a social skill. Repeating the question is often a good idea: it clarifies the question, gives you time to think, and, in some cases, clues the rest of the audience in to the situation if the question happens to be ridiculous. When someone takes issue with your work, you want to empower the questioner while still giving yourself and your work sufficient credit. Agree with whatever aspect of the question is accurate, but then give your perspective on why the rest of the statement may not be accurate. (If the questioner is insistent, it is often effective to express your interest in their statement and ask if you could discuss it with them afterward. Then, follow up! They might actually have a good point.)

Photo by psd on Flickr


#3: Be prepared to give a “chalk talk” – that is, a talk without slides or notes. This may happen as part of a second interview with a smaller audience, where the main goal is to be able to have a frank discussion with you about your research and your intended plans. Here, as with your research statement, it’s important to have a good idea of what your first two grants will be, to the detail of the specific aims.

Other thoughts: Start looking 6-12 months before you want to start the new job. Try to schedule interviews at “safety” schools before the ones you really care about. You’re going to make mistakes, and you want to make them early and learn from them. Go to job talks at your own school, to see how they’re done and to learn. Practice answering questions with friends and colleagues about your work. Have a 1 minute elevator pitch about your work and also a phrase or slogan that others can remember (e.g. if your work is on imaging informatics, it could be “so many images, so little time”). Also, letters of recommendation are extremely important. Things like giving back to your department or your field, being a good colleague, volunteering your time for scientific pursuits, showing initiative – these types of things will motivate your references to write you glowing letters of recommendation.

———————————

Many of these tips are useful for non-academic job applications as well, such as having an elevator pitch, being able to answer questions well, and engaging in activities that highlight your leadership and initiative. Something that is good to keep in mind is that once you’ve cleared one hurdle, it doesn’t matter by how much. So if you got a first round interview, don’t worry about how your CV might have stacked up to the others – focus on doing well on your job talk and interviews because that is all that matters now. If you make it to the second round, don’t worry about what might have happened in the first round. And once you get a job offer, you’re in control – they are committed to their choice and will do what they can to get you to accept. So learn from your mistakes, but keep looking forward!

How to write a bioinformatics research paper

A while ago, I posted my advisor’s take on the anatomy of a Ph.D. thesis. This post will be similar, except it tackles another sometimes intimidating task faced by graduate students – writing a paper. Again, this is my advisor’s take on the process, and given his experience and panache I’m inclined to agree with it; however, I myself don’t tend to follow guidelines terribly well so I’ll just say that this is a somewhat idealized process that usually becomes messier in practice. The hardest part is getting started, though, and then having a guideline is great for motivation. Though this example is geared towards life sciences/biomedicine/bioinformatics type papers, much of it should still be applicable to other fields.

Before you write: create an “elevator pitch”

Formulate a one-liner describing the main point of the paper. It should be specific, interesting, and related to how the paper contributes to the field. (A one-liner is good to have in your pocket for a lot of other things too, to answer questions like “What do you do / what is [bioinformatics]?” and “What is your research on?”)

Anatomy of a paper (not necessarily in the order you’ll write)

Abstract (7-10 sentences)

The abstract presents the logic of the paper. Because this is the first and often only part of the paper most people will read, it is extremely important to write it well.

  • Sentence 1: Describe the important unsolved problem.
  • Sentence 2: Emphasize the challenge/unsolvedness. (For grants and certain papers, this is known as the “people dying” sentence.)
  • Sentence 3: Describe the critical sub-problem of interest.
  • Sentence 4: Describe the opportunity presented. (This is the “sense of hope” sentence.)
  • Sentences 5-6: Briefly summarize the methods.
  • Sentences 7-8: Briefly summarize the results, including a few exact numbers or findings.
  • Sentence 9: Describe the specific contribution this research makes to the field.

Introduction

This is essentially a scholarly review of previous work, which involves scouring the literature to present the background to your problem and any relevant research that has been done previously by you or others. You should provide a deep analysis of the overall problem and its challenges, describe other approaches to the problem besides yours, identify the problems still unsolved, and explain what potential there is to solve them (aka your research). Provide a very brief overview of what you will be describing in the methods and telegraph your most important results.

Be generous with references – you can never be too thorough with your review, and the folks you reference may very well end up reviewing your paper. Just don’t inflate your reference list with irrelevant work.

(Materials and) Methods

This is what you did, but not necessarily in the order you did it. You should avoid a historical recounting (this is what your lab notes are for) and instead present the final process you followed that would produce your results, in a logical fashion. For example, you might first write about your data sets, any pre-processing on that data, then the specific algorithms you used to manipulate that data, and then the evaluation and analysis of that data.

Importantly, you should never apologize for anything in your Methods section – just report it, and save the justification/explanation for the Discussion section. Your goal is essentially to explain where the figures came from. Above all, DO NOT INCLUDE RESULTS IN YOUR METHODS SECTION.

Results

This section can be thought of as one big caption for all of your figures. Keep it concise, make sure to refer to your figures and tables whenever relevant, and try to have the flow mirror the flow of the Methods section. Above all, DO NOT DISCUSS YOUR RESULTS. Just report the results and save the whys and maybes for the Discussion.

Figures and Tables

These are the meat of your paper, and often the only other thing that people will look at besides the abstract. It therefore behooves you to spend time making them clear, useful, and aesthetically pleasing. They should be uncluttered but easy to interpret – make sure your axes and relevant data points are labeled and provide a legend if there are multiple types of data. You should also be sure that your figures can be read in grayscale, as not everyone prints in color (though the web makes this less of a problem).
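One rough way to sanity-check the grayscale advice before you print anything is to compare how bright each plot color will look once the color is stripped out. The sketch below is my own illustration, not part of my advisor’s talk: it uses the common Rec. 601 luma weights as a stand-in for grayscale conversion, and the function names, the example palette, and the `min_gap` threshold of 40 are all assumptions I picked for the example.

```python
from itertools import combinations


def luma(hex_color):
    """Approximate grayscale brightness (0-255) of an '#RRGGBB' color.

    Uses the Rec. 601 weights, one common approximation of perceived brightness.
    """
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.299 * r + 0.587 * g + 0.114 * b


def grayscale_clashes(palette, min_gap=40.0):
    """Return pairs of series whose colors would look too similar in grayscale.

    `palette` maps a series label to its '#RRGGBB' color. `min_gap` is an
    arbitrary threshold chosen for this sketch, not an established standard.
    """
    return [
        (a, b)
        for (a, color_a), (b, color_b) in combinations(palette.items(), 2)
        if abs(luma(color_a) - luma(color_b)) < min_gap
    ]


# Hypothetical palette: red and green look very different in color,
# but end up at nearly the same brightness in grayscale.
palette = {"method X": "#FF0000", "method Y": "#008000", "baseline": "#000000"}
print(grayscale_clashes(palette))  # [('method X', 'method Y')]
```

If a pair shows up in the output, consider changing one of the colors or distinguishing the series with different line styles or markers instead of relying on color alone.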

Start designing your figures and tables early as they are your results. See which ones are absolutely necessary and try to go for maximum information with minimum wasted space / reader effort. And as far as looks go, even just ditching the Excel defaults is already a huge improvement.

For captions, you want to go for thought control. Bring the reader’s attention to what you want them to see and use a little spin (e.g. “X clearly outperforms Y…”). Keep it honest, though, of course.

Discussion

For your discussion, start by listing the key points. Describe the positive aspects of your results first, and then admit the negatives with a little bit of spin to paint them in a more positive light – nothing dishonest here, just a discussion of some of the possible reasons for these negatives or why they might not be as negative as first thought. Make sure to justify or explain any controversial or unorthodox choices you may have made in your methods or analysis. At the end you may provide a brief positive summary of the work and a reflection on future work in this area (this can also be used as a Conclusion if applicable).

The writing process

See the graphic at the right for a rough sketch of what the writing process might look like. Basically you want to compose your one-liner first, followed by the abstract and the bulk of your figures and tables. This will give you much of the content of the paper as well as its logical flow. After the methods and results it gets more iterative, and you might find yourself going back to the methods or the results to check up on certain findings or to do additional analysis, which might then change the slant you take in your discussion, etc.

When writing your first draft, try to write it as quickly as possible while addressing the goals of each section. The idea is that you will rewrite it, but for now you want to have as many of your thoughts down on paper as possible.

When revising, look at every sentence both in isolation and relative to previous and subsequent sentences. The former is to ensure the sentence has clarity and voice; the latter is to ensure that the flow and logic of the paper are intact. Use transitional phrases where appropriate to guide the reader through this flow.

Stylistically, you want to default to active voice (“We designed a study” vs “a study was designed”), use “we” as opposed to “I”, and avoid colloquial phrases. Avoid repetition as well, unless it is deliberate. The occasional passive is acceptable if it breaks up repetition.

Summary

Get the point and logic of your paper down first by writing the one-liner and abstract, and then draft the rest as quickly as possible. Make sure you have most of the figures/tables you plan to have before you start writing since these are the meat of your paper. And then revise, revise, revise. Use active voice and pay attention to the overall flow of the paper. Don’t be afraid to make the writing interesting – you’ll make it that much more enjoyable for your reviewers and readers.

Obviously, you should have someone proofread your manuscript for technical details, but I highly recommend also getting a friend or colleague who is a good writer – or at least a native English speaker, if the paper is written in English – to proofread it and offer grammatical or stylistic advice.

Anatomy of a Ph.D. thesis

Let’s face it: life is complicated. But thanks to the ever-flourishing DIY industry (for example, WikiHow), a lot of endeavors that used to seem complicated are made much less so through step-by-step instructions. In science, experimental protocols already do this, at least in theory, but what about other aspects of science, like writing papers, keeping up with literature, making presentations, or networking at conferences? My advisor has given informal talks for his students on a number of these topics, the latest of which was a set of general guidelines for writing a Ph.D. thesis.

In order of their appearance in the final document…

  • Chapter 1 – Introduction. This is essentially an executive summary. You should briefly describe all contributions your thesis makes to your field, provide at least one “gee-whiz” result, and lay out a roadmap for the rest of the thesis (“In Chapter 2, I present the background… In Chapter 3, I discuss my work on X….”). It is acceptable to make claims without proof, since you will be defending these later on.
  • Chapter 2 – Background. This is essentially a literature review, and demonstrates your understanding of the field and the context surrounding your work. For bioinformatics theses, this covers both the biomedical domain and the area of informatics or computation your work involves. You should present an intellectual framework in which your work fits – what has been done, the advantages and limitations of this previous work, the potential avenues for improvement, and where you come in. Ideally, this chapter could be published as a review article with very little modification.

The next couple of chapters are the meat of the thesis, and can take at least two forms depending on what kind of work you did during your Ph.D. If you worked on several somewhat disjoint projects and published 2 or 3 papers on them, you can write one chapter for each paper (but no more than 3). If you worked on just one problem, you are probably better off writing a chapter for the methods and a chapter for the results and discussion (if you developed two approaches for the same problem, you can repeat this for the second approach). So:

GENERAL THEME: Several projects

  • Chapter 3 – Methods, Results, and Discussion from paper 1
  • Chapter 4 – Methods, Results, and Discussion from paper 2
  • (Chapter 5 – Methods, Results, and Discussion from paper 3, if applicable)

FOCUSED THEME: Single project

  • Chapter 3 – Methods
  • Chapter 4 – Results/Discussion
  • (Chapters 5 and 6 – Methods and Results/Discussion for approach 2, if applicable)

In general, you do not want to reuse text from your published papers verbatim, despite how tempting this can be. Papers are very strict and limit what you can express, so you should see your thesis as an opportunity to pontificate and give voice to your ideas. You should also form your thesis into a detailed guide of everything you tried, even some of the things that didn’t work, so that it can be a reference to future generations of grad students who may pursue extensions of your research.

  • Final chapter (Chapter 5, 6, or 7, depending on the type of thesis and how many projects or approaches it covers) – Summary chapter. Describe overall contributions to the relevant domains. (For biomedical informatics theses, describe the overall contributions to biology or medicine, and the overall contributions to informatics or engineering. If applicable, you may also describe core contributions to computer science.) Here is where you also discuss the limitations of the work, the unsolved problems, and your best ideas for how to solve them.
  • Appendices – supplementary material. Almost anything goes, but you should definitely include all key data and datasets (information needed to recreate the major results from your thesis). Ideally, all data relevant to your thesis (and other related work, if possible) will be stored and/or made available either on the web or as a physical copy, though this is mostly for the advisor as a reference for future students. If you have any proofs or supplementary material, these should be in an appendix. You can also include additional work or papers published unrelated to your thesis.

So that explains what each chapter of the thesis should be about; what about actually writing the thesis? My advisor’s recommendation is to start with the meat chapters (Chapter 3 onward, up to but not including the summary chapter) since you should have pretty much all the necessary material to begin with, then write Chapter 2, then write the first and last chapters.

More specific advice on how to actually write each chapter was not covered and probably warrants its own post. Note that this is my advisor’s take on the Ph.D. thesis; I’m sure there are some other interpretations, which would be interesting to hear! How much does the thesis vary by field?