Get it while it’s hot! 23andMe for $99

You may have already heard the rumors floating around and they’re all true: 23andMe is having another sale — the whole thing for $99!

Edit: No discount codes are needed. There’s an instant $400 discount thanks to the Black Friday+ sale, which will go until Cyber Monday 11/29 or while supplies last. Head on over to the store, or read on for a bit more info.


Reflections on ASHG 2010

As conferences go, the American Society of Human Genetics (ASHG) annual meeting is a pretty big deal. Anyone who’s anyone in human genetics is there, and if you want to be someone you’d better be there, too. And it’s big – this year’s meeting saw more than 6,000 attendees spread throughout a gigantic convention center that spanned four square blocks in the heart of Washington, D.C. Academics, publishers, clinicians, policy wonks, and industry reps staked out their territory among an endless sea of posters, eye-popping demo booths, and cavernous session halls. The international meetings for bioinformatics that I’ve gone to in the past seemed quaint by comparison.

At bioinformatics conferences, the common theme is computational methods, applied to a wide variety of topics. At a conference like ASHG, the common theme is human genetics, probed and interpreted with a variety of methods. But even the topic is breathtakingly broad. Sessions covered complex disease, non-coding RNAs, methylation, ethical/social/legal/education issues surrounding genomic research and genetic testing, mouse models, high-throughput sequencing, population and evolutionary genetics, pharmacogenetics, cilia, computational methods, and Mendelian disorders, to name just a few.

I made my first visit to ASHG this year as part of a small contingent from 23andMe*, a direct-to-consumer genomics company. Although I missed a good portion of the conference due to my schedule, some of my colleagues took notes on sessions that I missed, and ample coverage of many of the sessions could be had by following the Twitter hashtag #ashg2010. The following summaries and reflections represent a composite of tweets, other people’s notes, and my personal notes and impressions.

No comment

At the risk of beating the issue to death, I offer yet another post on the question, “why don’t scientists comment on scientific articles?” Previous reflections stood within the larger context of scientific impact and article-level metrics, and I’ve also attempted some superficial analysis of commenting behavior at PLoS, BMJ, and BMC. More recently (and this is why the topic is on my mind again), a room full of bright minds at the PLoS Forum (including Cameron Neylon and Jon Eisen) scratched their heads over it and came up with pretty much the same conclusion as everyone else who’s ever thought about the problem — the costs simply outweigh the benefits.

The costs, in principle, are minimal. You might need to register for an account at the journal website and be logged on, but then all that’s needed is little more than what most of us already do multiple times a day with our email — type into a box and click “submit”. (In practice, there may be nonsensical, hidden costs that make you wonder what the folks at those journals were smoking.) So the perception that the cost-benefit equation doesn’t work speaks more to the lack of benefit than anything else.

Photo by jamesclay on flickr


A brief analysis of commenting at BMC, PLoS, and BMJ

As announced on FriendFeed and Twitter, a writing collaboration between me and the inimitable Cameron Neylon has just been published at PLoS Biology, “Article-level metrics and the evolution of scientific impact”! (Loosely based on a blog post from several months ago.)

One of the many issues Cameron and I touched on was the problem of commenting. Most people probably aren’t aware of the problem; after all, commenting is alive and well on the internet in most places you look! But click over to PLoS or BioMed Central (BMC) and the comment sections are the digital equivalent of rolling tumbleweed.

As we mention briefly in the article, comments have great potential for improving science. For one thing, they’re a form of peer review, but without the month-long wait and seemingly arbitrary review criteria. Readers, authors, and other evaluators can also get a sense of what people think about the article. The ideal is certainly tantalizing — vigorous, rigorous debates over the finer scientific points as well as the overarching conclusions with participation both from experts in the field as well as informed laypeople, always with intelligence and civility!!!1!11!!one!! But let’s not kid ourselves — the worst-case scenario is all too easy to imagine and would probably look something like the discussions over at YouTube.

And this would be positively urbane. (From PhD comics)


In memoriam: Warren DeLano



PyMOL has starred in many journal covers

On Tuesday, November 3rd, the scientific community suffered a great loss with the passing of Warren DeLano. Most people know him as the creator of PyMOL, a popular and extremely powerful molecular visualization tool, but most – including myself, until recently – may not know all of the other unique qualities that made Warren a mentor, collaborator, inspiration and friend to many. And by making PyMOL open source, Warren demonstrated his generosity and ensured that his work would continue to help future generations of scientists.

Scripts and hacks for curation

When you curate scientific literature, there are lots of little tasks and procedures and requirements that can all add up into a big inefficient mess if they’re not integrated very well into your workflow. Some of these things are in-house and so you have some level of control over how they are handled (depending, sometimes, on engineering or management), while for others you are at the mercy of the wilderness that is science. It would be great, for example, if all abstracts contained enough information for us to evaluate whether the rest of the article is worth reading – or, in many cases, worth buying in order to read it.

Photo by hbart on Flickr


In the context of genetic association literature, it would also be great if all abstracts used standard database identifiers for SNPs (i.e. rs #s), as this unambiguously defines the variant; used standard means of reporting the association (e.g. odds ratios with 95% confidence intervals and p-values); and mentioned pertinent aspects of the study, such as the population, number of cases and controls, and adjustments for confounders and multiple comparisons. For me, this qualifies as “enough information to evaluate whether the rest of the article is relevant.” I hope that when they do not mention things like the corrected p-values, it is not because their p-values were not significant. When articles cost up to $90 a pop… well, let’s not get me started.

But I digress. The point is that there are lots of things that could make curation a challenge, and consequently there are lots of things that could make curation easier. Standardization of abstracts, while it doesn’t make for juicy reading, makes going through high volumes of abstracts easier (and machine-accessible). Linking articles from journal websites to PubMed would also be useful, as PubMed serves as a portal to many other resources. Currently, almost all journals use digital object identifiers (DOIs), which are unique pointers to objects on the web. But PubMed IDs (PMIDs) are a little simpler than DOIs, and provide a lot of useful functionality through integration with NCBI’s many databases. You can imagine all the little scripts and hacks you could come up with to improve the curation process, using Greasemonkey scripts on Firefox, bookmarklets on any browser, and even web apps.

One somewhat mundane task we often have to do is search for a paper on PubMed to get the PMID. This is straightforward given we already know the authors, title, journal, etc, but still kind of a pain. Fortunately, PubMed allows you to search by DOI, which almost all publishers provide. So a slight improvement is to use the DOI as the search term in PubMed, as this will return the exact result if the DOI exists in the database. But you still have to open up a new browser window or navigate to PubMed and copy and paste the DOI into the search bar. To reduce the number of steps even further, we can use a simple bookmarklet containing a bit of javascript (if it looks cut off, you can still double-click copy and paste it):


This script extracts whatever text you’ve highlighted on a page and attempts to search PubMed using it as the DOI. So obviously it will only work if you’ve highlighted something and that something is a DOI, and that DOI is in PubMed. But assuming you do and it is, it will send you directly to the PubMed entry for that paper. Save the script as a browser bookmark, put the bookmark in your bookmarks bar, and whenever you’re on an article webpage (or RSS feed) and want to see the PubMed entry for that article, just highlight the DOI and click the bookmark. (Cameron wrote up a pipe on Yahoo!Pipes a while ago that does something similar, which inspired this bookmarklet.)
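For anyone who would rather see the logic spelled out, here is a rough Python sketch of the same idea. It is not the bookmarklet itself, just an illustration of "take some text, treat it as a DOI, and build a PubMed search link". The function name pubmed_url_for_doi is made up for this example, and the web address and the [AID] (Article Identifier) field tag reflect my understanding of the current PubMed site rather than whatever the bookmarklet used.

```python
from urllib.parse import quote

# Assumption: the current PubMed web address, not necessarily the one the
# original bookmarklet pointed at.
PUBMED = "https://pubmed.ncbi.nlm.nih.gov/"


def pubmed_url_for_doi(selected_text):
    """Build a PubMed search URL from text that is expected to be a DOI."""
    doi = selected_text.strip()
    # Tolerate a leading "doi:" label, which often gets highlighted too.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Every DOI starts with the "10." directory prefix.
    if not doi.startswith("10."):
        raise ValueError("that does not look like a DOI: %r" % selected_text)
    # [AID] asks PubMed to match the Article Identifier field only.
    return PUBMED + "?term=" + quote(doi + "[AID]")


if __name__ == "__main__":
    print(pubmed_url_for_doi("doi:10.1159/000189631"))
```

As with the bookmarklet, this only helps if the text really is a DOI and PubMed knows about it.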

Clearly even this simple hack can be improved – it would be nice, perhaps, to have it return the PMID in an alert box so you can make a note and then continue doing whatever you were doing, rather than being sent away to PubMed (this might make use of AJAX?). It would be nice if you didn’t have to highlight, but the script would look for and extract the DOI from the page automatically. And I’m sure you could add even more bells and whistles, within reason.
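One possible route to the "just give me the PMID" version, sketched in Python rather than in the browser: NCBI's E-utilities offer an ESearch service that returns matching IDs instead of a web page. The sketch below assumes the JSON output mode (retmode=json) and the [AID] field tag; the function names are hypothetical, and the sample response in the comment uses a made-up PMID.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url_for_doi(doi):
    """Build an ESearch request that looks a DOI up in PubMed."""
    params = {"db": "pubmed", "term": doi + "[AID]", "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


def pmids_from_esearch(json_text):
    """Pull the list of PMIDs out of an ESearch JSON response."""
    # Expected shape (PMID made up for illustration):
    # {"esearchresult": {"count": "1", "idlist": ["12345678"]}}
    result = json.loads(json_text).get("esearchresult", {})
    return result.get("idlist", [])


def lookup_pmids(doi):
    """Query NCBI over the network and return any matching PMIDs."""
    with urlopen(esearch_url_for_doi(doi)) as response:
        return pmids_from_esearch(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(lookup_pmids("10.1159/000189631"))
```

An empty list means PubMed did not recognize the DOI, which is the scripted equivalent of landing on an empty search page.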

My latest hackneyed…. “hack-need”… is to be able to identify follow-up studies for a particular genetic association. If you read a paper with PMID X saying SNP A is significantly associated with a disease, it would be really useful to know when future studies look into that association and either replicate or contradict the finding. Hopefully when they do so, they cite PMID X and/or mention SNP A. Essentially, I’d like to query PubMed for papers that cite a given PMID or SNP (via rs #). Ideally, I could do this in batch for many PMIDs and many SNPs automatically, and have each query return only results that are newer than the previous query (or query date). Then I set the script running behind the scenes, process the results using another script, and maybe have it send me an email with a list of new PMIDs to look into every week. Can world domination be far behind?*

Seriously though, I am looking for tips on how to do this follow-up identification thing, so any help appreciated. Pierre has given me some useful hints for how to search PubMed for papers citing a given rs #, and it would be great if this could be modified with dates:

(insert your favorite rs # as the id)
Update: reldate limits results to those within a number of days immediately preceding today’s date; could also use mindate and maxdate to specify a date range.
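To make the date options concrete, here is a hedged Python sketch that builds date-limited queries. It is not the URL from Pierre's hint. Instead it uses E-utilities ESearch to do a plain text search of PubMed for the rs #, which only catches papers that mention the SNP in their indexed text. It also builds an ELink request using the pubmed_pubmed_citedin link name for "papers citing PMID X", which as far as I know only covers citations from articles in PubMed Central. Function names are hypothetical.

```python
import re
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def recent_mentions_url(rs_id, days):
    """ESearch URL for PubMed records mentioning rs_id, added in the last `days` days."""
    if not re.fullmatch(r"rs\d+", rs_id):
        raise ValueError("expected something like rs1234, got %r" % rs_id)
    params = {
        "db": "pubmed",
        "term": rs_id,
        "reldate": days,      # only records from the last N days
        "datetype": "edat",   # count days from the date the record entered PubMed
        "retmode": "json",
    }
    return EUTILS + "esearch.fcgi?" + urlencode(params)


def mentions_between_url(rs_id, mindate, maxdate):
    """Same search, limited to an explicit range (dates as YYYY/MM/DD)."""
    params = {
        "db": "pubmed",
        "term": rs_id,
        "mindate": mindate,   # mindate and maxdate must be given together
        "maxdate": maxdate,
        "datetype": "edat",
        "retmode": "json",
    }
    return EUTILS + "esearch.fcgi?" + urlencode(params)


def citing_papers_url(pmid):
    """ELink URL for PubMed records that cite the given PMID."""
    params = {
        "dbfrom": "pubmed",
        "db": "pubmed",
        "id": pmid,
        "linkname": "pubmed_pubmed_citedin",
    }
    return EUTILS + "elink.fcgi?" + urlencode(params)
```

Looping over a list of rs #s and PMIDs once a week, and remembering the date of the last run, would cover most of the batch wish list above.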

* Why stop there, you might ask? I could write a script that downloads the abstracts, “reads” them, filters out the irrelevant ones, summarizes the important information, and populates curation reports. But then I’d be out of a bloody job…

New job and curation 101

It’s been several weeks now since I started working at 23andMe, a personal genomics company located in Mountain View, CA. Perhaps not coincidentally, it’s also been several weeks since I last blogged. The transition hasn’t been difficult, but it did take some getting used to, mentally and physically. I mean, leaving for work by 8:30am? Regular hours? Commuting??

Ok, so I really have nothing to complain about. 8:30 isn’t that early, and I could shave half an hour off each end of my commute if I didn’t choose to take advantage of bike-friendly roads, good weather, and a company-sponsored free train pass (OMG benefits!?). All in all, things are pretty much fantastic. The work environment is friendly, flexible, and laid-back; we have plenty of food and drink to keep us fueled throughout the day, and regular workouts/yoga if we need to get fired up or mellowed down (and to keep the “Free Food 15” at bay). Plus, personal genomics is a super interesting and rapidly evolving industry, so there’s really never a dull moment.

So what is personal genomics, anyway? We’ve known for a while that genetics – the sequence of DNA inside our cells – plays an important role in our form and functioning. Many diseases are caused by changes in DNA (often in genes, parts of DNA that code for proteins) that alter the normal functioning of cells, though not all genetic differences lead to negative changes. (Genetics can also tell us about ancestry – who is related to whom and the history of populations – but I won’t be addressing that in this post.) Where it gets personal is when you apply it to individuals, such as when someone gets a genetic test to determine whether they have or are at risk of developing or passing on a particular disease. Where it gets genomics is when we use high-throughput technologies to do what is essentially thousands of genetics tests at once. Put them together, and you get personal genomics.

How do we know what genetic “pieces” correspond to what conditions or diseases? The general strategy is to compare the DNA of a whole bunch of individuals that have that condition (cases) to a whole bunch of individuals that don’t (controls). As long as both groups are similar save for their case-control status, any significant genetic differences between them should have something to do with that condition. We call this a genetic association.
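For the numerically inclined, one common way to summarize such a comparison is the odds ratio with a 95% confidence interval. The Python sketch below uses the standard 2x2 table formula with a normal approximation on the log scale; the counts are made up for illustration, and real studies adjust for confounders and multiple comparisons in ways this toy example ignores.

```python
import math


def odds_ratio_with_ci(case_with, case_without, control_with, control_without, z=1.96):
    """Odds ratio and approximate 95% CI from a 2x2 case-control table."""
    odds_ratio = (case_with * control_without) / (case_without * control_with)
    # Standard error of ln(OR): square root of the summed reciprocal counts.
    se = math.sqrt(
        1 / case_with + 1 / case_without + 1 / control_with + 1 / control_without
    )
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Made-up counts: 60 of 100 cases carry the variant, 40 of 100 controls do.
    print(odds_ratio_with_ci(60, 40, 40, 60))
```

An interval that excludes 1 suggests the variant is more (or less) common among cases than chance alone would explain.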

It turns out that there are millions of single locations in the human genome where the exact sequence of the DNA might differ between two people, and these places, called single nucleotide polymorphisms, or SNPs, can contribute to differences we can observe, such as whether you flush when you drink alcohol or how easily you put on weight. 23andMe’s personal genomics kit determines what your sequence is for a representative subset of SNPs. Many are already known to be associated with certain conditions, and new research is being done every day to uncover more and more of these associations.

So what exactly do I do at 23andMe? My official job title is “Scientist, Content Curation”. Curation, I’ve found, is not very familiar to most people. Most people probably know that there is such a thing as a museum curator, but might not know what they do. Hardly anyone has ever heard of scientific curation. (And I thought explaining what I was studying as a grad student was hard! Biomedical informatics, anyone?)

But it’s really not that complicated. The essence of curation is almost always the same: the selection, acquisition, and management of content. What that content is differs depending on the field – for example, an art curator might look for and organize artwork for exhibition in a gallery, while a curator in the “Ancient Civilizations” department of a museum may be in charge of acquiring, managing, and presenting archaeological artifacts.

In science, curation involves organization of scientific knowledge and data. An area where this has been especially important is the life sciences, as the amount of information being generated by high-throughput experiments, large-scale projects, and scholarly publishing has skyrocketed. In order to manage this information and render it useful to others, the field of biocuration was born. Any database that organizes scientific knowledge – UniProt (the Universal Protein resource), FlyBase (database for that very important model organism, Drosophila), PharmGKB (a database focused on how genes and drugs interact), etc – depends on curators to keep the information up to date and easy to use.

And so it is with 23andMe. The genetic testing kit is one part of the product, but the other part is information – what knowledge is there about associations between the SNPs on our platform and health traits or conditions? What does your particular data mean? The science is far from exhausted on this subject, and in order to stay up to date with the research, 23andMe spends a lot of effort on curating the scientific literature for new genetic associations and presenting the information on our website for our customers.

Day to day, this means that we keep track of papers recently published in scientific journals, skim through to find ones that may have promising findings, and then vet these more thoroughly to see if they pass our stringent scientific standards. If they do, we extract the bits of information we need and put the bits together in reports that will eventually become part of the content on the website. It’s a job that definitely benefits from an organized system and an eye for detail – as well as a sense of curiosity.

After three weeks on the job, I think I’m starting to get the hang of the day-to-day work. Since my work is even more directly tied to the literature than it was as a graduate student in academia, I’m also developing an enhanced awareness of issues surrounding scientific publishing – those related to standardization and metadata, publication bias towards positive results, and closed vs. open access. The hardest aspect of transitioning from academia to industry hasn’t been the regular schedule, or the work environment, or the work itself; it’s been getting used to being on the other side of the pay-wall of scientific journals.

But that’s a rant for another time. ;)

How personal genomics is rocking the boat

I’ve been doing some reading on personal genomics, direct-to-consumer genetic tests, and personalized medicine lately, in an effort to steep myself in the science and issues prior to starting work in this field. Today, I read an opinion piece by R. J. Carlson titled “The disruptive nature of personalized medicine technologies: implications for the health care system,” [1] that was especially interesting. Rather than expound on the usual arguments for or against consumer genomics, it laid out several important areas where personalized medicine and genomics technologies would disrupt the current system, often with brutal honesty.

Clearly, one of these areas is private health insurance. Describing private health insurance as “a hybrid of economic ruthlessness and utilitarian social policy … is supposed to perform the social policy role that the public sector can’t or won’t, and that is to ration,” Carlson points out some sobering scenarios. One is that the Genetic Information Nondiscrimination Act (GINA) covers only the underwriting process, and does not guard against denial of coverage or steep increases in premiums once a genetically-suggested condition manifests. Another is the moral and social dilemma posed by the knowledge – on either side – that those with “demonstrably superior health” are subsidizing care for those with “known genetic risk”. And, given the increasing knowledge we’ll have about health risks, it would be ridiculous not to use any of it in designing insurance packages. Carlson doesn’t paint this as a negative thing, necessarily, but instead calls on public policy to “facilitate the constructive uses of these data by shaping financial and access reforms to the genomics medicine that is arriving.”

The debate over health insurance is fairly familiar, however. What Carlson makes very clear in the rest of the piece is that personal genomics takes medicine in a fundamentally different direction than where it has been going for the last half century. Traditional modern medicine has focused on mechanism and reductionism, finding what’s wrong and fixing it, and applying that knowledge to new cases of the same thing. We use the fact that humans are more or less similar to enact standards of care.

But personalized medicine focuses on the differences between people and treats every patient as a unique case. This leads to two natural consequences: it makes medical care more costly, and it renders the standardization of medical practice obsolete, if not impossible. Of course, personalized medicine could conceivably be more cost-effective through better preventative care, but this is only if significant effort goes towards realizing this potential. And although I hadn’t thought about personal genomics in the context of evidence-based medicine, it’s not hard to see the conflict:

There’s the rub: to be effective, a personalized medicine must build on our ever more definitive differences, defying standardization for the very long haul, if ever. Measuring quality in health care under a genomics model is crudely analogous to measuring automobile fuel efficiency when every automobile is assembled from a wide array of materially different but functionally interchangeable parts, performs differently on every trip, and changes in performance with the moods and capacities of every driver.

This article captures the nuances of some very interesting challenges facing health care in response to genomics technologies with a view that is both realistic and optimistic. Carlson recognizes that the era of medical paternalism is giving way to democratization of health information, and we must adapt our policies to reflect this. Indeed, he argues that without active and careful management of this process, we may very well sabotage our ability to reap any rewards from this technology.

Definitely worth a read, and worth thinking about.

[1] Carlson RJ. (2009) The disruptive nature of personalized medicine technologies: implications for the health care system. Public Health Genomics 12(3):180-184.
DOI: 10.1159/000189631 [PubMed] [Journal]

Scientific discourse as an epic FAIL

A post on FriendFeed pointed me to this blog post in Adventures in Ethics and Science discussing a particularly infuriating example of just how broken the current system of scientific publishing can be. The epic tale is presented by Prof. Rick Trebino in a PDF document (above) outlining “How to Publish a Scientific Comment in 123 Easy Steps”. This version includes his second addendum in which he gives many excellent (and some painfully obvious) suggestions for how to improve the system.

Here’s a preview:

1. Read a paper in the most prestigious journal in your field that “proves” that your entire life’s work is wrong.

2. Realize that the paper is completely wrong, its conclusions based entirely on several misconceptions.  It also claims that an approach you showed to be fundamentally impossible is preferable to one that you pioneered in its place and that actually works.  And among other errors, it also includes a serious miscalculation—a number wrong by a factor of about 1000—a fact that’s obvious from a glance at the paper’s main figure.

3. Decide to write a Comment to correct these mistakes—the option conveniently provided by scientific journals precisely for such situations.

6. Prepare further by writing to the authors of the incorrect paper, politely asking for important details they neglected to provide in their paper.

7. Receive no response.

15. Write a Comment, politely explaining the authors’ misconceptions and correcting their miscalculation, including illustrative figures, important equations, and simple explanations of perhaps how they got it wrong, so others won’t make the same mistake in the future.

16. Submit your Comment.

17. Wait two weeks.

18. Receive a response from the journal, stating that your Comment is 2.39 pages long. Unfortunately, Comments can be no more than 1.00 pages long, so your Comment cannot be considered until it is shortened to less than 1.00 pages long.

20. Remove all unnecessary quantities such as figures, equations, and explanations.  Also remove mention of some of the authors’ numerous errors, for which there is now no room in your Comment; the archival literature would simply have to be content with a few uncorrected falsehoods.  Note that your Comment is now 0.90 pages.

21. Resubmit your Comment.

22. Wait two weeks.

23. Receive a response from the journal, stating that your Comment is 1.07 pages long. Unfortunately, Comments can be no more than 1.00 pages long, so your Comment cannot be considered until it is shortened to less than 1.00 pages long.

And so the saga begins. Really, the whole thing makes my blood boil.

Fun Mac OS X command: say

Group meetings in the Altman lab often kick off with a Unix or computing tip. These range from examples of built-in but lesser known utilities that make our lives at the command line easier, to scripting hacks, to full-fledged applications you download and install.

At the last group meeting I attended, the presenter showed us a fun little command that comes with Mac OS X, called ‘say’. This command basically does what you think it does – it says whatever comes after it. Here’s a simple example:

shwu$ say hello world

The default voice is whatever is set as the default in your system (usually a female voice, unless you’ve changed it), but there are many others you can use by setting the -v parameter:

shwu$ say -v Agnes "this is another woman's voice"
shwu$ say -v Bruce "this is a man's voice"

Some are especially fun, like “Bad News”, Bubbles, “Pipe Organ”, Trinoids, and Zarvox. Others are a little weird, like Albert and Whisper. And then there are ones you just shouldn’t use if you’re home alone at night – Hysterical and Deranged, for example. A more complete list can be found here.

The ‘say’ command isn’t just for amusing yourself, though the tricks you could play on people remotely are endless. You can also use it in conjunction with other commands or in scripts:

shwu$ python3 -c "print('stuff')" && say "done printing stuff" || say "you have a bug in your script"

will say ‘done printing stuff’, whereas if I’d left out one of the single quotes in the python command it would have said ‘you have a bug in your script’ instead. This is great for when you start a script running and turn your attention to YouTube videos, er, other work, but want to be notified when your script either finishes or encounters an error.
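One wrinkle with the one-liner: in a chain like A && B || C, the last command also runs if B itself fails, not only when A does. If that matters, a small wrapper makes the branching explicit. The Python sketch below is one way to do it; the function names are made up, and the actual speaking part assumes you are on a Mac with ‘say’ available.

```python
import subprocess
import sys


def spoken_message(returncode, task="script"):
    """Choose what to announce based on the command's exit code."""
    if returncode == 0:
        return "done running " + task
    return "%s failed with exit code %d" % (task, returncode)


def run_and_announce(command, task="script"):
    """Run a command (given as a list of arguments), then speak the outcome."""
    completed = subprocess.run(command)
    # 'say' ships with Mac OS X; on other systems this call will fail.
    subprocess.run(["say", spoken_message(completed.returncode, task)])
    return completed.returncode


if __name__ == "__main__":
    # Usage: python announce.py python3 my_analysis.py
    sys.exit(run_and_announce(sys.argv[1:], task="analysis"))
```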

Bench scientists can get in on the fun, too. Suppose you have a complicated pipetting protocol that specifies different amounts of different things in different places. A long list can be cumbersome to print out or read, so why not ‘say’ it instead? (Actually, while you can specify a file for it to say using -f, I’m not sure how you would specify pauses if you had it read your aliquots from a text file… so you might need to create a script that wraps all the aliquot amounts in ‘say’ commands with pauses in between, and then put all that in another script… anyway, it would be pretty cool and all your lab mates would be jealous. Or maybe they’d just think you’re strange.)
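Here is a minimal sketch of that wrapper idea in Python: it takes a list of protocol steps and writes out a shell script of ‘say’ commands separated by ‘sleep’ pauses. The function name, the example steps, and the five-second default pause are all made up for illustration.

```python
import shlex


def say_script(steps, pause_seconds=5, voice=None):
    """Return shell script text that speaks each step with a pause in between."""
    lines = ["#!/bin/sh"]
    for i, step in enumerate(steps):
        command = "say "
        if voice:
            command += "-v " + shlex.quote(voice) + " "
        # shlex.quote keeps apostrophes and spaces from confusing the shell.
        lines.append(command + shlex.quote(step))
        if i < len(steps) - 1:
            lines.append("sleep %d" % pause_seconds)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    protocol = [
        "10 microliters of buffer into tube 1",
        "2 microliters of primer into tube 1",
    ]
    # Redirect the output to a file, then run that file with sh.
    print(say_script(protocol, pause_seconds=10), end="")
```

Saving the output as, say, protocol.sh and running it with sh would read the steps aloud at bench pace.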