Get it while it’s hot! 23andMe for $99

You may have already heard the rumors floating around and they’re all true: 23andMe is having another sale — the whole thing for $99!

Edit: No discount codes are needed. There’s an instant $400 discount thanks to the Black Friday+ sale, which runs through Cyber Monday (11/29) or while supplies last. Head on over to the store, or read on for a bit more info.

Read more of this post

Reflections on ASHG 2010

As conferences go, the American Society of Human Genetics (ASHG) annual meeting is a pretty big deal. Anyone who’s anyone in human genetics is there, and if you want to be someone you’d better be there, too. And it’s big: this year’s meeting saw more than 6,000 attendees spread throughout a gigantic convention center that spanned four square blocks in the heart of Washington, D.C. Academics, publishers, clinicians, policy wonks, and industry reps staked out their territory among an endless sea of posters, eye-popping demo booths, and cavernous session halls. The international bioinformatics meeting that I’ve gone to in the past seemed quaint by comparison.

At bioinformatics conferences, the common theme is computational methods, applied to a wide variety of topics. At a conference like ASHG, the common theme is human genetics, probed and interpreted with a variety of methods. But even the topic is breathtakingly broad. Sessions covered complex disease, non-coding RNAs, methylation, ethical/social/legal/education issues surrounding genomic research and genetic testing, mouse models, high-throughput sequencing, population and evolutionary genetics, pharmacogenetics, cilia, computational methods, and Mendelian disorders, to name just a few.

I made my first visit to ASHG this year as part of a small contingent from 23andMe*, a direct-to-consumer genomics company. Although I missed a good portion of the conference due to my schedule, some of my colleagues took notes on sessions that I missed, and ample coverage of many of the sessions could be had by following the Twitter hashtag #ashg2010. The following summaries and reflections represent a composite of tweets, other people’s notes, and my personal notes and impressions.

No comment

At the risk of beating the issue to death, I offer yet another post on the question, “why don’t scientists comment on scientific articles?” Previous reflections stood within the larger context of scientific impact and article-level metrics, and I’ve also attempted some superficial analysis of commenting behavior at PLoS, BMJ, and BMC. More recently (and this is why the topic is on my mind again), a room full of bright minds at the PLoS Forum (including Cameron Neylon and Jon Eisen) scratched their heads over it and came up with pretty much the same conclusion as everyone else who’s ever thought about the problem — the costs simply outweigh the benefits.

The costs, in principle, are minimal. You might need to register for an account at the journal website and be logged on, but then all that’s needed is little more than what most of us already do multiple times a day with our email — type into a box and click “submit”. (In practice, there may be nonsensical, hidden costs that make you wonder what the folks at those journals were smoking.) So the perception that the cost-benefit equation doesn’t work speaks more to the lack of benefit than anything else.

Photo by jamesclay on Flickr



A brief analysis of commenting at BMC, PLoS, and BMJ

As announced on FriendFeed and Twitter, a writing collaboration between me and the inimitable Cameron Neylon has just been published at PLoS Biology, “Article-level metrics and the evolution of scientific impact”! (Loosely based on a blog post from several months ago.)

One of the many issues Cameron and I touched on was the problem of commenting. Most people probably aren’t aware of the problem; after all, commenting is alive and well on the internet in most places you look! But click over to PLoS or BioMed Central (BMC) and the comment sections are the digital equivalent of rolling tumbleweed.

As we mention briefly in the article, comments have great potential for improving science. For one thing, they’re a form of peer review, but without the month-long wait and seemingly arbitrary review criteria. Readers, authors, and other evaluators can also get a sense of what people think about the article. The ideal is certainly tantalizing: vigorous, rigorous debates over the finer scientific points as well as the overarching conclusions, with participation from both experts in the field and informed laypeople, always with intelligence and civility!!!1!11!!one!! But let’s not kid ourselves: the worst-case scenario is all too easy to imagine and would probably look something like the discussions over at YouTube.

And this would be positively urbane. (From PhD comics)



In memoriam: Warren DeLano

 


PyMOL has starred in many journal covers

On Tuesday, November 3rd, the scientific community suffered a great loss with the passing of Warren DeLano. Most people know him as the creator of PyMOL, a popular and extremely powerful molecular visualization tool, but most – including myself, until recently – may not know all of the other unique qualities that made Warren a mentor, collaborator, inspiration and friend to many. And by making PyMOL open source, Warren demonstrated his generosity and ensured that his work would continue to help future generations of scientists.

Scripts and hacks for curation

When you curate scientific literature, there are lots of little tasks and procedures and requirements that can all add up into a big inefficient mess if they’re not integrated very well into your workflow. Some of these things are in-house and so you have some level of control over how they are handled (depending, sometimes, on engineering or management), while for others you are at the mercy of the wilderness that is science. It would be great, for example, if all abstracts contained enough information for us to evaluate whether the rest of the article is worth reading – or, in many cases, worth buying in order to read it.

Photo by hbart on Flickr


In the context of genetic association literature, it would also be great if all abstracts used standard database identifiers for SNPs (i.e. rs #s), as this unambiguously defines the variant; used standard means of reporting the association (e.g. odds ratios with 95% confidence intervals and p-values); and mentioned pertinent aspects of the study, such as the population, number of cases and controls, and adjustments for confounders and multiple comparisons. For me, this qualifies as “enough information to evaluate whether the rest of the article is relevant.” I hope that when they do not mention things like the corrected p-values, it is not because their p-values were not significant. When articles cost up to $90 a pop… well, let’s not get me started.

But I digress. The point is that there are lots of things that could make curation a challenge, and consequently there are lots of things that could make curation easier. Standardization of abstracts, while it doesn’t make for juicy reading, makes going through high volumes of abstracts easier (and machine-accessible). Linking articles from journal websites to PubMed would also be useful, as PubMed serves as a portal to many other resources. Currently, almost all journals use digital object identifiers (DOIs), which are unique pointers to objects on the web. But PubMed IDs (PMIDs) are a little simpler than DOIs, and provide a lot of useful functionality through integration with NCBI’s many databases. You can imagine all the little scripts and hacks you could come up with to improve the curation process, using Greasemonkey scripts on Firefox, bookmarklets on any browser, and even web apps.

One somewhat mundane task we often have to do is search for a paper on PubMed to get the PMID. This is straightforward given we already know the authors, title, journal, etc., but still kind of a pain. Fortunately, PubMed allows you to search by DOI, which almost all publishers provide. So a slight improvement is to use the DOI as the search term in PubMed, as this will return the exact result if the DOI exists in the database. But you still have to open up a new browser window or navigate to PubMed and copy and paste the DOI into the search bar. To reduce the number of steps even further, we can use a simple bookmarklet containing a bit of JavaScript (if it looks cut off, you can still double-click, copy, and paste it):

javascript:var%20t;%20try%20%7B%20t=%20((window.getSelection%20&&%20window.getSelection())%20%7C%7C%20(document.getSelection%20&&%20document.getSelection())%20%7C%7C%20(document.selection%20&&%20document.selection.createRange%20&&%20document.selection.createRange().text));%20%7D%20catch(e)%20%7B%20%20t%20=%20%22%22;%20%7D;%20location.href='http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed&term='+t+'%5BAID%5D';

This script extracts whatever text you’ve highlighted on a page and attempts to search PubMed using it as the DOI. So obviously it will only work if you’ve highlighted something and that something is a DOI, and that DOI is in PubMed. But assuming you do and it is, it will send you directly to the PubMed entry for that paper. Save the script as a browser bookmark, put the bookmark in your bookmarks bar, and whenever you’re on an article webpage (or RSS feed) and want to see the PubMed entry for that article, just highlight the DOI and click the bookmark. (Cameron wrote up a pipe on Yahoo!Pipes a while ago that does something similar, which inspired this bookmarklet.)
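For readers who find the URL-encoded one-liner hard to follow, here is a minimal, readable sketch of the same idea. The function names are hypothetical, and two details are additions that are not in the bookmarklet above: the selected text is percent-encoded before it goes into the URL, and an empty selection is rejected instead of being sent to PubMed.

```javascript
// Same PubMed search endpoint the bookmarklet above points at.
const PUBMED_SEARCH = 'http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed&term=';

// Build the PubMed search URL for a DOI (hypothetical helper).
// Returns null when there is nothing to search for.
function pubmedUrlForDoi(rawText) {
  const doi = String(rawText || '').trim();
  if (doi === '') return null;
  // The [AID] tag restricts the search to article identifiers such as DOIs.
  // encodeURIComponent is an addition; the original concatenates the raw text.
  return PUBMED_SEARCH + encodeURIComponent(doi + '[AID]');
}

// Browser-only wiring (hypothetical helper, not called here):
// read the highlighted text and jump to the matching PubMed search.
function searchSelectionInPubMed() {
  const selected = window.getSelection ? window.getSelection().toString() : '';
  const url = pubmedUrlForDoi(selected);
  if (url) {
    window.location.href = url;
  } else {
    alert('Highlight a DOI first.');
  }
}
```

To use it as a bookmarklet, the two functions plus a final call to searchSelectionInPubMed() would need to be collapsed onto one line after the javascript: prefix.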

Clearly even this simple hack can be improved – it would be nice, perhaps, to have it return the PMID in an alert box so you can make a note and then continue doing whatever you were doing, rather than being sent away to PubMed (this might make use of AJAX?). It would be nice if you didn’t have to highlight, but the script would look for and extract the DOI from the page automatically. And I’m sure you could add even more bells and whistles, within reason.
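The "no highlighting" idea could start from something like the sketch below, which scans a block of text for the first thing that looks like a DOI. The pattern is an assumption about what DOIs usually look like ("10.", a registrant code of 4 to 9 digits, a slash, then a suffix), not an official grammar, and the trailing-punctuation cleanup will occasionally trim a character that really belongs to the DOI.

```javascript
// Assumed DOI shape: "10." + 4-9 digit registrant code + "/" + suffix.
const DOI_PATTERN = /\b10\.\d{4,9}\/[^\s"'<>]+/;

// Return the first DOI-like string in the text, or null if none is found
// (hypothetical helper).
function findDoi(text) {
  const match = DOI_PATTERN.exec(String(text));
  if (!match) return null;
  // Drop trailing punctuation that usually belongs to the sentence, not the DOI.
  return match[0].replace(/[.,;:)\]]+$/, '');
}

// In a browser, the page text could be scanned with:
//   findDoi(document.body.innerText)
```

The result could then be handed to whatever builds the PubMed search URL, so the only manual step left is clicking the bookmark.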

My latest hackneyed…. “hack-need”… is to be able to identify follow-up studies for a particular genetic association. If you read a paper with PMID X saying SNP A is significantly associated with a disease, it would be really useful to know when future studies look into that association and either replicate or contradict the finding. Hopefully when they do so, they cite PMID X and/or mention SNP A. Essentially, I’d like to query PubMed for papers that cite a given PMID or SNP (via rs #). Ideally, I could do this in batch for many PMIDs and many SNPs automatically, and have each query return only results that are newer than the previous query (or query date). Then I set the script running behind the scenes, process the results using another script, and maybe have it send me an email with a list of new PMIDs to look into every week. Can world domination be far behind?*

Seriously though, I am looking for tips on how to do this follow-up identification thing, so any help appreciated. Pierre has given me some useful hints for how to search PubMed for papers citing a given rs #, and it would be great if this could be modified with dates:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=snp&id=1802710&db=pubmed&reldate=7

(insert your favorite rs # as the id)
Update: reldate limits results to those within a number of days immediately preceding today’s date; could also use mindate and maxdate to specify a date range.
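As a starting point for the batch version, here is a minimal sketch that builds the same elink URL for any rs # and optional date limits, plus a small helper that keeps only PMIDs not seen in an earlier run. Both function names are hypothetical. The YYYY/MM/DD format for mindate and maxdate is an assumption, as is the idea that elink honors the date parameters when dbfrom is snp, so check the E-utilities documentation before relying on either. Fetching and parsing the XML response is left out.

```javascript
const ELINK = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi';

// Build an elink URL from SNP (rs #) to PubMed, optionally date-limited
// (hypothetical helper). dateOptions may contain reldate (days before today)
// or mindate / maxdate (assumed format: YYYY/MM/DD).
function snpToPubmedUrl(rsNumber, dateOptions = {}) {
  const params = new URLSearchParams({
    dbfrom: 'snp',
    id: String(rsNumber),
    db: 'pubmed',
  });
  for (const key of ['reldate', 'mindate', 'maxdate']) {
    if (dateOptions[key] !== undefined) {
      params.set(key, String(dateOptions[key]));
    }
  }
  return ELINK + '?' + params.toString();
}

// Keep only the PMIDs that did not appear in a previous run
// (hypothetical helper for the "only newer results" wish).
function unseenPmids(fetched, alreadySeen) {
  const seen = new Set(alreadySeen.map(String));
  return fetched.map(String).filter((pmid) => !seen.has(pmid));
}

// Reproduces the example URL above for rs1802710, last 7 days.
console.log(snpToPubmedUrl(1802710, { reldate: 7 }));
```

Looping snpToPubmedUrl over a list of rs #s and passing each run's results through unseenPmids would cover the batch and "only new" parts of the wish list; the weekly email is a separate problem.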

* Why stop there, you might ask? I could write a script that downloads the abstracts, “reads” them, filters out the irrelevant ones, summarizes the important information, and populates curation reports. But then I’d be out of a bloody job…

New job and curation 101

It’s been several weeks now since I started working at 23andMe, a personal genomics company located in Mountain View, CA. Perhaps not coincidentally, it’s also been several weeks since I last blogged. The transition hasn’t been difficult, but it did take some getting used to, mentally and physically. I mean, leaving for work by 8:30am? Regular hours? Commuting??

Ok, so I really have nothing to complain about. 8:30 isn’t that early, and I could shave half an hour off each end of my commute if I didn’t choose to take advantage of bike-friendly roads, good weather, and a company-sponsored free train pass (OMG benefits!?). All in all, things are pretty much fantastic. The work environment is friendly, flexible, and laid-back; we have plenty of food and drink to keep us fueled throughout the day, and regular workouts/yoga if we need to get fired up or mellowed down (and to keep the “Free Food 15” at bay). Plus, personal genomics is a super interesting and rapidly evolving industry, so there’s really never a dull moment.

So what is personal genomics, anyway? We’ve known for a while that genetics – the sequence of DNA inside our cells – plays an important role in our form and functioning. Many diseases are caused by changes in DNA (often in genes, parts of DNA that code for proteins) that alter the normal functioning of cells, though not all genetic differences lead to negative changes. (Genetics can also tell us about ancestry – who is related to whom and the history of populations – but I won’t be addressing that in this post.) Where it gets personal is when you apply it to individuals, such as when someone gets a genetic test to determine whether they have or are at risk of developing or passing on a particular disease. Where it gets genomics is when we use high-throughput technologies to do what is essentially thousands of genetic tests at once. Put them together, and you get personal genomics.

How do we know what genetic “pieces” correspond to what conditions or diseases? The general strategy is to compare the DNA of a whole bunch of individuals that have that condition (cases) to a whole bunch of individuals that don’t (controls). As long as both groups are similar save for their case-control status, any significant genetic differences between them should have something to do with that condition. We call this a genetic association.
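One common way to put a number on such a case-control comparison is an odds ratio with a 95% confidence interval. The sketch below uses Woolf's log-based approximation for the interval; the function name is hypothetical and the counts are made up purely for illustration, not taken from any study.

```javascript
// 2x2 table of counts (all numbers below are invented for illustration):
//              carriers   non-carriers
//   cases         a            b
//   controls      c            d
function oddsRatio(a, b, c, d) {
  const or = (a * d) / (b * c);
  // Woolf's method: standard error of ln(OR).
  const se = Math.sqrt(1 / a + 1 / b + 1 / c + 1 / d);
  const z = 1.96; // approximate normal quantile for a 95% interval
  return {
    oddsRatio: or,
    lower: Math.exp(Math.log(or) - z * se),
    upper: Math.exp(Math.log(or) + z * se),
  };
}

// Hypothetical study: 30 of 100 cases and 20 of 100 controls carry the variant.
const result = oddsRatio(30, 70, 20, 80);
console.log(result);
```

An interval that includes 1 means the data are compatible with no association at all, which is one reason the interval matters as much as the odds ratio itself.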

It turns out that there are millions of single locations in the human genome where the exact sequence of the DNA might differ between two people, and these places, called single nucleotide polymorphisms, or SNPs, can contribute to differences we can observe, such as whether you flush when you drink alcohol or how easily you put on weight. The 23andMe personal genomics kit determines what your sequence is for a representative subset of SNPs. Many are already known to be associated with certain conditions, and new research is being done every day to uncover more and more of these associations.

So what exactly do I do at 23andMe? My official job title is “Scientist, Content Curation”. Curation, I’ve found, is not very familiar to most people. Most people probably know that there is such a thing as a museum curator, but might not know what they do. Hardly anyone has ever heard of scientific curation. (And I thought explaining what I was studying as a grad student was hard! Biomedical informatics, anyone?)

But it’s really not that complicated. The essence of curation is almost always the same: the selection, acquisition, and management of content. What that content is differs depending on the field – for example, an art curator might look for and organize artwork for exhibition in a gallery, while a curator in the “Ancient Civilizations” department of a museum may be in charge of acquiring, managing, and presenting archaeological artifacts.

In science, curation involves organization of scientific knowledge and data. An area where this has been especially important is the life sciences, as the amount of information being generated by high-throughput experiments, large-scale projects, and scholarly publishing has skyrocketed. In order to manage this information and render it useful to others, the field of biocuration was born. Any database that organizes scientific knowledge – UniProt (the Universal Protein resource), FlyBase (database for that very important model organism, Drosophila), PharmGKB (a database focused on how genes and drugs interact), etc – depends on curators to keep the information up to date and easy to use.

And so it is with 23andMe. The genetic testing kit is one part of the product, but the other part is information – what knowledge is there about associations between the SNPs on our platform and health traits or conditions? What does your particular data mean? The science is far from exhausted on this subject, and in order to stay up to date with the research, 23andMe spends a lot of effort on curating the scientific literature for new genetic associations and presenting the information on our website for our customers.

Day to day, this means that we keep track of papers recently published in scientific journals, skim through to find ones that may have promising findings, and then vet these more thoroughly to see if they pass our stringent scientific standards. If they do, we extract the bits of information we need and put the bits together in reports that will eventually become part of the content on the website. It’s a job that definitely benefits from an organized system and an eye for detail – as well as a sense of curiosity.

After three weeks on the job, I think I’m starting to get the hang of the day-to-day work. Since my work is even more directly tied to the literature than it was as a graduate student in academia, I’m also developing an enhanced awareness of issues surrounding scientific publishing – those related to standardization and metadata, publication bias towards positive results, and closed vs. open access. The hardest aspect of transitioning from academia to industry hasn’t been the regular schedule, or the work environment, or the work itself; it’s been getting used to being on the other side of the paywall of scientific journals.

But that’s a rant for another time. ;)