A brief analysis of commenting at BMC, PLoS, and BMJ

As announced on FriendFeed and Twitter, a writing collaboration between me and the inimitable Cameron Neylon has just been published at PLoS Biology, “Article-level metrics and the evolution of scientific impact”! (Loosely based on a blog post from several months ago.)

One of the many issues Cameron and I touched on was the problem of commenting. Most people probably aren’t aware of the problem; after all, commenting is alive and well on the internet in most places you look! But click over to PLoS or BioMed Central (BMC) and the comment sections are the digital equivalent of rolling tumbleweeds.

As we mention briefly in the article, comments have great potential for improving science. For one thing, they’re a form of peer review, but without the month-long wait and seemingly arbitrary review criteria. Readers, authors, and other evaluators can also get a sense of what people think about the article. The ideal is certainly tantalizing — vigorous, rigorous debates over the finer scientific points as well as the overarching conclusions, with participation from experts in the field as well as from informed laypeople, always with intelligence and civility!!!1!11!!one!! But let’s not kid ourselves — the worst-case scenario is all too easy to imagine and would probably look something like the discussions over at YouTube.

And this would be positively urbane. (From PhD comics)



Scientific discourse as an epic FAIL

A post on FriendFeed pointed me to this blog post in Adventures in Ethics and Science discussing a particularly infuriating example of just how broken the current system of scientific publishing can be. The epic tale is presented by Prof. Rick Trebino in a PDF document outlining “How to Publish a Scientific Comment in 123 Easy Steps”. This version includes his second addendum, in which he gives many excellent (and some painfully obvious) suggestions for how to improve the system.

Here’s a preview:

1. Read a paper in the most prestigious journal in your field that “proves” that your entire life’s work is wrong.

2. Realize that the paper is completely wrong, its conclusions based entirely on several misconceptions.  It also claims that an approach you showed to be fundamentally impossible is preferable to one that you pioneered in its place and that actually works.  And among other errors, it also includes a serious miscalculation—a number wrong by a factor of about 1000—a fact that’s obvious from a glance at the paper’s main figure.

3. Decide to write a Comment to correct these mistakes—the option conveniently provided by scientific journals precisely for such situations.

6. Prepare further by writing to the authors of the incorrect paper, politely asking for important details they neglected to provide in their paper.

7. Receive no response.

15. Write a Comment, politely explaining the authors’ misconceptions and correcting their miscalculation, including illustrative figures, important equations, and simple explanations of perhaps how they got it wrong, so others won’t make the same mistake in the future.

16. Submit your Comment.

17. Wait two weeks.

18. Receive a response from the journal, stating that your Comment is 2.39 pages long. Unfortunately, Comments can be no more than 1.00 pages long, so your Comment cannot be considered until it is shortened to less than 1.00 pages long.

20. Remove all unnecessary quantities such as figures, equations, and explanations.  Also remove mention of some of the authors’ numerous errors, for which there is now no room in your Comment; the archival literature would simply have to be content with a few uncorrected falsehoods.  Note that your Comment is now 0.90 pages.

21. Resubmit your Comment.

22. Wait two weeks.

23. Receive a response from the journal, stating that your Comment is 1.07 pages long. Unfortunately, Comments can be no more than 1.00 pages long, so your Comment cannot be considered until it is shortened to less than 1.00 pages long.

And so the saga begins. Really, the whole thing makes my blood boil.

The evolution of scientific impact

Photo by cudmore on Flickr

In science, much significance is placed on peer-reviewed publication, and for good reason. Peer review, in principle, guarantees a minimum level of confidence in the validity of the research, allowing future work to build upon it. Typically, a paper (the current accepted unit of scientific knowledge) is vetted by independent colleagues who have the expertise to evaluate both the correctness of the methods and perhaps the importance of the work. If the paper passes the peer-review bar of a journal, it is published.

Measuring impact

For many years, publications in peer-reviewed journals have been the most important measurement of someone’s scientific worth. The more publications, the better. As journals proliferated, however, it became clear that not all journals were created equal. Some had higher standards of peer-review, some placed greater importance on perceived significance of the work. The “impact factor” was thus born out of a need to evaluate the quality of the journals themselves. Now it didn’t just matter how many publications you had, it also mattered where.

But, as many argue, the impact factor is flawed. Calculated as the average number of citations per “eligible” article over a specific time period, it is highly inaccurate given that the actual distribution of citations is heavily skewed (an editorial in Nature by Philip Campbell stated that only 25% of articles account for 89% of the citations).  Journals can also game the system by adopting selective editorial policies to publish articles that are more likely to be cited, such as review articles. At the end of the day, the impact factor is not a very good proxy for the impact of an individual article, and to focus on it may be doing science – and scientists – a disservice.
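To make that skew concrete, here is a toy calculation with entirely made-up citation counts (not any real journal’s data); it shows how the mean that underlies the impact factor can sit far above what a typical article actually receives:

```python
import statistics

# Hypothetical citation counts for 20 articles in one journal over a
# two-year window. As in real citation data, a few heavily cited papers
# dominate the distribution.
citations = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 6, 8, 10, 30, 60, 120]

impact_factor = sum(citations) / len(citations)  # the mean, as the IF is computed
typical = statistics.median(citations)           # what a typical article receives
top_quarter = sorted(citations, reverse=True)[:len(citations) // 4]

print(f"mean ('impact factor'):  {impact_factor:.1f}")   # 13.1
print(f"median (typical paper):  {typical}")             # 3.0
print(f"citation share of top 25% of articles: "
      f"{sum(top_quarter) / sum(citations):.0%}")        # 87%
```

With these invented numbers, five of twenty articles (25%) account for roughly 87% of the citations — close to the skew Campbell describes — while the median article earns just 3 citations against an “impact factor” of 13.1.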

In fact, any journal-level metric will be inadequate at capturing the significance of individual papers. Few dispute that a high journal impact factor may elevate some undeserving papers, or that a low one may unfairly punish perfectly valuable ones; yet many still feel that the impact factor – or more generally, the journal name itself – serves as a useful, general quality-control filter. Arguments for this view typically stem from two fears: fear of “information overload”, and fear of risk. With so much literature out there, how will I know what is good to read? If this is how it’s been done, why should I risk my career or invest time in trying something new?

What is clear to me is this – science and society are much richer and more interconnected now than at any time in history. There are many more people contributing to science in many more ways now than ever before. Science is becoming more broad (we know about more things) and more deep (we know more about these things). At the same time, print publishing is fading, content is exploding, and technology makes it possible to present, share, and analyze information faster and more powerfully.

For these reasons, I believe (as many others do) that the traditional model of peer-reviewed journals should and will necessarily change significantly over the next decade or so.

Article-level metrics at PLoS

The Public Library of Science, or PLoS, is leading the charge on new models for scientific publishing. Now a leading Open Access publisher, PLoS oversees about 7 journals covering biology and medicine as well as PLoS ONE, on track to become the biggest single journal ever. Papers submitted to PLoS ONE cover all areas of science and medicine and are peer-reviewed only to ensure soundness of methodology and science, no matter how incremental. So while almost every other journal makes some editorial judgment on the perceived significance of papers submitted, PLoS ONE does not. Instead, PLoS ONE leaves it to the readership to determine which papers are significant through comments, downloads, and trackbacks from online discussions.

Now 2 1/2 years old, PLoS ONE boasts thousands of articles and a lot of press. But what do scientists think of it? Clearly, enough think highly of it to serve on its editorial board or as reviewers, and to publish in it. Concerns that PLoS ONE constituted “lite” peer review seem largely unfounded, or at least outdated. Indeed, there are even tales of papers being rejected from Science or Nature over perceived significance, published in PLoS ONE, and then picked up by Science and Nature’s news sections.

Yet there is still a feeling among some that publishing in PLoS ONE carries little or no respectability. This is due in part to a misconception of how the peer review process at PLoS ONE actually works, but also in part because many people prefer an easy label for a paper’s significance. Cell, Nature, Science, PLoS Computational Biology – to most people, these journals represent sound science and important advances. PLoS ONE? It may represent sound science, but it’s up to the reader to decide whether any individual paper is important.

Why is there such resistance to this idea? One reason may be tied to time and effort to impact: while citations always have taken some time to build up, a journal often provides a baseline proxy for the significance of a paper. A publication in Nature on your CV is an automatic feather in your cap, and easy for you and for your potential evaluators to judge. Take away the journal, and there is no baseline. For some, this is viewed as a bad thing; for others, however, it’s an opportunity to change how publications – and people – are evaluated.

Whatever the zeitgeist in particular circles, PLoS is clearly forging ahead. PLoS ONE’s publication rates continue to grow, such that people will eventually have to pay attention to papers published there even if they pooh-pooh the inclusive – but still rigorous – peer review policy. Recently, PLoS announced article-level metrics, a program to “provide a growing set of measures and indicators of impact at the article level that will include citation metrics, usage statistics, blogosphere coverage, social bookmarks, community rating and expert assessment.” (This falls under the broader umbrella of ‘post-publication peer review’.) Just how this program will work is a subject of much discussion, and certain metrics may need a lot of fine-tuning to prevent gaming of the system, but the growing consensus, at least among those discussing it online, is that it’s a step in the right direction.

Essentially, PLoS believes that the paper itself should be the driving force for significance, not the vehicle it’s in.

The trouble with comments

A major part of post-publication peer review such as PLoS’s article-level metrics is user comments. In principle, a lively and intelligent comment thread can help raise the profile of an article and engage people – whether they are scientists or not – in a conversation about the science. This would be wonderful, but it’s also wishful thinking; as anyone who’s read blogs or visited YouTube knows, comment threads devolve quickly unless there is moderation.

From rustylime.com

For community-based knowledge curation efforts (think Wikipedia), there is also a well-known 90-9-1 rule: 90% of people merely observe, 9% make minor or purely editorial contributions, and 1% are responsible for the vast majority of original content. So if your audience is only 100 people, you’ll be lucky if even one of them contributes. Indeed, experiments with wiki-based knowledge efforts in science have been rocky at best, though things seem to be getting better. The big question remains:

But will the bench scientists participate? “This business of trying to capture data from the community has been around ever since there have been biological databases,” says Ewan Birney of the European Bioinformatics Institute in Hinxton, UK. And the efforts always seem to fizzle out. Founders enthusiastically put up a lot of information on the site, but the ‘community’ — either too busy or too secretive to cooperate — never materializes. (From a news feature in Nature last September on “wikiomics”.)
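The 90-9-1 arithmetic behind that pessimism is simple enough to sketch; the function below is purely illustrative (its name and the readership figures are my own invention):

```python
def expected_participation(audience, shares=(0.90, 0.09, 0.01)):
    """Split an audience by the 90-9-1 rule into lurkers, light editors,
    and creators of original content. Purely illustrative arithmetic."""
    return tuple(round(audience * s) for s in shares)

lurkers, editors, creators = expected_participation(100)
print(lurkers, editors, creators)    # 90 9 1 -- one likely contributor in a hundred

# A hypothetical readership for a single specialist article:
print(expected_participation(300))   # (270, 27, 3)
```

At the scale of one paper’s actual readership, the rule predicts contributors in the single digits, which is why empty comment threads shouldn’t surprise anyone.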

Thus, for commenting on scientific articles, we have essentially two problems: encouraging scientists to comment, and ensuring that the comments have some value. An experiment with article commenting at Nature several years ago was deemed a failure due to lack of both participation and comment quality. Even now, while many see the fact that ~20% of PLoS articles have comments as a success, others see it as inadequate. Those I’ve talked to who are skeptical of the high-volume nature of PLoS ONE also tend to view their comments on papers as a highly valuable resource, one not to be given away for free in public but disclosed in private to close colleagues or leveraged for professional advancement through being a reviewer.

Perhaps the debate simply reflects different generational mindsets. After all, people are now growing up in a world where the internet is ubiquitous, sharing is second nature, and almost all information is free. Scientific publishing is starting to change, and so it is likely that current incentive systems will change, too. Yet while the gulf will eventually disappear, it is perhaps at its widest point now, with vast differences in social norms making any online discourse potentially fraught with unnecessary drama. As Bora Zivkovic mentions in a recent interview,

It is not easy, for a cultural reason, because a lot of scientist are not very active online and also use the very formalised language they are using in their papers. People who have been much more active online, often scientists themselves, they are more chatting, more informal. If they don’t like something they are going to say it in one sentence, not with seventeen paragraphs and eight references. So those two kinds of people, those two communities are eyeing each other with suspicion, there’s a clash of cultures. The first group sees the second group as rude. The second group views the first group as dishonest. I think it will evolve into something in the middle, but it will take years to get there.

When people point to the relative lack of comments on scientific papers, it’s important to note that online commenting has not been around in science for very long. And just as it takes time for citations to start trickling in, it takes time to evaluate a paper in the context of its field. PLoS ONE is less than three years old. Bora notes, “It will take a couple of years, depends on the area of science until you can see where the paper fits in. And only then people will be commenting, because they have something to say.”

Brush off your bullshit detector

The last argument I want to touch on is that of journals serving as a filter for information. With millions of articles published every year, keeping up with the literature in your field can seem a daunting task. What should you read? In a sense, a journal is a classifier, taking in article submissions and publishing what it thinks are good and important papers. As with any classifier, however, performance varies and is highly dependent on the input. Still, people have come to depend on journals, especially ones with established reputations, to provide this service.

Now even journals have become too numerous for the average researcher to track (hence crude measures like the impact factor). So when PLoS ONE launched, some assumed that it would consist almost entirely of noise and useless science, if it could be considered science at all. I think it’s clear that that’s not the case; PLoS ONE papers are indeed rigorously peer-reviewed, many PLoS ONE papers have already had great impact, and people are publishing important science there. Well, they insist, even if there’s good stuff in there, how am I supposed to find what’s relevant to me out of the thousands of articles they publish every year? And how am I supposed to know whether the paper is important or not if the editors make no such judgment?

Here, I would like to point out the many tools available for filtering and ranking information on the web. At the most basic level, Google PageRank might be considered a way to predict what is significant and relevant to your search terms. But there are better ways. Subscribing to RSS feeds (e.g. through Google Reader) makes scanning lots of article titles quick and easy. Social bookmarking and collaborative filtering can suggest articles of interest based on what people like you have read. And you can directly tap into the reading lists of colleagues by following them on various social sharing services like Facebook, FriendFeed, Twitter, and paper management software like Mendeley. I myself use a loose network of friends and scientific colleagues on FriendFeed and Twitter to find interesting content from journals, news sites, and blog posts. The bonus is that you also interact with these people — networking at its most convenient.
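As a sketch of what “suggest articles based on what people like you have read” can mean in practice, here is a minimal collaborative filter over reading lists; the reader names, article IDs, and the choice of Jaccard similarity are my own illustrative assumptions, not any particular service’s method:

```python
# A minimal sketch of collaborative filtering over shared reading lists.
# All reader names and article IDs here are hypothetical.

def jaccard(a, b):
    """Overlap between two reading lists (Jaccard similarity)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(user, readers):
    """Score articles the user hasn't read, weighted by how similar
    each other reader's list is to the user's own."""
    mine = readers[user]
    scores = {}
    for other, theirs in readers.items():
        if other == user:
            continue
        sim = jaccard(mine, theirs)
        for article in theirs - mine:
            scores[article] = scores.get(article, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)

readers = {
    "me":    {"a1", "a2", "a3"},
    "alice": {"a1", "a2", "a4", "a5"},  # reads much of what I read
    "bob":   {"a9"},                    # no overlap with me
}

print(recommend("me", readers))  # alice's unread picks rank above bob's
```

The design choice is the whole point: rather than trusting one editorial gatekeeper, the ranking emerges from many readers’ behavior, which is exactly the shift from journal-level to article-level signals.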

The point is that there is a lot of information out there, you have to deal with it, and there are more and more tools to help you deal with it. It’s no longer sufficient to depend on only one filter, and an antiquated one at that. It may also be time to take PLoS’s lead and start evaluating papers on their own. Yes, it takes a little more work, but I think learning how to evaluate papers critically is a valuable skill that isn’t being taught enough. In a post about the Wyeth ghost-writing scandal, Thomas Levenson writes:

… the way human beings tell each other important things contains within it real vulnerabilities.  But any response that says don’t communicate in that way doesn’t make sense; the issue is not how to stop humans from organizing their knowledge into stories; it is how to build institutional and personal bullshit detectors that sniff out the crap amongst the good stuff.

From nitot on Flickr

Although Levenson was writing about the debate surrounding science communication and the media, I think there’s a perfect analogy to new ways of publishing. Any response that says don’t publish in that way doesn’t make sense; the issue is not how to stop people from publishing, it is how to build personal bullshit detectors – i.e. filters. People should always view what they read with a healthy dose of skepticism, and if we stop relying on journals, or impact factors, or worse to do all of our vetting for us, we’ll keep that skill nicely honed. At the same time, we are not in this alone; leveraging a network of intelligent agents – your peers – will go a long way.

So continue leading the way, PLoS. Even if not all of the experiments work, we will certainly learn from them, and keep the practice and dissemination of science evolving for the times.

When you don’t get what you pay for

This is by now an old problem but one that is not yet obsolete. You work at a place that does not have access to journals containing articles of potential interest to your research – that’s problem #1. But, maybe your institution or company has some kind of budget for paying for those articles. Huzzah! Except you discover that somewhere around 80% of the articles aren’t actually as useful as their title or abstract made them seem, and so you’ve wasted time and resources – that’s problem #2.

Both problems are eliminated if the article is Open Access (though you still have to read it to determine its usefulness), but the fact is that a lot of interesting research is still published in closed-access journals, locked behind a paywall, and so both problems are still prevalent. For someone whose work requires broadly scouring the literature for new discoveries or evidence, it is important to survey as much of that literature as possible. Each enticing abstract and each tantalizing title could be another gem – or, more often than not, a complete dud. If there were a way to tell more accurately what a paper was about without seeing the entire thing – abstracts at a minimum, longer abstracts or section summaries as an additional feature, perhaps – that could solve problem #2. This would at least make paying for articles more justifiable.

Of course, I think that all published research should ideally be free and accessible by the public. This isn’t immediately practical in many cases but it is something we should try to achieve.

Happy Open Access Day!

October 14th 2008 is Open Access Day, and there are many dozens of events happening around the world to promote awareness of Open Access. Although the movement is most prominent in scientific publishing right now, the concept – making content and knowledge freely accessible to everyone – is applicable in any discipline. If you aren’t familiar with Open Access or would like to learn more, check out the list of participants to see if there is an event happening near you.

(This post is part of the synchro-blogging event, meant to explode Open Access over the intarwebs today.)

Why Open Access matters, to me and to the world

The big question is: why Open Access? The reasons are many (see Neil and Deepak’s posts for other perspectives). For one, research is typically funded by the public, and in those cases especially, the results should be accessible to the public. For another, it is getting too expensive even for prestigious universities to maintain site licenses with subscriber-pays publishers. And in these days of the instant, information-rich web, it just doesn’t make sense to restrict access for the vast majority of potential consumers, who can probably easily go elsewhere for similar information.

But for me, a bigger question is: why not? And while there are some arguments against Open Access, they are, by and large, arguments against specific implementations of it (e.g. author-pays models) and not against the concept itself. The fact is that collaboration between scientists, the importance of communication between scientists and the lay public, and the responsibility of advancing basic research towards application are all growing. Open Access enhances each of these by making information available more quickly and in full. And as Open Access gains acceptance, it will open doors to other potential improvements, such as increased publication of negative results, increased access to research “as it happens” (see Open Notebook Science), and the implementation of standards for “the fully supported paper”, as described at Science in the Open:

The idea here is deceptively simple, and has been discussed elsewhere; simply that all the relevant supporting information for a paper (data, detailed methodology, software tools, parameters, database versions etc. as well as access to required materials at reasonable cost) should be available for any published paper. The challenge here lies in actually recording experiments in such a way that this information can be provided.

Essentially – and here is where the biggest significance is for me – Open Access is one important leg of a platform supporting “open science” (the others being Open Data and Open Source, and perhaps Open Notebooks/Research), and I believe it should act as a natural integration point for all of them as well. The fact that this isn’t how things were done in the past is rarely a valid reason not to change, especially if the circumstances are wildly different. Science and research need to adapt to the changing needs and capabilities of people and technology. Right now, Open Access is the easiest and most logical place to start, and as we address the other aspects we will come back to it full circle.

Open Access: not an alternate reality anymore

Being relatively new to research, I find it difficult to remember a distinct point when I first became aware of Open Access; it was always just there. Well, that’s not quite true – I’m pretty sure I had no conscious awareness of Open Access as an undergraduate, but as a graduate student I’ve found it a constant, well-established presence. Perhaps this has to do with the fact that my field of study is concerned with information and reliant on access to data, or the fact that a founder of one of the major Open Access publishers, PLoS, is faculty at my school. Either way, aside from the hard-to-escape pedestal of Cell/Nature/Science, there has never been any intimation that closed-access publishing is inherently better than Open Access. I think this will only become more true as time goes on, as access to information becomes more important to more people, as Open Access becomes more established, and as the pedestal model breaks down.

Doing my part for Open Access

Over the last year or so, I’ve become increasingly conscious of all things Open, including Open Access. As a result of this, I’ve started blogging, both to help me reflect on my thoughts on these topics and hopefully to help others become aware of them. I am also organizing a workshop on Open Science with Cameron Neylon which will discuss issues and next steps concerning Open Access, Open Data, and open science in general.

In my own research, the few papers I’ve published have all been Open Access (PNAS, which makes all publications open after 6 months; Genome Biology, part of BioMed Central; and BMC Genomics, also BioMed Central). I commit here and now that all future papers on which I am an author and have influence over publishing decisions will also be Open Access (and I will do my best to ensure that I am only an author on Open Access papers).

What can others do?

If you’re not me (and that’s most of you), you can still do what I do. Insist on publishing in Open Access journals. Write about the topic, and other aspects of openness, if it interests you. Stay up to date on the progress of Open movements, attend talks or events, and maybe even organize or help out at one. If you mentor students, you can start a pyramid by promoting awareness among them. If you’re a student, you can help your advisor become more Open Access friendly.

Good places to learn about Open Access include Peter Suber’s Open Access overview (and frequent newsletters), the websites of Open Access publishers like PLoS and BioMed Central, the Open Access Day FriendFeed room, and the blogs of Open Access advocates such as Bora, Jon Eisen, Peter Murray-Rust, and Science Commons.

Incremental and continuous – a new paradigm for scientific publishing?

I had an interesting conversation with my mother a couple of nights ago about open science. Both of my parents are scientists (dad = chemist, mom = pharmacologist, both of them entrepreneurs as well), so I can just tell them what I’m doing and they always “get it”. My mother in particular has this knack for seeing past the shallow tangle of things I’m saying and “getting it” at a deeper level than even I do. A mother’s intuition, a scientist’s skill, whatever you want to call it – she’s good.

The conversation began with my telling her about the workshop and how the planning was going and what we were going to try to do. I started getting into Open Access, peer-review, credit systems, open notebook science, data sharing and portability, reproducible research, the culture of science and academia, big change, etc, etc, etc… all along she verbally nods, asks questions, comments, and then she says,

…so research should be published incrementally and continuously.

At that simple, yet astonishingly clear statement, I fell silent. Yes, I was familiar with discussions about open peer review, about open notebooks, about blogging one’s research, about reproducibility, but I hadn’t really encapsulated it in my head the way my mother did. Of course, the term “published” is used very loosely here, and publishing that is truly incremental and continuous would bear little resemblance to scientific publishing as it exists now. But maybe in the future we won’t have journal articles per se – removed in both time and process from the actual work – except as summaries at natural breakpoints. Maybe the bulk of what is read and used as the basis of future studies will be these incrementally “published” (i.e. certified in some way) and continuously updated records of ongoing research.

Many things would be different as a result (or precondition) of such a “publishing” paradigm, and I’m sure there are many potential pros and cons. I don’t necessarily think it should be the model for how things are done in the future, but my mom’s logical leap definitely made me ponder this whole open science thing in a new light.

New meaning to "publish or perish" – an opening for Open Science?

The saying “publish or perish” is well-known in academia, and typically both actions refer to the same subject – you, the aspiring/struggling grad student/post-doc/fellow/assistant professor. A recent correspondence in Nature puts a new and bracing spin on the phrase.

I think at some point most academic researchers have experienced the conflict that can arise when it is time to write a paper. On the one hand, you’re getting a chance to reward those months or years of hard work with some exposure and a line on your CV, and to invest in the potential for future collaborations. On the other, maybe you’ve just gotten started on a really promising or exciting research direction and are in a groove, work-wise, and having something like writing suddenly vying for your attention means that both activities suffer. You feel that you can’t drop what you’re working on to write the paper, but the paper writing is distracted and unfocused because you’re still trying to conduct research half the time (and thinking about it more than that). But we march on to these two seemingly competing drummers, fueled somewhat by the vague hope that our work, once it is in the public domain, will also contribute a drop in the bucket that is the scientific advancement of our species.

But what about other species? In conservation biology, “publish or perish” can take on new, and frighteningly literal, meaning. Time spent working on publications is time taken away from research on ecosystems and endangered wildlife. In the meantime, earth’s natural resources and diversity suffer. To prevent this from happening, the authors of the letter suggest (only slightly ironically) the adoption of a new impact factor:

This impact factor would be based on an estimation of how much worse the conservation status of an endangered species or ecosystem might be in the absence of the candidate’s research. It would select for targeted investigation that should help to fill in ‘the great divide’, and would exclude opportunistic ecology papers claiming to be of conservation significance.

Even if this proposal was made half in jest, it does highlight some important questions. The first sentence essentially asks: how much faster could research be conducted (and, by translation, medical or scientific advances be developed) if there was less emphasis on publication? The second sentence is quite a bit more complex, since it seems it would bring in value judgments on the worth of specific research questions – something that would be hard to define objectively and is easily influenced by prevailing trends, funding, and big talk.

So let’s talk about the first idea – that the pace of scientific advancement suffers from the emphasis placed on publication. Obviously, research needs to be disseminated if it is going to contribute. But here is where Open Science comes in. Suppose Open Science and Open Notebook Science became the norm rather than the burgeoning, but still fringe, movements that they are now. Two big questions immediately come to my mind: Would publication matter as much as it does now? Would research proceed faster? I say no and yes.

With most, if not all, of your methods, data, and results made public, formal publication would not be necessary for others to learn of and benefit from your work. Peer-review may become an intrinsic part of the entire research process. Of course, a formal summary of your work adds great value and would be indispensable for someone searching for information on your field of study, but much of the pressure to publish could be alleviated. Add to that the increased exposure to the entire community and you get enormous potential to speed up your research in addition to research in general. You can learn what is working and not working in your experiments, get useful feedback and suggestions, and meet people who may be able to help you, all on a much faster timescale. At the same time, new ideas may be spawned, collaborations fostered, and interesting connections made between concepts.

Obviously, the future of Open Science is not going to be as rosy as that, at least in the early stages of its evolution (issues like patenting and privacy are valid and worth lengthy discussion in their own right, but are beyond the scope of this post). In fields like conservation biology, however, the shadow cast by “publish or perish” has terribly real implications, and the move towards Open Science will help to lift it. Can anyone really argue that Open Science is a bad thing?