Peer review at the National Institutes of Health (NIH) is getting an in-depth look, with the official goal of “optimizing its efficiency and effectiveness, and to ensure that the NIH will be able to continue to meet the needs of the research community and public-at-large.” Whether this effort translates into significant changes remains to be seen. But there’s no doubt that the peer review system, like NIH as a whole, has changed significantly — in review committee size, composition, and process — and NIH appears to be struggling to manage these changes.
As part of this assessment of peer review, NIH convened the ‘NIH Regional Consultation Meeting on Peer Review’ (held in New York, Chicago, and San Francisco) to consult with scientists around the country. I was one of 150 or so people at the morning-long September meeting in Chicago, nearly all of whom seemed to be funded researchers and/or past or current grant reviewers. The Observer invited me to share my impressions of this event. (Summaries of the meetings are available at http://enhancing-peer-review.nih.gov/index.html, under Calendar of Events).
Before getting to some specifics, I should note that the NIH effort to improve peer review seems quite genuine, although this type of high-profile re-examination of the bureaucracy is cyclical, with the actual impact arguably rather small each round. For example, in the past, NIH has consulted leading experts in psychometrics and decision making, such as Lee Sechrest and Robyn Dawes, but has not used their recommendations to full advantage in the processes through which NIH makes funding decisions, whether at the level of formulating advice from review committees or in final funding decisions by individual institutes. I know that APS continues to encourage NIH to use what we know about decision making and program evaluation, and I hope that experts from these areas will continue to offer their guidance at every opportunity.
Change in the Culture
The Chicago meeting did not delve into the science of decision-making or very far into the mechanics of decision-making by review committees. Rather, most of the discussion focused on what one could call the culture of the review system. With the doubling of the NIH budget and the subsequent doubling of the number of applications, there has been a huge growth in the number of reviewers. One NIH speaker reported that in 1987, NIH used around 2,000 reviewers; by 2006, it was more than 18,000. Nearly all of the expansion has been achieved via ad hoc reviewers rather than by expanding the pool of regular committee members, with unfortunate implications for review-committee culture, consistency of review criteria and processes, etc.
APS to NIH: Use Science to Improve Peer Review
APS has long advocated the use of decision making research and program evaluation strategies to improve NIH’s review system. Executive Director Alan Kraut conveyed these views to NIH during the latest round of consultation on peer review with the scientific community.
“Think of NIH as a Fortune 500 company, and think about changing/revising NIH peer review in the way a Fortune 500 company might change/revise one of its important accounting systems. Most companies would undertake such a change with a clear evaluation strategy in mind. While there were many suggestions for how to change peer review at the July 30, 2007, meeting, each one needed preliminary consideration of which evaluation strategy would best determine whether the proposed change was a good one. Does it improve peer review? At the least, does it not harm peer review? Many outcome variables present themselves. Are similarly excellent applications funded under an older vs. a revised peer review system? (This might include running two peer review systems simultaneously, or, more feasibly, portions of them — a suggestion that was not mentioned by others at the July 30 meeting, but one that is often used in industry.) Is grant-funded research awarded in the old system and the revised system getting published in the same high-impact journals, and with similar citations? Are excellent researchers (e.g., National Academy of Sciences members and MERIT awardees) funded similarly in the old vs. the revised system? My main point is to note that any of these strategies and many others, including those presented at the July 30 meeting, require an expertise grounded in statistics and psychometrics among those implementing the revisions. This is different from the expertise needed in asking excellent researchers for their suggestions on how to change peer review. I will proudly add that evaluation is an expertise that psychologists and other social and behavioral scientists possess, a skill consistently developed with NIH funding.
“My second point is that there is a substantive field of research broadly known as Judgment and Decision Making. Again, this is an area of research that was often developed with NIH funding. In 2002, National Institute on Aging grantee Daniel Kahneman (Princeton) won a Nobel Prize for his work in this very field. Researchers in judgment and decision making ask questions about how decisions get made. What influences an adolescent’s decision to engage in risky sex? How do customs agents decide whom to search? How does uncertainty of outcome affect decision making? In the context of some of the NIH peer review changes proposed at the July 30 meeting, these questions may include: Are decisions systematically different in face-to-face peer review meetings vs. telephone meetings vs. virtual meetings? Does the number of people involved in a decision-making process make a difference, and what is the mechanism for this difference? Do systematic differences occur with a changing number of reviewers involved in the peer review process? Do variable numbers of ad hoc reviewers affect decisions? What about the number of reviews done by a committee? These and other questions require an expertise in judgment and decision making to answer (and even to correctly pose the questions). It is substantive expertise in psychology and social and behavioral science that needs to be added to the process of peer review reorganization.”
One of the chairs of the session, Lawrence Tabak, director of the National Institute of Dental and Craniofacial Research, noted that at review committee meetings there are now sometimes “80 people in the room, most of them ad hocs.” This surely tends to produce more divergent scores (because reviewers use the scoring scale differently, without mutual calibration), different degrees of understanding of the criteria for the many funding mechanisms, and uncertainty about the expertise of other committee members. That is a recipe for inconsistent scores as well as inefficient discussion. For example, reviewers often struggle with such questions as how much to trust an established applicant regarding specifics of the proposed research that are not described in detail; or, in the review of fellowship applications, what the weights should be for the quality of the applicant, the mentor, the mentor’s involvement, the proposed research, and the proposed formal training. Reasonable people can differ on these things, and it can take some time for a committee to develop a consistent approach to such issues. Large committees with unstable membership will not do that. An NIH staffer not at the meeting told me that senior NIH management has recently told the review staff to ramp up the use of “electronic reviews,” wherein reviewers never meet in person or by phone — the entire “discussion” occurs via asynchronous posts on a special-purpose website. Participating in one of these sessions earlier this fall, I had no sense at all of most of the committee members during discussion of a particular application, because it was rare for anyone other than the assigned reviewers to post anything. This precludes the development of an effective committee culture.
A related phenomenon is that the proportion of reviewers who are relatively senior in the field and experienced as reviewers has plummeted. (This has certainly been my experience, doing reviews since 1985. Early on, very few reviewers were assistant professors. Now, full professors are usually a minority.) During the Chicago discussion, this relative attrition of senior reviewers was attributed to such things as the review experience having become less intellectually interesting and rewarding (also my experience) and accelerating pressure on senior people from other quarters in their work life, so that increasingly they decline invitations to review.1
Serving on review committees nowadays is quite different from how a seasoned faculty member I knew when I was in graduate school portrayed the (then often three-day) review meetings. He described them as advanced seminars with leading colleagues in one’s field and found them tremendously rich and valuable intellectually. They were clearly a highlight of his professional experience. This experience has “evolved” into much shorter meetings with much less time for give-and-take and mutual education among reviewers.
There was blunt discussion during the meeting that, along with the NIH budget doubling, soft-money research settings are increasingly hiring junior faculty to “exploit” (not my word) them — that is, hiring more research staff than the setting can actually accommodate, pressing them to get grants, and casting out those who fail. This intensifies the competition for grant money and undercuts some of the benefits of NIH budget increases. Imagine the budget doubling again and again. Is there any doubt that, with a few years’ lag, the number of applications would scale accordingly? Surely, the demand for research funds would grow to match (and once again vastly exceed) supply. This is not simply greed. The nation needs a vastly expanded research portfolio. But NIH’s purse cannot be driven by (virtually bottomless) demand for grant money. The nation (not just Congress) needs to be educated about the need for that portfolio, including basic research with long-term and unforeseeable payoff.
Impact of Priority Scores
Looking beyond the processes and structure of peer review, there are also questions about whether and how the priority scores produced by review committees are used in institutes’ funding decisions. Among other things, this calls into question the relevance of the peer review system and also has practical implications for attracting reviewers.
There has always been some flexibility, but in the past, priority scores were by far the main factor in determining funding in virtually all institutes. It appears that NIH has moved away from that policy in recent years. NIH institutes now seem to vary greatly in how rigidly they follow priority-score order. Of particular relevance to psychologists, the National Institute of Mental Health (NIMH) is probably among the least rigid in following priority scores at present. That is, program staff — and ultimately the institute director, who makes many of the individual funding decisions on the basis of staff recommendations — exercise considerable discretion in determining whether to pay a high-scoring application (as well as in determining how much to cut its budget or duration). That isn’t to say that the decisions are arbitrary or unwise, but considerations other than priority score — and thus outside of the peer-review system — receive substantial weight.
Of course, one’s view of this depends on one’s role in the process — as a potential principal investigator (PI) with a marginal priority score; a reviewer who worked hard on a critique and would like to think that his/her judgment will carry the day; an NIH program officer who will have to make a compelling case to the Institute director to get a grant paid (or not); or as an Institute director under close scrutiny from Congress and various advocacy groups. In any case, it clearly behooves applicants to talk to relevant program staff about the program priorities early in the process. In my experience, program staff are often extraordinarily helpful. They have every reason to try to attract strong applications to their portfolio even beyond what they can fund, so that they can make a case for more funding of their area.
Views from the Field
Although comments from the audience were the focus of the meeting, most of the suggestions sounded unwise and are not worth reporting on (but see meeting summaries at the URL above for many additional ideas, submitted in writing in conjunction with the three meetings and often better developed). There really didn’t seem to be a single realistic idea that had any traction with the audience, other than vague endorsements of somehow making review-committee service more enticing for senior people. Further increasing the NIH budget, which probably isn’t in the cards near-term, was pushed by a number of speakers from the audience. How to win over Congress and the public to support such an investment wasn’t discussed.
One suggestion that seemed both good and feasible was to give PIs a chance to comment on preliminary reviews prior to the review meeting. In some cases, that would save an entire 8- or 12-month review round and a lot of clerical and substantive work for all players. A variation of this idea was proposed in conjunction with the New York meeting, and a submission posted on the San Francisco report URL suggests allowing PIs to submit a short rebuttal shortly after the review meeting. All of these ideas would bring the grant-review process closer to that used by many journals.
A less radical option would be to strongly encourage reviewers to solicit clarification from PIs before finishing a review. In fact, it is longstanding NIH policy to encourage reviewers to do this, but in practice it is rarely followed. In my experience, it is no longer even mentioned to reviewers as an option and is sometimes actively discouraged by NIH staff. As a result, reviewers raise concerns that could easily be addressed before the review meeting, but instead the PI must wait the better part of a year to do so, consuming a submission opportunity. No one is served by this aspect of current review practice.
Don’t Abandon Ship!
In broad terms, the past 40 years saw a period of easy money (reportedly NIMH funded on the order of 80 percent of applications in the late 1960s), then a period of tightening partly compensated by a relative constriction in the personnel pipeline, then a severe constriction in funding, and recently a very substantial growth in both the NIH budget and the demand for those resources. The exact timing of the recent budget growth and demand growth has been somewhat out of phase, leading to the current sense of lean years following fat years, but in fact the recent variability is small relative to that of the past 40 years overall.
The present situation is far from bleak: There was considerable energy in the room, among NIH staff and among researchers. The NIH budget really did double. The pay line, though well below what it was several years ago, is well above where it was before the doubling (6th percentile about a decade ago). Lots of research is getting funded! Lots of professional socialization is being accomplished by involving so many reviewers, especially junior colleagues, thus investing in the future of the field.
Please do step up and agree to do reviews if NIH gives you the chance. The Initial Review Group I chaired 10 years ago was a small (N = 12), stable group of diverse, superb scientists. We were comfortable, efficient, and accountable in a way very difficult to achieve in a larger or less stable group. I learned a great deal in terms of scientific substance, NIH operations and goals, and grantsmanship (especially how to speak to reviewers as a grant author). I learned that grant reviewing is generally very high-minded, not capricious, and a process one can engage in successfully if one is persistent. I also developed valuable collaborations with friends I made on the committee. Many subsequent experiences as a member of temporary committees and as an ad hoc reviewer on standing committees have not been nearly as rich, but they have still been worthwhile. So get in there and play! It’s in your own self-interest and a great benefit to the field. ♦
1 My own impression is that there is a relative shortage of senior people nationally in our field, broadly, and not just among NIH reviewers. This is a function of many things, including the severe economic problems in the decade or so that began with the first oil crisis and the tightening of university budgets that followed, set against the substantial growth in university enrollment and in the national med-school soft-money research enterprise.