by Gregory A. Petsko
This article is reproduced from IUBMB News, issue 1 (February 2016), with the kind permission of the author and the IUBMB (Dr. Michael P. Walsh, Secretary General). Dr. Petsko is Adjunct Professor at Cornell University, a member of the U.S. National Academy of Sciences and a former president of the American Society for Biochemistry and Molecular Biology (among many other positions). His research has made major contributions to our understanding of the structure–function relations of proteins, including many related to neurodegenerative diseases. (Ed. Note)
High on my list of things that need changing in the culture of science today – and it’s a list that gets longer by the week – is the obeisance being paid by people who should know better to the meaningless, pervasive metrics that have skulked into our community like a burglar in the rosebushes. I am referring to the ubiquitous citation number and its illegitimate offspring, the impact factor and the h-index.
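(For readers who have never looked under the hood of these numbers: the h-index is conventionally defined as the largest h such that an author has h papers with at least h citations each. The sketch below – function name and sample data are mine, purely illustrative, not from any bibliometric toolkit – shows just how little machinery lies behind a number used to summarise an entire career.)

```python
def h_index(citations):
    """Largest h such that at least h of the papers have >= h citations each.

    `citations` is a list of per-paper citation counts, in any order.
    """
    ranked = sorted(citations, reverse=True)  # best-cited papers first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:   # the paper at this rank still "supports" rank h = rank
            h = rank
        else:
            break           # counts only decrease from here, so h is final
    return h

# e.g. citation counts [10, 8, 5, 4, 3] give h = 4:
# four papers have at least 4 citations, but not five with at least 5
```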
This problem may have reached its nadir (though I doubt it) with the appearance a couple of years ago of a sort of index of indices, the Q-index, which provides each academic staff member with an individualised report showing the key research and teaching performance data available from the university’s information systems. It also provides benchmarks that support comparisons with average performance levels across the university and within its units. The Q-index comprises two parts: the QR-index, which focuses on measures of research performance, and the QT-index, which focuses on student evaluations of teaching.
What amazes me is that faculty seem meekly to have accepted that their careers can be encapsulated in a single number – a number used to compare their performance with that of their peers.
Could any administrator ask for a better tool to turn the faculty against one another? All the faculty energy that should, in a normal university, be expended in fighting their natural enemies, the bureaucrats, is now directed towards internecine competition for the best Q-index. Pay raises, promotions, and as far as I know even the selection of one’s mate will now be left to a bunch of paper-shufflers who can justify every decision by referring to a number whose validity is not only unproven but unprovable, yet has the same mystic authority as an IQ score.
Which brings me, of course, to the IQ score. The legendary evolutionary biologist Stephen Jay Gould devoted an entire book, The Mismeasure of Man, to debunking that particular metric and its abuse by racists and eugenicists. First published in 1981, revised and expanded in 1996, The Mismeasure of Man is a brilliant refutation of the idea that “scientific” data have proven – or can prove – the intellectual superiority of one group over another. What makes this book particularly relevant for our discussion is a concept that he introduces on page 27 of my copy of the 1996 edition: reification (from the Latin word res, meaning “thing”). Gould defines reification as a fallacy of reasoning that occurs when we try to convert an abstract concept (like intelligence) into a concrete entity (like an IQ score). I’ve traced the concept back to Alfred North Whitehead, who called it the Fallacy of Misplaced Concreteness, and further back to William James, who in 1909 had this to say about it: “The viciously privative employment of abstract characters and class names is, I am persuaded, one of the great original sins of the rationalistic mind.” James called it the fallacy of Vicious Abstractionism.
Citation analysis marked the introduction of this fallacy into scientific discourse. At first, it seemed harmless enough: all it did was measure exactly what it claimed to, namely the number of times a paper was cited in the subsequent literature. But then reification set in. Citation number began to be conflated with the impact of a paper, even though “impact” is an abstract concept that should not – and cannot – be converted into a concrete entity. It was but a short step from that to the abomination of the impact factor, which purported to measure the impact of an entire journal by a single metric.
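(The arithmetic behind that single metric is worth seeing in full, because there is so little of it: a journal’s impact factor for year Y is the number of citations received in Y to items the journal published in Y−1 and Y−2, divided by the number of citable items it published in those two years. A minimal sketch – the variable names are mine, not from any official tool:)

```python
def impact_factor(citations_to_prior_two_years, citable_items_prior_two_years):
    """Journal impact factor for year Y: citations received in Y to items
    published in Y-1 and Y-2, divided by the citable items from Y-1 and Y-2.
    """
    if citable_items_prior_two_years == 0:
        raise ValueError("no citable items in the two-year window")
    return citations_to_prior_two_years / citable_items_prior_two_years

# e.g. 3000 citations to 400 citable items gives an impact factor of 7.5
```

Note that nothing in this division knows anything about what any of those citing papers said, or why they cited what they cited – which is rather the point.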
To make matters still worse, impact factor began to be conflated with the quality of the journal, even though impact and quality are two completely different things. Impact means the effect or influence of one person, thing or action on another; quality means the degree of excellence of something. You can be excellent without having much of an impact (see Bugatti, Ettore). You can have a huge impact without being excellent (see Trump, Donald). But most of all, neither of these completely different concepts, impact and quality, is a thing that can be quantified, especially in a single number.
Our European brothers and sisters appear to have sat by and watched while administrators seized upon the impact factor of where one publishes as a way to rank faculty. Promotion, salary increases and funding all became tied to how many papers one published in self-styled “high-impact journals” – journals with impact factors considerably over 10.
Do you know how silly this is? The most important physics paper published in my lifetime, by a large margin I think, was published on February 12 of this year: “Observation of Gravitational Waves from a Binary Black Hole Merger” by Abbott et al. (and, with over 1000 co-authors, there’s an awful lot of ‘et al.’). That paper was published in Phys. Rev. Lett., which has an official impact factor of 7.5; you could get fired in many European institutions for publishing in a place like that.
It’s not enough that worshiping at the altar of the impact factor has allowed bureaucrats with no scientific judgment to pass judgment on scientists by simple arithmetic. It has polluted the entire culture of science. Where you publish has now become an acceptable proxy for the content of the paper. I can’t count the number of times I have sat in a review panel for a grant or a fellowship or promotion and heard a fellow reviewer say, “So-and-so has published 2 Nature papers and one Cell paper”. And then when I, in my best fake innocent tone, ask that reviewer, “Uh, can you tell me what was in those papers and why they were important?” all too often I am met with the reply that the reviewer has not read them.
If the only thing that matters is to publish in a few journals, then of course everyone will want to publish in those journals, which gives said journals – and the non-practicing scientists who staff them – enormous power over the careers of people they have never even met.
Think it can’t happen here (here being wherever you are, unless of course it’s already happened there)? Think again. The metrics are on the march, to the beat of reification. In the U.S., at Rutgers University, the state university of New Jersey (a state known for both the pharmaceutical industry and organized crime), the administration has contracted with a company called Academic Analytics to measure the productivity of its faculty. You can read about this stupidity, and the reaction from the Rutgers faculty, in an excellent article in Inside Higher Ed. The company, you will be frightened to know, has 385 institutional customers in the U.S. and abroad, representing about 270,000 faculty members in 9,000 Ph.D. programs and 10,000 departments. This is coming soon to a theater near you.
A Rutgers faculty resolution against the contract reads, in part, “the entirely quantitative methods and variables employed by Academic Analytics — a corporation intruding upon academic freedom, peer evaluation and shared governance — hardly capture the range and quality of scholarly inquiry, while utterly ignoring the teaching, service and civic engagement that faculty perform.” It also notes more practical concerns, such as that “taken on their own terms, the measures of books, articles, awards, grants and citations within the Academic Analytics database frequently undercount, overcount or otherwise misrepresent the achievements of individual scholars.” The contract, by the way, does not allow faculty access to data about themselves. Rutgers administrators say that the data are only used to evaluate departments and programs, not individuals. If you believe that, I have a bridge in Brooklyn I’d love to sell you.
The problem is not the use of any particular number, it’s the use of any number. Under the guise of improved accountability and outcomes assessment, people of questionable critical ability are usurping the rightful position of those who should be making evaluations, by relying on metrics that are not only unproven but also irrelevant. Numbers don’t know about people, nor do they care. That can be a good thing, but not when it comes to passing judgment. For that you need wisdom, insight, and sometimes compassion.
Ron DeLegge II famously remarked that “99 percent of all statistics only tell 49 percent of the story.” He was right, but only up to a point. Sometimes they tell none of it.
[The views expressed in this article are those of the author and do not necessarily reflect the opinion of all Redoxoma members.]