
More on impact factor metrics: are we ready to get rid of them?

Submitted by redoxoma on Wed, 09/26/2018 - 21:20

The Radical-Free corner by Francisco R. M. Laurindo

Recently, a young investigator wrote to Nature [1] urging the community to "stop saying that publication metrics do not matter and tell early-career researchers what does" when rating the scientific achievements of young investigators. This message highlights that growing awareness of the inappropriate use of impact factor (IF) metrics for evaluating CVs is bringing, as a side effect, uncertainty and a lack of clarity about how someone's career achievements will be evaluated. This boils down to the simple question of whether we are ready to get rid of IF metrics.

First of all, it is important to say that it is increasingly accepted that using IFs as the sole metric to evaluate someone's achievements in science is flawed for a number of reasons [2]. This trend led to the San Francisco "Declaration on Research Assessment" (DORA) in 2012, which has since been signed by over 500 institutions and 12,000 individuals and which calls attention, among other issues, to "the need to eliminate the use of journal-based metrics, such as Journal IFs, in funding, appointment, and promotion considerations" and "the need to assess research on its own merits rather than on the basis of the journal in which the research is published". In fact, recent experience indicates that hiring investigators by assessing their achievements and contributions through interviews, rather than through traditional publication-based CVs, has led to improved results [3]. Indeed, the current NIH-type biosketch (not dissimilar to that of FAPESP) is centered on the value of each individual's contribution to science.

All these welcome advances may have raised the perception, expressed in the comment cited in the first paragraph, that IF-related metrics do not matter anymore. I believe the cold fact is that they still do, to a reasonable extent, and that at present there is no completely adequate replacement for them, including for the evaluation of young investigators. The distortions caused by adhering strictly to IFs and numeric scores should not be taken to mean that we have clear, validated alternative methods available. Looking at someone's specific contributions and taking a holistic approach when comparing a few candidates for academic purposes may indeed prove successful even with today's tools, given the small scale of the task. But even so, many evaluators still run their own parallel evaluations of IF metrics, as these are still deeply embedded in our collective unconscious and provide numerical scores with a security blanket of objectivity. Problems become particularly acute, however, when competitively evaluating a large number of CVs, e.g., for scholarships or large-scale research awards. As unfair and inaccurate as it would be to rely blindly on the metrics alone, it would also be unfair to ignore them. Despite their limitations, there is indeed some rough correlation, at least for the highest journal IFs, with the quality and completeness of the published work and with the amount and quality of the effort put into it. Along this line, IFs and the number of articles do tell something about the capacity of the individual to choose important problems, to pursue a given problem in depth until the end, to work hard and intensively on scientific questions and to finish coherent stories about them. For the early-career investigator, relying solely on citations can be inappropriate because many good articles take a long time to get cited. Thus, the number of published articles and their IF-related metrics are still a default basis for early-career CV evaluation. In fact, long before IFs were invented, everyone knew which publications were the most prestigious, and those who published in them were regarded favorably.

On the other hand, the limitations of those metrics are real. I believe that down the road these numbers can only help separate candidates who are very good or excellent from those who are merely good or average. However, metrics can fail significantly when trying to separate the top candidates from those who are just not as excellent. So, what can be done to improve on these issues? The DORA followers are doing away with all the metrics; however, this leaves everyone, especially young investigators, uneasy about how they will be evaluated and what the best career strategies are [1]. Without having the illusion (or arrogance?) that I will have the last word on this complex subject, I believe we are not yet ready to get rid of IF metrics in general, but the system can indeed take several additional paths to refine, reinterpret and eventually ignore them. Here I list some features that are increasingly being taken into account in the evaluation of young investigators' CVs. In the absence of a better collective term, I will call them modifiers, that is, each of them can potentially strengthen or weaken the inferences derived from metrics.

An important modifier is the intrinsic quality of the work, that is, its overall degree of innovation, the extent of its contribution, its implications for novel ideas or potential applications, the accuracy and completeness of the investigative strategy, and so on. Logically, work with these qualities will usually take much more time to perform, which leaves less time to publish other papers and potentially decreases the number of published items. Additionally, in some cases, such work may be published in journals that, despite their solid reputation and tradition and their lengthy and demanding reviews, do not display proportionally high IFs. Such are the cases of Journal of Biological Chemistry, Journal of Molecular Biology and American Journal of Physiology, among others. Conversely, some journals use a number of strategies to artificially maximize their IFs, and the intrinsic value of the work they publish may not be proportionally as high (this is a good theme for future discussions). Evaluators should take all these issues into account. However, it is important that the scientist being evaluated does not assume that reviewers will appreciate the quality of the work by default: reviewers are uniformly very busy and may not be from the particular subarea that would readily understand the specifics. Thus, the intrinsic qualities of each investigator's work have to be explicitly clarified by the author. The investigator biosketches of many research agencies, including FAPESP, provide appropriate space for the investigator to write in a few lines the contribution and novelty of each paper, as well as anything else that can indicate intrinsic value (e.g., it is a pioneering work in specified aspects, it contributed to diagnostic or therapeutic advances, it served as a basis for public policies, etc.) or the community's perception of value (e.g., it led to an invitation for a relevant talk, it was the subject of an editorial comment or chosen as the cover article, etc.). This will help identify intrinsic qualities in published work, which will adjust and improve the interpretation of numeric scores, in some cases even allowing one to dispense with them entirely. Interestingly, for unclear reasons, investigators have rarely made use of this strategy at FAPESP, although the biosketch format allows it.

On the opposite side, in some cases young investigators display a CV characterized by a large number of publications of low intrinsic value: incremental contributions, not-so-innovative advances or questionable methodology. Given the conundrums of the scientific publishing scenario nowadays, these works do get published somehow. In other cases, the works are multi-authored without a clear contribution from the author being evaluated, who sometimes appears as a middle author among several others. There is nothing wrong with getting involved in many investigations from a given group; it is actually good. Moreover, in some cases of multiple high-quality cooperative works, a middle position by no means indicates a negligible contribution. Again, disclosing the author's contribution to that particular work in the space provided in the biosketch is essential and will help reviewers understand the potential value of the author's contribution and how it differs from the so-called "salami-slicing" type of CV. Moreover, I suggest that authors separately highlight, in their biosketches, only the few principal works by which they want to be evaluated and leave the others as a group of collaborations. That will prevent the noise from too many works from obscuring what really matters.

A further enhancer in a CV is what I would call "vertical coherence", that is, the connectivity across the investigator's papers, allowing one to foresee the emergence of an investigative track. This multiplies the importance of each work, so that their overall value is larger than the sum of the parts. Again, these connections must be emphasized and explained by the investigator.

Another relevant aspect is that there are other dimensions of the impact of a scientific work that transcend the scientific sphere. These are now recognized by many research agencies, including FAPESP, and comprise: 1) social relevance and 2) economic impact. Furthermore, in some areas, general metrics of impact are lower than those in other areas as a characteristic of the field. Again, in such cases, the relevant information will not be readily obvious to reviewers who are not specialists in the area and thus should be clearly highlighted by the investigator.

These considerations indicate that a number of parallel aspects can affect the perception and interpretation of IF metrics, providing a more accurate and fair picture. Are these modifiers subjective? Perhaps yes, to a good extent, but one has to balance the problems. Certainly, at this time we still face a paradox. As discussed above, IF metrics are still embedded in the system. On the other hand, it is likely that incorporating the modifiers described in this essay, and trying more and more to approach them systematically, will enhance the capacity to select the best achievements while escaping the tyranny of the numerical score. Decorating the basic metrics with such a "systematic subjectivity" evaluation and perfecting it over time seems more realistic, more feasible and less traumatic to early-career scientists than just abandoning metrics all of a sudden. While we love to hate IFs, they are still deep in our minds.

[1] J. Tregoning. How will you judge me if not by impact factor? Nature, 558(7710): 345, 2018 | doi: 10.1038/d41586-018-05467-5
[2] J. K. Vanclay. Impact factor: outdated artefact or stepping-stone to journal certification? Scientometrics, 92(2): 211-38, 2011 | doi: 10.1007/s11192-011-0561-0
[3] S. L. Schmid. DORA Molecular Biology of the Cell, 28(22): 2941-4, 2017 | doi: 10.1091/mbc.e17-08-0534

Francisco R. M. Laurindo, Editor in Chief of Redoxoma Newsletter
Heart Institute (InCor), University of São Paulo Medical School, Brazil
