Thursday, August 7, 2025

My letter to the editor of the Journal of Informetrics criticising the paper "From 'Sleeping Beauties' to 'Rising Stars'"


See below the text of my letter criticizing the thesis proposed by J. Gorraiz, the author of the paper, who is affiliated with the University of Vienna.



J. Gorraiz’s recently published paper presents a conceptually stimulating and metaphor-laden examination of the ideological foundations of bibliometrics, tracing their origins to religious, moral, and philosophical traditions. While such a reflective approach is thought-provoking, the paper suffers from several substantive limitations—particularly when read in light of ongoing debates surrounding the practical utility, cost-efficiency, and predictive power of bibliometric tools in research evaluation.

The paper’s central thesis—that bibliometrics derive from religious and philosophical traditions—is built on an extended metaphorical scaffolding. Citations are likened to divine judgment, H-indexes to spiritual tallies, and “sleeping beauties” to secular miracles. While these metaphors may have rhetorical appeal, they ultimately distract from more pressing empirical and methodological issues. There is no engagement with recent literature on field-normalized citation metrics, responsible metric frameworks (such as the Leiden Manifesto or DORA), or citation dynamics in different disciplines. Nor does the paper propose concrete methodological or policy alternatives. The result is a text rich in allegory but impoverished in evidence, leaving readers without clear guidance for improving bibliometric practice.
 
Gorraiz asserts that a low citation count does not imply irrelevance; it may reflect novelty. While this claim holds some truth, the argument is selectively framed and omits crucial empirical counterevidence. Notably, he fails to mention the robust findings from Clarivate Analytics, whose “Citation Laureates” methodology—based on identifying papers with exceptionally high citation counts (over 1,000)—has successfully predicted more than 70 Nobel Prize winners. Moreover, it is worth recalling the analysis by Traag and Waltman (2019), which demonstrated that citation-based metrics exhibit a strong correspondence with expert peer review assessments, particularly in fields such as Physics, Clinical Medicine, and Public Health. 

These findings collectively suggest that, far from being inherently unreliable, well-calibrated citation metrics can serve as a meaningful and practical complement—or, in some contexts, a viable alternative—to traditional peer review in the evaluation of research performance. They also indicate that novelty and high citation impact are not mutually exclusive and may, in fact, often coincide. By disregarding this evidence, the paper constructs a false dichotomy between citation count and originality, while ignoring one of the most compelling demonstrations of bibliometrics' predictive capacity.
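For readers who want a concrete sense of what "correspondence with peer review" can mean in practice, the sketch below shows one simple way such agreement might be quantified: a rank correlation between a citation indicator and peer-review grades. This is not the methodology of Traag and Waltman (2019); the numbers, the grading scale, and the use of scipy are all illustrative assumptions of mine.

```python
# A minimal, hypothetical sketch of quantifying agreement between a
# citation indicator and peer-review grades. Not the actual method or
# data of Traag & Waltman (2019); all values below are invented.
from scipy.stats import spearmanr

# Hypothetical records: (field-normalized citation score, peer-review grade 1-4)
papers = [
    (0.4, 1), (0.9, 2), (1.1, 2), (1.8, 3),
    (2.5, 3), (3.2, 4), (5.0, 4), (0.7, 1),
]

citation_scores = [score for score, _ in papers]
review_grades = [grade for _, grade in papers]

# Spearman's rho measures how well the two rankings agree.
rho, p_value = spearmanr(citation_scores, review_grades)
print(f"Spearman rank correlation: {rho:.2f} (p = {p_value:.3f})")
```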
 
A still more consequential omission in the author’s analysis lies in the near-total absence of engagement with the underlying economic rationale for the widespread adoption of bibliometric tools. While the discussion frames citation indicators primarily as symbolic gestures or ritualistic artefacts within the academic system, it largely overlooks their pragmatic role as scalable and cost-efficient proxies in research evaluation—particularly in contexts where peer review faces severe logistical and financial constraints. Peer review, though indispensable in certain contexts, is notoriously resource-intensive: national research assessments such as the UK’s Research Excellence Framework (REF) have incurred costs exceeding £250 million per evaluation cycle. Similar pressures are evident in hiring processes, tenure reviews, and grant allocation panels, all of which require substantial investments of time, coordination, and expert labour. 

Crucially, empirical evidence undermines the dismissive treatment of bibliometrics: Abramo et al. (2019) demonstrated that citation-based indicators not only outperform peer review in predicting subsequent scholarly impact, but also exhibit increasing predictive accuracy over time. These findings bring into sharp relief the structural trade-offs between speed, cost, and precision that evaluation systems must navigate. Bibliometric measures—despite their well-known limitations—offer reproducible, transparent, and broadly applicable screening mechanisms capable of alleviating the evaluative burden on human reviewers. Any critique that ignores these economic and operational realities, while failing to articulate a credible alternative framework, risks producing an analysis that is philosophically stimulating yet practically inert in the policy and administrative domains where evaluation decisions are actually made.
 
Which approach is more detrimental to the progress of science: implementing a hybrid model of abbreviated peer review augmented by quantitative metrics—thereby conserving substantial financial resources—or relying exclusively on comprehensive, resource-intensive peer review protocols that allocate those funds away from direct research support? Moreover, how might the latter paradigm exacerbate inequities in research assessment for low-income countries, which lack the financial capacity to underwrite such costly evaluation processes?
 
Finally, allow me to provide some insights into my homeland, Portugal, which has experimented with both approaches. In a prior Portuguese research assessment, conducted in 2013, the international experts serving on the evaluation panels enjoyed complete autonomy. They were free to evaluate research units through on-site visits and also had access to a comprehensive bibliometric analysis based on Scopus data, conducted by Elsevier, which generated a range of valuable metrics (publications per FTE, citations per FTE, h-index, Field-Weighted Citation Impact, top-cited publications, and national and international collaborations).
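Of the metrics just listed, the h-index has the simplest definition: a unit's h-index is the largest h such that h of its publications have received at least h citations each. Below is a minimal sketch of that computation; the citation counts are hypothetical and are not drawn from the 2013 exercise.

```python
# A minimal sketch of the h-index, using invented citation counts
# (not data from the 2013 Portuguese assessment).
def h_index(citations: list[int]) -> int:
    """Return the h-index for a list of per-publication citation counts."""
    h = 0
    # Rank publications from most to least cited; h is the last rank
    # whose citation count is still at least as large as the rank itself.
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical research unit with ten publications:
unit_citations = [45, 32, 20, 18, 9, 7, 4, 2, 1, 0]
print(h_index(unit_citations))  # -> 6 (six papers with at least 6 citations each)
```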
 
However, in recent years we experienced a shift in perspective, with a Science Minister who shared the sentiments of those critical of bibliometrics. During the most recent research assessment, in 2018, which involved the evaluation of 348 research units comprising nearly 20,000 researchers, the Evaluation Guide clearly dictated that absolutely no metric could be used by the panels (note that all panels were composed of international experts: 51 from the UK, 21 from the USA, 17 from Germany, 17 from France, 11 from the Netherlands, 8 from Finland, 8 from Ireland, 7 from Switzerland, 6 from Sweden, 5 from Norway, and others from additional countries).
 
Nonetheless, once the research assessment had concluded, I conducted an extensive search through all the reports across the various scientific areas. What I discovered was that the reviewers assigned significant importance to the quantity of publications and the perceived “quality” of journals, even though such considerations were expressly prohibited by the Evaluation Guide. I found that “publications”, “quartiles” and even “impact factors” were mentioned in the assessment reports more than 500 times. In other words, in the absence of any sanctioned metric, the international experts (somewhat ironically) decided to use the worst of them all. Such findings lend strong support to the observations of Morgan-Thomas et al. (2024), who noted that the historically robust association between journal rankings and expert evaluations persists unabated, despite institutional endorsements of the principles articulated in DORA. This enduring pattern underscores a profound tension between formal evaluative guidelines and the implicit heuristics that experts continue to apply in practice.
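For readers curious about the mechanics of such a keyword search, the sketch below shows the general idea: counting how often metric-related terms appear across a folder of report texts. The folder name, file format, and term list are assumptions for illustration only, not the actual corpus or protocol I used.

```python
# A minimal sketch of counting metric-related terms across panel reports.
# The "evaluation_reports" folder and the term list are hypothetical.
import re
from pathlib import Path
from collections import Counter

TERMS = ["publications", "quartile", "impact factor"]
counts = Counter()

for report in Path("evaluation_reports").glob("*.txt"):  # hypothetical folder of plain-text reports
    text = report.read_text(encoding="utf-8").lower()
    for term in TERMS:
        counts[term] += len(re.findall(re.escape(term), text))

for term, n in counts.most_common():
    print(f"{term}: {n} mentions")
print(f"total: {sum(counts.values())} mentions")
```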