Meeting the Challenge of Scientific Dissemination in the Era of COVID-19: Toward a Modular Approach to Knowledge-Sharing for Radiation Oncology

On May 1 and May 22, 2020, a pair of high-profile articles were fast-track reviewed and published by the New England Journal of Medicine (NEJM) and The Lancet, venues widely regarded as among the most prestigious of medical journals. 1,2 The Lancet article reported a multinational registry analysis of hydroxychloroquine or chloroquine with or without macrolide antibiotics in patients infected with the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and an NEJM manuscript from the same group investigated angiotensin-converting enzyme inhibitors and angiotensin-receptor blockers in patients who tested positive for coronavirus disease 2019 (COVID-19). These papers would have been a pinnacle achievement for the academic coauthors, as well as for the supporting company, Surgisphere, which reportedly supplied the data. Led by the vascular surgeon Sapan Desai, this small company with "big data" aspirations redefined research priorities and patient study allocation with its remarkable results. Unfortunately, these august journals would soon be roiled by controversy when it became evident that the data may have been falsified for both papers. 3 The subsequent debacle serves as a cautionary tale of the systematic failure modes of traditional avenues of sharing and verifying clinical science, particularly when applied to fast-tracked research.

Warning signs regarding the scientific integrity of these publications were posted not in a traditional journal, but via the Zenodo preprint server, as a near-immediate open letter to the Lancet. 4 Statistician James Watson led signatories in critiquing the fidelity of the Lancet and NEJM to their own policies on data transparency, noting, among other issues, that "the [Surgisphere] authors have not adhered to standard practices in the machine learning and statistics community. They have not released their code or data," nor had they documented external study preregistration with an ethics board. To verify the findings in the Lancet article, the letter demanded that "Surgisphere provide[s] details on data provenance, [with] independent validation of the analysis [and] open access to all the data sharing agreements cited above." 4 A retraction of the Lancet article followed, as the data could not be verified. In early June 2020, the results in NEJM were similarly repudiated, "after concerns were raised with respect to the veracity of the data and analyses conducted by Surgisphere Corporation." 4
The COVID-19 pandemic has exposed both long-standing and emerging issues with scientific review and dissemination. Although the pace and scope of scientific output in response to the pandemic are commendable and necessary, they have outstripped already fragile capacity and accountability mechanisms for ensuring scientific internal validity, rapid dissemination, credibility, and verifiability. Failure to achieve these critical components of scientific communication and credibility has tremendous potential for real-world harm (as in the Surgisphere debacle). Therefore, it is imperative that the scientific community optimally balance speed with rigor and be held to account via transparent, modular, and verifiable standards to maximize reproducible research. Achieving these ambitious objectives for improving transmission of scientific knowledge requires using the diverse array of novel tools at our disposal. 5

Preprints: Accelerated Research Transmission in a Pandemic
The COVID-19 crisis gripping the world has justifiably led to an increased need for efficient scientific dissemination, with a resultant rapidity observed in efforts at transmission across both traditional streams of scientific discourse (eg, scientific manuscripts, 6,7 society journal consensus guidelines 8 ) and more novel mechanisms with various levels of peer review (eg, preprints, 9 social media 10 ). In 2019, a clinical medicine preprint repository, medRxiv, was made publicly available, allowing clinical research to be posted before peer review in a manner mimicking the repositories widely used in physics (arXiv), psychology (PsyArXiv), chemistry (ChemRxiv), engineering (engrXiv), social sciences (SocArXiv), and basic biomedical science (bioRxiv). Coupled with the pressing thirst for usable information amid a deadly pandemic, the use of preprints has blossomed since late January 2020 11 ; similarly, reputable peer-reviewed journals from NEJM to the International Journal of Radiation Oncology Biology Physics have accelerated both their review processes and online posting of accepted peer-reviewed manuscripts with impressive legerity, leading to admirably paced communication at a time when timeliness is critical.
The value of preprints for near-immediate dissemination of research findings, their ease of use, their ability to circumvent traditional journal politics and dominant narratives, and their accompanying lack of restrictions before knowledge-sharing have been highlighted even further with COVID-19: preprints represent 32% of the COVID-19 literature tracked by the National Institutes of Health (NIH) Office of Portfolio Analysis, whereas the preprint-to-PubMed ratio stood at 3% as of last year. In the interval since COVID-19 became a global pandemic, the preprint phenomenon appears, anecdotally, to have grown substantially within radiation oncology. In a recent example, to combat initial shortfalls of personal protective equipment (PPE), Twitterati on social media forums began to discuss the practicability of radiation as a method for PPE sterilization and reuse. 12,13 Within days, pilot protocols were developed and a preprint generated to use laboratory biosafety cabinets as a method to stretch dangerously short supplies of previously disposable PPE for health care workers. 14 Near-simultaneous efforts at ultraviolet-based sterilization were made by other groups and were disseminated directly via university website. 15 At present, none of the relevant scientific content from these works has yet been credentialed as peer-reviewed. 1 Nonetheless, at the height of the PPE shortage, decisions were actively being made about how best to reuse life-saving PPE using these prereview data from radiation oncologists. Without preprints, we would not have known these findings even existed in a timely enough fashion to consider them for practice during an acute PPE shortage.
Although there has previously existed fierce debate about the potential and pitfalls of preprints, the wave of COVID-19 preprints has rendered even heated theoretical arguments about the acceptability of preprints practically moot, as a tsunami of research teams have raced to get results on the servers as fast as possible. 16-19 Although the mass uptake of preprints for COVID-19 data is evident (medRxiv/bioRxiv catalogs more than 5000 COVID-19-related preprints at present), cancer, which kills nearly 600,000 persons per year in the United States, has not seen the same embrace of preprints to date by the radiation oncology community (ie, ~150 preprints met search criteria for the keywords "radiotherapy" OR "radiation oncology").
A challenge of preprints, in many instances, is that the lay public, media, and some scientists have treated preprints as fully vetted scientific analyses, and amplified papers that would likely not pass muster in a more thorough journal review process; thus, indicators of internal validity that evidence some level of pre-peer-review rigor are a useful marker of quality for rapid-dissemination formats such as preprints. Critics of preprints also point to rampant dissemination of poor-quality or methodologically thin research as a drawback of the platform and suggest this demonstrates the necessity of peer review (although extant preprint data suggest that the majority of the effects of peer review may in fact be cosmetic); thus, efforts to mitigate publication biases in both preprint and ultimate peer-review contexts are warranted.
Preregistration: A Tool to Combat Internal Bias in, During, and After COVID?
The findings of the seminal work "Why Most Published Research Findings Are False" have become so recited as to almost be a mantra for critics of scientific discourse, 20 who note widespread biases across published scientific literature. Foremost among these are "P-hacking" (repetition of analyses until a "significant" result emerges), the "file drawer effect" (whereby positive trials are published and negative studies are either not reported or discarded to an editorial manuscript limbo), and "salami slicing" or duplicate publication bias (publishing multiple pieces of redundant or nearly overlapping research), although any number of other identifiable biases plague clinical research. 21,22 Among these, P-hacking and interpretation bias (eg, "borderline significance" for nonsignificant statistical endpoints, "statistically significant" but clinically inconsequential observed differences) have vexed statisticians to the extent of revising definitions of statistical significance. 23 Such bias becomes especially critical in situations in which there is minimal established comparative prior knowledge, such as during the current pandemic involving a novel coronavirus with high morbidity and mortality. Preregistration can serve as a powerful tool to mitigate these pitfalls.
By specifying one's research plan on a registry in advance of performing the study (preregistration), or submitting the methodology and statistical design to a journal for review before performing the study (prereview), biases can be prevented, or at least identified more clearly. Preregistration also reduces the capacity for data molding, P-hacking, or convenient hypothesis shifting, and may thus increase analytical rigor, as early results show preregistration increases the publication of null findings. 24,25 Impressively, the Red Journal was among the first journals to pilot prereview 26 ; however, to date, most radiation oncology researchers have eschewed preregistration, anecdotally arguing that the process commits the authors to a single journal. Further, the current Red Journal website makes no notation of prereview, and many radiation oncologists are, anecdotally, unfamiliar with the concept. In addition to journal-based prereview checklists, platforms like the Open Science Framework provide an independent avenue of easily used templates to preregister hypotheses, planned experiments, sample sizes, and statistical analyses; these can be generated before or after data collection. 27,28 The resulting time-stamped, digital object identifier (DOI)-labeled document, called a registered report, need not be constraining, but serves to demonstrate that alterations from planned research activities were fully transparent in intent and execution. Sadly, the process has not become normative in radiation oncology, with 1 glaring exception: clinical trials.
The Red Journal, like most within its scientific echelon, since 2015 has stipulated all clinical trials must be preregistered (eg, listed in clinicaltrials.gov), stating "Taken together, mandatory trial registration improves transparency, reduces the potential for bias, and should help to allay public concerns regarding possible manipulation of research findings for commercial or academic benefit." 29 Clearly, when it comes to clinical evidence, preregistration, if not prereview, is considered a stable standard in radiation oncology (as in other fields) in scenarios where critical patient care decisions are concerned.
A recent episode illustrates the powerful combination of preregistration and preprints. This spring, interest in reviving historic methods of low-dose radiation therapy for pneumopathy from early in the last century resulted in a piquant social media discourse, a subsequent review, editorials, and a flurry of commentary in the Green Journal. 30-34 Rather than following the standard peer-review timeline, or responding via editorial, a domestic group from Emory University had preregistered a trial, executed the pilot study, published results as a preprint on medRxiv, and had an overview of the work featured online in Forbes 35,36 before the next issue of the Green Journal had arrived in mailboxes. Given that radiation oncology is rarely mentioned in general audience magazines like Forbes, and given the discussion the article generated within the scientific community, this approach, assessed purely in terms of dissemination, rendered peer review ancillary, if not moot. Advocates of preregistration have accelerated review processes in response to the novel coronavirus; for example, after an initiative by Royal Society Open Science aiming for review of registered reports in less than 1 week from submission, a bevy of other journals and more than 450 referees volunteered to assist in the effort. 37 As Red Journal Editor-in-Chief Anthony Zietman presciently noted in a 2017 commentary: "Many investigators report their work in data repositories, on archive sites, or on their own websites, all publicly available and easily searchable. This is the postjournal world toward which we appear to be heading." 38 We appear to have entered such an era in clinical science post-COVID; however, it remains to be seen whether these instances represent large-scale adoption or crisis-driven limited use-cases.
Open Access: Publication Accessibility and Equity After the Current Crisis
Overlaying this discussion of preprints is a similar discussion regarding full accessibility of findings, or open access. Starting in 2004, and mandated in 2008, the NIH stipulated that publicly funded research be made available to the public via PubMed Central "immediately upon acceptance for publication." 39 This effort to democratize access to federally funded research has moved apace with the Berlin, Budapest, and Salvador declarations, which describe the need for global accessibility to research data as an essential matter of equity and which have driven a suite of international policies and activities designed to increase open access to manuscripts after peer review. 40-42 The COVID-19 pandemic toppled barriers to publication access across platforms and stakeholders, 43-45 as early in the pandemic the United States and international National Science and Technology Advisors challenged publishers to voluntarily make COVID-19-related publications (in addition to data) immediately accessible in public repositories such as PubMed Central. 46 As we move (presumably) to a post-COVID era, Plan S, a pan-European initiative, was slated to start in January 2020 (now pushed back to 2021). It would require that all EU-funded efforts, immediately upon acceptance, "must be published in compliant Open Access Journals or on compliant Open Access Platforms" and asserts "all researchers should be able to publish their work Open Access," 47 ensconcing the current COVID-19-related open access imperatives across biomedical science (at least for the EU). It has been proposed that deposition of preprints would meet the mandated criteria for open access, with a finalized version subsequently published by a traditional journal, leading to a confluence of considerations regarding the relationship between preprint, peer review, and postpublication open access. Notably, Plan S also mandates that, to be compliant, a journal must provide within open access publications the ability to directly "link to raw data and code in external repositories."

Data Sharing: FAIRness in Data Accessibility
The data provenance underlying the Surgisphere analyses was almost immediately suspect. 4 Further, the editors of the Lancet and NEJM were, fascinatingly, critiqued not just for the acceptance of the manuscripts, but also for their inability to verify data provenance. This is, in our estimation, a fairly novel critique for journal editors; the idea that the purview of a journal would entail, in any sense, provision of some certitude of experimental data quality or origin would have seemed bizarre in the pre-electronic era, when laboratory notebooks were physical objects rather than Jupyter notebooks. Data and code accessibility (which, we must be clear, is an entirely different enterprise from what traditional peer review has typically concerned itself with) has become not just an appendage, but a central normative component of peer review. Recent retractions of a set of articles from "L'affaire Surgisphere" are particularly informative and illustrate another issue: the limitations of both peer-reviewed manuscripts and preprints in terms of quality control with regard to data availability and data transparency.
The need for a structure to index and annotate these shared data has led to a segment of journals dedicated to publishing data (as opposed to analyses) using templated records, called data descriptors, that explain the structure and content of deposited information. 48 Some traditional journals (eg, The American Association of Physicists in Medicine's flagship journals, Medical Physics and the Journal of Applied Clinical Medical Physics) facilitate publication of a citable data descriptor (ie, a formal description of a publicly accessible data set, describing the data and directing the interested reader to the relevant data repository as a PubMed-listed citation). Sadly, our corresponding flagship radiation oncology journals do not formally offer such a resource for data descriptors. Other virtual journals have stepped into the breach, offering avenues for publishing peer-reviewed data descriptors, provided the descriptor uses "field specific" standards of annotation and the data repositories meet (somewhat vague) community data norms. Adding another layer of complexity, some data repositories (ie, the storage location or data warehouse where shared files/information are permanently housed), such as the NIH Figshare, also serve as avenues for non-peer-reviewed data publication, issuing a DOI but without external assessment. Thus, as with preprints, publication does not necessarily imply peer review, again confusing the uninitiated, who heretofore have been able to rely on the near one-to-one linkage between "published" and "peer-reviewed." 49 COVID-19 has, if not for radiation oncology, then for infectious disease research, again been transformative through disruption.
In addition to the implementation of existing data repositories (such as the Global Initiative on Sharing All Influenza Data (GISAID) viral genome data-sharing platform previously used for influenza and avian flu), experts have called for making full-scale bulk anonymized electronic medical record (EMR) data broadly accessible to researchers, stating, "In this interconnected world, we can imagine a unifying multinational COVID-19 electronic health record waiting for global researchers to apply their methodological and domain expertise." 50 Driven by the fiery manifesto of the Future of Research Communications and e-Scholarship (FORCE11) to "Rethink the unit and form of the scholarly publication," 51 NIH has recently embraced data-sharing via adoption of the findable, accessible, interoperable, and reusable (FAIR) guiding principles. Data must be findable, accessible, interoperable, and reusable, not only at the human level, but also for machines (eg, indexing tools or software) (Table 1). 52,53 The FAIR principles describe the "how" of data infrastructure as an outgrowth of an ethos of transparency and equity that has been growing in medicine for years, and are supported by the Institute of Medicine, which has advised stakeholders (eg, cooperative groups, journal editors, specialty societies) that "data sharing [is] ... the expected norm." 54 Throughout the scientific enterprise, data dissemination, rather than merely article acceptance, is now a direct end-goal of the scientific process. This is a radical break from past eras, when data were hoarded as a precious commodity and not treated as intellectual commons, and norms are evolving in many fields. 55 The need for data sharing and repositories has also become more apparent via technical innovations such as machine and deep learning, as these require large pooled data sets to generate data-driven clinical decision models. 56-58 This need has particularly accelerated integration of the FAIR data principles in radiation oncology, where structured machine readability (ie, indexability/searchability) and annotated data curation, as opposed to data qua unrefined data, are imperative. These principles are structurally reflected and supported by recent investment in shared data infrastructure, such as the NIH Strategic Plan for Data Science and Data Commons Framework. 59,61-63
In our minds, although preregistration, preprints, peer review, postpublication open access, and FAIR-principled data publications have synergistic value, they are designed to serve fundamentally different purposes, are offered through different venues, and serve different communities of stakeholders. Simply put, these scientific "modules" serve to provide gains in distinct domains of scientific dissemination:
Preregistration increases internal validity of scientific knowledge dissemination.
Preprints increase speed/transferability of scientific knowledge dissemination.
Peer review increases credibility/interpretability of scientific knowledge dissemination.
Open access increases the availability/equity of scientific knowledge dissemination.
Data availability increases reproducibility/reusability of scientific knowledge dissemination.

A Proposal for Transparent Modular Scientific Dissemination
The fact that these structural modules serve distinct, related, but nonoverlapping goals has led to the current ecosystem where a multitude of iterative, unlinked steps results in a decentralized and poorly standardized corpus, all indexed in different manners and scarcely identifiable as a single consolidated scientific enterprise (Fig. 1). For example, in a scenario where a study is corrected, retracted, or edited, or where a discrepancy is found in published data, there may be no direct method to ensure, outside of the original author's good judgment, that corrections, errata, or retractions are perpetuated "upstream" (ie, back to the original preregistration or preprint) or "downstream" (ie, to a prior journal, open-access, or data repository), although there are avenues in PubMed for linking the retraction notice to the prior publication. In our estimation, a major conceptual limitation that hampers the scientific enterprise in the COVID-19 era is the historic reliance on the peer-reviewed manuscript as the definitive "quantum" of scientific information. Historically, manuscripts were considered complete, self-enclosed, and self-contextualizing, using the standard introduction, methods, results, and discussion format in such a way that, presumably, the peer referee and ultimate reader could reconstruct the scientific enterprise of the author with some veracity. In an era of simple experiments and high-trust nondiverse stakeholders, reputational assessment served to preclude inadvertent lack of scientific rigor or value (as peer review presumes good faith on the part of all actors and is not calibrated particularly well for adversarial fraud detection).
However, with modern experimental design (eg, Bayesian statistics, Markov models, machine learning approaches) and massive big data analyses using increasingly complex statistical techniques on increasingly arcane data elements (viz EMR data, epidemiologic records, genome-wide association studies, or radiomics variables), how could any reasonable person expect the interpretable whole of a scientific undertaking to be evaluated in a manner devised when the requisite referee skillset was a deep knowledge of the (admittedly much smaller) corpus of extant literature and some basic knowledge of statistics?
Instead, we advocate the ideal of a unified thematic project, with direct linkages between component processes, in what might be termed "scientific modules," as puzzle pieces of a larger holistic process (Fig. 2). In the simplest instance, this would involve a "check-list" notification that records the linked metadata or simply the DOIs of all prior "modules," with manual post hoc updating of indexed archival documents at "project" completion. 64 For example, at preprint submission, the existence of a registered report or an extant data deposition would be formally affirmed or denied; if affirmed, the relevant DOI(s) would be provided and added as a link on the preprint server. Similarly, at peer review, the referees would have confirmation of the existence or absence of an available registered report, preprint, and data deposition, and these could then be used as ancillary justification for acceptance or revision. Finally, at "final" article or data publication, the prior work would be listed as serial DOIs, allowing ready access to all modular components as a single package or linked "provenance metadata." 65,66 A preliminary checklist (Modular Science Checklist) has been drafted by the authors; conceivably it or an analogous document could be submitted with each "scientific module" (preprint, data descriptor, peer-review submission, etc) for clarity, with a final version completed after deposition/publication of all modules, as an analog "content tracker form" to assure transparency across a series of currently disparate steps, 67 until end-user usable standardized provenance metadata solutions (such as those proposed by Mahmood et al 67 ) are realized in radiation oncology specifically or medical science generally.
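The affirm-or-deny logic of such a checklist can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' Modular Science Checklist: the class name ModularChecklist, the module labels, and the DOIs (which use a placeholder prefix) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical labels for the modules named in the text.
MODULE_KINDS = ("preregistration", "preprint", "data", "peer_reviewed_article")


@dataclass
class ModularChecklist:
    """For one project, records each module's DOI or its affirmed absence."""

    project: str
    # kind -> DOI string if affirmed, None if formally denied;
    # a kind missing from the dict has not been answered yet.
    dois: Dict[str, Optional[str]] = field(default_factory=dict)

    def _check_kind(self, kind: str) -> None:
        if kind not in MODULE_KINDS:
            raise ValueError(f"unknown module kind: {kind}")

    def affirm(self, kind: str, doi: str) -> None:
        """Affirm that a module exists and record its DOI."""
        self._check_kind(kind)
        self.dois[kind] = doi

    def deny(self, kind: str) -> None:
        """Formally state that no such module exists for this project."""
        self._check_kind(kind)
        self.dois[kind] = None

    def unanswered(self) -> List[str]:
        """Module kinds that have been neither affirmed nor denied."""
        return [k for k in MODULE_KINDS if k not in self.dois]

    def linked_dois(self) -> List[str]:
        """DOIs of all affirmed modules, in pipeline order."""
        return [self.dois[k] for k in MODULE_KINDS if self.dois.get(k)]


# Example: a preprint submission with a registered report but no data deposit.
checklist = ModularChecklist("example-project")
checklist.affirm("preregistration", "10.0000/example.prereg")  # placeholder DOI
checklist.deny("data")
print(checklist.unanswered())   # ['preprint', 'peer_reviewed_article']
print(checklist.linked_dois())  # ['10.0000/example.prereg']
```

Keeping "denied" distinct from "not yet answered" mirrors the requirement that each module's existence be formally affirmed or denied at submission, rather than silently omitted.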
In the future, however, we can envision an integrated (or at least, interoperable) modular science dissemination process (Fig. 2), where a software infrastructure is capable of linking prior, current, and future modules dynamically, such that a researcher need not bear the onus of modular science unassisted. For example, linked DOIs (if not metadata) of all modules across the project could be forward- or back-propagated dynamically. This would prove especially valuable in cases of retraction, errata, correction, or data updates, as the relevant information would be "embedded" not only in later references but in previously submitted modules; this is truly transparent science, but would require that the current puzzle of systems be formalized at some level, which requires a singular committed vision of the scientific process above and beyond each modular aim. 68 Realistically, we feel that this will happen, if at all, through the concerted leadership of scientific societies and publication, journal, or repository agents serving as mediators of a modular science infrastructure, as individual researchers are unlikely to see added value, even in the instance of increased scientific transparency and rigor, in the context of substantively increased ad hoc clerical burden. However, as COVID-19 has shown us, in a crisis, new avenues of dissemination (ie, preprints) may surge in popularity, even in the absence of direct integration with traditional publication venues; by consolidating the process, scientific communities can enhance the quality of the entire modular scientific publication "chain" rather than myopically concerning themselves only with the traditional safe harbor of peer review.
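One way to picture forward- and back-propagation across linked modules is as a traversal of a small graph whose nodes are DOIs. The sketch below is a hypothetical illustration, not an existing infrastructure: the class ModuleGraph, its methods, and the placeholder DOIs are assumptions made for this example. It attaches a retraction notice to every module reachable from the retracted one.

```python
from collections import deque
from typing import Dict, List, Set


class ModuleGraph:
    """Hypothetical registry of a project's modules, linked by DOI."""

    def __init__(self) -> None:
        self.links: Dict[str, Set[str]] = {}      # DOI -> linked DOIs
        self.notices: Dict[str, List[str]] = {}   # DOI -> attached notices

    def add_module(self, doi: str) -> None:
        self.links.setdefault(doi, set())
        self.notices.setdefault(doi, [])

    def link(self, earlier: str, later: str) -> None:
        """Link two modules; stored both ways so notices can travel
        upstream (to earlier modules) and downstream (to later ones)."""
        self.add_module(earlier)
        self.add_module(later)
        self.links[earlier].add(later)
        self.links[later].add(earlier)

    def propagate(self, source: str, notice: str) -> List[str]:
        """Breadth-first traversal from `source`, attaching `notice` to
        every reachable module. Returns the DOIs in the order visited."""
        if source not in self.links:
            raise KeyError(f"unknown module: {source}")
        seen = {source}
        queue = deque([source])
        visited: List[str] = []
        while queue:
            doi = queue.popleft()
            self.notices[doi].append(notice)
            visited.append(doi)
            for neighbor in sorted(self.links[doi]):  # sorted for determinism
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        return visited


# Example with placeholder DOIs: prereg -> preprint -> article <- data.
graph = ModuleGraph()
graph.link("10.0000/ex.prereg", "10.0000/ex.preprint")
graph.link("10.0000/ex.preprint", "10.0000/ex.article")
graph.link("10.0000/ex.data", "10.0000/ex.article")
reached = graph.propagate("10.0000/ex.article", "RETRACTED")
print(reached)
```

In this toy version a retraction of the article also flags the preregistration, preprint, and data deposit; a real system would need per-module rules, since a retracted article does not necessarily invalidate its underlying data.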

Epilogue: Back to the Future
This is not the first time a novel zoonotic pathogen has served to spur speed in scientific missives. During the 17th century, when most print media was heavily government controlled, the waves of plague afflicting England necessitated rapid public tracking of up-to-date regional mortality. In that era, as in ours, the rapidity of information dissemination resulted in a lack of central control.

Fig. 2. Proposed transparent modular scientific dissemination process, using either provenance metadata or digital object identifier (DOI) linkages to link iterative or sequential processes for a scientific process. Presumably, these linkages could be further refined via technical interoperability between current discrete processes in Figure 1. (Figure labels: Research plan; Pre-registration; Pre-print; Modular Open Science; Putting the pieces together; Transparent modular documentation.)
What came to be known as the "Pamphlet Wars" was a feud between the upstart Chemical Physicians Society and the exclusive College of Physicians. With a desire for increased influence, these educational societies, which were limited groups of ~50 elite members, each published their own recommendations and guidance and disseminated highly divergent (and often erroneous) information. The varying and contradictory (mis)information of these pamphlets served to confuse the public; fraud and quackery were rampant. However, soon, a heretofore unconventional printing of handbills began (Fig. 3), with lay demographers jumping in to track rates of disease, publishing in English rather than Latin (the professional language of physicians). This data democratization meant that, rather than only being the purview of the learned and initiated through the filter of medical jargon, everyone had access to raw data, meaning that "ordinary Londoners, in addition to their governors, [could] anticipate a rise or fall in mortality, and ... turn to medicine or prayers as circumstances or inclinations dictated." 69 This historical move toward open data publication not only reified the idea that research was not just for the elite, but also served as the accepted beginning of statistical epidemiology, allowing a rapid dissemination of data at a time generally thought to be starved of information. 70
Today, we must decide, as a specialty, whether we will turn the 2019-nCoV coronavirus crisis into an impetus for faster, better, FAIR-er science, and consequently produce research that may help address the endemic cancer crisis worldwide, or whether we will ignore the pressing need for transparent transmission of timely knowledge.
In our minds, the concept of transparent modular science is a "disruptive integration" that brings the strengths of various scientific formats into a cohesive whole and represents an avenue for our specialty to lead into the post-COVID era with the full employ of each approach; otherwise, like the physicians of previous eras, we may find our pontifications ignored by those who can more ably share and disseminate data in democratic and interpretable formats in a timely and accountable manner.