Abstract
Citations are often used as a metric of the impact of scientific publications. Here, we examine how the number of downloads from Sci-Hub, as well as various characteristics of publications and their authors, predicts future citations. Using data from 12 leading journals in economics, consumer research, neuroscience, and multidisciplinary research, we found that articles downloaded from Sci-Hub were cited 1.72 times more than papers not downloaded from Sci-Hub and that the number of downloads from Sci-Hub was a robust predictor of future citations. Among other characteristics of publications, the number of figures in a manuscript consistently predicted its future citations. The results suggest that limited access to publications may limit some scientific research from achieving its full impact.
Data Availability
Our data sets and the code that we developed for the analyses are available in the following public repository: https://osf.io/8c632/?view_only=19ea965dd02449a0927a3d95d0132a55
References
Adler, R., Ewing, J., & Taylor, P. (2009). Citation statistics: A report from the International Mathematical Union (IMU) in cooperation with the International Council of Industrial and Applied Mathematics (ICIAM) and the Institute of Mathematical Statistics (IMS). Statistical Science, 24(1), 1–14.
Andročec, D. (2017). Analysis of Sci-Hub downloads of computer science papers. Acta Universitatis Sapientiae Informatica, 9(1), 83–96. https://doi.org/10.1515/ausi-2017-0006.
Anscombe, F. J. (1973). Graphs in statistical analysis. The American Statistician, 27(1), 17–21.
Antonakis, J., Bendahan, S., Jacquart, P., & Lalive, R. (2010). On making causal claims: A review and recommendations. The Leadership Quarterly, 21(6), 1086–1120. https://doi.org/10.1016/j.leaqua.2010.10.010.
Armstrong, M. (2015). Opening access to research. Economic Journal, 125(586), F1–F30.
Bendezú-Quispe, G., Nieto-Gutiérrez, W., Pacheco-Mendoza, J., & Taype-Rondan, A. (2016). Sci-Hub and medical practice: an ethical dilemma in Peru. The Lancet Global Health, 4(9), e608.
Berg, J., Bhalla, N., Bourne, P., Chalfie, M., Drubin, D., Fraser, J., et al. (2016). Preprints for the life sciences. Science, 352(6288), 899–901.
Björk, B. C., & Solomon, D. (2012). Open access versus subscription journals: A comparison of scientific impact. BMC Medicine, 10, 73. https://doi.org/10.1186/1741-7015-10-73.
Bohannon, J. (2016). Who’s downloading pirated papers? Everyone. Science, 352(6285), 508–512.
Bohannon, J., & Elbakyan, A. (2016). Data from: Who’s downloading pirated papers? Everyone. Dryad Digital Repository. https://doi.org/10.5061/dryad.q447c.
Boudry, C., Alvarez-Muñoz, P., Arencibia-Jorge, R., Ayena, D., Brouwer, N. J., Chaudhuri, Z., et al. (2019). Worldwide inequality in access to full text scientific articles: the example of ophthalmology. PeerJ, 7, e7850.
Boukacem-Zeghmouri, C., Bador, P., Lafouge, T., & Prost, H. (2016). Relationships between consumption, publication and impact in French universities in a value perspective: a bibliometric analysis. Scientometrics, 106(1), 263–280.
Breitzman, A., & Thomas, P. (2015). Inventor team size as a predictor of the future citation impact of patents. Scientometrics, 103(2), 631–647.
Brody, T., Harnad, S., & Carr, L. (2006). Earlier web usage statistics as predictors of later citation impact. Journal of the American Society for Information Science and Technology, 57(8), 1060–1072.
Bühlmann, P. (2020). Invariance, causality and robustness. Statistical Science, 35(3), 404–426. https://doi.org/10.1214/19-STS721.
Chen, X. (2016). A Middle-of-the-Road Proposal amid the Sci-Hub Controversy: Share “Unofficial” Copies of Articles without Embargo, Legally. Publications, 4(4), 29. https://doi.org/10.3390/publications4040029.
Deshpande, P. R. (2019). Why should Sci-Hub be supported? International Journal of Health and Allied Sciences, 8(3), 210–212. https://doi.org/10.4103/ijhas.IJHAS_91_18.
Faust, J. S. (2016). Sci-Hub: A Solution to the Problem of Paywalls, or Merely a Diagnosis of a Broken System? Annals of Emergency Medicine, 68(1), 15A–17A. https://doi.org/10.1016/j.annemergmed.2016.05.010.
Garcia-Puente, M., Pastor-Ramon, E., Agirre, O., Moran, J. M., & Herrera-Peco, I. (2019). The use of Sci-Hub in systematic reviews of the scholarly literature. Clinical Implant Dentistry and Related Research, 21(5), 816. https://doi.org/10.1111/cid.12815.
Gonzalez-Solar, L., & Fernandez-Marcial, V. (2019). Sci-Hub, a challenge for academic and research libraries. Profesional de la Información, 28(1). https://doi.org/10.3145/epi.2019.ene.12.
Greco, A. N. (2017). The Kirtsaeng and SCI-HUB Cases: The Major US Copyright Cases in the Twenty-First Century. Publishing Research Quarterly, 33(3), 238–253. https://doi.org/10.1007/s12109-017-9522-7.
Hausmann, R., Hidalgo, C., Bustos, S., Coscia, M., Simoes, A., & Yildrim, M. (2013). The atlas of economic complexity: mapping paths to prosperity. Cambridge: MIT Press.
Hegarty, P., & Walton, Z. (2012). The consequences of predicting scientific impact in psychology using journal impact factors. Perspectives on Psychological Science, 7(1), 72–78.
Himmelstein, D. S., Romero, A. R., Levernier, J. G., Munro, T. A., McLaughlin, S. R., Tzovaras, B. G., et al. (2018). Sci-Hub provides access to nearly all scholarly literature. eLife, 7, e32822.
Horowitz, I. (1986). Scientific access and political constraint to knowledge: Revisiting the dilemma of rights and obligations. Science Communication, 7(4), 397–405. https://doi.org/10.1177/107554708600700404.
Jaffe, K., Caicedo, M., Manzanares, M., Gil, M., Rios, A., Florez, A., et al. (2013). Productivity in physical and chemical science predicts the future economic growth of developing countries better than other popular indices. PLoS ONE, 8(6), e66239.
Jaffé, R. (2019). #Pay4Reviews: Academic publishers should pay scientists for peer-review. PeerJ Preprints, 7, e27573v1.
Laverde-Rojas, H., & Correa, J. C. (2019). Can scientific productivity impact the economic complexity of countries? Scientometrics, 120(1), 267–282.
Lee, H. A., Law, R., & Ladkin, A. (2014). What makes an article citable? Current Issues in Tourism, 17(5), 455–462.
Lewbel, A. (2012). Using heteroscedasticity to identify and estimate mismeasured and endogenous regressor models. Journal of Business & Economic Statistics, 30(1), 67–80.
Machin-Mastromatteo, J. D., Uribe-Tirado, A., & Romero-Ortiz, M. E. (2016). Piracy of scientific papers in Latin America: An analysis of Sci-Hub usage data. Information Development, 32(5), 1806–1814. https://doi.org/10.1177/0266666916671080.
Manley, S. (2019). On the limitations of recent lawsuits against Sci-Hub, OMICS, ResearchGate, and Georgia State University. Learned Publishing, 32(4), 375–381. https://doi.org/10.1002/leap.1254.
McNutt, M. (2016). My love-hate of Sci-Hub. Science (New York, NY), 352(6285), 497. https://doi.org/10.1126/science.aaf9419.
Mejia, C. R., Valladares-Garrido, M. J., Miñan-Tapia, A., Serrano, F. T., Tobler-Gómez, L. E., Pereda-Castro, W., et al. (2017). Use, knowledge, and perception of the scientific contribution of Sci-Hub in medical students: Study in six countries in Latin America. PLoS ONE, 12(10), e0185673.
Milkman, K. L., & Berger, J. (2014). The science of sharing and the sharing of science. Proceedings of the National Academy of Sciences, 111(Supplement 4), 13642–13649.
Nazarovets, S. A. (2018). Black Open Access in Ukraine: Analysis of Downloading Sci-Hub Publications by Ukrainian Internet Users. Science and Innovation, 14(2), 19–24. https://doi.org/10.15407/scine14.02.019.
Nicholas, D., Boukacem-Zeghmouri, C., Xu, J., Herman, E., Clark, D., Abrizah, A., et al. (2019). Sci-hub: The new and ultimate disruptor? view from the front. Learned Publishing, 32(2), 147–153.
Novo, L. A. B., & Onishi, V. C. (2017). Could sci-hub become a quicksand for authors? Information Development, 33(3), 324–325. https://doi.org/10.1177/0266666917703638.
O’Loughlin, J., & Sidaway, J. D. (2020). Commercial publishers: What is to be done? Geoforum, 112, 6–8. https://doi.org/10.1016/j.geoforum.2019.12.011.
Paulus, F. M., Rademacher, L., Schäfer, T. A. J., Müller-Pinzler, L., & Krach, S. (2015). Journal impact factor shapes scientists’ reward signal in the prospect of publication. PLoS ONE, 10(11), e0142537.
Peet, L. (2016). Sci-Hub Sparks Critique of Librarian. Library Journal, 141(15), 14–17.
Pinto, T., & Teixeira, A. A. C. (2020). The impact of research output on economic growth by fields of science: a dynamic panel data analysis, 1980–2016. Scientometrics. https://doi.org/10.1007/s11192-020-03419-3.
Radicchi, F., Fortunato, S., & Castellano, C. (2008). Universality of citation distributions: Toward an objective measure of scientific impact. Proceedings of the National Academy of Sciences of the United States of America, 105(45), 17268–17272. https://doi.org/10.1073/pnas.0806977105.
Saleem, F., Hasaali, M. A., & Haq, Nu. (2017). Sci-hub & ethical issues. Research in Social & Administrative Pharmacy, 13(1), 253. https://doi.org/10.1016/j.sapharm.2016.09.001.
Seguin, J. (2019). The future of access: How a mosaic of next-gen solutions will deliver more convenient access to more users. Information Services & Use, 39(3), 237–242. https://doi.org/10.3233/ISU-190049.
Sekara, V., Deville, P., Ahnert, S. E., Barabási, A. L., Sinatra, R., & Lehmann, S. (2018). The chaperone effect in scientific publishing. Proceedings of the National Academy of Sciences, 115(50), 12603–12607.
Shuai, X., Pepe, A., & Bollen, J. (2012). How the scientific community reacts to newly submitted preprints: Article downloads, twitter mentions, and citations. PLoS ONE, 7(11), e47523.
Sinatra, R., Wang, D., Deville, P., Song, C., & Barabási, A. L. (2016). Quantifying the evolution of individual scientific impact. Science, 354(6312), aaf5239. https://doi.org/10.1126/science.aaf5239.
Smith, L. D., Best, L. A., Stubbs, D. A., Archibald, A. B., & Roberson-Nay, R. (2002). Constructing knowledge: The role of graphs and tables in hard and soft psychology. American Psychologist, 57(10), 749.
Solomon, D. J. (2014). A survey of authors publishing in four megajournals. PeerJ, 2014(1), e365.
Solomon, D. J., & Björk, B. C. (2012). Publication fees in open access publishing: Sources of funding and factors influencing choice of journal. Journal of the American Society for Information Science and Technology, 63(1), 98–107.
Stasinopoulos, M., Rigby, R. A., Heller, G. Z., Voudouris, V., & De Bastiani, F. (2017). Flexible Regression and Smoothing Using GAMLSS in R. Boca Raton, USA: CRC Press.
Steegen, S., Tuerlinckx, F., Gelman, A., & Vanpaemel, W. (2016). Increasing transparency through a multiverse analysis. Perspectives on Psychological Science, 11(5), 702–712.
Strielkowski, W. (2017). Will the rise of sci-hub pave the road for the subscription-based access to publishing databases? Information Development, 33(5), 540–542.
Sá, M. J., Ferreira, C. M., & Serpa, S. (2019). Science communication and online social networks: Challenges and opportunities. Knowledge Management: An International Journal, 19(2), 1–22.
Till, B. M., Rudolfson, N., Saluja, S., Gnanaraj, J., Samad, L., Ljungman, D., et al. (2019). Who is pirating medical literature? A bibliometric review of 28 million Sci-Hub downloads. Lancet Global Health, 7(1), E30–E31. https://doi.org/10.1016/S2214-109X(18)30388-7.
Varki, A. (2017). Scientific journals: Rename the impact factor. Nature, 548(7668), 393.
Zhang, Z., & Van Poucke, S. (2017). Citations for randomized controlled trials in sepsis literature: the halo effect caused by journal impact factor. PLoS ONE, 12(1), e0169398.
Zhu, J., & Liu, W. (2020). A tale of two databases: the use of Web of Science and Scopus in academic papers. Scientometrics. https://doi.org/10.1007/s11192-020-03387-8.
Appendix
This appendix provides detailed guidance for both understanding and reproducing the results of “The Sci-Hub Effect: Sci-Hub downloads lead to more article citation.” A first analysis is presented in the “Part 1” section, which is composed of six subsections. In each of these subsections we provide the arguments that allow readers to understand how we reach more accurate (i.e., robust) estimates. The “Part 2” section presents a second analysis with a similar structure and purpose. The two analyses differ in the statistical techniques employed. The techniques used in both analyses follow the recommendations of multiverse analysis (Steegen et al. 2016) and allowed us to rule out other confounding factors that could lead to a misleading interpretation of the results.
Part 1
We cleaned the data set by omitting observations with missing information, which reduced it to 8131 observations. We used these remaining observations to examine the behavior of outliers with box plots and descriptive statistics. Both Fig. 3 and Table 1 reveal the presence of extreme values, particularly for citations, the number of pages, and the number of figures and tables. The presence of these outliers leads us to evaluate their influence on the regression models that are presented in the following subsections.
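The two steps described above (dropping incomplete rows, then flagging extreme values as a box plot would) can be sketched as follows. This is a minimal illustration, not the authors' code: the records are hypothetical, and the 1.5 × IQR whisker rule is the usual box-plot convention, assumed here because the text does not state the exact rule.

```python
import statistics

def boxplot_outliers(values, k=1.5):
    """Return the values lying beyond the box-plot whiskers (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical records: (citations, sci_hub_downloads); None marks a missing value.
records = [(12, 3), (30, 10), (None, 4), (25, 8),
           (18, None), (40, 12), (22, 6), (900, 15)]

# Listwise deletion: keep only rows without missing information.
complete = [row for row in records if None not in row]
citations = [c for c, _ in complete]

print(len(complete))                # number of rows kept after cleaning
print(boxplot_outliers(citations))  # citation counts flagged as extreme
```

On real data the same rule would be applied to each variable in turn (citations, pages, figures, tables).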
Regression diagnostics
An additional scatterplot analysis reveals a positive relationship between the number of citations and the number of Sci-Hub downloads (see Fig. 4). This relationship, however, can be distorted by the outliers already mentioned.
Table 2 summarizes the results of a multiple regression. There is a positive and significant relationship between the number of Sci-Hub downloads and the number of citations. Number of citations is also positively associated with the number of figures, authors per article, impact factor, and the H-index of both the first and last author of each paper. The length of the title has a negative relationship with the number of citations (i.e., papers with lengthy titles tend to have fewer citations), but this impact proved to be non-significant. The chaperone effect, the number of pages, the total number of tables, and the country resources, as captured by the GDP per capita and nature index, did not significantly predict the citations of a paper (at a statistical significance level of 5%).
Based on the results of Table 2, we can run some regression diagnostics (e.g., checking the assumptions of the model and the effects of the outliers on the results). These diagnostics allow us to decide which parameter estimation method best suits the data. We begin by conducting a residual analysis to test the following assumption: \(E(\epsilon | X)=0\). To do this, we plot the residuals against the fitted values in Panel (a) of Fig. 5. Individual estimates in this graph must be interpreted by their distance from zero (i.e., the larger the distance from zero, the worse the estimate). We observe that some values can significantly alter the results of the regressions. Our second analysis tests the normality of the residuals through a Q-Q plot, as depicted in Panel (b) of Fig. 5. In both tails, several points do not fit the line, invalidating the results of the regressions (in particular, the confidence intervals and the significance tests). In Panel (c) of Fig. 5 we evaluate the i.i.d. assumption, particularly that of homoscedasticity. Most points lie along the red line, which would indicate that the residuals have uniform variance. However, the outlying points again break this pattern, implying problems of heteroscedasticity. Finally, Cook’s distance shows that some points lie very far from the average, as captured by Panel (d) of Fig. 5.
In Table 3, we used a deletion diagnostic to identify which influential observations may cause a substantial change in the fit when they are excluded from the model. We used the following measures of influence when the ith observation is deleted: (a) DFFIT (how much the regression function changes), (b) DFBETA (how much the coefficients change), (c) COVRATIO (how much the covariance matrix changes), (d) \(D^2\) (Cook’s distance, how much the entire regression function changes), and (e) hat-values (for detecting high-leverage observations). In the literature, an observation is commonly considered unusual if it is detected by at least one of these influence measures. Although many observations meet this condition, we only show a few for the sake of brevity. Observations such as 1952 or 2223 stand out under any measure of influence, demonstrating how detrimental these points can be for the results of the regression analysis.
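Two of the influence measures listed above, hat-values and Cook's distance, have closed forms that are easy to compute for a regression with one predictor. The sketch below is illustrative only: the data are hypothetical, the model is reduced to a single regressor, and the 4/n cutoff is a common rule of thumb that the text does not mention.

```python
def influence_measures(x, y):
    """Hat-values and Cook's distances for a one-predictor OLS fit with intercept."""
    n, p = len(x), 2  # p counts the intercept and the slope
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - p)  # residual variance
    # Leverage of each observation (diagonal of the hat matrix).
    hat = [1 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    # Cook's distance combines the residual size with the leverage.
    cooks = [e * e / (p * s2) * h / (1 - h) ** 2 for e, h in zip(residuals, hat)]
    return hat, cooks

# Hypothetical data: the last paper has many downloads but few citations.
downloads = [1, 2, 3, 4, 5, 6, 7, 20]
citations = [2, 4, 6, 8, 10, 12, 14, 5]

hat, cooks = influence_measures(downloads, citations)
flagged = [i for i, d in enumerate(cooks) if d > 4 / len(cooks)]
print(flagged)  # indices of observations exceeding the assumed 4/n cutoff
```

The hat-values of a model with an intercept and one slope always sum to 2, which is a quick sanity check on the leverage computation.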
Dealing with outliers
So far, we have found outliers that threaten the validity of traditional regression analyses. One possible solution, given the regression diagnostics, would be to eliminate the problematic observations. However, valuable information is lost with this technique. Instead, we use a robust regression, which is less sensitive to outliers. Table 4 shows the results for the model presented in Table 2 estimated by robust regression, through the use of iteratively reweighted least squares. Robust regression assigns a higher weight to observations that generate a lower residual. Comparing the results of the OLS and robust regressions shows that coefficients, signs, and statistical significance are very different, revealing a strong influence of outliers on model parameters in the OLS regression.
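The reweighting idea can be illustrated with a small iteratively reweighted least squares loop. This sketch makes several assumptions that the text does not confirm: Huber weights with the conventional tuning constant 1.345, a scale estimate based on the median absolute residual, a single predictor, and hypothetical data in which the true slope is 2.

```python
import statistics

def weighted_line(x, y, w):
    """Weighted least-squares intercept and slope for one predictor."""
    total = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope

def huber_irls(x, y, c=1.345, iterations=50):
    """Iteratively reweighted least squares with Huber weights (assumed choice)."""
    weights = [1.0] * len(x)
    for _ in range(iterations):
        intercept, slope = weighted_line(x, y, weights)
        residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        # Robust scale: median absolute residual, rescaled for normal errors.
        scale = statistics.median(abs(e) for e in residuals) / 0.6745
        if scale == 0:
            break
        # Small residuals keep weight 1; large residuals are down-weighted.
        weights = [min(1.0, c * scale / abs(e)) if e != 0 else 1.0
                   for e in residuals]
    intercept, slope = weighted_line(x, y, weights)
    return intercept, slope, weights

# Hypothetical data close to y = 2x + 1, with one gross outlier at the end.
x = list(range(1, 11))
y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9, 15.2, 16.8, 19.1, 60.0]

_, ols_slope = weighted_line(x, y, [1.0] * len(x))
_, robust_slope, weights = huber_irls(x, y)
print(ols_slope, robust_slope, weights[-1])
```

The outlier pulls the OLS slope well away from 2, while the robust fit down-weights it and stays closer to the trend of the remaining points.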
Dealing with heteroscedasticity
An important assumption in traditional regression models is that errors must be homoscedastic. When this assumption is violated, the usual estimator of the covariance matrix can be inconsistent. Although a first exploration was already carried out through graphical analysis, we test this assumption in our model through the Breusch-Pagan test. As expected for cross-sectional models, the test shows the presence of heteroscedasticity (\(BP = 283.76\), \(df = 14\), \(p < 2.2 \times 10^{-16}\)). The solution consists of employing heteroscedasticity-consistent estimators, here the basic Huber-White sandwich estimator. Table 5 shows the results for the regression with robust standard errors.
With this correction, the results are similar to the results of OLS with regard to the sign, magnitude, and statistical significance of the coefficients, but different from those of a robust regression in the size of the coefficients.
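A rough sketch of the two steps in this subsection, testing for heteroscedasticity and then computing a sandwich standard error, is given below. It assumes the studentized (Koenker) form of the Breusch-Pagan statistic, the HC0 variant of the Huber-White estimator, a single regressor (so the test has 1 degree of freedom rather than the 14 reported above), and hypothetical data whose noise grows with the predictor.

```python
import math

def ols_line(x, y):
    """OLS intercept, slope, and residuals for one predictor."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return intercept, slope, residuals

def breusch_pagan(x, residuals):
    """Studentized LM statistic n * R^2 from regressing squared residuals on x."""
    n = len(x)
    squared = [e * e for e in residuals]
    _, _, aux_residuals = ols_line(x, squared)
    mean_sq = sum(squared) / n
    sst = sum((u - mean_sq) ** 2 for u in squared)
    r2 = 1 - sum(v * v for v in aux_residuals) / sst
    lm = max(n * r2, 0.0)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lm / 2))
    return lm, p_value

def slope_standard_errors(x, residuals):
    """Classical and HC0 (sandwich) standard errors of the slope."""
    n = len(x)
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    classical = math.sqrt(sum(e * e for e in residuals) / (n - 2) / sxx)
    hc0 = math.sqrt(sum(((xi - x_mean) * e) ** 2
                        for xi, e in zip(x, residuals))) / sxx
    return classical, hc0

# Hypothetical data: the noise amplitude is proportional to x.
x = list(range(1, 21))
y = [2 * xi + 0.5 * xi * (-1) ** xi for xi in x]

_, slope, residuals = ols_line(x, y)
lm, p_value = breusch_pagan(x, residuals)
classical_se, hc0_se = slope_standard_errors(x, residuals)
print(lm, p_value, classical_se, hc0_se)
```

Note that the sandwich estimator changes only the standard errors; the coefficient estimates are the same as in OLS, which is consistent with the comparison made above.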
Dealing with endogeneity
Another assumption in OLS models is the absence of endogeneity, which arises when one of the independent variables is correlated with the error term of the regression equation. In that case, the OLS estimates can be spurious. The traditional technique to correct this problem is the use of instrumental variables. However, this method requires external instruments, which are not always available. Here, we rely on Lewbel’s methodology to evaluate the endogeneity problem (Lewbel 2012). Although our results are based on R packages, the implementation of Lewbel’s methodology is most developed in Stata, particularly the tests of overidentifying restrictions. Sargan’s statistic is not robust in the presence of conditional heteroscedasticity, so we rely on the Hansen J statistic. Table 6 shows the results obtained through Lewbel’s method, assuming that the Sci-Hub and Nature Index variables are endogenous. The Hansen test allows us to test the orthogonality conditions for the instruments. The results mentioned above indicate that the model may have an endogeneity problem, so it needs to be instrumented in different models. Taken together, the tests of the OLS assumptions allow us to conclude that our proposed regression model is affected by outliers, heteroscedasticity, and endogeneity. To overcome these problems, our results will be presented using robust regression, regression with robust standard errors, and instrumental variables based on heteroscedasticity.
The results of Table 6 show that the majority of variables proved to be significant predictors of citations, except the title length, the chaperone degree, the total number of tables, and the resources of the affiliated country, as captured by the GDP per capita and the Nature Index.
Results of robustness analysis
In this section, we present our final results and test the robustness of them by using different sets of models and methods. The following equation gives the specification we are trying to estimate:
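A linear form consistent with the symbols defined in the next paragraph can be written as follows. The intercept \(\alpha \) and the coefficient vector \(\gamma \) are assumed names that do not appear in the text, and any transformation applied to the variables cannot be recovered from it.

\[ C_i = \alpha + \beta \, \mathrm{SciHub}_i + X'_i \gamma + \theta _i \]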
where \(C_i\) stands for the number of citations that paper i has received; \(\beta \) is our parameter of interest, as it quantifies the relationship between the citations of paper i and the number of times it was downloaded through Sci-Hub; and \(X'\) is a vector containing the following control variables: the impact factor of the journal where the paper was published; the length of the title of the paper, as captured by the number of types (unique words) in it; the number of graphs included in paper i for communicating scientific findings; the number of tables included in paper i; the chaperone effect, captured by the H-index of the first and last author of paper i; and the number of authors of paper i. \(\theta _i\) represents the residuals of our model.
A reasonable assumption in our model would be that each discipline and journal have different citation patterns. Given the variability intrinsically associated with the scientific discipline and the particular journal where the paper was published, we also include dummies for discipline and journal type to control for hidden confounds. The above specification could be understood as an extended specification of Eq. 2 in the main manuscript. Table 7 shows the results of the estimates of robust regression.
We run Eq. 2 again. However, this time we introduce blocks of variables gradually to conduct a sensitivity analysis. First, model 1 does not include any control variables. Here, the number of times the paper was downloaded from Sci-Hub has a positive and significant effect on the number of citations. The following model introduces the dummies for the type of discipline (i.e., multidisciplinary, economics, consumer, or neuroscience) and journal. In this case, the results for Sci-Hub remain almost unchanged. In the third model, we added a series of variables related to the characteristics of the document (i.e., the number of figures, tables, and pages, and the length of the title). Once again, Sci-Hub remains robust to this new specification. The number of figures and the number of tables included in a paper both show a positive and significant relationship with the number of citations. Conversely, the number of pages and the length of the title show the opposite association. Next, we introduced a new block of variables related to the characteristics of the authors (i.e., the H-index of both the first and last author, the number of authors of the paper, and the chaperone degree). The introduction of these new variables does not change the results for the Sci-Hub effect on article citations. All variables reveal positive and significant effects on citations except the chaperone degree (at a statistical significance level of 5%). In model 5, we introduced variables related to the context in which the authors and journals operate (such as the GDP per capita for the country of the authors, the impact factor of the journal, and the Nature Index). In this model, the Sci-Hub coefficient is still positive and highly significant. For the rest of the variables, only the impact factor seems to correspond to the expectations in terms of sign and statistical significance. Finally, we added all the control variables in the same model. The results remain unchanged from the previous specifications.
In Table 8, we run the same models, but now we estimate the parameters with a heteroscedasticity correction by using robust standard errors.
Regardless of the specification we use, the results show that the effect of Sci-Hub on the number of citations remains positive and significant. Concerning the characteristics of the document or the authors, the results vary for some variables. For example, variables such as the number of tables, the length of the title, or the chaperone effect do not prove to be very robust to different specifications. Finally, in Table 9, we estimate our models by tackling the endogeneity problem. In general terms, the models show good performance. We were able to verify the validity of the instruments, except for models 1 and 3, when they were evaluated through the Hansen J statistic. As can be seen, the effect of Sci-Hub on article citations remains robust to different specifications, while the other variables behave similarly to Table 8, except for the variables related to the context in which the authors and journals operate (i.e., impact factor, author i’s GDP per capita, author i’s Nature Index).
Part 2
We begin the analysis by focusing on the marginal distribution of data. Figure 6a depicts the complete distribution. The presence of an outlier article with more than 9000 citations is evident in Fig. 6b. After excluding the article, it is clear that there are still some articles with more than 2000 citations (see Fig. 6c). By removing those articles, it is possible to obtain a smoother distribution (see Fig. 6e and f).
Fitting the shape of the marginal distribution
Generalized Additive Models for Location, Scale and Shape (GAMLSS) offer a flexible approach that is well suited to modeling these data (Stasinopoulos et al. 2017). GAMLSS allows fitting several count distributions to the marginal distribution and comparing their goodness of fit via the Generalized Akaike Information Criterion (GAIC). Table 10 shows the GAIC results of the different tested distributions. The results indicate that the Zero Inflated Beta Negative Binomial (ZIBNB) and the Zero Adjusted (Hurdle) Beta Negative Binomial (ZABNB) distributions gave the best fit. Figure 7 shows the empirical cumulative distribution function (ECDF) plots of the data and four adjusted distributions.
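The comparison logic, fitting candidate count distributions by maximum likelihood and ranking them by GAIC, can be illustrated without the gamlss R package. The sketch below swaps in two much simpler distributions, Poisson and geometric, because their maximum-likelihood estimates have closed forms; it does not fit the ZIBNB or ZABNB distributions, and the counts are hypothetical. GAIC is taken as minus twice the log-likelihood plus a penalty times the number of parameters, with the penalty of 2 (plain AIC) assumed here.

```python
import math

def poisson_loglik(counts):
    """Log-likelihood of a Poisson fit; the MLE of the rate is the sample mean."""
    lam = sum(counts) / len(counts)
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)

def geometric_loglik(counts):
    """Log-likelihood of a geometric fit on {0, 1, 2, ...}; MLE p = 1 / (1 + mean)."""
    mean = sum(counts) / len(counts)
    p = 1 / (1 + mean)
    return sum(k * math.log(1 - p) + math.log(p) for k in counts)

def gaic(loglik, n_params, penalty=2.0):
    """Generalized AIC: -2 * log-likelihood + penalty * number of parameters."""
    return -2 * loglik + penalty * n_params

# Hypothetical citation counts: many small values and a long right tail.
counts = [0, 0, 0, 1, 1, 2, 3, 5, 8, 13, 40]

gaic_poisson = gaic(poisson_loglik(counts), n_params=1)
gaic_geometric = gaic(geometric_loglik(counts), n_params=1)
print(gaic_poisson, gaic_geometric)  # the lower value indicates the better fit
```

For data this skewed, the heavier-tailed geometric distribution obtains the lower GAIC, which mirrors why heavy-tailed, zero-inflated families are natural candidates for citation counts.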
Statistical analyses
As shown above, the four-parameter Zero Inflated Beta Negative Binomial (ZIBNB) distribution gave the best fit. Hence, the data were modelled with this distribution. The first two parameters of the ZIBNB are \(\mu \) and \(\sigma \), which represent the distribution’s location and scale. Recall that GAMLSS makes it possible to model location (e.g., mean), scale (e.g., SD), and shape (i.e., skewness and kurtosis). For simplicity, though, only the location is modelled. For numeric covariates, besides linear modeling, GAMLSS allows modeling covariates via smoothers (e.g., penalized B-splines, monotone P-splines, loess curves). Also, via the package gamlss.util it is possible to use neural networks, decision trees, and others (see pages 24 to 25 in Stasinopoulos et al. 2017). The results of the location modeling are shown in Table 11 (these results are ranked according to their absolute t-values).
Final model
A forward and backward stepwise variable selection procedure, applied to a model with all variables, suggested the following model:
where \(C_i\) is the number of citations the paper i has received; SciHub is the number of times the paper i was downloaded through SciHub; APA is the number of authors per article; TL is the length of the title of the paper; HIN and HI1 are the H-index of the first and last author of each paper; CE is the chaperone effect; TG and TT are the numbers of graphs and tables included in the paper; IF is the impact factor of the journal where the paper was published; GDPpc is the GDP per capita of the first author; and NI is the Nature Index.
By removing the variables not included in this model, the AIC went from 82692.05 to 82685.60.
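The size of this improvement follows directly from the two reported values; since lower AIC is better, the reduced model is preferred:

\[ \Delta \mathrm{AIC} = 82692.05 - 82685.60 = 6.45 \]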
Referring back to Table 11, and in order to render the model more parsimonious, it could be argued that if only three variables were to be kept, they would be, in this order, the number of Sci-Hub downloads (ScihubN) and the total number of figures in the published paper (Total.of.figures.y) as predictors, with Citations as the dependent variable. The impact factor of the publishing journal (IF) could also be considered another good predictor of citations.
Figure 8 displays the associations between Scihub and Citations (A) and between the total number of figures and Citations (B). Figure 8a and b show linear and non-linear fitting lines. The linear fitting is performed via least median of squares (LMS) regression and the non-linear fitting is done via locally weighted scatterplot smoothing (LOWESS). The three-way associations are graphed using ordinary least squares planes (Fig. 8d) and locally estimated scatterplot smoothing (LOESS).
Cite this article
Correa, J.C., Laverde-Rojas, H., Tejada, J. et al. The Sci-Hub effect on papers’ citations. Scientometrics 127, 99–126 (2022). https://doi.org/10.1007/s11192-020-03806-w
DOI: https://doi.org/10.1007/s11192-020-03806-w