Code Sharing Is Associated with Research Impact in Image Processing
Citation: Patrick Vandewalle (2012/07) Code Sharing Is Associated with Research Impact in Image Processing. Computing in Science & Engineering
DOI (original publisher): 10.1109/MCSE.2012.63
Semantic Scholar (metadata): 10.1109/MCSE.2012.63
Sci-Hub (fulltext): 10.1109/MCSE.2012.63
Internet Archive Scholar (search for fulltext): Code Sharing Is Associated with Research Impact in Image Processing
Tagged: Computer Science, academic software
As the title reads, the author found that code sharing correlates with research impact in their field by sampling papers from mainstream journals in the mid-2000s.
- Most papers in signal processing do not release source code. This makes their results hard to reproduce.
- It is widely agreed that this _shouldn't_ be the case (according to the author), but it is.
- Academics have little explicit incentive to release their source code.
- There are implicit incentives: visibility through downloads, feedback from users, ease of collaboration, usage by other researchers, and citations.
- Thesis: correlation between releasing code and citations.
- Causality can be determined by a "controlled experiment" which is left as future work.
- Releasing source code is neither necessary nor sufficient for reproducibility. Open-source code can be unreproducible if it depends on the system it runs on; conversely, reproducible work that does not require code, or that describes its code very carefully, does not need to release code.
- But they are correlated.
- See also:
- Peace research: N.P. Gleditsch and H. Strand, “Posting Your Data: Will You Be Scooped or Will You Be Famous?” Int’l Studies Perspectives, vol. 4, no. 1, 2003, pp. 89–97.
- Cancer research: H.A. Piwowar, R.S. Day, and D.B. Fridsma, “Sharing Detailed Research Data Is Associated with Increased Citation Rate,” PLoS ONE, vol. 2, no. 3, 2007, p. e308.
- Astronomy: E.A. Henneken and A. Accomazzi, “Linking to Data—Effect on Citation Rates in Astronomy,” Proc. Astronomical Data Analysis Software and Systems, Astronomical Soc. of the Pacific, 2011.
- Open access: S. Lawrence, “Free Online Availability Substantially Increases a Paper’s Impact,” Nature, vol. 411, no. 6837, 2001, p. 521.
- IEEE Transactions on Image Processing 2004 -- 2006, 645 papers
- Searched for source by hand.
- Roughly 10% had source.
- Use Google Scholar for citation numbers. Web of Science tends to be more selective when counting citations.
- Long-tail of rarely cited papers, so median is better than mean.
- Look for a difference in medians (2004: median citation count without source = 25, with source = 76).
- Mann-Whitney U-test says the difference is statistically significant.
- The null hypothesis is still rejected even if one removes half of the papers with source code from consideration; the result is robust to ignoring some of the "superstar" reproducible papers.
- Conclusion: Publications with source are more often cited.
- Only best-cited papers from 2004 -- 2008, IEEE Transactions on Image Processing (TIP), IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), and IEEE Transactions on Signal Processing (TSP)
- Count those which have source code available.
- Roughly 90% had source with the exception of TSP, due to its theoretical nature.
- TIP and TPAMI have a much greater proportion of their best-cited papers with source code than TSP.
- Conclusion: Best-cited papers release their source code.
- The lifetime of a source repository can be cut short if the website hosting it is phased out.
- Industry research may have problems releasing source; they should still make studies reproducible internally.
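
The statistical comparison in the first study (medians rather than means for long-tailed citation counts, a Mann-Whitney U-test, and a robustness check that drops half of the papers with code) can be sketched roughly as follows. This is a minimal illustration, not the author's analysis: the citation counts are synthetic, the log-normal parameters are arbitrary, the test uses a plain normal approximation with tie correction and no continuity correction, and dropping the *most-cited* half of the papers with code is an assumption about how the "superstar" papers might be removed.

```python
import math
import random
import statistics


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U-test via the normal approximation.

    Returns (U statistic of x, approximate two-sided p-value).
    Tied values get average ranks and the variance is tie-corrected.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # average of 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t**3 - t
        i = j
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0  # all values identical: no evidence of a difference
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return u_x, p_value


# Synthetic long-tailed "citation counts" (NOT the paper's data).
# Group sizes loosely mirror 645 papers with roughly 10% sharing code.
rng = random.Random(0)
without_code = [int(rng.lognormvariate(3.2, 1.0)) for _ in range(580)]
with_code = [int(rng.lognormvariate(4.3, 1.0)) for _ in range(65)]

# A few heavily cited papers pull the mean up, so the median is the safer summary.
for label, counts in (("without code", without_code), ("with code", with_code)):
    print(label, "mean:", round(statistics.mean(counts), 1),
          "median:", statistics.median(counts))

u_full, p_full = mann_whitney_u(with_code, without_code)
print("all papers with code: U =", u_full, "p =", p_full)

# Robustness check (assumed variant): discard the most-cited half of the
# papers with code and test again.
kept = sorted(with_code)[: len(with_code) // 2]
u_half, p_half = mann_whitney_u(kept, without_code)
print("least-cited half with code: U =", u_half, "p =", p_half)
```

In practice a library routine such as SciPy's `scipy.stats.mannwhitneyu` would normally replace the hand-written test; the pure-Python version is used here only to keep the sketch dependency-free.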