Usage and attribution of Stack Overflow code snippets in GitHub projects

From AcaWiki

Citation: Sebastian Baltes, Stephan Diehl. Usage and attribution of Stack Overflow code snippets in GitHub projects.
Internet Archive Scholar (search for fulltext): Usage and attribution of Stack Overflow code snippets in GitHub projects
Wikidata (metadata): Q57262997
Download: https://arxiv.org/abs/1802.02938

Summary

The authors investigate the use and attribution of Stack Overflow (SO) content, in particular whether that use complies with SO's license and terms of service. The data consists of Java code and posts from GitHub (GH) and SO data dumps, plus a survey of developers who use both sites.

RQ1: How often is code from Stack Overflow posts used in public GitHub projects without the required attribution?

Using three methods, the authors find one quarter to be a reasonable upper bound for the ratio of attributed usages of SO Java snippets in GH files; in other words, at least three quarters of such usages are not attributed at all. Between 3.3% and 11.9% of the analyzed repositories contained references to SO questions or answers.
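To make "references to SO questions or answers" concrete, here is a minimal sketch of how such links might be located in source files. The regular expression and the function name `find_so_references` are assumptions for illustration, not the authors' method; it recognizes only the common `questions/<id>`, `q/<id>` and `a/<id>` URL forms.

```python
import re

# Hypothetical pattern for common Stack Overflow link forms (an assumption,
# not the paper's detection method):
#   stackoverflow.com/questions/<id>[/slug[/<answer-id>]]
#   stackoverflow.com/q/<id>   and   stackoverflow.com/a/<id>[/<user-id>]
SO_LINK = re.compile(
    r"https?://(?:www\.)?stackoverflow\.com/(questions|q|a)/(\d+)",
    re.IGNORECASE,
)


def find_so_references(source: str) -> list[tuple[str, int]]:
    """Return (kind, post_id) pairs for every Stack Overflow link in the text."""
    refs = []
    for match in SO_LINK.finditer(source):
        # Short "a/<id>" links point to answers; the other forms name a question id.
        kind = "answer" if match.group(1).lower() == "a" else "question"
        refs.append((kind, int(match.group(2))))
    return refs


java_file = """
// Adapted from https://stackoverflow.com/a/12345
public static String reverse(String s) { return new StringBuilder(s).reverse().toString(); }
"""
print(find_so_references(java_file))
```

A repository would count as "containing a reference" if this returns a non-empty list for any of its files. Note that answer links written in the long form (`questions/<qid>/slug/<aid>`) are reported here by their question id only.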

RQ2: How often does the license of repositories containing code copied from Stack Overflow conflict with Stack Overflow’s license?

At most 1.8% of all analyzed repositories containing code from SO used the code in a way compatible with CC BY-SA 3.0, that is, the repository was GPL-3.0 licensed and the code was attributed. (Compatibility with GPL-3.0 is conceivable via the chain CC BY-SA 3.0 to CC BY-SA 4.0 to GPL-3.0.)
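The compatibility reasoning above can be sketched as a reachability check over one-way license edges. This is a simplified, hypothetical model: the edge table `COMPATIBLE_WITH` and the functions `can_relicense` and `compatible_usage` are illustrative names, only the chain mentioned above is encoded, and real license analysis involves more conditions than this.

```python
# Hypothetical, simplified model of the RQ2 reasoning (not the paper's code).
# One-way compatibility edges: a work under the key license may be adapted
# under each license in the value set. Only the chain named in the summary
# (CC BY-SA 3.0 -> CC BY-SA 4.0 -> GPL-3.0) is encoded.
COMPATIBLE_WITH = {
    "CC-BY-SA-3.0": {"CC-BY-SA-4.0"},
    "CC-BY-SA-4.0": {"GPL-3.0"},
}


def can_relicense(src: str, dst: str) -> bool:
    """True if dst is reachable from src by following one-way compatibility edges."""
    seen, stack = set(), [src]
    while stack:
        lic = stack.pop()
        if lic == dst:
            return True
        if lic in seen:
            continue
        seen.add(lic)
        stack.extend(COMPATIBLE_WITH.get(lic, ()))
    return False


def compatible_usage(repo_license: str, attributed: bool) -> bool:
    """SO code (CC BY-SA 3.0) counts as compatibly used only if attributed
    and the repository license is reachable from CC BY-SA 3.0."""
    return attributed and can_relicense("CC-BY-SA-3.0", repo_license)


print(compatible_usage("GPL-3.0", attributed=True))
print(compatible_usage("MIT", attributed=True))
```

Because the edges are one-way, the check also captures that GPL-3.0 code cannot flow back into CC BY-SA, and a permissive license such as MIT is never reachable.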

RQ3: Do developers adhere to the attribution requirements defined in the Stack Overflow terms of service?

Most comments with attribution included only a link, without naming the author or linking to the author's SO profile page. None of the analyzed references fulfilled SO's attribution terms, although those terms may demand more than can actually be enforced under CC BY-SA.
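The gap between a link-only comment and a complete attribution can be illustrated with a small checker for the three elements named above: a link to the post, the author's name, and a link to the author's SO profile. This is a hypothetical sketch; the names `attribution_elements` and `meets_so_terms` are made up, the author name is assumed to be known in advance (e.g. looked up from an SO data dump), and the patterns are simple heuristics.

```python
import re

# Hypothetical heuristics (not the paper's method) for the three elements:
#   a link to the SO post, the author's name, and a link to the author's
#   SO profile page (stackoverflow.com/users/<id>).
POST_LINK = re.compile(r"stackoverflow\.com/(?:questions|q|a)/\d+", re.I)
PROFILE_LINK = re.compile(r"stackoverflow\.com/users/\d+", re.I)


def attribution_elements(comment: str, author_name: str) -> dict[str, bool]:
    """Report which attribution elements appear in a source-code comment."""
    return {
        "post_link": bool(POST_LINK.search(comment)),
        # Guard against an empty name, which would trivially be "in" any string.
        "author_named": bool(author_name) and author_name.lower() in comment.lower(),
        "profile_link": bool(PROFILE_LINK.search(comment)),
    }


def meets_so_terms(comment: str, author_name: str) -> bool:
    """True only if all three elements are present."""
    return all(attribution_elements(comment, author_name).values())


link_only = "// from https://stackoverflow.com/a/12345"
print(attribution_elements(link_only, "Jane Doe"))
```

A link-only comment like the one above, which the summary reports as the most common form, passes only the first check.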

RQ4: Are software developers aware of the licensing of Stack Overflow code snippets and its implications?

75% of the developers responding to the survey did not know that content on SO is licensed under CC BY-SA 3.0, and 67% did not know that attribution is required. Not attributing code copied from SO was common practice (41%), though the attribution rate respondents claimed was far higher than the rate observed in the repositories, possibly due to social desirability bias.