Large teams develop and small teams disrupt science and technology

From AcaWiki

Citation: Lingfei Wu, Dashun Wang, James A. Evans. Large teams develop and small teams disrupt science and technology.
Internet Archive Scholar (search for fulltext): Large teams develop and small teams disrupt science and technology
Wikidata (metadata): Q61721669
Tagged: github

Summary

Uses an existing disruptiveness metric. The disruption, D, of a focal paper is computed from the later works that cite the focal paper, its references, or both: type i papers cite the focal paper but none of its references, type j papers cite both the focal paper and its references, and type k papers cite only the references. D is the difference between the proportions of type i and type j papers, pi − pj, which equals (ni − nj) / (ni + nj + nk), where ni + nj + nk is the number of all such subsequent works. D ranges from −1 (developing) through 0 (neutral) to 1 (disrupting).
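The metric can be illustrated with a small sketch. It assumes the three-way split used in the original paper (later works citing only the focal paper, both the focal paper and its references, or only the references). The function name `disruption` and the dictionary layout of the input are hypothetical choices made for this example, not the authors' code.

```python
from typing import Dict, Set


def disruption(focal: str, references: Set[str],
               citations: Dict[str, Set[str]]) -> float:
    """Compute D = (n_i - n_j) / (n_i + n_j + n_k) for one focal work.

    `citations` maps each later work to the set of works it cites
    (a hypothetical input layout chosen for this sketch).
    """
    n_i = n_j = n_k = 0
    for work, cited in citations.items():
        if work == focal:
            continue
        cites_focal = focal in cited
        cites_refs = bool(cited & references)
        if cites_focal and not cites_refs:
            n_i += 1  # cites the focal work only
        elif cites_focal and cites_refs:
            n_j += 1  # cites the focal work and its references
        elif cites_refs:
            n_k += 1  # cites the references only
    total = n_i + n_j + n_k
    if total == 0:
        raise ValueError("no later work cites the focal work or its references")
    return (n_i - n_j) / total


# Toy example: two works cite only F, one cites F and R1, one cites only R2.
example = {"a": {"F"}, "b": {"F"}, "c": {"F", "R1"}, "d": {"R2"}}
print(disruption("F", {"R1", "R2"}, example))
```

In the toy example ni = 2, nj = 1 and nk = 1, so D = (2 − 1) / 4.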

Collected datasets from:

  1. Web of Science (WOS) database that contains more than 42 million articles published between 1954 and 2014, and 611 million citations among them
  2. 5 million patents granted by the US Patent and Trademark Office from 1976 to 2014, and 65 million citations added by patent applicants
  3. 16 million software projects and 9 million forks to them on GitHub (2011–2014), a popular web platform that allows users to collaborate on the same code repository and ‘cite’ other repositories by copying and building on their code

Found that as teams grow from 1 to 50 members, their papers, patents and software products drop in measured disruption by 70, 30 and 50 percentiles, respectively.

Results do not change when only considering empirical or theoretical articles, review or original research articles, and other characteristics.

Solo authors and small teams much more often build on older, less popular ideas. Larger teams more often target recent, high-impact work as their primary source of inspiration, and this tendency increases monotonically with team size.

Recommend ensuring funding for a diverse range of team sizes.

Theoretical and Practical Relevance

The author website, https://lingfeiwu.github.io/smallTeams/, includes links to code, data, and major press coverage.

Is the software dataset actually finding disruptive work? "Dataset of software. The GitHub data contain 15,984,275 code bases (or repositories) contributed by 2,348,085 programmers in GitHub between 2011 and 2014. In this period, 2,065,729 programmers contributed 9,127,410 forking patterns in which they copied and saved an existing repository to build upon it. To calculate disruption and other measures, we construct a citation network of repositories. For each repository, we identify its core members as those who contributed more edits, or ‘pushes’, than the average value of all contributors to a repository. We then add a citation link from repository A to B if a core member of A forked the code from B between this user’s first and last edit of A. The constructed network contains 26,900 nodes (repositories) and 108,640 links."
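The network construction described in the quote can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the `Push` and `Fork` records, their fields, and the integer timestamps are hypothetical. The sketch reads "more than the average" strictly, so a sole contributor (whose push count equals the mean) is not counted as core; the quoted passage does not say how such cases are treated.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Push:
    user: str
    repo: str
    time: int  # hypothetical integer timestamp


@dataclass(frozen=True)
class Fork:
    user: str
    source_repo: str  # the repository that was forked
    time: int


def core_members(pushes: List[Push]) -> Dict[str, Set[str]]:
    """Users with strictly more pushes than the repository's per-user mean."""
    counts: Dict[str, Dict[str, int]] = {}
    for p in pushes:
        per_user = counts.setdefault(p.repo, {})
        per_user[p.user] = per_user.get(p.user, 0) + 1
    core: Dict[str, Set[str]] = {}
    for repo, per_user in counts.items():
        mean = sum(per_user.values()) / len(per_user)
        core[repo] = {u for u, n in per_user.items() if n > mean}
    return core


def citation_links(pushes: List[Push], forks: List[Fork]) -> Set[Tuple[str, str]]:
    """Link A -> B if a core member of A forked B between their
    first and last push to A."""
    # First and last push time for each (user, repo) pair.
    span: Dict[Tuple[str, str], Tuple[int, int]] = {}
    for p in pushes:
        key = (p.user, p.repo)
        first, last = span.get(key, (p.time, p.time))
        span[key] = (min(first, p.time), max(last, p.time))
    forks_by_user: Dict[str, List[Fork]] = {}
    for f in forks:
        forks_by_user.setdefault(f.user, []).append(f)
    links: Set[Tuple[str, str]] = set()
    for repo, members in core_members(pushes).items():
        for user in members:
            first, last = span[(user, repo)]
            for f in forks_by_user.get(user, []):
                if f.source_repo != repo and first <= f.time <= last:
                    links.add((repo, f.source_repo))
    return links


# Toy data: alice is the only core member of A (4 pushes vs. a mean of 2.5).
pushes = [Push("alice", "A", t) for t in (1, 2, 3, 10)] + [Push("bob", "A", 5)]
forks = [Fork("alice", "B", 4), Fork("alice", "C", 20), Fork("bob", "D", 5)]
links = citation_links(pushes, forks)
print(links)
```

In the toy data only alice's fork of B falls within her editing window on A, so a single link from A to B is produced. Filters of this kind discard repositories without qualifying forks, which may help explain why the reported network has far fewer nodes than the raw dataset has repositories.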