The New Legal Landscape for Text Mining and Machine Learning

From AcaWiki

Citation: Matthew Sag. The New Legal Landscape for Text Mining and Machine Learning.
Internet Archive Scholar (search for fulltext): The New Legal Landscape for Text Mining and Machine Learning
Wikidata (metadata): Q64475810
Download: https://ssrn.com/abstract=3331606

Summary

In the US, text and data mining (TDM) is now firmly established as fair use. The Authors Guild cases against Google Books and HathiTrust held that copying expressive works for non-expressive uses is fair use. This is in line with core copyright concepts: the copyright holder controls the communication of original expression to the public, which non-expressive uses do not involve, and the idea-expression dichotomy leaves unprotected the facts and ideas that such uses extract.

However, TDM stands on less solid ground outside the US, e.g., in the EU, for two reasons. First, those jurisdictions lack fair use, which offers greater flexibility than limited statutory exceptions. Second, their lower threshold for triggering the reproduction right raises questions about derivative works and machine learning that US researchers can ignore.

Breaking a TDM research project into four conceptual stages highlights other potential obstacles, e.g., contracts with the institutions holding the materials, the Computer Fraud and Abuse Act (CFAA), and anti-circumvention rules:

  1. Access (either physical or digital);
  2. Extraction (copying);
  3. Mining (analytical processing, internal verification, and external validation); and
  4. Use.