The effect of information controls on developers in China: An analysis of censorship in Chinese open source projects

From AcaWiki

Citation: Jeffrey Knockel, Masashi Crete-Nishihata, Lotus Ruan. The effect of information controls on developers in China: An analysis of censorship in Chinese open source projects.
Internet Archive Scholar (search for fulltext): The effect of information controls on developers in China: An analysis of censorship in Chinese open source projects
Wikidata (metadata): Q64410410
Download: https://www.aclweb.org/anthology/papers/W/W18/W18-4201/

Summary

The authors collected files from GitHub, extracted lists of strings from them, and determined which lists are likely blacklists of sensitive, disallowed Chinese keywords, starting from known keywords previously reverse engineered from chat apps. Using keyword matching, machine learning, and manual inspection, they found 524 unique Chinese blacklists containing 215k unique keywords. The longest list contained 38k keywords; the mean was 2k and the median 1k (only lists with at least 20 words were considered). The lists were dissimilar to one another, which is consistent with Chinese censorship being decentralized and ad hoc.
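
The keyword-matching step and the list statistics described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the seed-hit threshold (MIN_SEED_HITS), and the use of Jaccard similarity to measure how much two lists overlap are all assumptions made for illustration. Only the 20-word minimum comes from the summary, and the machine learning and manual inspection stages are not modeled.

```python
from itertools import combinations
from statistics import mean, median

MIN_LIST_SIZE = 20  # minimum list length stated in the summary
MIN_SEED_HITS = 2   # hypothetical threshold, not taken from the paper


def is_candidate_blacklist(strings, seed_keywords):
    """Flag a string list as a likely blacklist by seed-keyword matching.

    A list qualifies if it has at least MIN_LIST_SIZE unique entries and
    shares at least MIN_SEED_HITS entries with the known seed keywords.
    """
    unique = set(strings)
    if len(unique) < MIN_LIST_SIZE:
        return False
    return len(unique & set(seed_keywords)) >= MIN_SEED_HITS


def jaccard(a, b):
    """Jaccard similarity of two keyword lists (assumed overlap measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def summarize(lists):
    """Compute size statistics and mean pairwise similarity for the lists."""
    if not lists:
        raise ValueError("need at least one list")
    sizes = [len(set(keywords)) for keywords in lists]
    pairwise = [jaccard(a, b) for a, b in combinations(lists, 2)]
    return {
        "lists": len(lists),
        "unique_keywords": len(set().union(*lists)),
        "longest": max(sizes),
        "mean": mean(sizes),
        "median": median(sizes),
        "mean_pairwise_jaccard": mean(pairwise) if pairwise else 0.0,
    }
```

A low mean pairwise similarity across many lists would match the summary's observation that the lists are dissimilar; the paper itself may quantify this differently.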