Descriptive Metadata in the Music Industry: Why It Is Broken And How to Fix It—Part One
Citation: Tony Brooke (2014) Descriptive Metadata in the Music Industry: Why It Is Broken And How to Fix It—Part One. Journal of Digital Media Management
Tagged: musicbrainz
Descriptive metadata, "popularly known as credits or liner notes", is no more detailed for digital downloads today than it was for audio cylinders in 1899. Two "keystones" for a solution: a standardized descriptive metadata schema and a globally unique abstracted persistent identifier (GUAPI).
Paper does not cover "technical metadata" (e.g., file type, bitrate), "administrative metadata" (used for managing a collection, e.g., date modified and sort fields), "structural metadata" (compounding multiple items into one), or "performing rights metadata" (related to descriptive metadata, but with higher stakes for the industry and not of direct interest to the public).
Music metadata has origins in printed scores. Inclusion with recorded music serves multiple goals:
- business: promote names of artists and labels to increase sales
- organizational: more effective delivery to retailers and better collection management
- legal: labels contractually obligated to pay contributors
- acknowledgement: right thing to do
Claims descriptive metadata delivered with recordings follows a bell curve: minimal with cylinders, extensive with LPs, minimal again with digital downloads. Also claims descriptive metadata increases differentiation; without it music becomes a commodity, and piracy increases with homogeneity.
Briefly describes key terms:
- authority, authority records
- controlled vocabulary
- data content standard (style guide)
- persistent identifier
- abstracted model (able to refer to various levels of precision, e.g., FRBR's work, expression, manifestation, item)
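To make the last two terms concrete, here is a minimal sketch of an abstracted model with persistent identifiers. It is not taken from the paper or from any real schema: the class and field names are hypothetical, a random UUID stands in for an identifier issued by an authority, and FRBR's "item" level (a single physical copy) is left out for brevity.

```python
from dataclasses import dataclass, field
from uuid import uuid4


def new_pid() -> str:
    # Assumption: a random UUID stands in for an authority-issued identifier.
    return str(uuid4())


@dataclass(frozen=True)
class Work:
    """The abstract composition, independent of any performance."""
    title: str
    pid: str = field(default_factory=new_pid)


@dataclass(frozen=True)
class Expression:
    """One realization of a work, such as a particular recording."""
    work: Work
    performer: str
    pid: str = field(default_factory=new_pid)


@dataclass(frozen=True)
class Manifestation:
    """One published form of an expression, such as an LP or a download."""
    expression: Expression
    release_title: str
    medium: str
    pid: str = field(default_factory=new_pid)


def work_of(manifestation: Manifestation) -> Work:
    """Walk up from a released product to the underlying work."""
    return manifestation.expression.work


# Placeholder data, not real catalog entries.
song = Work("Example Song")
studio = Expression(song, "Example Artist")
live = Expression(song, "Example Artist")
lp = Manifestation(studio, "Example Album", "LP")
download = Manifestation(studio, "Example Album", "digital download")
```

Each level carries its own identifier, so the LP and the download can be told apart while both still resolve to the same recording and the same work. A flat schema would have to repeat or drop that shared information.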
Reasons for lack of descriptive metadata:
- Lack of descriptive metadata in CD specification created vacuum
- CDDB very late and imperfect workaround, offered no way to correct errors
- Resulting metadata-deprived files widely shared via P2P
- Rise of online stores included DDEX standard, but it did not address descriptive metadata
- Various silos and non-standard schemas arose
- Mostly a business problem: little collaboration among music industry entities
- Agreement there is a problem, no consensus on how to fix
Describes several existing silos, aiming to ask whether any can serve as an independent authority and whether one schema can be used as a model for a standard:
- Gracenote: proprietary, flat schema, poor interoperability, 130m tracks
- AllMusicGuide: proprietary, flat schema, 2.9m albums
- EchoNest: mixed, flat/mixed schema, good interoperability, 2.6m artists, 35m tracks
- MusicBrainz: open, abstracted schema, excellent interoperability, 0.8m artists, 16m tracks, 1.3m albums
- Discogs: open, flat schema, good interoperability, 3.0m artists, 4.3m albums
Gracenote (privatized CDDB, owned by Sony until sold to Tribune in 2014) and AMG (owned by Rovi) sell licenses to their metadata, to the detriment of their potential as authorities, and would be unlikely to offer their schemas to the community.
EchoNest (recently acquired by Spotify) also commercial, but makes some descriptive metadata available for free, and adds contextual and subjective metadata.
Discogs has good independence relative to Gracenote and AMG, is edited by its community, and is focused on albums.
MusicBrainz is "notable for following many good practices": it is run by a nonprofit, user-contributed information is policed by voting, and editorial and style guidelines are strict. Its abstracted schema allows cataloging "amazingly detailed relationships", a metadata crosswalk to other schemas provides interoperability, the complete database is available for download, and governance is transparent. High quality, but relatively low quantity. On net, "by far strongest contender as an independent authority".
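As a rough illustration of what a metadata crosswalk does, the sketch below projects a nested, abstracted record onto a flat one. All field names are hypothetical and neither side corresponds to the actual MusicBrainz schema or to any other real schema; the point is only the mapping table.

```python
# Hypothetical crosswalk: flat field name -> (level, key) in the nested record.
CROSSWALK = {
    "track_title": ("recording", "title"),
    "artist": ("recording", "artist"),
    "album": ("release", "title"),
    "composer": ("work", "composer"),
}


def flatten(record: dict) -> dict:
    """Project a nested (abstracted) record onto the flat schema."""
    flat = {}
    for flat_field, (level, key) in CROSSWALK.items():
        value = record.get(level, {}).get(key)
        if value is not None:  # skip fields the source record does not have
            flat[flat_field] = value
    return flat


# Placeholder data, not a real catalog entry.
abstracted = {
    "work": {"title": "Example Song", "composer": "Example Composer"},
    "recording": {"title": "Example Song (Live)", "artist": "Example Artist"},
    "release": {"title": "Example Album"},
}
flat = flatten(abstracted)
```

The work's own title has no slot in the flat record, so it is lost in the projection. That is why going from an abstracted schema to a flat one is straightforward while the reverse direction generally is not.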
Regarding fears of crowdsourced metadata:
- MusicBrainz thoroughly vetted
- If a stakeholder finds an error, they can correct it directly, unlike in less accessible systems
- Descriptive metadata is not mission critical (cf. performing rights metadata)
MusicBrainz schema may prove even more important than MusicBrainz central database.
DDEX helps track sales, but includes just two descriptive fields: "title" and "main artist". CCD (Content Creator Data) is a descriptive metadata schema in progress from DDEX. CCD includes exhaustive fields for documenting production, but is not abstracted, making it inadequate for released music. DDEX shows the music industry can collaborate when financially justified.
No single solution yet meets all the requirements:
- abstracted schema
- stakeholder buy-in
- ability to document entire process from production to release