Methodology & Data Sources
Every number on this site is traceable to a public, openly licensed catalogue. This page lists the three primary sources, their licences, and how we process them.
Data Sources
MoMA Collection (primary)
The Museum of Modern Art publishes its Artists and Artworks catalogues on GitHub under CC0. Repository: MuseumofModernArt/collection. We import the full Artists.csv (≈15,800 artists) and Artworks.csv (≈160,000 works).
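As a rough illustration of the import step, the sketch below parses Artists.csv text into an in-memory lookup. The column names (ConstituentID, DisplayName, Nationality) are assumed from the published file header and should be checked against the current release; the function name load_artists and the sample rows are hypothetical and are not part of our pipeline.

```python
import csv
import io


def load_artists(csv_text):
    """Parse Artists.csv text into a dict keyed by artist ID.

    Assumes the columns ConstituentID, DisplayName and Nationality exist.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    artists = {}
    for row in reader:
        artist_id = int(row["ConstituentID"])
        artists[artist_id] = {
            "display_name": row["DisplayName"].strip(),
            # An empty nationality cell is stored as None rather than "".
            "nationality": row["Nationality"].strip() or None,
        }
    return artists


# Made-up sample rows, for illustration only.
sample = (
    "ConstituentID,DisplayName,Nationality\n"
    "1,Example Artist A,American\n"
    "2,Example Artist B,\n"
)
artists = load_artists(sample)
```

Keying by ConstituentID rather than by name keeps the lookup stable when two artists share a display name.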
The Metropolitan Museum of Art — Open Access
The Met publishes its catalogue data under CC0 in the metmuseum/openaccess repository. Where a Met public-domain record matches a MoMA artist by display name, we enrich the artist profile with a CC0 image sourced from the Met Collection API.
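One way display-name matching could work is sketched below. The normalisation rules (strip diacritics, ignore case, collapse whitespace) are an assumption; this page only states that records are matched by display name. The record keys object_id, artist_display_name and is_public_domain are hypothetical stand-ins for the corresponding Met fields.

```python
import unicodedata


def normalize_name(name):
    """Return a comparison key: no diacritics, case-insensitive, single spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_marks.casefold().split())


def match_by_display_name(moma_artists, met_records):
    """Map MoMA artist IDs to the first matching public-domain Met object ID.

    moma_artists: dict of artist ID -> display name.
    met_records: iterable of dicts with hypothetical keys
                 object_id, artist_display_name, is_public_domain.
    """
    index = {}
    for artist_id, display_name in moma_artists.items():
        # If two artists normalise to the same key, the first one wins.
        index.setdefault(normalize_name(display_name), artist_id)

    matches = {}
    for record in met_records:
        if not record["is_public_domain"]:
            continue  # only public-domain records are eligible
        artist_id = index.get(normalize_name(record["artist_display_name"]))
        if artist_id is not None:
            matches.setdefault(artist_id, record["object_id"])
    return matches
```

Name-only matching can conflate two different people who share a name, so any real pipeline would likely need a review step for ambiguous keys.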
Free Music Archive (FMA)
An open metadata archive of ≈106,000 CC-licensed tracks, published in the mdeff/fma repository. It supports our music-genre hubs with real track and artist counts per genre.
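The per-genre counts can be pictured with a small aggregation like the one below. It is a sketch only: the (track_id, artist_id, genre) tuple shape is a hypothetical simplification and does not mirror the layout of the FMA metadata files.

```python
from collections import defaultdict


def genre_counts(tracks):
    """Count tracks and distinct artists per genre.

    tracks: iterable of (track_id, artist_id, genre) tuples (hypothetical shape).
    """
    track_totals = defaultdict(int)
    artists_by_genre = defaultdict(set)
    for _track_id, artist_id, genre in tracks:
        track_totals[genre] += 1
        # A set ensures each artist is counted once per genre.
        artists_by_genre[genre].add(artist_id)
    return {
        genre: {
            "tracks": track_totals[genre],
            "artists": len(artists_by_genre[genre]),
        }
        for genre in track_totals
    }
```

Counting distinct artist IDs, not rows, is what keeps a prolific artist from inflating the artist total for a genre.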
Processing
CSVs are imported into custom MySQL tables on every upstream change (weekly). Posts are regenerated only when their underlying rows change, so unchanged pages do not incur re-crawling costs.
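A possible way to decide which posts need regenerating is to compare a fingerprint of each row against the one stored at the previous import. This is a sketch under that assumption; this page does not specify how changes are detected, and the function names are hypothetical.

```python
import hashlib
import json


def row_fingerprint(row):
    """Return a stable SHA-256 hex digest of a row (a dict of column -> value)."""
    # sort_keys makes the digest independent of column order.
    canonical = json.dumps(row, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def rows_needing_regeneration(rows, stored_fingerprints):
    """Return IDs of rows that are new or whose content changed.

    rows: dict of row ID -> row dict from the latest import.
    stored_fingerprints: dict of row ID -> digest from the previous import.
    """
    changed = []
    for row_id, row in rows.items():
        if stored_fingerprints.get(row_id) != row_fingerprint(row):
            changed.append(row_id)
    return changed
```

Only the IDs returned here would trigger a rebuild; rows whose digest is unchanged are skipped entirely.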
Limitations
Museum catalogues reflect institutional collecting priorities; they are not comprehensive. Nationalities that are under-represented in our archive are under-represented because MoMA's own collection under-represents them, not because fewer such artists exist.