Data Is Plural
A new podcast from the long-running newsletter, Data Is Plural (data-is-plural.com). Each episode distills an expert interview into a crisp 15 minutes, taking you behind the scenes of another surprising dataset. One season = five episodes.
S2E5: Crosswords
Jeremy Singer-Vine • Season 2 • Episode 5
This episode’s guests are George Ho and Saul Pwanson, whose crossword datasets were featured in the Data Is Plural newsletter in 2021 and 2016, respectively. Saul and George explain the difference between American-style and cryptic crosswords, how they collected their datasets, and what they learned along the way.
Relevant and mentioned links:
- Saul’s xd archive, grid comparison, and .xd file format
- FiveThirtyEight’s coverage of the plagiarism scandal that Saul’s analysis unearthed, and Saul’s csv,conf talk, “How a File Format Led to a Crossword Scandal”
- George’s dataset of cryptic crossword clues
- George’s datasheet for the dataset
- Timnit Gebru et al.’s “Datasheets for Datasets”
- XWord Info, from which Saul gathered New York Times crossword data
- David Steinberg’s Pre-Shortzian Puzzle Project, with “litzing” contributions from Barry Haldiman and others
Theme music by Nikhil Sonnad.