Academic Research AI/ML/NLP Data Analysis

Intimatopias in BTS Hurt/Comfort Fanfiction Tags: A Corpus-Based Keyword Analysis of Subgenre Tagging Practices on Archive of Our Own

December 31, 2025 · 2 min read

About the Project

Developed for the Methods of Corpus Linguistics course taught by Dirk Speelman, as part of the MSc Digital Humanities program in 2025-26, this study examines the Hurt/Comfort (H/C) subgenre in English-language fanfiction about the K-pop boy band BTS on Archive of Our Own. These narratives center on moments of hurt followed by comfort between queer characters. Using a corpus linguistics approach, the study applies keyword analysis to fanfiction tags to identify patterns in how authors label H/C stories and how hurt and comfort are encoded in these stories. By comparing a target corpus of H/C fanfiction with a reference corpus of general BTS fanfiction, it identifies distinctive tagging practices and brings to light the subgenre tropes and tagging conventions that shape how these stories are categorized and communicated to readers.
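To make the target-versus-reference comparison concrete, here is a minimal Python sketch of a keyness calculation over tags. It assumes the log-likelihood (G2) statistic, which is a common choice for keyword analysis; the text above does not say which keyness measure the study used. The function names and the toy tag lists are hypothetical and do not come from the project data.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """G2 keyness score for one tag.

    a = tag frequency in the target corpus
    b = tag frequency in the reference corpus
    c = total tag tokens in the target corpus
    d = total tag tokens in the reference corpus
    """
    # Expected frequencies if the tag were equally likely in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(target_tags, reference_tags):
    """Return tags overrepresented in the target corpus, highest G2 first."""
    target = Counter(target_tags)
    reference = Counter(reference_tags)
    n_target = sum(target.values())
    n_reference = sum(reference.values())
    rows = []
    for tag, a in target.items():
        b = reference.get(tag, 0)
        # Keep only tags that are relatively more frequent in the target.
        if a / n_target > b / n_reference:
            rows.append((tag, a, b, log_likelihood(a, b, n_target, n_reference)))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


# Toy tag lists, invented for illustration only.
toy_target = ["Hurt/Comfort"] * 5 + ["Fluff"] * 5
toy_reference = ["Fluff"] * 8 + ["Angst"] * 2
toy_keywords = keywords(toy_target, toy_reference)
```

In this toy case only "Hurt/Comfort" is returned, because "Fluff" is relatively more frequent in the reference list. On real data, a frequency floor and a significance cutoff would usually be applied before interpreting the ranked tags.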

The corpus was constructed from English-language BTS fanfiction on Archive of Our Own (AO3), drawing on metadata from the GOLEM database (Pannach 2024), which contains approximately 8 million AO3 stories up to 2022. The metadata retrieved included fanfiction titles, engagement metrics (kudos), and user-generated tags. Data were collected in 221 batches via a SPARQL query endpoint in R.
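Batched retrieval of this kind is often done by paging a single query with LIMIT and OFFSET. The Python sketch below only builds the paged query strings and sends nothing over the network. It is an illustration and not the project's R code: the `ex:` prefix, the property names, the batch size, and the row count are all hypothetical placeholders and do not reflect the actual GOLEM schema.

```python
# Hypothetical query template: the prefix and property names are placeholders,
# not the real GOLEM vocabulary.
QUERY_TEMPLATE = """PREFIX ex: <http://example.org/hypothetical#>
SELECT ?story ?title ?kudos ?tag WHERE {
  ?story ex:title ?title ;
         ex:kudos ?kudos ;
         ex:tag ?tag .
}
ORDER BY ?story"""


def batch_queries(template, total_rows, batch_size):
    """Build one paged SPARQL query per batch using LIMIT and OFFSET."""
    queries = []
    for offset in range(0, total_rows, batch_size):
        queries.append(f"{template}\nLIMIT {batch_size}\nOFFSET {offset}")
    return queries


# Illustrative numbers only: 25 rows in pages of 10 gives 3 batches.
example_batches = batch_queries(QUERY_TEMPLATE, total_rows=25, batch_size=10)
```

The ORDER BY clause matters here: without a stable ordering, consecutive pages from an endpoint can overlap or skip rows.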

GitHub Repository