Narratives, and other types of speech, can be used to enlighten, entertain, and make sense of the world. However, while discourse is frequently described as moving swiftly or slowly, covering a lot of ground, or going in circles, little research has been done to quantify such motions or determine whether they are advantageous.
To close this gap, Olivier Toubia et al. express texts as sequences of points in a latent, high-dimensional semantic space, combining multiple state-of-the-art natural language processing and machine-learning techniques. They develop a basic set of metrics to quantify elements of this semantic path, test them on thousands of texts from various domains (e.g., movies, TV shows, and academic papers), and see if and how they are related to success (e.g., the number of citations a paper receives).
The findings highlight several key cross-domain characteristics, provide a broad framework for studying many kinds of discourse, and shed light on why things become popular and how natural language processing might help predict cultural success.
The authors use natural language processing and machine learning to analyze the content of almost 50,000 texts, constructing a simple set of measures (i.e., speed, volume, and circuitousness) that quantify the semantic progression of discourse.
The study's measure of circuitousness (the ratio of the actual distance traveled to the shortest possible path) captures human perceptions of circuitousness. While circuitousness might seem undesirable, it may allow the audience to create new and deeper connections between previously explored themes.
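To make the ratio concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the points are made-up 2D coordinates standing in for high-dimensional text embeddings, distances are Euclidean, "speed" is taken to be the average distance between consecutive points, and the "shortest possible path" is taken to be the shortest route that visits the same points while keeping the first and last points fixed. The function names are hypothetical, and the brute-force search is only practical for a handful of points.

```python
from itertools import permutations

import numpy as np


def path_length(points):
    """Total Euclidean length of the path through `points` in the given order."""
    steps = np.diff(points, axis=0)
    return float(np.linalg.norm(steps, axis=1).sum())


def speed(points):
    """Average distance between consecutive points (assumed definition)."""
    return path_length(points) / (len(points) - 1)


def circuitousness(points):
    """Actual distance traveled divided by the shortest path through the same points.

    Assumption: the shortest path keeps the first and last points fixed and
    may reorder the points in between. Brute force, so tiny inputs only.
    """
    n = len(points)
    interior = range(1, n - 1)
    shortest = min(
        path_length(points[[0, *order, n - 1]])
        for order in permutations(interior)
    )
    return path_length(points) / shortest


# Toy example: the same four points visited in a direct and a wandering order.
direct = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
wandering = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [3.0, 0.0]])

print("direct:", speed(direct), circuitousness(direct))
print("wandering:", speed(wandering), circuitousness(wandering))
```

In this toy case the direct path has a ratio of 1, while the wandering path doubles back and so has a ratio above 1, which is the sense in which a text "goes in circles".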
While many have theorized about features of narratives, less work has formalized these intuitions or tested whether certain features of discourse are linked to success. The paper provides a set of measures to quantify the semantic progression of texts and the ground they cover. In particular, the authors examined speed, volume, and circuitousness and how they relate to the success of movies, TV show episodes, and academic papers.
The findings reveal that the characteristics that distinguish a successful movie differ from those that distinguish a successful TV show or academic paper, and future research could look into the origins of these cross-domain differences. The style of discourse (e.g., story vs. exposition), the purpose (e.g., to entertain vs. to transmit knowledge), the modality (e.g., video vs. written), the outcome measure (e.g., liking vs. citations), and audience expectations are all possible explanations. Other types of texts (e.g., books, speeches, or documentaries) may be studied in the future.
How quantifying the shape of stories predicts their success, Olivier Toubia, Jonah Berger, and Jehoshua Eliashberg
Published: June 25, 2021
DOI: 10.1073/pnas.2011695118