
Detecting Dementia Using Lexical Analysis: Terry Pratchett's Discworld

1. Introduction

Dementia is an umbrella term for a range of conditions, involving neurodegeneration or other aetiologies, that result in significant cognitive decline; the most common of these is Alzheimer’s disease [ 1 ]. The cognitive decline observed in people with Alzheimer’s disease involves the gradual accumulation of toxic amyloid-beta and tau proteins, which causes neuronal damage [ 2 ]. Identifying people with signs of Alzheimer’s-related cognitive decline as early as possible would therefore allow interventions to be used to delay, or even prevent, some of this damage. People with dementia may first notice a problem when they experience increased episodes of confusion or difficulties with memory or language (e.g., [ 3 ]). However, Alzheimer’s pathology likely begins many years, perhaps decades, before the onset of symptoms [ 4 ]. Indeed, research has shown that there are earlier warning signs of dementia that may be too subtle for a patient to notice, for example, problems with attention (e.g., [ 5 ]). Further, research suggests that it is currently possible to predict who will develop dementia 12 years prior to formal diagnosis [ 6 ]. It may therefore be possible to identify at-risk people before their cognitive decline worsens.

Dementia affects both speech and writing (e.g., [ 7 , 8 ]), so measuring the first signs of decline in these functions may provide an early biomarker for dementia. In early-stage Alzheimer’s, researchers have observed impairments in both producing and understanding words and sentences [ 9 ]. Changes in how someone uses language may therefore provide an early warning sign of dementia. The complexity of sentence structure, as measured by factors such as the number of clauses per utterance, decreases with age in both spoken and written language [ 10 ], and older adults struggle more than younger adults with complex sentence structures, such as those with left-branching clauses [ 11 ].

Overall, linguistic changes are to be expected as people age (e.g., [ 12 ]), but these changes become more profound in people with cognitive decline (e.g., [ 13 ]). If a patient’s writing history is available, linguistic analysis techniques could be used to supplement clinical assessments or as a standalone early-detection tool. Recent studies have done exactly this, measuring individual writers’ publications over their careers to analyse how their language use evolved. Garrard et al. [ 14 ] studied the works of Iris Murdoch, a renowned English author whose Alzheimer’s diagnosis was confirmed post-mortem. Her final novel, published shortly before her diagnosis, is widely considered to exhibit signs of cognitive decline. While Garrard et al. found minimal differences in overall structure and syntax, they observed significant and consistent differences in lexical diversity and word choice between the final book and control books from earlier in Murdoch’s career. These results provide evidence that Alzheimer’s may indeed be detectable through linguistic analysis, specifically through the number of unique word types relative to the overall word count. Le et al. [ 15 ] explored this further with additional authors, additional books, and improved analysis techniques. They analysed two authors believed to have developed Alzheimer’s disease during their careers, Iris Murdoch and Agatha Christie, together with P.D. James as a control, who published until the age of 88 without evidence of cognitive decline. They included twenty of Murdoch’s twenty-six novels, published between the ages of 35 and 76, sixteen of Christie’s novels, written between the ages of 28 and 82, and fifteen of P.D. James’s novels. They then analysed the novels at the lexical level using a variety of measures, including vocabulary size, lexical repetition, lexical specificity, word-class deficits, and fillers.
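The measure Garrard et al. relied on is the number of unique word types relative to the total word count, commonly called the type-token ratio (TTR). A minimal Python sketch of it follows. The regular-expression tokenizer and the function names `tokenize` and `type_token_ratio` are hypothetical simplifications for this example. They are not the procedure used in the cited studies.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens.

    Simplifying assumption: a word is a run of ASCII letters, optionally with
    internal straight apostrophes ("don't"); punctuation and digits are dropped.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def type_token_ratio(tokens: list[str]) -> float:
    """Number of unique word types divided by the total number of tokens."""
    if not tokens:
        raise ValueError("cannot compute a TTR for an empty text")
    return len(set(tokens)) / len(tokens)


tokens = tokenize("The cat sat on the mat. The cat slept.")
print(len(tokens), len(set(tokens)), round(type_token_ratio(tokens), 3))  # 9 6 0.667
```

Raw TTR falls as a text gets longer, because common words keep recurring. Comparisons between novels are therefore fairer when they are made on samples of equal length.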
Vocabulary size was measured with two metrics. The type-token ratio (TTR: e.g., [ 16 ]) calculates the proportion of unique words to the total word count. The word-type introduction rate (WTIR: e.g., [ 17 ]) measures the rate at which new words are introduced in the text, calculated every 10,000 words. Regarding lexical repetition, intentional repetition can be a stylistic device, but an increasing rate of repeated words may suggest a limited vocabulary or difficulty accessing words; to examine this, they conducted two analyses, one global and one local. Lexical specificity is measured by the frequency of indefinite nouns and high-frequency, low-imagery verbs in each text, with a higher proportion of these generic words indicating lower overall lexical specificity. The word-class deficit (WCD: e.g., [ 18 ]) analysis examines the distribution of word classes across each text, counting both the total number of words and the number of unique words in each class. This allows potential deficits in, or overreliance on, specific word classes to be identified, and the vocabulary size of open word classes to be measured. Finally, the filler measure captures the proportion of interjections and filler words. These words often appear in dialogue because fiction authors strive for natural-sounding conversations, so this measure may reflect stylistic choices rather than cognitive decline and should be interpreted with care.

Le et al. observed that P.D. James maintained stable linguistic diversity into her late 80s. For Iris Murdoch and Agatha Christie, by contrast, TTR and WTIR were associated with cognitive decline. A decline in vocabulary led to increased repetition of content words, and a word-class deficit was visible in the proportion of noun tokens, with a compensatory increase in the proportion of verb tokens. They also observed that the deficit in noun tokens was significantly correlated with a rise in verb and pronoun tokens. Syntactic-complexity results were also found to fluctuate over a relatively wider range.
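The WTIR can be sketched in a few lines of Python. The text states only that the rate is "calculated every 10,000 words". This sketch therefore assumes one plausible reading: the text is split into consecutive 10,000-token windows, and for each window the function reports the share of its tokens that are word types not seen in any earlier window. The function name `word_type_introduction_rate` is hypothetical, and the exact definition used by Le et al. may differ.

```python
def word_type_introduction_rate(tokens: list[str], window: int = 10_000) -> list[float]:
    """Return one rate per consecutive window of `window` tokens.

    Each rate is the number of word types in the window that did not occur in
    any earlier window, divided by the number of tokens in the window.
    Assumption: the final, possibly shorter, window is kept and divided by its
    own length.
    """
    if window <= 0:
        raise ValueError("window must be a positive number of tokens")
    seen = set()   # every word type encountered in earlier windows
    rates = []
    for start in range(0, len(tokens), window):
        chunk = tokens[start:start + window]
        new_types = set(chunk) - seen
        rates.append(len(new_types) / len(chunk))
        seen.update(chunk)
    return rates


# A tiny window is used only so that this toy token list spans several windows.
toy_tokens = "a b c d a b e a b a c b".split()
print(word_type_introduction_rate(toy_tokens, window=4))  # [1.0, 0.25, 0.0]
```

Later windows naturally introduce fewer new types than earlier ones. Under this reading, values are only directly comparable between texts at the same window position.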
Interestingly, they also report that deficits appeared in Murdoch’s writing in her late 40s and early 50s. This suggests that language deficits may be observable many years before formal diagnosis, and that Alzheimer’s disease has a long preclinical period. Linguistic analysis therefore appears promising for identifying whether an author has experienced cognitive decline, and may even indicate when the preclinical phase of dementia began.

The current research further explores the use of lexical analysis in dementia by studying the works of Sir Terry Pratchett. Pratchett was an English author, humourist, and satirist, best known for his Discworld series of 41 comic fantasy novels published between 1983 and 2015. He was diagnosed with posterior cortical atrophy (PCA) in December 2007, while he was still actively writing and publishing the Discworld series. Despite the challenges posed by his condition, he continued to write and to advocate for dementia awareness until his death in 2015. PCA is a rare form of Alzheimer’s disease (AD) that primarily affects visual processing and spatial awareness [ 19 ], although 15% of PCA cases may be unrelated to AD pathology [ 20 ]. It affects areas at the back of the brain responsible for spatial perception, complex visual processing, spelling, and calculation [ 21 ], and has also been associated with word-retrieval difficulties [ 22 ]. Given Pratchett’s prolific writing career, and the fact that he continued writing after his diagnosis, a linguistic analysis of his novels could provide valuable insights into potential early signs of cognitive decline. His earlier works can be compared with his later ones, particularly those written closer to his diagnosis. This may reveal subtle changes in linguistic patterns that potentially precede clinical diagnosis, such as decreased lexical diversity or reduced use of specific word classes (e.g., nouns and adjectives).
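A comparison of word-class use between earlier and later novels can be sketched along the lines of the WCD measure. For each word class, the sketch computes that class's share of all tokens and its number of unique words. It assumes the text has already been tagged by a part-of-speech tagger that uses Universal POS labels such as NOUN, VERB and ADJ. The helper `word_class_profile` is hypothetical, and the example sentence is invented for illustration rather than taken from the novels.

```python
from collections import Counter, defaultdict


def word_class_profile(tagged: list[tuple[str, str]]) -> dict[str, tuple[float, int]]:
    """Map each word class to (share of all tokens, number of unique words).

    Assumption: `tagged` holds (word, tag) pairs produced beforehand by a
    part-of-speech tagger using Universal POS labels such as NOUN, VERB, ADJ.
    """
    if not tagged:
        return {}
    token_counts = Counter(tag for _, tag in tagged)   # tokens per class
    types_by_class = defaultdict(set)                  # unique words per class
    for word, tag in tagged:
        types_by_class[tag].add(word.lower())
    total = len(tagged)
    return {
        tag: (count / total, len(types_by_class[tag]))
        for tag, count in token_counts.items()
    }


# Invented example sentence with hand-assigned tags, for illustration only.
sample = [("The", "DET"), ("wizard", "NOUN"), ("opened", "VERB"),
          ("the", "DET"), ("old", "ADJ"), ("door", "NOUN")]
profile = word_class_profile(sample)
print(profile["NOUN"], profile["ADJ"])  # NOUN: 2 of 6 tokens, 2 unique nouns
```

Computing such a profile for each novel would make the comparison concrete. Any shift in the NOUN and ADJ entries could then be tracked between early and late works.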