I read an astonishing article this afternoon titled ‘Quantitative Analysis of Culture Using Millions of Digitized Books’, published early last year in the journal Science. Building on Google’s effort to digitise all books in all languages, the researchers carried out a computational analysis of a corpus of over 5 million books – approximately 4% of all books ever published – opening up vast amounts of data on word use.
The availability of this data allows researchers to observe cultural trends and then subject them to quantitative investigation – the study of ‘culturomics’. The paper illustrates fascinating changes in the size and use of language, and shows how the data can be used to draw broader socio-cultural conclusions.
Best of all, Google has a nifty tool for presenting the data, the Ngram Viewer, which has let me do a little culturomics of my own for the field of engineering.