Fame, fortune, and words
Digitising books is a problematic endeavour, not because of the technology but because of the ethical issues that arise. Once a book is digitised, problems with copyright and sales enter the frame. Nevertheless, Google Books has digitised more than fifteen million books, which is about twelve per cent of all the books ever published. Digitisation has enabled a detailed analysis of the books’ content that would not be possible for individual readers, and some remarkable findings have resulted.
The analysis was carried out on five million of the fifteen million digitised books, which meant that the final data set contained approximately 500 billion words. Digging into the words used in those books across time has revealed some fascinating facts about our evolving culture.
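To give a feel for what "digging into the words across time" might involve, here is a minimal sketch in Python. It is not the researchers' actual method: the function name, the crude tokenising rule, and the three-line sample corpus are all assumptions made purely for illustration. It counts how often a word appears in each year's books as a share of all the words published that year.

```python
import re
from collections import Counter

# Hypothetical toy corpus of (publication year, text) pairs; a real data set
# would hold millions of books rather than three short strings.
SAMPLE_BOOKS = [
    (1850, "Love and God and love again"),
    (1850, "God is love"),
    (1990, "The car and the car park"),
]

def yearly_relative_frequency(books, word):
    """Return {year: share of that year's words that equal `word`}."""
    hits = Counter()    # occurrences of the target word per year
    totals = Counter()  # all words per year
    target = word.lower()
    for year, text in books:
        # Crude tokeniser (an assumption): lowercase runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", text.lower())
        totals[year] += len(tokens)
        hits[year] += tokens.count(target)
    # Skip years with no words to avoid dividing by zero.
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}

if __name__ == "__main__":
    for w in ("love", "car"):
        print(w, yearly_relative_frequency(SAMPLE_BOOKS, w))
```

Dividing by each year's total word count matters because far more books are published in recent years; raw counts alone would make almost every word look as though it were on the rise.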
Over the last 50 years the English language was shown to have grown by more than 70 per cent, with around 8500 words being added each year. Interestingly, only around 50 per cent of the words used in current books can be found in standard dictionaries. The findings are not only about word usage, though; they also reflect what is happening in society.
For instance, using references to a name as a measure of fame, the analysis showed that actors reach the zenith of their fame at around age 30. Writers, however, have to wait a little longer: they reach their peak at age 40 but achieve a higher peak of fame. If you want fame, then don’t pursue science: biologists and physicists only achieve fame late in life, if at all, and mathematicians tend not to achieve it at all.
Intriguing findings included that mentions of love and God reached a peak in the early to mid 1800s and have since been in steady decline. By contrast, the word “sex” showed exactly the opposite pattern: it was hardly mentioned in the mid 1800s, but its usage has risen ever since and is now at a peak. For some reason the word “car” rose steadily in usage from the 1880s to the 1940s, then dropped off until a resurgence in the early 1960s, and mentions are now higher than they were in the 1940s.
The data can sometimes raise more questions than answers, but the potential for understanding offered by this new science of “culturomics” is enormous.