Don’t worry – I’m not about to write up the 200-year history of shipwreck and salvage. But the graph above succinctly relays the gist of the tale: fewer shipwrecks (blue), more salvage (red).
So what is this? Launched by Google last year, the Ngram Viewer searches an amazing “corpus” of digitized texts and plots how frequently a given word or phrase occurs. The Ngram above depicts a search limited to “American English” (books published in North America) between 1800 and 2000 – something on the order of 155 billion words (about 4% of everything published during that time).
Google’s Ngram Viewer mines the largest corpus but has relatively limited search capabilities. Smaller corpora, like the Corpus of Historical American English (COHA), which weighs in at around 400 million words, offer more sophisticated search options. (I’m still trying to wrap my head around them.) You can, for example, sort results by genre, and if my Excel chops were up to par I would have included a graph showing the fluctuations for “shipwreck” between 1800 and 2000. But that’ll have to wait for another day – chapter 4 beckons.
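For the curious, the normalization behind charts like these is simple enough to sketch in a few lines of Python instead of Excel. The decade counts below are made-up placeholders, not real COHA or Ngram data; the point is just the standard “hits per million words” calculation that makes decades of very different sizes comparable:

```python
# Hypothetical per-decade data: (hits for "shipwreck", total words in decade).
# These numbers are illustrative only, not actual corpus counts.
counts = {
    1800: (120, 1_000_000),
    1850: (300, 4_000_000),
    1900: (180, 9_000_000),
    1950: (90, 12_000_000),
}

def per_million(hits: int, total: int) -> float:
    """Normalize a raw hit count to frequency per million words."""
    return hits / total * 1_000_000

# Relative frequency per decade, ready to chart.
freq = {decade: per_million(h, t) for decade, (h, t) in counts.items()}

for decade, f in sorted(freq.items()):
    print(f"{decade}s: {f:.1f} per million")
```

Raw counts would be misleading here, since far more was published in 1950 than in 1800; dividing by each decade’s total word count is what lets a falling line actually mean a falling interest in shipwrecks.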
All in all this is pretty amazing stuff. But what it means and how to use it are not particularly clear. I’m fascinated by the place of shipwrecks, salvage and the shore in American culture during the nineteenth century, so this seems like it could be a great tool for my research. It almost seems too easy – like making a Wordle. But interpreting the results and effectively searching any corpus has proven to be anything but straightforward. For a primer on the growing debate over text mining and the emerging field of Culturomics (or, preferably, Freakumanities) see this recent post by Dan Cohen. It’s a debate worth following – if only for the questions it raises about culture, history and our digital future.