Analyzing Samuel Pepys’ Diary

An Introduction to Digital Humanities

Research Question

Samuel Pepys lived in London in the 17th century. His diary covers the period between 1659 and 1672. During this time London suffered two disasters: the Great Plague of London (1665-66) and the Great Fire of London (1666). Does Samuel Pepys’ diary reflect the events of the time and place he lived in?

Diary Source

The Project Gutenberg Diary of Samuel Pepys, Complete was analyzed for this assignment.

Text Analysis

Word frequency

The complete diary has 1,286,966 words, of which 23,083 are unique word forms.

The word home appears 7,431 times. As expected of a daily diary, domestic words appear frequently, including: wife (4,476), bed (3,918), day (3,782), dinner (3,634), house (3,096), morning (2,644), night (2,531).

Samuel Pepys worked for the King and recorded details of his daily working life, as shown by the frequency of words including: sir (5,919), mr (5,709), office (4,811), lord (4,797), business (3,599), king (2,846), duke (2,296).

The word frequency cloud shows that words related to domestic life, for example home and house, occur regularly in the diary. Words related to Pepys’ work for the King also appear frequently, for example the titles of those he interacted with: sir, lord, and king.
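
The counts above come from Voyant Tools. As a rough cross-check, a similar tally can be sketched on the command line. This is only an approximation: the function names (words, top_words, word_totals) are made up for illustration, and the tokenization (runs of ASCII letters, lower-cased, no stopword list) is an assumption that will not match Voyant exactly, so the numbers will differ somewhat.

```shell
# Sketch only: a plain-shell approximation of a word-frequency count.
# Assumption: a "word" is a run of ASCII letters, compared in lower case.
# Voyant's own tokenizer and stopword list are not reproduced here.

# Print one lower-cased word per line from the file given as $1.
words() {
  tr -cs '[:alpha:]' '\n' < "$1" | tr '[:upper:]' '[:lower:]' | grep -v '^$'
}

# Print the $2 most frequent words in file $1 as "count word" (default 10).
top_words() {
  words "$1" | sort | uniq -c | sort -k1,1nr -k2,2 |
    head -n "${2:-10}" | awk '{print $1, $2}'
}

# Print the total number of words and the number of unique word forms.
word_totals() {
  words "$1" |
    awk '{ total++; if (!seen[$0]++) unique++ } END { print total + 0, unique + 0 }'
}
```

For example, top_words tailed.txt 20 would list the twenty most frequent word forms in the prepared text. Without a stopword list, the top of that list will be dominated by words such as "the" and "and" rather than home or wife.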

The Great Plague of London

The word plague appears 245 times in the diary. The trends graph shows that it occurs most frequently in the sixth of the document's ten segments. Assuming Voyant divides the document into segments in linear order, the sixth segment corresponds to the middle of the period between 1659 and 1672.

Review of the contexts shows numerous phrases referring to the plague, including:

  • Mrs Rawlinson died of the plague (see August 9th, 1666)
  • He died of the plague, March 1666

The diary also refers to the number of deaths from the plague, including:

  • 267 were deaths from the plague
  • deaths from the plague 7,165
  • and the deaths from the plague to 5,533
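
Voyant's contexts view produced the phrases above. A comparable keyword-in-context listing can be sketched with grep. The function name contexts is hypothetical, and the sketch makes two simplifying assumptions: context is limited to the same line of the file, and the term is matched as a substring, so plague would also match plagues.

```shell
# Sketch only: print each occurrence of a term with some surrounding text.
# $1 = file, $2 = term, $3 = characters of context on each side (default 40).
# Assumptions: matching is case-insensitive, the term is matched as a
# substring, and context does not cross line breaks in the file.
contexts() {
  width="${3:-40}"
  grep -o -i -E ".{0,$width}$2.{0,$width}" "$1"
}
```

For example, contexts tailed.txt plague 30 would print every mention of plague with up to thirty characters on either side.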

The Great Fire of London

The word fire appears 424 times in the diary. The trends graph shows the most frequent occurrences in the seventh of the document's ten segments. Again, the seventh segment is towards the middle of the period between 1659 and 1672. The peak in the use of the word fire comes after the peak in the use of the word plague, which corresponds to the Great Fire of London (1666) occurring after the Great Plague of London (1665-1666).

The trends graph shows the frequency of the words plague and fire across the diary. Mentions of fire peak after mentions of plague peak. This corresponds to the dates of the Great Plague of London, 1665-66, and the Great Fire of London, 1666. Figure: trends_fire_plague.png.
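
The segment comparison can be approximated outside Voyant. The sketch below assumes that segments are equal-sized runs of words taken in document order, which is a guess at how Voyant builds its trends graph, not a confirmed description. The function names segment_counts and peak_segment are made up for illustration, and words are again taken to be runs of ASCII letters in lower case.

```shell
# Sketch only: count a term per segment, assuming equal-sized segments of
# words in document order (an assumption about Voyant, not a known fact).

# Print "segment count" for term $2 in file $1 over $3 segments (default 10).
# The term should be given in lower case.
segment_counts() {
  tr -cs '[:alpha:]' '\n' < "$1" | tr '[:upper:]' '[:lower:]' | grep -v '^$' |
  awk -v term="$2" -v n="${3:-10}" '
    { word[NR] = $0 }                       # remember every word in order
    END {
      for (i = 1; i <= NR; i++) {
        seg = int((i - 1) * n / NR) + 1     # segment number, 1..n
        if (word[i] == term) count[seg]++
      }
      for (s = 1; s <= n; s++) print s, count[s] + 0
    }'
}

# Print the segment where term $2 occurs most often (earliest one on a tie).
peak_segment() {
  segment_counts "$1" "$2" "${3:-10}" |
    sort -k2,2nr -k1,1n | head -n 1 | awk '{print $1}'
}
```

Running peak_segment tailed.txt plague and peak_segment tailed.txt fire gives one segment number each, which can be compared with the sixth and seventh segments reported by Voyant. Small differences are to be expected because the tokenization is not the same.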


Word analysis of Samuel Pepys’ Diary shows that it does reflect the events of his time. The Great Plague of London and the Great Fire of London are mentioned with increased frequency around the times of their occurrence, 1665-66 and 1666 respectively.


Data Preparation

The Project Gutenberg headers and footers were removed from the source text before uploading to Voyant Tools. The command line instructions used were:

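# Delete everything from line 1 through the first line containing START
gsed '1,/START/d' "$file_name" > topped.txt
# Delete everything from the END marker line through the end of the file
gsed -e '/END OF THIS PROJECT GUTENBERG EBOOK/,$d' topped.txt > tailed.txt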
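
Before uploading, it can be worth confirming that no marker lines survived. The helper below is a hypothetical sketch, not part of the original workflow. The marker wording it looks for ("START OF THIS/THE PROJECT GUTENBERG" and the matching END form) is an assumption about how Project Gutenberg files are labelled and may need adjusting for a particular file.

```shell
# Sketch only: succeed if file $1 contains no Gutenberg START/END marker line.
# The marker patterns are assumptions; adjust them to match the actual file.
check_stripped() {
  if grep -q -E '(START|END) OF (THIS|THE) PROJECT GUTENBERG' "$1"; then
    echo "marker lines still present in $1" >&2
    return 1
  fi
  echo "no marker lines found in $1"
}
```

For example, check_stripped tailed.txt should report that no marker lines were found if both gsed commands worked as intended.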

Data Analysis

The Voyant Tools website was used for the text analysis.


This post is based on an assignment I completed for the edX “Introduction to Digital Humanities” course delivered by HarvardX.

Gareth Digby
A student of the digital humanities

My research interests include digital humanities.