Gekaapte Brieven

About the Corpus application

The corpus application is developed by the INT. The backend of the application is BlackLab, a Lucene-based search engine developed for corpora with token-based annotation. The web-based frontend is a further development of the corpus-frontend application developed by the INT in CLARIN and CLARIAH projects. Its design is inspired by the first version of the OpenSoNaR user interface by Tilburg and Radboud University.

About the Gekaapte Brieven

The Gekaapte Brieven comprises the transcriptions of 5862 letters and other documents, such as bills, that were written in the 17th and 18th centuries to and from sailors and others abroad. These letters and documents were on board Dutch ships that were captured by the English during one of the four wars fought between Britain and the Republic of the Seven United Provinces in this period. They ended up in the archives of the High Court of Admiralty (now part of The National Archives in Kew), where Dutch historian S. Braunius rediscovered them in 1980. Within Metamorfoze, the contents of 7 boxes were photographed, yielding around 9000 scans available at the Dutch National Archive with the inventory numbers HCA30-223, HCA30-226, HCA30-227, HCA30-336, HCA30-379 and HCA30-749.

In 2011 a crowdsourcing project was launched at the Meertens Institute by Nicoline van der Sijs to add metadata (such as date, sender, addressee, addresses, genre) and transcriptions to the 9000 available scans in order to make the Gekaapte Brieven accessible for research. This project was made possible by support from the Culture Fund. Volunteers made so-called diplomatic transcriptions, meaning they followed the original as faithfully as possible. For more information (in Dutch) see this pdf. Thanks to the crowdsourcing project, it became clear that about 6000 of the 9000 scans contain handwritten text (the rest are blank pages or pages from printed books), with about half of the documents being bills and the other half personal letters. Most of the documents are written in Dutch, but some are in other languages, such as French, Spanish, Portuguese or German. The letters date from the 17th and 18th centuries. Two-thirds of the 17th-century letters were written between 1664 and 1672. Most of the 18th-century letters come from the period 1773-1790.

In two tranches, in 2012 and 2014, the transcriptions and scans were placed on a website at the Meertens Institute, developed by Rob Zeeman. Since then the metadata have been enriched and corrected with a grant from the Time Capsule project. By 2019 the technology of the website had become outdated, which meant the transcriptions were no longer available. The data were then transferred to the Dutch Language Institute (INT), where the metadata were cleaned up and enriched, and a new website and search engine for the data were developed. For an overview and explanation of the metadata, see the Application manual.

The title of each document contains the inventory number under which it can be found in the National Archives. Texts can be searched and filtered by metadata and by words in the texts. The page and a thumbnail of the original photograph are displayed. The photograph can also be viewed separately in high resolution. When you click on a document, a new page is shown where you can find its Content (thumbnail and transcription), Metadata, Statistics (number of tokens, number of unique word types, type/token ratio) and Page image (original scan in high resolution). If a letter or document consists of more than one page, the thumbnails of the corresponding pages are shown under Content in the left-hand column; click on a thumbnail to consult that page. See the Application manual for further information.
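As an illustration of the Statistics figures (number of tokens, number of unique word types, type/token ratio), here is a minimal Python sketch. It is not the application's own implementation: the function name `type_token_stats` is hypothetical, and it assumes that tokens are separated by whitespace and compared case-insensitively, which may differ from the tokenisation the search engine actually applies.

```python
def type_token_stats(text: str) -> dict:
    """Return token count, type count and type/token ratio for a text.

    Assumption (not taken from the application): tokens are split on
    whitespace and lowercased before counting unique types.
    """
    tokens = text.lower().split()
    types = set(tokens)
    # Guard against empty documents to avoid division by zero.
    ratio = len(types) / len(tokens) if tokens else 0.0
    return {"tokens": len(tokens), "types": len(types), "ttr": ratio}


stats = type_token_stats("de brief van de schipper")
print(stats["tokens"], stats["types"], stats["ttr"])
```

In this example the word "de" occurs twice, so five tokens yield four types. A ratio closer to 1 indicates less repetition; short documents such as bills tend to score higher than long letters simply because of their length.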

For a small number of pages, we could not identify which document they belonged to; this was caused by the way the originals were photographed. In those cases, we displayed the documents in the order of the inventory number. To make it possible to find all letters from one writer or to one recipient, the spelling of the names in the Sender and Recipient fields has been normalised. Some documents are undated. To these, we automatically added an indication of the period from which they date, e.g. 1642-1675 or 1773-1790.

The Gekaapte Brieven website supplements the website Brieven als Buit, also hosted by the INT, which contains 1033 letters with additional metadata and was developed under the supervision of Marijke van der Wal.


We would like to thank the 110 volunteers of the Stichting Vrijwilligersnetwerk Nederlandse Taal who took care of the transcription and correction of all texts and metadata.

We would like to thank Peter van den Hooff and Harm Zwarts from the Time Capsule project for correcting the metadata.

When referring to the Gekaapte Brieven, please use the following reference:

Gekaapte Brieven (December 2023) [Online Service]. Available at the Dutch Language Institute:

For information on the Gekaapte Brieven, please contact the project leader Nicoline van der Sijs.

For BlackLab:

Software available at

Does, Jesse de, Jan Niestadt and Katrien Depuydt (2017), Creating research environments with BlackLab. In: Jan Odijk and Arjan van Hessen (eds.) CLARIN in the Low Countries, 151-165. London: Ubiquity Press. DOI:

For the corpus frontend:

Software available at:

Instituut voor de Nederlandse Taal, Meertens Instituut, Cultuurfonds, Stichting Vrijwilligersnetwerk Nederlandse Taal, Nationaal Archief, Koninklijke Bibliotheek