Why HTR will change it all

For some years now, archives and libraries have been devoting more and more of their time to digitising historical manuscripts. Their strategies differ considerably: some want to present their “treasures” in a contemporary manner, while others want to make more extensive collections available in an appropriate digital form. The advantages of digitisation are obvious. The original sources are preserved, and interested researchers and non-experts can access the material regardless of place and time, without having to spend days or weeks in reading rooms. Compared with the practice of the 20th century, this is an enormous step forward.

Initially, such digital services provide no more than a digital image of the original historical source. They are developed and maintained at great expense, both financially and in terms of staff. If you look at the target groups of these services, you can see that they are mainly aimed at the very same people who also visit archives and libraries. These users, however, are usually the ones who are already able to decipher historical manuscripts. Optimistically speaking, we are talking about one or two percent of the population. For everyone else, these digital copies are just beautiful to look at.

Keep this picture in mind if you want to understand why Handwritten Text Recognition (HTR) is opening a whole new chapter in the history of digital indexing and the use of historical documents. In a nutshell: HTR allows us to move from simple digitisation to the digital transformation of historical sources. Thanks to HTR, not only the digital image of a manuscript but also its content is made available in a form that can be read by everyone and searched by machines, across hundreds of thousands of pages.

Thus the contents of historical manuscripts can be opened up to a public to whom they have so far remained closed or at least not easily accessible. This does not only address non-professional researchers. Access to the contents of the sources will also be much easier for academic experts from disciplines that do not have historical auxiliary sciences such as palaeography as part of their classical educational canon. This makes new constellations of interdisciplinary research possible. Ultimately, since the contents of the manuscripts can now be evaluated by machine, the questions and methods of the Digital Humanities can be applied to the material more easily than before.

Tips & Tools
Recommendation for further reading: Mühlberger, Archiv 4.0 oder warum die automatisierte Texterkennung alles verändern wird. Tagungsband Archivtag Wolfsburg, in: Massenakten – Massendaten. Rationalisierung und Automatisierung im Archiv (Tagungsdokumentationen zum Deutschen Archivtag, Band 22), ed. by VdA, Fulda 2018, pp. 145-156.