Posted by Dirk Alvermann on

First volumes with decisions of the Wismar High Court online

Last week we were able to put online the first volumes containing the opinions of the assessors of the High Royal Tribunal at Wismar – the final court of appeal in the German territories of the Swedish Crown. "Assessors" is what the judges at the Tribunal were called. Since the Great Northern War there had been a panel of only four judges instead of eight. The Deputy President assigned them the cases in which they were to form a legal opinion. As at the Reichskammergericht in Wetzlar, a referee and a co-referee were appointed for each case; they formulated their opinions in writing and discussed them with their colleagues. If the votes of the two judges agreed, the consensus of the remaining colleagues was only formally requested in the court session. In addition, all relations had to be checked and confirmed by the Deputy President. If the case was more complicated, all assessors expressed their opinion on the verdict. These reasons for the verdicts are recorded in the collection of so-called "Relationes".

These relations are a first-class source for legal history, since they first recount the course of the conflict in a narrative and then propose a judgment. Here we can trace both the legal reasoning in the justifications and the everyday life of the people in the narratives.

The text recognition was realized with an HTR model that was trained on the manuscripts of nine different judges of the Royal Tribunal. The training set consisted of 600,000 words. Accordingly, the accuracy of the handwritten text recognition is good – in this case about 99%.

The results can be seen here. How to navigate in our documents and how the full text search works is explained here.

Who were the judges?

In the second half of the 18th century a new generation of judges took office. At the end of the 1750s and the beginning of the 1760s, justice at the Tribunal was administered by:

Hermann Heinrich von Engelbrecht (1709-1760), Assessor since 1745, Deputy President since 1750
Bogislaw Friedrich Liebeherr (1695-1761), Assessor since 1736
Anton Christoph Gröning (1695-1773), Assessor since 1749
Christoph Erhard von Corswanten (about 1708-1777), Assessor since 1751, Deputy President since 1761
Carl Hinrich Möller (1709-1759), Assessor since 1751
Joachim Friedrich Stemwede (about 1720-1787), Assessor since 1760
Johann Franz von Boltenstern (1700-1763), Assessor since 1762
Johann Gustav Friedrich von Engelbrechten (1733-1806), Assessor between 1762 and 1775
Augustin von Balthasar (1701-1786), Assessor since 1763, Deputy President since 1778

Posted by Dirk Alvermann on

Transkribus in Chicago

Transkribus will be presented at this year's meeting of the Social Science History Association (SSHA) in Chicago. Günter Mühlberger will present not only the potential of Transkribus but also first results and experiences, drawn from the processing of the cadastral protocols of the Tiroler Landesarchiv and from our digitization project. He will pay special attention to the training of HTR models and the chances of keyword spotting. The lecture, entitled 'Handwritten Text Recognition and Keyword Spotting as Research Tools for Social Science and History', will take place on 21.11. at 11:00 am in Session 31 (Emerging Methods: Computation/Spatial Econometrics).

Posted by Dirk Alvermann on

How to create test sets and why they are important, #2

Release 1.7.1

What is the best procedure for creating test sets?
In the end, everyone will find their own way. In our project, the pages for the test sets are already selected while the GT is created. They receive a special edit status (Final) and are later collected in separate documents. This ensures that they will not accidentally end up in the training material. Whenever new GT is created for future training, the material for the test set is extended at the same time, so both sets grow in proportion.

For the systematic training we create several Documents, which we call "test sets" and which each relate to a single Spruchakte (one year). For example, we create a "test set 1594" for the Document of the Spruchakte 1594. There we place representatively selected images, which should reflect the variety of hands as exactly as possible. In the "mother document" we mark the pages selected for the test set as "Final" to make sure that they will not be edited there in the future. We have not created a separate test set for every single record or year, but have proceeded in five-year steps.

Since a model is often trained over many rounds, this procedure also has the advantage that the test set always remains representative. The CERs of the different versions of a model can therefore always be compared and observed during development, because the test is always executed on the same (or extended) set. This makes it easier to evaluate the progress of a model and to adjust the training strategy accordingly.

Transkribus also independently stores the test set used for each training session in the collection concerned, so you can always fall back on it.
It is also possible to select a test set just before the training by simply assigning individual pages of the documents from the training material to the test set. This may be a quick and pragmatic solution in individual cases, but it is not suitable for the planned development of powerful models.

Posted by Dirk Alvermann on

How to create test sets and why they are important, #1

Release 1.7.1

If we want to know how much a model has learned in training, we have to test it. We do this with precisely defined test sets. Test sets – like the training set – contain exclusively Ground Truth. However, we make sure that this GT has never been used to train the model, so the model does not "know" this material. This is the most important characteristic of test sets. A text page that has already been used as training material will always be read better by the model than one it is not yet "familiar" with. This can easily be demonstrated experimentally. So if you want valid statements about CER and WER, you need "uncorrupted" test sets.

It is also important that a test set is representative. As long as you train an HTR model for a single writer or an individual handwriting, it’s not difficult – after all, it’s always the same hand. As soon as there are several writers involved, you have to make sure that all the individual handwritings used in the training material are also included in the test set. The more different handwritings are trained in a model, the larger the test sets will be.
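Outside Transkribus, the same principle – every hand in the training material must also appear in the test set – is what a stratified split does. The following is a minimal sketch; the page dictionaries and the "writer" field are purely illustrative, not a Transkribus data structure:

```python
import random
from collections import defaultdict

def stratified_test_split(pages, test_share=0.1, seed=42):
    """Pick a test set in which every writer is represented
    in roughly the same proportion as in the training material."""
    by_writer = defaultdict(list)
    for page in pages:
        by_writer[page["writer"]].append(page)

    rng = random.Random(seed)
    train, test = [], []
    for writer_pages in by_writer.values():
        rng.shuffle(writer_pages)
        # at least one page per writer goes into the test set
        n_test = max(1, round(len(writer_pages) * test_share))
        test.extend(writer_pages[:n_test])
        train.extend(writer_pages[n_test:])
    return train, test
```

The `max(1, …)` guard is the crucial detail: even a writer with only a handful of pages still contributes at least one page to the test set, which keeps the set representative as more hands are added.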

The size of the test set is another factor that influences representativity. As a rule, a test set should contain 5-10% of the training material. However, this rule of thumb should always be adapted to the specific requirements of the material and the training objectives.

To illustrate this with two examples: our model for the Spruchakten from 1580 to 1627 was trained with a training set of almost 200,000 words; the test set contains 44,000 words. This is of course a very high proportion of about 20%, owing to the fact that material from about 300 different hands was trained into this model, all of which must also be represented in the test set. In our model for the judges' opinions of the Wismar Tribunal, there are about 46,000 words in the training set, while the test set contains only 2,500 words, i.e. a share of about 5%. However, here we are dealing with only five different hands, so this material is sufficient for a representative test set.

Posted by Dirk Alvermann on

Word Error Rate & Character Error Rate – How to evaluate a model

Release 1.7.1

The Word Error Rate (WER) and Character Error Rate (CER) indicate the amount of text in a manuscript that the applied HTR model did not read correctly. A CER of 10% means that every tenth character (and these are not only letters, but also punctuation marks, spaces, etc.) was not correctly identified. The accuracy rate would therefore be 90%. A good HTR model should recognize 95% of a manuscript correctly, i.e. have a CER of no more than 5%. This is roughly the value that is achieved today with "dirty" OCR for Fraktur typefaces. Incidentally, an accuracy rate of 95% also corresponds to the expectations formulated in the DFG's Practical Guidelines on Digitisation.

Even with a good CER, the word error rate can be high. The WER shows how accurately the words in the text were reproduced. As a rule, the WER is three to four times higher than the CER and roughly proportional to it. The WER is not particularly meaningful for the quality of the model because, unlike characters, words vary in length and do not allow a clear comparison (a word already counts as incorrectly recognized if just one letter in it is wrong). That is why the WER is rarely used to characterize the quality of a model.

The WER, however, points to an important aspect. If I perform text recognition with the aim of later running a full-text search over my document, the WER tells me the success rate I can expect in that search, because the search looks for words or parts of words. So no matter how good my CER is: with a WER of 10%, potentially every tenth search term cannot be found.
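Both rates are standard edit-distance measures: the minimum number of insertions, deletions and substitutions needed to turn the recognized text into the Ground Truth, divided by the length of the Ground Truth – counted in characters for the CER, in words for the WER. A minimal sketch (not the internal implementation of the Transkribus Compare tool):

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions
    needed to turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    """Character Error Rate against the Ground Truth reference."""
    return levenshtein(reference, hypothesis) / len(reference)

def wer(reference, hypothesis):
    """Word Error Rate against the Ground Truth reference."""
    ref, hyp = reference.split(), hypothesis.split()
    return levenshtein(ref, hyp) / len(ref)
```

The example `wer("the cat sat", "the hat sat")` also shows why the WER runs ahead of the CER: a single wrong character (CER about 9% here) already spoils a whole word (WER about 33%).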

Tips & Tools
The easiest way to display the CER and WER is to use the Compare function under Tools. Here you can compare one or more pages of a Ground Truth version with an HTR text to estimate the quality of the model.

Posted by Dirk Alvermann on

Ground Truth is the Alpha and Omega

Release 1.7.1

Ground Truth (GT) is the basis for the creation of HTR models. It is simply a typewritten copy of the historical manuscript, a classic literal or diplomatic transcription that is 100% correct – in short, "Ground Truth".

Any mistake in this training material will cause “the machine” to learn – among many correct things – something wrong. That’s why quality management is so important when creating GT. But don’t panic, not every mistake in the GT has devastating consequences. It simply must not be repeated too often; otherwise it becomes “chronic” for the model.

In order to ensure the quality of the GT within our project, we have set up a few fixed transcription guidelines, as you know them from edition projects. It is worth striving for a literal, character-accurate transcription. Interventions of any kind must be avoided – e.g. normalizations such as the vocalic or consonantal use of "u" and "v", or the resolution of complex abbreviations.

If the material contains only one or two different handwritings, about 100 pages of transcribed text are sufficient for a first training session. This creates a basic model that can be used for further work. In our experience, the number of languages used in the text is irrelevant, since the HTR models usually work without dictionaries.

In addition to conventional transcription, Ground Truth can also be created semi-automatically. Transkribus offers a special tool – Text2Image – which is presented in another post.

Posted by Dirk Alvermann on

WebUI & Expert Client

As we said before, this blog is almost exclusively about the Expert Client of Transkribus. It offers a variety of possibilities, but handling them requires a certain level of knowledge.

The tools of the WebUI are much more limited, but also easier to work with. In the WebUI it is not possible to perform an automatic layout analysis or to start an HTR, let alone to train a model or to interfere in the user management. But that’s not what it’s meant for.

The WebUI is the ideal interface for crowd projects with a lot of volunteers who mainly transcribe or comment and tag content. And this is exactly what it is used for most of the time. The coordination of such a crowd project is done via the Expert Client.

The WebUI's advantage is that it can be used without any prerequisites. It is a web application called from the browser; no installation, no updates, etc. Moreover, it is almost intuitive and can be used by anyone without prior knowledge.


Tips & Tools
The WebUI also has a version management – somewhat adapted for crowd projects. When transcribers are done with the page they are editing, they set the edit status to "ready for review", so that the supervisor knows it is time to review.


Posted by Dirk Alvermann on

Knowing what you want

A digitization project with Handwritten Text Recognition can have very different goals. They can range from a critical digital edition, to the provision of manuscripts as full texts, to the indexing of large text corpora via keyword spotting. All three objectives allow different approaches, which have a great influence on the technical and personnel effort.

In this project, only the last two objectives are of interest. A critical edition is not intended, even if the full texts generated in this project could serve as the basis for one.

We aim at a complete indexing of the manuscripts by automatic text recognition. The results will then be made publicly available online in the Digital Library Mecklenburg-Vorpommern. A search is available there which shows the hits in the image itself. Users with sufficient palaeographic knowledge can explore the context of a hit in the image themselves, switch to a modern full-text view, or use only the latter.

Posted by Dirk Alvermann on

Why HTR will change it all

For some years now, archives and libraries have been dedicating more and more of their time to the digitisation of historical manuscripts. The strategies are quite different. Some would like to present their “treasures” in a contemporary manner, others would like to make more extensive collections available for use in an appropriate digital form. The advantages of digitisation are obvious. The original sources are preserved and the interested researchers and non-experts can access the material independently of place and time without having to spend days or weeks in reading rooms. Considering the practice of the 20th century, this is an enormous step forward.

Initially, such digital services provide no more than a digital image of the original historical source. They are developed and maintained at great expense, both financially and in terms of staff. If you look at the target groups of these services, you can see that they are mainly aimed at the very same people who also visit archives and libraries – addressees who usually already have the ability to decipher such historical manuscripts. Optimistically speaking, we are talking about one or two percent of the population. For everyone else, these digital copies are just beautiful to look at.

Keep this picture in mind if you want to understand why Handwritten Text Recognition (HTR) is opening a whole new chapter in the history of the digital indexing and use of historical documents. In a nutshell: HTR allows us to move from simple digitization to the digital transformation of historical sources. Thanks to HTR, not only the digital image of a manuscript but also its content is made available in a form that can be read by everyone and searched by machines – across hundreds of thousands of pages.

Thus the contents of historical manuscripts can be opened up to a public to whom they have so far remained closed, or at least not easily accessible. This does not only address non-professional researchers. Access to the contents of the sources will also become much easier for academic experts from disciplines that do not have historical auxiliary sciences such as palaeography in their classical educational canon. This makes new constellations of interdisciplinary research possible. Ultimately, since the contents of the manuscripts can now be evaluated by machine, questions and methods of the Digital Humanities can be applied to the material more easily than before.

Tips & Tools
Recommendation for further reading: Günter Mühlberger, Archiv 4.0 oder warum die automatisierte Texterkennung alles verändern wird, in: Massenakten – Massendaten. Rationalisierung und Automatisierung im Archiv (Tagungsdokumentationen zum Deutschen Archivtag, Band 22), hg. v. VdA, Fulda 2018, S. 145-156.