Monthly Archives

3 Articles

Posted by Dirk Alvermann on

On the Shoulders of Giants: Training with Base Models

Release 1.10.1

If you want to develop generic HTR models, there is no way around working with base models. When training with base models, each training session for a model is based on an existing model, i.e. a base model. This is usually the last HTR model that was trained in the corresponding project.

Base models “remember” what they have already “learned”. Therefore each new training session improves the quality of the model (theoretically). The new model learns from its predecessor and thus becomes better and better. Therefore, training with Base Models is also particularly suitable for large generic models that are continuously developed over a long period of time.

To carry out training with Base Model, you simply select a specific Base Model in the training tool – in addition to the usual settings. Then, from the HTR Model Data tab, insert the Train Set and the Validation Set (called Test Set in earlier Trankribus versions) of the base model, as well as the new Training and Validation Set. Additionally you can add more new Ground Truth and then start the training.

Posted by Elisabeth Heigl on

Validation possibilities

Release 1.10.1

There are different ways to measure the accuracy of our HTR-models in Transkribus. Three Compare tools calculate the results and present them in different ways. In all three cases the hypothesis (HTR version) of a text is compared with a corresponding reference (correct version, i.e. GT) of the same text.

The first tool which shows the most immediate result is “Compare Text Versions“. It visualizes the validation for the currently opened page in the text itself. Here we can see exactly which mistakes the HTR has made at which points.

The standard “Compare” gives us these same validation results as numerical values. Among other things, it calculates the average word error rate (WER), the character error rate (CER) and the respective accuracy rates. (If someone knows what the bag tokens are about, he/she is welcome to write us a comment). In the “Compare” we also have the possibility to run the “Advanced Compare“, which allows us to perform the corresponding calculations for the whole document or only for certain pages.

We already have presented the validation tool “Compare Sample” briefly in another post to show how to create Test Samples. The actual Sample Compare then predicts how a model will potentially read on a Test Sample that has been created for this purpose.

Posted by Dirk Alvermann on

Generic Models and what they do

Release 1.10.1

In a previous post we talked about the differences between special models and generic models. Special models should always be the first choice if your material includes a limited number of writers. If your material is very diverse – for example, if the writer changes frequently in a bundle of handwritings – it makes sense to train a generic model.

The following articles are based on our experiences with the training of a generic model for the Responsa of the Greifswald Law Faculty, in which about 1000 different writer’s hands were trained.

But first: What should a generic HTR model be able to do? The most important point has already been said: It should be able to handle a variety of different writer’s hands. But it should also be able to “read” different fonts (alphabets) and languages and be able to interpret abbreviations. Below are a few typical examples of such challenges from our collection.

Different writer’s hands in one script:

Abbreviations:

Different languages in one script: