Posted by Elisabeth Heigl on

Compare Samples

Release 1.10.1

As the name suggests, the Compare Samples tool tests the capabilities of an HTR model based on a sample rather than a manually selected test set. We have explained in an earlier post how to create such samples, that they represent an objective alternative to conventional test sets and why they can be created with much less effort.

“Compare Samples” may look like a validation tool, but it is not really one. You can use it to validate an HTR model, but Advanced Compare is better suited for that. The real function of “Compare Samples” is to predict how well an HTR model will perform on a given material.

You may remember the Model Booster. There you need a suitable HTR model that can serve as a base model for a planned HTR training. With the numerous Public Models available, it is a good idea to first check with “Compare Samples” which model fits your project.

To create such a prediction for a sample, you first have to run the selected HTR models over the entire sample (before that, of course, you have already created the GT for the sample). Then open the Samples tab of the “Compare Samples” tool. This tab lists all samples of your active collection. Select the sample that will be used as the basis for the prediction. Now, in the middle, you can select the model whose text version should be compared with the GT. Start “Compute” and you’re done.

The tool now calculates, across all lines of the sample, an average value together with an upper bound and a lower bound. The range between the two bounds is where the Character Error Rate of the selected HTR model is expected to lie for 95% of your material. In our example below, that is between 2.9% and 4.7%.
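How Transkribus derives these bounds internally is not described here. As a rough sketch of the idea, the snippet below takes per-line CER values and computes an average with a 95% interval using a normal approximation. Both the method and the numbers are assumptions for illustration; `cer_interval` is a hypothetical helper, not a Transkribus function.

```python
import math
import statistics


def cer_interval(line_cers, z=1.96):
    """Return (lower, mean, upper) for the average CER of a sample.

    Assumption: normal approximation, z = 1.96 for a 95% interval.
    This is not necessarily what Transkribus computes.
    """
    if len(line_cers) < 2:
        raise ValueError("need at least two lines to estimate a spread")
    mean = statistics.mean(line_cers)
    # Standard error of the mean from the sample standard deviation.
    std_error = statistics.stdev(line_cers) / math.sqrt(len(line_cers))
    margin = z * std_error
    # A CER cannot be negative, so the lower bound is clamped at 0.
    return max(0.0, mean - margin), mean, mean + margin


# Hypothetical per-line CER values in percent, for illustration only.
sample_line_cers = [2.0, 4.0, 6.0, 4.0]
lower, mean, upper = cer_interval(sample_line_cers)
print(f"CER between {lower:.1f}% and {upper:.1f}% (average {mean:.1f}%)")
```

The more lines a sample contains, the smaller the standard error becomes, so the two bounds move closer together.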

This way you can compare as many models for your material as you like. But the tool also allows a few other things. For example, you can easily check how an HTR model with or without language model or dictionary works on your material and if it is worth using one or the other. Of course this is especially useful to check your own models.


Tips & Tools
Create several smaller samples rather than one giant sample for all your material. You can separate them chronologically or by writer’s hands, for example. This will allow you to make a differentiated prediction for the use of HTR models on all your material or parts of it.

Posted by Elisabeth Heigl on

CER? Don’t Worry!

Release 1.10.1

The Character Error Rate (CER) compares, for a given page, the total number of characters (n), including spaces, to the minimum number of insertions (i), substitutions (s) and deletions (d) of characters that are required to obtain the GT result. If that was not mathematical enough for you:

CER = [ (i + s + d) / n ]*100
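For those who prefer code to formulas, here is a minimal sketch of this calculation. It assumes that n is the number of characters in the GT (the usual convention) and finds the minimum number of insertions, substitutions and deletions with a standard edit-distance (Levenshtein) algorithm. The function names are our own and are not part of Transkribus.

```python
def edit_distance(hyp: str, ref: str) -> int:
    """Minimum number of insertions, substitutions and deletions
    needed to turn the HTR text (hyp) into the GT (ref)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(
                prev[j] + 1,         # delete h from the HTR text
                curr[j - 1] + 1,     # insert r into the HTR text
                prev[j - 1] + cost,  # substitute h by r (or keep a match)
            ))
        prev = curr
    return prev[-1]


def cer(hyp: str, ref: str) -> float:
    """Character Error Rate in percent; n is the length of the GT (assumption)."""
    if not ref:
        raise ValueError("the GT must not be empty")
    return edit_distance(hyp, ref) / len(ref) * 100


# One wrong character in a GT of 12 characters (spaces included).
print(round(cer("Vnd der Herr", "Und der Herr"), 2))
```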

This means that even all the little mistakes are statistically full-fledged errors. Every missing comma, a “u” instead of a “v”, an additional space or even an uppercase letter instead of a lowercase letter is included in the CER as a “whole error”. Yet these small details neither disturb the reading and understanding of the text nor prevent the search engine from finding a term.
So don’t only look at the numbers but also at the text comparison. Your model is usually better than the CER (and especially the WER) suggests.
To illustrate this, we have calculated an example:
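The same effect can be reproduced on a made-up line. The sketch below compares a strict CER with a CER computed after ignoring case and punctuation. The example line and the normalization rules are our own assumptions for illustration and are not something Transkribus applies.

```python
import string


def edit_distance(hyp: str, ref: str) -> int:
    """Levenshtein distance between the HTR text and the GT."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (h != r)))
        prev = curr
    return prev[-1]


def cer(hyp: str, ref: str) -> float:
    """Character Error Rate in percent, relative to the GT length."""
    return edit_distance(hyp, ref) / len(ref) * 100


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation and collapse whitespace (assumed rules)."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


# Hypothetical line: the HTR misses a capital letter and a comma.
gt = "Und also, wie gemeldet"
htr = "und also wie gemeldet"

strict = cer(htr, gt)
lenient = cer(normalize(htr), normalize(gt))
print(f"strict CER: {strict:.2f}%, ignoring case and punctuation: {lenient:.2f}%")
```

Both differences count as full errors in the strict CER, although neither would stop a reader or a full-text search.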

Posted by Dirk Alvermann on

Use Case: “Model Booster”

Release 1.10.1

In our example we want to improve our HTR model for the Responsa. This is an HTR model that can read 17th century Kurrent documents. In the search for a possible base model, you can find two candidates in the “public models” of Transkribus: “German Kurrent M1+” from the Transkribus Team and “German_Kurrent_XVI-XVIII_M1” from Tobias Hodel. Both could fit. But the test with Sample Compare shows that “German_Kurrent_XVI-XVIII_M1” performs better, with a predicted average CER of 9.3% on our sample set.

Therefore “German_Kurrent_XVI-XVIII_M1” was chosen as the base model for the training. The Ground Truth of the Responsa (108,000 words) and the Validation Set of our old model were then added. After the Base Model Training, the average CER of our HTR model improved considerably, from 7.3% to 6.6%. As you can see in the graph, the base model alone reads the test set much worse than the original model, but the hybrid of the two is better than either one. The improvement can be seen in each of the years tested and amounts to up to 1%.

Posted by Dirk Alvermann on

Combining Models

Release 1.10.1

The longer you train HTR models yourself, the more you will be interested in the possibility of combining models. For example, you may want to combine several special models for individual writers or models that are specialized in particular fonts or languages.

There are different ways to combine models. Here I would like to introduce a technique that, in my experience, works especially well for very large generic models: the “Model Booster”.

You start a base model training and use a powerful, foreign HTR model as base model and your own ground truth as train set. But before you start, two recommendations:

a) take a close look at the characteristics of the base model you are using (how long was it trained, for which font style and which language?) – they have to match those of your own material as closely as possible.

b) if possible, try to predict the performance of the base model on your own material and then choose the base model with the best performance. Such a prediction can be made quite easily using the Sample Compare function. Another possibility is to test the base model with the Advanced Compare on your own test set.

Posted by Elisabeth Heigl on

Advanced Compare

Release 1.10.1

In contrast to the visualization of the errors via the tool “Compare Text Versions”, the ordinary “Compare” gives us the same validation results as numerical values.

In addition to the word error rate, we also get the somewhat more conclusive character error rate (CER). Furthermore, in the “Advanced Compare” we can have these results calculated for the whole document or for specific pages in it, always provided that the selected pages have a GT version, because in Advanced Compare the GT is automatically set as the reference.

So select the model to be validated and start the calculation. The result gives you not only the average value for the whole document, but also the corresponding values for each individual page. And that makes the Advanced Compare the most important validation tool in systematic analysis when developing models.
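To see why the per-page values are worth having next to the average, consider the following sketch. It works on error and character counts per page and shows that an overall value can be formed in two ways: as the mean of the page CERs or pooled over all characters. Which of the two Transkribus reports is not stated here, so both are shown. The counts and the helper `summarize_pages` are hypothetical.

```python
def summarize_pages(page_counts):
    """page_counts: list of (page_number, errors, gt_characters).

    Returns the CER per page, the mean of the page CERs and the
    CER pooled over all characters (all values in percent).
    """
    per_page = {page: errors / chars * 100
                for page, errors, chars in page_counts}
    mean_of_pages = sum(per_page.values()) / len(per_page)
    total_errors = sum(errors for _, errors, _ in page_counts)
    total_chars = sum(chars for _, _, chars in page_counts)
    pooled = total_errors / total_chars * 100
    return per_page, mean_of_pages, pooled


# Hypothetical counts: a long clean page and a short difficult one.
counts = [(1, 10, 1000), (2, 30, 500)]
per_page, mean_of_pages, pooled = summarize_pages(counts)
for page, value in per_page.items():
    print(f"page {page}: {value:.2f}%")
print(f"mean of pages: {mean_of_pages:.2f}%, pooled: {pooled:.2f}%")
```

A single difficult page can pull the average up noticeably, and only the page-level values reveal where the errors actually come from.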

In our rather complex model training for the Spruchakten (over 1000 writer’s hands from more than 150 years) we worked with separate small test sets. On them we could validate our new models over and over again via the Advanced Compare and analyse the results thoroughly. In this way not only average improvements or worsening could be traced in detail. We were also able to identify particular exceptions, such as individual concept fonts or particularly “smeared” ones, which worsened the otherwise good overall result. In addition, we were able to create many graphics from the numerical material, which helped us – and now you – to better understand certain phenomena and developments.

Tips & Tools
You can also download the validation results of the Advanced Compare as an Excel spreadsheet to your computer. To do so, you can select a folder under the result display where you want to save the document. Then click on the button “Download XLS”. Do not just press Enter – otherwise you will have to start all over again.

Posted by Elisabeth Heigl on

Compare Text Versions

Release 1.10.1

So, a new HTR model has run over a page and you want a first overview of how the model has read it? Go to the tool option “Compute Accuracy”, enter the corresponding reference (GT) and hypothesis (HTR text) and take a look at the validation tool “Compare Text Versions”:

The Text Compare visualizes the comparison of the HTR and GT versions directly in the text. A word with an error appears marked in red and crossed out; behind it you see the correct version from the GT in green. So the Text Compare basically shows the word error rate (WER). But above all it allows us to quickly recognize exactly which mistakes were made. We can also see, for example, that many of the errors are actually minor mistakes that don’t really bother us when reading and searching for words. In our example here we see a WER of 15%.
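A word-level comparison of this kind can be imitated in a few lines. The sketch below uses Python's standard `difflib` module to mark words that differ between the HTR text and the GT, with `[-...-]` standing in for the red, crossed-out word and `{+...+}` for the green GT version. The markup and the function `mark_differences` are our own invention, and this is not how Transkribus implements the tool.

```python
import difflib


def mark_differences(hyp: str, ref: str) -> str:
    """Mark words that differ between the HTR text (hyp) and the GT (ref)."""
    hyp_words, ref_words = hyp.split(), ref.split()
    matcher = difflib.SequenceMatcher(a=hyp_words, b=ref_words, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(hyp_words[i1:i2])
            continue
        if i1 < i2:  # words the HTR produced that the GT does not have here
            out.append("[-" + " ".join(hyp_words[i1:i2]) + "-]")
        if j1 < j2:  # the corresponding correct words from the GT
            out.append("{+" + " ".join(ref_words[j1:j2]) + "+}")
    return " ".join(out)


# Hypothetical line: the only error is a lowercase initial letter.
print(mark_differences("vnd der herr", "vnd der Herr"))
```

As in the Text Compare, a whole word is flagged even when only one character is wrong, which is why the WER is always harsher than the CER.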

Posted by Dirk Alvermann on

Use Case: Extend and improve existing HTR-Models

Release 1.10.1

In the last post we described that a base model can pass on everything it has “learned” to the new HTR model. With additional ground truth, the new model can then extend and improve its capabilities.

Here is a typical use case: In our subproject on the Assessor Relations of the Wismar Tribunal we train a model with eight different writers. The train set contains 150,000 words, the CER was 4.09% in the last training. However, the average CER for some writers was much higher than for others.

So we decided to do an experiment. We added 10,000 words of new GT for two of the conspicuous writers (Balthasar and Engelbrecht) and used the Base Model as well as its Training and Validation Set for the new training.

As a result, the new model had an average CER of 3.82% – it had improved. But what is remarkable is that it was not only the CER of the two writers for which we had added new GT that improved, by up to 1% in both cases. The CER for the other writers did not suffer either; it was reduced as well.
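A change in CER can be expressed in two ways, in percentage points or relative to the old value. The small sketch below applies both to the two averages mentioned above (4.09% and 3.82%); the helper `cer_change` is our own.

```python
def cer_change(old_cer: float, new_cer: float):
    """Return the improvement in percentage points and relative to the old CER (in percent)."""
    absolute = old_cer - new_cer
    relative = absolute / old_cer * 100
    return absolute, relative


absolute, relative = cer_change(4.09, 3.82)
print(f"{absolute:.2f} percentage points, {relative:.1f}% fewer errors")
```

A small difference in percentage points can therefore still mean a noticeable share of all errors removed.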

Posted by Dirk Alvermann on

On the Shoulders of Giants: Training with Base Models

Release 1.10.1

If you want to develop generic HTR models, there is no way around working with base models. When training with base models, each training session for a model is based on an existing model, i.e. a base model. This is usually the last HTR model that was trained in the corresponding project.

Base models “remember” what they have already “learned”. Therefore each new training session improves the quality of the model (theoretically). The new model learns from its predecessor and thus becomes better and better. Therefore, training with Base Models is also particularly suitable for large generic models that are continuously developed over a long period of time.

To carry out a training with a Base Model, you simply select a specific Base Model in the training tool, in addition to the usual settings. Then, from the HTR Model Data tab, insert the Train Set and the Validation Set (called Test Set in earlier Transkribus versions) of the base model, as well as the new Training and Validation Set. Additionally you can add more new Ground Truth and then start the training.

Posted by Elisabeth Heigl on

Validation possibilities

Release 1.10.1

There are different ways to measure the accuracy of our HTR-models in Transkribus. Three Compare tools calculate the results and present them in different ways. In all three cases the hypothesis (HTR version) of a text is compared with a corresponding reference (correct version, i.e. GT) of the same text.

The first tool which shows the most immediate result is “Compare Text Versions“. It visualizes the validation for the currently opened page in the text itself. Here we can see exactly which mistakes the HTR has made at which points.

The standard “Compare” gives us these same validation results as numerical values. Among other things, it calculates the average word error rate (WER), the character error rate (CER) and the respective accuracy rates. (If someone knows what the bag tokens are about, he/she is welcome to write us a comment). In the “Compare” we also have the possibility to run the “Advanced Compare“, which allows us to perform the corresponding calculations for the whole document or only for certain pages.
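On the “bag tokens” question: we cannot say for certain what Transkribus calculates there. In text comparison in general, a “bag of tokens” means that the words of both versions are compared as unordered collections, so only which words occur and how often matters, not their position. The sketch below shows that general idea with precision, recall and F-measure. That the tool reports exactly these values is an assumption, and `bag_of_tokens_scores` is a hypothetical name.

```python
from collections import Counter


def bag_of_tokens_scores(hyp: str, ref: str):
    """Compare HTR text and GT as unordered bags of words.

    Returns (precision, recall, f_measure) as values between 0 and 1.
    """
    hyp_bag, ref_bag = Counter(hyp.split()), Counter(ref.split())
    # Words found in both bags, counted as often as they occur in both.
    overlap = sum((hyp_bag & ref_bag).values())
    precision = overlap / sum(hyp_bag.values()) if hyp_bag else 0.0
    recall = overlap / sum(ref_bag.values()) if ref_bag else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Same words in a different order: a bag comparison sees no error.
print(bag_of_tokens_scores("der Herr und der Knecht", "und der Herr der Knecht"))
```

Unlike the WER, such a measure does not penalize a wrong word order, which makes it more forgiving when lines or regions are read in the wrong sequence.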

We have already presented the validation tool “Compare Samples” briefly in another post to show how to create Test Samples. The actual Sample Compare then predicts how a model will potentially read on a Test Sample that has been created for this purpose.

Posted by Dirk Alvermann on

Generic Models and what they do

Release 1.10.1

In a previous post we talked about the differences between special models and generic models. Special models should always be the first choice if your material includes a limited number of writers. If your material is very diverse – for example, if the writer changes frequently in a bundle of handwritings – it makes sense to train a generic model.

The following articles are based on our experiences with the training of a generic model for the Responsa of the Greifswald Law Faculty, in which about 1000 different writer’s hands were trained.

But first: What should a generic HTR model be able to do? The most important point has already been said: It should be able to handle a variety of different writer’s hands. But it should also be able to “read” different fonts (alphabets) and languages and be able to interpret abbreviations. Below are a few typical examples of such challenges from our collection.

Different writer’s hands in one script:


Different languages in one script: