Elisabeth Heigl


Posted by Elisabeth Heigl on

How to train PyLaia models

Release 1.12.0

Since version 1.12.0 it is possible to train PyLaia models in Transkribus besides the proven HTR+ models. We have gained some experience with this in the last months and are quite impressed by the performance of the models.

PyLaia models can be trained like HTR or HTR+ models using the usual training tool. But there are some differences.

Like a normal HTR+ model you have to enter the name of the model, a description and the languages the model can be used for. Unlike the training of HTR+ models, the number of iterations (epochs) is limited to 250. This is sufficient in our experience. You can also train PyLaia models with base models, i.e. you can design longer training series that are based on each other. In contrast to the usual training, PyLaia has an “Early Stopping” setting. It determines when the training can be stopped, if a good result is achieved. At the beginning of your training attempts you should always set this value to the same number of iterations as you have chosen for the whole training. So if you train with 250 epochs, the setting for “Early Stopping” should be the same. Otherwise you risk that the training will be stopped too early.

The most important difference is that in PyLaia Training you can choose whether you want to train with the original images or with compressed images. Our recommendation is: train with compressed images. The PyLaia training with original images can take weeks in the worst case (with a correspondingly large amount of GT). With compressed images a PyLaia training is finished within hours or days (if you train with about 500.000 words).

Tips & Tools
For more detailed information, especially for setting specific training parameters, we recommend the tutorial by Annemieke Romein and the READ Coop guidelines.

Posted by Elisabeth Heigl on

Our first public model for german current (17th century)

Today we proudly present our HTR-model “Acta 17” as a public model.

The model was trained on the base of more than 500,000 words from about a 1000 different writers during the period of 1580-1705. It can handle the languages German, Lower German and Latin and is able to decipher simple german and latin abbreviations. Besides the usual chancellery lettering, the training material also contained a selection of concept writings and printed material of the period.

The entire training material is based on legal texts or court writings from the Responsa of the Greifswald Law Faculty. Validation sets are based on a chronological selection of the years: 1580 – 1705. GT & validation set was produced by Dirk Alvermann, Elisabeth Heigl, Anna Brandt.

Due to some problems with creating a large series of base-model-trainings for HTR+ Models in the last couple of weeks, we decided to launch an HTR+ Model trained from the scratch.

It is accompanied by a PyLaia model, which is based on the same training and validation sets and was also trained without using a base model.

For the validation set we choose pages representing single years of the total set of documents. All together they represent 48 selected years, five pages of each year.

How the models do perform in the several time periods of the validation set, you can check in the comparison below. Both models did run without language model.

Posted by Elisabeth Heigl on

textual tagging

Like everything else, tagging can be integrated into your work on very different levels and with different requirements. In Transkribus, a large number of tags are available for a wide range of applications, some of which are described here.

We decided to try it with only two tags, namely “person” and “place”. These tags will later allow systematic access to the corresponding text passages.

When tagging, Transkribus automatically adopts the term under the cursor as “value” or “label” for the specific case. So if I mark “Wolgast” as in the example below and tag it as “place”, then two important pieces of information are already recorded. The same is true for the name of the person further down.

Transkribus offers the possibility to assign properties to each tagged element, e.g. to display the historical place name in modern spelling or to assign a gnd number to the person’s name. You can also create additional properties, geodata for places etc.

Given the amount of text we process, we have decided not to assign properties to our tags. Only the place names are identified as best as possible. The aim is to be able to display the tag-values separately for people and places next to the respective document when presenting them in the viewer of the Digital Library M-V, thus enabling the user to navigate systematically through the document.

Posted by Elisabeth Heigl on

Tagging: what for? – when and why tagging makes sense

Tagging allows – in addition to content indexing by HTR – systematic indexing of the text by the later user. In contrast to an HTR model that does its work independently, tagging has to be done mostly by hand, which means that it requires a lot of effort. Therefore, a realistic effort analysis should be carried out before developing far-reaching plans regarding tagging.

Due to the amount of material processed in our project, we primarily use tagging where it helps us in the practical work on the text. This is the case with structure tagging, where the layout analysis is improved with the help of the tagging and the P2PaLA developed from it, and then of course also with the tagging of textstyles in case of deletions and blackening. This is where tagging is basically used “area-wide” by us. A fixed component of our transcription rules is also the use of the “unclear” tag for passages that cannot be read correctly by the transcriber. In this case, the tag is used more for internal team communication.

For the systematic preparation of texts for which an HTR has already been performed, we are experimenting with the “person” and “place” tags in order to offer systematic indexing, at least in this limited form.

Posted by Elisabeth Heigl on

Compare Samples

Release 1.10.1

As the name suggests, the Compare Samples tool tests the capabilities of an HTR model based on a sample rather than a manually selected test set. We have explained in an earlier post how to create such samples, that they represent an objective alternative to conventional test sets and why they can be created with much less effort.

“Compare Samples” may look like a validation tool, but is actually not one of them. You can use it to validate an HTR model, but Advanced Compare is better suited for this.  The real function of “Sample Compare” is to make predictions about the success of an HTR model on a given material.

You may remember the Model Booster. There you need a suitable HTR model that can serve as a base model for a planned HTR training. With the numerous Public Models available, it is a good idea to first check with “Compare Samples” which model fits to your project.

To create such a prediction for a sample, you first have to run the selected HTR models over the entire sample (before that, of course, you have already created the GT for the sample). Then open the Samples tab of the “Compare Samples” tool. This tab lists all samples of your active collection. You select the sample that will be used as the basis for the prediction. Now you can select the model in the middle, whose text version should serve as a reference for the GT. Start “Compute” and you’re done.

The tool now calculates average values for all lines of the sample with an upper bound, a lower bound and an average value. In the range between upper bound and lower bound you should find the Character Error Rate for 95% of your material at which the selected HTR model is expected to work.  In our example below, between 4,7 and 2,9 %.

This way you can compare as many models for your material as you like. But the tool also allows a few other things. For example, you can easily check how an HTR model with or without language model or dictionary works on your material and if it is worth using one or the other. Of course this is especially useful to check your own models.

 

Tips & Tools
Create several smaller samples rather than one giant sample for all your material. You can separate them chronologically or by writer’s hands, for example. This will allow you to make a differentiated prediction for the use of HTR models on all your material or parts of it.

Posted by Elisabeth Heigl on

CER? Don´t Worry!

Release 1.10.1

The Character Error Rate (CER) compares, for a given page, the total number of characters (n), including spaces, to the minimum number of insertions (i), substitutions (s) and deletions (d) of characters that are required to obtain the GT result. If that was not mathematical enough for you:

CER = [ (i + s + d) / n ]*100

This means that even all the little mistakes are statistically full-fledged errors. Every missing comma, a “u” instead of a “v”, an additional space or even an uppercase letter instead of a lowercase letter are included in the CER as “whole errors”. The small details neither disturb the reading and understanding of the text nor do they prevent the search engine from finding a term.
So don’t only look at the numbers but also at the text comparison.Your model is usually better than the CER (and especially the WER) suggest.
To illustrate this, we have calculated this exemplary:

Posted by Elisabeth Heigl on

Use Case: “Model Booster”

Release 1.10.1

In our example we want to improve our HTR model for the Responsa. This is an HTR model that can read 17th century Kurrent documents. In the search for a possible base model, you can find two candidates in the “public models” of Transkribus: “German Kurrent M1+” from the Transkribus Team and “German_Kurrent_XVI-XVIII_M1” from Tobias Hodel. Both could fit. But the test on the Sample Compare shows that “German_Kurrent_XVI-XVIII_M1” performed better with a predicted average CER of 9.3% on our sample set.

Therefore “German_Kurrent_XVI-XVIII_M1” was chosen as the base model for the training. Afterwards the Ground Truth of the Responsa (108.000 words) and also the Validation Set of our old model was added. The average CER of our HTR model has improved considerably after the Base Model Training, from 7.3% to 6.6%.As you can see in the graph, the base model on the test set reads much worse than the original model, but the hybrid of the two is better than either one. The improvement of the model can be seen in each of the years tested and is up to 1%.

Posted by Elisabeth Heigl on

Combining Models

Release 1.10.1

The longer you train HTR models yourself, the more you will be interested in the possibility of combining models. For example, you may want to combine several special models for individual writers or models that are specialized in particular fonts or languages.

To achieve a combination of models there are different possibilities. Here I would like to introduce a technique that works in my experience especially well for very large generic models – the “Model Booster“.

You start a base model training and use a powerful, foreign HTR model as base model and your own ground truth as train set. But before you start, two recommendations:

a) take a close look at the characteristics of the base model you are using (for how long is it trained, for which font style and which language?) – they have to match those of your own material as much as possible.

b) if possible try to predict the performance of the base model on your own material and then choose the base model with the best performance. Such a prediction can be made quite easily using the Sample Compare function. Another possibility is to test the basemodel with the Andvanced Compare on your own test set.

Posted by Elisabeth Heigl on

Advanced Compare

Release 1.10.1

In contrast to the visualization of the errors via the tool “Compare Text Versions” the ordinary “Compare” gives us the same validation results as numerical values.

In addition to the word error rate, we also get the somewhat more conclusive character error rate (CER). Furthermore, in the “Advanced Compare” we can have these results calculated for the whole document or for specific pages in it – always provided that the selected pages have a GT version. Because in Advanced Compare the GT is automatically set as reference.

So select the model to be validated and start the calculation. The result gives you not only the average value for the whole document, but also the corresponding values for each individual page. And that makes the Advanced Compare the most important validation tool in systematic analysis when developing models.

In our rather complex model training for the Spruchakten (over 1000 writer’s hands from more than 150 years) we worked with separate small test sets. On them we could validate our new models over and over again via the Advanced Compare and analyse the results thoroughly. In this way not only average improvements or worsening could be traced in detail. We were also able to identify particular exceptions, such as individual concept fonts or particularly “smeared” ones, which worsened the otherwise good overall result. In addition, we were able to create many graphics from the numerical material, which helped us – and now you – to better understand certain phenomena and developments.

Tips & Tools
You can also download the validation results of the Advanced Compare as an Excel spreadsheet to your computer. To do so, you can select a folder under the result display where you want to save the document. Then click on the button “Download XLS”. Do not just press Enter – otherwise you will have to start all over again.

Posted by Elisabeth Heigl on

Compare Text Versions

Release 1.10.1

So, a new HTR model has run over a page and you want to have a first overview on how the model has read? Go to the tool option “Compute Accuracy”, enter the corresponding reference (GT) and hypothesis (HTR Text) and take a look at the validation tool „Compare Text Versions“:

The Text Compare visualizes the comparison of HTR and GT version directly in the text. A word with an error appears red marked and crossed out, behind it you see in green the correct version from the GT. So the text Compare basically shows the word error rate (WER). But above all it allows us to quickly recognize which mistakes exactly were made. So we can also see, for example, that many of the errors are actually minor mistakes, which don’t really bother us when reading and searching for words. In our example here we see a WER of 15%.