
Posted by Elisabeth Heigl on

Advanced Compare

Release 1.10.1

In contrast to the visualization of errors in the “Compare Text Versions” tool, the ordinary “Compare” presents the same validation results as numerical values.

In addition to the word error rate (WER), we also get the somewhat more meaningful character error rate (CER). Furthermore, in the “Advanced Compare” we can have these results calculated for the whole document or for specific pages in it – always provided that the selected pages have a GT version, since the Advanced Compare automatically uses the GT as the reference.
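If you want to get a feeling for what the CER actually measures, it is essentially the edit distance between GT and HTR text divided by the number of GT characters. Here is a minimal, purely illustrative Python sketch of that idea – it is not the code Transkribus itself uses:

```python
# Illustrative sketch of a character error rate (CER):
# Levenshtein (edit) distance between reference (GT) and hypothesis (HTR),
# divided by the length of the reference.

def levenshtein(ref: str, hyp: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def cer(gt: str, htr: str) -> float:
    """Character error rate: edit distance / number of GT characters."""
    return levenshtein(gt, htr) / max(len(gt), 1)

print(f"CER: {cer('Spruchakten', 'Spruchackten'):.2%}")  # one inserted letter -> ~9%
```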

So select the model to be validated and start the calculation. The result gives you not only the average value for the whole document but also the corresponding values for each individual page. This makes the Advanced Compare the most important validation tool for systematic analysis during model development.

In our rather complex model training for the Spruchakten (over 1,000 writers’ hands from more than 150 years) we worked with separate small test sets. On these we could validate our new models again and again via the Advanced Compare and analyse the results thoroughly. In this way, not only could average improvements or deteriorations be traced in detail; we were also able to identify particular exceptions, such as individual concept scripts or particularly “smeared” hands, which worsened the otherwise good overall result. In addition, we were able to create many graphics from the numerical material, which helped us – and now you – to better understand certain phenomena and developments.

Tips & Tools
You can also download the validation results of the Advanced Compare as an Excel spreadsheet to your computer. To do so, select a folder below the results display where you want to save the document and then click the “Download XLS” button. Do not just press Enter – otherwise you will have to start all over again.
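Once the spreadsheet is on your computer, you can process the per-page values further, for example to produce the kind of graphics mentioned above. A small sketch using pandas follows; the file name and the column headers (“Page”, “CER”) are assumptions here and need to be adapted to the actual export:

```python
# Sketch: analyse the downloaded Advanced Compare spreadsheet.
# Column names ("Page", "CER") and the file name are assumptions -
# check the headers of your own export and adjust accordingly.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_excel("advanced_compare_results.xlsx")  # hypothetical file name
print(df.head())

# Plot the character error rate per page to spot outliers (e.g. difficult hands).
df.plot(x="Page", y="CER", kind="bar", legend=False)
plt.ylabel("CER (%)")
plt.title("Advanced Compare: CER per page")
plt.tight_layout()
plt.savefig("cer_per_page.png")
```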

Posted by Elisabeth Heigl on

Compare Text Versions

Release 1.10.1

So, a new HTR model has run over a page and you want a first overview of how the model has read it? Go to the tool option “Compute Accuracy”, enter the corresponding reference (GT) and hypothesis (HTR text) and take a look at the validation tool “Compare Text Versions”:

The Text Compare visualizes the comparison of the HTR and GT versions directly in the text. A word containing an error is marked red and crossed out; behind it, the correct version from the GT appears in green. The Text Compare thus essentially shows the word error rate (WER). But above all it allows us to quickly recognize exactly which mistakes were made. We can also see, for example, that many of the errors are actually minor ones that hardly bother us when reading and searching for words. In our example here we see a WER of 15%.
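The idea behind this view can be sketched in a few lines of Python: align the GT and HTR word sequences and count the words that differ. The example strings below are invented for illustration, and this is of course not the alignment Transkribus itself performs:

```python
# Sketch of the idea behind the Text Compare view: align GT and HTR word by
# word and count differing words (a simple word error rate).
import difflib

gt  = "den Angeklagten zu drei Jahren Kerker verurteilt".split()
htr = "den Angeklagten zu drey Jahren Kercker verurtheilt".split()

matcher = difflib.SequenceMatcher(a=gt, b=htr)
errors = sum(max(i2 - i1, j2 - j1)
             for tag, i1, i2, j1, j2 in matcher.get_opcodes()
             if tag != "equal")

print(f"WER: {errors / len(gt):.0%}")   # 3 of 7 words differ -> 43%
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag != "equal":
        print("GT:", gt[i1:i2], "-> HTR:", htr[j1:j2])
```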

Posted by Dirk Alvermann on

Use Case: Extend and improve existing HTR-Models

Release 1.10.1

In the last post we described that a base model can pass on everything it has “learned” to the new HTR model. With additional ground truth, the new model can then extend and improve its capabilities.

Here is a typical use case: In our subproject on the Assessor Relations of the Wismar Tribunal we are training a model with eight different writers. The training set contains 150,000 words, and the CER was 4.09% in the last training. However, the average CER for some writers was much higher than for others.

So we decided to do an experiment. We added 10,000 words of new GT for two of the writers that stood out (Balthasar and Engelbrecht) and used the base model as well as its training and validation set for the new training.

As a result, the new model had an average CER of 3.82% – it had improved. What is remarkable, though, is that not only did the CER of the two writers for whom we had added new GT improve, in both cases by up to 1%. The model’s reliability for the other writers did not suffer either; their CER was reduced as well.