There are different ways to measure the accuracy of our HTR models in Transkribus. Three Compare tools calculate the results and present them in different ways. In all three cases, a hypothesis (the HTR version of a text) is compared with the corresponding reference (the correct version, i.e. the GT) of the same text.
The first tool, “Compare Text Versions“, shows the most immediate result. It visualizes the validation for the currently opened page in the text itself, so we can see exactly which mistakes the HTR has made and where.
The standard “Compare” gives us the same validation results as numerical values. Among other things, it calculates the average word error rate (WER), the character error rate (CER) and the respective accuracy rates. (If you know what the bag tokens are about, you are welcome to write us a comment.) From the “Compare” we can also run the “Advanced Compare“, which performs the same calculations for the whole document or only for selected pages.
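To make the two error rates concrete: both are based on the edit distance between hypothesis and reference, counted over characters for the CER and over words for the WER. The following is a minimal sketch of that calculation, not the actual Transkribus implementation; all function names here are our own.

```python
def levenshtein(ref, hyp):
    """Minimum number of insertions, deletions and substitutions
    needed to turn the hypothesis into the reference."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    return levenshtein(reference, hypothesis) / len(reference)

def wer(reference, hypothesis):
    """Word error rate: the same distance, computed on word tokens."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    return levenshtein(ref_words, hyp_words) / len(ref_words)

# Example: one wrong character out of 20 gives a CER of 5 %,
# i.e. an accuracy rate of 95 %.
print(cer("in the year of 1848.", "in the year of 1348."))  # 0.05
```

The accuracy rates reported by the tools are simply the complements of these values (accuracy = 1 − error rate).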
We have already presented the validation tool “Compare Sample” briefly in another post, where we showed how to create Test Samples. The actual Sample Compare then predicts how well a model is likely to read a Test Sample that has been created for this purpose.