Posted by Elisabeth Heigl on 14. August 2020

CER? Don´t Worry!

Transkribus in practice/Models & Training

Release 1.10.1

The Character Error Rate (CER) compares, for a given page, the total number of characters (n), including spaces, to the minimum number of insertions (i), substitutions (s) and deletions (d) of characters that are required to obtain the GT result. If that was not mathematical enough for you:

CER = [ (i + s + d) / n ]*100

This means that even all the little mistakes are statistically full-fledged errors. Every missing comma, a “u” instead of a “v”, an additional space or even an uppercase letter instead of a lowercase letter are included in the CER as “whole errors”. The small details neither disturb the reading and understanding of the text nor do they prevent the search engine from finding a term.
So don’t only look at the numbers but also at the text comparison.Your model is usually better than the CER (and especially the WER) suggest.
To illustrate this, we have calculated this exemplary: