Language Models

Release 1.10.1

We discussed the use of dictionaries in an earlier post and noted that the better an HTR model is (CER below 7%), the less a dictionary improves the HTR result.
This is different with Language Models, which have been available in Transkribus since December 2019. Like dictionaries, Language Models are generated from the ground truth used in each HTR training. Unlike dictionaries, however, they do not aim at identifying individual words. Instead, they estimate the probability of a word sequence, that is, how often particular words and expressions occur together in a given context.
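This post does not describe how Transkribus builds or applies its Language Models internally. To illustrate the general idea of scoring word sequences, here is a minimal sketch of a word-bigram model with add-one smoothing, trained on a few made-up ground-truth lines. The function names `train_bigram_model` and `sequence_log_prob` and the sample data are hypothetical and are not part of Transkribus.

```python
import math
from collections import Counter


def train_bigram_model(lines):
    """Count word pairs in (hypothetical) ground-truth transcription lines."""
    contexts = Counter()  # how often each token is followed by another token
    bigrams = Counter()   # how often each (previous, current) pair occurs
    vocab = {"</s>"}      # every token that can follow a context
    for line in lines:
        words = line.split()
        vocab.update(words)
        tokens = ["<s>"] + words + ["</s>"]  # mark line start and end
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return contexts, bigrams, vocab


def sequence_log_prob(words, contexts, bigrams, vocab):
    """Add-one smoothed log-probability of a word sequence under the bigram model."""
    tokens = ["<s>"] + words + ["</s>"]
    v = len(vocab)
    log_p = 0.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        # P(cur | prev) = (count(prev, cur) + 1) / (count(prev) + |V|)
        log_p += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + v))
    return log_p


# Made-up ground truth, standing in for the transcriptions used in HTR training.
ground_truth = [
    "in the year of our lord",
    "in the name of god",
    "the year of grace",
]
model = train_bigram_model(ground_truth)

# Two competing readings of the same line: the model prefers the familiar sequence.
for candidate in ["the year of", "the yeer of"]:
    print(candidate, sequence_log_prob(candidate.split(), *model))
```

A word sequence that appears often in the ground truth ("the year of") receives a higher score than a visually similar but unseen one ("the yeer of"), which is the kind of contextual evidence a Language Model can add on top of what the HTR model reads from the image.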
In contrast to dictionaries, Language Models always lead to better HTR results. In our tests, the average CER improved by as much as 1% compared to the HTR result without a Language Model, and the improvement was consistent across all test sets.
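Both the 7% threshold and the improvement above are expressed as the character error rate (CER). As a rough sketch, assuming the common definition (character-level edit distance between the recognized text and the ground truth, divided by the length of the ground truth), CER can be computed as follows. Transkribus's own evaluation may handle details such as whitespace or normalization differently, and the example strings are made up.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = cur
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of an HTR hypothesis against the ground-truth reference."""
    return levenshtein(reference, hypothesis) / len(reference)


reference = "in the year of our lord"
hypothesis = "in the yeer of our lord"  # one wrongly recognized character
print(f"CER: {cer(reference, hypothesis):.2%}")
```

With a single substituted character in a 23-character line, the CER is 1/23, or about 4.3%.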

Tips & Tools: The Language Model can be selected when configuring the HTR. Unlike dictionaries, Language Models cannot be freely combined with HTR models: each HTR model has its own uniquely generated Language Model, and only that one can be used.