Successful handwritten text recognition (HTR) depends on four factors:
– Quality of the originals
– Quality of the digital copies
– Reliable layout analysis and segmentation of the image areas containing the text to be recognized
– Performance of the HTR models that “read” the handwriting
Our blog will provide regular field reports on all these points. First of all, here are some general remarks.
In principle, any handwritten document can be processed with the tools available in Transkribus. Neither the script (Latin, Greek, Hebrew, Cyrillic as used for Russian or Serbian, etc.) nor the language is a limiting factor: the “models” can “learn” almost anything.
However, the quality of the originals has a major effect on the result. Heavily soiled, badly faded, or blackened documents have a lower chance of successful automatic recognition than clean documents written in a strong, clear hand.
Highly irregular layouts, for example pages that mix horizontal, vertical, and diagonal lines of text, with numerous marginal notes, insertions, and interlinear text, cause more problems for automatic layout analysis than neatly written chancery copies. And more problems mean more manual correction work for the editors.
When selecting material, one should therefore consider the challenges it poses for the available tools and for each individual step of the workflow. Judging this well takes some experience.
Our project processes multilingual documents from the 16th to the 20th century, with varying degrees of difficulty. We are glad to share our experience.