In the previous post we described that if the document layouts are very complicated, the standard LA in Transkribus does not always provide good results. But for a perfect HTR result you need a perfect LA.
Especially in the documents of the 16th and early 17th century the CITlab Advanced LA could not convince us. It was clear to us from the beginning that the standard LA wouldn’t identify the more complex layouts (text regions) in a differentiated way. However, it was the line detection that ultimately failed to meet our demands in these documents.
An example of how (in the worst case) the line detection of the standard LA worked on our material can be seen here:
This may be an isolated case. However, if you process large quantities of documents in Transkribus, such cases may occur more frequently. In order to be able to evaluate the problem correctly, we have therefore recorded representative error statistics on two bundles of our material. It has been found that the standard LA here worked with an average of 12 errors in the line detection per page (see graph below, 1598). This of course has undesirable effects on the HTR result, which we will describe in more detail in the next post.