Many of you probably know the tool “Remove small text regions”, which has been available in Transkribus for the last year. Now its little brother, “Remove small text lines”, is coming: a tool that many users have long been hoping for.
With the Citlab Advanced Layout Analysis, it happens again and again, even on quite “normal” pages, that text regions or baselines are recognized where we don’t need or want them.
Often, “mini-baselines” are recognized in decorated initials or between individual lines. The HTR model of course can’t do anything with these during text recognition, so the transcript ends up containing “empty” lines. With this tool you can easily and automatically delete these baselines.
Try it yourself. In our experience, a threshold of 0.05 gives the best results.
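To give a feel for what such a threshold does, here is a minimal sketch in Python. It is not the Transkribus implementation. It assumes the threshold is a fraction of the page width, which the tool itself may define differently. The `Baseline` class and the `remove_small_lines` function are made-up names for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    """A baseline as a list of (x, y) points in pixels (hypothetical structure)."""
    line_id: str
    points: list[tuple[int, int]]

    def width(self) -> int:
        # Horizontal extent of the baseline
        xs = [x for x, _ in self.points]
        return max(xs) - min(xs)


def remove_small_lines(baselines, page_width, threshold=0.05):
    """Keep only baselines at least `threshold * page_width` wide.

    Assumption: the threshold is a fraction of the page width.
    """
    min_width = threshold * page_width
    return [b for b in baselines if b.width() >= min_width]


lines = [
    Baseline("r1l1", [(200, 310), (1800, 315)]),  # normal text line
    Baseline("r1l2", [(210, 352), (260, 352)]),   # "mini-baseline" inside an initial
    Baseline("r1l3", [(200, 400), (1750, 404)]),  # normal text line
]

kept = remove_small_lines(lines, page_width=2000, threshold=0.05)
print([b.line_id for b in kept])  # → ['r1l1', 'r1l3']
```

Under this assumption, a threshold of 0.05 on a page 2000 pixels wide drops every baseline shorter than 100 pixels. A higher value removes more lines, so it is worth checking that short but genuine lines, such as page numbers or marginal notes, survive.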