Merge small Base Lines

Like “Remove small text lines”, this tool was introduced with version 1.12.0 of Transkribus. The idea behind it is very interesting.

You may already have run into “torn” lines, where a single line of text is split into several base lines by the automatic line detection (Citlab Advanced Layout Analysis). In an earlier post we described how annoying this problem can be.

Our expectations for the new tool were therefore high. After a short time, however, we realized that using it takes some practice and that it does not work smoothly on every page.

Here we show a simple example:

In our example, the Citlab Advanced Layout Analysis detected five superfluous text regions on the page and the same number of torn base lines. In such a case, first remove the redundant text regions with “Remove small text regions” and then run the automatic merge tool.

Be careful with complex layouts. Always check the result of “Merge small text lines”, because the tool often merges base lines that do not belong together, for example lines with a different reading order.