
Posted by Dirk Alvermann on

New public model – German Kurrent 17th-18th

Today we are proud to present our second publicly available HTR model.

“German_Kurrent_17th-18th” is an HTR model for Kurrent scripts of the 17th and 18th centuries. To train it, we used ground truth from our various larger and smaller projects of the last four years.

It is a generic model that includes material from about 2,000 individual writers' hands. About 35% of the ground truth comes from 17th-century manuscripts and 50% from 18th-century manuscripts; the remaining 15% is spread over the decades before and after. The documents selected for training consist mainly of official minutes and legal documents, but also include private records and letters of the period. In addition, a few contemporary printed materials (Fraktur), which appear in the records from time to time, were used for training. The texts are predominantly in German and Latin, with some Low German and French texts as well.

Have fun trying it out. Please use the comments to let us know how the model works for you.

Posted by Dirk Alvermann on

Tag Exports I

Once you have taken the trouble to tag one or more documents, there are several ways to use this “added value” outside of Transkribus. The tags can easily be exported as an Excel spreadsheet with the Transkribus export tool.

From there you have many options. We conducted our “tagging experiment” to find out whether tagging would be a good way to visualize the geographical distribution of our documents on a map. At the same time, the map was meant to give access to the digitized documents in the presenter.

All in all, we are satisfied with the result of the experiment. You can select specific years or periods, search for locations, and use the points on the map to access the documents.
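The post does not describe how the exported tags were turned into a map, so the following is only a minimal sketch of one possible route. It assumes the Excel export has been saved as CSV and reduced to three hypothetical columns (`doc_id`, `year`, `place`), and that coordinates come from a small hand-made gazetteer (the `GAZETTEER` values are approximate and purely illustrative). It filters place tags by a year range, groups documents by place, and emits GeoJSON points that a web map could display and link back to the documents.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical gazetteer: place name -> (longitude, latitude), approximate values.
GAZETTEER = {
    "Greifswald": (13.38, 54.10),
    "Stralsund": (13.09, 54.31),
}


def load_place_tags(csv_text):
    """Parse place tags from CSV with the hypothetical columns doc_id, year, place."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {"doc_id": row["doc_id"], "year": int(row["year"]), "place": row["place"]}
        for row in reader
    ]


def to_geojson(tags, start_year, end_year):
    """Group documents by place within [start_year, end_year] as a GeoJSON FeatureCollection."""
    docs_by_place = defaultdict(set)
    for tag in tags:
        if start_year <= tag["year"] <= end_year:
            docs_by_place[tag["place"]].add(tag["doc_id"])

    features = []
    for place, doc_ids in sorted(docs_by_place.items()):
        if place not in GAZETTEER:
            continue  # skip places we cannot locate on the map
        lon, lat = GAZETTEER[place]
        features.append({
            "type": "Feature",
            # GeoJSON expects [longitude, latitude] order
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"place": place, "documents": sorted(doc_ids)},
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    sample = (
        "doc_id,year,place\n"
        "doc_001,1701,Greifswald\n"
        "doc_002,1705,Stralsund\n"
        "doc_003,1750,Greifswald\n"
    )
    print(json.dumps(to_geojson(load_place_tags(sample), 1700, 1710), indent=2))
```

Keeping the document IDs in each point's properties is what would let a map click lead back to the digitized documents; the year filter corresponds to selecting a period. The real column names in a Transkribus tag export may differ and would need to be mapped accordingly.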

In the end, however, the effort required for this kind of tagging has proven too high for us to afford within the scope of this project. But there are other ways to use tags in an export, which we will write about in the next post.