9 Articles

Posted by Dirk Alvermann on

Tag Export II

In the last post we presented a benefit of tags. As an example we showed the visualization of results by displaying place tags on a map. But there are other possibilities.

Tags can be exported not only separately, for example as an Excel table. Some tags (place or person) are also written to the ALTO files. Among other things, these files are what allow us to display full-text search hits in our viewer/presenter. To include the tags, simply select “Export ALTO (Split lines into words)” when exporting the METS files.
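As a rough illustration of how such tags could be read back out of an ALTO file, here is a minimal Python sketch. It assumes the tags appear as NamedEntityTag elements with TYPE and LABEL attributes, as the ALTO schema allows. The sample fragment, its IDs and the names in it are made up, and a real Transkribus export may look different.

```python
import xml.etree.ElementTree as ET

# Hypothetical ALTO fragment for illustration only; a real export may use
# another ALTO version, other IDs, or store the tag type differently.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Tags>
    <NamedEntityTag ID="t1" TYPE="place" LABEL="Wolgast"/>
    <NamedEntityTag ID="t2" TYPE="person" LABEL="Hans Schmidt"/>
  </Tags>
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="Wolgast" TAGREFS="t1"/>
    <String CONTENT="und"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""


def named_entities(alto_xml):
    """Group the LABEL of every NamedEntityTag by its TYPE."""
    root = ET.fromstring(alto_xml)
    entities = {}
    for element in root.iter():
        # Drop the namespace prefix so the code works across ALTO versions.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "NamedEntityTag":
            tag_type = element.get("TYPE", "unknown")
            entities.setdefault(tag_type, []).append(element.get("LABEL", ""))
    return entities


entities = named_entities(SAMPLE_ALTO)
```

A viewer could use a grouping like this to list places and persons separately for each document.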

In our presenter in the Digital Library Mecklenburg-Vorpommern the tags are then displayed as “named entities”, grouped into places and persons, for the respective document. The whole thing is still in the experimental phase and will be developed further in the near future so that you can jump directly to the corresponding places in the document via an actual tag cloud.

Posted by Dirk Alvermann on

Tag Exports I

Once you have taken the trouble to tag one or more documents, there are several ways to use this “added value” outside of Transkribus. The tags can be easily exported as an Excel spreadsheet using Transkribus’ export tool.

From there you have many options. We conducted our “tagging experiment” to see whether this would be a good way to visualize the geographical distribution of our documents. At the same time, the map was meant to provide access to the digitized documents in the presenter.

All in all we are satisfied with the result of the experiment. You can select specific years or periods of time, search for locations and use the points on the map to access the documents.

In the end, however, the effort for this kind of tagging has proven to be so high that we cannot afford it within the scope of this project. But there are other ways to use tags in export, which we will write about in the next post.

Posted by Anna Brandt on

Searching and editing tags

Release 1.11.0

If you tag large amounts of historical text, as we have tried to do with place and person names, you will sooner or later have a problem: the spelling varies a lot. In other words, the tag values are not identical.
Let’s take the places and a simple example. “Rosdogk”, “Rosstok” and “Rosdock” all refer to the same place: the city of Rostock. To make this recognizable, you use the properties. But if you do this over more than ten thousand pages with hundreds or thousands of places (we set about 15,000 tags for places in our attempt), you easily lose track. Besides, tagging takes much longer if you also assign properties.

Fortunately, there is an alternative. You can search the tags, not only in the document you are working on, but in the whole collection. To do this, select the “binoculars” in the menu, just as you would to start a full text search or KWS, except that you now select the submenu “Tags”.

Here you can select the search area (Collection, Document, Page) and the level on which you want to search (Line or Word). Then select the corresponding tag and, if you want to narrow the search, enter the tagged word. The search results can also be sorted. This way we can quickly find all “Rostocks” in our collection and enter the desired additional information in the properties, such as the current name, geodata and similar. These “properties” can then be assigned to all selected tagged words. In this way, tagging and the enrichment of data can be separated from each other and carried out efficiently.

The same is possible with tags like “Person” or “Abbrev” (where you would put the resolution/expansion in the properties).
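The idea of tagging first and enriching later can be sketched in a few lines of Python. This is only an illustration of the principle, not of how Transkribus works internally. The record layout, the function name and the property name “modern_name” are assumptions made for the example.

```python
def assign_properties(tags, tag_type, variants, properties):
    """Attach the same properties to every tag of tag_type whose value is a listed variant.

    Returns the number of tags that were updated.
    """
    wanted = {variant.casefold() for variant in variants}
    updated = 0
    for tag in tags:
        if tag["type"] == tag_type and tag["value"].casefold() in wanted:
            tag.setdefault("properties", {}).update(properties)
            updated += 1
    return updated


# Hypothetical search result: tags as they might be collected across a collection.
tags = [
    {"type": "place", "value": "Rosdogk", "page": 12},
    {"type": "place", "value": "Rosstok", "page": 48},
    {"type": "place", "value": "Wolgast", "page": 48},
    {"type": "place", "value": "Rosdock", "page": 301},
]

count = assign_properties(
    tags,
    tag_type="place",
    variants=["Rosdogk", "Rosstok", "Rosdock"],
    properties={"modern_name": "Rostock"},
)
```

The three spelling variants receive the same property in one step, while “Wolgast” remains untouched.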

Posted by Dirk Alvermann on

Textual tagging

Like everything else, tagging can be integrated into your work on very different levels and with different requirements. In Transkribus, a large number of tags are available for a wide range of applications, some of which are described here.

We decided to try it with only two tags, namely “person” and “place”. These tags will later allow systematic access to the corresponding text passages.

When tagging, Transkribus automatically adopts the term under the cursor as “value” or “label” for the specific case. So if I mark “Wolgast” as in the example below and tag it as “place”, then two important pieces of information are already recorded. The same is true for the name of the person further down.

Transkribus offers the possibility to assign properties to each tagged element, e.g. to give the historical place name in modern spelling or to assign a GND number to the person’s name. You can also create additional properties, such as geodata for places.
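To make the relation between tag, value and properties more concrete, here is a small Python sketch. It assumes that the tags of a line are stored in the PAGE XML in a “custom” attribute of the form name {key:value; ...}, with offset and length pointing into the line text. The sample line and the property name “placeName” are made up for illustration, and escaped special characters inside values are not handled.

```python
import re

# Matches entries such as: place {offset:6; length:7; placeName:Wolgast;}
TAG_PATTERN = re.compile(r"(\w+)\s*\{([^}]*)\}")


def parse_custom(custom, line_text):
    """Return the text tags of one line with their value and properties."""
    tags = []
    for name, body in TAG_PATTERN.findall(custom):
        props = {}
        for part in body.split(";"):
            if ":" in part:
                key, value = part.split(":", 1)
                props[key.strip()] = value.strip()
        # Entries without offset and length (e.g. readingOrder) are not text tags.
        if "offset" in props and "length" in props:
            start = int(props.pop("offset"))
            length = int(props.pop("length"))
            tags.append({
                "tag": name,
                "value": line_text[start:start + length],
                "properties": props,
            })
    return tags


# Hypothetical line and custom attribute.
line = "Datum Wolgast den 3. Maii"
custom = "readingOrder {index:0;} place {offset:6; length:7; placeName:Wolgast;}"
result = parse_custom(custom, line)
```

The tagged word itself is the “value”; everything else in the braces is an optional property.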

Given the amount of text we process, we have decided not to assign properties to our tags. Only the place names are identified as well as possible. The aim is to display the tag values separately for people and places next to the respective document in the viewer of the Digital Library M-V, thus enabling the user to navigate systematically through the document.

Posted by Anna Brandt on

Tagging in WebUI

The WebUI, which is designed especially for crowdsourcing projects, is very well suited to tasks like tagging already transcribed documents.

Tagging in the WebUI works slightly differently than in the Expert Client. There are different tools and settings.

If you have selected your collection and the document in the WebUI and want to tag something, you have to select “Annotation” and not “plain text” for the page you want to edit. Both modes are similar, except that in Annotation you can additionally tag. To do this, select the words and right-click on them to pick the appropriate tag. Always save when you leave the page, even if you only switch to layout mode. Unlike the Expert Client, the program does not ask you to save, and without saving your tags will be lost.

All tags appear to the left of the text field when you click on the word. The tags set in the Expert Client are also displayed there. The whole annotation mode is still in beta at the moment.

Posted by Anna Brandt on

Tagging Tools

Release 1.11.0

In a previous post we already wrote about our experiences with structure tagging and described the tools that go with it. But for most users (e.g. in edition projects) enriching texts with additional content information is even more important. To add tags to a transcription you can use the tagging tools in the tab “Metadata”/“Textual” in Transkribus.

Here you can see the available tags as well as those that have already been applied to the text of the page. With the Customize button you can create your own tags or add shortcuts to existing tags, just like with structure tagging. The shortcuts allow for easier and faster tagging in the transcript. If you want to do without shortcuts, you have to mark the respective words in the text (not in the image) and select the desired tag with a right click. Of course a word can be tagged several times.

These tags should not be confused with the so-called TextStyles (for example, crossed out or superscript words). They are not accessible below the tags but via the toolbar at the bottom of the text window.

Posted by Dirk Alvermann on

Tagging: what for? – when and why tagging makes sense

Tagging allows – in addition to content indexing by HTR – systematic indexing of the text by the later user. In contrast to an HTR model that does its work independently, tagging has to be done mostly by hand, which means that it requires a lot of effort. Therefore, a realistic effort analysis should be carried out before developing far-reaching plans regarding tagging.

Due to the amount of material processed in our project, we primarily use tagging where it helps us in the practical work on the text. This is the case with structure tagging, where the tagging and the P2PaLA model developed from it improve the layout analysis, and of course with the tagging of TextStyles for deletions and blackened passages. Here we use tagging more or less across the board. The “unclear” tag for passages that the transcriber cannot read reliably is also a fixed part of our transcription rules. In this case, the tag serves mainly for internal team communication.

For the systematic preparation of texts for which an HTR has already been performed, we are experimenting with the “person” and “place” tags in order to offer systematic indexing, at least in this limited form.

Posted by Dirk Alvermann on

Structural tagging – what else you might do with it (Layout and beyond)

In one of the last posts you read how we use structural tagging. Here you can find how the whole toolbox of structural tagging works in general. In our project it was mainly used to create an adapted LA model for the mixed layouts. But there is even more potential in it.
Who doesn’t know the problem?

There are several very different handwritings on one page and it becomes difficult to get consistently good HTR results. This happens most often when a ‘clean’ handwriting has been annotated in concept handwriting by another writer. Here is an example:

The real reason for the problem is that HTR has so far only been executed at the page level. This means that you can have one page or several pages read with either one HTR model or the other. But it is not possible to read a single page with two different models, each adapted to one of the handwritings.

Since version 1.10 it has been possible to apply HTR models at the level of text regions instead of just assigning them to pages. This allows the contents of individual text regions on a page to be read using different HTR models. Structure tagging plays an important role here, for example in the case of text regions with script styles that differ from the main text. These are tagged with a specific structure tag, to which a special HTR model is then assigned. Reason enough, therefore, to take a closer look at structure tagging.
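The principle of assigning models by structure tag can be sketched as a simple lookup. This is only an illustration in Python. The region IDs, structure types and model names are hypothetical, and Transkribus itself handles this assignment in its own interface.

```python
def pick_models(regions, model_by_structure, default_model):
    """Choose an HTR model for each text region based on its structure tag."""
    return {
        region["id"]: model_by_structure.get(region.get("structure"), default_model)
        for region in regions
    }


# Hypothetical page with a main text block and a marginal note by another writer.
regions = [
    {"id": "r1", "structure": "paragraph"},
    {"id": "r2", "structure": "marginalia"},
    {"id": "r3"},  # untagged region falls back to the default model
]

models = pick_models(
    regions,
    model_by_structure={"marginalia": "concept_hand_model"},
    default_model="clean_hand_model",
)
```

Only the regions with a deviating script style need their own entry; everything else is read with the model for the main text.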

Posted by Anna Brandt on

Structural Tagging

How structural tagging is done exactly is explained in this Wiki. In contrast to “textual” tagging you can tag all structures, for example text regions, baselines or tables. In our case, only the text regions are tagged, because we use structure tagging to train a P2PaLA model.

When you create your training material and decide where to position the specific structural elements, you should stick to your choices. For example: for us a “paragraph” is always the TR at the top in the middle, the core so to speak; “marginalia” are all the notes on the left side of the image, separated from the “paragraph”. With this you can divide the images into ‘types’, i.e. groups of images in which all TRs with the same tags are always in a certain coordinate area of the page.

Tips & Tools
There are three ways to set the corresponding tag. First, by right-clicking on the marked area and assigning a tag via “assign structure type”. Second, via the area “Structural” in the tab “Metadata”, where the existing structure types are displayed. Third, with shortcuts, which you can define there for tags that you use a lot: click on the button “Customize” and enter a number from one to nine in the column “Shortcut”. The shortcut is then displayed in the tab; it is always Ctrl+Alt+Number.