Posted by Anna Brandt on

Collaboration – Version Management

Release 1.7.1

The second important element for organized collaboration is the version management of Transkribus. It looks rather inconspicuous in the toolbar, but it is enormously important. Transkribus stores a version of the currently edited page each time it is saved. Each version records the current state of the layout work and the content processing.

These versions are given an “edit status” so that they can be distinguished more easily. A newly uploaded Document contains only pages with the edit status “new”. As soon as you edit a page, the edit status automatically changes to “in progress”. The three other status options, “done”, “final” and “Ground Truth”, can only be set manually.

When to set such a “higher” status depends on the agreements within the team. We use version management mostly during the production of training material (Ground Truth). All pages with a finished layout analysis are set to “done”, so that the transcribers and editors know that they can now complete these pages. This status is not changed until the transcription of the page is fully reliable. Then the page is set to “Ground Truth” or “final”. All pages with the status “Ground Truth” (GT) will later be used as training material for HTR models, while pages with the edit status “final” will be used to create the test sets.
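
The split between training material and test sets described above can be sketched in a few lines of Python. This is only an illustration: the `Page` class and the `split_by_status` function are hypothetical names, not part of Transkribus, and the status strings are simply the labels used in this post.

```python
from dataclasses import dataclass


@dataclass
class Page:
    number: int
    status: str  # edit status of the page's latest version


def split_by_status(pages):
    """Return (training_pages, test_pages) according to the edit status."""
    training_pages = [p for p in pages if p.status == "Ground Truth"]
    test_pages = [p for p in pages if p.status == "final"]
    return training_pages, test_pages


pages = [
    Page(1, "Ground Truth"),
    Page(2, "final"),
    Page(3, "done"),  # transcription not yet reliable: used for neither set
    Page(4, "Ground Truth"),
]
training_pages, test_pages = split_by_status(pages)
print([p.number for p in training_pages])  # [1, 4]
print([p.number for p in test_pages])      # [2]
```

Pages that are still “done” or “in progress” end up in neither list, which matches the idea that only fully checked pages should feed a model or a test set.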

Each collaborator can access, edit or delete all versions of a page at any time. The edit status helps them find the desired version faster. In addition to the edit status, the last editor and the save time are displayed for each version. If a version was produced by an automatic process (layout analysis or HTR), this is also noted in the version. Thus, the processing steps are traceable in detail.

Tips & Tools
You can have multiple versions with the same status.
You can set any version to any other status – except to “new”.
You can delete single or multiple versions – except final versions, which cannot be deleted.
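
The three rules above can be written down as a small model. The following Python sketch is an assumption-laden illustration, not Transkribus code: `Version`, `set_status` and `delete_versions` are hypothetical names, and the version IDs are invented.

```python
from dataclasses import dataclass

STATUSES = ("new", "in progress", "done", "final", "Ground Truth")


@dataclass
class Version:
    version_id: int  # hypothetical identifier, for illustration only
    status: str


def set_status(version, new_status):
    """Manually change the edit status of a version."""
    if new_status not in STATUSES:
        raise ValueError(f"unknown edit status: {new_status!r}")
    if new_status == "new":
        raise ValueError("a version cannot be set back to 'new'")
    version.status = new_status


def delete_versions(versions, ids_to_delete):
    """Return the versions that remain after deleting the selected ones."""
    protected = [v.version_id for v in versions
                 if v.version_id in ids_to_delete and v.status == "final"]
    if protected:
        raise ValueError(f"final versions cannot be deleted: {protected}")
    return [v for v in versions if v.version_id not in ids_to_delete]


# Two versions share the status "in progress", which is allowed.
history = [Version(1, "new"), Version(2, "in progress"), Version(3, "in progress")]
set_status(history[2], "final")
history = delete_versions(history, {1, 2})
print([(v.version_id, v.status) for v in history])  # [(3, 'final')]
```

Trying to delete version 3 afterwards, or to set any version to “new”, raises a `ValueError` in this sketch.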

Posted by Elisabeth Heigl on

Collaboration – User Management

Release 1.7.1

The Transkribus platform is designed for collaboration, so many users can work on a collection, and even on the same document, at the same time. With a little organizational skill, collisions can easily be avoided.

The two most important elements enabling organized collaboration are User Management and Version Management in Transkribus. User Management refers explicitly to the collections. The person who creates a collection is always its “owner” and has full rights, including the right to delete the entire collection. The owner can grant other users access to the collection and assign them roles that correspond to different rights:

Owner – Editor – Transcriber

It is wise to have more than one member of the team as “owner” of a collection. In our team, everyone else is an “editor”. Assigning the role “transcriber” is especially useful if you run crowd projects where volunteers do nothing but transcribe or tag texts. For such “transcribers”, access via the WebUI, with its range of functions adapted to this role, is ideally suited.
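
The role model can be pictured as a simple mapping from roles to permitted actions. The Python sketch below is hypothetical: the action names are invented for illustration, and the exact rights of an “editor” are an assumption, since this post only states that owners have full rights and that transcribers transcribe or tag.

```python
from enum import Enum


class Role(Enum):
    OWNER = "owner"
    EDITOR = "editor"
    TRANSCRIBER = "transcriber"


# Hypothetical action names. The editor's rights are an assumption.
PERMISSIONS = {
    Role.OWNER: {"transcribe", "tag", "edit_layout", "manage_users", "delete_collection"},
    Role.EDITOR: {"transcribe", "tag", "edit_layout"},
    Role.TRANSCRIBER: {"transcribe", "tag"},
}


def can(role, action):
    """Return True if the given role may perform the action."""
    return action in PERMISSIONS[role]


def has_backup_owner(members):
    """Return True if more than one member is an owner of the collection."""
    return sum(1 for role in members.values() if role is Role.OWNER) > 1


team = {"anna": Role.OWNER, "dirk": Role.OWNER, "volunteer": Role.TRANSCRIBER}
print(can(team["volunteer"], "tag"))                # True
print(can(team["volunteer"], "delete_collection"))  # False
print(has_backup_owner(team))                       # True
```

The `has_backup_owner` check mirrors the advice above: a collection with a single owner depends on one person for all administrative tasks.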

Posted by Anna Brandt on

What you should know about Collections & Documents

Release 1.7.1

Collections and Documents are the two most important categories in which you can organize and manage material in Transkribus. A Collection is nothing more than a kind of directory in which you store Documents that belong together. It is important to know that some of the tools Transkribus provides do not work across the boundaries of a Collection. This includes the tag search, which is an important tool for those who want to tag their HTR results.
Documents are parts of the Collection, e.g. a bundle of letters, a record or even a single piece of writing. In our project a Document is always a record. Documents can therefore contain many pages. They are usually uploaded into Transkribus via private FTP or directly from a local folder. You cannot upload single images, only images that are contained in a folder.
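
Since images are uploaded as a folder, a small check of that folder before uploading can save trouble. The following Python sketch is not a Transkribus tool: the function name, the list of accepted file extensions and the expected page count are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

# Assumption: file extensions treated as page images in this sketch.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def check_upload_folder(folder, expected_pages):
    """Return the sorted image files of a folder, or raise if pages are missing."""
    folder = Path(folder)
    if not folder.is_dir():
        raise ValueError(f"{folder} is not a folder")
    images = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    if len(images) != expected_pages:
        raise ValueError(f"expected {expected_pages} images, found {len(images)}")
    return images


# Demonstration with a temporary folder containing two images and one other file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("0002.jpg", "0001.jpg", "notes.txt"):
        (Path(tmp) / name).touch()
    names = [p.name for p in check_upload_folder(tmp, expected_pages=2)]

print(names)  # ['0001.jpg', '0002.jpg']
```

The check only counts files. Whether the images are correctly aligned still has to be verified by eye or with an image tool.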

Once uploaded, the individual pages of a Document can only be edited to a limited extent. Using the document manager, you can move or delete individual pages within the Document, and you can even add more pages. However, once images are uploaded, they can no longer be edited or rotated. This means that before uploading you should check whether the images are aligned correctly and whether the Document is complete.
Thus in our project, Documents are only compiled and uploaded once they have been edited in the Goobi metadata editor, checked for completeness and given structure and metadata. This ensures that when the HTR results are re-imported into Goobi later, they are transferred to an identical document structure.

Tips & Tools
Documents can be distributed between different Collections at any time, either by linking or by duplicating. When a Document is linked, each change to it, no matter in which Collection it is made, is transferred to all Collections it is linked to. Duplicating actually creates two separate Documents that can be edited independently of each other.
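
The difference between linking and duplicating is the familiar difference between sharing a reference and making a copy. The Python sketch below only illustrates that idea. The `Document` class and the dictionaries standing in for Collections are hypothetical and say nothing about how Transkribus stores data internally.

```python
import copy


class Document:
    def __init__(self, title, pages):
        self.title = title
        self.pages = list(pages)


collection_a = {"record": Document("Record 1", ["p1", "p2"])}
collection_b = {}
collection_c = {}

# Linking: both Collections refer to the very same Document.
collection_b["record"] = collection_a["record"]

# Duplicating: the third Collection receives an independent copy.
collection_c["record"] = copy.deepcopy(collection_a["record"])

# A change made via Collection B ...
collection_b["record"].pages.append("p3")

print(collection_a["record"].pages)  # ['p1', 'p2', 'p3'] (linked, change visible)
print(collection_c["record"].pages)  # ['p1', 'p2'] (duplicate, unaffected)
```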

Posted by Anna Brandt on

Material

Release 1.7.1

Successful handwritten text recognition depends on four factors:

– Quality of the originals
– Quality of the digital copies
– Reliable layout analysis and segmentation of the image areas containing the text to be recognized
– Performance of the HTR models “reading” the handwriting

Our blog will provide regular field reports on all these points. First of all, here are some general remarks.
In principle, you can process all handwritten documents with the tools available in Transkribus. Neither the writing system used (Latin, Greek, Hebrew, Russian, Serbian etc.) nor the language is a criterion; the “models” can “learn” almost everything.
However, the quality of the originals has a big effect on the result. In other words, heavily soiled, completely faded or blackened documents have a lower chance of successful automatic text recognition than clean, strong writing.
Completely muddled text layouts, i.e. with horizontal, vertical or diagonal lines, numerous marginal notes or insertions and text between the lines, cause more problems for the automatic layout analysis than chancellery copies. And more problems mean more work for the editors.
When selecting the material, one should therefore consider the challenges it poses for the available tools and the individual work areas. This can only be done with a little experience.

In our project we process multilingual documents from the 16th to the 20th century, with varying degrees of difficulty. We are glad to share our experience.

Posted by Dirk Alvermann on

WebUI & Expert Client

As we said before, this blog is almost exclusively about the Expert Client of Transkribus. It offers a variety of possibilities, and handling them requires a certain level of knowledge.

The tools of the WebUI are much more limited, but also easier to work with. In the WebUI it is not possible to perform an automatic layout analysis or to start an HTR, let alone to train a model or to intervene in user management. But that is not what it is meant for.

The WebUI is the ideal interface for crowd projects with a lot of volunteers who mainly transcribe or comment and tag content. And this is exactly what it is used for most of the time. The coordination of such a crowd project is done via the Expert Client.

The advantage of the WebUI is that it can be used without any prerequisites. It is a web application opened in the browser: no installation, no updates, etc. Moreover, it is almost intuitive and can be used by anyone without any previous knowledge.

Tips & Tools
The WebUI also has a version management, somewhat adapted for crowd projects. When transcribers are done with the page to be edited, they set the edit status to “ready for review”, so that the supervisor knows the page is ready to be checked.

Posted by Dirk Alvermann on

Knowing what you want

A digitization project with Handwritten Text Recognition can have very different goals. They can range from a critical digital edition, to the provision of manuscripts as full texts, to the indexing of large text corpora via Keyword Spotting. These three objectives call for different approaches, which have a great influence on the technical and personnel effort.

In this project, only the last two goals are of interest. A critical edition is not intended, even if the full texts generated in this project could serve as the basis for one.

We aim at a complete indexing of the manuscripts by automatic text recognition. The results will then be made public online in the Digital Library Mecklenburg-Vorpommern. A search is available there that shows the hits in the image itself. Users with sufficient palaeographic knowledge can explore the context of a hit in the image themselves, switch to a modern full text view, or use only the latter.