Wikimedia versus traditional biographical encyclopedias (2024–2025)
Wikipedia has become a primary source of biographical information due to its extensive coverage and accessibility. However, Wikipedia and Wikidata inherently rely on the outputs of research institutions, partly because of Wikipedia's 'no original research' policy.
This project aims to analyze the production of traditional biographical dictionaries, explore their relationship to Wikipedia and Wikidata, identify current issues, and propose solutions to improve collaboration between Wikimedia and the creators of traditional biographical dictionaries.
The project uses and develops machine learning tools for reading and processing text. The first version of the model for automatic text annotation has been published.
Digitization of Jewish Registers (2024)
A pilot project for converting handwritten data from scans of Jewish registers into a relational database.
Machine learning models for handwritten text recognition (HTR) and text transformation (text-to-text) have been developed for data extraction. First versions of three models have been published: text segmentation, text recognition, and text transformation.
The initial dataset is stored on Wikibase Cloud.
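The processing chain described above (segmentation, recognition, transformation, then storage in a relational database) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: the RegisterEntry fields, the table layout, the function names, and the three stub stages are all hypothetical stand-ins for the published models, and the sample input is placeholder data.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RegisterEntry:
    # Hypothetical fields; the real schema of the project is not given here.
    person_name: str
    event_type: str
    event_date: str


def process_scan(
    scan: bytes,
    segment: Callable[[bytes], List[bytes]],     # page scan -> line images
    recognize: Callable[[bytes], str],           # line image -> raw transcription
    transform: Callable[[str], RegisterEntry],   # raw transcription -> structured entry
) -> List[RegisterEntry]:
    """Run the three stages in order on one scanned page."""
    return [transform(recognize(line)) for line in segment(scan)]


def store(conn: sqlite3.Connection, entries: List[RegisterEntry]) -> None:
    """Write structured entries into a (hypothetical) relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entry ("
        "id INTEGER PRIMARY KEY, person_name TEXT, event_type TEXT, event_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO entry (person_name, event_type, event_date) VALUES (?, ?, ?)",
        [(e.person_name, e.event_type, e.event_date) for e in entries],
    )
    conn.commit()


# Stub stages so the sketch runs without any trained model (assumptions only).
def stub_segment(scan: bytes) -> List[bytes]:
    return scan.split(b"\n")


def stub_recognize(line: bytes) -> str:
    return line.decode("utf-8")


def stub_transform(text: str) -> RegisterEntry:
    name, event_type, event_date = text.split(";")
    return RegisterEntry(name, event_type, event_date)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    page = b"Person A;birth;1850-01-01\nPerson B;marriage;1872-05-10"  # placeholder data
    store(conn, process_scan(page, stub_segment, stub_recognize, stub_transform))
    print(conn.execute("SELECT COUNT(*) FROM entry").fetchone()[0])
```

Passing each stage in as a plain function keeps the sketch independent of any particular model, so a real segmentation, recognition, or transformation model could replace a stub without changing the storage step.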