Paris Bible Correct-a-thon at the Université de Franche-Comté (January 2023)
- About the Paris Bible Project
- Schedule
- Preparatory work
- Wednesday, January 11, 2023 (1700-1830 France Time)
- Monday, January 16, 2023 (1500-1800 France Time)
- Tuesday, January 17, 2023: Correct-a-thon (1400-1700 France Time)
- Wednesday, January 18, 2023 (1400-1600 France Time)
- Thursday, January 19, 2023 (1400-1600 France Time)
- Friday, January 20, 2023 (1400-1700 France Time)
- Early March: Publishing the GT
- Organization
- Deliverables and Outcomes
- Acknowledgments
About the Paris Bible Project
The Paris Bible Project is a transnational digital humanities project that aims to understand the production and diffusion of Latin Bibles in medieval Europe. Using emerging and established techniques of digital research in the humanities, the project challenges the received opinion that Paris Bibles are uniform, even though the Paris Bible was a model created with a degree of standardization that spread across Europe in the 13th century. We use handwritten text recognition (HTR) to create diplomatic transcriptions from digitized copies of manuscripts, and we study the spelling and abbreviation of words in those transcriptions as distinctive features that reveal more about the specificities of each manuscript, for example scribal habits, localization or dating.
Although our primary goal is an academic understanding of the diffusion of Paris Bibles, this research project also opens multiple doors to partnerships and inclusive forms of engagement through the study of manuscripts, whether held in a library, archive or museum, from different perspectives. Doing digital research implies a different approach to the object: it remains the core of the study, but it is accessed through its digital copy. This methodology requires adapted tools and extends the array of possible questions.
Participatory methods such as crowd transcription have caught on in recent years and have been bolstered by the creation of different interfaces with which groups can co-create content. Our event takes a slightly different approach. Instead of engaging the public in transcription itself, we focus on crowd correction of transcriptions that have been automatically created by AI. The correct-a-thon is a citizen science initiative, providing an opportunity for anyone interested in manuscripts to learn about the intersection of artificial intelligence and culture, as well as to contribute to an ongoing scientific research project about medieval Bibles.
The Besançon correct-a-thon
We propose a “correct-a-thon” at the Université de Franche-Comté in January 2023, spanning several days, in collaboration with the Digital Humanities and Book History Master and with the participation of colleagues from universities and public institutions. The main objectives of the correct-a-thon are:
- Assessing the quality of computer-created transcriptions and adjusting them using human intelligence.
- Raising awareness about the rapidly evolving domain of AI and the humanities.
- Contributing to the quantity and quality of ground truth for the Paris Bible Project.
- Expanding the project’s abbreviation transcription guidelines.
- Discussing the biases of computer vision in paleography.
- Supporting the creation of new scribal profiles to help locate and date manuscripts belonging to this tradition.
- Bringing together students, researchers and library professionals, connecting languages and international communities.
- Providing an opportunity for experiential learning.
- Modeling an initial public interaction with the PBP research project and setting a baseline for future encounters.
- Disseminating correct-a-thon results in posts on the project blog.
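The first objective, assessing the quality of computer-created transcriptions, is often quantified with a character error rate (CER): the edit distance between the machine output and the human-corrected text, divided by the length of the corrected text. The following is a minimal sketch of that idea, not the project's own evaluation code; the function names and the sample strings are illustrative assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def character_error_rate(hypothesis: str, reference: str) -> float:
    """Edit distance divided by the length of the corrected reference text."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(hypothesis, reference) / len(reference)


# Hypothetical example: the machine read "u" where the corrector reads "v".
machine_output = "in principio creauit deus"
corrected_text = "in principio creavit deus"
cer = character_error_rate(machine_output, corrected_text)
```

Here one substitution over 25 reference characters gives a CER of 1/25. Whether a u/v difference counts as an error at all depends on the transcription guidelines in force, which is one reason the guidelines matter for any such measurement.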
Learning Objectives
- Learning how to integrate digital tools into your research.
- Demystifying artificial intelligence tools in research.
- Learning specific digital skills (using a text editor, training HTR models and correcting their output, using a version control system).
- Participating in the enrichment and creation of a dataset.
Schedule
Preparatory work
- Read:
- Estelle Guéville and David Joseph Wrisley, “Transcribing Medieval Manuscripts for Machine Learning”
- Tobias Hodel, “Supervised and Unsupervised: Approaches to Machine Learning for Textual Entities,” in Archives, Access and Artificial Intelligence
- Watch:
- Create:
- Download:
- Additional Readings:
- Wrisley, David Joseph, Guéville, Estelle, and Cappelletto, Niccolò Acram. (2022). Creating New Audiences for Digital Objects Through Museum-University Collaboration. Museums in the Middle East Journal. Sharjah, UAE. 3: 61-63.
- Romein, Annemieke et al. Exploring Data Provenance in HTR Infrastructure. (under review)
Wednesday, January 11, 2023 (1700-1830 France Time)
Talk: David J. Wrisley and Estelle Guéville, “Working with AI-created Transcriptions of Manuscripts: Introducing the Paris Bible Project”
- Presentation of the PBP project.
- Presentation of the methodology and some high-level results.
- Presentation of the challenge:
- Material
- Transcription guidelines
- Organization
- Instructions for next week
Monday, January 16, 2023 (1500-1800 France Time)
- Orientation session with Transkribus
- 1505-1520 - Learning about handwritten text recognition
- 1520-1530 - Creating the teams, collecting team information, adding to collections
- 1530-1540 - Introducing each team's manuscript
- 1540-1630 - Learning the Transkribus Lite interfaces
- 1630-1640 - Break
- 1640-1725 - Gentle Introduction to Paleography and Codicology - Estelle Guéville
- Learning to describe a medieval manuscript: introduction to features of codicology and paleography in Paris Bibles
- Taking notes and process-oriented thinking
- Describing the most common abbreviations used, punctuation style, etc.
- Describing the errors found in the transcription and proposing other ways of transcribing the letter forms or abbreviations found
- 1725-1750 - Christofer Meinecke (Leipzig U) - Annotation interface for Paris Bible images from Mandragore / Initiales
- 1750-1800 - Wrap up / questions
Tuesday, January 17, 2023: Correct-a-thon (1400-1700 France Time)
- Correcting one manuscript per team (30 mins)
- Question and answer session (30 mins)
- Guided correction (1 3/4 hours)
- Distraction activity: classifying images from Paris Bibles with basic tags
- Debriefing
Wednesday, January 18, 2023 (1400-1600 France Time)
- Question and answer session (15 min)
- Additional guided correction (45 min)
- Checking in on images - Christofer Meinecke (15 min)
- Blogging with markdown - David Wrisley (45 min)
- Research blogging: documenting the process
- Principles of markdown
- Learning to publish a blog post on a static website (using GitHub)
- Visit to the BM, Besançon to see manuscripts
Thursday, January 19, 2023 (1400-1600 France Time)
- General discussion about modes of transcription and abbreviations found.
- Looking at MUFI
- Presenting the results of the model retrained with PyLaia across manuscripts, using a diff checker
- General discussion of the model's output: progress? Overfitting?
- Working with tools:
- AntConc. For videos to learn AntConc, see week 3 of David J. Wrisley’s course.
- Voyant
- Taking notes & drafting blog posts (1 per team)
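The diff check used in this session to compare model output across manuscripts can be approximated with Python's standard `difflib` module. This is a rough sketch only: the page does not say which diff tool the workshop used, and the function name and sample lines below are hypothetical.

```python
import difflib


def changed_words(machine: str, corrected: str) -> list[tuple[str, str]]:
    """List (machine reading, corrected reading) pairs for each differing span."""
    machine_words = machine.split()
    corrected_words = corrected.split()
    matcher = difflib.SequenceMatcher(
        a=machine_words, b=corrected_words, autojunk=False
    )
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # An empty string on one side marks a pure insertion or deletion.
            changes.append(
                (" ".join(machine_words[i1:i2]), " ".join(corrected_words[j1:j2]))
            )
    return changes


# Hypothetical line before and after human correction.
differences = changed_words(
    "in principio creauit deus celum",
    "in principio creavit deus celum",
)
```

Collecting these pairs across a whole manuscript gives a simple list of recurring confusions to discuss, such as specific letter forms or abbreviations the model misreads.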
Friday, January 20, 2023 (1400-1700 France Time)
- What we can do with the data produced
- Presenting the next steps of the project: principal component analysis (PCA) of the manuscripts worked on and comparison with other manuscripts.
- Proposing a description of a scribal profile.
- Writing session
- Co-authoring a short description of the manuscript (including abbreviations, characters, illuminations, decorative features) to be published as a blog post on the PBP website.
- Publication
- Publish one of the blog posts in real time
- Publish corrected ground truth at PBP and HTR United with contributors' names and ORCID iDs
- Images: what have we learned from the annotation of images?
Early March: Publishing the GT
Organization
Teams
- Number of teams: 7
- Number of people per team: 3
Material
- One manuscript per team.
- Same section(s) of the Bible for all.
- Platform: Transkribus
Manuscript | Place of origin | Date |
---|---|---|
Beinecke Library ms. 1100 | Italy | 13th century |
Beinecke Library ms. 387 | England or France | ca. 1325 |
Stanford Libraries ms. 23 | Paris | mid 1250s |
Besançon municipal library ms. 4 | France. By or for a Franciscan monastery. | mid 13th century |
Besançon municipal library ms. 8 | France, Besançon? | end of 13th century |
Aarau Aargau Cantonal Library MsWettF 11 | Germany | Third quarter of the 13th century |
Beinecke Library ZZi 56 | Germany, Mainz | ca. 1454 |
Guidelines
Useful Links
Criteria for judging
- Accuracy of correction (40%)
- Reflections on working with machine-transcribed texts and description of errors found in the automatic transcriptions (30%)
- Overall quality of the blog post (30%)
- Additional: description of illuminations/decoration
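As a worked illustration of how the three weighted criteria combine, here is a minimal sketch. It assumes each criterion is marked on a 0-100 scale, which this page does not specify, and it leaves out the additional criterion because no weight is given for it. The names `WEIGHTS` and `weighted_score` are hypothetical.

```python
# Weights taken from the criteria list above; the 0-100 marking scale is assumed.
WEIGHTS = {
    "accuracy": 0.40,     # accuracy of correction
    "reflections": 0.30,  # reflections and description of errors found
    "blog_post": 0.30,    # overall quality of the blog post
}


def weighted_score(marks: dict[str, float]) -> float:
    """Combine per-criterion marks into a single weighted score."""
    missing = WEIGHTS.keys() - marks.keys()
    if missing:
        raise ValueError(f"missing marks for: {sorted(missing)}")
    return sum(weight * marks[name] for name, weight in WEIGHTS.items())


# Hypothetical marks for one team.
team_score = weighted_score({"accuracy": 80, "reflections": 70, "blog_post": 90})
```

For these sample marks the score is 0.40 × 80 + 0.30 × 70 + 0.30 × 90 = 80.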
Communication
- Social media:
- Twitter account: @BibleParis
- Hashtag: #PBPChallenge
- Blog publications
Deliverables and Outcomes
- Ground truth added to the collection.
- Blog posts by research groups.
- Citation of all participants (with their ORCID iDs) on the website, in the HTR United record, on GitHub and on any other relevant platform or publication resulting from the challenge.
Acknowledgments
We would like to acknowledge and thank all the students from the “Digital Humanities and Book History” Master and the “Editions Numériques et Patrimoine de l’Antiquité” Master at the Université de Franche-Comté who participated in the course and challenge:
Robert Lloyd, Gauri Bhagwat, Alice Fournier, Amanda Robin Hemmons, Alexandre Keyes, Lucia Sol Bezzecchi Petroff, Marie Noirot, Anna Chemisova, Benedicta Arthur, Sharon Hassive Guerra Álvarez, Nina Jacobson, Sumeyye Topkara, Sonaj Kailas, Kateri Soulard, Jesus David Macchi Franco, Elia Coulot, Serhat Acar, Úna Faller, and Diego Rodriguez.