Paris Bible Correct-a-thon at the Université de Franche-Comté (January 2023)

  1. About the Paris Bible Project
    1. The Besançon correct-a-thon
    2. Learning Objectives
  2. Schedule
    1. Preparatory work
    2. Wednesday, January 11, 2023 (1700-1830 France Time)
    3. Monday, January 16, 2023 (1500-1800 France Time)
    4. Tuesday, January 17, 2023: Correct-a-thon (1400-1700 France Time)
    5. Wednesday, January 18, 2023 (1400-1600 France Time)
    6. Thursday, January 19, 2023 (1400-1600 France Time)
    7. Friday, January 20, 2023 (1400-1700 France Time)
    8. Early March: Publishing the GT
  3. Organization
    1. Teams
    2. Material
    3. Guidelines
    4. Useful Links
    5. Criteria for judging
    6. Communication
  4. Deliverables and Outcomes
  5. Acknowledgments

About the Paris Bible Project

The Paris Bible Project is a transnational digital humanities project that aims to understand the production and diffusion of Latin Bibles in medieval Europe. Using emerging and established techniques in digital research in the humanities, the project challenges the received opinion that Paris Bibles are uniform, even though the model was created with a degree of standardization and spread across Europe in the 13th century. We use handwritten text recognition (HTR) to create diplomatic transcriptions from digitized copies of manuscripts, and we study the spelling and abbreviation of words in the transcriptions as distinctive features that tell us more about each manuscript, for example its scribal habits, localization or dating.

While our primary goal is an academic understanding of the diffusion of Paris Bibles, this research project also opens multiple doors to partnerships and inclusive forms of engagement through the study of manuscripts, whether found in a library, archive or museum, from different perspectives. Doing digital research implies a different approach to the object, which still represents the core of the study but is accessed through its digital copy. This methodology requires the use of adapted tools and extends the array of possible questions.

Participatory methods such as crowd transcription have caught on in recent years and have been bolstered by the creation of different interfaces with which groups can co-create content. Our event takes a slightly different approach. Instead of engaging the public with transcription itself, we are focusing on crowd correction of transcriptions that have been automatically created by AI. The correct-a-thon is a citizen science initiative, providing an opportunity for anyone interested in manuscripts to learn about the intersection of artificial intelligence and culture, as well as to contribute to an ongoing scientific research project about medieval Bibles.

The Besançon correct-a-thon

We propose a “correct-a-thon” at the Université de Franche-Comté in January 2023, spanning several days, in collaboration with the Digital Humanities and Book History Master's program and with the participation of colleagues from universities and public institutions. The main objectives of the correct-a-thon are:

  • Assessing the quality of computer-created transcriptions and adjusting them using human intelligence.
  • Raising awareness about the rapidly evolving domain of AI and the humanities.
  • Contributing to the quantity and quality of ground truth for the Paris Bible Project.
  • Expanding the project’s abbreviation transcription guidelines.
  • Discussing the biases of computer vision in paleography.
  • Supporting the creation of new scribal profiles to help locate and date manuscripts belonging to this tradition.
  • Bringing together students, researchers and library professionals, connecting languages and international communities.
  • Providing an opportunity for experiential learning.
  • Modeling an initial public interaction with the PBP research project and setting a baseline for future encounters.
  • Disseminating the correct-a-thon results in posts on the project blog.

Learning Objectives

  • Learning how to integrate digital tools into your research.
  • Demystifying artificial intelligence tools in research.
  • Learning specific digital skills (using a text editor, training and correcting HTR models, versioning system).
  • Participating in the enrichment and creation of a dataset.

Schedule

Preparatory work

Wednesday, January 11, 2023 (1700-1830 France Time)

Talk: David J. Wrisley and Estelle Guéville, “Working with AI-created Transcriptions of Manuscripts: Introducing the Paris Bible Project”

  • Presentation of the PBP project.
  • Presentation of the methodology and some high-level results.
  • Presentation of the challenge:
    • Material
    • Transcription guidelines
    • Organization
  • Instructions for next week

Monday, January 16, 2023 (1500-1800 France Time)

  • Orientation session with Transkribus
    • 1505-1520 - Learning about handwritten text recognition
    • 1520-1530 - Creating the teams, collecting team information, adding to collections
    • 1530-1540 - Introducing each team's manuscript
    • 1540-1630 - Learning the Transkribus Lite interfaces
  • 1630-1640 - Break
  • 1640-1725 - Gentle Introduction to Palaeography and Codicology - Estelle Guéville
    • Learning to describe a medieval manuscript: introduction to features of codicology and paleography in Paris Bibles
    • Taking notes and process-oriented thinking
      • Describing the most common abbreviations used, punctuation style, etc
      • Describing the errors found in the transcription and proposing other ways of transcribing the letter forms or abbreviations found
  • 1725-1750 Christofer Meinecke (Leipzig U) - Annotation interface for Paris Bible images from Mandragore / Initiales
  • 1750-1800 Wrap up / questions

Tuesday, January 17, 2023: Correct-a-thon (1400-1700 France Time)

  • Correcting one manuscript per team (30 mins)
  • Question and answer session (30 mins)
  • Guided correction (1 3/4 hours)
  • Distraction activity: classifying images from Paris Bibles with basic tags
  • Debriefing

Wednesday, January 18, 2023 (1400-1600 France Time)

  • Question and answer session (15 min)
  • Additional guided correction (45 min)
  • Checking in on images - Christofer Meinecke (15 min)
  • Blogging with markdown - David Wrisley (45 min)
    • Research blogging: documenting the process
    • Principles of markdown
    • Learning to publish a blog post on a static website (using GitHub)
  • Visit to the BM, Besançon to see manuscripts

Thursday, January 19, 2023 (1400-1600 France Time)

  • General discussion about modes of transcription and abbreviations found.
  • Looking at MUFI
  • Presenting the results of the model retrained with PyLaia across manuscripts, using a diff checker
  • General discussion of the model's output: progress? Overfitting?
  • Working with tools:
  • Taking notes & drafting blog posts (1 per team)
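
As a rough illustration of the comparison discussed in this session, the sketch below measures how far an automatic transcription is from its corrected version using character error rate (CER), and lists the spans that differ, much as a diff checker would. It is a minimal Python sketch using only the standard library. The function names and the two sample lines are invented for illustration; they are not taken from the project's data, and nothing here calls Transkribus or PyLaia.

```python
import difflib


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def character_error_rate(hypothesis: str, reference: str) -> float:
    """Edit distance divided by the length of the corrected (reference) text."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(hypothesis, reference) / len(reference)


def changed_spans(hypothesis: str, reference: str) -> list[tuple[str, str, str]]:
    """Return (operation, text in hypothesis, text in reference) for each difference."""
    matcher = difflib.SequenceMatcher(None, hypothesis, reference, autojunk=False)
    return [(tag, hypothesis[i1:i2], reference[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


# Hypothetical sample lines, invented for this example.
ground_truth = "In principio creauit deus celum et terram"
htr_output = "In principio creauit deus celum et terrain"

print(f"CER: {character_error_rate(htr_output, ground_truth):.3f}")
for operation, found, expected in changed_spans(htr_output, ground_truth):
    print(operation, repr(found), "->", repr(expected))
```

Computing the same figure for a model's output before and after retraining, on pages the model has not seen, is one simple way to frame the "progress or overfitting" question.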

Friday, January 20, 2023 (1400-1700 France Time)

  • What we can do with the data produced
    • Presenting the next steps of the project, PCA with the manuscripts worked on and comparison with other manuscripts.
    • Proposing a description of a scribal profile.
  • Writing session
    • Co-authoring a short description of the manuscript (including abbreviations, characters, illuminations, decorative features) to be published as a blog post on the PBP website.
  • Publication
    • Publish one of the blog posts in real-time
    • Publish corrected ground truth at PBP and HTR-United with names and ORCID
  • Images: what have we learned from the annotation of images?

Early March: Publishing the GT

Organization

Teams

  • Number of teams: 7
  • Number of people per team: 3

Material

  • One manuscript per team.
  • Same section(s) of the Bible for all.
  • Platform: Transkribus

| Inventory | Country of origin | Date |
|---|---|---|
| Beinecke Library ms. 1100 | Italy | 13th century |
| Beinecke Library ms. 387 | England or France | ca. 1325 |
| Stanford Libraries ms. 23 | Paris | mid 1250s |
| Besançon municipal library ms. 4 | France. By or for a Franciscan monastery. | mid 13th century |
| Besançon municipal library ms. 8 | France, Besançon? | end of 13th century |
| Aarau Aargau Cantonal Library MsWettF 11 | Germany | Third quarter of the 13th century |
| Beinecke Library ZZi 56 | Mainz and Germany Mainz | ca. 1454 |

Guidelines

Criteria for judging

  • Accuracy of correction (40%)
  • Reflections on working with machine-transcribed texts and description of errors found in the automatic transcriptions (30%)
  • Overall quality of the blog post (30%)
  • Additional: description of illuminations/decoration
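
The weighting above can be read as a simple weighted sum. The sketch below assumes each weighted criterion is marked on a 0-100 scale, which the criteria do not state. The additional criterion on illuminations and decoration has no stated weight, so it is left out. The criterion keys and the sample marks are hypothetical.

```python
# Weights in percent, taken from the criteria listed above.
WEIGHTS = {"accuracy": 40, "reflections": 30, "blog_post": 30}


def overall_score(marks: dict[str, float]) -> float:
    """Weighted average of per-criterion marks (assumed to be on a 0-100 scale)."""
    missing = WEIGHTS.keys() - marks.keys()
    if missing:
        raise ValueError(f"missing marks for: {sorted(missing)}")
    return sum(WEIGHTS[name] * marks[name] for name in WEIGHTS) / 100


# Hypothetical marks for one team.
team_marks = {"accuracy": 80, "reflections": 70, "blog_post": 90}
print(overall_score(team_marks))  # (40*80 + 30*70 + 30*90) / 100 = 80.0
```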

Communication

Deliverables and Outcomes

  • Add ground truth to the collection.
  • Blog posts by research groups.
  • Citation of all participants (with their ORCID) on the website, in the HTR-United record, on GitHub and on any other relevant platform or publication resulting from the challenge.

Acknowledgments

We would like to acknowledge and thank all the students from the “Digital Humanities and Book History” and “Editions Numériques et Patrimoine de l’Antiquité” Master's programs at the Université de Franche-Comté who participated in the course and challenge:

Robert Lloyd, Gauri Bhagwat, Alice Fournier, Amanda Robin Hemmons, Alexandre Keyes, Lucia Sol Bezzecchi Petroff, Marie Noirot, Anna Chemisova, Benedicta Arthur, Sharon Hassive Guerra Álvarez, Nina Jacobson, Sumeyye Topkara, Sonaj Kailas, Kateri Soulard, Jesus David Macchi Franco, Elia Coulot, Serhat Acar, Úna Faller, and Diego Rodriguez.