Wikisource:Esplanada/Update on the OCR Improvements


Hello! Sorry for writing in English. Please help translate this message into your language.

The OCR Improvements project is complete. We, the Community Tech team, are grateful for your feedback, from the very beginning through the final stage, when we were finalizing the interface.

Engine improvements

Figure: OCR menu in toolbar
Reliability

Before this work, the OCR tools were separate gadgets. We have added Wikimedia OCR, which is available under a single icon in the toolbar on all Wikisource wikis. It supports two OCR engines, Tesseract and Google OCR. We expect it to be more stable than the separate gadgets, and we will maintain Wikimedia OCR going forward.
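To illustrate the idea of one entry point handing work to either engine, here is a minimal Python sketch of building a request for a single page image. It assumes a web endpoint at `https://ocr.wmcloud.org/api.php` taking `engine`, `langs[]` and `image` query parameters; the URL, the parameter names and the `build_ocr_url` helper are assumptions for illustration, so check the Wikimedia OCR documentation before relying on them.

```python
from typing import List, Optional
from urllib.parse import urlencode

# ASSUMPTION: endpoint URL and parameter names are illustrative only;
# verify them against the Wikimedia OCR documentation.
BASE_URL = "https://ocr.wmcloud.org/api.php"
ENGINES = ("tesseract", "google")  # the two engines named in the announcement


def build_ocr_url(image_url: str, engine: str = "tesseract",
                  langs: Optional[List[str]] = None) -> str:
    """Return a (hypothetical) request URL routing one page image to an engine."""
    if engine not in ENGINES:
        raise ValueError(f"unknown engine {engine!r}; expected one of {ENGINES}")
    params = [("engine", engine)]
    # One repeated langs[] key per language, most prevalent first.
    params.extend(("langs[]", code) for code in (langs or []))
    params.append(("image", image_url))
    # urlencode percent-encodes the brackets and the image URL.
    return BASE_URL + "?" + urlencode(params)
```

Keeping the engine choice as a single validated parameter mirrors the "one icon, several engines" design: the caller picks an engine, and everything else about the request stays the same.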

The gadgets will remain available. Each community will decide whether and when to enable or disable them.

Speed

Before this work, transcription could take upwards of 40 seconds. With our improvements, transcription now takes less than 4 seconds on average.

Advanced Tools improvements

Multiple-language support

You can now specify several languages when transcribing a document that contains more than one language:

  1. Open Advanced options (Opções avançadas).
  2. Select the Languages (optional) field.
  3. Search for and enter the languages in order of prevalence in the document.
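Step 3 asks for the languages in order of prevalence. A minimal Python sketch of that ordering, assuming you have a rough estimate of how much of the document is in each language; `order_by_prevalence` and `tesseract_lang_arg` are hypothetical helpers, not part of Wikimedia OCR. The `+`-joined string only mirrors the format of Tesseract's own `-l` command-line option (for example `eng+fra`).

```python
from typing import Dict, List


def order_by_prevalence(shares: Dict[str, float]) -> List[str]:
    """Return language codes sorted from most to least prevalent.

    `shares` is a hypothetical estimate of how much of the document is in
    each language; only the relative order of the numbers matters.
    Ties keep their original order because sorted() is stable.
    """
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    return [code for code, _ in ranked]


def tesseract_lang_arg(shares: Dict[str, float]) -> str:
    """Join the ordered codes in the plus-separated style of Tesseract's -l."""
    return "+".join(order_by_prevalence(shares))
```

For a mostly Portuguese book with some English and Latin passages, estimates such as `{"eng": 0.2, "por": 0.7, "lat": 0.1}` would be entered as Portuguese, English, Latin.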
Figure: Crop tool in Advanced tools
Cropping tool / Multi-column support

We have included a Cropper tool, which lets you select the regions of a page to transcribe. This helps on pages with complicated layouts, such as multi-column pages.
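For a regular multi-column page, the regions you would select with the Cropper can be approximated by splitting the page into equal-width vertical strips. A minimal Python sketch; `Box` and `column_boxes` are hypothetical (the Cropper itself has you draw regions by hand), and the (x, y, width, height) pixel convention is an assumption.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Hypothetical crop region in pixels: top-left corner plus size."""
    x: int
    y: int
    width: int
    height: int


def column_boxes(page_width: int, page_height: int, columns: int) -> List[Box]:
    """Split a page into equal-width vertical strips, ordered left to right."""
    if columns < 1:
        raise ValueError("columns must be at least 1")
    boxes = []
    for i in range(columns):
        # Integer edges so the strips tile the page exactly, with no gaps.
        left = i * page_width // columns
        right = (i + 1) * page_width // columns
        boxes.append(Box(left, 0, right - left, page_height))
    return boxes
```

Transcribing each box separately and joining the results left to right can help keep the text in column order rather than mixing lines from neighbouring columns.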

Discoverability and accessibility of OCR

We have added an onboarding interface for new users: pulsating blue dots appear over the new icon in the toolbar. The interface explains what OCR is and what transcription means on Wikisource.

We hope these changes help you do even more great work. We also hope to see you at the 2022 Community Wishlist Survey. Thank you again for all your opinions and support.

Please share your opinions on the project talk page!

NRodriguez (WMF) and SGrabarczuk (WMF) 01:57, 19 August 2021 (UTC)