The UBC Digitization Centre has created more than 50 collections, all available through the Open Collections website. Our collections are diverse in format, content and language.

Materials that are not in English, or that are not written in a Latin-based alphabet, can be harder to access and search. Technology can help us reduce these barriers.

Laura Ferris and Rebecca Dickson, from the Digitization Centre, have worked out a process to generate searchable transcripts for non-Latin text. The idea originated from an article about a workshop on Optical Character Recognition (OCR) for Bangla. One conclusion of the workshop was that Google Drive was the most accurate tool for generating transcripts of non-Latin text.

With that information in hand, Ferris and Dickson started to explore Google Drive to create an automated workflow for transcribing batches of items.

Are you interested in trying the workflow out for yourself? If so, follow the instructions below, which Rebecca prepared, and give it a try!

  1. Open Google Drive, create a “New folder” and give it a name
  2. Create a Google Sheet inside the folder
  3. Open the Sheet, click on “Share”, then “Get shareable link”, and look for the sheet identifier (the numbers and letters between /d/ and /edit? in the link)
  4. In the Sheet, under the “Tools” menu, click “Script editor”
  5. Paste the content from “gs” into the script editor
  6. Update “folderName” with the name of your folder (defined in step 1)
  7. Update “sheetId” with the identifier that you found in step 3
  8. Click the “clock” icon to set up a trigger, and select the options: “extractTextOnOpen”, “From spreadsheet” and “On open”
  9. Save the script and close the script editor
  10. Upload JPEG files to the folder (you can check out the sample items prepared for this work)
  11. Open the spreadsheet and wait for Google to do the work!
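
Step 3 is the easiest one to get wrong by hand, so here is a minimal Python sketch of how the sheet identifier could be pulled out of a shared link. It is not part of the workflow script itself: the function name `extract_sheet_id` and the example link are hypothetical, and the sketch assumes the identifier is made up of letters, digits, hyphens and underscores.

```python
import re

# Hypothetical helper, not part of the original workflow script.
# Assumption: the identifier consists of letters, digits, "-" and "_",
# and sits directly after "/d/" in the shared link.
_SHEET_ID_PATTERN = re.compile(r"/d/([A-Za-z0-9_-]+)")


def extract_sheet_id(share_url: str) -> str:
    """Return the part of a Google Sheets link that follows /d/."""
    match = _SHEET_ID_PATTERN.search(share_url)
    if match is None:
        raise ValueError(f"No sheet identifier found in: {share_url!r}")
    return match.group(1)


if __name__ == "__main__":
    # Made-up link for illustration only; use your own shared link instead.
    example_url = "https://docs.google.com/spreadsheets/d/1AbC_dEf-123/edit?usp=sharing"
    print(extract_sheet_id(example_url))
```

The value it prints is what you would paste into “sheetId” in step 7.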

 

If you want to see Laura and Rebecca’s presentation on the topic, check out their slides. If you have questions, feel free to contact us.

Sources:

A workshop on Optical Character Recognition for Bangla (British Library)

OCR for non-English language text (Pixelating)

Pixelating-ocr (GitHub)

LAW LIBRARY level 3: K120 .M83 2017
Ronda Muir, Beyond Smart: Lawyering with Emotional Intelligence (Chicago: American Bar Association, Section of Dispute Resolution, 2017).

LAW LIBRARY level 3: K1401.5 .I58 2018
Christoph Antons & William Logan, eds., Intellectual Property, Cultural Property and Intangible Cultural Heritage (Abingdon: Routledge, 2018).
Online access: http://resolve.library.ubc.ca/cgi-bin/catsearch?bid=9116849

LAW LIBRARY level 3: K1485 .C67 2018
Dennis Campbell, ed., Copyright Infringement (Alphen aan den Rijn: Kluwer Law International B.V., 2018).

LAW LIBRARY level 3: K5191.W65 L44 2018
Rashida Manjoo & Jackie Jones, eds., The Legal Protection of Women from Violence: Normative Gaps in International Law (Abingdon: Routledge, an imprint of the Taylor & Francis Group, 2018).

LAW LIBRARY level 3: KD2960.M55 J66 2018
Michael A. Jones, Medical Negligence, 5th ed. (London: Sweet & Maxwell, 2018).

LAW LIBRARY level 3: KE3109 .D66 2017
David J. Doorey, The Law of Work (Toronto: Emond, 2017).

LAW LIBRARY level 3: KE8973 .D38 2018
Grace Hession David & Jonathan Shime, Prosecuting and Defending Fraud Cases: A Practitioner’s Handbook (Toronto: Emond Montgomery Publications Limited, 2018).

LAW LIBRARY level 3: KF306 .R677 2018
Ronald D. Rotunda, Legal Ethics in a Nutshell, 5th ed. (St. Paul: West Academic Publishing, 2018).

LAW LIBRARY level 3: KF3197 .S26 2018
Sharon K. Sandeen & Elizabeth A. Rowe, Trade Secret Law Including The Defend Trade Secrets Act of 2016 in a Nutshell, 2d ed. (St. Paul: West Academic Publishing, 2018).

LAW LIBRARY level 3: KF3775.Z9 H63 2018
James G. Hodge, Jr., Public Health Law in a Nutshell, 3d ed. (St. Paul: West Academic Publishing, 2018).

LAW LIBRARY level 3: KF3945 .S685 2018
John G. Sprankling & Rachael E. Salcido, The Law of Hazardous Wastes and Toxic Substances in a Nutshell, 3d ed. (St. Paul: West Academic Publishing, 2018).

LAW LIBRARY level 3: KF4770 .B37 2018
Jerome A. Barron & C. Thomas Dienes, First Amendment Law in a Nutshell, 5th ed. (St. Paul: West Academic Publishing, 2018).

LAW LIBRARY level 3: KF8779 .M35 2017
Raymond J. McKoski, Judges in Street Clothes: Acting Ethically Off-the-Bench (Madison: Fairleigh Dickinson University Press, 2017).

LAW LIBRARY level 3: KF8841 .K36 2018
Mary Kay Kane, Civil Procedure in a Nutshell, 8th ed. (St. Paul: West Academic Publishing, 2018).

LAW LIBRARY level 3: KF8935.Z9 G73 2018
Michael H. Graham, Federal Rules of Evidence in a Nutshell, 10th ed. (St. Paul: West Academic Publishing, 2018).

LAW LIBRARY level 3: KUQ716.7 .J65 2016
Carwyn Jones, New Treaty, New Tradition: Reconciling New Zealand and Māori Law (Wellington: Victoria University Press, 2016).

LAW LIBRARY reference room (level 2): KUQ2568 .J663 2016
Carwyn Jones, New Treaty, New Tradition: Reconciling New Zealand and Māori Law (Vancouver: UBC Press, 2016).

LAW LIBRARY level 3: UG1242.D7 S69 2016
Sara M. Smyth, Drone Controversies: Ethical and Legal Debates Surrounding Targeted Strikes and Electronic Surveillance (Toronto: Thomson Reuters Canada, 2016).

The exhibition, “And there’s the humor of it”: Shakespeare and the four humors, will run from June 4 to July 14, 2018, at Rare Books and Special Collections on Level 1 of the Irving K. Barber Learning Centre and in the Memorial Room at Woodward Library. The theory of the four humors was initially borrowed from Ancient […]
