Text Analysis Using a Non-English Script

The digital humanities scholar and historian Thomas Mullaney (Stanford University) has argued that there is an “Asia deficit” within Digital Humanities, rooted in the platforms and digital tools that form the foundation of the field. Digital databases and text corpora – the “raw material” of text mining and computational text analysis – are far more abundant for English and other Latin-script languages than for non-Latin orthographies. Text mining, an emerging area in DH, enables researchers to work with textual content at scale, but many of its tools do not transfer readily to languages such as Chinese because of differences in language structure. In Western languages, words are usually delimited by whitespace or punctuation; the absence of such word boundaries in Chinese texts is one of many significant barriers to entry in this area of research.
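To make the segmentation problem concrete, the short sketch below contrasts whitespace tokenization with Chinese word segmentation. It is an illustrative Python example rather than material from the talk; it assumes the open-source jieba segmentation library, and the sample sentence is invented for demonstration.

    # English tokens can often be recovered by splitting on whitespace.
    english = "Digital humanities enables large-scale text analysis."
    print(english.split())
    # ['Digital', 'humanities', 'enables', 'large-scale', 'text', 'analysis.']

    # The equivalent Chinese sentence contains no spaces, so the same
    # approach returns the entire sentence as a single token.
    chinese = "数字人文使大规模文本分析成为可能。"
    print(chinese.split())
    # ['数字人文使大规模文本分析成为可能。']

    # A dedicated segmenter (here, the jieba library) is needed to insert
    # word boundaries before any text mining can begin.
    import jieba
    print(list(jieba.cut(chinese)))
    # e.g. ['数字', '人文', '使', '大规模', '文本', '分析', '成为', '可能', '。']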

Minghui Yu, Programmer Analyst with UBC IT, has been conducting research in text analysis for a number of years, including a TLEF-funded research project called Daxue 2.0, and will examine some tools that reflect the current state of non-English text analysis.


Thursday, November 16th, 2017, 12:00 PM – 2:00 PM.


Registration is online via the registration link.
