XML and Structured Data in the PKP Framework: The Session Blog
Presenter: MJ Suhonos, Session Abstract
July 10, 2009 at 11:00 a.m.
Background
MJ Suhonos is a system developer and librarian with the Public Knowledge Project at Simon Fraser University. He has served as technical editor for a number of Open Access journals, helping them to improve their efficiency and sustainability. More recently, he has led development of PKP’s Lemon8-XML software as part of the project's efforts to decrease the cost and effort of electronic publishing while improving the quality and reach of scholarly communication.
Session Overview
“Lemon8-XML is a web-based application designed to make it easier for non-technical editors and authors to convert scholarly papers from typical word-processor editing formats such as MS-Word .DOC and OpenOffice .ODT, into XML-based publishing layout formats.” (Lemon8-XML).
This was a packed session, with 50+ attendees. It aimed to give a fairly non-technical overview of the Lemon8-XML (L8X) software and its relationship to the PKP software suite and, equally important, to highlight the rich benefits of an XML workflow and the foundation it provides for the future.
The big question is: why use an XML workflow? An XML workflow makes numerous things possible. These include interaction with other web services (direct interaction with indexes and better interaction with online reading tools); automatic layout (generating HTML and/or PDF on the fly); complex citation interaction (forward and reverse linking, which allows the discovery of everyone who cited you anywhere on the web); advanced bibliometrics (not just impact measures); and resource discovery (universal metadata can find related works, and rich document data allows search engines to be much more effective). Finally, the document becomes the metadata: the separation between the article and its metadata is removed, so all information is in one place. This is the goal of L8X: to convert articles into structured XML and thus enable these benefits. It is also future-proofing, as XML makes documents fundamentally open, convertible and preservable. Archiving XML (which is text) is much more flexible than archiving PDF files.
Using XML allows connection to, and communication with, all of these systems and means of display. It is also future-proofing: because XML is just text, it can be converted into future formats.
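To make the "document becomes the metadata" idea concrete, here is a minimal sketch in Python. It is not L8X's implementation or output format. The element names (article, front, body, back, ref) and the functions extract_metadata and render_html are hypothetical, chosen only for illustration. It shows one structured file serving as both the source for an HTML view generated on the fly and the source of its own metadata.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical article markup; the element names are illustrative only
# and are not the schema produced by Lemon8-XML.
ARTICLE_XML = """\
<article>
  <front>
    <title>Open Access and XML Workflows</title>
    <author>A. Example</author>
    <author>B. Sample</author>
  </front>
  <body>
    <p>Structured documents can be rendered on demand.</p>
    <p>The same file also carries its own metadata.</p>
  </body>
  <back>
    <ref id="r1">Doe, J. (2008). An earlier study.</ref>
  </back>
</article>
"""


def extract_metadata(root):
    """Read descriptive metadata straight from the document itself."""
    return {
        "title": root.findtext("front/title"),
        "authors": [a.text for a in root.findall("front/author")],
        "references": [r.text for r in root.findall("back/ref")],
    }


def render_html(root):
    """Generate a simple HTML view from the same XML source."""
    meta = extract_metadata(root)
    parts = [
        "<h1>" + escape(meta["title"]) + "</h1>",
        '<p class="authors">' + escape(", ".join(meta["authors"])) + "</p>",
    ]
    parts += ["<p>" + escape(p.text) + "</p>" for p in root.findall("body/p")]
    return "\n".join(parts)


root = ET.fromstring(ARTICLE_XML)
metadata = extract_metadata(root)
html_view = render_html(root)
```

Because the title, authors and references live inside the article file, there is no separate metadata record to keep in sync. A PDF renderer or an indexing service could read the same source in the same way.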
Where does this fit within the PKP framework? XML is already being used in OJS, for import and export and for exposing metadata to OAI harvesters. The next goal is to apply these benefits to all kinds of scholarly work, e.g. journal articles, proceedings, theses, and books/monographs. Moving L8X into the PKP web application library (WAL) will make all of these features available to the whole PKP framework. Those are the near-term plans for L8X. In the long term, beyond the next few years, the goal is to work on the concept that the document is the metadata, by building support for multiple XML formats in the WAL and by merging annotation, reading tools and comments directly into the article.
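As an illustration of what exposing metadata to OAI harvesters can look like, the sketch below builds a small Dublin Core description of the kind carried in the oai_dc metadata format. It is an assumption-laden example, not the OJS or WAL code. The function name to_oai_dc and the sample values are hypothetical, and a real OAI-PMH response wraps such a description in additional protocol elements that are omitted here.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the oai_dc metadata format and Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)


def to_oai_dc(title, creators, doc_type):
    """Hypothetical helper: serialize basic metadata as an oai_dc element."""
    record = ET.Element("{%s}dc" % OAI_DC_NS)
    ET.SubElement(record, "{%s}title" % DC_NS).text = title
    for name in creators:
        ET.SubElement(record, "{%s}creator" % DC_NS).text = name
    ET.SubElement(record, "{%s}type" % DC_NS).text = doc_type
    return ET.tostring(record, encoding="unicode")


# The same helper serves different kinds of scholarly work.
article_xml = to_oai_dc("A Sample Article", ["A. Example", "B. Sample"], "article")
thesis_xml = to_oai_dc("A Sample Thesis", ["C. Student"], "thesis")
```

The point of the sketch is that once metadata is structured, one routine can describe an article, a thesis or a monograph in a form that harvesters already understand.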
Some find the distributed resource-linking diagram at the end of the presentation complex. Essentially, its point is that structured metadata is needed to let the applications in the publication sphere all talk to each other.
Session Questions
Question: How automatic is the conversion into XML for non-technical people? When can I just upload my doc and have it magically turn into XML?
Answer: Probably not ever, but it is semi-automatic already. Some tools, like L8X, automate part of this process. Some things can be automated, but some will always require human effort.
Question: Will I be able to use L8X in my applications after this is integrated into the PKP framework?
Answer: We would like to make L8X available for use on its own, without requiring the framework, after it becomes part of the framework. We are considering this for the future.
References and Related Links
Lemon8-XML demo server (login: lemon8 password: xmldoc)