Live-blogging the 2009 Vancouver PKP Conference

Importing Backissues into OJS: Development of an OJS Import Script with Django: the Session Blog

Friday, July 10th, 2009 (11:00 AM)
SFU Harbour Centre (Sauder Industries Room 2270)

Presenter: Syd Weidman, Library Systems Supervisor, University of Winnipeg – Session Abstract

Session Overview

Why was this an issue?

With the transition to open-access publishing of several journals at the University of Winnipeg, Syd Weidman and the University library have been involved in multiple aspects of this transformation.  Given that these journals have been in print for decades, one of the major obstacles that needed to be addressed was the importation of back issues into an online, open-access compatible format.

Initial attempts using the available software proved difficult.  They were met with bugs and their associated patches; overall, Syd described the process as “laborious and convoluted”.  He surmised that “in the context of importing [a large volume of] back issues, small efficiencies [may] have a large impact.”  With this notion in mind, Syd began work on the Open Journal Systems (OJS) Import Project.

Tackling the problem – Use of Django

Syd highlighted the basic design goals of any software to be used for this purpose; he stressed that the process needed to be as EASY as possible.  He sought to optimize the software’s ease of CONSTRUCTION, USE, DEPLOYMENT and MAINTENANCE.  Being most familiar and comfortable with the Python programming language, Syd opted to use the Django Web framework to build a Web-based application to carry out the task of importing back issues.

Django is an open-source framework that was initially used by the online publishing industry.  In a short digression, Syd reviewed the “4 freedoms” of open-source software: the freedom to use software for any purpose, free access to its code, the freedom to modify it, and an understanding that improvements will be shared with others (for more, take a look at the Free Software Foundation’s website).  Django, in particular, has several advantages over other similar frameworks, namely:

  • object relational mapping – allows use of fewer lines of programming, increasing robustness
  • automatic administrator interface
  • elegant URL design
  • pluggable template system
  • flexible and robust cache system
  • i18n compatible – allows for the application to be adapted to other languages without significant engineering changes
  • excellent documentation
  • an active mailing list (a double-edged sword!)

Success!

With the new import software in place, the U of W was able to scan back issues to PDF and upload them into their respective online journals.  Each upload required entering appropriate metadata to allow for accurate archiving and searching.
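To make the metadata step concrete, here is a minimal sketch in Python (Syd's language of choice) of a record for one scanned article, checked before upload. The class name, field names and checks are all hypothetical illustrations; they are not taken from Syd's application or from the OJS import format.

```python
from dataclasses import dataclass, field


@dataclass
class BackIssueArticle:
    """Hypothetical metadata for one scanned back-issue article."""
    title: str
    authors: list
    volume: int
    issue: int
    year: int
    pdf_path: str
    keywords: list = field(default_factory=list)

    def problems(self):
        """Return a list of human-readable problems; empty means ready."""
        found = []
        if not self.title.strip():
            found.append("missing title")
        if not self.authors:
            found.append("no authors")
        if not self.pdf_path.lower().endswith(".pdf"):
            found.append("scan is not a PDF")
        return found


# Illustrative values only, not real journal data.
ready = BackIssueArticle("An example article", ["A. Author"], 12, 1, 1985,
                         "scans/v12n1/article01.pdf")
incomplete = BackIssueArticle("", [], 12, 1, 1985, "scans/v12n1/article02.tif")
```

Catching gaps like these before upload is one of the "small efficiencies" that add up over decades of back issues.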

Challenges and future directions

One of the difficulties in developing a script for another piece of software is ensuring that the two remain in sync when new versions appear.  In an OJS release that followed the development of the import application, incompatibilities and bugs appeared and needed patching.

Commentary/Questions

Just prior to the question period, Syd mentioned the recent development of another application, “Quick Submit”, which may now be able to perform similar functions to his program.

Related Links

University of Winnipeg library (and their OA publications: Canadian Bulletin of Medical History, Journal of Mennonite Studies and the Canadian Children’s Literature Journal)
Python programming language
Django framework

References

Weidman, S. (2009). Importing backissues into OJS: Development of an OJS import script with Django. PKP Scholarly Publishing Conference 2009. Retrieved 2009-07-08, from http://pkp.sfu.ca/ocs/pkp/index.php/pkp2009/pkp2009/paper/view/190

July 11, 2009

PKP Open Archives Harvester for the Veterinarian Academic Community: The Session Blog

Date: July 9, 2009

Presenters: Astrid van Wesenbeeck and Martin van Luijt – Utrecht University

PKP 2009

Photo taken at PKP 2009, with permission

Astrid van Wesenbeeck is Publishing Advisor for Igitur, Utrecht University Library
Martin van Luijt is the Head of Innovation and Development, Utrecht University Library

Abstract

Presentation:

Powerpoint presentation used with permission of Martin van Luijt

Quote: “We always want to work with our clients. The contributions from our users are very important to us.”

Session Overview


The University Library is 425 years old this year. While the library staff are not scientists or students, they have a mission to provide services that meet the needs of their clients. Omega, the library's integrated search, brings in all metadata from publishers and open access sources and indexes it.

Features discussed included the institutional repository, digitization and journals [mostly open and digital; about 10,000 digitized archives in total].

Virtual Knowledge Centers [see related link below]

– this is the area of their most recent work
– shifts knowledge sharing from library to centers
– see slides of this presentation for more detail

The Problem They Saw:

We all have open access repositories now. How do you find what you need? There are too many repositories for a researcher to find information.

The Scenario

They chose to address this problem by targeting the needs of a specific group of users. The motivation – a one-stop shop for users and increased visibility for scientists.

The Solution:

Build an open-access subject repository, targeted at veterinarians, containing the content of at least 5 high-profile veterinary institutions and meeting other selected standards.

The work was organized cooperatively, with a project board and a project team consisting of knowledge specialists and other essential people. The user interface was shaped by the users.

Their Findings:

Searching was not sufficient, and the repository content was, to use his word, “Ouch!” Metadata quality varied wildly, relevant material was not discernible, some content was not accessible and the quantity of material in the repositories was low.

Ingredients Needed:

A harvester to fetch content from open archives.
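For readers new to harvesting: open archives typically expose their records over OAI-PMH (see the openarchives.org link below), and a harvester reads the XML they return. Below is a minimal Python sketch of parsing one `ListRecords` response. The sample XML is hand-written for illustration, `parse_list_records` is a hypothetical helper, and none of this is the PKP harvester's or the Utrecht team's actual code.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample response; not real data from any repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2009-07-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Canine hip dysplasia</dc:title>
          <dc:subject>veterinary medicine</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier",
                                       default="", namespaces=NS),
            "datestamp": rec.findtext("oai:header/oai:datestamp",
                                      default="", namespaces=NS),
            "titles": [e.text or "" for e in rec.iterfind(".//dc:title", NS)],
            "subjects": [e.text or "" for e in rec.iterfind(".//dc:subject", NS)],
        })
    # A non-empty resumption token means the archive has more pages to fetch.
    token = root.findtext("oai:ListRecords/oai:resumptionToken",
                          default="", namespaces=NS)
    return records, (token or None)


records, token = parse_list_records(SAMPLE_RESPONSE)
```

A real harvester would keep requesting pages until no resumption token comes back.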

Ingredients Needed 2:

Fetch more content from many more archives, filter it and put it into records and entries through a harvester, then normalize each archive, and put it through a 2000+ keyword filter. This resulted in 700,000+ objects.
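The normalize-then-filter step can be sketched roughly as follows. This is an assumption-laden illustration in Python: the five keywords stand in for the 2000+ term list, and the record layout, `normalize` and `matches_keywords` are hypothetical, not the presenters' tool.

```python
import re
import unicodedata

# Hypothetical stand-in for the 2000+ keyword list described in the talk.
KEYWORDS = {"veterinary", "canine", "equine", "bovine", "livestock"}


def normalize(text):
    """Lower-case, strip accents and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed
                         if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", no_accents).strip().lower()


def matches_keywords(record, keywords=KEYWORDS):
    """True if any keyword appears as a whole word in title or subjects."""
    parts = [record.get("title", "")] + record.get("subjects", [])
    words = set(re.findall(r"[a-z0-9]+", normalize(" ".join(parts))))
    return not keywords.isdisjoint(words)


# Made-up harvested records, for illustration only.
harvested = [
    {"title": "Équine  colic management", "subjects": ["surgery"]},
    {"title": "Medieval trade routes", "subjects": ["history"]},
]
relevant = [r for r in harvested if matches_keywords(r)]
```

Normalizing first matters because, as the findings above note, metadata quality varies wildly between archives.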

Ingredients 3:

Use the harvester, filter its output, develop a search engine and, finally, a user interface.

Problem: The users wanted a search history, which pushed the designers into dreaming up a way of providing one without a login. The designers did not want or need a login, but at first saw no way around one if the history was to be connected to the user. Further discussion revealed that the users did not have a problem with a system where the history did not follow them from computer to computer. This came as a surprise to the designers, but it allowed for a login-free system.
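The talk did not say how the login-free history was built. One common approach, sketched here in Python purely as an assumption, is to hand each browser a random token (for example in a cookie) and key the history on that token. `AnonymousSearchHistory` and `MAX_HISTORY` are hypothetical names.

```python
import secrets
from collections import deque

MAX_HISTORY = 20  # arbitrary cap, chosen for illustration


class AnonymousSearchHistory:
    """Search history keyed by a per-browser token instead of a user login."""

    def __init__(self):
        self._store = {}

    def new_token(self):
        """Random value the site could place in a browser cookie."""
        return secrets.token_urlsafe(16)

    def record(self, token, query):
        """Remember a query for this browser, dropping the oldest if full."""
        history = self._store.setdefault(token, deque(maxlen=MAX_HISTORY))
        history.append(query)

    def recent(self, token):
        """Most recent query first; empty list for an unknown token."""
        return list(reversed(self._store.get(token, ())))


history = AnonymousSearchHistory()
office_pc = history.new_token()
home_pc = history.new_token()
history.record(office_pc, "canine hip dysplasia")
history.record(office_pc, "equine colic")
```

Because the token lives in one browser, the history does not follow the user from computer to computer, which is the trade-off the users said they could accept.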

Results: Much better research. Connected Repositories: Cornell, DOAJ, Glasgow, Igitur, etc.

Workshop Discussion and Questions:

1. How do you design an intelligent filter for searches? [asked by a gentleman also working on a similar search engine] Re-harvesting occurs every night, with the PKP harvester rerunning objects through the filter. Incremental harvests are quick. Full harvests take a long time, a couple of weeks, so they try not to do them.

2. Do you use the PKP harvester and normalization tools in PKP? We started, but found that we needed to do more and produced a tool outside the harvester.

3. <Question not heard> It was the goal to find more partners to build the tool and its features. We failed. In the evaluation phase, we will decide if this is the right moment to roll out this tool. From a technical viewpoint, it is too early. We may need 1 to 2 years to fill the repositories. If you are interested in starting your own, we would be delighted to talk to you.

4. I’m interested in developing a journal. Do all of your repositories use persistent identifiers? How do I know that years down the road I will still find these things? Is anyone interested in developing image repositories? There is a Netherlands initiative to build a repository with persistent identifiers. As for image repositories: no, but there are image platforms.

5. Attendee comment: I’m from the UK. If valuable, we’ll have to fight to protect these systems because of budget cuts and the publishers fighting. So, to keep value, we’ll have to convince government about it.
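A footnote to the first answer above, on why incremental harvests are quick: OAI-PMH lets a harvester ask only for records changed since a given date through the `from` argument. The Python sketch below builds both kinds of request. The base URL and the `list_records_url` helper are hypothetical, and this is not how the PKP harvester is necessarily implemented.

```python
from datetime import date
from urllib.parse import urlencode


def list_records_url(base_url, since=None, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL.

    With `since` set, only records changed on or after that date are
    requested (an incremental harvest); without it, everything is.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if since is not None:
        params["from"] = since.isoformat()
    return base_url + "?" + urlencode(params)


BASE = "https://repository.example.org/oai"  # hypothetical endpoint

full_harvest = list_records_url(BASE)
nightly_harvest = list_records_url(BASE, since=date(2009, 7, 8))
```

The nightly request touches only a day's worth of changes, while the full request walks every record in every archive.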

Related Links:

OAI6 talk on Virtual Knowledge Centers

University Library at Utrecht

Online Journal

Open access interview

http://www.igitur.nl

http://www.darenet.nl

http://www.surf.nl

http://www.openarchives.org

NARCIS, a Dutch repository of theses

First Monday article

Posted by Jim Batchelor

July 9, 2009