The following is meant to be an informal discussion of some of the issues involved in integrating access to Electronic Resources.
Definition: integrating access means simplifying user access to online material.
Definition: Electronic Resources means all of our online material, including the catalogue, indexes, databases and full text, as well as title lists that link to these.
In (some people’s view of) an ideal world, library resources would be searchable from a single search box. Fielded searches would also be available, as would mechanisms for limiting by resource type, date, etc. Results would be returned marked in some way by their source and type and ranked by some well-understood and perhaps settable relevance. In other words, users wouldn’t have to worry about whether the material they wanted was online, locally available in print or available through ILL, except for what this implies about when it would be available.
In the real world of resources widely diverse in type, organization and searchability, providing this level of service has proven to be a challenge. It is worth noting as well that the notion of searching across disparate information items such as bibliographic records, citation information and full text, contained in such diverse categories of information as physical holdings, local digital collections, article indexes, ejournals or ebooks, makes little sense to many people unless there are ways of limiting searches or distinguishing search results in some mostly unambiguous way. It is also worth noting that direct search access to individual resources would likely still be required, as it is very difficult to match their sophisticated level of searching.
General Methods for Integrating Access
There are at least three general approaches to providing integrated access. At this point, at least given the hardware and software at our disposal, none comes very close to providing a solution. What follows are (very) brief descriptions of what they do, what they do well and what they don’t do well.
1. Federated Searching (aka metasearch, MultiSearch, etc.)
This approach assumes (correctly) that online materials of interest are spread over many servers, most of which we have no control over. Federated searching software provides ways of “connecting” to these servers using various mechanisms (Z39.50, XML gateways, HTML “screen scraping”). Search terms are sent to each server, and the result sets that come back are processed and finally displayed.
Good points: multiple database searching can actually be performed.
Bad points: HTML “screen scraping” is very unreliable. Result sets vary depending on the source and are not easily merged and de-duplicated. There is no easy mechanism for ranking results. Remote searches are relatively slow in theory (and very slow in practice, where practice in our case is ERA). A number of our indexes and databases cannot be included for ERA searching due to subscription limits and/or use of thesauri. ERA is very difficult to extend to resources it does not already support, so much of what we subscribe to is unavailable. Finally, ERA is extremely resource intensive, so in practice it will not support many concurrent users, and using it to provide access to our heavily used materials is not recommended.
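To make the merge and de-duplication problem concrete, here is a minimal sketch of the federated pattern: send one term to several sources, collect the result sets, and fold duplicates together. Everything in it is an assumption for illustration. The two in-memory lists stand in for remote targets, the function names are invented, and the match key (normalized title plus year) is far cruder than anything a real product such as ERA would need.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote targets. A real connector would speak
# Z39.50 or call an XML gateway instead of filtering an in-memory list.
CATALOGUE = [
    {"title": "Digital Libraries", "year": 2003},
    {"title": "Metadata Basics", "year": 2001},
]
ARTICLE_INDEX = [
    {"title": "digital libraries ", "year": 2003},
    {"title": "Searching the Deep Web", "year": 2004},
]


def make_source(name, records):
    """Return a search function that tags each hit with its source name."""
    def search(term):
        term = term.lower()
        return [dict(r, source=name) for r in records
                if term in r["title"].lower()]
    return search


def dedup_key(record):
    """Crude match key (an assumption): normalized title plus year."""
    return (" ".join(record["title"].lower().split()), record["year"])


def federated_search(term, sources):
    """Send the term to every source in parallel, then merge and de-duplicate."""
    with ThreadPoolExecutor() as pool:
        result_sets = list(pool.map(lambda source: source(term), sources))
    merged = {}
    for result_set in result_sets:
        for record in result_set:
            key = dedup_key(record)
            if key in merged:
                # Same item seen before: just remember the extra source.
                merged[key]["sources"].append(record["source"])
            else:
                merged[key] = {
                    "title": record["title"].strip(),
                    "year": record["year"],
                    "sources": [record["source"]],
                }
    return list(merged.values())


sources = [
    make_source("catalogue", CATALOGUE),
    make_source("article index", ARTICLE_INDEX),
]
hits = federated_search("digital", sources)
for hit in hits:
    print(hit["title"], hit["year"], hit["sources"])
```

Even in this toy form, the quality of the merged list depends entirely on the match key, which is one reason result sets from real sources with inconsistent records are hard to de-duplicate.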
2. Spidering and Indexing
This approach also assumes (correctly) that online materials of interest are spread over many servers, most of which we have no control over. It assumes, correctly, that all of our resources are accessible through HTTP. It assumes, incorrectly, both that there are mechanisms for linking to all “records” of an electronic resource and that we are allowed to spider our subscription resources. We aren’t, in many cases.
This is the Google approach, which requires spidering software to follow links, indexing software to index the pages found and search software to access the resulting index(es).
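The three pieces named above (a spider that follows links, an indexer, and a search function over the index) can be sketched in a few lines. This is only an illustration under stated assumptions: the "site" is a hypothetical in-memory dictionary rather than pages fetched over HTTP, the names are invented, and the search is a plain AND match with no relevance ranking. It does not describe how dtSearch or Google work internally.

```python
import re
from collections import defaultdict, deque

# Hypothetical site: URL -> (page text, outgoing links). A real spider would
# fetch each page over HTTP and parse the links out of the HTML.
PAGES = {
    "/": ("Library digital collections home", ["/maps", "/photos"]),
    "/maps": ("Historical maps of the region", ["/", "/photos"]),
    "/photos": ("Historical photos and postcards", ["/maps"]),
    "/orphan": ("Page that nothing links to", []),
}


def spider(start):
    """Breadth-first walk of the link graph, yielding each reachable page once."""
    seen, queue = {start}, deque([start])
    while queue:
        url = queue.popleft()
        text, links = PAGES[url]
        yield url, text
        for link in links:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)


def build_index(start):
    """Build an inverted index: word -> set of URLs containing that word."""
    index = defaultdict(set)
    for url, text in spider(start):
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (AND search)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


index = build_index("/")
print(search(index, "historical"))
print(search(index, "historical maps"))
print(search(index, "nothing"))
```

The last query returns nothing because the spider never reaches a page that no other page links to, which is the same limitation noted above for resources whose “records” cannot all be linked to.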
Good points: spidering is a much more reliable way than screen scraping of accessing the records of a resource that are HTTP accessible. The resulting index(es) are local, so searching is much more efficient and can be more sophisticated. Relevance ranking is a feature of many spider/indexers. A good example is dtSearch, which we use to spider/index some local collections, as well as our staff web site (coming soon).
Bad points: we are not allowed to spider any of the resources we subscribe to. (That said, however, Google spiders the top level of many of the resources we buy, while Google Scholar spiders all of a few of the resources we buy.) While sophisticated searching can be done of the indexes produced, there is typically no easy way to do fielded searches or pull out resources by type. It’s not clear that we want to spider/index our catalogue.
An alternative approach involves Open Archives Initiative (OAI) harvesting, for those resources that “expose their metadata”. While searches can be done on the metadata (and are only as good as the metadata provided), many resources don’t “expose their metadata”. Local software supporting OAI-PMH includes Encompass, ContentDM and DSPACE. We don’t currently have software to harvest metadata and make it searchable.
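For readers unfamiliar with OAI-PMH, a harvest is an HTTP request carrying a verb such as ListRecords and a metadataPrefix such as oai_dc, answered with an XML document. The sketch below builds such a request URL and pulls identifiers and titles out of a response. The repository address and the sample response are hypothetical, nothing is fetched over the network, and the sketch ignores resumption tokens and error handling that a real harvester would need.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses carrying Dublin Core records.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical response, trimmed to the elements this sketch reads.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Historical Maps Collection</dc:title>
          <dc:creator>Example Library</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for a repository's OAI-PMH base URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract the identifier and Dublin Core titles from each harvested record."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for record in root.findall("./oai:ListRecords/oai:record", NS):
        identifier = record.findtext("./oai:header/oai:identifier", namespaces=NS)
        titles = [title.text for title in record.findall(".//dc:title", NS)]
        records.append({"identifier": identifier, "titles": titles})
    return records


# The base URL is a placeholder, not a real repository.
url = list_records_url("https://repository.example.org/oai")
records = parse_records(SAMPLE_RESPONSE)
print(url)
print(records)
```

Harvested records like these would still need to be loaded into a local index before they could be searched.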
3. Localizing everything
This might be known as the University of Toronto approach. In conjunction with the OCUL consortium, UofT has purchased large amounts of full text (approximately seven million journal articles and 50 article indexes) and the software and hardware to search it. They would like to “patriate” their other ejournals as well, although it is not clear how successful they will be. Even if successful, UofT users will not be able to simultaneously search all article indexes (unless they all become available through CSA), the catalogue or other digital collections unless the data is stored with the ejournal full text. While the costs are not known, it seems likely this makes sense only for consortiums like OCUL.
What’s Practical?
Since all of these approaches have significant shortcomings in meeting the single search box goal, what is the best approach to take? The software tools we have at our disposal include Voyager, Encompass, SFX, ColdFusion, ContentDM, dtSearch and a few as yet unexplored products like DSPACE and Meridian. Ignoring the latter for the time being, is there any way to use this tool set better?
While we can’t provide much in the way of multi-resource, multi-server searches, we can provide tools to at least generate the appropriate queries (via constructed URLs) for various resource types. We already do this for journals/ejournals, by creating a journal title (or ISSN) search of the catalogue at the same time we search our ejournal database. We should investigate extending this to other resource types (e.g. local digital collections). As well, we should:
• Set up a main Electronic Resources page to provide better access to our various resource types.
• Investigate direct (read-only) access of our catalogue and SFX server using ColdFusion. This might make it possible to display records from both, along with records from our ejournal and article indexes databases.
• Continue to monitor ERA for improvements both in performance and addition of reliable “connectors”, as this provides our only federated searching capability.
• Investigate OAI-PMH to harvest local metadata, and require that all data repositories we make available be OAI-PMH compliant.
• Investigate spidering all local digital collections, to provide “federated searching” through a tool like dtSearch.
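
The constructed-URL idea mentioned before this list can be sketched as follows: one search term is turned into a ready-made search link for each resource type. The endpoints and parameter names are hypothetical placeholders. The real ones would have to come from the search syntax of the catalogue, the ejournal database and each digital collection.

```python
from urllib.parse import urlencode

# Hypothetical search endpoints and query parameter names, one per resource
# type. None of these addresses is a real service.
TARGETS = {
    "catalogue": ("https://catalogue.example.edu/search", "title"),
    "ejournals": ("https://ejournals.example.edu/find", "q"),
    "digital collections": ("https://collections.example.edu/search", "query"),
}


def build_search_links(term, targets=TARGETS):
    """Return one constructed search URL per resource type for the same term."""
    return {
        name: base + "?" + urlencode({param: term})
        for name, (base, param) in targets.items()
    }


links = build_search_links("Nature & Science")
for name, link in links.items():
    print(name, link)
```

Because urlencode escapes the term, titles containing spaces or ampersands still produce valid links. The user still runs each search separately, but no longer has to re-enter the term for each resource.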

What might a single main electronic resource page look like?
What is ERA?
This is an exciting prospect:
“Investigate direct (read-only) access of our catalogue and SFX server using ColdFusion. This might make it possible to display records from both, along with records from our ejournal and article indexes databases.”
Finally, how will the newest iteration of SFX affect how and what subject headings will be added to ejournals?
What might a single main electronic resource page look like?
A number of academic libraries direct users to a single page. Some examples include:
Harvard
U. of Pennsylvania
U. of Toronto
Northwestern
What is ERA?
Encompass for Resource Access
This is an exciting prospect …
Yes; the ability to query multiple remote databases through ODBC or JDBC and display the results could be very useful.
Finally, how will the newest iteration of SFX affect how and what subject headings will be added to ejournals?
Adding subject headings is one of the “value added” services library staff provide. It has become increasingly difficult as the number of titles has increased. Whether the subject headings provided by SFX will be both relevant and accessible is an issue for exploration and discussion.
I looked at the Harvard single access point/main page example and Wow! Everything is there in one clear, intuitive, HCI-friendly layout: the top 15, the icons (including the db graphic) and key, the ‘A-Z’ (plus numerics), and “Find it!” What a good, meaningful local name for SFX.
I am wondering where the connect-from-home info is, or how they handle it … is that why the PIN is required in front of every licensed resource? And why is there a ‘supplementary’ journal list? Also notice that no Information Resource page displays before free resources, just a direct link.
Definitely this idea merits consideration as a next step to improve services to our community. Thank you Tom. DW
U Penn offers Government Information as a resource type … if thoughtfully applied, this could be very useful. It does offer IR pages that can be skipped.