This is a series on web archiving at the UBC Library. For all posts about web archiving, please see

From the latest report by the United Nations Intergovernmental Panel on Climate Change, to the new NAFTA (USMCA) agreement, to Vancouver’s housing crisis, government information is all around us. Historically, government information was sent to academic libraries via depository agreements, but with the phasing out of print publishing in favor of born-digital publications, the majority of these deposit agreements have ceased.

Born-digital information can be taken down as quickly as it is published, and government information is no exception. Websites are removed for a variety of reasons, including the site being seen as outdated, perceived national security issues, changes in administration, or new organizational and departmental website guidelines. Canada’s federal Guidelines on Implementing the Standard on Web Accessibility includes a section on website links perceived to be redundant, outdated or trivial (ROT). What may be trivial according to government guidelines could be of value to researchers, historians or the general public, which is where the importance of web archiving comes in.

Since 2013, archiving government websites has been at the forefront of UBC Library’s web archiving initiatives. One of the Library’s first web archiving projects involved archiving federal government websites. In 2013, the federal government announced that its web presence would be consolidated from over 1,500 websites down to essentially one. Librarians were warned that the merger would result in the removal of valuable information, including reports and data, which would not be transferred to the new site. Due to the enormous scale of the project, UBC Library collaborated with other academic libraries across Canada to quickly archive nine federal departments, including Citizenship and Immigration Canada, Canadian Heritage, the National Research Council, Elections Canada and the Canadian Human Rights Commission. These sites are now preserved and viewable on the Library’s Archive-It collection page.

Canadian Government Information – Digital Preservation Network (CGI-DPN)

The federal government website project was initiated by the Canadian Government Information – Digital Preservation Network (CGI-DPN), a national collaborative web archiving group established in 2012, of which UBC is a partner.

Modelled on the U.S. Digital Federal Depository Library Program (FDLP), the CGI-DPN uses LOCKSS to distribute replicated copies of Canadian government information to secure, dispersed locations, including British Columbia.

The CGI-DPN web archive includes copies of the Depository Services Program E-collection, at-risk government websites of all jurisdictions (federal, provincial, municipal) as well as thematic collections. UBC is a LOCKSS node for the CGI-DPN and participates in curating various collections for the project. The collections are all available via

Municipal government collection

Along with archiving federal websites, we have also partnered with Simon Fraser University and the University of Victoria to capture local municipal content. UBC Library archived 132 municipal websites which are hosted on the University of Victoria’s British Columbia Local Governments Archive-it collection.

One of the benefits of archiving sites and curating a collection is that the content is all located in one place. Some cities, like the City of Vancouver, archive their own web domains, but a researcher would have to visit each site individually rather than viewing all the collections in one account. In some cases, these municipalities view archiving purely as preservation and keep their collections “dark”, closed to the public.


The challenges of web archiving government content include copyright issues as well as the necessity of working in an agile environment. Copyright for government websites varies from province to province, as each province and territory interprets Crown copyright differently. Some governments allow their domains to be archived while others do not; the Province of British Columbia is one that does not allow its site to be archived.

Websites can come down very quickly and sometimes we only have hours or days to capture this content. Working collaboratively with other institutions across British Columbia and Canada has allowed us to preserve material that would have otherwise disappeared forever.

Current government collections we are actively engaged in archiving include the BC local government elections, impacts of the legalization of marijuana, and Vancouver’s recently announced rapid transit projects.

We always welcome suggestions, so if you have any ideas for government collections please fill out our web archiving proposal form!

By Susan Paterson, Government Publications Librarian


The Digital Initiatives unit at UBC Library offers students an opportunity every term to complete a Professional Experience project in web archiving. During the summer of 2018, I had the chance to work with them on their web archiving initiatives.

The Professional Experience was great in many ways. I had the opportunity to learn about web archiving, from understanding its importance to performing quality assurance on crawled web pages. During the term, I focused on creating a web archiving collection of sites related to marijuana legalization in Canada, and more specifically in British Columbia. The collection will enable people to access web content created at the time, like awareness campaigns, and to see different perspectives on the topic.


Why web archiving?

Developing a working knowledge of web archiving seemed like a great opportunity. New data and content are posted to the internet every day, and a great portion of it consists of information with a short life cycle, such as social media and news.

Along with the massive production of information, there is also information loss. Websites, web pages, and links just stop working because someone decided the information was no longer useful, or because the content was moved from one site to another. Web archiving is a way to prevent this loss. Archiving websites and web pages involves information curation, copyright review, web crawling, quality assurance, and a lot of troubleshooting.



Web archiving initiatives are growing. Not only are academic and public libraries investing in web archiving, but companies and cities are as well. Libraries have created collections to serve different purposes and preserve information on specific topics like institutional memory, elections, natural disasters, landmark laws, politics, educational purposes, and more.

Companies that use web archiving services tend to do so for two main reasons: competitiveness and litigation. For example, a company may want to defend against or initiate legal proceedings based on what is published on the internet, or to preserve statements and information released about a competitor.

As a future librarian, I perceive several opportunities with web archiving, due to the profile of our profession. In general, librarians are experts when it comes to monitoring information, content curation, metadata, users’ needs, copyright, and technology. Those are some of the skills and knowledge needed to work with web archiving in the mentioned contexts.

In turn, web archives are a useful tool for librarians as they can help in many ways, for example:

  • Reducing the amount of work needed to update broken links on Research Guides
  • Making it easier to find new resources to replace ones that are no longer available
  • Ensuring access to great resources on the web, without worrying if they will still be available
  • Registering information that is easily lost, like social media and news



While web archiving is full of opportunities, it is also full of challenges. The main ones are:

  • Performing quality assurance (QA): web crawlers sometimes have trouble collecting information from websites with interactive content, for example. Figuring out how to scope and define crawl rules so that captured pages display properly can be challenging.
  • Balancing archived content against data budgets: an ideal scope and set of rules helps, but is not everything. Deciding how much data to save (and therefore how much to invest) versus how much of a website or web page to archive requires careful judgment.

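The scoping trade-off described above can be sketched in a few lines. Everything here is illustrative: the rule syntax, the `example-city.ca` domain, and the patterns are hypothetical stand-ins, not Archive-It's actual scoping configuration.

```python
import re

# Hypothetical include/exclude rules for one municipal crawl.
# The domain and patterns are made up for illustration.
INCLUDE = [r"^https?://(www\.)?example-city\.ca/"]
EXCLUDE = [r"/calendar/", r"\.(mp4|zip)$"]

def in_scope(url: str) -> bool:
    """A URL is crawled only if it matches an include rule and no
    exclude rule. Widening INCLUDE captures more content but spends
    more of the data budget; EXCLUDE claws some of that budget back."""
    if not any(re.search(p, url) for p in INCLUDE):
        return False
    return not any(re.search(p, url) for p in EXCLUDE)

print(in_scope("https://www.example-city.ca/minutes/2018.html"))  # True
print(in_scope("https://www.example-city.ca/calendar/2018-10"))   # False
```

Tuning lists like these until the capture is complete but not bloated is a large part of the decision making mentioned above.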


A Professional Experience in web archiving is an excellent opportunity to learn about the topic, get hands-on experience, work with professionals in the field, and strengthen your resume. The position will enable you not only to learn about web archiving, but also to exercise and improve your skills in time and project management, reporting, and working autonomously.

If you are interested in learning more about web archiving, check out these resources:


Written by Paula Arasaki, MLIS student at UBC

This is a series on web archiving at the UBC Library using the Internet Archive’s Archive-it service. For all posts about web archiving, please see


Web archiving projects can be proposed by anyone: community members, students, researchers, librarians, and more. We have also completed larger collaborative projects proposed by a group of libraries, such as the 2017 B.C. Wildfires collection. Each proposal is reviewed by Digitization Centre librarians as well as subject-matter experts whom we identify as collaborative or consulting partners. For example, the Site C Dam is a hydroelectric project managed by B.C. Hydro, a Crown corporation, and has a significant impact on the province’s First Nations groups. For this reason, the proposal for our B.C. Hydro Site C Dam web archiving collection was evaluated and later developed with UBC Library’s Government Publications Librarian Susan Paterson and Aboriginal Engagement Librarian Sarah Dupont.


To be selected as a candidate for web archiving, websites are evaluated based on their risk of disappearance, originality, availability in other web archiving collections, and copyright considerations … among other factors. While we are currently updating our collection policy for web archiving collections, potential collections and websites are evaluated based on their intrinsic value and significance to researchers, students, and the broader community. We aim to capture content that is relevant to the needs of the wide range of subject areas taught and researched at UBC, as well as content that contributes to the institutional memory of the university.


Web archiving projects are resource-intensive. Websites are assessed for how extensive we anticipate the captured content will be, and therefore how much of our subscription’s data storage will be used. We also consider how much time the project will take, and who is available to undertake the work within the required time frame. Some projects need to be responsive to current events unfolding in real time – such as a political rally, or a catastrophic event such as an earthquake – with resources required to identify the content and set up the crawls immediately. While we have been fortunate to have a number of iSchool students interested in working on our web archiving projects, they may already be committed to another project.

Technical considerations

Websites are constructed in many different ways, with a range of elements that dictate how a site behaves. Before starting a project we consider archive-friendliness: how easily the content, structure, functionality, and presentation of a site will be captured with Archive-it. Dynamic content – anything that relies on human interaction to function, or that contains database-driven content – can be problematic. Sites built with JavaScript or Flash often give web crawlers trouble capturing certain elements on a page. While crawls can be customized in some cases, a working crawl can take time to construct and success is not guaranteed.
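A toy example shows why JavaScript-driven pages trip up crawlers: a basic crawler discovers new URLs only in the HTML the server actually returns, so a link that a script would inject in the browser is invisible to it. The page below is a made-up snippet, and Python’s standard-library parser stands in for a crawler’s link discovery.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags -- the way a basic crawler
    discovers new URLs in the HTML a server returns."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# A made-up page: one ordinary link, plus a script that would add a
# second link when run in a browser. The crawler never executes it.
page = """
<html><body>
  <a href="/reports/2013.html">Annual report</a>
  <script>
    document.body.innerHTML += '<a href="/reports/2014.html">New</a>';
  </script>
</body></html>
"""

parser = LinkExtractor()
parser.feed(page)
print(parser.links)  # ['/reports/2013.html'] -- the injected link is invisible
```

Archiving tools work around this with browser-based capture or custom crawl rules, which is part of what makes such sites time-consuming to archive.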


Aside from capturing the web content itself, we create metadata for each collection and each website (or “seed”) that we capture; this often includes a description for the collection and the seed, as well as the creator of the seed and subject terms for the content, such as Environmental protection for a seed in the Site C Dam collection.

This metadata provides context for why the seed was included in the collection, and helps users discover the content relevant to their interests when searching in Archive-it.
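A seed’s descriptive record can be pictured as a simple set of key-value fields. This is a minimal sketch with illustrative field names and values, not Archive-It’s exact metadata schema; the URL is a hypothetical example seed.

```python
# Illustrative descriptive record for one seed in a collection.
# Field names and values are examples, not Archive-It's actual schema.
seed_metadata = {
    "url": "https://www.example-sitec-project.ca/",  # hypothetical seed URL
    "collection": "BC Hydro Site C Dam",
    "title": "Site C Clean Energy Project",
    "description": "Official project website for the Site C dam.",
    "creator": "BC Hydro",
    "subjects": ["Environmental protection", "Hydroelectric power plants"],
}

for field, value in seed_metadata.items():
    print(f"{field}: {value}")
```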


By Larissa Ringham, Digital Projects Librarian

Estimates put the current size of the internet at around 1.2 million terabytes and 150 billion pages. Sites go up, sites come down, pages are removed, content changes continuously. And an increasing amount of this information is available only online. You might not care if you can no longer access the comments about watching someone’s grass grow, but you may be concerned one day to find that a political candidate no longer has their statement of opposition on an important local topic up on their campaign website – a statement that you desperately need for your research.

Fortunately much of the content on the web today is captured by the Internet Archive, which harvests and makes available web content through its Wayback Machine. Sites are crawled by the Wayback Machine for archiving on an irregular schedule; depending on a variety of factors such as how heavily a site is linked, the Internet Archive web crawlers may crawl a site several times a day – or only once every few months. Web content can change so frequently that unless you can specify exactly when the content on a specific site is captured, there is a chance that information will be lost forever. The Wayback Machine does what it can, but it has billions of web pages to try to crawl.

Enter Archive-it. Archive-it is a subscription web archiving service that the Internet Archive created to give organizations like UBC Library the ability to harvest, build, and preserve collections of digital content on demand. This service gives us control over what we crawl and how often, and allows us to apply metadata that helps users find our archived web content more easily. Information can also be pulled out of our collections for analysis using Archive-it’s API. The sites we harvest are available on our institution’s Archive-it home page, and are added to the Wayback Machine’s own site crawls so that our information is full-text searchable and freely available to anyone in the world at any time.
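As a rough illustration of pulling capture data out programmatically, here is a sketch against the Internet Archive’s public CDX server (the index behind the Wayback Machine) rather than Archive-It’s partner API, which is separate and not shown. The endpoint is real; the sample response is an abridged, illustrative stand-in for what the server returns.

```python
import json
from urllib.parse import urlencode

# Public CDX server behind the Wayback Machine (not Archive-It's own API).
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"

def capture_query(url: str, limit: int = 5) -> str:
    """Build a CDX query URL asking for JSON output."""
    return CDX_ENDPOINT + "?" + urlencode(
        {"url": url, "output": "json", "limit": limit}
    )

def parse_captures(body: str):
    """Parse a JSON CDX response: row 0 is the field names,
    each later row is one capture of the page."""
    rows = json.loads(body)
    if not rows:
        return []
    header = rows[0]
    return [dict(zip(header, row)) for row in rows[1:]]

# Illustrative response, abridged to three of the usual fields; fetching
# capture_query(...) with urllib.request would return the real thing.
sample = '[["urlkey","timestamp","original"],["ca,gc)/","20131015120000","http://gc.ca/"]]'
for capture in parse_captures(sample):
    print(capture["timestamp"], capture["original"])
```

Listing capture timestamps like this is one way a researcher could check how often a site of interest was actually crawled.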

We started web archiving in 2013, when a group of university libraries – including UBC – began crawling the Canadian federal government websites collaboratively in order to capture content important to Canadians that was scheduled for removal online. Since then, we have created nine collections of archived web content, with three more under active development. These collections are representative of the research interests of UBC and its community, and include such topics as the BC Hydro Site C Dam project and First Nations and Indigenous Communities websites, as well as the University of British Columbia websites themselves.

Over the next few weeks we will be exploring some aspects of our web archiving work at UBC, and will hear from some of our library partners and past students who have done work in the area. Stay tuned for posts on developing web archiving projects, archiving government web content, and the technical limitations of web archiving.

See all posts related to web archiving:


By Larissa Ringham, Digital Projects Librarian

Travelling and tourism are prominent topics in our collections. You could even base your next vacation on some of the items we’ve digitized! Check out some of the major cities represented in the materials of the Canadian Pacific Railway Company.



As the capital of the Philippines, Manila is full of beautiful and fashionable architecture, European-inspired shops, old monasteries, palaces, and picturesque Spanish-style houses. This particular guide notes that cockfighting, although prohibited, was a popular sport among locals.

Canadian Pacific cruises: round the world and Mediterranean, 1925-1926



For about 300 years, Algiers was a stronghold of the Barbary pirates. This guide tells readers that visitors can see the Bab-Azoun shopping street and the Kasbah (or Palace of the Deys), and marvel at the city’s rich history.

Canadian Pacific cruises: round the world and Mediterranean, 1925-1926



This guide, published in 1925, proclaims Shanghai to be the most cosmopolitan city in the world and the commercial capital of North China. It tells readers that visitors could see modern streets lined with six-storey buildings, the longest bar in the world at the Shanghai Club, and amazing shopping streets featuring department stores and amusement places.

Around the world cruise 1925


If these little snippets have got you interested, check out some of our other guides and explore more cities!

Mediterranean cruise: Canadian Pacific Empress of Scotland, 1924


Around the world cruise 1925


Canadian Pacific cruises: round the world and Mediterranean, 1925-1926

Old manuscripts can be full of surprises. Going through our Western Manuscripts and Early Printed Books Collection can feel like a treasure hunt! You can find amazing and delicate details in the drawings, the margins, and even the letters on the page. But do you know why manuscripts were created so intricately?

The decorative elements found in borders, drawings, and initials on manuscripts had a purpose. These details increased the value of the material while enhancing the appearance of the manuscript. They helped readers to interpret the text by offering visual content, which aided literacy, and by delineating passages of text. Decorations could also be used to demonstrate the importance of the manuscript owner or of the represented person.

Book of hours, 1440


Book of hours, 1440


Illuminated manuscripts often used bright colors from natural pigments, gold leaf and even semi-precious stones for decoration. Illuminators were very skilled and specialized workers, and it was common for a single work to take multiple years to complete. Illuminated manuscripts were of high prestige and were often given as diplomatic gifts or to celebrate dynastic marriages. Although the increasing popularity of the printing press in the 16th century meant that books no longer had to be hand-written, many aristocrats and rulers continued to order these manuscripts for private devotion and viewing.

In addition to illuminations, manuscripts could also incorporate decoration in other ways. Common decorative embellishments were:

  • Initials: decoration varied from simple rubrication (inking the initial letter in red) to painted drawings inside the initial letter representing a scene from the text.
  • Borders: these varied from simple pen-work in red or blue ink to borders using gold leaf and pigments. Expensive books sometimes had borders on every page, while others had them only on the first page or to introduce new parts of the text.
  • Miniatures: these were smaller works of art that could fulfill a range of purposes, such as providing commentary, deepening the reader’s understanding of the text, or providing aesthetic and meditative value.


Spanish chant manuscript, 1625


[Quaestiones super tres libros De anima Aristotelis], 1480

[Quaestiones super tres libros De anima Aristotelis], 1480

Explore our Western Manuscripts and Early Printed Books Collection to check out these and other items!



An introduction to illuminated manuscripts (British Library)

Illuminated manuscripts (National Gallery of Art)

Manuscripts and special collections (University of Nottingham)

Manuscript illumination in Northern Europe (Metropolitan Museum of Art)
