New & Noteworthy

Library Hi Tech News

ISSN: 0741-9058

Article publication date: 18 October 2011



(2011), "New & Noteworthy", Library Hi Tech News, Vol. 28 No. 8.



Emerald Group Publishing Limited

Copyright © 2011, Emerald Group Publishing Limited


Article Type: New & Noteworthy From: Library Hi Tech News, Volume 28, Issue 8

Automating quality assurance and assessment of digital collections with AQuA

Manual quality assurance of digitised content is typically fallible and can result in collections that are marred by a variety of quality issues. Poor storage conditions can result in further damage due to bit-rot. Technological obsolescence can lead to additional risks. Detecting, identifying and fixing these issues in legacy digital collections are costly and time-consuming manual processes. Identifying problems in a timely manner following digitisation or acquisition supports more effective and cost-efficient mitigation.

The Automating Quality Assurance (AQuA) project applied a variety of existing software tools in order to automate quality assurance and assessment. Two AQuA events brought together digital preservation practitioners, collection curators and technical experts to present problematic digital collections, articulate requirements for their validation and apply tools to automate the detection and identification of preservation and quality issues.
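One basic automated check of the kind described above is fixity verification, which can flag bit-rot by comparing a file's current checksum with one recorded earlier. The following is a minimal Python sketch under stated assumptions, not one of the AQuA tools: the manifest format (a mapping of file path to SHA-256 digest) and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_fixity_failures(manifest):
    """Compare recorded digests with current ones.

    `manifest` is a hypothetical mapping of file path -> expected digest.
    Returns a dict of path -> reason for every file that fails the check.
    """
    failures = {}
    for path, expected in manifest.items():
        if not Path(path).is_file():
            failures[path] = "missing"
        elif sha256_of(path) != expected:
            failures[path] = "checksum mismatch"
    return failures
```

Running such a check soon after digitisation or acquisition, and again at intervals, is one way to identify problems in a timely manner; an empty result means every listed file still matches its recorded digest.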

The “collections, issues and solutions” page describes all of the digital collections that were considered, the corresponding preservation or quality issues identified, and the solutions to those issues that the AQuA participants developed. Links to tools, scripts and other useful and reusable results are included: Solutions

The project has now closed but the AQuA approach lives on in a new event in September 2011 from the Digital Preservation Coalition and Open Planets Foundation (OPF) that will use the event format developed by AQuA:

Results from the JISC-funded AQuA project mashups can be found on the project wiki: AQuAResults

AQuA project:


New vocabulary data added to LC Authorities and Vocabularies Service

The Authorities and Vocabularies web service was first made available in May 2009 and offered the Library of Congress Subject Headings (LCSH), the library’s initial entry into the linked data environment. In part by assigning each vocabulary and each data value within it a unique resource identifier, the service provides a means for machines to semantically access, use and harvest authority and vocabulary data that adhere to W3C recommendations, such as Simple Knowledge Organization System (SKOS), and the more detailed vocabulary MADS/RDF. In this way, the Authorities and Vocabularies web service also makes government data publicly and freely available in the spirit of the open government directive. Although the primary goal of the service is to enable machine access to Library of Congress data, a web interface serves human users searching and browsing the vocabularies.

Sally H. McCallum, Chief, Network Development and Standards Office of the Library of Congress, has announced that the library has made available additional vocabularies from its Authorities and Vocabularies web service (ID.LOC.GOV), which provides access to Library of Congress standards and vocabularies as linked data. The new dataset is:

  • Library of Congress Name Authority File (LC/NAF).

In addition, the service has been enhanced to provide separate access to the following datasets, which until now have been accessed as part of the LCSH dataset:

  • Library of Congress Genre/Form Terms; and

  • Library of Congress Children’s Headings.

The LC/NAF data are published in RDF using the MADS/RDF and SKOS/RDF vocabularies, as are the other datasets. Individual concepts are accessible at the ID.LOC.GOV web service via a web browser interface or programmatically via content negotiation. The vocabulary data are available for bulk download in MADS and SKOS RDF (the name file and main LCSH file will be available by Friday, August 12).
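Programmatic access via content negotiation means that a client asks for a machine-readable representation through the HTTP Accept header instead of receiving the browser page. The following minimal Python sketch only builds such a request and does not send it. The concept URI is a hypothetical placeholder, and the assumption that the service answers `application/rdf+xml` (the registered media type for RDF/XML) should be checked against the service's own documentation.

```python
from urllib.request import Request

# Hypothetical placeholder; substitute the URI of a real concept.
CONCEPT_URI = "http://id.loc.gov/authorities/subjects/sh00000000"


def build_rdf_request(uri, media_type="application/rdf+xml"):
    """Build (but do not send) a GET request asking for an RDF representation."""
    return Request(uri, headers={"Accept": media_type})


request = build_rdf_request(CONCEPT_URI)
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the request object to `urllib.request.urlopen` would send it; the same URI requested from a web browser would normally return the human-readable page.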

The new datasets join the term and code lists already available through the service:

  • LCSH;

  • Thesaurus of Graphic Materials;

  • MARC Code List for Relators;

  • MARC Code List for Countries (which reference their equivalent ISO 3166 codes);

  • MARC Code List for Geographic Areas;

  • MARC Code List for Languages (which have been cross referenced with ISO 639-1, 639-2 and 639-5, where appropriate); and

  • PREMIS vocabularies for cryptographic hash functions, preservation events and preservation level roles.

The above code lists also contain links with appropriate LCSH and LC/NAF headings. Additional vocabularies will be added in the future, including additional PREMIS controlled vocabularies.

All are invited to explore the Authorities and Vocabularies service at:

User feedback is important and welcomed, and user contributions directly inform service enhancements. Users can send comments or report any problems via the ID feedback form or ID listserv.

Principles on open bibliographic data

Producers of bibliographic data such as libraries, publishers, universities, scholars or social reference management communities have an important role in supporting the advance of humanity’s knowledge. For society to reap the full benefits from bibliographic endeavours, it is imperative that bibliographic data be made open – that is available for anyone to use and re-use freely for any purpose.

The Open Bibliographic Data Working Group of the Open Knowledge Foundation recommends adopting and acting on the following four principles on Open Bibliographic Data:

  1. When publishing bibliographic data, make an explicit and robust license statement.

  2. Use a recognized waiver or license that is appropriate for data.

  3. If you want your data to be effectively used and added to by others, it should be open as defined by the Open Definition; in particular, non-commercial and other restrictive clauses should not be used.

  4. Where possible, explicitly place bibliographic data in the public domain via Public Domain Dedication and Licence or Creative Commons Zero Waiver.

The Working Group continues to offer the opportunity, for both individuals and groups, to endorse the four principles on Open Bibliographic Data.

Endorse the principles at:

Open Bibliographic Data Working Group:

Clustering and Sustaining Digital Resources: case studies from JISC

Alastair Dunning, JISC Digitisation Programme Manager, reports that JISC has recently published a set of case studies from UK projects looking at two particular themes:

  1. embedding digitisation within an institution; and

  2. clustering digital resources.

With an investment of £2.1 million JISC has funded this programme of work which will not only create more digital content, but also help sustain and deliver existing content in a more effective way. Clustering and Sustaining Digital Resources presents the final case studies of the projects in the programme.

The report of the JISC Digital Conference held in Cardiff in 2007 made it clear that there is a need to break down silos of content and to create a connected mass of digitised content. At the same time, new business models need to be developed, along with the skills and expertise to allow for the long-term sustainability of digital content.

With these strategic priorities in mind, JISC has funded 11 projects under two separate strands:

  1. institutional skills and strategy; and

  2. clustering and enhancement.

The report Clustering and Sustaining Digital Resources, available in multiple formats, including ePub for eBook readers, brings together the final case studies of the projects in the e-content programme 2009-2011.

Projects in the first strand looked at the skills required to build and sustain digital collections, with a focus on how universities embed digitisation as a strategic activity within their core work. The clustering and enhancement strand draws on case studies examining how digital silos can be broken down, as users demand increasingly sophisticated resources that cluster or aggregate related content from different areas of the internet.

The case studies in the report will be particularly useful for those who wish to:

  • develop digitisation strategies within their own institutions; or

  • create web sites that cluster together digital content from disparate sources.

All projects funded under the e-content programme deliver digital collections openly accessible to all. In addition, all digitised material is licensed under particularly favourable terms to allow for its use and re-use in a number of different educational contexts. The 11 funded projects ran from September 2009 until January 2011.

The volume is freely available in PDF and ePub (for eBook readers) formats, or for online reading at: AndSustainingDigitalResources.aspx

IU Data To Insight Center to lead investigation into non-consumptive research

Indiana University’s Data To Insight Center (D2I) will lead a $600,000 grant from the Alfred P. Sloan Foundation to fund the first investigation of non-consumptive research for a major mass digitized collection of content. Partners with D2I on this include the HathiTrust Research Center (HTRC) and the University of Michigan’s Department of Electrical Engineering and Computer Science.

“This funding will enable us to pursue a research track around non-consumptive research uses of the HathiTrust digital corpus,” said principal investigator Beth Plale, Professor in the IU Bloomington School of Informatics and Computing and Director of the D2I. “At the end of the project we expect to have cyberinfrastructure in place that successfully demonstrates that non-consumptive research can be carried out safely under the conditions of unintended malicious user algorithms.”

Non-consumptive research involves computational analysis of one or more books without the researcher having the ability to reassemble the collection. Rather than reading the material, researchers use specialized algorithms to analyze text as a massive dataset and the Sloan grant will help ensure the work can be conducted in a secure environment.

In some cases, HTRC would own the algorithms used by researchers, so HTRC needs to examine the security requirements for users, the algorithms and the data, all within the context of using the suite of algorithms available in the Software Environment for the Advancement of Scholarly Research.

In other cases, the researcher would own and submit their own algorithms for use and the Sloan Foundation funding will be used to create what Plale called a “data capsule framework” prototype that would allow the scholar the freedom to experiment with new algorithms on a huge body of information, but with technological “trust but verify” mechanisms in place to confirm compliance with non-consumptive research policy.

Without taking into account the actual content of materials, researchers using their own complex algorithms might analyze such massive datasets for anything from the simple repetition of words to complex linguistic structures or the evolution of word usage over a range of time, space or even demographic class.
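The simplest analysis mentioned above, repetition of words, can be sketched in a few lines of Python. This is an illustration only and not the HTRC data capsule framework: the function name and the sample strings are made up, and real volumes would be read inside the secure environment. The point it shows is that only aggregate counts are returned, never the text itself.

```python
import re
from collections import Counter


def word_frequencies(volumes, top_n=5):
    """Return the `top_n` most frequent words across `volumes`.

    Only aggregate counts leave this function; the text itself is never
    returned, which is the spirit of a non-consumptive analysis.
    """
    counts = Counter()
    for text in volumes:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common(top_n)


# Made-up sample strings standing in for digitized volumes.
sample = ["the cat sat on the mat", "the dog sat"]
print(word_frequencies(sample, top_n=2))  # [('the', 3), ('sat', 2)]
```

A "trust but verify" mechanism of the kind the project describes would additionally have to check that results leaving the environment cannot be used to reassemble the source volumes, which this sketch does not attempt.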

The HathiTrust repository contains almost 8.6 million digitized volumes, and about 2.2 million of those – roughly 26 percent – are in the public domain and currently available for non-consumptive research.

The model for implementing non-consumptive research is founded on a principle of trust but verify, where the researcher should generally be trusted to do the right thing and be given the freedoms to carry out creative research, but with mechanisms in place to ensure good behavior and adherence to rules. The security aspects of the project leverage research by Atul Prakash of University of Michigan, also a principal investigator on the project with Plale.

Leveraging cyberinfrastructure at Indiana University, including FutureGrid, and at the University of Illinois at Urbana-Champaign, the HTRC will provision a secure computational and data environment. “This collaborative cyberinfrastructure test-bed will serve as a proving ground for our research agenda around non-consumptive uses of the collection,” said Robert H. McDonald, Associate Director in the IU D2I and another principal investigator on the project.

“In defining new methods of non-consumptive research of the HathiTrust digital corpus, the HathiTrust Research Center and the IU Data to Insight Center are enabling research faculty and the HathiTrust partner libraries to engage in groundbreaking new research across the corpus while maintaining the security and integrity of the collection and the researcher’s fair-use access to its content,” said Brenda Johnson, Ruth Lilly Dean of Libraries at Indiana University.



Indiana State Library receives grant to digitize historic Indiana newspapers

The National Endowment for the Humanities (NEH) has awarded a $293,157 grant to the Indiana State Library to digitize Indiana’s historically significant newspapers. Indiana joins 25 states participating in the National Digital Newspaper Program, a partnership between the NEH, the Library of Congress and participating states to provide enhanced access to American newspapers published between 1836 and 1922. Newspapers digitized as part of this two-year project will be included in the Library of Congress’s Chronicling America Database.

“This grant is crucial to the State’s efforts to provide optimal public access to Indiana’s historical documents and cultural heritage,” said Jim Corridan, State Archivist and Associate Director of the Indiana State Library. “The State Library houses millions of copies of historic Hoosier newspapers and this initiative will enable Hoosiers instant access to these collections via the internet.”

The Indiana State Library will be assisted on the project by an advisory group of representatives from the Indiana Commission on Public Records, the Indiana Historical Bureau, Ball State University, the Hoosier Press Foundation, the Indiana Historical Society, the Indiana University School of Journalism and Indiana University Purdue University Indianapolis. The advisory group will develop criteria for inclusion of historic papers and ultimately select the newspapers to be digitized.

In addition to the Indiana papers’ presence in the Chronicling America Database, the digitized papers will also be available through Indiana Memory – a collaborative effort to provide access to the wealth of primary sources in Indiana libraries, archives, museums and other cultural institutions. Indiana Memory’s mission is to create and maintain a digital library that enables free public access to Indiana’s unique cultural and historical heritage. Through information and pictures found in digitized books, manuscripts, photographs, newspapers, maps and other digital materials available on the Indiana Memory web site, the program seeks to enhance education and scholarship of Indiana’s past. As a portal to the collections, Indiana Memory assists individuals to locate materials relevant to their interests and to better appreciate the connections between those materials.

Indiana State Library:

Indiana Memory:

Full press release:

Digital Gough Map launched: earliest medieval map of Britain

A 15-month research project on the earliest surviving geographically recognizable map of Great Britain, known as the Gough Map, provides some revealing insights into one of the most enigmatic cartographic pieces from the Bodleian collections. The findings are recorded on a newly launched web site:

The 15-month AHRC-funded project used an innovative approach that explores the map’s “linguistic geographies”, that is the writing used on the map by the scribes who created it, with the aim of offering a re-interpretation of the Gough Map’s origins, provenance, purpose and creation of which so little is known.

Although the identity of the map maker is unknown, it is now possible to reveal that the text on the Gough Map is the work of at least two scribes: the original fourteenth-century scribe and a fifteenth-century reviser.

One of the key investigations, based on historical reference and the handwriting on the map, was to date the map more accurately. The project has discovered that the map was made closer to 1375, rather than in 1360 as was previously thought.

There are visible differences between recorded details in Scotland and England. For example, the text written by the original scribe is best preserved in Scotland and the area north of Hadrian’s Wall, whereas the text written by the reviser is found in south-eastern and central England. The buildings in Scotland do not have windows and doors, whereas in the revised part of the map, essentially everywhere south of Hadrian’s Wall, most buildings have both windows and doors.

Throughout, towns are shown in some detail, the lettering for London and York coloured gold, while other principal medieval settlements such as Bristol, Chester, Gloucester, Lincoln, Norwich, Salisbury and Winchester are lavishly illustrated.

One of the key outcomes of Linguistic Geographies is to make available online a searchable version of the Gough Map based upon a digital image of the map. The web site features a zoomable, pannable digital version of the Gough Map that is fully searchable and browsable by place name (current and medieval) and by geographical features. Clicking on a chosen location reveals information about that location’s geographical appearance, etymology, appearance on earlier maps and much more.

The web site also includes a series of scholarly essays discussing the map, the latest news about the project and a blog, among other features.

Nick Millea, Bodleian Map Librarian, said: “The project team was keen to ensure that our research findings reach the widest possible audiences, not least because maps are enduringly popular objects and always capture the imagination; medieval maps especially. To this end one of the main project outcomes is this web-resource through which the Gough Map is made more widely accessible. We hope this will help others to develop other lines of enquiry on medieval maps and mapmaking, whether in academic or non-academic sectors, as well as provide greater levels of access to the Gough Map, enhancing its world-wide significance in the history of cartography.”

This is an outcome of the collaborative project involving Queen’s University Belfast, King’s College London and the Bodleian Libraries.

Gough Map:

WorldCat Identities Network maps connections among WorldCat Identities

The WorldCat Identities Network gives users the opportunity to visually explore the interconnectivity and relationships between WorldCat Identities.

WorldCat Identities creates a summary page for every name in WorldCat, including people, things (e.g. the Titanic), fictitious characters (e.g. Harry Potter) and corporations (e.g. IBM). The WorldCat Identities Network uses the WorldCat Identities Web Service and the WorldCat Search API to create an interactive Related Identity Network Map for each identity in the WorldCat Identities database. The Identity Maps can be used to explore the interconnectivity between WorldCat Identities.

The WorldCat Identities API provides up to ten related identities for each identity. These related identities are then displayed as a visual network map that allows users to easily jump from identity network to identity network by clicking the identity name on the map.
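The jump from one identity network to the next can be pictured as a breadth-first expansion of a graph in which each identity contributes at most ten neighbours. The Python sketch below illustrates that idea only; it does not call the WorldCat Identities API. The `lookup_related` callable, the `build_network` function and the single-letter identities are hypothetical stand-ins.

```python
from collections import deque

MAX_RELATED = 10  # the article states up to ten related identities per identity


def build_network(start, lookup_related, max_depth=2):
    """Breadth-first expansion of a related-identity network.

    `lookup_related` is a hypothetical callable: name -> list of related names.
    Identities up to `max_depth` hops from `start` are looked up.
    Returns an adjacency dict mapping each visited identity to its neighbours.
    """
    network = {}
    queue = deque([(start, 0)])
    while queue:
        name, depth = queue.popleft()
        if name in network:
            continue  # already expanded via another path
        neighbours = lookup_related(name)[:MAX_RELATED]
        network[name] = neighbours
        if depth < max_depth:
            queue.extend((n, depth + 1) for n in neighbours)
    return network


# Placeholder data, not real WorldCat Identities relationships.
fake_index = {"A": ["B", "C"], "B": ["A"], "C": ["D"], "D": []}
network = build_network("A", lambda name: fake_index.get(name, []), max_depth=1)
print(network)  # {'A': ['B', 'C'], 'B': ['A'], 'C': ['D']}
```

In an interactive map the expansion would more likely happen one click at a time, which corresponds to calling the lookup for a single identity and redrawing its neighbours.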

The WorldCat Search API is also used to provide added information for each relationship displayed on the Identity Network Map. Each identity on the map has a corresponding entry below the map that uses the WorldCat Search API to display titles found in WorldCat that reference the two related identities.

This application was developed primarily by JD Shipengrover, Senior Web & User Interface Designer, and Senior Software Engineer Jeremy Browning.

The WorldCat Identities Network is available at:

More information: WorldCat Identities Network activity page:

WorldCat Identities (descriptive article):

WorldCat Identities API landing page (technical information):

WorldCat Search API landing page (technical information):

EOS International announces “Museum Connect” community connection program

EOS International, a library information and knowledge management software and services provider, has announced a new community connection program, “Museum Connect”. The EOS.Web Museum-Connect Program allows museums, historical societies, botanical gardens, archives and other organizations to share library records, information, knowledge, ideas and easily collaborate with similar organizations. The EOS.Web Museum-Connect Program can create for EOS.Web clients a secure, sociable community of libraries with similar interests, issues and goals.

The EOS.Web Museum-Connect Program is based on EOS’s flagship software platform EOS.Web, which uses a cloud computing hosted environment. EOS.Web was the first fully web-based MARC 21 library solution to be built on Microsoft’s .NET technology and the SQL relational database.

The EOS.Web Museum-Connect Program is available to all EOS.Web clients at no additional charge. When you participate in the program, you immediately have access to other museums and similar organizations worldwide.

“The EOS.Web Museum-Connect Program will provide EOS.Web users with additional tools to increase productivity, reduce costs and facilitate community networking,” says Sal Provenza, Vice President of Global Sales and Marketing.

Currently EOS.Web is used by institutions including the Museum of London, San Diego Air & Space Museum, Denver Museum of Nature & Science, The McNay Art Museum Library and the Seattle Art Museum Library.

EOS International:

EBSCO and Innovative Interfaces demonstrate partnership with EDS™ and Encore™

In an effort to improve accessibility and usability for customers requesting various ports of entry into EBSCO Discovery Service™ (EDS), EBSCO Publishing (EBSCO) and Innovative Interfaces, Inc. (Innovative) are working to develop improved access into these resources via Innovative’s Encore™ discovery platform. The result of the partnership is a smarter experience for mutual customers and library users who begin their search from Encore. EDS will be available via a dedicated API with enhanced functionality.

Executive Vice President of Technology and Chief Information Officer for EBSCO Publishing, Michael Gorrell, says a dedicated API allows users accessing EDS through the Encore platform access to the valuable resources available in EDS. “EDS has quickly become the discovery service for hundreds of universities around the world. EDS provides a full-featured experience for end-users. In other words, we have brought together a comprehensive index and a single-search approach, but we also offer a true academic and powerful environment in order to facilitate a comprehensive discovery experience. As much as we’ve invested into our native user interface, we also must accommodate the users who may start their research on a partner platform such as Encore. This agreement lets users who are in the Encore platform access the power of EDS from within the Encore user environment.”

For Innovative, partnering with EBSCO to satisfy the needs of mutual customers is a key strategy. This most recent announcement builds on a long-standing relationship with EBSCO in which the two companies continue to explore areas where collaboration would benefit libraries. According to John McCullough, Vice President, Encore Division, at Innovative Interfaces, Encore users will benefit from this arrangement. “Our goal with Encore is to offer users the most successful library discovery experience possible and providing access to services like EDS from within Encore is a fundamental part of that strategy.”

Encore Synergy’s Services-Oriented Architecture seamlessly integrates powerful discovery services like EDS into the user experience without the limitations and tradeoffs of “one size fits all” systems. McCullough expanded on the partnership saying, “with libraries facing an increasingly complex world of content and technology, it’s a Web imperative that discovery systems interoperate and scale collaboratively to ensure that libraries have the freedom to choose best-of-breed functionality for their users.”

Innovative’s Encore:


Single Search: The Quest for the Holy Grail – report from OCLC Research

Single Search: The Quest for the Holy Grail, a new report from OCLC Research, highlights successful strategies in providing a single point of access to library, archive and museum collections.

In the era of global search engines, users are often puzzled by the realization that they can search the internet through a single interface, yet the resources of universities and other institutions are often compartmentalized in a plethora of informational silos, each with its own dedicated system, search categories and user interfaces. Many institutions want to make the breadth of their local resources easily discoverable regardless of where and how the resources are managed.

To address this desire, OCLC Research facilitated a working group of nine single-search implementers through discussions about the opportunities for, and obstacles to, integrating single-search access across an institution. Members of this group told their stories, identified issues, and acknowledged similarities and differences in their approaches. This brief report summarizes those discussions and highlights the emerging practices in providing single-search access to an institution’s collections. The goal of the report is to foster successful single-search implementations by sharing the experience of the working group with those who are just beginning to plan single-search implementations.

This report is the latest in a series of OCLC Research reports about how to increase access to special collections that have resulted from our work under the thematic focus of Mobilizing Unique Materials.

Read the report, Single Search: The Quest for the Holy Grail at:

Learn more about the Single Search for Library, Archive and Museum Collections project:

Future direction of the Federal Depository Library Program (FDLP): modeling initiative

From the FDLP Desktop:

“In September 2010, the US Government Printing Office (GPO) contracted with Ithaka S+R (Ithaka) to develop practical and sustainable models for the Federal Depository Library Program (FDLP) to continue to fulfill its mission in a changing information environment now dominated by digital technology. These models were intended to serve as a guide in planning the future direction of the Program. After careful review it was determined that the models presented by Ithaka are not practical and sustainable to meet the mission, goals, and principles of the FDLP. These models have some value as we move forward together with the library community to develop new models based on a shared vision which will increase flexibility for member libraries and ensure the vibrant future of the Program in the digital age.”

“The archived version of the Ithaka Web site that was created as part of this study, as requested under the terms of the contract, is available in PDF format. GPO appreciates the comments that were submitted by members of our community during Ithaka’s study of the FDLP. […] We look forward to obtaining comments and feedback from more participants in our depository library network. We plan to use these comments as part of the foundation to build on as we continue our future visioning and modeling process.”

Ithaka has provided a PDF of their final report. The final report is accompanied by a statement from GPO.

Comments were solicited until September 16, 2011 at the “Future direction of the FDLP: modeling initiative” home page. Comments submitted through this site will be made available on the FDLP Desktop in preparation for the Thursday, October 20, 2011 day-long discussion, “Creating our shared vision: roles and opportunities in the FDLP.”

Kindle Cloud Reader advances Amazon’s “Buy Once, Read Everywhere” mission

For over two years, Amazon has been offering a wide selection of free Kindle reading apps that enable customers to “Buy Once, Read Everywhere.” Customers can already read Kindle books on the largest number of the most popular devices and platforms, including Kindles, iPads, iPhones, iPod touches, PCs, Macs, Android phones and tablets and BlackBerrys. In August 2011, Amazon announced Kindle Cloud Reader, its latest Kindle reading application, which leverages HTML5 and enables customers to read Kindle books instantly using only their web browser, online or offline, with no downloading or installation required. As with all Kindle apps, Kindle Cloud Reader automatically synchronizes the customer’s Kindle library, as well as the last page read, bookmarks, notes and highlights for all of the reader’s Kindle books, no matter how the reader chooses to read them. Kindle Cloud Reader, with its integrated touch-optimized Kindle Store, is now available for Safari on iPad, Safari on desktop and Chrome.

“We are excited to take this leap forward in our ‘Buy Once, Read Everywhere’ mission and help customers access their library instantly from anywhere,” said Dorothy Nicholls, Director, Amazon Kindle. “We have written the application from the ground up in HTML5, so that customers can also access their content offline directly from their browser. The flexibility of HTML5 allows us to build one application that automatically adapts to the platform you’re using – from Chrome to iOS. To make it easy and seamless to discover new books, we’ve added an integrated, touch optimized store directly into Cloud Reader, allowing customers one click access to a vast selection of books.”

Features of Kindle Cloud Reader include:

  • An immersive view of your entire Kindle library, with instant access to all of your books.

  • Ability to start reading over 950,000 Kindle books instantly within your browser.

  • An embedded Kindle Store optimized for your web browser makes it seamless to discover new books and start reading them instantly.

  • Your current book is automatically made available for offline use, and you can choose to save a book for reading offline at any time.

  • Select any book to start reading, customize the page layout to your desired font size, text color, background color and more.

  • View all of the notes, highlights and bookmarks that you have made on other Kindle apps or on Kindle.

Kindle Cloud Reader will be available on additional web browsers, including Internet Explorer, Firefox, the BlackBerry PlayBook browser and other mobile browsers, in the coming months. Customers can start reading their Kindle books immediately using Kindle Cloud Reader at:

OCLC to offer Atlas Systems’ free electronic document delivery software

Atlas Systems has announced that OCLC will extend its suite of resource sharing services with Odyssey™ 3.0, the new version of Atlas’ free stand-alone electronic delivery software. Odyssey complements the OCLC ILLiad™ Resource Sharing Management Software that was developed by Atlas Systems and is now distributed by OCLC.

The stand-alone version of Odyssey allows sites to send and receive electronic documents to and from other Odyssey sites, OCLC ILLiad sites, and any other supplier’s software that supports the Odyssey protocol. “Odyssey 3.0 features the ability to send and receive PDF files and allows for users of the stand-alone module to be ‘trusted senders’ with ILLiad partners, resulting in even faster delivery times,” says Genie Powell, Chief Customer Officer at Atlas Systems.

“In addition, we’ve made it easier to set up and administer Odyssey, making the free software even more attractive.”

“The Odyssey stand-alone represents an outstanding opportunity for OCLC members to expand their resource sharing using a free application,” says Katie Birch, Director, OCLC Delivery Services. “The partnership between OCLC and Atlas Systems continues to provide industry-leading software opportunities to the resource sharing community.”

Atlas Systems:
