Copyright © 2011, Emerald Group Publishing Limited
New & Noteworthy
Article Type: New & Noteworthy From: Library Hi Tech News, Volume 28, Issue 1
Google Books Ngram Viewer: visualizing cultural trends
In December 2010, Jon Orwant, Engineering Manager for Google Books, announced on the Google blog a new visualization tool called the Google Books Ngram Viewer, produced by Matthew Gray and intern Yuan K. Shen and available on Google Labs. The Ngram Viewer lets users graph and compare phrases from the underlying book datasets over time, showing how their usage has waxed and waned over the years. One advantage of having the data online is that it lowers the barrier to serendipity: a user can stumble across something in these 500 billion words and be the first person ever to make that discovery.
Examples of queries that can be mapped by the Ngram Viewer:
World War I, Great War;
child care, nursery school, kindergarten; and
fax, phone, and e-mail.
Google has also made the datasets backing the Ngram Viewer freely downloadable so that scholars will be able to create replicable experiments in the style of traditional scientific discovery.
Scholars interested in topics such as philosophy, religion, politics, art, and language have employed qualitative approaches such as literary and critical analysis with great success. As more of the world’s literature becomes available online, it is increasingly possible to apply quantitative methods to complement that research. Since 2004, Google has digitized more than 15 million books worldwide. The datasets Google has now made available to further humanities research are based on a subset of that corpus, weighing in at 500 billion words from 5.2 million books in Chinese, English, French, German, Russian, and Spanish. The datasets contain phrases of up to five words with counts of how often they occurred in each year.
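The downloadable datasets are large plain-text files of phrase counts per year. As a rough sketch of how they could be processed (the tab-separated line layout of n-gram, year, match count, and volume count is an assumption based on the dataset description, and the sample data are invented), yearly counts for a phrase might be tallied like this:

```python
import csv
from collections import defaultdict

def yearly_counts(lines, phrase):
    """Sum occurrence counts per year for one n-gram.

    Assumed line format (tab-separated): ngram, year, match_count, volume_count.
    """
    counts = defaultdict(int)
    for row in csv.reader(lines, delimiter="\t"):
        ngram, year, match_count = row[0], int(row[1]), int(row[2])
        if ngram == phrase:
            counts[year] += match_count
    return dict(counts)

# Invented sample lines, not real dataset values.
sample = [
    "Great War\t1914\t120\t45",
    "Great War\t1918\t340\t90",
    "World War I\t1918\t15\t8",
]
print(yearly_counts(sample, "Great War"))  # {1914: 120, 1918: 340}
```

A time series built this way is exactly what the Ngram Viewer plots when comparing phrases such as "Great War" and "World War I".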
These datasets were the basis of a research project led by Harvard University’s Jean-Baptiste Michel and Erez Lieberman Aiden published in Science in December. The research paper, coauthored by several Googlers and entitled Quantitative Analysis of Culture Using Millions of Digitized Books, provides several examples of how quantitative methods can provide insights into topics as diverse as the spread of innovations, the effects of youth and profession on fame, and trends in censorship. The researchers constructed a corpus of digitized texts containing about 4 percent of all books ever printed. Analysis of this corpus enables researchers to investigate cultural trends quantitatively, and to survey the vast terrain of “culturomics”, focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. The results show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. “Culturomics” extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities.
Google Books Ngram Viewer: web site: http://ngrams.googlelabs.com/
Quantitative Analysis of Culture Using Millions of Digitized Books: www.sciencemag.org/content/early/2010/12/15/science.1199644.abstract
EZproxy hosted service now available from OCLC
The library community’s leading authentication and access solution is now available as a cloud-based hosted service. The new hosted version of EZproxy makes it even easier for libraries to deliver eContent and make services accessible to their users wherever they are, at any time. A pilot version of the hosted service has been active with five participating libraries since April 2009.
“The hosted version of EZproxy will help truly deliver on the promise of its name – a quick and easy installation of a library proxy server for any size library,” explains Don Hamparian, OCLC Senior Product Manager for EZproxy:
A hosted solution provides 24/7 reliability and access – with minimal technical support required from the library. Our goal is to help more libraries manage and provide one-click access to electronic content for their authorized users. Libraries that subscribe to the EZproxy hosted service receive additional benefits, such as:
timely addition of new databases;
reduced reliance on technical staff for initial configuration or ongoing configuration file changes;
a secure environment and security for user information;
24/7/365 access monitoring and reporting on usage;
elimination of local proxy server (or other hardware) maintenance; and
automatic updating for bi-annual enhancements.
EZproxy hosted service is available as a yearly subscription, based on FTE or population served. All hosted implementations will run the latest release of EZproxy, currently version 5.3. Current EZproxy client users can use their existing configuration files when moving to the hosted service. New EZproxy users also receive up to 10 hours of configuration time in the first year’s subscription. OCLC supplies the security certificate for libraries who subscribe to the hosted service.
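The configuration files mentioned above describe each licensed resource as a short stanza of directives. A minimal, hypothetical stanza is sketched below (the database name and URLs are placeholders, not a real subscription):

```
Title Example Journals Online
URL http://www.example-journals.com/login
Domain example-journals.com
```

Because the hosted service accepts existing configuration files, stanzas like this can be carried over from a locally run EZproxy server unchanged.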
The hosted version of EZproxy is currently available for libraries in the USA and Canada. Hosted services will be available for additional regions at a later date.
EZproxy hosted service: www.oclc.org/ezproxy/hosted/
SOAP project (Study of Open Access Publishing) presents final results
The SOAP project presented its final results on January 13, 2011, in Berlin to an audience of publishers, librarians, and funding agencies, including the European Commission.
The project, which runs from March 2009 to February 2011, describes and analyses the open access (OA) publishing landscape and provides facts and evidence allowing libraries, publishers, and funding agencies to assess drivers and barriers, risks, and opportunities in the transition to OA publishing. The project, co-funded by the European Commission, comprises publishers (BioMed Central, SAGE Publications Ltd, Springer Science and Business Media), research institutions (CERN – European Organization for Nuclear Research, Max Planck Society), and funding agencies (STFC – Science and Technology Facilities Council, UK).
The event showcased the main findings of the project regarding:
the OA publishing landscape (www.slideshare.net/ProjectSoap/soap-symposiumtalki);
the beliefs and attitudes of researchers in respect of OA publishing (www.slideshare.net/ProjectSoap/soap-symposiumtalkii); and
the drivers and the barriers for the submission of articles to OA journals (www.slideshare.net/ProjectSoap/soap-symposiumtalkiii).
The key findings of the project are:
The number of OA articles published in “full” or “hybrid” OA journals was around 120,000 in 2009, some 8-10 percent of the estimated yearly global scientific output (see also http://arxiv.org/abs/1010.0506). Journals offering a “hybrid” OA option had a take-up of around 2 percent.
OA journals in several disciplines (including Life Sciences, Medicine, and Earth Sciences) are of outstanding quality, and have impact factors in the top 1-2 percent of their disciplines.
Out of some 40,000 published scholars who answered a large-scale online survey, approximately 90 percent are convinced that OA journals are or would be beneficial for their field. The main reasons given for this view are: benefit for the scientific community as a whole; financial issues; public good; and benefit to the individual scientist. The vast majority disagrees with the idea that OA journals are either of low quality or undermine the process of peer review.
A separate survey of scientists who published in OA journals reveals that their drivers for this choice were the free availability of the content to readers and the quality of the journal, as well as the speed of publication and, in some cases, the fact that no fee had to be paid directly by the author.
The main barriers encountered by 5,000 scientists who would like to publish in OA journals but did not manage to do so are funding (for 39 percent of them) and the lack of journals of sufficient quality in their field (for 30 percent).
Several speakers commented on the SOAP results during the event.
Bettina Goerner, from project-partner Springer, who recently launched a range of fully OA journals under the SpringerOpen brand, found the SOAP results encouraging, while remarking that the publisher’s experience of marketing SpringerOpen to the scientific community matched the SOAP finding that funding is still the biggest obstacle to OA publishing.
Deborah Kahn, from project-partner BioMed Central, commented that the SOAP research on the supply side shows a healthy OA publishing landscape, while BioMed Central and other main OA publishers have recently experienced around 100 percent growth in submission and publication numbers. She remarked that the SOAP results imply further efforts are needed to raise awareness of the quality of OA journals and the availability of funding and waiver schemes, so that no one need be barred from publishing due to lack of funds. Finally, she noted that the SOAP findings indicate stakeholders in OA need to continue to work with funders to ensure the availability of research funds to cover publication fees.
David Ross from project-partner SAGE remarked that the SOAP results show as much appetite for OA in the Social Sciences and Humanities (SSH) as in other disciplines – if not more – and that those from these disciplines who publish OA do not generally pay to do so, although many of those who do pay have their fees covered by their institution. He stated that it was SAGE’s challenge to work with institutions and funding agencies to find a way to enable SSH authors to publish OA, as reflected in SAGE’s recent launch of SAGE Open.
Mark Patterson from the Public Library of Science noted that on the “supply” side, the SOAP data support the view that OA is an established (and growing) part of the publishing landscape, while on the “demand” side, researchers are very supportive of OA; publishers, funders, and institutions must now address the funding flows necessary to support OA publishing and drive its widespread adoption.
Peter Strickland from the International Union of Crystallography, which publishes a large OA journal, commented that in their experience journal quality and impact factor are more important for authors than OA when selecting a journal to publish in. For a small publisher, the SOAP data and results are very helpful in shaping communication with potential authors.
Caroline Sutton, president of the Open Access Scholarly Publishers Association (OASPA), concluded that many of the results confirm what was intuitively felt in the community, while some results – e.g. that 90 percent of Humanities researchers feel that OA would benefit their field – were surprising. She noted in particular three findings of SOAP: that a minority of researchers continue to have doubts about the quality of OA; that funding is difficult to come by for many researchers across academic fields, even in relatively mature OA disciplines such as the biological sciences; and that more information is needed to better understand the long tail of OA journals published as the sole journal of a publisher. She concluded that the SOAP results will provide an excellent benchmark against which future studies can be compared.
During the event, several funding agencies commented on the SOAP results. Robert Kiley from the Wellcome Trust mentioned that they validated three action items for funding agencies to promote OA: to have clear OA policies, enforce them appropriately and work to communicate the benefits of OA to researchers; to make it easier for researchers to access funding to cover OA publishing costs; and to develop better metrics for assessing research outputs. Heinz Pampel from the Helmholtz Association found that the SOAP results translated “feelings” into “facts” and gave the Association a good basis for the further development of their gold OA strategy. Celina Ramjoué of the European Commission, SOAP Project Officer, concluded that the SOAP results illustrate that funding agencies need policies on OA to mainstream this issue in a way linked with, and strengthened by, national OA policies and strategies. In addition, funding agencies should address the issue of OA publishing and how their evaluation systems work in a broader sense, with particular attention to the dominant role of impact factors.
An article describing the highlights of the SOAP Survey is available at: http://arxiv.org/abs/1101.5260. It also marks the official release of the data of the SOAP survey under a CC0 waiver. A manual describing the data is available at: http://bit.ly/gI8nct, while the data in compressed CSV format are available at: http://bit.ly/gSmm71
Berlin8 open access conference papers available
In October 2010, the Berlin8 open access conference was held outside Europe for the first time, in Beijing, China. It was co-hosted by the Chinese Academy of Sciences and the Max Planck Society, and co-organized by the National Science Library, CAS, and the Max Planck Digital Library. The theme of the 2010 conference was implementation progress, best practices, and future challenges.
The goals of the Berlin8 conference were to share and discuss strategies, policies, implementation mechanisms, sustainable infrastructures, and international collaboration for open access to information, especially those led by governmental and funding agencies, research and education institutions, and scholarly communication and knowledge organizations; and to promote effective and sustainable open access in today’s and future digital research, education, and cultural environments.
Materials from the Berlin8 open access conference, including posters, presentation slides, photos, and media reactions, are now available on the conference web site.
Berlin8 open access conference web site: www.berlin8.org
Inspiring research, inspiring scholarship: the value of digitised resources
The UK’s JISC (Joint Information Systems Committee, supporting the use of ICT in Higher and Further Education) has recently released a new report, Inspiring Research, Inspiring Scholarship, looking at the value and impact of digitised resources. Written by Simon Tanner of King’s College London, it considers four broad areas in which the creation of digitised resources has had a significant impact.
The four themes are:
Inspiring research. Digitised resources not only improve access but enable new types of research questions to be asked, as in the Data Mining with Criminal Intent project, which is based on the Proceedings of the Old Bailey, 1674-1913 – www.oldbaileyonline.org/
Bestowing economic benefits. The digitisation of journals, such as the Wellcome Trust Medical Journal Backfiles project, provides free and immediate access for scientists. One digitised journal, the Biochemical Journal, receives over 300,000 uses a month – www.jisc.ac.uk/whatwedo/programmes/digitisation/medicaljournals.aspx
Connecting people and communities. Resources such as Great War Archive, gathering digitised memorabilia from World War I, not only provide new material for scholars, but enable new communities and expertise to be developed outside the campus walls – www.oucs.ox.ac.uk/ww1lit/gwa/
Digital Britain. Digitising some of Britain’s special collections not only provides new data for educators and learners around the world, but also fosters a greater appreciation of the nation’s “prize jewels”; examples include the Freeze Frame collection of polar photographs and the Old Weather resource for transcribing weather reports from naval logbooks – www.freezeframe.ac.uk; www.oldweather.org/
Inspiring Research, Inspiring Scholarship (full report): http://bit.ly/9NjGw6 (pdf file).
More JISC-supported content is available via: www.jisc-content.ac.uk/
Quality in large-scale digitization: UM Professor receives IMLS grant for research
University of Michigan School of Information Associate Professor Paul Conway has received a two-year National Leadership Grant from the federal Institute of Museum and Library Services (IMLS) to support his research on validating quality in large-scale digitization.
Conway’s project, “Validating Quality in Large-Scale Digitization: Metrics, Measurement, and Use-Cases,” grows out of a planning project for evaluating the quality of digital objects in the HathiTrust Digital Library, for which he received a grant from The Andrew W. Mellon Foundation in 2009. HathiTrust is a shared digital repository for the nation’s great research libraries, holding over 7 million digitized volumes.
Mass digitization of books and serials is generating vast digital collections and transforming education and research at all levels. But this transformation has also given rise to questions about the quality and value of some of the digital copies produced by such large-scale projects.
Conway’s project will rigorously address some of these questions:
This research project is an effort to learn what quality means for users in a large-scale digitization program. IMLS’s valuable support will help us take a major step toward automating quality review and sharing the characteristics of digitized books and journals. The School of Information and the University of Michigan Library, in collaboration with HathiTrust and the University of Minnesota Libraries, will investigate possible methods for detecting and measuring errors and other quality issues within mass-digitized literature. They will also analyze the potential impact of found errors on educational and scholarly use within a representative set of use cases: reading online; printing copies; mining texts; and managing print collections.
The findings of this study will make a significant contribution to the field of information quality, and will inform digital repositories about assessing the quality of objects they have committed to preserving on a large-scale:
“Understanding how to judge the quality of the HathiTrust digital deposits will help libraries make decisions about re-digitization of materials and about managing collections of print volumes with secure and useable copies held securely in digital repositories,” said Paul Courant, Dean of Libraries at the University of Michigan. “Libraries will additionally be able to use the results of this research to regain control of what has been our hallmark of service for centuries: the quality and usefulness of the information we make available to our users.”
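To make the idea of automated quality review concrete, here is a toy illustration of one automatable check: flagging pages whose text contains an unusually high proportion of non-textual characters, a crude proxy for OCR noise. This is purely an illustration, not the project's actual methodology; the heuristic, threshold, and sample pages are all invented:

```python
import re

def garbage_ratio(page_text):
    """Fraction of characters that are neither letters, digits,
    whitespace, nor common punctuation -- a crude proxy for OCR noise."""
    if not page_text:
        return 1.0
    noise = re.findall(r"[^A-Za-z0-9\s.,;:'\"!?()\-]", page_text)
    return len(noise) / len(page_text)

def flag_pages(pages, threshold=0.05):
    """Return 1-based page numbers whose garbage ratio exceeds the threshold,
    so a human reviewer can inspect only the suspicious pages."""
    return [i for i, text in enumerate(pages, start=1)
            if garbage_ratio(text) > threshold]

# Invented sample pages: one clean, one with scanner artifacts.
pages = ["A clean page of ordinary prose.",
         "Sc@nn#r n*i\u00a7e \u00bf\u00bd\u00be garbled \u00a4\u00a4\u00a4 text"]
print(flag_pages(pages))  # [2]
```

Real quality metrics for digitized books would need to cover far more (skew, cropping, missing pages, image artifacts), which is precisely what makes the use-case-driven research described above necessary.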
IMLS grants support projects that advance the ability of museums, archives, and libraries to preserve culture, heritage, and knowledge, and contribute to building 21st century technology infrastructure and information technology services. National Leadership Grants are the largest museum and library joint grant programs administered by IMLS. “National Leadership grantees help us better understand and advance best practice in museums, libraries, and archives,” said IMLS Acting Director Marsha L. Semmel.
Paul Conway is the faculty coordinator of the Preservation of Information specialization in the Master of Science in Information program at the University of Michigan’s School of Information. Before coming to SI, he served as director for information technology services and director for digital asset initiatives at Duke University and as head of the Preservation Department at Yale University Library.
Full press release: www.si.umich.edu/about-SI/paul_conway_imls_release.pdf
VRA Core data standard now hosted by Library of Congress
The VRA Core is a data standard for the description of works of visual culture as well as the images that document them. The standard is now being hosted by the Network Development and MARC Standards Office of the Library of Congress (LC) in partnership with the Visual Resources Association. VRA Core’s schemas and documentation are now accessible at: www.loc.gov/standards/vracore/ while user support materials, such as VRA Core examples, FAQs and presentations, will continue to be accessible at: www.vraweb.org/projects/vracore4/
In addition, a new listserv called the Core list (VRACORE@LOC.GOV) has been created. The Core list is an unmoderated forum where members of the VRA Core community can share questions, ideas, and tools in a mutually supportive environment. The list is operated by the Library of Congress Network Development and MARC Standards Office. Users may subscribe by filling out the subscription form at the VRACORE listserv site.
Questions about the VRA Core’s schemas, documentation, and user support materials should be directed to email@example.com. Questions about the LC-hosted Core list should be directed to firstname.lastname@example.org.
VRA Core home: www.loc.gov/standards/vracore/
VRACORE listserv site: http://listserv.loc.gov/listarch/vracore.html
Digital preservation education web site offers guidance, resources for everyone
The North Carolina Department of Cultural Resources’ State Library and State Archives (Cultural Resources) have announced a new web site to guide the local and state government employees responsible for preserving the state’s public record. The web site, http://digitalpreservation.ncdcr.gov, has resources that can help North Carolina government employees – and those responsible for digital information in general – learn how to ensure that today’s digital information is saved so that it can become tomorrow’s heritage.
For those new to the concept, tutorials explain what digital preservation is and why it is important, and a checklist of key digital preservation practices helps integrate digital preservation activities into day-to-day workflows. For those already incorporating digital preservation into their daily work, more advanced guidance is provided, including “quick tips”, tutorials, institutions and research projects to watch, and much more.
While the site is directed toward North Carolina public servants, it is general enough to be useful to those considering, implementing, or teaching on the topic. Anyone interested in learning more about digital preservation is invited to stop by the web site.
For more information, please visit: http://digitalpreservation.ncdcr.gov/
Using controlled vocabularies to enhance discovery and retrieval online
Patricia Harpring’s book, Introduction to Controlled Vocabularies: Terminology for Art, Architecture, and Other Cultural Works, is now available on the web. Each chapter is available as HTML and also as a printer-friendly PDF.
Patricia Harpring is Managing Editor of the Vocabulary Program at the Getty Research Institute. This detailed book is a “how-to” guide to building controlled vocabulary tools, cataloging and indexing cultural materials with terms and names from controlled vocabularies, and using vocabularies in search engines and databases to enhance discovery and retrieval online.
Also covered are the following: What are controlled vocabularies and why are they useful? Which vocabularies exist for cataloging art and cultural objects? How should they be integrated in a cataloging system? How should they be used for indexing and for retrieval? How should an institution construct a local authority file? The links in a controlled vocabulary ensure that relationships are defined and maintained for both cataloging and retrieval, clarifying whether a rose window and a Catherine wheel are the same thing, or how pot-metal glass is related to the more general term stained glass. The book provides organizations and individuals with a practical tool for creating and implementing vocabularies as reference tools, sources of documentation, and powerful enhancements for online searching.
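The relationships the book describes can be sketched as a tiny authority file: each preferred term carries its variant (use-for) terms and a link to a broader term, so a search engine can map "Catherine wheel" to "rose window" and expand "pot-metal glass" up to "stained glass". The data structure and functions below are an illustrative toy, not the book's or the Getty's actual format:

```python
# Toy authority file built from the examples in the text:
# preferred term -> variant terms and a broader-term link.
AUTHORITY = {
    "rose window": {"variants": ["Catherine wheel"], "broader": "window"},
    "pot-metal glass": {"variants": [], "broader": "stained glass"},
    "stained glass": {"variants": [], "broader": "glass"},
}

def preferred(term):
    """Map a variant to its preferred term (identity if already preferred)."""
    for pref, rec in AUTHORITY.items():
        if term == pref or term in rec["variants"]:
            return pref
    return term

def expand(term):
    """Expand a query term to its preferred form plus all broader terms,
    so a search for a narrow concept also retrieves more general records."""
    terms, current = [], preferred(term)
    while current:
        terms.append(current)
        current = AUTHORITY.get(current, {}).get("broader")
    return terms

print(preferred("Catherine wheel"))  # rose window
print(expand("pot-metal glass"))     # ['pot-metal glass', 'stained glass', 'glass']
```

Even this miniature example shows why maintained links matter: without the variant and broader-term relationships, a catalog search for "Catherine wheel" would silently miss every record indexed under "rose window".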
Selected titles from the Getty Research Institute are available in electronic formats. Many are geared toward students and professionals working with cultural heritage resources and aim to advance knowledge of standards and best practices in the management of information in libraries, archives, and museums. These books can be read online and downloaded for future use; most are also available in print format for purchase.
Introduction to controlled vocabularies: www.getty.edu/research/publications/electronic_publications/intro_controlled_vocab/
Getty Research Institute Electronic publications: www.getty.edu/research/publications/electronic_publications/
Grey literature knowledge base now available via open access
As of mid-October 2010, the most current and comprehensive knowledge base in the field of grey literature is openly accessible to researchers, students of library and information science, as well as communities of information practitioners and professionals across science and technology.
For over 15 years, GreyNet has sought to serve researchers and authors in the field of grey literature. To further this end, GreyNet signed on to the OpenSIGLE Repository, seeking to preserve and make openly accessible research results originating in the International Conference Series on Grey Literature. GreyNet, together with INIST-CNRS, designed the format for a metadata record, which encompasses standardized PDF attachments for full-text conference preprints, PowerPoint presentations, abstracts, and biographical notes. All eleven volumes (1993-2009) of the GL Conference Proceedings are now available in the OpenSIGLE Repository.
Grey Literature Conference Proceedings: www.greynet.org/opensiglerepository.html
GreyNet International home: www.greynet.org/
Census Bureau current industrial report being digitally archived
Southern New Hampshire University (SNHU) is digitally archiving annual Current Industrial Reports from the US Census Bureau. The US Census Bureau’s Current Industrial Report (CIR) program provides annual measures of industrial activity. According to the US Census Bureau:
The primary objective of the CIR program is to produce timely, accurate data on production and shipments of selected products. The data are used to satisfy economic policy needs and for market analysis, forecasting, and decision-making in the private sector.
The Current Industrial Reports data is being uploaded in phases, with all data from 1993-present scheduled to be available as of October 29, 2010. Historic data will be saved in the SNHU Academic Archive, the university’s institutional repository.
To search or browse the data, visit: http://academicarchive.snhu.edu/xmluit/handle/10474/570