Miquel Termens, Mireia Ribera and Anita Locher
Abstract
Purpose
The purpose of this paper is to analyze the file formats of the digital objects stored in two of the largest open-access repositories in Spain, DDUB and TDX, and to determine the implications of these formats for long-term preservation, focusing in particular on the different versions of PDF.
Design/methodology/approach
To be able to study the two repositories, the authors harvested all the files corresponding to every digital object and some of their associated metadata using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and Open Archives Initiative Object Reuse and Exchange (OAI-ORE) protocols. The file formats were analyzed with DROID software and some additional tools.
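The harvesting step described above can be sketched in miniature. The snippet below parses an OAI-PMH `ListRecords` response and maps each record identifier to its declared `dc:format` values; the sample XML, identifier and record content are illustrative stand-ins, not data from the paper, and a real harvest would fetch successive pages from the repository's OAI-PMH endpoint and follow `resumptionToken`s.

```python
import xml.etree.ElementTree as ET

# Illustrative ListRecords response; a real harvest would fetch pages of
# these over HTTP from the repository's OAI-PMH endpoint.
SAMPLE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1234</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample thesis</dc:title>
          <dc:format>application/pdf</dc:format>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def extract_formats(xml_text):
    """Map each OAI record identifier to its declared dc:format values."""
    root = ET.fromstring(xml_text)
    out = {}
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext(".//oai:identifier", namespaces=NS)
        out[ident] = [f.text for f in rec.iterfind(".//dc:format", NS)]
    return out

print(extract_formats(SAMPLE))
# → {'oai:example.org:1234': ['application/pdf']}
```

The declared `dc:format` metadata collected this way can then be checked against the identifications that DROID produces from the files themselves.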
Findings
The results show that there is no alignment between the preservation policies declared by institutions, the technical tools available, and the actual stored files.
Originality/value
The results show that the file controls currently applied to institutional repositories do not suffice to fulfil their stated mission of long-term preservation of scientific literature.
Roland Erwin Suri and Mohamed El-Saad
Abstract
Purpose
Changes in file format specifications challenge long-term preservation of digital documents. Digital archives thus often focus on specific file formats that are well suited for long-term preservation, such as the PDF/A format. Since only a few customers submit PDF/A files, digital archives may consider converting submitted files to the PDF/A format. The paper aims to discuss these issues.
Design/methodology/approach
The authors evaluated three software tools for batch conversion of common file formats to PDF/A-1b: LuraTech PDF Compressor, Adobe Acrobat XI Pro and 3-Heights™ Document Converter by PDF Tools. The test set consisted of 80 files, with 10 files each of the eight file types JPEG, MS PowerPoint, PDF, PNG, MS Word, MS Excel, MSG and “web page.”
Findings
Batch processing was sometimes hindered by stops that required manual intervention; depending on the software tool, three to four such stops occurred during batch processing of the 80 test files. Furthermore, the conversion tools sometimes failed to produce output files even for supported file formats: between three (Adobe Pro) and seven (LuraTech and 3-Heights™) PDF/A-1b files were not produced. Since Adobe Pro does not convert e-mails, a total of 213 PDF/A-1b files were produced (240 attempted conversions, minus the ten e-mails that Adobe Pro skips and the 17 conversion failures). The faithfulness of each conversion was investigated by comparing the visual appearance of the input document with that of the produced PDF/A-1b document on a computer screen. Meticulous visual inspection revealed that the conversion to PDF/A-1b impaired the information content of 24 of the 213 converted files (11 percent). These reproducibility errors included loss of links, loss of other document content (unreadable characters, missing text, missing document parts), updated fields (reflecting the time and folder of conversion), vector graphics issues and spelling errors.
Originality/value
These results indicate that large-scale batch conversions of heterogeneous files to PDF/A-1b cause complex issues that need to be addressed for each individual file. Even with considerable efforts, some information loss seems unavoidable if large numbers of files from heterogeneous sources are migrated to the PDF/A-1b format.
E.G. Sieverts, M. Hofstede, Ph.H. Haak, P. Nieuwenhuysen, G.A.M. Scheepsma, L. Veeger and G.C. Vis
Abstract
This article lists and compares specifications, properties, and test results of microcomputer software for information storage and retrieval. Nine different programs which fall into the category of classical retrieval systems (see Part I of this series) have been tested and assessed: BIB/SEARCH, CARDBOX‐PLUS, CDS/ISIS, FREEBASE, HEADFAST, IDEALIST, INMAGIC, NUTSHELL‐PLUS, and POLYDOC. All of them run under MS‐DOS. For each of these nine programs about 100 facts and test results are tabulated. Each program is also discussed individually.
Abstract
Downloading and uploading offer labour‐saving advantages and are now accepted as useful options in online searching. All aspects are considered here, from recent technical advances to applications and legal attitudes. There is also a review of current software for downloading. Recent developments show a trend towards higher internal memory and storage capacity, and greater transmission speeds. Packages now offer access to more than one host, give maximum assistance to the user without being menu‐driven, and incorporate the latest developments in artificial intelligence. Disadvantages are the length of time involved in the process and the fact that the legal issue of copyright has not yet been finalised. Database producers have turned to licensing under contract law, but there is still a need to rely on user ethics, and the need for a standard permissions form is highlighted.
Alexandra Dolan-Mescal, Marcie Farwell, Sara Howard, Jessica Rozler and Matthew Smith
Abstract
Purpose
This paper reports an inventory of digital resources for the Queens College Special Collections and Archives, conducted with two purposes. The first was to assess the digital resources of a department too understaffed to address digital preservation and to provide a step-by-step program for it to start thinking in the long term. The second was to show how these steps can be generalized for the many institutions just starting to acquire digital holdings and looking to create a long-term digital preservation plan.
Design/methodology/approach
The main method for research involved taking a significant sampling of the department’s digital holdings and conducting an inventory of them, analyzing such characteristics as file size, names, formats and metadata. After the inventory was conducted, recommendations were made to the department based on best practices in the field of digital preservation.
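The inventory step above can be approximated with a short script. This is a minimal sketch, not the authors' actual method: it walks a directory tree and tallies file counts and total bytes per extension, the kind of size-and-format profile the abstract describes; the file names and directory layout in the demo are invented for illustration.

```python
import os
import tempfile
from collections import Counter

def inventory(root):
    """Tally file count and total bytes per extension under a directory tree."""
    counts, sizes = Counter(), Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower() or "(none)"
            counts[ext] += 1
            sizes[ext] += os.path.getsize(os.path.join(dirpath, name))
    return counts, sizes

# Demo on a throwaway tree (names and sizes are illustrative, not from the paper).
with tempfile.TemporaryDirectory() as tmp:
    for name, data in [("a.tif", b"x" * 10), ("b.TIF", b"x" * 20), ("notes.doc", b"x" * 5)]:
        with open(os.path.join(tmp, name), "wb") as f:
            f.write(data)
    counts, sizes = inventory(tmp)
    print(dict(counts))   # {'.tif': 2, '.doc': 1}
    print(dict(sizes))    # {'.tif': 30, '.doc': 5}
```

A profile like this makes it easy to spot at-risk or proprietary formats and unusually large holdings before drafting preservation recommendations.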
Findings
We found that while the department generally does not follow industry best practices for preservation, the files were relatively new and, therefore, many issues could still be fixed. With a concrete plan and a bit of effort, their digital files can be made more easily accessible and protected against future threats.
Originality/value
The issues that the Department of Special Collections had with their digital holdings are similar to those at many other institutions – especially educational ones where staff turnover is high. This case study could help similar small organizations start to access their digital holdings and start formulating a plan for long-term preservation.
Abstract
The emergence of online databases represents a shift from providing a physical entity, a book or an article, to the more abstract concept of providing or transferring information. The role of the database developer/analyst in that shift is that of an information retrieval ‘cataloger’ responsible for determining the access points supported by the database's contents, much as a traditional library cataloger defines, describes, and classifies the intellectual content of a book and ‘maps’ it into the library's card catalog. This is only one of several parallels between the functions of an information retrieval service and a traditional library. For example, users ‘check‐out’ information from both, but while a circulation staff shifts the collection to accommodate growth, a retrieval service updates databases and allocates additional disk space to allow for expansion.

Describing the tasks required in developing a database for online searching is the purpose of this paper. The requisite tasks for database development are: analysis, design, conversion, testing, loading, and documentation. Analysis involves a determination of file format (fixed, stream, directory) across all years of the file. Design requires understanding file content, the needs of the end user, and retrieval system standards and features. Conversion is accomplished by file generation programs that convert input data into the searchable and printable fields of an online database. Testing consists of debugging the conversion program and adjusting the original design to accommodate aberrant data conditions. Loading is the creation, by the file generation programs, of disk files to be accessed by the retrieval system. Documentation is the transfer of experience and knowledge about file content and system features from file designer to file user.

The task of designing databases for an online information retrieval service requires more than data processing expertise. It also requires an intellectual understanding of the information‐seeking behavior and needs of the users of the database in general, and users in the subject area in particular. Information professionals from outside the purely EDP area are requisite to support the technical analysis, design, and development of databases for online searching. For it is upon their broad‐based understanding, translated technically into access points, database and system features, that any information retrieval service bases its successful operation. Database development, then, is the hub of the wheel in such a service, much as descriptive cataloging and subject classification are the intellectual underpinning of libraries, upon which all other services are based.
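The conversion task described above — turning fixed-format input records into the named, searchable fields of an online database — can be sketched as follows. The field layout, names and sample record are invented for illustration; a real file generation program would take the layout from the analysis phase.

```python
# Hypothetical fixed-width layout: (field name, start, end) column positions.
# These positions are illustrative, not from any actual database file.
LAYOUT = [("accession", 0, 8), ("year", 8, 12), ("title", 12, 52)]

def convert_record(line):
    """Slice one fixed-width input line into named, searchable fields."""
    return {name: line[start:end].strip() for name, start, end in LAYOUT}

# A made-up 52-character input record.
record = convert_record("AB001234" + "1986" + "Database development tasks".ljust(40))
print(record)
# → {'accession': 'AB001234', 'year': '1986', 'title': 'Database development tasks'}
```

Stream and directory formats would need their own parsers, and the testing phase would exercise exactly this kind of function against aberrant input lines.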
Abstract
Purpose
Increasing amounts of the information in libraries are created and stored in digital formats. These files may or may not be accessible five or ten years from now.
Design/methodology/approach
Describes the steps we can take to minimize the danger of losing our data to the ravages of time. The aim of this article is to look at the vulnerabilities of various file formats and storage media, and offer practical advice for preservation planning.
Findings
The only protection libraries possess is planning.
Originality/value
Stresses the need to have a disaster recovery plan in place.
E.G. Sieverts, M. Hofstede, A. Nieuwland, C. Groeneveld and B. de Zwart
Abstract
In this article, the sixth in a series on microcomputer software for information storage and retrieval, test results of nine programs are presented and various properties and qualities of these programs are discussed. We discuss additional programs for information storage and retrieval and for text retrieval from several of the various categories which have been looked at in previous instalments. One new (secondary) type of ISR software is defined as administrative software. The programs reviewed in this issue are BRS‐Search, dtSearch, InfoBank, Micro‐OPC, Q&A, STN‐PFS, Strix, TINman and ZYindex. All but dtSearch and ZYindex can be regarded as primarily classical retrieval packages; Q&A boasts comprehensive administrative features as well; dtSearch and ZYindex are indexing programs. For ZYindex a new Windows version has been tested. All other programs run under MS‐DOS. For each of the nine programs about 100 facts and test results are tabulated. All the programs are individually discussed as well.
Abstract
No single word processing program suits everybody equally well. The same can be said for spreadsheets, databases, drawing packages and even computers. Different people prefer to use different tools to create and manipulate machine‐readable data, and sometimes people need to share data with others or move data from one application to another. Not only is it silly to re‐enter data already in machine‐readable form, it may not be necessary. The author discusses categories of file and data conversion, with some examples of techniques available for conversion. He also discusses “interoperability,” an awful word for the useful ability to operate on the same data from more than one program—not merely convert data, but use and modify it without conversion. The author includes a sidebar on The Trailing Edge, his new column that will appear in Library Hi Tech beginning with the next issue.
Abstract
This article describes the evolution of the design of the FCLA digital archive, a preservation repository under development for the libraries of the public universities of Florida. The starting assumptions of the designers were challenged as they moved from theory towards implementation. The logic leading to changes in policy and in preservation strategies is described.