Search results

1 – 10 of over 52000
Article
Publication date: 9 September 2014

Quan Lu, Gao Liu and Jing Chen

Abstract

Purpose

The purpose of this paper is to propose a novel approach to integrating a portable document format (PDF) interface into a Java-based digital library application. It bridges the gap between conducting content operations on PDF documents and viewing them asynchronously.

Design/methodology/approach

In this paper, the authors first review related research and discuss PDF and its drawbacks. Next, the authors propose the design steps and implementation of three modes of displaying PDF documents: PDF display, image display and extensible markup language (XML) display. The three modes are then compared.

Findings

The authors find that the PDF display presents the original PDF document contents completely and is thus clearly superior to the other two displays. In addition, the format specification of PDF-based e-books does not perform well: a lack of standardization and a complex structure are exposed in publication.

Practical implications

The proposed approach makes viewing the PDF documents more convenient and effective, and can be used to retrieve and visualize the PDF documents and to support the personalized function customization of PDF in the digital library applications.

Originality/value

This paper proposes a novel approach to solving the problem of synchronizing content operations with the view of a PDF, providing users with a new tool to retrieve and reuse PDF documents. It contributes to improving the service specification and policy for viewing PDFs in digital libraries. In addition, the personalized interface and public index make further development and application more feasible.

Details

Library Hi Tech, vol. 32 no. 3
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 6 June 2018

Roland Erwin Suri and Mohamed El-Saad

Abstract

Purpose

Changes in file format specifications challenge long-term preservation of digital documents. Digital archives thus often focus on specific file formats that are well suited for long-term preservation, such as the PDF/A format. Since only few customers submit PDF/A files, digital archives may consider converting submitted files to the PDF/A format. The paper aims to discuss these issues.

Design/methodology/approach

The authors evaluated three software tools for batch conversion of common file formats to PDF/A-1b: LuraTech PDF Compressor, Adobe Acrobat XI Pro and 3-Heights™ Document Converter by PDF Tools. The test set consisted of 80 files, with 10 files each of the eight file types JPEG, MS PowerPoint, PDF, PNG, MS Word, MS Excel, MSG and “web page.”

Findings

Batch processing was sometimes hindered by stops that required manual intervention. Depending on the software tool, three to four of these stops occurred during batch processing of the 80 test files. Furthermore, the conversion tools sometimes failed to produce output files even for supported file formats: three (Adobe Pro) up to seven (LuraTech and 3-Heights™) PDF/A-1b files were not produced. Since Adobe Pro does not convert e-mails, a total of 213 PDF/A-1b files were produced. The faithfulness of each conversion was investigated by comparing the visual appearance of the input document with that of the produced PDF/A-1b document on a computer screen. Meticulous visual inspection revealed that the conversion to PDF/A-1b impaired the information content in 24 of the 213 converted files (11 percent). These reproducibility errors included loss of links, loss of other document content (unreadable characters, missing text, missing document parts), updated fields (reflecting the time and folder of conversion), vector graphics issues and spelling errors.
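The counts in these findings can be cross-checked with a short calculation. The sketch below is only a plausibility check, not part of the study: it assumes that Adobe Pro skipped all ten MSG (e-mail) files, that the failure counts were three for Adobe Pro and seven each for LuraTech and 3-Heights, and that failures are counted among supported formats only. The dictionary layout and the `produced` helper are illustrative names.

```python
# Plausibility check of the reported totals (assumptions noted in comments).
FILES_PER_TYPE = 10
FILE_TYPES = ["JPEG", "MS PowerPoint", "PDF", "PNG",
              "MS Word", "MS Excel", "MSG", "web page"]

# Assumption: per-tool failures are 7, 3 and 7, reading "three (Adobe Pro)
# up to seven (LuraTech and 3-Heights)" as seven for each of the latter two.
tools = {
    "LuraTech PDF Compressor": {"unsupported": [], "failed": 7},
    "Adobe Acrobat XI Pro": {"unsupported": ["MSG"], "failed": 3},
    "3-Heights Document Converter": {"unsupported": [], "failed": 7},
}


def produced(tool: dict) -> int:
    """Files attempted (supported types only) minus files that failed."""
    supported_types = len(FILE_TYPES) - len(tool["unsupported"])
    attempted = FILES_PER_TYPE * supported_types
    return attempted - tool["failed"]


total_produced = sum(produced(tool) for tool in tools.values())
impaired = 24  # files with impaired information content, as reported
share = impaired / total_produced

print(total_produced)   # 213
print(f"{share:.1%}")   # 11.3%
```

Under these assumptions the calculation reproduces both the 213 produced files and the rounded 11 percent figure.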

Originality/value

These results indicate that large-scale batch conversions of heterogeneous files to PDF/A-1b cause complex issues that need to be addressed for each individual file. Even with considerable efforts, some information loss seems unavoidable if large numbers of files from heterogeneous sources are migrated to the PDF/A-1b format.

Details

Library Hi Tech, vol. 39 no. 2
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 1 March 2003

Kathy Konicek, Joy Hyzny and Richard Allegra

Abstract

Electronic reserves help registered campus users who need anytime‐access to documents. Electronic reserves comprise digital files, mostly in HTML or PDF format. In some circumstances the HTML or PDF file is “readable” to the sighted individual, but is either partially or completely unreadable to the visually impaired using assistive technology. Creating “accessible” PDF files poses more challenges than creating “accessible” HTML files. Several options are suggested to help solve this problem.

Details

Library Hi Tech, vol. 21 no. 1
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 1 January 2006

Susan J. Sullivan

Abstract

Purpose

This article sets out to explain the purpose of PDF/A, how it addresses archival and records management concerns, how PDF/A was designed to have “desirable properties of a long‐term preservation format”, and the future of PDF/A.

Design/methodology/approach

The contents of this article are based on the author's knowledge and experience of the subject.

Findings

It is emphasized that PDF/A must be implemented in conjunction with policies and procedures, including quality assurance procedures to ensure acceptable replication of source material.

Originality/value

This article will be of interest to anyone working with PDF files. Work has already begun on PDF/A Part 2 which will be based on PDF 1.6. Application notes and a listing of frequently asked questions will be made publicly available to assist developers of PDF/A applications to better understand the requirements of the file format and provide implementation guidance.

Details

Records Management Journal, vol. 16 no. 1
Type: Research Article
ISSN: 0956-5698

Article
Publication date: 31 October 2018

Julius T. Nganji

Abstract

Purpose

This paper aims to suggest how the information journey of students with disabilities could be facilitated, by first revealing the existence of inaccessible formats such as Portable Document Format (PDF) and then suggesting the inclusion of alternative formats of accessible learning materials, thus improving retrieval.

Design/methodology/approach

A sample of 400 articles published over 10 years (2009-2018) from four journals are selected and analysed for accessibility against the Web Content Accessibility Guidelines WCAG 2.0 by using automated accessibility checkers, a screen reader and manual human expertise. The results are presented and recommendations made on improving accessibility.

Findings

The findings suggest that the PDF versions of the selected journal articles are not accessible for screen reader users but could be improved by adopting accessible and inclusive practices. Including alternative formats of the learning materials could help support the student information journey.

Research limitations/implications

The results of the study might not be very representative of all the articles in the journals given the small sample size. Additionally, the criteria used in the study do not consider all existing disabilities. Thus, although the PDFs may be inaccessible for some people with disabilities, they may be accessible to others.

Practical implications

Given that PDFs seem to be the preferred format of journal articles online, there is potential for a difficult information journey for some students due to the limitations posed by inaccessibility of the PDFs. Thus, it is recommended to include alternative formats which could be more accessible, giving the student the choice of accessing the learning materials in their preferred format.

Social implications

If students are unable to access the learning materials that are required for their course, this could lead to poor grades, which might negatively affect the students’ morale. In some cases, students might drop out.

Originality/value

This study analyses the accessibility of learning materials provided by a third party (journal publishers) and how they affect the student, something that is not usually given much importance when research in accessibility is carried out.

Details

Information and Learning Science, vol. 119 no. 12
Type: Research Article
ISSN: 2398-5348

Article
Publication date: 20 November 2009

Michael Seadle

Abstract

Purpose

The purpose of this paper is to consider whether PDF formats are appropriate for long‐term digital archiving.

Design/methodology/approach

The approach takes the form of examining how well PDF's capabilities fit eReader devices that future scholars may use in addition to or instead of paper print‐outs.

Findings

Fixity is the advantage that PDF offers for archiving, while its alternatives generally offer greater flexibility for eReader devices. The question for long‐term digital archiving is whether fixity or flexibility best suits the interests of future readers.

Originality/value

PDF is widely accepted as a digital archiving format and PDF documents are found in virtually every repository. There has, however, been little discussion as to whether the fixed format is not in fact a long‐term disadvantage.

Details

Library Hi Tech, vol. 27 no. 4
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 1 July 2014

Wen-Feng Hsiao, Te-Min Chang and Erwin Thomas

Abstract

Purpose

The purpose of this paper is to propose an automatic metadata extraction and retrieval system to extract bibliographical information from digital academic documents in portable document formats (PDFs).

Design/methodology/approach

The authors use PDFBox to extract text and font size information, a rule-based method to identify titles, and a Hidden Markov Model (HMM) to extract the titles and authors. Finally, the extracted titles and authors (possibly incorrect or incomplete) are sent as query strings to digital libraries (e.g. ACM, IEEE, CiteSeerX, SDOS, and Google Scholar) to retrieve the rest of the metadata.
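To illustrate what a rule-based title step over text and font size information might look like, here is a minimal sketch. It is not the authors' method: the abstract does not state their rules, so the heuristic (take the first consecutive run of lines set in the largest font on the first page) is an assumption, as are the `Line` type, the `guess_title` function and the `tolerance` parameter. The input is hard-coded instead of being read through PDFBox.

```python
from typing import List, NamedTuple


class Line(NamedTuple):
    """One text line with its font size (hypothetical extraction output)."""
    text: str
    font_size: float


def guess_title(lines: List[Line], tolerance: float = 0.5) -> str:
    """Join the first consecutive run of lines set in the largest font."""
    if not lines:
        return ""
    largest = max(line.font_size for line in lines)
    title_parts: List[str] = []
    for line in lines:
        if abs(line.font_size - largest) <= tolerance:
            title_parts.append(line.text.strip())
        elif title_parts:
            break  # the run of largest-font lines has ended
    return " ".join(title_parts)


# Made-up first-page content, for illustration only.
first_page = [
    Line("Journal of Examples, vol. 1", 8.0),
    Line("Automatic metadata extraction", 18.0),
    Line("from PDF documents", 18.0),
    Line("A. Author and B. Writer", 11.0),
    Line("Abstract", 10.0),
]

title = guess_title(first_page)
print(title)  # Automatic metadata extraction from PDF documents
```

A heuristic like this fails on pages where a journal banner uses the largest font, which is one reason a statistical model such as an HMM might be layered on top of it.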

Findings

Four experiments are conducted to examine the feasibility of the proposed system. The first experiment compares two HMM models: a multi-state model and a one-state model (the proposed model). The result shows that the one-state model performs comparably to the multi-state model, but is better suited to dealing with real-world unknown states. The second experiment shows that the proposed model (without the aid of online queries) performs as well as other researchers' models on the Cora paper header dataset. The third experiment examines the performance of the system on a small dataset of 43 real PDF research papers. The result shows that the proposed system (with online queries) performs well on bibliographical data extraction and even outperforms the free citation management tool Zotero 3.0. Finally, the fourth experiment uses a larger dataset of 103 papers to compare the system with Zotero 4.0. The result shows that the system significantly outperforms Zotero 4.0. The feasibility of the proposed model is thus justified.

Research limitations/implications

In academic terms, the system is unique in two respects. First, the system uses only the Cora header set for HMM training, without other tagged datasets or gazetteer resources, which means the system is light and scalable. Second, the system is workable and can be applied to extracting metadata from real-world PDF files. The extracted bibliographical data can then be imported into citation software such as EndNote or RefWorks to increase researchers’ productivity.

Practical implications

In practical terms, the system can outperform the existing tool, Zotero 4.0. This gives practitioners a good opportunity to develop similar products for real applications, though it might require some knowledge of HMM implementation.

Originality/value

The HMM implementation is not novel. What is innovative is that it combines two HMM models. The main model is adapted from Freitag and McCallum (1999), and the authors add word features of the Nymble HMM (Bikel et al., 1997) to it. The system is workable even without manually tagging the datasets before training the model (the authors train on the Cora dataset alone and test on real-world PDF papers), which differs significantly from what other works have done so far. The experimental results provide sufficient evidence of the feasibility of the proposed method in this respect.

Details

Program, vol. 48 no. 3
Type: Research Article
ISSN: 0033-0337

Article
Publication date: 24 April 2009

Luis M. de Campos, Juan M. Fernández‐Luna, Juan F. Huete, Carlos J. Martín‐Dancausa, Antonio Tagua‐Jiménez and Carmen Tur‐Vigil

Abstract

Purpose

The purpose of this paper is to present an overview of the reorganisation of the Andalusian Parliament's digital library to improve the electronic representation and access of its official corpus by taking advantage of a document's internal organisation. Video recordings of the parliamentary sessions have also been integrated with their corresponding textual transcriptions.

Design/methodology/approach

After analysing the state of the Andalusian Parliament's digital library and determining the aspects that could be improved both in the repository and access mechanisms, this paper describes each component of the developed integrated information system.

Findings

A methodology has been developed to tackle the problem and this could be applied to other similar institutions and organisations. Exploiting the internal structure of the parliament's official documents has also proved to be extremely interesting for users as they are directed towards the most relevant parts of the documents.

Originality/value

The paper presents an application of an information retrieval system for structured documents to a real framework and the integration of multimedia sources (e.g. text and video) for retrieval purposes.

Details

Program, vol. 43 no. 2
Type: Research Article
ISSN: 0033-0337

Article
Publication date: 1 March 2000

Lynne C. Chivers

Abstract

In recent years there has been an increasing move towards publishing in electronic format, sometimes as facsimiles of paper originals and sometimes “born digital”. For anyone concerned with publishing and document supply, whether as a supplier or customer, this trend cannot be ignored. The British Library ran a wide programme of projects (“Initiatives for Access”) to develop an understanding of how digital technologies might be used to support and enhance its operations and some of those particularly relevant to document supply are described. These projects were valuable in highlighting many of the issues to be considered in the use of electronic documents in document supply and an overview of these is given from four perspectives, namely business management, operational, technical and customer. Finally, the library’s current “Digital Library” programme is mentioned, together with two recent additions to the library’s remote service operations: Ariel® for electronic delivery and an integrated electronic journal store.

Details

Interlending & Document Supply, vol. 28 no. 1
Type: Research Article
ISSN: 0264-1615

Article
Publication date: 18 June 2019

Hirak Jyoti Hazarika and S. Ravikumar

Abstract

Purpose

This paper aims to provide an overview of the need for, and current development of, document viewers for digitized objects in DSpace repositories, including a local viewer developed for a document collection of items such as research papers, theses and dissertations.

Design/methodology/approach

The authors developed a concept for preserving and storing all types of documents in one database with the help of open source software (OSS). The authors used JavaScript programming to integrate and develop the system.

Findings

The major finding of the work is that large document files can be accommodated in DSpace without modifying the original documents, while allowing the documents to be viewed in different dimensions as a specialist needs. The combination of current technologies such as Google Doc Viewer and the Internet Archive Book Reader, as well as the growing number of digital repositories hosting digitized content, suggests that the DSpace community will probably benefit with an “out-of-the-box.”

Originality/value

In addition to exploring the opportunities of OSS implementation in different research institutes, the study covers issues related to the implementation of the open source repository. This is the first time in India, and in DSpace history, that a document viewer has been created and developed with the help of DSpace OSS. The study should be of value to library professionals and developers seeking to understand the market in the context of OSS.

Details

Library Hi Tech News, vol. 36 no. 5
Type: Research Article
ISSN: 0741-9058
