F. Canan Pembe and Tunga Güngör
Abstract
Purpose
The purpose of this paper is to develop a new summarisation approach, namely structure‐preserving and query‐biased summarisation, to improve the effectiveness of web searching. During web searching, one aid for users is the document summaries provided in the search results. However, the summaries provided by current search engines have limitations in directing users to relevant documents.
Design/methodology/approach
The proposed system consists of two stages: document structure analysis and summarisation. In the first stage, a rule‐based approach is used to identify the sectional hierarchies of web documents. In the second stage, query‐biased summaries are created, making use of document structure both in the summarisation process and in the output summaries.
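The query-biased, structure-preserving idea of the second stage can be illustrated with a small sketch. This is a hypothetical scoring scheme, not the authors' actual algorithm: sentences are scored by query-term overlap within each section, so the output summary retains the document's sectional hierarchy.

```python
# Hypothetical sketch: score sentences by query-term overlap, grouped by
# section, so the summary preserves the document's sectional hierarchy.

def query_biased_summary(sections, query, per_section=1):
    """sections: dict mapping section title -> list of sentences."""
    terms = set(query.lower().split())
    summary = {}
    for title, sentences in sections.items():
        scored = sorted(
            sentences,
            key=lambda s: len(terms & set(s.lower().split())),
            reverse=True,
        )
        # Keep only sections with at least one query hit.
        best = [s for s in scored[:per_section]
                if terms & set(s.lower().split())]
        if best:
            summary[title] = best
    return summary

doc = {
    "Introduction": ["Web search engines return short snippets.",
                     "Snippets often omit document structure."],
    "Method": ["We build summaries biased toward the user's query."],
}
print(query_biased_summary(doc, "query biased summaries"))
```

Sections with no query hits drop out entirely, which is how a structure-preserving summary can stay short while still naming the sections it keeps.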
Findings
In structural processing, about 70 per cent accuracy in identifying document sectional hierarchies is obtained. The summarisation method is tested on a task‐based evaluation method using English and Turkish document collections. The results show that the proposed method is a significant improvement over both unstructured query‐biased summaries and Google snippets in terms of f‐measure.
Practical implications
The proposed summarisation system can be incorporated into search engines. The structural processing technique also has applications in other information systems, such as browsing, outlining and indexing documents.
Originality/value
In the literature on summarisation, the effects of query‐biased techniques and document structure are considered in only a few works and are researched separately. The research reported here differs from traditional approaches by combining these two aspects in a coherent framework. The work is also the first automatic summarisation study for Turkish targeting web search.
Najd Al-Mouh and Hend S. Al-Khalifa
Abstract
Purpose
Millions of visually impaired people (VIP) in the world still face difficulties browsing the Web and accessing information. This paper aims to present a proxy service that takes advantage of context-awareness to help contextualize web pages for visually impaired users.
Design/methodology/approach
The VIP-aware proxy combines five components that use the stored user preferences to adapt the requested web page and reorganize its content to best match those preferences. This scenario assists VIP in browsing the Web more effectively.
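The reorganisation step might look roughly like the following sketch. The section names and preference format are invented for illustration; the abstract does not specify how the proxy represents either:

```python
# Hypothetical sketch of the reorganisation step: move the sections a
# visually impaired user cares about to the front and drop the rest,
# so a screen reader reaches relevant content first.

def reorganise(sections, preferences):
    """sections: list of (name, html); preferences: ordered list of names."""
    rank = {name: i for i, name in enumerate(preferences)}
    kept = [s for s in sections if s[0] in rank]
    return sorted(kept, key=lambda s: rank[s[0]])

page = [("ads", "<div>...</div>"),
        ("navigation", "<nav>...</nav>"),
        ("article", "<main>...</main>")]
prefs = ["article", "navigation"]    # user wants content first, no ads
print(reorganise(page, prefs))
```

Dropping unselected sections is also where the claimed bandwidth saving would come from: content outside the preferences is never sent to the client.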
Findings
A preliminary evaluation of the system resulted in general user satisfaction.
Practical implications
The VIP-aware proxy will provide users with a clean, accessible web page, save them time when screen readers examine content related to their preferences and save them money when unnecessary content is not downloaded.
Originality/value
The VIP-aware proxy presented in this paper is the first of its kind targeting VIP.
Abstract
Purpose
One of the problems facing systems administrators and security auditors is that a security test/audit can generate a vast quantity of information that needs to be stored, analysed and cross referenced for later use. The current state‐of‐the‐art in security audit tools does not allow for information from multiple different tools to be shared and integrated. This paper aims to develop an Extensible Markup Language (XML)‐based architecture that is capable of encoding information from a variety of disparate heterogeneous sources and then unifying and integrating them into a single SQL database schema.
Design/methodology/approach
The paper demonstrates how, through the application of the architecture, large quantities of security related information can be captured within a single database schema. This database can then be used to ensure that systems are conforming to an organisation's network security policy.
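The unification idea can be sketched with stdlib tools. The scanner XML formats and table schema below are invented for illustration; the point is only that heterogeneous tool output is normalised into one relational schema for cross-referencing:

```python
# Hypothetical sketch: normalise findings from two differently structured
# scanners' XML output into one SQL table for later cross-referencing.
import sqlite3
import xml.etree.ElementTree as ET

SCANNER_A = "<scan><host ip='10.0.0.1'><vuln id='CVE-2020-0001'/></host></scan>"
SCANNER_B = "<report><finding host='10.0.0.2' cve='CVE-2021-0002'/></report>"

def load(db, xml_text, tool):
    root = ET.fromstring(xml_text)
    if tool == "A":
        rows = [(h.get("ip"), v.get("id"))
                for h in root.iter("host") for v in h.iter("vuln")]
    else:
        rows = [(f.get("host"), f.get("cve")) for f in root.iter("finding")]
    db.executemany("INSERT INTO findings(host, vuln) VALUES (?, ?)", rows)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE findings(host TEXT, vuln TEXT)")
load(db, SCANNER_A, "A")
load(db, SCANNER_B, "B")
print(db.execute("SELECT host, vuln FROM findings ORDER BY host").fetchall())
```

Once everything sits in one schema, conformance to a network security policy becomes an SQL query rather than a manual comparison of tool reports.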
Findings
This type of data integration and data unification within a vulnerability assessment/security audit is currently not possible; this leads to confusion and omissions in the security audit process.
Originality/value
This paper develops a data integration and unification architecture that will allow data from multiple vulnerability assessment tools to be integrated into a single unified picture of the security state of a network of interconnected computer systems.
Xiu Susie Fang, Quan Z. Sheng, Xianzhi Wang, Anne H.H. Ngu and Yihong Zhang
Abstract
Purpose
This paper aims to propose a system for generating actionable knowledge from Big Data and use this system to construct a comprehensive knowledge base (KB), called GrandBase.
Design/methodology/approach
In particular, this study extracts new predicates from four types of data sources, namely, Web texts, Document Object Model (DOM) trees, existing KBs and query stream to augment the ontology of the existing KB (i.e. Freebase). In addition, a graph-based approach to conduct better truth discovery for multi-valued predicates is also proposed.
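The multi-valued truth discovery problem can be illustrated with a minimal iterative sketch. This is not the paper's graph-based algorithm, only the underlying idea: sources claim sets of values, sources are re-weighted by how much of what they claim looks true, and values with enough weighted support are accepted:

```python
# Hypothetical sketch of multi-valued truth discovery: sources claim sets
# of values for a predicate; iteratively re-weight sources by agreement,
# then accept values whose weighted support passes a threshold.

def truth_discovery(claims, rounds=5, threshold=0.5):
    """claims: dict source -> set of claimed values."""
    weight = {s: 1.0 for s in claims}
    truths = set()
    for _ in range(rounds):
        # Weighted support for each value.
        support = {}
        for s, values in claims.items():
            for v in values:
                support[v] = support.get(v, 0.0) + weight[s]
        total = sum(weight.values())
        truths = {v for v, sup in support.items() if sup / total > threshold}
        # Re-weight: fraction of a source's claims that look true.
        for s, values in claims.items():
            weight[s] = len(values & truths) / len(values) if values else 0.0
        if sum(weight.values()) == 0:
            break
    return truths

claims = {
    "site1": {"Alice", "Bob"},
    "site2": {"Alice", "Bob"},
    "site3": {"Alice", "Carol"},
}
print(truth_discovery(claims))
```

Because the result is a set rather than a single winner, the sketch captures what single-truth approaches miss: a predicate such as "children of" can legitimately have several true values at once.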
Findings
Empirical studies demonstrate the effectiveness of the approaches presented in this study and the potential of GrandBase. Future research directions regarding GrandBase construction and extension have also been discussed.
Originality/value
To revolutionize our modern society by using the wisdom of Big Data, considerable KBs have been constructed to feed the massive knowledge-driven applications with Resource Description Framework triples. The important challenges for KB construction include extracting information from large-scale, possibly conflicting and different-structured data sources (i.e. the knowledge extraction problem) and reconciling the conflicts that reside in the sources (i.e. the truth discovery problem). Tremendous research efforts have been contributed on both problems. However, the existing KBs are far from being comprehensive and accurate: first, existing knowledge extraction systems retrieve data from limited types of Web sources; second, existing truth discovery approaches commonly assume each predicate has only one true value. In this paper, the focus is on the problem of generating actionable knowledge from Big Data. A system is proposed, which consists of two phases, namely, knowledge extraction and truth discovery, to construct a broader KB, called GrandBase.
Abstract
Purpose
The semantic and structural heterogeneity of large Extensible Markup Language (XML) digital libraries emphasizes the need for supporting approximate queries, i.e. queries where the matching conditions are relaxed so as to retrieve results that possibly partially satisfy the user's requests. The paper aims to propose a flexible query answering framework which efficiently supports complex approximate queries on XML data.
Design/methodology/approach
To reduce the number of relaxations applicable to a query, the paper relies on the specification of user preferences about the types of approximations allowed. A specifically devised index structure which efficiently supports both semantic and structural approximations, according to the specified user preferences, is proposed. Also, a ranking model to quantify approximations in the results is presented.
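The interaction between user preferences and the ranking model can be sketched minimally. The penalty values and relaxation names below are invented for illustration; the paper's actual preference specification and index structure are not reproduced here:

```python
# Hypothetical sketch: rank approximate matches by summing user-assigned
# penalties for each relaxation applied (semantic or structural).

PENALTIES = {              # user preferences: lower = more acceptable
    "exact": 0.0,
    "synonym_tag": 0.2,    # semantic relaxation, e.g. <writer> for <author>
    "ancestor_axis": 0.5,  # structural relaxation, child -> descendant
}

def rank(results):
    """results: list of (node_id, [relaxations applied])."""
    scored = [(sum(PENALTIES[r] for r in relaxations), node)
              for node, relaxations in results]
    return [node for score, node in sorted(scored)]

hits = [
    ("n3", ["ancestor_axis"]),
    ("n1", []),                       # exact match
    ("n2", ["synonym_tag"]),
]
print(rank(hits))
```

Setting a penalty to infinity would forbid a relaxation outright, which is one simple way preferences can narrow the space of query reformulations.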
Findings
Personalized queries, on the one hand, effectively narrow the space of query reformulations and, on the other, enhance the user's query capabilities with a great deal of flexibility and control over requests. As to the quality of results, the retrieval process benefits considerably from the presence of user preferences in the queries. Experiments demonstrate the effectiveness and the efficiency of the proposal, as well as its scalability.
Research limitations/implications
Future developments concern the evaluation of the effectiveness of personalization on queries through additional examinations of the effects of the variability of parameters expressing user preferences.
Originality/value
The paper is intended for the research community and proposes a novel query model which incorporates user preferences about query relaxations on large heterogeneous XML data collections.
Abstract
Purpose
Most e‐commerce web sites use HTML forms for user authentication, new user registration, newsletter subscription, and searching for products and services. The purpose of this paper is to present a method for automated classification of HTML forms, which is important for search engine applications, e.g. Yahoo Shopping and Google's Froogle, as they can be used to improve the quality of the index and accuracy of search results.
Design/methodology/approach
Describes a technique for classifying HTML forms based on their features. Develops algorithms for automatic feature generation of HTML forms and a neural network to classify them.
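The feature-generation step can be sketched with the standard library's HTML parser. The particular features counted here are an assumption for illustration; the paper's full feature set is not specified in the abstract:

```python
# Hypothetical sketch of the feature-generation step: parse an HTML form
# and count field types, which could feed a neural-network classifier.
from html.parser import HTMLParser

class FormFeatures(HTMLParser):
    def __init__(self):
        super().__init__()
        self.counts = {"text": 0, "password": 0, "submit": 0, "checkbox": 0}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            kind = dict(attrs).get("type", "text")
            if kind in self.counts:
                self.counts[kind] += 1

login_form = """
<form action='/login'>
  <input type='text' name='user'>
  <input type='password' name='pw'>
  <input type='submit' value='Sign in'>
</form>"""

parser = FormFeatures()
parser.feed(login_form)
print(parser.counts)
```

A password field alongside a single text field is a strong signal for a login form, illustrating why simple counts of field types already carry classification-relevant information.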
Findings
The authors tested their classifier on an e‐commerce data set and a randomly retrieved data set and achieved accuracy of 94.7 and 93.9 per cent, respectively. Experimental results show that the classifier is effective and efficient on both test beds, suggesting that it is a promising general purpose method.
Originality/value
The paper is of value to those involved with information management and e‐commerce.
Abstract
Purpose
To help to clarify the role of XML tools and standards in supporting transition and migration towards a fully XML‐based environment for managing access to information.
Design/methodology/approach
The Ching Digital Image Library, built on a three‐tier architecture, is used as a source of examples to illustrate a number of methods of data manipulation for presentation processing. An SQL relational database is implemented in the data tier and Microsoft Internet Information Server (IIS) is used to manage processes and sessions in the middle tier. Extensible Markup Language (XML) is used in the data tier to represent offers and in the presentation tiers to represent screen displays that can be manipulated using the XML Document Object Model (DOM), XML Data Islands, and XSL (eXtensible Stylesheet Language), before being delivered to the web browser as HTML.
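The data-to-presentation transformation can be sketched as follows. Python's standard library has no XSLT processor, so ElementTree stands in for the stylesheet step here, and the XML schema and file names are invented for illustration:

```python
# Hypothetical sketch: XML in the data tier transformed to HTML for the
# presentation tier. ElementTree plays the role of the XSL stylesheet.
import xml.etree.ElementTree as ET

XML_DATA = """<images>
  <image id='c1'><title>Hexagram 1</title><file>hex01.png</file></image>
  <image id='c2'><title>Hexagram 2</title><file>hex02.png</file></image>
</images>"""

def to_html(xml_text):
    root = ET.fromstring(xml_text)
    items = "".join(
        f"<li><img src='{img.findtext('file')}' alt='{img.findtext('title')}'>"
        f"{img.findtext('title')}</li>"
        for img in root.iter("image"))
    return f"<ul>{items}</ul>"

print(to_html(XML_DATA))
```

Keeping the XML untouched and regenerating only the HTML view is the point of the tiered design: the same data can feed multiple presentation transforms.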
Findings
It is demonstrated that, although XML itself is not a database, the XML family provides many, though not all, of the components found in databases. XML coupled with a database gives greater power than the sum of the parts in a web application.
Originality/value
This paper is a digital image library case study with practical generic tutorial elements about the role and function of XML in modern database‐backed web sites.
Abstract
The self‐describing nature of data marked up using extensible markup language (XML) allows the XML document itself to act in a manner similar to a database, but without the large file sizes and proprietary software generally associated with database applications. XML data can be made directly available to users using a variety of methods. This paper explores methods for both server‐side and client‐side processing and display of XML‐encoded data, using an annotated bibliography as an example.
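The "XML document as lightweight database" idea can be illustrated with a small server-side sketch. The bibliography schema below is invented for illustration, not taken from the paper's example:

```python
# Hypothetical sketch: query an annotated-bibliography XML document
# directly, with no DBMS involved, as if it were a small database.
import xml.etree.ElementTree as ET

BIB = """<bibliography>
  <entry year='2003'><author>Smith</author><title>XML on the server</title></entry>
  <entry year='2001'><author>Jones</author><title>Client-side XSLT</title></entry>
</bibliography>"""

def entries_since(xml_text, year):
    root = ET.fromstring(xml_text)
    hits = [e for e in root.iter("entry") if int(e.get("year")) >= year]
    return [(e.findtext("author"), e.findtext("title"))
            for e in sorted(hits, key=lambda e: e.get("year"))]

print(entries_since(BIB, 2002))
```

The same selection could equally run client-side in the browser, which is the server-side/client-side trade-off the paper explores.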
Jason Underwood and Alastair Watson
Abstract
A three year Esprit project – ProCure – is ultimately aiming to take a significant but achievable step forward in the application of available information and communication technology (ICT) to the large scale engineering (LSE) construction industry. The ProCure consortium consists of five industrial partners supported by four associated research and expert partners. The project combines leading expertise from three member states to support ICT deployment by three industrial collaborative groups, in the UK, Germany and Finland. The basis of the project is the partners' belief that sufficient ICT is now available to achieve deployment, with care, in real projects, with an acceptable risk of failure. This paper presents work undertaken within the project to investigate existing metadata standards and, based on them, define a minimum metadata set for two demonstrators of XML-based automated document exchange between a simulated corporate document management system and a simulated collaborative construction project web site.
Srinivas Vadrevu, Fatih Gelgi, Saravanakumar Nagarajan and Hasan Davulcu
Abstract
Purpose
The purpose of this research is to automatically separate and extract meta‐data and instance information from various link pages in the web, by utilizing presentation and linkage regularities on the web.
Design/methodology/approach
Research objectives have been achieved through an information extraction system called semantic partitioner that automatically organizes the content in each web page into a hierarchical structure, and an algorithm that interprets and translates these hierarchical structures into logical statements by distinguishing and representing the meta‐data and their individual data instances.
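The metadata/instance distinction can be illustrated with a toy sketch. This is not the semantic partitioner algorithm itself, only one simple heuristic consistent with the idea: labels that recur across records on a link page behave like attribute names (metadata), while the varying strings beneath them are the instances:

```python
# Hypothetical sketch of the metadata/instance distinction: labels that
# repeat across records are treated as metadata (attribute names); the
# varying values under them are instances.
from collections import Counter

def split_metadata(records):
    """records: list of dicts, one per item on a link page."""
    label_counts = Counter(k for rec in records for k in rec)
    metadata = {k for k, n in label_counts.items() if n > 1}
    instances = [{k: v for k, v in rec.items() if k in metadata}
                 for rec in records]
    return metadata, instances

records = [
    {"Name": "A. Turing", "Office": "B-12", "Note": "on leave"},
    {"Name": "G. Hopper", "Office": "C-3"},
]
meta, inst = split_metadata(records)
print(meta, inst)
```

Because the split relies only on regularity across records, no template assumption or pre-engineered ontology is needed, which mirrors the domain-independence claim below.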
Findings
Experimental results for the university domain with 12 computer science department web sites, comprising 361 individual faculty and course home pages, indicate that the performance of meta‐data and instance extraction averages 85 and 88 per cent F‐measure, respectively. The METEOR system achieves this performance without any domain-specific engineering requirement.
Originality/value
Important contributions of the METEOR system presented in this paper are: it performs extraction without the assumption that the object instance pages are template‐driven; it is domain independent and does not require any previously engineered domain ontology; and by interpreting the link pages, it can extract both meta‐data, such as concept and attribute names and their relationships, as well as their instances with high accuracy.