Search results

1 – 10 of over 3000
Article
Publication date: 23 November 2010

Hao Han and Takehiro Tokuda

Abstract

Purpose

The purpose of this paper is to present a method to realize the flexible and lightweight integration of general web applications.

Design/methodology/approach

Information extraction and functionality emulation methods are proposed to realize web information integration for general web applications. All processes of web information searching, submitting and extraction run at the client side through end‐user programming, as if the target application were a real web service.
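
As an illustration of the functionality emulation idea, a minimal Python sketch follows: it submits a query to an ordinary web application over HTTP and extracts result titles from the returned HTML, as if the page were a web service. The endpoint URL, query parameter and result markup are hypothetical placeholders, not the structure assumed by the authors' Java package.

```python
# Minimal sketch of functionality emulation: treat an ordinary search page as
# if it exposed a web service. The URL, parameter name and markup are invented.
from urllib.parse import urlencode
from urllib.request import urlopen
from html.parser import HTMLParser


class ResultTitleParser(HTMLParser):
    """Collects the text of <a> elements whose class contains 'result'."""

    def __init__(self):
        super().__init__()
        self.in_result_link = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class") or ""
        if tag == "a" and "result" in classes:
            self.in_result_link = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_result_link = False

    def handle_data(self, data):
        if self.in_result_link:
            self.titles[-1] += data.strip()


def emulate_search(query: str) -> list[str]:
    """Submit a query to a (hypothetical) search form and extract result titles."""
    url = "https://example.org/search?" + urlencode({"q": query})
    html = urlopen(url).read().decode("utf-8", errors="replace")
    parser = ResultTitleParser()
    parser.feed(html)
    return parser.titles
```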

Findings

The implementation shows that the required programming techniques are within the abilities of general web users and do not demand writing a large amount of code.

Originality/value

A Java‐based class package was developed for web information searching, submitting and extraction, which users can easily integrate with general web applications.

Details

International Journal of Web Information Systems, vol. 6 no. 4
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 1 February 2002

A.C.M. Fong, S.C. Hui and H.L. Vu

Abstract

Research organisations and individual researchers increasingly choose to share their research findings by providing lists of their published works on the World Wide Web. To facilitate the exchange of ideas, the lists often include links to published papers in portable document format (PDF) or PostScript (PS) format. Generally, these publication Web sites are updated regularly to include new works. While manual monitoring of relevant Web sites is tedious, commercial search engines and information monitoring systems are ineffective in finding and tracking scholarly publications. This paper analyses the characteristics of publication index pages and describes effective automatic extraction techniques that the authors have developed. The authors’ techniques combine lexical and syntactic analyses with heuristics. The proposed techniques have been implemented and tested on more than 14,000 Web pages, achieving consistently high success rates of around 90 per cent.
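
A minimal sketch of the kind of heuristic the abstract describes: on a publication index page, hyperlinks ending in .pdf or .ps usually point at papers, and the surrounding list item usually holds the citation text. The HTML structure assumed here (entries wrapped in <li> elements) is illustrative only and is not the authors' actual rule set.

```python
# Heuristic extraction sketch: find PDF/PS links on a publication index page
# and take the stripped text of the enclosing list item as the citation.
import re

ENTRY_RE = re.compile(r"<li>(.*?)</li>", re.IGNORECASE | re.DOTALL)
PAPER_LINK_RE = re.compile(r'href="([^"]+\.(?:pdf|ps))"', re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def extract_publications(index_html: str) -> list[dict]:
    """Return (citation text, paper URL) pairs found in a publication list."""
    publications = []
    for entry in ENTRY_RE.findall(index_html):
        match = PAPER_LINK_RE.search(entry)
        if match:                                # heuristic: entry links to a PDF/PS file
            citation = TAG_RE.sub("", entry)     # lexical step: strip markup
            citation = " ".join(citation.split())
            publications.append({"citation": citation, "url": match.group(1)})
    return publications
```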

Details

Online Information Review, vol. 26 no. 1
Type: Research Article
ISSN: 1468-4527

Keywords

Book part
Publication date: 10 February 2012

Dirk Ahlers

Abstract

Purpose — To provide a theoretical background for understanding current local search engines as an aspect of specialized search, and for understanding the data sources and technologies they use.

Design/methodology/approach — Selected local search engines are examined and compared with respect to their use of geographic information retrieval (GIR) technologies, data sources, available entity information, processing, and interfaces. An introduction to the field of GIR is given and its use in the selected systems is discussed.

Findings — All selected commercial local search engines utilize GIR technology to varying degrees for information preparation and presentation. GIR is also starting to be used in regular Web search. However, major differences can be found between the different search engines.

Research limitations/implications — This study is not exhaustive and only uses informal comparisons without definitive ranking. Due to the unavailability of hard data, informed guesses were made based on available public interfaces and literature.

Practical implications — A source of background information for understanding the results of local search engines, their provenance, and their potential.

Originality/value — An overview of GIR technology in the context of commercial search engines integrates research efforts and commercial systems and helps to understand both sides better.

Article
Publication date: 21 September 2012

Jorge Martinez‐Gil and José F. Aldana‐Montes

Abstract

Purpose

Semantic similarity measures are very important in many computer‐related fields. Applications such as data integration, query expansion, tag refactoring and text clustering have made use of semantic similarity measures in the past. Despite their usefulness in these applications, the problem of measuring the similarity between two text expressions remains a key challenge. This paper aims to address this issue.

Design/methodology/approach

In this article, the authors propose an optimization environment to improve existing techniques that use the notion of co‐occurrence and the information available on the web to measure similarity between terms.

Findings

The experimental results using the Miller and Charles and Gracia and Mena benchmark datasets show that the proposed approach is able to outperform classic probabilistic web‐based algorithms by a wide margin.

Originality/value

This paper presents two main contributions. The authors propose a novel technique that beats classic probabilistic techniques for measuring semantic similarity between terms. This new technique uses not just a single search engine to compute web page counts but a smart combination of several popular web search engines. The approach is evaluated on the Miller and Charles and Gracia and Mena benchmark datasets and compared with existing probabilistic web extraction techniques.
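
As a rough illustration of a co-occurrence measure of this kind, the sketch below computes a Jaccard-style score from web page counts and averages it across several engines. The page-count callables are hypothetical stand-ins for search engine hit counts; the authors' actual combination strategy is not reproduced here.

```python
# Web co-occurrence similarity sketch: per-engine WebJaccard scores, averaged.
def web_jaccard(count_a: int, count_b: int, count_ab: int) -> float:
    """Jaccard-style co-occurrence score from web page counts."""
    denominator = count_a + count_b - count_ab
    return count_ab / denominator if denominator > 0 else 0.0


def combined_similarity(term_a: str, term_b: str, engines: list) -> float:
    """Average per-engine scores; `engines` holds hypothetical page-count callables."""
    scores = []
    for page_count in engines:
        ca = page_count(term_a)
        cb = page_count(term_b)
        cab = page_count(f'"{term_a}" "{term_b}"')   # co-occurrence query
        scores.append(web_jaccard(ca, cb, cab))
    return sum(scores) / len(scores) if scores else 0.0
```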

Details

Online Information Review, vol. 36 no. 5
Type: Research Article
ISSN: 1468-4527

Keywords

Article
Publication date: 31 July 2007

Alesia Zuccala, Mike Thelwall, Charles Oppenheim and Rajveen Dhiensa

Downloads
2044

Abstract

Purpose

The purpose of this paper is to explore the use of LexiURL as a Web intelligence tool for collecting and analysing links to digital libraries, focusing specifically on the National electronic Library for Health (NeLH).

Design/methodology/approach

The Web intelligence techniques in this study are a combination of link analysis (web structure mining), web server log file analysis (web usage mining), and text analysis (web content mining), utilizing the power of commercial search engines and drawing upon the information science fields of bibliometrics and webometrics. LexiURL is a computer program designed to calculate summary statistics for lists of links or URLs. Its output is a series of standard reports, for example listing and counting all of the different domain names in the data.
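
The following sketch illustrates the kind of summary report described, not LexiURL itself: given a list of link URLs, it counts how often each domain and each top-level domain appears. The example URLs are invented.

```python
# Domain-counting sketch in the spirit of the summary reports described above.
from collections import Counter
from urllib.parse import urlparse


def domain_report(urls: list[str]) -> tuple[Counter, Counter]:
    """Return (domain counts, top-level-domain counts) for a list of URLs."""
    domains = Counter()
    tlds = Counter()
    for url in urls:
        host = urlparse(url).hostname or ""
        if host:
            domains[host] += 1
            tlds[host.rsplit(".", 1)[-1]] += 1   # e.g. 'edu', 'uk', 'org'
    return domains, tlds


links = ["https://www.example.edu/library", "http://news.example.org/item"]
domain_counts, tld_counts = domain_report(links)
print(domain_counts.most_common(10))
print(tld_counts.most_common(10))
```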

Findings

Link data, when analysed together with user transaction log files (i.e. Web referring domains) can provide insights into who is using a digital library and when, and who could be using the digital library if they are “surfing” a particular part of the Web; in this case any site that is linked to or colinked with the NeLH. This study found that the NeLH was embedded in a multifaceted Web context, including many governmental, educational, commercial and organisational sites, with the most interesting being sites from the .edu domain, representing American universities. Not many links directed to the NeLH were followed on September 25, 2005 (the date of the log file analysis and link extraction analysis), which means that users who access the digital library have been arriving at the site via only a few select links, bookmarks and search engine searches, or non‐electronic sources.

Originality/value

A number of studies concerning digital library users have been carried out using log file analysis as a research tool. Log files focus on real‐time user transactions; while LexiURL can be used to extract links and colinks associated with a digital library's growing Web network. This Web network is not recognized often enough, and can be a useful indication of where potential users are surfing, even if they have not yet specifically visited the NeLH site.

Book part
Publication date: 13 December 2017

Qiongwei Ye and Baojun Ma

Abstract

Internet + and Electronic Business in China is a comprehensive resource that provides insight and analysis into E-commerce in China and how it has revolutionized and continues to revolutionize business and society. Split into four distinct sections, the book first lays out the theoretical foundations and fundamental concepts of E-Business before moving on to look at internet+ innovation models and their applications in different industries such as agriculture, finance and commerce. The book then provides a comprehensive analysis of E-business platforms and their applications in China before finishing with four comprehensive case studies of major E-business projects, providing readers with successful examples of implementing E-Business entrepreneurship projects.

Details

Internet+ and Electronic Business in China: Innovation and Applications
Type: Book
ISBN: 978-1-78743-115-7

Article
Publication date: 3 August 2021

Irvin Dongo, Yudith Cardinale, Ana Aguilera, Fabiola Martinez, Yuni Quintero, German Robayo and David Cabeza

Abstract

Purpose

This paper aims to perform an exhaustive review of relevant and recent related studies, which reveals that both extraction methods are currently used to analyze credibility on Twitter. Thus, there is clear evidence of the need for different options to extract different data for this purpose. Nevertheless, none of these studies performs a comparative evaluation of both extraction techniques. Moreover, the authors extend a previous comparison, which uses a recently developed framework that offers both alternatives for data extraction and implements a previously proposed credibility model, by adding a qualitative evaluation and a Twitter Application Programming Interface (API) performance analysis from different locations.

Design/methodology/approach

As one of the most popular social platforms, Twitter has been the focus of recent research aimed at analyzing the credibility of the shared information. To do so, several proposals use either Twitter API or Web scraping to extract the data to perform the analysis. Qualitative and quantitative evaluations are performed to discover the advantages and disadvantages of both extraction methods.

Findings

The study demonstrates the differences in accuracy and efficiency between the two extraction methods and highlights further problems in this area that must be addressed to pursue true transparency and legitimacy of information on the Web.

Originality/value

Results report that some Twitter attributes cannot be retrieved by Web scraping. Both methods produce identical credibility values when a robust normalization process is applied to the text (i.e. the tweet). Moreover, concerning time performance, Web scraping is faster than the Twitter API and is more flexible in terms of obtaining data; however, Web scraping is very sensitive to website changes. Additionally, the response time of the Twitter API is proportional to the distance from the central server in San Francisco.
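
The exact normalization rules are not given in the abstract, so the sketch below only illustrates the general idea: unify Unicode forms, case, whitespace and URL representations before comparing a tweet retrieved via the API with the same tweet obtained by scraping. The example strings are invented.

```python
# Text normalization sketch: make API-sourced and scraped tweet text comparable.
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+")


def normalize_tweet(text: str) -> str:
    """Normalize tweet text so API and scraped versions compare equal."""
    text = unicodedata.normalize("NFKC", text)   # unify Unicode forms
    text = URL_RE.sub("<url>", text)             # API (t.co) and scraped URLs may differ
    text = text.lower()
    return " ".join(text.split())                # collapse whitespace


api_text = "Great article!  https://t.co/abc123"
scraped_text = "Great article! https://example.com/article"
print(normalize_tweet(api_text) == normalize_tweet(scraped_text))  # True under these rules
```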

Details

International Journal of Web Information Systems, vol. 17 no. 6
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 7 August 2009

F. Canan Pembe and Tunga Güngör

Abstract

Purpose

The purpose of this paper is to develop a new summarisation approach, namely structure‐preserving and query‐biased summarisation, to improve the effectiveness of web searching. During web searching, one aid for users is the document summaries provided in the search results. However, the summaries provided by current search engines have limitations in directing users to relevant documents.

Design/methodology/approach

The proposed system consists of two stages: document structure analysis and summarisation. In the first stage, a rule‐based approach is used to identify the sectional hierarchies of web documents. In the second stage, query‐biased summaries are created, making use of document structure both in the summarisation process and in the output summaries.
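
A minimal sketch of a query-biased, structure-preserving summary along these lines, assuming the sectional hierarchy has already been extracted: sentences are scored by term overlap with the query inside each section, and the best sentence per section is kept under its heading. The section representation and scoring rule are illustrative assumptions, not the authors' actual method.

```python
# Query-biased, structure-preserving summarisation sketch.
import re


def query_biased_summary(sections: dict[str, str], query: str) -> str:
    """sections maps heading -> section text; returns a heading-preserving summary."""
    query_terms = set(query.lower().split())
    summary_lines = []
    for heading, text in sections.items():
        sentences = re.split(r"(?<=[.!?])\s+", text)
        best = max(
            sentences,
            key=lambda s: len(query_terms & set(s.lower().split())),  # query overlap
            default="",
        )
        summary_lines.append(f"{heading}: {best}")
    return "\n".join(summary_lines)
```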

Findings

In structural processing, about 70 per cent accuracy in identifying document sectional hierarchies is obtained. The summarisation method is tested using a task‐based evaluation with English and Turkish document collections. The results show that the proposed method is a significant improvement over both unstructured query‐biased summaries and Google snippets in terms of f‐measure.

Practical implications

The proposed summarisation system can be incorporated into search engines. The structural processing technique also has applications in other information systems, such as browsing, outlining and indexing documents.

Originality/value

In the literature on summarisation, the effects of query‐biased techniques and document structure are considered in only a few works and are researched separately. The research reported here differs from traditional approaches by combining these two aspects in a coherent framework. The work is also the first automatic summarisation study for Turkish targeting web search.

Details

Online Information Review, vol. 33 no. 4
Type: Research Article
ISSN: 1468-4527

Keywords

Article
Publication date: 1 June 2005

Yanbo Ru and Ellis Horowitz

Downloads
2323

Abstract

Purpose

The existence and continued growth of the invisible web creates a major challenge for search engines that are attempting to organize all of the material on the web into a form that is easily retrieved by all users. The purpose of this paper is to identify the challenges and problems underlying existing work in this area.

Design/methodology/approach

A discussion based on a short survey of prior work, including automated discovery of invisible web site search interfaces, automated classification of invisible web sites, label assignment and form filling, information extraction from the resulting pages, learning the query language of the search interface, building content summaries for an invisible web site, selecting proper databases, integrating invisible web search interfaces, and assessing the performance of an invisible web site.
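
One of the surveyed subproblems, discovering invisible web search interfaces, can be illustrated with a simple heuristic: flag pages containing a form with a single free-text input as candidate search interfaces. Real classifiers in this literature use richer features; the sketch below is only illustrative.

```python
# Search-interface discovery sketch: count text inputs per <form> and flag
# forms with exactly one free-text input as candidate search interfaces.
from html.parser import HTMLParser


class SearchFormDetector(HTMLParser):
    """Flags forms that contain a single text or search input."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.text_inputs = 0
        self.candidate_forms = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.in_form = True
            self.text_inputs = 0
        elif tag == "input" and self.in_form:
            if (attrs.get("type") or "text") in ("text", "search"):
                self.text_inputs += 1

    def handle_endtag(self, tag):
        if tag == "form":
            if self.text_inputs == 1:            # heuristic: one text box => search form
                self.candidate_forms += 1
            self.in_form = False


def looks_like_search_interface(html: str) -> bool:
    detector = SearchFormDetector()
    detector.feed(html)
    return detector.candidate_forms > 0
```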

Findings

Existing technologies and tools for indexing the invisible web follow one of two strategies: indexing the web site interface or examining a portion of the contents of an invisible web site and indexing the results.

Originality/value

The paper is of value to those involved with information management.

Details

Online Information Review, vol. 29 no. 3
Type: Research Article
ISSN: 1468-4527

Keywords

Article
Publication date: 11 April 2008

Yingzi Jin, Mitsuru Ishizuka and Yutaka Matsuo

Abstract

Purpose – Social relations play an important role in a real community. Interaction patterns reveal relations among actors (such as persons, groups and firms), which can be merged to produce valuable information such as a network structure. This paper aims to present a new approach to extract inter‐firm networks from the web for further analysis.

Design/methodology/approach – In this study, relations between a pair of firms are extracted using a search engine and text processing. Because names of firms co‐appear coincidentally on the web, an advanced algorithm is proposed, characterised by the addition of keywords (“relation keywords”) to a query. The relation keywords are obtained from the web using a Jaccard coefficient.

Findings – As an application, a network of 60 firms in Japan, including IT, communication, broadcasting and electronics firms, is extracted from the web, and comprehensive evaluations of the approach are presented. The alliance and lawsuit relations are easily obtainable from the web using the algorithm. By adding relation keywords to named pairs of firms as a query, it is possible to collect target pages from the top of the web results more precisely than by using the named pairs alone as a query.

Practical implications – This study proposes a new approach for extracting inter‐firm networks from the web. The obtained network is useful in several ways. It is possible to find a cluster of firms and characterise a firm by its cluster. Business experts often make such inferences based on firm relations and firm groups; for that reason the firm network might enhance inferential abilities in the business domain. The obtained networks might also be used to recommend business partners based on structural advantages. The authors' intuition is that extracting a social network might provide information that is only recognisable from the network point of view. For example, the centrality of each firm is identified only after generating a social network.

Originality/value – This study is a first attempt to extract inter‐firm networks from the web using a search engine. The approach is also applicable to other actors, such as famous persons, organisations or other multiple relational entities.
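
The Jaccard-coefficient step can be illustrated with a short sketch in which a hypothetical page_count(query) function stands in for the search engine: candidate relation keywords are scored by how strongly they co-occur with a firm pair, and the top-scoring keywords are added to the query. The authors' full extraction pipeline is not reproduced here.

```python
# Relation-keyword scoring sketch: Jaccard coefficient between pages mentioning
# the firm pair and pages mentioning a candidate keyword, using a hypothetical
# page_count(query) callable in place of a real search engine.
def jaccard_keyword_score(firm_a: str, firm_b: str, keyword: str, page_count) -> float:
    """Jaccard coefficient between the firm-pair pages and the keyword pages."""
    pair_query = f'"{firm_a}" "{firm_b}"'
    n_pair = page_count(pair_query)
    n_keyword = page_count(keyword)
    n_both = page_count(f"{pair_query} {keyword}")
    denominator = n_pair + n_keyword - n_both
    return n_both / denominator if denominator > 0 else 0.0


def rank_relation_keywords(firm_a, firm_b, candidates, page_count, top_k=5):
    """Pick the candidate keywords most strongly associated with the firm pair."""
    scored = sorted(
        candidates,
        key=lambda kw: jaccard_keyword_score(firm_a, firm_b, kw, page_count),
        reverse=True,
    )
    return scored[:top_k]
```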

Details

Online Information Review, vol. 32 no. 2
Type: Research Article
ISSN: 1468-4527

Keywords
