Search results

1 – 10 of 326
Article
Publication date: 19 March 2018

Hyo-Jung Oh, Dong-Hyun Won, Chonghyuck Kim, Sung-Hee Park and Yong Kim

Abstract

Purpose

The purpose of this paper is to describe the development of an algorithm for realizing web crawlers that automatically collect dynamically generated webpages from the deep web.

Design/methodology/approach

This study proposes and develops an algorithm that lets a web crawler collect dynamically generated web information as if it were gathering static webpages, by managing script commands as links. The algorithm was then tested by using the proposed crawler to collect deep webpages.
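To make the approach concrete, the sketch below queues script commands found in a page alongside ordinary URLs and crawls both in one breadth-first pass. It is only an illustration of the idea described above, not the authors' implementation (which uses the Visual Studio web browser object as its script launcher); fetch_page and run_script are hypothetical callables standing in for an HTTP client and a script-capable browser.

```python
# Illustrative sketch: treat script commands as crawlable links.
# fetch_page() and run_script() are hypothetical placeholders supplied by
# the caller; a fuller version would also resolve relative links against
# each page's own URL rather than the seed URL.

import re
from collections import deque
from urllib.parse import urljoin

HREF_RE = re.compile(r'href="([^"]+)"', re.IGNORECASE)
ONCLICK_RE = re.compile(r'onclick="([^"]+)"', re.IGNORECASE)


def extract_links(base_url, html):
    """Return ordinary URLs and script commands as one uniform link list."""
    links = []
    for href in HREF_RE.findall(html):
        if href.lower().startswith("javascript:"):
            links.append(("script", href[len("javascript:"):]))
        else:
            links.append(("url", urljoin(base_url, href)))
    for script in ONCLICK_RE.findall(html):
        links.append(("script", script))
    return links


def crawl(seed_url, fetch_page, run_script, max_pages=100):
    """Breadth-first crawl in which script commands are queued like URLs."""
    queue = deque([("url", seed_url)])
    seen, pages = set(), []
    while queue and len(pages) < max_pages:
        kind, link = queue.popleft()
        if (kind, link) in seen:
            continue
        seen.add((kind, link))
        # fetch_page() requests a static page; run_script() executes the
        # script in a browser context and returns the resulting HTML.
        html = fetch_page(link) if kind == "url" else run_script(link)
        if html:
            pages.append(html)
            queue.extend(extract_links(seed_url, html))
    return pages
```

The design point mirrored here is that scripts and URLs share a single crawl queue, so pages reachable only through script execution are collected in the same pass as static pages.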

Findings

The experiments show that when a website delivers its search results as script pages, a conventional crawl collects only the first page of results; the proposed algorithm, by contrast, can also collect the deep webpages in this case.

Research limitations/implications

To use a script as a link, a human must first analyze the web document. This study uses the web browser object provided by Microsoft Visual Studio as a script launcher, so it cannot collect deep webpages if the web browser object cannot launch the script, or if the web document contains script errors.

Practical implications

The deep web is estimated to contain 450 to 550 times more information than the surface web, yet its documents are difficult to collect. The proposed algorithm enables deep web collection by executing scripts.

Originality/value

This study presents a new method that uses scripts as links, rather than the keyword-based approaches adopted previously, so the proposed algorithm can treat a script in the same way as an ordinary URL. The experiment also shows that the scripts on individual websites must be analyzed before they can be employed as links.

Details

Data Technologies and Applications, vol. 52 no. 2
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 1 May 2002

Mike Thelwall

Abstract

There have been many attempts to study the content of the Web, either through human or automatic agents. Describes five different previously used Web survey methodologies, each justifiable in its own right, but presents a simple experiment that demonstrates concrete differences between them. The concept of crawling the Web also bears further inspection, including the scope of the pages to crawl, the method used to access and index each page, and the algorithm for the identification of duplicate pages. The issues involved here will be well‐known to many computer scientists but, with the increasing use of crawlers and search engines in other disciplines, they now require a public discussion in the wider research community. Concludes that any scientific attempt to crawl the Web must make available the parameters under which it is operating so that researchers can, in principle, replicate experiments or be aware of and take into account differences between methodologies. Also introduces a new hybrid random page selection methodology.
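As one illustration of the duplicate-page issue raised above, a common (though by no means the only) approach is to fingerprint each page by hashing its normalised text, so that the same content served under several URLs is counted once. The sketch below is ours, not the methodology used in the article.

```python
# Content-fingerprinting sketch for duplicate-page identification.
import hashlib
import re


def fingerprint(html: str) -> str:
    """Hash the page text after stripping tags and collapsing whitespace."""
    text = re.sub(r"<[^>]+>", " ", html)            # drop markup
    text = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def deduplicate(pages):
    """Keep the first page seen for each distinct fingerprint."""
    seen, unique = set(), []
    for url, html in pages:
        fp = fingerprint(html)
        if fp not in seen:
            seen.add(fp)
            unique.append((url, html))
    return unique
```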

Details

Internet Research, vol. 12 no. 2
Type: Research Article
ISSN: 1066-2243

Article
Publication date: 1 May 2001

Mike Thelwall

Abstract

Every hyperlink pointing at a Web site is a potential source of new visitors, especially one near the top of a results page from a popular search engine. The order of the links in a search results page is often decided upon by an algorithm that takes into account the number and quality of links to all matching pages. The number of standard links targeted at a site is therefore doubly important, yet little research has touched on the actual interlinkage between business Web sites, which numerically dominate the Web. Discusses business use of the Web and related search engine design issues as well as research on general and academic links before reporting on a survey of the links published by a relatively random collection of business Web sites. The results indicate that around 66 percent of Web sites do carry external links, most of which are targeted at a specific purpose, but that about 17 percent publish general links, with implications for those designing and marketing Web sites.
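The core measurement behind these percentages can be approximated by comparing the hostname of each published link with that of the surveyed site, as in the hedged sketch below; the article's distinction between purpose-specific and general links required human judgement and is not reproduced here. The site and link URLs in the example are purely illustrative.

```python
# Count how many of a site's published links point at external hosts.
from urllib.parse import urlparse


def count_external_links(site_url, hrefs):
    """Count links whose hostname differs from the surveyed site's host."""
    site_host = urlparse(site_url).hostname
    return sum(
        1 for href in hrefs
        if urlparse(href).hostname not in (None, site_host)
    )


# Example with made-up URLs: two of the three links are external.
print(count_external_links(
    "http://www.example-business.com/",
    ["http://www.example-business.com/about.html",
     "http://www.a-supplier.example/",
     "http://www.a-directory.example/"],
))
```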

Details

Internet Research, vol. 11 no. 2
Type: Research Article
ISSN: 1066-2243

Article
Publication date: 26 April 2011

Sudip Ranjan Hatua and Devika P. Madalli

Abstract

Purpose

The purpose of this paper is to discuss the methodology for building an integrated domain information system, with illustrations that provide proof of concept.

Design/methodology/approach

The present work studies the usual search engine approach to information and its pitfalls. A methodology was adopted for construction of a domain‐based information system, known as Aerospace Information System (AERIS), comprising six distinct steps in identifying and sourcing, evaluating and then technically integrating resources into the information system. AERIS is an integrated gateway for resources in the domain of aerospace science and technology. AERIS is designed to provide information from varied sources such as formal publications (e.g. articles), aggregators (e.g. harvesters) and also informal resources such as blogs and discussion fora. Interaction is provided through a simple user interface.
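As a small, hypothetical illustration of the aggregator/harvester side of such integration, the sketch below issues an OAI-PMH ListRecords request and extracts Dublin Core titles. The endpoint URL is a placeholder, and AERIS's actual harvesting pipeline and tooling are not described here in enough detail to reproduce.

```python
# Hedged sketch: harvest Dublin Core titles from an OAI-PMH endpoint.
import urllib.request
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"


def harvest_titles(endpoint, metadata_prefix="oai_dc"):
    """Issue a ListRecords request and return the Dublin Core titles."""
    url = f"{endpoint}?verb=ListRecords&metadataPrefix={metadata_prefix}"
    with urllib.request.urlopen(url) as response:
        tree = ET.parse(response)
    return [title.text for title in tree.iter(DC + "title")]


# Hypothetical repository endpoint; substitute a real OAI-PMH base URL.
# titles = harvest_titles("https://repository.example.org/oai")
```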

Findings

The domain‐based information system with focussed collection and services serves patrons with more precision than general web search engines.

Research limitations/implications

At present the AERIS system is populated with a limited number of resources. A fully‐fledged system may be developed based on the same model.

Originality/value

This original research work provides a model for a comprehensive integrated gateway to domain‐based information using open‐source tools.

Details

Program, vol. 45 no. 2
Type: Research Article
ISSN: 0033-0337

Article
Publication date: 1 June 1961

Abstract

In the case of thin paint films, the standard type of high‐voltage holiday detector is not suitable owing to breakdown of the thin film. For this reason Metal & Pipeline Endurance Ltd. have developed a thin paint film holiday detector which will detect pinholes, voids or bare spots in a surface coating of very high electrical resistance applied over an electrically conductive surface.

Details

Anti-Corrosion Methods and Materials, vol. 8 no. 6
Type: Research Article
ISSN: 0003-5599

Article
Publication date: 1 February 2003

Marcel Machill, Christoph Neuberger and Friedemann Schindler

Abstract

Search engines exist to help sort through all the information available on the Internet, but have thus far failed to shoulder any responsibility for the content which appears on the pages they present in their indexes. Search engines lack any transparency to clarify how results are found and how they are connected to the search terms. Thus, problems arise in connection with the protection of minors – namely, that minors can intentionally or unwittingly access content which may be harmful to them. The findings of this study point to the need for a better framework for the protection of children. This framework should include codes of conduct for search engines, more accurate labeling of Web site data, and the outlawing of search engine manipulation. This study is intended as a first step in making the public aware of the problem of protecting children on the Internet.

Details

info, vol. 5 no. 1
Type: Research Article
ISSN: 1463-6697

Article
Publication date: 1 December 2002

Mike Thelwall

Abstract

Web links are a phenomenon of interest to bibliometricians by analogy with citations, and to others because of their use in Web navigation and search engines. It is known that very few links on university Web sites are targeted at scholarly expositions and yet, at least in the UK and Australia, a correlation has been established between link count metrics for universities and measures of institutional research. This paper operates on a finer‐grained level of detail, focussing on counts of links between pairs of universities. It provides evidence of an underlying linear relationship with the quadruple product of the size and research quality of both source and target institution. This simple model is proposed as applying generally to national university systems, subject to a series of constraints to identify cases where it is unlikely to be applicable. It is hoped that the model, if confirmed by studies of other countries, will open the door to deeper mining of academic Web link data.
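In symbols of our own choosing (the model is summarised here only in prose), the proposed relationship can be written as follows, where L_{AB} is the count of links from university A to university B, s denotes institutional size, r research quality and k a constant for the national university system:

\[
  \mathbb{E}\left[L_{AB}\right] \;\approx\; k \,(s_A r_A)\,(s_B r_B)
\]

The quadruple product s_A r_A s_B r_B is the quantity with which inter-university link counts are reported to vary linearly, subject to the constraints on applicability noted above.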

Details

Journal of Documentation, vol. 58 no. 6
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 12 October 2010

Alexander R.M. Schellong

Abstract

Purpose

The purpose of this paper is to offer insights and suggestions for the design of existing and future e‐government benchmarks.

Design/methodology/approach

First, the paper presents several frameworks for structuring the discussion of e‐government benchmark design, based on a review of existing research and practice. Second, it provides an overview of relevant benchmarking activities, including new insights into the European Union's (EU's) benchmarking activities. Finally, suggestions for the future design of the EU's benchmarking are made.

Findings

Prominent e‐government benchmarks focus mostly on the supply/output side, assessing a selection of government (online) services against a development stage model. Benchmarks follow underlying cause‐and‐effect frameworks, and capturing government transformation remains a core challenge. To discuss the design of e‐government benchmarks, a three‐tier structure is proposed: guiding principles, benchmark methodology, and reporting and learning. Overall, governments around the globe face significant changes in the coming years that will shape their thinking on digital government in general and their priorities for benchmarking it in particular; these include, among others, the trade‐off between free markets and regulation, demographic change and the information economy.

Practical implications

The paper provides policy makers and consultants with a framework to approach and discuss e‐government benchmarks in general and the future design of the EU e‐government benchmark in particular.

Originality/value

The paper analyzes existing e‐government benchmarks, presents a framework for designing e‐government benchmarks and makes a range of recommendations on changes to the methodology of the EU e‐government benchmark.

Details

Transforming Government: People, Process and Policy, vol. 4 no. 4
Type: Research Article
ISSN: 1750-6166

Article
Publication date: 30 January 2023

Zhongbao Liu and Wenjuan Zhao

Abstract

Purpose

In recent years, Chinese sentiment analysis has made great progress, but the characteristics of the language itself and the requirements of downstream tasks have not been explored thoroughly. Because of the large differences between the two languages, it is not practical to directly transfer achievements from English sentiment analysis to the analysis of Chinese.

Design/methodology/approach

In view of the particularity of Chinese text and the requirement of sentiment analysis, a Chinese sentiment analysis model integrating multi-granularity semantic features is proposed in this paper. This model introduces the radical and part-of-speech features based on the character and word features, with the application of bidirectional long short-term memory, attention mechanism and recurrent convolutional neural network.
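A minimal, hypothetical skeleton of the multi-granularity idea is sketched below in PyTorch: character, word, radical and part-of-speech embeddings are concatenated and fed to a bidirectional LSTM with additive attention. The layer sizes and fusion strategy are assumptions, the recurrent convolutional component of the paper's model is omitted, and the four input sequences are assumed to be pre-aligned.

```python
# Hypothetical multi-granularity sentiment classifier (sketch only).
import torch
import torch.nn as nn


class MultiGranularitySentiment(nn.Module):
    def __init__(self, vocab_sizes, emb_dim=64, hidden=128, num_classes=2):
        super().__init__()
        # One embedding table per granularity: char, word, radical, POS.
        self.embeddings = nn.ModuleDict({
            name: nn.Embedding(size, emb_dim)
            for name, size in vocab_sizes.items()
        })
        self.lstm = nn.LSTM(emb_dim * len(vocab_sizes), hidden,
                            batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)        # additive attention scores
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, inputs):
        # inputs: dict of aligned index tensors, each (batch, seq_len).
        feats = [self.embeddings[name](ids) for name, ids in inputs.items()]
        x = torch.cat(feats, dim=-1)                # concatenate granularities
        h, _ = self.lstm(x)                         # (batch, seq_len, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)
        context = (weights * h).sum(dim=1)          # attention-pooled sentence
        return self.classifier(context)


# Toy usage with made-up vocabulary sizes and two length-10 sequences.
model = MultiGranularitySentiment(
    {"char": 5000, "word": 20000, "radical": 300, "pos": 60})
batch = {name: torch.randint(0, 50, (2, 10))
         for name in ["char", "word", "radical", "pos"]}
logits = model(batch)                               # shape: (2, num_classes)
```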

Findings

The comparative experiments showed that the F1 values of this model reach 88.28 and 84.80 per cent on the man-made dataset and the NLPECC dataset, respectively. An ablation experiment was also conducted to verify the effectiveness of the attention mechanism and of the part-of-speech, radical, character and word features in Chinese sentiment analysis. The performance of the proposed model exceeds that of existing models to some extent.

Originality/value

The academic contribution of this paper is as follows: first, in view of the particularity of Chinese texts and the requirements of sentiment analysis, this paper focuses on addressing the shortcomings of Chinese sentiment analysis in the big data context. Second, this paper borrows ideas from multiple interdisciplinary frontier theories and methods, such as information science, linguistics and artificial intelligence, which makes it innovative and comprehensive. Finally, this paper deeply integrates multi-granularity semantic features such as character, word, radical and part of speech, which further complements the theoretical framework and method system of Chinese sentiment analysis.

Details

Data Technologies and Applications, vol. 57 no. 4
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 28 February 2023

Abhijit Roy, Marat Bakpayev, Melanie Florence Boninsegni, Smriti Kumar, Jean-Paul Peronard and Thomas Reimer

Abstract

Purpose

Technological progress and the advancement of the 4th Industrial Revolution (IR 4.0) are well underway. However, its influence on the transformation of core sectors from the perspective of consumer well-being remains under-explored. Seeking to bridge this gap in the marketing and public policy literature, this study aims to propose a conceptual framework to explicate how data-driven, intelligent and connected IR 4.0 technologies are blurring traditional boundaries between digital, physical and biological domains.

Design/methodology/approach

This is a conceptual paper based primarily on a literature review of the field. The authors position the work as a contribution to the consumer well-being and public policy literature, viewed through the lens of emerging technologies that are increasingly important in our technology-integrated society.

Findings

The authors define and conceptualize technology-enabled well-being (TEW), which allows a better understanding of transformative outcomes of IR 4.0 on three essential dimensions of consumer well-being: individual, societal and environmental. Finally, the authors discuss public policy implications and outline future research directions.

Originality/value

The authors highlight specific gaps in the literature on IR 4.0. First, past studies in consumer well-being did not incorporate the substantial changes that emerging IR 4.0 technologies bring, especially across increasingly blurred digital, physical and biological domains. Second, past research focused on individual technologies and individual well-being; what remains unaccounted for is the potential for a synergetic, proactive effect of emerging technologies at the aggregate level, affecting not only individuals but also society and the environment. Finally, understanding the differences between responses to different outcomes of technologies has important implications for developing public policy. Such a synergetic, proactive effect is noted across core sectors such as healthcare, education, financial services, manufacturing and retailing.

Details

Journal of Consumer Marketing, vol. 40 no. 4
Type: Research Article
ISSN: 0736-3761
