Search results
1 – 10 of 326

Hyo-Jung Oh, Dong-Hyun Won, Chonghyuck Kim, Sung-Hee Park and Yong Kim
Abstract
Purpose
The purpose of this paper is to describe the development of an algorithm for realizing web crawlers that automatically collect dynamically generated webpages from the deep web.
Design/methodology/approach
This study proposes and develops an algorithm to collect web information as if the web crawler gathers static webpages by managing script commands as links. The proposed web crawler actually experiments with the algorithm by collecting deep webpages.
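The core idea of managing script commands as if they were static links can be sketched in a few lines of Python. This is a minimal illustration against a stubbed page map, not the authors' implementation (which used the web browser object of Microsoft Visual Studio as the script launcher); the page names and the `goPage(2)` script command are invented for the example:

```python
import re
from collections import deque

# Stub "web": page name -> HTML. The second result page is reachable
# only through a script call, as on many deep-web search interfaces.
PAGES = {
    "search?q=x": '<a href="doc1.html">1</a>'
                  '<a href="#" onclick="goPage(2)">next</a>',
    "doc1.html": "",
    "goPage(2)": '<a href="doc2.html">2</a>',  # page produced by running the script
    "doc2.html": "",
}

def extract_links(html):
    """Treat both ordinary href targets and onclick script commands as links."""
    hrefs = [h for h in re.findall(r'href="([^"]+)"', html) if h != "#"]
    scripts = re.findall(r'onclick="([^"]+)"', html)
    return hrefs + scripts

def crawl(seed):
    """Breadth-first traversal in which queued script commands are
    processed exactly like URLs."""
    queue, visited, order = deque([seed]), set(), []
    while queue:
        page = queue.popleft()
        if page in visited or page not in PAGES:
            continue
        visited.add(page)
        order.append(page)
        queue.extend(extract_links(PAGES[page]))
    return order

print(crawl("search?q=x"))
# doc2.html is reached only because goPage(2) was queued like a URL
```

A crawler that follows only `href` targets would stop after `doc1.html`; queueing the script command as a link is what exposes the second result page.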
Findings
Among the findings of this study is that when the crawling process returns search results as script pages, a conventional crawler collects only the first page. The proposed algorithm, however, can collect the deep webpages in this case.
Research limitations/implications
To use a script as a link, a human must first analyze the web document. This study uses the web browser object provided by Microsoft Visual Studio as a script launcher, so it cannot collect deep webpages if the web browser object cannot launch the script, or if the web document contains script errors.
Practical implications
The research results show that the deep web is estimated to contain 450 to 550 times more information than surface webpages, and that its documents are difficult to collect. This algorithm helps enable deep web collection by running scripts.
Originality/value
This study presents a new method that uses script links instead of the keyword-based approaches adopted previously; the proposed algorithm treats a script as an ordinary URL. The conducted experiment shows that the scripts on individual websites must be analysed before they can be employed as links.
Abstract
There have been many attempts to study the content of the Web, either through human or automatic agents. Describes five different previously used Web survey methodologies, each justifiable in its own right, but presents a simple experiment that demonstrates concrete differences between them. The concept of crawling the Web also bears further inspection, including the scope of the pages to crawl, the method used to access and index each page, and the algorithm for the identification of duplicate pages. The issues involved here will be well‐known to many computer scientists but, with the increasing use of crawlers and search engines in other disciplines, they now require a public discussion in the wider research community. Concludes that any scientific attempt to crawl the Web must make available the parameters under which it is operating so that researchers can, in principle, replicate experiments or be aware of and take into account differences between methodologies. Also introduces a new hybrid random page selection methodology.
Abstract
Every hyperlink pointing at a Web site is a potential source of new visitors, especially one near the top of a results page from a popular search engine. The order of the links in a search results page is often decided upon by an algorithm that takes into account the number and quality of links to all matching pages. The number of standard links targeted at a site is therefore doubly important, yet little research has touched on the actual interlinkage between business Web sites, which numerically dominate the Web. Discusses business use of the Web and related search engine design issues as well as research on general and academic links before reporting on a survey of the links published by a relatively random collection of business Web sites. The results indicate that around 66 percent of Web sites do carry external links, most of which are targeted at a specific purpose, but that about 17 percent publish general links, with implications for those designing and marketing Web sites.
Sudip Ranjan Hatua and Devika P. Madalli
Abstract
Purpose
The purpose of this paper is to discuss the methodology in building an integrated domain information system, with illustrations that provide proof of concept.
Design/methodology/approach
The present work studies the usual search engine approach to information and its pitfalls. A methodology was adopted for construction of a domain‐based information system, known as Aerospace Information System (AERIS), comprising six distinct steps in identifying and sourcing, evaluating and then technically integrating resources into the information system. AERIS is an integrated gateway for resources in the domain of aerospace science and technology. AERIS is designed to provide information from varied sources such as formal publications (e.g. articles), aggregators (e.g. harvesters) and also informal resources such as blogs and discussion fora. Interaction is provided through a simple user interface.
Findings
The domain‐based information system with focussed collection and services serves patrons with more precision than general web search engines.
Research limitations/implications
At present the AERIS system is populated with a limited number of resources. A fully‐fledged system may be developed based on the same model.
Originality/value
This original research work provides a model for a comprehensive integrated gateway to domain‐based information using open‐source tools.
Abstract
In the case of thin paint films, the standard type of high-voltage holiday detector is not suitable owing to breakdown of the thin film. For this reason, Metal & Pipeline Endurance Ltd. have developed a thin-paint-film holiday detector which will detect pinholes, voids or bare spots in a surface coating of very high electrical resistance when such material is covering an electrically conductive surface.
Marcel Machill, Christoph Neuberger and Friedemann Schindler
Abstract
Search engines exist to help sort through all the information available on the Internet, but have thus far failed to shoulder any responsibility for the content which appears on the pages they present in their indexes. Search engines lack any transparency to clarify how results were found, and how they are connected to the search terms. Thus, problems arise in connection with the protection of minors – namely, that minors have access, intentional or unwitting, to content which may be harmful to them. The findings of this study point to the need for a better framework for the protection of children. This framework should include codes of conduct for search engines, more accurate labeling of Web site data, and the outlawing of search engine manipulation. This study is intended as a first step in making the public aware of the problem of protecting children on the Internet.
Abstract
Web links are a phenomenon of interest to bibliometricians by analogy with citations, and to others because of their use in Web navigation and search engines. It is known that very few links on university Web sites are targeted at scholarly expositions and yet, at least in the UK and Australia, a correlation has been established between link count metrics for universities and measures of institutional research. This paper operates on a finer‐grained level of detail, focussing on counts of links between pairs of universities. It provides evidence of an underlying linear relationship with the quadruple product of the size and research quality of both source and target institution. This simple model is proposed as applying generally to national university systems, subject to a series of constraints to identify cases where it is unlikely to be applicable. It is hoped that the model, if confirmed by studies of other countries, will open the door to deeper mining of academic Web link data.
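The quadruple-product model described above can be sketched briefly in Python. The function name, the proportionality constant `k` and the input figures are illustrative assumptions for the sketch, not values from the paper:

```python
def predicted_links(size_s, quality_s, size_t, quality_t, k=1e-8):
    """Expected inter-university link count under the quadruple-product
    model: linear in the product of the size and research quality of
    both the source and the target institution. k is a hypothetical
    proportionality constant to be fitted from data."""
    return k * size_s * quality_s * size_t * quality_t

# Linearity in each factor: doubling the source's size doubles the prediction.
print(predicted_links(10_000, 3.0, 20_000, 4.0))  # 24.0
print(predicted_links(20_000, 3.0, 20_000, 4.0))  # 48.0
```

Because the model is a single product, taking logarithms turns it into a linear regression, which is the usual way such a relationship would be fitted and tested on national link-count data.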
Abstract
Purpose
The purpose of this paper is to offer insights and suggestions for the design of existing and future e‐government benchmarks.
Design/methodology/approach
First, the paper presents several frameworks to structure the discussion of e‐government benchmark design, based on a review of existing research and practice. Second, it provides an overview of relevant benchmarking activities, including new insights on the European Union's (EU's) benchmarking activities. Finally, suggestions for the future design of the EU's benchmarking are made.
Findings
The scope of prominent e‐government benchmarks is mostly the supply/output side and a development stage model of a selection of government (online) services. Benchmarks follow underlying cause‐and‐effect frameworks. Capturing government transformation also remains a core challenge. To discuss the design of e‐government benchmarks, a three‐tier structure is proposed: guiding principles, benchmark methodology, and reporting and learning. Overall, governments around the globe are facing significant changes in the coming years which will shape their thinking on digital government in general and the priorities for benchmarking it in particular. These include, among others, the trade‐off between free markets and regulation, demographic change and the information economy.
Practical implications
The paper provides policy makers and consultants with a framework to approach and discuss e‐government benchmarks in general and the future design of the EU e‐government benchmark in particular.
Originality/value
The paper analyzes existing e‐government benchmarks, presents a framework for designing e‐government benchmarks and makes a range of recommendations on changes to the methodology of the EU e‐government benchmark.
Abstract
Purpose
In recent years, Chinese sentiment analysis has made great progress, but the characteristics of the language itself and downstream task requirements were not explored thoroughly. It is not practical to directly migrate achievements obtained in English sentiment analysis to the analysis of Chinese because of the huge difference between the two languages.
Design/methodology/approach
In view of the particularity of Chinese text and the requirement of sentiment analysis, a Chinese sentiment analysis model integrating multi-granularity semantic features is proposed in this paper. This model introduces the radical and part-of-speech features based on the character and word features, with the application of bidirectional long short-term memory, attention mechanism and recurrent convolutional neural network.
Findings
The comparative experiments showed that the F1 values of this model reach 88.28 and 84.80 per cent on the man-made dataset and the NLPECC dataset, respectively. Meanwhile, an ablation experiment was conducted to verify the effectiveness of the attention mechanism and of the part-of-speech, radical, character and word factors in Chinese sentiment analysis. The performance of the proposed model exceeds that of existing models to some extent.
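For reference, the F1 value reported above is the harmonic mean of precision and recall. A minimal Python illustration follows; the inputs are illustrative, not the paper's precision and recall figures:

```python
def f1(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# When precision equals recall, F1 equals that common value.
print(f1(0.8828, 0.8828))  # 0.8828
# The harmonic mean penalizes imbalance: perfect precision with
# recall 0.5 still yields F1 of only 2/3.
print(f1(1.0, 0.5))
```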
Originality/value
The academic contribution of this paper is as follows: first, in view of the particularity of Chinese texts and the requirement of sentiment analysis, this paper focuses on solving the deficiency problem of Chinese sentiment analysis under the big data context. Second, this paper borrows ideas from multiple interdisciplinary frontier theories and methods, such as information science, linguistics and artificial intelligence, which makes it innovative and comprehensive. Finally, this paper deeply integrates multi-granularity semantic features such as character, word, radical and part of speech, which further complements the theoretical framework and method system of Chinese sentiment analysis.
Abhijit Roy, Marat Bakpayev, Melanie Florence Boninsegni, Smriti Kumar, Jean-Paul Peronard and Thomas Reimer
Abstract
Purpose
Technological progress and the advancement of the 4th Industrial Revolution (IR 4.0) are well underway. However, its influence on the transformation of core sectors from the perspective of consumer well-being remains under-explored. Seeking to bridge this gap in the marketing and public policy literature, this study aims to propose a conceptual framework to explicate how data-driven, intelligent and connected IR 4.0 technologies are blurring traditional boundaries between digital, physical and biological domains.
Design/methodology/approach
This is a conceptual paper based primarily on a literature review of the field. The authors position the work as a contribution to the consumer well-being and public policy literature through the lens of emerging technologies, which are increasingly important in our technology-integrated society.
Findings
The authors define and conceptualize technology-enabled well-being (TEW), which allows a better understanding of transformative outcomes of IR 4.0 on three essential dimensions of consumer well-being: individual, societal and environmental. Finally, the authors discuss public policy implications and outline future research directions.
Originality/value
The authors highlight specific gaps in the literature on IR 4.0. First, past studies in consumer well-being did not incorporate the substantial changes that emerging IR 4.0 technologies bring, especially across the increasingly blurring digital, physical and biological domains. Second, past research focused on individual technologies and individual well-being. What is unaccounted for is the potential for a synergetic, proactive effect that emerging technologies bring at the aggregate level, not only to individuals but also to society and the environment. Finally, understanding the differences between responses to different outcomes of technologies has important implications for developing public policy. The synergetic, proactive effect of technologies on core sectors such as healthcare, education, financial services, manufacturing and retailing is noted.