Search results
BRIAN VICKERY and ALINA VICKERY
Abstract
There is a huge amount of information and data stored in publicly available online databases that consist of large text files accessed by Boolean search techniques. It is widely held that less use is made of these databases than could or should be the case, and that one reason for this is that potential users find it difficult to identify which databases to search, to use the various command languages of the hosts and to construct the Boolean search statements required. This reasoning has stimulated a considerable amount of exploration and development work on the construction of search interfaces, to aid the inexperienced user to gain effective access to these databases. The aim of our paper is to review aspects of the design of such interfaces: to indicate the requirements that must be met if maximum aid is to be offered to the inexperienced searcher; to spell out the knowledge that must be incorporated in an interface if such aid is to be given; to describe some of the solutions that have been implemented in experimental and operational interfaces; and to discuss some of the problems encountered. The paper closes with an extensive bibliography of references relevant to online search aids, going well beyond the items explicitly mentioned in the text. An index to software appears after the bibliography at the end of the paper.
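To make concrete what a Boolean search statement asks of a searcher, here is a minimal sketch of Boolean retrieval over a toy inverted index. It is not taken from any of the interfaces the paper reviews; the function names and the three sample documents are assumptions made purely for illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Documents that contain every term."""
    sets = [index.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()


def boolean_or(index, *terms):
    """Documents that contain at least one term."""
    return set().union(*(index.get(term, set()) for term in terms))


def boolean_not(index, all_ids, term):
    """Documents that do not contain the term."""
    return all_ids - index.get(term, set())


# Hypothetical sample collection (not real database records).
docs = {
    1: "online database search interface",
    2: "boolean search statements for online hosts",
    3: "expert system interface design",
}
index = build_index(docs)

# Query: (online AND search) NOT boolean
result = boolean_and(index, "online", "search") & boolean_not(index, set(docs), "boolean")
print(result)
```

Even this small query requires the searcher to pick the terms, the operators and their grouping, which is the burden a search interface would aim to take over.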
Robert Gaizauskas and Yorick Wilks
Abstract
In this paper we give a synoptic view of the growth of the text processing technology of information extraction (IE) whose function is to extract information about a pre‐specified set of entities, relations or events from natural language texts and to record this information in structured representations called templates. Here we describe the nature of the IE task, review the history of the area from its origins in AI work in the 1960s and 70s till the present, discuss the techniques being used to carry out the task, describe application areas where IE systems are or are about to be at work, and conclude with a discussion of the challenges facing the area. What emerges is a picture of an exciting new text processing technology with a host of new applications, both on its own and in conjunction with other technologies, such as information retrieval, machine translation and data mining.
Abstract
Classification, indexing and abstracting can all be regarded as summarisations of the content of a document. A model of text comprehension by indexers (including classifiers and abstractors) is presented, based on task descriptions which indicate that the comprehension of text for indexing differs from normal fluent reading in respect of: operational time constraints, which lead to text being scanned rapidly for perceptual cues to aid gist comprehension; comprehension being task oriented rather than learning oriented, and being followed immediately by the production of an abstract, index, or classification; and the automaticity of processing of text by experienced indexers working within a restricted range of text types. The evidence for the interplay of perceptual and conceptual processing of text under conditions of rapid scanning is reviewed. The allocation of mental resources to text processing is discussed, and a cognitive process model of abstracting, indexing and classification is described.
Abstract
The ‘Office of the Future’, ‘Office Technology’, ‘Word Processing’, ‘Electronic Mail’, ‘Electronic Communications’, ‘Convergence’, ‘Information Management’. These are all terms included in the current list of buzz words used to describe current activities in the office technology area. Open the pages of almost any journal or periodical today and you will probably find an article or some reference to one or more of the above subjects. Long, detailed and highly technical theses are appearing on new techniques to automate and revolutionize the office environment. Facts and figures are quoted ad nauseam on the high current cost of writing a letter, filing letters, memos, reports and documents, trying to communicate with someone by telephone or other telecommunication means and, most significant of all, the high cost of people undertaking these never‐ending tasks. The high level of investment in factories and plants and the ever‐increasing fight to improve productivity by automating the dull, routine jobs are usually quoted and compared with the extremely low investment in improving and automating the equally tedious routine jobs in the office environment; the investment in the factory is quoted as being ten times greater per employee than in the office. This, however, is changing rapidly and investment on a large scale is already taking place in many areas as present‐day inflation bites hard, forcing many companies and organizations to take a much closer look at their office operations.
Nael Alqtati, Jonathan A.J. Wilson and Varuna De Silva
Abstract
Purpose
This paper aims to equip professionals and researchers in the fields of advertising, branding, public relations, marketing communications, social media analytics and marketing with a simple, effective and dynamic means of evaluating consumer behavioural sentiments and engagement through Arabic language and script, in vivo.
Design/methodology/approach
Using quantitative and qualitative situational linguistic analyses of Classical Arabic, found in Quranic and religious texts; Modern Standard Arabic, which is commonly used in formal Arabic channels; and dialectal Arabic, which varies hugely from one Arabic country to another, this study analyses rich marketing and consumer messages (tweets) as a basis for developing an Arabic language social media methodological tool.
Findings
Despite the popularity of Arabic language communication on social media platforms across geographies, currently, comprehensive language processing toolkits for analysing Arabic social media conversations have limitations and require further development. Furthermore, due to its unique morphology, developing text understanding capabilities specific to the Arabic language poses challenges.
Practical implications
This study demonstrates the application and effectiveness of the proposed methodology on a random sample of Twitter data from Arabic-speaking regions. Furthermore, as Arabic is the language of Islam, the study is of particular importance to Islamic and Muslim geographies, markets and marketing.
Social implications
The findings suggest that the proposed methodology has a wider potential beyond the data set and health-care sector analysed, and therefore, can be applied to further markets, social media platforms and consumer segments.
Originality/value
To remedy these gaps, this study presents a new methodology and analytical approach to investigating Arabic language social media conversations, which brings together a multidisciplinary knowledge of technology, data science and marketing communications.
Hong Zhou, Binwei Gao, Shilong Tang, Bing Li and Shuyu Wang
Abstract
Purpose
The number of construction dispute cases has maintained a high growth trend in recent years. The effective exploration and management of construction contract risk can directly promote the overall performance of the project life cycle. Missing clauses may cause a contract to deviate from standard contracts. If a contract modified by the owner omits key clauses, the resulting disputes may lead to contractors paying substantial compensation. To date, the identification of missing clauses in construction project contracts has relied heavily on manual review, which is inefficient and highly restricted by personnel experience. Existing intelligent tools support only contract query and storage, so there is an urgent need to raise the level of intelligence in contract clause management. Therefore, this paper aims to propose an intelligent method to detect missing clauses in construction project contracts based on natural language processing (NLP) and deep learning technology.
Design/methodology/approach
A complete classification scheme of contract clauses is designed based on NLP. First, construction contract texts are pre-processed and converted from unstructured natural language into structured digital vector form. Following the initial categorization, a multi-label classification of long-text construction contract clauses is designed to preliminarily identify whether clause labels are missing. After the multi-label missing-clause detection, the authors implement a clause similarity algorithm that integrates MatchPyramid, a model that borrows ideas from image detection, with BERT to identify missing substantial content in the contract clauses.
Findings
A total of 1,322 construction project contracts were tested. Results showed that the accuracy of multi-label classification reached 93%, the accuracy of similarity matching reached 83%, and the recall rate and mean F1 of both exceeded 0.7. To some extent, the experimental results verify the feasibility of intelligently detecting contract risk through the NLP-based method.
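For readers less familiar with the reported metrics, the sketch below shows one common way to compute precision, recall and F1 for multi-label predictions (micro-averaging). The abstract does not say which averaging the authors used, so that choice is an assumption, and the clause labels and predictions are invented for illustration.

```python
def micro_prf(true_labels, predicted_labels):
    """Micro-averaged precision, recall and F1 over per-document label sets."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        tp += len(truth & pred)   # labels correctly predicted
        fp += len(pred - truth)   # labels predicted but not present
        fn += len(truth - pred)   # labels present but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical clause labels for three contracts (not data from the study).
truth = [{"payment", "termination"}, {"liability"}, {"payment", "dispute"}]
pred = [{"payment"}, {"liability", "dispute"}, {"payment", "dispute"}]

precision, recall, f1 = micro_prf(truth, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case a label the classifier fails to predict ("termination" in the first contract) counts as a false negative, which is the kind of miss that recall penalises.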
Originality/value
NLP is adept at recognizing textual content and has shown promising results in some contract processing applications. However, the approaches most often used for risk detection in construction contract clauses are predominantly rule-based, and they encounter challenges when handling intricate and lengthy engineering contracts. This paper introduces an NLP technique based on deep learning which reduces manual intervention and can autonomously identify and tag types of contractual deficiencies, aligning with the evolving complexities anticipated in future construction contracts. Moreover, this method can recognize extended contract clause texts. Finally, the approach is versatile: users simply need to adjust parameters such as segmentation according to the language to detect omissions in contract clauses in diverse languages.
Jihye Park, Min Zhang, Seunghyun Yoo and Hannah Gloria Kwon
Abstract
Purpose
This study investigates the effects of vertical direction and rotation of English loan brand names in East Asian languages (Chinese and Korean) on processing fluency, perceived product quality and purchase intention.
Design/methodology/approach
Four experiments were conducted in China and Korea, employing a 2 (vertical direction: downward vs upward) × 3 (rotation: 0°/marquee vs 90° clockwise vs 90° counterclockwise) between-subjects factorial design.
Findings
The findings showed that when the English loan Chinese brand name was displayed downward, the marquee format was preferred, while counterclockwise rotation was favored when displayed upward. In Korean, clockwise rotation was preferred for downward presentation, while counterclockwise rotation was favored for upward presentation. The effects on purchase intention were mediated by processing fluency and perceived product quality.
Practical implications
This research provides practical implications for global manufacturers and retailers, offering guidance on presenting brand names in East Asian languages and optimizing product packaging designs. For Chinese consumers, the marquee format is recommended for downward-oriented brand names, while counterclockwise rotation is effective for upward orientation. For Korean consumers, clockwise rotation is favored for downward presentation and counterclockwise rotation is preferred for upward presentation. Understanding linguistic habits allows the tailoring of brand presentations, enhancing brand perception and consumer responses.
Originality/value
This study contributes to understanding the role of cultural and linguistic influences on consumer information processing and product perception in vertical presentations of brand names.
Somayeh Tamjid, Fatemeh Nooshinfard, Molouk Sadat Hosseini Beheshti, Nadjla Hariri and Fahimeh Babalhavaeji
Abstract
Purpose
The purpose of this study is to develop a domain-independent, cost-effective, time-saving and semi-automated ontology generation framework that can extract taxonomic concepts from an unstructured text corpus. In the human disease domain, ontologies are found to be extremely useful for managing the diversity of technical expressions in favour of information retrieval objectives. The boundaries of these domains are expanding so fast that it is essential to continuously develop new ontologies or upgrade available ones.
Design/methodology/approach
This paper proposes a semi-automated approach that extracts entities/relations via text mining of scientific publications. Code named Text mining-based ontology (TmbOnt) was developed to assist a user in capturing, processing and establishing ontology elements. This code takes a collection of unstructured text files as input and projects them into high-valued entities or relations as output. As the approach is semi-automated, a user supervises the process, filters meaningful predecessor/successor phrases and finalizes the required ontology-taxonomy. To verify the practical capabilities of the scheme, a case study was performed to derive a glaucoma ontology-taxonomy. For this purpose, text files containing 10,000 records were collected from PubMed.
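As a rough illustration of what surfacing predecessor/successor phrases for a user to filter might involve, the sketch below counts the words that appear immediately before and after a seed term. This is not the TmbOnt code, whose internals the abstract does not describe; the function name, the single-word window and the three sample sentences are all assumptions.

```python
import re
from collections import Counter


def neighbour_counts(texts, seed):
    """Count the words immediately before and after each occurrence of seed."""
    predecessors, successors = Counter(), Counter()
    for text in texts:
        # Simple tokenizer: lowercase alphabetic runs only (an assumption).
        tokens = re.findall(r"[a-z]+", text.lower())
        for i, token in enumerate(tokens):
            if token != seed:
                continue
            if i > 0:
                predecessors[tokens[i - 1]] += 1
            if i + 1 < len(tokens):
                successors[tokens[i + 1]] += 1
    return predecessors, successors


# Invented sample sentences standing in for PubMed records.
records = [
    "Primary open-angle glaucoma is a chronic optic neuropathy.",
    "Congenital glaucoma surgery outcomes were reviewed.",
    "Angle-closure glaucoma surgery remains common.",
]

before, after = neighbour_counts(records, "glaucoma")
print(before.most_common())
print(after.most_common())
```

A user supervising the process could then keep the informative neighbours (for example "congenital") as candidate taxonomy terms and discard function words such as "is".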
Findings
The proposed approach processed over 3.8 million tokenized terms from those records and yielded the resultant glaucoma ontology-taxonomy. Compared with two well-known disease ontologies, the TmbOnt-driven taxonomy demonstrated a 60%–100% coverage ratio against well-known medical thesauruses and ontology taxonomies, such as Human Disease Ontology, Medical Subject Headings and National Cancer Institute Thesaurus, with an average of 70% additional terms recommended for ontology development.
Originality/value
According to the literature, the proposed scheme demonstrated novel capability in expanding the ontology-taxonomy structure with a semi-automated text mining approach, aiming for future fully-automated approaches.
Ankie Visschedijk and Forbes Gibb
Abstract
This article reviews some of the more unconventional text retrieval systems, emphasising those which have been commercialised. These sophisticated systems improve on conventional retrieval by using either innovative software or hardware to increase retrieval speed or functionality, precision or recall. The software systems reviewed are: AIDA, CLARIT, Metamorph, SIMPR, STATUS/IQ, TCS, TINA and TOPIC. The hardware systems reviewed are: CAFS‐ISP, the Connection Machine, GESCAN, HSTS, MPP, TEXTRACT, TRW‐FDF and URSA.
Abstract
SIMPR (Structured Information Management: Processing and Retrieval) is an ESPRIT II Project aiming to achieve technological advances in information management. This new technology is instantiated in the SIMPR software system. SIMPR will process documents by indexing them and classifying their subjects, before storing them in an electronic information base from which they can then be retrieved using simple natural language search requests. Building this system has required initiatives in automatic indexing, in language analysis, in subject classification and in machine learning. These initiatives are discussed in this paper, in the context of the strategy and achievements to date of the SIMPR Project.