Search results

1 – 10 of over 7000
Book part
Publication date: 26 July 2014

Lars Engwall, Enno Aljets, Tina Hedmo and Raphaël Ramuz

Abstract

Computer corpus linguistics (CCL) is a scientific innovation that has facilitated the creation and analysis of large corpora in a systematic way by means of computer technology since the 1950s. This article provides an account of the CCL pioneers in general but particularly of those in Germany, the Netherlands, Sweden, and Switzerland. It is found that Germany and Sweden, due to more advantageous financing and weaker communities of generativists, had a faster adoption of CCL than the other two countries. A particularly late adopter among the four was Switzerland, which did not take up CCL until foreign professors had been recruited.

Details

Organizational Transformation and Scientific Change: The Impact of Institutional Restructuring on Universities and Intellectual Innovation
Type: Book
ISBN: 978-1-78350-684-2

Book part
Publication date: 9 November 2020

Siân Alsop, Virginia King, Genie Giaimo and Xiaoyu Xu

Abstract

In this chapter, we explore uses of corpus linguistics within higher education research. Corpus linguistic approaches enable examination of large bodies of language data based on computing power. These bodies of data, or corpora, facilitate investigation of the meaning of words in context. The semiautomated nature of such investigation helps researchers to identify and interpret language patterns that might otherwise be inaccessible through manual analysis. We illustrate potential uses of corpus linguistic approaches through four short case studies by higher education researchers, spanning educational contexts, disciplines and genres. These case studies are underpinned by discussion of the development of corpus linguistics as a field of investigation, including existing open corpora and corpus analysis tools. We give a flavour of how corpus linguistic techniques, in isolation or as part of a wider research approach, can be particularly helpful to higher education researchers who wish to investigate language data and its context.

Details

Theory and Method in Higher Education Research
Type: Book
ISBN: 978-1-80043-321-2

Article
Publication date: 11 November 2019

Chinmay Tumbe

Abstract

Purpose

The purpose of this paper is to demonstrate the utility of corpus linguistics and digitised newspaper archives in management and organisational history.

Design/methodology/approach

The paper draws its inferences from Google NGram Viewer and five digitised historical newspaper databases – The Times of India, The Financial Times, The Economist, The New York Times and The Wall Street Journal – that contain prints from the nineteenth century.

Findings

The paper argues that corpus linguistics or the quantitative and qualitative analysis of large-scale real-world machine-readable text can be an important method of historical research in management studies, especially for discourse analysis. It shows how this method can be fruitfully used for research in management and organisational history, using term count and cluster analysis. In particular, historical databases of digitised newspapers serve as important corpora to understand the evolution of specific words and concepts. Corpus linguistics using newspaper archives can potentially serve as a method for periodisation and triangulation in corporate, analytically structured and serial histories and also foster cross-country comparisons in the evolution of management concepts.
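The term-count idea mentioned above can be illustrated with a minimal sketch. It assumes a corpus supplied as (year, text) pairs; the function name `term_counts_by_year` and the toy sentences are hypothetical and are not taken from the paper or from any newspaper database.

```python
import re
from collections import Counter


def term_counts_by_year(documents, term):
    """Count whole-word occurrences of `term` per year.

    `documents` is an iterable of (year, text) pairs (an assumed layout).
    """
    pattern = re.compile(r"\b" + re.escape(term.lower()) + r"\b")
    counts = Counter()
    for year, text in documents:
        counts[year] += len(pattern.findall(text.lower()))
    return dict(sorted(counts.items()))


# Hypothetical toy corpus, for illustration only.
sample_corpus = [
    (1890, "The manager of the mill met the manager of the bank."),
    (1890, "Shareholders questioned the board."),
    (1950, "Every manager attended; management training was new."),
]
manager_counts = term_counts_by_year(sample_corpus, "manager")
```

Raw counts like these would normally be divided by the total number of words per year before periods are compared, since archive size varies over time.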

Research limitations/implications

The paper also shows the limitations of the research method and potential robustness checks to apply when using it.

Practical implications

Findings of this paper can stimulate new ways of conducting research in management history.

Originality/value

For the first time, the paper introduces corpus linguistics as a research method in management history.

Details

Journal of Management History, vol. 25 no. 4
Type: Research Article
ISSN: 1751-1348

Article
Publication date: 3 February 2020

Nikola Nikolić, Olivera Grljević and Aleksandar Kovačević

Abstract

Purpose

Student recruitment and retention are important issues for all higher education institutions. Constant monitoring of student satisfaction levels is therefore crucial. Traditionally, students voice their opinions through official surveys organized by the universities. In addition, social media and review websites such as “Rate my professors” are now rich sources of opinions that should not be ignored. Automated mining of students’ opinions can be realized via aspect-based sentiment analysis (ABSA). ABSA is a sub-discipline of natural language processing (NLP) that focusses on the identification of sentiments (negative, neutral, positive) and aspects (sentiment targets) in a sentence. The purpose of this paper is to introduce a system for ABSA of free text reviews expressed in student opinion surveys in the Serbian language. Sentiment analysis was carried out at the finest level of text granularity – the level of sentence segment (phrase and clause).

Design/methodology/approach

The presented system relies on NLP techniques, machine learning models, rules and dictionaries. The corpora collected and annotated for system development and evaluation comprise students’ reviews of teaching staff at the Faculty of Technical Sciences, University of Novi Sad, Serbia, and a corpus of publicly available reviews from the Serbian equivalent of the “Rate my professors” website.

Findings

The research results indicate that positive sentiment can be identified with an F-measure of 0.83, while negative sentiment can be detected with an F-measure of 0.94. The F-measure for aspects ranges between 0.49 and 0.89, depending on their frequency in the corpus. Furthermore, the authors conclude that the quality of ABSA depends on the source of the reviews (official students’ surveys vs review websites).
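For readers unfamiliar with the F-measure reported above, the following is a minimal sketch of how it is conventionally computed from precision and recall. The abstract does not give the underlying precision and recall values, so the numbers in the example are hypothetical, and the function name `f_measure` is illustrative.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    beta_sq = beta ** 2
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical values, not taken from the paper.
example_f1 = f_measure(precision=0.8, recall=0.9)
```

Because the harmonic mean is pulled towards the lower of the two inputs, a high F-measure requires both precision and recall to be high.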

Practical implications

The system for ABSA presented in this paper could improve the quality of service provided by the Serbian higher education institutions through a more effective search and summary of students’ opinions. For example, a particular educational institution could very easily find out which aspects of their service the students are not satisfied with and to which aspects of their service more attention should be directed.

Originality/value

To the best of the authors’ knowledge, this is the first study of ABSA carried out at the level of sentence segment for the Serbian language. The methodology and findings presented in this paper provide a much-needed basis for further work on sentiment analysis for the Serbian language, which is under-resourced and under-researched in this area.

Article
Publication date: 2 September 2019

Guellil Imane, Darwish Kareem and Azouaou Faical

Abstract

Purpose

This paper aims to propose an approach to automatically annotate a large corpus in Arabic dialect. This corpus is used to analyse the sentiments of Arabic users on social media. It focuses on the Algerian dialect, which is a sub-dialect of Maghrebi Arabic. Although Algerian is spoken by roughly 40 million speakers, few studies address its automated processing in general and sentiment analysis in particular.

Design/methodology/approach

The approach is based on the construction and use of a sentiment lexicon to automatically annotate a large corpus of Algerian text that is extracted from Facebook. This approach significantly increases the size of the training corpus without requiring manual annotation. The annotated corpus is then vectorized using document embedding (doc2vec), which is an extension of word embeddings (word2vec). For sentiment classification, the authors used different classifiers such as support vector machines (SVM), Naive Bayes (NB) and logistic regression (LR).

Findings

The results suggest that NB and SVM classifiers generally led to the best results and MLP generally had the worst results. Further, the threshold that the authors use in selecting messages for the training set had a noticeable impact on recall and precision, with a threshold of 0.6 producing the best results. Using PV-DBOW led to slightly higher results than using PV-DM. Combining PV-DBOW and PV-DM representations led to slightly lower results than using PV-DBOW alone. The best results were obtained by the NB classifier with F1 up to 86.9 per cent.
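The lexicon-based annotation and the selection threshold described above can be illustrated with a minimal sketch. The abstract does not say how the authors compute a message score or apply the threshold, so the scoring rule below (the difference between positive and negative lexicon hits, divided by their sum) is purely an assumption, as are the function name `annotate` and the toy English lexicon.

```python
def annotate(message, lexicon, threshold=0.6):
    """Label a message from lexicon hits, or return None if it is not confident.

    Assumed scoring rule (not from the paper): (pos - neg) / (pos + neg).
    """
    tokens = message.lower().split()
    pos = sum(1 for tok in tokens if lexicon.get(tok) == "positive")
    neg = sum(1 for tok in tokens if lexicon.get(tok) == "negative")
    if pos + neg == 0:
        return None  # no lexicon words: leave the message out of the training set
    score = (pos - neg) / (pos + neg)
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return None  # mixed evidence below the threshold


# Hypothetical toy lexicon, for illustration only.
toy_lexicon = {"good": "positive", "great": "positive",
               "bad": "negative", "awful": "negative"}
labels = [annotate(m, toy_lexicon)
          for m in ["good great day", "bad awful", "good good bad", "hello"]]
```

Under this assumed rule, raising the threshold keeps fewer but less ambiguous messages, which is one way a threshold could trade training-set size against label quality.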

Originality/value

The principal originality of this paper is to determine the right parameters for automatically annotating an Algerian dialect corpus. This annotation is based on a sentiment lexicon that was also constructed automatically.

Details

International Journal of Web Information Systems, vol. 15 no. 5
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 16 April 2018

Lynne Bowker

Abstract

Purpose

The purpose of this paper is to generate awareness of and interest in the techniques used in computer-based corpus linguistics, focusing on their methodological implications for research in library and information science (LIS).

Design/methodology/approach

This methodology paper provides an overview of computer-based corpus linguistics, describes the main techniques used in this field, assesses its strengths and weaknesses, and presents examples to illustrate the value of corpus linguistics to LIS research.

Findings

Overall, corpus-based techniques are simple, yet powerful, and they support both quantitative and qualitative analyses. While corpus methods alone may not be sufficient for research in LIS, they can be used to complement and to help triangulate the findings of other methods. Corpus linguistics techniques also have the potential to be exploited more fully in LIS research that involves a higher degree of automation (e.g. recommender systems, knowledge discovery systems, and text mining).

Practical implications

Numerous LIS researchers have drawn attention to the lack of diversity in research methods used in this field, and suggested that approaches permitting mixed methods research are needed. If LIS researchers learn about the potential of computer-based corpus methods, they can diversify their approaches.

Originality/value

Over the past quarter century, corpus linguistics has established itself as one of the main methods used in the field of linguistics, but its potential has not yet been realized by researchers in LIS. Corpus linguistics tools are readily available and relatively straightforward to apply. By raising awareness about corpus linguistics, the author hopes to make these techniques available as additional tools in the LIS researcher’s methodological toolbox, thus broadening the range of methods applied in this field.

Details

Library Hi Tech, vol. 36 no. 2
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 4 October 2018

Yuqin Liu, Lanling Han, Bo Jiang and Xiaoyan Su

Abstract

Purpose

The aim of this paper is to address the lack of real context in the JFL (Japanese as a Foreign Language) classroom through video corpus-based teaching. It also offers a reference for the development of video corpora.

Design/methodology/approach

The authors designed an intelligent Japanese online video corpus, namely the JV Finder, which is a corpus of Japanese films and TV series. The authors applied the JV Finder to JFL teaching to solve the problem of lack of real context and designed several teaching experiments to validate its benefits.

Findings

The results of the teaching experiments show that video corpus-based teaching significantly improves the learning effect. The JV Finder can help students memorize vocabulary and better understand the meaning of new vocabulary.

Research limitations/implications

There are still some differences in language context between real life and films, so films cannot fully reflect how native speakers use the language in real life. Meanwhile, the number of students participating in this experiment is relatively small, so the generalizability of the results needs further study.

Practical implications

This study combined linguistics with software engineering to solve the problem of lack of real context. Video corpus-based teaching can be used not only in the field of Japanese teaching but can also provide value for the teaching of other foreign languages.

Social implications

The JV Finder has obtained a Chinese national patent license (patent no. 20131118). The video corpus (the JV Finder) has a far-reaching impact on JFL teaching.

Originality/value

This paper provides an intelligent Japanese online video corpus. It is applied to JFL teaching to solve the problem of lack of real context. The findings show that the video corpus can significantly improve the effectiveness of Japanese learning.

Details

The Electronic Library, vol. 36 no. 4
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 1 May 2019

Mehrdad Vasheghani Farahani and Zeinab Amiri

Abstract

Purpose

In an effort to bridge the gap between applying translation corpora, specialized terminology teaching and translation performance of undergraduate students, the purpose of this paper is to investigate the possible impacts of teaching specialized terminology of law as a specific area of inquiry on the translation performance of Iranian undergraduate translation students (English–Persian language pair). The null hypothesis of this study is that using specialized terminology does not have a statistically significant impact on the translation performance of the translation students.

Design/methodology/approach

The design of this research was experimental in that it involved a pretest, a treatment, a posttest and random sampling. More specifically, it followed a pre-experimental one-group pretest-posttest design, which was used because the number of subjects who participated in the research was limited. Apart from being experimental, this research took a corpus-based perspective. As McEnery and Hardie (2012) claim, corpus-based research uses the “corpus data in order to explore a theory or hypothesis, typically one established in the current literature, in order to validate it, refute it or refine it” (p. 6). Table I shows the design of this research.

Findings

The results of this research indicated that, on the whole, the posttest results differed statistically significantly from those of the pretest. In this regard, the quality of students’ translations improved after using the specialized terminology in the form of three types of corpora. Indeed, there was a general trend of improvement in the quality of the novice translators’ work in translating specialized and subject-field terminologies in an English–Persian context.

Originality/value

This paper is original in that it probes into one of the less researched areas of Translation Studies research and employs corpus methodology.

Details

Journal of Applied Research in Higher Education, vol. 11 no. 3
Type: Research Article
ISSN: 2050-7003

Article
Publication date: 11 July 2019

Manjula Wijewickrema, Vivien Petras and Naomal Dias

Abstract

Purpose

The purpose of this paper is to develop a journal recommender system, which compares the content similarities between a manuscript and the existing journal articles in two subject corpora (covering the social sciences and medicine). The study examines the appropriateness of three text similarity measures and the impact of numerous aspects of corpus documents on system performance.

Design/methodology/approach

Three similarity measures were implemented, one at a time, on a journal recommender system with two separate journal corpora. Two distinct samples of test abstracts were classified and evaluated based on the normalized discounted cumulative gain.
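The evaluation metric named above, normalized discounted cumulative gain (nDCG), can be sketched as follows. This is a minimal illustration that assumes the common linear-gain formulation with a log2 rank discount; the abstract does not state which variant the authors used, and the relevance grades in the example are hypothetical.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades of recommended journals, in ranked order.
perfect_ranking = ndcg([3, 2, 1])
swapped_ranking = ndcg([1, 3])
```

A ranking that already lists the most relevant items first scores 1.0, and any misordering lowers the score.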

Findings

The BM25 similarity measure outperforms both the cosine and unigram language similarity measures overall. The unigram language measure shows the lowest performance. The performance results are significantly different between each pair of similarity measures, while the BM25 and cosine similarity measures are moderately correlated. The cosine similarity achieves better performance for subjects with higher density of technical vocabulary and shorter corpus documents. Moreover, increasing the number of corpus journals in the domain of social sciences achieved better performance for cosine similarity and BM25.
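To make the comparison above more concrete, the following is a minimal sketch of two of the measures, BM25 and cosine similarity, over tokenized documents. The parameter values `k1=1.5` and `b=0.75`, the smoothed IDF formula and the raw term-frequency vectors are common defaults assumed here for illustration; the abstract does not specify the authors' settings, and the toy documents are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every document in `corpus_tokens` against the query with BM25."""
    if not corpus_tokens:
        return []
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs
    doc_freq = Counter()
    for doc in corpus_tokens:
        doc_freq.update(set(doc))  # count each term once per document
    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in set(query_tokens):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5)
                           / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between raw term-frequency vectors."""
    vec_a, vec_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a)
    norm = (math.sqrt(sum(v * v for v in vec_a.values()))
            * math.sqrt(sum(v * v for v in vec_b.values())))
    return dot / norm if norm else 0.0


# Hypothetical toy data: a manuscript query and two journal "documents".
journals = [["corpus", "linguistics", "methods"],
            ["clinical", "trial", "outcomes"]]
manuscript = ["corpus", "methods"]
scores = bm25_scores(manuscript, journals)
```

BM25 adds term-frequency saturation and document-length normalization, which plain cosine similarity over raw counts does not have.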

Originality/value

This is the first work related to comparing the suitability of a number of string-based similarity measures with distinct corpora for journal recommender systems.

Details

The Electronic Library, vol. 37 no. 3
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 10 August 2012

Pierre Saulais and Jean‐Louis Ermine

Abstract

Purpose

Innovation within companies is becoming mandatory and vital. A policy of voluntarism aiming at supporting innovation can be based on an operational process managing the evolution of the firm's intellectual corpus, becoming a tool for innovation. This paper seeks to explain and demonstrate the link between knowledge management and innovation.

Design/methodology/approach

The fundamental assumption is to regard knowledge creation as an intellectual corpus evolution process, based on knowledge workers' creativity. Their creativity is stimulated by the critical analysis of the intellectual corpus, which leads to the creation of new technological trajectories in continuity with or divergence from existing trajectories. Based on a systemic model of intellectual capital, the analysis of the dynamics of knowledge has shown that the increase in value of intellectual capital may be described as an evolutionist process.

Findings

An experiment was conducted to validate the assumptions based on the analysis of the intellectual capital of a company, on the process of generating new items for the intellectual capital, on the regulation of this process by a community of knowledge workers and based on the integration of the results into the value chain of the organization.

Research limitations/implications

Based on interviews with experts about their inventive tracks during recent decades, the main limitations/difficulties come from making the inventory of the intellectual corpus of an organization.

Social implications

Social implications include an emphasis on the projection of experts' inventive tracks onto the knowledge map of the organization.

Originality/value

This paper links intellectual corpus and creativity: creation leads to intellectual property rights.
