Search results

1 – 10 of over 9000
Book part
Publication date: 26 July 2014

Lars Engwall, Enno Aljets, Tina Hedmo and Raphaël Ramuz

Computer corpus linguistics (CCL) is a scientific innovation that has facilitated the creation and analysis of large corpora in a systematic way by means of computer technology…

Abstract

Computer corpus linguistics (CCL) is a scientific innovation that has facilitated the creation and analysis of large corpora in a systematic way by means of computer technology since the 1950s. This article provides an account of the CCL pioneers in general but particularly of those in Germany, the Netherlands, Sweden, and Switzerland. It is found that Germany and Sweden, due to more advantageous financing and weaker communities of generativists, had a faster adoption of CCL than the other two countries. A particular late adopter among the four was Switzerland, which did not take up CCL until foreign professors had been recruited.

Details

Organizational Transformation and Scientific Change: The Impact of Institutional Restructuring on Universities and Intellectual Innovation
Type: Book
ISBN: 978-1-78350-684-2

Keywords

Book part
Publication date: 9 November 2020

Siân Alsop, Virginia King, Genie Giaimo and Xiaoyu Xu

In this chapter, we explore uses of corpus linguistics within higher education research. Corpus linguistic approaches enable examination of large bodies of language data based on…

Abstract

In this chapter, we explore uses of corpus linguistics within higher education research. Corpus linguistic approaches enable examination of large bodies of language data based on computing power. These bodies of data, or corpora, facilitate investigation of the meaning of words in context. The semiautomated nature of such investigation helps researchers to identify and interpret language patterns that might otherwise be inaccessible through manual analysis. We illustrate potential uses of corpus linguistic approaches through four short case studies by higher education researchers, spanning educational contexts, disciplines and genres. These case studies are underpinned by discussion of the development of corpus linguistics as a field of investigation, including existing open corpora and corpus analysis tools. We give a flavour of how corpus linguistic techniques, in isolation or as part of a wider research approach, can be particularly helpful to higher education researchers who wish to investigate language data and its context.

Details

Theory and Method in Higher Education Research
Type: Book
ISBN: 978-1-80043-321-2

Keywords

Article
Publication date: 28 November 2023

Mohamad Javad Baghiat Esfahani and Saeed Ketabi

This study attempts to evaluate the effect of the corpus-based inductive teaching approach with multiple academic corpora (PICA, CAEC and Oxford Corpus of Academic English) and…

Abstract

Purpose

This study attempts to evaluate the effect of the corpus-based inductive teaching approach with multiple academic corpora (PICA, CAEC and Oxford Corpus of Academic English) and conventional deductive teaching approach (i.e., multiple-choice items, filling the gap, matching and underlining) on learning academic collocations by Iranian advanced EFL learners (students learning English as a foreign language).

Design/methodology/approach

This is a quasi-experimental, quantitative and qualitative study.

Findings

The result showed the experimental group outperformed significantly compared with the control group. The experimental group also shared their perception of the advantages and disadvantages of the corpus-assisted language teaching approach.

Originality/value

Despite growing progress in language pedagogy, methodologies and language curriculum design, there are still many teachers who experience poor performance in their students' vocabulary, whether in comprehension or production. In Iran, for example, even though mandatory English education begins at the age of 13, which is junior and senior high school, students still have serious problems in language production and comprehension when they reach university levels.

Details

Journal of Applied Research in Higher Education, vol. 16 no. 4
Type: Research Article
ISSN: 2050-7003

Keywords

Article
Publication date: 19 January 2023

Peter Organisciak, Michele Newman, David Eby, Selcuk Acar and Denis Dumas

Most educational assessments tend to be constructed in a close-ended format, which is easier to score consistently and more affordable. However, recent work has leveraged…

Abstract

Purpose

Most educational assessments tend to be constructed in a close-ended format, which is easier to score consistently and more affordable. However, recent work has leveraged computation text methods from the information sciences to make open-ended measurement more effective and reliable for older students. The purpose of this study is to determine whether models used by computational text mining applications need to be adapted when used with samples of elementary-aged children.

Design/methodology/approach

This study introduces domain-adapted semantic models for child-specific text analysis, to allow better elementary-aged educational assessment. A corpus compiled from a multimodal mix of spoken and written child-directed sources is presented, used to train a children’s language model and evaluated against standard non-age-specific semantic models.

Findings

Child-oriented language is found to differ in vocabulary and word sense use from general English, while exhibiting lower gender and race biases. The model is evaluated in an educational application of divergent thinking measurement and shown to improve on generalized English models.

Research limitations/implications

The findings demonstrate the need for age-specific language models in the growing domain of automated divergent thinking and strongly encourage the same for other educational uses of computation text analysis by showing a measurable difference in the language of children.

Social implications

Understanding children’s language more representatively in automated educational assessment allows for more fair and equitable testing. Furthermore, child-specific language models have fewer gender and race biases.

Originality/value

Research in computational measurement of open-ended responses has thus far used models of language trained on general English sources or domain-specific sources such as textbooks. To the best of the authors’ knowledge, this paper is the first to study age-specific language models for educational assessment. In addition, while there have been several targeted, high-quality corpora of child-created or child-directed speech, the corpus presented here is the first developed with the breadth and scale required for large-scale text modeling.

Details

Information and Learning Sciences, vol. 124 no. 1/2
Type: Research Article
ISSN: 2398-5348

Keywords

Article
Publication date: 11 November 2019

Chinmay Tumbe

The purpose of this paper is to demonstrate the utility of corpus linguistics and digitised newspaper archives in management and organisational history.

1260

Abstract

Purpose

The purpose of this paper is to demonstrate the utility of corpus linguistics and digitised newspaper archives in management and organisational history.

Design/methodology/approach

The paper draws its inferences from Google NGram Viewer and five digitised historical newspaper databases – The Times of India, The Financial Times, The Economist, The New York Times and The Wall Street Journal – that contain prints from the nineteenth century.

Findings

The paper argues that corpus linguistics or the quantitative and qualitative analysis of large-scale real-world machine-readable text can be an important method of historical research in management studies, especially for discourse analysis. It shows how this method can be fruitfully used for research in management and organisational history, using term count and cluster analysis. In particular, historical databases of digitised newspapers serve as important corpora to understand the evolution of specific words and concepts. Corpus linguistics using newspaper archives can potentially serve as a method for periodisation and triangulation in corporate, analytically structured and serial histories and also foster cross-country comparisons in the evolution of management concepts.

Research limitations/implications

The paper also shows the limitation of the research method and potential robustness checks while using the method.

Practical implications

Findings of this paper can stimulate new ways of conducting research in management history.

Originality/value

The paper for the first time introduces corpus linguistics as a research method in management history.

Details

Journal of Management History, vol. 25 no. 4
Type: Research Article
ISSN: 1751-1348

Keywords

Article
Publication date: 3 February 2020

Nikola Nikolić, Olivera Grljević and Aleksandar Kovačević

Student recruitment and retention are important issues for all higher education institutions. Constant monitoring of student satisfaction levels is therefore crucial…

1026

Abstract

Purpose

Student recruitment and retention are important issues for all higher education institutions. Constant monitoring of student satisfaction levels is therefore crucial. Traditionally, students voice their opinions through official surveys organized by the universities. In addition to that, nowadays, social media and review websites such as “Rate my professors” are rich sources of opinions that should not be ignored. Automated mining of students’ opinions can be realized via aspect-based sentiment analysis (ABSA). ABSA s is a sub-discipline of natural language processing (NLP) that focusses on the identification of sentiments (negative, neutral, positive) and aspects (sentiment targets) in a sentence. The purpose of this paper is to introduce a system for ABSA of free text reviews expressed in student opinion surveys in the Serbian language. Sentiment analysis was carried out at the finest level of text granularity – the level of sentence segment (phrase and clause).

Design/methodology/approach

The presented system relies on NLP techniques, machine learning models, rules and dictionaries. The corpora collected and annotated for system development and evaluation comprise students’ reviews of teaching staff at the Faculty of Technical Sciences, University of Novi Sad, Serbia, and a corpus of publicly available reviews from the Serbian equivalent of the “Rate my professors” website.

Findings

The research results indicate that positive sentiment can successfully be identified with the F-measure of 0.83, while negative sentiment can be detected with the F-measure of 0.94. While the F-measure for the aspect’s range is between 0.49 and 0.89, depending on their frequency in the corpus. Furthermore, the authors have concluded that the quality of ABSA depends on the source of the reviews (official students’ surveys vs review websites).

Practical implications

The system for ABSA presented in this paper could improve the quality of service provided by the Serbian higher education institutions through a more effective search and summary of students’ opinions. For example, a particular educational institution could very easily find out which aspects of their service the students are not satisfied with and to which aspects of their service more attention should be directed.

Originality/value

To the best of the authors’ knowledge, this is the first study of ABSA carried out at the level of sentence segment for the Serbian language. The methodology and findings presented in this paper provide a much-needed bases for further work on sentiment analysis for the Serbian language that is well under-resourced and under-researched in this area.

Article
Publication date: 2 September 2019

Guellil Imane, Darwish Kareem and Azouaou Faical

This paper aims to propose an approach to automatically annotate a large corpus in Arabic dialect. This corpus is used in order to analyse sentiments of Arabic users on social…

Abstract

Purpose

This paper aims to propose an approach to automatically annotate a large corpus in Arabic dialect. This corpus is used in order to analyse sentiments of Arabic users on social medias. It focuses on the Algerian dialect, which is a sub-dialect of Maghrebi Arabic. Although Algerian is spoken by roughly 40 million speakers, few studies address the automated processing in general and the sentiment analysis in specific for Algerian.

Design/methodology/approach

The approach is based on the construction and use of a sentiment lexicon to automatically annotate a large corpus of Algerian text that is extracted from Facebook. Using this approach allow to significantly increase the size of the training corpus without calling the manual annotation. The annotated corpus is then vectorized using document embedding (doc2vec), which is an extension of word embeddings (word2vec). For sentiments classification, the authors used different classifiers such as support vector machines (SVM), Naive Bayes (NB) and logistic regression (LR).

Findings

The results suggest that NB and SVM classifiers generally led to the best results and MLP generally had the worst results. Further, the threshold that the authors use in selecting messages for the training set had a noticeable impact on recall and precision, with a threshold of 0.6 producing the best results. Using PV-DBOW led to slightly higher results than using PV-DM. Combining PV-DBOW and PV-DM representations led to slightly lower results than using PV-DBOW alone. The best results were obtained by the NB classifier with F1 up to 86.9 per cent.

Originality/value

The principal originality of this paper is to determine the right parameters for automatically annotating an Algerian dialect corpus. This annotation is based on a sentiment lexicon that was also constructed automatically.

Details

International Journal of Web Information Systems, vol. 15 no. 5
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 16 April 2018

Lynne Bowker

The purpose of this paper is to generate awareness of and interest in the techniques used in computer-based corpus linguistics, focusing on their methodological implications for…

1044

Abstract

Purpose

The purpose of this paper is to generate awareness of and interest in the techniques used in computer-based corpus linguistics, focusing on their methodological implications for research in library and information science (LIS).

Design/methodology/approach

This methodology paper provides an overview of computer-based corpus linguistics, describes the main techniques used in this field, assesses its strengths and weaknesses, and presents examples to illustrate the value of corpus linguistics to LIS research.

Findings

Overall, corpus-based techniques are simple, yet powerful, and they support both quantitative and qualitative analyses. While corpus methods alone may not be sufficient for research in LIS, they can be used to complement and to help triangulate the findings of other methods. Corpus linguistics techniques also have the potential to be exploited more fully in LIS research that involves a higher degree of automation (e.g. recommender systems, knowledge discovery systems, and text mining).

Practical implications

Numerous LIS researchers have drawn attention to the lack of diversity in research methods used in this field, and suggested that approaches permitting mixed methods research are needed. If LIS researchers learn about the potential of computer-based corpus methods, they can diversify their approaches.

Originality/value

Over the past quarter century, corpus linguistics has established itself as one of the main methods used in the field of linguistics, but its potential has not yet been realized by researchers in LIS. Corpus linguistics tools are readily available and relatively straightforward to apply. By raising awareness about corpus linguistics, the author hopes to make these techniques available as additional tools in the LIS researcher’s methodological toolbox, thus broadening the range of methods applied in this field.

Details

Library Hi Tech, vol. 36 no. 2
Type: Research Article
ISSN: 0737-8831

Keywords

Article
Publication date: 4 October 2018

Yuqin Liu, Lanling Han, Bo Jiang and Xiaoyan Su

The aim of this paper is to solve the problem of lack of real context in JFL (Japanese as Foreign Language) classroom with video corpus-based teaching. It also offers reference…

Abstract

Purpose

The aim of this paper is to solve the problem of lack of real context in JFL (Japanese as Foreign Language) classroom with video corpus-based teaching. It also offers reference for the development of video corpus.

Design/methodology/approach

The authors designed an intelligent Japanese online video corpus, namely the JV Finder, which is a corpus of Japanese films and TV series. The authors applied the JV Finder to JFL teaching to solve the problem of lack of real context and designed several teaching experiments to validate its benefits.

Findings

The results of teaching experiments show that the video corpus-based teaching significantly improves the learning effect. The JV Finder can help students memorize vocabularies and understand the meaning of new vocabularies in a better way.

Research limitations/implications

There are still some differences in language context between real life and films, which cannot fully reflect the state of native speaker in real life. Meanwhile, the number of students participating in this experiment is relatively small, so the universality of the result need further study.

Practical implications

This study combined linguistics with software engineering to solve the problem of lack of real context. Video corpus-based teaching not only can be used in Japanese teaching field but also provide value for other foreign language teaching.

Social implications

The JV Finder has obtained Chinese national patent license (patent no. 20131118). The video corpus (the JV Finder) has a far-reaching impact on JFL teaching.

Originality/value

This paper provides an intelligent Japanese online video corpus. It is applied to JFL teaching to solve the problem of lack of real context. The findings show that the video corpus can significantly improve the effectiveness of Japanese learning.

Details

The Electronic Library, vol. 36 no. 4
Type: Research Article
ISSN: 0264-0473

Keywords

Article
Publication date: 1 May 2019

Mehrdad Vasheghani Farahani and Zeinab Amiri

In an effort to bridge the gap between applying translation corpora, specialized terminology teaching and translation performance of undergraduate students, the purpose of this…

Abstract

Purpose

In an effort to bridge the gap between applying translation corpora, specialized terminology teaching and translation performance of undergraduate students, the purpose of this paper is to investigate the possible impacts of teaching specialized terminology of law as a specific area of inquiry on translation performance of Iranian undergraduate translation student (English–Persian language pairs). The null hypothesis of this study is that using specialized terminology does not have statistically significant impacts on the translation performance of the translation students.

Design/methodology/approach

The design of this research was experimental in that there was pretest, treatment, posttest and random sampling. In other words, this research was pre-experimental one-group pretest-posttest design. This design was used in this research as the number of subjects who participated in the research was limited. Apart from being experimental, this research enjoyed a corpus-based perspective. As Mcenery and Hardie (2012) claim, corpus-based research uses the “corpus data in order to explore a theory or hypothesis, typically one established in the current literature, in order to validate it, refute it or refine it” (p. 6). Table I shows the design of this research.

Findings

The results of this research indicated that on the whole, the posttest results had statistically significant differences with that of the pretest. In this regard, the quality of students’ translation enhanced after using the specialized terminology in the form of three types of corpora. Indeed, there was a general trend in the improved quality of the novice translators in translating specialized and subject-field terminologies in an English–Persian context.

Originality/value

This paper is original in that it probes into one of the less researched areas of Translation Studies Research and employs corpora methodology.

Details

Journal of Applied Research in Higher Education, vol. 11 no. 3
Type: Research Article
ISSN: 2050-7003

Keywords

1 – 10 of over 9000