Lars Engwall, Enno Aljets, Tina Hedmo and Raphaël Ramuz
Abstract
Computer corpus linguistics (CCL) is a scientific innovation that has facilitated the creation and analysis of large corpora in a systematic way by means of computer technology since the 1950s. This article provides an account of the CCL pioneers in general but particularly of those in Germany, the Netherlands, Sweden, and Switzerland. It is found that Germany and Sweden, owing to more advantageous financing and weaker communities of generativists, adopted CCL faster than the other two countries. A particularly late adopter among the four was Switzerland, which did not take up CCL until foreign professors had been recruited.
Siân Alsop, Virginia King, Genie Giaimo and Xiaoyu Xu
Abstract
In this chapter, we explore uses of corpus linguistics within higher education research. Corpus linguistic approaches use computing power to enable the examination of large bodies of language data. These bodies of data, or corpora, facilitate investigation of the meaning of words in context. The semiautomated nature of such investigation helps researchers to identify and interpret language patterns that might otherwise be inaccessible through manual analysis. We illustrate potential uses of corpus linguistic approaches through four short case studies by higher education researchers, spanning educational contexts, disciplines and genres. These case studies are underpinned by discussion of the development of corpus linguistics as a field of investigation, including existing open corpora and corpus analysis tools. We give a flavour of how corpus linguistic techniques, in isolation or as part of a wider research approach, can be particularly helpful to higher education researchers who wish to investigate language data and its context.
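The "words in context" view that corpus tools provide is usually a keyword-in-context (KWIC) concordance. Below is a minimal sketch, assuming naive lowercase word tokenization. The function name `kwic` and the sample text are illustrative and do not come from the chapter or any particular corpus tool.

```python
import re


def kwic(text, keyword, window=4):
    """Return keyword-in-context (KWIC) lines for every occurrence of `keyword`.

    Each line shows up to `window` tokens of left and right context,
    with the left context right-aligned so the hits line up in one column.
    """
    tokens = re.findall(r"\w+", text.lower())  # naive word tokenization
    target = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left:>30} [{token}] {right}")
    return lines


sample = ("Students read the corpus. The corpus shows patterns. "
          "A large corpus helps researchers.")
for line in kwic(sample, "corpus", window=2):
    print(line)
```

The hits are aligned in a single column so that recurring left and right neighbours are easy to spot by eye. This is the kind of semiautomated reading the chapter describes.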
Mohamad Javad Baghiat Esfahani and Saeed Ketabi
Abstract
Purpose
This study attempts to evaluate the effect of a corpus-based inductive teaching approach using multiple academic corpora (PICA, CAEC and the Oxford Corpus of Academic English), compared with a conventional deductive teaching approach (i.e. multiple-choice items, gap filling, matching and underlining), on the learning of academic collocations by advanced Iranian EFL learners (students learning English as a foreign language).
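Corpus tools commonly rank collocation candidates by pointwise mutual information (PMI) over adjacent word pairs. PMI compares how often a pair occurs with how often it would occur by chance: log2(p(w1, w2) / (p(w1) p(w2))). The sketch below assumes whitespace-tokenized text and a minimum pair frequency. It is not the procedure used in this study, and `pmi_bigrams` is a hypothetical helper.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unstable PMI estimates
        p_pair = count / (n - 1)  # there are n - 1 adjacent pairs
        p1, p2 = unigrams[w1] / n, unigrams[w2] / n
        scores[(w1, w2)] = math.log2(p_pair / (p1 * p2))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


text = "academic writing is hard and academic writing takes practice and practice helps"
for pair, score in pmi_bigrams(text.split()):
    print(pair, round(score, 2))
```

The `min_count` filter matters in practice. Without it, pairs seen only once tend to receive the highest PMI scores.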
Design/methodology/approach
This is a quasi-experimental, quantitative and qualitative study.
Findings
The results showed that the experimental group significantly outperformed the control group. Learners in the experimental group also shared their perceptions of the advantages and disadvantages of the corpus-assisted language teaching approach.
Originality/value
Despite growing progress in language pedagogy, methodologies and language curriculum design, many teachers still see poor vocabulary performance in their students, in both comprehension and production. In Iran, for example, even though mandatory English education begins at the age of 13 and runs through junior and senior high school, students still have serious problems with language production and comprehension when they reach university.
Peter Organisciak, Michele Newman, David Eby, Selcuk Acar and Denis Dumas
Abstract
Purpose
Most educational assessments tend to be constructed in a closed-ended format, which is easier to score consistently and more affordable. However, recent work has leveraged computational text methods from the information sciences to make open-ended measurement more effective and reliable for older students. The purpose of this study is to determine whether models used by computational text mining applications need to be adapted when used with samples of elementary-aged children.
Design/methodology/approach
This study introduces domain-adapted semantic models for child-specific text analysis, to allow better educational assessment of elementary-aged children. The authors present a corpus compiled from a multimodal mix of spoken and written child-directed sources, use it to train a children’s language model, and evaluate that model against standard non-age-specific semantic models.
Findings
Child-oriented language is found to differ in vocabulary and word sense use from general English, while exhibiting lower gender and race biases. The model is evaluated in an educational application of divergent thinking measurement and shown to improve on generalized English models.
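Automated scoring of divergent thinking tasks, such as alternate uses of an object, commonly treats originality as the semantic distance between the prompt and a response. The abstract does not give a scoring formula. The sketch below therefore assumes cosine distance over vectors. The three-dimensional vectors are invented for illustration; in practice they would come from a trained semantic model such as the children's model described here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def originality(prompt_vec, response_vec):
    """Semantic distance = 1 - cosine similarity; larger means a more remote idea."""
    return 1.0 - cosine(prompt_vec, response_vec)


# Hypothetical vectors; a real system would look these up in a trained model.
vectors = {
    "brick": [0.9, 0.1, 0.0],
    "build a wall": [0.8, 0.2, 0.1],
    "paperweight": [0.2, 0.9, 0.3],
}

for response in ("build a wall", "paperweight"):
    score = originality(vectors["brick"], vectors[response])
    print(f"{response}: {score:.3f}")
```

With this setup, the unusual use ("paperweight") scores higher than the obvious one ("build a wall"). The paper's point is that these scores depend on which model supplies the vectors.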
Research limitations/implications
The findings demonstrate the need for age-specific language models in the growing domain of automated divergent thinking measurement and, by showing a measurable difference in the language of children, strongly encourage the same for other educational uses of computational text analysis.
Social implications
Understanding children’s language more representatively in automated educational assessment allows for fairer and more equitable testing. Furthermore, child-specific language models have fewer gender and race biases.
Originality/value
Research in computational measurement of open-ended responses has thus far used models of language trained on general English sources or domain-specific sources such as textbooks. To the best of the authors’ knowledge, this paper is the first to study age-specific language models for educational assessment. In addition, while there have been several targeted, high-quality corpora of child-created or child-directed speech, the corpus presented here is the first developed with the breadth and scale required for large-scale text modeling.
Abstract
Purpose
The purpose of this paper is to demonstrate the utility of corpus linguistics and digitised newspaper archives in management and organisational history.
Design/methodology/approach
The paper draws its inferences from Google Ngram Viewer and five digitised historical newspaper databases – The Times of India, The Financial Times, The Economist, The New York Times and The Wall Street Journal – whose archives extend back to the nineteenth century.
Findings
The paper argues that corpus linguistics, that is, the quantitative and qualitative analysis of large-scale, real-world, machine-readable text, can be an important method of historical research in management studies, especially for discourse analysis. It shows how this method can be fruitfully used for research in management and organisational history, using term counts and cluster analysis. In particular, historical databases of digitised newspapers serve as important corpora for understanding the evolution of specific words and concepts. Corpus linguistics using newspaper archives can potentially serve as a method for periodisation and triangulation in corporate, analytically structured and serial histories, and can also foster cross-country comparisons of the evolution of management concepts.
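Term counts over digitised newspapers are usually normalised by the amount of text per period, so that years with more printed text remain comparable. The sketch below assumes the articles are already available as (year, text) pairs. The function name and the two sample snippets are hypothetical, and none of the newspaper databases' actual interfaces are used.

```python
import re
from collections import Counter


def term_frequency_by_year(articles, term):
    """Occurrences of `term` per million tokens for each year.

    `articles` is an iterable of (year, text) pairs; `term` may span
    several words (e.g. "scientific management").
    """
    pattern = term.lower().split()
    size = len(pattern)
    hits, totals = Counter(), Counter()
    for year, text in articles:
        tokens = re.findall(r"[a-z]+", text.lower())
        totals[year] += len(tokens)
        hits[year] += sum(
            1 for i in range(len(tokens) - size + 1) if tokens[i:i + size] == pattern
        )
    return {y: 1_000_000 * hits[y] / totals[y] for y in sorted(totals) if totals[y]}


articles = [
    (1900, "The railway company hired a manager."),
    (1950, "Scientific management spread. Management consultants advised each manager."),
]
print(term_frequency_by_year(articles, "management"))  # {1900: 0.0, 1950: 250000.0}
```

Plotting these per-year rates is one simple way to look for the shifts that periodisation relies on. Exact token matching also keeps "manager" and "management" apart.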
Research limitations/implications
The paper also shows the limitations of the research method and potential robustness checks when using it.
Practical implications
Findings of this paper can stimulate new ways of conducting research in management history.
Originality/value
This paper is the first to introduce corpus linguistics as a research method in management history.
Nikola Nikolić, Olivera Grljević and Aleksandar Kovačević
Abstract
Purpose
Student recruitment and retention are important issues for all higher education institutions, so constant monitoring of student satisfaction levels is crucial. Traditionally, students voice their opinions through official surveys organized by the universities. Nowadays, social media and review websites such as “Rate my professors” are also rich sources of opinions that should not be ignored. Automated mining of students’ opinions can be realized via aspect-based sentiment analysis (ABSA). ABSA is a sub-discipline of natural language processing (NLP) that focuses on the identification of sentiments (negative, neutral, positive) and aspects (sentiment targets) in a sentence. The purpose of this paper is to introduce a system for ABSA of free-text reviews in student opinion surveys in the Serbian language. Sentiment analysis was carried out at the finest level of text granularity: the level of the sentence segment (phrase and clause).
Design/methodology/approach
The presented system relies on NLP techniques, machine learning models, rules and dictionaries. The corpora collected and annotated for system development and evaluation comprise students’ reviews of teaching staff at the Faculty of Technical Sciences, University of Novi Sad, Serbia, and a corpus of publicly available reviews from the Serbian equivalent of the “Rate my professors” website.
Findings
The research results indicate that positive sentiment can be identified with an F-measure of 0.83, while negative sentiment can be detected with an F-measure of 0.94. For aspects, the F-measure ranges from 0.49 to 0.89, depending on their frequency in the corpus. Furthermore, the authors conclude that the quality of ABSA depends on the source of the reviews (official student surveys vs review websites).
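The F-measure reported here is the harmonic mean of precision and recall. This is F1 when both are weighted equally; the general F-beta form is (1 + β²)PR / (β²P + R). The small helper below computes it from true positive, false positive and false negative counts. The counts in the example are hypothetical, not the paper's data.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-beta score from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for one sentiment class.
print(round(f_measure(tp=8, fp=2, fn=4), 3))  # 0.727
```

For F1 the formula reduces to 2TP / (2TP + FP + FN). So 8 true positives, 2 false positives and 4 false negatives give 16/22 ≈ 0.727.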
Practical implications
The system for ABSA presented in this paper could improve the quality of service provided by the Serbian higher education institutions through a more effective search and summary of students’ opinions. For example, a particular educational institution could very easily find out which aspects of their service the students are not satisfied with and to which aspects of their service more attention should be directed.
Originality/value
To the best of the authors’ knowledge, this is the first study of ABSA carried out at the level of the sentence segment for the Serbian language. The methodology and findings presented in this paper provide a much-needed basis for further work on sentiment analysis for Serbian, which is under-resourced and under-researched in this area.
Guellil Imane, Darwish Kareem and Azouaou Faical
Abstract
Purpose
This paper aims to propose an approach to automatically annotate a large corpus in Arabic dialect, which is then used to analyse the sentiments of Arabic users on social media. It focuses on the Algerian dialect, which is a sub-dialect of Maghrebi Arabic. Although Algerian is spoken by roughly 40 million speakers, few studies address its automated processing in general, or sentiment analysis for it in particular.
Design/methodology/approach
The approach is based on the construction and use of a sentiment lexicon to automatically annotate a large corpus of Algerian text extracted from Facebook. This approach makes it possible to significantly increase the size of the training corpus without resorting to manual annotation. The annotated corpus is then vectorized using document embeddings (doc2vec), an extension of word embeddings (word2vec). For sentiment classification, the authors used different classifiers, such as support vector machines (SVM), Naive Bayes (NB) and logistic regression (LR).
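The abstract does not spell out how lexicon scores become training labels. The sketch below assumes one plausible scheme. Each message is scored by the share of its lexicon words that are positive, and only messages that clear a confidence threshold are kept for training. The default threshold of 0.6 is the value the authors report as best. The tiny English lexicon stands in for an Algerian sentiment lexicon, and `auto_annotate` is a hypothetical name.

```python
def auto_annotate(messages, lexicon, threshold=0.6):
    """Label messages with a sentiment lexicon and keep only confident ones.

    `lexicon` maps a word to +1 (positive) or -1 (negative). A message is
    labelled "pos" if its share of positive lexicon words is >= threshold,
    "neg" if that share is <= 1 - threshold, and is dropped otherwise.
    """
    labelled = []
    for msg in messages:
        polarities = [lexicon[w] for w in msg.lower().split() if w in lexicon]
        if not polarities:
            continue  # no lexicon evidence: leave the message unannotated
        pos_share = polarities.count(1) / len(polarities)
        if pos_share >= threshold:
            labelled.append((msg, "pos"))
        elif pos_share <= 1 - threshold:
            labelled.append((msg, "neg"))
    return labelled


lexicon = {"good": 1, "great": 1, "bad": -1, "awful": -1}  # placeholder entries
messages = ["good great service", "bad awful network", "good but bad", "no opinion"]
print(auto_annotate(messages, lexicon))
# [('good great service', 'pos'), ('bad awful network', 'neg')]
```

Raising the threshold yields fewer but cleaner training labels. This trade-off is consistent with the reported effect of the threshold on precision and recall.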
Findings
The results suggest that the NB and SVM classifiers generally led to the best results and the multilayer perceptron (MLP) generally to the worst. Further, the threshold that the authors used in selecting messages for the training set had a noticeable impact on recall and precision, with a threshold of 0.6 producing the best results. Using PV-DBOW (the distributed bag-of-words mode of doc2vec) led to slightly better results than using PV-DM (the distributed memory mode), and combining the PV-DBOW and PV-DM representations led to slightly worse results than using PV-DBOW alone. The best results were obtained by the NB classifier, with an F1 of up to 86.9 per cent.
Originality/value
The principal originality of this paper lies in determining the right parameters for automatically annotating an Algerian dialect corpus. This annotation is based on a sentiment lexicon that was also constructed automatically.
Abstract
Purpose
The purpose of this paper is to generate awareness of and interest in the techniques used in computer-based corpus linguistics, focusing on their methodological implications for research in library and information science (LIS).
Design/methodology/approach
This methodology paper provides an overview of computer-based corpus linguistics, describes the main techniques used in this field, assesses its strengths and weaknesses, and presents examples to illustrate the value of corpus linguistics to LIS research.
Findings
Overall, corpus-based techniques are simple, yet powerful, and they support both quantitative and qualitative analyses. While corpus methods alone may not be sufficient for research in LIS, they can be used to complement and to help triangulate the findings of other methods. Corpus linguistics techniques also have the potential to be exploited more fully in LIS research that involves a higher degree of automation (e.g. recommender systems, knowledge discovery systems, and text mining).
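The abstract does not list the specific techniques it covers. One widely used corpus technique is keyness, which flags words that are unusually frequent in one corpus compared with another. Below is a minimal sketch of the log-likelihood (G2) keyness statistic, assuming raw word frequencies and corpus sizes (in tokens) are already known. The function name is illustrative.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood (G2) keyness of a word in corpus A versus corpus B."""
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # the 0 * ln(0) term is taken as 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# A word seen 30 times in one 10,000-token corpus and 10 times in another.
print(round(log_likelihood(30, 10_000, 10, 10_000), 2))
```

With one degree of freedom, a G2 above 3.84 corresponds to p < 0.05. Keyness lists are therefore often cut at that value or at a stricter one.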
Practical implications
Numerous LIS researchers have drawn attention to the lack of diversity in research methods used in this field, and suggested that approaches permitting mixed methods research are needed. If LIS researchers learn about the potential of computer-based corpus methods, they can diversify their approaches.
Originality/value
Over the past quarter century, corpus linguistics has established itself as one of the main methods used in the field of linguistics, but its potential has not yet been realized by researchers in LIS. Corpus linguistics tools are readily available and relatively straightforward to apply. By raising awareness about corpus linguistics, the author hopes to make these techniques available as additional tools in the LIS researcher’s methodological toolbox, thus broadening the range of methods applied in this field.
Yuqin Liu, Lanling Han, Bo Jiang and Xiaoyan Su
Abstract
Purpose
The aim of this paper is to address the lack of real context in JFL (Japanese as a Foreign Language) classrooms through video corpus-based teaching. It also offers a reference for the development of video corpora.
Design/methodology/approach
The authors designed an intelligent Japanese online video corpus, the JV Finder, which is a corpus of Japanese films and TV series. They applied the JV Finder to JFL teaching to address the lack of real context and designed several teaching experiments to validate its benefits.
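The abstract does not describe how the JV Finder is implemented. As a rough illustration only, a video corpus query can be reduced to searching time-stamped subtitle lines and returning the matching clips. The `Subtitle` record, the titles and the timings below are hypothetical. Plain substring matching is used because Japanese is written without spaces between words; a fuller system would more likely use a morphological analyser so that inflected forms are also found.

```python
from dataclasses import dataclass


@dataclass
class Subtitle:
    title: str    # film or TV episode (hypothetical)
    start: float  # clip start time in seconds
    end: float    # clip end time in seconds
    text: str     # subtitle line in Japanese


def find_clips(subtitles, query):
    """Return every subtitle line containing `query`, keeping its timestamps for playback."""
    return [s for s in subtitles if query in s.text]


corpus = [
    Subtitle("Film A", 12.0, 14.5, "ありがとうございます"),
    Subtitle("Film B", 30.0, 32.0, "また明日"),
    Subtitle("Film C", 5.0, 7.0, "本当にありがとう"),
]
for clip in find_clips(corpus, "ありがとう"):
    print(f"{clip.title} {clip.start:.1f}-{clip.end:.1f}s: {clip.text}")
```

Returning timestamps rather than bare text is what would let a teaching tool play the clip in which the word is actually spoken.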
Findings
The results of the teaching experiments show that video corpus-based teaching significantly improves learning outcomes. The JV Finder can help students memorize vocabulary and better understand the meaning of new words.
Research limitations/implications
There are still some differences in language context between real life and films, so films cannot fully reflect how native speakers use the language in real life. Meanwhile, the number of students participating in this experiment was relatively small, so the generalizability of the results needs further study.
Practical implications
This study combined linguistics with software engineering to address the lack of real context. Video corpus-based teaching can be used not only in Japanese teaching but also in other foreign language teaching.
Social implications
The JV Finder has obtained a Chinese national patent license (patent no. 20131118). The video corpus (the JV Finder) has a far-reaching impact on JFL teaching.
Originality/value
This paper presents an intelligent Japanese online video corpus and applies it to JFL teaching to address the lack of real context. The findings show that the video corpus can significantly improve the effectiveness of Japanese learning.
Mehrdad Vasheghani Farahani and Zeinab Amiri
Abstract
Purpose
In an effort to bridge the gap between the use of translation corpora, specialized terminology teaching and the translation performance of undergraduate students, the purpose of this paper is to investigate the possible impact of teaching the specialized terminology of law, as a specific area of inquiry, on the translation performance of Iranian undergraduate translation students (English–Persian language pair). The null hypothesis of this study is that using specialized terminology does not have a statistically significant impact on the translation performance of the translation students.
Design/methodology/approach
The research was experimental in that it involved a pretest, a treatment, a posttest and random sampling; more precisely, it followed a pre-experimental one-group pretest-posttest design. This design was used because the number of participants was limited. Apart from being experimental, the research adopted a corpus-based perspective. As McEnery and Hardie (2012) put it, corpus-based research uses “corpus data in order to explore a theory or hypothesis, typically one established in the current literature, in order to validate it, refute it or refine it” (p. 6). Table I shows the design of this research.
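The abstract does not name the significance test used to compare pretest and posttest scores. In a one-group pretest-posttest design, a paired-samples t-test on each student's score difference is a common choice. The sketch below assumes that test, and the scores are invented for illustration.

```python
import math
import statistics


def paired_t(pre, post):
    """Paired-samples t statistic and degrees of freedom for pretest/posttest scores."""
    diffs = [b - a for a, b in zip(pre, post)]  # posttest minus pretest, per student
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (divides by n - 1)
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1


pre = [10, 12, 9, 11, 13]   # hypothetical pretest scores
post = [12, 15, 10, 14, 15]  # hypothetical posttest scores
t, df = paired_t(pre, post)
print(f"t = {t:.2f}, df = {df}")
```

For these made-up scores, the differences are 2, 3, 1, 3 and 2, which gives t ≈ 5.88 with 4 degrees of freedom. The t value would then be compared with the critical value for the chosen significance level.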
Findings
The results of this research indicated that, on the whole, the posttest results differed significantly from the pretest results. In this regard, the quality of the students’ translations improved after they used the specialized terminology in the form of three types of corpora. Indeed, there was a general trend of improvement in how the novice translators rendered specialized and subject-field terminology in an English–Persian context.
Originality/value
This paper is original in that it probes one of the less-researched areas of translation studies and employs a corpus methodology.