Search results

1 – 10 of over 6000
Book part
Publication date: 26 July 2014

Lars Engwall, Enno Aljets, Tina Hedmo and Raphaël Ramuz

Abstract

Computer corpus linguistics (CCL) is a scientific innovation that has facilitated the creation and analysis of large corpora in a systematic way by means of computer technology since the 1950s. This article provides an account of the CCL pioneers in general but particularly of those in Germany, the Netherlands, Sweden, and Switzerland. It is found that Germany and Sweden, due to more advantageous financing and weaker communities of generativists, had a faster adoption of CCL than the other two countries. A particularly late adopter among the four was Switzerland, which did not take up CCL until foreign professors had been recruited.

Details

Organizational Transformation and Scientific Change: The Impact of Institutional Restructuring on Universities and Intellectual Innovation
Type: Book
ISBN: 978-1-78350-684-2

Book part
Publication date: 9 November 2020

Siân Alsop, Virginia King, Genie Giaimo and Xiaoyu Xu

Abstract

In this chapter, we explore uses of corpus linguistics within higher education research. Corpus linguistic approaches enable examination of large bodies of language data based on computing power. These bodies of data, or corpora, facilitate investigation of the meaning of words in context. The semiautomated nature of such investigation helps researchers to identify and interpret language patterns that might otherwise be inaccessible through manual analysis. We illustrate potential uses of corpus linguistic approaches through four short case studies by higher education researchers, spanning educational contexts, disciplines and genres. These case studies are underpinned by discussion of the development of corpus linguistics as a field of investigation, including existing open corpora and corpus analysis tools. We give a flavour of how corpus linguistic techniques, in isolation or as part of a wider research approach, can be particularly helpful to higher education researchers who wish to investigate language data and its context.
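As an illustration of examining words in context, the following is a minimal keyword-in-context (KWIC) concordance sketch. It is not taken from the chapter; the function name `kwic`, the `width` parameter and the sample sentence are assumptions made for this example, and tokenisation is deliberately naive.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for every hit.

    Hypothetical helper: lowercases the text, splits it on word
    characters, and keeps `width` tokens on each side of the keyword.
    """
    tokens = re.findall(r"\w+", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

# Invented sample text, for illustration only.
sample = "The corpus is large. A corpus helps researchers see words in context."
for left, kw, right in kwic(sample, "corpus"):
    print(f"{left:>25} [{kw}] {right}")
```

Aligning the keyword in a fixed column, as the print loop does, is what lets a reader scan recurring patterns to the left and right of the search term.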

Details

Theory and Method in Higher Education Research
Type: Book
ISBN: 978-1-80043-321-2

Content available
Article
Publication date: 17 July 2020

Imad Zeroual and Abdelhak Lakhouaja

Abstract

Recently, more data-driven approaches are demanding multilingual parallel resources, primarily in cross-language studies. To meet these demands, building multilingual parallel corpora is becoming the focus of many Natural Language Processing (NLP) scientific groups. Unlike monolingual corpora, the number of available multilingual parallel corpora is limited. In this paper, MulTed, a corpus of subtitles extracted from TEDx talks, is introduced. It is multilingual, Part of Speech (PoS) tagged, and bilingually sentence-aligned with English as a pivot language. This corpus is designed for many NLP applications where sentence alignment, PoS tagging, and corpus size are influential, such as statistical machine translation, language recognition, and bilingual dictionary generation. Currently, the corpus has subtitles that cover 1100 talks available in over 100 languages. The subtitles are classified based on a variety of topics such as Business, Education, and Sport. Regarding the PoS tagging, TreeTagger, a language-independent PoS tagger, is used; then, to make the PoS tagging maximally useful, a mapping process to a universal common tagset is performed. Finally, we believe that making the MulTed corpus available for public use can be a significant contribution to the literature of NLP and corpus linguistics, especially for under-resourced languages.
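The mapping step described in the abstract can be pictured with a small sketch. The table below is a hypothetical excerpt that maps a few Penn-style English tags to universal categories; it is not the MulTed mapping, and the names `TAG_MAP` and `to_universal` are assumptions introduced for illustration.

```python
# Hypothetical excerpt of a tagger-specific -> universal tagset mapping.
# A real mapping would cover every tag of every language's tagset.
TAG_MAP = {
    "NN": "NOUN", "NNS": "NOUN",
    "VB": "VERB", "VBD": "VERB",
    "JJ": "ADJ",
    "DT": "DET",
}

def to_universal(tagged, tag_map=TAG_MAP, unknown="X"):
    """Map (word, fine_tag) pairs to (word, universal_tag) pairs.

    Tags missing from the table fall back to the catch-all `unknown`.
    """
    return [(word, tag_map.get(tag, unknown)) for word, tag in tagged]

# Invented tagger output, for illustration only.
tagged = [("the", "DT"), ("talks", "NNS"), ("inspired", "VBD"), ("!", "SENT")]
print(to_universal(tagged))
```

Falling back to a catch-all category rather than raising an error keeps the pipeline running when a language-specific tag has no entry yet, at the cost of hiding gaps in the table.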

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2210-8327

Content available
Article
Publication date: 13 July 2020

Dalia Hamed

Abstract

Purpose

The purpose of this study is to apply a corpus-assisted analysis of keywords and their collocations in the US presidential discourse from Clinton to Trump to discover the meanings of these words and the collocates they have. Keywords are salient words in a corpus whose frequency is unusually high (positive keywords) or low (negative keywords) in comparison with a reference corpus. Collocation is the co-occurrence of words.
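To make the notion of keyness concrete, here is a minimal sketch of one statistic commonly used for it, the log-likelihood ratio between a target and a reference corpus. The abstract does not state which keyness measure the study used, so this choice is an assumption, as are the function name and the sample counts.

```python
import math

def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Signed log-likelihood keyness score for a single word.

    Positive: the word is relatively more frequent in the target
    corpus (a positive keyword). Negative: relatively less frequent
    (a negative keyword). Sizes are total token counts.
    """
    total = size_target + size_ref
    combined = freq_target + freq_ref
    expected_target = size_target * combined / total
    expected_ref = size_ref * combined / total
    ll = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    ll *= 2
    # Attach a sign so over- and under-use can be told apart.
    if freq_target / size_target < freq_ref / size_ref:
        ll = -ll
    return ll

# Invented counts: 50 hits in 10,000 target tokens vs 10 in 10,000 reference tokens.
print(round(log_likelihood(50, 10_000, 10, 10_000), 2))
```

The unsigned statistic only says how surprising the frequency difference is; the sign is added here so that the same function separates positive from negative keywords.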

Design/methodology/approach

To achieve this purpose, keywords and collocations are generated with AntConc, a corpus-processing software tool.

Findings

This analysis sheds light on the similarities and/or differences amongst the past four American presidents concerning their key topics. Keyword analysis through keyness makes it evident that Clinton and Obama, being Democrats, demonstrate a clear tendency to improve Americans’ life inside their social sphere. Obama surpasses Clinton as regards foreign affairs. Clinton and Obama’s infrequent subjects have to do with terrorism and immigration. This complies with their condensed focus on social and economic improvements. Bush, a Republican, concentrates only on external issues. This is proven by his keywords signifying war against terrorism. Bush’s negative use of words marking cooperative actions conforms to his positive use of words indicating external war. Trump’s positive keywords are about exaggerated descriptions without a defined target. He also shows an unusual frequency in referring to his name and position. His words used with negative keyness refer to reforming programs and external issues. Collocations around each top content keyword clarify the word and harmonize with the presidential orientation negotiated by the keywords.

Research limitations/implications

Limitations have to do with the issue of the accurate representation of the samples.

Originality/value

This research is original in its methodology of applying corpus linguistics tools in the analysis of presidential discourses.

Details

Journal of Humanities and Applied Social Sciences, vol. 3 no. 2
Type: Research Article
ISSN: 2632-279X

Article
Publication date: 11 November 2019

Chinmay Tumbe

Abstract

Purpose

The purpose of this paper is to demonstrate the utility of corpus linguistics and digitised newspaper archives in management and organisational history.

Design/methodology/approach

The paper draws its inferences from Google NGram Viewer and five digitised historical newspaper databases – The Times of India, The Financial Times, The Economist, The New York Times and The Wall Street Journal – that contain prints from the nineteenth century.

Findings

The paper argues that corpus linguistics or the quantitative and qualitative analysis of large-scale real-world machine-readable text can be an important method of historical research in management studies, especially for discourse analysis. It shows how this method can be fruitfully used for research in management and organisational history, using term count and cluster analysis. In particular, historical databases of digitised newspapers serve as important corpora to understand the evolution of specific words and concepts. Corpus linguistics using newspaper archives can potentially serve as a method for periodisation and triangulation in corporate, analytically structured and serial histories and also foster cross-country comparisons in the evolution of management concepts.
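A term count over a dated newspaper archive can be sketched as follows. This is only an illustration of the general idea, not the paper's procedure; the function name, the grouping by decade and the sample articles are assumptions invented for the example.

```python
import re
from collections import Counter

def term_counts_by_decade(articles, term):
    """Count whole-word, case-insensitive occurrences of `term` per decade.

    `articles` is an iterable of (year, text) pairs (hypothetical format).
    """
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    counts = Counter()
    for year, text in articles:
        decade = (year // 10) * 10
        counts[decade] += len(pattern.findall(text))
    return dict(sorted(counts.items()))

# Invented sample articles, for illustration only.
articles = [
    (1895, "Management of the mills. Good management."),
    (1899, "No mention."),
    (1923, "Scientific management arrives."),
]
print(term_counts_by_decade(articles, "management"))
```

Raw counts grow with the size of the archive, so in practice the counts per period would usually be divided by the number of words or articles in that period before periods are compared.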

Research limitations/implications

The paper also shows the limitations of the research method and potential robustness checks while using the method.

Practical implications

Findings of this paper can stimulate new ways of conducting research in management history.

Originality/value

The paper for the first time introduces corpus linguistics as a research method in management history.

Details

Journal of Management History, vol. 25 no. 4
Type: Research Article
ISSN: 1751-1348

Article
Publication date: 3 February 2020

Nikola Nikolić, Olivera Grljević and Aleksandar Kovačević

Abstract

Purpose

Student recruitment and retention are important issues for all higher education institutions. Constant monitoring of student satisfaction levels is therefore crucial. Traditionally, students voice their opinions through official surveys organized by the universities. In addition to that, nowadays, social media and review websites such as “Rate my professors” are rich sources of opinions that should not be ignored. Automated mining of students’ opinions can be realized via aspect-based sentiment analysis (ABSA). ABSA is a sub-discipline of natural language processing (NLP) that focusses on the identification of sentiments (negative, neutral, positive) and aspects (sentiment targets) in a sentence. The purpose of this paper is to introduce a system for ABSA of free text reviews expressed in student opinion surveys in the Serbian language. Sentiment analysis was carried out at the finest level of text granularity – the level of sentence segment (phrase and clause).

Design/methodology/approach

The presented system relies on NLP techniques, machine learning models, rules and dictionaries. The corpora collected and annotated for system development and evaluation comprise students’ reviews of teaching staff at the Faculty of Technical Sciences, University of Novi Sad, Serbia, and a corpus of publicly available reviews from the Serbian equivalent of the “Rate my professors” website.

Findings

The research results indicate that positive sentiment can successfully be identified with an F-measure of 0.83, while negative sentiment can be detected with an F-measure of 0.94. The F-measure for aspects ranges between 0.49 and 0.89, depending on their frequency in the corpus. Furthermore, the authors have concluded that the quality of ABSA depends on the source of the reviews (official students’ surveys vs review websites).
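For readers unfamiliar with the metric, the F-measure combines precision and recall into a single score. The sketch below computes it from raw counts; the function name and the sample counts are assumptions for illustration and are unrelated to the figures reported in the abstract.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-measure from true positives, false positives and false negatives.

    With beta=1 this is the harmonic mean of precision and recall (F1).
    Returns 0.0 when precision and recall are both zero or undefined.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Invented counts: 8 correct detections, 2 false alarms, 2 misses.
print(f_measure(tp=8, fp=2, fn=2))
```

Because the harmonic mean is pulled towards the lower of its two inputs, a classifier cannot reach a high F-measure by maximising precision or recall alone.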

Practical implications

The system for ABSA presented in this paper could improve the quality of service provided by the Serbian higher education institutions through a more effective search and summary of students’ opinions. For example, a particular educational institution could very easily find out which aspects of their service the students are not satisfied with and to which aspects of their service more attention should be directed.

Originality/value

To the best of the authors’ knowledge, this is the first study of ABSA carried out at the level of sentence segment for the Serbian language. The methodology and findings presented in this paper provide a much-needed basis for further work on sentiment analysis for the Serbian language, which is under-resourced and under-researched in this area.

Article
Publication date: 2 September 2019

Guellil Imane, Darwish Kareem and Azouaou Faical

Abstract

Purpose

This paper aims to propose an approach to automatically annotate a large corpus in Arabic dialect. This corpus is used to analyse the sentiments of Arabic users on social media. It focuses on the Algerian dialect, which is a sub-dialect of Maghrebi Arabic. Although Algerian is spoken by roughly 40 million speakers, few studies address automated processing in general, and sentiment analysis in particular, for Algerian.

Design/methodology/approach

The approach is based on the construction and use of a sentiment lexicon to automatically annotate a large corpus of Algerian text that is extracted from Facebook. This approach makes it possible to significantly increase the size of the training corpus without resorting to manual annotation. The annotated corpus is then vectorized using document embedding (doc2vec), which is an extension of word embeddings (word2vec). For sentiment classification, the authors used different classifiers such as support vector machines (SVM), Naive Bayes (NB) and logistic regression (LR).
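The general idea of lexicon-based automatic annotation can be sketched as below. This is not the authors' implementation: the tiny English lexicon stands in for an Algerian one, and the scoring rule (mean polarity of matched words), the `threshold` default and all names are assumptions made for illustration.

```python
# Hypothetical stand-in lexicon: word -> polarity in [-1, 1].
LEXICON = {"good": 1.0, "great": 1.0, "bad": -1.0, "awful": -1.0}

def annotate(message, lexicon=LEXICON, threshold=0.6):
    """Label a message 'positive' or 'negative', or return None.

    Assumed rule: average the polarity of the lexicon words found in
    the message and keep the message for training only if the average
    is at least `threshold` away from zero.
    """
    tokens = message.lower().split()
    scores = [lexicon[t] for t in tokens if t in lexicon]
    if not scores:
        return None  # no lexicon evidence, leave unlabelled
    score = sum(scores) / len(scores)
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return None  # mixed evidence, too uncertain to use as training data

# Invented messages, for illustration only.
for msg in ["great good day", "good but awful", "awful bad", "nothing here"]:
    print(msg, "->", annotate(msg))
```

Discarding low-confidence messages trades corpus size for label quality, which is why the choice of threshold matters for the classifiers trained afterwards.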

Findings

The results suggest that NB and SVM classifiers generally led to the best results and MLP generally had the worst results. Further, the threshold that the authors use in selecting messages for the training set had a noticeable impact on recall and precision, with a threshold of 0.6 producing the best results. Using PV-DBOW led to slightly higher results than using PV-DM. Combining PV-DBOW and PV-DM representations led to slightly lower results than using PV-DBOW alone. The best results were obtained by the NB classifier with F1 up to 86.9 per cent.

Originality/value

The principal originality of this paper is to determine the right parameters for automatically annotating an Algerian dialect corpus. This annotation is based on a sentiment lexicon that was also constructed automatically.

Details

International Journal of Web Information Systems, vol. 15 no. 5
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 16 April 2018

Lynne Bowker

Abstract

Purpose

The purpose of this paper is to generate awareness of and interest in the techniques used in computer-based corpus linguistics, focusing on their methodological implications for research in library and information science (LIS).

Design/methodology/approach

This methodology paper provides an overview of computer-based corpus linguistics, describes the main techniques used in this field, assesses its strengths and weaknesses, and presents examples to illustrate the value of corpus linguistics to LIS research.

Findings

Overall, corpus-based techniques are simple, yet powerful, and they support both quantitative and qualitative analyses. While corpus methods alone may not be sufficient for research in LIS, they can be used to complement and to help triangulate the findings of other methods. Corpus linguistics techniques also have the potential to be exploited more fully in LIS research that involves a higher degree of automation (e.g. recommender systems, knowledge discovery systems, and text mining).

Practical implications

Numerous LIS researchers have drawn attention to the lack of diversity in research methods used in this field, and suggested that approaches permitting mixed methods research are needed. If LIS researchers learn about the potential of computer-based corpus methods, they can diversify their approaches.

Originality/value

Over the past quarter century, corpus linguistics has established itself as one of the main methods used in the field of linguistics, but its potential has not yet been realized by researchers in LIS. Corpus linguistics tools are readily available and relatively straightforward to apply. By raising awareness about corpus linguistics, the author hopes to make these techniques available as additional tools in the LIS researcher’s methodological toolbox, thus broadening the range of methods applied in this field.

Details

Library Hi Tech, vol. 36 no. 2
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 4 October 2018

Yuqin Liu, Lanling Han, Bo Jiang and Xiaoyan Su

Abstract

Purpose

The aim of this paper is to solve the problem of lack of real context in JFL (Japanese as Foreign Language) classroom with video corpus-based teaching. It also offers reference for the development of video corpus.

Design/methodology/approach

The authors designed an intelligent Japanese online video corpus, namely the JV Finder, which is a corpus of Japanese films and TV series. The authors applied the JV Finder to JFL teaching to solve the problem of lack of real context and designed several teaching experiments to validate its benefits.

Findings

The results of the teaching experiments show that video corpus-based teaching significantly improves the learning effect. The JV Finder can help students memorize vocabulary and better understand the meaning of new vocabulary.

Research limitations/implications

There are still some differences in language context between real life and films, so films cannot fully reflect how native speakers talk in real life. Meanwhile, the number of students participating in this experiment is relatively small, so the universality of the result needs further study.

Practical implications

This study combined linguistics with software engineering to solve the problem of lack of real context. Video corpus-based teaching can be used not only in Japanese teaching but can also provide value for the teaching of other foreign languages.

Social implications

The JV Finder has obtained a Chinese national patent license (patent no. 20131118). The video corpus (the JV Finder) has a far-reaching impact on JFL teaching.

Originality/value

This paper provides an intelligent Japanese online video corpus. It is applied to JFL teaching to solve the problem of lack of real context. The findings show that the video corpus can significantly improve the effectiveness of Japanese learning.

Details

The Electronic Library, vol. 36 no. 4
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 1 May 2019

Mehrdad Vasheghani Farahani and Zeinab Amiri

Abstract

Purpose

In an effort to bridge the gap between applying translation corpora, specialized terminology teaching and translation performance of undergraduate students, the purpose of this paper is to investigate the possible impacts of teaching specialized terminology of law as a specific area of inquiry on the translation performance of Iranian undergraduate translation students (English–Persian language pairs). The null hypothesis of this study is that using specialized terminology does not have statistically significant impacts on the translation performance of the translation students.

Design/methodology/approach

The design of this research was experimental in that there was pretest, treatment, posttest and random sampling. In other words, this research used a pre-experimental one-group pretest-posttest design. This design was used in this research as the number of subjects who participated in the research was limited. Apart from being experimental, this research adopted a corpus-based perspective. As McEnery and Hardie (2012) claim, corpus-based research uses the “corpus data in order to explore a theory or hypothesis, typically one established in the current literature, in order to validate it, refute it or refine it” (p. 6). Table I shows the design of this research.

Findings

The results of this research indicated that, on the whole, the posttest results differed statistically significantly from the pretest results. In this regard, the quality of students’ translation improved after using the specialized terminology in the form of three types of corpora. Indeed, there was a general trend of improved quality among the novice translators in translating specialized and subject-field terminologies in an English–Persian context.

Originality/value

This paper is original in that it probes into one of the less researched areas of Translation Studies Research and employs corpus methodology.

Details

Journal of Applied Research in Higher Education, vol. 11 no. 3
Type: Research Article
ISSN: 2050-7003
