Search results
1 – 10 of 14

Imad Zeroual and Abdelhak Lakhouaja
Abstract
Recently, data-driven approaches have increasingly demanded multilingual parallel resources, primarily in cross-language studies. To meet these demands, building multilingual parallel corpora has become the focus of many Natural Language Processing (NLP) research groups. Unlike monolingual corpora, the number of available multilingual parallel corpora is limited. In this paper, MulTed, a corpus of subtitles extracted from TEDx talks, is introduced. It is multilingual, Part-of-Speech (PoS) tagged, and bilingually sentence-aligned with English as a pivot language. The corpus is designed for NLP applications in which sentence alignment, PoS tagging, and corpus size are influential, such as statistical machine translation, language recognition, and bilingual dictionary generation. Currently, the corpus covers 1,100 talks available in over 100 languages. The subtitles are classified by topic, such as Business, Education, and Sport. For PoS tagging, TreeTagger, a language-independent PoS tagger, is used; then, to make the tagging maximally useful, the tags are mapped to a universal common tagset. Finally, we believe that making the MulTed corpus publicly available can be a significant contribution to the NLP and corpus linguistics literature, especially for under-resourced languages.
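The mapping step described above can be sketched as a simple lookup from a language-specific tagset to a universal one. The tag names below are illustrative, not the actual TreeTagger output or the corpus's real mapping table.

```python
# Hypothetical sketch of mapping fine-grained PoS tags to a universal
# common tagset, as described for the MulTed corpus. Unknown tags fall
# back to the catch-all category "X".
UNIVERSAL_MAP = {
    "NN": "NOUN", "NNS": "NOUN",
    "VB": "VERB", "VBD": "VERB",
    "JJ": "ADJ",
    "DT": "DET",
}

def map_to_universal(tagged_tokens):
    """Replace each fine-grained tag with its universal equivalent."""
    return [(tok, UNIVERSAL_MAP.get(tag, "X")) for tok, tag in tagged_tokens]

tagged = [("ideas", "NNS"), ("spread", "VB"), ("fast", "JJ")]
print(map_to_universal(tagged))
```

A shared tagset like this is what lets a single downstream model consume annotations produced by different language-specific taggers.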
Radhia Toujani and Jalel Akaichi
Abstract
Purpose
Nowadays, event detection is important in gathering news from social media. Indeed, it is widely employed by journalists to generate early alerts of reported stories. To incorporate data available on social media into a news story, journalists must manually process, compile and verify the news content within a very short time span. Despite its utility and importance, this process is time-consuming and labor-intensive for media organizations. For this reason, and because social media provides an essential source of data supporting professional journalists, the purpose of this paper is to propose a citizen clustering technique that allows the community of journalists and media professionals to document news during crises.
Design/methodology/approach
The authors develop an approach for detecting news of natural hazard events and clustering groups of endangered citizens, based on three major steps. In the first stage, the authors present a pipeline of several natural language processing tasks: event trigger detection, applied to extract potential event triggers; named entity recognition, used to detect and recognize event participants related to the extracted triggers; and, finally, a dependency analysis of all the extracted data. Analyzing the ambiguity and vagueness of news similarity plays a key role in event detection, an issue ignored in traditional event detection techniques. To this end, in the second step of the approach, the authors apply fuzzy set techniques to the extracted events to enhance the clustering quality and remove the vagueness of the extracted information. The defined degree of citizen danger is then fed as input to the proposed citizen clustering method in order to detect communities of citizens with similar disaster degrees.
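The fuzzy-set step can be illustrated with a minimal sketch (not the authors' implementation): each extracted event receives a graded membership in every candidate cluster rather than a hard assignment, here based on overlap between the events' extracted participants.

```python
# Illustrative sketch of soft (fuzzy) assignment of an extracted event
# to clusters, based on participant overlap. Sets below are invented.

def jaccard(a, b):
    """Set overlap between two participant sets, in [0, 1]."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def fuzzy_memberships(event, clusters):
    """Membership of one event in each cluster, normalised to sum to 1."""
    sims = [jaccard(event, c) for c in clusters]
    total = sum(sims)
    if total == 0:
        return [1.0 / len(clusters)] * len(clusters)
    return [s / total for s in sims]

clusters = [{"flood", "river", "rescue"}, {"fire", "forest"}]
event = {"flood", "rescue", "volunteers"}
print(fuzzy_memberships(event, clusters))
```

Graded memberships like these are what let near-duplicate or ambiguous event reports contribute partially to several clusters instead of being forced into one.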
Findings
Empirical results indicate that homogeneous and compact citizen clusters can be detected using the suggested event detection method. Event news can also be analyzed efficiently using fuzzy theory. In addition, the proposed visualization process plays a crucial role in data journalism: it is used to analyze event news as well as in the final presentation of the detected clusters of endangered citizens.
Originality/value
The introduced citizen clustering method helps journalists and editors to better judge the veracity of social media content, navigate its overwhelming volume, identify eyewitnesses and contextualize the event. The empirical analysis illustrates the efficiency of the developed method for both real and artificial networks.
Flavio M. Cecchini, Greta H. Franzini and Marco C. Passarotti
Abstract
The presence of Latin in heavy metal music ranges from full texts, intros, song and album titles to band names, pseudonyms, and literary quotations. This chapter sheds light on heavy metal's fascination with the history and ‘arcane’ sound of Latin, and investigates its patterns of use in lyrics with the help of Natural Language Processing tools and digitally-available linguistic resources. First, the authors collected a corpus of lyrics containing differing amounts of Latin and enhanced it with descriptive metadata. Next, the authors calculated the richness of the vocabulary and the distribution of content words. The authors processed the corpus with a morphological analyser and performed both a manual and a computational search for intertextuality, including allusions, paraphrase and verbatim quotations of literary sources. The authors show that, despite it being a dead language, Latin is very frequently used in metal. Its historical status appears to fascinate bands and lends itself well to those religious, epic and mysterious themes so characteristic of the heavy metal world. The widespread use of Latin in metal lyrics, however, sees many bands simply reusing Latin texts – mostly from the Bible – or even misspelling literary quotations.
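The "richness of the vocabulary" the chapter computes can be illustrated with one standard measure, the type-token ratio (distinct word forms over total words). The measure choice and the Latin fragment below are illustrative assumptions, not taken from the chapter.

```python
# Minimal sketch of a vocabulary-richness measure: the type-token ratio.
# A low ratio signals heavy repetition; a high ratio, a varied lexicon.

def type_token_ratio(text):
    """Distinct word forms divided by total word count."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0

lyrics = "ave ave dominus tecum dominus"
print(type_token_ratio(lyrics))  # 3 distinct types over 5 tokens -> 0.6
```

Because the ratio shrinks as texts grow, comparisons of this kind are usually made over samples of equal length.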
Vijay Viswanathan, Edward C. Malthouse, Ewa Maslowska, Steven Hoornaert and Dirk Van den Poel
Abstract
Purpose
The purpose of this paper is to study consumer engagement as a dynamic, iterative process in the context of TV shows. A theoretical framework involving the central constructs of brand actions, customer engagement behaviors (CEBs), and consumption is proposed. Brand actions of TV shows include advertising and firm-generated content (FGC) on social media. CEBs include volume, sentiment, and richness of user-generated content (UGC) on social media. Consumption comprises live and time-shifted TV viewing.
Design/methodology/approach
The authors study 31 new TV shows introduced in 2015. Consistent with the ecosystem framework, a simultaneous system of equations approach is adopted to analyze data from a US Cable TV provider, Kantar Media, and Twitter.
Findings
The findings show that advertising efforts initiated by the TV show have a positive effect on time-shifted viewing, but a negative effect on live viewing; tweets posted by the TV show (FGC) have a negative effect on time-shifted viewing, but no effect on live viewing; and negative sentiment from tweets posted by viewers (UGC) reduces time-shifted viewing, but increases live viewing.
Originality/value
Content creators and TV networks are faced with the daunting challenge of retaining their audiences in a media-fragmented world. Whereas most studies on engagement have focused on static firm-customer relationships, this study examines engagement from a dynamic, multi-agent perspective by studying interrelationships among brand actions, CEBs, and consumption over time. Accordingly, this study can help brands to quantify the effectiveness of their engagement efforts in terms of encouraging CEBs and eliciting specific TV consumption behaviors.
Evanthia Faliagka, Athanasios Tsakalidis and Giannis Tzimas
Abstract
Purpose
The purpose of this paper is to present a novel approach for recruiting and ranking job applicants in online recruitment systems, with the objective of automating applicant pre‐screening. An integrated, company‐oriented e‐recruitment system was implemented based on the proposed scheme, and its functionality was showcased and evaluated in a real‐world recruitment scenario.
Design/methodology/approach
The proposed system implements automated candidate ranking, based on objective criteria that can be extracted from the applicant's LinkedIn profile. What is more, candidate personality traits are automatically extracted from his/her social presence using linguistic analysis. The applicant's rank is derived from individual selection criteria using analytical hierarchy process (AHP), while their relative significance (weight) is controlled by the recruiter.
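The AHP weighting step mentioned above can be sketched as follows: the recruiter's pairwise comparisons between criteria are collected in a matrix, and criterion weights are approximated by normalising each column and averaging the rows. The criteria and judgement values here are invented for illustration, not taken from the paper.

```python
# Hedged sketch of the analytic hierarchy process (AHP) weighting step:
# turn a pairwise-comparison matrix into a priority vector using the
# column-normalisation approximation of the principal eigenvector.

def ahp_weights(matrix):
    """Normalise each column, then average each row into a weight."""
    n = len(matrix)
    col_sums = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    normalised = [[matrix[r][c] / col_sums[c] for c in range(n)]
                  for r in range(n)]
    return [sum(row) / n for row in normalised]

# Hypothetical comparisons on Saaty's 1-9 scale:
# education vs experience vs skills.
comparisons = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 3.0],
    [1 / 5, 1 / 3, 1.0],
]
weights = ahp_weights(comparisons)
print([round(w, 3) for w in weights])  # weights sum to 1
```

Each candidate's overall rank is then a weighted sum of their scores on the individual criteria, with these weights under the recruiter's control.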
Findings
The proposed e‐recruitment system was deployed in a real‐world recruitment scenario, and its output was validated by expert recruiters. It was found that, with the exception of senior positions that required domain experience and specific qualifications, automated pre‐screening performed consistently with human recruiters.
Research limitations/implications
It was found that companies can increase the efficiency of the recruitment process if they integrate an e‐recruitment system in their human resources management infrastructure that automates the candidate pre‐screening process. Interviewing and background investigation of applicants can then be limited to the top candidates identified from the system.
Originality/value
To the best of the authors’ knowledge, this is the first e‐recruitment system that supports automated extraction of candidate personality traits using linguistic analysis and ranks candidates with the AHP.
Chedi Bechikh Ali, Hatem Haddad and Yahya Slimani
Abstract
Purpose
A number of approaches and algorithms have been proposed over the years as a basis for automatic indexing. Many of these approaches suffer from poor precision at low recall. The choice of indexing units has a great impact on search system effectiveness. The authors go beyond simple term indexing to propose a framework for multi-word term (MWT) filtering and indexing.
Design/methodology/approach
In this paper, the authors rely on ranking MWT to filter them, keeping the most effective ones for the indexing process. The proposed model is based on filtering MWT according to their ability to capture the document topic and distinguish between different documents from the same collection. The authors rely on the hypothesis that the best MWT are those that achieve the greatest association degree. The experiments are carried out with English and French languages data sets.
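The association-degree ranking described above can be sketched with one common association measure, pointwise mutual information (PMI). The abstract does not name the measures the authors test, so PMI and the toy token stream below are purely illustrative.

```python
# Illustrative sketch: scoring adjacent word pairs by PMI, one possible
# association measure for ranking candidate multi-word terms (MWT).
import math
from collections import Counter

def pmi_scores(tokens):
    """PMI for each adjacent word pair in a token stream."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni, n_bi = len(tokens), len(tokens) - 1
    return {
        (w1, w2): math.log2(
            (c / n_bi) / ((unigrams[w1] / n_uni) * (unigrams[w2] / n_uni))
        )
        for (w1, w2), c in bigrams.items()
    }

tokens = "new york is in new york state and new jersey".split()
scores = pmi_scores(tokens)
print(round(scores[("new", "york")], 3))
```

In the framework described, only the highest-scoring candidates would be kept as indexing units; a known caveat of raw PMI is that it over-rewards very rare pairs, which is one reason to compare several measures.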
Findings
The results indicate that this approach achieved precision enhancements at low recall, and it performed better than more advanced models based on terms dependencies.
Originality/value
The originality lies in using and testing different association measures to select the MWT that best describe the documents, enhancing precision among the first retrieved documents.
Janina Seutter, Michelle Müller, Stefanie Müller and Dennis Kundisch
Abstract
Purpose
Whenever social injustice tackled by social movements receives heightened media attention, charitable crowdfunding platforms offer an opportunity to proactively advocate for equality by donating money to affected people. This research examines how the Black Lives Matter movement and the associated social protest cycle after the death of George Floyd have influenced donation behavior for campaigns with a personal goal and those with a societal goal supporting the black community.
Design/methodology/approach
This paper follows a quantitative research approach by applying a quasi-experimental research design on a GoFundMe dataset. In total, 67,905 campaigns and 1,362,499 individual donations were analyzed.
Findings
We uncover a rise in donations for campaigns supporting the black community, which lasts substantially longer for campaigns with a societal than with a personal funding goal. Informed by construal level theory, we attribute this heterogeneity to changes in the level of abstractness of the problems that social movements aim to tackle.
Originality/value
This research advances the knowledge of individual donation behavior in charitable crowdfunding. Our results highlight the important role that charitable crowdfunding campaigns play in promoting social justice and anti-discrimination as part of social protest cycles.
Laura Rocca, Davide Giacomini and Paola Zola
Abstract
Purpose
Because of the expansion of the internet and the Web 2.0 phenomenon, new challenges are emerging in the disclosure practices adopted by organisations in the public sector. This study aims to examine local governments' (LGOs) use of social media (SM) in disclosing environmental actions/plans/information as a new way to improve accountability to citizens and obtain organisational legitimacy, along with the related sentiment of citizens' judgements.
Design/methodology/approach
This paper analyses the content of 39 Italian LGOs' public pages on Facebook. After distinguishing five classes of environmental issues (air, water, energy, waste and territory), an initial study is performed to detect possible sub-topics by applying latent Dirichlet allocation. Given a list of posts related to specific environmental themes, the researchers computed the sentiment of citizens' comments. To measure sentiment, two different approaches were implemented: one based on a lexicon dictionary and the other on convolutional neural networks.
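The lexicon-dictionary approach mentioned above can be sketched as follows: each comment is scored by counting matches against positive and negative word lists. The word lists and comments below are invented for illustration; the study's actual lexicon is not given in the abstract.

```python
# Hedged sketch of lexicon-based sentiment scoring: the score is the
# balance of positive vs negative lexicon hits, in [-1, 1].
POSITIVE = {"good", "clean", "great", "thanks"}
NEGATIVE = {"bad", "dirty", "waste", "polluted"}

def lexicon_sentiment(comment):
    """Return (pos - neg) / matched words, or 0.0 with no matches."""
    tokens = comment.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    matched = pos + neg
    return (pos - neg) / matched if matched else 0.0

print(lexicon_sentiment("great clean river thanks"))  # 1.0
print(lexicon_sentiment("dirty polluted water bad"))  # -1.0
```

Lexicon methods are transparent and cheap but miss negation and context, which is the usual motivation for pairing them with a learned model such as the convolutional network the study also implements.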
Findings
Facebook is used by LGOs to disclose environmental issues, focussing on their main interest in obtaining organisational legitimacy, and the analysis shows an increasing impact of Web 2.0 in the direct interaction of LGOs with citizens. On the other hand, there is a clear divergence of interest on environmental topics between LGOs and citizens in a dialogic accountability framework.
Practical implications
Sentiment analysis (SA) could be used by politicians, but also by managers/entrepreneurs in the business sector, to analyse stakeholders’ judgements of their communications/actions and plans on corporate social responsibility. This tool gives a result on time (i.e. not months or years after, as for the reporting system). It is cheaper than a survey and allows a first “photograph” of stakeholders’ sentiment. It can also be a useful tool for supporting, developing and improving environmental reporting.
Originality/value
To the best of the authors’ knowledge, this paper is one of the first to apply SA to environmental disclosure via SM in the public sphere. The study links modern techniques in natural language processing and machine learning with the important aspects of environmental communication between LGOs and citizens.