Keywords and collocations in US presidential discourse since 1993: a corpus-assisted analysis

Dalia Hamed (Department of Foreign Languages (English), Faculty of Education, Tanta University, Tanta, Egypt)

Journal of Humanities and Applied Social Sciences

ISSN: 2632-279X

Article publication date: 13 July 2020

Issue publication date: 23 April 2021




The purpose of this study is to apply a corpus-assisted analysis of keywords and their collocations in the US presidential discourse from Clinton to Trump to discover the meanings of these words and the collocates they have. Keywords are salient words in a corpus whose frequency is unusually high (positive keywords) or low (negative keywords) in comparison with a reference corpus. Collocation is the co-occurrence of words.


To achieve this purpose, keywords and collocations are generated with AntConc, a corpus-processing software package.


This analysis sheds light on the similarities and/or differences amongst the past four American presidents concerning their key topics. Keyness analysis makes it evident that Clinton and Obama, being Democrats, demonstrate a clear tendency to improve Americans’ lives within their social sphere. Obama surpasses Clinton as regards foreign affairs. Clinton and Obama’s infrequent subjects have to do with terrorism and immigration. This is consistent with their concentrated focus on social and economic improvements. Bush, a Republican, concentrates only on external issues. This is shown by his keywords signifying war against terrorism. Bush’s negative use of words marking cooperative actions conforms to his positive use of words indicating external war. Trump’s positive keywords are about exaggerated descriptions without a defined target. He also refers to his name and position with unusual frequency. His words with negative keyness refer to reform programmes and external issues. Collocations around each top content keyword clarify the word and harmonize with the presidential orientation conveyed by the keywords.

Research limitations/implications

Limitations have to do with the issue of the accurate representation of the samples.


This research is original in its methodology of applying corpus linguistics tools in the analysis of presidential discourses.



Hamed, D. (2021), "Keywords and collocations in US presidential discourse since 1993: a corpus-assisted analysis", Journal of Humanities and Applied Social Sciences, Vol. 3 No. 2, pp. 137-158.



Emerald Publishing Limited

Copyright © 2020, Dalia Hamed.


Published in Journal of Humanities and Applied Social Sciences. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at

1. Statement of the problem

A corpus is a large collection of texts and examples of language stored electronically (Bennet, 2010). Corpus linguistics is considered to be an empirical approach to language analysis that depends on a sample of the target language stored as an electronic database (i.e. a corpus) (Biber et al. as cited in Anthony, 2017). It is a methodological approach for analysing large collections of texts, and it depends on software to process large amounts of data. Corpus linguistic analysis, in this way, facilitates textual comparisons on a large scale. Corpora contain thousands of words from newspapers, films, transcriptions of spoken discourse, parliamentary speeches, presidential speeches and the like. Recently, electronic corpora have come to prominence as a resource examined by researchers in linguistic studies (McEnery and Gabrielatos, 2006).

AntConc (Anthony, 2011) is a corpus-processing software package that offers many functions, including the automatic generation of word lists, concordances, keyword lists (for which a reference corpus (RC) is needed) and collocations for particular words or phrases. The last two functions are the focus of this paper.

Keywords are words or phrases that describe a given content, text or database. A keyword is used to refer to a word or a phrase when one wants to emphasize how important that word or that phrase is (Collins English Dictionary). Keywords are salient words in a corpus whose frequency is unusually high (positive keywords) or low (negative keywords) in comparison with a reference corpus (Scott, 2008). To compute keywords, the study corpus (SC) is compared to a RC. The two corpora are processed via corpus analysis software, AntConc in this study.

Collocation is defined as the co-occurrence of words with a frequency that is much higher than it would be by chance (Walter in O’Keeffe and McCarthy, 2010). Collocation is a crucial notion in linguistic studies, as the term is used to:

  • describe the way in which words group together in their normal use in texts; and

  • describe the analysis tool used to explore this grouping and to assess its significance and implications (Barnbrook et al., 2013, p. 3).

This corpus-assisted study investigates keywords and their collocations in US presidential discourse from Clinton to Trump to discover the meanings of these words, what they tell about presidential stances and what interesting collocates they have. This analysis sheds light on the similarities and/or differences amongst the past four American presidents concerning their key topics and the items collocating with each key topic.

2. Aim and significance

Because language use “is one of the most mysterious products of the human mind” (Van Peer, 2011, p. 1), this paper aims at examining the statistically notable keywords and their collocates in the past four American presidents’ discourses to clarify the topics de/emphasized and the relevant attitudes towards each cause. The current paper aims to present an investigation of keywords and their relevant collocations in an attempt to analyse American presidential discourse within a corpus-linguistic methodological framework.

In this regard, the main objective of this study is the exploration of keywords in American presidential speeches since 1993. Another objective is the search for each keyword’s collocations. Studying items that have considerable keyness, whether positive or negative, together with the words collocating with them, may be a window onto the presidents’ pivotal schemes.

This research is significant in that it compares American presidential corpora to a larger RC. The RC in this study is compiled to include all the American presidential discourses from Clinton to Trump in an attempt to discover the keywords, their collocates and the resulting inferences. This examination of keywords is significant as they are assumed to be tools summarizing the president’s focal points. The method of analysis comprises, in addition to keywords, collocations, so that the surrounding text may be considered and the contextual meaning of each key term may be defined. This procedure, accordingly, yields a more informative analytical method of research.

3. Research questions

Because this paper is a corpus linguistics analysis with a text-linguistic focus, it raises questions about text inequality. Some words or phrases share the quality of being a key to the whole theme, while others are not. Accordingly, this paper attempts to address the following questions:


What are the prominent keywords and their relevant collocations in Clinton, Bush, Obama and Trump’s speeches?


What are the words with positive keyness?


What are the relevant collocations of words that have positive keyness?


What are the words with negative keyness?


What are the relevant collocations of words that have negative keyness?


What do these tell about the prevailing attitude/stance of each president?

4. Literature review

4.1 Corpus linguistics

Corpus linguistics analyses generated data for propositions that are quantitative, empirical and probabilistic. Quantitative analysis is carried out via computer software that clarifies electronically generated data. The interpretation of quantitative data forms the qualitative analysis (Starcke, 2010).

Biber et al. (1998, p. 4, as cited in Balossi, 2014) demonstrate the characteristics of corpus linguistics as follows:

  • It is empirical.

  • It makes extensive use of computers for analysis.

  • It depends on both quantitative and qualitative analytical approaches.

The integration of corpus analysis tools and discourse analysis has become widespread in the past few decades as linguists have begun to generate linguistic analyses based on corpus and discourse tools. Because corpus linguistics and discourse analysis are “the twin pillars of language research” (Sinclair, 2004, p. 11), this study makes use of both to carry out a quantitative analysis via corpus tools followed by a qualitative analysis via discourse analysis tools. This integration of corpus linguistics and discourse analysis, namely corpus-assisted discourse studies (Partington, 2006), yields results that are reliable due to their dependence on a large number of discourses.

4.2 Keywords/keyness

A key is a tool that has the metaphorical power of opening/closing and revealing/hiding what is unknown or unclear. It gives access to the features of the text/corpus (Bondi, 2010, as cited in Bondi and Scott, 2010).

A keyword is what serves as a key to the code; it is the significant word that is used as a reference for finding other words. Keyness is the statistical measure of the significance of a keyword’s frequency in a given corpus in relation to a RC.

Words are of two main kinds: content words and function words. Function words express grammatical functions; they are prepositions, articles, conjunctions, pronouns and forms indicating number, gender or tense. Content words express the cultural content of the text. They comprise nouns, verbs, adjectives and adverbs. Nouns and verbs, in particular, are theme-related or content-heavy words as they define the actions intended by the speaker/writer (Chung and Pennebaker, 2007). Nouns are either common or proper. While common nouns refer to general entities, proper nouns refer to specific entities, persons, places or times.

A keyword, according to Scott (1997), is a word that appears with unusual frequency in a text. The word “unusual”, he adds, does not necessarily mean a high frequency; it refers to frequency or infrequency that is unusual in comparison with a RC. Keywords are those words or phrases (multi-word lexemes) that echo the topic and are related to it. Keyword analysis has become an effective method for identifying the discourse topic. Keywords are “lexical items possessing statistical keyness”, while key words refer to “items perceived by human readers as key” (Tyrkko (2010), as cited in Bondi and Scott, 2010, p. 80).

Keyness is the statistically higher frequency of certain words in comparison with a RC (Baker et al., 2008). Biber et al. (2007a, 2007b, p. 138) define the keyness of keywords as follows:

The keyness of a keyword represents the value of log-likelihood or χ2 statistics; in other words, it provides an indicator of a keyword’s importance as a content descriptor for the appeal.

According to Phillips (1989), as cited in Gabrielatos (2018), the notion of keyness and that of aboutness are closely related, as both refer to the topics and concepts in the corpus. Keyness is a text-dependent quality of words, phrases or word clusters, not a language-dependent one. As a result, words are not key in a language as such; they may be key in a given text. Within this view of keyness, some words stand above the general level in a certain text because of their prominence and salience. Keyness analysis is significant in that it may lead us “to perceive the aboutness of the whole text or of certain parts of it […]” (Scott, 1996, pp. 34–44). Scott adds that the word “key” entails a metaphor in that a key grants access to a place that is restricted or sealed. A word or a phrase that is key is a piece of evidence that gives access to the meaning of the text.

4.3 Collocation

As far as the frequency-based approach is concerned, collocation is the frequent co-occurrence of words within a certain distance, usually taken to be four words to either side of the specified focal word or node. The phraseological approach considers collocation to be a type of commonly fixed word combination (Carter, 1998 and Siepmann, 2007 as cited in Al Ghazali, 2006). Semantically based approaches to collocations assume that they consist of two semantically differing constituents: a semantic base that combines with a semantically dependent collocate. “Pay a compliment”, for instance, is the combination of the base “compliment” with the collocate “pay” (Siepmann, 2007).

Hausmann (1984), as cited in Shammas (2013, p. 109), differentiates between fixed and non-fixed word combination and explains that collocation is a non-fixed combination of words, a base and a collocator, that may be of the following types:

  • verb + noun, as in: express admiration;

  • adjective + noun, as in: serious consequences;

  • noun + verb, as in: a problem persists;

  • noun + noun, as in: job market;

  • adverb + adjective, as in: deadly serious; and

  • verb + adverb, as in: (to) sleep soundly.

4.4 The study corpus and the reference corpus

To compute/compare keywords, wordlists from a SC are compared to word lists from a larger RC (Goh, 2011). The SC (the focus corpus) is the one investigated, here the presidential discourses. The RC acts as a standard of comparison, a benchmark that provides the basic data, i.e. the thousands of items against which keywords are computed and compared. The RC is “a yardstick, something that people can regard as a standard of comparison” (Leech, 2002, p. 1). The RC applied in this investigation is prepared by the researcher for this study to include all the presidential speeches from Clinton to Trump.

Reference corpora, according to Teubert and Cermakova (2004, p. 118), contain the standard vocabulary of the language. For the corpus linguist, reference corpora are the main resource to learn about the meaning:

[…] if they are large enough, they reveal the contexts into which words are usually embedded and with which other words they form collocations […] We need reference corpora, the larger the better (p. 118).

4.5 Some related studies

Ozturk (2007) examined “the textual organization of research article introductions in applied linguistics: variability within a single discipline”; Nelson (2006) studied “semantic associations in business English: a corpus-based analysis”; Walker (2011) examined “how a corpus-based study of the factors which influence collocation can help in the teaching of business English”; Xie (2013) presented “corpus linguistics and corpus-based research in Hong Kong: A state-of-art review”; Sun and Jiang (2013) approached “metaphor use in Chinese and US corporate mission statements: a cognitive sociolinguistic analysis”; Cheng and Cheng (2013) focussed on “epistemic modality in court judgments: a corpus-driven comparison of civil cases in Hong Kong and Scotland”; Gurtu et al. (2014) presented “an analysis of keywords used in the literature on green supply chain management”; Weir and Anagnostou (2014) examined newspapers in “exploring newspapers: a case study in corpus analysis”; and Al Rawi (2017) studied “using AntConc: a corpus-based tool to investigate and analyse the keywords in Dickens’ Novel ‘A Tale of Two Cities’”.

American presidential discourse is thought to be worthy of attention as it uncovers the attitudes adopted by each president. It also clarifies the differing focusses of attention associated with the president’s political party, Republican or Democratic. The aforementioned studies are some examples of using corpus analysis tools in discourse. The following part sheds light on literature regarding political discourse.

Adagbonyin et al. (2016) investigated American and Nigerian presidential discourses by using two sets of both discourses as their database. Their corpus analysis makes use of Wmatrix software to examine some linguistic features pertaining to the two corpora: the American and the Nigerian presidential speeches. The study concludes that features such as negation, nominalization, pronominal references and repetition are used in both sets of discourses.

Kubát and Čech (2016) conducted an investigation into the inaugural addresses of the American presidents. The research aimed to examine the effect of political and historical circumstances on the style of the inaugural speeches. Indices of the richness of vocabulary, the concentration of a theme and the activity of the text are the parameters of analysis. The database of the study contains inaugural speeches from George Washington to Barack Obama. The study concluded that no general tendency could be identified, as each president adopted the style correlated with his personality. It was also found that political affiliation did not influence the style of the inaugural speeches.

Chen et al. (2019) researched Hillary Clinton and Donald Trump’s linguistic styles in their presidential campaigns. The study set Clinton and Trump’s rhetoric side by side to uncover their differing linguistic styles. Textual analysis of corpora, via AntConc, showed that Clinton’s language was lexically more diverse than Trump’s. It was also found that Clinton was rational, positive and interested in commonalities with her fellow citizens. Trump, on the contrary, was perceived to be appealing to emotions, negative and focussing on the differences between himself and his opponents.

Sing (2010) inspected American inaugural addresses from a corpus-based approach. The analysis concentrated on the change frame in the American presidential rhetoric. The study used quantitative and qualitative approaches. It illustrated the occurrences of change frames in the whole corpus. The study also added some discourses to prove that the representation of change was related to motion concepts.

Christiansen (2017) studied Trump’s representation of the media, investigating the way Trump describes the media in his tweets and his speeches. Knoblock (2017) discussed Trump’s views on banning Muslims in order to assess his verbal assault. Data were collected from Trump’s official Facebook page. Results had to do with Trump’s depiction of Muslims as “Others” who are aggressive.

Vincent (2020) applied a corpus linguistic analysis to compare and contrast Democrats’ and Republicans’ political discourses from a religious perspective. The study analysed campaign speeches and examined the claim that Republican political discourses manifested a remarkable degree of religiosity.

The above-mentioned studies justify the current research, which has an original methodological technique. Previous studies focussed on features of style in each presidential set of speeches. This research applies a new methodology in American presidential discourse analysis. Detecting keywords and collocations via a corpus-assisted analysis of American presidential discourses is a new technique worthy of attention because of its accurate results.

5. Methodology

The paper seeks to uncover the topics of major interest as expressed by Clinton, Bush, Obama and Trump. The Miller Center is a nonpartisan affiliate of the University of Virginia specializing in US political history. Studying the presidency of the USA, the Miller Center offers the speeches delivered by American presidents from George Washington to the current president, Trump. All presidential discourses from Clinton to Trump are downloaded and saved in separate files. Each file acts as a sub-SC. Clinton, Bush and Obama have six separate files in total, two for each, because each of them served two presidential terms. Discourses pertaining to every presidential term are downloaded, saved and analysed separately. Together with the single Trump file, this yields seven sub-study corpora.

The RC is a gathering of the seven sub-study corpora in one file. This file is the reference standard according to which each sub-SC is investigated.
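The construction of the RC from the sub-study corpora can be illustrated with a minimal Python sketch. It is not the procedure run in AntConc; the corpus labels, the two miniature texts and the simple tokenizer are assumptions made purely for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its word tokens (contractions kept whole)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def wordlist(text):
    """Return a frequency list (word -> count) for a text."""
    return Counter(tokenize(text))

# Hypothetical miniature sub-study corpora; the real ones are the seven
# term-by-term speech files described above.
sub_corpora = {
    "clinton_term1": "We ought to reform welfare for all Americans",
    "bush_term1": "We will fight terror and defend the homeland",
}

# The reference corpus is simply all sub-study corpora gathered in one text.
reference_text = "\n".join(sub_corpora.values())
reference_wordlist = wordlist(reference_text)

# Most frequent words in the reference corpus.
print(reference_wordlist.most_common(3))
```

Because the RC is the union of the sub-corpora, every word of a sub-SC is also counted in the RC.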

As suggested by its name, quantitative research makes use of statistical and computational tools to reach its conclusions. It has to do with mathematical analyses of texts to measure, objectively, a certain category. According to Rasinger (2010) (as cited in Litosseliti, 2010), quantitative research is concerned with how many there are of a particular item. It has to do with the patterns denoting the way something is presented. Qualitative research attempts to interpret data, describe it and delve into its meaning rather than measure it.

In this regard, this research is both quantitative and qualitative. It is quantitative in the sense that it uses AntConc software to measure keyword collocates, keyword frequency and keyness through the statistical method of log-likelihood. It is also qualitative as the keywords detected, along with their collocates, are interpreted.

Top-down and bottom-up are two descriptive approaches to analysing texts. Top-down applies discourse techniques to the entire corpus. Looking at the whole picture before analysing its smaller components, top-down processing begins with the whole text to get its meaning. Being corpus-based, bottom-up approaches focus first on micro components with the aim of identifying main discourse elements and producing a relevant linguistic analysis (Biber et al., 2007a, 2007b). Consequently, this research applies both approaches as it examines the whole text via AntConc and analyses the resulting linguistic components: keywords and their collocates.

6. Procedure

The seven presidential discourses are downloaded and converted into text files (TXT). Each collection of speeches delivered by the selected president in each presidential term serves as a separate SC. As a result, seven sub-study corpora are uploaded separately to the corpus analysis toolkit AntConc. Identifying keywords requires comparison against another, larger corpus, a RC. The RC is a larger file containing the seven sub-corpora, also converted into a TXT file. Keywords are identified by making a wordlist for the SC and another wordlist for the RC. Each wordlist counts all words in its corpus and orders them according to frequency. This step shows the most repeated words in each corpus. A comparison of the two wordlists follows. Via the software settings, the log-likelihood test is carried out and keyness is computed. This paves the way for identifying the most salient words or phrases, whether positively or negatively key. The results window shows the keywords ranked, by default, by keyness. The keyword list tool shows words that are unusually frequent or infrequent.
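The comparison of the two wordlists can be sketched in Python as follows. This is a hedged illustration, not AntConc’s code: it assumes a commonly used two-term form of the log-likelihood statistic, and the function names and toy frequencies are hypothetical. AntConc’s exact settings and thresholds may differ.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Two-term log-likelihood for a word occurring a times in a study corpus
    of c tokens and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keyness_table(study, reference):
    """Return (word, keyness, direction) rows sorted by keyness, highest first.
    Direction is 'positive' when the word is relatively more frequent in the
    study corpus than in the reference corpus, otherwise 'negative'."""
    c, d = sum(study.values()), sum(reference.values())
    rows = []
    for word in set(study) | set(reference):
        a, b = study[word], reference[word]
        direction = "positive" if a / c > b / d else "negative"
        rows.append((word, log_likelihood(a, b, c, d), direction))
    rows.sort(key=lambda row: -row[1])
    return rows

# Toy frequencies (illustrative only, not taken from the study).
study = Counter({"welfare": 30, "iraq": 1, "the": 69})
reference = Counter({"welfare": 40, "iraq": 60, "the": 400})
for word, score, direction in keyness_table(study, reference):
    print(f"{word:10s} {score:8.2f} {direction}")
```

A word whose relative frequency is the same in both corpora receives a score of zero; the further the two relative frequencies diverge, the higher the keyness, in either direction.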

Following keyword analysis, top content keywords are used to search for their collocates in each SC. The Collocates tool, according to Anthony (2018), investigates collocates of a search term: words that are likely to appear together with it. AntConc is set to ignore tags that denote line numbers and to include contracted forms. The word span is set so that the tool searches for collocates within five words to the left and five to the right of each search term. After typing the search term in the search box, selecting the “Freq” column and clicking “apply”, the results window lists words by their total frequency around the centred search term, together with their frequency on the right and on the left of the search term. In other words, the result list shows the frequency with which collocates appear to the left or to the right of the search term (the search keyword). The “Stat” column reports a “mutual information” score, which is a measure of the probability that the collocate and the keyword occur near each other, relative to how many times they each occur in total. This value measures how strongly related the search term and the collocates are. In this analysis, results are arranged according to frequency (“Freq”).
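The collocate search described above can be sketched in Python. The sketch assumes a simple pre-tokenized text, a span of five words to each side and one common formulation of the mutual information score, log2(joint × N / (f(node) × f(collocate))); AntConc’s own formula and options may differ, and the function name and sample sentence are hypothetical.

```python
import math
from collections import Counter

def collocates(tokens, node, span=5):
    """Return (word, total, left, right, mi) rows for words found within
    `span` tokens to the left or right of each occurrence of `node`,
    sorted by total frequency (highest first), then alphabetically."""
    left, right = Counter(), Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        left.update(tokens[max(0, i - span):i])
        right.update(tokens[i + 1:i + 1 + span])
    totals = Counter(tokens)
    n = len(tokens)
    rows = []
    for word in set(left) | set(right):
        joint = left[word] + right[word]
        # Mutual information: observed co-occurrence relative to the
        # overall frequencies of the node and the collocate.
        mi = math.log2(joint * n / (totals[node] * totals[word]))
        rows.append((word, joint, left[word], right[word], mi))
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows

# Illustrative sentence only; not taken from the presidential corpora.
tokens = "we must reform welfare now and welfare reform must work".split()
for word, total, l_freq, r_freq, mi in collocates(tokens, "welfare"):
    print(f"{word:8s} total={total} left={l_freq} right={r_freq} MI={mi:.2f}")
```

Sorting by total frequency mirrors the choice made in this analysis of arranging results by “Freq” rather than by the statistical score.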

The keywords and the collocations generated by the software are saved in a separate file (Appendix). The file includes the tables produced by AntConc. Each table is given a number and a label in the main text with a corresponding number and label in the Appendix. Each table is an illustration of the keywords/collocations that are salient in each presidential discourse.

AntConc is also used to obtain the type/token ratio (TTR). Tokens are the running words of a text: a text that is 100 words long has 100 tokens. Types are the distinct words in a text. For example, if that 100-word text contains repeated words so that there are only 50 different words, the ratio between types and tokens in that text is 50%. The TTR may be an indication of text diversity.
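The ratio can be expressed as a short Python sketch; the function name and the artificial 100-token text are assumptions made for illustration.

```python
def type_token_ratio(tokens):
    """Return the number of distinct words (types) divided by the
    number of running words (tokens); 0.0 for an empty text."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# An artificial text of 100 tokens built from 50 distinct words,
# mirroring the worked example above.
tokens = [f"word{i % 50}" for i in range(100)]
print(f"TTR = {type_token_ratio(tokens):.0%}")
```

A lower ratio means that fewer distinct words are being repeated more often.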

7. Analysis

In this study, American presidential discourses are compared according to the keywords and their collocations in each presidential discourse. These two criteria are expected to highlight the president’s centre of attention. Data generated by AntConc are typed in the Appendix.

Keywords are supposed to define the aboutness of each corpus. Being salient lexical items that occur with unusual frequency in a text, keywords are assumed to be revealing. They capture the manner in which each president evaluates the world(s), which embodies prevailing positions and overwhelming trends.

Collocations accompanying keywords add an illumination around the meaning of the contextual utterance, its surrounding objects and the resulting actions. In consequence, keywords and their collocating terms serve to build the presidents’ world.

Data of this analysis are selected from US presidential discourses. The Wordlist tool is applied to count the number of tokens/words and of word types in each corpus. The Clinton corpus contains 92,448 words/tokens in his first presidential term (word types: 6,370) and 44,745 tokens/words in his second (word types: 4,489).

The Bush corpus contains 54,705 words/tokens in his first term (types: 5,385) and 56,111 words in his second one (types: 5,403).

The Obama corpus contains 131,904 words in his first term (types: 7,663) and 92,512 in his second (types: 6,729).

The Trump corpus contains 94,815 tokens/words (types: 5,993). The RC contains 567,240 words (types: 15,011). The RC, in this regard, is generally about five times larger than each sub-SC. It is thought that this size is large enough to yield accurate results (Berber-Sardinha, 2000 as cited in Fruttaldo, 2017).

7.1 Keyword analysis

7.1.1 Bill Clinton (1993–2001). Clinton’s first term.

Bill Clinton belongs to the Democratic Party. His speeches are downloaded from the Miller Center.

Table 1 shows the words that are used with a remarkable frequency in his first term (1993–1996) – see Appendix.

Content words far outnumber function words. The notable frequency of “welfare” makes it evident that Clinton is mainly interested in internal affairs that may lead to Americans’ well-being. This idea becomes more evident when other content words referring to the people’s prosperity appear with high keyness, such as “affirmative, people, children, care, health and service”. Clinton is interested in arranging the current home front as a primary concern, which is apparent in his use of “do, Americans, today”. Clinton never mentions “war, terrorism, borders” or any word related to external affairs with unusual frequency. Instead, his keywords revolve around issues inside America that affect the lives of all citizens at the present time.

Function words are much lower in number. They include “ought to, all, the, who”. These four examples refer to Clinton’s tendency towards interior matters as he expresses that “all” people “ought to” be interested in a definite orientation, as apparent in the unusual use of the definite article and the relative pronoun “who”.

Negative keywords, which are used with remarkable infrequency, fit in with the same trend represented by the positive keywords. As is apparent from the scarce use of negative keywords such as “Iraq, Iran, border, nations, immigration and terrorists”, Clinton does not prioritize these issues.

Keywords, positive and negative, conform to Clinton’s major concern, which is prosperity inside American society. No matter what good or evil other countries may do, Clinton searches for inner comfort through work and common services. Positive anticipation is a feature of Clinton. Teamwork is another concern. That is why he prefers using keywords relating to whole groups of citizens such as “all, Americans, people”.

As far as word number is concerned, this text contains 15,388 tokens and 64 types. TTR is about 0.4%. This signals a lack of linguistic diversity in the text, which in turn echoes Clinton’s unified theme: Americans’ interests at home.

Clinton’s second term.

Table 2 shows results in Clinton’s second term (1997–2001).

Clinton focusses on “children, budget, schools, social, help”, which all denote a tendency towards reinforcing the inner community to ensure social and economic stability. Content words, in this term, denote a slight change towards external affairs. This is manifest in Clinton’s unusually frequent use of “Kosovo, Africa, NATO, Rwanda”. This change in interest may go back to Clinton’s belief that this is his final chance as president and that, consequently, he should show some balance between inner and outer concerns. In his first term, Clinton, being the current figure in charge of the USA, focusses on the present (“today”) as his prior concern. In his second and final term, the word “century” enjoys the highest keyness. This may be due to Clinton’s imminent departure from the White House and his related attempt to appear optimistic and to anticipate the future.

Function words such as “to, our, all” serve Clinton’s disposition towards gathering the efforts of all Americans to attain social security at home. The modal “must” refers to his insistence on making people believe that his directions are indispensable.

Negative keywords include “Iraq, Iraqi”. Clinton proves to be uninterested in the Iraq issue and the whole matter of war on terror. Though he shows more external concerns, Clinton does not consider launching campaigns outside America to be his great triumph. Had he regarded external agendas to be a prime concern, he would have used words related to these agendas with a highly positive keyness.

Concerning word number, this corpus contains 9,109 tokens and 73 types. TTR is about 0.8%. This signals a lack of linguistic diversity in the text, although the diversity is higher than in Clinton’s first corpus. This conforms to Clinton’s focus on one key topic, that of inner safety, in addition to some careful attention outside America.

7.1.2 George W. Bush (2001–2009). Bush’s first term.

Table 3 presents the words that are used with uncommon frequency in Bush’s first term (2001–2004).

It is clear that Bush’s focal keywords are opposite to Clinton’s. Bush is anxious about the war on terrorism outside America. He considers that war to be a prerequisite to any progress inside America. Content words outnumber function words, indicating his view that external war is an essential precondition for any plans at home. Words with remarkable keyness support that analysis, such as “terror, Hussein, weapons, Saddam, regime, terrorist, terrorists, against, homeland”. Bush’s keywords do not include any words associated with inner prosperity or economic progress. Unlike Clinton, Bush considers Saddam Hussein, the former Iraqi president, and the war on terrorism in Iraq to be a priority.

Function words do not tell much. The modal “will” marks Bush’s future intentions and plans: arranging the scene outside America. Negative keywords include “that, but, what, think, going, let, done, you, her”. They, too, do not tell much.

This corpus contains 16,459 tokens and 107 types. TTR is about 0.6%. This signals a lack of linguistic diversity. This conforms to Bush’s focus on one key campaign, that of external war against terrorism in Iraq.

Bush’s second term.

Bush’s keywords in his second term (2005–2009) are in Table 4.

Table 4 shows that Bush continues his second presidential term with the same preoccupation: the war on terrorism in Iraq. Content words are greater in number and signal Bush’s plans. Words with unusual keyness include “Iraq, Iraqi, Iraqis, Baghdad”, indicating that Bush considers Iraq the source of threat. Words such as “terrorists, enemy, extremists, terror” explain Bush’s justification for his external war, through which he seeks to end terrorism. Words such as “border, coast” summarize Bush’s program, a program more concerned with external areas. Words such as “freedom, liberty” add another reason for Bush’s focus on external war rather than internal society. Function words such as “the, and, in” refer to Bush’s definite orientation.

Negative keywords include “but, we, what, very, she, everybody”. These words suggest that Bush is not interested in collaborative social work. This is in harmony with the trend expressed by the positive keywords: that he only cares about external affairs.

Tokens in this corpus are 12,541; types are 73. TTR is about 0.6%. This lack of diversity in linguistic items signifies Bush’s consistent focal point and target: external war.

7.1.3 Barack Obama (2009–2017).

Obama’s first term.

Table 5 shows a list of keywords in Obama’s first term (2009–2012).

Obama’s orientation is similar to Clinton’s and opposite to Bush’s. Words with high keyness are mostly content words, indicating Obama’s interest in the inner economy and social comfort. The word “insurance” signifies Obama’s plan to make life in American society easier and more secure. His focus on interior causes is manifest in his choice of certain words as pivotal, such as “energy, financial, afford, companies, clear”. Obama, however, shows a tendency to balance inner and outer issues. He is halfway between Clinton, who mainly considers Americans’ “welfare”, and Bush, who is totally oriented towards the war on “terrorism”, especially in “Iraq”. In this regard, Obama treats “Afghan, Afghanistan, Qaeda, Bin Laden” as markers of his due concern. Function words such as “that, her, she, why” are of minor importance, as they reveal little about Obama’s orientation.

Negative keywords include “you, crime, drugs, immigration, terrorists”. These words make Bush and Obama two extremes. While Bush considers his war against “terrorists” to be his great victory, Obama considers that issue to be of little weight. The crisis caused by “terrorists” and “immigration” is nearly ignored by Obama.

Tokens in this corpus are 19,261; types are 59. TTR is about 0.3%. Obama refrains from linguistic diversity and shows less use of diverse items than Clinton and Bush. This is a unique feature of Obama’s discourse: longer than usual without language diversity.

Obama’s second term.

As for Obama’s second term (2013–2017), the notable keywords are depicted in Table 6.

As indicated by Table 6, Obama continues to cover two main focal points: inner security and outer attention. Words such as “kids, grace, class, folks, inequality, just” refer to his care about keeping social life in America safe and sound. Words such as “Cuba, Cuban, Israel, Assad” denote his concern with foreign affairs. Function words are hardly found. This may be due to the nature of presidential discourse and its focus on specific points expressed by content words.

Negative keywords are “Iraq, drugs, national, welfare”, which suggest that Obama occupies the middle ground between Clinton’s focus on “welfare” and Bush’s insistence on fighting terror in “Iraq”.

Tokens are 25,201; types are 53. TTR is 0.2%. These figures point to a feature of Obama’s language: long speeches with lower diversity in linguistic items. It seems that Obama prefers long discourses so that he may be able to cover both fronts: the inner one in America and the outer one in foreign areas. His commitment to certain language items reflects his focus on specific content regardless of its form.

7.1.4 Donald Trump (2017).

Trump’s keywords are presented in Table 7.

Table 7 presents Trump’s words with high keyness, which would be expected to reveal his focal plans and programs. Surprisingly, the words show no definite pivotal point. Trump likes to address others with personal pronouns such as “you, they, it”. He favours emotive language that arouses exaggerated feelings, as seen in intensifiers and adjectives such as “very, great, incredible, really, tremendous, beautiful”. He also admires his own personality, as is clear from the remarkable frequency with which he utters his own name, “Trump, don”. So, Trump likes exaggerated language and likes to talk about himself and about others. He does not show any orientation inside or outside of America. Contrary to Clinton and Obama, he is not interested in Americans’ well-being and a peaceful society. Unlike Bush, he is not worried about terrorism or the Iraqi war. It is hard to identify a specific line related to Trump’s plan. He does not set a definite target, so no words pertaining to one appear with relative keyness. Trump is a strange example amongst American presidents. His motives are not well defined. His aims are not set as they should be. He just talks about himself and about others and uses emotive adjectives to thrill the audience.

Trump’s negative keywords are “health, Iraq, care, children, education”. These words fully conform to the previous analysis. Trump has no inner plans, which is why words related to the requirements of a sound life, such as “health, education, children”, are unusually infrequent. Neither does he have plans outside America. He does not consider Iraq a major concern. Trump is different from Bush, who belongs to the same Republican Party. He is also the opposite of Clinton and Obama. Keywords tell that Trump is a case on its own.

Tokens are 38,459; types are 171. TTR is 0.4%. Though Trump’s discourses are the longest, his language diversity is nearly as low as that of the other presidents. His lengthy speeches mirror his habit of talking about others and about himself.

Though Trump’s historical record is still incomplete, the available data seem sufficient to show that his agenda is undefined. Had Trump planned a well-defined project, his keywords would have marked it.

7.2 Collocation analysis

After wordlists are made for each presidential discourse, the top content keywords are typed in the search box and the “collocates” tool is run to generate each keyword’s collocations sorted by frequency; this option is selected by pressing “sort by freq”. The word span is five words to the left and five to the right of each search term. This section explains the top content keywords on the basis of the words collocating with each keyword.
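The procedure just described (a search term, a span of five words on each side, collocates sorted by frequency) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `tokenize` and `collocates` helpers and the sample sentence are hypothetical, the tokenization is a simple letters-and-apostrophes pattern, and AntConc's own counting may differ in detail.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed pattern)."""
    return re.findall(r"[a-z']+", text.lower())


def collocates(tokens, node, span=5):
    """Count words occurring within `span` tokens left or right of `node`.

    Returns (word, frequency) pairs sorted by descending frequency,
    mirroring a "sort by freq" view. A word lying inside the windows of
    two occurrences of the node is counted once per window.
    """
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        counts.update(tokens[max(0, i - span):i])   # left window
        counts.update(tokens[i + 1:i + 1 + span])   # right window
    return counts.most_common()


# Hypothetical sentence, not taken from the corpus.
text = ("We must reform welfare to reward work and "
        "we must move people from welfare to work")
for word, freq in collocates(tokenize(text), "welfare", span=5):
    print(word, freq)
```

Sorting by raw frequency tends to push function words such as “the” and “to” to the top of the list, which is consistent with the prominence of function words among the collocates discussed in this section.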

7.2.1 Bill Clinton (1993–2001).

Clinton’s first term top content keywords are “welfare, affirmative, people, children”. Table 8 illustrates the collocates of “welfare”.

The word “welfare” is associated with “to”, a preposition showing mutual relations. This reciprocity seems to be at the centre of Clinton’s plan, that of achieving welfare through teamwork. This analysis is supported by other terms collocating with “welfare”, such as “work, reform”. Clinton considers “welfare, work, reform” to be connected. Words such as “people, we” are expected to appear beside “welfare” because they refer to the common team spirit that Clinton insists upon. The definite article “the” indicates Clinton’s fixed target towards “welfare”.

Table 9 illustrates that Clinton frequently associates “affirmative” with “action” and “programs”. This is evidence of his direction towards taking positive steps to make citizens’ lives better. The use of “the” near “affirmative” is a sign of Clinton’s determined objectives. Because successful plans need teamwork, it is not surprising to find “and”, which indicates coordination, and “to”, which refers to reciprocity, collocating with “affirmative”.

As indicated by Table 10, Clinton associates “the” with “people” and “American” to mean that “the American” citizens are his first priority. Function words collocating with “people” are “to, and, of, our, for”, all of which denote cooperative efforts. These words affirm Clinton’s main focus on inner society and its “welfare”.

Collocates of “children”, as apparent from Table 11, signify collaboration (“and, our, there, we”) and mutual relations (“to, for”). Clinton emphasizes joint action and definite aims when it comes to “children”.

Clinton’s second term top content keywords are “century, Kosovo, ask, children”.

Table 12 shows that the word “century” is correlated with items suggesting reciprocal interconnection, such as “to, for, of”, and joint action, “we, and”. All of these items suggest that Clinton spends his presidency trying to gather efforts for the sake of “new” interior accomplishments.

Collocates of “Kosovo” are shown in Table 13, and these collocates appear to follow Clinton’s fixed line of showing mutual interrelations and cooperation.

Table 14 illustrates that Clinton’s “ask” has to do with asking the “congress” for “support”. He considers himself the agent, “I”, responsible for asking. His endeavour is mutual, as denoted by “to”, “for” and “our”, and he addresses the Americans as “you”, whose “support” he needs. These linguistic items reveal Clinton’s orientation towards inner policy.

Table 15 is about the words collocating with “children”. Collocates of the word “children” point to mutual collaboration and defined orientation. This result echoes Clinton’s vision in his first term.

7.2.2 George W. Bush (2001–2009).

Bush’s first term top content keywords are “seniors, terror, Hussein, weapons”.

Table 16 points out that, where “seniors” are concerned, Bush is solely interested in offering the elderly good medical treatment, “medicare”. This is just one minor point inside the social milieu.

Table 17 is about the words collocating with “terror”. Unlike Clinton, Bush associates collaborative mutual efforts – indicated by “and, of, to, we” – with “terror” and “war”. The word “against” normally collocates with “terror” as the term signifies two opposing fronts.

Table 18 is about the collocates of “weapons”. The word “Hussein” collocates with “Saddam, weapons”, referring to the former Iraqi president and to Bush’s belief that Hussein possesses deadly weapons. This is in harmony with the results in Table 17.

Bush associates “terror”, “war”, “Saddam Hussein” and “weapons” on the one hand with “mass, destruction, nuclear, biological” on the other. In this way, he paves the way for his fixed conviction that his mission is to launch “war against terror”.

Bush’s second term top content keywords are “Iraq, Iraqi, Iraqis, terrorists”.

Collocates of “Iraq” include words such as “forces, troops, terrorists, enemy, war”. “Iraqi” is associated with “forces, military, army”. “Iraqis” is connected to “help, elections, confidence”. These collocates present Bush’s motivation for war: that “Iraq” is controlled by terrorist troops and that Bush wants to help “Iraqis” to elect new leaders. Bush uses plural pronouns with these words, but here collaboration serves his war on terrorism. The word “terrorists” collocates with “Iraq, extremists, fighting, terror, Qaeda”, all of which serve to justify Bush’s determination to launch a war against Iraq.

7.2.3 Barack Obama (2009–2017).

Obama’s first term top content keywords are “insurance, Afghan, energy, financial”. Table 19 presents the collocations of “insurance”.

Obama shifts his attention towards definite, mutual actions, indicated by “the, to, and, of, for”, directed towards Americans’ “health”. Obama’s claimed victory is his inner struggle to raise the standards of health care. He is similar to Clinton, as both figures share an interest in Americans’ well-being. Bush, on the contrary, considers the war against terror his great achievement.

“Afghan” is related to words expressing Obama’s intention to pull the American troops out of Afghanistan, such as “security, forces, government, transition, responsibility”. Unlike Bush, Obama believes that his success lies inside. Obama is also interested in “energy”, whose collocates are words such as “clean, more, renewable”, all of which are in harmony with his focus on interior affairs. The word “financial” is interrelated with “system, crisis, sector, reform”, all of which emphasize Obama’s orientation towards the economy and the standard of living.

Obama’s second term top content keywords are “kids, Cuba, Cuban, Israel”.

The word “kids” collocates with items indicating group efforts and common interests, such as “our, and, to, for, we”. This is a reminder of Clinton’s “children” (see Appendix for Tables 11 and 15).

Obama revives friendly relations with Cuba. “Cuba” and “Cuban” are associated with words expressing mutuality, such as “here, and, to, for, our, change” and “people, and, to, in, relations”, respectively. These collocates signify Obama’s step towards a change that improves American-Cuban affairs. Obama has the American-Israeli relation at the centre of his attention, as he associates the word “Israel” with “of, to, and, for, security”. These items explain Obama’s joint efforts with Israel to ensure its “security”. Collocations reveal Obama’s concern with Americans’ well-being within a safe society and his careful consideration for foreign relations.

7.2.4 Donald Trump (2017).

Trump’s first term top content keywords are “going, great, Trump, incredible”.

Collocates of the word “going” include “to”, in reference to future intention. One would expect to find action verbs related to this intention. Verbs associated with “going to”, however, include “work, happen, want, make”. These verbs do not appear to be directed towards a specific plan, whether inside or outside. This is because Trump does not mention any objective. The adjective “great” collocates with “people, American, job, doing, country”, which do not specify a goal to be fulfilled.

Table 20 shows the collocations of the keyword “Trump”. Trump is the only president whose name is mentioned with unusual frequency in his own speeches. His first name “Donald” collocates with his surname “Trump”. Moreover, “Trump” is associated with a reference to himself, “I”, and his formal position, “president” of “USA”. Other than referring to himself, Trump does not use significant collocations that may indicate a certain program. Referring to himself with notable frequency, Trump seems to have an egocentric strain. “Incredible” has collocates such as “we, this, you, people”. No well-defined topic or project is detected that might be described as “incredible”. Trump’s language is not oriented to, or away from, a specific purpose.

8. Discussion

Analysis of American presidential speeches has attracted the attention of many researchers because these discourses are rich in data that can be investigated. This study, though it also addresses American presidential discourses, adopts an original method of research: it examines the presidents’ keywords and their collocations. Because the examination is carried out via AntConc, the resultant words are believed to be authentic and explicit. Investigating the collocations of each keyword helps to identify the intended meaning in an objective manner.

The methodology in this research is made more reliable by its combination of quantitative and qualitative approaches. The quantitative approach is evident in the employment of AntConc, the computer software, to count the number of words, to identify the keywords and to measure their keyness. Explaining the meaning of the resultant keywords and the related collocations is the core of the qualitative approach.
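To make the quantitative step concrete, the sketch below computes a keyness score for a single word by comparing its frequency in a target corpus with its frequency in a reference corpus. It assumes the log-likelihood statistic, one widely used keyness measure; this section does not state which statistic was selected in AntConc, so the choice is an assumption. The function name `log_likelihood` and all frequencies in the usage lines are hypothetical.

```python
import math


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Signed log-likelihood keyness of one word.

    freq_*: occurrences of the word in each corpus.
    size_*: total tokens in each corpus (assumed to be positive).
    A positive score means the word is relatively more frequent in the
    target corpus (a positive keyword); a negative score means it is
    relatively less frequent (a negative keyword).
    """
    total = size_target + size_ref
    combined = freq_target + freq_ref
    expected_target = size_target * combined / total
    expected_ref = size_ref * combined / total

    score = 0.0
    if freq_target > 0:
        score += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    score *= 2

    # Attach a sign so that under-use in the target corpus is negative.
    if freq_target / size_target < freq_ref / size_ref:
        score = -score
    return score


# Hypothetical counts, for illustration only.
print(log_likelihood(50, 10_000, 10, 100_000))   # over-used in target: positive
print(log_likelihood(1, 10_000, 200, 100_000))   # under-used in target: negative
```

A word with the same relative frequency in both corpora scores zero, which is why keyness picks out only the unusually frequent or unusually infrequent items.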

Though the author uses corpus linguistics tools and combines quantitative and qualitative approaches to render an accurate analysis, the study has a major limitation, which lies in the issue of accurate representation. This corpus analysis relies on keywords and their collocations to represent the American presidents’ central agendas. Though the study is based on accurate quantitative analyses followed by qualitative ones, the results may not be absolutely accurate. This is because presidential speeches are prepared in advance to serve a certain purpose and deliver a certain message.

Corpus-assisted analyses are recommended in the analysis of presidential discourses. These corpus linguistic techniques survey large amounts of data and present results that have statistical significance. The paper introduces a simple technique for corpus analysis. Being simple, it can be reproduced by many who are not necessarily specialists in US politics.

This paper suggests that further research should pay attention to keywords/keyness and collocating items in differing corpora such as media corpora, parliamentary corpora, scientific corpora and the like.

9. Conclusion

Keyword analysis through keyness makes it evident that Clinton and Obama, being Democrats, demonstrate a clear tendency to improve Americans’ life inside their social sphere. That is why Clinton uses “welfare” as his focal keyword and Obama uses “insurance” as his pivotal one. Both show the same interest in children. Obama surpasses Clinton as regards foreign affairs, which is why words such as “Afghan, Cuba, Cuban” are used with positive keyness in Obama’s discourses. Bush, a Republican, concentrates only on external issues. This is proven by his keywords signifying “terror, terrorists, weapons, enemy, Saddam Hussein, Iraq”.

Trump’s positive keywords are totally irregular. His direction is neither inside nor outside. His keywords are not about interior programs or external causes. He is not similar to Clinton or Obama at any point. He is also different from Bush, though both belong to the Republican Party. Trump’s keywords are about exaggerated descriptions without a defined target to be described. He also shows an unusual frequency in referring to his name and position.

Negative keywords demonstrate topics that are infrequent in discourse. Clinton and Obama’s infrequent subjects have to do with terrorism and immigration. This complies with their condensed focus on social and economic improvements. Bush’s negative use of words marking cooperative actions conforms to his positive use of words indicating external war against terrorism.

Trump’s words used with negative keyness are an eccentric collection. Their oddity stems from their combination of words expressing inner amendments and outer concerns. He is concerned neither with reforming programs nor with external issues.

Collocations around each top content keyword clarify the word and harmonize with the presidential orientation negotiated by keywords with positive and negative keyness. Though this study is descriptive in focus, keywords and their collocations are believed to be tools that yield credible analyses and describe the corpora in a reliable manner.



Figure A1


Adagbonyin, A.S. Aluya, I. and Edem, S. (2016), “A corpus-based approach to the linguistic features in Nigerian and American presidential speeches”, available at:

Al Ghazali, F. (2006), “Collocations and word-combinations in English: considerations, classifications, and pedagogic implications”, available at:

Al Rawi, M.K. (2017), “Using AntConc: a corpus-based tool to investigate and analyze the keywords in dickens’ novel ‘a tale of two cities”, International Journal of Advanced Research (IJAR).

Anthony, L. (2011), “AntConc”, Tokyo, available at: www.antlab.sci

Anthony, L. (2017), “Corpus linguistics and vocabulary: a commentary on four studies”, Vocabulary Learning and Instruction, Vol. 6 No. 2.

Anthony, L. (2018), “AntConc (windows, macintosh OS X, and linux)”, available at:

Balossi, G. (2014), A Corpus Linguistic Approach to Literary Language and Characterization. VT Woolf’s the Waves, John Benjamins Publishing Company, Amsterdam and Philadelphia.

Baker, P., Gabrielatos, C., Khosravinik, M., Krzyz, A., Mcenery, T. and Wodak, R. (2008), “A useful methodological synergy? Combining critical discourse analysis and corpus linguistics to examine discourses of refugees and asylum seekers in the UK press”, Discourse & Society, SAGE Publications, Vol. 19 No. 3, pp. 273-306.

Barnbrook, G., Mason, O. and Krishnamurthy, R. (2013), Collocation, Applications and Implications, Palgrave Macmillan, London.

Bennet, G.R. (2010), Using Corpora in the Language Learning Classroom: corpus Linguistics for Teachers, ELT, MI.

Biber, D., Connor, U. and Upton, T.A. (2007a), Discourse on the Move. Using Corpus Analysis to Describe Structure, John Benjamins Publishing Company, Amsterdam and Philadelphia.

Biber, D., Connor, U., Upton, A., Anthony, M. and Gladkov, K. (2007b), “Rhetorical appeals in fundraising”, Biber, D. Connor U. and Upton A. (Ed.), Discourse on the Move: Using Corpus Analysis to Describe Discourse Structure, John Benjamin, Amsterdam, pp. 121-151.

Bondi, M. (2010), “Perspectives on keywords and keyness: an introduction”, Bondi M. and Scott M. (Ed.), Keyness in Texts, John Benjamins Publishing Company, Amsterdam and Philadelphia.

Chen, X., Yan, Y. and Hu, J. (2019), “A corpus-based study of hillary clinton’s and donald trump’s linguistic styles”, International Journal of English Linguistics, Vol. 9 No. 3, pp. 13-22.

Cheng, W. and Cheng, L. (2013), “Epistemic modality in court judgments: a corpus-driven comparison of civil cases in Hong Kong and Scotland”, available at:

Christiansen, A. (2017), “Enemy of the american people – A corpus-assisted discourse analysis of @realDonaldTrump”, available at

Chung, C.K. and Pennebaker, J.W. (2007), “The psychological functions of function ords”, In Fiedler K. (Ed.), Social Communication, Psychology Press, New York, NY.

Fruttaldo, A. (2017), News Discourse and Digital Currents: A Corpus-Based Genre Analysis of News Tickers, Cambridge Scholars Publishing.

Goh, G.Y. (2011), “Choosing a reference corpus for keyword calculation”, Linguistic Research, Vol. 28 No. 1, pp. 239-256.

Gurtu, A. Searcy, C. and Jaber, M. (2014), “An analysis of keywords used in the literature on green supply chain management”, available at:

Knoblock, N. (2017), “Xenophobic trumpeters: a corpus-assisted discourse study of donald trump’s facebook conversations”, available at:'s_Facebook_conversations

Kubát, M. and Čech, R. (2016), “Quantitative analysis of US presidential inaugural addresses”, available at:

Leech, G. (2002), “The importance of reference corpora”, available at:

McEnery, T. and Gabrielatos, C. (2006), “English corpus linguistics”, available at:

Nelson, M. (2006), “Semantic associations in business English: a corpus-based analysis”, available at:

O’Keeffe, A. and McCarthy, M. (2010), The Routledge Handbook of Corpus Linguistics, Routledge, London and New York, NY.

Ozturk, I. (2007), “The textual organization of research article introductions in applied linguistics: variability within a single discipline”, English for Specific Purposes, Vol. 26 No. 1, pp. 25-38.

Partington, A. (2006), The Linguistics of Laughter: A Corpus-Assisted Study of Laughter Talk, Routledge, New York.

Phillips, M. (1989), “Lexical structure of text”, in Gabrielatos C. (Ed.), “Keyness analysis: nature, metrics and techniques”, Chapter published in Taylor, C. and Marchi, A. (Eds), (2018) Corpus Approaches to Discourse: A Critical Review, Routledge, Oxford.

Rasinger, S.M. (2010), “Qualitative methods. Concepts, frameworks and issues”, Litosseliti, L. (Ed.), Research Methods in Linguistics, Continuum, London and New York, NY.

Scott, M. (1996), “Problems in investigating keyness, or clearing the undergrowth and marking out trails”, Bondi M. and Scott M. (Ed.), Keyness in Texts, Amsterdam/Philadelphia, John Benjamins Publishing Company.

Scott, M. (1997), “PC analysis of key words - and keywords”, System, Vol. 25 No. 2, pp. 233-245.

Scott, M. (2008), Wordsmith Tools Version 5, Lexical Analysis Software, Liverpool.

Siepmann, D. (2007), “Collocations and examples: their relationship and treatment in a new corpus -based learner's dictionary”, available at:’s_dictionary

Seipmann, D. (2007), “Collocations and examples: Their relationship and treatment in a new corpus -based learner's dictionary”, available at:

Shammas, N.A. (2013), “Collocation in English: comprehension and use by MA students at arab universities”, International Journal of Humanities and Social Science, Vol. 3 No. 9.

Sinclair, J. (2004), Trust the Text, Routledge, London.

Sing, C.S. (2010), “Yes, we can”—framing political events in terms of change: a corpus-based analysis of the ‘change’ frame in American presidential discourse”, Belgian Journal of Linguistics, Vol. 24 No. 1, pp. 139-163.

Starcke, B.F. (2010), “Corpus linguistics in literary analysis”, Jane Austen and Her Contemporaries, Continuum, London and New York, NY.

Sun, Y. and Jiang, J. (2013), “Metaphor use in Chinese and US corporate mission statements: a cognitive sociolinguistic analysis”, available at:

Teubert, W. and Cermakova, A. (2004), “Directions in corpus linguistics”, Halliday M.A.K., TeubertW. and Colin Yallopand Cermakova, A. (Eds), Lexicology and Corpus Linguistics, London and New York, NY, Continuum.

Tyrkko, J. (2010), “Hyperlinks: keywords or key words?”, Bondi M. and Scott M. (Ed.), Keyness in Texts, Amsterdam and Philadelphia, John Benjamins Publishing Company.

Van Peer, W. (2011), Introduction. Scientific Study of Literature, John Benjamins Publishing Company.

Vincent, A. (2020), The Religious Rhetoric of US Presidential Candidates: A Corpus Linguistics Approach to the Rhetorical God Gap (Routledge Advances in Corpus Linguistics), 1st ed., Kindle Edition, Routledge.

Walker, C. (2011), “How a corpus-based study of the factors which influence collocation can help in the teaching of business English”, available at:

Weir, G.R.S. and Anagnostou, N.K. (2007), “Exploring newspapers: a case study in corpus analysis”, available at:

Xie, Q. (2013), “Corpus linguistics and Corpus-Based research in Hong Kong: a state-of-Art review”, available at:

Further reading

Baker, P. and McEnry, T. (2015), Corpora and Discourse Studies. Integrating Discourse and Corpora, Palgrave Macmillan.


The author declares that there is no conflict of interest and that all data generated or analysed during this study has been included. Dalia Hamed is the sole author and the research did not receive any specific grant from funding agencies in the public, commercial or not for profit sectors. The author is extremely grateful to her late mother, Fariza Eissa, for her caring and preparing her for her future. May God bless her soul.


About the author

Dalia Hamed is a lecturer of linguistics in the Department of Foreign Languages, Faculty of Education, Tanta University, Egypt. She is interested in Discourse Analysis and Pragmatics. Her MA is a stylistic analysis based on Pragmatics. Her PhD is a comparative analysis of legal discourses in American and Egyptian legal institutions.
