Mining the Web to approximate university rankings

Corren G. McCoy (Old Dominion University, Norfolk, Virginia, USA)
Michael L. Nelson (Old Dominion University, Norfolk, Virginia, USA)
Michele C. Weigle (Old Dominion University, Norfolk, Virginia, USA)

Information Discovery and Delivery

ISSN: 2398-6247

Publication date: 20 August 2018

Abstract

Purpose

The purpose of this study is to present an alternative to university ranking lists published in U.S. News & World Report, Times Higher Education, Academic Ranking of World Universities and Money Magazine. A strategy is proposed to mine a collection of university data obtained from Twitter and publicly available online academic sources to compute social media metrics that approximate typical academic rankings of US universities.

Design/methodology/approach

The Twitter application programming interface (API) is used to rank 264 universities using two easily collected measurements. The University Twitter Engagement (UTE) score is the total number of primary and secondary followers affiliated with the university. The authors mine other public data sources related to endowment funds, athletic expenditures and student enrollment to compute a ranking based on the endowment, expenditures and enrollment (EEE) score.

Findings

In rank-to-rank comparisons, the authors observed a significant, positive rank correlation (τ = 0.6018) between UTE and an aggregate reputation ranking, which indicates UTE could be a viable proxy for ranking atypical institutions normally excluded from traditional lists.

Originality/value

The UTE and EEE metrics offer distinct advantages because they can be calculated on-demand rather than relying on an annual publication and they promote diversity in the ranking lists, as any university with a Twitter account can be ranked by UTE and any university with online information about enrollment, expenditures and endowment can be given an EEE rank. The authors also propose a unique approach for discovering official university accounts by mining and correlating the profile information of Twitter friends.


Citation

McCoy, C., Nelson, M. and Weigle, M. (2018), "Mining the Web to approximate university rankings", Information Discovery and Delivery, Vol. 46 No. 3, pp. 173-183. https://doi.org/10.1108/IDD-05-2018-0014


Publisher: Emerald Publishing Limited

Copyright © 2018, Emerald Publishing Limited


1. Introduction

Universities and other academic institutions increasingly see their presence and visibility on the Web as central to their reputation. In this context, information content on the academic Web is viewed as a reflection of the overall organization and performance of the university (Aguillo et al., 2008). Academic rankings can play an important role in assessing reputation. With disparate criteria and methodologies, however, there can be a significant divergence in the rankings of a particular institution.

We consider the set of data associated with a university that is publicly available on the Web as a collection. In this work, we mine the data in this collection to compute two different metrics that can be used to approximate typical academic rankings of US universities. We mine Twitter data to compute a ranking based on the University Twitter Engagement (UTE) score. We mine other public sources of data related to endowment funds, athletic expenditures and student enrollment to compute a ranking based on the endowment, expenditures and enrollment (EEE) score. Both of these metrics can be computed at any time and by any party, without having to wait for the release of annual university rankings or depending on subjective measures such as reputation.

UTE is the total number of followers of the affiliated accounts the university promotes on its homepage plus the followers of any Twitter friends who indicate an affiliation with the university in their profile Uniform Resource Identifier (URI). The UTE score quantifies the potential popularity or prestige of the university without an extensive data collection effort. The EEE score is computed from publicly available data on the Web about alumni engagement (reflected in alumni donations and endowments), athletic engagement (reflected in athletic expenditures) and student enrollment.
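
Stated compactly (our own notation, introduced here for clarity rather than taken verbatim from the article), the UTE score of a university u is

UTE(u) = \sum_{a \in P(u) \cup S(u)} followers(a)

where P(u) is the set of primary accounts linked from the university's homepage and S(u) is the set of secondary accounts, i.e. Twitter friends of the primary accounts whose profile URI falls within the university's domain.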

We assume that:

  • Universities with higher undergraduate enrollment are likely to have more Twitter followers as students transition to alumni status.

  • Official Twitter accounts will be featured on the university’s homepage.

  • Sports participation is a driver that increases awareness of the university’s brand.

  • The data needed to comprise the ranking criteria are readily available from public data on the Web.

Figure 1 depicts a point-in-time glimpse into the Twitter followers (675K) for Harvard University, a perennially top-ranked school, which represents an approximate 100:1 ratio to its undergraduate enrollment (6,660). On the other hand, the Twitter follower count (1,213) for Virginia Military Institute (VMI), a top-100 school, barely maintains a 1:1 ratio with its undergraduate enrollment (1,717). We would expect schools with similar enrollment to attract a similar number of Twitter followers. The large disparity between Harvard and VMI presents a first indication that some correlation may exist between rank position and Twitter followers. We propose a novel approach which considers not only the primary Twitter accounts which the university may advertise on its home page but also secondary accounts which the university informally promotes by following them on Twitter. To ensure that a mutual affiliation exists between the primary and secondary accounts, we require that the domain assigned to the university in its URI (e.g. harvard.edu) be present in the Twitter profile of all affiliated Twitter accounts.
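
As an illustration of this mutual-affiliation rule, the following Python sketch checks whether the URI in a Twitter profile falls within a university's registered domain. It is a minimal sketch, not the authors' released code: the helper name is ours, and the redirect-resolution step the article mentions ("resolved to the known domain") is omitted.

from urllib.parse import urlparse

def is_affiliated(profile_uri, university_domain):
    # Treat an exact match or any subdomain of the university's registered
    # domain as an affiliation, e.g. annualfund.duke.edu matches duke.edu.
    host = urlparse(profile_uri).netloc.lower()
    domain = university_domain.lower()
    return host == domain or host.endswith("." + domain)

print(is_affiliated("https://annualfund.duke.edu", "duke.edu"))   # True
print(is_affiliated("http://gocards.com", "louisville.edu"))      # False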

The contributions of this study are as follows. We aggregate the rankings from multiple expert sources to compute an adjusted reputation rank (ARR) for each university, which allows direct comparison based on position in the list and provides a collective perspective of the individual rankings. We conduct a Web-based analysis to identify and collect a mutually aligned set of primary and secondary Twitter accounts as a measure of social media engagement. We propose two easily collected proxy measurements, UTE and EEE, that achieve comparable rankings to more complex methodologies that rely upon manual compilation. We produce a social-media-rich data set containing Twitter profile data and institutional demographics which will allow other researchers to reproduce our work (Weller and Kinder-Kurlanda, 2016). The complete data set is posted on GitHub[1].

2. Related work

Using Twitter followers as a way to measure reputation has been the subject of many previous studies. Our work parallels the work conducted by Klein et al. (2009a, 2009b) and Nelson et al. (2008) who attempt to find correlations between the rankings of real-world entities (e.g. college football teams, Billboard Hot 100) and the page rank of their respective home pages. We examine something similar, but instead derive the ranking score using social media.

2.1 Challenge of ranking universities

University rankings are subject to assumptions about the type of variables used and their associated weightings. Therefore, ranking systems reflect the conceptual framework and the modeling choices used to build them (Goglio, 2016). These systems can potentially give university administrators inaccurate indications about which activities are better to invest in to improve the ranking of their institution (Goglio, 2016). As predicted by decision-making theory, Bowman and Bastedo (2011) found that anchoring effects exert a substantial influence on future reputational assessments. Once a university reaches the pinnacle of any ranking system, it is anchored and often does not fall very far from its original position. Nearly always, rankings drive reputation, not the other way around. The notion of reputation largely serves as a feedback loop to maintain the status quo, establishing the credibility of the rankings and ensuring stability in results over time (Bowman and Bastedo, 2011).

Heterogeneous metrics used by ranking organizations can make direct comparisons difficult as each list may be intended to convey a distinct purpose. Three of the four ranking systems we reference determine best colleges based on academic excellence while the fourth, Money Magazine, is focused solely on perceived value and affordability. A particular ranking list may count factors such as external funding, numbers of articles authored by faculty, library resources and proportion of faculty with advanced degrees. This information is not always easy to obtain. Conducting surveys can be time-consuming and expensive if the data must be gathered over a long period of time or requires input from a university official.

The ranking systems often assume that one set of metrics and the norms of research-based and elite universities are applicable to everyone (Altbach, 2015). Goglio (2016) showed that the competition to improve ranks among lower-ranked universities is different from the competition to do so among higher-ranked universities. The results of Grewal et al. (2008) showed that a top-ranked university has a 0.965 probability of appearing in the top five the next year. Ultimately, regardless of popularity, universities exhibit very little power to control their rank position, especially when the top positions are perennially dominated by the same institutions (Goglio, 2016).

2.2 Social media in higher education

Even when the ranking systems have the same goal, technical challenges can still hamper data collection; specifically, changes in page names or Web domains can affect both the visibility and discoverability of the institution's Web presence. An organization can also use different Web domains for search engines, aliases and independent domains for some of its subunits or services (Aguillo et al., 2008). For example, in addition to purdue.edu, which is the expected domain for Purdue University, we found purduesports.com and purduealumni.org as domains associated with university-affiliated organizations. As noted by Aguillo et al. (2008), an adequate Web presence, or lack thereof, may not always correlate with the prestige of the institution.

Social networking sites have proven to be an effective vehicle for organizations seeking to implement diverse branding strategies, given that such sites allow consumers to share their experiences and opinions concerning the organization’s brand in real time (Heller Baird and Parasnis, 2011; Jansen et al., 2009). Many organizations have rapidly adopted social networking services such as Facebook and Twitter, a move that has altered the face of customer relationship management from managing customers to collaborating with customers. While social media interactions in the higher education space are not transactional in the traditional sense, they do provide a way for institutions to continually engage with their constituents. Another form of engagement, or public involvement with a chosen organization that may fall outside of consumer interests, is affective commitment which Kang (2014) defines as a voluntary bonding between entities, perhaps similar to how a university might maintain contact with its alumni long after graduation. We will focus on engagement at a very basic or minimal level based on familiarity and cognition where one first needs to be familiar with a university’s online activity and subsequently start to follow them via social media.

A 2016 study conducted by the Pew Research Center measured social media usage in the USA. The study concluded that while Facebook continues to be the USA's most popular social networking site, with nearly 79 per cent of online users using the platform, Twitter usage is holding steady at 24 per cent and is also somewhat more popular among the highly educated (Greenwood et al., 2016). Go and You's (2016) social media benchmarking study also suggests that Twitter is perceived as the most useful application for businesses. At the organizational level, Tsimonis and Dimitriadis (2014) examined the policies, strategies and outcomes that companies might expect when engaging on social media. Further, research findings attest to the value of social media engagement in building communities and nurturing positive public attitudes regarding the reputation of the organization (Men and Tsai, 2015). Through data collected via a large-scale survey, Dijkmans et al. (2015) also found that social media engagement is positively related to corporate reputation.

2.3 Influence of Twitter followers

Measuring social influence on Twitter has been widely discussed. Related work includes approaches which not only take followers and interactions into account, but also analyze topical similarities with the help of a ranking method similar to PageRank (Weng et al., 2010). Other approaches define different types of Twitter influence, namely, in-degree, retweet and mention influence (Cha et al., 2010). Accordingly, a question that arises concerns how to determine the Twitter accounts that are most influential and how their influence is subsequently measured (Antoniadis et al., 2016). Measuring Twitter followers is generally considered to be a popular metric as having many followers can indicate a higher level of influence among interested users. This metric implies that the more followers a user has, the more impact the user has, as the user seems to be more popular (Leavitt et al., 2009). Preussler and Kerres (2010) contend that the number of followers is an indicator for the social reputation and the number of followers will increase as the user becomes more important. Finally, Kunegis et al. (2013) assert that preferential attachment indicates that people who are already followed by many people (i.e. are popular) are more likely to receive new followers.

An alternative approach for ranking Twitter users undertaken by Saito and Masuda (2013) considers the number of others that a user follows, i.e. friends. They concluded that the number of others a user follows is as important as the number of followers. In previous studies, a variety of characteristics, both personal and social, have been used to identify influencers, and each study measures influence from a different perspective (Black, 1993; Kwak et al., 2010; Leavitt et al., 2009; Weng et al., 2010). Weng et al. (2010) applied the concept of homophily, which implies that a Twitterer follows a friend because she is interested in some topics the friend is publishing, and the friend follows back because she finds they share a similar topical interest. The presence of homophily implies there are Twitter users who are highly selective when choosing friends to follow (Weng et al., 2010). These conclusions are evidenced by super users who are followed by many but who only follow a select group of Twitter friends (e.g. consider the friend-to-follower ratio of Harvard shown in Figure 1).

3. Methodology

The following section discusses how we chose the performance indicators to correspond with the entries in the expert lists, the ranking algorithm and other operational details.

3.1 Establishing the selection criteria

We begin with the 351 American universities currently classified as Division I by the National Collegiate Athletic Association (NCAA)[2]. We then consider which institutions appear among the rankings of the Academic Rankings of World Universities (ARWU) 2016[3], the Times Higher Education (THE) World University rankings 2015-2016[4], Money’s Best Colleges (MONEY) 2016-2017[5] and US News & World Report (USNEWS) Best Global Universities 2015 and 2016[6].

In Table I, we identify the overlap between the total number of universities on each list and the NCAA Division I category of interest. While Division I is not necessarily a ranking, participation in Division I athletics might be an indicator that the university garners more attention from alumni and the general public via national media exposure. For example, consider the difference in the adjusted reputation rank (Section 3.3) between Ohio State (Division I), with a rank of 56, and the smaller Case Western Reserve (Division III), with a rank of 128; these are a public and a private university located in Columbus and Cleveland, respectively. These rankings support the assertions of Standifird (2005), who noted student enrollment and athletic team performance may influence the assessment of private and public universities, respectively, with a slight bias toward more visible institutions in general. A review of the unique appearance of a university on one or more lists demonstrates the diversity, or lack thereof, between the five rankings under consideration. Only Money Magazine, with its emphasis on perceived value, includes 115 institutions not evaluated elsewhere, while more than 53 per cent of the universities in our data set appear on at least two of the indicated lists. This anchoring of universities among the ranking lists is consistent with previous research (Bowman and Bastedo, 2011) regarding adherence to the status quo (Section 2.1).

3.2 Standardizing the rank positions

Two of the ranking systems that contribute to our data set bin universities alphabetically into groups after a certain threshold has been reached, resulting in tied ranking positions for those universities found lower on the list. After the first 200 individual rankings, THE places the remaining institutions ranked between 201 and 400 into bins of size 50 and then uses bins of size 100 for ranks between 401 and 800. The ranking for each binned institution is the lowest number in the bin. All institutions listed alphabetically as ranked between 401 and 500 would be assigned rank 401. The rankings of ARWU are conducted similarly except ARWU starts to bin after the first 100 individual rankings.
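
The bin-to-rank convention described above can be expressed as a short function. This is our own illustration of the published THE convention; the exact bin boundaries (201-250, 251-300, ..., 401-500, ...) are an assumption consistent with the bin sizes stated in the text.

def the_rank(position):
    # Individual ranks up to 200, then every institution in a bin
    # receives the lowest rank in that bin (bins of 50 for 201-400,
    # bins of 100 for 401-800).
    if position <= 200:
        return position
    if position <= 400:
        return 201 + ((position - 201) // 50) * 50
    return 401 + ((position - 401) // 100) * 100

print(the_rank(150))  # 150
print(the_rank(275))  # 251
print(the_rank(450))  # 401 (every institution between 401 and 500 gets 401)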

One of the problems when comparing two ranked lists is that the items may not be identical, meaning items that appear in List A do not necessarily appear in List B. Fagin et al. (2003) introduced a measure that extends Spearman's Footrule by assigning a rank to the non-overlapping elements. For two rankings of size k, each element that appears in List A but does not appear in List B (either missing entirely from B or ranked beyond position k) is assigned rank k + 1. For our data set, application of the footrule essentially places all universities which are not ranked at the end of a respective list. After removing the international entries, if any, the remaining institutions on each ranking list were sequentially ordered as shown in Table II, using the THE rank as an example. The sequential ordering according to relative position was necessary due to differences in the number of US institutions on each list (see Table I) and the need to standardize ranking positions to obtain concordance between all lists.
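
The Fagin et al. convention can be sketched as follows. This is a minimal illustration of the k + 1 assignment for missing elements, not the authors' code; the function and the toy lists are our own.

def footrule_with_missing(list_a, list_b):
    # Extended Spearman's Footrule for two top-k lists that may not overlap:
    # elements absent from a list are assigned rank k + 1 (Fagin et al., 2003).
    k = max(len(list_a), len(list_b))
    rank_a = {item: i + 1 for i, item in enumerate(list_a)}
    rank_b = {item: i + 1 for i, item in enumerate(list_b)}
    universe = set(list_a) | set(list_b)
    return sum(abs(rank_a.get(x, k + 1) - rank_b.get(x, k + 1)) for x in universe)

# Two short, partially overlapping lists: the non-overlapping items
# ("Yale", "Duke") are each treated as ranked k + 1 = 4 in the other list.
print(footrule_with_missing(["Harvard", "Stanford", "Yale"],
                            ["Stanford", "Harvard", "Duke"]))  # 4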

3.3 Computing adjusted reputation rank

One of our goals is to compute an adjusted reputation rank. Therefore, we must avoid unduly penalizing an institution by including a low raw ranking on a particular list in our ARR calculation, especially when the institution is referenced on just one or two of the named lists. To ensure that we incorporate different ranking perspectives in our evaluation, we average the ordered positional rankings from all ranking lists in our consolidated data set to compute a mean reputation score which we then use to sequentially order the listed universities to obtain the adjusted reputation rank shown in Table IV. Upon examination, we discovered that some schools which met the criteria to be ranked by Money Magazine based on value performed differently using the criteria established by the other ranking systems. For example, Columbia University is consistently in the top-15 of the other four ranking systems, while MONEY ranks the school considerably lower at Position 52. As described later in Section 4.1, we computed rank-order correlation for each of the rankings. Table V shows that the rankings from MONEY are consistently weak-to-moderately correlated with all other ranking lists we consider. Therefore, we exclude the MONEY rankings from our computation of ARR. The 115 schools which appeared only on the MONEY list were placed in a non-ranked position at the end of ARWU, THE, and the lists from USNEWS. A standardized ranking position was then calculated using the methodology described in Section 3.2.
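
As a concrete illustration, the mean reputation score and ARR can be reproduced from the ordered positional ranks of a few universities taken from Table IV. The pandas usage and column names are our own sketch under the conventions above (MONEY already excluded), not the authors' implementation.

import pandas as pd

ranks = pd.DataFrame({
    "university": ["Harvard", "Stanford", "UC Berkeley"],
    "ARWU": [1, 2, 3],
    "THE": [2, 1, 5],
    "USNEWS2015": [1, 3, 2],
    "USNEWS2016": [1, 3, 2],
})

# Average the ordered positional rankings to obtain the mean reputation score,
# then sequentially order the universities by that score to obtain the ARR.
ranks["mean_reputation"] = ranks[["ARWU", "THE", "USNEWS2015", "USNEWS2016"]].mean(axis=1)
ranks["ARR"] = ranks["mean_reputation"].rank(method="min").astype(int)
print(ranks.sort_values("ARR"))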

3.4 Computing the composite EEE rank

We identified several candidate attributes and evaluated which combination of quantifiable attributes might provide a good evaluation metric for our ranking system. We empirically selected characteristics that can be calculated or retrieved from the Web: monetary value of the endowment, athletic expenditures and undergraduate enrollment. These three attributes comprise our composite EEE ranking. We include the total expenditures for men's and women's sports as a measure of the institution's commitment to promoting the university as a whole. The data sources for these values are listed below:

For endowments that were attributed to a university system (e.g. University of Minnesota Foundation vs University of Minnesota-Twin Cities), we used DBpedia to obtain the endowment value for the particular university present in the ranking lists to avoid overstating the endowment[7]. Specific institutional data such as the founding date that could not be obtained from another already mentioned source were retrieved using Web searches of DBpedia.
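
For illustration, an endowment value can be retrieved from DBpedia's public SPARQL endpoint with a query such as the one below. This is a minimal sketch: the resource URI and the dbo:endowment property name are assumptions about DBpedia's model at the time of collection, not the authors' documented queries.

import requests

query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?endowment WHERE {
  <http://dbpedia.org/resource/University_of_Minnesota> dbo:endowment ?endowment .
}
"""
resp = requests.get(
    "https://dbpedia.org/sparql",
    params={"query": query, "format": "application/sparql-results+json"},
)
for row in resp.json()["results"]["bindings"]:
    print(row["endowment"]["value"])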

Due to the broad range of values, each of the EEE components was normalized individually across the full data set of 264 universities to obtain the same scale, from 0 to 1, and then the components were aggregated to obtain a sequential EEE ranking of the universities. The top-10 universities as ranked by our EEE score are shown in Table III.
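
A minimal sketch of this normalization and aggregation follows. The min-max scaling choice is an assumption consistent with the 0-to-1 range described above; the first two rows use values from Table III, while "Example College" is a hypothetical small school added for contrast.

import pandas as pd

eee = pd.DataFrame({
    "university": ["Ohio State", "Texas", "Example College"],
    "endowment": [3_633_887, 3_341_835, 150_000],              # thousands of $
    "athletic_expenditures": [136_966_818, 152_853_239, 5_000_000],
    "enrollment": [40_452, 36_072, 2_500],
})

# Scale each component to [0, 1], then aggregate and rank.
for col in ["endowment", "athletic_expenditures", "enrollment"]:
    lo, hi = eee[col].min(), eee[col].max()
    eee[col + "_norm"] = (eee[col] - lo) / (hi - lo)

eee["eee_score"] = eee[[c for c in eee.columns if c.endswith("_norm")]].sum(axis=1)
eee["eee_rank"] = eee["eee_score"].rank(ascending=False, method="min").astype(int)
print(eee[["university", "eee_score", "eee_rank"]])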

Later in Section 4.3, we theorize whether the EEE score might serve as a viable proxy measure for a subset of our data, the NCAA Power Five. The NCAA Power Five Conferences include the South-eastern Conference (SEC), Atlantic Coast Conference (ACC), Big Ten, Pac-12 and Big 12. These conferences are composed of 65 flagship public and private universities that share excellent academic reputations, large endowments and big budgets allocated for their athletic programs. These schools are representative of institutions that are playing at the highest level of NCAA competition and typically excel in two if not all three of the dimensions of EEE.

3.5 Mining official Twitter accounts

One of the proposed performance indicators for our data set is constructed around a set of primary Twitter seed accounts for each university. For the present study, the presence of Twitter friends is also needed to bootstrap the discovery of affiliated, secondary Twitter accounts. The complete process for identifying these accounts and determining the value for UTE is shown in Algorithm 1 and described here. As illustrated in Figure 2, we start with the URI for the university's homepage obtained from the detailed institutional profile information in the ranking lists. For each URI, we navigated to the associated webpage and searched the HTML source for links to valid Twitter handles. After examining the source anchor link text, we eliminated known false positives which were longer than 15 characters (Twitter's limit for a valid screen name) or included /intent, /share, /tweet, /search or /hashtag in the URI, which are directives to Twitter queries. Once the Twitter screen name was identified, the Twitter GET users/show API was used to retrieve the URI from the profile of each user name. If the domain of the URI matched exactly or resolved to the known domain of the institution, we considered the account to be one of the university's official, primary Twitter handles, since the user had self-associated with the university via the URI reference. As an example, the user names @NBA, @DukeAnnualFund, @Duke_MBB and @DukeU were extracted from the page source of the Duke University homepage (www.duke.edu). However, only @DukeAnnualFund and @DukeU are considered official primary accounts because their respective URIs, annualfund.duke.edu and duke.edu, are in the same domain as the university.
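
The handle-extraction step can be sketched as follows, using requests and BeautifulSoup. The regular expression and the false-positive filter mirror the rules described above but are a simplified illustration, not the authors' exact code.

import re
import requests
from bs4 import BeautifulSoup

EXCLUDED = ("/intent", "/share", "/tweet", "/search", "/hashtag")
HANDLE_RE = re.compile(r"twitter\.com/(@?\w{1,15})/?$", re.IGNORECASE)

def candidate_handles(homepage_uri):
    # Collect candidate Twitter screen names linked from a university homepage,
    # skipping links that are actually Twitter query directives.
    html = requests.get(homepage_uri, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    handles = set()
    for a in soup.find_all("a", href=True):
        href = a["href"]
        if "twitter.com" not in href or any(x in href for x in EXCLUDED):
            continue
        match = HANDLE_RE.search(href)
        if match:
            handles.add(match.group(1).lstrip("@").lower())
    return handles

print(candidate_handles("https://www.duke.edu"))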

Ten institutions did not have a Twitter account identified on their homepage as of August 2016; therefore, a primary official account could not be determined via our automated homepage search. These schools included South Carolina (@uofsc), Missouri (@mizzou), North Carolina at Greensboro (@uncg), Ball State (@ballstate), Evansville (@evansville), Fordham (@fordhamnotes), Marist College (@marist), Portland State (@portland_state) and East Carolina (@eastcarolina). For this subset only, we used the Google Custom Search Engine[8] to initiate an X-ray search using the keywords "institution" AND "twitter." We accepted the top-ranked result returned by Google, if any, as the official, primary Twitter account for the university. In the event that Google did not return a Twitter account, we manually searched using the search bar located on http://twitter.com.

Colleges and universities have a reputation for being decentralized, with many departments operating independently of one another and maintaining a separate social media presence. However, we observed that only 24 of the 264 universities in our dataset promoted multiple, official Twitter accounts on their homepage. For the purpose of computing our UTE score, we want to consider the contribution of all university-affiliated Twitter accounts. Therefore, for each of the identified official, primary accounts, we obtained the full list of their Twitter friends, i.e. the users that they follow. Again, we used the Twitter GET users/show API to determine which of the friends could be included as secondary official Twitter accounts based on the URI in the profile (it must have the same domain as the university). These secondary accounts might include athletic teams, faculty members and other university organizations. Once the primary and secondary accounts were identified, we used the Twitter GET followers/ids API to retrieve and accumulate the follower count to form the UTE score for the university.
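
Assuming hypothetical wrapper functions around the Twitter API calls named above (get_profile for GET users/show, returning a dict with 'url' and 'followers_count'; get_friends for GET friends/ids, returning friend screen names), and reusing the is_affiliated check sketched in the Introduction, the expansion to secondary accounts and the UTE accumulation look roughly like this. It is a sketch of the described procedure, not the authors' implementation.

def compute_ute(primary_handles, university_domain, get_profile, get_friends):
    # Keep primary accounts whose profile URI is in the university's domain,
    # expand to their friends with the same property (secondary accounts),
    # then sum follower counts over all official accounts.
    official = set()
    for handle in primary_handles:
        profile = get_profile(handle)
        if is_affiliated(profile.get("url") or "", university_domain):
            official.add(handle)
            for friend in get_friends(handle):
                friend_profile = get_profile(friend)
                if is_affiliated(friend_profile.get("url") or "", university_domain):
                    official.add(friend)
    return sum(get_profile(h)["followers_count"] for h in official)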

We launched our crawler to find all of the designated Twitter followers during the period between June 15, 2016, and August 30, 2016. In total, we collected 1,087,000 user profiles. Approximately 9 per cent of all the user accounts we collected were protected at the profile owner's request, allowing only their friends to view their profiles. Consequently, we ignored these users in the computation of the UTE score because the underlying profile data are inaccessible using the Twitter API. Once we calculated the UTE score, we then ranked each university, in sequential order, based on the score, as shown in Table IV.

4. Evaluation

In this section we evaluate our EEE and UTE rankings by computing rank-order correlation with the adjusted reputation rank (Section 3.3). We also directly compare the rankings of individual universities for the full data set and discuss the implications for universities in the NCAA Power Five conferences.

4.1 Rank-order correlation

Since we know the potential for tied rankings exists in our data, we used Kendall's Tau-b (τ) rank-order correlation to test for statistically significant (p < 0.05), moderate (0.40 < τ ≤ 0.60) or strong (0.60 < τ ≤ 0.80) correlations between the individual ranking systems and our adjusted reputation rank. Table V shows the respective inter-rank correlation measured in Kendall's τ. With τ values in the range of 0.3189 to 0.4191, the rankings from Money Magazine are weakly to moderately correlated with all other ranking lists, including our ARR. This range of τ values confirms our intuition that the value-based ranking criteria and the distinct goals of the Money Magazine system make it an outlier among the other lists. We note a strong correlation, in the range of 0.7634 to 0.8787, between the remaining four lists, which indicates that:

  • the criteria traditionally used to rank universities based on academic excellence change slowly, thus resulting in minimal differentiation in the selected universities; and

  • the relative ranking position of a particular university is anchored and does not vary significantly from year to year.

The strong correlation of 0.8787 between subsequent lists found in the 2015 and 2016 rankings in USNEWS along with the addition of only three new entrants in 2016 (see Table I) confirms this observation. The lack of variety between the USNEWS rankings is also consistent with the conclusions of Grewal et al. (2008), noted previously in Section 2.1, which indicated the high probability of a top-ranked university retaining its rank from year to year. Our adjusted reputation rank, with τ values in the range of 0.8285 to 0.9375, is strongly correlated with the rankings in ARWU, THE and both years of USNEWS. Therefore, we conclude that ARR can be used as a representative proxy for any traditional ranking system.
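
Any pairwise value of this kind can be reproduced from two rank vectors with an off-the-shelf Kendall's Tau-b implementation; for example, with SciPy (the rank vectors below are toy values for illustration, not the study's data, whose vectors have 264 entries):

from scipy.stats import kendalltau

arr_rank = [1, 2, 3, 4, 5, 6, 7, 8]   # ordered positional ranks, system A
ute_rank = [1, 3, 2, 4, 6, 5, 8, 7]   # ordered positional ranks, system B

tau, p_value = kendalltau(arr_rank, ute_rank)  # Tau-b, handles tied ranks
print(f"tau = {tau:.4f}, p = {p_value:.4f}")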

4.2 Composite ranking correlation with UTE

To evaluate our EEE and UTE rankings against the ARR, we again used Kendall's Tau-b (τ) rank-order correlation to test for statistically significant (p < 0.05), moderate (0.40 < τ ≤ 0.60) or strong (0.60 < τ ≤ 0.80) correlations. Using ARR as the ranking criterion, we selected the top-50, the top-100, the top-141 (those ranked on two or more lists) and all 264 universities in our data set. As shown in Table VI(a), we found that, with a τ value of 0.6691, UTE is most strongly correlated with the ARR for the top-50 institutions, followed closely by EEE at 0.5728. We must note that the majority of the universities in the top-50 of any ranking list are usually members of the Ivy League or large schools with highly recognizable athletic programs like those in the Power Five (e.g. Ohio State, Penn State), so we might expect similarities in the metrics that comprise EEE. The correlation between UTE and ARR decreases slightly for the top-100, but a strong correlation, τ = 0.6018, persists when we examine the full data set in Table VI(d). We conclude that primary and secondary Twitter followers, as we have defined them for UTE, present a strong metric for ranking and assessing the reputation of a university, especially those atypical institutions normally excluded from traditional ranking lists. For the full list, Table VI(d) shows that both EEE and UTE have a strong correlation with the ARR.

To further investigate the correlation of ARR, UTE and EEE, we show scatterplots in Figure 3 of the combinations of the three rankings for all 264 universities. The colors represent bins of the EEE rank, which can be seen directly in Figure 3(a). As discussed in Section 3.3, the 115 schools that appeared exclusively on the Money Magazine list were binned and all assigned a rank of 142 on the ARR. Note that all of the universities in the first bin of EEE (black dots) are ranked within the top 150 of ARR, suggesting that universities with high enrollments, endowments and/or athletic budgets also have high academic rank. Figure 3(b) (ARR vs UTE) shows that there are several universities that have larger Twitter followings than can be explained by academic rank alone (i.e. the UTE rank is higher than the ARR rank). Most of these universities fall into the first bin of EEE, which could explain the larger Twitter following. Twitter engagement provides an inexpensive means for smaller schools to reach a large audience, potentially enhancing their reputations. Figure 3(b) also shows that there are several smaller schools (in the last EEE bin, cyan dots) that have larger Twitter followings than their academic rank (not ranked in ARR) or EEE would explain. These schools may be making a concerted effort to enhance their profile and could potentially move into the standard academic rankings in the future. This would be an interesting avenue for future study. Finally, Figure 3(c) shows EEE vs UTE, which indicates that, as expected, universities with more financial resources tend to have larger Twitter followings, though there are still some universities in the lower EEE bins that have significant Twitter followings.

4.3 Correlation between the NCAA Power Five

We use the collective membership of the Power Five, based on their 2016 conference alignment, to more closely examine the ranking correlation of these conferences. Within the complete data set, we observed that 55 of the 65 Power Five member institutions (84.6 per cent) were ranked within the top-100 positions based on the ARR rank. Further, we found that all 65 schools (100 per cent) were ranked within the top-100 positions based on the EEE rank. The latter observation is consistent with the strong correlation between EEE and UTE, τ = 0.6461, shown in Table VI(d), and with our intuition that large schools with ample financial resources would attract more Twitter followers. Figure 4 highlights the relationships between the Power Five and the various metrics by repeating the same charts from Figure 3 but with members of the Power Five shown in blue.

We noted several similarities among the ten schools (15.4 per cent) that were ranked outside of the top tier for ARR. Notably, Texas Christian and Mississippi State are the only two schools which were not ranked on two or more of the ranking lists. Both schools also fall significantly below the mean values for the Power Five in terms of undergraduate enrollment (approximately 21,000), endowment value (approximately $2.3bn) and athletic expenditures (approximately $90m), placing them at the bottom of the EEE ranking. On the other hand, Wake Forest is the smallest institution in the Power Five, but the school garners an academic reputation (ARR = 45) that cannot be readily explained by its comparatively low EEE ranking (EEE = 97).

We also note four schools that fall within the bottom 50 per cent of UTE. In particular, the University of Louisville could achieve a considerable boost in UTE ranking (approximately 107,000 followers) if the Twitter account used by the athletic department (@GoCards) referenced the primary URI of the university rather than its own domain (http://gocards.com). We discovered 284 primary and secondary accounts followed by Georgia Tech; however, only four of these could be considered official, because 150 of the 280 secondary accounts did not include a URI in the profile bio. A similar scenario was noted for Oregon State, where 271 of the 341 secondary accounts did not include a URI. While we identified 74 official accounts for the University of Pittsburgh, as was the case with Louisville, approximately 140,000 followers of underreported secondary Twitter accounts are associated with university sports. We discovered that the Twitter followers of Wake Forest are bolstered significantly by a single celebrity professor, Melissa Harris-Perry, who in addition to her faculty position previously hosted a weekly news-style program on US television. More than 80 per cent of the Wake Forest UTE score is attributed to the verified @MHarrisPerry Twitter account, which has more than 600,000 followers.

In Table VII, we provide a sampling of the diverse, though not exhaustive, list of unique university domains referenced in the profiles of secondary Twitter accounts of the NCAA Power Five. The full table for all 65 universities in the Power Five is found in Appendix A of McCoy et al. (2017). Upon visual inspection of the Web content of each domain, we find they are related to the university in some capacity (e.g. sports teams, clubs), but do not conform to our domain association rule. We identified 181 secondary domains associated with Purdue University. The total number of followers associated with the 296 Twitter accounts which referenced one of these secondary domains is 426,586. As evidenced by this example, the omission of followers for the secondary Twitter accounts can, in some cases, significantly lower our calculation of the UTE score. For those underperforming universities, in terms of Twitter followers, inclusion of more domains would elevate the UTE rank and likely present a stronger Kendall's Tau-b (τ) correlation than was noted in Table V. We did not attempt to identify additional secondary domains for the entire set of 264 universities in our dataset. This exercise would be manually intensive and counter to our stated goal of automated data collection.

5. Discussion and future work

As noted during our own collection efforts, the quality and availability of the data chosen as performance indicators can impede the construction of a gold-standard data set. Manual correction can improve the data collection, but it is expensive and is not conducive to reproducible research. We observed that institutions themselves do not maintain a complete listing of all official Twitter accounts, as noted by the number of undiscovered and undocumented accounts we extracted during a secondary search. We must also acknowledge the impact of celebrity professors and verified accounts (e.g. Melissa Harris-Perry, @MHarrisPerry). Given the small number of verified accounts among our official Twitter profiles, we contend that the influence of celebrity faculty members might be equated to that of Nobel Prize laureates, an indicator which is used by some ranking systems. We did not address known issues with bots and spam accounts which may overinflate the stated number of Twitter followers, the primary component of our UTE score (Davis et al., 2016). We also understand that our methodology constrains universities to a single official hostname, which can deflate the UTE score as Twitter accounts that reference other university-owned domains are omitted. Based on our research assumptions, we observed that enrollment does not necessarily translate into the Twitter followers needed to compute UTE. Universities are not taking the opportunity to advertise their Twitter accounts and are at times promoting other entities on their homepage. This observation necessitated expanding the follower network as we have defined it. Schools with highly visible sports programs, like those in the Power Five, tend to have more Twitter followers as the public is more aware of the university's overall brand. In general, the perceived reputation of any university is affected less by metrics which are intrinsic to the institution than by intangibles that translate into more impressions or brand awareness by the public and constituents. This parallels the assertions in prior research (Leavitt et al., 2009; Preussler and Kerres, 2010) which contends that popular entities are more likely to attract more followers (see Section 2.3).

5.1 Algorithm 1: Mining official Twitter accounts

1:  Let h ← homePageURI
2:  Let d ← domain(h)
3:  primaryTwitterAccts ← findOfficialTwitterAccounts(h, d)
4:  function findOfficialTwitterAccounts(H, D)
5:      foundAccountInd ← false
6:      TwitterPrimary ← nil
7:      W ← ViewPageSource(H)
8:      repeat                              ▷ Search for anchor tags with an href in the Twitter format
9:          A ← anchorTag
10:         user ← TwitterRegexp(A)
11:         if user ≡ TwitterAccount then
12:             profile ← TwitterGETusers(user)
13:             if domain(profileURI) ⊂ D then        ▷ Twitter friends are the users an account follows
14:                 TwitterPrimary ← TwitterPrimary ∪ profile
15:                 friends ← TwitterGETFriends(profile)
16:                 TwitterPrimary ← TwitterPrimary ∪ friends
17:                 foundAccountInd ← true
18:     until W ≡ nil
19:     if foundAccountInd then
20:         UTE ← 0
21:         for i = 1 to length(TwitterPrimary) do
22:             primAcct ← TwitterPrimary(i)
23:             profile ← TwitterGETFollowers(primAcct)
24:             if domain(profileURI) ⊂ D then
25:                 UTE ← UTE + followers
26:     else
27:         searchResults ← GoogleCustomSearch(h, "twitter")
28:         TwitterPrimary ← searchResults(0)
29:         UTE ← 0
30:         primAcct ← TwitterPrimary(0)
31:         profile ← TwitterGETFollowers(primAcct)
32:         if domain(profileURI) ⊂ D then
33:             UTE ← UTE + followers
        return UTE

Our study is subject to a number of limitations that present opportunities for future work. Campbell's and Goodhart's laws suggest that if UTE becomes popular, institutions may seek to artificially increase their Twitter followers to improve their ranking. Future work could filter the follower counts to include only the Twitter accounts of real people. In order to obtain a more complete set of official Twitter accounts, the domain associated with the account URI could be expanded to include all registered domains for the university. Additional research might also broaden the scope of our study to include both US and international universities. It might also be advantageous to subject our observations to a temporal analysis to ascertain whether the UTE rankings, at least for those in the upper echelon, persist over time and to look for non-linear spikes in Twitter followers which may indicate artificial manipulation. The Twitter API is subject to limitations in the retrieval of historical follower counts. Smith (2018) suggests an alternative approach using the Internet Archive's Wayback Machine to access digital snapshots, if available, of profile pages as they appeared since the account creation date. We might also consider the ranking potential of other networks with follower counts, such as Pinterest or Instagram, where universities may have established a social presence.

We note that universities that have large endowments, enrollments and/or athletic budgets will be ranked higher using EEE. This is reflected in the fact that all 65 schools in the NCAA Power Five conferences are found in the top-100 EEE rank. Further analysis of the relationship between the EEE rank and UTE rank could be done to determine which factors (such as athletic engagement) affect the number of Twitter followers a university has.

The use of Web metrics to compute the UTE might also provide an incentive for institutions to increase their Web presence as a way to further engage with constituents and the general public. Social media allows us to measure another proxy for reputation: how the universities and the public engage with one another. The universities themselves have to decide whether this kind of outreach is important and invest in it, and the public needs to be interested enough to follow them.

6. Conclusion

We examined and ranked a set of 264 US universities extracted from the NCAA Division I membership and lists published in US News & World Report, Times Higher Education, Academic Ranking of World Universities and Money Magazine using an adjusted reputation rank (ARR) that we compared to our endowment, expenditures and enrollment (EEE) and University Twitter Engagement (UTE) scores. To compute the EEE and UTE rankings, we mined available data from Twitter and other publicly available collections of data. When compared to the ARR rank for all 264 represented universities, we noted a strong correlation with UTE (τ = 0.6018) and a similar correlation with EEE (τ = 0.5969). We conclude that these rankings are comparable to those presented in other academic-based ranking systems; however, we present a low-cost data acquisition methodology using only Web-based artifacts. Both EEE and UTE also offer distinct advantages because:

  • they can be calculated on-demand rather than relying on an annual publication; and

  • they promote diversity in the ranking lists, as any university with a Twitter account can be given a UTE rank and any university with online information about enrollment, expenditures and endowment can be given an EEE rank.

Figures

Figure 1: Twitter follower comparison

Figure 2: Mining official Twitter accounts

Figure 3: Correlation of composite rankings (full data set)

Figure 4: Correlation of composite rankings (full data set)

Table I. Contribution of each ranking list to our data set

Ranking system Total universities US & NCAA Division I Unique entries
ARWU 500 107 1
Money Magazine 705 249 115
THE 800 118 4
US News 2015 500 99 0
US News 2016 750 137 3
Any Two Lists 22
Any Three Lists 19
Any Four Lists 16
All Five Lists 84
Total 264

Table II. Rank sequencing using Spearman's footrule

University THE rank THE ordered
Stanford University 3 1
Harvard University 6 2
Princeton University 7 3
Yale University 12 4
University of California-Berkeley 13 5
Columbia University 15 6
University of California-Los Angeles 16 7
University of Pennsylvania 17 8
Cornell University 18 9
Duke University 20 10
University of San Diego (not ranked) 112
Villanova University (not ranked) 112

Table IV. Union of the top 10 universities according to ARR and top 10 according to UTE, sorted by ARR

University ARWU ordered THE ordered USNEWS 2015 ordered USNEWS 2016 ordered Mean reputation score Adjusted reputation rank UTE score UTE rank
Harvard University 1 2 1 1 1 1 4,562,501 1
Stanford University 2 1 3 3 2 2 2,239,440 2
University of California-Berkeley 3 5 2 2 3 3 474,901 19
Princeton University 4 3 6 7 5 4 574,758 15
Columbia University 5 6 5 5 5 4 759,574 7
University of California-Los Angeles 7 7 4 4 6 6 394,815 28
Yale University 6 4 9 8 7 7 808,461 4
University of Pennsylvania 10 8 10 8 9 8 778,805 5
University of Washington 9 13 7 6 9 8 274,674 44
University of Michigan 11 11 7 10 10 10 671,277 12
Cornell University 8 9 12 12 10 10 820,656 3
Note: UTE score represents the total number of primary and secondary Twitter followers

Table V. Kendall's Tau-b correlation between ranking lists and our adjusted reputation rank (N = 264)

Ranking system ARWU MONEY USNEWS2015 USNEWS2016 THE ARR
ARWU 1 0.4191 0.8763 0.8565 0.7634 0.8533
MONEY 0.4191 1 0.3761 0.3239 0.3504 0.3189
USNEWS2015 0.8763 0.3761 1 0.8787 0.7496 0.8542
USNEWS2016 0.8565 0.3239 0.8787 1 0.7605 0.9375
THE 0.7634 0.3504 0.7496 0.7605 1 0.8285
ARR 0.8533 0.3189 0.8542 0.9375 0.8285 1

Table III. Top-10 universities ranked by EEE

University Undergraduate enrollment Endowment, thousands ($) Athletic expenditures ($) EEE rank
Ohio State University 40,452 3,633,887 136,966,818 1
University of Texas 36,072 3,341,835 152,853,239 2
Pennsylvania State University 39,077 3,635,730 117,818,050 3
University of Michigan 27,297 9,952,113 131,003,957 3
University of Wisconsin-Madison 27,867 2,465,051 122,975,876 5
University of Florida 29,577 1,550,000 130,772,416 6
Michigan State University 35,038 2,274,813 89,491,630 7
University of Washington 27,733 3,076,226 88,580,078 8
University of California-Los Angeles 29,027 1,864,605 96,912,767 9
Indiana University 31,161 1,974,215 81,161,423 10

Table VI. Kendall's Tau-b correlation between composite rankings and UTE rank for institutions on two or more lists

Composite ranking EEE UTE ARR
(a) Top 50
EEE 1 0.5728 0.5310
UTE 0.5728 1 0.6691
ARR 0.5310 0.6691 1
(b) Top 100
EEE 1 0.5620 0.5410
UTE 0.5620 1 0.5920
ARR 0.5410 0.5920 1
(c) Top 141
EEE 1 0.5960 0.5538
UTE 0.5960 1 0.5967
ARR 0.5538 0.5967 1
(d) All 264
EEE 1 0.6461 0.5969
UTE 0.6461 1 0.6018
ARR 0.5969 0.6018 1

Table VII. Underreported UTE for NCAA Power Five universities where the URI does not conform to our domain rules

University Homepage domain Unique secondary domains Sampling of secondary domains Secondary Twitter accounts Secondary UTE
Arizona State University asu.edu 92 thesundevils.com asufoundation.org 138 498,097
Auburn University auburn.edu 15 auburntigers.com auburnalabama.org auburn.collegiatelink.net 43 809,923
Purdue University purdue.edu 181 purduesports.com purduealumni.org 296 426,586
University of Wisconsin wisc.edu 72 uwbadgers.com badgernation.com 118 978,770

Notes

References

Aguillo, I.F., Ortega, J.L. and Fernández, M. (2008), “Webometric ranking of world universities: introduction, methodology, and future developments”, Higher Education in Europe, Vol. 33 Nos 2/3, pp. 233-244.

Altbach, P. (2015), “The dilemmas of ranking”, International Higher Education, Vol. 42.

Antoniadis, K., Zafiropoulos, K. and Vrana, V. (2016), “A method for assessing the performance of e-government Twitter accounts”, Future Internet, Vol. 8 No. 4, p. 12.

Black, T.R. (1993), Evaluating Social Science Research: An Introduction, Sage, Thousand Oaks, CA.

Bowman, N.A. and Bastedo, M.N. (2011), “Anchoring effects in world university rankings: exploring biases in reputation scores”, Higher Education, Vol. 61 No. 4, pp. 431-444.

Cha, M., Haddadi, H., Benevenuto, F. and Krishna Gummadi, P. (2010), “Measuring user influence in Twitter: the million follower fallacy”, ICWSM, Vol. 10 Nos 10/17, p. 30.

Davis, C.A., Varol, O., Ferrara, E., Flammini, A. and Menczer, F. (2016), “BotOrNot: a system to evaluate social bots”, Technical Report arXiv:1602.00975.

Dijkmans, C., Kerkhof, P. and Beukeboom, C.J. (2015), “A stage to engage: social media use and corporate reputation”, Tourism Management, Vol. 47, pp. 58-67.

Fagin, R., Kumar, R. and Sivakumar, D. (2003), “Comparing top k lists”, SIAM Journal on Discrete Mathematics, Vol. 17 No. 1, pp. 134-160.

Go, E. and You, K.H. (2016), “But not all social media are the same: analyzing organizations’ social media usage patterns”, Telematics and Informatics, Vol. 33 No. 1, pp. 176-186.

Goglio, V. (2016), “One size fits all? A different perspective on university rankings”, Journal of Higher Education Policy and Management, Vol. 38 No. 2, pp. 212-226.

Greenwood, S., Perrin, A. and Duggan, M. (2016), "Social Media Update 2016", Pew Research Center.

Grewal, R., Dearden, J.A. and Lilien, G. (2008), “The university rankings game: modeling the competition among universities for ranking”, The American Statistician, Vol. 62 No. 3, pp. 232-237.

Heller Baird, C. and Parasnis, G. (2011), “From social media to social customer relationship management”, Strategy & Leadership, Vol. 39 No. 5, pp. 30-37.

Jansen, B.J., Zhang, M., Sobel, K. and Chowdury, A. (2009), “Twitter power: tweets as electronic word of mouth”, Journal of the American Society for Information Science and Technology, Vol. 60 No. 11, pp. 2169-2188.

Kang, M. (2014), “Understanding public engagement: conceptualizing and measuring its influence on supportive behavioral intentions”, Journal of Public Relations Research, Vol. 26 No. 5, pp. 399-416.

Klein, M., Hunsicker, O. and Nelson, M.L. (2009a), “Comparing the performance of US college football teams in the web and on the field”, Proceedings of the 20th ACM Conference on Hypertext and Hypermedia, ACM, pp. 63-72.

Klein, M., Hunsicker, O. and Nelson, M.L. (2009b), “Correlation of music charts and search engine rankings”, Proceedings of the 9th ACM/IEEE-CS Joint Conference on Digital Libraries, ACM, pp. 415-416.

Kunegis, J., Blattner, M. and Moser, C. (2013), “Preferential attachment in online networks: measurement and explanations”, Proceedings of the 5th Annual ACM Web Science Conference, ACM, pp. 205-214.

Kwak, H., Lee, C., Park, H. and Moon, S. (2010), “What is twitter, a social network or a news media?”, Proceedings of the 19th International Conference on World Wide Web, ACM, pp. 591-600.

Leavitt, A., Burchard, E., Fisher, D. and Gilbert, S. (2009), “The influentials: new approaches for analyzing influence on Twitter”, Web Ecology Project, Vol. 4 No. 2, pp. 1-18.

McCoy, C.G., Nelson, M.L. and Weigle, M.C. (2017), “University Twitter engagement: using Twitter followers to rank universities”, Technical Report arXiv:1708.05790, Old Dominion University Department of Computer Science.

Men, L.R. and Tsai, W.H.S. (2015), “Infusing social media with humanity: corporate character, public engagement, and relational outcomes”, Public Relations Review, Vol. 41 No. 3, pp. 395-403.

Nelson, M.L., Klein, M. and Magudamudi, M. (2008), “Correlation of expert and search engine rankings”, Technical Report arXiv:0809.2851, Old Dominion University Department of Computer Science.

Preussler, A. and Kerres, M. (2010), “Managing reputation by generating followers on twitter”, Medien-Wissen-Bildung Explorationen visualisierter und kollaborativer Wissensräume, pp. 129-143.

Saito, K. and Masuda, N. (2013), “Two types of Twitter users with equally many followers”, Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ACM, pp. 1425-1426.

Smith, M. (2018), “Twitter follower count history via the internet archive”, Web Science and Digital Libraries Research Group at Old Dominion University, Blog, available at: https://ws-dl.blogspot.com/2018/03/2018-03-14-twitter-follower-count.html

Standifird, S.S. (2005), “Reputation among peer academic institutions: an investigation of the US news and world report’s rankings”, Corporate Reputation Review, Vol. 8 No. 3, pp. 233-244.

Tsimonis, G. and Dimitriadis, S. (2014), “Brand strategies in social media”, Marketing Intelligence & Planning, Vol. 32 No. 3, pp. 328-344.

Weller, K. and Kinder-Kurlanda, K.E. (2016), “A manifesto for data sharing in social media research”, Proceedings of the 8th ACM Conference on Web Science, ACM, pp. 166-172.

Weng, J., Lim, E.P., Jiang, J. and He, Q. (2010), "Twitterrank: finding topic-sensitive influential Twitterers", Proceedings of the Third ACM International Conference on Web Search and Data Mining, ACM, pp. 261-270.

Acknowledgements

This paper was accepted for presentation at the JCDL 2018 Workshop on Knowledge Discovery from Digital Libraries, June 6, 2018, Fort Worth, TX.

Corresponding author

Corren G. McCoy can be contacted at: cmccoy@cs.odu.edu