# Archival research considerations for CRSP data

Rick N. Francis (Department of Accounting and Information Systems, College of Business Administration, University of Texas at El Paso, El Paso, Texas, USA)
Grace Mubako (Department of Accounting and Information Systems, College of Business Administration, University of Texas at El Paso, El Paso, Texas, USA)
Lori Olsen (School of Accounting, Central Michigan University, Mount Pleasant, Michigan, USA)

ISSN: 1030-9616

Publication date: 3 September 2018

## Abstract

### Purpose

This study aims to remind researchers that measurement errors and inappropriate inferences may result from improperly combining and adjusting certain Center for Research in Security Prices (CRSP) measures.

### Design/methodology/approach

In addition to real-world working examples, the study uses earnings announcements data to examine the effects of improperly combining and adjusting CRSP measures.

### Findings

This study assists researchers with the following two considerations when using CRSP data: stand-alone share prices adjusted with CRSP adjustment factors are inaccurate in the presence of property dividend, spin-off and rights offering events; and ignoring covertly missing stock returns may create misleading test results. The primary objectives of the study are to help researchers increase the integrity of their studies and the probability of publication.

### Research limitations/implications

Inadequate consideration of the two issues discussed in the paper may change the researcher’s statistical inferences.

### Originality/value

Archival researchers who overtly address and discuss the existence of these issues achieve two important and related benefits. First, the researcher increases his or her credibility with editors and reviewers, which enhances the probability of a published study. Second, the researcher increases his or her perceived technical competency, which potentially affects promotion and tenure decisions, editorial membership decisions, co-authorship opportunities and other professional effects. Doctoral students will find this study to be particularly useful.

#### Citation

Francis, R., Mubako, G. and Olsen, L. (2018), "Archival research considerations for CRSP data", Accounting Research Journal, Vol. 31 No. 3, pp. 360-370. https://doi.org/10.1108/ARJ-06-2016-0065

### Publisher

Emerald Publishing Limited

## Introduction

Researchers frequently encounter tasks that require modifying variables from the Center for Research in Security Prices (CRSP) database. This study reminds researchers that measurement errors and inappropriate inferences may result from improperly adjusting CRSP prices and from ignoring the implications of missing prices and stock returns. Moreover, the study aims to help researchers avoid unnecessary reviewer and editorial skepticism, which reduces the probability of publication.

The literature contains several studies that highlight biases in CRSP data (Guenther and Rosman, 1994; Shumway, 1997; Canina et al., 1998; Shumway and Warther, 1999; Fisher et al., 2010). Other studies address errors in CRSP data (Rosenberg and Houglet, 1974; Bennin, 1980; Courtenay and Keller, 1994; Elton et al., 2001). In contrast to past studies that highlight problems with CRSP data per se, the current study addresses errors that can arise when archival researchers modify and adjust CRSP data to construct variables that are commonly used in the literature.

This study assists researchers with the following two considerations when using CRSP data:

1. Stand-alone share prices adjusted with CRSP adjustment factors are inaccurate in the presence of property dividend, spin-off and rights offering events.

2. Ignoring covertly missing stock returns may create misleading test results.

Archival researchers who explicitly address and discuss the existence of these issues increase their credibility with editors, reviewers and colleagues. Doctoral students will find this study to be particularly useful. A discussion of these two issues appears in the following sections, along with complementary statistics.

## Center for Research in Security Prices adjustment factors

CRSP adjustment factors for share prices (CFACPR) enable researchers to restate historical share prices for the artificial reductions created by events such as stock splits and stock dividends[1]. Although these events change share prices, they lack economic substance and create synthetic structural breaks, which undermine the comparability of share prices before and after the events. Unexpected cash dividends and other property dividends are examples of events that also change share prices. However, these events change the economic position of the firm; thus, the corresponding changes in share price are substantive and require no adjustment.

An important point is that using CRSP adjustment factors may create unintended consequences when the researcher’s sample contains events both with and without economic substance. For example, assume that a researcher’s sample includes observations for property dividend, spin-off or rights offering events, along with observations for stock split or stock dividend events. Further assume that the researcher uses the CRSP cumulative adjustment factor for price (CFACPR) to accommodate the artificial decrease in share prices associated with stock split and stock dividend events. Herein lies the potential danger: the researcher adjusts share prices for all observations in the sample and unknowingly adjusts the share prices for property dividend, spin-off and rights offering events as well, because CRSP adjustment factors for share price include the effects of these events. If the researcher uses share price in a stand-alone manner (e.g. as a dependent variable in a regression model), then adjusting the historical share prices for these events creates erroneous sample observations. The key idea is that researchers should use the CRSP adjustment factors only to accommodate events that lack economic substance, such as stock splits and stock dividends.
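The hazard can be illustrated with a minimal sketch that uses hypothetical numbers. It assumes the common convention that an adjusted price is the raw price divided by the cumulative factor; the record layout, the firm labels and every value are illustrative and are not taken from CRSP.

```python
from dataclasses import dataclass


@dataclass
class PriceObs:
    label: str     # hypothetical description of the observation
    prc: float     # raw share price
    cfacpr: float  # cumulative price adjustment factor (illustrative value)


def adjusted_price(obs: PriceObs) -> float:
    """Restate a raw price with the cumulative factor (assumed convention: prc / cfacpr)."""
    return obs.prc / obs.cfacpr


# Firm A: 2-for-1 stock split, an event without economic substance.
split_before = PriceObs("A, before split", prc=100.0, cfacpr=2.0)
split_after = PriceObs("A, after split", prc=50.0, cfacpr=1.0)

# Firm B: hypothetical spin-off that distributes 20 per share of value,
# so the drop from 100 to 80 is substantive.
spin_before = PriceObs("B, before spin-off", prc=100.0, cfacpr=1.25)
spin_after = PriceObs("B, after spin-off", prc=80.0, cfacpr=1.0)

for obs in (split_before, split_after, spin_before, spin_after):
    print(f"{obs.label}: raw={obs.prc:.2f} adjusted={adjusted_price(obs):.2f}")
```

For firm A the adjustment restores comparability (50 before and after). For firm B the same adjustment restates the pre-event price from 100 to 80, which removes a genuine change in value from a stand-alone price variable.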

The problem associated with CRSP adjustment factors potentially affects several studies, yet the narratives of most studies preclude the clear identification of the sources of the adjustment factors. For example, Kothari and Zimmerman (1995) indicate the use of earnings and returns data from Compustat and CRSP. The authors further state that “Earnings and price data are adjusted for stock splits, stock dividends and stock issues.” Presumably, the authors use the adjustment factor associated with the database from which they retrieve share price. However, the source of the share price, as well as the adjustment factor, is indeterminable. Similarly, Lipe (1986) uses data from Compustat and CRSP and indicates the use of “price at the end of March of year t, adjusted for stock splits and stock dividends.” Again, clearly identifying the source for the adjustment factor is not possible. One explanation for omitting the sources of the adjustment factors is that researchers assume that the same values appear in both databases (i.e. CRSP and Compustat). However, the clear identification of the source for the adjustment factors is crucial, as Compustat adjustment factors exclude the effects of property dividend, spin-off and rights offering events. It is worth noting that the primary reason that CRSP adjustment factors include the effects of property dividend, spin-off and rights offering events is to compute and report an accurate value for the stock return variable. The failure to adjust share prices for the effects of property dividend, spin-off and rights offering events will typically generate a large negative stock return, which is misleading.
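To see why a return calculation needs the distribution even though a stand-alone price does not, the sketch below compares a naive price-change return with one that adds back the per-share value distributed on the ex-date. The function names and numbers are hypothetical, and the formula is a textbook simplification rather than CRSP's documented computation.

```python
def naive_return(p_prev: float, p_now: float) -> float:
    """Return computed from the raw price change only."""
    return p_now / p_prev - 1.0


def return_with_distribution(p_prev: float, p_now: float, dist_value: float) -> float:
    """Return that adds back the per-share value distributed to shareholders."""
    return (p_now + dist_value) / p_prev - 1.0


# Hypothetical spin-off: the price falls from 100 to 80 and holders receive 20 of value.
p_prev, p_now, spun_off_value = 100.0, 80.0, 20.0

print(round(naive_return(p_prev, p_now), 4))
print(round(return_with_distribution(p_prev, p_now, spun_off_value), 4))
```

The naive calculation reports a return of roughly -20%, whereas the shareholder's wealth is unchanged once the distributed value is counted.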

Examples of studies clearly identifying CRSP as the source for the adjustment factors include (but are not limited to) Easton and Harris (1991), Easton et al. (1992), Brown and Pfeiffer (2007), Freeman et al. (2011) and Haggard et al. (2015)[2]. The use of CRSP adjustment factors should not automatically diminish the inferences from these studies, as the sample selection and exclusion processes are unique to each study and potentially exclude many of the problematic observations. More importantly, the overall objective of the current study is prospective in nature, where the aim is to inform and assist the research community in planning and avoiding any unintended consequences associated with the use of CRSP adjustment factors.

One obvious solution to the CRSP adjustment factor hazard is for the researcher to use share price and adjustment factor variables from Compustat when possible and practical. This limits any share price adjustments to stock split and stock dividend events. Alternatively, the researcher may use the CRSP events file to identify property dividend, spin-off and rights offering events in the sample and then exclude these observations from the sample or modify the adjustment factor for the impact of these events (i.e. modify using FACPR). A sensitivity test with and without these observations is also appropriate.
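A sensitivity test of this kind could be organized along the following lines. This is only a sketch: the event labels are hypothetical placeholders (actual CRSP distribution codes are not reproduced here), the sample is invented, and a simple mean stands in for whatever statistic the study reports.

```python
from statistics import mean

# Hypothetical event labels; real CRSP distribution codes are NOT reproduced here.
SUBSTANTIVE_EVENTS = {"property_dividend", "spin_off", "rights_offering"}


def split_sample(observations):
    """Separate observations touched by substantive events from the rest."""
    clean, flagged = [], []
    for obs in observations:
        target = flagged if SUBSTANTIVE_EVENTS & set(obs["events"]) else clean
        target.append(obs)
    return clean, flagged


# Invented sample of adjusted prices with the events that affected each firm.
sample = [
    {"firm": "A", "adj_price": 50.0, "events": ["stock_split"]},
    {"firm": "B", "adj_price": 80.0, "events": ["spin_off"]},
    {"firm": "C", "adj_price": 30.0, "events": []},
    {"firm": "D", "adj_price": 40.0, "events": ["stock_dividend", "rights_offering"]},
]

clean, flagged = split_sample(sample)
full_mean = mean(o["adj_price"] for o in sample)
clean_mean = mean(o["adj_price"] for o in clean)

print(f"flagged firms: {[o['firm'] for o in flagged]}")
print(f"mean with all observations: {full_mean:.2f}; excluding flagged: {clean_mean:.2f}")
```

Reporting the statistic both ways shows the reader how much the flagged observations drive the result.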

## Table II.

Descriptive statistics for three-day raw returns around earnings announcement dates (1976-2015) (719,347 events, 20,619 firms)

| | N | Mean | SD | N | Mean | SD | N | Mean | SD |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| **Panel A – all observations (positive, zero and negative)** | | | | | | | | | |
| 1 | 39,693 | 0.012 | 0.098 | 58,816 | 0.006 | 0.065 | 58,770 | 0.005 | 0.056 |
| 2 | 50,256 | 0.010 | 0.104 | 61,801 | 0.006 | 0.083 | 61,768 | 0.006 | 0.066 |
| 3 | 55,078 | 0.008 | 0.108 | 64,122 | 0.006 | 0.089 | 64,096 | 0.005 | 0.069 |
| 4 | 59,253 | 0.005 | 0.105 | 66,816 | 0.004 | 0.089 | 66,793 | 0.003 | 0.068 |
| 5 | 64,039 | 0.003 | 0.100 | 70,325 | 0.003 | 0.087 | 70,313 | 0.002 | 0.064 |
| 6 | 67,680 | 0.002 | 0.101 | 73,175 | 0.002 | 0.091 | 73,162 | 0.002 | 0.064 |
| 7 | 70,357 | 0.001 | 0.100 | 74,954 | 0.001 | 0.088 | 74,948 | 0.001 | 0.063 |
| 8 | 73,047 | 0.001 | 0.116 | 76,690 | 0.001 | 0.089 | 76,684 | 0.001 | 0.062 |
| 9 | 75,348 | 0.001 | 0.100 | 77,920 | 0.001 | 0.091 | 77,913 | 0.001 | 0.062 |
| 10 | 78,290 | 0.001 | 0.089 | 80,055 | 0.002 | 0.082 | 80,049 | 0.002 | 0.056 |
| **Panel B – positive event-period return observations** | | | | | | | | | |
| 1 | 21,425 | 0.062 | 0.100 | 18,993 | 0.052 | 0.084 | 28,645 | 0.036 | 0.058 |
| 2 | 25,968 | 0.069 | 0.101 | 24,482 | 0.060 | 0.095 | 30,408 | 0.043 | 0.066 |
| 3 | 28,041 | 0.069 | 0.108 | 26,602 | 0.063 | 0.099 | 31,435 | 0.045 | 0.069 |
| 4 | 29,794 | 0.068 | 0.099 | 28,276 | 0.063 | 0.094 | 32,674 | 0.044 | 0.065 |
| 5 | 32,087 | 0.066 | 0.088 | 30,785 | 0.062 | 0.083 | 34,405 | 0.043 | 0.058 |
| 6 | 33,935 | 0.066 | 0.089 | 32,544 | 0.062 | 0.087 | 36,031 | 0.042 | 0.057 |
| 7 | 34,953 | 0.065 | 0.086 | 33,837 | 0.061 | 0.080 | 36,673 | 0.041 | 0.055 |
| 8 | 36,242 | 0.065 | 0.118 | 35,234 | 0.061 | 0.078 | 37,765 | 0.041 | 0.052 |
| 9 | 37,678 | 0.063 | 0.087 | 36,590 | 0.059 | 0.083 | 38,614 | 0.040 | 0.054 |
| 10 | 39,834 | 0.057 | 0.071 | 39,058 | 0.054 | 0.066 | 40,758 | 0.037 | 0.046 |
| **Panel C – negative event-period return observations** | | | | | | | | | |
| 1 | 18,268 | −0.046 | 0.054 | 16,574 | −0.040 | 0.047 | 26,278 | −0.029 | 0.035 |
| 2 | 24,288 | −0.053 | 0.059 | 22,837 | −0.048 | 0.053 | 28,650 | −0.034 | 0.040 |
| 3 | 27,037 | −0.056 | 0.061 | 25,754 | −0.051 | 0.055 | 30,273 | −0.037 | 0.041 |
| 4 | 29,459 | −0.060 | 0.063 | 28,351 | −0.054 | 0.057 | 32,020 | −0.038 | 0.042 |
| 5 | 31,952 | −0.061 | 0.064 | 30,825 | −0.055 | 0.059 | 34,187 | −0.038 | 0.042 |
| 6 | 33,745 | −0.062 | 0.066 | 32,780 | −0.057 | 0.061 | 35,667 | −0.039 | 0.043 |
| 7 | 35,404 | −0.062 | 0.067 | 34,289 | −0.057 | 0.062 | 37,042 | −0.038 | 0.042 |
| 8 | 36,805 | −0.063 | 0.069 | 35,691 | −0.058 | 0.064 | 37,909 | −0.039 | 0.043 |
| 9 | 37,670 | −0.061 | 0.069 | 36,809 | −0.056 | 0.064 | 38,549 | −0.037 | 0.043 |
| 10 | 38,456 | −0.057 | 0.065 | 37,523 | −0.053 | 0.062 | 38,757 | −0.035 | 0.041 |
Notes:

The events are earnings announcement dates from Compustat (mnemonic RDQ) during the period from January 1976 through November 2015, and the event window begins with the day prior to the earnings announcement date (−1) and ends with the day following the earnings announcement date (+1); the event-period returns are three-day raw (i.e., unadjusted) returns from the CRSP database; trade-to-trade returns represent the changes in share prices from actual trading and exclude overtly missing returns (i.e. missing values), as well as covertly missing returns (i.e. returns using changes in prices from the bid-ask average prices); lumped returns are essentially trade-to-trade returns with zeros substituted for any missing return (overt or covert); and CRSP returns are the returns as reported by CRSP (i.e. mnemonic RET)
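The three return treatments defined in these notes can be sketched as follows. This is a deliberately simplified reading, not the authors' procedure: it assumes that three-day returns are compounded, that each daily record carries a hypothetical `traded` flag marking whether the price came from an actual trade, and that the trade-to-trade measure drops an event when any day in the window is missing.

```python
from math import prod


def three_day_return(daily):
    """Compound three daily returns (compounding is an assumption of this sketch)."""
    return prod(1.0 + r for r in daily) - 1.0


def event_returns(days):
    """days: list of (ret, traded) tuples for days -1, 0 and +1.

    ret is None when the return is overtly missing; traded is False when the
    price behind the return is a bid-ask average (covertly missing).
    Each value in the result is the event-period return under one treatment,
    or None when that treatment drops the event.
    """
    missing = [ret is None or not traded for ret, traded in days]
    # Trade-to-trade: keep the event only if every day reflects actual trading.
    t2t = None if any(missing) else three_day_return([r for r, _ in days])
    # Lumped: substitute zero for any missing return (overt or covert).
    lumped = three_day_return([0.0 if m else r for (r, _), m in zip(days, missing)])
    # As reported: use the reported returns, dropping the event only if one is overtly missing.
    overt = any(r is None for r, _ in days)
    reported = None if overt else three_day_return([r for r, _ in days])
    return {"trade_to_trade": t2t, "lumped": lumped, "crsp": reported}


# The day 0 return comes from a bid-ask average, so it is covertly missing.
example = event_returns([(0.01, True), (0.02, False), (-0.01, True)])
print(example)
```

In this example the same event is dropped under the trade-to-trade treatment, is close to zero under the lumped treatment and is about 2% as reported, which is the kind of divergence the tables summarize.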

## Table III.

Paired difference t-tests for three-day event-period returns (1976-2015)

| Volume portfolio | N | Mean difference | N | Mean difference | N | Mean difference |
|---|---:|---:|---:|---:|---:|---:|
| **Panel A – all observations** | | | | | | |
| 1 | 39,373 | 0.005*** | 39,343 | 0.006*** | 58,660 | 0.001*** |
| 2 | 49,873 | 0.003*** | 49,847 | 0.004*** | 61,652 | 0.001** |
| 3 | 54,714 | 0.002*** | 54,695 | 0.003*** | 63,969 | 0.002*** |
| 4 | 58,889 | 0.001 | 58,868 | 0.002*** | 66,646 | 0.001 |
| 5 | 63,687 | −0.001 | 63,678 | 0.001 | 70,195 | 0.001* |
| 6 | 67,269 | −0.001 | 67,257 | 0.001 | 73,052 | 0.000 |
| 7 | 69,958 | −0.001 | 69,955 | (0.001) | 74,850 | (0.000) |
| 8 | 72,618 | −0.001 | 72,614 | (0.001) | 76,568 | (0.000) |
| 9 | 74,938 | −0.001 | 74,932 | (0.001) | 77,845 | 0.000 |
| 10 | 77,896 | −0.001 | 77,891 | (0.001) | 79,956 | 0.000 |
| **Panel B – positive difference observations** | | | | | | |
| 1 | 15,358 | 0.042*** | 20,806 | 0.035*** | 28,128 | 0.030*** |
| 2 | 17,455 | 0.040*** | 25,170 | 0.036*** | 29,712 | 0.035*** |
| 3 | 18,250 | 0.037*** | 27,245 | 0.035*** | 30,883 | 0.035*** |
| 4 | 19,029 | 0.035*** | 28,959 | 0.034*** | 32,108 | 0.036*** |
| 5 | 20,151 | 0.032*** | 31,572 | 0.033*** | 34,248 | 0.035*** |
| 6 | 20,989 | 0.031*** | 33,077 | 0.033*** | 35,600 | 0.036*** |
| 7 | 21,356 | 0.029*** | 34,430 | 0.032*** | 36,523 | 0.034*** |
| 8 | 22,015 | 0.029*** | 35,554 | 0.032*** | 37,295 | 0.035*** |
| 9 | 22,669 | 0.026*** | 36,827 | 0.031*** | 38,259 | 0.033*** |
| 10 | 24,384 | 0.023*** | 38,808 | 0.028*** | 39,701 | 0.031*** |
| **Panel C – negative difference observations** | | | | | | |
| 1 | 14,508 | −0.033*** | 18,523 | −0.027*** | 26,717 | −0.030*** |
| 2 | 17,166 | −0.034*** | 24,652 | −0.029*** | 29,242 | −0.034*** |
| 3 | 18,019 | −0.033*** | 27,425 | −0.030*** | 30,709 | −0.034*** |
| 4 | 18,911 | −0.033*** | 29,906 | −0.031*** | 32,462 | −0.035*** |
| 5 | 19,973 | −0.031*** | 32,112 | −0.031*** | 34,244 | −0.034*** |
| 6 | 20,566 | −0.031*** | 34,194 | −0.031*** | 36,010 | −0.034*** |
| 7 | 20,896 | −0.030*** | 35,540 | −0.032*** | 37,103 | −0.034*** |
| 8 | 21,754 | −0.030*** | 37,080 | −0.032*** | 38,271 | −0.034*** |
| 9 | 22,383 | −0.028*** | 38,116 | −0.031*** | 38,838 | −0.033*** |
| 10 | 24,110 | −0.024*** | 39,105 | −0.029*** | 39,746 | −0.031*** |
Notes:

\*\*\*, \*\* and \* indicate statistical significance at the 1%, 5% and 10% levels, respectively.

The events are earnings announcement dates from Compustat (mnemonic RDQ) during the period from January 1976 through November 2015, and the event window begins with the day prior to the earnings announcement date and ends with the day following the earnings announcement date (i.e. −1 and +1); the event-period returns are three-day raw (i.e. unadjusted) returns from the CRSP database; the t-tests assess the mean differences in paired observations from two return measurements, where a firm identifier (PERMCO) and an event date define a pair, and the return is one of three types: trade-to-trade, lumped or CRSP. Thus, the three possible pairings are: trade-to-trade returns vs lumped returns; trade-to-trade returns vs CRSP returns; and lumped returns vs CRSP returns; the standard errors include adjustments for clusters of observations in time and across firms (i.e. repeated measures of time and firms) using the method proposed by Gow et al. (2010); positive differences refer to the following three outcomes: trade-to-trade returns > lumped returns; trade-to-trade returns > CRSP returns; and lumped returns > CRSP returns. In contrast, negative differences refer to the following three outcomes: trade-to-trade returns < lumped returns; trade-to-trade returns < CRSP returns; and lumped returns < CRSP returns. Note that the sum of the number of observations for positive and negative differences reported in Panels B and C is less than the total number of observations reported in Panel A because the three measures are equivalent for several observations (i.e. some observations generate zero differences and are therefore neither positive nor negative).
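The mechanics of a paired difference test, including the split into positive, negative and zero differences, can be sketched as below. The returns are invented, the function name is hypothetical, and the t-statistic uses a plain standard error; it does not implement the two-way clustering adjustment of Gow et al. (2010) that the table relies on.

```python
from math import sqrt
from statistics import mean, stdev


def paired_difference_test(first, second):
    """Mean paired difference and a plain t-statistic (no clustering adjustment)."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = mean(diffs)
    t_stat = mean_diff / (stdev(diffs) / sqrt(n))
    positive = sum(d > 0 for d in diffs)
    negative = sum(d < 0 for d in diffs)
    return {
        "n": n,
        "mean_diff": mean_diff,
        "t": t_stat,
        "positive": positive,
        "negative": negative,
        "zero": n - positive - negative,
    }


# Hypothetical event-period returns for the same firm-event pairs.
trade_to_trade = [0.030, -0.010, 0.020, 0.000, 0.015]
lumped = [0.020, -0.010, 0.000, 0.005, 0.010]

result = paired_difference_test(trade_to_trade, lumped)
print(result)
```

The zero-difference count is what makes the Panel B and Panel C observation counts sum to less than the Panel A count.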

## Notes

1. References to stock splits are for transactions that reduce share prices. Although potentially relevant, the discussion excludes events which increase share prices, such as reverse stock splits. References to adjustment factors are for share prices instead of shares outstanding.

2. This list is not exhaustive and only serves as an example.

3. Although shares which do not trade for a given period generate no true return for the period, the bid and ask prices may change despite the absence of trading.

4. Hereafter, all references to missing returns include both overt and covert missing returns.

5. The focal point of the current study is upon raw returns rather than abnormal returns. It is worth noting that event studies frequently use abnormal returns, and abnormal trade-to-trade returns require the use of market index returns (see Campbell et al., 2010).

6. The five journals and their respective number of event studies are The Accounting Review (16); Contemporary Accounting Research (11); Journal of Accounting and Economics (9); Journal of Accounting Research (7); and Review of Accounting Studies (3).

## References

Bartholdy, J., Olson, D. and Peare, P. (2007), “Conducting event studies on a small stock exchange”, European Journal of Finance, Vol. 13 No. 3, pp. 227-252.

Bennin, R. (1980), “Error rates in CRSP and Compustat: a second look”, Journal of Finance, Vol. 35 No. 5, pp. 1267-1271.

Brown, R. and Pfeiffer, R. (2007), “Causes and consequences of the relation between split-adjusted share prices and subsequent stock returns”, Journal of Business Finance & Accounting, Vol. 34 Nos 1/2, pp. 292-312.

Campbell, C., Cowan, A. and Salotti, V. (2010), “Multi-country event-study methods”, Journal of Banking and Finance, Vol. 34 No. 12, pp. 3078-3090.

Canina, L., Michaely, R., Thaler, R. and Womack, K. (1998), “Caveat compounder: a warning about using the daily CRSP equal-weighted index to compute long-run excess returns”, Journal of Finance, Vol. 53 No. 1, pp. 403-416.

Courtenay, S. and Keller, S. (1994), “Errors in databases revisited: an examination of the CRSP shares outstanding data”, The Accounting Review, Vol. 69 No. 1, pp. 285-291.

Easton, P. and Harris, T. (1991), “Earnings as an explanatory variable for returns”, Journal of Accounting Research, Vol. 29 No. 1, pp. 19-36.

Easton, P., Harris, T. and Ohlson, J. (1992), “Aggregate accounting earnings can explain most of security returns: the case of long run return intervals”, Journal of Accounting and Economics, Vol. 15 Nos 2/3, pp. 119-142.

Elton, E., Gruber, M. and Blake, C. (2001), “A first look at the accuracy of the CRSP mutual fund database and a comparison of the CRSP and Morningstar mutual fund databases”, Journal of Finance, Vol. 56 No. 6, pp. 2415-2430.

Fisher, L., Weaver, D. and Webb, G. (2010), “Removing biases in computed returns”, Review of Quantitative Finance and Accounting, Vol. 35 No. 2, pp. 137-161.

Freeman, R., Koch, A. and Li, H. (2011), “Can historical returns-earnings relations predict price responses to earnings news?”, Review of Quantitative Finance and Accounting, Vol. 37 No. 1, pp. 35-62.

Gow, I., Ormazabal, G. and Taylor, D. (2010), “Correcting for cross-sectional and time-series dependence in accounting research”, The Accounting Review, Vol. 85 No. 2, pp. 483-512.

Guenther, D. and Rosman, A. (1994), “Differences between Compustat and CRSP SIC codes and related effects on research”, Journal of Accounting and Economics, Vol. 18 No. 1, pp. 115-128.

Haggard, K., Walkup, B. and Xi, Y. (2015), “Short-term performance of US-bound Chinese IPOs”, Financial Review, Vol. 50 No. 1, pp. 121-141.

Kothari, S. and Zimmerman, J. (1995), “Price and return models”, Journal of Accounting and Economics, Vol. 20 No. 2, pp. 155-192.

Lesmond, D., Ogden, J. and Trzcinka, C. (1999), “A new estimate of transaction costs”, Review of Financial Studies, Vol. 12 No. 5, pp. 1113-1141.

Lipe, R. (1986), “The information contained in the components of earnings”, Journal of Accounting Research, Vol. 24, pp. 37-64.

Maynes, E. and Rumsey, J. (1993), “Conducting event studies with thinly traded stocks”, Journal of Banking and Finance, Vol. 17 No. 1, pp. 145-157.

Rosenberg, B. and Houglet, M. (1974), “Error rates in CRSP and Compustat databases and their implications”, Journal of Finance, Vol. 29 No. 4, pp. 1303-1310.

Shumway, T. (1997), “The delisting bias in CRSP data”, Journal of Finance, Vol. 52 No. 1, pp. 327-340.

Shumway, T. and Warther, V. (1999), “The delisting bias in CRSP’s NASDAQ data and its implications for the size effect”, Journal of Finance, Vol. 54 No. 6, pp. 2361-2379.

## Acknowledgements

The authors are grateful for the helpful comments of Jim Angel, Tom Dyckman, Oscar Varela and Faith Xie, the participants at the 2015 Midwest American Accounting Association Conference, and two anonymous reviewers. All errors and omissions are solely the responsibility of the authors.

## Corresponding author

Rick N. Francis can be contacted at: rnfrancis@utep.edu