Segregation and the onset of COVID-19 in American cities

Purpose – This paper investigates one of the potential costs of rising segregation in American cities by evaluating empirically the extent to which ethnic-based segregation contributes to the onset and the speed of propagation of the COVID-19 pandemic. Design/methodology/approach – Regression analysis based on matched data on the early incidence of COVID-19 cases, segregation and covariates. Identification relies on variation in segregation across MSAs and on heterogeneity in the geography and timing of stay-at-home orders. Findings – A one cross-MSA standard deviation increase in segregation leads to a significant and robust rise in COVID-19 cases of 8.7 per 100,000 residents across urban counties. Originality/value – Combines spatial data on COVID-19 cases and segregation; uses a new segregation measure; focuses on the early incidence of the pandemic and its drivers.


Introduction
There has been substantial research on the biological causes and consequences of the coronavirus pandemic, the process of its transmission and the underlying mechanisms, as well as on the economic and social consequences of its spread. In the social sciences, the COVID-19 epidemic resulting from the global spread of the virus has been treated as a health shock, and its implications at the individual and societal levels have been investigated (for a review, see Rathnayaka et al., 2023).
Understanding the role of mediating factors that may have contributed to the onset of COVID-19 is equally important from the perspective of public health and of the economy as a whole. This paper contributes evidence identifying ethnic-based segregation in American cities as a driver of the onset and early spread of COVID-19 infection. Wu and McGoogan (2020) highlight the relevance of demographics and income as potential drivers. Housing characteristics, population density and the extent of urbanization have also been found to be relevant for explaining the likelihood of exposure to the disease. This happens because poorer households are typically larger and multi-generational, live in smaller dwellings and are more likely to use public transport; they are thereby exposed to more occasions for human interaction and face lower chances of self-isolation. Borjas (2020) uses zip-code data on testing coverage and COVID-19 onset in New York City to highlight a correlation between the ethnic composition of neighborhoods and the rate of infection, which is also related to non-randomness in testing.
We conjecture that if social interactions are more frequent within communities sharing similar traits, then societies that are more segregated along those traits also display larger chances of interaction, with interactions becoming increasingly concentrated within the same community. This mechanism has been explored in Jackson et al. (2023) and Kim et al. (2023), suggesting the possibility of a segregation gradient in the spread of COVID-19.
In this paper, we investigate this hypothesis empirically. Our focus is, in particular, on segregation in the space of social interactions. The segregation measure that we consider aims at capturing heterogeneity in the patterns of social interaction with the rest of the population displayed by people of various ethnic origins living in different parts of the city. Segregation arises because the chances of interacting with people of the same origin or of a different origin are unevenly distributed across the population. We adopt an index of segregation that embodies these features to measure segregation in American cities, for which fine-grained information about the ethnic distribution of urban residents across neighborhoods is available from censuses and surveys. More segregated cities are cities where people tend to interact more frequently with other individuals sharing a particular ethnic background.
The analysis that follows uncovers evidence supporting our starting hypothesis: segregation, which is pre-determined with respect to the pandemic, is positively correlated with the onset and the speed of increase of COVID-19 cases at the county level, as reported by official sources at the very onset of the pandemic, around March 2020.

Relevant literature
This paper adds to a growing body of research that emphasizes the role of social connections as a driver of the COVID-19 pandemic. Kuchler et al. (2022) provide empirical support for this conjecture: using data from social media as a proxy for social connections between regions, they find that social ties correlate with the spread of the pandemic. The extent to which exposure to social connections can be avoided or contained during the onset of the pandemic crucially depends on the means available, on the policy context and on the strength of the connections.
Concerning the means of avoidance, Brown and Ravallion (2020) find that social distancing remains a relevant driver for reducing the spread of infection, although the effect is weaker in poorer and more unequal US counties, where social distancing is costlier. Likewise, Jung et al. (2021) show that in low-density counties the incidence of COVID-19 infections rises with the poverty level. Conversely, in densely populated areas with limited opportunities for self-isolation, a U-shaped relationship between county poverty level and infection spread predominates. Moreover, a positive association between mobility and the spread of COVID-19 has been thoroughly documented (Ghirelli et al., 2023), and the relation is accentuated in areas with higher poverty rates and per capita income (Yilmazkuday, 2023).
Policy actions, in the form of stay-at-home orders, have been put in place since March 2020 to restrain mobility. Evidence from the 1918 influenza pandemic suggests that strict and timely quarantine measures are among the most effective short-term measures for preventing the spread of epidemics (Bootsma and Ferguson, 2007). Differences in the timing of implementation of social distancing in American cities around 1918 have been used to

Journal of Economic Studies
analyze such effects. Only strict and timely quarantine measures have a significant impact on epidemic incidence and mortality (Bootsma and Ferguson, 2007), while less stringent measures may also strongly reduce the virus attack rate, albeit mainly by delaying the course of the pandemic (Markel et al., 2007). The impact of mobility restrictions appears to hinge on the socioeconomic context of the places where these measures are implemented (Berry et al., 2021; Chernozhukov et al., 2021). Specifically, Coven and Gupta (2020) show that residents of low-income neighborhoods in New York City face greater constraints in complying with stay-at-home orders, while Chiou and Tucker (2020) report evidence of higher compliance with social distancing directives among residents of higher-income regions. See Brodeur et al. (2021) for a survey of the determinants of compliance with, and the effectiveness of, social distancing.
Less evidence is available about the impact of the strength of social connections. Social bonds can be measured along many dimensions. Recent contributions have focused on the ethnic dimension and on the way social bonds differ across ethnic groups and locations. The degree of ethnic diversity is a relevant driver of compliance with, and hence of the effectiveness of, social distancing orders. In this regard, Zhai et al. (2023) find evidence that people in highly ethnically diverse neighborhoods show less compliance with mobility restrictions and are less likely to reduce social interactions. More generally, residential segregation has been a significant factor in rising mortality and infection rates within US counties, with Black and Latino residents exposed to higher mortality rates than White residents (Torrats-Espinosa, 2021; Trounstine and Goldman-Mellor, 2023). Moreover, the interplay of income inequality and segregation exacerbates the toll, amplifying adverse health outcomes and resulting in a greater loss of life (Yu et al., 2021).
Unlike the aforementioned contributions, this paper examines the relation between COVID-19 onset and city-level segregation in the domain of social interactions.

Measuring exposure segregation
Exposure segregation is conceived as inequality in the distribution of interaction profiles with the relevant social groups. In this application, the focus is on groups defined on the basis of their ethnic origin. An interaction profile collects the probabilities that a given individual $i$ interacts with each of the $G$ ethnic groups that we consider. Denote such probabilities by $\pi_{gi} \in [0,1]$, with $\sum_{g} \pi_{gi} = 1$ and $g \in \{1, \dots, G\}$. An interaction profile is represented by the column vector $\boldsymbol{\pi}_i = (\pi_{1i}, \dots, \pi_{Gi})'$.
There are multiple dimensions that may affect interaction profiles. It is common in applied analysis to assume that all individuals observed in the same location have the same interaction probabilities, and that spatial proximity can be used as a proxy for social proximity. If the units $i$ are spatial units, such as neighborhoods or ZIP codes, one can compute interaction profiles for each unit by looking at the ethnic composition of the unit and of neighboring units.
Our goal is to assess exposure segregation in a specific city. In our application, $i$ stands for a census tract of the city, and $\pi_{gi}$ is the probability that individuals from census tract $i$ interact with individuals from group $g$, in that tract or in neighboring tracts. We adopt a spatial proximity interaction model as in Andreoli (2014) to estimate the interaction probabilities $\pi_{gi}$ from American Community Survey data. Within a city, tracts differ in demographic size. The weight $w_i \in [0,1]$ measures the population of tract $i$ as a share of the city population. Each city displays a specific social composition, represented by the group-specific probability $\pi_g = \sum_i w_i \pi_{gi}$. It measures the probability that a random person in the city interacts with an individual from group $g$, given the distribution of groups across the city's tracts. Cities differ in population size, density and group composition. To account for these sources of heterogeneity within and across cities, we weight interaction probabilities by the weight of each unit (the probability that a person drawn from the city population resides in unit $i$) and divide the result by the city-wide interaction probability. This yields the interaction likelihood $\ell_{gi} := w_i \pi_{gi} / \pi_g$. By Bayes' rule, $\ell_{gi} \in [0,1]$ and $\sum_i \ell_{gi} = 1$. Interaction likelihoods can be understood as the chances that members of group $g$ have of interacting with a random inhabitant of unit $i$. For a given unit, we represent these likelihoods by the column vector $\boldsymbol{\ell}_i = (\ell_{1i}, \dots, \ell_{Gi})'$. Segregation accrues from the extent to which these likelihoods differ across groups.
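To make the construction concrete, the sketch below computes tract-level interaction profiles and the interaction likelihoods $\ell_{gi} = w_i \pi_{gi} / \pi_g$ with NumPy. The profile step is a deliberately simple stand-in and not the spatial proximity model of Andreoli (2014): it assumes, hypothetically, that residents of a tract interact uniformly with everyone living in that tract and in its contiguous neighbors, encoded in a 0/1 adjacency matrix. The function names and the toy counts are illustrative only.

```python
import numpy as np


def interaction_profiles(counts, adjacency):
    """Illustrative stand-in for a spatial-proximity interaction model.

    counts:    (G, N) residents of group g in tract i.
    adjacency: (N, N) symmetric 0/1 matrix with zero diagonal; 1 marks neighbouring tracts.
    Returns a (G, N) array whose column i is the group composition of tract i
    pooled with its neighbours (each column sums to 1).
    """
    reach = adjacency + np.eye(adjacency.shape[0])   # each tract also reaches itself
    pooled = counts @ reach                          # pooled[g, i] = sum_j counts[g, j] * reach[j, i]
    return pooled / pooled.sum(axis=0, keepdims=True)


def interaction_likelihoods(pi, w):
    """ell[g, i] = w_i * pi[g, i] / pi_g, with pi_g = sum_i w_i * pi[g, i]."""
    pi_g = pi @ w                                    # city-wide probability for each group
    return pi * w / pi_g[:, None]


# Toy city: three tracts on a line, two groups.
counts = np.array([[80, 50, 10],
                   [20, 50, 90]], dtype=float)
adjacency = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]], dtype=float)
w = counts.sum(axis=0) / counts.sum()                # tract population shares
pi = interaction_profiles(counts, adjacency)
ell = interaction_likelihoods(pi, w)
print(np.round(ell, 3))
```

Each row of `ell` sums to one across tracts, which is the normalization stated above; replacing `interaction_profiles` with a distance-decay kernel would leave the likelihood step unchanged.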
A segregation index is a function that maps the information provided by interaction likelihoods into a number, which we regard as the level of exposure segregation displayed by the city. Our reference measure is the multi-group Gini Exposure Segregation index $GS: \mathcal{M}_{G \times N} \to [0,1]$. The index was introduced by Andreoli (2014) and is inspired by the multidimensional extension of the Gini index by Koshevoy and Mosler (1997) (see also Koshevoy, 1995; Koshevoy and Mosler, 1996). For a given city, the index measures inequality in the distribution of the normalized profiles $\boldsymbol{\ell}_i$ across the population of units. Formally:

$$GS = \sum_{\{i_1, \dots, i_G\}} \left| \det\left(\boldsymbol{\ell}_{i_1}, \dots, \boldsymbol{\ell}_{i_G}\right) \right| = \binom{N}{G}\, \mathbb{E}\left[ \left| \det\left(\boldsymbol{\ell}_{i_1}, \dots, \boldsymbol{\ell}_{i_G}\right) \right| \right], \qquad (1)$$

where $\{i_1, \dots, i_G\}$ is any subset of $G$ units drawn at random without replacement among the $N$ units available in the city, and the sum runs over all such subsets (we limit our analysis to cities where $N > G$, i.e., the number of tracts is larger than the number of social groups).
The index displays some important features. When all units display the same interaction profile, $\boldsymbol{\pi}_i = \boldsymbol{\pi}$, then $\ell_{gi} = w_i$ for every group $g$, so all vectors $\boldsymbol{\ell}_i$ are proportional to one another, implying $GS = 0$. This is arguably the case of complete absence of segregation: the residents of any unit $i$ have the same chances as those of any other unit of interacting with each of the $G$ groups. Conversely, if for every $i$ there exists a group $\gamma$ such that $\pi_{\gamma i} = 1$ and $\pi_{gi} = 0$ for all $g \neq \gamma$, then $GS = 1$. Arguably, this is the case of maximal segregation, in which the residents of unit $i$ interact with one and only one group, $\gamma$. Lastly, the index has been shown (see Andreoli, 2014 and Andreoli and Zoli, 2015) to be the unique index satisfying some desirable decomposition properties and behaving consistently with basic transformations of the data that are unambiguously understood as segregation-reducing.
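The two polar cases can be checked numerically. The sketch below evaluates the index as a sum of absolute determinants over all $G$-subsets of units, i.e. the zonotope-volume form associated with the Koshevoy-Mosler multivariate Gini; this is our reading of the index, not a transcription of Andreoli's (2014) code. Enumerating all $\binom{N}{G}$ subsets is only viable for toy examples, and the helper names and inputs are hypothetical.

```python
from itertools import combinations

import numpy as np


def interaction_likelihoods(pi, w):
    """ell[g, i] = w_i * pi[g, i] / pi_g, where pi_g = sum_i w_i * pi[g, i]."""
    pi_g = pi @ w
    return pi * w / pi_g[:, None]


def gini_exposure(ell):
    """Brute-force sum of |det| over every subset of G columns (units) of ell.

    Cost grows with C(N, G): fine for toy examples, not for real cities.
    """
    G, N = ell.shape
    if N <= G:
        raise ValueError("the index requires more units than groups (N > G)")
    return sum(
        abs(np.linalg.det(ell[:, list(cols)]))
        for cols in combinations(range(N), G)
    )


# No segregation: every tract has the same profile, so GS should be 0.
w = np.array([0.2, 0.3, 0.5])
pi_even = np.tile([[0.6], [0.4]], (1, 3))
gs_even = gini_exposure(interaction_likelihoods(pi_even, w))

# Maximal segregation: tracts 0-1 meet only group 0, tracts 2-3 only group 1.
w4 = np.array([0.1, 0.3, 0.4, 0.2])
pi_full = np.array([[1.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 1.0]])
gs_full = gini_exposure(interaction_likelihoods(pi_full, w4))

print(round(gs_even, 6), round(gs_full, 6))
```

Because every row of the likelihood matrix sums to one, the generated zonotope lies inside the unit cube, which is why the index cannot exceed 1.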
In the application, we compute the GS index for each American city considered in this study, and we interpret variations of the index as a measure of segregation in the city, with larger values of GS corresponding to more segregated cities.

Estimation strategy
In the absence of individual-level information about the incidence of COVID-19 infections, we rely on average levels of COVID-19 cases and observable covariates aggregated at the county level, the finest spatial partition for which all data are available. Using counties as units has two advantages. First, counties aggregate over much of the heterogeneity in healthcare availability that exists across finer-scale partitions of the territory, thereby reducing the risk that variability in reported COVID-19 cases is driven by access to care. Second, county estimates provide enough within-MSA variation in COVID-19 cases.
Our analysis considers two periods. The earlier period spans the onset of COVID-19 outbreaks up to March 29, 2020, ten days (roughly corresponding to the average incubation period of the virus) after the first stay-at-home order was implemented in California. COVID-19 incidence data at this date are therefore likely unaffected by the introduction of early lock-down policies, and provide insights into the initial patterns of the pandemic's evolution in response to local characteristics. The second period ends on April 13, 2020, approximately ten days after lock-down restrictions were enforced across all US states. We examine the number of new cases as of April 13 and their speed of variation over one- and two-week intervals to assess the persistence of segregation effects on COVID-19 incidence after the implementation of lock-down policies.
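The timing conventions can be made explicit with Python's standard `datetime` module. In the sketch below, the California order date (March 19, 2020) is inferred from the ten-day gap to March 29 stated above; reading "had experienced a stay-at-home order" as "an order was already in force on or before the lagged date" is our assumption, and all helper names and the example county are hypothetical.

```python
from datetime import date, timedelta

CA_ORDER = date(2020, 3, 19)                    # first stay-at-home order (California)
FIRST_REF = CA_ORDER + timedelta(days=10)       # end of the first period: 2020-03-29
SECOND_REF = date(2020, 4, 13)                  # end of the second period


def order_in_force(order_date, t, lag_days):
    """1 if the county's order (None = never issued) was in force lag_days before t."""
    if order_date is None:
        return 0
    return int(order_date <= t - timedelta(days=lag_days))


def stay_at_home_dummies(order_date, t):
    """(D_c(t-7), D_c(t-15)); D_c(t-15) is switched off when t is March 29."""
    d7 = order_in_force(order_date, t, 7)
    d15 = 0 if t == FIRST_REF else order_in_force(order_date, t, 15)
    return d7, d15


def change_per_100k(cum_cases, population, t, window_days):
    """Change in cumulative cases over the window ending at t, per 100,000 residents."""
    new_cases = cum_cases[t] - cum_cases[t - timedelta(days=window_days)]
    return 1e5 * new_cases / population


# Hypothetical county: order issued on March 22, 2020, 50,000 residents.
cum = {date(2020, 3, 29): 40, date(2020, 4, 6): 90,
       date(2020, 4, 12): 130, date(2020, 4, 13): 140}
print(stay_at_home_dummies(date(2020, 3, 22), FIRST_REF))
print(stay_at_home_dummies(date(2020, 3, 22), SECOND_REF))
print(change_per_100k(cum, 50_000, SECOND_REF, 7))
```

Calling `change_per_100k` with windows of 1, 7 and 15 days gives the daily, weekly and 15-day variations used as outcomes.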
We employ regression analysis to study the partial correlation between exposure segregation and the evolution of COVID-19 across American counties. While COVID-19 measures change over time, the main treatment (exposure segregation) and the relevant set of covariates are pre-determined and static with respect to the evolution of COVID-19 cases. For this reason, our estimates rely only on cross-sectional variation in segregation and COVID-19 across MSAs and counties, while we exploit the longitudinal features of the data to assess and update our estimates at different points in time during the early onset of the pandemic.
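The estimating equation is:

$$Y_{cms}(t) = \alpha + \beta_1\, GS_m + \beta_2\, GS_m \cdot D_c(t-7) + \beta_3\, GS_m \cdot D_c(t-15) + \gamma_1 D_c(t-7) + \gamma_2 D_c(t-15) + X_{cm}'\boldsymbol{\delta} + \mu_s + \varepsilon_{cms}, \qquad (2)$$

where $Y_{cms}(t)$ measures COVID-19 cases in county $c$ of MSA $m$ in State $s$ at date $t$, $GS_m$ is the Gini Exposure Segregation index of MSA $m$ normalized by its sample standard deviation, $D_c(t-7)$ and $D_c(t-15)$ indicate whether county $c$ had experienced a stay-at-home order one and two weeks before $t$, $X_{cm}$ collects county covariates, $\mu_s$ are State fixed effects and $\varepsilon_{cms}$ is the error term. Equation (2) is estimated by OLS when the dependent variable is continuous, such as cases normalized by the resident population. We use Poisson regression when the dependent variable is a count. We further use logistic regressions to analyze the impact of changes in segregation on the probability of COVID-19 outbreaks across counties.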
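For the continuous-outcome case, the sketch below estimates a specification of this form by OLS on simulated data, with standard errors clustered by MSA as in the table notes. It uses plain NumPy and a CR0 cluster-robust variance without small-sample corrections; the data-generating process, the coefficient values and the single covariate are invented for illustration and carry no information about the paper's estimates. Because $GS_m$ is divided by its sample standard deviation, $\beta_1$ reads as the effect of a one-SD increase.

```python
import numpy as np


def ols_cluster(y, X, clusters):
    """OLS coefficients and CR0 cluster-robust standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for c in np.unique(clusters):
        in_c = clusters == c
        score = X[in_c].T @ resid[in_c]              # summed score of cluster c
        meat += np.outer(score, score)
    vcov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(vcov))


rng = np.random.default_rng(0)
n_msa, per_msa, n_states = 200, 3, 20
msa = np.repeat(np.arange(n_msa), per_msa)           # county -> MSA
state = (np.arange(n_msa) % n_states)[msa]           # county -> State (through its MSA)
n = msa.size

gs_raw = rng.uniform(0.1, 0.5, n_msa)[msa]           # simulated MSA-level GS, copied to counties
gs = gs_raw / gs_raw.std()                           # scaled by its sample SD
d15 = (rng.random(n) < 0.3).astype(float)            # order in force at t-15
d7 = np.maximum(d15, (rng.random(n) < 0.4).astype(float))  # in force at t-7 (implied by d15)
x = rng.normal(size=n)                               # one generic county covariate
y = (2 + 8 * gs + 3 * gs * d7 + 2 * gs * d15 - 4 * d7 - 6 * d15 + 1.5 * x
     + rng.normal(0, 2, n_states)[state]             # State effects
     + rng.normal(0, 1, n_msa)[msa]                  # MSA-level shocks (motivate clustering)
     + rng.normal(0, 2, n))                          # county noise

state_dummies = np.eye(n_states)[state][:, 1:]       # State fixed effects, first State as reference
X = np.column_stack([np.ones(n), gs, gs * d7, gs * d15, d7, d15, x, state_dummies])
beta, se = ols_cluster(y, X, msa)
print(f"beta_1 = {beta[1]:.2f} (cluster SE {se[1]:.2f})")
```

Clustering at the MSA level matters here because $GS_m$ varies only across MSAs, so counties within the same MSA share both the regressor and any MSA-level shock.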
Regardless of how the models are estimated, the effects of interest relate to coefficients $\beta_1$, $\beta_2$ and $\beta_3$. More specifically, $\beta_1$ captures the effect of increasing segregation (by one standard deviation of the GS index) on new and cumulative COVID-19 cases. Coefficients $\beta_2$ and $\beta_3$ capture the additional contribution of variation in segregation originating within MSAs that adopted lock-down measures earlier during the pandemic.
Identification hinges on a strong exogeneity assumption, positing that variation in segregation within a State is unrelated to unobservable factors that influence the pandemic and are potentially associated with other variables, such as mobility restrictions imposed in a specific county. This assumption could be violated if factors related, for instance, to the quality of healthcare services in a State (which we do not observe, and which are related to the effectiveness and extent of COVID-19 monitoring) also play a significant role in driving early lock-down policies, often enacted to prevent overcrowding of intensive care units (ICUs). We address such issues by controlling for the adoption of stay-at-home orders on different days by county and by running the analysis separately by period (the issue being more relevant for the later dates at which COVID-19 is measured).

Data
Data on ethnic composition and covariates are from the American Community Survey (ACS) 5-year modules, covering urban counties in 366 Metropolitan Statistical Areas (MSAs). Estimates of segregation are based on the 2014-2018 modules. We consider these estimates as pre-determined with respect to the onset of the COVID-19 epidemic; as such, they are not affected by the restrictions put in place during and after the COVID-19 pandemic.
In 2018, the White population accounted on average for 60% of urban residents, and more than 25% of American MSAs displayed a clear non-White majority. The Hispanic population accounts for almost 20% of the urban population, the Black population for about 10% and the Asian population for about 7%. Table 1 describes the Gini Exposure Segregation index in 2018 for the median city and for the cities at the bottom and top quartiles of the distribution of the index in that year.
Table 2 reports descriptive statistics for our sample of 1,087 counties. As additional controls, we consider: (A) demographics (population size, density, indicators for the 10 and 5 largest cities, percentages of Black, Hispanic, Asian and White residents, and the share of individuals aged 65 or more); (B) housing market characteristics (the share of owner-occupied houses, the share of old houses, the median value and rent of houses and their variability within census tracts); (C) human capital (indicators for college towns and for the presence of students, average education, and the mean, median and variance of income); (D) access to health insurance; (E) ethnic segregation and poverty at county level, as measured by dissimilarity indices for Black, Asian, Hispanic and Native origin groups. In panel (F), we consider further multi-group spatial segregation measures that are widely adopted in the literature (see Reardon and O'Sullivan, 2004). Figure A1 in the Appendix shows that all these indices are rank-correlated with the Gini Exposure index, although the correlation is weak and mostly driven by a few large cities. The figure highlights that the Gini Exposure index captures features of segregation in American MSAs that are, to a large extent, independent of what is measured by the alternative indices.
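The county-level dissimilarity indices in panel (E) are commonly computed with the classic Duncan-Duncan formula, half the sum over units of the absolute gap between a group's share and the share of everyone else. A minimal sketch, assuming that each group is compared with all other residents (the comparison group is not specified in the text) and that the units are census tracts; the function name and toy numbers are hypothetical.

```python
import numpy as np


def dissimilarity(group, total):
    """Duncan-Duncan index of one group versus all other residents across units.

    group: residents of the group in each unit; total: all residents in each unit.
    Returns 0 (even spread) up to 1 (complete separation).
    """
    group = np.asarray(group, dtype=float)
    others = np.asarray(total, dtype=float) - group
    return 0.5 * np.abs(group / group.sum() - others / others.sum()).sum()


# Toy county with three tracts of 100 residents each.
print(round(dissimilarity([30, 10, 0], [100, 100, 100]), 3))
```

Unlike the Gini Exposure index, this measure looks only at where two populations live, not at whom residents are likely to interact with, which is consistent with the weak rank correlation reported above.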
The data on confirmed COVID-19 cases are from the Centers for Disease Control and Prevention and from state and local public agencies. These data have been recorded at county level in the New York Times dedicated data repository [1]. Data for geographic exceptions have been collected from the USAFacts online repository [2]. Additional data on stay-at-home and shelter-in-place legislation have been collected from the New York Times interface [3].
As of March 29, 5% of the 3,220 American counties for which data are available recorded at least 100 confirmed COVID-19 cases, for a total of 136,820 cumulative positive cases. The average incidence of COVID-19 is 13.3 cases (s.d. 38.2) per 100,000 residents across counties, and 495 counties display an incidence above 20 cases per 100,000 residents. Incidence of positive COVID-19 cases is higher in urban counties, of which there are 1,087 in our matched MSA-county database (each MSA may comprise more than one county). Among all urban counties, the average incidence is 21.7 cases per 100,000 residents, and in more than 25% of counties incidence is above 20 cases per 100,000 residents.

Results
Table 3 reports estimates of different models for the incidence of COVID-19 in American counties. Models (1)-(4) analyze the correlation between the total number of positive COVID-19 cases and the MSA-level Gini Exposure index, normalized by its standard deviation (SD) in the estimation sample. Model (1) controls only for the level of Gini Exposure Segregation; all other models control for relevant county-level covariates. According to Model (3), our preferred specification, a one-SD increment in the Gini Exposure index corresponds to a significant increase of 8.77 cases per 100,000 residents. The effect is robust to a large number of controls and to State fixed effects, which capture differences in testing policies across State jurisdictions. Additionally, the effect is robust to controls for whether stay-at-home orders were in force on March 29 or earlier (Model (2)), and it closely matches the unconditional estimate reported in Model (1).
The effects of the Gini Exposure index remain significant when regressions are augmented with controls for alternative measures of segregation widely adopted in the literature. This result suggests that variation in the Gini Exposure index is informative about features of the distribution of ethnic groups across American cities that are relevant for explaining the onset and early evolution of COVID-19 and that cannot be captured by alternative measures of segregation.
In Model (4), we use a zero-inflated count data model to assess the impact of the Gini Exposure index on county-level COVID-19 incidence. The estimated effect is significant: a one-SD increment in the Gini Exposure index is associated with an increase of 19.7 cases in COVID-19 incidence.
Models (5)-(7) show that a one-SD increase in the Gini Exposure index is positively associated with the variation in COVID-19 cases, the effect ranging from 1 to about 6 cases per 100,000 residents depending on the period over which the variation is measured (one day, one week or 15 days). Finally, Model (8) provides evidence that segregation is associated with a lower probability (about −2%) of COVID-19 onset in the county.
One explanation for the effects displayed in Table 3 is that rising heterogeneity in interaction profiles (hence more segregation) may affect the quantity and quality of social interactions. In more segregated MSAs, residents are more likely to interact with similar individuals (here, from the same ethnic group), which increases overall social interactions. Moreover, limited exposure to other groups reduces access to information about the spread of the epidemic and about habits that mitigate risk. Both mechanisms raise the probability of infection and contribute to the spread of the virus.
Table 4 investigates whether exposure segregation has implications for the effectiveness of lock-down policies. We use indicators for whether such policies were in place on April 13 (t), on April 6 (t-7) and on March 28 (t-15) in each county as additional controls, and interact these indicators with the Gini Exposure index to evaluate the partial association with COVID-19 incidence across American urban counties. We use the same set of controls as in Model (3) of Table 3 and additionally include a fixed effect for the New York MSA.
As expected, counties that implemented restrictions earlier display lower COVID-19 incidence, the effect being significant across specifications only for stay-at-home orders issued 15 or more days before April 13. The extent to which such policies reduce COVID-19 incidence is mitigated by the degree of exposure segregation in the city: a rising Gini Exposure index offsets (by about one-fourth in magnitude) the effect of early lock-down restrictions on the spread of the epidemic. This suggests that exposure segregation may weaken, or delay, the effects of interventions against the spread of COVID-19.
Our results hold even after controlling for alternative dimensions of ethnic segregation that are not captured by the Gini Exposure index. In Tables A1 and A2 in the Appendix, we report estimates from the same regressions as in Tables 3 and 4, additionally controlling for the multi-group segregation indices described in Figure A1. The estimated coefficients on the Gini Exposure index are unaffected in sign, magnitude and significance after controlling for multi-group segregation, supporting our hypothesis that rising within-group exposure, and not aspects of segregation related to group size, diversity or dissimilarity, explains the onset of the COVID-19 pandemic across American MSAs.

Concluding remarks
Using ACS data, we document the effect of ethnic-based segregation in American cities on the onset of the COVID-19 pandemic. While factors such as human capital, location decisions, labor market attachment or demographics may not directly affect the probability of virus transmission, the spatial distribution of residents and the quantity and quality of their interactions may substantially influence the uneven spread of the virus across places. It is well known that ethnic groups differ substantially in the quantity and quality of their interactions. The Gini Exposure index quantifies such differences and measures the degree of segregation of groups along the interaction dimension.
We find that a one-standard-deviation increase in the Gini Exposure index is associated with an increment of about 8.77 cases per 100,000 urban county residents as of March 29, 2020. The effect, which survives controlling for relevant mechanisms related to demographics, human capital, working status, housing market conditions, social and ethnic composition, State fixed effects and the extent of stay-at-home orders in force, is likely a lower bound of the true effect. In fact, many infections are never tested but still contribute to the spread, and such untested infections are more likely in areas where more positive cases are recorded. Additionally, a separate set of regressions reveals that higher Gini Exposure limits the effectiveness of quarantine measures in reducing COVID-19 incidence.
This paper answers the call of Avery et al. (2020), among others, who highlight an "urgent need" to improve our knowledge of the factors that contributed to the evolution of the pandemic. The Gini Exposure index is shown to play a role in this respect. A back-of-the-envelope calculation illustrates the health and economic costs of exposure segregation. A one-SD reduction in exposure segregation reduces the demand for intensive care by 0.174 patients per 100,000 residents (8.77 times 2%, the official rate of COVID-19 cases requiring hospitalization according to the WHO), thus reducing the turnaround of ICUs by 6.96% (the average number of ICU beds in American hospitals is 24 per 100,000 residents according to Weiner-Lastinger et al., 2022). Beyond the obvious life-saving impact of reducing segregation, the effect reduces the financial pressure on hospitals by at least $723 for the average COVID-19 patient treated in ICUs (given that the direct cost of a COVID-19 inpatient was $10,394 in 2020, see Kapinos et al., 2024).
The Gini Exposure index provides a simple tool for monitoring segregation with widely accessible data, for charting segregation differences across cities and for inferring potential costs of high segregation. While this paper infers costs of segregation from figures related to the COVID-19 pandemic, future research could investigate other dimensions of public health or economic activity as outcomes, and employ the Gini Exposure index to quantify the hidden costs of an uneven social mix in American cities along those dimensions.
Note(s): Estimation sample limited to urban counties. Robust standard errors clustered at MSA level. Estimated effects have to be interpreted as variations in cases per 100,000 residents in the county. Models (4) and (8) report marginal effects at the average. SD stands for standard deviation units. Significance levels: * = 10% and ** = 5%. Source(s): Authors' own work based on ACS data, the New York Times online repository (all counties) and USAFacts data collection (New York). Data extracted on April 14, 2020.

Note(s): Estimates are based on urban counties. Robust standard errors clustered at MSA level. Estimated effects have to be interpreted as variations in cases per 100,000 residents in the county. Models (4) and (8) report marginal effects at the average. SD stands for standard deviation units. Significance levels: * = 10% and ** = 5%. Source(s): Authors' own work based on ACS data, the New York Times online repository (all counties) and USAFacts data collection (New York). Data extracted on April 14, 2020.
$Y_{cms}(t)$ is a measure of COVID-19 cases in county $c$ located in MSA $m$ in State $s$ observed at date $t$, whereas $GS_m$ is the Gini Exposure Segregation index observed in MSA $m$ and normalized by its standard deviation in the sample. $D_c(t-7)$ and $D_c(t-15)$ are dummies taking value 1 if county $c$ had experienced a stay-at-home order one week and two weeks, respectively, before the reference date $t$ (March 29 or April 13). When $t$ = March 29, we set $D_c(t-15) = 0$ for all counties, which switches off the interaction term with the Gini Exposure index ($GS_m \cdot D_c(t-15) = 0$). Covariates are collected in the vector $X_{cm}$. The model features State fixed effects, which capture differences in monitoring practices across healthcare structures.

Table 1.
Bottom, median and top quartile refer to the distribution of the Gini Exposure index across the 325 MSAs used in this study. Source(s): Authors' own work based on ACS data