
《Public health implications of opening National Football League stadiums during the COVID-19 pandemic》


Abstract: Using attendance data from the 2020 National Football League (NFL) regular season and local COVID-19 case counts, we estimate the public health impact of opening NFL stadiums to fans during the COVID-19 pandemic. Data are analyzed using robust synthetic control, a statistical method that is employed to obtain counterfactual estimates from observational data. Unlike previous studies [J. Kurland et al., SSRN, 2021], which do not consider confounding factors such as evolving policy landscapes in different states, the synthetic control methodology allows us to account for effects that are county specific and may be changing over time. We find it is likely that opening stadiums had no impact on local COVID-19 case counts; this suggests that, for the 2020 NFL season, the benefits of providing a tightly controlled outdoor spectating environment, including masking and distancing requirements, counterbalanced the risks associated with opening. These results are specific to the 2020 NFL season, and care should be taken in generalizing our conclusions. In particular, 1) these data reflect a period during which earlier strains of COVID-19 were dominant prior to the emergence of more-transmissive strains such as the Delta and Omicron variants, and 2) the data are restricted to outdoor environments; hence our results cannot be applied to small indoor spaces where transmission-restricting controls are essential.

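Abstract (translation): Using attendance data from the 2020 National Football League (NFL) regular season together with local COVID-19 case counts, we estimate the public health impact of opening NFL stadiums to fans during the COVID-19 pandemic. The data are analyzed with robust synthetic control, a statistical method for obtaining counterfactual estimates from observational data. Unlike the earlier study [J. Kurland et al., SSRN, 2021], which did not consider confounding factors such as the evolving policy environments of different states, the synthetic control method lets us account for effects that are county specific and may change over time. We find that opening stadiums probably had no effect on local COVID-19 case counts; this suggests that, for the 2020 NFL season, the benefits of a tightly controlled outdoor spectating environment, including masking and distancing requirements, offset the risks of opening. These results are specific to the 2020 NFL season, and care is needed when generalizing our conclusions. In particular, 1) the data reflect a period in which earlier COVID-19 strains were dominant, before more transmissible strains such as the Delta and Omicron variants emerged, and 2) the data are limited to outdoor environments; our results therefore cannot be applied to small indoor spaces, where transmission-restricting controls are essential.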

Significance: Using data from 2020, we measure the public health impact of allowing fans into sports stadiums during the COVID-19 pandemic; these results may inform future policy decisions regarding large outdoor gatherings during public health crises. Second, we demonstrate the utility of robust synthetic control in this context. Synthetic control and other statistical approaches may be used to exploit the underlying low-dimensional structure of the COVID-19 data and serve as useful instruments in analyzing the impact of mitigation strategies adopted by different communities. As with all statistical methods, reliable outcomes depend on proper implementation strategies and well-established robustness tests; in the absence of these safeguards, these statistical methods are likely to produce specious or misleading conclusions.

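Significance (translation): Using data from 2020, we measure the public health impact of allowing fans into stadiums during the COVID-19 pandemic; these results may inform future decisions about large outdoor gatherings during public health crises. Second, we demonstrate the usefulness of robust synthetic control in this setting. Synthetic control and other statistical methods can be used to exploit the underlying low-dimensional structure of COVID-19 data and can serve as useful tools for analyzing the impact of mitigation strategies adopted by different communities. As with all statistical methods, reliable results depend on proper implementation strategies and well-established robustness tests; without these safeguards, such methods are likely to produce specious or misleading conclusions.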

Introduction:

A year and a half into the global COVID-19 pandemic, we have an opportunity to analyze and reflect upon the policies and decisions enacted over the past 18 mo. Given the distributed nature of policy decisions in the United States, we find ourselves in a unique position in which states and municipalities have explored different strategies to combat the virus, and the efficacy of those policies has been imprinted in the local case counts, hospitalizations, and death records. In particular, these data contain a wealth of information about which policies have proven to be effective in preserving the health and safety of our communities.

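A year and a half into the COVID-19 pandemic, the authors propose reflecting on and analyzing the policies and decisions enacted over the past 18 months.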

One activity that one may wish to consider is the opening of outdoor sporting events to spectators. This question has recently generated quite a bit of interest as ballparks across the nation open for summer and events such as the 2021 Summer Olympics in Japan take place. On the one hand, governing bodies are naturally wary of opening stadiums given the well-documented importance of avoiding large gatherings. On the other hand, sporting events are often held outdoors, where airflow is largely unobstructed, and in venues where crowd density can be carefully controlled if the event is properly managed. In the absence of a detailed analysis, it is not immediately obvious which of these effects dominates.

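The authors point out that opening outdoor sporting events to spectators is a question of considerable interest to researchers. On the one hand, a large body of literature documents the importance of avoiding large gatherings, so governing bodies are naturally cautious about opening stadiums. On the other hand, sporting events are usually held outdoors, where airflow is largely unobstructed, and in venues where crowd density can be carefully controlled if the event is well managed. Without a detailed analysis, it is not obvious which effect dominates.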

Data from the National Football League (NFL) may provide an answer to this question. During the 2020 regular season, teams in the NFL collaborated with local communities to determine whether or not to allow fans in the stadiums during the pandemic. In general, stadiums that opened their doors to fans adopted pandemic requirements for all in attendance, which typically include some combination of staggered entry, required masking, health questionnaires, temperature checks for staff, deployment of compliance officers, modified concessions, social distancing in seating and lines, mobile ticketing, enhanced cleaning protocols, amplified health and safety communications, and capacity limitations. The highest capacity that any NFL stadium allowed during the 2020 regular season was 30% (Dallas), with most other stadiums considerably below that limit. These policy decisions were made based on local guidelines, local prevalence, community risk tolerance, and other localized considerations; some stadiums ultimately decided to allow fans at the games, while others remained closed, providing perhaps the first set of natural experiments that can be analyzed to investigate the impact of opening stadiums on COVID-19 case rates. In the words of Kurland et al., who recently provided a first look at this data, “Scant evidence has been gathered in the extant literature on the impact of sport venues on local public health, influenza-related mortality rates, or disease contagion more generally. There is a complete absence of any evidence related to the impact of fans gathering at sporting events, or mass gatherings more generally, on incidence of COVID-19 at the local-level.” The natural experiments from the 2020 NFL season and other sports leagues present a golden opportunity to address these questions in the context of the original 2020 COVID-19 strain.

Data from the National Football League (NFL) may provide an answer to this question.

In the Kurland et al. study, the authors compared COVID-19 case data from NFL stadium counties that allowed fans in the stadium to counties that did not allow fans, and looked for spikes in the data in the weeks following a game; the authors concluded, from this analysis, that the presence of large numbers of fans at NFL games led to “tangible increases” in the local incidence of COVID-19 cases. However, this type of analysis may be problematic: In this context, the control stadiums (i.e., those without fans) tend to be embedded in states with stricter COVID-19 policies, rather than forming a random control, so the sample of control counties is strongly biased. New York and Dallas, for example, are immersed in very different environments with different pandemic policies, and it is not at all obvious that one can attribute the differences in case spikes to the stadiums, given the enormous number of confounding factors.

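The authors take issue with the Kurland et al. study: its control group is not a random control, and given the large number of confounding factors, its conclusions may not be accurate.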

Fortunately, there exists a rich literature of techniques (longitudinal methods, hierarchical methods, factor model methods, synthetic control, etc.) that we can draw upon to account for these confounding factors. In this particular analysis, we turn to synthetic control, which has been applied in a diversity of fields (criminology, healthcare, sports, and political science and policy evaluation, to name a few). At its heart, synthetic control is a method for estimating a counterfactual in the absence of an intervention, in this case, what would have happened if stadiums had not opened. The method provides a systematic way to choose relevant comparison units when randomized controls are not available.

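The authors use synthetic control, a method for estimating the counterfactual in the absence of an intervention (what would have happened if the stadiums had not opened?).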

To illustrate the power of synthetic control, imagine the ideal experiment one would like to run in order to quantify the impact of opening the Dallas stadium to fans. In principle, we would like to have COVID-19 case counts from Dallas County throughout the season with the stadium open to fans and, for comparison, case counts from a Dallas twin (with identical people and policies to the first Dallas) in which the stadium did not open. The first set of data (Dallas open to fans) is readily available. The second set of data can be constructed from information from other counties in Texas, hereafter referred to as donor counties, which have policies and characteristics similar to Dallas. Synthetic control provides a methodology to build a weighted combination of these Dallas-like counties, which can then be used as a control group, that is, a “synthetic” Dallas twin. In particular, we seek the linear combination of case counts from other Texas counties that most closely mirrors the Dallas case counts prior to the stadium opening. Given that none of these non-Dallas counties have a stadium, this linear combination can be extended postintervention (i.e., after opening the stadium) to estimate what would have happened in the synthetic Dallas in which no stadium opened. Once it has been established that the stadium county and the synthetically generated county have similar behavior over extended periods of time prior to the intervention, a discrepancy in the number of COVID-19 cases following the intervention may be interpreted as a result of allowing fans in the stadium. One of the advantages of this method is that it can account for the effects of confounding factors that are county specific and may be changing over time, which is crucial in the ever-evolving policy landscape of a pandemic (16). In particular, our methodology allows for correlation between the decision to open the stadium and characteristics that define the county (cultural or political leaning, population density, demographics, etc.), but cannot account for correlations between the decision and exogenous noise.
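The core idea of the Dallas example, fitting weights on preintervention data and extending them past the intervention, can be sketched in a few lines. This is a minimal illustration using plain least squares on made-up numbers; the function name `fit_weights`, the toy donor curves, the weights, and the intervention day `t0` are all assumptions for illustration, not the authors' implementation or data.

```python
import numpy as np


def fit_weights(donors_pre, target_pre):
    """Least-squares weights w so that donors_pre @ w approximates target_pre.

    donors_pre: array (T_pre, n_donors), donor case counts before the intervention
    target_pre: array (T_pre,), stadium county case counts before the intervention
    """
    weights, *_ = np.linalg.lstsq(donors_pre, target_pre, rcond=None)
    return weights


# Hypothetical toy data: three donor counties observed over 120 days.
days = np.arange(120, dtype=float)
donors = np.column_stack([
    50.0 * days,
    0.5 * days**2,
    2000.0 * (1.0 - np.exp(-days / 40.0)),
])
true_weights = np.array([0.3, 0.5, 0.2])  # assumed, for illustration only
target = donors @ true_weights            # noiseless "stadium county", no intervention effect

t0 = 90                                   # assumed intervention day
weights = fit_weights(donors[:t0], target[:t0])  # learn weights before t0 only
synthetic_post = donors[t0:] @ weights           # counterfactual after t0
```

Because the toy target has no intervention effect built in, the projected counterfactual should track the measured curve after `t0`; with real data, any gap between the two is what gets interpreted as the effect of opening.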

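Through a worked example, the authors show that one advantage of the synthetic control method is that it can account for confounding factors that are county specific and may change over time, which is crucial in the evolving policy landscape of a pandemic.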

At this point, it is reasonable to ask whether one should expect linear combinations of donor counties to accurately represent stadium counties (both observed and counterfactual). In general, assuming linearity is appropriate provided there exists an underlying low-dimensional structure to the case count data, that is, if the matrix containing discretized time series of donor county case counts is approximately low rank. Under such a setting, linearity between counties is an almost immediate consequence (see Materials and Methods for details). This low-rank assumption is common in the matrix completion literature; notably, low-rank matrices have also been shown to naturally arise in modern datasets and emerge from “well-behaved” generative models (e.g., Lipschitz functions). This point will be revisited in Results, where we test for low rankedness empirically in the context of our dataset.
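One way to make "approximately low rank" concrete is to look at how quickly the singular values of the donor matrix decay, and to keep only the leading ones before fitting. The sketch below is an assumption-laden illustration: the 99% energy threshold, the helper names, and the random rank-2 toy matrix are all hypothetical, and the paper's own rank selection rule is not reproduced here.

```python
import numpy as np


def effective_rank(matrix, energy=0.99):
    """Smallest k whose top-k singular values hold `energy` of the squared spectrum."""
    s = np.linalg.svd(matrix, compute_uv=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cumulative, energy) + 1)


def truncate_rank(matrix, k):
    """Best rank-k approximation of `matrix` (hard singular value thresholding)."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k]


# Hypothetical donor matrix: 100 days x 30 counties, rank 2 plus small noise.
rng = np.random.default_rng(1)
time_factors = rng.normal(size=(100, 2))
county_loadings = rng.normal(size=(2, 30))
clean = time_factors @ county_loadings
noisy = clean + 0.01 * rng.normal(size=clean.shape)

k = effective_rank(noisy)           # number of singular values worth keeping
denoised = truncate_rank(noisy, k)  # low-rank donor matrix used for fitting
```

If every county's time series is (nearly) a combination of the same few underlying time factors, then any one county can be written as a linear combination of the others, which is why low rank makes the linearity assumption reasonable.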

Finally, the selection of donor units is a critical step in successfully constructing a synthetic control. In particular, donor units (in our case, counties) should have the following characteristics:

  1. Counties affected by the intervention or by events of a similar nature should be excluded from the donor pool.

  2. Counties that may have suffered large “idiosyncratic shocks” during the preintervention period should be excluded.

  3. The donor pool should be restricted to counties with characteristics similar to the stadium county; in this case, we restrict our pool to counties from the same state to maintain some consistency in COVID-19 policies.

  4. Case counts that cover an extended period of time prior to the intervention are required for both stadium counties and donor counties.

In order to establish which counties satisfy these constraints, the NFL provided us with aggregate attendance data indicating the percentage of fans from each county in each state. In general, 10% or more of the fans come from the county in which the stadium is located. Hence, we designate counties that provided more than 10% of the fan base as stadium counties. In addition, there are a number of counties that are home to many fans but not to the same extent as that of the stadium counties. Since there is some ambiguity as to whether these counties should be counted as stadium counties or donor counties, we designate counties that supply between 1% and 10% of the fan base as buffer counties and, in light of the first criterion above, do not include them as either stadium or donor counties. Second, to address criterion 3, we only include counties in the donor pool that come from the same state as the stadium county. Although there is variation at the county level, overarching COVID-19 guidance, in general, comes from the states; hence, we assume that policies are relatively consistent within states and allow that they may vary dramatically from state to state. In addition, we only retain counties in which at least 200 cases have been recorded, in order to eliminate donor counties that are either markedly underreporting or undertesting. Finally, we are fortunate that football season starts in September, which allows us to address criterion 4; given that relatively reliable COVID-19 case count data have been available since approximately April 2020, we have 4 mo of training data at our disposal to learn the weights for the synthetic counties. Criterion 2 is trickier, given that we do not necessarily know, a priori, all events that could cause a shock to the system; however, a posteriori, we can investigate the outcomes and look for signs of such a shock.
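The selection rules in this paragraph can be written as a small classifier. This is only a sketch: the function name, argument names, and return labels are hypothetical, and how a county at exactly 10% should be treated is an assumption (here it falls in the buffer group), since the text says "more than 10%" for stadium counties and "between 1% and 10%" for buffer counties.

```python
def classify_county(fan_share, cumulative_cases, same_state, min_cases=200):
    """Classify a county as 'stadium', 'buffer', 'donor', or 'excluded'.

    fan_share: fraction of the stadium's fans who come from this county (0 to 1)
    cumulative_cases: recorded COVID-19 cases in the county
    same_state: True if the county is in the same state as the stadium
    """
    if fan_share > 0.10:
        return "stadium"   # supplies more than 10% of the fan base
    if fan_share >= 0.01:
        return "buffer"    # 1% to 10%: neither stadium nor donor
    if same_state and cumulative_cases >= min_cases:
        return "donor"     # same-state county with at least 200 cases
    return "excluded"      # out of state, or too few recorded cases


# Hypothetical counties, for illustration only.
examples = {
    "home county": classify_county(0.35, 50_000, True),
    "suburb": classify_county(0.04, 12_000, True),
    "rural, same state": classify_county(0.001, 900, True),
    "tiny, same state": classify_county(0.0, 150, True),
    "out of state": classify_county(0.0, 30_000, False),
}
```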

The authors state their assumptions and check the four characteristics listed above.

Results:

  • Stadiums are simply identified as open or closed for the season, starting with the first game in which fans were allowed in the stadium.

  • The difference between the measured county and the synthetic county is defined as Δ(t) = c(t) − c_synth(t).

  • c(t) is the cumulative number of reported COVID-19 cases in the stadium county.

  • c_synth(t) is the counterfactual number of cases in the synthetic county.

  • Positive Δs indicate excess cases in stadium counties; negative Δs indicate fewer than expected cases in stadium counties relative to the counterfactual.
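The difference between measured and synthetic counts defined in the list above is straightforward to compute. The sketch below also includes a percentage version; normalizing by the synthetic count is an assumption made here for illustration, and the numbers are invented.

```python
import numpy as np


def excess_cases(measured, synthetic):
    """Measured minus synthetic cumulative cases; positive means excess cases."""
    return np.asarray(measured, dtype=float) - np.asarray(synthetic, dtype=float)


def excess_percent(measured, synthetic):
    """Excess cases as a percentage of the synthetic (counterfactual) count."""
    synthetic = np.asarray(synthetic, dtype=float)
    return 100.0 * excess_cases(measured, synthetic) / synthetic


# Hypothetical cumulative counts on three days after the first home game.
measured = [1000, 1100, 1200]
synthetic = [1000, 1150, 1500]
delta = excess_cases(measured, synthetic)        # 0, -50, -300
delta_pct = excess_percent(measured, synthetic)  # 0%, about -4.3%, -20%
```

Negative values here correspond to a stadium county that recorded fewer cases than its synthetic twin.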

Comparison of the measured COVID-19 case counts from Hamilton County, OH (red line), and COVID-19 case counts from the counterfactual synthetic county (blue line). The vertical gray line indicates the date of the first home game that allowed fans in the stadium. In this example, the stadium county recorded fewer cases than the counterfactual after fans were allowed in the stadium, suggesting that, for Hamilton County, the benefit of moving fans into a controlled outdoor environment outweighed the potential harm associated with large gatherings.

  • Interestingly, the measured counts are slightly lower than the projected counts, suggesting that, in this particular county, opening the stadium may have modified fan behavior in a way that was helpful to the community, and not harmful.

Comparison of measured case counts with synthetic county case counts for all NFL stadium counties except Maricopa County. The top 16 plots show stadiums that allowed fans for some portion of the 2020 season; the bottom 13 plots show stadiums that remained closed. Red lines indicate measured data; blue lines indicate synthetic data; light gray shaded regions indicate 99% prediction intervals; the vertical gray line indicates the first day that the stadium was open to fans (for open stadiums) or the date of the first home game (for closed stadiums).

  • As expected, on average, stadiums without fans show no significant difference from the synthetic counties.

Discussion:

Impact of Opening Stadiums to Fans

  • The analysis shows no indication that opening stadiums had any impact on community spread.
  • Counties that allowed fans in the stadium show no statistically significant difference from the synthetic counties; that is, there is no evidence that the NFL’s controlled opening of stadiums to fans led to any increase in COVID-19 cases.

(Top) Gray IQR box-and-whiskers plots showing the difference between measured case counts and synthetic case counts, Δ(t), up to 21 d after the first home game for stadiums that did not open to fans. If the synthetic approach is working reliably, the gray box-and-whisker points should be indistinguishable from zero, given that no fans were allowed in the stadium, and hence there was no intervention. (Bottom) Blue IQR box-and-whiskers plots showing the difference between measured case counts and synthetic case counts, Δ(t), up to 21 d after the first game for counties with stadiums that opened during the pandemic. Again, the points show no significant difference from zero, indicating that allowing fans in the stadium had no impact on the local prevalence of COVID-19. Note that negative Δs signify fewer cases in stadium counties relative to the counterfactual.

  • The results hint that providing controlled outdoor environments for fans to assemble may have benefited some counties.

Large versus Small Crowds

  • Given that most stadiums were operating far under their capacity limits, one might argue that the null result above is dominated by stadiums with small attendance numbers, which may overshadow the signal from stadiums that allowed more fans to attend games.

(Top) Difference (by percentage) between stadium and synthetic counties 14 d after the stadium first opened to fans versus average attendance. Negative Δs indicate counties in which the measured case counts were lower compared to the counterfactual. The blue line is a linear fit. (Bottom) Slope of the linear regression versus number of days after game day. Data indicate that there is no correlation between attendance and COVID-19 case counts.

  • The analysis shows no correlation with attendance.
  • Stadium counties that allowed higher attendance show no increase in COVID-19 cases relative to their lower-attendance counterparts or to stadiums that did not open to fans at all.
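The attendance check described above amounts to fitting a straight line to the percentage difference versus average attendance and inspecting the slope. The following sketch uses `numpy.polyfit` for the fit; the function name and the attendance and difference values are invented for illustration and are not the paper's data.

```python
import numpy as np


def attendance_slope(avg_attendance, delta_percent):
    """Slope and intercept of a straight-line fit of delta_percent on attendance."""
    slope, intercept = np.polyfit(avg_attendance, delta_percent, deg=1)
    return float(slope), float(intercept)


# Hypothetical values: average attendance (thousands of fans) and the percentage
# difference between measured and synthetic counts 14 days after opening.
attendance_k = np.array([2.0, 5.0, 8.0, 12.0, 15.0, 20.0])
delta_pct_14d = np.array([-4.0, 3.0, -6.0, 1.0, -2.0, -3.0])

slope, intercept = attendance_slope(attendance_k, delta_pct_14d)
```

A slope close to zero, repeated for each number of days after game day, is what "no correlation with attendance" would look like in this setup; a real analysis would also need an uncertainty estimate on the slope.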

Validity of the Synthetic Control Approach

(Top) Blue lines indicate synthetic case counts computed using the entirety of the preintervention period. Gray lines indicate synthetic case counts computed using a subset of the preintervention data, that is, assuming the intervention happened 1 wk to 6 wk prior to game day. The majority of the plots show little dependence on the intervention date, suggesting that the synthetic counties in those cases are robust; others (for example, Green Bay, Seattle, and Jacksonville) indicate that the synthetic counterfactual is not reliable for those counties. (Bottom) Number of singular values in the donor matrix that are retained to construct each synthetic stadium county. By and large, a low-dimensional representation suffices (again, with a few notable exceptions such as Green Bay).

  • In most counties, ~10 or fewer singular values are sufficient to capture the variance in the preintervention period, which typically consists of a few hundred data points, that is, one point per day in the months prior to opening (bottom panel).
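The backdating test in the caption above (refitting as if the intervention had happened 1 wk to 6 wk earlier) can be sketched as follows. This is an illustrative version using plain least squares on invented, noiseless data; the helper names and the `max_spread` summary statistic are assumptions, not the authors' procedure.

```python
import numpy as np


def backdated_counterfactuals(donors, target, t0, max_weeks=6):
    """Refit weights with the intervention moved 0..max_weeks weeks earlier.

    Returns {weeks_earlier: synthetic curve over the full time range}.
    """
    curves = {}
    for weeks in range(0, max_weeks + 1):
        cutoff = t0 - 7 * weeks  # pretend the intervention happened here
        w, *_ = np.linalg.lstsq(donors[:cutoff], target[:cutoff], rcond=None)
        curves[weeks] = donors @ w
    return curves


def max_spread(curves, t0):
    """Largest gap between any two backdated curves after the true intervention."""
    stacked = np.vstack([curve[t0:] for curve in curves.values()])
    return float(np.max(stacked.max(axis=0) - stacked.min(axis=0)))


# Hypothetical toy data: the target is an exact combination of three donors.
days = np.arange(120, dtype=float)
donors = np.column_stack([
    50.0 * days,
    0.5 * days**2,
    2000.0 * (1.0 - np.exp(-days / 40.0)),
])
target = donors @ np.array([0.3, 0.5, 0.2])

curves = backdated_counterfactuals(donors, target, t0=90)
spread = max_spread(curves, t0=90)  # near zero here, so the fit is stable
```

A small spread suggests the synthetic county does not depend much on the assumed intervention date; a large spread is the kind of warning sign the caption describes for a few counties.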

Comparison of measured COVID-19 case counts (red line) and COVID-19 case counts from a counterfactual synthetic Meade County (blue line) for the Sturgis Motorcycle Rally. The vertical gray line indicates the first day of the rally.

  • The synthetic control approach does indeed find a significant increase in COVID-19 cases in Meade County following the rally and suggests that Sturgis may be responsible for a 24% ± 11% increase in COVID-19 cases after 14 d and a 43% ± 11% increase after 21 d.

Connection to Other Observed Variables

  • Examine whether counties that are “close” to one another in the low-dimensional representation are also “close” with respect to relevant observed variables.
  • One might expect counties with similar COVID-19 profiles to also share similar political views.
  • P_i ≈ 0 for counties with temporal profiles similar to the most Republican county, and P_i ≈ 1 for counties similar to the most Democratic county.

(Left) Map of Ohio colored by P_i, the relative distance defined in the subspace constructed from COVID-19 case counts, from the most Democratic county in the state (Cuyahoga) and the most Republican county (Mercer). (Middle) The 2020 electoral map of Ohio. Colors have been scaled in both maps such that Cuyahoga (blue) and Mercer (red) represent the extremes of the color scale. (Right) Histogram of counties depicting the predictive power of the COVID-19 subspace; 1 = perfect estimator; 0 = no better than random; median = 0.55.

  • The most Democratic county corresponds to one (blue) and the most Republican county corresponds to zero (red).
  • To estimate the extent to which variance in voting patterns is captured in the temporal COVID-19 signature, the authors define a pertinent R-squared for the ith county.
  • The authors refrain from digging more deeply into the implications of this mapping exercise, and merely emphasize that the low-dimensional COVID-19 subspace is indeed reflective of other pertinent observed variables (geography, population, and political leaning), as one might expect.
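These notes do not give the exact formula for the relative distance, so the sketch below is only one plausible construction that matches the stated endpoints (0 at the most Republican county, 1 at the most Democratic county). The projection onto the top `k` singular directions, the distance ratio, the function name, and the toy data are all assumptions.

```python
import numpy as np


def relative_position(case_matrix, idx_rep, idx_dem, k=2):
    """Relative distance of each county between two reference counties.

    case_matrix: array (T, n_counties) of case-count time series
    idx_rep, idx_dem: column indices of the two reference counties
    Returns values in [0, 1]: 0 at the idx_rep county, 1 at the idx_dem county.
    """
    u, s, vt = np.linalg.svd(case_matrix, full_matrices=False)
    coords = (s[:k, None] * vt[:k]).T  # (n_counties, k) low-dimensional coordinates
    d_rep = np.linalg.norm(coords - coords[idx_rep], axis=1)
    d_dem = np.linalg.norm(coords - coords[idx_dem], axis=1)
    return d_rep / (d_rep + d_dem)


# Hypothetical cumulative case curves for six counties over 50 days.
rng = np.random.default_rng(2)
cases = np.cumsum(rng.uniform(0.0, 10.0, size=(50, 6)), axis=0)
p = relative_position(cases, idx_rep=0, idx_dem=5)
```

Comparing such values with actual vote shares is where the R-squared mentioned in the list above would come in; that comparison is not sketched here because its definition is not given in these notes.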

Conclusions:

  • The authors find no evidence that opening NFL stadiums to fans during the 2020 regular season led to any uptick in the number of COVID-19 cases in the stadium counties.
  • The B.1.1.7, Delta, and Omicron variants are known to be significantly more transmissive than the original strain, so care should be taken in generalizing these conclusions to them.

Materials and Methods:

Causal Framework

Key Assumptions

Methodology