Discussion and Conclusion

In Los Angeles, the regions with high air pollution concentrations are all located in the basin, which is also the urban center. The regions with lower air pollution are located in the southeastern and northern rural areas; however, the northwestern rural region still has relatively higher air pollution concentrations than the rural southeast. In the San Francisco Bay Area, the regions with high air pollution concentrations are all located in the populated urban centers, i.e., Oakland, San Francisco, and San Jose. The region west of Oakland also has medium to high overall air pollution concentrations, while the rural region in the northwest of the Bay Area has the lowest. Generally, the regions immediately surrounding San Francisco Bay have high air pollution concentrations. The Average Air Pollution Percentile maps for both cities show a pattern similar to that of the multivariate cluster analysis: all highly polluted census tracts are located in the populated valley basins and extend along the highway system.
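
This section does not spell out how the Average Air Pollution Percentile is computed. One plausible scheme, sketched here purely as an assumption, ranks each census tract on each pollutant as a percentile among all tracts and then averages those percentiles across pollutants. The function names, pollutant keys, and tract values below are hypothetical and are not the study's data.

```python
from bisect import bisect_right


def percentile_ranks(values):
    """Percentile rank (0-100) of each value: share of tracts with a value <= it."""
    ordered = sorted(values)
    n = len(values)
    return [100.0 * bisect_right(ordered, v) / n for v in values]


def average_pollution_percentile(tracts, pollutants):
    """Hypothetical scheme: average each tract's per-pollutant percentile ranks."""
    per_pollutant = {p: percentile_ranks([t[p] for t in tracts]) for p in pollutants}
    return [
        sum(per_pollutant[p][i] for p in pollutants) / len(pollutants)
        for i in range(len(tracts))
    ]


# Hypothetical tracts with invented pollutant values (illustration only).
tracts = [
    {"pm25": 12.1, "ozone": 0.05, "diesel": 0.8},
    {"pm25": 9.4, "ozone": 0.06, "diesel": 0.2},
    {"pm25": 14.0, "ozone": 0.04, "diesel": 1.5},
    {"pm25": 7.2, "ozone": 0.03, "diesel": 0.1},
]
scores = average_pollution_percentile(tracts, ["pm25", "ozone", "diesel"])
print(scores)
```

Averaging percentiles rather than raw concentrations puts pollutants measured in different units on a common 0 to 100 scale, so no single pollutant dominates the composite simply because of its units.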

The exploratory regression analysis identifies which air pollutants are most strongly correlated with asthma and cardiovascular disease. In Los Angeles, the toxic releases variable is the strongest explanatory variable for both asthma and cardiovascular disease, whereas in the San Francisco Bay Area, PM2.5 is. We then ran several regression models, including Generalized Linear Regression (GLR), Geographically Weighted Regression (GWR), and Spatial Lag Regression, to test which model best explains the relationship between air pollutants and disease. In summary, GWR is the best-fitting model for both diseases in both Californian regions: it consistently has the highest R-squared and the lowest AIC values. The regression residual maps for both regions also show that the GWR residuals have almost no spatial autocorrelation (Moran's I ≈ 0), unlike those of the GLR model.
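
Moran's I is the statistic behind the residual comparison above. The sketch below computes the global Moran's I, I = (n / S0) · Σi Σj wij (xi − x̄)(xj − x̄) / Σi (xi − x̄)², where S0 is the sum of all weights. It assumes a hypothetical row of four tracts with binary adjacency weights and invented residual values; it is not the study's spatial weights matrix.

```python
def morans_i(values, weights):
    """Global Moran's I for values on n units and an n x n spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all spatial weights
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return n * num / (s0 * den)


# Hypothetical row of four tracts; each tract neighbors the tract(s) beside it.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([2.0, 1.0, -1.0, -2.0], adjacency)    # similar residuals sit together
alternating = morans_i([1.0, -1.0, 1.0, -1.0], adjacency)  # neighbors have opposite signs
print(clustered, alternating)
```

Here the clustered residuals give I = 0.4 (positive autocorrelation) and the alternating residuals give I = −1.0. Under no spatial autocorrelation the expected value is −1/(n − 1), which approaches 0 for a large number of tracts, so well-specified residuals should have I close to 0.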

The densely populated regions in both the San Francisco Bay Area and Los Angeles lie in the valley basins. The San Francisco Bay Area has a narrower valley, so its population is more concentrated in the urban center. Comparing the GWR coefficient maps with land use maps (McKee et al., 2000; Alex, 2016) shows that the hot spots of high correlation between air pollutants and asthma and cardiovascular disease are in industrial areas. Air pollutants generated by industrial facilities and by vehicle emissions along the highways may be dispersed to other regions depending on the seasonal wind direction: the westerlies from the sea blow the pollutants inland, where they are trapped within the valley basin. However, the northwestern Bay Area has an unusual hot spot of high correlation between PM2.5 and disease. This rural area has a low population density, so the small sample population there might be prone to asthma and cardiovascular disease because of other environmental factors.
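
Each value on a GWR coefficient map is the slope of a local regression fitted around one tract, with nearby tracts weighted more heavily than distant ones. The sketch below assumes a single predictor, a Gaussian distance kernel, and a fixed bandwidth; GWR software generally also supports multiple predictors, other kernels (e.g., bisquare), and data-driven bandwidth selection. The coordinates and the PM2.5 and asthma values are hypothetical.

```python
import math


def gaussian_weights(target, coords, bandwidth):
    """Gaussian kernel: weight decays with distance from the target location."""
    tx, ty = target
    return [math.exp(-0.5 * (math.hypot(x - tx, y - ty) / bandwidth) ** 2) for x, y in coords]


def local_slope(target, coords, xs, ys, bandwidth):
    """Weighted least-squares slope of y on x around one location (one GWR coefficient)."""
    w = gaussian_weights(target, coords, bandwidth)
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, xs)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, xs, ys))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, xs))
    return sxy / sxx


# Hypothetical tracts: a western cluster where asthma rises steeply with PM2.5
# and an eastern cluster where it rises gently.
coords = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
pm25 = [8.0, 10.0, 12.0, 8.0, 10.0, 12.0]
asthma = [16.0, 20.0, 24.0, 4.0, 5.0, 6.0]

west = local_slope((1, 0), coords, pm25, asthma, bandwidth=1.0)
east = local_slope((11, 0), coords, pm25, asthma, bandwidth=1.0)
near_global = local_slope((6, 0), coords, pm25, asthma, bandwidth=1e6)
print(west, east, near_global)
```

With a small bandwidth the local slopes near the western and eastern clusters come out at about 2.0 and 0.5, while a very large bandwidth weights every tract almost equally and recovers the single global slope of about 1.25. This is how one global model can hide the local hot spots that a GWR coefficient map reveals.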

Our research has several limitations. Because the data are aggregated to census tracts, the results are subject to the modifiable areal unit problem (MAUP). Our research does not consider the time lag in disease development, and cardiovascular disease often takes longer to develop than asthma. In addition, asthma and cardiovascular disease have different high-risk populations; for instance, asthma is generally associated with younger populations, while cardiovascular disease is associated with older populations. A high correlation between air pollutants and disease does not imply direct causation, and we did not include other environmental or demographic factors in this research.
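
The scale effect of the modifiable areal unit problem can be illustrated with a toy example. In the sketch below, eight hypothetical fine-scale units with invented pollution and disease values are aggregated into four coarser zones, and the Pearson correlation changes even though the underlying data are the same. The helper names and values are illustrative only; the direction and size of the change depend on the data and on how the zones are drawn.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def aggregate(values, zones):
    """Average the values of the units in each zone (each zone is a list of indices)."""
    return [sum(values[i] for i in zone) / len(zone) for zone in zones]


# Hypothetical fine-scale units: pollutant exposure and disease rate.
pollution = [1, 2, 3, 4, 5, 6, 7, 8]
disease = [2, 1, 4, 3, 6, 5, 8, 7]
fine_r = pearson(pollution, disease)

# Merge neighboring pairs of units into four coarser zones.
zones = [[0, 1], [2, 3], [4, 5], [6, 7]]
coarse_r = pearson(aggregate(pollution, zones), aggregate(disease, zones))
print(fine_r, coarse_r)
```

Here the fine-scale correlation is 38/42 (about 0.90), while the aggregated zones show a perfect correlation of 1.0, because averaging smooths away the unit-level scatter.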

For future research, we suggest taking other environmental and socioeconomic factors into account and studying air pollution at shorter time scales. For instance, future work could investigate how air pollution during rush hour affects the health of children who actively commute (e.g., walk or bike) to and from school.