Methods
Data
NEO CANDO
Foreclosure data was obtained as a spreadsheet from Northeast Ohio Community and Neighborhood Data for Organizing (NEO CANDO).
Census: TIGER products
2010 census tract (CT), census block, and road shapefiles.
American FactFinder
Census tract tables of demographic and socioeconomic information.
City of Cleveland
Statistical Planning Areas (SPAs) for neighbourhood boundaries; the city defines each SPA as an aggregate of CTs.
Methods
Foreclosure Clustering
The geolocated foreclosure records were clipped to Cleveland's 2010 census boundary. I used CrimeStat to run Risk-Adjusted Nearest Neighbour Hierarchical Spatial Clustering (RNNH). This routine identifies clusters of events that occur at a higher-than-expected frequency once localized population is taken into account, and it summarizes each cluster as an ellipse. I used the shapefile of Cleveland foreclosures as the primary file and a shapefile of census block population centroids as the secondary (risk) file. For the risk parameters I chose a normal kernel for the interpolation and an adaptive bandwidth. I output the clusters as ellipses because an ellipse is a generalized shape that does not imply exact boundaries.
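The idea behind risk-adjusted clustering can be illustrated with a much-simplified Python sketch. It is not CrimeStat's implementation: it assumes a Gaussian kernel for the population surface, builds only first-order clusters, and uses a hypothetical `factor` and `min_pts` in place of CrimeStat's significance test and minimum-points setting. Each event gets a linking distance based on 0.5/√λ (the mean nearest-neighbour distance expected under spatial randomness at density λ), where λ is the event density expected if foreclosures simply followed population. Each surviving cluster is summarized by a standard deviational ellipse.

```python
import numpy as np

def rnnh_sketch(events, pop_xy, pop_w, bandwidth, factor=0.5, min_pts=5):
    """Simplified risk-adjusted nearest-neighbour clustering (first order only).

    events: (N, 2) event coordinates; pop_xy / pop_w: population centroids and counts.
    factor and min_pts are hypothetical tuning values, not CrimeStat defaults.
    """
    n = len(events)
    # Gaussian kernel estimate of population per unit area at each event.
    d2 = ((events[:, None, :] - pop_xy[None, :, :]) ** 2).sum(axis=2)
    kern = np.exp(-d2 / (2 * bandwidth ** 2)) / (2 * np.pi * bandwidth ** 2)
    pop_density = kern @ pop_w
    # Event density expected if events were distributed in proportion to population.
    expected = n * pop_density / pop_w.sum()
    # Linking distance: a fraction of the random nearest-neighbour distance 0.5 / sqrt(density),
    # so points must be closer together to link where the population at risk is dense.
    threshold = factor * 0.5 / np.sqrt(np.maximum(expected, 1e-12))

    # Union-find: link every pair closer than the stricter of their two thresholds.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    dist = np.sqrt(((events[:, None, :] - events[None, :, :]) ** 2).sum(axis=2))
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i, j] < min(threshold[i], threshold[j]):
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)

    ellipses = []
    for members in clusters.values():
        if len(members) < min_pts:
            continue  # too few events to count as a cluster
        pts = events[members]
        centre = pts.mean(axis=0)
        # Standard deviational ellipse from the eigen-decomposition of the 2x2 covariance.
        evals, evecs = np.linalg.eigh(np.cov(pts.T))  # eigenvalues in ascending order
        angle = float(np.degrees(np.arctan2(evecs[1, -1], evecs[0, -1])))  # major-axis direction
        axes = np.sqrt(np.maximum(evals[::-1], 0.0))  # (major, minor) standard deviations
        ellipses.append({"centre": centre, "axes": axes, "angle": angle, "n": len(members)})
    return ellipses
```

Because the linking distance grows where expected density is low, a group of foreclosures in a sparsely populated area can form a cluster at a spacing that would not qualify in a dense neighbourhood.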
Exploring Demographic Data
Within ArcGIS I calculated the centroid of each RNNH ellipse and aggregated selected 2010 CT demographic attributes to each centroid. The selected attributes were percent white, percent black, percent single mothers, percent below the poverty line, percent unemployed, and median income. I then ran the Exploratory Regression tool, with foreclosure frequency (FREQ) as the dependent variable, to identify which combination of these factors best explained foreclosures. The best-performing model used percent single mothers (SING_MOM), percent below the poverty line (POV), and percent unemployed (UNEMP), with an R² of 0.04 and an AICc of 2060.1. Because POV and UNEMP appeared to measure very similar conditions, I instead used the second-best model, which replaced UNEMP with percent African-American (BLACK) and had an R² of 0.04 and an AICc of 2060.62.
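The kind of model search that an exploratory regression performs can be approximated by fitting every combination of candidate variables with ordinary least squares and ranking the fits by AICc. The Python sketch below assumes the data are a dict (or pandas DataFrame) of columns. Its AICc uses one common textbook form, so absolute values will not match ArcGIS's output, and it reports plain rather than adjusted R². It also computes variance inflation factors (VIF), one way to quantify the kind of redundancy noted between POV and UNEMP.

```python
import itertools
import numpy as np

def ols_fit(X, y):
    """OLS with an intercept; returns (R^2, AICc) using a common Gaussian-likelihood form."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(((y - A @ beta) ** 2).sum())
    tss = float(((y - y.mean()) ** 2).sum())
    k = A.shape[1] + 1  # regression coefficients plus the error variance
    aic = n * np.log(rss / n) + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)  # small-sample correction
    return 1 - rss / tss, aicc

def vif(X):
    """Variance inflation factor of each column: 1 / (1 - R^2 of that column on the others)."""
    out = []
    for j in range(X.shape[1]):
        r2, _ = ols_fit(np.delete(X, j, axis=1), X[:, j])
        out.append(1.0 / (1.0 - r2))
    return out

def exploratory_regression(data, target, candidates, size=3):
    """Fit every `size`-variable model; return (AICc, R^2, variables, VIFs), best AICc first."""
    y = np.asarray(data[target], dtype=float)
    results = []
    for combo in itertools.combinations(candidates, size):
        X = np.column_stack([data[c] for c in combo]).astype(float)
        r2, aicc = ols_fit(X, y)
        results.append((aicc, r2, combo, vif(X)))
    return sorted(results, key=lambda r: r[0])
```

A large VIF for two variables in the same model (a threshold of about 7.5 is often cited for ArcGIS's tool) is the numerical counterpart of judging that they "seem too similar".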
View the exploratory regression results here.
Geographically Weighted Regression
Using FREQ again as the dependent variable and SING_MOM, POV, and BLACK as the explanatory variables, I ran a geographically weighted regression (GWR) with an adaptive kernel and the AICc bandwidth method. I spatially joined the results back to the original RNNH ellipses and classified them with custom break values to show clusters where these variables have a strong, weak, or no influence on foreclosure FREQ.
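A geographically weighted regression fits a separate weighted least-squares model at every location, down-weighting observations by distance. The Python sketch below assumes a bisquare kernel whose radius at each point is the distance to its k-th nearest neighbour (an adaptive kernel), and it selects k from a hypothetical list of candidates by minimizing the GWR form of AICc given by Fotheringham, Brunsdon and Charlton. ArcGIS's GWR tool may use a different kernel function and search strategy, so this illustrates the technique rather than reproducing the tool.

```python
import numpy as np

def gwr_fit(coords, X, y, k):
    """Adaptive-kernel GWR: local weighted least squares with bisquare weights."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    dist = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2))
    betas = np.zeros((n, A.shape[1]))
    hat_diag = np.zeros(n)
    for i in range(n):
        # Bandwidth = distance to the k-th nearest neighbour (sorted index 0 is the point itself).
        h = np.sort(dist[i])[k]
        w = np.where(dist[i] < h, (1 - (dist[i] / h) ** 2) ** 2, 0.0)
        AtW = A.T * w                         # A^T W_i
        M = np.linalg.solve(AtW @ A, AtW)     # (A^T W_i A)^-1 A^T W_i, so beta_i = M @ y
        betas[i] = M @ y
        hat_diag[i] = A[i] @ M[:, i]          # i-th diagonal element of the hat matrix S
    resid = y - (A * betas).sum(axis=1)
    sigma = np.sqrt(float(resid @ resid) / n)
    tr_s = hat_diag.sum()
    # AICc for GWR: 2n ln(sigma) + n ln(2 pi) + n (n + tr S) / (n - 2 - tr S)
    aicc = 2 * n * np.log(sigma) + n * np.log(2 * np.pi) + n * (n + tr_s) / (n - 2 - tr_s)
    return betas, aicc

def gwr_adaptive(coords, X, y, k_values):
    """Choose the neighbour count with the lowest AICc and return (k, local coefficients)."""
    best_aicc, best_k = min((gwr_fit(coords, X, y, k)[1], k) for k in k_values)
    betas, _ = gwr_fit(coords, X, y, best_k)
    return best_k, betas
```

Each row of `betas` holds an intercept and one local coefficient per explanatory variable; mapping one coefficient column with chosen class breaks corresponds to the strong, weak, or no-influence classification of the ellipses. Candidate values of k should be comfortably larger than the number of coefficients so every local fit is solvable.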
Dual Kernel Density Interpolation
The final step brought the GWR results and the aggregated CT centroid data into CrimeStat to identify where the influences on foreclosure FREQ were not simply a function of localized demographic characteristics. To do this I ran a Dual Kernel Density Interpolation (DK) with the GWR results as the primary file, weighted by a GWR coefficient (for instance, the coefficient representing the influence of SING_MOM). This was normalized against the secondary file: the CT centroids carrying the corresponding demographic or economic value used in the GWR (for example, the percentage of SING_MOM in that tract). Following these steps I created three separate DK interpolations, one for each explanatory variable, then clipped them to the SPAs and overlaid them with the GWR ellipses for comparison.
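A dual kernel interpolation compares two density surfaces evaluated on the same grid. The Python sketch below shows one common comparison, the ratio of a weighted primary density to a weighted secondary density. It assumes Gaussian kernels with a single fixed bandwidth and rescales both surfaces to sum to one (a hypothetical choice, so the ratio compares spatial pattern rather than totals). It also requires non-negative weights, so a GWR coefficient that can be negative would need transforming first. CrimeStat's own kernel, bandwidth, and output options may differ.

```python
import numpy as np

def kernel_surface(grid_xy, pts, weights, bandwidth):
    """Weighted Gaussian kernel density evaluated at each grid cell centre."""
    d2 = ((grid_xy[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2)
    k = np.exp(-d2 / (2 * bandwidth ** 2)) / (2 * np.pi * bandwidth ** 2)
    return k @ weights

def dual_kernel_ratio(grid_xy, primary_xy, primary_w, secondary_xy, secondary_w,
                      bandwidth, floor=1e-12):
    """Primary / secondary density ratio; high values mark where the primary pattern
    is stronger than the secondary (baseline) pattern would suggest."""
    if (primary_w < 0).any() or (secondary_w < 0).any():
        raise ValueError("this sketch assumes non-negative weights")
    num = kernel_surface(grid_xy, primary_xy, primary_w, bandwidth)
    den = kernel_surface(grid_xy, secondary_xy, secondary_w, bandwidth)
    # Rescale each surface to sum to 1 (assumption), then guard against division by ~0.
    num = num / num.sum()
    den = den / den.sum()
    return num / np.maximum(den, floor)
```

In this setup the primary points would be GWR locations weighted by one coefficient and the secondary points the CT centroids weighted by the matching tract value, giving one ratio surface per explanatory variable.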
Continue to the Discussion page to see the maps.