Proc Natl Acad Sci U S A. 2017 Dec 12;114(50):13108-13113.
doi: 10.1073/pnas.1700035114. Epub 2017 Nov 28.

Using deep learning and Google Street View to estimate the demographic makeup of neighborhoods across the United States

Timnit Gebru et al. Proc Natl Acad Sci U S A.

Abstract

The United States spends more than $250 million each year on the American Community Survey (ACS), a labor-intensive door-to-door study that measures statistics relating to race, gender, education, occupation, unemployment, and other demographic factors. Although the ACS is a comprehensive source of data, the lag between demographic changes and their appearance in the survey can exceed several years. As digital imagery becomes ubiquitous and machine vision techniques improve, automated data analysis may become an increasingly practical supplement to the ACS. Here, we present a method that estimates socioeconomic characteristics of regions spanning 200 US cities by using 50 million images of street scenes gathered with Google Street View cars. Using deep learning-based computer vision techniques, we determined the make, model, and year of all motor vehicles encountered in particular neighborhoods. Data from this census of motor vehicles, which enumerated 22 million automobiles in total (8% of all automobiles in the United States), were used to accurately estimate income, race, education, and voting patterns at the zip code and precinct level. (The average US precinct contains ∼1,000 people.) The resulting associations are surprisingly simple and powerful. For instance, if the number of sedans encountered during a drive through a city is higher than the number of pickup trucks, the city is likely to vote for a Democrat during the next presidential election (88% chance); otherwise, it is likely to vote Republican (82%). Our results suggest that automated systems for monitoring demographics may effectively complement labor-intensive approaches, with the potential to measure demographics with fine spatial resolution, in close to real time.
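
The sedan-versus-pickup rule described in the abstract can be written as a one-line comparison. The following is only an illustrative sketch: the function name `predict_party` and the city counts are hypothetical and are not taken from the paper, and the 88% and 82% figures quoted above describe the reported reliability of the rule, not anything this snippet computes.

```python
def predict_party(sedans: int, pickups: int) -> str:
    """Apply the abstract's rule of thumb to one city's vehicle counts.

    Returns "Democrat" when sedans outnumber pickup trucks,
    and "Republican" otherwise (including ties).
    """
    return "Democrat" if sedans > pickups else "Republican"


# Hypothetical counts per city: (sedans, pickup trucks). Not real data.
city_counts = {
    "city_a": (1200, 300),
    "city_b": (400, 900),
}

predictions = {
    name: predict_party(sedans, pickups)
    for name, (sedans, pickups) in city_counts.items()
}
```

The paper's full model uses many more vehicle attributes than these two counts; this rule is the simplest association the authors report.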

Keywords: computer vision; deep learning; demography; social analysis.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
We perform a vehicular census of 200 cities in the United States using 50 million Google Street View images. In each image, we detect cars with computer vision algorithms based on deformable part models (DPM) and count an estimated 22 million cars. We then use a convolutional neural network (CNN) to categorize each detected vehicle into one of 2,657 classes of cars. For each type of car, we have metadata such as the make, model, year, body type, and price of the car in 2012. Images courtesy of Google Maps/Google Earth.
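
Once each detected car has a class and its metadata, the detections can be summarized per region. The sketch below shows one plausible way to do that aggregation; it is an assumption for illustration, not the authors' code. The record layout, the field names (`zip`, `body_type`, `price_2012`), the function `aggregate_by_zip`, and all values are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical classifier output: one record per detected car. Not real data.
detections = [
    {"zip": "zip_a", "body_type": "sedan", "price_2012": 21000},
    {"zip": "zip_a", "body_type": "sedan", "price_2012": 35000},
    {"zip": "zip_a", "body_type": "pickup", "price_2012": 28000},
    {"zip": "zip_b", "body_type": "pickup", "price_2012": 30000},
]


def aggregate_by_zip(records):
    """Summarize per-car records into simple per-zip-code features."""
    counts = defaultdict(Counter)  # zip -> body type -> number of cars
    prices = defaultdict(list)     # zip -> list of 2012 prices
    for rec in records:
        counts[rec["zip"]][rec["body_type"]] += 1
        prices[rec["zip"]].append(rec["price_2012"])

    features = {}
    for zip_code, body_counts in counts.items():
        total = sum(body_counts.values())
        features[zip_code] = {
            "n_cars": total,
            "frac_sedan": body_counts["sedan"] / total,
            "frac_pickup": body_counts["pickup"] / total,
            "mean_price_2012": sum(prices[zip_code]) / total,
        }
    return features


features = aggregate_by_zip(detections)
```

Features of this kind (counts, body-type fractions, average price) are the sort of car attributes a downstream demographic model could take as input.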
Fig. 2.
We use all of the cities in counties starting with A, B, and C (shown in purple on the map) to train a model estimating socioeconomic data from car attributes. Using this model, we estimate demographic variables at the zip code level for all of the cities shown in green. We show actual vs. predicted maps for the percentage of Black, Asian, and White people in Seattle, WA (i–iii); the percentage of people with less than a high school degree in Milwaukee, WI (iv); the percentage of people with graduate degrees in Milwaukee, WI (v); and the median household income in Tampa, FL (vi). The ground truth values are mapped on the left, and our estimated results are on the right. We accurately localize zip codes with the highest and lowest concentrations of each demographic variable, such as the three zip codes in Eastern Seattle with high concentrations of Caucasians, one Northern zip code in Milwaukee with highly educated inhabitants, and the least wealthy zip code in Southern Tampa.
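
The train/test split described in this caption, by the first letter of the county name, can be sketched as follows. This is a minimal illustration under assumptions: the record layout, the function name `split_by_county_initial`, and the example cities and counties are hypothetical placeholders, not data from the paper.

```python
def split_by_county_initial(cities, train_initials=("A", "B", "C")):
    """Split city records into train and test sets by county name.

    Cities whose county name starts with one of `train_initials`
    go to the training set; every other city goes to the test set.
    """
    train, test = [], []
    for city in cities:
        initial = city["county"].strip()[0].upper()
        if initial in train_initials:
            train.append(city)
        else:
            test.append(city)
    return train, test


# Hypothetical records. Not real data.
cities = [
    {"city": "city_1", "county": "Adams"},
    {"city": "city_2", "county": "Boone"},
    {"city": "city_3", "county": "Madison"},
    {"city": "city_4", "county": "clark"},
]

train_cities, test_cities = split_by_county_initial(cities)
```

Splitting on an attribute unrelated to the target, such as the county's initial letter, keeps whole cities out of the training set, so the model is evaluated on places it has never seen.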
Fig. 3.
Actual and inferred voting patterns. (A) Panels i and ii map the actual and predicted percentage of people who voted for Barack Obama in the 2008 presidential election (r = 0.74). Panel iii maps the ratio of detected pickup trucks to sedans in the 165 cities in our test set. As can be seen from the map, the ratio is very low in Democratic cities such as those on the East Coast and high in Republican cities such as those in Texas and Wyoming. (B) Actual vs. predicted voter affiliations for various cities in our test set at the precinct level using our full model. Democratic precincts are shown in blue, and Republican precincts are shown in red. Our model correctly classifies Casper, WY as a Republican city and Los Angeles, CA as a Democratic city. We accurately predict that Milwaukee, WI is a Democratic city except for a few Republican precincts on the southern, western, and northeastern borders of the city.
