R package to compute statistics from the American Community Survey (ACS) and Decennial US Census
July 6, 2016 • Sebastian Daza
The acsr package helps extract variables and compute statistics using the American Community Survey and Decennial US Census. It was created for the Applied Population Laboratory (APL) at the University of Wisconsin-Madison.
The functions depend on the acs and data.table packages, so these must be installed before using acsr. The acsr package is hosted in a GitHub repository and can be installed using devtools:
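A minimal installation sketch follows. The repository path sdaza/acsr is an assumption (it is not stated in this post) and should be checked against the package's GitHub page.

```r
# Dependencies from CRAN, plus devtools for the GitHub install
install.packages(c("acs", "data.table", "devtools"))

# acsr from GitHub; the repository path is assumed, not taken from this post
devtools::install_github("sdaza/acsr")

library(acsr)
```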
Remember to set the ACS API key, and check the help documentation for the default values of the acsr functions.
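A short sketch of both steps, using api.key.install from the acs package. The key shown is a placeholder, not a real key.

```r
library(acs)
library(acsr)

# Store the Census API key once; replace the placeholder with your own key
api.key.install(key = "YOUR_API_KEY")

# The help pages list each argument and its default value
?sumacs
?acsdata
```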
The default dataset is acs, the level is state (Wisconsin, state = "WI"), the endyear is 2014, and the confidence level to compute margins of error (MOEs) is 90%.
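To make the 90% default concrete, here is a small base-R sketch of how a margin of error relates to a standard error and how it can be rescaled to another confidence level. The helper names moe_to_se and se_to_moe are hypothetical and not part of acsr. The sketch uses qnorm for the critical value, whereas Census Bureau documentation rounds it to 1.645 for 90%.

```r
# z value for a two-sided interval at the given confidence level
z_value <- function(conf.level) qnorm(1 - (1 - conf.level) / 2)

# Margin of error -> standard error
moe_to_se <- function(moe, conf.level = 0.90) moe / z_value(conf.level)

# Standard error -> margin of error
se_to_moe <- function(se, conf.level = 0.90) se * z_value(conf.level)

moe90 <- 1645                     # made-up 90% margin of error
se    <- moe_to_se(moe90)         # close to 1000
moe95 <- se_to_moe(se, 0.95)      # close to 1960
```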
The acsr functions can extract all the levels available in the acs package. The table below shows the summary and required levels when using the acsdata and sumacs functions:
Summary level | Required levels
county.subdivision | state, county, county.subdivision
tract | state, county, tract
block.group | state, county, tract, block.group
Getting variables and statistics
We can use the sumacs function to extract variables and statistics. We have to specify the corresponding method (e.g., proportion or just variable) and the name of the statistic or variable to be included in the output.
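A sketch of such a call is shown here. The argument names formula, varname, and method, the value "prop", and the formula syntax are my reading of the package interface rather than something stated in this post, so check them against ?sumacs. B17001_001 and B17001_002 are the ACS poverty-status total and below-poverty counts.

```r
library(acsr)

# One statistic (poverty rate as a proportion) and one plain variable (poverty count)
sumacs(formula = c("(b17001_002 / b17001_001 * 100)", "b17001_002"),
       varname = c("poverty_rate", "poverty_count"),   # names used in the output
       method  = c("prop", "variable"),                # one method per formula
       level   = "county")                             # state defaults to "WI"
```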
Downloading the data can be slow, especially when many geographic units are involved (e.g., block groups). In those cases it is better to download the data first with the acsdata function and then pass the result to sumacs as input.
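The two-step pattern might look like the sketch below. The data argument name and the formula syntax are assumptions to verify in ?sumacs and ?acsdata. County level is used only to keep the example small; the same pattern applies to finer levels.

```r
library(acsr)

f <- c("(b17001_002 / b17001_001 * 100)", "b17001_002")

# Step 1: download once and keep the result
d <- acsdata(formula = f, level = "county")

# Step 2: reuse the downloaded data instead of calling the API again
sumacs(formula = f,
       varname = c("poverty_rate", "poverty_count"),
       method  = c("prop", "variable"),
       level   = "county",
       data    = d)
```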
When computing statistics there are two ways to define the standard errors:
Include all standard errors of the variables used to compute a statistic (one.zero = FALSE)
Include all standard errors except those of variables that are equal to zero; among the variables equal to zero, only the maximum standard error is included (one.zero = TRUE)
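The two options above can be sketched in base R for a sum of estimates, where the standard error is the square root of the sum of squared standard errors. The function se_sum and the numbers are made up for illustration and are not acsr internals.

```r
# Standard error of a sum of estimates.
# est and se are numeric vectors of the same length.
se_sum <- function(est, se, one.zero = FALSE) {
  zero <- est == 0
  if (one.zero && any(zero)) {
    # Keep the SEs of nonzero estimates plus only the largest SE among zero estimates
    se <- c(se[!zero], max(se[zero]))
  }
  sqrt(sum(se^2))
}

est <- c(120, 0, 0, 45)   # made-up estimates, two of them equal to zero
se  <- c(30, 12, 9, 20)   # made-up standard errors

all_se  <- se_sum(est, se, one.zero = FALSE)  # uses 30, 12, 9, and 20
zero_se <- se_sum(est, se, one.zero = TRUE)   # uses 30, 20, and 12 only
```

With one.zero = TRUE the result is never larger than with one.zero = FALSE, because some squared terms are dropped.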
Below is an example of estimating proportions with one.zero = FALSE:
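A hedged sketch of such a call; apart from one.zero, the argument names and formula syntax are assumptions to check against ?sumacs. The option only changes the result when some variable entering the formula is estimated as zero.

```r
library(acsr)

# Poverty proportion by county, keeping every standard error in the calculation
sumacs(formula  = "(b17001_002 / b17001_001)",
       varname  = "poverty_prop",
       method   = "prop",
       level    = "county",
       one.zero = FALSE)
```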
When one.zero = TRUE:
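A sketch of the same kind of call with the option switched on; as before, the argument names other than one.zero and the formula syntax are assumptions to check against ?sumacs.

```r
library(acsr)

# Poverty proportion by county; among zero-valued variables only the largest SE is kept
sumacs(formula  = "(b17001_002 / b17001_001)",
       varname  = "poverty_prop",
       method   = "prop",
       level    = "county",
       one.zero = TRUE)
```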
When the square root in the standard error formula is undefined (i.e., the value under the root is negative), the ratio formula is used instead. The ratio adjustment is done variable by variable.
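The fallback can be sketched in base R for a single numerator x and denominator y, using the standard ACS formulas for a proportion (minus sign under the root) and a ratio (plus sign). The function se_prop and the numbers are hypothetical; acsr's own variable-by-variable adjustment is not reproduced here.

```r
# Standard error of p = x / y, falling back to the ratio formula when needed
se_prop <- function(x, y, se.x, se.y) {
  p <- x / y
  under.root <- se.x^2 - p^2 * se.y^2      # proportion formula
  if (under.root < 0) {
    under.root <- se.x^2 + p^2 * se.y^2    # ratio formula: always non-negative
  }
  sqrt(under.root) / y
}

se_a <- se_prop(x = 200, y = 1000, se.x = 30, se.y = 50)  # proportion formula applies
se_b <- se_prop(x = 900, y = 1000, se.x = 30, se.y = 50)  # negative root, ratio formula used
```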
The one.zero option can also make the square root undefined. In those cases, the function again uses the ratio formula to compute standard errors. It is also possible for the standard errors from the ratio formula to be higher than the estimates from the proportion formula without the one.zero option.
Decennial Data from the US Census
Let’s get the African American and Hispanic population by state. In this case, no margins of error are estimated.
The output can be formatted using a wide or long format:
And it can also be exported to a csv file:
Combining geographic levels
We can combine geographic levels using two methods: (1) sumacs and (2) combine.output. The first allows only a single combination; the second allows multiple combinations.
If I want to combine two states (e.g., Wisconsin and Minnesota) I can use:
If I want to put together multiple combinations (e.g., groups of states):
Let’s color a map using poverty by county:
In sum, the acsr package:
Reads formulas directly and extracts any ACS/Census variable
Provides an automated and tailored way to obtain indicators and MOEs
Supports different output formats (wide and long, csv)
Provides an easy way to adjust MOEs to different confidence levels
Includes a variable-by-variable ratio adjustment of standard errors
Includes the zero-option when computing standard errors for proportions, ratios, and aggregations