introductory-walkthrough • precinctsopenelex

library(precinctsopenelex)
library(tidyr)
library(stringr)
library(dplyr, warn.conflicts = FALSE)
library(janitor, warn.conflicts = FALSE)
library(readxl)

This vignette is intended to provide a walk-through of processing a set of precinct results from Saratoga County, NY, for the 2020 U.S. presidential election.

The sample dataset for Saratoga is included with the package to allow for exploration, and provides a template for how the data should be structured for the reshaping functions to operate.

Importing the Wide Data

First we’ll assign a state abbreviation and county name variable needed for the input file string

#chose state abbreviation and county name variable needed for the input file string
current_state <- "NY"
current_county <- "Saratoga"

Now we’ll use our first function from the package to create a name string for the pre-processed “wide” data file saved as Excel. The infile_string() function can help us with this.


infile_string <- create_infile_string(current_state, current_county)
infile_string
#> [1] "NY_Saratoga/NY_Saratoga_GE20_cleaned.xlsx"

Reshaping/Processing the Data

For the purposes of this walkthrough, instead of importing a new file we’ll use a sample dataset included with the package - precinctsampledata_ny - to demonstrate what the pre-processed “wide” data should look like structure-wise when.

Let’s take a look at the sample data:

precinctsampledata_ny %>% 
  head(4) %>% 
  knitr::kable()

precinct	Joseph R. Biden - DEM	Donald J. Trump - REP	Donald J. Trump - CON	Joseph R. Biden - WOR	Howie Hawkins - GRE	Jo Jorgensen - LIB	Brock Pierce - IND	Blanks	Voids
Ballston 1	273	195	16	31	5	8	1	4	1
Ballston 2	485	412	58	18	7	20	6	5	2
Ballston 3	577	502	64	31	3	7	7	7	3
Ballston 4	428	316	48	16	3	11	5	3	4

Note the precise way of formatting the column names that will allow the function to work correctly:
– candidate columns should be written as “Candidate Name - Party Abbreviation” (e.g. “Joe Biden - DEM”)
– additional choices should be listed as “WriteIn” (write-in votes), “Blanks” (undervotes), and “Voids” (overvotes)

Now let’s run the function the package’s main function - reshape_precinct_data() - to transform the dataset into the tidy/long format the OpenElections project needs, along with the correct standardized column names.

The reshape_precinct_data() function wants:
– dataset (or nested import from file function)
– office: text label for office (e.g. “U.S. House”)
– district: text label for district (e.g. “42”; note that statewide offices have have no district, so use "")

Ok, now we’re ready to see the rubber meet the road.

Let’s run reshape_precinct_data() on our sample data:

processed_prez <- reshape_precinct_data(precinctsampledata_ny,
                                  "Presidential",
                                  "")

processed_prez %>% 
  head(10) %>% 
  knitr::kable()

precinct	office	candidate	party	votes
Ballston 1	Presidential	Joseph R. Biden	DEM	273
Ballston 1	Presidential	Donald J. Trump	REP	195
Ballston 1	Presidential	Donald J. Trump	CON	16
Ballston 1	Presidential	Joseph R. Biden	WOR	31
Ballston 1	Presidential	Howie Hawkins	GRE	5
Ballston 1	Presidential	Jo Jorgensen	LIB	8
Ballston 1	Presidential	Brock Pierce	IND	1
Ballston 1	Presidential	Write-Ins		0
Ballston 1	Presidential	Blanks		4
Ballston 1	Presidential	Voids		1

Bingo. We now have the results in the correct format, with the column names also matching the OpenElections naming conventions.

In a real-world scenario, there would be several races to process in addition to presidential: U.S Senate and House, along with state House and Senate. They can be done using the same manner.

As noted earlier, the dataframe fed to the function can a named R object table already imported in your script, or if you prefer you can also nest the import itself inside the first argument, such as this:

# not run:
# processed_prez <- process_ny_data(read_excel(filestring_import, sheet = "presidential"), 
#                                   "President", 
#                                   "")

In the end, once all a county’s races are prepared we’ll stitch them together into a combined file.
There are several ways to accomplish this. For this example we’ll use the code below to capture all dataframes present in the global environment that contain the word processed in their names. Then we’ll append them all into a combined dataframe using the bind_rows() function from dplyr.


target_dfs <- grep("processed", names(.GlobalEnv), value=TRUE)
target_dfs_list <- do.call("list", mget(target_dfs))

processed_combined <- bind_rows(target_dfs_list)

Checking the Data

With the dataset ready, it never hurts to run a few quick counts to check the integrity of what we’ve created.

Let’s do some quick visual inspections to see if anything strange or unexpected stands out, such as districts you don’t expect, or candidate names that shouldn’t be there or were parsed incorrectly.

#check parties
processed_combined %>% 
  count(party) %>% 
  knitr::kable()

party	n
	588
CON	196
DEM	196
GRE	196
IND	196
LIB	196
REP	196
WOR	196


#check districts
processed_combined %>% 
  count(office, district) %>% 
  knitr::kable()

office	district	n
Presidential		1960


#check candidates
processed_combined %>% 
  count(candidate) %>% 
  knitr::kable()

candidate	n
Blanks	196
Brock Pierce	196
Donald J. Trump	392
Howie Hawkins	196
Jo Jorgensen	196
Joseph R. Biden	392
Voids	196
Write-Ins	196

Looks good! In our walkthrough, we’ve now successfully reshaped our precinct data to meet the OpenElections format and standardizations - it’s time to share it.

Exporting the Data

Now we’ll export the results to a csv file. The OpenElections project has a very specific way it prefers its file names to be structured. The function create_outfile_string() provides a shortcut to creating that filename for 2020 general election files. It aims to put the file into a directory also named for the county.

Let’s use the state and county variables we set already at the very top of this walkthrough to feed into create_outfile_string().

outfile_string <- create_outfile_string(current_state, current_county)
outfile_string
#> [1] "NY_Saratoga/20201103__NY__general__saratoga__precinct.csv"

With that in place, we’ll take the final step of exporting the file.

# not run:
# write_csv(processed_combined, outfile_string, na = "")