library(precinctsopenelex)
library(tidyr)
library(stringr)
library(dplyr, warn.conflicts = FALSE)
library(janitor, warn.conflicts = FALSE)
library(readxl)

This vignette walks through processing a set of precinct results from Saratoga County, NY, for the 2020 U.S. presidential election.

The sample dataset for Saratoga is included with the package to allow for exploration, and provides a template for how the data should be structured for the reshaping functions to operate.

Importing the Wide Data

First we’ll assign the state abbreviation and county name variables needed for the input file string.

# choose the state abbreviation and county name used to build the input file string
current_state <- "NY"
current_county <- "Saratoga"

Now we’ll use our first function from the package to create a name string for the pre-processed “wide” data file saved as Excel. The create_infile_string() function can help us with this.


infile_string <- create_infile_string(current_state, current_county)
infile_string
#> [1] "NY_Saratoga/NY_Saratoga_GE20_cleaned.xlsx"
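
For readers curious how such a path can be assembled, here is a minimal base R sketch that reproduces the string above. The helper name sketch_infile_string() and its suffix argument are hypothetical; this is not the package's implementation, and the pattern is inferred only from the single output shown.

```r
# Hypothetical helper: rebuilds the path pattern seen in the output above.
# Not the package's create_infile_string(); the layout is inferred from one example.
sketch_infile_string <- function(state, county, suffix = "GE20_cleaned.xlsx") {
  stem <- paste(state, county, sep = "_")          # e.g. "NY_Saratoga"
  file.path(stem, paste(stem, suffix, sep = "_"))  # directory, then file name
}

sketch_infile_string("NY", "Saratoga")
#> [1] "NY_Saratoga/NY_Saratoga_GE20_cleaned.xlsx"
```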

Reshaping/Processing the Data

For the purposes of this walkthrough, instead of importing a new file we’ll use a sample dataset included with the package, precinctsampledata_ny, to demonstrate how the pre-processed “wide” data should be structured.

Let’s take a look at the sample data:

precinctsampledata_ny %>% 
  head(4) %>% 
  knitr::kable()
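| precinct | Joseph R. Biden - DEM | Donald J. Trump - REP | Donald J. Trump - CON | Joseph R. Biden - WOR | Howie Hawkins - GRE | Jo Jorgensen - LIB | Brock Pierce - IND | WriteIn | Blanks | Voids |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Ballston 1 | 273 | 195 | 16 | 31 | 5 | 8 | 1 | 0 | 4 | 1 |
| Ballston 2 | 485 | 412 | 58 | 18 | 7 | 20 | 6 | 0 | 5 | 2 |
| Ballston 3 | 577 | 502 | 64 | 31 | 3 | 7 | 7 | 0 | 7 | 3 |
| Ballston 4 | 428 | 316 | 48 | 16 | 3 | 11 | 5 | 0 | 3 | 4 |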

Note the precise way of formatting the column names that will allow the function to work correctly:
– candidate columns should be written as “Candidate Name - Party Abbreviation” (e.g. “Joe Biden - DEM”)
– additional choices should be listed as “WriteIn” (write-in votes), “Blanks” (undervotes), and “Voids” (overvotes)
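
Before reshaping, it can help to confirm that the column names follow this layout. The check below is a rough sketch in base R: check_wide_columns() is a hypothetical helper, not part of the package, and it assumes party abbreviations consist only of capital letters.

```r
# Hypothetical helper: returns the column names that do not follow the expected layout.
# Assumes party abbreviations are runs of capital letters (e.g. DEM, REP).
check_wide_columns <- function(col_names) {
  special <- c("precinct", "WriteIn", "Blanks", "Voids")
  # candidate columns must look like "Name - PARTY", with spaces around the hyphen
  is_candidate <- grepl("^.+ - [A-Z]+$", col_names)
  col_names[!(is_candidate | col_names %in% special)]
}

cols <- c("precinct", "Joseph R. Biden - DEM", "Donald J. Trump-REP",
          "WriteIn", "Blanks", "Voids", "Write-Ins")

# "Donald J. Trump-REP" lacks the spaces around the hyphen, and "Write-Ins"
# is not one of the accepted special names, so both are returned.
check_wide_columns(cols)
```

An empty result means every column name matches the expected pattern.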

Now let’s run the package’s main function, reshape_precinct_data(), to transform the dataset into the tidy/long format the OpenElections project needs, along with the correct standardized column names.

The reshape_precinct_data() function wants:
– dataset (or nested import from file function)
– office: text label for office (e.g. “U.S. House”)
– district: text label for district (e.g. “42”; note that statewide offices have no district, so use "")

Ok, now we’re ready to see the rubber meet the road.

Let’s run reshape_precinct_data() on our sample data:

processed_prez <- reshape_precinct_data(precinctsampledata_ny,
                                  "Presidential",
                                  "")

processed_prez %>% 
  head(10) %>% 
  knitr::kable()
| precinct | office | district | candidate | party | votes |
|:---|:---|:---|:---|:---|---:|
| Ballston 1 | Presidential | | Joseph R. Biden | DEM | 273 |
| Ballston 1 | Presidential | | Donald J. Trump | REP | 195 |
| Ballston 1 | Presidential | | Donald J. Trump | CON | 16 |
| Ballston 1 | Presidential | | Joseph R. Biden | WOR | 31 |
| Ballston 1 | Presidential | | Howie Hawkins | GRE | 5 |
| Ballston 1 | Presidential | | Jo Jorgensen | LIB | 8 |
| Ballston 1 | Presidential | | Brock Pierce | IND | 1 |
| Ballston 1 | Presidential | | Write-Ins | | 0 |
| Ballston 1 | Presidential | | Blanks | | 4 |
| Ballston 1 | Presidential | | Voids | | 1 |

Bingo. We now have the results in the correct format, with the column names also matching the OpenElections naming conventions.
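
To make the wide-to-long transformation concrete, here is a rough equivalent written with tidyr and dplyr on a small toy table. It is a sketch of the general technique under stated assumptions, not the internals of reshape_precinct_data(): the toy table, the intermediate column name choice, and the handling of “WriteIn” are illustrative choices based only on the output shown above.

```r
library(tidyr)
library(dplyr, warn.conflicts = FALSE)

# Toy wide table in the same layout as the sample data (two rows, fewer columns).
wide <- tibble::tibble(
  precinct = c("Ballston 1", "Ballston 2"),
  `Joseph R. Biden - DEM` = c(273, 485),
  `Donald J. Trump - REP` = c(195, 412),
  WriteIn = c(0, 0),
  Blanks = c(4, 5)
)

long <- wide %>%
  # one row per precinct/choice pair
  pivot_longer(-precinct, names_to = "choice", values_to = "votes") %>%
  # split "Name - PARTY"; choices without a party get party = NA
  separate(choice, into = c("candidate", "party"), sep = " - ", fill = "right") %>%
  mutate(
    candidate = if_else(candidate == "WriteIn", "Write-Ins", candidate),
    party = coalesce(party, ""),   # assumption: blank rather than NA for no party
    office = "Presidential",
    district = ""
  ) %>%
  select(precinct, office, district, candidate, party, votes)
```

The result has one row per precinct and choice (eight rows here), with the same six columns as the table above.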

In a real-world scenario, there would be several races to process in addition to presidential: U.S. Senate and House, along with state House and Senate. They can be processed in the same manner.

As noted earlier, the dataframe fed to the function can be a named R object already imported in your script, or if you prefer you can nest the import itself inside the first argument, like this:

# not run:
# processed_prez <- reshape_precinct_data(read_excel(infile_string, sheet = "presidential"), 
#                                         "Presidential", 
#                                         "")

In the end, once all of a county’s races are prepared we’ll stitch them together into a combined file.
There are several ways to accomplish this. For this example we’ll use the code below to capture all dataframes present in the global environment that contain the word processed in their names. Then we’ll append them all into a combined dataframe using the bind_rows() function from dplyr.


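# names of all objects in the global environment containing "processed"
target_dfs <- grep("processed", names(.GlobalEnv), value = TRUE)
# skip an earlier combined result so re-running does not duplicate rows
target_dfs <- setdiff(target_dfs, "processed_combined")
# mget() already returns a named list of the matching dataframes
target_dfs_list <- mget(target_dfs, envir = .GlobalEnv)

processed_combined <- bind_rows(target_dfs_list)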
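
As one of the other ways to accomplish this, the per-race results can be kept in an explicit list instead of being looked up by name in the global environment. The sketch below uses toy stand-ins for the processed dataframes; the U.S. House district and vote values are made up for illustration only.

```r
library(dplyr, warn.conflicts = FALSE)

# Toy stand-ins for per-race results; real ones would come from
# reshape_precinct_data(). The House values are invented for illustration.
toy_prez <- tibble::tibble(precinct = "Ballston 1", office = "Presidential",
                           district = "", candidate = "Blanks", party = "", votes = 4)
toy_house <- tibble::tibble(precinct = "Ballston 1", office = "U.S. House",
                            district = "20", candidate = "Blanks", party = "", votes = 6)

# Keep the races in an explicit list, then stack them row-wise.
races <- list(toy_prez, toy_house)
toy_combined <- bind_rows(races)
```

An explicit list makes it obvious which races end up in the combined file, at the cost of having to add each new race by hand.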

Checking the Data

With the dataset ready, it never hurts to run a few quick counts to check the integrity of what we’ve created.

Let’s do some quick visual inspections to see if anything strange or unexpected stands out, such as districts you don’t expect, or candidate names that shouldn’t be there or were parsed incorrectly.

#check parties
processed_combined %>% 
  count(party) %>% 
  knitr::kable()
| party | n |
|:---|---:|
| | 588 |
| CON | 196 |
| DEM | 196 |
| GRE | 196 |
| IND | 196 |
| LIB | 196 |
| REP | 196 |
| WOR | 196 |

#check districts
processed_combined %>% 
  count(office, district) %>% 
  knitr::kable()
| office | district | n |
|:---|:---|---:|
| Presidential | | 1960 |

#check candidates
processed_combined %>% 
  count(candidate) %>% 
  knitr::kable()
| candidate | n |
|:---|---:|
| Blanks | 196 |
| Brock Pierce | 196 |
| Donald J. Trump | 392 |
| Howie Hawkins | 196 |
| Jo Jorgensen | 196 |
| Joseph R. Biden | 392 |
| Voids | 196 |
| Write-Ins | 196 |
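
Visual inspection can be complemented with a couple of programmatic assertions. The sketch below runs on toy long-format data rather than the real results, and the two checks (equal row counts per precinct, no missing or negative votes) are suggestions, not checks the package itself performs.

```r
library(dplyr, warn.conflicts = FALSE)

# Toy long-format data: two precincts, two choices each.
toy_long <- tibble::tibble(
  precinct = c("Ballston 1", "Ballston 1", "Ballston 2", "Ballston 2"),
  candidate = c("Joseph R. Biden", "Blanks", "Joseph R. Biden", "Blanks"),
  votes = c(273, 4, 485, 5)
)

# Every precinct should contribute the same number of rows.
rows_per_precinct <- count(toy_long, precinct)
stopifnot(n_distinct(rows_per_precinct$n) == 1)

# Vote counts should be present and non-negative.
stopifnot(!anyNA(toy_long$votes), all(toy_long$votes >= 0))
```

stopifnot() is silent when every condition holds and raises an error otherwise, which makes it convenient in a processing script.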

Looks good! In our walkthrough, we’ve now successfully reshaped our precinct data to meet the OpenElections format and standardizations - it’s time to share it.

Exporting the Data

Now we’ll export the results to a CSV file. The OpenElections project has a very specific way it prefers its file names to be structured. The function create_outfile_string() provides a shortcut to creating that filename for 2020 general election files. It places the file in a directory named for the state and county.

Let’s use the state and county variables we set already at the very top of this walkthrough to feed into create_outfile_string().

outfile_string <- create_outfile_string(current_state, current_county)
outfile_string
#> [1] "NY_Saratoga/20201103__NY__general__saratoga__precinct.csv"
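
As an illustration of the naming pattern, the base R sketch below reproduces the string above. sketch_outfile_string() and its date argument are hypothetical and are not the package's implementation; the pattern is inferred from this one output, including the lower-cased county name.

```r
# Hypothetical helper: rebuilds the file name pattern seen in the output above.
# Not the package's create_outfile_string(); the layout is inferred from one example.
sketch_outfile_string <- function(state, county, date = "20201103") {
  # file name parts are joined with double underscores; county is lower-cased
  file_name <- paste(date, state, "general", tolower(county), "precinct", sep = "__")
  file.path(paste(state, county, sep = "_"), paste0(file_name, ".csv"))
}

sketch_outfile_string("NY", "Saratoga")
#> [1] "NY_Saratoga/20201103__NY__general__saratoga__precinct.csv"
```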

With that in place, we’ll take the final step of exporting the file.

# not run (write_csv() comes from the readr package):
# readr::write_csv(processed_combined, outfile_string, na = "")
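
Because the output path includes a county directory, that directory has to exist before the file is written; write_csv() does not create missing directories. The sketch below shows the pattern on a toy dataframe written under tempdir(). The toy data and file name are placeholders, not part of the package.

```r
library(readr)

# Toy dataframe with a missing party value, to show the effect of na = "".
toy <- data.frame(precinct = "Ballston 1", party = NA, votes = 4)

# Write under tempdir() so the sketch leaves nothing in the working directory.
out_path <- file.path(tempdir(), "NY_Saratoga", "toy_precinct.csv")

# Create the target directory if needed, then write the file.
dir.create(dirname(out_path), recursive = TRUE, showWarnings = FALSE)
write_csv(toy, out_path, na = "")

# Inspect the written file: a header line followed by one data line.
readLines(out_path)
```

With na = "", missing values are written as empty fields rather than the string "NA".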