VectorSurv provides public health agencies the tools to manage, visualize and analyze the spread of vector-borne diseases and make informed decisions to protect public health.

The ‘vectorsurvR’ package is intended for users of VectorSurv, a public health vector-borne disease surveillance system. The package contains functions tailored to data retrieved from the VectorSurv database. A valid VectorSurv username and password are required for data retrieval. Those without agency access can use the sample datasets in place of real data. This documentation covers the functions in ‘vectorsurvR’ and introduces basic methods of R programming, with the aim of guiding users who have limited programming experience.

To install the package from CRAN (recommended), run:

install.packages("vectorsurvR")

Or install the development version from our GitHub repository by running:


Then load the package for use:

library(vectorsurvR)

Data Retrieval



getToken() returns a token needed to run getArthroCollections() and getPools(). The function prompts users for their Gateway credentials. If credentials are accepted, the function returns a user token needed to obtain data and a list of agencies the user has access to.




token = getToken()



getArthroCollections(...) obtains collections data for a range of years (start_year to end_year). It requires a valid token returned by getToken(). You can only retrieve data from agencies linked to your Gateway account.


getArthroCollections(token, start_year, end_year, arthropod, agency_ids = NULL)


collections = getArthroCollections(token, 2022, 2023, 'mosquito', 55)



getPools(), like getArthroCollections(), obtains pools data for a range of years (start_year to end_year) using a valid token returned by getToken(). getPools() can retrieve data for both mosquito and tick pools.


getPools(token, start_year, end_year, arthropod, agency_ids = NULL)

pools = getPools(token, 2022, 2023, 'mosquito')

Write Data to file

You can save retrieved data as a .csv file in your current directory using write.csv(). That same data can be retrieved using read.csv(). Writing data to a .csv can make the rendering process more efficient when generating reports in R. We recommend that you write the data pulled from our API into a csv and then load that data when generating reports.

#creates a file named "collections_22_23.csv" in your current directory
write.csv(x = collections, file = "collections_22_23.csv")

#loads collections data
collections = read.csv("collections_22_23.csv")
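
As a small, self-contained sketch of this round trip, the example below writes a toy data frame to a temporary file and reads it back. The data frame toy_collections and its values are made up for illustration. Passing row.names = FALSE is an optional choice here: without it, write.csv() also stores the row names, which read.csv() then returns as an extra column named X.

```r
# Toy stand-in for collections data (values are made up for illustration)
toy_collections = data.frame(
  collection_date = c("2022-06-01", "2022-06-08"),
  species_display_name = c("Cx pipiens", "Cx tarsalis"),
  num_count = c(12L, 3L)
)

# Write to a temporary file so nothing in the working directory is overwritten
csv_path = tempfile(fileext = ".csv")
write.csv(x = toy_collections, file = csv_path, row.names = FALSE)

# Read the data back; the column names match the original data frame
toy_reloaded = read.csv(csv_path)
colnames(toy_reloaded)
```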

Sample Data

The ‘vectorsurvR’ package comes with two sample datasets which can be used in place of real collections and pools data. sample_collections and sample_pools will be used for example purposes in this document.

Data Processing

Data can be subset to contain only the columns of interest. Subsetting can also be used to reorder the columns in a data frame. To avoid losing essential columns, do not subset collections or pools data before passing them to the VectorSurv calculator functions. We recommend subsetting after calculations are complete and before passing the result to a table generator. Remember that subsetting, filtering, grouping and summarising do not change the original data unless the result is reassigned to the same variable name. We recommend creating a new variable for processed data.


#Subset using column names or index number

colnames(sample_collections) #displays column names and associated index
#>  [1] "agency_code"          "collection_id"        "collection_date"     
#>  [4] "surv_year"            "species_display_name" "sex_type"            
#>  [7] "trap_acronym"         "trap_problem_bit"     "num_trap"            
#> [10] "trap_nights"          "num_count"            "site_code"

#Subsetting by name
head(sample_collections[c("collection_date", "species_display_name", "num_count")])
#> # A tibble: 6 × 3
#>   collection_date species_display_name num_count
#>   <date>          <chr>                    <int>
#> 1 2016-07-26      Ae nigromaculis             21
#> 2 2016-08-07      Cx tarsalis                  1
#> 3 2016-07-23      Cx pipiens                  83
#> 4 2016-05-21      Ae vexans                   12
#> 5 2016-03-25      Cx pipiens                   1
#> 6 2016-08-13      Ae vexans                    1

#by index
head(sample_collections[c(2, 4, 10)])
#> # A tibble: 6 × 3
#> # Groups:   surv_year [1]
#>   collection_id surv_year trap_nights
#>           <int>     <dbl>       <int>
#> 1       1878571      2016           1
#> 2       1886007      2016           7
#> 3       1874021      2016           1
#> 4       1585849      2016           1
#> 5       1544658      2016           7
#> 6       1891660      2016           7

#to save a subset
collections_subset = sample_collections[c(2, 4, 10)]
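
The point about reassignment can be checked with a short sketch. The data frame toy below is a made-up stand-in, not part of the package. Subsetting it without assignment leaves it unchanged, while assigning the result to a new variable keeps both versions available.

```r
# Toy data frame (made-up values) standing in for collections data
toy = data.frame(
  collection_id = 1:3,
  surv_year = c(2016, 2016, 2017),
  num_count = c(21L, 1L, 83L)
)

# Subsetting without assignment only prints a result; 'toy' is unchanged
toy[c("collection_id", "num_count")]
ncol(toy)         # still 3 columns

# Assigning to a new variable keeps the original data intact
toy_subset = toy[c("collection_id", "num_count")]
ncol(toy_subset)  # 2 columns
```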

Filtering and subsetting in ‘dplyr’

‘dplyr’ is a powerful package for filtering and subsetting data. It follows logic similar to SQL queries.

For more information on data manipulation using ‘dplyr’, see the ‘dplyr’ package documentation.

‘dplyr’ uses the pipe operator %>% to send data into functions. The head() function returns the first few rows of data; specifying head(1) returns only the first row for viewing purposes. Remove head() to see all the data, or assign the result to a new variable.

#NOTE: library was loaded above
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>     filter, lag
#> The following objects are masked from 'package:base':
#>     intersect, setdiff, setequal, union

#Subsetting columns with 'select()'
sample_collections %>%
  dplyr::select(collection_date, species_display_name, num_count) %>% head()
#> Adding missing grouping variables: `surv_year`
#> # A tibble: 6 × 4
#> # Groups:   surv_year [1]
#>   surv_year collection_date species_display_name num_count
#>       <dbl> <date>          <chr>                    <int>
#> 1      2016 2016-07-26      Ae nigromaculis             21
#> 2      2016 2016-08-07      Cx tarsalis                  1
#> 3      2016 2016-07-23      Cx pipiens                  83
#> 4      2016 2016-05-21      Ae vexans                   12
#> 5      2016 2016-03-25      Cx pipiens                   1
#> 6      2016 2016-08-13      Ae vexans                    1
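
The message "Adding missing grouping variables" appears because the data is grouped by surv_year, so select() keeps that column even though it was not requested. A minimal sketch with a made-up grouped tibble (toy_grouped is a hypothetical name) shows one way to return only the requested columns by calling ungroup() first.

```r
library(dplyr)

# Toy data (made-up values), grouped by surv_year like the output above
toy_grouped = tibble(
  surv_year = c(2016, 2016, 2017),
  species_display_name = c("Cx pipiens", "Cx tarsalis", "Cx pipiens"),
  num_count = c(83L, 1L, 5L)
) %>%
  group_by(surv_year)

# select() on grouped data keeps the grouping column
kept = toy_grouped %>%
  select(species_display_name, num_count) %>%
  colnames()

# ungroup() first to return only the requested columns
dropped = toy_grouped %>%
  ungroup() %>%
  select(species_display_name, num_count) %>%
  colnames()
```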

Below are more examples for filtering data.

#filtering with dplyr 'filter'
collections_pip = sample_collections %>%
  filter(species_display_name == "Cx pipiens")

#filtering multiple arguments using '%in%'
collections_pip_tar = sample_collections %>%
  filter(species_display_name %in% c("Cx pipiens", "Cx tarsalis"))
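
Conditions can also be combined inside a single filter() call. The sketch below uses a made-up tibble named toy: conditions separated by commas must all hold, while | keeps rows where either condition holds.

```r
library(dplyr)

# Toy data (made-up values)
toy = tibble(
  surv_year = c(2016, 2016, 2017, 2017),
  species_display_name = c("Cx pipiens", "Cx tarsalis", "Cx pipiens", "Ae vexans"),
  num_count = c(83L, 1L, 5L, 12L)
)

# Conditions separated by commas must all be TRUE (logical AND)
pip_2017 = toy %>%
  filter(species_display_name == "Cx pipiens", surv_year == 2017)

# Use | to keep rows where either condition is TRUE (logical OR)
pip_or_large = toy %>%
  filter(species_display_name == "Cx pipiens" | num_count > 10)
```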

Grouping and Summarising

In addition to filtering and subsetting, data can be grouped by variables and summarised.

#groups by species and collection date and sums the number counted

sample_collections %>%
  group_by(collection_date, species_display_name) %>%
  summarise(sum_count = sum(num_count, na.rm = TRUE)) %>%
  head()
#> `summarise()` has grouped output by 'collection_date'. You can override using
#> the `.groups` argument.
#> # A tibble: 6 × 3
#> # Groups:   collection_date [4]
#>   collection_date species_display_name sum_count
#>   <date>          <chr>                    <int>
#> 1 2016-01-03      Cs inornata                  1
#> 2 2016-01-14      Cs incidens                  2
#> 3 2016-01-14      Cx pipiens                   1
#> 4 2016-01-14      Cx tarsalis                  1
#> 5 2016-01-22      Cx tarsalis                  1
#> 6 2016-02-05      An freeborni                 1

#groups by species and collection date and averages the number counted

sample_collections %>%
  group_by(collection_date, species_display_name) %>%
  summarise(avg_count = mean(num_count, na.rm = TRUE)) %>%
  head()
#> `summarise()` has grouped output by 'collection_date'. You can override using
#> the `.groups` argument.
#> # A tibble: 6 × 3
#> # Groups:   collection_date [4]
#>   collection_date species_display_name avg_count
#>   <date>          <chr>                    <dbl>
#> 1 2016-01-03      Cs inornata                  1
#> 2 2016-01-14      Cs incidens                  2
#> 3 2016-01-14      Cx pipiens                   1
#> 4 2016-01-14      Cx tarsalis                  1
#> 5 2016-01-22      Cx tarsalis                  1
#> 6 2016-02-05      An freeborni                 1
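
The message printed by summarise() mentions the .groups argument. As a brief sketch on made-up data (toy is a hypothetical stand-in), setting .groups = "drop" returns an ungrouped result and silences the message.

```r
library(dplyr)

# Toy data (made-up values)
toy = tibble(
  collection_date = as.Date(c("2016-01-14", "2016-01-14", "2016-01-14", "2016-01-22")),
  species_display_name = c("Cx pipiens", "Cx pipiens", "Cx tarsalis", "Cx tarsalis"),
  num_count = c(1L, 2L, 1L, 4L)
)

# .groups = "drop" removes all grouping from the summarised result
toy_summary = toy %>%
  group_by(collection_date, species_display_name) %>%
  summarise(sum_count = sum(num_count, na.rm = TRUE), .groups = "drop")

is_grouped_df(toy_summary)
```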


Data can be reshaped between long and wide (spreadsheet) forms using pivot_wider() and pivot_longer() from the ‘tidyr’ package. By default, data from the API is in long form. Here we pivot on species and sex type, using num_count as values. The result is a data frame with one column per species and sex combination (named species_sex) holding the num_count values. For more on pivoting, see ?pivot_longer and ?pivot_wider.


collections_wide = pivot_wider(
  sample_collections,
  names_from = c("species_display_name", "sex_type"),
  values_from = "num_count"
)
#> Warning: Values from `num_count` are not uniquely identified; output will contain
#> list-cols.
#> • Use `values_fn = list` to suppress this warning.
#> • Use `values_fn = {summary_fun}` to summarise duplicates.
#> • Use the following dplyr code to identify duplicates.
#>   {data} |>
#>   dplyr::summarise(n = dplyr::n(), .by = c(agency_code, collection_id,
#>   collection_date, surv_year, trap_acronym, trap_problem_bit, num_trap,
#>   trap_nights, site_code, species_display_name, sex_type)) |>
#>   dplyr::filter(n > 1L)
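
The warning suggests supplying values_fn to summarise duplicates. The sketch below does this on a made-up data frame (toy), assuming that adding up duplicate counts is the desired behaviour; whether summing is appropriate depends on the data.

```r
library(tidyr)

# Toy data (made-up values); collection 1 has two rows for female Cx pipiens
toy = data.frame(
  collection_id = c(1L, 1L, 1L, 2L),
  species_display_name = c("Cx pipiens", "Cx pipiens", "Cx tarsalis", "Cx pipiens"),
  sex_type = c("female", "female", "female", "male"),
  num_count = c(3L, 2L, 1L, 4L)
)

toy_wide = pivot_wider(
  toy,
  names_from = c("species_display_name", "sex_type"),
  values_from = "num_count",
  values_fn = sum,   # add up duplicates instead of creating list-columns
  values_fill = 0    # combinations with no observations become 0 instead of NA
)
```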





getAbundance() calculates abundance from any amount of mosquito collections data for the specified parameters, using the methods of the Gateway Abundance calculator.


getAbundance(collections, interval, species = NULL, trap = NULL, separate_by = NULL)


getAbundance(
  sample_collections,
  interval = "Biweek",
  species = c("Cx tarsalis", "Cx pipiens"),
  trap = "CO2",
  separate_by = NULL
)

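
To give a rough idea of what an abundance figure represents, the sketch below computes mosquitoes per trap night on made-up data. This is a simplified illustration under the assumption that abundance is the total count divided by the total number of trap nights (num_trap × trap_nights); it is not the package's implementation, and the names toy, count and total_trap_nights are hypothetical.

```r
library(dplyr)

# Toy data (made-up values)
toy = tibble(
  species_display_name = c("Cx pipiens", "Cx pipiens", "Cx tarsalis"),
  num_trap = c(1L, 2L, 1L),
  trap_nights = c(1L, 2L, 2L),
  num_count = c(10L, 20L, 6L)
)

# Assumed definition: mosquitoes counted per trap night, by species
toy_abundance = toy %>%
  group_by(species_display_name) %>%
  summarise(
    count = sum(num_count),
    total_trap_nights = sum(num_trap * trap_nights),
    .groups = "drop"
  ) %>%
  mutate(abundance = count / total_trap_nights)
```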
Abundance Anomaly (comparison to 5 year average)



getAbundanceAnomaly(...) requires at least five years of mosquito collections data prior to the target_year to calculate the anomaly for the specified parameters. The function uses the methods of the Gateway Abundance Anomaly calculator and will not work if fewer than five years of data are present.


getAbundanceAnomaly(collections, interval, target_year, species = NULL, trap = NULL, separate_by = NULL)


getAbundanceAnomaly(sample_collections,
                    interval = "Biweek",
                    target_year = 2020,
                    species = c("Cx tarsalis", "Cx pipiens"),
                    trap = "CO2",
                    separate_by = "species")

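
The idea of comparing a target year to a five-year average can be sketched with simple arithmetic. The example below assumes the anomaly is expressed as a percent difference from the mean of the five preceding years; that definition, the variable names and the values are illustrative assumptions, not the package's implementation.

```r
# Toy yearly abundance values for one species and interval (made-up numbers)
toy = data.frame(
  surv_year = 2015:2020,
  abundance = c(2, 4, 6, 4, 4, 5)
)
target_year = 2020

# Average over the five years before the target year
prior = toy$abundance[toy$surv_year >= target_year - 5 & toy$surv_year < target_year]
five_year_avg = mean(prior)

# Assumed definition: percent difference of the target year from that average
target = toy$abundance[toy$surv_year == target_year]
percent_difference = (target - five_year_avg) / five_year_avg * 100
```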
Infection Rate



getInfectionRate(...) estimates the arbovirus infection rate based on testing pools of mosquitoes.


getInfectionRate(pools, interval, target_year, target_disease, pt_estimate, scale = 1000, species = c(NULL), trap = c(NULL))


getInfectionRate(sample_pools,
                 interval = "Week",
                 target_disease = "WNV",
                 pt_estimate = "mle",
                 scale = 1000,
                 species = c("Cx pipiens", "Cx tarsalis"),
                 trap = c("CO2"),
                 separate_by = "species",
                 wide = FALSE)
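
To illustrate what estimating an infection rate from pooled tests involves, the sketch below computes two common estimates on made-up pool data: the minimum infection rate (positive pools per 1000 mosquitoes tested) and a maximum likelihood estimate found numerically with optimize(). It assumes a perfect test and independent mosquitoes, and it is a generic illustration rather than the package's own code; pool_size, positive and log_lik are hypothetical names.

```r
# Toy pooled test results (made-up values): pool sizes and outcomes (1 = positive)
pool_size = c(50, 50, 25, 50, 10)
positive  = c(0, 1, 0, 0, 0)

# Minimum infection rate (MIR): positive pools per 1000 mosquitoes tested
mir = sum(positive) / sum(pool_size) * 1000

# Log-likelihood of the per-mosquito infection probability p:
# a pool of size m tests positive with probability 1 - (1 - p)^m
log_lik = function(p) {
  sum(positive * log(1 - (1 - p)^pool_size) +
        (1 - positive) * pool_size * log(1 - p))
}

# Maximum likelihood estimate (MLE), scaled to infections per 1000 mosquitoes
mle = optimize(log_lik, interval = c(1e-8, 0.5), maximum = TRUE, tol = 1e-10)$maximum * 1000
```

With a single positive pool, the MLE is slightly higher than the MIR because a positive pool may contain more than one infected mosquito.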
Vector Index



getVectorIndex(...) calculates the vector index, the relative abundance of infected mosquitoes, which is a way to quickly estimate the risk of arbovirus transmission in an area. The vector index is the product of the abundance and the infection rate for a given time interval: Vector Index = Infection Rate × Abundance


getVectorIndex(collections, pools, interval, target_disease, pt_estimate, species = NULL, trap = NULL)

Arguments
- collections: Collections data retrieved from getArthroCollections(...)
- pools: Pools data retrieved from getPools(...)

Note: Years from pools and collections data must overlap

getVectorIndex(sample_collections, sample_pools,
               interval = "Biweek",
               target_disease = "WNV",
               pt_estimate = "bc-mle",
               species = c("Cx tarsalis"),
               trap = c("CO2"),
               wide = FALSE)

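
The product in the vector index formula can be worked through on made-up numbers. The sketch below assumes the infection rate is reported per 1000 mosquitoes and divides by that scale so the result reads as infected mosquitoes per trap night; that scaling step and the column names are assumptions for illustration, not a description of what getVectorIndex() returns.

```r
# Toy inputs (made-up values) for two biweeks
toy = data.frame(
  biweek = c(12, 13),
  abundance = c(6.0, 3.5),       # mosquitoes per trap night
  infection_rate = c(2.0, 4.0)   # infections per 1000 mosquitoes
)

# Assumed scaling: convert the rate to a proportion before multiplying
scale = 1000
toy$vector_index = toy$abundance * toy$infection_rate / scale
```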
getPoolsComparisionTable() produces a frequency table of positive and negative pool counts by year and species. The more years present in the data, the larger the table.


getPoolsComparisionTable(pools, target_disease, species_separate = FALSE)


getPoolsComparisionTable(
  sample_pools,
  interval = "Week",
  target_disease = "WNV"
)

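
A frequency table of this kind can be approximated with ‘dplyr’ and ‘tidyr’. The sketch below counts positive and negative pools by year and species on made-up data; the data frame toy_pools and its status column are hypothetical and may not match the columns of real pools data or the exact layout the function produces.

```r
library(dplyr)
library(tidyr)

# Toy pools data (made-up values); 'status' is a hypothetical column
toy_pools = data.frame(
  surv_year = c(2016, 2016, 2016, 2017, 2017),
  species_display_name = c("Cx pipiens", "Cx pipiens", "Cx tarsalis",
                           "Cx pipiens", "Cx tarsalis"),
  status = c("Positive", "Negative", "Negative", "Negative", "Positive")
)

# Count pools per year, species and status, then spread status into columns
pool_counts = toy_pools %>%
  count(surv_year, species_display_name, status) %>%
  pivot_wider(names_from = status, values_from = n, values_fill = 0)
```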
Styling Dataframes with ‘kable’

Professional-looking tables can be produced using kable() from the ‘knitr’ package together with the ‘kableExtra’ package.

#> Attaching package: 'kableExtra'
#> The following object is masked from 'package:dplyr':
#>     group_rows

AbAnOutput = getAbundance(
  sample_collections,
  interval = "Biweek",
  species = c("Cx tarsalis", "Cx pipiens"),
  trap = "CO2",
  separate_by = "species")

Table X: Combined biweekly abundance calculation for Cx. tarsalis and Cx. pipiens in CO2 traps
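
As a minimal sketch of how such a table might be styled, the example below passes a made-up data frame to kable() and then to kable_styling() from ‘kableExtra’. The data, caption and styling options are illustrative choices, not the settings used to produce the table above.

```r
library(knitr)
library(kableExtra)

# Toy abundance output (made-up values)
toy_abundance = data.frame(
  Year = c(2016, 2016),
  Biweek = c(12, 13),
  Species = c("Cx pipiens", "Cx tarsalis"),
  Abundance = c(6.0, 3.5)
)

# Build an HTML table, then apply striped rows and a compact width
toy_kable = kable(toy_abundance, format = "html", digits = 2,
                  caption = "Toy abundance table")
toy_table = kable_styling(toy_kable, bootstrap_options = "striped",
                          full_width = FALSE)
```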

Data using ‘datatables’

Interactive, HTML-only tables can be produced using the ‘DT’ package. ‘DT’ tables allow sorting and filtering within a webpage. They are ideal for viewing data but are not compatible with PDF or Word formats.


AbAnOutput %>%
  datatable(colnames = c("Disease Year", "Biweek", "Count", "Species", "Trap Type", "Trap Events", "Abundance"))