---
title: "How the dea_street_names file was made."
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{How the dea_street_names file was made.}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  eval = FALSE,
  comment = "#>"
)
```

"c_cs_alpha.pdf" was downloaded from https://www.deadiversion.usdoj.gov/schedules/orangebook/c_cs_alpha.pdf on 2020-09-08

"DIR-020-17 Drug Slang Code Words.pdf" was downloaded from <https://www.dea.gov/> on 2020-09-08 but the full path (`documents/2017/05/01`) no longer exists. We believe the original file can be found here:
<https://www.dea.gov/sites/default/files/2018-07/DIR-020-17%20Drug%20Slang%20Code%20Words.pdf>. Note that the file name was changed to remove the spaces (which appear as `%20`) in the link above.


```{r}
library(pdftools)
x <- pdf_text("../inst/extdata/DIR-020-17DrugSlangCodeWords.pdf")

sink("slang.txt")
cat(x[2:7])
sink()
```

Opened with Ultra Edit
Edited text file to remove headers
Used Format: Convert Line Terminator to Wrap

+ Added ~ as separator
+ Added ~ to make brand column
+ Removed ®
+ Removed () around generic name
+ Moved brand to brand column

```{r}
library(readr)  # read_delm

s <- read_delim("../inst/extdata/slangUE.txt", "~", escape_double = FALSE,
                col_names = c("category", "brand", "slang"),
                col_types = cols(
                  category = col_character(),
                  brand = col_character(),
                  slang = col_character()),
                  trim_ws = TRUE)

library(tidyr)  # for separate_rows()
library(stringr)  #str_squish
suppressPackageStartupMessages(library(dplyr))  # mutate  %>%

dea_street_names <- separate_rows(s, slang, sep = ";") %>%
  mutate(slang = str_squish(slang)) %>%
  distinct() # Blaze listed twice in Synthetic Cannabinoids

```

```{r}
usethis::use_data(dea_street_names,overwrite = TRUE)
```