---
title: "hR: Better Data Engineering in Human Resources"
author: "Dale Kube"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{hR: Better Data Engineering in Human Resources}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
library(hR)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

The hR package provides methods for data engineering in the human resources (HR) corporate domain. It is designed for HR analytics practitioners and workforce-oriented data sets. This vignette demonstrates two functions, `hierarchy` and `hierarchyStats`, which convert standard employee and supervisor relationship data into formats suited to analytics, summary statistics, and span of control metrics. Install the package from CRAN with `install.packages("hR")`.

## workforceHistory data

The examples in this vignette use the sample `workforceHistory` data set, which reflects an artificial organization's historical workforce and employment data. Below, the sample is reduced to a data.table with one row per active employee or contractor so that the following sections operate on the current hierarchy structure.

```{r workforceHistory}
data("workforceHistory")

# Reduce to DATE <= today to exclude future-dated records
dt = workforceHistory[DATE<=Sys.Date()]

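# Reduce to the latest DATE per person, keeping same-day ties so SEQ can break them
dt = dt[dt[,.I[DATE==max(DATE)],by=.(EMPLID)]$V1]
# Then keep the highest SEQ within that DATE, leaving one row per person
dt = dt[dt[,.I[which.max(SEQ)],by=.(EMPLID,DATE)]$V1]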

# Only consider workers who are currently active
# This provides a reliable 'headcount' data set that reflects today's active workforce
dt = dt[STATUS=="Active"]

# Exclude the CEO because she does not have a supervisor
CEO = dt[TITLE=="CEO",EMPLID]
dt = dt[EMPLID!=CEO]

# Show the prepared table
# This represents an example, active workforce
print(dt[,.(EMPLID,NAME,TITLE,SUPVID)])
```
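
The "latest DATE, then highest SEQ per person" reduction can be illustrated on a toy table. The following is a minimal base R sketch with hypothetical values (the `toy` data frame is not part of `workforceHistory`), shown only to make the intended result concrete.

```r
# Toy history table (hypothetical values).
# EMPLID 1 has two records on its latest DATE; SEQ decides which one is current.
toy <- data.frame(
  EMPLID = c(1, 1, 1, 2, 2),
  DATE   = as.Date(c("2020-01-01", "2021-06-01", "2021-06-01",
                     "2019-03-15", "2020-09-01")),
  SEQ    = c(0, 0, 1, 0, 0),
  TITLE  = c("Analyst", "Analyst", "Senior Analyst", "Engineer", "Manager"),
  stringsAsFactors = FALSE
)

# Sort so the newest DATE, then the highest SEQ, comes first within each person
toy <- toy[order(toy$EMPLID, -as.numeric(toy$DATE), -toy$SEQ), ]

# Keep the first row per EMPLID, i.e. the latest record for that person
latest <- toy[!duplicated(toy$EMPLID), ]
print(latest)
```

In this sketch, EMPLID 1 resolves to the "Senior Analyst" record because it carries the higher SEQ on the shared latest DATE.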


## hierarchy

The `hierarchy` convenience function transforms a standard set of unique employee and supervisor identifiers (employee IDs, email addresses, etc.) into a long or wide format that can be used to aggregate employee data by a particular line of leadership (e.g. include everyone who rolls up to Susan).

When `format = "long"`, the function returns a long data.table with one row per employee for every supervisor above them, up to the top of the tree.

```{r hierarchyLong}
hLong = hierarchy(dt$EMPLID,dt$SUPVID,format="long")
print(hLong)

# Who reports up through Susan? (direct and indirect reports)
print(hLong[Supervisor==CEO])
```
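
To make the long format concrete, the idea of "one row per employee for every supervisor above them" can be sketched in base R on a toy organization. This is an illustration only, not the package's implementation: the IDs are hypothetical, the level numbering (1 = top of the tree) is an assumption, and the loop assumes the data contains no reporting cycles.

```r
# Hypothetical toy org: 1 is the top; 2 and 3 report to 1; 4 reports to 2
ee   <- c(2, 3, 4)
supv <- c(1, 1, 2)

# Lookup from employee ID (as a name) to direct supervisor ID
supvOf <- setNames(supv, ee)

rows <- list()
for (e in ee) {
  chain <- c()
  s <- supvOf[as.character(e)]
  # Walk up the tree until reaching someone with no listed supervisor
  while (!is.na(s)) {
    chain <- c(s[[1]], chain)          # prepend so the top-most supervisor is first
    s <- supvOf[as.character(s)]
  }
  rows[[length(rows) + 1]] <- data.frame(
    Employee   = e,
    Level      = seq_along(chain),     # assumed convention: 1 = top of the tree
    Supervisor = chain
  )
}
toyLong <- do.call(rbind, rows)
print(toyLong)

# Everyone who rolls up to supervisor 1, directly or indirectly
print(toyLong$Employee[toyLong$Supervisor == 1])
```

Because every employee gets a row for each supervisor in their chain, filtering on a single supervisor ID returns both direct and indirect reports.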


When `format = "wide"`, the function returns a wide data.table with a column for every level in the hierarchy, starting from the top of the tree (i.e. "Supv1" is the top person in the hierarchy).

```{r hierarchyWide}
hWide = hierarchy(dt$EMPLID,dt$SUPVID,format="wide")
print(hWide)

# Who reports up through Pablo? (direct and indirect reports)
print(hWide[Supv2==199827])
```
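
The wide layout can be sketched the same way on a toy organization. The following base R example is a sketch under stated assumptions (hypothetical IDs, no reporting cycles) and is not the package's implementation; it only shows how each supervisor chain, ordered from the top, fills the "Supv1", "Supv2", ... columns, with `NA` where a chain is shorter than the deepest one.

```r
# Hypothetical toy org: 1 is the top; 2 and 3 report to 1; 4 reports to 2
ee   <- c(2, 3, 4)
supv <- c(1, 1, 2)
supvOf <- setNames(supv, ee)

# Chain of supervisors for one employee, ordered from the top of the tree down
chainFromTop <- function(e) {
  chain <- c()
  s <- supvOf[as.character(e)]
  while (!is.na(s)) {
    chain <- c(s[[1]], chain)
    s <- supvOf[as.character(s)]
  }
  chain
}

chains <- lapply(ee, chainFromTop)
depth  <- max(lengths(chains))

# Pad shorter chains with NA so every row has one value per level
padded <- do.call(rbind, lapply(chains, function(x) c(x, rep(NA, depth - length(x)))))

toyWide <- data.frame(Employee = ee, padded)
colnames(toyWide)[-1] <- paste0("Supv", seq_len(depth))
print(toyWide)

# Everyone whose second-level supervisor is 2
print(toyWide[which(toyWide$Supv2 == 2), ])
```

One row per employee makes the wide layout convenient for joining back to an employee table, while the long layout is usually easier to filter or aggregate across all levels at once.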

## hierarchyStats

The `hierarchyStats` function computes summary statistics and span of control metrics from a standard set of unique employee and supervisor identifiers (employee IDs, email addresses, etc.). The resulting metrics and table are accessible from a list object.

```{r hierarchyStats}
hStats = hierarchyStats(dt$EMPLID,dt$SUPVID)

# Total Levels:
print(hStats$levelsCount$value)

# Total Individual Contributors:
print(hStats$individualContributorsCount$value)

# Total People Managers:
print(hStats$peopleManagersCount$value)

# Median Direct Reports:
print(hStats$medianDirectReports$value)

# Median Span of Control (Direct and Indirect Reports):
print(hStats$medianSpanOfControl$value)

# Span of Control Table
print(hStats$spanOfControlTable)
```
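
The distinction between direct reports and span of control (direct plus indirect reports) can be checked by hand on a toy organization. This base R sketch uses hypothetical IDs and assumed definitions (a people manager is anyone listed as a supervisor; an individual contributor is an employee who supervises no one); it is not how `hierarchyStats` is implemented and it assumes no reporting cycles.

```r
# Hypothetical toy org: 2 and 3 report to 1; 4 and 5 report to 2
ee   <- c(2, 3, 4, 5)
supv <- c(1, 1, 2, 2)

# Assumed definitions for this sketch
managers <- unique(supv)           # anyone who appears as a supervisor
ics      <- setdiff(ee, supv)      # employees who supervise no one

# Direct reports: count of employees listing each manager as their supervisor
directReports <- vapply(managers, function(m) sum(supv == m), numeric(1))
names(directReports) <- managers

# Span of control: direct reports plus everyone beneath them (recursive count)
spanOf <- function(m) {
  direct <- ee[supv == m]
  length(direct) + sum(vapply(direct, spanOf, numeric(1)))
}
span <- vapply(managers, spanOf, numeric(1))
names(span) <- managers

print(directReports)
print(span)
print(median(directReports))
print(median(span))
```

In this toy tree, manager 1 has two direct reports but a span of control of four, since the two people under manager 2 also roll up to manager 1.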