---
title: "cbc_lm"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{cbc_lm}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(OLStrajr)
```


## Introduction

The OLStrajr package provides the cbc_lm function for obtaining estimates for each case. This vignette will demonstrate its use through an example that utilizes the Robins dataset. This dataset presents the yearly male-to-female ratio for Robins, as documented in  [Birds: incomplete counts—five-minute bird counts Version 1.0](https://www.doc.govt.nz/documents/science-and-technical/inventory-monitoring/im-toolbox-birds-incomplete-five-min-counts.pdf).

```{r}
data(robins)
```


### Preparing Data

While the OLStraj R function works with wide-form data to maintain compatibility with the original OLStraj SAS macro, the cbc_lm function aligns with common R statistical modeling practices by utilizing long-form data. In this example, we employ the tidyr package to transform the Robins dataset into a long format.

Moreover, we adjust the resulting Year variable into a numeric format with a range from 0 to 4 through the following steps: conversion to a factor, coercion to numeric, and subtraction of 1.

```{r}
# Convert to long form
library(tidyr)

robinsL <- robins |> pivot_longer(cols = starts_with("aug"),
                                  names_to = "Year",
                                  values_to = "Ratio")

robinsL$Year = as.numeric(as.factor(robinsL$Year)) - 1
```


### Running the case by case regression

```{r}
robins_mod <- cbc_lm(robinsL, Ratio ~ Year, .case = "site")

# Show class
class(robins_mod)
```


The cbc_lm class includes print, summary, and plot methods. In the following section, we'll take a closer look at the summary method.

### Model Summary

According to [Carrig, Wirth, and Curran (2004)](https://www.tandfonline.com/doi/abs/10.1207/S15328007SEM1101_9), the mean values of the case-by-case OLS intercepts and slopes can act as unbiased estimators of the mean population intercept and slope. In the implementation of case-by-case regression by [Rogosa & Saner (1995)](https://journals.sagepub.com/doi/10.3102/10769986020002149), it is noted that the standard errors are the standard deviations across 4,000 bootstrap replications, and the 90% confidence intervals' endpoints correspond to 5% and 95% values of the empirical distributions obtained from the resampling. They further suggested that more sophisticated and accurate confidence intervals can be developed using the methods in [Efron and Tibshirani (1993)](https://www.taylorfrancis.com/books/mono/10.1201/9780429246593/introduction-bootstrap-bradley-efron-tibshirani)."

By default, the summary method for the cbc_lm class initially displays the mean coefficients, bootstrap standard errors (over 4,000 replications), and bootstrap 95% confidence intervals. In addition, it exhibits the broom::tidy and broom::glance results for each case.

```{r}
summary(robins_mod)
```