%\VignetteEngine{knitr::knitr}
%\VignetteIndexEntry{Chapter 11, Horton et al. using mosaic}
%\VignettePackage{Sleuth3}
\documentclass[11pt]{article}

\usepackage[margin=1in,bottom=.5in,includehead,includefoot]{geometry}
\usepackage{hyperref}
\usepackage{language}
\usepackage{alltt}
\usepackage{fancyhdr}
\pagestyle{fancy}
\fancyhf{}

%% Now begin customising things. See the fancyhdr docs for more info.

\chead{}
\lhead[\sf \thepage]{\sf \leftmark}
\rhead[\sf \leftmark]{\sf \thepage}
\lfoot{}
\cfoot{Statistical Sleuth in R: Chapter 11}
\rfoot{}

\newcounter{myenumi}
\newcommand{\saveenumi}{\setcounter{myenumi}{\value{enumi}}}
\newcommand{\reuseenumi}{\setcounter{enumi}{\value{myenumi}}}

\pagestyle{fancy}

\def\R{{\sf R}}
\def\Rstudio{{\sf RStudio}}
\def\RStudio{{\sf RStudio}}
\def\term#1{\textbf{#1}}
\def\tab#1{{\sf #1}}


\usepackage{relsize}

\newlength{\tempfmlength}
\newsavebox{\fmbox}
\newenvironment{fmpage}[1]
     {
   \medskip
   \setlength{\tempfmlength}{#1}
   \begin{lrbox}{\fmbox}
     \begin{minipage}{#1}
                 \vspace*{.02\tempfmlength}
                 \hfill
           \begin{minipage}{.95 \tempfmlength}}
                 {\end{minipage}\hfill
                 \vspace*{.015\tempfmlength}
                 \end{minipage}\end{lrbox}\fbox{\usebox{\fmbox}}
         \medskip
         }


\newenvironment{boxedText}[1][.98\textwidth]%
{%
\begin{center}
\begin{fmpage}{#1}
}%
{%
\end{fmpage}
\end{center}
}

\newenvironment{boxedTable}[2][tbp]%
{%
\begin{table}[#1]
  \refstepcounter{table}
  \begin{center}
\begin{fmpage}{.98\textwidth}
  \begin{center}
        \sf \large Box~\expandafter\thetable. #2
\end{center}
\medskip
}%
{%
\end{fmpage}
\end{center}
\end{table}             % need to do something about exercises that follow boxedTable
}


\newcommand{\cran}{\href{http://www.R-project.org/}{CRAN}}

\title{The Statistical Sleuth in R: \\
Chapter 11}

\author{
Linda Loi\and Kate Aloisio\and Ruobing Zhang \and Nicholas J. Horton\thanks{Department of Mathematics and Statistics, Smith College, nhorton@smith.edu}
} 

\date{\today}

\begin{document}


\maketitle
\tableofcontents

%\parindent=0pt


<<setup, include=FALSE, cache=FALSE>>=
require(knitr)
opts_chunk$set(
  dev="pdf",
  fig.path="figures/",
        fig.height=3,
        fig.width=4,
        out.width=".47\\textwidth",
        fig.keep="high",
        fig.show="hold",
        fig.align="center",
        prompt=TRUE,  # show the prompts; but perhaps we should not do this
        comment=NA    # turn off commenting of ouput (but perhaps we should not do this either
  )
@


<<pvalues, echo=FALSE, message=FALSE>>=
print.pval = function(pval) {
  threshold = 0.0001
    return(ifelse(pval < threshold, paste("p<", sprintf("%.4f", threshold), sep=""),
                ifelse(pval > 0.1, paste("p=",round(pval, 2), sep=""),
                       paste("p=", round(pval, 3), sep=""))))
}
@

<<setup2,echo=FALSE,message=FALSE>>=
require(Sleuth3)
require(mosaic)
trellis.par.set(theme=col.mosaic())  # get a better color scheme 
set.seed(123)
# this allows for code formatting inline.  Use \Sexpr{'function(x,y)'}, for exmaple.
knit_hooks$set(inline = function(x) {
if (is.numeric(x)) return(knitr:::format_sci(x, 'latex'))
x = as.character(x)
h = knitr:::hilight_source(x, 'latex', list(prompt=FALSE, size='normalsize'))
h = gsub("([_#$%&])", "\\\\\\1", h)
h = gsub('(["\'])', '\\1{}', h)
gsub('^\\\\begin\\{alltt\\}\\s*|\\\\end\\{alltt\\}\\s*$', '', h)
})
showOriginal=FALSE
showNew=TRUE
@ 

\section{Introduction}

This document is intended to help describe how to undertake analyses introduced as examples in the Third Edition of the \emph{Statistical Sleuth} (2013) by Fred Ramsey and Dan Schafer.
More information about the book can be found at \url{http://www.proaxis.com/~panorama/home.htm}.
This
file as well as the associated \pkg{knitr} reproducible analysis source file can be found at
\url{http://www.math.smith.edu/~nhorton/sleuth3}.


This work leverages initiatives undertaken by Project MOSAIC (\url{http://www.mosaic-web.org}), an NSF-funded effort to improve the teaching of statistics, calculus, science and computing in the undergraduate curriculum. In particular, we utilize the 
\pkg{mosaic} package, which was written to simplify the use of R for introductory statistics courses. A short summary of the R needed to teach introductory statistics can be found in the mosaic package vignette (\url{http://cran.r-project.org/web/packages/mosaic/vignettes/MinimalR.pdf}). 

To use a package within R, it must be installed (one time), and loaded (each session). The package can be installed using the following command:
<<install_mosaic,eval=FALSE>>=
install.packages('mosaic')               # note the quotation marks
@
Once this is installed, it can be loaded by running the command:
<<load_mosaic,eval=FALSE>>=
require(mosaic)
@
This
needs to be done once per session.

In addition the data files for the \emph{Sleuth} case studies can be accessed by installing the \pkg{Sleuth3} package.
<<install_Sleuth3,eval=FALSE>>=
install.packages('Sleuth3')               # note the quotation marks
@
<<load_Sleuth3,eval=FALSE>>=
require(Sleuth3)
@

We also set some options to improve legibility of graphs and output.
<<eval=TRUE>>=
trellis.par.set(theme=col.mosaic())  # get a better color scheme for lattice
options(digits=3, show.signif.stars=FALSE)
@

The specific goal of this document is to demonstrate how to calculate the quantities described in Chapter 11: Model Checking and Refinement using R.

\section{Alcohol metabolism in men and women}
How do men and women metabolise alcohol?  This is the question addressed in case study 11.1 in the \emph{Sleuth}.  

\subsection{Data coding, summary statistics and graphical display} 

We begin by reading the data and summarizing the variables.

<<>>=
summary(case1101)
@

A total of \Sexpr{nrow(case1101)} volunteers were included in this data.  There were \Sexpr{nrow(subset(case1101, Sex=="Female"))} females and \Sexpr{nrow(subset(case1101, Sex=="Male"))} males, as recorded on Display 11.1 (page 311 of the \emph{Sleuth}).

The following is a graphical display of the variables akin to Display 11.2 (page 312).
<<fig.height=8, fig.width=8>>=
xyplot(Metabol ~ Gastric | Sex+Alcohol, data=case1101, auto.key=TRUE,
  xlab="Gastric AD activity (mu mol/min/g of tissue)", 
  ylab="first pass metabolism (mmol/liter-hour)")
@

\subsection{Multiple regression}

First we can fit a full model for estimating \emph{metabolism} given a subjects \emph{gastric AD activity}, whether they are \emph{alcoholic} and \emph{gender}.  This first model is summarized on page 321 (Display 11.9).
<<display11.9>>=
case1101 = transform(case1101, Sex = factor(Sex, levels = c("Male", "Female")))
case1101 = transform(case1101, Alcohol = factor(Alcohol, levels = c("Non-alcoholic", "Alcoholic")))
lm1 = lm(Metabol ~ Gastric+Sex+Alcohol+Gastric*Sex+Sex*Alcohol+Gastric*Alcohol+Gastric*Sex*Alcohol, data=case1101); summary(lm1)
@

Next we can calculate a number of model diagnostics, including
leverage, studentized resids and Cook's distance (pages 325-327).
<<message=FALSE>>=
require(MASS)
@
<<>>=
case1101 = transform(case1101, hat = hatvalues(lm1))
case1101 = transform(case1101, studres = studres(lm1))
case1101 = transform(case1101, cooks = cooks.distance(lm1))
# display a particular row
case1101[31,]
@
The following is a residual plot for the full model akin to Display 11.7 (page 319).
<<>>=
plot(lm1, which=1)
@

From these diagnostics it appears that observations 31 and 32 may be influential points.  Therefore, we next re-fit the full model excluding these two observations. The following results are found in Display 11.9 and discussed on page 321.  
<<>>=
case11012 = case1101[-c(31, 32),]
lm2 = lm(Metabol ~ Gastric+Sex+Alcohol+Gastric*Sex+Sex*Alcohol+Gastric*Alcohol+Gastric*Sex*Alcohol, data=case11012); summary(lm2)
@

\subsection{Refining the Model}

This section addresses the process of refining the model.
We first tested the lack of fit for the removal of {\tt Alcohol} as shown in Display 11.13 (page 329).
<<lackoffit>>=
lm3 = lm(Metabol ~ Gastric+Sex+Gastric*Sex, data=case11012); summary(lm3)
anova(lm3, lm2) # page 322
@

Next we assessed a model without an intercept which is scientifically plausible as summarized in Display 11.14 
(page 329).
<<display11.14>>=
lm4 = lm(Metabol ~  Gastric+Gastric:Sex - 1, data=case11012); summary(lm4)
anova(lm4, lm3)
@
Note that the ``Summary of Statistical Findings" section (page 312) is based on this final model.

\section{Blood brain barrier}
Neuroscientists working to better understand the blood brain barrier have infused rats with cells to induce brain tumors.   This is the topic addressed in case study 11.2 in the \emph{Sleuth}.  

\subsection{Data coding and summary statistics}

We begin by reading the data, performing transformations where needed and summarizing the variables.

<<case1102>>=
names(case1102)
case1102 = transform(case1102, Y = Brain/Liver)
case1102 = transform(case1102, logliver = log(Liver))
case1102 = transform(case1102, logbrain = log(Brain))
case1102 = transform(case1102, SAC = as.factor(Time))
case1102 = transform(case1102, logy = log(Brain/Liver))
case1102 = transform(case1102, logtime = log(Time))
case1102 = transform(case1102, Treat = relevel(Treatment, ref="NS"))
summary(case1102)
@

A total of \Sexpr{nrow(case1102)} rats were included in this experiment.  Each rat was given either the barrier solution (n = \Sexpr{nrow(subset(case1102, Treat=="BD"))}) or a normal saline solution (n = \Sexpr{nrow(subset(case1102, Treat=="NS"))}). Then variables of interest were calculated and are displayed in Display 11.4 (page 314 of the \emph{Sleuth}).

We can graphically relationships between the variables using a pairs plot. 
<<fig.height=10, fig.width=10>>=
smallds = subset(case1102, select=c("logy", "logbrain","logliver","Treat", "SAC"))
pairs(smallds)
@

\subsection{Graphical presentation}


The following displays a scatterplot of log ratio (Y) as a function of log time, akin to Display 11.5 on page 315.

<<>>=
xyplot(Y ~ Time, group=Treat, scales=list(y=list(log=TRUE), 
  x=list(log=TRUE)), auto.key=TRUE, data=case1102)
@

The following graphs are akin to the second and third plots in Display 11.16 on page 333.

<<>>=
case1102=transform(case1102, female = ifelse(Sex=="F", 1, 0))
xyplot(logy ~ jitter(female), xlab="Sex", data=case1102)
@

<<>>=
xyplot(logy ~ jitter(Days), data=case1102)
@

\subsection{Multiple regression}

We first fit a model that reflects the initial investigation. This is the proposed model from page 317.
<<fullmodel>>=
lm1 = lm(logy ~ SAC+Treat+SAC*Treat+Days+Sex+ 
  Weight+Loss+Tumor, data=case1102); summary(lm1)
@

We can then display a residual plot to assess the fit of the above model. This is provided in
Display 11.6 (page 318). 
<<>>=
plot(lm1, which=1)
@


\subsection{Refining the model}

Lastly, we fit a refined model. These results can be found in Display 11.17 (page 334).
<<reducedmodel>>=
lm2 = lm(logy ~ SAC+Treat, data=case1102); summary(lm2) 
anova(lm2, lm1)
@
\end{document}