It is a smart idea to evaluate the power of your experimental design before you begin your study. There is no point in running an experiment if the sample size is so small you have no chance of detecting a reasonable effect size.

The analysis will work with a one-way ANOVA, illustrated with three treatments. The user provides the number of groups, the sample size, mean, and variance for each treatment group, as well as the number of random data sets to create.

Initialize and provide input parameters

The number of groups can be changed as needed. Also, you could use a gamma instead of a normal distribution, but you would need to provide shape and scale parameters for each treatment group. AOV model will still assume data are normal. RanN sets the number of replications that will be used in the simulations.

# set.seed <- 57   # for repeatable results
library(ggplot2)

RanN <- 1000     # number of random data sets to create

MeanA <- 100 # mean of treatment group A
sdA <- 20    # standard deviation of treatment group A
RepA <- 15   # number of replicates of treatment group A

MeanB <- 100
sdB <- 20
RepB <- 15

MeanC <- 100
sdC <- 20
RepC <- 15

ParamList <- list(RanN=RanN,
                  MeanA=MeanA,sdA=sdA,RepA=RepA,
                  MeanB=MeanB,sdB=sdB,RepB=RepB,
                  MeanC=MeanC,sdC=sdC,RepC=RepC)

Create data frame

Data are organized in long format, with each row as an observation, and the columns giving the response variable and the treatment assignment (which is read as a factor).

##################################################
# function: DataGen
# create a single data frame
# input: parameter list 
# output: 2-column dataframe, with factor Treatment and response
#------------------------------------------------- 
DataGen <- function(ParamList) {
Trt1 <- rnorm(n=RepA,mean=MeanA,sd=sdA)
Trt2 <- rnorm(n=RepB,mean=MeanB,sd=sdB)
Trt3 <- rnorm(n=RepC,mean=MeanC,sd=sdC)
Trt <- c(rep("A",RepA),rep("B",RepB),rep("C",RepC))
x <- data.frame(Trt,Response=c(Trt1,Trt2,Trt3))
return(x)

}
##################################################

Calculate p-value for ANOVA

Data are fitted to standard AOV model, but glm or others could be substituted here. Note use of unlist function to easily pull out single coefficients from model summaries, which can then be referenced by the quoted name from a simple named vector of elements.

##################################################
# function: getP
# extract p-value from ANOVA
# input: data frame
# output: p-value for ANOVA
#------------------------------------------------- 
getP <- function(dframe) {
pVal <- unlist(summary(aov(dframe$Response~dframe$Trt)))["Pr(>F)1"]
return(pVal)
}
##################################################

Graph output

For a single data set (entered as a data frame; could be real or simulated), this function creates a violin plot, showing all of the data points in black and the treatment means as white diamonds. The p value is taken from the AOV model, which is fitted once. Note code necessary to automatically annotate with model values. The default theme for ggplot is theme_grey, which is needed to adjust the base_size parameter and increase font size of the labels. Default colors are used, but scale_fill_manual could be used to supply a better palette.

##################################################
# function: AnovaPlot
# Creates box plots with data overlays for ANOVA data
# input: 2-column long-form data frame with 1-way layout
# output: ggplot graph
#------------------------------------------------- 
AnovaPlot <- function(z) {
  # pull out p value from model summary
  pVal <- getP(z)
  # create an expression to add to the graph
eqn <- as.character(as.expression(
       substitute(italic(p) == pCon,
                  list(pCon= format(pVal,digits=3)))))
  
  
AnovaFig <- ggplot(data=z, aes(x=Trt,y=Response,fill=Trt))
AnovaFig +  
  geom_violin() +
  geom_point() + 
  theme_gray(base_size = 20) +
  xlab("Treatment") +
  stat_summary(fun.y=mean,geom="point",fill="white",shape=23,size=2.5) +
  annotate("text",label=eqn,parse=TRUE,x=-Inf,y=Inf,hjust=-0.5,vjust=1.5)

}
##################################################

Replicate the model

The SimEngine function replicates the model, with the RanN parameter from the parameter list. Note the single-line use of replicate, which is applied to the DataGen function.

##################################################
# function: SimEngine
# Simulates multiple runs of DataGen and retains p value from each
# input: ParamList, which contains the number of simulations RanN
# output: Vector of p values
#------------------------------------------------- 
SimEngine <- function(z) {

pVec <- replicate(n=z$RanN,expr=getP(DataGen(z)))  

return(pVec)
}
##################################################

Plot results of power test

This function takes as input the vector of p-values that were generated by SimEngine, plots them as a two-color histogram and calculates the proportion that were statistically significant, which is the power of the test for any data set with a non-zero effect size.

##################################################
# function: PowerHistogram
# Plots distribution of p values from simulated data
# input: vector of p values
# output: two-color histogram and calculation of overall power
#------------------------------------------------- 
PowerHistogram <- function(v) {

# create data frame including color split at p <= 0.05
  sigP <- ifelse(v<=0.05,"Sig","NonSig")
  plotData <- data.frame(pVal=v,sig=sigP)
  
  # pull out p value from model summary
  Powerx <- mean(plotData$pVal<=0.05)
    # create an expression to add to the graph
eqn <- as.character(as.expression(
       substitute(Power == Powerx,
                  list(Powerx = format(Powerx,digits=3)))))
  
  PowerPlot <- ggplot(data=plotData,aes(x=pVal,fill=sig))
  PowerPlot + geom_histogram(bins=40,color="black") +
  theme_gray(base_size=20) + 
  xlab("P value") +
  ylab("Frequency") +
  scale_fill_manual(values=c("goldenrod","brown")) +
  xlim(0,1) +
  annotate("text",label=eqn,parse=TRUE,x=Inf,y=Inf,hjust=1.5,vjust=1.5) +
  guides(fill=FALSE)
  
}
##################################################

Typical Use of Functions

The AnovaPlot function can be used to generate (repeatedly) individual data sets and to visualize their distribution and statistical significance. It takes as input the data frame created when DataGen is applied to the ParamList. The PowerHistogram function takes the output from SimEngine based on the ParamList, displays the histogram and the power of the test.

AnovaPlot(DataGen(ParamList))        # Depict sample data set

PowerHistogram(SimEngine(ParamList)) # Replicate model, conduct power test, plot results