---
title: "Correlation and Mixed Effects"
output:
  html_document: default
  html_notebook: default
---


### Introduction

In this tutorial we want to evaluate how the parameters we measure with the MultispeQ relate to crop outputs. We will use the same experiment that we used in the previous tutorial: "ANOVA and multivariate analysis."

However, in this tutorial,the data was transformed in a csv file so we cannot generate the dataframe directly from photosynq.org.

The first step will be to generate a correlation matrix of crop yield against the mean of each PhotosynQ parameter. Next, we will use a mixed effect model to account for factors that affect the PhotosynQ parameters: 1) light intensity, 2) time of measurement, 3) location within the field, etc.

### Import Libraries
```{r}
library("broom", lib.loc="C:/Users/Dan/Documents/R/win-library/3.3")
library("dplyr", lib.loc="C:/Users/Dan/Documents/R/win-library/3.3")
library("data.table", lib.loc="C:/Users/Dan/Documents/R/win-library/3.3")
library("stringr", lib.loc="C:/Users/Dan/Documents/R/win-library/3.3")
library("Hmisc", lib.loc="C:/Users/Dan/Documents/R/win-library/3.3")
library("lme4", lib.loc="C:/Users/Dan/Documents/R/win-library/3.3")

```

### Simple Correlation

We will use the dataset 'sun2.' This dataset contains the mean values of each PhotosynQ parameter and the crop yield for each plot. 


First, we need to open up our data file.
```{r}
# Set the main folder location for your data
your_location = "C:/Users/Dan/Desktop/"
# The main data file, located in the main folder, is called... 
file_name = "sun2.csv"

# Let's pull that data file into R and call it data_raw
my_data = read.csv(str_c(your_location, file_name))
View(my_data)
```

Now, lets generate a correlation matrix.

```{r}
my_data <- my_data[c(3,4,5,6,7)]
head(my_data, 6)

res2 <- rcorr(as.matrix(my_data))
res2
```
The first corrleation matrix presents the correlation coefficients and the second matrix contains the corresponding p values.

There was a signficant correlation between crop yield and SPAD, but crop yield was not significantly correlated with any other PhotosynQ parameter.

### Transforming our data with a Mixed Effect Model

Now, we will use a mixed effect model to generate adjusted coefficients for each PhotosynQ parameter in each unique plot. The mixed effect model will take into account light intensity, the time of measurement, the location within the field (i.e. block or replicate), and each unique plot. Our unique plot, which we generated by using the 'concatenate' function in excel, combines the variety and block to give us the unique plot. For example, Variety A + Block 1 = plot A1.

We will use the dataset 'sun,' which is the same dataset used in the "ANOVA and multivariate analysis tutorial."


First, we need to read our data file
```{r}
# Set the main folder location for your data
your_location = "C:/Users/Dan/Desktop/"
# The main data file, located in the main folder, is called... 
file_name = "sun.csv"

# Let's pull that data file into R and call it data_raw
data = read.csv(str_c(your_location, file_name))
View(data)
```

We need to set block to a categorical variable
```{r}
data$Block <- factor(data$Block, ordered = TRUE)
```


Now we will run a mixed effect model for SPAD. We will set block as a random effect. Plot is the fixed effect whose coefficent we are interested in.

```{r}
SPAD <- lmer(SPAD ~ Plot + (1|Block), data)
summary(SPAD)
SPAD = tidy(SPAD)
setnames(SPAD, "estimate", "SPAD")
write.csv(SPAD, file = "C:/Users/Dan/Desktop/SPAD.csv")
```

For the photosynthesis parameters (Phi2, PhiNPQ, PhiNO, qL, NPQt, and LEF) we will include sqrtPAR and TimeofDay has fixed effects. Again, block is set as a random effect.
```{r}
Phi2 <- lmer(Phi2 ~ sqrtPAR + TimeofDay + Plot + (1|Block), data)
summary(Phi2)
Phi2 = tidy(Phi2)
setnames(Phi2, "estimate", "Phi2")
write.csv(Phi2, file = "C:/Users/Dan/Desktop/Phi2.csv")

PhiNPQ <- lmer(PhiNPQ ~ sqrtPAR + TimeofDay + Plot + (1|Block), data)
summary(PhiNPQ)
PhiNPQ = tidy(PhiNPQ)
setnames(PhiNPQ, "estimate", "PhiNPQ")
write.csv(PhiNPQ, file = "C:/Users/Dan/Desktop/PhiNPQ.csv")

PhiNO <- lmer(PhiNO ~ sqrtPAR + TimeofDay + Plot + (1|Block), data)
summary(PhiNO)
PhiNO = tidy(PhiNO)
setnames(PhiNO, "estimate", "PhiNO")
write.csv(PhiNO, file = "C:/Users/Dan/Desktop/PhiNO.csv")

```

We wrote all of our results into a csv and saved them to our computer. Using these results we will create a new csv file that includes the coefficients of all of our PhotosynQ parameters and crop yield data (you can set the coefficient for plot A1 to 0). We will use this file, taking into account light intensity, time of day, and block, to re-run the correlation matrix we did in the beginning of the tutorial. Hopefully, we will identify more signficant relationships between PhotosynQ parameters and crop yield than we did using plot averages!


### Correlations using our new values

We need to read our new dataset, 'sun3.'
```{r}
# Set the main folder location for your data
your_location = "C:/Users/Dan/Desktop/"
# The main data file, located in the main folder, is called... 
file_name = "sun3.csv"

# Let's pull that data file into R and call it data_raw
data = read.csv(str_c(your_location, file_name))
View(data)
```

Now, we will generate a correlation matrix using crop yield and the coeffcients from the mixed effects model above.
```{r}
data <- data[c(4,5,6,7,8)]
head(data, 6)

res2 <- rcorr(as.matrix(data))
res2

```



After transforming our data using a mixed effects model, the correlations between PhotosynQ parameters and crop yield are much stronger than when using averages. While the SPAD correlation is unchanged, we now see a significantly positve correlation between PhiNPQ and crop yield and a signficantly negative corrleation between PhiNO and crop yield.