---
title: "Case Study Sleep"
author: "Jonathan Herrewijnen"
date: "September 17, 2018"
output: word_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## R Markdown
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.
When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
```{r cars}
summary(cars)
```
## Including Plots
You can also embed plots, for example:
```{r pressure, echo=FALSE}
plot(pressure)
# sleep ships with R's built-in datasets package, so nothing needs to be installed
data(sleep)
head(sleep)
sleep
```
Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.
The data contain two groups. At first it was unclear whether the second group is a control group or a second drug. The help page (`?sleep`) describes the data as "the effect of two soporific drugs (increase in hours of sleep compared to control) on 10 patients". Both groups therefore received a drug. The control is the baseline against which `extra` is measured, not one of the two groups.
## Exercises
Question 1)
Describe the dataset by explaining the variables. If any, what are the replicates and/or controls?
```{r}
summary(sleep)
sleep
?sleep
```
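As the output above shows, the data frame `sleep` contains 3 variables: `extra`, `group` and `ID`.

- `extra`: the increase in hours of sleep of a patient after taking a drug, compared to control (numeric)
- `group`: the drug the patient was given, a factor with 2 levels (drug 1 and drug 2)
- `ID`: the patient ID, a factor with 10 levels (one per patient)

The table contains 20 observations of these 3 variables. Each of the 10 patients was measured once per drug, so the patients are the replicates and the two measurements per patient are paired. There is no separate control group: `extra` is already expressed relative to a control condition.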
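The paired structure can be checked directly. A minimal sketch (the variable name `design` is our own): cross-tabulating `ID` against `group` should give a 1 in every cell if each patient received each drug exactly once.

```r
# Built-in dataset from R's datasets package (attached by default)
data(sleep)
# Cross-tabulate patient ID against drug group
design <- table(sleep$ID, sleep$group)
design
# TRUE if every patient has exactly one measurement per drug (paired design)
all(design == 1)
```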
Question 2)
Describe the performed experiment/ analysis. What was of interest/ what was the research question?
Ten patients each received two different soporific (sleep-inducing) drugs, and for each drug the increase in hours of sleep compared to a control was recorded. The research question is whether the two drugs differ in the amount of extra sleep they give (source: `?sleep`).
Question 3)
Summarize the data statistically.
See the `summary()` and `t.test()` output in the code chunk after Question 4.
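`summary(sleep)` pools both drugs, so it can help to also compute descriptive statistics per drug. A sketch in base R; the choice of statistics and the name `stats_by_group` are our own.

```r
data(sleep)
# Split extra sleep by drug and compute descriptive statistics per group
stats_by_group <- t(sapply(split(sleep$extra, sleep$group), function(x) {
  c(n = length(x), mean = mean(x), sd = sd(x),
    median = median(x), min = min(x), max = max(x))
}))
# One row per drug, one column per statistic
round(stats_by_group, 2)
```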
Question 4)
Are there any clear outliers or missing values? If so, which ones?
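The boxplots in the code chunk below show no points beyond the whiskers, so there are no clear outliers. There are no missing values either: `summary(sleep)` reports no `NA`s.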
```{r}
#QUESTION THREE
# Descriptive statistics of all variables
summary(sleep)
# Paired t-test: every patient received both drugs
with(sleep, t.test(extra[group == 1], extra[group == 2], paired = TRUE))
#QUESTION FOUR
# One boxplot per drug; points beyond the whiskers would be outliers
boxplot(extra ~ group, data = sleep, xlab = "Groups", ylab = "Sleep prolongation (h)")
# Overlay the individual observations on the boxplots
points(extra ~ group, data = sleep, pch = 1)
```
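The visual check can be backed up numerically. A sketch: `is.na()` counts missing values per column, and `boxplot.stats()` returns in `$out` the values that `boxplot()` would draw beyond the whiskers (by default more than 1.5 times the box length from the box). The name `outliers` is our own.

```r
data(sleep)
# Number of missing values in each column
colSums(is.na(sleep))
# Per drug: values beyond the boxplot whiskers (coef = 1.5 by default)
outliers <- lapply(split(sleep$extra, sleep$group), function(x) boxplot.stats(x)$out)
outliers
```

Empty results for both drugs mean that the boxplot rule flags no outliers.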
Question 5)
Summarize the data graphically using only 1 figure. Choose a figure that summarizes the data adequately. Use the graphical package ggplot2. Explain your choice.
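We have one numeric outcome (`extra`) measured for two groups (the drugs), so a boxplot per group is a logical choice: it shows the median, spread and range of both drugs side by side in a single figure and makes any outliers visible.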
```{r}
library(ggplot2)
ggplot(sleep, aes(x=group, y=extra)) + geom_boxplot()
```
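Because each patient received both drugs, one possible refinement (our suggestion, not part of the original answer) is to overlay the individual measurements and connect the two values of each patient, so the same figure also shows the within-patient change. A sketch with ggplot2:

```r
library(ggplot2)
data(sleep)
# Boxplot per drug, the raw points, and one line per patient (grouped by ID)
p <- ggplot(sleep, aes(x = group, y = extra)) +
  geom_boxplot() +
  geom_point() +
  geom_line(aes(group = ID), alpha = 0.4) +
  labs(x = "Drug (group)", y = "Sleep prolongation (h)")
p
```

Lines that consistently slope in the same direction indicate that the difference between the drugs holds within patients, which is what the paired test in Question 10 uses.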
Question 6)
Explore the data: include a check of the distribution/normality and the range of the variables, and mention the number and type of variables and observations.
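```{r}
# Check the distribution of extra sleep per drug with histograms
par(mfrow = c(2, 1)) # two plots stacked vertically
with(sleep, hist(extra[group == 1], main = "Drug 1", xlab = "Extra sleep (h)"))
with(sleep, hist(extra[group == 2], main = "Drug 2", xlab = "Extra sleep (h)"))
```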
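With only 10 observations per drug, the histograms do not show a clear bell-shaped curve, so normality cannot be confirmed visually. The data consist of 20 observations of 3 variables: one numeric variable (`extra`, ranging from -1.6 to 5.5 hours) and two factors (`group` with 2 levels and `ID` with 10 levels).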
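The number and type of variables and the range of `extra` can be read off with base R. The sketch below also draws a normal Q-Q plot per drug, which is what the original chunk comment referred to; points close to the reference line suggest approximate normality. Variable names are our own.

```r
data(sleep)
# Structure: number of observations, number and type of variables
str(sleep)
dim(sleep)
# Range (min, max) of extra sleep per drug
ranges <- lapply(split(sleep$extra, sleep$group), range)
ranges
# Normal Q-Q plot per drug; qqline() draws a line through the first and third quartiles
par(mfrow = c(1, 2))
for (g in levels(sleep$group)) {
  x <- sleep$extra[sleep$group == g]
  qqnorm(x, main = paste("Drug", g))
  qqline(x)
}
```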
Question 7)
Calculate the mean sleep prolongation of the two drugs using the aggregate function on the sleep data frame.
```{r}
#Calculate mean sleep
aggregate(extra ~ group, data = sleep, FUN = mean)
```
The `aggregate()` call above gives a mean sleep prolongation of 0.75 hours for drug 1 and 2.33 hours for drug 2.
Question 8)
Calculate the difference in sleep prolongation based on drug type for each patient.
Store this new dataset in a new data frame including the patient number. Do this by writing a function that takes the sleep data frame as input and returns the new data frame.
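```{r}
# Function: per-patient difference in extra sleep (drug 1 minus drug 2), matched by ID
sleep_difference <- function(df) {
  d1 <- df[df$group == 1, ]; d2 <- df[df$group == 2, ]
  data.frame(ID = d1$ID, difference = d1$extra - d2$extra[match(d1$ID, d2$ID)])
}
(my_dataframe <- sleep_difference(sleep))
```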
See above.
Question 9)
Is the sleep prolongation difference between the drugs (more) normally distributed?
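```{r}
# Shapiro-Wilk test on the pooled extra values of both drugs
shapiro.test(sleep$extra)
# Shapiro-Wilk test on the per-patient differences (drug 1 minus drug 2)
diff_extra <- with(sleep, extra[group == 1] - extra[group == 2])
shapiro.test(diff_extra)
hist(diff_extra)
```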
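The p-value of `shapiro.test(sleep$extra)` is 0.3114. Because this is larger than 0.05, the pooled values do not differ significantly from a normal distribution (a non-significant result does not prove normality, especially with few observations). The question, however, concerns the per-patient differences between the drugs, so normality should be judged on `extra[group == 1] - extra[group == 2]` rather than on the pooled `extra` column.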
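If normality of the differences is doubtful, the Wilcoxon signed-rank test is a non-parametric alternative for paired data that does not assume normality. This test is our addition, not part of the original analysis. A minimal sketch; `exact = FALSE` requests the normal approximation, because an exact p-value cannot be computed when there are zero differences (patient 5 has a difference of 0) or ties.

```r
data(sleep)
# Paired Wilcoxon signed-rank test: drug 1 versus drug 2 within each patient.
# Rows are ordered by ID in both groups, so the pairs line up.
wt <- with(sleep, wilcox.test(extra[group == 1], extra[group == 2],
                              paired = TRUE, exact = FALSE))
wt
```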
Question 10)
Was there a significant difference in sleep prolongation between the drugs? Mention the effect size, is this of a relevant magnitude? Explain why.
Also explain why you chose this specific test.
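```{r}
# Mean extra sleep per drug
ThisData <- data.frame(drug1 = with(sleep, mean(extra[group == 1])),
                       drug2 = with(sleep, mean(extra[group == 2])))
ThisData
# Paired t-test, because every patient received both drugs
with(sleep, t.test(extra[group == 1], extra[group == 2], paired = TRUE))
```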
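The mean extra sleep differs clearly between the drugs: 0.75 h for drug 1 versus 2.33 h for drug 2.
We chose a paired t-test because every patient received both drugs, so the two measurements of a patient are dependent and the drugs should be compared within patients. The test gives a mean difference of -1.58 h (drug 1 minus drug 2) with a p-value of 0.002833, so the difference is significant at the 5% level. The effect size, a mean difference of about 1.6 hours in favour of drug 2, is of a relevant magnitude: drug 2 prolongs sleep roughly three times as much as drug 1 (2.33 h versus 0.75 h). Note that the t-test assumes approximately normally distributed differences (see Question 9).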
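To quantify the effect size beyond the p-value, the sketch below reports the mean difference with its 95% confidence interval from the paired t-test, plus a standardized effect size for paired data, Cohen's d_z (the mean of the differences divided by their standard deviation). Using d_z is our choice; by Cohen's rule of thumb an absolute value of 0.8 or more counts as a large effect. Variable names are our own.

```r
data(sleep)
# Per-patient differences, drug 1 minus drug 2 (rows are ordered by ID in both groups)
d <- with(sleep, extra[group == 1] - extra[group == 2])
# Paired t-test: mean difference and its 95% confidence interval
tt <- with(sleep, t.test(extra[group == 1], extra[group == 2], paired = TRUE))
tt$estimate
tt$conf.int
# Cohen's d_z: mean of the differences divided by their standard deviation
d_z <- mean(d) / sd(d)
round(d_z, 2)
```

For these data d_z is about -1.28; the negative sign only reflects that drug 2 gives more extra sleep than drug 1.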