---
title: "Case Study Sleep"
author: "Jonathan Herrewijnen"
date: "September 17, 2018"
output: word_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.

When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

```{r cars}
summary(cars)
```

## Including Plots

You can also embed plots, for example:

```{r pressure, echo=FALSE}
plot(pressure)

# Deactivated for knitting
#install.packages("data")
data(sleep)
head(sleep)
sleep
```

Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.

The data contain two groups, and the table does not store a separate control group. According to the data description (`?sleep`), `group` is the drug given and `extra` is the increase in hours of sleep compared to control, so we treat the two groups as two drugs tested on the same 10 patients.

//EXERCISES
Question 1)
Describe the dataset by explaining the variables. If any, what are the replicates and/or controls?

```{r}
summary(sleep)
sleep
?sleep
```

The `sleep` table contains 3 variables: `extra`, `group` and `ID` (see the code above).

- `extra` is the number of extra hours of sleep a patient got after taking a drug
- `group` is the drug the patient was given; there are 2 different drugs
- `ID` identifies the patient, with one ID per patient

The table contains 20 observations of 3 variables. Each of the 10 patients appears once per drug, so the patients are the replicates. There is no separate control group in the table: according to `?sleep`, `extra` is already expressed as the increase in sleep compared to control.


Question 2)
Describe the performed experiment/analysis. What was of interest/what was the research question?

Ten patients were each given two sleep-prolonging drugs, and the increase in hours of sleep compared to control was recorded for each drug. The researchers were interested in how much each drug prolongs sleep, in hours, and whether the two drugs differ.

Source: `?sleep`.

Question 3)
Summarize the data statistically.

See the `summary()` and `t.test()` output in the code chunk below, under the QUESTION THREE comment.

Question 4)
Are there any clear outliers or missing values? If so, which ones?

The boxplot shows no points beyond the whiskers, so there are no clear outliers. `summary(sleep)` reports no missing values.
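
The visual check can be backed up numerically. A minimal sketch, assuming the usual 1.5 x IQR whisker rule that `boxplot()` applies through `boxplot.stats()`; the variable name `outliers` is illustrative and not part of the original analysis:

```{r}
# Values beyond the boxplot whiskers (1.5 x IQR rule), listed per group
outliers <- tapply(sleep$extra, sleep$group, function(x) boxplot.stats(x)$out)
outliers

# TRUE if any value in the data frame is missing
anyNA(sleep)
```

An empty result for both groups means the boxplot would draw no outlier points.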

```{r}
# QUESTION THREE
summary(sleep)

# Paired t-test: both drugs were measured on the same 10 patients
with(sleep, t.test(extra[group == 1], extra[group == 2], paired = TRUE))

# Help pages, useful in an interactive session
?sleep
?boxplot

# QUESTION FOUR
# Individual boxplots, one per group
boxplot(with(sleep, extra[group == 1]), main = "Group 1")
boxplot(with(sleep, extra[group == 2]), main = "Group 2")

# Both groups side by side, with the individual observations overlaid
boxplot(sleep$extra ~ sleep$group, xlab = "Groups", ylab = "Sleep prolongation (h)")
points(sleep$extra ~ sleep$group, pch = 1)
```

Question 5)
Summarize the data graphically using only 1 figure. Choose a figure that summarizes the data adequately. Use the graphical package ggplot2. Explain your choice.

We have one numeric variable measured in two groups, so a boxplot is a logical choice for comparing the groups.

```{r}
library(ggplot2)

ggplot(sleep, aes(x = group, y = extra)) + geom_boxplot()
```

Question 6)
Explore the data: include a check for distribution/normality and the range of the variables, and mention the number and type of variables and observations.

```{r}
# Check the distribution of each group with histograms
par(mfrow = c(2, 1)) # stack 2 plots in one figure
with(sleep, hist(extra[group == 1]))
with(sleep, hist(extra[group == 2]))
```

Neither histogram shows a clear bell curve, which is what a normal distribution would produce.
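
With only 10 observations per group, a normal Q-Q plot is often easier to read than a histogram. A minimal sketch using base R's `qqnorm()` and `qqline()`; the loop variable `g`, the helper `old_par` and the plot titles are illustrative choices, not part of the original analysis:

```{r}
# Draw one normal Q-Q plot per group, side by side
old_par <- par(mfrow = c(1, 2))
for (g in levels(sleep$group)) {
  x <- sleep$extra[sleep$group == g]
  qqnorm(x, main = paste("Normal Q-Q plot, group", g))
  qqline(x) # reference line through the first and third quartiles
}
par(old_par) # restore the previous plot layout
```

Points that lie close to the reference line suggest the values are approximately normally distributed.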

Question 7)
Calculate the mean sleep prolongation of the two drugs using the aggregate function on the sleep data frame.

```{r}
# Calculate the mean extra sleep per drug
aggregate(extra ~ group, data = sleep, FUN = mean)
```

See the function above: the results are 0.75 hours and 2.33 hours of extra sleep for drug one and drug two respectively.

Question 8)
Calculate the difference in sleep prolongation based on drug type for each patient.
Store this new dataset in a new data frame including patient number. Do this by writing a function that takes the sleep data frame as its input argument and returns the new data frame.

```{r}
# Per-patient difference (drug 1 minus drug 2); rows are sorted by ID within each group
my_dataframe <- with(sleep, data.frame(ID = ID[group == 1], difference = extra[group == 1] - extra[group == 2]))
my_dataframe
```

See above.
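
The question asks for a function that takes the sleep data frame and returns the new data frame. A minimal sketch of one way to write it; the function name `sleep_difference` and the column name `difference` are hypothetical, and the sign convention (drug 1 minus drug 2) follows the chunk above:

```{r}
# Returns one row per patient: the patient ID and the difference in
# extra sleep between drug 1 and drug 2 (drug 1 minus drug 2).
sleep_difference <- function(data) {
  drug1 <- data[data$group == 1, ]
  drug2 <- data[data$group == 2, ]
  # Sort both subsets by patient ID so the rows line up
  drug1 <- drug1[order(drug1$ID), ]
  drug2 <- drug2[order(drug2$ID), ]
  # Stop if the two subsets do not contain the same patients
  stopifnot(identical(as.character(drug1$ID), as.character(drug2$ID)))
  data.frame(ID = drug1$ID, difference = drug1$extra - drug2$extra)
}

sleep_diff <- sleep_difference(sleep)
sleep_diff
```

Sorting by `ID` inside the function means the result does not depend on the row order of the input.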

Question 9)
Is the sleep prolongation difference between the drugs (more) normally distributed?

```{r}
shapiro.test(sleep$extra)
hist(sleep$extra)
```

The p-value here is 0.3114, which is above 0.05, so the Shapiro-Wilk test does not show a significant deviation from a normal distribution. We have no evidence against normality for `extra`. Note that this test uses all `extra` values together, not the per-patient differences between the drugs.
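
The question is about the difference between the drugs, so the same check can also be run on the per-patient differences. A minimal sketch; the variable names `differences` and `shapiro_diff` are illustrative:

```{r}
# Per-patient difference in extra sleep (drug 1 minus drug 2)
differences <- with(sleep, extra[group == 1] - extra[group == 2])

# Shapiro-Wilk test and histogram on the differences
shapiro_diff <- shapiro.test(differences)
shapiro_diff
hist(differences, main = "Difference in extra sleep", xlab = "Drug 1 minus drug 2 (h)")
```

A p-value below 0.05 here would indicate that the differences deviate from a normal distribution.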


Question 10)
Was there a significant difference in sleep prolongation between the drugs? Mention the effect size, is this of a relevant magnitude? Explain why.

Also explain why you chose this specific test.

```{r}
# Mean extra sleep per drug
ThisData <- with(sleep, data.frame(mean_drug1 = mean(extra[group == 1]), mean_drug2 = mean(extra[group == 2])))
ThisData

# Paired t-test on the two drugs
with(sleep, t.test(extra[group == 1], extra[group == 2], paired = TRUE))
```

The mean extra sleep differs clearly between the drugs (0.75 h versus 2.33 h), which suggests a difference.

The paired t-test gives a mean difference of -1.58 h (drug 1 minus drug 2) with a p-value of 0.002833, so the difference is significant at the 0.05 level. We chose a paired test because both drugs were measured on the same 10 patients.
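
The question also asks for the effect size. A minimal sketch, assuming Cohen's d for paired data (the mean of the per-patient differences divided by their standard deviation) is an acceptable measure; this choice and the variable names are assumptions, not something the assignment specifies:

```{r}
# Per-patient difference in extra sleep (drug 2 minus drug 1)
paired_diff <- with(sleep, extra[group == 2] - extra[group == 1])

# Raw effect size: mean difference in hours
mean_diff <- mean(paired_diff)

# Standardized effect size: Cohen's d for paired samples
cohens_d <- mean_diff / sd(paired_diff)

c(mean_difference = mean_diff, cohens_d = cohens_d)
```

By Cohen's conventional benchmarks a d above 0.8 is called large, and the raw mean difference in hours indicates whether the effect is practically relevant.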