R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

Including Plots

You can also embed plots, for example:

##   extra group ID
## 1   0.7     1  1
## 2  -1.6     1  2
## 3  -0.2     1  3
## 4  -1.2     1  4
## 5  -0.1     1  5
## 6   3.4     1  6
##    extra group ID
## 1    0.7     1  1
## 2   -1.6     1  2
## 3   -0.2     1  3
## 4   -1.2     1  4
## 5   -0.1     1  5
## 6    3.4     1  6
## 7    3.7     1  7
## 8    0.8     1  8
## 9    0.0     1  9
## 10   2.0     1 10
## 11   1.9     2  1
## 12   0.8     2  2
## 13   1.1     2  3
## 14   0.1     2  4
## 15  -0.1     2  5
## 16   4.4     2  6
## 17   5.5     2  7
## 18   1.6     2  8
## 19   4.6     2  9
## 20   3.4     2 10

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.

The data contain two groups. According to the dataset documentation (?sleep), both groups correspond to a soporific drug, and extra is the increase in hours of sleep compared to a control measurement. Neither group is therefore a control group itself; the same 10 patients appear in both groups.

EXERCISES

Question 1) Describe the dataset by explaining the variables. What are the replicates and/or controls, if any?

summary(sleep)
##      extra        group        ID   
##  Min.   :-1.600   1:10   1      :2  
##  1st Qu.:-0.025   2:10   2      :2  
##  Median : 0.950          3      :2  
##  Mean   : 1.540          4      :2  
##  3rd Qu.: 3.400          5      :2  
##  Max.   : 5.500          6      :2  
##                          (Other):8
sleep
##    extra group ID
## 1    0.7     1  1
## 2   -1.6     1  2
## 3   -0.2     1  3
## 4   -1.2     1  4
## 5   -0.1     1  5
## 6    3.4     1  6
## 7    3.7     1  7
## 8    0.8     1  8
## 9    0.0     1  9
## 10   2.0     1 10
## 11   1.9     2  1
## 12   0.8     2  2
## 13   1.1     2  3
## 14   0.1     2  4
## 15  -0.1     2  5
## 16   4.4     2  6
## 17   5.5     2  7
## 18   1.6     2  8
## 19   4.6     2  9
## 20   3.4     2 10
?sleep
## starting httpd help server ... done

As the output above shows, the table “sleep” contains 3 variables: extra, group, and ID.
- “extra” is the increase in hours of sleep for a patient who took a given drug
- “group” is the drug the patient was given (2 different drugs)
- “ID” identifies the patient; each ID occurs once per drug

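The table contains 20 observations of 3 variables. Each of the 10 patients was measured under both drugs, so the patients act as the replicates. There is no separate control group in the table, because extra is already expressed relative to a control measurement (see ?sleep).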
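The counts and variable types can be checked directly in R. The following is a minimal sketch that assumes only the built-in `sleep` data frame from the datasets package; the object names (`dims`, `types`, `per_patient`, `id_by_group`) are our own.

```r
# Structure check for the built-in sleep data (datasets package).
dims <- dim(sleep)                           # rows (observations), columns (variables)
types <- sapply(sleep, class)                # class of each variable
per_patient <- table(sleep$ID)               # number of rows per patient ID
id_by_group <- table(sleep$ID, sleep$group)  # rows per patient and drug

print(dims)
print(types)
print(per_patient)
print(id_by_group)
```

If every cell of `id_by_group` equals 1, each patient was measured exactly once under each drug, which is what makes a paired analysis appropriate.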

Question 2) Describe the performed experiment/analysis. What was of interest, i.e. what was the research question?

Ten patients each received two different soporific drugs. For each drug, the increase in hours of sleep compared to a control measurement was recorded. The researchers were interested in how many hours each drug prolonged sleep and whether the two drugs differ in effect.

Source: the dataset documentation, ?sleep.

Question 3) Summarize the data statistically.

See the output of summary() and t.test() in the code below.

Question 4) Are there any clear outliers or missing values? If so, which ones?

The boxplots below show no points beyond the whiskers, so there are no clear outliers. summary(sleep) reports no NA values, so there are no missing values either.
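The visual check can be backed up numerically. This is a small sketch, assuming the built-in `sleep` data; `boxplot.stats()` applies the same 1.5 × IQR whisker rule that `boxplot()` uses by default, and the object names are our own.

```r
# Numeric check for outliers and missing values in the built-in sleep data.
# boxplot.stats() flags points beyond 1.5 * IQR from the hinges,
# the same default rule that boxplot() uses to draw outliers.
outliers_by_group <- tapply(sleep$extra, sleep$group,
                            function(x) boxplot.stats(x)$out)
n_outliers <- sapply(outliers_by_group, length)  # flagged points per group
n_missing <- colSums(is.na(sleep))               # NA count per column

print(n_outliers)
print(n_missing)
```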

require(stats)

#QUESTION THREE
summary(sleep)
##      extra        group        ID   
##  Min.   :-1.600   1:10   1      :2  
##  1st Qu.:-0.025   2:10   2      :2  
##  Median : 0.950          3      :2  
##  Mean   : 1.540          4      :2  
##  3rd Qu.: 3.400          5      :2  
##  Max.   : 5.500          6      :2  
##                          (Other):8
#Simple t.test
with(sleep, t.test(extra[group == 1], extra[group == 2], paired = TRUE))
## 
##  Paired t-test
## 
## data:  extra[group == 1] and extra[group == 2]
## t = -4.0621, df = 9, p-value = 0.002833
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.4598858 -0.7001142
## sample estimates:
## mean of the differences 
##                   -1.58
?sleep
?boxplot

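#QUESTION FOUR
#Individual plots, placed side by side so the second does not overprint the first
boxplot(with(sleep, extra[group == 1]), at = 1, xlim = c(0.5, 2.5))
boxplot(with(sleep, extra[group == 2]), at = 2, add = TRUE)

#Both groups in one call; boxplot() has no paired argument, so it is dropped
boxplot(with(sleep, list(extra[group == 1], extra[group == 2])))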

boxplot(sleep$extra ~ sleep$group, xlab="Groups", ylab="Sleep prolongation (h)")
points(sleep$extra ~ sleep$group, pch = 1)

Question 5) Summarize the data graphically using only 1 figure. Choose a figure that summarizes the data adequately. Use the graphical package ggplot2. Explain your choice.

We have one numeric variable (extra) measured in two groups, so one boxplot per group is a logical choice for comparing the two distributions in a single figure.

library(ggplot2)

ggplot(sleep, aes(x=group, y=extra)) + geom_boxplot()

Question 6) Explore the data. Include a check for distribution/normality and the range of the variables, and mention the number and type of variables and observations.

#Checking the distribution using histograms
par(mfrow=c(2, 1)) #stack 2 plots in one column
with(sleep, hist(extra[group==1]))
with(sleep, hist(extra[group==2]))

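Neither histogram shows a clear bell shape, which is what a normal distribution would produce. With only 10 observations per group, however, a histogram is a weak basis for judging normality.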
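Because histograms of 10 values are hard to read, a normal Q-Q plot per group is a common alternative. The following is a sketch under the assumption that the built-in `sleep` data is used as-is; the Shapiro-Wilk test per group is our own addition as a numeric complement, not something the original analysis ran.

```r
# Normal Q-Q plots per group; points close to the line suggest normality.
op <- par(mfrow = c(1, 2))              # two panels side by side
for (g in levels(sleep$group)) {
  x <- sleep$extra[sleep$group == g]
  qqnorm(x, main = paste("Group", g))
  qqline(x)
}
par(op)                                 # restore the previous plot layout

# Shapiro-Wilk p-value per group as a numeric complement to the plots.
sw_by_group <- tapply(sleep$extra, sleep$group,
                      function(x) shapiro.test(x)$p.value)
print(sw_by_group)
```

A p-value above the chosen significance level (for example 0.05) means the test found no evidence against normality; it does not prove that the data are normal.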

Question 7) Calculate the mean sleep prolongation of the two drugs using the aggregate function on the sleep data frame.

#Calculate mean sleep prolongation per drug
aggregate(extra ~ group, data = sleep, FUN = mean)
##   group extra
## 1     1  0.75
## 2     2  2.33

As the output above shows, the mean sleep prolongation is 0.75 hours for drug 1 and 2.33 hours for drug 2.

Question 8) Calculate the difference in sleep prolongation based on drug type for each patient. Store this new dataset in a new data frame including patient number. Do this by writing a function with as input argument the sleep data frame and as output the new data frame.

my_dataframe <- data.frame(with(sleep, extra[group==1]-extra[group==2]))
my_dataframe
##    with.sleep..extra.group....1....extra.group....2..
## 1                                                -1.2
## 2                                                -2.4
## 3                                                -1.3
## 4                                                -1.3
## 5                                                 0.0
## 6                                                -1.0
## 7                                                -1.8
## 8                                                -0.8
## 9                                                -4.6
## 10                                               -1.4

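See above: each row is one patient (rows 1 to 10 correspond to ID 1 to 10), and the value is the sleep prolongation under drug 1 minus that under drug 2.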
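The question asks for a function that takes the sleep data frame and returns a new data frame including the patient number. One possible sketch is given here; the function name `sleep_differences` and the column name `difference` are hypothetical choices of ours, and the sign convention (drug 1 minus drug 2) follows the code above.

```r
# Hypothetical helper (the name sleep_differences is our own choice).
# Input:  a data frame shaped like sleep (columns extra, group, ID).
# Output: one row per patient with the difference drug 1 minus drug 2.
sleep_differences <- function(df) {
  drug1 <- df[df$group == 1, c("ID", "extra")]
  drug2 <- df[df$group == 2, c("ID", "extra")]
  # Merge on ID so the pairing does not depend on row order.
  both <- merge(drug1, drug2, by = "ID", suffixes = c("_1", "_2"))
  data.frame(ID = both$ID, difference = both$extra_1 - both$extra_2)
}

diff_df <- sleep_differences(sleep)
print(diff_df)
```

Merging on ID rather than subtracting the two vectors positionally is a deliberate choice: the result stays correct even if the rows of the input are shuffled.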

Question 9) Is the sleep prolongation difference between the drugs (more) normally distributed?

shapiro.test(sleep$extra)
## 
##  Shapiro-Wilk normality test
## 
## data:  sleep$extra
## W = 0.94607, p-value = 0.3114
hist(sleep$extra)

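The p-value here is 0.3114, which is above 0.05, so the Shapiro-Wilk test gives no evidence that the data deviate from a normal distribution. We therefore cannot reject normality. Note that this test was run on all 20 values of extra pooled together, not on the per-patient differences between the drugs that the question asks about.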
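To address the question as asked, the same test can be applied to the per-patient differences. This is a minimal sketch that assumes the rows of `sleep` are ordered by ID within each group (as in the printed table), so that positional subtraction pairs the right patients; the names `d` and `sw_diff` are our own.

```r
# Per-patient differences (drug 1 minus drug 2).
# Assumes rows are ordered by ID within each group, as in the built-in data.
d <- with(sleep, extra[group == 1] - extra[group == 2])

# Shapiro-Wilk test and histogram of the differences.
sw_diff <- shapiro.test(d)
print(sw_diff)
hist(d, main = "Difference in extra sleep (drug 1 - drug 2)", xlab = "Hours")
```

The paired t-test assumes that these differences, not the raw values per group, are approximately normal, so this is the more relevant check for the test used in this report.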

Question 10) Was there a significant difference in sleep prolongation between the drugs? Mention the effect size, is this of a relevant magnitude? Explain why.

Also explain why you chose this specific test.

ThisData <- data.frame(with(sleep, mean(extra[group==1])), with(sleep, mean(extra[group==2])))
ThisData
##   with.sleep..mean.extra.group....1...
## 1                                 0.75
##   with.sleep..mean.extra.group....2...
## 1                                 2.33
with(sleep, t.test(extra[group == 1], extra[group == 2], paired = TRUE))
## 
##  Paired t-test
## 
## data:  extra[group == 1] and extra[group == 2]
## t = -4.0621, df = 9, p-value = 0.002833
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.4598858 -0.7001142
## sample estimates:
## mean of the differences 
##                   -1.58

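The group means differ: 0.75 hours for drug 1 versus 2.33 hours for drug 2.

The paired t-test gives a mean difference of -1.58 hours (drug 1 minus drug 2), with a 95% confidence interval from -2.46 to -0.70 and a p-value of 0.002833. The difference is therefore statistically significant at the 5% level. A paired test was chosen because the same 10 patients were measured under both drugs, so the two sets of values are not independent.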
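The question also asks for an effect size. One common choice for paired data is the mean difference divided by the standard deviation of the differences (often called Cohen's d for paired samples). The sketch below is our own addition under that assumption and is not part of the original analysis; all object names are hypothetical.

```r
# Standardized effect size for paired data:
# mean of the differences divided by the SD of the differences.
d <- with(sleep, extra[group == 1] - extra[group == 2])
mean_diff <- mean(d)
cohens_d <- mean_diff / sd(d)

# Cross-check: for a paired t-test, t equals cohens_d * sqrt(n).
t_stat <- unname(t.test(d)$statistic)
print(c(mean_diff = mean_diff, cohens_d = cohens_d, t = t_stat))
```

Whether a mean difference of about 1.6 hours of sleep is of relevant magnitude is a judgment about the application rather than something the statistic decides on its own.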