2. Generating random normal distributions for variables in the dataset
library(tidyverse)
start_weight <- rnorm(n=30, mean=33.5, sd=11)
delta_weight_1 <- rnorm(n=15, mean=2.9, sd=1.8)
delta_weight_2 <- rnorm(n=15, mean=2.9, sd=1.8)
delta_weight <- c(delta_weight_1, delta_weight_2)
treatment <- c(rep('A', 15),rep('B', 15))
3. Formatting into a data frame
Sample sizes, means, and standard deviations were drawn directly from the data set I used in HW6, which came from a dietary treatment experiment testing how diet affects sea star biomass.
S <- data.frame(treatment, start_weight, delta_weight)
S <- mutate(.data = S, end_weight = start_weight + delta_weight)
S
## treatment start_weight delta_weight end_weight
## 1 A 43.822020 0.4835810 44.305601
## 2 A 19.458313 2.3399052 21.798218
## 3 A 26.182849 4.6069136 30.789762
## 4 A 47.302912 1.2927389 48.595651
## 5 A 35.356280 1.6405651 36.996845
## 6 A 37.461265 2.8623616 40.323627
## 7 A 7.697095 1.2452992 8.942394
## 8 A 40.765986 3.8700006 44.635987
## 9 A 28.049277 1.6011701 29.650447
## 10 A 36.183842 1.2791809 37.463023
## 11 A 46.212124 4.1159798 50.328104
## 12 A 40.214683 -0.6190749 39.595608
## 13 A 37.239988 1.6445456 38.884533
## 14 A 48.280210 5.1442362 53.424447
## 15 A 34.378930 3.5900107 37.968940
## 16 B 37.295184 5.0272279 42.322412
## 17 B 41.626548 1.7553283 43.381876
## 18 B 21.064475 1.4363871 22.500862
## 19 B 34.372670 3.5388323 37.911503
## 20 B 24.537295 2.3189837 26.856279
## 21 B 39.288685 1.4995790 40.788264
## 22 B 48.422636 3.2876683 51.710304
## 23 B 26.778920 3.4398655 30.218785
## 24 B 49.383181 6.7228718 56.106053
## 25 B 10.071664 3.7257295 13.797394
## 26 B 33.941488 4.5141678 38.455656
## 27 B 36.502637 2.3957632 38.898400
## 28 B 46.953216 5.2667922 52.220008
## 29 B 44.806838 5.1836046 49.990443
## 30 B 53.561719 5.1833850 58.745104
4. t-test of end_weight between treatments
t.test(data = S, end_weight ~ treatment)
##
## Welch Two Sample t-test
##
## data: end_weight by treatment
## t = -0.6062, df = 27.652, p-value = 0.5493
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
## -11.741229 6.381208
## sample estimates:
## mean in group A mean in group B
## 37.58021 40.26022
Graphical output: boxplot of end_weight by treatment
G <- ggplot(S, aes(x=treatment, y=end_weight)) +
geom_boxplot()
print(G)
Multiple runs with the same parameters produce a fair amount of variability in the result. The random normal draws change for each treatment every time new numbers are generated, so a run sometimes ends with a statistical difference between the treatment groups and sometimes does not, depending on the samples drawn. This happens even though each group is drawn from a separate normal distribution with the same parameters.
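The run-to-run variability described above can be made reproducible by fixing the random seed. The sketch below is only an illustration: `simulate_once` is a hypothetical helper (not part of the original analysis) that wraps the data generation and the t-test on end_weight, and the seed value 101 is arbitrary.

```r
# Hypothetical helper: simulate one experiment with the parameters used above
# and return the p-value of the Welch t-test on end_weight.
simulate_once <- function(n_per_group = 15, mean_a = 2.9, mean_b = 2.9,
                          sd_delta = 1.8) {
  start_weight <- rnorm(n = 2 * n_per_group, mean = 33.5, sd = 11)
  delta_weight <- c(rnorm(n = n_per_group, mean = mean_a, sd = sd_delta),
                    rnorm(n = n_per_group, mean = mean_b, sd = sd_delta))
  treatment <- rep(c("A", "B"), each = n_per_group)
  S <- data.frame(treatment, start_weight, delta_weight)
  S$end_weight <- S$start_weight + S$delta_weight
  t.test(end_weight ~ treatment, data = S)$p.value
}

# Same seed, same random draws, same p-value (the seed value is arbitrary)
set.seed(101)
p_first <- simulate_once()
set.seed(101)
p_repeat <- simulate_once()
identical(p_first, p_repeat)

# Without resetting the seed, every run gives a different p-value;
# the share below 0.05 is the proportion of "significant" runs
p_values <- replicate(1000, simulate_once())
mean(p_values < 0.05)
```

Calling `set.seed()` once at the top of the script would make the printed data frame and test output identical on every run.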
Target parameter = delta_weight. After running this multiple times, the means do not need to differ for a run to give a significant difference (p < 0.05). Because the two groups are sampled independently (although with the same parameters), there is enough sampling variation that a significant difference shows up occasionally, though rarely; these are false positives, expected in roughly 5% of runs at this threshold. Raising the mean weight change for treatment B from 2.9 to between 3.5 and 4.0 results in significant differences between the two treatment groups about 50% of the time. Raising the mean to around 4.5 for treatment B returns a significant difference between groups most of the time, and raising it to 5 gives one almost always. So a difference of about 2 pounds in mean weight change (2.9 versus 5) almost always produces a significant difference between the treatments.
start_weight <- rnorm(n=30, mean=33.5, sd=11)
delta_weight_1 <- rnorm(n=15, mean=2.9, sd=1.5)
delta_weight_2 <- rnorm(n=15, mean=5, sd=1.5)
delta_weight <- c(delta_weight_1, delta_weight_2)
treatment <- c(rep('A', 15),rep('B', 15))
S <- data.frame(treatment, start_weight, delta_weight)
S <- mutate(.data = S, end_weight = start_weight + delta_weight)
t.test(data = S, delta_weight ~ treatment)
##
## Welch Two Sample t-test
##
## data: delta_weight by treatment
## t = -2.6057, df = 27.999, p-value = 0.01452
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
## -3.0315992 -0.3630155
## sample estimates:
## mean in group A mean in group B
## 3.389628 5.086935
G <- ggplot(S, aes(x=treatment, y=end_weight)) +
geom_boxplot()
print(G)
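The "about 50% of the time" style estimates can be checked by repeating the simulation many times and counting how often p < 0.05. This is a minimal sketch, assuming sd = 1.5 and n = 15 per group as in the code above; `power_for_mean_b`, the number of simulations, and the seed are my own illustrative choices.

```r
# Hypothetical helper: proportion of simulated experiments in which the
# Welch t-test on delta_weight is significant, for a given treatment B mean.
power_for_mean_b <- function(mean_b, n_per_group = 15, mean_a = 2.9,
                             sd_delta = 1.5, n_sims = 2000, alpha = 0.05) {
  p_values <- replicate(n_sims, {
    delta_a <- rnorm(n_per_group, mean = mean_a, sd = sd_delta)
    delta_b <- rnorm(n_per_group, mean = mean_b, sd = sd_delta)
    t.test(delta_a, delta_b)$p.value
  })
  mean(p_values < alpha)
}

set.seed(2024)  # arbitrary seed so the table is reproducible
means_b <- c(2.9, 3.5, 4.0, 4.5, 5.0)
power_by_mean <- sapply(means_b, power_for_mean_b)
data.frame(mean_b = means_b, prop_significant = power_by_mean)
```

The first row (equal means) estimates the false-positive rate, and the remaining rows estimate the power of the test at each effect size.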
For this section I will keep mean1 = 2.9 and mean2 = 5, which gave significant results at N = 15 per group, and reduce the sample size. (If the groups had the same mean, the result would be significant only by chance, no matter the N.)
start_weight <- rnorm(n=20, mean=33.5, sd=11)
delta_weight_1 <- rnorm(n=10, mean=2.9, sd=1.5)
delta_weight_2 <- rnorm(n=10, mean=5, sd=1.5)
delta_weight <- c(delta_weight_1, delta_weight_2)
treatment <- c(rep('A', 10),rep('B', 10))
S <- data.frame(treatment, start_weight, delta_weight)
S <- mutate(.data = S, end_weight = start_weight + delta_weight)
t.test(data = S, delta_weight ~ treatment)
##
## Welch Two Sample t-test
##
## data: delta_weight by treatment
## t = -2.0727, df = 17.307, p-value = 0.05344
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
## -2.65579840 0.02177254
## sample estimates:
## mean in group A mean in group B
## 3.248129 4.565142
G <- ggplot(S, aes(x=treatment, y=end_weight)) +
geom_boxplot()
print(G)
Testing it with multiple runs, a sample size of 10 per group seems about as low as we can go and still get significance most of the time. Once the sample size drops below 10 with the current parameters, most runs return p > 0.05.
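One way to check this sample-size threshold is to estimate power at several group sizes, both by simulation and with `power.t.test()` from the stats package. This is a sketch under the same assumptions as the code above (means 2.9 and 5, sd = 1.5); `power_for_n`, the group sizes tried, and the seed are illustrative choices of mine. Note that `power.t.test()` assumes equal variances, so it only approximates the Welch test used here.

```r
# Hypothetical helper: proportion of simulated experiments with p < alpha
# for a given number of sea stars per treatment group.
power_for_n <- function(n_per_group, mean_a = 2.9, mean_b = 5,
                        sd_delta = 1.5, n_sims = 2000, alpha = 0.05) {
  p_values <- replicate(n_sims, {
    delta_a <- rnorm(n_per_group, mean = mean_a, sd = sd_delta)
    delta_b <- rnorm(n_per_group, mean = mean_b, sd = sd_delta)
    t.test(delta_a, delta_b)$p.value
  })
  mean(p_values < alpha)
}

set.seed(2024)  # arbitrary seed so the table is reproducible
group_sizes <- c(5, 8, 10, 15)
sim_power <- sapply(group_sizes, power_for_n)

# Analytic power for a two-sample, two-sided t-test with the same effect size
analytic_power <- sapply(group_sizes, function(n) {
  power.t.test(n = n, delta = 5 - 2.9, sd = 1.5, sig.level = 0.05)$power
})

data.frame(n_per_group = group_sizes,
           simulated = sim_power,
           analytic = analytic_power)
```

Reading down the table shows how quickly the chance of detecting the difference falls off as the groups get smaller.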