Biostatistics

Outside Resources

https://www.learnpicu.com/basic-research-statistics
GraphPad - QuickCalcs: https://www.graphpad.com/quickcalcs/
Sample size calculator: https://www.calculator.net/sample-size-calculator.html
ClinCalc - sample size: https://clincalc.com/stats/samplesize.aspx
Calculate sample size in RCT:
- https://pmc.ncbi.nlm.nih.gov/articles/PMC3256489/
- https://anesthesia.healthsci.mcmaster.ca/wp-content/uploads/2022/08/how-to-calculate-sample-size-in-randomized-controlled-trial.pdf
RiskCalc - sample size calculation: https://riskcalc.org/samplesize/
Biostatistics in Research - Manja V, Lakshminrusimha S. Principles of Use of Biostatistics in Research. Neoreviews. 2014 Apr 1;15(4):e133-e150. doi: 10.1542/neo.15-4-e133. PMID: 26229522; PMCID: PMC4517688.
GraphPad explanations
Numiqo - Understanding statistics
https://researchmethodsresources.nih.gov/methods/grt

Random mixed effect model

Basic statistics using Stata

import excel "-----ADDRESS on COMPUTER-----", sheet("NAME OF SHEET") firstrow clear

set more off = stops Stata from pausing at each "more" prompt, so all the output is displayed without interruption

local variables .... .... .... = stores the list of variables you are interested in under the name "variables" (separate the names with spaces, not commas)

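summarize VARIABLE, detail == Detailed summary for the whole population, including the median (50th percentile) and the IQR (25th and 75th percentiles)

summarize VARIABLE == Short summary for the whole population (number of observations, mean, standard deviation, minimum and maximum), without the median or IQR

bysort GROUP: summarize VARIABLE, detail == Detailed summary with the median and IQR (25th and 75th percentiles), shown separately for GROUP=1 and GROUP=0 (or for each level of GROUP, if it is a categorical variable)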

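swilk VARIABLE = Shapiro-Wilk test (to see whether the variable is normally distributed). A p-value of 0.05 or more means the test found no evidence against normality; a p-value below 0.05 suggests the variable is not normally distributed.

ttest VARIABLE, by(NAME of GROUP VARIABLE) = Student's t-test, to compare a normally distributed continuous variable between two groups.

ranksum VARIABLE, by(NAME of GROUP VARIABLE) = Mann–Whitney U test (Wilcoxon rank-sum test), to compare a continuous variable that is not normally distributed between two groups; the p-value is based on ranks.

generate NAME_of_NEW_VARIABLE = logical expression of interest (the new variable is 1 when the expression is true and 0 when it is false).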

Example:

generate large_weight = birth_weight > 4000 === every baby weighing more than 4000 grams will be a "1" and those weighing 4000 grams or less will be a "0"
generate large_weight2 = birth_weight >= 4000 === every baby weighing 4000 grams or more will be a "1" and those weighing less than 4000 grams will be a "0" (a new name is needed because large_weight already exists after the line above)
generate low_gestation_or_low_birthweight = gestational_age_at_birth_w <=26 | birth_weight <=750 === babies born at 26 weeks or less OR with a birth weight of 750 grams or less will be classified as 1
generate low_gestation_or_low_birthweit2 = gestational_age_at_birth_w <=26 & birth_weight <=750 === babies born at 26 weeks or less AND with a birth weight of 750 grams or less will be classified as 1
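
One caveat with the generate examples above: Stata treats a missing value as larger than any number, so a baby with a missing birth weight would be coded "1" by a "greater than" expression. The sketch below shows one way to guard against this. It builds a tiny made-up dataset (the four weights are illustrative, not real data), and large_weight_safe is a hypothetical variable name.

```stata
* Tiny illustrative dataset: three known weights and one missing value
clear
input birth_weight
3200
4000
4500
.
end

* Without a guard, the missing weight counts as "more than 4000" and is coded 1
generate large_weight = birth_weight > 4000

* With the guard, the baby with a missing weight stays missing (.)
generate large_weight_safe = birth_weight > 4000 if !missing(birth_weight)

* Compare the two versions side by side
list
```

The same "if !missing(...)" guard can be added to the gestational age and birth weight examples.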

tab VAR1 VAR2, exact col = Fisher's exact test, with the distribution of categorical variable 1 by categorical variable 2 and percentages based on the columns

tab VAR1 = frequency distribution of VAR1 in the whole population

logit BINARY VAR1 VAR2..., or ==== Odds ratios from the multiple logistic regression of the outcome (BINARY) on variables 1, 2, 3....

logit death rv_endo_gls, or === "logit" runs the logistic regression and "or" reports odds ratios; the outcome is death (1 or 0, where 1 is yes) and rv_endo_gls is the variable of interest, which here is the RV strain value
reg birth_weight rv_endo_gls i.sex ==== multiple linear regression of a continuous outcome (here birth weight) on the RV strain, adjusted for sex. Sex is a categorical predictor, so it is marked with "i." and must be coded numerically (for example 1=male, 0=female).
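
To try the two regression commands without the study data, here is a minimal sketch that uses Stata's built-in example dataset "auto" (an assumption made only so the block can run anywhere). In that dataset, price and mpg are continuous and foreign is a 0/1 variable, so they stand in for the clinical variables above.

```stata
* Load the example dataset shipped with Stata
sysuse auto, clear

* Linear regression: continuous outcome (price) on a continuous predictor (mpg),
* adjusted for a categorical predictor coded 0/1 (foreign), marked with "i."
regress price mpg i.foreign

* Logistic regression: binary outcome (foreign) on two predictors,
* with the "or" option so that odds ratios are reported
logit foreign mpg weight, or
```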

Loops:

local variables var1 var2 var3 var4....
- Run the "local" line together with the loops below, in the same do-file run; a local is forgotten as soon as that run ends.
foreach var of local variables {
    display "Running summary for variable: `var'"
    summarize `var', detail
}
- The loop above will run the detailed summary for every variable listed in the local "variables".
foreach var of local variables {
    display "Running TTEST for variable: `var'"
    ttest `var', by(GROUP)
}
- The loop above will run the t-test by GROUP for every variable listed in the local "variables".
foreach var of local variables {
    display "Running details by group for variable: `var'"
    bysort GROUP: summarize `var', detail
}
- The loop above will run the detailed summary for every variable listed in the local "variables", but this time for each group of interest.
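
The loops above can be combined with the Shapiro-Wilk test, so that each variable gets the t-test or the Mann–Whitney U test depending on its p-value. This is only a sketch under some assumptions. It uses Stata's built-in "auto" dataset with "foreign" as the group. It applies the 0.05 cut-off from these notes. It also tests normality on the whole sample, which is a simplification, since the t-test assumes normality within each group.

```stata
* Load the example dataset shipped with Stata
sysuse auto, clear

* Variables to compare between the two groups (foreign = 0 or 1)
local variables price mpg weight

foreach var of local variables {
    * Shapiro-Wilk test; swilk leaves its p-value in r(p)
    quietly swilk `var'
    local p = r(p)

    if `p' >= 0.05 {
        display "`var': Shapiro-Wilk p = " %5.3f `p' " -> running ttest"
        ttest `var', by(foreign)
    }
    else {
        display "`var': Shapiro-Wilk p = " %5.3f `p' " -> running ranksum"
        ranksum `var', by(foreign)
    }
}
```

With real data, replace the variable list and "foreign" with your own variables and GROUP, and still look at the distributions before trusting the automatic choice.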

Prepared by Alexie Fonta Holder - August 19, 2025

Giavarina D. Understanding Bland Altman analysis. Biochem Med (Zagreb). 2015 Jun 5;25(2):141-51. doi: 10.11613/BM.2015.015. PMID: 26110027; PMCID: PMC4470095.

Bensken WP, Pieracci FM, Ho VP. Basic Introduction to Statistics in Medicine, Part 1: Describing Data. Surg Infect (Larchmt). 2021 Aug;22(6):590-596. doi: 10.1089/sur.2020.429. PMID: 34270357; PMCID: PMC8851219.

Bensken WP, Ho VP, Pieracci FM. Basic Introduction to Statistics in Medicine, Part 2: Comparing Data. Surg Infect (Larchmt). 2021 Aug;22(6):597-603. doi: 10.1089/sur.2020.430. PMID: 34270362; PMCID: PMC8851223.
