Great available resources from the web regarding basic statistical testing
import excel "-----ADDRESS on COMPUTER-----", sheet("NAME OF SHEET") firstrow clear
set more off = stops Stata from pausing at each "more" prompt, so all of the output is displayed at once
local variables .... .... .... = store the list of variables you are interested in (separate the names with spaces, not commas)
summarize VARIABLE, detail == Summary with Median (25-75 IQR) for the whole population
summarize VARIABLE == Summary without Median (25-75 IQR) for the whole population
bysort GROUP: summarize VARIABLE, detail == Summary with Median (25-75 IQR) for the GROUP=1 and GROUP=0 (or each level of the GROUP, if the group is a category)
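To report the median with its 25th-75th percentiles directly, the values left behind by summarize, detail can be displayed. This is a minimal sketch: it assumes Stata's bundled auto dataset (loaded with sysuse) and uses its variables mpg and foreign as stand-ins for VARIABLE and GROUP; none of these names come from the notes above.

```stata
* Illustration only: auto.dta, mpg and foreign are stand-ins for your own data
sysuse auto, clear

* Whole population: median (25th-75th percentile)
quietly summarize mpg, detail
display "mpg, all cars: median " r(p50) " (IQR " r(p25) "-" r(p75) ")"

* Same summary for each level of the grouping variable
levelsof foreign, local(groups)
foreach g of local groups {
    quietly summarize mpg if foreign == `g', detail
    display "mpg, foreign = `g': median " r(p50) " (IQR " r(p25) "-" r(p75) ")"
}
```

The percentiles are stored in r(p25), r(p50) and r(p75) only when the detail option is used.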
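swilk VARIABLE = Shapiro-Wilk test (to see whether the variable is normally distributed). A p-value of 0.05 or more means normality is not rejected, so the variable can be treated as normally distributed; a p-value below 0.05 suggests a non-normal distribution.
ttest VARIABLE, by(NAME of GROUP VARIABLE) = Student's t-test comparing two groups, for a normally distributed continuous variable.
ranksum VARIABLE, by(NAME of GROUP VARIABLE) = Mann–Whitney U test (Wilcoxon rank-sum test) comparing two groups, for a continuous variable that is not normally distributed; the p-value is based on ranks.
generate NAME_of_NEW_VARIABLE = logical expression of interest (the new variable is 1 where the expression is true and 0 where it is false).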
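The choice between ttest and ranksum can be tied to the swilk result. This is only a sketch of one possible decision rule, not a recommendation: it assumes the bundled auto dataset, with mpg and foreign as stand-ins for VARIABLE and the group variable, and it tests normality on the whole sample rather than within each group.

```stata
* Illustration only: auto.dta, mpg and foreign are stand-ins for your own data
sysuse auto, clear

* Shapiro-Wilk test; keep its p-value before another command overwrites r()
swilk mpg
local p_normal = r(p)

* Hypothetical rule: t-test if normality is not rejected, otherwise rank-sum
if `p_normal' >= 0.05 {
    ttest mpg, by(foreign)
}
else {
    ranksum mpg, by(foreign)
}
```

Run the whole block at once, because the local p_normal is lost as soon as the block finishes running.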
Example:
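generate large_weight = birth_weight > 4000 === every baby weighing more than 4000 grams will be a "1" and those weighing 4000 grams or less will be a "0". Stata treats a missing value as larger than any number, so a baby with a missing birth_weight would also be coded 1; add "if !missing(birth_weight)" at the end of the command to leave those babies missing.
generate large_weight = birth_weight >= 4000 === every baby weighing 4000 grams or more will be a "1" and those weighing less than 4000 grams will be a "0" (this is an alternative to the line above; if large_weight already exists, use a different name).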
generate low_gestation_or_low_birthweight = gestational_age_at_birth_w <=26 | birth_weight <=750 === babies born at 26 weeks or less OR with a birth weight of 750 grams or less will be classified as 1 (all others as 0)
generate low_gestation_or_low_birthweit2 = gestational_age_at_birth_w <=26 & birth_weight <=750 === babies born at 26 weeks or less AND with a birth weight of 750 grams or less will be classified as 1 (all others as 0)
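One thing to watch with generate and comparisons is missing data: Stata treats a missing value as larger than any number, so ">" and ">=" are true for missing observations. The sketch below shows the difference on the bundled auto dataset, where rep78 has some missing values; the dataset and the variable names high_repair_naive and high_repair are assumptions made for illustration only.

```stata
* Illustration only: auto.dta and rep78 are stand-ins for your own data
sysuse auto, clear

* rep78 has some missing values; count them first
count if missing(rep78)

* Without a guard, rep78 >= 4 is also true when rep78 is missing, so those cars get a 1
generate high_repair_naive = rep78 >= 4

* With the guard, observations with a missing rep78 stay missing
generate high_repair = rep78 >= 4 if !missing(rep78)

* Compare the two codings, showing missing values as their own column
tabulate high_repair_naive high_repair, missing
```

The same guard can be added to any of the generate examples above.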
tab VAR1 VAR2, exact col = Fisher's exact test and the distribution of categorical variable 1 by categorical variable 2, with percentages based on the columns
tab VAR1 = frequency distribution of VAR1 in the whole population
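A short worked example of the two tab commands, again assuming the bundled auto dataset. The binary variable heavy and its 3000 lb cut-off are hypothetical and created only for this illustration.

```stata
* Illustration only: auto.dta, weight and foreign are stand-ins for your own data
sysuse auto, clear

* Hypothetical binary variable: 1 if the car weighs more than 3000 lb
generate byte heavy = weight > 3000 if !missing(weight)

* Frequency table of one variable
tabulate heavy

* Two-way table with column percentages and Fisher's exact test
tabulate heavy foreign, exact column

* The exact p-value is also stored for later use
display "Fisher's exact p-value: " r(p_exact)
```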
logit BINARY VAR1 VAR2..., or ==== odds ratios from the multiple logistic regression of the outcome (BINARY) on variables 1, 2, 3....
logit death rv_endo_gls, or === "or" for odds ratio; logit for logistic regression (with a single predictor, as here, it is a simple logistic regression); the outcome is death (1 or 0, where 1 is yes); rv_endo_gls is the variable of interest, which here is the RV strain value
reg birth_weight rv_endo_gls i.sex ==== multiple linear regression of a continuous outcome (here birth weight) and its association with the RV strain, adjusted for sex (sex is a categorical predictor, so it is marked with an "i." prefix, and it must be coded numerically, for example 1=male, 0=female).
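The two regression commands above can be tried on the bundled auto dataset. This is a sketch under that assumption: foreign (0/1), mpg, weight and price are stand-ins for the clinical variables in the notes, and the models are chosen only to show the syntax.

```stata
* Illustration only: auto.dta and its variables are stand-ins for your own data
sysuse auto, clear

* Logistic regression: binary outcome foreign (0/1), predictors mpg and weight,
* reported as odds ratios
logit foreign mpg weight, or

* Linear regression: continuous outcome price, predictor mpg,
* adjusted for the categorical variable foreign (i. prefix)
regress price mpg i.foreign
```

Without the "or" option, logit reports coefficients on the log-odds scale instead of odds ratios.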
Loops:
local variables var1 var2 var3 var4....
foreach var of local variables {
display "Running summary for variable: `var'"
summarize `var', detail
}
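The loop above runs summarize with the detail option for every variable listed in the local macro variables. A local macro only exists while the do-file (or the selected block of lines) is running, so run the "local variables" line together with the loop.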
foreach var of local variables {
display "Running TTEST for variable: `var'"
ttest `var', by(GROUP)
}
The loop above runs the t-test for every variable listed in the local macro variables, comparing the two levels of GROUP.
foreach var of local variables {
display "Running details by group for variable: `var'"
bysort GROUP: summarize `var', detail
}
The loop above runs the detailed summary for every variable listed in the local macro variables, this time separately for each level of GROUP.
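When many variables are tested, the full ttest output for each one gets long. As a possible variation on the loops above, the sketch below prints one compact line per variable using the results that ttest stores in r(). It assumes the bundled auto dataset, with mpg, price and weight as stand-ins for var1 var2 var3 and foreign as a stand-in for GROUP.

```stata
* Illustration only: auto.dta and its variables are stand-ins for your own data
sysuse auto, clear
local variables mpg price weight

foreach var of local variables {
    * Run the t-test silently, then print the mean difference and the two-sided p-value
    quietly ttest `var', by(foreign)
    display "`var'" _col(12) "mean diff = " %9.2f (r(mu_1) - r(mu_2)) _col(40) "p = " %6.4f r(p)
}
```

Here r(mu_1) and r(mu_2) are the means of the first and second group, and r(p) is the two-sided p-value.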