
Office of Surgical Research | Statistical Support

Statistical Support

The OSR provides statistical support to all Department of Surgery members, residents, and medical students supervised within the department. We ask anyone who requests biostatistical support to complete a short survey. Completion of the survey allows us to triage requests and help determine the level of support required. The OSR will contact you after your submission is received and reviewed.

If you require more information or have questions, please contact the OSR Biostatistician.


Why Should a Statistician be Involved in Research Design?

There are many benefits to involving a statistician in the process of research design. The statistician can help define and develop your research question, create surveys and questionnaires, determine the optimal sample size, reduce sampling bias, and help create a well-formatted dataset. The statistician will work with you towards the successful completion of the project.

The following information is provided to help guide researchers through sample size calculations, data collection, and data set submissions. Please note that this information is provided only as a guide and does not replace proper consultation with the biostatistician.

Sample Size Calculation

These questions will be asked when a sample size calculation request is received. The statistician will clarify the questions during your first meeting, but we suggest reviewing them before proceeding with data collection.

  1. What is your desired level of significance? Most researchers use a significance level (alpha) of 0.05 or lower, which corresponds to a confidence level of 95% or higher.
  2. What is your desired power? Most researchers use 80% or higher. Learn more about statistical power.
  3. What test are you going to use? The test you use depends on the type of dependent and independent variables you have. Consult with the statistician if you are not sure.
  4. What is your expected effect size? Learn more about effect size.
  5. What percentage of patients do you estimate will drop out?
  6. What is the allocation ratio between your groups? Sometimes it is easier to recruit more subjects for one group than for another, but unequal group sizes generally increase the total sample size required.

Review the article Power Analysis & Sample Size Estimation to learn more about sample size calculations.
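To show how the answers to these questions combine, here is a minimal Python sketch for one common case: a two-sided comparison of two group means using the normal approximation. The function name, its parameters, and the example values are illustrative assumptions, not the OSR's method. The appropriate formula depends on the test chosen in question 3, so treat this only as a rough guide.

```python
import math
from statistics import NormalDist


def sample_size_two_means(effect_size, alpha=0.05, power=0.80,
                          allocation_ratio=1.0, dropout=0.0):
    """Approximate group sizes for a two-sided comparison of two means.

    effect_size      -- standardized difference between the means (Cohen's d)
    alpha            -- significance level (question 1)
    power            -- desired power (question 2)
    allocation_ratio -- size of group 2 divided by size of group 1 (question 6)
    dropout          -- expected fraction of subjects lost (question 5)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)

    # Normal-approximation formula, assuming equal variances in both groups.
    n1 = (1 + 1 / allocation_ratio) * (z_alpha + z_beta) ** 2 / effect_size ** 2
    n2 = allocation_ratio * n1

    # Inflate both groups so the target size remains after dropout.
    n1 = math.ceil(n1 / (1 - dropout))
    n2 = math.ceil(n2 / (1 - dropout))
    return n1, n2


# Hypothetical inputs: medium effect, 10% dropout, equal groups.
print(sample_size_two_means(0.5, dropout=0.10))
# Same effect with twice as many subjects in group 2.
print(sample_size_two_means(0.5, allocation_ratio=2.0))
```

Because this uses the normal approximation, an exact calculation based on the t distribution will usually give slightly larger numbers.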

Data Submission

Please use the following guidelines when submitting datasets. The statistician will review your spreadsheet for potential errors, but following the data submission guidelines will help save time.

  1. Group similar items in the same column. All items should use a similar format and the same unit of measurement.
  2. Each row should represent a unique subject (e.g. patient).
  3. Check for duplicate columns. If you have two columns with the same name, check for matching entries before you delete a column.
  4. Check for missing data. Was the information not applicable, or was it missing because it was not recorded or because you could not find it? Ideally, do not leave any cell blank. Instead, fill the cell with "not applicable" or "missing."
  5. Check data ranges. Do the values make sense? For example, a negative number recorded for a time or duration indicates an error.
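Some of these checks can be automated before submission. The sketch below uses plain Python and assumes the spreadsheet has already been read into a list of dictionaries, one per row. The column names (patient_id, age_years, surgery_minutes) and the allowed ranges are hypothetical.

```python
def check_dataset(rows, id_field, ranges):
    """Return a list of human-readable problems found in the rows.

    rows     -- list of dicts, one per subject
    id_field -- name of the column holding the unique subject identifier
    ranges   -- dict mapping column name to (lowest, highest) plausible value
    """
    problems = []
    seen_ids = set()
    for row_number, row in enumerate(rows, start=1):
        # Guideline 2: each row should be a unique subject.
        subject = row.get(id_field, "")
        if subject in seen_ids:
            problems.append(f"row {row_number}: duplicate subject id {subject!r}")
        seen_ids.add(subject)

        # Guideline 4: no cell should be left blank.
        for field, value in row.items():
            if value is None or str(value).strip() == "":
                problems.append(f"row {row_number}: blank cell in {field!r}")

        # Guideline 5: numeric values should fall in a plausible range.
        for field, (lowest, highest) in ranges.items():
            try:
                number = float(row.get(field))
            except (TypeError, ValueError):
                continue  # text such as "missing" is not range-checked
            if not lowest <= number <= highest:
                problems.append(f"row {row_number}: {field!r} value {number} out of range")
    return problems


# Hypothetical example data.
rows = [
    {"patient_id": "P01", "age_years": "54", "surgery_minutes": "95"},
    {"patient_id": "P02", "age_years": "", "surgery_minutes": "120"},
    {"patient_id": "P02", "age_years": "61", "surgery_minutes": "-30"},
    {"patient_id": "P03", "age_years": "missing", "surgery_minutes": "80"},
]
problems = check_dataset(rows, "patient_id",
                         {"age_years": (0, 120), "surgery_minutes": (0, 1440)})
for problem in problems:
    print(problem)
```

A script like this only flags candidates for review. Deciding whether a flagged value is an error still requires knowledge of the study.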

 

Watch Your Missing Data

Missing data can occur in almost any dataset. It can arise from the research design or during data collection. In statistical analysis, it is important to understand why data is missing and to handle it accordingly, which helps minimize bias and preserve statistical power. There are three main types of missing data:

  • Missing completely at random (MCAR) is when the probability of missing data is not related to any other parameter and data is missing by pure chance.
  • Missing at random (MAR) is when there is a reason behind missingness that can be identified through other observed variables. For example, a geologist is not sampling a specific unit in a gold exploration project because he is certain that there is no gold associated with that geologic unit.
  • Missing not at random (MNAR) is when missingness depends on information that has not been recorded. For example, a certain cancer is associated with smoking, but the data on whether or not patients smoked is not recorded.
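The practical difference between the three types can be seen in a small simulation. The Python sketch below is illustrative only: the population (blood pressure rising with age), the probabilities, and all names are assumptions made for the example. In the MAR case missingness depends on age, which is recorded. In the MNAR case it depends on the blood pressure value itself, which is not recorded when it is missing.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population in which blood pressure rises with age.
ages = [rng.uniform(20, 80) for _ in range(20000)]
pressures = [100 + 0.5 * age + rng.gauss(0, 10) for age in ages]


def observed_mean(keep_probability):
    """Mean of the pressures that remain after applying a missingness rule.

    keep_probability(age, pressure) returns the chance a value is recorded.
    """
    kept = [pressure for age, pressure in zip(ages, pressures)
            if rng.random() < keep_probability(age, pressure)]
    return mean(kept)


true_mean = mean(pressures)
# MCAR: every value has the same chance of being recorded.
mcar_mean = observed_mean(lambda age, pressure: 0.7)
# MAR: older patients are less likely to have a recorded value.
mar_mean = observed_mean(lambda age, pressure: 0.4 if age > 50 else 0.9)
# MNAR: high readings themselves are less likely to be recorded.
mnar_mean = observed_mean(lambda age, pressure: 0.4 if pressure > 125 else 0.9)

print(f"true mean {true_mean:.1f}")
print(f"MCAR {mcar_mean:.1f}  MAR {mar_mean:.1f}  MNAR {mnar_mean:.1f}")
```

In this setup the MCAR mean stays close to the true mean, while the unadjusted MAR and MNAR means are pulled downward. Under MAR the bias can be corrected using the recorded age, whereas under MNAR the recorded data alone does not reveal the problem.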

There are two main approaches to dealing with missing data:

  • Discard the missing values
  • Impute the missing values

Discarding data is an easy approach, but it should be used with caution and only under certain conditions. If a large percentage of data is missing, discarding it reduces statistical power. Also, if there is a reason behind the missingness, discarding it can introduce bias. For example, if males are less likely than females to answer questions about depression status, then discarding missing values will introduce bias.
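The depression example can be worked through with simple arithmetic. The counts in the Python sketch below are hypothetical and were chosen only to make the effect visible. They assume that, within each sex, people who answered are as likely to be depressed as those who did not.

```python
# Hypothetical survey counts, chosen only to illustrate the arithmetic.
groups = {
    "male":   {"n": 100, "depressed": 10, "answered": 40, "answered_depressed": 4},
    "female": {"n": 100, "depressed": 30, "answered": 90, "answered_depressed": 27},
}

total_n = sum(g["n"] for g in groups.values())

# Prevalence we would see if everyone had answered.
true_prevalence = sum(g["depressed"] for g in groups.values()) / total_n

# Discarding non-responders: females are over-represented among the answers.
answered = sum(g["answered"] for g in groups.values())
complete_case_prevalence = (
    sum(g["answered_depressed"] for g in groups.values()) / answered
)

# Weighting each sex by its share of the full sample removes the bias here,
# because missingness depends only on sex, which is recorded for everyone.
adjusted_prevalence = sum(
    (g["n"] / total_n) * (g["answered_depressed"] / g["answered"])
    for g in groups.values()
)

print(f"true {true_prevalence:.3f}")
print(f"complete case {complete_case_prevalence:.3f}")
print(f"adjusted by sex {adjusted_prevalence:.3f}")
```

With these counts the complete-case estimate overstates prevalence (31 of 130 responders instead of 40 of 200 subjects), and the sex-weighted estimate recovers the true value.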

Missing data can be imputed through simple approaches like mean imputation, last value carried forward, using information from related observations, or based on logical rules. These methods need to be used with sound judgment. For example, it makes sense to impute a missing temperature from the reading taken an hour before, but it does not make sense to impute someone’s blood pressure from a previous patient’s blood pressure. Also, it is important to note that these methods can understate the standard errors of estimates, because imputed values are treated as if they had been observed.
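The effect of mean imputation on standard errors can be checked directly. The Python sketch below uses made-up blood pressure readings and assumes four further readings are missing. It is only an illustration of the general point, not a recommended analysis.

```python
import math
from statistics import mean, stdev

# Hypothetical observed systolic blood pressures; four more are missing.
observed = [118.0, 122.0, 125.0, 131.0, 140.0, 109.0]
n_missing = 4

# Mean imputation: fill every missing value with the observed mean.
imputed = observed + [mean(observed)] * n_missing


def standard_error(values):
    """Standard error of the mean, treating every value as a real observation."""
    return stdev(values) / math.sqrt(len(values))


print(f"observed only: sd {stdev(observed):.2f}, se {standard_error(observed):.2f}")
print(f"after imputing: sd {stdev(imputed):.2f}, se {standard_error(imputed):.2f}")
```

The mean is unchanged, but the standard deviation and the standard error both shrink, because the filled-in values add no spread while increasing the apparent sample size.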

More complex imputation techniques include random imputation, regression imputation, random regression imputation, matching and hot-deck imputation, and multiple imputation.

Reference: http://www.stat.columbia.edu/~gelman/arm/missing.pdf