Research

Upcoming Events

Vascular Theory Exams 2025

01/01/2025

View Event

Find Your Algorithm 2025

27/02/2025

View Event

CSVS Research Study Day 2025

27/06/2025

View Event

8. Choosing a Statistical Test

Choosing the appropriate statistical test(s) for a study can be a daunting task when you are new to clinical research. There are many, more comprehensive resources available on each statistical test, so the focus of this article will be to provide a flavour of a few statistical tests to get you started. We hope this will help you start the process of considering what type of analysis might be necessary to answer the clinical questions posed in a research study. At the end of the article, there are some brief signposts for support with planning your research statistics.

Questions to consider

What is the main study hypothesis?

If your study has a proposed hypothesis then a statistical test is likely to be required to determine the significance of the answer to the clinical question, for example “is modality B a suitable replacement for modality A for measurement X?”, “does factor X influence variable Y?”, or “which of modalities A, B or C is most sensitive for identifying condition X?” etc. The answer to your research question can be tested to check whether it is likely to be ‘real’ (i.e. statistically significant - unlikely to have occurred by chance) by using an appropriate statistical test.

What type of data is being used in the study?

In some cases, questions regarding the statistical validity of study data may need to be assessed prior to choosing a statistical test. These questions may include “what type of data are being investigated?”, “are the data independent?”, “are the data normally distributed?” and “what is the study power using this data group?”.

What degree of statistical significance should be used?

Most clinical studies use a significance threshold of P < 0.05. The probability value (P-value) is the probability of obtaining results at least as extreme as those observed if there were truly no relationship or difference between the groups being compared (the null hypothesis). A result with P < 0.05 would therefore be expected to arise by chance less than 5% of the time if no true effect existed; it does not mean that there is a >95% probability that the effect is real. However, in some studies it may be appropriate to use a different significance threshold. For example, in studies where multiple independent hypotheses are being tested it may be appropriate to use a Bonferroni correction, where the significance level is P < 0.05/n and n represents the number of independent hypotheses. In a study with two independent hypotheses, using a Bonferroni correction would provide a required P-value of P < 0.025.
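As a minimal sketch of the Bonferroni arithmetic described above, the following Python divides the overall significance level by the number of hypotheses and checks each P-value against the corrected threshold. The function names and the example P-values are hypothetical and are not taken from any study.

```python
def bonferroni_threshold(alpha, n_hypotheses):
    """Return the per-hypothesis significance threshold alpha / n."""
    if n_hypotheses < 1:
        raise ValueError("n_hypotheses must be at least 1")
    return alpha / n_hypotheses


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag which P-values fall below the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


# Hypothetical P-values for two independent hypotheses (illustration only).
example_p_values = [0.030, 0.010]

print(bonferroni_threshold(0.05, 2))                   # corrected threshold
print(significant_after_bonferroni(example_p_values))  # one flag per hypothesis
```

In this illustration, a P-value of 0.030 would have passed the uncorrected 0.05 threshold but does not pass the corrected threshold of 0.025.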

Examples of statistical tests in existing studies

In this article, three clinical studies have been briefly analysed to demonstrate the appropriate statistical test for a particular clinical hypothesis, and to show how these tests are used to answer the question. Only the main statistical test used in answering the study hypothesis has been included for the sake of brevity, so this article will not cover analysing independence of study variables or normalisation of data. The aim of this article’s analysis is to demonstrate how some common clinical hypotheses are answered in published research and thus provide an idea of how you might proceed with study design when proposing similar types of research.

Statistical aim: comparing the level of agreement between two different measurement modalities

Statistical test: Bland-Altman analysis

Bland-Altman analysis is a method designed to determine the level of agreement between two different modalities that are measuring the same variable. If one modality is considered the ‘gold-standard’ modality for this type of measurement, then Bland-Altman analysis can be used to assess the level of accuracy of the second modality by comparing it to the gold-standard method. Bland-Altman analysis involves graphically plotting the measurement differences between the two modalities on the y-axis, against the mean of the two measurements on the x-axis. The mean of the measurement differences is the ‘bias’; the closer the bias is to zero, the smaller the systematic difference between the two modalities.

Next, ‘limits of agreement’ are obtained by calculating the bias ± 1.96 standard deviations of the differences; if the differences are approximately normally distributed, about 95% of the differences between modalities should lie within these limits. Additionally, ‘limits of acceptability’ should be agreed upon beforehand and these can also be added to the graph: if the limits of agreement fall within the limits of acceptability, then the two modalities can be said to demonstrate an acceptable level of agreement and could in theory be used interchangeably to obtain similar results when measuring that variable in future.
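As a minimal sketch of the calculation described above (not the analysis code from any published study), the following Python computes the bias and limits of agreement from paired measurements. The function names, the example measurements and the acceptability limit of ±5 are hypothetical and chosen purely for illustration.

```python
from statistics import mean, stdev


def bland_altman(new_method, reference_method):
    """Compute the bias, limits of agreement and plot points for paired data."""
    if len(new_method) != len(reference_method) or len(new_method) < 2:
        raise ValueError("need at least two paired measurements")
    pairs = list(zip(new_method, reference_method))
    differences = [a - b for a, b in pairs]   # y-axis values
    pair_means = [(a + b) / 2 for a, b in pairs]  # x-axis values
    bias = mean(differences)
    sd = stdev(differences)  # sample standard deviation of the differences
    return {
        "bias": bias,
        "lower_loa": bias - 1.96 * sd,
        "upper_loa": bias + 1.96 * sd,
        "points": list(zip(pair_means, differences)),
    }


def within_acceptability(result, acceptable_limit):
    """True if both limits of agreement lie inside +/- acceptable_limit."""
    return (-acceptable_limit <= result["lower_loa"]
            and result["upper_loa"] <= acceptable_limit)


# Hypothetical paired measurements (e.g. % stenosis) from two modalities.
new_modality = [52, 61, 70, 45, 80, 66]
gold_standard = [50, 63, 68, 47, 78, 65]

result = bland_altman(new_modality, gold_standard)
print(result["bias"], result["lower_loa"], result["upper_loa"])
print(within_acceptability(result, acceptable_limit=5.0))
```

The `points` entry holds the (mean, difference) pairs that would be plotted on the Bland-Altman graph, with horizontal lines drawn at the bias and at each limit of agreement.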

In the following study, Bland-Altman analysis was used to demonstrate that tomographic 3D-ultrasound (tUS) and duplex ultrasound (DUS) both demonstrate good agreement with fistulography for identifying and measuring the degree of stenosis within an arteriovenous fistula (AVF).

Rogers et al. (2021). Arteriovenous Fistula Surveillance Using Tomographic 3D Ultrasound. Eur J Vasc Endovasc Surg. 62(1), pp.82-88.

Study aim:

To investigate the level of agreement between tUS, DUS and fistulography for identifying and measuring AVF stenosis, with the aim of determining whether tUS is a suitable replacement for DUS in the assessment of AVF stenosis in future, as tUS is significantly less time-consuming and therefore would be beneficial for department workflow and reducing ultrasound-related musculoskeletal disorders.

Study summary:

97 patients with a poor-flow arteriovenous fistula (AVF) underwent imaging with fistulography, tUS and DUS, which identified 101 stenoses for analysis. The degree of stenosis was measured and Bland-Altman analysis was performed to assess the level of agreement between each ultrasound modality and fistulography, which was considered the gold-standard measurement modality. Bland-Altman analysis demonstrated close agreement between fistulography and DUS / tUS, with tUS showing slightly better agreement, indicating that all three measurement modalities are interchangeable when measuring degree of AVF stenosis. However, tUS has the additional benefits of being non-invasive, unlike fistulography, and taking less than half the time to perform compared to DUS, indicating that tUS is a promising modality for obtaining non-invasive, fast and accurate AVF stenosis measurements.

Figure 1. Bland-Altman agreement for (A) duplex ultrasound (DUS) and (B) tomographic 3D-ultrasound (tUS) compared with fistulography as the gold-standard and (C) tUS compared with DUS as the gold-standard in the measurement of arteriovenous fistula (AVF) stenosis. SD = standard deviation; LOA = limit of agreement.

Statistical aim: investigating differences in means between independent groups

Statistical test: One-way ANOVA

One-way analysis of variance (ANOVA) is a statistical test used to compare the means of independent groups in a study and determine whether any of the means are significantly different from each other. In this study, one-way ANOVA has been used to demonstrate that there is a statistically significant increase in mean AAA growth rate (GR) as AAA size increases. One-way ANOVA has also been used in this study to demonstrate that there was no statistically significant difference in mean AAA GR between non-smokers and smokers or ex-smokers, between genders, and between normotensive and hypertensive patients. This shows that for this patient cohort the most significant factor affecting AAA GR is AAA size, and thus the authors suggest AAA size should be taken into consideration when determining AAA surveillance intervals.

Ian Hornby-Foster. (2023). Abdominal aortic aneurysm growth rates in patients undergoing local ultrasound surveillance. Ultrasound. 31(1), pp.23-32.

Study aim:

A retrospective analysis of abdominal aortic aneurysm (AAA) ultrasound surveillance in University Hospitals Bristol and Weston (UHBW), with the aim of assessing AAA growth rate (GR) and the concurrent impact of AAA risk factors (RFs) and associated medications, to inform whether the current UHBW AAA surveillance protocol is safe and appropriate.

Study summary:

315 patients comprising 1312 AAA scans were investigated, with exclusion criteria including aortic diameter measurements <3.0cm or >5.5cm, and patients who had fewer than 2 AAA scans. The patients were divided into groups of 0.5cm increments (3.0 – 3.4cm, 3.5 – 3.9cm, 4.0 – 4.4cm, 4.5 – 4.9cm, 5.0 – 5.5cm), based on baseline AAA size. Annual GR between groups was compared using one-way ANOVA. One-way ANOVA was also used to investigate the influence of risk factors on AAA GR.

Mean GR for all patients was 0.25cm per year; however, one-way ANOVA demonstrated a significant increase in GR with increasing AAA diameter. One-way ANOVA also demonstrated that there was no statistically significant impact of age, smoking, gender, hypertension, or hypercholesterolaemia on AAA GR for this patient cohort. However, there was a significant difference between the mean growth rate of diabetic and non-diabetic patients, suggesting an inverse relationship between the presence of diabetes and AAA GR.

Figure 2. Mean annual AAA growth rates (cm/year) with error bars indicating the upper and lower bounds of the 95% confidence intervals. AAA GR can be seen to increase with AAA diameter.
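To illustrate how one-way ANOVA compares group means, the following minimal Python sketch computes the F statistic by dividing the variation between groups by the variation within groups. The function name and the growth-rate values are hypothetical and are not data from the study above. The sketch stops at the F statistic: obtaining the P-value requires the F distribution, which a statistics package would normally provide (for example, SciPy's `scipy.stats.f_oneway` returns both the F statistic and the P-value).

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F statistic, df between groups, df within groups)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least two non-empty groups and n > k")

    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical annual growth rates (cm/year) for three baseline size groups.
small = [0.1, 0.2, 0.3]
medium = [0.2, 0.3, 0.4]
large = [0.4, 0.5, 0.6]

f_stat, df_between, df_within = one_way_anova_f([small, medium, large])
print(f_stat, df_between, df_within)
```

A larger F statistic indicates that the group means differ by more than would be expected from the spread of values within each group. A significant result shows only that at least one group mean differs; it does not identify which one.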

Statistical aim: investigating sensitivity and specificity of a measurement modality

Statistical test: receiver operating characteristic (ROC) curve analysis

A ROC curve is a graphical plot that illustrates the performance of a binary classifier model at varying threshold values, by plotting the true positive rate (sensitivity) against the false positive rate (1 − specificity) at each threshold. A higher area under the curve (AUC) indicates that the model discriminates better between positive and negative cases across all thresholds, i.e. it can achieve high sensitivity and high specificity at the same time. An AUC of 0.5 corresponds to a model that performs no better than chance, and an AUC of 1.0 to perfect discrimination. In the study below, ROC curves are used to demonstrate that intra-arterial fractional flow reserve (FFR) measurement has the highest sensitivity and joint highest specificity (with translesional pressure measurement (Pd/Pa)) for predicting presence of critical limb-threatening ischaemia (CLTI).

Albayati et al. (2024). Intra-arterial Fractional Flow Reserve Measurements Provide an Objective Assessment of the Functional Significance of Peripheral Arterial Stenoses. Eur J Vasc Endovasc Surg. 67(2), pp.332 - 340.

Study aim:

To use fractional flow reserve (FFR) to investigate the ischaemic potential of peripheral arterial stenoses, and to compare this technique with other methods of investigating stenosis: ankle brachial pressure index (ABPI), duplex ultrasound (DUS), CT angiography (CTA) and translesional pressure measurement (Pd/Pa).

Study summary:

61 isolated iliac or superficial femoral artery stenoses in 41 patients (10 patients with bilateral disease) with either short-distance claudication or CLTI were recruited prior to elective angioplasty and/or stenting. Pre-procedural investigations (resting and exercise ABPI, DUS peak systolic velocity ratio (PSVR), and CTA) were performed; intravascular Doppler derived flow reserve and pressure derived FFR were obtained during angioplasty. Blood oxygen level dependent (BOLD) cardiovascular magnetic resonance (CMR) was performed before and after angioplasty to assess calf oxygenation.

Association between variables and disease severity was assessed using ROC curve analysis, which showed that a lower FFR was associated with CLTI in the cohort studied. The degree of lesional stenosis measured by CTA, ABPI and PSVR had weaker associations with CLTI than FFR. FFR demonstrated the highest sensitivity and joint highest specificity for predicting CLTI in this cohort.

Figure 3. Association between standard of care assessments and intra-arterial pressure-flow measurements with CLTI. ROC curve analysis with corresponding AUC, 95% confidence interval (CI) and sensitivity (Sens) and specificity (Spec) values displayed. FFR demonstrates the greatest AUC for association with CLTI in this cohort.
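The ROC curve construction described in this section can be sketched in a few lines of Python. This is an illustrative example only: the function names, scores and labels are hypothetical and are not data from the study above. It assumes that a higher score indicates the condition is more likely to be present; for a measurement where lower values indicate disease, the scores could be negated first.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points for a ROC curve.

    labels: 1 = condition present, 0 = condition absent.
    A case is classed as positive when its score is >= the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC points (the AUC)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity at a single threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    positives = sum(labels)
    negatives = len(labels) - positives
    return tp / positives, tn / negatives


# Hypothetical test scores and true disease status for eight cases.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

curve = roc_points(scores, labels)
print(area_under_curve(curve))
print(sensitivity_specificity(scores, labels, threshold=0.55))
```

Each threshold gives one point on the curve, so choosing a clinical cut-off means choosing where on the curve to operate: lowering the threshold raises sensitivity at the cost of specificity.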

Research support resources

Only a handful of statistical tests have been covered in this article, hopefully providing you with a starting point for how you might begin to consider testing your own research questions and data. There is a much wider range of statistical tests to choose from, which will warrant careful consideration to select the correct statistical method for your research. During the design phase of your research, it is valuable to consider the type and structure of data you intend to generate as it may impact how you use statistics during analysis. For support, try contacting your local research & development department and asking whether they have an associated statistician. Additionally, take a look at the support offered through the National Institute for Health Research (NIHR):

Written by Ben Warner-Michel (Kingston Hospital, London)

Edited by Isaac Colliver (University Hospitals Coventry & Warwickshire, Coventry)
