Data Point 10: Tests of Correlation (A HICS Initiative)

                                                                                                         


Tests of Correlation

Correlation analysis is a cornerstone of medical statistics, used to quantify the strength and direction of association between two variables. In clinical research, it helps investigators understand whether changes in one biological, physiological, or behavioral measure are associated with changes in another. Correlation does not imply causation, but it provides essential groundwork for hypothesis generation, risk factor identification, and model building.

This write‑up provides a comprehensive overview of the major tests of correlation used in medical statistics, their assumptions, interpretation, and practical applications.


1. Understanding Correlation

Correlation refers to the degree to which two variables move together. The correlation coefficient is a numerical index ranging from –1 to +1, where:

  • +1 indicates a perfect positive linear relationship
  • –1 indicates a perfect negative linear relationship
  • 0 indicates no linear relationship

Correlation is typically used when both variables are continuous, though rank‑based methods extend its use to ordinal data.

Correlation coefficients are dimensionless, making them easy to compare across studies and contexts.


2. Why Correlation Matters in Medical Research

Correlation analysis is widely used in:

  • Epidemiology (e.g., association between BMI and blood pressure)
  • Clinical trials (e.g., correlation between drug dose and biomarker response)
  • Diagnostics (e.g., correlation between two measurement methods)
  • Public health (e.g., correlation between pollution levels and respiratory symptoms)

It helps researchers:

  • Identify potential risk factors
  • Explore dose–response relationships
  • Validate measurement tools
  • Support predictive modeling

3. Major Tests of Correlation

A. Pearson’s Correlation Coefficient (r)

Purpose: Measures the strength and direction of a linear relationship between two continuous variables.

Assumptions:

  • Both variables are continuous
  • Relationship is linear
  • Both variables are approximately normally distributed (this matters mainly for significance tests and confidence intervals)
  • No significant outliers

Interpretation:

  • r close to ±1 → strong correlation
  • r close to 0 → weak or no correlation

Example: Relationship between systolic blood pressure and age.

Pearson’s r is one of the most commonly used correlation measures in medical research.
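As a minimal sketch of how r is computed, the function below divides the sum of cross-products of deviations by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the age and blood pressure values are illustrative assumptions, not data from a real study.

```python
import math

def pearson_r(x, y):
    """Pearson's r for two equal-length sequences of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: age in years and systolic blood pressure in mmHg
age = [30, 40, 50, 60, 70]
sbp = [118, 125, 130, 141, 146]
r = pearson_r(age, sbp)
print(f"r = {r:.3f}")
```

In practice a statistics package would normally be used; the hand-written version is only meant to make the formula visible.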


B. Spearman’s Rank Correlation Coefficient (ρ)

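Purpose: Measures the strength and direction of a monotonic relationship between two variables, computed on their ranks. It suits ordinal data as well as continuous data that do not meet the assumptions of Pearson’s r.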

When to Use:

  • Data are ordinal
  • Relationship is non‑linear but monotonic
  • Outliers are present
  • Variables violate normality assumptions

Example: Correlation between pain scores (ordinal) and functional disability.

Spearman’s correlation is robust to outliers and skewed distributions, and it is widely used in clinical studies involving subjective scales.
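One way to see what Spearman’s ρ does is to rank each variable (tied values share the average of their ranks) and then apply Pearson’s formula to the ranks. The sketch below follows that route; the helper names `average_ranks` and `spearman_rho` and the pain and disability scores are assumptions made for illustration.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: pain score (0-10) and a functional disability index
pain = [2, 4, 4, 6, 8, 9]
disability = [10, 22, 18, 35, 60, 95]
rho = spearman_rho(pain, disability)
print(f"rho = {rho:.3f}")
```

Because only ranks enter the calculation, a perfectly monotonic but curved relationship (for example y = x³) still gives ρ = 1.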


C. Kendall’s Tau (τ)

Purpose: Measures the strength of association between two ranked variables, based on the numbers of concordant and discordant pairs of observations.

Advantages:

  • Handles tied ranks better than Spearman’s ρ (the tau‑b variant adjusts for ties)
  • Better for small sample sizes

Example: Agreement between two clinicians ranking disease severity.
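A minimal sketch of the tau‑b variant is shown below: every pair of patients is classified as concordant, discordant, or tied, and the difference between concordant and discordant counts is scaled by a tie-adjusted denominator. The function name `kendall_tau_b` and the rankings are hypothetical.

```python
import math
from itertools import combinations

def kendall_tau_b(x, y):
    """Kendall's tau-b, counting concordant, discordant and tied pairs."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: contributes to neither term
        elif dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denom == 0:
        raise ValueError("tau-b is undefined when a variable is constant")
    return (concordant - discordant) / denom

# Hypothetical severity rankings of five patients by two clinicians
clinician_a = [1, 2, 3, 4, 5]
clinician_b = [1, 3, 2, 4, 5]
tau = kendall_tau_b(clinician_a, clinician_b)
print(f"tau-b = {tau:.2f}")
```

Here nine of the ten patient pairs are ordered the same way by both clinicians and one pair is reversed, so τ = (9 − 1)/10 = 0.8.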


D. Point‑Biserial Correlation

Purpose: Measures the relationship between one continuous variable and one binary variable.

Example: Correlation between gender (binary) and cholesterol level.
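The point‑biserial coefficient is numerically the same as Pearson’s r with the binary variable coded 0/1, and it can be written in terms of the two group means. The sketch below uses that group-mean form; the function name `point_biserial`, the 0/1 coding, and the cholesterol values are assumptions for illustration.

```python
import math

def point_biserial(binary, values):
    """Point-biserial r between a 0/1 variable and a continuous variable."""
    if len(binary) != len(values):
        raise ValueError("inputs must have the same length")
    if any(b not in (0, 1) for b in binary):
        raise ValueError("binary variable must be coded 0/1")
    group1 = [v for b, v in zip(binary, values) if b == 1]
    group0 = [v for b, v in zip(binary, values) if b == 0]
    n1, n0 = len(group1), len(group0)
    if n1 == 0 or n0 == 0:
        raise ValueError("both groups must contain at least one observation")
    n = n1 + n0
    mean1 = sum(group1) / n1
    mean0 = sum(group0) / n0
    mean_all = sum(values) / n
    # Population (divide-by-n) standard deviation of the continuous variable
    sd = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("continuous variable has zero variance")
    return (mean1 - mean0) / sd * math.sqrt(n1 * n0) / n

# Hypothetical data: group membership (0/1) and total cholesterol in mg/dL
group = [0, 0, 0, 1, 1, 1]
cholesterol = [180, 190, 200, 210, 220, 230]
r_pb = point_biserial(group, cholesterol)
print(f"r_pb = {r_pb:.3f}")
```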


E. Phi Coefficient (φ)

Purpose: Measures correlation between two binary variables.

Example: Association between presence/absence of smoking and presence/absence of chronic cough.
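For a 2×2 table with cells a, b (first row) and c, d (second row), φ can be computed as (ad − bc) divided by the square root of the product of the four marginal totals. The following sketch assumes that layout; the function name `phi_coefficient` and the counts are made up for illustration.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table laid out as [[a, b], [c, d]]."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denom

# Hypothetical counts:
#                 cough   no cough
# smokers           30       20
# non-smokers       10       40
phi = phi_coefficient(30, 20, 10, 40)
print(f"phi = {phi:.3f}")
```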


F. Partial Correlation

Purpose: Measures the correlation between two variables while controlling for one or more additional variables.

Example: Correlation between BMI and blood pressure after adjusting for age.
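When a single variable is controlled for, the first-order partial correlation can be obtained from the three pairwise correlations. The sketch below applies that formula; the function name `partial_r` and the three input correlations are hypothetical values chosen for illustration.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """Correlation between x and y after removing the linear effect of z."""
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("undefined when x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denom

# Hypothetical pairwise correlations
r_bmi_bp = 0.50   # BMI vs blood pressure
r_bmi_age = 0.40  # BMI vs age
r_bp_age = 0.60   # blood pressure vs age
adjusted = partial_r(r_bmi_bp, r_bmi_age, r_bp_age)
print(f"partial r = {adjusted:.3f}")
```

With these inputs the adjusted correlation (about 0.35) is smaller than the unadjusted 0.50, because part of the BMI and blood pressure association is shared with age.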


4. Testing the Significance of a Correlation

To determine whether an observed correlation is statistically significant, researchers test the null hypothesis that the true (population) correlation is zero.

For Pearson’s r, the usual test statistic is t = r√((n − 2)/(1 − r²)), which follows a t distribution with n − 2 degrees of freedom under the null hypothesis.

Inputs:

  • Observed correlation coefficient (r)
  • Sample size (n)

Outputs:

  • p‑value
  • 95% confidence interval for the correlation coefficient

By convention, a correlation with p < 0.05 is considered statistically significant at the 5% level.
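A rough sketch of this calculation for Pearson’s r is given below. It computes the t statistic with n − 2 degrees of freedom and a confidence interval from Fisher’s z transformation. Because the Python standard library has no t distribution, the p‑value here is a normal approximation based on Fisher’s z, not the exact t‑based value; the function name `correlation_inference` and the inputs (r = 0.45, n = 100) are assumptions.

```python
import math
from statistics import NormalDist

def correlation_inference(r, n, confidence=0.95):
    """Return (t statistic, approximate two-sided p-value, confidence interval)."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    if n < 4:
        raise ValueError("Fisher's z interval needs n >= 4")
    # t statistic for H0: population correlation = 0 (n - 2 degrees of freedom)
    t_stat = r * math.sqrt((n - 2) / (1 - r * r))
    # Fisher's z transformation: atanh(r) is roughly normal with SE 1/sqrt(n - 3)
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    ci = (math.tanh(z - z_crit * se), math.tanh(z + z_crit * se))
    # Normal approximation to the two-sided p-value (not the exact t-based value)
    p_approx = 2 * (1 - NormalDist().cdf(abs(z) / se))
    return t_stat, p_approx, ci

# Hypothetical inputs
t_stat, p_approx, ci = correlation_inference(r=0.45, n=100)
print(f"t = {t_stat:.2f}, p ~ {p_approx:.2g}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval is asymmetric around r because it is built on the z scale and then transformed back.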


5. Interpreting Correlation Coefficients

A commonly used rule of thumb, applied to the absolute value of the coefficient:

  • 0.00–0.19: Very weak
  • 0.20–0.39: Weak
  • 0.40–0.59: Moderate
  • 0.60–0.79: Strong
  • 0.80–1.00: Very strong
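The bands above can be encoded as a small lookup, shown here as a sketch. The function name `describe_strength` is hypothetical, and the labels are only the rule of thumb from the list, not a clinical judgment.

```python
def describe_strength(r):
    """Map a correlation coefficient to the rule-of-thumb label for |r|."""
    size = abs(r)  # the sign gives direction, not strength
    if size > 1:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if size < 0.20:
        return "very weak"
    if size < 0.40:
        return "weak"
    if size < 0.60:
        return "moderate"
    if size < 0.80:
        return "strong"
    return "very strong"

print(describe_strength(0.45))   # moderate
print(describe_strength(-0.85))  # very strong
```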

However, interpretation must consider:

  • Clinical context
  • Sample size
  • Measurement error
  • Biological plausibility

Correlation strength alone does not determine clinical importance.


6. Common Pitfalls and Misuse

Misuse of correlation is widespread in medical research.

A. Inferring Causation

Correlation does not imply causation. Confounding variables may create spurious associations.

B. Ignoring Non‑Linear Relationships

Pearson’s r only captures linear relationships; non‑linear patterns may be missed.
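A small constructed example makes the point: in the sketch below y is completely determined by x (y = x²), yet Pearson’s r is zero because the relationship is symmetric rather than linear. The data and the helper name `pearson_r` are illustrative assumptions.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# A perfect U-shaped dependence: y is fully determined by x
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]
r_curve = pearson_r(x, y)
print(f"r = {r_curve:.3f}")
```

A scatterplot would reveal the pattern immediately, which is one reason to plot the data before relying on a coefficient.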

C. Outliers

A single extreme value can dramatically alter correlation coefficients.
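The constructed example below illustrates the effect: five points with no linear association give r = 0, and adding one extreme observation pushes r above 0.95. The data and the helper name `pearson_r` are made up for illustration.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Five hypothetical observations with no linear association
x = [1, 2, 3, 4, 5]
y = [2, 4, 3, 4, 2]
r_without = pearson_r(x, y)

# The same data plus one extreme observation at (20, 20)
r_with = pearson_r(x + [20], y + [20])

print(f"without outlier: r = {r_without:.2f}")
print(f"with outlier:    r = {r_with:.2f}")
```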

D. Using Correlation with Categorical Data

Standard correlation coefficients are inappropriate for categorical data unless a specialized coefficient (phi, point‑biserial) is used.

E. Over‑interpreting Weak Correlations

Statistically significant correlations may be clinically irrelevant: with a large enough sample, even a very weak correlation can reach p < 0.05.


7. Practical Applications in Medicine

A. Biomarker Research

Correlation helps validate biomarkers by comparing them with gold‑standard measures.

B. Diagnostic Method Comparison

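Correlation is often used to compare two measurement tools, but it measures association rather than agreement: two methods can be highly correlated and still differ systematically. Bland–Altman analysis is preferred for assessing agreement.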
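As a minimal sketch of the Bland–Altman approach, the function below computes the mean difference between two methods (the bias) and the 95% limits of agreement, taken as the bias ± 1.96 standard deviations of the differences. The function name `bland_altman` and the paired readings are hypothetical; in this made-up data, device B tracks device A closely but reads about 10 units higher, a discrepancy that a correlation coefficient alone would not reveal.

```python
from statistics import mean, stdev

def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) of agreement for paired readings."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired readings")
    diffs = [b - a for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample SD of the paired differences
    return bias, bias - spread, bias + spread

# Hypothetical paired readings from two devices
device_a = [100, 110, 120, 130, 140]
device_b = [112, 119, 131, 139, 152]
bias, lower, upper = bland_altman(device_a, device_b)
print(f"bias = {bias:.1f}, limits of agreement = ({lower:.1f}, {upper:.1f})")
```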

C. Epidemiological Studies

Correlation helps identify potential risk factors for disease.

D. Pharmacology

Correlation is used to relate drug concentration to therapeutic effect.

E. Public Health Surveillance

Correlation is used to relate environmental exposures to disease incidence.


8. Choosing the Right Correlation Test

Scenario | Best Test
Two continuous, normally distributed variables | Pearson’s r
Two ordinal variables | Spearman’s ρ or Kendall’s τ
Continuous + binary variable | Point‑biserial
Two binary variables | Phi coefficient
Controlling for confounders | Partial correlation
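The selection rules in the table can be sketched as a small helper. The function name `suggest_test`, its parameters, and the variable-type labels are hypothetical, and the non-normal continuous case follows the guidance given for Spearman’s ρ earlier. Combinations the table does not cover raise an error rather than guess.

```python
def suggest_test(kind_x, kind_y, normal=True, adjusting=False):
    """Suggest a correlation measure from variable types.

    kind_x, kind_y: "continuous", "ordinal" or "binary"
    normal: whether continuous variables look approximately normal
    adjusting: whether other variables must be controlled for
    """
    allowed = {"continuous", "ordinal", "binary"}
    if kind_x not in allowed or kind_y not in allowed:
        raise ValueError("variable type must be continuous, ordinal or binary")
    if adjusting:
        return "Partial correlation"
    pair = tuple(sorted((kind_x, kind_y)))
    if pair == ("binary", "binary"):
        return "Phi coefficient"
    if pair == ("binary", "continuous"):
        return "Point-biserial"
    if "binary" in pair:
        raise ValueError("combination not covered by the table")
    if pair == ("continuous", "continuous") and normal:
        return "Pearson's r"
    # Ordinal data, or continuous data that violate normality
    return "Spearman's rho or Kendall's tau"

print(suggest_test("continuous", "continuous"))
print(suggest_test("ordinal", "ordinal"))
print(suggest_test("binary", "continuous"))
```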

9. Reporting Correlation in Medical Papers

A complete correlation report includes:

  • Type of correlation test
  • Correlation coefficient (r, ρ, τ)
  • p‑value
  • 95% confidence interval
  • Sample size
  • Scatterplot (recommended)
  • Clinical interpretation

Example:

Pearson’s correlation between age and systolic blood pressure was r = 0.45, p < 0.001, indicating a moderate positive association.


10. Conclusion

Correlation analysis is a powerful and widely used tool in medical statistics. When applied correctly, it provides valuable insights into relationships between variables, supports hypothesis generation, and informs clinical decision‑making. However, researchers must choose the appropriate test, respect assumptions, and avoid common pitfalls to ensure valid and meaningful results.


         

         




