Several systems exist for rating the quality of evidence. One of the most commonly used is GRADE (Grading of Recommendations Assessment, Development, and Evaluation). Under GRADE, the quality (also called certainty) of a body of evidence is rated as very low, low, moderate, or high:
| Quality of evidence | Interpretation |
|---|---|
| Very low | The true effect is probably markedly different from the estimated effect |
| Low | The true effect might be markedly different from the estimated effect |
| Moderate | The true effect is probably close to the estimated effect, but it may be substantially different |
| High | There is high confidence that the true effect is close to the estimated effect |
Under GRADE, evidence from randomized controlled trials (RCTs) starts as high quality, while evidence from observational studies starts as low quality. The rating can then be downgraded because of limitations in the evidence or upgraded because of its strengths.
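To make the starting point and the up/down adjustments concrete, here is a minimal Python sketch. It assumes each downgrade or upgrade moves the rating by whole levels on the four-level scale, and that the result can never rise above high or fall below very low. The function names and the design labels `"rct"` and `"observational"` are hypothetical, not part of any official GRADE tool.

```python
# The four GRADE quality levels, ordered from lowest to highest.
LEVELS = ["very low", "low", "moderate", "high"]


def starting_level(design: str) -> int:
    """Return the index of the starting level for a study design.

    Hypothetical labels: "rct" starts at high; anything else is treated
    as observational evidence and starts at low.
    """
    return LEVELS.index("high") if design == "rct" else LEVELS.index("low")


def rate(design: str, downgrades: int = 0, upgrades: int = 0) -> str:
    """Apply whole-level downgrades/upgrades and clamp to the four-level scale."""
    level = starting_level(design) - downgrades + upgrades
    level = max(0, min(level, len(LEVELS) - 1))
    return LEVELS[level]


print(rate("rct", downgrades=1))            # moderate
print(rate("observational", upgrades=1))    # moderate
print(rate("observational", downgrades=2))  # very low
```

Clamping reflects that the scale has only four levels: an observational body of evidence downgraded twice cannot go lower than very low.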
Reasons for downgrading evidence quality include:
- Risk of Bias: Flaws in study design or execution that could bias the results.
- Inconsistency: Unexplained differences in results across studies, such as widely varying effect estimates or substantial heterogeneity.
- Indirectness: The study population, interventions, outcomes, or comparators do not directly match the research question.
- Imprecision: Results rest on wide confidence intervals or few events (a small sample size or few patients experiencing the outcome).
- Publication Bias: Studies are suspected to have been published selectively, for example only those with positive outcomes.
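In GRADE, each of these domains is typically judged as not serious, serious (downgrade one level), or very serious (downgrade two levels). The sketch below tallies those judgments under the simplifying assumption that the domains are purely additive; in practice reviewers make an overall judgment and avoid counting the same problem twice. The domain keys and the `total_downgrade` function are hypothetical.

```python
# Levels to drop per concern rating: serious = 1, very serious = 2.
PENALTY = {"not serious": 0, "serious": 1, "very serious": 2}

# Hypothetical keys for the five downgrading domains.
DOMAINS = ("risk of bias", "inconsistency", "indirectness",
           "imprecision", "publication bias")


def total_downgrade(concerns: dict) -> int:
    """Sum the levels to downgrade across the five domains.

    `concerns` maps a domain name to its rating; domains that are not
    listed count as "not serious". Simple addition is an assumption.
    """
    unknown = set(concerns) - set(DOMAINS)
    if unknown:
        raise ValueError(f"unknown domains: {sorted(unknown)}")
    return sum(PENALTY[concerns.get(d, "not serious")] for d in DOMAINS)


levels = ["very low", "low", "moderate", "high"]
start = levels.index("high")  # RCT evidence starts at high
drop = total_downgrade({"risk of bias": "serious", "imprecision": "serious"})
print(drop)                          # 2
print(levels[max(0, start - drop)])  # low
```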
Reasons for upgrading evidence quality include:
- Large Magnitude of Effect: When the intervention demonstrates a large effect.
- Dose-Response Gradient: Finding that higher exposures to the intervention lead to increased effects.
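- All Plausible Confounding: If all plausible residual confounding would be expected to reduce the observed effect (or to create an apparent effect where none was found), yet an effect is still observed, the true effect is likely at least as large, and the evidence can be upgraded.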
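For the magnitude-of-effect criterion, commonly cited GRADE guidance treats a relative risk (RR) above 2 or below 0.5 as a large effect (upgrade one level), and above 5 or below 0.2 as a very large effect (upgrade two levels), provided the evidence is consistent and there are no plausible confounders. The hypothetical helper below encodes only those thresholds; it is a sketch, not a substitute for the judgment the GRADE handbook describes.

```python
def large_effect_upgrade(relative_risk: float) -> int:
    """Return the levels to upgrade for magnitude of effect, given an RR.

    Hypothetical helper. Thresholds: RR > 5 or < 0.2 -> +2 (very large);
    RR > 2 or < 0.5 -> +1 (large); otherwise 0.
    """
    if relative_risk <= 0:
        raise ValueError("relative risk must be positive")
    if relative_risk > 5 or relative_risk < 0.2:
        return 2
    if relative_risk > 2 or relative_risk < 0.5:
        return 1
    return 0


for rr in (1.4, 0.4, 3.0, 6.5):
    print(rr, large_effect_upgrade(rr))
# 1.4 0
# 0.4 1
# 3.0 1
# 6.5 2
```

The thresholds are symmetric on the ratio scale (0.5 is the reciprocal of 2, and 0.2 the reciprocal of 5), so protective and harmful effects of the same size are treated alike.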
You can read more about GRADE in the GRADE handbook.