Quality of evidence

Several systems exist for rating the quality of evidence. One of the most widely used is GRADE (Grading of Recommendations Assessment, Development, and Evaluation). Under GRADE, the quality of evidence is rated as very low, low, moderate, or high:

Very low	The true effect is probably markedly different from the estimated effect
Low	The true effect might be markedly different from the estimated effect
Moderate	The true effect is probably close to the estimated effect
High	There is a lot of confidence that the true effect is similar to the estimated effect

Under GRADE, evidence from randomized controlled trials (RCTs) starts as high quality, while evidence from other study designs, such as observational studies, starts as low quality. The rating can then be downgraded for limitations of the evidence or upgraded for its strengths.

Reasons for downgrading evidence quality include:

Risk of Bias: Flaws in study design or execution that could introduce errors.
Inconsistency: Discrepancy in results across studies, indicated by varying effect sizes or significant heterogeneity.
Indirectness: When the study population, interventions, outcomes, or comparators do not directly match the research question.
Imprecision: Wide confidence intervals or a small number of events (from a small sample size or few affected patients), which make the estimate less reliable.
Publication Bias: Suspected selective publication of studies, such as only those with positive outcomes.
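
To make the imprecision criterion concrete, the sketch below computes a 95% confidence interval for a risk ratio using the standard log-scale approximation. The function name and the trial counts are hypothetical and chosen only for illustration; GRADE itself does not prescribe this particular calculation, and judging imprecision also involves context such as clinically important thresholds.

```python
import math

def risk_ratio_ci(events_treat, n_treat, events_ctrl, n_ctrl, z=1.96):
    """Risk ratio with an approximate 95% CI (log-scale normal approximation).

    Hypothetical helper for illustration; assumes at least one event per arm.
    """
    rr = (events_treat / n_treat) / (events_ctrl / n_ctrl)
    # Standard error of log(RR)
    se = math.sqrt(1 / events_treat - 1 / n_treat + 1 / events_ctrl - 1 / n_ctrl)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper

# Same risk ratio, very different precision (made-up counts):
small = risk_ratio_ci(4, 50, 8, 50)          # few events
large = risk_ratio_ci(400, 5000, 800, 5000)  # many events
print("small trial:", small)
print("large trial:", large)
```

Both hypothetical trials give the same point estimate, but the interval from the small trial is wide enough to include a risk ratio of 1 (no effect), while the interval from the large trial is not. A result like the first one is the kind that might be downgraded for imprecision.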

Reasons for upgrading evidence quality include:

Large Magnitude of Effect: When the intervention demonstrates a large effect.
Dose-Response Gradient: Finding that higher exposures to the intervention lead to increased effects.
All Plausible Confounding: When all plausible confounding factors would be expected to weaken the observed effect (or to create an effect where none was found), yet the result still holds, the evidence can be upgraded.
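
The rating logic described above can be sketched as a simple step up or down a four-level scale. This is a deliberately simplified illustration, not the GRADE procedure itself: in practice each downgrade or upgrade is a judgment call rather than arithmetic. The function name and its parameters are hypothetical, and the sketch assumes that non-randomized evidence starts at low.

```python
# Ordered from lowest to highest quality
LEVELS = ["very low", "low", "moderate", "high"]

def grade_rating(is_rct, downgrades=0, upgrades=0):
    """Return a quality-of-evidence level (hypothetical helper).

    is_rct:     True if the evidence comes from randomized controlled trials
    downgrades: number of levels removed (risk of bias, inconsistency, ...)
    upgrades:   number of levels added (large effect, dose-response, ...)
    """
    start = LEVELS.index("high") if is_rct else LEVELS.index("low")
    index = start - downgrades + upgrades
    # The rating cannot go below "very low" or above "high"
    index = max(0, min(index, len(LEVELS) - 1))
    return LEVELS[index]

print(grade_rating(is_rct=True, downgrades=1))   # RCTs with one serious limitation
print(grade_rating(is_rct=False, upgrades=1))    # observational, large effect
```

Under these assumptions, RCT evidence downgraded once ends up at moderate, and observational evidence upgraded once also ends up at moderate.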

You can read more about GRADE in the GRADE handbook.