Improving the Methodological Quality of Single-Case Experimental Design Meta-Analysis

Laleh Jamshidi (1)*, Lies Declercq (1), John M. Ferron (2), Mariola Moeyaert (3), S. Natasha Beretvas (4), and Wim Van den Noortgate (1)
(1) Faculty of Psychology and Educational Sciences & imec-Itec, KU Leuven (University of Leuven), Belgium
(2) University of South Florida, Tampa, Florida, USA
(3) University at Albany – State University of New York, New York, USA
(4) University of Texas at Austin, Texas, USA

The classic research design for investigating an intervention effect is the group comparison experimental design. In this kind of design, participants are randomly assigned to either an intervention or a control group, and the means of one or more dependent variables are compared to assess the effectiveness of the intervention. These designs require a large sample of participants to obtain reliable effect size estimates and reach an acceptable level of statistical power. Single-case experimental designs (SCEDs) are alternative research designs that do not require many participants (or cases) and are therefore well suited to studying rare phenomena, e.g., specific diseases or disabilities [1][2][3]. In this kind of design, outcomes of interest are measured repeatedly for one or multiple cases under at least two conditions (typically a control phase followed by an intervention phase). Within each case, the measurements are compared across conditions or phases to investigate whether introducing the intervention has a causal effect on one or more outcomes [2][4][5][6][7]. SCEDs are frequently used in fields such as psychology and educational sciences to evaluate the effectiveness of interventions of interest [7][8][9][10][11].
Because of the small number of participants, the main limitation of SCEDs is the limited generalizability of their findings. To address this limitation, SCEDs can be replicated across participants, and systematic review (SR) approaches can be applied to synthesize the results [4][12][13]. An SR is a kind of literature review that identifies, evaluates, and aggregates all relevant studies on the same topic. Specific methods can be applied in an SR to reduce possible systematic bias in answering particular research question(s) [14]. An SR can include a meta-analysis (MA), which refers to a statistical integration of the findings from individual studies, typically by combining and comparing observed effect sizes [15].
SCED data have specific features that should be taken into account when calculating effect sizes in individual studies and when synthesizing those effect sizes in a meta-analysis afterwards; otherwise, biased estimates might be obtained and statistical inferences may be flawed. For instance, the outcome variable could systematically decrease or increase over time even without exposure to any intervention. Such a time trend should be accounted for when calculating effect sizes [4][16]. Another feature to consider is the possible presence of serial dependency, or autocorrelation, in which measurements taken close together in time are more similar than measurements taken farther apart, violating the assumption of independence [17][18].
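As a minimal illustration of these two features, the sketch below estimates a linear time trend and the lag-1 autocorrelation for a single baseline phase. The function names and the data are hypothetical and are not taken from any of the reviewed studies; the slope is an ordinary least-squares slope on the session index, and the autocorrelation is the usual moment estimator.

```python
from statistics import mean


def baseline_slope(y):
    """Least-squares slope of y regressed on the session index 0, 1, 2, ..."""
    if len(y) < 2:
        raise ValueError("need at least two measurements")
    t = list(range(len(y)))
    t_bar, y_bar = mean(t), mean(y)
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    return sxy / sxx


def lag1_autocorrelation(y):
    """Sum of products of adjacent deviations from the mean,
    divided by the sum of squared deviations."""
    m = mean(y)
    dev = [yi - m for yi in y]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("constant series: autocorrelation is undefined")
    return sum(a * b for a, b in zip(dev, dev[1:])) / denom


# Hypothetical baseline-phase scores for one case (illustrative only).
baseline = [2, 3, 3, 4, 5, 5]
print(round(baseline_slope(baseline), 3))
print(round(lag1_autocorrelation(baseline), 3))
```

A positive slope in the baseline suggests the outcome was already changing before the intervention. Because a trend by itself inflates the raw lag-1 estimate, one option is to compute the autocorrelation on the residuals after removing the trend; with phases as short as those typical of SCEDs, both estimates are imprecise.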
Conducting a SCED SR or MA can provide better insight into the overall effectiveness of interventions, as well as into factors that moderate the effect. Yet poorly conducted SRs/MAs can lead to inaccurate inferences about intervention effectiveness. Conclusions may be affected by deficiencies in designing, performing, and reporting these SRs/MAs. Therefore, it is important that users of SR/MA results (e.g., clinicians, researchers, and policy makers) consider the methodological quality of these studies. One way to do this is to assess their quality by means of a standardized tool. Such a tool may also help meta-analysts and systematic reviewers ensure that their studies are well designed, conducted, and reported. Besides giving insight into the specific strengths and weaknesses of a study, such a tool can also be used to assess quality in general, although there is considerable debate over using a quantifiable summary score to assess and rate quality. The results of our recent systematic review of 178 SCED MAs conducted between 1985 and 2015 [19] indicate that, according to the R-AMSTAR, a considerable percentage of studies scored low on methodological quality. This tool assesses methodological quality based on 11 main items that are further operationalized by means of 44 criteria. In order to apply the scale to SCED MAs rather than to MAs of group comparison studies, we had to reformulate some of the criteria. The MAs scored relatively high on some aspects, such as "providing the characteristics of the included studies" and "doing a comprehensive literature search". The main deficiencies were related to "reporting an assessment of the likelihood of publication bias" and "using the methods appropriately to combine the findings of studies". In that review of SCED MAs, methodological quality was evaluated by applying the modified R-AMSTAR, but other tools are available that can be used.
In the review of Jamshidi et al. (in press) [19], the R-AMSTAR was chosen because it was found to be more comprehensive and detailed than other tools and because it can produce a quantifiable assessment of methodological quality. More details on the choice of the R-AMSTAR and on the modified items can be found in that paper. In the current review, we give an overview of some of the frequently used tools for either assessing or improving the quality of SRs/MAs, and discuss their appropriateness for SCED SRs/MAs. To the best of our knowledge, there is no specific validated tool to assess the quality of SCED MAs or SRs, and further research to produce such a tool would be quite beneficial.

Approaches to Evaluate and Improve the Quality of Systematic Reviews and Meta-Analyses
To avoid inaccurate conclusions that might mislead decision-makers, meta-analysts and systematic reviewers should try to reduce key methodological deficiencies [20-22], such as not applying a random-effects model in case of heterogeneity, not assessing the likelihood of publication bias, or not considering the scientific quality of included studies when formulating conclusions, among others. Such deficiencies can also be expected to occur in SCED MAs and SRs. Conflicting results from SRs may confuse readers [23] and make it more difficult for practitioners and clinicians to draw appropriate inferences. For systematic reviews and meta-analyses to provide valid and reliable evidence for informing decisions in research and policy-making, they must strictly uphold high methodological standards [21][23-25].
In addition, the users of SRs and MAs have a responsibility [26]: scientists, practitioners, and clinicians should critically examine the methodological quality of an SR to avoid relying on potentially misleading information when developing clinical decisions and guidelines [20][25][27][28].
Several tools have been developed specifically to assess the quality of SRs and MAs, for use both by those conducting an MA/SR and by those who use the results, such as practitioners and clinicians. By applying such tools, meta-analysts can ensure their studies meet high standards of quality, while users can be better informed about the reliability of an MA or SR when basing decisions on it. Table 1 lists some of the more well-known and commonly used tools [28][29][30] that have been specifically developed for assessing the quality of SRs/MAs and describes their basic features and guidance for their use (e.g., the purpose of the tool, the number of items, the items, and the judgement). Note that these tools are not specifically intended for MAs of SCED studies. However, facets of these tools are useful and appropriate for judging the quality of SRs and MAs of SCED research studies.

Table 1 (entries in their original order; tool names and item counts that did not survive in the source are marked as not recoverable)

Tool: [not recoverable from the source]
Judgement: For each of these areas the checklist evaluates whether they are addressed in the systematic review or not. "Adequate" when the item had been fully addressed, "Partial" when some aspect was missing, and "None or unknown" when the item was not addressed.

Tool: [not recoverable from the source]
Purpose: Assessing the scientific quality of research overviews.
Items: 10 items (9 individual items for assessing the quality; the last item is the overall assessment based on the first 9 items). Assessing a review's validity in terms of process rather than outcome. This tool can evaluate the potential threats to validity of this process.
Judgement: Clearly meets the criterion (scored as "yes"), clearly does not meet the criterion (scored as "no"), partially meets or is unclear whether it has met the criterion (scored as "partially") [34].

Tool: Assessment of Multiple Systematic Reviews (AMSTAR) [28]
Purpose: Assessing the methodological quality of SRs.
Judgement: Each individual item should be scored as one of "Yes", "No", "Can't answer", or "Not applicable".

Tool: Revised AMSTAR (R-AMSTAR) [35]
Purpose: Assessing the methodological quality of SRs in a quantifiable way.
Judgement: Each domain's score ranges from 1 to 4 (based on how many criteria were met); the total R-AMSTAR score is calculated by summing the scores of all 11 domains and ranges from 11 to 44.

Tool: Methodology checklist for SRs and MAs
Items: [number not recoverable] domains. The same domains as AMSTAR, but domain 1 changed as follows: the research question is clearly defined and the inclusion/exclusion criteria must be listed in the paper. Domain 2 was divided into two separate domains as follows: at least two people should have selected studies; at least two people should have extracted data.
Judgement: Most of the items are scored with "yes" and "no". Other items have options such as "can't say" and/or "not applicable". The overall assessment of the study is judged as "high quality", "acceptable", "low quality", or "unacceptable".

Tool: [not recoverable from the source]
Purpose: Assess the risk of bias in an SR.
Items: [number not recoverable] main domains with 21 items:
Study eligibility criteria: five criteria, e.g., on clarity, relevance and the reflection of objectives, eligibility criteria, and restrictions on inclusion.
Identification and selection of studies: five criteria, e.g., on search strategy, searching in databases or any additional methods, restrictions for search and selection, and minimizing the errors in selection.
Data collection and study appraisal: five criteria, e.g., on minimizing the error in data collection, providing study characteristics, collecting study results for synthesizing, assessing quality, and minimizing the risk of bias in assessment.
Synthesis and findings: six criteria, e.g., on synthesizing all the studies, following all the predefined analyses, addressing heterogeneity, checking sensitivity analysis, and checking or addressing the biases in primary studies.
Judgement: The items are scored as "Yes", "Probably Yes", "Probably No", "No", and "No Information", with "Yes" indicating low concerns. The subsequent level of concern about bias associated with each domain is then judged as "low", "high", or "unclear".

Some of these tools focus on the description of the methodology and findings (e.g., PRISMA and QUOROM), while others concentrate on methodological quality and evaluate how well the SR was designed and performed (e.g., AMSTAR, R-AMSTAR, OQAQ) [31]. Some of the abovementioned tools explicitly state that they can be used not only for conducting and reporting MAs/SRs, but also for critically appraising published MAs/SRs (e.g., Sacks' checklist, PRISMA, QUOROM). Although the descriptions of the other tools do not explicitly state whether they can be applied by meta-analysts and reviewers while performing and reporting their studies, we believe that awareness of the criteria that might be used to critically appraise the quality of SRs/MAs can help researchers in designing, conducting, and reporting the results and conclusions of SRs/MAs. Table 2 gives a further comparison of the content of the reviewed tools. Some tools assess one aspect of methodological quality via one general item, whereas others use multiple detailed criteria. The comparison indicates that providing the search strategy, assessing the validity/quality of primary studies, and checking the possibility of combining the results are aspects considered in all reviewed tools.
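The R-AMSTAR total described in Table 1 (11 domain scores, each from 1 to 4, summed to a total between 11 and 44) can be sketched as follows. This is only an illustration: the function name is hypothetical, and the rule that converts the number of criteria met into a domain score is not given here, so the sketch takes the domain scores as input rather than deriving them.

```python
N_DOMAINS = 11                 # number of R-AMSTAR domains, as stated in Table 1
MIN_SCORE, MAX_SCORE = 1, 4    # range of each domain score


def r_amstar_total(domain_scores):
    """Sum the 11 domain scores after checking count and range."""
    if len(domain_scores) != N_DOMAINS:
        raise ValueError(f"expected {N_DOMAINS} domain scores")
    for score in domain_scores:
        if not MIN_SCORE <= score <= MAX_SCORE:
            raise ValueError(f"domain score out of range: {score}")
    return sum(domain_scores)


# Hypothetical domain scores for one meta-analysis (illustrative only).
scores = [4, 3, 4, 2, 3, 4, 2, 1, 2, 1, 3]
print(r_amstar_total(scores))
```

The bounds follow directly from the scoring rule: 11 domains at the minimum score give 11, and 11 domains at the maximum give 44.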

Discussion
SRs and MAs are essential methods for aggregating the results of primary studies in a specific field. Nevertheless, the reliability and validity of their conclusions can be compromised by methodological flaws. Since limited generalizability is a key limitation of SCED studies as a source of information for practitioners and clinicians making decisions and developing guidelines for practice, conducting high-quality SCED MAs/SRs is of the utmost importance. The recent review of the methodological quality of SCED MAs [19] indicates that improving the scientific quality of SCED MAs/SRs is necessary. Applying a validated tool (or a modified tool or a combination of tools) consisting of methodological standards might support meta-analysts/reviewers who are conducting studies, and might help users (e.g., clinicians, practitioners, and decision-makers) appraise the quality of the MA/SR that they are referencing. Because there is no validated tool to assess specifically the methodological quality of SCED MAs and SRs, applied researchers can use one of the existing tools or a combination of multiple tools, or better yet develop and validate a new tool for conducting high-quality MAs/SRs of SCED studies' results. Most of these tools can be used to evaluate the quality of SCED MAs/SRs because they mainly focus on general aspects of methodological quality that do not depend heavily on the primary studies included in the review.
However, some of the detailed criteria of the reviewed tools may need to be modified or omitted, or new criteria added, to make the tools more applicable for assessing SCED MAs, as was done in the study by Jamshidi et al. (in press) [19]. For instance, based on the recommendations of the What Works Clearinghouse (WWC) [2] for combining the results of multiple SCED studies into a single summary, MAs have to meet certain thresholds: 1) a minimum of five SCED studies examining the intervention that Meet Evidence Standards or Meet Evidence Standards with Reservations, 2) the SCED studies must be conducted by at least three different research teams at three different geographical locations, and 3) the aggregated number of experiments across the studies must total at least 20. Such criteria can help SCED meta-analysts ensure they are following some standards while conducting their own reviews. In addition, features of SCED data such as time trend or serial dependency, which might lead to an overestimated or underestimated intervention effect, should be taken into consideration in meta-analyses. None of the reviewed tools specifically took these SCED-specific recommendations into account, possibly because the tools were not developed for assessing the quality of SCED MAs in particular. These recommendations can be considered by meta-analysts or users when developing new tools or applying existing tools to assess the methodological quality of SCED MAs.
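The three WWC thresholds listed above can be expressed as a simple check. The sketch below is illustrative only: the `ScedStudy` record and its field names are hypothetical, and it assumes that teams, locations, and experiments are counted only over studies that Meet Evidence Standards (with or without Reservations), which is one reading of the thresholds rather than a confirmed rule.

```python
from dataclasses import dataclass


@dataclass
class ScedStudy:
    """Hypothetical record for one SCED study in a candidate synthesis."""
    research_team: str
    location: str
    n_experiments: int
    meets_standards: bool  # Meets Evidence Standards, with or without Reservations


def meets_wwc_thresholds(studies):
    """Apply the three thresholds to the studies that meet evidence standards."""
    eligible = [s for s in studies if s.meets_standards]
    teams = {s.research_team for s in eligible}
    locations = {s.location for s in eligible}
    total_experiments = sum(s.n_experiments for s in eligible)
    return (
        len(eligible) >= 5              # 1) at least five qualifying studies
        and len(teams) >= 3             # 2) at least three research teams ...
        and len(locations) >= 3         #    ... at three geographical locations
        and total_experiments >= 20     # 3) at least 20 experiments in total
    )


# Hypothetical set of studies (illustrative only).
studies = [
    ScedStudy("Team A", "Site 1", 4, True),
    ScedStudy("Team A", "Site 1", 5, True),
    ScedStudy("Team B", "Site 2", 4, True),
    ScedStudy("Team C", "Site 3", 4, True),
    ScedStudy("Team C", "Site 3", 3, True),
    ScedStudy("Team D", "Site 4", 6, False),  # excluded: does not meet standards
]
print(meets_wwc_thresholds(studies))
```

A check like this could serve as one additional criterion when an existing quality tool is adapted for SCED MAs.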