This section explores whether and how AI-based automation tools can support systematic searching in evidence synthesis. While these tools may enhance efficiency, for example by translating natural language into structured queries and refining search strategies, they also have limitations in transparency, sensitivity, and specificity. This guide outlines the capabilities, limitations, and ethical considerations of using AI in evidence synthesis, with a focus on systematic searching, and emphasizes the need for human oversight and expertise. As AI continues to evolve rapidly, staying informed is essential for its responsible and effective use in research.
✔AI tools like Elicit and Consensus use Large Language Models (LLMs) to retrieve and summarize literature based on natural language prompts. These tools are useful for scoping reviews and exploratory searches, especially when identifying knowledge gaps and trends, as well as comparing perspectives.
❌AI tools can omit details and oversimplify arguments, introducing bias and errors. They also lack transparency, sensitivity, and specificity, making them unsuitable as replacements for systematic search strategies.
✔AI tools allow users to input questions in plain language, which are then automatically translated into the database's structured search syntax, lowering the barrier of needing advanced skills in Boolean operators and truncation (see the illustrative sketch below). Follow-up questions can be asked to explore topics further.
❌However, if a question is vague, ambiguous, or lacks specific keywords, the resulting query may be incomplete or imprecise. AI may also misinterpret context or nuances, leading to irrelevant or overly broad results.
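To illustrate what this structured syntax can look like, here is a minimal Python sketch that combines concept groups and synonyms into a Boolean query with truncation. The concept lists and the [tiab] field tag are invented example values, not the output of any specific AI tool, and exact phrase/truncation handling varies by database.

```python
# Illustrative sketch: how a plain-language question about exercise and
# depression in adolescents might map to structured Boolean syntax.
# Concept groups, synonyms, and the [tiab] field tag are example values only;
# real databases handle phrases and truncation differently.

concept_groups = {
    "population": ["adolescen*", "teen*", "youth*"],
    "intervention": ["exercise*", "physical activit*"],
    "outcome": ["depress*", "mood disorder*"],
}

def build_boolean_query(groups: dict[str, list[str]], field_tag: str = "[tiab]") -> str:
    """Combine synonyms with OR within each concept and AND across concepts."""
    blocks = []
    for terms in groups.values():
        tagged = [f'"{t}"{field_tag}' if " " in t else f"{t}{field_tag}" for t in terms]
        blocks.append("(" + " OR ".join(tagged) + ")")
    return " AND ".join(blocks)

print(build_boolean_query(concept_groups))
# (adolescen*[tiab] OR teen*[tiab] OR youth*[tiab]) AND (exercise*[tiab] OR ...) AND ...
```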
Developing Search Strategies:
✔AI-generated strategies may offer a starting point for identifying relevant concepts and synonyms,
❌but often require significant revision due to issues with sensitivity, precision, and hallucinated terms.
Reviewing & Improving Search Strategies:
✔AI tools can identify some errors, such as spelling, syntax, and field-tag mistakes (a simple example of such mechanical checks appears after this list),
❌but current tools are not reliable for full peer review.
Human expertise remains essential for evaluating logic, completeness, and appropriateness of search strategies.
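As a small illustration of the kinds of mechanical checks that can be scripted, here is a minimal Python sketch that flags unbalanced parentheses and unrecognized PubMed-style field tags in a search string. The allow-list of tags is example-only, and such checks say nothing about the logic or completeness of a strategy.

```python
# Illustrative sketch: mechanical checks on a PubMed-style search string.
# Flags unbalanced parentheses and field tags outside a small example
# allow-list; it does not assess whether the strategy is logically sound.

import re

KNOWN_TAGS = {"[tiab]", "[ti]", "[mh]", "[majr]"}  # example-only allow-list

def check_search_string(query: str) -> list[str]:
    problems = []
    if query.count("(") != query.count(")"):
        problems.append("Unbalanced parentheses")
    for tag in re.findall(r"\[[a-z]+\]", query, flags=re.IGNORECASE):
        if tag.lower() not in KNOWN_TAGS:
            problems.append(f"Unrecognized field tag: {tag}")
    return problems

print(check_search_string("(exercise*[tiab] OR sport*[taib] AND depress*[tiab]"))
# ['Unbalanced parentheses', 'Unrecognized field tag: [taib]']
```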
✔AI tools can quickly translate the keyword or free-text parts of a search strategy, which stay largely the same across databases (a simple illustration follows below).
❌However, they are less reliable at mapping terms to each database's controlled vocabulary (e.g. MeSH in MEDLINE/PubMed, Emtree in Embase).
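As a simple illustration of the syntax translation involved, here is a minimal Python sketch that rewrites field tags from PubMed-style to Ovid-style syntax. The tag mapping is deliberately tiny and example-only; controlled-vocabulary terms (MeSH, Emtree) still need to be re-mapped and verified by an information specialist.

```python
# Illustrative sketch: translating field tags in a free-text strategy
# from PubMed syntax to Ovid syntax. The mapping covers only two tags
# and is example-only; controlled-vocabulary terms must be checked manually.

PUBMED_TO_OVID = {
    "[tiab]": ".ti,ab.",   # title/abstract
    "[ti]": ".ti.",        # title only
}

def translate_field_tags(query: str, mapping: dict[str, str] = PUBMED_TO_OVID) -> str:
    for pubmed_tag, ovid_tag in mapping.items():
        query = query.replace(pubmed_tag, ovid_tag)
    return query

pubmed_query = "(adolescen*[tiab] OR teen*[tiab]) AND exercise*[tiab]"
print(translate_field_tags(pubmed_query))
# (adolescen*.ti,ab. OR teen*.ti,ab.) AND exercise*.ti,ab.
```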
✔Efficiency Gains: AI tools can speed up early-stage exploration with AI-generated literature summaries and reduce manual effort in tasks such as deduplication and screening (a small deduplication sketch follows below). They may also support iterative refinement and updating of searches.
❌Accuracy Concerns: AI-generated outputs often lack the precision and recall required for systematic reviews. Subscription-based databases also limit automation in running search strategies due to restricted API access.
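As one example of the kind of manual effort that scripting can reduce, here is a minimal Python sketch that deduplicates exported records by DOI, falling back to a normalized title. The record fields and matching rule are assumptions for the example; real reference managers and screening tools use richer rules, and borderline duplicates still need human review.

```python
# Illustrative sketch: deduplicating exported records by DOI or normalized
# title. The record structure is assumed for the example only.

import re

def normalize_title(title: str) -> str:
    """Lowercase and strip punctuation/whitespace so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def deduplicate(records: list[dict]) -> list[dict]:
    seen, unique = set(), []
    for rec in records:
        key = rec.get("doi") or normalize_title(rec.get("title", ""))
        if key and key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

records = [
    {"doi": "", "title": "Exercise and adolescent depression"},
    {"doi": "", "title": "Exercise and Adolescent Depression."},  # duplicate, different casing
]
print(len(deduplicate(records)))  # 1
```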
Before using any AI tool, carefully review its terms and conditions, especially clauses related to data ownership, privacy, and the collection of your data for training purposes.
Responsible use of AI tools involves applying AI where it adds value without compromising the core principles of evidence synthesis: methodological rigor, integrity of synthesis, transparency, and reproducibility. Staying current with rapidly evolving AI technology is essential to understand the strengths, limitations, and potential biases of these tools. Human oversight and expertise are required to review, revise, contextualize, and validate outputs. After all, humans remain ultimately responsible for the evidence synthesis.
Any AI tool that makes or suggests judgements should be fully and transparently reported in the evidence synthesis report.
AI is best used as a supportive, assistive tool rather than a replacement for expert searchers. When paired with librarians and information specialists, AI can accelerate the searching process, but human oversight remains essential to ensure accuracy, relevance, and methodological rigor.
Clarivate. (2025). Evaluating the quality of generative AI output: Methods, metrics and best practices. https://clarivate.com/academia-government/blog/evaluating-the-quality-of-generative-ai-output-methods-metrics-and-best-practices/
Cochrane Training. (2025). Artificial Intelligence (AI) methods in evidence synthesis: Learning Live webinar series. https://training.cochrane.org/AI-in-evidence-synthesis-webinars
EBSCO. (2025, August 4). 5 Research Tool Features to Boost Efficiency and Combat MSL Burnout. https://www.ebsco.com/blogs/ebscopost/5-research-tool-features-boost-efficiency-and-combat-msl-burnout
Lieberum, J.-L., Toews, M., Metzendorf, M.-I., Heilmeyer, F., Siemens, W., Haverkamp, C., Böhringer, D., Meerpohl, J. J., & Eisele-Metzger, A. (2025). Large language models for conducting systematic reviews: on the rise, but not yet ready for use—a scoping review. Journal of Clinical Epidemiology, 181, 111746. https://doi.org/10.1016/j.jclinepi.2025.111746
Page, M. J., Moher, D., Bossuyt, P. M., Boutron, I., Hoffmann, T. C., Mulrow, C. D., Shamseer, L., Tetzlaff, J. M., Akl, E. A., Brennan, S. E., Chou, R., Glanville, J., Grimshaw, J. M., Hróbjartsson, A., Lalu, M. M., Li, T., Loder, E. W., Mayo-Wilson, E., McDonald, S., McKenzie, J. E. (2021). PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews. BMJ, 372, n160. https://doi.org/10.1136/bmj.n160
(updated on 24 Sept 2025)