This section explores whether and how AI-based automation tools can support systematic searching in evidence synthesis. While these tools may enhance efficiency, for example by translating natural language into structured queries and refining search strategies, they also have limitations in transparency, sensitivity, and specificity. This guide outlines the capabilities, limitations, and ethical considerations of using AI in evidence synthesis, with a focus on systematic searching, and emphasizes the need for human oversight and expertise. As AI continues to evolve rapidly, staying informed is essential for its responsible and effective use in research.
✔AI tools like Elicit and Consensus use Large Language Models (LLMs) to retrieve and summarize literature based on natural language prompts. These tools are useful for scoping reviews and exploratory searches, especially when identifying knowledge gaps and trends, as well as comparing perspectives.
❌AI tools can omit details and oversimplify arguments, introducing bias and errors. They also lack transparency, sensitivity, and specificity, making them unsuitable as replacements for systematic search strategies.
✔AI tools allow users to input questions in plain language, which are then automatically translated into the database's structured search syntax, lowering the barrier of needing advanced skills in Boolean operators and truncation (see the illustrative sketch below). Follow-up questions can be asked to explore topics further.
❌However, if a question is vague, ambiguous, or lacks specific keywords, the resulting query may be incomplete or imprecise. AI may also misinterpret context or nuances, leading to irrelevant or overly broad results.
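To illustrate what this structured syntax can look like, here is a minimal Python sketch that combines concept groups and synonyms into a Boolean query with truncation. The concept lists and the [tiab] field tag are invented example values, not the output of any specific AI tool, and exact phrase/truncation handling varies by database.

```python
# Illustrative sketch: how a plain-language question about exercise and
# depression in adolescents might map to structured Boolean syntax.
# Concept groups, synonyms, and the [tiab] field tag are example values only;
# real databases handle phrases and truncation differently.

concept_groups = {
    "population": ["adolescen*", "teen*", "youth*"],
    "intervention": ["exercise*", "physical activit*"],
    "outcome": ["depress*", "mood disorder*"],
}

def build_boolean_query(groups: dict[str, list[str]], field_tag: str = "[tiab]") -> str:
    """Combine synonyms with OR within each concept and AND across concepts."""
    blocks = []
    for terms in groups.values():
        tagged = [f'"{t}"{field_tag}' if " " in t else f"{t}{field_tag}" for t in terms]
        blocks.append("(" + " OR ".join(tagged) + ")")
    return " AND ".join(blocks)

print(build_boolean_query(concept_groups))
# (adolescen*[tiab] OR teen*[tiab] OR youth*[tiab]) AND (exercise*[tiab] OR ...) AND ...
```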
Developing Search Strategies:
✔AI-generated strategies may offer a starting point for identifying relevant concepts and synonyms,
❌but often require significant revision due to issues with sensitivity, precision, and hallucinated terms.
Reviewing & Improving Search Strategies:
✔AI tools can identify some errors, such as spelling, syntax, and field-tag mistakes (a simple example of such mechanical checks appears after this list),
❌but current tools are not reliable for full peer review.
Human expertise remains essential for evaluating logic, completeness, and appropriateness of search strategies.
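As a small illustration of the kinds of mechanical checks that can be scripted, here is a minimal Python sketch that flags unbalanced parentheses and unrecognized PubMed-style field tags in a search string. The allow-list of tags is example-only, and such checks say nothing about the logic or completeness of a strategy.

```python
# Illustrative sketch: mechanical checks on a PubMed-style search string.
# Flags unbalanced parentheses and field tags outside a small example
# allow-list; it does not assess whether the strategy is logically sound.

import re

KNOWN_TAGS = {"[tiab]", "[ti]", "[mh]", "[majr]"}  # example-only allow-list

def check_search_string(query: str) -> list[str]:
    problems = []
    if query.count("(") != query.count(")"):
        problems.append("Unbalanced parentheses")
    for tag in re.findall(r"\[[a-z]+\]", query, flags=re.IGNORECASE):
        if tag.lower() not in KNOWN_TAGS:
            problems.append(f"Unrecognized field tag: {tag}")
    return problems

print(check_search_string("(exercise*[tiab] OR sport*[taib] AND depress*[tiab]"))
# ['Unbalanced parentheses', 'Unrecognized field tag: [taib]']
```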
✔AI tools can quickly translate the keyword or free-text parts of a search strategy, which stay largely the same across databases (a simple illustration follows below).
❌However, they are less reliable at mapping terms to each database's controlled vocabulary (e.g. MeSH in MEDLINE/PubMed, Emtree in Embase).
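As a simple illustration of the syntax translation involved, here is a minimal Python sketch that rewrites field tags from PubMed-style to Ovid-style syntax. The tag mapping is deliberately tiny and example-only; controlled-vocabulary terms (MeSH, Emtree) still need to be re-mapped and verified by an information specialist.

```python
# Illustrative sketch: translating field tags in a free-text strategy
# from PubMed syntax to Ovid syntax. The mapping covers only two tags
# and is example-only; controlled-vocabulary terms must be checked manually.

PUBMED_TO_OVID = {
    "[tiab]": ".ti,ab.",   # title/abstract
    "[ti]": ".ti.",        # title only
}

def translate_field_tags(query: str, mapping: dict[str, str] = PUBMED_TO_OVID) -> str:
    for pubmed_tag, ovid_tag in mapping.items():
        query = query.replace(pubmed_tag, ovid_tag)
    return query

pubmed_query = "(adolescen*[tiab] OR teen*[tiab]) AND exercise*[tiab]"
print(translate_field_tags(pubmed_query))
# (adolescen*.ti,ab. OR teen*.ti,ab.) AND exercise*.ti,ab.
```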
✔Efficiency Gains: AI tools can speed up early-stage exploration with AI-generated literature summaries and reduce manual effort in tasks such as deduplication and screening (a small deduplication sketch follows below). They may also support iterative refinement and updating of searches.
❌Accuracy Concerns: AI-generated outputs often lack the precision and recall required for systematic reviews. Subscription-based databases also limit automation in running search strategies due to restricted API access.
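As one example of the kind of manual effort that scripting can reduce, here is a minimal Python sketch that deduplicates exported records by DOI, falling back to a normalized title. The record fields and matching rule are assumptions for the example; real reference managers and screening tools use richer rules, and borderline duplicates still need human review.

```python
# Illustrative sketch: deduplicating exported records by DOI or normalized
# title. The record structure is assumed for the example only.

import re

def normalize_title(title: str) -> str:
    """Lowercase and strip punctuation/whitespace so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def deduplicate(records: list[dict]) -> list[dict]:
    seen, unique = set(), []
    for rec in records:
        key = rec.get("doi") or normalize_title(rec.get("title", ""))
        if key and key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

records = [
    {"doi": "", "title": "Exercise and adolescent depression"},
    {"doi": "", "title": "Exercise and Adolescent Depression."},  # duplicate, different casing
]
print(len(deduplicate(records)))  # 1
```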
Before using any AI tool, carefully review its terms and conditions, especially clauses related to data ownership, privacy, and the collection of your data for training purposes.
Responsible use of AI tools involves applying AI where it adds value without compromising the core principles of evidence synthesis: methodological rigor, integrity of synthesis, transparency, and reproducibility. Staying current with rapidly evolving AI technology is essential to understand the strengths, limitations, and potential biases of these tools. Human oversight and expertise are required to review, revise, contextualize, and validate outputs. After all, humans remain ultimately responsible for the evidence synthesis.
Any AI tool that makes or suggests judgements should be fully and transparently reported in the evidence synthesis report.
AI is best used as a supportive, assistive tool rather than a replacement for expert searchers. When paired with librarians and information specialists, AI can accelerate the searching process, but human oversight remains essential to ensure accuracy, relevance, and methodological rigor.
Clarivate. (2025). Evaluating the quality of generative AI output: Methods, metrics and best practices. https://clarivate.com/academia-government/blog/evaluating-the-quality-of-generative-ai-output-methods-metrics-and-best-practices/
Cochrane Training. (2025). Artificial Intelligence (AI) methods in evidence synthesis: Learning Live webinar series. https://training.cochrane.org/AI-in-evidence-synthesis-webinars
EBSCO. (2025, August 4). 5 Research Tool Features to Boost Efficiency and Combat MSL Burnout. https://www.ebsco.com/blogs/ebscopost/5-research-tool-features-boost-efficiency-and-combat-msl-burnout
Lieberum, J.-L., Toews, M., Metzendorf, M.-I., Heilmeyer, F., Siemens, W., Haverkamp, C., Böhringer, D., Meerpohl, J. J., & Eisele-Metzger, A. (2025). Large language models for conducting systematic reviews: on the rise, but not yet ready for use—a scoping review. Journal of Clinical Epidemiology, 181, 111746. https://doi.org/10.1016/j.jclinepi.2025.111746
Page, M. J., Moher, D., Bossuyt, P. M., Boutron, I., Hoffmann, T. C., Mulrow, C. D., Shamseer, L., Tetzlaff, J. M., Akl, E. A., Brennan, S. E., Chou, R., Glanville, J., Grimshaw, J. M., Hróbjartsson, A., Lalu, M. M., Li, T., Loder, E. W., Mayo-Wilson, E., McDonald, S., McKenzie, J. E. (2021). PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews. BMJ, 372, n160. https://doi.org/10.1136/bmj.n160
(updated on 24 Sept 2025)