Phone: 2766-6863 (service hours)
Online Form
Contact your Faculty Librarians for in-depth research questions
Research involves managing vast amounts of information, and GenAI offers new ways to handle it effectively. This guide introduces GenAI tools that can support different stages of the research process.
With the overwhelming number of GenAI tools, it can be challenging to determine which ones to use. Start by considering the following three factors to guide your decision:
While GenAI offers clear advantages for information searching, such as speed, accessibility, and the ability to generate diverse perspectives, it is essential to critically assess its output, as AI-generated information can be incorrect and may mislead users.
The CRAAP test is a simple tool for evaluating information sources, including AI-generated content. It involves asking yourself questions across five key aspects to determine whether a source is suitable for your research or decision-making. Below are some suggested questions focused specifically on evaluating AI-generated information.
Criteria | Description | Questions
---|---|---
C - Currency | Timeliness of information |
R - Relevance | Contextual fit |
A - Authority | Source credibility |
A - Accuracy | Reliability of content |
P - Purpose | Reason for existence |
Adapted from Evaluating Information - Applying the CRAAP Test by Meriam Library, California State University, Chico
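If it helps to make the checklist concrete, the short Python sketch below shows one possible way to record a CRAAP review of an AI-generated answer. The criteria and descriptions come from the table above; the checklist structure, the helper name, and the pass/fail scoring are illustrative assumptions, not part of the CRAAP test itself.

```python
# Illustrative sketch: recording a CRAAP review of an AI-generated answer.
# Criteria and descriptions are taken from the table above; the pass/fail
# scoring scheme is an assumption made for this example.

CRAAP_CRITERIA = {
    "Currency": "Timeliness of information",
    "Relevance": "Contextual fit",
    "Authority": "Source credibility",
    "Accuracy": "Reliability of content",
    "Purpose": "Reason for existence",
}

def craap_summary(assessment: dict) -> str:
    """Summarise which CRAAP criteria an AI-generated output satisfies."""
    passed = [name for name, ok in assessment.items() if ok]
    needs_review = [name for name in CRAAP_CRITERIA if name not in passed]
    return (f"Passed {len(passed)}/{len(CRAAP_CRITERIA)} criteria; "
            f"review further: {', '.join(needs_review) or 'none'}")

# Example: an answer that cites current, credible sources but whose factual
# accuracy has not yet been verified against the original references.
print(craap_summary({
    "Currency": True,
    "Relevance": True,
    "Authority": True,
    "Accuracy": False,
    "Purpose": True,
}))
```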
Understanding how well an LLM performs across different functionalities enables you to select the most appropriate tool for your specific research needs. IBM describes LLM benchmarks as standardized frameworks for assessing the performance of large language models (LLMs). These benchmarks facilitate the evaluation of LLM skills in different areas, such as coding, common sense, reasoning, natural language processing, and machine translation.
The table below consolidates selected LLM benchmark scores* for the models available in PolyU GenAI. You can compare the scores to determine the most suitable GenAI tool for your work; a short comparison sketch follows the remarks below.
*Data retrieved from llm-stats.com
Model | MMLU | MMLU-Pro | GPQA | SimpleQA | AIME 2024 | MATH | MGSM | HumanEval
---|---|---|---|---|---|---|---|---
DeepSeek-R1 | 90.8% | 84.0% | 71.5% | 30.1% | 79.8% | - | - | - |
Llama-3.3-70B-Instruct | 86.0% | 68.9% | 50.5% | - | - | 77.0% | 91.1% | 88.4% |
Mistral | 84.0% | - | - | - | - | - | - | 92.0% |
GPT-o1 | 91.8% | - | 78.0% | 47.0% | 83.3% | 96.4% | 89.3% | 88.1% |
GPT-4o | 88.0% | 74.7% | 53.6% | 38.2% | 13.4% | - | - | - |
GPT-4o-mini | 82.0% | - | 40.2% | - | - | 70.2% | 87.0% | 87.2% |
GPT-o3-mini | 86.9% | - | 79.7% | 15.0% | 87.3% | 97.9% | 92.0% | - |
Qwen2.5-72B-Instruct | - | 71.1% | 49.0% | - | - | 83.1% | - | 86.6% |
Remarks:
Knowledge & Reasoning benchmarks: MMLU, MMLU-Pro, GPQA, SimpleQA
MMLU: Knowledge and reasoning across science, math, and humanities.
MMLU-Pro: Advanced version of MMLU with more complex reasoning tests.
GPQA: 448 "Google-proof" questions in biology, physics, and chemistry.
SimpleQA: 4,326 fact-seeking short questions for specific answers.
Math benchmarks: AIME 2024, MATH, MGSM
AIME 2024: Challenging problems from the American Invitational Mathematics Examination, a high school mathematics competition.
MATH: A dataset of competition-level math problems across 5 levels & 7 disciplines.
MGSM: 250 grade-school math problems translated into multiple languages (Multilingual Grade School Math).
Coding benchmark: HumanEval
HumanEval: Assesses code generation capabilities through programming challenges.
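As a rough illustration of how these scores might be compared, the Python sketch below ranks the listed models on a chosen benchmark. The figures are copied from the table above, with missing scores ("-") stored as None; the column order follows the benchmark list in the remarks, and the helper rank_by and data layout are illustrative assumptions rather than an official comparison method.

```python
# Illustrative sketch: shortlisting models by one benchmark score.
# Scores are copied from the table above; "-" entries become None and are skipped.

BENCHMARKS = ["MMLU", "MMLU-Pro", "GPQA", "SimpleQA",
              "AIME 2024", "MATH", "MGSM", "HumanEval"]

SCORES = {
    "DeepSeek-R1":            [90.8, 84.0, 71.5, 30.1, 79.8, None, None, None],
    "Llama-3.3-70B-Instruct": [86.0, 68.9, 50.5, None, None, 77.0, 91.1, 88.4],
    "Mistral":                [84.0, None, None, None, None, None, None, 92.0],
    "GPT-o1":                 [91.8, None, 78.0, 47.0, 83.3, 96.4, 89.3, 88.1],
    "GPT-4o":                 [88.0, 74.7, 53.6, 38.2, 13.4, None, None, None],
    "GPT-4o-mini":            [82.0, None, 40.2, None, None, 70.2, 87.0, 87.2],
    "GPT-o3-mini":            [86.9, None, 79.7, 15.0, 87.3, 97.9, 92.0, None],
    "Qwen2.5-72B-Instruct":   [None, 71.1, 49.0, None, None, 83.1, None, 86.6],
}

def rank_by(benchmark: str):
    """Return (model, score) pairs with a reported score on `benchmark`, best first."""
    col = BENCHMARKS.index(benchmark)
    scored = [(model, row[col]) for model, row in SCORES.items() if row[col] is not None]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Example: shortlist models for reasoning-heavy research questions.
for model, score in rank_by("GPQA"):
    print(f"{model}: {score}%")
```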
Other LLM Benchmarking Websites