PolyU Library

Harnessing GenAI in Your Academic Journey


Research involves managing vast amounts of information, and GenAI offers new ways to handle it effectively. This guide introduces tools that support different stages of the research process.

3 Factors to Consider When Choosing a GenAI Tool

With the overwhelming number of GenAI tools available, it can be challenging to determine which ones to use. After you have identified a task where AI can boost efficiency, match your goal to the right tool(s) by evaluating the following 3 factors:

  • Functionality
    • Complex tasks → specialized tools or more accurate models
    • Simple tasks → generic tools or faster models
    • Refer to benchmarks to assess the effectiveness of a large language model (LLM)
  • Trustworthiness
    • Understand the tool's methodology for generating information and its inherent limitations
    • Check the data source to ensure its suitability for academic research; if the source is undisclosed, apply the CRAAP test and read laterally to evaluate the generated content
    • Verify how inputs are processed, encrypted, and stored, particularly for private, sensitive, or confidential data
    • Seek reviews and recommendations from peers, PolyU units, and online communities
  • Cost
    • Analyze the subscription fee of the tool and weigh it against your budget
    • Consider the time investment needed to learn the tool and use it with ease

After selecting a tool, do not forget to try it yourself. Test it hands-on to evaluate its performance and time savings, and to ensure it integrates well with your workflow. Also, remember to keep exploring emerging tools that may outperform current options!

Evaluate AI-Generated Content

While GenAI offers clear advantages for information searching, such as speed, accessibility, and the ability to generate diverse perspectives, it is essential to critically assess the output, as AI-generated information can be incorrect and may mislead users.

The CRAAP test is a simple tool to help you evaluate information sources, including AI-generated content. It involves asking yourself questions across 5 key aspects to determine whether a source is suitable for your research or decision-making. Below are some suggested questions focused specifically on evaluating AI-generated information.

C - Currency: Timeliness of information
  • Does the AI tool provide up-to-date, or even real-time, information?
  • How often is the AI model updated?
  • Does the AI tool have a knowledge cutoff date that affects the currency of its information?
R - Relevance: Contextual fit
  • Have you consulted various sources before deciding this is one you will use?
  • Would you cite this in your academic research?
  • Who is the intended audience for the original source of information?
A - Authority: Source credibility
  • Does the AI tool provide references for you to verify the information by reading original sources?
  • What are the sources that the AI tool relies on? Are they credible sources?
  • Is the AI tool developed by a reputable organization or individual?
  • If any sources are provided, does the website URL offer insights about the source? 
    .gov - a government site
    .edu - an educational site
    .com - a commercial site
    .org - an organization site
A - Accuracy: Reliability of content
  • Is there a risk of hallucination, in which the generated information is fabricated?
  • Can you verify the information through other sources?
  • Is the information complete for your purpose? 
P - Purpose: Reason for existence
  • Is there any bias in the AI-generated content?
  • Does “garbage in, garbage out” (GIGO) apply?
    The quality of the response depends on the quality of the training data and the user input. Is the prompt well-engineered and free from bias?

Modified from "Evaluating Information - Applying the CRAAP Test" by Meriam Library, California State University, Chico

LLM Benchmarks

Understanding how well an LLM performs across different functionalities enables you to select the most appropriate tool for your specific research needs. IBM describes LLM benchmarks as standardized frameworks for assessing the performance of large language models (LLMs). These benchmarks facilitate the evaluation of LLM skills in different areas, such as coding, common sense, reasoning, natural language processing, and machine translation.

However, please also be aware of the limitations of benchmarks. Increasingly, leading models achieve similar scores or overfit to certain benchmarks, which causes some benchmarks to lose their usefulness in distinguishing LLM capabilities.

The generator below consolidates selected benchmark scores* for the LLMs available in PolyU GenAI. You can compare the scores to determine the most suitable model for your work.

*Data retrieved from llm-stats.com


LLM Benchmark Visualizer

LLM Comparison

This tool enables you to compare the performance of the LLMs provided by PolyU. Select specific models and benchmarks, then click 'Generate Chart' to visualize the results. Please note that some benchmark data are unavailable; where data are missing, no results will be shown.

Benchmarks covered:
🧮 Math
Problems from high school mathematics competition (2024)
Problems from high school mathematics competition (2025)
Math problems with visual components (multimodal math)
💻 Coding / Software Engineering
Code generation, self-repair, test prediction, execution
Coding exercises across multiple languages (C++, Go, Java, etc.)
Solving GitHub issues automatically (human-validated subset)
🧠 Reasoning / General Knowledge
"Google-proof" questions in biology, physics, chemistry
Expert-contributed questions across disciplines
Advanced reasoning across multiple subjects
Multimodal questions from college-level materials across six disciplines
🧰 Tool Use / Agent Interaction
Real-world dynamic interactions in airline domain
Real-world dynamic interactions in retail domain