Virginia Tech

LLM-Based Heuristic Evaluations of High- and Low-Fidelity Prototypes (GPT-4o)

dataset
posted on 2025-05-14, 17:14, authored by Sehrish Basir Nizamani

This dataset contains structured usability evaluation data for a range of user interface prototypes, assessed using Nielsen’s 10 heuristic principles. It includes:

  • Prototype_Metadata.csv – Metadata for each prototype, specifying its name, fidelity level (high or low), domain, and source URL.
  • IRR_Results.csv – Inter-Rater Reliability (IRR) results comparing human evaluator agreement across ten heuristics.
  • EvaluationTemplate.rtf – A standardized evaluation template used by both human raters and GPT-4o to ensure consistent heuristic assessment.
  • LLM_Evaluation folder – Structured evaluations conducted by GPT-4o across two phases (LLM Evaluation 1 and LLM Evaluation 2), each divided into high-fidelity and low-fidelity prototypes. Each prototype file contains issue identification, severity ratings (0–4), and heuristic-based recommendations.

This dataset supports research on AI-assisted usability evaluation, enabling comparison between LLM and human assessments, and analysis of prototype usability across domains and fidelity levels.
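As a starting point for analysis across domains and fidelity levels, the sketch below counts prototypes per fidelity and domain from Prototype_Metadata.csv. It is a minimal example, not part of the dataset: the function name `count_by_fidelity_and_domain` is hypothetical, and the column headers `Prototype Fidelity` and `Domain` are assumed to match the attribute names given in the dataset description. Adjust them if the actual CSV header differs.

```python
import csv
from collections import Counter
from pathlib import Path


def count_by_fidelity_and_domain(csv_path):
    """Count prototypes per (fidelity, domain) pair in the metadata file."""
    counts = Counter()
    # utf-8-sig tolerates an optional byte-order mark at the start of the file.
    with Path(csv_path).open(newline="", encoding="utf-8-sig") as handle:
        for row in csv.DictReader(handle):
            # Assumption: these header names follow the dataset description.
            fidelity = row["Prototype Fidelity"].strip().lower()
            domain = row["Domain"].strip()
            counts[(fidelity, domain)] += 1
    return counts
```

Calling `count_by_fidelity_and_domain("Prototype_Metadata.csv")` returns a `Counter` keyed by `(fidelity, domain)` tuples, which can then be compared against the evaluation files available for each group.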

Publisher

University Libraries, Virginia Tech

Location

Blacksburg, Virginia

Corresponding Author Name

Sehrish Basir Nizamani

Corresponding Author E-mail Address

sehrishbasir@vt.edu

Files/Folders in Dataset and Description

  • Prototype_Metadata.csv – Contains metadata details for various prototypes, including attributes such as Prototype, Prototype Fidelity, Domain and Source URL.
  • IRR_Results.csv – This file contains inter-rater reliability (IRR) data for prototype evaluations, including comparison scores across ten heuristics (H1–H10) for multiple designs, along with links to prototypes and rater agreement metrics.
  • EvaluationTemplate.rtf – A structured template used for heuristic evaluations, guiding LLMs or human evaluators to assess prototypes based on Nielsen’s usability principles.
  • LLM_Evaluation – Contains structured heuristic evaluation templates.
      • LLM Evaluation 1 – First round of LLM-conducted evaluations
          • High Fidelity – Contains GPT-4o evaluations of high-fidelity design prototypes:
              • HospitalApp2.docx – Evaluation of a hospital app prototype (version 2)
              • HospitalApp.docx – Evaluation of the initial hospital app prototype
              • Running.docx – Evaluation of a running/fitness tracking app
              • Travel.docx – Evaluation of a travel booking or planning prototype
              • Gaia.docx – Evaluation of a community-focused application (GAIA)
              • Baking shop.docx – Evaluation of a bakery shop prototype
              • Food delivery.docx – Evaluation of a food delivery app
              • Housing.docx – Evaluation of a housing or real estate service app
              • E-Tutor.docx – Evaluation of an e-learning/tutoring app
              • Notification.docx – Evaluation of a notification management interface
          • Low Fidelity – Contains GPT-4o evaluations of low-fidelity wireframes or early designs:
              • Real Estate.docx – Evaluation of a real estate platform prototype
              • E-commerce.docx – Evaluation of a shopping/e-commerce wireframe
              • Artwork.docx – Evaluation of an artwork or gallery website prototype
              • Healthcare.docx – Evaluation of a healthcare-related design
              • Restaurant.docx – Evaluation of a restaurant service interface
              • Membership.docx – Evaluation of a membership or subscription model
              • Fintech.docx – Evaluation of a financial technology service
              • Blog.docx – Evaluation of a blogging or content-sharing site
              • Beer Garden.docx – Evaluation of a beer garden/bar concept design
              • Pastry Shop.docx – Evaluation of a pastry shop prototype
              • Voluntary Pet.docx – Evaluation of a pet adoption or volunteer app
              • Quizzy.docx – Evaluation of a quiz or educational game interface
      • LLM Evaluation 2 – Second round of structured LLM evaluations
          • High Fidelity – GPT-4o evaluations of newer or revised high-fidelity designs:
              • Hospital App 1.docx – Evaluation of a hospital app prototype (version 1)
              • Travel.docx – Evaluation of a travel-related interface
              • Food Order.docx – Evaluation of a food ordering service
              • Bakery.docx – Evaluation of a bakery or pastry app
              • Housing.docx – Evaluation of a housing/rental application
              • E tutor.docx – Evaluation of an e-tutoring app (slightly different naming)
              • Hospital App 2.docx – Follow-up evaluation of the hospital app
              • Running app.docx – Fitness tracking app interface evaluation
              • Notification.docx – Notification system interface evaluation
              • GAIA community app.docx – Community-focused app evaluation
          • Low Fidelity – GPT-4o evaluations of early-stage low-fidelity sketches or wireframes:
              • E-commerce.docx – Shopping platform evaluation
              • Healthcare.docx – Healthcare app prototype evaluation
              • Artwork website.docx – Art-focused website prototype
              • Membership.docx – Subscription/membership service design
              • FinTech.docx – Financial service prototype
              • Voluntary Pet.docx – Volunteer or pet-related app
              • Food Delivery.docx – Food delivery low-fidelity wireframe
              • Education app.docx – Educational platform prototype
              • Beer Garden.docx – Social/bar concept evaluation
              • Blog App.docx – Blog platform early design
              • Pastry shop.docx – Pastry shop concept wireframe
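Because file names differ slightly between the two evaluation rounds (for example, E-Tutor.docx and E tutor.docx), it can help to index the evaluation files by phase and fidelity before pairing them. The sketch below is one possible way to do that, assuming the folder is unpacked locally with the three-level layout described above (phase folder, fidelity folder, .docx file). The function name `index_evaluations` is hypothetical and not part of the dataset.

```python
from pathlib import Path


def index_evaluations(root):
    """Map (phase, fidelity) to a sorted list of .docx file names."""
    index = {}
    for path in Path(root).rglob("*.docx"):
        parts = path.relative_to(root).parts
        # Assumption: files sit exactly at <phase>/<fidelity>/<name>.docx;
        # anything at another depth is skipped.
        if len(parts) != 3:
            continue
        phase, fidelity, name = parts
        index.setdefault((phase, fidelity), []).append(name)
    for names in index.values():
        names.sort()
    return index
```

For instance, `index_evaluations("LLM_Evaluation")` would return a dictionary with keys such as `("LLM Evaluation 1", "High Fidelity")`, each mapping to the file names in that folder. Matching prototypes across rounds still needs manual review or a name-normalization step, since the naming is not consistent.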