This project focused on improving TAIGA, a platform designed to help everyday users identify issues in Generative AI systems. By conducting user research, testing, and prototyping, the team worked to make the platform more intuitive and effective, addressing challenges in its design and interaction to better meet user needs.
05-410 - User-Centered Research and Evaluation
Heuristic Evaluation, Data Analysis, Usability Testing, Affinity Diagramming, Prototyping
TABLE OF CONTENTS
- Heuristic Evaluation and Synthesis
- Qualtrics Data Analysis
- Usability Testing
- Research and Modeling
- Speed Dating
- Lo-Fi Prototype and Final Testing
STEP 1 - Heuristic Evaluation
Conducted a heuristic evaluation of the WeAudit TAIGA tool using Nielsen's Ten Usability Heuristics.
Each team member individually evaluated the website, then we synthesized the evaluations to discover insights.
We assessed how WeAudit TAIGA aligns with or violates each heuristic, with green representing alignment and red representing misalignment.
1. Insufficient Guidance
Key features like image generation and thread posting lack detailed guidance, leaving users unsure of how to proceed.
2. Inability to Cancel Actions
Time-consuming processes, such as posting and generating images, cannot be canceled or undone, causing frustration.
3. Confusing Core Concepts
Terms like "Stable Diffusion" are not well-introduced, making it difficult for users to grasp the platform’s purpose.
4. Navigation Issues
Limited menu guidance and inconsistent interface elements make it hard for users to navigate effectively.
5. Lack of Efficiency Tools
The absence of shortcuts and detailed help documentation slows down task completion and impacts user productivity.
6. Disconnected User Goals
Usability challenges prevent users from engaging meaningfully with the platform and achieving their goals. Addressing these issues is key to ensuring alignment between user tasks and platform objectives.
STEP 2 - Qualtrics Data Analysis
We analyzed a survey dataset (N=2,197) provided by the WeAudit team at Carnegie Mellon University (March 20, 2023) to uncover patterns and challenges in detecting algorithmic biases. The goal was to guide the design of an inclusive and user-centered interface for WeAudit Taiga.
- Understanding the Data: Reviewing user demographics and qualitative responses to identify key patterns.
- Analyzing Structured and Unstructured Data: Using tools like word clouds and statistical comparisons to extract insights.
- Visualizing Results: Creating charts and graphs to highlight trends and findings.
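The word-cloud step above boils down to counting term frequencies in the open-ended responses. The sketch below illustrates that step with a tiny hypothetical set of responses; the real analysis ran over the N=2,197 WeAudit survey dataset, which is not reproduced here.

```python
from collections import Counter

# Hypothetical stand-ins for open-ended survey responses; the actual
# study used free-text answers from the WeAudit survey dataset.
responses = [
    "the generated images felt stereotyped by gender",
    "gender and race bias in generated images",
    "images looked fine to me",
]

# Word-frequency counts are the input to a word cloud: the most
# frequent terms are rendered largest.
stopwords = {"the", "by", "and", "in", "to", "me", "felt", "looked"}
counts = Counter(
    word
    for response in responses
    for word in response.lower().split()
    if word not in stopwords
)

print(counts.most_common(3))
```

The same counts can feed a word-cloud renderer or a simple bar chart for the visualization step.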
Structured Data Visualizations
A. Gender Effects on Bias Perception
This graph illustrates how different genders perceive the harmfulness of biases. Male participants were more likely to rate images as "not harmful," whereas female and LGBTQ+ participants were more likely to identify them as harmful. This highlights how experiences and awareness of societal biases influence perception.
B. Familiarity and Awareness
The pie chart demonstrates the correlation between familiarity with algorithmic systems and awareness of societal biases. Participants with greater familiarity reported higher awareness, with 94% of familiar users identifying biases compared to 70% of unfamiliar users. This raises questions about causation and education.
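The 94% vs. 70% gap can be checked with a standard two-proportion z-test. The sketch below uses the percentages from the write-up, but the per-group sample sizes are hypothetical stand-ins, since the actual split of the N=2,197 respondents is not given here.

```python
import math

# Percentages reported in the write-up; group sizes are assumed for
# illustration (the real familiar/unfamiliar split is not given).
n_familiar, n_unfamiliar = 1200, 997
p_familiar, p_unfamiliar = 0.94, 0.70

# Pooled proportion under the null hypothesis of no difference.
pooled = (p_familiar * n_familiar + p_unfamiliar * n_unfamiliar) / (
    n_familiar + n_unfamiliar
)
se = math.sqrt(pooled * (1 - pooled) * (1 / n_familiar + 1 / n_unfamiliar))
z = (p_familiar - p_unfamiliar) / se

# Two-sided p-value from the standard normal CDF.
p_value = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))
print(f"z = {z:.1f}, p = {p_value:.3g}")
```

A significance test like this distinguishes a real familiarity effect from sampling noise, though, as noted above, it says nothing about causation.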
Unstructured Data Visualization
C. Sentiment Analysis of Bias Categories
Sentiment scores for different types of algorithmic biases (e.g., gender, race, sexuality) were analyzed. The results were generally neutral but leaned slightly positive, suggesting that algorithmic outputs were not overtly discriminatory but lacked representation. Text data limitations may also have influenced these findings.
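A minimal lexicon-based scorer illustrates the kind of per-category sentiment analysis described above. The tiny lexicon and sample comments are invented stand-ins; the actual study scored free-text survey responses for each bias category.

```python
# Illustrative sentiment lexicon: +1 for positive words, -1 for negative.
LEXICON = {"good": 1, "fair": 1, "diverse": 1,
           "harmful": -1, "biased": -1, "stereotyped": -1}

def sentiment(text: str) -> float:
    """Average lexicon score of the words in `text` (0.0 = neutral)."""
    words = text.lower().split()
    scores = [LEXICON.get(w, 0) for w in words]
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical comments grouped by bias category.
comments_by_category = {
    "gender": ["results seemed fair overall", "some outputs looked stereotyped"],
    "race": ["not diverse enough", "mostly neutral results"],
}

# Mean sentiment per category; values near 0 read as "generally neutral".
category_scores = {
    cat: sum(sentiment(c) for c in comments) / len(comments)
    for cat, comments in comments_by_category.items()
}
print(category_scores)
```

Scores clustering near zero, as in the study, are consistent with outputs that are not overtly discriminatory but also not strongly praised.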
- Users familiar with algorithmic systems were better at identifying harmful biases, suggesting the importance of prior knowledge.
- Perceptions of bias varied by cultural and geographical factors, indicating diverse user experiences.
Hypotheses
- Generative AI systems may require tailored models to address user-specific biases effectively.
- Transparent AI training data could improve trust and bias detection accuracy.
STEP 3 - Usability Testing
Our team developed a usability testing plan and script for WeAudit TAIGA, focusing on evaluating the platform's user experience through a Think-Aloud study. Insights from this study would inform design improvements for the platform and broader applications in bias detection for generative AI systems.
How intuitive is the platform’s navigation and functionality?
Can users easily identify and report AI biases?
What challenges do users face in completing key tasks?
Target Users
Everyday individuals who engage with AI systems and are likely to encounter algorithmic biases.
Assigned Tasks and Usability Metrics
Task 1: Exploring the Homepage
Action: Click a provided link to explore the homepage and share first impressions.
Questions:
What do you think this website is doing?
Are there features you have questions about?
Performance Goal: Complete within 30-40 seconds to ensure intuitive UI/UX design.
Task 2: Generating Images to Identify Bias
Action: Enter prompts into the system to generate images showcasing AI bias.
Questions:
What prompts might trigger AI bias?
What other prompts would you like to try?
Performance Goal: Generate the first image within 1-2 minutes; complete the task in 4-5 minutes.
Task 3: Creating and Posting a Thread
Action: Use generated images to create and post a thread.
Questions:
How intuitive is the process?
Were there challenges in selecting images or structuring the thread?
Performance Goal: Complete within 3 minutes without significant confusion.
Task 4: Reviewing Threads
Action: Review your thread and explore others on the platform.
Questions:
What do you think about your post and its impact?
What are your thoughts on other threads?
Performance Goal: Efficiently review and navigate posts, highlighting potential areas for improvement.
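The timed performance goals above can be checked mechanically against observed session times. The sketch below uses hypothetical timings; in the study, times would come from the think-aloud session recordings.

```python
# Upper-bound performance goals, in seconds, taken from the task list
# above (Task 4 has no timed goal, so it is omitted here).
GOALS_SECONDS = {
    "explore_homepage": 40,   # Task 1: 30-40 s
    "generate_images": 300,   # Task 2: 4-5 min
    "post_thread": 180,       # Task 3: 3 min
}

# Hypothetical observed completion times per participant.
observed = {
    "explore_homepage": [35, 52, 28],
    "generate_images": [240, 310, 290],
    "post_thread": [150, 200, 170],
}

def pass_rate(task: str) -> float:
    """Fraction of participants who met the task's performance goal."""
    times = observed[task]
    return sum(t <= GOALS_SECONDS[task] for t in times) / len(times)

for task in GOALS_SECONDS:
    print(f"{task}: {pass_rate(task):.0%} met the goal")
```

Low pass rates on a task flag it as a candidate for redesign, which is how the goals tie back to the research questions.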
Script
After writing the testing script according to the plan, we piloted the test with other groups to identify and fix potential testing barriers.
Each team member conducted usability testing sessions with target users, and recorded observations and user feedback.
Data Synthesis
Identified positives and negatives of the design using affinity diagramming.
Visibility of System Status
Issues such as the unclear drop-down menu and the lack of cues about what the user is currently doing suggest that the designers should focus on making system status visible, so that an unfamiliar user is never confused or lost at any point in the process.
User Control and Customization
Provide users with more control over their experience by implementing customizable features, such as editing or deleting unwanted prompts or prompts with typos, adjustable settings for the frequency and types of prompts they engage with, and so on.
Efficiency and Engagement
It would be helpful to incorporate more intuitive search functionality with clear parameters. Designers can also encourage user engagement by implementing interactive tutorials or guided tours that demonstrate the platform's features.
STEP 4 - Research and Modeling
We synthesized findings from our varied research methods, narrowed in on a specific need within the project space, and then defined our focus and research goals.
2. Define Goals and Desired Outcome
A. Abstraction Laddering
Our team first identified the core demand: "enhance the TAIGA platform." We each took individual paths upward to uncover the driving user needs and other reasons for the enhancement. Subsequently, we moved downward along each of our verticals and outlined the concrete solutions required to achieve our objectives.
B. Contextual Research
Primary:
Does interacting with other users encourage users to report AI biases or use AI reporting platforms like Taiga more often?
Secondary:
Can TAIGA make the process simpler and more direct?
We chose contextual inquiry as our research method, which allowed us to gain detailed observations of user interactions and how they impact the platform's usability.
One challenge we anticipated was users altering their behavior due to the presence of an observer, thereby affecting the observation results.
First, we prepared the research guide, consent script, and research script, and revised them based on pilot testing.
Then, each team member recruited one participant and scheduled a time for the interview.
For each interview session, we interviewed in pairs, with one member as the interviewer and the other as the note taker.
After conducting all interview sessions, we collected the interview notes, clustered similar notes together, defined the groups, then continued clustering the groups into meaningful insights and concepts. [Affinity Clustering]
Additionally, we created [Empathy Map Canvas] and [Customer Journey Map] based on the research notes.
- Users are more likely to use TAIGA when there is a sense of community.
- Users are looking for enhanced user interaction and moderation for the platform.
- When posting to a social platform, users are able to learn new perspectives and methodologies in GenAI bias auditing.
- Users expressed demand for quality content that is more in-depth and educational.
- Users expressed a need for sustained engagement through features and updates.
[Research Report]
STEP 5-Speed Dating
We tried to identify the greatest areas of uncertainty and risk through methods including Crazy 8s and Walk the Wall.
We looked at the set of Crazy 8s individually then as a group to decide on a set of categories of user needs that we uncovered through the activity:
- Live interaction
- Share findings and insights with others
- Providing recognition for users who are more active
- Incentive to contribute to discussions
- Nuanced interactions (more diverse interaction tools)
- Categorisation of posts
Looking at the user needs we uncovered in the last step, we voted on the top needs/concepts to pursue during Speed Dating.
The results show that we should pursue the following needs:
- Share findings and insights with others
- Providing recognition for users who are more active
- Categorisation of posts
- Incentive to contribute to discussions
Each team member chose one of the user needs from the last step and created a corresponding storyboard. Each storyboard contains three scenarios that illustrate different methods of meeting the user need at different risk levels (Safe, Riskier, and Riskiest). During the speed dating sessions, participants were presented with the storyboards for each user need.
After conducting the speed dating, the following user needs were validated:
- to be recognised for being active.
- to share their findings on WeAudit TAIGA with others.
STEP 6 - Lo-Fi Prototype
To address the user needs that had been identified and validated, our solution is a daily challenge feature offering personalized tasks with scaling difficulty. Points earned by completing challenges can be exchanged for avatar designs and special status, encouraging continued use. This feature encourages user interaction, fosters community, and is tailored to users' activity levels.
Risk Assumptions and Measures to Evaluate Them
Risk Assumptions
- Recognition for active users may increase engagement but could create perceptions of unfairness.
- Live interactions might encourage participation but pose moderation challenges.
- Incentives may drive engagement but risk promoting inauthentic behavior.
Measures to Evaluate Them
- Usability Testing: Assess ease of finding and understanding challenges.
- Completion Rates: Monitor challenge completion trends.
- Engagement Rates: Track daily active users.
- Returning Rates: Measure user retention after initial engagement.
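The engagement and returning rates above can be computed from a simple visit log. The sketch below uses a hypothetical log of (user, day) visits; the real numbers would come from platform analytics collected during the April 22-29 testing period.

```python
from datetime import date

# Hypothetical visit log: (user_id, day) pairs recorded during testing.
visits = [
    ("u1", date(2024, 4, 22)), ("u2", date(2024, 4, 22)),
    ("u1", date(2024, 4, 23)), ("u3", date(2024, 4, 23)),
    ("u1", date(2024, 4, 29)), ("u3", date(2024, 4, 29)),
]

# Engagement rate: daily active users (unique visitors per day).
dau: dict[date, set[str]] = {}
for user, day in visits:
    dau.setdefault(day, set()).add(user)

# Returning rate: share of first-day users who came back on the last day.
first_day_users = dau[date(2024, 4, 22)]
last_day_users = dau[date(2024, 4, 29)]
returning_rate = len(first_day_users & last_day_users) / len(first_day_users)

print({d.isoformat(): len(users) for d, users in dau.items()}, returning_rate)
```

Tracking these two numbers over the testing week gives the honest signals listed below a quantitative baseline.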
Honest Signals
Success:
- Increased engagement and consistent usage.
- Users find social features gratifying and validating.
Failure:
- Lack of interest in rewards and inconsistent usage.
- Users avoid social features.
Participants and settings
- Participants: AI-familiar individuals new to WeAudit TAIGA.
- Testing Period: April 22-29, 2024.
- Location: Reserved Tepper classroom.
[Testing Result]