
Usability Testing Protocols: How to Find What's Actually Breaking Your Website

The step-by-step process for running usability tests that uncover real problems—not just confirm what you already believe

The Myth of Expensive Testing

Nielsen Norman Group's landmark research showed something counterintuitive: testing with just five users uncovers roughly 85% of usability problems. Yet most businesses never test at all, believing they need a lab, sophisticated equipment, or a research team.
They're wrong.
Steve Krug, in his book Don't Make Me Think, makes the case that guerrilla usability testing (informal, cheap, and frequent) beats elaborate research setups in practice. The difference between companies with great user experiences and everyone else? The great ones test regularly. Period.

The Three Types of Testing That Matter

1. Moderated Remote Testing

You watch users complete tasks on your site via screen share. According to research from the Interaction Design Foundation, remote testing catches 80% of the same issues as in-person testing, at a fraction of the cost.

2. Unmoderated Task-Based Testing

Users complete specific tasks on their own time while software records their screen and voice. Baymard Institute uses this method for their massive e-commerce studies, finding it especially effective for checkout and form testing.

3. First-Click Testing

Users show you where they'd click first to complete a task. MIT research shows that when users get the first click right, they have an 87% chance of completing the task successfully. When they get it wrong, their success rate drops to 46%.

Your Complete Testing Protocol

Phase 1: Pre-Test Setup (2 Hours Total)

Define Your Research Questions

How to check this off: Write down 3-5 specific questions you need answered. Not "Is our site good?" but "Can users find our pricing without help?" or "Do people understand what we do within 30 seconds?"
Good research questions follow this formula: Can [specific user type] successfully [complete specific action] without [specific problem occurring]?
Example questions that generate actionable insights:
  • Can first-time visitors find our contact information within 15 seconds?
  • Do users understand our service tiers before reaching the pricing page?
  • Can mobile users complete a purchase without zooming or horizontal scrolling?
  • Do returning customers find the login button on their first attempt?
  • Can users recover from form errors without starting over?
Document these questions in a simple spreadsheet with columns for: Question | Task to Test | Success Metric | Current Hypothesis
Trusted resource: Nielsen Norman Group's guide to research questions: https://www.nngroup.com/articles/ux-research-questions/
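If you'd rather keep those questions in a file you can version alongside the rest of your test materials, here is a minimal sketch of that spreadsheet as a CSV, using only Python's standard library. The filename and sample row are illustrative, not part of the protocol.

```python
# Minimal sketch: write the research-question tracker as a CSV file.
# Filename and the sample row are examples only.
import csv

COLUMNS = ["Question", "Task to Test", "Success Metric", "Current Hypothesis"]

rows = [
    [
        "Can first-time visitors find our contact information within 15 seconds?",
        "Find a way to contact support",
        "4 of 5 users succeed in under 15 seconds",
        "The footer-only contact link is too easy to miss",
    ],
]

with open("research_questions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(COLUMNS)  # header: Question | Task | Metric | Hypothesis
    writer.writerows(rows)
```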

Create Your Task Scenarios

How to check this off: Write 3-7 realistic scenarios that test your research questions. Each scenario should take 2-5 minutes to complete.
The key is making scenarios realistic and specific, not vague and general. Instead of "Find information about our services," write "Your company needs to process payments online. Find out if we support international transactions and what the fees are."
Task scenario template:
  1. Context: Brief background (1-2 sentences)
  2. Goal: What they're trying to accomplish
  3. Success: What completing the task looks like
  4. Time limit: Maximum reasonable time (usually 2-5 minutes)
Example scenario:
"You run a small bakery and need to accept online orders for custom cakes. You want to know if our platform can handle variable pricing and customer uploads of design images. Find this information and determine if we meet your needs. Success = finding both pieces of information. Time limit = 3 minutes."
Avoid leading language. Don't say "Click on the beautiful blue 'Get Started' button." Say "Begin the signup process." Let users find their own path.
Trusted resource: Steve Krug's task scenario examples: https://www.sensible.com/downloads/tasks-for-demo-usability-test.pdf
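If you run unmoderated tests, it helps to keep scenarios as structured data you can paste into your testing tool. A minimal sketch of the four-part template above, with the bakery example filled in; the class name and field values are illustrative.

```python
# Sketch: the four-part scenario template as structured data.
from dataclasses import dataclass

@dataclass
class TaskScenario:
    context: str          # 1-2 sentences of background
    goal: str             # what the participant is trying to accomplish
    success: str          # what completing the task looks like
    time_limit_min: int   # maximum reasonable time, usually 2-5 minutes

bakery = TaskScenario(
    context="You run a small bakery and need to accept online orders for custom cakes.",
    goal="Find out whether the platform supports variable pricing and customer image uploads.",
    success="Participant locates both pieces of information.",
    time_limit_min=3,
)
```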

Build Your Testing Script

How to check this off: Create a consistent script you'll use with every participant. This ensures you're testing the interface, not your ability to give instructions.
Your script needs five sections:
Introduction (1 minute):
"Thanks for helping us today. We're testing our website, not you. There are no wrong answers. Please think out loud as you work—tell me what you're thinking, what confuses you, what you expect to happen. If you get stuck, that's valuable information for us."
Background Questions (2 minutes):
  • How often do you [relevant activity]?
  • What websites/apps do you typically use for [relevant task]?
  • What device are you using right now?
  • On a scale of 1-10, how comfortable are you with technology?
Task Introduction (30 seconds per task):
"I'm going to read you a scenario, then ask you to complete a task. I'll paste the scenario in the chat so you can refer back to it. Please share your screen before starting."
Post-Task Questions (1 minute per task):
  • How difficult was that task? (1=very easy, 5=very difficult)
  • What was the hardest part?
  • What would have made it easier?
  • Did anything surprise you?
Wrap-Up (2 minutes):
  • What's your overall impression?
  • What's one thing you'd change?
  • Would you use this site again?
  • Any questions for me?
Print this script. Read it verbatim. Consistency matters more than sounding natural.

Phase 2: Recruiting Participants (2-3 Days)

Find Your Five Users

How to check this off: Recruit 5 participants who match your actual user base. Not your employees. Not your family. Real users or close approximations.
Recruiting options ranked by effectiveness:
Best: Existing customers
Email your customer list with subject: "Help us improve [product] - 30 minute paid session."
Offer a $50 credit or gift card. Expect a 5-10% response rate, so an email to 100 customers should yield enough volunteers for five sessions.
Good: User testing platforms
  • Maze.co ($99/month for unlimited tests)
Acceptable: Social media recruiting
Post in relevant LinkedIn/Facebook groups. Specify your exact criteria. Offer compensation. Screen respondents with a 3-question survey to ensure fit.
Last resort: Guerrilla recruiting
Coffee shops, coworking spaces, libraries. Approach people who look like your target audience. Offer $20 cash for 20 minutes. Steve Krug swears by this method.
Never use: Friends, family, employees, or anyone with prior knowledge of your site.
Trusted resource: Nielsen Norman Group's recruiting guide: https://www.nngroup.com/articles/recruiting-test-participants/

Phase 3: Running the Test (5 Hours Over 2 Days)

Set Up Your Testing Environment

How to check this off: Complete a technical dry run 24 hours before your first test.
Technical checklist:
  • Install recording software (Zoom, Teams, or Loom all work)
  • Test screen sharing both directions
  • Clear browser cache and cookies
  • Close unnecessary tabs and applications
  • Disable browser extensions
  • Set browser to default zoom level (100%)
  • Turn off notifications
  • Have backup contact method (phone number)
  • Prepare shareable link to scenarios
  • Create note-taking template
Note-taking template columns:
  • Time stamp
  • Task number
  • User action
  • User comment
  • Observed problem
  • Severity (1=low, 3=high)
Practice with a colleague. Iron out technical issues before you waste a participant's time.
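One way to prepare the note-taking template ahead of time is to generate it as a CSV with the columns listed above. A minimal sketch; the filename is an example.

```python
# Sketch: generate an empty note-taking template as a CSV.
import csv

COLUMNS = [
    "Timestamp",           # e.g. 02:34 into the recording
    "Task number",
    "User action",
    "User comment",        # exact quotes are most useful
    "Observed problem",
    "Severity (1-3)",      # 1 = low, 3 = high
]

with open("session_notes_template.csv", "w", newline="") as f:
    csv.writer(f).writerow(COLUMNS)
```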

Conduct the Sessions

How to check this off: Run 5 sessions over 1-2 days, with 30-minute breaks between for notes.
During each session:
Stay neutral: Don't help. Don't hint. Don't react. When users ask "Should I click here?" respond with "What would you do if I wasn't here?" When they're struggling, count to 10 before intervening.
Probe without leading: Use phrases like "Tell me more about that" or "What are you thinking right now?" not "Would it help if there was a button here?"
Document everything: Record the session. Take timestamps. Note exact quotes. Capture emotions, not just actions. "User seemed frustrated (2:34)" is more valuable than "User clicked wrong button."
Watch for these behaviors:
  • Scanning patterns (where do eyes go first?)
  • Hesitation points (where do they pause?)
  • Error recovery (how do they handle mistakes?)
  • Satisfaction signals (smiles, sighs, comments)
  • Abandonment triggers (what makes them give up?)
Common moderator mistakes to avoid:
  • Explaining how things work
  • Defending design decisions
  • Filling silence with chatter
  • Asking leading questions
  • Showing emotional reactions
  • Rushing through tasks
After each session, immediately spend 10 minutes writing down your top 3 observations while they're fresh.
Trusted resource: Steve Krug's moderating tips: https://www.sensible.com/downloads/things-moderator-says.pdf

Phase 4: Analysis and Action (3 Hours)

Synthesize Your Findings

How to check this off: Create a findings spreadsheet within 24 hours of your last session.
Analysis framework:
Step 1: List all problems
Create a spreadsheet with columns:
  • Problem description
  • Number of users affected (out of 5)
  • Task impacted
  • Severity (1=annoying, 2=difficult, 3=blocking)
  • User quotes
  • Potential solution
Example entry:
"Users couldn't find shipping information | 4/5 users | Checkout task | Severity: 3 | 'Where does it say how much shipping costs?' | Add shipping calculator to product pages"
Step 2: Calculate priority
Priority = (Users affected ÷ 5) × Severity × 10
This gives you a score of up to 30. For the example entry above, 4 of 5 users at severity 3 works out to (4 ÷ 5) × 3 × 10 = 24. Fix anything above 15 immediately.
Step 3: Group by theme
Common problem themes:
  • Navigation/findability
  • Unclear terminology
  • Missing information
  • Broken expectations
  • Technical errors
  • Trust/credibility
  • Mobile-specific issues
Step 4: Create fix list
Rank by priority score. Estimate effort (hours). Calculate ROI (priority ÷ effort). Start with the highest ROI; the sketch below walks through the arithmetic.
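A short sketch of the scoring in Steps 2 and 4: priority is (users affected ÷ 5) × severity × 10, and ROI is priority divided by estimated effort in hours. The helper name and the findings list are illustrative, not part of the protocol.

```python
# Sketch: priority and ROI scoring for a findings list (values are examples).
def priority(users_affected: int, severity: int, panel_size: int = 5) -> float:
    return (users_affected / panel_size) * severity * 10

findings = [
    # (description, users affected out of 5, severity 1-3, effort in hours)
    ("Shipping costs not visible before checkout", 4, 3, 6),
    ("Login button overlooked on first attempt", 2, 2, 1),
]

for desc, users, sev, effort in findings:
    p = priority(users, sev)
    roi = p / effort
    flag = "FIX NOW" if p > 15 else ""
    print(f"{desc}: priority={p:.0f}, ROI={roi:.1f} {flag}")
```

Note how the two measures can disagree: the shipping problem has the higher priority (24), but the login problem has the higher ROI (8.0 vs 4.0) because it is a one-hour fix. Ranking by ROI is what keeps quick wins from languishing behind big projects.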
Trusted resource: Interaction Design Foundation's analysis guide: https://www.interaction-design.org/literature/topics/usability-testing

Report and Implement

How to check this off: Share findings within 48 hours and implement at least one fix within a week.
Your report structure:
Executive Summary (1 paragraph):
"We tested [what] with [who] and found [number] issues. The top 3 problems were [list]. Fixing these will improve [specific metrics]."
Key Findings (3-5 bullets):
  • Problem: What broke
  • Impact: Who it affects and how often
  • Evidence: Quote or statistic
  • Solution: Specific fix
  • Effort: Time/cost estimate
Video Highlights (3-5 clips, 30 seconds each):
Show, don't tell. Clip moments where users struggle. Nothing convinces stakeholders like watching real users fail.
Implementation Plan:
  • Week 1: Quick fixes (text changes, button moves)
  • Week 2-3: Moderate fixes (new pages, form redesigns)
  • Month 2: Major fixes (navigation overhaul, checkout redesign)
Success Metrics:
Define how you'll measure improvement:
  • Task completion rate increase
  • Time-on-task decrease
  • Error rate reduction
  • Support ticket decrease
  • Conversion rate improvement
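A minimal sketch of how you might compare these metrics across two rounds of testing. The numbers are illustrative, and with five users the percentages are directional signals, not statistics.

```python
# Sketch: before/after comparison of task metrics (illustrative values).
before = {"completion_rate": 2 / 5, "median_time_s": 210, "errors": 7}
after = {"completion_rate": 4 / 5, "median_time_s": 95, "errors": 2}

for metric in before:
    b, a = before[metric], after[metric]
    change = (a - b) / b * 100
    # Negative change is an improvement for time-on-task and errors.
    print(f"{metric}: {b} -> {a} ({change:+.0f}%)")
```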

Testing Frequency and Formats

Monthly: Guerrilla Testing

2 hours, 3 users, 1 specific feature
Test whatever you're about to launch

Quarterly: Full Protocol

5 users, complete task set, formal analysis
Test core user journeys

Annually: Competitive Testing

Test your site against 2 competitors
Identify competitive advantages and gaps

Remote Testing Tools Comparison

For Moderated Testing:
  • Zoom: Free, everyone has it, good recording
  • Microsoft Teams: Free with Office, reliable
  • Google Meet: Free, no downloads required
For Unmoderated Testing:
  • UsabilityHub: Pay per response, good for quick tests
  • Loom: Free, great for async feedback
For Analysis:
  • Dovetail: $0 for basics, excellent for tagging
  • Miro: Free tier available, great for affinity mapping
  • Google Sheets: Free, sufficient for most analysis

Common Testing Mistakes to Avoid

Testing too late: Test prototypes, not just finished products. Paper sketches can be tested. Figma mockups can be tested. Test early, test often.
Testing with the wrong people: Your power users aren't representative. Test with people who've never seen your site. Include edge cases—older users, mobile-only users, international users.
Asking opinion questions: "Do you like this?" is useless. "Can you find the return policy?" is useful. Test behaviors, not preferences.
Testing everything at once: Focus each session on specific flows. Testing your entire site in one session gives you shallow, unhelpful data.
Ignoring small sample sizes: Yes, 5 users is enough for qualitative testing. Nielsen Norman Group's research backs this for iterative studies. Stop waiting for "statistical significance" and start learning.
Not testing regularly: One annual study won't cut it. Monthly guerrilla testing beats annual formal research every time.

Red Flags That Demand Immediate Testing

Test TODAY if you see any of the following (a quick scripted check is sketched after the list):
  • Conversion rate below 2%
  • Bounce rate above 70%
  • Cart abandonment above 80%
  • Support tickets about "can't find"
  • Form completion below 50%
  • Mobile conversion 50% lower than desktop
  • High traffic, low engagement
  • Multiple reports of the same issue
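One way to make this list actionable is a scripted check against your analytics numbers. A minimal sketch covering the rate-based thresholds above; the metric names and values are examples, so pull real ones from your analytics tool.

```python
# Sketch: flag analytics metrics that cross the red-flag thresholds above.
RED_FLAGS = {
    "conversion_rate": lambda v: v < 0.02,
    "bounce_rate": lambda v: v > 0.70,
    "cart_abandonment": lambda v: v > 0.80,
    "form_completion": lambda v: v < 0.50,
}

site_metrics = {  # example values
    "conversion_rate": 0.014,
    "bounce_rate": 0.73,
    "cart_abandonment": 0.78,
    "form_completion": 0.61,
}

for name, is_red in RED_FLAGS.items():
    if is_red(site_metrics[name]):
        print(f"RED FLAG: {name} = {site_metrics[name]:.1%} -- test today")
```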

Your First Test: The 2-Hour Quick Win

Stop reading. Start testing. Here's your minimum viable test:
  1. Grab 3 people (colleagues from other departments work for a first test)
  2. Pick your most important task (usually: find product/service, understand value, start purchase)
  3. Watch them try (record with Zoom or Loom)
  4. Note where they struggle
  5. Fix the biggest problem
That's it. You'll learn more in 2 hours of watching real users than 2 months of internal debates.

The ROI Reality Check

A widely cited Forrester figure puts the return at $100 for every dollar invested in UX. But that's not even the real value. The real value is avoiding the cost of building the wrong thing.
A two-day testing cycle costs maybe $500 in participant incentives and your time. Building the wrong feature costs months and thousands. Testing isn't an expense—it's insurance against expensive mistakes.

Start Testing This Week

You now have everything you need:
  • A proven protocol
  • Specific scripts and templates
  • Tool recommendations
  • Analysis frameworks
  • Implementation guidance
The only thing standing between you and better user experience is actually doing it. Pick one critical task on your site. Find three people. Watch them try. Fix what breaks.
Then do it again next month.

Ready to level up your testing? Your UX Helpdesk membership includes expert consultation on test planning, script review, and findings analysis. Access your member portal for personalized guidance.