Test Product Messaging with Ditto + Claude Code

A complete Claude Code guide to A/B/C message testing and iterative refinement. Seven-question study design, two-round iteration workflow, cross-segment testing, language harvest extraction, and six deliverables from a single study. From three untested variants to validated messaging in 70 minutes.

For Claude Code agents and AI coding assistants. This guide provides copy-paste API calls, study designs, and proven workflows for testing product messaging with Ditto's synthetic research API. Every command is production-tested. The output is a validated messaging hierarchy backed by qualitative evidence from target personas.

Contents
  1. Why Test Messaging with Synthetic Research
  2. The Three Messaging Framings
  3. The 7-Question Message Testing Study
  4. Complete API Workflow: Step by Step
  5. The Iterative Loop: Test, Refine, Re-Test
  6. Generating the Six Deliverables
  7. Deep Dive: Language Harvest Extraction
  8. Cross-Segment Message Testing
  9. Worked Example: Project Management SaaS
  10. Connecting to Other PMM Workflows
  11. Best Practices and Common Mistakes
  12. Frequently Asked Questions

1. Why Test Messaging with Synthetic Research

Positioning is internal and strategic. Messaging is external: it is what actually reaches customers. Yet most product marketing teams validate positioning rigorously and test messaging by gut feel.

Traditional message testing options and their limitations:

| Method | Time Per Round | Cost | Limitation |
| --- | --- | --- | --- |
| Customer interviews | 2-4 weeks | $5-15K | Slow to arrange, small sample, biased by relationship |
| Wynter (B2B panels) | 24-48 hours | $300-600/test | Days per round, costly to iterate |
| Qualtrics surveys | 1-2 weeks | $2-10K | Quantitative, misses qualitative depth |
| Live A/B testing | 1-4 weeks | Opportunity cost | Tests after launch with real traffic, no qualitative insight |
| Ditto + Claude Code | 30 min per round | API usage only | Synthetic (validated at 95% correlation with real research) |

The core value: Ditto lets you run two complete rounds of message testing (test, refine, re-test) in approximately 70 minutes. This means messaging decisions are based on evidence from target personas, not internal opinion. The iterative loop is the key differentiator: you can refine losers based on Round 1 feedback and validate the improvements in Round 2, all in one sitting.

2. The Three Messaging Framings

Before running a message test, you need variants worth testing. The same positioning can produce radically different messaging depending on which framing you lead with. The three canonical approaches:

| Framing | Opens With | Works When | Fails When | Example |
| --- | --- | --- | --- | --- |
| Problem-led | The pain the customer feels | Problem is universal and emotionally resonant | Audience doesn't recognise the problem or it feels abstract | "Tired of spending $50K on research that takes three months?" |
| Outcome-led | The result the customer achieves | Outcome is specific, measurable, and desirable | Claim sounds too good to be true without proof | "Get validated customer insights in 30 minutes." |
| Capability-led | What the product does | The capability itself is the differentiator and the audience is sophisticated | Audience cares about outcomes, not mechanisms | "AI-powered synthetic research with 300,000 personas across 15 countries." |

Claude Code workflow tip: When drafting variants, write all three framings for the same positioning. Use identical value propositions but vary only the leading framing. This isolates how you say it from what you say, producing cleaner test results.

Drafting Variants for Testing

Claude Code should produce three variants of 1-3 sentences each. Keep them similar in length and level of detail so the test measures framing preference, not information asymmetry.

# Example: Three variants for a project management SaaS

MESSAGE_A = """Tired of status meetings that could have been a dashboard?
Your team wastes 5 hours a week on alignment that should be automatic.
FlowBoard replaces the meeting with a living project view."""

MESSAGE_B = """Ship projects 40% faster with zero status meetings.
FlowBoard gives every stakeholder real-time visibility into progress,
blockers, and deadlines without a single sync call."""

MESSAGE_C = """AI-powered project tracking that learns your team's workflow.
FlowBoard auto-generates status updates, predicts delays before they happen,
and routes blockers to the right person instantly."""

3. The 7-Question Message Testing Study

Each question targets a specific dimension of message performance. Together, they produce the data needed for all six deliverables.

| Q# | Question Template | What It Measures | Maps To Deliverable |
| --- | --- | --- | --- |
| Q1 | "Read this message: '[Message A]'. In your own words, what is this company offering? Who is it for? Would you want to learn more?" | Comprehension, relevance, intent | Clarity Scorecard |
| Q2 | "Now read this: '[Message B]'. How does this compare to the first? Which feels more relevant to your situation?" | Comparative preference, framing impact | Performance Ranking |
| Q3 | "One more: '[Message C]'. Of the three, which would make you most likely to click, sign up, or reach out? Why?" | Action intent, decision drivers | Performance Ranking, Audience-Message Fit |
| Q4 | "What is unclear or confusing about any of these messages? What questions do they leave unanswered?" | Clarity gaps, information needs | Clarity Scorecard, Clarity Checklist |
| Q5 | "If you saw the winning message on a website, what would you expect to find when you clicked through?" | Message-to-experience alignment | Clarity Checklist |
| Q6 | "What one word or phrase from these messages stuck with you most? What fell completely flat?" | Language resonance, memorability | Language Harvest |
| Q7 | "Thinking about your actual work or life, which of these problems feels most urgent to you right now? Why?" | Problem urgency, messaging-market fit | Performance Ranking, Audience-Message Fit |

Question order matters. Questions are presented sequentially so personas build context. Q1 introduces Message A alone (unbiased first impression). Q2 introduces Message B with a comparison frame. Q3 introduces Message C and forces a three-way choice. Do not reorder these.

Customising the Questions

Replace the bracketed placeholders with your actual messaging variants. The surrounding question text should remain unchanged to maintain the measurement structure. For example:

# Q1: Replace [Message A] with your problem-led variant
question_1 = f"""Read this message: '{MESSAGE_A}'

In your own words, what is this company offering? Who is it for?
Would you want to learn more?"""
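
Scaling this to all seven questions is easiest with a prebuilt list that the ask loop in Section 4 can iterate over. A minimal sketch (question wording copied from the table above; MESSAGE_A, MESSAGE_B, and MESSAGE_C are the variant strings from Section 2):

# Build all 7 question strings up front so the ask loop (Section 4)
# can iterate over them. Wording copied from the question table above.
QUESTIONS = [
    f"Read this message: '{MESSAGE_A}'. In your own words, what is this "
    "company offering? Who is it for? Would you want to learn more?",
    f"Now read this: '{MESSAGE_B}'. How does this compare to the first? "
    "Which feels more relevant to your situation?",
    f"One more: '{MESSAGE_C}'. Of the three, which would make you most "
    "likely to click, sign up, or reach out? Why?",
    "What is unclear or confusing about any of these messages? "
    "What questions do they leave unanswered?",
    "If you saw the winning message on a website, what would you expect "
    "to find when you clicked through?",
    "What one word or phrase from these messages stuck with you most? "
    "What fell completely flat?",
    "Thinking about your actual work or life, which of these problems "
    "feels most urgent to you right now? Why?",
]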

4. Complete API Workflow: Step by Step

Step 1: Recruit a Research Group

Create a group matching your target buyer profile. Use demographic filters to ensure relevance.

curl -s -X POST "https://app.askditto.io/v1/research-groups/recruit" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Message Test: PM SaaS Target Buyers",
    "group_size": 10,
    "filters": {
      "country": "US",
      "age_min": 28,
      "age_max": 50,
      "employment_status": "Employed"
    }
  }'

Response includes group_id. Save it for Step 2.

Group size recommendation: 10 personas for message testing. This provides enough diversity to identify patterns while keeping the data manageable. For cross-segment testing (Section 8), use 10 per segment.
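
If Claude Code drives the workflow from Python rather than shell, the same call looks like this. A minimal sketch using the requests library (the group_id response field follows the description above; everything else mirrors the curl payload):

# Minimal Python wrapper around the recruit endpoint shown above.
import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://app.askditto.io/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def recruit_group(name, size, filters):
    # POST the same payload as the curl call; the response's
    # group_id identifies the new research group.
    resp = requests.post(f"{BASE_URL}/research-groups/recruit",
                         headers=HEADERS,
                         json={"name": name, "group_size": size,
                               "filters": filters},
                         timeout=60)
    resp.raise_for_status()
    return resp.json()["group_id"]

group_id = recruit_group(
    "Message Test: PM SaaS Target Buyers", 10,
    {"country": "US", "age_min": 28, "age_max": 50,
     "employment_status": "Employed"})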

Step 2: Create a Research Study

curl -s -X POST "https://app.askditto.io/v1/research-studies" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "A/B/C Message Test: FlowBoard Project Management",
    "objective": "Test three messaging variants (problem-led, outcome-led, capability-led) to identify which framing resonates most with target buyers and produces the strongest action intent.",
    "research_group_id": GROUP_ID
  }'

Response includes study_id. Save it for Step 3.

Step 3: Ask Questions Sequentially

Ask questions one at a time. Wait for all persona responses to complete before asking the next question. This ensures personas build context from earlier answers, producing richer qualitative data. Asking all 7 simultaneously loses the conversational depth.

# Ask Question 1
curl -s -X POST "https://app.askditto.io/v1/research-studies/STUDY_ID/questions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Read this message: '\''Tired of status meetings that could have been a dashboard? Your team wastes 5 hours a week on alignment that should be automatic. FlowBoard replaces the meeting with a living project view.'\'' In your own words, what is this company offering? Who is it for? Would you want to learn more?"
  }'

Response includes job_ids (one per persona). Poll these jobs until all complete.

Step 4: Poll for Question Completion

# Poll each job until status is "completed"
curl -s "https://app.askditto.io/v1/jobs/JOB_ID" \
  -H "Authorization: Bearer YOUR_API_KEY"

When status returns "completed" for all jobs in a question, proceed to the next question. Poll every 3-5 seconds. Typical completion time: 15-45 seconds per question.
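
Steps 3-5 collapse into a single ask-then-poll loop. A minimal Python sketch (endpoints match the curl calls above; the job_ids and status response fields follow the descriptions in Steps 3 and 4; QUESTIONS is the list built in Section 3):

import time
import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://app.askditto.io/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def ask_question(study_id, question):
    # POST one question; returns job_ids (one per persona).
    resp = requests.post(f"{BASE_URL}/research-studies/{study_id}/questions",
                         headers=HEADERS, json={"question": question},
                         timeout=60)
    resp.raise_for_status()
    return resp.json()["job_ids"]

def wait_for_jobs(job_ids, interval=4):
    # Poll every 3-5 seconds until every job reports status "completed".
    pending = set(job_ids)
    while pending:
        for job_id in list(pending):
            job = requests.get(f"{BASE_URL}/jobs/{job_id}",
                               headers=HEADERS, timeout=30).json()
            if job["status"] == "completed":
                pending.discard(job_id)
        if pending:
            time.sleep(interval)

# Ask sequentially: each question goes out only once the previous completes.
for question in QUESTIONS:  # QUESTIONS list from Section 3
    wait_for_jobs(ask_question("STUDY_ID", question))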

Step 5: Repeat for Questions 2-7

After Q1 completes, ask Q2. After Q2 completes, ask Q3. Continue through Q7. Total time for 7 questions across 10 personas: approximately 5-8 minutes.

# Example: Question 3 (the three-way comparison)
curl -s -X POST "https://app.askditto.io/v1/research-studies/STUDY_ID/questions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "One more: '\''AI-powered project tracking that learns your team'\''s workflow. FlowBoard auto-generates status updates, predicts delays before they happen, and routes blockers to the right person instantly.'\'' Of the three messages, which would make you most likely to click, sign up, or reach out? Why?"
  }'

Step 6: Complete the Study

curl -s -X POST "https://app.askditto.io/v1/research-studies/STUDY_ID/complete" \
  -H "Authorization: Bearer YOUR_API_KEY"

Ditto generates an automated analysis: key segments, divergences, shared mindsets, and suggested follow-up questions. This analysis often surfaces message testing insights not obvious from individual responses.

Step 7: Get the Share Link

curl -s "https://app.askditto.io/v1/research-studies/STUDY_ID" \
  -H "Authorization: Bearer YOUR_API_KEY"

The share_url field provides a public URL to the full study results.

Step 8: Fetch All Questions and Answers

curl -s "https://app.askditto.io/v1/research-studies/STUDY_ID/questions" \
  -H "Authorization: Bearer YOUR_API_KEY"

Returns all 7 questions with all 10 persona responses per question (70 total responses). This is the raw data for generating deliverables.


5. The Iterative Loop: Test, Refine, Re-Test

A single round of message testing is useful. Two rounds produce messaging you can be genuinely confident in. This is the core workflow that makes Ditto + Claude Code uniquely powerful for messaging.

The Two-Round Process

| Phase | Duration | What Happens |
| --- | --- | --- |
| Round 1: Test | ~30 minutes | Test 3 messaging variants against 10 personas using the 7-question study. Analyse responses. Identify: which variant won, why the others lost, which phrases resonated, which fell flat, what gaps remain. |
| Refinement | ~10 minutes | Claude Code rewrites the two losing variants, incorporating winning language from Round 1, addressing clarity gaps, adjusting framing based on urgency data from Q7. |
| Round 2: Re-Test | ~30 minutes | Test the 3 refined variants against a fresh group of 10 personas (new recruitment). Fresh personas prevent priming bias. If the same variant wins both rounds, you have convergence. |

Why fresh personas for Round 2: Using the same personas would bias Round 2 results because they have context from Round 1 questions. Always recruit a new group. The filters should be identical (same demographic profile), but the individual personas will be different.

Refinement Strategy

After Round 1, Claude Code should analyse the 70 responses and apply these refinement rules:

  1. Steal winning language: If specific phrases from the winning variant were cited in Q6 as "stuck with me", incorporate them into the losing variants.
  2. Address clarity gaps: If Q4 reveals that personas didn't understand what the product does, add a concrete explanation.
  3. Match urgency: If Q7 reveals that personas care most about a specific problem, ensure the problem-led variant addresses that exact problem.
  4. Fix misinterpretation: If Q1 shows personas understood Message A incorrectly, rewrite for clarity.
  5. Preserve the winner: The Round 1 winner may receive minor tweaks (e.g., addressing a clarity gap) but should not be substantially rewritten.

Round 2 Implementation

# Recruit a FRESH group with the same filters
curl -s -X POST "https://app.askditto.io/v1/research-groups/recruit" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Message Test Round 2: PM SaaS Target Buyers",
    "group_size": 10,
    "filters": {
      "country": "US",
      "age_min": 28,
      "age_max": 50,
      "employment_status": "Employed"
    }
  }'

# Create a new study with the refined messages
curl -s -X POST "https://app.askditto.io/v1/research-studies" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "A/B/C Message Test Round 2: FlowBoard (Refined)",
    "objective": "Re-test three refined messaging variants. Round 1 winner was outcome-led. Problem-led and capability-led variants have been rewritten based on Round 1 persona feedback.",
    "research_group_id": NEW_GROUP_ID
  }'

# Ask the same 7 questions with the REFINED message variants
# (same question structure, updated message text)

Convergence signals: If the same framing wins in both rounds, you have strong evidence for that approach. If the winning framing changes between rounds, the refined variants may have overcorrected. In that case, compare the specific reasons personas gave in each round to understand what shifted.
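
Once you have a preference tally per round (see the performance-ranking sketch in Section 6), the convergence check itself is mechanical. An illustrative sketch; the tallies below match the FlowBoard worked example, not real output:

# Compare per-round preference tallies for convergence.
round1 = {"A": 3, "B": 6, "C": 1}  # illustrative Round 1 tally
round2 = {"A": 2, "B": 7, "C": 1}  # illustrative Round 2 tally

winner1 = max(round1, key=round1.get)
winner2 = max(round2, key=round2.get)

if winner1 == winner2:
    print(f"Converged: Message {winner1} won both rounds "
          f"({round1[winner1]}/10, then {round2[winner2]}/10).")
else:
    print(f"No convergence: {winner1} won Round 1, {winner2} won Round 2. "
          "Compare the Q3 reasoning from each round before deciding.")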

6. Generating the Six Deliverables

A completed message testing study (70 qualitative responses: 10 personas x 7 questions) produces six structured deliverables. Claude Code generates these by analysing the response data from the /questions endpoint; a fetch-and-flatten sketch follows the table below.

| Deliverable | Source Questions | What It Contains | Primary User |
| --- | --- | --- | --- |
| Message Performance Ranking | Q2, Q3, Q7 | Which variant won, by how much, and the specific reasons personas cited. Not just "B won" but "B won because the outcome framing resolved scepticism that the problem framing triggered." | PMM, Marketing |
| Clarity Scorecard | Q1, Q4 | For each variant: was it understood correctly, was it misinterpreted, what questions did it leave unanswered? Misinterpretation is worse than confusion. | PMM, Copywriting |
| Language Harvest | Q6 (primary), all Qs | Words and phrases that stuck (keep these) vs fell flat (kill these). The language customers naturally use to describe your value proposition. | Copywriting, Content |
| Audience-Message Fit Matrix | Q3, Q7, demographics | Which message works for which persona type. Maps demographic/psychographic profiles to message preference. | PMM, Demand Gen |
| Messaging Hierarchy | All Qs synthesised | Primary message, 3-4 supporting pillars, and proof points, populated with tested, validated language. | PMM, Marketing, Sales |
| Clarity Checklist | Q4, Q5 | Specific questions personas needed answered: pricing, free trial, social proof, implementation time. These become mandatory elements for any asset carrying this messaging. | Web, Content, Sales |
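
Before generating any deliverable, flatten the raw study data into a single transcript Claude Code can analyse. A minimal sketch (the nesting of question text and a responses list with response_text is an assumption about the /questions payload shape):

# Flatten the completed study into one transcript for deliverable analysis.
# Field names ("question", "responses", "response_text") are assumptions
# about the /questions payload shape.
import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://app.askditto.io/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

resp = requests.get(f"{BASE_URL}/research-studies/STUDY_ID/questions",
                    headers=HEADERS, timeout=30)
resp.raise_for_status()

with open("study_transcript.md", "w") as f:
    for i, q in enumerate(resp.json(), start=1):
        f.write(f"## Q{i}: {q['question']}\n\n")
        for r in q["responses"]:
            f.write(f"- {r['response_text']}\n")
        f.write("\n")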

Generating the Performance Ranking

Parse Q3 responses (the three-way comparison) to tally explicit preferences. Then cross-reference with Q2 (pairwise comparison) and Q7 (urgency alignment).

# Performance-ranking extraction (runnable sketch). Nesting "responses"
# inside each question object is an assumption about the /questions payload.
import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://app.askditto.io/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def fetch_responses(study_id, question_index):
    resp = requests.get(f"{BASE_URL}/research-studies/{study_id}/questions",
                        headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return resp.json()[question_index]["responses"]

responses_q3 = fetch_responses("STUDY_ID", 2)  # Q3 is the third question

tally = {"A": 0, "B": 0, "C": 0}
reasons = {"A": [], "B": [], "C": []}

# Crude keyword heuristic for which message a persona chose; in practice
# Claude Code should read each response rather than pattern-match.
patterns = {"A": ("message a", "the first"), "B": ("message b", "the second"),
            "C": ("message c", "the third")}

for response in responses_q3:
    text = response["response_text"].lower()
    for label, keys in patterns.items():
        if any(k in text for k in keys):
            tally[label] += 1
            reasons[label].append(response["response_text"])
            break

# Example output: "Message B (outcome-led) won 6/10 preferences: the specific
# outcome ('40% faster') felt credible, while the problem framing
# ('tired of meetings') was seen as generic by 3 personas."

Generating the Messaging Hierarchy

The messaging hierarchy is the most important output. It follows the standard structure:

MESSAGING HIERARCHY
===================

PRIMARY MESSAGE:
  "Ship projects 40% faster with zero status meetings."
  [Tested: 6/10 preference in Round 1, 7/10 in Round 2]

SUPPORTING PILLARS:

  Pillar 1: Real-time visibility
    "Every stakeholder sees progress, blockers, and deadlines
     without a single sync call."
    Evidence: 8/10 personas cited "no more check-in meetings"
    as the most compelling benefit.

  Pillar 2: Predictive intelligence
    "Know about delays before they happen, not after."
    Evidence: Q6 language harvest - "predicts delays" was the
    #1 phrase that "stuck" across both rounds.

  Pillar 3: Zero-effort status updates
    "Auto-generated updates from your team's actual work."
    Evidence: Q4 clarity gap - personas needed to understand
    HOW updates are generated without manual input.

PROOF POINTS:
  - "40% faster" needs supporting data (case study, benchmark)
  - "AI-powered" needs specificity (what model, what data)
  - Social proof needed: "Who else uses this?"
  [From Q5 expectation alignment data]

7. Deep Dive: Language Harvest Extraction

The language harvest is the most immediately actionable output. It tells you exactly which words and phrases to use (and avoid) in all customer-facing copy.

Extraction Process

Q6 asks directly: "What one word or phrase stuck with you most? What fell completely flat?" But valuable language data is embedded across all seven questions. Claude Code should scan all 70 responses for:

| Category | What to Look For | How to Use It |
| --- | --- | --- |
| Keep (high resonance) | Phrases cited in Q6 as "stuck with me", language personas use when paraphrasing your message positively in Q1, words that appear in multiple personas' Q3 action reasons | Use in headlines, email subject lines, ad copy, sales scripts |
| Kill (negative resonance) | Phrases cited in Q6 as "fell flat", language personas flag as "confusing" or "jargon" in Q4, words associated with scepticism in Q1 | Remove from all messaging, replace with tested alternatives |
| Adopt (customer language) | Natural language personas use to describe the value in Q1, problem descriptions in Q7 that differ from your framing, paraphrases that are clearer than your original | Replace your internal language with customer language throughout |

Pattern threshold: If 3+ out of 10 personas independently cite the same word or phrase as memorable, that phrase has demonstrated resonance. If 3+ cite the same phrase as flat, it should be eliminated. Individual mentions are noise; repeated mentions are signal.
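
Applying the threshold is mechanical once candidate phrases have been pulled from the answers. A minimal sketch (the candidate list itself comes from Claude Code's read of the Q6 responses; the counting is plain substring matching):

# Count how many personas independently mention each candidate phrase.
from collections import Counter

candidates = ["zero status meetings", "predicts delays",
              "living project view", "alignment that should be automatic"]

def harvest(responses, phrases, threshold=3):
    counts = Counter()
    for text in responses:
        lowered = text.lower()
        for phrase in phrases:
            if phrase in lowered:
                counts[phrase] += 1  # one count per persona response
    # Keep only phrases that cross the repeated-mention threshold.
    return {phrase: n for phrase, n in counts.items() if n >= threshold}

# q6_responses: the 10 persona answers to Q6, as plain strings
# print(harvest(q6_responses, candidates))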

Example Language Harvest Output

LANGUAGE HARVEST
================

KEEP (use more):
  - "zero status meetings"     [cited by 7/10 as memorable]
  - "predicts delays"          [cited by 5/10, appeared in Q3 reasons]
  - "living project view"      [cited by 4/10, natural paraphrase]
  - "40% faster"               [cited by 6/10, BUT 3 wanted proof]

KILL (remove immediately):
  - "alignment that should be automatic" [4/10 said "corporate jargon"]
  - "routes blockers"          [3/10 didn't understand what this means]
  - "learns your workflow"     [3/10 found this "creepy" or "vague"]

ADOPT (customer language, replace yours):
  - Personas say "no more Monday syncs" → use instead of "eliminates meetings"
  - Personas say "see what's stuck" → use instead of "identifies blockers"
  - Personas say "keeps everyone on the same page" → use instead of "real-time visibility"

8. Cross-Segment Message Testing

The basic workflow tests one set of messages against one audience. The advanced version tests the same messages against multiple audiences simultaneously, revealing which framing works for which buyer.

Running Parallel Studies

Claude Code orchestrates three studies concurrently:

# Group 1: SMB decision-makers
curl -s -X POST "https://app.askditto.io/v1/research-groups/recruit" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Message Test: SMB Buyers (28-40)",
    "group_size": 10,
    "filters": {
      "country": "US",
      "age_min": 28,
      "age_max": 40,
      "employment_status": "Employed"
    }
  }'

# Group 2: Enterprise evaluators
curl -s -X POST "https://app.askditto.io/v1/research-groups/recruit" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Message Test: Enterprise Buyers (35-55)",
    "group_size": 10,
    "filters": {
      "country": "US",
      "age_min": 35,
      "age_max": 55,
      "employment_status": "Employed"
    }
  }'

# Group 3: Technical buyers
curl -s -X POST "https://app.askditto.io/v1/research-groups/recruit" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Message Test: Technical Buyers (Bachelor+)",
    "group_size": 10,
    "filters": {
      "country": "US",
      "age_min": 25,
      "age_max": 50,
      "employment_status": "Employed",
      "education_level": "Bachelors"
    }
  }'

Create three separate studies (one per group) and run the same 7 questions with the same 3 message variants across all three. Claude Code can interleave the API calls, asking Q1 to all three studies, then Q2, and so on.
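
A sketch of the interleaving, reusing the ask_question and wait_for_jobs helpers from the Section 4 sketch (the study ids are placeholders for the three study_id values returned at creation):

# Interleave the same 7 questions across the three segment studies:
# Q1 to all three, wait for completion, then Q2, and so on.
study_ids = {
    "smb": "SMB_STUDY_ID",
    "enterprise": "ENTERPRISE_STUDY_ID",
    "technical": "TECHNICAL_STUDY_ID",
}

for question in QUESTIONS:  # same 7 questions, same 3 message variants
    jobs = []
    for segment, study_id in study_ids.items():
        jobs.extend(ask_question(study_id, question))
    wait_for_jobs(jobs)  # all three segments answer each question in parallel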

Audience-Message Fit Matrix Output

AUDIENCE-MESSAGE FIT MATRIX
============================

                           SMB Buyers    Enterprise    Technical
Message A (Problem)           6/10          3/10          2/10
Message B (Outcome)           3/10          7/10          4/10
Message C (Capability)        1/10          0/10          4/10

KEY INSIGHT:
- SMB buyers respond to problem framing (pain is personal and immediate)
- Enterprise buyers respond to outcome framing (need to justify ROI)
- Technical buyers split between outcome and capability
  (want to understand the mechanism before trusting the claim)

RECOMMENDATION:
- Website homepage: Outcome-led (broadest appeal)
- SMB email sequences: Problem-led
- Enterprise sales deck: Outcome-led with ROI proof
- Technical documentation: Capability-led

GTM implication: If your SMB motion is PLG and your enterprise motion is sales-led, use different messaging for each channel. One primary message on the website, but segment-specific messaging in email sequences, ad targeting, sales scripts, and in-product onboarding.

9. Worked Example: Project Management SaaS

Case Study: FlowBoard Message Testing

Context: FlowBoard is a project management tool launching a new AI-powered status tracking feature. The PMM team has validated positioning (competitive alternative: Monday.com + manual updates; unique attribute: AI-generated status; value: eliminates status meetings). Now they need to determine which messaging framing will drive the most trial sign-ups.

Three Variants Drafted

The three variants are the FlowBoard messages from Section 2: problem-led (MESSAGE_A, the status-meeting pain), outcome-led (MESSAGE_B, "ship projects 40% faster with zero status meetings"), and capability-led (MESSAGE_C, AI-powered tracking that predicts delays).

Round 1 Results (10 personas, US, employed, age 28-50)

Message B (outcome-led) won with 6/10 preferences: the specific outcome ("40% faster") felt credible and actionable, while the problem framing was dismissed as generic by 3 personas. The language harvest flagged "zero status meetings" and "predicts delays" as the most memorable phrases, and "alignment that should be automatic" as corporate jargon.

Refinement Applied

Following the refinement rules in Section 5, the two losing variants were rewritten: the jargon was cut, the resonant phrases were worked into the problem-led and capability-led variants, and the capability variant gained a concrete explanation of how status updates are generated. Message B, the winner, received only minor tweaks.

Round 2 Results (fresh 10 personas, same filters)

Message B won again with 7/10 preferences against a fresh group, confirming convergence on the outcome-led framing.

Final Messaging Hierarchy

Primary: "Ship projects faster with zero status meetings."

Pillar 1: Real-time visibility without sync calls

Pillar 2: AI-predicted delays before they become problems

Pillar 3: Auto-generated updates from actual work

Proof: social proof ("2,000+ teams"), a case study to support the speed claim, and pricing on the landing page (not behind a form)


10. Connecting to Other PMM Workflows

Message testing sits between positioning and execution in the PMM stack. It connects to other Ditto + Claude Code workflows:

| Workflow | Relationship to Message Testing | Sequence |
| --- | --- | --- |
| Positioning Validation | Positioning determines what to say. Messaging determines how to say it. Always validate positioning first. | Before message testing |
| Competitive Intelligence | Competitive battlecards provide "quick dismisses" and "landmine questions" that should be reflected in competitive messaging variants. | Before or parallel |
| Sales Enablement | The messaging hierarchy and language harvest feed directly into pitch decks, email templates, and demo scripts. | After message testing |
| Content Marketing | Tested messaging informs blog headlines, social copy, ad creative, and landing page copy. The language harvest provides exact words to use. | After message testing |
| Pricing Research | Q4 clarity gaps often surface pricing as the #1 unanswered question. If so, run a pricing study next. | After message testing |

The Full PMM Sequence (Under 3 Hours)

Positioning Validation     →  30 minutes  →  What to say
       ↓
Message Testing (2 rounds) →  70 minutes  →  How to say it
       ↓
Competitive Intelligence   →  45 minutes  →  How to say it about the competition
       ↓
Total: ~2.5 hours for the strategic foundation most teams spend a quarter building

11. Best Practices and Common Mistakes

Best Practices

| Practice | Why It Matters |
| --- | --- |
| Keep variants similar in length | If Message A is 2 sentences and Message C is a paragraph, you're testing length, not framing |
| Test framing, not content | All three variants should convey the same value proposition with different emphasis |
| Use fresh personas for Round 2 | Same personas are primed by Round 1 context, biasing results |
| Ask questions sequentially | Personas build context across questions, producing richer qualitative data |
| Include the message text in the question | Don't reference "Message A" abstractly. Paste the actual message so personas respond to the words |
| Run the study through completion | Ditto's automated analysis often surfaces insights not obvious from individual responses |
| Trust the language harvest over intuition | If 7/10 personas remember a phrase, use it. If your favourite phrase fell flat with 4/10, kill it. |

Common Mistakes

| Mistake | What Goes Wrong | How to Avoid |
| --- | --- | --- |
| Testing before positioning is validated | You might be testing the right framing for the wrong value proposition | Run positioning validation first (see Positioning Validation guide) |
| Only one round of testing | No way to verify the winner or test whether refined losers improve | Always run two rounds. The second round costs 30 minutes and provides convergence evidence |
| Reusing the same group for Round 2 | Personas are primed by Round 1, biasing Round 2 results | Recruit a fresh group with identical filters |
| Testing more than 3 variants | Comparison fatigue. Personas lose the ability to differentiate after 3 options | Test 3 at a time. If you have 5 variants, run a first round to narrow to 3, then test those |
| Ignoring the clarity checklist | Messaging wins the preference test but fails in production because it leaves critical questions unanswered | Treat Q4 and Q5 outputs as mandatory design requirements for landing pages and assets |
| Asking all 7 questions simultaneously | Loses sequential context. Q2 and Q3 are specifically designed to build on Q1 | Ask one question, wait for completion, then ask the next |

12. Frequently Asked Questions

Can I test more than three variants?

Not in a single study. Three is the maximum for meaningful comparison without cognitive overload. If you have five variants, run a screening round (Q3 only, rapid preference check) with all five, then take the top three into the full 7-question study.

How do I know when messaging is "done"?

When the same variant wins in two consecutive rounds with different persona groups, and the language harvest shows consistent patterns, messaging has converged. You can always re-test after launch with real-world data, but two rounds of synthetic testing provide a strong pre-launch foundation.

Should I test headlines, taglines, or full messages?

Full messages (2-3 sentences) work best. Headlines alone lack enough context for personas to evaluate comprehension and relevance. If you need to test headlines specifically, include a sentence of supporting context with each one.

What if no variant clearly wins?

This is informative. It means either: (a) the messaging variants are too similar (differentiate the framings more), (b) the value proposition itself doesn't resonate strongly with this audience (a positioning issue, not a messaging issue), or (c) the audience is genuinely split (consider segment-specific messaging). Check the Q7 urgency data to diagnose which.

Can I test messaging in different languages?

Yes. Ditto has personas across 15+ countries. Recruit a group filtered by country (e.g., Germany, France, Japan) and present messages in the target language. Claude Code can orchestrate parallel studies across markets to compare messaging resonance cross-culturally.

How does this relate to A/B testing on my website?

Ditto message testing is a pre-launch qualifier. It eliminates weak variants before you spend real traffic testing them. Use Ditto to narrow from 3 to 1, then use live A/B testing (Optimizely, VWO, etc.) to fine-tune the winner against minor variations with real conversion data.

What if the Round 2 winner is different from Round 1?

This means the refinement overcorrected, or the two groups had meaningfully different preferences. Compare the Q3 reasoning from both rounds. If the reasons are consistent but the winner flipped, the variants are close in performance and either could work. If the reasons differ, the groups may represent different segments, which is itself a valuable finding.

How many personas should I use?

10 per group is the sweet spot for message testing. Fewer than 6 produces unreliable patterns. More than 15 adds data volume without proportionally increasing insight quality. For cross-segment testing, use 10 per segment (30 total for 3 segments).

Can I use this for email subject line testing?

Yes, with a modification. For subject lines, use Q1 to present each subject line and ask "Would you open an email with this subject? Why or why not?" Adapt Q6 to focus on which subject line creates the most curiosity. The rest of the study structure applies.

