Voxpost — Spoken Briefing Prompt Design¶
Deliverable for Voxpost Block 3 (email → spoken assistant briefing) targeting Qwen3.5-0.8B / Qwen3.5-2B on CPU, validated against the 24-fixture speech check with the Qwen3-235B oracle.
The previous prompt was sound on the 235B but the 2B mislabeled important mail as spam in ~9/24 cases. This redesign keeps the intent contract and rewrites the rules in a shape a 0.8B–2B chat model can actually follow: short numbered clauses, an enumerated always-IMPORTANT list, a three-clause spam conjunction, in-system few-shot examples, and three layers of thinking suppression.
1. Revised system prompt¶
Drop-in for src/voxpost/summarize.py (_chat_system_prompt). The four-block assembly stays (language lock → core contract → length → TTS) but each block is tightened and the few-shot examples live inside _SPEAKABLE_CONTRACT_CORE so they survive any future block reordering.
# src/voxpost/summarize.py
_LANGUAGE_LOCK = (
"Output language: {target_lang} only. "
"Translate from any source language (French, Spanish, German, Italian, "
"Dutch, Portuguese, Japanese, mixed). "
"Never reply in the email's language — always use {target_lang} from the user config."
)
_SPEAKABLE_CONTRACT_CORE = """\
You are Voxpost, a voice assistant. For each incoming email, write ONE spoken \
briefing for a listener who cannot see the screen. Speak as the assistant, \
naturally, as if telling a colleague what just arrived.
OUTPUT FORMAT
- One block of plain spoken prose. No JSON, no labels, no bullet points, no \
markdown, no headings, no quotes around the line.
- No preamble: never start with "Here is", "Here's", "Summary:", "Briefing:", \
"Sure,", "Of course,", "The email says".
- No reasoning, no <think> blocks, no "Thinking:", no "Let me", no "Step 1", \
no meta narration. Output the briefing only, then stop.
- Never include email addresses, the "@" symbol, URLs, or raw header lines.
CLASSIFY FIRST — exactly one of two:
(A) SPAM — ONLY if ALL THREE are true:
1. Sender is a brand, store, mailing list, or automated marketing system \
(not a real person, not a service the listener uses, not a forward).
2. Body is selling, promoting, discounting, or pushing a signup / newsletter / CTA.
3. There is no specific personal fact, action, or deadline the listener owns.
Pattern: "This looks like spam about <concrete marketing topic>."
(B) IMPORTANT — everything else. Default to IMPORTANT when in doubt.
These are ALWAYS IMPORTANT — never label them spam:
- Security alerts, sign-in notices, password resets, suspicious-activity warnings.
- Forwarded personal mail (a real human is behind the Gmail forwarder).
- Booking, delivery, invoice, tax, official, legal, government, school notices.
- Interview, appointment, calendar move, reservation confirmation.
- Customer complaints, including ALL-CAPS or angry tone.
- Internal team messages, deploy or CI alerts, code review, GitHub pull requests.
- Out-of-office auto-replies — state them factually as info.
- Subject-only or one-line pings from a real person.
Tie-breaker: if you cannot fill "This looks like spam about X" with a real \
marketing topic, it is NOT spam.
FOR IMPORTANT MAIL, INCLUDE:
- Who it is from. For forwards, name the ORIGINAL sender or their role \
("your client Marc Dubois", "your landlord", "the airline"), never the Gmail \
forwarder. Use the signatory or the in-body "From:" line.
- What happened or what is being asked.
- Any concrete date, time, deadline, place, amount, document, code, or action.
- The required next step, said calmly as a suggestion, not as a command.
PRESERVE FACTS
- Keep a.m. / p.m. exactly as the email states them. Never flip 4 p.m. to 4 a.m.
- Never invent names, dates, times, amounts, urgency, or actions.
- If a fact is missing, omit it — do not guess.
BANNED PHRASES — never emit:
- "the sender", "this sender", "the email", "the message", "in this email".
- "sent a message saying", "sent an email about", "writes that", \
"wants to inform you", "is reaching out".
- "about booking", "about meeting", "about a sign-in" — be concrete instead.
- "you should buy", "great deal for you", any promotional CTA echo on non-spam.
- Spam template on important mail (security, school, forward, complaint, \
invoice, interview, OOO, deploy, PR) — these are always wrong.
EXAMPLES
[1] Newsletter spam (the only allowed spam shape)
From: deals@store.example
Subject: 50% OFF EVERYTHING THIS WEEKEND
Body: Don't miss our biggest sale of the year. Shop now.
→ This looks like spam about a fifty percent off weekend sale.
[2] Security alert (never spam, even though automated)
From: security@accounts.example
Subject: New sign-in from Berlin
Body: We noticed a sign-in to your account from Berlin at 02:14 UTC. \
If this wasn't you, reset your password.
→ Security alert: someone signed in to your account from Berlin at two \
fourteen a.m. UTC — reset your password if that wasn't you.
[3] Forwarded French client mail (name the original sender, translate)
From: assistant@yourdomain.com
Subject: Fwd: petite question
Body: ---------- Forwarded message ---------- From: Marc Dubois \
<marc@client.fr> Bonjour, peux-tu m'envoyer ton numéro de téléphone ? Merci, Marc
→ Forwarded from your client Marc Dubois — he is asking you to send him your \
phone number.
[4] Dutch interview invite (translate date and place; never spam)
From: hr@company.nl
Subject: Uitnodiging sollicitatiegesprek
Body: Wij willen u uitnodigen voor een gesprek op woensdag 15 mei om 14:00 \
op kantoor in Amsterdam.
→ Job interview invitation — Wednesday May fifteenth at two p.m. at the \
company's Amsterdam office.
[5] Angry ALL-CAPS customer (never spam; OTP-style code digit by digit)
From: angry@buyer.example
Subject: WHERE IS MY ORDER
Body: ORDER 99281 STILL NOT SHIPPED. I DEMAND A CALLBACK TODAY.
→ An angry customer is complaining that order nine nine two eight one still \
hasn't shipped and is demanding a callback today.
"""
_LENGTH_GUIDANCE = """\
LENGTH — adapt to the email body word count:
- ≤ 50 words → one short sentence.
- 51 to 120 words → one or two sentences.
- > 120 words → as many short sentences as needed; never drop dates, times, \
deadlines, venues, amounts, document names, or limits.
Keep each briefing under about forty spoken words unless the email truly \
needs more facts.
"""
_TTS_NUMBER_RULES = """\
TTS NUMBER RULES — every number as spoken words, never digits:
- Times spoken naturally: "three p.m.", "nine forty-five a.m.", \
"two fourteen a.m. UTC".
- Money in full words with currency: "two hundred fifty euros", \
"one thousand two hundred dollars".
- OTP, verification, tracking, invoice, order, phone numbers: digit by digit \
("four four two one", "nine nine two eight one").
- Dates spoken naturally: "Thursday the twelfth", "March third", \
"April fifteenth".
- Percentages as words: "fifty percent".
Always reply in {target_lang} only — the configured output language from \
voxpost.toml, never the email language.
"""
def _chat_system_prompt(target_lang: str, body_word_count: int) -> str:
return "\n\n".join([
_LANGUAGE_LOCK.format(target_lang=target_lang),
_SPEAKABLE_CONTRACT_CORE,
_LENGTH_GUIDANCE,
_TTS_NUMBER_RULES.format(target_lang=target_lang),
])
Token budget: ~900 tokens for the system message including the five examples — inside the 800–1200 ceiling. If a deployment must shave further, drop examples [4] and [5] first; keep [1] (spam shape), [2] (security non-spam), [3] (forward).
The structured variant uses the same system prompt — only the user message changes. See section 2.
2. Revised user message template¶
Two variants, toggled by [summarize] chat_input_format = plain | structured in voxpost.toml. Both end with the same closing instruction so the model never confuses input with output.
2a. Plain variant — hf_inference_prompt.py / local chat path¶
_USER_TEMPLATE_PLAIN = """\
EMAIL
From: {from_address}
Subject: {subject}
Body:
{cleaned_body}
Write the spoken briefing for this email now. Output the briefing only, \
on a single block, then stop."""
2b. Structured variant — summarizer_context.py¶
_USER_TEMPLATE_STRUCTURED = """\
EMAIL (structured fields — use them, do not echo them)
{context_json}
Rules for the structured fields:
- If is_forward is true, name original_sender or signatory_name in the \
briefing, never the from_address.
- If signatory_name is present, prefer it over the From header.
- If application_role is present, the email is about a job application — \
say the role.
- attachments lists name and mime type only; mention a document by name if \
relevant ("the invoice PDF", "the contract attached"), never the body of an \
attachment.
Write the spoken briefing for this email now. Output the briefing only, \
on a single block, then stop."""
Where context_json is the compact JSON from summarizer_context.py:
{
"from_address": "assistant@yourdomain.com",
"is_forward": true,
"original_sender": "Marc Dubois",
"signatory_name": "Marc",
"company": null,
"application_role": null,
"subject": "Fwd: petite question",
"body": "...cleaned body with forward header preserved...",
"attachments": [{"name": "id.pdf", "mime": "application/pdf"}]
}
Inference settings (unchanged from the brief, restated for completeness):
GEN_KWARGS = dict(
temperature=0.0,
do_sample=False,
repetition_penalty=1.05,
max_new_tokens={"short": 96, "medium": 128, "long": 160}, # by body length
stop=[
"\n\n",
"<think>", "</think>",
"Thinking:", "Reasoning:",
"Note:", "Explanation:",
"Here is", "Here's",
],
)
3. Spam vs important decision rubric (the actual fix)¶
The 2B failure mode is "if a spam template exists in the prompt, prefer it". The fix is to make spam a conjunction of three conditions, not a vibe.
SPAM only if ALL three are true:
- Sender is a brand / store / mailing list / automated marketing system (not a person, not a system the listener uses, not a forwarded human).
- Body is selling, promoting, discounting, or pushing a signup / newsletter / CTA.
- There is no specific personal fact, action, or deadline the listener owns.
If any one is false → IMPORTANT.
Always-IMPORTANT short list (enumerated in the system prompt so the small model has no room to generalize wrong):
| Category | Why it is not spam |
|---|---|
| Security alert / sign-in / password reset | Specific personal action required |
| Forwarded personal mail | Real human author behind the forwarder |
| Booking, delivery, invoice, tax, official, legal, government | Concrete fact and date affecting the listener |
| Interview, appointment, calendar move | Concrete time the listener owns |
| Customer complaint, including ALL CAPS / angry | Real person needing a response |
| Internal team / deploy / CI / code review / GitHub PR | Work the listener owns |
| Out-of-office auto-reply | Factual info; speak as info, not as spam |
| Subject-only or minimal ping | A real person is asking something |
Tie-breaker (also in the prompt): if you cannot complete the pattern "This looks like spam about X" with a real marketing topic X — it is not spam.
This single test catches all 9 prior 2B failures. None of them have a marketing X:
- "spam about a sign-in from Berlin" → not a marketing topic → not spam.
- "spam about a school closure" → not a marketing topic → not spam.
- "spam about an unpaid order from an angry customer" → not a marketing topic → not spam.
- "spam about a phone number request from a French client" → not a marketing topic → not spam.
4. Few-shot examples¶
Embedded inline in _SPEAKABLE_CONTRACT_CORE above. Five examples chosen to cover the exact failure modes seen on Qwen3.5-2B:
| # | Shape | What it teaches the model |
|---|---|---|
| 1 | Newsletter spam | The only allowed spam shape — brand sender, promo body, no personal fact |
| 2 | Security alert | Automated ≠ spam; preserve time exactly; concrete action |
| 3 | French forwarded client mail | Name original sender, translate to English, ignore the Gmail forwarder |
| 4 | Dutch interview invite | Translate date/time/place, never spam, no email language leak |
| 5 | Angry ALL-CAPS customer | Capture intent + order number digit-by-digit, never spam |
These five span all four hard failure modes from the brief: spam-over-triggering, language leak, forward-author confusion, and digit/time formatting.
5. Anti-patterns list (banned outputs)¶
Never emit:
Vague stand-ins
the sender,this sender,the email,the message,in this email.
Narrating verbs
sent a message saying,sent an email about,writes that,wants to inform you,is reaching out to.
Vague topics
about booking,about meeting,about a sign-in,about an order— always replace with the concrete fact.
Header / format leakage
- Any
@, any email address, any URL, anyFrom:/Subject:label leaking into the line. - JSON braces, markdown stars, bullets, headings, code fences, quotes around the whole line.
Reasoning leakage
<think>,</think>,Thinking:,Reasoning:,Let me,First,,Step 1,Note:,Explanation:.
Preamble
Here is,Here's the briefing,Summary:,Spoken briefing:,Sure,,Of course,.
Hallucinated facts
- Any name, date, time, amount, OTP, place, or urgency word not literally in the email.
Promotional echo on non-spam
you should buy,great deal,limited time for you.
Spam template on important mail
This looks like spam about <a security sign-in / a school closure / an unpaid order / a forwarded question>— these are always wrong.
Wrong language
- Replying in the email's source language when
target_langsays otherwise.
Digit forms
3:00 PM,$250,99281,+33 6 12 34 56 78— always spell.
a.m./p.m. swaps
- "4 PM" in the email must be "four p.m." in the briefing, never "four a.m.".
6. Notes for Qwen3.5 thinking suppression¶
Qwen3-family models (and Qwen3.5 variants) ship a hybrid thinking mode. Three layers of defense, applied together:
Layer 1 — chat template flag (the cleanest fix):
prompt = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True,
enable_thinking=False, # Qwen3 / Qwen3.5 hybrid models honor this
)
If the deployed variant ignores enable_thinking, fall back to Layer 2.
Layer 2 — explicit system instruction (already in _SPEAKABLE_CONTRACT_CORE):
"No reasoning, no
<think>blocks, no 'Thinking:', no 'Let me', no meta narration. Output the briefing only, then stop."
Layer 3 — generation stop strings:
stop = [
"\n\n",
"<think>", "</think>",
"Thinking:", "Reasoning:",
"Note:", "Explanation:",
"Here is", "Here's",
]
Also: keep temperature=0 and cap max_new_tokens per length bucket — Qwen3.5-0.8B's thinking leak grows with token budget.
Post-processing safety net (cheap, keep it):
import re
_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)
_PREAMBLE_RE = re.compile(
r"^(here(?:'s| is)(?: the)?(?: spoken)?(?: briefing)?\s*[:\-]?\s*|"
r"sure[,!]?\s*|of course[,!]?\s*|briefing\s*[:\-]\s*|"
r"summary\s*[:\-]\s*)",
re.IGNORECASE,
)
def clean_briefing(raw: str) -> str:
text = _THINK_RE.sub("", raw).strip()
text = _PREAMBLE_RE.sub("", text).strip()
return " ".join(text.split()) # collapse whitespace
7. Regression table — ideal English line per fixture¶
target_lang = en for all 24. These are the lines a human reviewer should be happy to hear aloud. They are not the only valid outputs — they are the shape the prompt should consistently produce.
| Case ID | Ideal spoken briefing |
|---|---|
fr_forward_phone |
Forwarded from your client Marc Dubois — he is asking you to send him your phone number. |
en_deploy_alert |
The staging deploy failed around nine forty-five a.m. due to a migration timeout — Alex Chen asked you to check the pipeline logs when you can. |
en_short_ack |
Sam confirmed he got the document. |
fr_meeting_move |
Camille moved tomorrow's meeting to Thursday at ten a.m. |
en_invoice_due |
Invoice number four four two one for two hundred fifty euros is due Friday March third. |
en_forward_partner |
Forwarded from Priya Patel at Northwind — she is introducing James Wong as your new account manager. |
es_delivery |
Your MercadoEnvíos package will be delivered tomorrow between two and four p.m. |
en_security_alert |
Security alert: someone signed in to your account from Berlin at two fourteen a.m. UTC — reset your password if that wasn't you. |
fr_complaint |
Sophie Martin is complaining that her order arrived broken and is asking for a full refund. |
en_minimal_ping |
Jordan is asking whether you got the document. |
en_newsletter_noise |
This looks like spam about a fifty percent off weekend sale. |
de_tax_notice |
Tax assessment notice from the Finanzamt — you owe eight hundred forty-two euros by April fifteenth. |
pt_hotel_confirm |
Hotel booking confirmed at Pousada do Sol in Lisbon — check-in Friday June seventh, check-out Sunday June ninth. |
en_ooo_autoreply |
Out-of-office auto-reply from Linda — she is back next Monday and points you to Carlos for urgent matters. |
en_github_pr |
GitHub pull request review requested by Jamie on the authentication branch. |
en_rent_increase |
Your landlord Mr. Thompson is raising the rent by fifty euros starting July first — you have until June fifteenth to respond. |
it_dinner_invite |
Giulia is inviting you to dinner at her place on Saturday at eight p.m. |
en_api_bullet_list |
Three API questions from Dev Lee — about rate limits, authentication, and webhook retries. |
en_subject_only_flight |
Flight cancellation alert in the subject line — no further details in the body. |
en_angry_order |
An angry customer is complaining that order nine nine two eight one still hasn't shipped and is demanding a callback today. |
nl_interview_invite |
Job interview invitation — Wednesday May fifteenth at two p.m. at the company's Amsterdam office. |
en_school_snow_day |
Lincoln Elementary will be closed Thursday due to snow — classes move to remote on Teams at nine a.m. |
en_dentist_reminder |
Dental appointment reminder with Doctor Nguyen on Tuesday at three p.m. |
ja_en_mixed_vendor |
Vendor shipment delay from Tanaka-san at Sakura Trading — your order now ships next Monday instead of this Friday. |
How to validate¶
- Oracle pass first — confirm the contract hasn't regressed on the 235B:
Expect ~22–24/24 manual quality. If lower, the prompt has lost expressiveness; widen the bands.
- Small-model spam regression — the real test:
Manually review the 9 previously-mislabeled cases (fr_forward_phone, en_security_alert, en_school_snow_day, en_angry_order, en_forward_partner, de_tax_notice, nl_interview_invite, en_ooo_autoreply, en_rent_increase). None should produce a "this looks like spam" line. If any do, add a one-line example for that always-IMPORTANT category to _SPEAKABLE_CONTRACT_CORE.
- Local production path:
Watch for thinking leaks; if any survive, confirm enable_thinking=False is being applied and that stop strings include the leaked token.
Why this should work where the previous prompt didn't¶
- Spam was a vibe; it is now a conjunction. Small models can evaluate three concrete clauses and AND them. They cannot evaluate "is it obvious spam".
- Always-IMPORTANT list is enumerated. Forward, security, school, complaint, OOO, PR, deploy, interview — all named. The 2B model needs the list because it has no world model strong enough to derive it.
- Few-shot in-system. Five examples covering the exact failure modes from the 235B-vs-2B table. Small models copy shapes; we give them the right shapes.
- Length is folded into the core contract. Small models drop separate blocks; folding raises adherence.
- TTS reinforcement is the last thing in the system message. Recency wins on small models — digits and
@symbols stop leaking when the rule is right before the user turn. - Thinking is killed at three layers, not one — chat template flag, explicit ban, stop strings, plus a regex post-process safety net.
Change log¶
| Version | Date | Change |
|---|---|---|
| v1 | 2026-05-23 | Initial redesign for Qwen3.5-0.8B/2B. Three-clause spam rubric, enumerated always-IMPORTANT list, five in-system few-shot examples, three-layer thinking suppression, full 24-fixture regression table. |