Block 3 — Summarizer model choice¶

Voxpost needs a local, multilingual summarizer for a worldwide product. The shipped default is csebuetnlp/mT5_multilingual_XLSum (44 languages).

Research method: web / papers first → shortlist → verify each on Hugging Face against Voxpost constraints below.

Voxpost constraints (scorecard)¶

#	Requirement	Must have
1	Multilingual — summarize in the same language as the email	Yes
2	Local download — no cloud inference API	Yes
3	Short output — one speakable line (~15–30 words)	Yes
4	Transformers — fits existing Block 3 stack	Strong preference
5	Desktop CPU — shipped desktop app, no GPU assumed	Yes
6	Shippable license — Apache/MIT-style	Yes
7	Maintained / proven — real downloads, model card, benchmarks	Strong preference

Reality check: There is no widely adopted, multilingual, email-specific, small, local summarizer on Hugging Face. Every option is a tradeoff between language coverage, size/speed, and email domain.

What the web recommends (2024–2025)¶

Source / pattern	Recommendation	Fits Voxpost?
Multilingual summarization papers (XL-Sum, MLSUM, “Towards Unifying Multi-Lingual and Cross-Lingual Summarization”)	mT5 family fine-tuned on XL-Sum or MLSUM	Yes — seq2seq, many languages
Edge / low-resource NLP surveys	mT5-small (~300M) or quantized mBART for CPU	Partial — small mT5-XLSum forks exist but quality is poor (see below)
Email assistant projects (Vera, E.M.Pilot, local inbox tools)	Small LLMs (Qwen2, Gemma, Llama) with prompts	Partial — multilingual + flexible, but causal LM, slower, harder to constrain length
Email-specific HF models (wordcab, IrisWiris, 7B email LLMs)	T5 on email or 7B instruct	English-only or too heavy for CPU v1
Microsoft UniSumm (ACL 2023)	Unified few-shot summarizer	English-only on HF; weights not first-party maintained

Web consensus for “many languages + local + summarization”: mT5-on-XL-Sum, not email-tuned T5.

Hugging Face verification (candidates)¶

Tier A — Best fit (recommended)¶

csebuetnlp/mT5_multilingual_XLSum ¶

Field	Verified on HF
Task	`summarization` / `text2text-generation`
Base	`google/mt5-base` (~580M params)
Languages	44 tagged (fr, en, ar, es, de, pt, hi, ja, zh, tr, ru, vi, …)
Downloads	~2.4M all-time — dominant multilingual summarizer on HF
Training	XL-Sum (BBC news, 1.35M pairs) — ACL 2021 paper
French XL-Sum ROUGE-1	~35.3 (published per-language table)
Output length	Model config `max_length=84` — good for short cues
API	`AutoModelForSeq2SeqLM` — drop-in for Voxpost
Input	Plain article text (no `summarize_brief:` prefix)
License	Research model; check repo before commercial ship

Voxpost fit: Best match for worldwide + same-language output + Transformers. Weak on email domain (news-trained) and CPU latency (~2–5s).

Suggested Voxpost prompt shape:

From: {sender}. Subject: {subject}. {cleaned_body}

Generation: max_length=84, num_beams=4, no_repeat_ngram_size=2 (from model card).

Tier B — Viable alternatives (not default)¶

google/flan-t5-base (~250M)¶

Pros	Cons
58M+ downloads; Apache 2.0; instruction-tuned	Not a dedicated summarizer — needs prompt engineering
Tagged en, fr, de, ro + “multilingual”	Same-language output not guaranteed
Smaller / faster than mT5-base XLSum	Weaker than XLSum-finetuned models for summarization

Example prompt to test: Summarize this email in the same language as the email:\n{body}

Verdict: Reasonable fallback if XLSum is too slow, or for early prototyping — not as reliable for “always French in → French out.”

google/flan-t5-small (~80M)¶

Same tradeoffs as flan-t5-base, faster, lower quality.

Qwen/Qwen2.5-0.5B-Instruct (~500M)¶

Pros	Cons
29+ languages (per Qwen2.5 paper/blog)	Causal LM — new pipeline (chat template, not seq2seq)
43M+ downloads; Apache 2.0	Easy to ramble; needs strict system prompt + `max_new_tokens` cap
Strong for future “translate to language X” mode	Slower/heavier than T5-small; CPU marginal

Verdict: Best Phase 2 candidate for translate-mode + complex instructions — not the simplest Block 3 swap.

Tier C — Keep as optional profile¶

wordcab/t5-small-email-summarizer (~60M)¶

Pros	Cons
Email-trained; `summarize_brief:` prefix; very fast on CPU	English only — confirmed failure on French mail
Apache 2.0; already integrated	Wrong default for worldwide product

Verdict: Rejected — English-only; failed on French inbox mail in live tests.

Tier D — Rejected after HF check¶

Model	HF finding	Why rejected
ankitkupadhyay/mt5-small-finetuned-multilingual-xlsum	ROUGE-1 ~9; 17 downloads/mo; empty model card	Quality too low
T-Systems-onsite/mt5-small-sum-de-en-v2	DE + EN only	Not worldwide
maan909/unisumm	English only; 7 downloads/mo	Not multilingual on HF
Radiantloom/radiantloom-email-assist-7b	7B LLM	Too heavy for CPU v1
Walid777/llama3-8b-emails-summarization	8B	Too heavy
vapit/bart-large-cnn-finetuned-for-email-and-text	English BART	Not multilingual
Per-language `mt5-small-finetuned-xlsum-{lang}` forks	Single language each	Doesn’t scale for “worldwide” one binary
WiseIntelligence/mT5_multilingual_XLSum-Optimum-ONNX-Quantized-AVX2	No model card; 6 downloads/mo	Unmaintained; risky
GGUF XLSum builds (llama.cpp)	Different runtime	Out of scope until Voxpost adds GGUF path

Scoring matrix (Voxpost worldwide)¶

Model	Multilingual	Email domain	CPU size	Same-lang out	Transformers	HF trust	Total
mT5_multilingual_XLSum	★★★★★	★★☆☆☆	★★☆☆☆	★★★★★	★★★★★	★★★★★	Best default
flan-t5-base (prompted)	★★★☆☆	★★☆☆☆	★★★☆☆	★★★☆☆	★★★★★	★★★★★	Fallback
Qwen2.5-0.5B-Instruct	★★★★☆	★★★☆☆	★★☆☆☆	★★★★☆	★★☆☆☆	★★★★★	Phase 2 / translate
wordcab t5-small	★☆☆☆☆	★★★★☆	★★★★★	★☆☆☆☆	★★★★★	★★★★☆	Rejected (EN-only)

Decision (updated after deep dive)¶

Default for worldwide product¶

csebuetnlp/mT5_multilingual_XLSum

Only HF model that clearly checks: 44 languages, summarization task, millions of downloads, published per-language benchmarks, short outputs, Transformers seq2seq.

Config¶

[summarize]
model = "csebuetnlp/mT5_multilingual_XLSum"

Optional override via VOXPOST_SUMMARIZER_MODEL for another compatible seq2seq hub id.

Pipeline mitigations (still required)¶

Even with mT5 XLSum:

Keep email_clean.py, polish_for_tts(), is_usable_summary(), template fallback — news model on forwards/noise will still fail sometimes.
Language detect → set Supertonic [tts].lang (and future [speech] mode).
Optional ONNX export of XLSum later (like Block 4 Supertonic) if CPU latency blocks UX.

Implementation checklist¶

Add [summarize] to voxpost.toml (model)
mT5 XLSum plain-text email input adapter
Benchmark French forward mail + English sample on real hardware
Document download size ~2.3GB and RAM ~2GB+ in BLOCK_3_SUMMARIZE.md

References¶

XL-Sum paper (ACL 2021)
mT5_multilingual_XLSum model card
FLAN-T5 paper
Qwen2.5 technical report
Supertonic TTS: 31 langs — align [tts].lang with summarizer output