Voxpost: TODO (future work)

Backlog for features after Block 1 (Gmail events). Not implemented yet.

Desktop UI (future)

A local desktop app (tray/settings window) for configuration and opt-in behavior. The daemon stays headless; UI talks to it via config files or a local API (TBD).

Must have (v1 UI)

Later (UI + opt-in)

Checkbox: “Include attachment in summary” (per-message or global default)
When checked: fetch attachment content in memory → feed into summarizer → TTS → discard
When unchecked: awareness only (filename / count), as today
Supported attachment types for opt-in summarization (TBD: .txt first, then PDF/DOCX with local parsers)
Preview last cue / last event (debug, no mail archive)
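
The opt-in flow above could be sketched roughly as follows. This is a minimal illustration, not the project's code: `AttachmentRef`, `attachment_cue`, `SUMMARIZABLE_SUFFIXES`, and the injected `fetch_text` / `summarize` callables are hypothetical names, and the `.txt`-only filter simply mirrors the "TBD: .txt first" note.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: only .txt is summarized at first; other types stay awareness-only.
SUMMARIZABLE_SUFFIXES = (".txt",)


@dataclass(frozen=True)
class AttachmentRef:
    """Metadata the pipeline already carries (awareness only)."""
    filename: str
    attachment_id: str  # hypothetical handle used to fetch content on demand


def attachment_cue(
    attachments: list[AttachmentRef],
    include_content: bool,
    fetch_text: Callable[[AttachmentRef], str],
    summarize: Callable[[str], str],
) -> str:
    """Return the spoken fragment describing a message's attachments."""
    if not attachments:
        return ""
    if not include_content:
        # Unchecked: filename / count only; nothing is fetched.
        names = ", ".join(a.filename for a in attachments)
        return f"{len(attachments)} attachment(s): {names}"

    parts = []
    for ref in attachments:
        if not ref.filename.lower().endswith(SUMMARIZABLE_SUFFIXES):
            parts.append(ref.filename)  # unsupported type: name only
            continue
        text = fetch_text(ref)  # held in memory only, never written to disk
        try:
            parts.append(f"{ref.filename}: {summarize(text)}")
        finally:
            del text  # drop our reference once the summary exists
    return " ".join(parts)
```

Passing the fetcher and summarizer in as callables keeps the trust boundary visible in one place: when the checkbox is off, the fetcher is never called.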

Explicitly out of scope for first UI

Full mail client / inbox browser
Storing email bodies or attachments on disk
Cloud TTS or cloud summarization as default
Per-user GCP / Pub/Sub setup in the UI (operator bundles this in the shipped app)

Pipeline blocks (reference)

| Block | Scope | Status |
|-------|-------|--------|
| 1 | Gmail events (`listen`) | Done |
| 1b | Attachment metadata in events (awareness only) | Done |
| 3 | One speakable line (summarize in memory) | Done (CLI + mT5 XLSum default) |
| 4 | Local TTS playback (Supertonic 3 ONNX) | Done (CLI) |
| 5 | Desktop UI + onboarding | TODO |
| 2 | Rules (VIP, keywords, quiet hours) | Deferred; needs UI (Block 5) |

Block 2 will not be built headless: VIP senders, keywords, quiet hours, and “ignore newsletters” need a settings UI that non-technical users can operate. Until Block 5 ships, the pipeline treats every inbox event the same way: summarize → speak (Blocks 3–4). There is no interactive rules editor in the CLI for now, but a voxpost.toml for Supertonic/TTS knobs is planned (see Desktop UI → Shared config file).

Current order: 1 → 3 → 4 → 5 + 2 together (UI exposes rule config; daemon reads shared config when present).
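
One way the daemon could honor "reads shared config when present" is a gate that speaks everything until a rules table exists. The sketch below is only an illustration: the `[rules]` schema is not defined yet, so the `quiet_hours` key and its `["HH:MM", "HH:MM"]` shape are assumptions, as are the function names.

```python
from datetime import time
from typing import Optional


def in_quiet_hours(now: time, start: time, end: time) -> bool:
    """True if `now` is in [start, end), including windows that cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def should_speak(now: time, rules: Optional[dict]) -> bool:
    """Decide whether an inbox event is spoken.

    With no rules table (the situation until Block 5 ships) every event
    is spoken. The `quiet_hours` key is a hypothetical placeholder.
    """
    if not rules or "quiet_hours" not in rules:
        return True
    start, end = (time.fromisoformat(s) for s in rules["quiet_hours"])
    return not in_quiet_hours(now, start, end)
```

With `{"quiet_hours": ["22:00", "07:00"]}`, an event at 23:30 is suppressed and one at 07:00 is spoken.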

Design notes for UI implementer

Trust boundary: Attachment content only after explicit user opt-in (checkbox).
Ephemeral: Summaries and fetched attachment text never persisted; same as body today.
Normal users: Never see gcloud, Pub/Sub, or OAuth client JSON — only “Sign in with Google”.
Operator: One GCP project + OAuth app for the product; see SETUP.md.
User settings TOML: ~/.config/voxpost/voxpost.toml — [tts] for Supertonic (voice, lang, speed, steps, playback), [speech] for translate vs speak-as-received, [rules] for Block 2. UI is the editor; daemon reloads on change (mechanism TBD: SIGHUP, file watch, or restart).

Example `voxpost.toml` (planned schema)

[tts]
voice = "M1"
lang = "en"          # or "na"; per-message override when speech.mode = "auto"
total_steps = 8
speed = 1.05
playback = "auto"    # sounddevice | aplay | auto
auto_download = true

[speech]
mode = "auto"        # auto = speak in email language | fixed = always target_lang
target_lang = "fr"   # used when mode = "fixed"

Open questions (decide before UI)

Language pipeline: For “always translate to X”, which local model(s)? (multilingual summarizer vs separate translate step; keep on-device only)
Summarizer model (worldwide): csebuetnlp/mT5_multilingual_XLSum (44 langs) — BLOCK_3_MODELS.md
Language detection: Library choice for “speak as received” (langdetect, lingua, etc.) and confidence threshold before falling back to na / user default
UI stack: Electron, Tauri, or native (GTK/Qt)?
Daemon ↔ UI: config file only vs local socket/HTTP?
Single-account vs multi-account in v1 UI?
Which local TTS engine first → Supertonic 3 (supertone-inc/supertonic, pip install supertonic, ONNX on-device)
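
Whichever detection library is chosen, the confidence-threshold fallback from the language detection question can be kept independent of it. The sketch below takes `(language, confidence)` pairs from any detector; the function name and the 0.80 threshold are assumptions for illustration. With `langdetect`, for example, the pairs could be built from `detect_langs(text)` as `[(l.lang, l.prob) for l in ...]`.

```python
from typing import Iterable


def pick_speech_lang(
    candidates: Iterable[tuple[str, float]],
    mode: str,
    target_lang: str,
    fallback: str = "na",
    min_confidence: float = 0.80,  # assumed threshold; needs tuning on real mail
) -> str:
    """Choose the language to speak in.

    mode "fixed": always target_lang.
    mode "auto":  the most confident detected language, or `fallback`
                  when detection is empty or below the threshold.
    """
    if mode == "fixed":
        return target_lang
    best = max(candidates, key=lambda c: c[1], default=None)
    if best is None or best[1] < min_confidence:
        return fallback
    return best[0]
```

Keeping the detector outside this function means the library decision (langdetect, lingua, or something else) does not change the fallback logic or its tests.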