OpenAI Realtime API
Overview
When configuring voice agents on the OpenAI Realtime API, you may need the agent to pronounce a brand name, acronym, or product name in a specific way — for example, saying "KVX" as "kivex" rather than spelling out the letters. This article covers the prompt-based techniques available and how to get the best results.
Understanding the challenge
OpenAI's Realtime model generates speech directly from the prompt and conversation. When it encounters an unfamiliar string of letters like "KVX" or "NTR," it tends to default to spelling them out ("kay-vee-ex," "en-tee-arr") because it has no prior association between that string and a spoken word.
This is different from well-known acronyms. The model already knows that "SQL" is commonly spoken as "sequel," so a single prompt instruction is usually enough. For proprietary brand names and novel terms that the model has never encountered as spoken words, prompt-based pronunciation control is less reliable and may require experimentation.
Approach 1: Reference Pronunciations section
OpenAI recommends adding a dedicated Reference Pronunciations section to your system prompt. Keep entries short and direct:
# Reference Pronunciations
- Pronounce "KVX" as "kivex"
- Pronounce "Solaire" as "sol-AIR"
Tips:
- Use one-line entries, not paragraphs of explanation
- Place this section near the top of your prompt, after Role & Objective
- Use a familiar rhyming word or simple phonetic guide (e.g., "rhymes with fizz") rather than complex notation
- Keep the list short — only include terms the model is actually mispronouncing
Reliability: good for terms the model has some familiarity with. Less reliable for completely novel brand names.
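As a minimal sketch of Approach 1, the Python below builds the Reference Pronunciations section from a small dictionary and places it directly after Role & Objective. The helper names (`build_pronunciation_section`, `build_instructions`) are hypothetical and only for illustration. The final step assumes the prompt is sent as the `instructions` field of a `session.update` event; check the exact payload shape against the current API reference before relying on it.

```python
import json


def build_pronunciation_section(entries: dict[str, str]) -> str:
    """Render one short line per term, with no extra explanation."""
    lines = ["# Reference Pronunciations"]
    for term, spoken in entries.items():
        lines.append(f'- Pronounce "{term}" as "{spoken}"')
    return "\n".join(lines)


def build_instructions(role: str, entries: dict[str, str], rest: str = "") -> str:
    """Put the pronunciation section near the top, right after the role text."""
    parts = [role.strip(), build_pronunciation_section(entries)]
    if rest.strip():
        parts.append(rest.strip())
    return "\n\n".join(parts)


instructions = build_instructions(
    role="# Role & Objective\nYou are a concierge for KVX Solutions.",
    entries={"KVX": "kivex", "Solaire": "sol-AIR"},
)

# Assumption: the prompt is delivered as the "instructions" field of a
# session.update event. Other session fields are omitted here.
session_update = json.dumps(
    {"type": "session.update", "session": {"instructions": instructions}}
)
print(instructions)
```

Keeping the entries in a dictionary makes it easy to remove a term once the model pronounces it correctly, which keeps the list short.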
Approach 2: Reinforcing pronunciation across multiple prompt sections
If a single Reference Pronunciations entry isn't enough, try placing the instruction in several natural locations throughout the prompt. When the model encounters the correct pronunciation at multiple points, it tends to follow it more consistently.
Example:
# Role & Objective
You are a concierge for KVX Solutions. Remember: "KVX" is always
pronounced as "kivex" — two syllables, one word.
# Reference Pronunciations
- Pronounce "KVX" as "kivex" (two syllables, rhymes with "give-ex")
# Conversation Flow
## Opening
- Greet the customer. Say "kivex Solutions" — do not spell out the letters.
# Guardrails
- When speaking aloud, always say "kivex" — never spell out K-V-X.
Tips:
- Vary the wording slightly at each location so the model doesn't skip over repeated identical text
- Keep each instance brief — a short reminder, not a full explanation
- Good locations to reinforce: Role & Objective, Reference Pronunciations, Conversation Flow (especially the Opening), and Guardrails
Reliability: better than a single instruction but still inconsistent for novel brand names.
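One way to keep the reinforcement from drifting as the prompt is edited is a small check that reports which sections no longer mention the spoken form. The sketch below is an illustration only: it assumes the prompt uses top-level `# Heading` lines like the example above, and the function names and the `DRAFT` prompt are hypothetical.

```python
import re


def split_sections(prompt: str) -> dict[str, str]:
    """Map each top-level '# Heading' to the text beneath it.

    Sub-headings such as '## Opening' stay inside their parent section.
    """
    sections: dict[str, str] = {}
    current = None
    for line in prompt.splitlines():
        match = re.match(r"#\s+(.*)", line)
        if match:
            current = match.group(1).strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return sections


def sections_missing(prompt: str, spoken: str, required: list[str]) -> list[str]:
    """Return the required sections that never mention the spoken form."""
    sections = split_sections(prompt)
    return [
        name
        for name in required
        if spoken.lower() not in sections.get(name, "").lower()
    ]


DRAFT = """\
# Role & Objective
You are a concierge for KVX Solutions.

# Reference Pronunciations
- Pronounce "KVX" as "kivex"

# Conversation Flow
## Opening
- Greet the customer. Say "kivex Solutions".

# Guardrails
- Stay on topic.
"""

REQUIRED = [
    "Role & Objective",
    "Reference Pronunciations",
    "Conversation Flow",
    "Guardrails",
]

missing = sections_missing(DRAFT, "kivex", REQUIRED)
print(missing)  # ['Role & Objective', 'Guardrails']
```

The check only confirms that a reminder is present in each section. It says nothing about whether the model will follow it, so listening tests are still needed after each change.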
Approach 3: Phonetic spelling in the prompt
If the model persistently spells out the letters despite repeated instructions, you can try writing the brand name phonetically throughout the entire prompt — for example, writing "Kivex" everywhere instead of "KVX."
The model reads "Kivex" as an ordinary word and will generally say it as written. The trade-off is that the model will also write "Kivex" in chat messages rather than the official brand spelling "KVX."
Whether this trade-off is acceptable depends on your use case. If the voice experience is the priority and chat text is secondary, this may be the right approach. If correct written spelling matters equally, this approach has limitations.
Reliability: high for pronunciation. The written brand spelling will not be correct in chat output.
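If the prompt is maintained with the official spelling, the phonetic version can be generated at build time rather than edited by hand. The sketch below is one possible way to do that, with a hypothetical helper name; it replaces whole-word matches only, so a longer token that merely starts with the brand letters is left alone.

```python
import re


def use_phonetic_spelling(prompt: str, replacements: dict[str, str]) -> str:
    """Replace each official spelling with its phonetic spelling.

    Matching is case-sensitive and limited to whole words.
    """
    for official, phonetic in replacements.items():
        pattern = re.compile(rf"\b{re.escape(official)}\b")
        # A function replacement avoids treating backslashes in the
        # phonetic spelling as regex escapes.
        prompt = pattern.sub(lambda _match, text=phonetic: text, prompt)
    return prompt


source_prompt = "You are a concierge for KVX Solutions. Welcome callers to KVX."
spoken_prompt = use_phonetic_spelling(source_prompt, {"KVX": "Kivex"})
print(spoken_prompt)
# You are a concierge for Kivex Solutions. Welcome callers to Kivex.
```

Generating the phonetic prompt from a single source keeps the official spelling on record and makes the substitution easy to undo if the written output turns out to matter more than expected.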
Best practices summary
- Start with a clean Reference Pronunciations section using simple one-line entries
- If that's not enough, reinforce the pronunciation in a few other natural locations in the prompt, such as Role & Objective, the Conversation Flow opening, and Guardrails
- If the model still spells out the letters, consider using the phonetic spelling throughout the prompt (accepting the trade-off in written output)
- Avoid writing out wrong pronunciations in the prompt as negative examples (for instance, a phonetic rendering of the spelled-out letters), because this can prime the model toward the very errors you're trying to avoid
- Test after each change — small prompt adjustments can shift behavior significantly with the Realtime model
Pronunciation control for novel brand names is an area where the Realtime API has known limitations. If you're experiencing persistent issues after trying these approaches, contact our team so we can work together on a solution for your specific case.