A Dialogue Pipeline
When I start at a studio, I almost always find myself building or rebuilding a pipeline for game dialogue.
I wanted to write down here some things I’ve learned about that, my reasoning behind it, and where it bridges the gap between other pieces of gamedev. I hope there’s something here you’ll find useful.
This is a bit of a brain dump and there’s a lot here. Forgive me if I’ve made too many assumptions!
So what do I mean by dialogue here?
Characters talking. To each other, or to the player.
In (many) games, this dialogue is presented to the user in two ways at the same time:
- As text, on your screen. For example, as subtitles, each line of text normally attributed to a particular character.
- As audio, played through your speakers. These lines are normally linked to a “source” object in your gameplay space, such as a 3D character.
Some games only have text. And that’s fine. There are probably some things that I’m going to say which are still useful to you.
Some games only have audio, and frankly if you can’t provide subtitles you should know better. :-)
So what I’m going to discuss here is this combination — text and audio, both representing the same spoken line of dialogue.
Some Common Production Elements
Before laying out my general approach to this, here are some facts which are often true about a production pipeline for dialogue:
- Dialogue starts as text. Whether written in Word, in a Google Sheet, as a script in Final Draft, in Ink files, or in Articy Draft, dialogue starts as text. And probably very rough text, which then needs to be rewritten and refined until it’s acceptable.
- Text lines are used to prototype. Dialogue lines are often tested in-game simply as subtitles as the game develops, because nothing has been recorded yet.
- Text lines evolve into subtitles. Eventually our text lines will be polished to become proper subtitles, and shipped with the game.
- Text lines are localised. At some point during production those text lines will be sent for localisation, and the localised version of each line for each supported language will be added to the game. These are rarely done all at the same time — often the bulk of the localisation might be done in one delivery, but others might come in later for last-minute lines.
- Text lines evolve into rough audio. Often during testing, particularly for timings, dialogue will be scratch recorded by developers or generated using text-to-speech tools and turned into audio files. These audio files will often be terrible. But they are very useful in testing, so that we can hear that audio events are firing at the right time, and so that we can rough out timings.
- Rough audio is replaced with studio recordings. Exactly when this happens depends on the scene and timing. Particularly for animation, some scene dialogue may be recorded with the proper actors very early on. But usually there is a late-phase recording of final dialogue — it’s rare that everything is done at the same time, which means pipelines need to cope with irregular deliveries of studio audio.
- Text lines are re-edited based on studio audio. Often the actual words in the lines change in the recording studio, as the actors and directors work together to polish the delivery and give a better performance.
- Studio audio is processed to add effects. For example, adding spatial effects such as echo; or radio effects, or monster effects, all sorts of things!
- Multiple language audio is added. Sometimes the whole recording process happens again for multiple languages.
A Dialogue Pipeline?
There’s a lot to keep track of here. In particular some things I care about are: What state is a particular line in? Has it been edited? Is it ready to record? Has it already been recorded? Which actor needs to record it, for which character? What audio processing needs to be done on it? Fundamentally, how much more work do we still need to do to finish this game?
This is what I mean when I talk about a Dialogue Pipeline: a system which helps us manage all this and makes it easy to keep track of what goes where.
AAA studios often have tools in place for all this, developed over years of getting this wrong and then refining their process until eventually they are getting it right, for their specific cases.
Many smaller studios don’t have this, and it’s not something that engines provide comprehensive answers for yet. There are quite a few solutions for audio, and quite a few solutions for localisation, but very few of them fuse the two into a full dialogue pipeline.
A Useful Pipeline
So a useful pipeline needs to have a way to attach metadata about the state of any given dialogue line asset, and to fuse together the relevant text assets with the relevant audio assets.
Start with Localisation
I find a useful place to start is the localisation system, as that will incorporate a lot of functionality we’ll need. That should already have the concept of a unique Line ID for each line, and some way of fetching and updating the text for the main game language at edit or runtime.
What do I mean by a line ID? An identifier which is used to refer to one and only one specific line in the game. Almost all localisation systems will have one. For example: pilot_warn_xg63.
Don’t have a localisation system? Sort that out, you will need it whether or not you have dialogue!
Where does your data live?
I strongly recommend that your localisation system stores its data along with the game code in the version control system you’re using for your game assets. That way it stays in absolute sync with the rest of the game — I’ve seen things get out of step way too often when using an external database.
And then treat external editing systems — such as, for example a Google Sheet or some other sort of online database — as a temporary export target. Export to a Sheet, edit, reimport. The truth stays with your game. Same for localisation: export, localise, reimport.
Don’t try to make these incremental numbers like argument_cave_2 etc. During development the dialogue lines will be reordered, shuffled, cut, moved to different dialogues, and all sorts of things, so sequential IDs are rarely helpful. It’s simpler to have some sort of hash system produce an ID, perhaps combined with a game area or broad scene name.
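As a sketch of what I mean, a hash-based ID generator can be as small as this (Python; the scene and line names here are illustrative, and the hash length and naming scheme are entirely up to you):

```python
import hashlib

def make_line_id(scene: str, text: str) -> str:
    # Hash the line's initial text so the ID is stable but non-sequential,
    # then prefix the scene name for human readability.
    digest = hashlib.sha1(text.encode("utf-8")).hexdigest()[:4]
    return f"{scene}_{digest}"

# Produces a stable ID like pilot_warn_<4 hex chars> for this text.
print(make_line_id("pilot_warn", "They're coming in on my six!"))
```

Note that the ID is derived from the line's *initial* text only; as per the rule above, once assigned it never changes, even if the text later does.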
Edited to Add:
IMPORTANT: Once you have assigned a Line ID to a string and put it into the repo, don’t change it!
This should stay constant across production or you’ll confuse the audio team and localisation departments and everything will become a mess. Yes, don’t change it even if you spelled it wrong.
(Thanks to Sini Downing from Side for pointing that out — it’s something that I assumed people would realise but had forgotten to write down!)
Make sure there’s some way to search for the line ID, and also search for text within a given line! That process needs to be easy, as you’ll use both of those a lot!
Text Line Metadata
On top of that localisation system, it’s then good to add some useful basic metadata. These apply to the text in the main language — this is what everything else is derived from, it is your master content.
A value that tells us the status of the text line. I use values such as these:
- Placeholder: a temp line that needs to be rewritten. If a programmer adds a dialogue line in game, I’ll ask them to use this, because it tells the writers they need to look at it and rewrite it.
- Draft: a first draft of a line, needs another pass.
- Polished: a second pass, checked.
- Edited: after an edit pass by someone who speaks the target language. Ready for recording.
- Rework: mark the line as needing a rewrite.
- Final: checked, in sync with audio, and ready for ship.
Note that this list is non-exhaustive and will almost certainly be different for you, to fit your own workflow. But you get the idea!
I often try to use colour in the tools for different statuses to make them super-easy to pick out.
But what are these for? Primarily, they are visible and searchable. You can see how many lines in your dialogue system are placeholders and need to be rewritten. You can see which ones are ready to be recorded. So your dialogue pipeline should include the ability to list these out.
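As an illustration, here's a minimal sketch of statuses as an enum plus a "list everything in a given status" query (the line table and IDs are made up for the example):

```python
from enum import Enum

class TextStatus(Enum):
    PLACEHOLDER = "placeholder"
    DRAFT = "draft"
    POLISHED = "polished"
    EDITED = "edited"
    REWORK = "rework"
    FINAL = "final"

# Hypothetical line table: Line ID -> (main-language text, status).
lines = {
    "pilot_warn_xg63": ("They're coming in on my six!", TextStatus.EDITED),
    "intro_greet_a1b2": ("Hello?", TextStatus.PLACEHOLDER),
}

def lines_with_status(status):
    # The searchable view: every Line ID currently in the given state.
    return [lid for lid, (_, s) in lines.items() if s is status]

print(lines_with_status(TextStatus.PLACEHOLDER))  # → ['intro_greet_a1b2']
```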
I also add several comment fields:
- Context Comment: A comment that will give context about how the line is going to be used. Useful for developers, writers, localisers and actors!
- Developer Comment: An internal comment for developers only, which is likely to talk about implementation detail.
- Localisation Comment: A comment that will be seen by the localisation team.
- Voice Comment: A comment that will be seen in the voice script. Often performance notes go here.
I find letting lines be tagged with arbitrary tags can be really useful. So you could add #bark or whatever else is relevant to easily navigate your way through the mass of game lines. Again, add search facilities for that.
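A tag index can be as simple as this sketch (the IDs and tags are invented for the example):

```python
from collections import defaultdict

# Hypothetical tag index: tag -> set of Line IDs carrying that tag.
tag_index = defaultdict(set)

def tag_line(line_id, *tags):
    for tag in tags:
        tag_index[tag].add(line_id)

tag_line("guard_alert_9f2c", "bark", "combat")
tag_line("guard_idle_77ab", "bark")

# The search facility: all lines tagged #bark.
print(sorted(tag_index["bark"]))  # → ['guard_alert_9f2c', 'guard_idle_77ab']
```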
Writing the Text
The next part is figuring out how the lines actually get into your localisation tool. If a writer — or anyone else involved in the process — wants to add a line, how do they do that?
That depends very much on the project, your writers, and the engine! In almost all cases it’s a two-step process:
- Writing: Write the lines in some tool.
- Importing: Have a script which imports those lines from the tool into the localisation system, updating any new or altered lines with an appropriate Voice Status, and generating Line IDs as needed.
As examples, I’ve created systems that do this in several ways:
- Write the text in a script format (e.g. Final Draft) stored along with the game. Manually run an import process after you’ve altered them.
- Write it in Excel files, stored along with the game. Import that data as part of the build step.
- Write the text in an external online tool. Add a build-time feature which draws in that version of the text. (This is my least favourite, as things so easily get out of step.)
- Write the text in Ink. Add an autocompile feature which notices the Ink files have been updated and imports all the data into the localisation system. Convert any Ink comments and tags into appropriate values for the localisation system. (This is probably my favourite right now.)
One thing to think about here is how granular your source data is. Splitting it into lots of files might seem like more effort, but it means you can easily have writers working in parallel.
But what if my writers don’t have a copy of the game?
Hopefully you can set things up in a way where writers can deliver into your version control system, even if they don’t have the full game code. If not, either you can provide a pipeline where you export a file for them to work on (such as a Google or Excel Sheet) and reimport, as discussed above, or give them access to a web-based tool.
Wait, let’s back up a moment. What were your writers writing in that tool? Just lines of text? Usually — no.
It’s important to note that there are often at least two things happening during writing:
- Defining, for each line ID, what text to show.
- Defining, for any given scene or conversation, which line IDs to use and in which order — giving us a flow for the conversation. Sure, Bob says Hello, but does someone then respond? Who, and with which line?
That second part, the dialogue flow, can be simple (a single bark doesn’t have much of a flow!) or much more complex (branching text!). Whatever tool you’ve used will define that, and you need that information to be imported into your game alongside your localised text.
So at runtime the flow data doesn’t tell you about each line’s content; the localisation system does that. What the flow data does is tell you which Line ID to use next, and when.
This is the sort of thing that Ink or Articy Draft are good at, if you set them up properly, but this is also one of those “How long is a piece of string?” questions. Your dialogue flow toolset will depend on your game and its needs — it could be anything from a graphical flow system to a simple text file. Maybe you don’t have choices or branching conversation.
So, in some way, you’ll need to be able to specify your flow at edit time, and to query it at runtime.
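As a toy sketch of that separation, with all IDs and text invented for the example: the flow knows only which line comes next, and the localisation table supplies the words.

```python
# Toy flow: each Line ID points at the next Line ID (None ends the conversation).
# The flow never stores text; that stays in the localisation table.
flow = {
    "bob_hello_3fa1": "alice_reply_8c2d",
    "alice_reply_8c2d": None,
}

# Stand-in for the localisation lookup (main-language text only).
localised = {
    "bob_hello_3fa1": "Hello!",
    "alice_reply_8c2d": "Oh. It's you.",
}

def play_conversation(start_id):
    line_id = start_id
    while line_id is not None:
        print(localised[line_id])  # in-game: show subtitle and trigger audio
        line_id = flow[line_id]    # the flow only says what comes next

play_conversation("bob_hello_3fa1")
```

A real flow would of course support branching (a list of possible next IDs plus conditions) rather than a single pointer, but the division of labour stays the same.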
You might find additional metadata that ties into whatever the flow is for your game. Things like “pause after this line” or “only ever show this line once” or “repeat this twice and then never say it again”. This all forms part of your runtime processing of the text flow and is something that you’ll have to establish for your particular game.
Look Who’s Texting?
That’s the base setup for text, apart from one thing — who’s speaking, exactly?
Another part of the metadata for each line is the character. Who says that line of dialogue?
Depending on the tools, a writer will assign a character to a line in different ways. Maybe they’re using a script format so they’re writing the character name in capital letters just above the line. Maybe it’s an Excel sheet and the character is listed in a column. Maybe it’s a line of Ink, in which case putting a prefix in front works well:
pilot: They’re coming in on my six!
In whichever case, your import tools should be able to parse the incoming data and attach a piece of metadata to that line which stores this character ID.
In the above example, the import tool would strip the prefix, store pilot as the character ID in the line’s metadata, and keep the rest as the line text.
Remember it is a character ID, not final text that will be seen by the player. At runtime, the subtitles might look up a display name for the pilot character ID and translate this into:
“Flt. Lieutenant Cavendish: They’re coming in on my six!”
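As a sketch, an import-side parser for that prefix convention might look like this (the regex and the fallback behaviour are assumptions, not a spec):

```python
import re

# Assumed convention: lowercase character ID, a colon, then the line text.
LINE_RE = re.compile(r"^(?P<char>[a-z][a-z0-9_]*):\s*(?P<text>.+)$")

def parse_speaker(raw):
    m = LINE_RE.match(raw)
    if not m:
        return None, raw  # no prefix found; leave the line alone
    return m.group("char"), m.group("text")

print(parse_speaker("pilot: They're coming in on my six!"))
```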
Writers are fallible (I know that’s highly unlikely, but still…) and you want to avoid errors in character names where possible.
It’s good to have a cast list stored with the rest of the game data. When lines are imported, the character ID you extract should be checked against that cast list automatically, and the import should complain when a line arrives for a character that has never been heard of before. It’s probably misspelled.
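A minimal sketch of that check, assuming the cast list is just a set of character IDs stored with the game data:

```python
CAST = {"pilot", "navigator", "control"}  # assumed cast list

def check_character(line_id, char_id):
    # Imports should fail loudly on a character nobody has heard of.
    if char_id not in CAST:
        raise ValueError(
            f"{line_id}: unknown character {char_id!r} "
            "(misspelled, or missing from the cast list?)"
        )

check_character("pilot_warn_xg63", "pilot")  # fine, known character
try:
    check_character("pilot_warn_xg63", "pliot")  # deliberate typo
except ValueError as err:
    print(err)
```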
You’ll almost certainly want to attach other metadata to the character IDs, but we’ll get to that when we talk about audio…
Hopefully, given that you built all this on an existing localisation system, it’ll have all you need to export lines to the localisation team and to import them again.
But you might want to add some extra data or functionality to that — in particular things like the character, and the context and localisation comments. All those will make the localisation smoother.
Adding Audio
Your game engine will have an audio system. And probably an audio team. Trust them. They know what they’re doing.
But now we want to somehow hook this all up so the audio production pipeline becomes part of our textual dialogue pipeline…
Firstly it’s worth knowing that most audio systems have an audio ID that identifies each individual sound. Isn’t that useful? We already have our Line ID! Can we just use that?
The answer, in most cases, is yes! Or at least an adaptation of our Line ID. That’s useful!
It might be as simple as your pilot_warn_xg63 line actually being stored as audio in a pilot_warn_xg63.wav file. Or it might be that Wwise or FMOD stores an event called Play_pilot_warn_xg63. Or maybe your Unity assets have an audio clip named something derived from that. Whichever way, the Line ID is probably what you need to make this connection.
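For illustration, deriving those asset names from the Line ID is then trivial (the .wav and Play_ conventions here are just assumptions; match whatever your audio middleware actually expects):

```python
def wav_name(line_id):
    # Assumed convention: raw audio file named directly after the Line ID.
    return f"{line_id}.wav"

def wwise_event(line_id):
    # Assumed convention: Wwise-style play event derived from the Line ID.
    return f"Play_{line_id}"

print(wav_name("pilot_warn_xg63"))     # → pilot_warn_xg63.wav
print(wwise_event("pilot_warn_xg63"))  # → Play_pilot_warn_xg63
```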
Voice Line Metadata
We really want to track the status of a given audio asset. Audio teams may already have that sort of tracking in place, and if so it’d be good to see whether it can be exposed to our dialogue pipeline. But if not, it’s good to think about more metadata, and sync up with the audio team so that they’re happy to use it.
Like the line status, it’s good to know where any given voice line is in the production process. Statuses I’ve used in the past include:
- None: No voice line exists. We should record one.
- Placeholder: A temp placeholder recording, e.g. AI or scratch audio.
- Recorded: A good quality line has been recorded but is not yet processed with sound effects.
- Processed: Good-quality line with processing.
- Rework: Mark the line as needing to be rerecorded.
- Final: Checked and ready for ship.
Again these are non-exhaustive and your mileage may vary — it will depend on your own audio pipeline.
And once again using colour highlighting in your tools is good here!
We don’t just want to worry about metadata for each individual line — we also care about the character. Some useful extra bits of data to add here:
- Which actor has been cast as this character? This is useful to be able to dump out a list of how many lines each actor will be needed for.
- What display name should the subtitles show for this character?
- Do we need to apply particular voice processing to a line or set of lines? Then we need some way to specify that so that the audio team knows! E.g. a radio effect or a monster voice or something similar. Sometimes you might need this at a character level, sometimes at an individual line level.
Where Are They Now?
Talking of characters, there’s another thing which is part of our runtime data, rather than something necessarily in the edit tool. Say you want character pilot to say a line. Where are they in 3D space right now? Your runtime system for playing a line of dialogue needs some way to find the game object that represents that character. This is often something we set up at the start of a level.
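A tiny sketch of that runtime registry (all names here are hypothetical; in a real engine the "game object" would be an engine handle, not a string):

```python
# Hypothetical registry: character ID -> in-world object that audio plays from.
speakers = {}

def register_speaker(char_id, game_object):
    # Typically called as each character spawns, at the start of a level.
    speakers[char_id] = game_object

def source_for_line(char_id):
    # Fall back to a non-positional source if the character isn't in the level.
    return speakers.get(char_id, "ui_2d_source")

register_speaker("pilot", "pilot_gameobject_42")
print(source_for_line("pilot"))     # → pilot_gameobject_42
print(source_for_line("narrator"))  # → ui_2d_source
```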
With all of this information, we want to be able to export all the lines for a particular character, actor, scene, or in a particular voice status (or some combination of all that) as a recording script. Often an Excel file or Google Sheet is acceptable here. Although for some cutscenes it’s lovely to have a readable movie script!
Make sure that the export includes any context or voice comments.
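As an example, a filtered export can be a simple query over the line table (the data and casting here are invented for the sketch; a real exporter would write an Excel file or Sheet rather than print):

```python
# Hypothetical line table with per-line voice metadata.
lines = [
    {"id": "pilot_warn_xg63", "char": "pilot", "voice_status": "none",
     "text": "They're coming in on my six!", "voice_comment": "Urgent, over the radio"},
    {"id": "pilot_ack_11aa", "char": "pilot", "voice_status": "final",
     "text": "Copy that.", "voice_comment": ""},
]
cast = {"pilot": "J. Cavendish"}  # character ID -> cast actor

def recording_script(actor, status="none"):
    # Every line this actor still needs to record, with its booth notes.
    return [(l["id"], l["text"], l["voice_comment"])
            for l in lines
            if cast[l["char"]] == actor and l["voice_status"] == status]

for row in recording_script("J. Cavendish"):
    print(row)
```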
Hooking It All Up
There are some knock-on effects to think about here, and some workflow bits and pieces to add if you can.
- Text lines affect audio lines: If a line’s Text Status gets set to Rework, after it was Polished (or Final), then that means that the Voice Status should also get set to Rework, as the line is now out of date!
- Change notification: Somehow the fact that a line needs to be changed needs to be communicated to both the writing and audio teams. This could be automated in a build with warnings etc.
- Recorded audio means edited text: Almost always when lines are imported into the game from a final recording, then the lines affected will need to be edited to match what was really recorded. So maybe the text status needs to be updated in some way.
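The first rule above can be sketched in a few lines (statuses as plain strings here; exactly which text statuses count as "already recorded against" is an assumption you’d tune for your own pipeline):

```python
def set_text_status(line, new_status):
    # Statuses at or past "polished" are assumed to have audio recorded
    # (or scheduled) against them.
    was_recordable = line["text_status"] in ("polished", "edited", "final")
    line["text_status"] = new_status
    if new_status == "rework" and was_recordable:
        # The recorded audio no longer matches the text; flag it too.
        line["voice_status"] = "rework"

line = {"text_status": "final", "voice_status": "final"}
set_text_status(line, "rework")
print(line["voice_status"])  # → rework
```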
So there we are. Hopefully at runtime you should now be able to:
- Ask your dialogue system which line IDs to use — the dialogue flow.
- Fetch the textual version of those line IDs from the localisation system.
- Trigger the right audio file based on that line ID.
What About Cutscenes?
Ideally your cutscene system can be adapted to use some or part of this system. The flow and timing should come from your cutscene authoring tool — which might be some sort of timeline — but ideally you want to trigger audio using the same pipeline as you use for normal speech, by sending a line ID. That way you can use the dialogue system for subtitles and so on.
If not… welcome to the hell that is manually timing a set of subtitles. I don’t recommend it!
What About Lipsync?
Lipsync animation might also be a thing you want to worry about as part of this system! An automated pipeline to import audio files and produce a lipsync file at build time is certainly useful and a thing I’ve done in this sort of setup. Normally you want to make sure that is invalidated if the audio file is updated or replaced, so it’ll get rebuilt automatically.
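A minimal staleness check for that invalidation, using file modification times (a real build system might hash file contents instead, which is more robust across machines):

```python
import os

def lipsync_stale(audio_path, lipsync_path):
    # Rebuild if the lipsync file is missing, or older than its source audio.
    if not os.path.exists(lipsync_path):
        return True
    return os.path.getmtime(audio_path) > os.path.getmtime(lipsync_path)
```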
The setup I describe here is a common one, but does expect there to be one voice language along with multiple subtitled languages. It is easily expandable to use multiple voice languages.
It annoys me that there aren’t standards for a lot of this! But that’s partly because there are so many different variables involved. What is the style of games you’re making? Which engine are you using? What is your version control system? How technical are your writers? What is your audio team’s voice production process? What voice tools do they use? How do you work with localisers? There’s no one-size-fits-all. I think there could be improvements and commonalities there, particularly at an engine level, but for now it’s a minefield.
Once you’ve nailed a process for your studio, it’s good to move it with you from project to project; it’ll cut down your time hugely.
Automate whatever you can — manual imports and edits can become very very painful.
It’s very, very important to involve game coders, designers, writers, cinematics teams, localisers and the audio department when you’re designing this sort of system. Of course, in smaller companies that can be a very small number of people!
That’s all folks — I hope you find something that makes your lives easier in all of this!