Back in 2021 I made a mod for Deus Ex 1 that fixes the lipsyncing and blinking, which, I betcha didn’t know, was broken since ship. Everything I wrote about it is on Twitter, and it oughta be somewhere else, so here’s a post about it. The mod itself can be downloaded here.
I guess I was playing DX1 and thinking, geez, was this lipsync always this bad? In a weird way? It’s insta-snapping mouth shapes, but they’re not always the same mouth shapes. Is this broken? I couldn’t find anything online about it, but I did find this article: an interview with Chris Norden, a coder on DX, where he goes into the lipsyncing and how it was, at one point, super elaborate and amazing, and they had to pare it back for performance reasons. I thought I’d check how much of this was done in UnrealScript (since the C++ source for DX is nowhere to be found) and whether I could un-pare it. It turns out it was an extremely simple fix to get it as good as I got it, and I think that’s as good as you can get it until someone leaks the source code.
I’d messed around with lipsyncing stuff before and knew the broad strokes of how it tends to work via my intense familiarity with Half-Life 2: you figure out, hopefully automatically, the sounds (phonemes) present in a sound file (“oo”, “ah”, whatever) and map those to mouth shapes (visemes), then when the audio plays, move the mouth into the right shape for the phoneme we’re in at this moment. The figuring-out process is called “phoneme extraction”, at least by Valve, and Valve do this offline, because it takes a sec. In Valve’s case they append this phoneme information to the end of the .wav file, and it looks like this:
PLAINTEXT
{
Okay, I don't blame you for hesitating, but if we're gonna do this thing, then let's just get through it.
}
WORDS
{
WORD Okay 0.064 0.224
{
111 ow 0.014 0.096 1.000
107 k 0.096 0.142 1.000
101 ey 0.142 0.220 1.000
}
WORD I 0.224 0.352
{
593 ay 0.220 0.310 1.000
105 iy 0.310 0.364 1.000
}
WORD don't 0.352 0.496
{
100 d 0.364 0.396 1.000
111 ow 0.396 0.456 1.000
110 n 0.456 0.496 1.000
}
And so on. Phonemes, start times, end times. Easy!
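To make that concrete, here’s a minimal sketch of the playback half of the deal – in UnrealScript, since that’s where we’re headed, and with every name made up by me (this is not Valve’s code, and not Deus Ex’s either): given a timeline of pre-extracted phonemes, find the one under the playhead.

// Hypothetical sketch of a pre-extracted phoneme timeline; none of
// these names come from Valve's or Ion Storm's actual code.
class PhonemeTimeline extends Object;

struct PhonemeEntry
{
    var string Phoneme;   // e.g. "ow", "k", "ey"
    var float StartTime;  // seconds into the sound file
    var float EndTime;
};

var PhonemeEntry Entries[128];
var int NumEntries;

// Returns the phoneme under the playhead, or "x" (mouth closed)
// in the gaps between words.
function string PhonemeAtTime(float PlayTime)
{
    local int i;

    for (i = 0; i < NumEntries; i++)
        if (PlayTime >= Entries[i].StartTime && PlayTime < Entries[i].EndTime)
            return Entries[i].Phoneme;

    return "x";
}

Each tick you’d hand PhonemeAtTime the current position in the audio and map whatever comes back to one of your visemes.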
My assumption about why Deus Ex’s super cool lipsyncing was too expensive to ship: they don’t seem to save this phoneme information anywhere, so I’m guessing they were figuring out the phonemes in realtime. If that’s right, it’s sort of a bummer – doing what Valve did and extracting offline would have scooped the whole cost out. Maybe there was more to it.
Anyway, the UnrealScript. Deus Ex predates Unreal getting skeletal animation; it’s all vertex animation. The character heads have a few poses – relevant here, 7 visemes and a blink. nextPhoneme is set from somewhere outside this code (probably a C++ audio system I can’t get at) to A, E, F, M, O, T or U – it doesn’t matter which is which, and I don’t remember anyway – or X, which is nothing (close the mouth). Then this UnrealScript on the character sets the head’s anim sequence to the appropriate pose. This all happens on tick, but only if bIsSpeaking. We have a tweentime we’re using to blend between these poses, so we should be seeing nice smooth blending, the lack of which is why I’m here in the first place! So what’s the problem?
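The mapping part is about what you’d expect. I’m going from memory of the decompiled scripts here, so treat the sequence names and the blend call as approximate rather than verbatim:

// pick a head anim sequence for whatever phoneme the audio system
// handed us this tick (names approximate, from memory)
if (nextPhoneme == "A")
    animseq = 'MouthA';
else if (nextPhoneme == "E")
    animseq = 'MouthE';
// ...and likewise for F, M, O, T and U...
else if (nextPhoneme == "X")
    animseq = 'MouthClosed';

// blend the head into that pose over tweentime seconds
if (animseq != '')
    TweenBlendAnim(animseq, tweentime);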
The main thing is a dodgy frame rate check:
// update the animation timers that we are using
animTimer[0] += deltaTime;
animTimer[1] += deltaTime;
animTimer[2] += deltaTime;

if (bIsSpeaking)
{
    // if our framerate is high enough (>20fps), tween the lips smoothly
    if (Level.TimeSeconds - animTimer[3] < 0.05)
        tweentime = 0;
    else
        tweentime = 0.1;
“tweentime” is how long the blend to the next viseme takes, in seconds; if it’s 0, it’s an instant snap. The intent here is to skip blending entirely if our framerate is so low that snapping the lips around looks better than showing any in-between poses – only it doesn’t work. animTimer[3] holds the Level.TimeSeconds of the previous frame, so subtracting it from the current Level.TimeSeconds gives us deltatime. A deltatime under 0.05 seconds means we’re running above 20fps – exactly the case where we should blend smoothly – but that’s the branch that sets tweentime to 0. So it’s flipped: high framerates get the snapping, low framerates get the blending.
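The fix, then, is about as simple as fixes get: swap the two branches so the code does what its own comment says.

// if our framerate is high enough (>20fps), tween the lips smoothly
if (Level.TimeSeconds - animTimer[3] < 0.05)
    tweentime = 0.1;  // frame took under 50ms: above 20fps, blend smoothly
else
    tweentime = 0;    // at or below 20fps: snap instead of showing in-betweens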