The packet function
may be derived from a recorded sound. A recorded sinusoid then gives a
single formant, as described in the previous section, whereas a recorded
vocal sound, for instance, can impart its spectral envelope to the
resynthesized result. The technique
retains the same possibilities for
shifting and bandwidth control as in the purely synthetic case.
We make the initial assumption that the signal to be analyzed is a sum of
(not necessarily harmonic) sinusoids. Assuming their frequencies are separated
widely enough, we can treat them individually. We therefore assume that
the signal to analyze is a pure sinusoid, whose frequency we express relative
to the fundamental frequency of the analysis period.
Under this assumption, we analyze the sinusoid by windowing it.
Exactly as in the synthesis step, the combination of a Hann window and a two-way overlap ensures that any sinusoidal component in the analyzed signal is represented in the resulting waveform in the most compact possible way, as a sum of two neighboring harmonics. If the analyzed sinusoid happens to lie on a harmonic of the analysis period, the resulting wavetable is again a pure sinusoid.
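The compactness claim can be checked numerically. The following is a minimal sketch, not the author's implementation: it assumes the Hann window spans two analysis periods and that its two halves are overlap-added into a single-period wavetable. The period `PERIOD`, the test frequencies, and all function names are hypothetical choices made for illustration.

```python
import cmath
import math

def hann(length):
    """Periodic Hann window; two copies offset by length/2 sum to one."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / length) for n in range(length)]

def analyze_to_wavetable(signal, start, period):
    """Hann-window 2*period samples, then overlap-add the halves into one period."""
    window = hann(2 * period)
    frame = [signal[start + n] * window[n] for n in range(2 * period)]
    return [frame[n] + frame[n + period] for n in range(period)]

def harmonic_magnitudes(table):
    """Magnitude of harmonics 0..N/2 of a one-period table (plain DFT)."""
    size = len(table)
    mags = []
    for k in range(size // 2 + 1):
        acc = sum(table[n] * cmath.exp(-2j * math.pi * k * n / size)
                  for n in range(size))
        mags.append(abs(acc))
    return mags

def energy_share(mags, harmonics):
    """Fraction of the (one-sided) spectral energy found in the given harmonics."""
    total = sum(m * m for m in mags)
    return sum(mags[k] ** 2 for k in harmonics) / total

PERIOD = 64  # assumed analysis period, in samples

def sinusoid(alpha, phase=0.3, length=4 * PERIOD):
    """Sinusoid at alpha times the fundamental of the analysis period."""
    return [math.cos(2 * math.pi * alpha * n / PERIOD + phase)
            for n in range(length)]

# A sinusoid exactly on harmonic 3, and one a quarter of the way to harmonic 4.
on_harmonic = harmonic_magnitudes(analyze_to_wavetable(sinusoid(3.0), 10, PERIOD))
between = harmonic_magnitudes(analyze_to_wavetable(sinusoid(3.25), 10, PERIOD))

print(energy_share(on_harmonic, [3]))   # essentially all energy in harmonic 3
print(energy_share(between, [3, 4]))    # nearly all energy in harmonics 3 and 4
```

In this sketch the on-harmonic case reproduces the sinusoid exactly, because the two halves of the Hann window sum to one. For the off-harmonic case a small residue remains outside the two neighboring harmonics, so the concentration is very high but not total.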
Since all the partials of the output have a fixed phase, different wavetables may be cross-faded coherently. For example, it is straightforward to analyze successive frames in a recorded vocal sample and play them back, successively cross-fading the frames to mimic the time-varying spectral envelope of the original sound.
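A small sketch may help show why a fixed phase makes cross-fading coherent. It assumes, for illustration only, that every partial of a wavetable is a zero-phase cosine; the two spectral envelopes `frame_a` and `frame_b`, the table size, and the function names are hypothetical.

```python
import cmath
import math

def zero_phase_table(magnitudes, size):
    """Build one period whose k-th harmonic is a cosine of zero phase."""
    return [sum(m * math.cos(2 * math.pi * k * n / size)
                for k, m in enumerate(magnitudes))
            for n in range(size)]

def harmonic_amplitudes(table, count):
    """Complex amplitude of harmonics 0..count-1 (plain DFT, rescaled)."""
    size = len(table)
    amps = []
    for k in range(count):
        acc = sum(table[n] * cmath.exp(-2j * math.pi * k * n / size)
                  for n in range(size))
        scale = size if k == 0 else size / 2
        amps.append(acc / scale)
    return amps

SIZE = 64
frame_a = [0.0, 1.0, 0.5, 0.1, 0.0]  # hypothetical envelope of one frame
frame_b = [0.0, 0.2, 0.4, 0.9, 0.6]  # hypothetical envelope of the next frame
table_a = zero_phase_table(frame_a, SIZE)
table_b = zero_phase_table(frame_b, SIZE)

# Cross-fade 30% of the way from frame A to frame B.
fade = 0.3
mixed = [(1 - fade) * a + fade * b for a, b in zip(table_a, table_b)]
mixed_amps = [abs(c) for c in harmonic_amplitudes(mixed, len(frame_a))]
expected = [(1 - fade) * a + fade * b for a, b in zip(frame_a, frame_b)]

# Contrast: the same envelope B with every partial inverted (phase pi).
table_b_inverted = [-v for v in table_b]
mixed_bad = [(1 - fade) * a + fade * b for a, b in zip(table_a, table_b_inverted)]
bad_amps = [abs(c) for c in harmonic_amplitudes(mixed_bad, len(frame_a))]

print(mixed_amps)  # matches the interpolated envelope in `expected`
print(bad_amps)    # partials present in both frames partly cancel
```

With matching phases, the amplitude of each partial in the mix is simply the interpolation of the two envelopes. When the phases disagree, partials shared by both frames partially cancel during the fade.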
The phase-bashing technique can be combined with sinusoidal/stochastic decomposition [10,11]. Figure 5, part (a), shows the decomposition step, in which an incoming sound is separated (in real time or not) into sinusoidal and noisy parts. Each is separately divided into a succession of analysis frames and converted into phase-bashed wavetables.
For the reconstruction step (Figure 5, part (b)), the sinusoidal and noisy tables are each used to reconstruct signals as in Figure 2. The noisy part is then de-pitched by modulating it with a band-limited noise signal. For best results, several slightly time-shifted copies are modulated separately [4].
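The de-pitching step might be sketched as follows. This is an illustration under stated assumptions rather than the procedure of [4]: the band-limited noise is approximated by a sum of random-phase sinusoids below a chosen bandwidth, and the number of copies, their delays, the bandwidth, and all names are hypothetical.

```python
import math
import random

def band_limited_noise(length, bandwidth, rng, partials=8):
    """Sum of random-phase sinusoids below `bandwidth` (cycles per sample)."""
    freqs = [rng.uniform(0.0, bandwidth) for _ in range(partials)]
    phases = [rng.uniform(0.0, 2 * math.pi) for _ in range(partials)]
    gain = math.sqrt(2.0 / partials)
    return [gain * sum(math.cos(2 * math.pi * f * n + p)
                       for f, p in zip(freqs, phases))
            for n in range(length)]

def depitch(signal, bandwidth=0.01, shifts=(0, 3, 7), seed=1):
    """Modulate delayed copies of `signal` by independent noise and sum them."""
    rng = random.Random(seed)
    out = [0.0] * len(signal)
    for shift in shifts:
        noise = band_limited_noise(len(signal), bandwidth, rng)
        for n in range(shift, len(signal)):
            out[n] += signal[n - shift] * noise[n]
    scale = 1.0 / math.sqrt(len(shifts))
    return [scale * v for v in out]

def line_share(signal, bin_index):
    """Fraction of the signal's energy at one DFT bin (and its mirror image)."""
    size = len(signal)
    re = sum(v * math.cos(2 * math.pi * bin_index * n / size)
             for n, v in enumerate(signal))
    im = sum(v * math.sin(2 * math.pi * bin_index * n / size)
             for n, v in enumerate(signal))
    return 2 * (re * re + im * im) / (size * sum(v * v for v in signal))

LENGTH = 1024
# Stand-in for a pitched reconstruction: a single partial at bin 20.
pitched = [math.cos(2 * math.pi * 20 * n / LENGTH) for n in range(LENGTH)]
smeared = depitch(pitched)

print(line_share(pitched, 20))  # the pure partial holds all of the energy
print(line_share(smeared, 20))  # after modulation the energy is spread out
```

Multiplying by slowly varying noise spreads each spectral line over a band roughly as wide as the noise bandwidth, which is what removes the sense of pitch. Summing several delayed copies with independent modulators further decorrelates the result.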
Figure 6 shows a real situation in which a sung vowel (/a/ as in "la") is
analyzed and resynthesized. Part (a) shows an analyzed spectrum of the
original voice, whose fundamental frequency is about 600 Hz. Two 'formants' have
center frequencies of about 1.5 and 5.5 kHz.
In part (b), the voice is resynthesized at a low pitch (about 170 Hz) to show a sharp reconstruction of the original spectrum. The result has a formant for each harmonic of the original sample.
Part (c) shows the result of adding bandwidth by increasing the
bandwidth parameter, in order to erase the visible quantization of the
spectrum of part (b) around
harmonics of the original fundamental.
Unfortunately (I am grateful to a reviewer who pointed this out!), the /a/ vowel should normally contain two formants at about 800 and 1000 Hz; since the fundamental frequency of the recorded sample is so absurdly high, the correct formants cannot be inferred from the spectrum. But I'm running out of space and must stop here.