From analogue-request@magnus.acs.ohio-state.edu Tue May 25 20:58:26 1993
Received: by quark.magnus.acs.ohio-state.edu (5.65/3.910213) id AA06463; Tue, 25 May 93 20:57:52 -0400
Errors-To: analogue-request@magnus.acs.ohio-state.edu
Sender: analogue-request@magnus.acs.ohio-state.edu
Received: from SPEECH1.CS.CMU.EDU by quark.magnus.acs.ohio-state.edu (5.65/3.910213) id AA06458; Tue, 25 May 93 20:57:51 -0400
Received: from SPEECH1.CS.CMU.EDU by SPEECH1.CS.CMU.EDU id aa20626; 25 May 93 20:57:04 EDT
To: Andrea TONI
Cc: analogue@magnus.acs.ohio-state.edu
Subject: Re: -- vocoder --
In-Reply-To: Your message of "Mon, 24 May 93 10:48:40 +0700." <9305240842.AA02172@quark.magnus.acs.ohio-state.edu>
Date: Tue, 25 May 93 20:56:57 -0400
Message-Id: <20622.738377817@SPEECH1.CS.CMU.EDU>
From: Yoshiaki_Ohshima@SPEECH1.CS.CMU.EDU
Status: OR

hi:

dan wiebe's message quoted by andrea toni contains misconceptions, and also
seems to miss some important points from the viewpoint of the acoustic
theory of speech production, speech perception, and timbre perception, in
terms of their roles in applications of the channel vocoder for musical
purposes.

--aki (aki@speech1.cs.cmu.edu)

----------------------------------------------------------------------------

> A vocoder is a device that combines the frequency distribution of
>one signal with the waveform of another to produce a single output signal.

this should be rectified. it's the quasi-stationary envelope of the energy
spectrum that controls the carrier, not the frequency distribution. anyone
who knows the source-filter theory of speech production would see the
difference.

>The frequency bands on the equalizer are slaved to the corresponding frequency
>bands on the analyzer...so that if the high-frequency content of the
>spectrum-control signal suddenly goes up, the high end of the equalizer is
>instantly boosted a corresponding amount, and you end up hearing more of the
>high end of the waveform-control signal.

well, not really.
it's again the quasi-stationary nature of the vocal tract filter envelope
that controls the filter bank. the typical update rate should be 5~10ms, and
each set of time-varying filter control signals must represent a 20~30ms
segment of speech in an overlapped manner. it's not instantaneous.

>waveform-control input. Since in a lot of (non-Oriental) languages,
>spoken words depend more on dynamic filtering than on pitch, you can speak
>into the microphone in such a system and, by imposing the spectral character-
>istics of your voice on the output from a wildly-fuzzed electric guitar, make
>the guitar seem to sing words.

this is doubly wrong. if it addresses intelligibility at the phonetic
identification level, it is indeed SO in all languages, and there is no
difference between oriental and non-oriental languages. on the other hand,
if it addresses word intelligibility or the perception of speech in general,
it is NOT so in any language.

it was actually marginally better to describe it as "spectral
characteristics" than as "frequency distribution". but again, the spectral
characteristics, being the short-term energy spectrum of human speech, are
the result of convolving the glottal spectral features, the spectral
envelope of the vocal tract filter, the harmonic fine structure of the
voicing (or the energy spectrum of the turbulent unvoiced source), and the
radiation characteristics. among them, what the channel vocoder tries to
extract and use as the control signal is mostly the spectral envelope of the
vocal tract filter, which is what makes the bpf'ed carrier sound like
"talking".
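to make the overlapped short-time analysis above concrete, here's a minimal
channel-vocoder sketch in python. the band edges, the 4th-order butterworth
filters, and the 25ms/10ms framing are illustrative assumptions, not any
particular machine's values:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def frame_rms(x, win, hop):
    """overlapped short-time RMS, interpolated back to the sample rate --
    the quasi-stationary control signal, updated every `hop` samples."""
    n = max(1, 1 + (len(x) - win) // hop)
    centers = np.arange(n) * hop + win // 2
    rms = np.array([np.sqrt(np.mean(x[i * hop:i * hop + win] ** 2))
                    for i in range(n)])
    return np.interp(np.arange(len(x)), centers, rms)

def vocode(modulator, carrier, fs, edges=(200, 400, 800, 1600, 3200, 6400)):
    """impose the modulator's per-band spectral envelope on the carrier."""
    win, hop = int(0.025 * fs), int(0.010 * fs)   # 25ms window, 10ms hop
    out = np.zeros(len(carrier))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        env = frame_rms(sosfilt(sos, modulator), win, hop)
        out += sosfilt(sos, carrier) * env        # envelope, not waveform
    return out
```

note that each band's control value is measured over a whole window and
interpolated between frames -- nothing in the control path is instantaneous.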
> A more flexible arrangement than the one illustrated above would
>allow you to move the center frequencies of the analyzer and equalizer
>bands, so that, for instance, you could modulate the entire 20-20KHz
>frequency range of the equalizer with only the lower half (20-10KHz) of the
>spectrum of the input signal, with higher-frequency information being
>discarded. An even more flexible arrangement would allow you to change

i have no idea how allowing the cf's to move around has anything to do with
discarding the band above 10kHz. these are actually different threads: one
is the codec principle of using the transmission bandwidth more efficiently;
the other is related to the fact that we don't use information above 10kHz
to figure out what was spoken, which means that basically a 200Hz~10kHz
bandwidth is all that's required to make the vocal tract filter decent.
besides, only 350Hz to 3.4kHz (remember the telephone?) is required for
speech to be truly understandable. before blindly going for more channels
and finer resolution, or bandlimiting the signal on the contrary, we should
first assess the nature of the problem and the desirable quality of the end
results by better understanding them.

also, if we look at vocoding as signal processing applied to musical
instruments, the perceptual effect on the "carrier" instrument is also very
important. this should be discussed in terms of psychophysical findings in
timbre perception, and we should keep in mind that the resolution of the
bpf's and their phase alignment are also extremely important, not to speak
of the bandwidth of the filters.

>the control connections between analyzer and equalizer frequency bands--
>so that you could reverse them, and have high-frequency material control
>low-frequency equalization, and vice versa. (Wonder what that would sound
>like. Any of you transistor jockeys out there have the equipment (and the
>motivation) to try it?)
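for what it's worth, the cross-wired routing the quote wonders about amounts
to permuting which analysis envelope drives which synthesis band. a
hypothetical python sketch (the band edges, framing, and the simple reversal
are my assumptions, not a description of any real box):

```python
import numpy as np
from scipy.signal import butter, sosfilt

def smoothed_env(x, fs, win_s=0.025, hop_s=0.010):
    # overlapped short-time RMS, interpolated back to the sample rate
    win, hop = int(win_s * fs), int(hop_s * fs)
    n = max(1, 1 + (len(x) - win) // hop)
    rms = np.array([np.sqrt(np.mean(x[i * hop:i * hop + win] ** 2))
                    for i in range(n)])
    return np.interp(np.arange(len(x)), np.arange(n) * hop + win // 2, rms)

def cross_wired(modulator, carrier, fs,
                edges=(200, 400, 800, 1600, 3200, 6400)):
    bands = list(zip(edges[:-1], edges[1:]))
    sos = [butter(4, list(b), btype="bandpass", fs=fs, output="sos")
           for b in bands]
    envs = [smoothed_env(sosfilt(s, modulator), fs) for s in sos]
    out = np.zeros(len(carrier))
    # reversed(): the lowest analysis band now drives the highest
    # carrier band and vice versa -- the "high controls low" routing
    for s, env in zip(sos, reversed(envs)):
        out += sosfilt(s, carrier) * env
    return out
```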
it's just some weird sort of bpf bank controlled by irrelevant band-limited
signals. someone may find it creative, but it's no longer vocoding, so i'd
disregard it as nonsense. mind you, it may sound cool inasmuch as it's a
modulation effect whose control signal comes from human activity.

From analogue-request@magnus.acs.ohio-state.edu Wed May 19 18:59:44 1993
Received: by quark.magnus.acs.ohio-state.edu (5.65/3.910213) id AA19385; Wed, 19 May 93 18:57:19 -0400
Errors-To: analogue-request@magnus.acs.ohio-state.edu
Sender: analogue-request@magnus.acs.ohio-state.edu
Received: from relay2.UU.NET by quark.magnus.acs.ohio-state.edu (5.65/3.910213) id AA15752 via AA19376; Wed, 19 May 93 18:57:17 -0400
Received: from spool.uu.net (via LOCALHOST) by relay2.UU.NET with SMTP (5.61/UUNET-internet-primary) id AA15752; Wed, 19 May 93 18:57:21 -0400
Received: from island.UUCP by spool.uu.net with UUCP/RMAIL (queueing-rmail) id 185540.26751; Wed, 19 May 1993 18:55:40 EDT
Received: from guam.island.com by island.COM (4.1/SMI-4.1) id AA10468; Wed, 19 May 93 15:27:19 PDT
Received: by guam.island.com (4.1/SMI-4.1) id AA00237; Wed, 19 May 93 15:29:24 PDT
Date: Wed, 19 May 93 15:29:24 PDT
From: kin@guam.island.COM (Kin Blas)
Message-Id: <9305192229.AA00237@guam.island.com>
To: analogue@magnus.acs.ohio-state.edu
Subject: Another Question
Status: OR

Hi,

Thanks to all of you who replied to my Vocoder question! Some of you on this
list seem to have a lot of knowledge about sounds and synth architectures...
I'm a guitar player trying to play and learn about keyboards, with no
background in sound at all, and was wondering if it was possible to simulate
a Leslie effect using an LFO, TVA, and TVF? I've tried doing this, and it
just doesn't sound right. Am I on the right track? Should I give up?
Thanks,

--== Kin Blas ==--
kin@island.com