%matplotlib inline
import numpy as np
import sounddevice as sd # new import
from scipy.io import wavfile as wv
from scipy.signal import iirdesign, lfilter, hilbert, butter, freqz
from glob import glob
import matplotlib.pyplot as pl
import seaborn as sb
sb.set(style='ticks', font_scale=1.5)
This is the final, cleaned-up version of a notebook that was created from scratch and coded live, in which 100 sentences are noise vocoded using two different frequency band specifications.
That is, each sentence is read in and filtered into N frequency bands, the amplitude envelope of the signal in each band is extracted, a band-limited noise signal is multiplied by that envelope, the modulated noise bands are summed, and the resulting signal is written to the hard drive.
In the end, the code consists of a number of nested loops and function calls. To make sure everything at the most deeply nested levels works, we start with the innermost parts of the code, verify that we can process a single sentence, and then work outward to do the same thing for the full set of sentences.
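As a roadmap, the core of the procedure can be sketched in just a few lines (a simplified sketch, not the code we actually use below; band_filters stands in for a list of (b, a) filter coefficient pairs, and env for an envelope-extraction function like the one defined next):
def vocode_sketch(signal, band_filters, env):
    '''Simplified sketch of noise vocoding (illustration only).'''
    out = np.zeros(len(signal))
    for b, a in band_filters:
        speech_band = lfilter(b, a, signal)          # speech in this frequency band
        noise = np.random.normal(size=len(signal))   # broadband noise
        noise_band = lfilter(b, a, noise)            # noise filtered into the same band
        out += env(speech_band) * noise_band         # modulate the noise by the speech envelope
    return out / np.max(np.abs(out))                 # normalize the summed signal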
Since we know that we want amplitude envelopes, we need a function to get them. Here is one such function:
def amp_env(snd,fs=44100,rect='full',fphz=40,nfc=5,plot=False,ax=None):
    '''Return the amplitude envelope of the input signal snd (file name or array).'''
    if isinstance(snd, str):          # if given a file name, read the wav file
        fs, w = wv.read(snd)
    elif isinstance(snd, np.ndarray):
        w = snd
    if rect == 'hilbert':             # rectify (or take the Hilbert envelope)
        wr = np.abs(hilbert(w))
    elif rect == 'half':
        wr = (w >= 0)*w
    elif rect == 'full':
        wr = np.abs(w)
    wrn = wr/np.max(np.abs(wr))       # normalized rectified signal (for plotting/return)
    nyq = int(fs/2)                   # Nyquist frequency
    fprad = fphz/nyq                  # low-pass cutoff as a fraction of Nyquist
    b, a = butter(N=nfc, Wn=fprad)    # low-pass filter for smoothing the rectified signal
    env = lfilter(b, a, wr)
    env = env/np.max(np.abs(env))     # normalized envelope
    if plot or ax is not None:
        if ax is None:
            fig, ax = pl.subplots(1, 1, figsize=(10, 5))
        ax.plot(wrn, color='green')
        ax.plot(env, linewidth=2, color='red')
    return env, wrn
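Since amp_env() accepts either a file name or an array, it can also be called directly on a .wav file; for example (the file name here is hypothetical):
env, rectified = amp_env('sentences/example.wav', rect='half', plot=True)  # hypothetical file name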
Let's see what that function does with various input parameter values:
file_list = glob('sentences/*wav') # read in file names for wav files in folder sentences
fn_t = file_list[0] # pick out the first one for testing various functions and code, _t indicates "temporary"
fs, s = wv.read(fn_t)
nyq = int(fs/2)  # Nyquist frequency
pl.figure(figsize=(20,5))
pl.plot(s);
Here is what different specifications of the rectification argument do:
fig, axes = pl.subplots(3,1,figsize=(20,15))
e_t, w_t = amp_env(snd=s, ax=axes[0]); axes[0].text(60000,.8,'full wave rectification',fontsize=24)
e_t, w_t = amp_env(snd=s, rect='half',ax=axes[1]); axes[1].text(60000,.8,'half wave rectification',fontsize=24)
e_t, w_t = amp_env(snd=s, rect='hilbert',ax=axes[2]); axes[2].text(60000,.8,'Hilbert transform',fontsize=24);
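To make the three rectification options concrete numerically, here is what they do to a tiny toy signal (values chosen arbitrarily for illustration):
w_toy = np.array([0.5, -0.3, 0.0, 0.8, -0.6])
print(np.abs(w_toy))            # full-wave: [0.5 0.3 0.  0.8 0.6]
print((w_toy >= 0)*w_toy)       # half-wave: negative samples set to zero
print(np.abs(hilbert(w_toy)))   # Hilbert envelope (magnitude of the analytic signal)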
And here are some different options for the cutoff frequency for the low-pass filter used to smooth the rectified waveform to get an estimate of the amplitude envelope:
fig, axes = pl.subplots(3,1,figsize=(20,15))
e_t, w_t = amp_env(snd=s, fphz=10, ax=axes[0]); axes[0].text(60000,.8,'10 hz low pass filter',fontsize=24);
e_t, w_t = amp_env(snd=s, fphz=40, ax=axes[1]); axes[1].text(60000,.8,'40 hz low pass filter',fontsize=24);
e_t, w_t = amp_env(snd=s, fphz=80, ax=axes[2]); axes[2].text(60000,.8,'80 hz low pass filter',fontsize=24);
Based on these visualizations, we use half-wave rectification (which seems to track the signal slightly more accurately than full-wave or Hilbert-based rectification) and default values for the other arguments to amp_env().
The next things we need to figure out are (1) how many channels we want to use, and (2) how these channels are spaced in the frequency domain.
One possible way to break up the frequency range into chunks would be to break up the range in equal-sized Hz-based intervals. Another would be to use arbitrary spacing, possibly to approximate the widths of cochlear filters or to probe some specific hypothesis about particular frequency bands (since we are making stimuli for a perceptual experiment in this example).
In order to work out any problems breaking up the frequency range, we'll write some code to test out various approaches:
n_channels = 8
lower = 50 # lowest frequency to bother with, based on human hearing limits
upper = 12000 # upper limit well below nyq
hz_bands = np.linspace(start=lower, stop=upper, num=n_channels+1, endpoint=True) # equally-spaced bands
ir_bands = np.array([lower,500,1500,3000,6000,upper]) # irregular, hard-coded bands
n_ir_b = len(ir_bands)-1
print(hz_bands)
print(ir_bands)
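For comparison, a rough way to approximate cochlea-like spacing is to place the band edges logarithmically rather than linearly (just a sketch; we stick with the two sets of edges above for the rest of the notebook):
log_bands = np.geomspace(lower, upper, num=n_channels+1)  # logarithmically spaced band edges
print(np.round(log_bands))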
gpass = 1 # maximum loss (in dB) we tolerate in each pass band
gstop = 20 # minimum attenuation (in dB) we require in each stop band
coef_dict = {} # for storing filter coefficients
# loop through equally spaced bands and get/store corresponding filter coefficients
for f_idx in range(n_channels):
    low_edge = hz_bands[f_idx]
    upp_edge = hz_bands[f_idx+1]
    if f_idx == 0: # make a low-pass filter for the lowest band
        wpass = upp_edge/nyq
        wstop = 1.1*wpass
    else: # make a band-pass filter
        wpass = np.array([low_edge, upp_edge])/nyq
        wstop = np.array([.9, 1.1])*wpass
    b_t, a_t = iirdesign(wp=wpass, ws=wstop, gpass=gpass, gstop=gstop, ftype='cheby2')
    coef_dict['hz_' + str(f_idx) + '_b'] = b_t
    coef_dict['hz_' + str(f_idx) + '_a'] = a_t
# loop through irregular bands and get/store corresponding filter coefficients
for f_idx in range(n_ir_b):
    low_edge = ir_bands[f_idx]
    upp_edge = ir_bands[f_idx+1]
    if f_idx == 0: # make a low-pass filter for the lowest band
        wpass = upp_edge/nyq
        wstop = 1.1*wpass
    else: # make a band-pass filter
        wpass = np.array([low_edge, upp_edge])/nyq
        wstop = np.array([.9, 1.1])*wpass
    b_t, a_t = iirdesign(wp=wpass, ws=wstop, gpass=gpass, gstop=gstop, ftype='cheby2')
    coef_dict['ir_' + str(f_idx) + '_b'] = b_t
    coef_dict['ir_' + str(f_idx) + '_a'] = a_t
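High-order IIR filters like these can be numerically fragile, so it can be worth verifying that each designed filter is stable (all poles strictly inside the unit circle); here is a quick sanity check over the stored denominator coefficients:
for key in coef_dict:
    if key.endswith('_a'):                      # denominator coefficients only
        poles = np.roots(coef_dict[key])        # poles of the filter
        assert np.all(np.abs(poles) < 1), key + ' may be unstable'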
Visualize the equally-spaced bands:
fig, ax = pl.subplots(1, 1, figsize=(15,5))
ll = []
for i in range(n_channels):
    b_t, a_t = coef_dict['hz_' + str(i) + '_b'], coef_dict['hz_' + str(i) + '_a']
    om_t, H_t = freqz(b_t, a_t)
    lt, = ax.plot(om_t, np.abs(H_t))
    ll.append(lt)
ax.legend(ll, np.arange(n_channels));
Visualize the irregular bands:
fig, ax = pl.subplots(1, 1, figsize=(15,5))
ll = []
for i in range(n_ir_b):
    b_t, a_t = coef_dict['ir_' + str(i) + '_b'], coef_dict['ir_' + str(i) + '_a']
    om_t, H_t = freqz(b_t, a_t)
    lt, = ax.plot(om_t, np.abs(H_t))
    ll.append(lt)
ax.legend(ll, np.arange(n_ir_b));
Now we can test the filters we created above, get and store the maximum amplitude in each filtered band of speech (for scaling the envelopes later), and see what filtered bands of our sample sentence look like. We'll also save the filtered bands as .wav files so we can listen to them and make sure they sound (more or less) like we expect them to.
hz_comps = np.zeros((n_channels, len(s))) # initialize container for filtered signal
hz_maxv = np.zeros(n_channels) # container for values to scale the amplitude within each channel
for i in range(n_channels):
    b_t, a_t = coef_dict['hz_' + str(i) + '_b'], coef_dict['hz_' + str(i) + '_a'] # get filter coefficients
    s_t = lfilter(b=b_t, a=a_t, x=s) # filter the signal
    hz_comps[i, :] = s_t # store the filtered signal
    hz_maxv[i] = np.max(np.abs(s_t)) # get the maximum amplitude deviation for the filtered signal
fig, axes = pl.subplots(n_channels, 1, figsize=(10,2*n_channels))
for i in range(n_channels):
    s_t = hz_maxv[i]*hz_comps[i,:]/np.max(np.abs(hz_comps[i,:]))
    axes[i].plot(s_t)
    axes[i].set(ylim=(-1,1))
    wv.write('temp_hz_'+str(i)+'.wav', fs, s_t)
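One caveat: s_t is a float array, so scipy.io.wavfile.write produces floating-point .wav files here, which some players and software handle poorly; if that turns out to be a problem, the bands could be rescaled and written as 16-bit integers instead (a sketch only; the _int16 file names are just illustrative):
for i in range(n_channels):
    band = hz_comps[i,:]/np.max(np.abs(hz_comps[i,:]))            # normalize to [-1, 1]
    wv.write('temp_hz_' + str(i) + '_int16.wav', fs, np.int16(32767*band))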
Now we can loop through the bands, generate and filter noise (into the same bands), then multiply the filtered noise by the appropriate amplitude envelope (and max amplitude value), and store the result (visualizing the first step of the loop so we can see what's going on):
hz_envs = np.zeros(hz_comps.shape) # container for amplitude envelopes
hz_ncmp = np.zeros(hz_comps.shape) # container for modulated noise components
fig, axes = pl.subplots(3, 1, figsize=(20,12))
ax1, ax2, ax3 = axes
for i in range(n_channels):
    noise = np.random.normal(size=len(s)) # generate sample of noise
    noise = noise/np.max(np.abs(noise)) # normalize it
    b_t, a_t = coef_dict['hz_' + str(i) + '_b'], coef_dict['hz_' + str(i) + '_a'] # get filter coefficients
    n_t = lfilter(b=b_t, a=a_t, x=noise) # filter the noise
    e_t, w_t = amp_env(hz_comps[i,:], fs=fs, rect='half') # get amplitude envelope of speech component in channel i
    hz_envs[i,:] = e_t # store the envelope (not strictly necessary, but useful for plotting)
    hz_ncmp[i,:] = hz_maxv[i]*e_t*n_t # scale and amplitude modulate the noise for channel i
    if i == 0:
        ax1.plot(n_t) # plot filtered noise band for channel 0
        ax1.plot(np.max(np.abs(hz_ncmp[i,:]))*e_t, lw=5) # plot envelope
        ax2.plot(hz_ncmp[i,:]) # plot modulated noise band for channel 0
        ax3.plot(hz_comps[i,:]) # plot original speech band for channel 0
Here is the complete noise vocoded version of the sample sentence (visualized and written to the hard drive) and the original wave form for reference:
s_noise = hz_ncmp.sum(axis=0) # sum the noise components across channels
s_noise = .975*s_noise/np.max(np.abs(s_noise)) # normalize the amplitude
wv.write('test_noise_vocode_hzb.wav',fs,s_noise)
fig, axes = pl.subplots(2,1,figsize=(20,10)); ax1, ax2 = axes
ax1.plot(s_noise)
ax2.plot(s);
Using the package sounddevice (imported above as sd), we can listen to the original and the noise vocoded version:
sd.play(s, fs) # play the original
sd.play(s_noise, fs) # play the noise-vocoded version
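Note that sd.play() returns immediately (playback is asynchronous), so a second call will cut the first one off if the two are run in quick succession; sd.wait() can be used to block until playback finishes:
sd.play(s, fs)
sd.wait() # block until the original has finished playing
sd.play(s_noise, fs)
sd.wait()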
Next, we repeat everything with the irregular frequency bands. First, we filter the original sentence into the irregular bands, and plot the waveforms:
ir_comps = np.zeros((n_ir_b, len(s))) # container for storing filtered speech components
ir_maxv = np.zeros(n_ir_b) # container for maximum amplitude values for filtered speech components
for i in range(n_ir_b):
    b_t, a_t = coef_dict['ir_' + str(i) + '_b'], coef_dict['ir_' + str(i) + '_a']
    s_t = lfilter(b=b_t, a=a_t, x=s)
    ir_comps[i, :] = s_t
    ir_maxv[i] = np.max(np.abs(s_t))
fig, axes = pl.subplots(n_ir_b, 1, figsize=(10,2*n_ir_b))
for i in range(n_ir_b):
    s_t = ir_maxv[i]*ir_comps[i,:]/np.max(np.abs(ir_comps[i,:]))
    axes[i].plot(s_t)
    axes[i].set(ylim=(-1,1))
    wv.write('temp_ir_'+str(i)+'.wav', fs, s_t)
Next, we loop through the bands, generate and filter noise, multiply by the envelope (and the maximum amplitude for the appropriate band), and visualize the complete irregular-band noise-vocoded signal:
ir_envs = np.zeros(ir_comps.shape) # container for irregular band envelopes
ir_ncmp = np.zeros(ir_comps.shape) # container for irregular band noise components
for i in range(n_ir_b):
    noise = np.random.normal(size=len(s)) # generate noise
    noise = noise/np.max(np.abs(noise)) # normalize it
    b_t, a_t = coef_dict['ir_' + str(i) + '_b'], coef_dict['ir_' + str(i) + '_a'] # get filter coefficients
    n_t = lfilter(b=b_t, a=a_t, x=noise) # filter the noise
    e_t, w_t = amp_env(ir_comps[i,:], fs=fs, rect='half') # get amp env of speech component for band i
    ir_envs[i,:] = e_t # store the envelope
    ir_ncmp[i,:] = ir_maxv[i]*e_t*n_t # modulate the noise
t_noise = ir_ncmp.sum(axis=0) # sum the noise components across channels
t_noise = .975*t_noise/np.max(np.abs(t_noise)) # normalize the amplitude
wv.write('test_noise_vocode_irb.wav', fs, t_noise)
fig, ax = pl.subplots(1,1,figsize=(20,5))
ax.plot(t_noise);
Again, we can listen:
sd.play(s, fs) # play the original
sd.play(t_noise, fs) # play the irregular-band noise-vocoded version
Finally, we take the appropriate bits and pieces of the code from above, copying some of the initial steps for convenience (e.g., so we can specify different frequency bands or filter characteristics without having to scroll back up and re-run cells at the top of the notebook), and then put those pieces inside a loop that reads in all 100 sentences, processes each one, and writes the noise-vocoded versions to the hard drive (for later use):
# copied for convenience (i.e., so we don't have to scroll up and re-run code to change values)
n_channels = 16
lower = 50 # lowest frequency to bother with, based on human hearing limits
upper = 12000 # upper limit well below nyq
hz_bands = np.linspace(start=lower, stop=upper, num=n_channels+1, endpoint=True)
ir_bands = np.array([lower,750,3000,5000,9000,upper])
n_ir_b = len(ir_bands)-1
#print(hz_bands)
#print(ir_bands)
gpass = 1 # maximum loss (in dB) we tolerate in each pass band
gstop = 20 # minimum attenuation (in dB) we require in each stop band
coef_dict = {} # for storing filter coefficients
for f_idx in range(n_channels):
    low_edge = hz_bands[f_idx]
    upp_edge = hz_bands[f_idx+1]
    if f_idx == 0: # make a low-pass filter for the lowest band
        wpass = upp_edge/nyq
        wstop = 1.1*wpass
    else: # make a band-pass filter
        wpass = np.array([low_edge, upp_edge])/nyq
        wstop = np.array([.9, 1.1])*wpass
    b_t, a_t = iirdesign(wp=wpass, ws=wstop, gpass=gpass, gstop=gstop, ftype='cheby2')
    coef_dict['hz_' + str(f_idx) + '_b'] = b_t
    coef_dict['hz_' + str(f_idx) + '_a'] = a_t
for f_idx in range(n_ir_b):
    low_edge = ir_bands[f_idx]
    upp_edge = ir_bands[f_idx+1]
    if f_idx == 0: # make a low-pass filter for the lowest band
        wpass = upp_edge/nyq
        wstop = 1.1*wpass
    else: # make a band-pass filter
        wpass = np.array([low_edge, upp_edge])/nyq
        wstop = np.array([.9, 1.1])*wpass
    b_t, a_t = iirdesign(wp=wpass, ws=wstop, gpass=gpass, gstop=gstop, ftype='cheby2')
    coef_dict['ir_' + str(f_idx) + '_b'] = b_t
    coef_dict['ir_' + str(f_idx) + '_a'] = a_t
Visualize the frequency bands:
fig, ax = pl.subplots(1, 1, figsize=(15,5))
ll = []
for i in range(n_channels):
    b_t, a_t = coef_dict['hz_' + str(i) + '_b'], coef_dict['hz_' + str(i) + '_a']
    om_t, H_t = freqz(b_t, a_t)
    lt, = ax.plot(om_t, np.abs(H_t))
    ll.append(lt)
ax.legend(ll, np.arange(n_channels));
fig, ax = pl.subplots(1, 1, figsize=(15,5))
ll = []
for i in range(n_ir_b):
    b_t, a_t = coef_dict['ir_' + str(i) + '_b'], coef_dict['ir_' + str(i) + '_a']
    om_t, H_t = freqz(b_t, a_t)
    lt, = ax.plot(om_t, np.abs(H_t))
    ll.append(lt)
ax.legend(ll, np.arange(n_ir_b));
Read in the list of sentences:
file_list = glob('sentences/*wav') # read in file names for wav files in folder sentences
file_list[:5]
We'll use the split() method to get information from the file name, first to get rid of the directory structure above the file name:
file_temp = file_list[0]
file_temp.split('/')
Then to get rid of the file extension:
file_temp.split('/')[1].split('.')
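An alternative that doesn't depend on the path separator (glob returns backslash-separated paths on Windows) would be to use os.path:
import os
os.path.splitext(os.path.basename(file_temp))  # (base name, extension)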
Then we process all 100 sentences:
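One practical note before running the loop: scipy.io.wavfile.write won't create missing folders, so the two output directories used below have to exist; they can be created from the notebook:
import os
os.makedirs('hz_sentences', exist_ok=True) # output directory for the hz-band stimuli
os.makedirs('ir_sentences', exist_ok=True) # output directory for the irregular-band stimuli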
hz_dir = 'hz_sentences/'
ir_dir = 'ir_sentences/'
for fi, fn in enumerate(file_list): # loop through files
    fs, s = wv.read(fn) # read in wav file, return sampling frequency (fs) and signal (s)
    hz_ncmp = np.zeros((n_channels, len(s))) # for sentence s, create container for noise components
    for i in range(n_channels): # for file fn, loop through hz channels
        b_t, a_t = coef_dict['hz_' + str(i) + '_b'], coef_dict['hz_' + str(i) + '_a'] # get filter coefficients
        hz_c = lfilter(b=b_t, a=a_t, x=s) # filter the signal
        hz_m = np.max(np.abs(hz_c)) # get the maximum amplitude deviation for the filtered signal
        noise = np.random.normal(size=len(s)) # generate sample of noise
        noise = noise/np.max(np.abs(noise)) # normalize it
        n_t = lfilter(b=b_t, a=a_t, x=noise) # filter the noise
        e_t, w_t = amp_env(hz_c, fs=fs, rect='half') # get amplitude envelope of speech component in channel i
        hz_ncmp[i,:] = hz_m*e_t*n_t # scale and amplitude modulate the noise for channel i
    nvoc = hz_ncmp.sum(axis=0) # sum across channels
    nvoc = 0.975*nvoc/np.max(np.abs(nvoc)) # normalize
    fn_t = fn.split('/')[1].split('.')[0]
    wv.write(filename=hz_dir + fn_t + '_hzb.wav', rate=fs, data=nvoc)
    ir_ncmp = np.zeros((n_ir_b, len(s))) # for sentence s, create container for irregular-band noise components
    for i in range(n_ir_b): # for file fn, loop through irregular-band channels
        b_t, a_t = coef_dict['ir_' + str(i) + '_b'], coef_dict['ir_' + str(i) + '_a'] # get filter coefficients
        ir_c = lfilter(b=b_t, a=a_t, x=s) # filter the signal
        ir_m = np.max(np.abs(ir_c)) # get the maximum amplitude deviation for the filtered signal
        noise = np.random.normal(size=len(s)) # generate sample of noise
        noise = noise/np.max(np.abs(noise)) # normalize it
        n_t = lfilter(b=b_t, a=a_t, x=noise) # filter the noise
        e_t, w_t = amp_env(ir_c, fs=fs, rect='half') # get amplitude envelope of speech component in channel i
        ir_ncmp[i,:] = ir_m*e_t*n_t # scale and amplitude modulate the noise for channel i
    nvoc = ir_ncmp.sum(axis=0) # sum across channels
    nvoc = 0.975*nvoc/np.max(np.abs(nvoc)) # normalize
    fn_t = fn.split('/')[1].split('.')[0]
    wv.write(filename=ir_dir + fn_t + '_irb.wav', rate=fs, data=nvoc)
Now that we have 100 regular-band and 100 irregular-band noise vocoded stimuli, we can code up a very simple "experiment" in which we play the sentences and collect transcription responses (for a real experiment, of course, we would want a lot more than just this).
For this very simple version, we'll use input() to collect responses and append them to a list. Here's how input() works:
temp = input('Type your name: ')
temp
For illustration purposes, we'll just loop through the first five sentences and get responses:
from numpy.random import shuffle # for randomizing the sentences
hz_sentences = glob('hz_sentences/*.wav') # hz band vocoded sentences
shuffle(hz_sentences) # randomize
ir_sentences = glob('ir_sentences/*.wav') # ir band vocoded sentences
shuffle(ir_sentences) # randomize
response_list = [] # initialize a list for storing responses
for hz_s in hz_sentences[:5]:
    fs, s = wv.read(hz_s) # read in the sound file hz_s
    sd.play(s, fs) # play it
    resp_temp = input('Transcribe the sentence here: ') # prompt for transcription
    response_list.append(resp_temp) # store response
response_list
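Note that sd.play() doesn't block, so the transcription prompt appears while the sentence is still playing; if we wanted the prompt to appear only after playback ends, we could add sd.wait() after the play call (a variant of the loop above, not what was run here):
for hz_s in hz_sentences[:5]:
    fs, s = wv.read(hz_s)
    sd.play(s, fs)
    sd.wait() # wait for playback to finish before prompting
    response_list.append(input('Transcribe the sentence here: '))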
Here is a loop for the first five irregular-band stimuli (after randomizing):
for ir_s in ir_sentences[:5]:
    fs, s = wv.read(ir_s) # read in the sound file ir_s
    sd.play(s, fs) # play it
    resp_temp = input('Transcribe the sentence here: ') # prompt for transcription
    response_list.append(resp_temp) # store response
response_list
In our simple version of the experiment, we can take our list of responses and save them to a file for later analysis. One way to do this is by opening a file, writing the data to it, and then closing the file:
file = open('transcription_responses.txt','w') # create a file to write data to
for r in response_list: # loop through responses
    file.write(r + '\n') # write response r, plus a newline character
file.close() # close the file
We could also use a with statement, which has the added benefit of making sure the file is closed when all of the commands have been executed, saving us from possibly forgetting to close the file manually:
with open('transcription_responses_2.txt','w') as file:
    for r in response_list:
        file.write(r + '\n')
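For a file this simple, the same result can also be achieved in one line with pathlib (the _3 file name is just to avoid overwriting the files written above):
from pathlib import Path
Path('transcription_responses_3.txt').write_text('\n'.join(response_list) + '\n')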
Although it's not at all clear from this cleaned-up notebook, the "live coding" implementation was at least a moderately successful pedagogical effort. The end result looks like a (roughly) finished product, in which 100 sentences are read in and processed, and corresponding noise-vocoded sentences are written to the hard drive. Rudimentary stimulus presentation and response collection were then illustrated, as was writing the responses to a .txt file (using two different methods). Finally, we spent some time outside of this notebook looking at some basic uses of PsychoPy.