Python audio data preprocessing: using the pydub module to extract wav audio data (fixed number of samples) and store it in a csv file

2022-09-29 14:37:18

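Task description
Convert a 10 s, 44.1 kHz wav audio file into five 2 s, 8000 Hz audio segments. Each 2 s segment contains 2 (s) * 2 (channels) * 8,000 = 32,000 samples: the first 16,000 are left-channel samples and the last 16,000 are right-channel samples (for mono data, the 16,000 samples are simply copied and appended). The segments are then stored in a csv file.
In other words, a 10 s, 44.1 kHz wav audio file is turned into a feature array of shape (5, 32000).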
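The layout described above can be checked on synthetic data. The following is only a minimal sketch: it assumes stereo samples arrive interleaved (L, R, L, R, ...), which is how pydub's `get_array_of_samples()` returns them, and the helper name `to_segments` is hypothetical, not part of the original code.

```python
import numpy as np


def to_segments(interleaved, n_segments=5):
    """Turn interleaved stereo samples (L, R, L, R, ...) into rows of [left | right]."""
    left = interleaved[0::2]   # every even index is a left-channel sample
    right = interleaved[1::2]  # every odd index is a right-channel sample
    lefts = left.reshape(n_segments, -1)
    rights = right.reshape(n_segments, -1)
    # Each row: all left samples of one segment, followed by all right samples
    return np.concatenate((lefts, rights), axis=1)


frame_rate = 8000  # target sample rate in Hz
seconds = 10

# Synthetic 10 s stereo signal: left samples are all 1, right samples are all -1
stereo = np.empty(seconds * frame_rate * 2, dtype=np.int16)
stereo[0::2] = 1
stereo[1::2] = -1

features = to_segments(stereo)
print(features.shape)  # (5, 32000)
```

Each row holds 2 s of audio: columns 0 to 15,999 are the left channel and columns 16,000 to 31,999 are the right channel.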
Modules used
os, shutil, random, numpy, pydub, pandas
Implementation
3.1 Split the target audio into N segments at the target sample rate and save them as wav files

import os

import numpy as np
from pydub import AudioSegment


def wav_split_sub_process(path, new_path, filename, split_to_N=5, frame_rate=22050):
    '''
    Split the audio file at path (an absolute path) into split_to_N segments. The audio must be
    between 7.99 s and 11.5 s long; otherwise nothing is done.
    Returns a numpy array of shape (5, frame_rate*2*2), where the first row is the first audio
    segment and frame_rate is the number of samples per second (2 s, 2 channels).
    Audio shorter or longer than 10 s is resampled so that it has 10*frame_rate*2 samples (10 s, 2 channels).
    :param path:        path of the file to split
    :param new_path:    directory in which the split audio is stored
    :param filename:    prefix of the output file names: filename-n.wav, n in {1,2,3,4,5}
    :param split_to_N:  number of segments
    :param frame_rate:  target sample rate of the output audio
    :return:            the array described above, or None if the file is skipped
    '''
    # Load the AudioSegment (channels/sample_width only matter for raw input, so they are omitted)
    sound = AudioSegment.from_file(path)
    # Resample to the target sample rate
    sound = sound.set_frame_rate(frame_rate)
    # Number of channels
    channel_count = sound.channels
    time = len(sound)  # duration in milliseconds
    # Outside this range the quality after splitting is poor
    if time / 1000 > 11.5 or time / 1000 < 7.99:
        print('filter one wav with time ----------  ', filename)
        return None
    target = frame_rate * 10  # samples per channel for 10 s
    if channel_count == 1:
        # Mono data
        if sound.frame_count() != target:
            # Pick a rate that yields about 10 s worth of samples, then trim or zero-pad
            sound = sound.set_frame_rate(int(frame_rate * 10 / time * 1000))
            sound_np_array = np.array(sound.get_array_of_samples(), dtype=np.int16).reshape(-1)
            if len(sound_np_array) > target:
                sound_np_array = sound_np_array[:target]
            elif len(sound_np_array) < target:
                sound_np_array = np.pad(sound_np_array, (0, target - len(sound_np_array)),
                                        'constant', constant_values=(0, 0))
        else:
            sound_np_array = np.array(sound.get_array_of_samples(), dtype=np.int16).reshape(-1)
        # Split along the last axis, then duplicate each segment to mimic two channels
        segments = np.split(sound_np_array, split_to_N, axis=-1)
        sound_np_array = np.array([np.hstack((seg, seg)) for seg in segments], dtype=np.int16)
    elif channel_count == 2:
        if sound.frame_count() != target:
            sound = sound.set_frame_rate(int(frame_rate * 10 / time * 1000))
            sound_np_array = np.array(sound.get_array_of_samples(), dtype=np.int16).reshape(-1)
            if len(sound_np_array) > target * 2:
                sound_np_array = sound_np_array[:target * 2]
            elif len(sound_np_array) < target * 2:
                sound_np_array = np.pad(sound_np_array, (0, target * 2 - len(sound_np_array)),
                                        'constant', constant_values=(0, 0))
        else:
            sound_np_array = np.array(sound.get_array_of_samples(), dtype=np.int16).reshape(-1)
        # pydub interleaves stereo samples (L, R, L, R, ...), so the channels are separated by
        # stride; slicing at frame_rate*2, as the original code did, does not separate them
        left, right = sound_np_array[0::2], sound_np_array[1::2]
        lefts = np.split(left, split_to_N, axis=-1)
        rights = np.split(right, split_to_N, axis=-1)
        sound_np_array = np.append(lefts, rights, axis=-1)
    else:
        # Added guard: only mono and stereo input is handled
        print('filter one wav with channels ----------  ', filename)
        return None
    # Save each segment to a wav file; every row is written as-is, in the left-then-right layout
    for j in range(split_to_N):
        out_path = os.path.join(new_path, filename + '-' + str(j + 1) + '.wav')
        sound._spawn(sound_np_array[j].tobytes()).export(out_path, format='wav')
    return sound_np_array  # (5, 88200)
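The function handles clips that are not exactly 10 s long by calling `set_frame_rate(int(frame_rate*10/time*1000))`, so that the clip yields roughly `frame_rate*10` samples per channel, and then trimming or zero-padding the remainder. The arithmetic can be sketched in plain Python as follows. The helper names `stretch_rate` and `trim_or_pad` are hypothetical, and the sample counts are nominal (duration times rate); the count pydub actually produces after resampling may differ slightly.

```python
def stretch_rate(duration_ms, frame_rate=22050, target_seconds=10):
    """Sample rate at which a clip of duration_ms holds about target_seconds * frame_rate samples."""
    return int(frame_rate * target_seconds / duration_ms * 1000)


def trim_or_pad(samples, target_len):
    """Cut extra samples, or append zeros, so the list has exactly target_len items."""
    if len(samples) >= target_len:
        return samples[:target_len]
    return samples + [0] * (target_len - len(samples))


target_len = 22050 * 10  # 220500 samples per channel for 10 s at 22050 Hz

# A 9 s clip: 9 s * 24500 Hz = 220500 samples, exactly the target
rate_9s = stretch_rate(9000)

# An 11 s clip: the rate is truncated to an integer, so 11 s * 20045 Hz = 220495 samples
rate_11s = stretch_rate(11000)
nominal = [1] * (11 * rate_11s)
fixed = trim_or_pad(nominal, target_len)  # 5 zeros are appended

print(rate_9s, rate_11s, len(fixed))  # 24500 20045 220500
```

Because the rate is truncated by `int()`, the result usually falls a few samples short of the target, which is why the padding branch is needed in addition to trimming.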

3.2 Save the audio data to csv (to be continued)
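The original post leaves this step unwritten. The following is only one possible sketch, not the author's implementation: it assumes the array returned in 3.1 is written one segment per row with pandas' `DataFrame.to_csv`, and the names `save_segments_to_csv` and `load_segments_from_csv` are hypothetical. A small array stands in for the real (5, 32000) data.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def save_segments_to_csv(segments, csv_path):
    """Write one segment per row, without header or index, so the file is purely numeric."""
    pd.DataFrame(segments).to_csv(csv_path, header=False, index=False)


def load_segments_from_csv(csv_path):
    """Read the file back into an int16 array with one segment per row."""
    return pd.read_csv(csv_path, header=None).to_numpy(dtype=np.int16)


# Small stand-in for the (5, 32000) array produced in 3.1
segments = np.arange(5 * 8, dtype=np.int16).reshape(5, 8)

csv_path = os.path.join(tempfile.mkdtemp(), 'segments.csv')
save_segments_to_csv(segments, csv_path)
restored = load_segments_from_csv(csv_path)

print(restored.shape)  # (5, 8)
```

Writing without header and index keeps the round trip simple: every row of the file maps back to exactly one segment.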