I've always been interested in audio processing. At the recent SciPy 2015 conference, two talks involving "live" spectrograms were part of the highlights (the first concerning Bokeh, a great Python graphics backend for your browser, and the second by the VisPy team, an impressive OpenGL Python rendering toolkit for science). This piqued my interest and I decided to devote a little time to writing a simple app for live audio processing. I first tried to start from the VisPy example, but as it is part of a pull request that hasn't yet been merged into the main branch, I decided to start from scratch, reusing some of the code provided by the VisPy example and combining it with my existing knowledge of matplotlib and PyQt.
The app I want to code should do the following things:
- record a sound from my microphone continuously
- plot the waveform and the spectrum (frequency representation) of the current waveform
The final result can be seen on YouTube:
```python
from IPython.display import YouTubeVideo
YouTubeVideo('1XxR9U_aUog')
```
This blog post works as follows:
- we'll first look at the Microphone class that records sound and explain its behaviour
- in a second step, we'll design a Qt application that plots the data from the microphone
It is actually a little bit tricky to record sound from the microphone with Python. In particular, I first tried doing this in pure PyQt, but failed. It seems to be possible, but since I did not manage to find working example code on the internet, I abandoned that approach.
Inspired by the VisPy talk, I decided to adopt their implementation of a continuously recording microphone using PyAudio. The PyAudio bindings are great because they allow you to use most devices on most platforms. On Mac, I haven't been able to install them for Python 3, but I managed to find a conda package compatible with Python 2.7. So the following code needs to run under Python 2.7 on my machine.
```python
# class taken from the SciPy 2015 Vispy talk opening example
# see https://github.com/vispy/vispy/pull/928
import pyaudio
import threading
import atexit
import numpy as np

class MicrophoneRecorder(object):
    def __init__(self, rate=4000, chunksize=1024):
        self.rate = rate
        self.chunksize = chunksize
        self.p = pyaudio.PyAudio()
        self.stream = self.p.open(format=pyaudio.paInt16,
                                  channels=1,
                                  rate=self.rate,
                                  input=True,
                                  frames_per_buffer=self.chunksize,
                                  stream_callback=self.new_frame)
        self.lock = threading.Lock()
        self.stop = False
        self.frames = []
        atexit.register(self.close)

    def new_frame(self, data, frame_count, time_info, status):
        data = np.fromstring(data, 'int16')
        with self.lock:
            self.frames.append(data)
            if self.stop:
                return None, pyaudio.paComplete
        return None, pyaudio.paContinue

    def get_frames(self):
        with self.lock:
            frames = self.frames
            self.frames = []
        return frames

    def start(self):
        self.stream.start_stream()

    def close(self):
        with self.lock:
            self.stop = True
            self.stream.close()
            self.p.terminate()
```
This class works as a microphone recorder by wrapping functions provided by PyAudio (as shown in the PyAudio examples). On instantiation it creates a PyAudio object that uses a callback to feed an internal buffer, `self.frames`, with audio chunks that always have the same length (as defined by `chunksize`). The internal buffer can then be accessed (and emptied) using `get_frames`.

One of the tricky things about this class is that it actually runs in a thread. So to make sure it stops correctly, its `close` method is registered with `atexit` at startup. Unfortunately, this also makes the class unrunnable in the IPython notebook (so I won't instantiate it here).
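The buffer logic itself can be demonstrated without any audio hardware. The sketch below isolates the thread-safe "swap under lock" pattern that `get_frames` uses, with a toy producer thread standing in for the PyAudio callback (the `ChunkBuffer` name is mine, for illustration only):

```python
import threading
import numpy as np

class ChunkBuffer(object):
    """Minimal stand-in for MicrophoneRecorder's internal buffer."""
    def __init__(self):
        self.lock = threading.Lock()
        self.frames = []

    def new_frame(self, data):
        # called from the producer (audio) thread
        with self.lock:
            self.frames.append(data)

    def get_frames(self):
        # called from the consumer (GUI) thread: swap the whole
        # list out in one locked step, leaving a fresh empty one
        with self.lock:
            frames = self.frames
            self.frames = []
        return frames

buf = ChunkBuffer()
producer = threading.Thread(
    target=lambda: [buf.new_frame(np.zeros(1024, dtype=np.int16))
                    for _ in range(5)])
producer.start()
producer.join()

chunks = buf.get_frames()
print(len(chunks))             # 5: all accumulated chunks are returned
print(len(buf.get_frames()))   # 0: the buffer was emptied by the call
```

Swapping the list reference instead of copying keeps the time spent holding the lock minimal, which matters because the audio callback runs at a fixed rate.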
Once we have a mechanism for getting the audio, we can design the graphical part of the app: a GUI written using PyQt.
The audio GUI's function is easy to describe: it should fetch the accumulated output from the microphone recorder at regular intervals and plot the corresponding signal. The plotting should comprise both a time series and a frequency spectrum, computed with `np.fft.rfft`.

It turns out that the way I do the plotting is to use matplotlib. Although in general not suited for real-time graphical interfaces, it works in this case. The reason is that we don't redraw a whole figure at each call to the plotting, but only update lines that we have already instantiated, so the refresh is quick enough for our purposes.
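The "update existing lines" idiom can be sketched on its own: the `Line2D` objects are created once, and each refresh only calls `set_data` and redraws the canvas, never adding new artists (here with the headless Agg backend so it runs anywhere):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
x = np.linspace(0, 1, 100)
line, = ax.plot(x, np.zeros_like(x))   # line instantiated once

for i in range(3):                     # simulated refresh loop
    line.set_data(x, np.sin(2 * np.pi * (i + 1) * x))
    fig.canvas.draw()                  # redraw only; no new artists

print(len(ax.lines))  # still a single line object after all updates
```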
Interfacing a matplotlib plot with PyQt is done using a specific class, whose code is below:
```python
from matplotlib.backends.backend_qt4agg import FigureCanvasQTAgg as FigureCanvas
from matplotlib.backends.backend_qt4agg import NavigationToolbar2QT as NavigationToolbar
import matplotlib.pyplot as plt

class MplFigure(object):
    def __init__(self, parent):
        self.figure = plt.figure(facecolor='white')
        self.canvas = FigureCanvas(self.figure)
        self.toolbar = NavigationToolbar(self.canvas, parent)
```
This just creates an object that has a handle to a matplotlib canvas, which will be used to plot our stuff.
The previous object is incorporated into a pretty classic user interface with a checkbox and a slider.
- the checkbox allows for automatic scaling of the spectrum, which is useful if you're recording sounds at different amplitudes and are not sure of the output level beforehand
- on the other hand, if you uncheck the box, you can control the normalization of the spectrum using a fixed gain provided by the slider
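The two normalization modes boil down to a small piece of arithmetic, illustrated below on a synthetic spectrum (the slider value and the `5000000.` divisor mirror the gain formula used in the app's code; the variable names here are mine):

```python
import numpy as np

# synthetic spectrum magnitudes, roughly int16-FFT sized
spectrum = np.abs(np.random.RandomState(0).randn(513)) * 3e5

# auto gain: rescale so the maximum is exactly 1
auto = spectrum / spectrum.max()

# manual gain: fixed multiplicative factor driven by the slider
slider_value = 50                               # hypothetical slider position
manual = spectrum * (1 + slider_value) / 5000000.

print(auto.max())  # 1.0 regardless of the input amplitude
```

Auto gain keeps the loudest peak pinned to the top of the axis, while the manual mode preserves relative loudness between successive frames.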
```python
import numpy as np
import matplotlib.pyplot as plt
from PyQt4 import QtGui, QtCore

class LiveFFTWidget(QtGui.QWidget):
    def __init__(self):
        QtGui.QWidget.__init__(self)

        # customize the UI
        self.initUI()

        # init class data
        self.initData()

        # connect slots
        self.connectSlots()

        # init MPL widget
        self.initMplWidget()

    def initUI(self):
        hbox_gain = QtGui.QHBoxLayout()
        autoGain = QtGui.QLabel('Auto gain for frequency spectrum')
        autoGainCheckBox = QtGui.QCheckBox(checked=True)
        hbox_gain.addWidget(autoGain)
        hbox_gain.addWidget(autoGainCheckBox)

        # reference to checkbox
        self.autoGainCheckBox = autoGainCheckBox

        hbox_fixedGain = QtGui.QHBoxLayout()
        fixedGain = QtGui.QLabel('Manual gain level for frequency spectrum')
        fixedGainSlider = QtGui.QSlider(QtCore.Qt.Horizontal)
        hbox_fixedGain.addWidget(fixedGain)
        hbox_fixedGain.addWidget(fixedGainSlider)

        self.fixedGainSlider = fixedGainSlider

        vbox = QtGui.QVBoxLayout()
        vbox.addLayout(hbox_gain)
        vbox.addLayout(hbox_fixedGain)

        # mpl figure
        self.main_figure = MplFigure(self)
        vbox.addWidget(self.main_figure.toolbar)
        vbox.addWidget(self.main_figure.canvas)

        self.setLayout(vbox)

        self.setGeometry(300, 300, 350, 300)
        self.setWindowTitle('LiveFFT')
        self.show()

        # timer for callbacks, taken from:
        # http://ralsina.me/weblog/posts/BB974.html
        timer = QtCore.QTimer()
        timer.timeout.connect(self.handleNewData)
        timer.start(100)
        # keep reference to timer
        self.timer = timer

    def initData(self):
        mic = MicrophoneRecorder()
        mic.start()

        # keeps reference to mic
        self.mic = mic

        # computes the parameters that will be used during plotting
        self.freq_vect = np.fft.rfftfreq(mic.chunksize, 1./mic.rate)
        self.time_vect = np.arange(mic.chunksize, dtype=np.float32) / mic.rate * 1000

    def connectSlots(self):
        pass

    def initMplWidget(self):
        """creates initial matplotlib plots in the main window
        and keeps references for further use"""
        # top plot
        self.ax_top = self.main_figure.figure.add_subplot(211)
        self.ax_top.set_ylim(-32768, 32768)
        self.ax_top.set_xlim(0, self.time_vect.max())
        self.ax_top.set_xlabel(u'time (ms)', fontsize=6)

        # bottom plot
        self.ax_bottom = self.main_figure.figure.add_subplot(212)
        self.ax_bottom.set_ylim(0, 1)
        self.ax_bottom.set_xlim(0, self.freq_vect.max())
        self.ax_bottom.set_xlabel(u'frequency (Hz)', fontsize=6)

        # line objects
        self.line_top, = self.ax_top.plot(self.time_vect,
                                          np.ones_like(self.time_vect))
        self.line_bottom, = self.ax_bottom.plot(self.freq_vect,
                                                np.ones_like(self.freq_vect))

    def handleNewData(self):
        """handles the asynchronously collected sound chunks"""
        # gets the latest frames
        frames = self.mic.get_frames()

        if len(frames) > 0:
            # keeps only the last frame
            current_frame = frames[-1]
            # plots the time signal
            self.line_top.set_data(self.time_vect, current_frame)
            # computes and plots the fft signal
            fft_frame = np.fft.rfft(current_frame)
            if self.autoGainCheckBox.checkState() == QtCore.Qt.Checked:
                fft_frame /= np.abs(fft_frame).max()
            else:
                fft_frame *= (1 + self.fixedGainSlider.value()) / 5000000.
            self.line_bottom.set_data(self.freq_vect, np.abs(fft_frame))

            # refreshes the plots
            self.main_figure.canvas.draw()
```
The refreshing part of the app is handled with a `QTimer` that fires 10 times a second and refreshes the GUI by calling the `handleNewData` method. That method gets the latest frame from the microphone, plots the time series, computes the Fourier transform and plots its modulus.
The whole app can be run using the following commands:
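For a widget like this, a standard PyQt4 entry point would look something like the following sketch (an assumption on my part, with the classes above defined in the same script):

```python
import sys
from PyQt4 import QtGui

if __name__ == '__main__':
    app = QtGui.QApplication(sys.argv)
    window = LiveFFTWidget()   # the widget defined above
    sys.exit(app.exec_())
```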
I made a little video of the app working live, available on YouTube:
In this post, we learned how to code a little app that records live sound using PyAudio, does some audio processing using numpy, and displays the data using matplotlib embedded in the PyQt framework.