Seeing “Realtime FFT Graph of Audio WAV File or Microphone Input with Python…” on python.reddit.com reminded me of one I’d built in Python with shoebot.
While it works OK, I feel like I’m missing a higher-level audio library (especially having seen Minim, for C++ and Java).
To run it in shoebot:
sbot -w audiobot.bot
audiobot.bot
# Major library imports
import atexit
import pyaudio
from numpy import zeros, short, frombuffer, array
from numpy.fft import fft

NUM_SAMPLES = 512
SAMPLING_RATE = 11025

def setup():
    size(350, 260)
    speed(SAMPLING_RATE / NUM_SAMPLES)

_stream = None

def read_fft():
    global _stream
    pa = None

    def cleanup_audio():
        if _stream:
            _stream.stop_stream()
            _stream.close()
        pa.terminate()

    # Open the microphone once, on the first call
    if _stream is None:
        pa = pyaudio.PyAudio()
        _stream = pa.open(format=pyaudio.paInt16, channels=1,
                          rate=SAMPLING_RATE,
                          input=True, frames_per_buffer=NUM_SAMPLES)
        atexit.register(cleanup_audio)

    # Read one block of 16-bit samples and scale it to -1.0..1.0
    # (frombuffer replaces the deprecated binary mode of fromstring)
    audio_data = frombuffer(_stream.read(NUM_SAMPLES), dtype=short)
    normalized_data = audio_data / 32768.0

    # Skip the DC bin and keep the first half of the spectrum;
    # // keeps the slice index an integer on Python 3 as well
    return fft(normalized_data)[1:1 + NUM_SAMPLES // 2]

def flatten_fft(scale = 1.0):
    """
    Produces a nicer graph, I'm not sure if this is correct
    """
    for i, v in enumerate(read_fft()):
        yield scale * (i * v) / NUM_SAMPLES

def triple(audio):
    '''return bass/mid/treble'''
    c = audio.copy()
    c.resize(3, 255 // 3)
    return c

def draw():
    '''Draw 3 different colour graphs'''
    global NUM_SAMPLES
    audio = array(list(flatten_fft(scale = 80)))
    freqs = len(audio)
    bass, mid, treble = triple(audio)

    colours = (0.5, 1.0, 0.5), (1, 1, 0), (1, 0.2, 0.5)

    fill(0, 0, 1)
    rect(0, 0, WIDTH, 400)
    translate(50, 200)

    for spectrum, col in zip((bass, mid, treble), colours):
        fill(col)
        for i, s in enumerate(spectrum):
            rect(i, 0, 1, -abs(s))
        else:
            # Move right so the next band is drawn beside this one
            translate(i, 0)

    audio = array(list(flatten_fft(scale = 80)))
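
As a rough guide to what the three graphs cover: bin k of an N-point FFT sits at k * SAMPLING_RATE / NUM_SAMPLES Hz, so the bands that triple() produces can be converted to frequencies. This is a minimal sketch, assuming the 85-bin bands and the skipped DC bin from the bot; band_edges_hz is a hypothetical helper, not part of the original code.

```python
SAMPLING_RATE = 11025
NUM_SAMPLES = 512

def band_edges_hz(bands=3, bins_per_band=85, first_bin=1):
    """Return (low_hz, high_hz) for each band of consecutive FFT bins.

    Hypothetical helper: first_bin=1 mirrors the bot skipping the DC bin,
    and bins_per_band=85 mirrors the 3 x 85 resize in triple().
    """
    spacing = SAMPLING_RATE / NUM_SAMPLES  # Hz between neighbouring bins
    edges = []
    for band in range(bands):
        low_bin = first_bin + band * bins_per_band
        high_bin = low_bin + bins_per_band - 1
        edges.append((low_bin * spacing, high_bin * spacing))
    return edges

for name, (low, high) in zip(("bass", "mid", "treble"), band_edges_hz()):
    print("%-6s %7.1f Hz to %7.1f Hz" % (name, low, high))
```

With these settings the bin spacing is about 21.5 Hz, so the three bands together span roughly 21 Hz to 5.5 kHz, which is the whole range available at this sampling rate.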

You missed a great “with-statement opportunity” in your read_fft() function. Make this function into a generator, put the pa.open() call in a with-statement, then you can iterate on this (call .next() on it) to get each data block. No need for atexit. Use the contextlib.closing wrapper to ensure the stream device is closed when the generator exits.
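
One way the suggestion above might look, as a sketch rather than a drop-in replacement: the names fft_blocks, open_stream and open_microphone are made up for illustration, and the generator takes a function that opens the stream so it can be tried without audio hardware. Note that contextlib.closing only calls the stream's close(); terminating the PyAudio instance itself is left out here.

```python
from contextlib import closing

import numpy as np

NUM_SAMPLES = 512
SAMPLING_RATE = 11025

def fft_blocks(open_stream, num_samples=NUM_SAMPLES):
    """Yield one FFT block per read; close the stream when the generator exits.

    open_stream is any zero-argument callable returning an object with
    read(n) and close() methods (an assumption made for this sketch).
    """
    with closing(open_stream()) as stream:
        while True:
            raw = stream.read(num_samples)
            samples = np.frombuffer(raw, dtype=np.int16) / 32768.0
            # Skip the DC bin and keep the first half of the spectrum
            yield np.fft.fft(samples)[1:1 + num_samples // 2]

def open_microphone():
    """Open the default input with the same settings as the bot."""
    import pyaudio  # imported here so the sketch loads without PyAudio installed
    pa = pyaudio.PyAudio()
    return pa.open(format=pyaudio.paInt16, channels=1,
                   rate=SAMPLING_RATE,
                   input=True, frames_per_buffer=NUM_SAMPLES)
```

In the bot this would be used as blocks = fft_blocks(open_microphone), then next(blocks) once per frame. Calling blocks.close(), or letting the generator be garbage collected, leaves the with-statement and closes the stream, so atexit is no longer needed for that part.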
Cheers, I’ll have a go at using the context manager tomorrow (it’s 12:30am here).
The maths bits are where I’m shakiest; I’m not really sure whether flatten_fft is sane, for instance (and maybe I can use numpy to do that instead?).
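
On the numpy question: the per-bin loop in flatten_fft can be written as a single array expression. This is a sketch, assuming the spectrum is already a numpy array (as read_fft returns); flatten_fft_vectorized is a hypothetical name. It only reproduces the same weighting as the loop and says nothing about whether that weighting is the right one.

```python
import numpy as np

def flatten_fft_vectorized(spectrum, num_samples=512, scale=1.0):
    """Weight bin i by i / num_samples, matching the loop in flatten_fft."""
    spectrum = np.asarray(spectrum)
    indices = np.arange(len(spectrum))  # 0, 1, 2, ... one index per bin
    return scale * indices * spectrum / num_samples
```

Because the result is already an array, the array(list(...)) wrapping in draw() would no longer be needed.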
Perhaps the graph would look better if you converted it to dB units by taking the log10 of the FFT data… just a thought!
PS: I’m with you on the need for a powerful python audio library!
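
A sketch of the dB idea, under a couple of assumptions: the usual amplitude convention of 20 * log10(magnitude), and a floor to avoid log10(0), which is minus infinity. The name to_decibels and the -80 dB default are illustrative choices, not something from the post.

```python
import numpy as np

def to_decibels(spectrum, floor_db=-80.0):
    """Convert complex FFT bins to dB relative to a magnitude of 1.0."""
    magnitude = np.abs(spectrum)
    # Clamp to the magnitude that corresponds to floor_db so silent
    # bins give floor_db instead of -inf
    min_magnitude = 10.0 ** (floor_db / 20.0)
    return 20.0 * np.log10(np.maximum(magnitude, min_magnitude))
```

With this convention a magnitude of 1.0 maps to 0 dB, 0.1 to about -20 dB, and anything quieter than the floor is drawn at -80 dB. Since the values are mostly negative, a graph would need to offset them (for example by adding 80) before using them as bar heights.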
This may seem a silly question, but what sort of range of numbers should I put into the log10 (at the moment I’m using numbers from 0 to 1.0)?
This may seem like a silly question, but what range of numbers should I put into the log10? At the moment I’m using numbers in the range 1 to 10.