In January 2018, some friends and I created a Raspberry Pi 3 app that transforms image input into soundscapes. Hypothetically, this could help blind people perceive their surroundings. How does it work?
Every clip of audio contains different sound frequencies with different amplitudes. We use a Hilbert curve to map this frequency domain (which is one-dimensional) onto the two-dimensional image, so each pixel in the image corresponds to a unique frequency. We then simply make the amplitude of each frequency correspond to the greyscale intensity of its pixel. Watch a YouTube video that explains the concepts better.
Please excuse our poor videography! We filmed this in one take at the end of the hackathon, so we were very tired.
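The pixel-to-frequency mapping described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the names `hilbert_d2xy` and `image_to_signal`, the 200 to 8000 Hz range, the linear frequency spacing, and the sample rate are all assumptions chosen for the example. The index-to-coordinate conversion is the standard iterative Hilbert curve algorithm.

```python
import numpy as np

def hilbert_d2xy(order, d):
    """Map distance d along a Hilbert curve to (x, y) on a 2**order square grid."""
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so sub-curves join end to end.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def image_to_signal(image, sample_rate=44100, duration=0.8,
                    f_min=200.0, f_max=8000.0):
    """Sum one sine per pixel; pixel intensity (0-255) sets the sine's amplitude.

    `image` is a square greyscale array whose side is a power of two.
    The frequency range and linear spacing are assumptions for this sketch.
    """
    n = image.shape[0]
    order = n.bit_length() - 1
    t = np.arange(int(sample_rate * duration)) / sample_rate
    freqs = np.linspace(f_min, f_max, n * n)
    signal = np.zeros_like(t)
    for d, freq in enumerate(freqs):
        x, y = hilbert_d2xy(order, d)
        amplitude = image[y, x] / 255.0
        if amplitude > 0:  # skip black pixels, they contribute nothing
            signal += amplitude * np.sin(2 * np.pi * freq * t)
    return signal
```

With these assumptions, a 4x4 image whose only white pixel sits at the start of the curve (`image[0, 0]`) produces a pure tone at `f_min`, and an all-black image produces silence.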
- numpy (for
- hilbert_curve (found here)
- PIL (Python Imaging Library)
The hilbert_curve library that we found produces Hilbert curves in a fairly inefficient manner, so increasing the resolution from 64x64 to 2048x2048, for example, may take an obscenely long time to prepare the curve.
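One possible way to reduce the preparation time is to compute every curve coordinate at once with numpy instead of point by point. The sketch below is not the hilbert_curve library's API; `hilbert_coordinates` is a hypothetical helper that applies the standard iterative index-to-coordinate algorithm to whole arrays, so the Python-level loop runs only `order` times (11 times for 2048x2048).

```python
import numpy as np

def hilbert_coordinates(order):
    """Return an (N, 2) array of (x, y) points along a Hilbert curve.

    N = 4**order, and row d holds the grid position of curve index d.
    """
    n = 1 << order
    t = np.arange(n * n, dtype=np.int64)
    x = np.zeros_like(t)
    y = np.zeros_like(t)
    s = 1
    while s < n:
        rx = (t // 2) & 1
        ry = (t ^ rx) & 1
        # Flip points in the quadrant where ry == 0 and rx == 1 ...
        flip = (ry == 0) & (rx == 1)
        x = np.where(flip, s - 1 - x, x)
        y = np.where(flip, s - 1 - y, y)
        # ... then swap x and y wherever ry == 0.
        swap = ry == 0
        x, y = np.where(swap, y, x), np.where(swap, x, y)
        x = x + s * rx
        y = y + s * ry
        t = t // 4
        s *= 2
    return np.stack([x, y], axis=1)
```

The result depends only on the resolution, so it could also be computed once and stored with `np.save`, then reloaded with `np.load` on later runs.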
Also, for some reason unknown to us, the write function provided in alsaaudio normalizes (?) the amplitudes of the signal passed in, so when we used the function to write a sine curve with amplitude 1e-12, it produced audio just as loud as it did with a larger amplitude.
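We have not confirmed where this scaling happens. One way to narrow it down is to convert the float signal to PCM bytes explicitly with a fixed scale before handing it to the audio device. The sketch below is an assumption-laden illustration: `to_pcm16` is a hypothetical helper, and it assumes the device is configured for signed 16-bit little-endian samples. With a fixed scale, a sine of amplitude 1e-12 rounds to all-zero samples, so any loud output would have to come from somewhere else.

```python
import numpy as np

def to_pcm16(signal, full_scale=1.0):
    """Convert a float signal to signed 16-bit little-endian PCM bytes.

    `full_scale` is the float amplitude that maps to the largest int16
    value; nothing is rescaled based on the signal's own peak.
    """
    scaled = np.asarray(signal, dtype=np.float64) / full_scale
    scaled = np.clip(scaled, -1.0, 1.0)  # hard-limit anything out of range
    samples = np.round(scaled * 32767).astype("<i2")
    return samples.tobytes()

# A very quiet sine quantizes to silence under a fixed scale.
t = np.arange(800) / 8000.0
quiet = 1e-12 * np.sin(2 * np.pi * 440.0 * t)
quiet_bytes = to_pcm16(quiet)
```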
Finally, the algorithm plays 0.8 seconds (a constant stored as signal_time_length) of audio for each image, and only after that 0.8-second period does it capture a new image. This creates a refresh period of around 1 second for the audio output. Theoretically, we should be able to set signal_time_length to a shorter period, but something odd that the Python interpreter does with thread locks prevents this from working.
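One possible restructuring, shown here only as a sketch, is to capture and synthesize the next image on a worker thread while the current audio is still playing, handing results over through a one-slot queue. This is not how the project currently works, and it is untested against the thread-lock issue described above. `run_pipeline` and its three callables (`capture_image`, `image_to_audio`, `play_audio`) are hypothetical stand-ins for the camera, synthesis, and playback steps.

```python
import queue
import threading

def run_pipeline(capture_image, image_to_audio, play_audio, num_frames):
    """Play audio for `num_frames` images, preparing the next one in parallel."""
    buffer = queue.Queue(maxsize=1)  # at most one prepared clip waits here

    def producer():
        for _ in range(num_frames):
            # Blocks when the slot is full, so capture never runs far ahead.
            buffer.put(image_to_audio(capture_image()))
        buffer.put(None)  # sentinel: no more clips

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    while True:
        audio = buffer.get()
        if audio is None:
            break
        play_audio(audio)  # the worker prepares the next clip meanwhile
    worker.join()
```

Because the queue holds a single item, clips are played in capture order and the audio lags the camera by at most one or two frames.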
We would like to acknowledge the following:
- Build18 officers and sponsors. Build18 is the 4.5-day hackathon in which we completed this project. The hackathon provided us with funding for building materials, space for working, and snacks to keep us going.
- 3Blue1Brown (an awesome YouTube channel that made a great video on the Hilbert curve and its potential use in mapping visual space to audio space).
- minerscale for open-sourcing initial (if extremely inefficient) code.
- jburkardt for providing Hilbert curve code.