
Find The Timestamp Of A Sound Sample Of An Mp3 With Linux Or Python

I am slowly working on a project where it would be very useful if the computer could find where in an mp3 file a certain sample occurs. I would restrict this problem to meani

Solution 1:

MP3 is an interesting format. The underlying data is stored in 'frames', each about 0.026 seconds long at a 44.1 kHz sampling rate. Each frame holds a frequency-domain representation of the sound wave (MP3 uses a filter bank followed by a modified discrete cosine transform rather than a plain Fast Fourier transform), encoded with varying degrees of quality depending on the size, bitrate, and so on. In your case, are you certain that the mp3s have matching bitrates? If they do, a relatively straightforward grep-style approach might be possible, provided that you match on frame boundaries. However, it is entirely possible that this is not the case.
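The 0.026 second figure follows from the frame size: an MPEG-1 Layer III frame carries 1152 samples per channel, so its duration depends only on the sampling rate. A minimal sketch of that arithmetic (the function names are hypothetical, chosen for illustration):

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III frame size, per channel


def frame_duration(sample_rate_hz):
    """Duration in seconds of one MP3 frame at the given sampling rate."""
    return SAMPLES_PER_FRAME / sample_rate_hz


def frame_index_at(time_s, sample_rate_hz):
    """Index of the frame that contains the given time offset."""
    return int(time_s // frame_duration(sample_rate_hz))


print(round(frame_duration(44100), 5))  # 0.02612
print(frame_index_at(1.0, 44100))       # 38
```

So a frame-aligned search could at best locate a sample to within about 26 ms, and only if the sample itself was cut on a frame boundary.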

For a true solution, you need to process the mp3 file to some degree, to abstract away the encoding. However, there is no guarantee that the resulting waves match exactly, even for matching sounds, as bitrates and possibly frame alignment may differ. This uncertainty makes the problem much harder.

I will give you my approach to this problem, but it is worth noting that this is not the perfect way to do things, just my best swing. Even though it's the same file, there's no guarantee that frame boundaries are aligned, so I think you need to take a very wave-oriented approach, rather than a data-oriented one.

First, convert the mp3s to waves. I know that it'd be great to leave it compressed, but again I think wave-oriented is our only hope. Then, use a high-pass filter to try to remove any artifacts of audio compression that would differ between samples. Once you have two waveforms, it should be relatively straightforward to find the short sample within the longer wave. You can iterate through possible starting positions and subtract the waves. When the difference gets close to zero, you know you're close to a match.
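The subtract-and-compare idea could be sketched as follows. This is only an illustration on synthetic data, assuming both signals are already decoded to mono arrays at the same sampling rate; the function name `find_by_subtraction` and the mean-squared-error score are choices made here, not something from the answer itself.

```python
import numpy as np


def find_by_subtraction(source, snippet):
    """Brute-force search: return (offset, error) of the best match."""
    source = np.asarray(source, dtype=float)
    snippet = np.asarray(snippet, dtype=float)
    n = len(snippet)
    best_offset, best_error = -1, np.inf
    # try every possible starting position of the snippet in the source
    for offset in range(len(source) - n + 1):
        diff = source[offset:offset + n] - snippet
        error = np.mean(diff ** 2)  # close to zero means close to a match
        if error < best_error:
            best_offset, best_error = offset, error
    return best_offset, best_error


# synthetic check: cut a piece out of random noise and perturb it slightly
rng = np.random.default_rng(0)
source = rng.standard_normal(2000)
snippet = source[700:900] + 0.01 * rng.standard_normal(200)

offset, error = find_by_subtraction(source, snippet)
print(offset, error)
```

The loop costs on the order of len(source) × len(snippet) operations, so it is only practical for short signals; for full-length songs an FFT-based cross-correlation is far faster.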

Solution 2:

As suggested in Carson's answer, processing the audio gets a lot easier once the files are converted to the .wav format.

You may do so using Wernight's answer on reading mp3 in Python:

ffmpeg -i godsavethequeen.mp3 -vn -acodec pcm_s16le -ac 1 -ar 44100 -f wav godsavethequeen.wav
ffmpeg -i gstq_sample.mp3 -vn -acodec pcm_s16le -ac 1 -ar 44100 -f wav gstq_sample.wav
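If you would rather drive the conversion from Python, the same command can be wrapped with `subprocess`. This is a sketch that assumes `ffmpeg` is installed and on the PATH; the helper names `ffmpeg_wav_command` and `convert_to_wav` are hypothetical.

```python
import subprocess


def ffmpeg_wav_command(mp3_path, wav_path, rate=44100):
    """Build the same ffmpeg argument list as above (mono, 16-bit PCM)."""
    return [
        "ffmpeg", "-i", mp3_path,
        "-vn", "-acodec", "pcm_s16le",
        "-ac", "1", "-ar", str(rate),
        "-f", "wav", wav_path,
    ]


def convert_to_wav(mp3_path, wav_path, rate=44100):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(ffmpeg_wav_command(mp3_path, wav_path, rate), check=True)
```

Calling `convert_to_wav("godsavethequeen.mp3", "godsavethequeen.wav")` would then be equivalent to the first command above.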

Then, finding the position of the sample is mostly a matter of obtaining the peak of the cross-correlation function between the source (godsavethequeen.wav in this case) and the sample to look for (gstq_sample.wav). In essence, this will find the shift at which the sample looks the most like the corresponding portion in the source. This can be done with Python using scipy.signal.correlate.
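To see how the peak index maps back to a position, here is a toy example with made-up arrays. With the default `mode="full"`, the output has `len(source) + len(snippet) - 1` entries, and index `k` corresponds to the snippet starting at sample `k - (len(snippet) - 1)` of the source.

```python
import numpy as np
from scipy import signal

# tiny made-up signals: the snippet sits at index 3 of the source
source = np.array([0., 0., 0., 1., -2., 3., 0., 0.])
snippet = np.array([1., -2., 3.])

z = signal.correlate(source, snippet)  # mode="full" by default
peak = int(np.argmax(np.abs(z)))

# convert the index in the full correlation into a start sample
start = peak - (len(snippet) - 1)
print(len(z), peak, start)  # 10 5 3
```

Dividing `start` by the sampling rate turns the sample index into a time in seconds.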

A small Python script to do just that could look like:

import sys

import numpy as np
from scipy import signal
from scipy.io import wavfile

snippet_path = sys.argv[1]
source_path = sys.argv[2]

# read the sample to look for
rate_snippet, snippet = wavfile.read(snippet_path)
snippet = np.array(snippet, dtype='float')

# read the source
rate, source = wavfile.read(source_path)
source = np.array(source, dtype='float')

# resample such that both signals are at the same sampling rate (if required)
if rate != rate_snippet:
    num = int(np.round(rate * len(snippet) / rate_snippet))
    snippet = signal.resample(snippet, num)

# compute the cross-correlation (mode='full' by default)
z = signal.correlate(source, snippet)

# the peak marks the shift at which the snippet best matches the source
peak = np.argmax(np.abs(z))
start = (peak - len(snippet) + 1) / rate
end = peak / rate

print("start {} end {}".format(start, end))

Note that for good measure I've included a check to make sure both .wav files have the same sampling rate (and resample as needed), but you could alternatively make sure they are always the same when you convert them from .mp3 format, using the -ar 44100 argument to ffmpeg.
