Web Audio tuner

This is a simple tuner built with the Web Audio API. As with real-world tuners, you can also use it to generate a reference tone so you can tune by ear.

APIs used

Web Audio

This API is the core of this demo. We use it to perform different tasks, from generating synthetic sounds, to analysing the sound we get, to channelling the sound to the default audio output device.

Note: Some of the code snippets below are fragments of the source code of this demo, and as such the initialization of some variables may not appear in them. You can find them in other sections on this page and/or the source code of the demo.

AudioContext

This is the entry point of Web Audio, and it is responsible for generating all the AudioNode instances we use throughout this demo.

We start by checking whether the browser supports Web Audio by looking at whether window.AudioContext (or its webkit-prefixed version) is defined. This also sets window.AudioContext, so we can easily instantiate it later if the browser does in fact support it.

Example
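
A minimal sketch of such a check might look like the following. The function name resolveAudioContext and its globalObj parameter (a stand-in for window, so the sketch also runs outside a browser) are assumptions made for illustration, not names taken from the demo.

```javascript
// Hypothetical helper: resolve the AudioContext constructor, falling back to
// the webkit-prefixed one. `globalObj` stands in for `window`.
function resolveAudioContext(globalObj) {
  // After this assignment, globalObj.AudioContext is usable whenever
  // either the standard or the prefixed constructor exists.
  globalObj.AudioContext = globalObj.AudioContext || globalObj.webkitAudioContext;
  return typeof globalObj.AudioContext === 'function';
}

// In a browser this would be used roughly like so:
// if (resolveAudioContext(window)) {
//   const audioContext = new window.AudioContext();
// }
```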

OscillatorNode

In order to generate a synthetic sound to tune by ear we use an OscillatorNode, which we can configure to play at a specific frequency.

The "Base frequency" controls that you see when you enable this part of the tuner adjust the frequency of A4, which is used as a reference for the rest of the notes. Although all of them can be calculated from A4's frequency, we have pre-calculated them and placed them in a notes.json file that we load dynamically at runtime. After that, it's just a matter of iterating through the array of notes for a particular A4 frequency and setting the correct frequency in the oscillator node.

Example
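
The sketch below illustrates the idea under stated assumptions. The shape of notes.json is not shown on this page, so the layout used here (an object keyed by A4 frequency, each holding an array of entries with note and frequency fields) is hypothetical, as are the function names. The audioContext argument is expected to be an AudioContext instance.

```javascript
// Assumed (hypothetical) shape of the loaded notes.json:
// { "440": [ { note: "E4", frequency: 329.63 }, { note: "A4", frequency: 440 } ], ... }
function findNoteFrequency(notes, baseFrequency, noteName) {
  const notesArray = notes[String(baseFrequency)] || [];
  // Iterate through the notes for this A4 frequency until the name matches.
  for (const entry of notesArray) {
    if (entry.note === noteName) {
      return entry.frequency;
    }
  }
  return null;
}

// Create an oscillator, set its frequency and route it to the output device.
function playReferenceTone(audioContext, frequency) {
  const oscillator = audioContext.createOscillator();
  oscillator.frequency.value = frequency;
  oscillator.connect(audioContext.destination);
  oscillator.start();
  return oscillator;
}
```

Calling stop() on the returned oscillator silences the tone again.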

MediaStreamAudioSourceNode

We use this type of node as the source AudioNode, feeding it the stream of data we get from the Media Stream API (see below). Keep in mind that the sampling rate used will match the one your output device uses (typically 44.1 kHz or 48 kHz).

Example
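
As a rough sketch, creating the source node and reading the sampling rate could look like this. The helper name createMicrophoneSource is hypothetical; audioContext is assumed to be an AudioContext and stream a MediaStream obtained from getUserMedia.

```javascript
// Hypothetical helper: wrap a MediaStream in a source node and report the
// sampling rate the context is running at.
function createMicrophoneSource(audioContext, stream) {
  const source = audioContext.createMediaStreamSource(stream);
  return { source, sampleRate: audioContext.sampleRate };
}
```

The sampleRate value is worth keeping around, since it is needed later to turn a period measured in frames into a frequency in Hz.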

AnalyserNode

This node receives data from the MediaStreamAudioSourceNode and gives us access both to its frequency-domain data (computed with a Fast Fourier Transform) and to its time-domain samples. This data is later used by an autocorrelation algorithm to detect the pitch of the sound. For this node we set an fftSize of 2048, which was the maximum allowed by the Web Audio API when this demo was written (the current specification allows up to 32768). Although that is very tight for such a high sampling rate (we can only fit a tiny fraction of a second in that space), it was the best we could do without downsampling the stream ourselves.

Example
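
A sketch of the setup, with hypothetical helper names, might look like this. It also includes a small calculation showing how little audio fits in one buffer: at 44.1 kHz, 2048 samples cover roughly 46 milliseconds.

```javascript
// Hypothetical helper: create an analyser, size its buffer and feed it from
// the given source node.
function createAnalyser(audioContext, source, fftSize = 2048) {
  const analyser = audioContext.createAnalyser();
  analyser.fftSize = fftSize;
  source.connect(analyser);
  return analyser;
}

// Copy the current time-domain samples out of the analyser as bytes (0-255).
function readTimeDomainBytes(analyser) {
  const buffer = new Uint8Array(analyser.fftSize);
  analyser.getByteTimeDomainData(buffer);
  return buffer;
}

// Seconds of audio that fit in a buffer of fftSize samples.
function bufferDuration(fftSize, sampleRate) {
  return fftSize / sampleRate;
}
```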

Media Stream

From this API we only use one specific function to access the audio input device (generally the microphone): getUserMedia. Just like with AudioContext, we should consider the possibility that the API may be prefixed in some browsers or in older versions of them.

Example
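
One possible way to handle the prefixes is sketched below. The function name requestMicrophone and its nav parameter (a stand-in for navigator) are assumptions for illustration. The sketch prefers the promise-based navigator.mediaDevices.getUserMedia where it exists and otherwise wraps the older callback-based variants in a Promise.

```javascript
// Hypothetical helper: request audio input, coping with prefixed variants.
// `nav` stands in for `navigator`.
function requestMicrophone(nav) {
  const constraints = { audio: true };
  if (nav.mediaDevices && typeof nav.mediaDevices.getUserMedia === 'function') {
    return nav.mediaDevices.getUserMedia(constraints);
  }
  // Older, callback-based variants, possibly vendor-prefixed.
  const legacy = nav.getUserMedia || nav.webkitGetUserMedia ||
    nav.mozGetUserMedia || nav.msGetUserMedia;
  if (!legacy) {
    return Promise.reject(new Error('getUserMedia is not supported'));
  }
  return new Promise((resolve, reject) => {
    legacy.call(nav, constraints, resolve, reject);
  });
}

// In a browser:
// requestMicrophone(navigator).then((stream) => { /* build the audio graph */ });
```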

Pitch detection

Autocorrelation

There are a variety of methods to detect the pitch of a sound. Some work in the frequency domain (like HPS, or Harmonic Product Spectrum), while others work in the time domain (like autocorrelation). With such a high sampling rate, we can only fit a small fraction of a second in the buffer used by the AnalyserNode. Under these conditions, the latter approach usually does a better job than the former, which is why we chose it for this demo.

Autocorrelation is the process of cross-correlating a signal with a time-delayed version of itself. In other words, we will be comparing a signal at two different points in time. As Wikipedia puts it:

It is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies.

Given a delay time \(k\) we:

  1. Find the value at a time \(t\)
  2. Find the value at a time \(t+k\)
  3. Multiply those values together
  4. Accumulate those products over a series of times (1000 in our code)
  5. Divide by the number of samples to get the average

As seen in this page, the resulting formula would be something like this:

\[ R(k) = \frac{1}{t_{max} - t_{min}} \int_{t_{min}}^{t_{max}}s(t)s(t+k)dt \]
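
Since the samples we work with are discrete, the integral is evaluated as a sum in practice. One way to write the steps listed earlier, assuming \(t\) and \(k\) are measured in frames and \(N\) is the number of accumulated products (a symbol introduced here for illustration), is:

\[ R(k) \approx \frac{1}{N} \sum_{t=0}^{N-1} s(t)\,s(t+k) \]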

In addition, as you can see in the example, we also normalize the data. Since we are working with an array of bytes (0-255), we subtract 128 and divide by the same value.

Remember that we are working with periodic signals. As you may imagine, the highest correlation will happen once the signal "repeats itself"; that is, the delay with the highest correlation, \(bestK\), will match the period (in frames) of the fundamental frequency. To get that frequency, we just need to divide the sampling rate by that distance \(bestK\).

Example
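
The following is a simplified sketch of the procedure described above, not the demo's exact code. The function names and the minK, maxK and windowSize parameters are hypothetical. The buffer must hold at least windowSize + maxK samples. A production detector would typically also apply a correlation threshold to reject noise.

```javascript
// Map unsigned bytes (0-255) to floats: subtract 128, then divide by 128.
function normalize(bytes) {
  return Array.from(bytes, (b) => (b - 128) / 128);
}

// Average product of the signal with a copy of itself delayed by k frames.
function autocorrelate(samples, k, windowSize) {
  let sum = 0;
  for (let t = 0; t < windowSize; t++) {
    sum += samples[t] * samples[t + k];
  }
  return sum / windowSize;
}

// Try every delay in [minK, maxK] and keep the first one with the highest
// correlation; the sampling rate divided by that delay is the pitch in Hz.
function detectPitch(bytes, sampleRate, minK, maxK, windowSize) {
  const samples = normalize(bytes);
  let bestK = -1;
  let bestCorrelation = 0;
  for (let k = minK; k <= maxK; k++) {
    const correlation = autocorrelate(samples, k, windowSize);
    if (correlation > bestCorrelation) {
      bestCorrelation = correlation;
      bestK = k;
    }
  }
  return bestK > 0 ? sampleRate / bestK : null;
}
```

For a 2048-byte buffer and a window of 1000 samples, maxK can be at most 1048.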

Finding the right note

Now that we have the fundamental frequency, we just need to find the note with the closest frequency. Since the notesArray that we showed in previous code snippets is already sorted by this value, we only need to perform a binary search to find it.

Example
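
A binary search for the closest entry could be sketched as follows. The entry shape (objects with note and frequency fields) and the function name findClosestNote are assumptions; the array is expected to be sorted by ascending frequency.

```javascript
// Hypothetical helper: find the entry whose frequency is closest to the
// given one in an array sorted by ascending frequency.
function findClosestNote(notesArray, frequency) {
  if (notesArray.length === 0) {
    return null;
  }
  let low = 0;
  let high = notesArray.length - 1;
  // Narrow the range until low and high are neighbours (or the same index).
  while (high - low > 1) {
    const mid = Math.floor((low + high) / 2);
    if (notesArray[mid].frequency <= frequency) {
      low = mid;
    } else {
      high = mid;
    }
  }
  // Pick whichever of the two remaining candidates is nearer.
  const lowDiff = Math.abs(notesArray[low].frequency - frequency);
  const highDiff = Math.abs(notesArray[high].frequency - frequency);
  return lowDiff <= highDiff ? notesArray[low] : notesArray[high];
}
```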

Calculating the cents off pitch

As the last step, given the fundamental frequency that we have found and the frequency of the closest note, we need to work out how far the former is from the latter. This is done using the following formula:

\[ cents = \left \lfloor 1200 \frac{ \log_{10}(f/refF) }{\log_{10}2} \right \rfloor \]

Where \(f\) is the fundamental frequency and \(refF\) is the frequency of the closest note. Since \(\log_{10}2\) is a constant that we can precalculate, we just use its value directly in our code.

Example
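
The formula translates into code fairly directly. In this sketch the names LOG10_2 and centsOffPitch are hypothetical, and the constant is computed once up front rather than written out as a literal.

```javascript
// log10(2), calculated once so it is not recomputed on every call.
const LOG10_2 = Math.log10(2);

// Distance in cents from referenceFrequency to frequency, rounded down.
function centsOffPitch(frequency, referenceFrequency) {
  return Math.floor(1200 * Math.log10(frequency / referenceFrequency) / LOG10_2);
}
```

For example, a detected 445 Hz against a 440 Hz reference gives 19 cents sharp. Because the result is floored, a detected 435 Hz gives -20 rather than -19.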