HRTF and Binaural Audio

Head-Related Transfer Function (HRTF) processing is the foundation of Amplitude's 3D binaural audio. This document explains the science behind HRTF, how Amplitude implements it, and how to get the best results.

What is HRTF?

When sound travels from a source to your ears, it interacts with your head, torso, and outer ears (pinnae). These interactions create filtering, delays, and reflections that your brain uses to localize sound in 3D space.

An HRTF is a pair of digital filters (one for each ear) that captures this acoustic transformation for every possible direction around the head.

Sound Source --> [Head blocking] --> [Pinna filtering] --> [Ear drum]
                     |                    |
                     v                    v
                 Left HRTF            Right HRTF

HRIR vs. HRTF

Term | Meaning                        | Domain
HRIR | Head-Related Impulse Response  | Time domain (raw audio)
HRTF | Head-Related Transfer Function | Frequency domain (spectrum)

The HRTF is simply the Fourier transform of the HRIR. In practice, the terms are often used interchangeably because audio engines convolve with the HRIR (time domain) rather than filtering with the HRTF (frequency domain).
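This equivalence is the convolution theorem, and it can be checked numerically. A minimal sketch with NumPy; the HRIR values are illustrative (real HRIRs are 128–512 samples):

```python
import numpy as np

# A short synthetic HRIR (time domain); values are illustrative.
hrir = np.array([0.9, 0.4, -0.2, 0.1])
signal = np.array([1.0, 0.5, 0.25, 0.0, -0.5])

# Zero-pad so circular FFT convolution matches linear convolution.
n = len(signal) + len(hrir) - 1

# The HRTF is the Fourier transform of the HRIR.
hrtf = np.fft.rfft(hrir, n)

# Convolving with the HRIR (time domain) equals multiplying by the
# HRTF (frequency domain) and transforming back.
time_domain = np.convolve(signal, hrir)
freq_domain = np.fft.irfft(np.fft.rfft(signal, n) * hrtf, n)
```

Both arrays are identical up to floating-point error, which is why engines are free to pick whichever domain is cheaper.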

The HRIR Sphere

Amplitude stores HRTF data as an HRIR Sphere — a 3D mesh where each vertex contains:

  • A direction vector (position on the sphere)
  • Left ear impulse response
  • Right ear impulse response
  • Interaural time difference (ITD) delay

At runtime, when a sound is at a specific direction relative to the listener, Amplitude:

  1. Finds the triangle on the sphere mesh that contains the direction.
  2. Barycentrically interpolates the three vertex HRIRs (bilinear sampling).
  3. Convolves the sound with the interpolated left and right HRIRs.
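The steps above can be sketched as follows. The triangle search is elided, and the vertex directions and HRIRs are illustrative placeholders, not Amplitude's actual data or API:

```python
import numpy as np

def barycentric_weights(d, v1, v2, v3):
    # Solve d ~ w1*v1 + w2*v2 + w3*v3 for the weights, then normalize
    # so they sum to 1. This is a common way to interpolate within a
    # spherical triangle of unit direction vectors.
    m = np.column_stack([v1, v2, v3])   # 3x3 matrix of vertex directions
    w = np.linalg.solve(m, d)
    return w / w.sum()

# Unit direction vectors of the containing triangle (illustrative).
v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([0.0, 1.0, 0.0])
v3 = np.array([0.0, 0.0, 1.0])

# Left-ear HRIRs stored at each vertex (illustrative, 4 taps each).
h1 = np.array([1.0, 0.0, 0.0, 0.0])
h2 = np.array([0.0, 1.0, 0.0, 0.0])
h3 = np.array([0.5, 0.5, 0.0, 0.0])

# Query direction halfway between v1 and v2.
d = np.array([0.5, 0.5, 0.0])
d /= np.linalg.norm(d)

w1, w2, w3 = barycentric_weights(d, v1, v2, v3)
hrir_left = w1 * h1 + w2 * h2 + w3 * h3   # step 2: interpolate
dry = np.array([1.0, -1.0, 0.5])
wet_left = np.convolve(dry, hrir_left)    # step 3: convolve
```

The same interpolation and convolution are repeated for the right ear, and the ITD delay stored at each vertex is interpolated alongside the impulse responses.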

Spatial Cues

HRTF provides several cues that the brain uses for localization:

Interaural Time Difference (ITD)

Sound arrives at the nearer ear slightly before the farther ear. For a source at 90° azimuth, the delay is approximately 0.6–0.7 ms.

  • Primary cue for horizontal localization (left/right).
  • Effective below ~1.5 kHz (above this, phase becomes ambiguous).
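The magnitude of the ITD can be estimated with the classic Woodworth spherical-head model; the head radius and speed of sound below are textbook assumptions, not Amplitude parameters:

```python
import math

def woodworth_itd(azimuth_rad, head_radius=0.0875, speed_of_sound=343.0):
    # Woodworth spherical-head model: ITD = (a / c) * (sin(theta) + theta)
    # for azimuth theta in [0, pi/2], head radius a (m), speed of sound c (m/s).
    return (head_radius / speed_of_sound) * (math.sin(azimuth_rad) + azimuth_rad)

# A source at 90 degrees azimuth yields roughly 0.66 ms, inside the
# 0.6-0.7 ms range quoted above.
itd_90 = woodworth_itd(math.pi / 2)
```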

Interaural Level Difference (ILD)

The head shadows the farther ear, reducing high-frequency energy. The shadowing is strongest above ~4 kHz.

  • Primary cue for horizontal localization at high frequencies.
  • The level difference varies with direction and frequency.
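A broadband ILD can be estimated from a pair of HRIRs by comparing RMS energy between the ears; the impulse responses here are illustrative, and a real analysis would measure the ILD per frequency band:

```python
import numpy as np

def broadband_ild_db(hrir_left, hrir_right):
    # ILD in dB: ratio of RMS energy between the two ears.
    rms_l = np.sqrt(np.mean(np.square(hrir_left)))
    rms_r = np.sqrt(np.mean(np.square(hrir_right)))
    return 20.0 * np.log10(rms_l / rms_r)

# Source on the left: the shadowed right ear receives half the
# amplitude, giving a ~6 dB level difference toward the left ear.
left_ear = np.array([1.0, 0.3, 0.1])
right_ear = np.array([0.5, 0.15, 0.05])
ild = broadband_ild_db(left_ear, right_ear)
```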

Spectral Filtering (Pinna Cues)

The pinna creates direction-dependent notches and peaks in the spectrum. These cues are especially important for:

  • Elevation (up/down discrimination)
  • Front/back discrimination

Supported Datasets

Amplitude supports multiple publicly available HRIR datasets:

Dataset      | Subjects       | Resolution | Best For
IRCAM LISTEN | 51             | Medium     | General use, averaged listener
MIT KEMAR    | 1 (dummy head) | Medium     | Baseline reference
SADIE II     | 71             | High       | Research, personalized selection
SOFA         | Varies         | Varies     | Custom measurements

The amir CLI tool converts these datasets into Amplitude's optimized .amir format.

Binaural vs. Ambisonic Binauralization

Amplitude offers two paths to binaural output:

Direct HRTF Panning

Each mono source is panned individually using HRTF convolution. This is used by the StereoPanning node in HRTF mode.

  • Pros: Lowest latency, precise per-source control.
  • Cons: CPU cost scales with source count; blending can sound less natural.

Ambisonic Binauralization

All sources are encoded into Ambisonics first, then decoded to binaural using Ambisonic-to-HRTF decoding.

  • Pros: Natural blending, cheap rotation, consistent spatial quality.
  • Cons: Slightly higher latency, requires Ambisonic pipeline.

The default Amplitude pipeline uses Ambisonic binauralization for the best balance of quality and performance.
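The cost structure of the Ambisonic path can be sketched with horizontal-only first-order Ambisonics: every source is mixed into a fixed set of channels, and only that fixed set is then binauralized. The W/X/Y encoding gains below are a simplification for illustration, not Amplitude's actual pipeline:

```python
import numpy as np

def encode_foa_horizontal(sources):
    # sources: list of (signal, azimuth_rad) pairs. All sources are
    # summed into three Ambisonic channels (W, X, Y). Encoding cost
    # grows with source count, but the channel count stays fixed, so
    # the binaural decode afterwards does a constant amount of work.
    n = max(len(s) for s, _ in sources)
    w = np.zeros(n); x = np.zeros(n); y = np.zeros(n)
    for signal, az in sources:
        s = np.asarray(signal)
        w[:len(s)] += s                 # omnidirectional component
        x[:len(s)] += s * np.cos(az)    # front/back component
        y[:len(s)] += s * np.sin(az)    # left/right component
    return w, x, y

# Two sources: one straight ahead, one at 90 degrees to the left.
sources = [(np.array([1.0, 0.5]), 0.0),
           (np.array([0.25, 0.25]), np.pi / 2)]
w, x, y = encode_foa_horizontal(sources)
```

However many sources are added, the decode stage only has to filter these three channels with the HRTFs of a fixed virtual-speaker rig, which is where the per-source savings come from.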

Sampling Modes

When looking up HRIR data for an arbitrary direction, Amplitude supports two sampling modes:

Mode            | Method                                   | Quality | Speed
NearestNeighbor | Uses the closest vertex                  | Lower   | Faster
Bilinear        | Interpolates within the nearest triangle | Higher  | Slower

For most games, Bilinear provides noticeably smoother spatialization with acceptable CPU cost.
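NearestNeighbor mode reduces the lookup to a single vertex. On a unit sphere, the closest vertex is the one whose direction vector has the largest dot product with the query direction; a sketch with an illustrative vertex set:

```python
import numpy as np

def nearest_vertex(direction, vertex_dirs):
    # For unit vectors, maximizing the dot product is equivalent to
    # minimizing the angle to the query direction.
    return int(np.argmax(vertex_dirs @ direction))

# Illustrative sphere vertices (real spheres have hundreds of them).
vertex_dirs = np.array([
    [ 1.0, 0.0, 0.0],   # right
    [-1.0, 0.0, 0.0],   # left
    [ 0.0, 0.0, 1.0],   # front
])

d = np.array([0.9, 0.0, 0.436])   # mostly to the right
d /= np.linalg.norm(d)
idx = nearest_vertex(d, vertex_dirs)
```

The discontinuity that causes NearestNeighbor's lower quality is visible here: as a source moves, `idx` jumps between vertices, switching HRIRs abruptly instead of blending them.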

Personalization

HRTF is highly individual. A generic HRTF works for most listeners, but personalization improves accuracy:

  1. Select by anthropometry: Choose a dataset subject with similar head width and ear shape.
  2. Perceptual selection: Let the user choose the dataset that sounds most natural.
  3. Custom measurement: Use the SOFA format to import individually measured HRTF data.

Performance

HRTF convolution is the most expensive operation in the spatial audio pipeline:

Factor         | Impact
HRIR length    | Longer IRs = more convolution cost. 128–512 samples is typical.
Source count   | Each HRTF-panned source adds convolution cost.
Sampling mode  | Bilinear is ~2× more expensive than NearestNeighbor.
FFT efficiency | Amplitude uses partitioned convolution for efficiency.

For many simultaneous sources, Ambisonic binauralization is more efficient because the decode cost is fixed (one set of convolutions over the Ambisonic channels) rather than growing by one convolution per source.
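The partitioned-convolution idea can be illustrated in the time domain: the impulse response is split into fixed-size blocks whose delayed convolutions sum to the full result. Real implementations run each partition through a shared FFT so work can be spread across audio callbacks; this sketch stays in the time domain for clarity:

```python
import numpy as np

def partitioned_convolve(x, h, block):
    # Split the impulse response h into fixed-size partitions. Because
    # convolution is linear, the full convolution is the sum of each
    # partition's convolution, delayed by that partition's offset.
    y = np.zeros(len(x) + len(h) - 1)
    for start in range(0, len(h), block):
        part = h[start:start + block]
        seg = np.convolve(x, part)
        y[start:start + len(seg)] += seg
    return y

rng = np.random.default_rng(0)
dry = rng.standard_normal(32)
hrir = rng.standard_normal(10)     # illustrative 10-tap impulse response
wet = partitioned_convolve(dry, hrir, block=4)
```

The result matches a single full-length convolution exactly; the benefit of partitioning is scheduling and FFT-size flexibility, not a different output.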

Limitations

  • Headphones required: HRTF does not work well on speakers due to crosstalk.
  • Individual variation: Generic HRTFs may cause front/back confusion or inside-the-head localization.
  • Elevation ambiguity: Elevation perception is less reliable than azimuth.

Next Steps