Code and reproducer commands: the benchmark harness is open-source at github.com/syshlted/timepitch-bench (Apache-2.0). Every test in this post has an exact one-line reproducer in docs/USAGE.md. Independent results welcome — see CONTRIBUTING.md.
Why a new test
MondoLoop’s DSP bench already compared Signalsmith Stretch, SoundTouch, and Rubber Band on sine, sweep, impulse, and white-noise signals — CPU, peak RSS, pitch error in cents. The trouble: on pure sines, all three libraries land within ±0.5 cents. Quality on synthetic tones is essentially equivalent, so the existing battery couldn’t tell us anything about how the libraries actually colour music.
We needed a test signal with music-like spectral structure: many simultaneous partials under a known envelope.
The Shepard tone, weaponised
The idea: octave-spaced partials, swept continuously upward at 0.5 oct/sec, under a fixed Gaussian envelope in log-frequency (center = log₂(500 Hz), sigma = 2 octaves). For an ideal stretcher this gives clean predictions:
- Time-stretch: instantaneous partial frequencies at any equivalent input position should be preserved; only the sweep rate changes.
- Pitch-shift by R: every partial multiplied by R; octave structure preserved.
- FFT check: averaged output spectrum should show peaks at octave spacing.

Closed-form phase for the exponential sweep keeps things numerically clean:
φ(t) = 2π f₀ (2^(rate·t) − 1) / (rate·ln 2)
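As a rough illustration of the construction described above, here is a minimal Python sketch of the synthesis. It is not the code from signals.{h,cpp}. The function name, `f_lowest`, and `n_partials` are assumptions made for this example, and unlike a full Shepard glissando it does not reintroduce partials at the bottom as the stack sweeps upward.

```python
import math

def shepard_tone(duration_s, sample_rate=48000, rate_oct_per_s=0.5,
                 f_lowest=20.0, n_partials=10,
                 center_hz=500.0, sigma_oct=2.0):
    """Return float samples of an octave-spaced partial stack under a
    fixed Gaussian envelope in log-frequency (hypothetical helper)."""
    n = int(duration_s * sample_rate)
    nyquist = sample_rate / 2.0
    log_center = math.log2(center_hz)
    out = [0.0] * n
    for i in range(n):
        t = i / sample_rate
        if rate_oct_per_s == 0.0:
            growth = 1.0
            integral = t  # limit of (2^(rate*t) - 1) / (rate*ln 2) as rate -> 0
        else:
            growth = 2.0 ** (rate_oct_per_s * t)
            integral = (growth - 1.0) / (rate_oct_per_s * math.log(2.0))
        acc = 0.0
        for k in range(n_partials):
            f0 = f_lowest * 2.0 ** k      # starting frequency of partial k
            f_inst = f0 * growth          # instantaneous frequency at time t
            if f_inst >= nyquist:
                continue                  # drop partials that would alias
            x = (math.log2(f_inst) - log_center) / sigma_oct
            amp = math.exp(-0.5 * x * x)  # envelope fixed in log-frequency
            acc += amp * math.sin(2.0 * math.pi * f0 * integral)
        out[i] = acc
    return out
```

Because every partial shares the same sweep factor, the closed-form integral is computed once per sample and scaled by each partial's starting frequency. Passing `rate_oct_per_s=0.0` falls back to plain stationary sinusoids.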

Implementation landed across signals.{h,cpp} (the synthesis), fft.{h,cpp} (a parabolic-interpolated peak picker ranked by magnitude), and an extended QualityReport carrying input/output peak lists, median adjacent-octave ratio, and observed pitch ratio.
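For readers unfamiliar with parabolic peak interpolation, a minimal Python sketch of the standard three-point technique follows. It is an illustration only: whether the harness interpolates on log or linear magnitudes is not stated here, so the log-magnitude choice, the function names, and `max_peaks` are all assumptions.

```python
import math

def parabolic_peak(mags, k):
    """Refine a local maximum at bin k using its two neighbours.
    Returns (fractional_bin, interpolated_log_magnitude)."""
    a = math.log(mags[k - 1])
    b = math.log(mags[k])
    c = math.log(mags[k + 1])
    denom = a - 2.0 * b + c
    if denom == 0.0:
        return float(k), b                 # flat top: no refinement possible
    delta = 0.5 * (a - c) / denom          # vertex offset in bins, within [-0.5, 0.5]
    return k + delta, b - 0.25 * (a - c) * delta

def pick_peaks(mags, max_peaks=12):
    """Find local maxima, refine each one, and rank by magnitude (largest first)."""
    peaks = []
    for k in range(1, len(mags) - 1):
        if mags[k - 1] <= 0.0 or mags[k + 1] <= 0.0:
            continue                       # log() needs strictly positive values
        if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]:
            peaks.append(parabolic_peak(mags, k))
    peaks.sort(key=lambda p: p[1], reverse=True)
    return peaks[:max_peaks]
```

Multiplying the fractional bin by `sample_rate / fft_size` converts it to Hz.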
First run: two surprises
Octave preservation was excellent across all three libraries (median adjacent-peak ratio 1.999–2.001 everywhere). But the pitch-ratio detection produced two confusing results.
The +1-octave shift looked broken. Output partials coincide exactly with the input partials one octave above, so the nearest-input-peak matcher snapped to a ratio of ~1, not 2. That’s the Shepard illusion working as designed: a pitch shift of an exact octave on this signal is spectrally indistinguishable from identity. Lesson: probe pitch with non-octave shifts.
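The ambiguity is easy to reproduce in isolation. The sketch below uses a hypothetical nearest-peak matcher (not the harness's own code) on an idealised list of octave-spaced peaks. The peak frequencies are made up for the example.

```python
import math

def nearest_peak_ratio(input_hz, output_hz):
    """Median ratio of each output peak to its nearest input peak,
    where 'nearest' is measured in log-frequency."""
    ratios = []
    for f_out in output_hz:
        nearest = min(input_hz, key=lambda f_in: abs(math.log2(f_out / f_in)))
        ratios.append(f_out / nearest)
    ratios.sort()
    return ratios[len(ratios) // 2]

input_peaks = [62.5 * 2 ** k for k in range(8)]   # 62.5 Hz up to 8 kHz
octave_up = [2.0 * f for f in input_peaks]        # ideal shift by exactly one octave
fourth_up = [1.3348 * f for f in input_peaks]     # ideal non-octave shift

print(nearest_peak_ratio(input_peaks, octave_up))            # 1.0, not 2.0
print(round(nearest_peak_ratio(input_peaks, fourth_up), 4))  # approximately 1.3348
```

With this kind of matcher, any shift larger than half an octave would also snap to the next partial up, so a probe ratio below √2 is the safer choice.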
Signalsmith showed a 69-cent error at pitch ×1.3348. Suspicious: the existing sine pitch test at the same ratio puts it at 1.02 cents, roughly one cent and nowhere near 69. What went wrong?
Cross-checked against the sine baseline:
| Library | error (cents) on sine, ×1.3348 |
|---|---|
| signalsmith | 1.02 |
| soundtouch | 0.17 |
| rubberband | 0.16 |
So Signalsmith's pitch accuracy is fine. The 69 cents came from comparing mid-windows of a sweeping signal across stretchers with different algorithmic latencies. Signalsmith's 60 ms latency means its mid-output window corresponds to a slightly earlier point in the input sweep than the other libraries'. At 0.5 oct/sec, even small time misalignments turn into tens of cents.
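To get a feel for the scale, the conversion from a time offset to an apparent pitch offset is a one-liner. The helper name below is made up for this example, and the calculation only shows the order of magnitude. It does not attempt to reproduce the exact 69-cent figure, which also depends on how the analysis windows were placed.

```python
def sweep_misalignment_cents(time_offset_s, sweep_rate_oct_per_s=0.5):
    """Apparent pitch offset, in cents, caused by analysing a swept signal
    at a point time_offset_s away from the intended one."""
    return time_offset_s * sweep_rate_oct_per_s * 1200.0

# A 60 ms offset at the default sweep rate is about 36 cents.
print(round(sweep_misalignment_cents(0.060), 1))
```

With a sweep rate of 0 the offset vanishes whatever the latency, which is the motivation for a stationary mode.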
Stationary mode
Solution: add --shepard-sweep-rate <oct/sec> (default 0.5). Passing 0 produces stationary partials, eliminating sweep-vs-latency interaction entirely. Stationary results (observed pitch ratio; the ×2.0 column reads ~1 because of the octave ambiguity described earlier):
| Library | identity | pitch ×1.3348 | pitch ×2.0 | time ×1.5 |
|---|---|---|---|---|
| signalsmith | 1.0000 | 1.3364 (+2¢) | 1.0005 | 1.0000 |
| soundtouch | 1.0000 | 1.3348 (perfect) | 1.0001 | 1.0000 |
| rubberband | 1.0000 | 1.3348 (perfect) | 1.0000 | 1.0000 |
Confirmed: the earlier 69-cent error was entirely a sweep-vs-latency artifact, not a quality issue.
The envelope-preservation metric
The Shepard tone’s amplitude envelope is Gaussian in log-frequency by construction:
amplitude_k = exp(-½·((log₂ f_k − log₂ 500) / 2)²)
After a pitch shift R, the envelope should be centered at log₂(500·R) with sigma still ≈ 2 octaves. A pure time-stretch should leave it unchanged.
Fitting log(magnitude) = A + B·log₂(f) + C·log₂(f)² to the detected peaks via least squares recovers μ = −B/(2C) and σ = √(−1/(2C)) — a clean characterisation of how each stretcher preserves spectral balance.
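The fit itself is a small linear least-squares problem. The following standard-library Python sketch solves the 3×3 normal equations directly. It assumes the natural logarithm of magnitude (which is what makes σ = √(−1/(2C)) hold for the envelope as defined), uses unweighted peaks, and centres x = log₂(f) before fitting for numerical conditioning. The function name and these choices are assumptions for this example, not a description of the harness.

```python
import math

def fit_log_gaussian(freqs_hz, mags):
    """Fit ln(mag) = A + B*x + C*x^2 with x = log2(f).
    Returns (mu_oct, sigma_oct): centre in log2(Hz), width in octaves."""
    xs = [math.log2(f) for f in freqs_hz]
    ys = [math.log(m) for m in mags]
    x0 = sum(xs) / len(xs)
    us = [x - x0 for x in xs]                      # centred abscissa
    s = [sum(u ** p for u in us) for p in range(5)]
    t = [sum(y * u ** p for u, y in zip(us, ys)) for p in range(3)]
    # Augmented normal-equation matrix for the unknowns (A, B, C).
    m = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    _, b_coef, c_coef = (m[i][3] / m[i][i] for i in range(3))
    if c_coef >= 0.0:
        raise ValueError("peaks do not describe a downward-opening parabola")
    mu = x0 - b_coef / (2.0 * c_coef)              # undo the centring shift
    sigma = math.sqrt(-1.0 / (2.0 * c_coef))
    return mu, sigma
```

The centre in Hz is `2 ** mu`. At least three peaks are needed, and more make the fit less sensitive to any single biased peak.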

First results were promising, but absolute centers were biased ~1 Hz low at identity (FFT-window amplitude bias on edge peaks affects the fit). Sigma was biased ~0.05 oct low for the same reason. Critically, the bias was consistent across all three stretchers, which suggested it would cancel in a relative measurement.
Input-relative comparison
Fit the same Gaussian to the input peaks too. Report:
center_error_cents = (observed_shift_oct − expected_shift_oct) · 1200
sigma_error = output_sigma − input_sigma
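In code, the two quantities above might look like the following Python sketch. The function and argument names are chosen for this example. Centres are assumed to be expressed in log₂(Hz) and sigmas in octaves, as produced by a Gaussian fit to the input and output peak lists.

```python
import math

def envelope_errors(mu_in_oct, sigma_in_oct, mu_out_oct, sigma_out_oct, pitch_ratio):
    """Input-relative envelope errors.
    Returns (center_error_cents, sigma_error_oct)."""
    observed_shift_oct = mu_out_oct - mu_in_oct
    expected_shift_oct = math.log2(pitch_ratio)    # 0 for a pure time-stretch
    center_error_cents = (observed_shift_oct - expected_shift_oct) * 1200.0
    sigma_error_oct = sigma_out_oct - sigma_in_oct
    return center_error_cents, sigma_error_oct

# Example with invented numbers: both fits share the same analysis bias,
# so the bias drops out of the difference.
biased_mu_in = math.log2(500.0) - 0.003
biased_mu_out = math.log2(500.0 * 1.3348) - 0.003
print(envelope_errors(biased_mu_in, 1.95, biased_mu_out, 1.95, 1.3348))
```

For a time-stretch, pass `pitch_ratio=1.0` so the expected shift is zero.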
By comparing output to input rather than to theory, the analysis-side bias cancels. Identity errors fell from 1–1.5 Hz on the absolute fit to 0.2–1.4 cents on the input-relative one, across all three libraries. That finally gave a clean separation of stretcher-side spectral coloration.
When to use each mode
- Stationary Shepard (--shepard-sweep-rate 0) for envelope-precision work. Tightest, lowest-noise metric. Use this to compare libraries.
- Sweeping Shepard (default 0.5 oct/sec) for perceptual stress and transient-coherence testing. Noisier, but probes window-vs-sweep interaction in a way that mimics real material.
The actual library comparison — Rubber Band vs SoundTouch vs Signalsmith — is the next post.