What the study found
Reducing the bit depth of a neural network’s output tensor can degrade speaker recognition accuracy, and overly large reductions can cause a significant drop compared with the baseline network.
Why the authors say this matters
Voice is increasingly being proposed as a verification key, and large biometric databases require substantial storage and RAM. The authors suggest that quantizing the output tensor could reduce the resources needed to run a biometric system, without any additional neural network training.
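To give a rough sense of the storage argument, the sketch below estimates the size of a database that keeps one output tensor per enrolled speaker at different bit depths. The database size, the 192-value tensor length, and the function name `embedding_store_bytes` are illustrative assumptions, not figures from the study.

```python
def embedding_store_bytes(num_speakers: int, dim: int, bits_per_value: int) -> int:
    """Bytes needed to store one tightly bit-packed vector per speaker."""
    total_bits = num_speakers * dim * bits_per_value
    return (total_bits + 7) // 8  # round up to whole bytes


# Hypothetical sizes chosen for illustration only (not from the study).
NUM_SPEAKERS = 1_000_000
EMBEDDING_DIM = 192

for bits in (32, 8, 6, 4):
    size_mb = embedding_store_bytes(NUM_SPEAKERS, EMBEDDING_DIM, bits) / 1e6
    print(f"{bits:>2}-bit values: {size_mb:,.0f} MB")
```

Under these assumed numbers, storage scales linearly with bit depth, so moving from 32-bit to 4-bit values would cut the footprint by a factor of eight. Whether accuracy survives that reduction is the question the study examines.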
What the researchers tested
The researchers examined how quantizing the output tensor of speaker-recognition neural networks, that is, reducing the number of bits used to represent its values, affected recognition accuracy. They tested three models, CAM++, WavLM, and ReDimNet, on the English-language VoxCeleb-1 dataset and measured performance with Equal Error Rate (EER), the error rate at the decision threshold where false acceptances and false rejections are equally frequent.
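For readers unfamiliar with EER, here is a minimal sketch of how it can be computed from verification trial scores. The function name `equal_error_rate` and the toy scores are illustrative assumptions; the study's own evaluation code is not described in the abstract and may differ, for example by interpolating between thresholds.

```python
import numpy as np


def equal_error_rate(scores, labels):
    """Approximate EER from similarity scores and same-speaker labels.

    scores: higher means "more likely the same speaker".
    labels: True for same-speaker trials, False for different-speaker trials.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    target = scores[labels]
    nontarget = scores[~labels]

    # Try every observed score as the accept/reject threshold.
    thresholds = np.unique(scores)
    far = np.array([(nontarget >= t).mean() for t in thresholds])  # false accepts
    frr = np.array([(target < t).mean() for t in thresholds])      # false rejects

    # EER is where the two error rates cross; take the closest threshold.
    idx = np.argmin(np.abs(far - frr))
    return float((far[idx] + frr[idx]) / 2)


# Toy trials (made up): one impostor scores higher than one genuine trial.
print(equal_error_rate([0.9, 0.4, 0.6, 0.1], [True, True, False, False]))
```

A lower EER means better recognition, so a quantization scheme that leaves EER close to the 32-bit baseline would be preserving accuracy.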
What worked and what didn't
The abstract notes that output tensors typically use 32-bit values and singles out minifloat formats of 8, 6, and 4 bits as being of particular interest. The study assessed how lowering bit depth affected recognition accuracy across all three architectures, but the abstract gives no numerical results for individual bit depths. It does state that excessive bit-depth reduction can significantly worsen recognition quality relative to the baseline network.
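To make the idea of a minifloat concrete, the sketch below simulates rounding 32-bit-style values to a small floating-point grid with one sign bit, `exp_bits` exponent bits, and `man_bits` mantissa bits. The abstract does not say how the bits were split, so the E4M3, E3M2, and E2M1 layouts, the choice to treat every bit pattern as a finite number, the per-vector scaling, and the random test vector are all assumptions made for illustration.

```python
import numpy as np


def minifloat_max(exp_bits, man_bits):
    """Largest finite magnitude, assuming no codes are reserved for inf/NaN."""
    bias = 2 ** (exp_bits - 1) - 1
    max_exp = (2 ** exp_bits - 1) - bias
    return (2.0 - 2.0 ** -man_bits) * 2.0 ** max_exp


def quantize_minifloat(x, exp_bits, man_bits):
    """Round values to the nearest representable minifloat (simulated in float64)."""
    x = np.asarray(x, dtype=np.float64)
    bias = 2 ** (exp_bits - 1) - 1
    min_exp = 1 - bias                    # smallest normal exponent
    max_exp = (2 ** exp_bits - 1) - bias
    max_val = minifloat_max(exp_bits, man_bits)

    mag = np.minimum(np.abs(x), max_val)  # saturate instead of overflowing
    _, e = np.frexp(mag)                  # mag = m * 2**e with m in [0.5, 1)
    # Clamping at min_exp gives tiny values the fixed subnormal spacing.
    exp = np.clip(e - 1, min_exp, max_exp)
    step = np.ldexp(1.0, exp - man_bits)  # grid spacing within each binade
    q = np.minimum(np.round(mag / step) * step, max_val)
    return np.sign(x) * q


# Assumed bit layouts: 1 sign bit + exponent bits + mantissa bits.
FORMATS = {"8-bit (E4M3)": (4, 3), "6-bit (E3M2)": (3, 2), "4-bit (E2M1)": (2, 1)}

rng = np.random.default_rng(0)
embedding = rng.standard_normal(192)      # stand-in for a model's output tensor

for name, (e_bits, m_bits) in FORMATS.items():
    # Hypothetical per-vector scale so the largest value uses the full range.
    scale = minifloat_max(e_bits, m_bits) / np.max(np.abs(embedding))
    restored = quantize_minifloat(embedding * scale, e_bits, m_bits) / scale
    cos = restored @ embedding / (np.linalg.norm(restored) * np.linalg.norm(embedding))
    print(f"{name}: cosine similarity to original = {cos:.4f}")
```

The cosine similarity printed here only indicates how much a single vector is distorted. It is not a substitute for the EER measurements the study reports, which depend on the actual models and trial lists.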
What to keep in mind
The abstract does not list detailed model-by-model results or thresholds for when performance became unacceptable. It also does not describe limitations beyond noting that too much reduction in bit depth can harm recognition quality.
Key points
- The study found that lowering the bit depth of the output tensor can reduce speaker recognition accuracy.
- The researchers tested CAM++, WavLM, and ReDimNet on the VoxCeleb-1 dataset.
- Recognition performance was evaluated with Equal Error Rate (EER).
- The abstract highlights 8-, 6-, and 4-bit minifloat formats as alternatives to the usual 32-bit representation.
- The authors say quantization may help reduce storage and RAM needs in biometric systems without additional training.
Disclosure
- Research title: Lower bit depth reduced speaker recognition accuracy
- Publication date: 2026-02-04