# Speex Audio Encoder for .NET

Video Capture SDK .Net, Video Edit SDK .Net, Media Blocks SDK .Net

# Introduction to Speex

Speex is a patent-free audio codec designed specifically for speech. It compresses voice efficiently across a range of bitrates and sample rates. VisioForge integrates the encoder into its .NET SDKs and exposes its configuration options for speech-based applications.

# Core Functionality

The Speex encoder in VisioForge SDKs supports:

  • Multiple frequency bands for different quality levels
  • Variable and fixed bitrate encoding
  • Voice activity detection and silence compression
  • Adjustable complexity and quality settings

# Cross-platform Implementation

Available in: VideoCaptureCoreX, VideoEditCoreX, MediaBlocksPipeline

# Encoder Modes

Speex offers four mode settings: automatic selection plus three bands, each optimized for a different sample rate:

| Mode | Value | Optimal Sample Rate |
|------|-------|---------------------|
| Auto | 0 | Automatic selection based on input |
| Ultra Wide Band | 1 | 32 kHz |
| Wide Band | 2 | 16 kHz |
| Narrow Band | 3 | 8 kHz |

The encoder automatically adjusts internal parameters based on the selected mode. For most speech applications, Wide Band (mode 2) offers an excellent balance between quality and bandwidth usage.
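The mode table can be expressed as a small lookup from sample rate to mode. The sketch below is illustrative only: `SpeexBandSketch` and `SpeexModeSketch` are hypothetical names, not SDK types. The enum is a stand-in for the SDK's `SpeexEncoderMode` so the code compiles on its own, with numeric values copied from the table.

```csharp
// Hypothetical stand-in for the SDK's SpeexEncoderMode enum, so this sketch
// compiles without the SDK. Numeric values follow the mode table.
public enum SpeexBandSketch
{
    Auto = 0,
    UltraWideBand = 1,
    WideBand = 2,
    NarrowBand = 3
}

public static class SpeexModeSketch
{
    // Returns the band whose optimal sample rate matches the input exactly.
    // Any other rate falls back to Auto so the encoder makes the choice.
    public static SpeexBandSketch ForSampleRate(int sampleRateHz)
    {
        switch (sampleRateHz)
        {
            case 8000: return SpeexBandSketch.NarrowBand;
            case 16000: return SpeexBandSketch.WideBand;
            case 32000: return SpeexBandSketch.UltraWideBand;
            default: return SpeexBandSketch.Auto;
        }
    }
}
```

For example, `SpeexModeSketch.ForSampleRate(16000)` returns `WideBand`, whose numeric value is 2. Falling back to `Auto` for unlisted rates is a design choice of this sketch, made so that the caller never forces a band the input does not match.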

# Technical Specifications

# Supported Sample Rates

Speex works with three standard sampling frequencies:

  • 8,000 Hz - Best for telephone-quality audio (Narrow Band)
  • 16,000 Hz - Recommended for most voice applications (Wide Band)
  • 32,000 Hz - Highest quality speech encoding (Ultra Wide Band)

# Channel Configuration

The encoder handles both:

  • Mono (1 channel) - Ideal for speech recordings
  • Stereo (2 channels) - For multi-speaker or immersive audio

# Rate Control Methods

# Quality-Based Encoding

For consistent perceptual quality, use the Quality parameter:

var settings = new SpeexEncoderSettings {
    Quality = 8.0f, // Range from 0 (lowest) to 10 (highest)
    VBR = false     // Fixed quality mode
};

Higher quality values produce better audio at the cost of larger files. Most speech applications work well with quality values between 5 and 8.

# Variable Bit Rate (VBR)

VBR dynamically adjusts the bitrate based on speech complexity:

var settings = new SpeexEncoderSettings {
    VBR = true,
    Quality = 8.0f  // Target quality level
};

This approach typically saves bandwidth while maintaining consistent perceived quality, making it ideal for streaming applications.

# Average Bit Rate (ABR)

ABR maintains a target bitrate over time while allowing quality fluctuations:

var settings = new SpeexEncoderSettings {
    ABR = 15.0f,   // Target bitrate in kbps
    VBR = true     // Required for ABR mode
};

This option works well when you need predictable file sizes or bandwidth usage.
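Because ABR targets an average rate, the expected output size follows from simple arithmetic: bits per second times duration, divided by 8. The helper below is a minimal sketch with a hypothetical name (`SpeexSizeSketch`). It assumes 1 kbps means 1000 bits per second and ignores container overhead, so real files will be somewhat larger.

```csharp
using System;

public static class SpeexSizeSketch
{
    // Estimates the encoded payload size in bytes for a stream that averages
    // `kbps` kilobits per second. Assumes 1 kbps = 1000 bit/s and ignores
    // any container or framing overhead.
    public static long EstimateBytes(double kbps, TimeSpan duration)
    {
        if (kbps < 0) throw new ArgumentOutOfRangeException(nameof(kbps));
        if (duration < TimeSpan.Zero) throw new ArgumentOutOfRangeException(nameof(duration));

        double bits = kbps * 1000.0 * duration.TotalSeconds;
        return (long)Math.Round(bits / 8.0);
    }
}
```

At the 15 kbps target used above, one minute of audio comes to 15,000 × 60 / 8 = 112,500 bytes, which is what `SpeexSizeSketch.EstimateBytes(15.0, TimeSpan.FromMinutes(1))` returns.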

# Fixed Bitrate Encoding

For consistent data rates throughout the encoding process:

var settings = new SpeexEncoderSettings {
    Bitrate = 24.6f,  // Fixed rate in kbps
    VBR = false
};

Supported bitrates range from 2.15 kbps to 24.6 kbps:

  • 2.15 kbps - Ultra-compressed speech (limited quality)
  • 3.95 kbps - Low bandwidth voice
  • 5.95 kbps - Basic speech clarity
  • 8.00 kbps - Standard voice quality
  • 11.0 kbps - Good speech reproduction
  • 15.0 kbps - Near-transparent speech
  • 18.2 kbps - High-quality voice
  • 24.6 kbps - Maximum quality speech
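The list above does not say what happens when `Bitrate` is set to a value between the listed rates. One cautious option is to snap a requested rate to the nearest listed value before assigning it. The sketch below does that; `SpeexBitrateSketch` is a hypothetical helper, not part of the SDK.

```csharp
using System;

public static class SpeexBitrateSketch
{
    // The fixed bitrates listed in this section, in kbps, ascending.
    private static readonly double[] SupportedKbps =
        { 2.15, 3.95, 5.95, 8.0, 11.0, 15.0, 18.2, 24.6 };

    // Returns the listed bitrate closest to the requested one.
    // On an exact tie the lower bitrate wins, because a candidate only
    // replaces the current best when it is strictly closer.
    public static double Nearest(double requestedKbps)
    {
        double best = SupportedKbps[0];
        foreach (double candidate in SupportedKbps)
        {
            if (Math.Abs(candidate - requestedKbps) < Math.Abs(best - requestedKbps))
            {
                best = candidate;
            }
        }
        return best;
    }
}
```

A request for 12 kbps snaps to 11.0, and anything above 24.6 kbps is clamped to 24.6. The result can then be cast to `float` and assigned to `Bitrate`.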

# Voice Optimization Features

# Voice Activity Detection (VAD)

VAD identifies the presence of speech in audio signals:

var settings = new SpeexEncoderSettings {
    VAD = true,    // Enable voice detection
    DTX = true     // Recommended with VAD
};

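VAD classifies each frame as speech or as silence/background noise, so the encoder can spend fewer bits on non-speech segments. In the Speex reference encoder, VAD is always active when VBR is enabled, so the flag mainly matters for non-VBR encoding.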

# Discontinuous Transmission (DTX)

DTX reduces data transmission during silence periods:

var settings = new SpeexEncoderSettings {
    DTX = true     // Enable silence compression
};

For VoIP and real-time communications, enabling DTX can significantly reduce bandwidth requirements.

# Encoding Complexity

Control CPU usage versus encoding quality:

var settings = new SpeexEncoderSettings {
    Complexity = 3  // Range: 1 (fastest) to 10 (highest quality)
};

Lower values prioritize speed and reduce CPU load, while higher values improve audio quality at the cost of performance.

# Implementation Examples

# Checking Encoder Availability

Always verify that the encoder is available before configuring it:

if (!SpeexEncoderSettings.IsAvailable())
{
    throw new InvalidOperationException("Speex encoder not available on this system.");
}

# Basic Configuration

var encoderSettings = new SpeexEncoderSettings
{
    Mode = SpeexEncoderMode.WideBand,
    SampleRate = 16000,
    Channels = 1,
    Quality = 7.0f
};

# Optimized for Voice Calls

var voipSettings = new SpeexEncoderSettings
{
    Mode = SpeexEncoderMode.WideBand,
    SampleRate = 16000,
    Channels = 1,
    VBR = true,
    VAD = true,
    DTX = true,
    Quality = 6.0f,
    Complexity = 4
};

# Highest Quality Speech

var highQualitySettings = new SpeexEncoderSettings
{
    Mode = SpeexEncoderMode.UltraWideBand,
    SampleRate = 32000,
    Channels = 2,
    Bitrate = 24.6f,
    Complexity = 8
};

# SDK Integration

# Video Capture SDK Integration

// Create a Video Capture SDK core instance
var core = new VideoCaptureCoreX();

// Configure Speex settings
var speexSettings = new SpeexEncoderSettings
{
    Mode = SpeexEncoderMode.WideBand,
    SampleRate = 16000,
    Channels = 1,
    VBR = true,
    Quality = 7.0f
};

// Add the Speex output
core.Outputs_Add(speexSettings, true);

# Video Edit SDK Integration

// Create a Video Edit SDK core instance
var core = new VideoEditCoreX();

// Configure Speex settings
var speexSettings = new SpeexEncoderSettings
{
    Mode = SpeexEncoderMode.WideBand,
    SampleRate = 16000,
    VBR = true,
    ABR = 15.0f
};

// Set the output format
core.Output_Format = speexSettings;

# Media Blocks SDK Integration

// Configure Speex settings
var speexSettings = new SpeexEncoderSettings
{
    Mode = SpeexEncoderMode.NarrowBand,
    SampleRate = 8000,
    DTX = true,
    VAD = true
};

// Create a Speex encoder block
var speexEncoder = new SpeexEncoderBlock(speexSettings);

# Performance Optimization

When implementing Speex encoding, consider these optimization strategies:

  1. Match sample rate to content - Use Narrow Band (8 kHz) for telephone audio, Wide Band (16 kHz) for most voice applications, and Ultra Wide Band (32 kHz) only when maximum quality is required

  2. Enable VBR with VAD/DTX for speech content - This combination provides optimal bandwidth efficiency for typical voice recordings

  3. Adjust complexity based on platform - Mobile applications may benefit from lower complexity values (2-4), while desktop applications can use higher values (5-8)

  4. Use ABR for streaming - Average Bit Rate provides predictable bandwidth usage while maintaining quality flexibility

  5. Test different quality settings - Often a quality setting of 5-7 provides excellent results without excessive file size
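The recommendations above can be collected into one starting-point function. This is a sketch under stated assumptions: `SpeechProfile`, `SpeechSource`, and `SpeexRecommendations` are hypothetical types invented for illustration, and the specific complexity and quality values are simply picks from inside the ranges given in the list. The property names mirror those used on `SpeexEncoderSettings` in this document so the values can be copied across by hand.

```csharp
// Hypothetical plain data holder; property names mirror the
// SpeexEncoderSettings properties used elsewhere in this document.
public sealed class SpeechProfile
{
    public int SampleRate { get; set; }
    public bool VBR { get; set; }
    public bool VAD { get; set; }
    public bool DTX { get; set; }
    public int Complexity { get; set; }
    public float Quality { get; set; }
}

// Hypothetical content categories matching tip 1.
public enum SpeechSource { Telephone, GeneralVoice, MaximumQuality }

public static class SpeexRecommendations
{
    public static SpeechProfile Recommend(SpeechSource source, bool isMobile)
    {
        // Tip 1: match the sample rate to the content.
        int sampleRate;
        switch (source)
        {
            case SpeechSource.Telephone: sampleRate = 8000; break;
            case SpeechSource.MaximumQuality: sampleRate = 32000; break;
            default: sampleRate = 16000; break;
        }

        return new SpeechProfile
        {
            SampleRate = sampleRate,
            // Tip 2: VBR together with VAD/DTX for speech content.
            VBR = true,
            VAD = true,
            DTX = true,
            // Tip 3: 3 sits in the 2-4 mobile range, 6 in the 5-8 desktop range.
            Complexity = isMobile ? 3 : 6,
            // Tip 5: 6 sits in the suggested 5-7 range; tune by listening tests.
            Quality = 6.0f
        };
    }
}
```

Treat the returned values as defaults to test against real recordings rather than as fixed settings. Tip 4 (ABR for streaming) is left out because the right target bitrate depends on the available bandwidth.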

# Use Cases

Speex encoding excels in these developer scenarios:

  • VoIP applications and internet telephony
  • Voice chat features in games and collaboration tools
  • Podcast creation and distribution
  • Speech recognition preprocessing
  • Voice note applications
  • Audio archiving of speech content

The VisioForge implementation of Speex provides .NET developers with a powerful tool for speech compression that balances quality, performance, and bandwidth efficiency across a wide range of applications.