Speech Codecs Input/Output Summary

The table below summarizes the codecs' features:

| Codec | Frame Length | Input | Output |
|---|---|---|---|
| AMR GSM 06.90 | 20 ms | 8 kHz, 16 bit, Mono | Bitrates (bps): 4750, 5150, 5900, 6700, 7400, 7950, 10200, 12200; a discontinuous transmission (DTX) mode is available with 1750 bps non-speech frames |
| G.726 | 8 ms | 8 kHz, 16 bit, Mono (native A-law, μ-law support possible) | Bitrates (bps): 16000, 24000, 32000, 40000 |
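To make the numbers in the table above more concrete, here is a small sketch that derives per-frame sizes from them. The helper names are made up for illustration, and the result is plain arithmetic on the table values (bitrate × frame length). It does not account for any payload or container framing overhead.

```python
# Frame sizes implied by the summary table (helper names are illustrative).

SAMPLE_RATE_HZ = 8000   # input: 8 kHz
BYTES_PER_SAMPLE = 2    # input: 16 bit, mono

AMR_BITRATES = [4750, 5150, 5900, 6700, 7400, 7950, 10200, 12200]
G726_BITRATES = [16000, 24000, 32000, 40000]


def pcm_bytes_per_frame(frame_ms):
    """Size of one uncompressed input frame, in bytes."""
    return SAMPLE_RATE_HZ * frame_ms // 1000 * BYTES_PER_SAMPLE


def coded_bits_per_frame(bitrate_bps, frame_ms):
    """Number of coded bits per frame at the given bitrate."""
    return bitrate_bps * frame_ms // 1000


if __name__ == "__main__":
    print("AMR input frame:", pcm_bytes_per_frame(20), "bytes")
    for rate in AMR_BITRATES:
        print(rate, "bps:", coded_bits_per_frame(rate, 20), "bits per frame")
    print("G.726 input frame:", pcm_bytes_per_frame(8), "bytes")
    for rate in G726_BITRATES:
        print(rate, "bps:", coded_bits_per_frame(rate, 8), "bits per frame")
```

For instance, a 20 ms AMR input frame is 320 bytes of PCM, which at 12200 bps is coded into 244 bits.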

More Speech Codecs

By the way, more speech codecs are expected to be packaged behind the DMO interface soon. With some luck, this will go even further, to video coding…

To appear:

  • AMR-WB G.722.2
  • G.711
  • G.722
  • G.722.1
  • G.723.1
  • G.726
  • G.728
  • G.729
  • GSM 06.10

Already:

I also need to put together a summary table of input/output formats, bitrates, etc.

Chances are that the following will also be packaged as DMOs:

  • Echo Canceller

AMR GSM 06.90 DMO

The following mix of technologies works very well:

  • MS VC++.NET 2003, ATL, WTL – for development environment
  • MS DirectShow of DirectX 9 – for multimedia infrastructure
  • Intel Integrated Performance Primitives 5.0 – for standard code implementation base

All together, they make up the AMR GSM 06.90 Speech Codec DirectX Media Object (DMO), shown below:

[Image: Alax.Info AMR Objects in GraphEdit]

Runtime requirements include:

  • MS CRT runtime msvcr71.dll redistributable
  • Intel IPP 5 redistributables, including:
    • at the very least – libguide40.dll, ippcore.dll, ipps.dll, ippsc.dll
    • one or more processor-specific sets, e.g. ippspx.dll, ippscpx.dll
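As a rough deployment aid, the list above can be turned into a quick check of a target folder. This is only a sketch. The function name is hypothetical, the "px" files stand in for whichever processor-specific sets are actually shipped, and it inspects a single folder, whereas Windows resolves DLLs through its regular search order (application directory, system directories, PATH).

```python
import os

# File names taken from the list above. The "px" pair is just one example
# of a processor-specific set, so treat this selection as an assumption.
REQUIRED_FILES = [
    "msvcr71.dll",
    "libguide40.dll",
    "ippcore.dll",
    "ipps.dll",
    "ippsc.dll",
    "ippspx.dll",
    "ippscpx.dll",
]


def missing_runtime_files(directory, names=REQUIRED_FILES):
    """Return the entries of `names` that are not present in `directory`.

    The comparison ignores case, since Windows file names are
    case-insensitive.
    """
    present = {entry.lower() for entry in os.listdir(directory)}
    return [name for name in names if name.lower() not in present]
```

Calling `missing_runtime_files(folder)` on the folder that holds the DMO binary returns an empty list when all listed files are in place.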

AMR Speech DMO

So, DirectX Media Objects (DMO) turned out to be a nice interface for extending DirectShow functionality. I was able to create an encoder/decoder pair of DMOs for AMR speech coding (GSM 06.90) with reasonable ease and development speed. I will probably even prepare an informational version and put it here on the website. It is likely to contain forced silence intervals, because the codec is licensed technology and can’t be distributed freely.
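How the forced silence would be inserted is not decided here. One possible approach, shown purely as a sketch, is to zero out the tail of every fixed-length window of 16-bit PCM samples. The function name and the period and silence lengths are hypothetical.

```python
from array import array


def force_silence(samples, sample_rate_hz=8000, period_s=10, silence_s=2):
    """Return a copy of 16-bit PCM samples in which the last `silence_s`
    seconds of every `period_s`-second window are zeroed out.

    The default period and silence lengths are arbitrary example values.
    """
    period = sample_rate_hz * period_s
    audible = sample_rate_hz * (period_s - silence_s)
    out = array("h", samples)  # copy, so the input is left untouched
    for i in range(len(out)):
        if i % period >= audible:
            out[i] = 0
    return out
```

With the defaults and 8 kHz input, every 10-second window keeps 8 seconds of audio followed by 2 seconds of silence.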