Archive for the ‘Audio’ Category:
Ffdshow is getting more annoying
It did not take long for new problems to show up with ffdshow installed as part of K-Lite Codec Pack. No wonder though; let us take a look at the registration information:
Display Name: @device:sw:{083863F1-70DE-11D0-BD40-00A0C911CE86}\{0F40E1E5-4F79-4988-B1A9-CC98794E6B55}
CLSID: {0F40E1E5-4F79-4988-B1A9-CC98794E6B55}
Friendly Name: ffdshow Audio Decoder
Path: C:\Program Files\K-Lite Codec Pack\ffdshow\ffdshow.ax
Merit: 0x3fffffff
Nice merit, ain’t it? What is merit anyway? Let us check at MSDN:
MERIT_PREFERRED = 0x800000,
MERIT_NORMAL = 0x600000,
…
MERIT_HW_COMPRESSOR = 0x100050
The highest defined value is 0x00800000, while ffdshow is registered with 0x3FFFFFFF, that is, above everything else. No doubt the developers read Guidelines for Registering Filters and decided to get rid of the rest of the installed software as unnecessary crap.
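To put the numbers side by side, here is a minimal sketch that compares the registered merit against the documented constants quoted above. The helper name `exceeds_preferred` is made up for illustration and is not part of any DirectShow API.

```python
# Merit constants as quoted from MSDN above.
MERIT_PREFERRED = 0x800000
MERIT_NORMAL = 0x600000
MERIT_HW_COMPRESSOR = 0x100050

# Merit from the ffdshow Audio Decoder registration shown above.
FFDSHOW_MERIT = 0x3FFFFFFF


def exceeds_preferred(merit):
    """Return True if a merit is above the highest documented constant."""
    return merit > MERIT_PREFERRED


if __name__ == "__main__":
    print(hex(FFDSHOW_MERIT), "above MERIT_PREFERRED:", exceeds_preferred(FFDSHOW_MERIT))
    print("ratio to MERIT_PREFERRED:", FFDSHOW_MERIT / MERIT_PREFERRED)
```

The ratio comes out at nearly 128, so any filter registered with a documented merit loses to ffdshow whenever the graph builder tries candidates in merit order.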
Anyway, back to the problem: I had an A-law wave file (WAVE_FORMAT_ALAW) to play in order to make sure its data was valid, and quite unexpectedly there was silence during playback. A quick check confirmed that the system had the CCITT A-law codec installed; however, GraphEdit showed ffdshow Audio Decoder intercepting the decoding. Obviously it spoiled the thing!
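One way to check the A-law data without trusting any installed decoder is to expand it by hand. Below is a minimal sketch of the commonly used G.711 A-law expansion. The function names and the silence threshold are my own assumptions, and the input is assumed to be the raw data bytes with the WAV header already stripped.

```python
def alaw_to_linear(code):
    """Expand one 8-bit A-law code to a signed 16-bit PCM sample."""
    code ^= 0x55                       # undo the even-bit inversion done by the encoder
    mantissa = (code & 0x0F) << 4      # 4-bit mantissa
    segment = (code & 0x70) >> 4       # 3-bit segment (exponent)
    if segment == 0:
        magnitude = mantissa + 8
    else:
        magnitude = (mantissa + 0x108) << (segment - 1)
    # After the XOR, a set top bit means a positive sample.
    return magnitude if code & 0x80 else -magnitude


def decode_alaw(data):
    """Decode a bytes object of A-law codes into a list of PCM samples."""
    return [alaw_to_linear(b) for b in data]


def is_silent(samples, threshold=16):
    """Hypothetical check: every sample stays within +/- threshold."""
    return all(abs(s) <= threshold for s in samples)
```

If `is_silent(decode_alaw(payload))` is false for the file's payload, the data itself carries sound and the silence comes from whatever filter ended up in the graph.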
Finally I decided it was the right time to take advantage of the IAMGraphBuilderCallback interface to detect and reject the bastard.
Skype and SIP
See Why does the N770 have Google Talk instead of Skype? on Robin Jewsbury’s Forum Nokia Blog. I do share the opinion that SIP is going to be a mainstream protocol for Internet Voice [and Video?] applications, though there are other opinions (including SIMPLE, IAX etc.). It does not, however, seem to be happening very fast, since, as we can see, XMPP is becoming popular very slowly too.
There is no doubt SIP will be the underlying technology of all P2P systems in the future and no doubt in my mind that Skype will have to move to SIP at some stage. SIP is far more efficient than Skype.
Speech Codecs Library
Summary
The library implements some of the speech codecs provided with the Intel IPP Library samples (G.726, G.728, GSM 06.90 AMR) as DirectShow filters (using DirectX Media Object technology), making them available to a wide range of DirectX 9 compatible applications.
The use of Intel Integrated Performance Primitives makes the processing efficient.
See the DirectX GraphEdit utility screenshots below to get an idea of the availability of the codecs:
Announcement: Intel IPP speech coding algorithms as DirectX Media Objects
Getting back to audio codecs based on the Intel IPP library, version 5.0. Basically, I made a wrapper over Intel’s Unified Speech Codec (USC) implementation that exposes a codec as a DMO, which makes it available to a wide range of media applications. At the moment the GSM 06.90 AMR and G.726 codecs are available, but the others can be added with ease. I still have only a vague idea about licensing, but I think it would hurt no one if a demo library is published. The demo will have some randomly forced silence intervals. I believe it is going to be sufficient to evaluate the operating performance and processing quality of the codecs.
Speech Codecs Input/Output Summary
The table below shows the codecs’ features:
| Codec | Frame Length | Input | Output |
|---|---|---|---|
| AMR GSM 06.90 | 20 ms | 8 kHz, 16-bit, mono | Bitrates (bps): 4750, 5150, 5900, 6700, 7400, 7950, 10200, 12200; a discontinuous transmission mode (DTX) available with 1750 bps non-speech frames |
| G.726 | 8 ms | 8 kHz, 16-bit, mono (A-law, μ-law native support possible) | Bitrates (bps): 16000, 24000, 32000, 40000 |
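The frame lengths and bitrates in the table translate directly into buffer sizes. The following is a rough sketch of that arithmetic, assuming the 8 kHz, 16-bit mono input from the table. The helper names are made up for illustration and are not taken from the library.

```python
SAMPLE_RATE_HZ = 8000      # input sample rate from the table
BYTES_PER_SAMPLE = 2       # 16-bit mono PCM


def pcm_bytes_per_frame(frame_ms):
    """Size of one uncompressed input frame in bytes."""
    samples = SAMPLE_RATE_HZ * frame_ms // 1000
    return samples * BYTES_PER_SAMPLE


def encoded_bits_per_frame(bitrate_bps, frame_ms):
    """Number of payload bits one encoded frame carries at a given bitrate."""
    return bitrate_bps * frame_ms // 1000


if __name__ == "__main__":
    # AMR: 20 ms frames; G.726: 8 ms frames (values from the table).
    for name, frame_ms, bitrates in (
        ("AMR", 20, (4750, 5150, 5900, 6700, 7400, 7950, 10200, 12200)),
        ("G.726", 8, (16000, 24000, 32000, 40000)),
    ):
        print(name, "input frame:", pcm_bytes_per_frame(frame_ms), "bytes")
        for bps in bitrates:
            print("  ", bps, "bps ->", encoded_bits_per_frame(bps, frame_ms), "bits per frame")
```

These figures cover the codec payload only; any framing or header bytes a particular container adds are not counted.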