Category Archives: Audio - Page 2

Confusing AUDIO_STREAM_CONFIG_CAPS

I don’t have any idea who makes software nowadays, but how can it be expected to be reliable?

Intel DG33FBC motherboard, onboard Realtek ALC888 High Definition Audio. I am tracing the AUDIO_STREAM_CONFIG_CAPS capabilities reported by the onboard audio capture device; here is one of them:

AM_MEDIA_TYPE:

majortype {73647561-0000-0010-8000-00AA00389B71}, subtype {00000001-0000-0010-8000-00AA00389B71}, pUnk 0x00000000
bFixedSizeSamples 1, bTemporalCompression 0, lSampleSize 4
formattype {05589F81-C356-11CE-BF01-00AA0055595A}, cbFormat 18, pbFormat 0x002911a8
pbFormat as WAVEFORMATEX:
  wFormatTag 1
  nChannels 2
  nSamplesPerSec 8000
  nAvgBytesPerSec 32000
  nBlockAlign 4
  wBitsPerSample 16
  cbSize 0

AUDIO_STREAM_CONFIG_CAPS:

guid {73647561-0000-0010-8000-00AA00389B71}
MinimumChannels 1, MaximumChannels 2, ChannelsGranularity 1
MinimumBitsPerSample 8, MaximumBitsPerSample 16, BitsPerSampleGranularity 8
MinimumSampleFrequency 11025, MaximumSampleFrequency 44100, SampleFrequencyGranularity 11025

The media type’s sampling frequency is 8 kHz (correct), but the associated capabilities structure still reports a different sampling rate range and granularity (crap). In fact, it reports 11025..44100 Hz for all capabilities, including those whose media types use sampling frequencies from a different row.
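The mismatch can be checked mechanically. Below is a minimal sketch, assuming a portable stand-in struct (SampleRateCaps is hypothetical and only mirrors the three sampling frequency fields of AUDIO_STREAM_CONFIG_CAPS; it is not the SDK definition). The treatment of zero granularity is also my assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the sampling frequency fields of
// AUDIO_STREAM_CONFIG_CAPS (not the Windows SDK structure).
struct SampleRateCaps {
    std::uint32_t minimum;      // MinimumSampleFrequency
    std::uint32_t maximum;      // MaximumSampleFrequency
    std::uint32_t granularity;  // SampleFrequencyGranularity
};

// Returns true when `rate` lies within [minimum, maximum] and sits on a
// granularity step counted from `minimum`.
bool IsRateCoveredByCaps(std::uint32_t rate, const SampleRateCaps& caps) {
    assert(caps.minimum <= caps.maximum);
    if (rate < caps.minimum || rate > caps.maximum)
        return false;
    // Assumption: zero granularity means only the end points are valid.
    if (caps.granularity == 0)
        return rate == caps.minimum || rate == caps.maximum;
    return (rate - caps.minimum) % caps.granularity == 0;
}
```

With the values from the dump above, IsRateCoveredByCaps(8000, SampleRateCaps{11025, 44100, 11025}) returns false: the media type's own sampling frequency falls outside the range its capabilities structure claims.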

Multichannel audio recording

There was recently an interesting post, “Problem playing back multi-channel wave file under vista”, on the DirectShow Development forum about certain hardware capable of recording 64 channels of 24-bit PCM audio, with a sample recorded .WAV file posted.

I was curious what kind of hardware implements such recording. Google suggests it could be the PCM H64 multichannel system from Sadie, UK.

This is definitely an interesting piece of hardware, one that can fill the bandwidth normally associated with video using audio data alone. BTW, some useful related C++/DirectShow code is here, on Assembla’s new web interface to the Subversion repository.
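To put a rough number on that bandwidth, here is a small sketch of the raw PCM data rate calculation. The helper name PcmBytesPerSecond is mine, and the 48 kHz sampling rate used in the note below is purely an assumption: the forum post as quoted here gives only the channel count and bit depth.

```cpp
#include <cassert>
#include <cstdint>

// Raw (uncompressed) PCM data rate in bytes per second:
// channels * bytes per sample * samples per second.
inline std::uint64_t PcmBytesPerSecond(std::uint32_t channels,
                                       std::uint32_t bitsPerSample,
                                       std::uint32_t samplesPerSecond) {
    assert(bitsPerSample % 8 == 0);  // whole bytes per sample only
    return std::uint64_t(channels) * (bitsPerSample / 8) * samplesPerSecond;
}
```

Under the 48 kHz assumption, PcmBytesPerSecond(64, 24, 48000) gives 9,216,000 bytes per second, which is about 73.7 Mbit/s. For comparison, PcmBytesPerSecond(2, 16, 8000) gives 32000, the same as nAvgBytesPerSec for a stereo 16-bit 8 kHz WAVEFORMATEX.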

DirectSound play buffer notification (IDirectSoundNotify8)

IDirectSoundNotify8 is an interface for getting notified when a playback or capture audio buffer reaches a certain position. It is a must when implementing ring buffers, where new data is continuously added to the buffer for seamless playback (or continuously copied out of it in the case of capture).

This project is minimalistic C++ sample code illustrating the API. Initializing the DirectSound subsystem requires a window handle, and the window is created using ATL’s CWindowImpl (CMessageOnlyWindowImpl).

...
ATLENSURE_SUCCEEDED(DirectSoundCreate8(NULL, &pDirectSound8, NULL));
...
ATLENSURE_SUCCEEDED(pDirectSound8->SetCooperativeLevel(Window, DSSCL_PRIORITY));
...
ATLENSURE_SUCCEEDED(pDirectSound8->CreateSoundBuffer(&BufferDescriptor, &pDirectSoundBuffer, NULL));
...
CComQIPtr<IDirectSoundNotify8, &IID_IDirectSoundNotify8> pDirectSoundNotify8 = pDirectSoundBuffer;
...
ATLENSURE_SUCCEEDED(pDirectSoundNotify8->SetNotificationPositions(g_nPositionCount, pPositionNotify));
ATLENSURE_SUCCEEDED(pDirectSoundBuffer->Play(0, 0, DSBPLAY_LOOPING));
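The excerpt above does not show how the pPositionNotify array is filled. One common layout, sketched here under my own assumptions, splits the looping buffer into equal segments and places a notification at the last byte of each one. ComputeNotifyOffsets is a hypothetical helper; the values it returns are meant for the dwOffset member of each DSBPOSITIONNOTIFY entry, with the event handles set up separately.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Splits a ring buffer of `bufferBytes` into `positionCount` equal segments
// and returns the byte offset of the last byte of each segment.
// Assumption: the buffer size divides evenly by the position count.
std::vector<std::uint32_t> ComputeNotifyOffsets(std::uint32_t bufferBytes,
                                                std::uint32_t positionCount) {
    assert(positionCount > 0 && bufferBytes % positionCount == 0);
    const std::uint32_t segmentBytes = bufferBytes / positionCount;
    std::vector<std::uint32_t> offsets;
    offsets.reserve(positionCount);
    for (std::uint32_t i = 0; i < positionCount; ++i)
        offsets.push_back((i + 1) * segmentBytes - 1);
    return offsets;
}
```

For a 32000 byte buffer and four positions this yields 7999, 15999, 23999 and 31999. When the notification for a segment fires, the play cursor has just passed the end of that segment, so it is safe to refill it while the next one plays.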


FFDShow is getting more annoying

Surprisingly quickly, I ran into new problems with ffdshow installed as part of K-Lite Codec Pack. No wonder, though; let us take a look at the registration information:

Display Name: @device:sw:{083863F1-70DE-11D0-BD40-00A0C911CE86}\{0F40E1E5-4F79-4988-B1A9-CC98794E6B55}
CLSID: {0F40E1E5-4F79-4988-B1A9-CC98794E6B55}
Friendly Name: ffdshow Audio Decoder
Path: C:\Program Files\K-Lite Codec Pack\ffdshow\ffdshow.ax
Merit: 0x3fffffff

Nice merit, ain’t it? What is merit anyway? Let us check MSDN:

MERIT_PREFERRED = 0x800000,
MERIT_NORMAL = 0x600000,

MERIT_HW_COMPRESSOR = 0x100050

The highest defined value is 0x00800000, while ffdshow is registered with 0x3FFFFFFF, that is, on top of everything. No doubt the developers read Guidelines for Registering Filters and decided to get rid of the rest of the installed software as unnecessary crap.
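Why the number matters: as far as I understand the documented behavior, the Filter Graph Manager tries candidate filters in descending order of merit and never tries those at MERIT_DO_NOT_USE (0x200000) or below. The sketch below only models that ordering. FilterEntry, OrderCandidates and TopCandidate are hypothetical names, the constants are copies of the SDK values, and this is not the actual Intelligent Connect code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Local copies of the DirectShow merit constants.
constexpr std::uint32_t kMeritPreferred = 0x800000;
constexpr std::uint32_t kMeritNormal    = 0x600000;
constexpr std::uint32_t kMeritDoNotUse  = 0x200000;

// Hypothetical record for a registered filter.
struct FilterEntry {
    std::string name;
    std::uint32_t merit;
};

// Drops filters that would never be tried, then orders the rest so that
// the highest merit comes first.
std::vector<FilterEntry> OrderCandidates(std::vector<FilterEntry> filters) {
    filters.erase(std::remove_if(filters.begin(), filters.end(),
                                 [](const FilterEntry& f) {
                                     return f.merit <= kMeritDoNotUse;
                                 }),
                  filters.end());
    std::stable_sort(filters.begin(), filters.end(),
                     [](const FilterEntry& a, const FilterEntry& b) {
                         return a.merit > b.merit;
                     });
    return filters;
}

// The filter that gets the first chance to connect.
const FilterEntry& TopCandidate(const std::vector<FilterEntry>& ordered) {
    assert(!ordered.empty());
    return ordered.front();
}
```

Feed it ffdshow at 0x3FFFFFFF next to any decoder registered at kMeritNormal or even kMeritPreferred, and ffdshow always comes out first.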

Anyway, back to the problem: I had an A-law wave file (WAVE_FORMAT_ALAW) to play in order to make sure its data was valid, and quite unexpectedly there was silence during playback. A quick check confirmed that the system has the CCITT A-law codec installed; however, GraphEdit showed ffdshow Audio Decoder intercepting the decoding. Obviously it spoiled the thing!

Finally I decided it was the right time to take advantage of the IAMGraphBuilderCallback interface to detect and reject the bastard.


Skype and SIP

See Why does the N770 have Google Talk instead of Skype? on Robin Jewsbury’s Forum Nokia Blog. I do share the opinion that SIP is going to be a mainstream protocol for Internet Voice [and Video?] applications, though there are other opinions (including SIMPLE, IAX etc.). It does not, however, seem to be happening very fast, since, as we can see, XMPP is becoming popular very slowly too.

There is no doubt SIP will be the underlying technology of all P2P systems in the future and no doubt in my mind that Skype will have to move to SIP at some stage. SIP is far more efficient than Skype.

Speech Codecs Library

Summary

The library implements some of the speech codecs provided with the Intel IPP Library samples (G.726, G.728, GSM 06.90 AMR) as DirectShow filters (using DirectX Media Object technology), making them available to a wide range of DirectX 9 compatible applications.

The use of Intel Integrated Performance Primitives makes the processing efficient.

See the DirectX GraphEdit utility screenshots below to get an idea of the availability of the codecs:

Screenshot 1 Screenshot 2

Screenshot 3


Announcement: Intel IPP speech coding algorithms as DirectX Media Objects

Getting back to audio codecs based on the Intel IPP library, version 5.0. Basically, I made a wrapper over Intel’s Unified Speech Codec (USC) implementation that exposes a codec as a DMO, which makes it available to a wide range of media applications. At the moment the GSM 06.90 AMR and G.726 codecs are available, but others can be added with ease. I still have only a vague idea about licensing, but I think it would hurt no one if a demo library is published. The demo will have some randomly forced silence intervals. I believe it is going to be sufficient to evaluate the operating performance and processing quality of the codecs.