How could it happen that an experienced DirectShow guy only recently discovered DirectX Media Objects (DMOs)? The idea is really nice. I definitely have to try this at work with some audio processing.
= Serious Problems
A customer, a very loyal one, I must admit, installed several multichannel digital video recorders. He reported a few problems, and more or less quickly we fixed all but one: a serious system failure roughly once every one or two days, and no one can tell the reason. The logs show nothing; it may be faulty hardware too (enough such cases are known from the past). Any way out of this would be appreciated. I wish we could fix it, the customer is a very good one.
Undoubtedly, the idea of sound source localization is far from new. However, while it was brewing in my head it kept giving me more and more keywords, and I finally collected a lot of theoretical and practical findings in this area. Still, the question is whether there is any practical progress, something easy to use, to see the effectiveness.
Similar to what I wrote recently:
On the up side, if it CAN do this then with a map of the ‘known’ positions of the microphones it could probably also plot the relative location of each source of sound in the room much like a submarine sonar. If you place a few microphones at floor level as well as ceiling it would even be able to place the sources in a 3D space. You could then make your commands do different things relative to the place or height they were spoken from. By designating the radio and TV as places where commands will be ignored you could eliminate the evil ‘clapper syndrome’ where loud gunfights on TV would turn the lights on and off. In fact, you could go one step further and have things happen simply based on any sound coming from a specific location. This could be refined to commands like ‘knock three times on the ceiling’ or ‘twice on the pipe’.
I wonder if there is any progress in estimating an audio source's position by simultaneously recording from several points (3+, I believe).
Thinking a bit about this funny feature, I decided it might be quite useful. For example, to remove background noise before speech recognition, or to record a conference and split the audio by the person speaking. I think there are some fields where it could really work. However, I have never heard of anything of this kind that actually works.
If I am not wrong in my calculations, even recording at 44100 Hz may be quite sufficient (in theory, of course) to perform the analysis.
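A rough sanity check of that claim: at 44100 Hz, one sample corresponds to about 343 / 44100 ≈ 7.8 mm of sound travel, so even a sample-accurate delay estimate between two microphones is already fairly precise. A minimal sketch of the core step, estimating the inter-microphone delay by brute-force cross-correlation (the function name and the synthetic click signal are my own illustration, not anything from real localization code):

```python
import math

# Estimate the delay (in samples) between two microphone signals by
# brute-force cross-correlation; pairwise delays like this are the raw
# input for time-difference-of-arrival (TDOA) localization.

def delay_by_xcorr(a, b, max_lag):
    """Return the lag (in samples) at which b best matches a."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(a[i] * b[i + lag]
                    for i in range(len(a))
                    if 0 <= i + lag < len(b))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Synthetic test: the same damped click arrives 5 samples later at mic B.
click = [math.sin(2 * math.pi * 0.1 * n) * math.exp(-0.1 * n) for n in range(50)]
mic_a = click + [0.0] * 10
mic_b = [0.0] * 5 + click + [0.0] * 5

print(delay_by_xcorr(mic_a, mic_b, 10))  # → 5
```

With three or more microphones at known positions, such delays constrain the source to intersecting hyperbolas, which is where the actual geometry work begins.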
Technology does not stand still (no surprise), and we now have hardware that produces 1920×1200@20fps. Not as cheap as dust, certainly, but still not as expensive as a rocket.
The problem, however, is that it still cannot be displayed live at its full extent on a conventional workstation, especially in a layout with 3+ other cameras. So imagine this multi-megapixel stuff expanded into roughly nine megabytes of uncompressed data 20 times per second. It's an awful lot, even on a fast machine with a modern video board.
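The raw bandwidth is easy to underestimate, so here is the back-of-the-envelope arithmetic, assuming the decoder outputs 32-bit RGB (the exact figure depends on the pixel format, so treat it as an order-of-magnitude sketch):

```python
# How much raw data a 1920x1200@20fps stream becomes once the JPEG
# frames are decoded, assuming 32-bit RGB output (4 bytes per pixel).

width, height, fps = 1920, 1200, 20
bytes_per_pixel = 4  # RGB32

frame_bytes = width * height * bytes_per_pixel
stream_bytes_per_sec = frame_bytes * fps

print(f"{frame_bytes / 2**20:.1f} MiB per frame")   # → 8.8 MiB per frame
print(f"{stream_bytes_per_sec / 2**20:.1f} MiB/s")  # → 175.8 MiB/s
```

And that is a single camera; a layout with four such streams approaches the practical memory bandwidth of the machines of the day.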
I wish we had been given better focus on that problem, because it did not appear suddenly. So it all stayed in the state of "it's slow, but the world is not ideal, so let's get going with it." However, we finally updated our DirectShow JPEG decoding filter to use internal downsampling. Dynamically. Following the effective size of the video on the screen. It should work great!
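The trick is that libjpeg-style decoders can decode directly at 1/2, 1/4, or 1/8 scale almost for free (the downscaling happens during the IDCT). The scale-selection logic might look roughly like this; the helper name and the power-of-two scale set are my own illustration, not the actual filter code:

```python
# Pick the cheapest JPEG decode scale (1/1, 1/2, 1/4, or 1/8, as supported
# by libjpeg-style decoders) that still covers the size the frame will
# actually occupy on screen.

def choose_scale_denom(src_w, src_h, disp_w, disp_h):
    """Largest denominator d in {1, 2, 4, 8} with src/d >= display size."""
    for d in (8, 4, 2, 1):
        if src_w // d >= disp_w and src_h // d >= disp_h:
            return d
    return 1  # display is larger than the source; decode at full size

# A 1920x1200 camera shown in a 4-up layout tile of 480x300:
print(choose_scale_denom(1920, 1200, 480, 300))  # → 4
```

At 1/4 scale the decoder produces 1/16 of the pixels, which is exactly why following the effective on-screen size dynamically pays off so well in multi-camera layouts.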
DirectShow is a great technology; I keep being excited by how all the filters work together, agree on formats, and so on. And now it's even more complex: the filters re-agree on formats dynamically…