Utility Clearance: Logical Processor Information

LogicalProcessorInformation is a fronend GUI around GetLogicalProcessorInformation API and reveals CPU configuration of the system. If you are fine tuning stuff, you might want to know what sort of CPUs are powering the applications:

  • how many?
  • fully featured cores or HyperThreading technology?
  • mutli-processor configuration and how exactly physical processors are distributed over affinity mask

API and the application gets you the data as a user-friendly dump of GetLogicalProcessorInformation output and a summary of records at the bottom:

Continue reading →

You cannot remove RBBS_NOGRIPPER flag from rebar band

It is as stupid as it sounds: you can add RBBS_NOGRIPPER flag to a rebar band to implement “lock controls” feature, but as soon as you start removing it – you miserably fail.

A related thread on WTL group is dated year 2006. You can find the same dated 2003. Now it’s 2011 and it’s still here.

OK, it might be some bug in controlling code etc, but here is Control Spy 2.0. Here you go:

WTL does a cool trick to fool the bastard:

void LockBands(bool bLock)
{
    int nBandCount = GetBandCount();
    for(int i =0; i < nBandCount; i++)
    {
        REBARBANDINFO rbbi = { RunTimeHelper::SizeOf_REBARBANDINFO() };
        rbbi.fMask = RBBIM_STYLE;
        BOOL bRet = GetBandInfo(i, &rbbi);
        ATLASSERT(bRet);

        if((rbbi.fStyle & RBBS_GRIPPERALWAYS) == 0)
        {
            rbbi.fStyle |= RBBS_GRIPPERALWAYS;
            bRet = SetBandInfo(i, &rbbi);
            ATLASSERT(bRet);
            rbbi.fStyle &= ~RBBS_GRIPPERALWAYS;
        }
        if(bLock)
            rbbi.fStyle |= RBBS_NOGRIPPER;
        else
            rbbi.fStyle &= ~RBBS_NOGRIPPER;

        bRet = SetBandInfo(i, &rbbi);
        ATLASSERT(bRet);
    }
}

Pretty cool and works… if you have one rebar only. If you have another one in the same window (e.g. mine is at bottom) – you fail again.

A nasty workaround is to nudge hosting window by resizing it, followed by a full cycle of internal layout updates:

CRect Position;
ATLVERIFY(GetWindowRect(Position));
Position.bottom++;
ATLVERIFY(MoveWindow(Position));
Position.bottom--;
ATLVERIFY(MoveWindow(Position));

Booo SRW Locks

Windows Vista introduced new synchronization API: slim reader/writer (SRW) locks. Being already armed with critical sections, one perhaps would not desperately need an SRW lock, but still it offers a great option to provide both exclusive (critical section alike) mode and shared mode when 2+ threads can enter protected section simultaneously. Some time earlier, I already touched SRW lock recursion issue.

This time it is more about performance. Let us suppose we have a code fragment:

static const SIZE_T g_nCount = 100000000;
for(SIZE_T nIndex = 0; nIndex < g_nCount; nIndex++)
{
    AcquireSRWLockShared(&m_Lock);
    ReleaseSRWLockShared(&m_Lock);
}

How fast is this? Provided that execution took 1.6 ms on an idle lock variable, how fast it is going to be with another shared “reader” on another thread who acquired once access in shared mode? This part comes up confusing: it appears almost twice as slow: 2.9 ms, with also a number of contentions (and context switches) on the way.

SRW lock is advertised as lightweight and fast API, no recursion, no extra features, no fool proof checks. So it could be assumed to get you top performance, but 50 lines of code class can definitely outperform it.

A perhaps simplest SRW lock can be backed on a LONG (or LONGLONG) volatile variable accessed with interlocked API functions. The rules are simple:

  • initial value of zero means idle state
  • shared “reader” enters incrementing by one, positive values indicates one ore more readers
  • exclusive “writer” enters decrementing by a hundred (well, this needs to be any value big enough to be less than minus maximal amount of simultaneous concurrent readers)
  • releasing acquired state the value is decremented (incremented) back
  • in case of contention thread yields execution to continue later with possibly better luck with shared resource (option: implement spic count for several attempts in a row, which is better suitable for multi-processor systems)

It is clear that concurrent readers touch value only once with a single increment only, leaving no opportunity to yield execution due to contention. How fast it can be?

static const SIZE_T g_nCount = 100000000;
for(SIZE_T nIndex = 0; nIndex < g_nCount; nIndex++)
{
    while(InterlockedIncrement(&m_nNativeLock) < 0)
    {
        InterlockedDecrement(&m_nNativeLock);
        SwitchToThread();
    }
    InterlockedDecrement(&m_nNativeLock);
}

It is 1.3 ms regardless whether there are any shared readers on concurrent threads. It appears that a simple custom SRW lock class is going to be superior to the API:

  • faster due to zero API overhead (inline compiled code versus WINAPI convention functions)
  • faster in shared mode due to not being subject to additional overhead and contentions
  • flexible spin counts possible to tune performance for specific use
  • extensibility:
    • recursion is allowed shared mode
    • easy to implement upgrades and downgrades between shared and exclusive modes

Sample code is Visual Studio 2010 C++ project accessible from SVN repository.

Update: Critical Section Test. This is how SRW lock API performance compares to entry/leaving critical section (provided that critical section is never locked at entry time). Critical section is about 15% slower to enter and leave.

C:\Projects\...\SrwLockTest02\x64\Release>SrwLockTest02.exe
API: 100M iterations, 1575 ms
API: 100M iterations, 2839 ms
Native: 100M iterations, 1311 ms
Native: 100M iterations, 1326 ms
Critical Section: 100M iterations, 1841 ms

Obtaining number of thread context switches programmatically

Previous post on thread synchronization and context switches used number of thread context switches as one of the performance indicators. One might have hard times getting the number from operating system though.

The only well documented access to amount of context switches seems to be accessing corresponding performance counters. Thread performance counter will list available thread instances and counters “Thread(<process-name>/<thread-number>)/Context Switches/sec” will provide context switch rate per second.

While access to performance counters is far not the most convenient API, to access data programmatically one would really prefer absolute number of switches rather than rate per second (which is still good for interactive monitoring).

A gate into kernel world to grab the data of interest is provided with NtQuerySystemInformation function. Although mentioned in documentation, it is marked as unreliable for use, and Windows SDK static library is missing it so one has to obtain it using GetModuleHandle/GetProcAddress it explicity.

typedef NTSTATUS (WINAPI *NTQUERYSYSTEMINFORMATION)(SYSTEM_INFORMATION_CLASS SystemInformationClass, PVOID SystemInformation, ULONG SystemInformationLength, PULONG ReturnLength);
NTQUERYSYSTEMINFORMATION NtQuerySystemInformation = (NTQUERYSYSTEMINFORMATION) GetProcAddress(GetModuleHandle(_T("ntdll.dll")), "NtQuerySystemInformation");
ATLASSERT(NtQuerySystemInformation);
ATLVERIFY(NtQuerySystemInformation(...) == 0);

Having this done, the function is capable of providing SystemProcessInformation/SYSTEM_PROCESS_INFORMATION data about running processes.

Continue reading →

Thread synchronization and context switches

A basic task in thread synchronization is putting something on one thread and getting it out on another thread for further processing. Two or more threads of execution are accessing certain data, and in order to keep data consistent and solid the access is split into atomic operations which are only allowed for one thread at a time. Before one thread completes its thing, another is not allowed to touch stuff, such as waiting for so called wait state. This is what synchronization objects and critical sections in particular for. Furthermore, a thread which is waiting for stuff to be available has nothing to do, so it uses one of the wait functions to not waste CPU time, and both threads are using event or similar objects to notify and receive notifications waking up from wait state.

Let us see what is the cost of doing things not quite right. Let  us take a send thread which is generating data/events which is locking shared resource and setting an event when something is done and requires another receive thread to wake up and take over. Send thread might be doing something like:

CComCritSecLock<CComAutoCriticalSection> DataLock(m_CriticalSection);
m_nSendCount++;
ATLVERIFY(m_AvailabilityEvent.Set());

And receive thread will wait and take over like this:

CComCritSecLock<CComAutoCriticalSection> DataLock(m_CriticalSection);
m_nReceiveCount++;
ATLVERIFY(m_AvailabilityEvent.Reset());

Let us have three send threads and one receive thread running in parallel:

The simplicity is tempting and having run this the result over 60 seconds is:

Continue reading →

How to use your own video transformation DirectShow filter as an effect while playing back a file

One of the popular questions asked in relation to DirectShow filters, and also a popular task is to modify video stream while in playback. There are various reasons one might need to do it, including keying video to replace color/range, or apply text or time code on top of video, including while re-compressing footage, adjust brightness or contrast.

DirectShow BaseClasses/SDK include samples (in most cases EzRGB24 sample is the best to start for a video effect, also demonstrates use of private interface on a filter) to quick-start with the task without getting too much into detail and once this part is done, next step is to integrate filter into playback, connect it with other filters.

As a result of DircetShowflexibility, there are ways to do things not so good, while still being under impression of keeping right track.

File playback is one of the basic tasks with DirectShow. To play a file, one creates a filter graph using powerful helpers provided by Filter Graph Manager object. It might be a actually a single call IGraphBuilder::RenderFile which takes all the complexity of finding matching filters, connecting them together, dealing with splitters and renderers, video and audio. A single call resolves the problems in a convenient way – easy.

Still a simple thing of inserting your own video transformation filters breaks the simplicity. One needs to build the graph partilly, insert the effect and complete building, or build the thing and break in with a new filter. How to find insertion point? Will the other filters like intrusion? Different file types and formats.

There is an easy and elegant solution to pre-add your own effect filter into graph and start rendering a file from there. Sounds reasonable and sometimes works. The problem is however that it does not work always, and you never know when it lets you down. The graph might be build and the effect filter is never taken and is left orphaned aside of playback pipeline.

Reliable graph building assumes you are in control over building steps and allow only the level of flexibility required to connect and build parts – and this is where Intelligent Connect is still a powerful helper. With an effect, the parts are “before the effect” and “after the effect”. RenderFile is no longer an option, and one has to dive deeper into graph building API.

First of all, the building starts with the file itself: unlike RenderFile, IGraphBuilder::AddSourceFilter method adds just the first filter for a given file/format. It stops there and lets caller continue building as required. At this point, it is the right time to manually add effect filter with IFilterGraph::AddFilter (IGraphBuilder is inherited from IFilterGraph and exposes the same method).

Having both ends in the graph for the “before the effect” part, intelligent connect can be used to connect and add filters required to make the connection. For an arbitrary file format, the task may be not trivial: depending on format and installed software components, the chain may look rather different. First, some filters combine stream splitting capability with immediate access to file (or another type of resource), others rely on joint operation of Async File Source filter with a format-dependent splitter filter. Some expose elementary stream pins immediately, some provide transport stream pin.

There may be a few approaches as for addressing pin connection task (see also General Graph-Building Techniques on MSDN). Straightforwardly, one might want to call IGraphBuilder::Connect and take advantage of intelligent connect. Before this can be done, however it takes caller to select a pin of the obtained source filter to start from. There might be a few pins, including those exposing video formats, non-video formats and pre-split formats where video is behind depacketizing (demultiplexing). Considering variety of formats and support, it might make sense to make a first attempt finding a video pin (by enumerating pins and their media types, looking and AM_MEDIA_TYPE::majortype and comparing to MEDIATYPE_Video) and, if not found, taking a first output pin of any type, or going through ping trying to connect first one which succeeds in connection.

An alternate approach is to take advantage of a helper object: Capture Graph Builder. While originally it is intended to help capture graph building, it contains useful methods for general building and connecting pins. It does not own a graph itself: it is designed to operate on top of existing graph, provided by its owner. So one need to provide its existing graph and call helper methods for easy graph building. One of the methods is ICaptureGraphBuilder2::RenderStream, which connects pins of given filters. Unlike API discussed earlier, it takes filter interfaces on input and will be responsible for finding proper pins itself, which might be a good idea if you don’t want to bother yourself doing it. To specify the requested task, it takes media type argument, which in this case might be video or, if fails, any type provided that video media type will still be anyway checked on input of effect filter.

Once the part “before the effect” is done, the other part may be completed as simple as calling IGraphBuilder::Render on the output pin of the effect. This will correspond to final step of original RenderFile execution.

A tiny Visual Studio 2010 C++ project illustrates discussed techniques and it available at SVN repository: RenderStreamTest01:

  • for a given media file in line 109 the project will start graph building
  • a suitable replacement for a video effect filter will be a Sample Grabber filter initialized with a video type (24-bit RGB, but the line 137 can be commented out)
  • switch in line 147 switches between base Connect approach and Capture Graph Builder helper
  • while message box is on the screen and also showing building status, the graph can be looked at using Graph Edit or similar tool, provided that DirectShow Spy is installed; alternatively you might want to put the graph onto ROT manually

The project also illustrates a solution for recent problems referencing Sample Grabber with new SDK. Sample Grabber was obsoleted and removed from Window SDK definition file (qedit.h). In order to resolved the problem without using an older version of SDK, the definitions might be imported from type the corresponding library and (apart from used as such) copied into the source code directly, as in lines 13-60.

See also on graph building:

Build Incrementer Add-In for Visual Studio: Latest Visual Studio Versions

If you share concept (as I do) that every build should have a unique file version stamp in it, for a simple purpose – at least – to distinguish between different version of the same binary, then a helpful tool of automatic incrementing fourth number in FILEVERSION’s file version is something you cannot live without. After going through several fixes and updates, it is finally here available for download.

The last issue was in particular that projects that are in solution’s folder are not found by the add-in with Visual Studio 2008. Why? OnBuildProjConfigBegin event provides you with a unique project name string in Project argument, but it appears that it is only good enough as a quick lookup argument with Visual Studio 2010.

// EnvDTE::_dispBuildEvents
    STDMETHOD(OnBuildBegin)(_EnvDTE_vsBuildScope Scope, _EnvDTE_vsBuildAction Action) throw()
    STDMETHOD(OnBuildDone)(_EnvDTE_vsBuildScope Scope, _EnvDTE_vsBuildAction Action) throw()
    STDMETHOD(OnBuildProjConfigBegin)(BSTR Project, BSTR ProjectConfig, BSTR Platform, BSTR SolutionConfig) throw()
    {
        _Z4(atlTraceCOM, 4, _T("Project \"%s\", ProjectConfig \"%s\", Platform \"%s\", SolutionConfig \"%s\"\n"), CString(Project), CString(ProjectConfig), CString(Platform), CString(SolutionConfig));
        _ATLTRY
        {
            // NOTE: const CString& cast forces compiler to process statement as variable definition rather than function forward declaration
            CProjectConfiguration ProjectConfiguration((const CString&) CString(Project), CString(ProjectConfig), CString(Platform), CString(SolutionConfig));
            CRoCriticalSectionLock DataLock(m_DataCriticalSection);
            // NOTE: Check the project on the first run only (to skip multiple increments in batch build mode)
            if(!Project || m_VersionMap.Lookup(ProjectConfiguration))
                return S_FALSE;
            _Z3(atlTraceGeneral, 3, _T("Checking project \"%s\"...\n"), CString(Project));
            // Checking the project is of C++ kind
            CComPtr<EnvDTE::Project> pProject = GetProject(CComVariant(Project));
            _A(pProject);
            ...

When the project is in a folder, Projects::Item can just fail if you are looking up for the element interface. In which case, you have to walk the collection taking into account SubProjects and additionally look there yourself. Visual Studio 2010 is one step smarter and gives the thing to you right from the start.

Eventually, the add-in is here. It’s job is to go to .RC file and increment file version each time you build the binary. It reports the action into build output window:

To install the add-in:

Or, alternatively, use installation file VisualStudioBuildIncrementerAddInSetup.msi (version 1.0.4, 379K) to have it done for you in a user-friendly way. Partial source code, a Visual Studio 2010 projectis also there in repository.