Published by alax on 23 Dec 2008

ProcessSnapshot to take a snapshot of process modules, threads, stacks and performance

When troubleshooting a released application at a remote production site, it is very useful to capture the state of the process for later analysis. The following information about the process state is helpful in several scenarios:

  • modules (DLLs) loaded into process and their versions
  • threads and their call stacks
  • process and thread performance

The ProcessSnapshot utility takes advantage of the Debugging Tools API (dbghelp.dll; note that the dialog also displays the DLL version in the bottom right corner) and writes this information to a text file. It can also take a sequence of snapshots, so that thread performance and/or call stacks can be compared and the differences checked.

The generated file is written to the directory of the utility application and looks like this:

Snapshot
  System Time: 10/14/2008 8:46:33 PM
  Local Time: 10/14/2008 11:46:33 PM

Performance
  Creation System Time: 10/14/2008 8:46:28 PM
  Kernel Time: 0.094 s
  User Time: 0.031 s

Modules

  Module: ProcessSnapshot.exe @00400000
    Base Address: 0x00400000
    Base Size: 0x0005b000 (372736)
    Name: ProcessSnapshot.exe
    Path: D:\Projects\Utilities\ProcessSnapshot\Release\ProcessSnapshot.exe
    Product Version: 1.0.0.1
    File Version: 1.0.0.125

  Module: ntdll.dll @7c900000
    Base Address: 0x7c900000
    Base Size: 0x000af000 (716800)
    Name: ntdll.dll
    Path: C:\WINDOWS\system32\ntdll.dll
    Product Version: 5.1.2600.5512
    File Version: 5.1.2600.5512
[...]

Threads

  Thread: 3824
    Base Priority: 8
    Creation System Time: 10/14/2008 8:46:57 PM
    Kernel Time: 0.063 s
    User Time: 0.031 s
    Call Stack
      ntdll!7c90e4f4 KiFastSystemCallRet (+ 0) @7c900000
      USER32!7e4249c4 GetCursorFrameInfo (+ 460) @7e410000
      USER32!7e424a06 DialogBoxIndirectParamAorW (+ 54) @7e410000
      USER32!7e4247ea DialogBoxParamW (+ 63) @7e410000
      ProcessSnapshot!00403f45 ATL::CDialogImpl<CMainDialog,ATL::CWindow>::DoModal (+ 67) [c:\program files\microsoft visual studio 9.0\vc\atlmfc\include\atlwin.h, 3478] (+ 28) @00400000
      ProcessSnapshot!00403b6f CProcessSnapshotModule::RunMessageLoop (+ 74) [d:\projects\utilities\processsnapshot\processsnapshot.cpp, 67] (+ 0) @00400000
      ProcessSnapshot!004049b9 ATL::CAtlExeModuleT<CProcessSnapshotModule>::Run (+ 17) [c:\program files\microsoft visual studio 9.0\vc\atlmfc\include\atlbase.h, 3552] (+ 0) @00400000
      ProcessSnapshot!004041c3 ATL::CAtlExeModuleT<CProcessSnapshotModule>::WinMain (+ 48) [c:\program files\microsoft visual studio 9.0\vc\atlmfc\include\atlbase.h, 3364] (+ 5) @00400000
      ProcessSnapshot!00434477 wWinMain (+ 5) [*d:\projects\utilities\processsnapshot\release\processsnapshot.inj:5, 14] (+ 0) @00400000
      ProcessSnapshot!00415058 __tmainCRTStartup (+ 274) [f:\dd\vctools\crt_bld\self_x86\crt\src\crt0.c, 263] (+ 27) @00400000
      !00360033


Published by alax on 14 Nov 2008

A bright and diverse world of HDD failures

Yesterday a support request about another weird problem was eventually forwarded to me for investigation. The symptom was “the application, configured as a Windows service, does not start and shows ‘Starting’ in the services console all the time”. The case did not look difficult, as the symptom clearly indicates a freeze or deadlock inside the service, but the task is still to find the cause, and to find it fast.

The symptom was exactly the same when the software was started as a regular application. Luckily, I already had a tool available, ProcessSnapshot, to troubleshoot such things easily. The call stack indicated that the software froze inside the CreateFile function, and further investigation revealed that one particular drive was causing the problem. This was a digital video recording server with several HDDs attached, adding up to a total video archive capacity of over 7 terabytes.

So the job was basically done, but one thing was particularly interesting: the drive was still accessible from Windows Explorer in Windows Vista, although there was a noticeable delay when opening a subdirectory. I tried to find evidence of HDD failure in the system event logs, but there was none. The drive looked healthy in system information and logs, showed an unreasonable delay but could still be opened when browsing directories, and completely froze an application that attempted to write to it. And as is natural with such symptoms, the first suspect is the software, so it was up to us to find the real cause.

… I appreciate all the support. It’s working very nicely again. I think I have a faulty 1.5 Tb drive (caution with those drives - I heard on the news that they’re failing).

By the way, the drive was a Seagate Barracuda 7200.11 SATA 3Gb/s 1.5-TB Hard Drive (ST315003).