Friday, September 10, 2010

Biztalk 64bit Host CPU 100% , Microsoft.BizTalk.MsgBoxPerfCounters.CounterManager.RunCacheThread

I’ve been involved to nail down one CPU 100% issue. per my understanding, the ETW trace API failed and caused the System to run the refresh in an infinite loop.

what’s the symbol of the event?

there are 9 HOSTs in one Biztalk server. only 1 Host kept hitting 100% cpu. even there was no business activity. ( no message came in, no orchestration get invoked. )

what did I do to locate the problem. 
   
  Run a memory Dump, then open the dump in Windbg, see which thread is CPU hungry.

run !runaway
 
91:1235      0 days 3:35:28.796
92:9740      0 days 3:34:24.140

threads 91 and 92 are the top threads which consume a lot CPU time.

then Run a clrstack to see what are those two threads doing.

!91s //switch to thread 91
!clrstack –a //view the stacktrace, and show the instance of the variable
Both 91 and 92 get the same result.

00000000143fc810 00000642783437fa System.Diagnostics.StackTrace.ToString(TraceFormat)
00000000143fc8f0 00000642808b34e6 System.Exception.get_StackTrace()
00000000143fc930 000006427f535468 Microsoft.BizTalk.MsgBoxPerfCounters.CounterManager.RunCacheThread()
00000000143fe850 00000642808b3ac7 Microsoft.BizTalk.MsgBoxPerfCounters.MgmtDbAccessEntity.UpdateDeltaMACacheRefreshInterval()
00000000143fe8e0 00000642808b32b7 Microsoft.BizTalk.MsgBoxPerfCounters.MgmtDbAccessEntity.RunCacheUpdates(Int32)
00000000143fe970 00000642782f173b Microsoft.BizTalk.MsgBoxPerfCounters.CounterManager.RunCacheThread()
00000000143fea10 000006427838959d System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
00000000143fea60 000006427f602672 System.Threading.ThreadHelper.ThreadStart()

Let’s see the stacktrace from bottom up, the closed Biz code is the Microsoft.BizTalk.MsgBoxPerfCounters.CounterManager.RunCacheThread
the update by default should be invoked every 60 seconds. why it keep invoking. ???

since memory dump is a static analysis and snapshot of the memory. then a run a cordbg,  what surprised me is that the process keep throwing exceptions.
like this

first chance exception generated: (0xe00a18b8) <System.Threading.ThreadAbortExcetion>
first chance exception generated: (0x1600a0e70) <System.Threading.ThreadAbortExcption>
first chance exception generated: (0xe00a2ff0) <System.Threading.ThreadAbortExcetion>
first chance exception generated: (0x1600a21b0) <System.Threading.ThreadAbortExcption>
first chance exception generated: (0xe00a4580) <System.Threading.ThreadAbortExcetion>
first chance exception generated: (0x1600a38d8) <System.Threading.ThreadAbortExcption>
first chance exception generated: (0xe00a58d8) <System.Threading.ThreadAbortExcetion>
first chance exception generated: (0x1600a4e60) <System.Threading.ThreadAbortExcption>
first chance exception generated: (0xe00a7010) <System.Threading.ThreadAbortExcetion>
first chance exception generated: (0x1600a61a0) <System.Threading.ThreadAbortExcption>
first chance exception generated: (0xe00a85a0) <System.Threading.ThreadAbortExcetion>

then I turned on the “unhanled exceptino on” by run “ca e” get the exception details

first chance exception generated: (0x120128fd0) <System.Threading.ThreadAbortExc
ption>
_className=<null>
_exceptionMethod=<null>
_exceptionMethodString=<null>
_message=(0x120129058) "Thread was being aborted."
_data=<null>
_innerException=<null>
_helpURL=<null>
_stackTrace=(0x1201290a8) <System.SByte[]>
_stackTraceString=<null>
_remoteStackTraceString=<null>
_remoteStackIndex=0
_dynamicMethods=<null>
_HResult=-2146233040
_source=<null>
_xptrs=0
_xcode=-532459699
xception is called:FIRST_CHANCE
ative disassembly not available.
cordbg)

is it the same stacktrace, yes it is. run W

)* Microsoft.BizTalk.MsgBoxPerfCounters.MgmtDbAccessEntity::UpdateDeltaMACacheR
freshInterval
+0171[native] +0035[IL] in <Unknown File Name>:<Unknown Line Numb
r>
)  Microsoft.BizTalk.MsgBoxPerfCounters.MgmtDbAccessEntity::RunCacheUpdates +01
1[native] +0039[IL] in <Unknown File Name>:<Unknown Line Number>
)  Microsoft.BizTalk.MsgBoxPerfCounters.CounterManager::RunCacheThread +0279[na
ive] +0093[IL] in <Unknown File Name>:<Unknown Line Number>
)  System.Threading.ExecutionContext::Run +0155[native] +0095[IL] in <Unknown F
le Name>:<Unknown Line Number>
)  System.Threading.ThreadHelper::ThreadStart +0077[native] +0025[IL] in <Unkno
n File Name>:<Unknown Line Number>
)  [Internal Frame, 'AD switch':(AD ''. #) -->(AD '__XDomain_3.0.1.0_0'. #7)]

then I use the ILDASM , and Open the assembly, go the the 0x 23 Lines. 

IL_001c:  callvirt   instance void [Microsoft.BizTalk.Tracing]Microsoft.BizTalk.Tracing.Trace/HackTraceProvider::TraceMessage(uint32,
                                                                                                                               string,
                                                                                                                               object[])
IL_0021:  ldnull
IL_0022:  stloc.2d


it is a ETW trace API.

Trace.Tracer.TraceMessage(4, "MgmtDbAccessEntity: Entering RunCacheUpdates fn for host " + this.hostName, new object[0])


what a bad design for the code that Tracing API (non-functional) broke the functional application, and even didn't catch the unhandled exception

More troubleshooting entries.

2 comments:

soukya reddy said...

This is very nice blog,and it is helps for student's.Thanks for sharing
Biztalk Online course Bangalore

svrtechnologies said...

I am inspired with your post writing style & how continuously you describe this topic on biztalk edi tutorial . After reading your post, thanks for taking the time to discuss this, I feel happy about it and I love learning more about this topic.

 
Locations of visitors to this page