Performance - Windows CPU Scheduler - very high kernel time
We are trying to understand how the Windows CPU scheduler works in order to optimize our applications and achieve the maximum possible infrastructure/real work ratio. There are some things in xperf that we don't understand, and we would like to ask the community to shed some light on what is really happening. We started to investigate these issues when we got reports that some servers were "slow" or "unresponsive".
Background information
We have a Windows 2012 R2 server that runs our middleware infrastructure with the following specs.
We found it concerning that 30% of the CPU is getting wasted on the kernel, so we started to dig deeper.
The server above runs ~500 "host" processes (as Windows services). Each of these "host" processes has an inner while loop with a ~250 ms delay (yuck!), and each of the "host" processes may have ~1..2 "child" processes that execute the actual work.
While the infinite loop runs with a 250 ms delay between iterations, actual useful work for the "host" application to execute may appear only every 10..15 seconds. So there are a lot of cycles wasted on unnecessary looping.
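To put rough numbers on that waste, here is a small back-of-the-envelope calculation. It uses only the figures quoted above (~500 hosts, 250 ms delay, work every 10..15 seconds) and assumes every host wakes exactly once per delay period, which is a simplification.

```python
# Figures taken from the description above; all of them are approximate.
HOST_PROCESSES = 500
LOOP_DELAY_S = 0.250

# Each host wakes once per delay period, so system-wide:
wakeups_per_second = HOST_PROCESSES / LOOP_DELAY_S


def idle_fraction(work_interval_s, loop_delay_s=LOOP_DELAY_S):
    """Fraction of loop iterations that find nothing to do,
    assuming one unit of work arrives every work_interval_s seconds."""
    iterations_per_work_item = work_interval_s / loop_delay_s
    return (iterations_per_work_item - 1) / iterations_per_work_item


idle_when_work_every_10s = idle_fraction(10)  # 39 of every 40 iterations
idle_when_work_every_15s = idle_fraction(15)  # 59 of every 60 iterations

print(f"{wakeups_per_second:.0f} wakeups per second across all hosts")
print(f"{idle_when_work_every_10s:.1%} to {idle_when_work_every_15s:.1%} of them find no work")
```

Under these assumptions the hosts alone cause about 2000 thread wakeups per second, and roughly 97.5% to 98.3% of them find nothing to do.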
We are aware that the design of the "host" application is sub-optimal, to say the least, as applied to our scenario. The application is being changed to an event-based model that will not require the loop, and we therefore expect a significant reduction of "kernel" time in the CPU utilization graph.
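For illustration only, this is a minimal sketch of what "event-based instead of a 250 ms loop" can look like. It is not our actual host code: it is an in-process Python stand-in in which a blocking queue plays the role of whatever notification mechanism the real services would use, and `STOP`, `event_driven_host`, and `run_demo` are hypothetical names.

```python
import queue
import threading

STOP = object()  # hypothetical sentinel that tells the host to shut down


def event_driven_host(work_queue, handle):
    """Block until work arrives instead of waking up on a timer.

    Returns how many times the loop actually woke up.
    """
    wakeups = 0
    while True:
        item = work_queue.get()  # blocks; no periodic wakeups while the queue is empty
        wakeups += 1
        if item is STOP:
            return wakeups
        handle(item)


def run_demo():
    work_queue = queue.Queue()
    handled = []
    stats = {}
    host = threading.Thread(
        target=lambda: stats.update(wakeups=event_driven_host(work_queue, handled.append))
    )
    host.start()
    for job in ("job-1", "job-2", "job-3"):
        work_queue.put(job)
    work_queue.put(STOP)
    host.join()
    return handled, stats["wakeups"]


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the host wakes once per queued item (three jobs plus the stop sentinel), no matter how long it sits idle between them. The polling design wakes four times a second whether or not there is work.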
However, while investigating this problem, we did some xperf analysis, which raised several general questions about the Windows CPU scheduler for which we were unable to find a clear/concise explanation.
What we don't understand
Below is a screenshot from one of the xperf sessions.
In the "CPU Usage (Precise)" graph you can see the following:
There are 15 ms time slices, the majority of which are under-utilized. The utilization of those slices is ~35-40%. We assume this in turn means that the CPU is utilized ~35-40% of the time, yet the system's performance (let's say, as observable through casual tinkering around the system) is really sluggish.
With this we have the "mysterious" 30% kernel time cost, as judged by the Task Manager CPU utilization graph.
Some CPUs are utilized for the whole 15 ms slice and beyond.
Questions
As far as Windows CPU scheduling on multiprocessor systems is concerned:
- What causes the 30% kernel cost? Context switching? Something else? What considerations should be made when writing applications to reduce this cost, or even to achieve perfect utilization with minimal infrastructure cost (on multiprocessor systems where the number of processes is higher than the number of cores)?
- What are these 15 ms slices?
- Why does the CPU utilization have gaps in these slices?
To diagnose CPU usage issues, you should use Event Tracing for Windows (ETW) to capture CPU sampling data (sampled, not precise; the precise data is useful for detecting hangs).
To capture the data, install the Windows Performance Toolkit, which is part of the Windows SDK.
Now run WPRUI.exe, select First level, and under Resource Analysis select CPU usage, then click Start.
Now capture 1 minute of CPU usage. After 1 minute, click Save.
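If you prefer to script the capture, the toolkit also ships a command-line recorder, wpr.exe. Here is a hedged sketch that wraps it from Python. I am assuming the built-in `CPU` profile is the command-line counterpart of the CPU usage checkbox, that wpr.exe is on the PATH, and that the script runs from an elevated prompt. `wpr_commands` and `capture` are names made up for this example.

```python
import subprocess
import time


def wpr_commands(etl_path, profile="CPU"):
    """Build the start/stop command lines for wpr.exe (profile name is an assumption)."""
    start = ["wpr.exe", "-start", profile]
    stop = ["wpr.exe", "-stop", etl_path]
    return start, stop


def capture(etl_path, seconds=60, run=subprocess.run, sleep=time.sleep):
    """Record for `seconds`, then stop and save the trace to etl_path.

    `run` and `sleep` are injectable so the flow can be dry-run without wpr.exe.
    """
    start, stop = wpr_commands(etl_path)
    run(start, check=True)
    try:
        sleep(seconds)
    finally:
        # Always stop the recording, even if the wait is interrupted.
        run(stop, check=True)


if __name__ == "__main__":
    capture("cpu_usage.etl", seconds=60)
```

The resulting .etl file can then be opened in Windows Performance Analyzer exactly like one saved from the UI.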
Now analyze the generated ETL file with Windows Performance Analyzer: drag and drop the CPU Usage (Sampled) graph onto the analysis pane and order the columns as you see in the picture:
Inside WPA, load the debug symbols and expand the stack of the SYSTEM process. In this demo, the CPU usage comes from the NVIDIA driver.