CPU is a finite resource, and like any resource it has a capacity limit. The NetScaler appliance generally has two kinds of CPUs: the management CPU and the packet CPU.
The management CPU processes all management traffic on the appliance, and the packet CPU(s) handle all data traffic, for example TCP and SSL.
When diagnosing a complaint involving high CPU, start by gathering the following fundamental facts:
The output of the following commands is essential for troubleshooting high CPU issues:
Sample output of the stat cpu command:
> stat cpu
CPU statistics
ID    Usage
1     29
The output above indicates that there is only one CPU (used for both management and data traffic), that its CPU ID is 1, and that its utilization is 29%.
Some appliances have multiple cores (nCore), where more than one core is allocated to the appliance. In that case the "stat system cpu" output lists multiple CPU IDs.
Note: High CPU reported by the "top" command does not affect the performance of the appliance, and it does not mean that the NetScaler is running at high CPU or consuming all of the CPU. The NetScaler kernel runs on top of BSD, and that is what "top" shows. Although it appears to use the full CPU, it actually does not.
 
Follow the steps below to understand the CPU usage in more detail:
Check the following counters to understand CPU usage.
	
	CLASSIC:
	master_cpu_use
	cc_appcpu_use filter=cpu(0)
	(If AppFW or CMP is configured, then looking at slave_cpu_use also makes sense for classic)
	
	nCORE:
	(For an 8 Core system)
	mgmt_cpu_use (CPU0 - nscollect runs here)
	master_cpu_use (average of cpu(1) thru cpu(7))
	cc_cpu_use filter=cpu(1)
	cc_cpu_use filter=cpu(2)
	cc_cpu_use filter=cpu(3)
	cc_cpu_use filter=cpu(4)
	cc_cpu_use filter=cpu(5)
	cc_cpu_use filter=cpu(6)
	cc_cpu_use filter=cpu(7)
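
The nCore list above describes master_cpu_use as an average over the packet CPUs. A minimal Python sketch of why the per-core counters are still worth checking: one busy core can hide behind a moderate average. The readings are hypothetical, and the function name is an illustration, not a NetScaler tool.

```python
def average_packet_cpu(per_core):
    """Average the per-core readings for the packet CPUs (every CPU except 0)."""
    packet = {cpu: use for cpu, use in per_core.items() if cpu != 0}
    return sum(packet.values()) / len(packet)


# Hypothetical cc_cpu_use readings (tenths of a percent) for an 8-core system.
# CPU 0 is the management CPU and is excluded from the average.
readings = {0: 120, 1: 200, 2: 210, 3: 220, 4: 230, 5: 240, 6: 250, 7: 960}

avg = average_packet_cpu(readings)
hottest = max((cpu for cpu in readings if cpu != 0), key=readings.get)
print(f"average={avg / 10:.1f}% hottest=cpu({hottest}) at {readings[hottest] / 10:.1f}%")
```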
	 
How to look for CPU use for a particular CPU?
	Use the nsconmsg command and search for cc_cpu_use and grep for the CPU you are interested in.
	The output will look like the following:
| Index | rtime | totalcount-val | delta | rate/sec | symbol-name&device-no | 
| 320 | 0 | 209 | 15 | 2 | cc_cpu_use cpu(8) | 
| 364 | 0 | 205 | -6 | 0 | cc_cpu_use cpu(8) | 
| 375 | 0 | 222 | 17 | 2 | cc_cpu_use cpu(8) | 
| 386 | 0 | 212 | -10 | -1 | cc_cpu_use cpu(8) | 
| 430 | 0 | 216 | 6 | 0 | cc_cpu_use cpu(8) | 
| 440 | 0 | 201 | -15 | -2 | cc_cpu_use cpu(8) | 
| 450 | 0 | 208 | 7 | 1 | cc_cpu_use cpu(8) | 
| 461 | 0 | 202 | -6 | 0 | cc_cpu_use cpu(8) | 
| 471 | 0 | 209 | 7 | 1 | cc_cpu_use cpu(8) | 
| 482 | 0 | 238 | 29 | 4 | cc_cpu_use cpu(8) | 
| 492 | 0 | 257 | 19 | 2 | cc_cpu_use cpu(8) | 
Look at the total count (third) column and divide by 10 to get the CPU percentage. For example, in the last line above, the value 257 means that CPU(8) is at 257/10 = 25.7% utilization.
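
The divide-by-10 rule can be applied to a whole capture at once. The following Python sketch assumes the rows are pipe-delimited exactly as in the table above, which may not match raw nsconmsg output on every build. The function name is hypothetical.

```python
def cpu_percentages(nsconmsg_text, symbol="cc_cpu_use"):
    """Return (device, percent) pairs from pipe-delimited counter rows.

    The third column (totalcount-val) is in tenths of a percent, so it is
    divided by 10. Header rows and rows for other counters are skipped.
    """
    results = []
    for line in nsconmsg_text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) < 6 or not cells[5].startswith(symbol):
            continue
        parts = cells[5].split()
        device = parts[1] if len(parts) > 1 else ""
        results.append((device, int(cells[2]) / 10))
    return results


sample = """\
| Index | rtime | totalcount-val | delta | rate/sec | symbol-name&device-no |
| 482 | 0 | 238 | 29 | 4 | cc_cpu_use cpu(8) |
| 492 | 0 | 257 | 19 | 2 | cc_cpu_use cpu(8) |
"""

for device, percent in cpu_percentages(sample):
    print(device, f"{percent:.1f}%")
```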
	Run the following commands to investigate the nsconmsg counters for a CPU issue:
nsconmsg -K newnslog -g cpu_use -s totalcount=600 -d current
nsconmsg -K newnslog -d current | grep cc_cpu_use
Next, check the profiler output to understand what is consuming the CPU.
	For details on the profiler output and logs, refer to the following article:
	https://support.citrix.com/article/CTX212480
For more detail, use the CPU counters described in the following article:
	https://support.citrix.com/article/CTX133887
Constant profiling refers to running the CPU profiler at all times, from the moment the NetScaler device comes up. The profiler is invoked at boot time and keeps running. Whenever the CPU associated with any PE exceeds 90%, the profiler captures the data into a set of files.
 
This was necessitated by issues seen at some customer sites and in internal tests. With customer issues, it is hard to go back and ask the customer to run the profiler when the issue is seen again. A profiler that is always running makes it possible to see which functions trigger high CPU. With this feature, the profiler is always running and the data is captured when the high CPU usage occurs.
 
TOT (Crete) 44.2+
9.3 - all builds
9.2 52.x +
Only nCore builds are affected.
 
Run the ps command to check if nsproflog and nsprofmon are running. The number of nsprofmon processes should be the same as the number of PEs running.
root@nc1# ps -ax | grep nspro
36683 p0 S+ 0:00.00 grep nspro
79468 p2- I 0:00.01 /bin/sh /netscaler/nsproflog.sh cpuuse=800 start
79496 p2- I 0:00.00 /bin/sh /netscaler/nsproflog.sh cpuuse=800 start
79498 p2- I 0:00.00 /bin/sh /netscaler/nsproflog.sh cpuuse=800 start
79499 p2- I 0:00.00 /bin/sh /netscaler/nsproflog.sh cpuuse=800 start
79502 p2- S 33:46.15 /netscaler/nsprofmon -s cpu=3 -ys cpuuse=800 -ys profmode=cpuuse -O -k /v
79503 p2- S 33:48.03 /netscaler/nsprofmon -s cpu=2 -ys cpuuse=800 -ys profmode=cpuuse -O -k /v
79504 p2- S 32:20.63 /netscaler/nsprofmon -s cpu=1 -ys cpuuse=800 -ys profmode=cpuuse -O -k /v
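
The check that the number of nsprofmon processes matches the number of PEs can be scripted against saved ps output. This is a minimal Python sketch under stated assumptions: the ps text has been captured to a string, each nsprofmon line carries a "-s cpu=<n>" argument as in the sample, and the expected PE count is supplied by the caller. The function name is hypothetical.

```python
import re


def nsprofmon_cpus(ps_output):
    """Return the sorted CPU IDs that have an nsprofmon process attached."""
    cpus = []
    for line in ps_output.splitlines():
        # Ignore the grep process itself and the nsproflog.sh wrapper shells.
        if "/netscaler/nsprofmon" not in line:
            continue
        match = re.search(r"-s cpu=(\d+)", line)
        if match:
            cpus.append(int(match.group(1)))
    return sorted(cpus)


ps_sample = """\
36683 p0 S+ 0:00.00 grep nspro
79468 p2- I 0:00.01 /bin/sh /netscaler/nsproflog.sh cpuuse=800 start
79502 p2- S 33:46.15 /netscaler/nsprofmon -s cpu=3 -ys cpuuse=800 -ys profmode=cpuuse -O -k /v
79503 p2- S 33:48.03 /netscaler/nsprofmon -s cpu=2 -ys cpuuse=800 -ys profmode=cpuuse -O -k /v
79504 p2- S 32:20.63 /netscaler/nsprofmon -s cpu=1 -ys cpuuse=800 -ys profmode=cpuuse -O -k /v
"""

expected_pe_count = 3  # assumption: this appliance runs three PEs
found = nsprofmon_cpus(ps_sample)
print("profiled CPUs:", found, "match:", len(found) == expected_pe_count)
```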
The profiled data is collected in the /var/nsproflog directory. Here is a sample listing of the files in that directory. At any point in time, the files currently being written are newproflog_cpu_<penum>.out. Once the data in these files exceeds 10 MB, the files are archived into a tar file and compressed. The rollover mechanism is similar to the one used for newnslog files.
newproflog.0.tar.gz    newproflog.5.tar.gz    newproflog.old.tar.gz
newproflog.1.tar.gz    newproflog.6.tar.gz    newproflog_cpu_0.out
newproflog.2.tar.gz    newproflog.7.tar.gz    nsproflog.nextfile
newproflog.3.tar.gz    newproflog.8.tar.gz    nsproflog_options
newproflog.4.tar.gz    newproflog.9.tar.gz    ppe_cores.txt
The current data is always captured in newproflog_cpu_<ppe number>.out. Once the profiler is stopped, the newproflog_cpu_* files will be archived into newproflog.(value in nsproflog.nextfile-1).tar.gz.
 
nsprofmon is the binary that interacts with the PE, retrieves the profiler records, and writes them to files. It has many options, which are hard to remember. The wrapper script nsproflog.sh is easier to use and remember. Going forward, the wrapper script is recommended when the goal is limited to collecting CPU usage data.
 
In earlier releases (9.0 and earlier), nsprofmon was heavily used internally and by the support groups, and some internal scripts used by devtest refer to nsprofmon. Use nsproflog.sh instead when the goal is limited to collecting CPU usage data.
 
Constant profiling affects existing scripts if they try to invoke the profiler. Please see the next question.
 
Only one instance of the profiler can run at any time. If the profiler is already running (invoked at boot time with constant profiling) and it is invoked again, the second invocation reports an error and exits.
root@nc1# nsproflog.sh cpuuse=900 start
nCore Profiling
Another instance of profiler is already running.
If you want to run the profiler at a different CPU threshold, please stop the current profiler using
# nsproflog.sh stop
... and invoke again with the intended CPU threshold.
Please see nsproflog.sh -h for the exact usage.
Similarly, nsprofmon has been modified to check whether another instance is running. If one is, it reports an error and exits.
If the profiler needs to be run again with a different CPU usage threshold (for example, 80%), stop the running instance and invoke the profiler again:
root@nc1# nsproflog.sh stop
nCore Profiling
Stopping all profiler processes
Removing buffer for -s cpu=1
Removing profile buffer on cpu 1 ... Done.
Saved profiler capture data in newproflog.5.tar.gz
Setting minimum lost CPU time for NETIO to 0 microsecond ... Done.
Stopping mgmt profiler process
root@nc1# nsproflog.sh cpuuse=800 start
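
The examples in this article pair cpuuse=800 with an 80% threshold and cpuuse=900 with 90%, which suggests that the cpuuse value is expressed in tenths of a percent. The Python sketch below relies on that inference, which should be confirmed with nsproflog.sh -h. The helper names are hypothetical. It only builds the command string and does not run anything on the appliance.

```python
def cpuuse_value(percent):
    """Convert a CPU threshold in percent to the cpuuse argument.

    Assumes cpuuse is in tenths of a percent, as inferred from the
    80% -> 800 and 90% -> 900 pairs in the article.
    """
    if not 0 < percent <= 100:
        raise ValueError("threshold must be in the range (0, 100]")
    return round(percent * 10)


def start_command(percent):
    """Build the nsproflog.sh start command line for the given threshold."""
    return f"nsproflog.sh cpuuse={cpuuse_value(percent)} start"


print(start_command(80))
```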
In /var/nsproflog, unzip and untar the desired tar archive. Each file in the archive corresponds to one PE.
Caution: If an older archive is unzipped and untarred in place, the files from the archive overwrite the current ones, because the names stored inside the tar archive are the same as the names of the files that the running profiler is writing to. To avoid this, unzip and untar into a temporary directory.
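
The caution above can be followed with a small script when the archive has been copied to a machine with Python available. This is a sketch, not part of the NetScaler tooling: the function name is hypothetical, and it assumes the archive is a gzip-compressed tar of regular files, as the file names in the listing suggest.

```python
import tarfile
import tempfile
from pathlib import Path


def extract_profiler_archive(archive_path):
    """Unpack a newproflog.<n>.tar.gz archive into a fresh temporary directory.

    Returns the directory, so the per-PE .out files can be inspected without
    overwriting the live files that the running profiler is writing to.
    """
    dest = Path(tempfile.mkdtemp(prefix="nsproflog_"))
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, links, and special files
            source = tar.extractfile(member)
            # Keep only the base name so nothing is written outside dest.
            target = dest / Path(member.name).name
            target.write_bytes(source.read())
    return dest
```

For example, extract_profiler_archive("newproflog.5.tar.gz") returns a new directory that contains the archived newproflog_cpu_*.out files, which can then be passed to the display command.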
The simplest way to view the profiled data is:
# nsproflog.sh kernel=/netscaler/nsppe display=newproflog_cpu_<ppe number>.out
The showtech script has been modified to collect the profiler data. When a customer issue comes in, check /var/nsproflog to see whether the profiler has captured any data.
 
Collecting traces and collecting profiler data are mutually exclusive. When nstrace.sh is run to collect traces, the profiler is stopped automatically and is restarted when nstrace.sh exits. No profiler data is available for the period during which traces were collected.
 
Initialization:
For each CPU, the following commands are executed initially:
nsapimgr -c
nsapimgr -ys cpuuse=900
nsprofmon -s cpu=<cpuid> -ys profbuf=128 -ys profmode=cpuuse
Capturing:
For each CPU, the following are executed:
nsapimgr -c
nsprofmon -s cpu=<cpuid> -ys cpuuse=900 -ys profmode=cpuuse -O -k /var/nsproflog/newproflog_cpu_<cpuid>.out -s logsize=10485760 -ye capture
After this, the nsprofmon processes keep running until any one of the capture buffers is full.
nsproflog.sh waits for any of these child processes to exit.
Stopping:
Kill all nsprofmon processes (killall -9 nsprofmon)
For each CPU, the following commands are executed:
nsprofmon -s cpu=<cpuid> -yS profbuf
Profiler capture files are archived, and the minimum lost CPU time for NETIO is set to 0:
nsapimgr -ys lctnetio=0