In recent blog entries I discussed the beta results of the X-Link Enterprise Edition, which reduced message delivery times by 77%. We also found that many messages were delayed by slow responses from other systems and/or locked data items. In addition, we found some small delays associated with CPU usage by the X-Link Engines while the Engines are not transferring data. That CPU utilization is significant, and I hope it can be addressed in future releases of X-Link. It is the topic of this series of blog entries.
What is causing the Unexpected CPU Usage?
To determine exactly what is causing the unexpected CPU usage, we replicated a similar but busier test scenario. I set up X-Link on my laptop (4 CPUs) with a typical PM system database and a TCP/IP-connected simulated EHR. I configured 30 linkages, each waiting 30 seconds between reads of new data from the PM, and an automated patient updater that modifies a patient in the source PM every 30 seconds. All 30 linkages use the same database, so each finds the patient change twice a minute.
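The arithmetic behind that setup can be sketched in a few lines. This is a hedged illustration of the test schedule, not X-Link code; all of the names below are placeholders I made up:

```python
# Illustrative sketch of the test schedule (names are mine, not X-Link's):
# 30 linkages each poll the shared PM database every 30 seconds, and an
# updater changes a patient every 30 seconds, so every linkage sees the
# change twice per minute.

POLL_INTERVAL_S = 30      # each linkage reads new data every 30 s
LINKAGES = 30
TEST_MINUTES = 15

polls_per_linkage_per_min = 60 // POLL_INTERVAL_S          # 2 polls/min

# Because all linkages share one database, each poll after an update
# finds the changed patient, i.e. twice a minute per linkage.
detections_per_min = LINKAGES * polls_per_linkage_per_min  # 60 per minute
total_detections = detections_per_min * TEST_MINUTES       # 900 over the run

print(detections_per_min, total_detections)
```

Even at laptop scale, that is 900 detect-and-transfer events in a 15-minute run, which is what makes the non-transfer overhead visible in the graphs.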
Here's how the test system looked over 15 minutes under the above conditions. To make it easier to see what is happening, I created a special edition of the engine that tracks more timing information than normal. It also tracks this information per engine; only one engine's information is displayed below:
Note that the maximum CPU usage shown on the graph is 1% of the total CPU time available.
This graph shows 7 different segments of the engine's CPU utilization, instead of data transfer time versus everything else as was collected during prior tests. Each segment shows the User and Kernel time used by that aspect of the program. The important segment is the red/blue pair of layers in the center: this is the time spent on data transfers. The rest is time spent on other functions performed to utilize the resources of the server effectively.
Here's a rundown of the different segments of the graph, starting at the top. User and Kernel layers are shown for each segment:
The top layer is idle time plus all other processing time, User and Kernel together.
The orange/light blue layers are the time spent setting engine status flags, checking for a command to process, and capturing CPU usage.
The purple/green layers are the time spent determining the next X-Link data transfer task to execute, if any are ready.
The red/blue layers are the time spent transferring the data.
The next two layers, orange and light blue, are almost non-existent. They represent the time used to execute a DoEvents command for the UI. But the X-Link Engine hasn't had its own UI since version 15, so this vestigial code will be removed from all editions of the X-Link Engine in version 20.01 (released 11/21/2017).
The purple/green layers are the time spent capturing CPU usage, writing the message statistics, setting status flags, and calculating the amount of processing time used since the last cycle.
And the final, bottom red/blue layers are the amount of CPU time used while waiting. This comes from background tasks the X-Link Engine performs while it is "waiting" for the next cycle: deferred processing such as memory cleanup and other programming overhead.
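The layered breakdown above corresponds roughly to one pass through the engine's main loop, top layer to bottom. As a hedged illustration only (none of these function or phase names come from X-Link; they are placeholders for the segments described above), one cycle might look like this:

```python
# Illustrative sketch of one engine cycle, mirroring the graph's layers
# from top to bottom. All names here are placeholders, not X-Link internals.

def run_one_cycle(task_ready=False):
    phases = []

    # orange/light blue (upper): set status flags, check for a pending
    # command, sample CPU usage
    phases.append("status/command/cpu-capture")

    # purple/green (upper): determine the next data transfer task, if any
    phases.append("scheduling")

    # red/blue (center): the actual work -- usually nothing is due,
    # which is why this layer is thin relative to the others
    if task_ready:
        phases.append("transfer")

    # orange/light blue (lower): legacy DoEvents call, near-zero since
    # the UI was removed in version 15
    phases.append("doevents (legacy)")

    # purple/green (lower): capture CPU usage, write message statistics,
    # set status flags, compute processing time since the last cycle
    phases.append("stats/flags/accounting")

    # red/blue (bottom): deferred background work (memory cleanup, etc.)
    # performed while "waiting" for the next cycle
    phases.append("wait + background tasks")

    return phases

print(run_one_cycle())            # a cycle with no transfer due
print(run_one_cycle(task_ready=True))
```

The key point the sketch makes is that every one of these phases runs on every cycle, whether or not a transfer actually happens.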
But something still didn't make sense: why does the engine spend more time not transferring data than transferring it? The answer didn't hit me right away. Then I realized the graph is quite two-dimensional and suggests at a glance that scheduling is checked only once per data transfer cycle. But the engine processes in a more layered manner: in this case the scheduling check occurs 330 times a minute per engine, distributed fairly evenly over the minute, while data transfers happen only six times a minute. Multiply that by 30 engines, and it starts to make sense why so much CPU is being used to not transfer data.
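The numbers above work out as follows (using the figures from the test; the variable names are mine):

```python
# Per-minute scheduling overhead from the test figures above.
CHECKS_PER_MIN_PER_ENGINE = 330   # scheduling checks, spread over the minute
TRANSFERS_PER_MIN_PER_ENGINE = 6  # actual data transfers
ENGINES = 30

# 330 / 6 = 55 scheduling checks for every one real transfer, per engine.
checks_per_transfer = CHECKS_PER_MIN_PER_ENGINE / TRANSFERS_PER_MIN_PER_ENGINE

# Across all 30 engines, that's 9,900 scheduling checks every minute.
total_checks_per_min = CHECKS_PER_MIN_PER_ENGINE * ENGINES

print(checks_per_transfer, total_checks_per_min)
```

So the "not transferring" layers of the graph are not one check per transfer; they are 55 checks per transfer, nearly ten thousand per minute across the whole test system.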
What does stand out is that the engine appears to spend a lot of CPU time whenever it captures CPU usage data. I noticed this before making the changes to the engine to collect these new statistics, and developed a much quicker method of collecting the raw CPU usage data. This new method produces much more voluminous data, however, and is not practical for normal operations.
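One way to make CPU-usage capture cheap is to record raw counters at sample time and defer all the arithmetic and formatting until later, at the cost of storing many raw samples. The following is only a sketch of that general idea in Python, assuming nothing about X-Link's actual implementation; `os.times()` is a single inexpensive call that returns the process's cumulative user and system CPU seconds:

```python
import os

# Sketch of "capture raw now, aggregate later" CPU sampling. This is an
# illustration of the technique, not X-Link's actual code.

def sample_raw(samples):
    # One cheap call; stash the raw cumulative counters and do no math here.
    t = os.times()
    samples.append((t.user, t.system))

def summarize(samples):
    # Deferred aggregation: user/system deltas between first and last sample.
    (u0, s0), (u1, s1) = samples[0], samples[-1]
    return {"user_s": u1 - u0, "system_s": s1 - s0}

samples = []
sample_raw(samples)
for _ in range(100_000):
    pass                      # burn a little CPU between the two samples
sample_raw(samples)
print(summarize(samples))
```

The trade-off matches what I saw in the test: sampling itself becomes very fast, but the stream of raw samples grows quickly, which is why this approach is not practical for normal operations.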
What can we do about the CPU utilization?
All of this information suggests that some changes should be made in the Enterprise Edition to improve the way the engine utilizes resources, especially CPU. In my next entry I will try some changes to X-Link, and we'll measure whether they reduce its CPU usage in a server environment without degrading message delivery time.