In my last three blog entries I talked about the Beta Results of the X-Link Enterprise Edition and how it reduced message delivery times by 77%. But we also found that a handful of messages were still being delivered slowly even with all of the Enterprise improvements. We found a fourth cause of these delays, one that appears to be associated with CPU utilization by X-Link Engines while the Engines are not transferring data. That is the topic of this blog entry.
What is causing the final delivery delay?
As mentioned in the last blog entry, we needed to replicate the issue so we could analyze it further. To replicate a similar, but busier, test scenario, I set up an X-Link on my laptop (4 CPUs) with the same PM system database and a similar TCP/IP-connected simulated EHR. I set up 30 linkages, which reside alongside 3 other unrelated linkages that were parked for these tests (the client had 29 linkages operating on the 24th). Each linkage waits 30 seconds between reads of new data from the PM or EHR, the same as at the client. I also set up an automated patient updater that updates a patient in the source PM every 30 seconds. All 30 linkages use the same database, so they all find the patient change.
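The automated patient updater can be approximated with a short script. This is a minimal sketch, not the actual test tool: the use of SQLite, the table and column names (`patient`, `last_modified`), and the function names are assumptions for illustration only. The idea is simply to touch the same patient record once per polling interval so that every linkage reading the shared database sees a change.

```python
import sqlite3
import time

POLL_INTERVAL_SECONDS = 30  # matches the 30-second linkage read interval


def touch_patient(conn: sqlite3.Connection, patient_id: int) -> None:
    """Update one patient record so every polling linkage sees a change."""
    with conn:  # the connection context manager commits on success
        conn.execute(
            "UPDATE patient SET last_modified = ? WHERE id = ?",
            (time.time(), patient_id),
        )


def run_updater(conn: sqlite3.Connection, patient_id: int, cycles: int) -> None:
    """Change the same patient once per interval for the test duration."""
    for _ in range(cycles):
        touch_patient(conn, patient_id)
        time.sleep(POLL_INTERVAL_SECONDS)
```

Because all 30 linkages poll the same database, a single update like this fans out into 30 detected changes, which is what pushed the test load to roughly 60 patients transferred per minute.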
Here's how the test system's CPU utilization looks over 15 minutes under the above conditions. To make it easier to see what is happening, I created a special edition of the engine that tracks more timing information than normal. It also tracks this information per engine; only one engine's information is displayed below:
Note that this graph covers only one engine, and the maximum CPU usage shown on the graph is 1%.
This graph breaks the engine's CPU utilization into 7 different segments, instead of the data transfer time versus everything else that was collected at the customer site. The important layers to note are the red/blue bands in the center; this is the time spent on data transfers. The rest is time spent on other functions performed to utilize the server's resources effectively. As noted, we succeeded in creating an environment with more data transferring per minute (about 60 patients) than at the customer site, hoping to force significantly degraded message delivery times versus those at the client site. Even though the load was significantly higher, the delivery times were only affected to a lesser extent: only 1 to 4 seconds, and only on a handful of messages.
Here's a rundown of the different layers of the graph; User and Kernel still apply and are shown for each segment:
The top layer is idle and all other processing time, User and Kernel together.
The orange/light blue layers are the time spent setting engine status flags, checking for a command to process, and capturing message statistics.
The purple/green layers are the time spent determining the next X-Link data transfer task to execute, if any are ready.
The red/blue layers are the time spent transferring the data.
The next two layers, orange and light blue, are almost non-existent. They represent the time used to execute a DoEvents command for the UI. But the X-Link Engine hasn't had its own UI since version 15, so this vestigial code will be removed from all editions of the X-Link Engine in version 20.
The purple/green layers are the time spent capturing more message statistics, writing the message statistics, writing the engine status, and calculating the amount of processing time used since the last cycle.
And the final, bottom red/blue layers are the CPU time used while waiting. This is due to background tasks performed by the X-Link Engine while it is "waiting" for the next cycle.
It still doesn't make sense: why does it take more time to not transfer data than to transfer it? The answer didn't hit me right away. Then I realized the graph is quite two-dimensional and suggests at a glance that the engine checks for scheduling only once per data transfer cycle. But the engine behaves in a more three-dimensional manner: the check for ready processes occurs, both in this case and at the client site, 168 times a minute per engine, distributed roughly evenly over the minute, while data transfers happen only twice a minute. Multiply that by 30 engines, and it starts to make sense why so much CPU is being used.
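The arithmetic behind that realization can be spelled out directly. The figures below (168 scheduling checks per engine per minute, 2 transfers per engine per minute, 30 engines) come from the measurements described in this post:

```python
# Per-engine activity per minute, taken from the measurements above.
checks_per_engine = 168    # scheduling/status checks per minute
transfers_per_engine = 2   # one data read every 30 seconds
engines = 30

total_checks = checks_per_engine * engines        # 5,040 checks per minute
total_transfers = transfers_per_engine * engines  # 60 transfers per minute

# For every data transfer, the system performs 84 scheduling checks.
checks_per_transfer = total_checks // total_transfers

print(total_checks, total_transfers, checks_per_transfer)  # 5040 60 84
```

So the overhead layers in the graph are not one big block of wasted time; they are thousands of small checks per minute, which is why they can dwarf the transfer time even though each individual check is cheap.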
So it seems that CPU utilization is not a major contributing factor to message delivery times, but it does appear to account for the final portion of the delay.
What can we do about the CPU utilization?
All of this information suggests that some changes should be made in the Enterprise Edition to improve the way the engine utilizes resources, especially CPU. My next series of blog entries will cover the results of these changes.
Thank you for reading about our Beta Results and message delivery times for the X-Link Enterprise Edition. We are excited that the results are this significant.