In my last two blog entries I talked about the beta results of the X-Link Enterprise Edition and how it reduced message delivery times by 77%. We looked at how and why those results were obtained and concluded that X-Link Enterprise Edition handles high-volume message delivery much better than the application and/or service editions.
But we also found that a handful of messages were still being delivered slowly, even with all of the improvements in Enterprise. We've found 4 reasons for these delays:
- Trying to update a patient in the PM while that patient is locked adds 5 minutes or more to the delivery time.
- The EHR sometimes replies very slowly or not at all, and these delays directly affect delivery time.
- Reading an updated patient from the PM while that patient is locked in the PM adds to the delivery time of the message for this patient. It also holds up all subsequent messages until the lock is released.
- The cause of the final, very small delay, which affects just a few messages, is unknown and is the topic of this blog entry.
How do we find out what is causing this 4th delay?
As I noted in the last entry, we had 4.5 million CPU utilization data points from the client site, so why not use them to either show that the issue is related to CPU utilization or rule CPU utilization out?
Let's start by taking a closer, 15-minute view of a period when messages were delivering slowly at the client site, on the final day of the final graph in Part 2:
Let's look closely. First, what is User versus Kernel? User time is when a program is executing in normal program mode. Kernel time is when Windows is performing some operation for the user program in a protected state, like reading the hard disk, sending data over a network, etc.
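To make the User/Kernel split concrete, here is a minimal Python sketch that measures both for two small workloads. It has nothing to do with X-Link itself: the function names `measure`, `compute_only` and `disk_io` are made up for this illustration, and the exact numbers will vary from machine to machine.

```python
import os
import tempfile


def measure(work):
    """Return (user_seconds, kernel_seconds) used by this process while running work()."""
    before = os.times()
    work()
    after = os.times()
    return after.user - before.user, after.system - before.system


def compute_only():
    # Pure arithmetic: the time spent here is charged almost entirely as user time.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


def disk_io():
    # Small writes pushed to disk: the operating system does this work in kernel mode.
    with tempfile.TemporaryFile() as f:
        for _ in range(200):
            f.write(b"x" * 4096)
            f.flush()
            os.fsync(f.fileno())


if __name__ == "__main__":
    for name, work in (("compute_only", compute_only), ("disk_io", disk_io)):
        user, kernel = measure(work)
        print(f"{name}: user={user:.3f}s kernel={kernel:.3f}s")
```

On most machines the arithmetic loop shows up mainly as user time, while the disk writes add kernel time. That is the same distinction the graph draws for each group of processes.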
The data transfer times (User and Kernel) appear as the very thin line at the bottom of the graph. The data transfers are using less than 2% of the capacity of the 8 available CPUs.
Scheduling and other non-data transfer activities are the green and purple layers. Yes, this function is eating up a large amount of the CPU time at the customer site (about 25%, a problem by itself), but is it the cause of the remaining message delay?
All other processes, the blue and orange layers, are just that. This includes the database, the PM system, the dozens of users on terminal services using the PM, their internet connections to the EHR system, and whatever else the users are doing.
Idle, the top layer, is the time the CPUs are not being utilized. During the peak period depicted above, about 35% of the CPU capacity is still unused. This shows that the CPUs as a resource are not causing any system slowdowns.
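The arithmetic behind a layered graph like this can be sketched in a few lines of Python. This is only an illustration, not the tool used at the client site: the function name `cpu_shares` and the CPU-second figures are invented, picked so the shares line up with the rough percentages quoted above (with data transfer set at its 2% ceiling).

```python
NUM_CPUS = 8  # the client server has 8 CPUs


def cpu_shares(busy_seconds, interval_seconds, num_cpus=NUM_CPUS):
    """Turn CPU-seconds per category into percentages of total CPU capacity.

    Capacity is interval length times CPU count; whatever is left over is idle.
    """
    capacity = interval_seconds * num_cpus
    shares = {name: 100.0 * secs / capacity for name, secs in busy_seconds.items()}
    shares["idle"] = 100.0 - sum(shares.values())
    return shares


if __name__ == "__main__":
    # Hypothetical CPU-seconds for one 15-minute (900 second) window.
    sample = {
        "data transfer": 144.0,
        "scheduling and other": 1800.0,
        "all other processes": 2736.0,
    }
    for name, pct in cpu_shares(sample, interval_seconds=900).items():
        print(f"{name}: {pct:.1f}%")
```

The point of the calculation is the last line of the function: as long as the idle share stays well above zero during the slow period, the CPUs are not the resource the messages are waiting on.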
Where do we go from here?
At this point, we needed to replicate the issue so we could analyze it further. We had gone as far as we could with the types of data collected at the customer site. In an effort to reproduce a similar, but busier, test scenario, I set up an X-Link with the same PM system database and a similar simulated EHR connected over TCP/IP. The results of this test are the topic of my next blog entry, Part 4.