Server Backup Delay Over Gigabit Ethernet
By: Chris Greer
Have you ever experienced a backup that takes much longer than you think it should? Even with networks running faster than ever these days, a server backup can sometimes take hours longer than the numbers would lead us to expect. When we have Gigabit connections in the data center, we naturally want to see servers take full advantage of these high-speed links.
Doing the simple math – when we need to back up 100GB of data, we should see this completed in less than 15 minutes.
100 GB x 8 (8 bits per byte) = 800 gigabits of data to transfer.
800 Gb / 1 Gb per second = 800 seconds.
800 seconds / 60 seconds per minute = 13.3 minutes.
If the servers can take full advantage of the network, 15 minutes would be a reasonable number to expect, unless other backups are simultaneously occurring or network traffic is contending for the link.
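The arithmetic above can be sketched in a few lines of Python. This is only a best-case estimate: the function name `transfer_seconds` is made up for illustration, and the calculation ignores protocol overhead and any competing traffic.

```python
def transfer_seconds(size_gigabytes: float, link_gbps: float) -> float:
    """Best-case transfer time in seconds for a given data size and link speed."""
    size_gigabits = size_gigabytes * 8  # 8 bits per byte
    return size_gigabits / link_gbps


seconds = transfer_seconds(100, 1.0)
print(f"{seconds:.0f} s = {seconds / 60:.1f} min")  # 800 s = 13.3 min
```

Changing the second argument shows how the same backup would fare on a slower or faster link.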
In one scenario, however, this backup was taking several hours to complete. Given the amount of time it should take, it is clear that the backup process was not using the network as it should have been.
What was the root cause in this instance?
The TCP Window on the backup server was filling, requiring the source server to halt while the buffer was cleared by the application. This occurred thousands and thousands of times, causing enormous delay in the backup process.
We will now discuss why a problem with TCP Window can have this kind of effect, how we found this problem with Wireshark, and what we can do to address this issue and improve performance.
TCP Window – What is it, and why did it cause the delay?
A TCP Window can be described as a receive buffer on a given station. It is the amount of data that can be sent to that station at one time, without receiving an ACK for the data sent. With every ACK, a receiver will inform the sender how much space it has remaining in its TCP Window.
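In many cases, we don’t see this number drop: the receiver keeps advertising 65,535 bytes of room in the buffer, which is the largest value the 16-bit Window field can carry when window scaling is not in use.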
If this window does fill, the receiver will send a window size of Zero to the sender, indicating a full window, which should cause the sender to halt transmission until further notice. One reason the TCP window can fill is that the application is not clearing this buffer as quickly as traffic is coming into it. The overall halt caused by this event can add seconds, minutes, or even hours to a backup.
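To make the Window value concrete, here is a minimal Python sketch of where it lives in a packet. The Window field is the 16-bit big-endian value at byte offset 14 of the TCP header. The function name `advertised_window` and the sample header are made up for illustration, and the `shift` parameter assumes you already know the window scale negotiated in the handshake (0 when scaling is not used).

```python
import struct


def advertised_window(tcp_header: bytes, shift: int = 0) -> int:
    """Return the receive window, in bytes, advertised in a raw TCP header."""
    if len(tcp_header) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    # Window is a 16-bit big-endian field at byte offset 14.
    (window,) = struct.unpack_from("!H", tcp_header, 14)
    return window << shift


# Hypothetical 20-byte header: ports, sequence number, ACK number,
# data offset (5 words), ACK flag, window, checksum, urgent pointer.
header = struct.pack("!HHIIBBHHH", 5001, 5002, 1, 1, 5 << 4, 0x10, 65535, 0, 0)

print(advertised_window(header))  # 65535
```

A value of 0 from this function corresponds to the Zero Window condition described above.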
How can this problem be found with Wireshark?
Often, hours of extra delay are caused by several thousand smaller delays. The first key to finding these delays in Wireshark is to use the time column to look for any halt in the data stream. In the screenshot below, the time column was set to view the time between the current packet and the previously displayed packet. This is also known as delta time. Note packet 144.
When compared to the time between the packets above it, which are sub-millisecond, this packet experiences a relatively long delay. It is a 64-byte ACK from the backup server to the source server. What caused this packet to be delayed for 20 milliseconds?
Look at the Window value of the previous packet in this direction – packet 142. This packet has a window of 2299 bytes, meaning the receiver only has 2299 bytes left in the receive buffer before it is full. When the sender received this ACK, it saw that there was only room for one more full-size packet. This final packet is sent in frame 143, then the sender waits for the receive buffer to clear. In packet 144, the window has returned to 65,535 bytes and transfer can resume. After this frame, the delta times return to microseconds.
Note: In this situation, we never actually saw the TCP Window completely fill, resulting in a Zero Window event. As this example shows, zero windows are not the only thing to look for when troubleshooting window problems.
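The same check can be sketched in Python for a capture that has been exported to a list of ACKs from the receiver. This is a rough illustration, not a Wireshark feature: the names `Ack` and `find_window_stalls` are hypothetical, the timestamps and frame 140 are invented, and the 1460-byte segment size is an assumption typical of Ethernet. Only the window values of 2299 and 65,535 bytes and the roughly 20 millisecond gap come from the trace discussed above.

```python
from typing import NamedTuple


class Ack(NamedTuple):
    number: int    # frame number in the capture
    time: float    # capture timestamp, in seconds
    window: int    # advertised receive window, in bytes


def find_window_stalls(acks, gap=0.010, mss=1460):
    """Return (low_frame, resumed_frame, delay) for each long gap between
    receiver ACKs that follows a window too small for two full segments."""
    stalls = []
    for prev, cur in zip(acks, acks[1:]):
        delay = cur.time - prev.time
        if delay >= gap and prev.window < 2 * mss:
            stalls.append((prev.number, cur.number, delay))
    return stalls


# Invented timestamps; window values follow the trace in the article.
acks = [
    Ack(140, 0.0000, 5219),
    Ack(142, 0.0002, 2299),
    Ack(144, 0.0202, 65535),
]

for low, resumed, delay in find_window_stalls(acks):
    print(f"stall between frames {low} and {resumed}: {delay * 1000:.0f} ms")
```

Because the test looks at a shrinking window rather than a window of zero, it catches stalls like this one that never produce a Zero Window event.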
This 20 millisecond delay occurred once for every 10 milliseconds of transfer. In some cases, it took even longer for the TCP Window to return to 65,535 bytes. These delays stretched the overall backup time to over 5 hours.
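A back-of-the-envelope sketch shows how quickly these stalls eat into throughput. It assumes, optimistically, that the sender runs at full line rate during each 10 millisecond burst; the function names are made up for illustration.

```python
def effective_gbps(link_gbps: float, send_ms: float, stall_ms: float) -> float:
    """Average rate when each burst of sending is followed by a stall."""
    duty_cycle = send_ms / (send_ms + stall_ms)
    return link_gbps * duty_cycle


def backup_minutes(size_gigabytes: float, rate_gbps: float) -> float:
    """Time to move the data at a given average rate."""
    return size_gigabytes * 8 / rate_gbps / 60


rate = effective_gbps(1.0, send_ms=10, stall_ms=20)
print(f"{rate:.2f} Gb/s, {backup_minutes(100, rate):.0f} minutes")

# Average rate implied by a 5 hour backup of 100 GB.
observed = 100 * 8 / (5 * 3600)
print(f"{observed * 1000:.0f} Mb/s")
```

Under these assumptions the 10/20 pattern alone cuts the link to about a third of its capacity and triples the ideal backup time. A 5 hour backup works out to an average of roughly 44 Mb/s, so the longer window recoveries mentioned above, and presumably bursts that ran below line rate, account for the rest.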
What can be done to resolve this?
The backup application in this example is single-threaded. Only one TCP connection is used to transfer data from one machine to another, and the application on the receiver is not processing data out of the receive buffer as fast as it is coming in. In some cases, backup apps can be tuned to use more TCP connections, but this is not always possible. It may be that a new backup application is needed in order to use multi-threading. Also, check with the vendor to see if there are any tuning options in the application itself that will cause it to clear the TCP buffer more frequently.
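To illustrate what "clearing the buffer more frequently" can look like inside an application, here is a minimal Python sketch that separates reading from the socket from the slower work of processing the data. It is not how any particular backup product works; all function names are hypothetical, and the in-memory `sink` merely stands in for the slow step, such as writing to disk or tape.

```python
import queue
import socket
import threading


def drain_socket(sock: socket.socket, chunks: queue.Queue, bufsize: int = 65536) -> None:
    """Empty the receive buffer as fast as possible and hand data to a worker."""
    while True:
        data = sock.recv(bufsize)
        if not data:          # peer closed the connection
            chunks.put(None)  # sentinel: no more data
            return
        chunks.put(data)


def slow_consumer(chunks: queue.Queue, sink: bytearray) -> None:
    """Stand-in for the slow part of the application."""
    while (data := chunks.get()) is not None:
        sink.extend(data)


def receive_all(sock: socket.socket) -> bytes:
    """Receive until the peer closes, reading and processing in parallel."""
    chunks: queue.Queue = queue.Queue()
    sink = bytearray()
    reader = threading.Thread(target=drain_socket, args=(sock, chunks))
    writer = threading.Thread(target=slow_consumer, args=(chunks, sink))
    reader.start()
    writer.start()
    reader.join()
    writer.join()
    return bytes(sink)
```

The design choice here is that the reader thread never waits on the slow step, so the advertised window has a better chance of staying open. The trade-off is that the queue is unbounded in this sketch, which moves the backlog from the TCP buffer into application memory; a real application would need to bound it.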
There are many things that can impact server backups. So far we have discussed the impact of file handling and delays caused by full TCP Windows.
About the Author:
Chris Greer is a Senior Network Analyst for Network Protocol Specialists, a Seattle based Network Consulting company. Chris has 10 years of experience in analyzing and troubleshooting networks. He regularly assists companies in tracking down the source of network and application performance problems using a variety of protocol analysis and monitoring tools including Wireshark. When he isn’t hunting down problems at the packet level, he can be found teaching various analysis workshops at Interop and other industry trade shows. Chris also delivers Fluke Networks public courses and protocol analysis themed webcasts. He can be contacted at chris (at) nps-llc (dot) com
