为什么要offloading?
TCP的分片和校验和计算原本都由CPU完成,最新的网络适配器实现了硬件级别的分片和校验和计算,减轻了主机CPU的负担从而提升本机网络包的吞吐量;
原理
当数据包被内核tcp/ip 栈传递到驱动的时候, tcp/ip栈并不会利用CPU来进行tcp checksum的计算。所以在这个时刻,这个tcp checksum的值是随机的脏数据。只要数据包传递到驱动后,由于网卡驱动运行在tcp checksum offload 模式,所以驱动会调用芯片的tcp checksum计算功能来完成高性能tcp checksum重新计算。并将数据包放到线路上
TCP segmentaion同理;
Snoop是solaris自带的原生网络抓包工具,工作在内核和网络驱动之间,它抓的包发生在被NIC芯片重新计算tcp checksum之前,所以这个脏数据会导致wireshark报告tcp checksum incorrect的告警。
http://wdqfirst.blog.163.com/blog/static/1133474112011510101844959/
Checksum Offload的设置有四种:是否对Rx或Tx有效,也可以为对两者都有效。
对于Tx,设置Checksum Offload有效之后,传输层将随机填充TCP校验和,因此在本机上抓取的数据包是Bad CheckSum。然后,网卡会自动计算正确的校验码然后发送,因此对方收到的仍然是正确的TCP包。
对于Rx,设置Checksum Offload有效之后,网卡在接收数据时,会填充一个NDIS_TCP_IP_CHECKSUM_PACKET_INFO 结构并设置标志位;
http://blog.csdn.net/wangqi0079/article/details/9064557
TCP Segmentaion offload
10.10.1.22采用netcat接收数据
sgordon@basil$ nc -l 5001
192.168.1.2则发送10000字节的数据
sgordon@ginger$ nc -p 5002 10.10.1.22 5001 < 10000bytes.txt
开启GSO
sgordon@ginger$ sudo ethtool -k eth0
Offload parameters for eth0:
Cannot get device flags: Operation not supported
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: on
large receive offload: off
sgordon@ginger$ sudo tcpdump -i eth0 -n 'not port 22'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
18:30:24.899687 IP 192.168.1.2.5002 > 10.10.1.22.5001: S 679249855:679249855(0) win 5840
18:30:24.900583 IP 10.10.1.22.5001 > 192.168.1.2.5002: S 1420594303:1420594303(0) ack 679249856 win 5792
18:30:24.900612 IP 192.168.1.2.5002 > 10.10.1.22.5001: . ack 1 win 92
18:30:24.900713 IP 192.168.1.2.5002 > 10.10.1.22.5001: . 1:2897(2896) ack 1 win 92
18:30:24.900735 IP 192.168.1.2.5002 > 10.10.1.22.5001: . 2897:4345(1448) ack 1 win 92
18:30:24.902575 IP 10.10.1.22.5001 > 192.168.1.2.5002: . ack 1449 win 68
18:30:24.902591 IP 192.168.1.2.5002 > 10.10.1.22.5001: P 4345:7241(2896) ack 1 win 92
18:30:24.903597 IP 10.10.1.22.5001 > 192.168.1.2.5002: . ack 2897 win 91
18:30:24.903607 IP 192.168.1.2.5002 > 10.10.1.22.5001: . 7241:8689(1448) ack 1 win 92
18:30:24.903613 IP 192.168.1.2.5002 > 10.10.1.22.5001: P 8689:10001(1312) ack 1 win 92
18:30:24.903617 IP 10.10.1.22.5001 > 192.168.1.2.5002: . ack 4345 win 114
18:30:24.905573 IP 10.10.1.22.5001 > 192.168.1.2.5002: . ack 5793 win 136
18:30:24.905587 IP 10.10.1.22.5001 > 192.168.1.2.5002: . ack 7241 win 159
18:30:24.906628 IP 10.10.1.22.5001 > 192.168.1.2.5002: . ack 8689 win 181
18:30:24.906637 IP 10.10.1.22.5001 > 192.168.1.2.5002: . ack 10001 win 204
10000个字节被分成5个segment,最大的为2896,可以太网的MSS为1460(启用SACK和timestamp后为1448),这便是Generic Segmentaion Offloading在工作;
关闭GSO
sgordon@ginger$ sudo ethtool -K eth0 gso off
sgordon@ginger$ sudo tcpdump -i eth0 -n 'not port 22'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
18:33:02.644356 IP 192.168.1.2.5002 > 10.10.1.22.5001: S 3144010294:3144010294(0) win 5840
18:33:02.645427 IP 10.10.1.22.5001 > 192.168.1.2.5002: S 3901655238:3901655238(0) ack 3144010295 win 5792
18:33:02.645471 IP 192.168.1.2.5002 > 10.10.1.22.5001: . ack 1 win 92
18:33:02.645542 IP 192.168.1.2.5002 > 10.10.1.22.5001: . 1:1449(1448) ack 1 win 92
18:33:02.645558 IP 192.168.1.2.5002 > 10.10.1.22.5001: . 1449:2897(1448) ack 1 win 92
18:33:02.645567 IP 192.168.1.2.5002 > 10.10.1.22.5001: P 2897:4345(1448) ack 1 win 92
18:33:02.647415 IP 10.10.1.22.5001 > 192.168.1.2.5002: . ack 1449 win 68
18:33:02.647433 IP 192.168.1.2.5002 > 10.10.1.22.5001: . 4345:5793(1448) ack 1 win 92
18:33:02.647439 IP 192.168.1.2.5002 > 10.10.1.22.5001: . 5793:7241(1448) ack 1 win 92
18:33:02.648437 IP 10.10.1.22.5001 > 192.168.1.2.5002: . ack 2897 win 91
18:33:02.648446 IP 192.168.1.2.5002 > 10.10.1.22.5001: . 7241:8689(1448) ack 1 win 92
18:33:02.648451 IP 192.168.1.2.5002 > 10.10.1.22.5001: P 8689:10001(1312) ack 1 win 92
18:33:02.648460 IP 10.10.1.22.5001 > 192.168.1.2.5002: . ack 4345 win 114
18:33:02.650414 IP 10.10.1.22.5001 > 192.168.1.2.5002: . ack 5793 win 136
18:33:02.650428 IP 10.10.1.22.5001 > 192.168.1.2.5002: . ack 7241 win 159
18:33:02.651469 IP 10.10.1.22.5001 > 192.168.1.2.5002: . ack 8689 win 181
18:33:02.651476 IP 10.10.1.22.5001 > 192.168.1.2.5002: . ack 10001 win 204
此时TCP segment在内核栈便参照MSS分割
http://sandilands.info/sgordon/segmentation-offloading-with-wireshark-and-ethtool