Steps for Locating Linux TCP Retransmission Problems

The Prometheus alert rule is as follows:

- alert: HostTCPRetransmitsHigh
  expr: increase(node_netstat_Tcp_RetransSegs[5m]) > 20000

Prometheus monitoring data shows: retrans=21736/5m
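For scale, 21736 retransmitted segments in 5 minutes is roughly 72 segments per second (21736 / 300 ≈ 72.5). An absolute count is hard to judge without knowing how much was sent, so the following is a minimal sketch that relates RetransSegs to OutSegs. It assumes the counters are read from /proc/net/snmp (the kernel file that also backs nstat); the function name retrans_ratio is made up for illustration, and the result is cumulative since boot, not the 5-minute window the alert uses.

```shell
# Sketch: print RetransSegs as a percentage of OutSegs.
# $1: path to a file in /proc/net/snmp format (defaults to the live file).
# The counters are cumulative since boot.
retrans_ratio() {
    awk '
        # First "Tcp:" line is the header: remember each column position.
        $1 == "Tcp:" && !seen {
            for (i = 2; i <= NF; i++) col[$i] = i
            seen = 1
            next
        }
        # Second "Tcp:" line carries the values.
        $1 == "Tcp:" {
            out = $(col["OutSegs"])
            re  = $(col["RetransSegs"])
            if (out > 0) printf "%.4f\n", re * 100 / out
        }
    ' "${1:-/proc/net/snmp}"
}

# Assumed usage on a live host:
#   retrans_ratio
```

A ratio that stays small while the absolute count is high usually just reflects heavy traffic; a ratio that climbs is the more interesting signal.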

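Check the state of the kernel network stack on the system with nstat. It shows network stack statistics, and because it reports counter deltas between runs, it helps to quickly pinpoint anomalies that are happening right now. On the first run (or with -a) it prints the counters accumulated since boot; later runs print only the increment since the previous run.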

# nstat
#kernel
IpInReceives 741632185 0.0
IpInHdrErrors 1454 0.0
IpInAddrErrors 24848 0.0
IpForwDatagrams 6281793 0.0
IpInUnknownProtos 173793 0.0
IpInDelivers 585081400 0.0
IpOutRequests 633335747 0.0
IpOutDiscards 39 0.0
IpOutNoRoutes 6912 0.0
IpReasmTimeout 104832 0.0
IpReasmReqds 40046807 0.0
IpReasmOKs 19711639 0.0
IpReasmFails 268659 0.0
IpFragOKs 2920 0.0
IpFragCreates 5840 0.0
IcmpInMsgs 28451709 0.0
IcmpInErrors 50710 0.0
IcmpInCsumErrors 4 0.0
IcmpInDestUnreachs 116906 0.0
IcmpInTimeExcds 12355 0.0
IcmpInRedirects 1137 0.0
IcmpInEchos 28320718 0.0
IcmpInEchoReps 47 0.0
IcmpInTimestamps 542 0.0
IcmpOutMsgs 34766477 0.0
IcmpOutDestUnreachs 6455420 0.0
IcmpOutTimeExcds 22793 0.0
IcmpOutRedirects 32 0.0
IcmpOutEchos 16 0.0
...
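The delta counting mentioned above can be sketched as follows. nstat -n updates the history without printing anything, so the next plain nstat run shows only what changed in between. The helper filter_retrans is a hypothetical name, and the set of counter names it matches is an assumption about which counters are of interest here, not an exhaustive list.

```shell
# Sketch: keep only retransmission-related TCP counters from nstat-style
# output (columns: name, value, rate). Reads stdin, prints "name value".
filter_retrans() {
    awk '$1 ~ /^Tcp(Ext)?.*(Retrans|Timeouts|LostRetransmit)/ { print $1, $2 }'
}

# Assumed usage on a live host (requires nstat from iproute2):
#   nstat -n                  # record a baseline, print nothing
#   sleep 10
#   nstat | filter_retrans    # increments during the last 10 seconds
```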

Use bpftrace to count retransmissions per (source port, destination port) pair:

# bpftrace -e '
tracepoint:tcp:tcp_retransmit_skb
{
@[args->sport, args->dport] = count();
}'

Attaching 1 probe...


^C

@[10050, 43344]: 4
@[22, 6890]: 5
@[22, 33828]: 5
@[22, 16254]: 5
@[22, 32874]: 5
@[22, 59454]: 6
@[22, 50852]: 6
@[22, 56096]: 7
@[9100, 51150]: 10
@[443, 49917]: 21
@[443, 50447]: 21
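The map above is keyed by port pair, so one busy local service is spread over many lines. A minimal sketch for totalling the counts per source port from saved bpftrace output follows; sum_by_sport is a hypothetical helper, and it assumes the map lines have exactly the "@[sport, dport]: count" shape shown above.

```shell
# Sketch: sum retransmit counts per source port.
# Reads bpftrace map output on stdin, prints "sport total",
# highest total first.
sum_by_sport() {
    awk '
        /^@\[/ {
            gsub(/[^0-9 ]/, "")      # "@[22, 6890]: 5" becomes "22 6890 5"
            sum[$1] += $3
        }
        END { for (p in sum) print p, sum[p] }
    ' | sort -k2,2nr -k1,1n
}

# Assumed usage, with the output above saved to a file named retrans.txt:
#   sum_by_sport < retrans.txt
```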

Use bpftrace to trace retransmissions on specific ports in real time:

# bpftrace -e '
tracepoint:tcp:tcp_retransmit_skb
/args->sport == 80 || args->sport == 443 || args->sport == 22/
{
printf("%d -> %d retrans\n", args->sport, args->dport);
}'
Attaching 1 probe...
22 -> 44184 retrans
22 -> 45636 retrans
22 -> 49526 retrans
22 -> 46220 retrans
22 -> 50296 retrans
22 -> 45636 retrans
22 -> 45636 retrans
22 -> 45636 retrans
22 -> 35922 retrans
22 -> 44184 retrans
22 -> 49526 retrans
22 -> 45636 retrans
22 -> 45256 retrans
22 -> 45256 retrans
22 -> 45256 retrans
22 -> 46886 retrans
22 -> 45256 retrans
443 -> 42035 retrans
443 -> 42035 retrans
443 -> 12169 retrans
443 -> 42035 retrans
443 -> 42035 retrans
443 -> 12169 retrans
443 -> 12169 retrans
443 -> 42035 retrans
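In a stream like this, the same flow often repeats many times. As a rough sketch, the captured lines can be ranked by flow to see which connections retransmit most; top_flows is a hypothetical helper and assumes the "sport -> dport retrans" line format printed by the script above.

```shell
# Sketch: count retransmit events per flow.
# Reads the trace lines on stdin, prints "count sport -> dport",
# most frequent flow first.
top_flows() {
    awk '
        $2 == "->" && $4 == "retrans" { n[$1 " -> " $3]++ }
        END { for (f in n) print n[f], f }
    ' | sort -k1,1nr -k2,2n -k4,4n
}

# Assumed usage, with the trace output saved to a file named trace.txt:
#   top_flows < trace.txt
```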