mirror of
https://github.com/torvalds/linux.git
synced 2026-03-08 06:24:43 +01:00
tcp: give up on stronger sk_rcvbuf checks (for now)
We hit another corner case which leads to TcpExtTCPRcvQDrop.
Connections which send RPCs in the 20-80kB range over loopback
experience spurious drops. The exact conditions for most of the
drops I investigated are that:
 - socket exchanged >1MB of data, so it's not completely fresh
 - rcvbuf is around 128kB (default, hasn't grown)
 - there is ~60kB of data in rcvq
 - skb > 64kB arrives

The sum of skb->len (!) of both of the skbs (the one already in
rcvq and the arriving one) is larger than rwnd. My suspicion is
that this happens because __tcp_select_window() rounds the rwnd
up to (1 << wscale) if less than half of the rwnd has been consumed.

Eric suggests that given the number of Fixes we already have
pointing to 1d2fbaad7c it's probably time to give up on it, until
a bigger revamp of rmem management. Also, while we could risk
tweaking the rwnd math, there are other drops on the workloads I
investigated after the commit in question which are not explained
by this phenomenon.

Suggested-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/20260225122355.585fd57b@kernel.org
Fixes: 1d2fbaad7c ("tcp: stronger sk_rcvbuf checks")
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260227003359.2391017-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This commit is contained in:
parent 6996a2d2d0
commit 026dfef287
1 changed file with 1 addition and 15 deletions
net/ipv4/tcp_input.c

@@ -5374,25 +5374,11 @@ static void tcp_ofo_queue(struct sock *sk)
 static bool tcp_prune_ofo_queue(struct sock *sk, const struct sk_buff *in_skb);
 static int tcp_prune_queue(struct sock *sk, const struct sk_buff *in_skb);
 
-/* Check if this incoming skb can be added to socket receive queues
- * while satisfying sk->sk_rcvbuf limit.
- *
- * In theory we should use skb->truesize, but this can cause problems
- * when applications use too small SO_RCVBUF values.
- * When LRO / hw gro is used, the socket might have a high tp->scaling_ratio,
- * allowing RWIN to be close to available space.
- * Whenever the receive queue gets full, we can receive a small packet
- * filling RWIN, but with a high skb->truesize, because most NIC use 4K page
- * plus sk_buff metadata even when receiving less than 1500 bytes of payload.
- *
- * Note that we use skb->len to decide to accept or drop this packet,
- * but sk->sk_rmem_alloc is the sum of all skb->truesize.
- */
 static bool tcp_can_ingest(const struct sock *sk, const struct sk_buff *skb)
 {
 	unsigned int rmem = atomic_read(&sk->sk_rmem_alloc);
 
-	return rmem + skb->len <= sk->sk_rcvbuf;
+	return rmem <= sk->sk_rcvbuf;
 }
 
 static int tcp_try_rmem_schedule(struct sock *sk, const struct sk_buff *skb,