Why does Wireshark capture the client's request packet while the server never returns an ACK?

Hello everyone. I have run into a very strange problem. After many tests and a lot of searching online I still have not found the answer, so I would like to ask for help analysing it.
To simplify the problem, I wrote two small programs, Server and Client. Client opens the number of connections to Server given by its input parameter and then sends a one-byte command on each connection at a fixed interval. Server does nothing with the commands it receives.
I found the following:
1. Client opens 1 connection to Server and sends a command every 50 ms: no problem after running for more than a dozen hours.
2. Client opens 1 connection to Server and sends a one-byte command every 15 ms: the problem appears after roughly one or two hours.
3. Client opens 4 or more connections and sends a one-byte command on each every 50 ms: the problem appears sometimes after one or two minutes, sometimes after twenty or thirty minutes. The more connections there are, the more often it happens.
When the problem occurs, the symptom is this: Wireshark on the Server side captures the command sent by Client, but Server does not send back an ACK. After 200 ms Client retransmits the packet, and Server then behaves normally again. I compared the unacknowledged packet with the packets received before it under normal conditions and could not see any difference in the content. However, netstat -s shows the "bad segments received" counter increasing.
Environment: I tried Fedora 10 and CentOS 6.3 on several machines, tried two machines connected directly by cable, and tried running Server and Client on the same machine. The problem appears in all of these setups.
Could the experts here help analyse the possible cause? Thank you.
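
The "bad segments received" line printed by netstat -s corresponds, as far as I know, to the kernel's Tcp InErrs counter, which Linux also exposes in /proc/net/snmp. A rising InErrs usually means the kernel discarded segments it considered invalid (a failed checksum, for example), which would be consistent with a packet that Wireshark can see but that is never ACKed. That reading is an assumption, not something this thread confirms. A minimal sketch for sampling the counter around a stall; the helper names ParseTcpCounter and ReadTcpInErrs are hypothetical:

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: given text in the /proc/net/snmp layout (a "Tcp:" line
// of field names followed by a "Tcp:" line of values), return the value of
// one named field, or -1 if the field is not present.
long long ParseTcpCounter(const std::string& snmpText, const std::string& field)
{
    std::istringstream in(snmpText);
    std::string line;
    std::vector<std::string> names;
    while (std::getline(in, line)) {
        if (line.compare(0, 4, "Tcp:") != 0)
            continue;                       // skip Ip:, Icmp:, Udp: ... lines
        std::istringstream ls(line.substr(4));
        std::string tok;
        if (names.empty()) {
            while (ls >> tok)               // first Tcp: line holds the names
                names.push_back(tok);
        } else {
            for (size_t i = 0; i < names.size() && (ls >> tok); ++i)
                if (names[i] == field)      // second Tcp: line holds the values
                    return std::stoll(tok);
            return -1;
        }
    }
    return -1;
}

// Read the live counter on Linux; returns -1 if the file cannot be parsed.
long long ReadTcpInErrs()
{
    std::ifstream f("/proc/net/snmp");
    std::stringstream buf;
    buf << f.rdbuf();
    return ParseTcpCounter(buf.str(), "InErrs");
}
```

Calling ReadTcpInErrs() just before and just after a detected 200 ms gap would show whether each stall coincides with the counter going up.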

Started by Avivahc at February 02, 2016 - 12:48 AM

Enable the TCP_NODELAY socket option (this disables Nagle's algorithm) and disable TCP_CORK (if you never set it, there is nothing to do). Your problem is probably that the data is so small that it triggers TCP's Nagle algorithm.
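
For reference, a minimal sketch of enabling TCP_NODELAY on a POSIX socket and reading the option back to confirm it took effect. The function names EnableNoDelay and IsNoDelayEnabled are hypothetical and are not the Network::SetNonDelay helper used in this thread, whose source is not shown:

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: turn Nagle's algorithm off for one TCP socket.
// Returns 0 on success, -1 on failure (errno is set by setsockopt).
int EnableNoDelay(int sock)
{
    int on = 1;
    return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}

// Read the option back: 1 if set, 0 if clear, -1 on error.
int IsNoDelayEnabled(int sock)
{
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &value, &len) < 0)
        return -1;
    return value != 0 ? 1 : 0;
}
```

Reading the value back with getsockopt is a cheap way to rule out a silently failing setsockopt call on either end.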

Posted by Tyrone at February 13, 2016 - 1:13 AM

Thank you. The NODELAY option is already set on both Server and Client.
The Server side has already captured the Client's packet, so the problem should not be a delay on the Client side.
Could it be caused by delayed ACK on the Server side? To check, I made Server send one byte to Client every 50 ms. During the 200 ms wait, Server sent 4 packets to Client, but none of them acknowledged the Client's earlier packet, so delayed ACK should not be the cause either.
Below is the capture. The first line is the problem packet, sent from 172.16.129.3 (client) to 172.16.129.138 (server); the last line is the client's retransmission of that packet.
cap.png

Posted by Avivahc at February 19, 2016 - 1:30 AM

The Server code:

    // Standard headers. Network is a helper class from the same project;
    // its read/write and keepalive functions are posted further down.
    #include <cstdio>
    #include <cstdlib>
    #include <list>
    #include <sys/select.h>
    #include <sys/time.h>
    #include <unistd.h>

    typedef struct client_t
    {
        int sock;
        int frameNO;
        struct timeval lastKeepAlive;
        struct timeval lasttime;
    } Client;

    int g_listenSock = 0;
    std::list<Client*> g_clientList;

    // Build the read set: the listening socket plus every live client socket.
    void SetFds(fd_set* set)
    {
        FD_ZERO(set);
        if(g_listenSock > 0)
        {
            FD_SET(g_listenSock, set);
        }

        for(std::list<Client*>::iterator it = g_clientList.begin(); it != g_clientList.end(); it++)
        {
            Client* cl = *it;
            if(cl->sock > 0)
            {
                FD_SET(cl->sock, set);
            }
        }
    }

    void SetSockOpt(int sock)
    {
        Network::SetNonBlock(sock);
        Network::SetKeepLive(sock, 6, 3, 2);
        Network::SetNonDelay(sock);
    }

    // Read one command byte and check the gap since the previous request.
    bool ProceessClientCmd(Client* cl)
    {
        unsigned char msgType = 0;
        int ret = Network::Readn(cl->sock, &msgType, 1);
        if(ret != 1)
        {
            printf("recv failed, sock = %d\n", cl->sock);
            return false;
        }

        printf("sock = %d, recv request, seq = %d, frame %d\n", cl->sock, msgType, ++cl->frameNO);

        struct timeval now;
        gettimeofday(&now, NULL);
        if(cl->lasttime.tv_sec == 0)
        {
            cl->lasttime = now;
        }
        else
        {
            long diff = (now.tv_sec - cl->lasttime.tv_sec) * 1000000 + now.tv_usec - cl->lasttime.tv_usec;
            // tv_sec and tv_usec are cast to long so they match the %ld conversions.
            printf("sock = %d, now is %ld:%06ld, wait %ld us\n", cl->sock, (long)now.tv_sec, (long)now.tv_usec, diff);
            if(diff > 200000)
            {
                // Gap of more than 200 ms: the problem has occurred, stop the test.
                usleep(500000);
                _exit(0);
                sleep(900000); // stop (never reached, _exit() does not return)
            }
            cl->lasttime = now;
        }

        return true;
    }

    int main(int argc, char** argv)
    {
        if(argc < 2)
        {
            return 0;
        }

        g_listenSock = Network::MakeInetServer(atoi(argv[1]));
        if(g_listenSock < 0)
        {
            printf("start listen failed\n");
            return -1;
        }

        while(1)
        {
            fd_set rfds;
            SetFds(&rfds);

            struct timeval tv = {2, 0};
            int ret = select(FD_SETSIZE, &rfds, NULL, NULL, &tv);
            if(ret > 0)
            {
                // Accept a new connection if the listening socket is readable.
                if(FD_ISSET(g_listenSock, &rfds))
                {
                    int sock = Network::MakeAccept(g_listenSock);
                    if(sock > 0)
                    {
                        Client* cl = new Client;
                        cl->frameNO = 0;
                        cl->sock = sock;

                        SetSockOpt(sock);

                        gettimeofday(&(cl->lastKeepAlive), NULL);
                        cl->lasttime.tv_sec = 0;
                        g_clientList.push_back(cl);
                        printf("new client, sock = %d\n", sock);
                    }
                    else
                    {
                        printf("make accept failed\n");
                    }
                }

                // Handle every client socket that has data; mark failed ones with -1.
                for(std::list<Client*>::iterator it = g_clientList.begin(); it != g_clientList.end(); it++)
                {
                    Client* cl = *it;
                    if(FD_ISSET(cl->sock, &rfds))
                    {
                        if(!ProceessClientCmd(cl))
                        {
                            Network::Close(cl->sock);
                            cl->sock = -1;
                        }
                    }
                }

                // Remove the clients that were closed above.
                for(std::list<Client*>::iterator it = g_clientList.begin(); it != g_clientList.end();)
                {
                    Client* cl = *it;
                    if(cl->sock < 0)
                    {
                        delete *it;
                        it = g_clientList.erase(it);
                    }
                    else
                    {
                        it++;
                    }
                }
            }
        }
    }

Posted by Avivahc at February 22, 2016 - 1:55 AM

Network read and write functions:

    // Read exactly n bytes from a non-blocking socket.
    // Returns n on success, RET_ERROR on error, on a 5 s timeout, or if the peer closed.
    // The Linux branch and the generic POSIX branch were identical, so they are merged.
    ssize_t Network::Recvn(int fd, void* vptr, ssize_t n)
    {
        ssize_t nleft;
        ssize_t nread;
        char *ptr;
        fd_set fds;
        struct timeval tv;

        ptr = reinterpret_cast<char*>(vptr);
        nleft = n;
        while(nleft > 0)
        {
    #ifdef _WIN32
            nread = recv(fd, ptr, nleft, 0);
            if(nread == SOCKET_ERROR)
    #else
            nread = read(fd, ptr, nleft);
            if(nread < 0)
    #endif
            {
    #ifdef _WIN32
                int err = WSAGetLastError();
                if(err != WSAEWOULDBLOCK)
                {
                    return RET_ERROR;
                }
    #else
                if(errno != EWOULDBLOCK && errno != EAGAIN && errno != EINTR)
                {
                    return RET_ERROR;
                }
    #endif

                // No data yet: wait up to 5 seconds for the socket to become readable.
                FD_ZERO(&fds);
                FD_SET(fd, &fds);
                tv.tv_sec = 5;
                tv.tv_usec = 0;
                int nselect = Select(fd + 1, &fds, NULL, NULL, &tv);
                if(nselect <= 0)
                {
                    return RET_ERROR;
                }
            }
            else if(nread == 0)
            {
                // The peer closed the connection.
                return RET_ERROR;
            }
            else
            {
                nleft -= nread;
                ptr += nread;
            }
        }
        return (n - nleft);
    }


    // Write exactly n bytes to a non-blocking socket.
    // Returns n on success, RET_ERROR on error or on a 5 s timeout.
    ssize_t Network::Sendn(int fd, const void* vptr, ssize_t n)
    {
        ssize_t nleft;
        ssize_t nwritten;
        const char *ptr;

        fd_set fds;
        struct timeval tv;

        ptr = reinterpret_cast<const char*>(vptr);
        nleft = n;

        while(nleft > 0)
        {
    #ifdef _WIN32
            nwritten = send(fd, ptr, nleft, 0);
            if(nwritten == SOCKET_ERROR)
    #else
            nwritten = write(fd, ptr, nleft);
            if(nwritten < 0)
    #endif
            {
    #ifdef _WIN32
                int err = WSAGetLastError();
                if(err != WSAEWOULDBLOCK)
                {
                    return RET_ERROR;
                }
    #else
                if(errno != EWOULDBLOCK && errno != EAGAIN && errno != EINTR)
                {
                    return RET_ERROR;
                }
    #endif

                // Send buffer full: wait up to 5 seconds for the socket to become writable.
                FD_ZERO(&fds);
                FD_SET(fd, &fds);
                tv.tv_sec = 5;
                tv.tv_usec = 0;

                int nselect = Select(fd + 1, NULL, &fds, NULL, &tv);
                if(nselect <= 0)
                {
                    return RET_ERROR;
                }
            }
            else if(nwritten == 0)
            {
                return RET_ERROR;
            }
            else
            {
                nleft -= nwritten;
                ptr += nwritten;
            }
        }
        return (n - nleft);
    }

Posted by Avivahc at March 06, 2016 - 2:06 AM

Below are two adjacent client request packets. The one on the left is normal and the one on the right is the problem packet. I cannot see anything abnormal in it.
errpacket.png

Posted by Avivahc at March 12, 2016 - 2:53 AM

Looking at your capture, I suspect this line of code: "Network::SetKeepLive(sock, 6, 3, 2);". I do not know exactly what it does, but I think it must change the default keepalive behaviour. The default idle time is two hours. Does this code change TCP_KEEPIDLE?

Posted by Tyrone at March 20, 2016 - 3:20 AM

The SetKeepLive code is as follows. I tried commenting out this line of code, and the problem still exists.

    // Enable keepalive: first probe after `idle` seconds of silence, then one probe
    // every `intvl` seconds, and give up after `cnt` unanswered probes.
    // The Linux branch and the generic POSIX branch were identical, so they are merged.
    int Network::SetKeepLive(int sockfd, int idle, int intvl, int cnt)
    {
        int opt = 1;
        if(setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, reinterpret_cast<char*>(&opt), sizeof(opt)) < 0)
        {
            return RET_ERROR;
        }

    #ifdef _WIN32
        // SIO_KEEPALIVE_VALS takes milliseconds and has no probe-count field, so cnt is unused here.
        unsigned long dw;
        tcp_keepalive live, liveout;

        live.onoff = 1;
        live.keepalivetime = idle * 1000;
        live.keepaliveinterval = intvl * 1000;

        if(WSAIoctl(sockfd, SIO_KEEPALIVE_VALS, &live, sizeof(live), &liveout, sizeof(liveout), &dw, NULL, NULL) < 0)
        {
            return RET_ERROR;
        }
    #else
        if(setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0 ||
           setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0 ||
           setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
        {
            return RET_ERROR;
        }
    #endif

        return RET_SUCCESS;
    }

Posted by Avivahc at March 22, 2016 - 3:24 AM

Did you comment out this function on both the client and the server?

Posted by Tyrone at March 26, 2016 - 3:56 AM

Yes, it is commented out on both.

Posted by Avivahc at March 29, 2016 - 4:49 AM

This should be a super simple program, but once the number of connections or the communication frequency increases, this problem appears. (With the client opening 3 connections and sending a request packet every 50 ms on each, it ran for two days without an issue.)
If it is not a problem in my program, would every network program used in this kind of scenario hit the same problem? That does not seem to be the case, yet I have not been able to find the cause.

Posted by Avivahc at April 12, 2016 - 5:06 AM

Are the client and server running on Windows and Linux respectively? If the client and server run on the same platform, is there any difference?

Posted by Tyrone at April 15, 2016 - 5:25 AM

The tests above were all run with both the client and the server on Linux.
With the server running on Windows and the client on Linux, the same problem appears, and the capture again shows that the receiver does not send an ACK back to the sender.
I also tested with both the client and the server on Windows. There the server decides whether there is a problem by measuring the interval between two consecutive requests on the same client connection (more than 200 ms counts as a problem). That test detected the problem, but I did not take a capture.
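
The interval check described above can be sketched as a small class. This is only an illustration: StallDetector and Clock are hypothetical names, and it uses std::chrono::steady_clock (C++11) instead of the gettimeofday call in the posted server, on the assumption that a monotonic clock is preferable because a wall-clock adjustment cannot then be mistaken for a stall.

```cpp
#include <chrono>

typedef std::chrono::steady_clock Clock;

// Hypothetical helper: reports a stall when the gap between two consecutive
// requests on one connection exceeds a threshold (200 ms in this thread).
class StallDetector
{
public:
    explicit StallDetector(std::chrono::microseconds threshold)
        : threshold_(threshold), hasLast_(false) {}

    // Record the arrival time of a request. Returns true if the gap since
    // the previous request was larger than the threshold.
    bool OnRequest(Clock::time_point now)
    {
        bool stalled = hasLast_ && (now - last_) > threshold_;
        last_ = now;
        hasLast_ = true;
        return stalled;
    }

private:
    std::chrono::microseconds threshold_;
    Clock::time_point last_;
    bool hasLast_;
};
```

One detector per connection would be created at accept time and fed Clock::now() each time a command byte is read.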

Posted by Avivahc at April 16, 2016 - 5:47 AM

Whether or not the problem also occurs on a single machine, all we can do is rule out causes one by one. The problem is certainly related to the protocol stack implementation, but I cannot guess the exact cause yet. Any of the three parties (client, network equipment and server) may affect the behaviour.

Posted by Tyrone at April 23, 2016 - 6:47 AM

Here is an article that may be of some help to you.


Also, what happens if you disable delayed ACK?

    int i = 1;
    setsockopt(iSock, IPPROTO_TCP, TCP_QUICKACK, (void *)&i, sizeof(i));

Posted by Tyrone at May 01, 2016 - 6:55 AM

I also tried running both programs on the same Linux machine, and tried connecting two machines directly with a cable to rule out the switch. The problem still occurs.
I set the TCP_QUICKACK option and it had no effect. Besides, in one of my tests the server sends data to the client periodically. If the missing ACK were caused by delayed ACK, the next packet the server sends should carry the ACK.
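
One caveat when testing with TCP_QUICKACK: according to the Linux tcp(7) man page the flag is not permanent, and the stack may return to delayed-ACK mode on its own afterwards. A common workaround is to set the option again after each receive. The sketch below shows that pattern for Linux only; SetQuickAck and RecvWithQuickAck are hypothetical names. It would not by itself explain an ACK that is still missing after several server-to-client packets, as described above.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Ask the kernel to ACK immediately instead of delaying (Linux-specific).
// Returns 0 on success, -1 on failure.
int SetQuickAck(int sock)
{
    int on = 1;
    return setsockopt(sock, IPPROTO_TCP, TCP_QUICKACK, &on, sizeof(on));
}

// Hypothetical wrapper: receive data, then re-arm TCP_QUICKACK because the
// kernel may have left quickack mode since the option was last set.
ssize_t RecvWithQuickAck(int sock, void* buf, size_t len)
{
    ssize_t n = recv(sock, buf, len, 0);
    if (n > 0)
        SetQuickAck(sock); // best effort; a failure here is ignored
    return n;
}
```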

Posted by Avivahc at May 06, 2016 - 7:40 AM

I cannot think of a solution to this problem for now. As a last resort, try changing the TCP congestion control algorithm and see whether it makes a difference.
Current algorithm:

    cat /proc/sys/net/ipv4/tcp_congestion_control

Available modules:

    modprobe -l | grep tcp_
    kernel/net/ipv4/tcp_diag.ko
    kernel/net/ipv4/tcp_bic.ko
    kernel/net/ipv4/tcp_westwood.ko
    kernel/net/ipv4/tcp_highspeed.ko
    kernel/net/ipv4/tcp_hybla.ko
    kernel/net/ipv4/tcp_htcp.ko
    kernel/net/ipv4/tcp_vegas.ko
    kernel/net/ipv4/tcp_veno.ko
    kernel/net/ipv4/tcp_scalable.ko
    kernel/net/ipv4/tcp_lp.ko
    kernel/net/ipv4/tcp_yeah.ko
    kernel/net/ipv4/tcp_illinois.ko
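
Besides the system-wide setting, Linux also lets a program query or change the congestion control algorithm per socket through the TCP_CONGESTION socket option, which makes it possible to try another algorithm for the test program only. A minimal sketch, assuming Linux; GetCongestionControl and SetCongestionControl are hypothetical helper names, and unprivileged processes can normally only pick algorithms that the system allows:

```cpp
#include <cstring>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string>
#include <sys/socket.h>
#include <unistd.h>

// Return the congestion control algorithm used by this socket,
// or an empty string if the query fails.
std::string GetCongestionControl(int sock)
{
    char name[16] = {0};           // algorithm names are short strings such as "cubic"
    socklen_t len = sizeof(name);
    if (getsockopt(sock, IPPROTO_TCP, TCP_CONGESTION, name, &len) < 0)
        return std::string();
    return std::string(name, strnlen(name, sizeof(name)));
}

// Try to switch this socket to the named algorithm.
// Returns 0 on success, -1 if it is unavailable or not permitted.
int SetCongestionControl(int sock, const std::string& algorithm)
{
    return setsockopt(sock, IPPROTO_TCP, TCP_CONGESTION,
                      algorithm.c_str(), algorithm.size());
}
```

For example, SetCongestionControl(sock, "vegas") could be called right after accept or connect, provided the tcp_vegas module from the list above is loaded and permitted.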

Posted by Tyrone at January 03, 2017 - 12:05 AM

OK, I will try that. Thank you.

Posted by Avivahc at January 03, 2017 - 12:15 AM

I have run into this problem too. Did the original poster manage to solve it?

Posted by Eilian at January 14, 2017 - 12:34 AM