[PHP] file_get_contents under nginx drives CPU to 100%


Yesterday morning the image details page of our production site (nginx + FastCGI) became very slow to open and returned a large number of 502 and 504 errors. The server was under so much load that it was close to a denial-of-service state. Checking resource usage with top showed the CPU climbing to 100% and staying there for 1-2 minutes at a time. The nginx error log also showed many requests in the following state:


The image details page does nothing CPU-intensive. It only fetches the picture information and runs the related recommendation and comment logic, so at first I suspected the nginx configuration. Later I noticed that the problem appeared only in some cases. With no better option, I used error_log to record the request time of each module to a file. Three hours later I opened the file and found that the loading time of one module was very unstable, sometimes exceeding 120s, which was clearly one cause of the nginx 502 errors.

Checking the corresponding code showed that it called the search interface directly through file_get_contents(API). file_get_contents uses blocking I/O, and no timeout had been set for it, so when the search interface does not return data for a long time, the request keeps occupying system resources (including the FastCGI process serving it) and nginx ends up returning a 502 Bad Gateway error. Zhang Yan's blog explains this phenomenon in detail (address: ). The solution given in that article is to use the stream context timeout parameter to force a timeout on the socket connection used by file_get_contents. The specific scheme is:


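$ctx = stream_context_create(array(
    'http' => array(
        'timeout' => 5, // read timeout, in seconds
    ),
));
// API stands for the search interface URL.
// The second argument is use_include_path, a boolean, so pass false rather than 0.
file_get_contents(API, false, $ctx);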

After trying this, I found that the timeout works in many cases but occasionally fails to take effect. In addition, file_get_contents gives no detailed information for tracking down errors. After weighing this, we decided to give up file_get_contents and use the more powerful curl instead, limiting the connection time and the total request time of the search interface with CURLOPT_CONNECTTIMEOUT and CURLOPT_TIMEOUT. To make sure the page still has search results to show, the code tries the connection up to 3 times and, if all attempts fail, fills in default data. Since this change, 502 Bad Gateway errors have become rare.
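The curl approach described above might look roughly like the following. This is a minimal sketch, not the code from the original site: the function names search_api_get and fetch_with_fallback, the 2-second and 5-second limits, and the '[]' default are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: one cURL request with hard time limits.
// Returns the response body, or null on any failure.
function search_api_get(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => 2,    // seconds allowed to establish the connection (assumed value)
        CURLOPT_TIMEOUT        => 5,    // seconds allowed for the whole request (assumed value)
    ]);

    $body   = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);

    if ($body === false || $status < 200 || $status >= 300) {
        // Unlike file_get_contents, curl reports an error number and message.
        error_log(sprintf(
            'search API failed: [%d] %s (HTTP %d)',
            curl_errno($ch),
            curl_error($ch),
            $status
        ));
        return null;
    }

    return $body;
}

// Hypothetical helper: call $request up to $maxAttempts times and
// return $default if every attempt yields null.
function fetch_with_fallback(callable $request, string $default, int $maxAttempts = 3): string
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $result = $request();
        if ($result !== null) {
            return $result;
        }
    }

    return $default;
}
```

A call site could then read `$json = fetch_with_fallback(function () { return search_api_get(API); }, '[]');`, where API is the same placeholder for the search interface URL used earlier. Keeping the retry loop separate from the HTTP call means the worst-case wait is bounded by attempts times CURLOPT_TIMEOUT, which is easy to reason about when sizing FastCGI timeouts.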

Here are some notes from this incident:

1. When you hit errors like these, don't rush to change the nginx configuration. If most of the business is running normally and only part of it is misbehaving, it is very likely that the affected code contains a time-consuming operation or faulty logic, so investigate the business code first. This matters all the more because many developers do not have permission to modify the server configuration files.

2. Online services that need high fault tolerance should firmly avoid functions such as file_get_contents, which look convenient but have many defects, and use more complete libraries instead.

3. When a process behaves abnormally (for example, high CPU usage), don't rush to kill it and destroy the "scene". Try running strace -p PID to trace the system calls the process is making. This often brings unexpected gains (for example, exposing a hard-to-find bug). On the use of strace, I recommend a good article
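The per-module timing that located the slow search call could be recorded with a small wrapper along these lines. This is only a sketch under assumptions: the helper name timed, the log line format, and the log file path are made up for illustration and are not taken from the original code.

```php
<?php
// Hypothetical helper: run $work, append its elapsed time to $logFile
// under the given module name, and return whatever $work returned.
function timed(string $module, callable $work, string $logFile)
{
    $start = microtime(true);
    try {
        return $work();
    } finally {
        $elapsed = microtime(true) - $start;
        // Message type 3 makes error_log append the line to the given file.
        error_log(
            sprintf("%s module=%s elapsed=%.3fs\n", date('Y-m-d H:i:s'), $module, $elapsed),
            3,
            $logFile
        );
    }
}
```

Wrapping each module of the page, for example `timed('search', function () { return load_search_results(); }, '/tmp/module_timing.log');` with load_search_results standing in for whatever the module actually calls, produces one line per module per request, which is enough to spot an outlier such as a 120s load.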


Posted by maker at March 01, 2014 - 9:39 PM