HBase client timeout, retry count, and retry interval configuration

Configuring the timeout, the number of retries, and the retry interval matters, because the default values are large. If the HBase cluster, a RegionServer, or ZooKeeper (ZK) goes down, this is a disaster for the application: timeouts and retries quickly use up the web container's connections, and the web container stops serving requests. There are two kinds of socket timeout: (1) the timeout for establishing the connection; (2) the timeout for reading data.
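The two socket timeouts can be illustrated with plain java.net sockets. This is only a sketch, not HBase client code: the class name, the method names, and the 1000 ms / 200 ms values are made up for the illustration. A local server socket that never writes anything stands in for an unresponsive server.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class SocketTimeoutDemo {

    /** Connects with a connect timeout, then tries one read with a read timeout.
     *  Returns true if the read timed out. */
    static boolean readTimesOut(String host, int port,
                                int connectTimeoutMs, int readTimeoutMs) throws IOException {
        try (Socket socket = new Socket()) {
            // Timeout 1: how long to wait for the connection to be established
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
            // Timeout 2: how long a single read may block waiting for data
            socket.setSoTimeout(readTimeoutMs);
            try {
                socket.getInputStream().read();
                return false; // data (or end of stream) arrived in time
            } catch (SocketTimeoutException e) {
                return true;  // no data within readTimeoutMs
            }
        }
    }

    /** Runs the check against a local server socket that never sends data. */
    static boolean demoReadTimeout() {
        try (ServerSocket silentServer = new ServerSocket(0)) {
            return readTimesOut("127.0.0.1", silentServer.getLocalPort(), 1000, 200);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + demoReadTimeout());
    }
}
```

The connection succeeds right away because the server socket is listening, but the read gives up after the read timeout. This is the situation in which a request thread stays occupied when the server is slow to answer.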

You can configure the following parameters:

1. hbase.rpc.timeout: the RPC timeout; the default is 60s. Changing it is not recommended, to avoid affecting normal business. Our online environment initially set it to 3 seconds, and the day after going live a large number of timeout errors appeared. The cause was that one region was blocking writes with the message "Blocking updates ... memstore size 434.3m is >= than blocking 256.0m size", so 3 seconds was evidently too low.

2. ipc.socket.timeout: the socket connection timeout; it should be less than or equal to the RPC timeout. The default is 20s.

3. hbase.client.retries.number: the number of retries; the default is 14, and it can be set to 3.

4. hbase.client.pause: the sleep time between retries; the default is 1s, and it can be reduced, for example to 100ms.

5. zookeeper.recovery.retry: the number of ZK retries; it can be set to 3. ZK rarely goes down, but when the HBase cluster has a problem, every client retry also retries the ZK operation, so the total number of ZK retries is hbase.client.retries.number * zookeeper.recovery.retry, and the sleep time doubles with each retry. This happens on every access to HBase, and a single HBase operation may involve several ZK accesses. If ZK is unavailable, the ZK retries multiply and waste a lot of time.

6. zookeeper.recovery.retry.intervalmill: the sleep time between ZK retries; the default is 1s, and it can be reduced, for example to 200ms.

7. hbase.regionserver.lease.period: the timeout for a scan query's interaction with the server; the default is 60s, and it can be left unchanged.
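The values suggested in the list can be collected in one place, as in the sketch below. It uses a plain Map as a stand-in so that it runs without the HBase jars; the class and method names are made up for the illustration. In a real client the same key/value pairs would typically be applied with Configuration.set(key, value) on the object returned by HBaseConfiguration.create(). The time values are written in milliseconds, on the assumption that these properties are read as milliseconds.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HBaseClientTimeoutSettings {

    /** Values suggested in the list above; all times are in milliseconds. */
    static Map<String, String> suggested() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("hbase.rpc.timeout", "60000");                   // keep the default
        conf.put("ipc.socket.timeout", "20000");                  // default, <= RPC timeout
        conf.put("hbase.client.retries.number", "3");             // reduced from the default
        conf.put("hbase.client.pause", "100");                    // reduced from 1s
        conf.put("zookeeper.recovery.retry", "3");
        conf.put("zookeeper.recovery.retry.intervalmill", "200"); // reduced from 1s
        conf.put("hbase.regionserver.lease.period", "60000");     // keep the default
        return conf;
    }

    /** Checks the one ordering rule stated in the list: socket timeout <= RPC timeout. */
    static boolean socketTimeoutWithinRpcTimeout(Map<String, String> conf) {
        long socket = Long.parseLong(conf.get("ipc.socket.timeout"));
        long rpc = Long.parseLong(conf.get("hbase.rpc.timeout"));
        return socket <= rpc;
    }

    public static void main(String[] args) {
        Map<String, String> conf = suggested();
        conf.forEach((key, value) -> System.out.println(key + " = " + value));
        System.out.println("socket <= rpc: " + socketTimeoutWithinRpcTimeout(conf));
    }
}
```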


RPC retry interval strategy:

public static long getPauseTime(final long pause, final int tries) {
    int ntries = tries;
    // RETRY_BACKOFF[] = { 1, 1, 1, 2, 2, 4, 4, 8, 16, 32, 64 }
    // Clamp the index so that tries beyond the table reuse the last multiplier
    if (ntries >= HConstants.RETRY_BACKOFF.length) {
        ntries = HConstants.RETRY_BACKOFF.length - 1;
    }
    long normalPause = pause * HConstants.RETRY_BACKOFF[ntries];
    long jitter = (long) (normalPause * RANDOM.nextFloat() * 0.01f); // 1% possible jitter
    return normalPause + jitter;
}
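To get a feel for what this backoff costs, the sketch below adds up the pauses for a given hbase.client.pause and retry count. It copies the multiplier table quoted in the comment above and leaves out the 1% jitter. It assumes one pause after each failed attempt, with tries counted from 0; the exact number of pauses in a given HBase version may differ by one. The class and method names are made up for the illustration.

```java
public class RpcBackoffTotal {

    // Backoff multipliers as quoted in the snippet above
    static final int[] RETRY_BACKOFF = {1, 1, 1, 2, 2, 4, 4, 8, 16, 32, 64};

    /** Pause before retry number `tries` (0-based), without the random jitter. */
    static long pauseWithoutJitter(long pause, int tries) {
        int index = Math.min(tries, RETRY_BACKOFF.length - 1);
        return pause * RETRY_BACKOFF[index];
    }

    /** Sum of the pauses for tries 0..retries-1 (assumes one pause per failed attempt). */
    static long totalPause(long pause, int retries) {
        long total = 0;
        for (int t = 0; t < retries; t++) {
            total += pauseWithoutJitter(pause, t);
        }
        return total;
    }

    public static void main(String[] args) {
        // Defaults from the list above: pause 1s, 14 retries
        System.out.println(totalPause(1000, 14)); // 327000 ms, about 5.5 minutes
        // Suggested values: pause 100ms, 3 retries
        System.out.println(totalPause(100, 3));   // 300 ms
    }
}
```

Under these assumptions the defaults can keep a request sleeping for minutes, while the reduced values give up in well under a second of sleep time.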




ZK retry interval strategy:

// RetryCounter class

// The sleep time grows exponentially (base 2) with the number of retries;
// the first retry sleeps for twice the configured interval
public void sleepUntilNextRetry() throws InterruptedException {
    int attempts = getAttemptTimes();
    long sleepTime = (long) (retryIntervalMillis * Math.pow(2, attempts));
    // ... then sleep for sleepTime
}

// retriesRemaining starts at maxRetries and is reduced by 1 after each retry
public int getAttemptTimes() {
    return maxRetries - retriesRemaining + 1;
}
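The same kind of arithmetic can be done for the ZK side. The sketch below applies the formula above (interval * 2^attempts, with attempts starting at 1) and multiplies it by the client retry count, as described for zookeeper.recovery.retry. It assumes that every client retry runs one ZK operation that uses up all of its retries, which is a worst case rather than a measurement. The class and method names are made up for the illustration.

```java
public class ZkRetrySleepTotal {

    /** Sleep before ZK retry number `attempts` (1-based), per the formula above. */
    static long sleepForAttempt(long retryIntervalMillis, int attempts) {
        return (long) (retryIntervalMillis * Math.pow(2, attempts));
    }

    /** Total sleep of one ZK operation that uses up all `maxRetries` retries. */
    static long totalSleepPerZkOperation(long retryIntervalMillis, int maxRetries) {
        long total = 0;
        for (int attempts = 1; attempts <= maxRetries; attempts++) {
            total += sleepForAttempt(retryIntervalMillis, attempts);
        }
        return total;
    }

    /** Worst case: each client retry repeats one ZK operation that exhausts its retries. */
    static long worstCase(int clientRetries, long retryIntervalMillis, int zkRetries) {
        return clientRetries * totalSleepPerZkOperation(retryIntervalMillis, zkRetries);
    }

    public static void main(String[] args) {
        // 3 ZK retries at the default 1s interval: 2000 + 4000 + 8000
        System.out.println(totalSleepPerZkOperation(1000, 3)); // 14000 ms
        // 3 ZK retries at the reduced 200ms interval: 400 + 800 + 1600
        System.out.println(totalSleepPerZkOperation(200, 3));  // 2800 ms
        // 14 client retries, each repeating the 1s-interval case
        System.out.println(worstCase(14, 1000, 3));            // 196000 ms
        // 3 client retries, each repeating the 200ms-interval case
        System.out.println(worstCase(3, 200, 3));              // 8400 ms
    }
}
```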





Posted by Brant at July 14, 2014 - 7:33 PM