Matters needing attention in processing Chinese when Oracle LPAD/RPAD function

Recommended for you: Get network issues from WhatsUp Gold. Not end users.

First look at the definition of function of the official Oracle:

The RPAD function returns an expression, right-padded to a specified length with the specified characters; or, when the expression to be padded is longer than the length specified after padding, only that portion of the expression that fits into the specified length.

The RPAD function will specify the string from the target string right fill the specified length string, the string to be filled is larger than the length of the filling length, will be based on the length of filling intercepting the target string.


Function syntax

RPAD (text-exp , length [, pad-exp])


The filling length interpretation of Length

On the interpretation of individual parameters, we focus on the parameters of length to have a look:

Length

The total length of the return value as it is displayed on your terminal screen. In most character sets, this is also the number of characters in the return value. However, in some multibyte character sets, the display length of a character string can differ from the number of characters in the string.

When you specify a value for length that is shorter than the length of text-exp, then this function truncates the expression to the specified length.

In the total length: as the return value to display them in the terminal screen of your attention, is the same length,The length of the terminal display, The actual non string in the database show the length (in bytes). In the multi byte support database (such as simplified Chinese, AL32UTF8 and so on) is particularly obvious.

Through an example below to understand this sentence:

SQL> SELECT 'Oracle' ORA_STR,
  2         LENGTHB('Oracle') ORA_STR_LENGTH,
  3         RPAD('Oracle', 10, '$') ORA_RPAD_STR,
  4         LENGTHB(RPAD('Oracle', 10, '$')) ORA_RPAD_STR_LENGTH
  5    FROM dual
  6  ;
 
ORA_STR   ORA_STR_LENGTH ORA_RPAD_STR  ORA_RPAD_STR_LENGTH
--------- -------------- ------------- -------------------
Oracle                 9 Oracle$$$$                     13
 
SQL> 

The database character set is AL32UTF8, so a Chinese character occupying 3 bytes. In this example, the LENGTHB function length byte string "oracle" calculation is accurate, is 9 bytes, but you'll find out, we used the RPAD function to the string "oracle" back fill the $sign, a total length of 10, the range is the "Oracle $$$$", is not we expect the value of "Oracle $", also found that, the length of the string after the interception is not 10 byte.

We return to the official definition for the return of length Length, the length depends on the current character occupied in the terminal display screen size, for Chinese, usually occupy on the screen is 2 byte width:

So when RPAD was filled on these characters according to two bytes, so the above example described the situation. This unpredictable (Sometimes) condition for people who fixed output byte program will undoubtedly cause problems.


Solution

The following is the solution to this problem:

SQL> SELECT 'Oracle' ORA_STR,
  2         LENGTHB('Oracle') ORA_STR_LENGTH,
  3         RPAD('Oracle', 10, '$') ORA_RPAD_STR,
  4         LENGTHB(RPAD('Oracle', 10, '$')) ORA_RPAD_STR_LENGTH,
  5         -- Solution
  6         SUBSTRB('Oracle' || RPAD('$', 10, '$'), 1, 10) SOLUTION
  7    FROM dual
  8  ;
 
ORA_STR   ORA_STR_LENGTH ORA_RPAD_STR  ORA_RPAD_STR_LENGTH SOLUTION
--------- -------------- ------------- ------------------- ----------
Oracle                 9 Oracle$$$$                     13 Oracle$
 
SQL> 

In practical application, we can the method is abstracted as a function to use:

CREATE OR REPLACE 
FUNCTION rpad2(str IN VARCHAR2,
               len IN PLS_INTEGER,
               pad IN VARCHAR2)
  RETURN VARCHAR2
IS
BEGIN
  RETURN SUBSTRB(str || RPAD(pad, len, pad), 1, len);
END rpad2;

Use examples:

SQL> SELECT rpad2('Oracle', 10, '$') PADDED_VALUE FROM DUAL;
 
PADDED_VALUE
-----------------------
Oracle$
 
SQL> 

Reference linking

  1. Oracle Bug? RPAD of Japanese (kanji) character in Oracle 10gR2 UTF8 database
  2. Issue with LPAD/RPAD when using with Japanese data
  3. Oracle Documentation - RPAD
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download

Posted by Algernon at November 30, 2013 - 8:22 PM