The foreach problem in PHP

Preface:

The foreach construct was introduced in PHP 4 as a simple way to traverse an array. Compared with a traditional for loop, foreach gives more convenient access to key/value pairs. Before PHP 5, foreach could only be used on arrays; since PHP 5 it can also traverse objects (see: traversing objects). This article discusses array traversal only.

Although foreach is simple, it can behave in unexpected ways, especially when the code involves references.

The following cases should help us understand what foreach really does.

Problem 1:

$arr = array(1,2,3);

foreach($arr as $k => &$v) {
    $v = $v * 2;
}
// now $arr is array(2, 4, 6)

foreach($arr as $k => $v) {
    echo "$k", " => ", "$v";
}

Let's start with a simple one. If we run this code, we find that the final output is 0=>2  1=>4  2=>4 .

Why not 0=>2  1=>4  2=>6 ?

In fact, we can think of the foreach ($arr as $k => $v) construct as implying the following operations: on each iteration, the array's current key and current value are assigned to the variables $k and $v. Concretely:

foreach($arr as $k => $v) {
    // Pseudocode: currentVal() and currentKey() are not real PHP functions.
    // Two implicit assignments happen before the user code runs
    $v = currentVal();
    $k = currentKey();
    // The user code runs from here
    ……
}
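
The implicit assignments can also be written out by hand. The following is only a rough model under the assumption that the array is not modified during the loop; it is not how the engine implements foreach, and the variable names ($keys, $seen) are made up for the illustration.

```php
<?php
// Hand-written model of `foreach ($arr as $k => $v)`.
// It only spells out the two assignments that foreach implies.
$arr  = array('x' => 10, 'y' => 20);
$keys = array_keys($arr);
$seen = array();

for ($i = 0; $i < count($keys); $i++) {
    // The two implicit assignments, performed before the loop body.
    $v = $arr[$keys[$i]];
    $k = $keys[$i];

    // "User code": record what the body would see.
    $seen[] = "$k=>$v";
}

echo implode(' ', $seen), "\n"; // x=>10 y=>20
```

The point to keep in mind is that `$v = ...` is an ordinary assignment. If $v happens to be a reference at that moment, the assignment writes through it.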

With this model in mind, let's re-analyze the first foreach:

First iteration: because $v is a reference, $v = &$arr[0], so $v = $v * 2 is equivalent to $arr[0] = $arr[0] * 2, and $arr becomes 2,2,3

Second iteration: $v = &$arr[1], and $arr becomes 2,4,3

Third iteration: $v = &$arr[2], and $arr becomes 2,4,6

Then the code enters the second foreach:

First iteration: the implicit assignment $v = $arr[0] is triggered. Because $v is still a reference to $arr[2], this is equivalent to $arr[2] = $arr[0], and $arr becomes 2,4,2

Second iteration: $v = $arr[1], i.e. $arr[2] = $arr[1], and $arr becomes 2,4,4

Third iteration: $v = $arr[2], i.e. $arr[2] = $arr[2], and $arr stays 2,4,4

OK, the analysis is complete.
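
The walkthrough can be checked by recording the state of $arr on every iteration of the second loop. This is a small sketch; the $states variable is only there for the illustration.

```php
<?php
$arr = array(1, 2, 3);

foreach ($arr as $k => &$v) {
    $v = $v * 2;
}
// $v is still a reference to $arr[2] at this point.

$states = array();
foreach ($arr as $k => $v) {
    // The implicit `$v = <current value>` has just written through to $arr[2].
    $states[] = implode(',', $arr);
}

echo implode(' | ', $states), "\n"; // 2,4,2 | 2,4,4 | 2,4,4
```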

How do we avoid problems like this? The PHP manual has a warning:

Warning: the reference to $value and the last array element remains even after the foreach loop. It is recommended to destroy it with unset().

$arr = array(1,2,3);

foreach($arr as $k => &$v) {
    $v = $v * 2;
}
unset($v);

foreach($arr as $k => $v) {
    echo "$k", " => ", "$v";
}
// Output 0=>2  1=>4  2=>6

From this we can see that references tend to come with side effects. If you do not want the contents of an array to be changed unintentionally, it is best to unset such references promptly.
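
Another option, when the loop only needs to update elements, is to avoid the reference altogether and write back through the key. This is a sketch of that alternative, not something the manual's warning prescribes; $out is just a helper for printing.

```php
<?php
$arr = array(1, 2, 3);

// Write back through the key instead of binding $v by reference,
// so no reference is left over after the loop.
foreach ($arr as $k => $v) {
    $arr[$k] = $v * 2;
}

$out = array();
foreach ($arr as $k => $v) {
    $out[] = "$k=>$v";
}

echo implode(' ', $out), "\n"; // 0=>2 1=>4 2=>6
```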

Problem 2:

$arr = array('a','b','c');

foreach($arr as $k => $v) {
    echo key($arr), "=>", current($arr);
}

// Print 1=>b 1=>b 1=>b

This problem is even stranger. According to the manual, key and current return the key and the value of the array's current element.

Why is key($arr) always 1, and current($arr) always b?

First, use VLD to look at the compiled opcodes.

Start with the ASSIGN instruction on the third line of the VLD output; it assigns array('a','b','c') to $arr.

Because $arr is a CV and array('a','b','c') is a TMP, the ASSIGN instruction is actually executed by the ZEND_ASSIGN_SPEC_CV_TMP_HANDLER function. A note on CV: it is a variable cache added in PHP 5.1. It is an array holding zval** entries, so that a cached variable can be used again without looking it up in the active symbol table; it is fetched directly from the CV array instead. Because array access is much faster than a hash table lookup, this improves efficiency.

static int ZEND_FASTCALL  ZEND_ASSIGN_SPEC_CV_TMP_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
{
    zend_op *opline = EX(opline);
    zend_free_op free_op2;
    zval *value = _get_zval_ptr_tmp(&opline->op2, EX(Ts), &free_op2 TSRMLS_CC);
    
    // Create the zval** pointer for $arr in the CV array
    zval **variable_ptr_ptr = _get_zval_ptr_ptr_cv(&opline->op1, EX(Ts), BP_VAR_W TSRMLS_CC);

    if (IS_CV == IS_VAR && !variable_ptr_ptr) {
        ……
    }
    else {
        // Assign the array to $arr
        value = zend_assign_to_variable(variable_ptr_ptr, value, 1 TSRMLS_CC);
        if (!RETURN_VALUE_UNUSED(&opline->result)) {
            AI_SET_PTR(EX_T(opline->result.u.var).var, value);
            PZVAL_LOCK(value);
        }
    }

    ZEND_VM_NEXT_OPCODE();
}

After the ASSIGN instruction has executed, a zval** pointer has been added to the CV array, and it points to the actual array, which means $arr is now cached in the CV array.

Next comes the loop over the array. The FE_RESET instruction is executed by ZEND_FE_RESET_SPEC_CV_HANDLER:

static int ZEND_FASTCALL  ZEND_FE_RESET_SPEC_CV_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
{
    ……
    if (……) {
        ……
    } else {
        // Get the pointer to the array through the CV array
        array_ptr = _get_zval_ptr_cv(&opline->op1, EX(Ts), BP_VAR_R TSRMLS_CC);
        ……
    }
    ……
    // Save the pointer to the array in zend_execute_data->Ts (Ts stores the temp_variables used while the code executes)
    AI_SET_PTR(EX_T(opline->result.u.var).var, array_ptr);
    PZVAL_LOCK(array_ptr);

    if (iter) {
        ……
    } else if ((fe_ht = HASH_OF(array_ptr)) != NULL) {
        // Reset the array's internal pointer
        zend_hash_internal_pointer_reset(fe_ht);
        if (ce) {
            ……
        }
        is_empty = zend_hash_has_more_elements(fe_ht) != SUCCESS;
        
        // Save the array's internal pointer in EX_T(opline->result.u.var).fe.fe_pos
        zend_hash_get_pointer(fe_ht, &EX_T(opline->result.u.var).fe.fe_pos);
    } else {
        ……
    }
    ……
}

After the FE_RESET instruction has executed, two important pointers are stored in zend_execute_data->Ts: the pointer to the array, and the array's saved internal position (fe_pos).

Next, let's look at FE_FETCH, which is executed by ZEND_FE_FETCH_SPEC_VAR_HANDLER:

static int ZEND_FASTCALL  ZEND_FE_FETCH_SPEC_VAR_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
{
    zend_op *opline = EX(opline);
    
    // Note that the array pointer is read from EX_T(opline->op1.u.var).var.ptr
    zval *array = EX_T(opline->op1.u.var).var.ptr;
    ……
   
    switch (zend_iterator_unwrap(array, &iter TSRMLS_CC)) {
        default:
        case ZEND_ITER_INVALID:
            ……

        case ZEND_ITER_PLAIN_OBJECT: {
            ……
        }

        case ZEND_ITER_PLAIN_ARRAY:
            fe_ht = HASH_OF(array);
            
            // Special attention:
            // FE_RESET saved the array's internal position in EX_T(opline->op1.u.var).fe.fe_pos;
            // the internal pointer is restored from there
            zend_hash_set_pointer(fe_ht, &EX_T(opline->op1.u.var).fe.fe_pos);
            
            // Get the value of the element
            if (zend_hash_get_current_data(fe_ht, (void **) &value)==FAILURE) {
                ZEND_VM_JMP(EX(op_array)->opcodes+opline->op2.u.opline_num);
            }
            if (use_key) {
                key_type = zend_hash_get_current_key_ex(fe_ht, &str_key, &str_key_len, &int_key, 1, NULL);
            }
            
            // Move the array's internal pointer to the next element
            zend_hash_move_forward(fe_ht);
            
            // Save the advanced pointer back to EX_T(opline->op1.u.var).fe.fe_pos
            zend_hash_get_pointer(fe_ht, &EX_T(opline->op1.u.var).fe.fe_pos);
            break;

        case ZEND_ITER_OBJECT:
            ……
    }
    
    ……
}

From the implementation of FE_FETCH we can see roughly what foreach ($arr as $k => $v) does. It uses the pointers in zend_execute_data->Ts to access the array element and, once the element has been fetched successfully, moves the pointer to the next position and saves it again.

Simply put, FE_FETCH has already moved the array's internal pointer to the second element during the first iteration, so calling key($arr) and current($arr) inside the foreach body actually returns 1 and 'b'.

But why is 1=>b printed 3 times?

Look at the SEND_REF instructions on the ninth and thirteenth lines of the VLD output; they push the $arr argument onto the stack. A DO_FCALL instruction then normally follows to call the key and current functions. PHP is not compiled to native machine code, so it uses opcode instructions like these to simulate how a real CPU and memory work.

See SEND_REF in the PHP source:

static int ZEND_FASTCALL  ZEND_SEND_REF_SPEC_CV_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
{
    ……
    // Get the zval** pointer of $arr from the CV array
    varptr_ptr = _get_zval_ptr_ptr_cv(&opline->op1, EX(Ts), BP_VAR_W TSRMLS_CC);
    ……

    // Separate the variable; here a copy of the array is made for the key function
    SEPARATE_ZVAL_TO_MAKE_IS_REF(varptr_ptr);
    varptr = *varptr_ptr;
    Z_ADDREF_P(varptr);

    // Push onto the stack
    zend_vm_stack_push(varptr TSRMLS_CC);

    ZEND_VM_NEXT_OPCODE();
}

SEPARATE_ZVAL_TO_MAKE_IS_REF in the code above is a macro:

#define SEPARATE_ZVAL_TO_MAKE_IS_REF(ppzv)    \
    if (!PZVAL_IS_REF(*ppzv)) {                \
        SEPARATE_ZVAL(ppzv);                \
        Z_SET_ISREF_PP((ppzv));                \
    }

The main job of SEPARATE_ZVAL_TO_MAKE_IS_REF is this: if the variable is not a reference, a new copy of it is made in memory. In this case, array('a','b','c') gets copied.

Note that after the separation, the pointer in the CV array points to the new copy, while the pointer in zend_execute_data->Ts still reaches the old data.

The remaining iterations are not walked through one by one; they follow the same pattern as above.

Now we understand why key and current always return the second element of the array: no other code touches the copied array, so its internal pointer never moves.
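
Since this result depends on engine internals, code that needs the current key and value inside the loop is safer using $k and $v directly. The sketch below records both variants side by side. The analysis above is for the PHP 5 engine; as far as I know, PHP 7 changed foreach so that it no longer moves the array's internal pointer, so the second line printed here should be expected to differ between versions. The variable names are made up for the illustration.

```php
<?php
$arr = array('a', 'b', 'c');

$fromLoopVars = array();
$fromPointer  = array();

foreach ($arr as $k => $v) {
    // Portable: the loop variables always describe the current element.
    $fromLoopVars[] = "$k=>$v";
    // Engine-dependent: reads the array's internal pointer instead.
    $fromPointer[] = key($arr) . '=>' . current($arr);
}

echo implode(' ', $fromLoopVars), "\n"; // 0=>a 1=>b 2=>c
echo implode(' ', $fromPointer), "\n";  // depends on the PHP version
```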

Problem 3:

$arr = array('a','b','c');

foreach($arr as $k => &$v) {
    echo key($arr), '=>', current($arr);
}
// Print 1=>b 2=>c =>

Problem 3 differs from Problem 2 in only one way: here foreach iterates by reference. Looking at it with VLD shows that the opcodes are the same as in Problem 2. So we follow the same approach as in Problem 2 and step through the implementation of each opcode.

First, foreach executes FE_RESET:

static int ZEND_FASTCALL  ZEND_FE_RESET_SPEC_CV_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
{
    ……
    if (opline->extended_value & ZEND_FE_RESET_VARIABLE) {
        // Get the variable from the CV array
        array_ptr_ptr = _get_zval_ptr_ptr_cv(&opline->op1, EX(Ts), BP_VAR_R TSRMLS_CC);
        if (array_ptr_ptr == NULL || array_ptr_ptr == &EG(uninitialized_zval_ptr)) {
            ……
        }
        else if (Z_TYPE_PP(array_ptr_ptr) == IS_OBJECT) {
            ……
        }
        else {
            // Arrays are handled in this branch
            if (Z_TYPE_PP(array_ptr_ptr) == IS_ARRAY) {
                SEPARATE_ZVAL_IF_NOT_REF(array_ptr_ptr);
                if (opline->extended_value & ZEND_FE_FETCH_BYREF) {
                    // Mark the zval that holds the array as is_ref
                    Z_SET_ISREF_PP(array_ptr_ptr);
                }
            }
            array_ptr = *array_ptr_ptr;
            Z_ADDREF_P(array_ptr);
        }
    } else {
        ……
    }
    ……
}

Problem 2 already covered part of the FE_RESET implementation. What needs particular attention here is that this foreach fetches values by reference, so FE_RESET takes a different branch from the one taken in Problem 2.

In the end, FE_RESET sets is_ref on the array to true, and there is still only one copy of the array data in memory.

Next, the analysis of SEND_REF:

static int ZEND_FASTCALL  ZEND_SEND_REF_SPEC_CV_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
{
    ……
    // Gets the $arr pointer pointer in CV
    varptr_ptr = _get_zval_ptr_ptr_cv(&opline->op1, EX(Ts), BP_VAR_W TSRMLS_CC);
    ……
    
    // Separate the variable; because the variable in CV is already a reference, no new copy of the array is made here
    SEPARATE_ZVAL_TO_MAKE_IS_REF(varptr_ptr);
    varptr = *varptr_ptr;
    Z_ADDREF_P(varptr);
    
    // Push onto the stack
    zend_vm_stack_push(varptr TSRMLS_CC);

    ZEND_VM_NEXT_OPCODE();
}

The SEPARATE_ZVAL_TO_MAKE_IS_REF macro only separates variables with is_ref=false. Since the array has already been marked is_ref=true, no copy is made. In other words, there is still only one copy of the array data in memory.

This explains why the first 2 iterations print 1=>b 2=>c. In the third iteration, FE_FETCH moves the pointer forward once more:

ZEND_API int zend_hash_move_forward_ex(HashTable *ht, HashPosition *pos)
{
    HashPosition *current = pos ? pos : &ht->pInternalPointer;

    IS_CONSISTENT(ht);

    if (*current) {
        *current = (*current)->pListNext;
        return SUCCESS;
    } else
        return FAILURE;
}

Because the internal pointer is already at the last element of the array, moving forward again sets it to NULL. Once the internal pointer is NULL, calling key and current on the array returns NULL and false respectively, indicating that the call failed, so echo prints no characters for them.
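
The end-of-array behaviour of key and current can be reproduced without foreach by moving the internal pointer by hand with reset and next. This is a small sketch; $moved is only there to show the return value of the last next call.

```php
<?php
$arr = array('a', 'b', 'c');

reset($arr);          // internal pointer at 'a'
next($arr);           // now at 'b'
next($arr);           // now at 'c', the last element
$moved = next($arr);  // moves past the end of the array

var_dump($moved);         // bool(false)
var_dump(key($arr));      // NULL
var_dump(current($arr));  // bool(false)

// Both values turn into the empty string when echoed.
echo '[', key($arr), '=>', current($arr), "]\n"; // [=>]
```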

Problem 4:

$arr = array(1, 2, 3);
$tmp = $arr;
foreach($tmp as $k => &$v){
    $v *= 2;
}
var_dump($arr, $tmp); // Print what? 

Strictly speaking this has little to do with foreach, but since foreach is involved, let's discuss it anyway. :)

The code first creates the array $arr and then assigns it to $tmp. In the foreach loop that follows, modifying $v changes the array $tmp, but not $arr.

Why?

This is because in PHP, the assignment operator copies the value of one variable into another variable, so modifying one does not affect the other.
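
A runnable version of the example makes the result explicit. It prints with implode instead of var_dump to keep the output short, and the unset($v) line is added here following the advice from Problem 1.

```php
<?php
$arr = array(1, 2, 3);
$tmp = $arr;

foreach ($tmp as $k => &$v) {
    $v *= 2;
}
unset($v); // drop the leftover reference to the last element of $tmp

echo implode(',', $arr), "\n"; // 1,2,3
echo implode(',', $tmp), "\n"; // 2,4,6
```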

PS: this does not apply to objects. Since PHP 5, objects are assigned by reference by default, for example:

class A{
    public $foo = 1;
}
$a1 = $a2 = new A;
$a1->foo=100;
echo $a2->foo; // Outputs 100; $a1 and $a2 actually refer to the same object

Back to the code in question. We have established that $tmp = $arr is a copy by value: the whole $arr array is copied to $tmp. In theory, after the assignment statement has executed, there would be 2 copies of the same array in memory.

Some readers may wonder: if the array is very large, won't this operation be slow?

Fortunately, PHP has a cleverer solution. In fact, after $tmp = $arr has executed, there is still only one array in memory. Look at the implementation of zend_assign_to_variable in the PHP source (taken from php5.3.26):

static inline zval* zend_assign_to_variable(zval **variable_ptr_ptr, zval *value, int is_tmp_var TSRMLS_DC)
{
    zval *variable_ptr = *variable_ptr_ptr;
    zval garbage;
    ……
    // The lvalue is an object
    if (Z_TYPE_P(variable_ptr) == IS_OBJECT && Z_OBJ_HANDLER_P(variable_ptr, set)) {
        ……
    }
    // The lvalue is a reference
    if (PZVAL_IS_REF(variable_ptr)) {
        ……
    } else {
        // The case where refcount__gc was 1
        if (Z_DELREF_P(variable_ptr)==0) {
            ……
        } else {
            GC_ZVAL_CHECK_POSSIBLE_ROOT(*variable_ptr_ptr);
            // The value is not a temporary variable
            if (!is_tmp_var) {
                if (PZVAL_IS_REF(value) && Z_REFCOUNT_P(value) > 0) {
                    ALLOC_ZVAL(variable_ptr);
                    *variable_ptr_ptr = variable_ptr;
                    *variable_ptr = *value;
                    Z_SET_REFCOUNT_P(variable_ptr, 1);
                    zval_copy_ctor(variable_ptr);
                } else {
                    // $tmp = $arr ends up in this branch.
                    // value points to the actual data of $arr; variable_ptr_ptr is the zval** of $tmp.
                    // Only the pointer to the array is copied; the data itself is not duplicated.
                    *variable_ptr_ptr = value;
                    // Increment the refcount__gc of value; it is 1 in this case and becomes 2 after Z_ADDREF_P
                    Z_ADDREF_P(value);
                }
            } else {
                ……
            }
        }
        Z_UNSET_ISREF_PP(variable_ptr_ptr);
    }

    return *variable_ptr_ptr;
}

So in essence, $tmp = $arr copies the pointer to the array and then increments the array's refcount by 1. There is still only one array in memory.

Since there is only one array, why doesn't $arr change when $tmp is modified in the foreach loop?

Let's go on to the ZEND_FE_RESET_SPEC_CV_HANDLER function in the PHP source. It is an opcode handler, and its opcode is FE_RESET. Before foreach starts, this function is responsible for pointing the array's internal pointer at its first element.

static int ZEND_FASTCALL  ZEND_FE_RESET_SPEC_CV_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
{
    zend_op *opline = EX(opline);

    zval *array_ptr, **array_ptr_ptr;
    HashTable *fe_ht;
    zend_object_iterator *iter = NULL;
    zend_class_entry *ce = NULL;
    zend_bool is_empty = 0;

    // FE_RESET on a variable
    if (opline->extended_value & ZEND_FE_RESET_VARIABLE) {
        array_ptr_ptr = _get_zval_ptr_ptr_cv(&opline->op1, EX(Ts), BP_VAR_R TSRMLS_CC);
        if (array_ptr_ptr == NULL || array_ptr_ptr == &EG(uninitialized_zval_ptr)) {
            ……
        }
        // foreach over an object
        else if (Z_TYPE_PP(array_ptr_ptr) == IS_OBJECT) {
            ……
        }
        else {
            // Execution enters this branch
            if (Z_TYPE_PP(array_ptr_ptr) == IS_ARRAY) {
                // Note the SEPARATE_ZVAL_IF_NOT_REF here:
                // it makes a new copy of the array, truly separating $tmp
                // from $arr, so there are now 2 arrays in memory
                SEPARATE_ZVAL_IF_NOT_REF(array_ptr_ptr);
                if (opline->extended_value & ZEND_FE_FETCH_BYREF) {
                    Z_SET_ISREF_PP(array_ptr_ptr);
                }
            }
            array_ptr = *array_ptr_ptr;
            Z_ADDREF_P(array_ptr);
        }
    } else {
        ……
    }

    // Reset the array's internal pointer
    ……
}

As the code shows, the actual separation of the variables does not happen when the assignment statement executes; it is deferred until the variable is used (here, when foreach is about to iterate over it by reference). This is the Copy On Write mechanism as implemented in PHP.
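
Copy On Write can also be observed from userland, at least roughly, by comparing memory usage before and after the first write to the copy. This is only a sketch: the exact byte counts depend on the PHP version and build, so no specific numbers are claimed, and the variable names are made up for the illustration.

```php
<?php
$arr = range(1, 100000);      // a reasonably large array

$before = memory_get_usage();
$tmp = $arr;                  // assignment: the array is only shared
$afterAssign = memory_get_usage();

$tmp[] = 0;                   // first write: the array is duplicated now
$afterWrite = memory_get_usage();

$assignCost = $afterAssign - $before;
$writeCost  = $afterWrite - $afterAssign;

// The write should cost far more memory than the assignment did.
echo "assignment: $assignCost bytes, first write: $writeCost bytes\n";
```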

After FE_RESET, memory has changed: $tmp now points to its own copy of the array, while $arr keeps the original.

This explains why foreach has no effect on the original $arr. As for the changes to ref_count and is_ref, interested readers can read the implementations of ZEND_FE_RESET_SPEC_CV_HANDLER and ZEND_SWITCH_FREE_SPEC_VAR_HANDLER (both in php-src/Zend/zend_vm_execute.h); this article does not analyze them in detail. :)

Posted by Lucien at October 23, 2013 - 10:24 PM