# Processing large amounts of data and preventing fraudulent clicks

I have three questions and would appreciate any pointers.
1. Implement a function f(A, B). The allowed operations are +12, -12, +7, -7, +5 and -5. How do you get from A to B in the fewest operations?
For example, with A = 0 and B = 24, 0 + 12 + 12 = 24 is the shortest path.

2. Take one day of access logs from a site like Baidu. The log has a single column, the IP address. How do you remove the duplicate IPs?
Because the data volume is huge, I thought of several schemes:
a: Use awk, though I don't know whether awk would run out of memory on a file this size.
b: Split the file into several smaller files, filter each one, then merge the results.
c: Use PHP's fseek function to process the file one section at a time.
d: Build a hash table? I don't know how to build one or which hash function to use. (Hoping an expert can explain.)
Is there a more efficient way? I have little experience with processing large files.

3. Counting: suppose a flower is added to the Baidu home page, and each click on it adds 1 to a counter shown next to it. How would you implement this, and how do you prevent fraudulent clicks?

Implementing the counter itself is not hard; the real issue is preventing fraudulent clicks. I considered treating the IP address as the unique identifier, but some large companies share a single outbound IP, so an entire company could only vote once. The same goes for mobile networks such as China Unicom, where many phones share one IP. How can this be solved?
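One possible direction, sketched here under my own assumptions: instead of allowing one vote per IP, limit how many clicks a visitor key (IP plus a cookie ID) can register per time window. The class name `ClickLimiter`, the cookie ID and the limits are all hypothetical, and the state lives in a plain array purely for illustration; across real requests it would have to sit in a shared store.

```php
<?php
// Fixed-window limiter: at most $limit counted clicks per visitor key per window.
// Hypothetical sketch; the array below does not persist between PHP requests.
final class ClickLimiter
{
    private $limit;
    private $windowSeconds;
    private $windows = [];   // key => [windowStart, clicksInWindow]

    public function __construct(int $limit, int $windowSeconds)
    {
        $this->limit = $limit;
        $this->windowSeconds = $windowSeconds;
    }

    // Returns true if the click should be counted, false if it is over the limit.
    public function allow(string $ip, string $cookieId, int $now): bool
    {
        $key = $ip . '|' . $cookieId;                      // several users behind one IP stay distinct
        $start = $now - ($now % $this->windowSeconds);     // start of the current window
        if (!isset($this->windows[$key]) || $this->windows[$key][0] !== $start) {
            $this->windows[$key] = [$start, 0];            // new window: reset the count
        }
        if ($this->windows[$key][1] >= $this->limit) {
            return false;
        }
        $this->windows[$key][1]++;
        return true;
    }
}

$limiter = new ClickLimiter(2, 60);
var_dump($limiter->allow('1.1.1.1', 'cookie-a', 100));   // bool(true)
```

A client that discards its cookie gets a fresh key, so a looser per-IP ceiling on top of this is probably still needed.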

Started by Cher at October 29, 2016 - 1:54 AM

Posted by Cher at November 08, 2016 - 2:18 AM

Bump.

Posted by Cher at November 22, 2016 - 2:23 AM

1. Implement a function f(A, B). The allowed operations are +12, -12, +7, -7, +5 and -5. How do you get from A to B in the fewest operations?
For example, with A = 0 and B = 24, 0 + 12 + 12 = 24 is the shortest path.
Lookup-table method:
```
function f($a, $b) {
    $c = abs($b - $a);
    $d = array(
        0 => '',
        1 => '+5+5+5-7-7',
        2 => '+7-5',
        3 => '+5+5-7',
        4 => '+7+7-5-5',
    );
    $r = str_repeat('+12', intval($c/12));
    $c %= 12;
    $r .= str_repeat('+12', intval($c/7));
    return $r . $d[$c % 5];
}
```
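A greedy lookup like the one above is not guaranteed to use the fewest operations: a gap of 10, for instance, needs only +5+5. As a cross-check, here is a minimal breadth-first search sketch. The function name `minOps` and the search window are my own assumptions, not part of the original answer, and it returns only the number of operations rather than the expression.

```php
<?php
// Breadth-first search over intermediate values: the first time the target
// is dequeued, it has been reached with the fewest possible operations.
function minOps(int $a, int $b): int
{
    $target = abs($b - $a);          // the step set is symmetric, so only the gap matters
    $steps  = [12, -12, 7, -7, 5, -5];
    $lo = -12;                       // assumption: a shortest path can always be reordered
    $hi = $target + 12;              // to stay inside the window [-12, target + 12]
    $dist  = [0 => 0];               // value => number of operations used to reach it
    $queue = new SplQueue();
    $queue->enqueue(0);
    while (!$queue->isEmpty()) {
        $cur = $queue->dequeue();
        if ($cur === $target) {
            return $dist[$cur];
        }
        foreach ($steps as $s) {
            $next = $cur + $s;
            if ($next < $lo || $next > $hi || isset($dist[$next])) {
                continue;            // outside the window or already visited
            }
            $dist[$next] = $dist[$cur] + 1;
            $queue->enqueue($next);
        }
    }
    return -1;                       // not expected to happen, since gcd(12, 7, 5) = 1
}

echo minOps(0, 24), "\n";   // 2  (+12+12)
echo minOps(0, 10), "\n";   // 2  (+5+5)
echo minOps(0, 1), "\n";    // 5  (+5+5+5-7-7)
```

The search touches at most `target + 25` values, so it is only practical for moderate gaps; for very large gaps one could strip off multiples of 12 first and search the remainder, but that shortcut would need its own proof.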

Posted by Vernon at December 01, 2016 - 3:13 AM

Thank you. I don't fully understand it yet, but the result is right.

Posted by Cher at December 10, 2016 - 4:08 AM

There are still two more questions. What are the solutions to those?

Posted by Cher at December 14, 2016 - 4:41 AM

What does this part mean?

    $d = array(
        0 => '',
        1 => '+5+5+5-7-7',
        2 => '+7-5',
        3 => '+5+5-7',
        4 => '+7+7-5-5',
    );

Posted by Cher at December 15, 2016 - 5:25 AM

It's a dictionary-style lookup table. Surely that is familiar?

2. Take one day of access logs from a site like Baidu. The log has a single column, the IP address. How do you remove the duplicate IPs?
Use a hash table keyed by IP.
PHP provides ip2long for converting an IP address to a long integer.
MySQL also provides INET_ATON.
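A minimal sketch of that idea, assuming IPv4 addresses, one per line, and a set of distinct addresses that fits in memory. The function name `dedupeIps` and the file paths are hypothetical.

```php
<?php
// Stream the log line by line and write each distinct IPv4 address once.
// Returns the number of distinct addresses written to $outPath.
function dedupeIps(string $inPath, string $outPath): int
{
    $in  = fopen($inPath, 'r');
    $out = fopen($outPath, 'w');
    $seen = [];                          // PHP arrays are hash tables; key = integer form of the IP
    while (($line = fgets($in)) !== false) {
        $ip  = trim($line);
        $key = ip2long($ip);             // false for anything that is not a valid IPv4 address
        if ($key === false || isset($seen[$key])) {
            continue;                    // skip junk lines and addresses already written
        }
        $seen[$key] = true;
        fwrite($out, $ip . "\n");
    }
    fclose($in);
    fclose($out);
    return count($seen);
}
```

Only the distinct addresses are held in memory, not the whole file, so the size of the log matters less than the number of different IPs in it.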

Posted by Vernon at December 20, 2016 - 5:45 AM

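    function f($a, $b) {
        $c = abs($b - $a);                        // only the size of the gap is used
        $d = array(                               // expressions for remainders 0 to 4
            0 => '',
            1 => '+5+5+5-7-7',
            2 => '+7-5',
            3 => '+5+5-7',
            4 => '+7+7-5-5',
        );
        $r = str_repeat('+12', intval($c / 12));
        $c %= 12;
        $r .= str_repeat('+7', intval($c / 7));
        $c %= 7;
        $r .= str_repeat('+5', intval($c / 5));   // emit a +5 when the remainder is 5 or 6
        return $r . $d[$c % 5];
    }

This is my repaired version. Is it correct?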

Posted by Cher at December 23, 2016 - 5:58 AM

What is the hash function in a hash table for? What are its input and output?

Posted by Cher at December 30, 2016 - 6:56 AM

A hash algorithm maps a binary value of arbitrary length to a smaller, fixed-length binary value, called the hash value. A hash value is an extremely compact numeric representation of a piece of data, although distinct inputs can occasionally share the same hash value.
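To make the input and output concrete, here is a small sketch. `bucketFor` is a hypothetical helper, not something PHP provides: the input is an IPv4 string and the output is a slot number. It assumes 64-bit PHP, where ip2long returns a non-negative integer.

```php
<?php
// Hypothetical helper: map an IPv4 string (input) to a slot number (output).
function bucketFor(string $ip, int $slots): int
{
    $n = ip2long($ip);            // '1.2.3.4' -> 16909060 on 64-bit PHP
    if ($n === false) {
        throw new InvalidArgumentException("not an IPv4 address: $ip");
    }
    return $n % $slots;           // the same input always lands in the same slot
}

echo bucketFor('1.2.3.4', 1024), "\n";   // 772
echo strlen(md5('1.2.3.4')), "\n";       // 32: md5 output has a fixed length
```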

For your problem you don't need to call any hash function yourself.
Just import the log into a table with a unique index on the IP column, then export it.

Posted by Vernon at January 01, 2017 - 7:49 AM

The log file may be 10 GB or more. If I load it straight into a hash table, won't memory blow up? I don't think it would fit.
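If memory really is the bottleneck, one common workaround is to combine the splitting idea with hashing: scatter the addresses into bucket files so that every copy of an address lands in the same bucket, then de-duplicate one bucket at a time. The sketch below assumes 64-bit PHP and IPv4 input; `dedupeIpsPartitioned` and the bucket count of 64 are hypothetical choices, and the output comes out in bucket order rather than input order.

```php
<?php
// Pass 1: scatter addresses into temporary files by ip2long() % $buckets.
// Pass 2: de-duplicate each bucket in memory and append it to the output.
function dedupeIpsPartitioned(string $inPath, string $outPath, int $buckets = 64): int
{
    $tmp = [];
    for ($i = 0; $i < $buckets; $i++) {
        $tmp[$i] = tmpfile();                  // removed automatically when closed
    }
    $in = fopen($inPath, 'r');
    while (($line = fgets($in)) !== false) {
        $key = ip2long(trim($line));           // non-negative on 64-bit PHP
        if ($key !== false) {
            fwrite($tmp[$key % $buckets], $key . "\n");
        }
    }
    fclose($in);

    $out = fopen($outPath, 'w');
    $total = 0;
    foreach ($tmp as $fh) {
        rewind($fh);
        $seen = [];                            // only one bucket's keys are in memory at a time
        while (($line = fgets($fh)) !== false) {
            $seen[(int) $line] = true;
        }
        fclose($fh);
        foreach (array_keys($seen) as $key) {
            fwrite($out, long2ip($key) . "\n");
        }
        $total += count($seen);
    }
    fclose($out);
    return $total;
}
```

Peak memory is roughly the number of distinct addresses in the largest bucket, so raising the bucket count trades more temporary files for less memory.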

Posted by Cher at January 03, 2017 - 8:10 AM

Bump.

Posted by Esther at January 08, 2017 - 8:43 AM