Question about log analysis

Part of the log looks like this:

<tr><td class="data">1</td><td class="data2"><a href="192_168_23_13/d192_168_23_13.html"><img src="../images/datetime.png" title="date/time report" alt="T"></a></td><td class="data2"><a href="192_168_23_13/192_168_23_13.html">192.168.23.13</a></td><td class="data">45.43K</td><td class="data">2.63G</td><td class="data">13.58%</td><td class="data">14.90%</td><td class="data">85.10%</td><td class="data">00:00:00</td><td class="data">0</td><td class="data">0.00%</td></tr>


I want to use regular expressions to extract the IP 192.168.23.13 and the traffic figure 2.63G:
  import re
  # string holds the log line shown above
  pattern = re.compile(r'192\.168\.\d+\.\d+')
  # pattern = re.compile(r'\d+\.\d+G')
  text = pattern.search(string)
  if text:
      print(text.group())
How can I extract both values at once? For example: 192.168.23.13 2.63G

Started by Archer at March 01, 2016 - 1:12 PM

import re

str1='<tr><td class="data">1</td><td class="data2"><a href="192_168_23_13/d192_168_23_13.html"><img src="../images/datetime.png" title="date/time report" alt="T"></a></td><td class="data2"><a href="192_168_23_13/192_168_23_13.html">192.168.23.13</a></td><td class="data">45.43K</td><td class="data">2.63G</td><td class="data">13.58%</td><td class="data">14.90%</td><td class="data">85.10%</td><td class="data">00:00:00</td><td class="data">0</td><td class="data">0.00%</td></tr>'

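# Escape the literal text that precedes the value so it is safe inside a pattern.
str3 = re.escape('><a href="')
# The href spells the IP with underscores; it matches once per link, so twice for this row.
ls = re.findall(r'%s(\d+_\d+_\d+_\d+)/' % str3, str1)
for j in ls:
    j = re.sub('_', '.', j)
    print("IP: %s" % j)

str4 = re.escape('"data">')
# The dot is escaped so it only matches a literal decimal point.
ls1 = re.findall(r'%s(\d+\.\d+)G<' % str4, str1)
for j in ls1:
    print("data: %s" % j)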

Posted by Caspar at March 02, 2016 - 1:51 PM

These regular expressions are a little rough:
  sed -r 's/.*html">(.+)<\/a>.*"data">(.+G)<.*/\1\t\2/' file
  re.findall(r'(\d+\.\d+\.\d+\.\d+|\d+\.\d+G)', str)
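
For anyone who wants the IP and the traffic figure paired per row in Python, the way the sed expression does it, here is a minimal sketch. ROW is a shortened copy of the log line from the question, and the names ROW_RE and extract_ip_and_traffic are made up for illustration. It assumes the traffic cell always ends in G, so rows reported in K or M would return None.

```python
import re

# Shortened copy of the log line from the question (illustrative only).
ROW = ('<tr><td class="data">1</td>'
       '<td class="data2"><a href="192_168_23_13/192_168_23_13.html">192.168.23.13</a></td>'
       '<td class="data">45.43K</td><td class="data">2.63G</td>'
       '<td class="data">13.58%</td></tr>')

# Group 1: the dotted IP used as link text.
# Group 2: the first following "data" cell whose value ends in G.
ROW_RE = re.compile(r'>(\d+\.\d+\.\d+\.\d+)</a>.*?"data">(\d+(?:\.\d+)?G)<')


def extract_ip_and_traffic(line):
    """Return (ip, traffic) for one table row, or None if the row does not match."""
    match = ROW_RE.search(line)
    return match.groups() if match else None


result = extract_ip_and_traffic(ROW)
if result:
    print("%s %s" % result)
```

Because both values come from one match, they stay paired even when the file contains many rows.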

Posted by Donna at March 16, 2016 - 2:40 PM

Thank you both very much.

Posted by Archer at March 18, 2016 - 2:56 PM

If you are familiar with jQuery, you can also use PyQuery; the code will be much more readable than with re.
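
A rough illustration of the parse-instead-of-regex idea. PyQuery is a third-party package, so this sketch swaps in html.parser from the Python 3 standard library rather than PyQuery itself. The class name CellCollector is invented here, ROW is a shortened copy of the log line, and the code assumes the IP is always the third cell and the traffic the fifth, as in that sample row.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text content of every <td> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")  # start a new, initially empty cell

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data  # text inside nested tags such as <a> lands here too


# Shortened copy of the log line from the question (illustrative only).
ROW = ('<tr><td class="data">1</td>'
       '<td class="data2"><a href="192_168_23_13/d192_168_23_13.html">'
       '<img src="../images/datetime.png" alt="T"></a></td>'
       '<td class="data2"><a href="192_168_23_13/192_168_23_13.html">192.168.23.13</a></td>'
       '<td class="data">45.43K</td><td class="data">2.63G</td></tr>')

parser = CellCollector()
parser.feed(ROW)
parser.close()

# Assumed column positions: index 2 is the IP, index 4 is the traffic.
ip, traffic = parser.cells[2], parser.cells[4]
print(ip, traffic)
```

Picking cells by position avoids depending on the unit suffix, at the cost of depending on the column order of the report.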

Posted by Pete at March 23, 2016 - 3:49 PM

[root@source ~]# cat a
<tr><td class="data">1</td><td class="data2"><a href="192_168_23_13/d192_168_23_13.html"><img src="../images/datetime.png" title="date/time report" alt="T"></a></td><td class="data2"><a href="192_168_23_13/192_168_23_13.html">192.168.23.13</a></td><td class="data">45.43K</td><td class="data">2.63G</td><td class="data">13.58%</td><td class="data">14.90%</td><td class="data">85.10%</td><td class="data">00:00:00</td><td class="data">0</td><td class="data">0.00%</td></tr>
[root@source ~]# grep -oP "(\d+\.){3}\d+|[^>]+G(?=<)" a
192.168.23.13
2.63G
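
The same alternation should carry over to Python's re module almost unchanged. This is only a sketch: TOKEN_RE and LINE are illustrative names, and LINE is a shortened copy of the file contents. The one adjustment is making the repeated group non-capturing, since re.findall returns group contents instead of whole matches when the pattern has a capturing group.

```python
import re

# Same alternation as the grep pattern; (?:...) keeps findall returning whole matches.
TOKEN_RE = re.compile(r'(?:\d+\.){3}\d+|[^>]+G(?=<)')

# Shortened copy of the line in file "a" (illustrative only).
LINE = ('<td class="data2"><a href="192_168_23_13/192_168_23_13.html">192.168.23.13</a></td>'
        '<td class="data">45.43K</td><td class="data">2.63G</td>')

tokens = TOKEN_RE.findall(LINE)
print(" ".join(tokens))
```

findall returns one flat list, so with several rows per input it is probably easiest to call it once per line to keep each IP next to its own traffic value.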

Posted by Devin at March 27, 2016 - 4:47 PM