I'm writing a simple program to parse a log file of a packet dump from Wireshark into a more readable form. I'm doing this in Python.
Currently I'm stuck on this part:
    for i in range(len(linelist)):
        if '### SERVER' in linelist[i]:
            #do server parsing stuff
            packet = linelist[i:find("\n\n", i, len(linelist))]
linelist is a list created using the readlines() method, so every line in the file is an element in the list. I'm iterating through it for all occurrences of "### SERVER", then grabbing all lines after it until the next empty line (which signifies the end of the packet). I must be doing something wrong, because not only is find() not working, but I have a feeling there's a better way to grab everything between ### SERVER and the next occurrence of a blank line.
Any ideas?
-
Looking at the file.readlines() doc:
file.readlines([sizehint])
Read until EOF using readline() and return a list containing the lines thus read. If the optional sizehint argument is present, instead of reading up to EOF, whole lines totalling approximately sizehint bytes (possibly after rounding up to an internal buffer size) are read. Objects implementing a file-like interface may choose to ignore sizehint if it cannot be implemented, or cannot be implemented efficiently.
and the file.readline() doc:
file.readline([size])
Read one entire line from the file. A trailing newline character is kept in the string (but may be absent when a file ends with an incomplete line). If the size argument is present and non-negative, it is a maximum byte count (including the trailing newline) and an incomplete line may be returned. An empty string is returned only when EOF is encountered immediately.
"A trailing newline character is kept in the string" means that each line in linelist will contain at most one newline. That is why you cannot find a "\n\n" substring in any of the lines. Look for a whole blank line (or an empty one at EOF) instead:
    if myline in ("\n", ""): handle_empty_line()
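To make the point above concrete, here is a minimal sketch that keeps the index-based style of the question. The sample text and the helper name packet_at are made up for illustration, and the sketch treats whitespace-only lines as blank, which is an assumption about the log format. Note also that find() is a string method, so it cannot be called on its own to search a list.

```python
import io

# Hypothetical sample data standing in for the real log file.
sample = "### SERVER\npayload 1\npayload 2\n\n### CLIENT\nother\n"
linelist = io.StringIO(sample).readlines()

# Each element ends with at most one newline, so "\n\n" never appears in one.
has_double_newline = any("\n\n" in line for line in linelist)

def packet_at(lines, start):
    """Return the lines from index start up to (not including) the next blank line."""
    end = start
    # Assumption: a line containing only whitespace also counts as blank.
    while end < len(lines) and lines[end].strip():
        end += 1
    return lines[start:end]

packets = [packet_at(linelist, i)
           for i, line in enumerate(linelist)
           if "### SERVER" in line]
```

With this sample, packets holds one list of lines for the single server block, and the slice also ends cleanly if the file stops without a trailing blank line.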
Note: I tried to explain find behavior, but a pythonic solution looks very different from your code snippet.
-
General idea is:
    inpacket = False
    packets = []
    for line in open("logfile"):
        if inpacket:
            content += line
            if line in ("\n", ""):  # empty line
                inpacket = False
                packets.append(content)
        elif '### SERVER' in line:
            inpacket = True
            content = line
    # put here packets.append on eof if needed
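One way the idea above might be wrapped into a reusable function, including the append at end of file that the last comment mentions. This is only a sketch: the function name collect_packets and the sample lines are invented here, and unlike the snippet above it leaves the terminating blank line out of each packet and treats whitespace-only lines as blank.

```python
def collect_packets(lines, marker="### SERVER"):
    """Collect each block that starts at a marker line and ends at a blank line."""
    packets = []
    content = None  # None means "not inside a packet"
    for line in lines:
        if content is not None:
            if line.strip() == "":  # blank line closes the packet
                packets.append(content)
                content = None
            else:
                content += line
        elif marker in line:
            content = line
    if content is not None:  # packet still open at end of input
        packets.append(content)
    return packets

# Hypothetical sample; any iterable of lines works, including an open file.
sample_lines = ["junk\n", "### SERVER a\n", "x\n", "\n", "### SERVER b\n", "y\n"]
packets = collect_packets(sample_lines)
```

Because the function accepts any iterable of lines, it can be fed an open file object directly, which avoids loading the whole log with readlines().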
-
This works well with an explicit iterator, also. That way, nested loops can update the iterator's state by consuming lines.
    fileIter = iter(theFile)
    for x in fileIter:
        if "### SERVER" in x:
            block = [x]
            for y in fileIter:
                if len(y.strip()) == 0:  # empty line
                    break
                block.append(y)
            print(block)  # Or whatever
        # elif some other pattern:
This has the pleasant property of finding blocks that are at the tail end of the file, and don't have a blank line terminating them.
Also, this is quite easy to generalize: since there are no explicit state-change variables, you just go into another loop to soak up lines in other kinds of blocks.
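A rough sketch of that generalization, assuming a second block type exists. The "### CLIENT" marker, the function names read_block and parse, and the sample lines are all hypothetical and not taken from the original log format.

```python
def read_block(first_line, line_iter):
    """Consume lines from the shared iterator until a blank line or end of input."""
    block = [first_line]
    for line in line_iter:
        if not line.strip():  # blank line ends the block
            break
        block.append(line)
    return block

def parse(lines):
    """Return (kind, block) pairs for every recognized block in lines."""
    line_iter = iter(lines)  # one iterator shared by the outer and inner loops
    result = []
    for line in line_iter:
        if "### SERVER" in line:
            result.append(("server", read_block(line, line_iter)))
        elif "### CLIENT" in line:  # hypothetical second block type
            result.append(("client", read_block(line, line_iter)))
    return result

# Hypothetical sample lines for illustration.
sample = ["### SERVER\n", "a\n", "\n", "noise\n", "### CLIENT\n", "b\n"]
parsed = parse(sample)
```

The inner loop advances the same iterator as the outer loop, so lines consumed as part of a block are never seen again by the outer loop.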
-
The best way is to use generators. Read the presentation "Generator Tricks for Systems Programmers"; it is the best thing I have seen about parsing logs ;)
Peter Rowell : That was my first thought, too. A slightly more up-to-date version of the same talk is at http://www.dabeaz.com/generators-uk/. I have actually had *dreams* about generator pipelines. (how weird is that?).
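For readers who have not seen the talk, a generator pipeline for this problem might look roughly like the sketch below. It is not code from the presentation; the function names blocks and strip_newlines and the sample lines are invented for illustration.

```python
def blocks(lines, marker):
    """Yield each block as a list of lines, from a marker line up to the next blank line."""
    line_iter = iter(lines)
    for line in line_iter:
        if marker in line:
            block = [line]
            for nxt in line_iter:
                if not nxt.strip():  # blank line ends the block
                    break
                block.append(nxt)
            yield block

def strip_newlines(block_iter):
    """Second pipeline stage: remove the trailing newline from every line."""
    for block in block_iter:
        yield [line.rstrip("\n") for line in block]

# Hypothetical sample; an open file object could be passed instead of a list.
sample = ["### SERVER 1\n", "data\n", "\n", "### SERVER 2\n", "more\n"]
pipeline = strip_newlines(blocks(sample, "### SERVER"))
result = list(pipeline)
```

Each stage pulls items from the previous one only when asked, so nothing is read until list() drives the pipeline, and a large log never has to sit in memory all at once.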