How to read two lines from a file at a time using Python

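Question:

I am writing a Python script that parses a text file. The format of this text file is such that each element in the file uses two lines, and for convenience I would like to read both lines before parsing. Can this be done in Python?

I would like something like: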

f = open(filename, "r")
for line in f:
    line1 = line
    line2 = f.readline()

f.close()

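But this breaks with:

ValueError: Mixing iteration and read methods would lose data

Related:

What is the most "pythonic" way to iterate over a list in chunks?

Solution 1:

A similar question has been asked before. You can't mix iteration and readline, so you need to use one or the other.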

while True:
    line1 = f.readline()
    line2 = f.readline()
    if not line2: break  # EOF
    ...
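The loop above stops as soon as line2 is empty, so a trailing unpaired line is dropped silently. A minimal sketch of the same readline-only approach that still hands the last line to the caller is shown below. The generator name read_pairs is hypothetical, and io.StringIO stands in for a real file object.

```python
import io

def read_pairs(f):
    """Yield (line1, line2) tuples using readline() only, never iteration."""
    while True:
        line1 = f.readline()
        if not line1:          # clean EOF: no more records
            break
        line2 = f.readline()   # '' if the file ends right after line1
        yield line1, line2

# io.StringIO stands in for a file opened with open(...)
sample = io.StringIO("a\nb\nc\n")
pairs = list(read_pairs(sample))
print(pairs)
```

With three input lines, the last tuple carries an empty string as its second element, which the caller can test for.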

Solution 2:

import itertools
with open('a') as f:
    for line1,line2 in itertools.zip_longest(*[f]*2):
        print(line1,line2)

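itertools.zip_longest() returns an iterator, so it works well even if the file is billions of lines long.

If there is an odd number of lines, line2 is set to None on the last iteration.

On Python 2 you need to use izip_longest instead.

In the comments, someone asked whether this solution reads the whole file first and then iterates over it a second time. I believe it does not. The with open('a') as f line opens a file handle but does not read the file. f is an iterator, so its contents are not read until requested. zip_longest takes iterators as arguments and returns an iterator.

zip_longest is indeed fed the same iterator, f, twice. But what ends up happening is that next(f) is called for the first argument and then for the second argument. Since next() is being called on the same underlying iterator, successive lines are returned. This is very different from reading in the whole file. Indeed, the purpose of using iterators is precisely to avoid reading in the whole file.

I therefore believe the solution works as intended: the file is read only once, by the for loop.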
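The shared-iterator argument can be checked on a small in-memory file. The sketch below uses io.StringIO as a stand-in for an open file, which is an assumption made for the demonstration; the pairing logic is the same as in the solution.

```python
import io
import itertools

f = io.StringIO("1\n2\n3\n4\n5\n")   # stand-in for an open file

args = [f] * 2
same_object = args[0] is args[1]     # both arguments are the very same iterator

# Each output tuple is built by calling next() twice on that one iterator,
# so consecutive lines end up side by side.
pairs = list(itertools.zip_longest(*args))
print(same_object, pairs)
```

Because the input has five lines, the final tuple is padded with None, matching the odd-line behavior described above.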

To corroborate this, I ran the zip_longest solution against a solution using f.readlines(). I put an input() at the end of each script to pause it, and ran ps axuw on each:

% ps axuw | grep zip_longest_method.py

unutbu 11119 2.2 0.2 4520 2712 pts/0 S+ 21:14 0:00 python /home/unutbu/pybin/zip_longest_method.py bigfile

% ps axuw | grep readlines_method.py

unutbu 11317 6.5 8.8 93908 91680 pts/0 S+ 21:16 0:00 python /home/unutbu/pybin/readlines_method.py bigfile

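Clearly, readlines reads the whole file at once. Since zip_longest_method uses much less memory (an RSS of 2712 versus 91680 in the ps output above), I think it is safe to conclude that it does not read the whole file at once.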

Solution 3:

Use next(), e.g.

with open("file") as f:
    for line in f:
        print(line)
        nextline = next(f)
        print("next line", nextline)
        ....
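One caveat with the snippet above: if the file has an odd number of lines, the bare next(f) raises StopIteration inside the loop body, and the for statement does not catch it. A minimal sketch of a variant that passes a default to next() follows; io.StringIO stands in for the real file and the records list exists only for the demonstration.

```python
import io

f = io.StringIO("a\nb\nc\n")   # stand-in for open("file")
records = []
for line in f:
    # The second argument is returned instead of raising StopIteration
    nextline = next(f, None)
    records.append((line, nextline))
print(records)
```

The unpaired last line arrives with None as its partner, so the caller decides how to treat it.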

Solution 4:

How about this one? Does anybody see a problem with it?

with open('file_name') as f:
    for line1, line2 in zip(f, f):
        print(line1, line2)
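One thing to be aware of with zip(f, f) is what happens to an unpaired last line: zip stops as soon as any argument is exhausted, and by then the last line has already been consumed. The sketch below shows this on an in-memory file (io.StringIO is an assumption standing in for a real file).

```python
import io

f = io.StringIO("a\nb\nc\n")   # three lines, so the last one has no partner
pairs = list(zip(f, f))
leftover = f.read()            # nothing remains: the odd line was consumed and discarded
print(pairs, repr(leftover))
```

If losing the trailing line is not acceptable, itertools.zip_longest from Solution 2 pads the last pair instead.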

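Solution 5:

I would proceed in a similar way to ghostdog74, only with the try on the outside and a few modifications:

try:
    with open(filename) as f:
        for line1 in f:
            line2 = next(f)  # raises StopIteration if line1 was the last line
            # process line1 and line2 here
except StopIteration:
    print("(End)")  # do whatever you need to do with line1 alone

This keeps the code simple and yet robust. Using with closes the file if something else goes wrong, and otherwise closes it once the file is exhausted and the loop exits.

Note that with requires Python 2.6, or 2.5 with the with_statement feature enabled.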

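Solution 6:

Works for files with an even or odd number of lines. It simply ignores the unmatched last line.

f = open("file")

lines = f.readlines()
for even, odd in zip(lines[0::2], lines[1::2]):
    print("even : ", even)
    print("odd : ", odd)
    print("end cycle")
f.close()

If your file is large, this is not the right approach. You are loading the whole file into memory with readlines(). I once wrote a class that reads the file and saves the fseek position of the start of each line. This lets you fetch specific lines without holding the whole file in memory, and you can also move forward and backward.

I paste it here. The license is public domain, meaning do what you want with it. Please note that this class was written 6 years ago and I haven't touched or checked it since. I think it's not even file compliant. Caveat emptor. Also, note that this is overkill for your problem. I'm not claiming you should definitely go this way, but I had this code and I'm happy to share it in case you need more complex access.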

import string
import re

class FileReader:
    """ 
    Similar to file class, but allows to access smoothly the lines 
    as when using readlines(), with no memory payload, going back and forth,
    finding regexps and so on.
    """
    def __init__(self,filename): # fold>>
        self.__file=open(filename,"r")
        self.__currentPos=-1
        # get file length
        self.__file.seek(0,0)
        counter=0
        line=self.__file.readline()
        while line != '':
            counter = counter + 1
            line=self.__file.readline()
        self.__length = counter
        # collect an index of filedescriptor positions against
        # the line number, to enhance search
        self.__file.seek(0,0)
        self.__lineToFseek = []

        while True:
            cur=self.__file.tell()
            line=self.__file.readline()
            # if it's not null the cur is valid for
            # identifying a line, so store
            self.__lineToFseek.append(cur)
            if line == '':
                break
    # <<fold
    def __len__(self): # fold>>
        """
        member function for the operator len()
        returns the file length
        FIXME: better get it once when opening file
        """
        return self.__length
        # <<fold
    def __getitem__(self,key): # fold>>
        """ 
        gives the "key" line. The syntax is

        import FileReader
        f=FileReader.FileReader("a_file")
        line=f[2]

        to get the third line from the file (indices are zero-based). The internal
        pointer is set to the key line
        """

        mylen = self.__len__()
        if key < 0:
            self.__currentPos = -1
            return ''
        elif key > mylen:
            self.__currentPos = mylen
            return ''

        self.__file.seek(self.__lineToFseek[key],0)
        counter=0
        line = self.__file.readline()
        self.__currentPos = key
        return line
        # <<fold
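    def __next__(self): # fold>>
        # readline() returns '' only at EOF, so stop there instead of
        # yielding a spurious empty string after the last line
        line = self.readline()
        if line == '':
            raise StopIteration
        return line
    next = __next__  # Python 2 name for the same method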
    # <<fold
    def __iter__(self): # fold>>
        return self
    # <<fold
    def readline(self): # fold>>
        """
        read a line forward from the current cursor position.
        returns the line or an empty string when at EOF
        """
        return self.__getitem__(self.__currentPos+1)
        # <<fold
    def readbackline(self): # fold>>
        """
        read a line backward from the current cursor position.
        returns the line or an empty string when at Beginning of
        file.
        """
        return self.__getitem__(self.__currentPos-1)
        # <<fold
    def currentLine(self): # fold>>
        """
        gives the line at the current cursor position
        """
        return self.__getitem__(self.__currentPos)
        # <<fold
    def currentPos(self): # fold>>
        """ 
        return the current position (line) in the file
        or -1 if the cursor is at the beginning of the file
        or len(self) if it's at the end of file
        """
        return self.__currentPos
        # <<fold
    def toBOF(self): # fold>>
        """
        go to beginning of file
        """
        self.__getitem__(-1)
        # <<fold
    def toEOF(self): # fold>>
        """
        go to end of file
        """
        self.__getitem__(self.__len__())
        # <<fold
    def toPos(self,key): # fold>>
        """
        go to the specified line
        """
        self.__getitem__(key)
        # <<fold
    def isAtEOF(self): # fold>>
        return self.__currentPos == self.__len__()
        # <<fold
    def isAtBOF(self): # fold>>
        return self.__currentPos == -1
        # <<fold
    def isAtPos(self,key): # fold>>
        return self.__currentPos == key
        # <<fold

    def findString(self, thestring, count=1, backward=0): # fold>>
        """
        find the count occurrence of the string str in the file
        and return the line catched. The internal cursor is placed
        at the same line.
        backward is the searching flow.
        For example, to search for the first occurrence of "hello"
        starting from the beginning of the file do:

        import FileReader
        f=FileReader.FileReader("a_file")
        f.toBOF()
        f.findString("hello",1,0)

        To search the second occurrence string from the end of the
        file in backward movement do:

        f.toEOF()
        f.findString("hello",2,1)

        to search the first occurrence from a given (or current) position
        say line 150, going forward in the file 

        f.toPos(150)
        f.findString("hello",1,0)

        return the string where the occurrence is found, or an empty string
        if nothing is found. The internal counter is placed at the corresponding
        line number, if the string was found. In other case, it's set at BOF
        if the search was backward, and at EOF if the search was forward.

        NB: the current line is never evaluated. This is a feature, since
        we can so traverse occurrences with a

        line=f.findString("hello")
        while line != '':
            line=f.findString("hello")

        instead of playing with a readline every time to skip the current
        line.
        """
        internalcounter=1
        if count < 1:
            count = 1
        while 1:
            if backward == 0:
                line=self.readline()
            else:
                line=self.readbackline()

            if line == '':
                return ''
            if line.find(thestring) != -1 :
                if count == internalcounter:
                    return line
                else:
                    internalcounter = internalcounter + 1
                    # <<fold
    def findRegexp(self, theregexp, count=1, backward=0): # fold>>
        """
        find the count occurrence of the regexp in the file
        and return the line catched. The internal cursor is placed
        at the same line.
        backward is the searching flow.
        You need to pass a regexp string as theregexp.
        returns a tuple. The first element is the matched line. The subsequent elements
        contains the matched groups, if any.
        If no match returns None
        """
        rx=re.compile(theregexp)
        internalcounter=1
        if count < 1:
            count = 1
        while 1:
            if backward == 0:
                line=self.readline()
            else:
                line=self.readbackline()

            if line == '':
                return None
            m=rx.search(line)
            if m != None :
                if count == internalcounter:
                    return (line,)+m.groups()
                else:
                    internalcounter = internalcounter + 1
    # <<fold
    def skipLines(self,key): # fold>>
        """
        skip a given number of lines. Key can be negative to skip
        backward. Return the last line read.
        Please note that skipLines(1) is equivalent to readline()
        skipLines(-1) is equivalent to readbackline() and skipLines(0)
        is equivalent to currentLine()
        """
        return self.__getitem__(self.__currentPos+key)
    # <<fold
    def occurrences(self,thestring,backward=0): # fold>>
        """
        count how many occurrences of str are found from the current
        position (current line excluded... see skipLines()) to the
        begin (or end) of file.
        returns a list of positions where each occurrence is found,
        in the same order found reading the file.
        Leaves unaltered the cursor position.
        """
        curpos=self.currentPos()
        list = []
        line = self.findString(thestring,1,backward)
        while line != '':
            list.append(self.currentPos())
            line = self.findString(thestring,1,backward)
        self.toPos(curpos)
        return list
        # <<fold
    def close(self): # fold>>
        self.__file.close()
    # <<fold
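The core idea of the class above, a table mapping each line number to its position in the file, can be sketched much more compactly. The version below is not the author's code: the LineIndex name is hypothetical, the file is opened in binary mode so offsets are plain byte counts, and UTF-8 content is assumed when decoding.

```python
import os
import tempfile

class LineIndex:
    """Hypothetical helper: random access to lines via a table of byte offsets."""

    def __init__(self, path):
        self._f = open(path, "rb")   # binary mode, so offsets are plain byte counts
        self._offsets = []
        pos = 0
        for raw in self._f:          # one streaming pass; only offsets are kept
            self._offsets.append(pos)
            pos += len(raw)

    def __len__(self):
        return len(self._offsets)

    def __getitem__(self, n):
        self._f.seek(self._offsets[n])   # IndexError if n is out of range
        return self._f.readline().decode("utf-8")  # assumes UTF-8 content

    def close(self):
        self._f.close()

# Demo on a throwaway file
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"zero\none\ntwo\n")

idx = LineIndex(tmp.name)
n_lines = len(idx)
picked = (idx[2], idx[0])   # jump forward, then back
idx.close()
os.remove(tmp.name)
print(n_lines, picked)
```

Only the offsets list grows with the file, one integer per line, so memory use stays far below that of readlines() for large inputs.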

Solution 7:

file_name = 'your_file_name'
file_open = open(file_name, 'r')

def handler(line_one, line_two):
    print(line_one, line_two)

while file_open:
    try:
        one = next(file_open)
        two = next(file_open)
        handler(one, two)
    except StopIteration:
        file_open.close()
        break

Solution 8:

def readnumlines(file, num=2):
    f = iter(file)
    while True:
        lines = [None] * num
        for i in range(num):
            try:
                lines[i] = next(f)
            except StopIteration: # EOF or not enough lines available
                return
        yield lines

# use like this
f = open("thefile.txt", "r")
for line1, line2 in readnumlines(f):
    pass  # do something with line1 and line2

# or, for groups of N lines:
# for line1, line2, line3, ..., lineN in readnumlines(f, N):
#     do something with N lines
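The generator above drops an incomplete final group. If the leftover lines matter, a variation built on itertools.islice can return them as a shorter last chunk. This is a sketch under that assumption, not a replacement for the answer's code; the name read_chunks is hypothetical and io.StringIO stands in for a real file.

```python
import io
from itertools import islice

def read_chunks(file, num=2):
    """Yield lists of up to `num` lines; the final list may be shorter."""
    it = iter(file)
    while True:
        chunk = list(islice(it, num))   # takes at most num items from the iterator
        if not chunk:                   # an empty list means the input is exhausted
            return
        yield chunk

f = io.StringIO("1\n2\n3\n4\n5\n")
chunks = list(read_chunks(f, 2))
print(chunks)
```

Callers that unpack each chunk into a fixed number of names should check the length of the last chunk first.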

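Solution 9:

My idea is to create a generator that reads two lines from the file at a time and returns them as a 2-tuple, which means you can then iterate over the results.

from io import StringIO

def read_2_lines(src):
    while True:
        line1 = src.readline()
        if not line1: break
        line2 = src.readline()
        if not line2: break
        yield (line1, line2)


data = StringIO("line1\nline2\nline3\nline4\n")
for read in read_2_lines(data):
    print(read)

If you have an odd number of lines it won't work perfectly, since the unpaired last line is dropped, but this should give you a good outline.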

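Solution 10:

I ran into a similar problem last month. I tried a while loop with f.readline() as well as f.readlines(). My data file is not large, so I finally chose f.readlines(), which gives me more control over the index; otherwise I would have to use f.seek() to move the file pointer back and forth.

My case is more complicated than the OP's. Because my data file is flexible about how many lines are parsed each time, I have to check a few conditions before I can parse the data.

Another problem I found with f.seek() is that it doesn't handle utf-8 very well when I use codecs.open('', 'r', 'utf-8'). I'm not sure what the culprit was, and eventually I gave up on this approach.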

Solution 11:

A simple little reader. It reads the lines in pairs and returns them as a tuple as you iterate over the object. You can close it manually, or it will close itself when it falls out of scope.

class doublereader:
    def __init__(self,filename):
        self.f = open(filename, 'r')
    def __iter__(self):
        return self
    def __next__(self):
        # StopIteration from either call ends the loop;
        # an unpaired last line is dropped
        return next(self.f), next(self.f)
    def close(self):
        if not self.f.closed:
            self.f.close()
    def __del__(self):
        self.close()

#example usage one
r = doublereader(r"C:\file.txt")
for a, h in r:
    print("x:%s\ny:%s" % (a,h))
r.close()

#example usage two
for x,y in doublereader(r"C:\file.txt"):
    print("x:%s\ny:%s" % (x,y))
#closes itself as soon as the loop goes out of scope

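Solution 12:

f = open(filename, "r")
for line in f:
    line1 = line
    next(f, None)  # skip the following line; the default avoids StopIteration at EOF

f.close()

This way you read every second line of the file. If you like, you can also check the status of f before calling next(f).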

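Solution 13:

If the file is of reasonable size, another approach is to use a list comprehension to read the entire file into a list of 2-tuples:

filename = '/path/to/file/name'

with open(filename, 'r') as f:
    # next(f, '') pulls the second line of each pair from the same iterator
    list_of_2tuples = [ (line, next(f, '')) for line in f ]

for (line1,line2) in list_of_2tuples: # Work with them in pairs.
    print('%s :: %s' % (line1,line2))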

Solution 14:

This Python code prints the first two lines:

import linecache
filename = "ooxx.txt"
print(linecache.getline(filename,1))
print(linecache.getline(filename,2))