How do I look ahead one element (peek) in a Python generator?
- 2025-03-19 08:57:00
- admin (original)
- 50
Question:
I can't figure out how to look ahead one element in a Python generator. As soon as I look, it's gone.
Here is what I mean:
gen = iter([1,2,3])
next_value = gen.next()  # okay, I looked forward and see that next_value = 1
# but now:
list(gen)  # is [2, 3] -- the first value is gone!
Here is a more realistic example:
gen = element_generator()
if gen.next_value() == 'STOP':
    quit_application()
else:
    process(gen.next())
Can anyone help me write a generator that lets me look one element ahead?
See also: Resetting generator object in Python
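Solution 1:
For completeness, the more-itertools package (which should probably be part of any Python programmer's toolbox) includes a peekable wrapper that implements this behavior. As the code example in its documentation shows:
>>> from more_itertools import peekable
>>> p = peekable(['a', 'b'])
>>> p.peek()
'a'
>>> next(p)
'a'
However, code that uses this functionality can often be rewritten so that it doesn't actually need it. For example, the realistic code sample from the question could be written like this:
gen = element_generator()
command = gen.next_value()
if command == 'STOP':
    quit_application()
else:
    process(command)
(Reader's note: I have kept the syntax from the question's example as it was at the time of writing, even though it refers to an outdated version of Python.)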
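As a hedged sketch of the same restructuring in current Python: the built-in next() accepts a default, so each element can be read once into a variable and tested before it is used. Here element_generator is a hypothetical stand-in for the question's generator, appending to a list stands in for process(), and treating exhaustion the same as 'STOP' is an assumption made for this example.

```python
def element_generator():
    # Hypothetical stand-in for the question's generator.
    yield from ["load", "save", "STOP", "never reached"]

def run(gen):
    """Read one command at a time; stop on 'STOP' or when gen is exhausted."""
    processed = []
    while True:
        # The default value avoids a StopIteration when gen runs out
        # (assumption: running out is treated like 'STOP').
        command = next(gen, "STOP")
        if command == "STOP":
            break
        processed.append(command)  # stands in for process(command)
    return processed

result = run(element_generator())
print(result)
```

Because each value is held in command before the decision is made, nothing has to be pushed back into the generator.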
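Solution 2:
The Python generator API is one-way: you can't push back elements you have already read. But you can use the itertools module to create a new iterator and prepend the element:
import itertools
gen = iter([1,2,3])
peek = next(gen)
print(list(itertools.chain([peek], gen)))  # [1, 2, 3]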
Solution 3:
OK, two years too late, but I came across this question and did not find any of the answers to my satisfaction. I came up with this meta generator:
class Peekorator(object):
    def __init__(self, generator):
        self.empty = False
        self.peek = None
        self.generator = generator
        try:
            # Read one element ahead so it is always available as self.peek
            self.peek = next(self.generator)
        except StopIteration:
            self.empty = True
    def __iter__(self):
        return self
    def __next__(self):
        """
        Return the self.peek element, or raise StopIteration
        if empty
        """
        if self.empty:
            raise StopIteration()
        to_return = self.peek
        try:
            self.peek = next(self.generator)
        except StopIteration:
            self.peek = None
            self.empty = True
        return to_return
def simple_iterator():
    for x in range(10):
        yield x*3
pkr = Peekorator(simple_iterator())
for i in pkr:
    print(i, pkr.peek, pkr.empty)
Results in:
0 3 False
3 6 False
6 9 False
9 12 False
...
24 27 False
27 None True
That is, at any time during the iteration you have access to the next item in the sequence.
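Solution 4:
Using itertools.tee will produce a lightweight copy of the generator; peeking ahead on one copy then does not affect the second copy. So:
import itertools
def process(seq):
    peeker, items = itertools.tee(seq)
    # initial peek ahead
    # so that peeker is one ahead of items
    # (the None default avoids StopIteration when seq runs out)
    if next(peeker, None) == 'STOP':
        return
    for item in items:
        # peek ahead
        if next(peeker, None) == "STOP":
            return
        # process items
        print(item)
The items iterator is not affected by whatever you do to peeker. However, advancing seq itself after calling tee on it can cause problems.
That said: any algorithm that needs to look an item ahead in a generator could instead be written to use the current generator item and the previous item. That results in simpler code; see my other answer to this question.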
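A small sketch of both points, using made-up sample data: advancing one tee copy leaves the other untouched, while advancing the original iterator after the call to tee makes the copies miss items.

```python
import itertools

# Advancing one copy does not affect the other.
source = iter(["a", "b", "STOP", "c"])
peeker, items = itertools.tee(source)
ahead = [next(peeker), next(peeker)]   # only the peeking copy moves forward
remaining = list(items)                # the other copy still starts at "a"

# Advancing the original behind tee's back loses items for the copies.
source2 = iter([1, 2, 3])
copy_a, copy_b = itertools.tee(source2)
skipped = next(source2)                # consumed directly from the original
seen_by_copy = list(copy_a)            # the copies never see the skipped item

print(ahead, remaining, skipped, seen_by_copy)
```

The copies share a buffer of items that one of them has read and the other has not, which is why peeking far ahead costs memory proportional to the distance between them.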
Solution 5:
An iterator that allows peeking at the next element and also further ahead. It reads ahead as needed and remembers the values in a deque.
from collections import deque
class PeekIterator:
    def __init__(self, iterable):
        self.iterator = iter(iterable)
        self.peeked = deque()  # values read ahead but not yet consumed
    def __iter__(self):
        return self
    def __next__(self):
        if self.peeked:
            return self.peeked.popleft()
        return next(self.iterator)
    def peek(self, ahead=0):
        # Read ahead until the requested position is buffered
        while len(self.peeked) <= ahead:
            self.peeked.append(next(self.iterator))
        return self.peeked[ahead]
Demo:
>>> it = PeekIterator(range(10))
>>> it.peek()
0
>>> it.peek(5)
5
>>> it.peek(13)
Traceback (most recent call last):
  File "<pyshell#68>", line 1, in <module>
    it.peek(13)
  File "[...]", line 15, in peek
    self.peeked.append(next(self.iterator))
StopIteration
>>> it.peek(2)
2
>>> next(it)
0
>>> it.peek(2)
3
>>> list(it)
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>>
Solution 6:
>>> gen = iter(range(10))
>>> peek = next(gen)
>>> peek
0
>>> gen = (value for g in ([peek], gen) for value in g)
>>> list(gen)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
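Solution 7:
Just for fun, I created an implementation of a lookahead class based on Aaron's suggestion:
import itertools
class lookahead_chain(object):
    def __init__(self, it):
        self._it = iter(it)
    def __iter__(self):
        return self
    def __next__(self):
        return next(self._it)
    def peek(self, default=None, _chain=itertools.chain):
        it = self._it
        try:
            v = next(self._it)
            # Put the peeked value back in front of the remaining items
            self._it = _chain((v,), it)
            return v
        except StopIteration:
            return default
lookahead = lookahead_chain
With this, the following works:
>>> t = lookahead(range(8))
>>> list(itertools.islice(t, 3))
[0, 1, 2]
>>> t.peek()
3
>>> list(itertools.islice(t, 3))
[3, 4, 5]
With this implementation it is a bad idea to call peek many times in a row, because every call wraps the iterator in another chain.
While looking at the CPython source code I found a better way that is both shorter and more efficient:
class lookahead_tee(object):
    def __init__(self, it):
        self._it, = itertools.tee(it, 1)
    def __iter__(self):
        return self._it
    def peek(self, default=None):
        try:
            # Advance a copy of the tee iterator, not the iterator itself
            return next(self._it.__copy__())
        except StopIteration:
            return default
lookahead = lookahead_tee
Usage is the same as above, but here you pay no price for calling peek many times in a row. With a few more lines you can also look ahead more than one item in the iterator (up to available RAM).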
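One possible way to look several items ahead, sketched under assumptions: the class name LookaheadMany and the method peek_many are made up for illustration, and the sketch splits off a second iterator with itertools.tee instead of calling __copy__ directly.

```python
import itertools

class LookaheadMany:
    """Hypothetical wrapper: look at several upcoming items without consuming them."""

    def __init__(self, iterable):
        self._it, = itertools.tee(iterable, 1)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def peek_many(self, n):
        # Split off a second tee iterator. Items it reads stay buffered
        # for self._it, so the main iteration loses nothing.
        self._it, ahead = itertools.tee(self._it)
        return list(itertools.islice(ahead, n))

t = LookaheadMany(range(8))
first = list(itertools.islice(t, 3))   # consumes 0, 1, 2
upcoming = t.peek_many(3)              # looks at the next three items
following = next(t)                    # iteration continues where it left off
print(first, upcoming, following)
```

If fewer than n items remain, peek_many in this sketch simply returns a shorter list.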
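Solution 8:
A simple solution is to use a function like this:
import itertools
def peek(it):
    first = next(it)
    # Return the first item plus an iterator that still starts with it
    return first, itertools.chain([first], it)
Then you can do: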
>>> it = iter(range(10))
>>> x, it = peek(it)
>>> x
0
>>> next(it)
0
>>> next(it)
1
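Solution 9:
This will work: it buffers one item and calls a function with each item and the next item in the sequence.
Your requirements are murky on what happens at the end of the sequence. What does "look ahead" mean when you are at the last item?
def process_with_lookahead(iterable, aFunction):
    it = iter(iterable)
    prev = next(it)  # raises StopIteration if the iterable is empty
    for item in it:
        aFunction(prev, item)
        prev = item
    aFunction(prev, None)  # the last item has no successor
def someLookaheadFunction(item, next_item):
    print(item, next_item)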
Solution 10:
In case anyone is interested, and please correct me if I am wrong, I believe it is pretty easy to add some push-back functionality to any iterator.
class Back_pushable_iterator:
    """Class whose constructor takes an iterator as its only parameter, and
    returns an iterator that behaves in the same way, with added push back
    functionality.
    The idea is to be able to push back elements that need to be retrieved once
    more with the iterator semantics. This is particularly useful to implement
    LL(k) parsers that need k tokens of lookahead. Lookahead or push back is
    really a matter of perspective. The pushing back strategy allows a clean
    parser implementation based on recursive parser functions.
    The invoker of this class takes care of storing the elements that should be
    pushed back. A consequence of this is that any elements can be "pushed
    back", even elements that have never been retrieved from the iterator.
    The elements that are pushed back are then retrieved through the iterator
    interface in a LIFO-manner (as should logically be expected).
    This class works for any iterator but is especially meaningful for a
    generator iterator, which offers no obvious push back ability.
    In the LL(k) case mentioned above, the tokenizer can be implemented by a
    standard generator function (clean and simple), that is completed by this
    class for the needs of the actual parser.
    """
    def __init__(self, iterator):
        self.iterator = iterator
        self.pushed_back = []
    def __iter__(self):
        return self
    def __next__(self):
        if self.pushed_back:
            return self.pushed_back.pop()
        else:
            return next(self.iterator)
    def push_back(self, element):
        self.pushed_back.append(element)
it = Back_pushable_iterator(x for x in range(10))
x = next(it) # 0
print(x)
it.push_back(x)
x = next(it) # 0
print(x)
x = next(it) # 1
print(x)
x = next(it) # 2
y = next(it) # 3
print(x)
print(y)
it.push_back(y)
it.push_back(x)
x = next(it) # 2
y = next(it) # 3
print(x)
print(y)
for x in it:
    print(x)  # 4-9
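Solution 11:
Instead of using items (i, i+1), where "i" is the current item and i+1 is the "peek ahead" version, you should be using (i-1, i), where "i-1" is the previous item from the generator.
Tweaking your algorithm this way will produce something identical to what you currently have, apart from the extra, needless complexity of trying to "peek ahead".
Peeking ahead is a mistake, and you should not be doing it.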
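A minimal sketch of the (i-1, i) idea, under assumed semantics borrowed from the question: an item is handled only once its successor is known not to be 'STOP'. The function name, the sentinel, and the list standing in for real processing are all hypothetical.

```python
_MISSING = object()  # marks "no previous item yet"

def process_unless_followed_by_stop(items):
    """Handle each item one step late, once its successor has been seen."""
    processed = []
    prev = _MISSING
    for current in items:
        if current == "STOP":
            return processed          # prev is followed by STOP, so it is skipped
        if prev is not _MISSING:
            processed.append(prev)    # prev's successor is known and is not STOP
        prev = current
    if prev is not _MISSING:
        processed.append(prev)        # the final item has no successor at all
    return processed

print(process_unless_followed_by_stop(["a", "b", "STOP", "c"]))
print(process_unless_followed_by_stop([1, 2, 3]))
```

The loop only ever reads forward with a plain for statement, so it works on any iterable without a wrapper.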
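Solution 12:
While itertools.chain() is the natural tool for the job here, beware of loops like this:
for elem in gen:
    ...
    peek = next(gen)
    gen = itertools.chain([peek], gen)
...because this will consume a linearly growing amount of memory and eventually grind to a halt. (This code essentially seems to create a linked list, one node per chain() call.) I know this not because I inspected the library but because it caused a major slowdown in my program; removing the gen = itertools.chain([peek], gen) line sped it up again. (Python 3.3)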
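One way to peek inside a loop without wrapping the iterator again on every pass is a wrapper that caches a single item. This is only a sketch; the class name OneSlotPeeker and its peek method are hypothetical, not part of itertools.

```python
_EMPTY = object()  # marks an empty cache slot

class OneSlotPeeker:
    """Hypothetical wrapper that caches at most one item for peeking."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cache = _EMPTY

    def __iter__(self):
        return self

    def __next__(self):
        if self._cache is not _EMPTY:
            # Hand out the cached item and clear the slot.
            value, self._cache = self._cache, _EMPTY
            return value
        return next(self._it)

    def peek(self, default=None):
        if self._cache is _EMPTY:
            self._cache = next(self._it, _EMPTY)
        return default if self._cache is _EMPTY else self._cache

gen = OneSlotPeeker(range(4))
pairs = [(elem, gen.peek()) for elem in gen]
print(pairs)
```

The wrapper is created once, so memory use stays constant however many times peek is called.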
Solution 13:
A Python 3 snippet for @jonathan-hartley's answer:
def peek(iterator, eoi=None):
    iterator = iter(iterator)
    try:
        prev = next(iterator)
    except StopIteration:
        return  # empty input: nothing to yield
    for elm in iterator:
        yield prev, elm
        prev = elm
    yield prev, eoi  # the last item is paired with the end-of-input marker
for curr, nxt in peek(range(10)):
    print((curr, nxt))
# (0, 1)
# (1, 2)
# (2, 3)
# (3, 4)
# (4, 5)
# (5, 6)
# (6, 7)
# (7, 8)
# (8, 9)
# (9, None)
It would be straightforward to create a class that does this in __iter__, yields just the prev item, and puts elm in some attribute.
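A possible shape for such a class, as a sketch only: the class name Lookahead and the attribute name next_item are made up for illustration.

```python
class Lookahead:
    """Hypothetical class: iterating yields each item, while the item
    after it is exposed through the next_item attribute."""

    def __init__(self, iterable, eoi=None):
        self._iterable = iterable
        self._eoi = eoi
        self.next_item = eoi

    def __iter__(self):
        iterator = iter(self._iterable)
        try:
            prev = next(iterator)
        except StopIteration:
            return  # empty input: nothing to yield
        for elm in iterator:
            self.next_item = elm       # publish the lookahead first
            yield prev
            prev = elm
        self.next_item = self._eoi     # the last item has no successor
        yield prev

la = Lookahead(range(4))
seen = [(cur, la.next_item) for cur in la]
print(seen)
```

Since __iter__ is a generator function here, each for loop over the object starts a fresh pass over the wrapped iterable, and next_item is only meaningful inside the loop body.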
Solution 14:
Regarding @David Z's post, the newer seekable tool can reset a wrapped iterator to a prior position.
>>> import more_itertools as mit
>>> s = mit.seekable(range(3))
>>> next(s)
# 0
>>> s.seek(0)  # reset iterator
>>> next(s)
# 0
>>> next(s)
# 1
>>> s.seek(1)
>>> next(s)
# 1
>>> next(s)
# 2
Solution 15:
cytoolz has a peek function.
>>> from cytoolz import peek
>>> gen = iter([1,2,3])
>>> first, continuation = peek(gen)
>>> first
1
>>> list(continuation)
[1, 2, 3]
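Solution 16:
In my case, I needed a generator into which I could queue back the data I had just obtained via a next() call.
The way I handle this is to create a queue. In the generator's implementation I first check the queue: if the queue is not empty, the yield returns a value from the queue; otherwise it returns values the normal way.
import queue
def gen1(n, q):
    i = 0
    while True:
        if not q.empty():
            yield q.get()
        else:
            yield i
            i = i + 1
            if i >= n:
                # Drain a value that was queued back after the last yield
                if not q.empty():
                    yield q.get()
                break
q = queue.Queue()
f = gen1(2, q)
i = next(f)
print(i)
i = next(f)
print(i)
q.put(i)  # put back the value I have just got for following 'next' call
i = next(f)
print(i)
Running it:
python3 gen_test.py
0
1
1
This concept is very useful when I am writing a parser that needs to look at a file line by line: if a line appears to belong to the next phase of parsing, I can queue it back into the generator so that the next phase's code can parse it correctly without handling complex state.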
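The parser use case might look roughly like the sketch below. Everything in it is an assumption made for illustration: the function names, the use of collections.deque in place of queue.Queue, and the toy format in which header lines start with '#'.

```python
from collections import deque

def pushback_lines(lines, pending):
    """Yield lines, serving anything placed in `pending` first."""
    it = iter(lines)
    while True:
        if pending:
            yield pending.popleft()
            continue
        try:
            yield next(it)
        except StopIteration:
            return

def parse(lines):
    pending = deque()
    stream = pushback_lines(lines, pending)
    header, body = [], []
    # Phase 1: header lines (assumed to start with '#').
    for line in stream:
        if not line.startswith("#"):
            pending.append(line)  # belongs to the next phase: queue it back
            break
        header.append(line)
    # Phase 2: everything else, starting with the queued-back line.
    for line in stream:
        body.append(line)
    return header, body

header, body = parse(["# title", "# author", "first", "second"])
print(header, body)
```

Each phase is a plain loop over the same stream, and the only shared state is the small pending queue.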
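Solution 17:
An algorithm that works by "peeking" at the next element in a generator could equivalently work by remembering the previous element, treating that element as the one to operate on, and treating the "current" element as simply the "peeked" one.
Either way, what is really happening is that the algorithm considers overlapping pairs from the generator. The itertools.tee recipe will work great, and it is not hard to see that it is essentially a refactored version of Jonathan Hartley's approach:
from itertools import tee
# From https://docs.python.org/3/library/itertools.html#itertools.pairwise
# In 3.10 and up, this is directly supplied by the `itertools` module.
def pairwise(iterable):
    # pairwise('ABCDEFG') --> AB BC CD DE EF FG
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)
def process(seq):
    for to_process, lookahead in pairwise(seq):
        # peek ahead
        if lookahead == "STOP":
            return
        # process items
        print(to_process)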
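One thing to keep in mind with overlapping pairs is that the final element never appears in the first position of a pair, so a loop like the one above never processes it. A hedged sketch of one workaround is to pad the input with a private end marker; the names pairwise_padded, process_all and _END are hypothetical, and a list stands in for real processing.

```python
from itertools import chain, tee

_END = object()  # private end-of-input marker, never equal to real data

def pairwise_padded(iterable):
    """Like the pairwise recipe, but the last item is paired with _END."""
    a, b = tee(chain(iterable, [_END]))
    next(b, None)
    return zip(a, b)

def process_all(seq):
    processed = []
    for to_process, lookahead in pairwise_padded(seq):
        if lookahead == "STOP":
            break
        processed.append(to_process)  # stands in for real processing
    return processed

print(process_all(["a", "b", "c"]))
print(process_all(["a", "b", "STOP", "c"]))
```

With the padding in place, the last real element is processed whenever no 'STOP' follows it.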
Solution 18:
For those who embrace frugality and one-liners, I present a one-liner that lets you look ahead in an iterable (this only works in Python 3.8 and above):
>>> import itertools as it
>>> peek = lambda iterable, n=1: it.islice(zip(it.chain((t := it.tee(iterable))[0], [None] * n), it.chain([None] * n, t[1])), n, None)
>>> for lookahead, element in peek(range(10)):
...     print(lookahead, element)
1 0
2 1
3 2
4 3
5 4
6 5
7 6
8 7
9 8
None 9
>>> for lookahead, element in peek(range(10), 2):
...     print(lookahead, element)
2 0
3 1
4 2
5 3
6 4
7 5
8 6
9 7
None 8
None 9
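This approach is space-efficient because it avoids copying the iterator multiple times. It is also fast because it generates elements lazily. Finally, as a cherry on top, you can look ahead an arbitrary number of elements.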
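For readers who find the one-liner hard to follow, here is a sketch of a multi-line function that is intended to produce the same (lookahead, element) pairs. It is not a line-by-line expansion: the name peek_pairs and the fill parameter are made up, and it pads with itertools.repeat instead of fixed-length lists.

```python
import itertools

def peek_pairs(iterable, n=1, fill=None):
    """Yield (lookahead, element) pairs, where lookahead is the item
    n positions after element, or `fill` past the end of the input."""
    ahead, current = itertools.tee(iterable)
    ahead = itertools.islice(ahead, n, None)               # skip the first n items
    padded = itertools.chain(ahead, itertools.repeat(fill))
    # zip stops as soon as `current` is exhausted, so the endless
    # padding never causes an infinite loop.
    return zip(padded, current)

one_ahead = list(peek_pairs(range(5)))
two_ahead = list(peek_pairs(range(5), 2))
print(one_ahead)
print(two_ahead)
```

As with any use of tee, the items between the two positions are buffered, so memory use grows with n.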
Solution 19:
Many use cases for iter.peek() can also be implemented with an iterator of (current value, next value) pairs.
This has an advantage over the currently accepted answer in that it does not involve repeated calls to itertools.chain, each of which wraps the iterator in a new iterator.
It is just a few lines of code written by hand:
>>> def cur_next(iterator):
...     cur = None
...     for num, next_val in enumerate(iterator):
...         if num != 0:
...             yield cur, next_val
...         cur = next_val
...     yield cur, None
...
>>> list(cur_next(range(10)))
[(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9), (9, None)]
It can also be done with itertools, although that does not save much code in this case.
>>> import itertools
>>> def cur_next_2(iterator):
...     cur, nxt = itertools.tee(iterator)
...     cur = itertools.chain([None], cur)
...     result = itertools.zip_longest(cur, nxt)
...     # Advance so the first element is (0, 1)
...     for discarded in result:
...         break
...     return result
...
>>> list(cur_next_2(range(10)))
[(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9), (9, None)]
Solution 20:
This peekable function should do the job: it wraps the original iterator and yields pairs of consecutive items.
from collections.abc import Iterable, Generator
from typing import Any
def peekable(iterable: Iterable[Any],
             sentinel: Any = None) -> Generator[tuple[Any, Any], None, None]:
    it = iter(iterable)
    try:
        last = next(it)
    except StopIteration:
        return  # empty input: yield nothing instead of raising RuntimeError
    for value in it:
        yield last, value
        last = value
    yield last, sentinel
n = [1, 2, 3]
for this, peek in peekable(n):
    print(this, peek)
# 1 2
# 2 3
# 3 None