How do I read a file one character at a time in Python?
- 2025-04-17 09:02:00
- admin (original post)
Problem description:
In Python, given the name of a file, how do I write a loop that reads one character on each iteration?
Solution 1:
with open(filename) as f:
    while True:
        c = f.read(1)
        if not c:
            print("End of file")
            break
        print("Read a character:", c)
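Solution 2:
First, open the file:
with open("filename") as fileobj:
    for line in fileobj:
        for ch in line:
            print(ch)
This iterates over every line of the file and then over every character in that line, including the trailing newline.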
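The nested loop can also be wrapped in a generator so that callers receive a flat stream of characters. This is only a sketch: the function name `chars_by_line` and the temporary sample file are illustrative assumptions, not part of the original answer.

```python
import os
import tempfile


def chars_by_line(path, encoding="utf-8"):
    """Yield every character of the file at `path`, reading it line by line."""
    with open(path, encoding=encoding) as f:
        for line in f:
            # Each line keeps its trailing newline, so "\n" is yielded too.
            yield from line


# Small demonstration on a throwaway file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("ab\ncd\n")

result = list(chars_by_line(tmp.name))
os.remove(tmp.name)
print(result)
```

Because the file is consumed lazily, only one line is held in memory at a time.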
Solution 3:
I like the accepted answer: it is simple and it gets the job done. I would also like to offer an alternative:
def chunks(filename, buffer_size=4096):
    """Reads `filename` in chunks of `buffer_size` bytes and yields each chunk
    until no more bytes can be read; the last chunk will most likely have
    fewer than `buffer_size` bytes.

    :param str filename: Path to the file
    :param int buffer_size: Buffer size, in bytes (default is 4096)
    :return: Yields chunks of `buffer_size` size until exhausting the file
    :rtype: bytes
    """
    with open(filename, "rb") as fp:
        chunk = fp.read(buffer_size)
        while chunk:
            yield chunk
            chunk = fp.read(buffer_size)


def chars(filename, buffersize=4096):
    """Yields the contents of file `filename` character-by-character. Warning:
    will only work for encodings where one character is encoded as one byte.

    :param str filename: Path to the file
    :param int buffersize: Buffer size for the underlying chunks,
        in bytes (default is 4096)
    :return: Yields the contents of `filename` character-by-character.
    :rtype: bytes
    """
    for chunk in chunks(filename, buffersize):
        # Slice instead of iterating directly: on Python 3, iterating over
        # bytes yields ints, while a one-byte slice is still a bytes object.
        for i in range(len(chunk)):
            yield chunk[i:i + 1]


def main(buffersize, filenames):
    """Reads several files character by character and redirects their contents
    to `/dev/null`.
    """
    for filename in filenames:
        with open("/dev/null", "wb") as fp:
            for char in chars(filename, buffersize):
                fp.write(char)


if __name__ == "__main__":
    # Try reading several files varying the buffer size
    import sys
    buffersize = int(sys.argv[1])
    filenames = sys.argv[2:]
    sys.exit(main(buffersize, filenames))
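The code I suggest is essentially the same idea as the accepted answer: read a given number of bytes from the file. The difference is that it first reads a large chunk of data (4096 is a good default for x86, but you may want to try 1024 or 8192; any multiple of your page size), and then yields the characters in that chunk one by one.
For larger files, the code I provide may be faster. Take the full text of Tolstoy's War and Peace as an example. These are my timing results (MacBook Pro running OS X 10.7.4; so.py is the name I gave to the code I pasted):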
$ time python so.py 1 2600.txt.utf-8
python so.py 1 2600.txt.utf-8 3.79s user 0.01s system 99% cpu 3.808 total
$ time python so.py 4096 2600.txt.utf-8
python so.py 4096 2600.txt.utf-8 1.31s user 0.01s system 99% cpu 1.318 total
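Now: do not take the buffer size of 4096 as a universal truth; look at the results I get for different sizes (buffer size in bytes vs. wall-clock time in seconds):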
2 2.726
4 1.948
8 1.693
16 1.534
32 1.525
64 1.398
128 1.432
256 1.377
512 1.347
1024 1.442
2048 1.316
4096 1.318
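As you can see, the gains start showing up early (and my timings are probably quite inaccurate); the buffer size is a trade-off between performance and memory. The default of 4096 is a reasonable choice but, as always, measure first.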
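One way to "measure first" is a small timing loop such as the sketch below. It is not the script used for the numbers above: the helper name `count_bytes`, the 200 KB sample file and the three buffer sizes are assumptions chosen only to keep the example short and self-contained.

```python
import os
import tempfile
import time


def count_bytes(path, buffer_size):
    """Read `path` in chunks of `buffer_size` bytes, visiting one byte at a time."""
    total = 0
    with open(path, "rb") as fp:
        while True:
            chunk = fp.read(buffer_size)
            if not chunk:
                break
            for i in range(len(chunk)):
                _ = chunk[i:i + 1]  # one single-byte "character"
                total += 1
    return total


# Build a throwaway 200 KB sample file (hypothetical data, not the book).
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"x" * 200_000)

timings = {}
for size in (1, 64, 4096):
    start = time.perf_counter()
    counted = count_bytes(tmp.name, size)
    timings[size] = time.perf_counter() - start
    print(f"{size:>5} bytes: {timings[size]:.4f} s ({counted} bytes read)")

os.remove(tmp.name)
```

Absolute numbers depend on the machine, the Python version and the file system cache, so it is worth repeating each run several times before drawing conclusions.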
Solution 4:
Python itself can help you with this in interactive mode (the output below is from Python 2, where `file` is a built-in type):
>>> help(file.read)
Help on method_descriptor:

read(...)
    read([size]) -> read at most size bytes, returned as a string.

    If the size argument is negative or omitted, read until EOF is reached.
    Notice that when in non-blocking mode, less data than what was requested
    may be returned, even if no size parameter was given.
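The behaviour described in that help text can be tried without touching the disk. The sketch below uses `io.StringIO` as a stand-in for an open text file (an assumption made for brevity); a real file object opened in text mode behaves the same way for these calls.

```python
import io

# io.StringIO offers the same read() interface as a text file.
f = io.StringIO("hi")

first = f.read(1)    # one character
second = f.read(1)   # the next character
at_eof = f.read(1)   # nothing left: read() returns an empty string

# With the size omitted, read() consumes everything up to EOF.
rest = io.StringIO("hello").read()

print(repr(first), repr(second), repr(at_eof), repr(rest))
```

The empty string returned at end of file is what the loops in the other answers test for.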
Solution 5:
Simply:
myfile = open(filename)
onecharacter = myfile.read(1)
Solution 6:
Best answer for Python 3.8+:
with open(path, encoding="utf-8") as f:
    while c := f.read(1):
        do_my_thing(c)
You will probably want to specify utf-8 rather than rely on the platform encoding. I chose to do so here.
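Why the encoding matters can be seen with a character that takes more than one byte in UTF-8. This is a minimal sketch; the temporary file and its contents are assumptions made for the demonstration.

```python
import os
import tempfile

# Write a short UTF-8 file: "\u00e9" (e with acute accent) occupies two bytes.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", delete=False) as tmp:
    tmp.write("\u00e9!")

with open(tmp.name, encoding="utf-8") as f:
    text_char = f.read(1)   # text mode: one decoded character

with open(tmp.name, "rb") as f:
    raw_byte = f.read(1)    # binary mode: one raw byte, half of that character

os.remove(tmp.name)
print(repr(text_char), repr(raw_byte))
```

In text mode `read(1)` counts characters, while in binary mode it counts bytes, so decoding with the wrong encoding (or none) can split a character.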
Function, Python 3.8+:
def stream_file_chars(path: str):
    with open(path) as f:
        while c := f.read(1):
            yield c
Function, Python <= 3.7:
def stream_file_chars(path: str):
    with open(path, encoding="utf-8") as f:
        while True:
            c = f.read(1)
            if c == "":
                break
            yield c
Function with pathlib and documentation:
from pathlib import Path
from typing import Union, Generator
def stream_file_chars(path: Union[str, Path]) -> Generator[str, None, None]:
    """Streams characters from a file."""
    with Path(path).open(encoding="utf-8") as f:
        while (c := f.read(1)) != "":
            yield c
Solution 7:
Today, while watching Raymond Hettinger's talk "Transforming Code into Beautiful, Idiomatic Python", I learned a new idiom:
import functools
with open(filename) as f:
    f_read_ch = functools.partial(f.read, 1)
    for ch in iter(f_read_ch, ''):
        print('Read a character:', repr(ch))
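For readers unfamiliar with the two-argument form of `iter`: `iter(callable, sentinel)` keeps calling `callable` and stops as soon as it returns `sentinel`. A minimal in-memory sketch follows, using `io.StringIO` as a stand-in for a real file (an assumption made only to keep the example self-contained):

```python
import functools
import io

f = io.StringIO("abc")  # stand-in for an open text file

# read_one() is equivalent to f.read(1); iteration ends when it returns "".
read_one = functools.partial(f.read, 1)
collected = list(iter(read_one, ""))

print(collected)
```

Since `read(1)` returns an empty string only at end of file, the empty string is a safe sentinel here.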
Solution 8:
Read just one character:
f.read(1)
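Solution 9:
As a supplement: if the file you are reading contains a very long line, which could exhaust your memory, consider reading it into a buffer and then yielding each character:
def read_char(inputfile, buffersize=10240):
    with open(inputfile, 'r') as f:
        while True:
            buf = f.read(buffersize)
            if not buf:
                break
            for char in buf:
                yield char
        yield ''  # final empty string, so that an empty file still yields once
if __name__ == "__main__":
    for char in read_char('./very_large_file.txt'):
        process(char)  # `process` stands for your own per-character handler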
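Solution 10:
This also works:
with open("filename") as fileObj:
    for line in fileObj:
        for ch in line:
            print(ch)
It goes through every line in the file and every character in every line.
(Note that this post now looks extremely similar to a highly upvoted answer, but that was not the case at the time of writing.)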
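Solution 11:
import os
import sys
# Unix terminals only: switch stdin to non-canonical mode without echo,
# so each key press is delivered immediately.
os.system("stty -icanon -echo")
while True:
    raw_c = sys.stdin.buffer.peek()  # look at pending bytes without consuming them
    c = sys.stdin.read(1)            # read exactly one character
    if not c:                        # empty string means end of input
        break
    print(f"Char: {c}")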
Solution 12:
f = open('hi.txt', 'w')
f.write('0123456789abcdef')
f.close()
f = open('hi.txt', 'r')
f.seek(12)
print(f.read(1))  # This will read just "c"
f.close()
Solution 13:
You should try f.read(1), which is definitely the correct and right thing to do.
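Solution 14:
Combining features of some of the other answers, here is something that is immune to long files and long lines, while being more concise and faster:
import functools as ft, itertools as it
with open(path) as f:
    for c in it.chain.from_iterable(
        iter(ft.partial(f.read, 4096), '')
    ):
        print(c)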
Solution 15:
# Read the whole file into memory at once, then print the characters one by one
with open('file.txt') as f:
    for i in f.read():
        print(i)