Which is faster, re.match/search or str.find?
- 2025-03-20 08:47:00
- admin (original post)
Problem description:
For a one-off string search, is it faster to use str.find/rfind than re.match/search?
That is, for a given string s, should I use:
if s.find('lookforme') > -1:
    pass  # do something
or
if re.match('lookforme', s):
    pass  # do something else
?
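Before comparing speed, note that the two snippets in the question do not ask the same thing: re.match only tries to match at the beginning of the string, while str.find and the in operator look anywhere in it. The re function that corresponds to find is re.search. A minimal check, using an example string made up for illustration:

```python
import re

s = "please lookforme here"  # example string, made up for illustration

print(s.find("lookforme") > -1)               # True: found at index 7
print("lookforme" in s)                       # True: same test, more readable
print(re.search("lookforme", s) is not None)  # True: search scans the whole string
print(re.match("lookforme", s) is not None)   # False: match is anchored at index 0
```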
Solution 1:
The question of which one is faster is best answered with timeit.
from timeit import timeit
import re

def find(string, text):
    if string.find(text) > -1:
        pass

def re_find(string, text):
    if re.match(text, string):
        pass

def best_find(string, text):
    if text in string:
        pass

# print() calls so this also runs on Python 3 (the original used Python 2 print statements);
# timeit runs each statement 1,000,000 times by default
print(timeit("find(string, text)", "from __main__ import find; string='lookforme'; text='look'"))
print(timeit("re_find(string, text)", "from __main__ import re_find; string='lookforme'; text='look'"))
print(timeit("best_find(string, text)", "from __main__ import best_find; string='lookforme'; text='look'"))
Output:
0.441393852234
2.12302494049
0.251421928406
So you should use the in operator, not only because it is easier to read but also because it is faster.
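The benchmark above is written for Python 2 and uses re.match. A Python 3 sketch of the same idea, using re.search (the regex call that scans the string the way find and in do) and passing globals= to timeit instead of the "from __main__ import" setup string. The helper name bench and the default repeat count are illustrative choices, and absolute numbers will differ by machine and Python version:

```python
import re
from timeit import timeit

def bench(haystack, needle, number=100_000):
    """Time several 'is needle in haystack?' checks; returns total seconds per variant."""
    env = {
        "re": re,
        "haystack": haystack,
        "needle": needle,
        "pattern": re.escape(needle),               # treat needle as a literal
        "compiled": re.compile(re.escape(needle)),  # compiled once, outside the timing
    }
    stmts = {
        "in": "needle in haystack",
        "find": "haystack.find(needle) > -1",
        "re.search": "re.search(pattern, haystack)",
        "compiled.search": "compiled.search(haystack)",
    }
    return {name: timeit(stmt, globals=env, number=number) for name, stmt in stmts.items()}

for name, seconds in bench("lookforme", "look").items():
    print(f"{name:16} {seconds:.4f} s")
```

Try it with your own strings: as Solution 4 below explores, the ranking can change with string length and with where the match sits.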
Solution 2:
To round out the most popular answer with respect to regex compilation time, here is a version with a precompiled pattern:
from timeit import timeit
import re

def find(string, text):
    if string.find(text) > -1:
        pass

def re_find(string, text_re):
    # text_re is an already compiled pattern object
    if text_re.match(string):
        pass

def best_find(string, text):
    if text in string:
        pass

print(timeit("find(string, text)", "from __main__ import find; string='lookforme'; text='look'"))
print(timeit("re_find(string, text_re)", "from __main__ import re_find; string='lookforme'; import re; text_re=re.compile('look')"))
print(timeit("best_find(string, text)", "from __main__ import best_find; string='lookforme'; text='look'"))
My numbers:
0.189274072647
0.239935874939
0.0820939540863
Precompiling the pattern improves the regex numbers, but in is still faster.
Solution 3:
Use this:
if 'lookforme' in s:
    pass  # do something
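A regular expression has to be compiled first (or fetched from the re module's internal cache of compiled patterns), which adds some overhead. Python's plain substring search, on the other hand, is very efficient.
Regular expressions become more useful when you search for the same term many times, or when you need to do something more complex than a literal substring test.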
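A sketch of both cases, using made-up sample lines: a pattern compiled once and reused for many strings, and a search that in cannot express directly (a case-insensitive whole-word match). The names lines, word, plain_hits and regex_hits are illustrative only:

```python
import re

lines = [
    "Look for me here",
    "nothing to see",
    "overlook the details",  # contains 'look' only inside a longer word
    "LOOK!",
]

# Compile once and reuse for every line, so no per-call pattern cache lookup is needed.
word = re.compile(r"\blook\b", re.IGNORECASE)

plain_hits = [line for line in lines if "look" in line]  # case-sensitive substring test
regex_hits = [line for line in lines if word.search(line)]  # whole word, any case

print(plain_hits)  # ['overlook the details']
print(regex_hits)  # ['Look for me here', 'LOOK!']
```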
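Solution 4:
Maybe someone is still interested. The answers above look fine, but they only consider a very short string. If you take a long string and the pattern you are looking for is near the end, performance actually shifts in favor of the regex!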
import re

def find(string, text):
    if string.find(text) > -1:
        pass

def re_find(string, text):
    if re.match(text, string):  # note: re.match only tries position 0
        pass

def best_find(string, text):
    if text in string:
        pass
# The original answer pasted a very long literal made of the chunk below repeated many
# times (broken across lines here). Rebuilding it by repetition avoids a multi-line string
# literal; the repeat count of 100 is an approximation of the original length.
chunk = 'sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd'
very_long_string = ' '.join([chunk] * 100)
# %timeit is an IPython magic: run the rest in IPython or a Jupyter cell, not plain python.
pattern = 'look'
print('pattern at the end of string')
print('find:', end=' ')
%timeit find(very_long_string + pattern, pattern)
print('regex:', end=' ')
%timeit re_find(very_long_string + pattern, pattern)
print('in:', end=' ')
%timeit best_find(very_long_string + pattern, pattern)
print('pattern in front of string')
print('find:', end=' ')
%timeit find(pattern + very_long_string, pattern)
print('regex:', end=' ')
%timeit re_find(pattern + very_long_string, pattern)
print('in:', end=' ')
%timeit best_find(pattern + very_long_string, pattern)
Output:
pattern at the end of string
find: 3.41 µs ± 74.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
regex: 1.93 µs ± 23.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
in: 3.32 µs ± 74.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
pattern in front of string
find: 748 ns ± 15.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
regex: 2.03 µs ± 21.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
in: 589 ns ± 6.75 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
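Summary: the times for find and in depend on the string length and on where the pattern sits in the string, whereas the regex is more or less independent of the string length and is faster for very long strings with the pattern at the end.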
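One caveat when reading these numbers: re_find uses re.match, which only tries to match at position 0. In the "pattern at the end" case it fails on the first character without scanning the string, which would explain why its time barely depends on the string length; it also means it returns a different answer from find and in. A quick check, using a shorter made-up stand-in for very_long_string to keep the example small:

```python
import re

filler = "sasgda;dlaskjgasdlj sadlf;jsaf " * 1000  # stand-in for very_long_string
pattern = "look"
text = filler + pattern  # pattern at the end

print(text.find(pattern) > -1)               # True: find scans to the end
print(pattern in text)                       # True
print(re.match(pattern, text) is not None)   # False: match gives up at index 0
print(re.search(pattern, text) is not None)  # True: search is the like-for-like check
```

Swapping re.match for re.search inside re_find gives a like-for-like timing comparison with find and in.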
Solution 5:
I had the same question and used Jupyter's %timeit to check:
import re
sent = "a sentence for measuring a find function"
sent_list = sent.split()
print("x in sentence")
%timeit "function" in sent
print("x in token list")
%timeit "function" in sent_list
print("regex search")
%timeit bool(re.match(".*function.*", sent))
print("compiled regex search")
regex = re.compile(".*function.*")
%timeit bool(regex.match(sent))
x in sentence
61.3 ns ± 3 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
x in token list
93.3 ns ± 1.26 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
regex search
772 ns ± 8.42 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
compiled regex search
420 ns ± 7.68 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Compiling makes the regex faster, but the simplest option (a plain in check) is faster still.
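The regex in this test, re.match(".*function.*", sent), wraps the word in .* so that the anchored match can find it anywhere. re.search("function", sent) asks the same question without the extra .* matching, and the two can disagree once the text contains a newline, because . does not match "\n" by default. A small check; the two_lines string is made up for illustration:

```python
import re

sent = "a sentence for measuring a find function"
two_lines = "first line\nfunction on the second line"  # made-up multi-line example

# Same answer on single-line text:
print(bool(re.match(".*function.*", sent)), bool(re.search("function", sent)))  # True True

# Different answers once a newline is involved, since '.' does not match '\n':
print(bool(re.match(".*function.*", two_lines)))             # False
print(bool(re.search("function", two_lines)))                # True
print(bool(re.match(".*function.*", two_lines, re.DOTALL)))  # True
```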
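Solution 6:
If you search for the same thing repeatedly, re.compile can speed up the regex considerably. But using in to weed out the non-matching cases before running the match sped things up a lot. I know this is only anecdotal. ~Ben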
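A minimal sketch of that prefilter idea, assuming a pattern that can only match if some fixed substring is present. Here that literal is "ERROR" in a hypothetical log format; the pattern, the sample lines and the helper name matching_lines are all made up for illustration. Because every match must contain the literal, skipping the regex when the literal is absent never changes the result:

```python
import re

# Hypothetical pattern: any match must contain the literal "ERROR".
ERROR_CODE = re.compile(r"ERROR (\d+)")

def matching_lines(lines, literal="ERROR", pattern=ERROR_CODE):
    """Return the lines the pattern matches, running the regex only on lines containing literal."""
    return [line for line in lines if literal in line and pattern.search(line)]

log = [
    "INFO started",
    "ERROR 42 disk full",
    "WARNING low memory",
    "ERROR without code",
]
print(matching_lines(log))  # ['ERROR 42 disk full']
```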
Solution 7:
In addition to the answers above: re.search() and re.match() take about the same running time.
if re.match(re.escape(some_keyword), some_sentence):
    pass
runs in the same time as
if re.search(re.escape(some_keyword), some_sentence):
    pass
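If your regex can only match when certain words are present, it is better to cut down the number of regex comparisons with an "if ... in" check first. For example, the code below is faster than both snippets above (at least when most sentences do not contain the keyword) and gives the same result: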
if some_keyword.lower() in some_sentence.lower():
    if re.search(re.escape(some_keyword), some_sentence):
        pass  # do something
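The same snippet written as a function; the name contains_keyword is illustrative. For ASCII text the lower-cased in check is a safe prefilter: if the keyword occurs with exact case, it also occurs case-insensitively, so returning early when the prefilter fails cannot hide a real match. It only saves time when most sentences fail the prefilter; sentences that pass still pay for both checks.

```python
import re

def contains_keyword(keyword, sentence):
    """Case-sensitive literal search for keyword, with a cheap case-insensitive prefilter."""
    # Prefilter: for ASCII text, an exact-case occurrence implies a lower-case one,
    # so returning early here cannot hide a real match.
    if keyword.lower() not in sentence.lower():
        return False
    return re.search(re.escape(keyword), sentence) is not None

print(contains_keyword("Python", "I like Python."))  # True
print(contains_keyword("Python", "I like python."))  # False: passes the prefilter, fails the regex
print(contains_keyword("Python", "I like Ruby."))    # False: rejected by the prefilter alone
```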