How do I validate a URL with a regular expression in Python?

2025-02-11 09:51:00
admin

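Problem description:

I'm building an app on Google App Engine. I'm very new to Python and have been struggling with the following problem for the past 3 days.

I have a class to represent an RSS Feed, and in this class I have a method called setUrl. The input to this method is a URL.

I'm trying to use the re python module to validate against the RFC 3986 Reg-ex (http://www.ietf.org/rfc/rfc3986.txt)

Below is a snippet which should work?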

p = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?')
m = p.match(url)
if m:
  self.url = url
  return url

Solution 1:

Here's the complete regexp to parse a URL.

(?:https?://(?:(?:(?:(?:(?:[a-zA-Zd](?:(?:[a-zA-Zd]|-)*[a-zA-Zd])?)
.)*(?:[a-zA-Z](?:(?:[a-zA-Zd]|-)*[a-zA-Zd])?))|(?:(?:d+)(?:.(?:d
+)){3}))(?::(?:d+))?)(?:/(?:(?:(?:(?:[a-zA-Zd$-_.+!*'(),]|(?:%[a-fA
-Fd]{2}))|[;:@&=])*)(?:/(?:(?:(?:[a-zA-Zd$-_.+!*'(),]|(?:%[a-fA-Fd
]{2}))|[;:@&=])*))*)(?:?(?:(?:(?:[a-zA-Zd$-_.+!*'(),]|(?:%[a-fA-Fd
]{2}))|[;:@&=])*))?)?)|(?:s?ftp://(?:(?:(?:(?:(?:[a-zA-Zd$-_.+!*'(),
]|(?:%[a-fA-Fd]{2}))|[;?&=])*)(?::(?:(?:(?:[a-zA-Zd$-_.+!*'(),]|(?:
%[a-fA-Fd]{2}))|[;?&=])*))?@)?(?:(?:(?:(?:(?:[a-zA-Zd](?:(?:[a-zA-Z\nd]|-)*[a-zA-Zd])?).)*(?:[a-zA-Z](?:(?:[a-zA-Zd]|-)*[a-zA-Zd])?))|(
?:(?:d+)(?:.(?:d+)){3}))(?::(?:d+))?))(?:/(?:(?:(?:(?:[a-zA-Zd$-
_.+!*'(),]|(?:%[a-fA-Fd]{2}))|[?:@&=])*)(?:/(?:(?:(?:[a-zA-Zd$-_.+!
*'(),]|(?:%[a-fA-Fd]{2}))|[?:@&=])*))*)(?:;type=[AIDaid])?)?)|(?:news
:(?:(?:(?:(?:[a-zA-Zd$-_.+!*'(),]|(?:%[a-fA-Fd]{2}))|[;/?:&=])+@(?:
(?:(?:(?:[a-zA-Zd](?:(?:[a-zA-Zd]|-)*[a-zA-Zd])?).)*(?:[a-zA-Z](?:
(?:[a-zA-Zd]|-)*[a-zA-Zd])?))|(?:(?:d+)(?:.(?:d+)){3})))|(?:[a-zA
-Z](?:[a-zA-Zd]|[_.+-])*)|*))|(?:nntp://(?:(?:(?:(?:(?:[a-zA-Zd](?:
(?:[a-zA-Zd]|-)*[a-zA-Zd])?).)*(?:[a-zA-Z](?:(?:[a-zA-Zd]|-)*[a-zA
-Zd])?))|(?:(?:d+)(?:.(?:d+)){3}))(?::(?:d+))?)/(?:[a-zA-Z](?:[a-
zA-Zd]|[_.+-])*)(?:/(?:d+))?)|(?:telnet://(?:(?:(?:(?:(?:[a-zA-Zd$\n-_.+!*'(),]|(?:%[a-fA-Fd]{2}))|[;?&=])*)(?::(?:(?:(?:[a-zA-Zd$-_.+!
*'(),]|(?:%[a-fA-Fd]{2}))|[;?&=])*))?@)?(?:(?:(?:(?:(?:[a-zA-Zd](?:(
?:[a-zA-Zd]|-)*[a-zA-Zd])?).)*(?:[a-zA-Z](?:(?:[a-zA-Zd]|-)*[a-zA-
Zd])?))|(?:(?:d+)(?:.(?:d+)){3}))(?::(?:d+))?))/?)|(?:gopher://(?
:(?:(?:(?:(?:[a-zA-Zd](?:(?:[a-zA-Zd]|-)*[a-zA-Zd])?).)*(?:[a-zA-Z
](?:(?:[a-zA-Zd]|-)*[a-zA-Zd])?))|(?:(?:d+)(?:.(?:d+)){3}))(?::(?
:d+))?)(?:/(?:[a-zA-Zd$-_.+!*'(),;/?:@&=]|(?:%[a-fA-Fd]{2}))(?:(?:
(?:[a-zA-Zd$-_.+!*'(),;/?:@&=]|(?:%[a-fA-Fd]{2}))*)(?:%09(?:(?:(?:[
a-zA-Zd$-_.+!*'(),]|(?:%[a-fA-Fd]{2}))|[;:@&=])*)(?:%09(?:(?:[a-zA-
Zd$-_.+!*'(),;/?:@&=]|(?:%[a-fA-Fd]{2}))*))?)?)?)?)|(?:wais://(?:(?
:(?:(?:(?:[a-zA-Zd](?:(?:[a-zA-Zd]|-)*[a-zA-Zd])?).)*(?:[a-zA-Z](?
:(?:[a-zA-Zd]|-)*[a-zA-Zd])?))|(?:(?:d+)(?:.(?:d+)){3}))(?::(?:d
+))?)/(?:(?:[a-zA-Zd$-_.+!*'(),]|(?:%[a-fA-Fd]{2}))*)(?:(?:/(?:(?:[
a-zA-Zd$-_.+!*'(),]|(?:%[a-fA-Fd]{2}))*)/(?:(?:[a-zA-Zd$-_.+!*'()
,]|(?:%[a-fA-Fd]{2}))*))|?(?:(?:(?:[a-zA-Zd$-_.+!*'(),]|(?:%[a-fA-
Fd]{2}))|[;:@&=])*))?)|(?:mailto:(?:(?:[a-zA-Zd$-_.+!*'(),;/?:@&=]|
(?:%[a-fA-Fd]{2}))+))|(?:file://(?:(?:(?:(?:(?:[a-zA-Zd](?:(?:[a-zA-
Zd]|-)*[a-zA-Zd])?).)*(?:[a-zA-Z](?:(?:[a-zA-Zd]|-)*[a-zA-Zd])?))
|(?:(?:d+)(?:.(?:d+)){3}))|localhost)?/(?:(?:(?:(?:[a-zA-Zd$-_.+!
*'(),]|(?:%[a-fA-Fd]{2}))|[?:@&=])*)(?:/(?:(?:(?:[a-zA-Zd$-_.+!*'()
,]|(?:%[a-fA-Fd]{2}))|[?:@&=])*))*))|(?:prospero://(?:(?:(?:(?:(?:[a-
zA-Zd](?:(?:[a-zA-Zd]|-)*[a-zA-Zd])?).)*(?:[a-zA-Z](?:(?:[a-zA-Zd
]|-)*[a-zA-Zd])?))|(?:(?:d+)(?:.(?:d+)){3}))(?::(?:d+))?)/(?:(?:(
?:(?:[a-zA-Zd$-_.+!*'(),]|(?:%[a-fA-Fd]{2}))|[?:@&=])*)(?:/(?:(?:(?
:[a-zA-Zd$-_.+!*'(),]|(?:%[a-fA-Fd]{2}))|[?:@&=])*))*)(?:(?:;(?:(?:
(?:[a-zA-Zd$-_.+!*'(),]|(?:%[a-fA-Fd]{2}))|[?:@&])*)=(?:(?:(?:[a-zA
-Zd$-_.+!*'(),]|(?:%[a-fA-Fd]{2}))|[?:@&])*)))*)|(?:ldap://(?:(?:(?
:(?:(?:(?:[a-zA-Zd](?:(?:[a-zA-Zd]|-)*[a-zA-Zd])?).)*(?:[a-zA-Z](?
:(?:[a-zA-Zd]|-)*[a-zA-Zd])?))|(?:(?:d+)(?:.(?:d+)){3}))(?::(?:d
+))?))?/(?:(?:(?:(?:(?:(?:(?:[a-zA-Zd]|%(?:3d|[46][a-fA-Fd]|[57][Aa
d]))|(?:%20))+|(?:OID|oid).(?:(?:d+)(?:.(?:d+))*))(?:(?:%0[Aa])?(
?:%20)*)=(?:(?:%0[Aa])?(?:%20)*))?(?:(?:[a-zA-Zd$-_.+!*'(),]|(?:%[a-
fA-Fd]{2}))*))(?:(?:(?:%0[Aa])?(?:%20)*)+(?:(?:%0[Aa])?(?:%20)*)(?:(
?:(?:(?:(?:[a-zA-Zd]|%(?:3d|[46][a-fA-Fd]|[57][Aad]))|(?:%20))+|(?
:OID|oid).(?:(?:d+)(?:.(?:d+))*))(?:(?:%0[Aa])?(?:%20)*)=(?:(?:%0[
Aa])?(?:%20)*))?(?:(?:[a-zA-Zd$-_.+!*'(),]|(?:%[a-fA-Fd]{2}))*)))*)
(?:(?:(?:(?:%0[Aa])?(?:%20)*)(?:[;,])(?:(?:%0[Aa])?(?:%20)*))(?:(?:(?:
(?:(?:(?:[a-zA-Zd]|%(?:3d|[46][a-fA-Fd]|[57][Aad]))|(?:%20))+|(?:O
ID|oid).(?:(?:d+)(?:.(?:d+))*))(?:(?:%0[Aa])?(?:%20)*)=(?:(?:%0[Aa
])?(?:%20)*))?(?:(?:[a-zA-Zd$-_.+!*'(),]|(?:%[a-fA-Fd]{2}))*))(?:(?
:(?:%0[Aa])?(?:%20)*)+(?:(?:%0[Aa])?(?:%20)*)(?:(?:(?:(?:(?:[a-zA-Zd
]|%(?:3d|[46][a-fA-Fd]|[57][Aad]))|(?:%20))+|(?:OID|oid).(?:(?:d+
)(?:.(?:d+))*))(?:(?:%0[Aa])?(?:%20)*)=(?:(?:%0[Aa])?(?:%20)*))?(?:(
?:[a-zA-Zd$-_.+!*'(),]|(?:%[a-fA-Fd]{2}))*)))*))*(?:(?:(?:%0[Aa])?(
?:%20)*)(?:[;,])(?:(?:%0[Aa])?(?:%20)*))?)(?:?(?:(?:(?:(?:[a-zA-Zd$\n-_.+!*'(),]|(?:%[a-fA-Fd]{2}))+)(?:,(?:(?:[a-zA-Zd$-_.+!*'(),]|(?:%
[a-fA-Fd]{2}))+))*)?)(?:?(?:base|one|sub)(?:?(?:((?:[a-zA-Zd$-_.+
!*'(),;/?:@&=]|(?:%[a-fA-Fd]{2}))+)))?)?)?)|(?:(?:z39.50[rs])://(?:(
?:(?:(?:(?:[a-zA-Zd](?:(?:[a-zA-Zd]|-)*[a-zA-Zd])?).)*(?:[a-zA-Z](
?:(?:[a-zA-Zd]|-)*[a-zA-Zd])?))|(?:(?:d+)(?:.(?:d+)){3}))(?::(?:\nd+))?)(?:/(?:(?:(?:[a-zA-Zd$-_.+!*'(),]|(?:%[a-fA-Fd]{2}))+)(?:+(?
:(?:[a-zA-Zd$-_.+!*'(),]|(?:%[a-fA-Fd]{2}))+))*(?:?(?:(?:[a-zA-Zd
$-_.+!*'(),]|(?:%[a-fA-Fd]{2}))+))?)?(?:;esn=(?:(?:[a-zA-Zd$-_.+!*
'(),]|(?:%[a-fA-Fd]{2}))+))?(?:;rs=(?:(?:[a-zA-Zd$-_.+!*'(),]|(?:%[
a-fA-Fd]{2}))+)(?:+(?:(?:[a-zA-Zd$-_.+!*'(),]|(?:%[a-fA-Fd]{2}))+
))*)?))|(?:cid:(?:(?:(?:[a-zA-Zd$-_.+!*'(),]|(?:%[a-fA-Fd]{2}))|[;?
:@&=])*))|(?:mid:(?:(?:(?:[a-zA-Zd$-_.+!*'(),]|(?:%[a-fA-Fd]{2}))|[
;?:@&=])*)(?:/(?:(?:(?:[a-zA-Zd$-_.+!*'(),]|(?:%[a-fA-Fd]{2}))|[;?:
@&=])*))?)|(?:vemmi://(?:(?:(?:(?:(?:[a-zA-Zd](?:(?:[a-zA-Zd]|-)*[a-
zA-Zd])?).)*(?:[a-zA-Z](?:(?:[a-zA-Zd]|-)*[a-zA-Zd])?))|(?:(?:d+)
(?:.(?:d+)){3}))(?::(?:d+))?)(?:/(?:(?:(?:[a-zA-Zd$-_.+!*'(),]|(?
:%[a-fA-Fd]{2}))|[/?:@&=])*)(?:(?:;(?:(?:(?:[a-zA-Zd$-_.+!*'(),]|(?
:%[a-fA-Fd]{2}))|[/?:@&])*)=(?:(?:(?:[a-zA-Zd$-_.+!*'(),]|(?:%[a-fA
-Fd]{2}))|[/?:@&])*))*))?)|(?:imap://(?:(?:(?:(?:(?:(?:(?:[a-zA-Zd$\n-_.+!*'(),]|(?:%[a-fA-Fd]{2}))|[&=~])+)(?:(?:;[Aa][Uu][Tt][Hh]=(?:*|
(?:(?:(?:[a-zA-Zd$-_.+!*'(),]|(?:%[a-fA-Fd]{2}))|[&=~])+))))?)|(?:(
?:;[Aa][Uu][Tt][Hh]=(?:*|(?:(?:(?:[a-zA-Zd$-_.+!*'(),]|(?:%[a-fA-F\nd]{2}))|[&=~])+)))(?:(?:(?:(?:[a-zA-Zd$-_.+!*'(),]|(?:%[a-fA-Fd]{2}
))|[&=~])+))?))@)?(?:(?:(?:(?:(?:[a-zA-Zd](?:(?:[a-zA-Zd]|-)*[a-zA-Z
d])?).)*(?:[a-zA-Z](?:(?:[a-zA-Zd]|-)*[a-zA-Zd])?))|(?:(?:d+)(?:\n.(?:d+)){3}))(?::(?:d+))?))/(?:(?:(?:(?:(?:(?:[a-zA-Zd$-_.+!*'(),]
|(?:%[a-fA-Fd]{2}))|[&=~:@/])+)?;[Tt][Yy][Pp][Ee]=(?:[Ll](?:[Ii][Ss][
Tt]|[Ss][Uu][Bb])))|(?:(?:(?:(?:[a-zA-Zd$-_.+!*'(),]|(?:%[a-fA-Fd]{
2}))|[&=~:@/])+)(?:?(?:(?:(?:[a-zA-Zd$-_.+!*'(),]|(?:%[a-fA-Fd]{2}
))|[&=~:@/])+))?(?:(?:;[Uu][Ii][Dd][Vv][Aa][Ll][Ii][Dd][Ii][Tt][Yy]=(?
:[1-9]d*)))?)|(?:(?:(?:(?:[a-zA-Zd$-_.+!*'(),]|(?:%[a-fA-Fd]{2}))|
[&=~:@/])+)(?:(?:;[Uu][Ii][Dd][Vv][Aa][Ll][Ii][Dd][Ii][Tt][Yy]=(?:[1-9
]d*)))?(?:/;[Uu][Ii][Dd]=(?:[1-9]d*))(?:(?:/;[Ss][Ee][Cc][Tt][Ii][Oo
][Nn]=(?:(?:(?:[a-zA-Zd$-_.+!*'(),]|(?:%[a-fA-Fd]{2}))|[&=~:@/])+))
)?)))?)|(?:nfs:(?:(?://(?:(?:(?:(?:(?:[a-zA-Zd](?:(?:[a-zA-Zd]|-)*[a
-zA-Zd])?).)*(?:[a-zA-Z](?:(?:[a-zA-Zd]|-)*[a-zA-Zd])?))|(?:(?:d+
)(?:.(?:d+)){3}))(?::(?:d+))?)(?:(?:/(?:(?:(?:(?:(?:[a-zA-Zd$-_.
!~*'(),])|(?:%[a-fA-Fd]{2})|[:@&=+])*)(?:/(?:(?:(?:[a-zA-Zd$-_.!~*
'(),])|(?:%[a-fA-Fd]{2})|[:@&=+])*))*)?)))?)|(?:/(?:(?:(?:(?:(?:[a-zA
-Zd$-_.!~*'(),])|(?:%[a-fA-Fd]{2})|[:@&=+])*)(?:/(?:(?:(?:[a-zA-Z\nd$-_.!~*'(),])|(?:%[a-fA-Fd]{2})|[:@&=+])*))*)?))|(?:(?:(?:(?:(?:[a
-zA-Zd$-_.!~*'(),])|(?:%[a-fA-Fd]{2})|[:@&=+])*)(?:/(?:(?:(?:[a-zA
-Zd$-_.!~*'(),])|(?:%[a-fA-Fd]{2})|[:@&=+])*))*)?)))

Given its complexity, I think you should go the urlparse way.

For completeness, here's the pseudo-BNF of the above regex (as documentation):

; The generic form of a URL is:

genericurl     = scheme ":" schemepart

; Specific predefined schemes are defined here; new schemes
; may be registered with IANA

url            = httpurl | ftpurl | newsurl |
                 nntpurl | telneturl | gopherurl |
                 waisurl | mailtourl | fileurl |
                 prosperourl | otherurl

; new schemes follow the general syntax
otherurl       = genericurl

; the scheme is in lower case; interpreters should use case-ignore
scheme         = 1*[ lowalpha | digit | "+" | "-" | "." ]
schemepart     = *xchar | ip-schemepart


; URL schemeparts for ip based protocols:

ip-schemepart  = "//" login [ "/" urlpath ]

login          = [ user [ ":" password ] "@" ] hostport
hostport       = host [ ":" port ]
host           = hostname | hostnumber
hostname       = *[ domainlabel "." ] toplabel
domainlabel    = alphadigit | alphadigit *[ alphadigit | "-" ] alphadigit
toplabel       = alpha | alpha *[ alphadigit | "-" ] alphadigit
alphadigit     = alpha | digit
hostnumber     = digits "." digits "." digits "." digits
port           = digits
user           = *[ uchar | ";" | "?" | "&" | "=" ]
password       = *[ uchar | ";" | "?" | "&" | "=" ]
urlpath        = *xchar    ; depends on protocol see section 3.1

; The predefined schemes:

; FTP (see also RFC959)

ftpurl         = "ftp://" login [ "/" fpath [ ";type=" ftptype ]]
fpath          = fsegment *[ "/" fsegment ]
fsegment       = *[ uchar | "?" | ":" | "@" | "&" | "=" ]
ftptype        = "A" | "I" | "D" | "a" | "i" | "d"

; FILE

fileurl        = "file://" [ host | "localhost" ] "/" fpath

; HTTP

httpurl        = "http://" hostport [ "/" hpath [ "?" search ]]
hpath          = hsegment *[ "/" hsegment ]
hsegment       = *[ uchar | ";" | ":" | "@" | "&" | "=" ]
search         = *[ uchar | ";" | ":" | "@" | "&" | "=" ]

; GOPHER (see also RFC1436)

gopherurl      = "gopher://" hostport [ / [ gtype [ selector
                 [ "%09" search [ "%09" gopher+_string ] ] ] ] ]
gtype          = xchar
selector       = *xchar
gopher+_string = *xchar

; MAILTO (see also RFC822)

mailtourl      = "mailto:" encoded822addr
encoded822addr = 1*xchar               ; further defined in RFC822

; NEWS (see also RFC1036)

newsurl        = "news:" grouppart
grouppart      = "*" | group | article
group          = alpha *[ alpha | digit | "-" | "." | "+" | "_" ]
article        = 1*[ uchar | ";" | "/" | "?" | ":" | "&" | "=" ] "@" host

; NNTP (see also RFC977)

nntpurl        = "nntp://" hostport "/" group [ "/" digits ]

; TELNET

telneturl      = "telnet://" login [ "/" ]

; WAIS (see also RFC1625)

waisurl        = waisdatabase | waisindex | waisdoc
waisdatabase   = "wais://" hostport "/" database
waisindex      = "wais://" hostport "/" database "?" search
waisdoc        = "wais://" hostport "/" database "/" wtype "/" wpath
database       = *uchar
wtype          = *uchar
wpath          = *uchar

; PROSPERO

prosperourl    = "prospero://" hostport "/" ppath *[ fieldspec ]
ppath          = psegment *[ "/" psegment ]
psegment       = *[ uchar | "?" | ":" | "@" | "&" | "=" ]
fieldspec      = ";" fieldname "=" fieldvalue
fieldname      = *[ uchar | "?" | ":" | "@" | "&" ]
fieldvalue     = *[ uchar | "?" | ":" | "@" | "&" ]

; Miscellaneous definitions

lowalpha       = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
                 "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
                 "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
                 "y" | "z"
hialpha        = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
                 "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
                 "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
alpha          = lowalpha | hialpha
digit          = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
                 "8" | "9"
safe           = "$" | "-" | "_" | "." | "+"
extra          = "!" | "*" | "'" | "(" | ")" | ","
national       = "{" | "}" | "|" | "\" | "^" | "~" | "[" | "]" | "`"
punctuation    = "<" | ">" | "#" | "%" | <">


reserved       = ";" | "/" | "?" | ":" | "@" | "&" | "="
hex            = digit | "A" | "B" | "C" | "D" | "E" | "F" |
                 "a" | "b" | "c" | "d" | "e" | "f"
escape         = "%" hex hex

unreserved     = alpha | digit | safe | extra
uchar          = unreserved | escape
xchar          = unreserved | reserved | escape
digits         = 1*digit
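
To illustrate how grammar rules like these become pieces of the large regular expression, here is a minimal sketch that hand-translates only the hostname, domainlabel and toplabel rules into Python. The constant names and the helper is_hostname are made up for this example and are not part of any library.

```python
import re

# alpha and alphadigit, restricted to ASCII as in the grammar
ALPHA = "[a-zA-Z]"
ALPHADIGIT = "[a-zA-Z0-9]"

# domainlabel = alphadigit | alphadigit *[ alphadigit | "-" ] alphadigit
DOMAINLABEL = ALPHADIGIT + "(?:[a-zA-Z0-9-]*" + ALPHADIGIT + ")?"

# toplabel = alpha | alpha *[ alphadigit | "-" ] alphadigit
TOPLABEL = ALPHA + "(?:[a-zA-Z0-9-]*" + ALPHADIGIT + ")?"

# hostname = *[ domainlabel "." ] toplabel
HOSTNAME = re.compile(r"(?:" + DOMAINLABEL + r"\.)*" + TOPLABEL)


def is_hostname(text):
    """Return True if the whole string matches the hostname rule."""
    return HOSTNAME.fullmatch(text) is not None
```

Every other rule in the grammar can be translated the same way, which is roughly why the full expression grows so large.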

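Solution 2:

An easy way to parse (and validate) a URL is the urlparse (py2, py3) module.

A regex is too much work.


There's no "validate" method because almost anything is a valid URL. There are some punctuation rules for splitting it up. Absent any punctuation, you still have a valid URL.

Check the RFC carefully and see if you can construct an "invalid" URL. The rules are very flexible.

For example, ::::: is a valid URL. The path is ":::::". A pretty stupid filename, but a valid filename.

Also, ///// is a valid URL. The netloc ("hostname") is "". The path is "///". Again, stupid. Also valid. This URL normalizes to "///", which is equivalent.

Something like "bad://///worse/////" is perfectly valid. Dumb but valid.

Bottom line: parse it, and look at the pieces to see if they're displeasing in some way.

Do you want the scheme to always be "http"? Do you want the netloc to always be "www.somename.somedomain"? Do you want the path to look unix-like? Or windows-like? Do you want to remove the query string? Or preserve it?

These are not RFC-specified validations. These are validations unique to your application.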
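
A minimal sketch of that advice, using urllib.parse from the Python 3 standard library (the urlparse module in Python 2). The function name is_acceptable_feed_url and the two rules it enforces (an http or https scheme, a non-empty netloc) are assumptions chosen for illustration; your application's rules will likely differ.

```python
from urllib.parse import urlparse


def is_acceptable_feed_url(url, allowed_schemes=("http", "https")):
    """Apply application-specific checks to the parsed pieces of a URL."""
    pieces = urlparse(url)
    if pieces.scheme not in allowed_schemes:
        return False
    if not pieces.netloc:
        return False
    return True


# The parser accepts the odd-looking examples; the application-level
# checks are what turn them away.
print(urlparse(":::::").path)
print(is_acceptable_feed_url(":::::"))
print(is_acceptable_feed_url("bad://///worse/////"))
print(is_acceptable_feed_url("http://example.com/feed.xml"))
```

Keeping the rules in one small function makes it easy to tighten them later, for instance by checking the path or dropping the query string.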

Solution 3:

I'm using the one used by Django and it seems to work pretty well:

def is_valid_url(url):
    import re
    regex = re.compile(
        r'^https?://'  # http:// or https://
        r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+[A-Z]{2,6}\.?|'  # domain...
        r'localhost|'  # localhost...
        r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # ...or ip
        r'(?::\d+)?'  # optional port
        r'(?:/?|[/?]\S+)$', re.IGNORECASE)
    return url is not None and regex.search(url)

You can always check the latest version here: https://github.com/django/django/blob/master/django/core/validators.py#L74

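Solution 4:

I admit, I find your regular expression totally incomprehensible. I wonder if you could use urlparse instead? Something like:

import string
from urllib.parse import urlparse  # Python 3; in Python 2 use: from urlparse import urlparse

pieces = urlparse(url)
assert all([pieces.scheme, pieces.netloc])
assert set(pieces.netloc) <= set(string.ascii_letters + string.digits + '-.')  # and others?
assert pieces.scheme in ['http', 'https', 'ftp']  # etc.

It might be slower, and maybe you'll miss conditions, but it seems (to me) a lot easier to read and debug than a regular expression for URLs.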

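Solution 5:

Nowadays, in 90% of cases, if you are working with URLs in Python you are probably using python-requests. Hence the question here: why not reuse the URL validation from requests?

from requests.models import PreparedRequest
import requests.exceptions


def check_url(url):
    prepared_request = PreparedRequest()
    try:
        prepared_request.prepare_url(url, None)
        return prepared_request.url
    except requests.exceptions.MissingSchema as e:
        # SomeException is a placeholder: raise your application's own exception type
        raise SomeException from e

Features:

  • Don't reinvent the wheel

  • DRY

  • Works offline

  • Minimal resources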

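Solution 6:

urlparse quite happily takes invalid URLs; it is more a string-splitting library than any kind of validator. For example:

from urllib.parse import urlparse  # Python 2: from urlparse import urlparse
urlparse('http://----')
# returns: ParseResult(scheme='http', netloc='----', path='', params='', query='', fragment='')

Depending on the situation, this might be fine.

If you mostly trust the data and just want to verify that the protocol is HTTP, then urlparse is perfect.

If you want to make sure the URL is actually a legal URL, use the ridiculous regex.

If you want to make sure it's a real web address,

import urllib.request  # Python 2: import urllib
try:
    urllib.request.urlopen(url)
except (ValueError, OSError):  # malformed URL, or the request itself failed
    print("Not a real URL")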

Solution 7:

http://pypi.python.org/pypi/rfc3987 gives regular expressions consistent with the rules in RFC 3986 and RFC 3987 (that is, not with scheme-specific rules).

A regexp for IRI_reference is:

(?P<scheme>[a-zA-Z][a-zA-Z0-9+.-]*):(?://(?P<iauthority>(?:(?P<iuserinfo>(?:(?:[
a-zA-Z0-9._~-]|[xa0-/ud7ff/uf900-/ufdcf/ufdf0-/uffefU00010000-U0001fffdU0002
0000-U0002fffdU00030000-U0003fffdU00040000-U0004fffdU00050000-U0005fffdU
00060000-U0006fffdU00070000-U0007fffdU00080000-U0008fffdU00090000-U0009ff
fdU000a0000-U000afffdU000b0000-U000bfffdU000c0000-U000cfffdU000d0000-U00
0dfffdU000e1000-U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:)*)@)?(?P<ihost>\n[(?:(?:[0-9A-F]{1,4}:){6}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4]
[0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|::(?:[0
-9A-F]{1,4}:){5}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]
?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|[0-9A-F]{1,4}?::(
?:[0-9A-F]{1,4}:){4}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|
[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F
]{1,4}:)?[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:){3}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?
:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[
0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,2}[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:){2}(?:
[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3
}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,3}[0-9A-F]{1,
4})?::(?:[0-9A-F]{1,4}:)(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0
-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-
9A-F]{1,4}:){,4}[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]
|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|
(?:(?:[0-9A-F]{1,4}:){,5}[0-9A-F]{1,4})?::[0-9A-F]{1,4}|(?:(?:[0-9A-F]{1,4}:){,6
}[0-9A-F]{1,4})?::|v[0-9A-F]+\\.(?:[a-zA-Z0-9_.~-]|[!$&'()*+,;=]|:)+)\\]|(?:(?:(
?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][
0-9]?))|(?:(?:[a-zA-Z0-9._~-]|[xa0-/ud7ff/uf900-/ufdcf/ufdf0-/uffefU00010000-\nU0001fffdU00020000-U0002fffdU00030000-U0003fffdU00040000-U0004fffdU000500
00-U0005fffdU00060000-U0006fffdU00070000-U0007fffdU00080000-U0008fffdU00
090000-U0009fffdU000a0000-U000afffdU000b0000-U000bfffdU000c0000-U000cfffd
U000d0000-U000dfffdU000e1000-U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=])*)(
?::(?P<port>[0-9]*))?)(?P<ipath>(?:/(?:(?:[a-zA-Z0-9._~-]|[xa0-/ud7ff/uf900-/uf
dcf/ufdf0-/uffefU00010000-U0001fffdU00020000-U0002fffdU00030000-U0003fffd\nU00040000-U0004fffdU00050000-U0005fffdU00060000-U0006fffdU00070000-U0007f
ffdU00080000-U0008fffdU00090000-U0009fffdU000a0000-U000afffdU000b0000-U0
00bfffdU000c0000-U000cfffdU000d0000-U000dfffdU000e1000-U000efffd])|%[0-9A-
F][0-9A-F]|[!$&'()*+,;=]|:|@)*)*)|(?P<ipath>/(?:(?:(?:[a-zA-Z0-9._~-]|[xa0-/ud7
ff/uf900-/ufdcf/ufdf0-/uffefU00010000-U0001fffdU00020000-U0002fffdU00030000
-U0003fffdU00040000-U0004fffdU00050000-U0005fffdU00060000-U0006fffdU0007
0000-U0007fffdU00080000-U0008fffdU00090000-U0009fffdU000a0000-U000afffdU
000b0000-U000bfffdU000c0000-U000cfffdU000d0000-U000dfffdU000e1000-U000eff
fd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)+(?:/(?:(?:[a-zA-Z0-9._~-]|[xa0-/ud7ff
/uf900-/ufdcf/ufdf0-/uffefU00010000-U0001fffdU00020000-U0002fffdU00030000-\nU0003fffdU00040000-U0004fffdU00050000-U0005fffdU00060000-U0006fffdU000700
00-U0007fffdU00080000-U0008fffdU00090000-U0009fffdU000a0000-U000afffdU00
0b0000-U000bfffdU000c0000-U000cfffdU000d0000-U000dfffdU000e1000-U000efffd
])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)*)*)?)|(?P<ipath>(?:(?:[a-zA-Z0-9._~-]|[\nxa0-/ud7ff/uf900-/ufdcf/ufdf0-/uffefU00010000-U0001fffdU00020000-U0002fffdU
00030000-U0003fffdU00040000-U0004fffdU00050000-U0005fffdU00060000-U0006ff
fdU00070000-U0007fffdU00080000-U0008fffdU00090000-U0009fffdU000a0000-U00
0afffdU000b0000-U000bfffdU000c0000-U000cfffdU000d0000-U000dfffdU000e1000-
U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)+(?:/(?:(?:[a-zA-Z0-9._~-]|[xa
0-/ud7ff/uf900-/ufdcf/ufdf0-/uffefU00010000-U0001fffdU00020000-U0002fffdU00
030000-U0003fffdU00040000-U0004fffdU00050000-U0005fffdU00060000-U0006fffd
U00070000-U0007fffdU00080000-U0008fffdU00090000-U0009fffdU000a0000-U000a
fffdU000b0000-U000bfffdU000c0000-U000cfffdU000d0000-U000dfffdU000e1000-U
000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)*)*)|(?P<ipath>))(?:\\?(?P<iquery
>(?:(?:(?:[a-zA-Z0-9._~-]|[xa0-/ud7ff/uf900-/ufdcf/ufdf0-/uffefU00010000-U000
1fffdU00020000-U0002fffdU00030000-U0003fffdU00040000-U0004fffdU00050000-\nU0005fffdU00060000-U0006fffdU00070000-U0007fffdU00080000-U0008fffdU000900
00-U0009fffdU000a0000-U000afffdU000b0000-U000bfffdU000c0000-U000cfffdU00
0d0000-U000dfffdU000e1000-U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)|[\nue000-/uf8ffU000f0000-U000ffffdU00100000-U0010fffd]|/|\\?)*))?(?:\\#(?P<ifra
gment>(?:(?:(?:[a-zA-Z0-9._~-]|[xa0-/ud7ff/uf900-/ufdcf/ufdf0-/uffefU00010000-
U0001fffdU00020000-U0002fffdU00030000-U0003fffdU00040000-U0004fffdU00050
000-U0005fffdU00060000-U0006fffdU00070000-U0007fffdU00080000-U0008fffdU0
0090000-U0009fffdU000a0000-U000afffdU000b0000-U000bfffdU000c0000-U000cfff
dU000d0000-U000dfffdU000e1000-U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|
@)|/|\\?)*))?|(?:(?://(?P<iauthority>(?:(?P<iuserinfo>(?:(?:[a-zA-Z0-9._~-]|[xa
0-/ud7ff/uf900-/ufdcf/ufdf0-/uffefU00010000-U0001fffdU00020000-U0002fffdU00
030000-U0003fffdU00040000-U0004fffdU00050000-U0005fffdU00060000-U0006fffd
U00070000-U0007fffdU00080000-U0008fffdU00090000-U0009fffdU000a0000-U000a
fffdU000b0000-U000bfffdU000c0000-U000cfffdU000d0000-U000dfffdU000e1000-U
000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:)*)@)?(?P<ihost>\\[(?:(?:[0-9A-F]{1,
4}:){6}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-
9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|::(?:[0-9A-F]{1,4}:){5}(?:
[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3
}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|[0-9A-F]{1,4}?::(?:[0-9A-F]{1,4}:){4
}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\
.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:)?[0-9A-F]{1
,4})?::(?:[0-9A-F]{1,4}:){3}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-
4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?
:[0-9A-F]{1,4}:){,2}[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:){2}(?:[0-9A-F]{1,4}:[0-9A
-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][
0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,3}[0-9A-F]{1,4})?::(?:[0-9A-F]{1
,4}:)(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]
?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,4}[0-
9A-F]{1,4})?::(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[
0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}
:){,5}[0-9A-F]{1,4})?::[0-9A-F]{1,4}|(?:(?:[0-9A-F]{1,4}:){,6}[0-9A-F]{1,4})?::|
v[0-9A-F]+\\.(?:[a-zA-Z0-9_.~-]|[!$&'()*+,;=]|:)+)\\]|(?:(?:(?:25[0-5]|2[0-4][0-
9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))|(?:(?:[a-zA
-Z0-9._~-]|[xa0-/ud7ff/uf900-/ufdcf/ufdf0-/uffefU00010000-U0001fffdU00020000
-U0002fffdU00030000-U0003fffdU00040000-U0004fffdU00050000-U0005fffdU0006
0000-U0006fffdU00070000-U0007fffdU00080000-U0008fffdU00090000-U0009fffdU
000a0000-U000afffdU000b0000-U000bfffdU000c0000-U000cfffdU000d0000-U000dff
fdU000e1000-U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=])*)(?::(?P<port>[0-9]*)
)?)(?P<ipath>(?:/(?:(?:[a-zA-Z0-9._~-]|[xa0-/ud7ff/uf900-/ufdcf/ufdf0-/uffefU0
0010000-U0001fffdU00020000-U0002fffdU00030000-U0003fffdU00040000-U0004fff
dU00050000-U0005fffdU00060000-U0006fffdU00070000-U0007fffdU00080000-U000
8fffdU00090000-U0009fffdU000a0000-U000afffdU000b0000-U000bfffdU000c0000-\nU000cfffdU000d0000-U000dfffdU000e1000-U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*
+,;=]|:|@)*)*)|(?P<ipath>/(?:(?:(?:[a-zA-Z0-9._~-]|[xa0-/ud7ff/uf900-/ufdcf/ufd
f0-/uffefU00010000-U0001fffdU00020000-U0002fffdU00030000-U0003fffdU000400
00-U0004fffdU00050000-U0005fffdU00060000-U0006fffdU00070000-U0007fffdU00
080000-U0008fffdU00090000-U0009fffdU000a0000-U000afffdU000b0000-U000bfffd
U000c0000-U000cfffdU000d0000-U000dfffdU000e1000-U000efffd])|%[0-9A-F][0-9A
-F]|[!$&'()*+,;=]|:|@)+(?:/(?:(?:[a-zA-Z0-9._~-]|[xa0-/ud7ff/uf900-/ufdcf/ufdf0
-/uffefU00010000-U0001fffdU00020000-U0002fffdU00030000-U0003fffdU00040000
-U0004fffdU00050000-U0005fffdU00060000-U0006fffdU00070000-U0007fffdU0008
0000-U0008fffdU00090000-U0009fffdU000a0000-U000afffdU000b0000-U000bfffdU
000c0000-U000cfffdU000d0000-U000dfffdU000e1000-U000efffd])|%[0-9A-F][0-9A-F
]|[!$&'()*+,;=]|:|@)*)*)?)|(?P<ipath>(?:(?:[a-zA-Z0-9._~-]|[xa0-/ud7ff/uf900-/u
fdcf/ufdf0-/uffefU00010000-U0001fffdU00020000-U0002fffdU00030000-U0003fffd
U00040000-U0004fffdU00050000-U0005fffdU00060000-U0006fffdU00070000-U0007
fffdU00080000-U0008fffdU00090000-U0009fffdU000a0000-U000afffdU000b0000-U
000bfffdU000c0000-U000cfffdU000d0000-U000dfffdU000e1000-U000efffd])|%[0-9A
-F][0-9A-F]|[!$&'()*+,;=]|@)+(?:/(?:(?:[a-zA-Z0-9._~-]|[xa0-/ud7ff/uf900-/ufdcf
/ufdf0-/uffefU00010000-U0001fffdU00020000-U0002fffdU00030000-U0003fffdU00
040000-U0004fffdU00050000-U0005fffdU00060000-U0006fffdU00070000-U0007fffd
U00080000-U0008fffdU00090000-U0009fffdU000a0000-U000afffdU000b0000-U000b
fffdU000c0000-U000cfffdU000d0000-U000dfffdU000e1000-U000efffd])|%[0-9A-F][
0-9A-F]|[!$&'()*+,;=]|:|@)*)*)|(?P<ipath>))(?:\\?(?P<iquery>(?:(?:(?:[a-zA-Z0-9.
_~-]|[xa0-/ud7ff/uf900-/ufdcf/ufdf0-/uffefU00010000-U0001fffdU00020000-U000
2fffdU00030000-U0003fffdU00040000-U0004fffdU00050000-U0005fffdU00060000-\nU0006fffdU00070000-U0007fffdU00080000-U0008fffdU00090000-U0009fffdU000a00
00-U000afffdU000b0000-U000bfffdU000c0000-U000cfffdU000d0000-U000dfffdU00
0e1000-U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)|[/ue000-/uf8ffU000f000
0-U000ffffdU00100000-U0010fffd]|/|\\?)*))?(?:\\#(?P<ifragment>(?:(?:(?:[a-zA-
Z0-9._~-]|[xa0-/ud7ff/uf900-/ufdcf/ufdf0-/uffefU00010000-U0001fffdU00020000-
U0002fffdU00030000-U0003fffdU00040000-U0004fffdU00050000-U0005fffdU00060
000-U0006fffdU00070000-U0007fffdU00080000-U0008fffdU00090000-U0009fffdU0
00a0000-U000afffdU000b0000-U000bfffdU000c0000-U000cfffdU000d0000-U000dfff
dU000e1000-U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)|/|\\?)*))?)

In one line:

(?P<scheme>[a-zA-Z][a-zA-Z0-9+.-]*):(?://(?P<iauthority>(?:(?P<iuserinfo>(?:(?:[a-zA-Z0-9._~-]|[xa0-/ud7ff/uf900-/ufdcf/ufdf0-/uffefU00010000-U0001fffdU00020000-U0002fffdU00030000-U0003fffdU00040000-U0004fffdU00050000-U0005fffdU00060000-U0006fffdU00070000-U0007fffdU00080000-U0008fffdU00090000-U0009fffdU000a0000-U000afffdU000b0000-U000bfffdU000c0000-U000cfffdU000d0000-U000dfffdU000e1000-U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:)*)@)?(?P<ihost>\\[(?:(?:[0-9A-F]{1,4}:){6}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|::(?:[0-9A-F]{1,4}:){5}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|[0-9A-F]{1,4}?::(?:[0-9A-F]{1,4}:){4}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:)?[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:){3}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,2}[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:){2}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,3}[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:)(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,4}[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,5}[0-9A-F]{1,4})?::[0-9A-F]{1,4}|(?:(?:[0-9A-F]{1,4}:){,6}[0-9A-F]{1,4})?::|v[0-9A-F]+\\.(?:[a-zA-Z0-9_.~-]|[!$&'()*+,;=]|:)+)\\]|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))|(?:(?:[a-zA-Z0-9._~-]|[xa0-/ud7ff/uf900-/ufdcf/ufdf0-/uffefU00010000-U0001fffdU00020000-U0002fffdU000
30000-U0003fffdU00040000-U0004fffdU00050000-U0005fffdU00060000-U0006fffdU00070000-U0007fffdU00080000-U0008fffdU00090000-U0009fffdU000a0000-U000afffdU000b0000-U000bfffdU000c0000-U000cfffdU000d0000-U000dfffdU000e1000-U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=])*)(?::(?P<port>[0-9]*))?)(?P<ipath>(?:/(?:(?:[a-zA-Z0-9._~-]|[xa0-/ud7ff/uf900-/ufdcf/ufdf0-/uffefU00010000-U0001fffdU00020000-U0002fffdU00030000-U0003fffdU00040000-U0004fffdU00050000-U0005fffdU00060000-U0006fffdU00070000-U0007fffdU00080000-U0008fffdU00090000-U0009fffdU000a0000-U000afffdU000b0000-U000bfffdU000c0000-U000cfffdU000d0000-U000dfffdU000e1000-U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)*)*)|(?P<ipath>/(?:(?:(?:[a-zA-Z0-9._~-]|[xa0-/ud7ff/uf900-/ufdcf/ufdf0-/uffefU00010000-U0001fffdU00020000-U0002fffdU00030000-U0003fffdU00040000-U0004fffdU00050000-U0005fffdU00060000-U0006fffdU00070000-U0007fffdU00080000-U0008fffdU00090000-U0009fffdU000a0000-U000afffdU000b0000-U000bfffdU000c0000-U000cfffdU000d0000-U000dfffdU000e1000-U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)+(?:/(?:(?:[a-zA-Z0-9._~-]|[xa0-/ud7ff/uf900-/ufdcf/ufdf0-/uffefU00010000-U0001fffdU00020000-U0002fffdU00030000-U0003fffdU00040000-U0004fffdU00050000-U0005fffdU00060000-U0006fffdU00070000-U0007fffdU00080000-U0008fffdU00090000-U0009fffdU000a0000-U000afffdU000b0000-U000bfffdU000c0000-U000cfffdU000d0000-U000dfffdU000e1000-U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)*)*)?)|(?P<ipath>(?:(?:[a-zA-Z0-9._~-]|[xa0-/ud7ff/uf900-/ufdcf/ufdf0-/uffefU00010000-U0001fffdU00020000-U0002fffdU00030000-U0003fffdU00040000-U0004fffdU00050000-U0005fffdU00060000-U0006fffdU00070000-U0007fffdU00080000-U0008fffdU00090000-U0009fffdU000a0000-U000afffdU000b0000-U000bfffdU000c0000-U000cfffdU000d0000-U000dfffdU000e1000-U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)+(?:/(?:(?:[a-zA-Z0-9._~-]|[xa0-/ud7ff/uf900-/ufdcf/ufdf0-/uffefU00010000-U0001fffdU00020000-U0002fffdU00030000-U0003fffdU00040000-U0004fffdU00050000-U0005fffdU00060000-U0006fffdU00070000-U0007fffdU0008000
0-U0008fffdU00090000-U0009fffdU000a0000-U000afffdU000b0000-U000bfffdU000c0000-U000cfffdU000d0000-U000dfffdU000e1000-U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)*)*)|(?P<ipath>))(?:\\?(?P<iquery>(?:(?:(?:[a-zA-Z0-9._~-]|[xa0-/ud7ff/uf900-/ufdcf/ufdf0-/uffefU00010000-U0001fffdU00020000-U0002fffdU00030000-U0003fffdU00040000-U0004fffdU00050000-U0005fffdU00060000-U0006fffdU00070000-U0007fffdU00080000-U0008fffdU00090000-U0009fffdU000a0000-U000afffdU000b0000-U000bfffdU000c0000-U000cfffdU000d0000-U000dfffdU000e1000-U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)|[/ue000-/uf8ffU000f0000-U000ffffdU00100000-U0010fffd]|/|\\?)*))?(?:\\#(?P<ifragment>(?:(?:(?:[a-zA-Z0-9._~-]|[xa0-/ud7ff/uf900-/ufdcf/ufdf0-/uffefU00010000-U0001fffdU00020000-U0002fffdU00030000-U0003fffdU00040000-U0004fffdU00050000-U0005fffdU00060000-U0006fffdU00070000-U0007fffdU00080000-U0008fffdU00090000-U0009fffdU000a0000-U000afffdU000b0000-U000bfffdU000c0000-U000cfffdU000d0000-U000dfffdU000e1000-U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)|/|\\?)*))?|(?:(?://(?P<iauthority>(?:(?P<iuserinfo>(?:(?:[a-zA-Z0-9._~-]|[xa0-/ud7ff/uf900-/ufdcf/ufdf0-/uffefU00010000-U0001fffdU00020000-U0002fffdU00030000-U0003fffdU00040000-U0004fffdU00050000-U0005fffdU00060000-U0006fffdU00070000-U0007fffdU00080000-U0008fffdU00090000-U0009fffdU000a0000-U000afffdU000b0000-U000bfffdU000c0000-U000cfffdU000d0000-U000dfffdU000e1000-U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:)*)@)?(?P<ihost>\\[(?:(?:[0-9A-F]{1,4}:){6}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|::(?:[0-9A-F]{1,4}:){5}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|[0-9A-F]{1,4}?::(?:[0-9A-F]{1,4}:){4}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:)?[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:){3}(?:[0-9A-F]{1,4}:[0-9A-F]{
1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,2}[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:){2}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,3}[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:)(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,4}[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,5}[0-9A-F]{1,4})?::[0-9A-F]{1,4}|(?:(?:[0-9A-F]{1,4}:){,6}[0-9A-F]{1,4})?::|v[0-9A-F]+\\.(?:[a-zA-Z0-9_.~-]|[!$&'()*+,;=]|:)+)\\]|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))|(?:(?:[a-zA-Z0-9._~-]|[xa0-/ud7ff/uf900-/ufdcf/ufdf0-/uffefU00010000-U0001fffdU00020000-U0002fffdU00030000-U0003fffdU00040000-U0004fffdU00050000-U0005fffdU00060000-U0006fffdU00070000-U0007fffdU00080000-U0008fffdU00090000-U0009fffdU000a0000-U000afffdU000b0000-U000bfffdU000c0000-U000cfffdU000d0000-U000dfffdU000e1000-U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=])*)(?::(?P<port>[0-9]*))?)(?P<ipath>(?:/(?:(?:[a-zA-Z0-9._~-]|[xa0-/ud7ff/uf900-/ufdcf/ufdf0-/uffefU00010000-U0001fffdU00020000-U0002fffdU00030000-U0003fffdU00040000-U0004fffdU00050000-U0005fffdU00060000-U0006fffdU00070000-U0007fffdU00080000-U0008fffdU00090000-U0009fffdU000a0000-U000afffdU000b0000-U000bfffdU000c0000-U000cfffdU000d0000-U000dfffdU000e1000-U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)*)*)|(?P<ipath>/(?:(?:(?:[a-zA-Z0-9._~-]|[xa0-/ud7ff/uf900-/ufdcf/ufdf0-/uffefU00010000-U0001fffdU00020000-U0002fffdU00030000-U0003fffdU00040000-U0004fffdU00050000-U0005fffdU00060000-U0006fffdU00070000-U0007fffdU00080000-U0008fffdU00090000-U0009fffdU000a0000-U000afffdU000b0000-U000bfffdU000c0000-U000cfffdU000d0000-U000dfffdU000e1000-U000eff
fd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)+(?:/(?:(?:[a-zA-Z0-9._~-]|[xa0-/ud7ff/uf900-/ufdcf/ufdf0-/uffefU00010000-U0001fffdU00020000-U0002fffdU00030000-U0003fffdU00040000-U0004fffdU00050000-U0005fffdU00060000-U0006fffdU00070000-U0007fffdU00080000-U0008fffdU00090000-U0009fffdU000a0000-U000afffdU000b0000-U000bfffdU000c0000-U000cfffdU000d0000-U000dfffdU000e1000-U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)*)*)?)|(?P<ipath>(?:(?:[a-zA-Z0-9._~-]|[xa0-/ud7ff/uf900-/ufdcf/ufdf0-/uffefU00010000-U0001fffdU00020000-U0002fffdU00030000-U0003fffdU00040000-U0004fffdU00050000-U0005fffdU00060000-U0006fffdU00070000-U0007fffdU00080000-U0008fffdU00090000-U0009fffdU000a0000-U000afffdU000b0000-U000bfffdU000c0000-U000cfffdU000d0000-U000dfffdU000e1000-U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|@)+(?:/(?:(?:[a-zA-Z0-9._~-]|[xa0-/ud7ff/uf900-/ufdcf/ufdf0-/uffefU00010000-U0001fffdU00020000-U0002fffdU00030000-U0003fffdU00040000-U0004fffdU00050000-U0005fffdU00060000-U0006fffdU00070000-U0007fffdU00080000-U0008fffdU00090000-U0009fffdU000a0000-U000afffdU000b0000-U000bfffdU000c0000-U000cfffdU000d0000-U000dfffdU000e1000-U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)*)*)|(?P<ipath>))(?:\\?(?P<iquery>(?:(?:(?:[a-zA-Z0-9._~-]|[xa0-/ud7ff/uf900-/ufdcf/ufdf0-/uffefU00010000-U0001fffdU00020000-U0002fffdU00030000-U0003fffdU00040000-U0004fffdU00050000-U0005fffdU00060000-U0006fffdU00070000-U0007fffdU00080000-U0008fffdU00090000-U0009fffdU000a0000-U000afffdU000b0000-U000bfffdU000c0000-U000cfffdU000d0000-U000dfffdU000e1000-U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)|[/ue000-/uf8ffU000f0000-U000ffffdU00100000-U0010fffd]|/|\\?)*))?(?:\\#(?P<ifragment>(?:(?:(?:[a-zA-Z0-9._~-]|[xa0-/ud7ff/uf900-/ufdcf/ufdf0-/uffefU00010000-U0001fffdU00020000-U0002fffdU00030000-U0003fffdU00040000-U0004fffdU00050000-U0005fffdU00060000-U0006fffdU00070000-U0007fffdU00080000-U0008fffdU00090000-U0009fffdU000a0000-U000afffdU000b0000-U000bfffdU000c0000-U000cfffdU000d0000-U000dfffdU000e1000-U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()
*+,;=]|:|@)|/|\\?)*))?)

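Solution 8:

Note: Lepl is no longer maintained or supported.

RFC 3696 defines "best practices" for URL validation - http://www.faqs.org/rfcs/rfc3696.html

The latest release of Lepl (a Python parser library) includes an implementation of RFC 3696. You would use it something like this: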

from lepl.apps.rfc3696 import Email, HttpUrl

# compile the validators (do once at start of program)
valid_email = Email()
valid_http_url = HttpUrl()

# placeholder inputs, for illustration only
some_email = "user@example.com"
some_url = "http://www.example.com"

# use the validators (as often as you like)
if valid_email(some_email):
    print("email is ok")
else:
    print("email is bad")
if valid_http_url(some_url):
    print("url is ok")
else:
    print("url is bad")

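Although the validators are defined in Lepl (a recursive descent parser), they are largely compiled internally to regular expressions. That combines the best of both worlds: a (relatively) easy-to-read definition that can be checked against RFC 3696, and an efficient implementation. There is a post on my blog showing how this simplifies the parser - http://www.acooke.org/cute/LEPLOptimi0.html

Lepl is available at http://www.acooke.org/lepl and the documentation for the RFC 3696 module is at http://www.acooke.org/lepl/rfc3696.html

This is completely new in this release, so it may contain bugs. Please contact me if you have any problems and I will fix them as soon as possible. Thanks.

Solution 9:

The regex provided should match any URL of the form http://www.ietf.org/rfc/rfc3986.txt, and it does when tested in the Python interpreter.

What format are the URLs you are having trouble parsing?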
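
The claim above can be checked with a short sketch. It assumes the pattern in the question is the one from RFC 3986, Appendix B, with the literal "?" written as "\?" (the backslash seems to have been lost in the question as posted). The helper name split_uri is made up for this illustration. Note that every component in this pattern is optional, so it splits a string into parts rather than rejecting bad input.

```python
import re

# Regex from RFC 3986, Appendix B, with the literal "?" escaped as "\?".
RFC3986_SPLIT = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(uri):
    """Return the five RFC 3986 components of `uri` as a dict.

    split_uri is a hypothetical helper, not part of any library.
    """
    m = RFC3986_SPLIT.match(uri)
    # Groups 2, 4, 5, 7 and 9 hold scheme, authority, path, query, fragment.
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


parts = split_uri("http://www.ietf.org/rfc/rfc3986.txt")
print(parts["scheme"], parts["authority"], parts["path"])

# Every component is optional, so even a non-URL still matches.
print(RFC3986_SPLIT.match("not a url") is not None)
```

Because the match always succeeds, a check like "if m:" accepts everything. To validate, inspect the captured groups (for example, require a known scheme and a non-empty authority) or use one of the stricter patterns in the other answers.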

Solution 10:

A modified Django URL validation regex:

import re

ul = "\u00a1-\uffff"  # Unicode letters range (must not be a raw string).

# IP patterns
ipv4_re = (
    r"(?:0|25[0-5]|2[0-4][0-9]|1[0-9]?[0-9]?|[1-9][0-9]?)"
    r"(?:\.(?:0|25[0-5]|2[0-4][0-9]|1[0-9]?[0-9]?|[1-9][0-9]?)){3}"
)
ipv6_re = r"\[[0-9a-f:.]+\]"  # (simple regex, validated later)

# Host patterns
hostname_re = (
    r"[a-z" + ul + r"0-9](?:[a-z" + ul + r"0-9-]{0,61}[a-z" + ul + r"0-9])?"
)
# Max length for domain name labels is 63 characters per RFC 1034 sec. 3.1
domain_re = r"(?:\.(?!-)[a-z" + ul + r"0-9-]{1,63}(?<!-))*"
tld_re = (
    r"\."  # dot
    r"(?!-)"  # can't start with a dash
    r"(?:[a-z" + ul + "-]{2,63}"  # domain label
    r"|xn--[a-z0-9]{1,59})"  # or punycode label
    r"(?<!-)"  # can't end with a dash
    r"\.?"  # may have a trailing dot
)
host_re = "(" + hostname_re + domain_re + tld_re + "|localhost)"

regex = re.compile(
    r"^(?:http|ftp)s?://" # http(s):// or ftp(s)://
    r"(?:[^\s:@/]+(?::[^\s:@/]*)?@)?"  # user:pass authentication
    r"(?:" + ipv4_re + "|" + ipv6_re + "|" + host_re + ")"
    r"(?::[0-9]{1,5})?"  # port
    r"(?:[/?#][^\s]*)?"  # resource path
    r"\Z",
    re.IGNORECASE,
)

Source: https://github.com/django/django/blob/master/django/core/validators.py#L74
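
The comment on ipv6_re says the loose bracketed pattern is "validated later", but the snippet does not show that step. Below is a minimal sketch of one way to do it, assuming the standard-library ipaddress module is acceptable. This is not Django's own code; the names BRACKETED and is_valid_ipv6_literal are made up for this illustration.

```python
import ipaddress
import re

# Same loose bracketed pattern as ipv6_re, with a capture group added
# so the address can be extracted (the group is an addition for this sketch).
BRACKETED = re.compile(r"\[([0-9a-f:.]+)\]", re.IGNORECASE)


def is_valid_ipv6_literal(host):
    """Return True if `host` is a bracketed, well-formed IPv6 address."""
    m = BRACKETED.fullmatch(host)
    if m is None:
        return False
    try:
        # IPv6Address raises a ValueError subclass on malformed input.
        ipaddress.IPv6Address(m.group(1))
    except ValueError:
        return False
    return True


print(is_valid_ipv6_literal("[::1]"))          # True
print(is_valid_ipv6_literal("[2001:db8::1]"))  # True
print(is_valid_ipv6_literal("[1:2:3]"))        # False
```

The loose regex accepts "[1:2:3]" because it only checks the character set. The second step is what rejects it.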

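Solution 11:

I have needed to do this many times over the years, and I always end up copying a regular expression from someone who has thought about it far more than I want to.

Having said that, there is a regex in the Django forms code that should do the trick:

http://code.djangoproject.com/browser/django/trunk/django/forms/fields.py#L534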

Solution 12:

import re

urlfinders = [
    re.compile("([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}|(((news|telnet|nttp|file|http|ftp|https)://)|(www|ftp)[-A-Za-z0-9]*\\.)[-A-Za-z0-9\\.]+)(:[0-9]*)?/[-A-Za-z0-9_\\$\\.\\+\\!\\*\\(\\),;:@&=\\?/~\\#\\%]*[^]'\\.}>\\),\\\"]"),
    re.compile("([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}|(((news|telnet|nttp|file|http|ftp|https)://)|(www|ftp)[-A-Za-z0-9]*\\.)[-A-Za-z0-9\\.]+)(:[0-9]*)?"),
    re.compile("(~/|/|\\./)([-A-Za-z0-9_\\$\\.\\+\\!\\*\\(\\),;:@&=\\?/~\\#\\%]|\\\\)+"),
    re.compile("'\\<((mailto:)|)[-A-Za-z0-9\\.]+@[-A-Za-z0-9\\.]+"),
]

Note: it looks ugly in the browser, but if you copy and paste it the formatting comes out fine.

Found on the Python mailing list and used for gnome-terminal.

Source: http://mail.python.org/pipermail/python-list/2007-January/595436.html
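
These patterns are meant for finding URL-like text inside a larger string, not for validating a single value. As a rough sketch of how one of them might be used, the block below takes the second pattern from the list, rewritten as a raw string so each "\\." becomes "\.", and scans a sample sentence. The function name find_urls and the sample text are made up for this illustration.

```python
import re

# Second pattern from urlfinders, rewritten as a raw string.
# "nttp" is kept exactly as it appears in the original list.
HOST_FINDER = re.compile(
    r"([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"
    r"|(((news|telnet|nttp|file|http|ftp|https)://)"
    r"|(www|ftp)[-A-Za-z0-9]*\.)[-A-Za-z0-9\.]+)(:[0-9]*)?"
)


def find_urls(text):
    """Return every substring of `text` that the pattern matches."""
    return [m.group(0) for m in HOST_FINDER.finditer(text)]


sample = "see http://www.example.com:8080 and 192.168.0.1 for details"
print(find_urls(sample))
# ['http://www.example.com:8080', '192.168.0.1']
```

This pattern stops at the host and optional port. The first pattern in the list additionally requires a path after the host.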

Solution 13:

The simple way:

import re

def is_valid_url(url):
    regex = re.compile(
      r'^(?:http|ftp)s?://'  # http://, https://, ftp:// or ftps://
      r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain...
      r'localhost|'  # localhost...
      r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # ...or ip
      r'(?::\d+)?'  # optional port
      r'(?:/?|[/?]\S+)$', re.IGNORECASE)
    return re.match(regex, url) is not None

Example:

print(is_valid_url("http://www.example.com"))     # True
print(is_valid_url("https://example.com/path"))  # True
print(is_valid_url("ftp://example.com"))         # True
print(is_valid_url("://example.com"))            # False
print(is_valid_url("http:///example.com"))       # False