How do I read the contents of a URL with Python?
- 2025-03-10 08:52:00
- admin (original post)
Question:
The following works when I paste it into the browser:
http://www.somesite.com/details.pl?urn=2344
But when I try to read the URL with Python, nothing happens:
link = 'http://www.somesite.com/details.pl?urn=2344'
f = urllib.urlopen(link)
myfile = f.readline()
print myfile
Do I need to encode the URL, or is there something I'm not seeing?
Solution 1:
To answer your question:
import urllib.request
link = "http://www.somesite.com/details.pl?urn=2344"
f = urllib.request.urlopen(link)
myfile = f.read()
print(myfile)
You need read(), not readline(): readline() returns only the first line of the response.
See also the answers to this question by Martin Thoma or innm: Python 2/3 compat, Python 3.
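To see the difference between the two methods without touching the network, here is a small sketch. It assumes io.BytesIO is an acceptable stand-in for the file-like object that urlopen() returns, and the sample body is made up for illustration.

```python
import io

# Hypothetical response body; io.BytesIO stands in for the file-like
# object returned by urllib.request.urlopen().
body = b"<html>\n<body>Hello</body>\n</html>\n"

first_line = io.BytesIO(body).readline()  # stops after the first newline
everything = io.BytesIO(body).read()      # reads until the end of the stream

print(first_line)
print(everything)
```

If the first line of a real response is blank or holds only a doctype, readline() can look as if "nothing happened".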
Alternatively, using requests:
import requests
link = "http://www.somesite.com/details.pl?urn=2344"
f = requests.get(link)
print(f.text)
Solution 2:
For Python 3 users, to save time, use the following code:
from urllib.request import urlopen
link = "https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html"
f = urlopen(link)
myfile = f.read()
print(myfile)
I know there are separate threads about the error "NameError: urlopen is not defined", but I thought this might save time.
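For context on that error, the following quick check (assuming it is run on Python 3) shows where urlopen lives. A bare urlopen(...) call without the import raises NameError, while the Python 2 spelling urllib.urlopen(...) raises AttributeError.

```python
import urllib
import urllib.request

# On Python 3 the top-level urllib package has no urlopen attribute,
# which is why Python 2 style code such as urllib.urlopen(link) fails there.
has_legacy_urlopen = hasattr(urllib, "urlopen")
has_modern_urlopen = hasattr(urllib.request, "urlopen")

print("urllib.urlopen available:", has_legacy_urlopen)
print("urllib.request.urlopen available:", has_modern_urlopen)
```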
Solution 3:
None of these answers is very good for Python 3 (tested on the latest version at the time of posting).
This is how you do it...
import urllib.error
import urllib.request
try:
    # The with block closes the connection automatically
    with urllib.request.urlopen('http://www.python.org/') as f:
        print(f.read().decode('utf-8'))
except urllib.error.URLError as e:
    print(e.reason)
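The above works for content served as UTF-8. Note that removing .decode('utf-8') does not make Python guess the encoding: read() returns bytes, and print() then shows their bytes representation (b'...'). To honor the encoding the server declares, read it from the Content-Type header, for example with f.headers.get_content_charset().
Documentation: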
https://docs.python.org/3/library/urllib.request.html#module-urllib.request
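As a minimal sketch of decoding with the charset the response declares rather than a hard-coded one: the names fetch_text and default_encoding are hypothetical, and falling back to UTF-8 when no charset is declared is an assumption, not something the answer above prescribes. The demonstration uses a data: URL so that it runs offline.

```python
from urllib.request import urlopen


def fetch_text(url, default_encoding="utf-8", timeout=10.0):
    """Fetch url and decode the body using the charset the response declares.

    fetch_text and default_encoding are hypothetical names for this sketch.
    """
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()  # always bytes
        # Charset from the Content-Type header, or None if none is declared.
        charset = response.headers.get_content_charset()
    return raw.decode(charset or default_encoding)


# A data: URL keeps the demonstration offline; http(s) URLs are fetched the same way.
text = fetch_text("data:text/plain;charset=iso-8859-1,caf%E9")
print(ascii(text))
```

A page that declares its charset only inside an HTML meta tag would still need separate handling.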
Solution 4:
A solution that works with both Python 2.X and Python 3.X makes use of the Python 2 and 3 compatibility library six:
from six.moves.urllib.request import urlopen
link = "http://www.somesite.com/details.pl?urn=2344"
response = urlopen(link)
content = response.read()
print(content)
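If installing six is not an option, a similar effect can be sketched with a guarded import from the standard library. This is an alternative to the approach above, not what six does internally, and it assumes that urllib2.urlopen is an acceptable Python 2 counterpart.

```python
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2

# urlopen is now bound to whichever implementation was importable.
print(urlopen.__name__)
```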
Solution 5:
#!/usr/bin/python
# -*- coding: utf-8 -*-
# Works on Python 3 and Python 2.
# When the server knows where the request is coming from
# (no custom User-Agent header is needed).
import sys
from contextlib import closing
if sys.version_info[0] == 3:
    from urllib.request import urlopen
else:
    from urllib import urlopen
# closing() is needed because Python 2's urlopen() result is not a context manager
with closing(urlopen('https://www.facebook.com/')) as url:
    data = url.read()
print(data)
# When the server does not know where the request is coming from
# (send a browser-like User-Agent header).
# Works on Python 3.
import urllib.request
user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'
url = 'https://www.facebook.com/'
headers = {'User-Agent': user_agent}
request = urllib.request.Request(url, None, headers)
response = urllib.request.urlopen(request)
data = response.read()
print(data)
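The header handling can be checked without sending anything. The sketch below builds a Request and inspects it; build_request and the USER_AGENT string are hypothetical names and values chosen for illustration.

```python
import urllib.request

# Hypothetical User-Agent string, for illustration only.
USER_AGENT = "Mozilla/5.0 (compatible; example-fetcher/0.1)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request carrying a custom User-Agent (nothing is sent yet)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


request = build_request("https://www.facebook.com/")
# Request stores header names capitalized, so the key is "User-agent".
print(request.get_header("User-agent"))
# To actually send it: urllib.request.urlopen(request).read()
```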
Solution 6:
We can read a website's HTML content as follows:
from urllib.request import urlopen
response = urlopen('http://google.com/')
html = response.read()
print(html)
Solution 7:
from urllib.request import urlopen
# If the page contains Chinese (or other non-ASCII) text, apply decode()
html = urlopen("https://blog.csdn.net/qq_39591494/article/details/83934260").read().decode('utf-8')
print(html)
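Not every Chinese-language page is served as UTF-8, so a hard-coded decode('utf-8') can raise UnicodeDecodeError. The following is a minimal sketch of a fallback; the function name and the candidate list (UTF-8, then GBK) are assumptions for illustration.

```python
def decode_with_fallback(raw, encodings=("utf-8", "gbk")):
    """Try each encoding in order and return the first successful decode.

    The function name and the candidate list are assumptions for this sketch.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: decode anyway, replacing undecodable bytes with U+FFFD.
    return raw.decode(encodings[0], errors="replace")


# "\u4e2d\u6587" is the word for "Chinese"; encoded as GBK it is not valid UTF-8.
sample = "\u4e2d\u6587".encode("gbk")
print(decode_with_fallback(sample) == "\u4e2d\u6587")
```

Trying encodings in order is a heuristic: some byte sequences decode without error under the wrong encoding, so the charset declared by the server is the better source when it is available.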
Solution 8:
import requests
from bs4 import BeautifulSoup
link = "https://www.timeshighereducation.com/hub/sinorbis"
res = requests.get(link)
if res.status_code == 200:
    # BeautifulSoup expects markup (str or bytes), not the Response object
    soup = BeautifulSoup(res.text, 'html.parser')
    # get the text content of the webpage
    text = soup.get_text()
    print(text)
Using BeautifulSoup with its HTML parser, we can extract the text content of the web page.
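Where installing BeautifulSoup is not possible, a rough equivalent can be sketched with the standard library's html.parser. This is a swapped-in technique, not how BeautifulSoup works, and its output will not match soup.get_text() exactly. TextExtractor and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside script/style elements.
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


# Made-up HTML so the example runs offline.
sample_html = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Title</h1><p>Hello <b>world</b></p></body></html>"
)
extractor = TextExtractor()
extractor.feed(sample_html)
extractor.close()
print(extractor.text())
```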
Solution 9:
I used the following code:
# Python 2 only: in Python 3, urlopen() lives in urllib.request
import urllib
def read_text():
    quotes = urllib.urlopen("https://s3.amazonaws.com/udacity-hosted-downloads/ud036/movie_quotes.txt")
    contents_file = quotes.read()
    print contents_file
read_text()
Solution 10:
# retrieving data from url
# only for python 3
import urllib.request
def main():
    url = "http://docs.python.org"
    # retrieving data from URL
    webUrl = urllib.request.urlopen(url)
    print("Result code: " + str(webUrl.getcode()))
    # print data from URL
    print("Returned data: -----------------")
    data = webUrl.read().decode("utf-8")
    print(data)
if __name__ == "__main__":
    main()
Solution 11:
The URL should be a string:
# Python 2 only: in Python 3, urlopen() lives in urllib.request
import urllib
link = "http://www.somesite.com/details.pl?urn=2344"
f = urllib.urlopen(link)
myfile = f.readline()
print myfile
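On the question of whether the URL needs encoding: the URL in the question contains no characters that require escaping, so it can be used as-is. When query values do contain spaces or reserved characters, urllib.parse.urlencode can build the query string. A small Python 3 sketch follows; the q parameter is hypothetical.

```python
from urllib.parse import urlencode

base = "http://www.somesite.com/details.pl"

# urlencode() percent-encodes each value and joins the pairs with "&".
link = base + "?" + urlencode({"urn": 2344})
print(link)

# Hypothetical parameter whose value contains characters that need escaping.
print(urlencode({"q": "a b&c"}))
```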