How can I use requests with asyncio?

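Question:

I want to run parallel HTTP request tasks in asyncio, but I found that python-requests blocks the asyncio event loop. I found aiohttp, but it could not make HTTP requests through an HTTP proxy.

So I want to know whether there is a way to make asynchronous HTTP requests with the help of asyncio.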

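Solution 1:

To use requests (or any other blocking library) with asyncio, you can use BaseEventLoop.run_in_executor to run a function in another thread and yield from it to get the result. For example: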

import asyncio
import requests

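# Legacy generator-based coroutine: @asyncio.coroutine was deprecated in
# Python 3.8 and removed in Python 3.11.
@asyncio.coroutine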
def main():
    loop = asyncio.get_event_loop()
    future1 = loop.run_in_executor(None, requests.get, 'http://www.google.com')
    future2 = loop.run_in_executor(None, requests.get, 'http://www.google.co.uk')
    response1 = yield from future1
    response2 = yield from future2
    print(response1.text)
    print(response2.text)

asyncio.run(main())

This will get both responses in parallel.

With Python 3.5 you can use the new await/async syntax:

import asyncio
import requests

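async def main():
    loop = asyncio.get_event_loop()
    # Submit both requests before awaiting either one; awaiting each call
    # as it is submitted would run the requests one after the other.
    future1 = loop.run_in_executor(None, requests.get, 'http://www.google.com')
    future2 = loop.run_in_executor(None, requests.get, 'http://www.google.co.uk')
    response1 = await future1
    response2 = await future2
    print(response1.text)
    print(response2.text)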

asyncio.run(main())

See PEP 492 for more information.
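
The effect of handing blocking calls to the executor can be checked without touching the network. The following is a minimal sketch, assuming a hypothetical `blocking_fetch` function that stands in for `requests.get` by sleeping for a fixed delay; the URLs are never contacted.

```python
import asyncio
import time


def blocking_fetch(url, delay=0.2):
    """Hypothetical stand-in for requests.get: blocks the calling thread."""
    time.sleep(delay)
    return f"response from {url}"


async def fetch_both():
    loop = asyncio.get_running_loop()
    start = time.perf_counter()
    # Both calls are submitted to the default thread pool right away,
    # so the two sleeps overlap instead of running back to back.
    future1 = loop.run_in_executor(None, blocking_fetch, "http://www.google.com")
    future2 = loop.run_in_executor(None, blocking_fetch, "http://www.google.co.uk")
    responses = [await future1, await future2]
    elapsed = time.perf_counter() - start
    return responses, elapsed


if __name__ == "__main__":
    responses, elapsed = asyncio.run(fetch_both())
    print(responses)
    print(f"elapsed: {elapsed:.2f}s")  # close to one delay, not two
```

Because `run_in_executor` returns a future that is already scheduled, the order in which the two futures are awaited does not affect the total time.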

Solution 2:

aiohttp can already be used with an HTTP proxy:

import asyncio
import aiohttp


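async def do_request():
    proxy_url = 'http://localhost:8118'  # your proxy address
    # The proxy is passed per request; the session and the response are
    # closed automatically by the async context managers.
    async with aiohttp.ClientSession() as session:
        async with session.get('http://google.com', proxy=proxy_url) as response:
            return await response.text()

asyncio.run(do_request())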

Solution 3:

The answers above still use the old Python 3.4-style coroutines. Here is what you would write with Python 3.5+.

aiohttp now supports HTTP proxies.

import aiohttp
import asyncio

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    urls = [
            'http://python.org',
            'https://google.com',
            'http://yifei.me'
        ]
    tasks = []
    async with aiohttp.ClientSession() as session:
        for url in urls:
            tasks.append(fetch(session, url))
        htmls = await asyncio.gather(*tasks)
        for html in htmls:
            print(html[:100])

if __name__ == '__main__':
    # asyncio.run (Python 3.7+) creates and closes the event loop for us.
    asyncio.run(main())

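There is also the httpx library, which is a drop-in replacement for requests with async/await support. However, httpx is somewhat slower than aiohttp.

Another option is curl_cffi, which can impersonate browsers' JA3 and HTTP/2 fingerprints.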

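Solution 4:

Requests does not currently support asyncio, and there are no plans to provide such support. It is likely that you could implement a custom "Transport Adapter" (as discussed here) that knows how to use asyncio.

If I find myself with some time, it is something I might actually look into, but I can't promise anything.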

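Solution 5:

There is a good treatment of async/await loops and threading in the article "Easy parallel HTTP requests with Python and asyncio" by Pimin Konstantin Kefaloukos:

To minimize the total completion time, we could increase the size of the thread pool to match the number of requests we have to make. Luckily, this is easy to do, as we will see next. The code listing below is an example of how to make twenty asynchronous HTTP requests with a thread pool of twenty worker threads: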

# Example 3: asynchronous requests with larger thread pool
import asyncio
import concurrent.futures
import requests

async def main():

    with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:

        loop = asyncio.get_event_loop()
        futures = [
            loop.run_in_executor(
                executor, 
                requests.get, 
                'http://example.org/'
            )
            for i in range(20)
        ]
        for response in await asyncio.gather(*futures):
            pass


# asyncio.run (Python 3.7+) replaces the manual get_event_loop /
# run_until_complete pair.
asyncio.run(main())
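
To see why the pool size matters, the comparison below times the same batch on a small and a large pool. It is only a sketch: `blocking_call` is a hypothetical stand-in for `requests.get('http://example.org/')` that sleeps instead of doing I/O, and the worker counts (5 and 20) are chosen purely for illustration.

```python
import asyncio
import concurrent.futures
import time


def blocking_call(delay=0.1):
    """Hypothetical stand-in for requests.get('http://example.org/')."""
    time.sleep(delay)
    return delay


async def timed_batch(n_requests, max_workers):
    """Run n_requests blocking calls on a pool of max_workers threads."""
    loop = asyncio.get_running_loop()
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [
            loop.run_in_executor(executor, blocking_call)
            for _ in range(n_requests)
        ]
        await asyncio.gather(*futures)
    return time.perf_counter() - start


async def main():
    # 20 calls on 5 threads need about four rounds of waiting.
    small = await timed_batch(n_requests=20, max_workers=5)
    # 20 calls on 20 threads can all wait at the same time.
    large = await timed_batch(n_requests=20, max_workers=20)
    return small, large


if __name__ == "__main__":
    small, large = asyncio.run(main())
    print(f"5 workers: {small:.2f}s, 20 workers: {large:.2f}s")
```

With fewer workers than requests, the extra calls queue inside the executor until a thread becomes free, which is what stretches the total time.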

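Solution 6:

Considering that aiohttp is a fully featured web framework, I would suggest using something more lightweight such as httpx ( https://www.python-httpx.org/ ), which supports async requests. It has an API almost identical to that of requests: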

>>> async with httpx.AsyncClient() as client:
...     r = await client.get('https://www.example.com/')
...
>>> r
<Response [200 OK]>

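Solution 7:

python-requests does not natively support asyncio yet. Going with a library that natively supports asyncio, such as httpx, would be the most beneficial approach.

However, if your use case relies heavily on python-requests, you can wrap the synchronous calls in asyncio.to_thread and asyncio.gather and follow the usual asyncio programming patterns.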

import asyncio
import requests

async def main():
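    # Pass the function and its argument separately; writing
    # requests.get("YOUR_URL") here would block before to_thread runs.
    res = await asyncio.gather(asyncio.to_thread(requests.get, "YOUR_URL"))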

if __name__ == "__main__":
    asyncio.run(main())

To run several network requests concurrently:

import asyncio
import requests

urls = ["URL_1", "URL_2"]

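async def make_request(url: str):
    # to_thread takes the callable and its arguments separately.
    response = await asyncio.to_thread(requests.get, url)
    return response

async def main():
    # gather expects the awaitables as separate arguments, hence the *.
    responses = await asyncio.gather(*(make_request(url) for url in urls))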
    for response in responses:
        print(response)

if __name__ == "__main__":
    asyncio.run(main())
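
One detail worth spelling out: `asyncio.to_thread` returns a coroutine, and a coroutine does nothing until it is awaited or scheduled. The sketch below contrasts awaiting the calls one by one with passing them all to `asyncio.gather`. It requires Python 3.9+, and `blocking_get` is a hypothetical stand-in for `requests.get` that only sleeps.

```python
import asyncio
import time


def blocking_get(url, delay=0.2):
    """Hypothetical stand-in for requests.get(url)."""
    time.sleep(delay)
    return url


async def one_by_one(urls):
    start = time.perf_counter()
    # Each to_thread coroutine starts only when awaited, so these run in turn.
    results = [await asyncio.to_thread(blocking_get, url) for url in urls]
    return results, time.perf_counter() - start


async def all_at_once(urls):
    start = time.perf_counter()
    # gather schedules every coroutine before waiting, so the calls overlap.
    results = await asyncio.gather(
        *(asyncio.to_thread(blocking_get, url) for url in urls)
    )
    return results, time.perf_counter() - start


if __name__ == "__main__":
    urls = ["URL_1", "URL_2", "URL_3"]
    _, sequential = asyncio.run(one_by_one(urls))
    _, concurrent = asyncio.run(all_at_once(urls))
    print(f"one by one: {sequential:.2f}s, gathered: {concurrent:.2f}s")
```

Both versions return the results in the order of `urls`; only the elapsed time differs.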

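Solution 8:

Disclaimer: the following code runs each decorated function call in a separate thread.

This can be useful in some cases because it is simpler to use. Be aware, though, that it is not truly asynchronous: it gives the illusion of async by using multiple threads, even though the decorator's name suggests otherwise.

To make any function non-blocking, simply copy the decorator and apply it to the function, passing a callback function as the argument. The callback will receive the data returned by the function.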

import asyncio
import requests


def run_async(callback):
    def inner(func):
        def wrapper(*args, **kwargs):
            def __exec():
                out = func(*args, **kwargs)
                callback(out)
                return out

            return asyncio.get_event_loop().run_in_executor(None, __exec)

        return wrapper

    return inner


def _callback(*args):
    print(args)


# Must provide a callback function, callback func will be executed after the func completes execution !!
@run_async(_callback)
def get(url):
    return requests.get(url)


get("https://google.com")
print("Non blocking code ran !!")

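Solution 9:

Requests does not support asyncio. You can use aiohttp instead, since aiohttp fully supports asyncio and performs better than requests.

Alternatively, you can use requests with traditional multithreading: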

import concurrent.futures
import requests

def main():
    with concurrent.futures.ThreadPoolExecutor() as executor:
        future1 = executor.submit(requests.get, 'http://www.google.com')
        future2 = executor.submit(requests.get, 'http://www.google.co.uk')
        print(future1.result().text)
        print(future2.result().text)

main()

You can use loop.run_in_executor to integrate an executor with asyncio. The code above is semantically equivalent to:

import asyncio
import requests

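# Legacy generator-based coroutine: @asyncio.coroutine was deprecated in
# Python 3.8 and removed in Python 3.11.
@asyncio.coroutine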
def main():
    loop = asyncio.get_event_loop()
    future1 = loop.run_in_executor(None, requests.get, 'http://www.google.com')
    future2 = loop.run_in_executor(None, requests.get, 'http://www.google.co.uk')
    response1 = yield from future1
    response2 = yield from future2
    print(response1.text)
    print(response2.text)

asyncio.run(main())

With this approach, you can use any other blocking library with asyncio.

With Python 3.5+, you can use the new await/async syntax:

import asyncio
import requests

async def main():
    loop = asyncio.get_event_loop()
    future1 = loop.run_in_executor(None, requests.get, 'http://www.google.com')
    future2 = loop.run_in_executor(None, requests.get, 'http://www.google.co.uk')
    print((await future1).text)
    print((await future2).text)

asyncio.run(main())

See PEP 492 for more information.

With Python 3.9+, it is even simpler using asyncio.to_thread:

import asyncio
import requests

async def main():
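    # to_thread returns a coroutine that starts only once it is awaited or
    # scheduled, so gather is used to run both requests at the same time.
    response1, response2 = await asyncio.gather(
        asyncio.to_thread(requests.get, 'http://www.google.com'),
        asyncio.to_thread(requests.get, 'http://www.google.co.uk'),
    )
    print(response1.text)
    print(response2.text)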

asyncio.run(main())

asyncio.to_thread has another advantage: it accepts keyword arguments, while loop.run_in_executor does not.
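
The difference in keyword-argument handling can be illustrated as follows. This is a sketch under assumptions: `blocking_get` is a hypothetical stand-in for `requests.get` that simply echoes its arguments, and `functools.partial` is one common way to pre-bind keyword arguments for `run_in_executor`.

```python
import asyncio
import functools


def blocking_get(url, timeout=None, headers=None):
    """Hypothetical stand-in for requests.get; echoes its arguments, no I/O."""
    return {"url": url, "timeout": timeout, "headers": headers}


async def main():
    loop = asyncio.get_running_loop()

    # run_in_executor forwards positional arguments only, so keyword
    # arguments are bound in advance with functools.partial.
    call = functools.partial(blocking_get, "http://www.google.com", timeout=5)
    via_executor = await loop.run_in_executor(None, call)

    # asyncio.to_thread (Python 3.9+) forwards keyword arguments directly.
    via_to_thread = await asyncio.to_thread(
        blocking_get, "http://www.google.com", timeout=5
    )
    return via_executor, via_to_thread


if __name__ == "__main__":
    via_executor, via_to_thread = asyncio.run(main())
    print(via_executor == via_to_thread)
```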

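Keep in mind that all of the code above actually uses multithreading under the hood rather than asyncio itself, so consider using an asynchronous HTTP client such as aiohttp to achieve true asynchrony.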
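
The distinction can be made visible by recording which thread each piece of work runs on. In this sketch (Python 3.9+), `blocking_work` and `native_coroutine` are hypothetical placeholders for a blocking request and an async-client request respectively; neither performs any I/O.

```python
import asyncio
import threading


def blocking_work():
    """Runs in a worker thread when called through asyncio.to_thread."""
    return threading.current_thread().name


async def native_coroutine():
    """Runs on the event loop's own thread."""
    await asyncio.sleep(0)  # yield to the event loop once
    return threading.current_thread().name


async def main():
    loop_thread = threading.current_thread().name
    threaded = await asyncio.gather(
        *(asyncio.to_thread(blocking_work) for _ in range(3))
    )
    native = await asyncio.gather(*(native_coroutine() for _ in range(3)))
    return loop_thread, threaded, native


if __name__ == "__main__":
    loop_thread, threaded, native = asyncio.run(main())
    print("event loop thread:", loop_thread)
    print("to_thread calls ran on:", threaded)
    print("coroutines ran on:", native)
```

The coroutines all report the event loop's thread, while the `to_thread` calls report worker threads from the default executor.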