Python requests timeout: getting the entire response

I'm gathering statistics on a list of websites, and I'm using requests for simplicity. Here is my code:

import requests

data = []
websites = ['http://google.com', 'http://bbc.co.uk']
for w in websites:
    r = requests.get(w, verify=False)
    data.append( (r.url, len(r.content), r.elapsed.total_seconds(), str([(l.status_code, l.url) for l in r.history]), str(r.headers.items()), str(r.cookies.items())) )

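Now, I want requests.get to time out after 10 seconds so the loop doesn't get stuck.

This question has come up before, but none of the answers are clean.

I hear that maybe not using requests is a good idea, but then how would I get the nice things requests offers (the ones in the tuple)?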

Current answer

If you use the option stream=True, you can do this:

import time
import requests

r = requests.get(
    'http://url_to_large_file',
    timeout=1,  # relevant only for underlying socket
    stream=True)

with open('/tmp/out_file.txt', 'wb') as f:
    start_time = time.time()
    for chunk in r.iter_content(chunk_size=1024):
        if chunk:  # filter out keep-alive new chunks
            f.write(chunk)
        # abort once the whole download has taken more than 8 seconds
        if time.time() - start_time > 8:
            raise Exception('Request took longer than 8s')

This solution needs neither signals nor multiprocessing.
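The same idea can be pulled into a small helper, sketched here under some assumptions: read_with_deadline is a hypothetical name (not part of requests), and it accepts any iterable of byte chunks, such as r.iter_content(chunk_size=1024). Note that the deadline is only checked when a chunk arrives, so a server that stalls completely is still bounded only by the socket timeout passed to requests.get.

```python
import time


def read_with_deadline(chunks, max_seconds, clock=time.monotonic):
    """Join byte chunks from an iterable, giving up after max_seconds.

    `chunks` is assumed to be something like r.iter_content(chunk_size=1024).
    """
    deadline = clock() + max_seconds
    body = bytearray()
    for chunk in chunks:
        body.extend(chunk)  # empty keep-alive chunks add nothing
        # Checked only after a chunk arrives; a silent server is caught
        # by the socket-level timeout instead.
        if clock() > deadline:
            raise TimeoutError(
                'Download took longer than {} seconds'.format(max_seconds))
    return bytes(body)
```

With a streamed response this would be called as read_with_deadline(r.iter_content(chunk_size=1024), 8).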

2018-04-24 13:17:41

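Other answers

If it comes to that, create a watchdog thread that messes up requests' internal state after 10 seconds, e.g.:

closes the underlying socket, and ideally triggers an exception if requests retries the operation

Note that, depending on the system libraries, you may be unable to set a deadline on DNS resolution.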
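A simpler relative of the watchdog idea, sketched with the standard library only: rather than closing the socket from a second thread, the caller waits on a worker thread for a limited time. This is a different technique from the one described above, and call_with_deadline is a hypothetical helper name. The main caveat is that the abandoned worker is a daemon thread that keeps running until its own call returns; the socket is not closed, only the loop is unblocked.

```python
import threading


def call_with_deadline(func, seconds, *args, **kwargs):
    """Run func(*args, **kwargs) in a daemon thread, waiting at most `seconds`."""
    outcome = {}

    def worker():
        try:
            outcome['value'] = func(*args, **kwargs)
        except Exception as exc:  # forward the error to the caller
            outcome['error'] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(seconds)  # returns after `seconds` even if worker is busy
    if thread.is_alive():
        # The worker is left running in the background; it is NOT killed.
        raise TimeoutError('call exceeded {} seconds'.format(seconds))
    if 'error' in outcome:
        raise outcome['error']
    return outcome['value']
```

In the question's loop this could look like r = call_with_deadline(requests.get, 10, w, verify=False), with the TimeoutError caught around it.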

2014-02-25 22:40:31

There is a package called timeout-decorator that you can use to time out any Python function.

import time
import timeout_decorator

@timeout_decorator.timeout(5)
def mytest():
    print("Start")
    for i in range(1,10):
        time.sleep(1)
        print("{} seconds have passed".format(i))

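It uses the signals approach that some answers here suggest. Alternatively, you can tell it to use multiprocessing instead of signals (e.g. if you are in a multi-threaded environment).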
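For reference, the signals approach itself can be sketched with the standard library alone. This is not timeout-decorator's actual implementation, only an illustration under stated assumptions: time_limit is a hypothetical name, and the sketch works only on Unix (it needs SIGALRM) and only when called from the main thread.

```python
import signal
from contextlib import contextmanager


@contextmanager
def time_limit(seconds):
    """Raise TimeoutError in the main thread if the block runs too long."""
    def on_alarm(signum, frame):
        raise TimeoutError('timed out after {} seconds'.format(seconds))

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)  # arm a one-shot timer
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)      # disarm the timer
        signal.signal(signal.SIGALRM, previous)      # restore old handler
```

A request would then be wrapped as: with time_limit(10): r = requests.get(w, verify=False). Because signal handlers run only in the main thread, this does not help inside worker threads, which is where the multiprocessing option comes in.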

2018-09-13 19:37:46

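This may be overkill, but the Celery distributed task queue has good support for timeouts.

In particular, you can define a soft time limit that just raises an exception in your process (so you can clean up) and/or a hard time limit that terminates the task when the time limit has been exceeded.

Under the covers, this uses the same signals approach as referenced in your "before" post, but in a more usable and manageable way. And if the list of websites you are monitoring is long, you might benefit from its primary feature: all kinds of ways to manage the execution of a large number of tasks.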

2014-02-27 05:47:58

Set stream=True and use r.iter_content(1024). Yes, eventlet.Timeout just somehow doesn't work for me.

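from time import time
from requests import get, exceptions

# `config` is assumed to be a dict-like object defined elsewhere
try: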
    start = time()
    timeout = 5
    with get(config['source']['online'], stream=True, timeout=timeout) as r:
        r.raise_for_status()
        content = bytes()
        content_gen = r.iter_content(1024)
        while True:
            if time()-start > timeout:
                raise TimeoutError('Time out! ({} seconds)'.format(timeout))
            try:
                content += next(content_gen)
            except StopIteration:
                break
        data = content.decode().split('\n')
        if len(data) in [0, 1]:
            raise ValueError('Bad requests data')
except (exceptions.RequestException, ValueError, IndexError, KeyboardInterrupt,
        TimeoutError) as e:
    print(e)
    with open(config['source']['local']) as f:
        data = [line.strip() for line in f.readlines()]

Discussion here: https://redd.it/80kp1h

2018-02-28 03:28:33

Update: https://requests.readthedocs.io/en/master/user/advanced/#timeouts

In newer versions of requests:

If you specify a single value for the timeout, like this:

r = requests.get('https://github.com', timeout=5)

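The timeout value will be applied to both the connect and the read timeouts. Specify a tuple if you would like to set the values separately: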

r = requests.get('https://github.com', timeout=(3.05, 27))

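If the remote server is very slow, you can tell Requests to wait forever for a response, by passing None as a timeout value and then retrieving a cup of coffee.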

r = requests.get('https://github.com', timeout=None)
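Applied to the loop from the question, a hedged sketch might look like the following. collect_stats is a hypothetical name, and the get argument is injected (pass requests.get in practice) so the sketch can be tried without network access. It relies on requests.exceptions.RequestException, including its Timeout subclasses, deriving from IOError, which is OSError in Python 3. Keep in mind that this timeout bounds the connect phase and each individual read, not the total download time.

```python
def collect_stats(websites, get, timeout=(3.05, 10)):
    """Fetch each URL with a (connect, read) timeout and skip failures.

    `get` is assumed to behave like requests.get.
    """
    data, failed = [], []
    for url in websites:
        try:
            r = get(url, timeout=timeout)
        except OSError as exc:
            # Covers requests' timeout and connection errors, which
            # derive from IOError/OSError.
            failed.append((url, repr(exc)))
            continue
        data.append((r.url, len(r.content), r.elapsed.total_seconds()))
    return data, failed
```

Usage would be data, failed = collect_stats(websites, requests.get), after which the loop no longer hangs on a host that never answers.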

My old (probably outdated) answer, posted a long time ago:

There are other ways to overcome this problem:

1. Use the TimeoutSauce internal class

From: https://github.com/kennethreitz/requests/issues/1928#issuecomment-35811896

import requests
from requests.adapters import TimeoutSauce

class MyTimeout(TimeoutSauce):
    def __init__(self, *args, **kwargs):
        connect = kwargs.get('connect', 5)
        read = kwargs.get('read', connect)
        super(MyTimeout, self).__init__(connect=connect, read=read)

requests.adapters.TimeoutSauce = MyTimeout

This code should cause us to set the read timeout as equal to the connect timeout, which is the timeout value you pass on your Session.get() call. (Note that I haven't actually tested this code, so it may need some quick debugging, I just wrote it straight into the GitHub window.)

2. Use a fork of requests from kevinburke: https://github.com/kevinburke/requests/tree/connect-timeout

From its documentation: https://github.com/kevinburke/requests/blob/connect-timeout/docs/user/advanced.rst

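If you specify a single value for the timeout, like this:

r = requests.get('https://github.com', timeout=5)

The timeout value will be applied to both the connect and the read timeouts. Specify a tuple if you would like to set the values separately:

r = requests.get('https://github.com', timeout=(3.05, 27))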

kevinburke has requested that it be merged into the main requests project, but it has not been accepted yet.

2014-03-13 11:42:11
