我正在努力理解Python中的线程。我看过文档和示例,但坦率地说,许多示例过于复杂,我很难理解它们。
如何清楚地显示为多线程划分的任务?
我正在努力理解Python中的线程。我看过文档和示例,但坦率地说,许多示例过于复杂,我很难理解它们。
如何清楚地显示为多线程划分的任务?
当前回答
我发现这非常有用:创建与内核一样多的线程,并让它们执行(大量)任务(在本例中,调用shell程序):
import Queue
import threading
import multiprocessing
import subprocess
q = Queue.Queue()
for i in range(30): # Put 30 tasks in the queue
q.put(i)
def worker():
while True:
item = q.get()
# Execute a task: call a shell program and wait until it completes
subprocess.call("echo " + str(item), shell=True)
q.task_done()
cpus = multiprocessing.cpu_count() # Detect number of cores
print("Creating %d threads" % cpus)
for i in range(cpus):
t = threading.Thread(target=worker)
t.daemon = True
t.start()
q.join() # Block until all tasks are done
其他回答
与其他提到的一样,由于GIL,CPython只能在I/O等待时使用线程。
如果您想从多个内核中获得CPU绑定任务的好处,请使用多处理:
from multiprocessing import Process
def f(name):
print 'hello', name
if __name__ == '__main__':
p = Process(target=f, args=('bob',))
p.start()
p.join()
Alex Martelli的回答对我有所帮助。不过,这里有一个我认为更有用的修改版本(至少对我来说)。
更新:可在Python 2和Python 3中使用
try:
# For Python 3
import queue
from urllib.request import urlopen
except:
# For Python 2
import Queue as queue
from urllib2 import urlopen
import threading
worker_data = ['http://google.com', 'http://yahoo.com', 'http://bing.com']
# Load up a queue with your data. This will handle locking
q = queue.Queue()
for url in worker_data:
q.put(url)
# Define a worker function
def worker(url_queue):
queue_full = True
while queue_full:
try:
# Get your data off the queue, and do some work
url = url_queue.get(False)
data = urlopen(url).read()
print(len(data))
except queue.Empty:
queue_full = False
# Create as many threads as you want
thread_count = 5
for i in range(thread_count):
t = threading.Thread(target=worker, args = (q,))
t.start()
作为第二个anwser的python3版本:
import queue as Queue
import threading
import urllib.request
# Called by each thread
def get_url(q, url):
q.put(urllib.request.urlopen(url).read())
theurls = ["http://google.com", "http://yahoo.com", "http://www.python.org","https://wiki.python.org/moin/"]
q = Queue.Queue()
def thread_func():
for u in theurls:
t = threading.Thread(target=get_url, args = (q,u))
t.daemon = True
t.start()
s = q.get()
def non_thread_func():
for u in theurls:
get_url(q,u)
s = q.get()
您可以测试它:
start = time.time()
thread_func()
end = time.time()
print(end - start)
start = time.time()
non_thread_func()
end = time.time()
print(end - start)
non_thread_func()花费的时间应该是thread_func()的4倍
大多数文档和教程都使用Python的“线程和队列”模块,对于初学者来说,它们可能会让人不知所措。
也许可以考虑Python 3的concurrent.futures.ThreadPoolExecutor模块。
结合子句和列表理解,这可能是一个真正的魅力。
from concurrent.futures import ThreadPoolExecutor, as_completed
def get_url(url):
# Your actual program here. Using threading.Lock() if necessary
return ""
# List of URLs to fetch
urls = ["url1", "url2"]
with ThreadPoolExecutor(max_workers = 5) as executor:
# Create threads
futures = {executor.submit(get_url, url) for url in urls}
# as_completed() gives you the threads once finished
for f in as_completed(futures):
# Get the results
rs = f.result()
使用全新的concurrent.futures模块
def sqr(val):
import time
time.sleep(0.1)
return val * val
def process_result(result):
print(result)
def process_these_asap(tasks):
import concurrent.futures
with concurrent.futures.ProcessPoolExecutor() as executor:
futures = []
for task in tasks:
futures.append(executor.submit(sqr, task))
for future in concurrent.futures.as_completed(futures):
process_result(future.result())
# Or instead of all this just do:
# results = executor.map(sqr, tasks)
# list(map(process_result, results))
def main():
tasks = list(range(10))
print('Processing {} tasks'.format(len(tasks)))
process_these_asap(tasks)
print('Done')
return 0
if __name__ == '__main__':
import sys
sys.exit(main())
执行器方法对于所有以前接触过Java的人来说似乎都很熟悉。
还有一个附带说明:为了保持宇宙的正常,如果你不使用上下文,不要忘记关闭你的池/执行器(这是如此棒,它为你做了)