我正在努力理解Python中的线程。我看过文档和示例,但坦率地说,许多示例过于复杂,我很难理解它们。
如何清楚地显示为多线程划分的任务?
我正在努力理解Python中的线程。我看过文档和示例,但坦率地说,许多示例过于复杂,我很难理解它们。
如何清楚地显示为多线程划分的任务?
当前回答
作为第二个anwser的python3版本:
import queue as Queue
import threading
import urllib.request
# Called by each thread
def get_url(q, url):
q.put(urllib.request.urlopen(url).read())
theurls = ["http://google.com", "http://yahoo.com", "http://www.python.org","https://wiki.python.org/moin/"]
q = Queue.Queue()
def thread_func():
for u in theurls:
t = threading.Thread(target=get_url, args = (q,u))
t.daemon = True
t.start()
s = q.get()
def non_thread_func():
for u in theurls:
get_url(q,u)
s = q.get()
您可以测试它:
start = time.time()
thread_func()
end = time.time()
print(end - start)
start = time.time()
non_thread_func()
end = time.time()
print(end - start)
non_thread_func()花费的时间应该是thread_func()的4倍
其他回答
我在这里看到了很多没有执行实际工作的示例,它们大多是CPU限制的。这里是一个CPU绑定任务的示例,它计算1000万到1005万之间的所有素数。我在这里使用了所有四种方法:
import math
import timeit
import threading
import multiprocessing
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
def time_stuff(fn):
"""
Measure time of execution of a function
"""
def wrapper(*args, **kwargs):
t0 = timeit.default_timer()
fn(*args, **kwargs)
t1 = timeit.default_timer()
print("{} seconds".format(t1 - t0))
return wrapper
def find_primes_in(nmin, nmax):
"""
Compute a list of prime numbers between the given minimum and maximum arguments
"""
primes = []
# Loop from minimum to maximum
for current in range(nmin, nmax + 1):
# Take the square root of the current number
sqrt_n = int(math.sqrt(current))
found = False
# Check if the any number from 2 to the square root + 1 divides the current numnber under consideration
for number in range(2, sqrt_n + 1):
# If divisible we have found a factor, hence this is not a prime number, lets move to the next one
if current % number == 0:
found = True
break
# If not divisible, add this number to the list of primes that we have found so far
if not found:
primes.append(current)
# I am merely printing the length of the array containing all the primes, but feel free to do what you want
print(len(primes))
@time_stuff
def sequential_prime_finder(nmin, nmax):
"""
Use the main process and main thread to compute everything in this case
"""
find_primes_in(nmin, nmax)
@time_stuff
def threading_prime_finder(nmin, nmax):
"""
If the minimum is 1000 and the maximum is 2000 and we have four workers,
1000 - 1250 to worker 1
1250 - 1500 to worker 2
1500 - 1750 to worker 3
1750 - 2000 to worker 4
so let’s split the minimum and maximum values according to the number of workers
"""
nrange = nmax - nmin
threads = []
for i in range(8):
start = int(nmin + i * nrange/8)
end = int(nmin + (i + 1) * nrange/8)
# Start the thread with the minimum and maximum split up to compute
# Parallel computation will not work here due to the GIL since this is a CPU-bound task
t = threading.Thread(target = find_primes_in, args = (start, end))
threads.append(t)
t.start()
# Don’t forget to wait for the threads to finish
for t in threads:
t.join()
@time_stuff
def processing_prime_finder(nmin, nmax):
"""
Split the minimum, maximum interval similar to the threading method above, but use processes this time
"""
nrange = nmax - nmin
processes = []
for i in range(8):
start = int(nmin + i * nrange/8)
end = int(nmin + (i + 1) * nrange/8)
p = multiprocessing.Process(target = find_primes_in, args = (start, end))
processes.append(p)
p.start()
for p in processes:
p.join()
@time_stuff
def thread_executor_prime_finder(nmin, nmax):
"""
Split the min max interval similar to the threading method, but use a thread pool executor this time.
This method is slightly faster than using pure threading as the pools manage threads more efficiently.
This method is still slow due to the GIL limitations since we are doing a CPU-bound task.
"""
nrange = nmax - nmin
with ThreadPoolExecutor(max_workers = 8) as e:
for i in range(8):
start = int(nmin + i * nrange/8)
end = int(nmin + (i + 1) * nrange/8)
e.submit(find_primes_in, start, end)
@time_stuff
def process_executor_prime_finder(nmin, nmax):
"""
Split the min max interval similar to the threading method, but use the process pool executor.
This is the fastest method recorded so far as it manages process efficiently + overcomes GIL limitations.
RECOMMENDED METHOD FOR CPU-BOUND TASKS
"""
nrange = nmax - nmin
with ProcessPoolExecutor(max_workers = 8) as e:
for i in range(8):
start = int(nmin + i * nrange/8)
end = int(nmin + (i + 1) * nrange/8)
e.submit(find_primes_in, start, end)
def main():
nmin = int(1e7)
nmax = int(1.05e7)
print("Sequential Prime Finder Starting")
sequential_prime_finder(nmin, nmax)
print("Threading Prime Finder Starting")
threading_prime_finder(nmin, nmax)
print("Processing Prime Finder Starting")
processing_prime_finder(nmin, nmax)
print("Thread Executor Prime Finder Starting")
thread_executor_prime_finder(nmin, nmax)
print("Process Executor Finder Starting")
process_executor_prime_finder(nmin, nmax)
if __name__ == "__main__":
main()
以下是我的Mac OS X四核计算机的结果
Sequential Prime Finder Starting
9.708213827005238 seconds
Threading Prime Finder Starting
9.81836523200036 seconds
Processing Prime Finder Starting
3.2467174359990167 seconds
Thread Executor Prime Finder Starting
10.228896902000997 seconds
Process Executor Finder Starting
2.656402041000547 seconds
与其他提到的一样,由于GIL,CPython只能在I/O等待时使用线程。
如果您想从多个内核中获得CPU绑定任务的好处,请使用多处理:
from multiprocessing import Process
def f(name):
print 'hello', name
if __name__ == '__main__':
p = Process(target=f, args=('bob',))
p.start()
p.join()
以前的解决方案都没有在我的GNU/Linux服务器上使用多个内核(我没有管理员权限)。他们只是在一个核心上跑步。
我使用较低级别的os.fork接口来派生多个进程。这是对我有用的代码:
from os import fork
values = ['different', 'values', 'for', 'threads']
for i in range(len(values)):
p = fork()
if p == 0:
my_function(values[i])
break
大多数文档和教程都使用Python的“线程和队列”模块,对于初学者来说,它们可能会让人不知所措。
也许可以考虑Python 3的concurrent.futures.ThreadPoolExecutor模块。
结合子句和列表理解,这可能是一个真正的魅力。
from concurrent.futures import ThreadPoolExecutor, as_completed
def get_url(url):
# Your actual program here. Using threading.Lock() if necessary
return ""
# List of URLs to fetch
urls = ["url1", "url2"]
with ThreadPoolExecutor(max_workers = 5) as executor:
# Create threads
futures = {executor.submit(get_url, url) for url in urls}
# as_completed() gives you the threads once finished
for f in as_completed(futures):
# Get the results
rs = f.result()
作为第二个anwser的python3版本:
import queue as Queue
import threading
import urllib.request
# Called by each thread
def get_url(q, url):
q.put(urllib.request.urlopen(url).read())
theurls = ["http://google.com", "http://yahoo.com", "http://www.python.org","https://wiki.python.org/moin/"]
q = Queue.Queue()
def thread_func():
for u in theurls:
t = threading.Thread(target=get_url, args = (q,u))
t.daemon = True
t.start()
s = q.get()
def non_thread_func():
for u in theurls:
get_url(q,u)
s = q.get()
您可以测试它:
start = time.time()
thread_func()
end = time.time()
print(end - start)
start = time.time()
non_thread_func()
end = time.time()
print(end - start)
non_thread_func()花费的时间应该是thread_func()的4倍