在地铁应用程序中,我需要执行一些WCF调用。有大量的调用要执行,所以我需要在并行循环中执行它们。问题是并行循环在WCF调用全部完成之前就退出了。
您将如何重构它以使其按预期工作?
var ids = new List<string>() { "1", "2", "3", "4", "5", "6", "7", "8", "9", "10" };
var customers = new System.Collections.Concurrent.BlockingCollection<Customer>();
Parallel.ForEach(ids, async i =>
{
ICustomerRepo repo = new CustomerRepo();
var cust = await repo.GetCustomer(i);
customers.Add(cust);
});
foreach ( var customer in customers )
{
Console.WriteLine(customer.ID);
}
Console.ReadKey();
Parallel.ForEach()背后的整个思想是,您有一组线程,每个线程处理集合的一部分。正如您所注意到的,这在async-await中不起作用,在async调用期间,您希望释放线程。
你可以通过阻塞ForEach()线程来“修复”这个问题,但这就违背了async-await的全部意义。
您可以使用TPL Dataflow而不是Parallel.ForEach(),后者很好地支持异步任务。
具体来说,您的代码可以使用TransformBlock编写,它使用async lambda将每个id转换为Customer。此块可以配置为并行执行。您可以将该块链接到一个ActionBlock,该ActionBlock将每个Customer写入控制台。
在你建立了块网络之后,你可以Post()每个id到TransformBlock。
在代码:
var ids = new List<string> { "1", "2", "3", "4", "5", "6", "7", "8", "9", "10" };
var getCustomerBlock = new TransformBlock<string, Customer>(
async i =>
{
ICustomerRepo repo = new CustomerRepo();
return await repo.GetCustomer(i);
}, new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded
});
var writeCustomerBlock = new ActionBlock<Customer>(c => Console.WriteLine(c.ID));
getCustomerBlock.LinkTo(
writeCustomerBlock, new DataflowLinkOptions
{
PropagateCompletion = true
});
foreach (var id in ids)
getCustomerBlock.Post(id);
getCustomerBlock.Complete();
writeCustomerBlock.Completion.Wait();
尽管您可能希望将TransformBlock的并行度限制为某个小常数。此外,您还可以限制TransformBlock的容量,并使用SendAsync()向其异步添加项目,例如,如果集合太大。
与您的代码相比(如果它工作的话),一个额外的好处是,只要一个项目完成,编写就会开始,而不是等到所有的处理都完成。
斯维克的回答(一如既往)极好。
但是,我发现当您需要传输大量数据时,Dataflow会更有用。或者当您需要异步兼容队列时。
在你的情况下,一个更简单的解决方案是使用异步风格的并行:
var ids = new List<string>() { "1", "2", "3", "4", "5", "6", "7", "8", "9", "10" };
var customerTasks = ids.Select(i =>
{
ICustomerRepo repo = new CustomerRepo();
return repo.GetCustomer(i);
});
var customers = await Task.WhenAll(customerTasks);
foreach (var customer in customers)
{
Console.WriteLine(customer.ID);
}
Console.ReadKey();
这应该是非常有效的,并且比整个TPL数据流工作更容易:
var customers = await ids.SelectAsync(async i =>
{
ICustomerRepo repo = new CustomerRepo();
return await repo.GetCustomer(i);
});
...
public static async Task<IList<TResult>> SelectAsync<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, Task<TResult>> selector, int maxDegreesOfParallelism = 4)
{
var results = new List<TResult>();
var activeTasks = new HashSet<Task<TResult>>();
foreach (var item in source)
{
activeTasks.Add(selector(item));
if (activeTasks.Count >= maxDegreesOfParallelism)
{
var completed = await Task.WhenAny(activeTasks);
activeTasks.Remove(completed);
results.Add(completed.Result);
}
}
results.AddRange(await Task.WhenAll(activeTasks));
return results;
}
在介绍了一堆helper方法之后,你将能够使用以下简单的语法运行并行查询:
const int DegreeOfParallelism = 10;
IEnumerable<double> result = await Enumerable.Range(0, 1000000)
.Split(DegreeOfParallelism)
.SelectManyAsync(async i => await CalculateAsync(i).ConfigureAwait(false))
.ConfigureAwait(false);
这里发生的事情是:我们将源集合分成10个块(. split (DegreeOfParallelism)),然后运行10个任务,每个任务逐个处理它的项(. selectmanyasync(…)),并将它们合并回一个列表。
值得一提的是,有一个更简单的方法:
double[] result2 = await Enumerable.Range(0, 1000000)
.Select(async i => await CalculateAsync(i).ConfigureAwait(false))
.WhenAll()
.ConfigureAwait(false);
但是它需要一个预防措施:如果您有一个太大的源集合,它将立即为每个项目安排一个Task,这可能会导致显著的性能损失。
上面例子中使用的扩展方法如下所示:
public static class CollectionExtensions
{
/// <summary>
/// Splits collection into number of collections of nearly equal size.
/// </summary>
public static IEnumerable<List<T>> Split<T>(this IEnumerable<T> src, int slicesCount)
{
if (slicesCount <= 0) throw new ArgumentOutOfRangeException(nameof(slicesCount));
List<T> source = src.ToList();
var sourceIndex = 0;
for (var targetIndex = 0; targetIndex < slicesCount; targetIndex++)
{
var list = new List<T>();
int itemsLeft = source.Count - targetIndex;
while (slicesCount * list.Count < itemsLeft)
{
list.Add(source[sourceIndex++]);
}
yield return list;
}
}
/// <summary>
/// Takes collection of collections, projects those in parallel and merges results.
/// </summary>
public static async Task<IEnumerable<TResult>> SelectManyAsync<T, TResult>(
this IEnumerable<IEnumerable<T>> source,
Func<T, Task<TResult>> func)
{
List<TResult>[] slices = await source
.Select(async slice => await slice.SelectListAsync(func).ConfigureAwait(false))
.WhenAll()
.ConfigureAwait(false);
return slices.SelectMany(s => s);
}
/// <summary>Runs selector and awaits results.</summary>
public static async Task<List<TResult>> SelectListAsync<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, Task<TResult>> selector)
{
List<TResult> result = new List<TResult>();
foreach (TSource source1 in source)
{
TResult result1 = await selector(source1).ConfigureAwait(false);
result.Add(result1);
}
return result;
}
/// <summary>Wraps tasks with Task.WhenAll.</summary>
public static Task<TResult[]> WhenAll<TResult>(this IEnumerable<Task<TResult>> source)
{
return Task.WhenAll<TResult>(source);
}
}