我有一份人们的身份证和名字的名单,还有一份人们的身份证和姓氏的名单。有些人没有名字,有些人没有姓;我想在这两个列表上做一个完整的外部连接。

下面列出:

ID  FirstName
--  ---------
 1  John
 2  Sue

ID  LastName
--  --------
 1  Doe
 3  Smith

应该生产:

ID  FirstName  LastName
--  ---------  --------
 1  John       Doe
 2  Sue
 3             Smith

我已经发现了相当多的解决方案的“LINQ外部连接”,它们看起来都很相似,但似乎真的是离开外部连接。

到目前为止我的尝试是这样的:

private void OuterJoinTest()
{
    List<FirstName> firstNames = new List<FirstName>();
    firstNames.Add(new FirstName { ID = 1, Name = "John" });
    firstNames.Add(new FirstName { ID = 2, Name = "Sue" });

    List<LastName> lastNames = new List<LastName>();
    lastNames.Add(new LastName { ID = 1, Name = "Doe" });
    lastNames.Add(new LastName { ID = 3, Name = "Smith" });

    var outerJoin = from first in firstNames
        join last in lastNames
        on first.ID equals last.ID
        into temp
        from last in temp.DefaultIfEmpty()
        select new
        {
            id = first != null ? first.ID : last.ID,
            firstname = first != null ? first.Name : string.Empty,
            surname = last != null ? last.Name : string.Empty
        };
    }
}

public class FirstName
{
    public int ID;
    
    public string Name;
}
    
public class LastName
{
    public int ID;
    
    public string Name;
}

但结果是:

ID  FirstName  LastName
--  ---------  --------
 1  John       Doe
 2  Sue

我做错了什么?


当前回答

谢谢大家的有趣帖子!

我修改了代码,因为在我的情况下我需要

个性化的连接谓词 一个个性化的联合,独特的比较

这是我修改过的代码(不好意思,用VB写的)

    Module MyExtensions
        <Extension()>
        Friend Function FullOuterJoin(Of TA, TB, TResult)(ByVal a As IEnumerable(Of TA), ByVal b As IEnumerable(Of TB), ByVal joinPredicate As Func(Of TA, TB, Boolean), ByVal projection As Func(Of TA, TB, TResult), ByVal comparer As IEqualityComparer(Of TResult)) As IEnumerable(Of TResult)
            Dim joinL =
                From xa In a
                From xb In b.Where(Function(x) joinPredicate(xa, x)).DefaultIfEmpty()
                Select projection(xa, xb)
            Dim joinR =
                From xb In b
                From xa In a.Where(Function(x) joinPredicate(x, xb)).DefaultIfEmpty()
                Select projection(xa, xb)
            Return joinL.Union(joinR, comparer)
        End Function
    End Module

    Dim fullOuterJoin = lefts.FullOuterJoin(
        rights,
        Function(left, right) left.Code = right.Code And (left.Amount [...] Or left.Description.Contains [...]),
        Function(left, right) New CompareResult(left, right),
        New MyEqualityComparer
    )

    Public Class MyEqualityComparer
        Implements IEqualityComparer(Of CompareResult)

        Private Function GetMsg(obj As CompareResult) As String
            Dim msg As String = ""
            msg &= obj.Code & "_"
            [...]
            Return msg
        End Function

        Public Overloads Function Equals(x As CompareResult, y As CompareResult) As Boolean Implements IEqualityComparer(Of CompareResult).Equals
            Return Me.GetMsg(x) = Me.GetMsg(y)
        End Function

        Public Overloads Function GetHashCode(obj As CompareResult) As Integer Implements IEqualityComparer(Of CompareResult).GetHashCode
            Return Me.GetMsg(obj).GetHashCode
        End Function
    End Class

其他回答

两个或多个表的完全外部连接: 首先提取要连接的列。

var DatesA = from A in db.T1 select A.Date; 
var DatesB = from B in db.T2 select B.Date; 
var DatesC = from C in db.T3 select C.Date;            

var Dates = DatesA.Union(DatesB).Union(DatesC); 

然后在提取的列和主表之间使用左外连接。

var Full_Outer_Join =

(from A in Dates
join B in db.T1
on A equals B.Date into AB 

from ab in AB.DefaultIfEmpty()
join C in db.T2
on A equals C.Date into ABC 

from abc in ABC.DefaultIfEmpty()
join D in db.T3
on A equals D.Date into ABCD

from abcd in ABCD.DefaultIfEmpty() 
select new { A, ab, abc, abcd })
.AsEnumerable();

正如您所发现的,Linq没有“外部连接”结构。您所能得到的最接近的是使用您所声明的查询的左外连接。为此,你可以添加任何没有在join中表示的姓氏列表元素:

outerJoin = outerJoin.Concat(lastNames.Select(l=>new
                            {
                                id = l.ID,
                                firstname = String.Empty,
                                surname = l.Name
                            }).Where(l=>!outerJoin.Any(o=>o.id == l.id)));

更新1:提供一个真正通用的扩展方法FullOuterJoin 更新2:可选地接受键类型的自定义IEqualityComparer 更新3:这个实现最近已经成为MoreLinq的一部分-谢谢大家!

编辑新增FullOuterGroupJoin (ideone)。我重用了GetOuter<>实现,使它的性能比它可能的要低一些,但我现在的目标是“高级”代码,而不是前沿优化。

请登录http://ideone.com/O36nWc观看直播

static void Main(string[] args)
{
    var ax = new[] { 
        new { id = 1, name = "John" },
        new { id = 2, name = "Sue" } };
    var bx = new[] { 
        new { id = 1, surname = "Doe" },
        new { id = 3, surname = "Smith" } };

    ax.FullOuterJoin(bx, a => a.id, b => b.id, (a, b, id) => new {a, b})
        .ToList().ForEach(Console.WriteLine);
}

打印输出:

{ a = { id = 1, name = John }, b = { id = 1, surname = Doe } }
{ a = { id = 2, name = Sue }, b =  }
{ a = , b = { id = 3, surname = Smith } }

您还可以提供默认值:http://ideone.com/kG4kqO

    ax.FullOuterJoin(
            bx, a => a.id, b => b.id, 
            (a, b, id) => new { a.name, b.surname },
            new { id = -1, name    = "(no firstname)" },
            new { id = -2, surname = "(no surname)" }
        )

印刷:

{ name = John, surname = Doe }
{ name = Sue, surname = (no surname) }
{ name = (no firstname), surname = Smith }

使用术语的解释:

连接是一个借用自关系数据库设计的术语:

只要b中有对应键的元素,连接就会重复A中的元素(即:如果b为空,则什么都没有)。数据库行话称之为内部连接(equi)。 外部连接包括没有对应的元素 元素存在于b中(即:如果b为空则结果为偶数)。这通常称为左连接。 一个完整的外部连接包括来自A和b的记录,如果另一个中不存在相应的元素。(即,如果a为空,则结果为偶数)

在RDBMS中不常见的是组连接[1]:

组连接与上面描述的相同,但它不是为多个对应的b重复A中的元素,而是用对应的键对记录进行分组。当您希望根据公共键枚举“已连接”记录时,这通常更方便。

另请参阅GroupJoin,其中还包含一些一般的背景说明。


[1](我相信Oracle和MSSQL对此有专有扩展)

完整代码

一个通用的“drop-in”扩展类

internal static class MyExtensions
{
    internal static IEnumerable<TResult> FullOuterGroupJoin<TA, TB, TKey, TResult>(
        this IEnumerable<TA> a,
        IEnumerable<TB> b,
        Func<TA, TKey> selectKeyA, 
        Func<TB, TKey> selectKeyB,
        Func<IEnumerable<TA>, IEnumerable<TB>, TKey, TResult> projection,
        IEqualityComparer<TKey> cmp = null)
    {
        cmp = cmp?? EqualityComparer<TKey>.Default;
        var alookup = a.ToLookup(selectKeyA, cmp);
        var blookup = b.ToLookup(selectKeyB, cmp);

        var keys = new HashSet<TKey>(alookup.Select(p => p.Key), cmp);
        keys.UnionWith(blookup.Select(p => p.Key));

        var join = from key in keys
                   let xa = alookup[key]
                   let xb = blookup[key]
                   select projection(xa, xb, key);

        return join;
    }

    internal static IEnumerable<TResult> FullOuterJoin<TA, TB, TKey, TResult>(
        this IEnumerable<TA> a,
        IEnumerable<TB> b,
        Func<TA, TKey> selectKeyA, 
        Func<TB, TKey> selectKeyB,
        Func<TA, TB, TKey, TResult> projection,
        TA defaultA = default(TA), 
        TB defaultB = default(TB),
        IEqualityComparer<TKey> cmp = null)
    {
        cmp = cmp?? EqualityComparer<TKey>.Default;
        var alookup = a.ToLookup(selectKeyA, cmp);
        var blookup = b.ToLookup(selectKeyB, cmp);

        var keys = new HashSet<TKey>(alookup.Select(p => p.Key), cmp);
        keys.UnionWith(blookup.Select(p => p.Key));

        var join = from key in keys
                   from xa in alookup[key].DefaultIfEmpty(defaultA)
                   from xb in blookup[key].DefaultIfEmpty(defaultB)
                   select projection(xa, xb, key);

        return join;
    }
}

我真的很讨厌这些linq表达式,这就是SQL存在的原因:

select isnull(fn.id, ln.id) as id, fn.firstname, ln.lastname
   from firstnames fn
   full join lastnames ln on ln.id=fn.id

在数据库中创建此sql视图并将其作为实体导入。

当然,(不同的)左连接和右连接的并集也可以做到这一点,但这是愚蠢的。

这是另一个完整的外部连接

由于对其他命题的简单性和可读性不太满意,我最后得出了这样的结论:

它没有快速的自命(在2020m CPU上加入1000 * 1000大约800毫秒:2.4ghz / 2核)。对我来说,它只是一个紧凑而随意的完全外部连接。

它的工作原理与SQL FULL OUTER JOIN相同(重复保存)

欢呼;-)

using System;
using System.Collections.Generic;
using System.Linq;
namespace NS
{
public static class DataReunion
{
    public static List<Tuple<T1, T2>> FullJoin<T1, T2, TKey>(List<T1> List1, Func<T1, TKey> KeyFunc1, List<T2> List2, Func<T2, TKey> KeyFunc2)
    {
        List<Tuple<T1, T2>> result = new List<Tuple<T1, T2>>();

        Tuple<TKey, T1>[] identifiedList1 = List1.Select(_ => Tuple.Create(KeyFunc1(_), _)).OrderBy(_ => _.Item1).ToArray();
        Tuple<TKey, T2>[] identifiedList2 = List2.Select(_ => Tuple.Create(KeyFunc2(_), _)).OrderBy(_ => _.Item1).ToArray();

        identifiedList1.Where(_ => !identifiedList2.Select(__ => __.Item1).Contains(_.Item1)).ToList().ForEach(_ => {
            result.Add(Tuple.Create<T1, T2>(_.Item2, default(T2)));
        });

        result.AddRange(
            identifiedList1.Join(identifiedList2, left => left.Item1, right => right.Item1, (left, right) => Tuple.Create<T1, T2>(left.Item2, right.Item2)).ToList()
        );

        identifiedList2.Where(_ => !identifiedList1.Select(__ => __.Item1).Contains(_.Item1)).ToList().ForEach(_ => {
            result.Add(Tuple.Create<T1, T2>(default(T1), _.Item2));
        });

        return result;
    }
}
}

这个想法是

基于提供的关键函数生成器构建id 处理仅剩下的项 流程内部连接 只处理正确的项目

下面是一个与之相关的简单测试:

在结束处放置断点,以手动验证它的行为是否符合预期

using System;
using System.Collections.Generic;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using NS;

namespace Tests
{
[TestClass]
public class DataReunionTest
{
    [TestMethod]
    public void Test()
    {
        List<Tuple<Int32, Int32, String>> A = new List<Tuple<Int32, Int32, String>>();
        List<Tuple<Int32, Int32, String>> B = new List<Tuple<Int32, Int32, String>>();

        Random rnd = new Random();

        /* Comment the testing block you do not want to run
        /* Solution to test a wide range of keys*/

        for (int i = 0; i < 500; i += 1)
        {
            A.Add(Tuple.Create(rnd.Next(1, 101), rnd.Next(1, 101), "A"));
            B.Add(Tuple.Create(rnd.Next(1, 101), rnd.Next(1, 101), "B"));
        }

        /* Solution for essential testing*/

        A.Add(Tuple.Create(1, 2, "B11"));
        A.Add(Tuple.Create(1, 2, "B12"));
        A.Add(Tuple.Create(1, 3, "C11"));
        A.Add(Tuple.Create(1, 3, "C12"));
        A.Add(Tuple.Create(1, 3, "C13"));
        A.Add(Tuple.Create(1, 4, "D1"));

        B.Add(Tuple.Create(1, 1, "A21"));
        B.Add(Tuple.Create(1, 1, "A22"));
        B.Add(Tuple.Create(1, 1, "A23"));
        B.Add(Tuple.Create(1, 2, "B21"));
        B.Add(Tuple.Create(1, 2, "B22"));
        B.Add(Tuple.Create(1, 2, "B23"));
        B.Add(Tuple.Create(1, 3, "C2"));
        B.Add(Tuple.Create(1, 5, "E2"));

        Func<Tuple<Int32, Int32, String>, Tuple<Int32, Int32>> key = (_) => Tuple.Create(_.Item1, _.Item2);

        var watch = System.Diagnostics.Stopwatch.StartNew();
        var res = DataReunion.FullJoin(A, key, B, key);
        watch.Stop();
        var elapsedMs = watch.ElapsedMilliseconds;
        String aser = JToken.FromObject(res).ToString(Formatting.Indented);
        Console.Write(elapsedMs);
    }
}

}