在Java 8中,我如何使用流API通过检查每个对象的属性的清晰度来过滤一个集合?

例如,我有一个Person对象列表,我想删除同名的人,

persons.stream().distinct();

将对Person对象使用默认的相等性检查,所以我需要这样的东西,

persons.stream().distinct(p -> p.getName());

不幸的是,distinct()方法没有这样的重载。如果不修改Person类内部的相等检查,是否可以简洁地做到这一点?


当前回答

基于@josketres的回答,我创建了一个通用的实用方法:

您可以通过创建一个Collector使其对Java 8更加友好。

public static <T> Set<T> removeDuplicates(Collection<T> input, Comparator<T> comparer) {
    return input.stream()
            .collect(toCollection(() -> new TreeSet<>(comparer)));
}


@Test
public void removeDuplicatesWithDuplicates() {
    ArrayList<C> input = new ArrayList<>();
    Collections.addAll(input, new C(7), new C(42), new C(42));
    Collection<C> result = removeDuplicates(input, (c1, c2) -> Integer.compare(c1.value, c2.value));
    assertEquals(2, result.size());
    assertTrue(result.stream().anyMatch(c -> c.value == 7));
    assertTrue(result.stream().anyMatch(c -> c.value == 42));
}

@Test
public void removeDuplicatesWithoutDuplicates() {
    ArrayList<C> input = new ArrayList<>();
    Collections.addAll(input, new C(1), new C(2), new C(3));
    Collection<C> result = removeDuplicates(input, (t1, t2) -> Integer.compare(t1.value, t2.value));
    assertEquals(3, result.size());
    assertTrue(result.stream().anyMatch(c -> c.value == 1));
    assertTrue(result.stream().anyMatch(c -> c.value == 2));
    assertTrue(result.stream().anyMatch(c -> c.value == 3));
}

private class C {
    public final int value;

    private C(int value) {
        this.value = value;
    }
}

其他回答

另一个支持这个的库是jOOλ,它的Seq.distinct(Function<T,U>)方法:

Seq.seq(persons).distinct(Person::getName).toList();

实际上,它所做的事情与公认的答案几乎相同。

虽然最高赞的答案绝对是Java 8的最佳答案,但它同时在性能方面绝对是最差的。如果您真的想要一个糟糕的低性能应用程序,那么就使用它吧。提取一组唯一人名的简单要求,只需“For-Each”和“set”即可实现。 如果list的大小大于10,情况会更糟。

假设你有一个包含20个对象的集合,如下所示:

public static final List<SimpleEvent> testList = Arrays.asList(
            new SimpleEvent("Tom"), new SimpleEvent("Dick"),new SimpleEvent("Harry"),new SimpleEvent("Tom"),
            new SimpleEvent("Dick"),new SimpleEvent("Huckle"),new SimpleEvent("Berry"),new SimpleEvent("Tom"),
            new SimpleEvent("Dick"),new SimpleEvent("Moses"),new SimpleEvent("Chiku"),new SimpleEvent("Cherry"),
            new SimpleEvent("Roses"),new SimpleEvent("Moses"),new SimpleEvent("Chiku"),new SimpleEvent("gotya"),
            new SimpleEvent("Gotye"),new SimpleEvent("Nibble"),new SimpleEvent("Berry"),new SimpleEvent("Jibble"));

你的SimpleEvent对象是这样的:

public class SimpleEvent {

private String name;
private String type;

public SimpleEvent(String name) {
    this.name = name;
    this.type = "type_"+name;
}

public String getName() {
    return name;
}

public void setName(String name) {
    this.name = name;
}

public String getType() {
    return type;
}

public void setType(String type) {
    this.type = type;
}
}

为了测试,你有这样的JMH代码,(请注意,我使用相同的distinctByKey谓语提到接受的答案):

@Benchmark
@OutputTimeUnit(TimeUnit.SECONDS)
public void aStreamBasedUniqueSet(Blackhole blackhole) throws Exception{

    Set<String> uniqueNames = testList
            .stream()
            .filter(distinctByKey(SimpleEvent::getName))
            .map(SimpleEvent::getName)
            .collect(Collectors.toSet());
    blackhole.consume(uniqueNames);
}

@Benchmark
@OutputTimeUnit(TimeUnit.SECONDS)
public void aForEachBasedUniqueSet(Blackhole blackhole) throws Exception{
    Set<String> uniqueNames = new HashSet<>();

    for (SimpleEvent event : testList) {
        uniqueNames.add(event.getName());
    }
    blackhole.consume(uniqueNames);
}

public static void main(String[] args) throws RunnerException {
    Options opt = new OptionsBuilder()
            .include(MyBenchmark.class.getSimpleName())
            .forks(1)
            .mode(Mode.Throughput)
            .warmupBatchSize(3)
            .warmupIterations(3)
            .measurementIterations(3)
            .build();

    new Runner(opt).run();
}

然后你会得到这样的基准测试结果:

Benchmark                                  Mode  Samples        Score  Score error  Units
c.s.MyBenchmark.aForEachBasedUniqueSet    thrpt        3  2635199.952  1663320.718  ops/s
c.s.MyBenchmark.aStreamBasedUniqueSet     thrpt        3   729134.695   895825.697  ops/s

正如您所看到的,与Java 8 Stream相比,简单的For-Each在吞吐量方面提高了3倍,并且错误评分更低。

吞吐量越高,性能越好

我遇到了一种情况,我应该根据2个键从列表中获得不同的元素。 如果您希望基于两个键或组合键进行区分,请尝试此操作

class Person{
    int rollno;
    String name;
}
List<Person> personList;


Function<Person, List<Object>> compositeKey = personList->
        Arrays.<Object>asList(personList.getName(), personList.getRollno());

Map<Object, List<Person>> map = personList.stream().collect(Collectors.groupingBy(compositeKey, Collectors.toList()));

List<Object> duplicateEntrys = map.entrySet().stream()`enter code here`
        .filter(settingMap ->
                settingMap.getValue().size() > 1)
        .collect(Collectors.toList());

扩展Stuart Marks的回答,这可以用更短的方式完成,不需要并发映射(如果你不需要并行流):

public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
    final Set<Object> seen = new HashSet<>();
    return t -> seen.add(keyExtractor.apply(t));
}

然后调用:

persons.stream().filter(distinctByKey(p -> p.getName());

在我的情况下,我需要控制什么是前一个元素。然后,我创建了一个有状态的Predicate,我在其中控制前一个元素是否与当前元素不同,在这种情况下,我保留了它。

public List<Log> fetchLogById(Long id) {
    return this.findLogById(id).stream()
        .filter(new LogPredicate())
        .collect(Collectors.toList());
}

public class LogPredicate implements Predicate<Log> {

    private Log previous;

    public boolean test(Log atual) {
        boolean isDifferent = previouws == null || verifyIfDifferentLog(current, previous);

        if (isDifferent) {
            previous = current;
        }
        return isDifferent;
    }

    private boolean verifyIfDifferentLog(Log current, Log previous) {
        return !current.getId().equals(previous.getId());
    }

}