Array or List in Java. Which is faster?

I have to keep thousands of strings in memory to be accessed serially in Java. Should I store them in an array, or should I use some kind of List?

Since arrays keep all their data in one contiguous chunk of memory (unlike Lists), would using an array to store thousands of strings cause problems?

Current answer

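None of the answers had the information I was interested in: repeatedly scanning the same array many, many times. I had to write a JMH test for this.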

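Results (Java 1.8.0_66 x32; iterating a plain array is roughly 4.6 times faster than an ArrayList accessed with get(), and roughly 9 times faster than an ArrayList traversed with for-each):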

Benchmark                    Mode  Cnt   Score   Error  Units
MyBenchmark.testArrayForGet  avgt   10   8.121 ± 0.233  ms/op
MyBenchmark.testListForGet   avgt   10  37.416 ± 0.094  ms/op
MyBenchmark.testListForEach  avgt   10  75.674 ± 1.897  ms/op

Test

package my.jmh.test;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

@State(Scope.Benchmark)
@Fork(1)
@Warmup(iterations = 5, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 10)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class MyBenchmark {

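    public final static int ARR_SIZE = 100;       // elements per container
    public final static int ITER_COUNT = 100000;  // full scans per benchmark invocation

    // Both containers hold only null references, so count stays 0 in every benchmark.
    String arr[] = new String[ARR_SIZE];
    List<String> list = new ArrayList<>(ARR_SIZE);

    public MyBenchmark() {
        // Fill the list so its size matches the array length.
        for( int i = 0; i < ARR_SIZE; i++ ) {
            list.add(null);
        }
    }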

    @Benchmark
    public void testListForEach() {
        int count = 0;
        for( int i = 0; i < ITER_COUNT; i++ ) {
            for( String str : list ) {
                if( str != null )
                    count++;
            }
        }
        if( count > 0 )
            System.out.print(count);
    }

    @Benchmark
    public void testListForGet() {
        int count = 0;
        for( int i = 0; i < ITER_COUNT; i++ ) {
            for( int j = 0; j < ARR_SIZE; j++ ) {
                if( list.get(j) != null )
                    count++;
            }
        }
        if( count > 0 )
            System.out.print(count);
    }

    @Benchmark
    public void testArrayForGet() {
        int count = 0;
        for( int i = 0; i < ITER_COUNT; i++ ) {
            for( int j = 0; j < ARR_SIZE; j++ ) {
                if( arr[j] != null )
                    count++;
            }
        }
        if( count > 0 )
            System.out.print(count);
    }

}

2016-06-01 20:00:14

Other answers

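First, it is worth clarifying whether you mean "list" in the classic comp sci data structure sense (i.e. a linked list) or whether you mean java.util.List. If you mean java.util.List, it is an interface. If you want array-like behavior, just use the ArrayList implementation and you will get array-like behavior and semantics. Problem solved.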
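As a rough illustration of that suggestion, the sketch below stores strings in an ArrayList behind the List interface and reads them back serially. The class name, method names, and the "item-" contents are made up for this example; they are not from the question.

```java
import java.util.ArrayList;
import java.util.List;

final class ArrayListAsArray {
    // Builds a list of n strings; the "item-" contents are illustrative only.
    static List<String> build(int n) {
        // The capacity hint sizes the backing array once, so it need not grow.
        List<String> items = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            items.add("item-" + i);
        }
        return items;
    }

    // Serial access: visits every element in order and sums the lengths.
    static int totalLength(List<String> items) {
        int total = 0;
        for (String s : items) {
            total += s.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> items = build(1000);
        System.out.println(items.get(42));        // indexed read, like arr[42]
        System.out.println(totalLength(items));   // serial scan
    }
}
```

Because callers only see List<String>, the implementation could later be swapped for another one without touching the scanning code.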

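If you mean an array versus a linked list, that is a slightly different argument, and we are back to Big O (if that term is unfamiliar, a plain English explanation of Big O is worth looking up).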

Array:

Random access: O(1); insertion: O(n); deletion: O(n).

Linked list:

Random access: O(n); insertion: O(1); deletion: O(1).

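So you choose whichever one best suits how you resize your array. If you resize, insert, and delete a lot, then a linked list may be the better choice. The same goes if random access is rare. You mention serial access. If you are mainly doing serial access with very little modification, then it probably does not matter which one you choose.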

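Linked lists have slightly more overhead since, as you say, you are dealing with potentially non-contiguous blocks of memory and (effectively) pointers to the next element. However, that is probably not an important factor unless you are dealing with millions of entries.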
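The complexity figures above can be made concrete with a small sketch. It uses a hand-written singly linked node rather than java.util.LinkedList, purely so the work done by each operation is visible; all names here are hypothetical. Note that the O(1) insertion figure for a linked list assumes you already hold the position (here, the head).

```java
final class AccessCost {
    // Minimal singly linked node, for illustration only.
    static final class Node {
        final String value;
        final Node next;
        Node(String value, Node next) {
            this.value = value;
            this.next = next;
        }
    }

    // Array random access: a single indexed read, whatever the index. O(1).
    static String arrayGet(String[] arr, int index) {
        return arr[index];
    }

    // Linked-list random access: follows `index` links from the head. O(n).
    static String listGet(Node head, int index) {
        Node cur = head;
        for (int i = 0; i < index; i++) {
            cur = cur.next;
        }
        return cur.value;
    }

    // Linked-list insertion at the head: allocates one node. O(1).
    static Node pushFront(Node head, String value) {
        return new Node(value, head);
    }

    // Array insertion at the front: every existing element is copied. O(n).
    static String[] insertFront(String[] arr, String value) {
        String[] out = new String[arr.length + 1];
        out[0] = value;
        System.arraycopy(arr, 0, out, 1, arr.length);
        return out;
    }

    public static void main(String[] args) {
        Node head = pushFront(pushFront(null, "b"), "a");
        String[] arr = insertFront(new String[] { "b" }, "a");
        System.out.println(listGet(head, 1));
        System.out.println(arrayGet(arr, 1));
    }
}
```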

2009-04-04 13:46:11

If you know in advance how large the data is, then an array will be faster.

A List is more flexible. You can use an ArrayList, which is backed by an array.
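Since an ArrayList is array-backed, moving between the two forms is cheap to write. The sketch below shows one way to convert in each direction using standard library calls; the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ListArrayConversion {
    // Copies the list contents into a new String[] sized to the list.
    static String[] toArray(List<String> list) {
        return list.toArray(new String[0]);
    }

    // Copies the array into a new, resizable ArrayList.
    static List<String> toList(String[] arr) {
        return new ArrayList<>(Arrays.asList(arr));
    }

    public static void main(String[] args) {
        List<String> list = toList(new String[] { "a", "b" });
        list.add("c"); // growing is allowed on the list, not on the array
        String[] arr = toArray(list);
        System.out.println(arr.length);
    }
}
```

One possible workflow under these assumptions: collect into a List while the size is unknown, then convert to an array once the data is fixed.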

2009-04-04 06:00:50

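No, because technically the array only stores references to the strings. The strings themselves are allocated elsewhere. For a thousand items, I would say a List is the better choice: it is slower, but it offers more flexibility and is easier to use, especially if you are going to resize it.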
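The point about references can be checked directly. In this small sketch (names are illustrative), the same String object is placed in both an array and a List, and the identity comparison shows that neither container holds a copy of the string data.

```java
import java.util.ArrayList;
import java.util.List;

final class ReferenceDemo {
    // True when slot i of both containers refers to the very same object.
    static boolean sameObject(String[] arr, List<String> list, int i) {
        return arr[i] == list.get(i); // identity comparison, not equals()
    }

    public static void main(String[] args) {
        String s = new String("hello");
        String[] arr = { s };
        List<String> list = new ArrayList<>();
        list.add(s);
        System.out.println(sameObject(arr, list, 0));
    }
}
```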

2009-04-04 06:02:00

I am guessing the original poster comes from a C++/STL background, which is causing some confusion. In C++, std::list is a doubly linked list.

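In Java, [java.util.]List is an implementation-free interface (a pure abstract class in C++ terms). A List can be a doubly linked list: java.util.LinkedList is provided. However, 99 times out of 100, when you want to create a new List you want java.util.ArrayList instead, which is the rough equivalent of C++ std::vector. There are other standard implementations, such as those returned by java.util.Collections.emptyList() and java.util.Arrays.asList().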
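To illustrate that List is only an interface, the sketch below passes several of the standard implementations mentioned above to one method written against List<String>. The class name and the countNonEmpty helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

final class ListImplementations {
    // Works with any List implementation; counts strings that are not empty.
    static int countNonEmpty(List<String> items) {
        int count = 0;
        for (String s : items) {
            if (!s.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> fixed = Arrays.asList("a", "", "b");   // fixed-size view of an array
        List<String> arrayBacked = new ArrayList<>(fixed);  // resizable, array-backed
        List<String> linked = new LinkedList<>(fixed);      // doubly linked list
        List<String> empty = Collections.emptyList();       // immutable empty list

        System.out.println(countNonEmpty(fixed));
        System.out.println(countNonEmpty(arrayBacked));
        System.out.println(countNonEmpty(linked));
        System.out.println(countNonEmpty(empty));
    }
}
```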

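From a performance standpoint, there is a very small hit from having to go through an interface and an extra object, but runtime inlining means this rarely has any significance. Also remember that a String is typically an object plus an array, so for each entry you probably have two other objects. In C++, std::vector<std::string> holds the string objects by value rather than through pointers, but each string still has its own character array (and these are usually not shared).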

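If this particular code is really performance-sensitive, you could create a single char[] array (or even byte[]) holding all the characters of all the strings, plus an array of offsets. IIRC, this is how javac is implemented.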
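A minimal sketch of that packed layout follows, assuming a read-only collection built once from a String[]. The PackedStrings class and its methods are invented for illustration and are not taken from javac or any library.

```java
final class PackedStrings {
    private final char[] chars;   // all strings, one after another
    private final int[] offsets;  // offsets[i] = start of string i; last entry = total length

    PackedStrings(String[] source) {
        int total = 0;
        for (String s : source) {
            total += s.length();
        }
        chars = new char[total];
        offsets = new int[source.length + 1];
        int pos = 0;
        for (int i = 0; i < source.length; i++) {
            offsets[i] = pos;
            // Copy the characters of string i into the shared buffer.
            source[i].getChars(0, source[i].length(), chars, pos);
            pos += source[i].length();
        }
        offsets[source.length] = pos;
    }

    int size() {
        return offsets.length - 1;
    }

    // Length of string i, derived from two neighboring offsets.
    int length(int i) {
        return offsets[i + 1] - offsets[i];
    }

    // Reads one character without creating a String object.
    // k is not checked against length(i) in this sketch.
    char charAt(int i, int k) {
        return chars[offsets[i] + k];
    }

    // Materializes string i; this allocates, so use it sparingly.
    String get(int i) {
        return new String(chars, offsets[i], length(i));
    }

    public static void main(String[] args) {
        PackedStrings packed = new PackedStrings(new String[] { "ab", "", "cde" });
        System.out.println(packed.size());
        System.out.println(packed.get(2));
    }
}
```

The trade-off is that every operation on the data has to go through offsets instead of ordinary String methods, which is why this only pays off when profiling shows the per-String overhead actually matters.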

2009-04-04 13:11:54

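Many of the microbenchmarks given here found that things like array/ArrayList reads take a few nanoseconds. That is quite reasonable if everything is in your L1 cache.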

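A higher-level cache or main memory access can take on the order of 10 ns to 100 ns, versus something closer to 1 ns for the L1 cache. Accessing an ArrayList involves an extra memory indirection, and in a real application you could pay that cost anywhere from almost never to every time, depending on what your code does between accesses. And, of course, if you have a lot of small ArrayLists, this may add to your memory use and make cache misses more likely.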

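The original poster appears to be using just one and accessing a lot of its contents in a short time, so it should be no great hardship. But it may be different for other people, and you should be careful when interpreting microbenchmarks.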

Java Strings, however, are appallingly wasteful, especially if you store lots of small ones (just look at them with a memory analyzer, it seems to be > 60 bytes for a string of a few characters). An array of strings has an indirection to the String object, and another from the String object to a char[] which contains the string itself. If anything's going to blow your L1 cache it's this, combined with thousands or tens of thousands of Strings. So, if you're serious - really serious - about scraping out as much performance as possible then you could look at doing it differently. You could, say, hold two arrays, a char[] with all the strings in it, one after another, and an int[] with offsets to the starts. This will be a PITA to do anything with, and you almost certainly don't need it. And if you do, you've chosen the wrong language.

2014-10-30 16:14:17
