Wednesday, February 9, 2011

.toArray(new MyClass[0]) or .toArray(new MyClass[myList.size()]) ?

Assuming I have an ArrayList

ArrayList<MyClass> myList;

And I want to call toArray, is there a performance reason to use

MyClass[] arr = myList.toArray(new MyClass[myList.size()]);

over

MyClass[] arr = myList.toArray(new MyClass[0]);

?

I prefer the second style, since it's less verbose, and I assumed that the compiler will make sure the empty array doesn't really get created, but I've been wondering if that's true.

Of course, in 99% of the cases it doesn't make a difference one way or the other, but I'd like to keep a consistent style between my normal code and my optimized inner loops...

  • The compiler will not do an optimization. Most probably it will be an optimization of the JIT compiler at runtime and that will depend on your JVM.

    Personally, i use the former version. It's not that verbose.

  • Using 'toArray' with the array of the correct size will perform better as the alternative will create first the zero sized array then the array of the correct size. However, as you say the difference is likely to be negligible.

    Also, note that the javac compiler does not perform any optimization. These days all optimizations are performed by the JIT/HotSpot compilers at runtime. I am not aware of any optimizations around 'toArray' in any JVMs.

    The answer to your question, then, is largely a matter of style but for consistency's sake should form part of any coding standards you adhere to (whether documented or otherwise).

  • The first case is more efficient.

    That is because in the second case:

    MyClass[] arr = myList.toArray(new MyClass[0]);
    

    the runtime actually creates an empty array (with zero size) and then inside the toArray method creates another array to fit the actual data. This creation is done using reflection using the following code (taken from jdk1.5.0_10):

    public <T> T[] toArray(T[] a) {
        if (a.length < size)
            a = (T[])java.lang.reflect.Array.
     newInstance(a.getClass().getComponentType(), size);
    System.arraycopy(elementData, 0, a, 0, size);
        if (a.length > size)
            a[size] = null;
        return a;
    }
    

    By using the first form, you avoid the creation of a second array and also avoid the reflection code.

    Georgi : toArray() does not use reflection. At least as long as you do not count "casting" to reflection, anyway ;-).
    Tom Hawtin - tackline : toArray(T[]) does. It needs to create an array of the appropriate type. Modern JVMs optimise that kind of reflection to be about the same speed as the non-reflective version.
    Panagiotis Korros : I think that it does use reflection. The JDK 1.5.0_10 does for sure and reflection is the only way I know to create an array of a type that you don't know at compile time.
    Georgi : Then one of the source code examples her (the one above or mine) is out-of-date. Sadly, I didn't find a correct sub-version number for mine, though.
    Panagiotis Korros : Georgi, your code is from JDK 1.6 and if you see the implementation of the Arrays.copyTo method you will see that the implementation uses reflection.
    Georgi : Thanks for clearing things up Panagiotis.
  • toArray checks that the array passed is of the right size (that is, large enough to fit the elements from your list) and if so, uses that. Consequently if the size of the array provided it smaller than required, a new array will be reflexively created.

    In your case, an array of size zero, is immutable, so could safely be elevated to a static final variable, which might make your code a little cleaner, which avoids creating the array on each invocation. A new array will be created inside the method anyway, so it's a readability optimisation.

    Arguably the faster version is to pass the array of a correct size, but unless you can prove this code is a performance bottleneck, prefer readability to runtime performance until proven otherwise.

  • Use the first case because it is easier and provides cleaner code. The reason for this is that the underlying method of how the ToArray method works is to perform a copy operation which is O(n). The immutable memory isn't a big deal. Management of such objects is done very efficiently and it will expand to the desired size.

    Don't optimize too much until you've established that something is a bottleneck in your code. If you spend too much time optimizing this you're just wasting time. I am sure there are plenty of other things you can optimize, so I would say use either one. If you want readability and less verboseness take the first one. If you don't mind the extra code and lessening of clarity, use the latter.

  • As of http://java.sun.com/j2se/1.5.0/docs/api/java/util/ArrayList.html#toArray(T[]) the array will be filled already if it has the right size (or is bigger). Consequently

    MyClass[] arr = myList.toArray(new MyClass[myList.size()]);
    

    will create one array object, fill it and return it to "arr". On the other hand

    MyClass[] arr = myList.toArray(new MyClass[0]);
    

    will create two arrays. The second one is an array of MyClass with length 0. So there is an object creation for an object that will be thrown away immediately. As far as the source code suggests the compiler / JIT cannot optimize this one so that it is not created. Additionally, using the zero-length object results in casting(s) within the toArray() - method.

    See the source of ArrayList.toArray():

    public <T> T[] toArray(T[] a) {
        if (a.length < size)
            // Make a new array of a's runtime type, but my contents:
            return (T[]) Arrays.copyOf(elementData, size, a.getClass());
        System.arraycopy(elementData, 0, a, 0, size);
        if (a.length > size)
            a[size] = null;
        return a;
    }
    

    Use the first method so that only one object is created and avoid (implicit but nevertheless expensive) castings.

    Kevin Day : wow - you learn something new every day. Thanks for the explanation - definitely worth knowing. Just checked the 1.5 javadocs, and this applies for all List implementations - not just ArrayList
    Erick Robertson : I was wondering this myself, and was going to ask the question if it wasn't asked. Very late +1.
    From Georgi
  • Modern JVMs optimise reflective array construction in this case, so the performance difference is tiny. Naming the collection twice in such boilerplate code is not a great idea, so I'd avoid the second method. Another advantage of the first is that it works with synchronised and concurrent collections. If you want to make optimisation, reuse the empty array (empty arrays are immutable and can be shared), or use a profiler(!).

  • Will your code compile (in either case)?

    I got an compilation error "Cannot create a generic array of T"

    Carlos Heuberger : well, `T` is a generic -> array can't be created; `MyClass` from the original code (probably) is a Class -> no problem; your `answer` is a question -> ...
    From Jason Yang

0 comments:

Post a Comment