Don’t Abuse Java Parallel Streams

A long time ago I wrote an article, Can/Should I use parallel streams in a transaction context?, that pointed out some of the pitfalls of misusing parallel streams. Recently I have been seeing more and more parallel streams used on the false assumption that they will improve performance, without fully considering the potential issues. So let’s analyze the do’s and don’ts of parallel streams in Java.

Don’t use it when order matters

Let’s take a look at the following code:

import java.util.ArrayList;
import java.util.List;

public class Main {

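    public static void main (String[] args){
        var bigList = createABigList();

        // Sequential: elements are folded strictly from left to right.
        var total = bigList.stream().reduce(1L, (acc, next) -> (acc / 2) + next);
        System.out.println("Total " + total);
        // Parallel: chunks are reduced independently and then combined with the
        // same function, which is not associative, so the result differs.
        var totalParallel = bigList.stream().parallel().reduce(1L, (acc, next) -> (acc / 2) + next);
        System.out.println("Total Parallel " + totalParallel);
    }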

    static List<Long> createABigList(){
        var myBigList = new ArrayList<Long>();
        for(int i = 1; i<= 1000000; i++){
            myBigList.add(Long.valueOf(i));
        }
        return myBigList;
    }

}

I used a mathematical iteration where the execution order affects the result. The results on my machine were:

Total (Non parallel Stream) 1999998 (correct)
Total Parallel 10283175

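This happens because Java Streams use ForkJoinPool to run parallel executions, recursively splitting the task into chunks so that each of them can be computed independently. The partial results are then combined with the same function, so the final value is only correct when that function is associative and the seed is a true identity for it; (acc/2) + next with a seed of 1 is neither. How to avoid the error? Create unit tests 🙂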
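To make the order problem concrete, here is a minimal sketch that checks associativity on one sample and contrasts the article's accumulator with a plain sum. The class and method names (AssociativityCheck, isAssociativeFor, sum) are hypothetical, chosen only for illustration.

```java
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

class AssociativityCheck {

    // The accumulator from the example above: not associative.
    static final BinaryOperator<Long> HALVE_AND_ADD = (acc, next) -> (acc / 2) + next;

    // True when op(op(a, b), c) equals op(a, op(b, c)) for this one sample.
    // A single passing sample proves nothing; a single failing one is enough.
    static boolean isAssociativeFor(BinaryOperator<Long> op, long a, long b, long c) {
        long leftFirst = op.apply(op.apply(a, b), c);
        long rightFirst = op.apply(a, op.apply(b, c));
        return leftFirst == rightFirst;
    }

    // Associative reduction with a true identity (0): safe to run in parallel.
    static long sum(List<Long> values, boolean parallel) {
        var stream = parallel ? values.parallelStream() : values.stream();
        return stream.reduce(0L, Long::sum);
    }

    public static void main(String[] args) {
        List<Long> values = LongStream.rangeClosed(1, 1_000_000)
                .boxed()
                .collect(Collectors.toList());

        System.out.println("halve-and-add associative for (8, 4, 2)? "
                + isAssociativeFor(HALVE_AND_ADD, 8, 4, 2));
        System.out.println("sum associative for (8, 4, 2)? "
                + isAssociativeFor(Long::sum, 8, 4, 2));
        System.out.println("Sequential sum " + sum(values, false));
        System.out.println("Parallel sum   " + sum(values, true));
    }
}
```

For the sample (8, 4, 2) the halve-and-add function gives 6 when grouped from the left and 8 when grouped from the right, which is exactly the freedom a parallel reduction takes. The sum gives the same total either way.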

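By default, the parallelism of the ForkJoinPool common pool is one less than the number of available processor cores, because the thread that invokes the stream also takes part in the work. This can of course be altered with a JVM system property (for example, to set it to 16), even though I don’t recommend it.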

-Djava.util.concurrent.ForkJoinPool.common.parallelism=16
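If you want to see what your own JVM is actually using, a small sketch like the following prints the processor count next to the common pool's target parallelism. The class name CommonPoolInfo and its helper method are hypothetical; ForkJoinPool.getCommonPoolParallelism() is the standard JDK call.

```java
import java.util.concurrent.ForkJoinPool;

class CommonPoolInfo {

    // Target parallelism of the common pool that parallel streams use by default.
    static int commonPoolParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    public static void main(String[] args) {
        System.out.println("Available processors:    "
                + Runtime.getRuntime().availableProcessors());
        System.out.println("Common pool parallelism: " + commonPoolParallelism());
    }
}
```

Running it once without the system property and once with it set should show the second value change.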

Don’t use it for simple tasks because they are slower

Take a look at an alternate version of the code.

import java.util.ArrayList;
import java.util.List;

public class Main {

    static long startTimeMs;

    public static void main (String[] args){
        var bigList = createABigList();

        startTime();
        var total = bigList.stream().reduce(1L, (acc, next) -> (acc / 2) + next);
        printExecutionTimeMs();
        System.out.println("Total " + total);

        startTime();
        var totalParallel = bigList.stream().parallel().reduce(1L, (acc, next) -> (acc / 2) + next);
        printExecutionTimeMs();
        System.out.println("Total Parallel "+totalParallel);
    }

    static List<Long> createABigList(){
        var myBigList = new ArrayList<Long>();
        for(int i = 1; i<= 1000000; i++){
            myBigList.add(Long.valueOf(i));
        }
        return myBigList;
    }

    static void printExecutionTimeMs(){
        System.out.println("Elapsed Time "+(System.currentTimeMillis() - startTimeMs));
    }

    static void startTime(){
        startTimeMs = System.currentTimeMillis();
    }


}

The metrics show:

Elapsed Time 27
Total 1999998
Elapsed Time 43
Total Parallel 10283175

This means that the parallel stream took roughly 60% longer (43 ms versus 27 ms). The reason is that the overhead of managing threads, splitting the source and combining the results is more expensive than the business operation itself. So the rule of thumb: simple tasks are faster when not using parallel streams.
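A single run timed with currentTimeMillis is quite noisy, so here is a slightly more careful sketch that warms up the JIT first and keeps the best of several runs measured with System.nanoTime. The class NaiveTiming and its methods are hypothetical, and this is still not a proper benchmark (a harness such as JMH is the right tool for that); it uses a plain sum so that both variants compute the same value.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

class NaiveTiming {

    // Runs the task a few times as warm-up, then returns the best elapsed
    // time in nanoseconds over the measured runs.
    static long bestOfNanos(Supplier<Long> task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.get();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.get();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    static long sumSequential(List<Long> values) {
        return values.stream().reduce(0L, Long::sum);
    }

    static long sumParallel(List<Long> values) {
        return values.parallelStream().reduce(0L, Long::sum);
    }

    public static void main(String[] args) {
        List<Long> values = LongStream.rangeClosed(1, 1_000_000)
                .boxed()
                .collect(Collectors.toList());

        long sequential = bestOfNanos(() -> sumSequential(values), 5, 10);
        long parallel = bestOfNanos(() -> sumParallel(values), 5, 10);

        System.out.println("Sequential best (us): " + sequential / 1_000);
        System.out.println("Parallel best (us):   " + parallel / 1_000);
    }
}
```

The numbers will differ from machine to machine, so treat them as an indication rather than a verdict.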

When to Use Parallel Streams

Parallel streams are not Harry Potter’s wand for boosted performance.

We use them when

  • We have costly operations that can be done in parallel PROVIDED THEY ARE STATELESS (read Can/Should I use parallel streams in a transaction context? again)
  • We experience performance issues and have analyzed parallel streams as an option, provided we refactor the code to make correct use of the ForkJoinPool algorithm
  • We need fast communication with async external systems (for example, a fast broadcast of something to many recipients)
  • Other similar cases
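As an illustration of the first point, here is a minimal sketch of a costly, stateless operation that is a reasonable candidate for a parallel stream. The class StatelessParallelWork and the deliberately slow trial-division isPrime are assumptions made for the example, not code from a real project.

```java
import java.util.stream.IntStream;

class StatelessParallelWork {

    // Deliberately expensive and stateless: it reads only its argument
    // and touches no shared mutable state.
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    // Counts the primes in [1, upTo], optionally in parallel.
    // count() is an order-independent terminal operation, so both modes agree.
    static long countPrimes(int upTo, boolean parallel) {
        IntStream range = IntStream.rangeClosed(1, upTo);
        if (parallel) {
            range = range.parallel();
        }
        return range.filter(StatelessParallelWork::isPrime).count();
    }

    public static void main(String[] args) {
        System.out.println("Sequential count " + countPrimes(2_000_000, false));
        System.out.println("Parallel count   " + countPrimes(2_000_000, true));
    }
}
```

Because each check is independent and no element depends on a previous result, the two counts are always equal. Whether the parallel version is actually faster still depends on the machine and on the cost per element.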

Hope you enjoyed the article. Drop me a comment if so!

Passionate Archer, Runner, Linux lover and JAVA Geek! That's about everything! Alexius Dionysius Diakogiannis is a Senior Java Solutions Architect and Squad Lead at the European Investment Bank. He has over 20 years of experience in Java/JEE development, with a strong focus on enterprise architecture, security and performance optimization. He is proficient in a wide range of technologies, including Spring, Hibernate and JakartaEE. Alexius is a certified Scrum Master and is passionate about agile development. He is also an experienced trainer and speaker, and has given presentations at a number of conferences and meetups. In his current role, Alexius is responsible for leading a team of developers in the development of mission-critical applications. He is also responsible for designing and implementing the architecture for these applications, focusing on performance optimization and security.