Hello, welcome to the second part of “Parallel Programming in Java” series. If you have somehow missed the first part and want to read it now, then read it here.

So Part-1 covered the very basic concept of task execution in parallel and how ForkJoin framework helps in writing parallel programs easily. We saw how tasks which are recursive in nature and can be divided into smaller parts can be executed in parallel and a great speed can be achieved.

There was a note at the end of the previous part which said :

“You can do some optimisations by providing a threshold instead of going up to (start==end), for bigger inputs it can become a game changer to provide a threshold. If the input size goes below the threshold then just use sequential execution. Because creating worker threads have overhead which might not be worth for smaller inputs.”

Now setting up threshold can really speed up the performance or at least save you from the overhead of creating worker threads for smaller input sizes. Which means that, it is faster to run sequential programs for such input sizes, than executing it in parallel, why ? Because the overhead cost of creating worker threads, managing them is more than the sequential execution cost. So use the parallel approach with caution and only for large inputs. But how large ? How do we know what size is large enough ? Well, I would say that depends on the use case primarily and a lot of other factors, but in general always test the execution times using varying input sizes and identify the threshold for your program.

Getting back the previous program we can implement a threshold as follows :

    protected void compute() {
      //So instead of going up to chunk of size=1, we say
      // "use sequential approach for all sizes below THRESHOLD"
      if( (end-start+1) <= THRESHOLD ){
          //same as before...

If you try to run this optimisations for larger inputs (~1million numbers) , with threshold of (~50000 ) you may see a speedup of 1.5 to 2x, which is remarkable.

If you closely observe, at each step the execution splits the task into two more tasks which work on smaller input.You may chose to create “n” tasks at each step, which may get you more speedup. But you need to be careful to make sure overhead is not killing your parallelism.

Now, Java is a very high level language which believes in abstraction and hence, it provides API’s that hide the complexity of parallel execution. What does that mean ? In the previous example the programmer is aware of the ForkJoin framework being used (which is not always a bad thing) , but in simple cases it becomes a complex thing and improper usage may lead to incorrect behaviour of the algorithm.

Since Java-8, streams are available to us, which hides the whole iteration logic from the programmers. We can simply creates streams from collections, operate on each item in the stream, apply filters, map them to other objects, apply aggregations, apply reduce operations and a lot of other stuff. To see how things can be simplified lets refactor the sequential-sum code from the part-1 using Streams.

The sequential-sum code looks like below :

int sumAllNumbers_Seq(int[] nums){
  int sum=0;
  for(int i=0;i<nums.length;i++){
      sum+= nums[i];
  return sum;

Using streams we can write something like below :

int sumAllNumbers_Stream(int[] nums){
          .reduce(0,(x,y) -> x+y);

Or even simpler if you don’t want to deal with lambdas:

int sumAllNumbers_MethodReference(int[] nums){

That’s really awesome.Isn’t it ? But why are we talking about Java streams in an article on Parallel programming ? Because, Java8 also provides parallel stream APIs. For example the above function can be converted to a parallel execution function using just one api :

int sumAllNumbers_MethodReference_Parallel(int[] nums){
            .parallel() //As simple as this.

Whoa!! this is simply amazing. Java is on steroids here :D. Without dealing with the complexity of ForkJoin framework, you can write parallel programs just using a simple high level API. What can be better ? Nothing!

Btw, can you think of any other problem which we often solve and can be converted to a parallel algorithm ?? Write your ideas in comments. For now, lets deal with sorting a list of numbers.

Arrays utility in Java provides sort() api to sort a given list of numbers. Before jumping on to converting this into parallel, lets discuss why we think it can be converted ?

Because – ??? Think and type in your answer in the comments section.

Arrays utility class also provide a parallel execution API , which is parallelSort(). So basically you can simply call this API and the sort will utilise all your processors to provide you a fast computation.

If you are wondering how these APIs work ? The answer is simple. Using ForkJoin framework.

But wait, if Java provides APIs which are way easy to use, why should we read and understand the ForkJoin framework ? Because, Java doesn’t provide API’s for solving very specific use cases, it will only provide you APIs for general cases such as sorting. If you want to write your own API for your own use case then you will have to use ForkJoin.

Lets take another (a slightly more complex example) which is very much suitable for parallel execution.

Assume, we have an automated gift packing and delivery system. Gifts are some items which are gift-able and ordered by someone to be delivered as a gift to some other person. And, you have a nice warehouse of items with conveyer belt to automate the steps. After we receive an order, it is sent to a robot control centre which asks a robot to pickup the ordered item to pickup from the rack and put on the conveyer belt. Conveyer belt them goes through multiple stations which perform operations on each item.

  • Basic analysis
  • Price tag removal
  • Gift wrap
  • Writing a gift message
  • Send to delivery station

The above sequence happens for each item which needs to be delivered. Lets see how we can write a program for this :

void conveyerBelt_Execution(List<Item> items) {
    items.parallelStream()  //use of parallel stream.
            .filter(item -> analyze(item)) //filters items in bad condition
            .forEach(this::sendFordelivery); //terminal function to end processing.

boolean analyze(Item item){
  return item.isGood();

Item removePriceTag(Item item){
 return tagRemovalService.removeTag(item);

Item giftWrap(Item item){
  return giftWrapService.giftWrap(item);

Item giftMessage(Item item){
  return giftMessageService.giftMessage(item);

void sendForDelivery(Item item){
 return deliveryService.sendForDelivery(item);

So as you saw, we used another API  which is parallelStream(). This will process each item in parallel, provided you have multiple processing stations available in the warehouse.

As you can easily imagine, while the data is growing exponentially,  parallel, concurrent and distributed programming can come to rescue. And hence, it becomes way more important to be able to design and implement such algorithms.

At this point I would like to leave you restless to try more Java parallel stream APIs.

Stay tuned for upcoming articles, where we will see more synchronisation aids in Java, which provide flexibility at a more granular level.

I hope you liked the article. Please like, comment and share your ideas for any suggestion, correction etc.


Kaivalya Apte


Previous post
Next post
Related Posts
Leave a Reply

Your email address will not be published. Required fields are marked *