JIT Optimizations – Method Inlining

In this blog post, we will look at the cost of function calls in an application and at what the JIT compiler does to reduce that cost.

A function call is a relatively expensive operation, but the JIT makes sure that our application does not suffer, performance-wise, from a large number of them. It does so by inlining functions, which removes calls from the hot paths of the application. We will also learn how to debug an application and see which functions are getting inlined.

Are function calls expensive?

At first, a function call may seem trivial, but it executes several instructions under the hood, and that overhead can hurt performance if your application makes a lot of function calls.

Let us consider a simple function call and see the native instructions involved:

private static void subtract(int num) {
    int r = 20 - num;
    return;
}

private int getNum() {
    int a1 = 10;
    subtract(12);
    return 0;
}

The following things happen when we call subtract() from getNum():

  • The arguments, i.e. “12”, get pushed onto the stack.
  • The return address of the caller, i.e. getNum(), gets pushed onto the stack.
  • The frame pointer of the caller, i.e. getNum(), is also pushed onto the stack. The saved frame pointer lets the caller’s stack frame be restored when the callee returns.
  • The call instruction transfers control to the callee, and the callee’s instructions start executing.
  • The body of the callee is executed.
  • The ret instruction transfers control back to the caller, with the frame pointer restored to the caller’s original frame pointer.
  • The original arguments passed to the callee, i.e. “12”, are popped from the stack.

So a whole series of instructions gets executed even for a call to a simple function like subtract(), and that per-call overhead is what makes function calls expensive.

On top of this, modern programming style recommends writing smaller functions to improve readability, which in turn increases the number of function calls and their overhead even more.

Method Inlining

Method inlining is another important performance optimization used by the JIT: it replaces a call with the body of the called function. It greatly influences the performance of an application.

Let’s check out the performance boost an application gets from function inlining with this example.

We have to apply these three operations to a number:

  • Multiply the number by a constant x
  • Subtract the result from the constant x
  • Add the constant x to the result

There are two ways to do this:

  • Methodology 1: Inline all the operations in a single method
public static ArrayList experimentFunctionInlining(ArrayList arrayList) {
    for (int i = 0; i < 10000; i++) {
        int num2 = 10 * i;
        int num1 = 10 - num2;
        int num = 10 + num1;
        arrayList.add(num);
    }
    return arrayList;
}
  • Methodology 2: Write each operation as a separate method and call the methods one after another
private static int add(int num) {
    return 10 + num;
}

private static int subtract(int num) {
    return 10 - num;
}

private static int multiply(int num) {
    return 10 * num;
}

public static ArrayList experimentFunctionCalling(ArrayList arrayList) {
    for (int i = 0; i < 10000; i++) {
        int num2 = multiply(i);
        int num1 = subtract(num2);
        int num = add(num1);
        arrayList.add(num);
    }
    return arrayList;
}

Note: These two tests were first benchmarked with JMH with the JIT disabled, so as to isolate the impact of function inlining.

  • Methodology 1 of inlining all the operations in a single method performs at ~135 ops/second
  • Methodology 2 of writing all operations in separate functions and calling them one by one performs at ~98 ops/second

This shows that inlining the operations by hand has a large impact on performance: roughly 38% higher throughput in this test.
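For readers who want to try the comparison without setting up JMH, here is a minimal sketch in plain Java. It is not the benchmark used for the numbers above: the class name InliningTimingSketch, the helper time() and the round count are made up for illustration, and System.nanoTime() timings are far noisier than JMH results. It only checks that both methodologies compute the same values and prints a rough elapsed time for each.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical harness, not the JMH benchmark from this post.
class InliningTimingSketch {

    static int add(int num) { return 10 + num; }
    static int subtract(int num) { return 10 - num; }
    static int multiply(int num) { return 10 * num; }

    // Methodology 1: all operations written out in the loop body.
    static List<Integer> manuallyInlined() {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            out.add(10 + (10 - (10 * i)));
        }
        return out;
    }

    // Methodology 2: one small method per operation.
    static List<Integer> separateCalls() {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            out.add(add(subtract(multiply(i))));
        }
        return out;
    }

    // Returns the elapsed nanoseconds for `rounds` invocations of the task.
    static long time(Runnable task, int rounds) {
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            task.run();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        if (!manuallyInlined().equals(separateCalls())) {
            throw new AssertionError("the two methodologies disagree");
        }
        int rounds = 200; // arbitrary choice; a real benchmark needs warm-up and forks
        System.out.println("manually inlined: "
                + time(InliningTimingSketch::manuallyInlined, rounds) + " ns");
        System.out.println("separate calls:   "
                + time(InliningTimingSketch::separateCalls, rounds) + " ns");
    }
}
```

One way to approximate a run without JIT compilation on HotSpot is the -Xint flag, which runs the JVM in interpreter-only mode. Whether that matches how the numbers above were produced is an assumption.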

But writing code as in Methodology 1 is not always possible if we want to keep the code readable. This is where the JIT comes in handy. It figures out the hot code paths in an application and tries to inline the methods lying on them. Now let’s run the same benchmark with the JIT enabled and see if there is any performance difference between the two methodologies. Our hypothesis is that the JIT should inline the methods in Methodology 2, so the performance numbers should be more or less the same.

And voilà, yes they are:

  • Methodology 1 with JIT enabled performs at ~ 9000 ops/second
  • Methodology 2 with JIT enabled also performs at ~ 9000 ops/second

So with the JIT enabled, the performance of the JIT-inlined methods is in the same ballpark as that of the manually inlined method. Apart from reducing function call overhead, another important reason for inlining is that the compiler sees more context in the inlined code, which it can then use to make many other optimizations.
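To make the "more context" point concrete, here is a small worked example using the three helper methods from above. Once the calls are inlined, the loop body is the single expression 10 + (10 - 10 * i), which an optimizer is then free to simplify to 20 - 10 * i. The sketch below does that simplification by hand and checks that it agrees with the call-based version. The class and method names are hypothetical, and it does not show what HotSpot actually emits for this code.

```java
// Hypothetical illustration of an optimization that inlining makes possible.
class InliningContextSketch {

    static int add(int num) { return 10 + num; }
    static int subtract(int num) { return 10 - num; }
    static int multiply(int num) { return 10 * num; }

    // What the source says: three separate calls.
    static int viaCalls(int i) {
        return add(subtract(multiply(i)));
    }

    // After inlining, the body is 10 + (10 - 10 * i).
    // Folding the two constants by hand gives 20 - 10 * i.
    static int simplifiedByHand(int i) {
        return 20 - 10 * i;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10000; i++) {
            if (viaCalls(i) != simplifiedByHand(i)) {
                throw new AssertionError("mismatch at i = " + i);
            }
        }
        System.out.println("both forms agree for i in [0, 10000)");
    }
}
```

Without inlining, each helper is compiled on its own and the two constants never appear in the same expression, so this kind of folding is not available.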

Debug your application

The JIT has certain limitations when it comes to inlining methods on a hot code path. Whether a method gets inlined depends on these factors:

  • The JIT inlines nested calls only up to a particular depth
  • The JIT inlines methods only up to a particular size
  • The type of the method to be inlined
    • The JIT can easily inline static methods
    • To inline a virtual function, it needs to know the class of the object on which the function is called, so that it can resolve which implementation will run
  • Many others …
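On HotSpot, the depth and size limits are exposed as the JVM flags MaxInlineLevel, MaxInlineSize and FreqInlineSize. The sketch below reads their current values at runtime through the HotSpot-specific HotSpotDiagnosticMXBean. The class name InlineLimitsSketch and the helper flagValue() are made up for illustration, and the code assumes a HotSpot-based JVM. The values are printed rather than listed here because defaults vary with the JDK version and platform.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper; assumes a HotSpot-based JVM.
class InlineLimitsSketch {

    // Returns the current value of a VM flag, or "unavailable" if this JVM
    // does not expose a flag with that name.
    static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "unavailable";
            }
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            // Thrown when no VM option with the given name exists.
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        // Maximum depth of nested inlining.
        System.out.println("MaxInlineLevel = " + flagValue("MaxInlineLevel"));
        // Bytecode size limit for methods that are not hot.
        System.out.println("MaxInlineSize  = " + flagValue("MaxInlineSize"));
        // Bytecode size limit for frequently executed (hot) methods.
        System.out.println("FreqInlineSize = " + flagValue("FreqInlineSize"));
    }
}
```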

A few terms to understand beforehand, based on these sample JIT output logs:

( Method 1 ) @ 4 com.test.experiments.operators.OperatorPipelineEmulationExperiment::experimentVirtual (45 bytes) inline (hot)
( Method 2 )   @ 4 com.test.experiments.operators.BufferedOperator::<init> (21 bytes) inline (hot)
( Method 3 )      @ 1 com.test.experiments.operators.Operator::<init> (5 bytes) inline (hot)
( Method 4 )   @ 15 com.test.experiments.operators.AddOperator::<init> (25 bytes) inline (hot)
  • The @ annotation in PrintInlining lines is followed by the bytecode index (bci) of the call site inside the calling method. In the above example, Method 2 is invoked at bytecode index 4 of Method 1. (In PrintCompilation lines, by contrast, @ followed by a number is the osr_bci, i.e. the bytecode index that triggered an OSR compilation request.)
  • The JIT shows the method inlining hierarchy through indentation: a method is inlined into the nearest less-indented method above it. Here we can clearly see that
    • Method 1 inlines Method 2 and Method 4.
    • Method 2 inlines Method 3.
  • TypeProfile is a special kind of profiling done by the JIT, used when it wants to inline virtual functions. Inlining is difficult where polymorphism is involved, for the simple reason that the same call site might end up in different methods depending on the class of the object on which the method is called. So in these cases, the JIT profiles the receiver types seen at the call site, and when, after enough samples, all the calls turn out to go to a single target method, it inlines that method.
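As an illustration of the kind of call site TypeProfile is about, consider the sketch below. The types Op, AddTen and Negate are hypothetical and are not the operator classes used in this post. The call op.apply(i) inside sum() is a virtual call, so which implementation runs depends on the class of op. In main() only AddTen ever reaches that call site, which is the single-target situation described above. The code only sets up that situation; it cannot show by itself whether the JIT inlined anything, which is what the logs below are for.

```java
// Hypothetical types, for illustration only.
class TypeProfileSketch {

    interface Op {
        int apply(int n);
    }

    static final class AddTen implements Op {
        @Override
        public int apply(int n) { return n + 10; }
    }

    static final class Negate implements Op {
        @Override
        public int apply(int n) { return -n; }
    }

    // op.apply(i) is the virtual call site whose receiver types get profiled.
    static long sum(Op op, int count) {
        long total = 0;
        for (int i = 0; i < count; i++) {
            total += op.apply(i);
        }
        return total;
    }

    public static void main(String[] args) {
        Op op = new AddTen();
        long total = 0;
        // Only AddTen is ever passed in, so the call site in sum() sees one receiver type.
        for (int round = 0; round < 20000; round++) {
            total += sum(op, 100);
        }
        System.out.println(total);
        // Passing a Negate instance to sum() as well would make the same
        // call site see two receiver types instead of one.
    }
}
```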

Let’s understand the logs for method inlining in the JIT. We will use this sample application for testing purposes.

public static ArrayList experimentVirtual(ArrayList arrayList) {
    BufferedOperator bufferedOperator = new BufferedOperator(); // Line 1
    AddOperator addOperator = new AddOperator(bufferedOperator, 10); // Line 2
    SourceOperator sourceOperator = new SourceOperator(addOperator, true); // Line 3
    sourceOperator.setArrayList(arrayList); // Line 4
    sourceOperator.get(1); // Line 5
    return bufferedOperator.arrayList;
}

Note: For more code details see this link

Here are the JIT logs for the application, produced with the following JVM flags:

-XX:+UnlockDiagnosticVMOptions
-XX:+PrintInlining
-XX:+PrintCompilation

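[Figure: PrintInlining log output for experimentVirtual, with the portions discussed below marked as Sections 1 to 4]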

  • The JIT inlines the call sites involved in the first 3 lines of the method (i.e. experimentVirtual), as Sections 1, 2 and 3 of the log show.
    BufferedOperator bufferedOperator = new BufferedOperator();
    AddOperator addOperator = new AddOperator(bufferedOperator, 10);
    SourceOperator sourceOperator = new SourceOperator(addOperator, true);
  • In Section 4, we can clearly see the JIT trying to inline all the call sites involved in line 5 (i.e. sourceOperator.get(1)).

    sourceOperator.get(1);
    • In Section 4, we can see that it first tries to inline the .get() implementation of sourceOperator:
      for (int i = 0; i < nums.size(); i++) {
          int p = nums.get(i);
          if (enableFlush && (i % flushNumber == 0)) {
              underlyingOperator.get(FLUSH_CODE);
          }
          underlyingOperator.get(p);
      }
    • With the help of TypeProfile, it also figures out the method actually called by underlyingOperator.get(), i.e. AddOperator.get(), and inlines that as well.
