Wednesday, June 11, 2014

Spiral Traversal of a 2-Dimensional Array

I would like to share an interesting question that is asked in the data structures and algorithms round at some companies:

Problem: How would you traverse a two-dimensional array in spiral fashion?


package com.girish.algorithms;

public class SpiralTraversal {

    static int[][] array = new int[][]
            { { 1,  2,  3,  4,  5,  6},
              { 7,  8,  9, 10, 11, 12},
              {13, 14, 15, 16, 17, 18},
              {19, 20, 21, 22, 23, 24},
              {25, 26, 27, 28, 29, 30} };

    // Prints the ring bounded (inclusively) by the four limits, then
    // shrinks all four limits by one and recurses into the inner ring.
    public static void traverse(int top, int bottom, int left, int right)
    {
        if (top > bottom || left > right)
        {
            return; // nothing left to visit
        }
        // Traverse right along the top row.
        for (int n = left; n <= right; n++)
            visit(top, n);
        // Traverse down the rightmost column (corner already printed).
        for (int m = top + 1; m <= bottom; m++)
            visit(m, right);
        // Traverse left along the bottom row, if it is a different row.
        if (bottom > top)
            for (int n = right - 1; n >= left; n--)
                visit(bottom, n);
        // Traverse up the leftmost column, if it is a different column.
        if (right > left)
            for (int m = bottom - 1; m > top; m--)
                visit(m, left);
        // Recurse into the inner ring.
        traverse(top + 1, bottom - 1, left + 1, right - 1);
    }

    private static void visit(int m, int n)
    {
        System.out.println("{" + m + ", " + n + "} " + array[m][n]);
    }

    public static void main(String[] args)
    {
        traverse(0, array.length - 1, 0, array[0].length - 1);
    }
}
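
For the 5×6 matrix above, this prints the elements in spiral order: 1, 2, 3, 4, 5, 6, 12, 18, 24, 30, 29, 28, 27, 26, 25, 19, 13, 7, 8, 9, 10, 11, 17, 23, 22, 21, 20, 14, 15, 16.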

Sunday, April 13, 2014

HotSpot JVM GC and performance tuning


Performance tuning is a very interesting topic which offers many things to learn about the JVM and your application.
Every Java application has its own behaviour and its own requirements thus providing a great scope to learn something new while doing tuning.
Most developers focus only on the package they are supposed to code, and while coding they commit many faux pas without realizing how these can drag down performance in production. In this post I would like to look at JVM tuning and at coding practices that are effective in minimizing performance glitches.


The heap is divided into two parts:



1. Young Generation

Young generation memory consists of two parts: Eden space and two survivor spaces. Most objects are initially allocated in Eden. One of the survivor spaces is always empty and serves as the destination of any live objects in Eden and in the other survivor space during the next copying collection. Objects are copied between survivor spaces in this way until they are old enough to be tenured (copied to the tenured generation). Short-lived objects therefore live out their lives in the young generation: every object starts its life in Eden, and when a GC happens, objects that are still alive are moved to a survivor space while dereferenced objects are removed.

2. Old Generation – Tenured and Perm Gen

Old generation memory has two parts: the tenured generation and the permanent generation (Perm Gen). Perm Gen is a familiar name; most of us have seen errors like java.lang.OutOfMemoryError: PermGen space.

GC moves live objects from the survivor space to the tenured generation. The permanent generation holds metadata used by the virtual machine itself, such as class and method objects.


Performance criteria:

There are three primary measures of garbage collection performance:
  1. Throughput is the percentage of total time not spent in garbage collection, considered over long periods of time. Throughput includes time spent in allocation (but tuning for speed of allocation is generally not needed).
  2. Pauses are the times when an application appears unresponsive because garbage collection is occurring.
  3. Footprint is the working set of a process, measured in pages and cache lines. On systems with limited physical memory or many processes, footprint may dictate scalability.
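
A quick way to observe these measures for your own application is HotSpot's GC logging; a minimal illustration (MyApp stands in for your main class):

java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps MyApp

Each log line shows the heap occupancy before and after a collection and the pause duration, which is enough to estimate both throughput and pause times.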


In simple terms, the goal is to get good performance with little or no tuning of command line options: the JVM selects the garbage collector, heap size, and runtime compiler at startup based on the machine it runs on, instead of using fixed defaults, and you tune only where your application's behaviour demands it.

Selecting a collector:

If the application has a small data set (up to approximately 100MB), then select the serial collector with -XX:+UseSerialGC.
If the application will be run on a single processor and there are no pause time requirements, then let the VM select the collector, or select the serial collector with -XX:+UseSerialGC.
If (a) peak application performance is the first priority and (b) there are no pause time requirements or pauses of one second or longer are acceptable, then let the VM select the collector, or select the parallel collector with -XX:+UseParallelGC and (optionally) enable parallel compaction with -XX:+UseParallelOldGC.
If response time is more important than overall throughput and garbage collection pauses must be kept shorter than approximately one second, then select the concurrent collector with -XX:+UseConcMarkSweepGC. If only one or two processors are available, consider using incremental mode (-XX:+CMSIncrementalMode).
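
For illustration, the command lines for the last two cases might look like this (MyBatchApp and MyWebApp are placeholder class names):

java -XX:+UseParallelGC -XX:+UseParallelOldGC MyBatchApp
java -XX:+UseConcMarkSweepGC MyWebApp

The first trades longer pauses for maximum throughput on multiprocessor hardware; the second trades some throughput for short pauses.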








Reducing Garbage-Collection Pause Time:

There are two general ways to reduce garbage-collection pause time and the impact it has on application performance:

The garbage collection can itself leverage the existence of multiple CPUs and be executed in parallel. Although the application threads remain fully suspended during this time, the garbage collection can be done in a fraction of the time, effectively reducing the suspension time.
The second approach is to leave the application running, and execute garbage collection concurrently with the application execution.
These two logical solutions have led to the development of serial, parallel, and concurrent garbage-collection strategies, which form the foundation of all existing Java garbage-collection implementations.





The serial collector suspends the application and executes the mark-and-sweep algorithm in a single thread. It is the simplest and oldest form of garbage collection in Java and is still the default of the Oracle HotSpot JVM on client-class machines.

The parallel collector uses multiple threads to do its work. It can therefore decrease the GC pause time by leveraging multiple CPUs. It is often the best choice for throughput applications.

The concurrent collector does the majority of its work concurrent with the application execution. It has to suspend the application for only very short amounts of time. This has a big benefit for response-time–sensitive applications, but is not without drawbacks.

(Mostly) Concurrent Marking and Sweeping
Concurrent garbage-collection strategies complicate the relatively simple mark-and-sweep algorithm a bit. The mark phase is usually sub-divided into some variant of the following:

In the initial marking, the GC root objects are marked as alive. During this phase, all threads of the application are suspended.
During concurrent marking, the marked root objects are traversed and all reachable objects are marked. This phase is fully concurrent with application execution, so all application threads are active and can even allocate new objects. For this reason there might be another phase that marks objects that have been allocated during the concurrent marking. This is sometimes referred to as pre-cleaning and is still done concurrent to the application execution.
In the final marking, all threads are suspended and all remaining newly allocated objects are marked as alive. This phase is often labeled re-mark.
The concurrent mark works mostly, but not completely, without pausing the application. The tradeoff is a more complex algorithm and an additional phase that is not necessary in a normal stop-the-world GC: the final marking.

The Oracle JRockit JVM improves this algorithm with the help of a keep area, which, if you’re interested, is described in detail in the JRockit documentation. New objects are kept separately and not considered garbage during the first GC. This eliminates the need for a final marking or re-mark.

In the sweep phase of the CMS, all memory areas not occupied by marked objects are found and added to the free list. In other words, the objects are swept by the GC. This phase can run at least partially concurrent to the application. For instance, JRockit divides the heap into two areas of equal size and sweeps one then the other. During this phase, no threads are stopped, but allocations take place only in the area that is not actively being swept.

The downsides of the CMS algorithm can be quickly identified:

As the marking phase runs concurrently with the application's execution, the application can allocate objects faster than the collector reclaims them; if the heap fills up before the collection completes, the JVM falls back to a costly stop-the-world collection.
Allocation from free lists leads to memory fragmentation and everything that entails.
The algorithm is more complicated than the other two and consequently requires more CPU cycles.
The algorithm requires more fine-tuning and has more configuration options than the other approaches.
These disadvantages aside, the CMS will nearly always lead to greater predictability and better application response time.

Reducing the Impact of Compacting
Modern garbage collectors execute their compacting processes in parallel, leveraging multiple CPUs. Nevertheless, nearly all of them have to suspend the application during this process. JVMs with several gigabytes of memory can be suspended for several seconds or more. To work around this, the various JVMs each implement a set of parameters that can be used to compact memory in smaller, incremental steps instead of as a single big block. The parameters are as follows:

Compacting is executed not for every GC cycle, but only once a certain level of fragmentation is reached (e.g., if more than 50% of the free memory is not contiguous).
One can configure a target fragmentation. Instead of compacting everything, the garbage collector compacts only until a designated percentage of the free memory is available as a contiguous block.
This works, but the optimization process is tedious, involves a lot of testing, and needs to be done again and again for every application to achieve optimum results.

Sizing of Heap and Various Ratios:



A number of parameters affect generation size. The key distinction is between committed space and virtual space in the heap. At initialization of the virtual machine, the entire space for the heap is reserved. The size of the space reserved can be specified with the -Xmx option. If the value of the -Xms parameter is smaller than the value of the -Xmx parameter, not all of the space that is reserved is immediately committed to the virtual machine; the uncommitted part is called "virtual" space. The different parts of the heap (permanent generation, tenured generation and young generation) can grow to the limit of the virtual space as needed.

Some of the parameters are ratios of one part of the heap to another.


Total Heap


Since collections occur when generations fill up, the frequency of collections is inversely proportional to the amount of memory available. Total available memory is therefore the most important factor affecting garbage collection performance.

By default, the virtual machine grows or shrinks the heap at each collection to try to keep the proportion of free space to live objects at each collection within a specific range. This target range is set as a percentage by the parameters -XX:MinHeapFreeRatio=<minimum> and -XX:MaxHeapFreeRatio=<maximum>, and the total size is bounded below by -Xms<min> and above by -Xmx<max>. The default parameters for the 32-bit Solaris Operating System (SPARC Platform Edition) are shown in this table:

Parameter           Default Value
MinHeapFreeRatio    40
MaxHeapFreeRatio    70
-Xms                3670k
-Xmx                64m

Let's understand these parameters:

If MinHeapFreeRatio is 40 (%), then whenever the percentage of free space in a generation falls below 40%, the generation is expanded to restore 40% free space, up to the generation's maximum allowed size. Similarly, if MaxHeapFreeRatio is 70 (%) and free space exceeds 70%, the generation is contracted so that only 70% of the space is free, subject to the generation's minimum size.

The default initial heap size is generally very small for real applications. The rule of thumb: unless you have problems with pauses, try granting as much memory as possible to the virtual machine; the default maximum (64 MB) is often too small.

Setting -Xms and -Xmx to the same value increases predictability by removing the most important sizing decision from the virtual machine. However, the virtual machine is then unable to compensate if you make a poor choice.
In general, increase the memory as you increase the number of processors, since allocation can be parallelized.
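
For example, a dedicated server machine with ample RAM might simply pin the heap (the sizes are placeholders, not recommendations):

java -Xms2g -Xmx2g MyServerApp

With -Xms equal to -Xmx the heap never grows or shrinks, so MinHeapFreeRatio and MaxHeapFreeRatio have no work to do and sizing behavior becomes fully predictable.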


The Young Generation


The second most influential knob is the proportion of the heap dedicated to the young generation. A larger young generation means fewer minor collections. However, for a fixed heap size, a larger young generation means less space for the tenured generation, and hence more frequent major collections.

By default, the young generation size is controlled by NewRatio. For example, setting -XX:NewRatio=3 means that the ratio between the young and tenured generation is 1:3. In other words, the combined size of the eden and survivor spaces will be one fourth of the total heap size.

The parameters NewSize and MaxNewSize bound the young generation size from below and above. Setting these to the same value fixes the young generation, just as setting -Xms and -Xmx to the same value fixes the total heap size. This is useful for tuning the young generation at a finer granularity than the integral multiples allowed by NewRatio.
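
For example, the following hypothetical settings fix a 1 GB heap whose young generation is pinned at 256 MB, the same one-quarter proportion that -XX:NewRatio=3 would give:

java -Xms1g -Xmx1g -XX:NewSize=256m -XX:MaxNewSize=256m MyServerApp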

Survivor Space Sizing

You can configure the survivor space size, but from a performance perspective it is usually not that important. For example, -XX:SurvivorRatio=n sets the ratio between eden and a survivor space to 1:n. In other words, each survivor space will be one nth the size of eden, and thus one (n+2)th the size of the young generation (not one (n+1)th, because there are two survivor spaces).

If survivor spaces are too small, copying collection overflows directly into the tenured generation. If survivor spaces are too large, they will be uselessly empty.

Here are the default values for the 32-bit Solaris Operating System (SPARC Platform Edition); the default values on other platforms are different.

Parameter        Default (Client JVM)   Default (Server JVM)
NewRatio         8                      2
NewSize          2228K                  2228K
MaxNewSize       not limited            not limited
SurvivorRatio    32                     32
The maximum size of the young generation will be calculated from the maximum size of the total heap and NewRatio. The "not limited" default value for MaxNewSize means that the calculated value is not limited by MaxNewSize unless a value for MaxNewSize is specified on the command line.



The steps for setting parameters for server applications are:

First decide the maximum heap size you can afford to give the virtual machine, then tune the young generation size for the best results.
Note that the maximum heap size should always be smaller than the amount of memory installed on the machine, to avoid excessive page faults and thrashing.
If the total heap size is fixed, increasing the young generation size requires reducing the tenured generation size. Keep the tenured generation large enough to hold all the live data used by the application at any given time, plus some amount of slack space (10-20% or more).

Subject to the above constraint on the tenured generation:
Grant plenty of memory to the young generation.
Increase the young generation size as you increase the number of processors, since allocation can be parallelized. A combined example is sketched below.
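
Putting these steps together, a plausible starting point for a dedicated multi-processor server might look like this (the sizes are placeholders to be refined by measurement, not recommendations):

java -Xms3g -Xmx3g -XX:NewRatio=2 -XX:+UseParallelGC -XX:+UseParallelOldGC MyServerApp

This fixes a 3 GB heap, gives one third of it to the young generation, and selects the throughput-oriented parallel collector discussed earlier.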



Friday, April 11, 2014

HOT SPOT JVM memory Optimization and tools

In this blog post, I want to discuss optimization of java memory usage. The Sun JDK has two simple-but-powerful tools for memory profiling -- jmap and jhat.

jmap has two important capabilities for memory profiling. It can:
• create a heap dump file for any live java process
• show a heap distribution histogram

Neither of these capabilities requires any special parameters for the Java virtual machine (JVM). Below is a heap distribution histogram produced by jmap.
gyadav@gyadav-ubuntu:~$ jmap -histo:live 5241
num     #instances         #bytes  class name
----------------------------------------------
   1:        123406       19940016  <constMethodKlass>
   2:        198852       16388720  [C
   3:        123406       15831360  <methodKlass>
   4:         11531       14892200  <constantPoolKlass>
   5:         11531        8938112  <instanceKlassKlass>
   6:          9596        8641280  <constantPoolCacheKlass>
   7:         91819        4703840  [B
   8:        176758        4242192  java.lang.String
   9:         88520        2832640  java.util.HashMap$Entry
  10:         27069        1948968  java.lang.reflect.Field
  11:         12444        1577032  java.lang.Class
  12:         10453        1383296  [Ljava.util.HashMap$Entry;
  13:         17116        1321848  [S
  14:         29401        1318704  [Ljava.lang.Object;
  15:         24378        1143144  [I
  16:          1734         999016  <methodDataKlass>
  17:         18074         922552  [[I
  18:         15235         680328  [Ljava.lang.String;
  19:           896         480256  <objArrayKlassKlass>
  20:          9605         461040  org.eclipse.core.internal.registry.ReferenceMap$SoftRef
  21:          9394         450912  java.util.HashMap
  22:         10832         433280  java.util.LinkedHashMap$Entry
  23:         17229         413496  java.util.ArrayList
...
Total       1317091      118936448

For each class we can see the class name, the number of instances of the class in the heap, and the number of bytes used in the heap by all instances of the class. The table is sorted by consumed heap space.

You can also try jhat. jhat can read and explore a heap dump. It has a web interface, and you can click through your dumped object graph.

gyadav@gyadav-ubuntu:~$ jmap -dump:file=eclipse.heap 5241
Dumping heap to /home/gyadav/eclipse.heap ...
Heap dump file created

gyadav@gyadav-ubuntu:~$ jhat -J-Xmx512m  eclipse.heap 
Reading from eclipse.heap...
Dump file created Thu May 01 12:43:48 IST 2014
Snapshot read, resolving...
Resolving 1365807 objects...
Chasing references, expect 273 dots.................................................................................................................................................................................................................................................................................
Eliminating duplicate references.................................................................................................................................................................................................................................................................................
Snapshot resolved.
Started HTTP server on port 7000

Server is ready.


Now open http://localhost:7000 and you can see a summary of all classes. You can use the standard queries via links or go straight to the "Execute Object Query Language (OQL) query" link at the bottom and type your own query. The query language is quite awkward (it is based on the JavaScript engine) and may not work well for large numbers of objects, but it is a very powerful tool.
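
For example, a query along these lines (modeled on the samples shipped with jhat's OQL help; s.count refers to the String field discussed below) lists strings of 100 characters or more:

select s from java.lang.String s where s.count >= 100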

Enterprise applications are 80% strings and maps


Let's look at the jmap histogram again. In the top row we can see the "[C" class consuming most of the heap space. These char arrays are part of String objects, and we can see that String instances are also consuming considerable space. From my experience, 60-80% of the heap in an enterprise application is consumed by strings and hash maps.
Strings
Let's look at how the JVM stores strings. String objects are semantically immutable. Each instance has four fields (all except hash are marked final):
• Reference to char array
• Integer offset
• Integer count of characters
• Integer string hash (lazily evaluated, and once evaluated never changes)

How much memory does one String instance consume?

Here and below, size calculations are for the 32-bit Sun JVM; other vendors should be similar.
Object header (8 bytes) + 1 reference (4 bytes) + 3 ints (12 bytes) = 24 bytes. But the String instance is only a header; the actual text is stored in a char array (2 bytes per character, plus a 12-byte array header).



String instances can share char arrays with each other. When you call substring(…) on a String instance, no new char array allocation happens. Instead, a new String referencing a subrange of the existing char array is created. Sometimes this can become a problem. Imagine you are loading a text file (e.g., CSV). First you load the entire line as a string, then you seek the position of a field and call substring(…). Now your small field-value string holds a reference to the char array of the entire line. Later, the string header for the line is collected, but its characters stay in memory because they are still referenced from the other string object.

If you create a String object with a constructor, a new char array is always allocated. So to copy the content of a substring you can use the following construct (it looks a bit strange, but it works):
new String(a.substring(…))
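
To make the substring pitfall concrete, here is a minimal sketch under the old String layout described above; the file name, the CSV format, and the class itself are hypothetical:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class CsvFieldLoader {
    // Loads the first field of every line of a CSV file.
    public static List<String> loadFirstFields(String file) throws IOException {
        List<String> fields = new ArrayList<String>();
        BufferedReader reader = new BufferedReader(new FileReader(file));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                int comma = line.indexOf(',');
                if (comma < 0) continue;              // no field on this line
                String field = line.substring(0, comma);
                // On old JVMs 'field' shares the char array of the whole
                // line, keeping the entire line in memory. Copying breaks
                // that link so the long line can be garbage collected:
                fields.add(new String(field));
            }
        } finally {
            reader.close();
        }
        return fields;
    }
}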

String.intern() – think twice
The String class has a method intern() that guarantees the following:
a.equals(b)  => a.intern() == b.intern()

There is a table inside the JVM used to store the normal forms of strings. If the text is already present in the table, the value from the table is returned by intern(); otherwise the string is added to the table. So a reference returned by intern() is always an object from the JVM intern table.

The string intern table keeps weak references to its objects, so unused strings can be garbage collected when no references remain other than the intern table's own. It looks like a great idea to use intern() to eliminate all duplicated strings in an application. Many have tried … and many have regretted the decision. I cannot say this is true for every JVM vendor, but if you are using Sun's JVM, you should never do this. Why?


  •   JVM string intern tables are stored in PermGen -- a separate region of the heap (not included in -Xmx size) used for the JVM’s internal needs. Garbage collection in this area is expensive and size is limited (though configurable).
  •   You would have to insert a new string into the table which has O(n) complexity, where n is table size.
String intern tables in JVMs work perfectly for the JVM's needs (new entries are added only while loading new classes, so insertion time is not a big issue) and they are very compact. But they are completely unsuitable for storing millions of application strings; they simply were not designed for such a use case.
Removing duplicates
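A safe alternative to intern() is an ordinary map held by the application itself. A minimal sketch (this class is illustrative, not part of any library):

import java.util.HashMap;
import java.util.Map;

// Deduplicates equal strings in the normal heap. Unlike the JVM intern
// table it can be sized for millions of entries and is simply garbage
// collected once the application drops it.
public class StringDeduplicator {
    private final Map<String, String> pool = new HashMap<String, String>();

    public String dedup(String s) {
        if (s == null) return null;
        String canonical = pool.get(s);
        if (canonical != null) return canonical; // reuse the first copy seen
        pool.put(s, s);                          // first occurrence becomes canonical
        return s;
    }
}
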
Maps and sets
Good old java.util.HashMap is used everywhere. The standard JDK implementation uses an open hash table data structure.

References to the key and value are stored in an Entry object (which also keeps the key's hash for faster access). If several keys map to the same hash slot, they are stored as a linked list of entries. The size of each entry structure: object header + 3 references + int = 24 bytes. java.util.HashSet uses java.util.HashMap under the hood, so the overhead is the same. Can we store a map or set in a more compact form? Sure.

Sorted array map
If the keys are comparable, we can store the map or set as sorted arrays of interleaved key/value pairs (array of keys in the case of a set). Binary search can be used for fast lookups in an array, but insertion and deletion of entries will have to shift elements. Due to the high cost of updates in such a structure, it will be effective only for smaller collection sizes, and for operation patterns that are mostly read. Fortunately, this is usually what we have -- map/set of a few dozen objects that are read more often than modified.
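
A minimal sketch of the idea with String keys (illustrative only; a production version would preallocate capacity rather than copy the arrays on every insert):

import java.util.Arrays;

// Compact map for small, mostly-read collections: keys and values are
// kept in parallel sorted arrays and looked up by binary search.
public class SortedArrayMap<V> {
    private String[] keys = new String[0];
    private Object[] values = new Object[0];

    @SuppressWarnings("unchecked")
    public V get(String key) {
        int i = Arrays.binarySearch(keys, key);
        return i >= 0 ? (V) values[i] : null;
    }

    public void put(String key, V value) {
        int i = Arrays.binarySearch(keys, key);
        if (i >= 0) { values[i] = value; return; } // key exists: overwrite
        int at = -i - 1;                           // insertion point
        keys = insert(keys, at, key);              // O(n): shifts elements
        values = insert(values, at, value);
    }

    private static <T> T[] insert(T[] a, int at, T v) {
        T[] b = Arrays.copyOf(a, a.length + 1);
        System.arraycopy(a, at, b, at + 1, a.length - at);
        b[at] = v;
        return b;
    }
}

There is no Entry object per mapping, so the 24 bytes of per-entry overhead shrink to two array slots.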

Closed hash table
If we have a collection of a larger size, we should use a hashtable. Can we make the hashtable more compact? Again, the answer is yes. There are data structures called closed hashtables that do not require entry objects.


In closed hashtables, references from the hashtable point directly to the object (key). What if we want to put in a reference to a key but the hash slot is already occupied? In that case we find another slot (e.g., the next one). How do you look up a key? Search through all adjacent slots until the key or a null reference is found. As you can see from the algorithm, it is very important to have enough empty slots in the table: the density of a closed hash table should be kept below 0.5 to avoid performance degradation.
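
A minimal sketch of a closed hash table holding String keys, using linear probing (illustrative; deletion, which requires tombstones, is omitted):

// Set of strings stored by open addressing: slots reference the keys
// directly, so there are no per-entry objects at all.
public class CompactStringSet {
    private String[] slots = new String[16]; // capacity is kept a power of two
    private int size;

    public boolean add(String key) {
        if ((size + 1) * 2 > slots.length) grow(); // keep density below 0.5
        int i = key.hashCode() & (slots.length - 1);
        while (slots[i] != null) {
            if (slots[i].equals(key)) return false; // already present
            i = (i + 1) & (slots.length - 1);       // probe the next slot
        }
        slots[i] = key;
        size++;
        return true;
    }

    public boolean contains(String key) {
        int i = key.hashCode() & (slots.length - 1);
        while (slots[i] != null) {
            if (slots[i].equals(key)) return true;
            i = (i + 1) & (slots.length - 1);
        }
        return false;
    }

    private void grow() { // double the table and re-insert every key
        String[] old = slots;
        slots = new String[old.length * 2];
        size = 0;
        for (String s : old) {
            if (s != null) add(s);
        }
    }
}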
Conclusion
Optimization is an art. There are no magical data structures capable of solving every problem. As you can see, you have to fight for every byte. Memory optimization is a complex process. Remember that you should design your data so each object can be referenced from different collections (instead of having to copy data). It is usually better to use semantically immutable objects because you can easily share them instead of copying them. And from my experience, in a well-designed application, optimization and tuning can reduce memory usage by 30-50%.

Sunday, February 2, 2014

JRockit JVM tuning

This post covers the command line options for setting various heap sizing and GC related parameters. Most of this material is based on the Oracle documentation, which you can consult for more detail.

Basic Tuning:


Tuning the Heap Size

The heap is the area where Java objects reside. A large heap decreases the garbage collection frequency but may take slightly longer to garbage collect. Typically a heap should be at least twice the size of the live objects in the heap, meaning that at least half of the heap should be freed at each garbage collection. For server applications you can usually set the heap as large as the available memory in your system will allow, as long as this doesn’t cause paging.

Set the heap size using the following command line options:

 -Xms:<size>, which sets the initial and minimum heap size.
 -Xmx:<size>, which sets the maximum heap size.
For example a server application running on a machine with 2 GB RAM memory could be started with the following settings:

java -Xms:800m -Xmx:1000m MyServerApp

This starts the JVM with a heap of 800 MB and allows the heap to grow up to 1000MB.

For in-depth information on setting the heap size, see Setting the Heap Size.


Tuning the Garbage Collection

Garbage collection is the process of reclaiming space from objects that are no longer in use, so that this space can be used for allocation of new objects. Garbage collection uses system resources in one way or another. By tuning the garbage collection you can decide how and when the resources are used. The JRockit JVM offers three garbage collection modes and a number of static garbage collection strategies. These allow you to tune the garbage collection to suit your application’s needs.

Select the garbage collection mode by using one of the following options:

 -XgcPrio:throughput, which defines that the garbage collection should be optimized for application throughput. This is the default garbage collection mode.
 -XgcPrio:pausetime, which defines that the garbage collection should be optimized for short garbage collection pauses.
 -XgcPrio:deterministic, which defines that the garbage collection should be optimized for very short and deterministic garbage collection pauses. This option is only available as part of Oracle JRockit Real Time.
For example a transaction based application which requires reasonably low latencies could be started with the following settings:

java -XgcPrio:pauseTime MyTransactionApp

This starts the JVM with the garbage collection optimized for short garbage collection pauses.

For in-depth information on selecting a garbage collection mode or a static garbage collection strategy, see Selecting and Tuning a Garbage Collector.

Tuning the Nursery Size

Some of the garbage collection modes and strategies in the JRockit JVM use a nursery. The nursery is an area of the heap where new objects are allocated. When the nursery becomes full it is garbage collected separately in a young collection. The nursery size decides the frequency and duration of young collections. A larger nursery decreases the frequency but slightly increases the duration of each young collection.

In the JRockit JVM R27.3.0 and later the nursery size is adjusted automatically to optimize for application throughput if you use -XgcPrio:throughput (default) or -Xgc:genpar. For other garbage collection modes and static strategies or older versions of the JVM you may want to tune the nursery size manually. Typically the nursery size should be as large as possible while maintaining reasonably short young collection pauses. Depending on the application, a reasonable nursery size can be anything from a few megabytes up to about half of the heap size.

Set the nursery size by using the following command line option:

 -Xns:<size>
For example a transaction based application running on a machine with 2GB RAM memory could be started with the following settings:

java -Xms:800m -Xmx:1000m -XgcPrio:pausetime -Xns:100m MyTransactionApp

This starts up the JVM with a heap of 800 MB, allowing it to grow up to 1000 MB. The garbage collection is set to optimize for pause times and the nursery size is set to 100 MB. Note that the dynamic garbage collection mode may choose to run without a nursery, but whenever a nursery is used it will be 100 MB.

For in-depth information on how to tune the nursery size, see Setting the Nursery and Keep Area Size.

Tuning the Pause Target

-XgcPrio:pausetime and -XgcPrio:deterministic use a pause target for optimizing the pause times while keeping the application throughput as high as possible. A higher pause target usually allows for a higher application throughput, thus you should set the pause target as high as your application can tolerate.

Set the pause target by using the following command line option:

 -XpauseTarget:<time>
For example a transaction based application with transactions that normally take 100 ms and time out after 400 ms could be started with the following settings:

java -XgcPrio:pausetime -XpauseTarget:250 MyTransactionApp

This starts up the JVM with garbage collection optimized for short pauses with a pause target of 250 ms. This leaves a 50 ms margin before time-out for 100 ms transactions that are interrupted by a 250 ms garbage collection pause.

For in-depth information on tuning the pause target, see Setting a Pause Target for Pausetime Mode.


Performance Tuning


To be able to tune your JVM for better application throughput you must first have a way of assessing the throughput. A common way of measuring the application throughput is to time the execution of a pre-defined set of test cases. Optimally the test cases should simulate several different use cases and be as close to real scenarios as possible. Also, one test run should take at least a few minutes, so that the JVM has time to warm up.

This section describes a few optional performance features that improve the performance for many applications. Once you have a way of assessing the throughput of your application you can try out the following features:

 Lazy Unlocking
 Call Profiling
 Large Pages
Lazy Unlocking

The JRockit JVM R27.3 and later offers a feature called lazy unlocking. This feature makes synchronized Java code run faster when the contention on the locks is low.

Try this feature on your application by adding the following option to the command line:

 -XXlazyUnlocking
For more information on this option, see the documentation for -XXlazyUnlocking.

Call Profiling

Call profiling enables the use of more advanced profiling for code optimizations and can increase the performance for many applications. This option is supported in the JRockit JVM R27.3.0 and later versions.

Try this feature on your application by adding the following option to the command line:

 -XXcallProfiling
For more information on this option, see the documentation for -XXcallProfiling.

Large Pages

The JRockit JVM can use large pages for the Java heap and other memory areas in the JVM. To use large pages, you must first configure your operating system for large pages. Then you can add the following option to the Java command line:

 -XlargePages
For complete instructions on how to use this option and configure your operating system for large pages, see the documentation for -XlargePages.


Advanced Tuning

Some applications may benefit from further tuning. It is important that you verify the results of the tuning by monitoring and benchmarking your application. Advanced tuning of the JRockit JVM can give you improved performance and predictable behavior if done correctly, while incorrect tuning may lead to uneven performance, low performance or performance degradation over time.

Tuning Compaction

Compaction of objects is the process of moving objects closer to each other in the heap, thus reducing the fragmentation and making object allocation easier for the JVM. The JRockit JVM compacts a part of the heap at each garbage collection (or old collection, if the garbage collector is generational).

Compaction may in some cases lead to long garbage collection pauses. To assess the impact of compaction on garbage collection pauses you can either monitor the -Xverbose:gcpause outputs or create a JRA recording and look at the garbage collection pauses in the Java Runtime Analyzer (see Using Oracle JRockit Mission Control Tools for more information). Look for old collection pause times and pause parts called “compaction” and “reference updates”. The compaction pause times depend on the compaction ratio and the compact set limit.

Compaction Ratio

The compaction ratio determines how many percent of the heap will be compacted during each garbage collection (old collection). The compaction ratio is set using the following option:

 -XXcompactRatio:<percentage>
You can tune the compaction ratio if the garbage collection pauses are too long because of compaction. As a start, you can try lowering the compaction ratio to 1 and see if the problem persists. If it doesn't, you should try gradually increasing the compaction ratio as long as the compaction times stay short. A good value for the compaction ratio is usually between 1 and 20, sometimes even higher. If the problem persists even though you set the compaction ratio to 1, you can try changing the compact set limit.

Setting the compaction ratio too low may increase the fragmentation and the amount of “dark matter”, which is free space that is too small to be used for object allocation. You can see the amount of dark matter in JRA recordings.
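
For example, a first experiment following the advice above might be (the heap sizes are placeholders):

java -Xms:3g -Xmx:3g -XXcompactRatio:1 MyServerApp

If the compaction pause parts disappear from the JRA recording, raise the ratio step by step until pauses start growing again.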

Compact Set Limit

The compact set limit sets a limit on how many references there can be to objects within the compaction area. If the number of references exceeds this limit, the compaction is canceled. The compact set limit is set using the following option:

 -XXcompactSetLimit:<references>
You can tune the compact set limit if the garbage collection pauses are too long due to compaction. As a start, you can try setting the compact set limit as low as 10,000. If the problem is solved, try gradually increasing the compact set limit as long as the compaction times stay low. A normal value for the compact set limit is usually between 100,000 and several million, while lower values are used when the pause time limits are very low.

Setting the compact set limit too low may stop compaction from being done altogether, which you can see in the verbose logs or in a JRA recording, where all compactions are noted as “aborted”. Running without any compaction at all may lead to increasing fragmentation, which will in the end force the JVM to perform a full compaction of the whole heap at once, which may take several seconds. Thus we recommend that you do not decrease the compact set limit unless you really have to.

Note: -XXcompactSetLimit has no effect when -XgcPrio:deterministic or -XgcPrio:pausetime is used. For these garbage collection modes you should not tune the compaction manually, but instead use the -XpauseTarget option to tune the garbage collection pauses.
For in-depth information on how to tune compaction, see Tuning the Compaction of Memory.

Tuning the TLA size

The thread local area (TLA) is a chunk of free space reserved on the heap or the nursery and given to a thread for its exclusive use. A thread can allocate small objects in its own TLA without synchronizing with other threads. When the TLA gets full the thread simply requests a new TLA. The objects allocated in a TLA are accessible to all Java threads and are not considered “thread local” in any way after they have been allocated.

Increasing the TLA size is beneficial for multi threaded applications where each thread allocates a lot of objects. Increasing the TLA size is also beneficial when the average size of the allocated objects is large, as this allows larger objects to be allocated in the TLAs. Increasing the TLA size too much may however cause more fragmentation and more frequent garbage collections. To assess the sizes of the objects allocated by your application you can do a JRA recording and view object allocation statistics in the Java Runtime Analyzer. See Using Oracle JRockit Mission Control Tools for more information on JRA.

The TLA size is set using the following option:

 -XXtlaSize:min=<size>,preferred=<size>
The “min” value is the minimum TLA size, while the “preferred” value is a preferred size. This means that TLAs will be of the “preferred” size whenever possible, but may be as small as the “min” size. Typically the preferred TLA size can be up to twice the size of the largest commonly used object size in the application. Adjusting the min size may have an effect on garbage collection performance, but is seldom necessary. A normal value for the min size is 2 KB.
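
For example (the sizes are placeholders; verify the effect with a JRA recording):

java -Xms:1g -Xmx:1g -XXtlaSize:min=2k,preferred=16k MyApp

This keeps the usual 2 KB minimum and raises the preferred TLA size for an application that allocates many mid-sized objects.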

For in-depth information about tuning the TLA size, see Optimizing Memory Allocation Performance.

The following sections list some best practices for tuning the JRockit JVM for a number of specific applications and application types.

Oracle WebLogic Server

Oracle WebLogic Server is an application server, and as such it requires high application throughput. An application server is often set up in a controlled environment on a dedicated machine. Try the following when tuning the JRockit JVM for Oracle WebLogic Server:


  •  Use a large heap, several gigabytes if the system allows for it.
  •  Set the initial/minimum heap size (-Xms) to the same value as the maximum heap size (-Xmx).
  •  Use the default garbage collection mode, -XgcPrio:throughput.


Oracle WebLogic SIP Server

Oracle WebLogic SIP Server is an application server specialized for the communications industry. Typically it requires fairly low latencies and is run in a controlled environment on a dedicated machine. Try the following when tuning the JRockit JVM for Oracle WebLogic SIP Server:


  •  Use a large heap, at least a couple of gigabytes if the system allows for it.
  •  Set the initial/minimum heap size (-Xms) to the same value as the maximum heap size (-Xmx).
  •  Use the garbage collection mode optimized for pause times, -XgcPrio:pausetime, or the static generational concurrent garbage collector, -Xgc:gencon.
  •  Use a fairly small nursery, in the range of 50-100 MB.
  •  Decrease the compaction ratio or compact set limit to lower and even out the compaction pause times, see Tuning Compaction for more information.