Notes of a (Java) developer


Software developer Programmer

In a few words, what is Java?

Java is the trademark of a software platform consisting of three sets of elements:

  1. A programming language
  2. A virtual machine (JVM)
  3. Runtime environments (JRE and others)

Each component of the platform is defined by a specification. Different implementations of each specification allow the support of Java programs in multiple environments.

The Java platform components were initiated by Sun Microsystems. They currently evolve within the JCP (Java Community Process), a gathering of partners around Oracle Corp. (after its acquisition of Sun).

JCP members ask for new specifications by proposing a JSR (Java Specification Request). After approval, the JCP forms an expert group to work on the JSR. A final release is obtained with a Reference Implementation (RI) and a Technology Compatibility Kit (TCK).

Runtime environments come in three editions:

  • Java SE (Standard Edition)
  • Java EE (Enterprise Edition)
  • Java ME (Micro Edition)

Compile-time checking in Java


In a previous post I wrote about code quality: in a few words, I believe in static checking. "Static" means without executing the program. From a software-process point of view, this implies before runtime, in other words at compile time. For a developer, it can be very useful to have such verification simply by compiling, without executing a specific tool (by the way, some very interesting software of that kind is available, FindBugs for example).

Here is the point: how could that kind of control be performed without rewriting a compiler from scratch?

The Java platform provides a very interesting feature to perform such operations: annotations (JSR 175). These are metadata defined as Java types. Annotations were introduced in the language with Java SE 5 and can be processed at compile time or at runtime.
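As a minimal illustration, here is a made-up annotation type (`@Audited` is my own name, not part of any library), declared with runtime retention so that the metadata can be read back through reflection:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// A minimal annotation type: metadata defined as a Java type.
// @Audited is a made-up name, not part of any standard API.
@Retention(RetentionPolicy.RUNTIME)   // kept in the class file, readable at runtime
@Target(ElementType.METHOD)
@interface Audited {
    String value() default "";
}

class Service {
    @Audited("payments")
    void process() { }
}

class AnnotationDemo {
    // Runtime processing: read the metadata back through reflection.
    static String auditTag() {
        try {
            return Service.class.getDeclaredMethod("process")
                    .getAnnotation(Audited.class).value();
        } catch (NoSuchMethodException e) {
            throw new AssertionError(e);
        }
    }
}
```

With `RetentionPolicy.SOURCE` or `CLASS`, the same metadata would instead be consumed at compile time by an annotation processor.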

The Sun Java SE distribution provided from the start a facility for annotation processing at compile time, called APT (Annotation Processing Tool). This was a vendor-specific feature (in com.sun packages) executed by a specific, eponymous command-line tool.

Since Java 6, that function has been part of the platform with JSR 269 (the javax.annotation.processing package, whose reference implementation is part of the Java SE distribution). Contributed processors can be executed along with compilation. With Java SE 8, the historical APT packages and the specific tool were removed (after deprecation in Java SE 7).

Although the topic seems basic for the Java platform, exhaustive documentation for annotation processor development is lacking. I found some precious information in a blog post. The Eclipse platform provides very precious functions with the JDT-APT project.

A very interesting presentation develops many important points about compile-time checking: what can be done with annotation processors and what requires more. For example, a processor does not provide the structure of the AST and thereby does not allow flow analysis.

A processor must implement the javax.annotation.processing.Processor interface. The implementation should be made known to the platform using the SPI mechanism. An abstract class, AbstractProcessor, is provided to help implementations.
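A minimal skeleton might look as follows ("Audited" is a placeholder annotation name of mine). Registration would be done through the SPI, i.e. a META-INF/services/javax.annotation.processing.Processor file listing the processor's fully qualified name:

```java
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.annotation.processing.SupportedSourceVersion;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.tools.Diagnostic;

// Skeleton of a compile-time processor: it only reports a note for each
// annotated element. "Audited" is a placeholder annotation name.
@SupportedAnnotationTypes("Audited")
@SupportedSourceVersion(SourceVersion.RELEASE_8)
class AuditProcessor extends AbstractProcessor {
    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        // Called by the compiler for each processing round.
        for (TypeElement annotation : annotations) {
            for (Element element : roundEnv.getElementsAnnotatedWith(annotation)) {
                processingEnv.getMessager().printMessage(
                        Diagnostic.Kind.NOTE,
                        "audited element: " + element.getSimpleName(), element);
            }
        }
        return true; // the annotation is claimed, no later processor handles it
    }
}
```

Raising a `Diagnostic.Kind.ERROR` instead of a `NOTE` is what turns such a processor into a compile-time checker: the compilation then fails.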

A follow-up details the implementation of a compile-time processor for the @Transactional annotation of the Spring framework.

A Lucene-based map


Last year I needed to store strings in a Java Map using a limited amount of memory.

The solution was rather simple: using a Lucene index to store the content on disk, which also provides interesting access performance. Here is the Gist:

Apache Solr: JVM memory management and mmap system calls


Unlike other indexing software such as Elasticsearch, Solr can be deployed in any Java Servlet container. The documentation provides an example with Apache Tomcat: this is our target environment at the MNHN. Just like a standard Tomcat installation, this raises the question of memory allocation to the JVM. After reading the Solr section on the topic, we chose to allocate a large part of the RAM to the JVM (5 GB out of a total of 6, a single gigabyte being spared for the system).

We quickly identified a main problem: while indexing, the Solr engine issued timeout errors on select queries. A Stack Overflow exchange describes a similar behavior. On the server side, we noticed an important usage of virtual memory.

As I answered in that exchange, we solved this problem a few months ago after reading a blog post by Uwe Schindler (a Solr committer). With Solr 4 and several Solr 3 versions, you have to leave an important share of your RAM free so that the system can properly use the mmap system call. This is due to the introduction in Solr of a new Lucene component: the MMapDirectory. The blog post gives plenty of information on the system configuration. In our case, this solved the problem: we could finally index without any more timeout issues.

Multiple standard API implementations (SPI) and ClassLoaders


I recently faced a singular situation: I had to adapt an application as a Solr plug-in, and that application was using two different JPA implementations. As mentioned in a previous post, in a traditional environment a standard API implementation is usually specified using the SPI. A central interface of the API is identified (e.g. java.sql.Driver for JDBC). Then the fully qualified name of the implementing class is specified in a file available on the ClassPath at the location META-INF/services/{"fullInterfaceName"}. Multiple implementations can be specified on separate lines; this is an important point.

Usually nobody cares about the SPI: each implementation ships its own META-INF/services declaration. If you want to check, just take a look inside the JAR of your usual JDBC driver (type 4). This is why, since Java 6, the JDBC DriverManager no longer requires a Class.forName(...) call. Things become more complex when multiple implementations are available on the ClassPath: for example, when an application uses both a PostgreSQL and a MySQL driver. This is a common case. The JDBC API solves the problem with a workaround: a specific system property (jdbc.drivers) can be used to specify the class names. This is not the case for all standard APIs. The usual solution is then to define your own META-INF/services/{"fullInterfaceName"} file in your application ClassPath. But with a Solr plug-in, as in my situation, the question of class loading is difficult to handle.

In order to troubleshoot the loading of SPI implementations, the JRE ServiceLoader#load(Class,ClassLoader) method is a very useful tool. In my case, it showed me that the PersistenceProvider implementations specified in my META-INF/services/javax.persistence.spi.PersistenceProvider file were not properly loaded in the Solr plug-in. As a consequence, the call to Persistence.createEntityManagerFactory(String) failed.
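The same kind of check can be sketched against a well-known SPI such as java.sql.Driver (the helper class name below is mine): it lists the implementations a given ClassLoader actually sees through META-INF/services declarations.

```java
import java.sql.Driver;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Troubleshooting helper (the class name is mine): list the java.sql.Driver
// implementations that a given ClassLoader can discover through the SPI.
class SpiInspector {
    static List<String> discoveredDrivers(ClassLoader loader) {
        List<String> names = new ArrayList<>();
        for (Driver driver : ServiceLoader.load(Driver.class, loader)) {
            names.add(driver.getClass().getName());
        }
        return names;
    }
}
```

Running this with the plug-in's ClassLoader versus the thread's context ClassLoader is an easy way to spot which declarations are visible where.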

Understanding that the problem of implementation discovery was related to the ClassLoader, I had a look at the source of PersistenceProviderResolverHolder. It showed me that the ClassLoader used for the discovery of the JPA implementations was the current thread's context ClassLoader.


Then the solution was obvious: before calling Persistence.createEntityManagerFactory(...), I just forced the ClassLoader associated with the current thread:

ClassLoader cLoader = getClass().getClassLoader();
Thread.currentThread().setContextClassLoader(cLoader);

Well, in the end everything is logical. But such problems can be quite challenging to understand. The Java platform could be improved on that point. Perhaps an improvement will come with project Jigsaw?


EMF model change listener


As explained in the EclipseZone forum exchange on the topic, two strategies are available to listen to changes on an EMF model:

  1. EContentAdapter (from the EMF core)
  2. ResourceSetListener (from the EMF/transaction)

The EContentAdapter notifies of changes on any EMF object (EObject). This requires direct access to the object, which does not help much for listening to third-party changes (an external editor or anything else). A blog post details how to use it.

The ResourceSetListener solution, far more complete from a transactional point of view, helps to listen to external editor changes. It follows the observer pattern. Access to the editor's editing domain is required, and it must be of type TransactionalEditingDomain in order to expose the registration method.

Decoding simple JDT signatures


The JDT sometimes returns simplified type signatures, for example the IMethod#getParameterTypes() method can return values such as QString instead of java.lang.String.

To decode such values, the org.eclipse.jdt.core.Signature class provides utility functions. Signature#getSignatureSimpleName(String) transforms QString into the simple name, i.e. String.

To get the full name, the containing type allows resolving the simple name. IType#resolveType(String) transforms this simple name using the imports of the containing type. The result is a two-dimensional array. The first dimension covers the case where multiple answers are possible. The second separates the package name from the simple name.

The following code returns the first full name of a method's return type:

	String name = method.getReturnType();
	String simpleName = Signature.getSignatureSimpleName(name);
	IType type = method.getDeclaringType();
	String[][] allResults = type.resolveType(simpleName);
	String fullName = null;
	if (allResults != null) {
		String[] nameParts = allResults[0];
		if (nameParts != null) {
			fullName = "";
			for (int i = 0; i < nameParts.length; i++) {
				if (fullName.length() > 0) {
					fullName += '.';
				}
				if (nameParts[i] != null) {
					fullName += nameParts[i];
				}
			}
		}
	}
	return fullName;

An exchange on Stack Overflow details the code for parameter types' full names.

Memory errors in Java


Memory management in Java is largely simplified by garbage collection. This automatic management does not prevent every memory problem. Here are some basic elements I find useful to review when facing a memory error.

The memory is divided into two distinct spaces, even if this may change in Java 8:

  • The PermGen space contains class definitions
  • The Heap space contains objects

When a memory problem occurs, an error is raised. This throwable contains information about the cause of the problem and the impacted memory space. Most of the time, this information is sufficient to help solve the problem.

The JVM options mentioned in this post are for HotSpot (the Sun/Oracle implementation). For more information, the official HotSpot tuning documentation is very helpful.

PermGen saturation (java.lang.OutOfMemoryError: PermGen space) occurs when the space allocated for class definitions is full. This can happen if the environment loads a large number of classes (common in Java EE environments). The allocated space can be configured via the JVM option -XX:MaxPermSize=256m (for 256 megabytes).

In some situations, this saturation can be caused by hot deployments. A workaround for such a problem is the JVM option -XX:+CMSClassUnloadingEnabled, which clears all references to obsolete class definitions. A very helpful option when processing multiple hot deployments in a Tomcat environment.

A blog post by Frank Kieviet explains a pattern that could also cause PermGen space errors.

Created objects stay in the Heap space as long as they are referenced by the executed program. When an object is no longer referenced, it becomes available for release by the Garbage Collector (GC). Most of the time, the GC acts when the available memory becomes low. This is highly configurable with many JVM options.
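As a quick sanity check of the heap actually granted to the JVM (it reflects the -Xmx/-Xms settings), the Runtime API exposes the figures; the helper class below is just an illustration of mine:

```java
// Quick sanity check of the heap granted to the running JVM
// (reflects -Xmx / -Xms settings); MemoryInfo is my own helper name.
class MemoryInfo {
    static long maxHeapMb()   { return Runtime.getRuntime().maxMemory()   / (1024 * 1024); }
    static long totalHeapMb() { return Runtime.getRuntime().totalMemory() / (1024 * 1024); }
    static long freeHeapMb()  { return Runtime.getRuntime().freeMemory()  / (1024 * 1024); }
}
```

Logging these values at startup is an easy way to confirm that the JVM options were actually taken into account by the container.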

If there is no memory leak, the easy solution is to increase the available space.

Everything becomes complicated when there is a memory leak. The cause can be difficult to identify. Several tools help to identify the nature of the problem.

I personally generate an hprof file using the JVM option -XX:+HeapDumpOnOutOfMemoryError. Another JVM option, -XX:HeapDumpPath=..., allows specifying the location of the generated file. The Eclipse MAT tool helps to visualize the generated file.

Another solution is to run your program with the Eclipse TPTP tools. But I prefer to avoid this option because it requires a lot of resources to run (and was quite unstable the last time I used it).

This error (java.lang.OutOfMemoryError: GC overhead limit exceeded) happens when the system spends too much time executing garbage collection. Literally, there is no memory saturation, but the system uses most of its memory despite garbage collections. A simple workaround can be to extend the Heap space. If the problem persists, tuning the GC configuration will probably be the solution. A hint can be the -XX:-UseParallelGC JVM option.

Design by contract, assertions and exceptions


Understanding design by contract is, I think, important for software quality in OOP because its principles are clear and efficient.

The idea was introduced in 1988, building on assertions as defined in 1969 in the seminal work behind an eponymous logic.

Assertions are associated with an object's methods and qualified in one of three categories:

  • Preconditions are assertions true before executing the method
  • Postconditions are assertions true after executing the method
  • Invariants are assertions true before and after executing the method

In practice...

The idea was included in a dedicated programming language, where assertions are checked in a static way (at compile time).

While the Java programming language provides support for assertions, they are checked only in a dynamic way (at runtime) and are disabled by default. Because of runtime checking, exceptions are largely used instead of assertions. In Java, a proper use of exceptions can be more meaningful to the programmer than a proper use of assertions because it provides more detail about the malfunction conditions (the stack trace). At first glance, assertions (in the design-by-contract sense) seem to be very far from exceptions. This is to some extent a paradox: I think the developer should consider assertion concepts every time exceptions are used.
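For reference, the assert keyword can express pre- and postconditions directly (the Account class is a made-up example of mine; the checks only fire when the JVM runs with -ea):

```java
// Pre- and postconditions with the Java assert keyword; Account is a
// made-up example. The checks only run when the JVM is started with -ea.
class Account {
    private int balance;

    Account(int initialBalance) { this.balance = initialBalance; }

    int getBalance() { return balance; }

    void withdraw(int amount) {
        // Precondition: must hold before the method body executes
        assert amount > 0 && amount <= balance : "precondition violated";
        int before = balance;
        balance -= amount;
        // Postcondition: must hold after the method body executes
        assert balance == before - amount : "postcondition violated";
    }
}
```

The fact that these checks silently disappear without -ea is precisely why production Java code leans on exceptions instead.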

In Effective Java, Joshua Bloch defines checked exceptions as recoverable conditions and runtime exceptions as programming errors.

In a non-Java context, it has been said that an exception is a situation where preconditions are satisfied but postconditions cannot be satisfied.

I think that, to respect Bloch's terms,

  • Runtime exceptions should be used for checking preconditions
  • Checked exceptions should be used when preconditions are true but postconditions cannot be satisfied, assuming that the method is correct (e.g. the network connection is broken). 

If a runtime exception is raised, the calling method is probably bugged (or its preconditions are not properly checked).
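Applied to a made-up example (the Transfer class and its parameters are mine, not from any framework), this mapping looks like:

```java
import java.io.IOException;

// A runtime exception signals a violated precondition (a caller bug);
// a checked exception signals a postcondition that cannot be satisfied
// despite valid input (a recoverable condition). Transfer is a made-up class.
class Transfer {
    static void send(byte[] data, boolean connectionUp) throws IOException {
        if (data == null || data.length == 0) {
            throw new IllegalArgumentException("precondition: data must be non-empty");
        }
        if (!connectionUp) {
            throw new IOException("postcondition unsatisfiable: connection is down");
        }
        // the actual transfer would happen here
    }
}
```

The caller can legitimately recover from the IOException (retry, fall back), while the IllegalArgumentException points at a bug to fix.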

Some extensions to the Java platform provide support for static checking (assertion checking at compile time). For example, JML takes advantage of Java comments: this is interesting for compatibility with traditional compilation.

Java compilation with the Eclipse IDE and Tomcat


You probably have noticed that the Eclipse IDE does not require a JDK to compile a Java program: this is possible with a simple JRE.

The IDE's Java Development Tools (JDT) include the appropriate components to compile a program. An AST parser is exposed to help with code manipulation. Compilation is generally defined in two steps: analysis and synthesis. The AST is the analyzed version of a program, organized as a tree connecting declarations and definitions. A specific API called the Java model provides an abstraction layer over the AST. The C Development Tools (CDT) also include an AST parser.

With such information, the IDE is able to perform complex operations on the design of Java code. The functions exposed in the Refactor menu take advantage of the Eclipse AST parser.

The compilation performed using the Eclipse AST is reliable. Since Tomcat 5.5, it has been used by the Jasper engine to compile JSPs into servlets. This is why, since that version, the servlet container no longer requires a JDK to run.

You should be careful when compiling with Eclipse: there are a few differences from compiling with the JDK (especially regarding generics casting).

References, caching and garbage collection


For a long time I wondered how frameworks manage memory while caching.

The EclipseLink documentation gave me the answer. The Java SE platform provides special references (in the java.lang.ref package):

  • SoftReference
  • WeakReference
  • PhantomReference

According to the official documentation, soft references are used to give memory-sensitive behavior to cached references. This can be very efficient.

But access to soft-referenced objects needs to be done via specific accessor objects, so that cleared references can be renewed (e.g. from a persistent storage).

The official API also provides a special map with weakly referenced keys (WeakHashMap). This class can be useful but is not really appropriate for caching, as it is the keys that are released.
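As a sketch of the soft-reference pattern (SoftCache is my own illustration, not a platform class): values are held through SoftReference and transparently renewed via a loader when the GC has cleared them.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Memory-sensitive cache sketch (SoftCache is my own name, not a platform
// class): values are held through SoftReference, so the GC may clear them
// under memory pressure; a cleared entry is renewed through the loader
// (e.g. re-read from persistent storage).
class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();
    private final Function<K, V> loader;

    SoftCache(Function<K, V> loader) { this.loader = loader; }

    synchronized V get(K key) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {                  // never cached, or cleared by the GC
            value = loader.apply(key);        // renew from the backing store
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }
}
```

This is the accessor indirection mentioned above: callers never hold the SoftReference directly, only values obtained through get.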


Dynamic assignation and proxy objects


Still on JPA: you may wonder how lazy loading is managed. This is done via proxy objects.

The Java SE platform provides a Proxy class (java.lang.reflect.Proxy) which creates proxy objects for interfaces. But this does not help when you want to do the job on POJOs.
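For the interface case, the platform API works like this (Repository and the helper are made-up names for the illustration):

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// java.lang.reflect.Proxy only proxies interfaces: Repository and the
// helper below are made-up names for the illustration.
interface Repository {
    String find(String id);
}

class Proxies {
    static Repository logging(Repository target) {
        InvocationHandler handler = (proxy, method, args) -> {
            System.out.println("call: " + method.getName()); // cross-cutting concern
            return method.invoke(target, args);              // delegate to the real object
        };
        return (Repository) Proxy.newProxyInstance(
                Repository.class.getClassLoader(),
                new Class<?>[] { Repository.class },
                handler);
    }
}
```

Since the generated class implements the interface, the proxy can stand in anywhere a Repository is expected; there is simply no equivalent for a concrete class, hence the need for bytecode libraries.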

A framework allows doing this (and many other bytecode operations): Javassist. The ProxyFactory class allows building a substitute object for any class via the createClass method. I guess the substitution is done by subclassing.

Instances of this "dynamic" class can be created by calling the newInstance method. The created object implements the Proxy interface, to which a MethodHandler object can be associated. Such an object is called on any access to the dynamic instance (including null tests). For a lazy-loading field, this is where to replace the dynamic instance.

It is also recommended to specify a MethodFilter object on the ProxyFactory to avoid triggering the proxy when the reference is finalized.

	ProxyFactory factory = new ProxyFactory();
	factory.setSuperclass(MyClass.class);
	factory.setFilter(new MethodFilter() {
	    public boolean isHandled(Method m) {
	        // The filter to avoid finalization
	        return !m.getName().equals("finalize");
	    }
	});
	Class cl = factory.createClass();
	MethodHandler handler = new MethodHandler() {
	    public Object invoke(
	      Object self,
	      Method m,
	      Method proceed,
	      Object[] args
	    ) throws Throwable {
	        // On each access, execution of the original operation
	        return proceed.invoke(self, args);
	    }
	};
	MyClass cls = (MyClass) cl.newInstance();
	((Proxy) cls).setHandler(handler);

I crossed a single problem with Javassist: null tests. While using dynamic objects to avoid unnecessary database accesses, the handler performed the proxy replacement (database access) properly on null tests in most cases. But in some rare cases, the null test did not trigger the handler and did not behave properly. Perhaps this was a misuse of the library. Anyway, the workaround was to perform a double check (null reference test plus null test on a mandatory property) in order to ensure the handler call.

Edit (2014-10-16): this last point requires further explanation (in a few words, it was a misuse). I used Javassist proxies to implement lazy loading of an object-relational mapped attribute. The point was about replacing the proxy object. To do so, I was only listening to method calls on the proxy. But a field access is not a method call: it was possible to create direct references to the proxy object before replacing it. It was then impossible to replace all the assigned references, and the proxy object was still referenced (which caused the null tests to fail).




What are the Service Provider Interfaces (SPI)?

These are important elements of the Java platform. More documentation is available in the official tutorial. In a few words: you have probably noticed that the Java APIs mainly consist of interfaces. The SPI is the mechanism that sets up the appropriate implementation, so that programs don't rely on a specific implementation at compile time.

When an implementation is provided, there is one (or more) entry point associating it with the implemented interface. For example, the JPA API has a single entry point: the interface javax.persistence.spi.PersistenceProvider. All other interfaces are available through this one.

If you want to provide a JPA implementation, you first have to code a class implementing this interface. Then a text file named after the fully qualified name of the interface must be available to the class loader in the META-INF/services directory.
This means that, if you are working with the Eclipse IDE in a Java SE environment with an src source directory, the file is:

	src/META-INF/services/javax.persistence.spi.PersistenceProvider

This text file just has to contain the available implementations, one per line. If I want my own implementation (the provider org.test.jpaimpl.PersistenceProvider) to be available along with EclipseLink (the provider org.eclipse.persistence.jpa.PersistenceProvider), the file has to contain these two lines:

	org.test.jpaimpl.PersistenceProvider
	org.eclipse.persistence.jpa.PersistenceProvider
For the specific case of JPA, when multiple implementations are found by the class loader, the implementation to use can be specified in the persistence.xml file, in the provider element of a persistence-unit :

<persistence version="2.0"
    xmlns="http://java.sun.com/xml/ns/persistence"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://java.sun.com/xml/ns/persistence http://java.sun.com/xml/ns/persistence/persistence_2_0.xsd">
  <persistence-unit name="COLLECTIONS" transaction-type="RESOURCE_LOCAL">
    <provider>org.test.jpaimpl.PersistenceProvider</provider>
  </persistence-unit>
</persistence>