Backus-Naur Form (BNF) processing

I recently resurrected in my spare time an old idea: a tool to transform a Backus-Naur Form (BNF) grammar into an ANTLR grammar. The purpose was to easily generate a parser from an initial BNF grammar. What I had in mind was the conversion of a JPQL query string to Criteria objects. I had previously been surprised not to find such a grammar conversion tool available as open source. Here started ēmaitijǭ. Simple was the idea, humble was the ambition: at first glance, only character replacements.

Limited was my experience in text parsing. A first idea was to convert the BNF definition available in the Wikipedia article, a grammar of BNF rules written in BNF, into an ANTLR grammar in order to generate a parser for the transformation tool itself. Such a move seemed too complicated to me: I chose to process the initial file in a single class. The first ANTLR grammars were generated a few minutes later. Unfortunately, ANTLR rejected these grammars with many errors in its output. I finally came back to the idea of a traditional parsing tool, but wrote it myself: the initial ANTLR generation plan would have been the right one.

A few hours later, my parser did the same job as the single text transformation class. As a result, there were just as many errors in the ANTLR output. I had a look at the JPQL BNF input. Here is a single expression that challenged me:

{AVG |MAX |MIN |SUM} ([DISTINCT] state_field_path_expression) | COUNT ([DISTINCT] identification_variable | state_field_path_expression | single_valued_association_path_expression)

Reading properly was the first problem. I was misled by the parentheses: being used to common programming languages, I first considered these characters as grouping symbols, as in Extended BNF. What a mistake: these were literal characters in the expression. As a consequence, the parser did not read the sequence properly. Just a bug: easy to solve.

In the following section of the expression comes another issue:

([DISTINCT] state_field_path_expression)

Syntax was my second problem. In Extended BNF, literals are clearly identified with quotes. This is not the rule in standard BNF. Because of concatenation, how can the parser distinguish the literal parentheses from the expressions to be interpreted (in that case [DISTINCT] and state_field_path_expression)?

I did not properly solve that problem and modified the input BNF instead: I just introduced a whitespace before and after each parenthesis. After all, ēmaitijǭ was probably not a very good idea in the first place. Donald Knuth said about the form of BNF that it was "not a normal form in the conventional sense". This is probably why such a tool was not available online before. One main problem is that there is no strict definition of BNF.
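The whitespace workaround can be illustrated with a minimal Java sketch. This is not the ēmaitijǭ code: the class and method names are hypothetical, and the sketch goes one step further than the workaround by padding every structural character, not only the parentheses. It also assumes that upper-case words are literal keywords, that { } groups alternatives and that [ ] marks an optional part, which seems to match the JPQL expression above but is not guaranteed for other BNF dialects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class BnfToAntlrSketch {

    /** Pads structural characters with spaces, then splits on whitespace. */
    static List<String> tokenize(String bnf) {
        // Assumption: none of these characters ever appears inside a rule name.
        String padded = bnf.replaceAll("([(){}\\[\\]|])", " $1 ");
        List<String> tokens = new ArrayList<>();
        for (String token : padded.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Maps each token to an ANTLR-style equivalent. */
    static String toAntlr(String bnf) {
        StringBuilder out = new StringBuilder();
        for (String token : tokenize(bnf)) {
            String mapped;
            switch (token) {
                case "{": mapped = "("; break;   // group start
                case "}": mapped = ")"; break;   // group end
                case "[": mapped = "("; break;   // optional start
                case "]": mapped = ")?"; break;  // optional end
                case "|": mapped = "|"; break;   // alternative
                case "(":
                case ")":
                    mapped = "'" + token + "'";  // literal parenthesis
                    break;
                default:
                    // Assumption: an all upper-case word is a literal keyword,
                    // anything else is a reference to another rule.
                    boolean keyword = token.equals(token.toUpperCase(Locale.ROOT));
                    mapped = keyword ? "'" + token + "'" : token;
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(mapped);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAntlr(
                "{AVG |MAX |MIN |SUM} ([DISTINCT] state_field_path_expression)"));
    }
}
```

Once every term is delimited by whitespace, reading the expression is a plain split, and the only remaining decision is which tokens are literals.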

BNF is far more widely used than EBNF in formal definitions. Is the literal limitation a real problem? As a conclusion, if it were firmly stated that literals and other expression terms should be delimited with whitespace (instead of quotes or anything else), wouldn't it simplify the use of BNF for everyone? Well, I can hear the objection coming: it wouldn't be possible to express whitespace in a grammar. But is there a real need to specify whitespace precisely in a grammar? The underlying problem is that the definition of BNF grammars, because of its origins, is fuzzy, and its current use is approximate.

Multiple standard API implementations (SPI) and ClassLoaders

I recently faced a singular situation: I had to adapt an application as a Solr plug-in. That application was using two different JPA implementations. As mentioned in a previous post, in a traditional environment a standard API implementation is usually specified using the SPI. A central interface for the API is identified (e.g. java.sql.Driver for JDBC). Then the full name of the implementing class is specified in the content of a file available in the ClassPath under the location META-INF/services/{"fullInterfaceName"}. Multiple implementations can be specified on separate lines: this is an important point.

Usually nobody cares about the SPI: each implementation has its own META-INF/services declaration. If you want to check it, just take a look inside the JAR of your usual JDBC driver (type 4). This is why the JDBC DriverManager no longer requires a Class.forName(...) call since Java 6. Things become more complex when multiple implementations are available in the ClassPath: for example when an application uses both a PostgreSQL and a MySQL driver. This is a common case. The JDBC API solves the problem with a workaround: a specific system property (jdbc.drivers) can be used to specify the class names. This is not the case for all standard APIs. To solve that, the usual solution is to define your own META-INF/services/{"fullInterfaceName"} file in your application ClassPath. But with a Solr plug-in, as in my situation, it is difficult to handle the question of class loading.

In order to troubleshoot the loading of SPI implementations, the JRE ServiceLoader#load(Class, ClassLoader) method is a very useful tool. In my case, it showed me that the PersistenceProvider implementations specified in my META-INF/services/javax.persistence.spi.PersistenceProvider file were not properly loaded in the Solr plug-in. As a consequence, the call to Persistence.createEntityManagerFactory(String) failed.
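Such a check can be sketched as follows. The class and method names are hypothetical; the example uses java.sql.Driver because it ships with the JDK, but the same call with PersistenceProvider.class would list the JPA providers visible from a given ClassLoader. An empty list simply means that no META-INF/services declaration is visible from that loader.

```java
import java.sql.Driver;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

class SpiListing {

    /** Returns the class names of the providers visible from the given loader. */
    static <T> List<String> providerNames(Class<T> service, ClassLoader loader) {
        List<String> names = new ArrayList<>();
        // ServiceLoader reads the META-INF/services/<interface name> resources
        // that the given ClassLoader can see.
        for (T provider : ServiceLoader.load(service, loader)) {
            names.add(provider.getClass().getName());
        }
        return names;
    }

    public static void main(String[] args) {
        ClassLoader contextLoader = Thread.currentThread().getContextClassLoader();
        ClassLoader ownLoader = SpiListing.class.getClassLoader();
        // Comparing both lists shows whether the two loaders see the same providers.
        System.out.println("context loader: " + providerNames(Driver.class, contextLoader));
        System.out.println("own loader:     " + providerNames(Driver.class, ownLoader));
    }
}
```

Running the same listing with two different loaders is a quick way to see which one actually exposes the expected declarations.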

Understanding that the problem of implementation discovery was related to the ClassLoader, I had a look at the source of PersistenceProviderResolverHolder. It showed me that the ClassLoader used for the discovery of the JPA implementations was the current thread's:

Thread.currentThread().getContextClassLoader()

Then the solution was obvious. Before calling Persistence.createEntityManagerFactory(...), I just forced the ClassLoader associated with the thread:

ClassLoader cLoader = getClass().getClassLoader();
Thread.currentThread().setContextClassLoader(cLoader);
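A slightly more defensive variant, given here only as a sketch, restores the previous context ClassLoader once the call is done, so that the host application (Solr in my case) gets its thread back in the original state. The helper name withContextClassLoader is hypothetical, and "unitName" in the comment is a placeholder for a persistence unit name.

```java
import java.util.function.Supplier;

class ContextClassLoaderSwap {

    /** Runs the action with the given context ClassLoader, then restores the previous one. */
    static <T> T withContextClassLoader(ClassLoader loader, Supplier<T> action) {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        thread.setContextClassLoader(loader);
        try {
            return action.get();
        } finally {
            // Restored even when the action throws.
            thread.setContextClassLoader(previous);
        }
    }

    // Possible use inside a plug-in class (not compiled here, JPA is not on the ClassPath):
    //   withContextClassLoader(getClass().getClassLoader(),
    //           () -> Persistence.createEntityManagerFactory("unitName"));
}
```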

Well, in the end it is all logical. But such problems can be quite challenging to understand. The Java platform could be improved on that point. Perhaps an improvement provided by project Jigsaw?

Web components: adaptation of an XML document

Following my previous post, I tried to play with Polymer in my spare time. In my mind, Web components could be an important point for the architecture of the Web. On one side, the Web would be easier to handle for automated systems because of the separation of information and presentation. On the other side, Web applications would become easier to design for the same reason.

Software modeling is interesting because it provides a graphical representation of software. As a consequence, UML is an expressive representation for design and documentation. The Eclipse UML2 project relies on an XML file which tends to become a standard form for the persistence of UML models. All the information about a UML model is available in that XML file.

While discovering the Polymer platform came the idea of Polymeria: bridging the gap between the XML representation of a UML model and its graphical representation. The principle sounds simple: writing a Web component for each UML element. This echoes the idea put forward by E. Bidelman that every framework is compatible with DOM (further information is available in my previous post).

First constraint: Web component names must contain a dash in order to avoid overriding standard HTML element names. The problem is that Eclipse UML2 XML element names do not necessarily contain a dash. In other words, the XML file needs a transformation.
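This renaming step can be sketched with the standard DOM API. This is not the Polymeria transformation itself: the uml- prefix and the class name are assumptions made for the illustration, and colons in prefixed names are simply replaced with dashes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DashedNames {

    // Hypothetical prefix, chosen for this illustration only.
    static final String PREFIX = "uml-";

    /** Returns a name containing a dash, adding the prefix when needed. */
    static String dashed(String xmlName) {
        String flat = xmlName.replace(':', '-');
        return flat.contains("-") ? flat : PREFIX + flat;
    }

    /** Parses the XML text and renames every element to a dashed name. */
    static Document renameAll(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // Copy the live NodeList first, so that renaming does not disturb the iteration.
            NodeList live = doc.getElementsByTagName("*");
            List<Element> elements = new ArrayList<>();
            for (int i = 0; i < live.getLength(); i++) {
                elements.add((Element) live.item(i));
            }
            for (Element element : elements) {
                doc.renameNode(element, null, dashed(element.getNodeName()));
            }
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not transform the XML input", e);
        }
    }

    public static void main(String[] args) {
        Document doc = renameAll("<Model><packagedElement name=\"a\"/></Model>");
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```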

The main issue comes with the Web component for a UML attribute: the XML tag generated by most UML2 tools is self-closing. The behavior is quite surprising in some cases, and some side effects are completely unexpected. For example, some classes appear in the wrong package. I opened an issue in the Polymer project because the self-closing tag renders in a surprising way. It appears that several similar issues had been submitted before. Self-closing tags are not allowed in Web components.
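One way to work around this limitation is to expand every self-closing tag into an explicit pair of opening and closing tags before handing the document to the browser. The following is only a text-level sketch with hypothetical names: it assumes that attribute values never contain the < or > characters, which a real transformation should not rely on.

```java
import java.util.regex.Pattern;

class SelfClosingTags {

    // Group 1: the element name. Group 2: the attributes, if any.
    // Assumption: attribute values do not contain '<' or '>'.
    private static final Pattern SELF_CLOSING =
            Pattern.compile("<([A-Za-z][\\w:.-]*)([^<>]*?)\\s*/>");

    /** Rewrites <name attrs/> as <name attrs></name>. */
    static String expand(String markup) {
        return SELF_CLOSING.matcher(markup).replaceAll("<$1$2></$1>");
    }

    public static void main(String[] args) {
        System.out.println(expand("<uml-ownedAttribute name=\"id\"/>"));
    }
}
```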

Every framework is compatible with DOM. But DOM is not necessarily compatible with Web components. After these two limitations, it seems Web components are perhaps XML compliant. However, an XML model is tricky to adapt into a Web components document. The erratic support of self-closing tags can be rather disappointing for a first contact with Web components...

Here is an HTML sample of Polymeria showing the transformations needed and the Web components obtained (a proper rendering has been tested with Chrome and Firefox).

EMF model change listener

As explained in the EclipseZone forum exchange on the topic, two strategies are available to listen to changes on an EMF model:

EContentAdapter (from the EMF core)
ResourceSetListener (from the EMF/transaction)

The EContentAdapter notifies of changes on any EMF object (EObject). This requires direct access to the object, which does not help much when listening for third-party changes (from an external editor or anything else). A tutorial by Lars Vogel details how to use it.

The ResourceSetListener solution, far more complete from the transaction point of view, helps to listen for external editor changes. It follows the observer pattern. Access to the editor's editing domain is required, and it must be of type TransactionalEditingDomain in order to expose the registration method.

Programmatically refactor with the JDT

Using the internal rename package is not a very good idea (discouraged access warning). There is official documentation on that topic. A Stack Overflow exchange also provides some interesting information.