A Groovy DSL for the Creation of Test Data using JPA

When writing automated integration tests for software that works with a complex JPA data model, sooner or later one invariably faces the question of how semantically meaningful test data can be created without great effort. This article shows how Groovy can be used to define a Domain Specific Language (DSL) that allows test data to be defined in a way that is easily readable, modular and separate from the actual test code.

The original article from the magazine “Java aktuell” is available here for download.

Automated software tests make certain assumptions about the state of the system before verifying that the feature under test works correctly under those conditions. If the tests run at the integration level, these assumptions frequently manifest themselves as data in a relational database. Before running such a test, it is therefore necessary to create the appropriate test data. Often, however, the data required by individual test cases contradict each other, so a single defined baseline is not an option. Instead, the data have to be defined per test case and inserted into the underlying database. As a result, the test code quickly becomes incomprehensible whenever large hierarchies or large volumes of data are required.

It is therefore a good idea to factor out the creation of the test data so that the actual tests can focus on what matters. Two alternatives immediately come to mind: native SQL scripts and the DBUnit test library. Both, however, are oriented towards tables and columns rather than Java objects. The abstraction over the relational database gained via JPA in the production code would thus be lost again in the tests. This becomes painfully obvious when mapping references with foreign keys: the generation of the data can be separated from the test code, but nothing is gained in terms of clarity. Other alternatives tend to focus on creating large volumes of random data. They are suitable for performance tests, but not for specific test cases.

Requirements

What, however, would a solution look like that the database-shy Java developer would approve of?
1. Firstly, it would have to be possible to call the solution seamlessly from the Java test code.
2. Test data should be reusable and definable on a modular basis.
3. The solution should work on a completely object-oriented basis.
4. Saved entities should be made accessible to the calling code.
5. It should be possible to define the test data easily and in a readable form.

The Grails Fixtures plugin largely fulfils the aforementioned requirements on the basis of a Domain Specific Language (DSL). It is, however, closely interwoven with the Grails Framework. The following discussion therefore shows how the requirements can be implemented with limited effort in a Groovy DSL that can be used directly in conventional Java projects. The complete code is available at https://github.com/triologygmbh/test-data-loader.

Groovy to the Rescue

With its dynamic nature and plenty of syntactic sugar, the JVM language Groovy provides ideal conditions for defining one's own DSL in a straightforward way. More detailed information on Groovy is available at http://www.groovy-lang.org/documentation.html. Language features used in the solution described here are briefly introduced as they come up.

The Solution

The biggest challenge when developing the envisaged DSL is finding a syntax for defining the test data that is as simple as possible, and then converting that definition into actual JPA entities. To make this a little clearer, here is an example first: with the following snippet, the goal is to instantiate a JPA entity User, initialise its fields accordingly, save the entity in the database, and make it available to the test code under the name 'Peter'.

create User, 'Peter', {
  firstName = 'Peter'
  lastName = 'Pan'
}

Reading in and running Groovy scripts

Before we look at how the definition becomes an initialised entity, let us first discuss solutions for the remaining requirements.
Groovy is compiled to Java byte code. Groovy code can therefore be called directly from Java code as though it were written in Java, which means we get seamless integration with Java for free.
It is also possible to use Groovy as a scripting language. Standard Groovy tools make it relatively easy to programmatically read in and run script files. This option is ideal for our purposes: it allows us to move our test data definitions out into .groovy files and load the required files per test case.
For this purpose, we introduce the EntityBuilder class, which accepts a file name and creates the entities defined in that file. The complete code is available in the GitHub repository mentioned above.

class EntityBuilder {
  void buildEntities(String entityDefinitionFile) {
    DelegatingScript script = createExecutableScriptFromEntityDefinition(entityDefinitionFile)
    script.setDelegate(this)
    script.run()
  }
  // …
}

buildEntities accepts the file name of an entity definition script. The definition is read in and converted into an executable script instance. In doing so, a class is created at runtime that provides a run method containing the content of the script. We can now call this run method like any other method and thereby run the script. It is important to know that Groovy resolves references that cannot be resolved within the script at runtime. The DelegatingScript subclass used here allows a delegate to be set, against which such unresolved references are then resolved. At this point, the EntityBuilder sets itself as the delegate of the script; the reason for this will become clear later.
With this implementation, we can now define test data in a modular fashion in arbitrary script files and load it as required. Assuming that the entities defined in the scripts are actually instantiated and initialised, the question remains how the data enters the database.

The Glue Code

To decouple persisting the entities from their creation, the EntityBuilder provides the option of registering EntityCreatedListeners. That way, we can use a listener in the glue code in the TestDataLoader class (see repo) which takes care of saving the data. The TestDataLoader expects a fully initialised JPA EntityManager as a constructor parameter. Its loadTestData method can then be called by the client, passing entity definition files. After initialising the listener, it forwards the definitions to the EntityBuilder. The EntityBuilder in turn creates the defined entities and passes them through the listener interface to the EntityPersister for saving.
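The listener indirection described above can be sketched in plain Java. Note that the interface and class names here are assumptions modelled on the description, not the repository's exact API; the real persisting listener would call EntityManager.persist instead of recording to a list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener interface, modelled on the description above.
interface EntityCreatedListener {
    void entityCreated(Object entity);
}

// A listener that records created entities. The real TestDataLoader's
// listener would call EntityManager.persist(entity) at this point.
class RecordingPersister implements EntityCreatedListener {
    final List<Object> persisted = new ArrayList<>();

    @Override
    public void entityCreated(Object entity) {
        persisted.add(entity);
    }
}

public class GlueCodeSketch {
    public static void main(String[] args) {
        RecordingPersister persister = new RecordingPersister();
        // The EntityBuilder would invoke the listener once per created entity:
        persister.entityCreated("User[Peter]");
        persister.entityCreated("Department[lostBoys]");
        System.out.println(persister.persisted.size());
    }
}
```

Because the EntityBuilder only knows the listener interface, the saving strategy can be swapped without touching entity creation.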

Definition of the actual DSL

We are now able to define entities in arbitrary scripts, read in the definitions and save the created entities. During the actual creation of the entities, several Groovy features come into play: when calling methods, Groovy allows the parentheses surrounding the parameters to be omitted. Additionally, curly brackets define closures, i.e. runnable sections of code similar to Java 8 lambdas, which can be referenced and passed around via variables like ordinary objects. Closures can then be run at any location.
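The lambda analogy can be made concrete in plain Java: like a Groovy closure, the block below is stored in a variable and only executed when explicitly invoked.

```java
// A Groovy closure { ... } behaves much like this Java lambda: a block of
// code stored in a variable and run later, on demand.
public class ClosureAnalogy {
    public static void main(String[] args) {
        Runnable block = () -> System.out.println("running later");
        // Nothing has happened yet; the block runs only when invoked:
        block.run();
    }
}
```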
With that in mind, it becomes clear that the expression

create User, 'Peter', { } 

is nothing more than a call to the static create method of the EntityBuilder (statically imported) with three parameters. In Groovy, a class can be referenced simply by its name; the expression User is therefore equivalent to User.class.
Let us now take a look at what happens when we call create in the DSL.

class EntityBuilder {
  static <T> T create(Class<T> entityClass, String entityName, Closure entityData) {
    return instance().createEntity(entityClass, entityName, entityData)
  }

  private <T> T createEntity(Class<T> entityClass, String entityName, Closure entityData) {
    T entity = createEntityInstance(entityName, entityClass)
    executeEntityDataDefinition(entityData, entity)
    notifyEntityCreatedListeners(entity)
    return entity
  }

  private <T> T createEntityInstance(String entityName, Class<T> entityClass) {
    ensureNameHasNotYetBeenAssigned(entityName, entityClass)
    T entity = entityClass.newInstance()
    entitiesByName[entityName] = entity
    return entity
  }

  private void executeEntityDataDefinition(Closure entityDataDefinition, Object entity) {
    entityDataDefinition = entityDataDefinition.rehydrate(entity, this, this)
    entityDataDefinition.call()
  }

  // …
}

We can see that the static create method simply delegates the call to createEntity on the EntityBuilder's singleton instance. There, a new instance of the passed entity class is created and registered under the passed name in the java.util.Map entitiesByName. Please note: entitiesByName[entityName] = entity is equivalent to entitiesByName.put(entityName, entity). Once the new entity has been initialised with data in executeEntityDataDefinition, the registered listeners are finally notified.
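The registry logic behind createEntityInstance can be sketched in plain Java; the class name is an assumption, and the duplicate check stands in for the ensureNameHasNotYetBeenAssigned method referenced but not shown above.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the name-to-entity registry described above.
public class EntityRegistrySketch {
    private final Map<String, Object> entitiesByName = new HashMap<>();

    <T> T register(String entityName, T entity) {
        // Stands in for ensureNameHasNotYetBeenAssigned:
        if (entitiesByName.containsKey(entityName)) {
            throw new IllegalStateException("Name already assigned: " + entityName);
        }
        // Groovy's entitiesByName[entityName] = entity compiles to this put:
        entitiesByName.put(entityName, entity);
        return entity;
    }

    public static void main(String[] args) {
        EntityRegistrySketch registry = new EntityRegistrySketch();
        registry.register("Peter", new Object());
        try {
            registry.register("Peter", new Object()); // second registration fails
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```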

The actual magic takes place in the two lines of the executeEntityDataDefinition method. It takes the newly instantiated entity as a parameter, as well as the closure from the script in which the data for the entity are defined. To understand what happens, we have to dig a little deeper, however. Let us take another look at the closure that is passed to the create method as the last parameter in the script.

{
  firstName = 'Peter'
  lastName = 'Pan'
}

It appears as though values are assigned to variables. These variables are not declared anywhere, however, which means that another Groovy feature takes effect here: the expression myObject.someProperty = 'value' is equivalent to calling a setter, myObject.setSomeProperty('value'). This means that two setters are called in the closure, the only question being: on what? Since the fields "coincidentally" correspond to the properties of the User entity to be created, wouldn't it be useful if they were called directly on the entity? This is exactly what the two lines in executeEntityDataDefinition achieve. Similar to the way it handles scripts, Groovy is able to dynamically resolve method calls within a closure at runtime, and here too, it is possible to define a delegate for the closure. The rehydrate method called in executeEntityDataDefinition creates a copy of the closure and sets the first parameter as its delegate; in our case, this is the previously instantiated entity. If the closure is now run with entityDataDefinition.call(), the setters in the example are actually called on the User entity, which is thereby initialised with the corresponding data.
Time to put what we have achieved so far to the test before we refine the DSL further. Let us therefore start by defining a user:

import static de.triology.blog.testdataloader.EntityBuilder.create
import de.triology.blog.testdataloader.demo.User

create User, 'Peter', {
  firstName = 'Peter'
  lastName = 'Pan'
}

The Demo.java class demonstrates how the TestDataLoader can be used from Java code, and shows that the created entities are actually persisted. The DSL snippets come from the testData.groovy file (see repo).

Nested Entities

So far, so good – we have completed the round trip from the DSL through the database to the test code. It remains to be seen whether a complex data model can be handled, i.e. how we deal with references between entities. Let us assume that a user can be assigned to a department, so the User entity gains a @ManyToOne relationship to the Department. In the DSL, we can simply nest the creation of entities:

create User, 'Peter', {
  firstName = 'Peter'
  lastName = 'Pan'
  department = create Department, 'lostBoys', {
    name = 'The Lost Boys'
  }
}

What happens, however, if a second user belongs to the same department? Creating the department with the create method twice is obviously not an option. We therefore need a way to reference previously created entities from the DSL. Since we are using normal Groovy code, it would be possible to store an entity created by the create method in a variable. With a little support from the EntityBuilder, however, it is easier:

create User, 'Tinker', {
    firstName = 'Tinker'
    lastName = 'Bell'
    department = lostBoys
}

What is happening here? How can the assignment department = lostBoys work without lostBoys ever having been initialised? Here, several Groovy features come together: lostBoys is initially an identifier that cannot be resolved. In such a case, Groovy calls a getter for a property with the name of the identifier – similar to the setter called for the assignment above. In the same way, the expression myObject.someProperty is equivalent to myObject.getSomeProperty(). The question is again what the getter is called on. It cannot be the delegate of the closure: that is a User instance, which certainly does not offer a getLostBoys() method. At this point, the above-described delegate of the script comes into play. Recall that the EntityBuilder sets itself as the delegate before running the script. The EntityBuilder does not have a getLostBoys() method either, but it implements the propertyMissing method, which Groovy calls whenever a non-existent property is accessed, passing the missing property's name as an argument. This way, we can retrieve and return the previously created department registered under the name "lostBoys" so that it is set as the value in the script:

private def propertyMissing(String name) {
  if (entitiesByName[name]) {
    return entitiesByName[name]
  }
  // handle missing reference
}

We can even set Peter as the head of the department while we are creating the department and assigning it to him:

create User, 'Peter', {
  department = create Department, 'lostBoys', {
      name = 'The Lost Boys'
      head = Peter
  }
}

Code Completion

This actually means we have everything we need. But things could be a little more convenient. So far, we are largely on our own when defining the actual entity data: within the DSL, there is no way of finding out which properties an entity has and what their types are, which means that code completion by the IDE is also impossible.

We can, however, tell the IDE what the delegate of the closure will be. All of the information is present in the call to the static create method of the EntityBuilder: the delegate of the closure is always an instance of the class passed alongside it. Let us therefore supplement the create method with two annotations to make this information known:

static <T> T create(@DelegatesTo.Target Class<T> entityClass, String entityName,
    @DelegatesTo(strategy = Closure.DELEGATE_FIRST, genericTypeIndex = 0) Closure entityData) {
  return instance().createEntity(entityClass, entityName, entityData)
}

As a result, the IDE (in this case, IntelliJ IDEA) knows that, in our example, calls within the closure are delegated to a User instance, and therefore offers the properties of User for completion.

Summary

Finished! Our DSL enables test data to be defined in a way that is easily readable, modular and separate from the actual test code. Hierarchies can be expressed through object-oriented nesting instead of being mapped via foreign keys. We never have to leave the world of Java or make a conceptual switch to the relational model of the database. Since we are using Groovy code within the DSL, we enjoy all the freedom a programming language has to offer; one conceivable scenario, for instance, is generating large volumes of data via loops.
This article does not address the question of how the database should be cleaned up after a test case. In the examples shown (Demo.java), we make it easy for ourselves and roll back the transaction after every test. For a genuine integration test, this is not an option. In "real" projects, we have so far emptied all the tables with a database script and TRUNCATE TABLE. With large schemas, however, this has a considerable impact on the run time of the tests. It would be interesting to see whether cleaning the database programmatically would be more efficient: using a stack, it should be possible to delete the created entities in the reverse order of their creation.
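The stack idea from the previous paragraph could look roughly like this in plain Java. The entity names are placeholders; in a real implementation, each pop would go through EntityManager.remove.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch: record entities in creation order on a stack, then delete them in
// reverse order, so entities holding foreign keys are removed before the
// entities they reference.
public class CleanupSketch {
    public static void main(String[] args) {
        Deque<String> created = new ArrayDeque<>();
        created.push("Department[lostBoys]"); // created first
        created.push("User[Peter]");          // references the department
        created.push("User[Tinker]");

        while (!created.isEmpty()) {
            // In a real implementation: entityManager.remove(created.pop())
            System.out.println("deleting " + created.pop());
        }
    }
}
```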
And another small drawback: the DSL could be made even prettier with a fluent API, for example:

create User named 'Peter' with {
  firstName = 'Peter'
  lastName = 'Pan'
}

Since the definition is distributed across three method calls in this case (create, named and with), I have not found a way of making the delegate of the closure known using annotations: the class to which calls are delegated is passed to a different method than the closure itself. For the benefit of code completion, I have therefore settled on a somewhat less attractive DSL syntax. Can anyone think of a way to achieve both? Pull requests with improvements are very welcome.

Daniel Behrwind
Software Development
As a passionate software developer and clean code advocate, he is fascinated by creative solutions for complex problems which appear so obvious that they may have simply emerged all on their own.