2016-06-20

Strategies for loading object graphs with JPA

In this article, I will describe and discuss the different strategies for loading object graphs with JPA 2.1. This should help you to choose the right strategy for your application. Firstly, I will explain how you can load your object graphs with JPA. Secondly, I will describe the pros and cons of the different approaches and when to use them.

As I've already mentioned, this article is all about object graphs, but before we delve into the topic I will briefly introduce JPA, explain what an object graph actually is and why it is important to choose the right strategy to load it. 

Java Persistence API (JPA)

When you use the Java Persistence API (JPA) with one of its popular implementations such as Hibernate, EclipseLink or OpenJPA, you are probably trying to solve the following problem. You have written an object-oriented application in Java and you want to store its data in a database. In Java, data is represented by objects and their attributes; in contrast, databases (at least the relational ones) are built on tables with rows and columns. The problem to be solved is to map the objects to database tables and vice versa. This includes questions such as how to map inheritance, how to map Java types to SQL data types and how to deal with object identity. This is exactly the problem that an object-relational mapping (ORM) tool solves for you, and JPA provides a standardized API to access it.

Object graphs 

As I've already mentioned, in Java, data is represented by objects. Most probably your application has something like a domain model, representing all the domain objects of the application. Domain objects define attributes, getter and setter methods to access these attributes and probably other, more high-level methods that operate on the data stored in the attributes. Attributes may be of different data types: primitive types such as int, long or double (and of course their wrapper types) or more complex types such as address objects or other domain object types.

Example:

@Entity
@Table(name = "usr")
public class User {

   @Id
   private Long id;

   @OneToMany
   private Set<Car> cars;

   @OneToMany
   private Set<UserLogins> logins;
}

The class User has been mapped to the table named usr. It defines an attribute id of primitive (wrapper) type Long and two attributes of complex type Car as well as UserLogins. One User may have several Cars and several UserLogins. Both target types are domain objects mapped to the database (defining an identifier and at least the annotation @Entity). The classes UserLogins and Car will probably define their own attributes and some of them are pointing to other domain objects, too. As a consequence, when you have several domain objects / entities, you might be faced with a huge object graph and each object of a particular type is stored in a separate table: User objects are stored in the table usr, Car objects in the table car and so on. You can see this in the following UML class diagram:


In this example you can navigate from a User to a Car to its Axis and its Wheels. This is a navigation path of length 3. You can see that most of the UML associations between the classes have multiplicity *, and thus a User could have several Cars, a Car could have several Axis objects and an Axis several Wheels. In this example the * will probably represent small numbers between 2 and 4. But imagine you had objects such as a calculation with several positions and a much longer navigation path. The number of objects will explode. Thus it is very important to load only the relevant parts of the object graph from the database.
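To get a feeling for how quickly the numbers grow, here is a small back-of-the-envelope calculation. It is only a sketch: the class ObjectGraphSize and the multiplicities are made up for illustration and have nothing to do with the JPA API.

```java
// Hypothetical helper (not part of JPA): estimates how many objects a full
// traversal of a navigation path touches, given the children per level.
final class ObjectGraphSize {

    /**
     * @param fanOut number of children per parent on each level of the path,
     *               e.g. 3, 2, 2 = 3 cars per user, 2 axes per car, 2 wheels per axis
     * @return total number of objects, including the single root object
     */
    static long countObjects(int... fanOut) {
        long total = 1;          // the root object (one User)
        long objectsOnLevel = 1; // number of objects on the level visited last
        for (int children : fanOut) {
            objectsOnLevel *= children;
            total += objectsOnLevel;
        }
        return total;
    }

    public static void main(String[] args) {
        // User -> Car -> Axis -> Wheel with small multiplicities
        System.out.println(countObjects(3, 2, 2));        // 1 + 3 + 6 + 12 = 22
        // One level more and a multiplicity of 20 on each level
        System.out.println(countObjects(20, 20, 20, 20)); // 168421
    }
}
```

With small multiplicities the full graph of one user stays harmless, but a single additional level with larger collections already means six-digit object counts for one root object.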

Static loading strategies

Static loading strategies define the object graphs to be loaded at compile time via mappings (XML or annotations), specific queries or so-called NamedEntityGraphs. I will distinguish between application-wide strategies such as lazy loading or eager loading and operation-specific strategies such as explicit joins.

Lazy loading

For collection-valued associations (@OneToMany and @ManyToMany), JPA's default behaviour is that only the root object which is explicitly requested from the database is loaded. The associated objects are lazy: they are represented by a proxy and loaded when you access them. (Single-valued associations, @ManyToOne and @OneToOne, are eager by default unless you map them with fetch = FetchType.LAZY.) However, lazy loading requires that the entity manager session is still open. When is this session open? By default the session is bound to the transaction life cycle. That means if you don't have a transaction, the session is closed after you've performed your database operation. The following test exemplifies this behaviour. It expects a LazyInitializationException, which is the exception Hibernate raises in this situation.

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(classes = { Config.class })
public class StaticFetchTest {

   @PersistenceContext
   private EntityManager entityManager;

   @Test(expected = LazyInitializationException.class)
   public void testFailLazyAccess() {
      User user = entityManager.find(User.class, 1L);
      // Throws LazyInitializationException: cars is lazy, but the session is already closed
      user.getCars().stream().forEach(System.out::println);
   }
}

The reason for the exception is that the @PersistenceContext is by default set to type = PersistenceContextType.TRANSACTION. In order to extend the lifetime of the session, you could put all the database operations that should share the same session into one common transaction. You might hesitate to open a transaction when nothing is written to the database, because this may cause a performance penalty, and you would be right. If you use the transaction only to demarcate the session, you should set its readOnly attribute to true. The next test method demonstrates this:

@Test
@Transactional(readOnly = true)
public void testLoadInTx() {
   User user = entityManager.find(User.class, 1L);
   // Cars are loaded on demand because entity manager is still open
   // because of the transaction
   user.getCars().stream().forEach(System.out::println);

}

The session is open as long as you don't leave the test method and thus JPA can load the objects you access on demand (proxied objects will get initialized). 
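If you prefer to see the mechanics without any persistence provider, the following plain-Java sketch models the idea. It is a deliberately simplified model and not how Hibernate implements its proxies: ModelSession, LazyRef and LazyLoadingModel are names I made up, and an IllegalStateException stands in for the LazyInitializationException.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Simplified stand-in for a persistence session; NOT the JPA EntityManager.
final class ModelSession {
    private boolean open = true;
    boolean isOpen() { return open; }
    void close() { open = false; }
}

// Simplified stand-in for a lazy proxy: loads its value on first access only.
final class LazyRef<T> {
    private final ModelSession session;
    private final Supplier<T> loader; // stands for the select statement
    private T value;
    private boolean initialized;

    LazyRef(ModelSession session, Supplier<T> loader) {
        this.session = session;
        this.loader = loader;
    }

    T get() {
        if (!initialized) {
            if (!session.isOpen()) {
                // This is the situation in which Hibernate raises a LazyInitializationException
                throw new IllegalStateException("session closed, cannot initialize proxy");
            }
            value = loader.get();
            initialized = true;
        }
        return value;
    }
}

final class LazyLoadingModel {
    public static void main(String[] args) {
        ModelSession session = new ModelSession();
        LazyRef<List<String>> cars = new LazyRef<>(session, () -> Arrays.asList("car-1", "car-2"));
        LazyRef<List<String>> logins = new LazyRef<>(session, () -> Arrays.asList("login-1"));

        System.out.println(cars.get()); // loaded on first access while the session is open
        session.close();
        System.out.println(cars.get()); // already initialized, so still accessible
        try {
            logins.get();               // was never accessed while the session was open
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

The point of the model is the order of events: whatever you touch while the session is open stays usable afterwards, whereas the first access after the session has been closed fails.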

You are able to control the lifetime of the session. Let's assume you have an application with a controller layer, a service layer and a repository layer. You decide where to put your transaction: in the controller, in the service or in the repository? Which strategy you choose depends on your application, but in general I think it is advantageous to put the transactions at the controller level. There is one exception to this rule: if your application runs controllers and services on separate machines and the services are consumed remotely, for example via REST or SOAP, it makes no sense to put the transaction annotations on the controllers. The proxies cannot be initialised from there because controllers and services run in different JVMs. In this case I would advise you to put the @Transactional annotation only on your services. By the way, you could also add transactions to controllers and services (for example if the controllers are not the only consumers of the services). In this case, the transaction is started in the controller and all subsequent service calls join the ongoing transaction. If you call the service from a non-transactional part of the application, the service automatically starts a new transaction.
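The join-or-start behaviour described above corresponds to Spring's default propagation setting for @Transactional (REQUIRED). The following plain-Java sketch only models this rule and records what would happen; no real transaction manager is involved, and TxModel, showUser and loadUser are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Plain-Java model of "join the ongoing transaction or start a new one".
final class TxModel {
    static final List<String> LOG = new ArrayList<>();
    private static int depth = 0; // 0 = no transaction active (single-threaded model)

    static <T> T inTransaction(String caller, Supplier<T> body) {
        boolean startedHere = depth == 0;
        LOG.add(caller + (startedHere ? ": start" : ": join"));
        depth++;
        try {
            return body.get();
        } finally {
            depth--;
            if (startedHere) {
                LOG.add(caller + ": commit"); // only the outermost caller commits
            }
        }
    }

    // Hypothetical service method, transactional on its own
    static String loadUser() {
        return inTransaction("service", () -> "user-1");
    }

    // Hypothetical controller method that calls the service inside its transaction
    static String showUser() {
        return inTransaction("controller", TxModel::loadUser);
    }

    public static void main(String[] args) {
        showUser(); // controller starts the transaction, service joins it
        loadUser(); // non-transactional caller: the service starts its own transaction
        LOG.forEach(System.out::println);
    }
}
```

In the first call the session would stay open until the controller method returns, which is exactly what makes lazy access from the controller possible. In the second call it would end as soon as the service method returns.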

A completely different strategy is to use extended sessions. With this strategy, sessions stay open as long as possible and are thus independent of the transaction's lifetime. What does "as long as possible" mean? To manage an open session you need to maintain state, and how long this state can be maintained depends on the lifecycle of the object that holds the reference to the EntityManager instance. In a Java EE application with session beans, for instance, you could use a stateful session bean to manage this state. Let's look at how our test looks with an extended session:

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(classes = { Config.class })
public class StaticFetchTest {

   @PersistenceContext(type = PersistenceContextType.EXTENDED)
   private EntityManager entityManagerExtended;

   @Test
   public void testLoadExtended() {
      User user = entityManagerExtended.find(User.class, 1L);
      // Cars are loaded on demand because entity manager is extended
      user.getCars().stream().forEach(System.out::println);
   }
}

In this example the attribute entityManagerExtended is annotated with @PersistenceContext and its attribute type is set to PersistenceContextType.EXTENDED. This enables an extended session, and although there is no transaction, the test runs successfully.

Eager loading

The opposite of lazy loading is loading parts of the object graph instantly. This is called eager loading and can be specified via the object mapping. For each attribute of an entity that points to another entity (and is thus annotated with @OneToOne, @ManyToOne, @OneToMany or @ManyToMany) you can decide to load the target entities together with the source entity. If you want to load UserLogins whenever you load a User object, you would specify the following mapping:

@Entity
@Table(name = "usr")
public class User {

   @OneToMany(fetch = FetchType.EAGER)
   private Set<UserLogins> logins;
}

You set the attribute fetch to FetchType.EAGER which tells JPA to load the UserLogins instantly together with the User. So you can run the following test now:

@Test
public void testLoadEager() {
   User user = entityManager.find(User.class, 1L);
   // Logins are loaded because attribute is mapped eager
   user.getLogins().stream().forEach(System.out::println);
}

The advantage of this approach is that it is easy to configure and its impact is application-wide. However, this is also a drawback because you might not always want to load User objects with UserLogins. Still, if the probability is high you can define it as the default strategy for your application. Candidates for eager loading are in general objects which can be represented by UML composite associations. They are parent-child connections where the child lifecycle depends on the parent lifecycle.

In contrast to the application-wide configuration of the loading behaviour, JPA also allows you to define the loading strategy on a per-operation basis.

Explicit join

If you are using queries written in JPQL (the Java Persistence Query Language, an object-oriented query language similar to SQL) you can join parts of the object graph via JPQL's join keyword. But be careful, there are two join variants in JPQL. The standard join keyword should be used when you want to constrain the results by attributes of objects in the graph other than your root object. In this example we load only users who own a car with a particular license plate:

@Test(expected = LazyInitializationException.class)
public void testFailLazyAccessByExplicitJoinInJPAQL() {
   TypedQuery<User> loadUserQuery = entityManager
      .createQuery("select usr from User usr left outer join usr.cars cars "
         + "where cars.licensePlate = :licensePlate", User.class);
   loadUserQuery.setParameter("licensePlate", "HIBERNATE");
   User user = loadUserQuery.getSingleResult();
   user.getCars().stream().forEach(System.out::println);
}

As you can see, this test results in a LazyInitializationException although we have joined cars: a plain join only constrains the result and does not initialize the association. To load the joined objects as part of the query, use the join fetch keywords instead.

@Test
public void testInitByExplicitFetchJoinInJPAQL() {
   TypedQuery<User> loadUserQuery = entityManager
      .createQuery("select usr from User usr left outer join fetch usr.cars "
         + "where usr.id = :id", User.class);
   loadUserQuery.setParameter("id", 1L);
   User user = loadUserQuery.getSingleResult();
   user.getCars().stream().forEach(System.out::println);
}

Dynamic loading strategies

Dynamic loading strategies vary in how dynamic they actually are. In general, all of these strategies allow you to decide at runtime which parts of the graph you want to load, whereas the aforementioned strategies do not support this because they are configured at compile time via mappings or constant query strings.

Named Entity Graphs

One way to define several dynamic loading strategies is to use a NamedEntityGraph which can be configured as an annotation on top of an entity as follows:

@NamedEntityGraph(name = "user.cars", attributeNodes = @NamedAttributeNode("cars"))
@Entity
@Table(name = "usr")
public class User {
}

An entity graph has two important attributes: a unique name and the attributeNodes, which describe the parts of the object graph to be loaded. The given graph defines that the cars of a user are fetched eagerly whenever a user is loaded with this graph. But why is this dynamic? After all, I have to define the graph as an annotation and it cannot be changed at runtime. The answer lies in how the graph is used, because you apply it on a per-operation basis:

@Test
public void testInitCarsByEntityGraph() {
   User user = entityManager.find(User.class, 1L,    
      Collections.singletonMap("javax.persistence.fetchgraph",
         entityManager.getEntityGraph("user.cars")));
   user.getCars().stream().forEach(System.out::println);

}

There is a property called javax.persistence.fetchgraph that allows you to pass in an entity graph to be used for the find operation. In this example we have used the find operation of the EntityManager. You can also set an entity graph when you are using a JPQL query:

@Test
public void testInitByNamedEntityGraphInJPAQL() {
   TypedQuery<User> loadUserQuery = entityManager.createQuery(
      "select usr from User usr where usr.id = :id", User.class);
   loadUserQuery.setParameter("id", 1L);
   loadUserQuery.setHint("javax.persistence.fetchgraph",    
      entityManager.getEntityGraph("user.cars"));
   User user = loadUserQuery.getSingleResult();
   user.getCars().stream().forEach(System.out::println);

}

The entity graph in this example is passed in via the setHint method which is defined on TypedQuery.

You could also define several different entity graphs and choose one depending on some condition, or you could even allow the name of the entity graph to be passed as a parameter to your repository or data access object methods. That is why I categorize entity graphs as dynamic.

However, named entity graphs are still limited because you can only use one entity graph per operation. It is not possible to mix them. Hence, you could potentially face a huge number of different graphs representing all combinations of object paths to be loaded. If that is the case you should rather use dynamic entity graphs.
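How huge can this number get? If every combination of optional associations needs its own named graph, n associations lead to 2^n - 1 graphs. The following sketch just enumerates the names you would have to maintain. GraphCombinations and its naming scheme are made up for illustration and are not part of JPA.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists one graph name per non-empty combination of attributes.
final class GraphCombinations {

    static List<String> graphNames(String root, String... attributes) {
        List<String> names = new ArrayList<>();
        int subsets = 1 << attributes.length;          // 2^n, including the empty subset
        for (int mask = 1; mask < subsets; mask++) {   // start at 1 to skip the empty subset
            List<String> parts = new ArrayList<>();
            for (int i = 0; i < attributes.length; i++) {
                if ((mask & (1 << i)) != 0) {
                    parts.add(attributes[i]);
                }
            }
            names.add(root + "." + String.join("+", parts));
        }
        return names;
    }

    public static void main(String[] args) {
        // Two optional associations already need three named graphs
        System.out.println(graphNames("user", "cars", "logins"));
        // prints [user.cars, user.logins, user.cars+logins]

        // Five optional associations (placeholder names) need 31
        System.out.println(graphNames("user", "a", "b", "c", "d", "e").size());
    }
}
```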

Dynamic Entity Graphs

Dynamic entity graphs can be instantiated at runtime. If your application must be very flexible in loading different graphs you could choose this concept. The next test case shows how you use it:

@Test
public void testInitDynamicEntityGraph() {
   EntityGraph<User> graph = entityManager.createEntityGraph(User.class);
   graph.addAttributeNodes("cars");
   User user = entityManager.find(User.class, 1L,
      Collections.singletonMap("javax.persistence.fetchgraph", graph));
   user.getCars().stream().forEach(System.out::println);

}

In this example we create a new EntityGraph<User> via EntityManager.createEntityGraph. To this entity graph we add the attribute node cars. The instance is then passed in as the javax.persistence.fetchgraph property, just as in the named entity graph example.
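If the consumer of a repository should decide what to load, one option is to let it pass dotted attribute paths and translate them into a tree first. The sketch below shows only this translation step in plain Java. FetchPlan is a hypothetical helper, and the attribute name wheels is an assumption based on the class diagram. In a real repository you could then walk the tree and call addAttributeNodes for the leaves and addSubgraph for the inner nodes of the EntityGraph.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: turns dotted paths into a tree of attributes to load.
final class FetchPlan {
    // attribute name -> plan for the attributes below that attribute
    final Map<String, FetchPlan> children = new TreeMap<>();

    /** Builds a tree from dotted paths such as "cars.axis.wheels". */
    static FetchPlan of(String... paths) {
        FetchPlan root = new FetchPlan();
        for (String path : paths) {
            FetchPlan node = root;
            for (String attribute : path.split("\\.")) {
                // reuse the node if another path has already created it
                node = node.children.computeIfAbsent(attribute, key -> new FetchPlan());
            }
        }
        return root;
    }

    @Override
    public String toString() {
        return children.toString();
    }

    public static void main(String[] args) {
        FetchPlan plan = FetchPlan.of("cars.axis.wheels", "cars", "logins");
        System.out.println(plan); // {cars={axis={wheels={}}}, logins={}}
    }
}
```

Overlapping paths are merged, so the caller can request "cars" and "cars.axis.wheels" independently without producing duplicate nodes.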

Criteria API

The most powerful and flexible way to define queries with JPA is to use the JPA Criteria API. The Criteria API allows you to define the whole query dynamically at runtime. A typical use case is a search filter. Imagine you have several filter attributes, and whenever the user enters a value into the filter form you want to add a condition to the conjunction in the where clause that tests whether the attribute equals the given value. The most obvious approach would be to rely heavily on String concatenation. But this is not type safe and most probably inefficient. Hence, you would rather make use of the JPA Criteria API as follows:


@Test
public void testInitByExplicitFetchJoinInJPACriteria() {
   CriteriaBuilder builder = entityManager.getCriteriaBuilder();
   CriteriaQuery<User> query = builder.createQuery(User.class);
   Root<User> root = query.from(User.class);
   root.fetch("cars", JoinType.LEFT);
   CriteriaQuery<User> criteriaQuery =  
      query.select(root).where(builder.and(builder.equal(root.get("id"), 1L)));
   User user = entityManager.createQuery(criteriaQuery).getSingleResult();
   user.getCars().stream().forEach(System.out::println);

}

In this example, you first obtain a CriteriaBuilder object. This object allows you to create a new CriteriaQuery object. Next, you create the query root. On this root object you call fetch for the attribute cars. Then you define what to select and what the where clause looks like. To make the query type safe you could also use the canonical metamodel. This would allow you to replace the Strings "cars" and "id" with type-safe expressions such as root.get(User_.id) or root.fetch(User_.cars, JoinType.LEFT). To dynamically load further objects from the graph, you can chain the fetch calls, for example root.fetch("cars", JoinType.LEFT).fetch("axis", JoinType.LEFT).

Alternatively you can even combine named entity graphs or dynamic entity graphs with criteria API as follows:

@Test
public void testInitByNamedEntityGraphInJPACriteria() {
   CriteriaBuilder builder = entityManager.getCriteriaBuilder();
   CriteriaQuery<User> query = builder.createQuery(User.class);
   Root<User> root = query.from(User.class);
   CriteriaQuery<User> criteriaQuery =
      query.select(root).where(builder.and(builder.equal(root.get("id"), 1L)));
   User user = entityManager.createQuery(criteriaQuery)
      .setHint("javax.persistence.fetchgraph",
         entityManager.getEntityGraph("user.cars")).getSingleResult();
   user.getCars().stream().forEach(System.out::println);

}

When to use which strategy

To sum up, I will briefly discuss the pros and cons of the presented strategies and give guidelines for choosing the appropriate one.

Lazy Loading

Pros:
  • You will only load objects that you access
  • You don't have to think about a concrete loading strategy
  • Your root object loads fast
Cons:
  • Might produce a higher delay when you access objects via a long navigation path
  • You can't use SQL joins to load parts of your object graph, you will always produce several select statements 
  • You are not flexible because your strategy is global on mapping level
You should use it if
  • your frontend is running in the same JVM as your Hibernate backend
  • you can't anticipate which parts of your object graphs are needed in which situation
  • you want your application to start quickly and distribute load time among subsequent user interactions
You should not use it if
  • your frontend calls your Hibernate backend remotely
  • it is clear that certain parts of the object graph must always be loaded
  • you can neglect application start up time and preload most of your data

Eager Loading

Pros: 
  • You load all data that you need instantly
  • You allow JPA to use the optimal fetching strategy (batch, joins, select)
  • You don't have to deal with closed sessions
Cons:
  • You need to think about which parts should be eager loaded and which not
  • You will have higher delay to load the root object from the graph because also other parts of the graph must be loaded
  • You will probably load unnecessary parts of the object graph
  • You are not flexible because your strategy is global on mapping level
You should use it if
  • it is clear which objects in the graph always belong together
  • you want to preload data to have quick access later on
You should not use it if
  • it is not clear which parts of the object graph are required 
  • your fetch graph is getting too big

Explicit join

Pros: 
  • You decide on operation level what to load, you probably won't load unnecessary objects
  • You can offer different variants of your operation (e.g. a lazy and an eager one) by using different join strategies for the same operation
Cons:
  • It is more complex to define what to load on query level
  • The loading strategy is not separated from the query, which makes it less reusable: you have to define what to load for each query again, even if the strategy is the same
You should use it if
  • for a certain operation, always the same data must be loaded
You should not use it if
  • the objects to be loaded vary strongly by the context from where your operation is called

Named Entity Graphs

Pros: 
  • You decide on operation level what to load, you probably won't load unnecessary objects
  • You can offer different variants of your operation by allowing different named entity graphs to be used
  • Increases reusability because you can use the same graph for different operations
Cons:
  • Syntax is verbose if the graph gets complex
  • Named entity graphs cannot be combined, number of defined graphs may explode 
You should use it if
  • you have a few different loading strategies to be supported which can be reused for different operations in your repository/dao
You should not use it if
  • there are too many strategies or combinations of strategies

Dynamic Entity Graphs 

Pros: 
  • You decide on operation level what to load, you probably won't load unnecessary objects
  • The consumer of your repository/dao operation may decide completely on his/her own what to load, this will probably reduce the number of operations you need to offer in your repository/dao
Cons:
  • Complex to define
  • Difficult to reuse because they are only valid for a particular use case
You should use it if
  • there are many different loading strategies or combinations of strategies
You should not use it if
  • you only need to offer a few strategies in your dao/repositories

Criteria API

Pros: 
  • Allows you to write completely dynamic queries including dynamic loading strategies
  • Can be combined with entity graphs
Cons:
  • More verbose, difficult to read and complex to define
You should use it if
  • you have many variants of the same query
  • you start to build your queries using String concatenations
You should not use it if
  • you can implement the same with one static query

All examples used in this article can be found on GitHub https://github.com/Javatar81/code-examples/tree/master/jpa-object-graphs
