JAX 2009: Tuning Hibernate & JPA (Tuning von Hibernate und JPA-Anwendungen)

(Disclaimer: this text was written while listening to the presentation – please be forgiving with errors that might result from both listening and writing)

Michael Plöd starts our second JAX day with a talk about how to tune your persistence layer. Since we make extensive use of both Hibernate and JPA in our company portal framework and always like to speed things up even more, I was quite curious about what he has to offer, especially since I’m usually known as the guy who is not that interested in databases and would rather talk about the actual domain model.

Michael started with a summary of the general concerns often raised by managers, database administrators and other stakeholders about ORM being painfully slow. He agrees that it is correct to say “Yes, ORM does a lot more than plain JDBC”, but he also reminds us that a lot of experience has gone into Hibernate, OpenJPA, EclipseLink and others, so we are not talking about “bleeding edge” technology but rather about mature systems. Late flushing and other nice features actually help to improve performance.

Main reasons for bad performance usually are

  • too many queries
  • queries that are too slow
  • incorrectly tuned databases
  • incorrectly tuned infrastructure

The session will be concerned with the first two items. He starts enumerating reasons for the problems. Main reasons for too many queries usually are

  • application logic
  • mappings
  • caching (he directly states: you might be missing a second level cache – but don’t turn it on just for the sake of it)
  • N+1 selects problem

Runtime problems concerning slow queries include

  • too many selections
  • variable passing
  • missing indices on the database
  • the Cartesian product
  • locking (especially pessimistic locking)
  • database structure (especially inheritance structure)

He starts by examining the N+1 select problem and continues with an example of the Cartesian product problem, both of which result from non-optimal fetching strategies. Based on these examples Michael stresses that database tuning will always be a matter of balance and that you must keep your goals in mind (e.g. better data quality due to normalization versus better speed due to denormalization).

Next Michael is going to address the following optimization items:

  • fetching strategies (batch, subselect, eager – he directly recommends to not use eager), 
  • caching (1st level, 2nd level, stateless session – again not recommended), 
  • queries (selectivity, query cache, variable binding).

Batch fetching is one means of reducing the number of queries in N+1 select situations, using the @BatchSize(size=n) annotation. It is an easy approach, but you need to estimate a useful batch size, and the association must use lazy loading (otherwise it won’t work). In this way you can reduce the N+1 problem to roughly an (N / size) + 1 problem.
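
To make the arithmetic concrete, here is a minimal sketch of my own (not from the talk). It only counts queries and does not touch Hibernate; the class and method names are made up for illustration, and the batched figure is an idealised best case, since the actual number depends on how the ORM splits its batches.

```java
final class BatchFetchMath {
    /** Queries without batching: one for the parents, then one per parent for its collection. */
    static int naiveQueries(int parents) {
        return parents + 1;
    }

    /** Idealised query count with batch fetching: one for the parents, then one per full or partial batch. */
    static int batchedQueries(int parents, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        // Ceiling division: a partially filled last batch still costs one query.
        return 1 + (parents + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        int parents = 100;
        System.out.println("without batching: " + naiveQueries(parents));
        System.out.println("batch size 10:    " + batchedQueries(parents, 10));
    }
}
```

For 100 parents and a batch size of 10 this drops from 101 queries to 11 in the best case.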

The next idea is subselect fetching, which must be configured on each relation and reduces the N+1 queries to 2 queries. Subselect fetching is configured via the fetch mode and can only be used in conjunction with lazy loading.

Eager fetching (again specified by the fetch type) reduces the number of queries to 1. Nonetheless Michael recommends not using this fetch type in the global fetch plan, lest you once more face the Cartesian product.
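
Why the Cartesian product hurts is easiest to see by counting result rows. The following is a rough sketch under my own assumptions: a parent with two independent collections (hypothetically called tracks and reviews here) that are either fetched together with outer joins in one query or loaded by separate queries. No ORM is involved, it is plain arithmetic.

```java
import java.util.List;

final class JoinFetchRows {
    /** Sizes of two independent collections owned by one parent row (hypothetical example data). */
    static final class Parent {
        final int tracks;
        final int reviews;

        Parent(int tracks, int reviews) {
            this.tracks = tracks;
            this.reviews = reviews;
        }
    }

    /** Rows returned when both collections are fetched with outer joins in a single query. */
    static long joinedRows(List<Parent> parents) {
        long rows = 0;
        for (Parent p : parents) {
            // An empty collection still contributes one row (filled with NULLs) in an outer join.
            rows += (long) Math.max(1, p.tracks) * Math.max(1, p.reviews);
        }
        return rows;
    }

    /** Rows transferred when the parents and each collection are loaded by separate queries. */
    static long separateRows(List<Parent> parents) {
        long rows = parents.size();
        for (Parent p : parents) {
            rows += p.tracks + p.reviews;
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Parent> albums = List.of(new Parent(10, 20), new Parent(12, 0));
        System.out.println("single joined query: " + joinedRows(albums) + " rows");
        System.out.println("separate queries:    " + separateRows(albums) + " rows");
    }
}
```

One query is not automatically cheaper: the joined variant multiplies the collection sizes per parent, while separate queries only add them up.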

Caching is divided into 1st level caches (e.g. the Hibernate session, i.e. the persistence context holding the loaded objects), 2nd level caches (clustered or distributed caches, caches working with more than one persistence context) and the actual 2nd level cache implementation being used.

First of all Michael states that it is the developer’s responsibility to take care of the 1st level cache. E.g. Hibernate does not flush the 1st level cache itself, that’s the responsibility of the developer. Additionally he reminds the audience that ORM tools are not well suited to batch processes – if you have to use them in such a scenario, flush often and early (and don’t forget to clear the session). It might also be useful to configure the batch size on the JDBC level (which is not a Hibernate parameter but must be done during session factory configuration – you can do that with JPA, too).
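
The "flush often and clear the session" advice boils down to a simple loop. Below is a sketch of the pattern only: UnitOfWork is a hypothetical stand-in interface I made up so the example runs without any library. In Hibernate the corresponding calls would be Session.save(), Session.flush() and Session.clear(), and the interval of 10 is an arbitrary choice.

```java
import java.util.List;

/** Hypothetical stand-in for the three session operations used below; not a Hibernate type. */
interface UnitOfWork {
    void save(Object entity);
    void flush();
    void clear();
}

final class BatchInsert {
    /** Saves all entities, flushing and clearing after every batchSize items. */
    static void saveAll(UnitOfWork session, List<?> entities, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        int count = 0;
        for (Object entity : entities) {
            session.save(entity);
            if (++count % batchSize == 0) {
                session.flush(); // push pending statements to the database
                session.clear(); // drop the saved objects from the 1st level cache
            }
        }
        if (count % batchSize != 0) {
            // Handle the last, partially filled batch.
            session.flush();
            session.clear();
        }
    }

    public static void main(String[] args) {
        UnitOfWork logging = new UnitOfWork() {
            public void save(Object entity) { System.out.println("save " + entity); }
            public void flush() { System.out.println("flush"); }
            public void clear() { System.out.println("clear"); }
        };
        saveAll(logging, List.of("a", "b", "c"), 2);
    }
}
```

Without the clear() call every saved object would stay in the session until it is closed, which is exactly the memory problem batch jobs run into.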

2nd level caches differentiate various concurrency strategies:

  • transactional (repeatable read isolation)
  • read-write (read committed isolation)
  • nonstrict-read-write (no consistency guarantee, useful for non-volatile data like country lists, etc.)
  • read-only (only for data that never changes – during the runtime of the application)

Hibernate provides four cache providers:

  • EHCache (everything but transactional)
  • OSCache (the same)
  • JBoss Cache (transactional, read-only)
  • SwarmCache (but the project seems to be dead)

Michael warns the audience that the configuration of a distributed cache (only possible with JBoss Cache) is extremely difficult and depends highly on the context. He proposes hiring an external JBoss specialist for that and planning at least a week to configure such a setup, since otherwise you are extremely likely to run into consistency problems. Even Hibernate experts are usually not proficient enough to do that.

Cache strategies must be configured for entities, which is possible on both the class and the collection level. On the class level only the state of the class itself is cached; on the collection level only the primary keys of the collection’s elements are cached. If you want to cache everything, you must cache the entity, the collection and the referenced entity. You can use the @Cache annotation for that.

Additionally, cache regions define data stores for individual entity types. Cache region handling again must be configured in the 2nd level cache configuration. There you describe where the caching is done (e.g. disk or memory) and how long cached entries are kept.

Next he addresses the question which candidates should be used for 2nd level caches. He recommends being very conservative – all the other optimizations are much more important (e.g. fetch strategies and query optimization). Entities are good caching candidates if

  • there are very few inserts and updates (this is especially true for distributed caches),
  • there are many read accesses,
  • the data is uncritical,
  • the data is used by many sessions and users.

A completely contrary concept is the stateless session (in Hibernate). The concept is not very well documented and rarely used. A stateless session is created with sessionFactory.openStatelessSession() and gives you a command-oriented API that accesses the database directly, without persistence context, caching, transactional write-behind, cascading, interceptors or events. Additionally, other caches won’t learn about data changes made to the database by such stateless sessions. Michael has never used the concept so far but guesses that it might be useful in batch scenarios.

Michael then proceeds to query optimization. First of all he addresses selectivity: load just the data you really need and use projections to reduce the data size as early as possible. Specifically, you can create new objects (which need not be mapped by Hibernate!) holding the data subset you need.
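
As an illustration of such a projection, here is a small sketch using the "select new" constructor expression that HQL offers. Everything concrete in it is my assumption: the Artist entity, its id and name properties and the ArtistSummary class are hypothetical and not taken from Michael's example. In a real project the summary class would be public, live in its own file, and be referenced by its fully qualified name in the query.

```java
/** Hypothetical read-only view of an artist; a plain class, not a mapped entity. */
final class ArtistSummary {
    final Long id;
    final String name;

    // The "select new" expression in the query below calls this constructor once per result row.
    public ArtistSummary(Long id, String name) {
        this.id = id;
        this.name = name;
    }
}

final class ProjectionQueries {
    // Selects only two properties instead of loading the whole (hypothetical) Artist entity.
    static final String ARTIST_SUMMARIES =
        "select new ArtistSummary(a.id, a.name) from Artist a order by a.name";

    public static void main(String[] args) {
        System.out.println(ARTIST_SUMMARIES);
    }
}
```

The query string would then be handed to the session as usual, and the result list contains ArtistSummary objects rather than managed entities.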

Next he addresses the issue of queries being built from strings, which creates two major problems: HQL/SQL injection and a performance penalty, because the query has to be rebuilt each time. He strongly recommends not doing this and using variable bindings instead, which avoid both problems.
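
The difference is easy to show even without a database. The sketch below is my own and only builds the query text: the Artist entity and its name property are hypothetical. With Hibernate, the bound variant would be passed to session.createQuery(...) and the value supplied through setParameter("name", value).

```java
import java.util.Map;

final class QueryBinding {
    /** Anti-pattern: the user input becomes part of the query text itself. */
    static String concatenated(String name) {
        return "from Artist a where a.name = '" + name + "'";
    }

    /** Query text with a named placeholder; it is identical for every input value. */
    static final String BOUND = "from Artist a where a.name = :name";

    /** The value travels separately from the query text and is never parsed as query syntax. */
    static Map<String, Object> parameters(String name) {
        return Map.of("name", name);
    }

    public static void main(String[] args) {
        String input = "x' or '1'='1";
        System.out.println("concatenated: " + concatenated(input));
        System.out.println("bound:        " + BOUND + " with " + parameters(input));
    }
}
```

With the malicious input, the concatenated variant turns into a query with an additional always-true condition, while the bound variant keeps the same text, so it can be parsed once and reused.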

Then he addresses the Hibernate query cache, which is only suited for very few scenarios (e.g. when the query parameters are stable). The cache must be configured separately, and the query cache must be activated for each query (e.g. query.setCacheable(true)).

Michael then addresses analysis issues since all optimization attempts are only useful if you know what context you are working on. There are several means of analysis:

  • Logging (specifically setting org.hibernate.SQL and org.hibernate.jdbc to TRACE). Global logging usually will provide too much and too detailed information. The logging parameters described above will provide access to the plain SQL generated by the ORM. Logging analysis helps to find query cache candidates, N+1 select issues and generally complex queries to be examined in more detail.
  • Statistics. These can be accessed via either sessionFactory.getStatistics() or JMX and must be enabled beforehand. Statistics deliver global information about prepared statements, rollbacks, flushes and transactions, entity access information with fetch types and operations, the same for collections, and statistics on HQL queries (but not Criteria based queries – Hibernate 3.5 or 4. might bring new features if it shows up at some point in time).
  • Monitoring of the database infrastructure.

For analysis Michael recommends working through each use case: collect data, compare it to reference data, introduce one optimization, run again, check again, and so on.

For stress tests he recommends applying different kinds of stress (normal stress, high stress, long running queries). Such tests should be done with production hardware and production data sizes. Preferably tests should also run for at least 48 hours to find memory issues.

Finally Michael presented code examples based on his hobby music magazine for loud and fast music. The example is based on Wicket and Hibernate and illustrates how some simple tuning measures reduce the number of queries quickly.

During Q&A we learned that

  • JPA currently does not specify or offer statistics,
  • Michael is not sure whether EclipseLink offers statistics. An interesting difference from his point of view is that EclipseLink uses a 2nd level cache by default instead of requiring extra configuration. He believes EclipseLink to also be a highly sophisticated product.
  • changing the database at least implies running all the tests once more. In his experience, as long as you use no native queries, all rules apply as usual and there should be no real differences. He would always run a complete tuning session depending on the results of a stress test. Generally he believes that tuning will occur on the database level.
  • JDBC drivers and driver versions may play a significant role. For example, he saw a 40% performance difference (in that case an improvement) when switching between an Oracle 9i and a 10g driver.

For examples related to the presentation Michael points to his blog.

Overall the presentation IMHO was excellent and very informative providing a great mix of detail concerning both general strategies and specific actions to take. Credits for one of the best talks I have seen so far at JAX 2009!
