Tuesday, April 1, 2014

Extending Spring Data

What is Spring Data?

Spring Data, by its own definition, "makes it easy to use new data access technologies". It is a collection of database-specific subprojects. At its heart, Spring Data's goal is to make implementing a data access layer effortless.
Almost all CRUD applications are designed around the MVC pattern: the Model represents the data, the View the UI, and the Controller the business logic. Usually, you implement a data access layer that sits between the controller and the model. The job of the data access layer is to isolate the specifics of the database implementation from the business logic. This is a good way to keep your controller from getting bloated, except that it leads to two problems:
a) Data access code has a lot of boilerplate
b) There is a common tendency for data access code to seep into the controller
Spring Data solves these problems by delegating the job of writing data access code to generated code. The idea is that the developer tells the framework what kind of data access queries s/he wants, and Spring Data generates the actual code. So, for example, let's say we have an Employee entity and the developer wants to implement a data access layer for it using Spring Data. All s/he has to do is this:
// Long is an assumed ID type for Employee
public interface EmployeeRepository extends JpaRepository<Employee, Long> {}
Boom! That's it. Spring scans the packages for interfaces that extend JpaRepository, automatically generates a bean that implements each interface, and adds the generated bean to the application context. Any other class that autowires an EmployeeRepository will get the generated bean. JpaRepository declares the most common CRUD methods along with support for paging and sorting. The users of EmployeeRepository can use these methods right out of the box.
If you want to add more methods to find Employees, all you need to do is add a method to the interface

public interface EmployeeRepository extends JpaRepository<Employee, Long> {
    @Query
    Page<Employee> findByName(String name, Pageable pageable);
}
That's it. Spring Data already knows how to generate the code to find employees by name and to return the results page by page. It figures out which column to match by parsing the method name. The developer can also provide the JPQL directly in the @Query annotation. For the different ways of adding finder queries to Spring Data, please consult the Spring documentation.

If you want to move from JPA to MongoDB, all you need to do is change the interface that EmployeeRepository extends. So, all you do is change it to this
public interface EmployeeRepository extends MongoRepository<Employee, Long> {
}
This tells Spring Data to generate code that talks to MongoDB instead of going through JPA.

How does it work?

A lot of the internals of Spring Data are undocumented, and even the Spring Data code itself is poorly documented. I went to some effort reverse engineering Spring Data to understand this.* I am documenting it here so the next generation of hackers doesn't have to hack through Spring Data again.
*I believe I have seen Mike Bond do some very similar things in Lucy, and I am guessing he has gone through the same process.
As explained earlier, all of the Spring Data magic works by load-time code generation. Whenever a Spring context is loaded, Spring scans the packages for interfaces that extend the Spring Data interfaces and generates beans that implement these interfaces. However, to keep Spring Data extensible and debuggable, not all of the code is generated. Most of the code resides in classes written by the Spring Data developers. At load time, Spring Data generates a proxy class that delegates the calls to the "real" classes. The proxy simply acts as a bridge between the controller and the implementation provided by Spring.
The way it is designed, for each of the Spring Data interfaces there is a class that implements the interface. So, for example, there is a class named SimpleJpaRepository that implements JpaRepository. During load time, Spring Data scans the packages, and when it sees an interface that extends JpaRepository, it invokes a factory that
  1. Creates an instance of SimpleJpaRepository
  2. Generates a proxy class that implements EmployeeRepository. The proxy class wraps the instance of SimpleJpaRepository. The proxy is very lightweight and simply delegates to the wrapped instance
  3. Adds the proxy to Spring's application context. Since, at this point, this bean is just like any other bean, it can be used like any other bean, which means it can be injected into other beans
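
To make the three steps above concrete, here is a minimal sketch of the same pattern using nothing but the JDK's dynamic proxies. This is not Spring Data's actual code. BaseRepository, BaseRepositoryImpl, EmployeeNameRepository and RepositoryProxyFactory are all made-up names that stand in for the Spring Data base interface, the hand-written "real" class, the user's interface and the factory.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Spring Data base interface.
interface BaseRepository {
    void save(String id, String value);
    String findOne(String id);
}

// Hypothetical user-declared interface, like EmployeeRepository in the text.
interface EmployeeNameRepository extends BaseRepository {
}

// Hypothetical hand-written "real" class that does the actual work.
class BaseRepositoryImpl implements BaseRepository {
    private final Map<String, String> rows = new HashMap<>();

    @Override
    public void save(String id, String value) {
        rows.put(id, value);
    }

    @Override
    public String findOne(String id) {
        return rows.get(id);
    }
}

class RepositoryProxyFactory {

    // Wraps the target in a proxy that implements the user's interface
    // and forwards every call to the target.
    static <T> T createProxy(Class<T> repositoryInterface, Object target) {
        Object proxy = Proxy.newProxyInstance(
                repositoryInterface.getClassLoader(),
                new Class<?>[] {repositoryInterface},
                (self, method, args) -> {
                    try {
                        return method.invoke(target, args);
                    } catch (InvocationTargetException e) {
                        // Rethrow the target's own exception instead of the reflection wrapper.
                        throw e.getCause();
                    }
                });
        return repositoryInterface.cast(proxy);
    }
}
```

Calling RepositoryProxyFactory.createProxy(EmployeeNameRepository.class, new BaseRepositoryImpl()) hands back an object that can be used as an EmployeeNameRepository even though nobody wrote a class that implements it. In Spring Data the last step would be registering that object as a bean.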

Wait a minute! What about methods annotated with @Query?

Good question! So, say you have an interface that looks like this
public interface EmployeeRepository extends JpaRepository<Employee, Long> {
    @Query
    Page<Employee> findByName(String name, Pageable pageable);
}

What happens to the findByName method? Long story short, this uses the same "pattern" of the proxy class delegating to another class that does the real query. During load time, Spring instantiates a class that implements the RepositoryQuery interface for every query method on the interface (here, the methods annotated with @Query). The type of the concrete class depends on the type of query (for example, if it's a JPQL query, it will use SimpleJpaQuery). The instance of RepositoryQuery is initialized with enough information to run the query (for example, the JPQL is passed to the SimpleJpaQuery). The proxy class holds a RepositoryQuery object for every such method, and it has the code to delegate to the correct RepositoryQuery.
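
Here is a rough sketch of that dispatch, again with plain JDK proxies rather than Spring Data's real classes. SimpleQuery is a made-up, stripped-down stand-in for RepositoryQuery, and GreetingRepository and QueryDispatchingProxyFactory are hypothetical names used only for illustration.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, simplified stand-in for Spring Data's RepositoryQuery.
interface SimpleQuery {
    Object execute(Object[] parameters);
}

// Hypothetical repository interface with a single query method.
interface GreetingRepository {
    String findByName(String name);
}

class QueryDispatchingProxyFactory {

    static <T> T createProxy(Class<T> repositoryInterface, Function<Method, SimpleQuery> queryFactory) {
        // "Load time": build one query object per method, up front.
        Map<Method, SimpleQuery> queries = new HashMap<>();
        for (Method method : repositoryInterface.getMethods()) {
            queries.put(method, queryFactory.apply(method));
        }
        // "Run time": the proxy only looks up the query for the invoked method and runs it.
        Object proxy = Proxy.newProxyInstance(
                repositoryInterface.getClassLoader(),
                new Class<?>[] {repositoryInterface},
                (self, method, args) -> {
                    SimpleQuery query = queries.get(method);
                    if (query == null) {
                        throw new UnsupportedOperationException("No query for " + method.getName());
                    }
                    return query.execute(args);
                });
        return repositoryInterface.cast(proxy);
    }
}
```

The queryFactory argument plays the part of the component that decides which kind of query object each method gets.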

How do you extend it?

Once you understand how Spring Data works, extending it is a simple matter of either extending existing Spring Data classes or implementing your own. The beauty of Spring Data is that you don't have to extend any of its repositories to get Spring Data-like functionality.

Let's go through a simple example: we are trying to implement a Spring Data repository that stores data in CSV files. We are not extending existing Spring Data functionality. We are going to provide completely new interfaces.
These are the steps to follow

Define interface

Define an interface that your users will extend. Your interface should declare all the methods for basic CRUD functionality. To make it easy, you can just extend one of the interfaces that Spring Data provides. You don't have to. It just makes it easy
public interface CSVBasedRepository<T, ID extends Serializable> extends PagingAndSortingRepository<T, ID> {
}
This is the interface that your users will extend, like this
// Long is an assumed ID type for Employee
public interface EmployeeRepository extends CSVBasedRepository<Employee, Long> {
}
You can add more methods to CSVBasedRepository if you like. To keep things simple, we will leave it as is.

Provide implementation for interface

Now, you need to provide an implementation of CSVBasedRepository. This class does the real work of reading and writing to the CSV file
public class CSVBasedRepositoryImpl<T, ID extends Serializable> implements CSVBasedRepository<T, ID> {

    @Override
    public T findOne(ID id) {
        // TODO: read the matching record from the CSV file
        throw new UnsupportedOperationException("TODO");
    }

    @Override
    public boolean exists(ID id) {
        // TODO: check whether a record with this id is in the CSV file
        throw new UnsupportedOperationException("TODO");
    }

    // ... other methods declared by CSVBasedRepository

}
So, now you have an implementation of a CSV-based Spring repository that does most of the work of reading and writing records to CSV.
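
For a feel of what the TODOs might boil down to, here is a small, self-contained sketch of a CSV-backed store. It is deliberately independent of Spring Data. CsvRecordStore is a made-up name, and the layout (one record per line, id in the first column, no quoting or escaping of commas) is an assumption made purely to keep the example short.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: one record per line, id in the first column, no escaping.
class CsvRecordStore {
    private final Path file;

    CsvRecordStore(Path file) {
        this.file = file;
    }

    // Reads every row; a missing file is treated as an empty store.
    List<String[]> findAll() {
        List<String[]> rows = new ArrayList<>();
        if (!Files.exists(file)) {
            return rows;
        }
        try {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                if (!line.isEmpty()) {
                    rows.add(line.split(",", -1));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    // Returns the row whose first column equals id, or null if there is none.
    String[] findOne(String id) {
        for (String[] row : findAll()) {
            if (row[0].equals(id)) {
                return row;
            }
        }
        return null;
    }

    boolean exists(String id) {
        return findOne(id) != null;
    }

    // Inserts the row, replacing any existing row with the same id, then rewrites the file.
    void save(String... row) {
        List<String[]> rows = findAll();
        rows.removeIf(existing -> existing[0].equals(row[0]));
        rows.add(row);
        List<String> lines = new ArrayList<>();
        for (String[] r : rows) {
            lines.add(String.join(",", r));
        }
        try {
            Files.write(file, lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A real CSVBasedRepositoryImpl would additionally have to map rows to and from entity objects, which is left out here.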

Provide implementation for Queries

The first thing you need to decide is what kinds of queries you want to support. Spring Data JPA supports JPQL queries, named queries and PartTree queries (queries derived from the method name). You don't have to support all of these. You can support only one, or you can come up with your own type of query. The only requirement is that your users know what kinds of queries you support. Spring Data lets you define your own rules.
So, in our example, let's say we don't want to support JPQL queries and named queries because it's too difficult and unnecessary. Let's say we just want to support PartTree queries. So, the user can define the repository as
public interface EmployeeRepository extends CSVBasedRepository<Employee, Long> {

    @Query
    public List<Employee> findByName(String name);

    @Query
    public List<Employee> findByNameAndJoinDate(String name, Date joinDate);

}
Here the name of the method tells you which columns to search.
You need to write a class that implements RepositoryQuery

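import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

import org.springframework.data.repository.core.RepositoryMetadata;
import org.springframework.data.repository.query.QueryMethod;
import org.springframework.data.repository.query.RepositoryQuery;

class CSVQuery implements RepositoryQuery {

    final Method method;
    final RepositoryMetadata metadata;
    final List<String> columnNames;

    public CSVQuery(Method method, RepositoryMetadata metadata, List<String> columnNames) {
        this.columnNames = new ArrayList<String>(columnNames);
        this.method = method;
        this.metadata = metadata;
    }

    @Override
    public List<?> execute(Object[] parameters) {
        // Code to search the CSV file goes here. The columns to search were passed to the
        // constructor. The values to search for are in parameters.
        throw new UnsupportedOperationException("TODO: search the CSV file");
    }

    @Override
    public QueryMethod getQueryMethod() {
        // Don't think about this too much. Spring Data just wants it like this.
        // I haven't figured out the rhyme or reason for it.
        return new QueryMethod(method, metadata);
    }

}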
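
The search inside execute could look roughly like the sketch below. It assumes, purely for illustration, that each CSV row has already been loaded as a map from column name to value and that every value is compared as a string. CsvRowMatcher is a hypothetical helper, not something Spring Data provides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: filters rows by pairing columnNames.get(i) with parameters[i].
class CsvRowMatcher {

    static List<Map<String, String>> match(List<Map<String, String>> rows,
                                           List<String> columnNames,
                                           Object[] parameters) {
        if (columnNames.size() != parameters.length) {
            throw new IllegalArgumentException("Expected " + columnNames.size()
                    + " parameters but got " + parameters.length);
        }
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> row : rows) {
            boolean matches = true;
            for (int i = 0; i < parameters.length; i++) {
                // Compare the string form of the parameter with the stored cell value.
                String expected = String.valueOf(parameters[i]);
                if (!expected.equals(row.get(columnNames.get(i)))) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                result.add(row);
            }
        }
        return result;
    }
}
```

For findByNameAndJoinDate, columnNames would be ["name", "joinDate"] and parameters would hold the two method arguments in the same order. Dates would need a formatting convention of your own choosing.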

Extend factory classes

So, now you have implemented a class that provides basic CRUD functionality and a class that can run queries against your data source. Now you need to make the magic happen. The magic lies in factory classes that generate a proxy class. To make it simple for us common folk, Spring Data provides base classes with the basic functionality of the factory that you can extend. You don't have to generate the proxies yourself. Just provide a factory that creates the real repository and query objects, and the base class takes care of generating the proxy.
Also, because Spring Data doesn't want things to be too easy for us, they decided the best way to do this is to have a factory that returns a factory that returns a factory. Don't think about it too much.
This is where all the magic behind Spring Data lies, and unfortunately this code is very badly documented and, frankly, very confusing. Anyway, the awesomeness of Spring Data kind of justifies the over-engineered code.

Implement QueryLookupStrategy

QueryLookupStrategy is the factory that creates instances of RepositoryQuery, that is, of your query class. You have to implement one method named resolveQuery that returns a RepositoryQuery
class CSVQueryLookupStrategy implements QueryLookupStrategy {

    @Override
    public RepositoryQuery resolveQuery(Method method, RepositoryMetadata metadata, NamedQueries namedQueries) {
        // Here method refers to the method being implemented, and metadata gives access
        // to the interface being implemented.
        // extractColumnNamesFromMethod is a helper you write yourself to parse the method name.
        List<String> columnNames = extractColumnNamesFromMethod(method);
        return new CSVQuery(method, metadata, columnNames);
    }
}
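
The extractColumnNamesFromMethod helper is left undefined above. One possible sketch follows, assuming you only support method names of the form findByXAndYAndZ. MethodNameParser is a made-up name, and the parsing rule is far simpler than what Spring Data's own method-name parsing handles. It works on the method name as a string, so from resolveQuery you would pass method.getName().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns "findByNameAndJoinDate" into ["name", "joinDate"].
class MethodNameParser {
    private static final String PREFIX = "findBy";

    static List<String> extractColumnNames(String methodName) {
        if (!methodName.startsWith(PREFIX) || methodName.length() == PREFIX.length()) {
            throw new IllegalArgumentException("Not a findBy... method: " + methodName);
        }
        List<String> columns = new ArrayList<>();
        // Split on "And" only when an uppercase letter follows, so a property
        // such as "Android" is not cut in two.
        for (String part : methodName.substring(PREFIX.length()).split("And(?=[A-Z])")) {
            if (part.isEmpty()) {
                throw new IllegalArgumentException("Empty property in: " + methodName);
            }
            // Lowercase the first letter to get the column (property) name.
            columns.add(Character.toLowerCase(part.charAt(0)) + part.substring(1));
        }
        return columns;
    }
}
```

With this rule, findByName maps to the single column name, and findByNameAndJoinDate maps to name and joinDate, in the same order as the method's parameters.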

Implement RepositoryFactorySupport

This class acts as a factory for the repository and as a factory for the QueryLookupStrategy (which makes it a factory for a factory)

class CSVRepositoryFactorySupport extends RepositoryFactorySupport {

    /**
     * Spring calls this to create the object that will receive all the calls.
     */
    @Override
    public Object getTargetRepository(final RepositoryMetadata metadata) {
        return new CSVBasedRepositoryImpl();
    }

    /**
     * Spring calls this to get the class that the proxy will wrap.
     */
    @Override
    protected Class<?> getRepositoryBaseClass(final RepositoryMetadata metadata) {
        return CSVBasedRepositoryImpl.class;
    }

    /**
     * Spring calls this to get the factory that will create the query objects.
     */
    @Override
    protected QueryLookupStrategy getQueryLookupStrategy(final Key key) {
        return new CSVQueryLookupStrategy();
    }

    // Note: RepositoryFactorySupport also declares an abstract getEntityInformation(...)
    // method that has to be implemented. It is omitted here.

}

Implement RepositoryFactoryBeanSupport

This is a factory that creates the RepositoryFactorySupport. Yes, this makes it a factory for a factory for a factory. No idea why everything isn't in one factory
public class CSVRepositoryFactoryBean<T extends Repository<S, ID>, S, ID extends Serializable> extends RepositoryFactoryBeanSupport<T, S, ID> {

    @Override
    protected RepositoryFactorySupport createRepositoryFactory() {
        return new CSVRepositoryFactorySupport();
    }

}
That's it. You have just implemented your own Spring Data

What are the limitations?

The limitations are really around the Query annotation. For some reason, all thoughts of extensibility were thrown out while designing the support for the Query annotation.
  • The authors have decided to make SimpleJpaQuery final. So, if you wanted to make a custom JpaRepository that modifies the JPQL query, you are out of luck.
  • PartTreeJpaQuery is also not very extensible. It has a nested class that converts the method name to a JPA query. Unfortunately, you cannot extend it.