rachitskillisaurus: 2013

Sunday, November 17, 2013

Resolving generic method return types

Java supports generic/parameterized return types:

public <T> T anyType();

A method with a generic return type like the one above is expected to return a different type depending on its caller. For example -

String string = anyType();

expects "anyType()" to return a String. Whereas

Integer i = anyType();

expects "anyType()" to return an Integer. Because generic type information is erased at compile time, there does not appear to be a built-in way for a method to determine its expected return type at runtime. To get around this, many APIs have the caller describe the expected return type in the method parameters - for example:

public <T> T anyType(Class<T> type);
public <T> T getBean(String beanName, Class<T> beanType);
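As a minimal sketch of this "type token" pattern: the registry class below and its register method are hypothetical and exist only for illustration; only the getBean signature mirrors the one shown above. The caller hands over a Class<T>, so the expected type is available at runtime and can be checked with Class#cast.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry, used only to illustrate the Class<T> type token pattern. */
final class TypeTokenRegistry {
    private final Map<String, Object> beans = new HashMap<>();

    /** Stores a bean under a name (hypothetical helper for this sketch). */
    void register(String beanName, Object bean) {
        beans.put(beanName, bean);
    }

    /** The caller states the expected type explicitly, so it is known at runtime. */
    <T> T getBean(String beanName, Class<T> beanType) {
        Object bean = beans.get(beanName);
        // Class#cast fails here with a ClassCastException instead of later at the call site.
        return beanType.cast(bean);
    }
}
```

The cost of this approach is that every call site has to repeat the type, e.g. getBean("greeting", String.class).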

I put together a library that provides an alternative -

public <T> T anyType(){
  Class<?> myReturnType = Resolver.getConcreteReturnType();
  ..
}

The library works by inspecting the call stack of the generic method. Stack frames typically include class names, method names, and line numbers. Using these, the library locates the bytecode surrounding the method invocation with the ASM bytecode manipulation library and inspects it for "CHECKCAST" instructions. The type found there is the expected return type of the generic method. Extracting parameterized return types (e.g. Map<String,List<Integer>>) is a little trickier but also possible.
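The first step, finding the call site, can be sketched with nothing but the JDK. This is not the library's code: the class and method names below are hypothetical, and the ASM-based CHECKCAST lookup is left out entirely. It only shows that a method can learn which class and method invoked it.

```java
/** Hypothetical sketch: locate the stack frame of the code that called a method. */
final class CallerFrameSketch {

    /** Stands in for the generic method; returns its caller's frame instead of a value. */
    static StackTraceElement callSite() {
        // Element 0 is this method; element 1 is the method that invoked it.
        return new Throwable().getStackTrace()[1];
    }

    /** Hypothetical caller used to show what the frame contains. */
    static String demo() {
        StackTraceElement frame = callSite();
        return frame.getClassName() + "#" + frame.getMethodName();
    }
}
```

StackTraceElement#getLineNumber() gives the line as well, but only when the calling class was compiled with line number debug information.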

Instead of going the Google Code route to share the project like I'm used to, I've decided to give GitHub a try. I've also set up continuous integration using drone.io - so far, it's been an excellent experience. Read more over at https://github.com/ProfessorEugene/rrd-generics-inspector !

Saturday, October 5, 2013

Extending static methods in Java

Java doesn't allow extending static methods. I put together a library with an

@OverrideStatic

annotation that allows one to change this behavior over at http://code.google.com/p/rrd-static-extender/.

Having recently done a bit of dirty copy/pasting to extend a third party library with a bunch of static methods at work, I've given some thought to what it would take to allow one to extend static methods.

Take the below hierarchy for example:

class Base{
  void instanceMethod(){
    staticMethod();
  }
  static void staticMethod(){
    ...
  }
}
class Subclass extends Base{
   static void staticMethod(){
    ...
   }
}

When one creates an instance of Subclass and invokes "instanceMethod", Base#staticMethod() is invoked instead of Subclass#staticMethod(). Other than using something like AspectJ or maybe PowerMock, there isn't really a good way to change this behavior.

In some cases, one might want all instances of Subclass to invoke Subclass#staticMethod() instead of Base#staticMethod when "instanceMethod" is invoked. The desired behavior seems pretty simple to explain:

subclasses should have the ability to override their superclass's static methods
instance methods of such classes should invoke the overriding method
any inner and local classes of the superclass whose enclosing instance is a subclass should also invoke the overriding method
instance methods of such classes' superclasses should invoke the overridden method
in order to not interfere with expected behavior, one should be able to mark an overriding method with an annotation

Starting with an @OverrideStatic annotation, there are a couple of approaches one could take to implement this behavior. At the end of the day, the bytecode of the base class needs to be instrumented to dispatch calls to "staticMethod" to the appropriate place depending on the context of such calls.

Having intercepted a static method call, the calling instance needs to be inspected to check whether it is an instance of a class that contains the overriding method. If said instance is an inner class, its enclosing instance reference (something like this$1) needs to be recursively inspected. If the instance is of a subclass type with the overriding method, the overriding method should be called in place of the overridden method.
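The dispatch decision described above can be sketched with plain reflection. This is an illustration under assumptions, not the library's implementation: OverrideStaticSketch is a hypothetical stand-in for the real annotation, the demo classes are made up, and only no-argument static methods are handled. It also relies on javac's convention of storing the enclosing instance in a synthetic this$N field, which newer compilers may omit when the inner class never uses it.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/** Hypothetical stand-in for the library's annotation so the sketch compiles on its own. */
@Retention(RetentionPolicy.RUNTIME)
@interface OverrideStaticSketch {}

final class StaticDispatchSketch {

    /** Returns the enclosing instance of an inner-class instance, or null if none is stored. */
    static Object enclosingInstance(Object instance) throws IllegalAccessException {
        for (Field field : instance.getClass().getDeclaredFields()) {
            // javac keeps the outer reference in a synthetic field named this$0, this$1, ...
            if (field.isSynthetic() && field.getName().startsWith("this$")) {
                field.setAccessible(true);
                return field.get(instance);
            }
        }
        return null;
    }

    /** Picks the static method a call to base.name() should reach when made from caller. */
    static Method resolve(Object caller, Class<?> base, String name) throws Exception {
        for (Object current = caller; current != null; current = enclosingInstance(current)) {
            if (!base.isInstance(current)) {
                continue; // not an instance of base; try its enclosing instance
            }
            // Walk from the runtime type up to (but excluding) the base class.
            for (Class<?> c = current.getClass(); c != null && c != base; c = c.getSuperclass()) {
                try {
                    Method candidate = c.getDeclaredMethod(name);
                    if (Modifier.isStatic(candidate.getModifiers())
                            && candidate.isAnnotationPresent(OverrideStaticSketch.class)) {
                        return candidate;
                    }
                } catch (NoSuchMethodException ignored) {
                    // this class does not declare the method; keep walking up
                }
            }
        }
        return base.getDeclaredMethod(name); // no override applies
    }

    /** Resolves and invokes a no-argument static method. */
    static Object dispatch(Object caller, Class<?> base, String name) throws Exception {
        Method target = resolve(caller, base, name);
        target.setAccessible(true);
        return target.invoke(null);
    }
}

/** Hypothetical demo hierarchy. */
class DemoBase {
    static String name() { return "DemoBase"; }

    class Inner {
        // Referencing the outer instance ensures javac keeps the this$0 field.
        DemoBase owner() { return DemoBase.this; }
    }
}

class DemoSub extends DemoBase {
    @OverrideStaticSketch
    static String name() { return "DemoSub"; }
}
```

With these demo classes, a call dispatched from a DemoSub instance, or from a DemoBase.Inner whose enclosing instance is a DemoSub, resolves to DemoSub.name(), while a plain DemoBase instance still reaches DemoBase.name().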

Using the ASM bytecode manipulation framework to detect and modify any invocations of static methods in a class is a breeze. The problem is how to load the modified bytecode. There are a few alternatives:

Use your own class loader that overrides the "defineClass" method for loading any classes that might need to have their static methods overridden. The problem here is that portions of your application now need to be loaded through a custom class loader - something like TomcatInstrumentableClassLoader.
Rename classes (e.g. the popular CGLib enhancer approach) as @OverrideStatic annotations are detected and load them in a separate class loader. Create a deep-proxy loaded in the application class loader that recursively copies fields and invokes methods on instances in the separate object hierarchy. This path is laden with traps - your proxy gateway won't work on final classes and the amount of reflection that has to happen is going to take a toll on performance.
Instrument the entire virtual machine using a Java agent. While this is the cleanest way to go, it typically requires one to start the JVM with some funky looking command line arguments.

As anyone who has used jconsole before probably suspects, Java provides a way to instrument running virtual machines. Launching jconsole shows you a list of Java virtual machines currently executing on your machine and allows you to 'attach' to one.

Presumably, jconsole achieves this via the java attach API. This API lets you list any running virtual machines and attach to them and load agents. While the implementation of the API is proprietary, the "Instrumentation" framework used by such agents is not. Indeed, coding against the java.lang.instrument.Instrumentation interface seems to be a portable endeavor.

In light of this, I came up with the rrd-attach-util library to allow one to attach to and instrument the virtual machine in which it is executed. The library provides a very simple API to obtain an Instrumentation instance for the running VM without having to jump through hoops packaging an actual agent:

LocalInstrumentationFactory.getLocalInstrumentation();

Once an Instrumentation object is obtained, all loaded classes can be inspected and instrumented. Since attach API implementations are proprietary, as far as I know rrd-attach-util only works with Sun/Oracle JDK6 and later. This doesn't seem like a big deal since most folks seem to be running Oracle JREs these days.

I used this API to create a simple implementation of a "static extender". To use it, one annotates static methods with an @OverrideStatic annotation and calls OverrideStatics.enable(). All such annotations are then picked up, and static method invocation follows the behavior described above until OverrideStatics.disable() is invoked.

A unit test that demonstrates this behavior in action:

import static org.junit.Assert.assertEquals;
import org.junit.Test;
/* Assumes JUnit 4. OverrideStatic and OverrideStatics come from rrd-static-extender;
 * their imports are omitted here. The enclosing class name is arbitrary. */
public class OverrideStaticDemoTest {
 static class BaseClass {
  public String callStaticMethod() throws Exception {
   return staticMethod();
  }
  public static String staticMethod() {
   return "BaseClass";
  }
 }
 static class SubClass extends BaseClass {
  @OverrideStatic
  public static String staticMethod() {
   return "Subclass";
  }
 }
 @Test
 public void demonstrateOverrideStatic() throws Exception {
  /* instantiate a base and sub class */
  BaseClass baseClass = new BaseClass();
  SubClass subClass = new SubClass();
  /* BaseClass#callStaticMethod always routes to BaseClass#staticMethod */
  assertEquals("BaseClass", baseClass.callStaticMethod());
  /* without static overrides, SubClass#callStaticMethod also routes to BaseClass#staticMethod */
  assertEquals("BaseClass", subClass.callStaticMethod());
  /* let's see what happens when we enable static overrides */
  OverrideStatics.enable();
  /* SubClass#callStaticMethod will now route to SubClass#staticMethod */
  assertEquals("Subclass", subClass.callStaticMethod());
  /* BaseClass#callStaticMethod will still route to BaseClass#staticMethod */
  assertEquals("BaseClass", baseClass.callStaticMethod());
 }
}

My implementation works great for my needs and doesn't seem to cause a big performance hit. There is however a lot of room for improvement:

The actual code could use a cleanup.
Calls to overridden static methods are intercepted and routed to a utility method that uses reflection to decide which method to invoke. It seems like this can be done through bytecode manipulation like the actual interception of these calls.

In general though, this is a great example of what one can do in a few days to totally modify Java behavior through bytecode manipulation. The library also serves as a functional showcase of the utility provided by the rrd-attach-util API. Until I've got the time to clean up, use at your own risk!

Wednesday, October 2, 2013

Spring: Force a bean to be the first to initialize

Spring provides the "depends-on" bean attribute to control bean initialization order. Unfortunately, there are some rare cases where you really want to run some code before anything else in the application context initializes. An example scenario might involve messing around with the thread context class loader before something like "PropertyPlaceholderConfigurer" has a chance to initialize.

There are a number of ways to go about this, but the easiest might be implementing a PriorityOrdered BeanFactoryPostProcessor. After all bean definitions are loaded, Spring calls every BeanFactoryPostProcessor defined in the application context. Post processors that implement PriorityOrdered are called first, starting with those that have the highest precedence.

To execute some code before any other spring beans or processors have a chance to initialize, one can therefore use something like the following:

        
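import org.springframework.beans.BeansException;
import org.springframework.beans.factory.config.BeanFactoryPostProcessor;
import org.springframework.beans.factory.config.ConfigurableListableBeanFactory;
import org.springframework.core.Ordered;
import org.springframework.core.PriorityOrdered;

/**
 * An example spring pre-initializer.  Referencing this class as a bean in an
 * application context will cause it to be the first executed PostProcessor
 * after all beans in the application context are defined.
 */
public class PreInitializer implements BeanFactoryPostProcessor, PriorityOrdered {
 @Override
 public int getOrder() {
  /* lowest possible order value, so this processor sorts first */
  return Ordered.HIGHEST_PRECEDENCE;
 }
 @Override
 public void postProcessBeanFactory(
   ConfigurableListableBeanFactory beanFactory) throws BeansException {
  /* put initialization code here */
 }
}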

This is an easy hack to perform initialization tasks that might affect the application context before anything else, including property resolution, has a chance to run.

Saturday, August 24, 2013

Computing path to maven module base directory

Having spent a bunch of time writing functional tests for Maven projects with many modules, I tend to end up needing access to the project base path (typically the path with the pom.xml file in it). This isn't much of a problem when running a single module - new File(".") pretty much does it. On the other hand, when attempting to access modules outside of the JVM base path, things start to get tricky.

There are a couple of ways to skin this cat - probably one of the more portable ones is a utility method that accepts a Class and path segments and returns a File by inspecting where the supplied Class was loaded from:

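import java.io.File;
/* assumption: Apache Commons Lang 3; the older org.apache.commons.lang package offers the same methods */
import org.apache.commons.lang3.StringUtils;

public static File getPathInModule(Class<?> classInModule, String... items) {
    /* location of the jar or classes directory the class was loaded from */
    String tcPath = classInModule.getProtectionDomain().getCodeSource().getLocation().getFile();
    /* URL paths always use '/' as the separator, regardless of platform */
    String[] tcPathParts = StringUtils.split(tcPath, "/");
    StringBuilder pomPath = new StringBuilder();
    for (String part : tcPathParts) {
        pomPath.append(File.separator);
        /* stop at the build output directory; its parent is the module base */
        if ("target".equals(part)) {
            break;
        }
        pomPath.append(part);
    }
    pomPath.append(StringUtils.join(items, File.separator));
    return new File(pomPath.toString());
}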

This works fairly well as long as there is a target directory present, which for my purposes is always the case. It could be modified to look for a pom.xml file instead though.
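A minimal sketch of that pom.xml variant, using only the JDK: the class and method names here are hypothetical, and the sketch assumes the class was loaded from a local file or directory (getCodeSource() can be null for JDK classes). Starting from the code source location, it walks up the directory tree until it finds a directory containing pom.xml.

```java
import java.io.File;
import java.net.URISyntaxException;

/** Hypothetical helper: locate a Maven module base by searching upwards for pom.xml. */
final class ModuleBaseLocator {

    /** Walks up from start until a directory containing pom.xml is found; null if none. */
    static File findModuleBase(File start) {
        for (File dir = start.getAbsoluteFile(); dir != null; dir = dir.getParentFile()) {
            if (new File(dir, "pom.xml").isFile()) {
                return dir;
            }
        }
        return null;
    }

    /** Resolves path segments against the base of the module that contains the class. */
    static File getPathInModule(Class<?> classInModule, String... items) throws URISyntaxException {
        // Converting through a URI decodes escaped characters such as %20.
        File location = new File(
                classInModule.getProtectionDomain().getCodeSource().getLocation().toURI());
        File result = findModuleBase(location);
        if (result == null) {
            throw new IllegalStateException("No pom.xml found above " + location);
        }
        for (String item : items) {
            result = new File(result, item);
        }
        return result;
    }
}
```

A call such as ModuleBaseLocator.getPathInModule(SomeTest.class, "src", "test", "resources") would then resolve against the module that contains SomeTest, whether or not a target directory exists.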

Wednesday, August 21, 2013

Hadoop 3.0.0 maven repository

Recent Hadoop Maven artifacts are pretty hard to come by. The central repository contains stable and 2.0.0-alpha releases, CDH repositories have 2.0.X releases, and the Apache repository has 3.X snapshots. That means if you want 2.3.X or 3.X functionality, you're either stuck pulling in snapshots or building/hosting your own releases.

The Apache snapshots are problematic because individual build snapshots don't appear to be retained for very long. Building/hosting your own artifacts seems like a pain. Hopefully, this is a temporary issue until Hadoop 3.X is released. Until then, here's a Maven repository that has (unofficial) 3.0.0 release artifacts with source jars.

It is pretty easy to add pretty much any Hadoop version to the repository. Visit the project page or shoot me an email to request one!
