Tour de Mock 6: Spock

09 April 2015 ~ blog, groovy, testing

My last entry in my "Tour de Mock" series was focused on basic Groovy mocking. In this post, I am going to take a look at the Spock Framework, which is an alternative testing framework with a lot of features, including its own mocking API.

Since it's been a while, let's refer back to the original posting as a refresher of what is being tested. We have a Servlet, the EmailListServlet

public class EmailListServlet extends HttpServlet {

    private EmailListService emailListService;

    public void init() throws ServletException {
        final ServletContext servletContext = getServletContext();
        this.emailListService = (EmailListService)servletContext.getAttribute(EmailListService.KEY);

        if(emailListService == null) throw new ServletException("No ListService available!");
    }

    protected void doGet(final HttpServletRequest req, final HttpServletResponse res) throws ServletException, IOException {
        final String listName = req.getParameter("listName");
        final List<String> list = emailListService.getListByName(listName);
        PrintWriter writer = null;
        try {
            writer = res.getWriter();
            for(final String email : list){
                writer.println(email);
            }
        } finally {
            if(writer != null) writer.close();
        }
    }
}

which uses an EmailListService

public interface EmailListService {

    public static final String KEY = "com.stehno.mockery.service.EmailListService";

    /**
     * Retrieves the list of email addresses with the specified name. If no list
     * exists with that name an IOException is thrown.
     */
    List<String> getListByName(String listName) throws IOException;
}

to retrieve lists of email addresses, because that's what you do, right? It's just an example. :-)

First, we need to add Spock to our build (recently converted to Gradle, but basically the same) by adding the following line to the build.gradle file:

testCompile "org.spockframework:spock-core:1.0-groovy-2.4"

Next, we need a test class. Spock uses the concept of a test "Specification" so we create a simple test class as:

class EmailListServlet_SpockSpec extends Specification {
    // test stuff here...
}

Not all that different from a JUnit test; conceptually they are very similar.
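
Note that the setup method shown below also refers to a few fields on the specification that are not shown above; a minimal sketch of those declarations might look like the following (the email addresses in LIST are placeholder values of my own, not taken from the project):

private static final List<String> LIST = ['larry@stooges.com', 'moe@stooges.com', 'curly@stooges.com']

private EmailListServlet emailListServlet
private HttpServletRequest request
private HttpServletResponse response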

Just as in the other examples of testing this system, we need to set up our mock objects for the servlet environment and other collaborators:

def setup() {
    def emailListService = Mock(EmailListService) {
        _ * getListByName(null) >> { throw new IOException() }
        _ * getListByName('foolist') >> LIST
    }

    def servletContext = Mock(ServletContext) {
        1 * getAttribute(EmailListService.KEY) >> emailListService
    }

    def servletConfig = Mock(ServletConfig) {
        1 * getServletContext() >> servletContext
    }

    emailListServlet = new EmailListServlet()
    emailListServlet.init servletConfig

    request = Mock(HttpServletRequest)
    response = Mock(HttpServletResponse)
}

Spock provides a setup method that you can implement to perform your test setup operations, such as mocking. In this example, we are mocking the service interface and the servlet API interfaces so that they behave in the desired manner.

The mocking provided by Spock took a little getting used to when coming from a primarily Mockito-based background, but once you grasp the overall syntax, it's actually pretty expressive. In the code above for the EmailListService, I am mocking the getListByName(String) method so that it will accept any number of calls with a null parameter and throw an exception, as well as any number of calls with a 'foolist' parameter, which will return a reference to the email address list. Similarly, you can specify that you expect exactly N calls to a method, as was done in the other mocks. You can dig a little deeper into the mocking part of the framework in the Interaction-based Testing section of the Spock documentation.
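
To make the cardinality and return-value syntax a little more concrete, here are a few illustrative variations on the service mock (fragments only, meant to live in a mock definition closure or a then: block; the counts are arbitrary):

1 * emailListService.getListByName('foolist') >> LIST                    // exactly one call expected, returning the list
_ * emailListService.getListByName(null) >> { throw new IOException() }  // any number of calls, each throwing an exception
3 * emailListService.getListByName(_)                                    // exactly three calls, with any argument
(1..2) * emailListService.getListByName(_ as String)                     // between one and two calls, with any String argument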

Now that we have our basic mocks ready, we can test something. As in the earlier examples, we want to test the condition when no list name is specified and ensure that we get the expected Exception thrown:

def 'doGet: without list'() {
    setup:
    1 * request.getParameter('listName') >> null

    when:
    emailListServlet.doGet request, response

    then:
    thrown(IOException)
}

One thing you should notice right away is that Spock uses labeled blocks to denote the different parts of a test method. Here, the setup block is where we do any additional mocking or setup specific to this test method. The when block is where the actual operations being tested are performed, while the then block is where the results are verified and conditions examined.

In our case, we need to mock out the request parameter to return null and then we need to ensure that an IOException is thrown.

Our other test is the case when a valid list name is provided:

def 'doGet: with list'() {
    setup:
    1 * request.getParameter('listName') >> 'foolist'

    def writer = Mock(PrintWriter)

    1 * response.getWriter() >> writer

    when:
    emailListServlet.doGet request, response

    then:
    1 * writer.println(LIST[0])
    1 * writer.println(LIST[1])
    1 * writer.println(LIST[2])
}

In the then block here, we verify that the println(String) method of the mocked PrintWriter is called with the correct arguments in the correct order.

Overall, Spock is a pretty clean and expressive framework for testing and mocking. It actually has quite a few other interesting features that beg to be explored.

You can find the source code used in this posting in my TourDeMock project.

Testing AST Transformations

08 March 2015 ~ blog, groovy, testing

While working on my Effigy project, I have gone deep into the world of Groovy AST transformations and found that they are, in my opinion, among the most interesting and useful features of the Groovy language; however, developing them is a bit of a poorly-documented black art, especially around writing unit tests for your transformations. Since the code you are writing runs at compile time, you generally have little visibility into what is going on at that point, and it can be quite frustrating to try and figure out why something is failing.

After some Googling and experimentation, I have been able to piece together a good method for testing your transformation code, and it's actually not all that hard. Also, you can do your development and testing in a single project, rather than in a main project and a separate testing project (normally needed to account for the fact that the transformation must be compiled before the code that tests it).

The key to making transforms testable is the GroovyClassLoader which gives you the ability to compile Groovy code on the fly:

def clazz = new GroovyClassLoader().parseClass(sourceCode)

During that parseClass method is when all the AST magic happens. This means you can not only easily test your code, but also debug into your transformations to get a better feel for what is going wrong when things break - and they often do.
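
As an example of what that enables, a bare-bones test (without the helper described next, and assuming the @Counted transformation from the demo project is on the classpath so that it adds a getSayHelloCount() method) could be as simple as:

def source = '''
    package testing

    import com.stehno.ast.annotation.Counted

    class CountingTester {
        @Counted
        String sayHello(String name){
            "Hello, $name"
        }
    }
'''

def instance = new GroovyClassLoader().parseClass(source).newInstance()

assert instance.sayHello('AST') == 'Hello, AST'
assert instance.getSayHelloCount() == 1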

For my testing, I have started building a ClassBuilder code helper that is a shell for String-based source code. You provide a code template that acts as your class shell, and then you inject code for your specific test case. You end up with a reasonably clean means of building test code and instantiating it:

private final ClassBuilder code = forCode('''
    package testing

    import com.stehno.ast.annotation.Counted

    class CountingTester {
        $code
    }
''')

@Test void 'single method'(){
    def instance = code.inject('''
        @Counted
        String sayHello(String name){
            "Hello, $name"
        }
    ''').instantiate()

    assert instance.sayHello('AST') == 'Hello, AST'
    assert instance.getSayHelloCount() == 1

    assert instance.sayHello('Counting') == 'Hello, Counting'
    assert instance.getSayHelloCount() == 2
}

The forCode method creates the builder and prepares the code shell. This construct may be reused for each of your tests.

The inject method adds in the actual code you care about, meaning your transformation code being tested.

The instantiate method uses the GroovyClassLoader internally to load the class and then instantiate it for testing.
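
The helper itself does not need to be complicated; a rough sketch of the idea (my own simplified version, not the actual ClassBuilder used in the demo project) could be as small as:

class ClassBuilder {

    private final String template
    private String source

    private ClassBuilder(String template){
        this.template = template
    }

    // wraps the class-level code shell, which contains a literal $code placeholder
    static ClassBuilder forCode(String template){
        new ClassBuilder(template)
    }

    // replaces the $code placeholder with the test-specific source
    ClassBuilder inject(String code){
        source = template.replace('$code', code)
        this
    }

    // compiles the assembled source with GroovyClassLoader and instantiates the resulting class
    def instantiate(){
        new GroovyClassLoader().parseClass(source).newInstance()
    }
}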

I am going to add a version of the ClassBuilder to my Vanilla project once it is more stable; however, I have a version of it and a simple AST testing demo project in the ast-testing CoffeaElectronica sub-repo. This sample code builds a simple AST Transformation for counting method invocations and writes normal unit tests for it (the code above is taken from one of the tests).

Note: I have recently discovered the groovy.tools.ast.TransformTestHelper class; I have not yet tried it out, but it seems to provide a similar base functionality set to what I have described here.

Custom Domain for GitHub Pages

15 February 2015 ~ blog

I have been working for a while now to get my blog fully cut over to being generated by JBake and hosted on GitHub; it's not all that difficult, just a format conversion and some domain fiddling, but I was procrastinating.

Pointing your GitHub Pages at a custom domain is not all that hard to do, and they provide decent documentation about how to do it; however, some streamlining is nice for DNS novices like myself. I may have done things a bit out of order, but it worked in the end...

First, I created A records for the GitHub-provided IP addresses. I use GoDaddy for my domain names, so your experience may be a bit different; but in the GoDaddy DNS Zone File editor you end up adding something like:

A Record

Next, I added a CName record alias for www pointing to my GitHub account hostname, which ended up looking like this:

CName Record

Lastly, you need to make changes in your repository - this step seems to be missed by a lot of people. The gist of it is that you add a new file to your gh-pages branch, named CNAME (all caps, no extension). And in that file you add your domain name (without http://www.). Save the file and be sure you push it to your remote repository.
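
For example, if the custom domain were example.com (a placeholder here, not my actual domain), the entire contents of the CNAME file would be that single line:

example.com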

At this point it worked for me, but the documentation said it could take up to 48 hours to propagate the changes.

Gradle and CodeNarc

07 November 2014 ~ blog, java, testing, gradle, groovy

The subject of "code quality tools" has lead to many developer holy wars over the years, so I'm not really going to touch the subject of their value or level of importance here, suffice to say that they are tools in your toolbox for helping to maintain a base level of "tedious quality", meaning style rules and general coding conventions enforced by your organization - it should never take the ultimate decision making from the developers.

That being said, let's talk about CodeNarc. CodeNarc is a rule-based code quality analysis tool for Groovy-based projects. Groovy does not always play nice with other code analysis tools, so it's nice that there is one specially designed for it and Gradle provides access to it out of the box.

Using the Gradle CodeNarc plugin is easy: apply the plugin to your build

apply plugin: 'codenarc'

and then do a bit of rule configuration based on the needs of your code base.

codenarcMain {
    ignoreFailures false
    configFile file('config/codenarc/codenarc-main.rules')

    maxPriority1Violations 0
    maxPriority2Violations 10
    maxPriority3Violations 20
}

codenarcTest {
    ignoreFailures true
    configFile file('config/codenarc/codenarc-test.rules')

    maxPriority1Violations 0
    maxPriority2Violations 10
    maxPriority3Violations 20
}

The plugin allows you to have different configurations for your main code and your test code, and I recommend using that functionality since generally you may care about slightly different things in your production code versus your test code. Also, there are JUnit-specific rules that you can ignore in your production code scan.

Notice that in my example, I have ignored failures in the test code. This is handy when you are doing a lot of active development and don't really want to fail your build every time your test code quality drops slightly. You can also set the thresholds for allowed violations of the three priority levels - when the counts exceed one of the given thresholds, the build will fail, unless it's ignored. You will always get a report for both main and test code in your build reports directory, even if there are no violations. The threshold numbers are something you will need to determine based on your code base, your team and your needs.

The .rules files are really Groovy DSL files, but the extension is unimportant so I like to keep them out of the Groovy namespace. The CodeNarc web site has a sample "kitchen sink" rule set to get things started - though it has a few rules that cause errors, you can comment those out or remove them from the file. Basically the file is a list of all the active rules, so removing one disables it. You can also configure some of them. LineLength is one I like to change:

LineLength { length = 150 }

This will keep the rule active, but will allow line lengths of 150 rather than the default 120 characters. You will need to check the JavaDocs for configurable rule properties; for the most part, they seem to be on or off.
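
To give a concrete feel for the format, a heavily trimmed-down config/codenarc/codenarc-main.rules file might look something like the following (the rule selection is purely illustrative, not a recommended set):

ruleset {
    // rules are simply listed by name; deleting a line disables that rule
    EmptyCatchBlock
    UnusedImport
    DuplicateImport
    UnnecessaryGetter

    // rules with configurable properties take a closure
    LineLength { length = 150 }
}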

Running the analysis is simple: the check task may be run by itself, or it will be run along with the build task.

gradle check

The reports (main and test) will be available in the build/reports/codenarc directory as two html files. They are not the prettiest reports, but they are functional.

If you are starting to use CodeNarc on an existing project, you may want to take a phased approach to applying and customizing rules so that you are not instantly bogged down with rule violations - do a few passes with a trimmed-down rule set, fix what you can fix quickly, configure or disable the others, and set your thresholds to a sensible level; then make it a goal to drop the numbers with each sprint or release so that progress is made.

Hello Again Slick2D

11 October 2014 ~ blog, java, groovy

I am finally getting back around to working on my little game programming project and I realized that somewhere along the
way, my project stopped working. I am using the Slick2D library, which seems to have little
in the way of formal release or distribution so it didn't surprise me. I think I had something hacked together making it
work last time. I decided to try and put some more concrete and repeatable steps around basic setup, at least for how I use it - I'm no
game programmer.

I'm using Groovy as my development language and Gradle for building. In the interest of time and clarity, I am going to use a
dump-and-describe approach here; there are only two files, so it should not be a big deal.

The build.gradle file is as follows:

group = 'com.stehno.demo'
version = '0.1'

buildscript {
    repositories {
        jcenter()

        maven {
            url 'http://dl.bintray.com/cjstehno/public/'
        }
    }

    dependencies {
        classpath 'com.stehno:gradle-natives:0.2'
    }
}

apply plugin:'groovy'
apply plugin:'application'
apply plugin:'com.stehno.natives'

compileJava {
    sourceCompatibility = 1.8
    targetCompatibility = 1.8
}

mainClassName = 'helloslick.HelloSlick'

repositories {
    jcenter()
}

dependencies {
    compile 'org.codehaus.groovy:groovy-all:2.3.6'

    compile 'org.slick2d:slick2d-core:1.0.1'
}

test {
    systemProperty 'java.library.path', file('build/natives/windows')
}

run {
    systemProperty 'java.library.path', file('build/natives/windows')
}

natives {
    jars = [
        'lwjgl-platform-2.9.1-natives-windows.jar',
        'jinput-platform-2.0.5-natives-windows.jar'
    ]
    platforms = 'windows'
}

task wrapper(type: Wrapper) {
    gradleVersion = '2.1'
}

The first point of note is that I am using my Gradle Natives plugin - not as
self-promotion, but because this project is the reason I wrote it. This plugin takes care of extracting all the little native
libraries and putting them in your build so that they are easily accessible by your code. The configuration is found near
the bottom of the file, in the natives block - we want to extract the native libraries from the lwjgl and jinput libraries
for this project, and in my case I only care about the Windows versions (leave off platforms to get all platforms).

There was one interesting development during my time away from this project: a third-party jar version of Slick2D has been pushed to Maven Central, which makes things a lot easier - I think I previously had to build it myself and fiddle with pushing it to my local maven repo or something. Now it's just another remote library (hopefully it works as expected - I have not played with it yet).

The last point of interest here is the use of the application plugin. This plugin provides an easy way to run your game
while specifying the java.library.path which is the painful part of running applications with native libraries. With the
application plugin and the run configuration in place, you can run the game from Gradle - admittedly not ideal, but this
is just development; I actually have a configuration set for the IzPack installer that I will write about later.

Now, we need some code to run, and the Slick2D wiki provides a simple Hello world sample that I have tweaked a bit for my
use - mostly just cosmetic changes:

package helloslick

import groovy.util.logging.Log
import org.newdawn.slick.*

import java.util.logging.Level

@Log
class HelloSlick extends BasicGame {

    HelloSlick(String gamename){
        super(gamename)
    }

    @Override
    public void init(GameContainer gc) throws SlickException {}

    @Override
    public void update(GameContainer gc, int i) throws SlickException {}

    @Override
    public void render(GameContainer gc, Graphics g) throws SlickException {
        g.drawString 'Hello Slick!', 50, 50
    }

    public static void main(String[] args){
        try {
            AppGameContainer appgc = new AppGameContainer(new HelloSlick('Simple Slick Game'))
            appgc.setDisplayMode(640, 480, false)
            appgc.start()

        } catch (SlickException ex) {
            log.log(Level.SEVERE, null, ex)
        }
    }
}

This just opens a game window and writes "Hello Slick!" in it, but if you have that working, you should be ready for playtime
with Slick2D.

Once you have the project setup (build.gradle in the root, and HelloSlick.groovy in /src/main/groovy/helloslick), you
are ready to go. Run the following to run the project.

gradle unpackNatives run

And if all is well, you will see the game window and message.

Like I said, this is mostly just for getting my development environment up and running as a sanity check, but maybe it is useful to others.

Yes, the explicit unpackNatives call is annoying; it's something I am working on.

Spring Boot Embedded Server API

15 September 2014 ~ blog, spring, groovy, java, gradle

I have been investigating Spring-Boot for both work and personal projects and, while it seems very all-encompassing and useful, I have found that its "opinionated" approach to development was a bit too aggressive for the project conversion I was doing at work; however, I did come to the realization that you don't have to use Spring-Boot as your project's core - you can use it and most of its features in your own project, just like any other Java library.

The project I was working on had a customized embedded Jetty solution with a lot of tightly-coupled, Jetty-specific configuration code that pulled its configuration from a Spring application context. I did a little digging around in the Spring-Boot documentation and found that their API provides direct access to the embedded server abstraction used by a Boot project. On top of that, it's actually a very sane and friendly API to use. During my exploration and experimentation I was able to build up a simple demo application, which seemed like good fodder for a blog post - we're not going to solve any problems here, just have a little playtime with the Spring-Boot embedded server API.

To start off, we need a project to work with; I called mine "spring-shoe" (not big enough for the whole boot, right?). I used Java 8, Groovy 2.3.2 and Gradle 2.0, but slightly older versions should also work fine - the build file looks like:

apply plugin: 'groovy'

compileJava {
    sourceCompatibility = 1.8
    targetCompatibility = 1.8
}

compileGroovy {
    groovyOptions.optimizationOptions.indy = false
}

repositories {
    jcenter()
}

dependencies {
    compile 'org.codehaus.groovy:groovy-all:2.3.2'

    compile 'javax.servlet:javax.servlet-api:3.0.1'
    compile 'org.eclipse.jetty:jetty-webapp:8.1.15.v20140411'

    compile 'org.springframework.boot:spring-boot:1.1.5.RELEASE'
    compile 'org.springframework:spring-web:4.0.6.RELEASE'
    compile 'org.springframework:spring-webmvc:4.0.6.RELEASE'
}

Notice that I am using the spring-boot library, not the Gradle plugin or "starter" dependencies - this also means that you have to bring in other libraries yourself (e.g. the web and webmvc libraries above).

Next, we need an application starter, which just instantiates a specialized Application context, the AnnotationConfigEmbeddedWebApplicationContext:

package shoe

import org.springframework.boot.context.embedded.AnnotationConfigEmbeddedWebApplicationContext
import org.springframework.boot.context.embedded.EmbeddedWebApplicationContext

class Shoe {
    static void main( args ){
        EmbeddedWebApplicationContext context = new AnnotationConfigEmbeddedWebApplicationContext('shoe.config')
        println "Started context on ${new Date(context.startupDate)}"
    }
}

The shoe.config package is where my configuration class lives - the package will be auto-scanned. When this class's main method is run, it instantiates the context and just prints out the context start date. Internally, this context will search for the embedded server configuration beans as well as any servlets and filters to be loaded on the server - but I am jumping ahead; we need a configuration class:

package shoe.config

import org.springframework.boot.context.embedded.EmbeddedServletContainerFactory
import org.springframework.boot.context.embedded.jetty.JettyEmbeddedServletContainerFactory
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration
import org.springframework.web.servlet.config.annotation.EnableWebMvc

@Configuration
@EnableWebMvc
class ShoeConfig {

    @Bean EmbeddedServletContainerFactory embeddedServletContainerFactory(){
        new JettyEmbeddedServletContainerFactory( 10101 )
    }
}

As you can see, it's just a simple Java-based configuration class. The EmbeddedServletContainerFactory class is the crucial part here. The context loader searches for a configured bean of that type and then loads it to create the embedded servlet container - a Jetty container in this case, running on port 10101.

Now, if you run Shoe.main() you will see some logging similar to what is shown below:

...
INFO: Jetty started on port: 10101
Started context on Thu Sep 04 18:59:24 CDT 2014

You have a running server, though it's pretty boring since you have nothing useful configured. Let's start by making it say hello using a simple servlet named HelloServlet:

package shoe.servlet

import javax.servlet.ServletException
import javax.servlet.http.HttpServlet
import javax.servlet.http.HttpServletRequest
import javax.servlet.http.HttpServletResponse

class HelloServlet extends HttpServlet {

    @Override
    protected void doGet( HttpServletRequest req, HttpServletResponse resp ) throws ServletException, IOException{
        resp.writer.withPrintWriter { w->
            w.println "Hello, ${req.getParameter('name')}"
        }
    }
}

It's just a simple HttpServlet extension that says "hello" with the input value from the "name" parameter. Nothing really special here. We could have just as easily used an extension of Spring's HttpServletBean here instead. Moving back to the ShoeConfig class, the modifications are minimal, you just create the servlet and register it as a bean.

@Bean HttpServlet helloServlet(){
    new HelloServlet()
}

Now fire the server up again, and browse to http://localhost:10101/helloServlet?name=Chris and you will get a response of:

Hello, Chris

Actually, any path will resolve to that servlet since it's the only one configured. I will come back to the configuration of multiple servlets and how to specify the url-mappings in a little bit, but let's take the next step and set up a Filter implementation. Let's create a Filter that counts requests as they come in and then passes the current count along with the continuing request.

package shoe.servlet

import org.springframework.web.filter.GenericFilterBean

import javax.servlet.FilterChain
import javax.servlet.ServletException
import javax.servlet.ServletRequest
import javax.servlet.ServletResponse
import java.util.concurrent.atomic.AtomicInteger

class RequestCountFilter extends GenericFilterBean {

    private final AtomicInteger count = new AtomicInteger(0)

    @Override
    void doFilter( ServletRequest request, ServletResponse response, FilterChain chain ) throws IOException, ServletException{
        request.setAttribute('request-count', count.incrementAndGet())

        chain.doFilter( request, response )
    }
}

In this case, I am using the Spring helper GenericFilterBean, simply so I only have one method to implement rather than three. I could have used a plain Filter implementation just as easily.

In order to make use of this new count information, we can tweak the HelloServlet so that it prints out the current count with the response - just change the println statement to:

w.println "<${req.getAttribute('request-count')}> Hello, ${req.getParameter('name')}"

Lastly for this case, we need to register the filter as a bean in the ShoeConfig class:

@Bean Filter countingFilter(){
    new RequestCountFilter()
}

Now, run the application again and hit the hello servlet a few times and you will see something like:

<10> Hello, Chris

The default url-mapping for the filter is "/*" (all requests). While this may be useful for some quick demo cases, it would be much more useful to be able to define the servlet and filter configuration similar to what you would do in the web container configuration - well, that's where the RegistrationBeans come into play.

Revisiting the servlet and filter configuration in ShoeConfig we can now provide a more detailed configuration with the help of the ServletRegistrationBean and the FilterRegistrationBean classes, as follows:

@Bean ServletRegistrationBean helloServlet(){
    new ServletRegistrationBean(
        urlMappings:[ '/hello' ],
        servlet: new HelloServlet()
    )
}

@Bean FilterRegistrationBean countingFilter(){
    new FilterRegistrationBean(
        urlPatterns:[ '/*' ],
        filter: new RequestCountFilter()
    )
}

We still leave the filter mapped to all requests, but you now have access to any of the filter mapping configuration parameters. For instance, we can add a simple init-param to the RequestCountFilter, such as:

int startValue = 0

private AtomicInteger count

@Override
protected void initFilterBean() throws ServletException {
    count = new AtomicInteger(startValue)
}

This will allow the starting value of the count to be specified as a filter init-parameter, which can be easily configured in the filter configuration:

@Bean FilterRegistrationBean countingFilter(){
    new FilterRegistrationBean(
        urlPatterns:[ '/*' ],
        filter: new RequestCountFilter(),
        initParameters:[ 'startValue': '1000' ]
    )
}

Nice and simple. Now, when you run the application again and browse to http://localhost:10101/helloServlet?name=Chris you get a 404 error. Why? Well, now you have specified a url-mapping for the servlet; try http://localhost:10101/hello?name=Chris and you will see the expected result, something like:

<1004> Hello, Chris

You can also register ServletContextListeners in a similar manner. Let's create a simple one:

package shoe.servlet

import javax.servlet.ServletContextEvent
import javax.servlet.ServletContextListener

class LoggingListener implements ServletContextListener {

    @Override
    void contextInitialized(ServletContextEvent sce) {
        println "Initialized: $sce"
    }

    @Override
    void contextDestroyed(ServletContextEvent sce) {
        println "Destroyed: $sce"
    }
}

And then configure it in ShoeConfig:

@Bean ServletListenerRegistrationBean listener(){
    new ServletListenerRegistrationBean(
        listener: new LoggingListener()
    )
}

Then, when you run the application, you will get a message in the server output like:

Initialized: javax.servlet.ServletContextEvent[source=ServletContext@o.s.b.c.e.j.JettyEmbeddedWebAppContext{/,null}]

Now, let's do something a bit more interesting - let's set up a Spring-MVC configuration inside our embedded server.

The first thing you need for a minimal Spring-MVC configuration is a DispatcherServlet which, at its heart, is just an HttpServlet so we can just configure it as a bean in ShoeConfig:

@Bean HttpServlet dispatcherServlet(){
    new DispatcherServlet()
}

Then, we need a controller to make sure this configuration works - how about a simple controller that responds with the current time; we will also dump the request count to show that the filter is still in play. The controller looks like:

package shoe.controller

import org.springframework.web.bind.annotation.RequestMapping
import org.springframework.web.bind.annotation.RestController

import javax.servlet.http.HttpServletRequest

@RestController
class TimeController {

    @RequestMapping('/time')
    String time( HttpServletRequest request ){
        "<${request.getAttribute('request-count')}> Current-time: ${new Date()}"
    }
}

Lastly for this example, we need to load the controller into the configuration; just add a @ComponentScan annotation to the ShoeConfig as:

@ComponentScan(basePackages=['shoe.controller'])

Fire up the server and hit the http://localhost:10101/time controller and you see something similar to:

<1002> Current-time: Fri Sep 05 07:02:36 CDT 2014

Now you have the ability to do any of your Spring-MVC work with this configuration, while the standard filter and servlet still work as before.

As a best-practice, I would suggest keeping this server configuration code separate from other configuration code for anything more than a trivial application (i.e. you wouldn't do your security and database config in this same file).

For my last discussion point, I want to point out that the embedded server configuration also allows you to do additional customization to the actual server instance during startup. To handle this additional configuration, Spring provides the JettyServerCustomizer interface. You simply implement this interface and add it to your server configuration factory bean. Let's do a little customization:

class ShoeCustomizer implements JettyServerCustomizer {

    @Override
    void customize( Server server ){
        SelectChannelConnector myConn = server.getConnectors().find { Connector conn ->
            conn.port == 10101
        }

        myConn.maxIdleTime = 1000 * 60 * 60
        myConn.soLingerTime = -1

        server.setSendDateHeader(true)
    }
}

Basically just a tweak of the main connector and also telling the server to send an additional response header with the date value. This needs to be wired into the factory configuration, so that bean definition becomes:

@Bean EmbeddedServletContainerFactory embeddedServletContainerFactory(){
    def factory = new JettyEmbeddedServletContainerFactory( 10101 )
    factory.addServerCustomizers( new ShoeCustomizer() )
    return factory
}

Now when you start the server and hit the time controller you will see an additional header in the response:

Date:Fri, 05 Sep 2014 12:15:27 GMT

As you can see from this long discussion, the Spring-Boot embedded server API is quite useful all on its own. It's nice to see that Spring has exposed this functionality as part of its public API rather than hiding it under the covers somewhere.

The code I used for this article can be found in the main repository for this project, under the spring-shoe directory.

NodeTypes - Deeper Down the Rabbit Hole

23 August 2014 ~ blog, java, groovy

In my last post about Jackrabbit, "Wabbit Season with Jackrabbit", I fleshed out the old Jackrabbit tutorial and expanded it a bit to ingest some image file content. I touched on the subject of node types briefly, but did little with them. In this post, I am going to delve a bit deeper into using node types and creating your own.

In older versions of Jackrabbit, there was a text-based format for configuring your own node types. It is not well documented, and I was not at all sad to see that it is no longer used as of Jackrabbit 2.x. There may be another approach to loading node types, but I found the programmatic approach interesting.

For this post, you will want to refer to the code presented in the other post, "Wabbit Season with Jackrabbit" as a starting point (especially the last version of the code, which the code here will be based on).

For this example, we are going to expand the previous example to include image metadata in the stored node properties. I was originally under the impression that Jackrabbit would automatically extract the metadata on ingestion of the data, but it appears that this is only the case for text-based data when doing indexing. This is not a big roadblock, though, since Apache Tika is included with Jackrabbit, although a slightly older version than what I wanted to use. You can add the following to your build.gradle file to update the version:

compile 'org.apache.tika:tika-parsers:1.5'

Tika provides metadata extractors for a wide range of file formats, one of which is JPEG images, which is what we are playing with here.

First, we need to extract the metadata from the image file. I did this just after the main method's file reference statement:

def metadata = extractMetadata( file )

The code for the extractMetadata(File) method is as follows:

private static Map<String,String> extractMetadata( File imageFile ){
    def meta = new Metadata()
    def extractor = new ImageMetadataExtractor( meta )

    log.info 'Extracting metadata from {}', imageFile

    extractor.parseJpeg(imageFile)

    def props = [:]
    meta.names().sort().each { name->
        props[name] = meta.get(name)
        log.info " : <image-meta> $name : ${meta.get(name)}"
    }

    return props
}

It's just a simple, straightforward use of the Tika ImageMetadataExtractor, which pulls out all the data and stores it into a Map for use later.

Then, after we create the main file node, we want to apply the metadata properties to it:

applyMetadata( fileNode, metadata )

The applyMetadata(Node,Map) method applies the metadata from the map as properties on the node. The code is as shown below:

private static void applyMetadata( Node node, Map<String,String> metadata ){
    node.addMixin('pp:photo')
    node.setProperty('pp:photo-width', metadata['Image Width'].split(' ')[0] as long )

    log.info 'Applied mixin -> {} :: {}', node.mixinNodeTypes.collect { it.name }.join(', '), node.getProperty('pp:photo-width').string
}

For the metadata, I used the concept of "Mixin" node types. Every node has a primary node type, in this case it's an "nt:file" node, but nodes can have multiple mixin node types also applied to them so that they can have additional properties available. This works perfectly in my case, since I want a file that is a photo with extra metadata associated with it.

Also, the dumpProps(Node) method changed slightly to avoid errors during extraction, and to hide properties we don't care about seeing:

private static void dumpProps( Node node ){
    log.info 'Node ({}) of type ({}) with mixins ({})', node.name, node.getPrimaryNodeType().name, node.getMixinNodeTypes()

    def iter = node.properties
    while( iter.hasNext() ){
        def prop = iter.nextProperty()
        if( prop.type != PropertyType.BINARY ){
            if( prop.name != 'jcr:mixinTypes' ){
                log.info ' - {} : {}', prop.name, prop.value.string
            }
        } else {
            log.info ' - {} : <binary-data>', prop.name
        }
    }
}

If you run the code at this point, you will get an error about the node type not being defined, so we need to define the new node type. In the current version of Jackrabbit, they defer node type creation to the standard JCR 2.0 approach, which is pretty clean. The code is shown below:

private static void registerNodeTypes(Session session ) throws Exception {
    if( !session.namespacePrefixes.contains('pp') ){
        session.workspace.namespaceRegistry.registerNamespace('pp', 'http://stehno.com/pp')
    }

    NodeTypeManager manager = session.getWorkspace().getNodeTypeManager()

    if( !manager.hasNodeType('pp:photo') ){
        NodeTypeTemplate nodeTypeTemplate = manager.createNodeTypeTemplate()
        nodeTypeTemplate.name = 'pp:photo'
        nodeTypeTemplate.mixin = true

        PropertyDefinitionTemplate propTemplate = manager.createPropertyDefinitionTemplate()
        propTemplate.name = 'pp:photo-width'
        propTemplate.requiredType = PropertyType.LONG
        propTemplate.multiple = false
        propTemplate.mandatory = true

        nodeTypeTemplate.propertyDefinitionTemplates << propTemplate

        manager.registerNodeType( nodeTypeTemplate, false )
    }
}

This method is called just after logging in and getting a reference to a repository session. Basically, you use the NodeTypeManager to create a NodeTypeTemplate, which you can use to specify the configuration settings of your new node type. There is a similar construct for node type properties, the PropertyDefinitionTemplate. Once you have your configuration done, you register the node type and you are ready to go.

When run, this code generates output similar to:

2014-08-23 16:43:02 Rabbits [INFO] User (admin) logged into repository (Jackrabbit)
...
2014-08-23 16:43:02 Rabbits [INFO]  : <image-meta> Image Width : 2448 pixels
...
2014-08-23 16:43:02 Rabbits [INFO] Applied mixin -> pp:photo :: 2448
2014-08-23 16:43:02 Rabbits [INFO] Stored image file data into node (2014-08-19 20.49.40.jpg)...
2014-08-23 16:43:02 Rabbits [INFO] Node (2014-08-19 20.49.40.jpg) of type (nt:file) with mixins ([org.apache.jackrabbit.core.nodetype.NodeTypeImpl@5b3bb1f7])
2014-08-23 16:43:02 Rabbits [INFO]  - jcr:createdBy : admin
2014-08-23 16:43:02 Rabbits [INFO]  - pp:photo-width : 2448
2014-08-23 16:43:02 Rabbits [INFO]  - jcr:primaryType : nt:file
2014-08-23 16:43:02 Rabbits [INFO]  - jcr:created : 2014-08-23T16:43:02.531-05:00
2014-08-23 16:43:02 Rabbits [INFO] Node (jcr:content) of type (nt:resource) with mixins ([])
2014-08-23 16:43:02 Rabbits [INFO]  - jcr:lastModified : 2014-08-19T20:49:44.000-05:00
2014-08-23 16:43:02 Rabbits [INFO]  - jcr:data : <binary-data>
2014-08-23 16:43:02 Rabbits [INFO]  - jcr:lastModifiedBy : admin
2014-08-23 16:43:02 Rabbits [INFO]  - jcr:uuid : a699fbd6-4493-4dc7-9f7a-b87b84cb1ef9
2014-08-23 16:43:02 Rabbits [INFO]  - jcr:primaryType : nt:resource

(I omitted a bunch of the metadata output lines to clean up the output)

You can see that the new node type data is populated from the metadata and the mixin is properly applied.

Call me crazy, but this approach seems a lot cleaner than the old text-based approach. There are some rules around node types and ensuring that they are not created if they already exist, though this only seems to be a problem in certain use cases - need to investigate that a bit more, but be aware of it.

Now, you can stop here and create new node types all day long, but let's take this experiment a little farther down the rabbit hole. The programmatic approach to node type configuration seems to lend itself nicely to a Groovy-based DSL approach, something like:

private static void registerNodeTypes( Session session ) throws Exception {
    definitions( session.workspace ){
        namespace 'pp', 'http://stehno.com/pp'

        nodeType {
            name 'pp:photo'
            mixin true

            propertyDefinition {
                name 'pp:photo-width'
                requiredType PropertyType.LONG
                multiple false
                mandatory true
            }

            propertyDefinition {
                name 'pp:photo-height'
                requiredType PropertyType.LONG
                multiple false
                mandatory true
            }
        }
    }
}

Seems like a nice clean way to create new node types and their properties with little fuss and muss. So, using a little Groovy DSL closure delegation we can do this without too much pain:

class NodeTypeDefiner {

    private final NodeTypeManager manager
    private final Workspace workspace

    private NodeTypeDefiner( final Workspace workspace ){
        this.workspace = workspace
        this.manager = workspace.nodeTypeManager
    }

    void namespace( String name, String uri ){
        if( !workspace.namespaceRegistry.prefixes.contains(name) ){
            workspace.namespaceRegistry.registerNamespace(name, uri)
        }
    }

    static void definitions( final Workspace workspace, Closure closure ){
        NodeTypeDefiner definer = new NodeTypeDefiner( workspace )
        closure.delegate = definer
        closure.resolveStrategy = Closure.DELEGATE_ONLY
        closure()
    }

    void nodeType( Closure closure ){
        def nodeTypeTemplate = new DelegatingNodeTypeTemplate( manager )

        closure.delegate = nodeTypeTemplate
        closure.resolveStrategy = Closure.DELEGATE_ONLY
        closure()

        manager.registerNodeType( nodeTypeTemplate, true )
    }
}

The key pain point I found here was that with the nested closure structures, I needed to change the resolveStrategy so that you get the delegate only rather than the owner - took a little debugging to trace that one down.

The other useful point here was the "Delegating" extensions of the two "template" classes:

class DelegatingNodeTypeTemplate implements NodeTypeDefinition {

    @Delegate NodeTypeTemplate template
    private final NodeTypeManager manager

    DelegatingNodeTypeTemplate( final NodeTypeManager manager ){
        this.manager = manager
        this.template = manager.createNodeTypeTemplate()
    }

    void name( String name ){
        template.setName( name )
    }

    void mixin( boolean mix ){
        template.mixin = mix
    }

    void propertyDefinition( Closure closure ){
        def propertyTemplate = new DelegatingPropertyDefinitionTemplate( manager )
        closure.delegate = propertyTemplate
        closure.resolveStrategy = Closure.DELEGATE_ONLY
        closure()
        propertyDefinitionTemplates << propertyTemplate
    }
}

class DelegatingPropertyDefinitionTemplate implements PropertyDefinition {

    @Delegate PropertyDefinitionTemplate template
    private final NodeTypeManager manager

    DelegatingPropertyDefinitionTemplate( final NodeTypeManager manager ){
        this.manager = manager
        this.template = manager.createPropertyDefinitionTemplate()
    }

    void name( String name ){
        template.setName( name )
    }

    void requiredType( int propertyType ){
        template.setRequiredType( propertyType )
    }

    void multiple( boolean value ){
        template.multiple = value
    }

    void mandatory( boolean value ){
        template.mandatory = value
    }
}

They provide the helper methods to allow a nice clean DSL. Without them you have only setters, which did not work out cleanly. You just end up with some small delegate classes.

This code takes care of adding in the property definitions, registering namespaces and node types. It does not currently support all the configuration properties; however, that would be simple to add - there are not very many available.

As you can see from the DSL example code, you can now add new node types in a very simple manner. This kind of thing is why I love Groovy so much.

If there is any interest in this DSL code, I will be using it in one of my own projects, so I could extract it into a library for more public use - let me know if you are interested.

Wabbit Season with Jackrabbit

23 August 2014 ~ blog, java, groovy

I have been playing with Apache Jackrabbit today while doing some research for one of my personal projects, and while it seems to have matured a bit since the last time I looked into it, the documentation has stagnated. Granted, it still works better than nothing as a jump-start, but it really does not reflect the current state of the API. I present here a more modern take on the "First Hops" document based on what I did for my research - I am using Gradle, Groovy, and generally more modern versions of the libraries involved. Maybe this can help others, or myself at a later date.

Getting Started

The quickest and easiest way to get started is using an embedded TransientRepository. Create a project directory and create a build.gradle Gradle build file similar to the following:

apply plugin: 'groovy'

repositories {
    jcenter()
}

dependencies {
    compile 'org.codehaus.groovy:groovy-all:2.3.6'

    compile 'javax.jcr:jcr:2.0'
    compile 'org.apache.jackrabbit:jackrabbit-core:2.8.0'
    compile 'org.slf4j:slf4j-log4j12:1.7.7'
}

This will give you the required dependencies and a nice playground project to work with.

Logging in to Jackrabbit

In the src/main/groovy directory of the project, create a file called Rabbits.groovy with the following code:

import groovy.util.logging.Slf4j
import org.apache.jackrabbit.core.TransientRepository

import javax.jcr.Repository
import javax.jcr.Session

@Slf4j
class Rabbits {

    static void main(args) throws Exception {
        Repository repository = new TransientRepository(
            new File('./build/repository')
        )

        Session session = repository.login()
        try {
            String user = session.getUserID()
            String name = repository.getDescriptor(Repository.REP_NAME_DESC)

            log.info 'Logged in as {} to a {} repository.', user, name

        } finally {
            session.logout()
        }
    }
}

The important part here is the TransientRepository code, which allows you to use/reuse a repository for testing. I found that specifying a repository directory inside my build directory was useful, since by default it will put a bunch of files and directories in the root of your project when you run it - it's just a little cleaner when you can run gradle clean to wipe out your development repository when needed. The downside of specifying the directory seems to be that your repository is not completely transient. It was not clear to me whether this was always the case or just when I set the directory, hence the need to wipe it out sometimes.

The rest of the code is pretty clear; it just does a login to the repository and writes out some information. When run, you should get something like the following:


The finally block is used to always log out of the repository, though this seems a bit dubious because it seemed quite easy to lock the repository in a bad state when errors caused application failure - this will require some additional investigation. Lastly, to round out the first version of the project, create a log4j.properties file in src/main/resources so that your logger has some configuration. I used:

log4j.rootCategory=INFO, Cons

log4j.logger.com.something=ERROR

log4j.logger.org.apache.jackrabbit=WARN

log4j.appender.Cons = org.apache.log4j.ConsoleAppender
log4j.appender.Cons.layout = org.apache.log4j.PatternLayout
log4j.appender.Cons.layout.ConversionPattern = %d{yyyy-MM-dd HH:mm:ss} %c{1} [%p] %m%n

If you want to see more about what Jackrabbit is doing, set the logging level for log4j.logger.org.apache.jackrabbit to INFO - it gets a little verbose, so I turned it down to WARN.

Working with Content

When using a content repository, you probably want to do something with actual content, so let's start off with a simple case of some nodes with simple text content. The main method of the Rabbits class now becomes:

Repository repository = new TransientRepository(
    new File('./build/repository')
)

Session session = repository.login(
    new SimpleCredentials('admin','admin'.toCharArray())
)

try {
    String username = session.userID
    String name = repository.getDescriptor(Repository.REP_NAME_DESC)
    log.info 'User ({}) logged into repository ({})', username, name

    Node root = session.rootNode

    // Store content
    Node hello = root.addNode('hello')
    Node world = hello.addNode('world')
    world.setProperty('message', 'Hello, World!')
    session.save()

    // Retrieve content
    Node node = root.getNode('hello/world')
    log.info 'Found node ({}) with property: {}', node.path, node.getProperty('message').string

    // Remove content
    root.getNode('hello').remove()
    log.info 'Removed node.'

    session.save()

} finally {
    session.logout()
}

Notice that the login code now contains credentials, so that we can log in with a writable session rather than the read-only default session (from the previous example).

First, we need to store some content in the repository. Since Jackrabbit is a hierarchical data store, you need to get a reference to the root node, and then add a child node to it with some content:

Node root = session.rootNode

// Store content
Node hello = root.addNode('hello')
Node world = hello.addNode('world')
world.setProperty('message', 'Hello, World!')
session.save()

We create a node named "hello", then add a child named "world" to that node, and give the child node a "message" property. Notice that we save the session to persist the changes to the underlying data store.

Next, we want to read the data back out:

Node node = root.getNode('hello/world')
log.info 'Found node ({}) with property: {}', node.path, node.getProperty('message').string

You just get the node by its relative path, in this case from the root, and then retrieve its data.

Lastly, for this example, we want to remove the nodes we just added:

root.getNode('hello').remove()
session.save()
log.info 'Removed node.'

Removing the "hello" node removes it and it's children (i.e. the "world" node). We then save the session to commit the node removal.

When you run this version of the code, you should see something like this:

2014-08-23 15:45:18 Rabbits [INFO] User (admin) logged into repository (Jackrabbit)
2014-08-23 15:45:18 Rabbits [INFO] Found node (/hello/world) with property: Hello, World!
2014-08-23 15:45:18 Rabbits [INFO] Removed node.

Working with Binary Content

This is where my tour diverts from the original wiki document, which goes on to cover XML data imports. I was more interested in loading binary content, especially image files. To accomplish this, we need to consider how the data is stored in JCR. I found a very helpful article "Storing Files and Folders" from the ModeShape documentation (another JCR implementation) - since it's standard JCR, it is still relevant with Jackrabbit.

Basically you need a node for the file and its metadata, which has a child node for the actual file content. The article has some nice explanations and diagrams, so if you want more than code and a quick discussion, I recommend you head over there and take a look at it. For my purpose, I am just going to ingest a single image file and then read out the data to ensure that it was actually stored. The code for the try/finally block of our example becomes:

String username = session.userID
String name = repository.getDescriptor(Repository.REP_NAME_DESC)
log.info 'User ({}) logged into repository ({})', username, name

Node root = session.rootNode

// Assume that we have a file that exists and can be read ...
File file = IMAGE_FILE

// Determine the last-modified by value of the file (if important) ...
Calendar lastModified = Calendar.instance
lastModified.setTimeInMillis(file.lastModified())

// Create an 'nt:file' node at the supplied path ...
Node fileNode = root.addNode(file.name, 'nt:file')

// Upload the file to that node ...
Node contentNode = fileNode.addNode('jcr:content', 'nt:resource')
Binary binary = session.valueFactory.createBinary(file.newInputStream())
contentNode.setProperty('jcr:data', binary)
contentNode.setProperty('jcr:lastModified',lastModified)

// Save the session (and auto-created the properties) ...
session.save()

log.info 'Stored image file data into node ({})...', file.name

// now get the image node data back out

def node = root.getNode(file.name)
dumpProps node

dumpProps node.getNode('jcr:content')

Where IMAGE_FILE is a File object pointing to a JPEG image file.

The first thing we do is create the file node:

Node fileNode = root.addNode(file.name, 'nt:file')

Notice that it's of type nt:file, to designate that it is a file node - you will want to brush up on NodeTypes in the Jackrabbit or JCR documentation if you don't already have a basic understanding; I won't do much more than use them in these examples. For the name of the node, we just use the file name.

Second, we create the file content node as a child of the file node:

Node contentNode = fileNode.addNode('jcr:content', 'nt:resource')
Binary binary = session.valueFactory.createBinary(file.newInputStream())
contentNode.setProperty('jcr:data', binary)
contentNode.setProperty('jcr:lastModified',lastModified)

// Save the session (and auto-created the properties) ...
session.save()

Notice that the child node is named "jcr:content" and is of type "nt:resource" and that it has a property named "jcr:data" containing the binary data content for the file. Of course, the session is saved to persist the changes.

Once we have the file data stored, we want to pull it back out to see that we stored everything as intended:

def node = root.getNode(file.name)
dumpProps node

dumpProps node.getNode('jcr:content')

The dumpProps method just iterates the properties of a given node and writes them to the log file:

private static void dumpProps( Node node ){
    log.info 'Node: ({})', node.name

    def iter = node.properties
    while( iter.hasNext() ){
        def prop = iter.nextProperty()
        if( prop.type != PropertyType.BINARY ){
            log.info ' - {} : {}', prop.name, prop.value.string
        } else {
            log.info ' - {} : <binary-data>', prop.name
        }
    }
}

When you run this version of the code, you will have output similar to:

2014-08-23 16:09:18 Rabbits [INFO] User (admin) logged into repository (Jackrabbit)
2014-08-23 16:09:18 Rabbits [INFO] Stored image file data into node (2014-08-19 20.49.40.jpg)...
2014-08-23 16:09:18 Rabbits [INFO] Node: (2014-08-19 20.49.40.jpg)
2014-08-23 16:09:18 Rabbits [INFO]  - jcr:createdBy : admin
2014-08-23 16:09:18 Rabbits [INFO]  - jcr:created : 2014-08-23T15:59:26.155-05:00
2014-08-23 16:09:18 Rabbits [INFO]  - jcr:primaryType : nt:file
2014-08-23 16:09:18 Rabbits [INFO] Node: (jcr:content)
2014-08-23 16:09:18 Rabbits [INFO]  - jcr:lastModified : 2014-08-19T20:49:44.000-05:00
2014-08-23 16:09:18 Rabbits [INFO]  - jcr:data : <binary-data>
2014-08-23 16:09:18 Rabbits [INFO]  - jcr:lastModifiedBy : admin
2014-08-23 16:09:18 Rabbits [INFO]  - jcr:uuid : cbdefd4a-ec2f-42d2-b58a-a39942766723
2014-08-23 16:09:18 Rabbits [INFO]  - jcr:primaryType : nt:resource

Conclusion

Jackrabbit seems to still have some development effort behind it, and it's still a lot easier to set up and use when compared with something like ModeShape, which seems to be the only other viable JCR implementation that is not specifically geared to a target use case.

The documentation is lacking, but with some previous experience and a little experimentation, it was not too painful getting things to work.

Simple Configuration DSL using Groovy

19 July 2014 ~ blog, java, groovy

Recently at work we were talking about being able to process large configuration files from legacy applications, where the config file had a fairly simple text-based format. One of my co-workers mentioned that you could probably just run the configuration file like a Groovy script, handle the methodMissing() calls, and use them to populate a configuration object. This sounded like an interesting little task to play with, so I threw together a basic implementation - and it's actually easier than I thought.

To start out with, we need a configuration holder class, which we'll just call Configuration:

class Configuration {
    String hostName
    String protocol
    int port
    Headers headers
}

Say we are collecting configuration information for some sort of HTTP request utility - it's a contrived example, but it shows the concept nicely. The Headers class is a simple delegated builder in its own right, and looks like:

@ToString(includeNames=true)
class Headers {
    Map<String,Object> values = [:]

    static Headers headers( Closure closure ){
        Headers h = new Headers()
        closure.delegate = h
        closure()
        return h
    }
    
    void header( String name, value ){
        values[name] = value
    }
}

I won't explain much about the Headers class, other than that it takes a closure and delegates the closure's method calls onto a Headers instance in order to populate it. For our purposes it is just a nice, simple way to show closure usage in the example.
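As a quick illustration of how the builder is used on its own (a sketch, not from the original posting):

def h = Headers.headers {
    header 'Accept', 'text/html'
    header 'Cache-control', 'no-cache'
}

assert h.values['Accept'] == 'text/html'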

Now, we need a configuration file to load. It's just a simple text file:

hostname 'localhost'
protocol = 'https'
port 2468

headers {
    header 'Content-type','text/html'
    header 'Content-length',10101
}

The script-based configuration is similar to the delegated builder, in that the method calls of the "script" (the text configuration file) are delegated to an instance of the Configuration class. For that to work, we could override the methodMissing() method and handle each desired operation, or, since we have a good idea of the configuration structure in our case, we could simply add the missing methods, as follows:

@ToString(includeNames=true)
class Configuration {

    String hostName
    String protocol
    int port
    Headers headers

    void hostname( final String name ){
        this.hostName = name
    }

    void port( final int port ){
        this.port = port
    }

    void headers( final Closure closure ){
        this.headers = Headers.headers( closure )
    }
}

Basically, these are just setters in our case; however, since they are ordinary method calls, you could do whatever conversion or validation you need. Also, notice that the protocol property in the configuration file is set directly with an equals sign (=) rather than through a method call - this is also valid, though personally I like the way the file reads without all the equals signs.
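As an aside, the methodMissing() approach mentioned earlier would look something like the following rough sketch, which simply collects arbitrary keys into a map rather than typed properties (the class and field names are hypothetical):

class DynamicConfiguration {
    Map<String,Object> values = [:]

    // catches any unknown method call, e.g. "port 2468", and stores it as a key/value pair
    def methodMissing( String name, args ){
        values[name] = args ? args[0] : null
    }
}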

The final part needed to make this work is the Groovy magic. We need to load the text as a script in a GroovyShell, parse it, and run it. The whole code for the Configuration object is shown below:

import groovy.transform.ToString
import org.codehaus.groovy.control.CompilerConfiguration

@ToString(includeNames=true)
class Configuration {

    String hostName
    String protocol
    int port
    Headers headers

    void hostname( final String name ){
        this.hostName = name
    }

    void port( final int port ){
        this.port = port
    }

    void headers( final Closure closure ){
        this.headers = Headers.headers( closure )
    }

    static Configuration configure( final File file ){
        // parse the config file as a script whose base class delegates its calls
        def script = new GroovyShell(
            new CompilerConfiguration(
                scriptBaseClass: DelegatingScript.class.name
            )
        ).parse(file)

        // delegate the script's method calls and property assignments to the configuration instance
        def configuration = new Configuration()
        script.setDelegate( configuration )
        script.run()

        return configuration
    }
}

The important parts are the use of the DelegatingScript as the scriptBaseClass and then setting the Configuration instance as the delegate for the script. Now if you run the following:

def conf = Configuration.configure( new File('conf.txt') )
println conf

You get something like the following output:

Configuration(protocol:https, hostName:localhost, port:2468, headers:Headers(values:[Content-type:text/html, Content-length:10101]))

Notice that in the example we didn't define a method for protocol, which means that the only way to set it in the configuration is as a property; however, the property format could also be used for the other fields, such as port, since a setter method is available alongside the helper method (options are nice).
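For example, an all-property-style version of the configuration file should also work with this setup - an untested sketch; note that the property form uses the actual field name hostName rather than the hostname helper method:

hostName = 'localhost'
protocol = 'https'
port = 2468

headers {
    header 'Content-type','text/html'
    header 'Content-length',10101
}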

Groovy makes simple DSLs, well... simple.

Going Native with Gradle

16 March 2014 ~ blog, java, groovy, gradle

With my recent foray into Java game programming, I found Gradle's support for managing the native sub-dependencies of jar files to be a bit lacking. I did find a few blog posts about general ways of adding it to your build; however, I did not find any specific plugin or built-in support. Since I am planning to do a handful of simple games as a game programming tutorial, it made sense to pull my native library handling functionality out into a Gradle plugin... and thus the Gradle Natives Plugin was born.

First, we need a project to play with. I found a simple LWJGL Hello World application that works nicely for our starting point. So, create the standard Gradle project structure with the following files:

// hello/src/main/java/hello/HelloWorld.java
package hello;

import org.lwjgl.LWJGLException;
import org.lwjgl.opengl.Display;

public class HelloWorld {
    public static void main(String[] args){
        try {
            Display.setTitle("Hello World");
            Display.create();

            while(!Display.isCloseRequested()){
                Thread.sleep(100);
            }

        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            Display.destroy();
        }
    }
}

with a standard Gradle build file as a starting point:

// hello/build.gradle

apply plugin:'java'

repositories {
	jcenter()
}

dependencies {
	compile 'org.lwjgl.lwjgl:lwjgl:2.9.1'
}

At this point, the project will build, but will not run without jumping through some extra hoops. Let's do some of that hoop-jumping in Gradle with the application plugin. Add the following to the build.gradle file:

apply plugin:'application'

mainClassName = 'hello.HelloWorld'

This adds the run task to the build, which will run the HelloWorld main class; however, this still won't work since it does not know how to deal with the LWJGL native libraries. That's where the natives plugin comes in. At this time there is no official release of the plugin on Bintray (coming soon), so you will need to clone the repo, build the plugin, and then install it into your local maven repo:

git clone git@github.com:cjstehno/gradle-natives.git

cd gradle-natives

gradle build install

Once that is done, you will need to add the natives plugin to your build:

buildscript {
    repositories {
        mavenLocal()
    }

    dependencies {
        classpath 'gradle-natives:gradle-natives:0.1'
    }
}

apply plugin:'natives'

Then you will need to apply the custom configuration for your specific native libraries by adding an entry in the jars list for each dependency jar containing native libraries. These are the jars that will be searched on the classpath for native libraries by platform.

natives {
	jars = [
		'lwjgl-platform-2.9.1-natives-windows', 
		'lwjgl-platform-2.9.1-natives-osx', 
		'lwjgl-platform-2.9.1-natives-linux'
	]
}

This will allow the associated native libraries to be unpacked into the build directory with:

gradle unpackNatives

This will copy the libraries into a directory for each platform under build/natives/PLATFORM. Then we need one more step to allow the application to be run: the java.library.path system property needs to be set before the run task executes:

run {
    systemProperty 'java.library.path', file( 'build/natives/windows' )
}
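As written, this hard-codes the Windows natives directory; a quick sketch (not part of the plugin) of selecting the directory based on the current operating system might look like:

// pick the natives sub-directory for the OS running the build
def osName = System.getProperty('os.name').toLowerCase()
def platform = osName.contains('windows') ? 'windows' : (osName.contains('mac') ? 'osx' : 'linux')

run {
    systemProperty 'java.library.path', file("build/natives/${platform}")
}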

Then you can run the application using:

gradle run

Granted, there are still issues to be resolved with the plugin. Currently, it is a little picky about when it is run. If you have tests that use the native libraries, you will need to build without tests and then run the tests separately:

gradle clean build unpackNatives -x test

gradle test

Lastly, you can also specify the platforms whose library files are to be copied over using the platforms configuration property, for example:

natives {
	jars = [
		'lwjgl-platform-2.9.1-natives-windows', 
		'lwjgl-platform-2.9.1-natives-osx', 
		'lwjgl-platform-2.9.1-natives-linux'
	]
	platforms = 'windows'
}

This will copy only the windows libraries into the build.

Feel free to create an issue for any bugs you find or features you would like to see. Also, I am open to bug fixes and pull requests from others.

Mapping Large Data Sets

09 June 2013 ~ blog, java, javascript

Recently, I was tasked with resolving some performance issues related to displaying a large set of geo-location data on a map. Basically, the existing implementation took the simple approach of fetching all the location data from the server and rendering it on the map. While there is nothing inherently wrong with this approach, it does not scale well as the number of data points increases, which was the problem at hand.

The map needed to be able to render equally well whether there were 100 data points or a million. With this direct approach, the browser started to bog down at just over a thousand points, and failed completely at 100-thousand. A million was out of the question. So, what can be done?

I have created a small demo application to help present the concepts and techniques I used in solving this problem. I intend to focus mostly on the concepts and keep the discussion of the code to a minimum, so this will not really be much of an OpenLayers tutorial unless you are faced with a similar task. See the sidebar for more information about how to set up and run the application - it's only necessary if you want to run the demo yourself.

The demo application is available on GitHub and its README file contains the information you need to build and run it.

First, let's look at the problem itself. If you fire up the demo "V1" with a data set of 10k or less, you will see something like the following:

V1 View

You can see that even with only ten thousand data points the map is visually cluttered and a bit sluggish to navigate. If you build a larger data set of 100k - or better yet, a million - data points and try to run the demo, at best it will take a long time, and most likely it will crash your browser. This approach is just not practical for this volume of data.

The code for this version simply makes an ajax request to the data service to retrieve all the data points:

$.ajax('poi/v1/fetch', { contentType:'application/json' }).done(function(data){
	updateMarkers(map, data);
});

and then renders the markers for each data point on the map:

function updateMarkers( map, data ){
	var layer = map.getLayersByName('Data')[0];

	var markers = $.map(data, function(item){
		return new OpenLayers.Feature.Vector(
			new OpenLayers.Geometry.Point(
				item.longitude, item.latitude
			).transform(PROJECTION_EXTERNAL, PROJECTION_INTERNAL),
			{ 
				item:item 
			},
			OpenLayers.Util.applyDefaults(
				{ fillColor:'#0000ff' }, 
				OpenLayers.Feature.Vector.style['default']
			)
		);
	});

	layer.addFeatures(markers);
}

What we really need to do is reduce the amount of data being processed without losing any visual information. The key is to consider the scope of your view. Other than at the lowest zoom levels (whole-Earth view), you are only viewing a relatively limited part of the whole map, which means that only a subset of the data is visible at any given time. So why fetch it all from the server when it just adds unnecessary load on the JavaScript mapping library?

The answer is that you don't have to. If you listen to map view change events and fetch the data for only your current view by passing the view bounding box to your query, you can limit the data down to only what you currently see. The "V2" demo uses this approach to limit the volume of data.

eventListeners:{
	moveend:function(){
		var bounds = map.getExtent().transform(PROJECTION_INTERNAL, PROJECTION_EXTERNAL).toString();

		$.ajax('poi/v2/fetch/' + bounds, { contentType:'application/json' }).done(function(data){
			updateMarkers(map, data);
		});
	}
}

The updateMarkers() function remains unchanged in this version.

Visually, this version of the application is the same; however, it will handle larger data sets with less pain. This approach increases the number of requests for data but will reduce the amount of data retrieved as the user zooms into their target area of interest.

This approach is still a bit flawed, though: it works fine when the user is zoomed in on a state or small country; however, the whole large data set can still come back when the view is at the lower zoom levels (whole Earth). There is still more work to be done.

In order to reduce the number of data points at the lower zoom levels, we need to consider how useful all this data really is. Looking at the image from V1, which is still valid for V2, is there any use in rendering all of those data points? This is just randomly distributed data, but even real data would probably be as dense or denser around population centers, which would only compound the problem. How can you clean up this display mess while also reducing the amount of data being sent - oh, and without any loss of useful information?

The first part of the answer is clustering (see Cluster Analysis). We needed to group the data together in a meaningful way such that we present one representative point for each nearby group of points, otherwise known as a cluster. After some research and peer discussion, it was decided that the K-Means Clustering Algorithm was the right approach for our needs, and the Apache Commons Math library provided a stable and generic implementation that would work well for our requirements. It is also what I have used here for this demo.

The clustering provides a means of generating a fixed-size data set in which each point represents a group of nearby locations around a common center point. With this, you can limit your clustered data set to something like 200 points, which can easily be displayed on the map and still provides an accurate representation of the location data.
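As a rough illustration of the server-side clustering step - a sketch assuming the commons-math3 ml.clustering API and a hypothetical list of location objects, not the demo's actual code:

import org.apache.commons.math3.ml.clustering.CentroidCluster
import org.apache.commons.math3.ml.clustering.DoublePoint
import org.apache.commons.math3.ml.clustering.KMeansPlusPlusClusterer

// "locations" is a hypothetical collection of objects with longitude/latitude properties
List<DoublePoint> points = locations.collect {
    new DoublePoint([it.longitude, it.latitude] as double[])
}

// reduce the full data set down to (at most) 200 representative cluster centers
def clusterer = new KMeansPlusPlusClusterer<DoublePoint>(200)
List<CentroidCluster<DoublePoint>> clusters = clusterer.cluster(points)

// each center is a double[] of [longitude, latitude] ready to be sent to the map
def centers = clusters.collect { it.center.point }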

Notice, though, I said that clustering was the first part of the answer... what is the second? Consider the effect of clustering on your data set as you zoom in from whole Earth view down to city street level. Clustering combined with view-bounds limiting will cause your overall data set to change. When the data points used in the cluster calculation change, the results change, which causes the location points to jump. I called this "jitter". Even just panning the map at a constant zoom level would cause map markers to move around like they were doing some sort of annoying square dance. To overcome the jittery cluster markers, you need to keep the data set used in the cluster calculation constant.

A hybrid approach is required. Basically, add the zoom level to the fetch request.

eventListeners:{
	moveend:function(){
		var bounds = map.getExtent().transform(PROJECTION_INTERNAL, PROJECTION_EXTERNAL).toString();
		var zoom = map.getZoom();

		$.ajax('poi/v3/fetch/' + bounds + '/' + zoom, { contentType:'application/json' }).done(function(data){
			updateMarkers(map, data);
		});
	}
}

At the lower zoom levels, up to a configured threshold, you calculate the clusters across the whole data set (not bound by view) and cache this cluster data so that the calculation will only be done on the first call. Since zoom is not a function of this calculation, there can be one cached data set for all of the zoom levels below the specified threshold. Then, when the user zooms into the higher zoom levels (over the threshold), the actual data points (filtered by the view bounds) are returned by the fetch.
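Server-side, the hybrid fetch boils down to a branch on the zoom level - something like the following sketch (the collaborator and threshold names are hypothetical, not the demo's actual code):

class PoiFetchService {

    // hypothetical collaborators and settings
    def clusterService
    def pointService
    int clusterZoomThreshold = 8
    int maxClusters = 200

    private List cachedClusters

    List fetch( String bounds, int zoom ){
        if( zoom <= clusterZoomThreshold ){
            // at or below the threshold: cluster the whole data set once and cache the result
            if( cachedClusters == null ){
                cachedClusters = clusterService.clusterAll( maxClusters )
            }
            return cachedClusters

        } else {
            // above the threshold: return the actual points, filtered by the view bounds
            return pointService.findWithinBounds( bounds )
        }
    }
}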

If you look at demo V3, you can see this in action, for 10-thousand points:

V3 10k View

And if you run the demo with a one-million-point data set, you will see the same view. The initial load will take a bit longer, but once loaded, it should perform nicely. What you may notice, though, is that once you cross the clustering threshold you may suddenly get a large data set again... not overly so, but just more than you might expect. This is an area you would want to tune to your specific needs so that this transition happens where it gives the best perceived results.

You could stop here and be done with it, but depending on how your data is distributed you could still run into some overly-dense visual areas. Consider the case where you generate a million data points, but only in the Western Hemisphere.

If you build a one-million-point data set for only the Americas, you can see that there are still some overly dense areas even with the clustering. Since I am using OpenLayers as the mapping API, I can use its client-side clustering mechanism to help resolve this. With client-side clustering enabled, the mapping API will group markers together by distance to help de-clutter the view. If you look at V3 again, you can see the cluster clutter problem:

V3 West

You can see that there are still some areas of high marker density. The client-side clustering strategy in OpenLayers can help relieve the clutter a bit:

new OpenLayers.Layer.Vector('Data',{
	style: OpenLayers.Util.applyDefaults(
		{
			fillColor:'#00ff00'
		},
		OpenLayers.Feature.Vector.style['default']
	),
	strategies:[
		new OpenLayers.Strategy.Cluster({
			distance:50,
			threshold:3
		})
	]
})

as can be seen in V4:

V4 West

But, it is more apparent when you zoom in:

V4 West Zoom

You can see now that the green markers are client-side clusters and the blue markers are server-side points (clusters or single locations).

At the end of all that, you have a map with client-side clustering to handle visual density at the local level, server-side clustering at the more global zoom levels - with caching to remove jitter and reduce calculation time - and actual location points, filtered by bounds, served at the higher zoom levels. It seems like a lot of effort, but overall the code itself is fairly simple and straightforward... and now we can support a million data points with no real issues or loss of information.

One thing I have not mentioned here is the use of GIS databases or extensions. My goal here was more conceptual, but should you be faced with this kind of problem, you should look into the GIS support available for your data storage solution, since being able to run the bounding-shape queries directly in the database can be much more efficient than filtering in application code.


Older posts are available in the archive.