Custom Domain for GitHub Pages

15 February 2015 ~ blog

I have been working for a while now to get my blog fully cut over to being generated by JBake and hosted on GitHub; it's not all that difficult, just a format conversion and some domain fiddling, but I was procrastinating.

Pointing your GitHub Pages at a custom domain is not all that hard to do, and they provide decent documentation about how to do it; however, some streamlining is nice for DNS novices like myself. I may have done things a bit out of order, but it worked in the end...

First, I created A records for the GitHub-provided IP addresses. I use GoDaddy for my domain names, so your experience may be a bit different; but in the GoDaddy DNS Zone File editor you end up adding something like:

A Record
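
For reference, these are plain A records on the apex host (@) pointing at the GitHub Pages addresses. The addresses below are the ones GitHub documented around the time of writing, so check the current GitHub Pages documentation rather than copying them blindly:

@    A    192.30.252.153
@    A    192.30.252.154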

Next, I added a CNAME record aliasing www to my GitHub account hostname, which ended up looking like this:

CName Record
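
In other words, the record maps the www host to your GitHub Pages hostname, which is your username followed by github.io - in my case that would be:

www    CNAME    cjstehno.github.io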

Lastly, you need to make changes in your repository - this step seems to be missed by a lot of people. The gist of it is that you add a new file to your gh-pages branch named CNAME (all caps, no extension), and in that file you add your domain name (without the http://www. prefix). Save the file and be sure you push it to your remote repository.
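
If you prefer the command line, the whole repository step looks roughly like this - assuming a hypothetical domain of example.com and that your site lives on the gh-pages branch:

git checkout gh-pages
echo "example.com" > CNAME
git add CNAME
git commit -m "Add custom domain"
git push origin gh-pages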

At this point it worked for me, but the documentation said it could take up to 48 hours to propagate the changes.

Gradle and CodeNarc

07 November 2014 ~ blog, java, testing, gradle, groovy

The subject of "code quality tools" has lead to many developer holy wars over the years, so I'm not really going to touch the subject of their value or level of importance here, suffice to say that they are tools in your toolbox for helping to maintain a base level of "tedious quality", meaning style rules and general coding conventions enforced by your organization - it should never take the ultimate decision making from the developers.

That being said, let's talk about CodeNarc. CodeNarc is a rule-based code quality analysis tool for Groovy-based projects. Groovy does not always play nice with other code analysis tools, so it's nice that there is one specially designed for it and Gradle provides access to it out of the box.

Using the Gradle CodeNarc plugin is easy: apply the plugin to your build

apply plugin: 'codenarc'

and then do a bit of rule configuration based on the needs of your code base.

codenarcMain {
    ignoreFailures false
    configFile file('config/codenarc/codenarc-main.rules')

    maxPriority1Violations 0
    maxPriority2Violations 10
    maxPriority3Violations 20
}

codenarcTest {
    ignoreFailures true
    configFile file('config/codenarc/codenarc-test.rules')

    maxPriority1Violations 0
    maxPriority2Violations 10
    maxPriority3Violations 20
}

The plugin allows you to have different configurations for your main code and your test code, and I recommend using that functionality since generally you may care about slightly different things in your production code versus your test code. Also, there are JUnit-specific rules that you can ignore in your production code scan.

Notice that in my example, I have ignored failures in the test code. This is handy when you are doing a lot of active development and don't really want to fail your build every time your test code quality drops slightly. You can also set the thresholds for allowed violations of the three priority levels - when the counts exceed one of the given thresholds, the build will fail, unless it's ignored. You will always get a report for both main and test code in your build reports directory, even if there are no violations. The threshold numbers are something you will need to determine based on your code base, your team and your needs.

The .rules files are really Groovy DSL files, but the extension is unimportant so I like to keep them out of the Groovy namespace. The CodeNarc web site has a sample "kitchen sink" rule set to get things started - though it has a few rules that cause errors, you can comment those out or remove them from the file. Basically the file is a list of all the active rules, so removing one disables it. You can also configure some of them. LineLength is one I like to change:

LineLength { length = 150 }

This will keep the rule active, but will allow line lengths of 150 rather than the default 120 characters. You will need to check the JavaDocs for configurable rule properties; for the most part, they seem to be on or off.
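
For reference, a trimmed-down config/codenarc/codenarc-main.rules file might look something like the sketch below - the rule names here are just a few examples from the standard CodeNarc rule sets, not a recommended selection:

ruleset {
    description 'Sample main-code rule set'

    CatchException
    EmptyCatchBlock
    UnusedImport
    UnusedVariable

    LineLength { length = 150 }
}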

Running the analysis is simple: the check task may be run by itself, or it will be run along with the build task.

gradle check

The reports (main and test) will be available in the build/reports/codenarc directory as two HTML files. They are not the prettiest reports, but they are functional.

If you are starting to use CodeNarc on an existing project, you may want to take a phased approach to applying and customizing rules so that you are not instantly bogged down with rule violations. Do a few passes with the trimmed-down rule set, fix what you can fix quickly, configure or disable the others, and set your thresholds to a sensible level; then make a goal to drop the numbers with each sprint or release so that progress is made.

Hello Again Slick2D

11 October 2014 ~ blog, java, groovy

I am finally getting back around to working on my little game programming project and I realized that somewhere along the
way, my project stopped working. I am using the Slick2D library, which seems to have little
in the way of formal release or distribution so it didn't surprise me. I think I had something hacked together making it
work last time. I decided to try and put some more concrete and repeatable steps around basic setup, at least for how I use it - I'm no
game programmer.

I'm using Groovy as my development language and Gradle for building. In the interest of time and clarity, I am going to use a
dump-and-describe approach here; there are only two files, so it should not be a big deal.

The build.gradle file is as follows:

group = 'com.stehno.demo'
version = '0.1'

buildscript {
    repositories {
        jcenter()

        maven {
            url 'http://dl.bintray.com/cjstehno/public/'
        }
    }

    dependencies {
        classpath 'com.stehno:gradle-natives:0.2'
    }
}

apply plugin:'groovy'
apply plugin:'application'
apply plugin:'com.stehno.natives'

compileJava {
    sourceCompatibility = 1.8
    targetCompatibility = 1.8
}

mainClassName = 'helloslick.HelloSlick'

repositories {
    jcenter()
}

dependencies {
    compile 'org.codehaus.groovy:groovy-all:2.3.6'

    compile 'org.slick2d:slick2d-core:1.0.1'
}

test {
    systemProperty 'java.library.path', file('build/natives/windows')
}

run {
    systemProperty 'java.library.path', file('build/natives/windows')
}

natives {
    jars = [
        'lwjgl-platform-2.9.1-natives-windows.jar',
        'jinput-platform-2.0.5-natives-windows.jar'
    ]
    platforms = 'windows'
}

task wrapper(type: Wrapper) {
    gradleVersion = '2.1'
}

The first point of note is that I am using my Gradle Natives plugin - not as self-promotion, but because this project is
the reason I wrote it. This plugin takes care of extracting all the little native libraries and putting them in your build
so that they are easily accessible by your code. The configuration is found near the bottom of the file, in the natives
block - we want to extract the native libraries from the lwjgl and jinput libraries for this project and, in my case, I
only care about the Windows versions (leave off platforms to get all platforms).

There was one interesting development during my time away from this project: a third-party jar version of Slick2D has been pushed to Maven Central, which makes things a lot easier - I think I previously had to build it myself and fiddle with pushing it to my local maven repo or something. Now it's just another remote library (hopefully it works as expected - I have not played with it yet).

The last point of interest here is the use of the application plugin. This plugin provides an easy way to run your game
while specifying the java.library.path, which is the painful part of running applications with native libraries. With the
application plugin and the run configuration in place, you can run the game from Gradle - admittedly not ideal, but this
is just development; I actually have a configuration set up for the IzPack installer that I will write about later.

Now, we need some code to run, and the Slick2D wiki provides a simple Hello world sample that I have tweaked a bit for my
use - mostly just cosmetic changes:

package helloslick

import groovy.util.logging.Log
import org.newdawn.slick.*

import java.util.logging.Level

@Log
class HelloSlick extends BasicGame {

    HelloSlick(String gamename){
        super(gamename)
    }

    @Override
    public void init(GameContainer gc) throws SlickException {}

    @Override
    public void update(GameContainer gc, int i) throws SlickException {}

    @Override
    public void render(GameContainer gc, Graphics g) throws SlickException {
        g.drawString 'Hello Slick!', 50, 50
    }

    public static void main(String[] args){
        try {
            AppGameContainer appgc = new AppGameContainer(new HelloSlick('Simple Slick Game'))
            appgc.setDisplayMode(640, 480, false)
            appgc.start()

        } catch (SlickException ex) {
            log.log(Level.SEVERE, null, ex)
        }
    }
}

This just opens a game window and writes "Hello Slick!" in it, but if you have that working, you should be ready for playtime
with Slick2D.

Once you have the project set up (build.gradle in the root, and HelloSlick.groovy in /src/main/groovy/helloslick), you
are ready to go. Run the following to launch the project.

gradle unpackNatives run

And if all is well, you will see the game window and message.

Like I said, this is mostly just for getting my development environment up and running as a sanity check, but maybe it is useful to others.

Yes, the explicit unpackNatives calls are annoying; it's something I am working on.
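
One thing I may try (an untested sketch, so no guarantees it plays nicely with the plugin's current task wiring) is simply making the run task depend on the extraction:

run.dependsOn 'unpackNatives'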

Spring Boot Embedded Server API

15 September 2014 ~ blog, spring, groovy, java, gradle

I have been investigating Spring-Boot for both work and personal projects, and while it seems very all-encompassing and useful, I found that its "opinionated" approach to development was a bit too aggressive for the project conversion I was doing at work. However, I did come to the realization that you don't have to use Spring-Boot as your project's core - you can use it and most of its features in your own project, just like any other Java library.

The project I was working on had a customized embedded Jetty solution with a lot of tightly-coupled, Jetty-specific configuration code, with configuration being pulled from a Spring application context. I did a little digging around in the Spring-Boot documentation and found that their API provides direct access to the embedded server abstraction used by a Boot project. On top of that, it's actually a very sane and friendly API to use. During my exploration and experimentation I was able to build up a simple demo application, which seemed like good fodder for a blog post - we're not going to solve any problems here, just a little playtime with the Spring-Boot embedded server API.

To start off, we need a project to work with; I called mine "spring-shoe" (not big enough for the whole boot, right?). I used Java 8, Groovy 2.3.2 and Gradle 2.0, but slightly older versions should also work fine - the build file looks like:

apply plugin: 'groovy'

compileJava {
    sourceCompatibility = 1.8
    targetCompatibility = 1.8
}

compileGroovy {
    groovyOptions.optimizationOptions.indy = false
}

repositories {
    jcenter()
}

dependencies {
    compile 'org.codehaus.groovy:groovy-all:2.3.2'

    compile 'javax.servlet:javax.servlet-api:3.0.1'
    compile 'org.eclipse.jetty:jetty-webapp:8.1.15.v20140411'

    compile 'org.springframework.boot:spring-boot:1.1.5.RELEASE'
    compile 'org.springframework:spring-web:4.0.6.RELEASE'
    compile 'org.springframework:spring-webmvc:4.0.6.RELEASE'
}

Notice that I am using the spring-boot library, not the Gradle plugin or "starter" dependencies - this also means that you have to bring in other libraries yourself (e.g. the web and webmvc libraries above).

Next, we need an application starter, which just instantiates a specialized Application context, the AnnotationConfigEmbeddedWebApplicationContext:

package shoe

import org.springframework.boot.context.embedded.AnnotationConfigEmbeddedWebApplicationContext
import org.springframework.boot.context.embedded.EmbeddedWebApplicationContext

class Shoe {
    static void main( args ){
        EmbeddedWebApplicationContext context = new AnnotationConfigEmbeddedWebApplicationContext('shoe.config')
        println "Started context on ${new Date(context.startupDate)}"
    }
}

The shoe.config package is where my configuration class lives - it will be auto-scanned. When this class's main method is run, it instantiates the context and just prints out the context start date. Internally, this context will search for the embedded server configuration beans, as well as any servlets and filters to be loaded on the server - but I am jumping ahead; we need a configuration class:

package shoe.config

import org.springframework.boot.context.embedded.EmbeddedServletContainerFactory
import org.springframework.boot.context.embedded.jetty.JettyEmbeddedServletContainerFactory
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration
import org.springframework.web.servlet.config.annotation.EnableWebMvc

@Configuration
@EnableWebMvc
class ShoeConfig {

    @Bean EmbeddedServletContainerFactory embeddedServletContainerFactory(){
        new JettyEmbeddedServletContainerFactory( 10101 )
    }
}

As you can see, it's just a simple Java-based configuration class. The EmbeddedServletContainerFactory class is the crucial part here. The context loader searches for a configured bean of that type and then loads it to create the embedded servlet container - a Jetty container in this case, running on port 10101.

Now, if you run Shoe.main() you will see some logging similar to what is shown below:

...
INFO: Jetty started on port: 10101
Started context on Thu Sep 04 18:59:24 CDT 2014

You have a running server, though it's pretty boring since you have nothing useful configured. Let's make it say hello using a simple servlet named HelloServlet:

package shoe.servlet

import javax.servlet.ServletException
import javax.servlet.http.HttpServlet
import javax.servlet.http.HttpServletRequest
import javax.servlet.http.HttpServletResponse

class HelloServlet extends HttpServlet {

    @Override
    protected void doGet( HttpServletRequest req, HttpServletResponse resp ) throws ServletException, IOException{
        resp.writer.withPrintWriter { w->
            w.println "Hello, ${req.getParameter('name')}"
        }
    }
}

It's just a simple HttpServlet extension that says "hello" with the input value from the "name" parameter. Nothing really special here. We could have just as easily used an extension of Spring's HttpServletBean instead. Moving back to the ShoeConfig class, the modifications are minimal: you just create the servlet and register it as a bean.

@Bean HttpServlet helloServlet(){
    new HelloServlet()
}

Now fire the server up again, and browse to http://localhost:10101/helloServlet?name=Chris and you will get a response of:

Hello, Chris

Actually, any path will resolve to that servlet since it's the only one configured. I will come back to configuring multiple servlets and specifying url-mappings in a little bit, but first let's take the next step and set up a Filter implementation - one that counts requests as they come in and then passes the current count along with the continuing request.

package shoe.servlet

import org.springframework.web.filter.GenericFilterBean

import javax.servlet.FilterChain
import javax.servlet.ServletException
import javax.servlet.ServletRequest
import javax.servlet.ServletResponse
import java.util.concurrent.atomic.AtomicInteger

class RequestCountFilter extends GenericFilterBean {

    private final AtomicInteger count = new AtomicInteger(0)

    @Override
    void doFilter( ServletRequest request, ServletResponse response, FilterChain chain ) throws IOException, ServletException{
        request.setAttribute('request-count', count.incrementAndGet())

        chain.doFilter( request, response )
    }
}

In this case, I am using the Spring helper GenericFilterBean simply so that I have only one method to implement rather than three; I could have used a plain Filter implementation just as easily.

In order to make use of this new count information, we can tweak the HelloServlet so that it prints out the current count with the response - just change the println statement to:

w.println "<${req.getAttribute('request-count')}> Hello, ${req.getParameter('name')}"

Lastly for this case, we need to register the filter as a bean in the ShoeConfig class:

@Bean Filter countingFilter(){
    new RequestCountFilter()
}

Now, run the application again and hit the hello servlet a few times and you will see something like:

<10> Hello, Chris

The default url-mapping for the filter is "/*" (all requests). While this may be useful for some quick demo cases, it would be much more useful to be able to define the servlet and filter configuration similar to what you would do in the web container configuration - well, that's where the RegistrationBeans come into play.

Revisiting the servlet and filter configuration in ShoeConfig we can now provide a more detailed configuration with the help of the ServletRegistrationBean and the FilterRegistrationBean classes, as follows:

@Bean ServletRegistrationBean helloServlet(){
    new ServletRegistrationBean(
        urlMappings:[ '/hello' ],
        servlet: new HelloServlet()
    )
}

@Bean FilterRegistrationBean countingFilter(){
    new FilterRegistrationBean(
        urlPatterns:[ '/*' ],
        filter: new RequestCountFilter()
    )
}

We still leave the filter mapped to all requests, but you now have access to any of the filter mapping configuration parameters. For instance, we can add a simple init-param to the RequestCountFilter, such as:

int startValue = 0

private AtomicInteger count

@Override
protected void initFilterBean() throws ServletException {
    count = new AtomicInteger(startValue)
}

This will allow the starting value of the count to be specified as a filter init-parameter, which can be easily configured in the filter configuration:

@Bean FilterRegistrationBean countingFilter(){
    new FilterRegistrationBean(
        urlPatterns:[ '/*' ],
        filter: new RequestCountFilter(),
        initParameters:[ 'startValue': '1000' ]
    )
}

Nice and simple. Now, when you run the application again and browse to http://localhost:10101/helloServlet?name=Chris, you get a 404 error. Why? Well, now you have specified a url-mapping for the servlet; try http://localhost:10101/hello?name=Chris and you will see the expected result, something like:

<1004> Hello, Chris

You can also register ServletContextListeners in a similar manner. Let's create a simple one:

package shoe.servlet

import javax.servlet.ServletContextEvent
import javax.servlet.ServletContextListener

class LoggingListener implements ServletContextListener {

    @Override
    void contextInitialized(ServletContextEvent sce) {
        println "Initialized: $sce"
    }

    @Override
    void contextDestroyed(ServletContextEvent sce) {
        println "Destroyed: $sce"
    }
}

And then configure it in ShoeConfig:

@Bean ServletListenerRegistrationBean listener(){
    new ServletListenerRegistrationBean(
        listener: new LoggingListener()
    )
}

Then, when you run the application, you will get a message in the server output like:

Initialized: javax.servlet.ServletContextEvent[source=ServletContext@o.s.b.c.e.j.JettyEmbeddedWebAppContext{/,null}]

Now, let's do something a bit more interesting - let's setup a Spring-MVC configuration inside our embedded server.

The first thing you need for a minimal Spring-MVC configuration is a DispatcherServlet which, at its heart, is just an HttpServlet so we can just configure it as a bean in ShoeConfig:

@Bean HttpServlet dispatcherServlet(){
    new DispatcherServlet()
}

Then, we need a controller to make sure this configuration works - how about a simple controller that responds with the current time; we will also dump the request count to show that the filter is still in play. The controller looks like:

package shoe.controller

import org.springframework.web.bind.annotation.RequestMapping
import org.springframework.web.bind.annotation.RestController

import javax.servlet.http.HttpServletRequest

@RestController
class TimeController {

    @RequestMapping('/time')
    String time( HttpServletRequest request ){
        "<${request.getAttribute('request-count')}> Current-time: ${new Date()}"
    }
}

Lastly for this example, we need to load the controller into the configuration; just add a @ComponentScan annotation to the ShoeConfig as:

@ComponentScan(basePackages=['shoe.controller'])

Fire up the server and hit the http://localhost:10101/time controller and you see something similar to:

<1002> Current-time: Fri Sep 05 07:02:36 CDT 2014

Now you have the ability to do any of your Spring-MVC work with this configuration, while the standard filter and servlet still work as before.

As a best-practice, I would suggest keeping this server configuration code separate from other configuration code for anything more than a trivial application (i.e. you wouldn't do your security and database config in this same file).
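
As a rough sketch of what I mean - the class names here are hypothetical - the server wiring would live in its own configuration class within the scanned package, with the rest of the application configuration in separate classes:

package shoe.config

import org.springframework.boot.context.embedded.EmbeddedServletContainerFactory
import org.springframework.boot.context.embedded.jetty.JettyEmbeddedServletContainerFactory
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration

@Configuration
class ServerConfig {
    // the embedded container plus the servlet/filter/listener registration beans live here
    @Bean EmbeddedServletContainerFactory embeddedServletContainerFactory(){
        new JettyEmbeddedServletContainerFactory( 10101 )
    }
}

@Configuration
class DataConfig {
    // data sources, security, and other non-server beans would live in classes like this
}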

For my last discussion point, I want to point out that the embedded server configuration also allows you to do additional customization to the actual server instance during startup. To handle this additional configuration, Spring provides the JettyServerCustomizer interface. You simply implement this interface and add it to your server configuration factory bean. Let's do a little customization:

import org.eclipse.jetty.server.Connector
import org.eclipse.jetty.server.Server
import org.eclipse.jetty.server.nio.SelectChannelConnector
import org.springframework.boot.context.embedded.jetty.JettyServerCustomizer

class ShoeCustomizer implements JettyServerCustomizer {

    @Override
    void customize( Server server ){
        SelectChannelConnector myConn = server.getConnectors().find { Connector conn ->
            conn.port == 10101
        }

        myConn.maxIdleTime = 1000 * 60 * 60
        myConn.soLingerTime = -1

        server.setSendDateHeader(true)
    }
}

Basically just a tweak of the main connector and also telling the server to send an additional response header with the date value. This needs to be wired into the factory configuration, so that bean definition becomes:

@Bean EmbeddedServletContainerFactory embeddedServletContainerFactory(){
    def factory = new JettyEmbeddedServletContainerFactory( 10101 )
    factory.addServerCustomizers( new ShoeCustomizer() )
    return factory
}

Now when you start the server and hit the time controller you will see an additional header in the response:

Date:Fri, 05 Sep 2014 12:15:27 GMT

As you can see from this long discussion, the Spring-Boot embedded server API is quite useful all on its own. It's nice to see that Spring has exposed this functionality as part of its public API rather than hiding it under the covers somewhere.

The code I used for this article can be found in the main repository for this project, under the spring-shoe directory.

NodeTypes - Deeper Down the Rabbit Hole

23 August 2014 ~ blog, java, groovy

In my last post about Jackrabbit, "Wabbit Season with Jackrabbit", I fleshed out the old Jackrabbit tutorial and expanded it a bit to ingest some image file content. I touched on the subject of node types briefly, but did little with them. In this post, I am going to delve a bit deeper into using node types and creating your own.

In the older versions of Jackrabbit, there was a text-based format for configuring your own node types. It is not well documented, and I was not at all sad to see that it is no longer used as of Jackrabbit 2.x. There may be another approach to loading node types, but I found the programmatic approach interesting.

For this post, you will want to refer to the code presented in the other post, "Wabbit Season with Jackrabbit" as a starting point (especially the last version of the code, which the code here will be based on).

For this example, we are going to expand the previous example to include image metadata in the stored node properties. I was originally under the impression that Jackrabbit would automatically extract the metadata on ingestion of the data, but it appears that this is only the case for text-based data when doing indexing. This is not a big roadblock, though, since Apache Tika is included with Jackrabbit, although a slightly older version than what I wanted to use. You can add the following to your build.gradle file to update the version:

compile 'org.apache.tika:tika-parsers:1.5'

Tika provides metadata extractors for a wide range of file formats, one of which is JPEG images, which is what we are playing with here.

First, we need to extract the metadata from the image file. I did this just after the main method's file reference statement:

def metadata = extractMetadata( file )

The code for the extractMetadata(File) method is as follows:

private static Map<String,String> extractMetadata( File imageFile ){
    def meta = new Metadata()
    def extractor = new ImageMetadataExtractor( meta )

    log.info 'Extracting metadata from {}', imageFile

    extractor.parseJpeg(imageFile)

    def props = [:]
    meta.names().sort().each { name->
        props[name] = meta.get(name)
        log.info " : <image-meta> $name : ${meta.get(name)}"
    }

    return props
}

It's just a simple, straightforward use of the Tika ImageMetadataExtractor, which pulls out all the data and stores it into a Map for later use.

Then, after we create the main file node, we want to apply the metadata properties to it:

applyMetadata( fileNode, metadata )

The applyMetadata(Node,Map) method applies the metadata from the map as properties on the node. The code is as shown below:

private static void applyMetadata( Node node, Map<String,String> metadata ){
    node.addMixin('pp:photo')
    node.setProperty('pp:photo-width', metadata['Image Width'].split(' ')[0] as long )

    log.info 'Applied mixin -> {} :: {}', node.mixinNodeTypes.collect { it.name }.join(', '), node.getProperty('pp:photo-width').string
}

For the metadata, I used the concept of "mixin" node types. Every node has a primary node type - in this case it's an "nt:file" node - but nodes can also have multiple mixin node types applied to them so that they can have additional properties available. This works perfectly in my case, since I want a file that is a photo, with extra metadata associated with it.

Also, the dumpProps(Node) method changed slightly to avoid errors during extraction, and to hide properties we don't care about seeing:

private static void dumpProps( Node node ){
    log.info 'Node ({}) of type ({}) with mixins ({})', node.name, node.getPrimaryNodeType().name, node.getMixinNodeTypes()

    def iter = node.properties
    while( iter.hasNext() ){
        def prop = iter.nextProperty()
        if( prop.type != PropertyType.BINARY ){
            if( prop.name != 'jcr:mixinTypes' ){
                log.info ' - {} : {}', prop.name, prop.value.string
            }
        } else {
            log.info ' - {} : <binary-data>', prop.name
        }
    }
}

If you run the code at this point, you will get an error about the node type not being defined, so we need to define the new node type. In the current version of Jackrabbit, they defer node type creation to the standard JCR 2.0 approach, which is pretty clean. The code is shown below:

private static void registerNodeTypes(Session session ) throws Exception {
    if( !session.namespacePrefixes.contains('pp') ){
        session.workspace.namespaceRegistry.registerNamespace('pp', 'http://stehno.com/pp')
    }

    NodeTypeManager manager = session.getWorkspace().getNodeTypeManager()

    if( !manager.hasNodeType('pp:photo') ){
        NodeTypeTemplate nodeTypeTemplate = manager.createNodeTypeTemplate()
        nodeTypeTemplate.name = 'pp:photo'
        nodeTypeTemplate.mixin = true

        PropertyDefinitionTemplate propTemplate = manager.createPropertyDefinitionTemplate()
        propTemplate.name = 'pp:photo-width'
        propTemplate.requiredType = PropertyType.LONG
        propTemplate.multiple = false
        propTemplate.mandatory = true

        nodeTypeTemplate.propertyDefinitionTemplates << propTemplate

        manager.registerNodeType( nodeTypeTemplate, false )
    }
}

Which is called just after logging in and getting a reference to a repository session. Basically, you use the NodeTypeManager to create a NodeTypeTemplate which you can use to specify the configuration settings of your new node type. There is a similar construct for node type properties, the PropertyDefinitionTemplate. Once you have your configuration done, you register the node type and you are ready to go.

When run, this code generates output similar to:

2014-08-23 16:43:02 Rabbits [INFO] User (admin) logged into repository (Jackrabbit)
...
2014-08-23 16:43:02 Rabbits [INFO]  : <image-meta> Image Width : 2448 pixels
...
2014-08-23 16:43:02 Rabbits [INFO] Applied mixin -> pp:photo :: 2448
2014-08-23 16:43:02 Rabbits [INFO] Stored image file data into node (2014-08-19 20.49.40.jpg)...
2014-08-23 16:43:02 Rabbits [INFO] Node (2014-08-19 20.49.40.jpg) of type (nt:file) with mixins ([org.apache.jackrabbit.core.nodetype.NodeTypeImpl@5b3bb1f7])
2014-08-23 16:43:02 Rabbits [INFO]  - jcr:createdBy : admin
2014-08-23 16:43:02 Rabbits [INFO]  - pp:photo-width : 2448
2014-08-23 16:43:02 Rabbits [INFO]  - jcr:primaryType : nt:file
2014-08-23 16:43:02 Rabbits [INFO]  - jcr:created : 2014-08-23T16:43:02.531-05:00
2014-08-23 16:43:02 Rabbits [INFO] Node (jcr:content) of type (nt:resource) with mixins ([])
2014-08-23 16:43:02 Rabbits [INFO]  - jcr:lastModified : 2014-08-19T20:49:44.000-05:00
2014-08-23 16:43:02 Rabbits [INFO]  - jcr:data : <binary-data>
2014-08-23 16:43:02 Rabbits [INFO]  - jcr:lastModifiedBy : admin
2014-08-23 16:43:02 Rabbits [INFO]  - jcr:uuid : a699fbd6-4493-4dc7-9f7a-b87b84cb1ef9
2014-08-23 16:43:02 Rabbits [INFO]  - jcr:primaryType : nt:resource

(I omitted a bunch of the metadata output lines to clean up the output)

You can see that the new node type data is populated from the metadata and the mixin is properly applied.

Call me crazy, but this approach seems a lot cleaner than the old text-based approach. There are some rules around node types and ensuring that they are not re-registered if they already exist, though this only seems to be a problem in certain use cases - I need to investigate that a bit more, but be aware of it.

Now, you can stop here and create new node types all day long, but let's take this experiment a little farther down the rabbit hole. The programmatic approach to node type configuration seems to lend itself nicely to a Groovy-based DSL approach, something like:

private static void registerNodeTypes( Session session ) throws Exception {
    definitions( session.workspace ){
        namespace 'pp', 'http://stehno.com/pp'

        nodeType {
            name 'pp:photo'
            mixin true

            propertyDefinition {
                name 'pp:photo-width'
                requiredType PropertyType.LONG
                multiple false
                mandatory true
            }

            propertyDefinition {
                name 'pp:photo-height'
                requiredType PropertyType.LONG
                multiple false
                mandatory true
            }
        }
    }
}

Seems like a nice clean way to create new node types and their properties with little fuss and muss. So, using a little Groovy DSL closure delegation we can do this without too much pain:

class NodeTypeDefiner {

    private final NodeTypeManager manager
    private final Workspace workspace

    private NodeTypeDefiner( final Workspace workspace ){
        this.workspace = workspace
        this.manager = workspace.nodeTypeManager
    }

    void namespace( String name, String uri ){
        if( !workspace.namespaceRegistry.prefixes.contains(name) ){
            workspace.namespaceRegistry .registerNamespace(name, uri)
        }
    }

    static void definitions( final Workspace workspace, Closure closure ){
        NodeTypeDefiner definer = new NodeTypeDefiner( workspace )
        closure.delegate = definer
        closure.resolveStrategy = Closure.DELEGATE_ONLY
        closure()
    }

    void nodeType( Closure closure ){
        def nodeTypeTemplate = new DelegatingNodeTypeTemplate( manager )

        closure.delegate = nodeTypeTemplate
        closure.resolveStrategy = Closure.DELEGATE_ONLY
        closure()

        manager.registerNodeType( nodeTypeTemplate, true )
    }
}

The key pain point I found here was that with the nested closure structures, I needed to change the resolveStrategy so that you get the delegate only rather than the owner - took a little debugging to trace that one down.

The other useful point here was the "Delegating" extensions of the two "template" classes:

class DelegatingNodeTypeTemplate implements NodeTypeDefinition {

    @Delegate NodeTypeTemplate template
    private final NodeTypeManager manager

    DelegatingNodeTypeTemplate( final NodeTypeManager manager ){
        this.manager = manager
        this.template = manager.createNodeTypeTemplate()
    }

    void name( String name ){
        template.setName( name )
    }

    void mixin( boolean mix ){
        template.mixin = mix
    }

    void propertyDefinition( Closure closure ){
        def propertyTemplate = new DelegatingPropertyDefinitionTemplate( manager )
        closure.delegate = propertyTemplate
        closure.resolveStrategy = Closure.DELEGATE_ONLY
        closure()
        propertyDefinitionTemplates << propertyTemplate
    }
}

class DelegatingPropertyDefinitionTemplate implements PropertyDefinition {

    @Delegate PropertyDefinitionTemplate template
    private final NodeTypeManager manager

    DelegatingPropertyDefinitionTemplate( final NodeTypeManager manager ){
        this.manager = manager
        this.template = manager.createPropertyDefinitionTemplate()
    }

    void name( String name ){
        template.setName( name )
    }

    void requiredType( int propertyType ){
        template.setRequiredType( propertyType )
    }

    void multiple( boolean value ){
        template.multiple = value
    }

    void mandatory( boolean value ){
        template.mandatory = value
    }
}

They provide the helper methods to allow a nice clean DSL. Without them you have only setters, which did not work out cleanly. You just end up with some small delegate classes.

This code takes care of adding in the property definitions, registering namespaces and node types. It does not currently support all the configuration properties; however, that would be simple to add - there are not very many available.

As you can see from the DSL example code, you can now add new node types in a very simple manner. This kind of thing is why I love Groovy so much.

If there is any interest in this DSL code, I will be using it in one of my own projects, so I could extract it into a library for more public use - let me know if you are interested.

Wabbit Season with Jackrabbit

23 August 2014 ~ blog, java, groovy

I have been playing with Apache Jackrabbit today while doing some research for one of my personal projects, and while it seems to have matured a bit since the last time I looked into it, the documentation has stagnated. Granted, it still works better as a jump-start than nothing at all, but it really does not reflect the current state of the API. I present here a more modern take on the "First Hops" document based on what I did for my research - I am using Gradle, Groovy, and generally more modern versions of the libraries involved. Maybe this can help others, or myself at a later date.

Getting Started

The quickest and easiest way to get started is by using an embedded TransientRepository. Create a project directory and add a build.gradle file similar to the following:

apply plugin: 'groovy'

repositories {
    jcenter()
}

dependencies {
    compile 'org.codehaus.groovy:groovy-all:2.3.6'

    compile 'javax.jcr:jcr:2.0'
    compile 'org.apache.jackrabbit:jackrabbit-core:2.8.0'
    compile 'org.slf4j:slf4j-log4j12:1.7.7'
}

This will give you the required dependencies and a nice playground project to work with.

Logging in to Jackrabbit

In the src/main/groovy directory of the project, create a file called Rabbits.groovy with the following code:

import groovy.util.logging.Slf4j
import org.apache.jackrabbit.core.TransientRepository

import javax.jcr.Repository
import javax.jcr.Session

@Slf4j
class Rabbits {

    static void main(args) throws Exception {
        Repository repository = new TransientRepository(
            new File('./build/repository')
        )

        Session session = repository.login()
        try {
            String user = session.getUserID()
            String name = repository.getDescriptor(Repository.REP_NAME_DESC)

            log.info 'Logged in as {} to a {} repository.', user, name

        } finally {
            session.logout()
        }
    }
}

The important part here is the TransientRepository code, which allows you to use/reuse a repository for testing. I found that specifying a repository directory in my build directory was useful since by default it will put a bunch of files and directories in the root of your project when you run the project - it's just a little cleaner when you can run gradle clean to wipe out your development repository when needed. The downside of specifying the directory seems to be that your repository is not completely transient. I was not clear whether or not this was always the case or just when I set the directory, hence the need to wipe it out sometimes.

The rest of the code is pretty clear: it just does a login to the repository and writes out some information. When run, you should get something like the following:

2014-08-23 15:30:02 Rabbits [INFO] Logged in as anonymous to a Jackrabbit repository.

The finally block is used to always log out of the repository, though this seems a bit dubious because it seemed quite easy to lock the repository in a bad state when errors caused application failure - this will require some additional investigation. Lastly, to round out the first version of the project, create a log4j.properties file in src/main/resources so that your logger has some configuration. I used:

log4j.rootCategory=INFO, Cons

log4j.logger.com.something=ERROR

log4j.logger.org.apache.jackrabbit=WARN

log4j.appender.Cons = org.apache.log4j.ConsoleAppender
log4j.appender.Cons.layout = org.apache.log4j.PatternLayout
log4j.appender.Cons.layout.ConversionPattern = %d{yyyy-MM-dd HH:mm:ss} %c{1} [%p] %m%n

If you want to see more about what Jackrabbit is doing, set the logging level for log4j.logger.org.apache.jackrabbit to INFO - it gets a little verbose, so I turned it down to WARN.
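
For example, it's just a one-line change in the log4j.properties file:

log4j.logger.org.apache.jackrabbit=INFO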

Working with Content

When using a content repository, you probably want to do something with actual content, so let's start off with a simple case of some nodes with simple text content. The main method of the Rabbits class now becomes:

Repository repository = new TransientRepository(
    new File('./build/repository')
)

Session session = repository.login(
    new SimpleCredentials('admin','admin'.toCharArray())
)

try {
    String username = session.userID
    String name = repository.getDescriptor(Repository.REP_NAME_DESC)
    log.info 'User ({}) logged into repository ({})', username, name

    Node root = session.rootNode

    // Store content
    Node hello = root.addNode('hello')
    Node world = hello.addNode('world')
    world.setProperty('message', 'Hello, World!')
    session.save()

    // Retrieve content
    Node node = root.getNode('hello/world')
    log.info 'Found node ({}) with property: {}', node.path, node.getProperty('message').string

    // Remove content
    root.getNode('hello').remove()
    log.info 'Removed node.'

    session.save()

} finally {
    session.logout()
}

Notice that the login code now contains credentials so that we can log in with a writable session rather than the read-only default session used in the previous example.

First, we need to store some content in the repository. Since Jackrabbit is a hierarchical data store, you need to get a reference to the root node, and then add a child node to it with some content:

Node root = session.rootNode

// Store content
Node hello = root.addNode('hello')
Node world = hello.addNode('world')
world.setProperty('message', 'Hello, World!')
session.save()

We create a node named "hello", then add a child named "world" to that node, and give the child node a "message" property. Notice that we save the session to persist the changes to the underlying data store.

Next, we want to read the data back out:

Node node = root.getNode('hello/world')
log.info 'Found node ({}) with property: {}', node.path, node.getProperty('message').string

You just get the node by its relative path, in this case from the root, and then retrieve its data.

Lastly, for this example, we want to remove the nodes we just added:

root.getNode('hello').remove()
session.save()
log.info 'Removed node.'

Removing the "hello" node removes it and it's children (i.e. the "world" node). We then save the session to commit the node removal.

When you run this version of the code, you should see something like this:

2014-08-23 15:45:18 Rabbits [INFO] User (admin) logged into repository (Jackrabbit)
2014-08-23 15:45:18 Rabbits [INFO] Found node (/hello/world) with property: Hello, World!
2014-08-23 15:45:18 Rabbits [INFO] Removed node.

Working with Binary Content

This is where my tour diverts from the original wiki document, which goes on to cover XML data imports. I was more interested in loading binary content, especially image files. To accomplish this, we need to consider how the data is stored in JCR. I found a very helpful article "Storing Files and Folders" from the ModeShape documentation (another JCR implementation) - since it's standard JCR, it is still relevant with Jackrabbit.

Basically, you need a node for the file and its metadata, which has a child node for the actual file content. The article has some nice explanations and diagrams, so if you want more than code and a quick discussion I recommend you head over there and take a look. For my purpose, I am just going to ingest a single image file and then read the data back out to ensure that it was actually stored. The code for the try/finally block of our example becomes:

String username = session.userID
String name = repository.getDescriptor(Repository.REP_NAME_DESC)
log.info 'User ({}) logged into repository ({})', username, name

Node root = session.rootNode

// Assume that we have a file that exists and can be read ...
File file = IMAGE_FILE

// Determine the last-modified by value of the file (if important) ...
Calendar lastModified = Calendar.instance
lastModified.setTimeInMillis(file.lastModified())

// Create an 'nt:file' node at the supplied path ...
Node fileNode = root.addNode(file.name, 'nt:file')

// Upload the file to that node ...
Node contentNode = fileNode.addNode('jcr:content', 'nt:resource')
Binary binary = session.valueFactory.createBinary(file.newInputStream())
contentNode.setProperty('jcr:data', binary)
contentNode.setProperty('jcr:lastModified',lastModified)

// Save the session (and auto-created the properties) ...
session.save()

log.info 'Stored image file data into node ({})...', file.name

// now get the image node data back out

def node = root.getNode(file.name)
dumpProps node

dumpProps node.getNode('jcr:content')

Where IMAGE_FILE is a File object pointing to a JPEG image file.

The first thing we do is create the file node:

Node fileNode = root.addNode(file.name, 'nt:file')

Notice that it's of type nt:file to designate that it's a file node - you will want to brush up on NodeTypes in the Jackrabbit or JCR documentation if you don't already have a basic understanding; I won't do much more than use them in these examples. For the name of the node, we just use the file name.

Second, we create the file content node as a child of the file node:

Node contentNode = fileNode.addNode('jcr:content', 'nt:resource')
Binary binary = session.valueFactory.createBinary(file.newInputStream())
contentNode.setProperty('jcr:data', binary)
contentNode.setProperty('jcr:lastModified',lastModified)

// Save the session (and auto-created the properties) ...
session.save()

Notice that the child node is named "jcr:content" and is of type "nt:resource" and that it has a property named "jcr:data" containing the binary data content for the file. Of course, the session is saved to persist the changes.

Once we have the file data stored, we want to pull it back out to see that we stored everything as intended:

def node = root.getNode(file.name)
dumpProps node

dumpProps node.getNode('jcr:content')

The dumpProps method just iterates the properties of a given node and writes them to the log file:

private static void dumpProps( Node node ){
    log.info 'Node: ({})', node.name

    def iter = node.properties
    while( iter.hasNext() ){
        def prop = iter.nextProperty()
        if( prop.type != PropertyType.BINARY ){
            log.info ' - {} : {}', prop.name, prop.value.string
        } else {
            log.info ' - {} : <binary-data>', prop.name
        }
    }
}

When you run this version of the code, you will have output similar to:

2014-08-23 16:09:18 Rabbits [INFO] User (admin) logged into repository (Jackrabbit)
2014-08-23 16:09:18 Rabbits [INFO] Stored image file data into node (2014-08-19 20.49.40.jpg)...
2014-08-23 16:09:18 Rabbits [INFO] Node: (2014-08-19 20.49.40.jpg)
2014-08-23 16:09:18 Rabbits [INFO]  - jcr:createdBy : admin
2014-08-23 16:09:18 Rabbits [INFO]  - jcr:created : 2014-08-23T15:59:26.155-05:00
2014-08-23 16:09:18 Rabbits [INFO]  - jcr:primaryType : nt:file
2014-08-23 16:09:18 Rabbits [INFO] Node: (jcr:content)
2014-08-23 16:09:18 Rabbits [INFO]  - jcr:lastModified : 2014-08-19T20:49:44.000-05:00
2014-08-23 16:09:18 Rabbits [INFO]  - jcr:data : <binary-data>
2014-08-23 16:09:18 Rabbits [INFO]  - jcr:lastModifiedBy : admin
2014-08-23 16:09:18 Rabbits [INFO]  - jcr:uuid : cbdefd4a-ec2f-42d2-b58a-a39942766723
2014-08-23 16:09:18 Rabbits [INFO]  - jcr:primaryType : nt:resource

Conclusion

Jackrabbit seems to still have some development effort behind it, and it's still a lot easier to set up and use than something like ModeShape, which seems to be the only other viable JCR implementation that is not specifically geared to a target use case.

The documentation is lacking, but with some previous experience and a little experimentation, it was not too painful getting things to work.

Simple Configuration DSL using Groovy

19 July 2014 ~ blog, java, groovy

Recently at work we were talking about being able to process large configuration files from legacy applications where the config file had a fairly simple text-based format. One of my co-workers mentioned that you could probably just run the configuration file like a Groovy script, handle the methodMissing() calls, and use them to populate a configuration object. This sounded like an interesting little task to play with, so I threw together a basic implementation - and it's actually easier than I thought.

To start out with, we need a configuration holder class, which we'll just call Configuration:

class Configuration {
    String hostName
    String protocol
    int port
    Headers headers
}

Say we are collecting configuration information for some sort of HTTP request util or something - it's a contrived example, but it shows the concept nicely. The Headers class is a simple delegated builder in itself, and looks like:

@ToString(includeNames=true)
class Headers {
    Map<String,Object> values = [:]

    static Headers headers( Closure closure ){
        Headers h = new Headers()
        closure.delegate = h
        closure()
        return h
    }
    
    void header( String name, value ){
        values[name] = value
    }
}

I won't explain much about the Headers class, other than it takes a closure and delegates the method calls of it onto a Headers instance to populate it. For our purposes it just makes a nice simple way to show closure usage in the example.
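
Just to show the delegation in action, the Headers builder can be used on its own, something like:

def h = Headers.headers {
    header 'Content-type', 'text/html'
    header 'Content-length', 10101
}
println h
// Headers(values:[Content-type:text/html, Content-length:10101])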

Now, we need a configuration file to load. It's just a simple text file:

hostname 'localhost'
protocol = 'https'
port 2468

headers {
    header 'Content-type','text/html'
    header 'Content-length',10101
}

The script-based configuration is similar to the delegated builder, in that the method calls of the "script" (text configuration file) will be delegated to an instance of the Configuration class. For that to work, we could override the methodMissing() method and handle each desired operation, or, if we have a good idea of the configuration (as we do in our case), we could just add the missing methods, as follows:

@ToString(includeNames=true)
class Configuration {

    String hostName
    String protocol
    int port
    Headers headers
   
    void hostname( final String name ){
        this.hostName = name
    }

    void port( final int port ){
        this.port = port
    }
    
    void headers( final Closure closure ){
        this.headers = Headers.headers( closure )
    }
}

Basically, they are just setters in our case; however, you could do whatever conversion or validation you need - they're just method calls. Also, notice that the protocol property in the configuration file is actually set directly with an equals sign (=) rather than through a method call - this is also valid, though personally I like the way it looks without all the equals signs.

The final part needed to make this work is the Groovy magic. We need to load the text as a script in a GroovyShell, parse it and run it. The whole code for the Configuration object is shown below:

import groovy.transform.ToString
import org.codehaus.groovy.control.CompilerConfiguration

@ToString(includeNames=true)
class Configuration {

    String hostName
    String protocol
    int port
    Headers headers
   
    void hostname( final String name ){
        this.hostName = name
    }

    void port( final int port ){
        this.port = port
    }
    
    void headers( final Closure closure ){
        this.headers = Headers.headers( closure )
    }

    static Configuration configure( final File file ){
        def script = new GroovyShell(
            new CompilerConfiguration(
                scriptBaseClass:DelegatingScript.class.name 
            )
        ).parse(file)

        def configuration = new Configuration()
        script.setDelegate( configuration )
        script.run()

        return configuration
    }
}

The important parts are the use of the DelegatingScript as the scriptBaseClass and then setting the Configuration instance as the delegate for the script. Now if you run the following:

def conf = Configuration.configure( new File('conf.txt') )
println conf

You get something like the following output:

Configuration(protocol:https, hostName:localhost, port:2468, headers:Headers(values:[Content-type:text/html, Content-length:10101]))

Notice that in the example we didn't define a method for protocol, which means that the only way you can set it in the configuration is as a property; however, we could use the property format to set the value of the other fields, such as port, since there is a setter method available along with the helper method (options are nice).
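
For example, this variation of the configuration file works just as well with the same Configuration class - note that the property form has to use the actual field name (hostName) rather than the lower-case helper method name:

hostName = 'localhost'
protocol = 'https'
port = 2468

headers {
    header 'Content-type','text/html'
    header 'Content-length',10101
}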

Groovy makes simple DSLs, well... simple.

Going Native with Gradle

16 March 2014 ~ blog, java, groovy, gradle

With my recent foray into Java game programming, I found the support for managing the native sub-dependencies of jar files to be a bit lacking in Gradle. I did find a few blog posts about general ways of adding it to your build; however, I did not find any specific plugin or built-in support. Since I am planning on doing a handful of simple games as a tutorial for game programming, it made sense to pull my native library handling functionality out into a Gradle plugin... and thus the Gradle Natives Plugin was born.

First, we need a project to play with. I found a simple LWJGL Hello World application that works nicely for our starting point. So, create the standard Gradle project structure with the following files:

// hello/src/main/java/hello/HelloWorld.java
package hello;

import org.lwjgl.LWJGLException;
import org.lwjgl.opengl.Display;
 
public class HelloWorld {
    public static void main(String args[]){
        try {
            Display.setTitle("Hello World");
            Display.create();

            while(!Display.isCloseRequested()){
                Thread.sleep(100);
            }

        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            Display.destroy();
        }
    }
}

with a standard Gradle build file as a starting point:

// hello/build.gradle

apply plugin:'java'

repositories {
	jcenter()
}

dependencies {
	compile 'org.lwjgl.lwjgl:lwjgl:2.9.1'
}

At this point, the project will build, but will not run without jumping through some extra hoops. Let's do some of that hoop-jumping in Gradle with the application plugin. Add the following to the build.gradle file:

apply plugin:'application'

mainClassName = 'hello.HelloWorld'

This adds the run task to the build which will run the HelloWorld main class; however, this still won't work since it does not know how to deal with the LWJGL native libraries. That's where the natives plugin comes in. At this time there is no official release of the plugin on Bintray (coming soon), so you will need to clone the repo and build the plugin, then install it into your local maven repo:

git clone git@github.com:cjstehno/gradle-natives.git

cd gradle-natives

gradle build install

Once that is done, you will need to add the natives plugin to your build:

buildscript {
    repositories {
        mavenLocal()
    }

    dependencies {
        classpath 'gradle-natives:gradle-natives:0.1'
    }
}

apply plugin:'natives'

And then you will need to apply the custom configuration for your specific native libraries. You will need to add an entry in the jars list for each dependency jar containing native libraries. These are the jars that will be searched on the classpath for native libraries by platform.

natives {
	jars = [
		'lwjgl-platform-2.9.1-natives-windows', 
		'lwjgl-platform-2.9.1-natives-osx', 
		'lwjgl-platform-2.9.1-natives-linux'
	]
}

This will allow the associated native libraries to be unpacked into the build directory with:

gradle unpackNatives

Which will copy the libraries into a directory for each platform under build/natives/PLATFORM. Then we need one more step to allow it to be run. The java.library.path needs to be set before the run:

run {
    systemProperty 'java.library.path', file( 'build/natives/windows' )
}

Then you can run the application using:

gradle run

Granted, there are still issues to be resolved with the plugin. Currently, it is a little picky about when it is run. If you have tests that use the native libraries you will need to build without tests and then run the tests:

gradle clean build unpackNatives -x test

gradle test

Lastly, you can also specify the platforms whose library files are to be copied over using the platforms configuration property, for example:

natives {
	jars = [
		'lwjgl-platform-2.9.1-natives-windows', 
		'lwjgl-platform-2.9.1-natives-osx', 
		'lwjgl-platform-2.9.1-natives-linux'
	]
	platforms = 'windows'
}

This will only copy the Windows libraries into the build.

Feel free to create an issue for any bugs you find or features you would like to see. Also, I am open to bug fixes and pull requests from others.

Mapping Large Data Sets

09 June 2013 ~ blog, java, javascript

Recently, I was tasked with resolving some performance issues related to displaying a large set of geo-location data on a map. Basically, the existing implementation was taking the simple approach of fetching all the location data from the server and rendering it on the map. While there is nothing inherently wrong with this approach, it does not scale well as the number of data points increases, which was the problem at hand.

The map needed to be able to render equally well whether there were 100 data points or a million. With the direct approach, the browser started to bog down at just over a thousand points and failed completely at 100 thousand. A million was out of the question. So, what can be done?

I have created a small demo application to help present the concepts and techniques I used in solving this problem. I intend to focus mostly on the concepts and keep the discussion of the code to a minimum, so this will not really be an OpenLayers tutorial, though it may be useful if you are faced with a similar task.

The demo application is available on GitHub, and its README file contains the information you need to set up, build and run it - only necessary if you want to run the demo yourself.

First, let's look at the problem itself. If you fire up the demo "V1" with a data set of 10k or less, you will see something like the following:

V1 View

You can see that even with only ten thousand data points it is visually cluttered and a bit sluggish to navigate. If you build a larger data set of 100k, or better yet a million data points, and try to run the demo, at best it will take a long time; most likely it will crash your browser. This approach is just not practical for this volume of data.

The code for this version simply makes an ajax request to the data service to retrieve all the data points:

$.ajax('poi/v1/fetch', { contentType:'application/json' }).done(function(data){
	updateMarkers(map, data);
});

and then renders the markers for each data point on the map:

function updateMarkers( map, data ){
	var layer = map.getLayersByName('Data')[0];

	var markers = $.map(data, function(item){
		return new OpenLayers.Feature.Vector(
			new OpenLayers.Geometry.Point(
				item.longitude, item.latitude
			).transform(PROJECTION_EXTERNAL, PROJECTION_INTERNAL),
			{ 
				item:item 
			},
			OpenLayers.Util.applyDefaults(
				{ fillColor:'#0000ff' }, 
				OpenLayers.Feature.Vector.style['default']
			)
		);
	});

	layer.addFeatures(markers);
}

What we really need to do is reduce the amount of data being processed without losing any visual information. The key is to consider the scope of your view. Other than at the lowest zoom levels (whole Earth view), you are only viewing a relatively limited part of the whole map, which means that only a subset of the data is visible at any given time. So why fetch it all from the server when it just adds unnecessary load on the JavaScript mapping library?

The answer is that you don't have to. If you listen to map view change events and fetch the data for only your current view by passing the view bounding box to your query, you can limit the data down to only what you currently see. The "V2" demo uses this approach to limit the volume of data.

eventListeners:{
	moveend:function(){
		var bounds = map.getExtent().transform(PROJECTION_INTERNAL, PROJECTION_EXTERNAL).toString();

		$.ajax('poi/v2/fetch/' + bounds, { contentType:'application/json' }).done(function(data){
			updateMarkers(map, data);
		});
	}
}

The updateMarkers() function remains unchanged in this version.
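
On the server side, the fetch endpoint just needs to filter the points by the requested bounding box before returning them. As a rough Groovy illustration (the demo's actual service code differs; fetchWithin() and the in-memory list it filters are hypothetical), the filter might look like:

// bounds arrives as "minLon,minLat,maxLon,maxLat" - the format produced by the OpenLayers Bounds toString()
List fetchWithin( String bounds, List allPoints ){
    def (minLon, minLat, maxLon, maxLat) = bounds.split(',')*.toDouble()

    allPoints.findAll { p ->
        p.longitude >= minLon && p.longitude <= maxLon &&
        p.latitude >= minLat && p.latitude <= maxLat
    }
}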

Visually, this version of the application is the same; however, it will handle larger data sets with less pain. This approach increases the number of requests for data but will reduce the amount of data retrieved as the user zooms into their target area of interest.

This approach is still a bit flawed. It works fine for cases where the user is zoomed in on a state or small country; however, it is still possible to view the whole large data set when your view is at the lower zoom levels (whole Earth). There is still more work to be done.

In order to reduce the number of data points when viewing the lower zoom levels, we need to consider how useful all this data really is. Considering the image from V1, which is still valid for V2, is there any use in rendering all of those data points? This is just randomly distributed data, but even real data would probably be as dense or denser in areas around population centers, which would only compound the problem. How can you clean up this display mess while also reducing the amount of data being sent, oh, and without any loss of useful information?

The first part of the answer is clustering (see Cluster Analysis). We needed to group the data together in a meaningful way such that we present a representative point for a nearby group of points, otherwise known as a cluster. After some research and peer discussion, it was decided that the K-Means Clustering Algorithm was the right approach for our needs, and the Apache Commons Math library provided a stable and generic implementation that would work well for our requirements. It is also what I have used here for this demo.

The clustering provides a means of generating a fixed-size data set in which each point represents a group of nearby points around a common center. With this, you can limit your clustered data set down to something like 200 points, which can easily be displayed on the map and will still provide an accurate representation of the location data.
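
As a sketch of what that looks like with Commons Math (assuming a 3.x version with the ml.clustering package; locations here is a hypothetical list of objects with longitude and latitude properties):

import org.apache.commons.math3.ml.clustering.DoublePoint
import org.apache.commons.math3.ml.clustering.KMeansPlusPlusClusterer

def clusterer = new KMeansPlusPlusClusterer<DoublePoint>( 200, 50 )   // 200 clusters, at most 50 iterations

def points = locations.collect { new DoublePoint( [ it.longitude, it.latitude ] as double[] ) }

def representatives = clusterer.cluster( points ).collect { cluster ->
    def center = cluster.center.point   // [longitude, latitude] of the cluster center
    [ longitude: center[0], latitude: center[1], count: cluster.points.size() ]
}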

Notice, though, I said that clustering was the first part of the answer... what is the second? Consider the effect of clustering on your data set as you zoom in from whole Earth view down to city street level. Clustering combined with view-bounds limiting will cause your overall data set to change. When the data points used in the cluster calculation change, the results change, which causes the location points to jump. I called this "jitter". Even just panning the map at a constant zoom level would cause map markers to move around like they were doing some sort of annoying square dance. To overcome the jittery cluster markers, you need to keep the data set used in the cluster calculation constant.

A hybrid approach is required. Basically, add the zoom level to the fetch request.

eventListeners:{
	moveend:function(){
		var bounds = map.getExtent().transform(PROJECTION_INTERNAL, PROJECTION_EXTERNAL).toString();
		var zoom = map.getZoom();

		$.ajax('poi/v3/fetch/' + bounds + '/' + zoom, { contentType:'application/json' }).done(function(data){
			updateMarkers(map, data);
		});
	}
}

At the lower zoom levels, up to a configured threshold, you calculate the clusters across the whole data set (not bound by view) and cache this cluster data so that the calculation will only be done on the first call. Since zoom is not a function of this calculation, there can be one cached data set for all of the zoom levels below the specified threshold. Then, when the user zooms into the higher zoom levels (over the threshold), the actual data points (filtered by the view bounds) are returned by the fetch.
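
Put together, the server-side decision looks roughly like the following sketch (illustrative only; ZOOM_THRESHOLD, cachedClusters, clusterWholeDataSet() and findWithinBounds() are made-up names, not the demo's actual code):

List fetch( String bounds, int zoom ){
    if( zoom <= ZOOM_THRESHOLD ){
        // clusters are calculated over the whole data set, independent of bounds and zoom,
        // so they are computed once and cached for all of the low zoom levels
        if( !cachedClusters ){
            cachedClusters = clusterWholeDataSet()
        }
        return cachedClusters
    }

    // above the threshold, return the actual points limited to the visible area
    findWithinBounds( bounds )
}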

If you look at demo V3, you can see this in action, for 10-thousand points:

V3 10k View

And if you run the demo with a one-million point data set, you will see the same view. The initial load will take a bit longer, but once loaded it should perform nicely. What you may notice, though, is that once you cross the cluster threshold you may suddenly get a large data set again... not overly so, but more than you might expect. This is an area you will want to tune to your specific needs so that the transition happens where it gives the best perceived results.

You could stop here and be done with it, but depending on how your data is distributed you could still run into some overly-dense visual areas. Consider the case where you generate a million data points, but only in the Western Hemisphere.

If you build a one-million point data set for only the Americas, you can see that there are still some overly-dense areas even with the clustering. Since I am using OpenLayers as the mapping API, I can use its client-side clustering mechanism to help resolve this. With client-side clustering enabled, the mapping API will group markers together by distance to help de-clutter the view. If you look at V3 again, you can see the cluster clutter problem:

V3 West

You can see that there are still some areas of high marker density. The client-side clustering strategy in OpenLayers can help relieve the clutter a bit:

new OpenLayers.Layer.Vector('Data',{
	style: OpenLayers.Util.applyDefaults(
		{
			fillColor:'#00ff00'
		},
		OpenLayers.Feature.Vector.style['default']
	),
	strategies:[
		new OpenLayers.Strategy.Cluster({
			distance:50,
			threshold:3
		})
	]
})

as can be seen in V4:

V4 West

But, it is more apparent when you zoom in:

V4 West Zoom

You can see now that the green markers are client-side clusters and the blue markers are server-side points (clusters or single locations).

At the end of all that, you have a map with client-side clustering to handle visual density at the local level, server-side clustering at the more global zoom levels (with caching to remove jitter and reduce calculation time), and actual location points served filtered by the view bounds at the higher zoom levels. It seems like a lot of effort, but overall the code itself is fairly simple and straightforward... and now we can support a million data points with no real issues or loss of information.

One thing I have not mentioned here is the use of GIS databases or extensions. My goal here was more conceptual, but should you be faced with this kind of problem, you should look into the GIS support for your data storage solution, since being able to run the bounding-shape query directly in the database can be considerably more efficient.
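
For example, with PostGIS (just one illustration; the functions and operators differ by database), the bounds filter from earlier can be pushed down into the query rather than applied in application code:

// illustrative only: points_of_interest and its geom column (SRID 4326) are hypothetical,
// sql is a groovy.sql.Sql instance, and the min/max values come from the same bounds parsing as before
def rows = sql.rows('''
    select id, longitude, latitude
    from points_of_interest
    where geom && ST_MakeEnvelope(?, ?, ?, ?, 4326)
''', [minLon, minLat, maxLon, maxLat])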

Javassist - Mind Blown

25 May 2013 ~ blog, java

I have been doing a lot with Java reflection recently in one of my personal projects and while doing some research I came across the Javassist bytecode manipulation API.

Javassist allows you to create new classes and/or manipulate existing classes at runtime... at the bytecode level, and it does so without you having to understand all the deep-down details of class files.

Let's take an example and say that I have an interface:

package jsist;

public interface Greeter {

	String sayHello( String name );
	
	String sayGoodbye( String name );
}

It's very easy to dynamically implement that interface at runtime, but first we need a little demo application:

package jsist;

import javassist.*;

public class Demo {

    private static final ClassPool CLASS_POOL = ClassPool.getDefault();
    private static CtClass STRING_CLASS;

    static {
        try{
            STRING_CLASS = CLASS_POOL.get( "java.lang.String" );
        } catch( NotFoundException e ){
            e.printStackTrace();
        }
    }

    public static void main( final String[] args ) throws Exception {
        useIt( implementIt() );
    }

    private static Class implementIt() throws Exception {
        // will contain our javassist code
        return null; // placeholder - replaced by the implementations below
    }

    private static void useIt( Class clazz ) throws Exception {
        System.out.println( clazz );

        Greeter greeter = (Greeter)clazz.newInstance();

        System.out.println("Hi : " + greeter.sayHello("Bytecode"));
        System.out.println("Bye: " + greeter.sayGoodbye( "Java" ));
    }
}

This will give us a simple test bed for the various dynamic implementations of the Greeter interface. Basically, it builds an implementation of the interface, prints out the class and the result of executing the two methods. Now for the fun part.

Our first example will be a simple implementation of the interface:

private static Class implementIt() throws Exception {
	CtClass greeterClass = CLASS_POOL.makeClass("jsist.gen.GreeterImpl");
	greeterClass.addInterface( CLASS_POOL.get("jsist.Greeter") );

	CtMethod sayHelloMethod = new CtMethod( STRING_CLASS, "sayHello", new CtClass[]{STRING_CLASS}, greeterClass );
	greeterClass.addMethod( sayHelloMethod );
	sayHelloMethod.setBody( "{return \"Hello, \" + $1;}" );

	CtMethod sayGoodbyeMethod = new CtMethod( STRING_CLASS, "sayGoodbye", new CtClass[]{STRING_CLASS}, greeterClass );
	greeterClass.addMethod( sayGoodbyeMethod );
	sayGoodbyeMethod.setBody( "return \"Goodbye, \" + $1;" );

	greeterClass.setModifiers(greeterClass.getModifiers() & ~Modifier.ABSTRACT);

	return greeterClass.toClass();
}

We start off by creating a new class called jsist.gen.GreeterImpl - the package does not need to exist; it will be created. We then need to add the interface we want to implement, the jsist.Greeter interface. Next we have to provide method implementations.

It feels a bit odd to create a CtMethod object with the greeterClass instance and then add the method to the instance, but this is the pattern that is used. I am sure there must be some internal reason for doing so.

The setBody(String) method is the key worker here. It allows you to provide source code as a template using the Javassist source template language. What I have done above is equivalent to:

return "Hello, " + arg0;

for the sayHello(String) method, and similarly for the other. The important thing to note here is that the provided source is compiled down to Java bytecode; this is not some embedded scripting language.

Next we need to change the modifiers of the class to remove "abstract", and then with a call to the toClass() method we have a standard Java Class object representing our newly created implementation.

If you run the demo with this, you will get:

class jsist.gen.GreeterImpl
Hi : Hello, Bytecode
Bye: Goodbye, Java

Ok, that was fun, but how about an abstract class? Let's say we have an abstract implementation of the Greeter interface:

public abstract class AbstractGreeter implements Greeter {

    @Override
    public String sayGoodbye( String name ){
        return "(Abstract) Goodbye, " + name;
    }
}

Note that I have implemented the sayGoodbye(String) method but not sayHello(String), to make things more interesting. Our implementation of the implementIt() method now becomes:

private static Class implementIt() throws Exception {
	CtClass greeterClass = CLASS_POOL.makeClass( "jsist.gen.GreeterImpl" );
	greeterClass.setSuperclass( CLASS_POOL.get("jsist.AbstractGreeter") );

	CtMethod sayHelloMethod = new CtMethod( STRING_CLASS, "sayHello", new CtClass[]{STRING_CLASS}, greeterClass );
	greeterClass.addMethod( sayHelloMethod );
	sayHelloMethod.setBody( "{return \"Hello, \" + $1;}" );

	greeterClass.setModifiers(greeterClass.getModifiers() & ~Modifier.ABSTRACT);

	return greeterClass.toClass();
}

The first difference to note is that now we are setting the superclass rather than the interface, since our superclass already implements the interface. Also, notice that since we already have an implementation of the sayGoodbye(String) method, we only need to implement sayHello(String). Other than that, there is little difference. When you run with this implementation you get:

class jsist.gen.GreeterImpl
Hi : Hello, Bytecode
Bye: (Abstract) Goodbye, Java

As expected, our dynamic implementation plays nicely with the concrete implementation.

Now, what if you already have objects that implement the functionality of the two interface methods, but that do not implement the Greeter interface? Say, we have:

public class Hello {

    public String say( String name ){
        return "(Delegate) Hello, " + name;
    }
}

public class Goodbye {

    public String say( String name ){
        return "(Delegate) Goodbye, " + name;
    }
}

You can easily implement the interface by copying the methods from these classes:

private static Class implementIt() throws Exception {
	CtClass greeterClass = CLASS_POOL.makeClass("jsist.gen.GreeterImpl");
	greeterClass.addInterface( CLASS_POOL.get("jsist.Greeter") );

	CtClass helloClass = CLASS_POOL.get( "jsist.Hello" );
	CtMethod helloSay = helloClass.getMethod( "say", "(Ljava/lang/String;)Ljava/lang/String;" );

	CtMethod sayHelloMethod = new CtMethod( STRING_CLASS, "sayHello", new CtClass[]{STRING_CLASS}, greeterClass );
	greeterClass.addMethod( sayHelloMethod );
	sayHelloMethod.setBody( helloSay, null );


	CtClass gbClass = CLASS_POOL.get( "jsist.Goodbye" );
	CtMethod gbSay = gbClass.getMethod( "say", "(Ljava/lang/String;)Ljava/lang/String;" );

	CtMethod sayGoodbyeMethod = new CtMethod( STRING_CLASS, "sayGoodbye", new CtClass[]{STRING_CLASS}, greeterClass );
	greeterClass.addMethod( sayGoodbyeMethod );
	sayGoodbyeMethod.setBody( gbSay, null );

	greeterClass.setModifiers(greeterClass.getModifiers() & ~Modifier.ABSTRACT);

	return greeterClass.toClass();
}

This version is similar to the original interface implementation, except that now, rather than providing source code for the method bodies, we provide a method object. You first find the Hello class in the ClassPool and then find its say(String) method - the descriptor string is the formal JVM method descriptor format, but I found it simple to dump out the methods and just copy the descriptor as a shortcut.
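
If you don't feel like working the descriptor out by hand, you can ask Javassist for it; a quick sketch (written here in Groovy for brevity - the calls are just the Javassist API):

CtClass helloClass = ClassPool.getDefault().get( 'jsist.Hello' )

helloClass.declaredMethods.each { m ->
    println "${m.name} ${m.signature}"   // prints: say (Ljava/lang/String;)Ljava/lang/String;
}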

If you run this version, you get:

class jsist.gen.GreeterImpl
Hi : (Delegate) Hello, Bytecode
Bye: (Delegate) Goodbye, Java

This shows that both methods came from the delegate classes.

For our final example, to round things out, let's go back to the abstract class and provide a delegate for the abstract method rather than source:

private static Class implementIt() throws Exception {
	CtClass greeterClass = CLASS_POOL.makeClass( "jsist.gen.GreeterImpl" );
	greeterClass.setSuperclass( CLASS_POOL.get("jsist.AbstractGreeter") );

	CtClass helloClass = CLASS_POOL.get( "jsist.Hello" );
	CtMethod helloSay = helloClass.getMethod( "say", "(Ljava/lang/String;)Ljava/lang/String;" );

	CtMethod sayHelloMethod = new CtMethod( STRING_CLASS, "sayHello", new CtClass[]{STRING_CLASS}, greeterClass );
	greeterClass.addMethod( sayHelloMethod );
	sayHelloMethod.setBody( helloSay, null );

	greeterClass.setModifiers(greeterClass.getModifiers() & ~Modifier.ABSTRACT);

	return greeterClass.toClass();
}

There is not really anything here that you have not already seen, but when you run it you see:

class jsist.gen.GreeterImpl
Hi : (Delegate) Hello, Bytecode
Bye: (Abstract) Goodbye, Java

As expected, one method provided by the delegate and one by the abstract class' implementation.

There are other bytecode manipulation libraries, but most of the ones I looked at seemed to be very abstract or much closer to the actual class file format, whereas Javassist is a lot more familiar when coming from a Java reflection background.

It seems very powerful and full of interesting potential. I am by no means an expert with it, but I wanted to share what I had found since the documentation is reasonably good, but not very rich with examples.

JUnit Rules

28 March 2013 ~ blog, java, groovy, testing

No, the title is not simply an expression of my love of JUnit, but rather specifies that I will be talking
about the @Rule annotations provided by JUnit... and yes, they do "rule".

Out of the box, JUnit has a handful of useful rules defined for things like
temporary folders and test timeouts. With this post I am going to focus on writing my own rules using extensions of the
ExternalResource rule class.
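
Just to show the shape of a built-in rule before rolling our own, the TemporaryFolder rule gives each test a scratch directory that JUnit deletes afterward - a minimal sketch (imports for org.junit.Rule, org.junit.Test and org.junit.rules.TemporaryFolder omitted, as with the other snippets):

class ScratchFileTest {

    @Rule public TemporaryFolder folder = new TemporaryFolder()

    @Test void 'writes a scratch file'(){
        File scratch = folder.newFile('scratch.txt')
        scratch.text = 'hello'

        assert scratch.text == 'hello'
    }
}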

Suppose we are doing some unit testing of database access code using JDBC. Mocking direct JDBC calls is tedious and not very
productive, so we will be using a real database for testing. To keep this post as simple and straightforward as possible
without forsaking useful content, I am going to use Groovy for the examples and assume that we are using the Spring JDBC
framework and some random database.

We have a PersonDao for storing the name and email address of people in
the database.

class PersonDao {
    JdbcTemplate jdbcTemplate

    void createPerson( person ){
        jdbcTemplate.update('insert into people (name,email) values (?,?)', person.name, person.email )
    }
}

We are only going to worry about a simple create operation since we are discussing the rules, not the testing itself.
We first need to have a test case to work with:

class PersonDaoTest {
    private PersonDao personDao

    @Before void before(){
        personDao = new PersonDao(
            jdbcTemplate: null // ?
        )
    }
}

Right out of the gate we run into our first hurdle... we need a JdbcTemplate to inject. We could just connect to a
database or fire up an embedded database right here and move on, but we can assume that if there is one of these
tests, there will be many, so a reusable solution would be best. Enter the JUnit rules. Basically, the rules are just
reusable code that implements a simple interface to provide operations before and after test classes or methods
(depending on the rule annotation).

For our first rule, we want to setup a database environment to test with.

class DatabaseEnvironment extends ExternalResource {
    DataSource dataSource

    JdbcTemplate getJdbcTemplate(){
        new JdbcTemplate(dataSource: dataSource)
    }

    @Override
    protected void before() throws Throwable {
        Connection conn
        try {
            conn = getDataSource().getConnection()
            final Liquibase liquibase = new Liquibase(
                "src/main/resources/changelog.xml",
                new FileSystemResourceAccessor(),
                new JdbcConnection( conn )
            )
            liquibase.dropAll()
            liquibase.update( "test" )
        } catch( ex ){
            fail(ex.message)
        } finally {
            conn?.close()
        }
    }
}

Remember, we are assuming that you have some DataSource that you are using for testing. When the before() method is
called, our database is destroyed if it exists and is then recreated in a fresh, empty state. I am using Liquibase for
database management, but any means of creating and destroying your database would work here.

Note that I do not destroy the database in the after() method. This is intentional; it allows you to investigate the data conditions of a failed test.

We can now integrate this into the test case and move forward:

class PersonDaoTest {

    @ClassRule public static DatabaseEnvironment dbEnvironment = new DatabaseEnvironment(
        dataSource: myTestDataSource // you must define somewhere
    )

    private PersonDao personDao

    @Before void before(){
        personDao = new PersonDao(
            jdbcTemplate: dbEnvironment.jdbcTemplate
        )
    }
}

I defined the DatabaseEnvironment as a @ClassRule so that the database is created once for each test class, rather than
for every test method. Now we can add an actual test method.

class PersonDaoTest {

    @ClassRule public static DatabaseEnvironment dbEnvironment = new DatabaseEnvironment(
        dataSource: myTestDataSource // you must define somewhere
    )

    private PersonDao personDao

    @Before void before(){
        personDao = new PersonDao(
            jdbcTemplate: dbEnvironment.jdbcTemplate
        )
    }

    @Test void 'createPerson: simple'(){
        personDao.createPerson([ name:'Chris', email:'chris@stehno.com' ])

        assert 1 == JdbcTestUtils.countRowsInTable(dbEnvironment.jdbcTemplate, 'people')
    }
}

The test runs and passes with a fresh database every time. There is still a hidden problem here, though; let's add another
test method. This is a bit arbitrary, but let's test the case where you add a person with no email address (successfully);
we add the following test method:

@Test void 'createPerson: no email'(){
    personDao.createPerson([ name:'Chris' ])

    assert 1 == JdbcTestUtils.countRowsInTable(dbEnvironment.jdbcTemplate, 'people')
}

Now, if you run all the tests (not just the one you added), the test will fail with a value of 2 where 1 was expected.
Why? The database is created per-class, not per-test, so you are working with a database that already has
data in it. To get around this we could make the database environment work per-test, but depending on how large your schema is,
this could be time consuming and greatly increase your test runtime. What we want is to clean up the existing database
in-place before each test. Another ExternalResource rule to the rescue!

class DatabaseCleaner extends ExternalResource {
    JdbcTemplate jdbcTemplate

    def tables = []

    @Override
    protected void before() throws Throwable {
        tables.each { table->
            jdbcTemplate.execute("truncate table $table cascade")
        }
    }
}

Here we have defined an ExternalResource rule which will truncate a specified collection of tables each time the before()
method is called. We want to use this as an instance rule, and again, we do nothing in the after() method so that the
data is left in place for investigation when a test fails. Our test case becomes:

class PersonDaoTest {

    @ClassRule public static DatabaseEnvironment dbEnvironment = new DatabaseEnvironment(
        dataSource: myTestDataSource // you must define somewhere
    )

    @Rule public DatabaseCleaner dbCleaner = new DatabaseCleaner(
        jdbcTemplate: dbEnvironment.jdbcTemplate,
        tables:['people']
    )

    private PersonDao personDao

    @Before void before(){
        personDao = new PersonDao(
            jdbcTemplate: dbEnvironment.jdbcTemplate
        )
    }

    @Test void 'createPerson: simple'(){
        personDao.createPerson([ name:'Chris', email:'chris@stehno.com' ])

        assert 1 == JdbcTestUtils.countRowsInTable(dbEnvironment.jdbcTemplate, 'people')
    }

    @Test void 'createPerson: no email'(){
        personDao.createPerson([ name:'Chris' ])

        assert 1 == JdbcTestUtils.countRowsInTable(dbEnvironment.jdbcTemplate, 'people')
    }
}

Now when we run the whole test case, we have both tests passing because before each test method, the database is cleaned
in-place.

With just these two rules we have created a stable and flexible means of testing database code. With configuration you
can point your tests at an in-memory database, a locally running database, or a shared database server. For normal unit
testing I would recommend either an embedded database or, when that is not possible, a database running local to the
testing machine, but those strategies will have to be discussed another time. As a quick taste, though, here is what an
embedded test DataSource can look like.
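
This is only a rough sketch, assuming the spring-jdbc embedded-database support and H2 on the test classpath; any embedded database would do:

import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseBuilder
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseType

// an in-memory H2 database that lives for the duration of the test run
def myTestDataSource = new EmbeddedDatabaseBuilder()
    .setType( EmbeddedDatabaseType.H2 )
    .build()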


Older posts are available in the archive.