Vanilla TextFileReader/Writer

06 March 2016 ~ blog groovy vanilla

Something I have found myself doing quite often over my whole career as a developer is reading and writing simple text file data. Whether it is a quick data dump or a data set to be loaded from a third party, I end up doing it a lot, and it is usually coded mostly from scratch since, surprisingly enough, there are very few tools available for working with formatted text files. Sure, there are a few for CSV, but quite often I get a request to read or write a format that is similar to CSV yet just different enough to break a standard CSV parser. Recently, I decided to add some utility components to my Vanilla project with the aim of making these readers and writers simpler to build.

Let’s start off with the com.stehno.vanilla.text.TextFileWriter and say we have a data source of Person objects in our application that the business wants dumped out to a text file (so they can import it into some business tools that only ever seem capable of importing simple text files). In the application, the data structure looks something like this:

class Person {
    String firstName
    String middleName
    String lastName
    int age
    float height
    float weight
}

With the TextFileWriter you need to define a LineFormatter, which will be used to format the generated lines of text, one per object written. The LineFormatter defines two methods: String formatComment(String) for formatting a comment line, and String formatLine(Object) for formatting a data line. A simple implementation is provided: the CommaSeparatedLineFormatter generates comment lines prefixed with a # and expects a Collection object, which it formats as a CSV line.
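To make that contract concrete, here is a minimal, self-contained sketch of the two-method shape described above. The names SketchLineFormatter and SketchCsvFormatter are hypothetical stand-ins, not Vanilla classes, and the exact prefix and quoting rules of the real CommaSeparatedLineFormatter are assumed rather than reproduced.

```groovy
// Hypothetical stand-in mirroring the two methods described above;
// this is NOT the Vanilla interface itself.
interface SketchLineFormatter {
    String formatComment(String text)
    String formatLine(Object object)
}

// Assumed behavior: '#'-prefixed comments, Collection items joined with commas.
// No quoting or escaping is attempted.
class SketchCsvFormatter implements SketchLineFormatter {
    String formatComment(String text) {
        "# $text"
    }

    String formatLine(Object object) {
        (object as Collection).join(',')
    }
}

SketchLineFormatter formatter = new SketchCsvFormatter()

// The writing loop knows nothing about the format; it only delegates.
List<String> lines = [formatter.formatComment('name,age')]
[['Smith', 42], ['Jones', 38]].each { row ->
    lines << formatter.formatLine(row)
}

String csvText = lines.join('\n')
println csvText
```

The point of the split is that the loop at the bottom never changes when the format does; only the formatter is swapped.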

The available implementation will not work for our case, so we will need to define our own LineFormatter. We want the formatted data lines to be of the form:

# Last-Name,First-Name-Middle-Initial,Attrs
Smith,John Q,{age:42, height:5.9, weight:230.5}

Yes, that’s a bit of a convoluted format, but I have had to generate worse. Our LineFormatter ends up being something like this:

class PersonLineFormatter implements LineFormatter {

    @Override
    String formatComment(String text) {
        "# $text" // (1)
    }

    @Override
    String formatLine(Object object) {
        Person person = object as Person
        // Note: middleName[0] assumes every Person has a non-empty middle name.
        "${person.lastName},${person.firstName} ${person.middleName[0]},{age:${person.age}, height:${person.height}, weight:${person.weight}}" // (2)
    }
}
  1. We specify the comment as being prefixed by a # symbol.

  2. Write out the Person object as the formatted String.
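One caveat with the formatter above is that indexing middleName[0] fails when the middle name is null or empty. The following is a small sketch of a guard. SketchPerson is a hypothetical stand-in for the Person class so the snippet runs on its own, and dropping the initial entirely when it is missing is an assumption about what the target format allows.

```groovy
// Hypothetical stand-in for the article's Person class so the sketch runs alone.
class SketchPerson {
    String firstName
    String middleName
    String lastName
}

// Returns "First M" when a middle name exists, otherwise just "First".
// Groovy truth treats both null and '' as false, so both cases are covered.
Closure<String> givenNames = { SketchPerson p ->
    p.middleName ? "${p.firstName} ${p.middleName[0]}".toString() : p.firstName
}

String withMiddle = givenNames(new SketchPerson(firstName: 'John', middleName: 'Quincy', lastName: 'Smith'))
String withoutMiddle = givenNames(new SketchPerson(firstName: 'Jane', lastName: 'Doe'))

println withMiddle
println withoutMiddle
```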

We see that implementing the LineFormatter keeps all the application-specific logic isolated from the common operation of actually writing the file. Now we can use our formatter as follows:

TextFileWriter writer = new TextFileWriter(
    lineFormatter: new PersonLineFormatter(),
    filePath: new File(outputDir, 'people.txt')
)

writer.writeComment('Last-Name,First-Name-Middle-Initial,Attrs')

Collection<Person> people = peopleDao.listPeople()

people.each { Person p ->
    writer.write(p)
}

This will write out the text file in the desired format with very little new coding required.

Generally, writing out text representations of application data is not all that challenging, since you have access to the data you need and some control over the formatting of the objects to be represented. The real challenge is usually going in the other direction, when you are reading in a data file from some external source. This is where the com.stehno.vanilla.text.TextFileReader becomes useful.

Let’s say you receive a request to import the data file we described above; maybe it was generated by the same business tools I mentioned earlier. We have something like this:

# Last-Name,First-Name-Middle-Initial,Attrs
Smith,John Q,{age:42, height:5.9, weight:230.5}
Jones,Robert M,{age:38, height:5.6, weight:240.0}
Mendez,Jose R,{age:25, height:6.1, weight:232.4}
Smalls,Jessica X,{age:30, height:5.5, weight:175.2}

The TextFileReader requires a LineParser to parse the input file lines into objects. It defines three methods: boolean parsable(String), which determines whether or not the line should be parsed; Object[] parseLine(String), which parses the line of text; and Object parseItem(Object, int), which parses an individual element of the comma-separated line. A default implementation is provided: the CommaSeparatedLineParser parses simple comma-separated lines of text into arrays of Objects based on configured item converters. However, it will not work for our file, since there are commas inside the data items themselves (the JSON-like format of the last element), so we need to implement our own. Our LineParser will look something like the following:
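To see why a plain comma split is not enough here, this short sketch splits one sample line on every comma. It uses only standard Groovy string methods and no Vanilla classes.

```groovy
String sample = 'Smith,John Q,{age:42, height:5.9, weight:230.5}'

// Splitting on every comma also cuts through the braces,
// so the three logical fields come back as five tokens.
List<String> tokens = sample.split(',') as List<String>

tokens.eachWithIndex { String token, int i ->
    println "$i: '$token'"
}
```

The last three tokens still carry the braces, the attribute names, and leading spaces, which is why the custom parser has to clean up each piece by position.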

class PersonLineParser implements LineParser {

    private static final String HASH = '#'

    boolean parsable(String line){
        line && !line.startsWith(HASH) // (1)
    }

    Object[] parseLine(String line){ // (2)
        int idx = 0
        // The split yields five pieces because of the commas inside the braces.
        def elements = line.split(',').collect { parseItem(it, idx++) }

        [
            new Person(
                firstName:elements[1][0],
                middleName:elements[1][1],
                lastName:elements[0],
                age:elements[2],
                height:elements[3],
                weight:elements[4]
            )
        ] as Object[]
    }

    // Smith,John Q,{age:42, height:5.9, weight:230.5}
    // 0    ,1     ,2      ,3          ,4
    Object parseItem(Object item, int index){ // (3)
        switch(index){
            case 0:
                return item as String
            case 1:
                return item.split(' ')
            case 2:
                return item.split(':')[1] as int
            case 3:
                return item.split(':')[1] as float
            case 4:
                // Drop the trailing '}' before converting.
                return item.split(':')[1][0..-2] as float
        }
    }
}
  1. We want to ignore blank lines or lines that start with a # symbol.

  2. We extract the line items and build the Person object.

  3. We convert the line items to our desired types.
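Positional splitting depends on every comma landing exactly where the parser expects it. One alternative, sketched here under the assumption that every data line matches the sample layout exactly, is to match the whole line with a single regular expression. SketchPerson and RegexPersonParser are hypothetical names used only for this illustration; they are not part of Vanilla.

```groovy
import java.util.regex.Matcher
import java.util.regex.Pattern

// Hypothetical stand-in for the article's Person class so the sketch runs alone.
class SketchPerson {
    String firstName
    String middleName
    String lastName
    int age
    float height
    float weight
}

class RegexPersonParser {
    // Assumed layout: Last,First M,{age:N, height:F, weight:F}
    static final Pattern LINE =
        ~/([^,]+),(\S+) (\S+),\{age:(\d+), height:([\d.]+), weight:([\d.]+)\}/

    static SketchPerson parse(String line) {
        Matcher m = LINE.matcher(line)
        // matches() requires the entire line to fit the pattern.
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized line: $line")
        }
        new SketchPerson(
            lastName: m.group(1),
            firstName: m.group(2),
            middleName: m.group(3),
            age: m.group(4).toInteger(),
            height: m.group(5).toFloat(),
            weight: m.group(6).toFloat()
        )
    }
}

SketchPerson parsed = RegexPersonParser.parse('Smith,John Q,{age:42, height:5.9, weight:230.5}')
println "${parsed.lastName}, ${parsed.firstName} ${parsed.middleName}, age ${parsed.age}"
```

The trade-off is that a malformed line fails loudly with an exception instead of producing a partially populated object, which may or may not be what an import job wants.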

It’s not pretty, but it does the job and keeps all the line parsing logic out of the main file loading functionality. Our code to read in the file would look something like:

TextFileReader reader = new TextFileReader(
    filePath: new File(inputDir, 'people.txt'),
    lineParser: new PersonLineParser(),
    firstLine: 2 // (1)
)

def people = []

reader.eachLine { Object[] data ->
    // Each parsed line is a one-element array holding the Person.
    people << data[0]
}
  1. We skip the first line, since it will always be the header.
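The interplay between the starting line and the parsable check can be sketched in plain Groovy. This is not the TextFileReader source; readParsable is a hypothetical helper, and it assumes firstLine is 1-based, as the callout above implies.

```groovy
// Hypothetical sketch of the read loop; this is not the TextFileReader source.
// Assumes firstLine is 1-based: a value of 2 skips exactly one line.
List<String> readParsable(String text, int firstLine, Closure<Boolean> parsable) {
    List<String> kept = []
    text.readLines().eachWithIndex { String line, int index ->
        int lineNumber = index + 1
        if (lineNumber >= firstLine && parsable(line)) {
            kept << line
        }
    }
    kept
}

String input = '''# Last-Name,First-Name-Middle-Initial,Attrs
Smith,John Q,{age:42, height:5.9, weight:230.5}

Jones,Robert M,{age:38, height:5.6, weight:240.0}
'''

// Same test as the parser above: skip blank lines and '#' comments.
List<String> dataLines = readParsable(input, 2) { String line ->
    line && !line.startsWith('#')
}

dataLines.each { println it }
```

In this sketch the header is excluded twice over, once by the starting line and once by the comment check, so either guard alone would be enough for this particular file.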

The provided implementations for both the LineFormatter and LineParser will not account for every scenario, but hopefully they will cover some of them and provide a guideline for implementing your own. If nothing else, these components help to streamline the reading and writing of formatted text data so that you can get it done and focus on other, more challenging development tasks.


CoffeaElectronica.com content is copyright © 2016 Christopher J. Stehno and available under a Creative Commons Attribution-ShareAlike 4.0 International License.