[ [ A0, A1, A2, A3, A4, A5 ], [ B0, B1, B2, B3, B4, B5 ], [ C0, C1, C2, C3, C4, C5 ] ]
31 October 2015 ~ blog
A few years ago, I was working on a project where we had collections of data spread across multiple rows of data… and then we had to provide a paginated view of that data. This research was the result of those efforts. The discussion here is a bit more rigorous than I usually go into, so if you just want the implementation code jump to the bottom.
Consider that you have a data set representing a collection of collections:
[ [ A0, A1, A2, A3, A4, A5 ], [ B0, B1, B2, B3, B4, B5 ], [ C0, C1, C2, C3, C4, C5 ] ]
We want to retrieve the data in a paginated fashion where the subset (page) with index P
and subset size (page size) S
is used to retrieve only the desired elements in the most efficient means possible.
Consider also that the data sets may be very large and that the internal collections may not be directly associated with the enclosing collection (e.g. two different databases).
Also consider that the subsets may cross collection boundaries or contain fewer than the desired number of elements.
Lastly, requests for data subsets will be more likely discrete events – one subset per request, rather than iterating over all results.
For a page size of four (S = 4
) you would have the following five pages:
P0 : [ A0, A1, A2, A3 ] P1 : [ A4, A5, B0, B1 ] P2 : [ B2, B3, B4, B5 ] P3 : [ C0, C1, C2, C3 ] P4 : [ C4, C5 ]
The overall collection is traversed to determine how many elements are contained within each subcollection; this may be precomputed or done at runtime. Three counts are calculated or derived for each subcollection:
Count (CI
)  the number of elements in the subcollection.
Countbefore (CB
)  the total count of all subcollection elements counted before this collection, but not including this collection.
Countwith (CW
)  the total count of all subcollection elements counted before and including this collection.
For our example data set we would have:
[ { CI:6, CB:0, CW:6 [ A0, A1, A2, A3, A4, A5 ] }, { CI:6, CB:6, CW:12 [ B0, B1, B2, B3, B4, B5 ] }, { CI:6, CB:12, CW:18 [ C0, C1, C2, C3, C4, C5 ] } ]
This allows for a simple means of selecting only the subcollections we are interested in; those containing the desired elements based on the starting and ending indices for the subset (START
and END
respectively). These indices can easily be calculated as:
START = P * S END = START + S – 1
Note

The indices referenced here are for the overall collection, not the individual subcollections. 
The desired elements will reside in subcollections whose inclusive count (CW
) is greater than the starting index and whose preceding count (CB
) is less than or equal to the ending index, or:
CW > START and CB ≤ END
For the case of selecting the second subset of data (P = 1
) with a page size of four (S = 4
) we would have:
START = 4 END = 7
This will select the first two or the three subcollections as "interesting" subcollections containing at least some of our desired elements, namely:
{ CI:6, CB:0, CW:6 [ A0, A1, A2, A3, A4, A5 ] }, { CI:6, CB:6, CW:12 [ B0, B1, B2, B3, B4, B5 ] }
What remains is to gather from these subcollections (call them SC[0]
, SC[1]
) the desired number of elements (S
).
To achieve this, a local starting and ending index must be calculated while iterating through the "interesting" subcollections to gather the elements until either the desired amount is obtained (S
) or there are no more elements available.
Calculate the initial local starting index (LOCAL_START
) by subtracting the noninclusive preceding count value of the first selected collection (SC[0]
) from the overall starting index.
Iterate the selected collections (in order) until the desired amount has been gathered
This is more clearly represented in pseudo code as:
LOCAL_START = START – SC[0].CB REMAINING = S foreach sc in SC while REMAINING > 0 if( REMAINING < (sc.size()  LOCAL_START) ) LOCAL_END = LOCAL_START + REMAINING  1 else LOCAL_END = sc.size()1 FOUND = sc.sub( LOCAL_START, LOCAL_END ) G.addAll( FOUND ) REMAINING = REMAINING – FOUND.size() LOCAL_START = 0 end
Where the gathered collection of elements (G
) is your resulting data set containing the elements for the specified data page.
It must be stated that the ordering of the overall collection and the subcollections must be consistent across multiple data requests for this procedure to work properly.
Ok now, enough discussion. Let’s see what this looks like with some real Groovy code. First, we need our collections of collections data to work with:
def data = [
[ 'A0', 'A1', 'A2', 'A3', 'A4', 'A5' ],
[ 'B0', 'B1', 'B2', 'B3', 'B4', 'B5' ],
[ 'C0', 'C1', 'C2', 'C3', 'C4', 'C5' ]
]
Next, we need to implement the algorithm in Groovy:
int page = 1
int pageSize = 4
// precomputation
int before = 0
def prepared = data.collect {d >
def result = [
countIn: d.size(),
countBefore: before,
countWith: before + d.size(),
values:d
]
before += d.size()
return result
}
// main computation
def localStart = (page * pageSize )  prepared[0].countBefore
def remaining = pageSize
def gathered = []
prepared.each { sc>
if( remaining ){
def localEnd
if( remaining < (sc.values.size()  localStart) ){
localEnd = localStart + remaining  1
} else {
localEnd = sc.values.size()  1
}
def found = sc.values[localStart..localEnd]
gathered.addAll(found)
remaining = found.size()
localStart = 0
}
}
println "P$page : $gathered"
which yields
P1 : [A4, A5, B0, B1]
and if you look all the way back up to the beginning of the article, you see that this is the expected data set for page 1 of the example data.
It’s not a scenario I have run into often, but it was a bit of a tricky one to unravel. The precomputation steps ended up being the key to keeping it simple and stable.