*By Stephen Lazaro*

In the first part of this series, we looked at how to compose different reductions over a single report to efficiently calculate multiple summaries of a report in a single pass. In this blog post, we continue with the same main strategy, this time articulating a very explicit applicative style to pull structured data from a relatively flat format. In this way, we achieve flexible, maintainable, and composable code to wrap a fairly traditional Java library API.

### Study Two: Composable Excel Sheet Processing

Let’s suppose we want — for whatever reason — to parse an Excel spreadsheet. On the JVM there are options of libraries that provide an API allowing us to access the contents of Excel spreadsheets but many of them are, we might say, venerable.

Their APIs are traditional, by and large; sometimes they initialize a fair amount of state, and they handle errors via exceptions. It can be painful to integrate such code into a more functional code base without some effort. Using Scala’s type system we can apply similar ideas as our previous post in a rigorous way to get a flexible and safe API.

Let’s approach this study in a similar fashion as to our last one. Here’s some example code that is meant to parse a single row using Apache POI, one example of an Excel integration library for the JVM:

final case class Loan( | |

id: Int, | |

balance: Double, | |

loanState: String | |

) | |

val workbook = WorkbookFactory.create(i).getSheetAt(0) | |

val idIndex = 0 | |

val balanceIndex = 1 | |

val loanStateIndex = 2 | |

val balances: Iterator[Balance] = sheet.iterator() | |

.asScala | |

.map(row => { | |

val id = row.getCell(idIndex).toInt | |

val balance = row.getCell(balanceIndex).toDouble | |

val loanState = row.getCell(loanStateIndex) | |

Loan(id, balance, loanState) | |

}) |

We notice immediately that, just like our last study, this code certainly works, but it manages to combine walking the iterator with other concerns. More importantly, it has some other troubles. This code is wildly unsafe, in the sense that `getSheet`

, `getCell`

, `toInt`

, and `toDouble`

can all throw exceptions! Since functional programs don’t make a habit of catching and handling exceptions with the traditional machinery, this leads to one bad record knocking over our entire application. Thankfully, in Scala we have tools for handling these issues.

Let’s repeat our procedure from the last post of focusing in closely on one element and attempting to generalize from there.

Here’s a function that gets the ID field from a row:

def getId(row: Row): Int = row.getCell(0).toInt |

As noted before, this is both unsafe and not obviously composable.

We could do this, but it has a few downsides:

def getId(row: Row): Int = try { | |

row.getCell(0).toInt | |

} catch (e) { | |

-1 | |

} |

It’s not quite clear how this is meant to compose with other decoders, and we have to force some value to model failure. The type of this function also does not reflect it’s error handling! We could use a type like `Option`

or `Either`

and manually return from the branches, but we can do something more idiomatic.

In order to handle the errors, we can use the `Try`

abstraction that captures errors and then convert to `Either`

so we can access those errors:

def getId(row: Row): Either[Throwable, Int] = Try(row.getCell(0).toInt).toEither |

Looks good. Entirely safe for our program. But again, how will we combine these? Afterall, we’ll want to get more than one cell from each row!

Let’s take a step back. We need a function of type `Row => Either[Throwable, A]`

in order to parse a value of type `A`

from a row.

Let’s capture that in an abstract notion, same as we did in the last post:

// This trait is a named notion of a kind of thing, our decoders! | |

trait RowDecoder[A] { | |

def decode(row: Row): Either[Throwable, A] | |

} | |

// Some functions related to decoders in general. | |

object RowDecoder { | |

// A way to easily make decoders without too much boilerplate | |

def from(f: Row => Either[Throwable, A]): RowDecoder[A] = | |

new RowDecoder[A] { | |

def decode(row: Row) = f(row) | |

} | |

// Just some useful boilerplate for summoning a RowDecoder | |

// Don't need to know this mechanic, it just allows RowDecoder[Int] to compile | |

def apply[A](implicit rowDecoder: RowDecoder[A]): RowDecoder[A] = rowDecoder | |

} |

Now we have a named *notion* of what it is to decode something out of a row of an Excel spreadsheet.

What we want is a function with the following signature:

// We want to decode _multiple things_ from _one row_ | |

def combine[A, B](left: RowDecoder[A], right: RowDecoder[B]): RowDecoder[(A, B)] = ??? | |

// For example, if we wanted to decode a `Debtor` we want this to work: | |

final case class Debtor(id: Int, name: String) | |

// Whip up a decoder using the decoders of the fields! | |

val debtor: RowDecoder[Debtor] = combine(RowDecoder[Int], RowDecoder[String]).map(Debtor.apply) |

If you’re familiar with functional programming, it immediately occurs to you that we’re talking about *monoidal functors* again (or *applicative functors*).

Let’s look at what we need to implement to get a monoidal functor for our `RowDecoder`

type:

// Aka an Applicative, though we won't talk about the connection here. | |

trait Monoidal[F[_]] { | |

// Named this way for Historical Reasons™ | |

def pure[A](x: A): F[A] | |

def map[A, B](fa: F[A])(f: A => B): F[B] | |

def product[A, B](fa: F[A], fb: F[B]): F[(A, B)] | |

} | |

object Monoidal { | |

// Some convenient boilerplate to "summon" our semigroupal | |

// again, ignore this if it doesn't make sense, allows Semigroupal[RowDecoder].map to work | |

def apply[F[_]](implicit semi: Monoidal[F]): Monoidal[F] = semi | |

} | |

// Specialized to RowDecoder! | |

implicit val monoidalForRowDecoder = new Monoidal[RowDecoder] { | |

def pure[A](x: A): RowDecoder[A] = ??? | |

def map[A, B](fa: RowDecoder[A])(f: A => B): RowDecoder[B] = ??? | |

def product[A, B](fa: RowDecoder[A], fb: RowDecoder[B]): RowDecoder[(A, B)] = ??? | |

} |

- A way to make a decoder that just always returns the value we ask of it (
`pure)`

. - A way to “post process” the result of decoding something from a row (
`map`

). - A way to combine two decoders into one decoder that returns both values (
`product`

).

How can we do that? Well, this will work:

implicit val monoidalForRowDecoder = new Monoidal[RowDecoder] { | |

// This always succeeds returning the given value! | |

def pure[A](x: A): RowDecoder[A] = | |

RowDecoder.from(Function.const(Right(x))) | |

// This processes the output of the given decode with the given function | |

def map[A, B](fa: RowDecoder[A])(f: A => B): RowDecoder[B] = | |

RowDecoder.from(x => fa.decode(x).map(f)) | |

// Combine the output of two into one! | |

def product[A, B](fa: RowDecoder[A], fb: RowDecoder[B]): RowDecoder[(A, B)] = | |

RowDecoder.from(x => fa.decode(x).flatMap(a => f.decode(x).map(b => (a, b)))) | |

} |

Now we can use it to combine our decoders! `Monoidal`

let’s define some goodies. If the definitions are opaque, *do not worry,* they are generally provided in libraries and one just pays attention to what they *do*:

def map2[F[_]: Monoidal, A, B, C](fa: F[A], fb: F[B])(f: (A, B) => C): F[C] = | |

Monoidal[F].map(Monoidal[F].product(fa, fb))(f.tupled) | |

def map3[F[_]: Monoidal](fa: F[A], fb: F[B], fc: F[C])(f: (A, B, C) => D): F[D] = | |

Monoidal[F].map( | |

Monoidal[F].product(Monoidal[F].product(fa, fb), fc)) | |

)({ | |

case ((a, b), c) => f(a, b, c) | |

}) |

What do these functions mean in terms of our use case?

`map2`

let’s us process the output of two different decoders, failing if either does.

`map3`

let’s us process the output of *three *different decoders after all of them produce, failing only when any one of them does.

Note that if a bad row is passed in it fails on the first decoding step to go wrong, and returns *that* error, going from left to right essentially. We could collect all errors using slightly different data structures, but for now let’s just sit with keeping only the first thing that goes wrong.

But now we have exactly what we needed in order to parse each row into our record type! Let’s see what this new API looks like:

def getSheet: Either[Throwable, Sheet] = Try(WorkbookFactory.create(i).getSheetAt(0)).toEither | |

val decodeId: RowDecoder[Int] = | |

RowDecoder.from(row => Try(row.getCell(0).toInt).toEither) | |

val decodeBalance: RowDecoder[Double] = | |

RowDecoder.from(row => Try(row.getCell(1).toDouble).toEither) | |

val decodeLoanState: RowDecoder[String] = | |

RowDecoder.from(row => Try(row.getCell(2)).toEither) | |

// Boom! Decode each row into a Loan object so long as we are able to decode each cell. | |

val decodeLoan: RowDecoder[Loan] = map3( | |

decodeId, | |

decodeBalance, | |

decodeLoanState | |

)(Loan.apply) | |

def processSheet[A: RowDecoder](sheet: Sheet): Iterator[Either[Throwable, Loan]] = | |

sheet.iterator() | |

.asScala | |

.map(RowDecoder[A].decode) | |

val records: Iterator[Either[Throwable, Loan]] = getSheet.flatMap(processSheet[Loan]) |

Et voila! We have ourselves a clean, compositional, and safe procedure for pulling data from an Excel spreadsheet with very clear division of responsibilities. We’ve already gone a nice distance towards maintainable and safe code!

On the other hand, things always change. Imagine our Project Manager passes us the new requirement that we need to parse `nextPaymentDate`

as well, so we enrich our models:

final case class LoanWithPaymentDate( | |

id: Int, | |

balance: Double, | |

loanState: String, | |

nextPaymentDate: Date | |

) |

No problem, right? Now we just make a `map4`

following the same pattern as `map2`

and `map3`

. But wait, looking at the file we realize that there are *multiple date formats* for the next payment date field! Both “yyyy/mm/dd” and “yyyy-mm-dd” at the very least.

Oh well, it’s the rare set of documents that never varies from a fixed schema. We don’t yet have any way to combine our decoders to capture choices between multiple possible strategies or shapes. Are we out of luck?

Let’s contemplate the problem in front of us:

// One way to parse a date: | |

val slashParser = DateTimeFormatter.ofPattern("yyyy/MM/dd") | |

// Another way to parse a date: | |

val dashParser = DateTimeFormatter.ofPattern("yyyy-MM-dd") | |

val withSlashes: RowDecoder[Date] = | |

RowDecoder.from(x => Either.catchNonFatal(slashParser.parse(x))) | |

val withDashes: RowDecoder[Date] = | |

RowDecoder.from(x => Either.catchNonFatal(dashParser.parse(x))) | |

// What do we do here ??? | |

val eitherWithSlashesOrDashes: RowDecoder[Date] = ??? |

We need a way to say “or” for decoders. How hard can it be?

It turns out if we extend the notions we’ve been working with a bit, we have just enough power to do this.

// Notice we're just extending our prior notion. | |

trait Alternative[F[_]] { | |

// Kind of like true, or 1 | |

def pure[A](x: A): F[A] | |

// Kind of like boolean AND, or multiplication | |

def product[A, B](fa: F[A], fb: F[B]): F[(A, B)] | |

// Kind of like false, or 0 | |

def empty[A]: F[A] | |

// Kind of like boolean XOR, or addition | |

// Named this way for Reasons™ | |

def combineK[A](x: F[A], y: F[A]): F[A] | |

} |

We have two new functions to write:

`empty`

, which takes any*type*and gives us a decoder of that type`combineK`

, which takes any two decoders of the same time, and lets us try them both

So now that we know what we need, how do we get it for our `RowDecoder`

? Try it out:

implicit val alternativeForRowDecoder = | |

new Alternative[RowDecoder] { | |

// Kind of like true, same as in our Monoidal above | |

def pure[A](x: A): RowDecoder[A] = | |

Monoidal[RowDecoder].pure(x) | |

// Kind of like AND, same as in our Monoidal abaove | |

def product[A, B](fa: RowDecoder[A], fb: RowDecoder[B]): RowDecoder[(A, B)] = | |

Monoidal[RowDecoder].product(fa, fb) | |

// Kind of like false but for decoders | |

def empty[A]: F[A] = | |

new RowDecoder[A] { | |

def apply(s: String) = Left(new RuntimeError("no dice")) | |

} | |

// Kind of like XOR but for decoders | |

def combineK[A](x: F[A], y: F[A]): F[A] = | |

RowDecoder.from(z => | |

x.decode(z) match { | |

case Left(_) => y.decode(z) | |

case Right(v) => Right(v) | |

}) | |

} |

The function `empty`

always fails to parse anything it is given. This makes sense because with no information about the type we want to decode things into, what else could it do?

On the other hand, `combineK`

says “try the first parsing strategy, and — if that fails — try the second!” In other words, `combineK`

lets us *prioritize* different possible ways of parsing the same kind of value so that we can make choices between different possible suitable inputs.

In order to make our code a little more intimidating (and standard!), we can define some operators:

implicit class altOps[F[_]: Alternative](x: F[A]) { | |

def <+>(y: F[A]): F[A] = Alternative[F].combineK(x, y) | |

} |

Great, so now let’s see it in action:

// Combine two patterns. | |

val dateDecoder: RowDecoder[Date] = | |

withSlashes <+> withDashes | |

// It really does work. | |

val Right(dateA) = dateDecoder("1986/10/10") | |

val Right(dateB) = dateDecoder("2010–10–10") |

Nice. Now we have error handling *built into our parser notion*. We can account for any collection of variations!

Let’s take that literally and make this easier on ourselves, because it’s going to get annoying if we have to write `combine`

and `<+>`

over and over for longer ways of combining things:

// "Sum up" all the decoders in a list! | |

def altSum[A](patterns: List[RowDecoder[A]]): RowDecoder[A] = | |

patterns.foldLeft(Alternative[RowDecoder].empty[A])(_ <+> _) | |

// Four different patterns! | |

val variableDateDecoder: RowDecoder[Date] = | |

altSum( | |

List( | |

withSlashes, | |

withDashes, | |

RowDecoder.from(x => Either.catchNonFatal( | |

DateTimeFormatter.ofPattern("yyyy-MM-dd'T'hh:mm:ss").parse(x))), | |

RowDecoder.from(x => Either.catchNonFatal( | |

DateTimeFormatter.ofPattern("yyyy-MM-dd'T'hh").parse(x))) | |

) | |

) | |

// Putting it all together… | |

implicit val loanDecoder: RowDecoder[LoanWithPaymentDate] = | |

map4( | |

decodeId, | |

decodeBalance, | |

decodeLoanState, | |

variableDateDecoder | |

)(LoanWithPaymentDate.apply) |

Even better. And clearly we could do the same thing with pretty much any collection. Now we can encode relatively complex handling fairly straightforwardly inside this parser layer with power very near to that of boolean algebra. AND and OR go a pretty long way. After all, that’s pretty much all you get with computers!

### Coda

Thanks for reading. I hope this trial of parsing or decoding with applicative style (and it’s fancier extensions) was worth your time. If your interest was piqued and you missed part one, you can find it here.

If you like this stuff, Earnest is hiring and we’d love to talk!

If you want to read more about these topics, here are some classic resources on the use of applicative style in decoding and deserializing:

**Published:**November 14, 2018