Alert Message
Thanks for signing up! We hope you enjoy our newsletter, The Teller.
Compositionality
engineering

Using Applicative Style to Increase Compositionality : Part 2

By Stephen Lazaro

In the first part of this series, we looked at how to compose different reductions over a single report to efficiently calculate multiple summaries of a report in a single pass. In this blog post, we continue with the same main strategy, this time articulating a very explicit applicative style to pull structured data from a relatively flat format. In this way, we achieve flexible, maintainable, and composable code to wrap a fairly traditional Java library API.

Study Two: Composable Excel Sheet Processing

Let’s suppose we want — for whatever reason — to parse an Excel spreadsheet. On the JVM there are options of libraries that provide an API allowing us to access the contents of Excel spreadsheets but many of them are, we might say, venerable.

Their APIs are traditional, by and large; sometimes they initialize a fair amount of state, and they handle errors via exceptions. It can be painful to integrate such code into a more functional code base without some effort. Using Scala’s type system we can apply similar ideas as our previous post in a rigorous way to get a flexible and safe API.

Let’s approach this study in a similar fashion as to our last one. Here’s some example code that is meant to parse a single row using Apache POI, one example of an Excel integration library for the JVM:

final case class Loan(
  id: Int,
  balance: Double,
  loanState: String
)
val workbook = WorkbookFactory.create(i).getSheetAt(0)
val idIndex = 0
val balanceIndex = 1
val loanStateIndex = 2
val balances: Iterator[Balance] = sheet.iterator()
 .asScala
 .map(row => {
  val id = row.getCell(idIndex).toInt
  val balance = row.getCell(balanceIndex).toDouble
  val loanState = row.getCell(loanStateIndex)
Loan(id, balance, loanState)
  })
Dead simple Excel sheet parsing

We notice immediately that, just like our last study, this code certainly works, but it manages to combine walking the iterator with other concerns. More importantly, it has some other troubles. This code is wildly unsafe, in the sense that getSheetgetCelltoInt, and toDouble can all throw exceptions! Since functional programs don’t make a habit of catching and handling exceptions with the traditional machinery, this leads to one bad record knocking over our entire application. Thankfully, in Scala we have tools for handling these issues.

Let’s repeat our procedure from the last post of focusing in closely on one element and attempting to generalize from there.

Here’s a function that gets the ID field from a row:

def getId(row: Row): Int = row.getCell(0).toInt
Parsing a single cell from a row

As noted before, this is both unsafe and not obviously composable.

We could do this, but it has a few downsides:

def getId(row: Row): Int = try {
  row.getCell(0).toInt
} catch (e) {
  -1
}
view raw using_a_try.scala hosted with ❤ by GitHub
Obviously, we try to catch errors.

It’s not quite clear how this is meant to compose with other decoders, and we have to force some value to model failure. The type of this function also does not reflect it’s error handling! We could use a type like Option or Either and manually return from the branches, but we can do something more idiomatic.

In order to handle the errors, we can use the Try abstraction that captures errors and then convert to Either so we can access those errors:

def getId(row: Row): Either[Throwable, Int] = Try(row.getCell(0).toInt).toEither
Catch errors in a data structure.

Looks good. Entirely safe for our program. But again, how will we combine these? Afterall, we’ll want to get more than one cell from each row!

Let’s take a step back. We need a function of type Row => Either[Throwable, A] in order to parse a value of type A from a row.

Let’s capture that in an abstract notion, same as we did in the last post:

// This trait is a named notion of a kind of thing, our decoders!
trait RowDecoder[A] {
  def decode(row: Row): Either[Throwable, A]
}
// Some functions related to decoders in general.
object RowDecoder {
// A way to easily make decoders without too much boilerplate
  def from(f: Row => Either[Throwable, A]): RowDecoder[A] =
  new RowDecoder[A] {
  def decode(row: Row) = f(row)
  }
 
  // Just some useful boilerplate for summoning a RowDecoder
// Don't need to know this mechanic, it just allows RowDecoder[Int] to compile
 def apply[A](implicit rowDecoder: RowDecoder[A]): RowDecoder[A] = rowDecoder
}
view raw row_decoder.scala hosted with ❤ by GitHub
We capture the notion of a decoder in a data type.

Now we have a named notion of what it is to decode something out of a row of an Excel spreadsheet.

What we want is a function with the following signature:

// We want to decode _multiple things_ from _one row_
def combine[A, B](left: RowDecoder[A], right: RowDecoder[B]): RowDecoder[(A, B)] = ???
// For example, if we wanted to decode a `Debtor` we want this to work:
final case class Debtor(id: Int, name: String)
// Whip up a decoder using the decoders of the fields!
val debtor: RowDecoder[Debtor] = combine(RowDecoder[Int], RowDecoder[String]).map(Debtor.apply)
view raw what_we_want.scala hosted with ❤ by GitHub
We want to combine decoders.

If you’re familiar with functional programming, it immediately occurs to you that we’re talking about monoidal functors again (or applicative functors).

Let’s look at what we need to implement to get a monoidal functor for our RowDecoder type:

// Aka an Applicative, though we won't talk about the connection here.
trait Monoidal[F[_]] {
  // Named this way for Historical Reasons™
  def pure[A](x: A): F[A
  def map[A, B](fa: F[A])(f: A => B): F[B]
  def product[A, B](fa: F[A], fb: F[B]): F[(A, B)]
}
object Monoidal {
  // Some convenient boilerplate to "summon" our semigroupal
// again, ignore this if it doesn't make sense, allows Semigroupal[RowDecoder].map to work
  def apply[F[_]](implicit semi: Monoidal[F]): Monoidal[F] = semi
}
// Specialized to RowDecoder!
implicit val monoidalForRowDecoder = new Monoidal[RowDecoder] {
 def pure[A](x: A): RowDecoder[A] = ???
 def map[A, B](fa: RowDecoder[A])(f: A => B): RowDecoder[B] = ???
 def product[A, B](fa: RowDecoder[A], fb: RowDecoder[B]): RowDecoder[(A, B)] = ???
}
Monoidal functors.
  1. A way to make a decoder that just always returns the value we ask of it (pure).
  2. A way to “post process” the result of decoding something from a row (map).
  3. A way to combine two decoders into one decoder that returns both values (product).

How can we do that? Well, this will work:

implicit val monoidalForRowDecoder = new Monoidal[RowDecoder] {
 // This always succeeds returning the given value!
 def pure[A](x: A): RowDecoder[A] =
  RowDecoder.from(Function.const(Right(x)))
// This processes the output of the given decode with the given function
  def map[A, B](fa: RowDecoder[A])(f: A => B): RowDecoder[B] =
  RowDecoder.from(x => fa.decode(x).map(f))
// Combine the output of two into one!
def product[A, B](fa: RowDecoder[A], fb: RowDecoder[B]): RowDecoder[(A, B)] =
  RowDecoder.from(x => fa.decode(x).flatMap(a => f.decode(x).map(b => (a, b))))
}
Combining decoders, finally!

Now we can use it to combine our decoders! Monoidal let’s define some goodies. If the definitions are opaque, do not worry, they are generally provided in libraries and one just pays attention to what they do:

def map2[F[_]: Monoidal, A, B, C](fa: F[A], fb: F[B])(f: (A, B) => C): F[C] =
  Monoidal[F].map(Monoidal[F].product(fa, fb))(f.tupled)
def map3[F[_]: Monoidal](fa: F[A], fb: F[B], fc: F[C])(f: (A, B, C) => D): F[D] =
  Monoidal[F].map(
  Monoidal[F].product(Monoidal[F].product(fa, fb), fc))
  )({
  case ((a, b), c) => f(a, b, c)
  })
view raw map_n.scala hosted with ❤ by GitHub
Various post-processing idioms for decoders.

What do these functions mean in terms of our use case?
map2 let’s us process the output of two different decoders, failing if either does.
map3 let’s us process the output of three different decoders after all of them produce, failing only when any one of them does.

Note that if a bad row is passed in it fails on the first decoding step to go wrong, and returns that error, going from left to right essentially. We could collect all errors using slightly different data structures, but for now let’s just sit with keeping only the first thing that goes wrong.

But now we have exactly what we needed in order to parse each row into our record type! Let’s see what this new API looks like:

def getSheet: Either[Throwable, Sheet] = Try(WorkbookFactory.create(i).getSheetAt(0)).toEither
val decodeId: RowDecoder[Int] =
RowDecoder.from(row => Try(row.getCell(0).toInt).toEither)
val decodeBalance: RowDecoder[Double] =
RowDecoder.from(row => Try(row.getCell(1).toDouble).toEither)
val decodeLoanState: RowDecoder[String] =
RowDecoder.from(row => Try(row.getCell(2)).toEither)
// Boom! Decode each row into a Loan object so long as we are able to decode each cell.
val decodeLoan: RowDecoder[Loan] = map3(
 decodeId,
 decodeBalance,
 decodeLoanState
)(Loan.apply)
def processSheet[A: RowDecoder](sheet: Sheet): Iterator[Either[Throwable, Loan]] =
sheet.iterator()
  .asScala
  .map(RowDecoder[A].decode)
val records: Iterator[Either[Throwable, Loan]] = getSheet.flatMap(processSheet[Loan])
Parsing the sheet with our new machinery

Et voila! We have ourselves a clean, compositional, and safe procedure for pulling data from an Excel spreadsheet with very clear division of responsibilities. We’ve already gone a nice distance towards maintainable and safe code!

On the other hand, things always change. Imagine our Project Manager passes us the new requirement that we need to parse nextPaymentDate as well, so we enrich our models:

final case class LoanWithPaymentDate(
  id: Int,
  balance: Double,
  loanState: String,
  nextPaymentDate: Date
)
view raw loan_with_date.scala hosted with ❤ by GitHub
Loans, now with payment dates!

No problem, right? Now we just make a map4 following the same pattern as map2 and map3. But wait, looking at the file we realize that there are multiple date formats for the next payment date field! Both “yyyy/mm/dd” and “yyyy-mm-dd” at the very least.

Oh well, it’s the rare set of documents that never varies from a fixed schema. We don’t yet have any way to combine our decoders to capture choices between multiple possible strategies or shapes. Are we out of luck?

Let’s contemplate the problem in front of us:

// One way to parse a date:
val slashParser = DateTimeFormatter.ofPattern("yyyy/MM/dd")
// Another way to parse a date:
val dashParser = DateTimeFormatter.ofPattern("yyyy-MM-dd")
val withSlashes: RowDecoder[Date] =
  RowDecoder.from(x => Either.catchNonFatal(slashParser.parse(x)))
val withDashes: RowDecoder[Date] =
  RowDecoder.from(x => Either.catchNonFatal(dashParser.parse(x)))
// What do we do here ???
val eitherWithSlashesOrDashes: RowDecoder[Date] = ???
The problem of choice.

We need a way to say “or” for decoders. How hard can it be?

It turns out if we extend the notions we’ve been working with a bit, we have just enough power to do this.

// Notice we're just extending our prior notion.
trait Alternative[F[_]] {
  // Kind of like true, or 1
  def pure[A](x: A): F[A]
// Kind of like boolean AND, or multiplication
 def product[A, B](fa: F[A], fb: F[B]): F[(A, B)]
// Kind of like false, or 0
  def empty[A]: F[A]
// Kind of like boolean XOR, or addition
  // Named this way for Reasons™
  def combineK[A](x: F[A], y: F[A]): F[A]
}
view raw alternative.scala hosted with ❤ by GitHub
A slightly stronger strategy!

We have two new functions to write:

  • empty, which takes any type and gives us a decoder of that type
  • combineK, which takes any two decoders of the same time, and lets us try them both

So now that we know what we need, how do we get it for our RowDecoder? Try it out:

implicit val alternativeForRowDecoder =
new Alternative[RowDecoder] {
  // Kind of like true, same as in our Monoidal above
  def pure[A](x: A): RowDecoder[A] =
Monoidal[RowDecoder].pure(x)
// Kind of like AND, same as in our Monoidal abaove
  def product[A, B](fa: RowDecoder[A], fb: RowDecoder[B]): RowDecoder[(A, B)] =
Monoidal[RowDecoder].product(fa, fb)
// Kind of like false but for decoders
  def empty[A]: F[A] =
new RowDecoder[A] {
  def apply(s: String) = Left(new RuntimeError("no dice"))
  }
// Kind of like XOR but for decoders
  def combineK[A](x: F[A], y: F[A]): F[A] =
  RowDecoder.from(z =>
  x.decode(z) match {
  case Left(_) => y.decode(z)
  case Right(v) => Right(v)
  })
}
Our new powers for decoders. The comments are literal.

The function empty always fails to parse anything it is given. This makes sense because with no information about the type we want to decode things into, what else could it do?

On the other hand, combineK says “try the first parsing strategy, and — if that fails — try the second!” In other words, combineK lets us prioritize different possible ways of parsing the same kind of value so that we can make choices between different possible suitable inputs.

In order to make our code a little more intimidating (and standard!), we can define some operators:

implicit class altOps[F[_]: Alternative](x: F[A]) {
  def <+>(y: F[A]): F[A] = Alternative[F].combineK(x, y)
}
view raw scary_symbols.scala hosted with ❤ by GitHub
Adding some cool syntax cause we’re cool.

Great, so now let’s see it in action:

// Combine two patterns.
val dateDecoder: RowDecoder[Date] =
  withSlashes <+> withDashes
// It really does work.
val Right(dateA) = dateDecoder("1986/10/10")
val Right(dateB) = dateDecoder("2010–10–10")
Combining two strategies of date parsing.

Nice. Now we have error handling built into our parser notion. We can account for any collection of variations!

Let’s take that literally and make this easier on ourselves, because it’s going to get annoying if we have to write combine and <+> over and over for longer ways of combining things:

// "Sum up" all the decoders in a list!
def altSum[A](patterns: List[RowDecoder[A]]): RowDecoder[A] =
  patterns.foldLeft(Alternative[RowDecoder].empty[A])(_ <+> _)
// Four different patterns!
val variableDateDecoder: RowDecoder[Date] =
 altSum(
 List(
  withSlashes,
  withDashes,
  RowDecoder.from(x => Either.catchNonFatal(
  DateTimeFormatter.ofPattern("yyyy-MM-dd'T'hh:mm:ss").parse(x))),
  RowDecoder.from(x => Either.catchNonFatal(
  DateTimeFormatter.ofPattern("yyyy-MM-dd'T'hh").parse(x)))
  )
 )
// Putting it all together…
implicit val loanDecoder: RowDecoder[LoanWithPaymentDate] =
  map4(
  decodeId,
  decodeBalance,
  decodeLoanState,
  variableDateDecoder
  )(LoanWithPaymentDate.apply)
Finally using our multiple decoding strategies.

Even better. And clearly we could do the same thing with pretty much any collection. Now we can encode relatively complex handling fairly straightforwardly inside this parser layer with power very near to that of boolean algebra. AND and OR go a pretty long way. After all, that’s pretty much all you get with computers!

Coda

Thanks for reading. I hope this trial of parsing or decoding with applicative style (and it’s fancier extensions) was worth your time. If your interest was piqued and you missed part one, you can find it here.

If you like this stuff, Earnest is hiring and we’d love to talk!

If you want to read more about these topics, here are some classic resources on the use of applicative style in decoding and deserializing:

Disclaimer: This blog post provides personal finance educational information, and it is not intended to provide legal, financial, or tax advice.