
Functional TDD: Codd is my Copilot

Let's play a little game.

You work for the IRS. Your job today is to select a list of people with large bank balances so that you can do some kind of indescribably-complex tax-stuff to them. Your boss would like a list of these guys on his desk by noon.

SELECT FirstName, LastName, SSN FROM RICHGUYS WHERE Wealth > 250000

Now before you start in with the exceptions, this extremely tiny program provides business value. It's something your customer is going to use over and over again. Over time, it is planned to become part of a larger program which chases down tax cheats overseas and posts their personal information on Facebook. This is just the first increment.

According to some, you shouldn't write a lick of code without writing a test first. I think Bob Martin said that coding without testing first is something a "caveman" would do. But how would you do that with a SELECT statement? I guess you could create some mock data, then run the select and test against the results, but that makes no sense: if you test that way, you're only testing the database engine, not the code. In fact, the only thing to test here is whether or not you understood the requirement -- is the data being returned and transformed the way the customer wanted or not? The only real test is a business test, not a programming test.
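For the record, here is roughly what the mock-data approach would look like. This is a minimal sketch in Python using the built-in sqlite3 module and an in-memory database. The table and column names come from the query above; the sample rows and the helper names (select_rich_guys, make_mock_db) are invented for illustration.

```python
import sqlite3

QUERY = "SELECT FirstName, LastName, SSN FROM RICHGUYS WHERE Wealth > 250000"


def select_rich_guys(conn):
    """Run the query from the post and return the rows as a list of tuples."""
    return conn.execute(QUERY).fetchall()


def make_mock_db():
    """Build an in-memory database holding invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE RICHGUYS "
        "(FirstName TEXT, LastName TEXT, SSN TEXT, Wealth INTEGER)"
    )
    conn.executemany(
        "INSERT INTO RICHGUYS VALUES (?, ?, ?, ?)",
        [
            ("Ann", "Able", "000-00-0001", 900000),
            ("Bob", "Baker", "000-00-0002", 250000),  # exactly on the boundary
            ("Cy", "Clark", "000-00-0003", 1200),
        ],
    )
    return conn


rows = select_rich_guys(make_mock_db())
print(rows)
```

The only thing an assertion on rows can confirm is that the engine compares numbers correctly. Whether the cutoff should include someone sitting exactly at 250000 is a question for the customer, not for the test.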

TDD doesn't work with SQL. Most people develop this code by, well, just typing it in and running it, probably on a development copy of the database. Either it works or it doesn't. Once it works, it works. You can move on to other stuff. The test and the business acceptance are really the same thing.

Ok, so perhaps simple SELECT statements are really silly to test. Let's move up a notch in complexity. Over time, you build up a library of SELECT, UPDATE, and transform statements. None of them uses a cursor and all of them are atomic transforms -- data comes in one way and goes out another. Each of them does what it is supposed to do. None of them has been tested, at least not in the TDD sense.

A funny little secret comes into play here that is going to change programming forever.

In a fully normalized database, you can do anything the customer wants by just using SQL transforms.

It's such a wild concept that I don't think many imperative programmers really grok it. If your relational model is tight, all you need are selects and transforms.
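As a rough illustration of what "just selects and transforms" can mean, here is a sketch using Python's built-in sqlite3 module. The ACCOUNTS and FLAGGED tables, their columns, and the data are all hypothetical. Each statement turns one set of rows into another, with no cursor and no row-by-row loop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema and data, for illustration only
    CREATE TABLE ACCOUNTS (ID INTEGER PRIMARY KEY, Owner TEXT, Balance INTEGER);
    CREATE TABLE FLAGGED (ID INTEGER PRIMARY KEY, Owner TEXT);
    INSERT INTO ACCOUNTS VALUES (1, 'Ann', 900000), (2, 'Bob', 1200), (3, 'Cy', 300000);
    """
)

# Transform 1: derive a new set of rows from an existing one.
conn.execute(
    "INSERT INTO FLAGGED SELECT ID, Owner FROM ACCOUNTS WHERE Balance > 250000"
)

# Transform 2: change every flagged account in a single set-based statement.
conn.execute(
    "UPDATE ACCOUNTS SET Balance = Balance - 1000 "
    "WHERE ID IN (SELECT ID FROM FLAGGED)"
)

flagged = conn.execute("SELECT Owner FROM FLAGGED ORDER BY ID").fetchall()
balances = conn.execute("SELECT Owner, Balance FROM ACCOUNTS ORDER BY ID").fetchall()
print(flagged)
print(balances)
```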

To see why, let's switch back over to Object-Oriented Programming land. Let's say you have the following function.

class Foo
...
public coolThing doSomethingCool(myInputStream streamInput, currentAccount account)
    {
        // Read from the parameters, not from their type names.
        // accountDelta is fetched here but never used below.
        var accountDelta = streamInput.getLatestAccountChange();
        reallyCoolThingGenerator rctg = 
            new reallyCoolThingGenerator(account.latestProfile);
        return rctg.CoolestThingCreator(
            coolThingProfile.ThingThatIsBlue);
    }
...

Ok, ok, I confess -- this could be a much better example. A really good critique of this is that the nouns and verbs really don't describe much of anything useful at all. So let's get past that. Let's pretend that these nouns and verbs exactly describe something of utmost business importance.

The thing is, even with totally descriptive nouns and verbs that describe exactly what the code is doing, this is a freaking nightmare to test. A few lines of code, and already testing is a monster.

Why? For one thing, even if you generally understand the nouns, the complexity of the concepts is hidden. I count at least seven classes or objects being referenced in four lines of code. Do you see all of them? Each of these classes or objects has its own state, private variables, getters and setters. Each could be hundreds of lines long. Each could be in a state that is unpredictable and non-deterministic.

Worse yet, each of those objects may easily reference a dozen other objects, all with similar complexity and varying states.

This is why you can never have 100% code coverage -- even moderately complex OOP systems have an almost infinite variety of states they can be in when your function "doSomethingCool" is called. In for a penny, in for a pound. Yes, if you thoroughly tested each class and method as you built it, and you tested before you coded, you have the maximum shot at a stable or mostly-known environment entering and executing your code. But most of us work in some kind of combination of legacy and new code, so even if we religiously test as we go along, there's a lot going on in far away places that can have immediate impact on our methods.

Or put another way, OOP gives us tools for creating and managing complex structures of code and data. It gives us a place to put our stuff, and allows us to work in larger teams. But this very ability to organize things and still provide access to them brings in a host of tacit coupling. Or to put it in very crude terms, the shit is all connected to each other no matter how good of a job you do of putting it all together. And it's connected in ways that you either don't see or don't think about as much as you should (or can).

Reminds me of a team I met once that was working on one of the first e-commerce sites. They were so afraid of fraud that they immediately encrypted data coming in over the wire using an IBM crypto card. Each method received an encrypted blob (binary object). Many times it sent the blob out to other objects which did things and sent back another encrypted blob. So you had unknown stuff coming in, unknown stuff happening to it, and you returned more unknown stuff back to the caller. And for some reason these guys had the toughest time debugging! Go figure. (They solved this debugging problem by using late-binding and turning all errors off. Sometimes the program ran, sometimes it didn't, sometimes it did strange things nobody could understand. But they were all happy and paid well, and each of them had jobs for life.)

Functional programming has a neat attack vector on hidden coupling. In a pure functional environment, it's like SQL: everything is a select and transform. There are no hidden effects or state issues because all the state is right in front of you and it's all painfully obvious.
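To make the contrast concrete, here is a small sketch in Python. The AccountService class and its threshold field are hypothetical stand-ins for any object carrying state that other code can reach. The method's answer depends on a hidden input, while the pure function's answer depends only on its arguments.

```python
class AccountService:
    """Stateful style: the result depends on a field other code may change."""

    def __init__(self):
        self.threshold = 250000  # hidden input to is_rich()

    def is_rich(self, wealth):
        return wealth > self.threshold


def is_rich(wealth, threshold):
    """Pure style: every input is an explicit argument."""
    return wealth > threshold


service = AccountService()
before = service.is_rich(300000)
service.threshold = 500000         # some far-away code mutates the object
after = service.is_rich(300000)    # same argument, different answer

pure_first = is_rich(300000, 250000)
pure_second = is_rich(300000, 250000)  # same arguments, same answer, always

print(before, after)
print(pure_first, pure_second)
```

Testing the method means first pinning down every piece of state it can see. Testing the function means picking two numbers.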

Want to find rich folks who contributed money in the last election, using F#?

// Keep the wealthy people who also show up in electionData as contributors
richGuys
|> List.filter (fun x ->
    x.Wealth > 250000
    && (electionData |> List.exists (fun y -> y.contributedMoney && y.ID = x.ID)));;

When developing, I might split up those transforms into separate lines, so I can look at them to make sure they are returning what I want. This is exactly the same thing as the SQL programmer writing and "testing" his SQL statement. Or I might just collapse them all on one line. It's a matter of readability and maintenance, really. It's all the same to the compiler.
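Here is a sketch of that split-then-collapse workflow, written in Python rather than F# to keep it short. The field names mirror the F# snippet; the Person and Contribution types and the sample data are invented for illustration.

```python
from typing import NamedTuple


class Person(NamedTuple):
    ID: int
    Wealth: int


class Contribution(NamedTuple):
    ID: int
    contributedMoney: bool


# Invented sample data
richGuys = [Person(1, 900000), Person(2, 300000), Person(3, 1200)]
electionData = [Contribution(1, True), Contribution(2, False), Contribution(3, True)]

# Step 1: who is wealthy? Inspect this before moving on.
wealthy = [p for p in richGuys if p.Wealth > 250000]

# Step 2: which IDs show a contribution?
contributor_ids = {c.ID for c in electionData if c.contributedMoney}

# Step 3: keep the wealthy people who contributed.
result = [p for p in wealthy if p.ID in contributor_ids]

# The same logic collapsed into a single expression.
collapsed = [
    p
    for p in richGuys
    if p.Wealth > 250000
    and any(c.contributedMoney and c.ID == p.ID for c in electionData)
]

print(wealthy)
print(result)
print(collapsed)
```

Each intermediate list can be printed and eyeballed on its own, which is the same kind of "testing" the SQL programmer does.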

But test first? Just doesn't make any sense.

But wait, it gets better! Because true functional programming is just passing around immutable structures of data and doing transforms on them, it doesn't matter where the data is processed. I could send it out to Neptune and wait ten years for it to come back. It would be just as correct as it is today. The removal of tacit coupling also removes a lot of scaling and cross-talk problems. At the CPU level, as it is humming along it gets a hunk of data and some code to run on it, and it knows that this bunch of data and code is not going to be accessing data and code in other places. That means it can run a lot faster. This is a huge difference from an OOP program running on the CPU, where the CPU is trying to figure out how much data and code it can take at a time, and the code keeps accessing code and data from other places. This jumping around and not knowing what is connected to what isn't just tough on the programmer, it's also tough on the CPU. But with functional programming, it's just a simple job. Go do the job.

This also means that I can "splay" the data out over hundreds or thousands of machines, have them all work on it in parallel, and then have the data come back to me. Ever wonder how Google manages to work so quickly and your OOP website manages to work so slowly? Sure, there is a legion of technical tips and tricks, but a lot of it boils down to the critical difference I've described between OOP and FP.
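A toy version of the "splay" idea, sketched in Python with the standard concurrent.futures module. The worker threads only stand in for separate machines, and in CPython they won't give real CPU parallelism for this kind of work. The point is narrower: because count_rich (a hypothetical name) touches nothing but its argument, the answer is the same no matter how the data is chunked or where each chunk runs.

```python
from concurrent.futures import ThreadPoolExecutor


def count_rich(chunk):
    """Pure transform: depends only on its argument, touches no shared state."""
    return sum(1 for wealth in chunk if wealth > 250000)


def split(data, size):
    """Cut a list into consecutive chunks of at most `size` items."""
    return [data[i:i + size] for i in range(0, len(data), size)]


wealths = [900000, 1200, 300000, 250000, 260000, 50, 1000000]  # invented data

# Each worker gets its own chunk; pool.map returns results in input order.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(count_rich, split(wealths, 3)))

total = sum(partial_counts)
print(partial_counts, total)
```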

Yes, it's hard to program in FP. And yes, maintenance and readability can be a big issue. But guess what? We already have great tools for dealing with maintenance and readability: they're called code reviews and pair programming.

The future, at least for high-performing teams, is looking to me like pairs of pure functional programmers working from the ground up, adding in bits of OOP-like structure as the codebase grows more complex. All those OOP skills are going to come in handy both in adding complexity and in creating data structures, but not in building scaffolding that other code sits inside. It's a completely different world than most people in this generation are working in, but I know it's going to be a blast.

Like SQL, functional programming just works. Or rather, it works or it doesn't work. There's not a lot of in-between states. And when it comes to using TDD, I'd rather stick with good data modeling and FP than try to play whack-a-mole with OOP and TDD. Functional programming, it's so simple even a caveman can do it.

Codd is my copilot.

EDIT: I didn't say you wouldn't debug, or that you wouldn't do any testing at all. I'm simply saying at the transform layer, TDD doesn't make much sense. I'm also not trying to imply that FP is magical or that somehow testing goes away. The point is that FP coding and development is much different than imperative; different enough to warrant changes in the way we develop. Apologies if I left the impression that there was any sort of magic involved. There is not.

1 Comment

David Mathers: Something that doesn't get much notice is that Codd's paper introducing relational databases and Backus' paper introducing functional programming are very similar: Codd introduced an "algebra of relations" while Backus introduced an "algebra of functions".

The key is that in algebra a relation can have one or more of these four properties: left-unique, right-unique, left-total, right-total.

A function is any relation that has at least the right-unique and left-total properties. So "algebra of functions" is just a different way of saying "algebra of relations, but only right-unique, left-total relations".

Hence the similarity.

Left-uniqueness is the difference between "map" and "reduce" in FP terms.
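The two properties named in the comment can be checked mechanically. Below is a minimal Python sketch that treats a relation as a set of (left, right) pairs over an explicit domain. The function names and the sample relations are assumptions made for illustration; they do not come from either paper.

```python
def is_right_unique(relation):
    """No left element is paired with two different right elements."""
    seen = {}
    for left, right in relation:
        if seen.setdefault(left, right) != right:
            return False
    return True


def is_left_total(relation, domain):
    """Every element of the domain appears on the left of some pair."""
    return set(domain) <= {left for left, _ in relation}


def is_function(relation, domain):
    """A function is a relation that is both right-unique and left-total."""
    return is_right_unique(relation) and is_left_total(relation, domain)


domain = {1, 2, 3}
squares = {(1, 1), (2, 4), (3, 9)}             # a function on the domain
ambiguous = {(1, 1), (1, 2), (2, 4), (3, 9)}   # 1 is paired with two values
partial = {(1, 1), (2, 4)}                     # 3 has no pair at all

print(is_function(squares, domain))
print(is_function(ambiguous, domain))
print(is_function(partial, domain))
```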

About this Entry

This page contains a single entry by DanielBMarkham published on September 13, 2010 8:44 AM.


