
Functional TDD: Codd is my Copilot

By DanielBMarkham on September 13, 2010 8:44 AM

Let's play a little game.

You work for the IRS. Your job today is to select a list of people with large bank balances so that you can do some kind of indescribably-complex tax-stuff to them. Your boss would like a list of these guys on his desk by noon.

SELECT FirstName, LastName, SSN FROM RICHGUYS WHERE Wealth > 250000

Now before you start in with the exceptions, this extremely tiny program provides business value. It's something your customer is going to use over and over again. Over time, it is planned to become part of a larger program which chases down tax cheats overseas and posts their personal information on Facebook. This is just the first increment.

According to some, you shouldn't write a lick of code without writing a test first. I think Bob Martin said that coding without testing first is something a "caveman" would do. But how would you do that with a SELECT statement? I guess you could create some mock data, run the select, and test against the results, but that makes no sense. Plus, if you test that way, you're only testing the database engine, not the code. In fact, the only thing to test here is whether or not you understood the requirement -- is the data being returned and transformed the way the customer wanted or not? The only real test is a business test, not a programming test.

TDD doesn't work with SQL. Most people develop this code by, well, just typing it in and running it, probably on a development copy of the database. Either it works or it doesn't. Once it works, it works. You can move on to other stuff. The test and the business acceptance are really the same thing.
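To make the "type it in and run it" workflow concrete, here is a minimal sketch using Python's built-in sqlite3 module, with a throwaway in-memory database standing in for the development copy. Only the SELECT comes from the post; the RICHGUYS column types, the sample names, and the obviously fake SSNs are assumptions made up for illustration.

```python
import sqlite3


def make_dev_copy():
    # Hypothetical development copy: an in-memory SQLite database with a
    # made-up RICHGUYS table and a few made-up rows.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE RICHGUYS (FirstName TEXT, LastName TEXT, SSN TEXT, Wealth INTEGER)"
    )
    conn.executemany(
        "INSERT INTO RICHGUYS VALUES (?, ?, ?, ?)",
        [
            ("Ann", "Able", "000-00-0001", 1000000),
            ("Bob", "Baker", "000-00-0002", 250000),  # exactly at the line
            ("Cy", "Cole", "000-00-0003", 40000),
        ],
    )
    return conn


def run_rich_guys_query(conn):
    # The statement from the post: no loops, no cursor logic, just a select.
    sql = "SELECT FirstName, LastName, SSN FROM RICHGUYS WHERE Wealth > 250000"
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    # "Either it works or it doesn't": run it and look at what comes back.
    for row in run_rich_guys_query(make_dev_copy()):
        print(row)
```

Whether that output is right is a question only the customer can answer (is Bob, at exactly 250,000, supposed to be on the list?), which is the business test described above.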

Ok, so perhaps simple SELECT statements are really silly to test. Let's move up a notch in complexity. Over time, you build up a library of SELECT, UPDATE, and transform statements. None of them uses a cursor, and all of them are atomic transforms -- data comes in one way and goes out another. Each of them does what it is supposed to do. None of them has been tested, at least not in the TDD sense.
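As a rough illustration of "no cursor, atomic transform", here is a hypothetical sketch (Python's sqlite3, made-up Flagged column and data) that flags the same rows two ways: once with a single set-based UPDATE, and once cursor-style, fetching rows into host code and writing them back one at a time.

```python
import sqlite3


def make_db():
    # Made-up table; the Flagged column is an assumption for this sketch.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE RICHGUYS (SSN TEXT PRIMARY KEY, Wealth INTEGER, Flagged INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO RICHGUYS (SSN, Wealth) VALUES (?, ?)",
        [("000-00-0001", 1000000), ("000-00-0002", 250000), ("000-00-0003", 400000)],
    )
    return conn


def flag_set_based(conn):
    # One declarative transform: every qualifying row changes in one statement,
    # inside one transaction (the `with` block commits or rolls back).
    with conn:
        conn.execute("UPDATE RICHGUYS SET Flagged = 1 WHERE Wealth > 250000")


def flag_row_by_row(conn):
    # Cursor-style equivalent: pull rows out, decide in host code, write back.
    rows = conn.execute("SELECT SSN, Wealth FROM RICHGUYS").fetchall()
    with conn:
        for ssn, wealth in rows:
            if wealth > 250000:
                conn.execute("UPDATE RICHGUYS SET Flagged = 1 WHERE SSN = ?", (ssn,))


def flagged(conn):
    return [r[0] for r in conn.execute("SELECT SSN FROM RICHGUYS WHERE Flagged = 1 ORDER BY SSN")]
```

Both end in the same state, but the set-based version has no intermediate loop state for anyone to get wrong; the statement either does what the customer asked or it doesn't.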

A funny little secret comes into play here that is going to change programming forever.

In a fully normalized database, you can do anything the customer wants by just using SQL transforms.

It's such a wild concept that I don't think many imperative programmers really grok it. If your relational model is tight, all you need are selects and transforms.
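A small sketch of what "all you need are selects and transforms" can look like: two normalized tables, and views stacked on views, each one a single select/transform over the previous one. The PEOPLE and CONTRIBUTIONS tables, the view names, and the data are hypothetical; SQLite (through Python's sqlite3) is used only because it is easy to run.

```python
import sqlite3

SCHEMA = """
CREATE TABLE PEOPLE (ID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT, Wealth INTEGER);
CREATE TABLE CONTRIBUTIONS (PersonID INTEGER REFERENCES PEOPLE(ID), Amount INTEGER);

-- Each view is one select/transform; later views are built from earlier ones.
CREATE VIEW RICH_PEOPLE AS
    SELECT ID, FirstName, LastName FROM PEOPLE WHERE Wealth > 250000;

CREATE VIEW RICH_DONORS AS
    SELECT r.FirstName, r.LastName
    FROM RICH_PEOPLE r
    WHERE EXISTS (SELECT 1 FROM CONTRIBUTIONS c WHERE c.PersonID = r.ID AND c.Amount > 0);
"""


def build(conn):
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO PEOPLE VALUES (?, ?, ?, ?)",
        [(1, "Ann", "Able", 1000000), (2, "Bob", "Baker", 300000), (3, "Cy", "Cole", 40000)],
    )
    conn.executemany("INSERT INTO CONTRIBUTIONS VALUES (?, ?)", [(1, 500), (3, 100)])
    return conn


conn = build(sqlite3.connect(":memory:"))
print(conn.execute("SELECT * FROM RICH_DONORS").fetchall())  # [('Ann', 'Able')]
```

Bob is rich but gave nothing, and Cy gave but isn't rich, so only Ann survives both transforms.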

To see why, let's switch back over to Object-Oriented Programming land. Let's say you have the following function.

class Foo
...
public coolThing doSomethingCool(myInputStream streamInput, currentAccount account)
    {
        var accountDelta = streamInput.getLatestAccountChange();
        reallyCoolThingGenerator rctg =
            new reallyCoolThingGenerator(account.latestProfile);
        return rctg.CoolestThingCreator(
            coolThingProfile.ThingThatIsBlue);
    }
...
...

Ok, ok, I confess -- this could be a much better example. A really good critique of this is that the nouns and verbs really don't describe much of anything useful at all. So let's get past that. Let's pretend that these nouns and verbs exactly describe something of utmost business importance.

The thing is, even with totally descriptive nouns and verbs that describe exactly what the code is doing, this is a freaking nightmare to test. A few lines of code, and already testing is a monster.

Why? For one thing, even if you generally understand the nouns, the complexity of the concepts is hidden. I count at least seven classes or objects being referenced in four lines of code. Do you see all of them? Each of these classes or objects has its own state, private variables, getters and setters. Each could be hundreds of lines long. Each could be in an unpredictable, nondeterministic state.

Worse yet, each of those objects may easily reference a dozen other objects, all with similar complexity and varying states.

This is why you can never really have 100% test coverage -- even moderately complex OOP systems have an almost infinite variety of states they can be in when your function "doSomethingCool" is called. In for a penny, in for a pound. Yes, if you thoroughly tested each class and method as you built it, and you tested before you coded, you have the best shot at a stable or mostly-known environment when execution enters your code. But most of us work in some combination of legacy and new code, so even if we religiously test as we go along, there's a lot going on in far-away places that can have an immediate impact on our methods.
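A deliberately tiny, hypothetical sketch of that problem in Python: a method whose signature mentions only one input, but whose answer also depends on a shared object that code somewhere else can change. RateTable, Auditor, SHARED_RATES, and far_away_code are made-up names for illustration.

```python
class RateTable:
    """Hypothetical shared object that other parts of the system can mutate."""

    def __init__(self):
        self.threshold = 250000


SHARED_RATES = RateTable()  # module-level state, reachable from anywhere


class Auditor:
    def is_rich(self, wealth):
        # Looks like a function of `wealth` alone, but it also reads shared
        # mutable state that isn't visible in the signature.
        return wealth > SHARED_RATES.threshold


def far_away_code():
    # Somewhere else entirely, another module "adjusts" the shared object.
    SHARED_RATES.threshold = 1000000


auditor = Auditor()
before = auditor.is_rich(300000)  # True
far_away_code()
after = auditor.is_rich(300000)   # False: same call, same argument, different answer
```

A unit test of is_rich passes or fails depending on what else ran first, which is exactly the kind of state that tests written one class at a time can't pin down.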

Or put another way, OOP gives us tools for creating and managing complex structures of code and data. It gives us a place to put our stuff, and allows us to work in larger teams. But this very ability to organize things and still provide access to them brings in a host of tacit coupling. Or to put it in very crude terms, the shit is all connected to each other no matter how good of a job you do of putting it all together. And it's connected in ways that you either don't see or don't think about as much as you should (or can).

Reminds me of a team I met once that was working on one of the first e-commerce sites. They were so afraid of fraud that they immediately encrypted data coming in over the wire using an IBM crypto card. Each method received an encrypted blob (binary large object). Often it sent the blob out to other objects, which did things and sent back another encrypted blob. So you had unknown stuff coming in, unknown stuff happening to it, and more unknown stuff returned to the caller. And for some reason these guys had the toughest time debugging! Go figure. (They solved this debugging problem by using late binding and turning all errors off. Sometimes the program ran, sometimes it didn't, and sometimes it did strange things nobody could understand. But they were all happy and well paid, and each of them had a job for life.)

Functional programming has a neat attack vector on hidden coupling. In a pure functional environment, it's like SQL: everything is a select and transform. There are no hidden effects or state issues because all the state is right in front of you, and it's all painfully obvious.
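For contrast with the doSomethingCool example, a minimal sketch of a pure transform, written in Python only for illustration: the threshold is a parameter rather than shared state, and the input is a tuple of immutable NamedTuple records. Person and rich are hypothetical names.

```python
from typing import NamedTuple


class Person(NamedTuple):
    # Immutable record: fields can't be reassigned after construction.
    id: int
    name: str
    wealth: int


def rich(people, threshold):
    # Everything the answer depends on is in the argument list,
    # and nothing outside the function is read or written.
    return tuple(p for p in people if p.wealth > threshold)


people = (Person(1, "Ann", 1000000), Person(2, "Bob", 300000), Person(3, "Cy", 40000))
first = rich(people, 250000)
second = rich(people, 250000)  # same inputs, so necessarily the same answer
```

To get a different answer you have to pass different inputs, so anyone reading the call can see everything that matters.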

Want to find the rich folks who contributed money in the last election? In F#:

richGuys
|> List.filter (fun x ->
    x.Wealth > 250000 &&
    (electionData |> List.exists (fun y -> y.contributedMoney && y.ID = x.ID)));;

When developing, I might split up those 3 transforms into separate lines, so I can look at them to make sure they are returning what I want. This is exactly the same thing as the SQL programmer writing and "testing" his SQL statement. Or I might just collapse them all on one line. It's a matter of readability and maintenance, really. It's all the same to the compiler.
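Here is roughly what that looks like, sketched in Python rather than F# and with made-up records: the same query once as separate, inspectable steps and once collapsed into a single expression. The record types and sample data are assumptions; the field names follow the F# snippet.

```python
from typing import NamedTuple


class RichGuy(NamedTuple):
    ID: int
    Name: str
    Wealth: int


class ElectionRecord(NamedTuple):
    ID: int
    contributedMoney: bool


rich_guys = [RichGuy(1, "Ann", 1000000), RichGuy(2, "Bob", 300000), RichGuy(3, "Cy", 40000)]
election_data = [ElectionRecord(1, True), ElectionRecord(2, False), ElectionRecord(3, True)]

# Split into named steps while developing, so each intermediate result can be inspected.
wealthy = [x for x in rich_guys if x.Wealth > 250000]
contributors = [y for y in election_data if y.contributedMoney]
wealthy_donors = [x for x in wealthy if any(y.ID == x.ID for y in contributors)]

# Collapsed into one expression once it's right: same answer either way.
collapsed = [
    x
    for x in rich_guys
    if x.Wealth > 250000 and any(y.contributedMoney and y.ID == x.ID for y in election_data)
]
```

Looking at wealthy and contributors at the REPL plays the same role as running a half-written SQL statement against the dev database.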

But test first? Just doesn't make any sense.

But wait, it gets better! Because true functional programming is just passing around immutable structures of data and doing transforms on them, it doesn't matter where the data is processed. I could send it out to Neptune and wait ten years for it to come back, and the answer would be just as correct as it is today. The removal of tacit coupling also removes a lot of scaling and cross-talk problems. Each unit of work is just a hunk of data and some code to run on it, and that code isn't going to reach out and touch data and code in other places, so the work can be handed off, cached, or run in parallel without anyone coordinating with anyone else. That's a huge difference from a typical OOP run, where the code keeps calling code from other places and reading and writing data from other places. This jumping around and not knowing what is connected to what isn't just tough on the programmer, it's also tough on the machine. But with functional programming, it's just a simple job. Go do the job.

This also means that I can "splay" the data out over hundreds or thousands of machines, have them all work on it in parallel, and then have the data come back to me. Ever wonder how Google manages to work so quickly and your OOP website manages to work so slowly? Sure, there are legions of technical tips and tricks, but a lot of it boils down to the critical difference I've described between OOP and FP.
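A toy, single-machine sketch of that shape in Python: split the data into chunks, let a pool of workers run the same pure transform on each chunk independently, then stitch the results back together. ThreadPoolExecutor merely stands in for the "hundreds or thousands of machines", and the (wealth, contributed) record layout is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def is_rich_donor(record):
    # Pure per-record transform: reads only its argument, touches nothing shared.
    wealth, contributed = record
    return wealth > 250000 and contributed


def process_chunk(chunk):
    return [r for r in chunk if is_rich_donor(r)]


def splay(records, workers=4):
    # Split the data into roughly one chunk per worker ("splay" it out)...
    size = max(1, -(-len(records) // workers))  # ceiling division
    chunks = [records[i:i + size] for i in range(0, len(records), size)]
    # ...let the workers process the chunks independently...
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process_chunk, chunks))  # map keeps input order
    # ...then bring the pieces back and concatenate them.
    return [r for part in results for r in part]


records = [(w, w % 3 == 0) for w in range(0, 1_000_000, 1000)]
```

This shows only that the answer doesn't depend on how the work is split up; in CPython, threads won't speed up CPU-bound work, and a real cluster framework would look quite different.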

Yes, it's hard to program in FP. And yes, maintenance and readability can be a big issue. But guess what? We already have great tools for dealing with maintenance and readability: they're called code reviews and pair programming.

The future, at least for high-performing teams, is looking to me like pairs of pure functional programmers working from the ground up, adding in bits of OOP-like structure as the codebase grows more complex. All those OOP skills are going to come in handy both in adding complexity and in creating data structures, but not in building scaffolding that other code sits inside. It's a completely different world than most people in this generation are working in, but I know it's going to be a blast.

Like SQL, functional programming just works. Or rather, it works or it doesn't work. There aren't a lot of in-between states. And when it comes to using TDD, I'd rather stick with good data modeling and FP than play whack-a-mole with OOP and TDD. Functional programming: it's so simple even a caveman can do it.

Codd is my copilot.

EDIT: I didn't say you wouldn't debug, or that you wouldn't do any testing at all. I'm simply saying that at the transform layer, TDD doesn't make much sense. I'm also not trying to imply that FP is magical or that testing somehow goes away. The point is that FP coding and development are very different from imperative; different enough to warrant changes in the way we develop. Apologies if I left the impression that there was any sort of magic involved. There is not.

1 Comment

David Mathers | October 19, 2010 1:21 PM

Something that doesn't get much notice is that Codd's paper introducing relational databases and Backus' paper introducing functional programming are very similar: Codd introduced an "algebra of relations" while Backus introduced an "algebra of functions".

The key is that in algebra a relation can have one or more of these four properties: left-unique, right-unique, left-total, right-total.

A function is any relation that has at least the right-unique and left-total properties. So "algebra of functions" is just a different way of saying "algebra of relations, but only right-unique, left-total relations".

Hence the similarity.

Left-uniqueness is the difference between "map" and "reduce" in fp terms.
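A small Python sketch of the definition in this comment, treating a binary relation as a set of (left, right) pairs over a given domain and checking the two properties that make it a function. The helper names are hypothetical.

```python
def is_right_unique(rel):
    # Each left element is paired with at most one right element.
    seen = {}
    for a, b in rel:
        if a in seen and seen[a] != b:
            return False
        seen[a] = b
    return True


def is_left_total(rel, domain):
    # Every element of the domain appears on the left of some pair.
    return set(domain) <= {a for a, _ in rel}


def is_function(rel, domain):
    return is_right_unique(rel) and is_left_total(rel, domain)


square = {(x, x * x) for x in range(-2, 3)}      # a function on {-2..2}
square_root = {(y, x) for (x, y) in square}      # 4 -> 2 and 4 -> -2: not right-unique
```

Squaring passes both checks; reversing it fails right-uniqueness, and a relation that leaves part of the domain unmapped fails left-totality.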
