
Answering PG's Arc Challenge: On the Road to a DSL


I'm building a new startup -- it allows people to collect and share quotes from books and web articles. As you add each quote, you tag it. When people vote your quote up or down (or comment on it), the system trains itself to learn which tags each user likes. I may like quotes from American History. You may never want to see any quotes about politics. Over time, the system learns this and acts accordingly. That way you can have a broad range of subjects with a large user base and the app still has the feel of a private forum.

A while back, Paul Graham wrote a language called Arc. After he wrote it, he challenged other languages to create a simple set of web pages in as few tokens as possible. In Paul's philosophy, the fewer tokens a language has (or needs), the more robust it is, and therefore the more likely it is to last a hundred years.

I've been thinking about Paul's assertion for over a year now. I've programmed in lots of languages -- to me they're just tools. Old friends. I can't say I am crazy about one language or another, no matter how many tokens it has.

As I and others pointed out, you can make a computer language do almost anything in as few tokens as you like as long as you've set up a DSL (Domain-Specific Language) for the problem domain.

Since I'm building my product almost from scratch, I thought I would take you through a quick tour of how you end up with powerful "languages" that have maximum expressiveness and minimum tokens, no matter what tools you are using. For this discussion, we'll stick to a (mostly) .NET stack, with some major modifications, but the stack is really not important.

First let's start with a design goal: I want a read to consume the fewest processor cycles possible, as opposed to an insert or update. The ratio of casual readers to interactive ones is pretty high -- no point in wasting lots of CPU time displaying the same static page over and over again. So let's use regular html files to represent the state of the system for the casual user. Everything will have a URL, and the URL will simply be an html file. You want to see quote number 7 and the comments? Go to Quote7.htm. This allows us to use a CDN (Content Delivery Network) at some future point if scalability becomes an issue. It allows the site to work for non-JavaScript browsers. And it's more Google-friendly.
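To make the static-file idea concrete, here is a minimal sketch. It is written in Python for brevity, not the F# the site actually uses, and the `Quote` record, the `render_quote_page` and `publish_quote` helpers, and the page layout are all assumptions made up for illustration. Only the `Quote7.htm` naming comes from the text above.

```python
import html
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Quote:
    # Hypothetical record; the real data model is not shown in this post.
    quote_id: int
    text: str
    source: str
    comments: list = field(default_factory=list)


def render_quote_page(quote: Quote) -> str:
    """Build the complete static HTML for one quote and its comments."""
    comment_items = "".join(
        f"<li>{html.escape(c)}</li>" for c in quote.comments
    )
    return (
        "<html><body>"
        f"<blockquote>{html.escape(quote.text)}</blockquote>"
        f"<p>{html.escape(quote.source)}</p>"
        f"<ul>{comment_items}</ul>"
        "</body></html>"
    )


def publish_quote(quote: Quote, site_root: Path) -> Path:
    """Write Quote<N>.htm; meant to run on insert or update, never on a read."""
    page = site_root / f"Quote{quote.quote_id}.htm"
    page.write_text(render_quote_page(quote), encoding="utf-8")
    return page


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    sample = Quote(7, "Old friends & tools.", "a sample book", ["Nice one"])
    print(publish_quote(sample, root).name)
```

The point of the design is that the rendering cost is paid once per write. A casual reader then hits a plain file that any web server or CDN can hand out without running application code.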

Second, I also want the displays customized for the registered user and updating immediately as new information comes in. This means we'll need to get data from the server back to the html file. I'll use JSONP for that. It's lightweight and easy-to-use. Delivering the JSONP to the html file will be a .NET aspx page. The ASP.NET part of this really matters the least: it's just serving up javascript. We could do the same with CGI on any platform. ASP.NET is just a handy tool.
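As a rough illustration of what "just serving up javascript" means for JSONP, the sketch below wraps a JSON payload in a call to a function the page names. It is Python rather than an aspx page, and the `jsonp_response` helper, the `showVotes` callback, and the payload fields are hypothetical. The callback-name check is an added precaution, not something the post describes.

```python
import json
import re

# Accept only plain JavaScript identifiers so a caller cannot inject
# arbitrary script through the callback name.
_CALLBACK_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def jsonp_response(callback: str, payload: dict) -> str:
    """Wrap a JSON payload in a call to the client-named callback."""
    if not _CALLBACK_RE.fullmatch(callback):
        raise ValueError(f"unsafe callback name: {callback!r}")
    return f"{callback}({json.dumps(payload)});"


if __name__ == "__main__":
    print(jsonp_response("showVotes", {"quote": 7, "score": 3}))
    # showVotes({"quote": 7, "score": 3});
```

The static html file would load this response through a script tag and define `showVotes` itself, so the page decides what to do with the data once it arrives.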

Third, I want to use a functional programming language. You don't really learn something well until you build something with it, and this looks like a great project for F#.

Now let's get started. Here's basically how it went.

  1. Create some mock-up screens: At this early point the UI sucks, but that's not the point. Let's sketch out some screens, then kick out some basic html, to see what we want the site to do. As a default, let's make Phase 1 functionality a lot like a standard forum, say Reddit or Hacker News. Send it out to some sample users and tweak. After these first non-functional mock-ups, the rest of the mock-ups will be working models.

  2. Create a persistence layer (Design): At first I thought I'd just keep everything in memory, maybe flushing the data out to a flat file. As I thought about it some more, however, I realized that something a little more structured has its benefits, especially in sharding and replication. So I decided to go with SQL Server. I sketched out a rough data model (took about an hour), created the DDL, and wrote some sample code to update and retrieve one of the tables.

  3. Create a persistence strategy (Generalize the DAL): Since reads are much more common than writes, and since I own the whole box, I'll just cache up everything in memory and pull from there. Every so often -- say once a minute -- I'll refresh the cache. Since the cache is always being updated on writes too, the information is always live; the database just isn't getting thrashed by a bunch of reads of the same data in different combinations. The database, in fact, is just a set-based persistence store, not a huge player in the system. This means that by rewriting the DAL I can target other persistence stores somewhat easily. F# is very close to OCaml, so without too much effort I should be able to switch to a Linux environment, if necessary.

  4. Template the DAL: This is what separates the pros from the punks. As my understanding of the problem domain increases, the data structures needed to solve the problem evolve. I want to make that evolution as painless as possible. Write up a script to take the Data Model, create a database, create a DAL in F#, and enforce the persistence strategy. This takes about 100 lines of code in CodeSmith, but represents all kinds of computational whatnot and eliminates huge swaths of typos and inconsistencies. As the data model changes, push the button and have the underlying library update itself. This reduces errors greatly and simplifies the underlying complexity.

  5. Create the client-server interfaces (Design): Take a few pages and send and receive data. I have some old JSONP code lying around from a previous startup, so this was easy to do.

  6. Generalize the client-server interfaces: As I do more pages, come up with a standard format for calls, how to handle long-format data, user errors, and data formats. This is also easy to do when you use functionality to drive out structure and keep on the lookout for refactoring opportunities.

  7. Template the client-server interfaces: Create a general way of updating the screen that uses templates and tokens. This means less code and more power -- easy changes to the UI.

  8. Template the html files themselves: Create a template for, say, the user's home page. Then when the user's information changes, run the template and update the page. This also allows easy RSS feeds -- an RSS feed is just another version of a template.
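The persistence strategy from step 3 can be sketched roughly as follows. This is an illustration in Python, not the generated F# DAL: the `InMemoryStore` stand-in for SQL Server, the `WriteThroughCache` class, and the key format are all hypothetical. It also refreshes lazily on the next read after the interval has passed, which is a simplification of "once a minute" that avoids a background timer.

```python
import time


class InMemoryStore:
    """Stand-in for the database: a dict of rows. Purely hypothetical."""

    def __init__(self):
        self.rows = {}
        self.reads = 0  # counts full loads, to show how rarely the store is read

    def load_all(self):
        self.reads += 1
        return dict(self.rows)

    def save(self, key, row):
        self.rows[key] = row


class WriteThroughCache:
    """Serve every read from memory; send every write to both cache and store."""

    def __init__(self, store, refresh_seconds=60.0, clock=time.monotonic):
        self._store = store
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._data = store.load_all()
        self._loaded_at = clock()

    def _refresh_if_stale(self):
        # Reload the whole cache once the refresh interval has elapsed.
        if self._clock() - self._loaded_at >= self._refresh_seconds:
            self._data = self._store.load_all()
            self._loaded_at = self._clock()

    def get(self, key):
        self._refresh_if_stale()
        return self._data.get(key)

    def put(self, key, row):
        # Persist first, then update memory so readers see the write at once.
        self._store.save(key, row)
        self._data[key] = row
```

Because callers only ever see `get` and `put`, swapping the store for a different persistence back end means replacing one small class, which is the portability argument made in step 3.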

So now we have a templated way of changing the data -- simply change some tokens in the UML model (or DDL), push a button, and the data structure is changed. We have a templated way of updating the live data on the page: simply write a template for the type of data to retrieve, then run the template through the JavaScript engine. We even have a templated way to create pages: create a page template and have the server update the template whenever the associated information changes.
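A minimal sketch of "templates and tokens" might look like this. It is Python rather than the JavaScript or F# used on the site, and the `{{token}}` syntax, the `render_template` helper, and the sample templates are assumptions chosen for illustration, since the post does not show its actual template format.

```python
import html
import re

# A token is a name wrapped in double braces, e.g. {{user_name}}.
_TOKEN_RE = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render_template(template: str, values: dict) -> str:
    """Replace each {{token}} with its escaped value; leave unknown tokens alone."""

    def substitute(match):
        name = match.group(1)
        if name not in values:
            return match.group(0)
        return html.escape(str(values[name]))

    return _TOKEN_RE.sub(substitute, template)


# The same data can feed more than one template: a page and a feed item.
HOME_PAGE = "<h1>{{user_name}}</h1><p>Karma: {{karma}}</p>"
RSS_ITEM = "<item><title>{{user_name}}</title></item>"

if __name__ == "__main__":
    user = {"user_name": "Ann & Bob", "karma": 42}
    print(render_template(HOME_PAGE, user))
    # <h1>Ann &amp; Bob</h1><p>Karma: 42</p>
    print(render_template(RSS_ITEM, user))
    # <item><title>Ann &amp; Bob</title></item>
```

Changing the UI then means editing a template string, not the code that fetches or shapes the data.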

This puts a lot of power and flexibility in the hands of the developer, but there's one final step: generalizing the tokens and writing the lexer/parser. This takes all of those generics we've been playing around with and creates a new language, a Domain-Specific Language, which addresses the problem of creating static web pages and updating them with dynamic data from the server.

This means creating a set of tokens on the server to represent common nouns and verbs. Stuff like top_user_list_by_karma, sort_by_age, or latest_quotes.

With such a DSL, you'd simply write something like "Update_Page (foo_Template) bind to User and Quote with latest_quotes and _current_user" which gets you a robust page with all kinds of controls and displays based on the current user and the latest quotes on the system -- all with dynamically-updatable data. The only piece that is missing from this code is the page template itself (foo) which can be a bastardized html with some custom attributes.
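To show how little machinery such a statement needs, here is a toy sketch in Python. The grammar is inferred from the single example sentence above, so treat it as a guess. It uses a regular expression instead of a real lexer/parser, and the `UpdatePage` record, the `SOURCES` registry, and the sample data are all hypothetical.

```python
import re
from dataclasses import dataclass

# Grammar guessed from one example:
#   Update_Page (<template>) bind to <entity> [and <entity>...]
#       with <source> [and <source>...]
_STATEMENT_RE = re.compile(
    r"\s*Update_Page\s*\(\s*(?P<template>\w+)\s*\)"
    r"\s+bind\s+to\s+(?P<entities>\w+(?:\s+and\s+\w+)*)"
    r"\s+with\s+(?P<sources>\w+(?:\s+and\s+\w+)*)\s*"
)
_AND_RE = re.compile(r"\s+and\s+")


@dataclass
class UpdatePage:
    template: str
    entities: list
    sources: list


def parse_statement(text: str) -> UpdatePage:
    """Turn one DSL statement into a structured record, or raise ValueError."""
    match = _STATEMENT_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"cannot parse statement: {text!r}")
    return UpdatePage(
        template=match.group("template"),
        entities=_AND_RE.split(match.group("entities")),
        sources=_AND_RE.split(match.group("sources")),
    )


# Each DSL token maps to an ordinary function over some context.
SOURCES = {
    "latest_quotes": lambda ctx: sorted(ctx["quotes"], key=lambda q: q["age"])[:2],
    "_current_user": lambda ctx: ctx["user"],
}


def evaluate(statement: UpdatePage, ctx: dict) -> dict:
    """Resolve every source token to its data, ready to hand to a page template."""
    return {name: SOURCES[name](ctx) for name in statement.sources}
```

Adding a new noun or verb to the language is then one more entry in the registry, and no existing statement has to change.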

I remember reading "Founders at Work" where Paul was discussing being able to change ViaWeb almost instantaneously while customers were on the phone with complaints/feature requests. Having a DSL allows you to do this without having to recode or even program. This is where all this generalizing and such takes you -- to a place where your business can pivot easily to find and address new business opportunities as you see them, instead of being trapped inside a programming paradigm that makes even simple changes painful.

But here's the kicker: even though I used a functional language for this, all languages will evolve to this same spot given enough attention to the problem domain and how you're going to solve it. The pattern is even the same: Design, Generalize, Template, Tokenize. And no, this isn't some kind of crazy architect astronaut adventure on my part. I didn't spend a huge amount of time to reach this spot -- maybe six weeks? And I'm just a sole developer.

Once I'm done, all of that work becomes a language of its own: a language with the fewest number of tokens required to fix people's problems. I know this because I'm not developing any tokens that aren't being used in the solution, and I'm constantly on the lookout for genericizing tokens. Since I'm not touching the code anymore, it's part of my "standard library" of tools to use. I can share this with other developers and, without programming, they can fix similar problems with minimum tokens and maximum expressiveness too.

Paul has included similar web functionality in Arc because he felt building web pages is a hugely important part of a library. He asks if other languages also have the same priorities.

They may not. Instead of easy-to-create web pages, many of them have easy-to-create generics, or easy-to-create scalability patterns, or sorting, or file translating, or peer-to-peer access, or myriad other things.

To me, however, the question isn't which DSL/library is included in a language, it's how easy it is to create your own DSL in that language. It's here where I think functional languages do a much better job than OOP languages. For instance, F# comes with its own lexer and parser, whereas with most OOP languages you'd have to pull something off the shelf. Also, functions as first-class citizens lend themselves to tokenization much more easily than objects do.

But at the end of the day, no matter where you start, you end up in the same spot.

So what your language can do easily, to me, isn't much of a useful question. More useful would be "what libraries can you easily access and how easy is it to create your own DSL for your own designs?" That's where the real difference is made.



Daniel (and sock-monkey) are current applicants for Y Combinator and are looking for Angels to join them in their venture. E-mail them if you are interested in starting a conversation about this.


About this Entry

This page contains a single entry by DanielBMarkham published on February 19, 2010 12:06 PM.


