Arc Forumnew | comments | leaders | submitlogin
Types in Nulan
3 points by Pauan 4475 days ago | 11 comments
So, after much thinking, here's the current state of Nulan's type system.

Objects are suuuuuper-important in Nulan, much more so than most Lisps which just use cons for everything[1]. As I've explained in the past, everything in Nulan is immutable[2], so to "change" an object, you create a new object which is just like the old one but with the changes applied.

But what is an object, exactly? Well, I decided to go for an extremely simple model. An object is simply an ordered binary search tree with keys and values. In other words, a dictionary. This makes Nulan's objects basically the same as objects in JS or Lua, except that they're immutable, sorted, and don't have inheritance.

Now here's where types come in. A type is basically just a gensym, an atomic unique identifier. You can take this type and use it as a key in a dictionary.

Rather than thinking about is-a relationships (is it a number? a string? a duck?) you think about has-a relationships: does this object possess this type as a key?

Here's an example:

  $def %foo (type)
  
  $def foo
    (%foo F) -> (F 10)
Okay, we've defined the variable "%foo" to be a new anonymous type, which is just a gensym. Now we define a "foo" function which pattern matches an object that has a "%foo" key and binds the variable "F" to the value. We then call "F" with the number 10.

How do you create new "instances" of the "%foo" type? Well, you can create a new object using the {} syntax, just like in JS[3]:

  $def foo-instance { %foo (X -> X + 20) }
This created a binary search tree with the type "%foo" as its key, and the function (X -> X + 20) as its value. It then assigned that tree to the variable "foo-instance"

Now let's try passing it to the "foo" function:

  (foo foo-instance)
The above will return 30. What happened is, the "foo" function destructured the object, looking for a "%foo" key. It found it, and bound the value to the variable "F". It then called "F" with 10, and "F" is the function (X -> X + 20) which thus returned 30.

And that's it. That's the entire type system. There's no inheritance, there's no mucking about with things, it's just simple flat objects with gensyms as keys.

Rather than thinking about types as being in some sort of hierarchy, you think about types as being attributes: does this object have this type? You can add, remove, and change types on the fly, which works out very well because objects are immutable.

---

That's great and all, but I ran into a slight snag. I wanted to have only two primitive things in Nulan: types and binary search trees. Types are atomic gensyms, and binary search trees are a combination of type/value pairs.

But when I got around to implementing things like symbols, strings, numbers, etc. it became a bit complicated. These things need to be implemented in terms of JavaScript strings and numbers, and they need at least the "%lt" and "%is" types. The problem is, how to store the data?

What I mean is, let's say you have two numbers. You need to implement a %lt method for them, but how does the left side know about the right side? In other words, how do I take this binary search tree and somehow extract the raw JavaScript number from it?

I can't store it as a JavaScript property because it'll be shadowed when the tree is changed, so it has to be stored inside the tree. Hm... I guess I could use a "&%js" type, which will convert a Nulan object to a JS object. That would be nice, but it does complicate things a lot...

The other option is to simply make strings, numbers, and symbols primitive. I'm actually really leaning toward this, but I'm not sure. The primary downside is that you can't just take a random string/number/symbol and slap arbitrary properties on it: they're atomic, just like types.

---

* [1]: Nulan doesn't have cons at all. Everything is binary search trees, which means "lists" are almost exactly like JavaScript arrays: they're just objects which have numeric keys, and a special "%len" type.

* [2]: The sole exception to this is variables, which are simply a single-celled box used to hold a value.

* [3]: You can also update existing objects in two ways:

  set foo-instance %foo ...
  
  { @foo-instance
    %foo ... }
The first way uses a function called "set" to update "foo-instance", whereas the second way uses the splicing form @. This has a whole slew of benefits... it makes it easy to merge objects together:

  { @foo @bar @qux }
It makes it easy to update an existing object with multiple properties:

  { @foo
    %bar   ...
    %qux   ...
    %corge ... }
It makes it easy to define the order in which things are merged:

  { %bar ...
    @foo }
The above will create an object that uses "foo"s definition of "%bar", but if it doesn't exist, it'll use "...". In other words, the above provides a default implementation of "%bar" if it doesn't already exist in "foo".


1 point by Pauan 4471 days ago | link

I'm getting closer to figuring this stuff out (I hope).

One problem with this super-simple system is that I wasn't able to figure out how functions fit into it... And fexprs just seemed to really muck things up even worse, and it's just a big pain and all.

But I wasn't really able to get rid of fexprs, or to be more specific, I couldn't find a better idea. Haskell has all this loveliness going on by using functions for pretty much everything, but I want to try a different route. I want to see how far I can push a functional OOP language with immutable objects and gensyms.

So for now, here's the plan. $vau will have to be built-in (duh) but using this system, it could in principle be written in user-land, provided you had a few other primitives like $quote, each, etc.

$vau will return an object. This object has a %call property, which lets you specify custom behavior when calling an object. This %call property is actually another object, which has an %argument, %environment, %body, and %closure properties.

Each property is a Nulan datatype, like an object or number or whatever, so this avoids the problem of using native JavaScript functions (I don't want to expose native things to Nulan that have the same purpose as Nulan data types, unless the reason is for speed).

That means that this vau:

  ($vau E [Args Body]
    (wrap (eval E [$vau ~ Args Body])))
Would be represented as this object:

  { %call { %argument    ($quote [Args Body])
            %environment ($quote E)
            %body        ($quote (wrap (eval E [$vau ~ Args Body])))
            %closure     ... }}
%closure would be the environment that the vau was defined in. This makes it possible not only to inspect the argument, body, environment, etc. of all vaus/functions, but also lets you modify them (keep in mind objects are immutable, so this doesn't cause any problems).

It also lets you attach arbitrary properties to vaus. Including other things, this means that (in principle) you can implement "wrap" in user-land:

  $def %wrapped (uniq)

  $def wrap
    $vau E [F]
      $let F (eval E F)
        { @($vau E X
             (F (each X; X -> (eval E X))))
          %wrapped F }
And then define $fn in terms of wrap, like usual:

  $def $fn
    $vau E [Args Body]
      wrap (eval E [$vau ~ Args Body])
And then define "unwrap":

  $def unwrap
    $fn [{ %wrapped F }]
      F
So I guess this means Nulan is going into the "everything is an object" direction, rather than the hardcore Haskell way which isn't too far from "everything is a function"

It'll be interesting to see how these immutable objects combine with the functional/recursive style of Nulan.

---

Oh yeah, and I've been playing with the idea of getting rid of var, so that there's no mutation whatsoever. Recursive functions would have to use the Y combinator. I'm not decided on this either.

-----

1 point by Pauan 4475 days ago | link

A quick comparison. First, in Nulan:

  $def %foo (type)
  $def foo
    (%foo F) -> (F 10)
  $def foo-instance { %foo (X -> X + 20) }
  (foo foo-instance)
Now in Arc:

  (def foo (x)
    (x!foo 10))
  (= foo-instance (obj foo (fn (x) (+ x 20))))
  (foo foo-instance)
Now in Arc (hygienic):

  (= foo-t (uniq))
  (def foo (x)
    (x.foo-t 10))
  (= foo-instance (listtab:list
                    (list foo-t (fn (x) (+ x 20)))))
  (foo foo-instance)
Now in JS:

  function foo(x) {
    return x.foo(10)
  }
  var fooInstance = { foo: function (x) { return x + 20 } }
  foo(fooInstance)
Now in Python:

  def foo(x):
    return x["foo"](10)
  fooInstance = { "foo": lambda x: x + 20 }
  foo(fooInstance)
Interestingly enough, Arc and Python are shorter than Nulan by 1 line. But there's a HUGE benefit to the Nulan code: it's hygienic. Arc, JS, and Python use string/symbol keys, which means if you pass in an object that just so happens to have a "foo" key, but different behavior, things will break.

Nulan, however, uses types, which are unique identifiers, so it's impossible to accidentally add a %foo type to something: there's no collision.

As you can see, the hygienic version of the Arc code sucks, because Arc doesn't have a good way to create a hash table where the keys are evaluated. That would be easy to fix with a macro, putting it on par with Nulan. Unfortunately, Arc really isn't designed for this kind of type system, whereas Nulan is. But I can easily imagine a dialect of Arc that uses Nulan's type system.

There's a proposal for ECMAScript to add in Private Names, which are basically just gensyms: http://wiki.ecmascript.org/doku.php?id=strawman:private_name... That would give JS roughly the same capabilities of Nulan, albeit with worse syntax and much more complicated semantics.

Unfortunately, even if it gets added to JS (I sure hope so!) the existing language doesn't make use of it. For instance, arrays have a magical "length" property, where "length" is a string. Maybe that'll get fixed eventually.

---

Oh yeah, and it's possible to use the unhygienic version in Nulan too, by using string keys:

  $def foo
    ("foo" F) -> (F 10)
  $def foo-instance { "foo" (X -> X + 20) }
  (foo foo-instance)
But I don't recommend this. The one place where I'd consider it okay is to have an object as a value of a type:

  $def %foo (type)

  { %foo { "bar"   ...
           "qux"   ...
           "corge" ... } }

-----

2 points by Pauan 4475 days ago | link

Oh, I just decided, I'll use {} for object destructuring instead, so the "foo" function looks like this now:

  $def foo
    { %foo F } -> (F 10)
This lets you match multiple properties, and use splice:

  $def foo
    { %foo F %bar G @qux } -> (F 10)
The only problem is, now I don't know how to say "match an object that has only these properties". I guess I could add in an "only" pattern matcher...

  $def foo
    (only { %foo F }) -> (F 10)
I dunno, seems a bit... off.

-----

1 point by Pauan 4475 days ago | link

Also, I got the binary search tree code ported over to JS. It's looking quite nice, and it's mostly partitioned off so you can use it by itself in JS, without using the rest of Nulan:

https://github.com/Pauan/nulan/blob/master/Tree.js

You can just open up Chrome's JavaScript Console and paste in the code (https://raw.github.com/Pauan/nulan/master/Tree.js)

Now try it out like this:

  var x = NULAN.toTree({ foo: "bar", qux: "corge" })
  console.log("" + x)
  console.log(x.get("foo"))
  console.log("" + x.set("nou", "yes"))
  console.log("" + x)

-----

1 point by Pauan 4475 days ago | link

"The other option is to simply make strings, numbers, and symbols primitive. I'm actually really leaning toward this, but I'm not sure. The primary downside is that you can't just take a random string/number/symbol and slap arbitrary properties on it: they're atomic, just like types."

Okay, I think I got it figured out. Numbers, strings, and types will be atomic. Numbers and strings will just be JS numbers/strings, so they can be nice and faaaast. Symbols will be a tree that uses strings and has an %eval type, to give them special behavior when being evaluated.

-----

1 point by rocketnia 4475 days ago | link

There are a few more approaches if you're going for "everything's a foo" minimalism:

- Numbers and strings can be types. You already talk about using numbers as keys (aka types) for array access, so it's a bit surprising you haven't mentioned this option yourself.

- Numbers and strings can be objects whose types happen to be out of scope in user code. As far as the programmer is concerned, the core utilities were constructed with those types in scope, so it makes sense for them to be able to reach inside. I like the thought of everything being a weak table (and this philosophy underlies my garbage collector, Cairntaker), so this is the kind of choice I would make.

- Numbers and strings can be functions. Why not? :-p

-----

1 point by Pauan 4475 days ago | link

"Numbers and strings can be types."

Hm... I suppose that might work... yeah you're right that might be a really good idea. Thanks, I didn't think about that. But... hm... I wonder if that'll really work...

---

"Numbers and strings can be objects whose types happen to be out of scope in user code."

That's the problem. If I create the object using an ordinary closure to hide the data, how do I actually get the data when doing things like "lt" and "is" comparisons? I'd have to store it in the object itself, in which case now it's exposed to user code.

---

"Numbers and strings can be functions. Why not? :-p"

Because I'm trying to make a clean separation between data and algorithm. Though I already blur that a little by making $vau return an object. Plus I'm not sure how I'd implement that anyways, could you give an example?

-----

1 point by rocketnia 4470 days ago | link

"I'd have to store it in the object itself, in which case now it's exposed to user code."

If the type is out of scope in user code, then there's nothing user code can do to extract that value, right? The user code can't extract a list of types for an object, can it?

---

"Because I'm trying to make a clean separation between data and algorithm. Though I already blur that a little by making $vau return an object. Plus I'm not sure how I'd implement that anyways, could you give an example?"

I don't know what numbers and strings would do as functions, but they could potentially take a secret key and return some other internal representation. This secret key would not be in scope in user code. ;)

Of course, if user code can inspect lexical closure data anyway, it's impossible to hide things this way. I kinda recommend making the %closure key mean "the lexical closure, or nil if it happens to be unavailable."

-----

1 point by Pauan 4470 days ago | link

"If the type is out of scope in user code, then there's nothing user code can do to extract that value, right? The user code can't extract a list of types for an object, can it?"

I was thinking about letting code loop through an object, similar to the "for ... in" loop in JavaScript. An example of where that would be helpful is map, filter, etc.

If I gave that up, then, sure, I could store it in the object and user code wouldn't be able to reach it.

---

"Of course, if user code can inspect lexical closure data anyway, it's impossible to hide things this way. I kinda recommend making the %closure key mean "the lexical closure, or nil if it happens to be unavailable.""

That's right. I think if I'm going to go the "everything is an object" way, then I want everything to be open. Of course, I might change my mind and try some other route...

I think I've figured out what it is that I want in a programming language, at least at a high level. I want a programming language that's basically like Lego blocks. You have these little atomic pieces that are useful in and of themself, but because they share a similar interface, you can plug them together to create bigger pieces.

In other words, it's the Unix philosophy of "write programs that do one thing and do it well" as well as the functional philosophy of "avoiding state". The reason for this is to be able to treat the program as a black box that can be plugged into other black boxes to get the behavior.

I've given up on extensibility and being able to hack other people's code. Why? If their code is insufficient, it's probably easier to just rewrite it from scratch anyways. As long as the new code has the same interface as the old code, it'll work just fine.

So I want a language that has a lot of power and flexibility for defining interfaces, and also a lot of fine-grained power for taking programs and treating them as black boxes. Using objects for everything defeats that, if I allow code to loop through the keys of objects.

-----

1 point by Pauan 4470 days ago | link

I guess I can make vau a primitive, which makes that part easier. I'm still trying to figure out the best way to create interfaces, though...

Haskell's Monads are quite elegant. This is the best tutorial I've found thus far about them: http://mvanier.livejournal.com/3917.html

But I think my objects + gensyms are basically a simple system that lets Nulan have both monads and comonads. Still a lot to think about and figure out!

-----

1 point by Pauan 4474 days ago | link

"Okay, I think I got it figured out."

Okay, I think I got it really figured out. Symbols and numbers will be atomic. Everything else is a tree: strings will be a list of symbols.

I'll have gensyms too, which are atomic like symbols, but are unique. It would be possible to build gensyms using just numbers + trees, but I think it's more consistent to make them atomic, since they're a kind of symbol.

Types then will be actual genuine gensyms, rather than almost-gensyms.

-----