Arc Forumnew | comments | leaders | submitlogin
4 points by nlavine 6103 days ago | link | parent

Let me get at two points separately.

First, I have always taken symbols to be things that are conceptually different than strings, but just happened to be represented as strings. After all, symbols really appear because they are part of the interpreter/compiler data structures - they are a way of referencing an identifier, which you might use as, for instance, a variable name. Yes, they can be displayed, and they print as the string associated with their identifier, but I've always considered that a rather uninteresting side effect. Showing a symbol to a user would seem to break layers of abstraction.

Strings, on the other hand, always seemed like the right way to store user-readable text. They have these nice escape sequences, and you can isolate them and change them without affecting your entire program. They can also include spaces and nice stuff like that.

However, it is completely possible that my gut reaction is wrong. Having interned strings can be very useful. I would suggest, however, that we do it with a real interned string facility, and then make symbols a special case of that. This facility should include strings that would not make valid identifiers, including things with spaces and special characters in them.

To your second point, though, you're right - strings are pointless. After all, they're just a special case of a list (or an array, depending on how you like to think). If you have the general case, then having the specific too is pointless and feels bloated. You could argue that characters, too, are just a special case of numbers, since that's what they really are underneath. In fact, you could say that the current string type is just a vector of numbers, with cool special syntax.

To which I would say, yes, let's go for the more general stuff. Practically speaking, we should add that in before we take out the specific string case, which after all is used a lot. But in general, yeah, let's make it possible to do this with anything.

If you do allow this abstraction, you also get the interesting benefit that internationalization and unicode come for free. As PG says, they're just stupid bloat in the core of the language. In order not to have them there then, there should be some way for people to implement them, without sacrificing anything that a built-in implementation would have.

This means that there needs to be a way to make a new type which is a vector of uniform type (oddly enough, this might be easier in arc2c than in arcn or anarki). It also means that there should be a way to define nice readable syntax for things like strings in double quotes, and isolated single characters.

And it still doesn't address the issue of how you communicate with the development system, which after all does have one preferred string representation - the one that you're writing in.

This definitely wouldn't be an easy or simple thing to do, and it might prove to be a bad idea. But I think it's worth considering.



2 points by absz 6103 days ago | link

You make some really good points, but I have to disagree. Because of Unicode, characters aren't numbers. They have different semantic properties. However, I think arc.arc has a good idea sitting on line 2521 (of the Anarki version): "; could a string be (#\a #\b . "") ?" That is a novel idea, and probably (if more things work with improper lists) a good one. The downside would be that (isa "abc" 'string) wouldn't be true, unless we make strings (annotate "abc" 'string), but then we lose the ability to treat them as lists. Maybe we should have a retype operator, so that (retype "abc" 'string) would return an object str for which (isa str 'string) would be true, but for which all list operations (car, cdr, map, etc.) would still work (so it would probably have to return true for (isa str 'cons), too).

-----

2 points by almkglor 6103 days ago | link

> (#\a #\b . "")

Well, Arc lists are (e e . nil). Why do we have a different terminator for strings?

This is where things get hairy. In Scheme lists are (e e . ()), so I would say that having strings be (#\a #\b . "") would make sense there, but Arc specifically tries to pretend that () is nil. Why should "" be different from nil, when () is nil?

As for [isa _ 'string], I propose instead the following function:

  (def astring (e)
    ((afn (e e2)
       (if
         (no e)
           t
         ; check for circularity
         (is e e2)
           nil
         (and (acons e) (isa (car e) 'character))
           (self (cdr e) (cddr e2))
         ; else
           nil))
      e (cdr e)))
Edit:

Hmm. I suppose someone will complain that you might want to differentiate between the empty string and nil. To which I respond: How is nil different from the empty list? Arc attempts to unify them; why do we want to not unify the empty string with nil? If someone says http://example.com/foobar?x= , how different is that from http://example.com/foobar ? x is still empty/nil in both cases.

As another example: table values are deleted by simply assigning nil to the value field. And yet in practice it hasn't hurt any of my use of tables; has anyone here found a real use where having a present-but-value-is-nil key in a table?

-----