"I can't believe this never came up working with json"
I think order-preserving tables are pretty useful for JSON -- particularly to help ensure implementation details aren't leaking even in a distributed system where JSON is being passed through multiple parsers and serializers -- but it's worth noting that even JSON objects are specified to be unordered and are commonly treated as such. (RFC 8259: "An object is an unordered collection ...")
Even JavaScript's standard JSON.parse(...) doesn't preserve order the way you might like it to. In Firefox right now, parsing an object whose keys mix integer-like strings (like "1") with other strings (like "a") gives me back an object that iterates its integer-like keys first, whatever order they arrived in.
This is because JavaScript objects historically didn't preserve any particular iteration order, and although they have more consistent cross-browser behavior these days, they seem to have settled on some rules like iterating through numeric properties like "1" before other properties like "a". (From what I've found, I see people saying this behavior is a standard as of ES2015, but I don't see it in the ES2018 spec.)
JavaScript Maps are a different story; they're a newer addition to the language, and they're strictly specified to preserve insertion order. Other languages use dictionaries which preserve order, including Groovy (where [a: 1, b: 2] is a LinkedHashMap literal), Ruby, and... you've already found details on Python.
Even in Groovy, JavaScript (Maps), and Ruby, the only way to observe the order of their collections is to iterate from the beginning, and the only way to influence it is to insert entries at the end. This means any substantial interactions with the order of these collections will be just about as inefficient and inconvenient as if we had converted the whole collection to an association list, processed it that way, and converted it back. Essentially, it seems the order is preserved only to prevent programs from having language-implementation-dependent bugs, and maybe to help with recognizing values in debug output, not because the order is useful for programming.
Every one of these languages makes it easy to communicate the intent of an ordered collection by using lists or arrays of some sort. I'm pretty sure I've seen this lead to association lists even in JSON, particularly for things like database query results:
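(For instance, each row serialized as a list of column/value pairs; a made-up illustration:)

[[["id", 1], ["name", "alice"]],
 [["id", 2], ["name", "bob"]]]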
It could be me, but I have trouble imagining any situation where I would need this syntax.
- If I'm not doing both kinds of lookup several times in the same part of the code, the lookup styles don't both need to be concise at the same time. Different parts of the code can convert to different representations that are concise to use in that context.
- If it's the root layer of a data structure, then I can simply use two different local variables to refer to that data, one of which always treats it as a list and one of which always treats it as a table.
- Unless I'm processing some specific indices, I'll usually want to iterate through the whole data structure, so I usually wouldn't be using any indexing operations in the first place.
- If I know what specific indexes I want at development time, then I would usually use an unordered table (for easy indexing) or a fixed-length list (for easy destructuring).
- If I'm performing any two lookups for the same reason, they can usually use the same expression in the code.
So I would have to be writing some code where I'm performing several different lookups of specific ordered indexes and specific chosen keyed indexes, all of which are for distinct reasons I know at development time but few of which are under indexes I know at development time. Moreover, each lookup must be two or more layers deep into a nested data structure.
This seems to me like a very specific situation. I suppose maybe it might come up more often in data-driven applications that have many scripted extensions which are specialized to certain configurations of data, but it seems to me even those would rarely care about specific ordered indexes.
Even what you've described about your application makes it sound like you only need to look things up by ordered indexes when the user wants to move an item from one stack to the next (or to the first). If that one part of your code is verbose, I recommend not worrying about it. If the rest of your code is verbose, how about converting your alists to tables whenever you enter that code?
I'm sorry, you have a very clear idea in mind of the data structure you want, and it's very reasonable to want to know how to build it in a way that gives you convenient syntax for it.
I think I'm just trying to encourage you to get to know other techniques because I think that's the easiest way to start. Building a custom data structure in Arc is not something many people go to the trouble to do, so it's a bit hard to come up with a good example.
I don't think you need any new ssyntaxes for this new data structure.
Say you have an ordered dictionary value `db` and you want convenient syntaxes to look up items by index (where 0 should give you the first item) and by key (where 0 should give you the item with key 0). One thing you might be able to do is this:
db!by-key.0
db!by-pos.0
To get these lookups working, you only need to use Anarki's `defcall`. Here's a stub that might get you started:
(defcall ordered-dict (self val)
  (case val
    by-key (fn (key) ...)
    by-pos (fn (pos) ...)
           (err "Expected by-key or by-pos")))
Unfortunately, this doesn't allow you to assign:
(= db!by-key.0 "val")
(= db!by-pos.0 "val")
To get these to work, you must know a little bit about how Arc implements assignment. The above two calls essentially turn into this:
(sref db!by-key "val" 0)
(sref db!by-pos "val" 0)
This means you need to extend `sref` using Anarki's `defextend` to make these work.
But we've defined db!by-key and db!by-pos to be procedures that only know how to get the value, not how to set it. We need them to return something that knows how to do both.
So let's define another type, 'getset, which contains a getter function and a setter function.
(def make-getset (get set)
  (annotate 'getset (list get set)))

(def getset-get (gs . args)
  (apply rep.gs.0 args))

(def getset-set (gs val . args)
  (apply rep.gs.1 val args))
You should pick some representation to use (annotate 'ordered-dict ...) with (or whatever other name you like for this type) so you can implement all four of these methods.
Note that we don't need to do (defextend sref ...) for ordered-dict values here because the `sref` calls don't ever get passed the table directly. You would only need that if you wanted (= db!by-key "val") or (= db.0 "val") to do something, but those look like buggy code to me.
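For instance, here's a sketch of that wiring for 'getset values (not for the ordered dict itself), assuming Anarki's (defcall type (self . args) ...) and (defextend name args test . body) shapes:

(defcall getset (self . args)
  (apply getset-get self args))

(defextend sref (gs val key) (isa gs 'getset)
  (getset-set gs val key))

With these, db!by-pos.0 goes through getset-get, and (= db!by-pos.0 "val") turns into (sref db!by-pos "val" 0), which lands in getset-set.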
So far, so good, but you probably want to do more with ordered dictionaries than getting and setting.
There's at least one more utility you might like to `defextend` in Anarki for use with ordered dictionaries: `walk`. A few Anarki utilities like `each` internally call `walk` to iterate over collections, so if you extend `walk`, you can get several Anarki utilities to work with ordered dictionaries all at once.
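For instance, supposing the 'ordered-dict representation keeps its keys in insertion order somewhere, and supposing `walk` receives the sequence and a function, as in (walk seq f), the extension might look roughly like this (`keys-in-order` is a hypothetical accessor):

(defextend walk (seq f) (isa seq 'ordered-dict)
  (each k (keys-in-order seq)
    ; pass (key value) pairs along, the way `each` sees tables
    (f (list k ((seq 'by-key) k)))))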
You may also want to make various other utilities for coercing between ordered dictionaries and other Anarki values, such as tables and alists. That way you can easily use Anarki's existing table and alist utilities when they work for your situation.
Note that this approach doesn't make it possible to use the {...} curly brace syntax you and shawn have been talking about. Since there are potentially many kinds of tables, I'm not sure whether giving them all read syntaxes really makes sense; we might start getting into some obscure punctuation. :-p
I hope that's a little bit more helpful advice for your situation. I tried to write up something like this the other day and got carried away implementing it all myself, but then I realized it might not be what you really wanted to use here. It didn't even take an approach as convenient to use as this one. Maybe this can give you enough of a starting point to reach a point you're happy with.
Isn't Common Lisp a language with a package system and unhygienic macros?
Common Lisp's approach is that the way a symbol is read incorporates information about the current namespace. That way usually all symbols, even quoted ones, can only have collisions if they have collisions within the same file, and this makes hygiene problems easier to debug on a per-file basis.
I don't think it's my favorite approach, but it could very well be a viable approach for Arc. I was using an approach somewhat like this in Lathe's namespace system, although instead of qualifying symbols at read time, I was qualifying each of them individually as needed, using Arc macros.
Good question, but ns.arc manipulates what Racket calls namespaces, which are data structures that carry certain state and variable bindings we might usually think of as "global," particularly the definitions of top-level variables.
What Common Lisp and Clojure call namespaces are like prefixes that get prepended to every symbol in a file, changing them from unqualified names into qualified names.
I think namespaces are a fine approach for Arc. If Anarki's going to have both, it's probably best to rename Anarki's interactions with Racket namespaces (like in ns.arc) so they're called "environments" or something, to reduce confusion. I think they will essentially fit the role of what Common Lisp calls environments.
Of course, people doing Racket interop will still need to know they're called namespaces on the Racket side. Is there another name we can use for Common Lisp style namespaces? "Qualifications" seems like it could work.
paulgraham: "Really? You've been mad at me for years for writing a new Lisp dialect? But new dialects are so common in the history of Lisp. I've probably used 20 in my life. And why be so attached to CL specifically?
In the old days, Lisp hackers always used multiple dialects, and basically tried to program as close to the platonic form of Lisp as they could modulo the flaws of whatever one they happened to be using. Don't things work that way now? Are there lots of people who are attached to CL specifically rather than Lisp generally?"
---
demoss: "What is annoying is that for 6 years now you have been building a following of people who go "Lisp is theoretically nice, but all the existing ones are SO full of onions! I'm going to wait for Arc to come out before I learn Lisp!""
---
death: "The cardinal rule of Lisp: don't reinvent, integrate."
paulgraham: "I don't know where you picked this up, but it seems the very opposite of the Lisp spirit to me. E.g. Steele and Sussman. Are you sure you didn't mean the cardinal rule of Java or something?"
For what it's worth, my approach here is pattern-matching. In Lathe Comforts for Racket I implement a macro `expect` which expands to Racket's `match`.
If Arc came with a similar pattern-matching DSL and `expect`, we could write this:
(def map1 (f xs)
  (expect xs (cons x xs) ()
    (cons (f x) (map1 f xs))))
The line "expect xs (cons x xs) ()" conveys "If xs isn't a cons cell, finish with an empty list. Otherwise, proceed with x and xs bound to its car and cdr."
"Clojure has good interop with java and that's what made Clojure explosive. If we can do that with Arc/Racket then we are better off for it."
Do we ever expect Anarki values to be somehow better than Racket values are? If so, then they shouldn't be the same values. (The occasional "interop headaches" are a symptom of Anarki values being more interchangeable with Racket values than they should be, giving people false hope that they'll be interchangeable all the time.)
I think this is why Arc originally tossed out Racket's macro system, its structure type system, and its module system. Arc macros, values, and libraries could potentially be better than Racket's, somehow, someday. If they didn't already have a better module system in mind, then maybe they were just optimistic that experimentation would get them there.
Maybe that's a failed experiment, especially in Anarki where we've had years to form consensus on better systems than Racket's, and aligning the language with Racket is for the best.
But I have a related but different experience with Cene; I have more concrete reasons to break interop there.
I'm building Cene largely because no other language has the kind of extensibility I want, even the languages I'm implementing it in. So it's not a surprise that Cene's modules aren't going to be able to interoperate with Racket's modules (much less JavaScript's modules) as peers. And since the design of user-defined macros and user-defined types ties into the design of the modules they're defined in, Cene can't really reuse Racket's macro system or first-class values either.
My comment is only a remark on "what holds back widespread Arc adoption".
If your goal is for arc to have widespread adoption then being able to leverage racket in a meaningful way will help get you there.
Currently the ability to drop into racket is not getting people to use arc, it still seems people would rather just use racket. http://arclanguage.org/item?id=20781
IMO, it would be better if arc had implicit methods that provide access to racket capabilities. In Clojure, having libraries, namespaces, and a seamless interface to java translated into a plethora of libraries for Clojurians to utilize. Can we not do the same? Well, if the goal is "widespread adoption" then we need to.
Can you give an example of where the notation is better with tables than with alists? Maybe there's something you can do about it, e.g. writing a macro, using `defcall` or `defset`, or extending `sref`.
I recommend not expecting `quote` or `quasiquote` to be very useful outside the context of metaprogramming.
Quotation is helpful for metaprogramming in a tangible way, because we can easily refactor code in ways that move parts of it from being quoted to being unquoted or vice versa.
And quotation is limited to metaprogramming in a tangible way, because we can only quote data that's reasonably maintainable in the same format we're maintaining our other code in. For instance, an Arc `quote` or `quasiquote` operation is itself written inside an Arc source code file, which is plain text, so it isn't very useful for quoting graphics or audio data.
We can of course use other functions or macros to construct those kinds of data. That's essentially Arc's relationship with tables. When we've constructed tables, we've just used (obj ...) and (listtab ...) and such.
Adding tables to the language syntax is doable, but it could have some quirks.
; Should this cause an error, or should it result in the same thing as
; '(let i 0 `{,++.i "foo"}) or '(let i 0 `{,++.i "foo"})? Each option
; is a little surprising, since any slight edit to the code, like
; replacing one ++.i with (++ i 1), would give us valid code to
; construct a two-entry table, and this code very well might have
; arisen from a slight edit in the opposite direction.
'(let i 0
   `{,++.i "foo" ,++.i "bar"})
; Should this always result in 2, or should it result in 1 if "foo"
; comes last in the table's iteration order?
(let x 0
  `{"foo" ,(= x 1) "bar" ,(= x 2)}
  x)
Personally, I tend to go the other way: I prefer to have as few kinds of data as possible in a language's quotable syntax.
A macroexpander needs to extract two things from the code at any given time: The name of the next macro to expand, and the region of syntax the macro should operate on. Symbols help encode macro names. Lists and symbols together help encode regions of plain text. I think it's for these reasons that symbols and lists are so essential to Arc's syntax.
Arc has other kinds of data that commonly occur in its quotable syntax, like strings and numbers, but it doesn't need them nearly as much as symbols and lists. Arc could expand symbols like |"Hello, world!"| and |123| as ssyntax, or it could simply let everyone put up with writing things like (string '|Hello, world!|) and (int:string '|123|) each time.
Tables would fit particularly well into a language's quotable syntax if they somehow helped encode regions of syntax. For instance, if a macro body consisted of all the files in a directory, then a table could be an appropriate representation of that file collection.
> I recommend not expecting `quote` or `quasiquote` to be very useful outside the context of metaprogramming.
My immediate reaction is to disagree. A lot of the reason Lisp is so great is that quasiquotation is orthogonal to macros/metaprogramming.
> ; Should this cause an error, or should it result in the same thing as
> ; '(let i 0 `{,++.i "foo"}) or '(let i 0 `{,++.i "foo"})?
Those two fragments are the same?
In general it feels unnecessarily confusing to include long doc comments in code fragments here. We're already using prose to describe the code before and after.
Code comments make sense when sharing a utility that you expect readers to copy/paste directly into a file to keep around on their disks. But I don't think that's what you intend here?
Finally, both your examples seem to be more about side effects in literals? That is a bad idea whether it's a table literal or not, and whether it uses quasiquoting or not. Do you have a different example to show the issue without relying on side-effects?
I've replied separately about why I would say quasiquotation is only useful for code generation. In this reply I'll focus on the topic of the quirks we might have to deal with if we have Arc tables as quasiquotable syntax.
I think they're mostly unrelated topics, but I was using the quirks of tables in `quasiquote` to motivate keeping the number of quasiquotable syntaxes small and focused. Since I believe quotation is essentially only good for code generation (as I explain in more detail in the other reply), my preference is generally to focus the quasiquotable syntaxes on that purpose alone.
---
"In general it feels unnecessarily confusing to include long doc comments in code fragments here. We're already using prose to describe the code before and after."
Sorry, and thanks for the feedback on this.
There's a deeper problem here where my posts can get a bit long, with a lot of asides. :) I thought of those code examples as an aside or a subsection. If you were going to skim over the code, I wanted it to be syntactically easy to skim over the related prose at the same time.
This was something I felt was particularly worth skipping over. Ultimately, the quirks of using tables as syntax are mostly just as easy to put up with as the quirks of using tables for anything else. (I've gone to the trouble to make what I think of as non-quirky tables for Cene, but it's a very elaborate design, and I wouldn't actually expect to see non-quirky tables in Arc.)
Since I was only using these quirks to motivate why `quasiquote` would tend to be focused on code generation, I probably didn't invest enough space to fully explain what the quirks were. I'll try to explain them now....
---
"Those two fragments are the same?"
Whoops, those two fragments were supposed to be '(let i 0 `{,++.i "foo"}) and '(let i 0 `{,++.i "bar"}).
---
"Finally, both your examples seem to be more about side effects in literals? That is a bad idea whether it's a table literal or not, and whether it uses quasiquoting or not. Do you have a different example to show the issue without relying on side-effects?"
I don't know if I'd say the unquoted-key example depends on side effects, but the unquoted-value example very much does. Here it is again:
(let x 0
  `{"foo" ,(= x 1) "bar" ,(= x 2)}
  x)
The quirk here is that the usual left-to-right evaluation order of Arc can't necessarily be guaranteed for table-based syntax, and if the evaluation order matters for any reason, it must be because of some kind of side effect.
Removing side effects from the language is a great remedy for this, but typically that kind of effort can only go so far. In an untyped language, we usually have to deal with the side effects of run time type errors and nontermination, even if we eliminate everything else:
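(For instance, whether this sketch signals a type error or simply never finishes depends on which entry is evaluated first:)

`{"fst" ,(+ 1 'two)          ; a run time type error
  "snd" ,((afn () (self)))}  ; nontermination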
Even if we commit to programming without any run time errors or nontermination (perhaps enforcing termination with the help of a type system like that of Coq or Agda), we still have some cases like this where the order matters:
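(A sketch: the first entry allocates a single 64TB string, and `long-computation` is a made-up stand-in for something slow:)

`{"fst" ,(newstring (* 64 1000 1000 1000 1000))
  "snd" ,(long-computation)}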
A programmer in Arc or Racket might expect this program to reach a space limit relatively soon on machines with less than 64TB of space available, since Arc and Racket guarantee left-to-right evaluation order.
If the programmer actively intends for this program to fail fast, you and I will probably agree they would be better off sequencing the operations a little more explicitly, maybe like this:
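(A sketch, pinning the allocation first with `let`:)

(let fst (newstring (* 64 1000 1000 1000 1000))
  `{"fst" ,fst "snd" ,(long-computation)})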
But suppose the programmer doesn't initially realize the program will fail at all. It only crosses their mind when they come back to diagnose bugs in their code, at which point they expect these expressions to evaluate from left to right because that's what Arc and Racket normally guarantee.
That's when they have to realize that the tables in their syntax have gotten in the way of this guarantee.
Simple solution: We clearly document this so people don't expect left-to-right evaluation order in this situation.
Alternative simple solution: We make tables order-preserving so they can be evaluated as expected.
That covers the unquoted-value example.
Now let's consider the unquoted-key example:
'(let i 0
   `{,++.i "foo" ,++.i "bar"})
In this one, the quirk is that the two occurrences of ,++.i are expressed with the same syntax, so at read time the table would have two identical keys, even though the programmer may expect them to express different behavior.
While it looks like this example depends on side effects (in this case mutation), I'm not so sure it does. Here's an alternative example which shows the same issue without necessarily using side effects:
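(In the spirit of the earlier example:)

`{,(current-location) "foo" ,(current-location) "bar"}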
This involves a hypothetical macro (current-location) which would expand to a string literal describing the filename, line, and column where it was expanded.
Is it a side effect? Maybe not; a file of code that used (current-location) would usually be semantically equivalent to a file that spelled out the same string literal by hand. In a language with separately compiled modules, both files might compile to the same result, which would make that semantic equivalence precise. In such a language, we typically wouldn't have any reason to mind if a module used (current-location) in its source code, even if we preferred to avoid it for some reason in our own code. This makes it into some kind of "safe" side effect, if it's even a side effect at all.
Nevertheless, within a single file, the expression (current-location) could look the same in two places but give different results.
That's where using `unquote` in table keys becomes quirky: The source code of two table keys may look identical (and hence cause a duplicate key conflict at the source code level) even if the programmer thinks of them as being different because they eventually generate different results.
Because of this quirk, the programmer may have to use some kind of workaround, like putting slightly different useless code into each key:
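(For instance, a throwaway number in a `do` form:)

`{,(do 1 (current-location)) "foo"
  ,(do 2 (current-location)) "bar"}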
Simple solution: We clearly document this so programmers can use that workaround with confidence. To help make sure programmers are aware of this documentation, we report descriptive errors at read time or at "quasiquotation construction time" if a table would be made with duplicate keys.
Alternative simple solution: We decide never to allow table keys to be unquoted. If a table key appears to be unquoted, the table key actually consists of a list of the form (unquote ...). We still report errors at construction time or read time so programmers don't mistakenly believe `{same-key ,(foo) same-key ,(bar)} will evaluate both expressions (foo) and (bar).
Relying on the order arguments are evaluated in is always going to result in grief. Regardless of programming language. It's one of those noob mistakes that we've all made and learned from. I think we shouldn't be trying to protect people from such mistakes. I'd rather think about how we can get people to make such mistakes faster, so they can more rapidly build up the requisite scar tissue :)
So yes, we should document this, but not just in this particular case of tables. It feels more like something to bring up in the tutorial.
Edit: to be clear, I'm not (yet) supporting Kinnard's original proposal. I haven't fully digested it yet. I'm just responding to your comment in isolation ^_^
"My immediate reaction is to disagree. A lot of the reason Lisp is so great is that quasiquotation is orthogonal to macros/metaprogramming."
Do you have particular reasons in mind? It sounds like you're reserving those until you understand what I'm saying with my quasiquoted table examples, but I think those examples are mostly incidental to the point I'm making. (I'll clarify them in a separate reply.)
Maybe I can express this again in a different way.
I bet we can at least agree, on a definitional level, that quotation is good for constructing data out of data that's written directly in the code.
I contend quotation is only very useful when it comes to code generation.
If there were ever some kind of data we could quote that we couldn't use as program syntax, then we could just remove the quotation boundary and we'd have a fresh new design for a program syntax, which would bring us back up to parity between quotation and code generation.
In a Lispy language like Arc, usually it's possible to write a macro that acts as a synonym of `quote` itself. That means the set of things that can be passed to macros must be a superset of the things that can be passed to `quote`. Conversely, since all code should be quotable, the set of things that can be passed to `quote` must be a superset of the things passed to macros, so they're precisely the same set.
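For illustration, such a synonym is a one-liner in Arc:

(mac my-quote (x)
  `(quote ,x))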
This time I've made it sound like some abstract property of macro system design, but it doesn't just come up in the design of an axiomatic language core; it comes up in the day-to-day use of the language, too. Quoted lists that don't begin with prefix operators are indented oddly compared to practically all the other lists in a Lispy program. I expect similar issues arise with syntax highlighting. In general, the habits and tooling we use with the language syntax don't treat quasiquoted non-code as a seamless part of the language. So, reserving quasiquotation for actual code generation purposes tends to let it help out in the places it really helps while keeping it out of the places where it causes awkward and distracting editor interactions.
> I bet we can at least agree, on a definitional level, that quotation is good for constructing data out of data that's written directly in the code.
No, I think I disagree there, assuming I'm understanding you correctly.
One common case where I used to use quasiquote was in data migrations, and there was never a macro in sight. I don't precisely remember a real use case involving RSS feeds and user data back in the day, but here's a made-up example.
Say you're running a MMORPG that started out in 2D, but you're now adding a third dimension, starting all players off at an elevation of 0m above sea level. Initially your user data is 2-tuples that look like this:
(lat long)
Now you want it to look like this:
(x y z)
..where x is the old latitude and z is the old longitude.
Here are two ways to perform this transform. Using quasiquote:
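; (a sketch; add-z is a stand-in name)
(def add-z (player)
  (let (lat long) player
    `(,lat 0 ,long)))

And the same thing using explicit constructors:

(def add-z (player)
  (let (lat long) player
    (list lat 0 long)))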
Hopefully that conveys the idea. Maybe the difference doesn't seem large, but imagine the schema gets more complex and more deeply nested. Having lots of `list` and `cons` tokens around is a drag.
I've always thought there's a deep duality between quasiquote and destructuring. Totally independent of macros.
"No, I think I disagree there, assuming I'm understanding you correctly."
That's interesting.... How would you describe what quotation does, then, if you wouldn't say it lets you write certain data directly in the code?
---
In your data migration example, I notice you're reading and writing the data. You're even putting newlines in it, which suggests you might sometimes view the contents of that written data directly. If you're viewing it directly, it makes sense to want the code that generates it to look similar to what it looks like in that representation.
It's not always feasible for code to resemble data, but since that file is plain text with s-expressions, and since the code that generates it is plain text with s-expressions, it is very possible: First you can pretend they're the exact same language, and then you can use `quasiquote` for code generation.
You might not have thought of it in that order, but I think the cases where `quasiquote` fails to be useful are exactly the cases where it's hard to pretend the generated data is in the same language as the code generating it.
---
"I've always thought there's a deep duality between quasiquote and destructuring."
I've always thought it would be more flexible if the first element of the list were a prefix operation, letting us destructure other things like tables and tagged values.
One of the few things I implemented in patmac.arc was a `quasiquote` pattern that resembles Arc destructuring just like you're talking about.
Racket doesn't need a library like patmac.arc because it already comes with a pattern-matching DSL with user-definable match expanders. One of Racket's built-in match syntaxes is `quasiquote`.
The main issue with alists is that the special syntax doesn't work and the notation is so verbose . . . I don't know if the efficiency issues would even come to bear for me.
EDIT: nesting is not behaving as I expect but that may be a product of my own misunderstanding.
I would be careful not to structure data/logic to accommodate a special syntax.
i.e. while using:
pipe!todo.1!id
is certainly fancy, writing a function is just as effective and most likely more performant, since it doesn't require read destructuring:
(fetch pipe todo first id)
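(If you're wondering how such a `fetch` might be defined, here's one hypothetical sketch, written as a macro that treats its keys as literals and reads `first` as "take the head of the list":)

(mac fetch (x . keys)
  (if (no keys)
       x
      (is (car keys) 'first)
       `(fetch (car ,x) ,@(cdr keys))
       `(fetch (,x ',(car keys)) ,@(cdr keys))))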
So I'm suggesting you shape your syntax usage around your data, not your data around your syntax. You can always write a macro to obtain your desired level of brevity.
The general idea behind the fix is that quoted literals need to be treated as data. Arc now has two new functions for this purpose: quoted and unquoted.
The fact that (quote {a 1}) now becomes a hash table is a little strange. I’m not entirely sure that’s correct behavior. It depends whether (car '({a 1})) should yield a hash table. It seems like it should, which is reified in the code now.
EDIT: Ok, I've force-pushed the fixed commit. (Sorry for the force-push.)
If you `git reset --hard HEAD~1 && git pull` it should work now.
Personally, I found I prefer racket's pretty-printing with the horrible hash tables compared to something like {pipe "water" a 1 b 2 c ...} because if you try to evaluate `items` or `profs` you won't have a clue what the data is without pretty-printing.
And it turns out I suck at writing pretty-printers. Someone else do it!
I almost never use (coerce param 'sym) when I can say sym.param instead. I've always thought Arc doesn't really do enough with the `coerce` function to justify having it in the language; individual functions like `sym` and `string` already do its job more concisely.
---
In practice when I used to write weakly typed utilities in Arc, I tended to find `zap` very nice:
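(For instance, with a made-up utility that should accept a string or a symbol:)

(def greet (name)
  (zap sym name)        ; now name is a symbol either way
  (prn "hello, " name))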
If you're unfamiliar with `zap`, (zap sym param) is essentially (= param (sym param)).
I prefer strong typing these days, but I've sometimes thought this `zap` technique could be refined to put the coercions in the argument list directly. Arc has (o arg default) for optional arguments, and we could imagine a similar (z coercion arg) for automatically zapping an argument as it comes in:
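(The same made-up utility as before, with the coercion moved into the argument list; this (z ...) syntax is hypothetical:)

(def greet ((z sym name))
  (prn "hello, " name))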
Something else that's been on my mind is that it could be useful to have higher-order coercion combinators.
Racket code can use `->` to build a contract for a function out of contracts for its arguments and its result. The result of (-> string? string? symbol?) is a contract that verifies a value is a two-argument function and then replaces it with a function that delegates to that one, but which verifies that the arguments are strings and that the result is a symbol.
The same thing could be done for coercions: The result of (as-> string string sym) could be a coercion function that coerces its argument into a two-argument function that delegates to the original value, but which first coerces the arguments to strings and then coerces the result to a symbol.
Similarly, in Racket, `(listof symbol?)` is a contract that checks that a value is a list of symbols, and for coercions we could imagine a corresponding `(aslistof sym)` operation for use in your `(map [coerce _ 'sym] ...)` example.
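Here's a rough sketch of what those combinators could look like in Arc (`as->` and `aslistof` are just the hypothetical names from above):

(def as-> coercers
  ; all but the last coercer apply to the arguments; the last
  ; applies to the result
  (withs (resultc (last coercers)
          argcs   (firstn (- (len coercers) 1) coercers))
    (fn (f)
      (fn args
        (resultc (apply f (map (fn (c a) (c a)) argcs args)))))))

(def aslistof (c)
  (fn (xs) (map c xs)))

; ((as-> string string sym) +) stringifies both arguments,
; concatenates them with +, and returns the result as a symbol,
; so passing 1 and 2 would give the symbol |12|.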
Sometimes Arc's weak typing suffers from making poor guesses as to whether `nil` is intended as a symbol or as a list (not to mention as a boolean), and it takes some adjusting to work around it:
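(For instance, as I remember the Arc 3.1 behavior:)

(string nil)        ; => "" -- nil here reads as the empty list
(string 'a nil 'b)  ; => "ab" -- the nil vanishes instead of appearing as "nil"
(string "nil")      ; => "nil" -- spelling it out is one workaround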
I overlooked the existence of `sym` which does make my request altogether superfluous. Higher-order coercion combinators will take some digestion on my part!
It looks like that's thanks to a March 8, 2012 commit by akkartik (https://github.com/arclanguage/anarki/commit/547d8966de76320...)... which, lol... Everything I was saying in a couple of recent threads about replacing the Arc reader to read mutable tables... I guess that's already in place. :)
I think you should have to quote them. Like how you have to quote lists:
'(foo bar)
is just a list, but
(foo bar)
will either call the function foo or error if it doesn't exist.
So in a similar way
{foo "bar"}
should give a syntax error, but maybe it can have some kind of semantic meaning later. I've been considering that square brackets could be used for assignment/local binding, to cut down on the need for LET (not necessarily in Arc, just in Lisp in general).
I don't agree. Quoting a list is a way to protect the expression from evaluation, in this case because round brackets normally indicate an expression that needs to be called. A table literal {...} doesn't need protection from evaluation as a callable expression; it's just data, and like any other data it should evaluate to itself. And, frankly, it would really suck having to protect that data everywhere in my code because someone wants a really nuanced use case to work.
Really what should happen is that [] should be implemented such that we don't need to protect lists of data.
"The best part is that the server should be able to reboot without losing the closures."
Might want to remember to wipe them out if you make certain changes to the code, so that you don't have to think about the effects of running old code and new code in the same system.
(Edit: Oops, I replied out of order and didn't read shawn's comment with the elisp examples before writing this.)
I suspect what pg means by it is primarily that it's tricky to do in Racket (though I'm not sure if it'd be because there are too few options or too many).
Essentially, I think it's easy to display a closure by displaying its source code and all the captured values of its source code's free variables. (Note that this might be cyclic since functions are often recursive.)
But there is something tricky about it, which is: What language is the source code in? In my opinion, a language design is at its most customizable when even its built-in syntaxes are indistinguishable from user-defined ones. So the displayed format of a function would ideally involve some particular set of user-definable syntaxes. Since a language designer can't anticipate what user-defined syntaxes will exist, clearly this decision should ultimately be up to a user. But what mechanism does a user use to express this decision?
As a baseline, there's at least one straightforward choice the user can make: The one that expresses the code in terms of Arc special forms (`fn`, `assign`, `if`, `quote`, etc.) and Arc functions that aren't implemented in Arc (`car`, `apply`, etc.). In a language that isn't aiming for flawless customizability, this is enough.
Now suppose we try to generalize this so a user can choose any set of syntaxes to express things in terms of -- say, "I want the usual language, but with `=` treated as an additional built-in." If the code contains `(= x (rfn ...))`, then the macroexpander at some point needs to expand the `rfn` without expanding the `=`. That's not really viable in terms of Arc's macroexpander since we don't even know `(rfn ...)` is an expression in this context until we process `=`. So this isn't quite the right generalization; the right generalization is something trickier.
I suppose we can have every function printed in terms of its pre-macroexpansion source code along with printing all the macros and other macroexpansion-time state it happened to rely on as it macroexpanded the first time. That would narrow down the problem to how to print the built-in functions. And we could solve that problem in the language design by making it so nothing ever captures a built-in function as a first-class value, only as a late-bound reference to the global variable it's accessible from.
Or we could have the user specify their own macroexpander and make it so that whenever a function is printed, if the current macroexpander hasn't expanded that function yet, it does so now (just to determine how the function is printed, not how it behaves). This would let the user specify, for instance, that `assign` expands into `=` and `=` expands into itself, rather than the other way around.
These ideas are incomplete, and I think making them complete would be pretty tricky.
In Cene, I have a different take on this: If a function is printable (and not all are), then it's printable because it's a callable struct with a tag the printer knows about. It would be printed as a struct. The function implementation wouldn't be printed. (The user could look up the source code information based on the struct tag, but that's usually not printable.) There may be some exceptions at the REPL where information is printed that usually isn't available, because the REPL is essentially a debugging context and the debugger sees all. (Racket's struct inspectors express a similar debugger-sees-all principle, but I haven't seen the REPL take advantage of it.)
You're hitting on a problem I've been thinking about for years. There are a few reasons this is tricky, notably related to detecting whether something is a variable reference or a variable declaration.
(%language arc
  (let in (instring " foo")
    (%language scm
      (let-values (((a b c) (port-next-location in)))
        (%language el
          (with-current-buffer (generate-new-buffer "bar")
            (insert (prin1-to-string c))
            (current-buffer)))))))
To handle this example, you'll need to know whether each form is a function call, a variable definition, or a list of definitions (let-values), and which target language each function is being called for.
For example, an arc function call needs to expand into `(ar-apply foo ...)`
And due to syntax, you can't just handle all the cases by writing some hypothetical very-smart `ar-apply` function. If your arc compiler targets elisp, it's tempting to try something like this:
(ar-apply let (ar-apply (ar-apply a (list 1))) ...
which can collapse nicely back down to
(let ((a 1)) ...)
in other words, it's tempting to try to defer the "syntax concern" until after you've walked the code and expanded all the macros. Then you'd collapse the resulting expressions back down to the language-specific syntax.
But it quickly becomes apparent that this is a bad idea.
Another alternative is to have a "standard language" which all the nested languages must transpile to:
(%let in (%call instring " foo")
  (%let (a b c) (%call port-next-location in)
    (|with-current-buffer| (%call generate-new-buffer "bar")
      (%call insert (prin1-to-string c))
      (%call current-buffer))))
Now, that seems much better! You can take those expressions and easily spit out code for scheme, elisp, arc, or any other target. And from there it's just a matter of adding shims on each runtime.
The tricky case to note in the above example is with-current-buffer. It's an elisp macro, meaning it has to end up in functional position like (with-current-buffer ...) rather than something like (funcall #'with-current-buffer ...).
There are two ways to deal with this case. One is to hook into elisp's macroexpand function and expand the macros as you go. Emacs calls this eager macroexpansion, and there are some cases related to autoloading (I think?) that make this not necessarily a good idea.
The other way is to punt, and have the user indicate "this is an elisp form; do not mess with it."
The idea is that if the symbol in functional position is surrounded by pipe chars, then the compiler should leave its position alone but compile the arguments.
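So suppose, for instance, you try to write an ordinary elisp `let` form under this scheme:

(|let| ((a 1) (b 2))
  (+ a b))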
Then you'll be in for a nasty surprise: not only does it look visually awful and annoying to write, but it won't work at all, because it'll compile to something like this:
(let (ar-funcall2 (a 1) (b 2))
  (ar-funcall2 _+ a b))
I am not sure it's possible to escape the "syntax concern". Emacs itself had to deal with it for user macros. And the solution is unfortunately to specify the syntax of every form explicitly.
I can't speak to elisp, but the way macro systems work in Arc and Racket, the code inside a macro call could mean something completely different depending on the macro. Some macros could quote it, compile it in their own way, etc. So any code occurring in a macro call generally can't be transformed without changing the meaning of the program. Trying to detect and process other macro calls inside there is unreliable.
I have ideas in mind for how macro systems can express "Okay, this macro call is over; everything beyond this point in the s-expression is an expression." But that doesn't help with Arc or Racket, whose macro systems aren't designed for that.
So something like your situation, where you need to walk the code before knowing which macroexpander to subject each part of it to, can't reliably treat the code as code. It's better to treat the code as a meaningless soup of symbols and parentheses (or even as a string). You can walk through the data and find things like `(%language ...)` and treat those as escape sequences.
(What elisp is doing there looks like custom escape sequences, which I think is ultimately a more concise way of doing things if new macro definitions are rare. It gets into a middle ground between having s-expression soup and having a macro system that's designed for letting code be walked like this.)
Processing the scope of variables is a little difficult, so my escape sequences would be a bit more verbose than your example. It's not like we can't take a Racket expression and infer its free variables, but we can only do that if we're ready to call the Racket macroexpander, which isn't part of the approach I'm describing.
(I heard elisp is lexically scoped these days. Is that right?)
This is how I'd modify the escape sequences. This way it's clear what variables are passing between languages:
(%language arc ()
  (let in (instring " foo")
    (%language scm ((in in))
      (let-values (((a b c) (port-next-location in)))
        (%language el ((c c))
          (with-current-buffer (generate-new-buffer "bar")
            (insert (prin1-to-string c))
            (current-buffer)))))))
Actually, instead of just (in in), I might also specify a named strategy for how to convert that value from an Arc value to a Racket value.
Anyhow, once we walk over this and process the expressions, we can wind up with generated code for each of the three language blocks.
We also collect enough metadata in the process that we can write harnesses to call these blocks at the right times with the right values.
This is a general-purpose technique that should help with any combination of languages. It doesn't matter if they run in the same address space or anything; that kind of detail only changes what options you have for value marshalling strategies.
I think there's a somewhat more convenient approach that might be possible between Arc and Racket, since their macroexpanders both run in the same process and can trade off with each other: We can have an Arc macro that expands its body as Racket code (essentially Anarki's `$`) and a Racket macro that expands its body as Arc code. But there are some difficulties designing the latter, particularly in terms of Racket's approach to hygiene and its local macros, two things the Arc macroexpander has zero concept of. When we switch from Racket to Arc and back to Racket, the hygiene information and local macro scopes will probably be obliterated.
In your arcmacs project, I guess you might also be able to have an Arc macro that expands its body as elisp code, an elisp macro that expands its body as Racket code, etc. :-p So maybe that's the approach you really want to take with `%language` and I'm off on a tangent with this "escape sequence" interpretation.