Arc Forumnew | comments | leaders | submitlogin
Fold
5 points by skenney26 5799 days ago | 14 comments
There are a couple definitions in arc.arc that benefit from being redefined using fold:

  (def fold (f v xs)
    (if xs
        (f (car xs) (fold f v (cdr xs)))
        v))
The first is map1:

  (def map1 (f xs)
    (fold (fn (x xs) (cons (f x) xs)) nil xs))
The second is best:

  (def best (f seq)
    (fold (fn (x y) (if (f x y) x y)) (car seq) (cdr seq)))
These definitions are shorter than the originals and almost twice as fast. (Functions such as rev and len are slower with fold.)

They could also be rewritten with a slightly more exotic notation using a macro that creates an anonymous two variable function:

  (mac xy exprs
   `(fn (x y) ,@exprs))

  (def map1 (f xs)
    (fold (xy (cons (f x) y)) nil xs))

  (def best (f seq)
    (fold (xy (if (f x y) x y)) (car seq) (cdr seq)))


5 points by rntz 5798 days ago | link

foldl and foldr exist in lib/util.arc on anarki. Unlike the above implementation, they are both tail-recursive:

    (def flip (f)
      "Flips the order of arguments of 'f. eg: ((flip cons) 1 2) -> (2 . 1)"
      (fn (a b) (f b a)))
    
    (def foldl (f v l)
      (if l (foldl f (f v car.l) cdr.l) v))

    (def foldr (f v l)
      (foldl flip.f v rev.l))
Note that the order-of-arguments to the function passed to lib/util's version of foldl follows that of Haskell rather than SML. This is also the order used on Wikipedia (http://en.wikipedia.org/wiki/Foldl).

On what grounds (benchmarks, examples, etc) do you claim your implementations are faster than the original implementations? From merely looking at the code, it seems to me that map1 looks just like your fold version, inlined.

-----

2 points by skenney26 5798 days ago | link

The tests weren't exhaustive, just simple comparison tests:

  (time (repeat 1000 (map1 [* _ 3] '(1 2 3 4 5))))
Comparing the different implementations of map1, the one using the definition of fold (foldr) at the top of the page was the fastest, followed by Arc's version, followed by Anarki's version (using the definitions of flip, foldl and foldr in rntz's comment).

These informal tests were repeated on best and rev with the same results. I honestly don't know why it's faster.

-----

2 points by rntz 5798 days ago | link

Interesting. I get the same results, even when I modify the test slightly to run on lists of different lengths:

    (def testmap (ntests map) (time (for i 0 ntests (map [* _ 3] (range 0 ntests)))))
No matter the number of tests, though, the difference is only a millisecond or so, which is negligible as the size of the input increases or if the mapped function does nontrivial work.

However, I've also tried the non-tail-recursive fold on very large lists, and it doesn't appear to blow the stack. mzscheme probably does something to prevent this from happening, like allocating "stack" frames on the heap (a standard technique for implementing call/cc without stack-copying, and if your GC is good it's not much of a penalty). Given this, it seems more defensible that arc.arc's 'map1 et al are non-tail-recursive. I think I may change lib/util's foldr to use the OP's implementation.

-----

3 points by rkts 5798 days ago | link

The difference is the call to (no xs). Swap the then/else cases in map1 and it runs as fast as your version.

-----

4 points by simonb 5799 days ago | link

The problem with your definition of fold is, it's not tail-recursive, which makes it less then ideal for implementing other general utility functions.

Also, to implement map you either need a right-fold or reverse the results.

-----

2 points by skenney26 5799 days ago | link

The fold defined above is a right fold. As for defining other general utilities, this version of fold can be used to define any function that uses a right fold. For example:

  (def len (xs)
    (fold (xy (+ y 1)) 0 xs))

  (def sumlength (xs)
    (fold (fn (n (x y)) (list (+ n x) (+ 1 y))) '(0 0) xs))
Graham Hutton has written an excellent paper on fold called "A tutorial on the universality and expressiveness of fold" which includes these and other definitions.

-----

3 points by simonb 5798 days ago | link

This doesn't change the fact that your definition of fold blows the stack for large inputs.

P.S. the "you need right-fold" bit was obviously a brain-fart on my part, sorry.

-----

1 point by absz 5799 days ago | link

To nitpick about notation (not about concept), on Anarki, the xy macro is unnecessary; the [] construction can construct n-ary procedures. Thus, we would have, for instance,

  (def map1 (f xs)
    (fold [cons (f _1) _2] nil xs))
. Your point about the benefits of fold is a good one, of course!

-----

1 point by skenney26 5799 days ago | link

I guess the aesthetics of using _1, _2, etc. has always bothered me a little.

-----

1 point by shader 5798 days ago | link

If you don't like _1, _2, etc. we could implement the system discussed here: http://arclanguage.org/item?id=8617. Basically, bind the unbound symbols inside the brackets in alphabetical order. It would work with _, _1, _2 as usual, but also x, y. Generally when people do short functions of a few variables they pick them in alphabetical order, so it makes sense. I'm just not sure how to go about implementing it. Presumably it would have to search the body for symbols which fail the bound predicate. sort them, and use them as the argument list. Unfortunately, it sounds like a macro with run-time knowledge. I don't know if macros have information on whether a symbol is bound or not; I would presume not.

Anyway, it's an idea.

-----

1 point by absz 5798 days ago | link

You could look at make-br-fn on Anarki; it crawls through the [] functions finding free variables of the form it wants (_\d+ or __). It might be possible to modify it to pick everything out, if we want to. The downside, of course, is that if you typo, say, string as strnig, the [] function will think it's a brand-new variable. And the other downside is that lexically speaking, '_10 < '_2, though we could always special-case numbers.

-----

1 point by shader 5798 days ago | link

Ok, I'll look at make-br-fn. And I hadn't thought about _10 < _2. However, when is someone going to write a 10 arg function with no names? Sounds suspicious to me. I agree that strnig would cause problems. However, unless you are very lucky, calling the arg "strnig" in function position will cause as much of an error as calling strnig in a normal function call. Of course, to see how useful it really is, I'd have to try it. I think that in most cases where all you want is three vars a, b, & c, or x, y, and z, it should work fine. And look better than _1 _2 _3, with less typing.

-----

1 point by skenney26 5799 days ago | link

Correction: rev is also twice as fast, but using a left fold rather than a right fold:

  (def foldl (f v xs)
    (if xs
        (foldl f (f (car xs) v) (cdr xs))
        v))

  (def rev (xs)
    (foldl (xy (cons x y)) nil xs))

-----

3 points by rntz 5798 days ago | link

(xy (cons x y)) is just 'cons.

    (def rev (xs) (foldl cons nil xs))
Although you write your foldl differently than mine; for me, it would be:

    (def rev (xs) (foldl flip.cons nil xs))

-----