Arc Forumnew | comments | leaders | submitlogin
2 points by almkglor 6134 days ago | link | parent

I've just built a continuation-approach parser combinator library, pushed on the git. It's currently faster on 'many and 'seq, but slower on 'alt. It's considerably faster on Arki's use case, however (about 33% less time).


2 points by raymyers 6133 days ago | link

From the comments: "...continuation-passing style, which is hard on the eyes..."

You weren't kidding :) However, I congratulate you on your superior understanding of continuations.

-----

1 point by almkglor 6133 days ago | link

Thanks. And #2 wasn't kidding either.

That said I think the main issues slowing down the monadic treeparse are:

1. Use of 'return. I'm not sure if mzscheme can optimize this away (it's just being used as a multiple-value return, and is immediately destructured by the calling function). Heck, I'm not even sure Arc destructuring is in a form optimizable by Scheme. Using continuations, this tuple becomes the parameters of the call to the continuation, and that I'm somewhat convinced that Scheme (all Schemes) are capable of optimizing. Also, by avoiding explicit destructuring, we can be reasonably sure that Arc gives the parameter list plain to Scheme (not that I've actually studied ac.scm much).

2. Copying of 'parsed. Unfortunately, we have to copy 'parsed, because of 'actions! This is because 'sem attaches the 'parsed part to the 'actions field (via closure), and so we can't reuse this 'parsed part (alternatively we could just copy 'parsed inside 'sem, instead of copying on 'many and 'seq). IMO: drop 'actions ^^. Copying 'parsed, I suspect, may be the single most time-consuming part of the monadic treeparse.

3. Using a 'tconc cell, while neater, is less efficient (about 20% IIRC) than explicit and separate head+tail variables. My goal in optimizing treeparse was to maintain legibility while adding speed; my goal in cps_treeparse was to optimize, legibility be damned^H^H^H^H^H^H^H consigned to hell.

4. There's a few more bits and pieces that may be optimizable - for example, most of my 'filt usage is (filt [map something _] parser). If we are sure that 'parsed will not need to be reused (assured by dropping 'actions, or giving the responsibility of copying 'parsed to 'sem) then we can define a (filt-map something parser) which returns the same list as 'parser, but with their car's transformed by 'something:

  ;monadic
  (def filt-map (fun parser)
    (fn (remaining)
      ; no actions!
      (iflet (parsed remaining) (parser remaining)
        ((afn (p)
           (when p
             (zap fun (car p))
             (self cdr p)))
         parsed)
        (return parsed remaining))))

  ; cps
  (def filt-map (fun parser)
    (fn (remaining pass fail)
      (parser
        (fn (parsed parsedtl remaining)
          ((afn (p)
             (when p
               (zap fun (car p))
               (self cdr p)))
           parsed)
          (pass parsed parsedtl remaining))
        fail)))
Using filt-map would reduce consing, and especially in the cps case would remove the scan through 'parsed to find the 'parsedtl.

Further optimizations abound: another common idiom I found in my wiki parser was (filt nilfn (seq-str "something")), which basically ignores the 'parsed return value of 'seq, It may be possible to create a variant of 'seq, 'seq-nil, which wouldn't bother catenating the 'parsed return values of subparsers.

So no, I don't think continuation-passing per se causes most of the speed difference, except possibly for eliminating the 'return structure.

Another possible issue is the use of a list-like structure. 'scanner-string also conses on each 'cdr, and the total memory usage actually exceeds (coerce "string" 'cons) - it's just that scanners are sexier ^^.

For this case it may be possible to create a variant of the CPS style which takes a tuple of (string, index, end) - as explicit and separate variables of course, so that underlying scheme has a shot at optimizing it away - and use indexing, which I suspect is faster on strings (and which scanner-string ends up doing anyway). This would remove the need for consing too.

-----

1 point by almkglor 6133 days ago | link

LOL forgot to give the parser the list to parse:

  ; cps
  (def filt-map (fun parser)
    (fn (remaining pass fail)
      (parser remaining ;oops!
        (fn (parsed parsedtl remaining)
          ((afn (p)
             (when p
               (zap fun (car p))
               (self cdr p)))
           parsed)
          (pass parsed parsedtl remaining))
        fail)))

-----