muSE - muvee Symbolic Expressions: September 2006

Tuesday, September 19, 2006

Object I/O

... or Why you don't need to invent a new syntax for every object type in muSE.

Standard Scheme (RnRS) has special read/write syntax for vectors which goes like this - #(1 2 3 4), which some implementations print out with a length prefix as #4(1 2 3 4). muSE doesn't have such special syntax for objects because its syntax for read-time evaluation is general enough to cover vectors, hashtables and (as far as I can see) all new object types that can be added to muSE in the future.

The idea is dead simple and is based on the read-time evaluation done by muSE. It is quite likely that you'll have written a function that takes a list of arguments and constructs your object. In the case of vectors, muSE has vector that's used like (vector 1 2 3 4) and in the case of hashtables, it has hashtable that's used like (hashtable '((key1 . value1) (key2 . value2))).

Whenever the writer encounters an object, it simply has to write it out in that "constructor" function notation, using {} instead of (). When such an expression is read back in by muSE, the reader will expand the braces and return the constructed object directly. For example -
> (define v (vector 1 2 3 4))
> (write v)
{vector 1 2 3 4}

When the reader reads the following expression -
(third item {vector 1 2 3 4} is a vector)
you'll actually get a vector as the third item in the above list.
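
The round trip can be modelled in a few lines. The sketch below is written in Python rather than muSE so that it can be run without the interpreter, and the names write_obj, read_braced and CONSTRUCTORS are made up for illustration. It only handles flat forms with integer arguments and says nothing about how muSE's own reader and writer are implemented.

```python
# Toy model of "write objects in constructor notation, rebuild them at read time".
# All names here are hypothetical; this is not muSE's reader or writer.

CONSTRUCTORS = {
    "vector": lambda *items: list(items),   # stands in for muSE's (vector ...)
}

def write_obj(obj):
    """Write a list as {vector ...}, anything else as plain text."""
    if isinstance(obj, list):
        return "{vector " + " ".join(str(x) for x in obj) + "}"
    return str(obj)

def read_braced(text):
    """Read one {constructor arg ...} form and return the constructed object."""
    assert text.startswith("{") and text.endswith("}")
    name, *args = text[1:-1].split()
    return CONSTRUCTORS[name](*(int(a) for a in args))

v = [1, 2, 3, 4]
text = write_obj(v)             # the writer emits constructor notation
print(text)
print(read_braced(text) == v)   # the reader gives back an equal object
```

Supporting a new object type in this model only means adding one entry to CONSTRUCTORS and one branch to the writer, which is the point of the approach: no new reader syntax is needed.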

Friday, September 15, 2006

Anonymous symbols

muSE has a notion of anonymous symbols - symbols which do not have a textual representation - which are useful in situations where you need to keep a set of properties together, and in macros to introduce new variables into the generated expressions. The difference between named and anonymous symbols (apart from the textual representation) is that named symbols persist for the lifetime of the muSE execution environment whereas anonymous symbols and their property lists are garbage collected when there are no references to them.

Anonymous symbols are created using (new). You can use an anonymous symbol as the first argument to the get and put functions to edit its property list.

Property lists

Every symbol in muSE has an associated property list that you can query and edit using the get and put functions. For example -

> (put 'kumar 'sister 'hamsa)
(sister . hamsa)
> (get 'kumar 'brother)
()
> (get 'kumar 'sister)
(sister . hamsa)

A symbol's property list is globally available and is not changed by local contexts such as let.
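
As a rough mental model, a symbol can be pictured as an object that carries its own table of properties. The sketch below is Python, not muSE, and the class and helper names are hypothetical; a Python tuple stands in for the (key . value) pair and an empty tuple for ().

```python
# Toy model of interned symbols with property lists (hypothetical names throughout).

class Symbol:
    def __init__(self, name=None):
        self.name = name      # None models a symbol with no textual representation
        self.plist = {}       # the symbol's property list

_interned = {}                # named symbols live here for the whole run

def symbol(name):
    """Return the unique Symbol for this name, creating it on first use."""
    if name not in _interned:
        _interned[name] = Symbol(name)
    return _interned[name]

def put(sym, key, value):
    sym.plist[key] = value
    return (key, value)       # models the (key . value) pair shown above

def get(sym, key):
    if key in sym.plist:
        return (key, sym.plist[key])
    return ()                 # models () for "no such property"

print(put(symbol("kumar"), "sister", "hamsa"))
print(get(symbol("kumar"), "brother"))
print(get(symbol("kumar"), "sister"))
```

Because the property table hangs off the symbol object and not off any variable binding, rebinding a name locally cannot affect it, which matches the behaviour described for let.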

Symbols and values

In muSE, as in all Schemes I guess, named symbols are entities that are uniquely specified by their textual representation - i.e. two symbols with the same name refer to the same internal object, irrespective of context. In muSE, all named symbols are "interned" forever - i.e. they are automatically kept alive for the life of the running environment.

You can bind values to symbols either using the define syntax or using the set! function. There is very little difference between define and set!. They can both assign values to symbols at the lexically top-level, but only define can be used to specify recursive functions. muSE's define syntax is more restrictive than R5RS Scheme in that you can define functions only like this -


(define f (fn (...args..) ...body...))

whereas in standard Scheme you'd define it like this -


(define (f ...args...) ...body...)

It is not an error to define the value of a symbol more than once using define, but muSE will complain, because redefinition is a common source of programming errors and usually indicates that an incorrect assumption is being made. set! will not complain, of course, as the intention is clear.

You can get the string name of a symbol using (name sym) and you can intern a symbol given its string representation using (symbol "name").

Differences with standard Scheme
In MzScheme (for example), symbols introduced by define are not closed over when creating functions using lambda. Changing the definition of such a top-level symbol will change the behaviour of the function created using lambda. In muSE, however, fn captures the values of symbols at the time it is being created, including all top-level definitions. fn:, on the other hand, allows its behaviour to be changed after its definition ... even in a local context such as that introduced by let.
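
The distinction shows up in any language that resolves top-level names at call time. The sketch below uses Python (not muSE) purely as a model: late behaves the way the MzScheme lambda is described above, and make_early is a hypothetical helper that imitates fn by taking a snapshot of the value when the function is created.

```python
# Python resolves global names at call time, much like the MzScheme behaviour described above.
y = 2.0

def late(x):
    return x + y                     # y is looked up each time late() is called

def make_early():
    captured = y                     # snapshot of y now, modelling what fn is described as doing
    return lambda x: x + captured

early = make_early()

y = 4.0                              # change the top-level definition afterwards

print(late(5.0))                     # sees the new y
print(early(5.0))                    # still uses the y captured at creation time
```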

Tuesday, September 12, 2006

Hashtables are functions

In the same spirit as vectors, hashtables can be thought of as functions that map keys to values. muSE hashtables are presented exactly like that and don't need special accessor functions.


> (define rgb (mk-hashtable))
> (rgb 'red 255)
> (rgb 'green 255)
> (rgb 'blue 255)

If you load the above definitions, you can get an alist from the hash table using -
> (hashtable->alist rgb)
which will give
((red . 255) (green . 255) (blue . 255))
(The order is unspecified, though.)

You can retrieve the green component using (rgb 'green). If you supply a key that is not present in the hashtable, the function will return (). In muSE therefore, it is not possible to distinguish between a key with a value that is () and a key that is not present in the hashtable. This fact is used to remove a key from the hashtable if you pass () as the value argument. For example -


> (rgb 'green ())
> (hashtable->alist rgb)
((red . 255) (blue . 255))

This is not much of a restriction in practice, and you can still use a hashtable as a set by setting the value for each member to any non-NIL value.

Hashtables accept integers, strings and symbols as keys. Therefore, you can use a hashtable like a sparse vector if need be, since they both have the same invocation interface.
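
The calling convention is easy to imitate in other languages. Here is a minimal Python sketch of it (not muSE code; the class name FnHashtable and the method to_alist are invented for this example), with an empty tuple standing in for ().

```python
# Toy model of a hashtable that is called like a function (hypothetical class name).
NIL = ()                                    # stands in for muSE's ()

class FnHashtable:
    def __init__(self):
        self._items = {}

    def __call__(self, key, *value):
        if not value:                       # (h key): look up, () if absent
            return self._items.get(key, NIL)
        if value[0] == NIL:                 # (h key ()): remove the key
            self._items.pop(key, None)
        else:                               # (h key value): store
            self._items[key] = value[0]

    def to_alist(self):
        return list(self._items.items())

rgb = FnHashtable()
rgb("red", 255)
rgb("green", 255)
rgb("blue", 255)
rgb("green", NIL)                           # storing () deletes the entry
print(rgb.to_alist())
print(rgb("green"))                         # absent keys read back as ()
```

What a store call returns in muSE is not covered above, so the model simply returns nothing in that case.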

Vectors are functions

A vector is, conceptually, a function from an index to an object and in muSE, a vector is exactly that - a normal function. Here's an example -

(define rgb (mk-vector 3))

Now rgb is a 3-element vector, with all the slots set to (). Here's how to set the three color components -


(rgb 0 255) ; Red
(rgb 1 255) ; Green
(rgb 2 255) ; Blue

To get the green component, for example, you use (rgb 1). If you pass an index out of range, you always get ().

You can create a vector from data using the vector function like this -
(vector 255 255 255)

Other vector manipulation functions are more or less the same as standard Scheme - such as vector-length, list->vector and vector->list.
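
For readers who want to try the calling convention outside muSE, here is a small Python sketch of it. The class name FnVector is invented, an empty tuple stands in for (), and silently ignoring an out-of-range write is an assumption of the model since only out-of-range reads are described above.

```python
# Toy model of a vector that is called like a function (hypothetical class name).
NIL = ()                                    # stands in for muSE's ()

class FnVector:
    def __init__(self, size):
        self._slots = [NIL] * size          # every slot starts out as ()

    def __call__(self, index, *value):
        in_range = 0 <= index < len(self._slots)
        if not value:                       # (v i): read, () when out of range
            return self._slots[index] if in_range else NIL
        if in_range:                        # (v i x): write (assumption: ignored when out of range)
            self._slots[index] = value[0]

rgb = FnVector(3)
rgb(0, 255)   # Red
rgb(1, 255)   # Green
rgb(2, 255)   # Blue
print(rgb(1))
print(rgb(7))                               # out of range reads give ()
```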

"Backquote" syntax

muSE does not support the Scheme/Lisp backquote notation in its reader. This is primarily because I was lazy, but later on I realized that it is simple to implement something like it using muSE's macro facility. Here's the definition -


(define literal 
  (fn 'args
    (case args
      (() 
       ())
      ((('unlit expr) . etc) 
       (list 'cons expr (apply literal etc)))
      ((('unlit-splice expr) . etc) 
       (list 'append! expr (apply literal etc)))
      ((x . etc) 
       (list 'cons (cons quote x) (apply literal etc))))))

For example -


(literal 1 2 (+ 1 2)
          (unlit (+ 2 2))
          (unlit-splice (map (fn (x) (* x x)) '(5 6 7)))
          8 9 10)

will get you the literal expression -
(1 2 (+ 1 2) 4 25 36 49 8 9 10)

This works well enough for writing most macros, except when you want to use a macro-like expression within the literal, in which case the result of the macro expansion will be used instead of the literal macro term. This is due to the tail-first expansion performed by muSE.
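
To see what the three kinds of term do, here is a Python sketch of the same idea (not muSE). It differs from the macro above in one important way: the muSE version generates cons and append! calls that are evaluated afterwards, whereas this model builds the resulting list directly. Tuples tagged "unlit" and "unlit-splice" carry a Python lambda in place of the expression to evaluate; that encoding is an assumption of the sketch.

```python
# Toy model of literal / unlit / unlit-splice, building the result list directly.

def literal(*terms):
    result = []
    for term in terms:
        tag = term[0] if isinstance(term, tuple) and term else None
        if tag == "unlit":                  # evaluate and insert one item
            result.append(term[1]())
        elif tag == "unlit-splice":         # evaluate and splice in every item
            result.extend(term[1]())
        else:                               # everything else is kept as written
            result.append(term)
    return result

expr = literal(1, 2, ("+", 1, 2),
               ("unlit", lambda: 2 + 2),
               ("unlit-splice", lambda: [x * x for x in (5, 6, 7)]),
               8, 9, 10)
print(expr)
```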

Read-time evaluation & first class macros

muSE has a simple approach to evaluating certain expressions at read-time - you enclose the expression in braces {} instead of parentheses (). If you have untrusted input sources, the muSE API lets you turn off read-time evaluation. You can use read-time evaluation to precompute subexpressions that won't change during execution.

Apart from the braces approach, muSE provides a way to specify functions which take in their syntactic arguments - i.e. their unevaluated arguments - and can return code in the form of another expression that is evaluated instead. These are called macros - just as in Scheme. A macro is specified in muSE using the fn expression that's used for normal functions, but the entire argument list should be quoted. Here's an example macro that evaluates a three-term infix expression -


(define infix3
 (fn '(x op y)
   (list op x y)))

Macro calls may be enclosed in braces or parentheses - both are accepted. So the following uses infix3 in the expected way -


> (infix3 2 + 3)
5

Macro symbols are recognized at the head of a parenthesized list, but not anywhere else. So you can get the expression that the infix3 macro computes by using apply.


> (apply infix3 '(2 + 3))
(+ 2 3)

This is possible because macros in muSE are first class entities - i.e. they can be passed around by value.

Evaluation order
In Common Lisp, I believe macros are expanded head first and they continue to expand until no more macros exist in the expression. muSE, on the contrary, performs tail-first expansion.

For example, in the expression (infix3 2 + (infix3 1 + 2)),
the inner (infix3 1 + 2) is expanded before passing on to the outer infix3, so the outer infix3 sees the expression (2 + (+ 1 2)), which it'll transform to (+ 2 (+ 1 2)).
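
The order can be illustrated with a tiny expander. This is a Python sketch, not muSE's implementation: MACROS and expand are hypothetical names, expressions are nested Python lists, and the only property modelled is that inner macro calls are expanded before the enclosing one sees them.

```python
# Toy macro expander that expands sub-expressions before the enclosing macro call.

MACROS = {
    "infix3": lambda x, op, y: [op, x, y],
}

def expand(expr):
    """Expand macros innermost-first."""
    if not isinstance(expr, list) or not expr:
        return expr
    head, *rest = expr
    rest = [expand(arg) for arg in rest]        # sub-expressions first
    if isinstance(head, str) and head in MACROS:
        return MACROS[head](*rest)              # then the enclosing macro
    return [head, *rest]

print(expand(["infix3", 2, "+", ["infix3", 1, "+", 2]]))
```

The outer infix3 receives ['+', 1, 2] as its third argument, mirroring the (2 + (+ 1 2)) it is described as seeing above.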

Braces vs. parentheses
Braces are evaluated even if they occur within quoted expressions, whereas parentheses aren't, even if they contain sub-expressions that look like macro calls. So the expression
'(1 2 {+ 3 4} 5 6)
is actually
'(1 2 7 5 6)
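
One way to picture this is a reader that evaluates a braced form the moment it finishes reading it, so whatever surrounds the braces (a quote included) only ever sees the result. The Python sketch below models that under heavy simplification: read and OPS are invented names, only integers and two arithmetic operators are supported, and the quote is written out as (quote ...) to keep the tokenizer trivial.

```python
# Toy reader: parentheses build lists, braces are evaluated as soon as they are read.
import operator
import re

OPS = {"+": operator.add, "*": operator.mul}

def read(text):
    tokens = re.findall(r"[(){}]|[^\s(){}]+", text)
    pos = 0

    def parse():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok in ("(", "{"):
            close = ")" if tok == "(" else "}"
            items = []
            while tokens[pos] != close:
                items.append(parse())
            pos += 1                              # skip the closing delimiter
            if tok == "{":
                return OPS[items[0]](*items[1:])  # read-time evaluation
            return items
        return int(tok) if tok.isdigit() else tok

    return parse()

print(read("(quote (1 2 {+ 3 4} 5 6))"))
```

The quoted list already contains 7 by the time the reader returns, so no later evaluation step is involved.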

Pattern matching bind

muSE uniformly uses pattern matching to bind symbols to values. It is used in fn, case and let expressions - which therefore differ slightly from standard Scheme. Not only was it easier to use the same technique in all three expressions, it has resulted in greater expressive power for let and case, obviating the need to do first, rest and such destructuring operations on lists.

muSE's pattern matching bind can deconstruct lists and match constants such as numbers, strings and symbols. Here's an example using fn and case. Suppose we need to create a function that adds up the pair-wise products of its arguments, i.e. -

> (f 1 2 3 4 5 6)
should yield -
1 * 2 + 3 * 4 + 5 * 6
= 44

We can write f like this -


(define f
  (fn args
    (case args
      (() 0)
      ((x y . etc) (+ (* x y) (apply f etc))))))

Note that args is used by itself without enclosing parentheses to get the arguments of the function as a list - this itself is a pattern match. Also note that NIL can be notated as () without a quote character.

Similarly, let also allows you to deconstruct lists. Apart from that, the behaviour of let in muSE is similar to let* in Scheme. There are no other kinds of let in muSE because so far this one has been sufficient.
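
For comparison, the same function can be written with Python's sequence unpacking, which plays the role of the (x y . etc) pattern. This is only an analogy in Python, not muSE code.

```python
# Pair-wise product sum, with unpacking standing in for muSE's pattern match.

def f(*args):
    if not args:                  # corresponds to the () pattern
        return 0
    x, y, *etc = args             # corresponds to the (x y . etc) pattern
    return x * y + f(*etc)

print(f(1, 2, 3, 4, 5, 6))        # 1*2 + 3*4 + 5*6
```

With an odd number of arguments the unpacking raises a ValueError in Python; the muSE version would instead reach a case with no matching pattern.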

Functions/closures

muSE doesn't use the Scheme standard lambda keyword to create functions. This language decision is because muSE is used by non-programmers who are slightly familiar with JavaScript, but will freak out if they see things like lambda occurring anywhere. It has fn and fn: instead. fn behaves like you'd expect lambda to - capturing the lexical context in a closure. fn: creates a function which has a dynamically scoped body. Here's an example that tells you the difference between the two -


(define y 2.0)
(define f (fn (x) (+ x y)))
(define g (fn: (x) (+ x y)))

Now,


> (f 5.0)
7.0
> (g 5.0)
7.0

Fair enough, but now let's change the definition of y ... locally!


(let ((y 4.0))
 (print (f 5.0))
 (print (g 5.0)))

Now f continues to use the old value of y whereas g uses the new value of y instead. The above expression will print -


7.0
9.0
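
The two behaviours can be modelled with an explicit stack of bindings. The following is a Python sketch, not a description of muSE internals: bindings, lookup, let, make_fn and make_fn_dynamic are all hypothetical names chosen for this example.

```python
# Toy model: "let" pushes a frame of bindings, and the two function kinds
# differ in when they look the variable up.
from contextlib import contextmanager

bindings = [{"y": 2.0}]                  # bottom frame models the top-level define

def lookup(name):
    for frame in reversed(bindings):     # innermost binding wins
        if name in frame:
            return frame[name]
    raise NameError(name)

@contextmanager
def let(**new):
    bindings.append(new)
    try:
        yield
    finally:
        bindings.pop()                   # bindings vanish when the let ends

def make_fn():
    y = lookup("y")                      # like fn: value fixed at creation time
    return lambda x: x + y

def make_fn_dynamic():
    return lambda x: x + lookup("y")     # like fn: with a colon: looked up at each call

f = make_fn()
g = make_fn_dynamic()

print(f(5.0), g(5.0))
with let(y=4.0):
    print(f(5.0), g(5.0))
print(f(5.0), g(5.0))
```

Inside the with block only g changes its answer, and once the block ends both functions agree again.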

Monday, September 11, 2006

API documentation

muSE API documentation in HTML format is now available.

Windows CHM help file

Online HTML docs

The API documentation was generated by running the doxygen tool on the muSE source code.

Sunday, September 10, 2006

Welcome to muSE

Hello and welcome to muSE.

I wrote this Scheme dialect with the intention of having a simple, small-footprint engine that one can use as an expressive embedded extension language. The company I work for has allowed me to publish muSE under very liberal open-source license terms.

Selling points -

Easy to use and reasonably documented C embedding API.
Closed execution environment - instabilities won't leak into your system unlike an environment that uses the Boehm garbage collector.
Fairly small footprint - WIN32 executable is about 80KB including diagnostics.
64-bit ready - muSE integers are always 64-bit and 64-bit pointers are just a recompile away.
Unicode throughout - Exclusively uses 16-bit characters internally, supports (only) UTF-8 for I/O.
Fast enough as a glue language (at least for us). Startup/shutdown times are also quite fast due to the minimal language.
Suitable for creating simple DSLs.

Some novel/experimental features -

Simple notation for read-time code evaluation.
First class reader macros that expand tail-first.
Dynamic or lexical scoping - you choose according to your need.
Diagnostics that make spell-check-like suggestions.
Experimental support for ez2scm syntax.

What muSE is not -

A fully R5RS-compliant Scheme. We don't need full support because the runtime is expected to be aggressively extended using C/C++ depending on the usage context. The basics are covered, however - map, for-each, list manipulations, closures, call/cc, vectors, hashtables, ports, etc. are in there. If you're looking for a full R5RS Scheme, try MzScheme which is supported by the wonderful DrScheme IDE.
A coffee maker that toasts bagels to boot.
