Friday, April 29, 2011

Typed signature variables for ML's module language

Idea: The signature language should be able to express, and infer, whatever we can easily be implement. Much like wanting an to be able to express and infer typing for reasonable programs, which ML already does a fairly good job of, I want to improve the language of types for module (signatures for structures and functors).

So first, lets clarify notation.

Notation

Lets write M1, M2, etc for structure (module) names; S1, S2, etc for signature names; and F1, etc for functor names.

ML's structure/signature syntax is a bit text-heavy, so we'll let ourselves take some shortcuts in notation. The curly brackets "{" and "}" are used in the obvious way to define scope where ML used expressions like "in ... end". We use the full-stop character follow by a space (". "), where ML traditionally uses semi-colon (";"). I think this notation much easier on the eye and does not cause any ambiguity.

For example:
  signature S1 = sig   type T;   val z : T;   val s : T -> T;   end;
becomes:   
  S1 = sig{ type T.  val z: T.  val s: T -> T. }

and:
  structure M2 = struct type T = int; val z = 0; s : T -> T;   end;
becomes:
  M1 = mod{ type T= int.  val z= 0.  val s: T -> T. }

I will distinguish between type abbreviations and type abstractions. We want to preserve that every structure can be given a unique most specific signature that respects abstraction. For example consider the following two modules:


  M = mod{ abbreviation T = int.  val z = 0.  fun s x = x + 1. }
  M' = mod{ type T = int.  val z = 0.  fun s x = x + 1. }

Their unique signatures are respectively:

  S = sig{ val z: int.  val s: int -> int. }  S' = sig{ type T.  val z: T.  val s: T -> T. }

I want to support expressing signatures conveniently, while not having the syntactic niceties messing with the semantics.



Given the ability to uniquely determine a signature, I want some slightly specialised notation for relating structures and signatures. For a structure, M, that can have signature S, let us write  M >: S. We will use  M =: S to say that M has exactly signature S. When all instances (structures) of signature S2 can also be given signature S1, then we write  S1 <= S2.  The idea is that the smallest signature is the empty one; and the more stuff in a  signature, the bigger it is in the partial ordering.

e.g. let
  S1 = sig{ type T.  val z: T. }
  S2 = sig{ type T.  val z: T.  val s: T -> T. }
  M2 = mod{ type T = int.  val z = 0.  fun s x = x + 1. }
then the following hold: 
  S1 <= S2
  M2 >: S1
  M2 >: S2
  M2 =: S2
But it is not the case that M2 =: S1.

Idea

Now for the main idea. Observe that, given any structure M1 >: S1, we can construct the following module:

  M' = mod{ open M1.  val y= s (s z). }

I would like ML to have a module/signature/functor language that can express the 'typing' for this code. Here's some suggestive syntax: 

  F2 = mod(M >: S1){ open M.  val y= s (s z). }

So we could then write:

  M2' = F2(M2).

which would be equivalent to:

  M2' = mod{ type T= int.  val z= 0.  fun s x = x + 1. }
          +{ val y= s (s z). }

where + is just the union operation, i.e. the above is also equivalent to:

   M2' = mod{ type T= int.  val z= 0.  fun s x = x + 1.
              val y= s (s z). }.

So, could give F2 the suggestive signature syntax: 

  F2 =: sig(S <= S1){ S -> S + { val y: T. } }.

And, specifically, for M2' we get:

  M2' =: (S2 + { val y: T. })

The idea is that we can have typed signature variables, and some typical set-like operations on them (for more on providing set-like operations for signatures, see: "An Expressive Language of Signatures" by Norman Ramsey and Kathleen Fisher and Paul Govereau).

The natural question that arises is then: "but what if the argument M2 given to F2 already has a value named y?". There are two options: the value is overwritten by the new value, or you act more strictly, as just say no, that is a type error. Both work; I would be inclined to go for the latter, since the user can always do it explicitly using the more expressive signature language anyway, with something like:

  M2' = F2(M2 as S1 - {val y});

Which makes me think that the ML open command and structure syntax should probably follow the same convention: you should be forced to say when you clobber a value in a structure, because otherwise it can be confusing. I guess this probably wants to be something you can turn on/off, as you definitely want to be able to do that at the top level... more on that another time.