Friday, April 29, 2011

Typed signature variables for ML's module language

Idea: The signature language should be able to express, and infer, whatever we can easily be implement. Much like wanting an to be able to express and infer typing for reasonable programs, which ML already does a fairly good job of, I want to improve the language of types for module (signatures for structures and functors).

So first, lets clarify notation.


Lets write M1, M2, etc for structure (module) names; S1, S2, etc for signature names; and F1, etc for functor names.

ML's structure/signature syntax is a bit text-heavy, so we'll let ourselves take some shortcuts in notation. The curly brackets "{" and "}" are used in the obvious way to define scope where ML used expressions like "in ... end". We use the full-stop character follow by a space (". "), where ML traditionally uses semi-colon (";"). I think this notation much easier on the eye and does not cause any ambiguity.

For example:
  signature S1 = sig   type T;   val z : T;   val s : T -> T;   end;
  S1 = sig{ type T.  val z: T.  val s: T -> T. }

  structure M2 = struct type T = int; val z = 0; s : T -> T;   end;
  M1 = mod{ type T= int.  val z= 0.  val s: T -> T. }

I will distinguish between type abbreviations and type abstractions. We want to preserve that every structure can be given a unique most specific signature that respects abstraction. For example consider the following two modules:

  M = mod{ abbreviation T = int.  val z = 0.  fun s x = x + 1. }
  M' = mod{ type T = int.  val z = 0.  fun s x = x + 1. }

Their unique signatures are respectively:

  S = sig{ val z: int.  val s: int -> int. }  S' = sig{ type T.  val z: T.  val s: T -> T. }

I want to support expressing signatures conveniently, while not having the syntactic niceties messing with the semantics.

Given the ability to uniquely determine a signature, I want some slightly specialised notation for relating structures and signatures. For a structure, M, that can have signature S, let us write  M >: S. We will use  M =: S to say that M has exactly signature S. When all instances (structures) of signature S2 can also be given signature S1, then we write  S1 <= S2.  The idea is that the smallest signature is the empty one; and the more stuff in a  signature, the bigger it is in the partial ordering.

e.g. let
  S1 = sig{ type T.  val z: T. }
  S2 = sig{ type T.  val z: T.  val s: T -> T. }
  M2 = mod{ type T = int.  val z = 0.  fun s x = x + 1. }
then the following hold: 
  S1 <= S2
  M2 >: S1
  M2 >: S2
  M2 =: S2
But it is not the case that M2 =: S1.


Now for the main idea. Observe that, given any structure M1 >: S1, we can construct the following module:

  M' = mod{ open M1.  val y= s (s z). }

I would like ML to have a module/signature/functor language that can express the 'typing' for this code. Here's some suggestive syntax: 

  F2 = mod(M >: S1){ open M.  val y= s (s z). }

So we could then write:

  M2' = F2(M2).

which would be equivalent to:

  M2' = mod{ type T= int.  val z= 0.  fun s x = x + 1. }
          +{ val y= s (s z). }

where + is just the union operation, i.e. the above is also equivalent to:

   M2' = mod{ type T= int.  val z= 0.  fun s x = x + 1.
              val y= s (s z). }.

So, could give F2 the suggestive signature syntax: 

  F2 =: sig(S <= S1){ S -> S + { val y: T. } }.

And, specifically, for M2' we get:

  M2' =: (S2 + { val y: T. })

The idea is that we can have typed signature variables, and some typical set-like operations on them (for more on providing set-like operations for signatures, see: "An Expressive Language of Signatures" by Norman Ramsey and Kathleen Fisher and Paul Govereau).

The natural question that arises is then: "but what if the argument M2 given to F2 already has a value named y?". There are two options: the value is overwritten by the new value, or you act more strictly, as just say no, that is a type error. Both work; I would be inclined to go for the latter, since the user can always do it explicitly using the more expressive signature language anyway, with something like:

  M2' = F2(M2 as S1 - {val y});

Which makes me think that the ML open command and structure syntax should probably follow the same convention: you should be forced to say when you clobber a value in a structure, because otherwise it can be confusing. I guess this probably wants to be something you can turn on/off, as you definitely want to be able to do that at the top level... more on that another time.

Monday, March 14, 2011

heterogeneous polymophic structures in SML (delaying type-checking to run-time)

I'm sure this is well known to those who know, but I keep meeting people who don't know...

Heterogeneous lists, and other such dynamically polymorphic data structures, can be written in Standard ML by pushing type-checking to run-time using exceptions. Here's a little example:

(* exceptions that wrap up some data *)
exception StringExpWrapper of string;
exception IntExpWrapper of int;

(* generic kind of data: an informal name for the data and the
   exception that holds the data *)
datatype generic_data =
  GenericData of { kindstr : string, dataexn : exn };

(* helper function to wrap up data in exceptions and give them a kind string *)
fun wrap_int i =
    GenericData { kindstr = "int", dataexn = IntExpWrapper i };
fun wrap_string s =
    GenericData { kindstr = "string", dataexn = StringExpWrapper s };
fun wrap_generic ex =
    GenericData { kindstr = "generic", dataexn = ex };

(* To try and pull integer from generic data; will raise the wrapped
   exception if the kind-string is "int" but the exception is not the
   IntExpWrapper. *)
fun unwrap_int (GenericData { kindstr = "int", dataexn }) =
    ((raise dataexn) handle IntExpWrapper x => SOME x)
  | unwrap_int _ = NONE;

(* empty list *)
val empty_list = []  :  generic_data list;
val list1 = [wrap_string "foo", wrap_int 12, wrap_int 3];

(* we can sum the integers in a generic list... *)
fun sum_ints l =
      (fn (data,count) =>
       case unwrap_int data of SOME n => count + n | NONE => count)
      0 l;

(* a little test to sum of our first generic/heterogeneous list *)
val sum1 = sum_ints list1;

(* we can create new kinds of exception for new type of data we want
   to wrap up and add to our list ... *)
exception SomeNewDataExn of int list;

(* now we can extend our old list with elements, even those of the
   new type *)
val list2 = [wrap_int 2, wrap_generic (SomeNewDataExn [1,2])] @ list1;

(* and we can apply the sum function again to this new list... *)
val sum2 = sum_ints list2;
More generally,  exceptions give one the opportunity to delay type-checking until run-time. Generally this introduces the opportunity for a programming error to cause a run-time exception. However, by a little careful wrapping of things in functors, you can keep it type-safe, and allow dynamically extended the kinds of data available during execution time. See for example Isabelle's universal type, or isaplib's generic polymorphic tables.