- Type Representation
Typical represents source language types using the variant type
raw_type
. For example, the type system for an implementation of
the Simply Typed Lambda Calculus with Integer and String values can be
expressed as:
mltype raw_type = IntegerT | StringT | FunctionT of type * type ;
The raw_type variant is sufficient to represent the type system of some
languages such as the Simply Typed Lambda Calculus, ML and Typical
itself. For other languages, however, we need to capture information about
attributes; for example, C's qualifiers and storage classes.
For representing attributed type systems, Typical provides the
attribute
and eqattribute
constructs. For example,
C's type system could be expressed as
mltype raw_type = VoidT | IntT | PointerT of type | ... ;
mltype storage_class = Auto | Extern | Register | Static | Typedef ;
mltype qualifier = Const | Volatile | Restrict ;
attribute storage : storage_class ;
attribute qualifiers : qualifier list;
The Typical compiler collects all attributes and combines them with the
raw_type
variant to create a record representation of types which
will be used by the typechecker.
mltype type = { type : raw_type; storage : storage_class; qualifiers = qualifer list; ... } ;
By default, Typical evaluates type equality as structural equality ignoring
attributes introduced with attribute
as opposed to
eqattribute
. This default behavior may be overridden for variant
types by using the equality
construct which specifies which
constructor arguments to compare. For example, consider the following
definition of C’s structure type:
mltype raw_type = ... | StructT of int * string * member list ;
Consistent with the C standard, it uses its integer argument as
a nonce that distinguishes between structures that have the same
name and members but are declared in different scopes. The corresponding
equality declaration (meaning two StructT
s compare equal if they
have equal
first arguments) reads:
equality raw_type = ... | StructT (n, _, _) ;
- Name Management Features
Typical supports name management by providing (1) a symbol table that is
implicitly available to all programs, (2) the scope
construct to
declare which AST nodes introduce which scopes, and (3) the
namespace
construct to declare individual namespaces, the types
of their values, and which AST nodes specify which names.
Typical provides the following constructs for symbol table operations:
define
, redefine
, lookup
, and lookup_locally
. The constructs have the structures shown:
define no type [error|warn "error message"] [at loc]
redefine no type
lookup no [error|warn "error message"] [at loc]
lookup_locally no [error|warn "error message"] [at loc]
Both define
and redefine
take a node argument (no), a value argument (type), and optional error messages and source location nodes.
The snippet below illustrates
the use of define
and lookup
.
| Abstraction (id, type, body) ->
let param = analyze type in
let _ = define id param in
let res = analyze body in
{ type = FunctionT (param, res) }
| Identifier _ as id -> let t = lookup id in t
In this snippet, define
adds the name derived from the node
id
to the symbol table with the value type. lookup
looks up the name derived from the node id. The symbol table operations have
built in error management. If a name is previously defined (in the case of
define) or not defined (in the case of lookup) the operations report default
error messages and return an error. The source location for the error is also automatically derived from the current position in the ast. The default source location can be overridden by using the at loc
construct where loc is the name of an ast node. The default error messages can be
overridden by explicitly adding an error clause to the operations; for example:
| Abstraction (id, type, body) ->
let param = analyze type in
let _ = define id param error "previously defined identifier" in
let res = analyze body in
{ type = FunctionT (param, res) }
| Identifier _ as id -> let t = lookup id error "identifier undefined" in t
The redefine
operator is used for overriding previously defined
symbol table entries without reporting errors. lookup_locally
is
used to retrieve entries that are only visibly within the current scope.
The symbol table operations depend on Typical's namespace construct to
introduce new namespaces and their types and also how to extract names from
AST nodes. The namespace construct hast the structure shown:
namespace "default"|identifier : value = PatternMatching [and "default"|identifier : value = PatternMatching]*
Here, "default" and identifier identify the namespace, value indicates the type
of values stored in this namespace, and PatternMatching is a matching from AST nodes to a Name constructor.
For example the namespace declaration:
namespace default : type = Identifier (id) -> SimpleName(id);
indicates that there's a single default namespace, where all values have type
type. Identifiers are matched to names by extracting the string
child from an Identifier node and using it to create a new SimpleName. Names
can be simple, omitting the scope, or qualified, explicitly specifying a scope
according the built-in variant type declaration:
mltype name =
| SimpleName of string
| QualifiedName of string list ;
The symbol table operations also depend on the scope construct to manage
scopes while traversing the AST. The scope construct (non-exhaustively) maps
AST nodes to scopes; each right-hand side must evaluate to a Scope constructor
value defined by the built-in variant declarations:
mltype scope_kind =
| Named of name
| Anonymous of string
| Temporary of string ;
mltype scope =
| Scope of scope_kind * node list ;
A scope spans the specified list of nodes. It can be one of three kinds.
First, a named scope is introduced by a function or class. Second, an anonymous
scope is introduced by a block or compound statement. Third, a temporary
scope, unlike a named or anonymous scope, is deleted after the AST traversal
has left the scope’s AST nodes. It is necessary for typing C or C++ function
declarations, which may contain parameter names different from the corresponding
definitions. Named scopes assume the specified name, while anonymous and
temporary scopes receive a fresh name based on the specified string. An
implementation of the scope construct depends on the ability to accurately
trace a Typical program’s AST traversal. To this end, we restrict AST node
patterns: they may specify several (recursively) nested nodes but only use
variable or alias patterns for arguments of a single node constructor. With
this restriction in place, each right-hand side of a pattern match has a
unique parent. As a result, we can easily model the path representing the
AST traversal’s dynamic state through a stack:
traversal_path : node list
Typical’s runtime updates the stack with the parent and any matched ancestors
before evaluating a pattern match’s right-hand side and restores the stack
before returning the right-hand side’s value. Whenever the runtime pushes
nodes on the stack, it evaluates the scope construct’s pattern match against
each node being added to the stack. On a match, it updates the current scope
as specified by the corresponding Scope constructor value. Independent of a
match, it annotates the node with the result. It uses the annotations to
restore previous scopes when popping nodes of the stack. It also uses the
annotations to avoid re-evaluation of the scope construct’s pattern match
for repeated passes over the same AST.
- Error Management, Detection, and Reporting.
For error management, Typical provides a system-wide no-information monad.
All types are automatically injected into this monad and implicitly represent
the union with the bottom
type; that type’s only value is
bottom
. Typical's bottom
value can be compared with
Java's null
value. In fact, in the generated Java code,
bottom
is represented by null
. Built-in constructs
and primitives generally return bottom
if any of their arguments
are bottom
. Furthermore, all pattern matches return
bottom
due to an implicit clause mapping bottom
to
itself; this default can be overridden by explicitly matching
bottom
. However, type constructors such as those for tuples,
lists, and variants treat bottom
just like any other value. It
allows for marking, say, a type attribute as having no information without
forcing the entire type record to bottom
. Finally,
the is_bottom
primitive enables the detection of
bottom
values, since the =
and !=
operators yield bottom
when either operand is
bottom
.
To actually detect and report error conditions,
Typical uses require
and guard
.
The require construct enforces one or more boolean constraints on an
expression. For example, consider :
require param = tr error "argument type mismatch" in res
If the constraints, here "param = tr", evaluate to true, require
evaluates the expression, here "res" and returns the corresponding value. If
the constraints evaluate to false, it reports an error, here "argument type mismatch", and returns bottom
. Unless explicitly specified, the
traversal_path’s top node supplies the error’s source location. However, if
the constraints evaluate to bottom
, reduce silently returns
bottom
. This feature avoids cascading errors since constraints
that evaluate to bottom
means that an error was previously
generated (and reported).
- List Reduction.
C-like languages rely on lists of specifiers or modifiers to express
the properties of declarations, including types and their attributes. When
processing such lists, a type checker needs to (1) map AST nodes to the
corresponding internal representation, (2) enforce semantic constraints on the
number and combination of specifiers, and (3) provide comprehensive error
reporting. Typical addresses these needs using the reduce
construct. The reduce construct has the following structure:
reduce to ["singleton" | "list" | "set" | "optional" | "required"] tag with PatternMatching
As illustrated in Figure 1 using C’s type specifiers as an example,
the reduce construct selects values from a list, while also mapping
them to different values and enforcing constraints on their number
and combination. It uses an extension of ML’s pattern syntax,
the set pattern { ... }
, to denote that the elements may
appear in any order in the list; alternatively, a regular list pattern
can be used to indicate that the elements must appear in the specified
order. While mapping the (non-exhaustive) pattern match over
a list, reduce ignores elements that do not match any patterns;
we assume that lists, in particular those generated by the parser,
only contain legal elements. While mapping, reduce also enforces
the reduction constraints. The singleton constraint in the example
indicates that the pattern match may be triggered at most
once, while set and list constraints allow for multiple triggers
(with duplicate triggers being ignored for set constraints). A further
optional constraint specifies that the pattern match need not
match any elements.
The design of the reduce construct reflects our analysis of how
to type check C specifiers or Java modifiers and supports a generalization
of the corresponding requirements. For example, a list of C
specifiers must include a single type specifier such as int, reducing
the list to a "singleton". It may optionally include one storage
class specifier such as register, reducing to an "optional
singleton". It may also include one or more qualifiers such as
const, which may be duplicated and thus reduce to an "optional
set". Furthermore, Figure 5 illustrates how reduce combines several
type specifiers into one internal value; as an added benefit, the
code directly reflects the constraints specified in the C99 standard.
Like require and guard, reduce integrates
error checking and reporting, basing any message on a string tag
that describes the kind of values being selected such as "type" or
"access modifier". For example, if processing the C declaration long int
float foo = 5;
, the reduce construct shown below would generate a
"multiple types detected" error and indicate the source location.
(** Reduce a list of declaration specifiers to a type. *)
mlvalue extractType = reduce to required singleton "type" with
[Unsigned _, Long _, Long _, Int _] -> {type = ULongLongT}
| [Unsigned _, Long _, Long _] -> {type = ULongLongT}
| [Signed _, Long _, Long _, Int _] -> {type = LongLongT}
| [Signed _, Long _, Long _] -> {type = LongLongT}
| [Unsigned _, Long _, Int _] -> {type = ULongT}
| [Signed _, Long _, Int _] -> {type = LongT}
| [Signed _, Short _, Int _] -> {type = ShortT}
| [Unsigned _, Short _, Int _] -> {type = UShortT}
| [Long _, Double _, Complex _] -> {type = LongDoubleComplexT}
| [Long _, Long _, Int _] -> {type = LongLongT}
| [Unsigned _, Long _ ] -> {type = ULongT}
| [Signed _, Short _] -> {type = ShortT}
| [Short _, Int _] -> {type = ShortT}
| [Unsigned _, Short _] -> {type = UShortT}
| [Signed _, Char _] -> {type = SCharT}
| [Unsigned _, Char _] -> {type = UCharT}
| [Float _, Complex _] -> {type = FloatComplexT}
| [Double _, Complex _] -> {type = DoubleComplexT}
| [Signed _, Long _] -> {type = LongT}
| [Long _, Int _] -> {type = LongT}
| [Long _, Double _] -> {type = LongDoubleT}
| [Long _, Long _] -> {type = LongLongT}
| [Unsigned _, Int _] -> {type = UIntT}
| [Signed _, Int _] -> {type = IntT}
| [Unsigned _] -> {type = UIntT}
| [Signed _] -> {type = IntT}
| [Long _] -> {type = LongT}
| [Int _] -> {type = IntT}
| [VoidTypeSpecifier _] -> {type = VoidT}
| [Char _] -> {type = CharT}
| [Float _] -> {type = FloatT}
| [Double _] -> {type = DoubleT}
| [Short _] -> {type = ShortT}
| [Bool _] -> {type = IntT}
| [VarArgListSpecifier _] -> {type = VarArgT}
| [(StructureTypeDefinition _ ) as me] -> analyze me
| _ (*Similar for enum, union and typedef *)
;
Figure 1. reduce pattern to process C declaration specifiers
- Module System.