CompileTimeTypingProblem

Last edit July 29, 2010

The compile time typing problem is the problem of assigning types to terms at compile-time (or "read time" - any time after the program is read but before its "main" function is executed). A term is essentially any expression within a program, including variables, parameters, data members of aggregates, and intermediate results from calculations or function calls. Each term in a language must be(*) assigned a specific type.

"Must be" here is technically untrue: for StaticTypeSafety, it is sufficient to prove that the term will be compatible with all operations performed upon it, which doesn't inherently require assigning a type-descriptor to the term (see TypeSafe). Even for optimization of representation, it is sufficient to identify the set of operations that might be performed upon a representation along with their estimated (or profiled) frequencies and probabilities and temporal constraints, and arrange the bits for some choice of access time, storage cost, and lazy evaluation. Neither of these require that a type descriptor (e.g. a type name or type structure) be assigned to the terms. Assuming one must do so can introduce some unnecessary complexity into the TypeSystem. However, CompileTimeTypingProblem remains a valid problem for languages that choose (for reasons that may be outside of StaticTypeSafety) to assign type descriptors to terms.

One immediate question is "what is a type"? There are many different useful definitions (including ThereAreNoTypes); here we treat types as sets; a type is a set of values and an object is an instance of the type if it is a member of the set containing all such instances. This definition assumes immutable objects; though it does extend to objects with mutable state as well.

Note that this definition implies that an object can be an instance of a (usually) infinite number of types. Most programming languages limit the number of distinct types allowed in a program; languages with nominal typing only recognize types that are explicitly declared by the programmer. Languages with structural typing face more difficulty; especially if types can be mutated at runtime. (See NominativeAndStructuralTyping). We will assume for the purposes of this discussion that the set of types in a program is a finite set. A type is maximal with respect to some property if a) the property holds for the type; and b) there are no subtypes of the type (in the finite set which we consider) that also have the property.

A term is well-typed if all possible values of the term (including all states of any mutable objects which might be bound to the term) are instances of the associated type. A program is well-typed if all terms in the program are well-typed; a program which is not well-typed may have typing errors. It's possible that such a program may be well-behaved, but many languages reject any program that is not well-typed. Others (including both CeeLanguage and CeePlusPlus) don't, or allow ill-typed programs to be written with explicit typecasts. In remaining languages, normally if a program is well-typed certain run-time errors are guaranteed not to happen, i.e., the type system is sound (see StaticTypeSafety). Note that Java is not fully sound.

In languages with a universal supertype (a TopType, called "Object" in most such languages), a trivial well-typing for any program is to assign the type Object to all terms in the program; as all possible values/objects are instances of the universal supertype, it is easy to see that all terms are well-typed. Languages with DynamicTyping do exactly this, however being well-typed gives no guarantee of safety about runtime behavior. One can also say (as usually done) that typechecking is deferred to runtime in such languages; the RunTimeTypingProblem is the subject of another page. (And much easier to implement).

A term is "ideally typed" if a) it is well-typed; and b) the associated type is a maximal type for the term. Ideally typed terms are useful in that they enable numerous optimizations - data member lookup can be optimized; DynamicDispatch can be turned into StaticDispatch, functions can be inlined, or the term can be subject to TypeErasure. On the other hand, these optimizations can cause UndefinedBehavior if the typing assumptions ever become invalid. In programs written as separate modules (and especially those deployed as separate modules), this can happen quite often. One instance of this problem is known as the FragileBinaryInterfaceProblem.

A program is "ideally typed" if all terms are ideally typed - in other words, there is no type subsumption. Some FunctionalProgrammingLanguages with StaticTyping (HaskellLanguage, CamlLanguage, but not ObjectiveCaml) produce programs with ideal typings. ObjectOrientedProgrammingLanguages, as a rule, do not - subsumption is a key component of ObjectOrientedProgramming.

How are the types of terms determined? Generally, by one of three ways:

By use of a language rule known to always produce a correct typing - the DynamicTyping technique (everything is of type Object) is an instance of this. Examples include PythonLanguage, CommonLisp, and SmalltalkLanguage.
By programmer declarations, or TypeAnnotations. These are familiar to programmers of languages like CeeLanguage, CeePlusPlus and JavaLanguage.
By TypeInference - the compiler analyzes the program and figures it out. Found in HaskellLanguage and the various ML dialects. TypeInference can be an undecidable problem if the language designer is not careful.

Note that some languages may allow different methods to be used - ML uses TypeInference by default; but the user can attach type declarations if they want.

Very nicely done. Whoever created this page should sign it with pride, it's better than most.

Thanks. -- ScottJohnson

CategoryLanguageTyping