teaching machines

CS 330 Lecture 10 – Type Systems

February 15, 2017 by . Filed under cs330, lectures, spring 2017.

Dear students,

Last time, we introduced types as one of the more prominent distinguishing features of a programming language. We enumerated a bunch of types commonly supported in programming languages. This time, we look at some ways to characterize a language’s type system.

To help understand the importance of a language’s type system, imagine if you didn’t have taste buds. Could anything go wrong? Taste buds are generally designed to help us eat good things and avoid bad things, much like a type system is designed to make sure operations are digesting good operands and not incompatible ones.

Type systems can be explicit or implicit or a little bit of both. In an explicit type system, the programming language the programmer to annotate the types in the source code. In many languages, these annotations need only appear on variable and function declarations. The compiler can figure out the types of literals on its own—except when a literal looks like an int but should really be a float, in which case we explicitly mark it with an f specifier (for example, 3.14f). In an implicit type system, on the other hand, the compiler or interpreter can infer what type is meant from the contexts in which the data is used.

Back to our taste buds metaphor, in an implicit type system, the compiler must taste each thing to discover what it is. In an explicit type system, the chef has labeled each dish. Of course, the labels might be wrong. We might have worked our saliva up for a sweet treat only to be knocked back by savory.

Does Java support implicit typing? For generics, it does. Kind of. As of Java 7, we can write this:

ArrayList<String> names = new ArrayList<>();

As we enumerate the many possible dimensions of type systems, we should try to maintain a balanced perspective. We don’t want to become fanboys or fangirls of any one way of thinking. Why would we want an explicit type system? Why would we an implicit one?

Folks have proposed adding more implicit typing to Java. Those proposals have been rejected. Consider Gilad Bracha’s response to one of them:

Humans benefit from the redundancy of the type declaration in two ways. Humans benefit from the redundancy of the type declaration in two ways. First, the redundant type serves as valuable documentation—readers do not have to search for the declaration of getMap() to find out what type it returns. Second, the redundancy allows the programmer to declare the intended type, and thereby benefit from a cross check performed by the compiler.

Does C++ support implicit typing? Consider this code for traversing a vector:

vector<string> nations {
  "Menominee",
  "Ojibwe",
  "Potawatomi",
  "Ho-Chunk"
};

for (vector<string>::const_iterator i = nations.begin();
     i != nations.end();
     ++i) {
  std::cout << *i << std::endl;
}

The C++11 standard introduced the auto keyword, which invokes the compiler’s type inference algorithm. We can now say things like:

auto i = 0;

C++11 also got a for-each loop. Putting auto and for-each together, we can now write this:

for (auto i : nations) {
  std::cout << i << std::endl;
}

This begs the question: under what circumstances should you use auto? I don’t know, but here’s what I do know. You should apply your rule of thumb consistently, and your code should be understandable. One internet person’s criteria for using auto is “when you don’t care about the type.” In the example above, I don’t really care about the iterator type. Sometimes you do care about the type, but it’s very easy to find the type right next door in the source code. Like that silly statement that happens all the time when we go to construct something:

Image image = new Image();

We can figure out the type from the right-hand side. We don’t also need it on the left:

auto image = new Image();

Interestingly, C++ requires auto in some situations. If you need to store a lambda (an unnamed function, which will discuss more later), you will discover that lambdas do not have a type in C++. How then do you declare a variable to hold one? You use auto:

auto callback = [](int n, const std::string& s) {
  for (int i = 0; i < n; ++i) {
    std::cout << s << std::endl;
  }
};

callback(3, "i");
callback(2, "wuv");
callback(4, "u");

Just as lazy and greedy were two negatively-framed alternatives to regex quantifiers, explicit and implicit are two positive alternatives to ascribing types.

On to the next dimension: safe vs. unsafe. I used to call this dimension strong vs. weak, but these terms have definitions that are O(n). Our definition of unsafe is this: a type system with an on/off switch. If a value of one type can treated as being of another type, without first undergoing some language-sponsored conversion therapy, that’s an unsafe, subvertible type system. A safe type system, on the other hand, means the type system cannot be turned off. By these definitions, there are very few unsafe languages. C and C++ are for sure.

A safe type system keeps its taste buds operating at all times. An unsafe type system recognizes that sometimes you need to swallow wretched things, and it’d be better to turn them off.

Let’s see a couple of examples of subverting types in C.

enums in C are one example of unsafe typing. We saw this last time: where an element from the enumerated set is expected, I can sneak in an arbitrary int.

Thanks to unsafe typing, we can write a test of see if we are on a little endian or a big endian architecture:

bool islittle() {
  // Does the address point to the little end?
  int x = 1;
  char *c = (char *) &x;
  return *x == 1;
}

Thanks to unsafe typing, we can muck with const stuff:

void muck(const char *s) {
  char *ss = (char *) s;
  ss[0] = 'l';
}

int main(int argc, char **argv) {
  char s[] = "muckety muck muck muck";
  muck(s);
  printf("s: %s\n", s);
  return 0;
}

Most of our other languages allow data to switch types too, but they do so through established channels of conversion. C lets the developer reinterpret and rewrite raw memory.

When would you want unsafe typing? When you don’t want the compiler’s help, either because what it wants to do is too slow or because you know what’s in memory better than it does.

Our final dimension is perhaps the most important and apparent: static vs. dynamic. In a static type system, the compiler knows the types at compile time, either because they’ve been explicitly declared or because they’ve been inferred. Because the compiler has some extra information, it can do a lot of work ahead of time as it builds the machine code representation of the program. This work might include checking that a value’s type supports the requested operation (type checking) and figuring out what function we should have the program counter jump to (static binding). In a dynamic type system, all this same work needs to be done, but at runtime instead of compile time.

In a static type system, we usually attach types to memory cells: “Anything that goes into that box or comes out has to be an int.” In a dynamic type system, we just have a bunch of generic boxes, but one of the items in the box is a tag or label telling us what sort of data is within. One of the greatest advantages of dynamic typing is that we can write one routine that can handle boxes with many different things inside. In a static language, we have to write a routine of type A, another one for type B, and another for type C.

Suppose you had multiple stomachs. If one was responsible for digesting meat and the other plants, in a static tasting system, you could figure out ahead of time which stomach should be called up to do its thing. In a dynamic system, all food drops into the same place and must be inspected to see what should be done with it.

To get a feel for the differences, let’s write our own dynamic typing system in C:

#include <stdio.h>
#include <stdlib.h>

typedef enum {
  INT,
  FLOAT,
  CHAR,
  STRING,
  POINTER
} tag_t;

typedef union {
  int i;
  char c;
  char *s;
  float f;
  void *p;
} value_t;

typedef struct {
  tag_t tag;
  value_t value;
} dynamic_t;

void print(dynamic_t any) {
  if (any.tag == INT) {
    printf("%d\n", any.value.i);
  } else if (any.tag == FLOAT) {
    printf("%f\n", any.value.f);
  } else if (any.tag == STRING) {
    printf("%s\n", any.value.s);
  } else if (any.tag == CHAR) {
    printf("%c\n", any.value.c);
  } else if (any.tag == POINTER) {
    printf("%p\n", any.value.p);
  }
}

int main(int argc, char **argv) {
  dynamic_t any;

  any.tag = INT;
  any.value.i = 17;
  print(any);

  any.tag = FLOAT;
  any.value.f = 17.1;
  print(any);

  any.tag = CHAR;
  any.value.c = '!';
  print(any);

  any.tag = STRING;
  any.value.s = "asdfasd";
  print(any);

  any.tag = POINTER;
  any.value.p = &any;
  print(any);

  return 0;
}

Here’s your TODO list for next time:

See you then!

Sincerely,