teaching machines

CS 1: Lecture 24 – Data in Series

November 3, 2017 by . Filed under cs1, cs145, cs148, fall 2017, lectures.

Dear students,

We officially close out our discussion of the Computer as a Pilot, during which we made the computer navigate this way and that way and back again through our code using conditional statements and loops. But, like always, these ideas of branching and repetition will never leave us.

Computers make a lot of things possible. In particular, they can be used to…

The last of these is what we focus on next. We enter our treatment of the Computer as a Factory Worker. Up till now, our data has been fairly limited. We declared just a few variables in the programs we’ve written. Now we’ll begin to use data that can be numbered in the hundreds or thousands.

But first, let’s do a little data analysis. Let’s plot an ASCII histogram of how many siblings we have. We’ll first collect our numbers in a file, and then we’ll process it with something like this:

File file = new File("siblings.txt");
Scanner in = new Scanner(file);

while (in.hasNextInt()) {
  int nsiblings = in.nextInt();
  ...
}

What exactly do we do with nsiblings? Well, we want to know how many 0-sibling situations there are (there better be none), how many 1-sibling situations, 2-sibling, and so on. We might make this first attempt:

int count0 = 0;
int count1 = 0;
int count2 = 0;
int count3 = 0;
int count4 = 0;

while (in.hasNextInt()) {
  int nsiblings = in.nextInt();
  if (nsiblings == 0) {
    ++count0; 
  } else if (nsiblings == 1) {
    ++count1; 
  } else if (nsiblings == 2) {
    ++count2; 
  } else if (nsiblings == 3) {
    ++count3; 
  } else if (nsiblings == 4) {
    ++count4; 
  }
}

Are there any drawbacks to this approach? I can think of a couple. First, that’s a lot of labor. Second, we’re not done. My mother-in-law had nine kids in her family. My father-in-law had eight. (My wife had three Uncle Daves and two Uncle Matts.) We’re going to need a few more variables. Ugh.

But wait, there’s a better way. Most decent programming languages provide a data type called an array. It lets you collection up a bunch of data under one name. We could create an array that held maybe 20 ints. We do so like this:

int[] counts = new int[20];

In memory, we’ll get 20 ints laid out in sequence. They will be automatically wiped to 0 by the Java virtual machine. The individual boxes within the array are just ints. We can access one through an index:

System.out.println(counts[i]); // reading
counts[i] = 45;                // writing
counts[i] += 1;                // both

Where I have i, you can put any integer expression: a literal, a variable, a method call, anything that yields an int. What indices do you suppose are legal? Just as with String, indices start at 0 and go up to 1 less than the length.

The flexibility of that name is what makes arrays so powerful. We can reduce our earlier code to this:

int[] counts = new int[20];

while (in.hasNextInt()) {
  int nsiblings = in.nextInt();
  ++counts[nsiblings];
}

After we get all our tallies, it’s really easy to walk through the array and print out the results:

for (int i = 0; i < counts.length; ++i) {
  System.out.println("%d -> %d", i, counts[i]);
}

Let’s look at another example. This time we’ll test the crazy Benford’s Law, which states leading digits of numbers are not normally distributed. We’ll examine the populations of cities across the US, and plot a histogram of how their leading digits are distributed.

In these examples, we built up the array as we processed the input. Sometimes we know what data should go in the array ahead of time. In such cases, there are two ways to initialize arrays:

Usually the second of these is preferred.

Arrays can often replace decision structures that we write to simply choose between data. For instance, we might write a program that draws stripes across an image. Without arrays, we’d use an if statement to pick our color:

create image
for each row r
  for each column c
    if c % nstripes == 0
      pick first color
    elsif c % nstripes == 1
      pick second color
    elsif c % nstripes == 2
      pick third color
    elsif ...
      ...
    else
      pick last color
    set pixel (c, r) to color

But with arrays, we can do a much simpler lookup into a collection of colors:

colors = [color0, color1, color2, ...]
create image
for each row r
  for each column c
    i = c % number of colors
    set pixel (c, r) to color

We’ll write this, and maybe animate it to shift across time.

Another really nice thing about arrays is that they all us to pass bundles of data back and forth between methods. We’ll write a score21 method that computes a Blackjack score for a hand of cards given as an array of ints.

Here’s your TODO list to complete before we meet again:

See you next class!

Sincerely,

P.S. It’s time for a haiku!

Once they were arrayed
The periodic table
Chemistry was born

P.P.S. Here’s the code we wrote together in class…

SiblingRivalry.java

package lecture1103;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class SiblingRivalry {
  public static void main(String[] args) throws FileNotFoundException {
    String path = "/Users/johnch/Desktop/siblings.txt";
    File file = new File(path);
    Scanner in = new Scanner(file);

    int[] counts = new int[10];
    String[] bars = new String[10];
    
    for (int i = 0; i < bars.length; ++i) {
      bars[i] = "";
    }

    while (in.hasNextInt()) {
      int n = in.nextInt();
      ++counts[n];
      bars[n] += "*";
    }

    in.close();
    
    for (int i = 0; i < counts.length; ++i) {
      System.out.println(bars[i] + " " + counts[i]);
    }
  }
}

Benford.java

package lecture1103;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.Scanner;

public class Benford {
  public static void main(String[] args) throws FileNotFoundException {
    Scanner in = new Scanner(new File("/Users/johnch/Desktop/pops.txt"));
    
    int[] counts = new int[10];
    
    while (in.hasNextInt()) {
      String population = in.next();
      int leadingDigit = population.charAt(0) - '0';
      System.out.println(leadingDigit);
      ++counts[leadingDigit];
    }
    
    System.out.println(Arrays.toString(counts));
    
    in.close();
  }
}

Sistogram.java

package lecture1103;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class Sistogram {
  public static void main(String[] args) throws FileNotFoundException {
    Scanner in = new Scanner(new File("/Users/johnch/Desktop/counts.txt"));

    int[] counts = new int[10];

    while (in.hasNext()) {
      int n = in.nextInt();
      ++counts[n];
    }

    for (int i = 1; i < counts.length; ++i) {
      System.out.print(repeat('\u2603', counts[i]));
      System.out.println(" " + counts[i]);
    }

    in.close();
  }
  
  public static String repeat(char c, int n) {
    String s = "";
    for (int i = 0; i < n; ++i) {
      s += c;
    }
    return s;
  }
}

Benford.java

package lecture1103;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.Scanner;

public class Benford {
  public static void main(String[] args) throws FileNotFoundException {
    Scanner in = new Scanner(new File("/Users/johnch/Desktop/pops.txt"));
    
    int[] counts = new int[10];
    
    while (in.hasNext()) {
      String population = in.next();
      int leadingDigit = population.charAt(0) - '0';
      counts[leadingDigit]++;
    }
    
    System.out.println(Arrays.toString(counts));
    
    in.close();
  }
}