teaching machines

CS 330 Lecture 6 – Asserting and Find-All

February 3, 2017. Filed under cs330, lectures, spring 2017.

Dear students,

We will discuss three common operations we want to perform on text that are only fun when regular expressions are involved:

  1. Asserting that text matches a pattern.
  2. Finding all matches of a pattern in a document.
  3. Replacing all matches of a pattern with some other text.

Today we focus on the first two of these. We will meet the operators =~ and !~ and the method String#scan. We will also capture portions of the matching text using the regex’s capture groups.
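Before the exercises, here is a minimal sketch of how these pieces behave in Ruby. The sample strings and variable names are made up for illustration; only =~, !~, scan, and the $1/$2 capture variables come from the discussion above.

```ruby
# =~ gives the index where the first match starts, or nil if there is none.
first_digit_at = 'lecture 6' =~ /\d/
p first_digit_at   # 8
no_digit_at = 'lecture' =~ /\d/
p no_digit_at      # nil

# !~ is the negation: true when the pattern matches nowhere.
digit_free = 'lecture' !~ /\d/
p digit_free       # true

# Parentheses capture. After a successful =~, $1 and $2 hold the groups.
if 'width=640' =~ /(\w+)=(\d+)/
  key = $1
  value = $2
  puts "#{key} is #{value}"   # width is 640
end

# scan gathers every match rather than stopping at the first.
digit_runs = 'a1b22c333'.scan(/\d+/)
p digit_runs       # ["1", "22", "333"]
```

Because nil is falsy and every integer (even 0) is truthy in Ruby, the result of =~ can be dropped straight into an if.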

Let’s write regexes that do the following:

  1. Assert that the user input is all letters.
  2. Assert that the user input contains at least two digits.
  3. Assert that the user input has no whitespace.
  4. Assert that the user input is an HTML element.
  5. Assert that the user input is a binary string.
  6. Extract the index out of an array subscripting (e.g., 5 in counts[5]).
  7. Tease apart the username and domain from an email address. Regex is probably overkill for this problem.
  8. Extract the URL from an img element.
  9. Print the name of the month given a date like MM/DD/YYYY.
  10. Extract the name of the primary class of a Java source file fed in as standard input.
  11. Identify all the fields of study listed in a dictionary—the -ology, -nomy, and -nomics words.
  12. List all the identifiers in a file.
  13. List all the string literals in a file.
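Items 5 and 6 have no script of their own below, so here is one possible sketch of each. The method names binary_string? and subscript_index are hypothetical, and the sketch assumes a binary string means one or more 0s and 1s. It anchors with \A and \z (whole string) rather than the ^ and $ (line) anchors used in the scripts below; for a single chomped line the two behave the same.

```ruby
# Hypothetical helpers; the names are illustrative, not from the lecture.

# True when text consists solely of 0s and 1s, with at least one character.
def binary_string?(text)
  !!(text =~ /\A[01]+\z/)
end

# Returns the subscript as an Integer, or nil when text contains no
# array subscripting with a literal integer index.
def subscript_index(text)
  text =~ /\w+\[(\d+)\]/ ? $1.to_i : nil
end

p binary_string?('010011')      # true
p binary_string?('0120')        # false
p subscript_index('counts[5]')  # 5
p subscript_index('counts')     # nil
```

The double negation turns the index-or-nil result of =~ into a plain true or false.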

Here’s your TODO list for next time:

Sincerely,

all_letters.rb

#!/usr/bin/env ruby

input = gets.chomp

if input =~ /^[a-zA-Z]+$/
  puts 'all letters'
else
  puts 'No good, foobag.'
end

two_numbers.rb

#!/usr/bin/env ruby

input = gets.chomp

if input =~ /\d.*\d/
  puts 'two numbers'
else
  puts 'No good, foobag.'
end

no_whitespace.rb

#!/usr/bin/env ruby

input = gets.chomp

if input !~ /\s/
  puts 'no whitespace'
else
  puts 'No good, foobag.'
end

html_element.rb

#!/usr/bin/env ruby

input = gets.chomp

if input =~ /^<[^<>]+>$/
  puts 'html element'
else
  puts 'No good, foobag.'
end

email.rb

#!/usr/bin/env ruby

input = gets.chomp

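# The first (.*) is greedy, so the split happens at the last @ in the input.
if input =~ /(.*)@(.*)/
  puts $1
  puts $2
else
  puts 'No good, foobag.'
end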

# input =~ /(.*.*)/
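Since regex is probably overkill for the email problem, here is a regex-free sketch using String#partition. The method name split_email is hypothetical, and this is just one way to do it, not the approach taken in class.

```ruby
# Hypothetical helper: split an address at its first @.
# Returns [username, domain], or nil when there is no @ at all.
def split_email(address)
  username, at, domain = address.partition('@')
  at.empty? ? nil : [username, domain]
end

p split_email('alice@example.com')  # ["alice", "example.com"]
p split_email('nobody')             # nil
```

Note that partition splits at the first @, whereas the greedy (.*)@(.*) pattern splits at the last one. String#rpartition would mirror the regex.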

month.rb

#!/usr/bin/env ruby

input = gets.chomp
months = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]

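# Guard the lookup: on a failed match $1 is nil, nil.to_i is 0, and
# months[-1] would print December. Months outside 1-12 are rejected too.
if input =~ %r{^(\d\d)/\d\d/\d\d\d\d$} && (1..12).cover?($1.to_i)
  imonth = $1.to_i
  puts "imonth: #{imonth}"
  puts months[imonth - 1]
else
  puts 'No good, foobag.'
end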
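The scripts above assert and capture, but none of them exercises find-all. Here is a rough sketch of String#scan applied to three of the listed problems (fields of study, img URLs, string literals). All of the sample data is made up, the img pattern assumes double-quoted src attributes, and the string-literal pattern ignores escaped quotes.

```ruby
# Hypothetical sample text standing in for a dictionary file, one word per line.
dictionary = "banana\nbiology\nastronomy\nzebra\neconomics\n"

# With no capture groups, scan returns each whole match as a string.
fields = dictionary.scan(/^\w+(?:ology|nomy|nomics)$/)
p fields    # ["biology", "astronomy", "economics"]

# With capture groups, scan returns one array of captures per match,
# so flatten collapses the result into a plain list of URLs.
html = '<img src="cat.png"> and <img alt="dog" src="dog.jpg">'
urls = html.scan(/<img[^>]*src="([^"]*)"/).flatten
p urls      # ["cat.png", "dog.jpg"]

# Double-quoted string literals, ignoring escaped quotes for simplicity.
source = 'puts "hello" + "world"'
literals = source.scan(/"[^"]*"/)
p literals  # ["\"hello\"", "\"world\""]
```

The (?:...) group in the first pattern is non-capturing on purpose. A plain (...) there would make scan return only the suffixes instead of the whole words.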