standard types and regular expressions cs 480/680 – comparative languages


Upload: arthur-green

Post on 06-Jan-2018


Page 1: Standard Types and Regular Expressions CS 480/680 – Comparative Languages

Standard Types and Regular Expressions

CS 480/680 – Comparative Languages

Page 2: Standard Types and Regular Expressions CS 480/680 – Comparative Languages

Numbers

Most integers are Fixnum objects
• When they grow too large, they are converted to Bignum objects
  – An arbitrary-length list of fixnums

Literals:
• 12345 – decimal
  – Underscores are ignored (12_345 == 12345) (Why?)
• 0377 – octal (leading 0)
• 0x3F7A – hex
• 0b110111010001 – binary
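The literal forms above can be checked directly. This is a minimal sketch; the variable names are illustrative and not from the slides. Note that Ruby 2.4 and later report Integer as the class of both small and large values, so the Fixnum/Bignum split is no longer visible by name.

```ruby
# Four spellings of integer literals; the values in the comments are plain arithmetic.
decimal = 12_345          # underscores are ignored, so this is 12345
octal   = 0377            # 3*64 + 7*8 + 7 = 255
hex     = 0x3F7A          # 16250
binary  = 0b110111010001  # 3537

puts decimal == 12345
puts [octal, hex, binary].inspect
```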

Page 3: Standard Types and Regular Expressions CS 480/680 – Comparative Languages

Numeric Classes

Integer classes support a number of iterators
• 3.times { … }
• 1.upto(5) { … }
• 99.downto(7) { … }
• 50.step(80, 5) { … } = 50, 55, 60, 65, …, 80
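A short sketch of these iterators in use. The block bodies and variable names are made up for illustration, and the downto range is shortened from the slide's 99.downto(7) to keep the result small.

```ruby
# With a block, each iterator yields successive integers.
count = 0
3.times { count += 1 }          # runs the block three times

# Without a block, the same methods return enumerators, so to_a shows the sequence.
up    = 1.upto(5).to_a          # 1, 2, 3, 4, 5
down  = 99.downto(95).to_a      # 99, 98, 97, 96, 95
steps = 50.step(80, 5).to_a     # 50, 55, ..., 80 (the limit is included)

puts count
puts steps.inspect
```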

Page 4: Standard Types and Regular Expressions CS 480/680 – Comparative Languages

Strings

A String is a sequence of 8-bit bytes
• Usually holds ASCII characters, but this is not required; it can hold arbitrary byte values (numbers)

String literals
• Single quotes: only \\ and \' are treated as escapes
• Double quotes:
  – Escape sequences like \n
  – Any Ruby expression, inside #{ }:
    #{var1}
    #{2*$var2+var3/7}
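The difference between the two quoting styles is easiest to see side by side. A minimal sketch; the variable names are assumptions chosen for the example.

```ruby
name = "Ruby"

single = 'Hello\n#{name}'   # no interpolation: backslash, n, and #{name} stay literal
double = "Hello\n#{name}"   # \n becomes a newline and #{name} is replaced by its value

quote     = 'It\'s'         # \' is one of the two escapes single quotes recognize
backslash = 'a\\b'          # \\ is the other; the string holds a single backslash

puts single
puts double
```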

Page 5: Standard Types and Regular Expressions CS 480/680 – Comparative Languages

String Literals

If you want to use another delimiter, you can use %q (single quotes) or %Q (double quotes)
• %q/string string "string"/
• %Q(This 'is' a #{var2} string)

The character after %q or %Q picks the closing delimiter:
• Opening bracket, brace, parenthesis, or less-than sign: the matching closing delimiter
• Anything else: the same character
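The delimiter rules can be tried out as follows. This is a sketch; var2 and the third literal are illustrative additions, not part of the slides.

```ruby
var2 = "quoted"

a = %q/string string "string"/       # the same character closes the literal
b = %Q(This 'is' a #{var2} string)   # a parenthesis is closed by its match; interpolation works
c = %q<no #{interpolation} here>     # %q behaves like single quotes: nothing is interpolated

puts a, b, c
```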

Page 6: Standard Types and Regular Expressions CS 480/680 – Comparative Languages

"Here Documents"

Specify a delimiter string using <<STRING
• Delimiter must be in first column
• <<-STRING allows indented delimiter

aString = <<END_OF_STRING
    The body of the string
    is the input lines up to
    one ending with the same
    text that followed the '<<'
END_OF_STRING

print <<-STRING1, <<-STRING2
   Concat
   STRING1
      enate
      STRING2

produces:

   Concat
      enate

Includes newlines and spaces
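The two here-document forms, written so that the resulting strings can be inspected. A sketch only: the body text and variable names are chosen for the example.

```ruby
# <<ID: the terminator must start in the first column.
a_string = <<END_OF_STRING
    The body keeps its
    leading spaces and newlines.
END_OF_STRING

# <<-ID: the terminator may be indented. Two here documents can start on
# one line; their bodies follow in order.
joined = <<-STRING1 + <<-STRING2
   Concat
   STRING1
      enate
      STRING2

print joined
```

Neither form strips the leading whitespace of the body; <<- only relaxes where the terminator may sit.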

Page 7: Standard Types and Regular Expressions CS 480/680 – Comparative Languages

String Methods

String is one of the largest classes in Ruby
• Over 75 standard methods

Many of the more powerful methods use regular expressions, so we'll come back to the topic of String methods after we discuss regular expressions in more detail…

Page 8: Standard Types and Regular Expressions CS 480/680 – Comparative Languages

Ranges

In Ruby, ranges can be used for sequences, conditions, and intervals
• 1..5 = 1, 2, 3, 4, 5
• 1...5 = 1, 2, 3, 4 (0...x is useful for arrays)

Stored efficiently – a range object only stores the min and max values as Fixnums

Can convert to an array with to_a
• (1..5).to_a » [1, 2, 3, 4, 5]
• ('bar'..'bat').to_a » ['bar', 'bas', 'bat']
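A brief sketch of the two range forms and of to_a. The letters array is a made-up example of the 0...x idiom for array indexes.

```ruby
inclusive = (1..5).to_a                # two dots: the end value is included
exclusive = (1...5).to_a               # three dots: the end value is excluded

letters = %w[a b c]
indexes = (0...letters.length).to_a    # exactly the valid indexes of letters

words = ('bar'..'bat').to_a            # string ranges step with String#succ

puts inclusive.inspect, exclusive.inspect, indexes.inspect, words.inspect
```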

Page 9: Standard Types and Regular Expressions CS 480/680 – Comparative Languages

Range Methods and Iterators

A few useful operations on ranges:

digits = 0..9

digits.include?(5) » true
digits.min » 0
digits.max » 9
digits.reject {|i| i < 5 } » [5, 6, 7, 8, 9]
digits.each do |digit|
  dial(digit)
end
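The same operations in runnable form. The slides do not define dial, so a stand-in lambda that only records each digit is assumed here.

```ruby
digits = 0..9

dialed = []
# `dial` is a hypothetical stand-in for the slide's helper; it just records the digit.
dial = ->(digit) { dialed << digit }

has_five  = digits.include?(5)
low, high = digits.min, digits.max
kept      = digits.reject { |i| i < 5 }   # reject returns an Array, not a Range
digits.each { |digit| dial.call(digit) }

puts kept.inspect
```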

Page 10: Standard Types and Regular Expressions CS 480/680 – Comparative Languages

Range Contents

Ranges can even be created on objects that you define, provided that your class…
• Implements the succ() method, which returns the next object in the sequence, and
• Makes its objects comparable using <=> (the "spaceship operator")
  – Returns -1/0/1 depending on whether the first object is less-than/equal-to/greater-than the second

Page 11: Standard Types and Regular Expressions CS 480/680 – Comparative Languages

Ranges of objects

VU holds a volume level, 0 to 9

class VU
  include Comparable
  attr_reader :volume

  def initialize(volume)   # Should be 0..9
    @volume = volume       # ERROR CHECKING HERE!
  end

  def inspect              # Prints out as ######...
    '#' * @volume
  end

  # Support for ranges
  def <=>(other)
    self.volume <=> other.volume
  end

  def succ
    raise(IndexError, "Too loud") if @volume >= 9
    VU.new(@volume.succ)
  end
end

Page 12: Standard Types and Regular Expressions CS 480/680 – Comparative Languages

Volume Example

A VU object prints out as 0 to 9 #'s

We can make ranges of VU objects, since they follow the rules

medium = VU.new(4)..VU.new(7)
medium.to_a » [####, #####, ######, #######]   (actually, four VU objects)
medium.include?(VU.new(3)) » false

Page 13: Standard Types and Regular Expressions CS 480/680 – Comparative Languages

Conditions and Intervals

Ranges can also be used as conditions and as intervals for controlling loops

We'll see these uses when we talk about loops in Ruby

Page 14: Standard Types and Regular Expressions CS 480/680 – Comparative Languages

Regular Expressions

• Regular expressions are a powerful tool for matching patterns against strings
• Available in many languages (AWK, Sed, Perl, Python, C/C++, others)
• Matching strings with RegExps is very efficient and fast
• In Ruby, RegExps are objects, like everything else

Page 15: Standard Types and Regular Expressions CS 480/680 – Comparative Languages

RegExp literals

There are three ways to create a regular expression
• a = Regexp.new('pattern')
• b = /pattern/
• c = %r(pattern)

Match a Regexp against a string using
• exp.match(string)
• string =~ exp (positive match)
• string !~ exp (negative match)
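The three constructors and the three matching forms can be compared in a few lines. A sketch with an invented sample string.

```ruby
a = Regexp.new('pattern')
b = /pattern/
c = %r(pattern)

string = "a pattern here"

m        = b.match(string)      # a MatchData object on success, nil on failure
position = string =~ c          # index where the match starts, or nil
negative = "nothing" !~ a       # true when the pattern does not match
same     = (a == b) && (b == c) # equal source and options make equal Regexps

puts m[0], position, negative, same
```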

Page 16: Standard Types and Regular Expressions CS 480/680 – Comparative Languages

String Matching

=~ and !~ are also defined for strings
• The string on the right is converted to a Regexp (this describes early Ruby versions; current Ruby raises a TypeError when the right-hand side is a String, so pass a Regexp)

Return the position of the first match, or nil
• Zero-based

a = "Fats Waller"
a =~ /a/ » 1
a =~ /z/ » nil
a =~ "ll" » 7
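The same matches written so that they run on current Ruby, where the right-hand side of =~ must be a Regexp. The String#index and Regexp.escape lines are alternatives suggested here, not something the slides mention.

```ruby
a = "Fats Waller"

first_a = a =~ /a/      # 1: zero-based position of the first match
no_z    = a =~ /z/      # nil: no match
ll_at   = a =~ /ll/     # 7

# Alternatives when the text to find is held in a String:
ll_index    = a.index("ll")
from_string = a =~ Regexp.new(Regexp.escape("ll"))

puts first_a, no_z.inspect, ll_at
```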

Page 17: Standard Types and Regular Expressions CS 480/680 – Comparative Languages

Regular Expression Patterns

• Most characters match themselves
• Wildcard: . (period) = any character
• Anchors
  – ^ = "start of line"
  – $ = "end of line"
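A small sketch of the wildcard and the anchors. The two-line sample string is invented to show that ^ and $ work per line rather than per string.

```ruby
text = "first\nsecond"

dot        = "cat" =~ /c.t/       # 0: the period stands in for the a
no_match   = "ct"  =~ /c.t/       # nil: the period must match some character
line_start = text  =~ /^second/   # 6: ^ also matches right after a newline
line_end   = text  =~ /first$/    # 0: $ also matches right before a newline

puts dot, no_match.inspect, line_start, line_end
```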

Page 18: Standard Types and Regular Expressions CS 480/680 – Comparative Languages

Character Classes

Character classes: appear within [] pairs
• Most special Regexp characters (^, $, etc.) are turned off
• Escape sequences (\n etc.) still work
• [aeiou]
• [0-9]
• ^ as first character = negate the class
• You can use the literal characters ] and - if they appear first: []-abn-z]
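A few character classes in action. For the literal ] and - this sketch uses backslash escapes, which is an unambiguous alternative to the position rule on the slide; the sample strings are made up.

```ruby
vowel   = "hello"  =~ /[aeiou]/   # 1: first vowel
digit   = "abc123" =~ /[0-9]/     # 3: first digit
negated = "abc"    =~ /[^a]/      # 1: first character that is not an a
dollar  = 'a$b'    =~ /[$]/       # 1: $ is an ordinary character inside []
newline = "a\nb"   =~ /[\n]/      # 1: escape sequences still work
bracket = "x]y"    =~ /[\]]/      # 1: escaped literal ]
hyphen  = "a-b"    =~ /[\-]/      # 1: escaped literal -

puts [vowel, digit, negated, dollar, newline, bracket, hyphen].inspect
```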

Page 19: Standard Types and Regular Expressions CS 480/680 – Comparative Languages

Predefined character classes

These work inside or outside []'s:
• \d = digit = [0-9]
• \D = non-digit = [^0-9]
• \s = whitespace, \S = non-whitespace
• \w = word character = [a-zA-Z0-9_]
• \W = non-word character
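The shorthand classes, each matched against a short invented string. The last line shows two of them combined inside brackets.

```ruby
digit     = "room 42" =~ /\d/         # 5
non_digit = "42a"     =~ /\D/         # 2
space     = "a b"     =~ /\s/         # 1
non_space = " x"      =~ /\S/         # 1
word      = "foo_bar baz"[/\w+/]      # "foo_bar": the underscore is a word character
non_word  = "ab!"     =~ /\W/         # 2

in_class  = "a1 b2".scan(/[\d\s]/)    # shorthand classes also work inside []

puts word, in_class.inspect
```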

Page 20: Standard Types and Regular Expressions CS 480/680 – Comparative Languages

Repetition in Regexps

These quantify the preceding character or class:
• * = zero or more
• + = one or more
• ? = zero or one
• {m,n} = at least m and at most n (written without spaces inside the braces)
• {m,} = at least m

High precedence – a quantifier applies to only one character or class, unless grouped:
• /^ran*$/ vs. /^r(an)*$/
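The precedence point can be seen by running both patterns against the same inputs. A sketch with invented test strings.

```ruby
ungrouped = /^ran*$/     # * applies only to the n: "ra", "ran", "rann", ...
grouped   = /^r(an)*$/   # * applies to the group: "r", "ran", "ranan", ...

rannn_ungrouped = "rannn" =~ ungrouped   # 0
ranan_ungrouped = "ranan" =~ ungrouped   # nil
ranan_grouped   = "ranan" =~ grouped     # 0
rannn_grouped   = "rannn" =~ grouped     # nil

# Counted repetition: note there are no spaces inside the braces.
two_to_three = %w[a aa aaa aaaa].select { |s| s =~ /^a{2,3}$/ }
at_least_two = %w[a aa aaaaa].select { |s| s =~ /^a{2,}$/ }

puts two_to_three.inspect, at_least_two.inspect
```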

Page 21: Standard Types and Regular Expressions CS 480/680 – Comparative Languages

Alternation

| is like "or" – matches either the regexp before the | or the one after

Low precedence – alternates entire regexps unless grouped
• /red ball|angry sky/ matches "red ball" or "angry sky", not "red ball sky" or "red angry sky"
• /red (ball|angry) sky/ does the latter
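To see which text each pattern actually consumes, String#[] with a Regexp returns the matched portion. This is a sketch; the variable names are illustrative.

```ruby
plain   = /red ball|angry sky/     # whole alternatives: "red ball" OR "angry sky"
grouped = /red (ball|angry) sky/   # alternation limited to the middle word

a = "red ball sky"[plain]      # only "red ball" is matched
b = "red angry sky"[plain]     # only "angry sky" is matched
c = "red ball sky"[grouped]    # the whole string is matched
d = "red angry sky"[grouped]   # the whole string is matched
e = "red ball"[grouped]        # nil: " sky" is required

puts a, b, c, d, e.inspect
```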

Page 22: Standard Types and Regular Expressions CS 480/680 – Comparative Languages

Side Effects (Ruby Magic)

After you match a regular expression, some "special" Ruby variables are automatically set:
• $& – the part of the string that matched the pattern
• $` – the part of the string before the match
• $' – the part of the string after the match
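A minimal sketch of the three variables after one match. The sample string is reused from the string-matching slide; the local names are illustrative.

```ruby
"Fats Waller" =~ /ll/

# Copy the special variables right away; the next match overwrites them.
before  = $`    # text before the match
matched = $&    # text that matched
after   = $'    # text after the match

puts "#{before}[#{matched}]#{after}"
```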

Page 23: Standard Types and Regular Expressions CS 480/680 – Comparative Languages

Side effects and grouping

When you use ()'s for grouping, Ruby assigns the match within the first () pair to:
• \1 within the pattern
• $1 outside the pattern

"mississippi" =~ /^.*(iss)+.*$/ » $1 = "iss"

/([aeiou][aeiou]).*\1/
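Both uses of a group, sketched with the slide's patterns. The strings "queue" and "moon" are invented inputs for the back-reference pattern.

```ruby
"mississippi" =~ /^.*(iss)+.*$/
first_group = $1                    # $1 holds what the first () pair matched

# \1 inside the pattern requires the same text to appear again.
repeated = /([aeiou][aeiou]).*\1/
queue_at = "queue" =~ repeated      # "ue" occurs twice, so this matches
pair     = $1
moon_at  = "moon" =~ repeated       # "oo" occurs only once, so this is nil

puts first_group, queue_at, pair, moon_at.inspect
```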

Page 24: Standard Types and Regular Expressions CS 480/680 – Comparative Languages

Repetition and greediness

By default, repetition is greedy, meaning that it will match as many characters as possible.

You can make a repetition modifier non-greedy by adding '?'

(showRE is a helper, not shown on the slide, that marks the matched text with << >>.)

a = "The moon is made of cheese"
showRE(a, /\w+/) » <<The>> moon is made of cheese
showRE(a, /\s.*\s/) » The<< moon is made of >>cheese
showRE(a, /\s.*?\s/) » The<< moon >>is made of cheese
showRE(a, /[aeiou]{2,99}/) » The m<<oo>>n is made of cheese
showRE(a, /mo?o/) » The <<moo>>n is made of cheese
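The slides use showRE without defining it. The helper below, named show_re here, is an assumed reconstruction built from the special variables $`, $& and $'; it is one way to produce the output shown, not necessarily the course's version.

```ruby
# Hypothetical helper: returns the string with the matched part wrapped in << >>.
def show_re(string, pattern)
  if string =~ pattern
    "#{$`}<<#{$&}>>#{$'}"
  else
    "no match"
  end
end

a = "The moon is made of cheese"

puts show_re(a, /\w+/)
puts show_re(a, /\s.*\s/)     # greedy: runs to the last whitespace
puts show_re(a, /\s.*?\s/)    # non-greedy: stops at the next whitespace
puts show_re(a, /[aeiou]{2,99}/)
puts show_re(a, /mo?o/)
```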

Page 25: Standard Types and Regular Expressions CS 480/680 – Comparative Languages

String Methods Revisited

s.split(regexp) – returns a list of substrings, with regexp as a delimiter
• Can assign to an array, or use multiple assignment

s.squeeze(string) – reduces any run of a repeated character that appears in string to a single character

songFile.each do |line|
  file, length, name, title = line.chomp.split(/\s*\|\s*/)
  songs.append(Song.new(title, name, length))
end
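A self-contained sketch of split and squeeze on a single line of input. The sample line is invented in the shape the slide's loop expects, and the Song class and song file are left out.

```ruby
line = "/jazz/j00132.mp3  | 3:45 | Fats     Waller     | Ain't Misbehavin'\n"

# Split on a vertical bar with optional whitespace on either side.
file, length, name, title = line.chomp.split(/\s*\|\s*/)

clean_name = name.squeeze(" ")         # collapse runs of spaces only
only_a     = "aaabbbccc".squeeze("a")  # runs of other characters are untouched
all_runs   = "aaabbb".squeeze          # with no argument, every run is collapsed

puts file, length, clean_name, title
```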

Page 26: Standard Types and Regular Expressions CS 480/680 – Comparative Languages

String Methods

s.scan(regexp) – returns a list of parts that match the pattern

st = "123 45 hello out67there what's 23up?"
a = st.scan(/\d+/)
puts a

»
123
45
67
23

Many more in Built-in Classes and Methods!

Page 27: Standard Types and Regular Expressions CS 480/680 – Comparative Languages

Regexp substitutions

a.sub (one replacement) & a.gsub (global)
• Replace the text matched by a regular expression with a string
• The string can include \1, \2, etc. to refer to the parts of the text matched by groups in the pattern

See substitutions.rb & Ruby book: Standard Types
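A short sketch of sub, gsub, and group references in the replacement. The sample strings are invented, and this is not the content of substitutions.rb, which is not reproduced in the slides.

```ruby
text = "the quick brown fox"

once   = text.sub(/[aeiou]/, '*')    # replaces only the first match
global = text.gsub(/[aeiou]/, '*')   # replaces every match

# \1 and \2 in the replacement refer to the first and second groups.
# Single quotes keep the backslashes intact.
swapped = "fred:smith".sub(/(\w+):(\w+)/, '\2, \1')

puts once, global, swapped
```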