You may have heard of RegEx, which is short for regular expressions. Regular expressions can seem intimidating at first, but they become manageable once you learn what the patterns mean and how to build and use an expression. Once you start thinking about strings and text more abstractly, RegEx becomes a useful tool for problems that involve finding common patterns in a set of data.
RegEx is a method of pattern matching: a way to filter strings or text based on a pattern, usually to extract or modify the desired text. In this article, we will discuss how to write regular expressions and how to test them with Ruby methods so you can incorporate them into your project’s logic.
One tool that is extremely helpful for visualizing and understanding RegEx is a site called Rubular. Click the link here to test RegEx against a block of text that has already been populated. You’ll notice that between the two forward slashes is the word ‘neighbor’.
Believe it or not, this is a regular expression! In Ruby, text placed between two forward slashes is a regular expression literal, so a whole word, sentence, or even paragraph (as long as it contains no special characters) is a regular expression that matches that exact text. Rubular highlights every instance of the pattern ‘neighbor’ in our block of text, even instances where neighbor is part of a bigger word! That being said, you will often want to find something more abstract than an exact word match. This is where metacharacters come in.
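Outside Rubular, the same literal match can be tried directly in Ruby. This is a minimal sketch using a sample sentence of our own (not the Rubular text); String#scan returns every substring that matches, so the ‘neighbor’ inside ‘neighborhood’ is found too.

```ruby
# A regex literal is written between two forward slashes.
text = "My neighbor lives in a quiet neighborhood."

# scan collects every match, including the "neighbor" at the
# start of "neighborhood".
matches = text.scan(/neighbor/)
p matches # => ["neighbor", "neighbor"]
```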
Metacharacters
Just as atoms are the building blocks of pretty much everything around us, metacharacters are the building blocks of regular expressions. As you add to your regular expression, the overall pattern changes, and when the pattern changes, the results you get back from the methods you use can change too.
Listed below are several ways to modify your regular expression so you can find a pattern that works for you. There is no single right way to write a regular expression for phone numbers, emails, and so on; it all depends on what your project needs.
Metacharacter | Matches | Example |
[abc] | A character class that matches a single character that is a, b or c | /[eig]/ matches the ‘e’, ‘i’ and ‘g’ in neighbor, the ‘e’ in apple, and the ‘g’ and ‘e’ in gate |
[^abc] | A negated character class that matches any single character except a, b or c | /[^eig]/ matches the ‘n’, ‘h’, ‘b’, ‘o’ and ‘r’ in neighbor, and the ‘a’ and ‘t’ in gate |
[a-z] | A character class that matches any single character in the range a to z | /[e-i]/ matches the ‘e’, ‘i’, ‘g’ and ‘h’ in neighbor, the ‘e’ in apple, and the ‘g’ and ‘e’ in gate |
[a-zA-Z] | A character class that matches any single character in the range a to z or A to Z | /[e-iE-I]/ matches the ‘H’, ‘i’, ‘e’, ‘i’, ‘g’ and ‘h’ in “Hi neighbor!”, the ‘G’ and ‘e’ in Grapple, and the ‘g’ and ‘e’ in gate |
^ | Start of line | /^Hello/ matches lines that start with ‘Hello’ |
$ | End of line | /Goodbye$/ matches lines that end in ‘Goodbye’ |
\A | Start of string. Unlike ‘^’, it never matches at the start of a later line | /\Aa/ matches the first ‘a’ in the string “apple apricot” (from apple), but not the ‘a’ in apricot, since that is not the beginning of the string |
\z | End of string. Unlike ‘$’, it never matches at the end of an earlier line | /a\z/ matches the final ‘a’ in the string “librazebra” (from zebra), but not the ‘a’ in libra, since that is not the end of the string |
. | Wildcard. Dot matches any single character except a newline | /./ will match any single character in apple |
+ | Matches one or more of the preceding item | /aa+/ will match ‘aa’ and ‘aaaaaaa’ but not ‘a’, because + requires at least one of the preceding item (here, the second a) |
* | Matches zero or more of the preceding item | /ab*/ will match ‘a’, ‘ab’ and ‘abbbbbb’ |
\s | Any whitespace character | /^The\s.+s$/ will match The Beatles, The Rolling Stones, The Cranberries, etc. |
\S | Any non-whitespace character | /\S+/ will match each word in The Rolling Stones (‘The’, ‘Rolling’, ‘Stones’) but not the spaces between them |
\d | Any digit | /\d+/ will match 22, 33333, 0, etc. |
\D | Any non-digit | /\D+/ will match ‘Hello, goodbye’ |
\w | Any word character (a letter, digit or underscore) | /ny\w*/ will match ‘ny_152’, ‘nypost39’, etc. |
\W | Any non-word character | /\W+/ will match ‘)(*&^%$’ |
a{3} | Exactly 3 of ‘a’ | /\d{3}-\d{3}-\d{4}/ will match 555-555-5555 |
a{3,} | Three or more of ‘a’ | /[a-zA-Z0-9!#$^&*)(]{8,}/ will match ‘xE*BqRx14B7TAQp’, which looks like it could be used as a password! |
a{3,6} | Three to six of ‘a’ (no space after the comma) | /[a-zA-Z0-9!#$^&*)(]{8,32}/ will match ‘0XX!pC3Odpu30Qc’ because it is between 8 and 32 characters long |
a? | Zero or one of ‘a’ | /\d?-?\d{3}-\d{3}-\d{4}/ will match a phone number with a one-digit international code in front (1-555-555-5555) and one without (555-555-5555) |
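If you would rather check the table’s examples in Ruby than in Rubular, here is a small sketch. The sample strings are our own choices; String#scan collects every match, String#[] with a regex returns the first matched substring (or nil), and =~ returns the index of the first match.

```ruby
# Character classes: each match is a single character.
class_hits   = "neighbor".scan(/[eig]/)          # ["e", "i", "g"]
negated_hits = "gate".scan(/[^eig]/)             # ["a", "t"]
range_hits   = "Hi neighbor!".scan(/[e-iE-I]/)   # ["H", "i", "e", "i", "g", "h"]

# Anchors: ^ matches at the start of every line, \A only at the start of the string.
two_lines   = "apple\napricot"
line_starts = two_lines.scan(/^a/).length   # 2
str_starts  = two_lines.scan(/\Aa/).length  # 1
end_index   = "librazebra" =~ /a\z/         # 9, the last "a" (from zebra)

# Quantifiers
plus_miss = "a" =~ /aa+/                    # nil: needs at least two a's
star_hit  = "abbb"[/ab*/]                   # "abbb"
range_hit = "aaaaaaaa"[/a{3,6}/]            # "aaaaaa": greedy, stops at six
phone     = /\d?-?\d{3}-\d{3}-\d{4}/
with_code = "1-555-555-5555"[phone]         # "1-555-555-5555"
without   = "555-555-5555"[phone]           # "555-555-5555"

p class_hits, negated_hits, range_hits
p [line_starts, str_starts, end_index]
p [plus_miss, star_hit, range_hit, with_code, without]
```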
Metacharacters are especially useful for validating what users type into forms on websites. We want to make sure correct information is entered, and RegEx is a great way to check that an address, email, or phone number is in the correct format. This leads to better-organized databases with fewer user errors when new accounts are registered.
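As a sketch of that kind of form validation, the snippet below checks a phone number against a format anchored with \A and \z, so nothing may appear before or after the number. The format (ddd-ddd-dddd) and the names PHONE_FORMAT and valid_phone? are hypothetical choices for illustration; Regexp#match? is a standard Ruby method that returns true or false.

```ruby
# Hypothetical format: exactly ddd-ddd-dddd, with nothing before or after it.
PHONE_FORMAT = /\A\d{3}-\d{3}-\d{4}\z/

# Hypothetical helper name; returns true or false.
def valid_phone?(input)
  PHONE_FORMAT.match?(input)
end

puts valid_phone?("555-555-5555")          # true
puts valid_phone?("call 555-555-5555 now") # false: extra text around the number
puts valid_phone?("555-555-555")           # false: last group is too short

# With ^ and $ instead, a second line of junk slips through,
# because ^ and $ match at line boundaries in Ruby.
LINE_FORMAT = /^\d{3}-\d{3}-\d{4}$/
puts LINE_FORMAT.match?("555-555-5555\nnot a phone") # true
puts valid_phone?("555-555-5555\nnot a phone")       # false
```

This is why \A and \z are usually the safer anchors for validating a whole input.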
Methods to Test RegEx in Ruby
Here’s the code we are going to use to differentiate between scan and match:
#!/usr/bin/ruby

class RegexTest
  def initialize(str, regex)
    @str = str
    @regex = regex
    @result = str.scan(regex)
  end

  def display_details
    puts "String = #@str"
    puts "regex = #@regex"
    puts "result = #@result"
  end
end

# Create Objects
str1 = RegexTest.new("The rain in Spain stays mainly on the plain", /\w+ain/)
str2 = RegexTest.new("In Hertford, Hereford, and Hampshire, hurricanes hardly ever happen", /H\w+/)

# Call Methods
str1.display_details
str2.display_details
Scan
The scan method in Ruby returns an array of all strings that match your regular expression:
str1: result = ["rain", "Spain", "main", "plain"]
str2: result = ["Hertford", "Hereford", "Hampshire"]
Because the result is an ordinary array, you can do whatever you like with it.
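Here is a short sketch of what that can look like, reusing the str1 sentence. It also shows that String#scan accepts a block, which Ruby calls once for each match instead of building the array for you.

```ruby
sentence = "The rain in Spain stays mainly on the plain"

# scan returns an ordinary Array, so any Array method works on the result.
words = sentence.scan(/\w+ain/)
puts words.length                   # 4
puts words.map(&:upcase).join(", ") # RAIN, SPAIN, MAIN, PLAIN

# scan also accepts a block, which is called once for each match.
found = []
sentence.scan(/\w+ain/) { |match| found << match }
puts found.inspect
```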
RegExp Match
The match method is very similar to scan, but it finds only the first match instead of all of them. Change @result = str.scan(regex) to @result = str.match(regex) to see the difference:
str1: result = rain
str2: result = Hertford
Unlike scan, however, match returns a MatchData object rather than an array (the output above shows just the matched text because string interpolation converts the MatchData to a string). MatchData has its own methods that you can use in your logic when working with the result; take a look at the Ruby docs for more information.
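For a taste of those methods, here is a small sketch using the str1 sentence. MatchData#[], #begin, #pre_match and #post_match are standard Ruby methods; the second sentence is our own example of a string with no match.

```ruby
m = "The rain in Spain stays mainly on the plain".match(/\w+ain/)

puts m.class       # MatchData
puts m[0]          # rain: the matched text
puts m.begin(0)    # 4: index where the match starts
puts m.pre_match   # the text before the match, "The "
puts m.post_match  # the text after the match

# match returns nil when nothing matches, so check before using the result.
none = "Sunny skies".match(/\w+ain/)
puts none.nil?     # true
```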
Grep
grep is an Enumerable method for finding matching elements in arrays. It returns an array of every element that matches your regular expression. Because our code passes in a single string, we first need to split that string into an array of words.
To do this, change this line of code:
@result = str.match(regex)
And change it to:
@result = str.split(/\s|,/).grep(regex)
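You will then get a result similar to the scan result. The one difference is that grep returns each whole array element that matches, so str1 includes ‘mainly’ rather than just ‘main’:
str1: result = ["rain", "Spain", "mainly", "plain"]
str2: result = ["Hertford", "Hereford", "Hampshire"]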
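To see the scan and grep difference side by side, here is a minimal sketch on the str1 sentence: scan returns only the matched text, while grep returns the whole array element that contains a match.

```ruby
sentence = "The rain in Spain stays mainly on the plain"

scanned = sentence.scan(/\w+ain/)               # matched text only
grepped = sentence.split(/\s|,/).grep(/\w+ain/) # whole matching words

p scanned # ["rain", "Spain", "main", "plain"]
p grepped # ["rain", "Spain", "mainly", "plain"]
```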
Str =~ RegEx
Using the =~ matching operator, we can test a string against a regular expression. It returns the index where the first match begins, or nil if there is no match.
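Here is a short sketch of =~ on the str1 sentence. The regex can sit on either side of the operator, and since nil is falsy while every Integer (even 0) is truthy, the result works directly in a condition.

```ruby
sentence = "The rain in Spain stays mainly on the plain"

index = sentence =~ /Spain/
p index   # 12: "Spain" starts at character 12

missing = sentence =~ /snow/
p missing # nil: no match

# The operator works with the regex on either side.
reverse = /Spain/ =~ sentence
p reverse # 12

# nil is falsy and any Integer is truthy, so =~ reads naturally in a condition.
puts "found it" if sentence =~ /\w+ain/
```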
Conclusion
In this article, we discussed how to use regular expressions (RegEx) in Ruby. If you want to learn more about what you can build with Ruby, check out our article, “What Is Ruby Code Used For?”
About us: Career Karma is a platform designed to help job seekers find, research, and connect with job training programs to advance their careers.