Match a Substring Using re.search()
00:00 Let's think about one of the issues that you may have encountered early on. While you did lowercase all of the text to make it a bit more general to search, you might have noticed that, because every character is just a character to Python, you can still get results that you're not expecting.
00:20 So earlier, you were checking how often secret appears in the text, and the count gave you four, even though one of those is actually a different word: secretly. Still, the substring secret is in there, so the count makes complete sense according to Python.
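As a quick reminder of that behavior, here is a minimal sketch. The sample text is made up for illustration and is not the text used in the lesson; it is only built so that the substring shows up four times, one of them inside secretly.

```python
# Hypothetical sample text (an assumption, not the lesson's actual file).
text = "The secret agent kept a Secret. He secretly hid the secret files."

# Lowercase everything so the search ignores capitalization.
text_lower = text.lower()

# .count() counts substring occurrences, so "secretly" is counted too.
count = text_lower.count("secret")
print(count)  # 4
```

The count includes secretly because .count() only looks at characters, not at word boundaries.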
00:34 But now maybe you're looking for a way to identify this one specific word that includes secret but then continues. Maybe there are more of those in there, right?
00:44
One way of doing this is using the re module from Python's standard library, which you'll need to import. So I'm going to say import re, and then I can use re.search()
00:58 and pass it a regular expression pattern.
01:04
And here I'm adding the r before the string to create a raw string. This is normal Python and doesn't have anything to do with regex. In a raw string, Python treats backslashes literally instead of interpreting them as the start of an escape sequence.
01:17
This means that an escape sequence, which in Python starts with a backslash character (\), isn't interpreted. Think of \n, which stands for the newline character.
01:27
If the combination of these two characters is in a normal Python string, then it stands for a newline character. If it's in a raw string, then Python treats it as two separate characters: \ and n. Such raw strings are useful for writing regex patterns because the backslash character also has a special meaning in regular expressions. And if you use a raw string, then you can avoid having to double every backslash in the pattern. For now, the quick takeaway is that prefixing your regex pattern with r can make writing it a bit less complicated, and you can use every bit of reduced complication that you can get with regex patterns.
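If you want to see the difference for yourself, a small sketch like the following can help. The variable names are just for illustration.

```python
# A normal string: Python interprets \n as a single newline character.
normal = "\n"

# A raw string: the backslash and the n stay two separate characters.
raw = r"\n"

print(len(normal))  # 1
print(len(raw))     # 2

# Without the r prefix, you'd need to double the backslash
# to get the same regex pattern.
same_pattern = r"secret\w+" == "secret\\w+"
print(same_pattern)  # True
```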
02:06
In this case, I'm going to say "secret", so similar to the substring that you searched for before. Now I'm adding the regex word character class, which is \w. It's a placeholder for any word character.
02:20
And in regex, a word character means any letter, digit, or underscore, which means that it's going to match this l here that comes after, but it won't match a whitespace character that might come after, or a dot, for that matter. So in this case, it's going to match the l, but you want to get the full word, so you need more than one word character, and you can do that by adding the plus quantifier (+) at the end.
02:47
So this regular expression pattern is going to find the word "secretly" here, but it's more flexible than that. The same pattern would also match other words such as "secrets" or "secretary", or even "secret_9".
03:04 That's the kind of flexibility you get when using regular expressions.
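To check that flexibility, you could try the pattern against a handful of candidate strings. This is only a sketch: the list of candidates and the results dictionary are made up for illustration.

```python
import re

pattern = r"secret\w+"

# Hypothetical candidates: the last two have no word character after "secret".
candidates = ["secretly", "secrets", "secretary", "secret_9", "secret", "secret."]

# re.search() returns None when there's no match.
results = {word: re.search(pattern, word) is not None for word in candidates}

for word, matched in results.items():
    print(f"{word!r}: {matched}")
```

The plain "secret" and "secret." don't match because \w+ needs at least one word character after the t, and a dot isn't a word character.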
03:08
And of course, you also need to say where it should search, and that's going to be text_lower. If you press Enter here, you can see that re.search() returns a Match object,
03:23
and you have quite a bit of information in there. So you can see that it matched the word 'secretly', and it also tells you where the substring starts and where it ends.
03:34 So I get quite a bit of information in here, basically out of the box.
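Here's a rough sketch of what that looks like when you run it outside of the REPL. The content of text_lower below is a made-up stand-in for the lesson's text, so the span numbers you see in the video will differ.

```python
import re

# Hypothetical stand-in for the lesson's lowercased text (an assumption).
text_lower = "the secret agent kept a secret. he secretly hid the secret files."

# re.search() returns the first match, or None if nothing matches.
match = re.search(r"secret\w+", text_lower)

print(match)  # <re.Match object; span=(35, 43), match='secretly'>
```

Note that re.search() skips the two earlier occurrences of secret because they're followed by a space and a dot, not by a word character.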
03:40
Now you can work with that, right? In the next lesson, you'll pick apart this Match object and learn about a few methods that you can use to extract different pieces of information from it.
