By Sean Devlin, SEO Manager.
Although it’s been on the horizon for a while, Google has finally released regex (or regular expression) filtering for the performance report in Google Search Console. For those already familiar with regex, it is a welcome addition, allowing users to quickly filter data on several conditions at once and extract the key information with relative ease.
For those asking, “what’s regex?”, this article is for you, providing a quick start guide to get your data analysis skills up and running.
What is regex?
A regular expression is a sequence of characters that specifies a search pattern. Special characters such as separators, wildcards and anchors let you segment data by keeping only the rows that match the pattern. It may sound complicated, but once you are familiar with what the common characters do, you will find regex much easier.
Firstly, how to enter regex in Google Search Console:
Navigate to the “performance report” within your Google Search Console account. Once open, click on the “New” icon, which will open a list of additional filter options to select from.
Once chosen, you will be presented with a set of filter options. Select “Custom (regex)” and you will be presented with a field that allows you to enter your regular expressions. When you are happy with your regex commands, click the “apply” button to view the results.
Getting Started with regex within Google Search Console
Below are common characters used within regex commands that can be applied in many situations. We have also included regex strings which can be tailored to suit your requirements and help filter through data within Google Search Console.
“|” – Separator – Works as an “or” between terms, for example “nike|adidas” would return results containing either “nike” or “adidas”. Separator commands should not end with a “|”, e.g., “nike|adidas|”, as the trailing pipe adds an empty alternative that matches anything, bringing back all results.
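The separator is easy to try outside Search Console. Below is a minimal Python sketch, assuming a handful of made-up queries; Python's re module is not the engine Search Console uses, but for a pattern this simple it should behave the same way. The helper name filter_queries is my own.

```python
import re

# Made-up sample queries, purely for illustration
queries = ["nike air max", "adidas samba", "puma suede"]

def filter_queries(pattern, items):
    """Keep the items where the pattern matches anywhere in the string."""
    return [q for q in items if re.search(pattern, q)]

# "nike" OR "adidas"
brand_matches = filter_queries(r"nike|adidas", queries)

# The trailing pipe adds an empty alternative, which matches every string
trailing_pipe = filter_queries(r"nike|adidas|", queries)

print(brand_matches)  # ['nike air max', 'adidas samba']
print(trailing_pipe)  # ['nike air max', 'adidas samba', 'puma suede']
```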
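“.*” – Wildcard – The “.” matches any single character and the “*” repeats it zero or more times, so together they match any sequence of characters. For example, the command “.*nike.*” will bring back a match for https://www.example.com/nike-air-trainers. This is probably the command most people will encounter. The Search Console filter already matches anywhere in the string unless you anchor it, so the wildcard earns its keep between two terms, e.g., “nike.*trainers”.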
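To see what the wildcard does, here is a short Python sketch. It assumes the filter matches anywhere in the string (re.search does the same), and the sample URLs other than the nike-air-trainers one are invented.

```python
import re

urls = [
    "https://www.example.com/nike-air-trainers",
    "https://www.example.com/adidas-trainers",   # invented example URL
    "https://www.example.com/nike-hoodies",      # invented example URL
]

def filter_urls(pattern):
    """Keep URLs where the pattern matches anywhere in the string."""
    return [u for u in urls if re.search(pattern, u)]

# With partial matching, wrapping a term in ".*" changes nothing
wrapped = filter_urls(r".*nike.*")
bare = filter_urls(r"nike")

# The wildcard is more useful between two terms
nike_trainers = filter_urls(r"nike.*trainers")

print(wrapped == bare)  # True
print(nike_trainers)    # ['https://www.example.com/nike-air-trainers']
```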
“\” – Escape Character – If you’re searching for a text string which contains a regex character, add a backslash before it and regex will treat it like a normal character. Escape characters are useful for filtering URLs with parameters, e.g., “index\.html\?search=nike\+air”, which contain symbols with regex functionality. In this example we are escaping “.”, “?” & “+” so they are treated as normal characters.
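The difference escaping makes can be checked with a quick Python sketch. The full URL below is an assumption built around the example string above. Python's re.escape adds the backslashes for you, though I would check its output before pasting it into Search Console, as the two engines are not identical.

```python
import re

# Hypothetical URL containing the example string from the article
url = "https://www.example.com/index.html?search=nike+air"

escaped = r"index\.html\?search=nike\+air"
unescaped = r"index.html?search=nike+air"

# Escaped: ".", "?" and "+" are treated as literal characters
escaped_hit = re.search(escaped, url) is not None

# Unescaped: "?" makes the "l" optional and "+" repeats the "e",
# so the literal "?" and "+" in the URL are never matched
unescaped_hit = re.search(unescaped, url) is not None

# re.escape builds an escaped pattern from the literal text
auto_escaped_hit = re.search(re.escape("index.html?search=nike+air"), url) is not None

print(escaped_hit, unescaped_hit, auto_escaped_hit)  # True False True
```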
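“^” – Beginning of a Line – The results returned must begin with whatever is placed after the “^”. This is essential for segmenting URLs that use a specific folder path within a website. Pages in Search Console are reported as full URLs, so the pattern should start from the protocol and domain, e.g., “^https://www\.example\.com/mens/”.
“$” – End of a Line – Placing this character at the end of the string signals that the results must end with the characters immediately before the “$”. Combined with “^”, it gives an exact match: “^nike$” would only return the string “nike”.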
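The two anchors can be compared side by side in a short Python sketch. The page URLs, the “/mens/” folder and the queries are all invented for illustration, and keep is a helper name of my own.

```python
import re

# Invented sample data
pages = [
    "https://www.example.com/mens/trainers/nike-air",
    "https://www.example.com/womens/trainers/nike-air",
    "https://www.example.com/blog/mens/style-guide",
]
queries = ["nike", "nike air", "buy nike"]

def keep(pattern, items):
    """Keep the items where the pattern matches somewhere in the string."""
    return [item for item in items if re.search(pattern, item)]

# Anchored to the start: only pages directly under the /mens/ folder
mens_folder = keep(r"^https://www\.example\.com/mens/", pages)

# Unanchored: also picks up /blog/mens/
unanchored = keep(r"/mens/", pages)

starts_with_nike = keep(r"^nike", queries)
ends_with_nike = keep(r"nike$", queries)
exact_nike = keep(r"^nike$", queries)

print(mens_folder)       # ['https://www.example.com/mens/trainers/nike-air']
print(starts_with_nike)  # ['nike', 'nike air']
print(ends_with_nike)    # ['nike', 'buy nike']
print(exact_nike)        # ['nike']
```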
Useful regex Strings
- parent-category\/sub-category – Filter URLs via parent and sub-categories
- parent-category\/sub-category\/level-three-category – Filter pages three levels down
- parent-category\/sub-category\/tertiary-category|\/alternate-category.* – Segment URLs three levels down and compare with an additional URL path
- .*Keyword.* – Filter via keyword containing anything before or after
- ^Keyword – Segment search terms starting with
- Keyword$ – Segment search terms ending with
- Keyword1|keyword2|keyword3 – Filter by search terms that contain any of the three keywords
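As a rough illustration of the category strings above, the Python sketch below runs two of them against invented URLs on example.com. One thing it shows is that the two-level string also returns pages deeper in the same path, so add “$” or a further level if you only want one depth.

```python
import re

# Invented URLs that follow the placeholder paths from the list above
urls = [
    "https://www.example.com/parent-category/sub-category/",
    "https://www.example.com/parent-category/sub-category/level-three-category/",
    "https://www.example.com/parent-category/other-category/",
]

two_levels = r"parent-category\/sub-category"
three_levels = r"parent-category\/sub-category\/level-three-category"

def matching(pattern, items):
    """Return the items the pattern matches anywhere in the string."""
    return [item for item in items if re.search(pattern, item)]

two_level_pages = matching(two_levels, urls)
three_level_pages = matching(three_levels, urls)

print(len(two_level_pages))    # 2 (the sub-category page and the page beneath it)
print(len(three_level_pages))  # 1
```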
Learning more about regex
Regex is brilliant for filtering data sets and is used in many programming languages such as “Python”, “PHP” and “.NET” to great effect. If this article has piqued your interest, there are many more functions and sequences to learn which can improve the speed and accuracy of your data analysis. As a top tip, I would recommend creating a regex code library containing snippets which can be recycled as and when needed. This can save a lot of time.
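A regex code library can be as simple as a text file, but if you work in Python it might look something like the sketch below. The names REGEX_LIBRARY and apply_snippet are hypothetical, and the snippets are the examples used earlier in this article.

```python
import re

# A small, hypothetical snippet library: a label mapped to a reusable pattern
REGEX_LIBRARY = {
    "brand_terms": r"nike|adidas",
    "nike_exact": r"^nike$",
    "search_parameter": r"index\.html\?search=",
}

def apply_snippet(name, items):
    """Filter items using a stored snippet, matching anywhere in the string."""
    pattern = re.compile(REGEX_LIBRARY[name])
    return [item for item in items if pattern.search(item)]

sample_queries = ["nike", "nike air", "adidas samba", "puma suede"]

print(apply_snippet("brand_terms", sample_queries))  # ['nike', 'nike air', 'adidas samba']
print(apply_snippet("nike_exact", sample_queries))   # ['nike']
```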
Further reading and useful resources
- https://regexr.com – This website allows you to test your regex commands, highlighting the returned results from the directive
- https://riptutorial.com/regex – In depth tutorials surrounding regex and how they can be used
- https://www.annielytics.com/blog/programming/regex-marketers-stupid-simple-real-world-examples-video/ – Tutorial using regex in the wild. This is a good place to start expanding your knowledge
- https://docs.google.com/spreadsheets/d/1yZ0QpzFDqiW-mMG4XFwD7eip4JCGC4cSW8pxR7gT4hE/edit#gid=0 – A Google Sheet created by JR Oaks from LocomotiveSEO to get you segmenting URLs in GSC
- https://regexcrossword.com – Solve puzzles with regex, quite addictive and probably one of the better (and more fun) ways to advance your regex skills
About the author:
Sean Devlin is SEO Manager for CreativeRace, having worked in Digital Marketing for over 9 years and for brands such as Fortnum and Mason, GHD & Aldi.