Regular Expressions: Basics for Trados

If you’ve ever set up a project in Trados, then it’s likely that you’ve come across regular expressions. Often abbreviated as regex, regular expressions are special sequences of characters or symbols that define a search pattern. Whether used to filter segments that contain certain phrases, insert placeholders or set translation checks, regular expressions can come in handy for a variety of tasks.

Despite their usefulness, however, they can be a bit of a nightmare to get your head around. After all, when faced with a sequence such as the one below, it’s understandable that your brain might get a little overwhelmed!

<[a-z][a-z0-9]*[^<>]*>
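If you’re curious, here is a minimal sketch of what that pattern does, using Python’s `re` module purely for illustration. Trados has its own regex engine, so treat this as an approximation that assumes the same basic behaviour. The pattern is written here as `<[a-z][a-z0-9]*[^<>]*>`, and the sample segment is made up. Read aloud, it says: a `<`, a lowercase letter, any further lowercase letters or digits, anything that isn’t an angle bracket, then a `>`. In other words, an opening tag.

```python
import re

# "<", one lowercase letter, then lowercase letters/digits (zero or more),
# then any characters except "<" or ">" (zero or more), then ">".
TAG = re.compile(r"<[a-z][a-z0-9]*[^<>]*>")

# Made-up sample segment containing opening and closing tags.
segment = 'Click <b>Save</b> or see <a href="help.html">Help</a>.'

print(TAG.findall(segment))  # ['<b>', '<a href="help.html">']
```

Notice that closing tags such as `</b>` are not matched: the character after `<` must be a lowercase letter, and `/` is not one.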

To help you familiarise yourself with regex, therefore, we’re going back to basics and providing you with a regular expression glossary. This guide won’t answer all your questions, but it should help you understand what some of the main symbols mean, which, at the end of the day, is half the battle!

Regular Expressions: Trados Glossary Guide

Singular Symbols

Character	Meaning
( )	Round brackets allow you to group parts of a regular expression together. This means that you can follow the closing bracket with a quantifier that applies to the entire group. For example, (ab)+ matches ab, abab, ababab and so on.
[ ]	Square brackets define a character class: a set of characters, any one of which can match at that position. For example, [A-Z] matches a single uppercase letter.
|	A pipe indicates an alternative and allows you to search for more than one term. For example, cat|dog matches either cat or dog.
?	A question mark is a quantifier that matches the preceding character or group zero times or once, making it optional.
+	A plus symbol is a quantifier that matches the preceding character or group one or more times.
*	An asterisk is a quantifier that matches the preceding character or group zero or more times.
^	This caret symbol can be used to indicate the beginning of a string or a segment.
.	The full stop is a metacharacter that matches any single character except a line break. To search for a literal full stop, escape it with a backslash: \.
$	This symbol indicates the end of a string or a segment.
-	Inside square brackets, a hyphen indicates a range. For example, [0-9] matches any digit and [a-z] matches any lowercase letter.
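\b	This is an anchor that matches a word boundary: the position between a word character and a non-word character, or at the start or end of the text. For example, \bcat\b finds cat on its own but not the cat inside concat.
\d	This shorthand class matches any digit and is broadly equivalent to [0-9].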
\n	This will search for a new line in Trados.
\t	This symbol indicates a tab character.
\s	This matches any whitespace character, such as a space, a tab or a non-breaking space.
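To see several of these symbols in action, here is a minimal sketch using Python’s `re` module, purely for illustration. It assumes these basic symbols behave the same way in Trados; more advanced features can differ between regex engines, so it’s always worth testing a pattern on a sample file first. The sample texts and the `first_match` helper are made up for this example.

```python
import re

def first_match(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# (pattern, sample text) pairs, one per symbol from the glossary.
EXAMPLES = [
    (r"(ab)+", "xxababyy"),      # ( ) group: + repeats the whole "ab" pair
    (r"[A-Z]", "abcDef"),        # [ ] one character from the range A to Z
    (r"cat|dog", "hot dog"),     # |  either alternative
    (r"colou?r", "color"),       # ?  "u" zero times or once
    (r"\d+", "Version 12.5"),    # \d a digit, + one or more times
    (r"go*d", "gd"),             # *  "o" zero or more times
    (r"^Hello", "Hello world"),  # ^  only at the start of the text
    (r"world$", "Hello world"),  # $  only at the end of the text
    (r"c.t", "cut"),             # .  any single character except a line break
    (r"\bcat\b", "concat"),      # \b no match: "cat" is not a whole word here
    (r"\s", "a\u00a0b"),         # \s whitespace, including a non-breaking space
]

for pattern, text in EXAMPLES:
    print(f"{pattern!r:12} {text!r:16} -> {first_match(pattern, text)!r}")
```

Each line of output shows the pattern, the sample text and what was matched, with None wherever the pattern finds nothing.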

Grouped Sequences

Sequence	Meaning
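[^0-7]	A caret symbol inside square brackets marks a negated class. So, in this instance, the regex matches any single character that is not a digit from 0 to 7.
[^0-7.,]	Similar to the expression above, this matches any single character that is not a digit from 0 to 7, a full stop or a comma. Inside square brackets, the full stop is treated as a literal character rather than a metacharacter.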
^[A-Z]	This would search for segments that begin with a capital letter.
^([A-Z])([a-z])	This would search for segments that begin with an uppercase letter followed by a lowercase letter.
(\t)([A-Z])([a-z])	Similar to the above, this would search for an uppercase letter followed by a lowercase letter, but only where the pair is preceded by a tab character.
[^.]$	This would search for any segment whose last character isn’t a full stop, whether it ends with other punctuation or with no punctuation at all.
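("|“).*?("|”)	This regular expression searches for text enclosed in straight or curly double quotes. The ? after .* makes the match lazy, so it stops at the first closing quote it finds rather than the last one in the segment.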
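The sketch below applies the expressions from this table to a handful of made-up segments, again using Python’s `re` module for illustration and assuming the same basic behaviour in Trados. The `segments` list and the `matching` helper are hypothetical; in Trados itself you would enter the pattern wherever it accepts a regular expression, such as when filtering segments or setting up a check, rather than run code.

```python
import re

# Hypothetical sample segments standing in for the contents of a project file.
segments = [
    "Hello world.",
    "hello again.",
    "Price: 9,99",
    "See the “User Guide” for details.",
    'Press "OK" to continue',
    "\tIndented text.",
]

# Expressions from the table, keyed by a short description.
PATTERNS = {
    "starts with a capital": r"^[A-Z]",
    "capital + lowercase at start": r"^([A-Z])([a-z])",
    "tab + capital + lowercase": r"(\t)([A-Z])([a-z])",
    "last char is not a full stop": r"[^.]$",
}

def matching(pattern, items):
    """Return the items in which re.search finds the pattern."""
    return [s for s in items if re.search(pattern, s)]

for label, pattern in PATTERNS.items():
    print(f"{label}: {matching(pattern, segments)}")

# Negated classes: list every character that is NOT in the set.
print(re.findall(r"[^0-7]", "1.250,90"))    # ['.', ',', '9']
print(re.findall(r"[^0-7.,]", "1.250,90"))  # ['9']

# Quoted text: finditer + group(0) gives the whole match, quotes included.
# (re.findall would instead return tuples of the two captured quote marks.)
QUOTED = r'("|“).*?("|”)'
for s in segments:
    for m in re.finditer(QUOTED, s):
        print(m.group(0))
```

Because each group in the quote pattern allows either style, it will also pair a straight opening quote with a curly closing one.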

We understand that we’ve only just scratched the surface of regular expressions, but we hope these basic points will come in useful to any of you starting to navigate the world of regex!

To keep up to date with Web-Translations, be sure to follow us on Twitter, LinkedIn and more!

21 July 2023 08:07