Groups in regular expressions in JavaScript

Defining groups in regular expressions

Groups are used to search for more complex matches in a string. In a regular expression, a group is enclosed in parentheses. For example, suppose we have the following HTML code, which contains an image tag: <img src= "picture.png" />. Let's say we need to extract the image paths from this code:

 
let initialText = '<img src= "picture.png" />';
let exp = /[a-z]+\.(png|jpg)/i;
let result = initialText.match(exp);
result.forEach(function(value, index, array){
    
    console.log(value);
});

Browser output:

picture.png
png

The first line is the whole match; the second is the value captured by the (png|jpg) group.

The first part before the parentheses ([a-z]+\.) requires one or more characters from the a-z range, followed by a dot. Since the dot is a special character in regular expressions, it is escaped with a backslash. Then comes the group (png|jpg), which specifies that either "png" or "jpg" must follow the dot. The i flag makes the match case-insensitive.
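As a quick check of how the alternation group behaves, the sketch below runs the same pattern against a few file names. The names in samples are made up for illustration and do not come from the example above.

```javascript
// Same pattern as above: letters, a literal dot, then "png" or "jpg".
const imagePattern = /[a-z]+\.(png|jpg)/i;

// Hypothetical sample strings, used only for this illustration.
const samples = ["picture.png", "PHOTO.JPG", "notes.txt"];

// test() returns true when the pattern matches somewhere in the string.
const matches = samples.map((name) => imagePattern.test(name));
console.log(matches); // [ true, true, false ]
```

The last name fails because "txt" is neither of the two alternatives listed in the group.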

Getting Group Values

The advantage of using groups in regular expressions is that we can get the value of each individual group. For example, as you know, different countries use different date formats. What if we receive a date in year-month-day format and want to convert it to some other format? In this case, we can define a separate group for each component:

 
const exp = /(\d{4})-(\d{2})-(\d{2})/;
const result = exp.exec("2021-09-06");

console.log(result[0]); // 2021-09-06
console.log(result[1]); // 2021
console.log(result[2]); // 09
console.log(result[3]); // 06
console.log(`${result[3]}.${result[2]}.${result[1]}`); // 06.09.2021

This uses the regular expression /(\d{4})-(\d{2})-(\d{2})/, where three groups are defined:

The first group (\d{4}) corresponds to a four-digit number

The second group (\d{2}) corresponds to a two-digit number

The third group (\d{2}) is the same as the second

The result is an array, where the first element (at index 0) always represents the substring that matched the regular expression. All subsequent elements of this array represent groups. That is, the first group has index 1, the second has index 2, and so on.
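The indexing described above could be wrapped in a small helper. This is only a sketch: the function name reorderDate is hypothetical, and the null check is there because exec() returns null when nothing matches.

```javascript
const isoDate = /(\d{4})-(\d{2})-(\d{2})/;

// Hypothetical helper: converts "YYYY-MM-DD" to "DD.MM.YYYY", or returns null.
function reorderDate(text) {
    const match = isoDate.exec(text);
    if (match === null) {
        return null; // no date in the expected format
    }
    // Skip index 0 (the whole match) and take the three groups.
    const [, year, month, day] = match;
    return `${day}.${month}.${year}`;
}

console.log(reorderDate("2021-09-06")); // 06.09.2021
console.log(reorderDate("06.09.2021")); // null
```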

Named groups

JavaScript allows you to assign a name to each group in a regular expression. Using this name, you can then get the value that corresponds to the group. To name a group, put a question mark followed by the group name in angle brackets immediately after the opening parenthesis of the group:

 
(?<group_name> ... )
    

Consider the following example:

 
const exp = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u;
const result = exp.exec("2021-09-06");
console.log(result.groups);         // {year: "2021", month: "09", day: "06"}
console.log(result.groups.year);    // 2021
console.log(result.groups.month);   // 09
console.log(result.groups.day);     // 06

Here the regular expression defines three groups. The first group is called "year", the second is called "month", and the third is called "day". When we get the result, we can access each group via the groups property. This property is an object whose properties are named after the groups and hold the value of each group:

 
console.log(result.groups);         // {year: "2021", month: "09", day: "06"}

Accordingly, using the name of the group, we can get the value for a particular group.
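Named groups also combine well with destructuring and with the $<name> syntax in replacement strings. The following is a minimal sketch; the variable names are chosen for illustration.

```javascript
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u;

// Pull the named values straight out of the groups object.
const match = datePattern.exec("2021-09-06");
const { year, month, day } = match.groups;
console.log(`${day}.${month}.${year}`); // 06.09.2021

// In replace(), $<name> refers to the value of the named group.
const swapped = "2021-09-06".replace(datePattern, "$<day>.$<month>.$<year>");
console.log(swapped); // 06.09.2021
```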

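Assertions

Assertions (more precisely, lookbehind assertions) allow you to get a substring that matches a regular expression and that is preceded or, conversely, not preceded by a specific expression. The preceding expression itself is not included in the match.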

A positive assertion (when a substring must be preceded by another substring) is defined using the expression:

 
(?<=...)

After the equal sign = comes the expression that must precede the substring.

A negative assertion (when a substring must NOT be preceded by another substring) is defined using the expression:

 
(?<!...)

After the exclamation mark ! comes the expression that must NOT precede the substring.
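To see both forms side by side, here is a small sketch on a made-up string. It uses match() with the g flag, which returns every match in the string. Note that a negative assertion only checks the character right before the match, so a match can still begin in the middle of a number unless a preceding digit is excluded as well.

```javascript
// Hypothetical sample text, chosen only for illustration.
const text = "room 7, ticket #42";

// Positive assertion: digits that directly follow "#".
const afterHash = text.match(/(?<=#)\d+/g);
console.log(afterHash); // [ '42' ]

// Naive negative assertion: the "2" in "#42" is preceded by "4", not "#",
// so it still matches.
const naive = text.match(/(?<!#)\d+/g);
console.log(naive); // [ '7', '2' ]

// Also excluding a preceding digit keeps a match from starting mid-number.
const notAfterHash = text.match(/(?<![#\d])\d+/g);
console.log(notAfterHash); // [ '7' ]
```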

Let’s take a simple task. Suppose we have a text that contains a monetary amount, and this amount may be given in dollars, euros, rubles, and so on. Something like:

 
const text1 = "All costs: $10.53";
const text2 = "All costs: €10.53";

const exp = /\d+(\.\d*)?/;
const result1 = exp.exec(text1);
console.log(result1[0]);    // 10.53

const result2 = exp.exec(text2);
console.log(result2[0]);     // 10.53

Here we see that both the dollar amount and the euro amount match our regular expression. But what if we only want the dollar amount? For this we apply a positive assertion:

 
const text1 = "All costs: $10.53";
const text2 = "All costs: €10.53";

const exp = /(?<=\$)\d+(\.\d*)?/;

const result1 = exp.exec(text1);
console.log(result1); // ["10.53", ".53", index: 12, input: "All costs: $10.53", groups: undefined]

const result2 = exp.exec(text2);
console.log(result2); // null

The assertion (?<=\$) specifies that the match must be immediately preceded by a dollar sign $. The dollar sign itself is not part of the match. If there is no dollar sign, the exec() method will not find a match and will return null.
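When a text contains several amounts, the same assertion can be combined with the g flag and matchAll() to collect every dollar amount. This is a sketch under assumptions: the report string is invented for illustration.

```javascript
// Hypothetical text with several amounts in different currencies.
const report = "Costs: $10.53, €7.20, $3";

// Same pattern as above, with the g flag that matchAll() requires.
const dollarPattern = /(?<=\$)\d+(\.\d*)?/g;

// Each item yielded by matchAll() is an exec()-style array; index 0 is the whole match.
const dollarAmounts = Array.from(report.matchAll(dollarPattern), (m) => m[0]);
console.log(dollarAmounts); // [ '10.53', '3' ]
```

The euro amount is skipped because none of its digits directly follows a dollar sign.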