What is Regular Expression in MongoDB?


MongoDB can use a regular expression to search for a pattern within a string during a query. Regular expressions are a standard way to match patterns against sequences of characters. MongoDB uses Perl Compatible Regular Expressions (PCRE) version 8.42 with UTF-8 support.

Most programming languages use regular expressions to look for a pattern or word within a string. MongoDB offers the same capability for matching text patterns through the $regex operator. Unlike text search, regular expressions need no configuration or command before they can be used.

Assume we have inserted a document into a collection named posts, as shown below −

> db.posts.insert(
{
   "post_text": "enjoy the mongodb articles on futurefundamentals",
   "tags": [
      "mongodb",
      "futurefundamentals"
   ]
})
WriteResult({ "nInserted" : 1 })

Using regex Expression

The following regex query searches for all the posts containing the string futurefundamentals −

> db.posts.find({post_text:{$regex:"futurefundamentals"}}).pretty()
{
	"_id" : ObjectId("5dd7ce28f1dd4583e7103fe0"),
	"post_text" : "enjoy the mongodb articles on futurefundamentals",
	"tags" : [
		"mongodb",
		"futurefundamentals"
	]
}
{
	"_id" : ObjectId("5dd7d111f1dd4583e7103fe2"),
	"post_text" : "enjoy the mongodb articles on futurefundamentals",
	"tags" : [
		"mongodb",
		"futurefundamentals"
	]
}
>

The same query can also be written as −

>db.posts.find({post_text:/futurefundamentals/})

Using regex Expression with Case Insensitive

To make the search case-insensitive, we use the $options argument with the value i. The following command finds posts that contain the word futurefundamentals, whether it is written in lowercase or uppercase letters.

>db.posts.find({post_text:{$regex:"futurefundamentals",$options:"i"}})

One of the documents returned by this query is shown below; the word futurefundamentals is matched regardless of its letter case −

{
   "_id" : ObjectId("53493d37d852429c10000004"),
   "post_text" : "hey! this is my post on futurefundamentals", 
   "tags" : [ "futurefundamentals" ]
} 
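To see what the i option changes, here is a minimal sketch in plain JavaScript that runs without a MongoDB server. The posts array, its sample texts, and the findByRegex helper are hypothetical stand-ins for the collection and for find with $regex. JavaScript's RegExp is a different engine from PCRE, but for a plain word and the i flag the behavior should be the same.

```javascript
// Hypothetical in-memory stand-in for the posts collection.
const posts = [
  { post_text: "enjoy the mongodb articles on futurefundamentals" },
  { post_text: "hey! this is my post on FutureFundamentals" },
  { post_text: "a post about something else" },
];

// Hypothetical helper that mimics {field: {$regex: pattern, $options: options}}.
function findByRegex(docs, field, pattern, options = "") {
  const re = new RegExp(pattern, options);
  return docs.filter(
    (doc) => typeof doc[field] === "string" && re.test(doc[field])
  );
}

// Without options the match is case-sensitive.
const caseSensitive = findByRegex(posts, "post_text", "futurefundamentals");
// With "i" the mixed-case post is matched as well.
const caseInsensitive = findByRegex(posts, "post_text", "futurefundamentals", "i");

console.log(caseSensitive.length);   // 1
console.log(caseInsensitive.length); // 2
```

Only the second call returns the post written as FutureFundamentals, which is the difference the i option makes.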
 

Using regex for Array Elements

Regular expressions can also be applied to array fields, which is particularly useful when implementing tag functionality. The following query finds all posts with a tag that contains the word tutorial (for example tutorial, tutorials, tutorialpoint, or tutorialphp). Because the pattern is not anchored, it matches tutorial anywhere in a tag; use ^tutorial to match only tags that start with it.

>db.posts.find({tags:{$regex:"tutorial"}})
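The array behavior can be sketched in plain JavaScript, without a server. The taggedPosts data and the findByTag helper are hypothetical; the point is only that a document matches when at least one element of the array matches, and that an anchored pattern is stricter than an unanchored one.

```javascript
// Hypothetical documents; only the tags field matters here.
const taggedPosts = [
  { title: "A", tags: ["mongodb", "tutorialspoint"] },
  { title: "B", tags: ["my-tutorial", "php"] },
  { title: "C", tags: ["database"] },
];

// A document matches when at least one array element matches the pattern.
function findByTag(docs, pattern) {
  const re = new RegExp(pattern);
  return docs
    .filter((doc) => doc.tags.some((tag) => re.test(tag)))
    .map((doc) => doc.title);
}

// Unanchored: "tutorial" may appear anywhere in a tag.
const containsTutorial = findByTag(taggedPosts, "tutorial");
// Anchored: the tag must start with "tutorial".
const startsWithTutorial = findByTag(taggedPosts, "^tutorial");

console.log(containsTutorial.join(", "));   // A, B
console.log(startsWithTutorial.join(", ")); // A
```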

Using $regex operator for Pattern matching

The regex operator in MongoDB is used to search for specific strings in the collection. The following example shows how this can be done.

Let’s assume that we have an Employee collection with the fields “Employeeid” and “EmployeeName”. Let’s also assume that the collection contains the following documents.

Employee id    Employee Name
22             NewMartin
2              Mohan
3              Joe
4              MohanR
100            Guru99
6              Gurang

The code below uses the $regex operator to specify the search criteria.

db.Employee.find({EmployeeName : {$regex: "Gu" }}).forEach(printjson)

Code Explanation:

  1. We want to find all employee names that contain the characters ‘Gu’, so we pass ‘Gu’ to the $regex operator as the search criteria.
  2. printjson prints each document returned by the query in a more readable format.

If the command is executed successfully, the following Output will be shown:

Output:

The output shows that the documents returned are those whose Employee Name contains the characters ‘Gu’.

Now suppose the collection also contains a document with the employee name “Guru999”. If you search for “Guru99”, the document containing “Guru999” is returned as well, because “Guru99” occurs inside it. But what if we want only the document whose name is exactly “Guru99”? We can do this with exact pattern matching, using the ^ and $ characters: ^ is placed at the start of the pattern and $ at the end.

Employee id    Employee Name
22             NewMartin
2              Mohan
3              Joe
4              MohanR
100            Guru99
6              Gurang
8              Guru999

The following example shows how this can be done.

db.Employee.find({EmployeeName : {$regex: "^Guru99$"}}).forEach(printjson)

Code Explanation:

  1. The search criteria use the ^ and $ characters. The ^ requires the match to begin at the start of the string, and the $ requires it to finish at the end of the string. So when the code executes, it fetches only the document whose name is exactly “Guru99”.
  2. printjson prints each document returned by the query in a more readable format.

If the command is executed successfully, the following Output will be shown:

Output:


The output shows that only the document with the name “Guru99” is fetched.
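The effect of the anchors can be checked in plain JavaScript, without a server. This is a small sketch that filters the employee names from the example with both patterns; the variable names are illustrative only.

```javascript
// Employee names from the example collection, including Guru999.
const employeeNames = [
  "NewMartin", "Mohan", "Joe", "MohanR", "Guru99", "Gurang", "Guru999",
];

// Unanchored: matches any name that contains "Guru99".
const unanchored = employeeNames.filter((name) => /Guru99/.test(name));
// Anchored: the whole name must be exactly "Guru99".
const anchored = employeeNames.filter((name) => /^Guru99$/.test(name));

console.log(unanchored.join(", ")); // Guru99, Guru999
console.log(anchored.join(", "));   // Guru99
```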

Pattern Matching with $options

When using the $regex operator, you can also provide additional options through the $options keyword. For example, suppose you want to find all the documents that have ‘Gu’ in their Employee Name, regardless of letter case. To get that result, use $options with the case-insensitivity parameter.

The following example shows how this can be done.

Let’s assume that we have the same Employee collection with the fields “Employeeid” and “EmployeeName”.

Let’s also assume that we have the following documents in our collection.

Employee id    Employee Name
22             NewMartin
2              Mohan
3              Joe
4              MohanR
100            Guru99
6              Gurang
7              GURU99

If we run the earlier query for ‘Gu’ without any options, the document with “GURU99” is not returned, because matching is case-sensitive by default. To include it in the result set, we need to add the $options ‘i’ parameter.

db.Employee.find({EmployeeName:{$regex: "Gu",$options:'i'}}).forEach(printjson)

Code Explanation:

  1. The $options parameter with the value ‘i’ (case insensitivity) specifies that the search should match the letters ‘Gu’ in either lower or upper case.

If the command is executed successfully, the following Output will be shown:

Output:

The output shows that the document with the upper-case name “GURU99” is still included in the result set.

Pattern matching without the regex operator

One can also do pattern matching without the regex operator. The following example shows how this can be done.

db.Employee.find({EmployeeName: /Gu/}).forEach(printjson)

Code Explanation:

  1. The search criteria are written between the “//” delimiters. Here we specify /Gu/ to again find the documents that have ‘Gu’ in their EmployeeName.

If the command is executed successfully, the following Output will be shown:

Output:


The output clearly shows that those documents wherein the Employee Name contains the ‘Gu’ characters are returned.
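The two ways of writing a pattern are interchangeable, and in the delimiter form any option follows the closing slash, as in /Gu/i. The sketch below illustrates this in plain JavaScript without a server; toRegExp and match are hypothetical helpers, not part of MongoDB, that normalize either form into a JavaScript RegExp and apply it to the employee names.

```javascript
// Hypothetical helper: turn either query form into a JavaScript RegExp.
function toRegExp(condition) {
  if (condition instanceof RegExp) {
    return condition; // delimiter form, e.g. /Gu/ or /Gu/i
  }
  // operator form, e.g. { $regex: "Gu", $options: "i" }
  return new RegExp(condition.$regex, condition.$options || "");
}

const names = ["NewMartin", "Mohan", "Joe", "MohanR", "Guru99", "Gurang", "GURU99"];

// Hypothetical helper: names that satisfy the given condition.
const match = (condition) => names.filter((n) => toRegExp(condition).test(n));

const viaLiteral = match(/Gu/);
const viaOperator = match({ $regex: "Gu" });
const literalWithFlag = match(/Gu/i);
const operatorWithFlag = match({ $regex: "Gu", $options: "i" });

console.log(viaLiteral.join(", "));       // Guru99, Gurang
console.log(operatorWithFlag.join(", ")); // Guru99, Gurang, GURU99
```

Both forms select the same names, with and without the case-insensitivity option.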

Fetching last ‘n’ documents from a collection

There are various ways to get the last n documents in a collection.

Let’s look at one of them in the following example.

Let’s assume that we have the same Employee collection with the fields “Employeeid” and “EmployeeName”, and that it contains the following documents:

Employee id    Employee Name
22             NewMartin
2              Mohan
3              Joe
4              MohanR
100            Guru99
6              Gurang
7              GURU99
db.Employee.find().sort({_id:-1}).limit(2).forEach(printjson)

Code Explanation:

1) When querying for the documents, use the sort function to sort the records in descending order of the _id field. The -1 requests descending order, so the last document becomes the first one displayed. This assumes that _id values grow in insertion order, as the default ObjectId values generally do.

2) Then use the limit clause to return only the number of records you want. Here the limit is set to 2, so the query fetches the last two documents.

If the command is executed successfully, the following Output will be shown:

Output:

MongoDB Regular Expression (Regex) Tutorial

The output shows the last two documents in the collection. To fetch the last ‘n’ documents in a collection, we can therefore sort the documents in descending order and then use the limit clause to return the ‘n’ documents required.
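The sort-then-limit idea can be sketched in plain JavaScript, without a server. The numeric _id values below are an assumption made for illustration (real collections normally use ObjectId values), and lastN is a hypothetical helper that mimics find().sort({_id:-1}).limit(n).

```javascript
// Hypothetical documents; _id values increase in insertion order.
const employees = [
  { _id: 1, EmployeeName: "NewMartin" },
  { _id: 2, EmployeeName: "Mohan" },
  { _id: 3, EmployeeName: "Joe" },
  { _id: 4, EmployeeName: "MohanR" },
  { _id: 5, EmployeeName: "Guru99" },
  { _id: 6, EmployeeName: "Gurang" },
  { _id: 7, EmployeeName: "GURU99" },
];

// Sort a copy in descending _id order, then keep the first n documents.
function lastN(docs, n) {
  return [...docs].sort((a, b) => b._id - a._id).slice(0, n);
}

const lastTwo = lastN(employees, 2).map((doc) => doc.EmployeeName);

console.log(lastTwo.join(", ")); // GURU99, Gurang
```

The most recently inserted document comes first, just as it does with a descending sort on _id.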

Note: If a regular expression search is performed on a string longer than roughly 38,000 characters, it may not return the right results.

Optimizing Regular Expression Queries

  • If the document fields are indexed, the query makes use of the indexed values to match the regular expression. This makes the search much faster than having the regular expression scan the whole collection.
  • If the regular expression is a prefix expression, all the matches must start with the same string of characters. For example, if the regular expression is ^tut, the query has to search only those strings that begin with tut.

Summary:

  • Pattern matching can be achieved with the $regex operator. This operator can be used to find certain strings in the collection.
  • The ^ and $ symbols can be used for exact text searches: ^ makes sure the match begins at the start of the string, and $ makes sure it finishes at the end of the string.
  • The ‘i’ option can be used with the $regex operator to specify case insensitivity, so that strings are found whether they are in lower case or upper case.
  • The delimiters // can also be used for pattern matching.
  • Use a combination of the sort and limit functions to return the last n documents in the collection. The sort function returns the documents in descending order, after which the limit clause restricts the number of documents returned.
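The prefix-expression point made under Optimizing Regular Expression Queries can be pictured as a range lookup on sorted values. The sketch below is a conceptual illustration in plain JavaScript, not MongoDB's actual implementation: the sorted array stands in for an index, and lowerBound, prefixUpperBound, and findByPrefix are hypothetical helpers. It assumes a non-empty, case-sensitive prefix whose last character is not the largest possible code unit.

```javascript
// Hypothetical sorted "index" on a string field (code-unit ordering).
const index = [
  "mongodb", "python", "tutor", "tutorial", "tutorialspoint", "tux", "zebra",
].sort();

// Binary search: first position whose value is >= target.
function lowerBound(sorted, target) {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sorted[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Smallest string greater than every string that starts with prefix:
// bump the last character by one code unit ("tut" -> "tuu").
function prefixUpperBound(prefix) {
  const last = prefix.charCodeAt(prefix.length - 1);
  return prefix.slice(0, -1) + String.fromCharCode(last + 1);
}

// Only the values in [prefix, upper bound) need to be looked at.
function findByPrefix(sorted, prefix) {
  const start = lowerBound(sorted, prefix);
  const end = lowerBound(sorted, prefixUpperBound(prefix));
  return sorted.slice(start, end);
}

const rangeHits = findByPrefix(index, "tut");
// Full scan with the equivalent regular expression, for comparison.
const regexHits = index.filter((value) => /^tut/.test(value));

console.log(rangeHits.join(", ")); // tutor, tutorial, tutorialspoint
```

The range lookup returns the same values as testing ^tut against every entry, but it only visits the entries between the two bounds.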