Thursday, January 14, 2010

Utilizing Apex Pattern and Matcher Classes

In many of the projects I am involved with, I need to validate a string of data or transform it into a new string with a specific format. Processing the text with the String primitive's methods or your own custom handling can be a big, time-consuming undertaking.

Sometimes one needs to write hundreds of lines of code to process a string and make sure it's valid (formatted as expected) or to transform it into the proper format. Some examples are validating a string to see if it's a correct email address, postal code, phone number or URL. Others are grabbing HTML tags, stripping the XML or HTML tags out of a string to get clean text, trimming whitespace, removing duplicate lines or items, and many more.

Apex on the Force.com platform has just the right set of classes to help you carry out such operations, in pretty much the same way Java does.

“A regular expression is a string that is used to match another string, using a specific syntax. Apex supports the use of the regular expression through its Pattern and Matcher classes.” Quoted right from the holy guide. Any regular expression that is written for Java can be used with Apex as well.

In order to utilize these classes we first need to know what each of them does.

The Pattern class holds the regular expression: you compile the expression string into an object of this class. You only need to compile a given expression once. The resulting Pattern object can then create a Matcher for each string you pass to it (the string on which you want to carry out surgery or validation).


Pattern myPattern = Pattern.compile('(a(b)?)+');




The Matcher, in turn, lets you take further actions, such as checking whether the string matched the pattern or manipulating the original string in various ways to produce a new one.



Matcher myMatcher = myPattern.matcher('aba');
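To make the compile-once, match-many idea concrete, here is a minimal sketch. It is written in Java rather than Apex, on the assumption that the Apex Pattern and Matcher classes behave like their java.util.regex counterparts for these calls. The class name PatternBasics and its two helper methods are hypothetical names of mine, not part of any API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternBasics {
    // Compile the expression once; the Pattern object can be reused for many inputs.
    static final Pattern AB_PATTERN = Pattern.compile("(a(b)?)+");

    // True only if the whole input matches the expression.
    static boolean matchesWhole(String input) {
        Matcher m = AB_PATTERN.matcher(input);
        return m.matches();
    }

    // True if the expression matches somewhere inside the input.
    static boolean containsMatch(String input) {
        return AB_PATTERN.matcher(input).find();
    }
}
```

With this sketch, matchesWhole accepts 'aba' but rejects 'abc', while containsMatch accepts both, because find() only needs a matching substring. The same matches() and find() methods exist on the Apex Matcher.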



Let’s explore some samples of using regular expressions in Apex and see how we can benefit from them:

My first example is validating an email address. I personally struggled with this one, since email addresses can get pretty ugly at times. Imagine this email address:

name.lastname_23@ca.gov.on.com



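String inputString = 'email@email.com';
// Local part, '@', then either a bracketed IP-literal domain such as
// user@[192.168.0.1] (the assumed intent of the first alternative)
// or a dotted domain name such as ca.gov.on.com
String emailRegex = '([a-zA-Z0-9_\\-\\.]+)@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\\]?)';
Pattern myPattern = Pattern.compile(emailRegex);

// Then instantiate a new Matcher object "myMatcher"
Matcher myMatcher = myPattern.matcher(inputString);

// matches() succeeds only if the entire input matches the pattern
if (!myMatcher.matches()) {
    // invalid, do something
}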
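If you validate emails in more than one place, it helps to wrap the check in a small helper. The sketch below is in Java, on the assumption that the Apex classes mirror java.util.regex here. EmailCheck and isValidEmail are hypothetical names, and the expression is a simplified variant that only handles dotted domain names (no IP-style domains).

```java
import java.util.regex.Pattern;

class EmailCheck {
    // Simplified expression (an assumption, not a full RFC check):
    // local part, '@', one or more dot-terminated labels, then a 2-4 letter TLD.
    static final Pattern EMAIL = Pattern.compile(
        "([a-zA-Z0-9_\\-\\.]+)@(([a-zA-Z0-9\\-]+\\.)+)([a-zA-Z]{2,4})");

    // Returns true only when the whole input looks like an email address.
    static boolean isValidEmail(String input) {
        return input != null && EMAIL.matcher(input).matches();
    }
}
```

Under these assumptions, both email@email.com and name.lastname_23@ca.gov.on.com pass, while an address without a dot in the domain (user@domain) is rejected.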



Some more validation examples:



// to validate a password: 6-20 characters with at least one digit,
// one lowercase letter, one uppercase letter and one of @#$%
String RegularExpression_Password = '((?=.*\\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%]).{6,20})';

// image file extension (backslashes must be doubled inside an Apex string)
String RegularExpression_ImgFileExt = '([^\\s]+(\\.(?i)(jpg|png|gif|bmp))$)';

// to validate an IP address (each octet 0-255)
String RE_IP = '^([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\.([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\.([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\.([01]?\\d\\d?|2[0-4]\\d|25[0-5])$';

// date format (dd/mm/yyyy), years 1900-2099
String RE_date = '(0?[1-9]|[12][0-9]|3[01])/(0?[1-9]|1[012])/((19|20)\\d\\d)';

// to match link tags "A" in HTML, case-insensitively
String RE_ATags = '(?i)<a([^>]+)>(.+?)</a>';
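Here is a short sketch of how a few of these expressions might be put to work. It is in Java, assuming the Apex Matcher offers the same matches(), find() and group() behaviour. RegexSamples and its method names are hypothetical. Validation uses matches() on the whole input, while extraction loops with find() and reads the captured groups.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexSamples {
    static final Pattern PASSWORD = Pattern.compile(
        "((?=.*\\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%]).{6,20})");
    static final Pattern DATE = Pattern.compile(
        "(0?[1-9]|[12][0-9]|3[01])/(0?[1-9]|1[012])/((19|20)\\d\\d)");
    static final Pattern A_TAG = Pattern.compile("(?i)<a([^>]+)>(.+?)</a>");

    // Each lookahead (?=...) checks one requirement without consuming input.
    static boolean isStrongPassword(String input) {
        return PASSWORD.matcher(input).matches();
    }

    // Checks the dd/mm/yyyy shape only; it does not know month lengths.
    static boolean isDate(String input) {
        return DATE.matcher(input).matches();
    }

    // Collects the text between each <a ...> and </a> pair (capture group 2).
    static List<String> linkTexts(String html) {
        List<String> texts = new ArrayList<>();
        Matcher m = A_TAG.matcher(html);
        while (m.find()) {
            texts.add(m.group(2));
        }
        return texts;
    }
}
```

Note that the date expression only checks the shape of the string, so something like 31/02/2010 still passes. If that matters, parse the value as a date after the regex check.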






Another way you can benefit from the Matcher class is to reformat a string.

Below is an example that shows how you can strip the HTML tags from a string and extract the plain text. This is very useful when you want to record email contents in Salesforce or convert the HTML version of an email into its plain-text counterpart.



String html = 'your html code';

// first replace all <br> tags (<br>, <br/>, <br />, in any case) with \n to support new lines
String result = html.replaceAll('(?i)<br\\s*/?>', '\n');

// regular expression to match all HTML/XML tags
String HTML_TAG_PATTERN = '<.*?>';

// compile the pattern
Pattern myPattern = Pattern.compile(HTML_TAG_PATTERN);

// get your matcher instance
Matcher myMatcher = myPattern.matcher(result);

// remove the tags
result = myMatcher.replaceAll('');
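The same steps can be folded into one reusable helper. This is a sketch in Java, assuming the Apex Matcher's replaceAll behaves like the Java one. HtmlStripper and toPlainText are hypothetical names of mine.

```java
import java.util.regex.Pattern;

class HtmlStripper {
    // Matches <br>, <br/> and <br /> in any letter case.
    static final Pattern BR_TAG = Pattern.compile("(?i)<br\\s*/?>");
    // Matches any single tag; the lazy .*? stops at the first closing '>'.
    static final Pattern ANY_TAG = Pattern.compile("<.*?>");

    // Turns line-break tags into newlines, then removes every remaining tag.
    static String toPlainText(String html) {
        String withNewlines = BR_TAG.matcher(html).replaceAll("\n");
        return ANY_TAG.matcher(withNewlines).replaceAll("");
    }
}
```

One limitation to keep in mind: the dot does not match line breaks by default, so a tag that spans several lines is left in place, and HTML entities such as &amp;amp; are not decoded. For messy real-world HTML a proper parser is the safer choice.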





For a complete reference on Java regular expressions, please refer to the Java documentation for the java.util.regex.Pattern class.