Regular Expression in T-SQL

Regular Expressions can be very useful to the Database programmer, particularly for data validation, data feeds and data transformations.  In C#.NET, Regex.Replace Method replaces all occurrences of a character pattern defined by a regular expression with a specified replacement character string. We can do the same steps in T-SQL (2005/2008).

Use the Ole Automation Procedures option to specify whether OLE Automation objects can be instantiated within Transact-SQL batches. This option can also be configured using the Surface Area Configuration tool.

The Ole Automation Procedures option can be set to the following values.

0 OLE Automation Procedures are disabled. Default for new instances of SQL Server 2005.
1 OLE Automation Procedures are enabled.

When OLE Automation Procedures are enabled, a call to sp_OACreate, sp_OASetProperty, sp_OAMethod, sp_OAGetErrorInfo, sp_OADestroy will start the OLE shared execution environment.

In SQL Server 2005/2008 we need to start with the configuring the Ole Automation Procedures

sp_configure ‘show advanced options’, 1;
GO
RECONFIGURE with override
go
sp_configure ‘Ole Automation Procedures’,1
GO
RECONFIGURE with override

Now we can create a new function in SQL as follow:

CREATE FUNCTION dbo.RegexReplace
(
@pattern VARCHAR(255),
@replacement VARCHAR(255),
@Subject VARCHAR(max),
@global BIT = 1,
@Multiline bit =1
)
RETURNS VARCHAR(max)
AS BEGIN
DECLARE @objRegexExp INT,
@objErrorObject INT,
@strErrorMessage VARCHAR(255),
@Substituted VARCHAR(8000),
@hr INT,
@Replace BIT
SELECT  @strErrorMessage = ‘creating a regex object’
EXEC @hr= sp_OACreate ‘VBScript.RegExp’, @objRegexExp OUT
IF @hr = 0
SELECT  @strErrorMessage = ‘Setting the Regex pattern’,
@objErrorObject = @objRegexExp
IF @hr = 0
EXEC @hr= sp_OASetProperty @objRegexExp, ‘Pattern’, @pattern
IF @hr = 0 /*By default, the regular expression is case sensitive. Set the IgnoreCase property to True to make it case insensitive.*/
SELECT  @strErrorMessage = ‘Specifying the type of match’
IF @hr = 0
EXEC @hr= sp_OASetProperty @objRegexExp, ‘IgnoreCase’, 1
IF @hr = 0
EXEC @hr= sp_OASetProperty @objRegexExp, ‘MultiLine’, @Multiline
IF @hr = 0
EXEC @hr= sp_OASetProperty @objRegexExp, ‘Global’, @global
IF @hr = 0
SELECT  @strErrorMessage = ‘Doing a Replacement’
IF @hr = 0
EXEC @hr= sp_OAMethod @objRegexExp, ‘Replace’, @Substituted OUT,
@subject, @Replacement
/*If the RegExp.Global property is False (the default), Replace will return the @subject string with the first regex match (if any) substituted with the replacement text. If RegExp.Global is true, the @Subject string will be returned with all matches replaced.*/
IF @hr <> 0
BEGIN
DECLARE @Source VARCHAR(255),
@Description VARCHAR(255),
@Helpfile VARCHAR(255),
@HelpID INT
EXECUTE sp_OAGetErrorInfo @objErrorObject, @source OUTPUT,
@Description OUTPUT, @Helpfile OUTPUT, @HelpID OUTPUT
SELECT  @strErrorMessage = ‘Error whilst ‘
+ COALESCE(@strErrorMessage, ‘doing something’) + ‘, ‘
+ COALESCE(@Description, ”)
RETURN @strErrorMessage
END
EXEC sp_OADestroy @objRegexExp
RETURN @Substituted
END
GO

The function RegexReplace requires 4 parameters:

  1. @pattern: it can be any pattern that we usually use to match any expression
  2. @replacement: will be the text or word that will replace the matched pattern
  3. @Subject: will be the text or the sentence that will be used to clean or match the pattern
  4. @global: If you want the RegExp object to return or replace all matches instead of just the first one, set the Global property to True.
  5. @Multiline: The caret and dollar only match at the very start and very end of the subject string by default. If your subject string consists of multiple lines separated by line breaks, you can make the caret and dollar match at the start and the end of those lines by setting the Multiline property to True.

As an example, we can clean the sentence “A bunch of ()/*++\’#@$&*^!% invalid URL characters ” from all non-letter words:

declare @stm varchar(1000)
set @stm = ‘A bunch of ()/*++\”#@$&*^!%     invalid URL characters  ‘
SELECT  @stm = dbo.RegexReplace(‘[^a-z0-9\s-]’,”,@stm,1,1)
SELECT  @stm = dbo.RegexReplace(‘[\s-]+’,’ ‘,@stm,1,1)
— the ouput will return:
A bunch of invalid URL characters

dbo.RegexReplace(‘[^a-z0-9\s-]’,’ ‘,’A bunch of ()/*++\”#@$&*^!%     invalid URL characters ‘,1,1) in T-SQL is equivalent to  Regex.Replace(“A bunch of ()/*++\”#@$&*^!%     invalid URL characters”, “‘[^a-z0-9\s-]”, ” “, RegexOptions.IgnoreCase)

For more T-SQL regex functions, you can check http://www.simple-talk.com/sql/t-sql-programming/tsql-regular-expression-workbench/

Advertisements

2 thoughts on “Regular Expression in T-SQL

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s