SQL Azure – TSQL User Defined Function – Pull out money from NVARCHAR(MAX)

I think this one is pretty dirty but here goes.

Continuing the theme of identifying substrings in NVARCHAR(MAX) field here is a set of Functions that can be used identify and separate out money in an email body which has been uploaded into a NVARCHAR(MAX) field. looks for the pattern of 2 numbers after a dot and certain numerals before a dot and steps down from hundreds of thousands of pounds to tens of pounds. Note the order is important of the final query as each shorter pattern is a subset the next longer pattern.

First important to strip out the commas in the body as this will skip any currencies which have been formatted with commas.

UPDATE T032email
SET body = REPLACE(body,',','');

WARNING It should be noted that this is very unoptimised code and even on small sets combining the functions in a nested ISNULL SQL really takes time to run.

Firstly a function that looks for the pattern 100000.00 ie 999k to 100k.

CREATE FUNCTION [dbo].[ReturnMoneyDecimal6](@fieldtosearch as nvarchar(max))
 Returns varchar(9)
 as
 begin
 declare @length as int
 declare @loop as int
 declare @retemoney6 as decimal(9,2)
 set @loop =1
 set @length = len(@fieldtosearch)
 
 while @loop<=@length-8
    begin
    if exists(select 1 where substring(@fieldtosearch,@loop,9) like '[0-9][0-9][0-9][0-9][0-9][0-9][.][0-9][0-9]')
        set @retemoney6 = substring(@fieldtosearch,@loop,9)
    set @loop=@loop+1
    end
Return @retemoney6
END

Next a function that looks for the pattern 10000.00 ie 99k to 10k.

CREATE function [dbo].[ReturnMoneyDecimal5](@fieldtosearch as nvarchar(max))
 Returns varchar(8)
 as
 begin
 declare @length as int
 declare @loop as int
 declare @retemoney5 as decimal(8,2)
 set @loop =1
 set @length = len(@fieldtosearch)
 
 while @loop<=@length-7
    begin
    if exists(select 1 where substring(@fieldtosearch,@loop,8) like '[0-9][0-9][0-9][0-9][0-9][.][0-9][0-9]')
        set @retemoney5 = substring(@fieldtosearch,@loop,8)
    set @loop=@loop+1
    end
Return @retemoney5
END

Next a function that looks for the pattern 1000.00 ie 9k to 1k.

CREATE function [dbo].[ReturnMoneyDecimal4](@fieldtosearch as nvarchar(max))
 Returns varchar(7)
 as
 begin
 declare @length as int
 declare @loop as int
 declare @retemoney4 as decimal(7,2)
 set @loop =1
 set @length = len(@fieldtosearch)
 
 while @loop<=@length-6
    begin
    if exists(select 1 where substring(@fieldtosearch,@loop,7) like '[0-9][0-9][0-9][0-9][.][0-9][0-9]')
        set @retemoney4 = substring(@fieldtosearch,@loop,7)
    set @loop=@loop+1
    end
Return @retemoney4
END

Next a function that looks for the pattern 100.00 ie 999 to 100.

CREATE function [dbo].[ReturnMoneyDecimal3](@fieldtosearch as nvarchar(max))
 Returns varchar(6)
 as
 begin
 declare @length as int
 declare @loop as int
 declare @retemoney3 as decimal(6,2)
 set @loop =1
 set @length = len(@fieldtosearch)
 
 while @loop<=@length-5
    begin
    if exists(select 1 where substring(@fieldtosearch,@loop,6) like '[0-9][0-9][0-9][.][0-9][0-9]')
        set @retemoney3 = substring(@fieldtosearch,@loop,6)
    set @loop=@loop+1
    end
Return @retemoney3
END

Lastly a function that looks for the pattern 10.00 ie 99 to 10.

CREATE function [dbo].[ReturnMoneyDecimal2](@fieldtosearch as nvarchar(max))
 Returns varchar(5)
 as
 begin
 declare @length as int
 declare @loop as int
 declare @retemoney2 as decimal(5,2)
 set @loop =1
 set @length = len(@fieldtosearch)
 
 while @loop<=@length-4
    begin
    if exists(select 1 where substring(@fieldtosearch,@loop,5) like '[0-9][0-9][.][0-9][0-9]')
        set @retemoney2 = substring(@fieldtosearch,@loop,5)
    set @loop=@loop+1
    end
Return @retemoney2
END

And then here is the quite horribly inefficient query that can be used to run all functions and seems to do the job.

You know your searchstrings better than me but any text with multiple decimal substrings will be an issue!!!

Which is still a few million times quicker than any person!

SELECT dbo.T032email.pkid,
ISNULL(dbo.ReturnMoneyDecimal6(dbo.T032email.body),
	ISNULL(dbo.ReturnMoneyDecimal5(dbo.T032email.body),
		ISNULL(dbo.ReturnMoneyDecimal4(dbo.T032email.body),
			ISNull(dbo.ReturnMoneyDecimal3(dbo.T032email.body), dbo.ReturnMoneyDecimal2(dbo.T032email.body)
				)))) as money4 from T032email;

SQL Azure – TSQL User Defined Function – Cleaning a Field of Specified Characters

Stack overflow sourced, adapted and personally tested code
How to strip all non-alphabetic characters from string in SQL Server

In an earlier post I was having issues as some email addresses I was pulling out from a field were captioned in <> using the following will remove those characters prior to identifying string patterns..

Remember this pulls them out and then compacts the resulting string this may or many not be what you are looking for. I have adapted from the Stack Overflow discussion to include characters I don’t want rid of.

CREATE Function [dbo].[RemoveNonAlphaCharacters](@Temp nvarchar(max))
Returns nvarchar(max)
AS
Begin

    Declare @KeepValues as nvarchar(50)
    Set @KeepValues = '%[^a-z0-9/@ £$+=?.\!]%'
    While PatIndex(@KeepValues, @Temp) > 0
        Set @Temp = Stuff(@Temp, PatIndex(@KeepValues, @Temp), 1, '')

    Return @Temp
End

And running the example code we get..

SELECT dbo.RemoveNonAlphaCharacters('abcdefghijklmnopqrstuvwxyz1234567890 !"£$()_+=-{}[]:@~;\|<>?./') as txtCorrected

We get

WARNING Please note ^ % & and * ‘ ” – are reserved TSQL characters and including them in the function appears to break the logic that I wish to see in the pattern replacement.

SQL Azure – TSQL User Defined Function – Separate multiple emails from NVARCHAR(MAX) field

Stack overflow sourced, adapted and personally tested code
Extract email address from string using tsql

A continuation of working with strings in TSQL specifically linked to emails.

Firstly create the following Function

CREATE FUNCTION [dbo].[fnFindPatternLocation]
(
    @string NVARCHAR(MAX),
    @term   NVARCHAR(MAX)
)
RETURNS TABLE
AS
    RETURN 
    (
        SELECT pos = Number - LEN(@term) 
        FROM (SELECT Number, Item = LTRIM(RTRIM(SUBSTRING(@string, Number, 
        CHARINDEX(@term, @string + @term, Number) - Number)))
        FROM (SELECT ROW_NUMBER() OVER (ORDER BY [object_id])
        FROM sys.all_objects) AS n(Number)
        WHERE Number > 1 AND Number <= CONVERT(INT, LEN(@string))
        AND SUBSTRING(@term + @string, Number, LEN(@term)) = @term
    ) AS y);

Then create a View of what you are interested in as follows.. Note here I am taking out the carriage return as my subsequent query doesn’t like them and in emails they frequently exist.

CREATE VIEW [dbo].[v001] as SELECT pkid, REPLACE(body, CHAR(13) + CHAR(10),' ') as body1 from t001email

Then run the newly created View through a query.

SELECT pkid, body1, pos, SUBSTRING(body,beginningOfEmail,endOfEmail-beginningOfEmail) AS email
FROM v001
CROSS APPLY (SELECT pos FROM dbo.fnFindPatternLocation(body1, '@')) AS A(pos)
CROSS APPLY (SELECT CHARINDEX(' ',body1 + ' ', pos)) AS B(endOfEmail)
CROSS APPLY (SELECT pos - CHARINDEX(' ', REVERSE(SUBSTRING(body, 1, pos))) + 2) AS C(beginningOfEmail)

Couple of things here
Multiple emails will be picked out and placed as separate records so if there a string that reads

This is a sentence with two emials first@gmail.com and a second second@gmail.com

it will return
first@gmail.com
second@gmail.com

If an email starts the field then this will NOT work after finding the @ symbol it will count forward and fail to find a space and so set space before to Null it will then return just the domain of the email. I will be looking to fix this at some point.

Secondly if the emails within the field contain contiguous special html characters such as < or > these will be picked up and inculded as if they are part of the email addresses.

We can fix this by scanning through the varchar(max) field and stripping out special characters.

NOTE : If you are working with email bodies carriage returns will also screw up the above query in which case consider running the field through some kind of replace view with similar syntax as

CREATE VIEW v002 as SELECT pkid, REPLACE(body, CHAR(13) + CHAR(10),' ') as txtBodyWithoutReturns from t001email

SQL Azure – TSQL User Defined Function – Separate Defined Length String from NVARCHAR(MAX) Field

Stack overflow sourced, adapted and personally tested code
How to get part of a string that matches with a regular expression

The following function arose out of a desire to find Eplanning Scotland planning references numbers which follow the format of 9 numerals a dash and then 3 numerals within a NVARCHAR(MAX) field type. The characteristics of the string was that it is always the same length.

In SSMS select the New Query button

Create function [dbo].[ReturnEplanningNumberFull](@fieldtosearch as nvarchar(max))
 Returns nvarchar(13)
 as
 begin
 declare @length as int 
 declare @loop as int
 declare @reteplan as varchar(13)
 set @loop =1
 set @length = len(@fieldtosearch)

 while @loop<=@length-12
    begin
    if exists(select 1 where substring(@fieldtosearch,@loop,13) like '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][-][0-9][0-9][0-9]')
        set @reteplan = substring(@fieldtosearch,@loop,13)
    set @loop=@loop+1
    end
Return @reteplan
end

And then to create a View using this you can write something like ;

CREATE VIEW v026eplanrefs AS 
SELECT dbo.THETABLE.pkid, 
dbo.ReturnEplanningNumberFull(dbo.THETABLE.FIELDCONTAINSREFERENCE) as eplanno 
FROM dbo.THETABLE;

I subsequently altered this to identify the first 9 digits as this is sufficent to uniquely identify eplanning records.

CREATE function [dbo].[ReturnEplanningNumberShort](@fieldtosearch as nvarchar(max))
 Returns nvarchar(9)
 as
 begin
 declare @length as int 
 declare @loop as int
 declare @reteplanshort as nvarchar(9)
 set @loop =1
 set @length = len(@fieldtosearch)

 while @loop<=@length-8
    begin
    if exists(select 1 where substring(@fieldtosearch,@loop,9) like '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]')
        set @reteplanshort = substring(@fieldtosearch,@loop,9)
    set @loop=@loop+1
    end
Return @reteplanshort
end