Function to Remove Extended Characters
By Pete Freitag
I wrote a ColdFusion function today that I thought I would share. It replaces extended, or Unicode, characters with numeric HTML/XML character entities. For example, the character à becomes &#224;.
I wrote this for an RSS feed that had a few Unicode characters in it, while the majority of the feed was US-ASCII. Rather than changing the encoding, I opted to replace those few characters with an ASCII-safe XML representation.
Here's the function:
<cffunction name="EscapeExtendedChars" returntype="string">
    <cfargument name="str" type="string" required="true">
    <cfset var buf = CreateObject("java", "java.lang.StringBuffer")>
    <cfset var len = Len(arguments.str)>
    <cfset var char = "">
    <cfset var charcode = 0>
    <cfset var i = 0>
    <cfset buf.ensureCapacity(JavaCast("int", len+20))>
    <cfif NOT len>
        <cfreturn arguments.str>
    </cfif>
    <cfloop from="1" to="#len#" index="i">
        <!--- charAt is zero-based, so shift the one-based loop index --->
        <cfset char = arguments.str.charAt(JavaCast("int", i-1))>
        <cfset charcode = JavaCast("int", char)>
        <!--- keep printable ASCII plus tab, line feed and carriage return --->
        <cfif (charcode GT 31 AND charcode LT 127) OR charcode EQ 10 OR charcode EQ 13 OR charcode EQ 9>
            <cfset buf.append(JavaCast("string", char))>
        <cfelse>
            <!--- the doubled pound sign is CFML's escape for a literal one --->
            <cfset buf.append(JavaCast("string", "&##"))>
            <cfset buf.append(JavaCast("string", charcode))>
            <cfset buf.append(JavaCast("string", ";"))>
        </cfif>
    </cfloop>
    <cfreturn buf.toString()>
</cffunction>
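As a quick illustration of how the function might be called, here is a minimal sketch. The variable name `title` and the sample text are made up for this example; `Chr()` is used to build the accented characters so the result does not depend on the encoding of the template file.

```cfm
<!--- Hypothetical sample value: "Voilà, a café" built from character codes --->
<cfset title = "Voil" & Chr(224) & ", a caf" & Chr(233)>

<!--- Should emit: Voil&#224;, a caf&#233; --->
<cfoutput>#EscapeExtendedChars(title)#</cfoutput>
```

Note that characters such as `&` and `<` fall inside the printable ASCII range, so the function passes them through unchanged. Any XML escaping of those still has to happen separately.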
I'm making use of Java's StringBuffer class and the charAt method of java.lang.String. I think this code is a pretty fast solution, since it avoids concatenating strings by hand, and I would guess that charAt is a bit faster than the built-in CFML Mid function.
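For comparison, this is roughly what the "by hand" approach would look like using only built-in CFML functions. It is a sketch for illustration, not code from the original feed: the name `EscapeExtendedCharsPlain` is hypothetical, and no timings were measured for either version.

```cfm
<cffunction name="EscapeExtendedCharsPlain" returntype="string" output="false">
    <cfargument name="str" type="string" required="true">
    <cfset var result = "">
    <cfset var i = 0>
    <cfset var char = "">
    <cfset var charcode = 0>
    <cfloop from="1" to="#Len(arguments.str)#" index="i">
        <!--- Mid is one-based, so no index shift is needed here --->
        <cfset char = Mid(arguments.str, i, 1)>
        <cfset charcode = Asc(char)>
        <cfif (charcode GT 31 AND charcode LT 127) OR charcode EQ 10 OR charcode EQ 13 OR charcode EQ 9>
            <!--- each & builds a brand new string from the old one --->
            <cfset result = result & char>
        <cfelse>
            <cfset result = result & "&##" & charcode & ";">
        </cfif>
    </cfloop>
    <cfreturn result>
</cffunction>
```

The logic is the same. The difference is that every concatenation copies the string built so far, which is the cost the StringBuffer version is meant to avoid on long inputs.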
Function to Remove Extended Characters was first published on January 21, 2005.
Comments
btw nolan, those chars are most likely not unicode but windows codepage, which is a sort of superset of iso-8859-1.
Any suggestions on how to move forward and resolve this issue? Thanks.
Cheers, Pete (aka lad4bear)
I hope someone can show me how to use this function in my cfquery to replace those characters.
Thanks
(excuse my poor English)
Thanks, Pete!