More than once I've needed to find all non-ascii characters in a string. In a .net regular expression this is relatively easy:
[^\x20-\x7F]
I thought that regex would port over reletively easily to Oracle - wrong.
First, I found more than one site reference the \x sequence. However I just couldn't get it to work.
I then tried a brute force version:
[^A-Za-z 0-9 \.,\?''""!@#\$%\^&\*\(\)\-_=\;:<>\/\\\|\}\{\[\]`~]
To my surprise that didn't work either. After struggling with it for a while I found the regex reference page from Oracle. It states:
The following regular expression operators are allowed within the character list, any other metacharacters included in a character list lose their special meaning (are treated as literals):
- Range operator
'-'
- POSIX character class
[:
:]
- POSIX collating sequence
[. .]
- POSIX character equivalence class
[= =]
So in other words, character sets are pretty limited in what they can have in it. If a backslash is treated as a literal and not as an escape sequence, then that explained why my brute force version didn't work - sort of. I did eventually get it to work but for some reason I had to remove the ']' within the character class. So that wasn't perfect either.
Finally I gave it one last shot. Rather than the hex range, I tried the character range for the same values - from space to tilde (look at an ascii table of values and this will make sense).
[^ -~]
Finally this seemed to work as expected (keep in mind thought that character ranges are affected by the NLS_SORT variable - anything other than binaray, might yield unexpected results). I ran the following query as a test:
select regexp_replace('This is a test '||chr(191), '[^ -~]', '$') from dual;
The only thing left was to handle non-printable chars like crlf and tabs. I new the hex code was out so I generated the first ascii character with the chr function to replace the space:
select regexp_replace('This is a test crlf'||chr(10)|| 'blah', '[^' || chr(1) ||'-~]', '$') from dual;
Update: After I wrote this article I found a section on regular-expressions.info that explains some of the oracle differences.