How to locate and replace special characters in an XML string with Visual C++ and WinRT

Hi Community,

I hope you all had a well-deserved Christmas break.

A couple of days ago I began writing an article for MSDN Magazine about Metro, Windows 8 and interop between Win32 and WinRT. One of the most frequent and error-prone operations is string manipulation, because of the risk of buffer overflows, encoding mismatches, invalid pointers or differing data types. Yes, data types: a string of char is not the same as a string of wchar_t, just as ANSI is not the same as Unicode.

Every NT-based version of Windows (2000, XP and later) supports Unicode natively, and Unicode has been the standard on the platform ever since. In fact, every string in .NET is Unicode, and strings, like every other managed object, live in memory that ultimately comes from VirtualAlloc.

Now, back to the article I'm currently writing: in my case I have found XML strings to be a convenient way to pass information between processes and DLLs. Think of it as my custom IPC mechanism, suitable when the data being transferred is not a huge file but rather an XML representation of a given object.

Before .NET even existed, we had MSXML which is an efficient XML parser. In the WinRT world, we can manipulate XML documents via the Windows.Data.Xml.Dom namespace.

In the code snippet shown below, we have a method called ParseXml, which calls another method called ReplaceSpecialCharactersIfAny.

 void DataContext::ParseXml() {
     if (!xml->IsEmpty()) {
         ReplaceSpecialCharactersIfAny(); // Replace special characters
         auto doc = ref new XmlDocument;
         doc->LoadXml(xml);
         //TODO: Populate our object(s) based on XML document
     } else throw ref new NullReferenceException;
 }

The implementation of ReplaceSpecialCharactersIfAny is pretty straightforward: we store in a std::map each character to be replaced together with its replacement string. We also reserve twice the length of the input XML string (we don't know for certain how many replacements we'll make, so twice the size seems a fair starting buffer). Then we replace every occurrence of the special characters found in the std::map and assign the encoded String^ back to xml.

 void DataContext::ReplaceSpecialCharactersIfAny() {
     // Reserve twice the input length up front; the string still grows if needed
     auto tempXml = std::basic_string<wchar_t>();
     tempXml.reserve(xml->Length() * 2);
     auto replaceChars = std::map<wchar_t, String^>();

     // Character to replace -> replacement entity (wide literals throughout)
     replaceChars.insert(std::pair<wchar_t, String^>(L'<', L"&lt;"));
     replaceChars.insert(std::pair<wchar_t, String^>(L'>', L"&gt;"));
     replaceChars.insert(std::pair<wchar_t, String^>(L'"', L"&quot;"));
     replaceChars.insert(std::pair<wchar_t, String^>(L'&', L"&amp;"));
     replaceChars.insert(std::pair<wchar_t, String^>(L'\'', L"&apos;"));

     const wchar_t* data = xml->Data();
     // Use '<' rather than '<=' so the terminating null is not copied
     for (size_t pos = 0; pos < xml->Length(); ++pos) {
         auto it = replaceChars.find(data[pos]);
         if (it != replaceChars.end()) {
             // Standard iterator access instead of MSVC-internal members
             tempXml.append(it->second->Data());
         } else {
             tempXml.push_back(data[pos]);
         }
     }

     xml = ref new String(tempXml.c_str());
 }
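
For readers who want to try the replacement logic outside a WinRT project, here is a minimal sketch in standard C++. It is not the code from the article: the function name EscapeXmlText is hypothetical, it works on std::wstring rather than String^, and it swaps the std::map lookup for a switch statement. The "twice the length" reservation is kept only as the same heuristic described above.

```cpp
#include <string>

// Hypothetical helper (not part of the original DataContext class):
// returns a copy of `input` with the five predefined XML entities
// substituted for their literal characters.
std::wstring EscapeXmlText(const std::wstring& input) {
    std::wstring output;
    // Same heuristic as in the post; the string still grows if this is too small.
    output.reserve(input.size() * 2);

    for (wchar_t ch : input) {
        switch (ch) {
            case L'<':  output += L"&lt;";   break;
            case L'>':  output += L"&gt;";   break;
            case L'"':  output += L"&quot;"; break;
            case L'&':  output += L"&amp;";  break;
            case L'\'': output += L"&apos;"; break;
            default:    output += ch;        break;
        }
    }
    return output;
}
```

Because each input character is examined exactly once, an ampersand that is emitted as part of an entity is never escaped a second time. Note that a function like this escapes every '<' and '>' it sees, so it is best suited to text content or attribute values rather than to a string that already contains markup.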

Another alternative to this approach could have been a regular expression (I'll leave that for another post) or a more specialized approach.
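
As a rough idea of what the regex route might look like, here is a small sketch that uses std::regex_replace from the standard library. It is only an assumption about one possible shape, not the approach the follow-up post would necessarily take, and the function name EscapeXmlWithRegex is hypothetical. It runs one pass per character, so it trades speed for brevity.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: escapes the five predefined XML entities using
// one std::regex_replace pass per character.
std::wstring EscapeXmlWithRegex(std::wstring text) {
    // '&' must be handled first; otherwise the ampersands introduced by
    // the later replacements would themselves be escaped.
    text = std::regex_replace(text, std::wregex(L"&"),  L"&amp;");
    text = std::regex_replace(text, std::wregex(L"<"),  L"&lt;");
    text = std::regex_replace(text, std::wregex(L">"),  L"&gt;");
    text = std::regex_replace(text, std::wregex(L"\""), L"&quot;");
    text = std::regex_replace(text, std::wregex(L"'"),  L"&apos;");
    return text;
}
```

The ordering is the main design point here: unlike a single-pass loop, sequential replacements can see the output of earlier ones, which is why the ampersand goes first.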

Regards,

Angel
