Avoid Using "<![CDATA[ ]]>" in RSS

<![CDATA[ ... ]]> is very commonly used in RSS (and Atom) feeds to escape XML special characters. At first glance it looks very convenient: you simply add <![CDATA[ ... ]]> blocks and write (almost) any content inside them without worrying about escaping characters:

<item>
  <title><![CDATA[Using <CDATA> in Titles]]></title>
  <link>http://example.com</link>
  <description><![CDATA[ This description contains HTML markup. It allows us to use characters like "&" and brackets directly. ]]></description>
</item>

CDATA seems perfect, doesn't it? Except that one character sequence cannot be written inside a single CDATA block: ]]>, the sequence that ends the block. To encode it, you have to split the CDATA block into multiple parts:

<text><![CDATA[hello ]]]]><![CDATA[> world]]></text>

The encoded text is "hello ]]> world". As you can see, the XML code is less readable now. CDATA loses most of its simplicity advantage.
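The splitting rule above can be sketched as a small helper. This is a minimal illustration, not a full serializer: the function name `to_cdata` is hypothetical, and the sketch assumes the input contains only characters that are legal in XML.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting each "]]>" across two sections.

    Hypothetical helper for illustration. Every "]]>" in the input is
    rewritten so that "]]" ends one CDATA section and ">" starts the next.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


encoded = to_cdata("hello ]]> world")
print(encoded)

# Round trip through a standard parser to confirm the original text survives.
decoded = ET.fromstring("<text>" + encoded + "</text>").text
print(decoded)
```

Even in this short form, the serializer needs a dedicated rule for a single character sequence, which is the extra edge case discussed here.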

Even though splitting makes the encoding of ]]> possible, I would say it's still not worth using CDATA:

It adds a special edge case for ]]>, which the serializer must handle.

It can mislead people into thinking the content is raw HTML or somehow safer. No, it is not.

It makes output less uniform, because sometimes you need split CDATA blocks.

It does not change the parsed value. XML parsers expose the same text either way.
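The last point can be checked with any XML parser. The sketch below uses Python's standard library as one example and reuses the title from the feed item above; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

title = "Using <CDATA> in Titles"

# The same text serialized two ways: as a CDATA section and with entity escaping.
with_cdata = "<title><![CDATA[" + title + "]]></title>"
with_escaping = "<title>" + escape(title) + "</title>"

parsed_cdata = ET.fromstring(with_cdata).text
parsed_escaped = ET.fromstring(with_escaping).text

print(with_escaping)
print(parsed_cdata == parsed_escaped == title)
```

Both forms parse to the same string, so a consumer of the feed cannot tell from the parsed value which encoding the producer chose.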
