console.log(1 > 0 && 0 < 1)</script>
This is great: JavaScript can be embedded directly. Imagine if script tags required HTML escaping:
HTML <script>console.log(1 &gt; 0 &amp;&amp; 0 &lt; 1)</script>
In fact, script tags can contain any language (not necessarily JavaScript) or even arbitrary data. To support this behavior, script tags have special parsing rules. For the most part, the browser accepts whatever is inside the script tag until it finds the script close tag </script>.
So, what happens when we embed this perfectly valid JavaScript that contains a script close tag?
HTML <script>console.log('</script>')</script>
Oops! We can see that </script> was part of a JavaScript string, but the browser is just parsing the HTML. This script element closes prematurely, resulting in the following tree:
├─SCRIPT
│ └─#text console.log('
└─#text ')
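To make the premature close concrete, here is a minimal sketch of the simple case: the script content ends at the first `</script` followed by whitespace, `/`, or `>`. The function name `splitAtScriptClose` is hypothetical, and the sketch deliberately ignores the escaped states covered later in this article.

```javascript
// Simplified: the first "</script" followed by whitespace, "/" or ">"
// ends the script content. Escaped states are NOT modeled here.
const CLOSE_TAG = /<\/script[\t\n\f \/>]/i;

// Hypothetical helper: split content where the script element would close.
function splitAtScriptClose(content) {
  const match = CLOSE_TAG.exec(content);
  if (match === null) {
    return { scriptText: content, rest: null };
  }
  return {
    scriptText: content.slice(0, match.index), // text inside SCRIPT
    rest: content.slice(match.index),          // close tag and what follows
  };
}

const naive = splitAtScriptClose("console.log('</script>')");
console.log(naive.scriptText); // console.log('
console.log(naive.rest);       // </script>')
```

The JavaScript string boundaries play no part in the split, which is why the script text ends in the middle of the string.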
Ok, let’s use json_encode() and we should be all set:
PHP <script>console.log(<?php echo json_encode('</script>'); ?>)</script>
Now we’ve got this HTML:
HTML <script>console.log("<\/script>")</script>
</script> has become <\/script>. The JavaScript string value is preserved and the script element does not close prematurely. Perfect, right?
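The claim that the string value is preserved can be checked directly. The sketch below imitates only the slash escaping of PHP's default json_encode() on top of JSON.stringify(); the helper name `encodeLikePhpDefault` is hypothetical, and other json_encode() defaults (such as escaping non-ASCII characters) are not reproduced.

```javascript
// Hypothetical helper: JSON.stringify() plus "\/" escaping of slashes,
// imitating one aspect of PHP's default json_encode() output.
function encodeLikePhpDefault(value) {
  return JSON.stringify(value).replace(/\//g, '\\/');
}

const encoded = encodeLikePhpDefault('</script>');
console.log(encoded);                             // "<\/script>"
console.log(JSON.parse(encoded) === '</script>'); // true
```

In both JSON and JavaScript string literals, `\/` is just another way to write `/`, so the decoded value is unchanged while the literal text `</script` no longer appears in the HTML.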
Not so fast, things are about to get messy
Let’s expand with a more complex example. Here’s some data used by an imaginary HTML library. We’ll escape the JSON again with json_encode :
PHP <script><?php echo json_encode( [ 'closeComment' => '-->', 'closeScript' => '</script>', 'openComment' => '<!--[…]
[…]-->", "closeScript": "<\/script>", "openComment": "<!--[…]
[…]-->",␊ "closeScript": "<\/script>",␊ "openComment": "<!--[…]
This kind of practice was commonplace on the web. As the web evolved, browsers continued to support the behavior so they wouldn’t break existing pages. Then, HTML5 came along and standardized the behavior so folks knew what to expect, even if it’s surprising. We can see other remnants of this practice in the HTML scripting specification:
for related historical reasons, the string “<!--” […], or a newline (\n, \f, \r). For example, […] does not close a script element from the script data double escaped state.
I encourage you to pause for a moment and play with this example to get a feel for how the script tag escaped states work.
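For readers who would rather experiment in code, here is a simplified sketch of the three states involved (script data, escaped, double escaped) and where a script element would close. It is an approximation written for illustration, not the specification's tokenizer: the names `scriptCloseIndex`, `matchesAt`, and `tagAt` are hypothetical, and details such as end-of-file handling and newline normalization are left out.

```javascript
// Characters that end a tag name: tab, LF, FF, space, "/" and ">".
const TAG_NAME_END = '\t\n\f />';

// ASCII case-insensitive comparison of `needle` at `index`.
function matchesAt(text, needle, index) {
  return text.slice(index, index + needle.length).toLowerCase() === needle;
}

// True when `tag` appears at `index` and is followed by a tag-name terminator.
function tagAt(text, tag, index) {
  const next = text[index + tag.length];
  return matchesAt(text, tag, index) && next !== undefined && TAG_NAME_END.includes(next);
}

// Returns the index of the "</script" that closes the element, or -1.
function scriptCloseIndex(text) {
  let state = 'data';
  let i = 0;
  while (i < text.length) {
    if (state !== 'data' && matchesAt(text, '-->', i)) {
      state = 'data'; // "-->" leaves both escaped states
      i += 3;
    } else if (state === 'data' && matchesAt(text, '<!--', i)) {
      state = 'escaped';
      i += 2; // keep "--" unread so that "<!-->" returns to data
    } else if (tagAt(text, '</script', i)) {
      if (state !== 'doubleEscaped') return i; // element closes here
      state = 'escaped'; // in double escaped, this only steps back one level
      i += 9;
    } else if (state === 'escaped' && tagAt(text, '<script', i)) {
      state = 'doubleEscaped';
      i += 8;
    } else {
      i += 1;
    }
  }
  return -1;
}

console.log(scriptCloseIndex("console.log('</script>')"));          // 13
console.log(scriptCloseIndex('<!-- <script> </script>'));           // -1
console.log(scriptCloseIndex('<!-- <script> </script> </script>')); // 24
console.log(scriptCloseIndex('<!-- <script> --> </script>'));       // 18
```

The second call is the surprising one: after a comment opener and a script opener, the first close tag only steps back from the double escaped state and the element stays open.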
Avoid the double escaped state
The complexity of script tag parsing and escaping comes from the escaped states. Avoid the script data double escaped state and script tags become simple. Everything until the tag closer is inside the script element.
How can we avoid the double escaped state? Script tag parsing always starts in the script data state and there’s a pattern in its transitions:
\u003E. This will escape much more than is strictly necessary, but it’s sufficient and is provided by the language. Perfect!
How to escape JSON in PHP
For JSON that will be printed in a script tag, use the following flags:
JSON_HEX_TAG
All < and > are converted to \u003C and \u003E.
JSON_UNESCAPED_SLASHES
Don’t escape /.
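When the JSON is produced from JavaScript rather than PHP, the same idea can be sketched by post-processing JSON.stringify() output. The function name `jsonForScriptTag` is hypothetical; this is an illustration of the JSON_HEX_TAG-style escaping described above under the assumption that the input is JSON-serializable, not a drop-in equivalent of json_encode().

```javascript
// Hypothetical helper: serialize to JSON, then escape "<" and ">" the way
// JSON_HEX_TAG does. Slashes are left alone, as with JSON_UNESCAPED_SLASHES.
// Assumes `value` is JSON-serializable (JSON.stringify returns a string).
function jsonForScriptTag(value) {
  return JSON.stringify(value)
    .replace(/</g, '\\u003C')
    .replace(/>/g, '\\u003E');
}

const data = {
  closeComment: '-->',
  closeScript: '</script>',
  openComment: '<!--',
};

const safe = jsonForScriptTag(data);
console.log(safe);
// {"closeComment":"--\u003E","closeScript":"\u003C/script\u003E","openComment":"\u003C!--"}

// The escapes are ordinary JSON string escapes, so the data round-trips.
console.log(JSON.parse(safe).closeScript); // </script>
```

Because `<` and `>` only ever occur inside string values in JSON, replacing them across the whole serialized text is safe and leaves no way to form a comment opener, a script opener, or a script closer.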
If everything is UTF-8 (both the data and the charset of the page) you can add these flags for cleaner and shorter JSON:
JSON_UNESCAPED_UNICODE
Encode multibyte Unicode characters literally (default is to escape as \uXXXX).
JSON_UNESCAPED_LINE_TERMINATORS
The line terminators are kept unescaped when JSON_UNESCAPED_UNICODE is supplied. It uses the same behaviour as it was before PHP 7.1 without this constant. Available as of PHP 7.1.0.
JSON_UNESCAPED_LINE_TERMINATORS is a fun one. Before ES2019, JavaScript string literals did not accept two characters, U+2028 (LINE SEPARATOR) and U+2029 (PARAGRAPH SEPARATOR), that JSON strings do allow unescaped, so some valid JSON was invalid JavaScript. Since the “JavaScript is a superset of JSON” proposal landed in ES2019, that’s no longer the case and those characters no longer require escaping. Phew! Browser support today is very good.
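A quick way to see the ES2019 behavior is to evaluate JSON containing the raw separators as JavaScript source. This is a small sketch that assumes an ES2019-capable engine (older engines would throw a SyntaxError at the `new Function` call); the variable names are illustrative only.

```javascript
// U+2028 and U+2029, written with escapes so this source file stays readable.
const separators = '\u2028\u2029';

// JSON.stringify() leaves both characters unescaped in its output.
const json = JSON.stringify(separators);
console.log(json.length); // 4: two quotes plus the two raw characters

// Evaluate the JSON text as JavaScript source. On ES2019+ engines the raw
// separators are accepted inside the string literal.
const evaluated = new Function('return ' + json + ';')();
console.log(evaluated === separators); // true
```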
JSON escaping in action
Here’s the problematic example again, now with the recommended flags:
PHP <script><?php echo json_encode( [ 'closeComment' => '-->', 'closeScript' => '</script>', 'openComment' => '