Class JavaScriptEscape

Object
org.unbescape.javascript.JavaScriptEscape

public final class JavaScriptEscape extends Object

Utility class for performing JavaScript escape/unescape operations.

Configuration of escape/unescape operations

Escape operations can be (optionally) configured by means of:

  • Level, which defines how deep the escape operation must be (what chars are to be considered eligible for escaping, depending on the specific needs of the scenario). Its values are defined by the JavaScriptEscapeLevel enum.
  • Type, which defines whether escaping should be performed by means of SECs (Single Escape Characters like \n) or additionally by means of x-based or u-based hexadecimal escapes (\xE1 or \u00E1). Its values are defined by the JavaScriptEscapeType enum.

Unescape operations need no configuration parameters. They always perform a complete unescape of SECs (\n), x-based (\xE1) and u-based (\u00E1) hexadecimal escapes, and even octal escapes (\057), which have been deprecated since ECMAScript v5 and are therefore never produced when escaping.

Features

Specific features of the JavaScript escape/unescape operations performed by means of this class:

  • The JavaScript basic escape set is supported. This basic set consists of:
    • The Single Escape Characters: \0 (U+0000), \b (U+0008), \t (U+0009), \n (U+000A), \v (U+000B), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027), \\ (U+005C) and \/ (U+002F). Note that \/ is optional, and will only be used when the / symbol appears after <, as in </. This is to avoid accidentally closing <script> tags in HTML. Also, note that \v (U+000B) is actually included as a Single Escape Character in the JavaScript (ECMAScript) specification, but will not be used as it is not supported by Microsoft Internet Explorer versions < 9.
    • Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0001 to U+001F and U+007F to U+009F.
  • X-based hexadecimal escapes (a.k.a. hexadecimal escapes) are supported both in escape and unescape operations: \xE1.
  • U-based hexadecimal escapes (a.k.a. unicode escapes) are supported both in escape and unescape operations: \u00E1.
  • Octal escapes are supported, though only in unescape operations: \071. These are not supported in escape operations because octal escapes were deprecated in version 5 of the ECMAScript specification.
  • Support for the whole Unicode character set: U+0000 to U+10FFFF, including characters not representable by only one char in Java (> U+FFFF).
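As a rough illustration of the basic escape set listed above, the following self-contained sketch classifies a char and looks up its SEC. The class and method names (BasicEscapeSetSketch, secFor, inBasicSet) are hypothetical and are not part of this library; the rules are only transcribed from the list above, and the real implementation may be organised differently.

```java
final class BasicEscapeSetSketch {

    // Returns the SEC for c, or null when c has none (or when its SEC is not used).
    // prev is the char that precedes c in the input, or -1 at the start.
    static String secFor(char c, int prev) {
        switch (c) {
            case '\0': return "\\0";
            case '\b': return "\\b";
            case '\t': return "\\t";
            case '\n': return "\\n";
            case '\f': return "\\f";
            case '\r': return "\\r";
            case '"':  return "\\\"";
            case '\'': return "\\'";
            case '\\': return "\\\\";
            case '/':  return prev == '<' ? "\\/" : null; // only used in "</"
            default:   return null; // includes U+000B, whose SEC is deliberately not used
        }
    }

    // True when c belongs to the basic escape set: it has a usable SEC
    // or it falls in one of the two control ranges.
    static boolean inBasicSet(char c, int prev) {
        return secFor(c, prev) != null
                || (c >= 0x01 && c <= 0x1F)
                || (c >= 0x7F && c <= 0x9F);
    }

    public static void main(String[] args) {
        System.out.println(secFor('\n', -1));              // \n
        System.out.println(secFor('/', '<'));              // \/
        System.out.println(inBasicSet((char) 0x0B, -1));   // true
    }
}
```

In this sketch U+000B is still reported as part of the basic set (it lies in the first control range), so it would be escaped with a hexadecimal escape rather than with its SEC.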
Input/Output

There are four different input/output modes that can be used in escape/unescape operations:

  • String input, String output: Input is specified as a String object and output is returned as another String. In order to improve memory performance, all escape and unescape operations will return the exact same input object as output if no escape/unescape modifications are required.
  • String input, java.io.Writer output: Input will be read from a String and output will be written into the specified java.io.Writer.
  • java.io.Reader input, java.io.Writer output: Input will be read from a Reader and output will be written into the specified java.io.Writer.
  • char[] input, java.io.Writer output: Input will be read from a char array (char[]) and output will be written into the specified java.io.Writer. Two int arguments called offset and len will be used for specifying the part of the char[] that should be escaped/unescaped. These methods should be called with offset = 0 and len = text.length in order to process the whole char[].
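The "same object" behaviour of the String input, String output mode can be pictured with a lazily allocated buffer. The sketch below is only an assumption about how such an optimisation might look; SameInstanceSketch and escapeBackslashes are hypothetical names, and only the backslash is escaped to keep the example short.

```java
final class SameInstanceSketch {

    // Escapes only the backslash. The StringBuilder is allocated lazily,
    // on the first char that actually needs escaping.
    static String escapeBackslashes(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder out = null;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\') {
                if (out == null) {
                    // First modification: copy the untouched prefix.
                    out = new StringBuilder(text.length() + 8);
                    out.append(text, 0, i);
                }
                out.append("\\\\");
            } else if (out != null) {
                out.append(c);
            }
        }
        // No builder means nothing changed, so the input itself is returned.
        return out == null ? text : out.toString();
    }

    public static void main(String[] args) {
        String plain = "no escapes here";
        System.out.println(escapeBackslashes(plain) == plain); // true
        System.out.println(escapeBackslashes("a\\b"));         // a\\b
    }
}
```

Callers can therefore compare the result with the input by reference to find out cheaply whether anything was escaped.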
Glossary
SEC
Single Escape Character: \0 (U+0000), \b (U+0008), \t (U+0009), \n (U+000A), \v (U+000B), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027), \\ (U+005C) and \/ (U+002F) (optional, only in </).
XHEXA escapes
Also called x-based hexadecimal escapes or simply hexadecimal escapes: compact representation of unicode codepoints up to U+00FF, with \x followed by exactly two hexadecimal figures: \xE1. XHEXA is often used instead of UHEXA (when possible) in order to obtain shorter escaped strings.
UHEXA escapes
Also called u-based hexadecimal escapes or simply unicode escapes: complete representation of unicode codepoints up to U+FFFF, with \u followed by exactly four hexadecimal figures: \u00E1. Unicode codepoints > U+FFFF can be represented in JavaScript by means of two UHEXA escapes (a surrogate pair).
Octal escapes
Octal representation of unicode codepoints up to U+00FF, with \ followed by up to three octal figures: \071. Though up to three octal figures are allowed, octal values above 377 (0xFF) are not supported. Note that octal escapes have been deprecated as of version 5 of the ECMAScript specification.
Unicode Codepoint
Each of the int values that make up the Unicode code space. A codepoint normally corresponds to a single Java char primitive value (codepoint <= U+FFFF), but codepoints U+10000 to U+10FFFF are represented by two chars: a high surrogate (\uD800 to \uDBFF) followed by a low surrogate (\uDC00 to \uDFFF).
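
The XHEXA, UHEXA and Unicode Codepoint entries above can be tied together in a few lines. The following is a minimal sketch that formats a single codepoint as the shortest hexadecimal escape; HexEscapeSketch and hexEscape are hypothetical names, and the code relies only on java.lang.Character and String.format.

```java
final class HexEscapeSketch {

    // Formats one codepoint as the shortest hexadecimal escape:
    // XHEXA up to U+00FF, one UHEXA up to U+FFFF, two UHEXA (a surrogate pair) above.
    static String hexEscape(int codepoint) {
        if (codepoint <= 0xFF) {
            return String.format("\\x%02X", codepoint);
        }
        StringBuilder out = new StringBuilder();
        // Character.toChars yields one char, or a high/low surrogate pair.
        for (char c : Character.toChars(codepoint)) {
            out.append(String.format("\\u%04X", (int) c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(hexEscape(0xE1));    // XHEXA, four output chars
        System.out.println(hexEscape(0x20AC));  // one UHEXA, six output chars
        System.out.println(hexEscape(0x1F600)); // two UHEXA, twelve output chars
    }
}
```

For U+1F600 the sketch emits the high surrogate D83D followed by the low surrogate DE00, each as its own UHEXA escape.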

Since:
1.0.0
Author:
Daniel Fernández
  • Method Details

    • escapeJavaScriptMinimal

      public static String escapeJavaScriptMinimal(String text)

      Perform a JavaScript level 1 (only basic set) escape operation on a String input.

      Level 1 means this method will only escape the JavaScript basic escape set:

      • The Single Escape Characters: \0 (U+0000), \b (U+0008), \t (U+0009), \n (U+000A), \v (U+000B), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027), \\ (U+005C) and \/ (U+002F). Note that \/ is optional, and will only be used when the / symbol appears after <, as in </. This is to avoid accidentally closing <script> tags in HTML. Also, note that \v (U+000B) is actually included as a Single Escape Character in the JavaScript (ECMAScript) specification, but will not be used as it is not supported by Microsoft Internet Explorer versions < 9.
      • Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0001 to U+001F and U+007F to U+009F.

      This method calls escapeJavaScript(String, JavaScriptEscapeType, JavaScriptEscapeLevel) with a preconfigured type and escape level 1.

      This method is thread-safe.

      Parameters:
      text - the String to be escaped.
      Returns:
      The escaped result String. As a memory-performance improvement, will return the exact same object as the text input argument if no escaping modifications were required (and no additional String objects will be created during processing). Will return null if input is null.
    • escapeJavaScript

      public static String escapeJavaScript(String text)

      Perform a JavaScript level 2 (basic set and all non-ASCII chars) escape operation on a String input.

      Level 2 means this method will escape:

      • The JavaScript basic escape set:
        • The Single Escape Characters: \0 (U+0000), \b (U+0008), \t (U+0009), \n (U+000A), \v (U+000B), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027), \\ (U+005C) and \/ (U+002F). Note that \/ is optional, and will only be used when the / symbol appears after <, as in </. This is to avoid accidentally closing <script> tags in HTML. Also, note that \v (U+000B) is actually included as a Single Escape Character in the JavaScript (ECMAScript) specification, but will not be used as it is not supported by Microsoft Internet Explorer versions < 9.
        • Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0001 to U+001F and U+007F to U+009F.
      • All non-ASCII characters.

      This escape will be performed by using the Single Escape Chars whenever possible. Escaped characters that have no associated SEC will use \xFF hexadecimal escapes when possible (characters <= U+00FF), and \uFFFF hexadecimal escapes otherwise. This type of escape produces the smallest escaped string possible.

      This method calls escapeJavaScript(String, JavaScriptEscapeType, JavaScriptEscapeLevel) with a preconfigured type (SECs, defaulting to \xFF and then \uFFFF hexadecimal escapes) and escape level 2.

      This method is thread-safe.

      Parameters:
      text - the String to be escaped.
      Returns:
      The escaped result String. As a memory-performance improvement, will return the exact same object as the text input argument if no escaping modifications were required (and no additional String objects will be created during processing). Will return null if input is null.
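
The level 2 behaviour described above (SEC first, then \xFF, then \uFFFF) can be approximated in plain Java. This is a sketch under stated assumptions and not the code of this class: Level2Sketch, SEC_CHARS and SEC_NAMES are hypothetical names, and the same-object optimisation of the real method is left out for brevity.

```java
final class Level2Sketch {

    // Chars that have a usable SEC, and the char that follows the backslash for each.
    // U+000B is absent: its SEC exists in the specification but is not used.
    private static final String SEC_CHARS = "\0\b\t\n\f\r\"'\\";
    private static final String SEC_NAMES = "0btnfr\"'\\";

    static String escape(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            int sec = SEC_CHARS.indexOf(c);
            if (sec >= 0) {
                out.append('\\').append(SEC_NAMES.charAt(sec));  // 1. SEC
            } else if (c == '/' && i > 0 && text.charAt(i - 1) == '<') {
                out.append("\\/");                               // optional SEC, only in "</"
            } else if (c >= 0x20 && c <= 0x7E) {
                out.append(c);                                   // printable ASCII stays as is
            } else if (c <= 0xFF) {
                out.append(String.format("\\x%02X", (int) c));   // 2. XHEXA
            } else {
                out.append(String.format("\\u%04X", (int) c));   // 3. UHEXA, one per char
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("caf\u00E9 </b>")); // caf\xE9 <\/b>
    }
}
```

Because surrogates are handled one char at a time, a codepoint above U+FFFF comes out as two consecutive UHEXA escapes.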
    • escapeJavaScript

      public static String escapeJavaScript(String text, JavaScriptEscapeType type, JavaScriptEscapeLevel level)

      Perform a (configurable) JavaScript escape operation on a String input.

      This method will perform an escape operation according to the specified JavaScriptEscapeType and JavaScriptEscapeLevel argument values.

      All other String-based escapeJavaScript*(...) methods call this one with preconfigured type and level values.

      This method is thread-safe.

      Parameters:
      text - the String to be escaped.
      type - the type of escape operation to be performed, see JavaScriptEscapeType.
      level - the escape level to be applied, see JavaScriptEscapeLevel.
      Returns:
      The escaped result String. As a memory-performance improvement, will return the exact same object as the text input argument if no escaping modifications were required (and no additional String objects will be created during processing). Will return null if input is null.
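
To picture how the two arguments divide the work, the sketch below lets a level decide which chars are eligible and a type decide how an eligible char is written. The enums Type and Level here are hypothetical stand-ins with made-up constants, not the values of JavaScriptEscapeType and JavaScriptEscapeLevel, and only a handful of SECs are covered (the </ rule is omitted).

```java
final class ConfigurableSketch {

    // Hypothetical stand-ins for the library's type and level enums.
    enum Type { SEC_THEN_HEX, UHEXA_ONLY }
    enum Level { BASIC_SET, BASIC_SET_AND_NON_ASCII }

    // The level decides WHICH chars get escaped.
    static boolean eligible(char c, Level level) {
        boolean basic = c <= 0x1F || (c >= 0x7F && c <= 0x9F)
                || c == '"' || c == '\'' || c == '\\';
        return basic || (level == Level.BASIC_SET_AND_NON_ASCII && c > 0x7F);
    }

    // The type decides HOW an eligible char is written.
    static String render(char c, Type type) {
        if (type == Type.SEC_THEN_HEX) {
            switch (c) {
                case '\n': return "\\n";
                case '"':  return "\\\"";
                case '\'': return "\\'";
                case '\\': return "\\\\";
                default:   break;
            }
            if (c <= 0xFF) {
                return String.format("\\x%02X", (int) c);
            }
        }
        return String.format("\\u%04X", (int) c);
    }

    static String escape(String text, Type type, Level level) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            out.append(eligible(c, level) ? render(c, type) : String.valueOf(c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "\u00E9\n";
        System.out.println(escape(text, Type.SEC_THEN_HEX, Level.BASIC_SET_AND_NON_ASCII));
        System.out.println(escape(text, Type.UHEXA_ONLY, Level.BASIC_SET_AND_NON_ASCII));
    }
}
```

Keeping eligibility and rendering separate is what lets every combination of type and level be served by one loop.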
    • escapeJavaScriptMinimal

      public static void escapeJavaScriptMinimal(String text, Writer writer) throws IOException

      Perform a JavaScript level 1 (only basic set) escape operation on a String input, writing results to a Writer.

      Level 1 means this method will only escape the JavaScript basic escape set:

      • The Single Escape Characters: \0 (U+0000), \b (U+0008), \t (U+0009), \n (U+000A), \v (U+000B), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027), \\ (U+005C) and \/ (U+002F). Note that \/ is optional, and will only be used when the / symbol appears after <, as in </. This is to avoid accidentally closing <script> tags in HTML. Also, note that \v (U+000B) is actually included as a Single Escape Character in the JavaScript (ECMAScript) specification, but will not be used as it is not supported by Microsoft Internet Explorer versions < 9.
      • Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0001 to U+001F and U+007F to U+009F.

      This method calls escapeJavaScript(String, Writer, JavaScriptEscapeType, JavaScriptEscapeLevel) with a preconfigured type and escape level 1.

      This method is thread-safe.

      Parameters:
      text - the String to be escaped.
      writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
      Throws:
      IOException - if an input/output exception occurs
      Since:
      1.1.2
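
The contract of the String/Writer variants (write the escaped text directly, write nothing for a null input) can be sketched as follows. WriterOutputSketch is a hypothetical name, only double quotes and backslashes are escaped, and the rejection of a null writer is an assumption of this sketch rather than documented behaviour.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

final class WriterOutputSketch {

    // Escapes only double quotes and backslashes, to keep the sketch short.
    static void escape(String text, Writer writer) throws IOException {
        if (writer == null) {
            // Assumption of this sketch: a null writer is rejected up front.
            throw new IllegalArgumentException("Argument 'writer' cannot be null");
        }
        if (text == null) {
            return; // null input: nothing at all is written
        }
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"' || c == '\\') {
                writer.write('\\');
            }
            writer.write(c);
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        escape("say \"hi\"", sw);
        System.out.println(sw); // say \"hi\"
    }
}
```

Writing char by char avoids building an intermediate String, which is the point of the Writer-based variants.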
    • escapeJavaScript

      public static void escapeJavaScript(String text, Writer writer) throws IOException

      Perform a JavaScript level 2 (basic set and all non-ASCII chars) escape operation on a String input, writing results to a Writer.

      Level 2 means this method will escape:

      • The JavaScript basic escape set:
        • The Single Escape Characters: \0 (U+0000), \b (U+0008), \t (U+0009), \n (U+000A), \v (U+000B), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027), \\ (U+005C) and \/ (U+002F). Note that \/ is optional, and will only be used when the / symbol appears after <, as in </. This is to avoid accidentally closing <script> tags in HTML. Also, note that \v (U+000B) is actually included as a Single Escape Character in the JavaScript (ECMAScript) specification, but will not be used as it is not supported by Microsoft Internet Explorer versions < 9.
        • Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0001 to U+001F and U+007F to U+009F.
      • All non-ASCII characters.

      This escape will be performed by using the Single Escape Chars whenever possible. Escaped characters that have no associated SEC will use \xFF hexadecimal escapes when possible (characters <= U+00FF), and \uFFFF hexadecimal escapes otherwise. This type of escape produces the smallest escaped string possible.

      This method calls escapeJavaScript(String, Writer, JavaScriptEscapeType, JavaScriptEscapeLevel) with a preconfigured type (SECs, defaulting to \xFF and then \uFFFF hexadecimal escapes) and escape level 2.

      This method is thread-safe.

      Parameters:
      text - the String to be escaped.
      writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
      Throws:
      IOException - if an input/output exception occurs
      Since:
      1.1.2
    • escapeJavaScript

      public static void escapeJavaScript(String text, Writer writer, JavaScriptEscapeType type, JavaScriptEscapeLevel level) throws IOException

      Perform a (configurable) JavaScript escape operation on a String input, writing results to a Writer.

      This method will perform an escape operation according to the specified JavaScriptEscapeType and JavaScriptEscapeLevel argument values.

      All other String/Writer-based escapeJavaScript*(...) methods call this one with preconfigured type and level values.

      This method is thread-safe.

      Parameters:
      text - the String to be escaped.
      writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
      type - the type of escape operation to be performed, see JavaScriptEscapeType.
      level - the escape level to be applied, see JavaScriptEscapeLevel.
      Throws:
      IOException - if an input/output exception occurs
      Since:
      1.1.2
    • escapeJavaScriptMinimal

      public static void escapeJavaScriptMinimal(Reader reader, Writer writer) throws IOException

      Perform a JavaScript level 1 (only basic set) escape operation on a Reader input, writing results to a Writer.

      Level 1 means this method will only escape the JavaScript basic escape set:

      • The Single Escape Characters: \0 (U+0000), \b (U+0008), \t (U+0009), \n (U+000A), \v (U+000B), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027), \\ (U+005C) and \/ (U+002F). Note that \/ is optional, and will only be used when the / symbol appears after <, as in </. This is to avoid accidentally closing <script> tags in HTML. Also, note that \v (U+000B) is actually included as a Single Escape Character in the JavaScript (ECMAScript) specification, but will not be used as it is not supported by Microsoft Internet Explorer versions < 9.
      • Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0001 to U+001F and U+007F to U+009F.

      This method calls escapeJavaScript(Reader, Writer, JavaScriptEscapeType, JavaScriptEscapeLevel) with a preconfigured type and escape level 1.

      This method is thread-safe.

      Parameters:
      reader - the Reader reading the text to be escaped.
      writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
      Throws:
      IOException - if an input/output exception occurs
      Since:
      1.1.2
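
With a Reader as input, the </ rule needs some state, because < and / may arrive in different reads. The sketch below shows one possible way to handle this by remembering the previous char; StreamingSketch and escapeClosingTags are hypothetical names, only the / after < rule is implemented, and the real class may buffer differently.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

final class StreamingSketch {

    // Streams chars from reader to writer, escaping '/' only when it follows '<'.
    // Remembering the previous char keeps the rule correct across buffer refills.
    static void escapeClosingTags(Reader reader, Writer writer) throws IOException {
        if (reader == null) {
            return; // null input: nothing is written
        }
        char[] buffer = new char[4]; // deliberately tiny so "</" can straddle two reads
        int prev = -1;
        int read;
        while ((read = reader.read(buffer, 0, buffer.length)) >= 0) {
            for (int i = 0; i < read; i++) {
                char c = buffer[i];
                if (c == '/' && prev == '<') {
                    writer.write('\\');
                }
                writer.write(c);
                prev = c;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        escapeClosingTags(new StringReader("abc</x> 1/2"), out);
        System.out.println(out); // abc<\/x> 1/2
    }
}
```

In the example the first read ends exactly at <, so the / is only recognised thanks to the remembered previous char.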
    • escapeJavaScript

      public static void escapeJavaScript(Reader reader, Writer writer) throws IOException

      Perform a JavaScript level 2 (basic set and all non-ASCII chars) escape operation on a Reader input, writing results to a Writer.

      Level 2 means this method will escape:

      • The JavaScript basic escape set:
        • The Single Escape Characters: \0 (U+0000), \b (U+0008), \t (U+0009), \n (U+000A), \v (U+000B), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027), \\ (U+005C) and \/ (U+002F). Note that \/ is optional, and will only be used when the / symbol appears after <, as in </. This is to avoid accidentally closing <script> tags in HTML. Also, note that \v (U+000B) is actually included as a Single Escape Character in the JavaScript (ECMAScript) specification, but will not be used as it is not supported by Microsoft Internet Explorer versions < 9.
        • Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0001 to U+001F and U+007F to U+009F.
      • All non-ASCII characters.

      This escape will be performed by using the Single Escape Chars whenever possible. Escaped characters that have no associated SEC will use \xFF hexadecimal escapes when possible (characters <= U+00FF), and \uFFFF hexadecimal escapes otherwise. This type of escape produces the smallest escaped string possible.

      This method calls escapeJavaScript(Reader, Writer, JavaScriptEscapeType, JavaScriptEscapeLevel) with a preconfigured type (SECs, defaulting to \xFF and then \uFFFF hexadecimal escapes) and escape level 2.

      This method is thread-safe.

      Parameters:
      reader - the Reader reading the text to be escaped.
      writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
      Throws:
      IOException - if an input/output exception occurs
      Since:
      1.1.2
    • escapeJavaScript

      public static void escapeJavaScript(Reader reader, Writer writer, JavaScriptEscapeType type, JavaScriptEscapeLevel level) throws IOException

      Perform a (configurable) JavaScript escape operation on a Reader input, writing results to a Writer.

      This method will perform an escape operation according to the specified JavaScriptEscapeType and JavaScriptEscapeLevel argument values.

      All other Reader/Writer-based escapeJavaScript*(...) methods call this one with preconfigured type and level values.

      This method is thread-safe.

      Parameters:
      reader - the Reader reading the text to be escaped.
      writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
      type - the type of escape operation to be performed, see JavaScriptEscapeType.
      level - the escape level to be applied, see JavaScriptEscapeLevel.
      Throws:
      IOException - if an input/output exception occurs
      Since:
      1.1.2
    • escapeJavaScriptMinimal

      public static void escapeJavaScriptMinimal(char[] text, int offset, int len, Writer writer) throws IOException

      Perform a JavaScript level 1 (only basic set) escape operation on a char[] input, writing results to a Writer.

      Level 1 means this method will only escape the JavaScript basic escape set:

      • The Single Escape Characters: \0 (U+0000), \b (U+0008), \t (U+0009), \n (U+000A), \v (U+000B), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027), \\ (U+005C) and \/ (U+002F). Note that \/ is optional, and will only be used when the / symbol appears after <, as in </. This is to avoid accidentally closing <script> tags in HTML. Also, note that \v (U+000B) is actually included as a Single Escape Character in the JavaScript (ECMAScript) specification, but will not be used as it is not supported by Microsoft Internet Explorer versions < 9.
      • Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0001 to U+001F and U+007F to U+009F.

      This method calls escapeJavaScript(char[], int, int, java.io.Writer, JavaScriptEscapeType, JavaScriptEscapeLevel) with a preconfigured type and escape level 1.

      This method is thread-safe.

      Parameters:
      text - the char[] to be escaped.
      offset - the position in text at which the escape operation should start.
      len - the number of characters in text that should be escaped.
      writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
      Throws:
      IOException - if an input/output exception occurs
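
The offset and len arguments select a window of the char[]. The following sketch processes only that window; CharArrayWindowSketch is a hypothetical name, only single quotes are escaped, and the bounds check shown is an assumption of this sketch rather than documented behaviour of this class.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

final class CharArrayWindowSketch {

    // Escapes single quotes within text[offset, offset + len) only.
    static void escape(char[] text, int offset, int len, Writer writer) throws IOException {
        if (text == null) {
            return; // null input: nothing is written
        }
        // Assumption of this sketch: an out-of-range window is rejected.
        if (offset < 0 || len < 0 || len > text.length - offset) {
            throw new IllegalArgumentException(
                    "Invalid (offset, len) for a char[] of length " + text.length);
        }
        int max = offset + len;
        for (int i = offset; i < max; i++) {
            if (text[i] == '\'') {
                writer.write('\\');
            }
            writer.write(text[i]);
        }
    }

    public static void main(String[] args) throws IOException {
        char[] text = "it's 'ok'".toCharArray();
        StringWriter whole = new StringWriter();
        escape(text, 0, text.length, whole);   // the whole array
        StringWriter part = new StringWriter();
        escape(text, 5, 4, part);              // only the quoted word
        System.out.println(whole);             // it\'s \'ok\'
        System.out.println(part);              // \'ok\'
    }
}
```

Chars outside the window are neither escaped nor copied to the writer.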
    • escapeJavaScript

      public static void escapeJavaScript(char[] text, int offset, int len, Writer writer) throws IOException

      Perform a JavaScript level 2 (basic set and all non-ASCII chars) escape operation on a char[] input, writing results to a Writer.

      Level 2 means this method will escape:

      • The JavaScript basic escape set:
        • The Single Escape Characters: \0 (U+0000), \b (U+0008), \t (U+0009), \n (U+000A), \v (U+000B), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027), \\ (U+005C) and \/ (U+002F). Note that \/ is optional, and will only be used when the / symbol appears after <, as in </. This is to avoid accidentally closing <script> tags in HTML. Also, note that \v (U+000B) is actually included as a Single Escape Character in the JavaScript (ECMAScript) specification, but will not be used as it is not supported by Microsoft Internet Explorer versions < 9.
        • Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0001 to U+001F and U+007F to U+009F.
      • All non-ASCII characters.

      This escape will be performed by using the Single Escape Chars whenever possible. Escaped characters that have no associated SEC will use \xFF hexadecimal escapes when possible (characters <= U+00FF), and \uFFFF hexadecimal escapes otherwise. This type of escape produces the smallest escaped string possible.

      This method calls escapeJavaScript(char[], int, int, java.io.Writer, JavaScriptEscapeType, JavaScriptEscapeLevel) with a preconfigured type (SECs, defaulting to \xFF and then \uFFFF hexadecimal escapes) and escape level 2.

      This method is thread-safe.

      Parameters:
      text - the char[] to be escaped.
      offset - the position in text at which the escape operation should start.
      len - the number of characters in text that should be escaped.
      writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
      Throws:
      IOException - if an input/output exception occurs
    • escapeJavaScript

      public static void escapeJavaScript(char[] text, int offset, int len, Writer writer, JavaScriptEscapeType type, JavaScriptEscapeLevel level) throws IOException

      Perform a (configurable) JavaScript escape operation on a char[] input, writing results to a Writer.

      This method will perform an escape operation according to the specified JavaScriptEscapeType and JavaScriptEscapeLevel argument values.

      All other char[]-based escapeJavaScript*(...) methods call this one with preconfigured type and level values.

      This method is thread-safe.

      Parameters:
      text - the char[] to be escaped.
      offset - the position in text at which the escape operation should start.
      len - the number of characters in text that should be escaped.
      writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
      type - the type of escape operation to be performed, see JavaScriptEscapeType.
      level - the escape level to be applied, see JavaScriptEscapeLevel.
      Throws:
      IOException - if an input/output exception occurs
    • unescapeJavaScript

      public static String unescapeJavaScript(String text)

      Perform a JavaScript unescape operation on a String input.

      No additional configuration arguments are required. Unescape operations will always perform complete JavaScript unescape of SECs, x-based, u-based and octal escapes.

      This method is thread-safe.

      Parameters:
      text - the String to be unescaped.
      Returns:
      The unescaped result String. As a memory-performance improvement, will return the exact same object as the text input argument if no unescaping modifications were required (and no additional String objects will be created during processing). Will return null if input is null.
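
A complete unescape has to recognise four kinds of sequences. The sketch below is one possible reading of the description above and not the code of this class: UnescapeSketch and hexRun are hypothetical names, and the treatment of malformed input (for example \x not followed by two hexadecimal figures, which here simply drops the backslash) is an assumption of the sketch.

```java
final class UnescapeSketch {

    static String unescape(String text) {
        if (text == null || text.indexOf('\\') < 0) {
            return text; // nothing to unescape: the same object comes back
        }
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c != '\\' || i + 1 == text.length()) {
                out.append(c);
                i++;
                continue;
            }
            char e = text.charAt(i + 1);
            if (e == 'x' && hexRun(text, i + 2, 2)) {           // XHEXA
                out.append((char) Integer.parseInt(text.substring(i + 2, i + 4), 16));
                i += 4;
            } else if (e == 'u' && hexRun(text, i + 2, 4)) {    // UHEXA
                out.append((char) Integer.parseInt(text.substring(i + 2, i + 6), 16));
                i += 6;
            } else if (e >= '0' && e <= '7') {                  // octal, also covers \0
                int value = 0;
                int end = i + 1;
                while (end < text.length() && end <= i + 3) {
                    int digit = text.charAt(end) - '0';
                    if (digit < 0 || digit > 7 || value * 8 + digit > 0xFF) {
                        break; // not an octal figure, or the value would exceed 0377
                    }
                    value = value * 8 + digit;
                    end++;
                }
                out.append((char) value);
                i = end;
            } else {                                            // SEC
                int sec = "btnvfr".indexOf(e);
                // Any other escaped char (quotes, backslash, slash) stands for itself.
                out.append(sec >= 0 ? "\b\t\n\013\f\r".charAt(sec) : e);
                i += 2;
            }
        }
        return out.toString();
    }

    // True when s holds exactly count hexadecimal figures starting at from.
    private static boolean hexRun(String s, int from, int count) {
        if (from + count > s.length()) {
            return false;
        }
        for (int k = from; k < from + count; k++) {
            if ("0123456789abcdefABCDEF".indexOf(s.charAt(k)) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(unescape("<\\/b> \\071")); // </b> 9
    }
}
```

A surrogate pair written as two UHEXA escapes needs no special handling here, since each escape yields one char and the two chars end up adjacent in the output.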
    • unescapeJavaScript

      public static void unescapeJavaScript(String text, Writer writer) throws IOException

      Perform a JavaScript unescape operation on a String input, writing results to a Writer.

      No additional configuration arguments are required. Unescape operations will always perform complete JavaScript unescape of SECs, x-based, u-based and octal escapes.

      This method is thread-safe.

      Parameters:
      text - the String to be unescaped.
      writer - the java.io.Writer to which the unescaped result will be written. Nothing will be written at all to this writer if input is null.
      Throws:
      IOException - if an input/output exception occurs
      Since:
      1.1.2
    • unescapeJavaScript

      public static void unescapeJavaScript(Reader reader, Writer writer) throws IOException

      Perform a JavaScript unescape operation on a Reader input, writing results to a Writer.

      No additional configuration arguments are required. Unescape operations will always perform complete JavaScript unescape of SECs, x-based, u-based and octal escapes.

      This method is thread-safe.

      Parameters:
      reader - the Reader reading the text to be unescaped.
      writer - the java.io.Writer to which the unescaped result will be written. Nothing will be written at all to this writer if input is null.
      Throws:
      IOException - if an input/output exception occurs
      Since:
      1.1.2
    • unescapeJavaScript

      public static void unescapeJavaScript(char[] text, int offset, int len, Writer writer) throws IOException

      Perform a JavaScript unescape operation on a char[] input, writing results to a Writer.

      No additional configuration arguments are required. Unescape operations will always perform complete JavaScript unescape of SECs, x-based, u-based and octal escapes.

      This method is thread-safe.

      Parameters:
      text - the char[] to be unescaped.
      offset - the position in text at which the unescape operation should start.
      len - the number of characters in text that should be unescaped.
      writer - the java.io.Writer to which the unescaped result will be written. Nothing will be written at all to this writer if input is null.
      Throws:
      IOException - if an input/output exception occurs