Class JavaEscape

Object
org.unbescape.java.JavaEscape

public final class JavaEscape extends Object

Utility class for performing Java escape/unescape operations.

Configuration of escape/unescape operations

Escape operations can be (optionally) configured by means of:

  • Level, which defines how deep the escape operation must be (what chars are to be considered eligible for escaping, depending on the specific needs of the scenario). Its values are defined by the JavaEscapeLevel enum.

Unbescape does not define a 'type' for Java escaping (just a level) because, given the way Unicode Escapes work in Java, there is no possibility to choose whether we want to escape, for example, a tab character (U+0009) as a Single Escape Char (\t) or as a Unicode Escape (\u0009). Given Unicode Escapes are processed by the compiler and not the runtime, using \u0009 instead of \t would really insert a tab character inside our source code before compiling, which is not equivalent to inserting "\t" in a String literal.

Unescape operations need no configuration parameters. Unescape operations will always perform complete unescape of SECs (\n), u-based (\u00E1) and octal (\341) escapes.

Features

Specific features of the Java escape/unescape operations performed by means of this class:

  • The Java basic escape set is supported. This basic set consists of:
    • The Single Escape Characters: \b (U+0008), \t (U+0009), \n (U+000A), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027) and \\ (U+005C). Note \' is not really needed in String literals (only in Character literals), so it won't be used until escape level 3.
    • Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0000 to U+001F and U+007F to U+009F.
  • U-based hexadecimal escapes (a.k.a. unicode escapes) are supported both in escape and unescape operations: \u00E1.
  • Octal escapes are supported, though only in unescape operations: \071. They are not supported in escape operations because the Java Language Specification does not recommend their use (it is allowed mainly for C compatibility reasons).
  • Support for the whole Unicode character set: U+0000 to U+10FFFF, including characters not representable by only one char in Java (> U+FFFF).
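
As a rough illustration of the features listed above, the following self-contained sketch escapes the basic set and, optionally, all non-ASCII characters. It does not use or reproduce Unbescape's implementation: the class BasicJavaEscapeSketch, its escape method and the escapeNonAscii flag are hypothetical names, and the uppercase four-digit hex format is an assumption of the sketch.

```java
final class BasicJavaEscapeSketch {

    /** Escapes the basic set; also escapes non-ASCII chars when escapeNonAscii is true. */
    static String escape(String text, boolean escapeNonAscii) {
        if (text == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                // Single Escape Characters (the single quote is left alone here)
                case '\b': out.append("\\b"); break;
                case '\t': out.append("\\t"); break;
                case '\n': out.append("\\n"); break;
                case '\f': out.append("\\f"); break;
                case '\r': out.append("\\r"); break;
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                default:
                    boolean control = c <= 0x1F || (c >= 0x7F && c <= 0x9F);
                    boolean nonAscii = c > 0x7F;
                    if (control || (escapeNonAscii && nonAscii)) {
                        // Four uppercase hex digits per char; a surrogate pair yields two escapes.
                        out.append("\\u").append(String.format("%04X", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }
}
```

With escapeNonAscii set to true, a character above U+FFFF is stored as two chars in the String and therefore comes out as two consecutive u-based escapes (a surrogate pair).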
Specific features of Unicode Escapes in Java

The way Unicode Escapes work in Java differs from other languages such as JavaScript. In Java, UHEXA escapes are processed by the compiler itself and are therefore resolved before any other type of escape. Moreover, UHEXA escapes can appear anywhere in the code, not only in String literals. This means that, while in JavaScript 'a\u005Cna' would be displayed as a\na, in Java "a\u005Cna" would in fact be displayed as two lines: a, a line feed (LF), then a.

Going even further, this is perfectly valid Java code:

final String hello = \u0022Hello, World!\u0022;

Also, Java allows any number of 'u' characters in this type of escape, like \uu00E1 or even \uuuuuuuuu00E1. The Java Language Specification permits this so that tools converting Unicode source code to ASCII can mark escapes that already existed by adding an extra 'u' (\u00E1 becomes \uu00E1) while writing newly generated escapes with a single 'u', which lets the conversion be reversed later. So this is valid Java code too:

final String hello = \uuuuuuuu0022Hello, World!\u0022;
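
The snippets above can be combined into a small compilable unit to check this behaviour. The class and field names (UnicodeEscapeDemo, HELLO, HELLO_MANY_U, TWO_LINES) are made up for the illustration, and nothing here depends on Unbescape.

```java
final class UnicodeEscapeDemo {

    // The compiler turns each escape below into a double quote before it tokenizes the line.
    static final String HELLO = \u0022Hello, World!\u0022;

    // Any number of 'u' characters is accepted after the backslash.
    static final String HELLO_MANY_U = \uuuuuuuu0022Hello, World!\u0022;

    // The escape becomes a backslash first, which then combines with 'n' into a line feed.
    static final String TWO_LINES = "a\u005Cna";
}
```

All three fields compile to ordinary String constants; TWO_LINES holds three chars (a, LF, a).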

In order to correctly unescape Java UHEXA escapes like "a\u005Cna", Unbescape will perform a two-pass process so that all unicode escapes are processed in the first pass, and then the single escape characters and octal escapes in the second pass.
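
A minimal sketch of such a two-pass process, using only the JDK, might look like the following. It is not Unbescape's actual code: the class and method names are hypothetical, and details such as skipping escaped backslashes in the first pass and leaving unknown escapes untouched are assumptions of the sketch.

```java
final class TwoPassUnescapeSketch {

    static String unescape(String text) {
        if (text == null) {
            return null;
        }
        return unescapeSecAndOctal(unescapeUnicode(text));
    }

    /** Pass 1: backslash, one or more 'u', exactly four hex digits. */
    static String unescapeUnicode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(i + 1);
                if (next == '\\') {
                    // An escaped backslash cannot start a unicode escape: copy the pair as is.
                    out.append("\\\\");
                    i += 2;
                    continue;
                }
                if (next == 'u') {
                    int j = i + 1;
                    while (j < s.length() && s.charAt(j) == 'u') {
                        j++;
                    }
                    if (j + 4 <= s.length() && isHex(s, j, j + 4)) {
                        out.append((char) Integer.parseInt(s.substring(j, j + 4), 16));
                        i = j + 4;
                        continue;
                    }
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    /** Pass 2: Single Escape Characters and octal escapes (up to 377). */
    static String unescapeSecAndOctal(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 >= s.length()) {
                out.append(c);
                i++;
                continue;
            }
            char next = s.charAt(i + 1);
            int sec = "btnfr\"'\\".indexOf(next);
            if (sec >= 0) {
                out.append("\b\t\n\f\r\"'\\".charAt(sec));
                i += 2;
            } else if (next >= '0' && next <= '7') {
                // A third octal digit is only consumed when the first one is 0-3.
                int maxLen = next <= '3' ? 3 : 2;
                int j = i + 1;
                int value = 0;
                while (j < s.length() && j - (i + 1) < maxLen
                        && s.charAt(j) >= '0' && s.charAt(j) <= '7') {
                    value = value * 8 + (s.charAt(j) - '0');
                    j++;
                }
                out.append((char) value);
                i = j;
            } else {
                out.append(c); // unknown escape: the backslash is left untouched
                i++;
            }
        }
        return out.toString();
    }

    private static boolean isHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            char h = s.charAt(k);
            boolean hex = (h >= '0' && h <= '9') || (h >= 'a' && h <= 'f') || (h >= 'A' && h <= 'F');
            if (!hex) {
                return false;
            }
        }
        return true;
    }
}
```

Running the passes in this order matters: the first pass turns the escaped U+005C into a plain backslash, and only then can the second pass see that backslash followed by n and produce a line feed.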

Input/Output

There are four different input/output modes that can be used in escape/unescape operations:

  • String input, String output: Input is specified as a String object and output is returned as another. In order to improve memory performance, all escape and unescape operations will return the exact same input object as output if no escape/unescape modifications are required.
  • String input, java.io.Writer output: Input will be read from a String and output will be written into the specified java.io.Writer.
  • java.io.Reader input, java.io.Writer output: Input will be read from a Reader and output will be written into the specified java.io.Writer.
  • char[] input, java.io.Writer output: Input will be read from a char array (char[]) and output will be written into the specified java.io.Writer. Two int arguments called offset and len will be used for specifying the part of the char[] that should be escaped/unescaped. These methods should be called with offset = 0 and len = text.length in order to process the whole char[].
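
The contracts described in the list above (same-instance return, null handling, offset/len ranges) can be sketched with a toy escaper that only handles line feeds. Everything in this block is hypothetical and independent of Unbescape: IoModesSketch, needsEscape, escape and escapeRange are names invented for the example.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

final class IoModesSketch {

    // Hypothetical stand-in for a real escaper: only a line feed is escaped here.
    private static boolean needsEscape(char c) {
        return c == '\n';
    }

    /** String in, String out: returns the same instance when nothing changes. */
    static String escape(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder out = null; // allocated lazily, on the first char that needs escaping
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (needsEscape(c)) {
                if (out == null) {
                    out = new StringBuilder(text.length() + 8).append(text, 0, i);
                }
                out.append("\\n");
            } else if (out != null) {
                out.append(c);
            }
        }
        return out == null ? text : out.toString();
    }

    /** char[] in, Writer out: only the chars from offset to offset + len - 1 are processed. */
    static void escape(char[] text, int offset, int len, Writer writer) throws IOException {
        if (text == null) {
            return; // nothing is written for null input
        }
        for (int i = offset; i < offset + len; i++) {
            if (needsEscape(text[i])) {
                writer.write("\\n");
            } else {
                writer.write(text[i]);
            }
        }
    }

    /** Usage helper: runs the char[] variant against an in-memory Writer. */
    static String escapeRange(char[] text, int offset, int len) {
        StringWriter writer = new StringWriter();
        try {
            escape(text, offset, len, writer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return writer.toString();
    }
}
```

Allocating the StringBuilder only when the first escapable char is found is one way to return the input object itself when no modification is needed.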
Glossary
SEC
Single Escape Character: \b (U+0008), \t (U+0009), \n (U+000A), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027) and \\ (U+005C). Note \' is not really needed in String literals (only in Character literals), so it won't be used until escape level 3.
UHEXA escapes
Also called u-based hexadecimal escapes or simply unicode escapes: complete representation of unicode codepoints up to U+FFFF, with \u followed by exactly four hexadecimal digits: \u00E1. Unicode codepoints > U+FFFF can be represented in Java by means of two UHEXA escapes (a surrogate pair).
Octal escapes
Octal representation of unicode codepoints up to U+00FF, with \ followed by up to three octal digits: \071. Though up to three octal digits are allowed, octal numbers > 377 (0xFF) are not supported. Octal escapes are not supported in escape operations because the Java Language Specification does not recommend their use (it is allowed mainly for C compatibility reasons).
Unicode Codepoint
Each of the int values that make up the Unicode code space. A codepoint normally corresponds to a single Java char primitive value (codepoint <= U+FFFF), but codepoints U+10000 to U+10FFFF need two chars: a high surrogate (\uD800 to \uDBFF) followed by a low surrogate (\uDC00 to \uDFFF).
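
The relation between codepoints and chars described in this glossary can be checked with the JDK alone. The helper below is a hypothetical illustration (CodePointDemo and utf16Hex are invented names) that lists the char values used to store a given codepoint.

```java
final class CodePointDemo {

    /** Returns the UTF-16 chars of a codepoint as space-separated, four-digit hex values. */
    static String utf16Hex(int codePoint) {
        StringBuilder out = new StringBuilder();
        for (char c : Character.toChars(codePoint)) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("%04X", (int) c));
        }
        return out.toString();
    }
}
```

For U+00E1 the helper returns a single value (00E1), while for U+1F600 it returns a high surrogate followed by a low surrogate (D83D DE00).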
Since:
1.0.0
Author:
Daniel Fernández
  • Method Summary

    Modifier and Type
    Method
    Description
    static void
    escapeJava(char[] text, int offset, int len, Writer writer)
    Perform a Java level 2 (basic set and all non-ASCII chars) escape operation on a char[] input.
    static void
    escapeJava(char[] text, int offset, int len, Writer writer, JavaEscapeLevel level)
    Perform a (configurable) Java escape operation on a char[] input.
    static void
    escapeJava(Reader reader, Writer writer)
    Perform a Java level 2 (basic set and all non-ASCII chars) escape operation on a Reader input, writing results to a Writer.
    static void
    escapeJava(Reader reader, Writer writer, JavaEscapeLevel level)
    Perform a (configurable) Java escape operation on a Reader input, writing results to a Writer.
    static String
    escapeJava(String text)
    Perform a Java level 2 (basic set and all non-ASCII chars) escape operation on a String input.
    static void
    escapeJava(String text, Writer writer)
    Perform a Java level 2 (basic set and all non-ASCII chars) escape operation on a String input, writing results to a Writer.
    static void
    escapeJava(String text, Writer writer, JavaEscapeLevel level)
    Perform a (configurable) Java escape operation on a String input, writing results to a Writer.
    static String
    escapeJava(String text, JavaEscapeLevel level)
    Perform a (configurable) Java escape operation on a String input.
    static void
    escapeJavaMinimal(char[] text, int offset, int len, Writer writer)
    Perform a Java level 1 (only basic set) escape operation on a char[] input.
    static void
    escapeJavaMinimal(Reader reader, Writer writer)
    Perform a Java level 1 (only basic set) escape operation on a Reader input, writing results to a Writer.
    static String
    escapeJavaMinimal(String text)
    Perform a Java level 1 (only basic set) escape operation on a String input.
    static void
    escapeJavaMinimal(String text, Writer writer)
    Perform a Java level 1 (only basic set) escape operation on a String input, writing results to a Writer.
    static void
    unescapeJava(char[] text, int offset, int len, Writer writer)
    Perform a Java unescape operation on a char[] input.
    static void
    unescapeJava(Reader reader, Writer writer)
    Perform a Java unescape operation on a Reader input, writing results to a Writer.
    static String
    unescapeJava(String text)
    Perform a Java unescape operation on a String input.
    static void
    unescapeJava(String text, Writer writer)
    Perform a Java unescape operation on a String input, writing results to a Writer.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Method Details

    • escapeJavaMinimal

      public static String escapeJavaMinimal(String text)

      Perform a Java level 1 (only basic set) escape operation on a String input.

      Level 1 means this method will only escape the Java basic escape set:

      • The Single Escape Characters: \b (U+0008), \t (U+0009), \n (U+000A), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027) and \\ (U+005C). Note \' is not really needed in String literals (only in Character literals), so it won't be used until escape level 3.
      • Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0000 to U+001F and U+007F to U+009F.

      This method calls escapeJava(String, JavaEscapeLevel) with the level preconfigured to level 1 (only basic set).

      This method is thread-safe.

      Parameters:
      text - the String to be escaped.
      Returns:
      The escaped result String. As a memory-performance improvement, will return the exact same object as the text input argument if no escaping modifications were required (and no additional String objects will be created during processing). Will return null if input is null.
    • escapeJava

      public static String escapeJava(String text)

      Perform a Java level 2 (basic set and all non-ASCII chars) escape operation on a String input.

      Level 2 means this method will escape:

      • The Java basic escape set:
        • The Single Escape Characters: \b (U+0008), \t (U+0009), \n (U+000A), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027) and \\ (U+005C). Note \' is not really needed in String literals (only in Character literals), so it won't be used until escape level 3.
        • Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0000 to U+001F and U+007F to U+009F.
      • All non-ASCII characters.

      This escape will be performed by using the Single Escape Chars whenever possible. Characters to be escaped that have no associated SEC will be written as u-based hexadecimal escapes (\uFFFF format).

      This method calls escapeJava(String, JavaEscapeLevel) with the level preconfigured to level 2 (basic set and all non-ASCII chars).

      This method is thread-safe.

      Parameters:
      text - the String to be escaped.
      Returns:
      The escaped result String. As a memory-performance improvement, will return the exact same object as the text input argument if no escaping modifications were required (and no additional String objects will be created during processing). Will return null if input is null.
    • escapeJava

      public static String escapeJava(String text, JavaEscapeLevel level)

      Perform a (configurable) Java escape operation on a String input.

      This method will perform an escape operation according to the specified JavaEscapeLevel argument value.

      All other String-based escapeJava*(...) methods call this one with preconfigured level values.

      This method is thread-safe.

      Parameters:
      text - the String to be escaped.
      level - the escape level to be applied, see JavaEscapeLevel.
      Returns:
      The escaped result String. As a memory-performance improvement, will return the exact same object as the text input argument if no escaping modifications were required (and no additional String objects will be created during processing). Will return null if input is null.
    • escapeJavaMinimal

      public static void escapeJavaMinimal(String text, Writer writer) throws IOException

      Perform a Java level 1 (only basic set) escape operation on a String input, writing results to a Writer.

      Level 1 means this method will only escape the Java basic escape set:

      • The Single Escape Characters: \b (U+0008), \t (U+0009), \n (U+000A), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027) and \\ (U+005C). Note \' is not really needed in String literals (only in Character literals), so it won't be used until escape level 3.
      • Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0000 to U+001F and U+007F to U+009F.

      This method calls escapeJava(String, Writer, JavaEscapeLevel) with the level preconfigured to level 1 (only basic set).

      This method is thread-safe.

      Parameters:
      text - the String to be escaped.
      writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
      Throws:
      IOException - if an input/output exception occurs
      Since:
      1.1.2
    • escapeJava

      public static void escapeJava(String text, Writer writer) throws IOException

      Perform a Java level 2 (basic set and all non-ASCII chars) escape operation on a String input, writing results to a Writer.

      Level 2 means this method will escape:

      • The Java basic escape set:
        • The Single Escape Characters: \b (U+0008), \t (U+0009), \n (U+000A), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027) and \\ (U+005C). Note \' is not really needed in String literals (only in Character literals), so it won't be used until escape level 3.
        • Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0000 to U+001F and U+007F to U+009F.
      • All non-ASCII characters.

      This escape will be performed by using the Single Escape Chars whenever possible. Characters to be escaped that have no associated SEC will be written as u-based hexadecimal escapes (\uFFFF format).

      This method calls escapeJava(String, Writer, JavaEscapeLevel) with the level preconfigured to level 2 (basic set and all non-ASCII chars).

      This method is thread-safe.

      Parameters:
      text - the String to be escaped.
      writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
      Throws:
      IOException - if an input/output exception occurs
      Since:
      1.1.2
    • escapeJava

      public static void escapeJava(String text, Writer writer, JavaEscapeLevel level) throws IOException

      Perform a (configurable) Java escape operation on a String input, writing results to a Writer.

      This method will perform an escape operation according to the specified JavaEscapeLevel argument value.

      All other String/Writer-based escapeJava*(...) methods call this one with preconfigured level values.

      This method is thread-safe.

      Parameters:
      text - the String to be escaped.
      writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
      level - the escape level to be applied, see JavaEscapeLevel.
      Throws:
      IOException - if an input/output exception occurs
      Since:
      1.1.2
    • escapeJavaMinimal

      public static void escapeJavaMinimal(Reader reader, Writer writer) throws IOException

      Perform a Java level 1 (only basic set) escape operation on a Reader input, writing results to a Writer.

      Level 1 means this method will only escape the Java basic escape set:

      • The Single Escape Characters: \b (U+0008), \t (U+0009), \n (U+000A), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027) and \\ (U+005C). Note \' is not really needed in String literals (only in Character literals), so it won't be used until escape level 3.
      • Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0000 to U+001F and U+007F to U+009F.

      This method calls escapeJava(Reader, Writer, JavaEscapeLevel) with the level preconfigured to level 1 (only basic set).

      This method is thread-safe.

      Parameters:
      reader - the Reader reading the text to be escaped.
      writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
      Throws:
      IOException - if an input/output exception occurs
      Since:
      1.1.2
    • escapeJava

      public static void escapeJava(Reader reader, Writer writer) throws IOException

      Perform a Java level 2 (basic set and all non-ASCII chars) escape operation on a Reader input, writing results to a Writer.

      Level 2 means this method will escape:

      • The Java basic escape set:
        • The Single Escape Characters: \b (U+0008), \t (U+0009), \n (U+000A), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027) and \\ (U+005C). Note \' is not really needed in String literals (only in Character literals), so it won't be used until escape level 3.
        • Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0000 to U+001F and U+007F to U+009F.
      • All non-ASCII characters.

      This escape will be performed by using the Single Escape Chars whenever possible. Characters to be escaped that have no associated SEC will be written as u-based hexadecimal escapes (\uFFFF format).

      This method calls escapeJava(Reader, Writer, JavaEscapeLevel) with the level preconfigured to level 2 (basic set and all non-ASCII chars).

      This method is thread-safe.

      Parameters:
      reader - the Reader reading the text to be escaped.
      writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
      Throws:
      IOException - if an input/output exception occurs
      Since:
      1.1.2
    • escapeJava

      public static void escapeJava(Reader reader, Writer writer, JavaEscapeLevel level) throws IOException

      Perform a (configurable) Java escape operation on a Reader input, writing results to a Writer.

      This method will perform an escape operation according to the specified JavaEscapeLevel argument value.

      All other Reader/Writer-based escapeJava*(...) methods call this one with preconfigured level values.

      This method is thread-safe.

      Parameters:
      reader - the Reader reading the text to be escaped.
      writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
      level - the escape level to be applied, see JavaEscapeLevel.
      Throws:
      IOException - if an input/output exception occurs
      Since:
      1.1.2
    • escapeJavaMinimal

      public static void escapeJavaMinimal(char[] text, int offset, int len, Writer writer) throws IOException

      Perform a Java level 1 (only basic set) escape operation on a char[] input.

      Level 1 means this method will only escape the Java basic escape set:

      • The Single Escape Characters: \b (U+0008), \t (U+0009), \n (U+000A), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027) and \\ (U+005C). Note \' is not really needed in String literals (only in Character literals), so it won't be used until escape level 3.
      • Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0000 to U+001F and U+007F to U+009F.

      This method calls escapeJava(char[], int, int, java.io.Writer, JavaEscapeLevel) with the level preconfigured to level 1 (only basic set).

      This method is thread-safe.

      Parameters:
      text - the char[] to be escaped.
      offset - the position in text at which the escape operation should start.
      len - the number of characters in text that should be escaped.
      writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
      Throws:
      IOException - if an input/output exception occurs
    • escapeJava

      public static void escapeJava(char[] text, int offset, int len, Writer writer) throws IOException

      Perform a Java level 2 (basic set and all non-ASCII chars) escape operation on a char[] input.

      Level 2 means this method will escape:

      • The Java basic escape set:
        • The Single Escape Characters: \b (U+0008), \t (U+0009), \n (U+000A), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027) and \\ (U+005C). Note \' is not really needed in String literals (only in Character literals), so it won't be used until escape level 3.
        • Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0000 to U+001F and U+007F to U+009F.
      • All non-ASCII characters.

      This escape will be performed by using the Single Escape Chars whenever possible. Characters to be escaped that have no associated SEC will be written as u-based hexadecimal escapes (\uFFFF format).

      This method calls escapeJava(char[], int, int, java.io.Writer, JavaEscapeLevel) with the level preconfigured to level 2 (basic set and all non-ASCII chars).

      This method is thread-safe.

      Parameters:
      text - the char[] to be escaped.
      offset - the position in text at which the escape operation should start.
      len - the number of characters in text that should be escaped.
      writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
      Throws:
      IOException - if an input/output exception occurs
    • escapeJava

      public static void escapeJava(char[] text, int offset, int len, Writer writer, JavaEscapeLevel level) throws IOException

      Perform a (configurable) Java escape operation on a char[] input.

      This method will perform an escape operation according to the specified JavaEscapeLevel argument value.

      All other char[]-based escapeJava*(...) methods call this one with preconfigured level values.

      This method is thread-safe.

      Parameters:
      text - the char[] to be escaped.
      offset - the position in text at which the escape operation should start.
      len - the number of characters in text that should be escaped.
      writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
      level - the escape level to be applied, see JavaEscapeLevel.
      Throws:
      IOException - if an input/output exception occurs
    • unescapeJava

      public static String unescapeJava(String text)

      Perform a Java unescape operation on a String input.

      No additional configuration arguments are required. Unescape operations will always perform complete Java unescape of SECs, u-based and octal escapes.

      This method is thread-safe.

      Parameters:
      text - the String to be unescaped.
      Returns:
      The unescaped result String. As a memory-performance improvement, will return the exact same object as the text input argument if no unescaping modifications were required (and no additional String objects will be created during processing). Will return null if input is null.
    • unescapeJava

      public static void unescapeJava(String text, Writer writer) throws IOException

      Perform a Java unescape operation on a String input, writing results to a Writer.

      No additional configuration arguments are required. Unescape operations will always perform complete Java unescape of SECs, u-based and octal escapes.

      This method is thread-safe.

      Parameters:
      text - the String to be unescaped.
      writer - the java.io.Writer to which the unescaped result will be written. Nothing will be written at all to this writer if input is null.
      Throws:
      IOException - if an input/output exception occurs
      Since:
      1.1.2
    • unescapeJava

      public static void unescapeJava(Reader reader, Writer writer) throws IOException

      Perform a Java unescape operation on a Reader input, writing results to a Writer.

      No additional configuration arguments are required. Unescape operations will always perform complete Java unescape of SECs, u-based and octal escapes.

      This method is thread-safe.

      Parameters:
      reader - the Reader reading the text to be unescaped.
      writer - the java.io.Writer to which the unescaped result will be written. Nothing will be written at all to this writer if input is null.
      Throws:
      IOException - if an input/output exception occurs
      Since:
      1.1.2
    • unescapeJava

      public static void unescapeJava(char[] text, int offset, int len, Writer writer) throws IOException

      Perform a Java unescape operation on a char[] input.

      No additional configuration arguments are required. Unescape operations will always perform complete Java unescape of SECs, u-based and octal escapes.

      This method is thread-safe.

      Parameters:
      text - the char[] to be unescaped.
      offset - the position in text at which the unescape operation should start.
      len - the number of characters in text that should be unescaped.
      writer - the java.io.Writer to which the unescaped result will be written. Nothing will be written at all to this writer if input is null.
      Throws:
      IOException - if an input/output exception occurs