Package javajs.util

Class CifDataParser

java.lang.Object
javajs.util.CifDataParser
All Implemented Interfaces:
GenericCifDataParser
Direct Known Subclasses:
Cif2DataParser

public class CifDataParser extends Object implements GenericCifDataParser
A CIF 1.0 tokenizer class for dealing with quoted strings in CIF files. Subclassed by org.jmol.adapters.readers.cif.Cif2DataParser. Greek letters were implemented in Jmol 13.3.9, and only for titles and space groups. All other markup is ignored.

Regarding the treatment of single quotes vs. primes in CIF files, PMR wrote:

* There is a formal grammar for CIF (see http://www.iucr.org/iucr-top/cif/index.html) which confirms this. The textual explanation is

14. Matching single or double quote characters (' or ") may be used to bound a string representing a non-simple data value provided the string does not extend over more than one line.

15. Because data values are invariably separated from other tokens in the file by white space, such a quote-delimited character string may contain instances of the character used to delimit the string provided they are not followed by white space. For example, the data item _example 'a dog's life' is legal; the data value is a dog's life.

[PMR - the terminating character(s) are quote+whitespace. That would mean that: _example 'Jones' life' would be an error.]
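
The quote+whitespace rule quoted above can be illustrated with a minimal sketch. The class and method names here are hypothetical and are not taken from javajs.util; the sketch only assumes that a closing quote counts when it is followed by white space or by the end of the line.

```java
final class CifQuoteSketch {
    /**
     * Hypothetical helper: s.charAt(start) must be the opening quote.
     * Returns the text between the opening quote and the first matching
     * quote that is followed by white space or the end of the line,
     * or null if there is no such closing quote.
     */
    static String readQuoted(String s, int start) {
        char quote = s.charAt(start);
        for (int i = start + 1; i < s.length(); i++) {
            if (s.charAt(i) != quote) {
                continue;
            }
            boolean atEnd = (i + 1 == s.length());
            if (atEnd || Character.isWhitespace(s.charAt(i + 1))) {
                return s.substring(start + 1, i);
            }
            // Otherwise this quote is embedded in the value, as in dog's.
        }
        return null;
    }
}
```

With this rule, 'a dog's life' yields a dog's life, while 'Jones' life' ends at Jones and leaves life' behind as a stray token, which is the error PMR describes.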

The CIF format was developed in the late 1980s under the aegis of the International Union of Crystallography (I am a consultant to the COMCIFs committee). It was ratified by the Union and there have been several workshops. mmCIF is an extension of CIF which includes a relational structure. The formal publications are:

Hall, S. R. (1991). "The STAR File: A New Format for Electronic Data Transfer and Archiving", J. Chem. Inform. Comp. Sci., 31, 326-333.
Hall, S. R., Allen, F. H. and Brown, I. D. (1991). "The Crystallographic Information File (CIF): A New Standard Archive File for Crystallography", Acta Cryst., A47, 655-685.
Hall, S. R. & Spadaccini, N. (1994). "The STAR File: Detailed Specifications", J. Chem. Inform. Comp. Sci., 34, 505-508.

  • Field Details

    • KEY_MAX

      public static final int KEY_MAX
      The maximum number of columns (data keys) passed to the parser or found in the file for a given loop_ or category.subkey listing.
    • line

      protected String line
      from buffered reader
    • str

      protected String str
      working string (buffer)
    • ich

      protected int ich
      pointer to current character on str
    • cch

      protected int cch
      length of str
    • wasUnquoted

      protected boolean wasUnquoted
      whether we are processing an unquoted value or key
    • cterm

      protected char cterm
      optional token terminator; in CIF 2.0 could be } or ]
    • nullString

      protected String nullString
      string to return for CIF data value . and ?
    • asObject

      protected boolean asObject
      A flag to create and return Java objects, not strings. Used only by Jmol scripting x = getProperty("cifInfo", filename).
    • debugging

      protected boolean debugging
      debugging flag passed from reader; unused
    • columnCount

      protected int columnCount
    • columnNames

      protected String[] columnNames
    • haveData

      protected boolean haveData
    • htFields

      protected static Map<String,Integer> htFields
      A global, static map that contains field information. The assumption is that if we read a set of fields for, say, atom_site, once in a lifetime, then that should be good forever. Those are static lists. Or should be....
  • Constructor Details

    • CifDataParser

      public CifDataParser()
  • Method Details

    • getVersion

      protected int getVersion()
    • setNullValue

      public void setNullValue(String nullString)
      Set the string value of what is returned for "." and "?"
      Parameters:
      nullString - null here returns "." and "?"; default is "\0"
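
A minimal sketch of the substitution described above, assuming it applies only to unquoted . and ? tokens. The class and its resolve method are hypothetical and are not part of javajs.util; only the default "\0" and the meaning of a null argument come from the description.

```java
final class NullValueSketch {
    // Default stated in the description above: "\0".
    private String nullString = "\0";

    void setNullValue(String nullString) {
        this.nullString = nullString;
    }

    /** Hypothetical: maps an unquoted token to the value a caller would see. */
    String resolve(String token) {
        boolean isCifNull = token.equals(".") || token.equals("?");
        if (!isCifNull) {
            return token;
        }
        // A null setting means "return the original . or ? unchanged".
        return nullString == null ? token : nullString;
    }
}
```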
    • getColumnData

      public Object getColumnData(int i)
      Specified by:
      getColumnData in interface GenericCifDataParser
    • getColumnCount

      public int getColumnCount()
      Specified by:
      getColumnCount in interface GenericCifDataParser
    • getColumnName

      public String getColumnName(int i)
      Specified by:
      getColumnName in interface GenericCifDataParser
    • set

      public CifDataParser set(GenericLineReader reader, BufferedReader br, boolean debugging)
      A Crystallographic Information File (CIF) data parser. set() should be called immediately upon construction. Two options: pass either reader or br and leave the other null; if both are non-null, reader will be ignored. Just simpler this way...
      Specified by:
      set in interface GenericCifDataParser
      Parameters:
      reader - Anything that can deliver a line of text or null
      br - A standard BufferedReader.
      debugging - debugging flag passed from reader; unused
    • getFileHeader

      public String getFileHeader()
      Specified by:
      getFileHeader in interface GenericCifDataParser
      Returns:
      commented-out section at the start of a CIF file.
    • getAllCifData

      public Map<String,Object> getAllCifData()
      Parses all CIF data for the reader passed to set() into a standard Map structure and closes the BufferedReader if it exists.
      Specified by:
      getAllCifData in interface GenericCifDataParser
      Returns:
      Hashtable of models Vector of Hashtable data
    • getAllCifDataType

      public Map<String,Object> getAllCifDataType(String... types)
    • readLine

      public String readLine()
      Specified by:
      readLine in interface GenericCifDataParser
    • getData

      public boolean getData() throws Exception
      The workhorse; a general reader for loop data. Fills columnData with fieldCount fields.
      Specified by:
      getData in interface GenericCifDataParser
      Returns:
      false if EOF
      Throws:
      Exception
    • skipLoop

      public String skipLoop(boolean doReport) throws Exception
      Skips all associated loop data. (Skips to next control word.)
      Specified by:
      skipLoop in interface GenericCifDataParser
      Throws:
      Exception
    • getNextToken

      public String getNextToken() throws Exception
      Get a token as a String value (for the reader)
      Specified by:
      getNextToken in interface GenericCifDataParser
      Returns:
      the next token of any kind, or null
      Throws:
      Exception
    • getNextTokenObject

      public Object getNextTokenObject() throws Exception
      Get the token as a Java Object
      Returns:
      the next token of any kind, or null
      Throws:
      Exception
    • getNextTokenProtected

      protected Object getNextTokenProtected() throws Exception
      Just makes sure
      Returns:
      String from buffer.
      Throws:
      Exception
    • getNextDataToken

      public Object getNextDataToken() throws Exception
      First checks whether the next token is an unquoted control code and, if so, returns null.
      Specified by:
      getNextDataToken in interface GenericCifDataParser
      Returns:
      next data token or null
      Throws:
      Exception
    • peekToken

      public Object peekToken() throws Exception
      Just look at the next token. Saves it for retrieval using getTokenPeeked()
      Specified by:
      peekToken in interface GenericCifDataParser
      Returns:
      next token or null if EOF
      Throws:
      Exception
    • getTokenPeeked

      public Object getTokenPeeked()
      Specified by:
      getTokenPeeked in interface GenericCifDataParser
      Returns:
      the token last acquired; may be null
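
The peek-then-retrieve pattern can be sketched generically as one-token lookahead. This is an illustration only: the class below is hypothetical, splits on white space instead of applying CIF quoting rules, and does not claim to reproduce how peekToken() and getTokenPeeked() manage the parser's internal position.

```java
import java.util.Arrays;
import java.util.Iterator;

final class PeekSketch {
    private final Iterator<String> tokens;
    private String peeked;      // token saved by the last peek(); may be null
    private boolean hasPeeked;  // true while a peeked token is still pending

    PeekSketch(String line) {
        // Simplification: white-space splitting, no quote handling.
        tokens = Arrays.asList(line.trim().split("\\s+")).iterator();
    }

    /** Looks at the next token without consuming it; null at end of input. */
    String peek() {
        if (!hasPeeked) {
            peeked = tokens.hasNext() ? tokens.next() : null;
            hasPeeked = true;
        }
        return peeked;
    }

    /** Consumes and returns the next token, reusing a pending peeked token. */
    String next() {
        if (hasPeeked) {
            hasPeeked = false;
            return peeked;
        }
        return tokens.hasNext() ? tokens.next() : null;
    }
}
```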
    • fullTrim

      public String fullTrim(String str)
      Used especially for multi-line data that might have unwanted white space at the start or end.
      Specified by:
      fullTrim in interface GenericCifDataParser
      Parameters:
      str -
      Returns:
      str without any leading/trailing white space, and no '\n'
    • toUnicode

      public String toUnicode(String data)
      Only translating the basic Greek set here, not all the other stuff. See http://www.iucr.org/resources/cif/spec/version1.1/semantics#markup
      Specified by:
      toUnicode in interface GenericCifDataParser
      Parameters:
      data -
      Returns:
      cleaned string
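
A minimal sketch of this kind of translation, assuming the CIF markup convention in which a backslash followed by a letter stands for a Greek letter (\a for alpha, \b for beta, \g for gamma, \d for delta). The table below is deliberately partial and the class name is hypothetical; it is not the table used by this class.

```java
final class GreekMarkupSketch {
    // Partial, illustrative table: CIF code letter -> Greek letter.
    private static final String CODES = "abgd";
    private static final String GREEK = "\u03B1\u03B2\u03B3\u03B4";

    static String toUnicode(String data) {
        if (data.indexOf('\\') < 0) {
            return data; // nothing to translate
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            if (c == '\\' && i + 1 < data.length()) {
                int pt = CODES.indexOf(data.charAt(i + 1));
                if (pt >= 0) {
                    sb.append(GREEK.charAt(pt));
                    i++; // skip the code letter
                    continue;
                }
            }
            sb.append(c); // unknown codes are left untouched
        }
        return sb.toString();
    }
}
```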
    • parseDataBlockParameters

      public void parseDataBlockParameters(String[] fields, String key, String data, int[] key2col, int[] col2key) throws Exception
      Process a data block, with or without a loop_. Passed an array of field names, this method fills two int[] arrays. The first, key2col, maps desired key values to actual order of appearance (column number) in the file; the second, col2key, is a reverse look-up for that, mapping column numbers to desired field indices. When called within a loop_ context, this.columnData will be created but not filled.

      Alternatively, if fields is null, then this.fieldNames is filled, in order, with key data, and both key2col and col2key will be simply 0,1,2,... This array is used in cases such as matrices for which there are simply too many possibilities to list, and the key name itself contains information that we need.

      When not in a loop_ context, keys are expected to be in the mmCIF form category.subkey and will be unique within a data block (see http://mmcif.wwpdb.org/docs/tutorials/mechanics/pdbx-mmcif-syntax.html). Keys and data will be read for all data in the same category, filling this.columnData. In this way, the calling class does not need to enumerate all possible category names, but instead can focus on just those of interest.
      Specified by:
      parseDataBlockParameters in interface GenericCifDataParser
      Parameters:
      fields - list of normalized field names, such as "_pdbx_struct_assembly_gen_assembly_id" (with "_" instead of ".")
      key - null to indicate a loop_ construct, otherwise the initial category.subkey found
      data - when not loop_ the initial data read, otherwise ignored
      key2col - map of desired keys to actual columns
      col2key - map of actual columns to desired keys
      Throws:
      Exception
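
The relationship between key2col and col2key can be sketched as follows. This is a standalone illustration, not the method's implementation: the class name, the map helper, and the use of -1 for "no match" are assumptions.

```java
import java.util.Arrays;

final class ColumnMapSketch {
    static final int NONE = -1; // assumed marker for "not present"

    /**
     * fields:  desired keys, in the caller's order
     * columns: keys in the order they actually appear in the file
     * key2col: filled so that key2col[key] is the file column, or NONE
     * col2key: filled so that col2key[col] is the desired key index, or NONE
     */
    static void map(String[] fields, String[] columns, int[] key2col, int[] col2key) {
        Arrays.fill(key2col, NONE);
        Arrays.fill(col2key, NONE);
        for (int col = 0; col < columns.length; col++) {
            for (int key = 0; key < fields.length; key++) {
                if (fields[key].equals(columns[col])) {
                    key2col[key] = col;
                    col2key[col] = key;
                    break;
                }
            }
        }
    }
}
```

For desired fields {_atom_site_label, _atom_site_fract_x} and file columns {_atom_site_fract_x, _atom_site_occupancy, _atom_site_label}, this gives key2col = {2, 0} and col2key = {1, -1, 0}, so a caller can loop over columns and skip those it did not ask for.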
    • fixKey

      public String fixKey(String key)
      Specified by:
      fixKey in interface GenericCifDataParser
    • setString

      protected String setString(String str)
      Sets global str and line to be parsed from the beginning. \1 .... \1 indicates an embedded, fully escaped data object.
      Parameters:
      str - new data string
      Returns:
      str
    • prepareNextLine

      protected String prepareNextLine() throws Exception
      sets the string for parsing to be from the next line when the token buffer is empty, and if ';' is at the beginning of that line, extends the string to include that full multiline string. Uses \1 to indicate that this is a special quotation.
      Returns:
      the next line or null if EOF
      Throws:
      Exception
    • preprocessString

      protected String preprocessString() throws Exception
      Preprocess the string on a line starting with a semicolon to produce a string with a \1 ... \1 segment that will be picked up in the next round
      Returns:
      escaped part with attached extra data
      Throws:
      Exception
    • preprocessSemiString

      protected String preprocessSemiString() throws Exception
      Encapsulate a multi-line ; .... ; string with \1 ... \1. Applies to both CIF 1.0 and CIF 2.0.
      Returns:
      encapsulated string
      Throws:
      Exception
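
The \1 ... \1 encapsulation can be sketched under simplifying assumptions: the input is already split into lines, the text field opens on a line beginning with a semicolon, and it closes at the next line beginning with a semicolon. The class and method below are hypothetical; how the actual parser treats embedded line breaks and text after the closing semicolon is not shown here.

```java
import java.util.List;

final class SemiStringSketch {
    /**
     * Hypothetical helper: lines.get(start) must begin with ';'.
     * Returns the field wrapped in \1 ... \1, or null if it is never closed.
     */
    static String encapsulate(List<String> lines, int start) {
        StringBuilder sb = new StringBuilder("\1");
        sb.append(lines.get(start).substring(1)); // text after the opening ';'
        for (int i = start + 1; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.startsWith(";")) {
                return sb.append('\1').toString(); // closing ';' found
            }
            sb.append('\n').append(line);
        }
        return null; // unterminated text field
    }
}
```

Marking the field with a character that cannot occur in normal CIF text lets a later tokenizing pass treat the whole segment as one quoted value.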
    • unquoted

      protected Object unquoted(String s)
      In CIF 2.0, this method turns a String into an Integer or Float. In CIF 1.0 (here), it just returns the unchanged value.
      Parameters:
      s - unquoted string
      Returns:
      unchanged value
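
A minimal sketch of the CIF 2.0 style conversion mentioned above, assuming a simple try-Integer-then-Float strategy. The class is hypothetical and is not the Cif2DataParser code; values that Java cannot parse as numbers, such as 1.234(5) with a standard uncertainty, stay as strings here.

```java
final class UnquotedSketch {
    /** Returns an Integer or Float when s parses as one, otherwise s itself. */
    static Object unquoted(String s) {
        try {
            return Integer.valueOf(s);
        } catch (NumberFormatException e) {
            // not an integer; fall through
        }
        try {
            return Float.valueOf(s);
        } catch (NumberFormatException e) {
            // not a float either; fall through
        }
        return s;
    }
}
```

Note that Float.valueOf also accepts inputs such as NaN or Infinity, so a stricter parser would likely check the token's characters first.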
    • isTerminator

      protected boolean isTerminator(char c)
      The token terminator is space or tab in CIF 1.0, but it can be quoted strings in CIF 2.0.
      Parameters:
      c -
      Returns:
      true if this character is a terminator
    • isQuote

      protected boolean isQuote(char ch)
      CIF 1.0 only; we handle various quote types here
      Parameters:
      ch -
      Returns:
      true if this character is a (starting) quote
    • getQuotedStringOrObject

      protected Object getQuotedStringOrObject(char ch)
      CIF 1.0 only.
      Parameters:
      ch - current character being pointed to
      Returns:
      a String data object
    • readList

      public Object readList() throws Exception
      Read a CIF 2.0 list structure, converting it to either a JSON string or to a Java data structure
      Returns:
      a string or data structure, depending upon setting asObject
      Throws:
      Exception
    • skipNextToken

      public String skipNextToken() throws Exception
      Specified by:
      skipNextToken in interface GenericCifDataParser
      Throws:
      Exception