FITS File Calculator Expressions
General Syntax
The expression can be an arbitrarily complex series of operations
performed on constants, keyword values, and column data taken from
the specified FITS TABLE extension. For the case of row filtering,
the expression must evaluate to a single boolean value for each row
of the table.
Keyword and column data are referenced by name. Any string of
characters not surrounded by quotes (ie, a constant string) or
followed by an open parentheses (ie, a function name) will be
initially interpretted as a column name and its contents for the
current row inserted into the expression. If no such column exists,
a keyword of that name will be searched for and its value used, if
found. To force the name to be interpretted as a keyword (in case
there is both a column and keyword with the same name), precede the
keyword name with a single pound sign, '#', as in '#NAXIS2'. Due to
the generalities of FITS column and keyword names, if the column or
keyword name contains a space or a character which might appear as
an arithmetic term then inclose the name in '$' characters as in
$MAX PHA$ or #$MAX-PHA$. Names are case insensitive.
To access a table entry in a row other than the current one, follow
the column's name with a row offset within curly braces. For
example, 'PHA\{-3\}' will evaluate to the value of column PHA, 3 rows
above the row currently being processed. One cannot specify an
absolute row number, only a relative offset. Rows that fall outside
the table will be treated as undefined, or NULLs.
Boolean operators can be used in the expression in either their
Fortran or C forms. The following boolean operators are available:
"equal" .eq. .EQ. == "not equal" .ne. .NE. !=
"less than" .lt. .LT. < "less than/equal" .le. .LE. <= =<
"greater than" .gt. .GT. > "greater than/equal" .ge. .GE. >= =>
"or" .or. .OR. || "and" .and. .AND. &&
"negation" .not. .NOT. ! "approx. equal(1e-7)" ~
The expression may also include arithmetic operators and functions.
Trigonometric functions use radians, not degrees. The following
arithmetic operators and functions can be used in the expression
(function names are case insensitive). A null value will be returned
in case of illegal operations such as divide by zero, sqrt(negative)
log(negative), log10(negative), arccos(.gt. 1), arcsin(.gt. 1).
"addition" + "subtraction" -
"multiplication" * "division" /
"negation" - "exponentiation" ** ^
"absolute value" abs(x) "cosine" cos(x)
"sine" sin(x) "tangent" tan(x)
"arc cosine" arccos(x) "arc sine" arcsin(x)
"arc tangent" arctan(x) "arc tangent" arctan2(y,x)
"hyperbolic cos" cosh(x) "hyperbolic sin" sinh(x)
"hyperbolic tan" tanh(x) "round to nearest int" round(x)
"round down to int" floor(x) "round up to int" ceil(x)
"exponential" exp(x) "square root" sqrt(x)
"natural log" log(x) "common log" log10(x)
"error function" erf(x) "complement of erf" erfc(x)
"gamma function" gamma(x)
"modulus" x % y
"bitwise AND" x & y "bitwise OR" x | y
"bitwise XOR" x ^^ y (bitwise operators are 32-bit int only)
"random # [0.0,1.0)" random()
"random Gausian" randomn() "random Poisson" randomp(x)
"minimum" min(x,y) "maximum" max(x,y)
"cumulative sum" accum(x) "sequential difference" seqdiff(x)
"if-then-else" b?x:y
"angular separation" angsep(ra1,dec1,ra2,de2) (all in degrees)
"substring" strmid(s,p,n) "string search" strstr(s,r)
The bitwise operators for AND, OR and XOR operate upon 32-bit integer
expressions only.
Three different random number functions are provided: random(), with no
arguments, produces a uniform random deviate between 0 and 1; randomn(),
also with no arguments, produces a normal (Gaussian) random deviate with
zero mean and unit standard deviation; randomp(x) produces a Poisson random
deviate whose expected number of counts is X. X may be any positive real
number of expected counts, including fractional values, but the return value
is an integer.
When the random functions are used in a vector expression, by default
the same random value will be used when evaluating each element of the vector.
If different random numbers are desired, then the name of a vector
column should be supplied as the single argument to the random
function (e.g., "flux + 0.1 * random(flux)", where "flux" is the
name of a vector column). This will create a vector of
random numbers that will be used in sequence when evaluating each
element of the vector expression.
An alternate syntax for the min and max functions has only a single
argument which should be a vector value (see below). The result
will be the minimum/maximum element contained within the vector.
The accum(x) function forms the cumulative sum of x, element by element.
Vector columns are supported simply by performing the summation process
through all the values. Null values are treated as 0. The seqdiff(x)
function forms the sequential difference of x, element by element.
The first value of seqdiff is the first value of x. A single null
value in x causes a pair of nulls in the output. The seqdiff and
accum functions are functional inverses, i.e., seqdiff(accum(x)) == x
as long as no null values are present.
In the if-then-else expression, "b?x:y", b is an explicit boolean
value or expression. There is no automatic type conversion from
numeric to boolean values, so one needs to use "iVal!=0" instead of
merely "iVal" as the boolean argument. x and y can be any scalar
data type (including string).
The angsep function computes the angular separation in degrees
between 2 celestial positions, where the first 2 parameters
give the RA-like and Dec-like coordinates (in decimal degrees)
of the first position, and the 3rd and 4th parameters give the
coordinates of the second position.
The substring function strmid(S,P,N) extracts a substring from S,
starting at string position P, with a substring length N. The first
character position in S is labeled as 1. If P is 0, or refers to a
position beyond the end of S, then the extracted substring will be
NULL. S, P, and N may be functions of other columns.
The string search function strstr(S,R) searches for the first occurrence
of the substring R in S. The result is an integer, indicating the
character position of the first match (where 1 is the first character
position of S). If no match is found, then strstr() returns a NULL
value.
Type cast function
The following type casting operators are available, where the
inclosing parentheses are required and taken from the C language
usage. Also, the integer to real casts values to double precision:
"real to integer" (int) x (INT) x
"integer to real" (float) i (FLOAT) i
Numeric constants
Several constants are built in for use in numerical
expressions:
#pi 3.1415... #e 2.7182...
#deg #pi/180 #row current row number
#null undefined value #snull undefined string
A string constant must be enclosed in quotes as in 'Crab'. The
"null" constants are useful for conditionally setting table values
to a NULL, or undefined, value (eg., "col1==-99 ? #NULL : col1").
Integer constants may be specified using the following notation,
13245 decimal integer
0x12f3 hexidecimal integer
0o1373 octal integer
0b01001 binary integer
Note that integer constants are only allowed to be 32-bit, i.e.
between -2^(31) and +2^(31). Integer constants may be used in any
arithmetic expression where an integer would be appropriate. Thus,
they are distinct from bitmasks (which may be of arbitrary length,
allow the "wildcard" bit, and may only be used in logical
expressions; see below).
Approximation function
There is also a function for testing if two values are close to
each other, i.e., if they are "near" each other to within a user
specified tolerance. The arguments, value_1 and value_2 can be
integer or real and represent the two values who's proximity is
being tested to be within the specified tolerance, also an integer
or real:
near(value_1, value_2, tolerance)
Null value handling
When a NULL, or undefined, value is encountered in the FITS table,
the expression will evaluate to NULL unless the undefined value is
not actually required for evaluation, eg. "TRUE .or. NULL"
evaluates to TRUE. The following two functions allow some NULL
detection and handling:
"a null value?" ISNULL(x)
"define a value for null" DEFNULL(x,y)
"declare certain value null" SETNULL(x,y)
ISNULL(x)
returns a boolean value of TRUE if the argument x is NULL. DEFNULL(x,y)
"defines" a value to be substituted for NULL values; it
returns the value of x if x is not NULL, otherwise it returns the
value of y. SETNULL(x,y) allows NULL values to be inserted into
a variable; if x==y, a NULL value is returned; otherwise y is returned
(x and y must be numerical, and x must be a scalar).
Bit Masks
Bit masks can be used to select out rows from bit columns (TFORMn =
#X) in FITS files. To represent the mask, binary, octal, and hex
formats are allowed:
binary: b0110xx1010000101xxxx0001
octal: o720x1 -> (b111010000xxx001)
hex: h0FxD -> (b00001111xxxx1101)
In all the representations, an x or X is allowed in the mask as a
wild card. Note that the x represents a different number of wild
card bits in each representation. All representations are case
insensitive. Although bitmasks may be of arbitrary length and contain
a wildcard, they may only be used in logical expressions, unlike
integer constants (see above) which may be used in any arithmetic
expression.
To construct the boolean expression using the mask as the boolean
equal operator discribed above on a bit table column. For example,
if you had a 7 bit column named flags in a FITS table and wanted
all rows having the bit pattern 0010011, the selection expression
would be:
flags == b0010011
or
flags .eq. b10011
It is also possible to test if a range of bits is less than, less
than equal, greater than and greater than equal to a particular
boolean value:
flags <= bxxx010xx
flags .gt. bxxx100xx
flags .le. b1xxxxxxx
Notice the use of the x bit value to limit the range of bits being
compared.
It is not necessary to specify the leading (most significant) zero
(0) bits in the mask, as shown in the second expression above.
Bit wise AND, OR and NOT operations are also possible on two or
more bit fields using the '&'(AND), '|'(OR), and the '!'(NOT)
operators. All of these operators result in a bit field which can
then be used with the equal operator. For example:
(!flags) == b1101100
(flags & b1000001) == bx000001
Bit fields can be appended as well using the '+' operator. Strings
can be concatenated this way, too.
Vector Columns
Vector columns can also be used in building the expression. No
special syntax is required if one wants to operate on all elements
of the vector. Simply use the column name as for a scalar column.
Vector columns can be freely intermixed with scalar columns or
constants in virtually all expressions. The result will be of the
same dimension as the vector. Two vectors in an expression, though,
need to have the same number of elements and have the same
dimensions. The only places a vector column cannot be used (for
now, anyway) are the SAO region functions and the NEAR boolean
function.
Arithmetic and logical operations are all performed on an element by
element basis. Comparing two vector columns, eg "COL1 == COL2",
thus results in another vector of boolean values indicating which
elements of the two vectors are equal.
Several functions are available that operate on a vector. All but
the last two return a scalar result:
"minimum" MIN(V) "maximum" MAX(V)
"average" AVERAGE(V) "median" MEDIAN(V)
"sumation" SUM(V) "standard deviation" STDDEV(V)
"# of values" NELEM(V) "# of non-null values" NVALID(V)
"# axes" NAXIS(V) "axis dimension" NAXES(V,n)
"axis pos'n" AXISELEM(V,n) "vector element pos'n" ELEMENTNUM(V)
"promote to array" ARRAY(X,d)
where V represents the name of a vector column or a manually
constructed vector using curly brackets as described below. The
first 6 of these functions ignore any null values in the vector when
computing the result. The STDDEV() function computes the sample
standard deviation, i.e. it is proportional to 1/SQRT(N-1) instead
of 1/SQRT(N), where N is NVALID(V).
The SUM function literally sums all the elements in x, returning a
scalar value. If V is a boolean vector, SUM returns the number
of TRUE elements. The NELEM function returns the number of elements
in vector V whereas NVALID return the number of non-null elements in
the vector. (NELEM also operates on bit and string columns,
returning their column widths.) As an example, to test whether all
elements of two vectors satisfy a given logical comparison, one can
use the expression
SUM( COL1 > COL2 ) == NELEM( COL1 )
which will return TRUE if all elements of COL1 are greater than
their corresponding elements in COL2.
The NAXIS(V) function returns the number of axes of the vector,
for example a 2D array would be NAXIS(V) == 2. The NAXES(V,n)
function returns the dimension of axis n, for example a 4x2 array
would have NAXES(V,1) == 4. The ELEMENTNUM(V) and AXISELEM(V,n)
functions return vectors of the same size as the input vector V.
ELEMENTNUM(V) returns the vector element position for each element
in the vector, starting from 1 in each row. The AXISELEM(V,n)
function is similar but returns the element position of axis n
only.
The ARRAY(X,d) function promotes scalar value X to a vector (or
array) table element. X may be any scalar-valued item, including
a column, an expression, or a constant value. The resulting
vector or array will have the same scalar value replicated into
each element position. This may be a useful way to construct
large arrays without using the cumbersome {vector} notation. The
dimensions of the new array are given by the second argument, d.
d can either be a single constant integer value, or a vector of up
to five dimensions of the form {Nx,Ny,...}. Thus, ARRAY(TIME,4)
would promote TIME to be a 4-vector, and ARRAY(0, {2,3,1}) would
construct an array of all 0's with dimensions 2x3x1.
A second form of ARRAY(X,d) can be used where X is a vector or
array, and the dimensions d merely change the dimensions of X
without changing the total number of vector elements. This is a
way to re-dimension an existing array. For example,
ARRAY({1,2,3,4},{2,2}) would transform the 4-vector into a
2x2 array.
To specify a single element of a vector, give the column name
followed by a comma-separated list of coordinates enclosed in
square brackets. For example, if a vector column named PHAS exists
in the table as a one dimensional, 256 component list of numbers
from which you wanted to select the 57th component for use in the
expression, then PHAS[57] would do the trick. Higher dimensional
arrays of data may appear in a column. But in order to interpret
them, the TDIMn keyword must appear in the header. Assuming that a
(4,4,4,4) array is packed into each row of a column named ARRAY4D,
the (1,2,3,4) component element of each row is accessed by
ARRAY4D[1,2,3,4]. Arrays up to dimension 5 are currently
supported. Each vector index can itself be an expression, although
it must evaluate to an integer value within the bounds of the
vector. Vector columns which contain spaces or arithmetic operators
must have their names enclosed in "$" characters as with
$ARRAY-4D$[1,2,3,4].
A more C-like syntax for specifying vector indices is also
available. The element used in the preceding example alternatively
could be specified with the syntax ARRAY4D[4][3][2][1]. Note the
reverse order of indices (as in C), as well as the fact that the
values are still ones-based (as in Fortran -- adopted to avoid
ambiguity for 1D vectors). With this syntax, one does not need to
specify all of the indices. To extract a 3D slice of this 4D
array, use ARRAY4D[4].
Variable-length vector columns are not supported.
Vectors can be manually constructed within the expression using a
comma-separated list of elements surrounded by curly braces ('{}').
For example, '{1,3,6,1}' is a 4-element vector containing the values
1, 3, 6, and 1. The vector can contain only boolean, integer, and
real values (or expressions). The elements will be promoted to the
highest datatype present. Any elements which are themselves
vectors, will be expanded out with each of its elements becoming an
element in the constructed vector.
Good Time Interval Filtering and Calculation
There are two functions for filtering and calculating based
on Good Time Intervals, or GTIs. GTIs are commonly used to
express fragmented time ranges that are not easy to express with a
single start and stop time. The time intervals are defined in a
FITS table extension which contains 2 columns giving the
start and stop time of each good interval.
A common filtering method involves selecting rows which have a
time value which lies within any GTI. The gtifilter() filtering
operation accepts only those rows of the input table which have an
associated time which falls within one of the time intervals
defined in a separate GTI extension. gtifilter(a,b,c,d) evaluates
each row of the input table and returns TRUE or FALSE depending
whether the row is inside or outside the good time interval. The
syntax is
gtifilter( [ "gtifile" [, expr [, "STARTCOL", "STOPCOL" ] ] ] )
or
gtifilter( [ 'gtifile' [, expr [, 'STARTCOL', 'STOPCOL' ] ] ] )
where each "[]" demarks optional parameters. Note that the quotes
around the gtifile and START/STOP column are required. Either single
or double quotes may be used. In cases where this expression is
entered on the Unix command line, enclose the entire expression in
double quotes, and then use single quotes within the expression to
enclose the 'gtifile' and other terms. It is also usually possible to
do the reverse, and enclose the whole expression in single quotes and
then use double quotes within the expression. The gtifile, if
specified, can be blank ("") which will mean to use the first
extension with the name "*GTI*" in the current file, a plain extension
specifier (eg, "+2", "[2]", or "[STDGTI]") which will be used to
select an extension in the current file, or a regular filename with or
without an extension specifier which in the latter case will mean to
use the first extension with an extension name "*GTI*". Expr can be
any arithmetic expression, including simply the time column name. A
vector time expression will produce a vector boolean result. STARTCOL
and STOPCOL are the names of the START/STOP columns in the GTI
extension. If one of them is specified, they both must be.
In its simplest form, no parameters need to be provided -- default
values will be used. The expression "gtifilter()" is equivalent to
gtifilter( "", TIME, "*START*", "*STOP*" )
This will search the current file for a GTI extension, filter the TIME
column in the current table, using START/STOP times taken from columns
in the GTI extension with names containing the strings "START" and
"STOP". The wildcards ('*') allow slight variations in naming
conventions such as "TSTART" or "STARTTIME". The same default values
apply for unspecified parameters when the first one or two parameters
are specified. The function automatically searches for TIMEZERO/I/F
keywords in the current and GTI extensions, applying a relative time
offset, if necessary.
The related function, gtifind(a,b,c,d), is similar to gtifilter()
but instead of returning true/false, gtifind() returns the GTI
number that brackets the requested time sample. gtifind() returns
the row number in the GTI table that matches the time sample, or
-1 if the time sample is not within any GTI. gtifind() is
particularly useful when entries in a table must be categorized by
which GTI the fall within. For example, if events in an event
list must be separated by good time interval. The results of
gtifind() can be used with histogram binning techniques to bin an
event list by which GTI.
gtifind( "gtifile" , expr [, "STARTCOL", "STOPCOL" ] )
The requirements for specifying the gtifile are the same as for
gtifilter() as described above. Like gtifilter(), the expr is
the time-like expression and is optional (defaulting to TIME).
The start and stop columns default to START and STOP.
A separate function, gtioverlap(a,b,c,d,e), computes the overlap
between a user-requested time range and the entries in a GTI. The
cases of no overlap, partial overlap, or overlap of many GTIs
within the user requested range are handled. gtioverlap() is very
useful for calculating exposure times and fractional exposures of
individual time bins, say for a light curve. The syntax of
gtioverlap() is
gtioverlap( "gtifile" , startExpr, stopExpr [, "STARTCOL", "STOPCOL" ] )
or
gtioverlap( 'gtifile' , startExpr, stopExpr [, 'STARTCOL', 'STOPCOL' ] )
The requirements for specifying the gtifile are the same as for
gtifilter() as described above. Unlike gtifilter(), the startExpr
and stopExpr are not optional. startExpr provides a start of the
user requested time interval. startExpr is typically TIME, but
can be any valid expression. Likewise, stopExpr provides the stop
of the user requested time interval, and can be an expression.
For example, for a light curve with a TIME column and time bin
size of 1.0 seconds, the expression
gtioverlap('gtifile',TIME,TIME+1.0)
would calculate the amount of overlap exposure time between each
one second time bin and the GTI in 'gtifile'. In this case the
time bin is assumed to begin at the time specified by TIME and end
1 second later. Neither startExpr nor stopExpr are required to be
constant, and a light curve is not required to have a constant bin
size. For tables, the overlap is calculated for each entry in the
table.
It is also possible to calculate a single overlap value, which
would typically be placed in a keyword. For example, a way to to
compute the total overlap exposure of a file whose TIME column is
bounded by the keywords TSTART and TSTOP, overlapping with the
specified GTI, would be
#EXPOSURE = gtioverlap('gtifile',#TSTART,#TSTOP)
The #EXPOSURE syntax with a leading # ensures that the
requested values are treated as keywords. Otherwise, a column
named EXPOSURE will be created with the (constant) exposure value
in each entry.
Spatial Region Filtering
Another common filtering method is a spatial filter using a SAO-
style region file. The syntax for this high-level filter is
regfilter( "regfilename" [ , Xexpr, Yexpr [ , "wcs cols" ] ] )
The region file name is required, but the rest is optional. Without
any explicit expression for the X and Y coordinates (in pixels), the
filter will search for and operate on columns "X" and "Y". If the
region file is in "degrees" format instead of "pixels" ("hhmmss"
format is not supported, yet), the filter will need WCS information
to convert the region coordinates to pixels. If supplied, the final
parameter string contains the names of the 2 columns (space or comma
separated) which contain the desired WCS information. If not
supplied, the filter will scan the X and Y expressions for column
names. If only one is found in each expression, those columns will
be used. Otherwise, an error will be returned.
The region shapes supported are (names are case insensitive):
Point ( X1, Y1 ) <- One pixel square region
Line ( X1, Y1, X2, Y2 ) <- One pixel wide region
Polygon ( X1, Y1, X2, Y2, ... ) <- Rest are interiors with
Rectangle ( X1, Y1, X2, Y2, A ) | boundaries considered
Box ( Xc, Yc, Wdth, Hght, A ) V within the region
Diamond ( Xc, Yc, Wdth, Hght, A )
Circle ( Xc, Yc, R )
Annulus ( Xc, Yc, Rin, Rout )
Ellipse ( Xc, Yc, Rx, Ry, A )
Elliptannulus ( Xc, Yc, Rinx, Riny, Routx, Routy, Ain, Aout )
Sector ( Xc, Yc, Amin, Amax )
where (Xc,Yc) is the coordinate of the shape's center; (Xn,Yn) are
the coordinates of the shape's edges; Rxxx are the shapes' various
Radii or semimajor/minor axes; and Axxx are the angles of rotation
(or bounding angles for Sector) in degrees. For rotated shapes, the
rotation angle can be left off, indicating no rotation. Common
alternate names for the regions can also be used: rotbox = box;
rotrectangle = rectangle; (rot)rhombus = (rot)diamond; and pie
= sector. When a shape's name is preceded by a minus sign, '-',
the defined region is instead the area *outside* its boundary (ie,
the region is inverted). All the shapes within a single region file
are AND'd together to create the region.
There are three functions that are primarily for use with SAO region
files, but they can be used directly. They
return a boolean true or false depending on whether a two
dimensional point is in the region or not:
"point in a circular region"
circle(xcntr,ycntr,radius,Xcolumn,Ycolumn)
"point in an elliptical region"
ellipse(xcntr,ycntr,xhlf_wdth,yhlf_wdth,rotation,Xcolumn,Ycolumn)
"point in a rectangular region"
box(xcntr,ycntr,xfll_wdth,yfll_wdth,rotation,Xcolumn,Ycolumn)
where
(xcntr,ycntr) are the (x,y) position of the center of the region
(xhlf_wdth,yhlf_wdth) are the (x,y) half widths of the region
(xfll_wdth,yfll_wdth) are the (x,y) full widths of the region
(radius) is half the diameter of the circle
(rotation) is the angle(degrees) that the region is rotated with
respect to (xcntr,ycntr)
(Xcoord,Ycoord) are the (x,y) coordinates to test, usually column
names
NOTE: each parameter can itself be an expression, not merely a
column name or constant.
For complex or commonly used filters, one can also place the
expression into a text file and import it into the calculator using
the syntax '@filename.txt'. The expression can be arbitrarily
complex and extend over multiple lines of the file.
Go to About fv.