The Irregular expressions

This C module implements pattern matching with DOS-style wildcard characters '*' and '?'. It doesn't offer the power of "regular expressions", but it might be slightly easier to use. '*' matches zero or more characters of any kind, while '?' matches exactly one character (no more, no less). '\*' matches a literal '*', '\?' matches a literal '?' and '\\' matches a literal '\'. In C string literals you have to type these as "\\*", "\\?" and "\\\\", since the compiler turns '\\' into a single backslash. '*' is "greedy": in "variation", "v*i" matches "variati" rather than the "vari" that a non-greedy '*' would match. The module also handles more complex patterns than early MS-DOS implementations did, so that "c*p*d" matches "crepitud" in "decrepitude".
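When a pattern is built from arbitrary text (a file name typed by a user, say), the three special characters have to be escaped first. The following is a minimal sketch of such a helper, under the escaping rules described above. The name sketch_escape_literal is hypothetical and is not part of this module.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT part of wcmatch: copy text into out, putting a
   backslash in front of '*', '?' and '\' so that the result, used as a
   wildcard, matches text literally.
   Returns out, or NULL if outsize bytes are not enough. */
char *sketch_escape_literal(const char *text, char *out, size_t outsize)
{
    size_t n = 0;

    assert(text != NULL && out != NULL);
    for (; *text != '\0'; text++) {
        int special = (*text == '*' || *text == '?' || *text == '\\');

        /* Need room for this character (plus its backslash) and the NUL. */
        if (n + (special ? 2u : 1u) >= outsize)
            return NULL;
        if (special)
            out[n++] = '\\';
        out[n++] = *text;
    }
    if (n >= outsize)
        return NULL;            /* only reached with outsize == 0 */
    out[n] = '\0';
    return out;
}
```

Written directly in source code, the escaped pattern for a literal '*' is "\\*". Note that "\?" inside a C string literal is itself an escape sequence for a plain '?', so the backslash has to be doubled there as well.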

The functions

int WCMatchPos(char *string, char *wildcard, char **begin, char **end,int icase);

Example:

WCMatchPos("garage","a*g",&beg,&end,1);

Here beg and end must have been declared beforehand as char *beg,*end; so both are of type "pointer to char".

The example makes beg point at the first 'a' in the string "garage" and end point at the last 'g'. It returns 2, which is the offset of that 'a' (1) plus 1. If there is no match, 0 is returned. The last argument, icase, should be 0 for case-sensitive matching and nonzero for case-insensitive matching.
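To make the begin/end/return-value convention concrete, here is a small self-contained sketch that follows the behaviour described in this document (greedy '*', single-character '?', backslash escapes, unanchored search). It is not the code in wcmatch.c: the names sketch_match_pos and sketch_match_len are hypothetical, the backtracking is naive and may be slow on patterns with many '*', and treating an empty match as "no match" is an arbitrary choice of the sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Compare two characters, optionally ignoring case. */
static int same_char(char a, char b, int icase)
{
    if (icase)
        return tolower((unsigned char)a) == tolower((unsigned char)b);
    return a == b;
}

/* Length of the greedy match of pattern p starting exactly at s, or -1. */
static long match_here(const char *s, const char *p, int icase)
{
    long rest;

    if (*p == '\0')
        return 0;                   /* pattern used up: the match ends here */
    if (*p == '*') {
        /* Greedy: let '*' take as much as possible, then back off. */
        size_t k = strlen(s) + 1;
        while (k-- > 0) {
            rest = match_here(s + k, p + 1, icase);
            if (rest >= 0)
                return (long)k + rest;
        }
        return -1;
    }
    if (*s == '\0')
        return -1;                  /* text ran out before the pattern did */
    if (*p != '?') {
        if (*p == '\\' && p[1] != '\0')
            p++;                    /* escaped character: compare literally */
        if (!same_char(*s, *p, icase))
            return -1;
    }
    rest = match_here(s + 1, p + 1, icase);
    return rest < 0 ? -1 : rest + 1;
}

/* Same calling convention as described for WCMatchPos: returns the offset
   of the match plus 1, or 0 if nothing matches. begin and end may be NULL. */
int sketch_match_pos(const char *string, const char *wildcard,
                     const char **begin, const char **end, int icase)
{
    size_t off;

    assert(string != NULL && wildcard != NULL);
    for (off = 0; string[off] != '\0'; off++) {
        long len = match_here(string + off, wildcard, icase);
        if (len > 0) {              /* sketch choice: ignore empty matches */
            if (begin != NULL)
                *begin = string + off;
            if (end != NULL)
                *end = string + off + len - 1;   /* last matched character */
            return (int)off + 1;
        }
    }
    return 0;
}

/* Number of characters covered by the first match, or 0 if none. */
long sketch_match_len(const char *string, const char *wildcard, int icase)
{
    const char *b, *e;

    if (!sketch_match_pos(string, wildcard, &b, &e, icase))
        return 0;
    return (long)(e - b) + 1;
}
```

With this sketch, sketch_match_pos("garage","a*g",&beg,&end,1) returns 2 and leaves beg and end four characters apart, inclusive, which agrees with the description above.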

int WCMatch(char *string, char *wildcard,int icase);

This is a front end to WCMatchPos. The return value works the same way, but you don't have to provide pointers to pointers to chars, if such things don't interest you. For example:

if (WCMatch(filename,"*.txt",1)) puts("It is a text file!");

Note, however, that '*' wildcards at the beginning of a pattern are often unnecessary and only slow the algorithm down. If you only care about the truth value, you might as well use ".txt" as the wildcard. That pattern matches only the ".txt" part, so if you want the match to cover the full file name (i.e. you use the return value for more than the zero/nonzero distinction, or you call WCMatchPos), you might want the full form "*.txt", even though it is slower.

char *WCReplaceOne(char *string, char *wildcard, char *replacewith, char *resultbuf,int icase);

This function leaves in 'resultbuf' a copy of 'string' in which the first match of 'wildcard' has been replaced by the string in 'replacewith'. For example:

char *word="Mountain";
char result[300];
WCReplaceOne(word,"u*a",".......", result,0);

This leaves 'result' containing the string 'Mo.......in'. Note that you must allocate enough memory for the result; otherwise the program will crash (if you are lucky).
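The buffer size needed is easy to work out once the matched span is known: the length of the original string, minus the matched characters, plus the replacement, plus one byte for the terminating NUL. The sketch below shows that arithmetic together with a bounds-checked replacement of an already located span. Both function names are hypothetical and not part of this module; the span is assumed to have been found beforehand, for example with WCMatchPos.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes needed for the result, including the terminating NUL. */
size_t sketch_result_size(const char *string, size_t match_len,
                          const char *replacewith)
{
    return strlen(string) - match_len + strlen(replacewith) + 1;
}

/* Hypothetical helper: replace match_len characters of string, starting at
   offset start, with replacewith. Returns out, or NULL if the span is
   invalid or out (outsize bytes) is too small. */
char *sketch_replace_span(const char *string, size_t start, size_t match_len,
                          const char *replacewith, char *out, size_t outsize)
{
    size_t slen, rlen;

    assert(string != NULL && replacewith != NULL && out != NULL);
    slen = strlen(string);
    rlen = strlen(replacewith);
    if (start > slen || match_len > slen - start)
        return NULL;                        /* span lies outside the string */
    if (slen - match_len + rlen + 1 > outsize)
        return NULL;                        /* result would not fit */

    memcpy(out, string, start);                      /* text before the match */
    memcpy(out + start, replacewith, rlen);          /* the replacement */
    memcpy(out + start + rlen, string + start + match_len,
           slen - start - match_len + 1);            /* tail, including NUL */
    return out;
}
```

For the "Mountain" example the match "unta" starts at offset 2 and is 4 characters long, so the result needs 8 - 4 + 7 + 1 = 12 bytes.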

int WCPossible(char *string, char *wildcard,int icase);

You probably won't need this function. It checks whether it is possible for wildcard to match string, but it can and will return false positives. It is used internally for optimization, as it is much faster than the actual matching functions.

Installation

You need to #include the file wcmatch.h and compile/link your program with wcmatch.c. That's about it. If you are working with C++, you can safely rename wcmatch.c to wcmatch.cpp. No heap memory is allocated in any of the functions.

Copying

This library is freeware, and you may use it for whatever you want. There is no warranty of any kind, and the author doesn't guarantee its suitability for any purpose.

The Author

Ville Vainio
email: vvainio@tp.spt.fi
Last modified: Mon Jan 24 18:57:34 Finland Standard Time 2000