pscanf: a sscanf replacement for C

In general, I like C. Pointers don’t bother me all that much, and it’s really nice to use for a lot of different things. However, its string handling SUCKS (though printf is nice). It seems like a lot of the string routines in the standard C library are designed to screw you in the most unexpected ways possible.

So recently I’ve been struggling with processing input from a user on the command line. I was using sscanf to do really simple string processing, and it simply wasn’t matching the correct things. After much struggling, and coming to the realization that I needed something similar to regular expressions, I sat down and wrote the following function. It works quite well for what I want to do.
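I didn’t record exactly which inputs tripped sscanf up, but here is a minimal sketch of one common surprise: its return value only counts conversions, so trailing garbage after the last one goes unnoticed. The helper names parse_set_sscanf and parse_set_strict and the "set <number>" command are made up for illustration. The second helper uses %n to check that the whole input was consumed, which is roughly what anchoring a regex with $ gives you for free.

```c
#include <stdio.h>

/* Hypothetical helper: parse "set <number>" the naive way.
   Returns the number of converted items, exactly as sscanf reports it. */
int parse_set_sscanf(const char *input, int *value)
{
    return sscanf(input, "set %d", value);
}

/* Hypothetical helper: same parse, but reject trailing garbage.
   %n stores how many characters were consumed so far and is not
   counted in sscanf's return value. Returns 1 on success, 0 otherwise. */
int parse_set_strict(const char *input, int *value)
{
    int consumed = 0;

    if (sscanf(input, "set %d%n", value, &consumed) != 1)
        return 0;

    /* succeed only if the conversion ended at the end of the input */
    return input[consumed] == '\0';
}
```

With the input "set 42abc", the naive version reports one successful conversion and stores 42, while the strict version rejects it.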

What it does do:

  • Match arbitrary strings using a perl regex, and return the matches to you

What it doesn’t do:

  • This is not a drop-in replacement for sscanf: you need to change the format string around and (possibly) your parameters.
  • It will not return anything but a string. That’s all you get. Of course, if you format your regex correctly then you can pretty much be guaranteed that strtol/strtod/etc. will work…
  • Give you any kind of comprehensive error reporting. If you want that, then just use the pcre functions directly.
  • Cook your meals or do your laundry
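Since everything comes back as a string, here is a minimal sketch of turning a captured string into a number with strtol. The helper name capture_to_long is hypothetical and not part of pscanf; it assumes the capture came from a group like (-?[0-9]+), but it still checks for leftovers and overflow in case the regex was looser than intended.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: convert a captured string to a long (base 10).
   Returns 1 and stores the result in *out on success, 0 on failure. */
int capture_to_long(const char *capture, long *out)
{
    char *end;
    long value;

    /* nothing to convert */
    if (capture == NULL || *capture == '\0')
        return 0;

    errno = 0;
    value = strtol(capture, &end, 10);

    /* fail on overflow/underflow, or if any characters were left over */
    if (errno == ERANGE || *end != '\0')
        return 0;

    *out = value;
    return 1;
}
```

You would call it on each string pscanf hands back, before freeing it.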

This is released under the license contained on http://www.virtualroadside.com/software/, enjoy!

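// strdup is POSIX rather than C99, so ask for it explicitly in C99 mode
#define _POSIX_C_SOURCE 200809L

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pcre.h>
#include <stdarg.h>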

/*
    Author: Dustin Spicuzza

    sscanf replacement based on perl regular expressions. Returns at most
    19 captured items (the 60-entry ovector holds 20 offset pairs, and the
    first pair is the whole match; who really needs that many items). It
    should be called like so:

        char * arg1, * arg2;

        if (pscanf(input, "^blah (.*) blah (.*)$", &arg1, &arg2) != 2)
        {
            return SOME_ERROR;
        }

        // do something with arg1/arg2 here
        ..
        ..

        // free them
        free(arg1);
        free(arg2);

    The routine expects the match parameters to be pointers to char*,
    so ensure that you pass the right pointer or it will segfault!

    It should be noted that this would be rather slow for lots and
    lots of comparisons, given that it's recompiling the regular expression
    each time you call this (especially in a loop). However, for simple
    one-time or interactive usage... works like a charm :)

    I used gcc's C99 mode to compile, and you need to link to the pcre 
    library
*/
int pscanf(const char * input, const char * format, ... )
{
    pcre * re;
    const char * error;
    int erroffset, rc, i;
    int ovector[60];

    // compile the regex
    if (!(re = pcre_compile(format, 0, &error, &erroffset, NULL)))
    {
        fprintf(stderr, "Error compiling regex at offset %d: %s\n", 
                erroffset, error);
        return -1;
    }

    // execute it
    rc = pcre_exec(re, NULL, input, strlen(input), 0, 0, ovector, 60);

    if (rc > 0)
    {
        va_list ap;
        va_start(ap, format);

        for (i = 1; i < rc; i++)
        {
            char ** arg = va_arg(ap, char **);
            const char * tmp;
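
            // pcre_get_substring returns a negative error code on failure,
            // in which case tmp is not valid: hand back NULL instead
            if (pcre_get_substring(input, ovector, rc, i, &tmp) < 0)
            {
                *arg = NULL;
                continue;
            }

            // pcre may use malloc internally, but just in case...
            *arg = strdup(tmp);
            pcre_free_substring(tmp);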
        }

        va_end(ap);

        // correct the end value
        rc = rc -1;
    }

    // don't forget to free it
    pcre_free(re);

    return rc;
}
