How can I read data from data files with particular formats?

Q

How can I read data from data files with particular formats?
How can I read ten floats without having to use a jawbreaker scanf format
like "%f %f %f %f %f %f %f %f %f %f"?
How can I read an arbitrary number of fields from a line into an array?

✍: Guest

A

In general, there are three main ways of parsing data lines:
1. Use fscanf or sscanf, with an appropriate format string. Despite the limitations mentioned in this section, the scanf family is quite powerful. Though whitespace-separated fields are always the easiest to deal with, scanf format strings can also be used with more compact, column-oriented, FORTRAN-style data. For instance, the line
1234ABC5.678
could be read with "%d%3s%f".
2. Break the line into fields separated by whitespace (or some other delimiter), using strtok or the equivalent, then deal with each field individually, perhaps with functions like atoi and atof. (Once the line is broken up, the code for handling the fields is much like the traditional code in main() for handling the argv array.) This method is particularly useful for reading an arbitrary (i.e. not known in advance) number of fields from a line into an array.
Here is a simple example which copies a line of up to 10 floating-point numbers (separated by whitespace) into an array:
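#include <stdlib.h>    /* atof */

#define MAXARGS 10
char line[] = "1 2.3 4.5e6 789e10";
char *av[MAXARGS];
int ac, i;
double array[MAXARGS];

/* makeargv is not a standard library function; it is assumed here to split
   line in place into at most MAXARGS whitespace-separated fields, store
   pointers to them in av, and return the number of fields found. */
ac = makeargv(line, av, MAXARGS);
for(i = 0; i < ac; i++)
    array[i] = atof(av[i]);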

3. Use whatever pointer manipulations and library routines are handy to parse the line in an ad-hoc way. (The ANSI strtol and strtod functions are particularly useful for this style of parsing, because they can return a pointer indicating where they stopped reading.) This is obviously the most general way, but it's also the most difficult and error-prone: the thorniest parts of many C programs are those which use lots of tricky little pointers to pick apart strings.
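As a rough illustration of the third approach, here is a minimal sketch that walks a line with strtod, using the end pointer it hands back to find where the next field starts. The function name parse_doubles and its return convention (count of values, or -1 on a bad field) are assumptions made for this example, not part of any standard library.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse up to max whitespace-separated numbers from s
   into out. Returns the number of values stored, or -1 if a field is not
   a number or is out of range. Fields beyond max are ignored. */
static int parse_doubles(const char *s, double *out, int max)
{
    int n = 0;

    assert(s != NULL && out != NULL);

    while (n < max) {
        char *end;
        double v;

        errno = 0;
        v = strtod(s, &end);        /* strtod skips leading whitespace itself */
        if (end == s) {
            /* Nothing was converted: either only whitespace is left,
               or the remaining text is not a number. */
            while (isspace((unsigned char)*s))
                s++;
            return *s == '\0' ? n : -1;
        }
        if (errno == ERANGE)        /* overflow or underflow */
            return -1;
        out[n++] = v;
        s = end;                    /* resume where strtod stopped reading */
    }
    return n;
}
```

Unlike atof, this lets the caller tell a malformed field apart from a legitimate 0. Called as parse_doubles("1 2.3 4.5e6 789e10", array, MAXARGS), it would store four values.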
When possible, design data files and input formats so that they don't require arcane manipulations, but can instead be parsed with easier techniques such as 1 and 2: dealing with the files will then be much more pleasant all around.
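For comparison, techniques 1 and 2 might be sketched as below. The first function reads floats by calling sscanf in a loop with %n, which avoids a ten-field format string. The second is one possible way to write a makeargv on top of strtok; it is an assumption made for illustration and not necessarily the definition the example above had in mind. The name read_floats is likewise invented here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Technique 1: read up to max floats from s with repeated sscanf calls.
   %n stores the number of characters consumed by this call, so the next
   call can resume where the previous one stopped. */
static int read_floats(const char *s, float *out, int max)
{
    int n = 0, used;

    assert(s != NULL && out != NULL);

    while (n < max && sscanf(s, "%f%n", &out[n], &used) == 1) {
        s += used;
        n++;
    }
    return n;   /* number of floats actually read */
}

/* Technique 2: a possible makeargv built on strtok. It overwrites the
   delimiters in line with '\0' and stores pointers to the fields in av,
   so line must be a writable array, not a string literal. */
static int makeargv(char *line, char *av[], int max)
{
    int ac = 0;
    char *p;

    assert(line != NULL && av != NULL);

    for (p = strtok(line, " \t\n"); p != NULL && ac < max;
         p = strtok(NULL, " \t\n"))
        av[ac++] = p;
    return ac;  /* number of fields found */
}
```

Both functions stop quietly at the first thing they cannot handle (a non-numeric field for read_floats, the max limit for makeargv), so a caller that cares about malformed input would want to compare the returned count against what it expected.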

2015-10-26