[OT] parsing with sscanf is accidentally quadratic due to strlen

H. S. Teoh hsteoh at quickfur.ath.cx
Wed Mar 3 16:58:27 UTC 2021


On Wed, Mar 03, 2021 at 11:09:04AM +0000, Patrick Schluter via Digitalmars-d wrote:
> On Wednesday, 3 March 2021 at 09:12:19 UTC, Kagamin wrote:
> > Parsers based on sscanf choke on big strings:
> > https://nee.lv/2021/02/28/How-I-cut-GTA-Online-loading-times-by-70/
> > Source: https://github.com/chakra-core/ChakraCore/blob/master/pal/src/safecrt/sscanf.c#L47
> 
> Yes, sscanf() calls strlen(). I got bitten by it too, some years ago,
> when I memory mapped some log files to parse and had my program
> hog the CPU when it went into production. On my test files, which were
> not that big, memory mapping and changing fscanf() to sscanf() was a
> no-brainer. When it went into production and started to map files of
> megabytes or gigabytes, I rediscovered what an O(n²) algorithm looked
> like...

Ouch.

I myself am no fan of sscanf: it's too limited, and its parsing behaviour
is hard to fine-tune. If it were up to me, I wouldn't run any large data
sets through sscanf. Now there's one more reason not to use it.
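To make the failure mode concrete, here's a minimal C sketch. It assumes a C library whose sscanf() calls strlen() on its input, as the ChakraCore implementation linked above does; whether your libc behaves that way is implementation-dependent. The function names are hypothetical, and both just sum whitespace-separated integers in a NUL-terminated buffer:

```c
#include <stdio.h>
#include <stdlib.h>

/* Uses sscanf with %n to learn how many bytes each parse consumed.
 * If sscanf() calls strlen() on its input (assumed here, as in the
 * linked ChakraCore code), every call scans the entire remaining
 * buffer, so this loop is O(n^2) in the buffer size overall. */
long sum_with_sscanf(const char *buf)
{
    long total = 0;
    int value, consumed;
    /* %n is not counted in the return value, so 1 means one int matched */
    while (sscanf(buf, "%d%n", &value, &consumed) == 1) {
        total += value;
        buf += consumed;   /* advance past the number just parsed */
    }
    return total;
}

/* Same task with strtol(), which reports through its end pointer where
 * it stopped and never needs the length of the rest of the buffer. */
long sum_with_strtol(const char *buf)
{
    long total = 0;
    char *end;
    for (;;) {
        long value = strtol(buf, &end, 10);
        if (end == buf)    /* no digits found: stop */
            break;
        total += value;
        buf = end;
    }
    return total;
}
```

On a small test file the two are indistinguishable; the difference only shows up once the buffer is large, which matches the test-vs-production experience above.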

Fortunately, in D slices carry their own length, which eliminates the
strlen problem, and slice-based functions like std.array.split et al.
are, IMO, generally better than the *scanf functions for simple parsing
tasks.
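C has no slices, but the idea can be approximated with a pointer+length pair. The `slice` type and `next_field()` below are hypothetical names, a rough sketch of slice-based splitting rather than a port of Phobos' split. Because the length travels with the pointer, no NUL terminator or strlen() is involved, and each step only touches the bytes of the field it returns:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pointer+length view, mimicking a D slice. */
typedef struct {
    const char *ptr;
    size_t len;
} slice;

/* Splits off the next field of *s at separator sep, storing it in *field
 * and advancing *s past the field and separator. Returns 0 once *s is
 * empty (so this sketch drops a trailing empty field). Splitting an
 * n-byte input this way costs O(n) in total. */
static int next_field(slice *s, char sep, slice *field)
{
    if (s->len == 0)
        return 0;                                  /* nothing left */
    const char *hit = memchr(s->ptr, sep, s->len); /* bounded by len */
    size_t n = hit ? (size_t)(hit - s->ptr) : s->len;
    field->ptr = s->ptr;
    field->len = n;
    size_t used = hit ? n + 1 : n;                 /* skip the separator too */
    s->ptr += used;
    s->len -= used;
    return 1;
}

/* Counts the fields in s; s is passed by value, so the caller's view
 * is left untouched. */
static size_t count_fields(slice s, char sep)
{
    size_t count = 0;
    slice f;
    while (next_field(&s, sep, &f))
        count++;
    return count;
}
```

For example, splitting `{"a,,b", 4}` on ',' yields the fields "a", "" and "b".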


T

-- 
Heads I win, tails you lose.

