regex direct support for sse4 intrinsics

Jay Norwood jayn at prismnet.com
Mon Mar 26 11:14:22 PDT 2012


The SSE4.2 string instructions include a range mode of string matching
that lets you test each character of a 16 byte string against a 16
byte set of start and stop characters, i.e. up to eight ranges.  See the
_SIDD_CMP_RANGES mode in the table at the MSDN link below.


For example, the pattern in some of our examples for finding the
start of a word is [a-zA-Z], and for the other characters in the word
[a-zA-Z0-9].  Either of these patterns could be tested against 16
bytes of input in a single operation in the sse4 engine.
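
To make the range mode concrete, here is a rough sketch in C. The helper name match_ranges16 is made up for illustration and is not part of any library. When the compiler targets SSE4.2 it uses _mm_cmpestrm with _SIDD_CMP_RANGES; otherwise it falls back to a scalar loop that is meant to model the same semantics. It assumes unsigned byte characters and at most 16 bytes of input.

```c
#include <stddef.h>
#include <string.h>
#if defined(__SSE4_2__)
#include <nmmintrin.h>
#endif

/* Hypothetical helper, not from any library.
   ranges holds inclusive (low, high) byte pairs, ranges_len counts bytes
   (two per pair, at most 16).  Returns a mask whose bit i is set when
   text[i] falls inside any of the pairs; text_len is clamped to 16. */
static unsigned match_ranges16(const char *ranges, size_t ranges_len,
                               const char *text, size_t text_len)
{
    if (ranges_len > 16) ranges_len = 16;
    if (text_len > 16) text_len = 16;
    ranges_len &= ~(size_t)1;            /* ignore a dangling half pair */

#if defined(__SSE4_2__)
    /* Copy into zero-padded 16 byte buffers so the loads never overrun. */
    unsigned char r[16] = {0}, t[16] = {0};
    memcpy(r, ranges, ranges_len);
    memcpy(t, text, text_len);
    __m128i rv = _mm_loadu_si128((const __m128i *)r);
    __m128i tv = _mm_loadu_si128((const __m128i *)t);
    /* Explicit-length compare: one instruction tests all 16 bytes. */
    __m128i m = _mm_cmpestrm(rv, (int)ranges_len, tv, (int)text_len,
                             _SIDD_UBYTE_OPS | _SIDD_CMP_RANGES |
                             _SIDD_BIT_MASK);
    return (unsigned)_mm_cvtsi128_si32(m) & 0xFFFFu;
#else
    /* Scalar model of the same operation. */
    unsigned mask = 0;
    for (size_t i = 0; i < text_len; ++i) {
        unsigned char c = (unsigned char)text[i];
        for (size_t j = 0; j + 1 < ranges_len; j += 2) {
            if (c >= (unsigned char)ranges[j] &&
                c <= (unsigned char)ranges[j + 1]) {
                mask |= 1u << i;
                break;
            }
        }
    }
    return mask;
#endif
}
```

With this sketch, [a-zA-Z] is the 4 byte range string "azAZ" and [a-zA-Z0-9] is the 6 byte string "azAZ09", so both fit easily within the 16 byte limit.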

http://msdn.microsoft.com/en-us/library/bb531465.aspx

Looking at the Microsoft intrinsics, it seems like D versions could be
more efficient and elegant using D slices, since the Microsoft ones
pass the string and the length of the string as separate parameters.

It would be good if the D regex processing could detect simple
range match patterns and use the SSE4.2 extensions when available.
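
As a rough idea of what "detecting a simple range pattern" might involve, the C sketch below turns the body of a character class into the (low, high) byte pairs that the range mode expects. The function class_to_ranges is hypothetical, and it assumes no escapes, no negation, and plain byte characters. It is not how any existing regex engine does it.

```c
#include <stddef.h>

/* Hypothetical helper: converts a class body such as "a-zA-Z0-9" into
   inclusive (low, high) byte pairs in out.  A lone character x becomes
   the pair (x, x).  Returns the number of bytes written, or -1 if the
   class needs more than 8 pairs or contains a reversed range. */
static int class_to_ranges(const char *cls, char out[16])
{
    int n = 0;
    for (size_t i = 0; cls[i] != '\0'; ) {
        unsigned char lo = (unsigned char)cls[i];
        unsigned char hi = lo;
        if (cls[i + 1] == '-' && cls[i + 2] != '\0') {
            hi = (unsigned char)cls[i + 2];   /* form "lo-hi" */
            i += 3;
        } else {
            i += 1;                           /* single character */
        }
        if (hi < lo || n + 2 > 16)
            return -1;                        /* not a simple range set */
        out[n++] = (char)lo;
        out[n++] = (char)hi;
    }
    return n;
}
```

A regex engine could fall back to its normal matching path whenever this returns -1 or the CPU lacks SSE4.2.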

There is also an article from Intel where they demo the use of these
instructions for XML parsing.

http://software.intel.com/en-us/articles/xml-parsing-accelerator-with-intel-streaming-simd-extensions-4-intel-sse4/


