IDEA: Text search engine tailored to a specific schema

Casey Sybrandy via Digitalmars-d digitalmars-d at puremagic.com
Fri Apr 17 07:21:21 PDT 2015


I was thinking of something a bit more specific, without having
to manually generate the structs.

For example, let's say I have a JSON document that has a number
of fields in it.  Some are numbers, some are strings, etc.  What
I'm thinking is that, based on either a) the JSON structure or
b) a schema that describes the JSON, the objects and/or indices
are defined at compile-time in a way that is optimal for that
structure.  For example, if we know from the schema that a field
is an enumeration, then instead of an inverted index, a simple
associative array that maps each value to an array of matching
document IDs is used.  This way, a search on that specific field
can be done in the most efficient way possible.  Also, the
documents themselves would be stored more optimally.
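
To make the enumeration case concrete, here is a rough sketch of
picking the index layout per field from a schema.  It is in Python
purely for illustration, so the choice happens at run time rather
than at compile-time as proposed above.  SCHEMA, SchemaIndex and the
"enum"/"text" field kinds are all hypothetical names made up for
this example, not part of any existing library.

```python
from collections import defaultdict

# Hypothetical schema: field name -> kind.  "enum" fields take one of a
# few fixed values; "text" fields hold free-form text.
SCHEMA = {"status": "enum", "body": "text"}


class SchemaIndex:
    def __init__(self, schema):
        self.schema = schema
        # One index per field; each maps a key to a list of document IDs.
        self.indices = {field: defaultdict(list) for field in schema}

    def add(self, doc_id, doc):
        for field, kind in self.schema.items():
            value = doc[field]
            if kind == "enum":
                # Enumeration: the whole value is the key, no tokenizing.
                self.indices[field][value].append(doc_id)
            else:
                # Free text: inverted index keyed by lowercased token.
                for token in set(value.lower().split()):
                    self.indices[field][token].append(doc_id)

    def search(self, field, term):
        # Enum lookups are exact; text lookups are case-insensitive.
        key = term if self.schema[field] == "enum" else term.lower()
        return list(self.indices[field].get(key, []))


idx = SchemaIndex(SCHEMA)
idx.add(1, {"status": "open", "body": "Compile time index generation"})
idx.add(2, {"status": "closed", "body": "Index lookup at run time"})
idx.add(3, {"status": "open", "body": "Unrelated text"})

print(idx.search("status", "open"))  # [1, 3]
print(idx.search("body", "index"))   # [1, 2]
```

The enum branch skips tokenizing entirely and does a single
associative array lookup, which is the saving described above.  In a
compile-time version the branch on the field kind would disappear
from the generated code altogether.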

So, no, this isn't an ORM, as I'm not mapping objects to an
underlying data store.  I guess what I'm thinking of is the text
search equivalent of the regular expression engine.  Thinking
about it now, I should have mentioned that this would be like
Sphinx/Lucene/ElasticSearch, except that it would be optimized
for a specific document structure rather than being general
purpose.  The optimizations would be generated at compile-time
from a sample document structure or schema instead of being
coded manually.

