mimeapps - finding association between MIME types and applications

FreeSlave via Digitalmars-d-announce digitalmars-d-announce at puremagic.com
Sat Apr 16 13:55:04 PDT 2016


On Saturday, 16 April 2016 at 19:34:52 UTC, Eugene Wissner wrote:
>
> Wow. I just wanted to port libmagic since need it. Can you 
> write a short introduction how I can work with the magic 
> database (defining mime type of a file based on its content)?

MIME type detection is usually done by parsing mime.cache files.
These are binary files that can be mapped into memory. They are
generated by update-mime-database from source packages (XML
files, see
https://specifications.freedesktop.org/shared-mime-info-spec/shared-mime-info-spec-0.18.html#idm140001680036896 )

Here's the format spec:
https://specifications.freedesktop.org/shared-mime-info-spec/shared-mime-info-spec-0.18.html#idm140001675194688

The code in the 'mime' library responsible for parsing such files:
https://github.com/MyLittleRobo/mime/blob/master/source/mime/cache.d
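To illustrate the mapping step, here is a minimal Python sketch (not the 'mime' library's code) that maps a mime.cache file and reads its header. It assumes the header layout from the 0.18 spec: two big-endian 16-bit version fields followed by nine big-endian 32-bit section offsets. Older cache versions may have a shorter header. The names read_header, read_cache_header and SECTIONS are made up for this example.

```python
import mmap
import struct

# Assumed header layout (shared-mime-info spec 0.18): two big-endian CARD16
# version fields, then nine big-endian CARD32 offsets to the sections.
HEADER = struct.Struct(">HH9I")
SECTIONS = ("alias_list", "parent_list", "literal_list",
            "reverse_suffix_tree", "glob_list", "magic_list",
            "namespace_list", "icons_list", "generic_icons_list")


def read_header(buf):
    """Return (major, minor, {section name: offset}) for a mime.cache buffer."""
    major, minor, *offsets = HEADER.unpack_from(buf, 0)
    return major, minor, dict(zip(SECTIONS, offsets))


def read_cache_header(path):
    """Map the file read-only and parse its header (hypothetical helper)."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return read_header(mapped)
```

The "magic_list" offset returned here is where the MagicList described below starts. On a typical Linux system the file to pass would be something like /usr/share/mime/mime.cache.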

A mime.cache file has a MagicList entry that stores the magic
rules for all types.
MagicList consists of Match entries sorted by priority. Each Match
holds the name of the MIME type it relates to and has Matchlet
entries as children, which in turn may have other Matchlets as
children (so it's a tree). Each Matchlet describes one part of a
magic rule: the content to match and the position in the file
where that content must be found for the file to be of this
type. This information is also stored in a separate 'magic' file.
The options are described in the spec:
https://specifications.freedesktop.org/shared-mime-info-spec/shared-mime-info-spec-0.18.html#idm140001675229440
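As a rough sketch of what testing a single Matchlet against file contents can look like (children ignored here), assuming the spec's range-start, range-length and optional mask fields. The function name and parameters are invented for this example. The mask is applied to both sides before comparing, which is an assumption about the intended semantics; check the spec for the details.

```python
def matchlet_matches(data, value, range_start, range_length=1, mask=None):
    """Return True if `value` is found in `data` at some start offset in
    [range_start, range_start + range_length)."""
    n = len(value)
    if mask is not None:
        # Assumption: only the bits selected by the mask are compared.
        value = bytes(v & m for v, m in zip(value, mask))
    for start in range(range_start, range_start + range_length):
        window = data[start:start + n]
        if len(window) < n:
            return False  # ran past the end of the data; later offsets fail too
        if mask is not None:
            window = bytes(b & m for b, m in zip(window, mask))
        if window == value:
            return True
    return False
```

For example, matchlet_matches(content, b"\x7fELF", 0) checks for the ELF signature at the very start of the file.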

Sibling Matchlets have OR logic, while a child Matchlet is only
checked once its parent has matched. So if any path through the
tree matches the file contents, the file is of the type named in
this Match.

To see the recursive nature of the rules, look at the
definition of application/x-executable or application/x-sharedlib
in /usr/share/mime/packages/freedesktop.org.xml. There the <magic>
element corresponds to a Match entry in mime.cache and the <match>
elements correspond to Matchlet entries.

So the algorithm is:

1. Iterate over the Match entries in MagicList.
2. For every Match, iterate over its Matchlets.
3. Recursively apply each Matchlet rule and its child rules to the
file content.
4. If some tree path matches the file contents, the MIME type for
this file is found (you don't need to check the following Match
entries, since they have lower or equal priority). Otherwise go to
the next Match in MagicList.
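The steps above can be sketched in Python on an in-memory tree. This is an illustration only, not the 'mime' library's code: the Match and Matchlet classes, detect and the simplified ELF rules are made up for the example, each Matchlet tests a single fixed offset with no range or mask, and the list is assumed to be already sorted by descending priority.

```python
from dataclasses import dataclass, field


@dataclass
class Matchlet:
    offset: int            # position in the file where `value` must appear
    value: bytes           # content to match
    children: list = field(default_factory=list)


@dataclass
class Match:
    priority: int
    mime_type: str
    matchlets: list


def matchlet_applies(m, data):
    """True if this Matchlet matches and, when it has children,
    at least one child path matches as well."""
    if data[m.offset:m.offset + len(m.value)] != m.value:
        return False
    return not m.children or any(matchlet_applies(c, data) for c in m.children)


def detect(magic_list, data):
    """Return the MIME type of the first Match with a matching tree path."""
    for match in magic_list:  # assumed sorted by descending priority
        if any(matchlet_applies(m, data) for m in match.matchlets):
            return match.mime_type
    return None


def elf_rule(e_type):
    # Simplified: ELF signature, little-endian marker at 5, type field at 16.
    return Matchlet(0, b"\x7fELF", [Matchlet(5, b"\x01", [Matchlet(16, e_type)])])


MAGIC = [
    Match(50, "application/x-sharedlib", [elf_rule(b"\x03\x00")]),
    Match(50, "application/x-executable", [elf_rule(b"\x02\x00")]),
]
```

Calling detect(MAGIC, content) with the first bytes of a file returns the type name of the first Match whose tree has a fully matching path, or None if no rule applies.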

See the source code in the 'mime' library responsible for this task:
https://github.com/MyLittleRobo/mime/blob/master/source/mime/cache.d#L463

Note that I did not describe how to determine the MIME type when
there is more than one mime.cache file, or how to handle
conflicts and explicitly deleted magic rules. Here's the source
code, though:
https://github.com/MyLittleRobo/mime/blob/master/source/mime/detectors/cache.d#L210

Please read the spec.

