Proof of concept: automatic extraction of gettext-style translation strings

H. S. Teoh hsteoh at quickfur.ath.cx
Thu Apr 2 13:01:09 UTC 2020


This morning a neat idea occurred to me for a gettext-like system in D
that automatically and reliably extracts all translation strings from a
program, without needing an external parser to run over the program's
source code.

Traditionally, gettext requires an external tool to parse the source
code and extract translatable strings.  In D, however, we can (1) pass
the format string to gettext() as a compile-time argument, which then
lets us (2) use static this() to register every format string in a
central dictionary at program startup, regardless of whether the
corresponding gettext() call is ever executed at runtime.  (3) Wrap that
in a version() condition, and the compiler does the string extraction
for you, with no external source code parser needed.

Here's a proof of concept:

	// ------------------------------------------------------------------
	// File: lang.d
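	version(extractStr) {
		// Every format string passed to gettext!(), keyed by the string.
		int[string] allStrings;

		// Extraction mode: print a blank translation table instead of
		// running the program proper.
		void main() {
			import std.algorithm;
			import std.stdio;
			auto s = allStrings.keys;
			s.sort();
			writefln("string[string] dict = [\n%(\t%s: \"\",\n%|%)];", s);
		}
	}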
	
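	template gettext(string fmt, Args...)
	{
		// Registers fmt at program startup for each instantiation,
		// whether or not the call site is ever executed.
		version(extractStr)
		static this() {
			allStrings[fmt]++;
		}
		string gettext(Args args) {
			import std.format;
			return format(fmt, args);
		}
	}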

	// ------------------------------------------------------------------
	// File: main.d
	import mod1, mod2;
	
	version(extractStr) {} else
	void main() {
		auto names = [ "Joe", "Schmoe", "Jane", "Doe" ];
		foreach (i; 0 .. names.length) {
			fun1(names[i]);
			fun2(5 + cast(int)i*10);
		}
	}

	// ------------------------------------------------------------------
	// File: mod1.d
	import std.stdio;
	import lang;
	
	void fun1(string name) {
		writeln(gettext!"Hello! My name is %s."(name));
	}

	// ------------------------------------------------------------------
	// File: mod2.d
	import std.stdio;
	import lang;
	
	void fun2(int num) {
		writeln(gettext!"I'm counting %d apples."(num));
	}
	
	// Never called from main(), but its string is still extracted.
	void fun3() {
		writeln(gettext!"Never called, but nevertheless registered!"());
	}


Running the program normally with `dmd -i -run main.d` produces the
output:

	Hello! My name is Joe.
	I'm counting 5 apples.
	Hello! My name is Schmoe.
	I'm counting 15 apples.
	Hello! My name is Jane.
	I'm counting 25 apples.
	Hello! My name is Doe.
	I'm counting 35 apples.


Format strings can be extracted by compiling with -version=extractStr:

	dmd -i -version=extractStr -run main.d

which produces a template for translating the format strings into
another language:

	string[string] dict = [
		"Hello! My name is %s.": "",
		"I'm counting %d apples.": "",
		"Never called, but nevertheless registered!": "",
	];


The idea is that in a real implementation, gettext() would look up
the format string in an l10n file containing a filled-out instance of
the above dictionary and map it to the target language.  One could also
write a fancier extractStr that merges new format strings into an
existing translated file, so that l10n files can be continually updated
as development proceeds.
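
To make that a bit more concrete, here is a rough sketch of what the
lookup and the merge step might look like.  This is not part of the proof
of concept above: the helpers translate() and mergeStrings(), the
hardcoded dict, and the Dutch entries are all made up for illustration.
A real implementation would load the dictionary from an l10n file.

```d
import std.format : format;
import std.stdio : writeln;

// Hypothetical filled-out copy of the generated template (Dutch here).
// A real program would load this from an l10n file for the active locale.
string[string] dict;

static this() {
	dict = [
		"Hello! My name is %s.": "Hallo! Mijn naam is %s.",
		"I'm counting %d apples.": "Ik tel %d appels.",
	];
}

// Hypothetical helper: return the translation of fmt, or fmt itself if
// the entry is missing or still empty.
string translate(string fmt) {
	auto p = fmt in dict;
	return (p !is null && (*p).length > 0) ? *p : fmt;
}

template gettext(string fmt, Args...)
{
	// The version(extractStr) registration from lang.d would go here.
	string gettext(Args args) {
		// The translated format string is only known at runtime.
		return format(translate(fmt), args);
	}
}

// Hypothetical helper for a merging extractStr: keep translations that
// already exist, add empty entries for new strings, drop stale ones.
string[string] mergeStrings(string[] extracted, string[string] existing) {
	string[string] merged;
	foreach (s; extracted) {
		auto p = s in existing;
		merged[s] = (p !is null) ? *p : "";
	}
	return merged;
}

void main() {
	writeln(gettext!"Hello! My name is %s."("Joe"));
	writeln(gettext!"I'm counting %d apples."(5));
	// No dictionary entry: falls back to the original string.
	writeln(gettext!"Never called, but nevertheless registered!"());

	// Existing translation is kept; the new string gets an empty entry.
	auto merged = mergeStrings(["Hello! My name is %s.", "A new string."], dict);
	writeln(merged);
}
```

Note that the translated format string is now checked only at runtime, so
a translation whose format specifiers don't match the arguments would
make format() throw instead of being caught by the compiler.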

The best thing about this is that no additional tooling is required:
the string extraction is done completely within D, by the same compiler
that builds the program, so it is not prone to bugs in an external
parser.


T

-- 
Computerese Irregular Verb Conjugation: I have preferences.  You have biases.  He/She has prejudices. -- Gene Wirchenko

