Symbol lookup rules and imports

H. S. Teoh via Digitalmars-d digitalmars-d at puremagic.com
Tue Dec 2 14:00:09 PST 2014


Recently, in a bid to reduce the messy tangle that is the web of
interdependencies among Phobos modules, many module-scope imports have
been replaced with scoped imports. In addition to reducing gratuitous
dependencies (the scoped import is not processed in a template body
until the template is actually needed, so if user code doesn't need that
template, then it won't pull in a whole bunch of unnecessary Phobos
modules), scoped imports also make the code easier to refactor -- you
can move things around more easily without breaking it.

However, there are major issues with scoped imports currently,
that make this otherwise ideal solution less-than-ideal, which stem from
the way 'import' is implemented in D. When the compiler encounters an
'import' statement, what it essentially does is to parse the symbols in
the target module, and add them to the symbol table for the current
scope. While this is a simple, straightforward implementation method, it
leads to the following problems:

1) Consider this code:

	// Quiz for the reader: what does this code do?
	void myFunc(string text, string number) {
		import std.conv;
		import std.stdio;

		auto x = to!int(number);
		writeln(text, x);
	}
	void main() {
		myFunc("The count is: ", "123");
	}

Here's the output:

	123

The reason is that `import std.conv;` pulls in *all* the symbols from
std.conv into the local scope, including the function called "text".
This causes "text" (i.e., std.conv.text) to shadow the parameter "text",
so `writeln(text, x);` is interpreted as:

	std.stdio.writeln(std.conv.text(), x);

so the parameter "text" is never used. This is just a simple example;
the problem becomes much harder to trace when the shadowed symbol is not
this close to the import scope:

	struct S {
		string text;
		...
		void method() {
			import std.conv;
			...
			writeln(text); // uses std.conv.text, not this.text
		}
	}

Worse yet, suppose we have this code, that today works correctly:

	struct S {
		string buf;
		void format() {
			buf = ... /* format S's data into string form */
		}
		void method() {
			import std.file;
			...		// do some processing based on files
			format();	// calls this.format()
		}
	}

Now, suppose at some point in the future, we introduce a function also
called format() in std.file, that formats your hard drive. The user code
is unchanged. What happens when you recompile the program and run it? It
will compile successfully... and then format your hard drive without any
warning.

This is the crux of the issue behind:

	https://issues.dlang.org/show_bug.cgi?id=10378


2) OK, so the conclusion is that unqualified scoped imports are
dangerous, right? Alright, let's see what happens when we use qualified
imports:

	// mod.d
	module mod;
	struct S {
		// Use a fully-qualified import.
		// We place it in the body of S because S's methods
		// repeatedly needs it -- after all, DRY is good, right?
		import std.format : format;

		void method1(string fmt) {
			writeln(format(fmt, ... ));
		}

		void method2() {
			auto s = format("abc %s def", ...);
			...
		}
	}

	// main.d
	module main;
	import mod; // we need the definition of S

	void format(S s) {
		... /* do something with s */
	}

	void main() {
		S s;
		s.format(); // UFCS -- should call main.format(s), right?
	}

Wrong. That last line in main() actually calls std.format.format. Why?
Because in mod.d, the `import std.format : format` declaration actually
pulls in std.format.format into the symbol table of S, therefore,
S.format now refers to std.format.format. This, therefore, hijacks the
intended call to main.format.

This is: https://issues.dlang.org/show_bug.cgi?id=13808


3) OK, so putting scoped, qualified imports in struct bodies is a bad
idea, they should always go into the method bodies, then we'll be OK,
right?  Sure... until you need to do this:

	// A
	struct S {
		// B
		void method(R)(R range)
			if (isInputRange!R)
		{
			// C
		}
	}

This will not compile unless you import std.range : isInputRange
somewhere.  Putting it in the line marked //A introduces a module-global
dependency on std.range, which is what we're trying to eliminate. So
that's a no-go.  Putting in //C is also a no-go, because we need
isInputRange in the sig constraint, which is outside the function body.
So the only option is //B. However, if we do that, then the imported
symbols will get pulled into S's symbol table, thus you have this
situation:

	// mymod.d
	module mymod;
	struct S {
		// Now everyone is happy, right...?
		import std.range : isInputRange;

		void method(R)(R range)
			if (isInputRange!R)
		{
		}
	}

	// main.d
	module main;
	import mymod;

	// Well ...
	void main() {
		S s;

		static assert(isInputRange!(int[])); // fails -- as it should

		static assert(s.isInputRange!(int[])); // works -- WAT?
	}

So even though isInputRange is really only required by the
*implementation* of S, yet the consequence of the compiler's
implementation of import is that it pollutes S's namespace with the
imported symbols in such a way that external code can access said
symbols using S as a proxy -- even if S was never intended to act as
such a proxy!!

And if you used an unqualified import instead, the problem is
compounded:

	// mymod.d
	module mymod;

	struct S {
		import std.algorithm; // oh yes
		...
		void method() {
			// hey, it does make code easier to write, we
			// don't have to keep importing std.algorithm
			// all over the place:
			auto x = map!(...).filter!(...).sort();
			...
		}
		...
	}

	// main.d
	module main;

	void main() {
		// hey, we only need the definition of S, right?
		import mymod : S;

		S s;

		// But we can access all sorts of stuff through S...
		// (Not the mention UFCS hijacking happening everywhere!)
		auto r = s.map!(...);
		auto t = s.filter!(...);
		auto u = s.canFind(...);

		// when they shouldn't even work at all!!
		auto v = canFind(s, ...); // <--- compile error
	}


4) This isn't the end of the story. There's also this lovely bug:

	https://issues.dlang.org/show_bug.cgi?id=1238

which, as its number should tell you, has been around for a LONG time.
Executive summary:

	// mymod.d
	module mymod;
	private int impl;

	// main.d
	module main;
	import mymod;

	void impl() { ... }

	void main() {
		impl(); // Error: main.impl conflicts with mymod.impl (WAT?)
	}

Basically, the underlying cause is the same. Import pulls the symbols of
mymod.d into the current scope (including private ones -- which the
compiler currently has to, since otherwise it couldn't compile mymod's
references to private symbols). So when you try to reference 'impl', the
overload set contains both main.impl and mymod.impl, which is an
ambiguity error.

This doesn't sound too bad, until you consider that mymod.d could be an
external library that, originally, did *not* define a symbol called
"impl". So your code happily compiles and works, and then you upgrade to
the next release of the library, and suddenly you get a compile error,
because the library authors introduced a PRIVATE symbol into the
library.


In all of the above cases, the compiler's current "simple"
implementation of lookup rules and imports is creating a ripple effect
of loopholes and corner cases that all add up to poor encapsulation and
maintainability problems: local imports in structs makes imported
symbols visible to unrelated outside code; private symbols in imported
modules conflict with local symbols; even scoped, qualified imports
(ostensibly the "cleanest" way to import something) can shadow
parameters and enclosing struct/class members.

In order to avoid these issues, one has to adopt increasingly convoluted
ways of writing code based on a series of avoidances -- avoid
module-global imports (introduces needless dependencies), avoid
unqualified imports (shadows parameters), avoid putting imports in
struct/class bodies (leaks imported symbols to outside world), etc..
This is beginning to sound like C++ all over again (don't use malloc,
use new; don't use new for arrays, use new[]; don't use raw pointers,
use std::auto_ptr; etc....). :-(

I'm finding it harder and harder to accept Walter's stance that symbol
lookups should be kept simple and free from complications and convoluted
corner cases, etc.. Except that it is *already* full of convoluted
pitfalls and corner cases you must avoid, as illustrated above. I wish
we would just acknowledge that the current symbol lookup / import rules
(or at least the implementation thereof) are inadequate, and find a
clean solution to this long-standing issue, instead of hoping that
denying the problem will somehow, magically, make it all go away.


T

-- 
"Holy war is an oxymoron." -- Lazarus Long


More information about the Digitalmars-d mailing list