Symbol lookup rules and imports

Fri Dec 5 04:24:02 PST 2014

On 02/12/2014 22:00, H. S. Teoh via Digitalmars-d wrote:
> Recently, in a bid to reduce the messy tangle that is the web of
> interdependencies among Phobos modules, many module-scope imports have
> been replaced with scoped imports. In addition to reducing gratuitous
> dependencies (the scoped import is not processed in a template body
> until the template is actually needed, so if user code doesn't need that
> template, then it won't pull in a whole bunch of unnecessary Phobos
> modules), scoped imports also make the code easier to refactor -- you
> can move things around more easily without breaking it.
>
> However, there are major issues with scoped imports currently,
> that make this otherwise ideal solution less-than-ideal, which stem from
> the way 'import' is implemented in D. When the compiler encounters an
> 'import' statement, what it essentially does is to parse the symbols in
> the target module, and add them to the symbol table for the current
> scope. While this is a simple, straightforward implementation method, it
> leads to the following problems:
>
> 1) Consider this code:
>
> 	// Quiz for the reader: what does this code do?
> 	void myFunc(string text, string number) {
> 		import std.conv;
> 		import std.stdio;
>
> 		auto x = to!int(number);
> 		writeln(text, x);
> 	}
> 	void main() {
> 		myFunc("The count is: ", "123");
> 	}
>
> Here's the output:
>
> 	123
>
> The reason is that `import std.conv;` pulls in *all* the symbols from
> std.conv into the local scope, including the function called "text".
> This causes "text" (i.e., std.conv.text) to shadow the parameter "text",
> so `writeln(text, x);` is interpreted as:
>
> 	std.stdio.writeln(std.conv.text(), x);
>
> so the parameter "text" is never used. This is just a simple example;
> the problem becomes much harder to trace when the shadowed symbol is not
> this close to the import scope:
>
> 	struct S {
> 		string text;
> 		...
> 		void method() {
> 			import std.conv;
> 			...
> 			writeln(text); // uses std.conv.text, not this.text
> 		}
> 	}
>
> Worse yet, suppose we have this code, that today works correctly:
>
> 	struct S {
> 		string buf;
> 		void format() {
> 			buf = ... /* format S's data into string form */
> 		}
> 		void method() {
> 			import std.file;
> 			...		// do some processing based on files
> 			format();	// calls this.format()
> 		}
> 	}
>
> Now, suppose at some point in the future, we introduce a function also
> called format() in std.file, that formats your hard drive. The user code
> is unchanged. What happens when you recompile the program and run it? It
> will compile successfully... and then format your hard drive without any
> warning.
>
> This is the crux of the issue behind:
>
> 	https://issues.dlang.org/show_bug.cgi?id=10378
>
>
> 2) OK, so the conclusion is that unqualified scoped imports are
> dangerous, right? Alright, let's see what happens when we use qualified
> imports:
>
> 	// mod.d
> 	module mod;
> 	struct S {
> 		// Use a fully-qualified import.
> 		// We place it in the body of S because S's methods
> 		// repeatedly needs it -- after all, DRY is good, right?
> 		import std.format : format;
>
> 		void method1(string fmt) {
> 			writeln(format(fmt, ... ));
> 		}
>
> 		void method2() {
> 			auto s = format("abc %s def", ...);
> 			...
> 		}
> 	}
>
> 	// main.d
> 	module main;
> 	import mod; // we need the definition of S
>
> 	void format(S s) {
> 		... /* do something with s */
> 	}
>
> 	void main() {
> 		S s;
> 		s.format(); // UFCS -- should call main.format(s), right?
> 	}
>
> Wrong. That last line in main() actually calls std.format.format. Why?
> Because in mod.d, the `import std.format : format` declaration actually
> pulls in std.format.format into the symbol table of S, therefore,
> S.format now refers to std.format.format. This, therefore, hijacks the
> intended call to main.format.
>
> This is: https://issues.dlang.org/show_bug.cgi?id=13808
>
>
> 3) OK, so putting scoped, qualified imports in struct bodies is a bad
> idea, they should always go into the method bodies, then we'll be OK,
> right?  Sure... until you need to do this:
>
> 	// A
> 	struct S {
> 		// B
> 		void method(R)(R range)
> 			if (isInputRange!R)
> 		{
> 			// C
> 		}
> 	}
>
> This will not compile unless you import std.range : isInputRange
> somewhere.  Putting it in the line marked //A introduces a module-global
> dependency on std.range, which is what we're trying to eliminate. So
> that's a no-go.  Putting in //C is also a no-go, because we need
> isInputRange in the sig constraint, which is outside the function body.
> So the only option is //B. However, if we do that, then the imported
> symbols will get pulled into S's symbol table, thus you have this
> situation:
>
> 	// mymod.d
> 	module mymod;
> 	struct S {
> 		// Now everyone is happy, right...?
> 		import std.range : isInputRange;
>
> 		void method(R)(R range)
> 			if (isInputRange!R)
> 		{
> 		}
> 	}
>
> 	// main.d
> 	module main;
> 	import mymod;
>
> 	// Well ...
> 	void main() {
> 		S s;
>
> 		static assert(isInputRange!(int[])); // fails -- as it should
>
> 		static assert(s.isInputRange!(int[])); // works -- WAT?
> 	}
>
> So even though isInputRange is really only required by the
> *implementation* of S, yet the consequence of the compiler's
> implementation of import is that it pollutes S's namespace with the
> imported symbols in such a way that external code can access said
> symbols using S as a proxy -- even if S was never intended to act as
> such a proxy!!
>
> And if you used an unqualified import instead, the problem is
> compounded:
>
> 	// mymod.d
> 	module mymod;
>
> 	struct S {
> 		import std.algorithm; // oh yes
> 		...
> 		void method() {
> 			// hey, it does make code easier to write, we
> 			// don't have to keep importing std.algorithm
> 			// all over the place:
> 			auto x = map!(...).filter!(...).sort();
> 			...
> 		}
> 		...
> 	}
>
> 	// main.d
> 	module main;
>
> 	void main() {
> 		// hey, we only need the definition of S, right?
> 		import mymod : S;
>
> 		S s;
>
> 		// But we can access all sorts of stuff through S...
> 		// (Not the mention UFCS hijacking happening everywhere!)
> 		auto r = s.map!(...);
> 		auto t = s.filter!(...);
> 		auto u = s.canFind(...);
>
> 		// when they shouldn't even work at all!!
> 		auto v = canFind(s, ...); // <--- compile error
> 	}
>
>
> 4) This isn't the end of the story. There's also this lovely bug:
>
> 	https://issues.dlang.org/show_bug.cgi?id=1238
>
> which, as its number should tell you, has been around for a LONG time.
> Executive summary:
>
> 	// mymod.d
> 	module mymod;
> 	private int impl;
>
> 	// main.d
> 	module main;
> 	import mymod;
>
> 	void impl() { ... }
>
> 	void main() {
> 		impl(); // Error: main.impl conflicts with mymod.impl (WAT?)
> 	}
>
> Basically, the underlying cause is the same. Import pulls the symbols of
> mymod.d into the current scope (including private ones -- which the
> compiler currently has to, since otherwise it couldn't compile mymod's
> references to private symbols). So when you try to reference 'impl', the
> overload set contains both main.impl and mymod.impl, which is an
> ambiguity error.
>
> This doesn't sound too bad, until you consider that mymod.d could be an
> external library that, originally, did *not* define a symbol called
> "impl". So your code happily compiles and works, and then you upgrade to
> the next release of the library, and suddenly you get a compile error,
> because the library authors introduced a PRIVATE symbol into the
> library.
>
>
> In all of the above cases, the compiler's current "simple"
> implementation of lookup rules and imports is creating a ripple effect
> of loopholes and corner cases that all add up to poor encapsulation and
> maintainability problems: local imports in structs makes imported
> symbols visible to unrelated outside code; private symbols in imported
> modules conflict with local symbols; even scoped, qualified imports
> (ostensibly the "cleanest" way to import something) can shadow
> parameters and enclosing struct/class members.
>
> In order to avoid these issues, one has to adopt increasingly convoluted
> ways of writing code based on a series of avoidances -- avoid
> module-global imports (introduces needless dependencies), avoid
> unqualified imports (shadows parameters), avoid putting imports in
> struct/class bodies (leaks imported symbols to outside world), etc..
> This is beginning to sound like C++ all over again (don't use malloc,
> use new; don't use new for arrays, use new[]; don't use raw pointers,
> use std::auto_ptr; etc....). :-(
>
> I'm finding it harder and harder to accept Walter's stance that symbol
> lookups should be kept simple and free from complications and convoluted
> corner cases, etc.. Except that it is *already* full of convoluted
> pitfalls and corner cases you must avoid, as illustrated above. I wish
> we would just acknowledge that the current symbol lookup / import rules
> (or at least the implementation thereof) are inadequate, and find a
> clean solution to this long-standing issue, instead of hoping that
> denying the problem will somehow, magically, make it all go away.
>
>
> T
>

I agree too.

I noticed this too when I was implementing the symbol lookup rules in 
the semantic engine for DDT. It was very weird to me that the imports 
would make symbols accessible not only during lexical lookup, but during 
qualified lookups as well :S .

You said that Walter wanted the symbol lookup rules to be as simple as 
possible. Well, a lot of that bad design could be fixed by simply 
stating that import symbols are only visibile in lexical lookup.

It wouldn't solve all issues, (namely example 1 there) but then we could 
just use qualified imports instead (and the issues with those would be 
fixed).

-- 
Bruno Medeiros
https://twitter.com/brunodomedeiros