GSoC 2018 - Your project ideas

Wed Dec 13 20:13:17 UTC 2017

On Wed, Dec 13, 2017 at 07:50:44PM +0000, bpr via Digitalmars-d-announce wrote:
> On Tuesday, 5 December 2017 at 18:20:40 UTC, Seb wrote:
[...]
> Of the projects in [2], I like the general purpose betterC libraries
> most, and I think it's something where students could make a real
> impact in that time period.
[...]
> > [2] https://wiki.dlang.org/GSOC_2018_Ideas

The "Who's (using) who?" project can use my symbol dependency tool as a
starting point:

	https://github.com/quickfur/symdep

Basically, as it stands, it can extract the list of symbols from the
program and determine which symbols reference which others, where "A
references B" means that the disassembled code between A and the next
symbol in the executable contains a reference to an address somewhere
between B and the next symbol after B.  This is done by inspecting the
output of the `objdump` tool.  The list of dependencies can be produced
either as text or in Graphviz .dot format, which can be passed to a
Graphviz layout tool such as dot or neato to produce a graphical chart
of symbol dependencies.
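
To make the "A references B" rule concrete, here is a minimal sketch in
Python. It is not symdep's actual code: the names `owner`, `build_edges`
and `to_dot`, the sample addresses, and the choice to drop
self-references are all assumptions made for illustration. It assumes
the symbol table and the (source address, target address) pairs have
already been pulled out of the disassembly.

```python
from bisect import bisect_right

def owner(symbols, addr):
    """Return the name of the symbol whose range contains addr.

    symbols is a list of (start_address, name) sorted by address; each
    symbol is taken to extend up to the start of the next one. In this
    sketch the last symbol has no upper bound.
    """
    starts = [start for start, _ in symbols]
    i = bisect_right(starts, addr) - 1
    return symbols[i][1] if i >= 0 else None

def build_edges(symbols, refs):
    """Turn (source_addr, target_addr) pairs into symbol-level edges."""
    edges = set()
    for src, dst in refs:
        a, b = owner(symbols, src), owner(symbols, dst)
        # Assumption: references inside one symbol are not interesting.
        if a is not None and b is not None and a != b:
            edges.add((a, b))
    return sorted(edges)

def to_dot(edges):
    """Render the edges as a Graphviz digraph."""
    lines = ["digraph symdep {"]
    for a, b in edges:
        lines.append(f'    "{a}" -> "{b}";')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical symbol table and references, for illustration only.
symbols = [(0x1000, "_Dmain"), (0x1040, "foo"), (0x1080, "bar")]
refs = [(0x1010, 0x1040), (0x1050, 0x1090), (0x1060, 0x1044)]
edges = build_edges(symbols, refs)
print(to_dot(edges))
```

Note that `owner` attributes any address to the nearest known symbol at
or before it, so an address that really belongs to a symbol missing from
the table gets credited to whatever known symbol precedes it.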

As of now, the following are possible points of improvement:

- Make it work on Windows and other OSes that don't have the `objdump`
  utility;

- Add better capability to limit the output to a subgraph of the full
  graph. Because of the huge number of symbols in a typical D program,
  outputting the entire dependency graph will produce a graph far too
  large to be easily understood.

  Currently, symdep has the capability of restricting the output to the
  subgraph of symbols reachable from a certain given symbol (useful for
  answering "what does function foo call?"), or the subgraph of symbols
  NOT reachable from a certain given symbol (e.g., "what are the symbols
  that aren't reachable from _Dmain?").  However, in medium-to-large D
  programs, the resulting subgraph is still far too large to be useful,
  so a better way of selecting a subgraph would be nice.  Adding a
  maximum recursion depth to the existing subgraph functions might be a
  good start, e.g., "what are the symbols referenced by _Dmain up to 3
  levels down the call chain / reference graph?".

- Better accuracy for dependency detection. Currently, the results are
  not always accurate: if a module has private / static symbols that
  don't appear as public symbols in the executable, symdep can't tell
  when a reference actually points to such a private symbol, and will
  blindly attribute it to the closest public symbol that comes before
  the private symbol in the executable.  The output graph then shows an
  edge to the wrong symbol.

  Also, some references that go through indirection may not be detected
  correctly, e.g., if function F calls function G via a function pointer
  table or thunk. (I think the function table case should still work, as
  long as the function table itself has a public symbol; it will just
  show up in the output as F -> tableSym -> G. But this has not been
  rigorously tested.)

- Currently, symdep does not distinguish between code symbols and data
  symbols.  For its stated purpose (i.e., finding unexpected dependencies
  on Phobos modules that seemingly aren't used), this is not necessarily
  a bad thing. But being able to tell the difference helps to make the
  output more readable, e.g., use different node shapes for code vs.
  data symbols; it also allows subgraph queries to be restricted to a
  particular node type (show me the call graph vs. show me the data
  dependency graph), etc.
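
The depth limit suggested above for subgraph selection could look
roughly like the following Python sketch. It is only an illustration
under assumptions, not symdep code: the graph is taken to be a plain
dict mapping each symbol to the symbols it references, and
`reachable_within` and `subgraph_edges` are made-up names.

```python
from collections import deque

def reachable_within(graph, root, max_depth):
    """Breadth-first search from root, stopping max_depth levels down.

    Returns a dict mapping each symbol found to its distance from root.
    """
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand past the depth limit
        for nxt in graph.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return depth

def subgraph_edges(graph, keep):
    """Keep only the edges whose endpoints are both in keep."""
    return sorted((a, b) for a in keep
                  for b in graph.get(a, ()) if b in keep)

# Hypothetical reference graph, for illustration only.
graph = {
    "_Dmain": ["foo", "bar"],
    "foo": ["baz"],
    "bar": ["baz"],
    "baz": ["qux"],
    "qux": ["_Dmain"],
}
near = reachable_within(graph, "_Dmain", 2)
for a, b in subgraph_edges(graph, near):
    print(a, "->", b)
```

Breadth-first order matters here: it guarantees each symbol is recorded
at its shortest distance from the root, so a symbol reachable in 2 steps
is not dropped just because a longer path to it was explored first.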

T

-- 
Dogs have owners ... cats have staff. -- Krista Casada