code databases for ai

monkyyy crazymonkyyy at gmail.com
Sat Dec 13 23:27:40 UTC 2025


I started the process of extracting all the code from the forums 
3 weeks ago:

https://github.com/crazymonkyyy/dlangforums (I know it has some 
flaws; it's AI slop, but semi-functional)

adr's style of code, giant files with example programs in 
comments, needs some amount of processing (qwen doesn't read adr's 
files without being explicitly told to, and I don't know if any of 
the models have the "attention" to handle "simple display").
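A rough sketch of what that processing could look like, assuming the examples sit inside Ddoc comments (`/++ ... +/` or `/** ... */`) and are fenced by lines of three or more hyphens, with no leading `*` gutter. The function name `extract_examples` is made up for illustration, and the regex does not handle nested `/+ +/` comments:

```python
import re
import textwrap

# D documentation comments: /** ... */ and /++ ... +/ (nesting is NOT handled).
DOC_COMMENT = re.compile(r"/\*\*(.*?)\*/|/\+\+(.*?)\+/", re.DOTALL)
# A Ddoc code fence: a line consisting only of three or more hyphens.
FENCE = re.compile(r"^\s*-{3,}\s*$")

def extract_examples(source: str) -> list[str]:
    """Return the code examples found inside doc comments of a D source file."""
    examples = []
    for match in DOC_COMMENT.finditer(source):
        body = match.group(1) if match.group(1) is not None else match.group(2)
        current = None  # None while outside a fenced example
        for line in body.splitlines():
            if FENCE.match(line):
                if current is None:
                    current = []  # opening fence
                else:
                    # closing fence: drop the comment's indentation and store
                    text = textwrap.dedent("\n".join(current)).strip("\n")
                    examples.append(text)
                    current = None
            elif current is not None:
                current.append(line)
    return examples
```

Each returned string could then be written out as its own small file, which is easier to feed to a model than one giant module.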

Extracting links from dub webpages likely isn't that hard.
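For the link extraction, a minimal sketch using only the Python standard library. It assumes the package pages link to their repositories with ordinary `<a href>` tags (I have not checked the actual markup), and `REPO_HOSTS`, `LinkCollector` and `extract_repo_links` are hypothetical names:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: repositories of interest live on one of these hosts.
REPO_HOSTS = {"github.com", "gitlab.com", "bitbucket.org"}

class LinkCollector(HTMLParser):
    """Collects the absolute target of every <a href=...> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page they came from.
            self.links.append(urljoin(self.base_url, href))

def extract_repo_links(html, base_url):
    """Return de-duplicated repository links found in one page, in order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    seen, result = set(), []
    for link in collector.links:
        if urlparse(link).hostname in REPO_HOSTS and link not in seen:
            seen.add(link)
            result.append(link)
    return result
```

Fetching the pages and cloning the results is left out on purpose; that part is where the rate limits and the disk space go.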

My own code is a horrible mess; I never got around to actually 
cleaning up my repos, though I planned on doing that last year, 
and the year before, and the year before that. To say nothing of 
my unnamed gists.

etc.

---

It's a big project to try to collect as much trusted code as 
possible into one organized system. "RAG" is a bit of a meme, but 
seeding a code base with known-good code (compared to AI 
hallucinations, anyway), for a degree of taste and for something 
that actually compiles, is a real technique.

(Don't any of y'all tell me "I told you so" about dub; it will 
still require processing.)

If anyone else is working on pieces, I'd like to know about it. I 
have some theories about how to metaprogram detecting whether a 
struct is a container, whether a function is a range algorithm, 
whether a file is a program, etc.

Has anyone done anything on this subject? Is anyone interested in 
it? It may need a real hosting solution: GitHub has file size 
caps that I ran into with just the forums. If I start extracting 
from dub and then try to host that, GitHub may get quite upset.
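One way to stay under a per-file cap, sketched under the assumption that the corpus is stored as one record per line (for example JSON lines). `shard_records` is a made-up name, and `max_bytes` is whatever limit the host enforces:

```python
def shard_records(records, max_bytes):
    """Group records into shards whose UTF-8 size stays within max_bytes."""
    shards, current, current_size = [], [], 0
    for record in records:
        # Store one record per line, so every record ends with a newline.
        line = record if record.endswith("\n") else record + "\n"
        size = len(line.encode("utf-8"))
        if size > max_bytes:
            raise ValueError("a single record exceeds the shard limit")
        # Start a new shard when this record would push the current one over.
        if current and current_size + size > max_bytes:
            shards.append(current)
            current, current_size = [], 0
        current.append(line)
        current_size += size
    if current:
        shards.append(current)
    return shards
```

Each shard can then be written to its own numbered file. This only works around the per-file cap; the total repository size is a separate problem.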

