vibe.d-lite v0.1.0 powered by photon
Sönke Ludwig
sludwig at outerproduct.org
Thu Sep 25 07:24:00 UTC 2025
On 23.09.25 at 17:35, Dmitry Olshansky wrote:
> On Monday, 22 September 2025 at 11:14:17 UTC, Sönke Ludwig wrote:
>> On 22.09.25 at 09:49, Dmitry Olshansky wrote:
>>> On Friday, 19 September 2025 at 17:37:36 UTC, Sönke Ludwig wrote:
>>>> So you don't support timeouts when waiting for an event at all?
>>>> Otherwise I don't see why a separate API would be required, this
>>>> should be implementable with plain Posix APIs within vibe-core-lite
>>>> itself.
>>>
>>> Photon's API is the syscall interface. So to wait on an event you
>>> just call poll.
>>> Behind the scenes it will just wait on the right fd to change state.
>>>
>>> Now vibe-core-light wants something like read(buffer, timeout), which
>>> is not syscall API but may be added. But since I'm going to add new
>>> API, I'd rather have something consistent and sane, not just a bunch
>>> of ad-hoc functions to satisfy the vibe.d interface.
>>
>> Why can't you then use poll() to implement, for example, `ManualEvent`
>> with timeout and interrupt support? And shouldn't recv() with timeout
>> be implementable the same way, i.e. poll with a timeout and only read
>> when ready?
>
> Yes, recv with timeout is basically poll+recv. The problem is that then
> I need to support interrupts in poll. Nothing really changed.
> As far as the manual event goes, I've implemented that with a custom
> cond var and mutex. That mutex is not interruptible, as it's backed by
> a semaphore on the slow path in the form of an eventfd.
> I might create a custom mutex that is interruptible, I guess, but the
> notion of interrupts would have to be introduced to photon. I do not
> really like it.
I'd probably create an additional event FD per thread that is used to
signal interruption, and also pass that to any poll() that is used for
an interruptible wait.
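Roughly like this, as a minimal sketch for Linux (waitReadable/interrupt
are hypothetical names, not photon or vibe-core API):

import core.sys.linux.sys.eventfd : eventfd, EFD_NONBLOCK;
import core.sys.posix.poll : poll, pollfd, POLLIN;
import core.sys.posix.unistd : read, write;

// One eventfd per thread, used solely to signal interruption.
// Module-level variables are thread-local by default in D, and
// static this() runs once per thread.
private int interruptFD;
static this() { interruptFD = eventfd(0, EFD_NONBLOCK); }

enum WaitResult { ready, timedOut, interrupted }

// Wait until fd becomes readable, the timeout expires, or this
// thread's interrupt event fires.
WaitResult waitReadable(int fd, int timeoutMsecs)
{
    pollfd[2] fds = [
        pollfd(fd, POLLIN, 0),
        pollfd(interruptFD, POLLIN, 0),
    ];
    const n = poll(fds.ptr, fds.length, timeoutMsecs);
    if (n == 0) return WaitResult.timedOut;
    if (fds[1].revents & POLLIN) {
        ulong tmp;
        read(interruptFD, &tmp, tmp.sizeof); // reset the event counter
        return WaitResult.interrupted;
    }
    return WaitResult.ready;
}

// Called from another thread, given the waiter's eventfd.
void interrupt(int waiterInterruptFD)
{
    ulong one = 1;
    write(waiterInterruptFD, &one, one.sizeof);
}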
>> I think we have a misunderstanding of what vibe.d is supposed to be.
>> It seems like you are only focused on the web/server role, while to me
>> vibe-core is a general-purpose I/O and concurrency system with no
>> particular specialization in server tasks. With that view, your
>> statement to me sounds like "Clearly D is not meant to do multi-
>> threading, since main() is only running in a single thread".
>
> The defaults are what is important. Go defaults to multi-threading, for
> instance.
> D defaults to multi-threading, because TLS by default is certainly a
> mark of a multi-threaded environment. std.concurrency defaults to a new
> thread per spawn; again, this tells me it's about multi-threading. I
> intend to support multi-threading by default. I understand that we view
> this issue differently.
But you are comparing different defaults here. With plain D, you also
have to import either `core.thread` or
`std.concurrency`/`std.parallelism` to do any multi-threaded work. The
same is true for vibe-core. What you propose would be more comparable to
having foreach() operate like parallelForeach(), with far-reaching
consequences.
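For illustration, this is what the existing opt-in looks like with
std.parallelism:

import std.parallelism : parallel;

void main()
{
    auto data = new int[](100);

    // Plain foreach is single-threaded by default.
    foreach (i, ref x; data)
        x = cast(int)i * 2;

    // Multi-threading only happens on explicit request.
    foreach (i, ref x; parallel(data))
        x = cast(int)i * 2;
}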
If we are just talking about naming - runTask/runWorkerTask vs.
go/goOnSameThread - that is of course debatable, but in that case I
think it's blown very much out of proportion to take that as the basis
to claim "it's meant to be used single-threaded".
>>>> Anything client side involving a user interface has plenty of
>>>> opportunities for employing secondary tasks or long-running sparsely
>>>> updated state logic that are not CPU bound. Most of the time is
>>>> spent idle there. Specific computations on the other hand can of
>>>> course still be handed off to other threads.
>>>
>>> Latency is still going to be better if multiple cores are utilized.
>>> And I'm still not sure what the example is.
>>
>> We are comparing fiber switches and working on data with a shared
>> cache and no synchronization to synchronizing data access and control
>> flow between threads/cores. There is such a broad spectrum of
>> possibilities for one of those to be faster than the other that it's
>> just silly to make a general statement like that.
>>
>> The thing is that if you always share data between threads, you have
>> to pay for that for every single data access, regardless of whether
>> there is actual concurrency going on or not.
>
> Obviously, we should strive to share responsibly. Photon has Channels
> much like vibe-core has Channel. Mine are MPSC though, mostly to model
> Input/Output range concepts.
True, but it's still not free (as in CPU cycles and code complexity) and
you can't always control all code involved.
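For reference, a cross-thread producer/consumer with vibe-core's
Channel looks roughly like this minimal sketch:

import vibe.core.channel : Channel, createChannel;
import vibe.core.core : runWorkerTask;

void produce(Channel!int ch)
{
    foreach (i; 0 .. 10)
        ch.put(i);  // synchronized hand-off to the consumer
    ch.close();     // makes tryConsumeOne below return false when drained
}

void main()
{
    auto ch = createChannel!int();
    runWorkerTask(&produce, ch); // producer runs on a worker thread

    int value;
    while (ch.tryConsumeOne(value))
    {
        // every receive pays the cross-thread synchronization cost,
        // whether or not there is actual concurrency to exploit
    }
}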
>> If you want a concrete example, take a simple download dialog with a
>> progress bar. There is no gain in off-loading anything to a separate
>> thread here, since this is fully I/O bound, but it adds quite some
>> communication complexity if you do. CPU performance is simply not a
>> concern here.
>
> Channels tame the complexity. Yes, channels could get more expensive in
> a multi-threaded scenario, but we already agreed that it's not CPU bound.
If you have code that does a lot of these things, this just degrades
code readability for absolutely no practical gain, though.
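To make that concrete, a minimal sketch of the same-thread variant
(openDownload/readChunk are hypothetical placeholders; only runTask is
actual vibe-core API):

import vibe.core.core : runTask;

// Plain thread-local state: the download task and the UI task run as
// fibers on the same thread, so no synchronization is needed.
size_t bytesDone, bytesTotal;

void startDownload(string url)
{
    runTask({
        auto conn = openDownload(url); // hypothetical helper
        bytesTotal = conn.length;      // hypothetical
        while (!conn.empty)
        {
            // hypothetical; blocking I/O implicitly yields the fiber,
            // letting the UI task repaint the progress bar meanwhile
            bytesDone += conn.readChunk();
        }
    });
}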
>>>> The problem is that for example you might have a handle that was
>>>> created in thread A and is not valid in thread B, or you set a state
>>>> in thread A and thread B doesn't see that state. This would mean
>>>> that you are limited to a single task for the complete library
>>>> interaction.
>>>
>>> Or just initialize it lazily in all threads that happen to use it.
>>> Otherwise, this is basically "stick to one thread", really.
>>
>> But then it's a different handle representing a different object -
>> that's not the same thing. I'm not just talking about initializing the
>> library as a whole. But even then, there are a lot of libraries that
>> don't use TLS and are simply not thread-safe at all.
>
> Something that is not thread-safe at all is a dying breed. We have had
> multi-core machines for 20 years now. Most libraries can be initialized
> once per thread, which is quite naturally modeled with a TLS handle to
> said library. Communicating between fibers via a shared TLS handle is
> not something I would recommend regardless of the default spawn behavior.
Unfortunately, those libraries are an unpleasant reality that you can't
always avoid.
BTW, one of the worst offenders is Apple's whole Objective-C API.
Auto-release pools in particular make it extremely fragile to work with
fibers at all and of course there are all kinds of hidden thread
dependencies inside.
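That said, the lazy per-thread initialization mentioned above is
trivial to express in D, since module-level variables are thread-local
by default (LibHandle/initLibrary stand in for a real library) - with
the caveat that each thread then holds a distinct handle:

struct LibHandle {} // stand-in for the library's handle type
LibHandle* initLibrary() { return new LibHandle; } // stand-in init call

private LibHandle* handle; // module-level = thread-local in D

LibHandle* getHandle()
{
    // Each thread initializes its own instance on first use - a
    // different handle per thread, not one shared object.
    if (handle is null)
        handle = initLibrary();
    return handle;
}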
>>>> This doesn't make sense; in the original vibe-core, you can simply
>>>> choose between spawning in the same thread or in "any" thread.
>>>> `shared`/`immutable` is correctly enforced in the latter case to
>>>> avoid unintended data sharing.
>>>
>>> I have go and goOnSameThread. Guess which is the encouraged option.
>>
>> Does go() enforce proper use of shared/immutable when passing data to
>> the scheduled "go routine"?
>
> It goes with the same API as we have for threads - a delegate - so
> sharing becomes the user's responsibility. I may add function + args
> for better handling of resources passed to the lambda.
That means that this is completely un`@safe` - C++ level memory safety.
IMO this is an unacceptable default for web applications.
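For contrast, a minimal sketch of the vibe-core side: runWorkerTask
takes a callable plus arguments and statically rejects arguments that
are neither shared, immutable, nor otherwise isolated:

import vibe.core.core : runWorkerTask;

void work(immutable(int)[] data)
{
    // runs on a worker thread; immutable data is safe to share
}

void main()
{
    immutable(int)[] numbers = [1, 2, 3];
    runWorkerTask(&work, numbers); // OK: immutable may cross threads

    // int[] mut = [1, 2, 3];
    // runWorkerTask(&work, mut);  // rejected at compile time: mutable,
    //                             // unshared data must not cross threads
}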
>>>> The GC/malloc is the main reason why this is mostly false in
>>>> practice, but it extends to any central contention source within the
>>>> process - yes, often you can avoid that, but often that takes a lot
>>>> of extra work and processes sidestep that issue in the first place.
>>>
>>> As can be observed from other languages and runtimes, malloc is not
>>> the bottleneck it used to be. Our particular version of the GC, which
>>> doesn't have thread caches, is a bottleneck.
>>
>> malloc() will also always be a bottleneck with the right load. Just
>> the n times larger amount of virtual address space required may start
>> to become an issue for memory-heavy applications. But even if we
>> ignore that, ruling out using the existing GC doesn't sound like a
>> good idea to me.
>
> The existing GC is basically 20+ years old; of course we need a better
> GC, and thread-cached allocation solves contention in multi-threaded
> environments.
> An alternative memory allocator is doing great on 320-core machines. I
> cannot tell you which allocator that is or what exactly these servers
> are. Though even jemalloc does okayish.
>
>> And the fact is that, even with relatively mild GC use, a web
>> application will not scale properly with many cores.
>
> I only partially agree: Java's GC handles load just fine and runs
> faster than vibe.d(-light). It does allocations on its serving code path.
I was just talking about the current D GC here. Once we have a better
implementation, this can very well become a much weaker argument!
However, speaking more generally, the other arguments for preferring to
scale using processes still stand, and even with a better GC I would
still argue that leading library users to do multi-threaded request
handling is not necessarily the best default (of course it still *can*
be for some applications).
Anyway, the main point from my side is just that the semantics of what
*is* in vibe-core-light should really match the corresponding functions
in vibe-core. Apart from that, I was just telling you that your
impression of it being intended to be used single-threaded is not right
- although the presentation should probably emphasize the multi-threaded
functionality and multi-threaded request processing more.
>>>> Separate processes also have the advantage of being more robust and
>>>> enabling seamless restarts and updates of the executable. And they
>>>> facilitate an application design that lends itself to scaling across
>>>> multiple machines.
>>>
>>> Then give me the example code to run multiple vibe.d instances in
>>> parallel processes (should be similar to runDist) and we can compare
>>> approaches. For all I know it could be faster than multi-threaded
>>> vibe.d-light. Also, honestly, if vibe.d's target is multiple
>>> processes, it should probably start like this by default.
>>
>> Again, the "default" is a high-level issue and none of vibe-core's
>> business. The simplest way to have that work is to use
>> `HTTPServerOption.reusePort` and then start as many processes as desired.
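For illustration, each process would run something along these lines,
and you start as many copies as desired:

import vibe.core.core : runApplication;
import vibe.http.server : HTTPServerOption, HTTPServerSettings, listenHTTP;

void main()
{
    auto settings = new HTTPServerSettings;
    settings.port = 8080;
    // SO_REUSEPORT: multiple independently started processes can bind
    // the same port; the kernel distributes incoming connections.
    settings.options |= HTTPServerOption.reusePort;

    listenHTTP(settings, (req, res) {
        res.writeBody("Hello, World!");
    });
    runApplication();
}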
>
> So I did just that. To my surprise it indeed speeds up all of my D
> server examples.
> The speed ups are roughly:
>
> On vibe-http-light:
> 8 cores 1.14
> 12 cores 1.10
> 16 cores 1.08
> 24 cores 1.05
> 32 cores 1.06
> 48 cores 1.07
>
> On vibe-http-classic:
> 8 cores 1.33
> 12 cores 1.45
> 16 cores 1.60
> 24 cores 2.54
> 32 cores 4.44
> 48 cores 8.56
>
> On plain photon-http:
> 8 cores 1.15
> 12 cores 1.10
> 16 cores 1.09
> 24 cores 1.05
> 32 cores 1.07
> 48 cores 1.04
>
> We should absolutely tweak the vibe.d TechEmpower benchmark to run
> vibe.d as a process per core! As far as the photon-powered versions go,
> I see there is a point where per-process becomes less of a gain with
> more cores, so I would think there are two factors at play, one
> positive and one negative, with the negative being tied to the number
> of processes.
>
> Lastly, I have found opportunities to speed up vibe-http even without
> switching to vibe-core-light. Will send PRs.
Interesting. I wonder whether it's the REUSE_PORT connection
distribution that gets more expensive when it's working cross-process.
Agreed that the TechEmpower benchmark is in dire need of being looked
at. In fact, I had the code checked out for a long while, intending to
look into it, because it obviously didn't scale like my own benchmarks,
but then I never got around to doing it, being too busy with other things.