Suggestion: Operator `in` for slices

Steven Schveighoffer schveiguy at gmail.com
Sun Dec 19 18:31:46 UTC 2021


On 12/18/21 3:46 PM, Dennis wrote:
> On Saturday, 18 December 2021 at 18:47:33 UTC, Adam Ruppe wrote:
>> The `in` operator on AAs has a speed of O(log n) or O(1). `in` on 
>> slices would exceed that being O(n).
> 
> Who do people keep repeating that?
> 
> - Worst case time complexity of the `in` operator in AA's is actually 
> O(n). This can easily happen in practice when your `toHash` has a poor 
> implementation, which I have had before

Well, yeah. But that doesn't mean it happens often or even regularly. 
You need to have a bad `toHash` function.

We shouldn't lower the bar for `in` just because it's possible to make 
it perform poorly.

> - In custom types you can overload the `in` operator to something with 
> arbitrarily bad time complexity.

There's your answer -- make a custom type that allows `in` with an 
array-like structure.

Or make a UFCS function.

> - Despite scaling with O(n), a linear scan is pretty fast. You can 
> probably scan your entire RAM in a matter of seconds.

Complexity is not some old-fashioned idea that has no relevance on 
today's hardware. Linear searching is a real bottleneck, and identifying 
`in` to be sub-linear is a useful design decision.

If you are searching your entire memory by using an `opCmp`, I bet it 
will be more than a few seconds.

> The whole "it ruins performance guarantees" so far seems completely 
> hypothetical and is not something that came from experience with actual 
> code.

The design is sound. You use `in`, you get sub-linear (i.e. fast) 
performance. That kind of expectation is useful from a generic point of 
view. It's similar to opIndex for a linked list -- it's possible, it 
makes it so you can used linked lists in places that might expect 
arrays, but generic code that uses it might perform really badly as the 
code expects opIndex to be fast.

> The problem with `in` for slices to me is that it would be inconsistent 
> with how it works for AAs, where `in` searches for a *key* and returns a 
> pointer to a *value*. The same for slices would look like this:
>

[snip]

> But people suggest `in` to search a *value* in a slice.

Yes, this is a good point against. But the thing is, we also have 
`find`, which works fine, I'm not sure why we need `in` support.

> I've once overloaded the `in` operator for a custom 2D array type used 
> to represent a maze in which path finding was done. There it was useful 
> for bounds checking since writing comparisons with `width` and `height` 
> all the time is cumbersome. I wouldn't mind giving `in` this meaning so 
> you can do easy bounds checks in slices as well, but I think it will be 
> to confusing for newcomers expecting `in` to work like in Python.

Yes, that's not a terrible use case for it. But you are right that `in` 
will be confusing to newcomers, especially when you learn that an array 
is a collection of data. `if(x in arr)` doesn't seem to suggest that it 
will be looking at indexes.

-Steve


More information about the Digitalmars-d mailing list