Help optimize D solution to phone encoding problem: extremely slow performance.
Renato
renato at athaydes.com
Wed Jan 17 11:20:14 UTC 2024
On Wednesday, 17 January 2024 at 10:50:26 UTC, evilrat wrote:
> On Wednesday, 17 January 2024 at 10:43:22 UTC, Renato wrote:
>> On Wednesday, 17 January 2024 at 10:24:31 UTC, Renato wrote:
>>>
>>> It's not Java writing the file, it's the bash script
>>> [`benchmark.sh`](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/master/benchmark.sh#L31):
>>>
>>> ```
>>> java -cp "build/util" util.GeneratePhoneNumbers 1000 > phones_1000.txt
>>> ```
>>>
>>
>> Perhaps using this option when running Java will help:
>>
>> ```
>> java -Dfile.encoding=UTF-8 ...
>> ```
>
> I've used a PowerShell env var to set the output to UTF-8; the D
> version now works but the Java one doesn't.
>
> ```
> java -Xms20M -Xmx100M -cp build/java Main print words-quarter.txt phones_1000.txt
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 65485 out of bounds for length 10
>     at Trie.completeSolution(Main.java:216)
>     at Trie.forEachSolution(Main.java:192)
>     at PhoneNumberEncoder.encode(Main.java:132)
>     at Main.lambda$main$1(Main.java:38)
>     at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
>     at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
>     at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
>     at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
>     at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1939)
>     at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
>     at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
>     at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
>     at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
>     at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>     at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
>     at Main.main(Main.java:38)
> ```
The exception comes from this line:
```
var digit = chars[ index ] - 48;
```
That means the input file is still not ASCII (or UTF-8), as it
should be. Java is reading the files with the ASCII encoding, so
an ASCII file would've worked fine.
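
To show where an index like 65485 can come from, here is a minimal sketch (not the actual code from `Main.java`). It assumes the bytes are decoded with `new String(bytes, US_ASCII)`, which replaces any byte that is not valid ASCII with U+FFFD (65533), and 65533 - 48 = 65485. The class name, the helper names and the sample bytes (a UTF-16LE byte order mark followed by the digit `1`) are all made up for illustration; I don't know what your file actually contains.

```java
import java.nio.charset.StandardCharsets;

class ReplacementCharDemo {
    // Same arithmetic as the line in the stack trace: '0' is code point 48.
    static int digitIndex(char c) {
        return c - 48;
    }

    // Decode raw bytes as US-ASCII. This String constructor replaces every
    // byte that is not valid ASCII with the replacement character U+FFFD.
    static String decodeAscii(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Hypothetical input: bytes 0xFF 0xFE (not valid ASCII), then '1'.
        byte[] bytes = {(byte) 0xFF, (byte) 0xFE, '1'};
        for (char c : decodeAscii(bytes).toCharArray()) {
            System.out.println((int) c + " -> index " + digitIndex(c));
        }
    }
}
```

If the first line printed shows index 65485, the file most likely starts with (or contains) bytes outside the ASCII range, so checking the first few bytes of `phones_1000.txt` with a hex viewer should confirm it.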