claudiuinberlin has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
claudiuinberlin has joined #jruby
slyphon has joined #jruby
<ChrisBr>
headius: just saw your comment! Great your back, I hope you had awesome vacations :)
<ChrisBr>
regarding your comment: For the linear search approach one thing we could try is to store the hash value for a quick comparision when fetching. Atm we only store the key and value! However, this would increase array size again
<lopex>
ChrisBr: hya
<lopex>
ChrisBr: but can you store that hash in even indexes in the bucket array ?
<lopex>
or add, whatever
<lopex>
*odd
<lopex>
ChrisBr: oh disregard that
<ChrisBr>
currently we only have one array which holds key and value
<ChrisBr>
so we would need three array positions for every entry
<lopex>
and then we could interleave indexes with hashes right ?
<ChrisBr>
how you mean? As we only store key and value and the moment, we need to check the object for equal when fetching which is probably quite slower than hash comparision
<lopex>
ChrisBr: using boxed Integers in interleaved arrays wouldnt be worth en effort I guess
<ChrisBr>
not sure how much difference it would make...
<ChrisBr>
but still wondering why the linear search and open adressing is still slower than master
<lopex>
ChrisBr: I'm just pursuing an idea where you can store hash values in odd indexes
<lopex>
ChrisBr: on what threshold ?
<lopex>
wrt that linear search ?
<ChrisBr>
8
<ChrisBr>
0-7 elemnts is linear search
<ChrisBr>
after that open addressing
<lopex>
with hashvalues cached ?
<ChrisBr>
nope, never cached at the moment
<lopex>
what key objects ?
<lopex>
then you pay the cost of calling hash mostly
<lopex>
I think ?
<ChrisBr>
maybe :/
<lopex>
ChrisBr: for strings or symbols ?
<ChrisBr>
my benchmark?
<lopex>
wrt the bench
<lopex>
yeah
<ChrisBr>
mostly integers at the moment
<lopex>
hash is megamorphic
<lopex>
in this case cashing it wuld help greatly
<lopex>
*would
<ChrisBr>
Hm!
<ChrisBr>
but would also mean to increase the array / object size
<ChrisBr>
but guess we have to try that out ...
<lopex>
ChrisBr: hmm, for linear case, I'd try to cache the hashes as java.lang.Long in odd indexes
<ChrisBr>
lopex: in odd index I already store the value
<lopex>
or even, whatever
<lopex>
ah
<ChrisBr>
even stores the key ;)
<ChrisBr>
so I need a third position or introduce an object again ..
<lopex>
ChrisBr: so maybe even use triples ?
<lopex>
doesn sound hard
<ChrisBr>
yeah right
<lopex>
but only for linear for now
<lopex>
after that everything gets rejiggered right ?
<ChrisBr>
you think doesn't improve for open addressing?
<lopex>
no idea
<ChrisBr>
yep, every resize needs to rehash
<lopex>
yeah, I think quite stringly now that given rare collisions we might try not to cache the hash
<lopex>
so have a bigger array ight ?
<lopex>
ChrisBr: I guess it was a mistake on my part for Hash since it is quite dense now
<lopex>
ChrisBr: a full hash might have like 5 collisions per bucket
<lopex>
bit it deasnt really matter for ruby until we begin calling ruby "hash"
<lopex>
oh well
<lopex>
hence the hash cache
<lopex>
oh we might pay big time equals though
<ChrisBr>
yeah
<lopex>
ChrisBr: so after your change we could see really big improvements anyways
<ChrisBr>
I hope so :)
<ChrisBr>
ok, so I will try to implement a hash cache for the linear search approach
<lopex>
the caches will be boxed but maybe worth trying
<lopex>
so tempting to use unsafe
<lopex>
ChrisBr: on the bad side those boxed longs wont fit long cache (if there is any)
<ChrisBr>
lopex: why they are boxed?
<ChrisBr>
hash is an int, no?
<lopex>
ChrisBr: oh, wait
<ChrisBr>
grml, but the array holds only objects ...
<lopex>
ChrisBr: I thought you want to store that in a bin array which is IRubyObject right ?
<lopex>
yes
<ChrisBr>
yup
<ChrisBr>
forgot about that
<lopex>
for Integer the cache is -127 to 128
<lopex>
but that's futile
<lopex>
hash values should vary greatly
<lopex>
but still
<lopex>
hmm
<lopex>
there's a few things in play
<lopex>
locality - good
<lopex>
index checks - no idea (might be easier for hotspot to optimize if it's in the same array ?)
<lopex>
indirection (good? no idea since it depends on that index checking (bounds))
<lopex>
er
<lopex>
wait
<lopex>
the indirection might be better actually
slyphon has quit [Remote host closed the connection]
<ChrisBr>
which kind of indirection you mean?
<lopex>
like array[array2[...]]
slyphon has joined #jruby
<ChrisBr>
hm
<ChrisBr>
so you mean having three arrays: values, keys, hashes ?
<lopex>
ChrisBr: that's the case with another bin
<ChrisBr>
instead of storing everything in the same array
<lopex>
ChrisBr: I'm not for that for now
<ChrisBr>
ok
<lopex>
but yeah
<lopex>
no idea about how it will be optimized
<ChrisBr>
k
<lopex>
but the general rules are
<lopex>
having less indirection is good
<lopex>
locality is good
<lopex>
ChrisBr: not sure what does hotspot do to eliminate bounds checks for this case
<lopex>
but for example
<lopex>
if you have an arbitrary a[i]
<lopex>
the runtime will always check the boundaries
<lopex>
but if you do that in a loop
<lopex>
then given constrant stride (and maybe other assumptions) hotspot will only check array boundaries outside the loop
<lopex>
that the most straight forward scenarios
<lopex>
the more advanced ones include nested loops
<lopex>
but it's all the same play
<lopex>
ChrisBr: if that does any sense what I'mn saying
<lopex>
ChrisBr: the problem is that array size is also a runtime thing
<ChrisBr>
more or less :) Coming from ruby world so Java is still mystery for me sometimes :/
<lopex>
ChrisBr: imagine a simplistic scenario
<lopex>
int []a = new int[3]; int b = a[0];
<lopex>
the size is known here up front right
<lopex>
the array cannot change in between
<ChrisBr>
right
<lopex>
those two assumptions allow you to eliminate bounds checking when you do [a]
<lopex>
er a[0]
<lopex>
but most likely the allocation is too far behind any access to that array
<lopex>
so the optimizer has a hard job firguring it out
<lopex>
er, too far ahead I meant wrt allocation
<lopex>
so then you can speculate
<lopex>
given this callsite I always saw an array of size 6
<lopex>
and down the calls this knowledge can have tremendous implications
<lopex>
ChrisBr: I'm also not an expert on this so take it with a grain of salt
<ChrisBr>
hm! Well, for the linear search we can also fix the size to 24 (3 * 8) because only for the first allocation we will use the linear search! After that we use hashing and the array can grow
<lopex>
ChrisBr: that's what your code does, then the optimizer might beserve it
<lopex>
ChrisBr: on jrubytruffle you could actually help that optimizer in your code