I don't recall how 15 as the starting length in createByteList in the lexer came about
yeah, but it might be many small strings too
I am really just wondering how much we overcommit here and it looks like 80% of the time
but bytelist has a cost if you start too small and the string is larger
lopex: I increased this from 6 to 15 at the same time I stopped interning some ident strings
I feel this would only reduce number of grows by 1 in strings > 15
which is 20% of the strings
going to use science! well be a little hacky and consider strings created during a rails app start
OMGZ I forgot bytelist does not use a scaling factor
enebo[m]: change it to some larger arbitrary value
so the chance of misinterpreting gets lower
buffer.append(c); // O_o
this might be significant
StringTerm and no doubt HeredocTerm call append(byte) n times for a string of length n, which will call grow(1) n times
so my default of 15 hid the cost by making 80% of all things fit by default
OTOH this never shows up in profiling I guess we will see
This also could explain how poor oj dump speed is using ByteList as the backing store
I thought it was just because of constant bounds checking but if it has no scaling factor I am dumping a 15k JSON file with 15k System.arraycopys
I'm only getting more stupid as I age wrt those things
I dumped all str lengths made and it is fascinating how small most strings are
I guess Ruby encourages interpolation enough where each fragment is generally small
otoh, did we see any perf bump once java moved to compact strings ?
gem list is unaffected by just making the array a lot larger but I don't like this and rails will definitely make a lot more strings
anyways I will play with this tomorrow
add in a scaling factor and maybe also do inlined CR calc instead of walking the string a second time
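A minimal sketch of the "inlined CR calc" idea, assuming CR means the string's code range and simplifying it to a single 7-bit flag (a real code range has more states than this). `AppendWithRange` and its methods are hypothetical names, not ByteList or lexer API; the point is only that the flag can be updated inside `append` so the bytes never need a second walk.

```java
import java.util.Arrays;

// Sketch only: hypothetical class, not part of JRuby.
final class AppendWithRange {
    private byte[] bytes = new byte[15]; // starting length discussed in the chat
    private int size;
    private boolean sevenBit = true;     // simplified stand-in for a code range

    /** Appends one byte and updates the 7-bit flag in the same pass. */
    void append(int c) {
        if (size == bytes.length) {
            int newSize = size + 1;
            // Assumed growth policy: requested size plus half again.
            bytes = Arrays.copyOf(bytes, newSize + (newSize >> 1));
        }
        bytes[size++] = (byte) c;
        if ((c & 0x80) != 0) {
            sevenBit = false;            // saw a byte outside the ASCII range
        }
    }

    boolean isSevenBit() {
        return sevenBit;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        AppendWithRange buf = new AppendWithRange();
        for (byte b : "hello".getBytes(java.nio.charset.StandardCharsets.UTF_8)) {
            buf.append(b);
        }
        System.out.println("7-bit after ASCII input: " + buf.isSevenBit());
        buf.append(0xC3);
        System.out.println("7-bit after a high byte: " + buf.isSevenBit());
    }
}
```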
Is that a heap histogram or an allocation profile?
the latter I have thought about in the past but I will talk to Kevin before I attempt it
it is stringterm bytelist sizes
I made the histogram looking at only that
But this is from a heap snapshot, yes? live objects?
only other thing which is doing this is heredoc itself which will typically be larger strings
this is from rails s and killing it
I made this from printlns
Okay so it is all allocations
all allocations of normal strings in stringterm
so very specific thing
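A rough sketch of that kind of measurement, assuming the printed lengths have been collected into an `int[]`. `LengthHistogram` and the sample values in `main` are made up for illustration; they are not the numbers from the rails run.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch only: hypothetical helper for summarizing logged string lengths.
final class LengthHistogram {
    /** Counts how many strings were seen at each length, sorted by length. */
    static Map<Integer, Integer> histogram(int[] lengths) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int len : lengths) {
            counts.merge(len, 1, Integer::sum);
        }
        return counts;
    }

    /** Fraction of strings that fit in the initial capacity without a grow. */
    static double fractionFitting(int[] lengths, int capacity) {
        if (lengths.length == 0) {
            return 0.0;
        }
        int fits = 0;
        for (int len : lengths) {
            if (len <= capacity) {
                fits++;
            }
        }
        return (double) fits / lengths.length;
    }

    public static void main(String[] args) {
        int[] sample = {1, 2, 2, 4, 7, 9, 12, 15, 22, 40}; // made-up data
        System.out.println(histogram(sample));
        System.out.println("fits in 15: " + fractionFitting(sample, 15));
    }
}
```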
I believe RubyString imposes a growth factor when growing the ByteList
So that end of things may be better
1.5x or something
yeah and this has nothing to do with RubyString and happens before it ever is actually a string
it is the parser reading strings in the lexer and not even the only path just the most common one
dinner though...this is unneeded churn even if I cannot measure much
Yeah, I was just wondering if we should be looking harder at ByteList in other allocation profiles too
I am definitely going to look around
anything which does an append directly is a suspect
what the hell
there is a growth factor in here
I wonder, since realloc in Java is new and copy, it should be quite easily localized by tools by this pattern
I am seriously confused now
newSize >> 1
this way we could learn about reallocs
is it not ?
lopex: I was super confused, I did not see the newSize + (newSize >> 1), so nothing I said above is actually a growth factor issue
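To put numbers on the difference, here is a small simulation of one-byte-at-a-time appends, comparing an exact-fit grow against the `newSize + (newSize >> 1)` policy quoted above. `GrowthSketch` and `countGrows` are hypothetical names, and this only counts reallocations; it is a sketch of the policy, not a copy of ByteList's `grow()`.

```java
// Sketch only: counts array reallocations when appending one byte at a time.
final class GrowthSketch {
    /**
     * Simulates n single-byte appends into a buffer that starts with
     * initialCapacity bytes and returns how many reallocations happened.
     * With scale=false the buffer grows to exactly the requested size;
     * with scale=true it grows to newSize + (newSize >> 1).
     */
    static int countGrows(int initialCapacity, int n, boolean scale) {
        int capacity = initialCapacity;
        int size = 0;
        int grows = 0;
        for (int i = 0; i < n; i++) {
            int newSize = size + 1;
            if (newSize > capacity) {
                capacity = scale ? newSize + (newSize >> 1) : newSize;
                grows++;
            }
            size = newSize;
        }
        return grows;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 15, 16, 100, 15_000}) {
            System.out.printf("n=%d exact=%d scaled=%d%n",
                    n, countGrows(15, n, false), countGrows(15, n, true));
        }
    }
}
```

With a starting capacity of 15, a 100-byte string costs 85 reallocations under exact-fit growth and 5 with the 1.5x policy, which is why the missing-factor theory would have mattered if it had been true.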
The choice of 15 may be a bad default though for StringTerm strings
enebo[m]: where would you see that ?
I'm confused
in grow()
StringBuffer defaults to 16
it could be where the number came from
But it's also only used if someone plans to mutate
Could be
I bet I just did the histogram before but for gem list
80% requires no grow() so that is pretty nice
Right-sizing some of these BLs could give us free memory reduction
well that was why I was looking
headius[m]: also aggressive cow could trigger more barriers
but I don't want to trade off any perceived startup for that
since those like empty strings for different encodings might be accessed
especially since memory problems are not actually in the heap at all right now
potentially changed
so far most memory improvements have no effect on startup so I just want to continue the trend
when I say most I express doubt in that measuring wall clock is a bit noisy
and all that bit flipping in flags
but after all the changes if anything we may be a tiny bit faster
it's a mess
anyways dinner for reals now
headius[m]: and all those potential leaks for arrays we talked about years ago
I think it's the first cow we should get rid of
it could help gc
It wouldn't be too difficult to remove copy on write and try some things out
* rtyler
what things ?
Yeah possibly
just remove the cow
I'm sure we're screwing up some object age metrics by keeping these backing arrays around
I mean things likely to hit arrays hard, like any typical Ruby application. Just see how bad the allocation curve looks, if it looks bad at all
but it's hard to measure
well, impossible
like hmm
imagine a pathological benchmark
Why impossible? If the heap is significantly bigger, that tells us something. If applications run slower or faster, that tells us something too
fill an array with some distinct objects
make a slice
operate on them
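A sketch of that pathological case, assuming a simplified copy-on-write array where a slice shares the parent's backing array until its first write. `CowArraySketch` is a hypothetical class written for illustration, not RubyArray; it only shows how a tiny slice can keep a large backing array reachable.

```java
import java.util.Arrays;

// Sketch only: simplified copy-on-write array, not JRuby's implementation.
final class CowArraySketch {
    private Object[] values;
    private int begin;
    private final int length;
    private boolean shared;

    CowArraySketch(Object[] values) {
        this(values, 0, values.length, false);
    }

    private CowArraySketch(Object[] values, int begin, int length, boolean shared) {
        this.values = values;
        this.begin = begin;
        this.length = length;
        this.shared = shared;
    }

    /** Returns a slice that shares this array's backing store. */
    CowArraySketch slice(int from, int len) {
        if (from < 0 || len < 0 || from + len > length) {
            throw new IndexOutOfBoundsException();
        }
        shared = true; // the parent must also copy before its next write
        return new CowArraySketch(values, begin + from, len, true);
    }

    Object get(int i) {
        if (i < 0 || i >= length) throw new IndexOutOfBoundsException();
        return values[begin + i];
    }

    /** Writes copy the visible range first if the backing store is shared. */
    void set(int i, Object v) {
        if (i < 0 || i >= length) throw new IndexOutOfBoundsException();
        if (shared) {
            values = Arrays.copyOfRange(values, begin, begin + length);
            begin = 0;
            shared = false;
        }
        values[begin + i] = v;
    }

    /** Number of slots in the backing array this object keeps reachable. */
    int retainedSlots() {
        return values.length;
    }

    public static void main(String[] args) {
        Object[] data = new Object[1000];
        for (int i = 0; i < data.length; i++) data[i] = new Object();
        CowArraySketch slice = new CowArraySketch(data).slice(0, 2);
        System.out.println("before write: " + slice.retainedSlots());
        slice.set(0, "x");
        System.out.println("after write: " + slice.retainedSlots());
    }
}
```

Until the slice is written to, its two visible elements pin all 1000 slots of the parent's array, which is the retention concern; after the first write it holds only its own copy.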
Primary concern for me is always real applications versus synthetic benchmarks. Obviously we can show a performance hit for things like heavy array slicing
yeah I know
I guess the corollary to this is that I have no idea how common it is to heavily slice up an array
That has always been the case we bring up when we discuss removing copy on write, but do we really know it's a problem?
Only we and MRI do COW for Array
no standard metrics
er, no data I mean
and mri does more now
since it packs small strings in unions
so it's like 4x improvements just on allocations
that extra 1x was on java meta data :P
but mri can do the whole string in a single alloc indeed
I forgot at what state their gc is
but that surely helps
headius[m]: btw I'm running a semi-important production jruby app on docker now
so I'd be interested on a state of those images
We could pack very small strings into the header
like longs ?
and/or unsafe ?
we have lots of int though without unsafe
hash only I guess
and flags
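One way the "like longs" idea could look, as a sketch only: up to 7 bytes packed into a single `long`, with the length kept in the low byte. `PackedSmallString`, the 7-byte limit, and the layout are all assumptions made for illustration; nothing here reflects how RubyString is actually laid out.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: hypothetical packing of a very small string into one long.
final class PackedSmallString {
    static final int MAX_LEN = 7; // 7 payload bytes + 1 length byte = 64 bits

    /** Packs up to MAX_LEN bytes; the low byte of the result is the length. */
    static long pack(byte[] bytes) {
        if (bytes.length > MAX_LEN) {
            throw new IllegalArgumentException("too long to pack: " + bytes.length);
        }
        long packed = bytes.length;
        for (int i = 0; i < bytes.length; i++) {
            // Mask to avoid sign extension of negative bytes.
            packed |= (bytes[i] & 0xFFL) << (8 * (i + 1));
        }
        return packed;
    }

    /** Recovers the original bytes from a packed value. */
    static byte[] unpack(long packed) {
        int len = (int) (packed & 0xFF);
        byte[] out = new byte[len];
        for (int i = 0; i < len; i++) {
            out[i] = (byte) (packed >>> (8 * (i + 1)));
        }
        return out;
    }

    public static void main(String[] args) {
        long packed = pack("abc".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(unpack(packed), StandardCharsets.UTF_8));
    }
}
```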
would we know how aligned RubyString fields are on common platforms ?