<whitequark>
sb0: remember we talked about a perf regression in LLVM 3.9? I have investigated that one
<whitequark>
turns out that a change in the target-independent code generator made the test regress by about 300ns out of 6200ns per iteration
key2 has quit [Ping timeout: 260 seconds]
<whitequark>
I would rather like to upgrade to LLVM 3.9 and then fix the new pointer global store that prevents LICM from optimizing that test even further
<whitequark>
let me give you an estimate on that improvement
<sb0>
bb-m-labs, force build artiq
<bb-m-labs>
build #1062 forced
<bb-m-labs>
I'll give a shout when the build finishes
<whitequark>
sb0: I can improve test_pulse_rate_dds from 6200 to 5940ns with the @now change
<whitequark>
possibly slightly more
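[Editor's note] The figures quoted so far can be put in relative terms. This is a minimal sketch that uses only the numbers from the conversation; the helper name `percent_of` is hypothetical.

```python
def percent_of(delta_ns: float, baseline_ns: float) -> float:
    """Return delta_ns as a percentage of baseline_ns."""
    return 100.0 * delta_ns / baseline_ns

# Regression attributed to the LLVM 3.9 code generator change: 300 ns of 6200 ns.
regression_pct = percent_of(300, 6200)

# Estimated gain from the @now change: 6200 ns -> 5940 ns per iteration.
gain_ns = 6200 - 5940
gain_pct = percent_of(gain_ns, 6200)

print(f"regression: {regression_pct:.1f}%")
print(f"estimated gain: {gain_ns} ns ({gain_pct:.1f}%)")
```

So the estimated gain (260 ns) is of the same order as the regression (300 ns).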
<sb0>
@now?
<whitequark>
replacing the stores to the now global variable will let LICM move various constant stuff out of the loop even further
<whitequark>
@now is LLVM IR's syntax for a global.
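[Editor's note] LICM is loop-invariant code motion: the optimizer moves computations whose result does not change between iterations out of the loop. The sketch below does that hoisting by hand in plain Python to show the effect being discussed. It is an illustration only, not ARTIQ compiler output, and the function and parameter names are hypothetical. Presumably the stores to the `now` global matter because the optimizer cannot rule out that they affect other values read inside the loop.

```python
def pulses_naive(n, ratio, period):
    """Recompute a loop-invariant value on every iteration."""
    now = 0.0
    out = []
    for _ in range(n):
        step = ratio ** 2 * period  # invariant: nothing in the loop changes it
        now += step
        out.append(now)
    return out


def pulses_hoisted(n, ratio, period):
    """Same result, with the invariant computed once before the loop."""
    step = ratio ** 2 * period  # hoisted, as LICM would do when it is allowed to
    now = 0.0
    out = []
    for _ in range(n):
        now += step
        out.append(now)
    return out


print(pulses_naive(3, 1.5, 2.0) == pulses_hoisted(3, 1.5, 2.0))
```

Both functions perform the same floating-point operations in the same order, so their results are identical; only the amount of work per iteration differs.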
<sb0>
can't now be kept mostly in registers anyway?
<sb0>
including across syscalls
<whitequark>
yup, but there is no explicit register pinning in LLVM
<sb0>
ideally I'd even dedicate registers to common rtio parameters (timestamp, channel etc.) and hook those up directly to the RTIO core to save bus writes
<whitequark>
won't do
<whitequark>
well, no, maybe it will
<whitequark>
sb0: ok, yes, we can do that. but we will need a custom calling convention and CPU type in LLVM
<sb0>
ok. let's look into that later after e.g. the rust runtime is fully settled
<bb-m-labs>
I'll give a shout when the build finishes
<whitequark>
sb0: wtf is happening with serverraum.org dns again?
<whitequark>
lab.m-labs.hk resolves about half of the time here
<sb0>
works fine here
<sb0>
whitequark, what about adding has_dds manually into the artiq targets?
klickverbot has joined #m-labs
<sb0>
this commit adds a lot of garbage config options
<sb0>
and this dds stuff will go anyway...
<klickverbot>
whitequark: Then the example in doc/manual/compiler.rst is rather misleading – the text says "In the synthetic example above, the compiler will be able to detect that the result of evaluating ``self.ratio ** 2`` never changes and replace it with a constant, removing an expensive floating-point operation.", but the emitted code doesn't actually change at all
<whitequark>
yes, the example is completely wrong
<klickverbot>
whitequark: Also, if all attributes of an object are kernel invariants, I don't see why emitting it as a constant would be wrong (I agree that it can't be done when there are any mutable fields at all, which makes it rather inflexible)
klickverbot has quit [Remote host closed the connection]
klickverbot has joined #m-labs
<whitequark>
bb-m-labs: force build --props=package=rust-core-or1k conda-lin64
<bb-m-labs>
build forced [ETA 47m26s]
<bb-m-labs>
I'll give a shout when the build finishes
<whitequark>
sb0: bah, this is troublesome
<whitequark>
I can pin a *pointer* to now to a register but this doesn't seem to make any difference to LLVM
<whitequark>
even with all appropriate noalias metadata
klickverbot has quit [Ping timeout: 260 seconds]
<whitequark>
I cannot pin now itself to a register without making the return convention significantly more complicated, or patching a custom one for ARTIQ into LLVM
<whitequark>
so it would take both return value regs in the C calling convention...
<cr1901_modern>
Out of curiosity, what's wrong w/ using both registers?
<whitequark>
then you cannot return anything else
<rjo>
whitequark: is this for returning now?
<whitequark>
yup
<rjo>
whitequark: do we need that at all if it's pinned?
<whitequark>
yup. that's how you pin things to registers in llvm. you write a calling/return value convention
<whitequark>
(or reuse an existing one)
<whitequark>
there's no support for "explicit" register pinning
<cr1901_modern>
Oh, this is Rust, so multiple return vals are possible?
<whitequark>
this is ARTIQ
<whitequark>
ARTIQ Python to be specific, which is our compiled subset
<rjo>
whitequark: hmm. are you saying that or1k only has two 32 bit return registers (to support returning a 64 bit value, i guess?) and if we were to pin now, they would both be used? i.e. you can only use the return registers to pin variables?
<rjo>
i do see why pinning a value makes one/some registers unavailable (to everything, i.e. irq save restore, prologues, returns, calling conventions). but i don't get why returns are special.
<whitequark>
rjo: neither LLVM's register allocator nor its IR has any way to say "pin this mutable variable to this register". what you can do is make sure it is placed into the register you want when entering the function and when exiting it, and (maybe) disallow allocating that register
<cr1901_modern>
"patching a custom one for ARTIQ into LLVM" would be adding that "pin this mutable variable to this register" functionality?
<whitequark>
if we use the C calling convention on OR1K, like we do now, and pin now using this technique, which we can do, it will eat both RV registers
<whitequark>
we can also write a custom calling convention that doesn't have this drawback, but it will take time.
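[Editor's note] The constraint described here can be pictured with a toy model. This is not LLVM code. It assumes that the OR1K C calling convention returns values in two 32-bit registers, named r11 and r12 here (an assumption based on the OpenRISC ABI, not stated in the conversation), and that now is a 64-bit value. All function names are hypothetical.

```python
# Assumption: two 32-bit return-value registers, named r11 and r12 here.
RETURN_REGS = ("r11", "r12")
WORD_BITS = 32


def regs_needed(bits):
    """Number of 32-bit registers needed to hold a value of the given width."""
    return -(-bits // WORD_BITS)  # ceiling division


def free_return_regs(pinned_bits):
    """Return registers left for ordinary return values after pinning."""
    used = regs_needed(pinned_bits)
    if used > len(RETURN_REGS):
        raise ValueError("pinned value does not fit in the return registers")
    return RETURN_REGS[used:]


# A 64-bit now takes both return registers, leaving none for a result.
print(free_return_regs(64))  # ()
# A 32-bit value would leave one register free.
print(free_return_regs(32))  # ('r12',)
```

A custom calling convention would lift this limit by choosing different registers for the pinned value, at the cost of extra work in LLVM.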
<rjo>
ah. ok. that massively restricts the registers that are usable for this to exactly those that would be the return registers.
<whitequark>
mostly, the part that takes time is the custom frame lowering code.
<rjo>
is a custom calling conv harder than adding gateware support with now a new special register?
<whitequark>
in one case we are stuck with a fork of LLVM, in the other case with a fork of mor1kx
<whitequark>
I know how to approach the former (I did it before) but not the latter
<rjo>
iirc mor1kx makes some allowance for this kind of extension.
<rjo>
and mor1kx seems very static anyway.
<rjo>
but yeah. i have never done either.
<cr1901_modern>
How difficult could it be to add an "extra register"?
<whitequark>
I'm not very happy with the idea of hacking on mor1kx verilog
<whitequark>
llvm is far faster to build and test, at least
klickverbot has joined #m-labs
klickverbot has quit [Read error: Connection reset by peer]
<whitequark>
bb-m-labs: force build artiq
<bb-m-labs>
build #1069 forced
<bb-m-labs>
I'll give a shout when the build finishes
<whitequark>
rjo: ok. i will try to add a CC.
<whitequark>
for now we use a fork anyway
<whitequark>
hm, which registers should it be? r31:r30 perhaps