<rjo>
seems to be a bigger change. one would want to push most of the wrap() logic into Signal/Constant as well, no?
<sb0>
the operators and other elements are already calling wrap()
<sb0>
so wrap() could attempt to call __fhdl__ if it does not recognize the object
<sb0>
but
<sb0>
this introduces another tricky corner case: operations between two __fhdl__ objects (e.g. CSRConstant + CSRConstant)
<rjo>
the question is why wrap() (and many other functions) need all that intelligence and knowledge about the things they handle. i would have pushed that into the objects in general.
<rjo>
for CSRConstant + CSRConstant, the operator would convert each to fhdl.
<rjo>
actually, wrap() should only concern itself with things that have been determined to be "not fhdl".
<rjo>
Operators should be able to blindly try: argument.__fhdl__(); except: wrap(argument)
<rjo>
and Signal and constants can return self. CSRConstants can return their constant; CSRStatus and CSRStorage similarly.
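(A minimal, self-contained Python sketch of the protocol rjo is proposing; the names wrap(), Constant and __fhdl__ follow the discussion but this is illustrative, not Migen's actual implementation.)

    class Constant:
        def __init__(self, value):
            self.value = value

    class Signal:
        def __fhdl__(self):
            return self           # already an FHDL value

    class CSRConstant:
        def __init__(self, value):
            self.constant = Constant(value)
        def __fhdl__(self):
            return self.constant  # lower to the underlying constant

    def wrap(obj):
        # ask the object to lower itself to FHDL first ("try __fhdl__, except wrap")
        fhdl = getattr(obj, "__fhdl__", None)
        if fhdl is not None:
            return fhdl()
        # fall back to the existing conversions for plain Python values
        if isinstance(obj, (bool, int)):
            return Constant(obj)
        raise TypeError("cannot convert {!r} to an FHDL value".format(obj))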
<rjo>
sb0: any idea what transceiver would be needed for DAC39J84 (what a beast...)?
<sb0>
kintex 7 should do it
<rjo>
oh minimum serdes rate 0.78 Gbps.
<sb0>
kintex 7 can do 12.5Gbps
<sb0>
it will eat all the transceivers of a 70t or 160t, though
<rjo>
if the cost per channel is more than ~500 or ~1k it will become inconvenient very quickly and there will be few users, especially as a general purpose DAC (pdq3 style).
<sb0>
so that would be 2k to 4k for a board with that DAC?
<sb0>
k7s aren't cheap... around 1k on digikey for a 325
<sb0>
Artix7 are 6.6Gbps, and lattice fpgas crawl at 3.2
<sb0>
cyclone is also 6.1 gbps, and arria/stratix are expensive
Guest88321 has quit [Quit: This computer has gone to sleep]
<sb0>
rjo, you cannot redefine what + does in python, unless at least one of the objects has a custom __add__ or __radd__
<sb0>
and same for other operators
<sb0>
so those things that offer __fhdl__ would also need to overload all operators to be consistent
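(A hypothetical sketch of sb0's point, continuing the naming from the snippet above; not actual Migen code. Anything exposing __fhdl__ would also have to forward Python's operator hooks, otherwise CSRConstant + CSRConstant never reaches wrap().)

    class _ForwardsOperators:
        # mixin for __fhdl__ objects: lower self (and the other operand, via
        # wrap) before applying the operator, so Python dispatches to the
        # FHDL value's own __add__ instead of raising TypeError
        def __add__(self, other):
            return self.__fhdl__() + wrap(other)
        def __radd__(self, other):
            return wrap(other) + self.__fhdl__()
        # ...and the same pattern for __sub__, __mul__, comparisons, etc.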
<whitequa1k>
sb0: the problem with your code would arise in slightly different case
<whitequa1k>
specifically...
<whitequa1k>
with parallel: f(x(), delay(2), x(), delay(2)); g(delay(3), y())
<whitequa1k>
I suppose it was sufficiently obscure to not matter
siruf has quit [Ping timeout: 240 seconds]
siruf has joined #m-labs
Mon_ has joined #m-labs
Mon_ is now known as Guest48315
jaeckel has quit [Ping timeout: 240 seconds]
Guest48315 has quit [Quit: This computer has gone to sleep]
Guest48315 has joined #m-labs
jaeckel has joined #m-labs
Guest48315 has quit [Quit: This computer has gone to sleep]
Mon_1 has joined #m-labs
<whitequa1k>
hmm, actually after the current changes it should be possible to use the ttl pulse function
Mon_1 has quit [Ping timeout: 255 seconds]
<GitHub55>
[artiq] whitequark pushed 2 new commits to new-py2llvm: http://git.io/v4pOc
<GitHub55>
artiq/new-py2llvm 5cd12ff whitequark: compiler.iodelay: fold MUToS and SToMU.
<GitHub55>
artiq/new-py2llvm a01e328 whitequark: transforms.interleaver: don't assume all delay expressions are folded.
<sb0>
rjo, if cost matters, maybe artix7 + lvds dac is better...
<sb0>
whitequa1k, are all the unittests passing on the board?
mumptai has joined #m-labs
<sb0>
whitequa1k, ah, i see you took that horrible train as well
<whitequa1k>
sb0: horrible train?
<whitequa1k>
what do you mean?
<sb0>
the mtr with the advertisement everywhere
<whitequa1k>
oh, yeah
<whitequa1k>
as for unittests, definitely not, the inlining is not here yet
<sb0>
doesn't the ttl pulse function require inlining?
<whitequa1k>
yes. I've just discovered that it doesn't actually work
<whitequa1k>
sb0: actually, on careful examination, there's a larger problem
<whitequa1k>
you said you want inlining *only when necessary*, right?
<whitequa1k>
but it's not actually possible to do in almost any case
<whitequa1k>
like, the only case where you can avoid inlining is when no functions actually execute in parallel
<sb0>
yes
<whitequa1k>
when is that useful?
<felix_>
i've rebased the ppp patch onto the current master, made pppd record everything to a file, had a look with wireshark at that dump and it really seems that the device-side receive path loses data when packets get processed
<felix_>
tested that with flood pings and the first ping always gets a response, but after a ping which gets a response, the next one (or sometimes two) don't get a response
<felix_>
(on the pipistrello board)
<felix_>
so yeah, i'm quite confident that implementing hardware flow control will solve at least a big part of the problem
<felix_>
the direction from the device to the computer seems to be completely fine
rohitksingh has joined #m-labs
nicksydney has joined #m-labs
rohitksingh has quit [Ping timeout: 272 seconds]
Mon_ has joined #m-labs
Mon_ is now known as Guest32581
rohitksingh has joined #m-labs
<cr1901_modern>
whitequa1k: What do you mean by this tweet? https://twitter.com/whitequark/status/667982701811527680 wrt "signed overflow being undefined"; this sounds like something decided by the standards committee, rather than being inherent to the language.
<cr1901_modern>
I'm guessing one language-based solution here would be to make C support "arbitrarily large integers" built-in?
Guest32581 has quit [Quit: This computer has gone to sleep]
<whitequa1k>
signed overflow being undefined is necessary to vectorize for loops with an int induction variable
<whitequa1k>
it is reasonable to want to vectorize loops, it's just that most languages are able to provide the invariant the vectorizer wants without resorting to UB
<cr1901_modern>
Well what would be a better invariant than "assume it doesn't happen"? A special datatype that promises that overflow does not happen (and crashes the program if you try)? I'm thinking in terms of a minimal runtime for bare metal targets.
<whitequa1k>
higher-order functions, for example
<whitequa1k>
or, well, something like stl's .begin()/.end(). either will guarantee there's no overflow
<whitequa1k>
trap on overflow is highly desirable but expensive enough in terms of inhibiting optimizations that rust, after long debate, decided against it
<whitequa1k>
you could also reformulate the for loop in terms of ranges, which means that the compiler can insert a check before the SIMD part
<cr1901_modern>
So in an environment w/o a vectorizer (for simplicity of visualization), with a higher order function, you run a snippet of code that figures out the next index to go to before running the loop body?
<whitequa1k>
no, a higher-order function embeds the invariant that the compiler wants
<whitequa1k>
then you inline it and you're back to the kind of code that a for loop generates, only now you have that invariant
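(A loose Python illustration of the idea whitequark is describing; Python integers don't overflow, so this only shows where the trip-count invariant lives once the higher-order function is inlined, it is not the C/LLVM situation itself.)

    def foreach(n, body):
        # the trip count is fixed here, once, before the loop runs;
        # after inlining, the compiler knows i covers exactly 0..n-1
        # and the induction variable can never wrap around
        for i in range(n):
            body(i)

    data = [1, 2, 3, 4]
    squares = []
    foreach(len(data), lambda i: squares.append(data[i] ** 2))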
<cr1901_modern>
Oh right; a higher order function not only returns a function, but can also take a function (presumably the body of code you want to run in a loop) as input.
<cr1901_modern>
It seems in this case the difference between C and other languages is that loop indices are assumed to be monotonically increasing or decreasing, but other languages enforce it, taking care of edge cases (just before overflow or underflow happens) manually, while C just assumes it doesn't happen and leaves it to the programmer.
<cr1901_modern>
s/manually/as special code paths/
<cr1901_modern>
(I assume the monotonically increasing/decreasing is the important invariant here for proper vectorization?)
<whitequa1k>
yes
<cr1901_modern>
What type of function would you use to provide the monotonic invariant? (Splitting the loop into two mutually exclusive ranges sounds like a decent idea for loops where one iteration does not depend on data from the previous)
Mon_ has joined #m-labs
Mon_ is now known as Guest50687
Guest50687 has quit [Quit: This computer has gone to sleep]
Guest50687 has joined #m-labs
<sb0>
whitequa1k, rjo wanted it. i guess for data processing, or long functions that are not executed in parallel with others...
<sb0>
felix_, i see. are you going to implement xon/xoff? that would make it more portable, as it doesn't need extra lines that many boards do not have.
Guest50687 has quit [Quit: This computer has gone to sleep]
Guest50687 has joined #m-labs
<whitequa1k>
sb0: oh, wait, actually *inlining* only when necessary was never an issue, it's not in github issue texts
<whitequa1k>
rather, *unrolling* only when necessary
<sb0>
whitequa1k, it should inline only when necessary too
<sb0>
also, if possible, make the unrolling/inlining/interleaving produce valid IR even when there are unresolved parallel blocks, and throw an error about them at a later stage
<sb0>
as we might someday have HW support for those blocks
<sb0>
but if that's hard, forget it
Guest50687 has quit [Quit: This computer has gone to sleep]
<whitequa1k>
sb0: that's actually exactly what happens now, all parallel blocks are valid IR and they simply explode in LLVMIRGenerator
<whitequa1k>
so I don't have to change anything :)
Mon_ has joined #m-labs
Mon_ has quit [Client Quit]
<whitequa1k>
sb0: so let's revisit inlining only when necessary
<whitequa1k>
when you call a function without inlining, you have to assume the worst case, i.e. it advances the timeline by the amount you know, and it submits an event at the beginning and end of the period it claims
<whitequa1k>
therefore, the only case when you can call it without inlining is when you interleave it with an immediate (not inside another function) call to delay in the other branch of the with parallel block
<whitequa1k>
or maybe several branches
<sb0>
or when you don't have to interleave it with anything
<whitequa1k>
yes, in other words, in trivially serial code
whitequa1k is now known as whitequark
<whitequark>
the case where you don't have to interleave it with anything is easy to handle but the case where you interleave it with a delay is more tricky...
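(A hypothetical ARTIQ-flavoured example of the two cases being discussed; the experiment skeleton, import path and durations are assumed, only with parallel and delay() come from the conversation.)

    from artiq.experiment import *  # assumed import path

    class Example(EnvExperiment):
        def build(self):
            self.setattr_device("core")

        @kernel
        def f(self):
            delay(5*us)        # timeline cost is statically known

        @kernel
        def run(self):
            # trivially serial: nothing to interleave, no inlining needed
            self.f()
            self.f()

            # interleaved with an immediate delay: f() could stay a call
            # only because the other branch is a plain delay() at least as
            # long, so the interleaver never has to look inside f's body
            with parallel:
                self.f()
                delay(10*us)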
<sb0>
maybe rjo has a different idea, but i believe that serial code, or function calls that don't touch the timeline, are the only relevant non-inline cases
<whitequark>
oh, function calls that don't touch the timeline are obviously not inlined, that is trivial
<whitequark>
but accurately handling code inside with parallel blocks that turns out to be effectively serial is trickier
<sb0>
doing it properly requires thread-like stuff which is slow
<whitequark>
no, I mean, it's just a rarely used edge case that might tickle transform bugs
<sb0>
I don't think we should do those complicated things... just inline when in doubt
rohitksingh has quit [Ping timeout: 255 seconds]
<sb0>
i.e. i guess inlining anything that touches the timeline when in a parallel block is fine
<sb0>
what cases do you want to support without inlining?
<whitequark>
I meant the one you just described, it would be like half of the interleave transform to support it
<whitequark>
i.e. to not inline in that case
<whitequark>
if a function has iodelay of 0 it is not even considered for inlining
<sb0>
why is it complicated to inline if iodelay > 0 and there is a "with parallel" somewhere in the call stack?
<whitequark>
it is complicated to not inline, because there is a contrived precondition of not inlining
<whitequark>
i.e. "all other branches of control have trivial delays on them that are at least as long as the delay introduced by this function"
<whitequark>
hrm
<whitequark>
well, actually, it's not all that complicated
<whitequark>
nevermind
<whitequark>
and one advantage is that you can use a function that would be impossible to inline because it is not statically known
<cr1901_modern>
Random q: Is a buggy device driver a potential cause of zombie processes on Linux?
<larsc>
maybe ;)
<cr1901_modern>
So I'm on Windows. I've been having memory usage problems. Turns out, a piece of bloatware that came preinstalled is buggy to the point that EVERY SINGLE PROCESS that ever runs on the machine is not completely removed from memory when I close the process.
<whitequark>
typical
<cr1901_modern>
Normally, I don't notice this, b/c each zombie takes 4k private and 16k page table entries
<cr1901_modern>
When I run ./configure and make however... I start noticing :D
<cr1901_modern>
as thousands of sh, grep, sed invocations remain in memory w/o an owner
<cr1901_modern>
I'm just wondering how that's even possible- that a user app can cause such an egregious leak.
cr1901_modern has quit [Read error: Connection reset by peer]
<felix_>
sb0: i was thinking about implementing rts/cts, but not needing extra lines with xon/xoff is a good point and the performance impact shouldn't be too bad. i'm however not sure if xon/xoff works well with having the buffers in the usb-serial converter and the turnaround times over the usb bus
<felix_>
but with big enough receive side software buffers, that still should work
<felix_>
i've had a look at the kernel driver and xon/xoff should work with the ftdi chips under linux. i also had a look at the silabs usb serial chips; they have no kernel driver support for xon/xoff, even though the hardware does (at least according to the datasheet)