<rjo>
seems to be a bigger change. one would want to push most of the wrap() logic into Signal/Constant as well, no?
<sb0>
the operators and other elements are already calling wrap()
<sb0>
so wrap() could attempt to call __fhdl__ if it does not recognize the object
<sb0>
but
<sb0>
this introduces another tricky corner case: operations between two __fhdl__ objects (e.g. CSRConstant + CSRConstant)
<rjo>
the question is why wrap() (and many other functions) need all that intelligence and knowledge about the things they handle. i would have pushed that into the objects in general.
<rjo>
for CSRConstant + CSRConstant, the operator would convert each to fhdl.
<rjo>
actually, wrap() should only concern itself with things that have been determined to be "not fhdl".
<rjo>
Operators should be able to blindly try: argument.__fhdl__(); except: wrap(argument)
<rjo>
and Signal and constants can return self. CSRConstants can return their constant; CSRStatus and CSRStorage similarly.
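(A minimal, self-contained Python sketch of the protocol rjo is proposing; the names wrap(), Constant and __fhdl__ follow the discussion but this is illustrative, not Migen's actual implementation.)

    class Constant:
        def __init__(self, value):
            self.value = value

    class Signal:
        def __fhdl__(self):
            return self           # already an FHDL value

    class CSRConstant:
        def __init__(self, value):
            self.constant = Constant(value)
        def __fhdl__(self):
            return self.constant  # lower to the underlying constant

    def wrap(obj):
        # ask the object to lower itself to FHDL first ("try __fhdl__, except wrap")
        fhdl = getattr(obj, "__fhdl__", None)
        if fhdl is not None:
            return fhdl()
        # fall back to the existing conversions for plain Python values
        if isinstance(obj, (bool, int)):
            return Constant(obj)
        raise TypeError("cannot convert {!r} to an FHDL value".format(obj))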
<rjo>
sb0: any idea what transceiver would be needed for DAC39J84 (what a beast...)?
<sb0>
kintex 7 should do it
<rjo>
oh minimum serdes rate 0.78 Gbps.
<sb0>
kintex 7 can do 12.5Gbps
<sb0>
it will eat all the transceivers of a 70t or 160t, though
<rjo>
if the cost per channel is more than ~500 or ~1k it will become inconvenient very quickly and there will be few users, especially as a general purpose DAC (pdq3 style).
<sb0>
so that would be 2k to 4k for a board with that DAC?
<sb0>
k7s aren't cheap... around 1k on digikey for a 325
<sb0>
Artix7 are 6.6Gbps, and lattice fpgas crawl at 3.2
<sb0>
cyclone is also 6.1 gbps, and arria/stratix are expensive
Guest88321 has quit [Quit: This computer has gone to sleep]
<sb0>
rjo, you cannot redefine what + does in python, unless at least one of the objects has a custom __add__ or __radd__
<sb0>
and same for other operators
<sb0>
so those things that offer __fhdl__ would also need to overload all operators to be consistent
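(A hypothetical sketch of sb0's point, continuing the naming from the snippet above; not actual Migen code. Anything exposing __fhdl__ would also have to forward Python's operator hooks, otherwise CSRConstant + CSRConstant never reaches wrap().)

    class _ForwardsOperators:
        # mixin for __fhdl__ objects: lower self (and the other operand, via
        # wrap) before applying the operator, so Python dispatches to the
        # FHDL value's own __add__ instead of raising TypeError
        def __add__(self, other):
            return self.__fhdl__() + wrap(other)
        def __radd__(self, other):
            return wrap(other) + self.__fhdl__()
        # ...and the same pattern for __sub__, __mul__, comparisons, etc.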
<whitequa1k>
sb0: the problem with your code would arise in slightly different case
<whitequa1k>
specifically...
<whitequa1k>
with parallel: f(x(), delay(2), x(), delay(2)); g(delay(3), y())
<whitequa1k>
I suppose it was sufficiently obscure to not matter
siruf has quit [Ping timeout: 240 seconds]
siruf has joined #m-labs
Mon_ has joined #m-labs
Mon_ is now known as Guest48315
jaeckel has quit [Ping timeout: 240 seconds]
Guest48315 has quit [Quit: This computer has gone to sleep]
Guest48315 has joined #m-labs
jaeckel has joined #m-labs
Guest48315 has quit [Quit: This computer has gone to sleep]
Mon_1 has joined #m-labs
<whitequa1k>
hmm, actually after the current changes it should be possible to use the ttl pulse function
Mon_1 has quit [Ping timeout: 255 seconds]
<GitHub55>
[artiq] whitequark pushed 2 new commits to new-py2llvm: http://git.io/v4pOc
<GitHub55>
artiq/new-py2llvm 5cd12ff whitequark: compiler.iodelay: fold MUToS and SToMU.
<GitHub55>
artiq/new-py2llvm a01e328 whitequark: transforms.interleaver: don't assume all delay expressions are folded.
<sb0>
rjo, if cost matters, maybe artix7 + lvds dac is better...
<sb0>
whitequa1k, are all the unittests passing on the board?
mumptai has joined #m-labs
<sb0>
whitequa1k, ah, i see you took that horrible train as well
<whitequa1k>
sb0: horrible train?
<whitequa1k>
what do you mean?
<sb0>
the mtr with the advertisement everywhere
<whitequa1k>
oh, yeah
<whitequa1k>
as for unittests, definitely not, the inlining is not here yet
<sb0>
doesn't the ttl pulse function require inlining?
<whitequa1k>
yes. I've just discovered that it doesn't actually work
<whitequa1k>
sb0: actually, on careful examination, there's a larger problem
<whitequa1k>
you said you want inlining *only when necessary*, right?
<whitequa1k>
but it's not actually possible to do in almost any case
<whitequa1k>
like, the only case where you can avoid inlining is when no functions actually execute in parallel
<sb0>
yes
<whitequa1k>
when is that useful?
<felix_>
i've rebased the ppp patch onto the current master, made pppd record everything to a file, had a look with wireshark at that dump and it really seems that the device-side receive path loses data when packets get processed
<felix_>
tested that with flood pings and the first ping always gets a response, but after a ping which gets a response, the next one (or sometimes two) don't get a response
<felix_>
(on the pipistrello board)
<felix_>
so yeah, i'm quite confident that implementing hardware flow control will solve at least a big part of the problem
<felix_>
the direction from the device to the computer seems to be completely fine
rohitksingh has joined #m-labs
nicksydney has joined #m-labs
rohitksingh has quit [Ping timeout: 272 seconds]
Mon_ has joined #m-labs
Mon_ is now known as Guest32581
rohitksingh has joined #m-labs
<cr1901_modern>
whitequa1k: What do you mean by this tweet? https://twitter.com/whitequark/status/667982701811527680 wrt "signed overflow being undefined"; this sounds like something decided by the standards committee, rather than being inherent to the language.
<cr1901_modern>
I'm guessing one language-based solution here would be to make C support "arbitrarily large integers" built-in?
Guest32581 has quit [Quit: This computer has gone to sleep]
<whitequa1k>
signed overflow being undefined is necessary to vectorize for loops with an int induction variable
<whitequa1k>
it is reasonable to want to vectorize loops, it's just that most languages are able to provide the invariant the vectorizer wants without resorting to UB
<cr1901_modern>
Well what would be a better invariant than "assume it doesn't happen"? A special datatype that promises that overflow does not happen (and crashes the program if you try)? I'm thinking in terms of a minimal runtime for bare metal targets.
<whitequa1k>
higher-order functions, for example
<whitequa1k>
or, well, something like stl's .begin()/.end(). either will guarantee there's no overflow
<whitequa1k>
trap on overflow is highly desirable but expensive enough in terms of inhibiting optimizations that rust, after long debate, decided against it
<whitequa1k>
you could also reformulate the for loop in terms of ranges, which means that the compiler can insert a check before the SIMD part
<cr1901_modern>
So in an environment w/o a vectorizer (for simplicity of visualization), with a higher order function, you run a snippet of code that figures out the next index to go to before running the loop body?
<whitequa1k>
no, a higher-order function embeds the invariant that the compiler wants
<whitequa1k>
then you inline it and you're back to the kind of code that a for loop generates, only now you have that invariant
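(A loose Python illustration of the idea whitequark is describing; Python integers don't overflow, so this only shows where the trip-count invariant lives once the higher-order function is inlined, it is not the C/LLVM situation itself.)

    def foreach(n, body):
        # the trip count is fixed here, once, before the loop runs;
        # after inlining, the compiler knows i covers exactly 0..n-1
        # and the induction variable can never wrap around
        for i in range(n):
            body(i)

    data = [1, 2, 3, 4]
    squares = []
    foreach(len(data), lambda i: squares.append(data[i] ** 2))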
<cr1901_modern>
Oh right; a higher order function not only returns a function, but can also take a function (presumably the body of code you want to run in a loop) as input.
<cr1901_modern>
It seems in this case the difference between C and other languages is that loop indices are assumed to be monotonically increasing or decreasing, but other languages enforce it, taking care of edge cases (just before overflow or underflow happens) manually, while C just assumes it doesn't happen and leaves it to the programmer.
<cr1901_modern>
s/manually/as special code paths/
<cr1901_modern>
(I assume the monotonically increasing/decreasing is the important invariant here for proper vectorization?)
<whitequa1k>
yes
<cr1901_modern>
What type of function would you use to provide the monotonic invariant? (Splitting the loop into two mutually exclusive ranges sounds like a decent idea for loops where one iteration does not depend on data from the previous)
Mon_ has joined #m-labs
Mon_ is now known as Guest50687
Guest50687 has quit [Quit: This computer has gone to sleep]
Guest50687 has joined #m-labs
<sb0>
whitequa1k, rjo wanted it. i guess for data processing, or long functions that are not executed in parallel with others...
<sb0>
felix_, i see. are you going to implement xon/xoff? that would make it more portable, as it doesn't need extra lines that many boards do not have.
Guest50687 has quit [Quit: This computer has gone to sleep]
Guest50687 has joined #m-labs
<whitequa1k>
sb0: oh, wait, actually *inlining* only when necessary was never an issue, it's not in github issue texts
<whitequa1k>
rather, *unrolling* only when necessary
<sb0>
whitequa1k, it should inline only when necessary too
<sb0>
also, if possible, make the unrolling/inlining/interleaving produce valid IR even when there are unresolved parallel blocks, and throw an error about them at a later stage
<sb0>
as we might someday have HW support for those blocks
<sb0>
but if that's hard, forget it
Guest50687 has quit [Quit: This computer has gone to sleep]
<whitequa1k>
sb0: that's actually exactly what happens now, all parallel blocks are valid IR and they simply explode in LLVMIRGenerator
<whitequa1k>
so I don't have to change anything :)
Mon_ has joined #m-labs
Mon_ has quit [Client Quit]
<whitequa1k>
sb0: so let's revisit inlining only when necessary
<whitequa1k>
when you call a function without inlining, you have to assume the worst case, i.e. it advances the timeline by the amount you know, and it submits an event at the beginning and end of the period it claims
<whitequa1k>
therefore, the only case when you can call it without inlining is when you interleave it with an immediate (not inside another function) call to delay in the other branch of the with parallel block
<whitequa1k>
or maybe several branches
<sb0>
or when you don't have to interleave it with anything
<whitequa1k>
yes, in other words, in trivially serial code
whitequa1k is now known as whitequark
<whitequark>
the case where you don't have to interleave it with anything is easy to handle but the case where you interleave it with a delay is more tricky...
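(A hypothetical ARTIQ-flavoured example of the two cases being discussed; the experiment skeleton, import path and durations are assumed, only with parallel and delay() come from the conversation.)

    from artiq.experiment import *  # assumed import path

    class Example(EnvExperiment):
        def build(self):
            self.setattr_device("core")

        @kernel
        def f(self):
            delay(5*us)        # timeline cost is statically known

        @kernel
        def run(self):
            # trivially serial: nothing to interleave, no inlining needed
            self.f()
            self.f()

            # interleaved with an immediate delay: f() could stay a call
            # only because the other branch is a plain delay() at least as
            # long, so the interleaver never has to look inside f's body
            with parallel:
                self.f()
                delay(10*us)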
<sb0>
maybe rjo has a different idea, but i believe that serial code, or function calls that don't touch the timeline, are the only relevant non-inline cases
<whitequark>
oh, function calls that don't touch the timeline are obviously not inlined, that is trivial
<whitequark>
but accurately handling code inside with parallel blocks that turns out to be effectively serial is trickier
<sb0>
doing it properly requires thread-like stuff which is slow
<whitequark>
no, I mean, it's just a rarely used edge case that might tickle transform bugs
<sb0>
I don't think we should do those complicated things... just inline when in doubt
rohitksingh has quit [Ping timeout: 255 seconds]
<sb0>
i.e. i guess inlining anything that touches the timeline when in a parallel block is fine
<sb0>
what cases do you want to support without inlining?
<whitequark>
I meant the one you just described, it would be like half of the interleave transform to support it
<whitequark>
i.e. to not inline in that case
<whitequark>
if a function has iodelay of 0 it is not even considered for inlining
<sb0>
why is it complicated to inline if iodelay > 0 and there is a "with parallel" somewhere in the call stack?
<whitequark>
it is complicated to not inline, because there is a contrived precondition of not inlining
<whitequark>
i.e. "all other branches of control have trivial delays on them that are at least as long as the delay introduced by this function"
<whitequark>
hrm
<whitequark>
well, actually, it's not all that complicated
<whitequark>
nevermind
<whitequark>
and one advantage is that you can use a function that would be impossible to inline because it is not statically known
<cr1901_modern>
Random q: Is a buggy device driver a potential cause of zombie processes on Linux?
<larsc>
maybe ;)
<cr1901_modern>
So I'm on Windows. I've been having memory usage problems. Turns out, a piece of bloatware that came preinstalled is buggy to the point that EVERY SINGLE PROCESS that ever runs on the machine is not completely removed from memory when I close the process.
<whitequark>
typical
<cr1901_modern>
Normally, I don't notice this, b/c each zombie takes 4k private and 16k page table entries
<cr1901_modern>
When I run ./configure and make however... I start noticing :D
<cr1901_modern>
as thousands of sh, grep, sed invocations remain in memory w/o an owner
<cr1901_modern>
I'm just wondering how that's even possible- that a user app can cause such an egregious leak.
cr1901_modern has quit [Read error: Connection reset by peer]
<felix_>
sb0: i was thinking about implementing rts/cts, but not needing extra lines with xon/xoff is a good point and the performance impact shouldn't be too bad. i'm however not sure if xon/xoff works well with having the buffers in the usb-serial converter and the turnaround times over the usb bus
<felix_>
but with big enough receive side software buffers, that still should work
<felix_>
i've had a look at the kernel driver and xon/xoff should work with the ftdi chips under linux. i also had a look at the silabs usb serial chips; they have no kernel driver support for xon/xoff, even though the hardware does (at least according to the datasheet)