sb0 changed the topic of #m-labs to: ARTIQ, Migen, MiSoC, Mixxeo & other M-Labs projects :: fka #milkymist :: Logs http://irclog.whitequark.org/m-labs
_rht has joined #m-labs
<mithro>
sb0: What is the current status of the DVI Sampler and frame buffer in the current misoc? _florent_ was mentioning something around the DMA interfaces changing and there are a couple of "TODO: rewrite dma_lasmi module" type things in the dvi_sampler code?
<sb0>
mithro, I haven't tested it for ages and there have been major misoc refactorings since then, so sure enough it's broken
<mithro>
sb0: okay, that is where I thought it was at
<sb0>
the bugfixes shouldn't be substantial, though
rohitksingh has joined #m-labs
kuldeep has quit [Ping timeout: 248 seconds]
kuldeep has joined #m-labs
mumptai has quit [Quit: Verlassend]
mumptai has joined #m-labs
<GitHub189>
[migen] sbourdeauducq pushed 2 new commits to master: https://git.io/vVvOa
<whitequark>
rjo: with my latest optimizer tweaking i improved #298 by a factor of 100 and #338 by a factor of 150
<whitequark>
should reduce compile time too
<rjo>
what is the final absolute number? that's what matters.
<whitequark>
1us
<rjo>
on both?
<whitequark>
that's for PulseRateDDS
<whitequark>
for that RTIO loop it's 250ns
<rjo>
that sounds reasonable. please tie down the unittests so that we don't regress again.
<whitequark>
and I can actually further improve both, though not by much
<whitequark>
mainly, I need to factor out very cold bounds checking code out of the loops
<rjo>
it's a reasonable number. i remember having 170 ns for a 75 MHz sys_clock in a very old version of RTIO (ventilator) with hand-written C and lm32 a few years back.
<whitequark>
since it pessimizes the inliner
<whitequark>
and the second thing is it constantly reads and writes the global now
<whitequark>
170ns might be achievable
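A source-level sketch of the outlining whitequark describes above; the compiler would do this on its own IR, so the function names here (sum_inline, _bounds_fail, sum_outlined) are purely illustrative:

    # hot loop with the cold raise inlined: every inlined copy of
    # this function drags the exception path along with it
    def sum_inline(buf, n):
        total = 0
        for i in range(n):
            if i >= len(buf):
                raise IndexError("index {} out of bounds".format(i))
            total += buf[i]
        return total

    # cold path factored out: the loop body stays small, so the
    # inliner is no longer pessimized by code that almost never runs
    def _bounds_fail(i):
        raise IndexError("index {} out of bounds".format(i))

    def sum_outlined(buf, n):
        total = 0
        for i in range(n):
            if i >= len(buf):
                _bounds_fail(i)
            total += buf[i]
        return total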
<rjo>
yeah. i can see that 64 bit stuff actually dominating eventually.
<whitequark>
pulse_rate_dds still has FP math
<rjo>
to repeat: RTIO pulse rate is 1/250ns now?
<whitequark>
mostly because frequency_to_ftw has a division, which has a ZeroDivisionError branch, which ends up inflating that function
<whitequark>
i.e. it's only really useful if everything is inlined into a single function
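For context, the general shape of such a conversion; this is an illustrative sketch rather than the actual ARTIQ frequency_to_ftw, and sysclk stands in for whatever clock the real driver divides by:

    def frequency_to_ftw(frequency, sysclk):
        # the division below is why the compiler must emit a
        # ZeroDivisionError branch: a cold path that inflates the
        # function body and only disappears once everything is
        # inlined into a single function
        return round(frequency * (2 ** 32) / sysclk)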
<whitequark>
other than fixing this `now` issue and marking the TTLOut.channel attribute as immutable, there is nothing to be done to increase the TTL pulse rate
<whitequark>
since the code is basically optimal already
<rjo>
ack.
<whitequark>
there's some modest PIC overhead, but not too much
<rjo>
one thing that is still in there is marking a few of those registers non-volatile.
<whitequark>
the inner loop is composed of 52 instructions
<whitequark>
going to non-PIC can save you, uh, I think four?
<whitequark>
(52 instructions not counting those in rtio_output)
<whitequark>
actually, nope
<whitequark>
two instructions
<whitequark>
the non-PIC version is 50.
<whitequark>
I think two of them stopped being loads, but that's not really much difference
<whitequark>
so I think PIC overhead can be considered negligible.
<whitequark>
ok. let me see what I can do with PulseRateDDS.
<whitequark>
also, I looked at the PulseRate test (the actual test code) that uses exceptions
<whitequark>
and the reason it's just 50ns worse than the code in that hastebin, which doesn't use exceptions, is because I used LLVM's zero-cost exception handling
<whitequark>
actually not even 50ns, it's exactly the same
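A minimal sketch of an exception-driven rate search like the one being discussed, assuming an illustrative device name (ttl_out) and starting period; the actual PulseRate test differs in detail. With zero-cost exception handling, the try block adds nothing to the non-throwing path:

    from artiq.experiment import *
    from artiq.coredevice.exceptions import RTIOUnderflow

    class PulseRateSketch(EnvExperiment):
        def build(self):
            self.setattr_device("core")
            self.setattr_device("ttl_out")

        @kernel
        def run(self):
            dt = 1000                   # starting period, machine units
            while True:
                self.core.break_realtime()
                try:
                    for _ in range(1000):
                        self.ttl_out.pulse_mu(dt)
                        delay_mu(dt)
                except RTIOUnderflow:
                    dt += 10            # too fast: back off and retry
                else:
                    return              # dt is a sustainable period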
<GitHub176>
[artiq] whitequark pushed 3 new commits to master: https://git.io/vVfel
<GitHub176>
artiq/master 186a564 whitequark: compiler: make quoted functions independent of outer environment.
<GitHub176>
artiq/master 5aec82b whitequark: test_pulse_rate: tighten upper bound to 1310ns.
<GitHub176>
artiq/master 20ad762 whitequark: llvm_ir_generator: generate code more amenable to LLVM's GlobalOpt....
<whitequark>
rjo: I think there is a problem with the PulseRateDDS test.
<whitequark>
it does 1000 iterations of setting DDSes
<whitequark>
and this currently results in a 500us value
<whitequark>
however, if I increase the number of iterations to 10000, it results in 2500us per pulse
<whitequark>
so I think with 1000 iterations, the measured value is lower than the real one; what happens is that every time it runs, it "borrows" a chunk of time from break_realtime
<whitequark>
but the iteration count is not high enough for this to result in an underflow.
<whitequark>
the higher I make the mu value in break_realtime, the lower the measured value becomes.
<whitequark>
whereas, if I raise the iteration count to 10000, then the value is the same as with 30000 iterations
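A hypothetical sketch of the effect, assuming break_realtime grants a fixed slack (the 5 ms rjo mentions below) and illustrative names (n, freq); the real PulseRateDDS test differs in detail:

    @kernel
    def run(self):
        self.core.break_realtime()      # pushes now ~5 ms past the counter
        t0 = self.core.get_rtio_counter_mu()
        for _ in range(self.n):
            self.dds.set(self.freq)     # each set() advances now
        t1 = self.core.get_rtio_counter_mu()
        # with small n the whole batch fits inside the 5 ms slack, so
        # (t1 - t0)/n under-reports the real per-set cost: part of every
        # iteration is paid out of the slack instead of wall-clock time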
<rjo>
cut that 5 ms to something like 100 us. pretty sure that's sufficient that subsequent batches don't overlap in your case. if you cut that 5 ms too much you will see RTIOSequenceError because of overlapping batches.
<whitequark>
yep. with this new benchmark i get ~266us on a very wide range of n's
<rjo>
that code is weird.
<whitequark>
is it?
<whitequark>
it's a way to ensure that fifos are cleared in time
<rjo>
i think you are just measuring the fifo depth here.
<whitequark>
am I?
<whitequark>
it returns 266us even with n=10
<whitequark>
well, 268us. close enough.
<rjo>
you push a bunch of events always 1ms in the future over and over again.
<whitequark>
hm.
<whitequark>
i see your point
<rjo>
that will generally succeed unless there are events in the fifo that prevent new events from getting in and through in time.
<whitequark>
yes
<whitequark>
so if there are none, am i not measuring the time it takes to submit events?
<rjo>
yes. if the fifo is empty and stays non-full during the entire game, you will measure that time, modulo the overhead due to setting and getting now.
<whitequark>
excellent. that's what i wanted to measure.
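A sketch of the scheme rjo describes, with illustrative names: each event is re-anchored a fixed 1 ms past the hardware counter, so the FIFO stays near-empty and the measured time approximates pure event-submission cost:

    @kernel
    def run(self):
        t0 = self.core.get_rtio_counter_mu()
        for _ in range(self.n):
            # re-anchor 1 ms ahead of the counter: the FIFO never
            # fills, so the loop is dominated by submission overhead
            at_mu(self.core.get_rtio_counter_mu()
                  + self.core.seconds_to_mu(1e-3))
            self.dds.set(self.freq)
        t1 = self.core.get_rtio_counter_mu()
        dt_mu = (t1 - t0) // self.n     # per-event submission cost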
<rjo>
but for large n (>~ 50) i expect this to be wrong.
<rjo>
anyway. good night. see you tomorrow.
<whitequark>
night.
<whitequark>
this actually returns the same value even for n=100000.