<whitequark>
why didn't we have that on lda_controller in the first place?..
<sb0>
I guess it's an oversight
<whitequark>
sb0: so, speaking of async exceptions
<whitequark>
I looked up what would have to be done with unwinding and adding CFI info to crt0 along with a small patch to restore EPCR and ESR (emulating l.rfe) should be enough
<whitequark>
(currently we can raise and get a backtrace asynchronously but trying to catch such an exception will not succeed)
<whitequark>
and I still fail to see where exactly the race conditions appear. but anyway, even if they do, it's trivial to add the primitives from your paper.
<sb0>
if you do something and you want to guarantee it's undone, the way to do it is "do(); try... finally: undo()"
<sb0>
async exceptions break this. with the current system, there is also no way to have this guarantee, but this is clear and not a subtle race condition
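(A minimal sketch of the pattern sb0 describes, to show where the guarantee is lost. Nothing here is ARTIQ code: `AsyncInterrupt`, `checkpoint` and `run` are hypothetical names, and the asynchronous exception is simulated by raising at a chosen point rather than being delivered from outside.)

```python
class AsyncInterrupt(Exception):
    """Hypothetical stand-in for an exception injected from outside the code."""


def run(interrupt_at=None):
    """Run do(); try: work() finally: undo() and return the steps that ran."""
    log = []

    def checkpoint(name):
        # Simulates an asynchronous exception arriving at exactly this point.
        if name == interrupt_at:
            raise AsyncInterrupt(name)

    try:
        log.append("do")           # do()
        checkpoint("after_do")     # window between do() and entering the try
        try:
            checkpoint("in_body")  # interruption inside the protected body
            log.append("work")
        finally:
            log.append("undo")     # undo() only runs once the try was entered
    except AsyncInterrupt:
        pass
    return log


print(run())            # normal path
print(run("in_body"))   # interrupted inside try: undo still runs
print(run("after_do"))  # interrupted before try: undo never runs
```

An interruption inside the `try` body is still cleaned up; one that lands between `do()` and the `try` is not, which is the gap that block/unblock primitives are meant to close.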
<whitequark>
ok. yes. i agree.
<whitequark>
we can add the with block/with unblock primitives.
<sb0>
also, the way watchdogs work on the device right now (comms CPU resetting the kernel CPU) is consistent with the way they work on the host (master killing the worker)
<whitequark>
the experience of using watchdogs is crao
<whitequark>
*crap
<whitequark>
that should change
<sb0>
what else to do without asynchronous exceptions? we can improve the error reporting, but that's about it...
<whitequark>
we could add async exceptions. but improving error reporting is the least we must do.
<sb0>
we can try to push python to add block/unblock, but that will take 3 years and likely will get rejected
<whitequark>
yes.
<whitequark>
it's not easily implemented on non-CPython too
<whitequark>
well, the problem is not quite that.
<sb0>
the error message should be fine with host watchdogs. on the device, this requires dealing with a nontrivial amount of C/lwip bullshit
<whitequark>
the problem is that killing a thread at random time, or equivalently raising an async exception from /outside/ of the thread, results in data structures being in an inconsistent state
<whitequark>
which is why Thread.kill is deprecated and discouraged in Java, etc
<whitequark>
killing threads basically does not work together with shared-memory parallelism
<sb0>
you do want the device to drop the connection at this stage and behave as if the host suddenly disconnected
<whitequark>
do I? can't I just send a packet saying "watchdog expired" and /then/ kill the connection?
<sb0>
"sending a packet" is complicated courtesy of C and lwip
<sb0>
I might be wrong, but it could be that you may have to keep it in memory for a while until any retransmissions have been done and until then actually close the connection, etc.
<sb0>
*only then
<sb0>
well. maybe assuming the host will receive the "watchdog expired" packet is fine, and you can go through the normal packet path...
bb-m-labs has quit [Quit: buildmaster reconfigured: bot disconnecting]
<mithro>
Gah, I was sure that I had done a search and replace to fix that...
<sb0>
misoc was also supposed to run on the m1
<sb0>
it does actually, though it hasn't been tested for a long while and therefore is most certainly broken
<sb0>
yes, flickernoise is C+RTEMS
<sb0>
and it does use those accelerators you mention
<mithro>
sb0: I think I read your PhD which talked about them
<sb0>
I don't have a PhD, that was an MSc
<sb0>
"pivot", hm, not sure if that "lean startup" link gives the best idea of what exactly is going on
<mithro>
Well, I don't actually know what went on... It was my best guess...
<sb0>
just say "moved on", without a link
<mithro>
fair enough
_whitelogger_ has joined #m-labs
_whitelogger has quit [Remote host closed the connection]
<sb0>
what is this doc for btw?
<mithro>
sb0: Partly for GSoC students, partly to share with people who are interested in helping get their Milkymist expansion board on the Opsis going
<sb0>
oh, so there *are* such people?
<mithro>
sb0: let me rephrase that, partly to share with people who can't run away from me fast enough to not try and rope them into helping out :-P
<GitHub69>
[artiq] sbourdeauducq pushed 1 new commit to master: https://git.io/vaTz1
<GitHub69>
artiq/master 9d1903a Sebastien Bourdeauducq: coredevice/i2c,ttl,spi: consistent device get
<whitequark>
so I decided to redownload the conda packages
<whitequark>
it's doing that at... 50kb/s?
<whitequark>
which is half a megabit?
<whitequark>
did conda decide to answer the question of "can it be even more shitty" with a resounding "YES"?
<whitequark>
ok. yes. i see. it does not even consider local packages.
<whitequark>
this is ... very stupid
bb-m-labs has quit [Ping timeout: 260 seconds]
bb-m-labs has joined #m-labs
FabM has quit [Remote host closed the connection]
<whitequark>
rjo: how much on a scale from 1 to 10 do you want 6to4?
<whitequark>
the router is running some weird build of openwrt with a kernel version slightly off from the official repo...
<whitequark>
rjo: I would install a trunk snapshot but openwrt's CI infra is sort of broken at the moment, so I cannot.
<whitequark>
I can build openwrt from source just for this occasion but I'm not sure if it's worth it
FabM has joined #m-labs
<cyrozap>
sb0: I've pushed my working tree for the UART over JTAG stuff (plus the device I'm testing with) to GitHub here if you want to see some dirty hacks: https://github.com/cyrozap/misoc/tree/uart-over-jtag
<cyrozap>
I'm probably going to factor out all the modifications I made to flterm, or at least make it have separate "synchronous JTAG" and "async serial" modes, because the async stuff is reeeeeally messing with the JTAG stuff right now.
<cyrozap>
But it _kinda_ works
<cyrozap>
There's just the occasional character duplication and bit flips
rohitksingh_work has quit [Quit: Leaving.]
sb0 has quit [Read error: Connection reset by peer]
rohitksingh_work has joined #m-labs
sb0 has joined #m-labs
<whitequark>
you know what, fuck it
<whitequark>
I'll just upload the artiq package before testing it
<whitequark>
if someone objects, they can go and fix conda
<rjo>
whitequark: 3
<rjo>
whitequark: if you want to use openwrt trunk, you should mirror that snapshot. otherwise you won't have packages soon.
<sb0>
hm, yes, qt doesn't allow you to remove a QDockWidget, only hide it
<sb0>
so 1) artiq_gui leaks memory 2) the right-click menu on a dock title bar is broken and can be used to crash the GUI
<sb0>
software is awful, as usual
<whitequark>
rjo: only kernel packages. and i'm putting all of the ones i build into the image itself anyway.
<sb0>
" You add dock widgets to a main window with addDockWidget()", and never remove them
<whitequark>
rjo: ah yes. I see what you meant. I would do that but as I've said, openwrt ar71xx-generic builds are down with no ETA.
<sb0>
calling deleteLater on a QDockWidget causes immediate segfault
<bb-m-labs>
I'll give a shout when the build finishes
<sb0>
ok, no it still crashes, intermittently
<sb0>
grmbl
<sb0>
"RuntimeError: wrapped C/C++ object of type AppletDock has been deleted" followed by either segfault or freeze
<whitequark>
I told you
<sb0>
in 10% of the dock closes
<whitequark>
where are you using a timer?
<sb0>
not in this particular code
<sb0>
probably some library uses one
<whitequark>
well, there is a timer that is used in pyqt
<sb0>
pyqt's memory management is suspicious to me (see previous issue with garbage collecting of shown widget) so I'm not very surprised about this new bug...
<sb0>
actually now it doesn't crash because of a timer
<sb0>
but because some asyncio coroutine later references the dock
<sb0>
with timing dependent on subprocesses, hence the intermittent nature of the crash
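(A rough sketch of one way to avoid this kind of crash, assuming the fix is to cancel any pending coroutine before the dock is destroyed. This is not the artiq_gui code and uses no PyQt API: `FakeDock`, `update_later` and `close_dock` are hypothetical names, and `deleted` merely imitates the wrapped C++ object going away.)

```python
import asyncio


class FakeDock:
    """Hypothetical stand-in for a dock widget wrapper (not a PyQt class)."""

    def __init__(self):
        self.deleted = False
        self.title = "applet"

    def set_title(self, title):
        if self.deleted:
            # Imitates the error PyQt reports for a destroyed widget.
            raise RuntimeError("wrapped C/C++ object has been deleted")
        self.title = title


async def update_later(dock, delay):
    # Touches the dock at a time that depends on external events.
    await asyncio.sleep(delay)
    dock.set_title("updated")


async def close_dock(dock, task):
    # Cancel and wait for the coroutine first, then destroy the dock,
    # so nothing can reference the dock after it is gone.
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    dock.deleted = True


async def demo():
    dock = FakeDock()
    task = asyncio.ensure_future(update_later(dock, 0.01))
    await close_dock(dock, task)
    await asyncio.sleep(0.02)  # past the point where the update would have run
    return task.cancelled()


print(asyncio.run(demo()))
```

Without the cancellation step, `update_later` would call `set_title` on the deleted dock whenever the close happened to come first, which matches the intermittent timing described above.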
<whitequark>
rjo: i've confirmed that ipv6 is working.
<whitequark>
it's configured for pure SLAAC mode, no DHCPv6.
<whitequark>
Windows systems seem to have some kind of issue with that.
sb0 has quit [Read error: Connection reset by peer]
<whitequark>
they seem to have some kind of very stupid issue where they don't recognize the default gateway, it seems
fengling has quit [Ping timeout: 240 seconds]
<rjo>
whitequark: ack. but a) ssh is firewalled b) ipv6 without dns for the hosts is not fun c) the ipv4 address behind that 6to4 address is still dynamic, right?
<rjo>
anyway. we got more important stuff to un-break
sb0 has joined #m-labs
<whitequark>
rjo: a) it isn't b) yes, but I can't do that without sb0 c) not anymore
<whitequark>
ipv4 becoming static is why i did this at all
<whitequark>
ok. a) it was.
<whitequark>
sb0: there's a very bizarre problem.
<whitequark>
test_loopback, when run individually, succeeds.
<whitequark>
but when run as a part of CoredeviceTest, it fails.
<whitequark>
sb0: not sure why test_address_collision fails. and test_ttl_pulse fails because of that analyzer bug that requires an even number of messages
<rjo>
sb0: why do i need that 200*us delay? otherwise i get an underflow
<rjo>
this is firmware and gateware just before the multi-bus dds commit
<GitHub2>
[artiq] jordens pushed 6 new commits to rtiobusy: https://git.io/vakwc
<GitHub2>
artiq/rtiobusy 446dcfb Robert Jordens: Merge commit '9d1903a' into rtiobusy...
<GitHub2>
artiq/rtiobusy b0de9ee Robert Jordens: coredevice: add RTIOBusy to __all__
<GitHub2>
artiq/rtiobusy 522ec60 Robert Jordens: hardware_testbench: don't allow unused *args
<rjo>
whitequark: mosh is firewalled on ipv6
rohitksingh_work has quit [Read error: Connection reset by peer]
<sb0>
rjo, probably because break_realtime() doesn't give enough margin for a full dds programming and/or you have a pathological cache issue
<rjo>
sb0: subsequent output events with the same timestamp should trigger a replacement for ttl but should be RTIOCollision or Busy for dds and spi, right?
<rjo>
sb0: that would be extremely pathological. it needs delays somewhere around 100-140 us.
<sb0>
does it also need it when you remove the commands before?
<rjo>
the sync()?
<sb0>
all the stuff before
<sb0>
and replacements always happen when the timestamps (including the fine part) are equal, and the addresses are equal
<rjo>
sb0: about the replacement: yes. that is ok for ttl where you can handle it gracefully like that. but for multi-cycle ops like dds and spi, that should be a collision (or busy) imho.
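(A toy model of the rule under discussion, not the RTIO gateware. It assumes only what is stated here: a replacement happens when the full timestamp, fine part included, and the address both match. The "collision" outcome for channels without replace support is rjo's proposal, and `Event`, `classify` and `supports_replace` are hypothetical names.)

```python
from collections import namedtuple

# timestamp is the full timestamp, including the fine part.
Event = namedtuple("Event", ["timestamp", "address", "data"])


def classify(previous, new, supports_replace):
    """Decide what a new output event does relative to the previous one."""
    same_slot = (previous is not None
                 and previous.timestamp == new.timestamp
                 and previous.address == new.address)
    if not same_slot:
        # Every other case is outside the scope of this sketch.
        return "not modelled"
    if supports_replace:
        # e.g. TTL: the later event overwrites the earlier one.
        return "replace"
    # Proposed for multi-cycle operations such as DDS and SPI.
    return "collision"


ttl_on = Event(timestamp=1000, address=0, data=1)
ttl_off = Event(timestamp=1000, address=0, data=0)
print(classify(ttl_on, ttl_off, supports_replace=True))
print(classify(ttl_on, ttl_off, supports_replace=False))
```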
<sb0>
in that last code you probably need to sync the dds before the break_realtime
<rjo>
sync the dds?
<sb0>
yes, otherwise break_realtime can set now() to a lower value
<sb0>
all those special cases add a lot of branches and cruft in the rtio core code.
<sb0>
hm, actually i already changed break_realtime() so it doesn't lower now()
<rjo>
sb0: but replacement should be special case, not the collision. it should be done right.
<sb0>
whatever. there are now two cases, with and without support for replaces.
<sb0>
complexity += 1
<rjo>
yeah. exactly. whatever.
<sb0>
?
<rjo>
it's fixing a bug. increasing complexity may be the price to pay here, right?
<sb0>
in practical cases, it's not doing a replace with dds, since it doesn't write again to the same addresses
<rjo>
maybe we should rewrite spi to exploit the same accidental work around then.
<GitHub41>
[artiq] jordens pushed 2 new commits to rtiobusy: https://git.io/vakx7
<GitHub41>
artiq/rtiobusy 703fc5a Robert Jordens: hardware_testbench: also print artiq_core_exeption
<GitHub41>
artiq/rtiobusy 58e0e67 Robert Jordens: tests: test spi business
rohitksingh has joined #m-labs
<GitHub120>
[artiq] sbourdeauducq pushed 1 new commit to master: https://git.io/vaIqr