<GitHub>
[artiq] whitequark commented on issue #709: > Of course, all the registers can be re-used for the repeated initialization code, so it might be a hard-coded "safety" limit being hit somewhere or something like that.... https://github.com/m-labs/artiq/issues/709#issuecomment-293746794
<GitHub>
[artiq] whitequark commented on issue #712: > Not really, because if you create such an object, then delete the sequence, then play it back, you are programming the DMA engine with an invalid address which will result in very obscure bugs.... https://github.com/m-labs/artiq/issues/712#issuecomment-293748193
<whitequark>
sb0: will add
<sb0>
bb-m-labs, force build artiq
<bb-m-labs>
build #1458 forced
<bb-m-labs>
I'll give a shout when the build finishes
<sb0>
whitequark, maybe the btree string lookup is good enough.
<sb0>
most of the slowness seems to come from the mailbox
<klickverbot>
sb0: I'm not sure whether you guys have talked concrete numbers regarding DMA latency offline already, but just for illustration: one particular experiment where DMA would've come in handy (and hopefully will once I'm back in Oxford) has a loop that ideally runs at ~1 MHz, branching on a TTL input each time. As I have to squeeze some physics into that as well, just about every cycle saved will be useful.
<klickverbot>
sb0: As a corollary, string lookup or IPC on that path is not a good idea. At the very least, an unsafe escape hatch that gets playback as close as possible to a single store (possibly in addition to whatever persistent API) would be helpful.
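A minimal sketch of the hot path klickverbot describes; the handle-based names (get_handle, playback) and the detect_ttl helper are hypothetical placeholders for whatever persistent DMA API gets adopted, not an existing ARTIQ interface:

    @kernel
    def run(self):
        # resolve the sequence once, outside the ~1 MHz loop
        handle = self.core_dma.get_handle("cooling_pulses")   # hypothetical API
        for _ in range(self.n_shots):
            if self.detect_ttl():                  # placeholder for the TTL branch
                self.core_dma.playback(handle)     # ideally little more than a single store
            # ... physics in the remaining time budget ...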
<sb0>
fatal: destination path '/var/lib/buildbot/slaves/debian-stretch-amd64-2/miniconda/conda-bld/artiq-kc705-nist_clock_1492062031184/work' already exists and is not an empty directory.
<sb0>
so yeah, if it's like the previous times it'll keep doing this for a few hours and then mysteriously work again
<sb0>
we need more staff to deal with this constant stream of pesky bugs.
<sb0>
bb-m-labs, force build artiq
<bb-m-labs>
build #1463 forced
<bb-m-labs>
I'll give a shout when the build finishes
<whitequark>
look at the contributor graph to conda-build, hire the top committer?
rohitksingh_work has quit [Read error: Connection reset by peer]
rohitksingh has joined #m-labs
rohitksingh has quit [Quit: Leaving.]
rqou_ has joined #m-labs
kristianpaul has quit [*.net *.split]
_florent_ has quit [*.net *.split]
rqou has quit [*.net *.split]
rqou_ is now known as rqou
_florent_ has joined #m-labs
futarisIRCcloud has quit [Quit: Connection closed for inactivity]
<rjo>
whitequark: if i pass a list from a non-kernel into a kernel, and the kernel modifies it, why is it not updated on kernel exit?
<whitequark>
rjo: only attributes are written back
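For example, a minimal sketch of the behaviour whitequark describes (class and attribute names are illustrative):

    class WritebackExample(EnvExperiment):
        def build(self):
            self.setattr_device("core")
            self.data = [0, 0, 0]

        @kernel
        def modify(self, arg):
            arg[0] = 42          # list passed as an argument: change is lost on kernel exit
            self.data[0] = 42    # list held in an attribute: written back to the host object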
<rjo>
maybe we decided to do it that way.
<whitequark>
I don't recall any particular discussion of this feature
<whitequark>
except for "it was done that way in the old compiler"
<whitequark>
I mean, it was even called "attribute writeback" from day one [of me at M-Labs]
<rjo>
whitequark: then we should either explain that in the manual or (if it is not a big amount of work) change it.
<rjo>
whitequark: ack.
<whitequark>
I'm not sure how hard it is to fix that
<whitequark>
the attribute writeback code is fairly finicky and it was difficult to get right
<rjo>
there are fundamentally three ways to return data from a kernel: (a) modify a list arg, (b) return a list, (c) modify a list attribute.
<rjo>
(b) is prevented by lifetime.
<whitequark>
list attribute?
<rjo>
(a) is prevented by the current implementation
<whitequark>
also, that's still wrong
<whitequark>
(d) pass a list to an RPC
<rjo>
yes.
<whitequark>
(this is actually how attribute writeback is implemented internally, it uses a special RPC)
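A minimal sketch of option (d), returning data through a plain RPC (names are illustrative):

    class RPCExample(EnvExperiment):
        def build(self):
            self.setattr_device("core")

        def store_result(self, values):    # no @kernel: runs on the host, invoked as an RPC
            self.results = list(values)

        @kernel
        def run(self):
            data = [1, 2, 3]
            self.store_result(data)        # the list contents are marshalled to the host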
<rjo>
ok. then let's leave the non-updating of mutable args as is.
<whitequark>
ah yes, I know why this is problematic
<rjo>
would it be hard to implicitly extend the lifetime of returned objects to infinity?
<whitequark>
I'm quite certain we cannot afford updating either mutable lists or mutable objects passed as arguments
<whitequark>
the attribute writeback tables are generated at compile-time and appending to them at runtime requires allocating
<rjo>
but then at compile time you also know the args.
<whitequark>
how so?
<rjo>
well that's what you are compiling.
<whitequark>
let's backtrack to your original question
<whitequark>
"if i pass a list from a non-kernel into a kernel, and the kernel modifies it"
<whitequark>
there's more than one way to do it. you can return a list from an RPC
<whitequark>
(or a mutable object)
<whitequark>
besides, there's still the inverse problem, modifying objects at Python side doesn't get reflected on kernel side
<whitequark>
which I don't think is solvable in a sane way
<rjo>
but that's always there since we only write-back on return and not e.g. on kernel-to-host-rpc.
<rjo>
the other way round.
<whitequark>
what I'm trying to say is that the set of attributes that get synchronized one way or another and the timing of that are always a somewhat arbitrary choice
<rjo>
what i want to say is: we always have inconsistency if we allow reentrant host code with a kernel invocation in between.
<rjo>
yes
<whitequark>
yes
<rjo>
we don't seem to mention that at all in the manual.
<whitequark>
that seems bad.
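A sketch of the inconsistency being discussed (attribute and method names are illustrative): attributes are captured when the kernel starts and written back only when it returns, so a host-side change made during a kernel-to-host RPC is not seen by the still-running kernel:

    def bump(self):              # host-side RPC
        self.x = self.x + 1      # visible on the host immediately...

    @kernel
    def run(self):
        self.bump()
        print(self.x)            # ...but the kernel still sees the value captured at kernel entry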
<rjo>
and about (b): why don't we allow that?
<whitequark>
hrm
<rjo>
whitequark: i understand what the compiler is telling me. but i don't get why this is a problem in our case.
<whitequark>
well, we implement returning values from kernels using something like this:
<whitequark>
def __modinit__():
<whitequark>
set_result(fn(args...))
<whitequark>
where set_result is an RPC generated in Core.
<whitequark>
we could of course add a hack that translates `return` statements in the toplevel function to calls, or something like that
<rjo>
but you are worried about special casing the top level kernel vs kernel-to-kernel calls?
<whitequark>
it would be a really special case, and it would cut through most of the compiler, yes
<whitequark>
hrm
<rjo>
ok.
<whitequark>
I can't think of any reason to not implement it beyond that
<whitequark>
I also can't think of any way to elegantly handle top-level calls. they are already somewhat hacky, but this would make them *much* more hacky.
<whitequark>
(currently the AST for the set_result(...) call is fabricated out of thin air, which is bad enough, but doesn't change the IR...)
<rjo>
but remind me: why is it a problem to extend the lifetime of an object beyond the return? just because de-allocation becomes harder?
<whitequark>
what do you mean "harder"?
<whitequark>
everything is currently deallocated at return, because we do not have a heap
<whitequark>
(allocated in the current function that is)
<rjo>
the compiler knows about everything to do deallocation, doesn't it?
<whitequark>
no
<whitequark>
the compiler does not explicitly do any deallocation
<whitequark>
it translates allocation to `alloca` calls and that's it
<whitequark>
it's a bump-pointer allocator that uses the call stack to store data.
<rjo>
ah. ok. i see the problem.
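A sketch of why the compiler rejects (b): everything a kernel allocates lives in that kernel function's stack frame, so returning it would leave a dangling reference (names are illustrative):

    @kernel
    def make_list(self):
        xs = [0] * 8     # lowered to an alloca: storage is in this function's frame
        return xs        # rejected: xs would outlive the frame that owns it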
<whitequark>
if we do allocators, we'll have to do things like allocating lists of lists.
<rjo>
yep. all agreed. then it just needs documentation.
Gurty has quit [Excess Flood]
Gurty has joined #m-labs
Gurty has joined #m-labs
Gurty has quit [Changing host]
<sb0>
rjo, sending every list back would slow down kernel execution further
<sb0>
it's already quite slow to start/stop kernels...
<whitequark>
lists really aren't handled very well by the compiler, to a large degree because they can be heterogeneous...
<whitequark>
doesn't numpy have some homogeneous lists?
sb0 has quit [Quit: Leaving]
<GitHub>
[artiq] dhslichter commented on issue #712: Improving playback speed is one of the major goals of DMA, and persistence between kernels is definitely a feature one would like to have (imagine a DMA sequence for sideband cooling -- you'll probably call this in most experiments, and don't want to have to re-transfer and re-load into memory for each and every kernel). So whatever solution comes needs to work with both of these goals in mind. ... http
hobbes- has quit [Ping timeout: 264 seconds]
<GitHub>
[artiq] whitequark commented on issue #712: > Could one have a table of DMA handles at a specific location in memory (which persists between kernels), where the kernel CPU knows to look to get the start addresses for DMA sequences?... https://github.com/m-labs/artiq/issues/712#issuecomment-293985666
<rjo>
sb0: isn't kernel start/stop slow because of the other well-known problems?
<rjo>
whitequark: yes. numpy is all about homogeneous arrays. i'd be cool with dropping lists and (en)forcing ARTIQ Python to use a (subset of the) numpy API for arrays.
key2 has quit [Quit: Page closed]
hobbes- has joined #m-labs
<whitequark>
rjo: that won't really do
<whitequark>
we have lists of stuff other than numbers too (say, lists of TTL outputs or the like)
hobbes- has quit [Read error: Connection reset by peer]
<whitequark>
something we *could* very nicely do is make lists (in ARTIQ Python) immutable and homogeneous, tuples immutable and heterogeneous, and numpy arrays mutable.
<whitequark>
since we need some sort of immutable homogeneous sequence, and tuples can't be one because of how they're used in the language
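Roughly the discipline being proposed, as a sketch (not current ARTIQ behaviour; assumes numpy is imported):

    xs = [ttl0, ttl1, ttl2]       # list: immutable, homogeneous (e.g. a list of TTL outputs)
    t = (1, 2.0, "label")         # tuple: immutable, heterogeneous, as today
    a = numpy.array([1, 2, 3])    # numpy array: the mutable, homogeneous container
    a[0] = 4                      # in-place mutation only through arrays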
<rjo>
whitequark: wasn't there some issue with lists of non-primitive objects?
<GitHub>
[artiq] dhslichter commented on issue #712: >No, because it is legal to use arbitrary code when creating the DMA sequence (I don't remember whether @rjo or @sbourdeauducq requested that). If we moved to only allow creating the DMA sequences before compiling the kernel, that would certainly solve most of the other problems--the sequence is now simply an opaque pointer.... https://github.com/m-labs/artiq/issues/712#issuecomment-293991430
<rjo>
whitequark: making lists immutable and homogeneous is very much fine if the numpy array api is supported somewhat instead. it also aligns well with the future direction of SIMD and features thereabouts.
<GitHub>
[artiq] whitequark commented on issue #712: > It seems it would be faster (at least for long sequences) to have this done by the PC at compile time and re-uploaded, rather than having the core device doing the recomputation. Is this correct?... https://github.com/m-labs/artiq/issues/712#issuecomment-293993545
<GitHub>
[artiq] dhslichter commented on issue #712: One more comment, a bit of a step back: to my mind, the rationale behind DMA is that it enables us to emit or receive pulses at a much higher rate than would be possible by simply using the kernel CPU to calculate them (or read them in) on the fly. To that end, it seems wise to involve the kernel CPU and comms CPU as little as possible in the generation of DMA output sequences. https://github.com/m-labs/a
<GitHub>
[artiq] dhslichter commented on issue #712: >>It seems it would be faster (at least for long sequences) to have this done by the PC at compile time and re-uploaded, rather than having the core device doing the recomputation. Is this correct?... https://github.com/m-labs/artiq/issues/712#issuecomment-293994957
hobbes- has joined #m-labs
<GitHub>
[artiq] whitequark commented on issue #712: > As pointed out, this is a very low TCP throughput, but to rise above 100 kbps seems like it should be an achievable task, no? For the general future performance of ARTIQ, it's going to be important to have a much fatter pipe than that between the PC and the FPGA hardware.... https://github.com/m-labs/artiq/issues/712#issuecomment-293996929
<GitHub>
[artiq] whitequark commented on issue #712: > Understanding that this is a nontrivial effort, are there other ways in which one might increase the speed of the communications past the 100-200 kbps mark?... https://github.com/m-labs/artiq/issues/712#issuecomment-294036514