<GitHub184>
sinara/master a264bac Greg: voltage regulator update
<sb0>
is there an easy way to prevent websites from knowing when the browser window loses focus?
<sb0>
one of the particularly shitty US government websites crashes when you switch windows
<sb0>
to make this bug particularly obnoxious, that website also requires an extraordinary amount of idiotic numbers with strange acronyms to be copy/pasted into it
<whitequark>
open console, then `window.onblur = null;`
<sb0>
ah, onblur
<sb0>
thanks
<sb0>
it's already null, though
<whitequark>
try clicking pause in devtools and then switching out
<whitequark>
it'll show you which event triggers the crash
rohitksingh_work has joined #m-labs
mumptai has joined #m-labs
rohitksingh_wor1 has joined #m-labs
rohitksingh_work has quit [Ping timeout: 248 seconds]
<whitequark>
rjo: got the PSUs
<whitequark>
no media converter yet, had to order it and i'll receive it tomorrow
<rjo>
whitequark: thanks. i don't need the media converter. i just need a sayma powered and hooked via usb.
<whitequark>
ohh ok, I thought you did
<whitequark>
usb only? what about ethernet?
<whitequark>
I'm not sure if I have a spare usb-eth
<whitequark>
rjo: btw you can login to the router and run tcpdump there.
<whitequark>
or at least I can, not sure if your keys are there
<rjo>
whitequark: i only need power and usb. sb0 needs ethernet.
<whitequark>
ok
<rjo>
whitequark: you already have usb (the on board ftdi) hooked up. just needs power.
<whitequark>
i'll depart towards the office soon
<rjo>
thanks! and, if you have, put the Sayma PSU on one of your remote/usb controlled power switches.
<rjo>
the one that you built/used for the rigol scope
<whitequark>
I think we only have one relay
<whitequark>
I can probably hook it so it resets everything
<whitequark>
for good measure
<mithro>
Anyone know of a good C library for talking to Etherbone on the host?
<whitequark>
sb0: I have determined why lwIP did not have the same performance issues here as smoltcp has
<whitequark>
it's actually not anything about lwIP
<whitequark>
it's about the way the networking code was written in the C runtime
<whitequark>
remember how it functioned? it determined the length, then queued everything up until that length, then started processing (or the other way around)
<whitequark>
this way, even with just two buffers, you're never losing packets because you're doing almost no actual work when receiving them
<whitequark>
lwip chains the pbufs into a linked list and then our receive function checks a single condition. that's done, and next packet may be processed.
<whitequark>
we, however, run (comparatively) expensive decoding code for every packet. by "comparatively" I mean it finishes in something like 200-300us *but* gigE is also pretty fast and the host is pushing our window
<whitequark>
this is more or less the same as congestion but as far as I can tell congestion control algorithms don't actually work on sub-ms RTTs
<whitequark>
the problem here is I don't really see how even TCP segment reassembly would fundamentally help
<whitequark>
sure, throughput would be higher, but you would still be getting random 200ms spikes in the middle of transfer
<whitequark>
so to summarize, a) core device cannot process packets as fast as hosts with gigE cards can emit them, b) linux's congestion avoidance algorithms don't work under the conditions the core device operates in, for some reason
<whitequark>
i don't see anything smoltcp is doing obviously wrong. it's sending a challenge ACK every time it receives a packet after the lost one, and yet it takes the host a retransmit time to retransmit the lost packet
<whitequark>
*even if* smoltcp already had segment retransmission today these delays would still be there, just the part immediately after recovery would be skipped
<whitequark>
i'll need to think more about it.
<rjo>
re a) that's clear. but there is an important detail, correct me if i am wrong: due to handling packets strictly in order, once you lose a single packet (for whatever reason), all packets in the window after the lost packet are implicitly lost as well.
<whitequark>
correct
<whitequark>
what i'm saying is it doesn't make much of a difference. consider:
<whitequark>
let's say we have window size of 4*MSS. host transmits four packets (1-4), core device immediately replies with an ACK to each of them, by the end of this sequence there are four more in-flight packets (5-8), the 5th packet gets lost because the core device was too slow.
<whitequark>
core device responds to 6,7,8 with challenge ACK, host stays silent for 200ms regardless, then retransmits 5-8 (acked), transmits 9-12, 9 gets lost, repeat of the same story.
<whitequark>
now consider the case where we have reassembly.
<whitequark>
host transmits packets 1-4, core device acks, 5th packet gets lost, packets 6-8 get put into receive buffer but *still* generate challenge ACKs
<whitequark>
host *still* stays silent for 200ms, resends packet 5, gets ack for 8, transmits 9-16, 13th gets lost, repeat.
<whitequark>
since the packets are processed in so little time compared to these constant retransmit timeouts, the actual bandwidth barely grows at all
<rjo>
sure. the fast retransmit threshold needs to be smaller than the window size (in packets).
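rjo's condition can be put into numbers with a deliberately simplified model: one segment of a back-to-back burst is lost, every later segment of that burst arrives, and each of those produces exactly one duplicate ACK. The function names are hypothetical, and the threshold of 3 used in the usage note is the conventional fast-retransmit value from RFC 5681. This only captures the necessary condition stated above; as the discussion goes on to note, the real behaviour is more complicated.

```rust
/// Number of duplicate ACKs the sender can receive when exactly one segment
/// of a back-to-back burst is lost and all later segments of the burst arrive.
/// `burst_len` is the burst size in segments, `lost_pos` the 1-based position
/// of the lost segment inside the burst.
fn dup_acks_after_loss(burst_len: u32, lost_pos: u32) -> u32 {
    assert!(lost_pos >= 1 && lost_pos <= burst_len);
    burst_len - lost_pos
}

/// True if the sender could reach `threshold` duplicate ACKs and retransmit
/// without waiting for its retransmission timer (simplified model only).
fn fast_retransmit_possible(burst_len: u32, lost_pos: u32, threshold: u32) -> bool {
    dup_acks_after_loss(burst_len, lost_pos) >= threshold
}
```

With a 4-segment burst and a threshold of 3, `fast_retransmit_possible(4, 1, 3)` holds only when the very first segment is the lost one; a loss at position 2 or later leaves too few following segments, so the sender has to fall back on the timer.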
<whitequark>
it's more complicated than that
<whitequark>
sec, let me demonstrate
<GitHub187>
[smoltcp] whitequark pushed 1 new commit to master: https://git.io/v5Be5
<GitHub187>
smoltcp/master 3fff475 whitequark: An unaddressable egress packet should not be a reportable error.
<rjo>
and you need to be able to process (or buffer) one window burst.
<whitequark>
that's half of the more complicated part.
<whitequark>
it's impossible in general.
<whitequark>
if we have multiple independent tcp connections, there is no fixed amount of buffers that makes this possible
<whitequark>
in practice this means that having aqctl_corelog and artiq_run at the same time might be problematic
<rjo>
are you sending the ack before you have vacated the space in the buffer?
<GitHub56>
[smoltcp] whitequark commented on issue #37: Fixed. https://git.io/v5Bvv
<GitHub84>
[smoltcp] whitequark closed issue #37: The `ping` example is currently broken. https://git.io/v543l
<travis-ci>
m-labs/smoltcp#206 (master - 3fff475 : whitequark): The build passed.
<GitHub108>
[artiq] dhslichter commented on issue #685: @whitequark interesting. Definitely will need to see why the RPC does so much worse. And good to see that rates >1 MB/s are appearing now. The throughput and latency issues with the host-core device communication are probably the most major roadblock to switching to ARTIQ 3 for us in the lab right now, so keep us posted on how the debugging progresses. https://github.com/m-labs/artiq/issues/685#
<GitHub43>
[artiq] jbqubit commented on issue #826: @jordens This is an ARTIQ Issue in so much as @sbourdeauducq insists that IRC logs are part of the ARTIQ documentation. Anyway, it's unclear to me who maintains the IRC log. https://github.com/m-labs/artiq/issues/826#issuecomment-326614157
<whitequark>
rjo: do you have some sort of monitoring solution suggested?
<whitequark>
slightly smarter than a cron script that spams me with email
<whitequark>
irclog.wq.org is pretty stable but it dies once every few months for some reason I can never pin down. maybe a memory leak in the server process.
<whitequark>
s/suggested/suggestion/
<whitequark>
ah yes
<whitequark>
oom killer detected that something else caused the oom condition, so it killed the log viewer server.
<rjo>
whitequark: i am no expert on monitoring these things... if you are really asking for a suggestion, i would look into kapacitor/telegraf/influxdb but that might be complete overkill for you.
<rjo>
whitequark: yes. oom has been hitting the "wrong" process for me for decades.
<rjo>
whitequark: hmm. no idea whether window updates should accelerate fast retrans as they seem to be doing in your case.
<whitequark>
actually I think they are *slowing down* the process
<whitequark>
at least, I understood that you imply that these extra ACKs should not appear
<whitequark>
well, I can eliminate them and then let's see if congestion control improves
<rjo>
whitequark: re smoltcp: there is also that weirdly long ~5 ms period between the second retransmit and the following dup ack...
<whitequark>
that's when it went copying out of the TCP buffer and into the kernel CPU buffer.
<rjo>
**5ms** ?
<whitequark>
I think so
<whitequark>
also, did you have log level lower than INFO?
<rjo>
whitequark: that's cjbe's dump
<whitequark>
oh ok, then unlikely
<whitequark>
ok
<rjo>
but that's the first time it has such a long delay. everything before is ~500 µs
<rjo>
btw: what's the problem with out-of-order segments? can't you just allocate the window as a buffer on the stack (maybe you do that already) and then place packets in there and have some simple bounded-size structure to track the received packets? you must be doing something similar already on the TX side...
<whitequark>
no, the TX side is very easy because I do not have holes in it
<whitequark>
I just look at the offset of the payload of the segment I'm emitting and peek at the buffer from that point
<rjo>
so you never have more than one tx segment in-flight?
<whitequark>
no
<whitequark>
the mirror equivalent of out-of-order reassembly is selective acks
<whitequark>
I don't have selective acks
<rjo>
ok. but then the mirror equiv of having at most one tx segment in-flight is having at most 1*MSS rx window...
<whitequark>
I have many tx segments in flight...
<whitequark>
it will emit them until it fills the receiver's window entirely
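The TX side described here (a contiguous buffer with no holes, segments cut at an offset, emission until the receiver's window is full) might look roughly like the sketch below. It is not smoltcp's code: the function and parameter names are hypothetical, and it returns a `Vec` for readability, which a `no_std` stack would avoid.

```rust
/// Hypothetical sketch of hole-free TX segmentation.
/// `unacked` holds every byte the peer has not acknowledged yet, in order.
/// `in_flight` is how many of those bytes were already sent.
/// Returns the payload slices that may be emitted right now.
fn segments_to_emit<'a>(
    unacked: &'a [u8],
    in_flight: usize,
    remote_window: usize,
    mss: usize,
) -> Vec<&'a [u8]> {
    let mut segments = Vec::new();
    // Never send past the receiver's advertised window or past the data held.
    let limit = unacked.len().min(remote_window);
    let mut offset = in_flight;
    while mss > 0 && offset < limit {
        // Peek at the buffer from the current offset, at most one MSS at a time.
        let end = (offset + mss).min(limit);
        segments.push(&unacked[offset..end]);
        offset = end;
    }
    segments
}
```

Because the buffer is contiguous, a retransmission is the same operation with `in_flight` reset to 0, so no per-segment bookkeeping is needed on this side.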
<rjo>
ah. i misread you.
<whitequark>
no congestion control but since the host is faster and we only have local networks it doesn't seem to do any harm
<whitequark>
having a simple bounded-size structure is reasonable but there are two issues with this
<whitequark>
ok, well, there are two paths here
<whitequark>
one is to have this bounded size configurable, like smoltcp lets you configure the amount of UDP packets that can fit into a socket buffer
<whitequark>
this requires an additional kind of buffer, minor borrow checker nightmare, additional complication when setting up a socket, another thing to tune...
<whitequark>
the other, as I just realized, is to have it completely hardcoded, say 4 or 8, and inline it into the TCP buffer
<rjo>
iirc several embedded stacks have a reassembly buffer bounded in window size (obviously) and bounded in number of segments in the buffer.
<rjo>
yes. i'd consider doing it hardcoded. how do you determine max window size BTW?
<whitequark>
ok, actually, I should thank you, because I just realized a very elegant way I can implement this that requires basically no serious changes to smoltcp and completely zero changes to external interface
<whitequark>
this alleviates all my concerns and answers most questions I've stalled on so far
<whitequark>
what do you mean by max window size?
<whitequark>
erm, by "determine max window size"
<whitequark>
the window size is just the size of the entire TCP socket buffer
<whitequark>
it's (at least for now) not growable and not shrinkable
<whitequark>
on top of that it's capped by RX_BUF_COUNT*MSS
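As arithmetic, the two limits just described combine into a simple minimum. The function name and the numbers in the usage note are illustrative assumptions, not values taken from the firmware.

```rust
/// Largest window under the two limits described above: the size of the TCP
/// socket buffer, capped by the number of receive buffers times the MSS.
/// (Hypothetical helper; names are illustrative.)
fn max_window(socket_buf_len: usize, rx_buf_count: usize, mss: usize) -> usize {
    socket_buf_len.min(rx_buf_count.saturating_mul(mss))
}
```

For example, `max_window(65535, 4, 1460)` gives 5840 bytes, while a 2048-byte socket buffer with the same other parameters stays at 2048.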
<rjo>
ok. imho RX_BUF_COUNT of missing segments in the reassy buffer (== TCP socket buffer?) sounds fine to me. tracking the missing stuff (instead of the received stuff) might be more efficient.
<rjo>
but anyway. i guess i was at most indirectly responsible for that idea of yours... ;)
<whitequark>
oh I was approaching TCP reassembly/selective acks as a kind of dynamic allocator
<whitequark>
this obviously is pretty hard to pull off
<whitequark>
but if the only thing I need is to track holes in the buffer then it's almost trivial
<rjo>
yes.
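A minimal sketch of what "tracking holes" with a hardcoded bound could look like is given below. It is an illustration of the idea only, not the design that later went into smoltcp: `HoleTracker`, `MAX_HOLES` and the use of plain byte offsets (rather than wrapping TCP sequence numbers) are all assumptions made here for brevity.

```rust
const MAX_HOLES: usize = 4; // hardcoded bound, as suggested in the discussion

/// A missing byte range, half-open: [start, end).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Hole {
    start: usize,
    end: usize,
}

struct HoleTracker {
    holes: [Hole; MAX_HOLES], // sorted by offset; only the first `len` are valid
    len: usize,
    highest: usize, // end of the highest byte range received so far
}

fn push(out: &mut [Hole; MAX_HOLES], n: &mut usize, start: usize, end: usize) -> bool {
    if *n == MAX_HOLES {
        return false;
    }
    out[*n] = Hole { start, end };
    *n += 1;
    true
}

impl HoleTracker {
    fn new() -> Self {
        HoleTracker { holes: [Hole { start: 0, end: 0 }; MAX_HOLES], len: 0, highest: 0 }
    }

    /// Record that bytes [start, end) arrived. Returns false, leaving the
    /// state untouched, if this would need more than MAX_HOLES holes; the
    /// caller would then drop the segment.
    fn add(&mut self, start: usize, end: usize) -> bool {
        if start >= end {
            return true;
        }
        let mut out = [Hole { start: 0, end: 0 }; MAX_HOLES];
        let mut n = 0;
        for h in &self.holes[..self.len] {
            // Keep whatever part of each hole lies left of the new data...
            let left_end = h.end.min(start);
            if h.start < left_end && !push(&mut out, &mut n, h.start, left_end) {
                return false;
            }
            // ...and whatever part lies right of it.
            let right_start = h.start.max(end);
            if right_start < h.end && !push(&mut out, &mut n, right_start, h.end) {
                return false;
            }
        }
        // Data beyond everything seen so far opens a new hole before it.
        if start > self.highest && !push(&mut out, &mut n, self.highest, start) {
            return false;
        }
        self.holes = out;
        self.len = n;
        self.highest = self.highest.max(end);
        true
    }

    /// Number of bytes from offset 0 that are available without gaps.
    fn contiguous(&self) -> usize {
        if self.len > 0 { self.holes[0].start } else { self.highest }
    }
}
```

The payload itself would be written straight into the socket buffer at its offset; the tracker only decides how far the in-order read pointer (and hence the ACK number) may advance, which is why no dynamic allocation is involved.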
<whitequark>
this eluded me for the last, what, five months? :)
<whitequark>
well whatever time during that I could dedicate to smoltcp, anyway
<travis-ci>
batonius/smoltcp#63 (master - 3fff475 : whitequark): The build passed.
<whitequark>
i turned off overcommit and now i can't log into ssh
<whitequark>
beautiful
whitequark has quit [Quit: leaving]
<rjo>
whitequark: ;) well. tcp is a bitch.
<rjo>
whitequark: i don't remember what overcommit is.. no Intel AMT serial console backdoor available? or HP ILO? ;)
<cr1901_modern>
rjo: Overcommit is the "brilliant" idea where the OS lies to you about whether a malloc succeeds or not.
<cr1901_modern>
But essentially it's required in 2017 :/
<cr1901_modern>
The idea is that you're probably not actually using all the pages you allocate, so it's safe to lie and allocate more memory than physically exists (incl. swapfile), under the assumption you'll never need all of it at once
<GitHub17>
sinara/master 4c1712e Greg: schematics completed, pdf added
klickverbot has joined #m-labs
klickverbot has left #m-labs [#m-labs]
d_n|a has joined #m-labs
d_n|a has quit [Remote host closed the connection]
d_n|a has joined #m-labs
<d_n|a>
whitequark: In case you are reading the logs later: Glad to hear you are thinking about implementing reassembly again.
<d_n|a>
We (Chris/I) would be happy to get you traces on whatever log level/custom build/etc. is helpful, but as long as there are obvious issues (no reassembly/proper ack handling within the announced rcv window), this might not be very useful
d_n|a has quit [Remote host closed the connection]