#m-labs on 2017-09-01 — irc logs at freenode.irclog.whitequark.org

2015-03-04 14:45 sb0 changed the topic of #m-labs to: ARTIQ, Migen, MiSoC, Mixxeo & other M-Labs projects :: fka #milkymist :: Logs http://irclog.whitequark.org/m-labs

00:20 <GitHub33> [sinara] gkasprow pushed 1 new commit to master: https://github.com/m-labs/sinara/commit/d47a2280f460a07ab0e2d9305f3c1f1b5e34faff

00:20 <GitHub33> sinara/master d47a228 Greg: initial schematic and PCB layout

00:34 <GitHub184> [sinara] gkasprow pushed 1 new commit to master: https://github.com/m-labs/sinara/commit/a264bacb6a90fbbd20aa9ca084af50e1d49af8ba

00:34 <GitHub184> sinara/master a264bac Greg: voltage regulator update

03:02 <sb0> is there an easy way to prevent websites from knowing when the browser window loses focus?

03:02 <sb0> one of the particularly shitty US government websites crashes when you switch windows

03:05 <sb0> to make this bug particularly obnoxious, that website also requires an extraordinary amount of idiotic numbers with strange acronyms to be copy/pasted into it

03:14 <whitequark> open console, then `window.onblur = null;`

03:16 <sb0> ah, onblur

03:16 <sb0> thanks

03:22 <sb0> it's already null, though

03:23 <whitequark> try clicking pause in devtools and then switching out

03:23 <whitequark> it'll show you which event triggers the crash

03:48 rohitksingh_work has joined #m-labs

06:03 mumptai has joined #m-labs

08:08 rohitksingh_wor1 has joined #m-labs

08:09 rohitksingh_work has quit [Ping timeout: 248 seconds]

09:10 <whitequark> rjo: got the PSUs

09:10 <whitequark> no media converter yet, had to order it and i'll receive it tomorrow

09:12 <rjo> whitequark: thanks. i don't need the media converter. i just need a sayma powered and hooked via usb.

09:12 <whitequark> ohh ok, I thought you did

09:12 <whitequark> usb only? what about ethernet?

09:12 <whitequark> I'm not sure if I have a spare usb-eth

09:13 <whitequark> rjo: btw you can login to the router and run tcpdump there.

09:13 <whitequark> or at least I can, not sure if your keys are there

09:15 <rjo> whitequark: i only need power and usb. sb0 needs ethernet.

09:15 <whitequark> ok

09:16 <rjo> whitequark: you already have usb (the on board ftdi) hooked up. just needs power.

09:16 <whitequark> i'll depart towards the office soon

09:17 <rjo> thanks! and, if you have, put the Sayma PSU on one of your remote/usb controlled power switches.

09:17 <rjo> the one that you built/used for the rigol scope

09:17 <whitequark> I think we only have one relay

09:17 <whitequark> I can probably hook it so it resets everything

09:17 <whitequark> for good measure

09:18 <mithro> Anyone know of a good C library for talking to Etherbone on the host?

09:52 <whitequark> sb0: I have determined why lwIP did not have the same performance issues here as smoltcp has

09:53 <whitequark> it's actually not anything about lwIP

09:53 <whitequark> it's about the way the networking code was written in the C runtime

09:53 <whitequark> remember how it functioned? it determined the length, then queued everything up until that length, then started processing (or the other way around)

09:54 <whitequark> this way, even with just two buffers, you're never losing packets because you're doing almost no actual work when receiving them

09:55 <whitequark> lwip chains the pbufs or something into linked lists and then our receive function checks a single condition. that's done. next poll can go on

09:58 <whitequark> erm

09:59 <whitequark> lwip chains the pbufs into a linked list and then our receive function checks a single condition. that's done, and next packet may be processed.

10:00 <whitequark> we, however, run (comparatively) expensive decoding code for every packet. by "comparatively" I mean it finishes in something like 200-300us *but* gigE is also pretty fast and the host is pushing our window

10:01 <whitequark> this is more or less the same as congestion but as far as I can tell congestion control algorithms don't actually work on sub-ms RTTs

10:02 <whitequark> the problem here is I don't really see how even TCP segment reassembly would fundamentally help

10:02 <whitequark> sure, throughput would be higher, but you would still be getting random 200ms spikes in the middle of transfer

10:10 <whitequark> so to summarize, a) core device cannot process packets as fast as hosts with gigE cards can emit them, b) linux's congestion avoidance algorithms don't work in the conditions the core device exists in, for some reason

10:11 <whitequark> i don't see anything smoltcp is doing obviously wrong. it's sending a challenge ACK every time it receives a packet after the lost one, and yet it takes the host a retransmit time to retransmit the lost packet

10:11 <whitequark> *even if* smoltcp already had segment retransmission today these delays would still be there, just the part immediately after recovery would be skipped

10:19 <whitequark> i'll need to think more about it.

10:21 <rjo> re a) that's clear. but there is an important detail, correct me if i am wrong: due to handling packets strictly in order, once you lose a single packet (for whatever reason), all packets in the window after the lost packet are implicitly lost as well.

10:22 <whitequark> correct

10:22 <whitequark> what i'm saying is it doesn't make much of a difference. consider:

10:25 <whitequark> let's say we have window size of 4*MSS. host transmits four packets (1-4), core device immediately replies with an ACK to each of them, by the end of this sequence there are four more in-flight packets (5-8), the 5th packet gets lost because the core device was too slow.

10:26 <whitequark> core device responds to 6,7,8 with challenge ACK, host stays silent for 200ms regardless, then retransmits 5-8 (acked), transmits 9-12, 9 gets lost, repeat of the same story.

10:26 <whitequark> now consider the case where we have reassembly.

10:27 <whitequark> host transmits packets 1-4, core device acks, 5th packet gets lost, packets 6-8 get put into receive buffer but *still* generate challenge ACKs

10:27 <whitequark> host *still* stays silent for 200ms, resends packet 5, gets ack for 8, transmits 9-16, 13th gets lost, repeat.

10:28 <whitequark> since the packets are processed in so little time compared to these constant retransmit timeouts, the actual bandwidth hardly grows at all
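
The two scenarios walked through above can be put into rough numbers. This is only a back-of-the-envelope sketch: the 200 ms stall and the 200-300 us processing time come from the discussion, while the MSS of 1460 bytes, the 250 us midpoint, and the count of 4 versus 8 segments delivered per loss cycle are assumptions read off the example, not measurements.

```python
# Rough model of a transfer that stalls for one retransmit timeout per cycle.
MSS = 1460        # bytes per segment (assumed, typical for Ethernet)
T_PROC = 250e-6   # seconds to process one packet (assumed midpoint of 200-300 us)
RTO = 200e-3      # host retransmit delay quoted in the discussion


def throughput(packets_per_cycle, stall=RTO):
    """Bytes per second when each cycle delivers `packets_per_cycle`
    segments and then waits out one stall."""
    cycle_time = packets_per_cycle * T_PROC + stall
    return packets_per_cycle * MSS / cycle_time


no_reassembly = throughput(4)         # 5-8 resent and acked, then 9 is lost
with_reassembly = throughput(8)       # 5 resent, 9-16 sent, then 13 is lost
no_stalls = throughput(1, stall=0.0)  # limited by processing time only

print(f"no reassembly:   {no_reassembly / 1e3:.1f} kB/s")
print(f"with reassembly: {with_reassembly / 1e3:.1f} kB/s")
print(f"no stalls:       {no_stalls / 1e3:.1f} kB/s")
```

Under these assumptions both cases land in the tens of kB/s while the stall-free rate is in the MB/s range, so the timeout dominates either way.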

10:28 <rjo> sure. the fast retransmit threshold needs to be smaller than the window size (in packets).

10:29 <whitequark> it's more complicated than that

10:30 <whitequark> sec, let me demonstrate

10:31 <GitHub187> [smoltcp] whitequark pushed 1 new commit to master: https://git.io/v5Be5

10:31 <GitHub187> smoltcp/master 3fff475 whitequark: An unaddressable egress packet should not be a reportable error.

10:31 <rjo> and you need to be able to process (or buffer) one window burst.

10:32 <whitequark> that's half of the more complicated part.

10:32 <whitequark> it's impossible in general.

10:32 <whitequark> if we have multiple independent tcp connections, there is no fixed amount of buffers that makes this possible

10:33 <whitequark> in practice this means that having aqctl_corelog and artiq_run at the same time might be problematic

10:33 <rjo> are you sending the ack before you have vacated the space in the buffer?

10:33 <GitHub56> [smoltcp] whitequark commented on issue #37: Fixed. https://git.io/v5Bvv

10:33 <GitHub84> [smoltcp] whitequark closed issue #37: The `ping` example is currently broken. https://git.io/v543l

10:34 <travis-ci> m-labs/smoltcp#206 (master - 3fff475 : whitequark): The build passed.

10:34 <travis-ci> Change view : https://github.com/m-labs/smoltcp/compare/dc94c35da38b...3fff475c8faa

10:34 <travis-ci> Build details : https://travis-ci.org/m-labs/smoltcp/builds/270773569

10:35 <whitequark> simultaneously

10:35 <whitequark> well, it should happen within a few instructions

10:36 <rjo> i don't get why 1. "5th packet gets lost because the core device was too slow" and 2. "host stays silent regardless"

10:36 <rjo> 1. should not happen because there is buffer space in the MAC. 2. should not happen because of FR.

10:36 <whitequark> oh, hm.
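
rjo's point about fast retransmit (FR) and the window size can be illustrated with a small sketch. It assumes the standard threshold of three duplicate ACKs and a receiver that only accepts in-order packets; the function names are hypothetical and packets are numbered as in the example above.

```python
DUP_ACK_THRESHOLD = 3  # standard fast retransmit threshold


def acks_from_receiver(arrived, first_expected):
    """Cumulative ACK numbers sent by an in-order-only receiver.

    `arrived` lists the packet numbers that reach the receiver, in order
    of arrival. Each ACK names the next packet the receiver expects.
    """
    expected = first_expected
    acks = []
    for packet in arrived:
        if packet == expected:
            expected += 1
        acks.append(expected)  # an out-of-order packet re-ACKs the same number
    return acks


def fast_retransmit_triggers(acks, last_ack):
    """Return the packet number the sender fast-retransmits, or None."""
    duplicates = 0
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates >= DUP_ACK_THRESHOLD:
                return ack
        else:
            last_ack = ack
            duplicates = 0
    return None


# Window of 4 packets (5-8), packet 5 lost: three duplicate ACKs arrive.
print(fast_retransmit_triggers(acks_from_receiver([6, 7, 8], 5), last_ack=5))  # 5
# Window of 3 packets (5-7), packet 5 lost: only two duplicates, no trigger.
print(fast_retransmit_triggers(acks_from_receiver([6, 7], 5), last_ack=5))  # None
```

With fewer packets in flight after the loss than the threshold, the sender has to fall back to its retransmit timer.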

12:46 rohitksingh_wor1 has quit [Read error: Connection reset by peer]

13:00 <rjo> whitequark: why are there so many dup acks? https://user-images.githubusercontent.com/1338946/29970678-c25d93ea-8f25-11e7-95d1-3b9cdc0e7359.png

13:00 <rjo> everything leading to the first triple-ack is ok.

13:01 <rjo> but then you ack twice more but there were no new packets?

13:01 <rjo> *leading to and including...

13:26 <rjo> whitequark: and RFC 1122 4.2.2.20: For example, if the TCP is processing a series of queued

13:27 <rjo> segments, it MUST process them all before sending any ACK

13:27 <rjo> segments.

13:27 <GitHub14> [artiq] jordens commented on issue #685: ![screenshot from 2017-09-01 14-56-04](https://user-images.githubusercontent.com/1338946/29970678-c25d93ea-8f25-11e7-95d1-3b9cdc0e7359.png)... https://github.com/m-labs/artiq/issues/685#issuecomment-326579806

14:08 bb-m-labs has quit [Ping timeout: 260 seconds]

14:08 bb-m-labs has joined #m-labs

14:20 hobbes- has joined #m-labs

14:56 <GitHub108> [artiq] dhslichter commented on issue #685: @whitequark interesting. Definitely will need to see why the RPC does so much worse. And good to see that rates >1 MB/s are appearing now. The throughput and latency issues with the host-core device communication is probably the most major roadblock to switching to ARTIQ 3 for us in the lab right now, so keep us posted on how the debugging progresses. https://github.com/m-labs/artiq/issues/685#

15:25 jbqubit has joined #m-labs

15:26 <GitHub136> [artiq] jbqubit opened issue #826: IRC log broken https://github.com/m-labs/artiq/issues/826

15:38 <GitHub196> [artiq] jordens commented on issue #826: That's not an ARTIQ issue. Write an e-mail. https://github.com/m-labs/artiq/issues/826#issuecomment-326613291

15:38 <GitHub135> [artiq] jordens closed issue #826: IRC log broken https://github.com/m-labs/artiq/issues/826

15:42 <GitHub43> [artiq] jbqubit commented on issue #826: @jordens This is an ARTIQ Issue in so much as @sbourdeauducq insists that IRC logs are part of the ARTIQ documentation. Anyway, it's unclear to me who maintains the IRC log. https://github.com/m-labs/artiq/issues/826#issuecomment-326614157

15:42 <GitHub143> [artiq] jbqubit reopened issue #826: IRC log broken https://github.com/m-labs/artiq/issues/826

15:58 <GitHub120> [artiq] jbqubit commented on issue #817: Initialization routine for [Creotech FMC DIO 32ch lvds a v1.2](www.ohwr.org/projects/fmc-dio-32chlvdsa/wiki) is here: ... https://github.com/m-labs/artiq/issues/817#issuecomment-326618219

16:05 <GitHub36> [sinara] gkasprow pushed 2 new commits to master: https://github.com/m-labs/sinara/compare/a264bacb6a90...08516fc90929

16:05 <GitHub36> sinara/master 08516fc Greg: Merge branch 'master' of https://github.com/m-labs/sinara

16:05 <GitHub36> sinara/master 1217738 Greg: 90% of schematics done

17:12 <jbqubit> bb-m-labs: help

17:12 <bb-m-labs> Get help on what? (try 'help <foo>', 'help <foo> <bar>, or 'commands' for a command list)

17:12 <jbqubit> bb-m-labs: commands

17:12 <bb-m-labs> buildbot commands: commands, dance, destroy, force, hello, help, last, list, mute, notify, shutdown, source, status, stop, unmute, version, watch

17:13 <jbqubit> bb-m-labs: list

17:13 <bb-m-labs> try 'list builders'

17:13 <jbqubit> bb-m-labs: status

17:13 <bb-m-labs> artiq: idle, last build 26h28m03s ago: build successful

17:13 <bb-m-labs> conda-all: idle, last build 2135h56m30s ago: failed

17:13 <bb-m-labs> conda-lin64: idle, last build 1909h34m33s ago: build successful

17:13 <bb-m-labs> artiq-win64-test: idle, last build 26h28m06s ago: build successful

17:13 <bb-m-labs> artiq-board: idle, last build 26h36m08s ago: build successful

17:13 <bb-m-labs> conda-win64: idle, last build 2135h57m16s ago: failed conda_build

17:13 <bb-m-labs> conda-win32: idle, last build 2135h57m48s ago: failed conda_build

17:13 <bb-m-labs> migen: idle, last build 34h56m49s ago: build successful

17:13 <bb-m-labs> misoc: idle, last build 33h40m38s ago: build successful

17:13 <bb-m-labs> pyqtgraph: idle, last build 4444h22m54s ago: build successful

17:13 <jbqubit> bb-m-labs: last

17:13 <bb-m-labs> last build [artiq]: 26h28m48s ago: build successful

17:14 <bb-m-labs> last build [conda-lin64]: 1909h35m17s ago: build successful

17:14 <bb-m-labs> last build [conda-all]: 2135h57m15s ago: failed

17:14 <bb-m-labs> last build [artiq-win64-test]: 26h28m51s ago: build successful

17:14 <bb-m-labs> last build [artiq-board]: 26h36m52s ago: build successful

17:14 <bb-m-labs> last build [conda-win32]: 2135h58m33s ago: failed conda_build

17:14 <bb-m-labs> last build [conda-win64]: 2135h58m01s ago: failed conda_build

17:14 <bb-m-labs> last build [migen]: 34h57m33s ago: build successful

17:14 <bb-m-labs> last build [misoc]: 33h41m22s ago: build successful

17:14 <bb-m-labs> last build [pyqtgraph]: 4444h23m39s ago: build successful

17:24 <GitHub161> [artiq] whitequark commented on issue #826: The domain is "whitequark.org", so @whitequark would be who maintains it... https://github.com/m-labs/artiq/issues/826#issuecomment-326637937

17:24 <whitequark> rjo: do you have some sort of monitoring solution suggested?

17:24 <whitequark> slightly more smart than a cron script that spams me with email

17:25 <whitequark> irclog.wq.org is pretty stable but it dies once every few months for some reason I can never pin down. maybe a memory leak in the server process.

17:25 <whitequark> s/suggested/suggestion/

17:27 <whitequark> ah yes

17:27 <whitequark> oom killer detected that something else caused the oom condition, so it killed the log viewer server.

17:27 <whitequark> what a wonderful design.

17:27 * whitequark turns off overcommit

17:28 <GitHub184> [artiq] jbqubit opened issue #827: bug in newly documented artiq-dev build work flow https://github.com/m-labs/artiq/issues/827

17:33 <GitHub70> [artiq] jbqubit commented on issue #826: Now fixed. Thanks @whitequark https://github.com/m-labs/artiq/issues/826#issuecomment-326639890

17:33 <GitHub6> [artiq] jbqubit closed issue #826: IRC log broken https://github.com/m-labs/artiq/issues/826

17:48 <GitHub28> [artiq] sbourdeauducq closed issue #827: bug in newly documented artiq-dev build work flow https://github.com/m-labs/artiq/issues/827

17:48 <GitHub199> [artiq] sbourdeauducq commented on issue #827: Your misoc version does not match the one prescribed in the conda-dev recipe. https://github.com/m-labs/artiq/issues/827#issuecomment-326643407

18:00 <whitequark> rjo: regarding "ack twice more"

18:00 <whitequark> this is actually an artifact of how I do window management

18:01 <whitequark> the window is clamped to RX_BUF_CNT*MSS, but on a level above TCP, so TCP doesn't know about that

18:01 <whitequark> and it tries to send a window update

18:02 <whitequark> actually, it should probably not send window updates except for the one case where window rises from zero to one MSS plus
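
The window handling described above can be sketched as follows. This is not smoltcp code: the function names are hypothetical, and "zero to one MSS plus" is read here as "from zero to at least one MSS", which is an assumption.

```python
def advertised_window(free_bytes, rx_buf_count, mss):
    """Window the receiver announces: free socket buffer space, capped at
    rx_buf_count * mss as described in the discussion."""
    return min(free_bytes, rx_buf_count * mss)


def should_send_window_update(last_advertised, current, mss):
    """Send a standalone window update only when the window reopens from
    zero to at least one MSS; other changes ride along on regular ACKs."""
    return last_advertised == 0 and current >= mss


# Example with assumed values: 4 receive buffers, MSS of 1460 bytes.
print(advertised_window(65535, 4, 1460))           # 5840
print(should_send_window_update(0, 1460, 1460))    # True
print(should_send_window_update(2920, 5840, 1460)) # False
```

Suppressing the other updates would avoid the extra ACKs rjo noticed in the trace, at the cost of the host learning about window growth only with the next regular ACK.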

18:09 key2 has quit [Quit: Page closed]

18:19 <GitHub76> [artiq] sbourdeauducq commented on issue #826: I never said that and even less insisted on it. https://github.com/m-labs/artiq/issues/826#issuecomment-326650465

19:05 jbqubit has quit [Quit: Page closed]

19:41 <rjo> whitequark: i am no expert on monitoring these things... if you are really asking for a suggestion, i would look into kapacitor/telegraf/influxdb but that might be complete overkill for you.

19:41 <rjo> whitequark: yes. oom has been hitting the "wrong" process for me for decades.

19:43 <rjo> whitequark: hmm. no idea whether window updates should accelerate fast retrans as they seem to be doing in your case.

19:49 <whitequark> actually I think they are *slowing down* the process

19:49 <whitequark> at least, I understood that you imply that these extra ACKs should not appear

19:49 <whitequark> well, I can eliminate them and then let's see if congestion control improves

19:53 <rjo> whitequark: re smoltcp: there is also that weirdly long ~5 ms period between the second retransmit and the following dup ack...

19:54 <whitequark> that's when it went copying out of the TCP buffer and into the kernel CPU buffer.

19:55 <rjo> **5ms** ?

19:55 <whitequark> I think so

19:55 <whitequark> also, did you have log level lower than INFO?

19:56 <rjo> whitequark: that's cjbe's dump

19:56 <whitequark> oh ok, then unlikely

19:56 <whitequark> ok

19:57 <rjo> but that's the first time it has such a long delay. everything before is ~500 µs

19:58 <rjo> btw: what's the problem with out-of-order segments? can't you just allocate the window as a buffer on the stack (maybe you do that already) and then place packets in there and have some simple bounded-size structure to track the received packets? you must be doing something similar already on the TX side...

19:58 <whitequark> no, the TX side is very easy because I do not have holes in it

19:59 <whitequark> I just look at the offset of the payload of the segment I'm emitting and peek at the buffer from that point

19:59 <rjo> so you never have more than one tx segment in-flight?

19:59 <whitequark> no

19:59 <whitequark> the mirror equivalent of out-of-order reassembly is selective acks

19:59 <whitequark> I don't have selective acks

20:00 <rjo> ok. but then the mirror equiv of having at most one tx segment in-flight is having at most 1*MSS rx window...

20:00 <whitequark> I have many tx segments in flight...

20:00 <whitequark> it will emit them until it fills the receiver's window entirely

20:01 <rjo> ah. i misread you.

20:01 <whitequark> no congestion control but since the host is faster and we only have local networks it doesn't seem to harm

20:01 <whitequark> having a simple bounded-size structure is reasonable but there are two issues with this

20:02 <whitequark> ok, well, there are two paths here

20:03 <whitequark> one is to have this bounded size configurable, like smoltcp lets you configure the amount of UDP packets that can fit into a socket buffer

20:03 <whitequark> this requires an additional kind of buffer, minor borrow checker nightmare, additional complication when setting up a socket, another thing to tune...

20:04 <whitequark> the other, as I just realized, is to have it completely hardcoded, say 4 or 8, and inline it into the TCP buffer

20:04 <rjo> iirc several embedded stacks have a reassembly buffer bounded in window size (obviously) and bounded in number of segments in the buffer.

20:05 <rjo> yes. i'd consider doing it hardcoded. how do you determine max window size BTW?

20:06 <whitequark> ok, actually, I should thank you, because I just realized a very elegant way I can implement this that requires basically no serious changes to smoltcp and completely zero changes to external interface

20:06 <whitequark> this alleviates all my concerns and answers most questions I've stalled on so far

20:06 <whitequark> what do you mean by max window size?

20:06 <whitequark> erm, by "determine max window size"

20:07 <whitequark> the window size is just the size of the entire TCP socket buffer

20:07 <whitequark> it's (at least for now) not growable and not shrinkable

20:07 <whitequark> on top of that it's capped by RX_BUF_COUNT*MSS

20:09 <rjo> ok. imho RX_BUF_COUNT of missing segments in the reassy buffer (== TCP socket buffer?) sounds fine to me. tracking the missing stuff (instead of the received stuff) might be more efficient.

20:10 <rjo> but anyway. i guess i was at most indirectly responsible for that idea of yours... ;)

20:11 <whitequark> oh I was approaching TCP reassembly/selective acks as a kind of dynamic allocator

20:11 <whitequark> this obviously is pretty hard to pull off

20:11 <whitequark> but if the only thing I need is to track holes in the buffer then it's almost trivial

20:11 <rjo> yes.

20:11 <whitequark> this eluded me for the last, what, five months? :)

20:12 <whitequark> well whatever time during that I could dedicate to smoltcp, anyway
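
The idea of tracking holes in a fixed buffer with a hardcoded bound, as discussed above, might look roughly like this. It is a sketch only and not how smoltcp implements it: `HoleTracker`, `max_ranges`, and the method names are hypothetical, and it stores the received ranges (the holes are the gaps between them) rather than the missing ones rjo suggests.

```python
class HoleTracker:
    """Tracks which byte ranges of a fixed receive window have arrived.

    Ranges are sorted, non-overlapping (start, end) pairs with `end`
    exclusive. Their number is bounded, so a fixed-size array would do.
    """

    def __init__(self, max_ranges=4):
        self.max_ranges = max_ranges
        self.ranges = []

    def add(self, start, size):
        """Record bytes [start, start + size). Returns False and changes
        nothing if that would need more than `max_ranges` ranges."""
        if size <= 0:
            return True
        end = start + size
        merged = []
        for lo, hi in self.ranges:
            if hi < start or lo > end:
                merged.append((lo, hi))  # neither overlapping nor adjacent
            else:
                start, end = min(start, lo), max(end, hi)  # absorb the range
        merged.append((start, end))
        merged.sort()
        if len(merged) > self.max_ranges:
            return False  # too many holes: the segment has to be dropped
        self.ranges = merged
        return True

    def take_contiguous(self):
        """Remove and return the number of in-order bytes at offset 0,
        shifting the remaining ranges down by that amount."""
        if self.ranges and self.ranges[0][0] == 0:
            _, end = self.ranges.pop(0)
            self.ranges = [(lo - end, hi - end) for lo, hi in self.ranges]
            return end
        return 0


tracker = HoleTracker(max_ranges=2)
tracker.add(100, 100)             # second segment arrives first
tracker.add(300, 100)             # fourth segment, leaving two holes
print(tracker.add(500, 100))      # False: a third range exceeds the bound
tracker.add(0, 100)               # first segment fills the leading hole
print(tracker.take_contiguous())  # 200
```

A segment rejected by `add` is simply treated as lost, which is what already happens to every out-of-order segment without reassembly.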

20:14 <travis-ci> batonius/smoltcp#63 (master - 3fff475 : whitequark): The build passed.

20:14 <travis-ci> Change view : https://github.com/batonius/smoltcp/compare/017210ea28b6...3fff475c8faa

20:14 <travis-ci> Build details : https://travis-ci.org/batonius/smoltcp/builds/270974355

20:26 mumptai has quit [Quit: Verlassend]

20:26 <whitequark> rjo: ...

20:27 <whitequark> i turned off overcommit and now i can't log into ssh

20:27 <whitequark> beautiful

20:29 whitequark has quit [Quit: leaving]

20:31 <rjo> whitequark: ;) well. tcp is a bitch.

20:32 <rjo> whitequark: i don't remember what overcommit is.. no Intel AMT serial console backdoor available? or HP ILO? ;)

20:39 <cr1901_modern> rjo: Overcommit is the "brilliant" idea where the OS lies to you about whether a malloc succeeds or not.

20:39 <cr1901_modern> But essentially it's required in 2017 :/

20:41 <cr1901_modern> The idea is that you're probably not actually using all the pages you allocate all at once, so it's safe to lie and allocate more memory than physically exists (incl swapfile) under the assumption you'll never need all of it at once

21:14 <GitHub17> [sinara] gkasprow pushed 1 new commit to master: https://github.com/m-labs/sinara/commit/4c1712ed5ebfd12d6675f226e2341face39457dc

21:14 <GitHub17> sinara/master 4c1712e Greg: schematics completed, pdf added

21:50 klickverbot has joined #m-labs

21:51 klickverbot has left #m-labs [#m-labs]

21:52 d_n|a has joined #m-labs

21:52 d_n|a has quit [Remote host closed the connection]

21:54 d_n|a has joined #m-labs

21:57 <d_n|a> whitequark: In case you are reading the logs later: Glad to hear you are thinking about implementing reassembly again.

21:58 <d_n|a> We (Chris/I) would be happy to get you traces on whatever log level/custom build/etc. is helpful, but as long as there are obvious issues (no assembly/proper ack handling within the announced rcv window), this might not be very useful

22:21 d_n|a has quit [Remote host closed the connection]