azonenberg changed the topic of #scopehal to: libscopehal, libscopeprotocols, and glscopeclient development and testing
Degi has quit [Ping timeout: 256 seconds]
Degi has joined #scopehal
juli966 has quit [Quit: Nettalk6 -]
electronic_eel has quit [Ping timeout: 264 seconds]
electronic_eel_ has joined #scopehal
<azonenberg> So i'm shifting gears a bit to MAXWELL firmware for a few days
<azonenberg> just for a change of pace, then back to probe design
<azonenberg> I was originally planning to do some function generator ui code for glscopeclient this weekend
<azonenberg> then $work decided to do maintenance on the VPN so i couldn't get into the lab where the scope with an attached function generator lived...
<_whitenotifier-f> [starshipraider] azonenberg pushed 2 commits to master [+7/-5/±0]
<_whitenotifier-f> [starshipraider] azonenberg ac73439 - Renamed resistive-probe to akl-pt2
<_whitenotifier-f> [starshipraider] azonenberg 0be1f07 - Initial implementation of multilane CRC simulation
<_whitenotifier-f> [starshipraider] azonenberg pushed 1 commit to master [+0/-0/±1]
<_whitenotifier-f> [starshipraider] azonenberg 268062e - Continued work on CRC simulation
electronic_eel_ is now known as electronic_eel
<tnt> azonenberg: I'm wondering if this can help: crc([a,b,c,d]) == crc([0,0,0,d]) ^ crc([0,0,c,0]) ^ crc([0,b,0,0]) ^ crc([a,0,0,0]) ^ crc([0,0,0,0])
<tnt> so you could compute several 'phases' in parallel and just combine them at the end.
<azonenberg> tnt: That is actually the theory of a paper i'm reading
<azonenberg> it splits a 128 bit datapath into four 32s then combines
<azonenberg> i think that's probably the way to go. I think with some optimization and possibly bumping the fpga up to -3 speed i can pass timing on a 32 bit
<azonenberg> But i need to play around a bit more... some recent implementation tweaks are suggesting the 128 bit implementation might be possible to smoosh down enough to make timing
<tnt> Ok, I guess there aren't too many ways to do it and it seemed like the obvious one :)
miek has quit [Ping timeout: 246 seconds]
asy_ has quit [Ping timeout: 246 seconds]
asy_ has joined #scopehal
miek has joined #scopehal
<Degi> Hmm, the XORing CRCs is interesting...
<tnt> the beauty of linear math
<d1b2> <OmniTechnoMancer> The crc of 4 0s is constant and can be done later right?
<d1b2> <OmniTechnoMancer> and omitted if you have an even number of 4 way splits?
<tnt> it's constant for sure. Not sure about the omission, I didn't work out exactly how the initial/final operations done in the CRC interacted.
<d1b2> <OmniTechnoMancer> I guess it depends how you combine two of these packets together?
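The identity tnt describes, and the question about the constant all-zeros term, can be checked with a minimal Python sketch. It assumes the standard CRC-32 as computed by `zlib.crc32` (Ethernet polynomial, all-ones init and final XOR); `lane` and `crc_from_lanes` are hypothetical helper names used only for illustration. Under these assumptions each lane CRC carries the same length-dependent constant, so the all-zeros CRC has to be XORed in once when the number of lanes is even and drops out when it is odd.

```python
import zlib

def lane(word: bytes, i: int) -> bytes:
    """Copy of word with every byte except index i forced to zero."""
    return bytes(b if j == i else 0 for j, b in enumerate(word))

def crc_from_lanes(word: bytes) -> int:
    """Rebuild crc32(word) from the CRCs of its single-byte lanes."""
    acc = 0
    for i in range(len(word)):
        acc ^= zlib.crc32(lane(word, i))
    # Every lane CRC contains the same init/final-XOR constant. With an
    # even lane count those constants cancel, so add it back exactly once.
    if len(word) % 2 == 0:
        acc ^= zlib.crc32(bytes(len(word)))
    return acc

word = bytes([0xAA, 0xBB, 0xCC, 0xDD])
assert crc_from_lanes(word) == zlib.crc32(word)
```

For the 4-byte case this is exactly the five-term XOR from the chat: four lane CRCs plus the CRC of four zero bytes.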
juli966 has joined #scopehal
juli966 has quit [Quit: Nettalk6 -]
bvernoux has joined #scopehal
<azonenberg> tnt: ok so i'm doing some math
<azonenberg> to start, simplifying by using crc32-posix which is the same polynomial as ethernet but initialized to all 0s not 1s
<azonenberg> in this case, crc(aa bb cc dd) = ~(crc(aa 00 00 00) ^ crc(00 bb 00 00) ^ crc(00 00 cc 00) ^ crc(00 00 00 dd) )
<azonenberg> unclear where the complement comes in
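One possible explanation for the complement, sketched in Python under stated assumptions: that "crc32-posix" here means polynomial 0x04C11DB7, register initialized to 0, no bit reflection, and a final XOR with 0xFFFFFFFF, without the message length that the `cksum` utility appends. The function name `crc32_init0` and the helper `one_hot` are hypothetical. With a zero init the register update is purely linear, so the only non-linear piece is the final XOR; four lane CRCs contain it four times (it cancels) while the full CRC contains it once.

```python
POLY = 0x04C11DB7  # same generator polynomial as Ethernet's CRC-32
MASK = 0xFFFFFFFF

def crc32_init0(data: bytes, xorout: int = MASK) -> int:
    """Bitwise MSB-first CRC-32 with the register initialized to 0 (assumed parameters)."""
    reg = 0
    for byte in data:
        reg ^= byte << 24
        for _ in range(8):
            reg = ((reg << 1) ^ POLY if reg & 0x80000000 else reg << 1) & MASK
    return reg ^ xorout

def one_hot(word: bytes, i: int) -> bytes:
    """Copy of word with every byte except index i forced to zero."""
    return bytes(b if j == i else 0 for j, b in enumerate(word))

word = bytes.fromhex("aabbccdd")

lanes_xor = 0
for i in range(4):
    lanes_xor ^= crc32_init0(one_hot(word, i))

# Four final XORs cancel in lanes_xor; the full CRC still has one of them.
assert crc32_init0(word) == lanes_xor ^ MASK

# With the final XOR disabled, the lanes combine with a plain XOR.
raw_xor = 0
for i in range(4):
    raw_xor ^= crc32_init0(one_hot(word, i), xorout=0)
assert crc32_init0(word, xorout=0) == raw_xor
```

If this reading is right, the complement appears for any even number of lanes and disappears for an odd number.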
<bvernoux> azonenberg, why not use a CRC32 lookup table?
<bvernoux> it will compute CRC32 at one cycle per byte
<azonenberg> bvernoux: This is in response to my twitter thread trying to figure out a good way of doing 40 Gbps CRC32 with a 128-bit datapath at 312.5 MHz for 40GbE
<bvernoux> ha ok
<azonenberg> one cycle per byte is about 16 times too slow :p
<bvernoux> yes clearly ;)
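For reference, the lookup-table approach bvernoux mentions can be sketched in Python as the usual byte-at-a-time software formulation of the standard CRC-32 (reflected polynomial 0xEDB88320), checked against `zlib.crc32`. The names `make_table` and `crc32_table` are illustrative only. Each loop iteration consumes one byte, which is why this shape is far too slow for a 16-byte-per-clock datapath.

```python
import zlib

def make_table() -> list:
    """Precompute the CRC of every possible byte value (reflected CRC-32)."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table

TABLE = make_table()

def crc32_table(data: bytes) -> int:
    """Table-driven CRC-32: one table lookup per input byte."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF

assert crc32_table(b"123456789") == zlib.crc32(b"123456789")
```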
<bvernoux> so the idea is to do a custom parallel algorithm?
<bvernoux> it would be interesting to check what ST does, they have a HW CRC32 ...
<azonenberg> I'm exploring an algorithm in a couple of papers
<azonenberg> which exploits the ability of crcs to be split and combined under some circumstances
<azonenberg> i'm still working on figuring out the best chunk size and how to combine things etc
<azonenberg> and how to handle the partial word at the end of a packet efficiently
<bvernoux> yes interesting, and the expected speed is 4 bytes / cycle or more?
<azonenberg> I'm targeting 16 bytes per clock at 312.5 MHz
<azonenberg> on a -2 kintex7
<bvernoux> ha yes will be very nice
<azonenberg> That's what it will take to process 40G line rate data
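The arithmetic behind these figures is short enough to write out. The sketch below only multiplies datapath width by clock rate, using the 312.5 MHz and 16-bytes-per-clock numbers from the chat; `throughput_gbps` is a hypothetical helper name.

```python
CLOCK_HZ = 312.5e6  # datapath clock quoted in the chat

def throughput_gbps(bytes_per_clock: int, clock_hz: float = CLOCK_HZ) -> float:
    """Raw throughput in Gbps for a datapath consuming bytes_per_clock each cycle."""
    return bytes_per_clock * 8 * clock_hz / 1e9

wide = throughput_gbps(16)   # 128-bit datapath
narrow = throughput_gbps(1)  # one byte per cycle, e.g. a byte-wise lookup table

print(wide, narrow, wide / narrow)  # 40.0 2.5 16.0
```

So a 128-bit word per clock lands exactly on the 40 Gbps line rate, and a byte-per-clock design falls short by a factor of 16.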
<bvernoux> yes at 40G it is tricky
<bvernoux> High-Speed Computation of CRC Codes for FPGAs
<bvernoux> It reaches 2TBytes/s ;)
<bvernoux> it seems it reaches 2TBytes/s too
<bvernoux> with verilog
<sorear> it's just a modular reduction, there are many ways to rearrange it
<bvernoux> this is what they do on a VC709 FPGA, with 2TBytes/s for all Ethernet frames from 64 bytes to 1518 bytes
<bvernoux> They have not added a license to reuse it anyway
<bvernoux> it is not clear if it can be freely used ...
<bvernoux> Frequency is between 508 to 550MHz
<bvernoux> here
<bvernoux> Very interesting to optimize FPGA ...
<bvernoux> only 1074 LUT with 64bits Bus Width to reach 2000Gbps @550.96MHz
<bvernoux> so if you scale it to 312.5MHz you can easily do 40Gbps CRC32 ;)
<bvernoux> with a huge margin
<bvernoux> just to correct it is not 2TBytes/s but 2Tbps ;)
<azonenberg> bvernoux: those numbers don't add up
<bvernoux> ?
<azonenberg> 64 bits @ 550 MHz is only 35 Gbps
<azonenberg> not 2000
<bvernoux> they say about 2Tbps
<bvernoux> to be checked maybe it is only for the big version ;)
<bvernoux> yes maybe the 2Tbps is only for Bus Width 4096 with 19902 LUT running at 509.13MHz ...
<bvernoux> it is not very clear
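The disputed numbers can be sanity-checked by multiplying bus width by clock frequency. This Python sketch uses only the two width/frequency pairs quoted in the chat and assumes raw throughput is simply bits per clock times clocks per second, ignoring any per-frame overhead; `raw_gbps` is a hypothetical helper name.

```python
def raw_gbps(bus_width_bits: int, freq_mhz: float) -> float:
    """Raw throughput in Gbps, assuming one full bus word is consumed per clock."""
    return bus_width_bits * freq_mhz * 1e6 / 1e9

small = raw_gbps(64, 550.96)    # roughly 35 Gbps
large = raw_gbps(4096, 509.13)  # roughly 2085 Gbps, i.e. about 2 Tbps

print(round(small, 2), round(large, 2))
```

Under that assumption the 2 Tbps figure is consistent with the 4096-bit configuration, while the 64-bit one sits near 35 Gbps.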
<bvernoux> for what you want you can choose the best one; it would be interesting to test what they have done instead of reinventing the wheel
<bvernoux> It seems to be open source
<bvernoux> they speak about it in Readme
<bvernoux> As far as we know, this is the first open source code covering the whole procedure of programming a single LUT
<bvernoux> speaking about Reprogramming by HWICAP
<bvernoux> their paper is here file:///C:/Users/Ben/AppData/Local/Temp/Low-Cost%20and%20Programmable%20CRC%20Implementation%20based%20on%20FPGA%20(Extended%20Version).pdf
<bvernoux> oops ;)
<bvernoux> in this paper they say => The maximum throughput of the system is 512 Gbps
<bvernoux> on page 6 we clearly see the correlation between Bus Width & Throughput
<bvernoux> yes, a 64-bit bus should correspond to 35Gbps ...
<bvernoux> a 1024-bit bus corresponds to 500Gbps
<bvernoux> all that work for a CRC32 ;)
juli966 has joined #scopehal
<azonenberg> bvernoux: also yeah i looked at that paper
<azonenberg> the icap bit is for changing crc polynomial on the fly
<azonenberg> Which i don't need
<bvernoux> but you have the free code on github too ;)
bvernoux has quit [Quit: Leaving]
sorear has quit [Read error: Connection reset by peer]
sorear has joined #scopehal
alexhw_ has joined #scopehal
alexhw has quit [Ping timeout: 256 seconds]