##openfpga on 2020-07-13 — irc logs at freenode.irclog.whitequark.org

00:05 emeb has quit [Quit: Leaving.]

00:33 sgstair has joined ##openfpga

01:27 Degi has quit [Ping timeout: 272 seconds]

01:28 Degi has joined ##openfpga

01:43 <cyrozap-ZNC> Hey, all! Before I go on a yak-shaving adventure, does anyone know of any logic analyzer cores I can just drop into an XC6SLX16 that support a 100 Msps sample rate and streaming data to a PC via an FT232H in synchronous 245 FIFO mode?

01:44 cyrozap-ZNC has quit [Quit: Client quit]

01:44 cyrozap has joined ##openfpga

01:48 <cyrozap> ... I realize this is a very specific request, but I'd be really embarrassed if I went and coded something like this up and it turned out that all or even just most of that effort was unnecessary.

02:01 <cyrozap> Oh, and the context of this is that I want to snoop on an SPI bus running at around 20 MHz for several seconds, so I need to stream 100 MSps of at least 4 channels of data to my PC, and I don't want to buy new hardware to do all that (assuming that hardware even exists and works on Linux), so I'd like to use my Digilent Analog Discovery if I can, but the Discovery/WaveForms only has a buffer depth of

02:01 <cyrozap> 16k samples, so at 100 MSps that's only ~164 microseconds of data.
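
A quick check of the buffer-depth arithmetic above, as a sketch that assumes "16k" means 16,384 samples. The function name is illustrative and not part of WaveForms or sigrok.

```python
def capture_window_seconds(buffer_samples: int, sample_rate_hz: float) -> float:
    """Time span covered by one full sample buffer at the given sample rate."""
    return buffer_samples / sample_rate_hz


# Assumption: "16k" is 16 * 1024 samples, sampled at 100 MSps.
window = capture_window_seconds(16 * 1024, 100e6)
print(f"{window * 1e6:.2f} us")
```

This comes out at about 164 microseconds, which is far short of the several seconds of capture wanted here.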

02:04 <cyrozap> So my current idea is to write my own dumb LA core that always samples 8 channels at 100 MSps, stuffs each byte of that data into a deflate core (https://github.com/tomtor/HDL-deflate), and then spits the compressed stream out to the FT232H at (hopefully) <35 MB/s.

02:28 Bike has quit [Quit: leaving]

02:36 Bird|otherbox has quit [Read error: Connection reset by peer]

02:36 Bird|otherbox has joined ##openfpga

02:40 gregdavill has joined ##openfpga

02:40 <whitequark> azonenberg: https://twitter.com/whitequark/status/1282503395383836673

02:44 jaseg has quit [Ping timeout: 260 seconds]

02:45 jaseg has joined ##openfpga

03:15 <futarisIRCcloud> whitequark: https://youtu.be/0xSA0ljsnjc - There was a talk at LCA 2018 related to that.

03:22 <azonenberg> cyrozap: why deflate if you could just run length encode?

03:22 <azonenberg> RLE works very well on most digital data especially if oversampled

03:22 <azonenberg> and uses far fewer gates

03:22 <azonenberg> if it's not enough compression then consider something more aggressive

03:22 <azonenberg> but i'd be very doubtful that you could fit full LZ in a 6slx16 for a sane number of channels

03:28 <cyrozap> azonenberg: Well, because it's done a good job of compressing my sigrok captures 10:1 (the SR file format is literally just a zip file with files filled with bytes, one byte for every 8 channels), it doesn't take up too many LUTs, it's fast in hardware (100 MHz according to that repo), and overall it just seemed like a good tradeoff between compression ratio, speed, and resource usage.

03:29 <cyrozap> I suppose I should do a benchmark between deflate and RLE on my real-world captures, so I know for sure if deflate or RLE would be better.

04:16 <cyrozap> Ok, so far, with my naive RLE implementation, deflate is winning. 2.4 MB for deflate vs. 14 MB for RLE, both performed on a 182 MB input. Yeah, I could probably fiddle with things like increasing the number of bytes for the encoded length, using certain bits to indicate the number of bytes for the encoded length, etc., but I'm pretty sure I'm not a better compression engineer than the people who made

04:17 <cyrozap> deflate.
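
A minimal sketch of this kind of benchmark, not cyrozap's actual encoder or data. The RLE format (a count byte followed by a value byte), the synthetic capture layout, and all function names are assumptions made for illustration. `zlib.compress` wraps the deflate stream in a small zlib header and checksum, so its size is slightly larger than raw deflate.

```python
import zlib


def rle_encode(data: bytes) -> bytes:
    """Naive RLE: every run becomes a (count, value) byte pair, count capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        value = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == value and run < 255:
            run += 1
        out += bytes((run, value))
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Inverse of rle_encode."""
    out = bytearray()
    for k in range(0, len(encoded), 2):
        out += bytes([encoded[k + 1]]) * encoded[k]
    return bytes(out)


def synthetic_capture(payload: bytes) -> bytes:
    """Hypothetical capture, one byte per sample: bit0=SCK, bit1=MOSI, bit3=CSN.

    Each payload byte is preceded by an idle gap and clocked out MSB first
    with 5 samples per SCK period (100 MSps / 20 MHz).
    """
    samples = bytearray()
    for byte in payload:
        samples += bytes([0x08]) * 200            # idle: CSN high, SCK and MOSI low
        for bit in range(7, -1, -1):
            mosi = ((byte >> bit) & 1) << 1
            samples += bytes([mosi]) * 2          # SCK low phase
            samples += bytes([mosi | 0x01]) * 3   # SCK high phase
    return bytes(samples)


capture = synthetic_capture(bytes(range(64)))
rle = rle_encode(capture)
deflated = zlib.compress(capture, 9)
print(f"raw={len(capture)} rle={len(rle)} deflate={len(deflated)}")
```

The sizes printed for this synthetic input say nothing about real captures; the point is only that both encoders can be run over the same bytes and compared directly.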

04:19 <TD-Linux> deflate is going to easily crush rle because it has entropy coding, so it can take advantage of correlation between channels

04:19 <TD-Linux> even more so if you have a synchronized clock as then the dictionary will kick in

04:20 <sorear> you can be a pretty terrible compression engineer and make something better *for a specific purpose* than the best general-purpose algorithms

04:21 <TD-Linux> honestly I think you're going to have trouble beating general purpose lossless compression on signal waveforms

04:23 <TD-Linux> especially with a few tweaks like how you format the bytes going in and maybe choice of dictionary size

04:26 <cyrozap> Yeah, Huffman coding is good shit. And I haven't even measured how well (or rather, how poorly) my RLE encoder handles the hard-to-compress parts of the capture--that's really important since I only have so much instantaneous bandwidth available on the USB bus, and no DRAM to buffer the compressed data in.

04:34 <TD-Linux> yeah, the worst case is of course no compression so it really depends on your device. you'll definitely want a long fifo. if your data is bursty that might obviate the need for deflate

04:40 <TD-Linux> now you're making me want to try implementing the av1 entropy coder (which consumes up to 4 bits of entropy per cycle)

04:40 <cyrozap> Here's the results of the additional tests: For the least-compressible part of the capture (largest filesize output), my RLE encoding was 5.6 times the size of the deflated one. For the most-compressible parts, my RLE encoding was consistently about 8 times the size of what deflate produced.

04:44 <sorear> I've done twice-as-good-as-LZMA on the Unicode Character Database, personally

04:44 <sorear> haven't tried with large waveform captures, never needed to

04:46 <sorear> ultimately it comes down to (a) THINK for half a second about the local and nonlocal correlations in your data (b) throw it into an entropy coder

04:47 <sorear> are you storing data synchronous to the SPI clock or to an independent sampling clock?

04:52 <cyrozap> And last interesting comparison between my naive RLE and deflate: On the least-compressible data, RLE was 60% of the size, while deflate was 11%, which is really important because at 35 MB/s max throughput, RLE would let me do 50 MSps, while deflate would enable (in theory) over 300 MSps, though of course in practice I'm limited by the speed of the encoder and the input characteristics of the LA.
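
The estimate above can be sketched as link throughput divided by the compressed fraction. This assumes one byte per 8-channel sample and that the worst-case ratio holds for the whole stream; the function name is illustrative.

```python
def max_sample_rate_msps(link_mb_per_s: float,
                         compressed_fraction: float,
                         bytes_per_sample: float = 1.0) -> float:
    """Highest sustained sample rate (MSps) the link can carry after compression."""
    return link_mb_per_s / (compressed_fraction * bytes_per_sample)


# Figures quoted in the message: 35 MB/s link, 60% (RLE) and 11% (deflate) of input size.
for name, fraction in (("RLE", 0.60), ("deflate", 0.11)):
    print(f"{name}: {max_sample_rate_msps(35.0, fraction):.0f} MSps")
```

With those inputs the result is roughly 58 MSps for RLE and roughly 318 MSps for deflate, in line with the "50" and "over 300" quoted.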

04:54 <sorear> you said the SPI was 20 MHz, why do you care about 300 MSps?

04:57 <cyrozap> sorear: Yeah, if I did that, then I wouldn't even need the compression (20 MHz SPI data would easily fit in that 35 MB/s max transfer rate, since I could omit the clock bit and stuff 8 samples of CSN+MOSI+MISO into 3 bytes, which would give me 7.5 MB/s), but I'm trying to make something a little more generic.
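
A sketch of the clock-synchronous packing described above, assuming one 3-bit sample (CSN, MOSI, MISO) per SPI clock. The bit order within a sample and the big-endian packing are arbitrary choices made for illustration.

```python
def pack_spi_samples(samples) -> bytes:
    """Pack eight 3-bit samples into three bytes, first sample in the top bits."""
    if len(samples) != 8:
        raise ValueError("expected exactly 8 samples")
    word = 0
    for sample in samples:
        word = (word << 3) | (sample & 0b111)
    return word.to_bytes(3, "big")


def packed_rate_mb_per_s(spi_clock_mhz: float, bits_per_sample: int) -> float:
    """Byte rate when one sample of bits_per_sample bits is stored per SPI clock."""
    return spi_clock_mhz * bits_per_sample / 8


print(pack_spi_samples([0b101] * 8).hex())
print(packed_rate_mb_per_s(20, 3))  # CSN + MOSI + MISO
print(packed_rate_mb_per_s(20, 2))  # MOSI + MISO only
```

At 20 MHz this gives 7.5 MB/s with three signals and 5 MB/s with two, both well under the 35 MB/s link budget.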

04:58 <cyrozap> And I'm only trying for 100 MSps so I can have at least 4x oversampling.

04:59 <cyrozap> That 300 MSps was just a theoretical number based only on the compression ratio I got from deflate, and not something I'd ever expect in practice.

04:59 <sorear> I suspect deflate will work much better if you give it protocol-level bytes rather than 8-consecutive-samples

05:05 <cyrozap> Right, but what I'm saying is that this is mostly just an excuse to get me to write a general-purpose FOSS bitstream for this USB scope/LA, so I can get it working in sigrok and not have to use Digilent WaveForms. If I just wanted to look at SPI data, of course it'd probably be better to just write an SPI sniffer peripheral and send the raw bytes over the wire (which at 20 MHz I think would be 5 MB/s,

05:05 <cyrozap> since that's just the MOSI+MISO data).

05:09 <cyrozap> Oh, and I forgot the other constraint: The timing and grouping of the SPI transactions is important here, so if I were to do that, I'd have to find some way to add a timestamp to each burst (or something).

06:38 hitomi2504 has joined ##openfpga

06:44 OmniMancer has joined ##openfpga

06:57 emeb_mac has quit [Quit: Leaving.]

07:14 mossmann has quit [Ping timeout: 258 seconds]

07:21 unkraut has quit [Remote host closed the connection]

07:21 mossmann has joined ##openfpga

07:22 unkraut has joined ##openfpga

07:47 Asu has joined ##openfpga

09:38 gregdavill_ has joined ##openfpga

09:41 gregdavill has quit [Ping timeout: 256 seconds]

10:45 Asuu has joined ##openfpga

10:45 Asu has quit [Ping timeout: 256 seconds]

11:13 Bike has joined ##openfpga

11:38 q3k has quit [Ping timeout: 260 seconds]

11:40 q3k has joined ##openfpga

11:58 nickjohnson has quit [Ping timeout: 272 seconds]

11:58 nickjohnson has joined ##openfpga

12:46 Asuu has quit [Read error: Connection reset by peer]

12:47 Asu has joined ##openfpga

13:07 <azonenberg> cyrozap: i mean of course rle is less good compression than delate

13:07 <azonenberg> what i wonder more about is gate count and timing performance

13:08 <azonenberg> how much bigger/slower is deflate?

13:08 <azonenberg> the big advantage of rle is simplicity, not compression rate

13:08 * whitequark read that as "rle is less good compression than delete"

13:22 _whitelogger has joined ##openfpga

13:26 gregdavill_ has quit [Quit: Leaving]

13:26 <Hoernchen> cyrozap, beaglelogic? huge buffer depth due to system ram...

13:27 Degi has quit [*.net *.split]

13:27 qu1j0t3 has quit [*.net *.split]

13:27 tlwoerner has quit [*.net *.split]

13:27 Sellerie has quit [*.net *.split]

13:27 ZipCPU has quit [*.net *.split]

13:27 somlo has quit [*.net *.split]

13:27 finsternis has quit [*.net *.split]

13:27 kiboneu has quit [*.net *.split]

13:27 _franck_ has quit [*.net *.split]

13:27 Finde has quit [*.net *.split]

13:27 wizzy has quit [*.net *.split]

13:28 Degi has joined ##openfpga

13:28 tlwoerner has joined ##openfpga

13:28 qu1j0t3 has joined ##openfpga

13:28 Finde has joined ##openfpga

13:28 kiboneu has joined ##openfpga

13:28 Sellerie has joined ##openfpga

13:28 ZipCPU has joined ##openfpga

13:28 finsternis has joined ##openfpga

13:28 wizzy has joined ##openfpga

13:28 somlo has joined ##openfpga

13:28 _franck_ has joined ##openfpga

13:34 <TD-Linux> the big limitation of anything more complex than rle is usually the entropy coder on a fpga

13:35 <TD-Linux> outputting more than one symbol per clock gets very difficult

13:35 <whitequark> is it fundamentally difficult or just the limitation of our tooling

13:36 <TD-Linux> fundamentally difficult. each symbol depends on the state of the entropy coder after the previous one

13:36 <whitequark> right but can you pipeline that

13:37 <tnt> well no ...

13:38 <tnt> if the next state depends on the prev state + next input, you can't just freely pipeline.

13:38 <TD-Linux> it is a latency limitation. you can't compute the next entropy coder state until the previous one is computed

13:40 <TD-Linux> for a decoder, there is a trick you can do by computing all of the possibilities for the second/future symbols and then selecting the proper one

13:40 <whitequark> what if you update the state once per n symbols

13:40 <whitequark> it would be a custom algorithm

13:40 <whitequark> but just... stuff the pipeline depth into the header or something

13:40 <tnt> TD-Linux: that sounds like it would get expensive very quickly :p

13:41 <TD-Linux> tnt, yeah, I know it is done on certain shipped hardware but only one into the future

13:41 <TD-Linux> whitequark, you can do that, it's basically multiple parallel entropy coders

13:42 <whitequark> aha

13:43 <TD-Linux> but you want the "once per n" to be relatively high because there's overhead per each output block

13:44 <TD-Linux> because the output is variable length, if you want it to be decodable in parallel, you have to encode start positions for all of the output blocks
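
A software sketch of the block idea above: split the input into independent blocks, compress each one separately, and store every block's length up front so a decoder can locate all of them without decoding in sequence. Using `zlib` as the per-block coder and this particular header layout are assumptions for illustration, not an existing format.

```python
import struct
import zlib


def encode_blocks(data: bytes, n_blocks: int) -> bytes:
    """Compress n_blocks independent chunks; header = block count + one length per block."""
    size = -(-len(data) // n_blocks)  # ceiling division
    chunks = [data[i * size:(i + 1) * size] for i in range(n_blocks)]
    payloads = [zlib.compress(chunk) for chunk in chunks]
    header = struct.pack("<I", n_blocks)
    header += b"".join(struct.pack("<I", len(p)) for p in payloads)
    return header + b"".join(payloads)


def decode_blocks(blob: bytes) -> bytes:
    """Each block could go to its own decoder; they are handled in a loop here."""
    (n_blocks,) = struct.unpack_from("<I", blob, 0)
    lengths = struct.unpack_from(f"<{n_blocks}I", blob, 4)
    pos = 4 + 4 * n_blocks
    out = []
    for length in lengths:
        out.append(zlib.decompress(blob[pos:pos + length]))
        pos += length
    return b"".join(out)


data = bytes(range(256)) * 10
blob = encode_blocks(data, 4)
print(len(data), len(blob))
```

The header costs 4 bytes per block in this layout, and each block also restarts its coder from scratch, which is the per-block overhead that argues for keeping blocks large.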

13:46 <TD-Linux> (oh also, in the case of huffman, the state is just a bit pointer so in that special case you can just concatenate the symbols to do multiple per clock)
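
A sketch of the Huffman special case above. The only coder state is the bit pointer, so a group of symbols can be placed in one step using the running sum of their code lengths. The code table and function names are hypothetical.

```python
# Hypothetical prefix code: symbol -> (bit pattern, bit length).
CODE = {"a": (0b0, 1), "b": (0b10, 2), "c": (0b110, 3), "d": (0b111, 3)}


def pack_group(symbols):
    """Concatenate the codes of a group of symbols into one word.

    Each code's position depends only on the lengths of the codes before it,
    which in hardware would be a small prefix sum followed by shifts and ORs.
    """
    codes = [CODE[s] for s in symbols]
    total = sum(length for _, length in codes)
    word = 0
    offset = 0
    for code, length in codes:
        offset += length
        word |= code << (total - offset)
    return word, total


def encode(symbols, per_cycle=4):
    """Emit per_cycle symbols per step; returns (bits as int, bit pointer)."""
    bits, ptr = 0, 0
    for i in range(0, len(symbols), per_cycle):
        word, n = pack_group(symbols[i:i + per_cycle])
        bits = (bits << n) | word
        ptr += n
    return bits, ptr


bits, ptr = encode("abcd")
print(f"{bits:0{ptr}b}")
```

Encoding the same symbols one at a time or four at a time yields the same bitstream, which is what makes the grouping safe.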

13:47 cr1901_modern has quit [Read error: Connection reset by peer]

13:48 <TD-Linux> one other complication is in the case of video coders, the context model features adaptive probabilities. so every time you code a symbol, the probabilities adapt to make it more likely to code (and take fewer bits). the decoder model runs in lockstep

13:50 cr1901_modern has joined ##openfpga

13:52 <TD-Linux> CABAC, used in H.264 and H.265, is the pessimal example. every symbol is binary (one bit), and each bit causes a probability update

13:52 Asu has quit [Remote host closed the connection]

13:52 Asu has joined ##openfpga

13:55 <TD-Linux> AV1 instead uses variable symbol sizes, up to 16 (4 bits), so it's up to 4 times as fast. also AV1 has tiles which are basically the break point for the parallel entropy coders

13:58 <tnt> Ah, never dug much into CABAC, but I did make a MQ decoder for jpeg2k for V4/V5 a long time ago.

13:59 <tnt> Same thing with probabilities updated at each step.

14:09 genii has joined ##openfpga

14:30 cr1901_modern1 has joined ##openfpga

14:31 <tpw_rules> TD-Linux: have you heard of finite state entropy?

14:32 <tpw_rules> i want to try and put it on an FPGA. but it's trivially pipelineable because iirc you don't have to deal with encoding start positions and demultiplexing the streams

14:32 <tpw_rules> you can just round robin symbols on both the encoder and decoder. also it doesn't require divides or anything messy

14:33 <tpw_rules> the RAD people did a GPU parallelized version called BitKnit

14:33 cr1901_modern has quit [Ping timeout: 256 seconds]

14:36 <tpw_rules> the only weird kink is that it's LIFO: you have to decode in the reverse order you encode. usually they do encoding backwards, so you have to be able to buffer an entire chunk (and build the tables if you're doing that) before you can read it backwards to encode it

14:43 cr1901_modern1 has quit [Quit: Leaving.]

14:44 cr1901_modern has joined ##openfpga

14:44 <TD-Linux> tpw_rules, encoding ANS actually super sucks on a FPGA because you have to buffer all of the (uncompressed) symbols and then encode them backwards

14:45 <TD-Linux> which is the main reason AV1 doesn't use it

14:45 <TD-Linux> but if you can tolerate that limitation it's pretty good
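
A toy sketch of the LIFO behaviour discussed above, using range-variant ANS with an unbounded Python integer as the state. There is no renormalization and no tables, so this is not FSE/tANS or any shipped coder, and it uses the divide that the tabled version avoids. The frequency table is hypothetical.

```python
# Hypothetical symbol frequencies; TOTAL is their sum.
FREQ = {"a": 5, "b": 2, "c": 1}


def build_cumulative(freq):
    """Start of each symbol's slot range, plus the total frequency."""
    cumulative, total = {}, 0
    for symbol, f in freq.items():
        cumulative[symbol] = total
        total += f
    return cumulative, total


CUM, TOTAL = build_cumulative(FREQ)


def rans_encode(symbols):
    """Encode in reverse so that decoding produces the symbols in forward order.

    The whole message must be buffered before the first symbol can be encoded.
    """
    x = 0
    for s in reversed(symbols):
        f = FREQ[s]
        x = (x // f) * TOTAL + CUM[s] + (x % f)
    return x


def rans_decode(x, count):
    """Pop count symbols off the state, last-encoded first."""
    out = []
    for _ in range(count):
        slot = x % TOTAL
        s = next(sym for sym in FREQ if CUM[sym] <= slot < CUM[sym] + FREQ[sym])
        x = FREQ[s] * (x // TOTAL) + slot - CUM[s]
        out.append(s)
    return out


message = list("abacabaa")
state = rans_encode(message)
print(state, "".join(rans_decode(state, len(message))))
```

The `reversed(symbols)` in the encoder is the buffering requirement in miniature: nothing can be emitted until the last input symbol is known.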

14:46 Asuu has joined ##openfpga

14:46 <tpw_rules> ah ok. yeah i was thinking it was a good opportunity to build the tables for the tabled version and not a problem if you had lots of DRAM. i have an application in mind but i'm not sure if i'll actually need it

14:46 Asu has quit [Ping timeout: 272 seconds]

14:48 <TD-Linux> and yeah you need the tabled version if you don't want the divide

14:49 <TD-Linux> luckily fpgas are pretty good at tables

14:58 <azonenberg> whitequark: so i started the scopehal cleanup

14:58 <azonenberg> first step was merging scopehal-cmake into scopehal-apps, which appears to have successfully completed

14:59 <azonenberg> So now azonenberg/scopehal-apps is the top level repo for the project and includes the build system and code for glscopeclient and some other utilities

14:59 <azonenberg> and has submodules for azonenberg/scopehal (library code) and azonenberg/scopehal-docs (documentation)

14:59 <azonenberg> i may merge those as well at some point

15:02 <whitequark> azonenberg: excellent

15:04 <whitequark> ah, another question

15:04 <whitequark> can you stuff ffts as a submodule? it already uses cmake anyway

15:04 <azonenberg> File a ticket against scopehal-apps and i'll look into it

15:04 <whitequark> thanks

15:05 <azonenberg> i may have to make a secondary build system around it or something in the parent repo because cmake has some global config and importing a top level CMakeLists.txt via add_subdirectory doesn't always work well

15:05 <whitequark> wait, who is github.com/antikerneldev?

15:05 <azonenberg> monochroma

15:05 <whitequark> ah

15:06 <azonenberg> She originally made the account for working on antikernel stuff back in the day and the name stuck

15:06 <whitequark> usually i add subprojects with add_subdirectory(dir EXCLUDE_FROM_ALL)

15:08 <hell__> azonenberg: thanks for reorganizing the scopehal repos! one last detail that would help is to pin scopehal-apps instead of scopehal in your profile (that's how I went from `scopehal-apps` to `scopehal` without noticing `scopehal-cmake`)

15:09 <azonenberg> hell__: will do, i missed that one

15:09 <hell__> or, if moar merging is to be done, I can wait

15:10 <azonenberg> I'm not sure yet

15:10 <azonenberg> merging scopehal-cmake with scopehal would be a much more major undertaking

15:10 <whitequark> the doc is just for glscopeclient, right?

15:10 <azonenberg> scopehal-apps with*

15:10 <azonenberg> whitequark: at the moment yes, but there will be developer documentation eventually

15:10 <whitequark> ah hm

15:10 <whitequark> in the same document?

15:10 <azonenberg> no

15:10 <azonenberg> but the same repo

15:11 <azonenberg> the repo will eventually have multiple documents in it

15:11 <azonenberg> maybe even appnotes etc

15:11 <whitequark> right, ok

15:11 <azonenberg> guides on using the library

15:11 <whitequark> i have no strong opinion on that

15:11 <azonenberg> My plan is to give it a week or two at least, and see how things shake out

15:11 <whitequark> other than "pdfs are inaccessible" but you already know that

15:11 <azonenberg> That's because i havent actually set up a proper host for them yet

15:11 <whitequark> no i mean things like

15:11 <whitequark> you can't link to a section in a pdf

15:12 * hell__ looks at their own github profile

15:12 <azonenberg> pdf.js i think supports that?

15:12 <azonenberg> though not everyone uses firefox

15:12 <whitequark> i have no idea how to do that; if your doc is in html you just right click

15:12 <whitequark> the other reason is that contributing to latex docs is pure suffering

15:13 <whitequark> there have been multiple times when i thought about describing something in the yosys manual, remembering it's latex, deciding i can just not do that

15:13 <TAL> (for pdf.js and chrome/chromium's pdf reader add #page=<pagenum>)

15:13 <whitequark> TAL: that's not the same thing at all

15:13 <TAL> ah you mean actual sections?

15:13 <whitequark> yes

15:14 <whitequark> linking to pages is stupid, it's like linking to lines of code on `master`

15:14 <whitequark> it just means the link rots in a week

15:14 <TAL> ah, nvm. yeah, sorry

15:25 emeb has joined ##openfpga

16:06 hitomi2504 has quit [Quit: Nettalk6 - www.ntalk.de]

16:47 <mithro> whitequark - More updates to quicklogic instructions at https://docs.google.com/document/d/1kPf6cIGRtvCxJHdimlbgvyMmzVkCHH-XPnbz48h2jAg/edit?ts=5f0c077d# -- still a bunch of open questions....

16:53 OmniMancer has quit [Quit: Leaving.]

20:26 m4ssi has joined ##openfpga

20:43 emeb_mac has joined ##openfpga

20:58 m4ssi has quit [Remote host closed the connection]

21:21 Asuu has quit [Remote host closed the connection]

21:55 emeb_mac has quit [Quit: Leaving.]

22:03 lexano has quit [Read error: Connection reset by peer]

22:03 <anuejn> azonenberg: now you just need a more cool name than "scopehal-apps" ;)

22:03 <anuejn> and maybe even a more cool name for glscopeclient

22:04 lexano has joined ##openfpga

22:14 X-Scale` has joined ##openfpga

22:15 X-Scale has quit [Ping timeout: 258 seconds]

22:16 X-Scale` is now known as X-Scale

22:36 bzztploink has joined ##openfpga

22:43 genii has quit [Quit: See you soon.]

22:59 <emeb> Got the davidthings USB Serial device port working on an Orange Crab with a 6502 soft core and 9600bps serial port. https://github.com/emeb/orangecrab_simple_6502

23:55 emeb has quit [Quit: Leaving.]

23:56 <awygle> i wonder what the difference is between W9751G6KB-25 and W9751G6NB-25

23:58 emeb_mac has joined ##openfpga