#scopehal on 2020-12-20 — irc logs at freenode.irclog.whitequark.org

2020-12-13 22:53 azonenberg changed the topic of #scopehal to: libscopehal, libscopeprotocols, and glscopeclient development and testing | Online hackathon December 19th all day | https://github.com/azonenberg/scopehal-apps | Logs: https://freenode.irclog.whitequark.org/scopehal

00:28 <_whitenotifier> [scopehal] CyberpunkDre forked the repository - https://git.io/JI633

01:04 Degi has quit [Ping timeout: 256 seconds]

01:05 Degi has joined #scopehal

02:14 <Cyber_Dre> OpenCL is hard to understand. I just made it through install/build process on Linux and had hiccups with OpenCL and Catch2

02:15 <Cyber_Dre> Mind you I have an AMD R3 2200G on Ubuntu 20.04.1, and not sure if OpenCL is supported by AMD for this OS version

02:16 <azonenberg> Cyber_Dre: No idea. The CL support in glscopeclient is still quite experimental so i'm not 100% sure I will keep it as is

02:16 <Cyber_Dre> Also was sure I built/installed Catch2 following their github instructions and I see it in my /usr/local/ but still have to disable with that flag during cmake

02:16 <azonenberg> vs moving to compute shaders for waveform processing or something else

02:17 <Cyber_Dre> I followed this link for the OpenCL installation on AMD but their driver website does not list Ubuntu 20.04 and I did have failure installing one of the packages downloaded :/ https://linuxconfig.org/install-opencl-for-the-amdgpu-open-source-drivers-on-debian-and-ubuntu

02:18 <azonenberg> That is ironic because amd supports opencl as their tier 1 GPU compute target

02:19 <azonenberg> meanwhile i'm on nvidia, who is trying to bury opencl in favor of their proprietary CUDA API

02:19 <azonenberg> and just by having their blob drivers installed it worked out of the box

02:19 <azonenberg> i didnt have to install anything

02:22 <Cyber_Dre> Artifact of the company size and focus, AMD doesn't have the programmer resources of Nvidia or Intel (Shoutout to the months of green/black screens on 5700XT from drivers/hardware). Even the Intel OpenCL support looks better on Linux.

02:22 <Cyber_Dre> How CPU intensive does this become? Probably varies with scopes and filters setup?

02:24 <azonenberg> Yeah it's totally dependent on the filter setup. Also if you have a slower scope you can tolerate much more processing time without losing performance

02:24 <azonenberg> I'm optimizing for the use case of deep waveforms on a LeCroy scope spitting out tens of waveforms per second

02:24 <azonenberg> So i want to be able to do complex processing pipelines in <100ms per waveform *total*

02:25 <azonenberg> which means you have single to low double digit ms per filter max

02:26 <azonenberg> (or a bit more because I multithread filter evaluation so that filters are evaluated simultaneously on different cores to the extent possible)

02:26 <azonenberg> Most filters are O(n) in waveform size

02:27 <azonenberg> however there are huge variations in the constant factor

02:27 <azonenberg> FFT is I believe O(n log n)

02:27 <azonenberg> so it gets slow as waveforms get big. Pushing FFT to GPU is one of my top priorities for performance

02:30 <Cyber_Dre> Makes sense

02:30 <azonenberg> And channel emulation / de-embedding filters use FFT under the hood

02:31 <azonenberg> so many complex serdes analysis setups will use it. often more than once

03:02 <azonenberg> also it looks like Debian has clFFT prepackaged

03:02 <azonenberg> so i'm going to experiment with that too

03:02 <azonenberg> As with OpenCL in general support will be detected at both compile and run time and only enabled if available

03:03 <azonenberg> so it should fall back gracefully

03:04 <sorear> I take it "FFT using GL 4.5 compute shaders which you already depend on" is not an option

03:05 <azonenberg> I only actually depend on gl 4.2 i think

03:06 <azonenberg> I found an iffy looking implementation of fft in them

03:06 <azonenberg> clFFT looks much more usable

03:24 <_whitenotifier> [scopehal-docs] CyberpunkDre opened pull request #23: Update Windows cmake documentation for ffts include - https://git.io/JLzyR

03:24 <Cyber_Dre> Hmm Windows build/install was much smoother beyond change to documentation (see pull request above)

03:26 <Cyber_Dre> I also don't see the OpenCL not supported message on start, so assuming that worked with my Windows (AMD R6 3600 + 5700XT) which is nice. I have older Nvidia GPUs I can slot into my Linux machine and see if the OpenCL is easier for that

03:28 <azonenberg> Did you run with --debug?

03:28 <azonenberg> i'm not sure how much of the CL present/missing stuff is printed by default

03:29 <azonenberg> i generally try to keep stdout spam low during normal operation

03:29 <azonenberg> As a developer/early adopter you should have verbosity up to --debug level by default

03:31 <azonenberg> also huh i swore we had fixed that already

03:31 <_whitenotifier> [scopehal-docs] azonenberg pushed 1 commit to master [+0/-0/±1] https://git.io/JLzS4

03:31 <_whitenotifier> [scopehal-docs] azonenberg f6d6f74 - Added libclfft-dev to suggested dependencies on Debian

03:31 <_whitenotifier> [scopehal-docs] azonenberg closed pull request #23: Update Windows cmake documentation for ffts include - https://git.io/JLzyR

03:31 <_whitenotifier> [scopehal-docs] azonenberg pushed 2 commits to master [+0/-0/±2] https://git.io/JLzSz

03:31 <_whitenotifier> [scopehal-docs] CyberpunkDre cb7bc28 - Update Windows cmake documentation for ffts include

03:31 <_whitenotifier> [scopehal-docs] azonenberg 628612a - Merge pull request #23 from CyberpunkDre/master Update Windows cmake documentation for ffts include

03:32 <azonenberg> as far as nvidia + linux, please try that

03:32 <azonenberg> the GPU acceleration support is about 12 hours old right now so i fully expect there will be quirks :p

03:42 electronic_eel_ has joined #scopehal

03:42 electronic_eel has quit [Ping timeout: 268 seconds]

03:44 <Cyber_Dre> I just ran with --debug and only extra message was "Detecting CPU features... \ * AVX2"

03:44 <Cyber_Dre> Also opened up Task Manager while running the demo/null and I see the GPU active as I add more waveforms, I think it's working pretty well here

03:46 <Cyber_Dre> Oh fun, you can definitely see the activity after play/pause

03:49 <Cyber_Dre> I also got message about the OpenGL compatibility, which is at 4.2 and does not support GL_ARB_gpu_shader_int64

03:49 <azonenberg> Rendering uses compute shaders

03:49 <azonenberg> So you will definitely see that

03:50 <azonenberg> Right now the only thing actually using opencl acceleration is the FIR filter

03:50 <azonenberg> And yes, GL_ARB_gpu_shader_int64 used to be a hard requirement

03:50 <azonenberg> i now have fallback bignum code for cards without it

03:51 <azonenberg> If you don't see "detecting OpenCL devices" you either don't have opencl or you're running an old version of the code

03:51 <azonenberg> did you pull latest? and make sure to get submodules too?

03:51 <azonenberg> fetch.recurseSubmodules is not default for some inane reason in gitconfig

03:58 <Cyber_Dre> Ah yes, I had submodules but not latest, I'm now seeing the Looking for CL_VERSION_* not found during the cmake process

03:58 <azonenberg> That should set HAVE_OPENCL false and everything should compile correctly

03:58 <azonenberg> and you should see opencl not found during startup

04:11 <Cyber_Dre> 1 rebuild later and I indeed have found the OpenCL support not present at compile message

04:14 <azonenberg> soooo i just did an initial test of clFFT

04:14 <azonenberg> got different results than i was getting with FFTS

04:14 <azonenberg> upon closer inspection, my ffts code is normalizing it wrong :p

04:15 <azonenberg> and clFFT is giving correct peak amplitudes

04:17 <azonenberg> I don't know where the error is creeping in yet

04:18 <azonenberg> but FFTS is giving me about -5 dBm for a 560 mV p-p 1 GHz tone

04:18 <azonenberg> Which is actually closer to -1 dBm

04:18 <azonenberg> clFFT gives me the expected result
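
The expected figure quoted above can be checked by hand. The following is a minimal sketch, assuming the tone is a pure sine wave measured into a 50 ohm load (the log does not state the impedance); the function name `sine_vpp_to_dbm` is made up for illustration and is not part of libscopehal.

```python
import math

def sine_vpp_to_dbm(vpp_volts, load_ohms=50.0):
    """Average power of a sine wave, given its peak-to-peak voltage, in dBm.

    Assumes a pure sine and a resistive load (50 ohms by default, which is
    an assumption and not something stated in the log).
    """
    v_peak = vpp_volts / 2.0              # peak-to-peak -> peak
    v_rms = v_peak / math.sqrt(2.0)       # peak -> RMS for a sine
    power_watts = v_rms ** 2 / load_ohms  # P = Vrms^2 / R
    return 10.0 * math.log10(power_watts / 1e-3)  # relative to 1 mW

expected_dbm = sine_vpp_to_dbm(0.560)
print(f"560 mV p-p into 50 ohms: {expected_dbm:.2f} dBm")
```

Under those assumptions the result lands close to -1 dBm, which agrees with the value azonenberg calls correct.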

04:19 <azonenberg> i guess i never bothered to actually check that the peak amplitudes were right because the shape of the spectrum was plausible

04:19 <azonenberg> there's a scaling error somehow

04:23 <azonenberg> That could explain some of my jitter spectrum issues too. They're almost certainly affected by the same bug

04:24 <azonenberg> So i guess now the question is what FFTS is doing to the output scale wise...

04:29 <azonenberg> Seems like there is a constant scale error of 1.5 with FFTS?

04:30 <azonenberg> the peak heights before normalization are almost exactly 2/3 of what they should be

04:33 <azonenberg> it doesn't break de-embeds because the forward and reverse FFTs have opposite magnitude errors and cancel out

04:33 <azonenberg> but the spectrum views are affected

04:38 <azonenberg> Derp it's not 1/1.5. I bet it's 1/sqrt(2)
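
For reference, the two candidate amplitude factors floated here are close together once expressed in dB. This is a small sketch of that conversion; the helper name `amplitude_ratio_to_db` is hypothetical, and it only converts the ratios, it does not diagnose the bug (which turns out later in the log to be something else).

```python
import math

def amplitude_ratio_to_db(ratio):
    """Convert a voltage (amplitude) ratio to decibels."""
    return 20.0 * math.log10(ratio)

two_thirds_db = amplitude_ratio_to_db(2.0 / 3.0)             # the "1/1.5" guess
inv_sqrt2_db = amplitude_ratio_to_db(1.0 / math.sqrt(2.0))   # the "1/sqrt(2)" guess

print(f"2/3       -> {two_thirds_db:.2f} dB")
print(f"1/sqrt(2) -> {inv_sqrt2_db:.2f} dB")
```

The two guesses differ by only about half a dB, so they would be hard to tell apart by eyeballing peak heights on a spectrum plot.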

05:02 Degi has quit [Ping timeout: 265 seconds]

05:03 Degi has joined #scopehal

05:05 <azonenberg> oh wait

05:05 <azonenberg> i'm being stupid, i'm copying from the wrong buffer and not applying the window function

05:05 <azonenberg> of course you lose amplitude from that
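
The log does not spell out the exact mechanism, but one general reason the window and the normalization have to match is the window's coherent gain: a window scales a tone's peak by the mean of its coefficients. A minimal sketch, assuming a Hann window (the log does not say which window the filter uses) and with hypothetical helper names:

```python
import math

def hann_window(n_points):
    """Periodic Hann window coefficients (an assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n_points)
            for i in range(n_points)]

def coherent_gain(window):
    """Mean of the window coefficients, i.e. the factor by which the
    window scales the peak of a bin-centered tone."""
    return sum(window) / len(window)

gain = coherent_gain(hann_window(1024))
loss_db = 20.0 * math.log10(gain)
print(f"Hann coherent gain: {gain:.3f} ({loss_db:.2f} dB)")
```

If the normalization step divides by this gain but the data it sees was never windowed (or the reverse), the reported peak amplitude comes out wrong even though the shape of the spectrum still looks plausible.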

05:48 Cyber_Dre has quit [Quit: Dre has quit the channel]

06:59 _whitelogger has joined #scopehal

07:06 <_whitenotifier> [scopehal-apps] azonenberg closed issue #300: December 19th hackathon meta-issue - https://git.io/JIjI7

07:23 _whitelogger has joined #scopehal

07:39 <_whitenotifier> [scopehal] azonenberg pushed 5 commits to master [+2/-0/±16] https://git.io/JLg4V

07:39 <_whitenotifier> [scopehal] azonenberg 1104744 - Initial OpenCL support for FFTFilter. Lots of software pre/post computation, only 25% faster than FFTs so far on 800K points.

07:39 <_whitenotifier> [scopehal] azonenberg 2dfb471 - Refactoring of FFT filter in preparation for applying window function in OpenCL

07:39 <_whitenotifier> [scopehal] azonenberg bfcfe79 - FFTFilter: now apply window function on GPU when possible

07:39 <_whitenotifier> [scopehal] ... and 2 more commits.

07:40 <_whitenotifier> [scopehal-apps] azonenberg pushed 1 commit to master [+0/-0/±4] https://git.io/JLg41

07:40 <_whitenotifier> [scopehal-apps] azonenberg 12d1e50 - Added clFFT detection and global cleanup. Updated submodules.

07:42 <azonenberg> Ok, i think that's pretty decent

07:42 <azonenberg> 800K point FFT including windowing and normalization: 15.82 -> 6.1 ms

07:42 <azonenberg> scalar input to log-magnitude scaled to dBm

07:47 <_whitenotifier> [scopehal] azonenberg commented on issue #200: Explore using https://github.com/Themaister/GLFFT for FFTs instead of FFTS - https://git.io/JLgBM

07:47 <_whitenotifier> [scopehal] azonenberg closed issue #200: Explore using https://github.com/Themaister/GLFFT for FFTs instead of FFTS - https://git.io/JJz6h

07:56 <azonenberg> actually sorry 800K input points

07:56 <azonenberg> windowed and zero padded then a 1M point FFT
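
The jump from 800K input points to a 1M point FFT is consistent with padding up to the next power of two. This is a sketch under that assumption (the log only says "1M point FFT"), reading "800K" as 800,000 samples; the helper names are illustrative and not taken from libscopehal.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n."""
    if n < 1:
        raise ValueError("n must be positive")
    return 1 << (n - 1).bit_length()

def zero_pad(samples, length):
    """Return a copy of samples extended with zeros up to length."""
    if length < len(samples):
        raise ValueError("length is shorter than the input")
    return list(samples) + [0.0] * (length - len(samples))

fft_len = next_power_of_two(800_000)
padded = zero_pad([1.0, 2.0, 3.0], next_power_of_two(3))

print(fft_len)   # 1048576, i.e. 2**20
print(padded)    # [1.0, 2.0, 3.0, 0.0]
```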

09:23 electronic_eel_ has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

09:23 electronic_eel has joined #scopehal

10:04 juli966 has joined #scopehal

10:53 _whitelogger has joined #scopehal

11:40 <_whitenotifier> [scopehal] azonenberg pushed 2 commits to master [+0/-0/±5] https://git.io/JLgh1

11:40 <_whitenotifier> [scopehal] azonenberg 1cfcbce - DeEmbedFilter: initial prep for clFFT

11:40 <_whitenotifier> [scopehal] azonenberg 78c1e0e - DeEmbedFilter: Now using clFFT when possible instead of FFTS. 2x speedup vs baseline, but lots of time wasted copying data back and forth since de-embed loop is CPU-side.

11:51 pd0wm has left #scopehal [#scopehal]

13:00 juli966 has quit [Quit: Nettalk6 - www.ntalk.de]

15:22 <_whitenotifier> [scopehal] miek opened pull request #389: AgilentOscilloscope: support Nth edge burst trigger - https://git.io/JL2ii

15:22 electronic_eel has quit [Ping timeout: 246 seconds]

15:23 electronic_eel has joined #scopehal

15:30 electronic_eel has quit [Ping timeout: 260 seconds]

15:31 electronic_eel has joined #scopehal

15:32 <d1b2> <atx> https://www.eevblog.com/forum/testgear/rigol-ds1000z-firmware-patch-plugins/msg1467137/#msg1467137

15:32 <d1b2> <atx> If only I had infinite free time...

15:55 maartenBE has quit [Ping timeout: 246 seconds]

15:58 maartenBE has joined #scopehal

16:05 <d1b2> <atx> https://www.eevblog.com/forum/testgear/rigol-ds1000z-firmware-patch-plugins/msg1472194/#msg1472194

16:05 <d1b2> <atx> > reading one parameter word (basically accessible and for direct access to memory) requires a dozen calls, including calling the module via a text name with a table search.

16:07 <d1b2> <atx> In general, the thread contains some insights into the SCPI processing on the Rigols

16:09 <monochroma> sounded like my theory might be correct as well, that the CPU (and thus the ethernet interface) might not have fast/full access to the waveform memory

16:10 <d1b2> <atx> Indeed, the waveform buffer likely lives in the FPGA

16:10 <d1b2> <atx> And it's unlikely that it double buffers, hence the requirement for entering stop mode before read out

16:11 <d1b2> <atx> > The specified subroutine searches for the desired module (by direct search through the table via strcmp!!!), creates a special block of query, puts it in the queue for processing and falls asleep.

16:12 <d1b2> <atx> This just does not spark joy

16:32 <d1b2> <atx> aaand I got buffer overflow

18:59 juli966 has joined #scopehal

22:19 <lain> XD

23:31 <d1b2> <theorbtwo> IIRC from my disassembly of that stuff, ages ago, the FPGA actually DMAs the drawn waveform to an area of memory that the CPU composites in to the framebuffer directly. (Or, possibly, the FPGA is memory-mapped there.)