azonenberg changed the topic of #scopehal to: libscopehal, libscopeprotocols, and glscopeclient development and testing | Online hackathon December 19th all day | https://github.com/azonenberg/scopehal-apps | Logs: https://freenode.irclog.whitequark.org/scopehal
<_whitenotifier> [scopehal] CyberpunkDre forked the repository - https://git.io/JI633
Degi has quit [Ping timeout: 256 seconds]
Degi has joined #scopehal
<Cyber_Dre> OpenCL is hard to understand. I just made it through the install/build process on Linux and had hiccups with OpenCL and Catch2
<Cyber_Dre> Mind you, I have an AMD R3 2200G on Ubuntu 20.04.1, and I'm not sure if AMD supports OpenCL on this OS version
<azonenberg> Cyber_Dre: No idea. The CL support in glscopeclient is still quite experimental, so I'm not 100% sure I will keep it as is
<Cyber_Dre> Also, I was sure I built/installed Catch2 following their GitHub instructions, and I see it in my /usr/local/, but I still have to disable it with that flag during cmake
<azonenberg> vs moving to compute shaders for waveform processing or something else
<Cyber_Dre> I followed this link for the OpenCL installation on AMD, but their driver website does not list Ubuntu 20.04 and I did have a failure installing one of the downloaded packages :/ https://linuxconfig.org/install-opencl-for-the-amdgpu-open-source-drivers-on-debian-and-ubuntu
<azonenberg> That is ironic because amd supports opencl as their tier 1 GPU compute target
<azonenberg> meanwhile i'm on nvidia, who is trying to bury opencl in favor of their proprietary CUDA API
<azonenberg> and just by having their blob drivers installed it worked out of the box
<azonenberg> i didn't have to install anything
<Cyber_Dre> Artifact of the company size and focus, AMD doesn't have the programmer resources of Nvidia or Intel (Shoutout to the months of green/black screens on 5700XT from drivers/hardware). Even the Intel OpenCL support looks better on Linux.
<Cyber_Dre> How CPU intensive does this become? Probably varies with scopes and filters setup?
<azonenberg> Yeah, it's totally dependent on the filter setup. Also, if you have a slower scope you can tolerate much more processing time without losing performance
<azonenberg> I'm optimizing for the use case of deep waveforms on a LeCroy scope spitting out tens of waveforms per second
<azonenberg> So i want to be able to do complex processing pipelines in <100ms per waveform *total*
<azonenberg> which means you have single to low double digit ms per filter max
<azonenberg> (or a bit more because I multithread filter evaluation so that filters are evaluated simultaneously on different cores to the extent possible)
<azonenberg> Most filters are O(n) in waveform size
<azonenberg> however there are huge variations in the constant factor
<azonenberg> FFT is, I believe, O(n log n)
<azonenberg> so it gets slow as waveforms get big. Pushing FFT to GPU is one of my top priorities for performance
<Cyber_Dre> Makes sense
<azonenberg> And channel emulation / de-embedding filters use FFT under the hood
<azonenberg> so many complex serdes analysis setups will use it. often more than once
<azonenberg> also it looks like Debian has clFFT prepackaged
<azonenberg> so i'm going to experiment with that too
<azonenberg> As with OpenCL in general support will be detected at both compile and run time and only enabled if available
<azonenberg> so it should fall back gracefully
<sorear> I take it "FFT using GL 4.5 compute shaders which you already depend on" is not an option
<azonenberg> I only actually depend on GL 4.2, I think
<azonenberg> I found an iffy-looking implementation of FFT in them
<azonenberg> clFFT looks much more usable
<_whitenotifier> [scopehal-docs] CyberpunkDre opened pull request #23: Update Windows cmake documentation for ffts include - https://git.io/JLzyR
<Cyber_Dre> Hmm, the Windows build/install was much smoother, apart from the documentation change (see pull request above)
<Cyber_Dre> I also don't see the OpenCL-not-supported message on start, so I'm assuming that worked on my Windows machine (AMD R6 3600 + 5700XT), which is nice. I have older Nvidia GPUs I can slot into my Linux machine to see if OpenCL is easier with those
<azonenberg> Did you run with --debug?
<azonenberg> i'm not sure how much of the CL present/missing stuff is printed by default
<azonenberg> i generally try to keep stdout spam low during normal operation
<azonenberg> As a developer/early adopter you should have verbosity up to --debug level by default
<azonenberg> also huh i swore we had fixed that already
<_whitenotifier> [scopehal-docs] azonenberg pushed 1 commit to master [+0/-0/±1] https://git.io/JLzS4
<_whitenotifier> [scopehal-docs] azonenberg f6d6f74 - Added libclfft-dev to suggested dependencies on Debian
<_whitenotifier> [scopehal-docs] azonenberg closed pull request #23: Update Windows cmake documentation for ffts include - https://git.io/JLzyR
<_whitenotifier> [scopehal-docs] azonenberg pushed 2 commits to master [+0/-0/±2] https://git.io/JLzSz
<_whitenotifier> [scopehal-docs] CyberpunkDre cb7bc28 - Update Windows cmake documentation for ffts include
<_whitenotifier> [scopehal-docs] azonenberg 628612a - Merge pull request #23 from CyberpunkDre/master Update Windows cmake documentation for ffts include
<azonenberg> as far as nvidia + linux, please try that
<azonenberg> the GPU acceleration support is about 12 hours old right now so i fully expect there will be quirks :p
electronic_eel_ has joined #scopehal
electronic_eel has quit [Ping timeout: 268 seconds]
<Cyber_Dre> I just ran with --debug and only extra message was "Detecting CPU features... \ * AVX2"
<Cyber_Dre> Also opened up Task Manager while running the demo/null and I see the GPU active as I add more waveforms, I think it's working pretty well here
<Cyber_Dre> Oh fun, you can definitely see the activity after play/pause
<Cyber_Dre> I also got a message about OpenGL compatibility: mine is at 4.2 and does not support GL_ARB_gpu_shader_int64
<azonenberg> Rendering uses compute shaders
<azonenberg> So you will definitely see that
<azonenberg> Right now the only thing actually using opencl acceleration is the FIR filter
<azonenberg> And yes, GL_ARB_gpu_shader_int64 used to be a hard requirement
<azonenberg> i now have fallback bignum code for cards without it
<azonenberg> If you don't see "detecting OpenCL devices" you either don't have opencl or you're running an old version of the code
<azonenberg> did you pull latest? and make sure to get submodules too?
<azonenberg> fetch.recurseSubmodules is not default for some inane reason in gitconfig
<Cyber_Dre> Ah yes, I had submodules but not the latest code. I'm now seeing "Looking for CL_VERSION_*" come back not found during the cmake process
<azonenberg> That should set HAVE_OPENCL false and everything should compile correctly
<azonenberg> and you should see opencl not found during startup
<Cyber_Dre> One rebuild later, and I do indeed see the message that OpenCL support was not present at compile time
<azonenberg> soooo i just did an initial test of clFFT
<azonenberg> got different results than i was getting with FFTS
<azonenberg> upon closer inspection, my ffts code is normalizing it wrong :p
<azonenberg> and clFFT is giving correct peak amplitudes
<azonenberg> I don't know where the error is creeping in yet
<azonenberg> but FFTS is giving me about -5 dBm for a 560 mV p-p 1 GHz tone
<azonenberg> which should actually be closer to -1 dBm
<azonenberg> clFFT gives me the expected result
<azonenberg> i guess i never bothered to actually check that the peak amplitudes were right because the shape of the spectrum was plausible
<azonenberg> there's a scaling error somehow
<azonenberg> That could explain some of my jitter spectrum issues too. They're almost certainly affected by the same bug
<azonenberg> So i guess now the question is what FFTS is doing to the output scale wise...
<azonenberg> Seems like there is a constant scale error of 1.5 with FFTS?
<azonenberg> the peak heights before normalization are almost exactly 2/3 of what they should be
<azonenberg> it doesn't break de-embeds because the forward and reverse FFTs have opposite magnitude errors and cancel out
<azonenberg> but the spectrum views are affected
<azonenberg> Derp, it's not 1/1.5. I bet it's 1/sqrt(2)
Degi has quit [Ping timeout: 265 seconds]
Degi has joined #scopehal
<azonenberg> oh wait
<azonenberg> i'm being stupid, i'm copying from the wrong buffer and not applying the window function
<azonenberg> of course you lose amplitude from that
Cyber_Dre has quit [Quit: Dre has quit the channel]
_whitelogger has joined #scopehal
<_whitenotifier> [scopehal-apps] azonenberg closed issue #300: December 19th hackathon meta-issue - https://git.io/JIjI7
_whitelogger has joined #scopehal
<_whitenotifier> [scopehal] azonenberg pushed 5 commits to master [+2/-0/±16] https://git.io/JLg4V
<_whitenotifier> [scopehal] azonenberg 1104744 - Initial OpenCL support for FFTFilter. Lots of software pre/post computation, only 25% faster than FFTs so far on 800K points.
<_whitenotifier> [scopehal] azonenberg 2dfb471 - Refactoring of FFT filter in preparation for applying window function in OpenCL
<_whitenotifier> [scopehal] azonenberg bfcfe79 - FFTFilter: now apply window function on GPU when possible
<_whitenotifier> [scopehal] ... and 2 more commits.
<_whitenotifier> [scopehal-apps] azonenberg pushed 1 commit to master [+0/-0/±4] https://git.io/JLg41
<_whitenotifier> [scopehal-apps] azonenberg 12d1e50 - Added clFFT detection and global cleanup. Updated submodules.
<azonenberg> Ok, i think that's pretty decent
<azonenberg> 800K point FFT including windowing and normalization: 15.82 -> 6.1 ms
<azonenberg> scalar input to log-magnitude scaled to dBm
<_whitenotifier> [scopehal] azonenberg commented on issue #200: Explore using https://github.com/Themaister/GLFFT for FFTs instead of FFTS - https://git.io/JLgBM
<_whitenotifier> [scopehal] azonenberg closed issue #200: Explore using https://github.com/Themaister/GLFFT for FFTs instead of FFTS - https://git.io/JJz6h
<azonenberg> actually sorry 800K input points
<azonenberg> windowed and zero padded then a 1M point FFT
electronic_eel_ has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
electronic_eel has joined #scopehal
juli966 has joined #scopehal
_whitelogger has joined #scopehal
<_whitenotifier> [scopehal] azonenberg pushed 2 commits to master [+0/-0/±5] https://git.io/JLgh1
<_whitenotifier> [scopehal] azonenberg 1cfcbce - DeEmbedFilter: initial prep for clFFT
<_whitenotifier> [scopehal] azonenberg 78c1e0e - DeEmbedFilter: Now using clFFT when possible instead of FFTS. 2x speedup vs baseline, but lots of time wasted copying data back and forth since de-embed loop is CPU-side.
pd0wm has left #scopehal [#scopehal]
juli966 has quit [Quit: Nettalk6 - www.ntalk.de]
<_whitenotifier> [scopehal] miek opened pull request #389: AgilentOscilloscope: support Nth edge burst trigger - https://git.io/JL2ii
electronic_eel has quit [Ping timeout: 246 seconds]
electronic_eel has joined #scopehal
electronic_eel has quit [Ping timeout: 260 seconds]
electronic_eel has joined #scopehal
<d1b2> <atx> If only I had infinite free time...
maartenBE has quit [Ping timeout: 246 seconds]
maartenBE has joined #scopehal
<d1b2> <atx> > reading one parameter word (basically accessible and for direct access to memory) requires a dozen calls, including calling the module via a text name with a table search.
<d1b2> <atx> In general, the thread contains some insights into the SCPI processing on the Rigols
<monochroma> sounded like my theory might be correct as well, that the CPU (and thus the ethernet interface) might not have fast/full access to the waveform memory
<d1b2> <atx> Indeed, the waveform buffer likely lives in the FPGA
<d1b2> <atx> And it's unlikely that it double buffers, hence the requirement for entering stop mode before read out
<d1b2> <atx> > The specified subroutine searches for the desired module (by direct search through the table via strcmp!!!), creates a special block of query, puts it in the queue for processing and falls asleep.
<d1b2> <atx> This just does not spark joy
<d1b2> <atx> aaand I got buffer overflow
juli966 has joined #scopehal
<lain> XD
<d1b2> <theorbtwo> IIRC from my disassembly of that stuff, ages ago, the FPGA actually DMAs the drawn waveform to an area of memory that the CPU composites into the framebuffer directly. (Or, possibly, the FPGA is memory-mapped there.)