azonenberg changed the topic of #scopehal to: libscopehal, libscopeprotocols, and glscopeclient development and testing | | Logs:
<_whitenotifier-3> [scopehal] fsedano opened pull request #436: Add ATT and OFFSET for RS RTM3000 -
fsedano43 has joined #scopehal
<fsedano43> Thoughts about adding timestamp to trace/debug output?
<azonenberg> fsedano43: hmmm
<azonenberg> i've generally tried to make those calls fairly lightweight, that would add a fair bit of overhead. Maybe as an option
<azonenberg> I'll have a look at your PR in a bit
Degi_ has joined #scopehal
<fsedano43> Tnx. I'm working on another PR to fix some dangerous things on the RS driver that also cause hangs on the scope
Degi has quit [Ping timeout: 240 seconds]
Degi_ is now known as Degi
<azonenberg> Great
<_whitenotifier-3> [scopehal] fsedano opened pull request #437: Avoid asking for data if not needed -
<GenTooMan> azonenberg, first, not really, as in it doesn't have to be slow; I have made code similar to that work, however I can't say it's EASY to do. On the other hand, your suggestion can be made to work. I'm not sure "my way" is the "best way" to be honest. :D
<_whitenotifier-3> [scopehal] fsedano edited pull request #437: Avoid asking for data if not needed on RS driver -
<GenTooMan> azonenberg, I think the best course of action (IE instead of "fixing" code) I will toss //FIXME - <COMMENT> for each function I find causing issues with the SDS1 scope I have first
<GenTooMan> @mubes I found 4 functions in the SiglentSCPIOscilloscope module that time out on the SDS1104X-E. They are (in module order) GetChannelDisplayName, PollTrigger, GetChannelVoltageRange, and PullTrigger. Would it be a good idea to commit the code I have that ends up with a "Halt Conditions" window?
<_whitenotifier-3> [scopehal] azonenberg pushed 2 commits to master [+0/-0/±4]
<_whitenotifier-3> [scopehal] fsedano f96efc0 - Add ATT and OFFSET for RS
<_whitenotifier-3> [scopehal] azonenberg 540e779 - Merge pull request #436 from fsedano/add_att_offset Add ATT and OFFSET for RS RTM3000
<_whitenotifier-3> [scopehal] azonenberg closed pull request #436: Add ATT and OFFSET for RS RTM3000 -
<_whitenotifier-3> [scopehal] azonenberg pushed 2 commits to master [+0/-0/±2]
<_whitenotifier-3> [scopehal] fsedano 6cdb09d - Avoid asking for data if not needed
<_whitenotifier-3> [scopehal] azonenberg abbf63c - Merge pull request #437 from fsedano/fix_data Avoid asking for data if not needed on RS driver
<_whitenotifier-3> [scopehal] azonenberg closed pull request #437: Avoid asking for data if not needed on RS driver -
<_whitenotifier-3> [scopehal] azonenberg pushed 2 commits to master [+0/-0/±29]
<_whitenotifier-3> [scopehal] azonenberg 2612042 - Added GetAvailableCouplings() API to Oscilloscope class. Fixes #67.
<_whitenotifier-3> [scopehal] azonenberg 256252f - Merge branch 'master' of
<_whitenotifier-3> [scopehal] azonenberg closed issue #67: Add API for querying supported input coupling modes -
<_whitenotifier-3> [scopehal] azonenberg pushed 1 commit to master [+0/-0/±2]
<_whitenotifier-3> [scopehal] azonenberg df8c160 - Added OscilloscopeChannel::GetAvailableCouplings()
<_whitenotifier-3> [scopehal-apps] azonenberg pushed 1 commit to master [+0/-0/±4]
<_whitenotifier-3> [scopehal-apps] azonenberg d2e017f - Added 50 ohm AC coupling option to context menu for supported scopes. Coupling menu now hides items which don't apply to the active instrument.
<d1b2> <mubes> @GenTooMan a couple of those functions are pretty fundamental :-) you can see what the semantics are, so I would suggest you guard using the typeid of the scope and figure out new incantations to get the behaviour you want. Personally I'd try it in a telnet session or similar first as that saves you compile cycles etc. If you want to put up a PR for comment feel free to do it against my repository at if you like.
<d1b2> There's no problem with doing it against azonenberg's repository, but I'm trying to figure out a scalable approach... come the happy day that the project has a hundred contributors, PRing development work against the main repository isn't going to work too well :-)
Tost has joined #scopehal
<azonenberg> Also hmmmmm
<azonenberg> I'm beginning to wonder about the possibility of rewriting the renderer using OpenCL instead of compute shaders at some point
<monochroma> azonenberg: oh?
<azonenberg> So the immediate use case is my work laptop
<azonenberg> Where i have everything rendered on the discrete intel card
<azonenberg> but there's an nvidia card too
<azonenberg> i could plausibly use that as a headless compute accelerator w/o using it for graphics
<azonenberg> But if we had a software opencl backend it could allow running in VMs
<azonenberg> might work on mac too
<azonenberg> Thinking a bit more, it would only be a kloc or so of stuff that had to be rewritten
<monochroma> oooo that would be nice
<d1b2> <mubes> It opens the possibility of a web based front end in future too?
<azonenberg> The opencl stuff needs a bit of fixup before i can consider that though
<azonenberg> So basically we would end up with a fully OpenCL based data processing and waveform rendering backend
<azonenberg> which produced bitmaps that a very thin OpenGL compositor would combine with the software rendered overlays and display
<d1b2> <mubes> yum yum
<d1b2> <mubes> Architecturally separating processing from display makes a huge amount of sense, and also lets developers play to their strengths (I'm allergic to UIs, for example).
<azonenberg> Well we have that already for the most part
<azonenberg> in terms of glscopeclient vs libscopehal
<azonenberg> The issue is more, right now the rendering of waveforms is done in compute shaders which have proved to be a big headache
<azonenberg> Not that opencl isnt
<sorear> when I last looked into the matter a few years ago WebCL was DOA and the browser people wanted to do opengl 4 compute shaders exclusively instead
<azonenberg> Web stuff is not even remotely on my radar for glscopeclient
<azonenberg> i'm thinking about desktop/laptop use cases here
<azonenberg> if it turns out that opencl, even if software based, can be made to work on mac or VM platforms
<azonenberg> that's a strong argument in favor of moving waveform rendering over to that
<d1b2> <mubes> Is this a pre-release item?
<azonenberg> converting to opencl? this is a pre v1.0 item if we decide to do it at all
<azonenberg> i would not say it's a priority for v0.1 although if i find the time to play with it, i might try
<azonenberg> Not even going to make a ticket just yet. but i did want to make some other improvements to the rendering logic in that area and it might be nicer to do in CL than in shaders
<azonenberg> Before i think about any of that i need to fix some other CL issues
<azonenberg> in particular, really large opencl fft's seem to hit limitations of some sort and fail to run
<azonenberg> and i need to fall back to software compute in those cases rather than aborting or giving garbage results etc
<azonenberg> also if we're going to use it for rendering rather than just compute acceleration, the noopencl switch has to be removed as it will now be a mandatory feature
<azonenberg> and we can also clean up some other processing code by not needing to maintain software and accelerator based paths
<azonenberg> It might actually simplify things a fair bit
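A minimal C++ sketch of the "fall back to software compute" idea mentioned above, not the actual libscopehal code. It assumes a hypothetical accelerated routine that reports failure by returning false (for example when a job is too large for the device) and a software routine that always succeeds. All names here (`Samples`, `AcceleratedFn`, `SoftwareFn`, `RunWithFallback`) are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Samples = std::vector<float>;

// Hypothetical accelerated path: returns false if the job could not be run
// (e.g. the transform is too large for the device). Output is only valid on true.
using AcceleratedFn = std::function<bool(const Samples& in, Samples& out)>;

// Hypothetical software path: assumed to always succeed.
using SoftwareFn = std::function<Samples(const Samples& in)>;

// Try the accelerated path first; if it is missing or reports failure,
// run the software path instead of aborting or returning garbage.
Samples RunWithFallback(
    const Samples& in,
    const AcceleratedFn& accel,
    const SoftwareFn& soft,
    bool* usedSoftware = nullptr)
{
    assert(soft);   // a software path must always be available

    if(accel)
    {
        // Write into a scratch buffer so a failed run cannot leak partial results
        Samples scratch;
        if(accel(in, scratch))
        {
            if(usedSoftware)
                *usedSoftware = false;
            return scratch;
        }
    }

    if(usedSoftware)
        *usedSoftware = true;
    return soft(in);
}
```

The caller never sees whether the accelerator was used unless it asks via `usedSoftware`, which keeps the rest of the processing code on a single path.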
<azonenberg> i guess the question is... we know there are platforms with at least software based opencl support that don't have compute shaders
<azonenberg> Is there anything that has compute shaders that can't do opencl?
<d1b2> <mubes> I'm seeing some issues in rendering with large (for me, 10M/channel) datasets that I need to investigate. Xorg goes to 100% load for 2-3 seconds and the whole UI locks out. Haven't had the chance to dig at it yet, and might not for a while.
<azonenberg> Most likely you're seeing scaling issues in the current renderer. Are you zoomed out pretty far and does it get better when you zoom in?
<azonenberg> Right now we run one GPU thread per X coordinate value. So if you have too many points per pixel, you have a fairly large loop on a single GPU thread
<azonenberg> Which leads to long delays
<azonenberg> That's one of the things i wanted to fix with a fairly significant rewrite of the shaders
<azonenberg> which is why i'm thinking it might be a good excuse to migrate to opencl at the same time if that's gonna happen
<d1b2> <mubes> Yes, that seems to explain the symptoms
<d1b2> <mubes> It's generally when I'm 'finding' the waveform. It would be nice if a capture snapped to the full waveform when it's landed, but perhaps that won't work well for multi-device captures.
<azonenberg> Yeah i have plans to figure out something about autoscaling
<azonenberg> in the early days it wasnt practical because Reasons (tm)
<azonenberg> but those limitations are no longer present
<d1b2> <mubes> You're in that horrible bit of a project where you're far enough along to know what you want to do, but not so far along that it feels like you've got lots of it done 🙂
<azonenberg> lol
<azonenberg> Also it sounds like there's lots of issues on opencl + AMD
<azonenberg> in particular the combination of using opengl and opencl at the same time is buggy
<azonenberg> So we might want to stick with compute shaders for now. Fixing those scaling issues will definitely be good though
<azonenberg> It's just nontrivial to figure out how
<xzcvczx> opencl is the devil :P
<azonenberg> well CUDA is always an option :p
<azonenberg> CUDA + graphics APIs will work great
<azonenberg> and CUDA provides a FFT library
<azonenberg> but that only works on nvidia
<azonenberg> If we had enough development resources to support multiple implementations of various stuff
<azonenberg> we could plausibly provide cuda, opencl, and compute shader backends and pick whatever is most convenient at build/run time
<azonenberg> but right now, it's just me doing all of that
<xzcvczx> one azonenberg for sale
<xzcvczx> actually no nvm
<azonenberg> and i have to pick something that works a) for me and b) for as many other people as reasonably practical
<xzcvczx> it improved at the end there
* xzcvczx gives azonenberg another cookie for getting it working on his machine
<azonenberg> lol
<azonenberg> i still have bugs with CL stuff on my system
<azonenberg> It's not ideal
<azonenberg> And it's hard to optimize because nvidia's profiler doesnt support opencl, only cuda
<xzcvczx> yeah i setup all the opencl stuff and it just crashed
<xzcvczx> because 2+2 apparently != 5
<xzcvczx> azonenberg: out of curiosity have you tried going *bsd for os?
<azonenberg> No
<xzcvczx> fair enough
<azonenberg> I use too many binary blob tools like sonnet and vivado
<azonenberg> i'm not going to try to get those running under bsd
<azonenberg> its enough of a pain on linux
<xzcvczx> ah true, you use nvidia, which works like crap on bsd, nvm then :)
<azonenberg> As far as i can tell, there is no option for gpu compute that runs on linux/windows/osx, nvidia/amd/intel, and plays well with opengl
<azonenberg> opencl is the closest there is and it's a giant pile of garbage
<xzcvczx> i thought there was another attempt in the early stages
<xzcvczx> oh i might have just been thinking of opencl 3.0
<_whitenotifier-3> [scopehal-apps] azonenberg labeled issue #327: Waveform compute shaders get really slow if too many points per X coordinate -
<_whitenotifier-3> [scopehal-apps] azonenberg opened issue #327: Waveform compute shaders get really slow if too many points per X coordinate -
<d1b2> <bob_twinkles> if you're willing to require relatively modern hardware and drivers, there are OpenGL/VK compute shaders
<d1b2> <bob_twinkles> though i think you've already put quite a bit of work in to the opencl stuff, so it may not be worth switching
<azonenberg> bob_twinkles: Right now we use compute shaders for rendering
<azonenberg> That's a core requirement
<azonenberg> the problem is that OpenGL 4 is not supported on OSX or in any hypervisor
<xzcvczx> relatively modern == intel 4000 series cpus :)
<azonenberg> Hence considering the possibility of moving elsewhere
<d1b2> <bob_twinkles> classic apple drivers =/
<azonenberg> Apple considers opengl deprecated
<azonenberg> they just stopped updating their implementation because they want everyone to use Metal
<xzcvczx> azonenberg: can llvmpipe be used on macos?
<azonenberg> Don't know
<azonenberg> Vulkan might become an option in the future but right now there is no good interop between vulkan and GTK. At least on any stable linux distro
<azonenberg> gtk4 might be adding something along those lines, i know they did a lot of improvements to GL performance etc
<d1b2> <bob_twinkles> I say relatively modern because IIRC the stuff that's core in early versions of GL is pretty useless
<azonenberg> bob_twinkles: Yeah. The current minimum requirement for glscopeclient is gl 4.2 plus GL_ARB_compute_shader and a few other extensions
<azonenberg> most of which are in 4.3
<azonenberg> We also optionally use OpenCL for accelerating a bunch of waveform processing
<azonenberg> but from what i can tell CL+GL interop is a nightmare on AMD
<azonenberg> and even on my nvidia platform it's a pain
<xzcvczx> you use nvidia rather than nv i assume?
<azonenberg> I use the blob drivers, yes. I consider nouveau in the same class as internet explorer
<azonenberg> a tool you use to install something better
<xzcvczx> lol
<xzcvczx> how windows xp of you :)
<azonenberg> it's worse than useless because it gets in the way of the blob driver
<azonenberg> which is the only way to get any actual work done
<d1b2> <bob_twinkles> (reading more backlog) if you move your big loops to compute shaders/CL, that will solve your hangs
<d1b2> <bob_twinkles> on NV, graphics work (including frag shaders) cannot be interrupted to ctxsw but compute can
<azonenberg> No
<azonenberg> The loops are in compute shaders now
<d1b2> <bob_twinkles> huh
<d1b2> <bob_twinkles> that probably shouldn't be hanging the whole system then
<azonenberg> on nvidia? it doesnt
<azonenberg> it just slows down to a crawl
<azonenberg> it does however hang the whole thing on intel integrated gfx afaik
<azonenberg> the other thing is, it's making inefficient use of the gpu by doing so much work in a few threads
<azonenberg> I need to retool it to use a 2D thread array of some sort
<xzcvczx> is this if you scroll out massively?
<azonenberg> Yes
<xzcvczx> ah yeah that was fun
<azonenberg> basically you loop over potentially a 50M point waveform in one gpu thread
<xzcvczx> woops zoom not scroll out
<d1b2> <bob_twinkles> ah, ok. Sorry for the noise
<d1b2> <bob_twinkles> if you don't have a burning desire to work on this particular issue, I might have some bandwidth to take a look at it this weekend
<azonenberg> actually wait a minute it looks like i did actually start cleaning things up as far as 2D multithreading
<azonenberg> so that improved it a bit
<azonenberg> now it uses 16 threads per X coordinate
<azonenberg> But still you end up bottlenecking on those 16
<azonenberg> bob_twinkles: sure if you wanna look at it, go for it
<azonenberg> fundamentally the algorithm is as follows
<azonenberg> Preprocess (on the CPU) the waveform so we know which ranges of X coordinates map to which pixel locations
<azonenberg> Each group of 16 threads starts fetching from its offset and loops over the points in its bin
<azonenberg> then it finds the min/max Y values for the sample segment within this pixel, interpolating if needed
<azonenberg> then at the very end, it fills things
<azonenberg> The main interesting bit is in waveform-compute-core.glsl
<azonenberg> this block is compiled multiple times into different shaders for analog, digital, and histogram rendering
<azonenberg> it's basically a simple rasterizer
<azonenberg> that does intensity grading
<azonenberg> the output is a fp32 monochrome texture which is then colorized in a fragment shader at the final point of rendering
<azonenberg> This is the third generation of renderer
<azonenberg> First round used GL_LINES and looked gross
<azonenberg> second round tessellated to GL triangles and had its own set of problems
<azonenberg> Then i switched to compute
<azonenberg> I think sticking with compute is the way to go but the current algorithm is probably inefficient
<azonenberg> but something like bresenham rasterization is a bad idea too because of the very common scenario of having hundreds or thousands of waveform points in a small area
<azonenberg> you dont want to keep rasterizing and stacking them
<azonenberg> so as you can see here i just store min/max Y values
<azonenberg> then at the end of the inner loop i bump the alpha values
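The min/max idea described above can be sketched on the CPU in C++. This is only an illustration under simplifying assumptions, not the contents of waveform-compute-core.glsl: it uses a single thread, skips the interpolation at bin edges, and stops before the fill/alpha step. The names `ColumnExtent` and `BinMinMax`, and the `xstart`/`xscale` mapping, are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper type: vertical extent of the waveform within one pixel column
struct ColumnExtent
{
    float ymin = std::numeric_limits<float>::infinity();
    float ymax = -std::numeric_limits<float>::infinity();
    bool  hit  = false;     // true if at least one sample landed in this column
};

// xs: sample X coordinates, ys: sample values (same length).
// A sample maps to pixel column (x - xstart) * xscale; samples outside
// [0, width) are skipped. Instead of rasterizing every segment, each column
// only remembers the min and max Y seen, so thousands of points stacked in
// one column cost one comparison each rather than one line draw each.
std::vector<ColumnExtent> BinMinMax(
    const std::vector<int64_t>& xs,
    const std::vector<float>& ys,
    int64_t xstart,
    double xscale,
    size_t width)
{
    assert(xs.size() == ys.size());

    std::vector<ColumnExtent> cols(width);
    for(size_t i = 0; i < xs.size(); i++)
    {
        double px = static_cast<double>(xs[i] - xstart) * xscale;
        if(px < 0 || px >= static_cast<double>(width))
            continue;

        ColumnExtent& c = cols[static_cast<size_t>(px)];
        c.ymin = std::min(c.ymin, ys[i]);
        c.ymax = std::max(c.ymax, ys[i]);
        c.hit  = true;
    }
    return cols;
}
```

A renderer built on this would then fill each hit column between `ymin` and `ymax` once, adding to the intensity value there, rather than drawing every sample-to-sample segment.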
<d1b2> <bob_twinkles> makes sense
<d1b2> <bob_twinkles> is there somewhere I can grab some trace samples?
<azonenberg> Best for this would probably just be using the "demo" driver
<azonenberg> which has a bunch of test signals - 8b10b, sinewave, sum of sweeping sines, etc
<azonenberg> you can control sample rate and depth to a range of values which should provide a reasonable testbed
<d1b2> <bob_twinkles> ah nice, thanks!
<azonenberg> oh i almost forgot to mention the other variant
<azonenberg> Waveforms have X coordinates too, since they can be sparse/irregularly sampled
<azonenberg> These are 64 bit ints but not all cards support GL_ARB_gpu_shader_int64
<azonenberg> so i have one variant of the shader that does bignum int32 math and one that uses native int64s
<azonenberg> Longer term there will probably be another one that's optimized for uniform spacing as that's the common case
<azonenberg> and it will eliminate all the memory fetches of X coordinates
<azonenberg> but i imagine the rasterizer will be basically the same, you'd just have one version use memory fetches and the other just index*spacing + offset or something
<azonenberg> Anyway if you wanna have a look at it, definitely let me know :)
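For readers unfamiliar with the "bignum int32 math" variant mentioned above: a 64-bit value can be carried as two 32-bit halves, with the carry propagated by hand. The C++ sketch below shows only addition and is an assumption about the general technique, not a copy of the shader code. `U64Pair`, `Split`, `Join`, and `Add64` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

// A 64-bit integer stored as two 32-bit halves, as one would on hardware
// without native 64-bit integer support.
struct U64Pair
{
    uint32_t lo;
    uint32_t hi;
};

// Split a native 64-bit value into halves (for testing against native math)
U64Pair Split(uint64_t v)
{
    U64Pair p;
    p.lo = static_cast<uint32_t>(v);
    p.hi = static_cast<uint32_t>(v >> 32);
    return p;
}

// Recombine the halves into a native 64-bit value
uint64_t Join(U64Pair p)
{
    return (static_cast<uint64_t>(p.hi) << 32) | p.lo;
}

// 64-bit addition using only 32-bit operations.
// Unsigned 32-bit addition wraps modulo 2^32, so a carry out of the low half
// occurred exactly when the wrapped sum is smaller than one of the operands.
// Because this is plain two's complement arithmetic, the same bit pattern is
// also the correct result for signed 64-bit values.
U64Pair Add64(U64Pair a, U64Pair b)
{
    U64Pair r;
    r.lo = a.lo + b.lo;
    uint32_t carry = (r.lo < a.lo) ? 1u : 0u;
    r.hi = a.hi + b.hi + carry;
    return r;
}
```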
<d1b2> <bob_twinkles> yeah, taking a look through the shaders
<azonenberg> one other FYI is that the shaders are in src/glscopeclient/shaders/ but the makefile copies them to the build directory
<azonenberg> so despite the fact that they're not compiled per se, you do need to rerun make to see updates
<azonenberg> this will be fixed later on once we get proper data file path resolution taken care of
<azonenberg> (another v0.1 pending item)
<d1b2> <bob_twinkles> I think maybe an adaptation of the techniques used for light batching in deferred rendering pipelines could work here
<_whitenotifier-3> [scopehal-apps] azonenberg opened issue #328: Add support for dense packed waveforms to rendering shaders -
<_whitenotifier-3> [scopehal-apps] azonenberg labeled issue #328: Add support for dense packed waveforms to rendering shaders -
<azonenberg> also see that one. that would probably be fairly easy and give a 3x reduction in GPU memory bandwidth usage for non-sparse waveforms
<azonenberg> i.e. fetching a float per sample rather than a float plus an int64
<d1b2> <bob_twinkles> makes sense. if I'm reworking the rendering to start with I probably won't be able to do that optimization as well, but I'll try to leave the option open
<azonenberg> With the current architecture that optimization is actually a very clean fit i think
<azonenberg> there's one function FetchX() that would have to be replaced
<azonenberg> and then probably catching some stuff host side to not bother pushing the x coordinates to the gpu at all if they're not going to be used
<azonenberg> and adding a flag to the header struct to specify which mode is active
<azonenberg> actually no flag
<azonenberg> just an ifdef and two builds of the shader
<azonenberg> so you'd need to have two shader objects in WaveformArea since it's possible that we could switch a given view from sparse to non-sparse as conditions change
<azonenberg> e.g. consider displaying the output of a math function
<azonenberg> if you swap the input from dense to sparse the output will swap too
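A rough C++ sketch of the two fetch strategies discussed above. Only the name FetchX() comes from the conversation; `FetchXSparse`, `FetchXDense`, and the byte-count constants are hypothetical, and the real shader would select one variant at build time with an ifdef rather than having two functions side by side.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sparse variant: every sample has its own 64-bit X coordinate in memory,
// so each fetch costs a memory read.
int64_t FetchXSparse(const std::vector<int64_t>& xs, size_t i)
{
    assert(i < xs.size());
    return xs[i];
}

// Dense (uniformly spaced) variant: X is computed from the sample index,
// so no X coordinate buffer is needed at all.
int64_t FetchXDense(size_t i, int64_t spacing, int64_t offset)
{
    return static_cast<int64_t>(i) * spacing + offset;
}

// Per-sample traffic, assuming a 4-byte float for Y and an 8-byte int for X
constexpr size_t kSparseBytesPerSample = sizeof(float) + sizeof(int64_t);
constexpr size_t kDenseBytesPerSample  = sizeof(float);

// This is where the 3x bandwidth figure comes from (holds when float is 4 bytes)
static_assert(kSparseBytesPerSample == 3 * kDenseBytesPerSample,
    "expected float + int64 to be three times the size of float alone");
```

For a uniformly sampled waveform the two variants return the same X for every index, which is what makes swapping them behind one function name a clean fit.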
<_whitenotifier-3> [scopehal-apps] azonenberg commented on issue #325: GPU hang on iris Plus driver -
<_whitenotifier-3> [scopehal] azonenberg pushed 1 commit to master [+0/-0/±1]
<_whitenotifier-3> [scopehal] azonenberg 01c607d - LeCroyOscilloscope: added sample rate/depth tables for SDA8Zi. Fixes #400.
<_whitenotifier-3> [scopehal] azonenberg closed issue #400: LeCroy: Support for SDA 8Zi memory depth/sample rate tables -
<_whitenotifier-3> [scopehal] fsedano opened issue #438: Support digital channels on RS RTM3000 scope -
fsedano43 has quit [Quit: Connection closed]
<d1b2> <fsedano> re: hang on Iris plus - It might be something different than we were thinking. It just happened to me now with a stopped capture, just by playing with the menus (no zoom etc) - It went into a state where the popup menu was being shown and hidden in a cycle without me doing anything, then complete lockup
<d1b2> <fsedano> At this stage scopehal is pretty much unusable for me - It crashes my laptop every few minutes at most
fsedano has quit [Quit: Connection closed]
Tost has quit [Ping timeout: 265 seconds]
Degi has quit [*.net *.split]
Ekho has quit [*.net *.split]
maartenBE has quit [Ping timeout: 240 seconds]
Degi has joined #scopehal
maartenBE has joined #scopehal
Ekho has joined #scopehal
Tost has joined #scopehal
juli9610 has joined #scopehal
juli9610 has quit [Quit: Nettalk6 -]
juli9610 has joined #scopehal
Tost has quit [Ping timeout: 252 seconds]
<d1b2> <mubes> Well, that's the first time I've overflowed an int64_t in regular code 🙂
<d1b2> <mubes> @azonenberg Even 'reasonable' sample rates don't fit in a uint64_t when measured in fs, unless I'm doing something silly. The biggest int that fits in one is 1.84x10^19, and 1GS/sec in Samples/fs is 1x10^24. How do we get out of this one?
<d1b2> <mubes> Ah...GetSampleRate is in seconds, not fS. Phew.
<azonenberg> mubes: lo
<azonenberg> yeah Hz is the base unit for frequency
<azonenberg> however when working with sample *periods* we use fs
<azonenberg> fs provides a reasonable range there
<azonenberg> actually since we multiply by the timebase unit this puts a ceiling on the upper length of a scopehal capture
<azonenberg> Approximately 5 hours 7 minutes, or +/- half that since time units are signed
<d1b2> <mubes> Just trying to sort out the mess with trigger points, it's easy once we've got a sample set 'cos it's in the wavedesc, but we need some default for the startup case before any waveform has arrived. I'm back at it again tomorrow I think.
<azonenberg> I was using ps before which had a much longer max duration, on the order of a month, but just lacked the resolution needed for high speed serial stuff
<azonenberg> I figure you probably arent going to have a single acquisition longer than two hours anyway
<azonenberg> we can break anything longer than that up into multiple waveforms since we use 128-bit timestamps for the beginning of a waveform (64-bit time_t plus 64-bit fs since the last whole second)
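The range arithmetic above can be checked with a few lines of C++. This is a standalone sketch; the constant and function names are illustrative and not taken from libscopehal. A 64-bit counter of femtoseconds spans 2^64 / 10^15 seconds, roughly 18446.7 s or about 5 hours 7 minutes, and a signed counter covers plus or minus half of that. The same counter in picoseconds spans a thousand times longer.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative constants: timebase units per second
constexpr int64_t kFsPerSecond = 1000000000000000LL;   // 10^15 femtoseconds
constexpr int64_t kPsPerSecond = 1000000000000LL;      // 10^12 picoseconds

// 2^64 as a double (exactly representable, since it is a power of two)
constexpr double kTwoTo64 = 18446744073709551616.0;

// Total span in seconds covered by a 64-bit counter in the given unit.
// A signed counter covers +/- half of this span around zero.
double FullRangeSeconds(int64_t unitsPerSecond)
{
    return kTwoTo64 / static_cast<double>(unitsPerSecond);
}

// Sample period in femtoseconds for a sample rate given in Hz.
// Note this is period, not rate: 1 GS/s is a period of 10^6 fs,
// which fits in an int64 with plenty of headroom.
int64_t SamplePeriodFs(int64_t sampleRateHz)
{
    assert(sampleRateHz > 0);
    return kFsPerSecond / sampleRateHz;
}
```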
<d1b2> <mubes> Yeah, I'd got an extra SECONDS_PER_FS in there and it was frying my brain 😦
<d1b2> <mubes> I may have been slightly too negative about the sampling speed on this thing. With one channel and up to 500Kpoints I get 2.37 frames/sec, which actually feels vaguely interactive.
<d1b2> <mubes> 200Kpoints on 4 channels is about 0.8 frames/sec though.
<d1b2> <mubes> Right, off to file Zzs. Will try and get this pushed out tomorrow, with a following wind.
<azonenberg> I mean, i'm spoiled with lecroy and pico stuff getting double digit WFM/s even on deep memory. But low end gear isnt really optimized for this use case so it's never performed well
<azonenberg> Some entry level keysight/agilent stuff is the only other good performer we've seen so far IIRC. miek has got good results on his 3000 series i think
juli9610 has quit [Quit: Nettalk6 -]
<d1b2> <mubes> I ought to try this with usb tbh...often the different transports behave differently, although generally ethernet is faster than usb in my experience.