azonenberg changed the topic of #scopehal to: libscopehal, libscopeprotocols, and glscopeclient development and testing | https://github.com/azonenberg/scopehal-apps | Logs: https://freenode.irclog.whitequark.org/scopehal
Degi_ has joined #scopehal
Degi has quit [Ping timeout: 252 seconds]
Degi_ is now known as Degi
<_whitenotifier-3> [scopehal] fsedano opened pull request #465: RS Scope driver: Update cache when enabling/disabling channels. Parse… - https://git.io/J39KR
<_whitenotifier-3> [scopehal] azonenberg pushed 2 commits to master [+0/-0/±2] https://git.io/J39Ki
<_whitenotifier-3> [scopehal] fsedano e270bc9 - RS Scope driver: Update cache when enabling/disabling channels. Parse correct reply from scope
<_whitenotifier-3> [scopehal] azonenberg 7312ee7 - Merge pull request #465 from fsedano/fix_enable_channel RS Scope driver: Update cache when enabling/disabling channels. Parse…
<_whitenotifier-3> [scopehal] azonenberg closed pull request #465: RS Scope driver: Update cache when enabling/disabling channels. Parse… - https://git.io/J39KR
<d1b2> <fsedano> @azonenberg just noticed channel 1 cant be disabled. It can be on the GUI, but the driver never receives the request
<d1b2> <fsedano> Is that intentional?
<azonenberg> fsedano: what is the trigger set to?
<azonenberg> Channels are reference counted and only switched off when they have no open references
<d1b2> <fsedano> That channel.. Let me see what happens if I move the trig to a different one
<azonenberg> Trigger counts as a reference because on some scopes, like Pico, you can't trigger on a disabled channel
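The reference counting described above can be sketched roughly as follows. This is a minimal illustration, not the libscopehal implementation: the class `ChannelRefs` and its methods are hypothetical names, and the hardware enable/disable is modelled by a flag.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of reference-counted channel enables (not the libscopehal classes).
class ChannelRefs
{
public:
    explicit ChannelRefs(std::size_t numChannels)
        : m_refs(numChannels, 0), m_hwEnabled(numChannels, false)
    {}

    // Called once per user of the channel: a view in the GUI, a filter input, or the trigger.
    void AddRef(std::size_t ch)
    {
        assert(ch < m_refs.size());
        if(m_refs[ch]++ == 0)
            m_hwEnabled[ch] = true;     // first reference: a real driver would enable the channel here
    }

    // Called when a user goes away. The channel is only switched off at zero references.
    void Release(std::size_t ch)
    {
        assert(ch < m_refs.size());
        if(m_refs[ch] == 0)
            return;                     // unbalanced release, ignore
        if(--m_refs[ch] == 0)
            m_hwEnabled[ch] = false;    // last reference gone: a real driver would disable the channel here
    }

    bool IsEnabled(std::size_t ch) const
    {
        assert(ch < m_hwEnabled.size());
        return m_hwEnabled[ch];
    }

private:
    std::vector<unsigned> m_refs;
    std::vector<bool> m_hwEnabled;
};
```

With this model, hiding channel 1 in the GUI releases one reference, but the trigger still holds another, so the driver never sees a disable request until the trigger is moved to a different channel.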
<d1b2> <fsedano> hum.. how can I change trigger from UI?
<d1b2> <fsedano> ah found sorry
<azonenberg> yeah setup|trigger. There might be a toolbar button or something added for that in the future
<d1b2> <fsedano> ok that was it
<azonenberg> Yeah, there might be room to optimize somehow to not download waveform data if the channel is only used for trigger, but it has to be enabled in hardware
<azonenberg> and honestly it's not THAT common to be triggering on an invisible channel unless it's ext trig
<d1b2> <fsedano> another Q -what should be the correct behaviour when on AcquireData() we don't have data for a particular channel?
<d1b2> <fsedano> I.e. channel is enabled but there's no data.
<d1b2> <fsedano> If I just skip updating that channel I get UI artifacts
<d1b2> <fsedano> meaning headers disappear, previous sample is erased, etc
<azonenberg> If the channel is enabled but there's no data? set it to null
<azonenberg> not sure how that would happen but that's what e.g. filters do if they encounter a situation where they can't proceed
<azonenberg> if you don't update you'll get issues because the history window claims that waveform and moves it into the archives
<azonenberg> and having it simultaneously on screen and in history would probably cause problems
<d1b2> <fsedano> set to null like:
<d1b2> <fsedano> //AnalogWaveform* cap = new AnalogWaveform; //cap->Resize(0); //pending_waveforms[i].push_back(cap);
<d1b2> <fsedano> ?
<azonenberg> No
<azonenberg> literally null
<azonenberg> don't create a waveform object at all
<d1b2> <fsedano> Example?
<d1b2> <fsedano> pending_waveforms[i].push_back(NULL); ?
<azonenberg> Correct
<d1b2> <fsedano> That crashes my GPU 😉
<d1b2> <fsedano> with our old friend signature
<azonenberg> Interesting
<d1b2> <fsedano> I can try again... If it takes more than 20 seconds for me to reply you know what happened
<azonenberg> Lol a zero-sample waveform should work too
<d1b2> <fsedano> ok this time it didnt crash but not nice result
<azonenberg> interesting so all channels disappear not just that one?
<d1b2> <fsedano> I get no reading on all of them
<d1b2> <fsedano> Look at the traces
<azonenberg> Innnteresting
<azonenberg> yeah that should not happen
<d1b2> <fsedano> I'd rather just completely skip the update so I keep old trace on screen
<azonenberg> I've got some stuff to do for work but will take a look shortly and see if i can reproduce using some patches on the demo driver
<azonenberg> So in that case just have AcquireData return false
<azonenberg> and don't touch any of the current state
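A rough sketch of the "return false and leave the current state alone" approach, under stated assumptions: `Waveform` and `FakeDriver` are hypothetical stand-ins rather than the real driver types, and a missing reply from the scope is modelled as `std::nullopt`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-ins for the driver state; not the real libscopehal types.
struct Waveform
{
    std::vector<float> samples;
};

struct FakeDriver
{
    std::vector<Waveform> current;  // what is currently on screen, one entry per channel

    // 'fetched' models the per-channel replies; std::nullopt means the scope returned no data.
    bool AcquireData(const std::vector<std::optional<Waveform>>& fetched)
    {
        assert(fetched.size() == current.size());

        // First pass: bail out before modifying anything if any channel has no data.
        for(const auto& f : fetched)
        {
            if(!f.has_value())
                return false;
        }

        // Second pass: every channel has data, so commit the whole acquisition at once.
        for(std::size_t i = 0; i < fetched.size(); i++)
            current[i] = *fetched[i];
        return true;
    }
};
```

Checking everything before committing anything keeps the old traces intact when an acquisition is skipped, which is the behaviour fsedano asked for.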
<xzcvczx> azonenberg: >70 is annoyingly controlled here
<azonenberg> xzcvczx: yeah I use 70, although it can be corrosive to copper unless you add some sulfuric. (But this discussion is a bit off topic, probably better for ##sillycon?)
<xzcvczx> my bad
<_whitenotifier-3> [scopehal] fsedano opened pull request #466: Skip adquiring if no data from scope - https://git.io/J39S3
<_whitenotifier-3> [scopehal] fsedano edited pull request #466: RS driver: Skip adquiring if no data from scope - https://git.io/J39S3
<azonenberg> Incidentally, this is one of the problems i have with the scpi polling methodology
<azonenberg> it sometimes does have race conditions like this
<azonenberg> (the pico driver has a few too)
<_whitenotifier-3> [scopehal] fsedano opened pull request #467: RS driver: Basic edge trigger - https://git.io/J39NF
<_whitenotifier-3> [scopehal] azonenberg pushed 2 commits to master [+0/-0/±2] https://git.io/J3Hfl
<_whitenotifier-3> [scopehal] fsedano c35464c - Skip adquiring if no data from scope
<_whitenotifier-3> [scopehal] azonenberg 8c5ca67 - Merge pull request #466 from fsedano/skipupdate RS driver: Skip adquiring if no data from scope
<_whitenotifier-3> [scopehal] azonenberg closed pull request #466: RS driver: Skip adquiring if no data from scope - https://git.io/J39S3
<_whitenotifier-3> [scopehal] azonenberg closed pull request #467: RS driver: Basic edge trigger - https://git.io/J39NF
<_whitenotifier-3> [scopehal] azonenberg pushed 3 commits to master [+0/-0/±3] https://git.io/J3HJR
<_whitenotifier-3> [scopehal] fsedano a93edda - Add basic edge trigger support
<_whitenotifier-3> [scopehal] fsedano c02c9dc - Add basic edge trigger support
<_whitenotifier-3> [scopehal] azonenberg d185242 - Merge pull request #467 from fsedano/trigger-edge RS driver: Basic edge trigger
<azonenberg> Welp i got hit by an annoying bug in my lecroy waverunner that happens every so often
<azonenberg> It's not 100% reproducible but hits me often enough to be bothersome
<azonenberg> basically, when you disconnect an active probe that has offset control, sometimes it gets confused as to what the real offset is
<monochroma> the blood starts spraying out of the USB ports? yeah, mine does that too.
<azonenberg> Lol
<azonenberg> in this case i had an RP4030 plugged in that had a -1.2V offset configured so i could probe a 1.2V power rail
<azonenberg> Removed it, plugged in a passive probe
<azonenberg> and it was showing the signal 1.2V below where it was supposed to be
<azonenberg> Because it was adding -1.2V to the scope's internal offset thinking that the probe was providing that
<azonenberg> and somehow forgot that the probe wasn't plugged in
<azonenberg> it didnt show up, i could actually switch to a different active or passive probe and it would be detected fine
<azonenberg> but it still was convinced there was -1.2V of magical external offset
<azonenberg> Reloading MAUI doesn't help, the only fix i've found is a full power cycle of the scope
<azonenberg> hmmmm ok now i am really confused
<azonenberg> i rebooted the scope and it's still showing an offset on channel 2
<azonenberg> Only in 1meg mode too
<azonenberg> in 50 ohm or active probe mode it's fine
<azonenberg> The offset generator isnt broken, i can move the trace around
<azonenberg> everything is electrically fine it's just showing the trace displaced
<azonenberg> This is new to me, a reboot normally clears it
* azonenberg does a full powerdown of the scope
<azonenberg> What the actual hell
<azonenberg> I shut the scope down and removed all power
<azonenberg> With nothing connected to the input in 1meg mode it measures -1.57V
<azonenberg> ok yeah this is not that bug, this is looking like a hardware fault
<azonenberg> as i move the offset around the position of the trace changes
<azonenberg> and i see offset when it's in ground coupled mode too
<azonenberg> wtf
<azonenberg> In AC1M i see offset too
<azonenberg> This is starting to smell like a hardware fault
<monochroma> D: !
<lain> ouch
<azonenberg> It's only one channel and only in 1meg mode at least. So not nearly as bad as it could be
<azonenberg> But i just had the scope cal'd so i'm mad. and i have no idea what could have caused it
<azonenberg> Near term I stuck a warning label on that channel to not use the 1meg mode
<azonenberg> I'm probably going to end up sending it back to lecroy, and paying for service since it's out of warranty
<monochroma> still within the warranty period?
<monochroma> awww
<azonenberg> I dont think so at least
<azonenberg> let me check when i got it
<azonenberg> I got it 7/8/20 actually. So i should still have a few months of warranty
<monochroma> yay
<azonenberg> I'll give Jon a call on monday to confirm then have a chat with Carrie about getting a RMA
<azonenberg> I wish i knew what happened though
<azonenberg> i've only used active probes on that channel the last few weeks. the last time it was used in 1meg mode was probably when it was calibrated
<azonenberg> i hope the cal guy didnt put some setting on his standard wrong and blow the frontend or something
<xzcvczx> that would be brutal
<xzcvczx> and they would never accept responsibility
<azonenberg> Weeeell
<azonenberg> I know the date and time of the cal lab visit
<azonenberg> If the scope logs show an overload around that time...
<monochroma> azonenberg: image the drive :3
<azonenberg> But yeah i forgot this scope should still be within warranty. I was thinking the date I bought the 1 GHz one
<azonenberg> this one is 4G and they look exactly the same so glancing at old lab pics its hard to tell
<azonenberg> i had to check tax records to see when i got the 4G
<azonenberg> As far as responsibility, that would be between them and the manufacturer
<azonenberg> If it's within warranty lecroy should still fix it
<xzcvczx> are the cal guys licenced/certified by lecroy?
<azonenberg> They're an independent lab but are ISO 9001 and 17025 certified among other things
<xzcvczx> ah ok
<azonenberg> In other news i'm starting to play with digital channels on the picoscope
<azonenberg> I plugged the first digital channel into the probe compensation output on the waverunner and i'm not seeing a signal, so trying to figure out what i screwed up
<azonenberg> and it's working now so no idea
<azonenberg> Anyway, i now have a test digital signal source i can start playing with in scopehal
<azonenberg> triggering on digital channels doesnt seem to work right in their software, i have to debug that
<azonenberg> (they let you pick the digital channel it just never actually triggers)
<azonenberg> Oh that's convenient
<azonenberg> pico released the 6000A api docs a while back
<azonenberg> i apparently didnt notice
<azonenberg> i'm no longer using an undocumented api woo
<monochroma> that's cheating
<azonenberg> lol
<azonenberg> So it looks like there's a separate function to enable digital channel banks. You can set thresholds uniquely per channel, but hysteresis is set per bank. which doesn't fit into scopehal's model well
<azonenberg> it expects either global threshold+hysteresis per bank or per channel for both
<azonenberg> the most flexible, if confusing, option is to expose each channel as having independent settings, but silently change hysteresis for all sibling channels when you change one
<azonenberg> Since that's what the hardware does
<azonenberg> Longer term we should think about how to model this in the API (some settings per bank and others per channel)
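One way to picture the "per-channel API, per-bank hardware" option is the sketch below. `DigitalBank`, its methods, and the 8-channel bank size are illustrative assumptions, not the Pico API or the scopehal model.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of one digital bank: thresholds are stored per channel,
// hysteresis is stored once for the whole bank (as the hardware does).
class DigitalBank
{
public:
    static constexpr std::size_t kChannels = 8;  // assumed bank width

    void SetThreshold(std::size_t ch, float volts)
    {
        assert(ch < kChannels);
        m_threshold[ch] = volts;
    }

    float GetThreshold(std::size_t ch) const
    {
        assert(ch < kChannels);
        return m_threshold[ch];
    }

    // The interface looks per-channel, but the value is shared:
    // changing it for one channel silently changes it for all siblings.
    void SetHysteresis(std::size_t ch, float volts)
    {
        assert(ch < kChannels);
        m_hysteresis = volts;
    }

    float GetHysteresis(std::size_t ch) const
    {
        assert(ch < kChannels);
        return m_hysteresis;
    }

private:
    std::array<float, kChannels> m_threshold{};
    float m_hysteresis = 0.0f;
};
```

A caller that sets hysteresis on channel 0 and then reads it back from channel 5 sees the new value, which is the confusing but hardware-accurate behaviour described above.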
juli9610 has joined #scopehal
<bvernoux> azonenberg, yes you need a flexible model for glscope
<bvernoux> I think a major milestone will be to support the cheap LA with 8chan
<bvernoux> like that lot of guys could provide feedback as such type of LA cost less than 20USD ...
<azonenberg> bvernoux: That's on the v0.1 feature list
<azonenberg> It's one of 31 open tickets against libscopehal though
<bvernoux> yes I know ;p
<bvernoux> It is just a popular thing to bring a lot of new guys who do not have an expensive scope
<azonenberg> Yeah, I'm just trying to focus my own efforts on
<bvernoux> But that will bring also lot of bad issue from beginner ;)
<azonenberg> 1) things that I need for work
<bvernoux> like how to power the board ;)
<azonenberg> 2) things I need for my own projects
<azonenberg> 3) writing driver code for companies that gave me free stuff
<bvernoux> yes it is good points
<azonenberg> If you wanna write a fx2la driver, you can do so at any time
<bvernoux> For me it is clearly not a priority
<bvernoux> so far I use PulseView or DSView which works fine
<bvernoux> and I never use the cheap LA as they are too slow for what I'm doing
<bvernoux> I use DSLogic U3Pro16 for everything now
<bvernoux> for digital stuff of course like fast SPI ... ;)
<azonenberg> I'm looking forward to using the picoscope for digital stuff
<bvernoux> yes it is nice but the PicoGUI is awful
<bvernoux> their decoders suxx
<bvernoux> it is why I never use it ...
<azonenberg> Yeah the software is bad but i dont use it :p
<azonenberg> they can sample surprisingly fast, up to 5 Gsps
<bvernoux> or I convert it in paste to use the data with PulseView
<azonenberg> On the MSO channels
<azonenberg> And you can get super deep memory
<bvernoux> yes their HW is very good they only lack good SW
<bvernoux> I have seen they have new Pico6000 also up to 1GHz BW
<azonenberg> Yeah
<bvernoux> but they are expensive
<azonenberg> I've known about that for months but was asked not to say anything about it :p
<bvernoux> hehe
<bvernoux> I think their price point is too high for the 1GHz BW scope
<bvernoux> especially with their crappy GUI which is always the same since more than 10years
<azonenberg> How much is it? It didn't seem unreasonable although i wish they had more sample rate
<bvernoux> it is more than 12KUSD
<azonenberg> Do you know what a 1 GHz scope from keysight costs? :p
<bvernoux> yes but keysight are not reasonable ;)
<bvernoux> they are just crazy on prices
<bvernoux> I think they shall change their mind related to price
<bvernoux> Rigol/Siglent will hit them very hard soon ;)
<bvernoux> even if Rigol/Siglent SW are often very buggy ;)
<bvernoux> 4chan Pico6000E 8/10/12 (6426E) cost is 14955USD
<bvernoux> glurps ;)
<bvernoux> I think the flexible resolution 8/10/12 is fake too
<bvernoux> to be considered as 8bits++
<azonenberg> No it's real inside the ADC, I asked about it. There's an undocumented extra resolution mode similar to how the HMCAD1520 lets you trade res vs sample rate
<bvernoux> yes you have slower sample rate
<azonenberg> i forget the exact part number but it's a Teledyne E2V 4x 1.25 Gsps 8 bit base sample rate
<azonenberg> it's not just taking two samples and averaging them i mean
<bvernoux> so 12bits is probably 200MHz BW (with 1GSPS)
<bvernoux> if not less
<azonenberg> Let me run a check...
<bvernoux> still better than SW decimation
<bvernoux> as they have like 1bit per LSB per div/2 instead of 0.5 bits in SW decimation
<bvernoux> I think they achieve something like that in their ADC
<bvernoux> Such scope will be amazing for half the price (let say 6KUSD) anyway ;)
<bvernoux> especially with their actual PicoGUI
<bvernoux> which is very bloated and not comparable to even my RigolMSO5000 which is 10x better faster to use
<bvernoux> It is maybe why they have offered a free HighEnd scope ;)
<noopwafel> debug with a lecroy and capture with the pico :p
<bvernoux> But it will be even better if they could contribute on glscope source ;)
<bvernoux> I think Pico could sell to actual price with a decent GUI like glscopeclient
<bvernoux> it is their main drawback as their hardware is very good and compact
<bvernoux> their API is not bad and have decent performance in fact their only drawback is their Picoscope 6 GUI/SW ;)
<bvernoux> a bit like their VNA with their awful Visual Basic GUI ;)
<azonenberg> And why do you think they're throwing free scopes at me hoping i can fix their software problems? :p
<bvernoux> hehe
<bvernoux> azonenberg, Do you think Keysight will provide you a free high end scope (let say 5GHz+BW) one day ? ;)
<azonenberg> Lol we'll see
<bvernoux> They do not have any PC sw in fact maybe they will think about it
<noopwafel> glscopeclient should be auto-scaling the voltage range based on the GUI, right?
<azonenberg> Ok so, this is my 500 MHz picoscope 6824E with a leobodnar SMA pulse gen fed into the input
<azonenberg> noopwafel: v/div of the scope should track the gui, yes
<noopwafel> hmph
<bvernoux> azonenberg, you have the famous 8/10/12 flex res ?
<azonenberg> Yes
<bvernoux> set to 12bits ?
<azonenberg> This is 8 bit to start. at 5 Gsps
<bvernoux> could you show the same with 8bits?
<bvernoux> ha ok
<azonenberg> noopwafel: The gui will send arbitrary v/div so that the min/max range of the scope exactly fills the display. If the instrument can't support exactly that range, round up as needed
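The "round up as needed" rule could look something like this on the bridge or driver side. This is a sketch only: `RoundUpRange` is a hypothetical helper, and the list of supported ranges is whatever the instrument offers, passed in sorted ascending.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns the smallest supported full-scale range that is >= the requested one.
// If the request exceeds every supported range, the largest range is returned.
// 'supportedSorted' must be sorted in ascending order.
double RoundUpRange(double requested, const std::vector<double>& supportedSorted)
{
    assert(!supportedSorted.empty());
    auto it = std::lower_bound(supportedSorted.begin(), supportedSorted.end(), requested);
    if(it == supportedSorted.end())
        return supportedSorted.back();
    return *it;
}
```

Rounding up rather than to the nearest value means the waveform the GUI asked to see is never clipped by the ADC, at the cost of some unused vertical resolution.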
<azonenberg> bvernoux: I'm getting there :)
<noopwafel> yeah, I updated the code for this on the bridge side ofc
<bvernoux> azonenberg, after it will be interesting to check with Intermodulation ;)
<azonenberg> Here's 10 bit. Still 5 Gsps
<bvernoux> azonenberg, a bit like we do the test on RF for TOI
<bvernoux> it seems better but hard to say if there is really 2bits more
<azonenberg> https://www.antikernel.net/temp/rise3.png now here's still 10 bits but sample rate backed out to 2.5 Gsps. Rise time suffers a bit due to aliasing at the lower sample rate
<noopwafel> I thought frontend relay click was at 5V but apparently at 2V, fine.
<azonenberg> https://www.antikernel.net/temp/rise4.png and this is at 1.25 Gsps. my rise time measurement is derping for some reason, but you can see more aliasing
<azonenberg> https://www.antikernel.net/temp/rise5.png and finally, 1.25 Gsps with full 12 bits
<azonenberg> I am not seeing any evidence of reduced bandwidth, other than effects from reducing the sampling rate
<noopwafel> really not getting great speeds though
<azonenberg> noopwafel: in terms of WFM/s or what?
<noopwafel> yes
<bvernoux> 48.35WFM/s is very nice ;)
<bvernoux> it is the good point as i'm not convinced by the 8/10/12 bits flex stuff
<azonenberg> There's definitely extra resolution
<bvernoux> especially they sell that option for >2KUSD
<noopwafel> and now I get 14 (EFAULT?) on the write, wth
<bvernoux> Yes there is but it is not so clear
<azonenberg> Whether I'd pay $2K for it is indeed another question
<azonenberg> But in some circles i could see it being helpful
<bvernoux> it will be interesting to check on power measurement
<bvernoux> or on RF test signal ;)
<bvernoux> to check TOI and IM3
<azonenberg> When I get my VSG60A I will definitely have some fun playing with rf performance of my scopes
<azonenberg> But right now i dont have an rf siggen... only the AWG on the picoscope, the leobodnar pulse gens, and a 1 GHz crystek sinewave SAW oscillator
<azonenberg> that i use as a phase noise/jitter reference
<bvernoux> the best will be to have 2 outputs (independent)
<bvernoux> for TOI test
<bvernoux> it is mandatory
<bvernoux> Windfreak have one which is very compact
<noopwafel> (gdb) p/x memdepth
<noopwafel> $2 = 0x19d319d318a718a7
<noopwafel> problem found :D
<azonenberg> noopwafel: that looks wrong :p
<bvernoux> azonenberg, a good things I'm searching is a good ultra wideband noise generator
<bvernoux> azonenberg, like the old HP/Agilent which go to 18GHz+
<bvernoux> but they are still expensive
<bvernoux> it is a must have to check the phase noise with help of a good Spectrum Analyzer
<bvernoux> as a Signal Analyzer price is totally crazy ;)
<noopwafel> ah this is another mutex thingy
<noopwafel> exposing what I guess is another bug
<azonenberg> Fun
<azonenberg> (also BTW I'm working on implementing MSO support in the bridge. It shouldnt change any of the code you're working on, just adding some new scpi commands and such)
<noopwafel> in any case, if we conflict we conflict
<noopwafel> I don't have an MSO scope here to play with alas
<noopwafel> I do have a question
<noopwafel> how do you intend g_captureMemDepth, g_memDepth and g_memDepthChanged to work
<noopwafel> I tried patching it up in one place, but actually it just doesn't make much sense to me
<noopwafel> I would expect the WaveformServerThread to do 'g_captureMemDepth = g_memDepth;' when it reallocates the buffers
<noopwafel> and then g_memDepthChanged is just 'g_captureMemDepth != g_memDepth'
<noopwafel> (right now I never ever get any buffer reallocations, which explains my low WFM/s..)
<azonenberg> So, memDepthChanged might be best to rename to "reallocation needed" or similar. As it's also set when changing flexres config and some other stuff
<azonenberg> g_captureMemDepth is the memory depth that was set at the time the capture started
<azonenberg> g_memDepthChanged indicates that the memory depth is not the same as it was last time the scope triggered
<azonenberg> so you have to call SetDataBuffer again
<noopwafel> so g_memDepth should be the current/old depth?
<noopwafel> because right now, DEPTH changes g_memDepth (and doesn't set anything)
<azonenberg> Correct. Because I don't want to change the depth under a currently armed capture
<azonenberg> Maybe it might make sense to stop and restart when you see DEPTH happen
<azonenberg> actually you know i think you just found a bug
<azonenberg> wait nvm
<azonenberg> But yeah, it might not be unreasonable to do
<azonenberg> if(g_triggerArmed) StartCapture(true)
<azonenberg> after updating depth in the DEPTH command
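A simplified model of the depth bookkeeping discussed above, assuming the proposed restart-on-DEPTH behaviour. `BridgeState` is a hypothetical struct that stands in for the bridge's globals (`g_memDepth`, `g_captureMemDepth`, `g_memDepthChanged`, `g_triggerArmed`); the real `StartCapture` takes an argument whose meaning is not shown in the log, so it is omitted here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the bridge globals.
struct BridgeState
{
    std::size_t memDepth = 1000;         // depth requested by the most recent DEPTH command
    std::size_t captureMemDepth = 1000;  // depth latched when the current capture was armed
    bool triggerArmed = false;
    bool reallocNeeded = false;          // plays the role of g_memDepthChanged
    int restarts = 0;                    // counts restarts, for illustration only

    // Arm a capture: latch the requested depth and flag the buffers for reallocation if it changed.
    void StartCapture()
    {
        if(captureMemDepth != memDepth)
            reallocNeeded = true;
        captureMemDepth = memDepth;
        triggerArmed = true;
    }

    // DEPTH command handler: record the new depth and, if a capture is armed,
    // restart it so the armed capture never runs with a stale depth.
    void OnDepthCommand(std::size_t depth)
    {
        assert(depth > 0);
        memDepth = depth;
        if(triggerArmed)
        {
            restarts++;
            StartCapture();
        }
    }
};
```

The waveform thread would then reallocate its buffers and clear `reallocNeeded` the next time it services a capture.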
<_whitenotifier-3> [scopehal-pico-bridge] noopwafel opened pull request #10: Always send numSamples waveform data - https://git.io/J3QGJ
<noopwafel> I mean, the current code is definitely broken
<noopwafel> because I never ever get the buffer realloc path
<noopwafel> just trying to understand what is intended
<_whitenotifier-3> [scopehal-pico-bridge] azonenberg pushed 1 commit to master [+0/-0/±1] https://git.io/J3QGn
<_whitenotifier-3> [scopehal-pico-bridge] azonenberg 4064552 - Added PRESENT command to check for presence of MSO pods. Began work on enabling/disabling MSO pods.
<noopwafel> also that pull request fixes my memdepth thing
<noopwafel> that's kind of a symptom of the code being a mess, but I think it's worth making the change anyway
<azonenberg> Hmm yeah that might be a race condition if memdepth changes there
<_whitenotifier-3> [scopehal-pico-bridge] azonenberg pushed 2 commits to master [+0/-0/±2] https://git.io/J3QGw
<_whitenotifier-3> [scopehal-pico-bridge] noopwafel 3a9d91c - always send numSamples waveform data
<_whitenotifier-3> [scopehal-pico-bridge] azonenberg 3a8a531 - Merge pull request #10 from noopwafel/numsamples Always send numSamples waveform data
<_whitenotifier-3> [scopehal-pico-bridge] azonenberg closed pull request #10: Always send numSamples waveform data - https://git.io/J3QGJ
<noopwafel> I seem to be hitting all of the races :)
<noopwafel> so I already added a check for PICO_NO_SAMPLES_AVAILABLE in my tree, because we can actually *stop* the scope in between the IsReady call and us trying to grab the data
<azonenberg> Lovely
<azonenberg> Yeah there's races in the driver, the same is true in the opposite with the 6000 series
<azonenberg> you can call stop but it isnt actually stopped
<azonenberg> then when you call runblocks it says it's still capturing
<noopwafel> this is our race I think
<noopwafel> because we drop the mutex in the waveform thread because you don't want the lock_guard<> in scope when you sleep
<noopwafel> and the only downside is that sometimes, the scope is not actually ready because the state changed while the mutex was dropped, and this seems fine
<azonenberg> Makes sense
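The polling pattern and the race it tolerates might be sketched like this. `FakeScope`, `WaitUntilReady`, and `GrabData` are hypothetical; in the real bridge the second check corresponds to handling a status such as `PICO_NO_SAMPLES_AVAILABLE` from the vendor API.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the instrument; not the Pico SDK.
struct FakeScope
{
    std::mutex mutex;       // guards the two flags below
    bool armed = false;
    bool dataReady = false;
};

enum class GrabResult { Ok, NoSamples };

// Poll for readiness. The lock is held only for the check itself, never across the sleep.
bool WaitUntilReady(FakeScope& scope, int maxPolls)
{
    assert(maxPolls > 0);
    for(int i = 0; i < maxPolls; i++)
    {
        {
            std::lock_guard<std::mutex> lock(scope.mutex);
            if(scope.armed && scope.dataReady)
                return true;
        }   // lock released here, before sleeping
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return false;
}

// Another thread may have stopped the scope since WaitUntilReady() returned,
// so re-check under the lock and report "no samples" instead of assuming success.
GrabResult GrabData(FakeScope& scope)
{
    std::lock_guard<std::mutex> lock(scope.mutex);
    if(!scope.armed || !scope.dataReady)
        return GrabResult::NoSamples;
    scope.dataReady = false;
    scope.armed = false;
    return GrabResult::Ok;
}
```

The caller treats `NoSamples` as a harmless miss and goes back to polling, which matches the "this seems fine" conclusion above.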
<noopwafel> and thanks, StartCapture clause is exactly what I needed there, wasn't understanding
<azonenberg> yeah this is very new code still that i'm still figuring out myself
<azonenberg> so there's not much in the way of documentation and commenting compared to libscopehal which is a fairly mature decade-old codebase especially in the core classes like Oscilloscope
<noopwafel> the code at the end of WaveformServerThread probably also needs a rethink, where it takes a lock it already has, and checks if(g_captureMemDepth != g_memDepth)
<noopwafel> but for now I'm just trying for a minimum viable product for 3000 :)
<azonenberg> It does not have the mutex already
<azonenberg> that's what the curly braces before stopping the trigger are for
<azonenberg> We only hold the mutex when acquiring the data
<noopwafel> oh damn you're right
<azonenberg> then release it when doing socket stuff
<noopwafel> hmm
<noopwafel> well that definitely explains why the numSamples thing broke :|
<azonenberg> Yeah that was a legitimate bug
<azonenberg> race if stuff was changed while not holding the mutex
<noopwafel> ok, that makes a lot more sense then! the state copies to globals because, it might change.
<azonenberg> in fact, that was the point of numSamples, a local copied version that was guaranteed to be consistent even if stuff happened in the scpi thread
<azonenberg> i just derped and didnt use it everywhere i was supposed to :p
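The idea behind a local `numSamples` copy can be illustrated as below. `SharedState` and `SnapshotWaveform` are hypothetical names; the point is only that the depth and the data are captured together under the lock, and everything after that uses the local copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical shared state touched by both the SCPI thread and the waveform thread.
struct SharedState
{
    std::mutex mutex;
    std::size_t memDepth = 4;
    std::vector<std::int16_t> buffer = std::vector<std::int16_t>(4, 0);
};

// Copy the sample count and the samples while holding the lock.
// After this returns, 'numSamples' and the returned vector stay consistent
// with each other even if another thread changes memDepth.
std::vector<std::int16_t> SnapshotWaveform(SharedState& state, std::size_t& numSamples)
{
    std::lock_guard<std::mutex> lock(state.mutex);
    numSamples = state.memDepth;
    assert(numSamples <= state.buffer.size());
    return std::vector<std::int16_t>(state.buffer.begin(), state.buffer.begin() + numSamples);
}
```

The socket code would then send `numSamples` and exactly that many samples, never re-reading the shared depth after the lock is released.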
<noopwafel> somehow I'm still getting <4WFM/s with 1MS depth
<azonenberg> We used to have a sign on the wall at work
<noopwafel> but it's because it's only triggering 4 times a second
<azonenberg> "You must be this tall to write multithreaded code" with a big line
<azonenberg> taped just below the ceiling
<azonenberg> noopwafel: is it faster with the pico tool?
<noopwafel> yeah, way faster
<azonenberg> Check right now
<azonenberg> is it slow?
<azonenberg> i've seen weird cases of the usb3 connection degrading to usb2 until i unplug and replug the cable
<azonenberg> havent figured out how that happens yet
<noopwafel> so the thing is that I get a usb reset anyway when I start any app
<noopwafel> because it does a port reset when it switches to usb power
<_whitenotifier-3> [scopehal-pico-bridge] azonenberg pushed 1 commit to master [+0/-0/±1] https://git.io/J3Qcl
<_whitenotifier-3> [scopehal-pico-bridge] azonenberg 499cb6b - ON and OFF commands now support MSO pods.
<azonenberg> ah ok you are probably fine then
<_whitenotifier-3> [scopehal] azonenberg pushed 1 commit to master [+0/-0/±2] https://git.io/J3Qca
<_whitenotifier-3> [scopehal] azonenberg 61faae0 - PicoOscilloscope: can now detect and enable MSO pods if present. No support for configuring threshold or hysteresis. Cannot yet download actual waveform data. See #457.
<noopwafel> so my hacky bridge was using rapid block mode, and arming the scope before pulling the data
<noopwafel> while right now I'm not arming the scope until after the data transfer is done, like your 6000 code
<noopwafel> but I would .. still expect much better performance than this
<noopwafel> hm, I guess also the socket write is done before the scope arm, I wonder if that's a bottleneck for some reason
<azonenberg> Yeah that can be improved
<azonenberg> clientside, i'm not vectorizing the int16 to fp32 conversion yet either
<azonenberg> i have a ticket for doing that but wanted to refactor some stuff so i could have only one shared implementation in the Oscilloscope class for that
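For reference, the scalar form of the conversion being discussed might look like the following. The `raw * scale - offset` convention and the function name `ConvertSamples` are assumptions for illustration; the vectorized version would process several samples per instruction but compute the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar int16 -> fp32 conversion. Assumed convention: volts = raw * scale - offset.
// A simple loop like this is also a reasonable candidate for compiler auto-vectorization.
void ConvertSamples(const std::int16_t* in, float* out, std::size_t count, float scale, float offset)
{
    assert(count == 0 || (in != nullptr && out != nullptr));
    for(std::size_t i = 0; i < count; i++)
        out[i] = static_cast<float>(in[i]) * scale - offset;
}
```

Keeping one shared routine like this in a base class is what would let every driver pick up a faster implementation at once.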
<noopwafel> right, that's indeed the culprit
<noopwafel> ran the client with --debug and my old friend 'Queue is too big, sleeping' :) well that's good news
<azonenberg> So that means you're giving glscopeclient waveforms faster than it can process and render them
<azonenberg> gpu side bottleneck perhaps?
<azonenberg> not sure if you saw or not but https://github.com/azonenberg/scopehal-apps/issues/338
<azonenberg> my plan is to have a preference for three different shader configs
<azonenberg> full intensity grading all the time, like we have now
<azonenberg> intensity grading always disabled, like the picoscope software (just solid colors)
<azonenberg> or using a timer to select one or the other based on measured FPS
<azonenberg> so it will intensity grade until FPS drops too low then switch to the fast but less pretty path
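The three-way preference could be reduced to a small selection function like this sketch. The enum names, `SelectShader`, and the `minFps` threshold are all hypothetical; nothing here is taken from the glscopeclient source.

```cpp
#include <cassert>

// Hypothetical preference values and shader paths.
enum class GradingPref { AlwaysOn, AlwaysOff, Auto };
enum class ShaderPath { IntensityGraded, Flat };

// Pick the rendering path from the user preference and the most recent FPS measurement.
// In Auto mode, grading is kept until the frame rate drops below 'minFps'.
ShaderPath SelectShader(GradingPref pref, double measuredFps, double minFps)
{
    assert(minFps > 0.0);
    switch(pref)
    {
        case GradingPref::AlwaysOn:
            return ShaderPath::IntensityGraded;
        case GradingPref::AlwaysOff:
            return ShaderPath::Flat;
        case GradingPref::Auto:
            return (measuredFps < minFps) ? ShaderPath::Flat : ShaderPath::IntensityGraded;
    }
    return ShaderPath::IntensityGraded;  // not reached for valid enum values
}
```

A practical version would probably add some hysteresis or averaging on the FPS measurement so the display does not flicker between the two paths near the threshold.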
<noopwafel> is it not truncating what gets handed to the shader based on what's on the screen?
<azonenberg> So it depends on where the bottleneck is. It always copies the entire waveform to GPU memory but that is normally not the issue
<azonenberg> (this allows much faster scrolling once you pause a capture)
<azonenberg> The actual rendering shader is essentially doing a whole bunch of 1-pixel wide alpha blended rectangles
<azonenberg> For each pair of samples that crosses a given pixel, it finds the min and max Y coordinate of the line segment between them
<azonenberg> then fills between those coordinates with the selected alpha value
<azonenberg> compositing in a local buffer with 32 threads for each X coordinate sharing one common working buffer processing different samples
<azonenberg> So you can get fill rate limited if you have say a 128M point waveform all on screen
<noopwafel> so I will have to install perf, but it looks like most of the time is spent in OnWaveformDataReady
<azonenberg> writing to shared memory
<azonenberg> The non intensity graded version will just do a running min/max reduction then one fill at the end
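A CPU-side sketch of the running min/max reduction for the non-graded path, under simplifying assumptions: samples are already in pixel coordinates, each line segment is attributed to the column of its left endpoint, and no interpolation is done at column boundaries. `Span` and `ReduceColumns` are hypothetical names; the real work happens in a GPU shader.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Vertical extent to fill in one pixel column. Starts out empty (ymin > ymax).
struct Span
{
    float ymin = std::numeric_limits<float>::infinity();
    float ymax = -std::numeric_limits<float>::infinity();
    bool Empty() const { return ymin > ymax; }
};

// For each pair of consecutive samples, widen the span of the column holding the
// left sample so that it covers the Y range of the segment between the two samples.
// The result is one [ymin, ymax] fill per column instead of one blend per segment.
std::vector<Span> ReduceColumns(const std::vector<float>& x, const std::vector<float>& y, std::size_t width)
{
    assert(x.size() == y.size());
    std::vector<Span> cols(width);
    for(std::size_t i = 0; i + 1 < x.size(); i++)
    {
        if(x[i] < 0.0f || x[i] >= static_cast<float>(width))
            continue;  // segment starts off screen
        Span& s = cols[static_cast<std::size_t>(x[i])];
        s.ymin = std::min(s.ymin, std::min(y[i], y[i + 1]));
        s.ymax = std::max(s.ymax, std::max(y[i], y[i + 1]));
    }
    return cols;
}
```

Because each column ends up with a single span no matter how many samples land in it, the fill cost scales with the display width rather than the waveform length.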
<noopwafel> hilariously, if I hit ctrl-c at a bunch of random points in gdb, it always hits at the code deleting old waveforms in HistoryWindow
<azonenberg> Interesting
<azonenberg> and i am a huge fan of vtune, that's what i've used for all of my performance work
<noopwafel> and yes, the non-intensity graded shader would be a real plus
<azonenberg> That, and some tweaks to the intensity graded one, are top of my list once i finish MSO support in the pico driver
<noopwafel> 13% of runtime spent in those deletions :)
<azonenberg> Lovely. I remember having optimized some other stuff around there but maybe there's more room to tune
<azonenberg> I'm constantly running glscopeclient under profilers and tweaking stuff
<azonenberg> but usually i'm focused on a particular driver or rendering path or filter
<azonenberg> File a ticket against scopehal-apps tagged performance, include your profiling data, and just make a generic "look into seeing if we can make this faster"
<azonenberg> tag it as v0.2 for now
<noopwafel> another 17% of time is spent in Resize() inside PicoOscilloscope::AcquireData, and the rest is omp or gpu. will do.
<azonenberg> this is my breakdown inside of AcquireData() for a 1-minute run
<azonenberg> much of this is going to be improved when i vectorize
<noopwafel> Way more like I'd expect.
<azonenberg> the code to do this is already in the lecroy driver, it just has to be refactored into something generic
<azonenberg> this is the full dump
<azonenberg> The MapBuffers() stuff might be possible to improve, i think it might be possible to avoid constantly creating and destroying mappings and just have one long lived mapping instead
<azonenberg> but i will have to figure out how to make sure everything is cached and flushed properly
<azonenberg> that's on my longer term todo list
<noopwafel> http://noopwafel.net/tmp_glscopeprofile.png for curiosity value
<noopwafel> am I not compiling with optimizations or something..
<azonenberg> Good question
<azonenberg> Try making sure you build with -DCMAKE_BUILD_TYPE=RELEASE
<azonenberg> I'm not actually sure what it does if you don't explicitly specify debug or release
<noopwafel> judging by this profile, I suspect .. debug
<azonenberg> well it might be an interim i mean
<azonenberg> with neither the debug or release flags
<azonenberg> i.e. no optimizations but not defining _DEBUG
<noopwafel> *nod*
<azonenberg> (if that is the case, file a ticket for that... we should default to release if unspecified)
<bvernoux> noopwafel, how many WFM/s do you have with Picoscope3000 ?
<noopwafel> well right now about 4 :D
<bvernoux> because of bugs ?
<noopwafel> mmhm
<bvernoux> as I think we could be not far from Pico6000
<noopwafel> haha wow yes ok it was totally in debug mode
<azonenberg> how fast is it now? lol
<bvernoux> yes ?
<bvernoux> I hope at least 20WFM/s ;)
<bvernoux> I remember having very good results with my Picoscope3406DMSO using the API
<bvernoux> with my native code using the API to break AES ;)
<noopwafel> it's not great but it's managing ~9WFM/s at 10MS depth now
<azonenberg> noopwafel: with how many channels?
<noopwafel> just the one
<azonenberg> So 90 Msps streaming throughput?
<azonenberg> That's still 1.44 Gbps of waveform data, which isn't awful by any means
<azonenberg> Probably can still be optimized though
<bvernoux> IIRC the API can reach more than 200MSPS in realtime over USB3.0
<noopwafel> yes, I think it's still cpu/gpu limited by my laptop
<bvernoux> so it could reach probably 20WFM/s in theory ;)
<bvernoux> which is 20x better than RigolMSO5000 haha
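For reference, the throughput figures quoted above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the 16 bits per sample is an assumption inferred from the 90 Msps to 1.44 Gbps conversion, and the function names are made up for illustration.

```python
# Back-of-the-envelope throughput arithmetic for the numbers in the discussion.
# ASSUMPTION: 16 bits per sample (this is what 90 Msps -> 1.44 Gbps implies).
BITS_PER_SAMPLE = 16


def sample_throughput(wfm_per_sec, depth_samples, channels=1):
    """Samples per second moved for a given waveform rate and memory depth."""
    return wfm_per_sec * depth_samples * channels


def bit_rate(samples_per_sec, bits_per_sample=BITS_PER_SAMPLE):
    """Raw bit rate of the sample stream, ignoring protocol overhead."""
    return samples_per_sec * bits_per_sample


def max_wfm_rate(link_samples_per_sec, depth_samples, channels=1):
    """Upper bound on waveforms per second if the link is the only limit."""
    return link_samples_per_sec / (depth_samples * channels)


sps = sample_throughput(9, 10_000_000)   # ~9 WFM/s at 10 MS depth, one channel
print(sps / 1e6, "Msps")                 # 90 Msps
print(bit_rate(sps) / 1e9, "Gbps")       # 1.44 Gbps
print(max_wfm_rate(200e6, 10_000_000), "WFM/s")  # 20 WFM/s at 200 MSPS
```

The last line is the "20 WFM/s in theory" estimate: it assumes the 200 MSPS streaming figure applies to a single channel and that nothing else (CPU, GPU, trigger rearm) gets in the way.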
<azonenberg> What does profiling look like *now*?
<noopwafel> ~50% omp, 50% gpu
<azonenberg> Do you have OMP_WAIT_POLICY=PASSIVE?
<azonenberg> it defaults to a ton of gross busy waiting without that env var set
<azonenberg> At some point i'm probably going to make it explicitly setenv and exec to force it
<noopwafel> no, I forgot to include it in the perf line, but I'm also only doing one waveform
<azonenberg> Doesnt matter
<azonenberg> it makes a huge difference in my testing
<azonenberg> Try outside perf to see WFM/s too
<bvernoux> If you want a private funny video
<bvernoux> to break AES ;)
<azonenberg> you end up getting stuck in spinlocks constantly otherwise
juli9610 has quit [Quit: Nettalk6 - www.ntalk.de]
<azonenberg> There's a reason i print a big warning about it during startup if it's set wrong
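The "setenv and exec" idea mentioned above could look roughly like the following. This is a Python sketch of the pattern only, not glscopeclient code (which is C++ and would use `setenv()` and `execv()` instead); the helper names are hypothetical. The re-exec is presumably needed because the OpenMP runtime reads the variable when it initializes, so changing it inside an already running process is too late.

```python
import os
import sys


def env_with_passive_wait(env):
    """Return (new_env, changed): a copy of env with OMP_WAIT_POLICY=PASSIVE.

    `changed` is True when the variable was missing or set to something else
    (compared case-insensitively). The input mapping is not modified.
    """
    new_env = dict(env)
    changed = new_env.get("OMP_WAIT_POLICY", "").upper() != "PASSIVE"
    new_env["OMP_WAIT_POLICY"] = "PASSIVE"
    return new_env, changed


def reexec_with_passive_wait():
    """Re-run the current program with the variable forced, if it was not set.

    Hypothetical helper: call it first thing at startup. If the policy is
    already PASSIVE this is a no-op, so the re-exec happens at most once.
    """
    new_env, changed = env_with_passive_wait(os.environ)
    if changed:
        os.execve(sys.executable, [sys.executable] + sys.argv, new_env)
```

Without something like this, the equivalent manual step is launching with `OMP_WAIT_POLICY=PASSIVE` set in the shell environment.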
<noopwafel> that is a bit better, and now it's spending basically all the time doing something in the ScopeThread
<azonenberg> Great
<azonenberg> Because you are now probably bottlenecked on the non-AVX waveform download
<azonenberg> Which is probably going to quadruple in speed once i port the lecroy vectorized implementation :p
<noopwafel> ah I was going to ask
<bvernoux> quadruple the speed really ?
<bvernoux> it would be very nice
<azonenberg> We're going to find out
<azonenberg> It's next on my list for the pico driver once i finish MSO support
<bvernoux> I will try noopwafel's Pico3000 support when there is something ready
<bvernoux> I'm very interested to test it
<azonenberg> noopwafel: oh and also the lecroy implementation uses openmp threading to process multimillion point waveforms on multiple cores too
<azonenberg> In addition to vectorizing within each core
<azonenberg> the pico version does none of this
<azonenberg> Actually, interestingly it looks like I don't currently have an AVX implementation for 16-bit samples in lecroy either
<noopwafel> I think you can just apply https://github.com/noopwafel/scopehal-pico-bridge/commit/HEAD if you want to play
<azonenberg> so HDOs are going to be slow
<noopwafel> the only changes I made to scopehal are to add a new model for the 3000-series, but SERIES_UNKNOWN is fine too
<noopwafel> going to try this on a more powerful machine now, but first have to fix up the dependencies etc
<bvernoux> noopwafel, is it stable?
<bvernoux> my PC is clearly not a beast too so it will be interesting ;)
<azonenberg> well like i said the cpu usage of the pico driver is going to be massively improved soon
<noopwafel> it seems pretty reliable but it doesn't cope well with disconnects
<azonenberg> It's supposed to recover from disconnects but that's not well tested
<bvernoux> GeforceGT 650M + CoreI7-3630QM Asus
<noopwafel> after some fixes I will ask for a review and we can maybe just push this
<noopwafel> it's a very small diff
<bvernoux> ah ok, if the only issue is with disconnects then it is usable. My question was just whether, once launched, it crashes after 500 WFM (as it did with the Rigol MSO5000 because of Rigol bugs/workarounds not applied in the past)
<noopwafel> the only other thing I had in my bridge, was better triggers
<noopwafel> so I might give that a go next week, since it's probably interesting
<bvernoux> ha yes more advanced triggers will be interesting but I think just basic trigger is already very nice if it is fully usable
<noopwafel> well let me try fixing this debian install
<noopwafel> and then I can leave it running for a bit
<azonenberg> Yeah triggers still need work. MSO support i wanted to get done first as triggers are reasonably self contained
<bvernoux> yes MSO support will be very very nice ;)
<bvernoux> as I have bought Picoscope for that in fact to have synchronized Scope chan with MSO
<bvernoux> noopwafel, do you have MSO option on your Pico3000 ?
<noopwafel> nop
<bvernoux> ha ok
<noopwafel> it is also only 100mhz 4ch
<bvernoux> ha ok
<noopwafel> I can probably bully someone into lending me an mso-capable 3000 if it helps
<bvernoux> when I have time I will try to help with mine which is full option
<bvernoux> 3406DMSO
<bvernoux> it was a refurbished Pico3000 ;)
<bvernoux> but with full warranty from Pico
<bvernoux> A demo product IIRC
<bvernoux> Bought mainly to break AES things at start but planning to use it with API for other stuff ;)
<noopwafel> atm I am still penniless phd student, so this was already kinda out of my budget, but it's nice to have for SCA
<noopwafel> would've been nicer if I'd actually had any time at all over the last few months, hence my disappearance from online, but that's dealt with now.. :P
<bvernoux> It is why sometimes it is super useful to have Analog+Digital at same time => https://hydrabus.com/rhme2/rhme2_BL_load_fiesta_startdata.png
<bvernoux> to correlate IO with power consumption for synchro and to recover AES key later (with lot of traces)
<_whitenotifier-3> [scopehal-apps] noopwafel opened pull request #339: default to Release cmake build - https://git.io/J3Qwq
<noopwafel> yeah, I can imagine it is nice :)
<noopwafel> definitely worth supporting
<d1b2> <wintermute> what's the min bandwidth that you need for SCA?
<azonenberg> Depends on how fast your target is
<d1b2> <wintermute> something like 4x faster should be enough?
<azonenberg> 4x what though
<d1b2> <wintermute> like, 4 samples/target clock
<azonenberg> I would want bandwidth at least equal to double the target clock (internal clock, which might be a PLL multiple of the external input)
<azonenberg> probably more
<azonenberg> and sample rate obviously has to be double BW per nyquist but more would be preferable
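The rule of thumb above (bandwidth at least twice the internal core clock, sample rate at least twice the bandwidth) can be written down as a tiny calculator. This is a sketch of that rule only; the function name and the `pll_multiplier` parameter are illustrative, and the results are floors, since both figures come with "more would be preferable".

```python
def sca_minimums(external_clock_hz, pll_multiplier=1.0):
    """Return (core_clock, min_bandwidth, min_sample_rate) in Hz.

    core clock      = external clock * PLL multiplier (the internal clock)
    min bandwidth   = 2 * core clock   (rule of thumb from the discussion)
    min sample rate = 2 * bandwidth    (Nyquist)
    """
    core = external_clock_hz * pll_multiplier
    bandwidth = 2 * core
    sample_rate = 2 * bandwidth
    return core, bandwidth, sample_rate


for label, clock in (("8 MHz target", 8e6), ("48 MHz core", 48e6)):
    core, bw, sr = sca_minimums(clock)
    print(f"{label}: BW >= {bw / 1e6:.0f} MHz, sample rate >= {sr / 1e6:.0f} MS/s")
```

Put differently, the rule works out to at least 4 samples per internal clock cycle, with the caveat that a PLL-multiplied core clock raises the requirement accordingly.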
<bvernoux> for 8MHz Target I use 50MSPS which is largely enough
<LeoBodnar> guys at Pico use quite a few SMA pulsers for scope testing
<azonenberg> As a data point, i havent broken this yet but am looking at a board with 48 MHz core clock for work. I'm sampling at 1 Gsps on a 4 GHz scope with a 200 MHz bandwidth limiter in the frontend
<d1b2> <wintermute> yeah I had something like that in mind, like, I played with SCA in python with lascar, and it has a simulation engine for samples
<d1b2> <wintermute> and by default it gives something like 7 samples per target clock
<bvernoux> for 48MHz core clock 500MSPS is enough in fact depending on the leakage ....
<azonenberg> And there are strong frequency components in my FFT out to 375 MHz
<bvernoux> but more is better as with decimation there is some precious lsb recovered ...
<bvernoux> it is why 12bits is very welcome for SCA in some case
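The point about decimation recovering LSBs can be illustrated with the standard oversampling result: averaging N samples improves resolution by about 0.5 * log2(N) bits. That formula is not stated in the discussion; it is the textbook figure and assumes the noise is uncorrelated between samples and at least around one LSB in amplitude. The function names below are made up for this sketch.

```python
import math


def decimate_by_averaging(samples, factor):
    """Boxcar decimation: average each full block of `factor` samples.

    Any trailing partial block is dropped.
    """
    usable = len(samples) - len(samples) % factor
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, usable, factor)]


def extra_bits(factor):
    """Ideal resolution gain in bits from averaging `factor` samples.

    ASSUMPTION: white, uncorrelated noise; real captures will gain less.
    """
    return 0.5 * math.log2(factor)


print(decimate_by_averaging([1, 2, 3, 4, 5, 6, 7, 8], 4))  # two averaged points
print(extra_bits(4), extra_bits(16))                        # bits gained
```

So sampling 4x faster than strictly needed and decimating buys roughly one extra bit in the ideal case, which is one way an 8-bit scope can get closer to what a higher resolution one sees.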
<d1b2> <wintermute> yeah I imagine it's super sensitive to noise
<d1b2> <wintermute> very cool
<azonenberg> I plan to be adding more useful SCA features to glscopeclient longer term
<bvernoux> but the key factor is to power the target with an ultra low noise LDO ;)
<azonenberg> but right now i don't yet know what i don't know yet :p
<bvernoux> and after that even with a 8bits scope you see lot of things ;)
<azonenberg> This was my first attempt at doing any power analysis work and i havent figured out all the details yet
<bvernoux> with a diff probe like the one provided with Chipwhisperer it is very good for the price
<d1b2> <wintermute> I don't own any scope, the closest I have is a saleae logic 8
<bvernoux> example of what you see
<bvernoux> it is a basic example
<bvernoux> where you can push AES clear data at input and retrieve AES crypto at output
<d1b2> <wintermute> I've been searching for a good value scope, so far I'm into the picoscope
<bvernoux> yes for AES stuff Picoscope is a must have ;)
<azonenberg> I'm pretty happy with mine so far
<bvernoux> I think the Pico6000E 5GS 8/10/12 Flex shall be amazing also but a bit expensive
<azonenberg> Yeah, i got mine free as a dev unit
<azonenberg> I think MSRP on it was something around 15K USD for this configuration (6824E + two MSO pods)
<bvernoux> azonenberg, yours is like the latest but limited to 500MHz BW ?
<d1b2> <wintermute> oh wow, very cool
<azonenberg> bvernoux: Correct
<azonenberg> but 8 channels instead of 4
<azonenberg> it was the flagship model at the time i got it, the 1 GHz wasn't out yet
<bvernoux> azonenberg, yes 8chan can be nice for some stuff
<bvernoux> azonenberg, I see the new Pico6000E is only available in 4chan but we see 8chan is planned
<d1b2> <wintermute> I'd probably go for something cheap, the 50MHz seems to have most of the features
<bvernoux> if you are interested by basic setup for fun AES recovery => https://hydrabus.com/rhme2/Picoscope_CW_HydraRHME2_SCA_setup1.jpg
<d1b2> <wintermute> biggest selling point for me is the software 🙂
<bvernoux> it is pretty old but works fine on any MCU up to 24MHz
<bvernoux> here it was an ATMEGA328p IIRC
<d1b2> <wintermute> bvernoux: what's your setup there?
<azonenberg> bvernoux: I don't know if an 8ch 1 GHz is planned or not, i havent asked
<azonenberg> I've known about the 4ch 1 GHz since early this year
<bvernoux> I have designed a board to cleanup anything and transplanted the ATMEGA328P on it ;)
<bvernoux> yes it is like cheating but it was for fun to have the most clean recovery ;)
<d1b2> <wintermute> beautiful work 🙂
<bvernoux> and the AES key recovery can be done with 15 traces ;)
<d1b2> <wintermute> oh wow
<d1b2> <wintermute> when I did it, I think I needed > 100 traces
<d1b2> <wintermute> (data from a lascar simulator)
<d1b2> <wintermute> it was a hardware security challenge from hackthebox
<bvernoux> yes we have done that with HydraBus team ;)
<bvernoux> my friend won a Ledger Nano ;)
<d1b2> <wintermute> sorry, I said it wrong
<d1b2> <wintermute> I needed more than 1000 traces 🙂
<bvernoux> but it was not fun like RHME2 or RHME3
<bvernoux> I do not like things related to bitcoin ...
<d1b2> <wintermute> personally I don't like anything related to bitcoin
<noopwafel> min bandwidth for SCA is a few samples per run :p
<bvernoux> I was speaking about this one https://donjon.ledger.com/Capture-the-Fortress/
<bvernoux> from 2020
<bvernoux> my preferred was clearly RHME2 ;)
<d1b2> <wintermute> nice!
<d1b2> <wintermute> these things get super competitive
<d1b2> <wintermute> well, it's a competition
<bvernoux> yes
<d1b2> <wintermute> winning is super badass
<bvernoux> with some big labs behind
<bvernoux> like Thales ...
<bvernoux> it is why it is fun to win with a poor Pico3000 when such labs have laser and 100kUSD scopes ;)
<d1b2> <wintermute> you folks are super badass
<bvernoux> for RHME2 with a poor Rigol it was possible to do the same
<bvernoux> with cheap HW but it will be longer to recover the AES on some challenges ;)
<bvernoux> especially when some challenges required 100k traces
<noopwafel> now do it with an arduino :)
<noopwafel> but ye people really did some amazing things on the rhme challenges
<noopwafel> was very nice
<bvernoux> yes especially with an ATMEGA328p
<bvernoux> I have practically broken the secure bootloader
<d1b2> <wintermute> oh now that I saw the image in the root readme, I recognize it
<bvernoux> but I was caught by one of their hidden security checks and the board was not booting anymore ;)
<d1b2> <wintermute> LiveOverflow made a series of videos making an intro to SCA, with content from this competition
<bvernoux> as there was lot of anti-glitch sw stuff ;)
<bvernoux> which are very effective in fact on a non secure MCU
<bvernoux> when it is done correctly
<d1b2> <wintermute> hardware security is hard
<bvernoux> yes today HW security is weak
<bvernoux> all ARM TrustZone are just big joke ;)
<bvernoux> as they do not protect from glitch attack or clock attacks ...
<d1b2> <wintermute> and sometimes the manufacturer fucks up
<bvernoux> yes
<bvernoux> especially when they hide what the boot is doing ;)
<bvernoux> NXP is famous for that but also ST
<d1b2> <wintermute> yeah, that should be illegal
<bvernoux> I do not know why they do not want to publish the boot code source
<bvernoux> (because it is not done very well)
<bvernoux> I think ST are the most open anyway
<bvernoux> I reversed LPC4370 BL years ago
<bvernoux> what a mess inside ;)
<bvernoux> full of hidden peripherals
<d1b2> <wintermute> D:
<bvernoux> this MCU was clearly a beast compared to all other MCU
<noopwafel> so the bad news is that on my desktop, I'm also getting <10WFM/s
<bvernoux> it was designed for Medical usage in fact ;)
<bvernoux> it is why it is the only one to have triple ADC 12 bits at up to 80MSPS ;)
<bvernoux> you can search, there is no other MCU or CPU in the world with such a feature ;)
<d1b2> <wintermute> oh wow
<bvernoux> in addition the ADC is very good even with 3 core running with USB HS 2.0 and tons of IO/Peripherals
<bvernoux> better than external 12bits ADC ;)
<d1b2> <wintermute> 12b at 80MSPS is much more than HS 2.0 can handle
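A quick check of that claim, as a sketch: it compares the raw data rate of a single 12-bit, 80 MSPS stream against the 480 Mbps USB 2.0 high-speed signalling rate. Usable USB payload throughput is lower than the signalling rate, so the real gap is larger than this shows.

```python
USB2_HS_SIGNALLING_BPS = 480_000_000  # USB 2.0 high-speed raw signalling rate


def adc_bit_rate(bits_per_sample, samples_per_sec):
    """Raw data rate of an ADC stream with tightly packed samples."""
    return bits_per_sample * samples_per_sec


rate = adc_bit_rate(12, 80_000_000)
print(rate / 1e6, "Mbps of raw ADC data")
print(rate / USB2_HS_SIGNALLING_BPS, "x the USB 2.0 HS signalling rate")
```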
<noopwafel> so I guess maybe this is simply the limit
<bvernoux> yes to reach 80MSPS can be used only with their SGPIO
<noopwafel> with the wait for the transfer
<bvernoux> and requires an external FPGA in fact ;)
<azonenberg> noopwafel: i'm hopeful we will see a speedup when i vectorize the processing
<d1b2> <wintermute> super cool chip
<azonenberg> but i guess we'll see
<noopwafel> azonenberg: I don't appear to be cpu-limited now
<noopwafel> ScopeThread is <50% cpu
<bvernoux> yes even today it is a beast (after more than 8 years) when other MCU do always the same stuff with crappy slow ADC ;)
<noopwafel> oh noo I forgot the openmp env
<noopwafel> ok
<d1b2> <wintermute> it could be the base of a very cool SDR
<d1b2> <wintermute> 80MHz bw is enough to cover all bluetooth channels
<noopwafel> nop, still at ~9.8WFM/s. that's not bad though (and I mean you can get much much higher if you go below 10MS).
<noopwafel> and I expect it'd be much better in rapid block mode, will try this at some point.
<azonenberg> noopwafel: using less cpu though?
<noopwafel> yes
<azonenberg> I mean. 9.8 WFMs * 10M points is 98 Msps, or 1.56 Gbps of sample data
<azonenberg> you're not maxing out usb3 yet, but it's not *bad* either
<azonenberg> try with more channels, see how throughput scales?
<noopwafel> much worse, alas
<noopwafel> I'll add some debug later to confirm my suspicions
<noopwafel> glscopeclient is *lovely* to use on a machine with enough hw though
<noopwafel> can I reset the stats somehow?
<noopwafel> it also occasionally stops triggering, so I guess the logic there is still bad somewhere
<noopwafel> bvernoux: http://noopwafel.net/tmp_pico_wfm.png <- it made it to >20k WFMs while I was changing v/div and depth and speed etc, so I guess it's stable enough
<bvernoux> ha great !!
<bvernoux> also you obtain 10WFM/s but with 10Msamples
<bvernoux> I think it is same perf as Lecroy
<bvernoux> bye
bvernoux has quit [Quit: Leaving]
juli9610 has joined #scopehal
Famine_ has joined #scopehal
Famine- has quit [Ping timeout: 246 seconds]
juli9611 has joined #scopehal
juli9610 has quit [Ping timeout: 252 seconds]
<_whitenotifier-3> [starshipraider] azonenberg pushed 3 commits to master [+38/-0/±5] https://git.io/J35m8
<_whitenotifier-3> [starshipraider] azonenberg d9a1d9a - SMA test simulations
<_whitenotifier-3> [starshipraider] azonenberg c3b6324 - Lots more simulations for sma-test board
<_whitenotifier-3> [starshipraider] azonenberg 1a034b9 - Final version of sma-test board for fab
<d1b2> <mubes> Hmmm...did something change in glscopeclient? Thresholding a 10MS analog channel hangs the GUI...well, slows it to an unusable point, at least
<d1b2> <mubes> (Repeats)
#13 0x00007fffb6e8d5c6 in () at /lib/x86_64-linux-gnu/libnvidia-glcore.so.460.73.01
#14 0x00007fffb6b2e7d9 in () at /lib/x86_64-linux-gnu/libnvidia-glcore.so.460.73.01
#15 0x00005555558064bc in ShaderStorageBuffer::Map(unsigned long, unsigned int) (this=0x555557a0e9b0, size=40000000, access=35001) at /home/dmarples/Builds/scope/scopehal-apps/src/glscopeclient/ShaderStorageBuffer.h:79
#16 0x0000555555802df4 in WaveformRenderData::MapBuffers(unsigned long, bool) (this=0x555557a0e990, width=1278, update_waveform=true) at /home/dmarples/Builds/scope/scopehal-apps/src/glscopeclient/WaveformArea_rendering.cpp:84
#17 0x0000555555803a5a in WaveformArea::MapAllBuffers(bool) (this=0x555556c7d000, update_y=true) at /home/dmarples/Builds/scope/scopehal-apps/src/glscopeclient/WaveformArea_rendering.cpp:323
<d1b2> <mubes> I'll pull everything and do a clean build, just in case
<d1b2> <mubes> still the same 😦
<azonenberg> mubes: interesting, because i recently made some optimizations to a couple of shaders but on my 2080 they almost doubled performance
<azonenberg> gimme a minute
<azonenberg> it's possible i introduced a regression
<d1b2> <mubes> Well, for 'normal' manipulation of a waveform it certainly feels faster
<azonenberg> yeah i bet i broke something for digital waveforms when i did the analog optimizations
<d1b2> <mubes> Just tried again and it didn't lock this time, but didn't create the thresh'ed wave either
<d1b2> <mubes> I'm on a Quadro P1000
<azonenberg> Ok i just reproduced. Apparently sparse digital waveforms are fine but dense packed ones have a regression
<azonenberg> investigating
<d1b2> <mubes> Don't rush on my behalf, I'm experimenting with early nights for a couple of weeks 🙂
<_whitenotifier-3> [scopehal] azonenberg pushed 1 commit to master [+0/-0/±1] https://git.io/J35RM
<_whitenotifier-3> [scopehal] azonenberg 43578be - DemoOscilloscope: set dense pack flag on generated waveforms
<_whitenotifier-3> [scopehal-apps] azonenberg pushed 3 commits to master [+0/-0/±3] https://git.io/J35Rp
<_whitenotifier-3> [scopehal-apps] azonenberg f6d7f1f - Found a few more spots we should have cleared persistence, but didn't
<_whitenotifier-3> [scopehal-apps] azonenberg 311acb5 - Updated submodules
<_whitenotifier-3> [scopehal-apps] azonenberg 602a811 - WaveformArea: Fixed regression causing failure to render dense packed digital waveforms
<azonenberg> mubes: fixed
<d1b2> <mubes> @azonenberg tested 🙂
<d1b2> <mubes> bril, thx!!
<azonenberg> tl;dr i optimized out a memcpy too aggressively
<azonenberg> the shader still tried to use the value but i wasn't writing it
<azonenberg> so the shader read uninitialized memory and Bad Things (tm) happened
<azonenberg> The long term fix is to produce a version of the digital rendering shader with the same optimization and remove the copy for that case too
<azonenberg> but short term this produces correct results, just a little slower
<d1b2> <mubes> Well, It's certainly an improvement from previous...it's still a bit painful at high zoomout on a 10MS waveform, but it seems better than it was.
<azonenberg> Yeah there's definitely still more tuning happening. On my 2080 Ti, I can render a 128M sample waveform at about 6 WFM/s
<azonenberg> 6 FPS*
<azonenberg> with all of it on screen at once
<azonenberg> Intensity grading uses a lot of GPU power. I plan to make a speed-optimized renderer without all of the pretty shading that can be selected in preferences
<d1b2> <mubes> This is not a fast card in comparison
<azonenberg> and then probably also add various other stuff like decimation, as well as general optimization
<azonenberg> Once nvidia comes out with a version of VTune where the shader profiler supports OpenGL, i expect to make more progress
<azonenberg> Their current profiler can view overall perf counter stats across the entire shader which is good for roughly determining what you're bottlenecked on
<azonenberg> but it can't do line by line run time stats in GL, only for D3D and... vulkan?
<azonenberg> and those are in beta
<azonenberg> so presumably GL support is coming
<azonenberg> sorry not VTune, I meant NSight
<azonenberg> mixing up my profilers :p
<azonenberg> I use vtune all the time, but it's for CPU
<d1b2> <mubes> A few too many MIPS for me, I tend to be optimising on CORTEX-M, which is pretty much at the other end of the spectrum.
<azonenberg> I've done dev for cortex-M but havent tried to do anything compute heavy on them
<azonenberg> I've been squeezing performance out of things since I wrote an SSE2 optimized matrix math library for a game engine I was writing in high school
<d1b2> <mubes> Mostly not compute heavy, mostly trying to get them to go to deep sleep most frequently and for as long as possible
<azonenberg> ah so power optimization
<azonenberg> Totally different problem domain
<d1b2> <mubes> but yeah, I like the optimisation game
<azonenberg> I'm used to trying to crunch GB of data in the least time possible
<d1b2> <mubes> sometimes the constraint is cycles or speed, or task responsiveness. It does vary a lot.
<azonenberg> Yeah
juli9611 has quit [Quit: Nettalk6 - www.ntalk.de]
<d1b2> <mubes> ...but thats why we're building Orbuculum, I want more tools to do it better.
<azonenberg> I remember when I was taking my computer architecture class freshman year of college the prof gave us a recursive C function for solving the Towers of Hanoi puzzle on an exam and asked us to port it to x86 asm
<d1b2> <mubes> Why did I never get to that sort of Uni???
<azonenberg> The hint was "If you go much over 30 instructions you're probably doing something wrong. Equally, if you think you're done after 20, you either missed something or you're better than the compiler and should think about starring as Neo in the next Matrix movie"
<azonenberg> This was of course a challenge to my manhood :P
<azonenberg> I implemented it in about 23 or 24 on the exam, fairly conservative but i knew it would work and I didn't remember the syntax for the bitfield in the shufps instruction off the top of my head
<azonenberg> The next day I walked into office hours with a printout of a 19 instruction implementation
<azonenberg> Just to prove it could be done
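For readers who have not seen it, the classic recursive Towers of Hanoi solution looks roughly like this. It is a Python sketch of the textbook formulation, not the C function from the exam (which is not shown in the log), and the peg names are arbitrary.

```python
def hanoi(n, src="A", dst="C", via="B", moves=None):
    """Return the list of (from_peg, to_peg) moves that shifts n discs.

    Recursive structure: move n-1 discs out of the way, move the largest
    disc, then move the n-1 discs back on top of it.
    """
    if moves is None:
        moves = []
    if n > 0:
        hanoi(n - 1, src, via, dst, moves)
        moves.append((src, dst))
        hanoi(n - 1, via, dst, src, moves)
    return moves


for src, dst in hanoi(3):
    print(f"move top disc from {src} to {dst}")
```

The move count is 2^n - 1, so the function body stays tiny while the work grows exponentially, which is part of why it makes a compact instruction-counting exercise.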
<d1b2> <mubes> You were born into a different age...Z80 and 6502 you could keep all the useful side-effects in your head. Good luck to anyone trying to figure that code out in future though!
<d1b2> <mubes> Right, bedtime. Night.
<azonenberg> Yeah the x86 instruction manuals are massive. I'm familiar enough to know what to look for, but i always have an instruction/intrinsic reference on the other screen when I'm writing vector code