azonenberg changed the topic of #scopehal to: libscopehal, libscopeprotocols, and glscopeclient development and testing
<_whitenotifier-4> [scopehal] Codysseus commented on issue #361: USB Decoder does not function correctly.
_whitenotifier-4 has quit [Remote host closed the connection]
Degi_ has joined #scopehal
Degi has quit [Ping timeout: 272 seconds]
Degi_ is now known as Degi
electronic_eel has quit [Ping timeout: 240 seconds]
electronic_eel has joined #scopehal
juli966 has quit [Quit: Nettalk6]
<azonenberg> So i'm playing around more with jitter decomposition
<azonenberg> In particular, i want to be able to get a jitter spectrum with DDJ removed
<azonenberg> I think i'm starting to figure out how to do this
<azonenberg> So the first step is to compute DDJ
<azonenberg> But not just p-p DDJ
<azonenberg> you need the actual data dependent shift in edge TIE for each possible bit sequence
<azonenberg> in my case i'm doing an 8-bit window for ISI
<azonenberg> To do that, you loop over each edge in the waveform, calculate TIE, but also keep track of the past 8 bits
<azonenberg> then you separately average TIE in each of 256 bins
<azonenberg> this averages out any Rj or Pj leaving only the data correlated components (DDJ = ISI + DCD)
<azonenberg> Max - min of that histogram then gives you p-p DDJ
<azonenberg> but the trick is, you can then go back into the TIE curve again and subtract the smoothed DDJ from each bin
<azonenberg> giving you Rj + BUj only
<azonenberg> Results seem to track LeCroy SDAII somewhat closely, although there's some variation likely due to the different PLLs in use
<azonenberg> I still need to implement dual-Dirac Rj + Dj measurements
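A minimal sketch of the per-pattern averaging described above, assuming TIE has already been measured for every edge and the 8 preceding bits are packed into one byte. The names EdgeSample, DdjTable, ComputeDdjTable, PeakToPeakDdj and RemoveDdj are hypothetical and are not taken from libscopeprotocols; this is an illustration of the idea, not the actual filter.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// One measured edge: its TIE and the 8 bits that preceded it (hypothetical layout).
struct EdgeSample
{
    double tie;
    uint8_t history;
};

// Mean TIE and hit count for each of the 256 possible 8-bit histories.
struct DdjTable
{
    std::array<double, 256> mean{};
    std::array<size_t, 256> count{};
};

// Average TIE separately per bit history. Uncorrelated jitter averages out,
// so each bin's mean approximates the data-dependent shift for that pattern.
DdjTable ComputeDdjTable(const std::vector<EdgeSample>& edges)
{
    DdjTable table;
    for(const auto& e : edges)
    {
        table.mean[e.history] += e.tie;   // holds the running sum for now
        table.count[e.history]++;
    }
    for(size_t i = 0; i < 256; i++)
    {
        if(table.count[i] != 0)
            table.mean[i] /= static_cast<double>(table.count[i]);
    }
    return table;
}

// Peak-to-peak DDJ: max minus min of the per-pattern means, skipping empty bins.
double PeakToPeakDdj(const DdjTable& table)
{
    bool any = false;
    double lo = 0.0;
    double hi = 0.0;
    for(size_t i = 0; i < 256; i++)
    {
        if(table.count[i] == 0)
            continue;
        lo = any ? std::min(lo, table.mean[i]) : table.mean[i];
        hi = any ? std::max(hi, table.mean[i]) : table.mean[i];
        any = true;
    }
    return hi - lo;
}

// Subtract each edge's pattern mean from its TIE, leaving the uncorrelated part.
std::vector<double> RemoveDdj(const std::vector<EdgeSample>& edges, const DdjTable& table)
{
    std::vector<double> residual;
    residual.reserve(edges.size());
    for(const auto& e : edges)
        residual.push_back(e.tie - table.mean[e.history]);
    return residual;
}
```

The residual vector returned by RemoveDdj is what would then feed a jitter spectrum with DDJ removed.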
juli966 has joined #scopehal
bvernoux1 has joined #scopehal
bvernoux has quit [Ping timeout: 240 seconds]
bvernoux1 has quit [Read error: Connection reset by peer]
nelgau has quit [Remote host closed the connection]
nelgau has joined #scopehal
nelgau has quit [Read error: Connection reset by peer]
nelgau has joined #scopehal
nelgau has quit [Client Quit]
nelgau has joined #scopehal
bvernoux has joined #scopehal
<bvernoux> I have more details about ffts crash with glscopeclient demo
<bvernoux> Thread 17 received signal SIGSEGV, Segmentation fault.
<bvernoux> [Switching to Thread 8840.0x2008]
<bvernoux> ffts_generate_table_1d_real_32f (p=<optimized out>, sign=-1, invert=1) at C:\msys64\home\Ben\ffts\src\ffts_trig.c:1068
<bvernoux> Python Exception <class 'UnicodeDecodeError'> 'utf-8' codec can't decode byte 0x97 in position 1748: invalid start byte:
<bvernoux> 1068 w[i][1] = ct[2*i][1];
<bvernoux> (gdb) display /3i $pc
<bvernoux> 1: x/3i $pc
<bvernoux> => 0x6a4c8380 <ffts_generate_table_1d_real_32f+208>: movapd (%r15,%r13,2),%xmm3
<bvernoux> 0x6a4c8386 <ffts_generate_table_1d_real_32f+214>: movaps %xmm3,(%r10,%r13,1)
<bvernoux> 0x6a4c838b <ffts_generate_table_1d_real_32f+219>: add $0x10,%r13
<bvernoux> it is very strange to have such issue
<bvernoux> unfortunately gdb provided in MSYS2 Mingw64 is a crap
<bvernoux> doing layout asm
<bvernoux> just crash gdb ...
<bvernoux> what a joke
<bvernoux> the tools under windows are real crap to debug
<miek> wait, how is python involved here?
m4ssi has joined #scopehal
<miek> bvernoux: you should print out the registers and see if it's the same alignment issue
<bvernoux> yes i have checked the addr
<bvernoux> python is embedded in gdb 10.1 it seems
<bvernoux> do not ask me why ...
<bvernoux> the most valuable details comes from Visual Studio when trying to debug this mess
<bvernoux> and it seems it read data outside of memory ...
<bvernoux> so it is not an alignment issue but more related to some registers trashed or something buggy when ffts is built
<bvernoux> the fun things is ffts demo work fine
<bvernoux> but it use static lib
<bvernoux> there is of course no any test with dynamic library ...
<bvernoux> and the fun things
<bvernoux> if I use the same libffts.dll from my glscopeclient (non-release version, which works)
<bvernoux> and I copy it on the release version which crash
<bvernoux> it still crash
<bvernoux> so I heavily suspect something wrong from glscopeclient release
<bvernoux> i checked N
<miek> what are the values of r15 and r13?
<bvernoux> let me check
<bvernoux> interesting I have rebuilt ffts with Debug
<bvernoux> now I have a bt
<bvernoux> (gdb) where
<bvernoux> #0 0x000000006a4c4b24 in _mm_load_ps (__P=0x12b9bc2c) at C:/msys64/mingw64/lib/gcc/x86_64-w64-mingw32/10.2.0/include/xmmintrin.h:931
<bvernoux> #1 ffts_execute_1d_real (p=0xe83d210, input=0x12fb6fe0, output=0x13037040) at C:\msys64\home\Ben\ffts\src\ffts_real.c:186
<bvernoux> #2 0x000000006a4c1f4a in ffts_execute (p=0xe83d210, in=0x12fb6fe0, out=0x13037040) at C:\msys64\home\Ben\ffts\src\ffts.c:172
<bvernoux> #3 0x00000000021cbae7 in TestWaveformSource::DegradeSerialData (this=0xb7067e0, cap=0xe81dc80, sampleperiod=20000, depth=100000)
<bvernoux> at C:\msys64\home\Ben\scopehal-apps\lib\scopehal\TestWaveformSource.cpp:307
<bvernoux> #4 0x00000000021cbe3c in TestWaveformSource::GeneratePRBS31 (this=this@entry=0xb7067e0, amplitude=amplitude@entry=0.899999976,
<bvernoux> period=period@entry=96969.6016, sampleperiod=sampleperiod@entry=20000, depth=depth@entry=100000)
<bvernoux> at C:\msys64\home\Ben\scopehal-apps\lib\scopehal\TestWaveformSource.cpp:203
<bvernoux> #5 0x0000000002151d9f in DemoOscilloscope::AcquireData (this=0xb705200)
<bvernoux> at C:\msys64\home\Ben\scopehal-apps\lib\scopehal\DemoOscilloscope.cpp:399
<bvernoux> #6 0x00000000004979dc in ScopeThread (scope=0xb705200) at C:\msys64\home\Ben\scopehal-apps\src\glscopeclient\main.cpp:393
<bvernoux> #7 0x000007fed3a80511 in ?? () from C:\msys64\home\Ben\glscopeclient_build_release\libstdc++-6.dll
<bvernoux> #8 0x000007fed58b4f33 in ?? () from C:\msys64\mingw64\bin\libwinpthread-1.dll
<bvernoux> #9 0x000007fefe16415f in srand () from C:\Windows\system32\msvcrt.dll
<bvernoux> #10 0x000007fefe166ebd in msvcrt!_ftime64_s () from C:\Windows\system32\msvcrt.dll
<bvernoux> #11 0x00000000779e556d in KERNEL32!BaseThreadInitThunk () from C:\Windows\system32\kernel32.dll
<bvernoux> #12 0x0000000077b4372d in ntdll!RtlUserThreadStart () from C:\Windows\SYSTEM32\ntdll.dll
<bvernoux> #13 0x0000000000000000 in ?? ()
<bvernoux> (gdb) display /3i $pc
<bvernoux> 1: x/3i $pc
<bvernoux> => 0x6a4c4b24 <ffts_execute_1d_real+1338>: movaps (%rax),%xmm0
<bvernoux> 0x6a4c4b27 <ffts_execute_1d_real+1341>: movaps %xmm0,0x770(%rbp)
<bvernoux> 0x6a4c4b2e <ffts_execute_1d_real+1348>: mov 0x7dc(%rbp),%eax
<bvernoux> so it is not at same place now
<miek> if it's an alignment thing it's probably gonna move around with different compile flags
<miek> the argument to _mm_load_ps should be 16-byte aligned but isn't
<bvernoux> yes rax is clearly not aligned correctly
<bvernoux> so at the end it seems it is clearly an alignment issue
<bvernoux> which trig an overflow ... or just abort the instruction as it cannot be executed correctly
<bvernoux> here it is hard to find the exact error related to the instruction
<bvernoux> I suspect the bug is with AlignedAllocator<float, 32> g_floatVectorAllocator
<bvernoux> which does not align anything ;)
<bvernoux> let's try something
<bvernoux> replace _aligned_malloc() by _mm_malloc()
<bvernoux> which seems better support more standard
<bvernoux> I suspect _aligned_malloc() is buggy for MSYS2 MINGW64
<bvernoux> as anyway it is not portable and more for VStudio
<bvernoux> _mm_malloc() is standard in C++11
<bvernoux> for all platform
<miek> _mm_malloc is standard in *C11*, which may not be supported in all versions of visual studio
<bvernoux> we do not care of visual studio ;)
<bvernoux> officially it is not supported anyway
<bvernoux> as glscopeclient support only MSYS2 mingw64 so far
<bvernoux> it is impossible to build it with VS
<bvernoux> I'm not even sure it is planned to be compatible with VS ...
<bvernoux> anyway it will requires VS2018 or more to support correctly C11 or more
<bvernoux> older version of VS are crap and are not compliant with C++11 IIRC ...
<bvernoux> they are compliant with M$ stuff ;)
<bvernoux> anyway I have same crash ;)
<bvernoux> rax 0x12e9f76c
<bvernoux> 0x6a4c4b24 <ffts_execute_1d_real+1338>: movaps (%rax),%xmm0
<bvernoux> still not aligned correctly
<bvernoux> (gdb) where
<bvernoux> #0 0x000000006a4c4b24 in _mm_load_ps (__P=0x12e9f76c) at C:/msys64/mingw64/lib/gcc/x86_64-w64-mingw32/10.2.0/include/xmmintrin.h:931
<bvernoux> #1 ffts_execute_1d_real (p=0xea61a30, input=0x12f9f8a0, output=0x1326c520) at C:\msys64\home\Ben\ffts\src\ffts_real.c:186
<bvernoux> #2 0x000000006a4c1f4a in ffts_execute (p=0xea61a30, in=0x12f9f8a0, out=0x1326c520) at C:\msys64\home\Ben\ffts\src\ffts.c:172
<bvernoux> #3 0x00000000021cbae7 in TestWaveformSource::DegradeSerialData (this=0xb5c67e0, cap=0xe84b810, sampleperiod=20000, depth=100000)
<bvernoux> at C:\msys64\home\Ben\scopehal-apps\lib\scopehal\TestWaveformSource.cpp:307
<bvernoux> #4 0x00000000021cbe3c in TestWaveformSource::GeneratePRBS31 (this=this@entry=0xb5c67e0, amplitude=amplitude@entry=0.899999976,
<bvernoux> period=period@entry=96969.6016, sampleperiod=sampleperiod@entry=20000, depth=depth@entry=100000)
<bvernoux> at C:\msys64\home\Ben\scopehal-apps\lib\scopehal\TestWaveformSource.cpp:203
<bvernoux> #5 0x0000000002151d9f in DemoOscilloscope::AcquireData (this=0xb5c5200)
<bvernoux> at C:\msys64\home\Ben\scopehal-apps\lib\scopehal\DemoOscilloscope.cpp:399
<bvernoux> #6 0x00000000004979dc in ScopeThread (scope=0xb5c5200) at C:\msys64\home\Ben\scopehal-apps\src\glscopeclient\main.cpp:393
<bvernoux> #7 0x000007fed3690511 in ?? () from C:\msys64\home\Ben\glscopeclient_build_release\libstdc++-6.dll
<bvernoux> #8 0x000007fed58b4f33 in ?? () from C:\msys64\mingw64\bin\libwinpthread-1.dll
<bvernoux> #9 0x000007fefe16415f in srand () from C:\Windows\system32\msvcrt.dll
<bvernoux> #10 0x000007fefe166ebd in msvcrt!_ftime64_s () from C:\Windows\system32\msvcrt.dll
<bvernoux> #11 0x00000000779e556d in KERNEL32!BaseThreadInitThunk () from C:\Windows\system32\kernel32.dll
<bvernoux> #12 0x0000000077b4372d in ntdll!RtlUserThreadStart () from C:\Windows\SYSTEM32\ntdll.dll
<bvernoux> #13 0x0000000000000000 in ?? ()
<bvernoux> let me check with Ghidra and the libscopehal.dll
<bvernoux> as it is a mess to understand how this C++ stuff mixed with sse are compiled ;)
massi_ has joined #scopehal
<d1b2> <Stary> is it just this issue
<d1b2> <Stary> oh 1d/2d
<bvernoux> potentially i will say yes
<bvernoux> maybe ;)
<miek> try `frame 1` and `print *p`
<bvernoux> #1 ffts_execute_1d_real (p=0xea61a30, input=0x12f9f8a0, output=0x1326c520) at C:\msys64\home\Ben\ffts\src\ffts_real.c:186
<bvernoux> 186 __m128 t2 = _mm_load_ps(buf + N - i - 4);
<bvernoux> (gdb) print *p
<bvernoux> $1 = {offsets = 0x0, ws = 0x0, oe_ws = 0x0, eo_ws = 0x0, ee_ws = 0x0, is = 0x0, ws_is = 0x0, i0 = 0, i1 = 0, n_luts = 0, N = 131071,
<bvernoux> lastlut = 0x0, transform = 0x6a4c45ea <ffts_execute_1d_real>, transform_base = 0x0, transform_size = 0, constants = 0x0, plans = 0xea61b70,
<bvernoux> rank = 1, Ns = 0x0, Ms = 0x0, buf = 0x12e1f780, transpose_buf = 0x0, destroy = 0x6a4c453c <ffts_free_1d_real>, A = 0x12e9f7e0,
<bvernoux> B = 0x12f1f840, i2 = 0}
<bvernoux> strange things why N = 131071
<bvernoux> it shall be 131072
<bvernoux> A & B seems correctly aligned anyway
<bvernoux> any idea ?
<miek> so buf is just a float*, so that loop & pointer arithmetic is going to result in addresses aligned to 4 bytes (not 16), i think?
<bvernoux> buf = 0x12e1f780 is also correctly aligned
<bvernoux> ha yes maybe
<bvernoux> so a bug in libffts ;)
<miek> but `buf + 1`, for example, wouldn't be
<miek> yeah
<bvernoux> strange thing is why it appears only on windows build release ?
<bvernoux> for me the issue is N = 131071
<bvernoux> instead of 131072 to be aligned
<bvernoux> to be checked
<miek> there's some discussion on that in one of the issues (that gcc's code generation can hide this)
<miek> i don't really understand what that loop is supposed to be doing
<bvernoux> yes me too i'm clearly not an expert on sse x64 stuff
<bvernoux> and I really hate their stuff ;)
<bvernoux> I'm more embedded developer
<bvernoux> anyway the issue was in & out was not aligned on 16bytes
<bvernoux> here we have
<bvernoux> ffts_execute_1d_real (p=0xea61a30, input=0x12f9f8a0, output=0x1326c520)
<bvernoux> and in & out are correctly aligned
<miek> oh wait i was misreading the loop, i didn't see that it's i += 16
<miek> ok that makes more sense, so yes the problem is N
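The gdb values above are enough to check this by hand. A small sketch, assuming the load at ffts_real.c:186 is `_mm_load_ps(buf + N - i - 4)` as printed, with buf = 0x12e1f780 and taking i = 0. The helper names FloatElementAddress and IsAligned16 are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>

// Byte address of element `index` in a float array that starts at byte address `base`.
constexpr uint64_t FloatElementAddress(uint64_t base, uint64_t index)
{
    return base + index * sizeof(float);
}

// movaps / _mm_load_ps need the address to be a multiple of 16 bytes.
constexpr bool IsAligned16(uint64_t addr)
{
    return (addr % 16) == 0;
}

// buf + N - i - 4 with N = 131071, i = 0: lands on 0x12e9f76c, the faulting __P.
constexpr uint64_t kBadAddr = FloatElementAddress(0x12e1f780, 131071 - 0 - 4);

// Same expression with the intended N = 131072: lands on 0x12e9f770.
constexpr uint64_t kGoodAddr = FloatElementAddress(0x12e1f780, 131072 - 0 - 4);

static_assert(kBadAddr == 0x12e9f76c, "matches the pointer seen in the backtrace");
static_assert(!IsAligned16(kBadAddr), "odd N gives a 4-byte-aligned address");
static_assert(IsAligned16(kGoodAddr), "power-of-two N keeps 16-byte alignment");
```

With N one short of a power of two, every `buf + N - i - 4` address is shifted by 4 bytes, which is consistent with buf, A and B all looking correctly aligned while the load still faults.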
<bvernoux> yes why N is with such value
massi_ has quit [Remote host closed the connection]
<miek> i would set a breakpoint in the plan setup to see if glscopeclient is passing that in
<bvernoux> yes
<bvernoux> at start of DegradeSerialData
<bvernoux> TestWaveformSource::DegradeSerialData
<bvernoux> and do step by step
<bvernoux> N is npoints IIRC
<bvernoux> and so with invalid value 131071 ...
Stary has quit [Ping timeout: 260 seconds]
<miek> oh, i bet it could be - it's doing an implicit cast from float to size_t
<bvernoux> I confirm
<bvernoux> ffts_init_1d_real (N=131071, sign=-1) at C:\msys64\home\Ben\ffts\src\ffts_real.c:607
<bvernoux> ha yes a rounding issue ;)
<bvernoux> depth is 100000
<bvernoux> need to do the computation to check ;)
<bvernoux> const size_t npoints = pow(2, ceil(log2(depth)));
<bvernoux> returns 131071 in actual code
<bvernoux> instead of 131072
<bvernoux> yes at the end results should be 131072
<bvernoux> but it is not ;)
<bvernoux> there is a crazy bug somewhere ;)
<bvernoux> as something is broken
<bvernoux> so it shall return correctly 131072
<Bird|otherbox> bvernoux, what type is depth?
<bvernoux> size_t
<Bird|otherbox> yikes. size_t to float to size_t...that's not cheap, atop being not quite right
<bvernoux> yes there is clearly something with a round issue ...
<bvernoux> but when I execute it online with GCC it returns correct result
<bvernoux> let me check with MSYS2 mingw64 now ;)
<bvernoux> even with MSYS2 mingw64 it returns correctly 131072
<bvernoux> very very strange ;)
<miek> i guess log2 impl. may vary between compilers?
<bvernoux> very strange I'm testing with a simple code with same dev env
<bvernoux> maybe I do not have same opt for gcc ...
<bvernoux> just tried -O3
Stary has joined #scopehal
<sorear> I would not assume log2 produces exact results for powers of two input
<sorear> that's what frexp is for
<bvernoux> yes interesting
<bvernoux> we can also do that with shift ;)
<bvernoux> and do not use float at all
<bvernoux> anyway size_t is not justified here
<Bird|otherbox> yeah, I was kind of thinking that it'd be better done either using a compiler intrinsic or bitwise work, instead of going out to float and back again
<bvernoux> as it means a number of samples and in best case we will never have more than 32bits values for that
<bvernoux> 4billions samples is a lot for 1 frame ;)
<Bird|otherbox> eeeh, maybe not?
<bvernoux> just with 200Millions pts it is already ultra slow ;)
<bvernoux> like 1WFM/s
<Bird|otherbox> if you're doing it realtime maybe, but for a "one shot" FFT analysis, feeding several billion points in might just be justified
<bvernoux> for a scope ?
<Bird|otherbox> I'm thinking of this from the library/API developer's view
<bvernoux> I do not know any scope which can provide billions pts for 1 frame
<bvernoux> but yes let's keep this size_t
<bvernoux> we can do the trick using 64bits shift anyway ;)
<Bird|otherbox> (I'm assuming ffts is the one doing the depth to npoints math)
<miek> you're casting it to size_t whether you like it or not, cause that's what ffts takes :p
<bvernoux> as there is clearly an issue with conversion from float/double/size_t in a special condition
<bvernoux> with log2, ceil and/or pow
<miek> Bird|otherbox: that math is in scopehal
<Bird|otherbox> oh. I see
<Bird|otherbox> FFTS requires depth to be power-of-2
<Bird|otherbox> err npoints
<bvernoux> this things will do the tricks
<bvernoux> uint32_t v = depth; // compute the next highest power of 2 of 32-bit v
<bvernoux> v--;
<bvernoux> v |= v >> 1;
<bvernoux> v |= v >> 2;
<bvernoux> v |= v >> 4;
<bvernoux> v |= v >> 8;
<bvernoux> v |= v >> 16;
<bvernoux> v++;
<bvernoux> haha
<bvernoux> ;)
<bvernoux> ok it works only with uint32
<Bird|otherbox> yeah, exactly the recipe I was looking at -- should be trivial to extend to 64bit, no?
<bvernoux> yes
<bvernoux> the solution
<bvernoux> uint64_t v = depth; // compute the next highest power of 2 of 32-bit v
<bvernoux> v--;
<bvernoux> v |= v >> 1;
<bvernoux> v |= v >> 2;
<bvernoux> v |= v >> 4;
<bvernoux> v |= v >> 8;
<bvernoux> v |= v >> 16;
<bvernoux> v++;
<Bird|otherbox> probably should be tucked off into its own function I reckon though
<bvernoux> const size_t npoints1 = v;
<bvernoux> printf("depth=%zu npoints=%zu\n", depth, npoints1);
<bvernoux> it support 64bits ;)
<bvernoux> so I really doubt it is a limitation ;)
<bvernoux> until we have enough ram for 64bits ;)
<Bird|otherbox> well, try feeding a value >4GB in there
<Bird|otherbox> I think you'd need a v |= v >>32; right before the v++; to make it work right
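A sketch of the 64-bit variant with the extra `>> 32` step suggested here. The function name NextPow2Smear and the handling of 0 are assumptions made for this example, not code from scopehal.

```cpp
#include <cstdint>

// Round v up to the next power of two using only integer operations.
// Returns v itself when it is already a power of two.
// Returns 0 when the result would not fit in 64 bits (v > 2^63).
uint64_t NextPow2Smear(uint64_t v)
{
    if(v == 0)
        return 1;       // assumed convention: smallest power of two

    v--;                // so that exact powers of two map to themselves
    v |= v >> 1;        // copy the top set bit into every lower position
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
    v |= v >> 32;       // needed once the top set bit is above bit 31
    v++;                // all-ones below the top bit, plus one, is a power of two
    return v;
}
```

Without the `>> 32` line the smear only reaches 31 positions below the top set bit, so lower bits of a large input can be left clear and the final increment does not produce a power of two.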
<bvernoux> $ ./main.exe
<bvernoux> depth=10000000000000000 npoints=18014398509481984
<bvernoux> depth=10000000000000000 npoints=18014398509481984
<bvernoux> volatile size_t depth = 10000000000000000;
<bvernoux> int main()
<bvernoux> {
<bvernoux> const size_t npoints = pow(2, ceil(log2(depth)));
<bvernoux> printf("depth=%zu npoints=%zu\n", depth, npoints);
<bvernoux> uint64_t v = depth; // compute the next highest power of 2 of 32-bit v
<bvernoux> v--;
<bvernoux> v |= v >> 1;
<bvernoux> v |= v >> 2;
<bvernoux> v |= v >> 4;
<bvernoux> v |= v >> 8;
<bvernoux> v |= v >> 16;
<bvernoux> v++;
<bvernoux> const size_t npoints1 = v;
<bvernoux> printf("depth=%zu npoints=%zu\n", depth, npoints1);
<bvernoux> return 0;
<bvernoux> }
<bvernoux> it seems to be a hack but let's test it now in real code ;)
<bvernoux> it avoid using tons of lib and conv anyway
<bvernoux> let's test with real code ;)
<bvernoux> even smarter
<bvernoux> uint64_t next_pow2(uint64_t x) {
<bvernoux> return x == 1 ? 1 : 1<<(64-__builtin_clzl(x-1));
<bvernoux> }
<bvernoux> haha ;)
<bvernoux> 2 instructions
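A hedged sketch of the same idea with a few guards added, assuming a GCC or Clang compiler. Two details differ from the one-liner above: `unsigned long` is only 32 bits wide on 64-bit Windows targets such as MinGW-w64, so this uses `__builtin_clzll`, and the shifted constant is `1ULL` because a plain `1` is an int and cannot be shifted by 32 or more. The name NextPow2Clz is hypothetical.

```cpp
#include <cstdint>

// Next power of two >= x, using the count-leading-zeros builtin (GCC/Clang only).
// Returns 1 for x <= 1 and 0 when the result would not fit in 64 bits.
uint64_t NextPow2Clz(uint64_t x)
{
    if(x <= 1)
        return 1;                   // also avoids __builtin_clzll(0), which is undefined

    if(x > (1ULL << 63))
        return 0;                   // would need a shift by 64, which is undefined

    // x - 1 has its top set bit at position 63 - clz, so the answer is one bit above it.
    return 1ULL << (64 - __builtin_clzll(x - 1));
}
```

For depth = 100000 this gives 131072 with no floating point involved, which sidesteps the `pow(2, ceil(log2(depth)))` round trip entirely.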
<Bird|otherbox> nice
<Bird|otherbox> I was figuring there'd be something like that based on bitwise intrinsics
<bvernoux> haha
<bvernoux> and now that works ;)
<bvernoux> fixed ;)
<bvernoux> we can call that a crazy bug which was affecting only MSYS2 mingw64 build ...
<bvernoux> now I shall fix other class
<bvernoux> with same trick
<bvernoux> like FFT ...
<bvernoux> shall be fixed in DeEmbedFilter.cpp, FFTFilter.cpp and JitterSpectrumFilter.cpp
<bvernoux> will add a common function for that ;)
<bvernoux> ok all is fixed
<bvernoux> waiting azonenberg to merge the PR ;)
<azonenberg> bvernoux: ooh good catch
<azonenberg> also please do keep size_t
<bvernoux> yes i kept it ;)
<bvernoux> the limitation is uint64_t next_pow2(uint64_t v)
<azonenberg> WavePro HD can go up to 5G points per acquisition
<bvernoux> with 64bits we have margin ;)
<azonenberg> so it's plausible we could exceed the range of uint32_t for sample counts
<bvernoux> anyway I have not changed the size_t
<azonenberg> Ok
<bvernoux> I have done some internal test to validate it
<azonenberg> Will look in a minute. still catching up on what happened in the outside world when i was asleep :p
<bvernoux> in limit is around => 9223372036854775809
<bvernoux> so I doubt you have so much memory one done for 1 frame ;)
<bvernoux> -done+day
<bvernoux> it was really tricky to catch that bug
<bvernoux> as I cannot reproduce it outside glscopeclient release
<bvernoux> with a simple C code it works
<bvernoux> it explain why FFT was crashing with Windows build too
<bvernoux> it affect Demo ...
<bvernoux> JitterSpectrumFilter too
<bvernoux> FFTFilter
<bvernoux> DeEmbedFilter
<bvernoux> and TestWaveformSource
<bvernoux> used only in demo mode
<bvernoux> now it is rock stable ;)
<bvernoux> and i can add some nice FFT
<bvernoux> to Demo ...
<bvernoux> My code is portable and universal ;)
<miek> why are there 6 PRs for it? D:
<bvernoux> because I have done them manually in github ;)
<bvernoux> yes it's crap but there is one line each time in each file ;)
<bvernoux> next PR is to improve Rigol MSO5000 speed and robustness ;)
<bvernoux> we can win 10% speed ;)
<azonenberg> :)
<azonenberg> bvernoux: please use proper PRs next time though, not one PR per line changed :p
<miek> also, if you did it all manually on github - is it actually tested?
<bvernoux> it was to win time as all my fork are totally outdated ;)
<bvernoux> yes I have tested it
<bvernoux> locally here
<miek> you checked it back out after all the PRs?
<azonenberg> miek: yeah i'm going to go through and edit it after i've merged everything
<azonenberg> it's just ugly to deal with right now
<bvernoux> more tests are welcome for corner cases ;)
<miek> i gave you one in the review
<bvernoux> ha ok
<bvernoux> nice
<bvernoux> fixed ;)
<azonenberg> Fix pushed
<azonenberg> using the clzl intrinsic if gcc and falling back to the portable version if not
<azonenberg> Also my solder-in 4 GHz active diff probe came in over the weekend. Still waiting on the probe positioner which was supposed to have come saturday but ended up being delayed
<bvernoux> yes nice ;)
<bvernoux> clzl is the best but was not portable
<bvernoux> like that we have a nice next_pow2 function ;)
<azonenberg> yeah i have absolutely no problem with platform specific optimizations as long as they don't break things
<azonenberg> have you seen all the avx2 code i have sitting around? :P
<bvernoux> no i have not checked
<azonenberg> I run glscopeclient under vtune routinely and any time something starts feeling slow i try to vectorize it
<bvernoux> it is a nightmare on MSYS2 GDB for mmx/avx code ;)
<azonenberg> de-embed, eye pattern, even lecroy waveform download (conversion from raw adc samples to volts), and more all have hand tuned AVX2 intrinsic implementations as well as portable C++
<bvernoux> ha nice
<bvernoux> let's rebuild all ;)
<azonenberg> 724 WFMs / 1 min = 12.06 WFM/s average for a 1-minute test
<bvernoux> and before optim ?
<azonenberg> Can't say, some of the filters i use didn't even exist when i started all of the AVX stuff
<azonenberg> i can say that the eye pattern filter alone is >2x faster than it was
<bvernoux> ha great
<azonenberg> i also cache a bunch of operations like "find zero crossings" that i often call multiple times on one waveform
<azonenberg> The CDR PLL is actually a bottleneck now, but before i optimize it i want to implement a proper, industry standard CDR. Preferably several different PLLs
<d1b2> <TiltMeSenpai> meanwhile in Rigol MSO5k land...
<d1b2> <TiltMeSenpai> it has gotten a lot better though, thank you for the work
<bvernoux> on my side I'm searching a very fast 16 IO digital signal generator which can go at up to 250MHz ;)
<d1b2> <TiltMeSenpai> just not 12 WFM/S better
<azonenberg> Lol. That's 12 WFM/s with a fairly heavy processing graph. it does 30+ just showing waveforms
<bvernoux> so far on MSO5k we cannot reach more than 1 WFM/s ;)
<bvernoux> I have an optimisation to win 10% ;)
<azonenberg> But that's what happens when you're using a scope that would have cost you north of 40K USD if you had bought it new
<d1b2> <TiltMeSenpai> I wonder at what point do we explore custom firmware
<azonenberg> rather than under 4K :p
<bvernoux> but we shall push Rigol to fix their SCPI management
<bvernoux> it is a real cheat
<d1b2> <TiltMeSenpai> it's theoretically possible to have a fully custom stack on the Rigol, right? just a ton of work?
<bvernoux> the MSO5k shall be able to push at least 20 WFM/s with 10kpts
<azonenberg> TiltMeSenpai: That is not something i have spent much time looking into because for me, the endgame has always been full custom hardware
<azonenberg> instrumentation purpose built for headless operation with fpga accelerated networking rather than being bottlenecked on slow TCP on a dinky arm core
<d1b2> <TiltMeSenpai> yeah custom Rigol firmware isn't something I would ask you to support, but we can spread the love
<azonenberg> if someone makes a viable custom firmware that gains traction i will gladly help with the driver side
<d1b2> <TiltMeSenpai> I'm totally buying whatever you come up with when it becomes a product though
<bvernoux> if they disable the display with a special SCPI mode there will be not limit ;)
<azonenberg> but yeah in general i really rely on third party contributions for driver support
<bvernoux> I'm pretty sure they can saturate the Ethernet Gigabit
<azonenberg> Because i just don't have access to anything other than lecroy scopes, and whatever else i can convince people to let me borrow in person or over a VPN
<bvernoux> it depends on the link to retrieve the data from ADC/FPGA => Zynq7010 Cortex A9
<azonenberg> bvernoux: BTW, I'm tentatively going to be borrowing an 8 GHz MSO64 from a friend in seattle over christmas
<bvernoux> azonenberg, woahou nice
<bvernoux> azonenberg, what the price of such beast ?
<bvernoux> do he have made a deal ?
<azonenberg> high five, low six digits. no idea what he paid for it
<bvernoux> to pay it less than 40KUSD ;)
<azonenberg> My goals there are... test the AKL-PT1/PT2 on a higher bandwidth scope
<d1b2> <TiltMeSenpai> I was looking at the way the "feature discounts" work, and they're all patching the binary. I think in theory that means you can patch the binary to whatever you want, such as creating a SCPI only mode
<azonenberg> Fine tune and optimize the MSO5/6 driver
<bvernoux> d1b2, yes there is clearly something to do like that or to push Rigol to do it ;)
<azonenberg> and most importantly, i want to debug the multi-scope code a bit more and do a "plugfest" demo with a MSO64 and a WaveRunner 8000 series scope under one UI
<azonenberg> i think that would be yet another world first lol
<azonenberg> Two modern multi-GHz scopes, not just different families but different brands, seamlessly working together under one UI
<azonenberg> In theory i have everything i need for that right now. i'm sure i will run into quirks but that is the goal
<azonenberg> and i'll have a week or two with the scope while i'm off work to debug
<bvernoux> yes will be amazing ;)
<bvernoux> is intended for such type of SCPI tuning ;)
<bvernoux> and is fully reusable for other scope
<bvernoux> anyone is welcome it is fully open source
<bvernoux> advantage is it build in few ms ;)
<bvernoux> compared to build glscopeclient ;)
<azonenberg> Well i'm glad you're adopting the MSO5 driver. even if it's slow it's nice to have someone maintaining it
<azonenberg> also, i have one of those cheap fx2 LAs now
<azonenberg> but have not had time to work on it
<azonenberg> on driver support i mean
<azonenberg> That will be good to have though
<bvernoux> the fx2 LAs will be a big killer
<bvernoux> as it could bring lot of people to help on code
<bvernoux> as LA is not very supported so far
<bvernoux> FFT on LA channel could be fun ;)
<bvernoux> or even ClockRecovery ;)
<bvernoux> main interest is to see in realtime the data with trigger
<bvernoux> which is not possible with LA today
<bvernoux> as it is mainly a big capture then analyze after
<bvernoux> as LA gui like pulseview or saleae are not designed to show things in realtime
<azonenberg> I want to be able to do digital phosphor effects on LA waveforms
<bvernoux> ha yes will be nice
<azonenberg> Which will require, among other things, that i fix persistence which has been broken for months
<azonenberg> Profiling of a 1-minute run on the jitter decomposition example
<azonenberg> showing where the hot filters are
<bvernoux> this intel v-tune profiler is very nice
<azonenberg> yeah that plus nvidia's nsight for the GPU side have been invaluable for performance tuning
<bvernoux> I plan to buy AMD only things ;)
<bvernoux> will be interesting
<bvernoux> I hate Intel & Nvidia ;)
<azonenberg> Meanwhile i'm pretty much 100% intel + nv in my dev environment lol
<d1b2> <TiltMeSenpai> oh, if you're actually curious about custom applications for the MSO5k, someone started a thread on it 😛
<azonenberg> my main workstation is dual xeon 6144s plus a 2080 Ti
<bvernoux> Ryzen9 Serie5 Desktop with latest AMD GFX card which explode Nvidia ;)
<azonenberg> also BTW, average CPU usage during this test was only 1.06 cores
<bvernoux> for 3Keuros I shall have something decent with 12x core running near 5GHz ;)
<azonenberg> So i think there is a *lot* of room for glscopeclient to parallelize more
<azonenberg> There's a lot of stuff being done in the main thread that i think can be moved out
<bvernoux> yes it is still very impressive with the demo
<bvernoux> will be nice to add WFM/s in demo ;)
<bvernoux> as it clearly test the CPU+GPU
<azonenberg> you can see activity of each thread over time
<azonenberg> the first OpenMP worker thread gets a big spike of activity every frame, and the scope driver thread is doing a fair bit of work
<azonenberg> then the second OpenMP thread does a bit here and there
<azonenberg> but most of the cores are not actually doing a ton
<bvernoux> OpenMP is sleeping all the time on 2 cores ;)
<azonenberg> So if i can figure out how to effectively use more threads we should see much better performance
<bvernoux> yes clearly
<bvernoux> and decouple better the UI from WFM from scope
<bvernoux> there is clearly something locked today
<azonenberg> So far i've been focusing on very easy parallelism, like running different blocks in the filter graph in parallel, as well as vectorization because that is also generally easy for DSP-focused stuff
<bvernoux> which can be seen on Rigol MSO5000 with its 1WFM/s ;)
<azonenberg> Yeah. Definitely a lot of performance tuning to do before we hit my goals of 3-4 digit WFM/s
<bvernoux> the UI is totally unresponsive
<bvernoux> in order to use the UI I switch to step by step WFM
<azonenberg> Write queueing is going to help a lot with that I think
<bvernoux> I'm impatient to see the result when it will be implemented
<azonenberg> But it's going to be a major project so I've been putting it off until I have time to devote to it. That might be another to-do over the christmas holiday
<bvernoux> also a parameter to say single trigger at startup will be nice
<azonenberg> i have two weeks off work
<bvernoux> I will check
<azonenberg> rather than adding that, probably better to just fix things to not be so slow :p
<azonenberg> again, write queueing should make a huge difference
<bvernoux> yes need refactor of the socket
<bvernoux> to expose socket directly
<bvernoux> that will improve lot of cases
<azonenberg> I just don't have enough hours in the day to do everything i want :p
<bvernoux> and add robustness
<bvernoux> for all scope ;)
<bvernoux> in case you change something today it timeout and crash
<azonenberg> Like documentation. I have a 135-page manual that needs major updating and completion
<bvernoux> yes documentation will come later ;)
<bvernoux> You shall ask for GSOC
<bvernoux> to have a student working on scopehal during few months full time
<azonenberg> Maybe for 2022
<azonenberg> there's still a lot more setup work to do before we get to that point. I want to have a proper website for the project, some developer documentation
<azonenberg> a much-updated manual
<azonenberg> etc
<azonenberg> and installers, at least a v0.1 level binary release
<bvernoux> yes
<bvernoux> the only missing step is to copy libffts.dll today
<bvernoux> or maybe use it as static that will avoid an other DLL ...
<bvernoux> as anyway ffts will do not change it is frozen since 2017
<azonenberg> yes. seriously thinking of updating it myself with AVX support at some point
<azonenberg> and/or going straight to GPU
<bvernoux> a good feature will be to add details on trigger how many GSPS/MSPS
<bvernoux> like we have on "standard" scope
<bvernoux> but it is mainly gui stuff to add
<azonenberg> We have that on every waveform in the infobox
<azonenberg> it shows the timebase config
<azonenberg> At least it's supposed to
<bvernoux> but it is not displayed
<azonenberg> see the infobox on the "TX" channel
<azonenberg> do you not get that with mso5k?
<bvernoux> ha yes
<bvernoux> IIRC no, there is nothing in mso5k
<bvernoux> let me check ;)
<azonenberg> it's per channel not in the footer, because with some scopes or multi-scope setups timebase may not be the same for all channels
<bvernoux> ha yes that work
<azonenberg> Like MSO vs analog, or 8 vs 12 bit mode on BLONDEL
<bvernoux> what is missing is horizontal timebase
<bvernoux> in fact
<azonenberg> what do you mean? time/div?
<bvernoux> yes like 200us
<azonenberg> That's intentional, "divisions" don't make sense in this kind of UI
<azonenberg> the record length may be longer or shorter than what you see on screen
<bvernoux> with trigger also with the level
<bvernoux> when it is basic trigger
<azonenberg> Trigger level is shown with a little arrow on the right side
<azonenberg> i don't print out the actual level but you can see it in the trigger properties dialog
<azonenberg> i figure for most purposes just seeing the arrow is good enough
<bvernoux> ha yes
<azonenberg> trigger position is shown on the top time bar too
<azonenberg> although right now you cannot drag it to move it around
<azonenberg> you will be able to in the future
<azonenberg> that's one of many to-do items that i've just been too busy to touch
<bvernoux> i will push some fixes also to refresh the channels when restarting trigger (Single Trigger)
<bvernoux> as it does not refresh what happen on scope
<azonenberg> what do you mean "refresh the channels"
<azonenberg> do you mean clearing the cache of voltage range etc?
<bvernoux> it shall poll what chan are enabled/disabled
<bvernoux> as sometimes it is more convenient to pause glscopeclient do all on scope and restart the trigger
<azonenberg> The 'refresh' button on the toolbar clears the cache and is intended to be used if you poke settings on the hardware
<azonenberg> Enabling/disabling channels on the scope is not something you should be doing
<azonenberg> glscopeclient reference counts channels and disables them on hardware when the last filter or viewport using it is closed
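The reference counting described above can be sketched roughly as follows. This is only an illustration: `FakeScope` and `RefCountedChannel` are hypothetical names invented for this sketch, not the libscopehal API, and the real code may track references differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a scope driver; it only tracks on/off state per channel.
class FakeScope
{
public:
    explicit FakeScope(size_t nchans) : m_enabled(nchans, false) {}
    void EnableChannel(size_t i)          { m_enabled[i] = true; }
    void DisableChannel(size_t i)         { m_enabled[i] = false; }
    bool IsChannelEnabled(size_t i) const { return m_enabled[i]; }

private:
    std::vector<bool> m_enabled;
};

// Hypothetical reference-counted channel handle.
class RefCountedChannel
{
public:
    RefCountedChannel(FakeScope& scope, size_t index)
        : m_scope(scope), m_index(index), m_refs(0) {}

    // Called when a filter or viewport starts using the channel.
    // The first user turns the channel on in hardware.
    void AddRef()
    {
        if(m_refs++ == 0)
            m_scope.EnableChannel(m_index);
    }

    // Called when a filter or viewport stops using the channel.
    // The last user turns the channel off in hardware.
    void Release()
    {
        assert(m_refs > 0);  // an unbalanced Release() is a caller bug
        if(--m_refs == 0)
            m_scope.DisableChannel(m_index);
    }

    size_t GetRefCount() const { return m_refs; }

private:
    FakeScope& m_scope;
    size_t m_index;
    size_t m_refs;
};
```

With this scheme the hardware state follows from what is displayed, which is why toggling channels on the front panel behind the client's back leaves the two out of sync.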
<bvernoux> I speak about > and >II (pause)
<bvernoux> buttons
<azonenberg> Yes. You should NOT clear the cache when you do a single trigger
<azonenberg> that would hugely slow things down
<bvernoux> which do single trigger or continuous
<azonenberg> There's a dedicated toolbar button for that which you can use if you change settings on the scope
<bvernoux> or maybe reload all with the settings button
<azonenberg> Clearing every trigger would be stupid and horribly slow
<bvernoux> Reload configuration from scope
<bvernoux> it shall check again the chan enabled/disabled ...
<bvernoux> that does not work for that
<azonenberg> That doesnt make sense though
<azonenberg> why would you turn channels on/off in the hardware?
<bvernoux> to do a full refresh
<bvernoux> because it is more convenient to do it on scope side with mso5 ;)
<bvernoux> as it is unusable at 1WFM/s on glscopeclient
<azonenberg> what would you expect glscopeclient to do if you turn a channel on? create a new waveform area?
<azonenberg> that's a GUI function
<bvernoux> for such stuff
<azonenberg> we can't have the library calling out to gui code that might not even exist
<azonenberg> and why would you turn a channel off in the hardware if you are still displaying a waveform?
<bvernoux> yes today I shall pause and manually remove the channels
<azonenberg> if you hide that waveform the channel turns off automatically
<bvernoux> yes it works
<bvernoux> in pause ;)
<azonenberg> The proper solution is fixing any UI hangs that make this workflow not usable
<azonenberg> Not adding hacky workarounds
<bvernoux> as with 1 WFM it is too slow to do anything with the GUI today
<azonenberg> again, acquisition framerate should not affect gui performance
<azonenberg> all reads done by gui code should be cached
<azonenberg> if they're not, that's a driver bug
<azonenberg> writes done by gui code are currently blocking but that will be fixed by write queueing
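A minimal single-threaded sketch of the idea above: reads are served from a cache, and writes are queued instead of blocking the caller. `FakeTransport`, `CachedDriver`, and the `"ch1_range"` key used in the test are all hypothetical. A real implementation would presumably need locking or a dedicated I/O thread, which is left out here.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <string>
#include <utility>

// Hypothetical transport that counts round trips to the instrument.
struct FakeTransport
{
    int reads = 0;
    int writes = 0;
    std::map<std::string, std::string> state;  // simulated instrument settings

    std::string Query(const std::string& key)
    {
        reads++;
        return state[key];
    }

    void Send(const std::string& key, const std::string& value)
    {
        writes++;
        state[key] = value;
    }
};

// Hypothetical driver front end: cached reads, deferred writes.
class CachedDriver
{
public:
    explicit CachedDriver(FakeTransport& t) : m_transport(t) {}

    // GUI-facing read: only touches the instrument on a cache miss.
    std::string Get(const std::string& key)
    {
        auto it = m_cache.find(key);
        if(it != m_cache.end())
            return it->second;
        std::string v = m_transport.Query(key);
        m_cache[key] = v;
        return v;
    }

    // GUI-facing write: update the cache immediately, defer the I/O.
    void Set(const std::string& key, const std::string& value)
    {
        m_cache[key] = value;
        m_pending.push_back({key, value});
    }

    // Called from the acquisition side, e.g. between waveforms.
    void FlushWrites()
    {
        while(!m_pending.empty())
        {
            m_transport.Send(m_pending.front().first, m_pending.front().second);
            m_pending.pop_front();
        }
    }

    // Rough equivalent of a "refresh" button: push out pending writes,
    // then forget cached values so the next read goes to the hardware.
    void ClearCache()
    {
        FlushWrites();
        m_cache.clear();
    }

private:
    FakeTransport& m_transport;
    std::map<std::string, std::string> m_cache;
    std::deque<std::pair<std::string, std::string>> m_pending;
};
```

In this sketch neither `Get()` on a cached key nor `Set()` waits on the instrument, so a slow acquisition rate would not stall the caller. An uncached read still blocks, which is the kind of driver bug mentioned above.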
<bvernoux> maybe you can simulate that with your scope ?
<bvernoux> or only the Rigol MSO5000 driver is affected
<bvernoux> it is not clear
<azonenberg> It's very likely that there's a driver bug where something that should be cached is not
<bvernoux> it can be simulated ;)
<bvernoux> with a basic server code simulating basically scope wfm
<bvernoux> with some delay ;)
<azonenberg> Anyway we know write queueing needs to happen
<azonenberg> That's the next of the major refactorings i've had pending
<azonenberg> I'll work on it over christmas
<azonenberg> in the meantime dont do anything major that breaks things :p
<bvernoux> yes let's wait for write queueing ;)
<bvernoux> I will optimize rigol SCPI commands
<azonenberg> But feel free to do wireshark captures of glscopeclient running and look for any read commands it shouldnt be making
<bvernoux> for mso5 only as I do not have the other scope
<bvernoux> just found a new bug
<bvernoux> Unhandled exception at 0x0000000004DBB558 (libscopeprotocols.dll) in glscopeclient.exe: 0xC0000094: Integer division by zero.
<bvernoux> div by zero ;)
<azonenberg> i just fixed one of those this morning
<azonenberg> where is it?
<bvernoux> when adding Jitter Spectrum
<bvernoux> on my 1st scope chan
<azonenberg> What was the input to the jitter spectrum? it should be a TIE waveform
<bvernoux> it was not a TIE waveform ;)
<azonenberg> but it shouldn't crash given an analog channel
<bvernoux> it was an analog chan
<azonenberg> it just won't display useful data
<bvernoux> just a square wave
<azonenberg> I confirm it does crash in that case though. interesting
<azonenberg> scopehal:#379 filed
<bvernoux> Waterfall also crashes on analog chan
<azonenberg> scopehal:#380 filed
<bvernoux> with an access violation
<azonenberg> Will work on both after work
<bvernoux> Access violation reading location 0x00000000000000A0.
<bvernoux> now I'm more confident ;)
<bvernoux> as the FFTS mess was really crappy
<bvernoux> it was not the fault of ffts ;)
m4ssi has quit [Remote host closed the connection]
<azonenberg> All of the filters *should* reject invalid inputs
<bvernoux> so you reproduce both issue ?
<azonenberg> Yes
<bvernoux> ok great
<azonenberg> Waterfall expects a FFT as input
<azonenberg> verify it works when you do that?
<bvernoux> yes let's try that
<bvernoux> I was thinking it was doing everything
<azonenberg> No, these are generally pretty single purpose "unix pipe chain" style processing blocks
<azonenberg> i will eventually add shorthands in the UI to handle common cases though
<bvernoux> ha ok logic in that case
<azonenberg> This will all be documented once i've finished writing the manual
<azonenberg> The bug, though, is that it allows you to pick illegal inputs
<azonenberg> I have input validation logic that should prevent you from passing anything but what it expects
<azonenberg> clearly, it's not perfect :p
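As a sketch of the kind of input validation being discussed, a filter can check the type of its input before processing and refuse to run otherwise. Every name here (`StreamType`, `Stream`, `FilterSketch`, `JitterSpectrumSketch`) is hypothetical and simplified. The log does not say what actually divided by zero, so the empty-input guard is only one plausible example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical waveform categories.
enum class StreamType { Analog, Digital, TIE, Spectrum };

// Hypothetical input stream: a type tag plus sample data.
struct Stream
{
    StreamType type;
    std::vector<float> samples;
};

// Hypothetical filter base class: Refresh() only runs on validated input.
class FilterSketch
{
public:
    virtual ~FilterSketch() = default;

    // Each filter declares what it accepts.
    virtual bool ValidateInput(const Stream& in) const = 0;

    // Returns false, and does no processing, if the input is rejected.
    bool Refresh(const Stream& in)
    {
        if(!ValidateInput(in))
            return false;
        DoRefresh(in);
        return true;
    }

protected:
    virtual void DoRefresh(const Stream& in) = 0;
};

// Stand-in for a filter that expects a TIE waveform.
// It just averages the samples; a real jitter spectrum would do an FFT.
class JitterSpectrumSketch : public FilterSketch
{
public:
    bool ValidateInput(const Stream& in) const override
    {
        return in.type == StreamType::TIE && !in.samples.empty();
    }

    float GetMean() const { return m_mean; }

protected:
    void DoRefresh(const Stream& in) override
    {
        // ValidateInput() guarantees a nonempty input, so the divisor is nonzero.
        float sum = 0;
        for(float s : in.samples)
            sum += s;
        m_mean = sum / in.samples.size();
    }

private:
    float m_mean = 0;
};
```

The same check could also drive the UI, so that only legal inputs are offered in the first place.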
<bvernoux> yes the waterfall work fine on a FFT
<azonenberg> Ok great so its just an input validation problem
<bvernoux> yes
<bvernoux> never heard of I3C ;)
<bvernoux> it is the successor of I2C ;)
<azonenberg> it's some weird mipi thing
<azonenberg> i've never seen it in the wild
<azonenberg> But i generally file a ticket for a protocol once i become aware that other scopes can decode it because clearly someone is using it
<bvernoux> yes funny
<azonenberg> then actually work on it once i get my hands on a device that implements it
<d1b2> <TiltMeSenpai> there's supposedly a couple Lattice FPGA's that support config over i3c
<bvernoux> you shall add eMMC ;)
<bvernoux> hydrabus fw support it now ;)
<azonenberg> I have emmc support already
<bvernoux> ha really ?
<azonenberg> it's the same as SD Card bus, just with a few different configs
<bvernoux> 1bits, 4bits 8bits ?
<azonenberg> So far only the 4 bit SD bus i believe
<bvernoux> ok
<bvernoux> how do you compute the horizontal scale ?
<bvernoux> for a scope
<bvernoux> as it is always wrong with mso5k
<bvernoux> I have 200us/div and there is always 10div on mso5k
<azonenberg> Read the driver code. Internally i calculate ps per sample and then use that
<azonenberg> the display scale is variable and decoupled from the scope config
<bvernoux> ha ok so it is the ps per sample which is wrong
<azonenberg> sorry fs not ps
<bvernoux> yes
<bvernoux> fs_per_sample
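For reference, a sketch of how a femtoseconds-per-sample value could be derived, either from a reported sample rate or from front-panel style settings (time/div, number of divisions, record length). The function names are made up for this sketch, and it assumes the record spans exactly the displayed divisions, which may not hold on a given scope.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Femtoseconds in one second (1e15).
constexpr int64_t FS_PER_SECOND = 1000000000000000LL;

// Sample interval in fs from a sample rate in samples per second.
int64_t FsPerSampleFromRate(int64_t samplesPerSecond)
{
    assert(samplesPerSecond > 0);
    return FS_PER_SECOND / samplesPerSecond;
}

// Sample interval in fs from time/div, division count, and record length.
// Assumes the captured record covers exactly secondsPerDiv * divisions.
int64_t FsPerSampleFromTimebase(double secondsPerDiv, int divisions, int64_t recordLength)
{
    assert(recordLength > 0);
    double spanSeconds = secondsPerDiv * divisions;
    return static_cast<int64_t>(std::llround(spanSeconds * 1e15 / recordLength));
}
```

For example, 200 us/div across 10 divisions is a 2 ms span. With a hypothetical 1,000,000-point record that works out to 2 ns, i.e. 2,000,000 fs per sample.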
<bvernoux> I do not understand the vertical scale
<bvernoux> as I have no point in negative
<bvernoux> and it is scaled to -6.0485
<bvernoux> maybe some wrong data during capture ?
<bvernoux> as it is scaled automatically depending on live data (min/max) ?
<azonenberg> it should not be auto scaled
<bvernoux> ha ?
<azonenberg> vertical scale should be based on querying the scope for adc full scale range in some way
<bvernoux> ha ok so part is also wrong
<azonenberg> the vertical display scale should always be synced to the adc range so you cannot clip samples off scale
<bvernoux> as both horizontal timebase is wrong and vertical scale too
<bvernoux> I suspect it uses code from OLD_DS ... which is not good for mso5k
<bvernoux> the formula float v = (static_cast<float>(temp_buf[j]) - ydelta) * yincrement;
<bvernoux> is definitely wrong for mso5k
<bvernoux> I shall check what is the correct ones ;)
<bvernoux> it shall take attenuation into account IIRC
<bvernoux> it is crazy no one report anything on mso5k
<bvernoux> as there is tons of guys which have it ;)
<bvernoux> more than guys with high end Lecroy ;)
<azonenberg> No idea
<azonenberg> all i can say is i only fix the bugs people tell me about
<bvernoux> yes ;)
<bvernoux> I have just received my STM32F1xxx with LQFP48 socket programmer ;)
<bvernoux> very nice
<bvernoux> it work like a charm with a GD32F103 provided in LQFP48 in the socket
<bvernoux> it is a project to hack the NanoVNA 2Plus4 to replace the MCU by latest STM32LP4
<bvernoux> which is pin compatible LQFP48
m4ssi has joined #scopehal
_whitelogger has joined #scopehal
<azonenberg> finally got my other diff probe set up
<azonenberg> I really need to build myself a diff probe. But I want to get the single-ended versions fully debugged first
<bvernoux> the orange shall be better
<bvernoux> the pink have a so long ground
<bvernoux> I imagine the mess ...
<bvernoux> it is not a ground it look like an antenna ;)
<bvernoux> if it is differential probe such long ground shall be not too problematic
<azonenberg> They're both differential
<azonenberg> they probably have an inductor or choke in the ground to keep RF from going through it, in fact
<azonenberg> it's just to keep the common mode of the scope and DUT at about the same level
lukego has quit [*.net *.split]
lukego has joined #scopehal
m4ssi has quit [Remote host closed the connection]
bvernoux has quit [Quit: Leaving]
<azonenberg> ok, fixed those crashes