<_whitenotifier-f>
[scopehal] azonenberg labeled issue #352: Strange issues with jitter spectra changing length each trigger - https://git.io/JkdyV
<_whitenotifier-f>
[scopehal] azonenberg opened issue #352: Strange issues with jitter spectra changing length each trigger - https://git.io/JkdyV
<_whitenotifier-f>
[scopehal] azonenberg commented on issue #352: Strange issues with jitter spectra changing length each trigger - https://git.io/Jkd78
<azonenberg>
lain: hmmmm
<azonenberg>
So i tried to plot the jitter spectrum of my signal and got absolute garbage
<azonenberg>
did a bit of digging and it turns out the root cause is that my TIE samples are not at a uniform rate, because I only output a sample when the input toggles
<azonenberg>
and of course the FFT doesn't properly handle this
<azonenberg>
Sooo i'm trying to think of the best way to handle this
<azonenberg>
I think probably the best option starting out is to assume that the baud rate is constant, in which case all TIE samples are at integer multiples of some common spacing
<azonenberg>
and then fill in data between when there's no toggles so i end up with one sample per UI regardless of the actual data pattern
<azonenberg>
but should i zero fill? interpolate? trying to think what makes the most sense when FFTing irregularly sampled data
<lain>
hmm
<lain>
gooood question
<azonenberg>
the other option is to decimate
<azonenberg>
dropping down to the longest run length
<azonenberg>
but i feel like that would lose data
<lain>
yeah
<azonenberg>
I think i'm going to start out by zero filling and see how that looks
<lain>
hmmm
<lain>
zero-filling between "time domain" samples that you're going to feed into an FFT is going to create high frequency spikes I believe
<azonenberg>
Yeah that was my concern. but i dont know the right way to fix it
<azonenberg>
linear interpolation would probably produce bogus values too
<lain>
so it's literally just the DFT, but replacing 't' with the sample time for each sample
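A minimal sketch of the idea lain describes: a direct non-uniform DFT that uses each sample's actual timestamp in the exponent. The function name `nudft` and the 5 Hz test tone with dropped samples are hypothetical, chosen only for illustration; this is the brute-force O(N*M) sum, not a fast algorithm and not scopehal code.

```python
import cmath
import math

def nudft(times, values, freqs):
    """Direct non-uniform DFT: X(f) = sum over n of x[n] * exp(-2j*pi*f*t[n])."""
    spectrum = []
    for f in freqs:
        acc = 0j
        for t, x in zip(times, values):
            # Same kernel as the ordinary DFT, but t is the real sample time
            # rather than n * (fixed sample period).
            acc += x * cmath.exp(-2j * math.pi * f * t)
        spectrum.append(acc)
    return spectrum

# Hypothetical test signal: a 5 Hz tone on a 10 ms grid with some samples
# missing, mimicking a stream that only yields a sample when the data toggles.
times = [n * 0.01 for n in range(200) if n % 7 not in (2, 3)]
values = [math.cos(2 * math.pi * 5.0 * t) for t in times]

freqs = [float(f) for f in range(1, 11)]  # evaluate at 1..10 Hz
mags = [abs(X) for X in nudft(times, values, freqs)]
peak_freq = freqs[mags.index(max(mags))]
print("strongest bin:", peak_freq, "Hz")
```

The gaps in the sampling pattern still produce spurs at other frequencies, so this recovers the tone but does not make the spectrum clean by itself.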
<azonenberg>
yeah but all of the good libs i see for it are GPL
<lain>
time to implement DFT and NUDFT as a compute shader? ;)
<azonenberg>
that would be a good endgame, yes
<azonenberg>
for now i think interpolation is a good starting point and then doing an equispaced fft
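A rough sketch of the interpolate-then-FFT approach, assuming a constant baud rate so every TIE sample sits at an integer UI index. The function name `fill_tie_per_ui` and its argument layout are hypothetical. It fills the gaps between edges by linear interpolation so the output has exactly one value per UI, ready for an equispaced FFT.

```python
def fill_tie_per_ui(ui_indices, tie_values):
    """Linearly interpolate TIE so there is one value per UI.

    ui_indices: strictly increasing integer UI numbers where an edge occurred
    tie_values: TIE measured at each of those edges (same length)
    """
    if len(ui_indices) != len(tie_values):
        raise ValueError("ui_indices and tie_values must have the same length")
    if not ui_indices:
        return []

    filled = [tie_values[0]]
    for k in range(1, len(ui_indices)):
        span = ui_indices[k] - ui_indices[k - 1]
        if span <= 0:
            raise ValueError("ui_indices must be strictly increasing")
        v0, v1 = tie_values[k - 1], tie_values[k]
        # step == span lands exactly on the next measured edge.
        for step in range(1, span + 1):
            filled.append(v0 + (v1 - v0) * step / span)
    return filled

# Edges at UI 0, 1, 4 and 5: UIs 2 and 3 had no toggle and get interpolated.
uniform = fill_tie_per_ui([0, 1, 4, 5], [1.0, 2.0, 5.0, 4.0])
print(uniform)
```

Linear interpolation acts as a low-pass filter on the TIE track, so the upper part of the resulting spectrum should be treated with some suspicion.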
<lain>
you really need to implement windowing btw
<azonenberg>
Yeeeah
<azonenberg>
What's the best way to do that on a naked fft?
<azonenberg>
Do you just convolve the input with the window?
<d1b2>
<david.lenfesty> No you just multiply it directly
<azonenberg>
multiply directly? how does that even work
<azonenberg>
is your window as long as the input signal?
<d1b2>
<david.lenfesty> Yeah you size the window for your FFT size
<azonenberg>
Interesting
<lain>
yeah
<d1b2>
<david.lenfesty> I mean it's basically just avoiding discontinuities at either ends of your samples
<d1b2>
<david.lenfesty> and you pick the window that has the best effect on your output
<azonenberg>
So is the common "rectangular" window just a no-op then? or does it force the first/last sample to zero or something like that?
<lain>
exactly, FFT/DFT assume a periodic input, so the point of the window function is to make it so if you repeat your input buffer (after windowing) end-to-end, there's no discontinuity
<lain>
the common "rectangular window" is no window at all, yeah
<lain>
also note: I'm not sure if you're doing this (you should!), but, as I'm sure you're aware, DFT frequency-domain sample count is the same as the number of time-domain samples you feed in. to increase the frequency-domain sample count, you can append zeroes to your input samples, for as many samples as you need
<lain>
but you must append those zeroes *after* windowing
<azonenberg>
Yes i'm doing zero padding. Good to know
<azonenberg>
Right now what i do is zero pad to the next power of two up
<lain>
makes sense
<azonenberg>
since ffts expects power of two input size
<azonenberg>
And in all cases the output of the window is the same length as the input right?
<d1b2>
<david.lenfesty> yep
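To make the ordering concrete, here is a small sketch: multiply the samples by a window of the same length, then append zeros up to the next power of two. The helper names are hypothetical and the symmetric Hann formula is one common convention (an assumption, not necessarily what the scopehal FFT filter uses).

```python
import math

def hann_window(n):
    """Symmetric Hann window of length n (endpoints are zero)."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def next_power_of_two(n):
    """Smallest power of two that is >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()

def window_then_pad(samples):
    """Window first (element-wise multiply), then zero pad."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    window = hann_window(n)
    windowed = [s * w for s, w in zip(samples, window)]
    # Padding after windowing keeps the taper on the real data only.
    return windowed + [0.0] * (next_power_of_two(n) - n)

padded = window_then_pad([1.0, 1.0, 1.0, 1.0, 1.0])
print(len(padded), padded)
```

Padding before windowing would stretch the window over the zeros, so the real data would no longer taper to zero at its own end.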
<codysseus>
Hope y'all had a nice Thanksgiving!
<azonenberg>
Had a nice feast, you?
<azonenberg>
lain: btw, it annoys me to no end that AVX doesn't have a vector sin/cos
<azonenberg>
i really want to vectorize the window processing here
<azonenberg>
I've seen some nice hacks using taylor series or similar, which claim to be faster than doing 8 sequential sin/cos operations
<azonenberg>
gonna have to benchmark them
<azonenberg>
but i wish the cpu could do it natively
<azonenberg>
realistically long term i think i'm gonna be pushing ffts to the gpu
<azonenberg>
so it wont matter :p
<azonenberg>
the gpu should have no problem doing a buttload of parallel trig
<azonenberg>
eye patterns are my top priority for GPU processing though
<azonenberg>
as they shouldn't be that hard to parallelize, and they're one of the most compute heavy filters i have right now
<azonenberg>
i'm probably going to write a multithreaded proof of concept to test the parallel eye algorithm first
<azonenberg>
one thing i need to figure out is how to do gpu processing in *libscopehal* rather than glscopeclient
<azonenberg>
right now scopehal doesnt understand GPU stuff at all
<azonenberg>
lain: ok so the fft filter now supports Hann, Hamming, Rectangular, and Blackman-Harris windows
<azonenberg>
is that a good starting set?
<d1b2>
<Darius> have you seen gr-fosphor?
<d1b2>
<Darius> (might be worth looking at it for ideas)
<codysseus>
Yes, had a traditional Thanksgiving feast. The stuffing turned out excellent this year.
<azonenberg>
darius: for what, gpu compute? or what
<d1b2>
<Darius> it uses GL and CL for a lot of heavy lifting
<d1b2>
<Darius> worth stealing some ideas 🙂
<azonenberg>
oh. the challenge is more on the object model side of things
<sorear>
…what are you using for a scalar sin/cos?
<azonenberg>
sorear: just the normal libc sin/cos functions right now
<azonenberg>
for example, right now i rely on GTK to initialize the opengl context
<azonenberg>
but if you are using libscopehal headless, what happens?
<azonenberg>
Do i detect no GL context and fall back to software? or should there be some way to create a no-video compute context?
<azonenberg>
darius: those are the logistical challenges to work out
<azonenberg>
before i start pushing waveform compute to the GPU
<azonenberg>
it needs to happen, i've just been prioritizing other stuff
electronic_eel has quit [Ping timeout: 240 seconds]
<d1b2>
<Darius> ahh fair enough
<d1b2>
<Darius> not sure how many platforms would have OpenGL without OpenCL these days
electronic_eel has joined #scopehal
<azonenberg>
I'm using GL compute shaders because they interoperate very nicely with GL rendering
<azonenberg>
lain: also i think for de-embedding i want to keep using the rectangular window
<azonenberg>
because i don't want to attenuate everything that's not at the very middle of my waveform
<sorear>
azonenberg: just not sure what you're comparing to, my libc sin/cos are just a big run of avx ops evaluating a polynomial
<azonenberg>
hmm, i havent looked
<azonenberg>
i know i386 had dedicated sin/cos insns in the x87 instruction set
<azonenberg>
i havent looked at how it's done in x86-64
<d1b2>
<Darius> worth looking at how numpy et al. do it I would have thought
<_whitenotifier-f>
[scopehal] azonenberg 6c7aab5 - FFT filter now supports Hann, Hamming, and Blackman-Harris windows (default Hamming) as well as the previous default of rectangular. Fixes #129.
<_whitenotifier-f>
[scopehal] azonenberg 39d73cf - Added AVX sin/cos/exp/log library (zlib license, single file). Made some tweaks so it compiles on current gcc.
<Kliment>
azonenberg: that's the difference between each clock edge and the average recovered clock? or between each clock edge and nominal?
<azonenberg>
TIE is each clock edge compared to the recovered clock using an idealized PLL
<azonenberg>
you cant use a fixed frequency because PCIe is spread spectrum clocked
<Kliment>
So it's adaptive over time?
<azonenberg>
Yes
<Kliment>
Over what window?
<azonenberg>
Which filters out the low frequency "jitter" caused by the spread spectrum modulation
<azonenberg>
Right now? I have absolutely no clue :p
<Kliment>
It's kinda a very critical factor
<azonenberg>
My current CDR PLL is a rather arbitrary bang-bang control loop i made with no basis in theory and tuned just by seeing what gave the nicest looking eye pattern
<azonenberg>
there's an open ticket for implementing some actual golden reference PLLs
<Kliment>
also that eye looks absolutely awful
<azonenberg>
And i lack the control theory math to understand it
<azonenberg>
That's because it has 6 dB of de-emphasis
<azonenberg>
Which i'm not inverting
<Kliment>
apo: ^ control theory needed
<azonenberg>
i chose this as a test signal because the de-emphasis produces ISI in the jitter plots
<azonenberg>
which makes it a good test subject for jitter decomposition
<azonenberg>
you get nice peaks in the histograms instead of a smooth gaussian distribution like you'd have if you only had Rj
<azonenberg>
now there's DDj too
<Kliment>
I don't know what Rj and DDj are
<azonenberg>
random and data dependent jitter
<Kliment>
Ah
<azonenberg>
This is the VL805 PCIe 2.0 x1 to USB2/3 controller on the raspberry pi4
<azonenberg>
i'm sniffing the TX bus from the VL805 to the SoC right next to the VL805, off the ac coupling caps
<azonenberg>
The thing has WAY too much drive strength for what it's doing
<azonenberg>
it's using full 1.2V p-p swing with what looks like 6 dB of de-emphasis for... about a 20mm long run? less?
<Kliment>
Less
<azonenberg>
so i imagine the eye at the soc is shit too lol
<azonenberg>
The RX side, driven by the pi, looks much nicer
<azonenberg>
i suspect the BCM-whatever has more sane drive settings and is tuned for short range use
<azonenberg>
the VL805 is commonly used on PC motherboard pcie addon cards
<Kliment>
Yep
<azonenberg>
it might not even support low drive strength
<Kliment>
Designed for much much longer paths
<azonenberg>
i mean most serdes have adjustable drive strength for exactly this reason
<azonenberg>
but if it has that capability, the kernel driver in raspbian doesn't enable it
<Kliment>
or it doesn't matter
<azonenberg>
I mean, it wastes power and probably produces a little more EMI
<azonenberg>
but the BER seems fine
<azonenberg>
the eye is nice and open, it's just excessive
<Kliment>
azonenberg: What firmware do you have on the pi?
<azonenberg>
default, i just pulled latest raspberry pi os image and burned to a sd card
<Kliment>
azonenberg: that's unrelated to fw
<azonenberg>
enabled sshd, plugged in a usb ethernet dongle, and started ping flooding something to generate pcie traffic to look at
<azonenberg>
so it's factory default config basically
<Kliment>
azonenberg: factory default config changed in late summer
<Kliment>
azonenberg: so you need to run a firmware update if your pi was manufactured prior to that
<azonenberg>
Interesting. so there's nonvolatile config on the board?
<azonenberg>
i thought everything came from the sd card
<azonenberg>
and the boot just ran off the soc's rom until it hit the sd
<azonenberg>
and all the blob fw was loaded off the sd
<Kliment>
yes, the pi4 has an eeprom
<Kliment>
the previous pis did not
<azonenberg>
Interesting. Good to know if i ever use a pi for anything sensitive :p
<Kliment>
they made a major change to the pcie config in that update
<Kliment>
which had an enormous impact on power usage and thermals
<azonenberg>
Interesting. Well, at some point i will apply the update but i specifically do not want to do it now
<azonenberg>
because this overly strong drive is actually a good test subject :p
<Kliment>
azonenberg: yeah, you might want to grab another pi4 and compare
<azonenberg>
It would not surprise me if that fw update changes drive strength on the vl805 though
<azonenberg>
i'll check it out later on
<Kliment>
azonenberg: it almost certainly does
<azonenberg>
that would be a cool demo though
<Kliment>
azonenberg: I think you can downgrade fw too
<azonenberg>
comparing eyes before and after
<Kliment>
azonenberg: so you might be able to do it on the same hardware
<azonenberg>
how can i see the current fw rev?
<azonenberg>
is there a linux command?
<Kliment>
yeah, hold on
<azonenberg>
Also, my second 4 GHz active diff probe should be shipping tuesday. So in a week or so i should be able to look at both sides of the link
<monochroma>
azonenberg: vcgencmd version
<Kliment>
vcgencmd bootloader_version
<Kliment>
monochroma: pretty sure it's bootloader_version not version
<lain>
azonenberg: yeah that sounds good @ fft window set
<monochroma>
Kliment: version gets the video core firmware build date
<Kliment>
monochroma: that's not the bit that's of interest here
<azonenberg>
Kliment: oct 22 2020 13:59:27 is "version" and sep 3 2020 13:11:43 is bootloader_version
<Kliment>
monochroma: we are looking at the eeprom that has the vl805 configuration
<azonenberg>
So maybe the VL805 doesn't support programmable drive or something
<azonenberg>
PCIe spec mandates full strength and does not require reduced
<azonenberg>
it defines both but reduced is optional
<Kliment>
azonenberg: maybe this is the reduced strength
<azonenberg>
No
<azonenberg>
it matches the full strength voltage measurements in the pcie gen2 spec
<azonenberg>
I measured 617 to -621 mV differential swing on a random 0-1 transition
<monochroma>
iirc the major change in the FW update was they enabled PCIe idling on the SoC side?
<azonenberg>
Which is 1238 mV differential swing
<azonenberg>
That is actually slightly too high. PCIe 2.0 base spec table 4-9 on page 247
<azonenberg>
Vtx-diff-pp is max 1.2V, min 0.8, at full swing
<azonenberg>
and max 1.2, min 0.4 at reduced swing
<azonenberg>
With de-emphasis ratios of 3.5 dB nominal (allowable 3.0 - 4.0 dB) and 6.0 dB nominal (allowable 5.5 - 6.5 dB)
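The arithmetic behind that comparison, as a quick sketch. The measured voltages and the full-swing limits are the figures quoted in the chat; the variable names and the `deemphasis_ratio` helper are hypothetical.

```python
# Measured differential levels on one 0-1 transition, in mV (from the chat).
v_high_mv = 617.0
v_low_mv = -621.0
swing_mv = v_high_mv - v_low_mv  # peak-to-peak differential swing

# Full-swing Vtx-diff-pp limits as quoted above, in mV.
FULL_SWING_MIN_MV = 800.0
FULL_SWING_MAX_MV = 1200.0

within_full_swing = FULL_SWING_MIN_MV <= swing_mv <= FULL_SWING_MAX_MV
excess_mv = swing_mv - FULL_SWING_MAX_MV

def deemphasis_ratio(db):
    """Voltage ratio of a de-emphasized bit to a full-swing bit."""
    return 10 ** (-db / 20.0)

print("swing:", swing_mv, "mV, within limits:", within_full_swing)
print("over max by:", excess_mv, "mV")
print("6 dB de-emphasis leaves about", round(deemphasis_ratio(6.0), 3), "of full swing")
```

The 38 mV overshoot comes from a single transition, so probe and scope accuracy could easily account for it.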
<lain>
yeah, what monochroma said -- they enabled pcie power saving
<lain>
pcie power saving being disabled is what caused the soc to thermal throttle with all cores underclocked to minimum and fully idle heh
<azonenberg>
lain: So it seems my fft peak detector is sloooow
<azonenberg>
that's my current optimization focus before i do anything else tonight
<azonenberg>
In that screenshot, i was running at 5.5 WFM/s
<azonenberg>
my ~1 minute test used 60.19 sec of CPU time, so basically it averaged 100% on one core
<lain>
:o
<azonenberg>
And PeakDetectionFilter::FindPeaks() took 10.882 sec of that
* lain
nod
<azonenberg>
by comparison the *eye pattern* took 10.685 sec
<azonenberg>
And that is one of my more compute intensive filters - like, it's first in line along with de-embedding to get moved to GPU once i'm ready to do that
electronic_eel_ is now known as electronic_eel
<_whitenotifier-f>
[scopehal] azonenberg pushed 3 commits to master [+0/-0/±5] https://git.io/JkbJ0
<_whitenotifier-f>
[scopehal] azonenberg f8b5656 - Massive optimizations to PeakDetectionFilter. Improved accuracy by interpolating results. Avoid double-covering areas we already know don't have peaks in them (50x speedup!)
<_whitenotifier-f>
[scopehal] azonenberg 484050c - Added "femto" prefix to Unit
<_whitenotifier-f>
[scopehal] azonenberg 35c590e - JitterSpectrumFilter: now display linear rather than log jitter
<azonenberg>
i'm not sure how their jitter spectrum is so much cleaner than mine
<azonenberg>
I note that their spectrum is not actually a Tj spectrum, it's Rj + BUj
<azonenberg>
Which seems to imply they've removed ISI from it
<azonenberg>
aka DDJ
<_whitenotifier-f>
[scopehal-apps] azonenberg pushed 1 commit to master [+0/-0/±2] https://git.io/JkbIB
<_whitenotifier-f>
[scopehal-apps] azonenberg 1934cf8 - WaveformArea: correctly display in-band power on linear spectra
<azonenberg>
So i think i'm starting to reach the point at which #217 is becoming an issue
<azonenberg>
we simply don't have enough timing resolution
<azonenberg>
This might end up being my weekend project
<azonenberg>
Changing the timescale unit for literally everything will probably take a while
<azonenberg>
every scope driver, every filter that works with frequency or timing...
<azonenberg>
Every trigger type
<azonenberg>
Between the refactoring and the fallout from the inevitable bugs, i'll be at it a while. But this has to happen to support higher end scopes no matter what, and my CDR PLL already has sub-picosecond resolution interpolated
<azonenberg>
and i feel like the longer i put it off the more work it's going to be
<azonenberg>
because more code to fix
<azonenberg>
Some quick grepping shows nearly 200 locations in libscopehal/scopeprotocols that reference ps as a timescale unit
<lain>
azonenberg: are you windowing that FFT for the jitter spectrum?
<lain>
I ask because a lack of windowing will result in significantly higher noise floor
<azonenberg>
Yeah the jitter spectrum is derived from the base fft filter class
<azonenberg>
i think i used blackman-harris for that screenshot
<azonenberg>
as it gave the best noise floor
<_whitenotifier-f>
[scopehal] azonenberg edited issue #217: Change base time unit from ps to fs - https://git.io/JJooR
<_whitenotifier-f>
[scopehal] azonenberg commented on issue #217: Change base time unit from ps to fs - https://git.io/JkbZG
<_whitenotifier-f>
[scopehal] azonenberg pushed 2 commits to master [+0/-0/±117] https://git.io/Jkbg7
<_whitenotifier-f>
[scopehal] azonenberg e85f1a9 - scopehal: Refactored all drivers, triggers, and Unit class to use femtoseconds instead of picoseconds as base time unit. See #217.
<_whitenotifier-f>
[scopehal] azonenberg 97e647f - scopeprotocols: converted all filters to us fs as timebase unit. Fixes #217.
<_whitenotifier-f>
[scopehal] azonenberg closed issue #217: Change base time unit from ps to fs - https://git.io/JJooR
<azonenberg>
that took less time than i thought, only two hours for the core of the refactoring on both libraries. Still have to do the glscopeclient side
<azonenberg>
And then i get to find how badly i broke everything :p
<azonenberg>
one of the problems with a change this massive is there's not really any way to test it incrementally
<azonenberg>
my intent when designing libscopehal was to use a timestep so small that i'd never have to use fractions of it. I wasn't thinking about 80 Gsps scopes or sub-picosecond jitter on high speed serial links...
<miek>
would it not make sense to have a custom type for "scopehal time"? and define methods to go from/to SI units
<azonenberg>
i'm adding some #defines for FS_PER_SECOND etc which will simplify things a bit
<azonenberg>
but i dont see a reason to define a new internal type. The only benefit of that would be allowing further refactoring in the future
<azonenberg>
and honestly, i don't see us ever having to go below fs
<azonenberg>
i had actually considered using tens or hundreds of fs as the base unit but decided to go single fs since it was a SI prefix
<azonenberg>
ps was *almost* enough, i just needed one more sigfig
<azonenberg>
i dont see us needing sub fs resolution... probably ever lol. single fs resolution is enough for 0.0005 UI measurements on a 500 Gbps data stream
<azonenberg>
the best plls i've worked with have jitters in the low hundreds of fs
<azonenberg>
there do exist oscillators with jitter in the single digit fs apparently. But i think we're probably still good. you're not gonna be using a realtime scope to measure those
<azonenberg>
And the good news is, with the new defines, it wouldn't be THAT painful to switch to attoseconds if it became necessary :p
<azonenberg>
the other thing is, femtoseconds is a reasonably convenient time unit for scope measurements
<azonenberg>
it's nice and small, but using a 64-bit integer you still get about five hours of range
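A quick check of those numbers. `FS_PER_SECOND` is the define name mentioned above; everything else here is a hypothetical illustration. Whether the range comes out near five hours depends on the integer being unsigned: a signed 64-bit count covers roughly half of that.

```python
FS_PER_SECOND = 10**15

def hours_of_range(max_count):
    """Hours representable when counting femtoseconds up to max_count."""
    return max_count / FS_PER_SECOND / 3600.0

signed_hours = hours_of_range(2**63 - 1)    # int64_t style
unsigned_hours = hours_of_range(2**64 - 1)  # uint64_t style

# Resolution of 1 fs expressed in UI on a 500 Gbps stream.
bit_rate_bps = 500 * 10**9
ui_fs = FS_PER_SECOND // bit_rate_bps  # one UI, in fs
resolution_ui = 1 / ui_fs

print("signed 64-bit range:  %.2f hours" % signed_hours)
print("unsigned 64-bit range: %.2f hours" % unsigned_hours)
print("UI at 500 Gbps:", ui_fs, "fs, so 1 fs =", resolution_ui, "UI")
```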
juli966 has joined #scopehal
<azonenberg>
monochroma, lain, marshallh (and any other lecroy users)
<azonenberg>
from what i can see lecroy contracts a lot of their decoder dev out to this guy
<monochroma>
oooo interesting
<azonenberg>
I've known about the company but now it's starting to look like it's one ex-lecroy guy who went off on his own
<azonenberg>
and still does contract work for them
<azonenberg>
codysseus: just a heads up, i'm almost done with a major refactoring which changes the internal timebase unit from ps to fs
<azonenberg>
not sure if you've written any analysis code yet but if your timestamps are off by a factor of a thousand with the latest version of scopehal that's why ;p
<_whitenotifier-f>
[scopehal] azonenberg pushed 1 commit to master [+0/-0/±4] https://git.io/Jkbp3
<_whitenotifier-f>
[scopehal] azonenberg 6b89283 - Fixed several bugs from refactoring
<_whitenotifier-f>
[scopehal-apps] azonenberg pushed 2 commits to master [+0/-0/±21] https://git.io/Jkbpl
<_whitenotifier-f>
[scopehal-apps] azonenberg f3b768a - Initial refactoring for femtosecond timebase. Doesn't load old picosecond-format files yet.
<_whitenotifier-f>
[scopehal-apps] azonenberg 6bbaf39 - Fixed several bugs from refactoring. Initial support for loading legacy picosecond-resolution files as well as loading and saving femtosecond-resolution.
<_whitenotifier-f>
[scopehal] azonenberg pushed 1 commit to master [+0/-0/±1] https://git.io/JkbjF
<_whitenotifier-f>
[scopehal-apps] azonenberg pushed 1 commit to master [+0/-0/±4] https://git.io/JkbjN
<_whitenotifier-f>
[scopehal-apps] azonenberg 15ff6db - Fixes to legacy file format loading
<azonenberg>
Welp, turns out it wasnt a weekend project. It's 9 in the morning on saturday and #217 is done
<codysseus>
azonenberg: Oh, thank you for letting me know! I will keep an eye out for that issue.
bvernoux has joined #scopehal
<azonenberg>
codysseus: it's all pushed now. If you're using the units framework, any references to the old picosecond timebase unit will fail to compile (this is intentional, to catch such errors). All you have to do is change it to fs when pretty-printing
<azonenberg>
if you're referencing internal timebase units as integers, just know that they're now measured in fs rather than ps
<_whitenotifier-f>
[scopehal] azonenberg pushed 1 commit to master [+0/-0/±2] https://git.io/JkNT3
<_whitenotifier-f>
[scopehal] azonenberg a1cb055 - ClockRecoveryFilter: converted to integer math in inner loop rather than switching from float to integer and back all the time, now that we have high resolution timebase
<_whitenotifier-f>
[scopehal-apps] azonenberg pushed 1 commit to master [+0/-0/±1] https://git.io/JkNTS
<_whitenotifier-f>
[scopehal-apps] azonenberg 48d4f6d - Updated to latest scopehal
<azonenberg>
(I updated the example to match)
<codysseus>
That is very nice, thank you :)
<azonenberg>
This is a long overdue refactoring. It fixes some precision issues i was starting to hit with serial data streams above a few Gbps
<azonenberg>
and is also necessary to support >40 Gsps scopes
<azonenberg>
because the next standard sample rate up from 40 is 80, at least in LeCroy's product line, and 80 Gsps is not an integer number of picoseconds per sample (12.5)
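The sample-period arithmetic can be checked with exact fractions. This is only an illustrative sketch; the constant and function names are hypothetical.

```python
from fractions import Fraction

PS_PER_SECOND = 10**12
FS_PER_SECOND = 10**15

def sample_period(rate_sps, units_per_second):
    """Exact sample period, in the given time unit, for a sample rate in Sa/s."""
    return Fraction(units_per_second, rate_sps)

for gsps in (40, 80):
    rate = gsps * 10**9
    ps = sample_period(rate, PS_PER_SECOND)
    fs = sample_period(rate, FS_PER_SECOND)
    print("%d Gsps: %s ps (integer: %s), %s fs (integer: %s)"
          % (gsps, ps, ps.denominator == 1, fs, fs.denominator == 1))
```

At 80 Gsps the period is 25/2 ps, which an integer picosecond timebase cannot represent, while in femtoseconds it is exactly 12500.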
<codysseus>
Oh that makes sense. Those small mismatches are a problem. Glad to hear there's more support now!
<azonenberg>
Yeah i'm eyeing several higher end scopes as potential 2021 purchases so this had to happen at some point before then
<azonenberg>
but i was also running into issues with jitter analysis that made it important to do sooner
<codysseus>
Ah yeah precision is important in that area. analog to digital is so finicky.
<azonenberg>
Yeah. in fact i'm starting to think the linear interpolation i'm doing for the zero crossings might not even be enough, debating moving to sinc
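For reference, linear interpolation of zero crossings looks roughly like the sketch below. The function name, the threshold parameter, and the use of femtoseconds for the sample period are assumptions for illustration, not the actual scopehal implementation. A sinc-based version would replace the straight-line estimate between the two samples with a band-limited reconstruction.

```python
def zero_crossings(samples, dt_fs, threshold=0.0):
    """Return interpolated crossing times (in fs) of a uniformly sampled waveform.

    samples: voltages at t = 0, dt_fs, 2*dt_fs, ...
    dt_fs: sample period in femtoseconds
    """
    crossings = []
    for i in range(len(samples) - 1):
        a = samples[i] - threshold
        b = samples[i + 1] - threshold
        if (a < 0) != (b < 0):  # the two samples straddle the threshold
            # Fraction of the interval at which the straight line a -> b hits 0.
            frac = a / (a - b)
            crossings.append((i + frac) * dt_fs)
    return crossings

# Hypothetical 40 Gsps capture (25000 fs per sample).
edges = zero_crossings([-1.0, 1.0, 3.0, -1.0], 25000)
print(edges)
```

The straight-line assumption is worst when there are only a few samples per edge, which is where band-limited interpolation would be expected to help most.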
<codysseus>
Now there is a new EE thing for me to learn. I was about to say Single Income No Children
<d1b2>
<Hardkrash> re 80Gsps, Intel using femtoseconds per period for the HPET is not sounding so silly now...