<awygle> i specifically wanted polarized, i never buy non-polarized sunglasses anymore
<azonenberg_work> Yeah i dont care about the polarized aspect
<azonenberg_work> My requirements were a single frame available with replaceable clear and dark lenses
<azonenberg_work> Adjustable earpieces, rubberized nose to help hold it in place when heavily sweating etc
<azonenberg_work> then Z87.1 high impact as well as MIL-PRF-31013 impact standards
<whitequark> azonenberg_work: done
<whitequark> oh sec
<awygle> i can't find any actual, like, manufacturer website for these sunglasses
<openfpga-bot> [jtaghal] whitequark pushed 1 new commit to master: https://git.io/fAwGe
<openfpga-bot> jtaghal/master e19e380 whitequark: Implement device enumeration for GlasgowSWDInterface.
<rqou> azonenberg_work: so, i just discovered that apparently you can get slide-out storage units that fit in server racks
<rqou> did you know about that?
<awygle> they're, uh... not cheap
<rqou> heh, figured as much
<rqou> i didn't even know these existed
<awygle> we had a couple at planetary
<awygle> rack mount shelving ditto
<azonenberg_work> whitequark: gaah the forward is broken again i think
<rqou> lool
<rqou> use ipv6 without a forward? :P
<whitequark> azonenberg_work: fixed i think
<whitequark> let me look up some keepalive options for ssh
<whitequark> azonenberg_work: should do keepalive now
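For reference, the usual OpenSSH keepalive options look like this (the interval values here are arbitrary examples, not what whitequark actually set):

```
# client side, ~/.ssh/config
Host *
    ServerAliveInterval 30    # probe the server every 30 s
    ServerAliveCountMax 3     # drop the connection after 3 unanswered probes

# server side, /etc/ssh/sshd_config equivalent:
# ClientAliveInterval 30
# ClientAliveCountMax 3
```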
<whitequark> rqou: russia
<whitequark> does not have ipv6
<whitequark> anywhere afaik
<rqou> what
<whitequark> we just have NAT. a lot of NAT
<rqou> whyyy?!
<whitequark> most big ISPs don't give you real IPs anymore
<whitequark> my guess is there's no real incentive
<whitequark> there was NAT even before IPv4 exhaustion
<whitequark> also, shitty SOHO routers don't do IPv6 and no one wants to upgrade all that
<rqou> there's no "real" incentive in the us and yet isps are very slowly upgrading
<whitequark> for some reason ipv6 is quite popular in the US on mobile
<whitequark> no idea why
<rqou> meanwhile supposedly some isps in brazil are doing ipv6-only with ipv4 over ipv6
<rqou> i learned this from a mojang bug report because mojang thought they were really clever by totally disabling ipv6 in their launcher
<rqou> interestingly, I don't have ipv6 on mobile, just nat
<openfpga-bot> [jtaghal-apps] azonenberg pushed 1 new commit to master: https://git.io/fAwGM
<openfpga-bot> jtaghal-apps/master df3d9d7 Andrew Zonenberg: Added initial enumeration support for Glasgow
<openfpga-bot> [jtaghal-cmake] azonenberg pushed 1 new commit to master: https://git.io/fAwGD
<openfpga-bot> jtaghal-cmake/master 0bdeb6c Andrew Zonenberg: Updated to latest submodules
<rqou> surprisingly, comcrap in the us has a really competent backend team and has been deploying native dual stack for quite some time
<azonenberg_work> rqou: yeah looking forward to getting a proper dual stack setup on a static allocation here once i'm set up
<azonenberg_work> i had a tunnel before just because i didnt want to renumber the network too many times
<azonenberg_work> but once i'm set up at the new lab i'm going full static /56
<rqou> oh yeah, one stupid thing is that comcrap assigns dynamic ipv6 prefixes
<azonenberg_work> I think you can get static on business class
<azonenberg_work> it takes some effort but i got one
<azonenberg_work> (comcast has dynamic v4 too fwiw)
<rqou> supposedly according to the interwebs you'll usually keep it until the cmts gets rebooted
<azonenberg_work> yeah thats standard practice for dynamic ips in general
<zkms> whitequark: major US cell carriers built their LTE packet cores on ipv6 and also the fruit company has been pushing for ipv6 pretty hard
<rqou> why does my phone still not have ipv6?
<azonenberg_work> Glasgow API version:
<azonenberg_work> Serial number: (error)
<azonenberg_work> User ID: (error)
<azonenberg_work> Interface 0: Glasgow revA
<azonenberg_work> Enumerating interfaces... 1 found
<azonenberg_work> whitequark: ^
<whitequark> azonenberg_work: hmmmm let's see
<whitequark> azonenberg_work: uhhhh
<whitequark> LeakSanitizer does not work under ptrace (strace, gdb, etc)
<whitequark> this is a reason to not always enable sanitizers.
<azonenberg_work> Thats not relevant to the error, is it?
<whitequark> it is
<whitequark> I tried to strace jtagd
<whitequark> to see why it breaks
<whitequark> and I cant
<azonenberg_work> Well, i guess disable sanitizers in your local build temporarily
<azonenberg_work> and make a ticket for "only enable sanitizers in some specific build config" or something?
<whitequark> sure I did that
<whitequark> azonenberg_work: fixed
<openfpga-bot> [jtaghal] whitequark pushed 1 new commit to master: https://git.io/fAwZa
<openfpga-bot> jtaghal/master 6844146 whitequark: Fix GlasgowSWDInterface serial number discovery.
<openfpga-bot> [jtaghal-cmake] azonenberg pushed 1 new commit to master: https://git.io/fAwnX
<openfpga-bot> jtaghal-cmake/master 248db6f Andrew Zonenberg: Updated to latest submodules
<openfpga-bot> [jtaghal-apps] azonenberg pushed 1 new commit to master: https://git.io/fAwnH
<openfpga-bot> jtaghal-apps/master 21e9142 Andrew Zonenberg: Added --api glasgow switch
<openfpga-bot> [jtaghal-cmake] azonenberg pushed 1 new commit to master: https://git.io/fAwn7
<openfpga-bot> jtaghal-cmake/master b22db16 Andrew Zonenberg: Updated to latest submodules
<azonenberg_work> OK jtagd now starts and runs with a glasgow attached
<azonenberg_work> Doesn't do much yet because i havent added socket commands for the SWD protocol yet
<azonenberg_work> I also have not yet added support for the client to query the transport layer protocol
<azonenberg_work> So right now it gets very confused trying to send jtag commands to a swd interface
<whitequark> found some memory leaks
<whitequark> lemme fix those
<azonenberg_work> The server correctly ignores the JTAG commands in SWD mode but the client doesn't yet know it should be trying to do SWD :p
<azonenberg_work> I'm about to start doing some cable plant work downstairs so will have to leave this for a bit
<azonenberg_work> But will try to get back to it tonight in 3-4 hours
<whitequark> ah ok!
unixb0y has quit [Ping timeout: 240 seconds]
unixb0y has joined ##openfpga
<rqou> awygle: ^
Miyu has joined ##openfpga
hackkitten has quit [Ping timeout: 244 seconds]
emeb has quit [Quit: Leaving.]
<rqou> huh this is new -- contactless payment on a gas pump
<rqou> still no emv though
<whitequark> contactless is emv i think
<rqou> yeah, but this pump doesn't support a physical chip card
<rqou> only magstripe and contactless
<whitequark> rqou: no skimmers?
pie___ has joined ##openfpga
<awygle> rqou: I don't disagree with anything said there, but the focus on VC-backed open source is weird, as is the idea, implied by the statement that lack of adoption doesn't help developers, that adoption somehow *does* help developers
pie__ has quit [Ping timeout: 240 seconds]
<whitequark> yeah
<awygle> In addition to Tidelift, I'd call out License Zero as a cool thing I've learned about recently which is relevant to this topic
<awygle> long term i'd like to see us figure out how to adapt the worker-owned cooperative model to a world where a project has potentially thousands of workers and it's very difficult to gauge the relative amounts of their work
<whitequark> lol stripe account
<whitequark> so, not anything i could potentially use
<awygle> whitequark: yes, that sucks for a number of reasons. i like the concept much more than the implementation.
<whitequark> something something bitcoin
<awygle> whitequark: have you checked out uh... Stellar i think? supposedly much more usable for transactions than bitcoin?
GenTooMan has quit [Quit: Leaving]
* awygle is hugely ignorant here and hopes to gain knowledge
<whitequark> oh there's a number of networks like that
<whitequark> none of them have the adoption of bitcoin though
<whitequark> i'm looking forward to using something other than the bitcoin tire fire
<awygle> ah okay so it's adoption limited
<awygle> i was wondering if stellar's backing by stripe meant it was susceptible to all the bullshit gatekeeping or something
<awygle> woo pcbs
<awygle> 2-5 days
<whitequark> awygle: it probably is, but bitcoin exchanges aren't immune from AML either
Maya-sama has joined ##openfpga
Maya-sama has quit [Ping timeout: 272 seconds]
<TD-Linux> awygle, stellar doesn't have the distributed consensus model that bitcoin has, making it not really better than just using stripe
<awygle> TD-Linux: i mean, i feel like "better for what purpose" is a relevant question here, but like i said i have no real knowledge in this space
<azonenberg_work> awygle: better for collecting VC dollars?
<TD-Linux> heh. my more serious (but no less truthful) answer is that I think there's plenty of room for high level improvements to bitcoin (e.g. lightning) that don't throw away properties that make bitcoin as successful as it is.
<TD-Linux> t. slightly biased as I've done some power sidechannel work on libsecp256k1, bitcoin's elliptic curve implementation
<rqou> tbh i don't really trust any asymmetric crypto to be properly hardened against side channels?
<rqou> i've actually implemented secp256r1 and i have no idea if it's even correct, let alone secure
<rqou> and i don't really know how i would go about confirming that it is
<TD-Linux> it's really hard to do correctly. secp256k1 has pretty extensive mitigations - e.g. when there is a branch it computes values for both sides of the branch and then does a cmov
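The compute-both-sides-then-cmov pattern TD-Linux describes can be illustrated in Python. This is illustrative only: Python's arbitrary-precision integers are not constant-time, so it only shows the shape of the trick, which libraries like libsecp256k1 implement in C.

```python
MASK32 = 0xFFFFFFFF

def ct_select(cond_bit, a, b):
    """Branch-free 32-bit select: a if cond_bit == 1, else b.

    Both inputs are always read and combined; there is no
    data-dependent branch, mirroring the cmov mitigation.
    """
    mask = (-cond_bit) & MASK32          # all-ones if cond_bit, else zero
    return (a & mask) | (b & ~mask & MASK32)
```

A real implementation would compute both candidate values first (doing the work for both sides of the "branch"), then select with this kind of masking.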
<TD-Linux> but it's still mostly guesswork. one piece of hardware that I keep on not finishing is a stm32 board that has current sense resistors and amplifiers in front of each decoupling cap, giving me really high bandwidth current measuring capability, also synchronized to the clock
<TD-Linux> with the goal of making a ci test of sorts for power sidechannels in crypto algorithms
<rqou> and then intel manages to make you a new side channel anyways :P
<TD-Linux> yeah, that's going to be a gift that keeps on giving :)
_whitelogger has joined ##openfpga
<whitequark> azonenberg_work: are you here yet?
Bike has quit [Quit: Lost terminal]
ZipCPU has quit [Ping timeout: 245 seconds]
ZipCPU has joined ##openfpga
<azonenberg_work> whitequark: just packing up
<azonenberg_work> TD-Linux: awesome
<azonenberg_work> TD-Linux: personally, i think that side-channel-free crypto in software on a GP CPU is impossible, period
<azonenberg_work> If you need to eliminate side channels do it in hardware with guaranteed constant timing
<azonenberg_work> no caches, no buses, no shared resources of any kind that you can have contention that affects timing
<azonenberg_work> Custom hardware makes power tweaking much easier too
<azonenberg_work> TD-Linux: did you see the CPU architecture i designed for running crypto algorithms?
<azonenberg_work> TD-Linux: http://paste.debian.net/plainh/79aca68f
<azonenberg_work> Meant to run any current or future hash or cipher efficiently. Not for pubkey at all
<rqou> <bullshit>what about post-quantum cryptography?????</bullshit>
<azonenberg_work> I mean i can't predict the future, but since the days of DES basically all block/stream ciphers and hashes have involved a bunch of high fan-in bitshifts, additions, bitwise operations, table lookups, etc
<TD-Linux> azonenberg_work, I wouldn't say it's impossible, but it's extremely implementation dependent which is unfortunate
<azonenberg_work> Minimal use of conditionals, multiply/divide, etc
<TD-Linux> part of the reason I'm using stm32 is a lot of the hardware wallets use it
<azonenberg_work> TD-Linux: Without microarchitectural information the chip vendors don't release?
<azonenberg_work> i don't think it's possible
<azonenberg_work> certainly not on an applications processor with any kind of OoO engine - may be possible on an in-order MCU core
<TD-Linux> yeah the m4 is pretty simple and in order
<azonenberg_work> Yeah but there is still potential for some stuff with the flash prefetch engine etc
<azonenberg_work> or AHB bus contention
<TD-Linux> indeed, that's the reason why I'm measuring. if I already knew the answer I wouldn't do it :)
<azonenberg_work> Lol
<azonenberg_work> Anyway, i'm curious what you think of that ISA
<rqou> lol i totally forgot that m4s do have prefetch/cache/etc
<azonenberg_work> I never actually implemented it, and i had some more work to do on the control plane side of things i think (this was mostly datapath)
<azonenberg_work> The goal was to maximize instructions per clock for a crypto-specialized CPU with a fully deterministic in-order pipeline and a single register file write port
<azonenberg_work> i.e. you cannot retire >1 reg write per clock
<TD-Linux> rqou, only on flash but yes. the m7 supports external dram and has cache on that too iirc
<rqou> i've also been bit by the store buffer
<azonenberg_work> TD-Linux: It has a Y-shaped pipeline made of three ternary ALUs :D
<rqou> if you clear a timer interrupt bit too close to the end of the isr handler the handler will trigger a second time
<TD-Linux> what's the utility of the ternary?
<azonenberg_work> No R/I type format
<azonenberg_work> Everything is r32 op r32 op imm32
<azonenberg_work> So you have two parallel execution units like that
<azonenberg_work> Total of four registers, two opcodes, two immediates
<azonenberg_work> Then you have a third execution unit that, instead of using registers as inputs, operates on the outputs of the previous ALUs
<azonenberg_work> with a third opcode
<azonenberg_work> and writes to a single register
<azonenberg_work> Did i mention this uarch was targeting VERY high fan-in operations? :D
<TD-Linux> rqou, yeah the nvic is also really complex. I haven't had many problems with it though, but I've always turned off interrupt preemption
<rqou> have you thought about maybe doing an explicit datagraph ISA instead?
<azonenberg_work> rqou: It might be hard to make that deterministic runtime
<rqou> hmm ok
<azonenberg_work> Which was the other goal, i wanted cycle accurate performance with no data dependent control flow whatsoever
<rqou> i don't really know anything about the design space
<TD-Linux> presumably this is primarily targeting AES?
<rqou> and MSFT/QCOM didn't say much about it
<azonenberg_work> TD-Linux: it's designed to run AES, MD5, SHA1/2, and any future replacements
<azonenberg_work> The vision was that you could bake this into an asic that will be in service for decades
<azonenberg_work> and have performance better than a stock MCU core but more flexibility than a hard accelerator
<azonenberg_work> and probably much less area than an eFPGA
<TD-Linux> why the ternary tho
<azonenberg_work> Because a lot of hashes mix a constant in every round
<TD-Linux> ohhhh
<TD-Linux> I thought you were actually implementing ternary *base* arithmetic
<azonenberg_work> No
<azonenberg_work> I meant 3-input arithmetic
<TD-Linux> (there are some horrific base 3 hash algorithms)
<azonenberg_work> The whole ~140 bit instruction word implements
<azonenberg_work> (reg1 op1 reg2) op3 (reg3 op2 reg4)
<azonenberg_work> sorry i missed the immediates
<azonenberg_work> (reg1 op1 reg2 op1 imm1) op3 (reg3 op2 reg4 op2 imm2)
<azonenberg_work> i think op3 might take an immediate too
<azonenberg_work> So you could do (r0 + r2 + 0xdeadbeef) ^ (r4 & r6 & 0x41414141) in one instruction
<azonenberg_work> You see why this would excel at crypto now? :)
<azonenberg_work> oh and i think all of the regs input to the ALUs can be bitwise complemented too
<TD-Linux> it looks okay. the latency is kind of enormous though
<azonenberg_work> So you could do the MD5 (b & c) | (!b & d)
<TD-Linux> this is probably ok for aes-ctr and the like though
<azonenberg_work> in one instruction
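The instruction shape azonenberg_work describes — (reg1 op1 reg2 op1 imm1) op3 (reg3 op2 reg4 op2 imm2) — can be modeled in a few lines of Python. This is a hypothetical functional model: the opcode names and argument order are invented for illustration, not taken from the actual ISA document.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical opcode table; the real ISA sketch has more operations.
OPS = {
    'add': lambda a, b: (a + b) & MASK32,
    'and': lambda a, b: a & b,
    'or':  lambda a, b: a | b,
    'xor': lambda a, b: a ^ b,
}

def ternary_insn(op1, r1, r2, imm1, op2, r3, r4, imm2, op3):
    """(r1 op1 r2 op1 imm1) op3 (r3 op2 r4 op2 imm2) in one 'cycle'."""
    left  = OPS[op1](OPS[op1](r1, r2), imm1)   # first parallel ALU
    right = OPS[op2](OPS[op2](r3, r4), imm2)   # second parallel ALU
    return OPS[op3](left, right)               # merging third ALU

# The example from the discussion:
# (r0 + r2 + 0xdeadbeef) ^ (r4 & r6 & 0x41414141)
result = ternary_insn('add', 0x1000, 0x2000, 0xdeadbeef,
                      'and', 0xffff0000, 0xf0f0f0f0, 0x41414141,
                      'xor')
```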
<azonenberg_work> And well, this is meant for stream processing in general
<azonenberg_work> crypto tends not to have much data-dependent operations
<azonenberg_work> So you can have a deep pipeline
<azonenberg_work> in fact, if you look at the memory map
<azonenberg_work> input and output are memory mapped FIFOs :D
<azonenberg_work> this core is meant to be a black box where data goes in and data comes out
<azonenberg_work> and it sits between say application logic and a TCP offload engine or something
<azonenberg_work> Like i said before, I never actually *built* it so I don't know how performance would be - it'd need to be tested and tweaked a lot
<azonenberg_work> It was mostly just an architectural experiment that targeted a less-common point in the design space
<rqou> inb4 you built another preshot
<azonenberg_work> whats that?
<rqou> mocking name for intel's prescott uarch
<rqou> also wtf prescott was "only" 90nm?
<rqou> i thought it was a smaller node than that
<TD-Linux> azonenberg_work, you'll know you've hit peak prescott housefire when you have to clock your alus at half speed
<rqou> wait it does that?
<rqou> hey azonenberg_work, you're working on your jtag tools right?
<rqou> want to finish up coolrunner-ii support for all parts?
<rqou> also add the "crbit" format support?
<TD-Linux> also two ports run at double speed
<azonenberg_work> rqou: I'm working on features i can justify for work right now
<azonenberg_work> so arm stuff and swd
<azonenberg_work> But if i have time, sure
<azonenberg_work> if there are not tickets on the github already, file them
<rqou> which repo?
<azonenberg_work> jtaghal
<TD-Linux> this has probably already been discussed to death but I was happy to see the talos ii mobo uses icestorm and friends on a hx1k https://git.raptorcs.com/git/talos-system-fpga/tree/
m_w has quit [Ping timeout: 244 seconds]
m_w has joined ##openfpga
sensille has joined ##openfpga
<sensille> i'm using yosys/arachne-pnr for the first time. how do i set a timing constraint on the clock?
<azonenberg_work> sensille: i could be wrong (I'm less familiar with it than some other people)
<azonenberg_work> but my understanding is that arachne is not a timing-driven PAR
<azonenberg_work> it does the best job it can, then you run static timing to see if it worked
<sensille> i was afraid of that, thanks
<azonenberg_work> nextpnr is the next-generation tool that i believe is timing driven
<sensille> output from icetime: Unable to resolve delay for path ce -> ltout in cell type LogicCell40!
<azonenberg_work> i think it works on ice40 and ecp5?
<azonenberg_work> mithro: ^^
GuzTech has joined ##openfpga
<sensille> looks like i'm having a hard time fitting my design into a hx8k :-/
<sensille> 24% luts of a artix-7/35T
rohitksingh_work has quit [Quit: Leaving.]
<azonenberg_work> sensille: have you considered optimizing it? What's it do?
<sensille> 4-channel stepper motor controller, the design i discussed with ZipCPU the other day
<azonenberg_work> Hmm, might be worth trying to figure out where your area is going
<azonenberg_work> and see if you can share some resources between channels or otherwise shrink it
<azonenberg_work> that sounds awfully large
<sensille> i don't know. the old xilinx parts had a local 4-bit RAM in each cell. i used that for a cpu to switch state between threads. maybe current fpgas are similar and i can very cheaply multiplex the logic between the controllers
<azonenberg_work> well, i was actually going to suggest that you overclock the design
<azonenberg_work> and have one copy at 4x the rate
<azonenberg_work> and just have a shift register or something on the io cells
<azonenberg_work> stepper control doesnt sound like it needs 100 MHz clock frequencies
<sensille> the area goes into adders
<sensille> i'm running it at 20Mhz
<azonenberg_work> So can you make timing at 80?
<azonenberg_work> if so, you can have one copy of the core logic
<azonenberg_work> just replicate state
<sensille> i know as soon as icetime works :)
<azonenberg_work> Lol
<sensille> vivado reported 35ns
<sensille> but it was only constrained to 50ns, so i don't know the limit
<azonenberg_work> Vivado timing doesnt mean much unless you set a tight constraint
<azonenberg_work> exactly
<azonenberg_work> it wasnt trying hard
<sensille> is multiplexing cheap? does it map well to the cells?
<sensille> i guess it depends much on the architecture
<azonenberg_work> Yeah it should be, a 4:1 mux is one lut
<azonenberg_work> so basically replace every state bit with a lut and four dffs
<sensille> so 4 channels would be the sweet spot :)
<azonenberg_work> With a 6LUT xilinx arch
<azonenberg_work> On a lattice 4LUT arch you would need two luts per mux i think
<azonenberg_work> unless there is hard mux ip
<sensille> one channel implementation: LCs 6151 / 7680
<azonenberg_work> (you might also be able to optimize the channel logic itself but this is a start)
<azonenberg_work> The naive solution to everything in FPGA is throw hardware at it
<azonenberg_work> but sometimes you can run faster with less hardware and do the same work
<sensille> i can cut channel logic by 30% without trying too hard i guess
<sensille> but to be honest i was hoping to have some headroom :)
<azonenberg_work> Yeah before you do any muxing of channels
<azonenberg_work> See how small you can get one
<azonenberg_work> Then let me take a look at the RTL and i might have some suggestions on how to shrink it more
<sensille> you will recoil in horror when you see the 208-bit-adders :)
* GuzTech gasps
<sensille> :)
<GuzTech> Dare I ask?
<GuzTech> Why do you have/need 208-bit adders? :D
<sensille> i can probably cut them down to 180 bits ;)
<azonenberg_work> why
<azonenberg_work> That is probably your #1 problem :p
<azonenberg_work> that's at least 208 luts per adder
<sensille> to calculate a 5th order polynomial differentially over 20M steps without accumulating too much error
<GuzTech> If it's a ripple carry adder...
<GuzTech> Or else it's even more (but most likely faster).
<sensille> so i don't need multiplication
<azonenberg_work> sensille: uh... i feel like there has to be a better solution :p
<azonenberg_work> on a xilinx part i'd use the hard multiplier block
<GuzTech> How large is your bitwidth?
<GuzTech> I feel like you could implement a small multiplier with carry-save adders, which would be faster and smaller.
<sensille> i don't know what bitwidth i'd need, depends if i implement floating point or not i guess
<sensille> but t^5 is large for t==20M, no idea how i would approach that
<sensille> of course the coefficient is small
<rqou> i would suggest a microcoded approach
<rqou> with some ram and a smaller adder
<sensille> but that would have to run at a quite high clock, if i want to end up with 20M evaluations/s
<rqou> hmm
<rqou> I'm not familiar with your design but can you use precomputed lookup tables?
<sensille> no
m4ssi has joined ##openfpga
knielsen has quit [Ping timeout: 246 seconds]
rohitksingh_work has joined ##openfpga
<rqou> not even by cheating? I'm not aware of stepper controllers being that complicated
<sensille> my current approach is something like the mechanical difference engine. babbage also solved the error propagation problem by making the adders larger :)
<sensille> he used 35 digits or such
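The difference-engine scheme sensille describes — stepping a polynomial with nothing but additions, holding error at bay via wide integer registers rather than floating point — sketches out as the following all-integer Python (the wide hardware adders correspond to these integer additions):

```python
def forward_difference_table(poly):
    """Build the initial difference table [p(0), dp, d2p, ...] for
    p(t) = poly[0] + poly[1]*t + poly[2]*t**2 + ...

    All-integer state: like Babbage, error control comes from making
    the registers wide enough, not from rounding floats.
    """
    deg = len(poly) - 1
    # The first deg+1 values of p(t) seed the table.
    vals = [sum(c * t**i for i, c in enumerate(poly)) for t in range(deg + 1)]
    table = [vals[0]]
    while len(vals) > 1:
        vals = [b - a for a, b in zip(vals, vals[1:])]
        table.append(vals[0])
    return table

def step(table):
    """Advance one step: p(t+1) appears in table[0].

    Only additions per step -- this is the work the wide adders do,
    once per output sample, with no multiplies at all.
    """
    for i in range(len(table) - 1):
        table[i] += table[i + 1]
    return table[0]
```

For a 5th-order polynomial the table has six entries, so each output costs five additions; running it 20M times only ever adds, which is why the adders (not multipliers) dominate the area.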
<rqou> why is there a giant polynomial involved?
<sensille> i want to use 5th order to make the math for the approximation very easy. with some tricks a 4th order polynomial might also do
<sensille> end-to-end the design is dead simple. but also relatively expensive in the fpga, but cheap on the host
<rqou> uh... don't stepper motors normally just have a "step" and "direction" pin? what does the polynomial do?
<sensille> so i can do the host part on an rpi
<sensille> calculate the step
<rqou> ah, you're building a cnc machine?
<rqou> why does this calculation need to be in the fpga? can it be in the rpi instead?
<sensille> 3d printer
<rqou> alternatively, can you evaluate your polynomial with Horner's method?
<sensille> the biggest problem of all hobby 3d-printers is step calculation. they currently do that in an mcu and have to make tons of horrible simplifications
<sensille> and the rpi doesn't have a good enough timing to do the control directly
knielsen has joined ##openfpga
<rqou> wait, the printer is entirely open-loop, right?
<sensille> yes
<rqou> it seems like it should be possible to precompute all the step counts ahead of time and just feed that into an fpga to generate step pulses?
<sensille> my goal is to control the jerk of the motion, to reduce vibrations
<sensille> yes, but that approach has 2 problems: it might bring the rpi to its limits, and you need to transfer the step data to the fpga
<sensille> for the latter you need some kind of compression
<sensille> and you could say i use a polynomial for compression :)
<rqou> does the rpi have dma-capable qspi?
<sensille> qspi? quad? afaik (i'm actually using an orange pi), dma yes, but only one data line
<sensille> not sure
<rqou> I'd definitely investigate doing all the hard stuff on the pi and have the fpga just generate pulses
<sensille> but it's also a matter of latency. linux isn't very good at it out of the box
<GuzTech> Can't you run a bare-metal program that does all the calculation? At least the latency would be much better.
<rqou> yeah i was thinking just mmaping the dma controller and the spi peripheral and poking them directly
<rqou> overall i think you should pick a better hardware-software tradeoff :P
<sensille> bare metal, without linux? then i'd need another board for the UI stuff
<GuzTech> Oh nvm then :P
<rqou> alternatively do look into Horner's method which avoids your giant exponentiation problem
<sensille> rqou: should i manage to fit the design in an hx8k i think the tradeoff would be not too bad
<GuzTech> But yeah, as rqou said, you should think about a better tradeoff.
<sensille> of course i don't want to spend $50 on the fpga
<rqou> i also tend to bias my tradeoff towards software because software is easier with shorter dev cycles
<rqou> azonenberg_work tends to bias towards hardware because hardware is easier to verify
<sensille> my current tradeoff is the simplest overall design
<GuzTech> Horner's method seems simple enough to implement.
<sensille> yeah, looking :)
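Horner's method, as suggested, rewrites c0 + c1*t + ... + c5*t^5 as c0 + t*(c1 + t*(c2 + t*(...))), so no large power of t is ever formed on its own:

```python
def horner(coeffs, t):
    """Evaluate sum(coeffs[i] * t**i) with n multiplies and n adds.

    coeffs are given lowest-order first. The accumulator only ever
    holds intermediate sums, never a bare t**5-sized power.
    """
    acc = 0
    for c in reversed(coeffs):
        acc = acc * t + c
    return acc
```

For a 5th-order polynomial this costs five multiply-adds per evaluation, traded against the add-only difference scheme's five plain additions per step.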
<sensille> you think a, lets say, 30 bit multiplier is cheaper than a 200 bit adder?
<s_frit> sensille: just curious: do you think this jerk control business is going to work better than building a closed-loop solution?
<sensille> s_frit: closed loop would mean a completely different kind of hardware, much more expensive
<s_frit> sensille: i see. i guess even with a closed loop controller, you still want the control input to be as smooth as possible.
<sensille> i want to raise the cheap hobby-solutions to the next level, by spending $10-$20 on an fpga
<GuzTech> sensille, a 30x30 bit multiplier is basically a 900 bit adder. But if you build it with carry-save adders, then your latency is much lower.
<rqou> how do the current solutions all work?
<GuzTech> So it's a area/performance tradeoff.
<rqou> GuzTech: uh, the final answer should only be 60 bits, not 900?
<GuzTech> You'd need 30 times 30 bit adders, no?
<GuzTech> Or am I not awake yet?
<sensille> rqou: they use an mcu to generate the steps. they max out at 60-100kHz with an awful lot of jitter
<sensille> they can't even control the acceleration properly, let alone the jerk
<rqou> and you feed them gcode?
<rqou> GuzTech: not with the usual wallace tree structure
<s_frit> am i correct to presume there is sigma-delta modulation of the pulses involved?
<sensille> also, the current stepper driver technology allows for a microstep resolution of 1/256 to get smooth motions, of course there's no way you can make use of it with an mcu
<rqou> why not?
<sensille> rqou: yes, they currently get fed gcode
<rqou> i get the feeling this entire thing can be implemented by intelligently programming an stm32f4
<sensille> rqou: 1/256 means step rates into the low MHz
<rqou> using its hardware timers to generate the proper pulses
<GuzTech> rqou: True, a Wallace/Dadda/HPM structure uses less hardware, but it would still be more than just 60 bit adders.
<sensille> and of course in theory step rate changes with every step when doing a curve
<s_frit> use a high-speed spi port to stream out a pulse stream
<sensille> s_frit: sigma-delta?
<sensille> pre-calculating the exact step-data would be too much for an rpi i guess
<rqou> how computationally expensive can these calculations all be?
<s_frit> sigma-delta modulation -- noise-shaped pulse generation (tbh i don't know if it is applicable to stepper motor control, i just assumed)
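In its simplest form, that kind of noise-shaped pulse generation is just a phase accumulator that pulses on overflow — a first-order sigma-delta. Whether it suits stepper drive is, as s_frit says, an open question; this is only a sketch of the mechanism:

```python
def sigma_delta_pulses(increment, n, accumulator_bits=32):
    """First-order sigma-delta / phase-accumulator pulse generator.

    Emits one pulse each time the accumulator overflows, giving an
    average rate of increment / 2**accumulator_bits pulses per clock;
    the quantization error is carried forward, not discarded.
    """
    mod = 1 << accumulator_bits
    acc = 0
    pulses = []
    for _ in range(n):
        acc += increment
        pulses.append(1 if acc >= mod else 0)
        acc %= mod            # keep the residual error for next clock
    return pulses
```

Varying `increment` each clock would give a pulse rate that tracks an arbitrary profile, which is the connection to the step-rate problem above.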
<sensille> rqou: evaluating a 5th order polynomial or sin/cos :)
<rqou> how frequently?
<sensille> once per step
<sensille> so a few 100k/s
<sensille> on multiple channels
<rqou> O_o
<rqou> I'd personally go with an approach of calculating all of this ahead of time on a PC and generating a list of steps
<rqou> and then you can figure out how to actually "play" this list
<s_frit> or compute the high-res verson at a lower rate, and interpolate to upsample
<sensille> producing GB of data?
<rqou> would it really be that much?
<rqou> in any case, GBs of storage are pretty cheap nowadays :P
<sensille> you could compress it by approximating it with a polynomial ;)
<s_frit> sensille: how are you evaluating the polynomial? as a p(x) type thing, or p(x) = f(p(x-1))
<sensille> also, pre-calculating would take too much time
<sensille> s_frit: currently the latter
<sensille> rqou: if you can't do it in realtime, you can't wait for it. a print may take 20h
<azonenberg_work> sensille: personally, i would indeed go closed loop
<sensille> azonenberg_work: why?
<azonenberg_work> Because it eliminates all this math and you can do basic PID control or something
<azonenberg_work> much more precise too
<azonenberg_work> of course, i also wouldn't build a FDM 3d printer
<rqou> given these constraints I'd probably feed Ethernet into an fpga
<azonenberg_work> i'd go with stereolithography or similar
<azonenberg_work> or SLS
<sensille> and what would be the advantage? you'd still need to calculate the path
<rqou> a program on a PC doing math and feeding a step list into the fpga
<sensille> SLA needs a path for the laser, same problem, higher rates :)
nurelin has quit [Ping timeout: 244 seconds]
<azonenberg_work> sensille: i'd feed a set of coordinates to the FPGA over Ethernet from a CPU of some sort
<azonenberg_work> Then have the FPGA do simple closed loop PID control to reach those coordinates
<sensille> but you also need to control the speed and coordinate that with the extrusion
<sensille> i think closed loop would just add another layer of problems
<s_frit> maybe feed the coordinates to the FPGA at low rate, then upsample using some high-quality interpolation to generate the pulses. assuming that the control signal is bandlimited this will work just as well as running the polynomials at high rate
<rqou> at this point I'd start looking into approximations
<rqou> since somehow those Arduino-powered things work
<sensille> ok, i can fit 2 channels into an hx8k, there's hope :)
<sensille> s_frit: that is not far from what i do
<sensille> rqou: the polynomials are already the approximations
<s_frit> sensille: the advantage of what i'm proposing is there is a single stream of numbers for each channel, and the interpolator just needs to generate a fixed number of interpolated points for each input sample, you may be able to store the interpolation weights in ram
<sensille> otherwise i'd need to calculate sin/cos
<s_frit> for example you could use 8th order interpolation, then each oversampling tap will need 32x8 32-bit coefficients and you'll need to perform 8 multiplies per output sample
<s_frit> erm that should be 8 x 32-bit coefficients
<azonenberg_work> sensille: sin/cos to how many bits?
<sensille> azonenberg_work: haven't really thought about this, maybe 24?
<rqou> wait, I'm not quite seeing why you need to calculate sin/cos every single step
<rqou> what exactly is the performance improvement you want to make?
<sensille> control the jerk
<sensille> but maybe i can do with a linear interpolation between 16 steps or such
<sensille> or more
<rqou> don't you only need accurate calculations around direction changes?
<sensille> my goal is to print models that are described by splines
<sensille> so the direction changes with each step
<s_frit> cubic hermite interpolation is also an option maybe
<sensille> and even with gcode, when transitioning between the lines, the head has to move along a curve
<rqou> yeah, I'd go with the "stream data via Ethernet" approach
<azonenberg_work> honestly, i wouldn't even use gcode
<sensille> s_frit: ZipCPU explained a way to interpolate from coordinates to me, that ended up using 4th order polynomials
<sensille> azonenberg_work: i don't want to use gcode
<sensille> that was my starting point of this adventure
<azonenberg_work> i'd precompute a full toolpath in real time on the CPU, with target positions streaming over ethernet every few microseconds
<azonenberg_work> then the FPGA just does closed-loop control to ensure you go to that position
<azonenberg_work> you could also go with something like a Zynq and nix the ethernet and have a super low latency link
<sensille> closed-loop would probably triple the price of the printer
<azonenberg_work> I didn't say i built cheap stuff :)
<azonenberg_work> I'm the wrong guy to ask if you're cost optimizing
<azonenberg_work> I go for quality, accuracy, and reliability
<s_frit> sensille: ZipCPU knows more math than me, but i was just thinking if you're contemplating linear interpolation, cubic hermite is a nice step up without going to your full 5th order solution. maybe it doesn't fit for x,y paths quite so well, i don't know
<azonenberg_work> Think Mitutoyo, not Aoyue :p
<sensille> currently my only concern is if i need a $50 fpga or a $10 one :)
<sensille> with the $50 my approach works fine
<sensille> s_frit: i tried to write down the requirements: http://3dpfs.sensille.com/index.php?title=Mathematics
<sensille> but hey, if 2 controllers already fit, i can probably optimize it to 4 and i'm good :)
<s_frit> sensille: from my point of view, assuming a low-rate-control-with-upsampling/interpolation solution, a lot of the performance requirements amount to trading off between source data rate, source data bandwidth (i.e. max velocity delta), and interpolation quality.
nurelin has joined ##openfpga
<sensille> s_frit: yes. i'd like to stay well below 1MBit for the source data rate
nurelin has quit [Ping timeout: 246 seconds]
<s_frit> sensille: and you want > 1us pulse resolution?
<s_frit> sensille: or >1MHz pulse rate?
<sensille> for extreme moves, step rate may go up to 6 MHz
<sensille> probably not needed during prints, but 1MHz is reached easily
<s_frit> sensille: how are the steps transmitted to the motor? what's the data look like?
<sensille> 2 lines, step/dir. pulses on step (or double edge)
<s_frit> so it's purely about pulse rate, pulse width doesn't come into it
<sensille> yes
soylentyellow has quit [Remote host closed the connection]
<s_frit> but you're going to get some "issues" with pulse spacing being quantized to the fpga clock speed
<sensille> the pulses are sampled by the stepper driver at about 16MHz
<s_frit> ah ok. so the stepper driver has its own filtering
soylentyellow has joined ##openfpga
<s_frit> does the stepper driver have a bandwidth specification? when you say it samples at 16MHz do you mean it's rated to properly handle input pulse rates up to 8MHz?
nurelin has joined ##openfpga
<sensille> i think that's the specified limit, yes
<s_frit> sensille: 1mbit/sec for 4 channels of 2-tuple jerks (say 32-bits per value) is a sample rate of 4kHz so that's approx 256x to 2048x upsampling ratio. nyquist rate would be 2kHz -- maybe you could calculate how that relates to spatial resolution
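As a quick check of the arithmetic above (channel count and bit widths taken from the discussion; a sketch):

```python
# Check the data-rate arithmetic: 4 channels of 2-tuples,
# 32 bits per value, within a 1 Mbit/s budget.
BITS_PER_SAMPLE_SET = 4 * 2 * 32            # 256 bits per control sample
BUDGET_BPS = 1_000_000                      # 1 Mbit/s source data rate

sample_rate = BUDGET_BPS // BITS_PER_SAMPLE_SET   # ~3906 Hz, call it ~4 kHz
nyquist = sample_rate / 2                         # ~2 kHz

# Upsampling ratio to reach 1 MHz .. 6 MHz pulse rates:
print(sample_rate)                 # 3906
print(1_000_000 // sample_rate)    # 256x
print(6_000_000 // sample_rate)    # 1536x
```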
<sensille> erm - no ;) what does the 2kHz relate to? curvature change?
pie___ has quit [Ping timeout: 244 seconds]
<s_frit> it's the maximum representable frequency in your control signal (whatever the control signal represents, presumably some derivative of position, that gets integrated)
<sensille> yes, but how does 'frequency' relate here to the physical reality? assuming the speed is constant, that would be something like the change of curvature?
<s_frit> so, if max |v| is 1 m/sec then i guess that 2kHz maps to 0.5mm spatial period
<sensille> 1m/sec is a realistic upper bound
<s_frit> like you should be able to represent a little left-right wiggling sine wave / zig/zag with that period, maybe
<s_frit> probably not though, because that's a totally theoretical upper bound, and your interpolators may not be that good
<s_frit> note that this is for representing sharp high-frequency stuff like corners
<s_frit> using a 4kHz sampled control stream
<s_frit> if you use something more like a spline "display list" of course you'll get much tighter temporal/spatial resolution
<sensille> there are no real corners, the point is to transform corners into curves
<s_frit> i just wanted to think through the uniformly sampled case
<s_frit> well then i guess you get best-case 0.5mm rounded corners with this regularly sampled scheme
<sensille> would the maximum frequency determine the maximum 'slew rate' and thus the maximum velocity?
<sensille> hm
<sensille> 0.5mm at 1m/s, ok
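The bandwidth-to-spatial-period conversion being used here is just |v|/f — a sketch of the back-of-envelope reasoning, not a rigorous slew-rate bound:

```python
# Spatial period of the highest representable frequency component:
# a sine of frequency f traversed at speed |v| repeats every v/f metres.
def spatial_period_mm(v_m_per_s: float, f_hz: float) -> float:
    return v_m_per_s / f_hz * 1000.0   # convert metres to millimetres

print(spatial_period_mm(1.0, 2000.0))   # 0.5 mm at 1 m/s, 2 kHz Nyquist
print(spatial_period_mm(0.1, 2000.0))   # 0.05 mm at 0.1 m/s
```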
<s_frit> yeah i'm not exactly sure how to convert bandwidth to slewrate
<sensille> the hardware can't deliver that anyway
<s_frit> they are convertible, for sure
<sensille> i'm aiming for 0.1mm corners at 0.1m/s
<s_frit> that seems totally doable
<s_frit> i mean, totally doable with a 4kHz stream of jerks (uniformly sampled) then interpolated with linear or cubic interpolation
<s_frit> of course you would still generate the 4kHz stream using your "smooth curves" techniques
<s_frit> hopefully i didn't make a mistake with the math ;)
<sensille> a 4kHz stream of jerks with my old implementation would certainly be good enough. but i failed at generating that stream, so i thought of a way to make the generation easy
<s_frit> haven't you just moved the generation to the fpga? where you still haven't made it easy?
<sensille> in fpga it's easy, only 6 lines of code
<sensille> but also a bit costly
<sensille> not in azonenberg_work's terms of 'costly', though)
<s_frit> what was the problem with the "old implementation" then? why did you fail at generating the stream?
<s_frit> i mean, you could stream out 8 channels of 4kHz 32 bit data using the i2s audio interface on a RasPi
<sensille> i have no idea how to approach the math
<s_frit> oh
<sensille> the main point is that errors must not accumulate
<s_frit> yeah, i was wondering about that
<s_frit> so you need to interpolate the data in such a way that the errors don't accumulate
* s_frit thinks hard
<sensille> i would want to start with 3 basic geometries: lines with s-curve motion profile, circles with constant velocity and clothoids with constant velocity
<sensille> but later on i want to use nurbs as source. this is where things really start to get nasty
<s_frit> numerical integration schemes are always vexed
<sensille> i'll need to approximate the curve length for that
<s_frit> is all this predicated on the idea that the motors don't skip pulses (i.e. pulses map precisely to particular x, y positions, relative to the starting point)
<s_frit> ?
<sensille> yes
<s_frit> and how do current controllers work? do they transmit x,y coords and then compute the deltas in the controller?
<s_frit> i mean, maybe it's easier to just spit out a 4kHz stream of x,y coordinates, interpolate the data on the fpga, and then compute the jerks from the interpolated data
<sensille> depends on what you call 'controller' in the chain. the current control board (mcu based) reads gcode (a description based solely on linear moves) and directly generates the pulses
<s_frit> you can still embed all of the 5th order stuff into the x,y generation to keep the motion smooth
knielsen has quit [Read error: Connection reset by peer]
knielsen has joined ##openfpga
<s_frit> right, so at a minimum you want to replace gcode with something that can represent curves
<s_frit> or generate 4kHz gcode
<s_frit> btw. let me know if i'm annoying you with my questions... just seems like an interesting problem
<sensille> in a first implementation i need to read gcode and enrich it by adding smooth transitions between the lines. the controllers currently do that, too, but not in a way that satisfies me
<sensille> no, i'm happy to've found someone who has taken an interest in it :)
<sensille> my math skills are weak, so i need every help i can get ;)
m_w has quit [Ping timeout: 245 seconds]
<s_frit> my math skills are not great, but i'm currently back at uni studying math, trying to improve
<sensille> i have to admit i have a math pre-grade from uni, but that was a long time ago. so i'm back at school level
<s_frit> but i know a bit about signal processing, that's why i mention the sample-rate stuff
<sensille> i see. i tried to wrap my head around that several times in the past, without success
<s_frit> well, the main thing is, if you have a regularly sampled stream with sample rate N, you can represent signals with frequencies (i.e. sine wave components) up to N/2.
<sensille> yes, nyquist, that's where my knowledge starts and end :)
<sensille> and a bit of FFT
<s_frit> then if you want to represent a "corner" (e.g. a sharp direction change like a triangle wave in a one dimensional signal) then you won't be able to make it super-sharp, because then it would have energy above the nyquist rate
<s_frit> you're going to have some energy above the nyquist rate with your piecewise curves, but if you transmit absolute position that should be no big deal (it might cause some wobbles, i'm not sure)
soylentyellow_ has joined ##openfpga
<s_frit> the serious problems will start if you are transmitting velocity or jerk, and you violate the nyquist limit, and then you interpolate the result, and then you "integrate" by feeding pulses to the motor
soylentyellow has quit [Ping timeout: 240 seconds]
<sensille> i would first generate a source curve that doesn't violate the limits, and sample that
<sensille> my current approach samples postion, velocity and acceleration, and calculate a 5th order curve between each 2 points. very simple math
<sensille> ZipCPU proposed to just sample position and interpolate over a window with a 4th order curve
<s_frit> makes sense
<s_frit> how many luts does a 32-bit multiplier use?
<sensille> but the latter would give me only indirect control over the velocity, and none over acceleration
<s_frit> assuming that you are producing equi-spaced time series data for position, that already encodes velocity and acceleration
<sensille> with "space" meaning the 2-dimensional curve length
<sensille> if i want constant velocity
<s_frit> hmm, i'm not exactly following, i think we need to use clearer language
<s_frit> the output of this system is a series of pulses, which are discrete position change commands, no?
<sensille> it is important to see that we control the motor per axis, meaning 2 motors, while the velocity is the 2-dimension velocity, |v|
<sensille> yes
<s_frit> right, so |v| = sqrt(x'^2 + y'^2) if i remember correctly
<s_frit> where x' = dx/dt and y' = dy/dt
<s_frit> i'm imagining the time-series is a series of (x,y) pairs
<s_frit> (x,y) pairs get generated from the source curves using a parameterisation that gives you the |v| properties that you want
<s_frit> by x,y i mean absolute position
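s_frit's point that an equi-spaced (x, y) series already encodes velocity can be sketched with finite differences (illustrative only):

```python
import math

# An equi-spaced (x, y) time series already encodes velocity:
# finite differences recover x', y' and hence |v| = sqrt(x'^2 + y'^2).
def speeds(points, dt):
    """Approximate |v| between consecutive (x, y) samples."""
    out = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        vx = (x1 - x0) / dt
        vy = (y1 - y0) / dt
        out.append(math.hypot(vx, vy))
    return out

# A 3-4-5 move over one sample period:
print(speeds([(0.0, 0.0), (3.0, 4.0)], dt=1.0))   # [5.0]
```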
soylentyellow__ has joined ##openfpga
<sensille> in ZipCPU's approach, yes
soylentyellow_ has quit [Ping timeout: 264 seconds]
<s_frit> your control step for each axis looks something like (in a c-like language): at each over-sampled time step: newx = interpolatedataforx(); errx = newx - currentx; if (errx >= stepsize) { output_up_pulse(); currentx += stepsize; } else if (errx <= -stepsize) { output_down_pulse(); currentx -= stepsize; }
<s_frit> this way you're always comparing an interpolated position to the actual position, and there's no chance of drift/error accumulation
<s_frit> as soon as you start doing open-loop numerical integration you're going to accumulate round-off errors -- i guess what you're proposing is to reset to a known position at the start of every curve segment, and make sure your numerics are accurate enough to not accumulate significant error during the curve
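A runnable sketch of the per-axis loop above (names like `axis_step` and the sample targets are illustrative, not from any real controller):

```python
# Per-axis pulse generation: compare the interpolated target position to
# the actual position and emit at most one step/dir pulse per time step.
STEP_SIZE = 1   # one motor step, in position units

def axis_step(target_x: float, current_x: float):
    """Return (new position, pulse): +1 step up, -1 step down, 0 none."""
    err = target_x - current_x
    if err >= STEP_SIZE:
        return current_x + STEP_SIZE, +1
    elif err <= -STEP_SIZE:
        return current_x - STEP_SIZE, -1
    return current_x, 0

# Because we always compare against the commanded absolute position,
# rounding errors cannot accumulate: the axis tracks the target to
# within one step, with no open-loop integration drift.
x, pulses = 0, []
for target in [0.0, 0.4, 1.2, 2.9, 2.1, 0.3]:
    x, p = axis_step(target, x)
    pulses.append(p)
print(pulses)   # [0, 0, 1, 1, 0, -1]
```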
<s_frit> i think ZipCPU and my method are the same, except maybe i'd use a different interpolator, but not qualified to give advice on fpga implementation strategies, so i'd defer to him on that
<s_frit> i do think it's worth noting that in a regularly sampled system, your interpolation times are going to be some fixed multiple of the base sample rate so you can potentially store the interpolation coefficients for each sub-sample tap in block ram
<s_frit> perhaps more useful: interpolation times will be at some fixed subdivision of the base sample period
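The precomputed-coefficient idea can be sketched with Catmull-Rom weights (one standard cubic Hermite variant, with tangents taken as centred differences) tabulated per sub-sample tap — the upsampling ratio here is illustrative:

```python
# With a fixed upsampling ratio, the cubic weights for every sub-sample
# tap can be precomputed once and stored in a table (the block-RAM idea
# above). Each output sample is then just 4 multiplies and 3 adds.
R = 8   # upsampling ratio (illustrative)

def catmull_rom_weights(t):
    """Catmull-Rom basis weights for the four samples around t in [0,1)."""
    return (
        0.5 * (-t**3 + 2*t**2 - t),
        0.5 * (3*t**3 - 5*t**2 + 2),
        0.5 * (-3*t**3 + 4*t**2 + t),
        0.5 * (t**3 - t**2),
    )

TABLE = [catmull_rom_weights(i / R) for i in range(R)]

def upsample_tap(p0, p1, p2, p3, tap):
    """Interpolate between p1 and p2 at sub-sample position tap/R."""
    w = TABLE[tap]
    return w[0]*p0 + w[1]*p1 + w[2]*p2 + w[3]*p3

print(upsample_tap(0.0, 0.0, 1.0, 1.0, 0))   # tap 0 reproduces p1: 0.0
```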
<sensille> s_frit: yes, each segment has to be calculated from the real position incl. error, not the theoretical one. an interesting point: to keep the derivatives continuous, interpolation also has to take these into consideration. except one decides that the error is too small to matter
<s_frit> sensille: this runs through a bunch of low-order polynomial interpolators: http://yehar.com/blog/wp-content/uploads/2009/08/deip.pdf
<sensille> that's a long read
<s_frit> well the short summary is: "just use cubic hermite"
<sensille> "... for audio oversampling"
<s_frit> sensille: there is one with simple coefficients here: http://www.musicdsp.org/archive.php?classid=0#16 see Oscillator::UpdateWithCubicInterpolation
<s_frit> hmm, maybe not that last one
<s_frit> i don't see any difference between audio sampling and what we are discussing here
<s_frit> it's all signals
<s_frit> the code in that last one is more confusing than i thought
<s_frit> thing with that 3rd order cubic is i know it will give you significantly better performance than linear
<s_frit> *linear interpolation
<s_frit> it's pretty much flat for frequencies below N/4
<sensille> the difference (or not?) is that i care about a continuous 2nd derivative
<s_frit> that matters for audio too
<sensille> so can i just use a sound chip to control my steppers? :P
<s_frit> if you can live with pulse rate below 22kHz sure
<sensille> use x/y as stereo input and derive steps from the magnitude of the output signal
<s_frit> anyhow, we're talking about taking your low-rate 4kHz position data (equivalent to the audio signal) and upsampling it in the fpga to 1Mhz or more
<s_frit> out of a raspberry pi you can easily stream 44kHz 64-bit digital data out of the audio interface, so you could easily output 8 8kHz 32-bit values to the fpga that way
<s_frit> i'm talking about the i2s serial audio interface here
soylentyellow__ has quit [Quit: Leaving]
<s_frit> i'm assuming 32-bits is enough resolution to represent the x,y positions on your printer, is that correct?
soylentyellow has joined ##openfpga
<sensille> 20 bits are enough for my printer, and 24 bits would probably be enough for any printer
<s_frit> ok
<s_frit> i'm just thinking some more about this interpolator, and comparing it to usual audio applications
<s_frit> usually you would probably use something better than cubic for such a high oversampling ratio i think
<s_frit> you want to go from 4kHz to 1Mhz say
<sensille> where 4kHz @32 bit and 1Mhz @1 bit
<s_frit> well, kinda, yeah
<s_frit> actually i was thinking 1MHz 32bit and then you generate the 1 bit signal from that
<s_frit> if you want this thing to output 6Mhz pulses, then you'd ideally want to be able to upsample to 6Mhz, but who knows what the fpga can do
<s_frit> i suspect that the optimal solution looks something like: output data from the rpi at the highest feasible rate (ie you generate high-quality, very smooth data using your fancy algorithm) that will probably be something in the region of ~16kHz, but spatially bandlimited to like 1 or 2 kHz, then the fpga runs the most complex interpolation it can afford to get the data up to 6MHz, then you
<s_frit> run the loop i mentioned above to output the pulses
digshadow has joined ##openfpga
<s_frit> the point here is that the better the data is that you generate with the "smooth" algorithm, the less work the interpolator/up-sampler needs to do
<s_frit> you might also want to consider outputting 32-bit fixed-point position data from the smooth algorithm, so the interpolator has more detail to work with
<sensille> what do you mean by 'better'? closer to the original, or modified in a way that the interpolator generates the best result?
<s_frit> i mean closer to the original / more information
<s_frit> in a theoretical sense a perfect interpolator will recover the original signal even if it contains components at nyquist, but there is no perfect interpolator, and low-order polynomial interpolators won't perform that well (although we're talking about very smooth/low frequency source material, so i'd expect them to perform pretty well)
m_w has joined ##openfpga
<s_frit> sensille, where are you up to with the implementation?
<sensille> running for a circle
<sensille> but need to do the corexy coordinate transformation yet
m_w has quit [Ping timeout: 252 seconds]
<s_frit> what does "corexy coordinate transformation" mean?
<sensille> the motor movement does not directly relate to x/y movements
<sensille> but that's a simple linear transformation
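The CoreXY transform sensille mentions is the standard one: motor moves are sums and differences of the Cartesian deltas, so the map and its inverse are a 2x2 linear transform. A sketch:

```python
# Standard CoreXY kinematics: both belts (motors) participate in every
# Cartesian move, but the mapping is purely linear.
def xy_to_motors(x, y):
    """Forward transform: motor A = x + y, motor B = x - y."""
    return x + y, x - y

def motors_to_xy(a, b):
    """Inverse: x = (a + b) / 2, y = (a - b) / 2."""
    return (a + b) / 2, (a - b) / 2

# Round-trips exactly, since the map is linear (determinant -2):
print(motors_to_xy(*xy_to_motors(3.0, 4.0)))   # (3.0, 4.0)
```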
<sensille> other printers may need more complex transformations, like sqrt or transformation into polar coordinates. but that should influence the interpolation much
<s_frit> interesting
<sensille> should not, of course
<s_frit> depending on what scheme you use, you could do the transformation on the low-frequency data prior to interpolation
<sensille> yes, that's the plan
<sensille> in my current implementation i need to transform position, speed and acceleration
<s_frit> is it better to interpolate transformed data, or transform interpolated data? i don't know. it will make a subtle difference if the transformations are not linear
<s_frit> fun :)
<sensille> the interpolation needs to have a step size based on the actual implementation, so i guess the former
soylentyellow_ has joined ##openfpga
soylentyellow has quit [Read error: Connection reset by peer]
soylentyellow_ has quit [Ping timeout: 252 seconds]
ondrej3 has quit [Ping timeout: 245 seconds]
ondrej3 has joined ##openfpga
Miyu is now known as hackkitten
<sensille> maybe now someone can give me a hint with this, from earlier this day: "Unable to resolve delay for path ce -> ltout in cell type LogicCell40!"
<sensille> from icetime
rohitksingh_work has quit [Read error: Connection reset by peer]
lain has quit [Ping timeout: 240 seconds]
Bike has joined ##openfpga
lain has joined ##openfpga
rohitksingh has joined ##openfpga
rohitksingh has quit [Ping timeout: 264 seconds]
Maya-sama has joined ##openfpga
Maya-sama is now known as Miyu
rohitksingh has joined ##openfpga
soylentyellow has joined ##openfpga
rohitksingh has quit [Ping timeout: 240 seconds]
Miyu has quit [Ping timeout: 246 seconds]
rohitksingh has joined ##openfpga
<mithro> azonenberg_work / sensille: Yes nextpnr is timing driven -- can you log a bug about that ce -> ltout delay issue?
pie_ has joined ##openfpga
hackkitten has quit [Read error: Connection reset by peer]
hackkitten has joined ##openfpga
<mithro> sensille: The issue is lack of timing data - if you log a bug I think it will get fixed pretty quickly
GuzTech has quit [Quit: Leaving]
<whitequark> azonenberg_work: poke
rohitksingh has quit [Quit: Leaving.]
emeb has joined ##openfpga
pie_ has quit [Read error: Connection reset by peer]
pie__ has joined ##openfpga
Miyu has joined ##openfpga
Miyu has quit [Read error: Connection reset by peer]
Miyu has joined ##openfpga
pie_ has joined ##openfpga
pie__ has quit [Ping timeout: 240 seconds]
emeb has quit [Ping timeout: 240 seconds]
emeb has joined ##openfpga
m4ssi has quit [Remote host closed the connection]
rainey has left ##openfpga ["Leaving"]
<pie_> lol;
<prpplague> pie_: hehe
ym has quit [Remote host closed the connection]
GuzTech has joined ##openfpga
<awygle> i just accidentally opened a second desktop, i didn't know windows could do that
<whitequark> what
<whitequark> oh windows has a proper WM now
<whitequark> it does tiling too
<pie_> yeh
<awygle> yeah it would have been really cool if i'd... done it on purpose
<awygle> i also had remote desktop open so i got Confused
m_w has joined ##openfpga
digshadow has quit [Ping timeout: 240 seconds]
<rqou> wait windows has a tiling mode now?
azonenberg_work has quit [Ping timeout: 240 seconds]
Ultrasauce has quit [Quit: No Ping reply in 180 seconds.]
Ultrasauce has joined ##openfpga
azonenberg_work has joined ##openfpga
knielsen has quit [Read error: Connection reset by peer]
knielsen has joined ##openfpga
<rqou> wtf is up with network equipment that uses commands like "no foobar" to disable "foobar"?
<rqou> is this a Cisco thing?
<tnt> yes
<whitequark> no u
<Ultrasauce> http://bcas.tv/paste/results/rJ8C5n44.html this is the shit i'm dealing with today
<Ultrasauce> about as inconsistent as naming can be using a single language i think
<Ultrasauce> no public documentation of course, doubt atheros would give me the time of day
<awygle> I dislike how vim does :set novar
<azonenberg_work> rqou: fwiw, LATENT* will probably use an IOS-esque CLI just for ease of use by people who are familiar with it
<azonenberg_work> Not a 100% command clone, but close enough it will be easy to learn
<azonenberg_work> (for example, Quagga is a f/oss implementation of OSPF, BGP, and some other stuff that uses a very IOS-like shell)
<tnt> yeah we actually extracted their vty implementation into a re-usable library that we use for all the osmocom gsm network stuff configuration.
<kc8apf> azonenberg_work: yuck. I much prefer Junos style over Cisco style
<azonenberg_work> kc8apf: i learned on cisco so that's what i know
<azonenberg_work> havent used juniper
<azonenberg_work> That being said, the switching core is going to be pretty well separated from the CLI
<azonenberg_work> So it wouldn't be that difficult for someone to write a replacement CLI that has a different command set
<whitequark> azonenberg_work: poke
<azonenberg_work> whitequark: ack
<kc8apf> wtf? Since IIS 6.0, HTTP request parsing is done in the windows kernel
<whitequark> azonenberg_work: any luck with SWD?
<azonenberg_work> whitequark: got pulled away to do something else for a bit, about to try some more stuff now
<whitequark> sweet
<azonenberg_work> whitequark: right now i'm working on adding swd support to the network protocol as well as adding checks to prevent confusion if you have a swd client connect to a jtag cable and vice versa
<awygle> oh wtf
<awygle> siglent doesn't even sell a rack mount kit for the SSA3000X series
<azonenberg_work> awygle: hint that you're not dealing with professional grade hardware? :p
<awygle> azonenberg_work: i mean rigol sells one for the 1000Z series :p
<azonenberg_work> lol yes i know
<azonenberg_work> i have one for my 1102d
<awygle> want VNA
<azonenberg_work> awygle: i want to be able to do s-parameters through a diffpair
<awygle> don't we all
<azonenberg_work> So that would need a 4-port VNA right?
<azonenberg_work> Not a 2-port like that
<azonenberg_work> i'm honestly not sure what i'd do with a 2-port VNA because all of my high speed stuff is differential
<azonenberg_work> i guess i could characterize each leg independently and hope to extrapolate or something
kuldeep has quit [Read error: Connection reset by peer]
<awygle> i wonder if a 2:1 balun would give you what you need... probably not
<azonenberg_work> awygle: on a different note, some time soon i want to wriracing software
<azonenberg_work> to write some curve tracing software*
<azonenberg_work> (gaah vmware stealing my keyboard focus)
<awygle> oh good, there were more letters there lol
<azonenberg_work> basically bolt my DSO to my variable PSUs
<azonenberg_work> and add support for things like plotting I/V through a FET as a 3D function of Vgs
<awygle> should get a load pull machine and wire that in too
<awygle> or build one
<awygle> i wonder if a variable DC load like you can just buy would work for that...
<azonenberg_work> Not sure
<azonenberg_work> Short term i just want to be able to do things like plot I/V through a diode
<azonenberg_work> i mean i have all of the hardware i need already
<azonenberg_work> I just need to bolt the pieces together
<awygle> "just" :p
<reportingsjr> tools, tools, tools!
<reportingsjr> I keep wanting/needing a DC load and I'm torn between building one and buying a cheap one :P
<awygle> argh lol
<awygle> reportingsjr: same. also a lab psu
<reportingsjr> awygle: I bought the EEZ H24005 when the crowdsupply/whatever run happened, I've been pretty pleased with it
<reportingsjr> a bit pricey for my normal budget though
<awygle> oh cool. Missed this. What did it cost you?
<reportingsjr> I think it was around $400
<azonenberg_work> I want to build my own lab PSU because these rohde & schwarz ones are soooo slow
<reportingsjr> I normally budget about $300/year for a piece of test equipment, so it was a bit higher than I like to spend
<azonenberg_work> like, 1 FPS or less update rate over the SCPI interface
<awygle> yeah I wonder how fast this h24005 is
<azonenberg_work> If i could have the same PSU core and a better management interface i'd be happy
<reportingsjr> good question, I haven't pushed it
<awygle> if it's super slow we could always replace or augment the cpu with an fpga I guess
<awygle> it *is* open source
<reportingsjr> I know that it has the ability to output a "waveform" somewhat fast
<reportingsjr> I don't know if there is going to be another run of the HW at some point. I had heard that was going to happen at some point.
<awygle> How complex is it? We could always build, like, three lol
<reportingsjr> more complex than that
<reportingsjr> well, you could, the price would just be way higher than that
<reportingsjr> it looks like the creator is working on a new "rev" of the powersupply atm
<awygle> speaking of building stuff, glasgow boards arrive thursday
<reportingsjr> nice
<reportingsjr> rev b?
<awygle> yep
Bike has quit [Ping timeout: 252 seconds]
<reportingsjr> how many did you end up ordering?
<awygle> 10 PCBs, kit for 5
<reportingsjr> planning on sending any of them out to other people yet?
<awygle> one for me, two for whitequark, one for azonenberg_work, one to sacrifice to the yield gods
<reportingsjr> haha
Miyu has quit [Ping timeout: 252 seconds]
m_w has quit [Quit: Leaving]
kuldeep has joined ##openfpga
GuzTech has quit [Ping timeout: 246 seconds]
kuldeep has quit [Ping timeout: 252 seconds]
Bike has joined ##openfpga
kuldeep has joined ##openfpga
azonenberg_work has quit [Quit: Leaving.]
azonenberg_work has joined ##openfpga
kuldeep has quit [Ping timeout: 240 seconds]
kuldeep has joined ##openfpga
<awygle> what's the state of the art in open source DDR* controllers?
<q3k> litedram is pretty neat
<q3k> doesn't have phy code for ecp5 tho
<q3k> (i was supposed to write it but then I got distracted)
<gruetzkopf> doit
<q3k> (I have DDR working from migen/litex though, which is not as easy as it should be, though)
<q3k> (you need to manually string together a ddr block and a delay block and some custom logic :/)
kuldeep has quit [Ping timeout: 244 seconds]
kuldeep has joined ##openfpga
<awygle> litedram does look cool
<awygle> i don't love migen/litex tho. but I'll get over it I guess.
<q3k> one of us
<q3k> i don't love it either
<q3k> but then again, there's nothing about computers that I love anymore
<awygle> too dark for a tuesday afternoon lol
<awygle> i already have compilers gaslighting me i don't need an extra helping of ennui :p
<whitequark> awygle: why not?
<awygle> whitequark: i guess i just don't get it?
<whitequark> huh?
<awygle> like, i will totally admit that verilog is suboptimal in a lot of ways
<awygle> but i don't get why using python instead is somehow amazing
<q3k> it's not really instead
<awygle> it just means i have to learn a whole new set of conventions and spend _another_ six months not knowing whether i'm doing a <= or a = assignment
<whitequark> awygle: oh
<q3k> once you wrap your head around the fact that all the python is just run at compile time, it clicks
<awygle> i did that with verilog in college, i don't know why i'd throw away that knowledge
<whitequark> well uh, two things
<q3k> that knowledge still kind of applies, and migen actually simplifies it
<whitequark> first, it's not useful knowledge
<q3k> you have self.comb and self.seq and that's it
<whitequark> it's an artifact of verilog being shit. it's like remembering all the implicit promotion rules in javascript.
<q3k> sorry, self.sync
<q3k> ^ this
<whitequark> second, the usefulness of migen is not in that it uses python
<awygle> i will take that as a typo and not you proving my point for me lol
<whitequark> python is actually kind of bad for this task
<whitequark> the usefulness of migen is that it's one meta level removed from verilog
<q3k> yes
<whitequark> you literally can't do with verilog what i'm going to do in migen for glasgow soon
<whitequark> which is to say
<whitequark> take sequential descriptions of processes, like "receive 4 bytes from a FIFO and put them into a register" and generate an FSM that has a state per byte
<whitequark> no amount of shitty verilog generate statements and macros will actually make that usable
<q3k> doing things 'the other way around', ie having your python meta-design generate external data is also super useful
<whitequark> writing verilog is like writing assembly. shitty assembly that's too high level to actually represent your target device yet too low level to express anything useful
<awygle> that... kind of sounds trivial to do in verilog, so i assume i'm not understanding the example
<q3k> you create a config register, and that automatically updates a C header file for you if you use misoc
<q3k> without ever having to manually maintain a memory map of any sort
<whitequark> awygle: so you have a glasgow FIFO
<whitequark> with a dout, readable, and re signals
<whitequark> that gives you a byte at a time
<whitequark> an FSM that reads four bytes from there needs 4 states
<awygle> sure, okay
<whitequark> I can make an abstract operation of "get n bytes (or even bits) from FIFO and put them in this register" and have it generate the FSM states automatically
<whitequark> without any defines, localparams, having to track what state has which number, caring about resets
<awygle> i mean i'd just do this with a counter. which is technically an N-state FSM but is super easy to define and reason about
<whitequark> oh, but now you need to embed this in a much larger protocol engine
<whitequark> think it's a part of the SWD protocol
<whitequark> the engine would be driven by FIFO commands and drive another module
<whitequark> take a look at swd.py I wrote recently and tell me this would be as straightforward in verilog
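The kind of metaprogramming whitequark describes can be sketched in plain Python — this is not migen's actual API, just the shape of the idea: the host language generates one FSM state per byte, with numbering, next-state links, and shift amounts derived automatically instead of hand-maintained localparams.

```python
# Illustrative sketch: turn "read n bytes from the FIFO into a register"
# into n generated FSM states. (Not migen's API -- names are made up.)
def gen_read_fsm(n_bytes, prefix="READ"):
    states = []
    for i in range(n_bytes):
        states.append({
            "name": f"{prefix}_{i}",
            "shift": 8 * i,                                   # where this byte lands in the register
            "next": f"{prefix}_{i+1}" if i + 1 < n_bytes else "DONE",
        })
    return states

for s in gen_read_fsm(4):
    print(s["name"], s["shift"], "->", s["next"])
# READ_0 0 -> READ_1
# READ_1 8 -> READ_2
# READ_2 16 -> READ_3
# READ_3 24 -> DONE
```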
<rqou> what about the classic $SILICON_VENDOR technique of ad-hoc code generators written in perl/bash/c/tcl/php? :P
<awygle> q3k: i can see how that would be useful, although it is impossible to overstate how much i do not care about the "cpu on an fpga" use case (but i realize other people do and that that would be useful for them)
<whitequark> this is migen
<whitequark> it's an ad-hoc code generator written in python
<q3k> awygle: that's just an example off the top of my head
<whitequark> just less crappy than the usual thing
<rqou> lol
<awygle> q3k: right, i get you and i see the value in the general point
<awygle> whitequark: okay, i'll take a look at that sometime soon
<q3k> but yeah, tbh migen does need better docs
<q3k> but I'm still not sure how to actually write better docs
<whitequark> there's also a lot of value in migen's stdlib
<awygle> i just had a hell of a time with the tiny bits of migen i wrote for the FTDI stuff
<awygle> which sort of reinforced my preexisting bias
<whitequark> I literally never want to write a FIFO from scratch
<whitequark> hm
<awygle> whitequark: again, i see the value in this, but my response to that is something like fusesoc, not something like migen or litex (and what even is the difference anyway?)
<q3k> litex is a fork of misoc
<q3k> both are frameworks that build on migen and provide abstractions useful in SoC definition
<q3k> both for the actual logic and automation around it
<whitequark> awygle: I can write Verilog but I can't imagine any reason I actually would want to write it, unless I'm literally forced to
<awygle> i guess the core of my irritation is that absolutely _everything_ in the FPGA ecosystem is _garbage_, and everyone is like "the reason is verilog" whereas i'm like "i can't even evaluate whether verilog is good or bad because i can't see it under all this garbage"
<whitequark> similar to how I can write C but can't imagine any reason I would want to
kuldeep has quit [Ping timeout: 264 seconds]
<whitequark> the reason isn't just verilog
<whitequark> verilog is as garbage as the other things but independently
<whitequark> it's a pattern of bad decisions basically
<awygle> verilog is like writing C if every compiler segfaulted >1 time per day on every platform
<awygle> lol
<whitequark> verilog is like writing C without valgrind or ubsan
<q3k> verilog is the result of the ecosystem having made a series of unfortunate decisions for dozens of years
<whitequark> like yosys would just silently miscompile incorrect verilog and there is literally no way to make it complain
<whitequark> and clifford has not been very happy about me trying to fix that
<whitequark> "just write correct verilog duh"
<awygle> yeah well, that's a whole other thing
<whitequark> no that's a part of this pattern
<awygle> the level of macho in the ecosystem is quite high
<awygle> for no good reason
<q3k> whitequark: I vaguely remember this discussion from way back, if you still have an example I'd like to see it
<whitequark> mind you, yosys is lightyears ahead of the rest of the ecosystem, but even it is tainted with this stupid bullshit
<whitequark> q3k: it's on yosys bugtracker i think
<q3k> whitequark: since I was having a friendly argument with clifford about this recently
<whitequark> basically it resolves driver-driver conflicts to a constant
kuldeep has joined ##openfpga
<whitequark> in some unobvious cases
<whitequark> even when it emits a warning there's no good way to turn that into an error
<whitequark> clifford added an option to do -Werror based on warning text but I'm not going to use that, it's gross
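For reference, a minimal Verilog fragment of the kind of construct being discussed — one reg driven from two always blocks. Whether a tool errors, warns, or silently resolves the conflicting net (per the discussion, possibly to a constant) is tool-dependent:

```verilog
// One reg, two drivers: a driver-driver conflict. The LRM does not
// define a single synthesis behavior here; some tools silently
// resolve such a net rather than raising an error.
module conflict (
    input  wire clk,
    input  wire a,
    input  wire b,
    output reg  y
);
    always @(posedge clk) y <= a;
    always @(posedge clk) y <= b;   // second driver of y
endmodule
```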
<q3k> whitequark: do you know if any of the IEEE standards mandate a particular behaviour in that case
<whitequark> no
<whitequark> it's literally UB
<whitequark> by design
<q3k> ah wonderful
<whitequark> verilog has no fucking reason to have UB
<whitequark> during synthesis
<whitequark> i will die on this hill
<q3k> i think 'making the warning/error system better' is somewhere on the backlog closer to the head than the tail
<pie_> how can a HDL have undefined behaviour
<pie_> *sane HDL
<q3k> ie at least semanticize it so that -W(no-)error does not have to be stringly typed
<awygle> what would be a reasonable answer? i guess just "synthesis terminated"?
<whitequark> to be explicit, i think clifford is doing a great job, it's that my standards are set by the vastly better sequential language compilers
<whitequark> pie_: well migen doesn't have UB*
<whitequark> * not by design, i think you can sort of cause UB with it if you try, but that should be fixed as an implementation bug
<whitequark> q3k: but even -Werror now is not sufficient
<whitequark> there's some cases when that warning isn't emitted at all
<awygle> well anyway, thanks for the lively discussion. i will continue to try to grasp the value of migen
<whitequark> I have a patch somewhere in the queue that fixes it, but I got frustrated after that discussion with Clifford and never finished it
<whitequark> I can give you the diff if you want
<awygle> and continue to not yell "WHY" at everyone who mentions it
<q3k> i wouldn't mind just a pointer to the issue tracker and/or pull request if there is one
<q3k> i honestly won't have time to work on that this month :)
<q3k> i first wanna resolve some nextpnr bugs
<q3k> like the PLL bugs the cr1901_modern keeps finding :P
<q3k> *that
<whitequark> here's the case where no warning is emitted at all
<pie_> quote worthy: <q3k> verilog is the result of the ecosystem having made a series of unfortunate decisions for dozens of years
<awygle> to finish out my thought earlier
<q3k> whitequark: thx
<awygle> it seems to me ill-advised to try to overturn the existing ecosystem without the support of any of the hardware vendors before we have exhausted all other options. which is why i am much more interested in (open source, none of this Verific bullshit) SystemVerilog support for Yosys than i am in Migen
<awygle> thank you all for your time lol
<q3k> awygle: i would love SV support in yosys as well
<q3k> awygle: just that nobody has come up and said that they want to and have time to implement it :P
<awygle> yeah that second half is a real bear :p