#glasgow on 2019-03-03 — irc logs at freenode.irclog.whitequark.org

2019-02-06 02:04 whitequark changed the topic of #glasgow to: glasgow debug tool · code https://github.com/whitequark/Glasgow · logs https://freenode.irclog.whitequark.org/glasgow

03:40 <whitequark> ok

03:40 <whitequark> i need a taxonomy

03:41 <whitequark> spi-flash-25c is dumb

03:41 <whitequark> and spi-flash-avr even more so

03:41 <whitequark> what's important is that it operates on memory

05:56 _whitelogger has joined #glasgow

06:14 _whitelogger has joined #glasgow

09:04 <_whitenotifier> [whitequark/Glasgow] whitequark pushed 2 commits to master [+55/-44/±3] https://git.io/fhApf

09:04 <_whitenotifier> [whitequark/Glasgow] whitequark 8af62aa - software: remove explicit asyncio dependency.

09:04 <_whitenotifier> [whitequark/Glasgow] whitequark efb57fe - applet: invent a taxonomy and reorganize everything according to it.

09:06 <_whitenotifier> [Glasgow] Success. The Travis CI build passed - https://travis-ci.org/whitequark/Glasgow/builds/501039580?utm_source=github_status&utm_medium=notification

10:16 * whitequark pokes marcan

10:17 <whitequark> think you can go over the various layout issues?

10:17 <whitequark> revC0 desperately needs design files exported

10:17 <whitequark> for one

10:17 <marcan> busy today, but ack

13:20 ali-as has joined #glasgow

13:22 <ali-as> whitequark, I share some of your pain with EJTAG.

13:23 <whitequark> ali-as: i also implemented the undocumented ejtag 1.x/2.0 hack required for broadcom silicon

13:23 <whitequark> that was horrifying

13:23 <ali-as> Hmm, what hack?

13:23 <whitequark> you have to enable DMAAcc, which is an undocumented mode that lets you directly read and write memory

13:23 <whitequark> to enable the rest of EJTAG

13:24 <whitequark> there's a bit that would cause the processor to fault each time you're trying to access drseg

13:24 <whitequark> er, dmseg

13:24 <whitequark> so you'd never actually, well, be able to feed it any instructions

13:24 <whitequark> now imo they should have kept DMAAcc and memory-mapped the CPU state or something

13:24 <whitequark> how fucking hard can it be

13:25 <ali-as> Interesting. In the mid-2000s I modified HairyDairyMaid's router flasher to work with Broadcom SoCs.

13:25 <ali-as> That was a complete nightmare.

13:25 <ali-as> It was an early EJTAG version though, no DMA access.

13:26 <ali-as> 64bit CPU, about 40 address lines (I'm fuzzy on this).

13:28 <ali-as> I'd moved from STMicro ST20, which had an utterly glorious hardware debug system, to EJTAG, which seemed little more than a port to the CPU instruction scheduler.

13:28 <whitequark> no, my code is a descendant of HairyDairyMaid

13:29 <whitequark> that's the only place where DMAAcc is documented

13:29 <ali-as> Do the chips you are working with have a FASTLOAD register?

13:30 <ali-as> I never released my code btw. I was part of a small group of people interested in modifying sat boxes.

13:31 <whitequark> FASTDATA is in EJTAG and I think that exists

13:31 <whitequark> but it doesn't help a whole lot while introducing a huge amount of complexity

13:31 <whitequark> what I'm thinking about is well, using a small CPU on the FPGA to drive JTAG transactions

13:32 <whitequark> well I guess first I should add proper pipelining

13:34 <ali-as> There are some cool aspects of EJTAG I didn't really get while I was sweating blood over FASTLOAD.

13:35 <ali-as> At a time when you were lucky to push TCK to 10MHz, the EJTAG port was specced at something silly like 40MHz.

13:35 <ali-as> And you only needed a small amount of memory to stop feeding the CPU instructions and just do data.

13:36 <ali-as> Of course the 40MHz spec didn't help at all, I was using a parallel port wiggler ;)

13:47 <ali-as> (reading), FASTDATA could be what I'm thinking of, but the openocd docs say it's 1 bit wide. What I remember is a register with PrAcc+Address+Data in one scan chain.

13:47 <ali-as> I made it work in one direction I think.

13:51 <ali-as> Halvar retweeted your EJTAG thread btw. Sorry to burst in like this mid-conversation ;) I've had people tell me I should look at Glasgow.

13:52 <whitequark> ali-as: ohhh that one

13:52 <whitequark> let me look up, i remember something like it

13:53 <whitequark> i may even be using it

13:53 <whitequark> ali-as: yeah, it's called "ALL"

13:54 <whitequark> register 0x0B

13:55 <ali-as> Hmm. Is it though. I've not touched this in 10 years. I swear I have PTSD trying to make that work, reading and burning was SO SLOW.

13:56 <ali-as> I looked into adding a CPLD to an MCU to speed up the protocol but never made anything happen.

13:56 <whitequark> yes, ALL is combined CONTROL+ADDRESS+DATA
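A sketch of how one combined ALL scan could be packed and unpacked on the host. It assumes the Control register sits nearest TDO (so it occupies the low, first-shifted bits), followed by Data and then Address; the address width varies per core and is a parameter. `pack_all` and `unpack_all` are hypothetical helpers, not Glasgow's API, and the field order is an assumption to check against the EJTAG spec for your core.

```python
def pack_all(address, data, control, address_bits, data_bits=32, control_bits=32):
    """Pack ALL-register fields into one integer; bit 0 is shifted first.

    Assumed order (TDO side first): Control, then Data, then Address.
    """
    assert 0 <= control < (1 << control_bits)
    assert 0 <= data < (1 << data_bits)
    assert 0 <= address < (1 << address_bits)
    return control | (data << control_bits) | (address << (control_bits + data_bits))


def unpack_all(value, address_bits, data_bits=32, control_bits=32):
    """Split a captured ALL scan back into (address, data, control)."""
    control = value & ((1 << control_bits) - 1)
    data = (value >> control_bits) & ((1 << data_bits) - 1)
    address = (value >> (control_bits + data_bits)) & ((1 << address_bits) - 1)
    return address, data, control
```

One scan of `address_bits + 64` bits then replaces three separate IR/DR round trips.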

13:56 <ali-as> I may be mis-remembering.

13:57 <whitequark> FASTDATA is basically a FIFO your code may be using through drseg

13:57 <whitequark> it's useful for something like piping printf() output to a debugger

13:57 <ali-as> That sounds right.

13:58 <whitequark> oh, I see why it's not implemented, it's in EJTAG 2.6

13:58 <whitequark> wait, no, I'm wrong

13:58 <whitequark> FASTDATA saves you one CONTROL write to set PrAcc bit to 0

13:59 <whitequark> there is a separate mechanism called Fast Debug Channel in EJTAG 5

13:59 <whitequark> which is what I described above

13:59 <whitequark> the instruction for fast debug channel is called FDC
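For reference, the CONTROL write that FASTDATA saves is the one that clears PrAcc to release a pending processor access. A minimal sketch, assuming the common EJTAG Control register bit positions (PrAcc at bit 18, ProbEn at bit 15, as defined in OpenOCD's `mips_ejtag.h`); `complete_pracc` is a hypothetical name.

```python
# EJTAG Control register bits (assumed positions; verify for your core).
EJTAG_CTRL_PROBEN = 1 << 15  # probe is servicing dmseg accesses
EJTAG_CTRL_PRACC = 1 << 18   # a processor access is pending


def complete_pracc(control):
    """Return the Control value that tells the CPU its pending access is done.

    Writing PrAcc=0 completes the access; ProbEn is kept set so the probe
    stays in charge of dmseg.
    """
    return (control | EJTAG_CTRL_PROBEN) & ~EJTAG_CTRL_PRACC
```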

14:00 <ali-as> I was in a hell of a mess for months. I'd made a mistake with 64bit shifts (64bit support was undocumented and a bit broken, but the actual mistake was mine) and I had to have nops as half of the 64bit instruction.

14:00 <whitequark> amazing

14:00 <whitequark> no matter how bad a claim you make about EJTAG, i will believe you

14:01 <ali-as> Sorry no, support in the C compiler.

14:01 <ali-as> But there was no public code for 64bit EJTAG.

14:01 <whitequark> ahh

14:01 <whitequark> funnily enough, glasgow theoretically supports 64 bit EJTAG

14:01 <whitequark> all the parts are there

14:02 <whitequark> I just have nothing to test it on

14:02 <ali-as> Parts of EJTAG, including FASTDATA, I ultimately think were broken in the chip, but that's probably Broadcom's fault.

14:02 <ali-as> But then I never had a datasheet for the SoC because Broadcom don't do that.

14:02 <whitequark> FASTDATA or FDC?

14:02 <whitequark> broadcom seems to have some really weird concept of EJTAG

14:03 <ali-as> FASTDATA rings a bell.

14:03 <whitequark> ugh, let me pull the PCB out of the shit router pile i have

14:04 <whitequark> i went to the nearest pawn shop (or something like it) and bought their oldest, crappiest routers

14:04 <whitequark> in hope for jtag

14:04 <ali-as> My memory is that early EJTAG was just a hole in the instruction scheduler, and someone said: technically, in a Turing-complete-proof kind of way, this is your debugger, enjoy.

14:04 <whitequark> it's mostly broadcom although i have found some truly bizarre SoCs

14:04 <whitequark> that's late EJTAG too.

14:04 <whitequark> all of EJTAG is like that.

14:04 <ali-as> At the time EJTAG had standards that had DMA, but this Broadcom chip's EJTAG version was pre-DMA.

14:05 <whitequark> they actually removed DMA in the new EJTAG spec

14:05 <ali-as> I think I read that.

14:05 <whitequark> and in fact i have never found any EJTAG spec that would describe DMA

14:05 <whitequark> it's 1.x specs

14:05 <whitequark> which I don't have

14:05 <whitequark> I had to plunder register bit definitions from openocd

14:06 <ali-as> It's probably minimal silicon.

14:08 <ali-as> I 'discovered' the STM32F4 chips a few years ago; they will toggle at 80 Mtoggles/second. I thought, fantastic, I can make the JTAG debugger of my dreams, but now JTAG is locked out of most of the consumer hardware I'd want to access.

14:11 <whitequark> Glasgow is probably a JTAG debugger of your dreams :P

14:11 <ali-as> It actually probably is ;)

14:11 <whitequark> the level shifters are rated at... well let's say you will never make the FPGA fabric toggle that fast.

14:11 <whitequark> 200 MHz is doable.

14:12 <whitequark> 300 MHz, maaybe

14:12 <whitequark> really at this point your TAP isn't going to keep up for sure.

14:12 <ali-as> Wow.

14:12 <whitequark> right now I think fmax is around 10 MHz, but I simply never had a reason to push it higher.

14:12 <whitequark> I could make it go to at least 60 MHz with a few lines of changes.

14:12 <whitequark> 120 MHz with minimal work, 240 MHz with slightly more work.

14:13 <ali-as> I bought a 'DLC9' from ebay. It's not even an honest clone, it's a FTDI chip with a level shifter. Functional but fake.

14:13 <whitequark> ugh.

14:13 <whitequark> I hate the FTDI shit.

14:13 <ali-as> Yeah, yeah that.

14:13 <whitequark> I've implemented FTDI MPSSE in gateware.

14:13 <whitequark> Glasgow is supposed to be able to emulate FTDI chips.

14:13 <ali-as> Wow.

14:14 <whitequark> so far I found it more interesting *and* useful to implement applets that directly access the required interface though.

14:14 <whitequark> Glasgow has an integrated logic analyzer, so if you write an "accelerated JTAG" gateware and it breaks, you add a `--trace jtag.vcd` argument.

14:14 <whitequark> then you put that into PulseView

14:14 <whitequark> zero messing with wires.

14:15 <whitequark> the logic analyzer understands that JTAG is a synchronous interface, so if your gateware is trying to toggle too fast, it'll gate its clock until the LA buffers clear.

14:15 <whitequark> (some caveats apply)

14:15 <ali-as> WOW.

14:15 <whitequark> this means that when Glasgow is driving any synchronous interface as a master, you can directly observe the signals that the FPGA sees, with *zero* additional effort.

14:16 <ali-as> THAT IS AWESOME.

14:17 <whitequark> the logic analyzer is still rough around the edges, e.g. if you have floating pins near the crossing point, it may quickly overload its buffer, and you can't stop that noise by stopping the clock, for example

14:17 <ali-as> The number of times I've had to hook up an oscilloscope to check what the REAL TDO response is.....

14:17 <whitequark> but it's quite usable

14:17 <whitequark> yep, no oscilloscopes

14:17 <ali-as> Everything abstracts JTAG protocol, everything.

14:17 <whitequark> you could even hook up signals from inside your gateware directly to the logic analyzer

14:18 <whitequark> like the FSM state

14:18 <whitequark> once I get some refactoring done, I want to make that fully automatic

14:18 <ali-as> Are there spare lines you can wire into other parts of the circuit?

14:18 <whitequark> so you get pin states, pin output enables, and state of your gateware, all with one command line argument

14:18 <whitequark> oh no, it's not structured like that.

14:18 <whitequark> you give it the set of signals you want to watch.

14:19 <whitequark> it converts that into a set of FIFOs, such that it uses block RAM more efficiently

14:19 <whitequark> for example, it also watches all the commands that are read from USB, and responses to them.

14:19 <whitequark> often, 1 USB command can produce 1000s of toggles on data lines.

14:19 <whitequark> and you don't want those two to share FIFO depth.

14:20 <whitequark> so, it has shallow FIFOs for the commands, and deep FIFOs for data lines.

14:20 <whitequark> also, it does on-the-fly compression.

14:20 <whitequark> if nothing is toggling, there will be no data output on the logic analyzer pipe.

14:20 <whitequark> it works best on very bursty data.

14:21 <ali-as> If the first time I looked at your schematics I'd have seen a xilinx chip, I'd have joined this project straight away.

14:21 <whitequark> internally this is implemented by keeping several data FIFOs, an event FIFO that says which data FIFO has a value for which event, and a timestamp FIFO that stores deltas between samples.

14:21 <ali-as> I've never touched Altera before.

14:21 <whitequark> then, there is an encoder that collects all these FIFOs and converts them to an optimized byte stream.

14:22 <whitequark> for example the timestamps are 16 bit, so you need to insert a dummy event every 65536 cycles

14:22 <ali-as> It's higher level than I'm going to be able to follow on a first pass.

14:23 <whitequark> the encoder maintains a running tally with a 35-bit counter, so if absolutely nothing happens, it'll send just a few bytes per hour

14:23 <whitequark> it will also only report the actual events at any cycle that an event has happened
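The 16-bit timestamp scheme can be sketched as delta encoding with filler events: when the gap since the previous event does not fit in 16 bits, empty "nothing happened" events are emitted until it does. This is an illustration of the idea, not Glasgow's encoder; it uses 65535 as the largest representable gap, and the coarser 35-bit running counter is not modelled. `encode_deltas` and `decode_deltas` are hypothetical names.

```python
MAX_DELTA = 0xFFFF  # largest gap a 16-bit timestamp field can hold


def encode_deltas(events):
    """events: (cycle, value) pairs with strictly increasing cycles.

    Yields (delta, value) pairs; value is None for a filler event.
    """
    last = 0
    for cycle, value in events:
        gap = cycle - last
        while gap > MAX_DELTA:
            yield (MAX_DELTA, None)  # dummy event: only time passed
            gap -= MAX_DELTA
        yield (gap, value)
        last = cycle


def decode_deltas(pairs):
    """Rebuild absolute cycle numbers, dropping filler events."""
    cycle = 0
    for delta, value in pairs:
        cycle += delta
        if value is not None:
            yield (cycle, value)
```

Idle stretches cost one filler record per 65535 cycles instead of one sample per cycle, which is why bursty data compresses well.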

14:24 <whitequark> I also have special code that considers FIFO depth and width, and picks the block RAM primitive that is most suited for the geometry, in terms of not leaving unused silicon
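A sketch of the primitive-selection idea for iCE40, whose SB_RAM40_4K block RAM can be configured as 256x16, 512x8, 1024x4 or 2048x2 (4096 bits each). For a given FIFO width and depth it picks the shape that needs the fewest blocks. Glasgow's actual heuristic may weigh things differently; `pick_bram_shape` is a hypothetical name.

```python
import math

# iCE40 SB_RAM40_4K aspect ratios as (width, depth); each shape is 4096 bits.
ICE40_BRAM_SHAPES = [(16, 256), (8, 512), (4, 1024), (2, 2048)]


def pick_bram_shape(width, depth, shapes=ICE40_BRAM_SHAPES):
    """Return (block_count, (width, depth)) using the fewest block RAMs.

    Ties keep the earliest shape in the list.
    """
    best = None
    for w, d in shapes:
        blocks = math.ceil(width / w) * math.ceil(depth / d)
        if best is None or blocks < best[0]:
            best = (blocks, (w, d))
    return best
```

For example, an 8-bit by 512-entry FIFO fits one block in 512x8 mode but would need two blocks in 256x16 mode, half of which would sit unused.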

14:24 <whitequark> ali-as: I don't use Altera.

14:24 <whitequark> I use Lattice.

14:24 <ali-as> Some of my early stuff on EJTAG was programming 4 keys on my keyboard to be TMS TDI 00 01 10 11, and the output was the state of TDO, and I spent hours just going through registers.

14:25 <ali-as> Cyclone is Lattice?

14:25 <whitequark> and in fact, the project would not have been possible on any FPGA family other than iCE40 when it was designed, and is still not possible on any FPGAs other than Lattice's

14:25 <whitequark> I do not use Cyclone

14:25 <whitequark> I use Lattice iCE40, the iCE40HX8K in the latest revision.

14:25 <whitequark> the reason is, iCE40 has a completely open-source toolchain, and it is of extremely high quality.

14:26 <ali-as> Oh, I'm an idiot, I'm reading CY7 and that's Cypress's USB chip.

14:26 <whitequark> when you run an applet, the Glasgow software dynamically composes, synthesizes, places and routes a bitstream specifically optimized for whatever parameters you just requested.

14:26 <whitequark> for something like a UART, this takes... let's see

14:27 <whitequark> under 9 seconds of wall clock time.

14:27 <ali-as> I have nothing against Lattice either, but it's going to be a whole new toolchain to learn :/

14:27 <whitequark> from $ glasgow run uart, to having a bitstream loaded into the FPGA.

14:27 <whitequark> nothing is cached.

14:27 <whitequark> the tools are so fast, that caching would be a waste of space and a source of bugs.

14:27 <whitequark> can you see *any* xilinx tool *ever* producing a bitstream in 9 seconds?

14:28 <whitequark> with the amount of useless shit Vivado has, and the completely braindead way it is written, you are lucky if it will start doing anything useful in 9 seconds.

14:28 <ali-as> It's slow for sure, and I've not linked it to anything else.

14:29 <whitequark> also, you wouldn't really need to learn anything except Migen

14:29 <whitequark> Glasgow transparently handles things like pin assignment, USB requests (both FPGA side and PC side), and even PLLs.

14:29 <whitequark> you tell it what frequency you want, and it automatically computes every parameter.

14:29 <whitequark> no "wizards", just a Python function.
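A sketch of the kind of search that replaces a PLL "wizard", for an iCE40 PLL in SIMPLE feedback mode, where f_out = f_in * (DIVF+1) / ((DIVR+1) * 2^DIVQ). The divider ranges and the 10 to 133 MHz PFD and 533 to 1066 MHz VCO limits are the ones used by icestorm's `icepll` tool; Glasgow's own function may search or validate differently, and `ice40_pll_params` is a hypothetical name.

```python
def ice40_pll_params(f_in, f_out_target):
    """Brute-force DIVR/DIVF/DIVQ for an iCE40 PLL (SIMPLE feedback).

    Frequencies are in MHz. Returns (error, divr, divf, divq, f_out) for the
    closest achievable output, or None if no setting is legal.
    """
    best = None
    for divr in range(16):
        f_pfd = f_in / (divr + 1)
        if not 10 <= f_pfd <= 133:
            continue  # phase detector input out of range
        for divf in range(128):
            f_vco = f_pfd * (divf + 1)
            if not 533 <= f_vco <= 1066:
                continue  # VCO out of range
            for divq in range(1, 7):
                f_out = f_vco / (1 << divq)
                error = abs(f_out - f_out_target)
                if best is None or error < best[0]:
                    best = (error, divr, divf, divq, f_out)
    return best
```

For a 12 MHz input and a 48 MHz target this finds DIVR=0, DIVF=63, DIVQ=4 (VCO at 768 MHz) with zero error.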

14:30 <ali-as> I can't code in python.

14:30 <whitequark> you'll have to learn that to write Glasgow applets

14:30 <whitequark> Glasgow does *everything* in Python.

14:30 <whitequark> gateware in Python, host software in Python, UI in Python, etc

14:30 <whitequark> yes. no Verilog at any point

14:31 <whitequark> the reason is, remember my logic analyzer?

14:31 <ali-as> That's disappointing. I'm happy with assembler (on a few CPUs) and some C. I want to learn either Verilog or VHDL.

14:31 <ali-as> Yes.

14:31 <whitequark> Glasgow sets things up so that your applet is automatically instrumented, with just a single command line option. then it adds the USB code to read the data from the logic analyzer. then it formats the binary stream to have the pin names you have assigned, regardless of the actual pinout. no more looking for "probe 13"
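Glasgow's trace writer isn't shown in this log. The sketch below only illustrates the final step described above: emitting a VCD file whose signals carry user-assigned names such as `tck` instead of probe numbers, so PulseView shows them directly. `write_vcd` and its input format are hypothetical; it handles 1-bit signals only, and at most 94 of them.

```python
def write_vcd(changes, names, timescale="1 ns"):
    """Render value changes as VCD text using user-assigned signal names.

    changes: list of (time, {signal_index: 0 or 1}) in increasing time order.
    names:   {signal_index: "name"}, e.g. {0: "tck", 1: "tms"}.
    """
    # VCD identifiers are short printable strings; one character each here.
    ids = {index: chr(33 + n) for n, index in enumerate(sorted(names))}
    lines = [f"$timescale {timescale} $end", "$scope module top $end"]
    for index in sorted(names):
        lines.append(f"$var wire 1 {ids[index]} {names[index]} $end")
    lines += ["$upscope $end", "$enddefinitions $end"]
    for time, values in changes:
        lines.append(f"#{time}")
        for index, value in sorted(values.items()):
            lines.append(f"{value}{ids[index]}")
    return "\n".join(lines) + "\n"
```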

14:32 <whitequark> you can't do this in Verilog.

14:32 <whitequark> Migen, and (by extension as well as independently) Python, are critical components that make Glasgow what it is, and if I had to stick with C, I would not have bothered with the project.

14:33 <whitequark> (as a professional language designer I also think C is a disaster reprehensible in every imaginable way, but that isn't even very relevant here.)

14:33 <whitequark> (I hate Python too.)

14:33 <ali-as> I hated C++ when I tried to learn it, and when I looked at python, invisible formatting characters are part of the syntax.

14:33 <whitequark> doesn't matter.

14:33 <emily> spaces aren't invisible

14:34 <ali-as> I've heard of migen, but not used it.

14:34 <emily> ifyouinsisttheyarei'llstarttalkinglikethis

14:34 <ali-as> How do you tell a tab from a series of spaces?

14:34 <whitequark> arbitrary syntactic choices in a language almost never matter, and significant indentation is one of those

14:34 <emily> by never using tabs in python

14:34 <emily> it warns on them by default as of 3 i think? or maybe even rejects

14:34 <whitequark> emily: it rejects mixed tabs and spaces, yes

14:34 <whitequark> ali-as: there are far worse things lurking in Python's semantics, that's for sure.

14:35 <emily> tired: complaining about the off-side rule

14:35 <ali-as> I have a very hard time with anything high level.

14:35 <emily> wired: complaining about "global"/"nonlocal"

14:35 <whitequark> but the amount of expressive power it provides is enormous, and very important to Glasgow.

14:35 <whitequark> Glasgow is a high-level tool, fundamentally, because it abstracts almost everything from you.

14:35 <whitequark> it abstracts USB and makes it as if you had a FIFO between your PC and the FPGA directly instead.

14:36 <ali-as> That will be a problem for me, as the moment something doesn't work the way I need I won't be able to change it.

14:36 <whitequark> it abstracts pins and lets you define the pinout as a CLI parameter.

14:36 <whitequark> so, here's the thing.

14:36 <ali-as> So I might be better waiting for a working final version.

14:36 <whitequark> working final version?

14:37 <whitequark> there will be no version of software/gateware stack that is more final than what currently exists.

14:37 <gruetzkopf> I'm not sure that this project can have a final version

14:37 <ali-as> Sorry, my understanding was that the versions were in flux and there was a plan for something like a Kickstarter.

14:37 <whitequark> nope, I don't do kickstarters.

14:37 <whitequark> there is no need for more capital.

14:37 <whitequark> all we have to do is to replace one chip for revC1, test it, and then it's off to fab, to be sold to anyone who wants a board.

14:38 <whitequark> if you don't care for configurable pull resistors (yes, on every pin. so you can do I2C with just four wires, no external passives!) then you can even order a revC0 board right now.

14:38 <ali-as> Hmm.

14:38 <whitequark> I'll tell you more.

14:38 <ali-as> If I wired that to JTAG....

14:38 <whitequark> eventually, there will be more and more powerful hardware.

14:39 <ali-as> Would that tell me if I'm on an input or a driven line?

14:39 <whitequark> this is the third hardware iteration, I changed the level shifters and the FPGA from the previous revision.

14:39 <whitequark> no applet code was modified.

14:39 <whitequark> even though I went from autosensing level shifters to explicit direction control signals.

14:40 <whitequark> that's the power of abstraction; I had a new board with half of the chips replaced, I spent maybe a day, and all of my dozen+ applets worked on it instantly.

14:40 <whitequark> just faster and more reliable :D

14:40 <whitequark> moreover.

14:40 <whitequark> there will be eventually revD (no LVDS bank, but 32 channels), and revE (a much faster FPGA with 5G SERDES).

14:40 <whitequark> any applet that runs on revA will run just as well on revE, which has a different FPGA family and completely reworked IO banks.

14:41 <ali-as> Only I've spent time with an oscilloscope and a series of high-value resistors to ground and Vcc trying to identify TDO.

14:41 <whitequark> well, it'll be like five times faster, but other than that, no changes.

14:41 <whitequark> 14:39 < ali-as> Would that tell me if I'm on an input or a driven line?

14:41 <whitequark> there is already an applet called jtag-pinout

14:41 <whitequark> ali-as: this is a command I just typed: https://paste.debian.net/1071211/

14:41 <ali-as> Cool.

14:41 <whitequark> I never label the JTAG pins on my routers.

14:42 <whitequark> too much effort.

14:42 <whitequark> I plug Glasgow into a router more or less randomly, just caring for where the ground is.

14:42 <whitequark> then I ask it to tell me what the pinout actually is.

14:42 <whitequark> then I copy the CLI arguments it gave me, something like this...

14:43 <ali-as> (reads pastebin) I'm blown away by this.

14:43 <whitequark> ali-as: https://paste.debian.net/1071214/

14:43 <whitequark> this is another command I just ran.

14:44 <whitequark> copied the arguments from the jtag-pinout applet output.

14:44 <whitequark> it rebuilt the bitstream: instead of one optimized for pinout detection, it made one optimized for TAP probing.

14:45 <whitequark> ali-as: my *entire* MIPS code is under 1000 lines, and it deals in abstractions like write_ir and read_dr.

14:45 <whitequark> the JTAG layer provides a TAP abstraction, so if you have many TAPs on the chain? the MIPS EJTAG code doesn't care at all.

14:45 <whitequark> the other TAPs are automatically bypassed.
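What "automatically bypassed" amounts to, as a sketch: when one TAP in a chain is addressed, every other TAP gets an all-ones instruction (BYPASS in every IEEE 1149.1 TAP), and each bypassed TAP adds one bit of padding to the data register scan. The function names and the TDO-first ordering convention are assumptions for illustration, not Glasgow's internals.

```python
def chain_ir(ir_lengths, target, instruction):
    """Build the full-chain IR value that selects `instruction` on one TAP.

    ir_lengths lists each TAP's IR length, index 0 being the TAP nearest TDO
    (its bits are shifted first). Returns (value, total_length), bit 0 first.
    """
    value, offset = 0, 0
    for index, length in enumerate(ir_lengths):
        if index == target:
            assert 0 <= instruction < (1 << length)
            field = instruction
        else:
            field = (1 << length) - 1  # all ones selects BYPASS
        value |= field << offset
        offset += length
    return value, offset


def dr_padding(tap_count, target):
    """BYPASS bits (before, after) the target's DR, in shift order."""
    return target, tap_count - target - 1
```

With these two helpers, code that talks to one TAP only ever sees its own IR and DR, which is the property the MIPS EJTAG code relies on.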

14:45 <whitequark> even better.

14:46 <ali-as> Do they have to be bypassed?

14:46 <ali-as> If you for example wanted to set a boundary scan pattern on another chip at the same time?

14:47 <whitequark> if you run the `jtag-probe scan` command, it will automatically segment DR to detect every IDCODE/BYPASS, and based on that information and what it knows about valid patterns in Capture-IR, it will automatically segment IR.

14:47 <whitequark> so if you have many devices in the TAP chain, most of the time (if it's possible at all) you don't even need to specify IR lengths.
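The DR half of that scan rests on a standard 1149.1 property: after Test-Logic-Reset, each TAP's DR is either a 32-bit IDCODE (whose least significant bit is 1) or a 1-bit BYPASS register that captures 0. A sketch of splitting the TDO stream on that basis, while ones are shifted in so that an all-ones "IDCODE" marks the end of the chain. It is a simplified illustration; the IR segmentation from Capture-IR patterns is not shown, and `segment_dr` is a hypothetical name.

```python
def segment_dr(bits):
    """Split a post-reset TDO bit stream into per-TAP DR values.

    bits: TDO bits in the order received while shifting ones into TDI; must
    cover the whole chain plus 32 extra bits. Returns a list of
    ("idcode", value) or ("bypass", None), nearest-TDO TAP first.
    """
    taps, pos = [], 0
    while pos < len(bits):
        if bits[pos] == 0:
            taps.append(("bypass", None))  # BYPASS captures a 0
            pos += 1
            continue
        chunk = bits[pos:pos + 32]
        if len(chunk) < 32:
            raise ValueError("stream ended inside an IDCODE")
        value = sum(bit << n for n, bit in enumerate(chunk))
        if value == 0xFFFFFFFF:
            break  # our own shifted-in ones: end of the chain
        taps.append(("idcode", value))
        pos += 32
    return taps
```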

14:48 <whitequark> of course, if you only have one device, it doesn't need any handholding at all.

14:48 <whitequark> 14:46 < ali-as> If you for example wanted to set a boundary scan pattern on another chip at the same time?

14:48 <whitequark> this is possible, but this code would have to be added explicitly.

14:48 * ali-as nods.

14:48 <whitequark> you would typically make an applet for your specific TAP chain, then delegate to MIPS EJTAG and whatever else.

14:48 <whitequark> the MIPS EJTAG code doesn't know what kind of underlying hardware it runs on, at all.

14:49 <whitequark> it just knows how to manipulate the EJTAG TAP.

14:49 <whitequark> you could definitely add some code that would manipulate another TAP in parallel, if you wanted.

14:49 <whitequark> you'd need to accept commands for the EJTAG TAP, and return results.

14:50 <whitequark> the EJTAG TAP code operates in terms of methods such as write_ir() and exchange_dr().

14:50 <whitequark> it doesn't care how exactly it's done. it doesn't explicitly specify state transitions either.

14:51 <whitequark> if you *do* need control over state transitions (e.g. say you want to go to SWD mode), that's possible.

14:51 <whitequark> you could operate in terms of functions like enter_shift_ir() and shift_tdio().

14:52 <whitequark> that takes care of the TAP state machine management for you.

14:52 <whitequark> (it selects shortest path)
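Selecting the shortest path through the TAP state machine is a breadth-first search over the 16 IEEE 1149.1 states, where each state has one successor for TMS=0 and one for TMS=1. A minimal sketch; the state table follows the standard, while `TAP` and `tms_path` are hypothetical names rather than Glasgow's.

```python
from collections import deque

# state: (next state with TMS=0, next state with TMS=1), per IEEE 1149.1
TAP = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}


def tms_path(start, goal):
    """Return the shortest list of TMS bits that moves the TAP from start to goal."""
    seen = {start: []}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            return seen[state]
        for tms, nxt in enumerate(TAP[state]):
            if nxt not in seen:
                seen[nxt] = seen[state] + [tms]
                queue.append(nxt)
    raise ValueError(f"{goal} is unreachable from {start}")
```

For example, `tms_path("Run-Test/Idle", "Shift-IR")` gives `[1, 1, 0, 0]`.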

14:52 <whitequark> this is how the SVF support is implemented. (yes, there's an SVF player.)

14:52 <whitequark> if you need even *more* control, you can operate in terms of shift_tms() and shift_tdio().

14:53 <whitequark> but you'd need to be careful to not confuse the rest of JTAG layer. you'd have to tell it what state it's in, or reset the entire thing.

14:53 <whitequark> the thing about abstractions is, well, MIPS EJTAG doesn't care about what happens on the TMS pin at all. it doesn't even care that there's a TMS pin.

14:53 <whitequark> so I want no traces of that in my MIPS EJTAG code.

14:54 <whitequark> but that doesn't mean I can't ever drop down to that level, if I have to.

14:54 <whitequark> and on the other hand? the abstraction gives me a lot of tools.

14:54 <whitequark> it lets me automatically bypass all the TAPs I don't want, since that's what you most often need.

14:54 <ali-as> I remember my code having to jump between JTAG chains, so I needed TMS.

14:54 <whitequark> it has a function that measures IR or DR length, in case you don't know it.

14:55 <whitequark> (MIPS EJTAG needs DR length measurement in fact, that's how you determine how many physical address bits you have...)

14:55 <ali-as> I had to hard code that.

14:55 <whitequark> no hardcoding.

14:55 <ali-as> And I kept making off by one errors counting.

14:55 <whitequark> ali-as: https://github.com/whitequark/Glasgow/blob/master/software/glasgow/applet/debug/mips/__init__.py#L86-L89

14:55 <whitequark> three lines.

14:55 <whitequark> and one of them is a sanity check.

14:56 <whitequark> moreover.

14:56 <whitequark> abstraction lets me do something like this.

14:56 <whitequark> let's say I have an ONFI NAND flash and I want to program it in-circuit

14:56 <whitequark> but I have a chip that I know nothing about except its pinout, maybe I have a BSDL file and that's it.

14:57 <ali-as> max length 64. So it screws up if they ever make one longer?

14:57 <whitequark> no, it returns None, and the assert on the next line catches that.

14:57 <whitequark> (continuing) so I already have code to do JTAG, and already have code to do ONFI NAND.

14:58 <whitequark> the ONFI NAND code operates in terms of "set CE/CLE/ALE high/low" and "clock this data in/out".

14:59 <ali-as> This might be a dumb question, but if you are scanning a register that may contain junk, how can you be sure the length is right, because this kept me awake at night ;)

14:59 <whitequark> it doesn't really care how the physical bus is implemented.

15:00 <whitequark> so you make a small applet that receives commands from ONFI NAND code, and implements them in terms of JTAG transactions that twiddle BSCAN register bits.

15:00 <whitequark> it reuses JTAG code *and* ONFI NAND code. all improvements that are made to ONFI NAND code are available to you at no cost, forever.
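A hedged sketch of the adapter idea: the NAND code asks for "CE/CLE/ALE high/low", and the adapter turns that into a boundary-scan register vector to shift in under EXTEST. Everything here is hypothetical: the class name, the pin-to-cell mapping (which in practice would come from the chip's BSDL file), and the omission of output-enable cells, which a real BSR also needs.

```python
class BoundaryScanNandBus:
    """Hypothetical adapter: drive NAND control pins through a BSR.

    cells: pin name -> BSR output cell index (would come from a BSDL file).
    Output-enable cells are not modelled.
    """

    def __init__(self, cells, bsr_length):
        self.cells = cells
        self.bsr = [0] * bsr_length

    def set_pin(self, name, level):
        self.bsr[self.cells[name]] = 1 if level else 0

    def set_control(self, ce, cle, ale):
        # ONFI CE# is active low; CLE and ALE are active high.
        self.set_pin("CE#", not ce)
        self.set_pin("CLE", cle)
        self.set_pin("ALE", ale)
        return list(self.bsr)  # vector to shift into the BSR under EXTEST
```

The NAND-level code never sees cell indices, so it stays the same whether the bus is real pins or a boundary-scan register.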

15:00 <whitequark> if I add a massive NAND database tomorrow so that end users don't have to look up datasheets for the chips, it will be available to your JTAG flasher.

15:01 * ali-as nods.

15:01 <whitequark> Glasgow applets are like compound interest, the more I add, the higher the value, and it's kind of exponential.

15:01 <whitequark> imagine never writing an SPI driver or bitbanger again.

15:02 <whitequark> 14:59 < ali-as> This might be a dumb question, but if you are scanning a register that may contain junk, how can you be sure the length is right, because this kept me awake at night ;)

15:02 <whitequark> there are two pieces of code that measure register length in Glasgow.

15:02 <whitequark> well, JTAG register length.

15:02 <whitequark> the simple piece of code is available for use for other applets, like MIPS EJTAG.

15:03 <whitequark> first, it shifts in max_length zeroes. this should have flushed the register, no matter how long it is, as long as it's under max_length.

15:03 <whitequark> second, it shifts in ones until it shifts back a one.

15:03 <whitequark> third, it notes the data it shifted out in the very first step, and shifts it back, in case it was somehow important.

15:03 <whitequark> so when you exit through Capture-DR, your device doesn't emit magic smoke.
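The three steps above, sketched against a simulated register so they can be run without hardware. `SimulatedDR` and `measure_dr_length` are hypothetical stand-ins, not Glasgow's code. Applied to the EJTAG ADDRESS register, the same measurement gives the number of physical address bits mentioned earlier.

```python
class SimulatedDR:
    """Stand-in data register: bits leave at TDO (index 0), enter at TDI (end)."""

    def __init__(self, bits):
        self.bits = list(bits)

    def shift(self, tdi_bits):
        tdo_bits = []
        for bit in tdi_bits:
            tdo_bits.append(self.bits.pop(0))
            self.bits.append(bit)
        return tdo_bits


def measure_dr_length(dr, max_length):
    """Return the DR length, or None if it is longer than max_length."""
    # 1. Flush with zeros; the first bits out are the original contents.
    original = dr.shift([0] * max_length)
    # 2. Shift in ones; the number of zeros seen before the first one is the length.
    for length in range(max_length + 1):
        if dr.shift([1]) == [1]:
            break
    else:
        return None  # never saw a one: too long, contents not restored
    # 3. Shift the original contents back in, in the order they came out.
    dr.shift(original[:length])
    return length
```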

15:04 <whitequark> now, this works well if your device is well-behaved, and you already know you have a register there, and what its size is.

15:04 <whitequark> the more complex piece of code lives in the jtag-pinout applet.

15:05 <ali-as> max length for boundary scan can be thousands of bits, it's a really ugly thing to do to flush something ridiculous in linear time.

15:06 <ali-as> Just in case the next one is one bit longer than you ever thought.

15:06 <whitequark> it's actually very fast

15:07 <whitequark> I think I could make it even faster, so that even tens of thousands of DR bits would be scanned in... about as long as it takes to do a USB transaction.

15:07 <whitequark> but it's not really necessary

15:07 <whitequark> well, hasn't been necessary yet.

15:07 <whitequark> if you come with some use case that needs to constantly scan multiple thousand bit DRs, I'll just make it faster.

15:07 <whitequark> hell, I could add some gateware, purely to accelerate scanning huge DRs.

15:08 <whitequark> you have ten square mm of silicon that's all just one giant DR? no problem, i've got you.

15:08 <ali-as> When I was doing ST20 I looked into merging my code with OpenOCD. I talked to the original author, who had taken a step back, and people had abstracted away the Capture-DR bit so that it had to terminate.

15:09 <ali-as> Which made it essentially impossible to do ST20 debug on it properly.

15:09 <ali-as> (JTAG is just a gateway for a sort of clocked version of the OSLink interface, which was the dark magic of ST20 debugging before JTAG came along)

15:10 <ali-as> So I'm really really excited by what Glasgow can do.

15:10 <ali-as> And I have a niggle that if someone I want isn't compatible it's going to be impossible for me to modify.

15:11 <ali-as> Umm, something.

15:13 <ali-as> Sorry, that may not have been very clear, for ST20 debugging you have to get into Shift-DR and then stay there forever (or until you want to stop debugging).

15:15 <ali-as> If you move to Exit-DR you have to restart a session and you may miss bits sent by the smart debug controller.

15:15 <whitequark> yes, I understood that.

15:16 <whitequark> you would have an applet that grabs the raw JTAG interface, i.e. if there are other TAPs, you would have to bypass them yourself.

15:16 <whitequark> then you would do:

15:16 <whitequark> await jtag_iface.enter_shift_dr()

15:16 <whitequark> output = await jtag_iface.shift_tdio(input, last=False)

15:16 <whitequark> the second line can be repeated forever.

15:16 <whitequark> it'll just stay in Shift-DR and exchange bits.
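To show the call pattern without hardware, here is a stand-in object with the two method names quoted above. Its behaviour (a fixed-length shift register between TDI and TDO, bits as Python lists) is an assumption, and the real Glasgow interface's argument types may differ; `MockJTAGInterface` and `stream` are hypothetical.

```python
import asyncio


class MockJTAGInterface:
    """Stand-in for a JTAG interface, used only to show the call pattern."""

    def __init__(self, dr_length):
        self.state = "Run-Test/Idle"
        self.dr = [0] * dr_length

    async def enter_shift_dr(self):
        self.state = "Shift-DR"

    async def shift_tdio(self, bits, last=True):
        assert self.state == "Shift-DR"
        out = []
        for bit in bits:
            out.append(self.dr.pop(0))
            self.dr.append(bit)
        if last:
            self.state = "Exit1-DR"  # last=True leaves Shift-DR on the final bit
        return out


async def stream(iface, chunks):
    await iface.enter_shift_dr()
    received = []
    for chunk in chunks:
        # last=False keeps the TAP in Shift-DR between calls, so no bits are lost.
        received += await iface.shift_tdio(chunk, last=False)
    return received


if __name__ == "__main__":
    print(asyncio.run(stream(MockJTAGInterface(2), [[1, 0], [1, 1]])))
```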

15:17 * ali-as nods. Cool.

15:17 <ali-as> I must not have touched ST20 in about 15 years.

15:17 <whitequark> so my current thoughts are, if I ever add "JTAG accelerator" to the applet, basically a small CPU with the JTAG master memory-mapped, you would always have a command that bypasses it.

15:18 <whitequark> you tell the FPGA "hold on, now I am going to send/receive raw TDIO bits"

15:18 <whitequark> that's not going anywhere.

15:19 <whitequark> what is ST20?

15:19 <whitequark> 15:10 < ali-as> And I have a niggle that if someone I want isn't compatible it's going to be impossible for me to modify.

15:19 <whitequark> I mean, that can never happen.

15:19 <whitequark> in the worst case, you write your own gateware to drive the JTAG interface however you want.

15:20 <ali-as> Well, that can happen, it can be written in python ;) I will probably never understand python.

15:20 <whitequark> there isn't really anything "built in" on the I/O bank side, you always implement some sort of interface yourself. you can reuse existing code.

15:20 <whitequark> you do get a lot of built-in stuff on the USB side.

15:20 <whitequark> you almost never have to care about USB's utterly braindead nature.

15:20 <whitequark> and utterly braindead OS USB stack implementations.

15:21 <whitequark> it even works on Windows.

15:21 <whitequark> very slowly.

15:21 <whitequark> 15:20 < ali-as> Well, that can happen, it can be written in python ;) I will probably never understand python.

15:21 <whitequark> so about that.

15:21 <whitequark> I am thinking about adding an RPC interface.

15:21 <whitequark> it would use https://google.github.io/flatbuffers/ for serialization

15:21 <whitequark> so instead of using Python, you could use C, to connect to the rest of the Glasgow stack via a socket.

15:22 <whitequark> then you would tell it "go to Shift-DR" and then "transfer these bits on TDIO".
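The plan is to use FlatBuffers for serialization. Since that needs a schema and generated code, the sketch below swaps in plain `struct`-packed, length-prefixed messages only to show the shape of a "go to Shift-DR" / "transfer these bits on TDIO" exchange over a socket. The opcodes and message layout are invented for illustration and are not a Glasgow protocol.

```python
import struct
from typing import Iterator, List, Tuple

# Hypothetical opcodes; a real RPC schema would define its own.
OP_GOTO_SHIFT_DR = 1
OP_TRANSFER_TDIO = 2

HEADER = struct.Struct("<BI")  # opcode (u8) + payload length (u32), little-endian, no padding


def pack_bits(bits: List[int]) -> bytes:
    """Bit count (u32) followed by the bits packed LSB first."""
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        out[i // 8] |= (bit & 1) << (i % 8)
    return struct.pack("<I", len(bits)) + bytes(out)


def unpack_bits(payload: bytes) -> List[int]:
    (count,) = struct.unpack_from("<I", payload)
    data = payload[4:]
    return [(data[i // 8] >> (i % 8)) & 1 for i in range(count)]


def message(opcode: int, payload: bytes = b"") -> bytes:
    return HEADER.pack(opcode, len(payload)) + payload


def split(stream: bytes) -> Iterator[Tuple[int, bytes]]:
    """Cut a received byte stream back into (opcode, payload) messages."""
    offset = 0
    while offset < len(stream):
        opcode, length = HEADER.unpack_from(stream, offset)
        start = offset + HEADER.size
        yield opcode, stream[start:start + length]
        offset = start + length


# A C client would write exactly these bytes to the socket.
stream = message(OP_GOTO_SHIFT_DR) + message(OP_TRANSFER_TDIO, pack_bits([1, 0, 1, 1, 0]))
```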

15:23 <ali-as> Short history of the ST20: if you've heard of the INMOS Transputer, well, when they went bust/were eaten, STMicro found themselves with a tiny 32-bit CPU. So when they made set-top box chips, something that took little silicon area was really useful, and for about 10 years the most cost-effective highly integrated STB chip had an ST20C2/4 CPU in it.

15:24 <ali-as> It's insane.

15:25 puck has quit [Read error: Connection reset by peer]

15:26 puck has joined #glasgow

15:26 <ali-as> I mean the ALU is like an RPN calculator. 3 deep register stack, push 5, push 4, add.

15:27 <ali-as> Half the instruction set is a descheduling point, which means a task switch would corrupt the stack and you have to reload your registers after every one.
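The "RPN calculator" behaviour can be modelled with the transputer's three evaluation registers, Areg, Breg and Creg: a load pushes the stack down and loses whatever was in Creg, and `add` pops two values and pushes the sum. A minimal sketch that ignores the workspace, flags and overflow checking; on real hardware Creg is undefined after `add`, and here it is simply left unchanged. The `deschedule` method stands in for a task switch that does not preserve the stack.

```python
from typing import Optional


class EvalStack:
    """Sketch of the ST20/transputer Areg/Breg/Creg evaluation stack (32-bit)."""

    MASK = 0xFFFF_FFFF

    def __init__(self) -> None:
        self.a: Optional[int] = None
        self.b: Optional[int] = None
        self.c: Optional[int] = None

    def load(self, value: int) -> None:
        # push: the old Creg falls off the bottom
        self.c, self.b, self.a = self.b, self.a, value & self.MASK

    def add(self) -> None:
        # pop two, push the sum; Breg takes the old Creg
        if self.a is None or self.b is None:
            raise RuntimeError("operand lost (e.g. across a descheduling point)")
        self.a, self.b = (self.b + self.a) & self.MASK, self.c

    def deschedule(self) -> None:
        # the stack is not saved across a task switch: model it as lost
        self.a = self.b = self.c = None


s = EvalStack()
s.load(5)  # push 5
s.load(4)  # push 4
s.add()    # Areg now holds 9
```

Calling `s.deschedule()` and then trying to `add` fails, which is why registers have to be reloaded after every descheduling point.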

15:28 <whitequark> huh.

15:29 <ali-as> At the peak of my understanding I wrote some debug trap routines. Nothing too tough. Circular buffers to feed data to the PC so I could, on the fly, rebuild debug messages that had been removed from the production firmware. I grew to like insane.
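The circular-buffer trick can be sketched as a fixed-size ring with a write index owned by the target and a read index owned by the host, which drains whatever lies between them. A minimal model assuming a single producer and a single consumer, with one slot kept empty to tell "full" from "empty"; the class and method names are illustrative.

```python
from typing import List


class RingBuffer:
    """Single-producer/single-consumer ring; one slot stays free to tell full from empty."""

    def __init__(self, size: int) -> None:
        self.buf = [0] * size
        self.head = 0  # next slot the target writes (producer index)
        self.tail = 0  # next slot the host reads (consumer index)

    def put(self, byte: int) -> bool:
        nxt = (self.head + 1) % len(self.buf)
        if nxt == self.tail:
            return False  # full: the trap routine drops the byte rather than blocking
        self.buf[self.head] = byte & 0xFF
        self.head = nxt
        return True

    def drain(self) -> List[int]:
        """What the host does when it polls: read everything between tail and head."""
        out = []
        while self.tail != self.head:
            out.append(self.buf[self.tail])
            self.tail = (self.tail + 1) % len(self.buf)
        return out


rb = RingBuffer(4)
accepted = [rb.put(b) for b in b"ABCD"]  # only 3 bytes fit in a 4-slot ring
```

Dropping on overflow keeps the target's trap handler short and non-blocking; the host only ever moves `tail`, the target only ever moves `head`.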

15:29 <whitequark> oh, that's a very common trick.

15:29 <whitequark> if C was a good systems language, C compilers would provide it natively.

15:30 <whitequark> Rust is *almost* there

15:34 <ali-as> Bare metal C is almost a thing. I was writing everything in assembler at that point.

15:35 <ali-as> Host stuff in tasm, ST20 stuff manually converting to hex.

15:36 <whitequark> I don't believe in writing assembly, personally.

15:36 <whitequark> Glasgow has a CPU core I designed for it, and its embedded instruction stream is composed using, you guessed it, Python.

15:36 <whitequark> there's also a MIPS not-assembler.

15:36 <ali-as> :)

15:36 puck has quit [Ping timeout: 246 seconds]

15:37 <whitequark> https://github.com/whitequark/Glasgow/blob/master/software/glasgow/applet/debug/mips/__init__.py#L441-L460

15:37 <whitequark> this will probably remind you of things.

15:37 <ali-as> I like assembler, cycle counting, pig tailing code.

15:37 <whitequark> I thought about shelling out to as, then thought again, and nah, generating instructions with Python it is.
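In the same spirit, here is a minimal hypothetical "not-assembler": plain Python functions that return MIPS32 instruction words, so an instruction sequence is just a Python list. This is not the code behind the link above. It covers only the I-type encoding (6-bit opcode, rs, rt, 16-bit immediate) for `lui`, `ori`, `lw` and `sw`, with registers given as numbers.

```python
from typing import List


def _itype(opcode: int, rs: int, rt: int, imm: int) -> int:
    """MIPS32 I-type layout: opcode[31:26] rs[25:21] rt[20:16] imm[15:0]."""
    return (opcode << 26) | ((rs & 31) << 21) | ((rt & 31) << 16) | (imm & 0xFFFF)


def lui(rt: int, imm: int) -> int:
    return _itype(0x0F, 0, rt, imm)


def ori(rt: int, rs: int, imm: int) -> int:
    return _itype(0x0D, rs, rt, imm)


def lw(rt: int, offset: int, base: int) -> int:
    return _itype(0x23, base, rt, offset)


def sw(rt: int, offset: int, base: int) -> int:
    return _itype(0x2B, base, rt, offset)


def li32(rt: int, value: int) -> List[int]:
    """A 'macro' written as ordinary Python: load a 32-bit constant with lui + ori."""
    return [lui(rt, value >> 16), ori(rt, rt, value & 0xFFFF)]


T0, T1 = 8, 9
program = li32(T0, 0xFF20_1234) + [sw(T1, 0, T0)]  # sw $t1, 0($t0)
```

Because the program is a list built by Python, varying it per core is an `if` statement rather than an assembler macro.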

15:38 <whitequark> cycle counting, hmm

15:38 <whitequark> so most of Glasgow gateware is designed around a single FSM that interprets command streams serialized through USB.

15:38 <ali-as> It didn't look like that, but that's familiar, saving the return address.

15:38 <whitequark> I think writing out FSM states by hand is silly and wasteful.

15:39 <ali-as> I used a mips assembler to get the hex and then pasted the hex into my source for the debug routines.

15:39 <whitequark> yeah, no, I'm far too lazy for that. I like easy to read code.

15:39 <whitequark> and easy to modify.

15:39 puck has joined #glasgow

15:40 <whitequark> I can even alter parts of it depending on which core I'm working with.

15:40 <ali-as> This was a few years after the ST20 stuff I did, things went high definition and STMicro moved away from the ST20, it wasn't really powerful enough.

15:40 <whitequark> anyway, I need to do some refactoring, but I'm going to make an FSM generator that takes a high-level description of your command set that you want between the applet gateware, and applet host software.

15:41 <whitequark> something like, "this command C sets FPGA reg X to the 18-bit value it receives over USB".

15:41 <whitequark> and then it generates all the logic required to make it happen.
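As a software stand-in for the kind of logic such a generator might emit, here is a hypothetical byte-level decoder for exactly that example: one command byte followed by three payload bytes (little-endian), masked to 18 bits and stored in register X. The command code, byte order and names are assumptions; the real generator would produce gateware, not Python.

```python
from enum import Enum, auto
from typing import Dict


class State(Enum):
    IDLE = auto()     # waiting for a command byte
    PAYLOAD = auto()  # collecting the bytes of an 18-bit value


CMD_SET_X = 0x01           # hypothetical command code
WIDTH = 18
NBYTES = (WIDTH + 7) // 8  # 3 bytes carry 18 bits


class CommandFSM:
    def __init__(self) -> None:
        self.state = State.IDLE
        self.regs: Dict[str, int] = {"X": 0}
        self.acc = 0
        self.count = 0

    def feed(self, byte: int) -> None:
        """Advance by one byte received over USB."""
        if self.state is State.IDLE:
            if byte == CMD_SET_X:
                self.acc, self.count = 0, 0
                self.state = State.PAYLOAD
            # unknown commands are ignored in this sketch
        elif self.state is State.PAYLOAD:
            self.acc |= byte << (8 * self.count)  # little-endian
            self.count += 1
            if self.count == NBYTES:
                self.regs["X"] = self.acc & ((1 << WIDTH) - 1)
                self.state = State.IDLE


fsm = CommandFSM()
for b in bytes([CMD_SET_X, 0x45, 0x23, 0xFF]):
    fsm.feed(b)
```

Everything here (state count, byte count, mask) follows mechanically from "command C sets reg X to an 18-bit value", which is the part a generator could derive from a high-level description.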

15:41 <ali-as> I do Karnaugh maps manually too still ;P

15:41 <ali-as> I feel like a luddite, thanks for that, hahaha.

15:41 <whitequark> at one point I was not happy about synthesis results from my FPGA toolchain.

15:41 <whitequark> I considered doing K-maps for one millisecond.

15:42 <whitequark> then I found a paper from 1993 that describes a depth optimal LUT mapping algorithm, and implemented it for the synthesizer.

15:42 <whitequark> (runs in polynomial time, even!)

15:42 <whitequark> then I also wrote a LUT optimizer that can do things like...

15:42 <whitequark> suppose you have 4-LUT FPGA.

15:42 <ali-as> Do you need to do that for a LUT? It's basically a ROM.

15:43 <whitequark> no, I mean, FPGA LUTs.

15:43 puck has quit [Excess Flood]

15:43 <whitequark> the basic FPGA logic elements are lookup tables, used to implement arbitrary k-input functions.

15:43 <ali-as> Yes.

15:43 puck has joined #glasgow

15:43 <whitequark> right, so let's say you have an adder.

15:44 <whitequark> for each bit, you have an A input, B input, C (carry) input, and one output (assuming dedicated carry logic).

15:44 <whitequark> so that's a 3-input function in a 4-LUT.
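To make "a 3-input function in a 4-LUT" concrete: a k-LUT is a 2^k-bit truth table (its init value) indexed by the input bits. The sketch computes the init value for the full-adder sum A^B^C placed in a 4-LUT whose fourth input is unused. Which physical pin counts as I0 varies by vendor; here I0 is assumed to be the least significant index bit.

```python
from typing import Callable


def lut_init(func: Callable[..., int], k: int = 4) -> int:
    """Truth table of func as a 2**k-bit integer; bit i is func(I0, ..., I{k-1}) at index i."""
    init = 0
    for index in range(2 ** k):
        inputs = [(index >> n) & 1 for n in range(k)]  # I0 is the LSB of the index
        if func(*inputs) & 1:
            init |= 1 << index
    return init


def lut_eval(init: int, *inputs: int) -> int:
    """Look up the output bit selected by the input bits."""
    index = sum((bit & 1) << n for n, bit in enumerate(inputs))
    return (init >> index) & 1


# Full-adder sum on I0=A, I1=B, I2=C; I3 is unused, so the 8-bit pattern repeats.
SUM_INIT = lut_init(lambda a, b, c, _unused: a ^ b ^ c)
```

The repeated pattern (`0x96` twice) is the visible sign of a wasted input, which is what the optimization discussed next puts to use.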

15:44 <whitequark> now, adders are usually mapped to dedicated carry logic and LUTs using pattern matching in a synthesizer, as opposed to some generic synthesis method, because you want to take advantage of carry chains.

15:44 <whitequark> and carry chains usually have restrictions on routing.

15:45 <whitequark> now let's say you have an adder, A+B, and a mux after it, which selects A+B if S=0 and just A if S=1.

15:45 <whitequark> this is something you probably have in your ALU.

15:46 <whitequark> if you just run synthesis, the synthesizer will separately infer a chain of 3-input 4-LUTs for the adder, and then another chain of 3-input 4-LUTs for the mux.

15:46 <whitequark> but these 4-LUTs share one input and are chained, so it's possible to merge them into half as many LUTs, by moving the S input to the adder itself.

15:47 <whitequark> my LUT optimizer understands how to perform this transformation in the general case

15:48 <ali-as> I'm going to reread that again later till I understand it.

15:49 <whitequark> O <= S ? A+B : A;

15:50 <whitequark> think about how you'd implement this naively for 4-LUT architecture, and how it can be optimized.

15:51 <ali-as> (slightly embarrassed) I can't understand that.

15:52 <ali-as> I'd interpret that as returning A+B if S is zero, A if S is 1, but it seems to be writing to 0 instead of a wire.

15:52 <ali-as> I am really bad at high level code.

15:53 <whitequark> that's O

15:53 <whitequark> not zero

15:53 <whitequark> bad choice of identifier, sorry.

15:53 <whitequark> R <= S ? A+B : A;

15:53 <whitequark> that's just Verilog.

15:54 <ali-as> Ok. R, A and B are busses, S is a single logic level?

16:04 <ali-as> If they are wires it's just SAB 000:0 001:1 010:1 011:1 100:0 101:0 110:1 111:1 in the LUT twice.

16:40 <whitequark> I don't understand.

16:40 <whitequark> where's carry?

16:43 <ali-as> I was a bit lost, you didn't say which were busses or how big they were ;)

16:43 <ali-as> You do mean + as in add and not OR.

16:49 <whitequark> right

16:50 <whitequark> A and B are say 16-bit busses.

16:50 <whitequark> it doesn't really matter how wide they are.

16:50 <whitequark> just think of a single unit of that adder/mux

16:52 <ali-as> CSAB R = 0000:0 0001:1 0010:1 0011:1 0100:0 0101:0 0110:1 0111:1 1000:1 1001:0 1010:0 1011:0 1100:0 1101:0 1110:1 1111:1 I think.

16:52 <ali-as> And we need a carry output for the next stage...

16:52 <whitequark> yeah, that's the single LUT solution.

16:52 <whitequark> but typical synthesis tools will infer a separate full adder LUT, and a separate mux LUT

16:53 <whitequark> (either that or using LUTs for generating carries, which is even worse)
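One way to see the merged version, for `R <= S ? A+B : A`: gate B with S before it enters the adder, so each bit's sum depends only on A_i, B_i, S and the incoming carry, which fits one 4-LUT per bit instead of an adder LUT plus a mux LUT. This sketch checks that bit-level model against the plain expression on random inputs. It illustrates the transformation only, not the optimizer itself, and assumes dedicated carry logic computes each carry as the majority of A_i, (B_i AND S) and the carry-in.

```python
import random

WIDTH = 16


def reference(s: int, a: int, b: int) -> int:
    """R <= S ? A+B : A, truncated to a WIDTH-bit wire."""
    return (a + b if s else a) & ((1 << WIDTH) - 1)


def merged(s: int, a: int, b: int) -> int:
    """One 4-input function per bit (A_i, B_i, S, carry-in) plus a carry chain."""
    r, carry = 0, 0
    for i in range(WIDTH):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        bg = bi & s                                      # S folded into the B operand
        r |= (ai ^ bg ^ carry) << i                      # the per-bit 4-LUT
        carry = (ai & bg) | (ai & carry) | (bg & carry)  # dedicated carry logic
    return r


rng = random.Random(0)
samples = [(rng.randrange(2), rng.getrandbits(WIDTH), rng.getrandbits(WIDTH)) for _ in range(1000)]
mismatches = [x for x in samples if merged(*x) != reference(*x)]
```

With S=0 the gated operand is zero and the chain never generates a carry, so every bit passes A straight through; with S=1 it is an ordinary ripple adder.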

16:56 <ali-as> I don't know how carry chains work in an FPGA. I'd have said CSAB C' = 0000:0 0001:0 0010:0 0011:1 0100:0 0101:0 0110:1 0111:0 1000:0 1001:1 1010:1 1011:1 1100:0 1101:0 1110:1 1111:1

17:01 <ali-as> I think I've made mistakes in my solution for R.

17:01 <ali-as> I think the last two bits should be 0.

17:01 <whitequark> that's fine, I would never compute a LUT by hand.

17:01 <whitequark> I'm far too lazy.

17:02 <whitequark> if it involves more than two bits I'd much rather have the computer do it, thank you very much
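In that spirit, the hand-computed tables can be generated mechanically. This sketch prints rows in the same `CSAB:value` notation (C as the most significant bit) for one bit slice of the unmerged design, a full adder followed by a mux. It follows ali-as's reading of the select (S=0 picks A+B, S=1 picks A); swap the mux branches in `slice_r` for the opposite polarity. C' is the adder's carry out, which does not depend on S in the unmerged design.

```python
from typing import Callable, List


def table(func: Callable[[int, int, int, int], int]) -> List[str]:
    """Rows 'CSAB:out' with C as the most significant bit, as written in the log."""
    rows = []
    for index in range(16):
        c, s, a, b = (index >> 3) & 1, (index >> 2) & 1, (index >> 1) & 1, index & 1
        rows.append(f"{c}{s}{a}{b}:{func(c, s, a, b)}")
    return rows


def slice_r(c: int, s: int, a: int, b: int) -> int:
    return a if s else a ^ b ^ c          # mux after the full-adder sum


def slice_carry(c: int, s: int, a: int, b: int) -> int:
    return (a & b) | (a & c) | (b & c)    # full-adder carry out, independent of S


print("R  =", " ".join(table(slice_r)))
print("C' =", " ".join(table(slice_carry)))
```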

17:02 <whitequark> ali-as: oh by the way

17:02 <whitequark> I got distracted

17:03 <whitequark> ali-as: I have this command: https://paste.debian.net/1071235/

17:03 <whitequark> IR=11010 DR[96]

17:03 <whitequark> this is the register 0xB, FASTDATA.

17:04 <whitequark> has the correct size.

17:04 <whitequark> therefore BCM6348, which is what this router is, implements FASTDATA.

17:04 <whitequark> er, sorry, this is the register ALL.

17:05 <whitequark> FASTDATA is 0xE... and it's indeed missing.
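The numbers line up once the IR value is read least-significant bit first: `11010` read LSB first is 0b01011 = 0xB, the EJTAG ALL instruction, which chains the Address, Data and Control registers, hence 3 × 32 = 96 bits on a 32-bit core. A small helper, assuming that LSB-first printing convention; the table lists only a few EJTAG instruction codes.

```python
# A few EJTAG instruction register codes.
EJTAG_IR = {
    0x08: "ADDRESS",
    0x09: "DATA",
    0x0A: "CONTROL",
    0x0B: "ALL",       # ADDRESS, DATA and CONTROL chained together
    0x0E: "FASTDATA",
}

# Address + Data + Control on a 32-bit core
ALL_WIDTH_32BIT = 32 + 32 + 32


def bits_lsb_first(text: str) -> int:
    """Turn a bit string printed LSB first (as in 'IR=11010') into an integer."""
    return sum(int(ch) << i for i, ch in enumerate(text))


ir = bits_lsb_first("11010")
name = EJTAG_IR.get(ir, "unknown")
```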

17:05 <ali-as> ALL may not be the register I'm thinking of.

17:05 <ali-as> Yeah.

17:06 <ali-as> I think it was a special address in the EJTAG memory map.

17:06 <whitequark> ohhhhh, that's not FASTDATA then, that's FDC.

17:07 <whitequark> FDC has a set of dedicated FIFO registers in the MIPS accessible memory, FASTDATA does not

17:09 <whitequark> the FDC registers live in CDMM.

17:12 <ali-as> I'm not sure it was a FIFO. I'm not sure I have access to any of my code anymore.

17:12 <ali-as> It was also very badly documented at the time.

17:13 <ali-as> But it made a change from totally undocumented.

17:13 <ali-as> I think there was just a bit to enable it.

17:14 <ali-as> I do still have the STB, so I might wire it up and see.

17:30 <ali-as> bcm7038 based.

17:36 <gruetzkopf> i need to dig through my stack of DVB-S STBs and look for ST20, really

17:37 <whitequark> gruetzkopf: that's even more cursed than my stack of ADSL modems.

17:37 <gruetzkopf> (i kinda like writing transputer assembly, and doing it on an ISA bus ISDN card is getting ridiculous)

17:37 <_whitenotifier> [whitequark/Glasgow] whitequark pushed 2 commits to master [+0/-0/±2] https://git.io/fhxUF

17:37 <_whitenotifier> [whitequark/Glasgow] whitequark 30aa2d6 - cli: permute Python module names in logs for better readability.

17:37 <_whitenotifier> [whitequark/Glasgow] whitequark 47629ea - applet..jtag_pinout: adjust for jtag applet being renamed.

17:37 <gruetzkopf> (i mean, i could also implement the 16/32bit transputer serial interface, it's very simple)

17:37 <ali-as> ST20TP3, TP4, Sti55xx would be the SoC names.

17:38 <ali-as> It's a 32bit CPU, instructions are 8 bits long.

17:38 <ali-as> 4 bits opcode, 4 bits data, for primary ops.

17:39 <gruetzkopf> the OPERATE code is ridiculous :D

17:39 <gruetzkopf> you can technically chain 7 of them to shift in a 32bit instruction word :D
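The chaining works through the operand register: each instruction byte is a 4-bit function code plus a 4-bit nibble ORed into that register, `pfix` shifts it left by four before the next byte, and `nfix` complements it first. A full 32-bit operand therefore takes seven prefixes plus the final instruction. The sketch assumes the classic transputer function codes (pfix = 0x2, nfix = 0x6, ldc = 0x4) and the usual recursive prefixing rule; other function codes are not modelled.

```python
from typing import List

PFIX, NFIX, LDC = 0x2, 0x6, 0x4
MASK = 0xFFFF_FFFF


def encode(function: int, operand: int) -> List[int]:
    """Prefix bytes plus the final instruction for a signed 32-bit operand."""
    if 0 <= operand < 16:
        return [(function << 4) | operand]
    if operand >= 16:
        return encode(PFIX, operand >> 4) + [(function << 4) | (operand & 0xF)]
    return encode(NFIX, (~operand) >> 4) + [(function << 4) | (operand & 0xF)]


def decode_operand(code: List[int]) -> int:
    """Replay the operand register; return the operand seen by the first non-prefix byte."""
    oreg = 0
    for byte in code:
        function, nibble = byte >> 4, byte & 0xF
        oreg = (oreg | nibble) & MASK
        if function == PFIX:
            oreg = (oreg << 4) & MASK
        elif function == NFIX:
            oreg = ((~oreg) << 4) & MASK
        else:
            return oreg - (1 << 32) if oreg & 0x8000_0000 else oreg
    raise ValueError("code ends with a prefix")


# ldc 0x12345678 -> 21 22 23 24 25 26 27 48: seven pfix bytes, then ldc
big = encode(LDC, 0x12345678)
```

Small constants stay one byte (`ldc 5` is just `0x45`), which is where the density comes from.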

17:39 <_whitenotifier> [Glasgow] Success. The Travis CI build passed - https://travis-ci.org/whitequark/Glasgow/builds/501159668?utm_source=github_status&utm_medium=notification

17:40 <ali-as> It's a lesson in how someone smart might design a CPU if they had never seen one before.

17:45 <ali-as> Was the ISDN card thing for work or school gruetzkopf?

17:45 <ali-as> Transputers were big at universities here. Wiring them into sheets and diamond lattice topologies.

17:48 <gruetzkopf> no, this is me using a commercial "active" isdn card from AVM as a devboard years later

17:48 <ali-as> Cool.

17:48 <ali-as> The ST20 TPx family have a single OSLink.

17:49 <ali-as> For TP2 this is the only debug you get.

17:49 <ali-as> Reset and Analyse!!!!

17:50 <ali-as> 20MBit OSLink was my first successful FPGA project.

17:52 <gruetzkopf> is OSlink still 2-start-8-data-1stop and 2-bit-ack like T2xx/T4xx/T8xx used?

17:52 <gruetzkopf> on LinkOut and LinkIn pins

17:52 <ali-as> Yeah, HH76543210L

17:52 <ali-as> And a single H is ack.

17:53 <gruetzkopf> yup, same thing
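The link framing is simple enough to sketch: a data packet is two high bits, eight data bits and a low stop bit, and an acknowledge is a high bit followed by a low one, so the second bit after idle tells the receiver which kind of packet it is. The `HH76543210L` notation does not pin down the order of the data bits on the wire; LSB first is assumed here and is a single flag to change.

```python
from typing import Iterator, List, Union

ACK = "ack"


def encode_byte(value: int, lsb_first: bool = True) -> List[int]:
    """Data packet: H H, eight data bits, L (bit order is an assumption)."""
    order = range(8) if lsb_first else range(7, -1, -1)
    return [1, 1] + [(value >> i) & 1 for i in order] + [0]


def encode_ack() -> List[int]:
    """Acknowledge packet: H L."""
    return [1, 0]


def decode(bits: List[int], lsb_first: bool = True) -> Iterator[Union[int, str]]:
    """Yield data bytes and ACK markers from a line that idles low."""
    order = list(range(8) if lsb_first else range(7, -1, -1))
    i = 0
    while i < len(bits):
        if bits[i] == 0:                 # idle
            i += 1
            continue
        if i + 1 >= len(bits):
            raise ValueError(f"truncated packet at bit {i}")
        if bits[i + 1] == 0:             # H L: acknowledge
            yield ACK
            i += 2
            continue
        if i + 10 >= len(bits) or bits[i + 10] != 0:
            raise ValueError(f"bad or truncated data packet at bit {i}")
        data = bits[i + 2:i + 10]        # H H: eight data bits follow
        yield sum(bit << pos for bit, pos in zip(data, order))
        i += 11


line = [0, 0] + encode_byte(0xA5) + [0] + encode_ack()
```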

17:55 <gruetzkopf> okay, this one has a pinheader labeled ejtag

17:56 <ali-as> ST20 TP3 debug is done through JTAG: you set the right IR, then enter Shift-DR, and while in Shift-DR what you have is essentially a clocked OSLink with TDI, TDO and TCK. The protocol is the same, HH76543210L and H to ack.

17:56 <ali-as> EJTAG is MIPS.

17:56 <gruetzkopf> (and around 50% unpopulated components)

17:57 <ali-as> ST20TP3 also has a debug controller though.

17:57 <whitequark> gruetzkopf: can you send me something with OSLink?

17:57 <whitequark> or wait

17:57 <whitequark> do you have a glasgow yet?

17:57 <gruetzkopf> i have a B

17:58 <whitequark> that should be sufficient

17:58 <whitequark> i need to make Glasgow network transparent, then

17:58 <whitequark> hrm

17:58 <gruetzkopf> i don't know if i have ST20 (iirc jn has some)

17:58 <gruetzkopf> i *do* have old-style transputers

18:00 <ali-as> Actual OSLink whitequark or ST20 with JTAG debug?

18:02 <whitequark> either, but I guess ST20 would torture Glasgow more

18:02 <ali-as> :D

18:03 <ali-as> ST20 with a real OSLink, you would be talking 1996-1998 sort of era.

18:05 <ali-as> It's primitive debug, halt and look at memory.

18:05 <ali-as> But you can also boot the CPU through it.

18:06 <ali-as> The JTAG protocol is utterly awesome and has direct memory access while the CPU is running.

18:07 <whitequark> huh

18:09 <ali-as> You don't feed any instructions, you can poke a trap handler into memory if one does not exist, put the breakpoint in one of the BP regs, arm the breakpoint in the config register and just wait for the CPU to hit it.

18:11 <whitequark> yeah, like all reasonable designs...

18:11 <ali-as> Makes EJTAG and the ARM methods look like they were written before we discovered iron tools.

18:15 <ali-as> Want to read a memory location? Start by feeding in the opcode for LDR....