ChanServ changed the topic of #nmigen to: nMigen hardware description language · code at https://github.com/nmigen · logs at https://freenode.irclog.whitequark.org/nmigen · IRC meetings each Monday at 1800 UTC · next meeting November 23rd
<_whitenotifier-f> [nmigen/nmigen] whitequark pushed 1 commit to master [+0/-0/±1] https://git.io/Jkw3r
<_whitenotifier-f> [nmigen/nmigen] whitequark f1473e4 - vendor.xilinx_spartan_3_6: fix typo.
<_whitenotifier-f> [nmigen] whitequark closed issue #549: XilinxSpartan6Platform and XilinxSpartan3APlatform support broken on commit 2f8669ca - https://git.io/Jkwmo
<_whitenotifier-f> [nmigen/nmigen] github-actions[bot] pushed 1 commit to gh-pages [+0/-0/±13] https://git.io/Jkw31
<_whitenotifier-f> [nmigen/nmigen] whitequark ea92010 - Deploying to gh-pages from @ f1473e483aae027ccaeebe4c9b476cc582bd309a 🚀
Asuu has quit [Quit: Konversation terminated!]
<lsneff> whitequark: I pulled down the cxxsim git branch, but I'm not sure how to add a verilog file when running a simulation. It seems it's only possible to add a file when platform isn't None.
<whitequark> correct. it's not supported yet
<lsneff> I see
<lkcl> lsneff: vup's pointer to minerva is a valuable one. to give you some idea of the time commitment: minerva is 4,000 lines of nmigen, was a 2-man task, and took 1 year to complete. total time: 2 man-years of effort.
<lsneff> Ah, wow
<lsneff> Not a full core, but enough to run some basic code
<lkcl> ah yes that one would be somewhere equivalent to picorv32
<lkcl> with pretty much every extension switched off
<lkcl> if you were to do the same you stand a reasonable chance of writing something in about... 2 weeks
<lsneff> Yeah, I don't need much for this
<lsneff> I expect it would be longer, I'm very new to this
<lkcl> the ALU if you just include add, subtract, shift, should be well... not that much different from the alu_hier.py example!
<lsneff> Would one of you be willing to take a look at my code for embedding picorv32?
<whitequark> sure
<lsneff> Thanks. I think where I'm going wrong is the memory interface. https://gist.github.com/lachlansneff/e30fb4145b52059a348b811366798130
<whitequark> you definitely want to use a .write_port(granularity=8)
<whitequark> otherwise you are writing zeroes to all disabled lanes
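A minimal sketch of that suggestion, assuming `from nmigen import *`, an `m = Module()` inside elaborate(), and an illustrative `firmware_words` init list (none of these names are from lsneff's gist):

    mem = Memory(width=32, depth=1024, init=firmware_words)
    m.submodules.rdport = rdport = mem.read_port()
    m.submodules.wrport = wrport = mem.write_port(granularity=8)
    # with granularity=8, wrport.en is 4 bits wide: one enable per byte lane,
    # so a byte or halfword store no longer clobbers the other lanes with zeroes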
Degi has quit [Ping timeout: 240 seconds]
Degi has joined #nmigen
<lsneff> Okay, I tried that, but I don't think it changed the generated verilog at all
<lsneff> Actually, looking at the top.debug.v output, it doesn't write to the memory at all
<whitequark> hm
<lsneff> But I don't think my program actually writes to memory at all, not even to the stack, so I think it should still work even without that
<lsneff> Here's the generated code for that module in particular: https://gist.github.com/lachlansneff/dc154e31ba551c92340ad01131f27e6b
<lsneff> I'm noticing there are very few non-blocking assignments
<whitequark> you haven't assigned write_port.en
<lsneff> Oh crap, I forgot about that
<lsneff> Do I need to do that for read_port.en too?
<whitequark> that one defaults to 1
<whitequark> anyway, if you follow my advice re: granularity, w_en will be 4 bits wide
<whitequark> and you could assign mem_wstrb to it
<lsneff> Oh, huh, that's clever
<lsneff> And then I don't need all those ifs
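Sketched out, with `cpu_mem_addr`, `cpu_mem_wdata` and `cpu_mem_wstrb` standing in for the picorv32 outputs (illustrative names, wired to the Instance elsewhere):

    m.d.comb += [
        wrport.addr.eq(cpu_mem_addr[2:]),   # word address
        wrport.data.eq(cpu_mem_wdata),
        wrport.en.eq(cpu_mem_wstrb),        # 4 byte-lane strobes, no per-lane ifs needed
    ]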
<lsneff> It's still not doing anything, but I'm not sure which parts are broken and which aren't.
<whitequark> hmmm
<whitequark> what do you expect it to do?
<lsneff> Well, I'm adding a mapping to it, so that when I write to a specific memory location, it should turn on the led on my board
<whitequark> have you tried simulating it?
<whitequark> hang on
<whitequark> can you show me the mapping?
<lsneff> I haven't been able to figure out how to simulate it yet, if I can't get it working this way, I'll spend some quality time with verilator or cxxrtl
<lsneff> Sure
<lsneff> Here's the mapping, this is in my top.py file: https://gist.github.com/lachlansneff/edc1f1092e9378305ba56e6e141f2d44
<whitequark> right
<whitequark> so this sets `self.led` to whatever you wrote in your risc-v code, for exactly the duration of that instruction
<whitequark> and then it returns to the default 0 value
<lsneff> Wait, really? Okay, I guess I need to spend more time understanding signals
<lsneff> I would need something akin to a reg there, right?
<whitequark> yes
<whitequark> m.d.sync is... not explained in the manual because i switched to other, more pressing work
<lsneff> So, just changing that to sync instead of comb won't do it, right?
<whitequark> it would
<whitequark> it's not a good architecture for a variety of reasons, but it should work just fine in this example
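A sketch of the sync version of that mapping, assuming an address decode against an illustrative `led_addr` constant and the same illustrative picorv32 signal names as above:

    with m.If(cpu_mem_valid & (cpu_mem_wstrb != 0) & (cpu_mem_addr == led_addr)):
        m.d.sync += self.led.eq(cpu_mem_wdata[0])
    # the registered (sync) assignment holds the value until the next write,
    # instead of dropping back to 0 one cycle later as the comb version did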
<lsneff> So, still nothing happens after that, so I guess I'll go explore cxxrtl now.
<lsneff> What would be a better way of specifying mappings?
<lsneff> I wasn't sure how to do it, so I just did what I thought of first.
<whitequark> tbf i think the reason it doesn't work is that there are quite a few places where your code seems off by 1 cycle
<whitequark> for example, mem_ready and mem_rdata
<lsneff> Yeah, I wasn't sure how to delay those by one cycle
<whitequark> this needs a small digression
<whitequark> so, you know how we used to have address buses? like physical buses
<lsneff> Yep!
<whitequark> it would be a bunch of address lines going to allll the components. no muxes or anything on those
<whitequark> same for the data
<lsneff> Okay, yep
<whitequark> and you would only really manipulate strobes to select a device and so on
<lsneff> What do you mean by strobes here?
<whitequark> so, what you could do is to permanently drive read_port.addr and write_port.addr with mem_addr >> 2
<whitequark> to save some gates
<whitequark> and then only manipulate write_port.en to write (which picorv32 already conveniently does)
<lsneff> Oh, I see, so I'd drive them outside the if statements
<whitequark> for memory reads, this would mean that you have 2 cycles: first with valid & ~ready, second with valid & ready
<whitequark> on the first cycle the memory does the read, on the second one the read is presented to the CPU
<whitequark> for mappings, you do something similar with the data bus
<whitequark> always broadcast mem_wdata to all peripherals
<whitequark> and only enable the peripheral if the address decodes correctly
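Put together, the bus-like wiring whitequark describes might look roughly like this (the `ram_selected` decode signal and the cpu_mem_* names are illustrative, not from the gist):

    m.d.comb += [
        rdport.addr.eq(cpu_mem_addr[2:]),   # "address bus": driven permanently
        wrport.addr.eq(cpu_mem_addr[2:]),
        wrport.data.eq(cpu_mem_wdata),      # "data bus": broadcast to everything
        cpu_mem_rdata.eq(rdport.data),
    ]
    with m.If(ram_selected):                # the address decode acts as the strobe
        m.d.comb += wrport.en.eq(cpu_mem_wstrb)
    # cycle 1: valid & ~ready, the synchronous read port latches the word;
    # cycle 2: valid & ready, the CPU samples mem_rdata
    m.d.sync += cpu_mem_ready.eq(cpu_mem_valid & ~cpu_mem_ready)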
<lsneff> Okay, I think that makes sense
<lsneff> I am worried that that way is a tad more complex, which is something I'm struggling with anyhow
electronic_eel has quit [Ping timeout: 240 seconds]
electronic_eel has joined #nmigen
emeb has quit [Quit: Leaving.]
PyroPeter_ has joined #nmigen
PyroPeter has quit [Ping timeout: 260 seconds]
PyroPeter_ is now known as PyroPeter
emeb_mac has quit [Quit: Leaving.]
<lsneff> Hmm, I took a look at it again and it seems that `mem_valid` isn't being asserted, even though the docs for picorv32 say it should be during any memory read/write until I assert mem_ready.
<lsneff> Maybe I need to assert `resetn` for a cycle to boot it?
<lsneff> Actually, I'm having trouble figuring out what's going on with `resetn` in the example: https://github.com/cliffordwolf/picorv32/blob/master/scripts/icestorm/example.v
d1b2 has quit [Remote host closed the connection]
d1b21 has joined #nmigen
d1b21 is now known as d1b2
jeanthom has joined #nmigen
jeanthom has quit [Ping timeout: 256 seconds]
Asu has joined #nmigen
Asu has quit [Remote host closed the connection]
Asu has joined #nmigen
Asu has quit [Quit: Konversation terminated!]
Asu has joined #nmigen
Asuu has joined #nmigen
Asu has quit [Ping timeout: 260 seconds]
jeanthom has joined #nmigen
Asuu has quit [Ping timeout: 240 seconds]
Asuu has joined #nmigen
<cesar[m]> lsneff: It seems to hold resetn low for 255 cycles, then it stays high forever afterwards.
<cesar[m]> Line 11 seems to AND all the bits of resetn_counter, so resetn is high only when it reaches 255.
<cesar[m]> Line 14 freezes the counter when resetn goes high, so it keeps in that state forever.
<cesar[m]> resetn seems to be an active low signal. The "n" at the end of the name also suggests this.
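Rendered in nMigen, the reset generator cesar[m] is describing would look roughly like this (a sketch of the Verilog example, not code from either gist):

    resetn_counter = Signal(8)
    resetn         = Signal()
    m.d.comb += resetn.eq(resetn_counter == 0xFF)           # high only once the counter saturates
    with m.If(~resetn):
        m.d.sync += resetn_counter.eq(resetn_counter + 1)   # freezes once resetn goes high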
Asuu has quit [Ping timeout: 246 seconds]
Asu has joined #nmigen
Asu has quit [Client Quit]
<d1b2> <Darius> presumably the CPU core needs reset to be at least some number of clocks long
<daveshah> I think this was probably also to deal with the iCE40 initialised BRAM bug
<daveshah> On any other FPGA, one cycle of resetn should be enough
<d1b2> <Darius> if only there was a comment explaining why 😄
Asu has joined #nmigen
jeanthom has quit [Ping timeout: 256 seconds]
Asu has quit [Quit: ZNC 1.7.5 - https://znc.in]
Asu has joined #nmigen
<_whitenotifier-f> [nmigen] anuejn opened issue #550: Attrs on DiffPairs only generate constraints for positive side - https://git.io/Jkrat
jeanthom has joined #nmigen
<_whitenotifier-f> [nmigen] whitequark commented on issue #550: Attrs on DiffPairs only generate constraints for positive side - https://git.io/JkrVz
_whitelogger has joined #nmigen
jeanthom has joined #nmigen
jeanthom has quit [Ping timeout: 260 seconds]
jeanthom has joined #nmigen
jeanthom has quit [Ping timeout: 240 seconds]
jeanthom has joined #nmigen
<_whitenotifier-f> [nmigen] anuejn commented on issue #550: Attrs on DiffPairs only generate constraints for positive side - https://git.io/Jkrxl
<_whitenotifier-f> [nmigen] whitequark commented on issue #550: Attrs on DiffPairs only generate constraints for positive side - https://git.io/JkrxX
<_whitenotifier-f> [nmigen] daveshah1 commented on issue #550: Attrs on DiffPairs only generate constraints for positive side - https://git.io/Jkrxy
<_whitenotifier-f> [nmigen] whitequark commented on issue #550: Attrs on DiffPairs only generate constraints for positive side - https://git.io/JkrpY
<_whitenotifier-f> [nmigen] daveshah1 commented on issue #550: Attrs on DiffPairs only generate constraints for positive side - https://git.io/JkrhG
<_whitenotifier-f> [nmigen] anuejn commented on issue #550: Attrs on DiffPairs only generate constraints for positive side - https://git.io/Jkrh7
anuejn has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
vup has quit [Quit: vup]
<_whitenotifier-f> [nmigen] anuejn commented on issue #550: Attrs on DiffPairs only generate constraints for positive side - https://git.io/JkrjU
<_whitenotifier-f> [nmigen] anuejn closed issue #550: Attrs on DiffPairs only generate constraints for positive side - https://git.io/Jkrat
anuejn has joined #nmigen
vup has joined #nmigen
<lkcl> lsneff, this should give you some clues: it is how litex establishes a foreign connection to picorv32
<lkcl> if you're not doing exactly what litex is doing (which is known to work) then you are highly likely to be making your life difficult.
emeb has joined #nmigen
<whitequark> lsneff is not using wishbone; they're learning HDL and it's totally fine to not use known good patterns when learning
<lkcl> noticed it's direct memory access rather than wishbone, which i wasn't expecting
<Sarayan> when is wishbone useful?
<lkcl> means that you'll need to actually implement a basic version of the wishbone protocol, lsneff.
<lkcl> Sarayan: when there's enough code (peripherals) implementing that protocol for you to not want to duplicate all that code
<Sarayan> interesting, thanks
<whitequark> picorv32 does not use wishbone and so lsneff doesn't have to touch wishbone to get it to work
<Sarayan> it's a lingua franca for inter-core communication within an fpga/asic?
<whitequark> Sarayan: it's something of a de facto standard in OSS. it's also not all that great, and it's being slowly replaced with AXI in newer projects
<lkcl> although, you probably meant, "what's special about wishbone itself" and the answer's "not a lot, per se, it's just the de-facto open hardware Bus Standard"
<whitequark> AXI is a lot more complex though so Wishbone is here to stay
<lkcl> whitequark: interesting.
<lkcl> are there any known nmigen implementations of AXI4?
<whitequark> I think some people worked with AXI
<whitequark> nmigen-soc will have AXI, of course
<lkcl> ah goood
<d1b2> <DX-MON> once I get back around to FPGA stuff, I'll probably end up contributing some AXI stuff as I have (albeit old) experience with it and thanks to my Zynq boards, I have a platform to test it on
<vup> there is also the AXI-lite subset, which is a bit simpler and might be enough for many cases
<vup> anuejn and I also have a couple of AXI master cores @ https://github.com/apertus-open-source-cinema/nmigen-gateware
<lkcl> DX-MON: nice!
<lkcl> vup: nice. and DVI too, it looks like.
<vup> yeah
<Sarayan> For example, a 32-bit AXI bus requires roughly 164 separate wires to drive the slave, whereas the slave will respond with another 50 wires returned in response.
<Sarayan> fuck, that's a lot
<d1b2> <DX-MON> I highly recommend AXI-Lite because for what we're doing, you don't need AXI's heavy-handed cache coherency stuff.. one core in the system means AXI-Lite is perfectly good enough
<d1b2> <DX-MON> *for the bulk of
<lkcl> ah! and you found a USB3 PHY IC!
<vup> @DX-MON: well bursts can be nice, which is currently our main usecase for full AXI
<lkcl> i presume without the mad pinouts of the TI USB1301, and without the errata? :)
<vup> I have not looked at the TI USB1301, so no clue about that one
<vup> also the ft60x is a simple USB3 FIFO chip, so you can do arbitrary stuff
<anuejn> the ft601 is not really a phy
<anuejn> it is rather really limited in what it can do
<d1b2> <DX-MON> fair point vup, though I'd suggest for the bulk of what people are doing, bust mode doesn't really offer a tangible benefit that outweighs the cost of full AXI
<d1b2> <DX-MON> *burst
<lkcl> vup: it's like an 80 pin interface, and implements USB3-PIPE... incorrectly
<lkcl> anuejn: ok, appreciated
<vup> @DX-MON: yes agreed
<anuejn> I just shot my foot hard because I was using the output domain of a pll as its input
<anuejn> that was... dumb
<anuejn> that was about a day worth of useless debugging
<daveshah> how did that behave? lock to some totally off frequency?
<anuejn> it didnt lock at all
<daveshah> that makes more sense
<anuejn> but it did generate a strongly varying high frequency
<anuejn> (i guess, i couldn't sample my counter fast enough to see what is going on)
jeanthom has quit [Ping timeout: 256 seconds]
ming__ has joined #nmigen
ming__ has joined #nmigen
jeanthom has joined #nmigen
jeanthom has quit [Ping timeout: 256 seconds]
<lsneff> Thank you for all the discussion. Waiting for 255 cycles, and then asserting `resetn` seems to have booted the softcore, at least it is accessing the memory now.
<whitequark> you could also do something like `i_resetn=~ResetSignal()`
<whitequark> nmigen already includes this workaround for ice40 BRAM
<whitequark> you simply need to hook it up to the core
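For example, hooked up on the Instance like this (port list abbreviated, cpu_mem_* names illustrative as before):

    m.submodules.cpu = Instance(
        "picorv32",
        i_clk=ClockSignal(),
        i_resetn=~ResetSignal(),    # nMigen's sync-domain reset, inverted for the active-low pin
        o_mem_valid=cpu_mem_valid,
        i_mem_ready=cpu_mem_ready,
        o_mem_addr=cpu_mem_addr,
        o_mem_wdata=cpu_mem_wdata,
        o_mem_wstrb=cpu_mem_wstrb,
        i_mem_rdata=cpu_mem_rdata,
    )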
<lsneff> Okay, that works too
jeanthom has joined #nmigen
<lsneff> I'm trying to tie the read and write ports to the cpu memory outputs combinatorially, not sure if I've done it right though.
<lsneff> Oh shit, nvm, I know why that's broken
<lsneff> nope, that didn't fix it
<_whitenotifier-f> [nmigen] anuejn synchronize pull request #547: vendor.lattice_machxo_2_3l: fix sdc generation - https://git.io/JkupM
<_whitenotifier-f> [nmigen] codecov[bot] edited a comment on pull request #547: vendor.lattice_machxo_2_3l: fix sdc generation - https://git.io/JkupN
<_whitenotifier-f> [nmigen] codecov[bot] edited a comment on pull request #547: vendor.lattice_machxo_2_3l: fix sdc generation - https://git.io/JkupN
<_whitenotifier-f> [nmigen] codecov[bot] edited a comment on pull request #547: vendor.lattice_machxo_2_3l: fix sdc generation - https://git.io/JkupN
<_whitenotifier-f> [nmigen] codecov[bot] edited a comment on pull request #547: vendor.lattice_machxo_2_3l: fix sdc generation - https://git.io/JkupN
<_whitenotifier-f> [nmigen] anuejn commented on pull request #547: vendor.lattice_machxo_2_3l: fix sdc generation - https://git.io/Jko0D
sorear has quit [Ping timeout: 260 seconds]
FFY00 has quit [Ping timeout: 260 seconds]
sorear has joined #nmigen
FFY00 has joined #nmigen
jeanthom has quit [Ping timeout: 264 seconds]
chipmuenk has joined #nmigen
jeanthom has joined #nmigen
<lsneff> Trying to simulate with iverilog, but it doesn't understand the SB_IO primitive in the generated verilog. whitequark: any ideas?
d1b2 has quit [*.net *.split]
feldim2425_ has quit [*.net *.split]
d1b2 has joined #nmigen
feldim2425_ has joined #nmigen
emeb_mac has joined #nmigen
<daveshah> lsneff: add the Yosys cells_sim.v to the list of files for iverilog
<daveshah> use 'yosys-config --datdir/ice40/cells_sim.v' to get the path if you want to be portable
<lsneff> After some debugging with icarus, I've gotten the cpu to work. Surprisingly, an issue I'm running into is endianness in the app binary.
<sorear> there are outstanding PRs against several toolchain projects for big-endian support
<Sarayan> Is there any modern cpu that is big-endian?
<sorear> depends on what you mean by "modern" and "is"
<daveshah> I think the POWER stuff still has a big-endian mode at least
<sorear> nearly all modern CPUs support big-endian data (x86 was one of the last to add this)
<lsneff> I'm fine with little-endian, the size of the app binary just isn't a multiple of 4 bytes, so extending it without messing up le is a little weird
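One way to handle that in Python when preparing the Memory init, assuming the firmware image is app.bin (the filename and variable names are illustrative): zero-pad to a multiple of 4 bytes, then unpack little-endian 32-bit words.

    import struct

    with open("app.bin", "rb") as f:
        data = f.read()
    data += b"\x00" * (-len(data) % 4)                                # zero-pad to a multiple of 4
    firmware_words = [w for (w,) in struct.iter_unpack("<I", data)]   # little-endian 32-bit words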
<Sarayan> I mean, is there anywhere I could test modern mame's BE support? it's been so long since I've seen a BE processor that there are probably issues
<sorear> you can build a big endian system (either cross-compile, or try to find a distro that still does BE arm/mips/ppc), and either run it on qemu or under KVM on current arm hardware
<Sarayan> I wonder if at that point clang or gcc targets anything BE (we need c++17)
<Sarayan> not sure I can boot say a pi in BE
<sorear> morally speaking BEness is a property of ABIs, not processors, and you could do a BE toolchain for x86 if you had a few months to burn
<sorear> idk if you can boot a pi in BE but you can boot a pi and then run a BE OS using kvm
<sorear> the CPU is endian-agnostic but the devices aren't
<sorear> (I asked almost exactly this question to #musl a month or two ago, I'm telling you what I now know but much of it hasn't been tested)
<sorear> gcc/clang _do_ still support BE in the abstract (in particular there is no little endian abi for s390x), and I'm pretty sure they can still do BE for arm/mips/ppc
<Sarayan> swapping everything is terribly slow I suspect, and a pi-scale arm is not very fast in the first place
<sorear> arm/mips/ppc all support different endianness in user mode vs. kernel mode, but Linux does not support that; there might be other kernels that do
<Sarayan> if it's not processor-supported it's kind of a problem
<sorear> every recent arm supports both endiannesses
<Sarayan> aarch64 removed per-process endianness from the ABI though
<sorear> but the MMIO devices on the bus have a specific endianness and not all drivers can cope with device endian != whatever_EL1_endianness
<sorear> which is why you run this in a VM (requires swapping MMIO accesses but everything involving RAM is native)
<Sarayan> interesting
<sorear> SCTLR_EL1.EE
<Sarayan> I have a feeling it's going to take way more time than I should use on that though, I have a lot of other things with higher priority by far. Thanks a lot though :-)
chipmuenk has quit [Quit: chipmuenk]
chipmuenk has joined #nmigen
chipmuenk has quit [Client Quit]
<whitequark> remember that openssl had a port to big endian x86?
<sorear> not nearly as interesting without a gcc and linux port to big endian x86
<sorear> that means using movbe (or mov+bswap) instead of "movel"
<sorear> on-stack return addresses are still little endian but You Shouldn't Be Touching Those
<d1b2> <DX-MON> sorear: don't shout too loudly, you might just find it already exists in the form of a HPE hellchain for the HPE Non-Stop
<whitequark> they had a gcc port
<d1b2> <DX-MON> it's "big endian" (normal LE x86, but compiled for and executed as-if a BE architecture, including so many byte swaps to make it appear BE)
<sorear> you have my attention
<d1b2> <DX-MON> I mean.. it's a whole thing for their Non-Stop systems (also fault-tolerant like Stratus VOS)
<sorear> pity I (probably) don't have any reasonable way of running that
<d1b2> <DX-MON> but with how slowly it runs as a result of all those byte swaps and the whole OSS and Guardian OSes (one running atop the other on the same box, no virt)..
<d1b2> <DX-MON> yeah, you don't.. it's..
<d1b2> <DX-MON> very expensive
<d1b2> <DX-MON> and kinda supercrap
<d1b2> <DX-MON> but it exists
<sorear> modern cores can run amazingly shitty code with little if any measurable slowdown
<d1b2> <DX-MON> you'd hope, but not in this case.. this has a very noticeable slowdown from all the byte swapping, both because of the cache misses it causes, and because it screws up prediction and pipelining
<d1b2> <DX-MON> you're getting pipeline stalls and resets every few instructions (5-10) rather than every several tens of instructions
<d1b2> <DX-MON> also, the compilers for it are marked HPE proprietary but are a walking GPL violation [shudder] and are based around essentially GCC 3.x
<sorear> i could buy increased i$ pressure but something has gone wrong if this is breaking pipelining
<sorear> (ok, it's also hurting load-to-use, which would be a fairly big deal)
<d1b2> <DX-MON> well, so.. there's just so much byte swapping going on to make it "work" that even register renaming can't help you so you fall into oldschool pipeline stalls waiting for data to become available from the pipeline
<d1b2> <DX-MON> ah, so my source for the compiler bit just informed me that I'm wrong, it's not GCC 3.x (though the linkers and debugger are based around an old Binutils), but rather Open64
<sorear> another fun thing you can do is negate all pointers, and store them in negated format
<d1b2> <DX-MON> (they think)
<sorear> which effectively amounts to using sub instead of add for struct/array accessing. will break close to everything but I'm pretty sure C allows it, since int-ptr casts are IDB :)
<d1b2> <DX-MON> that depends.. casts of (u)intptr_t to pointer are fully defined behaviour
<sorear> "The following type designates a signed integer type with the property that anyvalidpointer tovoidcan be converted to this type, then converted back to pointer tovoid,and the result will compare equal to the original pointer:"
<sorear> there's a lot of latitude there (have you seen what CHERI does)
<sorear> negating twice is a no-op, so that's fine
jeanthom has quit [Ping timeout: 264 seconds]
<lsneff> Hmm, executing a ret doesn't actually jump back in the simulation..