<_whitenotifier-f>
[nmigen] whitequark closed issue #549: XilinxSpartan6Platform and XilinxSpartan3APlatform support broken on commit 2f8669ca - https://git.io/Jkwmo
<_whitenotifier-f>
[nmigen/nmigen] github-actions[bot] pushed 1 commit to gh-pages [+0/-0/±13] https://git.io/Jkw31
<_whitenotifier-f>
[nmigen/nmigen] whitequark ea92010 - Deploying to gh-pages from @ f1473e483aae027ccaeebe4c9b476cc582bd309a 🚀
Asuu has quit [Quit: Konversation terminated!]
<lsneff>
whitequark: I pulled down the cxxsim git branch, but I'm not sure how to add a verilog file when running a simulation. It seems it's only possible to add a file when platform isn't None.
<whitequark>
correct. it's not supported yet
<lsneff>
I see
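For the non-simulation path lsneff refers to, attaching a Verilog source via the platform looks roughly like this; a minimal sketch in which the `SoC` class name and the `picorv32.v` path are illustrative, and which only takes effect when building for hardware (platform is not None):

```python
from nmigen import Elaboratable, Module, Instance, ClockSignal

class SoC(Elaboratable):
    def elaborate(self, platform):
        m = Module()
        # Only possible when building for hardware: hand the Verilog source
        # to the platform so it ends up in the toolchain invocation.
        if platform is not None:
            with open("picorv32.v") as f:
                platform.add_file("picorv32.v", f.read())
        m.submodules.cpu = Instance("picorv32", i_clk=ClockSignal())
        # ... remaining picorv32 ports elided ...
        return m
```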
<lkcl>
lsneff: vup's pointer to minerva is a valuable one. to give you some idea of the time commitment: minerva is 4,000 lines of nmigen, was a 2-man task, and took 1 year to complete. total time: 2 man-years of effort.
<lsneff>
It's still not doing anything, but I'm not sure which parts are broken and which aren't.
<whitequark>
hmmm
<whitequark>
what do you expect it to do?
<lsneff>
Well, I'm adding a mapping to it, so that when I write to a specific memory location, it should turn on the led on my board
<whitequark>
have you tried simulating it?
<whitequark>
hang on
<whitequark>
can you show me the mapping?
<lsneff>
I haven't been able to figure out how to simulate it yet, if I can't get it working this way, I'll spend some quality time with verilator or cxxrtl
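For reference, driving a pure-nmigen design with the built-in simulator looks roughly like the sketch below; the throwaway `Blinker` module stands in for the real design, since a design containing a Verilog Instance such as picorv32 is exactly the case the built-in simulator cannot handle, which is why Verilator and CXXRTL come up here.

```python
from nmigen import Elaboratable, Module, Signal
from nmigen.sim import Simulator

class Blinker(Elaboratable):
    """Trivial stand-in design: a free-running counter driving an LED signal."""
    def __init__(self):
        self.led = Signal()

    def elaborate(self, platform):
        m = Module()
        counter = Signal(8)
        m.d.sync += counter.eq(counter + 1)
        m.d.comb += self.led.eq(counter[-1])
        return m

dut = Blinker()
sim = Simulator(dut)
sim.add_clock(1e-6)          # 1 MHz clock on the "sync" domain

def process():
    for _ in range(512):     # just let the design run; inspect the VCD afterwards
        yield                # advance one clock cycle

sim.add_sync_process(process)
with sim.write_vcd("blinker.vcd"):
    sim.run()
```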
<whitequark>
m.d.sync is... not explained in the manual because i switched to other, more pressing work
<lsneff>
So, just changing that to sync instead of comb won't do it, right?
<whitequark>
it would
<whitequark>
it's not a good architecture for a variety of reasons, but it should work just fine in this example
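The distinction being discussed: m.d.comb describes combinational logic that only holds while its condition holds, while m.d.sync describes a register updated on the clock edge. A tiny illustrative sketch, with made-up signal names and address:

```python
from nmigen import Module, Signal

m = Module()
we   = Signal()    # write strobe from the CPU
addr = Signal(32)  # memory address from the CPU
led  = Signal()

# Combinational: led would follow the condition in the same cycle and drop
# again as soon as the write strobe goes away.
# m.d.comb += led.eq(we & (addr == 0x1000_0000))

# Registered: led is updated on the next clock edge and stays set until
# something else writes it, which is what a memory-mapped LED wants.
with m.If(we & (addr == 0x1000_0000)):
    m.d.sync += led.eq(1)
```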
<lsneff>
Still nothing happens after that, so I guess I'll go explore cxxrtl now.
<lsneff>
What would be a better way of specifying mappings?
<lsneff>
I wasn't sure how to do it, so I just did what I thought of first.
<whitequark>
tbf i think the reason it doesn't work is that there are quite a few places where your code seems off by 1 cycle
<whitequark>
for example, mem_ready and mem_rdata
<lsneff>
Yeah, I wasn't sure how to delay those by one cycle
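As an aside, delaying a value by one cycle in nmigen is just a registered assignment in the sync domain; a two-line sketch, assuming an existing Module `m` and a Signal `sig` to be delayed:

```python
from nmigen import Signal

delayed = Signal.like(sig)    # same shape as the original signal
m.d.sync += delayed.eq(sig)   # takes sig's value one clock edge later
```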
<whitequark>
this needs a small digression
<whitequark>
so, you know how we used to have address buses? like physical buses
<lsneff>
Yep!
<whitequark>
it would be a bunch of address lines going to allll the components. no muxes or anything on those
<whitequark>
same for the data
<lsneff>
Okay, yep
<whitequark>
and you would only really manipulate strobes to select a device and so on
<lsneff>
What do you mean by strobes here?
<whitequark>
so, what you could do is to permanently drive read_port.addr and write_port.addr with mem_addr >> 2
<whitequark>
to save some gates
<whitequark>
and then only manipulate write_port.en to write (which picorv32 already conveniently does)
<lsneff>
Oh, I see, so I'd drive them outside the if statements
<whitequark>
for memory reads, this would mean that you have 2 cycles: first with valid & ~ready, second with valid & ready
<whitequark>
on the first cycle the memory does the read, on the second one the read is presented to the CPU
<whitequark>
for mappings, you do something similar with the data bus
<whitequark>
always broadcast mem_wdata to all peripherals
<whitequark>
and only enable the peripheral if the address decodes correctly
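A sketch of how those suggestions could fit together; this is not lsneff's code, the address map, memory depth, and firmware placeholder are made up, and only the mem_* names follow the picorv32 native memory interface:

```python
from nmigen import Module, Memory, Mux, Signal

m = Module()

# picorv32 native memory interface (these would be wired to the Instance).
mem_valid = Signal()
mem_ready = Signal()
mem_addr  = Signal(32)
mem_wdata = Signal(32)
mem_wstrb = Signal(4)
mem_rdata = Signal(32)
led       = Signal()

firmware_words = [0] * 1024                      # placeholder firmware image
mem = Memory(width=32, depth=1024, init=firmware_words)
m.submodules.rdport = rdport = mem.read_port(transparent=False)
m.submodules.wrport = wrport = mem.write_port(granularity=8)

ram_sel = Signal()
led_sel = Signal()
m.d.comb += [
    # Made-up address map: RAM at 0x0000_0000, LED register at 0x1000_0000.
    ram_sel.eq(mem_addr < 0x0000_1000),
    led_sel.eq(mem_addr == 0x1000_0000),

    # Addresses and write data are broadcast unconditionally; only the
    # enables are gated by the address decode and the strobes.
    rdport.addr.eq(mem_addr >> 2),
    wrport.addr.eq(mem_addr >> 2),
    wrport.data.eq(mem_wdata),
    wrport.en.eq(Mux(mem_valid & ram_sel, mem_wstrb, 0)),

    mem_rdata.eq(rdport.data),
]

# Two-cycle read: on the first cycle (valid & ~ready) the BRAM performs the
# read; on the second cycle the data is presented and mem_ready is asserted.
m.d.sync += mem_ready.eq(mem_valid & ~mem_ready)

# Memory-mapped LED: latches on any write byte strobe to the decoded address.
with m.If(mem_valid & led_sel & mem_wstrb.any()):
    m.d.sync += led.eq(mem_wdata[0])
```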
<lsneff>
Okay, I think that makes sense
<lsneff>
I am worried that that approach is a tad more complex, and complexity is something I'm struggling with anyhow
electronic_eel has quit [Ping timeout: 240 seconds]
electronic_eel has joined #nmigen
emeb has quit [Quit: Leaving.]
PyroPeter_ has joined #nmigen
PyroPeter has quit [Ping timeout: 260 seconds]
PyroPeter_ is now known as PyroPeter
emeb_mac has quit [Quit: Leaving.]
<lsneff>
Hmm, I took a look at it again and it seems that `mem_valid` isn't being asserted, even though the docs for picorv32 say it should be during any memory read/write until I assert mem_ready.
<lsneff>
Maybe I need to assert `resetn` for a cycle to boot it?
<lkcl>
noticed it's direct memory access rather than wishbone, which i wasn't expecting
<Sarayan>
when is wishbone useful?
<lkcl>
means that you'll need to actually implement a basic version of the wishbone protocol, lsneff.
<lkcl>
Sarayan: when there's enough code (peripherals) implementing that protocol for you to not want to duplicate all that code
<Sarayan>
interesting, thanks
<whitequark>
picorv32 does not use wishbone and so lsneff doesn't have to touch wishbone to get it to work
<Sarayan>
it's a lingua franca for inter-core communication within an fpga/asic?
<whitequark>
Sarayan: it's something of a de facto standard in OSS. it's also not all that great, and it's being slowly replaced with AXI in newer projects
<lkcl>
although, you probably meant, "what's special about wishbone itself" and the answer's "not a lot, per se, it's just the de-facto open hardware Bus Standard"
<whitequark>
AXI is a lot more complex though so Wishbone is here to stay
<lkcl>
whitequark: interesting.
<lkcl>
are there any known nmigen implementations of AXI4?
<whitequark>
I think some people worked with AXI
<whitequark>
nmigen-soc will have AXI, of course
<lkcl>
ah goood
<d1b2>
<DX-MON> once I get back around to FPGA stuff, I'll probably end up contributing some AXI stuff as I have (albeit, old) experience with it and thanks to my Zynq boards, I have a platform to test it on
<vup>
there is also the AXI-lite subset, which is a bit simpler and might be enough for many cases
<Sarayan>
For example, a 32-bit AXI bus requires roughly 164 separate wires to drive the slave, whereas the slave responds with another 50 wires.
<Sarayan>
fuck, that's a lot
<d1b2>
<DX-MON> I highly recommend AXI-Lite because for what we're doing, you don't need AXI's heavy-handed cache coherency stuff.. one core in the system means AXI-Lite is perfectly good enough
<d1b2>
<DX-MON> *for the bulk of
<lkcl>
ah! and you found a USB3 PHY IC!
<vup>
@DX-MON: well bursts can be nice, which is currently our main use case for full AXI
<lkcl>
i presume without the mad pinouts of the TI USB1301, and without the errata? :)
<vup>
I have not looked at the TI USB1301, so no clue about that one
<vup>
also the ft60x is a simple USB3 FIFO chip, so you can do arbitrary stuff
<anuejn>
the ft601 is not really a phy
<anuejn>
it is really rather limited in what it can do
<d1b2>
<DX-MON> fair point vup, though I'd suggest for the bulk of what people are doing, bust mode doesn't really offer a tangible benefit that outweighs the cost of full AXI
<d1b2>
<DX-MON> *burst
<lkcl>
vup: it's like an 80 pin interface, and implements USB3-PIPE... incorrectly
<lkcl>
anuejn: ok, appreciated
<vup>
@DX-MON: yes agreed
<anuejn>
I just shot myself in the foot hard because I was using the output domain of a pll as its input
<anuejn>
that was... dumb
<anuejn>
that was about a day's worth of useless debugging
<daveshah>
how did that behave? lock to some totally off frequency?
<anuejn>
it didn't lock at all
<daveshah>
that makes more sense
<anuejn>
but it did generate a strongly varying high frequency
<anuejn>
(I guess; I couldn't sample my counter fast enough to see what was going on)
jeanthom has quit [Ping timeout: 256 seconds]
ming__ has joined #nmigen
jeanthom has joined #nmigen
jeanthom has quit [Ping timeout: 256 seconds]
<lsneff>
Thank you for all the discussion. Waiting for 255 cycles, and then asserting `resetn` seems to have booted the softcore, at least it is accessing the memory now.
<whitequark>
you could also do something like `i_resetn=~ResetSignal()`
<whitequark>
nmigen already includes this workaround for ice40 BRAM
<whitequark>
you simply need to hook it up to the core
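What the `i_resetn=~ResetSignal()` hookup looks like on the instantiation; a sketch reusing the assumed mem_* names from the earlier sketch, with most picorv32 ports (trap, IRQs, PCPI, ...) elided:

```python
from nmigen import Instance, ClockSignal, ResetSignal

# Assumes the Module `m` and the mem_* signals defined above.
m.submodules.cpu = Instance(
    "picorv32",
    i_clk=ClockSignal("sync"),
    i_resetn=~ResetSignal("sync"),  # active-low reset, released when the sync domain comes out of reset
    o_mem_valid=mem_valid,
    i_mem_ready=mem_ready,
    o_mem_addr=mem_addr,
    o_mem_wdata=mem_wdata,
    o_mem_wstrb=mem_wstrb,
    i_mem_rdata=mem_rdata,
)
```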
<Sarayan>
Is there any modern cpu that is big-endian?
<sorear>
depends on what you mean by "modern" and "is"
<daveshah>
I think the POWER stuff still has a big-endian mode at least
<sorear>
nearly all modern CPUs support big-endian data (x86 was one of the last to add this)
<lsneff>
I'm fine with little-endian; the size of the app binary just isn't a multiple of 4 bytes, so extending it without messing up LE is a little weird
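On that side note about the firmware image: padding a little-endian binary out to a multiple of 4 bytes and decoding it into 32-bit words for a memory init is a few lines of Python; the file name here is illustrative.

```python
import struct

# Pad the firmware to a whole number of 32-bit words with zero bytes;
# zeroes appended at the end don't disturb the little-endian word contents.
with open("firmware.bin", "rb") as f:
    data = f.read()
data += b"\x00" * (-len(data) % 4)

# Decode as little-endian 32-bit words, suitable for Memory(init=...).
firmware_words = [w for (w,) in struct.iter_unpack("<I", data)]
```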
<Sarayan>
I mean, is there anywhere I could test modern mame's BE support? It's been so long since I've seen a BE processor that there are probably issues.
<sorear>
you can build a big endian system (either cross-compile, or try to find a distro that still does BE arm/mips/ppc), and either run it on qemu or under KVM on current arm hardware
<Sarayan>
I wonder if at that point clang or gcc targets anything BE (we need c++17)
<Sarayan>
not sure I can boot say a pi in BE
<sorear>
morally speaking BEness is a property of ABIs, not processors, and you could do a BE toolchain for x86 if you had a few months to burn
<sorear>
idk if you can boot a pi in BE but you can boot a pi and then run a BE OS using kvm
<sorear>
the CPU is endian-agnostic but the devices aren't
<sorear>
(I asked almost exactly this question in #musl a month or two ago; I'm telling you what I now know, but much of it hasn't been tested)
<sorear>
gcc/clang _do_ still support BE in the abstract (in particular there is no little endian abi for s390x), and I'm pretty sure they can still do BE for arm/mips/ppc
<Sarayan>
swapping everything is terribly slow I suspect, and a pi-scale arm is not very fast in the first place
<sorear>
arm/mips/ppc all support different endianness in user mode vs. kernel mode, but Linux does not support that; there might be other kernels that do
<Sarayan>
if it's not processor-supported it's kind of a problem
<sorear>
every recent arm supports both endiannesses
<Sarayan>
aarch64 removed per-process endianness from the ABI though
<sorear>
but the MMIO devices on the bus have a specific endianness and not all drivers can cope with device endian != whatever_EL1_endianness
<sorear>
which is why you run this in a VM (requires swapping MMIO accesses but everything involving RAM is native)
<Sarayan>
interesting
<sorear>
SCTLR_EL1.EE
<Sarayan>
I have a feeling it's going to take way more time than I should spend on it; I have a lot of other things with higher priority by far. Thanks a lot though :-)
chipmuenk has quit [Quit: chipmuenk]
chipmuenk has joined #nmigen
chipmuenk has quit [Client Quit]
<whitequark>
remember that openssl had a port to big endian x86?
<sorear>
not nearly as interesting without a gcc and linux port to big endian x86
<sorear>
that means using movbe (or mov+bswap) instead of "movel"
<sorear>
on-stack return addresses are still little endian but You Shouldn't Be Touching Those
<d1b2>
<DX-MON> sorear: don't shout too loudly, you might just find it already exists in the form of a HPE hellchain for the HPE Non-Stop
<whitequark>
they had a gcc port
<d1b2>
<DX-MON> it's "big endian" (normal LE x86, but compiled for and executed as-if a BE architecture, including so many byte swaps to make it appear BE)
<sorear>
you have my attention
<d1b2>
<DX-MON> I mean.. it's a whole thing for their Non-Stop systems (also fault-tolerant like Stratus VOS)
<sorear>
pity I (probably) don't have any reasonable way of running that
<d1b2>
<DX-MON> but with how slowly it runs as a result of all those byte swaps and the whole OSS and Guardian OSes (one running atop the other on the same box, no virt)..
<d1b2>
<DX-MON> yeah, you don't.. it's..
<d1b2>
<DX-MON> very expensive
<d1b2>
<DX-MON> and kinda supercrap
<d1b2>
<DX-MON> but it exists
<sorear>
modern cores can run amazingly shitty code with little if any measurable slowdown
<d1b2>
<DX-MON> you'd hope, but not in this case.. this has a very noticeable slowdown from all the byte swapping, both because of the cache misses it causes, and because it screws up prediction and pipelining
<d1b2>
<DX-MON> you're getting pipeline stalls and resets every few instructions (5-10) rather than every several tens of instructions
<d1b2>
<DX-MON> also, the compilers for it are marked HPE proprietary but are a walking GPL violation [shudder] and are based around essentially GCC 3.x
<sorear>
i could buy increased i$ pressure but something has gone wrong if this is breaking pipelining
<sorear>
(ok, it's also hurting load-to-use, which would be a fairly big deal)
<d1b2>
<DX-MON> well, so.. there's just so much byte swapping going on to make it "work" that even register renaming can't help you, so you fall into old-school pipeline stalls waiting for data to become available from the pipeline
<d1b2>
<DX-MON> ah, so my source for the compiler bit just informed me that I'm wrong, it's not GCC 3.x (though the linkers and debugger are based around an old Binutils), but rather Open64
<sorear>
another fun thing you can do is negate all pointers, and store them in negated format
<d1b2>
<DX-MON> (they think)
<sorear>
which effectively amounts to using sub instead of add for struct/array accessing. will break close to everything but I'm pretty sure C allows it, since int-ptr casts are IDB :)
<d1b2>
<DX-MON> that depends.. casts of (u)intptr_t to pointer are fully defined behaviour
<sorear>
"The following type designates a signed integer type with the property that anyvalidpointer tovoidcan be converted to this type, then converted back to pointer tovoid,and the result will compare equal to the original pointer:"
<sorear>
there's a lot of latitude there (have you seen what CHERI does)
<sorear>
negating twice is a no-op, so that's fine
jeanthom has quit [Ping timeout: 264 seconds]
<lsneff>
Hmm, executing a ret doesn't actually jump back in the simulation..