ChanServ changed the topic of #nmigen to: nMigen hardware description language · code at https://github.com/nmigen · logs at https://freenode.irclog.whitequark.org/nmigen · IRC meetings each Monday at 1800 UTC · next meeting November 23rd
<_whitenotifier-f> [nmigen/nmigen] whitequark pushed 1 commit to master [+0/-0/±1] https://git.io/Jkw3r
<_whitenotifier-f> [nmigen/nmigen] whitequark f1473e4 - vendor.xilinx_spartan_3_6: fix typo.
<_whitenotifier-f> [nmigen] whitequark closed issue #549: XilinxSpartan6Platform and XilinxSpartan3APlatform support broken on commit 2f8669ca - https://git.io/Jkwmo
<_whitenotifier-f> [nmigen/nmigen] github-actions[bot] pushed 1 commit to gh-pages [+0/-0/±13] https://git.io/Jkw31
<_whitenotifier-f> [nmigen/nmigen] whitequark ea92010 - Deploying to gh-pages from @ f1473e483aae027ccaeebe4c9b476cc582bd309a 🚀
Asuu has quit [Quit: Konversation terminated!]
<lsneff> whitequark: I pulled down the cxxsim git branch, but I'm not sure how to add a verilog file when running a simulation. It seems it's only possible to add a file when platform isn't None.
<whitequark> correct. it's not supported yet
<lsneff> I see
<lkcl> lsneff: vup's pointer to minerva is a valuable one. to give you some idea of the time commitment: minerva is 4,000 lines of nmigen, was a 2-man task, and took 1 year to complete. total time: 2 man-years of effort.
<lsneff> Ah, wow
<lsneff> Not a full core, but enough to run some basic code
<lkcl> ah yes that one would be somewhere equivalent to picorv32
<lkcl> with pretty much every extension switched off
<lkcl> if you were to do the same you stand a reasonable chance of writing something in about... 2 weeks
<lsneff> Yeah, I don't need much for this
<lsneff> I expect it would be longer, I'm very new to this
<lkcl> the ALU if you just include add, subtract, shift, should be well... not that much different from the alu_hier.py example!
<lsneff> Would one of you be willing to take a look at my code for embedding picorv32?
<whitequark> sure
<lsneff> Thanks. I think where I'm going wrong is the memory interface. https://gist.github.com/lachlansneff/e30fb4145b52059a348b811366798130
<whitequark> you definitely want to use a .write_port(granularity=8)
<whitequark> otherwise you are writing zeroes to all disabled lanes
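A minimal sketch of that suggestion, assuming `from nmigen import *`, an `m = Module()` inside elaborate(), and an illustrative `firmware_words` init list (none of these names are from lsneff's gist):

    mem = Memory(width=32, depth=1024, init=firmware_words)
    m.submodules.rdport = rdport = mem.read_port()
    m.submodules.wrport = wrport = mem.write_port(granularity=8)
    # with granularity=8, wrport.en is 4 bits wide: one enable per byte lane,
    # so a byte or halfword store no longer clobbers the other lanes with zeroes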
Degi has quit [Ping timeout: 240 seconds]
Degi has joined #nmigen
<lsneff> Okay, I tried that, but I don't think it changed the generated verilog at all
<lsneff> Actually, looking at the top.debug.v output, it doesn't write to the memory at all
<whitequark> hm
<lsneff> But I don't think my program actually writes to memory at all, not even to the stack, so I think it should still work even without that
<lsneff> Here's the generated code for that module in particular: https://gist.github.com/lachlansneff/dc154e31ba551c92340ad01131f27e6b
<lsneff> I'm noticing there are very few non-blocking assignments
<whitequark> you haven't assigned write_port.en
<lsneff> Oh crap, I forgot about that
<lsneff> Do I need to do that for read_port.en too?
<whitequark> that one defaults to 1
<whitequark> anyway, if you follow my advice re: granularity, w_en will be 4 bits wide
<whitequark> and you could assign mem_wstrb to it
<lsneff> Oh, huh, that's clever
<lsneff> And then I don't need all those ifs
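Sketched out, with `cpu_mem_addr`, `cpu_mem_wdata` and `cpu_mem_wstrb` standing in for the picorv32 outputs (illustrative names, wired to the Instance elsewhere):

    m.d.comb += [
        wrport.addr.eq(cpu_mem_addr[2:]),   # word address
        wrport.data.eq(cpu_mem_wdata),
        wrport.en.eq(cpu_mem_wstrb),        # 4 byte-lane strobes, no per-lane ifs needed
    ]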
<lsneff> It's still not doing anything, but I'm not sure which parts are broken and which aren't.
<whitequark> hmmm
<whitequark> what do you expect it to do?
<lsneff> Well, I'm adding a mapping to it, so that when I write to a specific memory location, it should turn on the led on my board
<whitequark> have you tried simulating it?
<whitequark> hang on
<whitequark> can you show me the mapping?
<lsneff> I haven't been able to figure out how to simulate it yet, if I can't get it working this way, I'll spend some quality time with verilator or cxxrtl
<lsneff> Sure
<lsneff> Here's the mapping, this is in my top.py file: https://gist.github.com/lachlansneff/edc1f1092e9378305ba56e6e141f2d44
<whitequark> right
<whitequark> so this sets `self.led` to whatever you wrote in your risc-v code, for exactly the duration of that instruction
<whitequark> and then it returns to the default 0 value
<lsneff> Wait, really? Okay, I guess I need to spend more time understanding signals
<lsneff> I would need something akin to a reg there, right?
<whitequark> yes
<whitequark> m.d.sync is... not explained in the manual because i switched to other, more pressing work
<lsneff> So, just changing that to sync instead of comb won't do it, right?
<whitequark> it would
<whitequark> it's not a good architecture for a variety of reasons, but it should work just fine in this example
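A sketch of the sync version of that mapping, assuming an address decode against an illustrative `led_addr` constant and the same illustrative picorv32 signal names as above:

    with m.If(cpu_mem_valid & (cpu_mem_wstrb != 0) & (cpu_mem_addr == led_addr)):
        m.d.sync += self.led.eq(cpu_mem_wdata[0])
    # the registered (sync) assignment holds the value until the next write,
    # instead of dropping back to 0 one cycle later as the comb version did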
<lsneff> So, still nothing happens after that, so I guess I'll go explore cxxrtl now.
<lsneff> What would be a better way of specifying mappings?
<lsneff> I wasn't sure how to do it, so I just did what I thought of first.
<whitequark> tbf i think the reason it doesn't work is that there are quite a few places where your code seems off by 1 cycle
<whitequark> for example, mem_ready and mem_rdata
<lsneff> Yeah, I wasn't sure how to delay those by one cycle
<whitequark> this needs a small digression
<whitequark> so, you know how we used to have address buses? like physical buses
<lsneff> Yep!
<whitequark> it would be a bunch of address lines going to allll the components. no muxes or anything on those
<whitequark> same for the data
<lsneff> Okay, yep
<whitequark> and you would only really manipulate strobes to select a device and so on
<lsneff> What do you mean by strobes here?
<whitequark> so, what you could do is to permanently drive read_port.addr and write_port.addr with mem_addr >> 2
<whitequark> to save some gates
<whitequark> and then only manipulate write_port.en to write (which picorv32 already conveniently does)
<lsneff> Oh, I see, so I'd drive them outside the if statements
<whitequark> for memory reads, this would mean that you have 2 cycles: first with valid & ~ready, second with valid & ready
<whitequark> on the first cycle the memory does the read, on the second one the read is presented to the CPU
<whitequark> for mappings, you do something similar with the data bus
<whitequark> always broadcast mem_wdata to all peripherals
<whitequark> and only enable the peripheral if the address decodes correctly
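Put together, the bus-like wiring whitequark describes might look roughly like this (the `ram_selected` decode signal and the cpu_mem_* names are illustrative, not from the gist):

    m.d.comb += [
        rdport.addr.eq(cpu_mem_addr[2:]),   # "address bus": driven permanently
        wrport.addr.eq(cpu_mem_addr[2:]),
        wrport.data.eq(cpu_mem_wdata),      # "data bus": broadcast to everything
        cpu_mem_rdata.eq(rdport.data),
    ]
    with m.If(ram_selected):                # the address decode acts as the strobe
        m.d.comb += wrport.en.eq(cpu_mem_wstrb)
    # cycle 1: valid & ~ready, the synchronous read port latches the word;
    # cycle 2: valid & ready, the CPU samples mem_rdata
    m.d.sync += cpu_mem_ready.eq(cpu_mem_valid & ~cpu_mem_ready)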
<lsneff> Okay, I think that makes sense
<lsneff> I am worried that that way is a tad more complex, which is something I'm struggling with anyhow
electronic_eel has quit [Ping timeout: 240 seconds]
electronic_eel has joined #nmigen
emeb has quit [Quit: Leaving.]
PyroPeter_ has joined #nmigen
PyroPeter has quit [Ping timeout: 260 seconds]
PyroPeter_ is now known as PyroPeter
emeb_mac has quit [Quit: Leaving.]
<lsneff> Hmm, I took a look at it again and it seems that `mem_valid` isn't being asserted, even though the docs for picorv32 say it should be during any memory read/write until I assert mem_ready.
<lsneff> Maybe I need to assert `resetn` for a cycle to boot it?
<lsneff> Actually, I'm having trouble figuring out what's going on with `resetn` in the example: https://github.com/cliffordwolf/picorv32/blob/master/scripts/icestorm/example.v
d1b2 has quit [Remote host closed the connection]
d1b21 has joined #nmigen
d1b21 is now known as d1b2
jeanthom has joined #nmigen
jeanthom has quit [Ping timeout: 256 seconds]
Asu has joined #nmigen
Asu has quit [Remote host closed the connection]
Asu has joined #nmigen
Asu has quit [Quit: Konversation terminated!]
Asu has joined #nmigen
Asuu has joined #nmigen
Asu has quit [Ping timeout: 260 seconds]
jeanthom has joined #nmigen
Asuu has quit [Ping timeout: 240 seconds]
Asuu has joined #nmigen
<cesar[m]> lsneff: It seems to hold resetn low for 255 cycles, then it stays high forever afterwards.
<cesar[m]> Line 11 seems to AND all the bits of resetn_counter, so resetn is high only when it reaches 255.
<cesar[m]> Line 14 freezes the counter when resetn goes high, so it keeps in that state forever.
<cesar[m]> resetn seems to be an active low signal. The "n" at the end of the name also suggests this.
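Rendered in nMigen, the reset generator cesar[m] is describing would look roughly like this (a sketch of the Verilog example, not code from either gist):

    resetn_counter = Signal(8)
    resetn         = Signal()
    m.d.comb += resetn.eq(resetn_counter == 0xFF)           # high only once the counter saturates
    with m.If(~resetn):
        m.d.sync += resetn_counter.eq(resetn_counter + 1)   # freezes once resetn goes high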
Asuu has quit [Ping timeout: 246 seconds]
Asu has joined #nmigen
Asu has quit [Client Quit]
<d1b2> <Darius> presumably the CPU core needs reset to be at least some number of clocks long
<daveshah> I think this was probably also to deal with the iCE40 initialised BRAM bug
<daveshah> On any other FPGA, one cycle of resetn should be enough
<d1b2> <Darius> if only there was a comment explaining why 😄
Asu has joined #nmigen
jeanthom has quit [Ping timeout: 256 seconds]
Asu has quit [Quit: ZNC 1.7.5 - https://znc.in]
Asu has joined #nmigen
<_whitenotifier-f> [nmigen] anuejn opened issue #550: Attrs on DiffPairs only generate constraints for positive side - https://git.io/Jkrat
jeanthom has joined #nmigen
<_whitenotifier-f> [nmigen] whitequark commented on issue #550: Attrs on DiffPairs only generate constraints for positive side - https://git.io/JkrVz
_whitelogger has joined #nmigen
jeanthom has joined #nmigen
jeanthom has quit [Ping timeout: 260 seconds]
jeanthom has joined #nmigen
jeanthom has quit [Ping timeout: 240 seconds]
jeanthom has joined #nmigen
<_whitenotifier-f> [nmigen] anuejn commented on issue #550: Attrs on DiffPairs only generate constraints for positive side - https://git.io/Jkrxl
<_whitenotifier-f> [nmigen] whitequark commented on issue #550: Attrs on DiffPairs only generate constraints for positive side - https://git.io/JkrxX
<_whitenotifier-f> [nmigen] daveshah1 commented on issue #550: Attrs on DiffPairs only generate constraints for positive side - https://git.io/Jkrxy
<_whitenotifier-f> [nmigen] whitequark commented on issue #550: Attrs on DiffPairs only generate constraints for positive side - https://git.io/JkrpY
<_whitenotifier-f> [nmigen] daveshah1 commented on issue #550: Attrs on DiffPairs only generate constraints for positive side - https://git.io/JkrhG
<_whitenotifier-f> [nmigen] anuejn commented on issue #550: Attrs on DiffPairs only generate constraints for positive side - https://git.io/Jkrh7
anuejn has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
vup has quit [Quit: vup]
<_whitenotifier-f> [nmigen] anuejn commented on issue #550: Attrs on DiffPairs only generate constraints for positive side - https://git.io/JkrjU
<_whitenotifier-f> [nmigen] anuejn closed issue #550: Attrs on DiffPairs only generate constraints for positive side - https://git.io/Jkrat
anuejn has joined #nmigen
vup has joined #nmigen
<lkcl> lsneff, this should give you some clues: it is how litex establishes a foreign connection to picorv32
<lkcl> if you're not doing exactly what litex is doing (which is known to work) then you are highly likely to be making your life difficult.
emeb has joined #nmigen
<whitequark> lsneff is not using wishbone; they're learning HDL and it's totally fine to not use known good patterns when learning
<lkcl> noticed it's direct memory access rather than wishbone, which i wasn't expecting
<Sarayan> when is wishbone useful?
<lkcl> means that you'll need to actually implement a basic version of the wishbone protocol, lsneff.
<lkcl> Sarayan: when there's enough code (peripherals) implementing that protocol for you to not want to duplicate all that code
<Sarayan> interesting, thanks
<whitequark> picorv32 does not use wishbone and so lsneff doesn't have to touch wishbone to get it to work
<Sarayan> it's a lingua franca for inter-core communication within an fpga/asic?
<whitequark> Sarayan: it's something of a de facto standard in OSS. it's also not all that great, and it's being slowly replaced with AXI in newer projects
<lkcl> although, you probably meant, "what's special about wishbone itself" and the answer's "not a lot, per se, it's just the de-facto open hardware Bus Standard"
<whitequark> AXI is a lot more complex though so Wishbone is here to stay
<lkcl> whitequark: interesting.
<lkcl> are there any known nmigen implementations of AXI4?
<whitequark> I think some people worked with AXI
<whitequark> nmigen-soc will have AXI, of course
<lkcl> ah goood
<d1b2> <DX-MON> once I get back around to FPGA stuff, I'll probably end up contributing some AXI stuff as I have (albeit old) experience with it and thanks to my Zynq boards, I have a platform to test it on
<vup> there is also the AXI-lite subset, which is a bit simpler and might be enough for many cases
<vup> anuejn and I also have a couple of AXI master cores @ https://github.com/apertus-open-source-cinema/nmigen-gateware
<lkcl> DX-MON: nice!
<lkcl> vup: nice. and DVI too, it looks like.
<vup> yeah
<Sarayan> For example, a 32-bit AXI bus requires roughly 164 separate wires to drive the slave, whereas the slave will respond with another 50 wires returned in response.
<Sarayan> fuck, that's a lot
<d1b2> <DX-MON> I highly recommend AXI-Lite because for what we're doing, you don't need AXI's heavy-handed cache coherency stuff.. one core in the system means AXI-Lite is perfectly good enough
<d1b2> <DX-MON> *for the bulk of
<lkcl> ah! and you found a USB3 PHY IC!
<vup> @DX-MON: well bursts can be nice, which is currently our main usecase for full AXI
<lkcl> i presume without the mad pinouts of the TI USB1301, and without the errata? :)
<vup> I have not looked at the TI USB1301, so no clue about that one
<vup> also the ft60x is a simple USB3 FIFO chip, so you can do arbitrary stuff
<anuejn> the ft601 is not really a phy
<anuejn> it is rather really limited in what it can do
<d1b2> <DX-MON> fair point vup, though I'd suggest for the bulk of what people are doing, bust mode doesn't really offer a tangible benefit that outweighs the cost of full AXI
<d1b2> <DX-MON> *burst
<lkcl> vup: it's like an 80 pin interface, and implements USB3-PIPE... incorrectly
<lkcl> anuejn: ok, appreciated
<vup> @DX-MON: yes agreed
<anuejn> I just shot my foot hard because I was using the output domain of a pll as its input
<anuejn> that was... dumb
<anuejn> that was about a day worth of useless debugging
<daveshah> how did that behave? lock to some totally off frequency?
<anuejn> it didnt lock at all
<daveshah> that makes more sense
<anuejn> but it did generate a strongly varying high frequency
<anuejn> (i guess, i couldn't sample my counter fast enough to see what is going on)
jeanthom has quit [Ping timeout: 256 seconds]
ming__ has joined #nmigen
ming__ has joined #nmigen
jeanthom has joined #nmigen
jeanthom has quit [Ping timeout: 256 seconds]
<lsneff> Thank you for all the discussion. Waiting for 255 cycles, and then asserting `resetn` seems to have booted the softcore, at least it is accessing the memory now.
<whitequark> you could also do something like `i_resetn=~ResetSignal()`
<whitequark> nmigen already includes this workaround for ice40 BRAM
<whitequark> you simply need to hook it up to the core
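For example, hooked up on the Instance like this (port list abbreviated, cpu_mem_* names illustrative as before):

    m.submodules.cpu = Instance(
        "picorv32",
        i_clk=ClockSignal(),
        i_resetn=~ResetSignal(),    # nMigen's sync-domain reset, inverted for the active-low pin
        o_mem_valid=cpu_mem_valid,
        i_mem_ready=cpu_mem_ready,
        o_mem_addr=cpu_mem_addr,
        o_mem_wdata=cpu_mem_wdata,
        o_mem_wstrb=cpu_mem_wstrb,
        i_mem_rdata=cpu_mem_rdata,
    )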
<lsneff> Okay, that works too
jeanthom has joined #nmigen
<lsneff> I'm trying to tie the read and write ports to the cpu memory outputs combinatorially, not sure if I've done it right though.
<lsneff> Oh shit, nvm, I know why that's broken
<lsneff> nope, that didn't fix it
<_whitenotifier-f> [nmigen] anuejn synchronize pull request #547: vendor.lattice_machxo_2_3l: fix sdc generation - https://git.io/JkupM
<_whitenotifier-f> [nmigen] codecov[bot] edited a comment on pull request #547: vendor.lattice_machxo_2_3l: fix sdc generation - https://git.io/JkupN
<_whitenotifier-f> [nmigen] codecov[bot] edited a comment on pull request #547: vendor.lattice_machxo_2_3l: fix sdc generation - https://git.io/JkupN
<_whitenotifier-f> [nmigen] codecov[bot] edited a comment on pull request #547: vendor.lattice_machxo_2_3l: fix sdc generation - https://git.io/JkupN
<_whitenotifier-f> [nmigen] codecov[bot] edited a comment on pull request #547: vendor.lattice_machxo_2_3l: fix sdc generation - https://git.io/JkupN
<_whitenotifier-f> [nmigen] anuejn commented on pull request #547: vendor.lattice_machxo_2_3l: fix sdc generation - https://git.io/Jko0D
sorear has quit [Ping timeout: 260 seconds]
FFY00 has quit [Ping timeout: 260 seconds]
sorear has joined #nmigen
FFY00 has joined #nmigen
jeanthom has quit [Ping timeout: 264 seconds]
chipmuenk has joined #nmigen
jeanthom has joined #nmigen
<lsneff> Trying to simulate with iverilog, but it doesn't understand the SB_IO primitive in the generated verilog. whitequark: any ideas?
d1b2 has quit [*.net *.split]
feldim2425_ has quit [*.net *.split]
d1b2 has joined #nmigen
feldim2425_ has joined #nmigen
emeb_mac has joined #nmigen
<daveshah> lsneff: add the Yosys cells_sim.v to the list of files for iverilog
<daveshah> use 'yosys-config --datdir/ice40/cells_sim.v' to get the path if you want to be portable
<lsneff> After some debugging with icarus, I've gotten the cpu to work. Surprisingly, an issue I'm running into is endianness in the app binary.
<sorear> there are outstanding PRs against several toolchain projects for big-endian support
<Sarayan> Is there any modern cpu that is big-endian?
<sorear> depends on what you mean by "modern" and "is"
<daveshah> I think the POWER stuff still has a big-endian mode at least
<sorear> nearly all modern CPUs support big-endian data (x86 was one of the last to add this)
<lsneff> I'm fine with little-endian, the size of the app binary just isn't a multiple of 4 bytes, so extending it without messing up le is a little weird
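One way to handle that in Python when preparing the Memory init, assuming the firmware image is app.bin (the filename and variable names are illustrative): zero-pad to a multiple of 4 bytes, then unpack little-endian 32-bit words.

    import struct

    with open("app.bin", "rb") as f:
        data = f.read()
    data += b"\x00" * (-len(data) % 4)                                # zero-pad to a multiple of 4
    firmware_words = [w for (w,) in struct.iter_unpack("<I", data)]   # little-endian 32-bit words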
<Sarayan> I mean, is there anywhere I could test modern mame's BE support? it's been so long since I've seen a BE processor that there are probably issues
<sorear> you can build a big endian system (either cross-compile, or try to find a distro that still does BE arm/mips/ppc), and either run it on qemu or under KVM on current arm hardware
<Sarayan> I wonder if at that point clang or gcc targets anything BE (we need c++17)
<Sarayan> not sure I can boot say a pi in BE
<sorear> morally speaking BEness is a property of ABIs, not processors, and you could do a BE toolchain for x86 if you had a few months to burn
<sorear> idk if you can boot a pi in BE but you can boot a pi and then run a BE OS using kvm
<sorear> the CPU is endian-agnostic but the devices aren't
<sorear> (I asked almost exactly this question to #musl a month or two ago, I'm telling you what I now know but much of it hasn't been tested)
<sorear> gcc/clang _do_ still support BE in the abstract (in particular there is no little endian abi for s390x), and I'm pretty sure they can still do BE for arm/mips/ppc
<Sarayan> swapping everything is terribly slow I suspect, and a pi-scale arm is not very fast in the first place
<sorear> arm/mips/ppc all support different endianness in user mode vs. kernel mode, but Linux does not support that; there might be other kernels that do
<Sarayan> if it's not processor-supported it's kind of a problem
<sorear> every recent arm supports both endiannesses
<Sarayan> aarch64 removed per-process endianness from the ABI though
<sorear> but the MMIO devices on the bus have a specific endianness and not all drivers can cope with device endian != whatever_EL1_endianness
<sorear> which is why you run this in a VM (requires swapping MMIO accesses but everything involving RAM is native)
<Sarayan> interesting
<sorear> SCTLR_EL1.EE
<Sarayan> I have a feeling it's going to take way more time than I should use on that though, I have a lot of other things with higher priority by far. Thanks a lot though :-)
chipmuenk has quit [Quit: chipmuenk]
chipmuenk has joined #nmigen
chipmuenk has quit [Client Quit]
<whitequark> remember that openssl had a port to big endian x86?
<sorear> not nearly as interesting without a gcc and linux port to big endian x86
<sorear> that means using movbe (or mov+bswap) instead of "movel"
<sorear> on-stack return addresses are still little endian but You Shouldn't Be Touching Those
<d1b2> <DX-MON> sorear: don't shout too loudly, you might just find it already exists in the form of a HPE hellchain for the HPE Non-Stop
<whitequark> they had a gcc port
<d1b2> <DX-MON> it's "big endian" (normal LE x86, but compiled for and executed as-if a BE architecture, including so many byte swaps to make it appear BE)
<sorear> you have my attention
<d1b2> <DX-MON> I mean.. it's a whole thing for their Non-Stop systems (also fault-tolerant like Stratus VOS)
<sorear> pity I (probably) don't have any reasonable way of running that
<d1b2> <DX-MON> but with how slowly it runs as a result of all those byte swaps and the whole OSS and Guardian OSes (one running atop the other on the same box, no virt)..
<d1b2> <DX-MON> yeah, you don't.. it's..
<d1b2> <DX-MON> very expensive
<d1b2> <DX-MON> and kinda supercrap
<d1b2> <DX-MON> but it exists
<sorear> modern cores can run amazingly shitty code with little if any measurable slowdown
<d1b2> <DX-MON> you'd hope, but not in this case.. this has a very noticeable slowdown from all the byte swapping, both because of the cache misses it causes, and because it screws up prediction and pipelining
<d1b2> <DX-MON> you're getting pipeline stalls and resets every few instructions (5-10) rather than every several tens of instructions
<d1b2> <DX-MON> also, the compilers for it are marked HPE proprietary but are a walking GPL violation [shudder] and are based around essentially GCC 3.x
<sorear> i could buy increased i$ pressure but something has gone wrong if this is breaking pipelining
<sorear> (ok, it's also hurting load-to-use, which would be a fairly big deal)
<d1b2> <DX-MON> well, so.. there's just so much byte swapping going on to make it "work" that even register renaming can't help you so you fall into oldschool pipeline stalls waiting for data to become available from the pipeline
<d1b2> <DX-MON> ah, so my source for the compiler bit just informed me that I'm wrong, it's not GCC 3.x (though the linkers and debugger are based around an old Binutils), but rather Open64
<sorear> another fun thing you can do is negate all pointers, and store them in negated format
<d1b2> <DX-MON> (they think)
<sorear> which effectively amounts to using sub instead of add for struct/array accessing. will break close to everything but I'm pretty sure C allows it, since int-ptr casts are IDB :)
<d1b2> <DX-MON> that depends.. casts of (u)intptr_t to pointer are fully defined behaviour
<sorear> "The following type designates a signed integer type with the property that anyvalidpointer tovoidcan be converted to this type, then converted back to pointer tovoid,and the result will compare equal to the original pointer:"
<sorear> there's a lot of latitude there (have you seen what CHERI does)
<sorear> negating twice is a no-op, so that's fine
jeanthom has quit [Ping timeout: 264 seconds]
<lsneff> Hmm, executing a ret doesn't actually jump back in the simulation..