<zignig> " mostly because zignig wanted a text assembler " , thanks !
genii has quit [Remote host closed the connection]
emeb_mac has joined ##openfpga
X-Scale` has joined ##openfpga
X-Scale has quit [Ping timeout: 248 seconds]
X-Scale` is now known as X-Scale
Bike has quit [Read error: No route to host]
Bike has joined ##openfpga
X-Scale` has joined ##openfpga
X-Scale has quit [Ping timeout: 268 seconds]
X-Scale` is now known as X-Scale
dh73 has quit [Remote host closed the connection]
emeb has quit [Quit: Leaving.]
dj_pi has joined ##openfpga
dj_pi has quit [Ping timeout: 272 seconds]
Bike has quit [Quit: Lost terminal]
X-Scale` has joined ##openfpga
X-Scale has quit [Ping timeout: 272 seconds]
X-Scale` is now known as X-Scale
<_whitenotifier-3> [Boneless-CPU] zignig synchronize pull request #4: directives bikeshed - https://git.io/fjXmy
rohitksingh_work has joined ##openfpga
emeb_mac has quit [Ping timeout: 245 seconds]
Miyu has joined ##openfpga
m4ssi has joined ##openfpga
pie_ has quit [Ping timeout: 245 seconds]
Asu has joined ##openfpga
<eddyb> okay this is a really helpful example of nmigen combinational logic (specifically, a lot of it in one place) :D https://github.com/whitequark/Boneless-CPU/blob/master/boneless/gateware/decoder.py#L214
<whitequark> yeah
<tnt> whitequark: can you tell "don't care" to nmigen ?
<eddyb> whitequark: would you say it'd be a waste of time if someone made a simple DSL that expanded to pretty much that style of code? just a bit of syntax sugar
<whitequark> tnt: in cases? yes, you can use with m.Case("110010----1010"):
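A rough illustration of what a pattern such as "110010----1010" means: each `-` is a bit position that is ignored when comparing. The sketch below is plain Python, not nMigen; the helper name `matches` is made up for this example.

```python
def matches(pattern: str, value: int) -> bool:
    """Return True if `value` matches `pattern`; '-' marks a don't-care bit.

    `matches` is a hypothetical helper for illustration, not an nMigen API.
    """
    width = len(pattern)
    if value < 0 or value >= (1 << width):
        raise ValueError("value does not fit in the pattern width")
    bits = format(value, f"0{width}b")
    # A position matches if the pattern has '-' there or the same bit.
    return all(p in ("-", b) for p, b in zip(pattern, bits))


pattern = "110010----1010"
print(matches(pattern, 0b11001000001010))  # middle four bits all 0
print(matches(pattern, 0b11001011111010))  # middle four bits all 1
print(matches(pattern, 0b01001000001010))  # top bit differs
```

Only the ten fixed bit positions take part in the comparison, so the first two values match and the third does not.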
<whitequark> eddyb: what style of code
<whitequark> expanded how
<whitequark> i have no idea what you mean here
<eddyb> whitequark: really dumb small things like `=` or `<-` instead of `.eq` and `foo(...)` instead of `with m.Foo(...)`
<tnt> whitequark: no, more like in "results" of the comb logic. Like when decoding instructions, most of the time, half the output signals don't matter at all and as far as I'm concerned the logic optimizer can make them 1 or 0, whatever yields the smallest/fastest logic.
<whitequark> tnt: as a deliberate choice the base nmigen language does not include any way to get 'x
<eddyb> <3
<whitequark> i've been thinking about adding something like Rust's unsafe to nmigen, where you can opt into having those
<whitequark> but to do that, i want to first see if it actually gives any real benefit
<tnt> :(
<whitequark> because i'm not convinced it does
<whitequark> tnt: like one of the goals behind boneless is to optimize it as far as i can manually and then add 'x to it
<whitequark> and see if it makes any difference
<tnt> I know when I did my decoding logic, this had quite a bit of gain. No point in configuring an ALU path if the 'enable' bit at the end forces the output to 0.
<whitequark> is it because 'x is useful, or because 'x works around a problem elsewhere in the toolchain?
<whitequark> useful on its own*
<tnt> Oh, no, I mean 'x in verilog didn't help (at least not with yosys). I had to use an external tool where I could specify input / output and the output could include "don't cares".
<whitequark> ohhhhh
<whitequark> yeah but nmigen is a yosys frontend, if it didn't help in verilog, what would be the point of adding it to nmigen?
<eddyb> hmm, funnily enough, these classes look a bit like React but with "elaborate" instead of "render". and I've tried to build a DSL around a React-like before (which failed because proper integration would've required an entire compiler for a Java-like language, and that made me switch focus to making it easier to write toy compilers)
<tnt> whitequark: well, there was always the hope that your logic optimization improvements would make yosys more "don't care" aware :p
<tnt> I had opened https://github.com/YosysHQ/yosys/issues/765 as a simple example
<whitequark> tnt: i would actually expect that it would optimize all the other cases better
<whitequark> i.e. you would not have to insert don't cares to get faster logic
<whitequark> all you would have to do is to ensure that the "don't care" choice is not on the critical path
<whitequark> but it can still be perfectly deterministic
<tnt> The way I wrote it, I don't really "insert" don't cares. I just make the default output of decoding "doesn't matter". And then for each instruction / path, I make sure to set the actual bits that will be used.
<whitequark> yes, I know. I make the default output of decoding something simple it already does.
<whitequark> anyway
<whitequark> like I said, I'm not absolutely opposed to 'x in nMigen
<whitequark> I just think it's premature
<whitequark> for example, consider that perfectly safe Rust code can and does beat C++ at the things C++ is good at
<tnt> Sure, but I just don't consider 'x' to be unsafe :p
<whitequark> it is though
<tnt> I don't see why.
<whitequark> it's the same semantically as LLVM's `poison`
<eddyb> whitequark: you should add something like `MaybeUninit` :P
<whitequark> because if you ever take a decision based on 'x that results in observable behavior change, everything your circuit does from that point is unpredictable
<eddyb> safe to create, but to read it you need to do `.unsafeAssumeInit()` or something
<whitequark> (in general, it's possible to restrict the damage)
<whitequark> i.e. as long as 'x gets into a storage element, all bets are off, in general
<whitequark> as soon as*
<tnt> Sure, but (1) the simulator should show me that. (2) that's a bug, no different than if you assign the wrong value to the bit manually.
<eddyb> whitequark: hmm, you could theoretically ensure that it doesn't reach storage?
<whitequark> tnt: the simulator will show you that if you have a testcase that hits it. are you sure you do?
<whitequark> and it is very, very different.
<whitequark> the reason it is different is, for example, consider an FSM
<tnt> It's as much tested as the rest of the core :p
<whitequark> no matter how wrong are the control inputs to your FSM, if it has 5 states, it will stay within those 5 states
<whitequark> if you have a 'x as a control input, it can get into a state that is completely illegal
<whitequark> like if it is 1-hot encoded, it could get into a state with multiple hot bits.
<whitequark> or zero
<whitequark> I did, in fact, hit that bug
<whitequark> that's why it's unsafe: it is nondeterministic, and that nondeterminism propagates
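The one-hot failure mode described above can be sketched in plain Python. This is a toy model under a stated assumption: an 'x control input is modelled as each next-state bit's logic being free to see a different value of `go`. The 3-state ring and the names `next_state` and `is_one_hot` are invented for this example.

```python
from itertools import product


def next_state(state, go_bits):
    """One-hot ring FSM: advance when go=1, hold when go=0.

    go_bits[i] is the value of `go` as seen by the logic driving bit i.
    With a defined input all entries are equal; with 'x they may differ
    (that independence is the modelling assumption of this sketch).
    """
    n = len(state)
    return tuple(
        (state[(i - 1) % n] & go_bits[i]) | (state[i] & (1 - go_bits[i]))
        for i in range(n)
    )


def is_one_hot(state):
    return sum(state) == 1


start = (1, 0, 0)

# Defined control input: every bit sees the same value of go.
defined = {next_state(start, (g, g, g)) for g in (0, 1)}

# 'x control input: each bit's logic may resolve it differently.
undefined = {next_state(start, gs) for gs in product((0, 1), repeat=3)}

illegal = {s for s in undefined if not is_one_hot(s)}
print(sorted(defined))
print(sorted(illegal))
```

With a defined input the FSM only ever reaches one-hot states; with the modelled 'x it can also reach a state with two hot bits and a state with none, matching the "multiple hot bits, or zero" cases mentioned above.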
<eddyb> if it stays within comb logic then the situations in which you have 'x anywhere should be quantifiable and you "just" have to require that none of those situations propagate all the way down to the outputs
<whitequark> tnt: in fact you could have a formal proof that your FSM never gets into an illegal state, and then you feed 'x to that module, and it still will
<tnt> I consider it my job to prevent that propagation by design of my logic.
<whitequark> that doesn't matter one bit as to whether it's unsafe or not
<whitequark> just like C programmers consider it their job to prevent UB from being invoked, yet every one of them writes programs that are full of UB.
<tnt> meh ... I guess we'll just have to agree to disagree. To each his own views.
<eddyb> I think you downgrade it from the halting problem to a SAT problem, if you don't let it reach the state? (assuming you don't want your users to write proofs, in which case you don't even need SAT)
<whitequark> tnt: no.
<whitequark> whether 'x is unsafe is not a point we can agree to disagree on, because it has basis in fact.
<whitequark> whether 'x is *worth it* is a matter of opinion and therefore disagreement
<eddyb> s/in which case/cause if you were,/
<whitequark> eddyb: you can just add logic that tracks the 'x state
<whitequark> the problem is that this negates all efficiency advantages of using 'x in the first place
<whitequark> pretty much like -fsanitize=undefined
<whitequark> of course what it doesn't negate is tracking whether your design is stuck in an illegal state or not, which is a completely different advantage of 'x, and is actually what it is introduced for in Verilog
<whitequark> i.e. the purpose of 'x in simulation and in synthesis is different
<eddyb> tnt: I'm pretty sure it's unsafe in a similar way to languages without memory safety: it can violate local reasoning, at a distance, in ways which would otherwise be impossible
<whitequark> ^ exactly
<whitequark> using 'x is 100% fine in every case where your *complete* design is covered by formal proofs that ensure that 'x never propagates to storage elements
<eddyb> it's kind of insane, but local reasoning can let you prove some things by construction, things that would otherwise require painstaking manual proofs or hit the halting problem with automation
<whitequark> once that is no longer the case, whether *any* part of your design works correctly is down to *every other* part of your design working correctly, which is not something that designs made by humans generally do, ever
<whitequark> "not having 'x anywhere" was actually a foundational, uncompromising principle of oMigen that I'm considering relaxing in nMigen, heh
<eddyb> whitequark: hmm can I stick (something like) this on an iCEstick and have it blink an LED? (well, I'd have to slow it down a lot to see it :P) https://github.com/whitequark/Boneless-CPU/blob/master/examples/software/toggle.asm
<eddyb> oh I guess I might have to hook up the core to IO, lol
<whitequark> eddyb: note I haven't tested that core on a real FPGA at all yet
<whitequark> but in theory yes
<eddyb> tempted to just do this today so I can understand the whole process better. overall, it seems like Boneless is small enough that I can study (and maybe experiment off of) it
<eddyb> whitequark: oh heh
<whitequark> tnt: oh and one last thing. there are good ways to make 'x much safer, for example, a new $freeze cell that takes 0, 1, or x, and outputs 0 or 1, but it's unspecified which if the input is x
<whitequark> so then you can stick that cell onto every input of your module, and you get local reasoning back again
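A minimal sketch of the idea, in plain Python: the proposed cell maps 0 to 0, 1 to 1, and 'x to one unspecified but definite value, so everything downstream sees a single consistent bit. No such cell is confirmed to exist here; `freeze`, `X`, and the ring FSM are hypothetical stand-ins for illustration.

```python
import random

X = None  # stand-in for Verilog's 'x in this sketch


def freeze(value, rng=random):
    """Hypothetical model of the proposed $freeze cell.

    0 -> 0, 1 -> 1, x -> either 0 or 1 (unspecified which).
    """
    if value is X:
        return rng.choice((0, 1))
    if value not in (0, 1):
        raise ValueError("expected 0, 1, or X")
    return value


def next_state(state, go):
    """One-hot ring FSM; every bit's logic sees the same frozen `go`."""
    n = len(state)
    return tuple(
        (state[(i - 1) % n] & go) | (state[i] & (1 - go))
        for i in range(n)
    )


state = (1, 0, 0)
for raw_go in (0, 1, X, X, X):
    state = next_state(state, freeze(raw_go))
    # Whichever value x froze to, the state stays one-hot.
    assert sum(state) == 1
print(state)
```

The result is still nondeterministic when the input is 'x, but the nondeterminism is confined to a choice between two legal behaviours, which is what restores local reasoning.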
<eddyb> that's funny, this still happens: `*** buffer overflow detected ***: iceprog terminated`
<eddyb> (with the iCEstick LED example from the icestorm repo)
<whitequark> but then it wouldn't be Verilog, it would be Yosys' Safer Verilog or something, since there is no way to get the same behavior from Vivado
<whitequark> which is why it'll never be widespread.
<eddyb> I don't even know what that is from, maybe NixOS compiles some sanitizer into the binary or something lol
<whitequark> eddyb: glibc prints those i think
<eddyb> lol ERROR: Could not install packages due to an EnvironmentError: [Errno 30] Read-only file system: '/nix/store/4c4ajgdnhlqk994hilagk5cgv7vw9yzg-python3-3.7.3/lib/python3.7/site-packages/six.py'
<eddyb> I should look up how this is actually supposed to be done :P
<eddyb> whitequark: how do you actually run the tests? my naive attempts don't get very far
<whitequark> python3 setup.py test
<whitequark> or python3 -m unittest
<eddyb> OOOOOH
<eddyb> okay I see. `pip install .` also worked to get me `boneless-as`
lopsided98 has quit [Ping timeout: 276 seconds]
<whitequark> yep
<whitequark> or `python3 setup.py develop --user`
lopsided98 has joined ##openfpga
<eddyb> oh, cool, VSCode's Python extension autodetected my venv
<eddyb> whitequark: how do I run the "main functions" in alsru/control/core/decoder? what are they for?
<whitequark> eddyb: they're for generating verilog/rtlil for separate units of the CPU
<eddyb> ooh so they expose nmigen itself for those units?
<whitequark> what's "nmigen itself"
<eddyb> oh, I was looking at the top of the file and missed `from nmigen import cli` below. I wasn't sure if `cli` was from `nmigen` or from `boneless`
<eddyb> whitequark: like, nmigen the tool that does the things nmigen... ugh failing to use words, I should go to lunch. anyway how do I get python to run those? naive attempts fail
<whitequark> python3 -m boneless.gateware.alsru generate -t il foo.il
<eddyb> ooooh I was missing the boneless. at the start /facepalm
<eddyb> whitequark: thanks!
<eddyb> whitequark: `python -m boneless.gateware.core core-fsm+memory generate core.v` throws `nmigen.back.verilog.YosysError: ERROR: Parser error in line 30: syntax error`
<whitequark> eddyb: do you have yosys from master branch?
<whitequark> (or 0.9)
<eddyb> awww `Yosys 0.8+ (git sha1 d9daf09cf3, gcc 7.4.0 -fPIC -Os)`
<whitequark> yeah that's from 3 months ago
<eddyb> ugh I have a build server, why am I torturing myself building this locally
<eddyb> whitequark: yeah, that was it, I have a big .v now :D
<whitequark> sweet
<zignig> eddyb: I can confirm that Boneless runs on the tinyfpgaBX , i've left it running blink for a few days.
<eddyb> zignig: aww I won't be the first. do you have the code for that up anywhere?
<zignig> what board are you using ? , it's set up for tinyfpgabx and uses nmigen-boards at the moment .
<eddyb> zignig: iCEstick atm
<zignig> ok , you will need zignig/gizmotron/tree/master/v3 , and edit the core_v3.py
<zignig> MMIO?
<eddyb> memory-mapped IO
<zignig> yep,
<whitequark> boneless doesn't have mmio
<whitequark> it has a separate address space for peripherals
flea86 has joined ##openfpga
<eddyb> oh, is that what `ext` is?
<whitequark> yes
<zignig> eddyb: I have been developing on it and it's borked at the moment
<zignig> eddyb: blink.asm is fixed now.
<eddyb> cool :D
<zignig> just change the nmigen import to icestick, change BB to icestick, and get rid of the resources at the top.
<eddyb> zignig: I can just copy this and run it in a venv with boneless installed, right?
<cr1901> python3 path/to/nmigen/file.py -h, if you're using nmigen.cli
<cr1901> Ahhh shit, scrollback fail, nevermind
<eddyb> cr1901: that is the kind of thing that python didn't like, heh. I think because there are dependencies to other parts of boneless
<cr1901> I'd have to see the error, I haven't built boneless since Jan or so.
<zignig> eddyb: no idea , should be good! :)
<zignig> cr1901: have you looked at the new v3 code? the arch code has some dense python magic in it.
<eddyb> cr1901: that is, it works via e.g. python -m boneless.gateware.core but not python boneless/gateware/core.py
<zignig> cr1901: are you looking to rewrite your simulator for core_v3?
<whitequark> i think instruction semantics should be a part of arch., not a separate simulator
<whitequark> then the formal model can be generated from it too
<eddyb> zignig: if you like bitfield metaprogramming tricks, check this out https://github.com/eddyb/wiREd/blob/master/disasm/arm.js#L209
<cr1901> zignig: Basically, it looks like my code is obsolete and I have nothing to do :P
<zignig> a good plan , perhaps the instructions should have a python version of the operation as a field
<whitequark> yeah that's the idea i had
<zignig> cr1901: sort of, I found the simulator really useful for debugging the asm.
<zignig> whitequark: class ADD (C_ARITH, M_RRR, T_ADD, F_RRR ): sim = "rd = ra + rb"
<whitequark> not really like that no
<cr1901> The point is if insn semantics are part of the arch, then there is little code for me to actually write to do a simulator.
<eddyb> zignig: btw, `#!/usr/bin/python` is not portable (and might not even play nice with a venv)
<eddyb> you should use `#!/usr/bin/env python` (or even better, `#!/usr/bin/env python3`)
<zignig> eddyb: yeah , it's still in the "hacky crap" phase of development ;)
<eddyb> (mostly pointing it out because I hadn't even seen that use of `env` before NixOS - where it's the entirety of `/usr`. `/bin` only has `/bin/sh` and `/lib` doesn't exist :P)
<zignig> eddyb: fixed.
<whitequark> /usr/bin/env is very common outside of nixos
<eddyb> yeah I just mean there's a chance someone might never notice it until it actually makes a difference (which it sure can outside of NixOS, but it's not guaranteed :P)
<eddyb> zignig: `nmigen.build.res.ResourceError: Resource clk16#0 does not exist`
<eddyb> let me actually check what this supports
<cr1901> icestick doesn't have a clk16
<zignig> eddyb: you will have to use the clock on the ice stick , change it to clk12
<eddyb> 16 refers to 16MHz?
<cr1901> it has a clk12
<zignig> eddyb: it does , it's the default clock of the board
<cr1901> How to abstract something like this away is an open discussion right now
<zignig> cr1901: what do you mean ?
<cr1901> i.e. it should be possible to write platform-agnostic designs w/o much effort. Multiple ways to do it (subclass, mixins). I'm not sure which way is best, and others have asked too.
<eddyb> oh fun `unrecognised option '--placer'`
<whitequark> no subclasses or mixins
<whitequark> i've described the design this will use on the issue tracker, you can read it
<zignig> nmigen-boards is a very good step towards that. I think a 'default' clock might be a good plan, for beginners.
<eddyb> it's not even clear what's outputting that
<zignig> eddyb: need the latest nextpnr
<eddyb> I was about to ask, lol
<cr1901> which issue on the issue tracker?
* zignig has migen,yosys and nextpnr pull and build on a cron job.
<cr1901> whitequark: Oh look, I'm tagged in that. This works fine for clocks specifically. I'm thinking of a different issue, such as how one would rewrite hdmi2usb's build infrastructure to take advantage of nmigen's dep injection.
<whitequark> hmmm
<whitequark> what do you mean?
<cr1901> I was looking to reduce code duplication in hdmi2usb, which has a tendency to create SoC's per platform (Base, Video class, etc) in their own isolated Python files. I have been wondering how to leverage the "platform" input to elaborate to nmigen, such that >>
<cr1901> _all_ platforms ultimately share the same single SoC Elaboratable for a given SoC class (Base, Video, etc)
<cr1901> and then use either subclassing or mixins that the SoC Elaboratable uses to completely abstract away platform differences.
<whitequark> ohhhh I see
<cr1901> I hadn't quite worked out how it concretely works.
<whitequark> I agree that requires research. I do not have any especially good ideas for designing that.
<eddyb> zignig: so this is fun, I think I can just run `nix build --store ssh-ng://build.lyken.rs` and it will build all the dependencies I have in `default.nix` on the server (if the official build servers don't have them, or I'm overriding their sources to get up-to-date versions)
<cr1901> Somewhere in the m-labs scrollback this came up. Getorix (sp) seems to be very interested in it too
<eddyb> (that server has -j48 and it doesn't make noise in the office :P)
<zignig> eddyb: looked at nix but not had a try yet, sounds like gentoo without the extreme waiting and rollback.
<eddyb> I mean, there are official build servers, like with most distros. but I guess you could compare it to gentoo in terms of being able to customize packages
* zignig observes that whitequark's 'not especially good ideas' are still probably awesome.
<eddyb> it's not perfect and sometimes you have to fight it to do something that assumes too much about Linux
<eddyb> but hey, if steam works on it, anything is possible, right :P?
<zignig> eddyb: indeed , do you have blinky yet , huh huh huh ?
<eddyb> zignig: it's building pypy3 for some reason, and taking a long time to do so
<zignig> cr1901: have you got a v3 on a icebreaker yet ?
<cr1901> I haven't tried, and I'm a bit preoccupied till Sunday most likely
<zignig> cr1901: away mission or just busy ?
<cr1901> but in my plans
<cr1901> both?
<zignig> :/
<cr1901> :P
<cr1901> but in my plans
<cr1901> it'll be useful on icebreaker plus 128kB SRAM
<zignig> cr1901: oooh, 64KW is 128KB ... nice, need an SRAM driver. sounds like a plan.
<eddyb> while this is compiling every python package in existence or whatever it's doing, I'll go work on the world's most inefficient parser for arbitrary CFGs
* zignig hands eddyb a go faster button.
<eddyb> it's not updating often but it's outputting stuff like `[rtyper] specializing: 15800 / 157489 blocks (10%)`
<eddyb> zignig: is the button connected to a server with more than 48 logical cores :P?
<whitequark> zignig: doesn't icebreaker have SPRAM?
<whitequark> and of course boneless is specifically made to use SPRAM well
<zignig> whitequark: not that I am aware of , but cr1901 might have a pmod.
<whitequark> no I mean on the FPGA
<cr1901> No I meant SPRAM
<cr1901> I was being sloppy
<cr1901> zignig: Port this if you're bored. I wrote it for micropython support for mithro. It's known to work: https://github.com/timvideos/litex-buildenv/blob/master/gateware/ice40.py#L6
<whitequark> oh yeah I forgot we don't have SPRAM inference...
<cr1901> It's wishbone, so it's misoc/litex compat (no idea about heavyX)
<zignig> cr1901: don't have any SPRAM at the moment, i've only got a tiny BX at the moment. will get an EX or an orangecrab
<zignig> when they come out.
<whitequark> SPRAM is on-chip single port RAM on the iCE40UP5K
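The sizes mentioned above line up as follows. This is a small arithmetic check, assuming 16-bit Boneless words and the iCE40UP5K's four 256 Kbit SPRAM blocks; the constant names are made up for the example.

```python
WORD_BITS = 16            # assumed Boneless word size
WORDS = 64 * 1024         # "64KW"

total_bits = WORDS * WORD_BITS
total_bytes = total_bits // 8

SPRAM_BLOCK_BITS = 256 * 1024  # one SPRAM block on the iCE40UP5K
SPRAM_BLOCKS = 4

print(total_bytes // 1024, "KiB")        # size of a 64 KW address space
print(total_bits // SPRAM_BLOCK_BITS)    # SPRAM blocks needed to back it
```

So a full 64 KW address space is 128 KiB, which is exactly the combined capacity of the four SPRAM blocks.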
<zignig> whitequark: ah, ok. does it need a special driver? or does nextpnr infer it?
<whitequark> it needs the code cr1901 linked you, for now
<whitequark> this will be improved in yosys some day
<zignig> noted.
Asu` has joined ##openfpga
Asu has quit [Ping timeout: 268 seconds]
<zignig> ah , Instances. that reminds me , need to look into Warmboot and PLL at some point.
<tnt> zignig: note that you can't initialize the spram, so you need to "boot" from EBR.
<eddyb> one could make an UART bootloader, right?
<whitequark> tnt: ohhhhh
<whitequark> this makes me think if boneless' careful adjustment to only ever use 1 port (in the smallest configuration) is actually bad
<eddyb> or maybe read it in via SPI
<tnt> whitequark: yeah, the slight detail I had overlooked at first :p
<zignig> eddyb: I've written most of one for the defunct core_v2, which I will port once I have a handle on the assembler.
<whitequark> because if I use a BRAM in front...
<whitequark> it might as well be a cache
<whitequark> hmm
<whitequark> tnt: wild idea
<whitequark> a special hack in the instruction decoder that hardcodes memory writes from the external bus
<whitequark> I think this can be done very cheaply, possibly even at 0 LUT cost
<whitequark> so it would just cycle "read external, write memory, increment"
<zignig> whitequark: some magic in the BusArbiter ?
<whitequark> maybe reusing the PC counter to do the increment
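The "read external, write memory, increment" cycle can be pictured as the loop below. It is only a behavioural sketch in Python, not the decoder change being discussed; that the external bus is addressed by the same counter, and the names `boot_copy` and `read_external`, are assumptions made for this example.

```python
def boot_copy(read_external, memory, count):
    """Copy `count` words from an external bus into `memory`.

    The program counter doubles as the copy index, mirroring the idea of
    reusing the PC for the increment (an assumption of this sketch).
    """
    pc = 0
    while pc < count:
        word = read_external(pc)   # read external
        memory[pc] = word & 0xFFFF # write memory (16-bit words assumed)
        pc += 1                    # increment
    return pc


# Usage: a fake external bus backed by a list standing in for a boot ROM.
rom = [0x1234, 0xBEEF, 0x0000, 0xFFFF]
ram = [0] * 8
final_pc = boot_copy(lambda addr: rom[addr], ram, len(rom))
print(ram[:4] == rom, final_pc)
```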
<tnt> eddyb: yeah, at one point I was just having a small hardcoded spi flash reader that preloaded the spram and then released the reset line of a picorv32 once it had been loaded.
<whitequark> yeah I think I'll do it that way
<whitequark> because it saves a few muxes on the critical path
<whitequark> zignig: no, purely in the decoder
<eddyb> heh I think there was an arch that even put special instruction forms for `FFxx` addresses and half of that was RAM, the other half IO
<eddyb> am I thinking of the gameboy?
<eddyb> s/put/had
<whitequark> lots of arches have something like that
pie_ has joined ##openfpga
<cr1901> mips is like that: depending on which memory address you access, you bypass cache, MMU, or both
<cr1901> or neither*
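As a rough picture of the MIPS behaviour described above, here is the classic MIPS32 kernel-mode address map as a Python lookup. The segment boundaries are the standard ones; treating kseg0 as always cached is a simplification, and the names `Segment`, `mips32_segment`, and `unmapped_physical` are invented for this sketch.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    name: str
    uses_tlb: bool  # True: translated by the MMU; False: fixed mapping
    cache: str      # "cached", "uncached", or "per-page" (set by TLB entry)


def mips32_segment(vaddr: int) -> Segment:
    """Classify a 32-bit virtual address by MIPS32 segment."""
    if not 0 <= vaddr <= 0xFFFF_FFFF:
        raise ValueError("not a 32-bit address")
    if vaddr < 0x8000_0000:
        return Segment("kuseg", True, "per-page")
    if vaddr < 0xA000_0000:
        return Segment("kseg0", False, "cached")    # bypasses MMU only
    if vaddr < 0xC000_0000:
        return Segment("kseg1", False, "uncached")  # bypasses MMU and cache
    return Segment("kseg2", True, "per-page")


def unmapped_physical(vaddr: int) -> int:
    """Physical address for kseg0/kseg1: strip the top three bits."""
    if mips32_segment(vaddr).uses_tlb:
        raise ValueError("address is TLB-mapped")
    return vaddr & 0x1FFF_FFFF


print(mips32_segment(0xBFC0_0000).name, hex(unmapped_physical(0xBFC0_0000)))
```

kseg0 and kseg1 are two windows onto the same low 512 MiB of physical memory, differing only in whether accesses go through the cache.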
<eddyb> cr1901: that stuff had me very confused until someone (whitequark?) explained it to me
<eddyb> because I started from what other people have tried to document about the N64, I kept thinking it was a weird thing Nintendo did, instead of something specific to MIPS
<cr1901> MIPS also doesn't have page table walks, so you have to write the code to figure out whether a page is in memory manually. Honestly, I think this is less hassle than doing it in h/w and wish riscv didn't mandate a hardware walk.
<cr1901> Mips solution seems pretty good to actually getting MMU shit to work (oh wow I praised MIPS)
<whitequark> software walks are very hard to get right
<whitequark> first, you have to pin your tlb handler in the tlb, which wastes often a lot of space in it
<whitequark> and in general the logic to keep it there is not trivial
<whitequark> second, your tlb handler uses registers, so you need some way to save the registers
<whitequark> but you can't assume you can access pretty much any other memory
<cr1901> you need to pin two tlb entries- one for the handler, and... the other I forget, tbh :P
<whitequark> for the page tables?
<cr1901> might've been something more specific.
<cr1901> whitequark: It's been a while since I studied it. I remember thinking at the end "this seems like so much less hassle than testing and implementing hardware walk logic". I can appreciate that it only _seems_ easier, and that doing either s/w or h/w walk is shit to implement.
<whitequark> iirc there was something about VIVT/PIPT TLB and such
<whitequark> but MMUs and caches were never my strong point
<cr1901> Somebody uses those?
<cr1901> eddyb: Yea another thing... you ever heard the term VIPT?
<whitequark> no I mean, IIRC with sw walk you have fewer possible combinations
<whitequark> but I'm not sure
<eddyb> cr1901: I don't think so, no?
<tnt> whitequark: took me a bit of time to understand what you meant, but yeah, that kind of instruction would definitely be useful for a lot of things when moving data in from external devices into local memory. The opposite ( read memory / write external ) would also be useful I think if it can fit at zero cost in the decoding logic.
<eddyb> ugh I just remembered a stray thought from yesterday ("ccNUMA over air")
<whitequark> tnt: hopefully
<whitequark> but I need to finish this yosys pass first
<cr1901> eddyb: Short version, VIPT is an optimization where cache and MMU accesses are done slightly in parallel. It is extremely common. It also leads to a lovely situation where two addresses in cache can point into the same page (or is it "two different pages can point to the same address"?).
<cr1901> So you have to write a "page coloring algorithm" to ensure that pages will never be aligned in such a way that aliasing occurs, and it's all just a fricking mess I don't understand :'D
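The aliasing condition can be made concrete with a little arithmetic. In a VIPT cache the set index comes from virtual address bits; if one cache way is larger than a page, some index bits lie above the page offset, and two virtual mappings of the same physical page can land in different sets. The sketch below is illustrative Python with made-up function names and example cache geometries.

```python
def page_colors(cache_size: int, ways: int, page_size: int) -> int:
    """Number of page colours: how many pages fit in one cache way."""
    way_size = cache_size // ways
    return max(1, way_size // page_size)


def color(vaddr: int, cache_size: int, ways: int, page_size: int) -> int:
    """Colour of the page containing `vaddr`."""
    return (vaddr // page_size) % page_colors(cache_size, ways, page_size)


def may_alias(va1: int, va2: int, cache_size: int, ways: int,
              page_size: int) -> bool:
    """Two virtual mappings of one physical page can alias in the cache
    only if their colours differ."""
    return (color(va1, cache_size, ways, page_size)
            != color(va2, cache_size, ways, page_size))


KIB = 1024
# 32 KiB, 2-way, 4 KiB pages: 16 KiB per way, so 4 colours.
print(page_colors(32 * KIB, 2, 4 * KIB))
# 32 KiB, 8-way, 4 KiB pages: 4 KiB per way, so 1 colour, no aliasing.
print(page_colors(32 * KIB, 8, 4 * KIB))
print(may_alias(0x0000, 0x1000, 32 * KIB, 2, 4 * KIB))
print(may_alias(0x0000, 0x4000, 32 * KIB, 2, 4 * KIB))
```

A page colouring allocator keeps all virtual mappings of a physical page on the same colour, so `may_alias` is never true for them.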
<eddyb> tnt: next thing you know you're implementing macroop fusion :P
<eddyb> (if Boneless had compressed instructions, would they be one byte each?)
<whitequark> boneless already has multicycle instructions and multiword instructions
<eddyb> oh right nvm
flea86 has quit [Quit: Goodbye and thanks for all the dirty sand ;-)]
dh73 has joined ##openfpga
Sprite_tm has quit [Remote host closed the connection]
rohitksingh_work has quit [Read error: Connection reset by peer]
emeb has joined ##openfpga
genii has joined ##openfpga
Asu` has quit [Ping timeout: 246 seconds]
Asu has joined ##openfpga
rohitksingh has joined ##openfpga
cr1901 has quit [Quit: Leaving.]
cr1901 has joined ##openfpga
carl0s has joined ##openfpga
moho1 has quit [Quit: WeeChat 2.2]
<eddyb> zignig: wow this took forever and didn't even succeed "builder for '/nix/store/r9s808a51vrag0hhsscrr58y2sgsh8a3-pypy3-7.0.0.drv' failed with exit code 1"
rohitksingh has quit [Ping timeout: 248 seconds]
cr1901 has quit [Quit: Leaving.]
cr1901 has joined ##openfpga
m4ssi has quit [Remote host closed the connection]
Sprite_tm has joined ##openfpga
<Sprite_tm> Hey guys/gals, I'm getting an issue when programming my ECP5 using openocd/jtag...
<Sprite_tm> Error: tdo check error at line 26713
<Sprite_tm> Error: READ = 0x6c00000
<Sprite_tm> Error: WANT = 0x0000100
<Sprite_tm> Error: MASK = 0x0002100
<Sprite_tm> I can decode this using the ECP docs into Config: SRAM, SPIm fail 1, BSE error:User aborted configuration, Execution error
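For reference, the READ/WANT/MASK triple in a "tdo check error" is compared bitwise: only bits set in MASK are checked, and each of those must be equal in READ and WANT. A small Python sketch using the values from the log (the function name is made up; what each bit means comes from the ECP5 documentation, not from this code):

```python
def tdo_mismatch(read: int, want: int, mask: int) -> int:
    """Return the bits that are checked (mask=1) and differ."""
    return (read ^ want) & mask


read, want, mask = 0x6C00000, 0x0000100, 0x0002100
diff = tdo_mismatch(read, want, mask)
bad_bits = [i for i in range(diff.bit_length()) if (diff >> i) & 1]
print(hex(diff), bad_bits)  # 0x100 [8]
```

Of the two checked bits (8 and 13), only bit 8 fails the comparison: it was expected to read 1 and read 0. The other bits set in READ are outside the mask and do not affect the check.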
<Sprite_tm> Unfortunately, that still doesn't tell me what's actually going wrong here. FWIW, jtag frequency doesn't seem to matter.
<daveshah> What board and environment?
<Sprite_tm> Custom board. ECP5 LFE5U-45F 8BG381C
<Sprite_tm> What do you mean by 'environment'?
<daveshah> Programmer
<daveshah> and anything else relevant (other JTAG devices, etc)
<Sprite_tm> Tiao Tumpa FT2232H board. It's known to work.
<Sprite_tm> No other JTAG devices.
<daveshah> Is this your first test of this board, or has it worked before/with Diamond?
<Sprite_tm> No, 1st test. It's a bringup test, at the moment 2/2 boards fail :/
<Sprite_tm> I have no clue why. I had pretty similar hardware on the previous incarnation, with the most striking difference that that used an -uM-45F.
<daveshah> Have you changed the script to the correct device type?
<Sprite_tm> Yes. Both the openocd script as well as the ecppack command line.
<daveshah> You should change nextpnr, not ecppack
<daveshah> For all but the 12k, no device argument to ecppack is needed
<Sprite_tm> Ah, sorry, nextpnr indeed. Ecppack didn't have an argument.
<Sprite_tm> or no device argument at least.
<daveshah> Just to check, can you post the full command line (although a bad device ID would probably fail much earlier) for nextpnr and ecppack?
<Sprite_tm> nextpnr-ecp5 --json $< --lpf $(CONSTR) --textcfg $@ --45k --package CABGA381 --speed 8 --freq $(FREQ)
<Sprite_tm> ecppack --spimode $(FLASH_MODE) --freq $(FLASH_FREQ) --svf-rowsize 100000 --svf $(PROJ).svf --input $< --bit $@
<daveshah> Can you try ecppack without freq and spimode? It might be confusing things
<Sprite_tm> O_O
<Sprite_tm> That works.
<Sprite_tm> Any idea what that's about? I'd love to keep those args tbh, boot is 4 secs or so otherwise.
<Sprite_tm> If not, I'll just experiment to see what works btw.
<daveshah> Those commands set various SPI-flash specific things in the bitstream, which it seems the ECP5 won't accept via JTAG
<Sprite_tm> Also, mode is qspi and flash_freq is 38.8 (MHz).
<Sprite_tm> Huh. I'm 99% sure the -UM-45F had no issue with it.
<Sprite_tm> Can't test as I had to rewire my JTAG-cable :/
<daveshah> Odd. Maybe different silicon revisions or something
<Sprite_tm> Perhaps indeed. Ah well, good to know.
<Sprite_tm> Thanks for the help, I'd've been off checking about five million things before I'd suspect those settings.
<Sprite_tm> If anything, it's on the Internet now so other people will get a search result :)
<daveshah> Anyway, just pushed a fix so this won't happen again (SPI options won't be included in the SVF file)
<whitequark> daveshah: https://imgur.com/a/RHiXInO
<whitequark> i'm using a borderline insane approach but it seems to actually work pretty well
<daveshah> Oh nice, think I understand it a bit better looking at that
rohitksingh has joined ##openfpga
<whitequark> it's a BDD! it's a CFG! it's an SSA-like IR! it's a combination etc
<whitequark> what i realized is that yosys' switch statements are exactly isomorphic to ML pattern matching
<whitequark> so i simply did the CPS transform on it and then used a standard approach from Le Fessant's paper
moho1 has joined ##openfpga
<whitequark> this produces code linear in size *and* is polynomial complexity itself
<whitequark> except now there's a proc_* pass that contains a complete optimizing compiler with an IR and several passes
<Sprite_tm> daveshah: Awesome, that's a very rapid turnaround time :P
<daveshah> whitequark: hehe, I think we will add a few more things like that as we add more and more higher-level optimisations
<whitequark> yes but, yo dawg
Miyu has quit [Ping timeout: 272 seconds]
azonenberg_work has joined ##openfpga
cr1901 has quit [Quit: Leaving.]
cr1901 has joined ##openfpga
bubble_buster has quit [Ping timeout: 257 seconds]
bubble_buster has joined ##openfpga
daveshah has quit [Ping timeout: 257 seconds]
Jybz has joined ##openfpga
daveshah has joined ##openfpga
rohitksingh has quit [Ping timeout: 244 seconds]
Asu is now known as Asu_
mkdir has joined ##openfpga
<mkdir> hello, how do you declare float var in verilog
<mkdir> 'real' type throws error
<whitequark> what error? in what program?
<mkdir> ERROR: syntax error, unexpected TOK_REAL
<mkdir> ice40
<mkdir> sorry yosys
<whitequark> floats are not synthesizable
<whitequark> they're simulation only
<mkdir> mm so what is real used for? and what's a better type for floats? reg[63:0] hz = 0.5
<mkdir> works okay
<whitequark> don't expect to be able to use floats like you'd do it in C
<whitequark> you will need dedicated logic that implements your desired float semantics
<mkdir> oh
<whitequark> synthesizable verilog arithmetic is *only* integers (well booleans too)
<mkdir> hmm so how do we deal with real numbers
<mkdir> and why does the real type exist?
<mkdir> i thought it was for floats
<whitequark> i told you: real is simulation only
<mkdir> ooh
<whitequark> it's for e.g. comparing your logic's behavior to something more precise
<whitequark> or for representing voltage levels or something
<whitequark> 98% of verilog cannot be used at all in synthesis
<mkdir> so this works: reg[7:0] hz = 0.5;
<mkdir> but maybe not the way i think?
<whitequark> i think that just coerces to integer and ends up 0 or 1
<mkdir> ah
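A common way to carry fractional values through integer-only logic is fixed-point: scale the value by a power of two and store the result as an integer. The sketch below is a software model only; the helper names and the choice of 8 fractional bits are assumptions for illustration, not anything from the log.

```python
FRAC_BITS = 8  # assumed number of fractional bits (illustrative choice)


def to_fixed(value: float, frac_bits: int = FRAC_BITS) -> int:
    """Encode a non-negative real as an unsigned fixed-point integer."""
    return round(value * (1 << frac_bits))


def from_fixed(raw: int, frac_bits: int = FRAC_BITS) -> float:
    """Decode a fixed-point integer back to a float (for checking only)."""
    return raw / (1 << frac_bits)


def fixed_mul(a: int, b: int, frac_bits: int = FRAC_BITS) -> int:
    """Multiply two fixed-point values and rescale by dropping the extra fractional bits."""
    return (a * b) >> frac_bits


hz_raw = to_fixed(0.5)                 # 0.5 * 256 = 128
three_raw = to_fixed(3.0)              # 768
product = fixed_mul(hz_raw, three_raw)  # represents 1.5
```

In hardware the same idea is an ordinary integer register whose binary point is a convention kept by the designer; only the integer is ever stored.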
<daveshah> You probably don't want to be using floating point stuff on an iCE40 anyway
<whitequark> also that, especially not 64 bit floats
<mkdir> hmm ok
<daveshah> I remember someone who copied some HSV to RGB code into an HLS tool, asked for a single cycle implementation, and got about an 80k LUT design out
<daveshah> Because it was all 64 bit floats
<whitequark> lol
<daveshah> You can hit similar things even with integers using division or modulo (by anything other than a constant power of two)
<whitequark> how would yosys even implement /5?
<mkdir> so what data types are recommended?
<mkdir> reg and wire?
<mkdir> what's the diff between
<ZirconiumX> You can store stuff to a reg but not a wire
<whitequark> reg can be assigned from `always` statement, wire can be assigned with the `assign` statement
<whitequark> note that reg is not necessarily implemented as a register
<daveshah> Just lots of subtracts and compares...
<mkdir> but reg can be assigned outside of always too right
<mkdir> but wire cannot?
<ZirconiumX> So non-restoring division?
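To make "lots of subtracts and compares" concrete, here is a software model of restoring division, one compare and one conditional subtract per quotient bit. This is a sketch of the textbook scheme, not a statement of what yosys actually emits; the function name and the `width` parameter are made up for the example.

```python
from typing import Tuple


def restoring_divide(dividend: int, divisor: int, width: int) -> Tuple[int, int]:
    """Divide an unsigned `width`-bit dividend, returning (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    remainder = 0
    quotient = 0
    # Walk the dividend from the most significant bit down.
    for bit in range(width - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        if remainder >= divisor:      # the "compare"
            remainder -= divisor      # the "subtract"
            quotient |= 1 << bit
    return quotient, remainder


def divide_by_four(value: int) -> Tuple[int, int]:
    """Division by a constant power of two needs no logic at all: a shift and a mask."""
    return value >> 2, value & 0b11
```

Unrolled into combinational logic, each loop iteration becomes one comparator plus one subtractor stage, which is why a full-width divide in a single cycle gets large quickly.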
<whitequark> mkdir: nope, reg can only be assigned from always
<whitequark> and wire only from assign
<whitequark> the distinction between them (for synthesis) is purely syntactical
<mkdir> whitequark: https://pastebin.com/LnTG116H
<mkdir> look at code from Lattice
<whitequark> sure
<mkdir> reg types are defined outside always
<mkdir> oh
<mkdir> I see
<mkdir> wire cannot be inside
<whitequark> you define both wire and reg outside always, generally
<mkdir> hmm i see well then maybe i did not understand the previous statement
<mkdir> "mkdir: nope, reg can only be assigned from always"
<whitequark> you can't write `reg x; assign x = ...`
<ZirconiumX> If you see here, div_cntr* are all reg, and all have non-blocking <= assignments in an always block
<whitequark> you have to write `reg x; ... always @(...) x <= 1`
<whitequark> or `x = 1`
<whitequark> depending on whether the always block is clocked or not
<ZirconiumX> Outside the always block, the LED* outputs (which are wires) are assigned
<mkdir> ooh ok thanks i see
<azonenberg> Anybody here have a few minutes to sanity check my understanding of some bignum/crypto code?
<whitequark> nope too scared to look
<azonenberg> lol
mkdir has quit [Remote host closed the connection]
mkdir has joined ##openfpga
Lord_Nightmare has quit [Quit: ZNC - http://znc.in]
Lord_Nightmare has joined ##openfpga
mkdir has quit [Ping timeout: 260 seconds]
carl0s has quit [Remote host closed the connection]
SpaceCoaster has quit [Quit: ZNC 1.6.5+deb1+deb9u2 - http://znc.in]
SpaceCoaster has joined ##openfpga
emeb_mac has joined ##openfpga
Miyu has joined ##openfpga
<kc8apf> bignum, sure
<kc8apf> crypto, nope
<TD-Linux> might be interesting to convert integer divides to dsp blocks instead
Asu_ has quit [Remote host closed the connection]
Asu has joined ##openfpga
<whitequark> tnt: looked through boneless in search of places where 'x could help.
<azonenberg> kc8apf: i'm trying to port the C reference implementation of x25519 in NaCl to HDL
<whitequark> found two where 5 levels of muxes (2 luts) become a wire instead
<whitequark> and yes, my pass can gracefully handle that, in fact, although it doesn't right now
<kc8apf> why NaCL?
<azonenberg> Because that was the most readable 25519 implementation i found
<azonenberg> djb's ref implementation on the website was in assembly
<azonenberg> the versions in libsodium were all fancy optimized CPU code
<tnt> whitequark: how does it "know" it can be reduced to wires ?
<azonenberg> undoing all the bignum stuff and converting back to a straight logic[255:0] is a big pain
<azonenberg> the nacl "ref" implementation is the least optimized one i could find
<azonenberg> so i had the least work to undo :p
<kc8apf> ugh. premature optimization
<whitequark> tnt: well, the pass can see that 4 of 5 decision points (mux levels) lead either to one particular branch, or 'x
<kc8apf> I really shouldn't sign up for any additional projects. I'm already getting lost in Cyclone V docs
<azonenberg> and i use "readable" lightly
<whitequark> so it discards the branch that leads to 'x together with the decision point itself
<kc8apf> and I should be poking at BMCs
<azonenberg> note the 100% lack of comments outside the header
<whitequark> that reduces it to 1 decision point, which looks like `i_insn[5] ? 1 : 0`
<whitequark> which will then be reduced to a wire by a later pass
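The pruning described above can be modelled on a toy decision tree: any decision point with a branch that leads to 'x is dropped together with that branch. This is only a sketch of the idea, not the actual pass; the tuple encoding and every select name other than `i_insn[5]` are hypothetical.

```python
X = "x"  # marker for a don't-care result


def simplify(node):
    """Collapse decision points whose branches are don't-care or identical.

    A decision point is a tuple (select, if_zero, if_one); anything else is a leaf.
    """
    if not isinstance(node, tuple):
        return node
    select, if_zero, if_one = node
    if_zero = simplify(if_zero)
    if_one = simplify(if_one)
    if if_zero == X:
        return if_one      # the 'x branch is discarded along with the decision
    if if_one == X:
        return if_zero
    if if_zero == if_one:
        return if_zero     # both branches agree, so the select is irrelevant
    return (select, if_zero, if_one)


# Five mux levels; four of them lead either onward or to 'x.
last_decision = ("i_insn[5]", 0, 1)
tree = ("sel_a", X,
        ("sel_b",
         ("sel_c", X,
          ("sel_d", last_decision, X)),
         X))

reduced = simplify(tree)
```

Here `reduced` is just the final `i_insn[5] ? 1 : 0` decision, which a later step can recognise as a plain wire.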
<azonenberg> ultra dense code with almost no whitespace
<tnt> whitequark: ah oki, so it's using 'x' to mark "don't cares". I had mis-understood, I thought somehow the pass "magically" knew which ... I didn't understand how obviously.
<azonenberg> no descriptions of what the code actually DOES
<azonenberg> no theory of operation, etc
<whitequark> yes, that was me considering whether boneless has any places where 'x in decoder would actually help
<azonenberg> i'm slowly porting things and still don't actually understand how some parts of it work
<whitequark> it would not improve clock speed, but it would shave a few luts... maybe like 5
<kc8apf> classic C programmers worried about all the wrong things
<kc8apf> see all the reciprocal handling, probably unnecessary
<azonenberg> like, i've written ultra optimized code too
<kc8apf> but everyone thinks they need to avoid standard libraries
<azonenberg> But it was HEAVILY commented
<azonenberg> yeah, but at least i know this version works
<azonenberg> So as long as I feed test vectors to it and to my code, i can make mine readable :p
<tnt> whitequark: I mean, depending how you coded the decode logic it might/might-not help. Like if you kind of "manually" assigned those bit of the op code to directly control that mux and hardwired it already, obviously it won't help much. (that's just a random example)
<whitequark> tnt: so what I did in the boneless decoder is that it is 100% table driven
<whitequark> i.e. it uses pattern matching to map input to output, for every input and output
<whitequark> even when there is actually a field in the instruction
<whitequark> the idea is that, best case, the synthesis tool unfolds all that and it's just wires
<whitequark> but if i make a mistake somewhere, i still get correct behavior
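A table-driven decoder of this kind can be modelled in software as a list of bit patterns with `-` as a wildcard, each mapped to a full set of outputs. The table below is invented for illustration and is not the Boneless instruction encoding; `matches`, `decode`, and the field names are hypothetical.

```python
def matches(pattern: str, value: int) -> bool:
    """Return True if `value` fits `pattern`, where '-' matches either bit."""
    width = len(pattern)
    value &= (1 << width) - 1              # keep only the bits the pattern covers
    bits = format(value, "0{}b".format(width))
    return all(p in ("-", b) for p, b in zip(pattern, bits))


# Hypothetical 8-bit encoding: every output is listed for every row,
# even when it could be read straight out of an instruction field.
DECODE_TABLE = [
    ("0000----", {"op": "ADD", "is_jump": False}),
    ("0001----", {"op": "SUB", "is_jump": False}),
    ("1-------", {"op": "JUMP", "is_jump": True}),
]


def decode(insn: int):
    """Return the outputs of the first matching row, or None if nothing matches."""
    for pattern, outputs in DECODE_TABLE:
        if matches(pattern, insn):
            return outputs
    return None
```

The trade-off is the one stated in the log: the table is redundant, so a mistake in one row cannot silently leak into other outputs, and it is left to the synthesis tool to notice when a column reduces to wires.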
<tnt> Let me have a look at the code, that might be easier for me to understand.
<whitequark> e.g. the entire self.o_cond switch is a no-op, or at least is supposed to be
<whitequark> I only helped the synthesizer in one place... manually hoisted o_cond and o_flag out of the jump opcodes, so it is not reset to 0 when the operation is not a jump
<TD-Linux> azonenberg, if you aren't tied to that particular curve the folks in #secp256k1 are pretty good
<azonenberg> I'm doing 25519 per requirements of a protocol i want to use it on
<azonenberg> doesn't support any other curves
<azonenberg> This is hyper-optimized code i wrote ten years ago for password cracking
<azonenberg> back when gpgpu was just taking off
<azonenberg> doing things like using duff's device for unrolled loops with variable iteration counts using a single conditional
<azonenberg> (which i independently invented, I didn't know this was a standard thing with a name until years later)
<tnt> whitequark: I see. But you kinda helped a bit manually by having multiple "with m.Switch(self.i_insn):"
<whitequark> tnt: did I?
<tnt> (like L347 and L358 for instance)
<azonenberg> ah ok this isnt the exact code i was looking for
<whitequark> tnt: nope, that is done solely for readability
<azonenberg> but somewhere in that tool i think i have a strcat() that never touches ram
<whitequark> tnt: actually i had to add a pass to my proc_match pass so that these nested switches would be *as efficient* as a flat one
<whitequark> ie it is worse
<tnt> whitequark: Oh wait, my bad, ABS and LIT aren't instructions.
<whitequark> oh yeah that too
<azonenberg> this is what good optimization looks like... WITH COMMENTS
Jybz has quit [Quit: Konversation terminated!]
<whitequark> tnt: i'm 95% certain i can match or beat that with my new pass
<tnt> heh yeah, I'd have to try it, but that requires rewriting it to output switch statements rather than a truth table.
<whitequark> i need to finish the pass first... i have the priority decoders taken care of
<whitequark> but no connection to pmux cells yet
<whitequark> i want to stuff in a few more optimizations too
<whitequark> one optimization to improve delay with some coding styles, and another optimization to improve techmapping of very long literal comparisons
<whitequark> currently if you compare with something like 000000001 it synthesizes into a long mux chain
<whitequark> but it should just be a $eq cell
dh73 has quit [Remote host closed the connection]
dh73 has joined ##openfpga
<ZirconiumX> Since CRC works by XORing when it reaches a most-significant 1 bit, then CRC on all-zeroes is either the polynomial or the initial register value, right?
<ZirconiumX> Trying to break a CRC on a bitstream and it's not going well
Miyu has quit [Ping timeout: 245 seconds]
<adamgreig> lots of scope for weird tricks with crc though, like reversing input/output order and inverting output
<ZirconiumX> Yeah...
<ZirconiumX> Been running backwards and forwards through stuff with that
<ZirconiumX> It's not very trivial to extract 916-byte frames from a hexdump to give to reveng
<mwk> ZirconiumX: not exactly
<mwk> it's common to use non-0 initial register value (all-1 usually)
<mwk> if the polynomial is any good, using all-0 input will effectively run the polynomial as LFSR and give you a random-looking value
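The effect of the initial register value on all-zero input is easy to see in a bitwise model. The sketch below uses a reflected 16-bit CRC with polynomial 0xA001 purely as an example; the polynomial, the reflected bit order, and the function name are assumptions, not facts about the bitstream under discussion.

```python
def crc16_reflected(data: bytes, poly: int = 0xA001, init: int = 0xFFFF) -> int:
    """Bitwise reflected CRC-16 with no final XOR (illustrative parameters)."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ poly   # LSB set: shift and apply the polynomial
            else:
                crc >>= 1
    return crc


zeros = bytes(16)

# With a zero initial value nothing ever gets XORed in, so the register stays zero.
crc_init_zero = crc16_reflected(zeros, init=0x0000)

# With an all-ones initial value the register free-runs as an LFSR over the zero input.
crc_init_ones = crc16_reflected(zeros, init=0xFFFF)
```

So a CRC over zeroes is neither the polynomial nor the initial value in general: it is the initial value stepped through the LFSR once per input bit.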
<ZirconiumX> Mmm
<hackerfoo> If you know (can guess) the exact algorithm, you could just try all 64k polynomials, right?
<mwk> no
<mwk> there are more parameters
<mwk> and there are only 64k polynomials if it's a 16-bit CRC
<mwk> also, if it's a CRC, cracking it is much much simpler
<mwk> the whole thing is linear
<mwk> well, affine with some common modifications
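"Affine" has a directly checkable consequence: for equal-length messages, the CRC of the XOR of three messages equals the XOR of their three CRCs, and the CRC change caused by flipping a given bit does not depend on the rest of the message. A minimal check, assuming the same illustrative reflected CRC-16 (polynomial 0xA001, init 0xFFFF) rather than any particular vendor's scheme:

```python
def crc16_reflected(data: bytes, poly: int = 0xA001, init: int = 0xFFFF) -> int:
    """Bitwise reflected CRC-16 with no final XOR (illustrative parameters)."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
    return crc


def xor_bytes(*messages: bytes) -> bytes:
    """XOR equal-length byte strings together."""
    out = bytearray(len(messages[0]))
    for message in messages:
        for i, value in enumerate(message):
            out[i] ^= value
    return bytes(out)


a = bytes(range(0, 8))
b = bytes(range(100, 108))
c = b"\x5a" * 8

# Affine property: f(a) ^ f(b) ^ f(c) == f(a ^ b ^ c).
lhs = crc16_reflected(a) ^ crc16_reflected(b) ^ crc16_reflected(c)
rhs = crc16_reflected(xor_bytes(a, b, c))

# A single bit flip (here bit 0 of byte 3) changes the CRC by the same
# amount regardless of the surrounding message contents.
flip = bytes([0, 0, 0, 1, 0, 0, 0, 0])
delta_a = crc16_reflected(a) ^ crc16_reflected(xor_bytes(a, flip))
delta_b = crc16_reflected(b) ^ crc16_reflected(xor_bytes(b, flip))
```

The second property is what makes bit-flip experiments on a real bitstream informative: the observed CRC difference characterises the flipped position, not the data around it.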
<ZirconiumX> Yup
<mwk> oh yes
<mwk> that's the article I was looking for
<ZirconiumX> Read it
<adamgreig> the article is a classic but surprisingly unhelpful in practice i found
<hackerfoo> This looks neat: http://reveng.sourceforge.net/
<adamgreig> unless you're on top of your linear algebra it's still a bit of a guessing game
<mwk> just beware
<adamgreig> or anyway it was for me and i was meant to be very on top of linear algebra at that point :p
<mwk> we're talking FPGAs
<mwk> the modifications can be something no one in their right mind would implement in software
<mwk> on one FPGA family, Xilinx computes CRC-16, but feeds 36-bit words into it
<mwk> which are a concatenation of just-written 32-bit data word and current 4-bit destination register address
<mwk> on another FPGA family, Xilinx computes a 22-bit CRC from 22-bit words
<mwk> and figuring out *what* goes into a CRC is often harder than figuring out the algorithm itself
<adamgreig> or at least, once you know exactly what goes in and what comes out it should be relatively quick to get the algorithm
<mwk> in my experience, it's the other way around
<mwk> first figure out more-or-less the algorithm
<mwk> inducing neighboring bitflips and comparing the resulting XOR values for many places in the bitstream is useful here
azonenberg has quit [Remote host closed the connection]
azonenberg has joined ##openfpga
<ZirconiumX> So, I literally just put in 914 zero bytes followed by 6C 93 and RevEng cracked it
<ZirconiumX> Now to see if it holds for the other data I have
<ZirconiumX> (CRC-16/MODBUS)
<ZirconiumX> Yes, it is
<ZirconiumX> Well, that was easier than expected
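For reference, CRC-16/MODBUS uses polynomial 0x8005 (0xA001 in reflected form), initial value 0xFFFF, reflected input and output, and no final XOR. A small bitwise model follows; the function name is made up, and the byte order of the trailing "6C 93" in the frame is not stated in the log, so the sketch does not assert anything about that value.

```python
def crc16_modbus(data: bytes) -> int:
    """Bitwise CRC-16/MODBUS: poly 0xA001 (reflected 0x8005), init 0xFFFF, no final XOR."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


# Standard catalogue check value for the ASCII string "123456789".
check = crc16_modbus(b"123456789")      # 0x4B37

# Appending the CRC low byte first leaves a zero residue, which gives a
# quick way to validate a whole frame, checksum included.
payload = bytes(914)                    # 914 zero bytes, as in the experiment above
frame = payload + crc16_modbus(payload).to_bytes(2, "little")
residue = crc16_modbus(frame)
```

Running the same residue test over other captured frames is a cheap way to confirm that the guess holds beyond the all-zero case.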
Asu has quit [Quit: Konversation terminated!]
dh73 has quit [Ping timeout: 260 seconds]
Bike has joined ##openfpga
genii has quit [Read error: Connection reset by peer]