<ZipCPU|Laptop>
beefok: Not that I know of. Is there something specific you are looking for?
<cr1901_modern>
ZipCPU|Laptop: At least in the context of _temporal_ induction I think it's a better explanation. Reconciling temporal induction w/ "the induction I (was supposed to have) learned in HS" >>
<cr1901_modern>
is a question I've been meaning to ask on MathOverflow, tbh
<beefok>
I'm using the iceCube2 tools, so I'm dealing with the Synplify Pro tools and I'm getting these odd errors
<beefok>
(Thanks for the quick response)
<ZipCPU|Laptop>
Can you post the errors in a gist and post the link to the gist?
<beefok>
Yeah, sec! It's one error multiple times over, so that simplifies it lol
<beefok>
For instance: :FX689 : cpu_next.vhd(64) | Unbuffered I/O u3.ma[12] which could cause problems in P&R
<ZipCPU|Laptop>
Hmm ... ok ... can you post your code at all?
<beefok>
there are 332 of these errors, one for literally every input and output of each entity that isn't top level in my module
<beefok>
Yeah that is pretty vague isn't it, sorry, sec
<ZipCPU|Laptop>
Is your code relatively simple, or extremely complex?
<beefok>
It's a full CPU design, but it's really not that complex
<ZipCPU|Laptop>
Really? I'm interested --- which CPU?
<beefok>
my own design :)
<ZipCPU|Laptop>
Even better!
<beefok>
I'm designing a video game console and every portion of it is FPGA based
<ZipCPU|Laptop>
I love it!
<ZipCPU|Laptop>
Let me know if there's anything I can do to encourage you.
<beefok>
the cpu design is old -- I've changed it since then
<beefok>
it's all in deep WIP mode
<cr1901_modern>
Saw your ABC tweet on my feed, thought "jeez, frequency illusion is _really_ hitting hard this week wrt "ppl doing formal verification""... then I realized "wait, doesn't someone in #yosys use Clash?"
<ZipCPU|Laptop>
A 12-bit register machine?
<thoughtpolice>
Well, I've been doing semi-formal methods for a while as a Haskeller. Just in a fairly different domain. (I used ABC for model checking functional cryptographic code previously, for example, so I'm somewhat familiar with it.)
<ZipCPU|Laptop>
You're brave, beefok.
<beefok>
yeah, haha
<beefok>
It's a good middle between 8-bit and 16-bit
<beefok>
it's odd, I know
<cr1901_modern>
I don't even know what ABC really does other than "AIG means And-Inverter Graph"
<beefok>
anyway this is just playing around until I get the cpu done
<cr1901_modern>
"Doing NTSC from discrete analog components" is a bucket list idea of mine
<beefok>
it uses a cp2130 usb->spi interface for programming, I didn't realize how easy it would be
<ZipCPU|Laptop>
You're not as odd as you might think. The ZipCPU originally was a 32-bit byte machine.
<ZipCPU|Laptop>
I had no end of fighting with GCC over that issue.
<beefok>
I think that makes more sense anyway
<ZipCPU|Laptop>
Nice pictures, though.
<beefok>
cr1901 -- it's oddly not too bad, though mine is all digital except an r2r dac + the ntsc burst rate clock
<beefok>
thanks!
<cr1901_modern>
By burst rate, you just mean "the clock that generates colorburst isn't on the FPGA"?
<beefok>
I have been scared of the idea of creating a gcc backend or llvm.. or
<thoughtpolice>
cr1901_modern: That's mostly what you need to know about it! :D AIGs are simply an efficient way of representing circuits for some problems, especially boolean functions ("any function that just outputs a 1 or a 0 for any input"). And that's good for formal verification, for example, to check if two circuits are equivalent: just create a boolean function f(x) = (g(x) == h(x)), which checks whether two functions 'h'
<thoughtpolice>
and 'g' give the same output for every input.
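For cr1901_modern's "what does ABC really do" question, it may help to see how small the AIG vocabulary actually is: two-input ANDs plus inverters on the edges, nothing else. A sketch in Verilog, hand-deriving XOR in that vocabulary (module and net names are invented here purely for illustration):

    module xor_as_aig (input x, input y, output f);
        // AIG node 1: AND with both input edges inverted, i.e. x OR y
        wire n1 = ~(~x & ~y);
        // AIG node 2: AND with the output edge inverted, i.e. NAND
        wire n2 = ~(x & y);
        // AIG node 3: top-level AND; (x|y) & ~(x&y) == x ^ y
        assign f = n1 & n2;
    endmodule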
<beefok>
yeah cr1901, or at least my fpga is clocked by the NTSC burst rate x 8, and then a pll doubles that so I can get 16 colors lol
<cr1901_modern>
So basically like the NES
<beefok>
exactly!
<beefok>
the color generation is a johnson counter exactly like the NES
<thoughtpolice>
You can represent that as a SAT problem, i.e. "is there any assignment to `x` where h(x) != g(x)". Connecting the output of two circuits and seeing if they're equivalent is a "miter" circuit (so miter circuits represent SAT problems, in a sense.)
<cr1901_modern>
Each of the 16 colors is a different clock phase in the 3.58MHz signal
<beefok>
yep
<cr1901_modern>
And presumably you mux them to choose which one to show
<beefok>
exactly
<beefok>
works pretty cleanly
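A sketch of the scheme beefok and cr1901_modern are describing, assuming a clock at 16x the 3.579545 MHz colorburst (module and signal names are invented, not taken from beefok's design): each stage of an 8-bit twisted-ring (Johnson) counter is a subcarrier-rate square wave offset by 1/16 of a period, the complemented stages supply the other 8 phases, and a 4-bit color index muxes among all 16.

    module ntsc_phase (
        input  wire clk16x,       // 16 x 3.579545 MHz
        input  wire [3:0] color,  // which of the 16 phases to emit
        output wire chroma        // phase-shifted subcarrier square wave
    );
        reg [7:0] ring = 8'h00;
        always @(posedge clk16x)
            ring <= {ring[6:0], ~ring[7]};   // twisted-ring (Johnson) shift
        // 8 true phases plus their 8 complements = 16 phases total
        wire [15:0] phases = {~ring, ring};
        assign chroma = phases[color];
    endmodule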
<thoughtpolice>
(That's how I used ABC for cryptography: I compile two different specifications, one in C and another that is very high level, to circuits, and then connect them with a miter and throw it in a solver)
<cr1901_modern>
miter?
<beefok>
zipcpu - I'm looking at SmallerC by alexfru as a way to get a systems language on my system
<thoughtpolice>
You can think of two circuits F and G, and a miter is just a circuit that does 'G(x) == F(x)'. So it's a circuit that just takes the output of two other circuits, and outputs 1 if they're the same and 0 if they differ.
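As a concrete (hypothetical) illustration of that in Verilog: given two tiny combinational implementations with identical ports, the miter is a few lines, and a SAT solver then searches for any `x` that raises `differ` -- UNSAT means the two are equivalent. Polarity conventions vary; this version outputs 1 on a mismatch. All three modules here are invented for the example:

    // two deliberately tiny implementations to compare (equivalent mod 256)
    module f_ref  (input [7:0] x, output [7:0] y); assign y = x + x;  endmodule
    module f_impl (input [7:0] x, output [7:0] y); assign y = x << 1; endmodule

    module miter (input [7:0] x, output differ);
        wire [7:0] y_ref, y_impl;
        f_ref  u_ref  (.x(x), .y(y_ref));    // "gold" circuit
        f_impl u_impl (.x(x), .y(y_impl));   // circuit under check
        assign differ = (y_ref != y_impl);   // 1 iff the outputs disagree
    endmodule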
<cr1901_modern>
Ahhh... there's prob a way to combine that with temporal induction to do equivalence checking
<thoughtpolice>
(FWIW, in the cryptography case there's not really a notion of sequential logic I had to deal with; you can formulate *that* problem purely as one of combinational logic, so in that case equivalence checking does not require lots of tricky stuff. So if you have combinational circuits, you can just take two, make a miter of them, and throw that directly at a SAT solver with very little effort, mostly)
<cr1901_modern>
That's why I was thinking of temporal induction :P
* cr1901_modern
wishes he had 1642 followers, but that would imply being a FP wizard, which is _not_ going to happen
<thoughtpolice>
Pretty useful trick in practice, though. I used it for equivalence checking but also doing things like finding hash collisions...
<awygle>
I am currently waiting to pick up my two new cats in the next like, five minutes
<promach>
for yosys-smtbmc, what is the difference between an smtc file and a tpl file?
<promach>
both are used to describe assertions from what I can observe
ZipCPU|Laptop has quit [Ping timeout: 248 seconds]
eduardo__ has quit [Remote host closed the connection]
captain_morgan has quit [Remote host closed the connection]
captain_morgan has joined #yosys
mbuf has joined #yosys
uelen is now known as uelenbot
m_w has quit [Quit: leaving]
proteusguy has quit [Remote host closed the connection]
pie___ has quit [Read error: Connection reset by peer]
pie_ has joined #yosys
aw- has joined #yosys
m_t has joined #yosys
pie_ has quit [Ping timeout: 260 seconds]
leviathanch has joined #yosys
FabM has quit [Ping timeout: 240 seconds]
<promach>
ZipCPU: what do you understand about the assertion input mechanism for the cycle3 example?
dys has joined #yosys
kmehall has quit [K-Lined]
_whitelogger has quit [K-Lined]
_whitelogger has joined #yosys
nrossi has joined #yosys
<promach>
I have added "-wires" to the makefile command, yet I could not see the internal wires in the vcd file. What is wrong?
aw- has quit [Quit: Leaving.]
proteusguy has joined #yosys
FabM has joined #yosys
sunxi_fan has quit [Read error: Connection reset by peer]
mbock has joined #yosys
mbuf has quit [Quit: Leaving]
pie_ has joined #yosys
<ZipCPU>
promach: While I've taken the tools out for a drive, I haven't lifted the hood.
mbuf has joined #yosys
<promach>
ZipCPU: huh ?
azonenberg_work has quit [Ping timeout: 240 seconds]
m_t has quit [Quit: Leaving]
mbock has quit [Quit: Leaving.]
nrossi has quit [Quit: Connection closed for inactivity]
mbuf has quit [Quit: Leaving]
<cr1901_modern>
It might still be worth writing my blog post, but taking a more concrete approach...
<cr1901_modern>
The most important "wall" for me getting started was I wasn't sure if I could trust the results until I understood what was going on under the hood
<ZipCPU>
cr1901_modern: Does this mean that you can answer promach's question? I have no idea where to start.
<cr1901_modern>
I'd have to see the example code to answer
<ZipCPU>
I'm moving on to "proving" that the ZipCPU prefetches work. (I've got three ...) These will have to prove that the WB bus acts in a "reasonable" manner.
<cr1901_modern>
I think the perception should change from "Haskell is easy" to "Haskell _is_ in fact difficult, I'm willing to help you work through it"
<thoughtpolice>
cr1901_modern: this is not the place to discuss it. But I do elaborate further in any case; I view this mostly as a failure on our part for numerous reasons
<thoughtpolice>
You’re more than free to @ me of course :)
<cr1901_modern>
Not in the mood to get 10 billion replies
<thoughtpolice>
(Also to be fair, my twitter is like 95% shitposting/dumb jokes like that so you should read everything there with a large grain of salt. Or a tall drink.)
<cr1901_modern>
I'm currently trying to _prune_ my following list, tbh :P
<thoughtpolice>
cr1901_modern: But anyway, the joke was more that "If smart people fail at it but dumb people like me succeed at it, there are clearly some structural barriers inhibiting them, which is mostly our fault". I'd actually agree that being more up-front about the difficulty is a good, honest thing to do! We sugar-coat it a bit.
<thoughtpolice>
I've been in the community for nearly a decade so I'm fairly frank and in tune with a lot of those vibes...
<cr1901_modern>
I don't know if, for instance, formal verification is hard or I just had to see it presented a different way. I do know that it took me a while before I found the correct resources to start getting it.
<thoughtpolice>
But it's a multi-faceted thing. Some pedagogy, some structural issues, perception, some technical issues, etc etc. Small cuts add up.
<cr1901_modern>
So, my main oversimplified reason for not being a huge FP fan is I don't care for its reliance on a huge RT (read: GC)
<thoughtpolice>
Formal verification is largely a field that in some ways is awakening from a long age of "near-complete irrelevance". I mean, it wasn't entirely irrelevant. But it wasn't nearly as accessible as it is now. Which is still pretty bad.
<thoughtpolice>
Really bad, in fact.
<cr1901_modern>
(This is not a reflection on clifford's and other's previous presentations or blog posts, btw. In retrospect, I understand his presentations fine.)
azonenberg_work has joined #yosys
<cr1901_modern>
But I definitely _did_ need to see it a different way before it clicked
<cr1901_modern>
And that "different way" was essentially "take clifford's examples from his 2016 talk on yosys-smtbmc, examine the smt2 output, and change some asserts/assumes, and see what happens"
<cr1901_modern>
(Everybody should have a copy of the Kitten Book)
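A toy in the spirit of what cr1901_modern describes (this module is invented here, not one of clifford's 2016 examples): run it through yosys-smtbmc, then tweak the assume/assert and watch what the solver reports.

    module counter (input clk, input rst, output reg [7:0] cnt);
        initial cnt = 0;
        always @(posedge clk)
            if (rst)                cnt <= 0;
            else if (cnt != 8'd100) cnt <= cnt + 1;
    `ifdef FORMAL
        always @(posedge clk) begin
            assume (!rst);           // constrains the environment the solver explores
            assert (cnt <= 8'd100);  // provable; tighten to < and the proof fails
        end
    `endif
    endmodule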
<awygle>
cr1901_modern: this is one reason I'd like to see your blog post as well as ZipCPU's. You clearly have a different perspective (see the total lack of smt2 in Dan's post), and I'd like to see both (or ideally more than two)
<awygle>
I too have a hard time learning anything without first convincing myself that the lower stack layers are sound
<qu1j0t3>
cr1901_modern: it need not rely on a 'huge RT'. see Feeley's work on Scheme, for instance
<qu1j0t3>
cr1901_modern: or TIL/ML
dys has quit [Ping timeout: 246 seconds]
<cr1901_modern>
qu1j0t3: Which flavor of ML?
<thoughtpolice>
qu1j0t3: Anything by Marc Feeley is worth looking at, to be fair.
<cr1901_modern>
I mean, OCaml's pretty daunting
<cr1901_modern>
(and what is TIL?)
<qu1j0t3>
thoughtpolice: yeah
<qu1j0t3>
cr1901_modern: TIL/ML was a research project on safe low-level functional programming. It's a little hard to find, but worth reading. It's not a mature off-the-shelf thing, unfortunately, but I don't think there's anything inherent in FP that prevents it from pushing down to low levels. (that's the message of most research, it seems to me)
<qu1j0t3>
and that research is ongoing, of course. Linear types for example
<cr1901_modern>
Basically I want a "C alternative" that's portable to my vintage machines. I'm willing to relax the "no GC" requirement, but >>
<thoughtpolice>
If you restrict your domain enough you can even do without things like garbage collection entirely. If you design it right.
<cr1901_modern>
AFAIK, Haskell inherently requires a GC for its more powerful features
<cr1901_modern>
though a region-based memory manager is in the works
<thoughtpolice>
(e.g. Ur/Web, which can generate a GC-less web server from high-level ML programs that uses less RAM than bash, due to its restricted domain)
<qu1j0t3>
i think over time we will see better solutions in this area, because we're always going to be able to do better static analyses in future, which benefits all targets. (plus the continual invention)
<cr1901_modern>
I'm willing to relax the "no GC" requirement, but it needs to be something I can implement myself if no impls exist on my desired target
<cr1901_modern>
(No Forth)
<thoughtpolice>
cr1901_modern: There is no region-based memory manager in the works. Such a change would be fairly invasive, but also, "region-based" or "stack-based" allocation makes a lot less sense in that context anyway (I used to work on the major Haskell compiler, so I'm quite familiar with it.)
<thoughtpolice>
Mostly because the notions of "stack" and where it exists are different. (The stack actually exists... on the heap!)
<cr1901_modern>
thoughtpolice: Well I mean, not all archs have a "stack". C would have to emulate one using linked lists on those archs
<thoughtpolice>
Linear types are something very different than region based memory management.
<cr1901_modern>
I managed to combine them into one concept through a large amount of confusion ._.
<cr1901_modern>
I thought "oh cool, linear types, that must imply region based"
<cr1901_modern>
thoughtpolice: ""region based" or "stack based" allocation makes a lot less sense in that context anyway" Sorry, in what context?
<thoughtpolice>
The context of something like GHC's implementation, I mean.
<thoughtpolice>
In some other Haskell compiler it would make more sense, maybe. In GHC's design it would be... weird, I think.
<cr1901_modern>
I see... well in any case, I'm willing to relax the GC requirement (though I would need a "don't GC here I know better than you!" command).
<thoughtpolice>
(In fact there was a Haskell compiler that relied entirely on a region based memory management system at first, but they eventually reneged and added a GC for the general cases it couldn't handle. The compilation model was very very different, so this kind of decision was possible)
<cr1901_modern>
hrm
<thoughtpolice>
(This Haskell compiler also, coincidentally, produced kick-ass pure ISO C99 programs, that in some cases outperformed handwritten C benchmarks :)
<thoughtpolice>
(When it worked. Which it did not always do.)
<cr1901_modern>
Yea, I know... you can also write OCaml that outperforms C b/c of the assumptions OCaml is allowed to make that C can't
<cr1901_modern>
I'm most interested in portability and "ease of rolling your own impl or porting a compiler if a compiler doesn't already exist"
<thoughtpolice>
That's the theory but doing it in practice is immensely difficult! It's quite a different thing to see it actually work. :)
<cr1901_modern>
And that includes my "vintage machines where LLVM is prob a poor fit"
<thoughtpolice>
You can take some other tricks though, like metaprogramming to generate code, which is what some FP people do alternatively.
<thoughtpolice>
("Why write a fast FFT when I can write a program to generate a fast FFT for my specific case?")
<cr1901_modern>
B/c I'm lazy and I don't like coding
<thoughtpolice>
That's a popular approach that's been used several times in OCaml for example, like FFTW
m_t has joined #yosys
m_w has joined #yosys
<cr1901_modern>
thoughtpolice: So my issue with GC is that 1. It becomes nearly impossible to reason temporally about how long a hot code snippet will take, since the working set
<cr1901_modern>
of memory is always in flux. At least w/ cache/manual allocation after a few loops the working set will become reasonably stable.
<cr1901_modern>
And 2. It's just not easy to write a good one
<awygle>
cr1901_modern: curious, what makes you think llvm is a bad fit for vintage platforms?
<cr1901_modern>
awygle: LLVM wants code to be swapped around in registers; on, say, 6502, you have... exactly three of those
<cr1901_modern>
awygle: Actually give me a minute, had a long discussion about this some months back
<qu1j0t3>
yeah register poverty is a challenge for some compilers. i definitely had this issue on lcc.
<qu1j0t3>
cr1901_modern: done much with sdcc (or vbcc)?
<awygle>
sdcc maybe?
<awygle>
Lol
<qu1j0t3>
:)
<awygle>
I guess you'd need sdllvm
<ravenexp>
register poverty is not an issue for forth compilers
<ravenexp>
they only need 2 or 3 of those
<ravenexp>
let's rewrite the world in forth
<thoughtpolice>
I think I used 4 or 5 in mine. Horribly bloated.
<cr1901_modern>
qu1j0t3: vbcc doesn't support a feature I frequently use (multiline strings to be concatenated)
<cr1901_modern>
It is thus a non-compliant ANSI C compiler and ANSI C is the bare minimum I support in my code :D
<ZipCPU>
Did anyone ever read my count of how many registers other processors had? Most officially have 32, of which they can use about 24. RISC-V has another 66+ special purpose registers. OpenRISC has 65+ special purpose registers. LM32 has 10 special purpose registers, microblaze 25, NiOS 6.
<ZipCPU>
Wow.
<cr1901_modern>
sdcc doesn't support "structs as input args", though AFAICT, that's not at the parser level
<cr1901_modern>
"RISC-V has another 66+ special purpose registers" Oh ffs
<cr1901_modern>
ZipCPU: In my case I specifically meant just registers used as part of calculations
<ZipCPU>
For that, most of the CPU's I examined declared that they had 32, and then artificially restricted the number they actually used to something closer to 24.
<thoughtpolice>
ZipCPU: I noticed ZipCPU is suspiciously missing in that list :P
<cr1901_modern>
ZipCPU would have no registers if he could feasibly implement it :)
<ZipCPU>
No, it was presented. The ZipCPU has 1 special purpose register per mode, or two total.
<cr1901_modern>
And 0 instructions
<thoughtpolice>
Oh, I meant just in IRC just then. :)
<ZipCPU>
Meh ... not quite. I've seen smaller forth/stack-based CPUs.
<ZipCPU>
As for general purpose registers, the ZipCPU has 14 per mode, for a total of 28. In reality, you can only use about 14 at a time.
<cr1901_modern>
I'm not a huge fan of the "shadow register" approach to interrupts; A. what happens during nested exceptions? B. In practice, I find code using them (z80) confusing
<ZipCPU>
Understood completely. The ZipCPU solves A by not allowing nested interrupts, and B, because of the shadow registers you don't really need assembly code to create an interrupt handler.
<ZipCPU>
You can do it all in C.
<cr1901_modern>
Doesn't matter if a *nix port isn't in your plans, but I can't imagine general purpose OSes would like that restriction
<ZipCPU>
Well, reading the Linux device drivers manual *was* one of the reasons for only allowing a single interrupt level.
<cr1901_modern>
lol
<ZipCPU>
I mean ... seriously ... if no one's going to use it, then why support it?
<cr1901_modern>
Probably breaks a device driver or two? And yes, I don't use nested ints either except to indicate something went horribly wrong
<cr1901_modern>
still think it's a good idea to indicate that condition
<awygle>
Huh, what are these 66+ special purpose registers in risc v? I haven't made it all the way through my book yet but so far I've only seen one (the pc)
<ZipCPU>
I wasn't counting the PC.
<ZipCPU>
Keep digging.
<ZipCPU>
Oh, and as I recall, the manual allows for 500+ special registers, they just don't all need to be implemented.
<awygle>
I am reading a chapter a night so I'll get back to you in about two weeks :-P
<ZipCPU>
That's why I just say 66+.
<cr1901_modern>
ZipCPU would be interesting to add to MiSoC/LiteX (I plan to do my own RISCV impl first)
<ZipCPU>
Yeah, I think it would.
<ZipCPU>
Be aware, though, the ZipCPU only supports the wishbone pipeline mode used in the B4 spec, not the more common B3 spec protocols.
<cr1901_modern>
Oh, well I believe MiSoC peripherals have to support standard mode and that's it
* cr1901_modern
would have to see a block diagram of ZipCPU's caches and WB bus to figure out whether it's feasible
<ZipCPU>
Oh I'm sure it's feasible ...
<ZipCPU>
The caches don't change the interface at all, so they don't show up on any of my diagrams.
<cr1901_modern>
Ahhh, is it like lm32 where "caches are internal to the core", and "the WB bus connects to the caches"?
<awygle>
Needs a WB 3-to-4 bridge
<cr1901_modern>
(lm32 reserves the higher 2GB for "uncached access to I/O")
<ZipCPU>
Not quite.
<ZipCPU>
Caches are internal to the core, yes.
<ZipCPU>
But the CPU has only one external port to the external wishbone bus.
<ZipCPU>
This needs to run through a bridge, if you wish to connect it to WB-3 components.
<cr1901_modern>
I see...
<cr1901_modern>
ZipCPU: A LiteX port is more likely to occur, b/c the maintainer of MiSoC has not been open to changes recently
<ZipCPU>
On the other hand, WB B3 runs at least 3x slower than B4, *and* it slows your overall clock down as well.
<cr1901_modern>
well open to changes much*
<ZipCPU>
So ... I wouldn't connect to B3 components if I didn't have to.
<cr1901_modern>
I don't know what spec MiSoC implements. I don't even think lm32 implements pipelined mode b/c ideally it's "talking to the cache" most of the time
<thoughtpolice>
awygle: There are just a lot of RISC-V CSRs, basically. The base set has only a few but the extensions add a bunch
<cr1901_modern>
Wishbone is... not my favorite. I find the spec very difficult to grok and ambiguous in some places (particularly wrt widths/granularity)
<ZipCPU>
Yeah, see ... it makes no sense to connect a "cache" to the outside separate from the CPU. The cache should be integrated into the CPU, leaving only a standard bus to connect.
<thoughtpolice>
The variable width Vector Extensions ("V" extension) add a ton of CSRs just on their own, because the vector unit has to be configured properly.
<ZipCPU>
cr1901_modern: When comparing WB to other busses, AXI, Avalon, etc., I've always found the WB easier to understand. However, the WB lm32 implements is ... a different WB beast from what I'm doing.
<cr1901_modern>
I don't understand...
<ZipCPU>
Go on.
<rqou>
"the maintainer of MiSoC has not been open to changes recently"
<rqou>
you mean sb0?
<cr1901_modern>
Yes, but not everyone knows who he is
<rqou>
hmm, i didn't expect that he wouldn't want changes
<cr1901_modern>
What I'm getting at is "sb0 doesn't really want to extend MiSoC or Migen unless it's a new platform or something that is essentially zero maintenance burden or zero invasive changes"
<rqou>
ah that makes a lot more sense
<cr1901_modern>
ZipCPU: I mean, lm32 and ZipCPU both implement the wb spec
<cr1901_modern>
why would they be totally different beasts? Does lm32 completely violate the spec?
<ZipCPU>
Yes, but lm32 depends on a large number of "optional" registers.
<ZipCPU>
ZipCPU stripped the bus back down to the bare minimum.
<cr1901_modern>
by registers you mean "optional bus signals?"
<ZipCPU>
The spec allows for such things as (IIRC) a cycle type indicator (CTI), a burst length indicator, etc.
<ZipCPU>
That's one difference.
<cr1901_modern>
Idk if those signals are used in practice in MiSoC
<ZipCPU>
The other difference I've wrestled with has to do with the bus reset line.
<ZipCPU>
ZipCPU allows the reset to be asserted at any time during a transaction. lm32 insists on one ACK or one RESET response to every request.
<ZipCPU>
My problem with the RESET line is that in my implementations, RESET is asserted by the bus interconnect--not the peripheral.
<ZipCPU>
The interconnect often doesn't know how many requests are outstanding.
<ZipCPU>
You get the picture.
<ZipCPU>
I either need to upgrade my interconnects, or ... continue to do things subtly different.
<cr1901_modern>
But... in MiSoC, the peripherals _don't_ assert the reset line
<ZipCPU>
On the other hand ... I doubt the CPU cares if the bus interconnect is done "better", such as the lm32 would use.
<cr1901_modern>
there is a clock-reset-generator IP specifically for this
<ZipCPU>
cr1901_modern: That's a good start, but consider the following scenarios:
<ZipCPU>
1) You access the last address of the flash. The controller accepts the request, and starts working.
<ZipCPU>
2) on the next clock, you access non-existent memory. The interconnect creates and returns an error to the CPU before the flash returns the error.
<ZipCPU>
3) Sometime later the flash generates a response ...
<ZipCPU>
Currently, I deal with that by terminating the entire bus transaction--hence the "flash read" request would fail.
<ZipCPU>
I leave it to the programmer not to cross devices--something which is a subtle bug in the WB spec.
<cr1901_modern>
Don't you have to _wait_ for the flash to generate a response before accessing the next memory location?
<cr1901_modern>
it should be holding ACK low until the flash is ready
<cr1901_modern>
(or is that the point of pipelined mode that "bus transactions can happen out of order"?)
<ZipCPU>
See ... that's what I like about B4 ... you can keep making requests even if the peripheral is still working on the first one.
<ZipCPU>
Bus transactions are not supposed to be able to happen "out of order".
<ZipCPU>
It just works in a pipeline fashion ... requests go into a pipeline at one speed, acks come out later after (potentially) being delayed by many clocks.
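A hedged sketch of that pipeline from the slave side: accept a request every cycle and return the ACK a fixed number of clocks later, so N transfers cost about N+LATENCY cycles rather than N*LATENCY. Signal names follow WB convention, but the module itself (data path omitted, LATENCY >= 2 assumed) is invented for illustration:

    module wb_pipelined_slave #(parameter LATENCY = 4) (
        input  wire clk,
        input  wire wb_cyc, wb_stb,
        output wire wb_stall,
        output wire wb_ack
    );
        // shift register tracking requests currently in flight
        reg [LATENCY-1:0] inflight = 0;
        always @(posedge clk)
            inflight <= {inflight[LATENCY-2:0], wb_cyc && wb_stb && !wb_stall};
        assign wb_stall = 1'b0;                // never stalls in this sketch
        assign wb_ack   = inflight[LATENCY-1]; // exactly one ack per request
    endmodule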
<cr1901_modern>
Then why bother making transactions if the first one isn't finished? Sure, you can store them in a queue, but unless that queue gets flushed, it will eventually get full
* cr1901_modern
is afk for now, sorry
<ZipCPU>
In the middle of a conversation?
<ZipCPU>
Yes, they get stored in a queue--spread throughout your device as timing demands.
<rqou>
does your CPU take a precise exception when a bus transaction errors?
<ZipCPU>
If the queue ever fills, you then need to stall the bus master so it doesn't make any more requests.
<ZipCPU>
rqou: Excellent questions! The answer: Not currently.
<ZipCPU>
On the other hand, I don't support virtual memory (yet).
<rqou>
I figured :P
<ZipCPU>
With virtual memory, I'll have to go back and implement precise exceptions.
<rqou>
so precise page faults, imprecise bus aborts?
<ZipCPU>
I'll probably make them both precise.
<ZipCPU>
That means I'm going to need to upgrade my interconnect too ... doable, just haven't done it.
<rqou>
hmm, is there a wishbone b3->b4 changelog?
<ZipCPU>
The B4 spec discusses the differences between what it calls "classic" mode and "pipelined" mode. They are easy enough to convert between.
<ZipCPU>
The real performance hit takes place when accessing, say, DDR3-SDRAM.
<ZipCPU>
The MIG gives you a latency of about 20 clocks or so.
<ZipCPU>
Now, if you use WB B3, your cost will be 20 clocks *for* *every* *word* *read*.
<ZipCPU>
Using WB B4, your cost can be 20 clocks plus the number of items read, N+20 vs 20N
<ZipCPU>
That's a *BIG* performance difference.
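Putting numbers on ZipCPU's N+20 vs 20N claim, with a hypothetical burst of N = 16 words (N is chosen here for illustration; the ~20-clock latency is the MIG figure quoted above):

    T_{B3} = 20N     = 20 \times 16 = 320 \text{ clocks}
    T_{B4} = N + 20  = 16 + 20     = 36  \text{ clocks}

Roughly a 9x gap at that burst length, and it widens as N grows.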
<rqou>
hmm, i thought wb b3 also had a burst mode?
* ZipCPU
opens up his spec ...
<rqou>
there's a block transfer mode
<ZipCPU>
It has a block read/write mode ... is that what you are talking about? That mode has the problem I just described--you can't move on to the next request until the last one is returned.
<rqou>
right, but the next request can return immediately if it was indeed for the next address
<ZipCPU>
P26 matches the diagrams in B3 spec, P27 shows the pipeline difference.
<ZipCPU>
Ok, P26 doesn't *quite* match ... Under B3, the address and data lines need to be held until the ack is also true.
<rqou>
i don't see how P27 violates the B3 spec?
<ZipCPU>
See page 51 of the B3 spec. The ACK comes back while the strobe is still high. The strobe then needs to be dropped for a cycle, before being raised again. Once raised, it will take a minimum of one clock to ack the next cycle.
<rqou>
but page 54 shows two back-to-back reads
* ZipCPU
turns to page 54
<ZipCPU>
Yeah, I see what you are talking about ...
* ZipCPU
strokes his beard ...
<ZipCPU>
Here's the thing ... what if the ACK doesn't come back for many cycles? The bus is stalled.
<rqou>
yes
<ZipCPU>
How would this work for a DDR3 SDRAM, which can't accomplish the read for 20+ cycles?
<ZipCPU>
BTW ... I see what you are talking about, and sit here at least partially corrected ... ;)
<rqou>
so the first ack is delayed 20+ cycles, and subsequent acks appear instantly
<rqou>
(assuming the address matches)
<ZipCPU>
While that might work for a write, it won't work for a read.
<rqou>
why not?
<ZipCPU>
The master can't change the address until the ACK returns valid.
<ZipCPU>
The ACK can't return valid, until the returned data is valid.
<ZipCPU>
Hence, you're stuck doing each read individually.
<ZipCPU>
There's another problem as well: fanout.
<ZipCPU>
A bus tends to have very high fanout, and so it can easily slow down the speed of your circuitry.
<rqou>
so the first address is held for 20+ cycles, and then the master changes the address. the memory controller has already prefetched the next address, and it checks if the new address from the master matches what it prefetched
<ZipCPU>
If you can place delay stages within that, you can keep your speed up.
<rqou>
i don't see why this doesn't work
<ZipCPU>
Because some slaves have consequences when an item is read. You don't want to pre-read from those slaves before you know that's what you'll need.
<ZipCPU>
Ahm ... side-effects is a better term than consequences.
<rqou>
right, but the memory controller can
sklv1 has joined #yosys
<ZipCPU>
Then you'd need two separate controllers.
<rqou>
why?
<ZipCPU>
Let's slow down for a moment ... I can see how memory or flash could pre-read. Cool.
<ZipCPU>
But what about reading from a peripheral? Say an A/D buffer, where everything's at the same address?
<ZipCPU>
You can't pre-read if you don't know you are going to remain at that address ...
<rqou>
you can't pre-read in that case
<ZipCPU>
Besides, that also places "address update" circuitry into multiple places within your design.
<ZipCPU>
But ... what about the fanout issue?
<rqou>
these two issues still exist :)
<ZipCPU>
(Oh, and to deal with pre-reading, the lm32 IIRC implements several TAG lines ... so it knows ahead of time what to read.)
<ZipCPU>
You don't need to do that with B4/pipeline.
sklv has quit [Ping timeout: 248 seconds]
<rqou>
hmm, the only difference i see is whether address needs to remain valid?
<ZipCPU>
If the address needs to remain valid, that implies to me a round-trip between the master and slave.
<ZipCPU>
See, this is what the cycle-type indicator was meant to handle within B3.
<ZipCPU>
It gives you the ability to support several different cycle types, but it also complicates the processing within the slave.
<rqou>
hmm, the spec seems very unclear, but it seems to me that pipelined back-to-back reads are still possible on b3
<ZipCPU>
If I read the spec properly, that's discussed in chapter 4: WB register feedback.
<rqou>
hmm, wishbone seems to conflate "burst" and "atomic"
<ZipCPU>
Registered feedback requires a cycle-type indicator.
<ZipCPU>
This means you have to know, before accessing whatever, that you'll be doing a multi-read cycle.
<ZipCPU>
While this may make sense for a cache, it's an arbitrary requirement otherwise.
<rqou>
huh, i missed that
<rqou>
yeah, that seems somewhat unnecessary
<ZipCPU>
The cool thing is ... the ZipCPU, when I last evaluated Dhrystone, had a *really* awesome performance--for a data-cacheless CPU.
<ZipCPU>
Why? Because it exploited the pipeline bus access capabilities of B4.
<rqou>
hmm, that just sounds like "i used prefetching"
<ZipCPU>
Heheh ... no, it's more than that ;)
<ZipCPU>
Consider write delays. How fast can you write using WB/B3?
<ZipCPU>
How many cycles will each write take?
<rqou>
ah, you have write buffering too
<ZipCPU>
Let's suppose you have a write-through cache ...
<ZipCPU>
So every write to memory goes to the bus.
<ZipCPU>
How many cycles will it take to write N consecutive items to the bus?
<rqou>
so you basically have a one-line cache :P
<rqou>
(of course the implementation looks nothing like a cache)
<rqou>
ZipCPU: are there hazards when writing to an address and then immediately reading from it?
<ZipCPU>
Probably. I don't usually do that, though.
<rqou>
when i was doing a cpu, hazards and branching made up 90+% of the bugs
<ZipCPU>
"usually"? Actually, I don't do that at all.
<ZipCPU>
^^: +1
<ZipCPU>
Same here.
<rqou>
the first opcode i implemented was actually "jump and link" for this reason :P
<ZipCPU>
Really? Gosh, I think I started with the ALU, and only learned the lesson the hard way.
<rqou>
ok, i had "add" as well because that's pretty trivial
<rqou>
one step away from turing completeness i believe (needs compare)
<rqou>
:P
<ZipCPU>
Add + compare + LR = completeness? No requirement for memory reading/writing?
<rqou>
supposedly "subtract and branch if less" is Turing complete by itself
<ZipCPU>
Sigh ... I'm losing my respect for turing completeness the longer we chat on this topic. ;)
<rqou>
actually you're right, this has to modify memory
<rqou>
but it doesn't require registers :P
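The one-instruction machine rqou is alluding to is usually called subleq ("subtract and branch if less than or equal to zero"). A simulation-only sketch, with widths and memory layout chosen arbitrarily; the triple-indirect memory reads per cycle won't map to real block RAM, and addresses are assumed in range:

    // subleq A B C:  mem[B] -= mem[A];  if (mem[B] <= 0) jump to C.
    // All state lives in memory, as rqou notes -- no programmer-visible
    // registers beyond the PC.
    module subleq #(parameter AW = 12) (input clk, input rst);
        reg signed [15:0] mem [0:(1<<AW)-1];
        reg [AW-1:0] pc = 0;
        reg signed [15:0] diff;
        always @(posedge clk) begin
            if (rst) pc <= 0;
            else begin
                diff = mem[mem[pc+1]] - mem[mem[pc]];  // mem[B] - mem[A]
                mem[mem[pc+1]] <= diff;
                pc <= (diff <= 0) ? mem[pc+2][AW-1:0] : pc + 3;
            end
        end
    endmodule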
<ZipCPU>
"hazards and branching made up 90% of the bugs" ... I like that. I'm going to try to remember it, so that I might quote you later on it. It's just so true.
<qu1j0t3>
cr1901_modern: it seems you could submit a patch? they respond to contacts
ekiwi has joined #yosys
m_t has quit [Quit: Leaving]
<rqou>
o/ ekiwi
<rqou>
i see your berkeley.edu reverse-dns
oldtopman has quit [Ping timeout: 258 seconds]
oldtopman has joined #yosys
<cr1901_modern>
qu1j0t3: I've also asked him privately "would you accept X" and most of the time the answer is "it depends"
<cr1901_modern>
Also, back, but kinda don't really have spoons for convo right now :(
leviathanch has quit [Remote host closed the connection]
<ZipCPU>
YES!! It's suppertime here, but I just managed to finish proving all three of my ZipCPU pre-fetch modules!
<qu1j0t3>
cr1901_modern: *nod*
<qu1j0t3>
ZipCPU: nice
eduardo has quit [Quit: Ex-Chat]
azonenberg_work has quit [Ping timeout: 248 seconds]