mark4 has quit [Ping timeout: 264 seconds]
dave0 has quit [Quit: dave's not here]
boru` has joined #forth
boru has quit [Disconnected by services]
boru` is now known as boru
<MrMobius> cheater, it depends on what you're trying to accomplish. if you need good performance, forth on the 6502 is an especially bad match
<MrMobius> but if youre drawn to forth for other reasons like playing around for fun or being able to compile it on the machine rather than cross compiling it could be good
<MrMobius> cheater, I did this comparison of assembly, C, and forth for the 6502 and forth was 10-20x slower than assembly http://calc6502.com/RobotGame/summary.html
<MrMobius> ymmv if youre not going for raw performance though
<cheater> MrMobius: why is it so bad for 6502?
<cheater> and yeah i was thinking about performance
<MrMobius> for one, cells are 16 bits in size and the 6502 is an 8 bit processor so 8 bit computations are automatically more than twice as slow
<MrMobius> as opposed to a 16 bit processor where 8 and 16 bit calculations may take the same amount of time
<MrMobius> another is that to get good performance on the 6502 when looping through data, you need to use the X or Y registers to index into memory
<MrMobius> keeping a 16 bit pointer on the stack and incrementing it like you do in forth is many times slower than indexing
<MrMobius> also you take a big speed hit using stack addressing on the 6502 compared to using static zero page addresses
<MrMobius> these are all performance issues though. you may not care about any of that if you dont need max performance
<MrMobius> other reasons too. you can also use index registers as counters. storing loop counters on the return stack as forth does is several times slower
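A rough way to see the cell-size and addressing costs described above is to tally documented NMOS 6502 cycle counts for a bare add. This is only a sketch: the helper `add_cycles` and the three scenarios are illustrative assumptions, and a real Forth `+` also pays for stack pointer updates and word dispatch, which are not counted here.

```python
# Documented NMOS 6502 cycle counts for the instructions used below.
CYCLES = {
    "CLC": 2,
    "LDA zp": 3, "ADC zp": 3, "STA zp": 3,        # fixed zero-page address
    "LDA zp,X": 4, "ADC zp,X": 4, "STA zp,X": 4,  # zero page indexed by X
}


def add_cycles(n_bytes, mode):
    """Cycles for CLC plus one LDA/ADC/STA triple per byte (hypothetical helper)."""
    per_byte = (CYCLES[f"LDA {mode}"]
                + CYCLES[f"ADC {mode}"]
                + CYCLES[f"STA {mode}"])
    return CYCLES["CLC"] + n_bytes * per_byte


add8_static = add_cycles(1, "zp")     # 8-bit add on fixed zero-page variables
add16_static = add_cycles(2, "zp")    # 16-bit add on fixed zero-page variables
add16_stack = add_cycles(2, "zp,X")   # 16-bit add on an X-indexed stack

print(add8_static, add16_static, add16_stack)
```

The gap between the last two figures is the cost of X-indexed stack addressing alone, before any of the other overheads mentioned in the conversation.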
<MrMobius> cheater, thinking of doing forth on a 6502?
jsoft has joined #forth
gravicappa has joined #forth
<cheater> MrMobius: i'm thinking what a good higher level language would be for a 6502 based computer
<cheater> MrMobius: how do you think forth compares to basic?
<cheater> in terms of speed
<cheater> MrMobius: also, why do cells have to be 16 bit?
<siraben> MrMobius: oh, I recall your article on comparing assembly, C and Forth.
<siraben> the performance hit seems very high
xek has joined #forth
xek has quit [Ping timeout: 240 seconds]
<proteusguy> would be incredibly rare for a basic to outperform a decent implementation of forth. basic is usually implemented as a token interpreter. lots of runtime overhead to keep its space small.
tabemann has quit [Remote host closed the connection]
tabemann has joined #forth
mtsd has joined #forth
gravicappa has quit [Ping timeout: 246 seconds]
xek has joined #forth
mtsd has quit [Quit: mtsd]
jsoft has quit [Quit: Leaving]
mtsd has joined #forth
mtsd has quit [Quit: mtsd]
dave0 has joined #forth
xek_ has joined #forth
xek__ has joined #forth
xek has quit [Ping timeout: 272 seconds]
xek_ has quit [Ping timeout: 264 seconds]
<MrMobius> cheater, probably pretty well if you go with an STC like Tali Forth 2
<cheater> STC?
<MrMobius> subroutine threaded code, i.e. a subroutine threaded forth
<MrMobius> there are different ways to organize things internally. one way is to have a list of addresses of subroutines and jump to each one by one
<MrMobius> it's a lot faster though to have the same list with a jump to subroutine instruction before each address
<MrMobius> so the call takes up 50% more room but you avoid pointer calculations which are really slow on the 6502
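The two layouts can be modelled in a few lines of Python. This is a simplified sketch, not how Tali Forth 2 or any particular Forth is implemented: `run_thread`, `stc_word`, `PRIMS` and the `"lit"` marker are hypothetical names, and the subroutine-threaded version pushes its literals directly rather than modelling how a real STC Forth compiles them.

```python
def prim_add(stack):
    """Primitive '+': pop two cells, push their 16-bit sum."""
    b, a = stack.pop(), stack.pop()
    stack.append((a + b) & 0xFFFF)


PRIMS = {"+": prim_add}


def run_thread(thread, prims):
    """Address-list style: an inner loop fetches each entry and dispatches it."""
    stack, ip = [], 0
    while ip < len(thread):
        word = thread[ip]
        ip += 1
        if word == "lit":              # inline literal follows in the thread
            stack.append(thread[ip])
            ip += 1
        else:
            prims[word](stack)
    return stack


def stc_word():
    """Subroutine-threaded style: the word is a run of direct calls, no inner loop."""
    stack = []
    stack.append(2)
    stack.append(3)
    prim_add(stack)
    return stack


ADDRESS_BYTES = 2                      # one 16-bit address per thread entry
JSR_BYTES = 1 + ADDRESS_BYTES          # JSR opcode plus the same address
size_overhead = JSR_BYTES / ADDRESS_BYTES - 1   # fraction of extra room per call

print(run_thread(["lit", 2, "lit", 3, "+"], PRIMS), stc_word(), size_overhead)
```

The fetch-and-dispatch loop in `run_thread` stands in for the pointer calculations that are slow on the 6502; the direct calls avoid it at the cost of one extra byte per entry.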
<MrMobius> cheater, cells could also be 32 bit or even 64 bit. it just makes sense to keep them big enough to hold a pointer
<cheater> why not 8 bit?
<MrMobius> then you would have 8 bit pointers and only be able to address 256 bytes
<MrMobius> or have a scheme where pointers take up 2 cells
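The trade-off between cell width and address range is plain arithmetic. The sketch below uses hypothetical helper names (`addressable_bytes`, `split_pointer`, `join_pointer`) and assumes the low byte is stored first, as the 6502 does for addresses.

```python
def addressable_bytes(cell_bits, cells_per_pointer=1):
    """Number of distinct addresses a pointer of the given width can hold."""
    return 2 ** (cell_bits * cells_per_pointer)


def split_pointer(addr):
    """Split a 16-bit address into two 8-bit cells as (low, high)."""
    return addr & 0xFF, (addr >> 8) & 0xFF


def join_pointer(lo, hi):
    """Rebuild the 16-bit address from its two 8-bit cells."""
    return (hi << 8) | lo


print(addressable_bytes(8))        # one 8-bit cell per pointer
print(addressable_bytes(16))       # one 16-bit cell per pointer
print(addressable_bytes(8, 2))     # two 8-bit cells per pointer
print(split_pointer(0x1234))
```

With 8-bit cells, every pointer operation has to handle two cells, which is the scheme mentioned in the last message.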
<cheater> i mean for short jumps that's fine enough, right?
<cheater> it's only long jumps where that's a problem, and you should have two or more cells
Zarutian_HTC has joined #forth
<cheater> 65c816 has 24 bit address space
<cheater> so you might want to use 3 bytes for the pointers
<cheater> i guess you'd have to push them all on the stack and then jump, or something
<cheater> for example, you could actually push the machine code instruction for jump onto the stack, then the 24 bits of the pointer, and then have a forth function called "interpret_jump"
<MrMobius> they will basically all be long jumps and even short jumps wouldnt work well since youre either jumping into the first 256 bytes where the stack is or doing a relative jump which will be slow
<cheater> yeah
<MrMobius> ya someone made a forth for 816 and says he got a huge speedup
<cheater> wonder how?
<cheater> oh yea the 816 has 16-bit registers
<cheater> nice
<cheater> i wonder if you could make a basic interpreter that runs on forth?
<MrMobius> and better addressing so some of that is less painful. havent used it myself though. some people say the processor is unpleasant to work with
<cheater> wonder why they say that
<MrMobius> sure. you would hate your life and it would be ungodly slow if on the 6502 but you could do it
<MrMobius> several reasons. its not a 16 bit redesign. its an 8 bit 6502 with 16 bit stuff frankensteined on top
<cheater> i'm mostly thinking of putting it on the 816 to be honest
<cheater> also let's be honest with each other no one's going to write huge programs like this, for anything even medium complex they'd write some other language on some other machine and cross compile
<cheater> so maybe if you have an 8 bit pointer and can only address 256 bytes at first, maybe that's perfectly fine for a simple program
<MrMobius> like most 16 bit machines will have an 8 bit and a 16 bit version of an instruction like ADD but the 816 just has an 8 bit mode and a 16 bit mode for each register and only one ADD instruction
<cheater> oh
<cheater> interesting
<cheater> MrMobius, basically the 6502 was meant to treat the zero page as registers, that's why operating on that is cheap (or supposed to be, i don't actually know)
<MrMobius> so when you jump into a function or into an interrupt you may not know what modes youre in which is a real pain
<MrMobius> cheater, the problem is though that using 8 bit pointers doesnt just mean your program has to fit in 256 bytes but also that the whole forth interpreter does if you want to be able to access the address of primitives or other stuff
<cheater> hmmm right
<MrMobius> and even then you cant fit much of a meaningful program in 256 bytes. the code for my robot game is 40k or so I think
<cheater> what if i made it a thing where /writing/ to the stack is more expensive but reading from it is cheaper?
<MrMobius> how would you do that?
<cheater> i.e. when you write to the stack, you basically write the machine code you'd want to have
<cheater> but that'll have to be taken from memory somewhere else, which is more expensive
<cheater> some sort of partly pre-compiled forth
<MrMobius> hmm, it doesnt work like what youre describing
<cheater> why not?
<MrMobius> what machine code are you writing when you write to the stack? you dont put instructions on the stack, just data
<cheater> i was thinking of one of two designs. design 1: have the stack grow backwards, i.e. opposite to the way machine code is interpreted. then run that. design 2: when committing data to the stack, pre-place holes that you will later populate with instructions once you're ready to. once you're at the point of executing that instruction together with its data, you take the ultimate result, place it where the instruction was, and discard the rest (i.e. the
<cheater> locations where the hole's data was). this means the stack structure is retained and the result of this instruction can be used in the next instruction, for which there will be a hole as well.
<MrMobius> why would you put machine code on the stack though? there's no reason to jump into the stack and start executing things
<MrMobius> if youre leaving a hole there to jump into, you can only put one instruction there
<MrMobius> even the simplest forth instruction takes several steps so needs a dozen or more bytes of instructions
<cheater> i mean yeah. i guess this is a thing that would need to be integrated into the design. i was just trying to present the idea in a simple fashion
<cheater> nothing's stopping you from having larger holes, or having multiple-hole systems
<cheater> like maybe commit to two holes: the instruction to jump to, followed by its data (not holes), then at the end you have a hole for another jump instruction
<MrMobius> doesnt sound like it would be any faster than what already exists, but what you could do is get a forth for your pc and learn it really well then see if you can implement your improved version on the 816 or some other platform
<cheater> tbh i'd probably write an 816 emulator and try to figure how to build something for that
<MrMobius> ya what youre describing is the thread of instructions not the data stack. you can inline data in the thread and use the return address to get the address of the data
<cheater> i mean 8502
<cheater> er 6502
<cheater> what's wrong with me today
<cheater> what is the thread of instructions? that's not a standard forth feature, is it?
<MrMobius> its the list of subroutine calls I was talking about before. every forth has some sort of thread
<MrMobius> go for it. there are a lot of 6502 emulators. it's a good project to learn about the chip. there's even a pretty well tested verification program you can run to test your emulator
<cheater> don't you think a 6502 forth would be faster without a subroutine thread?
<cheater> looks like referring to non-immediate data is the real perf killer here
mtsd has joined #forth
<cheater> that's why i was thinking of this hole system, because that makes the data always immediate to the instructions
<cheater> how many bytes of arguments does a 6502 machine instruction take? is this variable? what about the 816?
<MrMobius> cheater, faster? how else are you going to run anything?
<MrMobius> I see what you mean. there is a form of this you can use in some situations
<MrMobius> you write instructions to memory and modify their arguments at run time. this is called self modifying code and only works in specific circumstances. you couldn't do everything that way. there is overhead to filling the holes
<MrMobius> yes variable. 0-2 on 6502. 0-3 I think on 816 but not sure
gravicappa has joined #forth
dave0 has quit [Quit: dave's not here]
<MrMobius> hmm, looks like they added incremental compilation to the zig programming language
<cheater> yeah, i was thinking of smc, but instead of modifying the instruction arguments, you put up arguments and you modify the instruction that's being called
<MrMobius> I wonder if we'll see NASA running that instead of forth on satellites
<MrMobius> cheater, how would that be better though? it takes time to write the instruction there and the instruction only does one thing. it takes longer to put the instruction in the hole than to execute it
<MrMobius> it only makes sense if the overhead of changing the instruction is less than the improvement you get from using the immediate version which seems like it would only be in a loop
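The break-even argument can be put into numbers. The sketch below is an illustration under assumed figures: patching a one-byte operand is taken to cost an `LDA zp` (3 cycles) plus an `STA abs` (4 cycles), and each loop pass is assumed to save one cycle by using `LDA #imm` (2 cycles) in place of `LDA zp` (3 cycles). The helper name `break_even_iterations` is hypothetical.

```python
import math


def break_even_iterations(patch_cycles, saved_per_iteration):
    """Smallest loop count at which the savings cover the cost of patching."""
    if saved_per_iteration <= 0:
        return None                    # patching never pays off
    return math.ceil(patch_cycles / saved_per_iteration)


PATCH = 3 + 4    # assumed: LDA zp + STA abs to rewrite one operand byte
SAVED = 3 - 2    # assumed: LDA zp replaced by LDA #imm inside the loop

print(break_even_iterations(PATCH, SAVED))
```

Code that runs only once never reaches the break-even count, which is why the technique is mostly confined to loops.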
<cheater> it's a trade off, either the cost is large at write (my way) or at read (your way)
<cheater> so for example, you might be able to optimize the writes. if you know you'll be doing only one specific thing with the data once that data is on the stack, just include the instruction with the data, so it's put on the stack together; and the jump-back instruction, which is the other part that gets modified, that still gets "fixed up" by the runtime
<cheater> or let's say you have a bunch of data, and you want to fold or map over it, doing basically the same thing to each piece of the data
<cheater> you'd load the data into memory with holes, then load one instruction into a zero-page "register" for easy access, and write it to all the holes
<cheater> you might need to do that multiple times due to there being multiple things you need to write into the holes
<cheater> but in general you could transform data like this. it's very reminiscent of the map/fold paradigm you'd find in functional programming.
mtsd has quit [Quit: Leaving]
<MrMobius> cheater, that's just not how that works at all. what youre describing does not match the way forth works or more importantly how the processor works
<cheater> would you like to explain the issues to me? it's okay if you don't feel like it, i'm just curious
<MrMobius> lets say you have an 8 bit stack, which I think is a bad idea and you should not do
<MrMobius> you can put an instruction to add for example in the hole and that will add the value to the accumulator but what then?
<MrMobius> that value then needs to be written somewhere and the stack pointer needs to be adjusted
<MrMobius> it would be better to use a 16 bit stack but that is even worse since nothing you put in the hole can do a 16 bit add
<cheater> the hole can be multiple bytes long
<MrMobius> so you could leave an enormous hole and put a whole bunch of instructions in there but then we're back to the huge overhead of copying that in
<cheater> also if you just have one stack, then you can use the program counter
<cheater> which is 16 bit
<MrMobius> again, thats just not how that works. the program and stack are different
<cheater> why must they be different?
<MrMobius> also, if you have the hole multiple bytes long, you arent using the immediate mode any more so you lose the speed up you get from loading an immediate
<cheater> not necessarily. the hole can be actually multiple holes: one small hole for the instruction with arguments following it, and then another longer hole for the handler
<MrMobius> because as the program counter advances, you leave the results of calculations behind you in the space for data. you need some way to pull that forward and keep working on it. thats what the data stack does. its always in the same place no matter where your program counter is
<MrMobius> cheater, are you on windows?
<cheater> yeah
<MrMobius> download this 6502 simulator+assembler and give it a try http://exifpro.com/utils.html
<MrMobius> its a really cool program. i keep it running on my taskbar 24/7 to test asm snippets
<cheater> haha that's nice
<MrMobius> try to implement what youre describing there and youll see what I mean
<MrMobius> you may not be able to see that this doesnt work until you implement it yourself
<MrMobius> also ##6502 is very active
<cheater> *hacker voice* i'm in
<cheater> what do i do now?
<cheater> i guess i'll look for a hello world
<cheater> nice
<cheater> http://www.rosettacode.org/wiki/Even_or_odd#6502_Assembly hmm... this is Apple II code... is the syntax right for the exifpro assembler? and the apple ii specific stuff?
<MrMobius> hmm, no
<MrMobius> probably better to discuss in ##6502
<cheater> yep let me join
cp- has quit [Quit: Disappeared in a puff of smoke]
cp- has joined #forth
cp- has quit [Client Quit]
cp- has joined #forth
mark4 has joined #forth
<mark4> tabemann, that would actually be the primary use for zero page :)
<mark4> i would consider using index x for a stack pointer somewhere other than 0x100 area, leave that for the processor stack but have two software stacks elsewhere in ram. 64k is freaking huge :)
<mark4> i would also put the forth kernel under the basic rom which can be paged out to reveal the ram under it
Zarutian_HTC has quit [Ping timeout: 258 seconds]
<MrMobius> mark4, yep. probably want to put the x-based data stack in zero page
<mark4> sp yes, stack no :)
<mark4> i would have sp and rp on zero page variables pointing to 0x200 and 0x300
<mark4> tho for a c64 that would probably be $200 and $300 :)
<MrMobius> mark4, hardware stack is fixed at $100 so no choice about that
<mark4> right but i would not use that for parameters
<mark4> actually i guess that could be the return stack?
<MrMobius> right, r stack
<mark4> i forget do you have access to SP on the c64?
<MrMobius> not directly. you can transfer SP to X, modify then transfer it back
<mark4> 0x00 to 0xff is zero page, page 1 is stack
<mark4> aha thats right. thats good enough :)
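For reference, the layout under discussion can be modelled as below. This sketch follows the X-indexed zero-page data stack option with 16-bit cells stored low byte first; the class name `ZpDataStack`, the starting index `0x80`, and the downward growth direction are assumptions for illustration, not something stated in the conversation.

```python
ZERO_PAGE = range(0x0000, 0x0100)   # page 0
HW_STACK = range(0x0100, 0x0200)    # page 1, used by the processor stack


class ZpDataStack:
    """Model of a 16-bit-cell data stack in zero page, indexed by X."""

    def __init__(self, top=0x80):       # assumed starting index
        self.mem = bytearray(0x100)     # zero page only
        self.x = top                    # X register: index of the top cell

    def push(self, value):
        self.x = (self.x - 2) & 0xFF                          # DEX, DEX
        self.mem[self.x] = value & 0xFF                       # low byte at 0,X
        self.mem[(self.x + 1) & 0xFF] = (value >> 8) & 0xFF   # high byte at 1,X

    def pop(self):
        lo = self.mem[self.x]
        hi = self.mem[(self.x + 1) & 0xFF]
        self.x = (self.x + 2) & 0xFF                          # INX, INX
        return lo | (hi << 8)


s = ZpDataStack()
s.push(0x1234)
s.push(0xBEEF)
print(hex(s.x), hex(s.pop()), hex(s.pop()))
```

Keeping the return stack on the hardware stack in page 1 leaves X free to serve as the data stack pointer in this arrangement.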
<mark4> been a very very long time since i looked at a c64 :)
<mark4> very many fond memories :)
<MrMobius> hehe
<mark4> i learned 6502 in 2 weeks and then bought a c64 :)
<mark4> i did a hand reverse engineer of a disassembler that was published in CCI :)
<mark4> took 2 weeks and then i knew almost every opcode by heart :)
<MrMobius> nice :)
<MrMobius> I didnt get one until 10 or 15 years ago off ebay. it was fun to play around with but didnt work any more when I dug it out of the closet a few years ago
<mark4> awww
<mark4> wish i still had mine
<MrMobius> tried to repair it last year and some of the ttl chips had been replaced and socketed before it got to me since they didnt do that at the factory. problems match that chip being bad
<mark4> theres a guy on youtube that repairs c64s :)
<MrMobius> they are neat. I had a lot of ideas for neat things but kind of lost interest when I found out the process they used for the chips was flawed and the chips' lives shorten the longer you use them
<MrMobius> so no sense in leaving it on to run an 8 hour simulation or something even if it would be neat or running a server on it
<MrMobius> yep! I would be totally lost trying to fix anything without that :P
<mark4> yea and you can do it in an emulator too :)
<mark4> bbl have an errand to run
Zarutian_HTC has joined #forth
cantstanya has quit [Ping timeout: 240 seconds]
xek__ has quit [Ping timeout: 264 seconds]
xek__ has joined #forth
WickedShell has joined #forth
Zarutian_HTC has quit [Remote host closed the connection]
gravicappa has quit [Ping timeout: 256 seconds]
gravicappa has joined #forth
<bluekelp> MrMobius: say more about the chip life being shortened with use?
<bluekelp> I have an old 64 in a box again after booting it up and confirming it worked a few years ago. much nostalgia.
<MrMobius> it was boron or something they used in the process and didnt have it exactly right so the chips MOS made themselves will die with use
<bluekelp> interesting. I'll research more.
<MrMobius> bluekelp, I think I might be partly wrong. I heard Bil Herd talk about it in a video but it might just be the PLAs
<MrMobius> and you can get PLA replacements
remexre has quit [Ping timeout: 240 seconds]
remexre has joined #forth
gravicappa has quit [Ping timeout: 256 seconds]
xek__ has quit [Ping timeout: 256 seconds]
_whitelogger has joined #forth
<f-a> a friend of mine is trying to get me into soldering
<f-a> maybe I should dust off forth too
WQX has joined #forth
f-a has quit [Quit: leaving]
WQX has left #forth [#forth]
dave0 has joined #forth