boru` has joined #forth
boru has quit [Disconnected by services]
boru` is now known as boru
<mark4> :)
<mark4> grats on the stm dma
<tabemann> zhey
<tabemann> *hey
<mark4> trying to convert my 32bit x4 into 64 bit x64. finally got it all to assemble and the listing file is huge but the resultant linked binary is like 400 bytes in size
<mark4> and the linker bitches about not being able to find the origin symbol
* tabemann really wishes matthias would relicense mecrisp-stellaris as BSD or MIT
<mark4> if i link without using the custom linker script that works fine for the 32 bit forth i get all my missing sections but it STILL cant find origin
<mark4> yet readelf -a | grep origin finds that global
<mark4> dafq
<tabemann> weird
dave0 has joined #forth
<tabemann> cmdptr: socialized medicine is about distributing the costs of healthcare across society, because in very many cases the individual cannot afford the costs healthcare, but everyone can afford healthcare when the costs are distributed across society
<tabemann> anyways, I'm not convinced by such libertarianoid arguments
<tabemann> *costs of healthcare
<cmtptr> tabemann, would you support an opt-in communal health care system?
<tabemann> no, because then it would concentrate the costs to those who do need at the present, ultimately making it unaffordable, while fucking over those who shortsightedly don't opt in thinking they won't get sick or injured, and then they do
boru` has joined #forth
boru has quit [Disconnected by services]
boru` is now known as boru
<cmtptr> so your utopia only works if you take by force
<cmtptr> fuck you
<tabemann> it's no different than paying for upkeep of the roads
<tabemann> do you suggest that you not have to pay for taxes that support road repairs?
<cmtptr> i do have to because shitheads like you have voted to remove my ability to choose
<tabemann> here is a question - do you believe you have a right to not pay taxes to support the police and military - I ask because these are the sorts of things that libertarian types seem to believe are the only legitimate functions of the state
proteus-guy has quit [Ping timeout: 244 seconds]
<cmtptr> i never said i was libertarian
<tabemann> your reasoning sounds like that of one
<cmtptr> what i said was fuck you for wanting to steal what i've earned
<cmtptr> and i sincerely mean it
<tabemann> so then you should expect to, if you aren't willing to contribute, when you do get sick or injured down the line, to be royally fucked over
<tabemann> the problem is that people are shortsighted
<cmtptr> this is not a political dispute. when you start talking that way, they are fighting words
<cmtptr> i do contribute, asshole
<cmtptr> and then i'm compensated in the form of monetary payment and benefits
<cmtptr> and now i'm going to bed. goodnight!
<mark4> you have no right to demand that i give my money to you when you wont go out and work to earn your own. thats the ENTIRE purpose of socialism
<mark4> i get to choose where the money I earned goes not my government
<mark4> going to work, earning a living, paying taxes IS contributing to society. when the government starts taking money in taxes from those who are earning it simply to give it to people who refuse to work the government has stepped outside the bounds of their delegated authority
<mark4> i spent seven fucking years as an unemployment statistic and not ONCE did i steal or take money from my government
<mark4> not once
<mark4> my normal rate of pay prior to that was $50+ an hour
<mark4> and after eight years im back there
<mark4> in the first 7 years i had THREE very short term jobs that were burger flipping wages
<tabemann> my point is this - there is a reason why you can't opt out of paying taxes - and at the same time healthcare is exactly the kind of thing best supported via taxes
<tabemann> so if you aren't willing to pay for taxes for healthcare, then you imply that you shouldn't have to pay for taxes to support roads, the police, etc.
proteus-guy has joined #forth
<tabemann> the people who are like "taxes are theft" to me remind me of the rich people who kid their money in offshore accounts, while the rest of us pay our share
<tabemann> *who hide
<tabemann> let's take this logic a bit further
<tabemann> let's say you have no kids
<tabemann> but public schools are publicly funded
<tabemann> so are public schools theft?
Zarutian_HTC has joined #forth
<tabemann> hey proteus-guy, Zarutian_HTC
<proteus-guy> hello tabemann
<proteus-guy> even if you have kids, public schools are theft.
<tabemann> to me the logic with public schools is simply that it is in society's interest that children are educated - without public schools then you'd end up with children not getting educated inevitably, and that would not be in society's interest
irsol has quit [Ping timeout: 264 seconds]
<tabemann> the problem with public schools in reality is not that they are socialized but that they are not socialized enough - they are typically paid for by local property taxes, so poorer areas have worse public schools and richer areas have better public schools
dzho has quit [Ping timeout: 264 seconds]
<tabemann> when what'd make sense is make them funded on a national level, to make them as even as possible
spoofer has quit [Ping timeout: 264 seconds]
spoofer has joined #forth
irsol has joined #forth
dzho has joined #forth
<tabemann> on another note
<tabemann> the license notice in zeptoforth takes up a ton of space - 956 bytes to be exact
<mark4> what is zepto?
<tabemann> zepto is a SI prefix
<tabemann> 10^-21
<tabemann> zeptoforth is named zeptoforth because micro, nano, pico, femto, and atto were already taken
<tabemann> even though zeptoforth isn't small at all in reality
<tabemann> the kernel for STM32L476 is 25144 bytes in size, and the full binary for STM32L476 is 106496 bytes in size
<mark4> thats armv7a?
<tabemann> Cortex-M4
<mark4> you could probably put T4 on there and trim a whole bunch of fat from it and reduce it in size
<mark4> yes thats armv7-m which is thumb2 only
<mark4> t4 is thumb2
<tabemann> yeah, it's all thumb-2
<tabemann> it could be smaller when all the forth is compiled if it had a peephole optimizer
<tabemann> whereas currently the only optimization it does is inlining
<tabemann> at this point it doesn't compare to mecrisp-stellaris RA yet
<mark4> -rwxr-xr-x 1 cox cox 17548 Aug 2 22:56 t4k
<mark4> thats the kernel
<mark4> and some of that is linux specific
<mark4> -rwxr-xr-x 1 cox cox 73790 Aug 2 22:56 t4 thats the extended but most of that wont be relevant to an embedded device
<mark4> oh yea i COULD add an optimizer to t4
<mark4> its sub threaded
<tabemann> the "full" binaries for zeptoforth contain stuff like multitasking and a disassembler
<mark4> and i dont even have an assembler lol
<mark4> i dont have an assembler or disassembler
<tabemann> they would be quite a bit smaller were there no disassembler
<mark4> i think it has a decompiler tho
<mark4> do you get the sources?
<mark4> to the whole thing i mean
<tabemann> what do you mean?
<mark4> yea i just found the github
<mark4> not impressed with the almost complete and utter lack of any comments in their sources
<mark4> i wouldnt give them the time of day just for that
<mark4> and adding multi tasking is trivial btw
<tabemann> the multitasking was easy
<mark4> YOU wrote it?
<tabemann> yes
<mark4> oooooh
<mark4> you wrote an arm assembler in arm assembler?
<mark4> i get pissy when i see asm sources with no comments lol
<tabemann> it's assembled with gas
<mark4> aha
<tabemann> and the arm disassembler is written in forth
<mark4> yea so is T4
<tabemann> and I apologize for the lack of asm comments
<mark4> i need to write an arm assembler for t4 but t4 is 32 bits and im thinking of retiring all my 32 bit forths and only developing the 64 bit versions
<mark4> :)
<mark4> well when you write something like that you know every inch of it so the code is obvious for you
<mark4> have you looked at t4 at all?
<tabemann> I haven't looked at t4 myself
<mark4> what threading do you use?
<tabemann> SRT/NCI
<mark4> oh yea i should have looked at your primitives doing a bx lr
<mark4> thats enough to tell me
<mark4> if next is a bx lr you must be srt :)
<tabemann> bx lr or pop {pc}
<mark4> ya
<mark4> neither of which are valid on aarch64 :/
<tabemann> if I were going to target aarch64 I'd probably just port hashforth to it
<tabemann> as, by default, hashforth is a 64-bit forth
<tabemann> even though it can be scaled down to 32-bit or even 16-bit
<mark4> check out my gnu as macros for t4 that allow me to lay down forth code and headers in different sections
<mark4> code "blah", blah
<mark4> ...
<mark4> ...
<mark4> and colon "blah" blah
<mark4> ...
<mark4> do you use hashed vocabularies?
<tabemann> I haven't bothered, because to me vocabulary lookup is not something that happens often (i.e. only when code is being compiled)
<tabemann> hashforth puts code and headers in different spaces
<tabemann> with zeptoforth I didn't want to commit certain amounts of space to each
<mark4> when i wrote my dos 16 bit forth using A386 the assembler could hash the vocabularies at assemble time
<tabemann> and I wanted to write flash from low to high consistently, rather than in separate sections
<mark4> with gas i have to link every word against the first thread of the voc and fix it at extend time :/
<mark4> one time fix
* tabemann could probably pare down the compiler section of the code by hardcoding instruction values rather than writing out routines to generate instructions based on parameters, when in many cases the parameters are always the same
<tabemann> zeptoforth isn't very zepto despite its name
<tabemann> somehow matthias has got mecrisp-stellaris RA smaller than zeptoforth despite having a whole optimization engine included
<mark4> a peephole might not reduce it by much
<tabemann> but apparently matthias is an assembly wizard, while this is the first large project I have implemented in assembly (I did a bit of MIPS assembly in college, but that was not much)
<mark4> not in size but in performance i can see
<mark4> so instead of doing "bl dup" you would simply inline "push tos"
<mark4> who is matthias ?
<tabemann> matthias koch, author of mecrisp-stellaris
<mark4> which i also dont know anything about :)
<tabemann> it's basically the predominant implementation of forth for cortex-m
<tabemann> there are like a couple others, and there is also zeptoforth, but if one wants to use forth on cortex-m it is basically the way to go
boru has quit [Read error: Connection reset by peer]
boru has joined #forth
<mark4> no, thats t4 :P
<mark4> t4 runs under linux tho, not bare bones
<mark4> tho i was porting it to the NXP K64F
<tabemann> mecrisp-stellaris is bare metal
<mark4> problem is, i cannot get flash writes working on the K64F and everyone on the forums is as stumped as i am
<tabemann> have you tried porting to STM32L476 or STM32F407? I've got implementations for those, and things have been pretty painless (aside from some issues with the F407's flash controller's erase feature)
<mark4> and i got so damned frustrated that I basically gave up till i can get some competent help
<mark4> lol
<tabemann> (hint: the F407 only erases flash in huge sectors, which means that one wants to carefully align code with sector boundaries if one wants to not waste a ton of space when setting MARKERs)
dzho has quit [Ping timeout: 264 seconds]
<tabemann> mark4: why just that MCU? there's a ton of MCUs out there, like the wide range of STMicroelectronics ones
dzho has joined #forth
remexre has quit [Ping timeout: 246 seconds]
<mark4> i was working in austin texas for a company that used that cpu
<mark4> and that cpu is ultra powerful
<tabemann> ah
<mark4> but i stonewalled on not being able to write to flash
<mark4> ultra annoying
<mark4> the code is 100% perfect and non functional
remexre has joined #forth
<mark4> p.s. the nxp forums are a way for nxp to entirely wash their hands of doing support of ANY kind what so ever
<mark4> and nobody on those forums could figure out wtf was wrong with my code
<tabemann> I've never dealt with NXP
<tabemann> just STMicroelectronics and Nordic Semiconductor
<mark4> nordic make some good stuff :)
<tabemann> yeah, but the nordic board I have doesn't have proper pins for SWD but rather little spots on the side for soldering, and I can't solder
<tabemann> and the other way of writing to it is with a proprietary bootloader, such that your code won't actually control the whole board from bootup on
<tabemann> there is a better nordic board which seems like it might have pins for SWD
<tabemann> but it's $100, and I don't want to shell out the money for it
<tabemann> oh, also, if you overwrite the proprietary bootloader, you essentially brick the board until you can write to it with SWD
<tabemann> so I haven't bothered with the nordic board
<tabemann> the advantage of the L476 and the F407 DISCO boards I have is they have a builtin SWD programmer, so I can write to them with st-flash without dealing with programmers, SWD, or whatnot
<mark4> i have a j-link edu
<mark4> the k64f comes with a jtag header already populated
<mark4> for work i have the wonderful job of developing for PIC32 :)
<tabemann> that's MIPS right?
<tabemann> MIPS would be a nice arch were it not for the damned delay slot - lol
mark4 has quit [Read error: Connection reset by peer]
mark4 has joined #forth
<mark4> stupid interwebs
<mark4> yea thats MIPS
<mark4> not a fan. think the mips arch is flawed
* tabemann remembers that the MIPS assembler he used in school hid the delay slot from the programmer...
<mark4> no processor status word and any time you do an operation that causes an overflow that is an exception
<tabemann> ugh
<mark4> ya. i was actually writing a MIPS32 assembler
<mark4> in fact the only thing left to do is the local labels and branch resolution
<tabemann> why in fuck should overflow cause an exception?
<mark4> and its complete
<mark4> because MIPS was a stupid idea
<mark4> do you know what MIPS stands for?
<tabemann> millions of instructions per second
<mark4> Microprocessor without Instruction Pipeline Staging
<mark4> they backpedaled on that though it now has a pipeline
<mark4> but the original MIPS did not
<mark4> well i was almost right
<mark4> was from memory
<mark4> MIPS (Microprocessor without Interlocked Pipelined Stages) is a
<mark4> according to wikipedia, the oracle of all knowledge, true and false
<tabemann> how do you even make a conventional programming language work with exceptions on overflow?
<mark4> you write code to test if there is going to be an overflow if you do the operation
<mark4> and you handle the overflow that did not happen yet
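The pre-test described above can be sketched in Python, assuming signed 32-bit operands. The names `add_would_overflow` and `checked_add` are hypothetical and are not taken from any MIPS toolchain; the point is only that every comparison in the check stays inside the 32-bit range, so the check itself can never trigger the trap. MIPS also has `addu`/`subu`, which wrap without trapping, so this kind of test only matters for code that uses the trapping `add`/`sub` forms.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def add_would_overflow(a: int, b: int) -> bool:
    """Return True if a + b would leave the signed 32-bit range.

    Both branches only compute values that are themselves in range,
    which is what a real pre-test has to guarantee.
    """
    if b > 0:
        # INT32_MAX - b cannot overflow because b is positive.
        return a > INT32_MAX - b
    # INT32_MIN - b cannot overflow because b is zero or negative.
    return a < INT32_MIN - b


def checked_add(a: int, b: int) -> int:
    """Add two 32-bit values, handling the overflow before it happens."""
    if add_would_overflow(a, b):
        raise OverflowError("signed 32-bit add would overflow")
    return a + b
```

Calling `checked_add(INT32_MAX, 1)` raises `OverflowError` instead of performing the add, which is the "handle the overflow that did not happen yet" step.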
<tabemann> that's going to result in a lot of unportable code w.r.t. code written for other archs that just handle overflow the normal way
<mark4> ya dont say lol
<mark4> mips has no PSW
<mark4> at all
<mark4> like i said, an entirely flawed arch
<mark4> but at least it does not need a skid bucket like every other PIC controller
<tabemann> I remember looking at other PIC archs and wondering how people could program these things
<mark4> PIC is the worst arch ive ever had to use
<dave0> i wrote a program for a pic before :-)
<dave0> back in the sattelite tv days
<dave0> satellite
<mark4> once i got above 8k in size on BOTH pic24 projects i worked on, any modification of module A would have a detrimental effect on module B
<dave0> gold card :-)
<mark4> chase that down and C starts to fuck up
<mark4> and if you put a breakpoint on an address and the debugger hits that breakpoint it will either stop THERE or up to 2 opcodes later
<dave0> my proudest code is a 512 bit modular exponentiation in 200 bytes of ram
<mark4> dave0, who were you working for? i worked for virtex rsi in longview texas
<mark4> till general dynamics bought them 100% of their code base was forth
<dave0> mark4: no no it was for fun and to get all the tv channels for free
<mark4> ooooh :)
<mark4> NOT illegal
<mark4> btw
<dave0> haha
<mark4> anything that comes in over the air is public domain
<mark4> its only stealing if you hack THEIR equipment to steal the channels
<mark4> create your own and you are within your rights
<mark4> same with cable TV. once the cable crosses your threshold the signal belongs to you
<mark4> so what they do is they put filters on your line if you dont pay
<dave0> it was austar in australia and they fixed it around 2002-3 or something, so nobody's cards worked anymore... i stopped around then
<mark4> i had a similar thing in england where the satellite system used smart cards
<mark4> i had a smart card circuit board with a cable to my laptop
<dave0> yep
<mark4> and run code on laptop to decrypt
<mark4> that WOULD have been stealing but we were already paying for it so it was purely academic :)
* tabemann doesn't care enough about TV to bother with all that - he gets all his TV from the local library
<dave0> it was good money for the pirates
<dave0> every few months everyone's cards stopped working, so you'd need an update to your card... naturally the update wasn't free
<dave0> not now
<dave0> pay tv costs as much as a monthly internet subscription... why would you bother
<tabemann> pay tv is wildly expensive
<mark4> i have not owned a TV in over 30 years now
<tabemann> the library is free as long as you return your DVDs on time
<dave0> i still remember the first dvd i ever pirated - the day of the jackal :-)
<dave0> still got it somewhere on a harddrive
<tabemann> I remember watching that movie
<dave0> tabemann: good movie
<dave0> not the bruce willis one
proteus-guy has quit [Ping timeout: 260 seconds]
<tabemann> of course not the bruce willis one
<mark4> the bruce willis movie is garbage. so is his death wish remake
<dave0> :-)
<mark4> the original movie is a classic
<mark4> awesome actors and based on a true story
<mark4> erm.. sort of
<tabemann> mark4: loosely
<mark4> artistic license :)
<tabemann> the opening scene is largely true
<mark4> i think i need to go watch that :)
<tabemann> the rest is fiction
<mark4> the assassin was based on a real person who had that name, his acts in the movie are pure hollyweird
<tabemann> okay, I should hit the sack now; got work tomorrow
<mark4> yea same same
<tabemann> g'night guys
<mark4> ya :)
proteus-guy has joined #forth
jsoft has joined #forth
gravicappa has joined #forth
dave0 has quit [Quit: dave's not here]
proteus-guy has quit [Ping timeout: 240 seconds]
xek has joined #forth
xek has quit [Ping timeout: 265 seconds]
xek has joined #forth
<cmtptr> < tabemann> the people who are like "taxes are theft" to me remind me of the rich people who kid their money in offshore accounts, while the rest of us pay our share
<cmtptr> and the people who want to control others in the name of "the greater good" remind me of brutal dictatorships who commit mass genocides
<cmtptr> again, I genuinely see your position as evil. maybe take a moment and reflect on why that might be. stop trying to convince me that servitude is better and instead come up with a system that works and where people may participate voluntarily
<siraben> remexre: have you explored modal type theory?
<remexre> siraben: nope
jsoft has quit [Ping timeout: 240 seconds]
<MrMobius> mark4, does MIPS still keep the calculation if you overflow?
<mark4> no
<MrMobius> wth
<mark4> or, i dont think it does
<mark4> ill have to double check but i think the result might not actually be stored back on overflow
<MrMobius> that is just bizarre
<MrMobius> also, Ive heard how terrible the other PICs are but none of that matters if you're doing C
<MrMobius> its just sad for the poor guys who have to write something in assembly
mark4 has quit [Remote host closed the connection]
<tabemann> cmtptr: so you genuinely believe that taxes are evil and analogous to genocide....
<cmtptr> individual liberty should be the first consideration, even at the expense of safety or any "common good". if you can't defend a voluntary system, then you must concede that it is at best theft
<tabemann> would you support anarchist communism?
<cmtptr> is it voluntary?
<tabemann> yes
<cmtptr> then what do i care?
<tabemann> I should note, though, that it operates on the basis of one works what one can and one receives what one needs
<cmtptr> as long as you don't force me or others to participate, then i don't really have to support or oppose it, do i?
<tabemann> well, I need to be getting ready for work, so I'll be on later
Zarutian_HTC has quit [Ping timeout: 240 seconds]
xek has quit [Ping timeout: 256 seconds]
SysDsnEng has joined #forth
SysDsnEng has quit [Client Quit]
gravicappa has quit [Ping timeout: 264 seconds]
pareidolia has quit [Ping timeout: 260 seconds]
pareidolia has joined #forth
gravicappa has joined #forth
gravicappa has quit [Ping timeout: 260 seconds]
gravicappa has joined #forth
mark4 has joined #forth
gravicappa has quit [Ping timeout: 240 seconds]
<mark4> so im porting x4 to 64 bits, its not running yet but im thinking i should convert it from direct threaded to sub threaded
<mark4> any thoughts?
<MrMobius> if its faster, why wouldnt you on x64? not like you need to save space
<mark4> heh
<mark4> actually making it sub threaded allows some optimizations at compile time without even making a peephole
<mark4> instead of call + you just inline the asm for +
<mark4> sort of thing
<mark4> but tbh im not sure how much faster it can be lol
<mark4> that is going to make the parameter stack a software stack tho because the processor stack will be the return stack
<mark4> so stack code will need to change :)
dave0 has joined #forth
Zarutian_HTC has joined #forth
<mark4> ok so if RSP is the return stack now then RBP has to be the parameter stack
<mark4> would you do "xchg rbp, rsp push rbx xchg rbp, rsp" to push rbx onto the parameter stack or....
<mark4> mov [rbp], rbx
<mark4> add rbp, byte CELL
<mark4> if i allocate the parameter stack as grows down i think the former is the only way that will work
<dave0> that looks like i386 code
<dave0> oh wait rbx.. that's amd64
<mark4> yes
<mark4> when you use the PUSH opcode and RSP points below bottom of stack you get a segfault
<mark4> if the stack is allocated as grows down you get a new page allocated
<mark4> i can either lock p stack to a specific size or make it grows down
<mark4> the return stack is RSP and thats already grows down
<mark4> but only a PUSH will cause the growing
<mark4> actually a push the other way needs to be " sub rbp, byte CELL mov [rbp], rbx"
<mark4> but that will only cause a segfault on stack overflow no grows down
<mark4> anyone think 4k for a parameter stack is too limiting?
<dave0> i used rsp as the forth instruction pointer, and rsi for the return stack, and rdi for the data stack... and because the direction of return and data stack didn't matter, i made them grow up :-)
<mark4> how do you use the stack pointer as IP
<mark4> using the string registers for stack pointers makes some sense
<mark4> because push and pop are just cld/std lodsq/stosq
<mark4> i was using rsi for IP
<mark4> use sp just as the processor stack and use RSI and RDI for p/r stacks
<dave0> mark4: i found that in protected mode, the stack is not written to (interrupts etc. are on their own kernel stack), so you can point rsp at read-only data and pop and RET all day
<dave0> you can RET
<mark4> lol
<dave0> you lose push and pop for the data stack, but the interpreter's NEXT is unbelievable
<mark4> thats not really subroutine threaded though
<dave0> no direct threading
<mark4> yea
<mark4> im currently direct threaded but in my 32 bit x4
<dave0> i had assumed ret would be faster than lods ; jmp rax
<mark4> very probably
<mark4> and shorter too
<mark4> would that be faster than sub thread?
<dave0> hang on one second
<mark4> ! that also makes an unconditional branch much easier to implement
<dave0> i didn't check subroutine threading
<mark4> conditional branches would just add eip cell to skip branch vector
<mark4> CAN you add to IP in 64 bit mode?
<dave0> hmmm i don't think so
<mark4> and with ret threading how would you nest. you would need to r_push eip
<mark4> actually
<mark4> call 1f
<mark4> 1:
<dave0> nope, you push the stack register
<mark4> pop rax
<mark4> oooh right SP is IP
<mark4> soooo confuzing lol
<dave0> ehehe
<mark4> did you invent that or did you copy it from somewhere else?
<dave0> i wrote a program to benchmark ret vs. lods;jmp rax vs lods;jmp [rax] vs pop rax;jmp rax vs pop rax;jmp [rax]
<dave0> afaik it's my own invention
<dave0> at least i didn't copy it from anyone
<mark4> yea
<mark4> im pretty sure ive invented things after someone else :)
<mark4> but that one is not obvious
<dave0> it only works in protected mode
<dave0> luckily unix and windows works
<mark4> omg
<dave0> but no bare metal
<mark4> !
<mark4> you do NOT need to use a software stack
<mark4> so SP is your IP
<mark4> SI is p stack
<mark4> di is r stack
<mark4> for example
<mark4> to PUSH to the parameter stack
<mark4> xchg rsp, rsi
<mark4> push xxx
<mark4> xchg rsp, rsi
<mark4> so pushes and pops to/from both the p and r stacks are 3 opcodes but STILL use push!
<dave0> i have not benchmarked xchg/push/xchg
<dave0> i just did mov [rsi],xxx ; add rsi,8
<dave0> so i think the code is longer
<mark4> so you always point to next empty?
<dave0> but i don't know how fast it is
<dave0> umm hold on
<mark4> or better yet
<mark4> std
<mark4> stosq
<mark4> cld
<mark4> thats a push
<dave0> oh i hadn't thought of that optimization
<mark4> a pop is simply a lodsq
<mark4> :)
<dave0> cool
<mark4> for BOTH p and r stacks :)
<mark4> however, thats going to need rax as cached top of stack. i currently use rbx :)
<mark4> actually when you do stosq with std set does it decrement before or after?
<mark4> push and pop have to have store then dec for push and inc then load for pop or some such
<mark4> depending on if you are a full stack or an empty stack to use arm's thinking
<mark4> i.e. does your stack pointer point to the top of stack item or the next empty space
<mark4> im not sure the std stosq cld method will help unless the pointer update happens either first on push, second on pop or second on push, first on pop
<mark4> if you get what i mean
<dave0> i looked and the stack pointers points to the top cell
<mark4> but the question is the order of operations on stosq when std is set
<mark4> as opposed to when cld
<mark4> assume stack always points to the most recent item
<mark4> in order for "std stosq cld" to be a push where "lodsq" can be its pop
<mark4> the stosq has to decrement rdi first
<dave0> that's what i have written for amd64
<mark4> and actually pushes and pops get more complex because stosq and lodsq use different registers
<mark4> so a pop would need to do xchg rsi, rdi lol
<mark4> ya i dont think using lods and stos will be cheaper
<mark4> for the stacks i mean
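The ordering question can be checked with a toy model. This is a Python sketch, not real assembly: the `Machine` class and `push_then_pop` helper are hypothetical, and the model assumes the documented behaviour that both `stosq` and `lodsq` access memory first and adjust their pointer afterwards, in whichever direction DF selects.

```python
CELL = 8  # bytes per 64-bit cell


class Machine:
    """Toy model of RAX, RSI, RDI, DF and a sparse memory."""

    def __init__(self):
        self.mem = {}
        self.rax = 0
        self.rsi = 0
        self.rdi = 0
        self.df = 0  # 0 after cld (pointers move up), 1 after std (down)

    def _step(self):
        return -CELL if self.df else CELL

    def stosq(self):
        # Store RAX at [RDI] first, then move RDI.
        self.mem[self.rdi] = self.rax
        self.rdi += self._step()

    def lodsq(self):
        # Load RAX from [RSI] first, then move RSI.
        self.rax = self.mem.get(self.rsi, 0)
        self.rsi += self._step()


def push_then_pop(value, sp=0x1000):
    """Push with 'std ; stosq ; cld', then try to pop with a plain lodsq."""
    m = Machine()
    m.rdi = sp
    m.rax = value
    m.df = 1
    m.stosq()
    m.df = 0
    m.rsi = m.rdi  # hand the updated stack pointer over to RSI
    m.rax = 0
    m.lodsq()
    return m.rax, m.mem
```

In this model the push leaves the pointer one cell below the stored item (an "empty" stack in ARM terms), while `lodsq` reads at the pointer before moving it, so the pop does not return the pushed value without an extra adjustment. That is consistent with the conclusion that `lods`/`stos` are not a cheap way to implement the stacks.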
<dave0> mark4: what OS are you on?
<mark4> linux
<dave0> ah cool
<mark4> gentoo linux :)
<dave0> you should have no trouble compiling my code
<dave0> despite working on it for a long time, it doesn't do much
<mark4> for x4 i wrote a full memory manager and a terminfo parser and a text window interface to use it
<mark4> the tui is currently slightly broken
<mark4> needs work
<dave0> all in forth?
<mark4> but i could do multiple overlapping moving windows with text scrolling in any of 4 directions
<mark4> yes all in forth
<dave0> cool
<dave0> i'm a forth newbie
<mark4> https://github.com/mark4th/x4/tree/master/src/ext/tui thats the text user interface
<mark4> but menus are the bit that are broken on the latter
<mark4> pulldown menus i mean
<dave0> wow you handle signals in forth in twinch.f
<mark4> twinch is SIGWINCH, the signal that you get when you change the size of a window
<mark4> so if the terminal window size changes forth needs to update its COLS and ROWS constants
<mark4> there's a bug in that implementation too that ive known about but didnt fix
<mark4> lol
<dave0> :-)
<mark4> one of the things i was working on was a debugger
<mark4> i wanted a segv handler for when someones code tried to write to bad memory
<dave0> mark4: were you able to run my test.zip ?
<mark4> instead of crashing the entire system just display a "oops ya screwed up" message
<mark4> got it downloaded
<mark4> was working on something, cant multi task lol
<dave0> okay :-)
<alexshpilkin> <mark4 "so im porting x4 to 64 bits, its"> one gotcha is that you really shouldn't mix data and code on modern x86 (32 or 64), so no putting string literals inside the compiled code and so on
<dave0> i thought unix signals would be limited to c code
<alexshpilkin> ... otherwise it might not even be faster
<mark4> alexshpilkin, thats one thing i dont really care about - im already breaking the rules by making the entire forth memory +rwx :)
<alexshpilkin> <mark4 "alexshpilkin, thats one thing i "> it's not a rule
<alexshpilkin> like, not a style thinf
<alexshpilkin> *thing
<mark4> if i wanted to enforce harvard architecture i would have to make the forth indirect threaded
<alexshpilkin> it's that the CPU has separate I- and D-caches
<mark4> my forth compiles over 4 megabytes of source code per second
<mark4> and im breaking ALL the 'modern cs understanding' bs rules
<mark4> again - dont care :)
<alexshpilkin> and if you mix code and data in a single cacheline it'll kill your performance bloodily and messily
<dave0> i find myself fighting with the assembler, but i don't know how to write my own assembler :-/
<mark4> i disagree - i think the performance hit is negligible
<mark4> as evidenced by the fact that my compiler compiles 4 megabytes of source per second
<mark4> or more
<remexre> alexshpilkin: you sure that applies if you don't write to the memory?
<mark4> alexshpilkin, when i wrote x4 (my 32 bit forth) i wrote it specifically to be readable by anyone
<mark4> when i created this channel many moons ago i would sit in here alone for months on end
<mark4> occasionally someone would come by and chat
<mark4> a couple of times people told me that they were not asm coders and not forth coders but were very interested and could read my sources
* alexshpilkin is looking for benchmarks of C preprocessors, but apparently Warp didn't have one (wat?)
<mark4> i chose direct threading and did not care about cache hits or instruction pipelines what have you
<mark4> and i STILL ended up with what i have called one of the fastest compilers of any non trivial programming language
<mark4> though there are some new compilers for new languages coming out that have compile speed as a prime directive
<mark4> i.e. the V programming language that will always build in under a second
<dave0> mark4: how did you do your lookup function? FIND i think
<mark4> my headers and code are in separate sections
<mark4> a header has the following structure
<mark4> dd link-to-previous
<mark4> db len, "name"
<mark4> dd pointer-to-cfa
<mark4> my vocabularies are hashed
<mark4> to search for foo you calculate the hash for foo and that selects the vocabulary thread
<dave0> ah
<mark4> i have a context stack... i search that voc thread of each vocab for the target word
<dave0> is FIND a bottleneck when compiling?
<mark4> i have (find) which is coded in asm
<mark4> and find which is forth
<mark4> yes thats why you do hashed vocabularies
<mark4> my hashing algorithm is the same as the one used by laxen and perry in F83
* dave0 googles
<mark4> (count byte * 2) + first char. if there is a second char multiply that by 2 and add the second char
<mark4> no other chars are looked at
<mark4> just count byte and chars 1 and 2
<mark4> (((count * 2) + char 1) * 2 + char 2)
<mark4> or just count *2 + c1
<mark4> and that with 0x3f to give you a index from 0 to 63
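The hash as described in the lines above can be written out as a short Python sketch. The function name `thread_index` and the `threads` parameter are hypothetical; only the count byte and the first two characters take part, and masking with 0x3f yields one of 64 thread indices (0 through 63).

```python
def thread_index(name: str, threads: int = 64) -> int:
    """Pick a vocabulary thread from the count byte and the first two chars.

    Assumes `threads` is a power of two so that masking with threads - 1
    (0x3f for 64 threads) is equivalent to taking the value modulo threads.
    """
    if not name:
        raise ValueError("a word name needs at least one character")
    count = len(name)
    h = count * 2 + ord(name[0])
    if count > 1:
        # Second character, if present: double and add.
        h = h * 2 + ord(name[1])
    return h & (threads - 1)


for word in ("foo", "bar", "dup", "+"):
    print(word, thread_index(word))
# prints: foo 7, bar 49, dup 9, + 45 (one pair per line)
```

Because characters after the second are ignored, names such as `foo` and `for` land on the same thread, which is the price of keeping the hash to a handful of opcodes.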
<mark4> a vocabulary is an array of linked lists.
<mark4> of forth headers :)
<mark4> actually its just an array of pointers to the most recent word in that thread
<mark4> when i create a new word i create its header and do NOT add it to the vocabulary till the definition is completed
<mark4> but i remember its thread index
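A rough model of the layout described here: a header holds a link, a name and a CFA pointer, and a vocabulary is an array of 64 pointers to the most recent header on each thread. The class and method names (`Header`, `Vocabulary`, `create`, `reveal`) are hypothetical, and the CFA is just an integer stand-in. The new header stays invisible until `reveal` runs, which models "do NOT add it to the vocabulary till the definition is completed".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

THREADS = 64  # thread count implied by the 0x3f mask


@dataclass
class Header:
    link: Optional["Header"]   # dd link-to-previous
    name: str                  # db len, "name"
    cfa: int                   # dd pointer-to-cfa (integer stand-in)


def thread_index(name: str) -> int:
    data = name.encode("ascii")
    h = len(data) * 2 + data[0]
    if len(data) > 1:
        h = h * 2 + data[1]
    return h & (THREADS - 1)


class Vocabulary:
    def __init__(self) -> None:
        # One slot per thread, each pointing at the newest header (or None).
        self.threads: List[Optional[Header]] = [None] * THREADS

    def create(self, name: str, cfa: int) -> Tuple[int, Header]:
        """Build a header but do not link it in yet; remember its thread."""
        idx = thread_index(name)
        return idx, Header(link=self.threads[idx], name=name, cfa=cfa)

    def reveal(self, idx: int, header: Header) -> None:
        """Definition finished: make the header the head of its thread."""
        self.threads[idx] = header
```

One common reason Forth systems defer the linking step like this is that a half-compiled word cannot be found by accident, so an older word of the same name is still visible while the new one is being compiled. The chat does not say whether that is the motivation here.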
<dave0> did you ever try different hash algorithms?
<mark4> no. but i was thinking of looking at others
<mark4> but this hash is ULTRA ULTRA fast and very few opcodes
<mark4> i think a more complex hash is going to slow down the compile
<mark4> it might give a better spread across the threads, less hits etc so the threads will maybe be smaller
<mark4> but im not sure if thats going to help that much
<mark4> or to a degree that can even be measured
<dave0> i thought to only "hash" on the length of words
<mark4> no
<dave0> that saves comparing lengths
<mark4> nope
<dave0> doesn't work?
<mark4> think of a vocabulary as an array of pointers to word headers
<mark4> so you hash "foo" and come up with index N
<mark4> you fetch index N of the voc and traverse that thread to see if "foo" is in there
<mark4> because "foo" and "bar" have different hash values they go on different threads
<mark4> if you dont hash, all words go on the same thread - one long chain making it potentially slower to find where "foo" is in that chain
<mark4> if you simply hashed on word lengths you would not spread very much across the table
<dave0> what if your index N was just the length of the word?
<dave0> yes there would be a lot of short words on one thread and few long words
<mark4> most forth word names are quite short - therefore most forth words would thread into a few close indices of the vocabulary
<dave0> but shorter is faster to compare
<mark4> by actually hashing the word name you spread words around throughout the voc keeping each chain smaller and faster to search
<mark4> when searching for "foo" you hash foo. and get thread N of the voc
<mark4> you collect thread N which is a pointer to a header
<mark4> you compare the length of the word you are searching for with the header you are pointing at
<mark4> if they are the same you compare strings
<mark4> if not you point to the previous header in the thread and compare lengths
<mark4> when lengths match you do the string compare and if they do not match you again... link back one
<mark4> when they do match you can collect the address of the words CFA from the header
<mark4> and return "true" or "1" which is also true
<mark4> based on whether or not the word is immediate
<mark4> immediate words have a bit set in their headers length byte
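The search walk above, sketched under some assumptions: headers are small Python objects rather than packed bytes, the immediate flag is a separate field instead of a bit in the length byte, and the result flag follows the ANS Forth FIND convention (1 for immediate, -1 otherwise), which may not be the exact pair of values mark4 uses. `define`, `find` and `new_vocabulary` are names invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

THREADS = 64


@dataclass
class Header:
    link: Optional["Header"]
    name: str
    cfa: int
    immediate: bool = False


def thread_index(name: str) -> int:
    data = name.encode("ascii")
    h = len(data) * 2 + data[0]
    if len(data) > 1:
        h = h * 2 + data[1]
    return h & (THREADS - 1)


def new_vocabulary() -> List[Optional[Header]]:
    return [None] * THREADS


def define(voc, name: str, cfa: int, immediate: bool = False) -> None:
    idx = thread_index(name)
    voc[idx] = Header(voc[idx], name, cfa, immediate)


def find(context, name: str) -> Optional[Tuple[int, int]]:
    """Search one thread of each vocabulary in the context stack."""
    idx = thread_index(name)              # hash once, reuse for every vocab
    for voc in context:                   # first entry is searched first
        header = voc[idx]
        while header is not None:
            # Cheap length check first, string compare only on a match.
            if len(header.name) == len(name) and header.name == name:
                return header.cfa, (1 if header.immediate else -1)
            header = header.link          # link back one
    return None                           # not found in any vocabulary
```

Here "swap" and "tuck" happen to land on the same thread, so looking up "swap" after both are defined exercises the link-back step.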
<dave0> if thread 1 only has words of 1 char, and thread 2 only has words of 2 char, and thread 3 only has words of 3 chars, etc... you can save a comparison with the string lengths
<mark4> and if 95% of your words are 5 6 and 7 chars in length
<dave0> yeah i'd have to benchmark
<mark4> you are going to keep populating threads 5 6 and 7 and threads 20, 21, 22, ... 63 will never be populated
<mark4> calculating the hash is very very very fast
<mark4> then you know which thread your word would be in if it was defined
<mark4> so you search only that thread of each vocab in context
<mark4> the more you scatter words into threads the shorter the threads will stay
<mark4> searching shorter threads means finding or failing faster
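To make the length-versus-name-hash argument concrete, here is a small comparison of the longest chain each scheme produces. The word list is an arbitrary handful of common stack words picked for the example, not a measurement of any real system, and `name_hash`, `length_hash` and `longest_chain` are hypothetical helper names.

```python
from collections import Counter

THREADS = 64


def name_hash(name: str) -> int:
    """The count/char1/char2 hash described earlier in the chat."""
    data = name.encode("ascii")
    h = len(data) * 2 + data[0]
    if len(data) > 1:
        h = h * 2 + data[1]
    return h & (THREADS - 1)


def length_hash(name: str) -> int:
    """dave0's idea: use the word length as the thread index."""
    return len(name) & (THREADS - 1)


def longest_chain(words, hash_fn) -> int:
    """Length of the fullest thread, i.e. the worst-case search walk."""
    return max(Counter(hash_fn(w) for w in words).values())


WORDS = ["dup", "drop", "swap", "over", "rot",
         "nip", "tuck", "pick", "roll", "emit"]

by_length = longest_chain(WORDS, length_hash)
by_name = longest_chain(WORDS, name_hash)
```

With this sample every word is 3 or 4 characters long, so the length scheme fills only two threads, while the name hash scatters the same words over most of a thread each. A real benchmark, as dave0 says, would need a full dictionary and timing of the per-header comparison as well.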
<mark4> the ONLY benefit this has is on compile time
<mark4> not run time
<mark4> run time should not be dependent on forth headers unless run time also does creating
<mark4> i am starting to hate nasm's crippled macros more and more
<dave0> creating words at runtime makes my head spin
<mark4> i TRULY hate to say it but GAS macros are more powerful
<dave0> i come from c where there's no such thing
<mark4> what do you think : does lol
<mark4> you RUN colon to create a new colon definition
<mark4> : constant create , does> @ ;
<mark4> 0 constant foo
<mark4> when you execute foo it literally jumps into the bit of code inside its creator following the does>
<mark4> all constants DO the bit of code after does>
<mark4> and that fetches the body contents of the constant which is just a self fetching variable :)
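For readers coming from outside Forth, the `create ... does>` idea can be loosely modelled like this: a defining word runs at run time, builds a new dictionary entry with its own body, and points every entry it makes at one shared piece of code. This is an analogy only, not how a threaded Forth lays things out; `do_constant`, `constant` and `execute` are hypothetical names, and the dictionary is a plain dict.

```python
def do_constant(stack, body):
    """The shared code after does>: fetch the body contents ( @ )."""
    stack.append(body[0])


def constant(stack, dictionary, name):
    """Model of `: constant create , does> @ ;`."""
    body = [stack.pop()]                    # `,` stores the value in the body
    dictionary[name] = (do_constant, body)  # every constant shares the same code


def execute(stack, dictionary, name):
    code, body = dictionary[name]
    code(stack, body)                       # jump into the does> part


stack, dictionary = [], {}
stack.append(0)
constant(stack, dictionary, "foo")          # 0 constant foo
execute(stack, dictionary, "foo")           # foo  leaves 0 on the stack
```

The point of the model is that `constant` itself is ordinary running code that happens to create a new word, which is the blurring of compile time and run time discussed here.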
<dave0> mark4: i think of : as compile-time, not run time
<mark4> create creates a new forth header
<mark4> no colon is run time
<mark4> well its fuzzy lol
<mark4> compile time is when STATE = 1
<mark4> runtime is when STATE = 0
<dave0> yeah there's this nice blurring of interpret/compile/run time that you don't have in c
<mark4> so when you execute colon its in run time but it switches you into compile time
<mark4> void blah(void) { do useful stuff } blah() <-- run blah at compile time lol
<mark4> i have a forth coder friend who says that "Developing embedded applications in C is like opening a can... WITH A ROCK!"
<dave0> i also dig how you can return any number of args from a word... and even different numbers from the same word
<mark4> yes i often return either a result and true or JUST false
<mark4> ( --- n1 t | f ) is how i document that
<dave0> yes!
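The `( --- n1 t | f )` pattern can be imitated with an explicit stack: on success the word leaves a result with a true flag on top, on failure it leaves only false. The word `to_digit` is a made-up example, and TRUE is assumed to be -1 as in standard Forth.

```python
TRUE, FALSE = -1, 0   # standard Forth flag values (assumed)


def to_digit(stack, ch):
    """( --- n1 t | f )  push digit value and true, or just false."""
    if len(ch) == 1 and "0" <= ch <= "9":
        stack.append(ord(ch) - ord("0"))   # n1
        stack.append(TRUE)                 # t
    else:
        stack.append(FALSE)                # f, and nothing else


def show(stack, ch):
    """Caller pattern: pop the flag first, then the result only if true."""
    to_digit(stack, ch)
    if stack.pop():
        return stack.pop()
    return None
```

The caller always pops the flag first, so it never has to guess how many items the word left behind.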