ChanServ changed the topic of #glasgow to: glasgow debug tool · code https://github.com/GlasgowEmbedded/glasgow · forum https://glasgow.whitequark.org · logs https://freenode.irclog.whitequark.org/glasgow · production https://www.crowdsupply.com/1bitsquared/glasgow · no ETAs at the moment
<awygle> whitequark: is there a list of applets someplace?
<awygle> (are they all in the mainline repo, i guess is a better question)
<whitequark> glasgow run --help
<whitequark> yeah
<awygle> Cool, time to learn how to write applets and whatnot
<whitequark> nice
<whitequark> which one are you working on?
<awygle> dunno yet
<awygle> might start with logic analyzer tbh
<awygle> since that's uh... the thing i originally needed
<awygle> or felt i needed
<whitequark> mhm
<whitequark> sure
<whitequark> you can reuse the event compression stuff i think quite easily
<awygle> mhm
<whitequark> marcan: god damn it
<whitequark> "how hard can it be to build nextpnr for wasm", i thought.
<whitequark> "how bad can it be", i mused, a fool, an absolute fool
<ZirconiumX> I'm sure it won't be *cripplingly slow* to use
<daveshah> Probably fine for iCE40
<daveshah> WASM on PC is probably faster than native on a Raspberry Pi 2/3 which people have definitely used successfully
<whitequark> ZirconiumX: it's not actually slow
<whitequark> at all
<whitequark> i haven't measured it properly yet but the main hit will come from disabling threads in router and placer
<whitequark> and then something like a factor of 2 on top of that
<ZirconiumX> Huh, I thought you said there was like a 50% drop using WASM over native
<whitequark> yeah, i mean, have you seen how bad rpis are?
<whitequark> i can build yosys on my PC in less than 10 minutes and it takes hours on the pi
<whitequark> 2x slowdown just means you got a laptop CPU instead of a desktop one
<whitequark> or a desktop CPU from a few years back
<sorear> how bad is the binary size?
<whitequark> phenomenally bad, by which i mean the same thing as normal nextpnr.
<whitequark> over 99% of nextpnr binary size is the giant & extremely slow to build bba files
<whitequark> i'm going to have to split those and possibly distribute them separately or something
* whitequark . o O ( what if i used the hash of the chipdb as the version of the package on PyPI )
<awygle> is the bba file the huge embedded string thing
<whitequark> yes
<awygle> rip
<whitequark> okay, i got a design out of nextpnr-ice40.wasm
<daveshah> nice
<whitequark> i'm going to upstream the current state and then make it faster by not mapping the chipdbs that aren't used
<whitequark> because, well, pread() of 200 MB isn't free
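The idea of not touching unused chipdbs can be sketched as follows. This is a minimal illustration on a native host using Python's `mmap`, not nextpnr's actual loader; the `chipdb-<device>.bin` naming and the `open_chipdb` helper are hypothetical.

```python
import mmap
import os
import tempfile


def open_chipdb(root, device):
    """Map only the chipdb for `device`; other files under `root` are never opened."""
    # Hypothetical naming scheme, chosen for this sketch only.
    path = os.path.join(root, f"chipdb-{device}.bin")
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the mapping stays valid after the file is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


# Demo with two small stand-in files (real chipdbs are tens of megabytes each).
root = tempfile.mkdtemp()
for device, fill in (("25k", b"\x25"), ("45k", b"\x45")):
    with open(os.path.join(root, f"chipdb-{device}.bin"), "wb") as f:
        f.write(fill * 4096)

db = open_chipdb(root, "45k")
header = db[:4]   # only the bytes actually touched need to be read
size = len(db)
db.close()
```

Selecting the file by device name before opening anything means the cost scales with the one database in use rather than with every database shipped.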
<whitequark> daveshah: is there a reason nextpnr is using a homegrown tool instead of like, flatbuffers?
<whitequark> to a first approximation, bba seems to me to do the same thing
<daveshah> idk, it was claire's idea
<whitequark> mh ok
<daveshah> this does have one layer less indirection than flatbuffers
<daveshah> but I don't know if the performance difference is significant
<whitequark> I guess there's no acute need to change anything
<whitequark> other than the ill-advised C code generation idea
<daveshah> no, one day it would be nice to have some deduplication for iCE40
<whitequark> yes, and i suspect that day will come quite quickly
<whitequark> because as-is nightly builds will upload 1 GB per week on PyPI
<whitequark> would flatbuffers help there?
<daveshah> (the ECP5 85K database is less than half the size of the iCE40 8k one, and the ECP5 deduplication isn't great)
<daveshah> No not really
<whitequark> okay so i just have to look at ecp5 and do the same thing right
<daveshah> No because the ECP5 approach is a horrible hack
<whitequark> because shipping 90M for glasgow to work isn't really an option
<whitequark> hm
<whitequark> does it bzip?
<daveshah> and doesn't handle things like globals or the edge of the device well
<daveshah> Yes it should zip ridiculously well
<whitequark> why doesn't nextpnr use something like gzip or lz4 during load then?
<daveshah> I was worried about startup time
<whitequark> hence lz4
<daveshah> yeah, that would probably work
<whitequark> there are newer algorithms that are even better than that
<daveshah> I wouldn't be surprised if it reduced size by 5-10x
<whitequark> 20M for all chipdbs is almost acceptable
<whitequark> hm
<whitequark> i could do it on python side too, really
<whitequark> would need to think about it
<daveshah> Hmm, not as good as I thought for simple gzip, looks to go from 94MB to 29MB on chipdb-8k
<daveshah> still better than nothing
<whitequark> not enough for pypi
<whitequark> hm
<whitequark> ok maybe it's actually enough
<whitequark> that's like 300 MB just on that page
<daveshah> 7zip takes it down to 12.5MB
<whitequark> i'd have to look at what lzma in wasm looks like
<whitequark> there's lzma in stdlib so that's nice
<daveshah> 7zip decompression is ~1s so not too bad in the scheme of things
<whitequark> LZMA has BCJ filters
<whitequark> i wonder if bba could be adjusted to take advantage of those
<whitequark> it's pretty cursed, but
<daveshah> Possibly, many of the RelPtrs are to different locations but it might help a bit
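The gzip versus LZMA comparison discussed above can be tried with nothing but the Python standard library. The sketch below uses a synthetic blob of near-identical fixed-size records as a stand-in for a chipdb (an assumption; real bba data will compress differently), so the printed sizes and timing say nothing about the 94MB, 29MB, and 12.5MB figures quoted in the chat.

```python
import gzip
import lzma
import struct
import time

# Synthetic stand-in for a chipdb: many similar 12-byte records.
records = [
    struct.pack("<IIHH", i % 64, (i * 7) % 1024, i % 8, 0)
    for i in range(200_000)
]
blob = b"".join(records)

gz = gzip.compress(blob)            # DEFLATE, as used by gzip/zip
xz = lzma.compress(blob, preset=6)  # LZMA2 in an .xz container

# Time only the decompression, since that is what affects startup.
start = time.perf_counter()
restored = lzma.decompress(xz)
elapsed = time.perf_counter() - start

print(f"raw  {len(blob):>9} bytes")
print(f"gzip {len(gz):>9} bytes")
print(f"lzma {len(xz):>9} bytes, decompressed in {elapsed:.3f}s")
```

Running the same measurement on an actual chipdb file (read with `open(path, "rb").read()`) would show whether decompressing on the Python side is cheap enough for startup.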
<whitequark> wow, ecp5 chipdbs are even larger
<daveshah> Are they?
<whitequark> wait, i'm looking at the wrong number
<whitequark> daveshah: hm, the chipdbs for ecp5 are all super similar in size
<whitequark> by any chance is it possible to dedup them together?
<daveshah> Yes, it would be
<whitequark> hmmm
<whitequark> i really need to look into compression
<whitequark> daveshah: nextpnr-ecp5 works on wasm too!
<whitequark> got a json file i could use to benchmark it?
<daveshah> whitequark: this is picorv32 for 45k CABGA381, IO constraints embedded so no LPF needed
<daveshah> takes 9s on my laptop (i9)
<whitequark> daveshah: WTF
<whitequark> there's no difference
<daveshah> huh
<whitequark> in fact, native version takes slightly more user time
<whitequark> want me to yeet you a .wasm file?
<daveshah> sure
<whitequark> this is built from sha1 2692c6f6cce41d22847058c85715e8822feb9b87 in case that matters
<daveshah> just to check, what wasm runtime should I be using?
<whitequark> wasmtime
<whitequark> i have 0.15.0 which is actually kinda old
<whitequark> but you can get a release of it i think
<daveshah> installing from aur, which is giving me 0.16.0
<whitequark> also good
<whitequark> daveshah: can you reproduce it?
<daveshah> no
<daveshah> I am trying to get the chipdb to work
<daveshah> was just about to ask what EXTERNAL_CHIPDB_ROOT was set to?
<whitequark> oh sorry
<whitequark> this is a little bit involved and i forgot to explain how to do it
<daveshah> it's supposed to print it when it fails to open, but that is in an exception handler, which aiui doesn't work in wasm yet
<daveshah> annoyingly about the only exception in nextpnr
<whitequark> unfortunately not the only one
<whitequark> anyway, first apply this https://paste.debian.net/1148466/
<daveshah> the others should be totally inconsequential
<whitequark> then build with -DUSE_C_EMBED=ON
<whitequark> then $ wasmtime run nextpnr-ecp5.wasm --dir . --mapdir /share/nextpnr/ecp5::build-ecp5-wasi/ecp5/chipdbs --
<whitequark> where `build-ecp5-wasi` is just whatever build directory you have where the chipdbs are located
<whitequark> the bba-related cmake code makes me very sad and i will burn it later
<whitequark> with its ashes fertilizing something better
<whitequark> to answer your question directly -DEXTERNAL_CHIPDB_ROOT=/share/nextpnr
<daveshah> yep, working now, thanks
<daveshah> maybe 1s slower than native but effectively the same
<daveshah> nice
<whitequark> on multiple runs?
<daveshah> yes
<daveshah> oh, nvm, I think it was something in the background
<daveshah> now it is identical
<whitequark> cool, so the same as I observe
<daveshah> yep
<daveshah> whitequark: does wasm use a 32bit pointer type or am I reading something out of date
<daveshah> nextpnr is probably memory bandwidth/latency constrained so perhaps 32 bit pointers would explain the good performance
<whitequark> yep
<whitequark> there's a wasm64 proposal but it's vaporware for now i think
<whitequark> fun fact
<whitequark> if i compile the wasm file through wasm2c it'll probably be "faster than native" :p
<whitequark> daveshah: the weirdest part is the threads
<whitequark> which apparently aren't doing anything in nextpnr
<awygle> Damn nice
<daveshah> whitequark: they only speed up a small part of HeAP in the default settings
<daveshah> router2 benefits from threads but it has other issues for ECP5 so isn't enabled at the moment
<whitequark> ah
<whitequark> daveshah: about 15% faster than native with wavm
<gruetzkopf> ooh..kay
<awygle> is there a way to use 32 bit pointers on 64 bit x86-64
<awygle> I knew this once but forget now
<ZirconiumX> x32
<TD-Linux> it is pretty cursed and therefore inevitable someone on here is going to try it
<sorear> the ISA is not an obstacle here, you can just use 32-bit instructions to load and store addresses
<sorear> but the ABI is a can of worms
<sorear> if you want to use 32-bit pointers _exclusively_ in a single process, that's x32
<sorear> if you have a VM that reinvents the wheel on pointer handling and doesn't share data structures with anything else, you can make that work too, although it's a lot of work and I've only seen a few examples
<sorear> depending on what the code does you could rewrite it to use array indices
<awygle> yeah i was thinking array indices or mmap() a huge swath of memory and roll your own 32-bit pointers as offsets from one 64-bit base pointer
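The "32-bit pointers as offsets from one base" idea can be sketched as a linked list living inside a single buffer, with each link stored as a 32-bit offset instead of a native pointer. This is an illustration only: the `Pool` class, the node layout, and the `NULL` sentinel are invented for the example, and a `bytearray` stands in for the mmap()ed region.

```python
import struct

NODE = struct.Struct("<iI")  # (value: int32, next: uint32 offset from pool base)
NULL = 0xFFFFFFFF            # sentinel offset meaning "no next node"


class Pool:
    """One contiguous buffer; links are 32-bit offsets from its start."""

    def __init__(self):
        self.buf = bytearray()

    def alloc(self, value, next_off=NULL):
        # Append a node and return its offset, which serves as its "pointer".
        off = len(self.buf)
        self.buf += NODE.pack(value, next_off)
        return off

    def walk(self, off):
        # Follow the offsets until the sentinel is reached.
        while off != NULL:
            value, off = NODE.unpack_from(self.buf, off)
            yield value


pool = Pool()
head = NULL
for v in (3, 2, 1):
    head = pool.alloc(v, head)  # push each value onto the front of the list

values = list(pool.walk(head))
node_size = NODE.size
```

Each node takes 8 bytes here, where the same layout with a 64-bit link would need 12 bytes before padding, which is the kind of density gain the chat attributes to 32-bit pointers.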
<awygle> that x32 is not what x86 actually is annoys me deeply
<awygle> stupid neologisms lol
<ZirconiumX> See also: IA32 vs IA64
<awygle> surely wasm isn't limited to 4 gigs of memory per process tho? does it have segments or something?
<sorear> can we agree that IA32e is the worst of the group
<sorear> awygle: there are "wasm32" and "wasm64" targets
<sorear> wasm32 is limited to 4 gigs of memory
<sorear> nm that was mentioned already
<awygle> do the wasm64 targets use 64-bit pointers? does wasmtime support wasm64?
<sorear> called "vaporware for now I think" 3h ago
<awygle> o
<awygle> i missed that oops