marcan changed the topic of #asahi to: Asahi Linux: porting Linux to Apple Silicon macs | General project discussion | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Topics: #asahi-dev #asahi-re #asahi-gpu #asahi-offtopic | Keep things on topic | Logs: https://alx.sh/l/asahi
raster has quit [Quit: Gettin' stinky!]
odmir has quit [Remote host closed the connection]
odmir has joined #asahi
odmir has quit [Ping timeout: 276 seconds]
DrNapster has joined #asahi
DrNapster has quit [Client Quit]
odmir has joined #asahi
odmir has quit [Ping timeout: 250 seconds]
agnem has quit [Ping timeout: 240 seconds]
agnem has joined #asahi
odmir has joined #asahi
odmir has quit [Remote host closed the connection]
odmir has joined #asahi
agnem has quit [Ping timeout: 246 seconds]
odmir has quit [Ping timeout: 250 seconds]
agnem has joined #asahi
<marcan>
pipcet[m]: what I meant is that what I'm doing should work; if it doesn't it means I'm doing something wrong, and I want to find out what it is
<marcan>
since i'm getting faults I can just trace the cause of the problem with symbols/etc
<marcan>
"known good" bisection debugging is a valid strategy, but not worth it here, where I already have better visibility
phiologe has quit [Ping timeout: 276 seconds]
phiologe has joined #asahi
<amw>
If you are buried deep in exception traps perhaps it's work building support to trap (breakpoint?) earlier and earlier in the trap heap?
awesomebing1 has quit [Quit: ZNC 1.8.2+deb2+b1 - https://znc.in]
<marcan>
each EL1 abort ends up in python anyway right now, it's just that initially I was passing through page faults on the assumption that they were normal (they aren't, not this early)
<marcan>
so in fact the state right now is data abort on bad address -> hypercall hook -> m1n1 -> python
<marcan>
since I just made it break to the debugger on everything
<marcan>
was just changing one line of python :)
<marcan>
that's the beauty of this all, the silly fast test cycle
<marcan>
honestly the slowest part right now is just sending over the xnu kernel
<marcan>
(which is like a 100+MB image)
VinDuv has joined #asahi
roxfan2 has joined #asahi
roxfan has quit [Ping timeout: 246 seconds]
jeffmiw has joined #asahi
jeffmiw has quit [Ping timeout: 260 seconds]
<svenpeter>
So you’re saying we actually need to bring up the usb3 phy? :P
jeffmiw has joined #asahi
jeffmiw_ has joined #asahi
<amw>
Sorry, I meant it's worth building support..., definitely like the idea that you can start trapping the page faults and only break on the very specific faults you are interested in
<amw>
Perhaps it's worth adding a tracing to help zero in on where it hangs. e.g. record the PC every certain time period or watch points?
jeffmiw_ has quit [Remote host closed the connection]
jeffmiw_ has joined #asahi
jeffmiw_ has quit [Remote host closed the connection]
<amw>
It's probably too much work but perhaps remote gdb protocol could allow easy stack dumps, breakpoints, watch points, but I don't know much about the effort required.
<sven>
the protocol itself is simple
<sven>
the limitation is the hv currently
<sven>
once marcan can do everything that gdb requires from python all that's left is a few 100 lines of glue code to make gdbserver.py
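For reference, the framing layer of the GDB remote serial protocol mentioned above can be sketched in a few lines. A packet is `$`, the payload, `#`, and a two-digit hex checksum equal to the sum of the payload bytes modulo 256. This is only a minimal sketch: the function names are hypothetical, it is not code from m1n1, and it ignores escaping, run-length encoding, and the `+`/`-` acknowledgements.

```python
def rsp_checksum(payload: bytes) -> int:
    """Sum of all payload bytes, modulo 256."""
    return sum(payload) & 0xFF


def rsp_encode(payload: bytes) -> bytes:
    """Wrap a payload as $<payload>#<two hex digits>."""
    return b"$" + payload + b"#" + b"%02x" % rsp_checksum(payload)


def rsp_decode(packet: bytes) -> bytes:
    """Validate the framing and checksum, then return the payload."""
    if len(packet) < 4 or packet[:1] != b"$" or packet[-3:-2] != b"#":
        raise ValueError("malformed packet")
    payload = packet[1:-3]
    if int(packet[-2:].decode("ascii"), 16) != rsp_checksum(payload):
        raise ValueError("bad checksum")
    return payload


if __name__ == "__main__":
    print(rsp_encode(b"g"))  # the "read registers" request: b'$g#67'
```

The bulk of a real gdbserver would be the command handlers behind this layer (register and memory access, breakpoints), which depend on what the hypervisor can do.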
<marcan>
sven: actually compressing is about 1/3 of the time and calculating the checksum in python is about 1/3 of the time... and that's after switching from lzma to gzip :D
<marcan>
so it's pretty much fail all around
<j`ey>
can you just cache the compression for now..
<marcan>
and yeah the main thing missing is actually using the debug features so you can set breakpoints and such
<marcan>
well I could stash the image somewhere in RAM that doesn't get touched, but right now I reboot the system after runs anyway to clean up properly so that wouldn't really help
<marcan>
I could append it as a m1n1 payload just to have it lying around there but... eh
<marcan>
it's not *that* bad waiting a few seconds for it to load
<marcan>
(it's still faster than the old linux kernels over serial)
<j`ey>
it just "looks slow"
<j`ey>
all those dots
<sven>
phew, no usb3 phy then :)
<eta>
(use zstd maybe? >_<)
<marcan>
most of the time is *before* and *after* the dots!
<marcan>
and yeah but then the checksum is the bottleneck
<j`ey>
hehe
<marcan>
and uh I really don't feel like implementing a cython module just for that :-)
<j`ey>
just upload the first 10mb of macos :P
<marcan>
wonder if I can somehow use numpy or something to implement the checksum... or just disable checksums over USB anyway
<marcan>
I'm not sure chopping a kernel at the 10MB mark is a good idea :P
<j`ey>
marcan: are you patching the macOS image still?
<eta>
marcan: are you compressing in Python directly?
<svenpeter>
Maybe try pypy?
<svenpeter>
That should speed up the checksum nicely
* eta
can attest to just giving up and piping the data through a `pigz` subprocess
<marcan>
eta: yeah I was thinking of shelling out to pigz for the gzip part too
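Shelling out to `pigz` for the gzip step could look roughly like the sketch below. The helper name is hypothetical, and the fallback to Python's built-in `gzip` module when `pigz` is not installed is an added assumption, not something discussed in the channel.

```python
import gzip
import shutil
import subprocess


def gzip_compress(data: bytes) -> bytes:
    """Compress to gzip format, using pigz (parallel gzip) when available."""
    pigz = shutil.which("pigz")
    if pigz is None:
        # Assumed fallback: single-threaded compression in-process.
        return gzip.compress(data)
    # With no file arguments pigz reads stdin; -c writes to stdout.
    result = subprocess.run(
        [pigz, "-c"], input=data, stdout=subprocess.PIPE, check=True
    )
    return result.stdout
```

Either path produces an ordinary gzip stream, so the receiving side does not need to know which one was used.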
<marcan>
then the checksum is left. if it's dumb enough maybe I can numpy it
<marcan>
j`ey: not right now
<eta>
that sped up my gzip code like 6x vs the common lisp library I was using for this particular project
<marcan>
I think the checksum is linear? can probably get away with precomputing some powers of 31337 mod 2**32 and parallelizing this in numpy
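The linearity idea can be sketched as follows. It assumes a checksum of the form s = (s * 31337 + byte) mod 2**32; the real m1n1 checksum may use a different seed or transform each byte first, so treat the exact recurrence, the function names, and the block size as assumptions. Unrolling the recurrence over n bytes gives s_n = s_0 * 31337**n + sum(b_i * 31337**(n-1-i)), which is a dot product against precomputed powers.

```python
import numpy as np

M = 31337
MASK = 0xFFFFFFFF


def checksum_loop(data: bytes, seed: int = 0) -> int:
    """Reference implementation: one multiply-add per byte in pure Python."""
    s = seed
    for b in data:
        s = (s * M + b) & MASK
    return s


def checksum_numpy(data: bytes, seed: int = 0, block: int = 1 << 16) -> int:
    """Same result, computed block by block with precomputed powers of M."""
    # powers[k] == M**k mod 2**32; uint32 array arithmetic wraps modulo 2**32.
    powers = np.full(block + 1, M, dtype=np.uint32)
    powers[0] = 1
    powers = np.cumprod(powers, dtype=np.uint32)

    buf = np.frombuffer(data, dtype=np.uint8)
    s = seed
    for off in range(0, len(buf), block):
        chunk = buf[off:off + block].astype(np.uint32)
        n = len(chunk)
        weights = powers[n - 1::-1]  # M**(n-1), ..., M**1, M**0
        part = int(np.sum(chunk * weights, dtype=np.uint32))
        # Fold the running state forward by n steps, then add this block.
        s = (s * int(powers[n]) + part) & MASK
    return s
```

Processing in blocks keeps the power table small and the temporary arrays bounded; the per-block folding step is what carries the running state across block boundaries.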
<j`ey>
whats the checksum actually checking?
<marcan>
it's just a checksum over all uart messages
<j`ey>
I mean, what bit are you worried will go wrong?
<j`ey>
ah
<marcan>
which is completely redundant for USB because USB already has checksums
<marcan>
but, well, it's the same protocol
<marcan>
oh also, there's a ZLP issue with USB. there is *always* a ZLP issue with USB
<sven>
marcan: just be glad it's not a crc :-P
<marcan>
I think I haven't had a USB project that didn't screw that up
<marcan>
need to fix it
<marcan>
(if you read a multiple of 512 bytes of memory it hangs)
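The general ZLP rule behind this kind of hang: a USB bulk transfer ends when a packet shorter than the endpoint's maximum packet size arrives, so a transfer whose length is an exact multiple of that size (512 bytes for high-speed bulk endpoints) needs a trailing zero-length packet unless the receiver already knows the exact length. The sketch below only illustrates that rule; the function name is hypothetical and it does not describe where the actual bug sits.

```python
def packet_sizes(length: int, max_packet: int = 512) -> list[int]:
    """Packet sizes for a bulk transfer terminated by a short packet."""
    full, rest = divmod(length, max_packet)
    # The last entry is the terminating short packet. It is 0 (a ZLP)
    # whenever length is an exact multiple of max_packet.
    return [max_packet] * full + [rest]


if __name__ == "__main__":
    print(packet_sizes(1000))  # [512, 488]
    print(packet_sizes(1024))  # [512, 512, 0]
```

If the sender omits that final zero-length packet, a receiver that asked for more data than was sent keeps waiting.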
<VinDuv>
The builtin zlib module seems to have support for adler32 and crc32 (implemented in C) but I suppose these wouldn’t be compatible with the current checksum you’re using
<marcan>
yeah, I try to keep backwards compat to avoid forcing people to reinstall over old versions during development
<marcan>
but I think I can implement this with a few numpy primitives (which might be even faster than plain C if they vectorize)
PendulumSwinger has quit [Ping timeout: 260 seconds]
jeffmiw_ has joined #asahi
vimal has joined #asahi
jeffmiw_ has quit [Ping timeout: 260 seconds]
vimal has quit [Quit: Leaving]
<amw>
If the image isn't changed much after a run, you could implement an rsync-like update in place, sending only the blocks with a changed checksum?
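A block-level version of that idea might look like the sketch below. It assumes the previous copy of the image is still intact on the target and that blocks stay at fixed offsets; real rsync also uses rolling checksums to cope with shifted data. The function names, block size, and choice of SHA-256 are all assumptions.

```python
import hashlib


def block_digests(image: bytes, block_size: int = 1 << 20) -> list[bytes]:
    """One digest per fixed-size block of the image."""
    return [
        hashlib.sha256(image[off:off + block_size]).digest()
        for off in range(0, len(image), block_size)
    ]


def changed_blocks(old_digests: list[bytes], new_image: bytes,
                   block_size: int = 1 << 20) -> list[tuple[int, bytes]]:
    """(offset, data) pairs for blocks that differ from the previous upload."""
    out = []
    for i, digest in enumerate(block_digests(new_image, block_size)):
        if i >= len(old_digests) or old_digests[i] != digest:
            off = i * block_size
            out.append((off, new_image[off:off + block_size]))
    return out
```

The host would keep the digests from the last upload and send only the returned blocks.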
matt6 has quit [Ping timeout: 252 seconds]
matt6 has joined #asahi
linkmauve has joined #asahi
adamcstephens has quit [Read error: Connection reset by peer]
adamcstephens has joined #asahi
raster has joined #asahi
<marcan>
amw: I reboot the thing after each boot, and I would kind of expect RAM to be cleared or shuffled after a reboot
linkmauve has quit [Ping timeout: 240 seconds]
<marcan>
< amw> Perhaps it's worth adding a tracing to help zero in on where it hangs. e.g. record the PC every certain time period or watch points? <- sorry, missed this, but that's how I found the PAN thing
odmir has joined #asahi
<marcan>
I was setting up a timer to raise a FIQ after a certain time
<marcan>
setting it to 1 cycle was the closest I got to single stepping and found me the msr pan, #1
<marcan>
I still want to do proper single stepping etc, but so far this quick hack has worked well enough
odmir has quit [Ping timeout: 268 seconds]
choozy has joined #asahi
<amw>
On checksum backwards compatibility: you could always add a new command extension that asks whether partially checksummed commands are supported; if yes, send the new style (checksummed command + unprotected data), otherwise use the old-style fully protected command.
<amw>
I am thinking proxyclient/proxy.py checksum() used to protect commands is the problem?
<amw>
The two-part version would protect only the command, not the bulk data; or perhaps just an unprotected blob operation, or a switch to a cheaper builtin checksum
<amw>
Sorry just thinking out loud ideas not complete solutions :-)
linkmauve has joined #asahi
maknho has quit [Quit: WeeChat 2.3]
maknho has joined #asahi
maknho has quit [Client Quit]
maknho has joined #asahi
linkmauve has quit [Ping timeout: 250 seconds]
VinDuv has quit [Quit: Leaving.]
ephe_meral1 has joined #asahi
<marcan>
yeah, I thought of a flag like that for bulk data :)
<marcan>
but really if I speed it up enough not to matter, who cares :-)
<marcan>
also, there is another use case (which in practice I have hit more often than actual line noise): concurrency issues interleaving data and messages
<marcan>
with SMP etc
<marcan>
so maybe I'll add checksum disabling once all that is properly SMP-safe :-)
<arnd>
I've finally gotten to the point of trying to install on the Mac mini, going through the newbie setup at https://github.com/AsahiLinux/docs/wiki/Developer-Quickstart, but I'm failing at the basic macOS install. I got an external TB3 drive so I can leave the working installation on the internal drive alone
Bublik has quit [Ping timeout: 260 seconds]
<arnd>
Partitioning and installing macos once works great, but after I install macos a second time, the first installation on the same drive fails
Bublik has joined #asahi
<arnd>
I tried erasing the first install on the external drive and reinstalling to it. The reinstall also worked, but after rebooting, it still fails with "The version of macOS on the selected disk needs to be reinstalled. Use Recovery to reinstall macOS or select another startup disk."
<arnd>
The internal drive still boots, and the "Linux" partition on the external drive (which contains the unmodified MacOS 11.3.1) also boots, but the "Macintosh HD" on the external drive does not
<VinDuv>
The page you linked indicates that even if you install macOS on external storage, the macOS kernel will be installed on internal storage (search for “M1 machines cannot boot from external storage.”)
<VinDuv>
Maybe that does some weird things if you have multiple external installs?
<arnd>
Could be. I'm trying a third macos partition now to see which ones and how many I can boot after that
<balrog>
roxfan2: the instructions say that they do
<balrog>
(support two-machine debugging)
<roxfan2>
cool if so
roxfan2 is now known as roxfan
jeffmiw has quit [Ping timeout: 246 seconds]
jeffmiw has joined #asahi
jeffmiw has quit [Ping timeout: 260 seconds]
jeffmiw has joined #asahi
jeffmiw has quit [Read error: Connection reset by peer]
<balrog>
roxfan: the biggest caveat I see is that the instructions claim you have to use built-in ethernet or a very small list of tbt ethernet interfaces
<roxfan>
yeah probably the kbd code uses a built-in driver
<roxfan>
*kdb (kernel debugger)
<balrog>
well, to be clear, the readme claims that only built-in ethernet works on Apple Silicon Macs
<balrog>
but laptops don't have that, and doesn't the Mac Mini use a third party chipset anyway?
<balrog>
So I'd expect a TBT adapter with the same or similar chip to work
<balrog>
Also I've heard that the 10GbE Mac Mini is now available
<balrog>
(that's Aquantia AQC107/113 AIUI)
odmir has joined #asahi
marvin24_ has joined #asahi
marvin24 has quit [Ping timeout: 276 seconds]
odmir has quit [Ping timeout: 240 seconds]
marvin24_ has quit [Ping timeout: 250 seconds]
marvin24 has joined #asahi
VinDuv1 has joined #asahi
VinDuv has quit [Ping timeout: 240 seconds]
Chainsaw has quit [Remote host closed the connection]
VinDuv1 has quit [Quit: Leaving.]
ephe_meral1 has quit [Ping timeout: 265 seconds]
Bublik has quit [Ping timeout: 252 seconds]
Bublik has joined #asahi
TheJollyRoger has quit [Ping timeout: 240 seconds]
<marcan>
arnd: I know they still have some, um, breakage when moving external installs between devices; I wouldn't be surprised if multiple OSes on one external drive is broken
<marcan>
I don't think anyone has tested that so far
<marcan>
the problem is all of this has to be provisioned on the internal boot policy and the kernels go in the iSC partition, and... there's a lot of surface area for bugs here
modrobert has quit [Ping timeout: 265 seconds]
<marcan>
maybe you can just stick to one OS externally (the "Linux" thing); are you planning to do anything with the Mac OS install on the external drive?
<marcan>
FWIW for mmiotrace/etc I plan to use the "Linux" partition; I haven't documented this yet but I expect people who want to play along with that to leave the Mac OS install under Linux and boot that when needed (probably best to think of it as a throwaway OS)
<marcan>
not the other partition (though you could, if you install m1n1 there too, but it seems kind of redundant at that point)
<marcan>
(I don't want to cross-boot from one partition to another, that is bound to cause issues)
<arnd>
marcan: the MacOS on the internal disk isn't mine, it has to stay functional and I don't have root access on it. I have no need for MacOS on the external drive other than to try to follow your how-to document.
<arnd>
For some reason I assumed I had to build m1n1 under MacOS, but that is clearly not the case
<marcan>
not at all, I've never even tested that
<marcan>
m1n1 builds under linux
<marcan>
I assume it builds under macos if you have an ELF toolchain installed :)
<marcan>
the point of the dual-boot in the guide is exactly what you want, to leave a functional macOS for normal tasks (or normal macOS dev)
<marcan>
(and also for upgrades and such, so you can upgrade 1TR that way)
<arnd>
My current theory of what went wrong with my external drive is that this all broke down because of nonmatching credentials: I had to provide an Admin password for the internal drive when installing to the external drive, but then after booting into that, the new Admin account lacked permissions to change the boot setup back to another external partition
<marcan>
ah, but have you tried changing boot partition via 1TR?
<marcan>
I think that is supposed to do the provisioning properly
<arnd>
Yes
<marcan>
and yes the big issue with all of this is the credentials; each OS's credentials also end up in the SEP data store, which is how 1TR authorizes things like replacing the kernel...
<marcan>
that's actually something I need apple to fix for our end-user install flow, at least with my current plan
<marcan>
(the drive adoption thing)
<arnd>
After a partition gets messed up, 1TR also complains about missing permissions, in various ways
<marcan>
hm, interesting
<marcan>
did you first-boot into those OSes?
<marcan>
(to set up the first user)
<marcan>
I wouldn't be surprised if booting off of an external drive also forbids internal boot mode changes per policy in some way
<arnd>
1TR says something like "this partition has no privileged user, pick one now" when trying to boot back into a partition that stopped working
<marcan>
(this all sounds like legit bugs that could be reported to apple fwiw)
<marcan>
and then you can't pick?
<marcan>
(and ahh yeah, this sounds related to machine owner stuff)
<arnd>
I always did the First-Boot setup, and that worked, but just couldn't get back to the other partition after that
<marcan>
fwiw I don't fully understand this myself but I know there are dragons all around... :-)
<marcan>
wish apple documented it more clearly
<marcan>
also, from 1TR, have you tried both the direct boot picker, and the startup disk selection under options?
<marcan>
I suspect they are not the same
<arnd>
I suspect they wouldn't really provide support for users on external drives, afaik it only works by accident on TB3 disks but should be disabled on removable drives otherwise
<marcan>
I'm pretty sure booting macos from an external drive is supposed to work in general
<marcan>
even USB or whatever
<marcan>
actually let me try that, I just found my USB SSD...
<marcan>
... it's just kind of broken in various ways because they rushed this M1 thing a *lot*
<marcan>
but also, I *want* them to fix this stuff as it overlaps with features I need for our installer :-)
<arnd>
I did try various boot pickers (from the running OS, from 1TR, from long-pressing the power button without 1TR, and from the failed-boot screen)
<marcan>
the basic idea for our installer I have is to make it look like a foreign external drive (just internally), which means credential adoption needs to work
<marcan>
fwiw long-press is still 1TR AIUI, the boot picker is just a fullscreen macos app
<arnd>
I think I got into different scenarios there, but could never boot more than one of the external partitions
<marcan>
I think "options" just does a UI/session restart into the desktop
<marcan>
so I call that menu 1TR too
<marcan>
(just boot picker)
<marcan>
let me try to reproduce what you see
<arnd>
There was clearly a difference between the long-press and the 1TR menu, in that the latter does not show USB drives, only TB3/nvme
<marcan>
huh, interesting
<marcan>
now I'm wondering if it thinks nvme is internal and that's what causes it to be really confused
<marcan>
obviously iBoot can't boot from TB3, there's no way it initializes that
<marcan>
so those drives still need to be treated as external for boot policy purposes
<arnd>
It shows a picture of an external drive, but a different color compared to the USB one
<arnd>
Anyway, I should go sleep now, will try more tomorrow