marcan changed the topic of #asahi to: Asahi Linux: porting Linux to Apple Silicon macs | General project discussion | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Topics: #asahi-dev #asahi-re #asahi-gpu #asahi-offtopic | Keep things on topic | Logs: https://alx.sh/l/asahi
mrkajetanp has joined #asahi
modwizcode has quit [Ping timeout: 246 seconds]
modwizcode has joined #asahi
KindOne has quit [Ping timeout: 260 seconds]
mrkajetanp has quit [Quit: WeeChat 3.1]
KindOne has joined #asahi
odmir has quit [Remote host closed the connection]
<pipcet[m]>
and we could pass through everything but USB, I think.
<marcan>
dottedmag: kvm on linux on m1 should already work better in some ways than the hypervisor stuff on macos (:)), but that's to virtualize linux/windows
<marcan>
pipcet[m]: the problem with "passing through" is you need to involve the IOMMUs
<marcan>
you can't just "pass through" hardware under KVM like I can under my toy hypervisor, there needs to be DMA and IRQ remapping
<marcan>
and not all the hardware in the M1 runs through the same kind of IOMMU
<marcan>
once we better understand the hardware, we'll have a better idea of what can be done
<marcan>
in particular the GPU is a big question mark, but there's reason to believe Apple is interested in virtualizing macos, so it's possible the hardware is built to enable this kind of feature
<marcan>
also, to run macos unmodified, KVM may need quite a bit of patching to support custom M1 features; at the very least we need to set everything to trap, but that might be very slow as it causes vmexits for tons of stuff
<marcan>
so that might be useful for experiments, but possibly not for end users; to do it properly KVM would need more M1-specific stuff and that's harder to implement and upstream
<marcan>
it's certainly something to investigate in the future, but it'll probably come quite a bit further down the line
<marcan>
speaking of hypervisors, going to stream some more of that in a few minutes
<ar>
this hypervisor is super cool, and seems like the kind of thing that would also be useful for devices other than just the ones based on m1
<marcan>
yeah, it's already useful for debugging linux to some extent... and now I'm conflicted, because m1n1 is quite apple silicon-centric by design
<marcan>
I really don't want to turn this into a big portable bootloader thing, but on the other hand the HV stuff is eminently useful elsewhere
<marcan>
maybe just the HV bits can be extracted into a stand-alone thing you can plug into another existing bootloader more easily?
<marcan>
although e.g. the HV page tables all assume 16K pages which few ARM systems support AIUI, so...
jeffmiw has quit [Remote host closed the connection]
jeffmiw has joined #asahi
jeffmiw has quit [Ping timeout: 268 seconds]
<pipcet[m]>
marcan: I have a plan for that (start macos physically, teach qemu to identity-map /dev/mem for the relevant ranges)
<marcan>
pipcet[m]: it's not going to work unless you make all of macos' RAM also identity mapped
<pipcet[m]>
sorry, I don't understand. qemu would still use its own page tables to translate simulated virtual to simulated physical, which would be equal to actual physical.
<pipcet[m]>
but that's post-HV fine tuning, really.
<marcan>
you would need to restrict the RAM that Linux is allowed to use, and give macos a contiguous chunk of physical RAM
<pipcet[m]>
yes, I'm already doing that.
<marcan>
you also need to emulate AIC
<pipcet[m]>
or modify the driver to coexist peacefully
<pipcet[m]>
or use a linux version without the actual AIC driver.
<pipcet[m]>
plenty of ways of doing that.
<marcan>
linux needs AIC
<marcan>
coexistence is unlikely to work
<pipcet[m]>
no, it needs FIQ
<pipcet[m]>
which doesn't go through AIC
<marcan>
it needs AIC if you want any non-polled IO
<marcan>
if you're going to cripple Linux to this point, I fail to see what that buys you over just using the HV in m1n1
<marcan>
what are you going to do, patch the nvme driver etc to use polling instead of IRQs?
<pipcet[m]>
I don't think coexistence is going to be much of a problem, actually.
<pipcet[m]>
why would I need the nvme driver?
<marcan>
presumably you want linux to have some kind of working storage
<marcan>
if not nvme, then USB
<pipcet[m]>
all Linux needs is a usb port, and we already have polling for that.
<marcan>
for serial, yes
<marcan>
presumably you want a filesystem
<marcan>
linux also doesn't have a polling driver for usb
<marcan>
m1n1 does
<pipcet[m]>
ramfs
<marcan>
... so you're loading a big blob of linux and qemu and everything into a tmpfs... without any network, storage, or anything... what's the point then?
<marcan>
you lose any benefits you'd get from doing this under linux
<pipcet[m]>
and we don't even need usb for a minimal setup, we can communicate with macos through shared memory and do the i/o from the emulated macos
<pipcet[m]>
yes, I'm not convinced this approach is worth it, but it's not totally impossible
<marcan>
it's possible, I just don't really see the point :)
<sven>
fwiw, i don't think at least genter/gexit will trap as long as GXF is disabled.
<marcan>
when you're giving the guest access to 95% of the hardware directly, having a massive OS in the way without being able to use most of its features just increases the bug/problem surface
<sven>
they're just undefined then
<marcan>
sven: what if I set a bunch of HACR_EL2 bits?
<marcan>
(like I did)
<marcan>
well, let's find out
<sven>
i think i checked and they don't. but please double check :)
<pipcet[m]>
marcan: again, I'm not convinced it's worth it or working on it, but someone asked and it's totally possible, if possibly harder than just getting the HV to work properly.
<marcan>
sven: yeah, seems they don't :(
<sven>
so you might be able to trap them after enabling GXF but ugh
<sven>
marcan: sooo.... you could just replace genter/gexit in the kernel text with hvc instructions while loading it :D
<sven>
:D
<sven>
you know you want to! :P
<sven>
:>
<j`ey>
might be worth getting the code into place for replacements, might be other use cases too!
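The instruction-replacement idea suggested above (swap genter/gexit for hvc while the kernel image is being loaded) can be sketched roughly as follows. This is only an illustration, not m1n1's actual loader code. The GENTER and GEXIT opcode values and the hvc immediates 0x10 and 0x11 are assumptions that do not come from this log, and a real implementation would have to restrict the scan to executable sections so that data words are not patched by accident.

```python
import struct

# Assumed opcode values for Apple's genter/gexit; they are not stated in this
# log and should be verified before use.
GENTER = 0x00201420
GEXIT = 0x00201400


def hvc(imm16):
    """Encode the AArch64 `hvc #imm16` instruction word."""
    if not 0 <= imm16 <= 0xFFFF:
        raise ValueError("hvc immediate must fit in 16 bits")
    return 0xD4000002 | (imm16 << 5)


def patch_words(image, replacements):
    """Replace matching aligned 32-bit little-endian words in place.

    `image` is a bytearray, `replacements` maps old word -> new word.
    Returns the list of byte offsets that were patched.
    """
    patched = []
    for off in range(0, len(image) - len(image) % 4, 4):
        (word,) = struct.unpack_from("<I", image, off)
        if word in replacements:
            struct.pack_into("<I", image, off, replacements[word])
            patched.append(off)
    return patched


# Hypothetical usage: a tiny fake "kernel text" of nop, genter, nop, gexit.
NOP = 0xD503201F
image = bytearray(struct.pack("<4I", NOP, GENTER, NOP, GEXIT))
offsets = patch_words(image, {GENTER: hvc(0x10), GEXIT: hvc(0x11)})
```

Giving each replaced instruction its own hvc immediate lets the hypervisor tell from the exception syndrome which original instruction it is standing in for.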
<j`ey>
good suggestion from youtube chat to just rewrite vbar_el1
vimal has joined #asahi
jeffmiw has joined #asahi
jeffmiw has quit [Ping timeout: 268 seconds]
vimal has quit [Ping timeout: 260 seconds]
vimal has joined #asahi
raster has joined #asahi
choozy has joined #asahi
idt23[m] has joined #asahi
klaus has quit [Ping timeout: 252 seconds]
Bublik_ has joined #asahi
Bublik has quit [Ping timeout: 265 seconds]
klaus has joined #asahi
odmir has joined #asahi
odmir_ has joined #asahi
jamadazi has joined #asahi
odmir has quit [Ping timeout: 260 seconds]
odmir_ has quit [Ping timeout: 260 seconds]
<jamadazi>
hi guys! there hasn't been a public update post on the project for a long time now (a couple of months?), how's it going? has there been much progress on the project since the blog post update?
<jamadazi>
i know those initial patches were accepted into mainline, but i don't know anything else
<j`ey>
jamadazi: there's a stream right now, about writing a hypervisor
<j`ey>
which will be used to help understand more of the hw
<marcan>
duh
<marcan>
the hv isn't even returning from the last exception
<marcan>
it's not the guest borking me, it's a bug somewhere
<marcan>
TTY> hv_exc_sync
<marcan>
TTY> ret
<marcan>
Pass: mrs x13, s3_5_c15_c4_0 = d
<marcan>
TTY> hv_exc_sync
<marcan>
Skip: msr s3_5_c15_c4_0, x13 = d
<marcan>
well that's weird on a skip...
maknho has joined #asahi
maknho____ has quit [Ping timeout: 260 seconds]
<marcan>
ah no, I think that's just the USB buffering
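The register name in the trace above, s3_5_c15_c4_0, is the generic AArch64 spelling for a system register that has no architectural name: the fields are op0, op1, CRn, CRm and op2. A minimal sketch of how a trap handler's tooling might split such a name and rebuild the MRS/MSR instruction word is shown below. The function names are hypothetical and are not taken from m1n1.

```python
import re

_SYSREG = re.compile(r"^s([0-3])_([0-7])_c(\d+)_c(\d+)_([0-7])$")


def parse_sysreg(name):
    """Split a generic name like 's3_5_c15_c4_0' into (op0, op1, CRn, CRm, op2)."""
    m = _SYSREG.match(name.lower())
    if not m:
        raise ValueError(f"not a generic sysreg name: {name!r}")
    op0, op1, crn, crm, op2 = map(int, m.groups())
    if crn > 15 or crm > 15:
        raise ValueError("CRn and CRm must be in the range 0..15")
    return op0, op1, crn, crm, op2


def encode_sysreg_access(name, rt, read):
    """Encode `mrs x<rt>, <name>` (read=True) or `msr <name>, x<rt>` (read=False)."""
    op0, op1, crn, crm, op2 = parse_sysreg(name)
    if op0 < 2:
        raise ValueError("MRS/MSR (register) only encode op0 values 2 and 3")
    if not 0 <= rt <= 31:
        raise ValueError("rt must be in the range 0..31")
    base = 0xD5300000 if read else 0xD5100000
    return (base | ((op0 - 2) << 19) | (op1 << 16) | (crn << 12)
            | (crm << 8) | (op2 << 5) | rt)


fields = parse_sysreg("s3_5_c15_c4_0")
mrs_word = encode_sysreg_access("s3_5_c15_c4_0", 13, read=True)
msr_word = encode_sysreg_access("s3_5_c15_c4_0", 13, read=False)
```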
jeffmiw has joined #asahi
modrobert has quit [Read error: Connection reset by peer]
m0drobert has joined #asahi
m0drobert is now known as modrobert
<marcan>
it was PAN.
<marcan>
didn't realize PAN leaks up to EL2
<j`ey>
but doesnt that show as an exception?
VinDuv has quit [Quit: Leaving.]
VinDuv has joined #asahi
klaus has quit [Ping timeout: 260 seconds]
KindOne has quit [Ping timeout: 268 seconds]
klaus has joined #asahi
<marcan>
j`ey: EL2 faults
<marcan>
the bit isn't cleared when entering EL2
<j`ey>
Ah
<marcan>
and since m1n1 maps everything as EL0-accessible, EL2 can't access any data, not even the stack
<marcan>
so everything explodes
<marcan>
setting pan to 0 as the first instruction of the EL2 handlers fixes it
<marcan>
before any stack accesses
<kettenis>
short series with the proposed pinctrl bindings sent to the appropriate mailing lists
<kettenis>
finally got things in a state where all the checking tools are happy with them
<kettenis>
we should try to make some progress on the clock bindings as well though
<svenpeter>
Ugh, yes. I’ve been meaning to send the initial dumb version where a page is mapped for each clock for a while now
VinDuv has left #asahi [#asahi]
raster has quit [Quit: Gettin' stinky!]
Spectrejan[m] has joined #asahi
<kettenis>
Ah, but I may have botched the sending by effectively spoofing using my @openbsd.org address
<kettenis>
is it rude to send them again with that fixed?
VinDuv has joined #asahi
jeffmiw_ has joined #asahi
<Emantor>
kettenis: I'd wait for comments/Acks first. As long as the patches are fine I wouldn't bother to send again.
c4r1c4[m] has joined #asahi
jeffmiw_ has quit [Ping timeout: 240 seconds]
<kettenis>
well, they'll probably not make it to the mailing lists this way
<sven>
i received them fwiw
<jn__>
the pinctrl bindings made it to LKML at least
odmir has quit [Remote host closed the connection]
odmir has joined #asahi
agnem has joined #asahi
<marcan>
so macos is panicking now... trying to figure out how to decode the panic from the hypervisor, since serial isn't initialized yet
<kettenis>
progress, I guess?
odmir has quit [Remote host closed the connection]
<yrlf>
marcan: so if I understand it correctly, it prints the panic to the framebuffer, but macos died before it hits serial?
<j`ey>
i dont think the panic is getting to the framebuffer either
<pipcet[m]>
kettenis: FWIW, gmail classified your messages as spam, probably because of the spoofing thing
<yrlf>
one hack would be to try to find the panic function in macos and hook that
<yrlf>
otherwise, what options are there? OCR the framebuffer?
<yrlf>
j`ey: damn, then that's _really_ early
<yrlf>
(didnt see your message before I sent mine)
<j`ey>
I'm just guessing, let's see what marcan says :P
<marcan>
yrlf: there is no framebuffer yet either
<marcan>
I already have the panic hooked because it triggers a debugger trap and I have all exceptions hooked
<marcan>
I'm writing a panic decoder in python now
<yrlf>
ahh, yeah, doing a debugger trap on panic definitely is something that makes sense for them to do
<yrlf>
this kind of "semihosted" way of doing development with python and m1n1 cooperating definitely makes a lot of things easier
<marcan>
xnu's va_list doesn't seem to match the arm standard one... :/
<kettenis>
yeah, apple's arm64 ABI is "different"
<marcan>
ah, it only uses stack args apparently
c4r1c4[m] has left #asahi ["User left"]
zkrx has quit [Ping timeout: 240 seconds]
zkrx has joined #asahi
odmir has joined #asahi
jeffmiw has quit [Ping timeout: 240 seconds]
<marcan>
Invalid kernel stack pointer (probable corruption). at pc 0xfffffe00113dd374, lr 0xc1ff7e00112837e8 (saved state: 0xfffffe0015083cb0)
<marcan>
ah, maybe this is a PAC thing; it's entirely possible the PAC registers are VHE'd and I'm passing through to the wrong ones
Bublik has joined #asahi
zkrx has quit [Ping timeout: 268 seconds]
Bublik_ has quit [Ping timeout: 252 seconds]
<kettenis>
that lr value looks wrong ;)
<marcan>
I know
<marcan>
PAC codes go in the upper bits
<marcan>
so it might be a legitimate PAC'ed LR
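As a rough check of the "PAC codes go in the upper bits" reading, the signature bits can be stripped from the reported lr by sign-extending from bit 55. The sketch below assumes a 47-bit kernel virtual address range. That width is a guess that is not confirmed anywhere in this log, chosen only because the result then lands in the same region as the pc from the panic message. On real hardware the XPACI instruction does this properly.

```python
MASK64 = (1 << 64) - 1


def strip_pac(ptr, va_bits=47):
    """Drop the assumed PAC bits of a 64-bit pointer.

    Bit 55 selects the upper (kernel) or lower (user) address range; every
    bit from `va_bits` upward is refilled with that value. `va_bits=47` is
    an assumption, not a value taken from the log.
    """
    low_mask = (1 << va_bits) - 1
    low = ptr & low_mask
    if ptr & (1 << 55):
        return low | (MASK64 & ~low_mask)
    return low


lr = 0xC1FF7E00112837E8  # lr value from the panic message above
stripped = strip_pac(lr)
print(hex(stripped))
```

Under that assumption the stripped lr is an ordinary kernel text address close to the faulting pc, which is consistent with it being a legitimately signed return address rather than a corrupted one.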
zkrx has joined #asahi
<marcan>
"Invalid kernel stack pointer" is supposedly a stack pointer issue though
<marcan>
but the actual faulting instruction there is a null deref
<marcan>
and it's a null deref reading from some percpu/thread info stuff
<marcan>
I'm starting to wonder how many levels deep into a fault I am
<marcan>
(for reference: so far this is kernel panic -> undefined instruction aka debugger trap -> hvc hook -> m1n1)
<marcan>
but it looks like the panic is a stack check that happened due to a null deref
<marcan>
I think this might be a recursive fault triggering a stack overflow
<marcan>
yup so the fault chain currently is
<marcan>
a data abort on a possibly bad but not null address -> 15 frames of recursive null pointer dereferences due to the thread context not being fully initialized -> panic due to stack overflow -> debugger break -> hypercall hook -> m1n1 -> python exception handler
* eta
blinks
solarkraft has joined #asahi
<marcan>
I guess I'm going to have to figure out how to get kernel symbols to make any sense of this
<roxfan>
we need to go deeper.jpg
<jn__>
are there kernel symbols baked into the live kernel?
<marcan>
I think there are symbol files available
<marcan>
but I'm going to upgrade this first, I'm on an ancient version
<roxfan>
KDK has symbolized kernels but they're not the same as retail
<roxfan>
IIRC
<marcan>
could just boot one of those
<roxfan>
"The kernel (release) variant matches the shipping kernel for users" claims the readme
<roxfan>
and there's kernel.dSYM
jeffmiw has joined #asahi
jeffmiw_ has joined #asahi
jeffmiw has quit [Read error: Connection reset by peer]
vimal has quit [Quit: Leaving]
<marcan>
ok so the fault address is actually a null physical pointer
<marcan>
it's a bad vaddr that corresponds, offset-wise, to physical address 0
<marcan>
so obviously something's borked
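To illustrate what "a bad vaddr that corresponds, offset-wise, to physical address 0" means: if the kernel turns physical addresses into virtual ones by adding a fixed offset, then a zero physical pointer comes out as a strange but non-null virtual address. The sketch below assumes such a simple linear mapping, and both base addresses are made-up placeholders rather than values from this log.

```python
MASK64 = (1 << 64) - 1

# Hypothetical bases, chosen only for illustration.
PHYS_BASE = 0x8_0000_0000
VIRT_BASE = 0xFFFF_FE00_1000_0000


def phys_to_virt(paddr, phys_base=PHYS_BASE, virt_base=VIRT_BASE):
    """Linear physical-to-virtual translation, wrapped to 64 bits."""
    return (paddr - phys_base + virt_base) & MASK64


def virt_to_phys(vaddr, phys_base=PHYS_BASE, virt_base=VIRT_BASE):
    """Inverse of phys_to_virt under the same linear-mapping assumption."""
    return (vaddr - virt_base + phys_base) & MASK64


# A null *physical* pointer does not look null once translated.
bad_vaddr = phys_to_virt(0)
print(hex(bad_vaddr), hex(virt_to_phys(bad_vaddr)))
```

Running a faulting address back through the inverse translation is a quick way to spot this pattern: if the result is 0, the original value was most likely an uninitialized physical pointer rather than a random corruption.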
<marcan>
I suspect devtree. maybe it's trying to look up mach-o sections in the memory map, I don't currently update those...
<roxfan>
when in doubt, blame Apple :)
trimental has joined #asahi
<pipcet[m]>
marcan: but it wouldn't boot when chain-loaded then, and it does, right?
<marcan>
that's true
<marcan>
however, there is at least one that would stay the same: the mach header
<pipcet[m]>
you have the trustcache, SEPFW, BootArgs, and DeviceTree in the devtree?
<marcan>
that one would be valid when chainloaded, but not hv
<marcan>
yes
<pipcet[m]>
you're not chainloading in place?
<marcan>
not with the hypervisor, obviously
<pipcet[m]>
I confess that's not obvious to me.
odmir has quit [Remote host closed the connection]
<marcan>
m1n1 lives where iBoot loaded things initially, the guest goes higher in memory
<pipcet[m]>
oh, I thought m1n1 relocated itself somewhere safe (because obviously that's what I'm doing). It might be worth it just to try relocating m1n1 somewhere and loading the guest at the physical address iBoot gave you?
<balrog>
marcan kernel symbols? go to developer.apple.com/download/more, download the matching KDK, and install it. The symbols get installed in /Library/Developer/KDKs/KDK_version.kdk/System/Library/Kernels/kernel.BUILD.dSYM
odmir has joined #asahi
<balrog>
The symbols file for the release kernel is included, but the release kernel is built with lots of inlining
<balrog>
(there's a DWARF file inside the dSYM, if you need to access it directly)
<balrog>
And yeah you need to be on 11.3 or later, or there won't be symbols for the t8101 kernel at all
<marcan>
pipcet[m]: I'd rather fix the problem than work around it
<marcan>
the whole point of this exercise is understanding
odmir has quit [Remote host closed the connection]
odmir has joined #asahi
odmir has quit [Remote host closed the connection]
jeffmiw_ has quit [Remote host closed the connection]
jeffmiw has joined #asahi
<pipcet[m]>
sorry, probably misunderstanding things again.
odmir has joined #asahi
<yrlf>
marcan: the fact that it's possible to debug this huge chain of faults and at all is already a _huge_ success of m1n1 and the hypervisor stuff you are doing
<yrlf>
s/and at all/at all/
<yrlf>
props to already getting this far! at this point I probably wouldn't want to develop an OS without such a nice debugging tool as m1n1
<balrog>
marcan: the documentation with the KDK explains how to do two machine debugging but I'm not sure how that plays within m1
<balrog>
within m1n1*
VinDuv has quit [Quit: Leaving.]
<roxfan>
do they support two-machine debugging for arm? iirc first releases supported panic symbolication only
odmir has quit [Remote host closed the connection]
linkmauve has quit [Ping timeout: 246 seconds]
kettenis has quit [Ping timeout: 240 seconds]
kettenis has joined #asahi
jeffmiw has quit [Remote host closed the connection]
odmir has joined #asahi
jeffmiw has joined #asahi
jeffmiw_ has joined #asahi
jeffmiw has quit [Read error: Connection reset by peer]
odmir has quit [Ping timeout: 250 seconds]
odmir has joined #asahi
diamondbond has joined #asahi
jamadazi has quit [Quit: WeeChat 3.1]
jeffmiw_ has quit [Remote host closed the connection]
odmir has quit [Remote host closed the connection]
odmir has joined #asahi
odmir has quit [Ping timeout: 250 seconds]
jeffmiw has joined #asahi
jeffmiw has quit [Ping timeout: 260 seconds]
raster has joined #asahi
odmir has joined #asahi
odmir has quit [Remote host closed the connection]