#asahi on 2021-05-08 — irc logs at freenode.irclog.whitequark.org

00:21 mrkajetanp has joined #asahi

00:22 modwizcode has quit [Ping timeout: 246 seconds]

00:29 modwizcode has joined #asahi

00:45 KindOne has quit [Ping timeout: 260 seconds]

00:45 mrkajetanp has quit [Quit: WeeChat 3.1]

00:47 KindOne has joined #asahi

00:51 odmir has quit [Remote host closed the connection]

00:52 odmir has joined #asahi

00:57 odmir has quit [Ping timeout: 260 seconds]

01:20 Emantor has quit [Quit: ZNC - http://znc.in]

01:20 Emantor has joined #asahi

01:23 odmir has joined #asahi

01:34 odmir has quit [Ping timeout: 268 seconds]

01:48 odmir has joined #asahi

01:51 phiologe has quit [Ping timeout: 276 seconds]

01:51 phiologe has joined #asahi

01:53 odmir has quit [Ping timeout: 260 seconds]

01:57 phiologe has quit [Ping timeout: 260 seconds]

02:00 phiologe has joined #asahi

02:25 Bublik has joined #asahi

02:26 Bublik_ has quit [Ping timeout: 260 seconds]

02:35 phiologe has quit [Ping timeout: 250 seconds]

02:35 odmir has joined #asahi

02:35 phiologe has joined #asahi

02:39 odmir has quit [Ping timeout: 240 seconds]

03:23 qyousef_ has quit [Ping timeout: 252 seconds]

04:01 marvin24 has quit [Ping timeout: 276 seconds]

04:01 marvin24_ has joined #asahi

04:02 robinp_ is now known as robinp

04:37 odmir has joined #asahi

04:43 odmir has quit [Read error: Connection reset by peer]

05:22 VinDuv has joined #asahi

06:00 adamcstephens has quit [Quit: The Lounge - https://thelounge.chat]

06:04 adamcstephens has joined #asahi

06:04 neunon has quit [Quit: ZNC 1.8.2 - https://znc.in]

06:06 neunon has joined #asahi

06:17 <dottedmag> I wonder how much work is needed for QEMU to run macOS under Linux on M1? Nearly zero or a lot?

06:19 <dottedmag> Is KVM x86-specific?

06:19 <dottedmag> Scratch that, description on linux-kvm.org is obsolete :(

06:28 <VinDuv> KVM (or equivalent) virtualizes the CPU but you’d still need to emulate the rest of the M1 hardware

06:31 <dottedmag> Right, and I suppose not much of it could be just passed through...

06:55 <ar> dottedmag: kvm (the interface the kernel provides) is available on a couple of different architectures, not just x86

07:02 <dottedmag> ar: Yeah, I was misled by linux-kvm.org that apparently wasn't updated since forever

07:08 jeffmiw has joined #asahi

07:10 jeffmiw has quit [Remote host closed the connection]

07:10 jeffmiw has joined #asahi

07:11 <pipcet[m]> dottedmag: https://worthdoingbadly.com/xnuqemu3/ apparently worked at some point

07:14 <pipcet[m]> and we could pass through everything but USB, I think.

07:19 <marcan> dottedmag: kvm on linux on m1 should already work better in some ways than the hypervisor stuff on macos (:)), but that's to virtualize linux/windows

07:20 <marcan> pipcet[m]: the problem with "passing through" is you need to involve the IOMMUs

07:20 <marcan> you can't just "pass through" hardware under KVM like I can under my toy hypervisor, there needs to be DMA and IRQ remapping

07:20 <marcan> and not all the hardware in the M1 runs through the same kind of IOMMU

07:21 <marcan> once we better understand the hardware, we'll have a better idea of what can be done

07:22 <marcan> in particular the GPU is a big question mark, but there's reason to believe Apple is interested in virtualizing macos, so it's possible the hardware is built to enable this kind of feature

07:23 <marcan> also, to run macos unmodified, KVM may need quite a bit of patching to support custom M1 features; at the very least we need to set everything to trap, but that might be very slow as it causes vmexits for tons of stuff

07:24 <marcan> so that might be useful for experiments, but possibly not for end users; to do it properly KVM would need more M1-specific stuff and that's harder to implement and upstream

07:24 <marcan> it's certainly something to investigate in the future, but it'll probably come quite further down the line

07:25 <marcan> speaking of hypervisors, going to stream some more of that in a few minutes

07:42 <ar> this hypervisor is super cool, and seems like the kind of thing that would also be useful for devices other than just the ones based on m1

07:44 <marcan> yeah, it's already useful for debugging linux to some extent... and now I'm conflicted, because m1n1 is quite apple silicon-centric by design

07:45 <marcan> I really don't want to turn this into a big portable bootloader thing, but on the other hand the HV stuff is eminently useful elsewhere

07:45 <marcan> maybe just the HV bits can be extracted into a stand-alone thing you can plug into another existing bootloader more easily?

07:45 <marcan> although e.g. the HV page tables all assume 16K pages which few ARM systems support AIUI, so...

08:08 jeffmiw has quit [Remote host closed the connection]

08:25 jeffmiw has joined #asahi

08:30 jeffmiw has quit [Ping timeout: 268 seconds]

08:46 <pipcet[m]> marcan: I have a plan for that (start macos physically, teach qemu to identity-map /dev/mem for the relevant ranges)

08:47 <marcan> pipcet[m]: it's not going to work unless you make all of macos' RAM also identity mapped

08:48 <pipcet[m]> sorry, I don't understand. qemu would still use its own page tables to translate simulated virtual to simulated physical, which would be equal to actual physical.

08:49 <pipcet[m]> but that's post-HV fine tuning, really.

08:51 <marcan> you would need to restrict the RAM that Linux is allowed to use, and give macos a contiguous chunk of physical RAM

08:52 <pipcet[m]> yes, I'm already doing that.

08:52 <marcan> you also need to emulate AIC

08:52 <pipcet[m]> or modify the driver to coexist peacefully

08:52 <pipcet[m]> or use a linux version without the actual AIC driver.

08:53 <pipcet[m]> plenty of ways of doing that.

08:53 <marcan> linux needs AIC

08:53 <marcan> coexistence is unlikely to work

08:53 <pipcet[m]> no, it needs FIQ

08:53 <pipcet[m]> which doesn't go through AIC

08:53 <marcan> it needs AIC if you want any non-polled IO

08:54 <marcan> if you're going to cripple Linux to this point, I fail to see what that buys you over just using the HV in m1n1

08:54 <marcan> what are you going to do, patch the nvme driver etc to use polling instead of IRQs?

08:54 <pipcet[m]> I don't think coexistence is going to be much of a problem, actually.

08:54 <pipcet[m]> why would I need the nvme driver?

08:54 <marcan> presumably you want linux to have some kind of working storage

08:54 <marcan> if not nvme, then USB

08:54 <pipcet[m]> all Linux needs is a usb port, and we already have polling for that.

08:55 <marcan> for serial, yes

08:55 <marcan> presumably you want a filesystem

08:55 <marcan> linux also doesn't have a polling driver for usb

08:55 <marcan> m1n1 does

08:55 <pipcet[m]> ramfs

08:55 <marcan> ... so you're loading a big blob of linux and qemu and everything into a tmpfs... without any network, storage, or anything... what's the point then?

08:55 <marcan> you lose any benefits you'd get from doing this under linux

08:55 <pipcet[m]> and we don't even need usb for a minimal setup, we can communicate with macos through shared memory and do the i/o from the emulated macos

08:56 <pipcet[m]> yes, I'm not convinced this approach is worth it, but it's not totally impossible

08:57 <marcan> it's possible, I just don't really see the point :)

08:57 <sven> fwiw, i don't think at least genter/gexit will trap as long as GXF is disabled.

08:57 <marcan> when you're giving the guest access to 95% of the hardware directly, having a massive OS in the way without being able to use most of its features just increases the bug/problem surface

08:57 <sven> they're just undefined then

08:57 <marcan> sven: what if I set a bunch of HACR_EL2 bits?

08:57 <marcan> (like I did)

08:57 <marcan> well, let's find out

08:58 <sven> i think i checked and they don't. but please double check :)

08:59 <pipcet[m]> marcan: again, I'm not convinced it's worth it or working on it, but someone asked and it's totally possible, if possibly harder than just getting the HV to work properly.

09:01 <marcan> sven: yeah, seems they don't :(

09:02 <sven> so you might be able to trap them after enabling GXF but ugh

09:06 <sven> marcan: sooo.... you could just replace genter/gexit in the kernel text with hvc instructions while loading it :D

09:07 <sven> :D

09:08 <sven> you know you want to! :P

09:10 <sven> :>

09:11 <j`ey> might be worth getting the code into place for replacements, might be other use cases too!
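
The load-time replacement idea above can be sketched as a scan over 32-bit instruction words. This is only an illustration, not m1n1 code: `patch_instructions`, `hvc`, and the hypercall numbers are hypothetical names, and the GENTER/GEXIT encodings are assumptions that should be checked against a disassembly before use. The HVC encoding itself (0xD4000002 with the immediate in bits 20:5) is the standard AArch64 one.

```python
import struct

HVC_BASE = 0xD4000002  # AArch64 "HVC #imm16"; the immediate sits in bits 20:5

# Assumed encodings of Apple's GXF instructions; not stated in the log.
GENTER = 0x00201420
GEXIT = 0x00201400


def hvc(imm16):
    """Encode an HVC instruction carrying a 16-bit immediate."""
    if not 0 <= imm16 <= 0xFFFF:
        raise ValueError("HVC immediate must fit in 16 bits")
    return HVC_BASE | (imm16 << 5)


def patch_instructions(text, replacements):
    """Return (patched copy, byte offsets patched) for a little-endian text blob.

    `replacements` maps an original 32-bit instruction word to its substitute.
    Every aligned word is compared, so data embedded in the text section that
    happens to match would be patched too; a real loader would need to guard
    against that.
    """
    out = bytearray(text)
    hits = []
    for off in range(0, len(out) - 3, 4):
        (word,) = struct.unpack_from("<I", out, off)
        new = replacements.get(word)
        if new is not None:
            struct.pack_into("<I", out, off, new)
            hits.append(off)
    return bytes(out), hits


# Tiny fake "kernel text": NOP, GENTER, RET, GEXIT.
sample = struct.pack("<4I", 0xD503201F, GENTER, 0xD65F03C0, GEXIT)
patched, hits = patch_instructions(sample, {GENTER: hvc(0x10), GEXIT: hvc(0x11)})
```

The distinct immediates (0x10 and 0x11 here, chosen arbitrarily) would let a hypervisor tell the two original instructions apart when the trap arrives.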

09:19 <j`ey> good suggestion from youtube chat to just rewrite vbar_el1

10:04 vimal has joined #asahi

10:06 jeffmiw has joined #asahi

10:10 jeffmiw has quit [Ping timeout: 268 seconds]

10:14 vimal has quit [Ping timeout: 260 seconds]

10:27 vimal has joined #asahi

10:35 raster has joined #asahi

10:38 choozy has joined #asahi

10:42 idt23[m] has joined #asahi

10:57 klaus has quit [Ping timeout: 252 seconds]

10:57 Bublik_ has joined #asahi

11:00 Bublik has quit [Ping timeout: 265 seconds]

11:05 klaus has joined #asahi

11:07 odmir has joined #asahi

11:08 odmir_ has joined #asahi

11:11 jamadazi has joined #asahi

11:11 odmir has quit [Ping timeout: 260 seconds]

11:13 odmir_ has quit [Ping timeout: 260 seconds]

11:14 <jamadazi> hi guys! there hasn't been a public update post on the project for a long time now (a couple of months?), how's it going? has there been much progress on the project since the blog post update?

11:14 <jamadazi> i know those initial patches were accepted into mainline, but i don't know anything else

11:15 <j`ey> jamadazi: there's a stream right now, about writing a hypervisor

11:16 <j`ey> which will be used to help understand more of the hw

11:18 <marcan> duh

11:18 <marcan> the hv isn't even returning from the last exception

11:18 <marcan> it's not the guest borking me, it's a bug somewhere

11:18 <marcan> TTY> hv_exc_sync

11:18 <marcan> TTY> ret

11:18 <marcan> Pass: mrs x13, s3_5_c15_c4_0 = d

11:18 <marcan> TTY> hv_exc_sync

11:18 <marcan> Skip: msr s3_5_c15_c4_0, x13 = d

11:18 <marcan> well that's weird on a skip...

11:58 maknho has joined #asahi

12:01 maknho____ has quit [Ping timeout: 260 seconds]

12:42 <marcan> ah no, I think that's just the USB buffering

12:53 jeffmiw has joined #asahi

13:35 modrobert has quit [Read error: Connection reset by peer]

13:35 m0drobert has joined #asahi

13:35 m0drobert is now known as modrobert

13:59 <marcan> it was PAN.

13:59 <marcan> didn't realize PAN leaks up to EL2

13:59 <j`ey> but doesnt that show as an exception?

14:01 VinDuv has quit [Quit: Leaving.]

14:04 VinDuv has joined #asahi

14:12 klaus has quit [Ping timeout: 260 seconds]

14:12 KindOne has quit [Ping timeout: 268 seconds]

14:14 klaus has joined #asahi

14:16 <marcan> j`ey: EL2 faults

14:16 <marcan> the bit isn't cleared when entering EL2

14:17 <j`ey> Ah

14:17 <marcan> and since m1n1 maps everything as EL0-accessible, EL2 can't access any data, not even the stack

14:17 <marcan> so everything explodes

14:17 <marcan> setting pan to 0 as the first instruction of the EL2 handlers fixes it

14:17 <marcan> before any stack accesses
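
A toy model of the failure described above, assuming the usual Arm rule that a privileged data access to a page that is also accessible from EL0 takes a permission fault while PSTATE.PAN is set. The function name is made up for illustration and every other permission check is ignored.

```python
def data_access_faults(privileged, pan, el0_accessible):
    """Return True if a data access would fault purely because of PAN.

    Simplified: only the PAN rule is modelled, nothing else.
    """
    return privileged and pan and el0_accessible


# m1n1 maps everything as EL0-accessible, so with PAN left set on entry to
# EL2 even a stack access faults; clearing PAN first makes the access legal.
stack_access_with_pan = data_access_faults(True, True, True)
stack_access_pan_cleared = data_access_faults(True, False, True)
```

This is why the fix has to be the very first instruction of the handler: anything that touches memory before PAN is cleared would fault again.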

14:23 <kettenis> short series with the proposed pinctrl bindings sent to the appropriate mailing lists

14:24 <kettenis> finally got things in a state where all the checking tools are happy with them

14:24 <kettenis> we should try to make some progress on the clock bindings as well though

14:31 <svenpeter> Ugh, yes. I’ve been meaning to send the initial dumb version where a page is mapped for each clock for a while now

14:36 VinDuv has left #asahi [#asahi]

14:39 raster has quit [Quit: Gettin' stinky!]

14:41 Spectrejan[m] has joined #asahi

14:41 <kettenis> Ah, but I may have botched the sending by effectively spoofing using my @openbsd.org address

14:43 <kettenis> is it rude to send them again with that fixed?

14:43 VinDuv has joined #asahi

14:43 jeffmiw_ has joined #asahi

14:47 <Emantor> kettenis: I'd wait for comments/Acks first. As long as the patches are fine I wouldn't bother to send again.

14:47 c4r1c4[m] has joined #asahi

14:48 jeffmiw_ has quit [Ping timeout: 240 seconds]

14:48 <kettenis> well, they'll probably not make it to the mailing lists this way

14:48 <sven> i received them fwiw

14:49 <jn__> the pinctrl bindings made it to LKML at least

14:49 <jn__> https://www.spinics.net/lists/kernel/msg3929807.html

14:49 <Emantor> ye, the MLs don't care much about the spoofing.

14:49 <Emantor> https://lore.kernel.org/linux-arm-kernel/20210508142000.85116-1-kettenis@openbsd.org/T/#t

14:49 <kettenis> ok, thanks

15:09 agnem has quit [Quit: WeeChat 3.1]

15:29 qyousef_ has joined #asahi

15:43 odmir has joined #asahi

15:47 odmir has quit [Remote host closed the connection]

15:49 odmir has joined #asahi

15:50 agnem has joined #asahi

15:56 <marcan> so macos is panicking now... trying to figure out how to decode the panic from the hypervisor, since serial isn't initialized yet

15:57 <kettenis> progress, I guess?

16:04 odmir has quit [Remote host closed the connection]

16:10 <yrlf> marcan: so if I understand it correctly, it prints the panic to the framebuffer, but macos died before it hits serial?

16:10 <j`ey> i dont think the panic is getting to the framebuffer either

16:11 <pipcet[m]> kettenis: FWIW, gmail classified your messages as spam, probably because of the spoofing thing

16:12 <yrlf> one hack would be to try to find the panic function in macos and hook that

16:13 <yrlf> otherwise, what options are there? OCR the framebuffer?

16:14 <yrlf> j`ey: damn, then that's _really_ early

16:15 <yrlf> (didnt see your message before I sent mine)

16:15 <j`ey> I'm just guessing, let's see what marcan says :P

16:19 <marcan> yrlf: there is no framebuffer yet either

16:19 <marcan> I already have the panic hooked because it triggers a debugger trap and I have all exceptions hooked

16:19 <marcan> I'm writing a panic decoder in python now

16:21 <yrlf> ahh, yeah, doing a debugger trap on panic definitely is something that makes sense for them to do

16:23 <yrlf> this kind of "semihosted" way of doing development with python and m1n1 cooperating definitely makes a lot of things easier

16:27 <marcan> xnu's va_list doesn't seem to match the arm standard one... :/

16:27 <kettenis> yeah, apple's arm64 ABI is "different"

16:32 <marcan> ah, it only uses stack args apparently
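
If variadic arguments really do all live on the stack, a host-side decoder can expand a panic format string by walking consecutive stack slots. The sketch below is not the decoder written on stream. It assumes each argument occupies one 8-byte little-endian slot, handles only a small subset of printf (`%d`, `%x`, `%p`, `%s`, with optional width and `l`/`ll`), and relies on a caller-supplied `read_cstr` callback (hypothetical) to fetch strings from guest memory.

```python
import re
import struct

# %% or %[0][width][l|ll](d|x|p|s); a deliberately small subset of printf.
_SPEC = re.compile(r"%(?:%|(0?\d*)(l{0,2})([dxps]))")


def format_from_stack(fmt, stack, read_cstr):
    """Expand `fmt` using successive 8-byte slots taken from `stack` (bytes)."""
    offset = 0

    def expand(match):
        nonlocal offset
        if match.group(0) == "%%":
            return "%"
        width, length, conv = match.groups()
        (value,) = struct.unpack_from("<Q", stack, offset)
        offset += 8  # assumed: one 8-byte slot per variadic argument
        if conv == "s":
            return read_cstr(value)
        # Plain %d/%x are 32-bit; l/ll and %p use the whole slot.
        bits = 64 if (length or conv == "p") else 32
        value &= (1 << bits) - 1
        if conv == "d":
            if value >= 1 << (bits - 1):
                value -= 1 << bits  # reinterpret as signed
            text = str(value)
        elif conv == "p":
            text = f"0x{value:x}"
        else:
            text = f"{value:x}"
        if width:
            fill = "0" if width.startswith("0") else " "
            text = text.rjust(int(width), fill)
        return text

    return _SPEC.sub(expand, fmt)


# Example with made-up stack contents and a fake guest-memory string table.
strings = {0x1000: "boom"}
stack = struct.pack("<QQQ", 0xFFFFFE00113DD374, 0xFFFFFFFF, 0x1000)
message = format_from_stack("pc 0x%016llx code %d: %s", stack, strings.__getitem__)
```

Because the slots are consumed strictly in order, a wrong guess about one conversion's size does not shift the following arguments, which makes this layout forgiving to decode.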

16:35 c4r1c4[m] has left #asahi ["User left"]

16:45 zkrx has quit [Ping timeout: 240 seconds]

16:51 zkrx has joined #asahi

16:53 odmir has joined #asahi

16:59 jeffmiw has quit [Ping timeout: 240 seconds]

17:01 <marcan> Invalid kernel stack pointer (probable corruption). at pc 0xfffffe00113dd374, lr 0xc1ff7e00112837e8 (saved state: 0xfffffe0015083cb0)

17:01 <marcan> ah, maybe this is a PAC thing; it's entirely possible the PAC registers are VHE'd and I'm passing through to the wrong ones

17:02 Bublik has joined #asahi

17:03 zkrx has quit [Ping timeout: 268 seconds]

17:04 Bublik_ has quit [Ping timeout: 252 seconds]

17:05 <kettenis> that lr value looks wrong ;)

17:07 <marcan> I know

17:07 <marcan> PAC codes go in the upper bits

17:07 <marcan> so it might be a legitimate PAC'ed LR
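
To see what address hides under the PAC bits, the upper bits can be replaced with the sign extension implied by bit 55, which is the bit Arm pointer authentication uses to select the address half. A minimal sketch: the 47-bit virtual address size is an assumption inferred by comparing the pc and lr values in the panic line, and this only masks bits without verifying the signature.

```python
VA_BITS = 47  # assumption inferred from the pc/lr values in the log
MASK64 = 0xFFFFFFFFFFFFFFFF


def strip_pac(ptr, va_bits=VA_BITS):
    """Drop the PAC field from a 64-bit pointer by re-extending bit 55."""
    addr_mask = (1 << va_bits) - 1
    low = ptr & addr_mask
    if ptr & (1 << 55):
        # Upper-half (kernel) pointer: all bits above the address are ones.
        return low | (~addr_mask & MASK64)
    return low  # lower-half pointer: bits above the address are zeros


stripped_lr = strip_pac(0xC1FF7E00112837E8)
print(hex(stripped_lr))  # 0xfffffe00112837e8
```

Under that assumption the lr lands in the same region as the pc (0xfffffe00113dd374), which is consistent with it being a signed but otherwise valid return address.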

17:08 zkrx has joined #asahi

17:10 <marcan> "Invalid kernel stack pointer" is supposedly a stack pointer issue though

17:13 <marcan> but the actual faulting instruction there is a null deref

17:15 <marcan> and it's a null deref reading from some percpu/thread info stuff

17:15 <marcan> I'm starting to wonder how many levels deep into a fault I am

17:16 <marcan> (for reference: so far this is kernel panic -> undefined instruction aka debugger trap -> hvc hook -> m1n1)

17:16 <marcan> but it looks like the panic is a stack check that happened due to a null deref

17:29 <marcan> I think this might be a recursive fault triggering a stack overflow

17:37 <marcan> yup so the fault chain currently is

17:38 <marcan> a data abort on a possibly bad but not null address -> 15 frames of recursive null pointer dereferences due to the thread context not being fully initialized -> panic due to stack overflow -> debugger break -> hypercall hook -> m1n1 -> python exception handler

17:39 * eta blinks

17:55 solarkraft has joined #asahi

17:57 <marcan> I guess I'm going to have to figure out how to get kernel symbols to make any sense of this

18:01 <roxfan> we need to go deeper.jpg

18:02 <jn__> are there kernel symbols baked into the live kernel?

18:02 <marcan> I think there are symbol files available

18:03 <marcan> but I'm going to upgrade this first, I'm on an ancient version

18:03 <roxfan> KDK has symbolized kernels but they're not the same as retail

18:03 <roxfan> IIRC

18:04 <marcan> could just boot one of those

18:05 <roxfan> "The kernel (release) variant matches the shipping kernel for users" claims the readme

18:05 <roxfan> and there's kernel.dSYM

18:44 jeffmiw has joined #asahi

18:45 jeffmiw_ has joined #asahi

18:45 jeffmiw has quit [Read error: Connection reset by peer]

18:48 vimal has quit [Quit: Leaving]

18:52 <marcan> ok so the fault address is actually a null physical pointer

18:52 <marcan> it's a bad vaddr that corresponds, offset-wise, to physical address 0

18:52 <marcan> so obviously something's borked
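
The "null physical pointer" observation can be illustrated with a constant-offset mapping: if the kernel turns physical addresses into virtual ones by adding a fixed offset, a physical address of 0 becomes a virtual address below the mapped window. The base values here are hypothetical, chosen only to make the arithmetic visible; they are not taken from the log.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

# Hypothetical bases for illustration only.
PHYS_BASE = 0x8_0000_0000
VIRT_BASE = 0xFFFFFE0010000000


def phys_to_virt(paddr, phys_base=PHYS_BASE, virt_base=VIRT_BASE):
    """Constant-offset translation from physical to kernel virtual address."""
    return (paddr - phys_base + virt_base) & MASK64


def virt_to_phys(vaddr, phys_base=PHYS_BASE, virt_base=VIRT_BASE):
    """Inverse of phys_to_virt."""
    return (vaddr - virt_base + phys_base) & MASK64


bad_vaddr = phys_to_virt(0)          # what an uninitialised physical pointer yields
recovered = virt_to_phys(bad_vaddr)  # undoing the offset exposes the 0
```

Running the fault address back through the inverse translation is what reveals that the odd-looking virtual address is really an unset physical pointer.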

18:55 <marcan> I suspect devtree. maybe it's trying to look up mach-o sections in the memory map, I don't currently update those...

19:03 <roxfan> when in doubt, blame Apple :)

19:03 trimental has joined #asahi

19:04 <pipcet[m]> marcan: but it wouldn't boot when chain-loaded then, and it does, right?

19:06 <marcan> that's true

19:06 <marcan> however, there is at least one that would stay the same: the mach header

19:06 <pipcet[m]> you have the trustcache, SEPFW, BootArgs, and DeviceTree in the devtree?

19:06 <marcan> that one would be valid when chainloaded, but not hv

19:06 <marcan> yes

19:09 <pipcet[m]> you're not chainloading in place?

19:12 <marcan> not with the hypervisor, obviously

19:14 <pipcet[m]> I confess that's not obvious to me.

19:17 odmir has quit [Remote host closed the connection]

19:18 <marcan> m1n1 lives where iBoot loaded things initially, the guest goes higher in memory

19:20 <pipcet[m]> oh, I thought m1n1 relocated itself somewhere safe (because obviously that's what I'm doing). It might be worth it just to try relocating m1n1 somewhere and loading the guest at the physical address iBoot gave you?

19:21 <balrog> marcan kernel symbols? go to developer.apple.com/download/more, download the matching KDK, and install it. The symbols get installed in /Library/Developer/KDKs/KDK_version.kdk/System/Library/Kernels/kernel.BUILD.dSYM

19:21 odmir has joined #asahi

19:21 <balrog> The symbols file for the release kernel is included, but the release kernel is built with lots of inlining

19:21 <balrog> (there's a DWARF file inside the dSYM, if you need to access it directly)

19:22 <balrog> And yeah you need to be on 11.3 or later, or there won't be symbols for the t8101 kernel at all

19:24 <marcan> pipcet[m]: I'd rather fix the problem than work around it

19:25 <marcan> the whole point of this exercise is understanding

19:26 odmir has quit [Remote host closed the connection]

19:27 odmir has joined #asahi

19:29 odmir has quit [Remote host closed the connection]

19:35 jeffmiw_ has quit [Remote host closed the connection]

19:40 jeffmiw has joined #asahi

19:40 <pipcet[m]> sorry, probably misunderstanding things again.

19:40 odmir has joined #asahi

19:47 <yrlf> marcan: the fact that it's possible to debug this huge chain of faults and at all is already a _huge_ success of m1n1 and the hypervisor stuff you are doing

19:47 <yrlf> s/and at all/at all/

19:48 <yrlf> props to already getting this far! at this point I probably wouldn't want to develop an OS without such a nice debugging tool as m1n1

20:14 <balrog> marcan: the documentation with the KDK explains how to do two machine debugging but I'm not sure how that plays within m1

20:14 <balrog> within m1n1*

20:23 VinDuv has quit [Quit: Leaving.]

20:34 <roxfan> do they support two-machine debugging for arm? iirc first releases supported panic symbolication only

20:40 odmir has quit [Remote host closed the connection]

20:47 linkmauve has quit [Ping timeout: 246 seconds]

20:55 kettenis has quit [Ping timeout: 240 seconds]

20:57 kettenis has joined #asahi

21:07 jeffmiw has quit [Remote host closed the connection]

21:07 odmir has joined #asahi

21:10 jeffmiw has joined #asahi

21:11 jeffmiw_ has joined #asahi

21:11 jeffmiw has quit [Read error: Connection reset by peer]

21:12 odmir has quit [Ping timeout: 250 seconds]

21:21 odmir has joined #asahi

21:23 diamondbond has joined #asahi

21:51 jamadazi has quit [Quit: WeeChat 3.1]

22:06 jeffmiw_ has quit [Remote host closed the connection]

22:17 odmir has quit [Remote host closed the connection]

22:17 odmir has joined #asahi

22:22 odmir has quit [Ping timeout: 250 seconds]

22:23 jeffmiw has joined #asahi

22:27 jeffmiw has quit [Ping timeout: 260 seconds]

22:35 raster has joined #asahi

22:47 odmir has joined #asahi

22:51 odmir has quit [Remote host closed the connection]

22:51 odmir has joined #asahi

23:50 choozy has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]