<dimenus>
anyone having issues building zig (master) on mingw64?
<dimenus>
getting a bunch of undefined references to z3_* in libclangStaticAnalyzerCore
<dimenus>
even with Z3 installed
<daurnimator>
dimenus: 2965 and 2958 were recently merged.... does it work from before they were merged?
<scientes>
we don't need maskedLoad or maskedStore
<scientes>
you can just use | and & and let the optimizer figure it out
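For reference, the `|`/`&` approach scientes describes amounts to a full-width load followed by a per-lane blend. Below is a minimal C sketch using the GCC/Clang vector extensions. The `v4u32` type and the `select_lanes` helper are illustrative names, not from any library. Note that the blend only chooses between values that have already been loaded, so every lane of the source has been read:

```c
#include <assert.h>
#include <stdint.h>

/* GCC/Clang vector extension: four 32-bit lanes (16 bytes). */
typedef uint32_t v4u32 __attribute__((vector_size(16)));

/* Branchless per-lane select: lanes where mask is all ones take `loaded`,
   lanes where mask is zero keep `passthru`. Assumes every mask lane is
   either 0 or 0xFFFFFFFF. `loaded` must already have been read in full. */
static v4u32 select_lanes(v4u32 mask, v4u32 loaded, v4u32 passthru) {
    for (int i = 0; i < 4; i++)
        assert(mask[i] == 0 || mask[i] == UINT32_MAX);
    return (loaded & mask) | (passthru & ~mask);
}
```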
<daurnimator>
scientes: I don't think that fits zig's philosophy
<scientes>
daurnimator, we could just put it in std lib
<scientes>
we only need @gather and @scatter
<daurnimator>
scientes: e.g. if you have half of a vector next to something with PROT_NONE: I'd want to be using a masked load.
<scientes>
daurnimator, ahh unaligned loads?
<daurnimator>
scientes: not unaligned. but when you *do not* want to read from memory
<scientes>
the optimization should just be guaranteed
<scientes>
daurnimator, yeah, but it will always be 16-byte aligned, so PROT_NONE is impossible
<daurnimator>
howso?
<scientes>
it has to be page aligned
<daurnimator>
16-byte aligned can still go over a page long if your vector is > 16 bytes.....
<scientes>
and thus is simd aligned
<scientes>
again, you just have to guarantee that the optimization is wrong
<scientes>
optimization is right
<scientes>
LLVM has already announced that these instructions will be deprecated too
<scientes>
I feel it's a bug in LLVM
<daurnimator>
whats a bug?
<scientes>
if it isn't getting optimized to a maskedstore/maskedload
<scientes>
if you use | and &
<daurnimator>
why should it be?
<daurnimator>
| and & can be much slower than a masked store/load
<scientes>
it's equivalent
<daurnimator>
no, it's not
<scientes>
the optimizer would figure it out
<daurnimator>
from http://llvm.org/docs/LangRef.html#llvm-masked-load-intrinsics > Other targets may support this intrinsic differently, for example by lowering it into a sequence of branches that guard scalar load operations. The result of this operation is equivalent to a regular vector load instruction followed by a ‘select’ between the loaded and the passthru values, predicated on the same mask.
<daurnimator>
a sequence of branches guarding loads is for sure slower than plain `|` and `&`
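The lowering quoted from the LangRef can be sketched in C as one guarded scalar load per lane. This is an illustration under assumptions (8 lanes of `uint32_t`, a hypothetical `masked_load_u32` helper), not LLVM's actual codegen:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { LANES = 8 };

/* Scalar emulation of a masked load: lane i reads src[i] only when mask[i]
   is set; otherwise it takes passthru[i] and src[i] is never touched. */
static void masked_load_u32(const uint32_t *src, const bool mask[LANES],
                            const uint32_t passthru[LANES],
                            uint32_t out[LANES]) {
    assert(src != NULL);
    for (size_t i = 0; i < LANES; i++) {
        /* The conditional evaluates only the selected operand. */
        out[i] = mask[i] ? src[i] : passthru[i];
    }
}
```

Because `src[i]` is only evaluated when `mask[i]` is true, `src` may point at a 4-element array as long as lanes 4 to 7 are masked off. A full load followed by `&`/`|` would read past the end of that array.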
<scientes>
oh I see
<scientes>
uggh, it's still a code smell to me
<scientes>
they could add an optimization attribute
<daurnimator>
scientes: from the zig side: if I didn't have @maskedLoad: how would I indicate that I want to load half a vector without reading the other half?
<scientes>
I only see that it could cause problems if the pointer was unaligned
<daurnimator>
it's not about alignment
<scientes>
daurnimator, how do you specify that you only want to read one bit?
<daurnimator>
you don't. a byte is called a byte because it's the smallest addressable quantity
<daurnimator>
however a vector is more than a byte.
<daurnimator>
(at least, most of the time... I guess we can have vectors of u1 )
<scientes>
and those get bit-packed
<daurnimator>
right
<daurnimator>
but imagine a 64 byte vector
<daurnimator>
split across two pages
<scientes>
it would be two vector loads
<scientes>
oh, x86 is weird in that way...
<scientes>
and also supports unaligned
<daurnimator>
a 64 byte vector only needs to be aligned to 16 bytes.
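To make the alignment point concrete: an address that is only 16-byte aligned can still put a 64-byte vector across a page boundary. A small C helper (`crosses_page` is a hypothetical name, and a 4096-byte page is assumed) shows that a 64-byte load at address 4080, a multiple of 16, spans two pages:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the byte range [addr, addr + len) touches more than one page. */
static bool crosses_page(uintptr_t addr, size_t len, size_t page_size) {
    assert(len > 0 && page_size > 0);
    return addr / page_size != (addr + len - 1) / page_size;
}
```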
<scientes>
the compiler can load whatever the hell it wants
<scientes>
it just can't modify it
<daurnimator>
??
<daurnimator>
the whole point of a maskedload is to tell it it can't load whatever it wants
<scientes>
well two things 1) I still think an instruction is not necessary, but could be an attribute flag, 2) it seems like using a hammer to solve an edge condition
<daurnimator>
> Edge cases matter.
<scientes>
like it would be faster to test if you are at a boundary
<scientes>
yes of course
<scientes>
but the compiler doesn't know where the boundaries are
<daurnimator>
exactly!
<scientes>
so it generates ship code
<daurnimator>
which is why it needs to be a builtin
<scientes>
*shit
<scientes>
it would be easier to have a check before doing the load/store
<scientes>
and only then use this expensive one
<scientes>
which wouldn't have to then be implemented this way
<daurnimator>
`x | mask` -> may or may not access masked off bits: don't care, do whatever is fast. `@maskedLoad(x, mask)` -> do *not* access masked off bits
<scientes>
well, I guess this would be OK, like on POWER8 there is a load/store that takes a length
<scientes>
but then you would have to do a ctz + clz + popcount
<scientes>
instead of just one clz or ctz depending on where the border was that you were avoiding
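scientes's cost argument can be made concrete. A length-based load needs a start lane and a count, and deriving those from an arbitrary mask means checking that the set bits are contiguous: ctz for the start, popcount for the count, clz to find the top. A C sketch using the GCC/Clang `__builtin_ctz`/`__builtin_clz`/`__builtin_popcount` builtins (`mask_to_range` is a hypothetical helper, and no particular POWER instruction is assumed):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Converts a lane mask into (first lane, lane count) for a length-based
   vector load/store. Valid only if the set bits form one contiguous run.
   The builtins are undefined for a zero argument, hence the assert. */
static bool mask_to_range(uint32_t mask, unsigned *first, unsigned *count) {
    assert(mask != 0);
    unsigned lo = (unsigned)__builtin_ctz(mask);       /* trailing zeros */
    unsigned hi = 32u - (unsigned)__builtin_clz(mask); /* one past top bit */
    unsigned n = (unsigned)__builtin_popcount(mask);   /* set bits */
    *first = lo;
    *count = n;
    return hi - lo == n; /* contiguous iff the span holds only set bits */
}
```

If the mask is already known to be a prefix (only the far end of the vector is being avoided), a single `__builtin_ctz(~mask)` gives the count, e.g. 3 for `0x7`. That is the "just one clz or ctz" case, and it requires `~mask` to be nonzero.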
<scientes>
I doubt they would have the optimization to avoid that
<scientes>
and see how the mask was generated
<scientes>
and you could instead just use different length vector types
<scientes>
like use a 61-byte vector
<scientes>
LLVM is deprecating these instructions too
<scientes>
daurnimator, how is your use-case not solved by generating an LLVM type that is the length you want?
<scientes>
zig can't do that well right now, but LLVM is working on it
<scientes>
basically you would just page align your vector loads/stores
<scientes>
by splitting them
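A rough sketch of splitting an access at the page boundary, as scientes suggests, assuming a 4096-byte page and an access no longer than one page. Here `memcpy` stands in for the two narrower loads, and `copy_split_at_page` is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies len bytes from src to dst as at most two pieces so that neither
   piece crosses a page boundary of src. Returns the number of pieces. */
static int copy_split_at_page(void *dst, const void *src, size_t len,
                              size_t page_size) {
    assert(len <= page_size);
    uintptr_t addr = (uintptr_t)src;
    size_t to_boundary = page_size - (size_t)(addr % page_size);
    if (len <= to_boundary) {
        memcpy(dst, src, len);                /* fits in one page */
        return 1;
    }
    memcpy(dst, src, to_boundary);             /* tail of the first page */
    memcpy((char *)dst + to_boundary, (const char *)src + to_boundary,
           len - to_boundary);                 /* head of the next page */
    return 2;
}
```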
<scientes>
<daurnimator> but imagine a 64 byte vector
<scientes>
natural alignment has a strict definition
<scientes>
that there is only a single bit set in the size (i.e. the size is a power of two)
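The definition can be written out directly: a size with only a single bit set is a power of two, and a naturally aligned object sits at a multiple of its own size. A C sketch (helper names are illustrative) whose checks also cover the 64-byte case: since 4096 is a multiple of 64, a naturally aligned 64-byte vector never spans two 4096-byte pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* "A single bit in the size": a power of two has exactly one bit set. */
static bool is_power_of_two(size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

/* Naturally aligned: size is a power of two and addr is a multiple of it. */
static bool is_naturally_aligned(uintptr_t addr, size_t size) {
    return is_power_of_two(size) && addr % size == 0;
}
```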
<daurnimator>
oh okay; I didn't know that definition
<scientes>
maskedLoad just seems very x86 specific to me, and kind of a hack, and doing the optimizations to make it possible to use right on ppc for example, I doubt those have been done
<scientes>
but yeah with the way AVX-512 has 16-bit alignOf, maybe it is necessary
<scientes>
*128-bit
<daurnimator>
scientes: it probably defaults to the series of branches lowering I quoted above
<scientes>
even though on VSX it doesn't need to, but the optimizations would be difficult
<scientes>
and the programmer would probably not generate the mask right for that either
<scientes>
even though C11 would allow such a zombie read
<daurnimator>
scientes: though also: doesn't llvm allow arbitrary vector sizes?
<scientes>
inside of the page
<scientes>
daurnimator, yes it does, but not variable length
<scientes>
the lowering of those is also quite bad right now
<scientes>
but it does work
<daurnimator>
scientes: e.g. I could have a @Vector(8, u32768) for a vector of pages.
<scientes>
lol, I don't have enough RAM to compile that
<daurnimator>
huh?
<daurnimator>
Why would that take lots of ram to compile?
<daurnimator>
it's just 8*4KB.
<scientes>
well I did 1MB
<curtisf>
while it's kinda neat that zig (theoretically) supports integers that large, I question how reasonable it is to support numbers that require compiling loops to do things like addition
<scientes>
also clang has a more sane max
<scientes>
I was using gcc
<scientes>
d.c:2:36: error: vector size too large
<scientes>
typedef int badass __attribute__ ((vector_size (1024*1024)));
<scientes>
daurnimator, yeah that is too large for clang
<daurnimator>
curtisf: not different to targeting an 8-bit microcontroller and adding u32s...
<scientes>
yeah clang's max is 1024 vector_length
<curtisf>
there's a difference of scale, having 4 instructions unrolled is totally different from having hundreds
<curtisf>
you'd probably not want to have addition of u32768 be unrolled, which is odd
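To make curtisf's point concrete: without native support, adding two u32768 values means propagating a carry through 512 machine words, just as a u32 add on an 8-bit microcontroller chains 4 byte-wide adds. A C sketch using 64-bit limbs (the `add_wide` name and the little-endian limb layout are illustrative, not how Zig or LLVM actually lower it):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* u32768 as 512 little-endian 64-bit limbs (512 * 64 = 32768 bits). */
enum { LIMBS = 32768 / 64 };

/* r = a + b (mod 2^32768), one limb at a time with carry propagation.
   Returns the final carry out (0 or 1). */
static uint64_t add_wide(uint64_t r[LIMBS], const uint64_t a[LIMBS],
                         const uint64_t b[LIMBS]) {
    uint64_t carry = 0;
    for (size_t i = 0; i < LIMBS; i++) {
        uint64_t s = a[i] + carry;
        uint64_t c1 = s < carry;   /* overflow from adding the carry */
        r[i] = s + b[i];
        uint64_t c2 = r[i] < s;    /* overflow from adding b[i] */
        carry = c1 | c2;
    }
    return carry;
}
```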
<Hourglass>
Start of program headers: 64 (bytes into file)
<halosghost>
LC_ALL=C might be useful for some folk
<nrdmn>
Hourglass: does your objcopy support AArch64?
<Hourglass>
nrdmn: good question, I don't know. How can I check that ?
<nrdmn>
objcopy --info
<Hourglass>
nrdmn: good catch, it doesn't. I suppose it's logical since my host computer is x86_64 and I'm trying to target arm64. I'm a bi new to this cross-compilation stuff
<Hourglass>
*bit
<nrdmn>
hmm, there seems to be no way to create a File with an outStream() method that returns an OutStream with a custom OutStream.stream.writeFn
<nrdmn>
which makes it difficult to support platforms where stdin, stdout, stderr aren't files
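As a C analogy only (it does not reflect Zig's actual `File`/`OutStream` API), what nrdmn is after is the usual interface pattern: a stream is a context pointer plus a write callback, so a backend that is not a file can be plugged in directly. All names below are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A stream is a context pointer plus a write callback. Returns bytes written. */
struct out_stream {
    void *context;
    size_t (*write_fn)(void *context, const char *bytes, size_t len);
};

static size_t stream_print(struct out_stream *s, const char *text) {
    return s->write_fn(s->context, text, strlen(text));
}

/* Example non-file backend: append into a fixed buffer, truncating on overflow. */
struct buf_writer {
    char data[64];
    size_t used;
};

static size_t buf_write(void *context, const char *bytes, size_t len) {
    struct buf_writer *w = context;
    size_t room = sizeof w->data - w->used;
    size_t n = len < room ? len : room;
    memcpy(w->data + w->used, bytes, n);
    w->used += n;
    return n;
}
```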
<emekankurumeh[m]>
daurnimator: can you point me to some implementations or header files that define `sockaddr_any`?
<FireFox317>
gonz_: I found the issue for the subsystem detection. You have to declare the WinMain function as a pub function. In the example that andrewrk gave this wasn't specified and that caused the issue.
<FireFox317>
Maybe we should add the same restriction that the 'normal' main function has, namely that it needs to be pub.
<FireFox317>
Not sure how to properly solve this
<gonz_>
Indeed, that was the reason
<gonz_>
`pub export` seems to do the trick
<FireFox317>
I will make an issue to track this.
<gonz_>
In retrospect it's somehow obvious
<FireFox317>
Yeah it is, but the detection should be polished a bit
<gonz_>
The annoying thing is that it half-works without the `pub`, yes
<gonz_>
If it didn't, you might just try adding it and end up where you needed to be
<andrewrk>
how does one order a pinebook pro?
<andrewrk>
gonz_, it's a bug, pub should not be required if the function is exported
<THFKA4>
i think you need a forum account that's older than a month
<THFKA4>
which you then use to get a coupon code during checkout
<andrewrk>
I found a support email address. I sent them a description of my use case, hopefully they let me buy one