<realitix>
by the way arigato, I did another wrapper (with cffi) that you could tweet on the PyPy Twitter account https://github.com/realitix/pyshaderc (the original project (shaderc) references pyshaderc in its Readme)
arigato has joined #pypy
<fijal>
hi
cstratak has joined #pypy
vkirilichev has joined #pypy
kolko has quit [Ping timeout: 255 seconds]
antocuni has joined #pypy
<arigato>
hi
<antocuni>
hi
kolko has joined #pypy
<fijal>
so maaaybe I have a better idea how much things are used ;-)
* fijal
ponders if he should wait for plan_rich or try to debug the problems with fast-str-methods
antocuni has quit [Ping timeout: 260 seconds]
arigato has quit [Quit: Leaving]
jamadden has joined #pypy
jacob22_ has quit [Quit: Konversation terminated!]
<plan_rich_>
fijal, hi
<plan_rich_>
yes I'm alive, though I only find time to do vmprof related fixing...
<plan_rich_>
what is the problem with fast-str-methods?
<cfbolz>
jneen: (probably wrong timezone, but anyway) cool :-). if we can do anything to help, please let us know! also, I'd be interested in knowing what went wrong with llvm? we also had trouble effectively using llvm, at various points
<John>
hi all, I need some help taking a numpy array of raw bytes, say 10 bytes, and parsing out 4 letters of DNA per byte into a new array
<John>
I can do this in python, but it's quite slow, and i'm trying to show that PyPy, or at least Python, can be very performant when it counts
<John>
So i think i need to operate on this numpy array via CFFI, and while I can do that, i don't know how to take 10 bytes and produce 40 values
Tiberium has quit [Remote host closed the connection]
<John>
If anyone would be able to help me, your 10 minutes might save me a week, so i'd be very grateful, but i also appreciate everyone's busy with their own stuff too
<dash>
John: mmm. What parts of your project need numpy?
<John>
I could provide working Python code, and it just has to be made faster if that helps?
<dash>
John: As opposed to, say, lists or strings
<John>
Hey dash
Tiberium has joined #pypy
<dash>
John: Sure, couldn't hurt.
<John>
I have a huge number of identical-length strings (DNA), and i can encode it quickly, but my decoder is for whatever reason a lot slower than my encoder -_-
<John>
I have a feeling you know about bioinformatics... is that the case?
<dash>
I've heard of it. ;-)
<John>
Ah ok :) I guess perhaps i just recognise your handle/alias from #python
<John>
hehe
Tiberium has quit [Remote host closed the connection]
<John>
Well yeah, i don't really need numpy at all. But i don't know how to do performant bit shifting in C (and in python it's not likely to be possible)
amaury has joined #pypy
<bremner>
(x << bits) is as performant as it gets, no?
<bremner>
or maybe you mean something more complex by "bit shifting"
<John>
No that's basically it, but i'd have thought there'd be something more performant than
Tiberium has joined #pypy
<bremner>
I'd imagine that's one machine instruction on x86
<bremner>
unless x is bigger than a machine word
<antocuni>
John: can you just read them as strings and use struct.unpack/unpack_from?
<antocuni>
struct.unpack is very fast on pypy
<John>
struct only works on 4/8/16/32 bit numbers, and i've got 2/3/4/5... bit numbers, which is why i can't use that or array.array, etc
<John>
As for <<, yeah I guess it's one instruction, which means to unpack an array of things you'd do:
<John>
OK first you'd convert the string into a number, because you can't bitshift strings. This would be a pain but it's not impossible (just got to shift the int left 8 bits, add the byte, repeat)
<antocuni>
ah I see
<John>
Then once it's a number that python will let you bitshift, you have to do something like "for x in range(0,the_number.bit_length(),2): (the_number >> x) & 0xff"
<John>
so that's a lot of extra work i guess
<antocuni>
well, it's basically the same work that numpy has to do
<John>
oh, my 0xff is wrong, that should just be 0b11
<John>
antocuni: right, exactly
<John>
But i think in CFFI, someone could whip up something that worked much faster
<antocuni>
pypy is able to optimize all these ops very well, so I expect the pure-python version to be as fast as the C version
<John>
And i don't usually so blatantly ask for help, but I don't really have the week it will take me to figure it out in C
<antocuni>
(in this example, only "decode" is written to be fast on PyPy, but you can write encode in a similar way of course)
<John>
right right, but the magic here is in the "val = struct.unpack('H', s)[0]"
<John>
if your array wasn't 16 bits, but was 160 bits, this would take way longer to make the int
<John>
but the bit shifting i accept is fast as hell :)
<antocuni>
well no, of course if you have 160 bits you unpack in steps of 64 bits
<John>
oh, hehe, derp - i was unpacking in steps of 8 bits
<John>
well there's a performance boost right off the bat
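[editor's note: a sketch of antocuni's suggestion, building the big int by unpacking 64-bit words instead of single bytes. The little-endian word order and the helper name are assumptions, not from the log.]

```python
import struct

def bytes_to_int_64(data: bytes) -> int:
    # Zero-pad so every unpack reads a full 8-byte word.
    data = data + b"\x00" * (-len(data) % 8)
    n = 0
    # iter_unpack("<Q", ...) yields one little-endian 64-bit word per step.
    for i, (word,) in enumerate(struct.iter_unpack("<Q", data)):
        n |= word << (64 * i)
    return n
```

This is equivalent to `int.from_bytes(data, "little")`, but it makes the word-at-a-time unpacking under discussion explicit.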
<antocuni>
but is the record really 160 bits? O_o
<antocuni>
I expected it to be a multiple of 64 anyway
<John>
well my encoder takes a random max length of DNA
<John>
And there can be any number of letters of DNA, so each letter can be stored in 2bits, 3bits, 4bits, etc. that's variable
<John>
Once the bit-length per letter is known, and the max string length is known, a table can be made that's as wide as the smallest number of bytes needed
<John>
So if you needed to store 100 letters, 2bits per letter, that's 200 bits
<John>
oh that's exactly 25 bytes
<John>
ok say 101 letters, that's 202 bits
<John>
25bytes + 2bits, so 26bytes in the array
<antocuni>
yes but if you serialize it to a file, you might want to align them to 64 bit boundaries anyway (padding with 0s), if you care about speed
<John>
Hm. it could cost me up to 62 extra bits per entry, and i typically have a million entries or so :(
<John>
I guess that's only 62 megabytes
<John>
no, megabits
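[editor's note: the space arithmetic in this exchange as a runnable sketch. `bytes_needed` is the tight packing John computes; `aligned_bytes` is antocuni's 64-bit-aligned variant. Both function names are made up for illustration.]

```python
def bytes_needed(n_letters: int, bits_per_letter: int) -> int:
    """Smallest whole number of bytes that fits one packed record."""
    return (n_letters * bits_per_letter + 7) // 8

def aligned_bytes(n_bits: int) -> int:
    """Record size rounded up to a 64-bit word boundary."""
    return ((n_bits + 63) // 64) * 8

print(bytes_needed(100, 2))  # 100 letters * 2 bits = 200 bits -> 25 bytes
print(bytes_needed(101, 2))  # 202 bits -> 26 bytes
print(aligned_bytes(202))    # padded to 4 words = 32 bytes
```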
<antocuni>
as usual, it's a tradeoff between speed and space :)
<John>
hehe, yeah :( Sucks we can't have our cake and eat it
<antocuni>
anyway, I suggest you play along the lines of what I pasted above, I think this is the fastest you can do on pypy
<John>
ok, i'll give it a shot
<John>
thanks for your help man :)
<antocuni>
you're welcome
* antocuni
back to the faster-rstruct-2 branch :)
realitix has joined #pypy
black_ant has joined #pypy
<mattip>
ok, down to 270 pandas test failures, out of 8800 tests run, it seems basically usable
<kenaan>
tobweber stmgc[c8-adaptive-trx-length-per-thread] daf9d599a698 /c8/: Use double instead of float
<kenaan>
tobweber stmgc[c8-adaptive-trx-length-per-thread] 66afe82c56ce /c8/stm/core.c: Fix nested measurement of waiting time and time in validation; fix transaction ...
oberstet2 has joined #pypy
vkirilichev has quit [Ping timeout: 260 seconds]
vkirilichev has joined #pypy
[Arfrever] has quit [Ping timeout: 240 seconds]
vkirilic_ has joined #pypy
vkirilichev has quit [Read error: Connection reset by peer]
<fijal>
plan_rich_: around?
<kenaan>
antocuni faster-rstruct-2 4be2157b169f /rpython/rtyper/: WIP: start to implement llop.gc_store_indexed; still missing implementation in the C backend and the JIT
<kenaan>
antocuni faster-rstruct-2 1520c1ffb68f /rpython/rtyper/test/test_llop.py: bah, actually TEST the rtyping of gc_store_index, and fix it
<kenaan>
antocuni faster-rstruct-2 884703561c51 /rpython/translator/c/: test and implement gc_store_indexed in the C backend
<kenaan>
antocuni faster-rstruct-2 0f9bab52cf32 /rpython/jit/: WIP: start to add support for llop.gc_store_indexed in the JIT, which means to add stuff a bit everywhe...
<kenaan>
antocuni faster-rstruct-2 0b70e69aebec /rpython/: hoorray! Implement the last bits of gc_store_indexed in llgraph and finally the test passes :)
<kenaan>
antocuni faster-rstruct-2 84c40c5d2545 /rpython/: add JIT support for gc_store_indexed of floats
<kenaan>
antocuni faster-rstruct-2 a735e006ad8a /rpython/: add a passing test for single floats
<kenaan>
antocuni faster-rstruct-2 d5941a454db5 /rpython/: shuffle the order of arguments of llop.gc_store_indexed to match the existing rop.GC_STORE_INDEXED
<kenaan>
antocuni faster-rstruct-2 88ae2f6e9df5 /rpython/jit/: implement support for gc_store_indexed also in llsupport: this fixes the tests for the x86 backend, and...
<kenaan>
antocuni faster-rstruct-2 ca663c6eea4d /rpython/jit/backend/arm/test/test_llop.py: add the llop test also for ARM
stevi3 has quit [Quit: Connection closed for inactivity]
<dstufft>
Is there a benefit to CFFI API mode over ABI mode in terms of performance?
ronan has quit [Ping timeout: 264 seconds]
antocuni has joined #pypy
ronan has joined #pypy
inhahe_ has quit []
<kenaan>
antocuni faster-rstruct-2 3c31e7d36cc9 /rpython/jit/metainterp/: fix bhimpl_gc_store_indexed_i, which was not tested because the blackhole didn't see the op :(
<kenaan>
antocuni faster-rstruct-2 f99d6f69a91c /rpython/rlib/mutbuffer.py: unroll the loop if count is a small constant