antocuni changed the topic of #pypy to: PyPy, the flexible snake (IRC logs: https://botbot.me/freenode/pypy/ ) | use cffi for calling C | "PyPy: the Gradual Reduction of Magic (tm)"
jamesaxl has quit [Quit: WeeChat 1.9.1]
ArneBab has joined #pypy
slackyy has joined #pypy
antocuni has joined #pypy
tbodt has joined #pypy
slacky__ has joined #pypy
tbodt has quit [Read error: Connection reset by peer]
tbodt has joined #pypy
slackyy has quit [Ping timeout: 240 seconds]
Garen has joined #pypy
yuyichao has quit [Ping timeout: 248 seconds]
antocuni has quit [Ping timeout: 248 seconds]
yuyichao has joined #pypy
ceridwen has quit [Ping timeout: 258 seconds]
marr has quit [Ping timeout: 260 seconds]
pilne has quit [Quit: Quitting!]
tbodt has quit [Read error: Connection reset by peer]
tbodt has joined #pypy
tbodt has quit [Read error: Connection reset by peer]
tbodt has joined #pypy
ceridwen has joined #pypy
jcea has quit [Quit: jcea]
Rhy0lite has quit [Quit: Leaving]
tbodt has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
ArneBab_ has joined #pypy
ArneBab has quit [Ping timeout: 248 seconds]
_whitelogger has joined #pypy
jamesaxl has joined #pypy
<kenaan_> mattip default d7c94a4970dd /pypy/module/_continuation/test/conftest.py: generate conf.h for tests
<kenaan_> mattip py3.5 d2807ddb8178 /pypy/module/_continuation/test/conftest.py: merge default into py3.5
marr has joined #pypy
Nizumzen has quit [Ping timeout: 240 seconds]
Nizumzen has joined #pypy
antocuni has joined #pypy
<antocuni> arigato: I remember that once you had a script to take a pypy usession directory and create one with only the files needed to run "make"
<antocuni> is it still somewhere?
mattip has joined #pypy
<mattip> antocuni: played with eventlet a bit, http://paste.openstack.org/show/627291
<mattip> runtest == 4 shows about half the samples, so it is something like
<mattip> running an eventlet does not preserve, but does restore, the vmprof.enable
mattip has left #pypy ["bye"]
<antocuni> mattip: interesting
<antocuni> I'm looking at it right now
<fijal> antocuni: I would expect if you do loads('{"foo":"bar"}') (very short string), then rffi2charp will hurt you significantly in _pypyjson
<antocuni> fijal: yes, maybe
<antocuni> fijal: as you said, nowadays it's probably much better to pin the llstr to call rdtoa
<fijal> pinning is not "free", but it's not very expensive either
<fijal> it's surely cheaper than copying a buffer
<antocuni> good, then
<antocuni> the question is whether to pin the whole buffer for the duration of the parsing, or to pin it briefly only when you need to call rdtoa
<kenaan_> fijal unicode-utf8 109fd5f5d4eb /pypy/: start working on pypyjson
<kenaan_> fijal unicode-utf8 8fac293591e9 /pypy/module/_io/interp_textio.py: merge
<fijal> the latter I would think
<fijal> we kinda promised not to do the former
<fijal> I mean it depends - is one loads under one GIL?
marr has quit [Ping timeout: 260 seconds]
<antocuni> I think so, why wouldn't it?
<fijal> I don't know - can it execute python code in between?
<fijal> I mean, I'm sure it can somehow, but in a meaningful way
<antocuni> It has a space.call_function
<antocuni> although the function is always space.w_int
<fijal> yeah that's what I meant with "meaningful way"
<fijal> I'm sure it can call __eq__ or something
<antocuni> well, probably not since it only handles very primitive types: list, dict, int, float, str
<antocuni> but it's hard to prove
<fijal> I don't think i need a proof
<fijal> just the "how likely it is to call something massive"
<fijal> and I think "not likely at all"
<njs> do you support all the random silliness that json.load does, like object_pairs_hook?
<fijal> we do, but we probably don't go through _pypyjson then
<antocuni> njs: if you pass "strange" options, you go through the slow pure-python version
<antocuni> if you just do json.loads, you use the fast _pypyjson
<antocuni> wow, I'm trying to compile a pypy on my machine; OOM killed the final "ld" because it was using 4GB of ram O_o
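A minimal sketch of the two call styles described above, using only the standard `json` API. Which decoder handles each call is an assumption taken from the discussion (plain `json.loads` going to `_pypyjson` on PyPy, "strange" options falling back to the pure-Python decoder); the decoded values are the same either way.

```python
import json
from collections import OrderedDict

doc = '{"foo": "bar", "n": [1, 2.5, null]}'

# Plain call: per the discussion, PyPy can route this to the fast _pypyjson decoder.
plain = json.loads(doc)

# "Strange" option: object_pairs_hook is called with a list of (key, value) pairs
# for every JSON object; per the discussion this takes the slower pure-Python path.
ordered = json.loads(doc, object_pairs_hook=OrderedDict)

print(plain["foo"], list(ordered.keys()))
```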
<fijal> are you using -flto or something?
<antocuni> I'm just using "make debug"
<antocuni> I don't think it uses flto nowadays
<fijal> it went back and forth, I don't know any more
<antocuni> looking at the command line, it doesn't seem so
<fijal> then it looks like a bug?
<antocuni> or maybe it really uses so much, but nobody ever noticed?
yuyichao_ has joined #pypy
yuyichao has quit [Ping timeout: 268 seconds]
<fijal> antocuni: there is another reason, \0 at the end
<fijal> I *think* we started adding them everywhere?
<fijal> or at least thought about it
<antocuni> I don't really know
yuyichao has joined #pypy
yuyichao_ has quit [Ping timeout: 240 seconds]
<fijal> arigato: ping
oberstet2 has joined #pypy
<kenaan_> fijal unicode-utf8 8a24f68050df /pypy/module/_ssl/interp_ssl.py: fix _ssl module
mattip has joined #pypy
<antocuni> ok, I *think* the vmprof+eventlet problem is caused by the fact that at each switch, we call vmprof_stop_sampling twice, and vmprof_start_sampling only once
<mattip> about translation using too much memory, if you rerun make alone (outside translation) it uses less memory
<mattip> something like the translation process memory is leaking into the forked subprocess used to run make
<antocuni> mattip: I'm already running make alone
<mattip> ahh, never mind then
<kenaan_> fijal unicode-utf8 467a32f09dd6 /: start fixing _rawffi
<fijal> I wonder if we can kill _rawffi
<antocuni> isn't it still used by ctypes?
<fijal> yeah
<fijal> who uses ctypes
<fijal> ok, so here is a question
<fijal> we allow so far creation of invalid unicode out of C functions (we don't check at the boundary)
<fijal> is it ok?
<antocuni> I think ctypes is still used by some modules written years ago
<fijal> like that's a change
<fijal> (not being able to create invalid unicode using ctypes)
<fijal> but I'm not sure if it's not a bug
<antocuni> not sure what you mean
<fijal> antocuni: if you have a function returning wchar_t*
<fijal> we don't check the ranges of output, we just blindly copy them into the unicode string
<fijal> so if you have a wide build and return invalid unicode and you call the function via ctypes, we get invalid unicode strings
<njs> does cpython validate in this case?
<fijal> I doubt it
<fijal> maybe on python3, they did improve
* fijal wonders if he feels like checking
<antocuni> yes, I think we should basically try to mimic cpython
<antocuni> if cpython allows it, I am SURE that in the wild there is some obscure module which returns invalid unicode from ctypes :)
<fijal> antocuni: well, that's not possible
<fijal> we decided we are not going to allow invalid unicode
<fijal> but I wonder if it's a bug :)
<antocuni> so if we have already decided, why do you ask? :)
<fijal> was this always a bug?
<fijal> I just found out that it does not check (cffi does for example)
antocuni has quit [Ping timeout: 240 seconds]
<arigato> fijal: that's known. you can also use the array module to make invalid unicodes (on python 2)
<arigato> we pondered it and decided not to care for now
<fijal> right
<fijal> arigato: do you feel like fixing _sre?
<fijal> I'll add checking to wcharp2utf8 and stuff and then catch ValueError in callers, I would think
<arigato> ok
<arigato> wcharp2utf8 already checks, by calling the _append() function which checks
<fijal> right, but the caller should catch ValueError right?
<arigato> yes
<arigato> I wrote a comment in wcharp2utf8()
<arigato> and I made sure the only user (at that point in time two days ago) was catching ValueError
<fijal> right, I need to make sure too
<fijal> arigato: we have _rawffi, _sre and _io and we're done with modules
<fijal> then we can do some proper tests
<fijal> and _pypyjson
<fijal> and try to merge to py3k
<fijal> if we can merge it by next week, we have a month to finalize benchmarking, do a release and write a proper blog post
<fijal> which should be enough?
Nizumzen has quit [Quit: KVIrc 4.2.0 Equilibrium http://www.kvirc.net/]
<mattip> not sure what is going on with s390x and virtualenv
<mattip> I thought creating a virtualenv installs updated pip and setuptools?
<mattip> maybe I should try forcing an update?
<kenaan_> mattip buildbot 0a18cb374a4e /bot2/pypybuildbot/builds.py: not needed, virtualenv is deleted by "hg purge"
<kenaan_> mattip buildbot 0548ff25f980 /bot2/pypybuildbot/builds.py: update pip, setuptools
bbot2 has quit [Quit: buildmaster reconfigured: bot disconnecting]
bbot2 has joined #pypy
<mattip> let's see what happens tonite
jcea has joined #pypy
antocuni has joined #pypy
jcea has quit [Quit: jcea]
jcea has joined #pypy
<kenaan_> cfbolz unicode-utf8 82223a975b6b /pypy/module/_pypyjson/interp_decoder.py: fix unicode \-encoding in _pypyjson
<kenaan_> cfbolz unicode-utf8 a9bb96fbf9d4 /pypy/: fix more tests BUT: a slight pessimization, because object decoding becomes a little bit slower
jcea has quit [Client Quit]
jcea has joined #pypy
<arigato> cfbolz: why can't decode_key() return a utf8 byte string instead of a unicode string on default?
<cfbolz> arigato: it can, but it doesn't help anyway
<cfbolz> because on the branch there is no general UnicodeDictStrategy
<cfbolz> (on the branch it only works for ascii strings :-( )
<arigato> I'm fearing that you're changing pypyjson in this way because it makes sense for now, but then we'll need UnicodeDictStrategy anyway, and we'll forget to revert pypyjson
<cfbolz> yes, I see that fear. I should at least put a todo
<kenaan_> cfbolz unicode-utf8 8dac9e38c3d5 /TODO: add todo
<kenaan_> cfbolz unicode-utf8 6a13aba253bd /rpython/rlib/: use an actual iterator, to make the code nicer (they work well in rpython nowadays)
<kenaan_> cfbolz unicode-utf8 5b81f483c459 /pypy/module/_pypyjson/interp_encoder.py: fix encoding to operate on utf-8 encoded strings
<cfbolz> arigato: before I continue a lot, could you take a look at this diff?:
<arigato> looks good to me
<cfbolz> pfff, confusion
jamesaxl has quit [Read error: Connection reset by peer]
jamesaxl has joined #pypy
mattip has left #pypy ["bye"]
rubdos has quit [Ping timeout: 250 seconds]
<kenaan_> cfbolz unicode-utf8 f5be33826726 /rpython/rlib/: support for append_utf8
<kenaan_> cfbolz unicode-utf8 48da1a44d860 /pypy/objspace/std/unicodeobject.py: replace a lot of uses of StringBuilder by Utf8StringBuilder
<kenaan_> cfbolz unicode-utf8 f5a5189e5314 /pypy/objspace/std/unicodeobject.py: small cleanup of copy-pasted join code
<cfbolz> arigato: it's all completely annoying. architecture-wise we should have a type in rutf8 that contains most of the logic in unicodeobject.py. then, unwrapping a w_unicode would give that type. but then we would get yet another indirection.
<arigato> yes
<arigato> the alternative would be to add a field to the low-level rstr
<arigato> but it's also annoying
<arigato> of course, all these tuple-returning functions we have in the branch now are also relatively costly
<antocuni> uh, apparently we don't have a way to check whether we already installed a `pypyjit.set_compile_hook` :(
<arigato> cfbolz: maybe at some point we should do something about that
<cfbolz> arigato: or not, to discourage designs where you return a lot of tuples :-P
<cfbolz> But yes, I see your point
marr has joined #pypy
<fijal> cfbolz: one of my thoughts was "let's not have yet another layer of rpython magic"
<fijal> we can make tuple-returning functions do what they would do in C, right?
<fijal> specifically x, y = foo() kinda call
<fijal> it seems even easy-ish
<arigato> it's all but easy-ish
<arigato> it's a mess that has implications everywhere including throughout the JIT
<fijal> and the gc?
<arigato> dunno, I can see a way that makes it have no implications in the GC
<arigato> but everywhere else
<fijal> right
<fijal> well, any good ideas how to do it otherwise?
<kenaan_> rlamy default 2477eb379774 /pypy/module/_io/interp_textio.py: Keep chipping away at readline_w()
<arigato> no magic idea that will solve all your use cases, no
<fijal> well, one option would be to return the builder
<fijal> which is again a bit of a mess for JIT
<fijal> arigato: is there a good way to measure if returning a tuple is indeed a problem?
<fijal> arigato: should I attack _sre?
<bbot2> Started: http://buildbot.pypy.org/builders/own-linux-x86-64/builds/6390 [fijal: force build, unicode-utf8]
<arigato> feel free to start but please tell me if you stop, so that I can work on it
<fijal> arigato: ah ok, I won't do anything today I think
<fijal> and maybe I should actually not do anything tomorrow either :-)
<fijal> so feel free to do anything you want, I still have _rawffi to whack if I want to do something
<bbot2> Failure: http://buildbot.pypy.org/builders/own-linux-x86-64/builds/6390 [fijal: force build, unicode-utf8]
oberstet2 has quit [Ping timeout: 250 seconds]
<fijal> ok that looks like someone broke default
<fijal> is it fixed?
<fijal> arigato: so cfbolz has a good point that we already use utf8 on pypy3
<fijal> so maybe having an rpython-level utf8 string would solve both the tuple issue and pypy3 issue?
<arigato> right, it would solve a few deeper issues than the tuple one, like recomputing things currently stored on W_UnicodeObject in some situations
<arigato> on the other hand, it's a major mess
<arigato> pypy3 doesn't "have" a utf8 string, it just uses a regular string that happens to contain utf8
<fijal> why is it a major mess?
<fijal> I mean, we would use a subclass of str() on emulated level, with rpython-level being slightly different with extra fields
<fijal> I think the main problem is that emulated level will be even slower, but maybe that's ok?
<arigato> so you're thinking about a rstr.UTF8STR that would look like a rstr.STR with a few extra fields?
<fijal> yeah
<fijal> and the emulated layer would be *cough* a subclass of str
<arigato> would it annotate as a different and incompatible SomeUtf8String ?
<fijal> yeah
<arigato> I'm sure you'll need sometimes to convert between that and a regular str, is making a copy ok?
<fijal> we can have an operation that does that
<fijal> makes a copy while emulated and a cast when not emulated
<fijal> "cast"
<fijal> one way you need to scan a string anyway
<arigato> well, how do you "cast"?
<fijal> right
<arigato> the rstr.UTF8STR cannot be compatible enough, not easily
<arigato> you'd need a copy, which defeats the point of .encode('utf8') not making a copy
<fijal> indeed
<fijal> there are messier options of course
<fijal> like, have a bit saying which one is it and storing extra data at the end of the string
<fijal> (which is, super messy)
<arigato> as I said earlier we could have an extra pointer inside all rstr.STR
<arigato> so that we don't need a different rstr.UTF8STR
<fijal> yes, that's an option too
<fijal> it kinda shifts the balance in RPython a bit
* fijal should really make food
<fijal> arigato: the problem is as follows - what do we do with py3k?
<fijal> where text_w returns utf8 string (but no flags)
<fijal> do we rerun check_utf8 when rewrapping it?
<fijal> maybe?
<fijal> and we write a super fast check_utf8
<fijal> or do we do something else?
<fijal> that sounds like the easiest option for now (and one that's also an improvement on the current setup anyway)
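A rough sketch of what "rerun check_utf8 when rewrapping" could look like, written in plain Python 3.7+ rather than RPython. The function below is a hypothetical illustration, not the `rutf8` implementation: it validates a byte string, returns the number of code points, and takes a cheap path for pure-ASCII input.

```python
def check_utf8(data):
    """Return the number of code points in `data`, or raise ValueError.

    Hypothetical helper for illustration only; not the rutf8 implementation.
    """
    # Fast path: pure-ASCII input is always valid UTF-8, one byte per code point.
    if data.isascii():
        return len(data)
    # Slow path: full validation. UnicodeDecodeError is a subclass of ValueError,
    # so callers can simply catch ValueError.
    return len(data.decode("utf-8"))


print(check_utf8(b"plain ascii"))
print(check_utf8("h\u00e9llo".encode("utf-8")))
```

The ASCII check matters because, as noted in the log, most benchmark payloads are ASCII, so the common case never reaches the full decoder.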
<arigato> I guess you're talking in the continuation of the current work
<arigato> not in the RPython string hack world
<fijal> I mean - how do we merge utf8 to py3k
<arigato> because in the RPython string hack world, it's easier
<fijal> after the merge to default
<arigato> yes, I understand
jamesaxl has quit [Read error: Connection reset by peer]
<arigato> I'm saying, we came up with a different idea, so let's explore it a little bit
<fijal> yes, sure
<arigato> in this different world, it's easier for py3k
<fijal> so the question is - do we explore it now or do we first try to merge the current approach to py3k?
<arigato> who knows
jamesaxl has joined #pypy
<fijal> note that even if we call check_utf8 at the rewrapping, it's STILL a massive improvement over the current situation
<fijal> and gives us a clear path to finish the branch (and mozilla contract)
<fijal> maybe we should make it a Leysin sprint topic "improve even further" :-)
<arigato> I should ask I guess: are you sure that the work that CPython/PyPy5.9 does in .encode('utf8') and .decode('utf8') is really enough to offset the extra overhead in the unicode-utf8 branch of mostly every other operation?
<arigato> well it's also less memory, so it's not clearly "every other operation"
<fijal> what is "every other operation"?
<fijal> getitem, sure
<arigato> but every operation actually looking inside the string, like most unicode methods, is probably a bit slower
<fijal> (and yes, I believe so)
<fijal> I doubt it
<fijal> eg find scans a lot less memory
<arigato> ok
<arigato> I guess we'll see in benchmark results
<fijal> startswith for example should be faster
<fijal> arigato: well, give me an example :)
<arigato> things like UnicodeDictStrategy missing is probably costing something too
<fijal> again, no
<fijal> because I added the one for ascii
<fijal> and we don't run a single benchmark with an actual non-ascii unicode payload I think
<fijal> isupper is probably slower
<fijal> no, it's the exact same speed on a constant string
<arigato> ok, then maybe. I'll trust the benchmarks
<fijal> I think we SHOULD benchmark unicode non-ascii payloads :)
<fijal> but then we never did, so complaining that the branch might do something there is a bit problematic
<arigato> it seems to me that there is more complexity, which will translate into slower interpreted code and more bridges in the JIT
<arigato> but that's only a guess
<arigato> "more bridges" is mostly about: you do a small operation on a unicode string, and you get a bridge for ascii/non-ascii-unicode-string
<fijal> right
<fijal> let's translate and have a look
<fijal> we should also carefully look at some logs
<fijal> (eg check_utf8 forcing virtual strings etc)
<fijal> but that's a bit why I wanted to finish the modules, so I can have benchmarks
<cfbolz> FWIW, I also think that we should reduce the number of special cases for ASCII
<fijal> ok, so the logs are quite bad for example
<fijal> jitlogs
<cfbolz> For what kind of operation?
<fijal> addition, here
<fijal> for i in range(10000): unicode(i) + some_constant_unicode
<fijal> maybe the bytes object should keep track of whether it's valid utf8
<fijal> arigato: anyway, yes, lots of tweaking required
<fijal> so no, I'm not sure
<fijal> but we need to check
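The addition loop mentioned above can be turned into a small standalone micro-benchmark, sketched here so it runs on both Python 2 and Python 3. The constant's value and the function name are assumptions for illustration; only the shape of the loop comes from the log.

```python
import timeit

text_type = type(u"")                 # unicode on Python 2, str on Python 3
some_constant_unicode = u" items"     # hypothetical constant, pure ASCII


def concat_loop(n=10000):
    """Convert each integer to text and append a constant, as in the log."""
    total = 0
    for i in range(n):
        s = text_type(i) + some_constant_unicode
        total += len(s)               # use the result so the work is not dead code
    return total


if __name__ == "__main__":
    # Best of 5 runs, each executing the loop 10 times.
    print(min(timeit.repeat(concat_loop, number=10, repeat=5)))
```

Swapping the constant for a non-ASCII one gives the kind of non-ASCII payload the discussion says the existing benchmarks lack.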
yuyichao has quit [Ping timeout: 240 seconds]
yuyichao has joined #pypy
slacky__ has quit [Ping timeout: 248 seconds]
rubdos has joined #pypy
<kenaan_> rlamy default 189c2cce360e /: More refactoring: deal with the remnant more explicitly and handle size limit inside _find_line_ending()
antocuni has quit [Ping timeout: 260 seconds]
jamesaxl has quit [Read error: Connection reset by peer]
jamesaxl has joined #pypy
<kenaan_> rlamy default 9c9233da7cc4 /pypy/module/_io/: Replace (pos-if-found, pos-if-not-found) tuple with (position, found)
<kenaan_> rlamy unicode-utf8 f9a1926628b2 /: hg merge default
cjwelborn has quit [Ping timeout: 252 seconds]
jcea has quit [Remote host closed the connection]
jcea has joined #pypy
traverseda has quit [Read error: Connection reset by peer]
traverseda has joined #pypy
jamesaxl has quit [Quit: WeeChat 1.9.1]
pilne has joined #pypy
traverseda has quit [Remote host closed the connection]
cjwelborn has joined #pypy
kolko has quit [Quit: ZNC - http://znc.in]