cfbolz changed the topic of #pypy to: PyPy, the flexible snake (IRC logs: https://botbot.me/freenode/pypy/ ) | use cffi for calling C | "nothing compares to the timeshifter, my personal polar expedition in software" - pedronis
rokujyouhitoma has quit [Remote host closed the connection]
cloudyplain has quit [Ping timeout: 246 seconds]
exarkun has quit [Ping timeout: 260 seconds]
exarkun has joined #pypy
arigato has joined #pypy
<arigato>
cfbolz: regarding cloudyplain's question: also, likely, the GC reports the memory used *after* a collection, whereas the top RES is the memory used *before* a collection, i.e. 1.82 times more
<cfbolz>
right
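A rough arithmetic sketch of the point above, using made-up numbers. The 1.82 factor is the figure quoted by arigato; the 100 MB reading and the helper name are hypothetical, and real RES also includes memory that is not on the GC heap.

```python
# Hypothetical numbers, for illustration only.
MAJOR_COLLECT_FACTOR = 1.82  # growth factor quoted in the discussion above


def peak_before_next_collection(live_after_collection_mb, factor=MAJOR_COLLECT_FACTOR):
    """Approximate heap size reached just before the next major collection."""
    return live_after_collection_mb * factor


reported_by_gc = 100.0  # MB, what the GC would report right after a collection
peak = peak_before_next_collection(reported_by_gc)
print(f"GC reports ~{reported_by_gc:.0f} MB, RES may approach ~{peak:.0f} MB")
```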
raynold has quit [Quit: Connection closed for inactivity]
marr has quit [Remote host closed the connection]
rokujyouhitoma has joined #pypy
rokujyouhitoma has quit [Ping timeout: 248 seconds]
marky1991 has joined #pypy
oberstet has quit [Ping timeout: 260 seconds]
arigato has quit [Ping timeout: 240 seconds]
realitix has quit [Ping timeout: 252 seconds]
arigato has joined #pypy
realitix has joined #pypy
antocuni has joined #pypy
marky1991 has quit [Read error: Connection reset by peer]
oberstet has joined #pypy
Rhy0lite has joined #pypy
exarkun has quit [Ping timeout: 240 seconds]
exarkun has joined #pypy
kenaan has joined #pypy
<kenaan>
arigo unicode-utf8 0f1073a0843b /rpython/rlib/: rutf8.codepoint_position_at_index() should also work for index == len(u)
marr has joined #pypy
rokujyouhitoma has joined #pypy
rokujyouhitoma has quit [Ping timeout: 248 seconds]
<kenaan>
arigo py3.5 db294f23903a /pypy/module/itertools/: Test and fix for CPython's test_itertools (broken by accident in 17c8c1d27c41)
altendky has joined #pypy
yuyichao_ has quit [Ping timeout: 240 seconds]
forgottenone has quit [Quit: Konversation terminated!]
<krono>
does Samuele Pedroni hang out here sometimes?
Joannah has joined #pypy
aboudreault has quit [Excess Flood]
jcea has quit [Remote host closed the connection]
aboudreault has joined #pypy
<kenaan>
arigo py3.5 115c47f3c022 /lib-python/3/test/test_marshal.py: Skip one check on pypy
<kenaan>
arigo py3.5 8246644e701d /lib-python/3/test/test_marshal.py: Skip more tests on pypy
kdas_ has joined #pypy
<kenaan>
arigo py3.5 ea46b2e0a190 /lib-python/3/test/test_marshal.py: Fix the last test in test_marshal
yuyichao_ has joined #pypy
<cfbolz>
krono: yes, pedronis
<krono>
ah, stupid me
aboudreault has quit [Excess Flood]
adamholmberg has joined #pypy
kdas_ has quit [Quit: Leaving]
aboudreault has joined #pypy
forgottenone has joined #pypy
yuyichao_ has quit [Quit: Konversation terminated!]
aboudreault has quit [Excess Flood]
Joannah has quit [Ping timeout: 260 seconds]
aboudreault has joined #pypy
jcea has joined #pypy
aboudreault has quit [Excess Flood]
aboudreault has joined #pypy
realitix has quit [Ping timeout: 240 seconds]
<kenaan>
arigo py3.5 1dbe3599b597 /pypy/module/_cffi_backend/test/test_recompiler.py: Fix this test, which might be confused by partially-initialized codecs
<antocuni>
yes, it sounds interesting. However, not sure how to do such a heuristic: you need a way to identify and distinguish the possibly many "types" of dicts in the json
arigato has joined #pypy
<antocuni>
also, what would be VERY nice is to reuse the same map if you load two different json files
<antocuni>
but it's even more complicated
<cfbolz>
Yes
<cfbolz>
antocuni: instance maps get this right, so maybe that part is not terrible
<antocuni>
I admit that I don't know exactly how they work; do they use the class as a hint that two instances might have the same map, or do they simply use the fields?
<cfbolz>
I am wrong, anyway
<antocuni>
in other words: if I have two classes with the same fields, do they share the map?
<cfbolz>
Nope
<cfbolz>
The map stores the class too
<cfbolz>
So they have to be different
<antocuni>
right
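A toy sketch of the idea discussed above: a map records the class plus the attribute layout, so instances of one class with the same fields share a map, while two classes with identical fields do not. This is plain Python for illustration, not PyPy's actual implementation; the names `Map`, `Instance` and `empty_map` are hypothetical.

```python
class Map:
    """Shared layout: owning class name plus attribute name -> storage index."""

    def __init__(self, cls, attrs=()):
        self.cls = cls
        self.index = {name: i for i, name in enumerate(attrs)}
        self.transitions = {}  # attribute name -> Map with that attribute added

    def with_attr(self, name):
        # Reuse one successor per attribute so equal layouts share a Map object.
        if name not in self.transitions:
            self.transitions[name] = Map(self.cls, tuple(self.index) + (name,))
        return self.transitions[name]


_empty_maps = {}


def empty_map(cls):
    """Return the single empty Map for a class, creating it on first use."""
    if cls not in _empty_maps:
        _empty_maps[cls] = Map(cls)
    return _empty_maps[cls]


class Instance:
    def __init__(self, cls):
        self.map = empty_map(cls)
        self.storage = []

    def set(self, name, value):
        i = self.map.index.get(name)
        if i is None:
            # New attribute: move to the successor map and grow the storage.
            self.map = self.map.with_attr(name)
            self.storage.append(value)
        else:
            self.storage[i] = value

    def get(self, name):
        return self.storage[self.map.index[name]]


a, b, c = Instance("Point"), Instance("Point"), Instance("Vec")
for obj in (a, b, c):
    obj.set("x", 1)
    obj.set("y", 2)

print(a.map is b.map)  # same class, same fields
print(a.map is c.map)  # same fields, different class
```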
<antocuni>
then with json it's harder, because you don't know the "class"
<cfbolz>
Yes, you're right.
<antocuni>
I suppose that as a first approximation you could use some sort of "xpath" inside the json; like, these are objects of class "root.students[*].address"
<cfbolz>
Do people do json processing with pypy? Is it worth thinking more about it?
<cfbolz>
antocuni: ah, interesting idea!
<simpson>
cfbolz: I do a little JSON. I could probably do medium amounts of JSON. I don't know if it's a thing to store Big Data in JSON.
<antocuni>
cfbolz: this doesn't cover the case in which the "Address class" is used in two different places, but it might be good enough
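A minimal sketch of the "xpath" heuristic, assuming the path from the root (with list indices collapsed to `[*]`) stands in for the unknown class. The function name `collect_shapes` and the sample document are hypothetical. As noted above, an address-like object reachable via two different paths would end up under two different keys.

```python
import json
from collections import defaultdict


def collect_shapes(value, path="root", shapes=None):
    """Group the key tuples of all dicts by their path from the root."""
    if shapes is None:
        shapes = defaultdict(set)
    if isinstance(value, dict):
        shapes[path].add(tuple(value))  # keys in the order they appeared
        for key, child in value.items():
            collect_shapes(child, f"{path}.{key}", shapes)
    elif isinstance(value, list):
        for child in value:
            collect_shapes(child, f"{path}[*]", shapes)
    return shapes


doc = json.loads(
    '{"students": ['
    '{"name": "a", "address": {"city": "x", "zip": "1"}},'
    '{"name": "b", "address": {"city": "y", "zip": "2"}}]}'
)
shapes = collect_shapes(doc)
for path, layouts in sorted(shapes.items()):
    print(path, sorted(layouts))
```

A path with exactly one layout is a good candidate for a shared map; a path with many layouts (the "dict with tons of different keys" case) is not.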
<arigato>
also, what occurs if you have a dict with tons of different keys
<cfbolz>
antocuni: yes, we'd need to look at real json usages. But I suspect it covers a lot of the use cases
<antocuni>
cfbolz: considering the amount of REST APIs around, I guess it's likely that someone somewhere is processing huge jsons
<cfbolz>
arigato: yes, you aren't allowed to leak the keys then
<cfbolz>
simpson: with pypy?
<simpson>
cfbolz: Yeah. Storing homogeneous graph and category structures, and querying them via comprehensions. I can pastebin a bit.
<antocuni>
cfbolz: a quick googling shows that there are tons of python libraries to parse json, each of which claims to be faster than the previous ones
<antocuni>
so this probably means that people are parsing huge files. Then of course they all talk about the speed of parsing because on CPython it's the only thing you can optimize
oberstet has quit [Ping timeout: 240 seconds]
<cfbolz>
Of course
<antocuni>
I wonder if it's possible to abuse the JIT to emit a parser which is optimized to parse a very specific schema
<cfbolz>
antocuni: having to give the schema is boring ;-)
<antocuni>
no, of course it would need to autodetect it somehow
<antocuni>
"schema" in the sense of "file which has a recurrent structure"
<arigato>
yes, with guards if the thing to parse no longer follows what we found so far
<antocuni>
arigato: it sounds like a good topic for the sprint :)
<arigato>
...including things like "there are usually no spaces around the colon here"
<cfbolz>
It sounds like a nice thesis topic too
<cfbolz>
arigato: heh, fun
<cfbolz>
simpson: right. Basically any time there is an expression d[<constant string>] where d is a dict that comes out of the json parser we could do much better than CPython, given enough effort
<simpson>
cfbolz: I wonder if my usecase is at all common. I'm encoding functions from strings to strings as JSON objects with string values. So I'm constantly reusing the data in an applicative fashion.
<simpson>
WTB frozendict~
aboudreault has quit [Excess Flood]
<antocuni>
simpson: to have an idea of the performance, you could manually transform the json-parsed dictionaries into "instance-based dictionaries": https://bpaste.net/show/d7621b8e4b76
<antocuni>
after this transformation, things like d['category']['graph'] will be much faster
<antocuni>
and basically, what cfbolz was proposing would give you this "transformation" for free
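One way such a manual transformation could look, as a sketch only: it is not the content of the paste linked above, and `Record` and `to_records` are hypothetical names. Each parsed dict becomes an instance whose keys are stored as attributes, while `d['key']` lookups keep working.

```python
import json


class Record:
    """Attribute-backed object that still supports d['key'] lookups.

    Assumption: keys are ordinary strings that do not clash with special
    attribute names such as '__class__'.
    """

    def __init__(self, mapping):
        for key, value in mapping.items():
            setattr(self, key, value)

    def __getitem__(self, key):
        try:
            return getattr(self, key)
        except AttributeError:
            raise KeyError(key) from None


def to_records(value):
    """Recursively replace parsed dicts with Record instances."""
    if isinstance(value, dict):
        return Record({k: to_records(v) for k, v in value.items()})
    if isinstance(value, list):
        return [to_records(v) for v in value]
    return value


d = to_records(json.loads('{"category": {"graph": [1, 2]}}'))
print(d["category"]["graph"])
```

Whether this is actually faster depends on the interpreter and the workload, so it is worth measuring on the real data before adopting it.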
<cfbolz>
Something like that. But maybe it's too much magic anyway
aboudreault has joined #pypy
<simpson>
antocuni: Interesting. How would that interact with attrs?
<antocuni>
cfbolz: not more magic than other things we do in PyPy :)
<kenaan>
cfbolz default 8aeaf30c80e8 /pypy/module/_pypyjson/: cache the string keys that occur in the json dicts, as they are likely to repeat this reduces both parsing time a...
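The key-caching idea from the commit message above can be approximated at application level with the standard `json` module's `object_pairs_hook`. This is a sketch only: the real change lives in the `_pypyjson` module, and `make_key_caching_hook` is a hypothetical helper name.

```python
import json


def make_key_caching_hook():
    """Build an object_pairs_hook that reuses one string object per key."""
    cache = {}

    def hook(pairs):
        # setdefault returns the first string object seen for an equal key.
        return {cache.setdefault(k, k): v for k, v in pairs}

    return hook, cache


hook, cache = make_key_caching_hook()
first = json.loads('[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]',
                   object_pairs_hook=hook)
second = json.loads('{"id": 3, "name": "c"}', object_pairs_hook=hook)
print(sorted(cache))
```

Because the cache outlives a single `json.loads` call, keys are shared across documents parsed with the same hook.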
<kenaan>
cfbolz default 64e7df28f623 /pypy/: create a dict with the unicode strategy directly (also fix targetjson)