<beneroth>
does this give enough relevance for penti that a wikipedia article would not be deleted? :P
<beneroth>
(storage space seems to be an issue *irony*)
<C-Keen>
beneroth: we will just need an article about penti then it is relevant enough
<Regenaxer>
C-Keen: Yes, I know. I talked to him last year in Nürnberg and gave him the Penti sources
<beneroth>
C-Keen, right, we should write to the heise.de guy who likes the demanding version of user friendliness
<C-Keen>
Regenaxer: met him a couple of weeks ago where he showed it to me. Also nice, he uses gestures for common tasks such as rubbing out chars or pressing enter
* beneroth
mumbles about missing time stamps on websites...
<beneroth>
wow, nice analogy he has there! "My slogan for this is “Point of view is worth 80 IQ points” (you can use “context” or “perspective” etc.). A poor one might subtract 80 IQ points! "
<beneroth>
"Turns out MySQL’s utf8 charset only partially implements proper UTF-8 encoding. It can only store UTF-8-encoded symbols that consist of one to three bytes"
<beneroth>
bwahaha
<beneroth>
pilDB ftw!
<Regenaxer>
oh
<Regenaxer>
same for pil!
<Regenaxer>
1 ... 3 bytes
<Regenaxer>
ie 16 bit chars
<Regenaxer>
like Java
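The two statements above describe the same boundary: a 3-byte UTF-8 sequence carries 4 + 6 + 6 = 16 payload bits, so any code point above U+FFFF needs a fourth byte. A minimal C sketch of that arithmetic, assuming the RFC 3629 definition of UTF-8 (the name `utf8_len` is hypothetical and not taken from the pil sources):

```c
#include <stdint.h>

/* Number of UTF-8 bytes needed for code point c under RFC 3629,
 * or 0 if c is not a valid Unicode scalar value.
 * Hypothetical helper for illustration only. */
int utf8_len(uint32_t c) {
    if (c < 0x80) return 1;                    /* 7 payload bits (ASCII) */
    if (c < 0x800) return 2;                   /* 5 + 6 = 11 bits */
    if (c >= 0xD800 && c <= 0xDFFF) return 0;  /* surrogates are not characters */
    if (c < 0x10000) return 3;                 /* 4 + 6 + 6 = 16 bits */
    if (c <= 0x10FFFF) return 4;               /* 3 + 6 + 6 + 6 = 21 bits */
    return 0;                                  /* beyond the Unicode range */
}
```

Under this definition most emoji, for example U+1F600, fall in the 4-byte range, which is why "smileys" are the practical case that hits a 3-byte limit.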
<beneroth>
really?
<beneroth>
reeeeaaaallllll yyyyy ?
<beneroth>
omg that is big fail
<beneroth>
and does it also slice down the characters silently without any errors as MySQL?
<beneroth>
because that is likely to introduce security vulnerabilities using the database
<Regenaxer>
I believe it will give garbage, never tested it
<Regenaxer>
I never had characters beyond 16 bits
<beneroth>
smileys
<Regenaxer>
Could be extended
<Regenaxer>
yeah, true
<Regenaxer>
How is it in Java then?
<beneroth>
no idea. but java is bad, so we don't care.
<Regenaxer>
When I wrote it, I looked at what Java did
<beneroth>
exploit: one could send text (e.g. from a web client) which gets stored in the database, and which by silent alteration gets turned into working malicious code when retrieved and embedded into an html website
<Regenaxer>
On retrieval nothing can happen
<beneroth>
maybe harder in pil than with MySQL because the application which does the validation would usually be pil too
<Regenaxer>
always returns pil chars
<beneroth>
yes
<beneroth>
and those chars might be a nice malicious javascript code string
<Regenaxer>
I/O could be easily fixed
<Regenaxer>
'hash' is a problem perhaps
<beneroth>
which it wasn't/looked like during validation before being stored (at least if the validation was not done within MySQL/pil)
<beneroth>
Regenaxer, I would be happy to know this fixed. not because of the utter relevance, but because this not working is against the promise of scalability and compatibility of picolisp
<beneroth>
yeah if it is possible with signed integers, then extending characters in strings is probably easily fixed too
<Regenaxer>
Well, except for 'hash' the fix is easy
<beneroth>
what is the problem with hash? that it gets more input than it thinks?
<Regenaxer>
Only that it is limited to 16 bits
<beneroth>
the input?
<Regenaxer>
I don't know where there are assumptions of 16-bit max chars
<Regenaxer>
I/O, symbols etc. are no problem
<beneroth>
the output of hash is fixed 16bits, but it should be able to handle input of arbitrary length, no?
<Regenaxer>
symbol names are just a byte stream
<beneroth>
yeah
<Regenaxer>
I think there is nowhere a problem, except that bigger chars get truncated
<Regenaxer>
I can't imagine an attack scenario here
<beneroth>
not in pil VM
<Regenaxer>
also not in a browser
<Regenaxer>
as I said, it always outputs legal chars
<Regenaxer>
it cannot do otherwise
<beneroth>
well if you manage to create a string which with un-truncated chars looks fine but with truncated chars is malicious
<beneroth>
depends heavily on the use case, nothing to do with pil itself
<Regenaxer>
not sure
<Regenaxer>
it is just a byte stream
<Regenaxer>
also for I/O
<Regenaxer>
only 'chop' etc cares I think
<Regenaxer>
Why is Java no problem then?
<Regenaxer>
So many web sites use Java servers
<beneroth>
e.g. imagine a web application, not written in picolisp but some other language. it takes the input string, validates it (no bad chars in it, okay), sends it to the pil app, the string gets altered by UTF truncating, and pil stores it. later it gets retrieved by the web app (not pil), which receives the now truncated stream (not equal to what it sent to pil originally) and might embed this in the browser (or in its C code to be executed or whatever)
<Regenaxer>
What *are* bad chars?
<beneroth>
good question. I would assume that they have the problem too, unless they throw an exception instead of silently truncating
<Regenaxer>
If there are "bad chars", you cannot stop anybody to use them directly! ;)
<Regenaxer>
So I do not see atm the point of your worries
<Regenaxer>
If you store a smiley in the DB, it will come out again just ok
<beneroth>
you can by validating, e.g. make sure that there is no "<script>" within text which come from clients and end up rendered within html later
<Regenaxer>
only if pil code operates on the chars, eg. with 'chop', it sees garbage
<beneroth>
ok, then the issue in pil is not as bad as in MySQL
<Regenaxer>
No part of a pil web app cares about <script>*
<Regenaxer>
this would be very bad, if any user input gets executed
<Regenaxer>
In miniPil it is even worse
<beneroth>
textbox description -> gets somewhere embedded as (<p> NIL (: desc)) -> if you can get the app to accept (<script>) within that text then you can attack everyone who renders that html
<Regenaxer>
handles only 6 1/2 bytes per char :)
<beneroth>
not pil is executing it
<beneroth>
the point is not that pil would be responsible to validate against that, it can't, that only the specific application can
<Regenaxer>
<p> uses ht:Prin
<beneroth>
but the point is, it can only do it if it is guaranteed to get back what it stored, if it gets something altered back, it might be possible to cover something up for the validation check
<Regenaxer>
A "<" can never be encoded this way
<Regenaxer>
in UTF
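The byte-level property behind this remark is that every byte of a multi-byte UTF-8 sequence has its top bit set, so the byte 0x3C ("<") can only appear as the ASCII character itself. A small C sketch of the structural classification, assuming standard UTF-8 bit patterns (the enum and function names are hypothetical):

```c
/* Structural role of a single byte in a UTF-8 stream.
 * This only looks at the bit pattern; it does not reject
 * overlong forms or out-of-range lead bytes. */
enum utf8_kind {
    UTF8_ASCII,    /* 0xxxxxxx: a complete one-byte character */
    UTF8_CONT,     /* 10xxxxxx: continuation byte */
    UTF8_LEAD2,    /* 110xxxxx: starts a 2-byte sequence */
    UTF8_LEAD3,    /* 1110xxxx: starts a 3-byte sequence */
    UTF8_LEAD4,    /* 11110xxx: starts a 4-byte sequence */
    UTF8_INVALID   /* 0xF8..0xFF: never valid in UTF-8 */
};

enum utf8_kind utf8_classify(unsigned char b) {
    if (b < 0x80) return UTF8_ASCII;
    if ((b & 0xC0) == 0x80) return UTF8_CONT;
    if ((b & 0xE0) == 0xC0) return UTF8_LEAD2;
    if ((b & 0xF0) == 0xE0) return UTF8_LEAD3;
    if ((b & 0xF8) == 0xF0) return UTF8_LEAD4;
    return UTF8_INVALID;
}
```

This holds for the byte stream only. Whether truncating a decoded code point to 16 bits could yield an ASCII value depends on how the decoder handles the extra bits, which the log does not settle.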
<beneroth>
yeah, just an example
<Regenaxer>
other byte ranges
<Regenaxer>
yes
<Regenaxer>
I'm very sure it is safe
<Regenaxer>
no way to attack this way
<Regenaxer>
There is no such validation check needed in a pil app
<Regenaxer>
so it cannot be tricked, as there is none ;)
<beneroth>
ah, the exploit might be much weaker or not possible in pil
<Regenaxer>
I think so
<beneroth>
apparently, MySQL also happened to truncate the rest of the string after the too-big-char
<beneroth>
This taught me that tables with utf8 as charset can not store astral symbols (whose code points range from U+010000 to U+10FFFF). So what happens if we try to store one of these symbols nonetheless? Apparently, everything after such a symbol is just discarded. So for example, when trying to insert foo𝌆bar, MySQL will discard 𝌆bar and just store foo.
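An application that has to write into a store limited to 3-byte UTF-8 could check for astral symbols before storing, rather than rely on the store to report them. A minimal C sketch, assuming the input is already valid, NUL-terminated UTF-8 (the name `first_astral` is hypothetical):

```c
#include <stddef.h>

/* Returns the byte offset of the first 4-byte UTF-8 sequence in the
 * NUL-terminated string s, or -1 if there is none.
 * Assumes s is valid UTF-8: there, bytes >= 0xF0 occur only as the
 * lead byte of a 4-byte sequence (code points above U+FFFF). */
long first_astral(const char *s) {
    for (size_t i = 0; s[i] != '\0'; i++) {
        if ((unsigned char)s[i] >= 0xF0)
            return (long)i;
    }
    return -1;
}
```

For the quoted example, `first_astral("foo\xF0\x9D\x8C\x86" "bar")` points at offset 3, the position where the quoted MySQL behaviour starts discarding data.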
<Regenaxer>
I think the only place where pil discards the rest is at a null byte
<beneroth>
but still. if you manage to find a high unicode character which gets truncated to a meaningful character in the executing environment (not pil!), there might be an attack surface
<Regenaxer>
like in C
<Regenaxer>
The only safe way in any lang is when input strings are never interpreted as code
<beneroth>
T
<Regenaxer>
This is in pil, if 'allow' etc. is used properly
<beneroth>
T
<Regenaxer>
and 'repl' not used in a web page :)
<beneroth>
T
<beneroth>
and you don't do (read) at stupid places :D
<Regenaxer>
right
<beneroth>
or (load) user input
<beneroth>
etc
<Regenaxer>
Though pil is powerful enough to make it possible
<beneroth>
if there would be a possibility to trick ($tim) to do something bad with an argument, that would be a hole
<Regenaxer>
Extending I/O in pil would be trivial. About 4 places in the kernel
<beneroth>
that is the problem with all those SQL websites, still usually doing manual string concatenation
<Regenaxer>
but I don't know how many assumptions are there that chars have 16 bits
<beneroth>
ok
<Regenaxer>
eg. in the Java interfaces
<Regenaxer>
or 'hash'
<Regenaxer>
For 'hash' it is perhaps only the documentation
<beneroth>
so would need a review of everything working with symbol names
<beneroth>
s/working with/operating on
<beneroth>
(whenever it is not just treated as a binary stream)
<Regenaxer>
No, just grep for 0xC0 (in the C sources) or (hex "C0") in asm
<beneroth>
two hits
<beneroth>
io.l and sym.l
<Regenaxer>
yes
<Regenaxer>
this is one direction
<Regenaxer>
the other direction uses 0x80
<Regenaxer>
and 0xE0
<Regenaxer>
such patterns
<Regenaxer>
just a few functions in the sources
<Regenaxer>
As I said, I don't remember the consequences at the moment
<beneroth>
ok
<beneroth>
yeah
<Regenaxer>
a typical case is:
<Regenaxer>
void charSym(int c, int *i, any *p) {
<Regenaxer>
if (c < 0x80)
<Regenaxer>
byteSym(c, i, p);
<Regenaxer>
else if (c < 0x800) {
<Regenaxer>
byteSym(0xC0 | c>>6 & 0x1F, i, p);
<Regenaxer>
byteSym(0x80 | c & 0x3F, i, p);
<Regenaxer>
}
<Regenaxer>
else if (c == TOP)
<Regenaxer>
byteSym(0xFF, i, p);
<Regenaxer>
else {
<Regenaxer>
byteSym(0xE0 | c>>12 & 0x0F, i, p);
<Regenaxer>
byteSym(0x80 | c>>6 & 0x3F, i, p);
Regenaxer has quit [Remote host closed the connection]
<beneroth>
oops
Regenaxer has joined #picolisp
<beneroth>
wb
<beneroth>
;)
<Regenaxer>
hehe, ^D again :)
<Regenaxer>
void charSym(int c, int *i, any *p) {
<Regenaxer>
if (c < 0x80)
<Regenaxer>
byteSym(c, i, p);
<Regenaxer>
else if (c < 0x800) {
<Regenaxer>
byteSym(0xC0 | c>>6 & 0x1F, i, p);
<Regenaxer>
byteSym(0x80 | c & 0x3F, i, p);
<Regenaxer>
}
<Regenaxer>
else if (c == TOP)
<Regenaxer>
byteSym(0xFF, i, p);
<Regenaxer>
else {
<Regenaxer>
byteSym(0xE0 | c>>12 & 0x0F, i, p);
<Regenaxer>
byteSym(0x80 | c>>6 & 0x3F, i, p);
<Regenaxer>
byteSym(0x80 | c & 0x3F, i, p);
<Regenaxer>
}
<Regenaxer>
}
<Regenaxer>
So this mess must be extended
<Regenaxer>
from 3 to 5
<Regenaxer>
ugly
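For comparison, UTF-8 as defined in RFC 3629 stops at 4 bytes (U+10FFFF), so covering all of Unicode needs one extra branch. The following is a standalone C sketch of such an encoder, not the pil kernel code: it writes into a plain buffer instead of calling `byteSym`, ignores the `TOP` special case, and the name `utf8_encode` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes code point c as UTF-8 into out and returns the number of
 * bytes written (1..4), or 0 if c is above U+10FFFF.
 * Surrogates (U+D800..U+DFFF) are not rejected in this sketch. */
size_t utf8_encode(uint32_t c, unsigned char out[4]) {
    if (c < 0x80) {
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    if (c < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (c >> 12));
        out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (c & 0x3F));
        return 3;
    }
    if (c <= 0x10FFFF) {
        /* The additional branch: 3 + 6 + 6 + 6 = 21 payload bits */
        out[0] = (unsigned char)(0xF0 | (c >> 18));
        out[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (c & 0x3F));
        return 4;
    }
    return 0;
}
```

With this, U+1D306 (the 𝌆 from the MySQL example) comes out as the four bytes F0 9D 8C 86.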
<beneroth>
I see
<beneroth>
Regenaxer, well you have given me my picolisp smugness arrogance back, seeing that pil only truncates on the single character and not on the string :P
<Regenaxer>
ok
<beneroth>
as long as MySQL looks worse the world is in order (not entirely serious ofc)
<Regenaxer>
The reason to change it would be smileys and other new chars, not so much security
<beneroth>
yeah I see
<Regenaxer>
I put it onto my todo list
<Regenaxer>
I'll do it only for pil64
<beneroth>
well I'm still not absolutely sure that there could never be a security issue because of this, but it is extremely unlikely, would require very unusual setup and the exploit would surely happen outside of pil code
<beneroth>
thank you
<Regenaxer>
For Ersatz it is meaningless anyway
<beneroth>
haha
<Regenaxer>
yes, please think of an attack scenario
<Regenaxer>
which could be caused by this
<Regenaxer>
I think if there were any, it would work also with the current version
<Regenaxer>
regardless of char size I mean
<beneroth>
T, you would just use a 13 byte char instead of 5
<Regenaxer>
wow
<Regenaxer>
why 13 especially? Because of bad omen?
<beneroth>
ah no
<beneroth>
because there was a perl comment about using 13 bytes
<Regenaxer>
In pil it exists internally, in symbol names
<Regenaxer>
as a stream of bytes
<Regenaxer>
Java Strings are arrays of 16 bit unsigned ints
<Regenaxer>
'char' type in Java
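On the earlier question of how Java handles this: a Java `char` is a 16-bit UTF-16 code unit, and code points above U+FFFF are stored in a String as a surrogate pair of two such units. A C sketch of that split, following the UTF-16 definition (the name `utf16_encode` is hypothetical):

```c
#include <stdint.h>

/* Splits code point c into UTF-16 code units.
 * Returns 1 or 2 (units written to out), or 0 if c is a lone
 * surrogate value or above U+10FFFF. */
int utf16_encode(uint32_t c, uint16_t out[2]) {
    if (c < 0x10000) {
        if (c >= 0xD800 && c <= 0xDFFF)
            return 0;                 /* reserved for surrogate pairs */
        out[0] = (uint16_t)c;
        return 1;
    }
    if (c > 0x10FFFF)
        return 0;
    c -= 0x10000;                     /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (c >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 | (c & 0x3FF));  /* low surrogate: bottom 10 bits */
    return 2;
}
```

So a 16-bit character type does not by itself limit a string to the first 65536 code points, but code that processes one unit at a time sees two units for one astral character.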
<beneroth>
thanks for the explanations and your time, Regenaxer
<Regenaxer>
welcome!
<Regenaxer>
Thanks for reminding me of the 16-bit limitation
<beneroth>
yeah I'm the annoying ankle-biter (Wadenbeisser). intended to question technical merit. not meant personally in any bad way, quite the opposite. tell me please when I get too annoying or ask too stupid stuff.
<Regenaxer>
nono, don't worry
<beneroth>
not possible to communicate this way with non-technical people.
<beneroth>
yeah ;)
<Regenaxer>
The 3-byte issue is propagating through many places. Really tedious to change
<beneroth>
I trust you to tell me to shut up when I get carried away too much
<beneroth>
tricky
<Regenaxer>
Most difficult is the uppc/lowc issue
<Regenaxer>
It all assumes 16 bits
<beneroth>
and the question is if it has to be done again when eventually unicode grows more (fantasy movie letters or whatever)
<beneroth>
hm... is uppc/lowc even meaningful for high unicode?
<Regenaxer>
I think 32 bits last a long way
<Regenaxer>
don't know
<beneroth>
else I would limit uppc/lowc to 16 bits, document the limit and nothing more
<Regenaxer>
hmm, ugly
<beneroth>
yeah it's hard
<Regenaxer>
No, I think I stay with 16 bits. I don't see a problem
<Regenaxer>
Note that I/O works
<Regenaxer>
only *processing* chars gives garbage
<beneroth>
Regenaxer, you heard about the unicode domain hacking? e.g. making a domain "google.com" but e.g. the "e" is not a normal "e" but a usually similarly rendered unicode character. so valid domain, possible to get a certificate for it, but for the user it looks probably like something else
<Regenaxer>
chop, char etc
<Regenaxer>
yes
<Regenaxer>
kind of phishing
<beneroth>
was a topic with firefox and chrome some months ago. I think chrome partially solved it by giving such characters a background. not really secure but can't do much more without breaking things.
<beneroth>
yeah.
<Regenaxer>
yes
<beneroth>
and I agree that this should not be limited.
<Regenaxer>
which "this"?
<beneroth>
unicode in domain names
<Regenaxer>
T
<beneroth>
the secure solution (which is meant for these scenarios!) would be well maintained certificate stores.
<Regenaxer>
and users checking them
<Regenaxer>
that's the main problem
<Regenaxer>
(not checking)
mario-goulart has quit [Read error: Connection reset by peer]
mario-goulart has joined #picolisp
<beneroth>
T
<Regenaxer>
What I will do is implement sanity checks in UTF-8 parsing
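What such sanity checks usually cover can be sketched generically: valid lead byte, enough continuation bytes, no overlong forms, no surrogates, nothing above U+10FFFF. The C function below is an illustration based on RFC 3629, not the checks that ended up in pil; the name `utf8_decode` and its signature are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes one code point from the n bytes at s into *cp.
 * Returns the number of bytes consumed (1..4), or 0 if the
 * sequence is malformed or truncated. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp) {
    size_t len;
    uint32_t c, min;

    if (n == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }

    if ((s[0] & 0xE0) == 0xC0)      { len = 2; c = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; min = 0x10000; }
    else return 0;                  /* stray continuation or invalid lead byte */

    if (n < len) return 0;          /* input ends inside the sequence */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min) return 0;                     /* overlong encoding */
    if (c >= 0xD800 && c <= 0xDFFF) return 0;  /* surrogate */
    if (c > 0x10FFFF) return 0;                /* outside Unicode */

    *cp = c;
    return len;
}
```

The overlong check is the one that connects to the "<" discussion: the two bytes C0 BC would decode to 0x3C in a lenient decoder, and this sketch rejects them.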
coffeecup12345 has quit [Ping timeout: 248 seconds]
coffeecup12345 has joined #picolisp
karswell_ has quit [Ping timeout: 248 seconds]
karswell_ has joined #picolisp
<Regenaxer>
ret
<Regenaxer>
tankfeeder: (= 1 (legendre could use '=1'
<tankfeeder>
indeed
<Regenaxer>
like in the other places
<tankfeeder>
i used somewhere
<Regenaxer>
Could the loop be 'while' ?
<Regenaxer>
when (list R (- P R)) is at the end
<Regenaxer>
Cause it has only a single exit
<tankfeeder>
indeed
<Regenaxer>
(the loop I mean)
<Regenaxer>
All nothing important
<tankfeeder>
loop should be until
<tankfeeder>
i will fix
<beneroth>
I just had an issue with (pipe), apparently I called it too quick & too often, resulting in too many open fd's.. I added a (wait 1) before the (pipe) call, that resolved it
<beneroth>
(yep some ugly abuse of (ht:) on strings instead of streams)
<tankfeeder>
fixed, thanks.
Regenaxer has left #picolisp [#picolisp]
Regenaxer has joined #picolisp
<Regenaxer>
yeah, 'until'
<Regenaxer>
beneroth
<Regenaxer>
ok
<Regenaxer>
so you use a pipe for 'ht:Prin'
<beneroth>
aye
<Regenaxer>
Then better not to make a new pipe each time
<beneroth>
my use case is download and later re-upload
<Regenaxer>
when the target db is different
<beneroth>
ok
<beneroth>
no different DB for me then
<Regenaxer>
then it is not necessary I think
<Regenaxer>
the most involved case I find:
<Regenaxer>
(dm (dumpKey> . +USt) ()
<Regenaxer>
(unless (: T)
<Regenaxer>
(when
<Regenaxer>
(assoc (get X 'key)
<Regenaxer>
(quote
<Regenaxer>
("16% Umsatzsteuer" . "U16")
<Regenaxer>
("16% Vorsteuer" . "V16")
<Regenaxer>
("USt (voll)" . "U19")
<Regenaxer>
("VSt (voll)" . "V19")
<Regenaxer>
("frei" . "u0")
<Regenaxer>
("ust (halb)" . "u7")
<Regenaxer>
("vst (halb)" . "v7") ) )
<Regenaxer>
(cons 'key (cdr @) 'key) ) ) )
<Regenaxer>
The (unless (: T) seems always there
<Regenaxer>
ie. if the object is not deleted
<Regenaxer>
s/deleted/lost
<beneroth>
yeah, saw this too
<beneroth>
ok
<Regenaxer>
ah, yes, the default method has it too
<Regenaxer>
yep
<beneroth>
I also got some example scripts from you long ago
<Regenaxer>
ok
<Regenaxer>
Perhaps the same I just looked up
<beneroth>
likely
<beneroth>
I expect (dumpDB) to make a .tgz
<beneroth>
but it only puts the blobs into the .tgz, not the .l
<beneroth>
?
<beneroth>
ah right the filter issues this
<Regenaxer>
yes
<Regenaxer>
only if there are blobs
<beneroth>
yeah, and then you end up with 2 files, .l and .tgz
<Regenaxer>
T
<beneroth>
I guess I just pack both again into another .tgz.
<beneroth>
debian package principle
<Regenaxer>
yes
<Regenaxer>
or just .tar
<Regenaxer>
but double compression does not harm
<beneroth>
usually not
<beneroth>
unless the data cannot be compressed further.. then it just adds overhead :)
<joebo>
beneroth: just a thought -- not sure... I wonder if you could wrap it in a (goal (quote ..
<joebo>
oh shoot, I was scrolled up and didn't see this was already addressed
<joebo>
looks like my hunch was correct though! :)
<joebo>
I got the tip from looking at app/gui.l
<Regenaxer>
joebo, correct
<beneroth>
joebo, thanks :)
<joebo>
np :)
<beneroth>
you get expert points :)
<joebo>
hah!
<joebo>
maybe advanced beginner :)
<joebo>
almost to novice
<beneroth>
Regenaxer, why did you do (out "file.tgz" (in (append '("tar" "cfz" "-") Files))) instead of just (call 'tar ..) ? why the streaming?
<beneroth>
in @lib/too.l
<beneroth>
does this have any advantages?
<joebo>
beneroth: looks like there's a filter clause?
<joebo>
(in (append '("tar" "cfz" "-") (filter format @))
<beneroth>
yeah
<beneroth>
only the files which have numbers as filenames
<Regenaxer>
T
<joebo>
guessing it's because it's easier to apply filters on what files get included in pil vs part of the command line
<beneroth>
ah, instead of building the (call) by hand, e.g. (append '(call ..) Files) ?
<Regenaxer>
you would use (apply call ... ?
<beneroth>
no, I would build a list and then eval xD
<Regenaxer>
also fine
<beneroth>
(I believe this is better than apply?)
<beneroth>
just less readable, you are right
<Regenaxer>
yes, but it depends as eval evaluates the args a second time
<beneroth>
T
<Regenaxer>
(apply foo Lst) = (eval (cons 'foo (mapcar lit Lst))) or so
<beneroth>
yeah
<Regenaxer>
but in lib/too.l we have 'in', so 'call' is of no help
<beneroth>
yeah I meant of course instead of (in) (out)
<beneroth>
:)
<Regenaxer>
ok
<beneroth>
other question.. so now I have my nice dump.. and I want to (loadDB) it... but the +Key of the main object already exists.. and I want this to be ok...
<beneroth>
it should generate a new record
<Regenaxer>
Then perhaps don't export the keys
<Regenaxer>
and build them on import?
<beneroth>
aye, right
<beneroth>
question is, if the dump tools have already a way for that?
<beneroth>
so I could overwrite (dumpKey>) to return the genKey code instead?
<Regenaxer>
I think not, as this is in the old environment
<beneroth>
ok
<beneroth>
np
<Regenaxer>
Better a post-process then
<Regenaxer>
I think there is nothing built-in
<beneroth>
I have to go, to pick up my gf
<beneroth>
yeah
<beneroth>
I can do it
<beneroth>
thanks!
<Regenaxer>
Have a nice evening!
<beneroth>
thank you, you all too :)
<Regenaxer>
:)
<beneroth>
bbl
<beneroth>
(away
* beneroth
so much lisp today.. even IRC should be ( instead of /
<Regenaxer>
hehe
<Regenaxer>
Perhaps you can trick dumpKey> into outputting a (genKey) expression?
<beneroth>
yeah that was my idea ^^
<Regenaxer>
Just experiment
<beneroth>
aye
<Regenaxer>
inspecting the resulting load file
coffeecup12345 has quit [Ping timeout: 258 seconds]
<cess11>
beneroth: For tasks that may choke with fd:s or somesuch GNU parallel could be a useful dump for those pipes. Haven't done much of the sort myself but tried it a little this summer, seemed quite nice for a basic loadbalancer.
<cess11>
beneroth: And I think it is the proper way to arrange the new +Key values in the new db, at least I've found this to be straightforward and fairly quick.
<beneroth>
thanks cess11
<viaken>
GNU parallel is fantastic.
karswell_ has quit [Read error: Connection reset by peer]
mtsd has quit [Remote host closed the connection]
zod_ has joined #picolisp
zod_ has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
zod_ has joined #picolisp
zod_ has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]