ChanServ changed the topic of #picolisp to: PicoLisp language | Channel Log: https://irclog.whitequark.org/picolisp/ | Check also http://www.picolisp.com for more information
xkapastel has quit [Quit: Connection closed for inactivity]
Phoenixwater[m] has left #picolisp [#picolisp]
<Regenaxer> I don't want to implement those huge unicode tables for 'uppc' and 'lowc' again for pil21
<Regenaxer> Now I'm very surprised to see that there seems no portable way (i.e. a C function)
<Regenaxer> At least it seems overkill
<Regenaxer> Sigh
<tankf33der> i think you should generate the tables once in separate file(s) and just use them
<Regenaxer> The problem is that I don't understand this case conversion
<tankf33der> but this is already done in src64, right?
<Regenaxer> for example, there is now support for uppercase ß (german s) in unicode
<Regenaxer> yes, but not correct now
<Regenaxer> needs to update for upper case ß
<Regenaxer> How to do it?
<Regenaxer> I don't want to maintain such stuff too
<tankf33der> i see
<Regenaxer> Really surprising why there is no standard support
<Regenaxer> C only has toupper/tolower for ascii or wide chars
<Regenaxer> not for utf8
<Regenaxer> I don't even find a clear description of the algorithm how to do *correct* case conversion in UTF-8
<Regenaxer> Unicode consortium
<Regenaxer> all very confusing
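The ß case mentioned above shows why a per-byte or strict one-to-one conversion cannot be fully correct. A minimal sketch in Python (not part of pil21; chosen here only because its built-in `str` methods apply the Unicode case mappings shipped with the interpreter):

```python
# Sketch: Unicode case conversion is not always one code point to one code point.
small_sharp_s = "\u00df"    # LATIN SMALL LETTER SHARP S
capital_sharp_s = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S

# Full uppercase mapping of the small letter expands to two letters.
upper_of_small = small_sharp_s.upper()
# Lowercasing the capital letter is a plain one-to-one mapping.
lower_of_capital = capital_sharp_s.lower()
# The two letters do not even have the same length in UTF-8.
utf8_lengths = (len(small_sharp_s.encode("utf-8")),
                len(capital_sharp_s.encode("utf-8")))

print(upper_of_small, lower_of_capital, utf8_lengths)
```

So an 'uppc' that works in place on a UTF-8 buffer has to cope with results that change both the number of characters and the number of bytes.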
<Regenaxer> What do you think about the above glib?
<Regenaxer> portable?
<Regenaxer> overkill?
<Regenaxer> It supports tons of functions
<Regenaxer> I have to link them all into pil just to get uppc and lowc
<tankf33der> i believe you should not link to glib
<tankf33der> or musl
<Regenaxer> What I really want is up-to-date tables plus a clear description how to handle them
<Regenaxer> yeah
<tankf33der> let me check myrlang implementation
<Regenaxer> myrlang?
<tankf33der> yea
<tankf33der> a language no one cares about, as usual
<Regenaxer> Myrddin?
<tankf33der> yea
<tankf33der> i saw tables somewhere and thought picolisp had the same
<Regenaxer> I took them from some free Java project
<Regenaxer> 25 years ago or so
<Regenaxer> "Kaffee" project
<Regenaxer> But I never understood those tables
<Regenaxer> GNU Kaffe Project
<Regenaxer> (see comment in src/sym.c)
<Regenaxer> I could easily convert them to pil21 syntax
<Regenaxer> no problem
<Regenaxer> But how to handle new things*
<Regenaxer> ?
<Regenaxer> I could even just copy/paste from pico/src/sym.c to pil21/src/lib.c
<Regenaxer> But I don't like this
<Regenaxer> Having to roll everything yourself for such a standard thing like utf8
<Regenaxer> stupid
<tankf33der> found
<Regenaxer> How do we know these tables and algos are correct, or better than src/sym.c?
<Regenaxer> "plan 9's runetype.c" is that even still supported?
<tankf33der> problem only in conv up-low ?
<tankf33der> because current utf8 is simple, tested by me
<tankf33der> because current utf8 is correct, tested by me
<Regenaxer> yes, only for uppc and lowc
<Regenaxer> All other utf8 is already in pil21
<tankf33der> solution: create a *full* test vector with python and test.
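A test vector of the kind suggested here could be produced roughly as follows. This is a sketch under assumptions: the function name `simple_case_vector` is made up for illustration, only one-to-one mappings are kept, and the result reflects whatever Unicode version the running Python interpreter ships, not necessarily the latest one.

```python
import sys

def simple_case_vector():
    """Yield (code point, upper, lower) for every code point whose
    upper- or lowercase form is a single, different code point."""
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:  # surrogates are not characters
            continue
        ch = chr(cp)
        up, lo = ch.upper(), ch.lower()
        # Keep only one-to-one mappings; expansions such as
        # "ß".upper() == "SS" are treated as "no change" here.
        if len(up) != 1:
            up = ch
        if len(lo) != 1:
            lo = ch
        if up != ch or lo != ch:
            yield cp, ord(up), ord(lo)

# Dictionary form: code point -> (uppercase code point, lowercase code point)
vector = {cp: (up, lo) for cp, up, lo in simple_case_vector()}
```

Written out one entry per line, such a vector could be compared against the output of 'uppc' and 'lowc' for the same code points.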
<Regenaxer> General testing is perhaps not needed
<Regenaxer> only *new* characters in unicode
<Regenaxer> like upper-case ß
<tankf33der> eh
<Regenaxer> Is in unicode recently
<Regenaxer> and perhaps other characters
<Regenaxer> Unicode is changing all the time
<Regenaxer> Ideal would be some library published by the unicode consortium
<Regenaxer> some *official* code
<Regenaxer> Not everybody rolling his own
<tankf33der> not portable, even libffi may be a problem
<Regenaxer> libffi too?
<Regenaxer> I thought it looks very portable
<tankf33der> maybe.
<tankf33der> so you already have a dependency
<tankf33der> i dont trust glib, who will port glib to riscv? :)
<tankf33der> linux distro maintainers?
<Regenaxer> clang maintainers
<Regenaxer> What we really need is support in clang
<Regenaxer> pil21 should use only clang for system calls
<tankf33der> wow, some utf8 sequences may be invalid
<Regenaxer> where?
<Regenaxer> Ah, yes, of course
<Regenaxer> utf8 has a special byte format
<Regenaxer> So almost any random byte sequence is illegal
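The point about the byte format can be checked with a strict decoder. A minimal sketch, assuming Python's built-in UTF-8 codec as the reference; the helper name `is_valid_utf8` is made up for illustration:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 according to Python's strict decoder."""
    try:
        data.decode("utf-8")  # error handling is "strict" by default
        return True
    except UnicodeDecodeError:
        return False

samples = {
    "sharp s": "\u00df".encode("utf-8"),  # C3 9F, well-formed
    "lone 0xFF": b"\xff",                 # 0xFF never occurs in UTF-8
    "bad continuation": b"\xc3\x28",      # lead byte followed by ASCII
    "overlong NUL": b"\xc0\x80",          # overlong encodings are rejected
}
results = {name: is_valid_utf8(raw) for name, raw in samples.items()}
```

Only the first sample is accepted; the others fail because a lead byte must be followed by the right number of continuation bytes in the range 0x80 to 0xBF, and some byte values are never allowed at all.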
<tankf33der> damn, unicode 13 is coming.
<Regenaxer> o
<Regenaxer> Perhaps ask in some clang forum?
<tankf33der> eh
<Regenaxer> I think it should be the duty of clang to maintain such stuff
<tankf33der> and python and ruby and dlang and so on
<tankf33der> also we have this one
<Regenaxer> We need it across Linux, BSD, Mac, Android and iOS
<Regenaxer> yeah, wide.l
<Regenaxer> forgot that one
<tankf33der> also checking all links from this:
<Regenaxer> T
<Regenaxer> Tons of docs, yes, but which is the "right" one? ;)
<tankf33der> no one knows until you start doing something
<tankf33der> hunting for simple pages like this:
<Regenaxer> yeah
<Regenaxer> very good explanations
<Regenaxer> the second link has "One-to-many: (ß → SS )"
<Regenaxer> So this is the case where we have a new (single) char now
<Regenaxer> And that link also shows how complicated it all is. So *not* everybody should have to roll his own
<Regenaxer> There must be some reference implementation somewhere ...
_whitelogger has joined #picolisp
<tankf33der> afk.
<Regenaxer> great
<tankf33der> found how musl does case things
<Regenaxer> Looks quite short
<Regenaxer> What does musl do with "ß"?
<tankf33der> unknown yet.
<Regenaxer> The official table should be http://www.unicode.org/Public/UCD/latest/ucd/CaseFolding.txt
<Regenaxer> (same place as EastAsianWidth.txt)
<Regenaxer> So I will study this. Perhaps we'll do it similar to the wide char stuff
<tankf33der> sounds good
<Regenaxer> yeah, at least easy
<Regenaxer> just lookup
<tankf33der> analysis about my dlang bigint multiplication
<Regenaxer> ah, yeah
<Regenaxer> buffer size bug
<Regenaxer> hehe "got undetected for so long"!
<Regenaxer> The CaseFolding table has it: 1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
<Regenaxer> But the other direction maps to "SS": 00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
<Regenaxer> Problem is that I don't know from the table which one *is* already upper or lower
<Regenaxer> It just maps to the other case
<Regenaxer> I need to split that into two tables it seems
<tankf33der> like musl, right? this function also ignores a lot of ranges
<Regenaxer> Not sure
<tankf33der> static wchar_t __towcase(wchar_t wc, int lower)
<Regenaxer> wchar_t is not helpful as far as I understand
<Regenaxer> And I don't understand the CaseFolding table
<Regenaxer> The left column contains lowercase and some uppercase
<Regenaxer> How to use it?
<Regenaxer> no, opposite: The left column contains uppercase but also *some* lowercase
<Regenaxer> ok, so I can use the text on the right side to filter! :)
<Regenaxer> "# LATIN SMALL LETTER"
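The filtering idea (use the character name in the comment to decide which side is upper and which is lower) could look roughly like this. It is only a sketch: the sample holds just the two lines quoted above plus the entry for "A", the function names are made up, and matching on "CAPITAL" and "SMALL" in the name is the heuristic from the discussion, not a rule taken from the Unicode documentation.

```python
# Three lines in the CaseFolding.txt format: code; status; mapping; # name
SAMPLE = """\
0041; C; 0061; # LATIN CAPITAL LETTER A
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
"""

def parse_case_folding(text):
    """Return a list of (code, status, mapping, name) tuples."""
    entries = []
    for line in text.splitlines():
        data, _, comment = line.partition("#")
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 3 or not fields[0]:
            continue  # blank line or pure comment line
        code = int(fields[0], 16)
        mapping = [int(x, 16) for x in fields[2].split()]
        entries.append((code, fields[1], mapping, comment.strip()))
    return entries

def split_by_name(entries):
    """Split entries into two tables, keyed on the letter case in the name."""
    from_capital, from_small = {}, {}
    for code, _status, mapping, name in entries:
        if "CAPITAL" in name:
            from_capital[code] = mapping
        elif "SMALL" in name:
            from_small[code] = mapping
    return from_capital, from_small

from_capital, from_small = split_by_name(parse_case_folding(SAMPLE))
```

With the sample above, the entries for A and capital sharp s land in `from_capital`, and the one-to-many entry for small sharp s lands in `from_small`.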
<Regenaxer> If the tables are too big, I put it all into a shared library, loaded only when really needed
<Regenaxer> i.e. when 'lowc' or 'uppc' is called
<Regenaxer> I don't remember how I generated @lib/wide.l from http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
<Regenaxer> Must have a script somewhere, but I don't find it
<tankf33der> generator.l
<tankf33der> :)
<Regenaxer> indeed! You are great!!
<tankf33der> we did it together to update to latest version :)
<Regenaxer> So I found it here too, in opt/genWide.l
<Regenaxer> did not know what to search for
_whitelogger has joined #picolisp
DerGuteMoritz has quit [Ping timeout: 268 seconds]
DerGuteMoritz has joined #picolisp