ChanServ changed the topic of #picolisp to: PicoLisp language | Channel Log: https://irclog.whitequark.org/picolisp/ | Picolisp latest found at http://www.software-lab.de/down.html | check also http://www.picolisp.com for more information
freemint has quit [Ping timeout: 265 seconds]
aw- has joined #picolisp
orivej has quit [Ping timeout: 256 seconds]
orivej has joined #picolisp
freemint has joined #picolisp
mtsd has joined #picolisp
orivej has quit [Ping timeout: 256 seconds]
<freemint> Hello
<cess11_> Good morning.
<mtsd> Good morning
<beneroth> Good morning picolispers :)
<Regenaxer> Good morning freemint, cess11_, mtsd, beneroth :)
<yumaikas> and one rogue lurking Erlanger
<freemint> Good morning *
<yumaikas> morning
<Regenaxer> Hi yumaikas
<yunfan> afternoon :D
<freemint> Regenaxer, i decided i am building a mail archive
<cess11_> freemint: What kind of audience is it you will have at your presentation?
<Regenaxer> freemint, cool
<freemint> cess11_, fellow hackers who do not know about Picolisp
<Regenaxer> I used hypermail for that recently
<freemint> Regenaxer, that means i will ask you a lot about how to lay out the database
<Regenaxer> ok, good
<freemint> I will first do an er.l write-up, and you might critique it. Actually i have no idea how big our mail archives are but i would guess less than 100 Gigs
<freemint> I will import the .mbox format
<Regenaxer> Yes, I also always start with er.l
<yumaikas> Regenaxer: er.l?
<freemint> entity-relationship
<Regenaxer> yumaikas, it is a filename aka standard for the DB model
<yumaikas> Ah
<freemint> Regenaxer provided a model application where he named the file containing the classes of objects in the database er.l
<Regenaxer> in fact I use the name "er.l" in all projects
<Regenaxer> (unless the class definitions are just inline in a single file)
<freemint> is there a german word for er?
<Regenaxer> good question
<Regenaxer> Just "Datenmodell" (data model)?
<freemint> "Ding-Beziehungsmodell" (thing-relationship model)?
<Regenaxer> hehe
<Regenaxer> "Entitäten" (entities)
<Regenaxer> Alle meine "Entitäten" schwimmen ... (All my "entities" are swimming ...)
<freemint> haha
alexshendi_ has quit [Ping timeout: 256 seconds]
<freemint> (to translate for all: a pun on the German word for duck, "Ente", which sounds similar to "Enti"täten: all my entities are swimming)
<mtsd> :)
<freemint> Regenaxer, What do i need to know about access to bags ...
<Regenaxer> mom, on tel
<freemint> *Swaps
<freemint> fine
orivej has joined #picolisp
<freemint> afk
freemint has quit [Ping timeout: 252 seconds]
alexshendi_ has joined #picolisp
freemint has joined #picolisp
<freemint> back
<cess11_> mbox is horrible but you'll be able to index senders and some contents fairly easy at least.
<freemint> cess11_, I see that
<freemint> (push '*questionqueue "Regenaxer , is there a way to get information how far you are into a file, to document problems when parsing?")
<cess11_> I think pil is at its most impressive when handled interactively. You could show a session where you build a DB from a fresh pil + and the mbox.
alexshendi_ has quit [Read error: Connection reset by peer]
<cess11_> There are several ways. I've on occasion used globals to keep track of 'line number' as in position in a '(+List +String), one can also use 'match, 'from or 'member to reach back into data from an offending part, and there's a kind of book keeping in the POSIX file handling that allows 'line to sort of keep track of where it is.
<freemint> (in "mbox" (while (there_is_text)(read-mail))) and when read-mail encounters problems it gives me an eval where i can call functions like add-rule to modify the parser?
<freemint> cess11_, the file might be bigger than RAM
<cess11_> More reason to import all of it into the DB to begin with.
<freemint> i could subdivide it of course ... to insert it
<freemint> T
<cess11_> I would probably do '(make (in Mbox (until (eof)(link (line T] and put that into an object, then examine parts of that data to see ways to exfiltrate the good parts that should be copied into new objects.
<cess11_> Perhaps do it in chunks if resources are really tight, can easily be done with 'do or 'co.
<cess11_> I would not add any indexing on that field though, it could cause a DoS due to lack of resources if the import is big. Just pulling in a list of strings is easier on the machine and then you write your own little parsers and pattern matching routines that are simple and efficient.
<cess11_> Sometimes it is faster to 'chop large data sets in bigger chunks than a line at a time. You will have to fiddle around a bit to see what works best in your environment.
<freemint> keeping a line list is a good idea
<freemint> all-in-one-go parsing is too ambitious i think
<Regenaxer> Sorry, was on phone until now
<Regenaxer> BTW, mailbox parsing is done in the mailing list handler
<Regenaxer> misc/mailing
<Regenaxer> Is in the pil distro iirc
<beneroth> Regenaxer, about binary io: (pr) and (rd) use PLIO, (wr) is raw bytes, right?
<beneroth> ah I see now
<beneroth> (rd) with argument reads raw bytes
<beneroth> nevermind
<Regenaxer> yess :)
* beneroth is implementing DIME, an outdated deprecated binary format comparable to MIME...
<Regenaxer> Where is that needed?
<beneroth> SOAP with attachments, when the server is horribly legacy stuff
<Regenaxer> oh
<freemint> Regenaxer, I think your code is not exactly what i am looking for but i will take it as a reference
<beneroth> Input: picolisp number. Output: raw 16 bit integer. how: (let Num 257 (out "test" (wr (/ Num 256) (% Num 256))) (in "test" (rd 2))) - correct? better way?
<Regenaxer> Perhaps with >> and &
<Regenaxer> slightly faster than mul/div
<beneroth> aye, I see
<Regenaxer> If the number is not more than 16 bits, & can be omitted
<Regenaxer> all positive numbers?
<beneroth> yeah all unsigned I think
<beneroth> positive only
<freemint> like that (wr (>> 8 (& (* 255 256) NUM)))(wr (& 255 NUM)) ?
<freemint> with `(* 255 256) expanded
<Regenaxer> yes, but the & is not needed then
<beneroth> I've got now (wr (>> 8 Num) (& Num 255))
<Regenaxer> yes
<freemint> oh
<beneroth> thanks!
<Regenaxer> if the input is less than 65536
<beneroth> naturally
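The exchange above comes down to writing the high byte first and the low byte second. A minimal sketch, assuming a non-negative value and reusing the file name "test" from the discussion; the mask on the high byte is only an optional guard for values of 65536 or more, since 'wr' stores a single byte per argument anyway:

```picolisp
# Write Num as a big-endian 16-bit value, then read it back.
(let Num 257
   (out "test"
      (wr
         (& 255 (>> 8 Num))   # high byte, masked as a guard
         (& 255 Num) ) )      # low byte
   (in "test" (rd 2)) )       # read two raw bytes as one big-endian number
# -> 257
```

With a positive count, 'rd' reads the bytes in big-endian order, which is why the high byte has to be written first.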
<beneroth> it's actually a kind of content length, an identifier string
<Regenaxer> I see
<freemint> If it is content length i would go for & honestly
<Regenaxer> Cause it may be bigger?
<freemint> Cause you parse untrusted content
<beneroth> no I write it
<beneroth> :)
<beneroth> time for lunch. away for a while - thank you guys :)
<Regenaxer> :)
<freemint> Regenaxer 'wr has a built in %
<freemint> but it is 255
<Regenaxer> Ah, right!
<freemint> (wr 33333333)
<freemint> U-> 33333333
<freemint> ! (% 33333333 255)
<freemint> ! (char "U")
<freemint> -> 85
<freemint> -> 243
<freemint> ! (% 33333333 256)
<freemint> -> 85
<freemint> *256
<freemint> Why is it %256?
<Regenaxer> yes, 'wr' uses ld (X) B # Store byte
<Regenaxer> so upper bits are ignored anyway
<Regenaxer> ie (& 255)
<freemint> it is not
<freemint> look at my test case
<freemint> it is %256
<freemint> ahh
<freemint> forget that
<freemint> %256 = %255
<freemint> *&255
<Regenaxer> : (out "a" (wr 33333333))
<Regenaxer> -> 33333333
<Regenaxer> : (hd "a")
<Regenaxer> 00000000 55
<Regenaxer> : (hex (char "U"))
<Regenaxer> -> "55"
<freemint> you are right, i confused mod and 'and'
<freemint> %256 is the same as &255
<Regenaxer> yes
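The point settled above can be checked at the REPL. A small sketch using the same number as in the test case; it only claims the equivalence for non-negative values:

```picolisp
# For non-negative numbers, modulo 256 and masking with 255 agree.
(% 33333333 256)   # -> 85
(& 33333333 255)   # -> 85
(char 85)          # -> "U", the byte that 'wr' actually wrote
(% 33333333 255)   # -> 243, a different operation altogether
```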
<freemint> Regenaxer, Can you recommend the use of 'match on char lists of lines?
<Regenaxer> yes, fine
<freemint> Would you write a parser in mostly match?
<freemint> Is there a command to get all set patterns
<Regenaxer> Most frequently I use from/till
<Regenaxer> is also the fastest
<freemint> I want to build it really robust ... i can imagine that it is the fastest.
<freemint> my reasoning is simply that it would make for a cool demonstration of picolisp: when it fails to parse an email which has some weird Date, i get thrown into a shell, and i (add-pattern ...) (test-pattern) (commit-pattern)
<cess11_> The 'match patterns are lists you design and keep track of.
<freemint> cess11_, i know that they are often used that way ... but i can imagine it making my code easier if there were a way to get all matched patterns
<cess11_> 'mapcar and a 'match lambda. Result is a list of matches.
<freemint> : (setq @MAIL "jobsch")
<freemint> : (fill '(@MAIL .))
<freemint> -> ("jobsch")
<freemint> : (fill '(@MAIL))
<freemint> -> "jobsch"
<freemint> Segmentation fault
<cess11_> Yeah, that's a memory leak.
<freemint> 'match yields true
<cess11_> Right, forgot. 'make is better.
<cess11_> Rather mixed up use cases.
<Regenaxer> 'fill' does not keep track of circular lists
<freemint> I've noticed
<freemint> Should that be documented?
<freemint> how do you use 'make to do what match does cess11_ ?
<Regenaxer> Almost *all* functions don't handle them, as the check is expensive
<Regenaxer> only a handful do
<Regenaxer> print, length, size iirc
<cess11_> '(make (mapcar '((X)(and (match '(@A " " @B) (chop X))(link (pack @A " " @B))) L]
<freemint> your decision if it ends up in the docs
<Regenaxer> cess11_: Why 'make' with 'mapcar'? Build two lists?
<Regenaxer> freemint, document in *every* function?
aw- has quit [Quit: Leaving.]
<Regenaxer> A circular list is simply a huge (infinite) list :)
<cess11_> Ah, the lambda could return from 'pack instead of 'match or somesuch.
<Regenaxer> or use mapc
<Regenaxer> if the value is not needed
<Regenaxer> to avoid building a garbage list
<cess11_> T
<freemint> i can not get your code working
<freemint> cess11_,
<freemint> The difference between 'do and 'mapc is that it iterates over multiple lists?
<cess11_> It works as is. You need a list 'L, that's it, though pretty ugly.
<freemint> What should L look like
<freemint> List string, list of chars?
<freemint> (list of lists of?
<cess11_> Doesn't matter. Char list isn't affected by 'chop. You'll need a better pattern for 'match though, that one assumes it has several chars and a '" ".
<freemint> ok could you give me an example?
<cess11_> '(setq L '("abc" "def" "ghi"))
<freemint> yields nil
<cess11_> '(mapcar '((X)(and (match '(@A "e" @B) (chop X))(let R (pack @B) R))) L]
<cess11_> Yes, it gets no hits.
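A variant of the idea above that avoids building a throwaway list: 'make' collects results while 'for' walks the input, and 'link' is only called on a hit. This is a sketch, not the code from the discussion; the helper name afterSpace and the sample strings are made up for illustration.

```picolisp
# Collect the part after the first space from every string that has one.
(de afterSpace (Lst)   # hypothetical helper
   (make
      (for X Lst
         (use (@A @B)   # keep the pattern variables local
            (and
               (match '(@A " " @B) (chop X))
               (link (pack @B)) ) ) ) ) )

(afterSpace '("abc def" "ghi" "jkl mno"))
# -> ("def" "mno")
```

Strings without a space simply produce no entry, so the result holds only the matches rather than a list padded with NIL.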
<freemint> afl (away from lenovo)
jibanes has quit [Ping timeout: 245 seconds]
jibanes has joined #picolisp
<Regenaxer> hehe
mtsd has quit [Ping timeout: 240 seconds]
<beneroth> back
<beneroth> freemint, I just looked at implementation of (wr) in @src64/io.l: shr A 4 # Normalize - I guess shr is right shift
<freemint> It drops the flag of the cell
<Regenaxer> The point is that in the end register B is stored
<Regenaxer> an implicit & FF
<beneroth> ah
<freemint> (including the sign)
orivej has quit [Ping timeout: 240 seconds]
<Regenaxer> putStdoutB -> ld (X) B # Store byte
orivej has joined #picolisp
<freemint> Regenaxer, have you ever considered pre-rendering (in to a file) certain web pages for speed?
<freemint> Regenaxer, You own the PicoLisp Youtube channel, don't you?
<Regenaxer> You mean static pages?
<Regenaxer> No, I have no youtube channel
<beneroth> freemint, I plan to do that (pre-rendering - in the past such a thing was just called 'caching)
<beneroth> the problem is to accurately track all changes to know when to invalidate the cache :)
<beneroth> only worth to optimize if you really need it, arguably
<Regenaxer> T
<freemint> T i just wanted to know whether there is a solution since i would be interested how the cache invalidation is handled
<beneroth> for web apps with up to a few 100 (< ca. 500) concurrent users, Regenaxer's architecture (with form.l) is optimal I believe
<freemint> There is a PicoLisp youtube channel showing off Penti ... i think it is even linked in the wiki...
<beneroth> freemint, no off-the-shelf solution. also because it is highly app-specific. arguably the form.l architecture by Regenaxer does kinda do caching in RAM :)
<Regenaxer> Ah, a very short video showing Penti "Hello world"
orivej has quit [Ping timeout: 245 seconds]
<freemint> Yes
<freemint> Does somebody know whose youtube channel it is?
orivej has joined #picolisp
<beneroth> afaik picolisp.com, software-lab.de, the mailing list, and his personal twitter and google play accounts are managed by Regenaxer. everything else (including this IRC channel) is managed by others.
<Regenaxer> I made it very early with the first Penti version on Android
<Regenaxer> beneroth, right
<freemint> I see. Would you be interested in my presentation being recorded to be uploaded there?
* beneroth would watch it
<Regenaxer> Yes, sure!
<Regenaxer> Though it is not my channel
<beneroth> maybe a picolisp playlist would be the right thing, to also include the pilOS (pisces) vid and the froscon talk by Regenaxer, etc
<Regenaxer> I don't remember, I think my daughter uploaded the vid
<Regenaxer> The froscon talk is a bit useless, as there is nothing to see
<freemint> Do you still have the slides?
<beneroth> picolisp resources on the internet are set up a bit chaotically, though optimized for minimal bureaucracy xD
<Regenaxer> T
<beneroth> yeah it would be more useful with the slides as pdf or such...
<Regenaxer> freemint, I think there were no slides, I just presented a session on my netbook
<beneroth> ah
<freemint> then reconstructing roughly what you did would be possible ...
<freemint> the FeM (my local hacker space) does/did video for many chaos communication congresses
<Regenaxer> Let me check
<freemint> so i would know what people to ask to do the editing.
<Regenaxer> Found something
<Regenaxer> I made it with mgp
<Regenaxer> -> pdf
<Regenaxer> moment
<Regenaxer> software-lab.de/quasiconf.pdf
<Regenaxer> software-lab.de/quasiconfAug12.mgp is the source
<Regenaxer> So it is just an outline, a few notes
<freemint> mhh if i had all the content i might be able to get that into a modified CCC template or something
<Regenaxer> all content?
<freemint> all the content that should appear on the screen
<freemint> (like what you show in the shell ....
<Regenaxer> Yes, that's lost
<Regenaxer> was on the fly
<freemint> (what a browser would see ...
<freemint> but i guess it can be reconstructed well enough
<freemint> if i had that i could rerecord the screen and have some one put the video together
<freemint> just an offer
<freemint> is picolisp a trademarked name?
<Regenaxer> I don't think so
<Regenaxer> I hope it is not ;)
<Regenaxer> There is no "trade" involved I would say
<freemint> I was more asking if you did trademark it once
<Regenaxer> nada
<freemint> ok back to reading stuff and thinking about match, so i can think about what data i can gain and what er.l works best for that
<Regenaxer> good
<freemint> is there a better way to handle patterns and conditions than to check the patterns after match for a condition (being a month for example)?
<Regenaxer> hmm, depends on the data
<Regenaxer> I always start with direct parsing with 'from', 'till', 'peek', 'char', 'skip' etc.
<Regenaxer> 'head', 'match' etc. are useful only for line-structured data
<Regenaxer> eg HTTP headers
<Regenaxer> the body has no lines, so a stream parser with 'from' et al. is better
<freemint> (or mail headers
<Regenaxer> right
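For the stream style described above, a minimal sketch of pulling one header value with 'from' and 'till'. The sample message and the address are made up, and 'pipe' is used here only to supply some input without needing a file:

```picolisp
# Skip ahead to a header name, then read up to the end of that line.
(pipe (prinl "From: alice@example.org^JSubject: Hello^J^JBody text")
   (when (from "Subject: ")
      (till "^J" T) ) )
# -> "Hello"
```

The same two calls work inside (in "file" ...) on a real mailbox, and nothing needs to be held in memory beyond the piece being read.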
<Regenaxer> For some purposes 'echo' is optimal
<Regenaxer> when you want to replace patterns in a stream
<Regenaxer> eg, consider
<Regenaxer> (in "@lib/socialshareprivacy/jquery.socialshareprivacy"
<Regenaxer> (while (echo "<BASE>" "<FBTXT>" "<TWTXT>" "<G+TXT>" "<HELP>" "<PERMA>")
<Regenaxer> (casq @
<Regenaxer> ("<BASE>" (prin (baseHRef) "@lib"))
<Regenaxer> ("<TWTXT>" (prin ,"2 clicks for more data privacy: only after you click
<Regenaxer> ("<FBTXT>" (prin ,"2 clicks for more data privacy: only after you click
<Regenaxer>
<Regenaxer> ...
<Regenaxer> or "Rosetta Code/Fix code tags"
<Regenaxer> (let Lang '("ada" "awk" "c" "forth" "prolog" "python" "z80")
<Regenaxer> (while (echo "<")
<Regenaxer> (in NIL
<Regenaxer> (let S (till ">" T)
<Regenaxer> (cond
<freemint> cool, an example of that should end up in doc 'echo
<Regenaxer> ((pre? "code " S) (prin "<lang" (cddddr (chop S))))
<Regenaxer> ((member S Lang) (prin "<lang " S))
<Regenaxer> ((= S "/code") (prin "</lang"))
<Regenaxer> ((and (pre? "/" S) (member (pack (cdr (chop S))) Lang))
<Regenaxer> (prin "</lang") )
<Regenaxer> (T (prin "<" S)) ) ) ) ) )
<Regenaxer> Rosetta has many such examples
<Regenaxer> Nice: Strip block comments
<Regenaxer> (in "sample.txt"
<Regenaxer> (while (echo "/*")
<Regenaxer> (out "/dev/null" (echo "*/")) ) )
<Regenaxer> I think for the ref the above examples are a bit too long
<freemint> yeah
<freemint> fun fact: there are 388 different header attributes which might end up in an email
<freemint> If i remove all http related it is still 144
<freemint> *only keep strictly email related
<beneroth> :)
<cess11_> Sender and some contents will be easy, full implementation of headers and whatnot plus quirks of mbox would be nightmarish.
<cess11_> For some reason I still haven't learned that EOF Overrun might as well be lacking " as ( or ).
<beneroth> no worries, it will nag you until you do
<cess11_> Let's hope so.
<Regenaxer> cess11_, true, but the error message is too much at the lowest level to know the context ;)
<freemint> Regenaxer, The error messages could be much friendlier ...
<cess11_> Yar, it would rather be something for linting or syntax colouring to catch, but then I'd have to use those and, well...
<freemint> at a cost
<cess11_> Nah, then one might be surprised when the segfault comes.
<Regenaxer> vip helps here
<cess11_> Sure, I was lazy and on a hobby project.
<Regenaxer> It underlines strings
<cess11_> T, should use it more instead of vanilla vim.
<cess11_> But habits.
<Regenaxer> yeah, especially as some minor details are different
<Regenaxer> I got used to it, use it 100%
<Regenaxer> only for C/JS sources, or binary stuff, I use vim
<Regenaxer> 'vi' is a link to 'vip' here
<freemint> Regenaxer, what kind of index would you recommend on the email body?
<Regenaxer> You need a special one probably, the standard ones are not for long text
<Regenaxer> Something like in the Wiki
<Regenaxer> (class +MupIdx +index)
<Regenaxer> wiki/er.l
<freemint> what attributes does that one have?
<Regenaxer> it splits and folds the words in the text body
<Regenaxer> You have the wiki sources?
<Regenaxer> in wiki/lib.l is
<Regenaxer> (de splitWords (Lst)
<Regenaxer> (mapcar pack
<Regenaxer> (extract fold
<Regenaxer> (split Lst ~(chop "^J !,-.:;?{}")) ) ) )
<Regenaxer> then it extracts the words to index:
<Regenaxer> (de foldedWords (Mup)
<Regenaxer> (uniq
<Regenaxer> (when Mup
<Regenaxer> (filter '((W) (>= (length W) 3))
<freemint> i just downloaded it again
<Regenaxer> (splitWords (in (blob Mup 'txt) (till))) ) ) ) )
<Regenaxer> ok
<Regenaxer> Did not change in these parts I think
<Regenaxer> only last April:
<Regenaxer> (filter '((W) (>= (length W) 3))
<Regenaxer> it was 4 before
<Regenaxer> so now it indexes shorter words
<freemint> Is the wiki text and the index stored in Mup
<freemint> (many picolisp commands are around 3 letters long)
<Regenaxer> yes, each modification is a +Mup
<freemint> what does +Mup stand for
<Regenaxer> Markup
<freemint> Can i mostly recycle this?
<Regenaxer> sure
<freemint> could i use these folded indexes to compute similarity between documents (aka detect quotes)
<Regenaxer> Good idea
rick42_ has joined #picolisp
<Regenaxer> Perhaps by counting how many words are the same
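The counting idea above could be tried out along these lines. This is only a sketch: splitWords is the function quoted from wiki/lib.l earlier in this discussion, while sharedWords is a hypothetical helper that works on plain strings rather than on +Mup blobs.

```picolisp
# splitWords as quoted from wiki/lib.l in this discussion
(de splitWords (Lst)
   (mapcar pack
      (extract fold
         (split Lst ~(chop "^J !,-.:;?{}")) ) ) )

# Hypothetical helper: how many distinct folded words two texts share
(de sharedWords (A B)
   (length
      (sect
         (uniq (splitWords (chop A)))
         (uniq (splitWords (chop B))) ) ) )

(sharedWords "abb bbd dee jje" "abb bbd dde jjj")
```

A raw count favours long texts, so dividing by the size of the smaller word set may be a fairer measure when looking for quoted passages.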
<freemint> In the end each branch of the folded index ends in a list of +Mup having the same content is that right?
<Regenaxer> no, each branch points to a separate mup
<Regenaxer> There may be several mups with the same text
<freemint> if i have "abb bbd dee jje" and "abb bbd dde jjj" do they share parts of the search tree when they are folded?
<freemint> Regenaxer, ouch
<freemint> i do not like that
<Regenaxer> no sharing
<freemint> why not?
<Regenaxer> It is how the indexes work
rick42 has quit [*.net *.split]
<Regenaxer> You would need a lot of searching during tree operations otherwise
rick42_ is now known as rick42
<freemint> ah
<freemint> how does that approach affect space?
<Regenaxer> And you would not save sooo much
<Regenaxer> a list also takes space
<Regenaxer> now there is key/value for each entry
<Regenaxer> It is probably not a super-high-performance fulltext index
<freemint> is there a structure at the end of the tree which can be distinguished from intermediate levels?
<Regenaxer> no, each node points to an object
<Regenaxer> all +index subclasses
<freemint> How would a new class like +leave affect performance, when they maintain a hash of the path to them and are additionally indexed by this hash?
<Regenaxer> no idea
<beneroth> freemint, pilDB index trees are BTrees
<cess11_> If you're low on hardware resources you probably want to profile the actual data.
<Regenaxer> Perhaps a research project for you?
<freemint> cess11_, I am not low on hardware
<beneroth> then don't try to do premature optimization :P
<cess11_> Well then, just rock a +Sn on a partial chunked set and see how it goes.
<beneroth> T
<freemint> you are confusing premature optimization mind games and premature optimization
<beneroth> both cost time and block you from getting to results :P
<Regenaxer> The pil db index classes were designed for relatively short values
<freemint> but generate fun
<freemint> ;P
<Regenaxer> You could use some external indexing tool
<beneroth> point
<Regenaxer> But in the Wiki it performs well :)
<freemint> That is another question i have: are +Sn more compact than character folded indexes
<Regenaxer> +Sn is only for personal (human) names
<Regenaxer> European that is
<Regenaxer> you mean +Idx
<freemint> i hear your warning but still
<Regenaxer> folding gives compacter stuff
<cess11_> Works also for other kinds of text, but not as nicely as with names.
<Regenaxer> yes
<freemint> is +Sn (through the lossy compression) more compact than text
<Regenaxer> that's true
<freemint> index
<freemint> what's true?
<Regenaxer> but the typical (+Sn +Idx) gets long
<Regenaxer> the +Idx
<Regenaxer> True is: +Sn (through the lossy compression) is more compact than text
<Regenaxer> off for a while
<Regenaxer> :)
<freemint> the +Idx gets as long as the longest text, or as long as the longest shared tree between two texts
<freemint> bye
<cess11_> I'm not sure but +Idx and '(+Sn +Idx) on +String takes a fair bit of space, you'll see as soon as you do an import and have top/htop/&c. visible.
<Regenaxer> ret
<Regenaxer> +Sn produces only a single short entry. It is +Idx with all its substrings which blows it up
<freemint> so in (+Sn +Idx), +Idx works on normal strings
<Regenaxer> For non-human names I use almost always +IdxFold
<Regenaxer> yes
<freemint> i would not have assumed that
<Regenaxer> There was a document, I think we discussed it here a while ago
<Regenaxer> moment
<Regenaxer> yes: software-lab.de/doc/search
<Regenaxer> This is where I always look up if I'm not sure
<Regenaxer> The indexes produced by a value "Regen Axer"
<Regenaxer> For longer strings the size differences get more dramatic
<Regenaxer> +Key is the shortest :)
orivej has quit [Ping timeout: 240 seconds]
freemint has quit [Ping timeout: 252 seconds]
orivej has joined #picolisp
orivej has quit [Ping timeout: 256 seconds]
orivej has joined #picolisp
orivej has quit [Ping timeout: 240 seconds]
tankf33der has joined #picolisp
karswell has joined #picolisp
orivej has joined #picolisp
freemint has joined #picolisp
<freemint> "discussed"
orivej has quit [Ping timeout: 245 seconds]
orivej has joined #picolisp
orivej has quit [Ping timeout: 240 seconds]
orivej has joined #picolisp
beneroth is now known as bene|off