<Regenaxer>
Good morning freemint, cess11_, mtsd, beneroth :)
<yumaikas>
and one rogue lurking Erlanger
<freemint>
Good morning *
<yumaikas>
morning
<Regenaxer>
Hi yumaikas
<yunfan>
afternoon :D
<freemint>
Regenaxer, i decided i am building a mail archive
<cess11_>
freemint: What kind of audience is it you will have at your presentation?
<Regenaxer>
freemint, cool
<freemint>
cess11_, fellow hackers who do not know about PicoLisp
<Regenaxer>
I used hypermail for that recently
<freemint>
Regenaxer, that means i will ask you a lot about how to lay out the database
<Regenaxer>
ok, good
<freemint>
I will first do an er.l write-up, and you might critique it. Actually i have no idea how big our mail archives are, but i would guess less than 100 GB
<freemint>
I will import the .mbox format
<Regenaxer>
Yes, I also always start with er.l
<yumaikas>
Regenaxer: er.l?
<freemint>
entity-relationship
<Regenaxer>
yumaikas, it is a filename aka standard for the DB model
<yumaikas>
Ah
<freemint>
Regenaxer provided a model application in which he named the file containing the classes of objects in the database er.l
<Regenaxer>
in fact I use the name "er.l" in all projects
<Regenaxer>
(unless the class definitions are just inline in a single file)
<freemint>
is there a German word for ER?
<Regenaxer>
good question
<Regenaxer>
Just "Datenmodell" (data model)?
<freemint>
"Ding-Beziehungsmodell" (thing-relationship model)?
<Regenaxer>
hehe
<Regenaxer>
"Entitäten" (entities)
<Regenaxer>
Alle meine "Entitäten" schwimmen ... (All my "entities" are swimming ...)
<freemint>
haha
alexshendi_ has quit [Ping timeout: 256 seconds]
<freemint>
(to translate for all: a pun on the German word for duck, "Ente", which sounds similar to the start of "Entitäten": all my entities are swimming)
<mtsd>
:)
<freemint>
Regenaxer, What do i need to know about access to bags ...
<Regenaxer>
one moment, on the phone
<freemint>
*Swaps
<freemint>
fine
orivej has joined #picolisp
<freemint>
afk
freemint has quit [Ping timeout: 252 seconds]
alexshendi_ has joined #picolisp
freemint has joined #picolisp
<freemint>
back
<cess11_>
mbox is horrible but you'll be able to index senders and some contents fairly easily at least.
<freemint>
cess11_, I see that
<freemint>
(push '*questionqueue "Regenaxer, is there a way to get information about how far you are into a file, to document problems when parsing?")
<cess11_>
I think pil is at its most impressive when handled interactively. You could show a session where you build a DB from a fresh pil + and the mbox.
alexshendi_ has quit [Read error: Connection reset by peer]
<cess11_>
There are several ways. I've on occasion used globals to keep track of 'line number' as in position in a '(+List +String), one can also use 'match, 'from or 'member to reach back into data from an offending part, and there's a kind of book keeping in the POSIX file handling that allows 'line to sort of keep track of where it is.
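The bookkeeping described here, knowing the line and position at which a parse failed, can be sketched in Python. This is only an illustration of the idea, not PicoLisp code: `ParseError` and `parse_headers` are hypothetical names, and the sketch assumes simple "Name: value" header lines without folded continuation lines.

```python
import io

class ParseError(Exception):
    """Carries the position at which parsing failed."""
    def __init__(self, line_number, offset, text):
        super().__init__(f"line {line_number} (byte {offset}): {text!r}")
        self.line_number = line_number
        self.offset = offset

def parse_headers(stream):
    """Parse 'Name: value' lines from a binary stream until the first blank line."""
    headers = {}
    offset = 0  # bytes consumed before the current line
    for line_number, raw in enumerate(stream, start=1):
        line = raw.rstrip(b"\r\n")
        if not line:  # a blank line ends the header block
            break
        name, sep, value = line.partition(b":")
        if not sep:  # no colon: report where the problem is
            raise ParseError(line_number, offset, line)
        headers[name.decode().strip()] = value.decode().strip()
        offset += len(raw)
    return headers

sample = io.BytesIO(b"From: a@example.org\nSubject: hi\n\nbody\n")
print(parse_headers(sample))
```

An interactive session could catch `ParseError`, inspect `line_number` and `offset`, and seek back into the file to look at the offending data.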
<freemint>
(in "mbox" (while (there_is_text)(read-mail))) and when read-mail encounters problems it gives me an eval where i can call functions like add-rule to modify the parser?
<freemint>
cess11_, the file might be bigger than RAM
<cess11_>
More reason to import all of it into the DB to begin with.
<freemint>
i could subdivide it of course ... to insert it
<freemint>
T
<cess11_>
I would probably do '(make (in Mbox (until (eof)(link (line T] and put that into an object, then examine parts of that data to see ways to exfiltrate the good parts that should be copied into new objects.
<cess11_>
Perhaps do it in chunks if resources are really tight, can easily be done with 'do or 'co.
<cess11_>
I would not add any indexing on that field though, it could cause a DoS due to lack of resources if the import is big. Just pulling in a list of strings is easier on the machine and then you write your own little parsers and pattern matching routines that are simple and efficient.
<cess11_>
Sometimes it is faster to 'chop large data sets in bigger chunks than a line at a time. You will have to fiddle around a bit to see what works best in your environment.
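As a rough Python sketch of the chunked approach for a file bigger than RAM: read a bounded number of lines at a time, then regroup them into messages. The names `line_chunks` and `split_messages` are hypothetical, and the sketch assumes every message starts with a line beginning with "From ", which is a simplification of real mbox files.

```python
import io
from itertools import islice

def line_chunks(stream, size):
    """Yield lists of at most `size` lines without loading the whole file."""
    while True:
        chunk = [line.rstrip("\n") for line in islice(stream, size)]
        if not chunk:
            return
        yield chunk

def split_messages(chunks):
    """Regroup chunked lines into messages, each starting at a 'From ' line."""
    message = []
    for chunk in chunks:
        for line in chunk:
            if line.startswith("From ") and message:
                yield message  # the previous message is complete
                message = []
            message.append(line)
    if message:
        yield message

sample = io.StringIO("From a@x Mon\nSubject: one\n\nhello\nFrom b@y Tue\nSubject: two\n\nbye\n")
for msg in split_messages(line_chunks(sample, 3)):
    print(msg)
```

The chunk size is the knob to experiment with: larger chunks mean fewer read calls, smaller ones keep memory use low.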
<freemint>
keeping a line list is a good idea
<freemint>
parsing all in one go is too ambitious i think
<Regenaxer>
Sorry, was on phone until now
<Regenaxer>
BTW, mailbox parsing is done in the mailing list handler
<Regenaxer>
misc/mailing
<Regenaxer>
Is in the pil distro iirc
<beneroth>
Regenaxer, about binary io: (pr) and (rd) use PLIO, (wr) is raw bytes, right?
<beneroth>
ah I see now
<beneroth>
(rd) with argument reads raw bytes
<beneroth>
nevermind
<Regenaxer>
yess :)
* beneroth
is implementing DIME, an outdated deprecated binary format comparable to MIME...
<Regenaxer>
Where is that needed?
<beneroth>
SOAP with attachments, when the server is horribly legacy stuff
<Regenaxer>
oh
<freemint>
Regenaxer, I think your code is not exactly what i am looking for, but i will take it as a reference
<beneroth>
Input: picolisp number. Output: raw 16 bit integer. how: (let Num 257 (out "test" (wr (/ Num 256) (% Num 256))) (in "test" (rd 2))) - correct? better way?
<Regenaxer>
Perhaps with >> and &
<Regenaxer>
slightly faster than mul/div
<beneroth>
aye, I see
<Regenaxer>
If the number is not more than 16 bits, & can be omitted
<Regenaxer>
all positive numbers?
<beneroth>
yeah all unsigned I think
<beneroth>
positive only
<freemint>
like that (wr (>> 8 (& (* 255 256) NUM)))(wr (& 255 NUM)) ?
<freemint>
with `(* 255 256) expanded
<Regenaxer>
yes, but the & is not needed then
<beneroth>
I've got now (wr (>> 8 Num) (& Num 255))
<Regenaxer>
yes
<freemint>
oh
<beneroth>
thanks!
<Regenaxer>
if the input is less than 65536
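The same split of a number into a high and a low byte, written as a small Python sketch for comparison. `to_u16_be` and `from_u16_be` are hypothetical helper names; the range check makes the "less than 65536" condition explicit instead of relying on truncation.

```python
def to_u16_be(num):
    """Encode an unsigned 16-bit integer as two bytes, high byte first."""
    if not 0 <= num < 65536:
        raise ValueError("value does not fit into 16 bits")
    return bytes([num >> 8, num & 0xFF])

def from_u16_be(data):
    """Decode two bytes, high byte first, into an integer."""
    return (data[0] << 8) | data[1]

encoded = to_u16_be(257)
print(encoded.hex())         # 0101
print(from_u16_be(encoded))  # 257
```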
<beneroth>
naturally
<beneroth>
it's actually a kind of content length, an identifier string
<Regenaxer>
I see
<freemint>
If it is content length i would go for & honestly
<Regenaxer>
Cause it may be bigger?
<freemint>
Cause you parse untrusted content
<beneroth>
no I write it
<beneroth>
:)
<beneroth>
time for lunch. away for a while - thank you guys :)
<Regenaxer>
:)
<freemint>
Regenaxer 'wr has a built in %
<freemint>
but it is 255
<Regenaxer>
Ah, right!
<freemint>
(wr 33333333)
<freemint>
U-> 33333333
<freemint>
! (% 33333333 255)
<freemint>
-> 243
<freemint>
! (char "U")
<freemint>
-> 85
<freemint>
! (% 33333333 256)
<freemint>
-> 85
<freemint>
*256
<freemint>
Why is it %256?
<Regenaxer>
yes, 'wr' uses ld (X) B # Store byte
<Regenaxer>
so upper bits are ignored anyway
<Regenaxer>
ie (& 255)
<freemint>
it is not
<freemint>
look at my test case
<freemint>
it is %256
<freemint>
ahh
<freemint>
forget that
<freemint>
%256 = %255
<freemint>
*&255
<Regenaxer>
: (out "a" (wr 33333333))
<Regenaxer>
-> 33333333
<Regenaxer>
: (hd "a")
<Regenaxer>
00000000 55
<Regenaxer>
: (hex (char "U"))
<Regenaxer>
-> "55"
<freemint>
you are right, i confused mod and AND
<freemint>
%256 is the same as &255
<Regenaxer>
yes
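The numbers from this exchange can be checked with a few lines of Python. For non-negative integers, taking the remainder modulo 256 and masking with 255 both keep only the lowest byte, while modulo 255 is a different operation. `low_byte` is a hypothetical helper name.

```python
def low_byte(num):
    """Return the least significant byte of a non-negative integer."""
    return num & 0xFF

n = 33333333
print(low_byte(n))  # 85, the code point of "U"
print(n % 256)      # 85, same result as the mask
print(n % 255)      # 243, a different operation
```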
<freemint>
Regenaxer, can you recommend using 'match on char lists of lines?
<Regenaxer>
yes, fine
<freemint>
Would you write a parser in mostly match?
<freemint>
Is there a command to get all set patterns?
<Regenaxer>
Most frequently I use from/till
<Regenaxer>
is also the fastest
<freemint>
I want to build it really robust ... i can imagine that it is the fastest.
<freemint>
my reasoning is simply that it would make for a cool demonstration of picolisp: when it fails to parse an email which has some weird Date, i get thrown into a shell, and i (add-pattern ...) (test-pattern) (commit-pattern)
<cess11_>
The 'match patterns are lists you design and keep track of.
<freemint>
cess11_, i know that they are often used that way ... but i can imagine my code being easier if there were a way to get all matched patterns
<cess11_>
'mapcar and a 'match lambda. Result is a list of matches.
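One way to picture "a list of patterns you keep track of, mapped over a line" is the following Python sketch. It uses regular expressions rather than PicoLisp 'match patterns, and `PATTERNS`, `add_pattern` and `matching_patterns` are hypothetical names chosen to echo the (add-pattern ...) idea above.

```python
import re

PATTERNS = []  # registry of (name, compiled regex) pairs, extended at runtime

def add_pattern(name, regex):
    """Register a new pattern under a descriptive name."""
    PATTERNS.append((name, re.compile(regex)))

def matching_patterns(line):
    """Return (name, captured fields) for every registered pattern that matches."""
    results = []
    for name, rx in PATTERNS:
        m = rx.fullmatch(line)
        if m:
            results.append((name, m.groupdict()))
    return results

add_pattern("rfc-like", r"(?P<day>\d{1,2}) (?P<month>[A-Z][a-z]{2}) (?P<year>\d{4})")
add_pattern("iso", r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

print(matching_patterns("12 Mar 2018"))
print(matching_patterns("2018/03/12"))  # empty list: time to add a pattern
```

When a date fails to match anything, a new pattern can be registered in the running session and the same line tried again.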
<freemint>
: (setq @MAIL "jobsch")
<freemint>
: (fill '(@MAIL .))
<freemint>
-> ("jobsch")
<freemint>
: (fill '(@MAIL))
<freemint>
-> "jobsch"
<freemint>
Segmentation fault
<cess11_>
Yeah, that's a memory leak.
<freemint>
'match yields true
<cess11_>
Right, forgot. 'make is better.
<cess11_>
Rather mixed up use cases.
<Regenaxer>
'fill' does not keep track of circular lists
<freemint>
I've noticed
<freemint>
Should that be documented?
<freemint>
how do you use 'make to do what 'match does, cess11_?
<Regenaxer>
Almost *all* functions don't handle them, as the check is expensive
<freemint>
your decision if it ends up in the docs
<Regenaxer>
cess11_: Why 'make' with 'mapcar'? Build two lists?
<Regenaxer>
freemint, document in *every* function?
aw- has quit [Quit: Leaving.]
<Regenaxer>
A circular list is simply a huge (infinite) list :)
<cess11_>
Ah, the lambda could return from 'pack instead of 'match or somesuch.
<Regenaxer>
or use mapc
<Regenaxer>
if the value is not needed
<Regenaxer>
to avoid building a garbage list
<cess11_>
T
<freemint>
i can not get your code working
<freemint>
cess11_,
<freemint>
The difference between 'do and 'mapc is that it iterates over multiple lists?
<cess11_>
It works as is. You need a list 'L, that's it, though pretty ugly.
<freemint>
What should L look like?
<freemint>
List string, list of chars?
<freemint>
(list of lists of?
<cess11_>
Doesn't matter. Char list isn't affected by 'chop. You'll need a better pattern for 'match though, that one assumes it has several chars and a '" ".
<beneroth>
freemint, I just looked at implementation of (wr) in @src64/io.l: shr A 4 # Normalize - I guess shr is right shift
<freemint>
It drops the flag of the cell
<Regenaxer>
The point is that in the end register B is stored
<Regenaxer>
an implicit & FF
<beneroth>
ah
<freemint>
(including the sign)
orivej has quit [Ping timeout: 240 seconds]
<Regenaxer>
putStdoutB -> ld (X) B # Store byte
orivej has joined #picolisp
<freemint>
Regenaxer, have you ever considered pre-rendering (into a file) certain web pages for speed?
<freemint>
Regenaxer, You own the PicoLisp Youtube channel, don't you?
<Regenaxer>
You mean static pages?
<Regenaxer>
No, I have no youtube channel
<beneroth>
freemint, I plan to do that (pre-rendering - in the past such a thing was just called 'caching)
<beneroth>
the problem is to accurately track all changes to know when to invalidate the cache :)
<beneroth>
only worth optimizing if you really need it, arguably
<Regenaxer>
T
<freemint>
T, i just wanted to know whether there is a solution, since i would be interested in how the cache invalidation is handled
<beneroth>
for web apps with up to a few 100 (< ca. 500) concurrent users, Regenaxer's architecture (with form.l) is optimal I believe
<freemint>
There is a PicoLisp youtube channel showing off Penti ... i think it is even linked in the wiki...
<beneroth>
freemint, no off-the-shelf solution. also because it is highly app-specific. arguably the form.l architecture by Regenaxer does kinda do caching in RAM :)
<Regenaxer>
Ah, a very short video showing Penti "Hello world"
orivej has quit [Ping timeout: 245 seconds]
<freemint>
Yes
<freemint>
Does somebody know whose youtube channel it is?
orivej has joined #picolisp
<beneroth>
afaik picolisp.com, software-lab.de, the mailing list, and his personal twitter and google play accounts are managed by Regenaxer. everything else (including this IRC channel) is managed by others.
<Regenaxer>
I made it very early with the first Penti version on Android
<Regenaxer>
beneroth, right
<freemint>
I see. Would you be interested in my presentation being recorded and uploaded there?
* beneroth
would watch it
<Regenaxer>
Yes, sure!
<Regenaxer>
Though it is not my channel
<beneroth>
maybe a picolisp playlist would be the right thing, to also include the pilOS (pisces) vid and the froscon talk by Regenaxer, etc
<Regenaxer>
I don't remember, I think my daughter uploaded the vid
<Regenaxer>
The froscon talk is a bit useless, as there is nothing to see
<freemint>
Do you still have the slides?
<beneroth>
picolisp resources on the internet are set up a bit chaotically, though optimized for minimal bureaucracy xD
<Regenaxer>
T
<beneroth>
yeah it would be more useful with the slides as pdf or such...
<Regenaxer>
freemint, I think there were no slides, I just presented a session on my netbook
<beneroth>
ah
<freemint>
then reconstructing roughly what you did would be possible ...
<freemint>
the FeM (my local hacker space) does/did video for many chaos communication congresses
<Regenaxer>
Let me check
<freemint>
so i would know what people to ask to do the editing.
<Regenaxer>
Found something
<Regenaxer>
I made it with mgp
<Regenaxer>
-> pdf
<Regenaxer>
moment
<Regenaxer>
software-lab.de/quasiconf.pdf
<Regenaxer>
software-lab.de/quasiconfAug12.mgp is the source
<Regenaxer>
So it is just an outline, a few notes
<freemint>
mhh if i had all the content i might be able to get that into a modified CCC template or something
<Regenaxer>
all content?
<freemint>
all the content that should appear on the screen
<freemint>
(like what you show in the shell ....
<Regenaxer>
Yes, that's lost
<Regenaxer>
was on the fly
<freemint>
(what a browser would see ...
<freemint>
but i guess it can be reconstructed well enough
<freemint>
if i had that i could rerecord the screen and have some one put the video together
<freemint>
just an offer
<freemint>
is picolisp a trademarked name?
<Regenaxer>
I don't think so
<Regenaxer>
I hope it is not ;)
<Regenaxer>
There is no "trade" involved I would say
<freemint>
I was more asking if you did trademark it once
<Regenaxer>
nada
<freemint>
ok, back to reading stuff and thinking about match, so i can think about what data i can gain and what er.l works best for that
<Regenaxer>
good
<freemint>
is there a better way to handle patterns and conditions than to check the patterns after match for a condition (being a month, for example)?
<Regenaxer>
hmm, depends on the data
<Regenaxer>
I always start with direct parsing with 'from', 'till', 'peek', 'char', 'skip' etc.
<Regenaxer>
'head', 'match' etc. are useful only for line-structured data
<Regenaxer>
eg HTTP headers
<Regenaxer>
the body has no lines, so a stream parser with 'from' et al. is better
<freemint>
(or mail headers
<Regenaxer>
right
<Regenaxer>
For some purposes 'echo' is optimal
<Regenaxer>
when you want to replace patterns in a stream
<Regenaxer>
eg, consider
<Regenaxer>
(in "@lib/socialshareprivacy/jquery.socialshareprivacy"
<freemint>
Are the wiki text and the index stored in Mup?
<freemint>
(many picolist commands are around 3 words
<Regenaxer>
yes, each modification is a +Mup
<freemint>
what does +Mup stand for
<Regenaxer>
Markup
<freemint>
Can i mostly recycle this?
<Regenaxer>
sure
<freemint>
could i use these folded indexes to compute similarity between documents (aka detect quotes)
<Regenaxer>
Good idea
rick42_ has joined #picolisp
<Regenaxer>
Perhaps by counting how many words are the same
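Counting shared words can be sketched in a few lines of Python. This is only an illustration of the idea, not of how the wiki indexes work: `word_set` and `similarity` are hypothetical names, and the score used here (shared words divided by all distinct words, the Jaccard index) is one possible choice among several.

```python
def word_set(text):
    """Split text on whitespace and fold case, loosely like a folded index would."""
    return set(text.lower().split())

def similarity(a, b):
    """Share of distinct words that both texts have in common (Jaccard index)."""
    wa, wb = word_set(a), word_set(b)
    if not wa and not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)

first = "abb bbd dee jje"
second = "abb bbd dde jjj"
print(sorted(word_set(first) & word_set(second)))  # ['abb', 'bbd']
print(similarity(first, second))
```

A quoted passage in a reply would show up as a high score between the reply and the original mail; a threshold would have to be found by experiment.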
<freemint>
In the end each branch of the folded index ends in a list of +Mup having the same content, is that right?
<Regenaxer>
no, each branch points to a separate mup
<Regenaxer>
There may be several mups with the same text
<freemint>
if i have "abb bbd dee jje" and "abb bbd dde jjj", do they share parts of the search tree when they are folded?
<freemint>
Regenaxer, ouch
<freemint>
i do not like that
<Regenaxer>
no sharing
<freemint>
why not?
<Regenaxer>
It is how the indexes work
rick42 has quit [*.net *.split]
<Regenaxer>
You would need a lot of searching during tree operations otherwise
rick42_ is now known as rick42
<freemint>
ah
<freemint>
how does that approach affect space?
<Regenaxer>
And you would not save sooo much
<Regenaxer>
a list also takes space
<Regenaxer>
now there is key/value for each entry
<Regenaxer>
It is probably not a super-high-performance fulltext index
<freemint>
is there a strc
<freemint>
*structure at the end of the ree
<Regenaxer>
no, each node points to an object
<freemint>
*tree which can be distinguished from intermediate levels
<Regenaxer>
all +index subclasses
<freemint>
How would a new class like +leave affect performance, when they maintain a hash of the path to them and are additionally indexed by this hash?
<Regenaxer>
no idea
<beneroth>
freemint, pilDB index trees are BTrees
<cess11_>
If you're low on hardware resources you probably want to profile the actual data.
<Regenaxer>
Perhaps a research project for you?
<freemint>
cess11_, I am not low on hardware
<beneroth>
then don't try to do premature optimization :P
<cess11_>
Well then, just rock a +Sn on a partial chunked set and see how it goes.
<beneroth>
T
<freemint>
you are confusing premature optimization mind games and premature optimization
<beneroth>
both cost time and block you from getting to results :P
<Regenaxer>
The pil db index classes were designed for relatively short values
<freemint>
but generate fun
<freemint>
;P
<Regenaxer>
You could use some external indexing tool
<beneroth>
point
<Regenaxer>
But in the Wiki it performs well :)
<freemint>
That is another question i have: are +Sn more compact than character folded indexes?
<Regenaxer>
+Sn is only for personal (human) names
<Regenaxer>
European that is
<Regenaxer>
you mean +Idx
<freemint>
i hear your warning, but still
<Regenaxer>
folding gives more compact stuff
<cess11_>
Works also for other kinds of text, but not as nicely as with names.
<Regenaxer>
yes
<freemint>
is +Sn (through the lossy compression) more compact than text
<Regenaxer>
that's true
<freemint>
index
<freemint>
what's true?
<Regenaxer>
but the typical (+Sn +Idx) gets long
<Regenaxer>
the +Idx
<Regenaxer>
17:32 <freemint> is +Sn (through the 17:32 <freemint> is +Sn (through the l
<Regenaxer>
oops
<Regenaxer>
What is true: +Sn (through the lossy compression) is more compact than text
<Regenaxer>
off for a while
<Regenaxer>
:)
<freemint>
the +Idx gets as long as the longest text, or as long as the longest shared tree between two texts
<freemint>
bye
<cess11_>
I'm not sure but +Idx and '(+Sn +Idx) on +String takes a fair bit of space, you'll see as soon as you do an import and have top/htop/&c. visible.
<Regenaxer>
ret
<Regenaxer>
+Sn produces only a single short entry. It is +Idx with all its substrings which blows it up
<freemint>
so in (+Sn +Idx), +Idx works on normal strings
<Regenaxer>
For non-human names I use almost always +IdxFold
<Regenaxer>
yes
<freemint>
i would not have assumed that
<Regenaxer>
There was a document, I think we discussed it here a while ago
<Regenaxer>
moment
<Regenaxer>
yes: software-lab.de/doc/search
<Regenaxer>
This is where I always look up if I'm not sure
<Regenaxer>
The indexes produced by a value "Regen Axer"
<Regenaxer>
For longer strings the size differences get more dramatic