xkapastel has quit [Quit: Connection closed for inactivity]
xkapastel has joined #picolisp
alexshendi has quit [Read error: Connection reset by peer]
orivej has quit [Ping timeout: 244 seconds]
DKordic has quit [Ping timeout: 250 seconds]
razzy has quit [Ping timeout: 272 seconds]
rob_w has joined #picolisp
razzy has joined #picolisp
orivej has joined #picolisp
xkapastel has quit [Quit: Connection closed for inactivity]
razzy has quit [Ping timeout: 272 seconds]
<beneroth>
hi all
<beneroth>
Regenaxer, yeah the usual term is "premature optimization"
<Regenaxer>
Hi beneroth! Thanks :)
<beneroth>
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%"
<beneroth>
Donald Knuth
<Regenaxer>
Wise man
<beneroth>
I didn't know about skip lists
<beneroth>
it seems they're better for use cases where you 1) do concurrency 2) many writes/changes to the structure. With a skip list modifications are more local to parts of the whole structure where a tree requires modifications to more parts (e.g. for re-balancing), therefore the skip list is better suited for concurrent modifications
<Regenaxer>
I looked at them some time ago, but forgot the details
<beneroth>
as pilDB is normally used with a global full lock, real full serializing of all transactions/actions, skip list makes no sense. it could make sense if one would want to optimize pilDB for high frequency concurrent writes, but this would anyway require some substantial changes to e.g. the whole locking and transaction handling
<Regenaxer>
It has not to do with full locking
<Regenaxer>
When a tree is modified, some nodes may change
<beneroth>
no, skip list hasn't. but skip list makes no sense as long one does full locking anyway, I think.
<Regenaxer>
they are sent by 'tell'
<Regenaxer>
So no global lock is needed
<beneroth>
I refer to the global lock in the normal (dbSync) ... (commit 'upd) pattern
<Regenaxer>
Nodes in b-trees are just like other external syms
<Regenaxer>
yes
<Regenaxer>
but that is not relevant here
<Regenaxer>
you could implement local locks and still use b-trees
<Regenaxer>
like other objects
<beneroth>
the point of skip lists is, as I understand it, that a modification of a skip list would usually involve changes to a smaller number of nodes/external syms than in a btree
<Regenaxer>
yes, but b-trees too
<beneroth>
too what?
<Regenaxer>
normally only a single node is affected
<Regenaxer>
a single sym
<beneroth>
balancing?
<Regenaxer>
yes, when splitting or joining
<beneroth>
e.g. adding new indexed values
<Regenaxer>
then more may be
<Regenaxer>
right
<Regenaxer>
but you add and add for a while
<beneroth>
yeah. for this use cases, the skip list would involve less modifications all in all, that is the argument for skip list.
<Regenaxer>
then the node splits
<beneroth>
aye. no or less node splitting with skip lists, as they are unbalanced.
<Regenaxer>
and you have a lot more nodes
<beneroth>
BTrees remain unbeaten in query speed, afaik
<Regenaxer>
I hope so
<Regenaxer>
the nice point is that they stay balanced
<beneroth>
afaik they're still universally the best thing for all index use cases, except when you have certain index/usage patterns with certain guarantees that you can use a more optimal approach than BTree. but BTree is the best universal optimal approach, still nothing better found (and lot of people tried).
<Regenaxer>
yep
<beneroth>
but yeah, optimizing for high concurrent write throughput requires quite some different structures and algorithms than traditional database setups, which are mostly optimised for querying and some writes (but in general less writes than reads)
<beneroth>
and high write availability are use cases people build databases for this days, e.g. log data stream processing
<beneroth>
mainly to handle website visitor tracking stuff, server/system logging usually does not produce so much data that this is really required ;-)
<beneroth>
btw. Regenaxer, picolisp is pretty unique in that it does actual in-place modifications. most databases (also the old big ones) do only physically add/extend data, producing a stream of snapshots and changes, and then garbage collect regularly (hopefully)
<beneroth>
so they have a lot of overhead to properly keep track of that stuff, but it allows higher concurrent throughput in many situations.
<Regenaxer>
I see, didn't know
<beneroth>
but pilDB is much simpler and for typical applications more than enough :)
<beneroth>
maybe pilDB has more IO operations, many small ones vs. others tend to have less but bigger IO operations.
<Regenaxer>
Not sure, as all stuff is fetched only once usually
<beneroth>
pilDB transactions are always real serializable transactions (well, they're serialized), which is the most safe/secure way to do things (especially for e.g. accounting applications)
<beneroth>
Regenaxer, I mean writes
<Regenaxer>
ok
<beneroth>
for reads you are probally right, there pilDB probably looks like other DBs, likely it even has better IO patterns.
<beneroth>
guesstimate, not benchmarks
<Regenaxer>
Writes in 'commit' are sorted in sequence they are in the file
<beneroth>
ok, so optimized for the OS/disk?
<beneroth>
nice
<Regenaxer>
A little
<Regenaxer>
but they go to the OS disk buffer first anyway
<beneroth>
well, less sorting to do for the buffer
<beneroth>
_=
<Regenaxer>
Modern OSes do a lot of buffering anyway
<beneroth>
:)
<beneroth>
yeah
<Regenaxer>
T
<beneroth>
I think your approach of building/trusting on that is good engineering.
<Regenaxer>
I think other DBs do their own buffering
<beneroth>
most databases try to do everything themselves, but this is a lot of second-inventing/implementing
<Regenaxer>
T
<beneroth>
of course that can give better results on bad OS bad hardware, and it gives more stable/predictable behaviour which might be a pretty important property for proper operation / load estimates
<beneroth>
ah nice link for you, haha
<Regenaxer>
yes, allows finer control eg for block caches
<beneroth>
you will feel confirmed in your insisting on KISS
<beneroth>
and oracle was THE first big leader for relational SQL database business, and remained leader in that space until about 15 years ago
<Regenaxer>
yeah
<beneroth>
and a commenter writes "Even PostgreSQL is 1.3M lines of code"
<beneroth>
I guess PostgreSQL is the most sane implementation of SQL/relational model in widespread use
<beneroth>
pilDB has not all the features and properties of those databases, but for typical business applications and websites etc. it has more than enough, graph and OOP is even better suited to this task than relational model, and pilDB is...