<lf94>
(I know this is old news to folks who've been using lisp forever)
<lf94>
lexi-lambda, (+ 1 2 3 4 5 etc) could be optimized easily though, couldn't it?
<lexi-lambda>
The problem with parallelizing general-purpose programs is that virtually all general-purpose programs have many points that require synchronization unless carefully written in such a way to avoid that, and in those contexts, the overhead of parallelism often outweighs the benefits.
<lexi-lambda>
So it’s not so much that running code in parallel is hard but that orchestrating the parallel work and subsequently collecting the results is expensive.
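A minimal Racket sketch of that orchestration cost, using the real racket/future API (illustrative, not a benchmark): creating, scheduling, and touching a future is pure coordination work, so doing it per tiny task loses badly to a plain loop.

    #lang racket
    (require racket/future)

    ;; Plain loop: no coordination at all.
    (define (sum-seq xs)
      (for/sum ([x (in-list xs)]) x))

    ;; Pathologically "parallel" sum: one future per element.
    ;; Spawning and touching each future is exactly the orchestration
    ;; overhead described above; it dwarfs the single addition each
    ;; future saves.
    (define (sum-par xs)
      (define fs (for/list ([x (in-list xs)])
                   (future (lambda () x))))
      (for/sum ([f (in-list fs)]) (touch f)))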
<lf94>
ah
<lexi-lambda>
Put another way, it’s all the code around the parallel parts that takes the most time.
<lf94>
well i was thinking really simply: halt execution in the vm until gpu returns result....
<lf94>
of course if you run 2 vms, one for processing, one for ui, everything should be fine?
<lf94>
not sure how you'd share state between them efficiently
<lexi-lambda>
Indeed. In a stateful language, being able to offload arbitrary computations to a separate processor isn’t really feasible, before even getting into discussions about communication overhead.
<lexi-lambda>
And all general-purpose languages are stateful. But you could certainly have DSLs, even embedded DSLs, that are designed to take advantage of parallelism. There is a lot of research into such things.
<lf94>
I guess you could use network protocol/sockets for communication between a UI and processing code
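Racket does ship the “two VMs” design under the name places: each place is a separate instance of the Racket runtime with its own GC, nothing is shared, and communication goes over place channels rather than sockets. A minimal sketch with the real racket/place API (the workload is just a placeholder):

    #lang racket
    (require racket/place)

    ;; The "processing VM": runs in its own Racket instance and talks
    ;; to its creator only through the channel ch.
    (define (start-worker)
      (place ch
        (define n (place-channel-get ch))                        ; receive work
        (place-channel-put ch (for/sum ([i (in-range n)]) i))))  ; send result

    (module+ main
      (define worker (start-worker))        ; the "UI VM" side
      (place-channel-put worker 10000000)
      (printf "result: ~a\n" (place-channel-get worker)))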
<lexi-lambda>
The GPU can’t use anything like sockets without coordinating with the CPU, so that doesn’t really help you.
<lexi-lambda>
It’s hard to get around the problem that parallelism is pretty hard. There was a great deal of excitement around automatic parallelization, and parallelism in general, in the nineties, but the hype didn’t translate into results.
<lf94>
why does there have to be parallelization at all?
<lf94>
vs just sending large chunks of data to gpu for processing
<lf94>
and waiting for a result?
<lexi-lambda>
The whole point of doing computation on the GPU is to do it in parallel.
<lf94>
it is an extremely strong point
<lf94>
but to avoid complexity, why not my point?
<lf94>
doesn't gpu offer like SIMD on steroids
<lexi-lambda>
SIMD is about parallelism.
<lf94>
I thought it was just doing many operations at once
<lexi-lambda>
Yes, that’s parallelism.
<lexi-lambda>
Not concurrency! But it is parallelism.
<lf94>
it's parallel but isn't it also atomic?
<lf94>
(obviously, I am extremely naive here)
<lexi-lambda>
Yes, parallelism and concurrency are not the same thing.
<lexi-lambda>
But my earlier point about automatic vectorization having been found to be extremely hard is very relevant here.
<lf94>
so saying "have any scheme use simd" is essentially the same issue
<lf94>
hm
<lf94>
so...how do languages usually deal with using simd or gpus
<lf94>
and why is this hard to translate to scheme
<Duns_Scrotus>
Built in simd intrinsics
<lf94>
lack of money?
<lf94>
ah.
<rain1>
it's difficult to automate simd
<lf94>
Why can't we create an SRFI with support
<lexi-lambda>
Right. The programmer is forced to think about it and be explicit about it.
<lf94>
Special symbols
<Duns_Scrotus>
And it’s not in Schemes because they all use some one-word-per-value pointer-tagging scheme, so what’s the point
<lexi-lambda>
Sure, you could do that. But then you’re working in a pretty different language.
<lf94>
obviously it'd be done in a way that builds on top
<lf94>
not transform the lang
<lexi-lambda>
It’s hard to get around the fundamental problem that most programs naïvely written are not very parallel, and trying to parallelize small pieces ends up incurring way more communication overhead than you get in actual speedups.
<lf94>
it should be possible to turn one (+ 1 2 3 4 5) into an equivalent parallelized form and back
<lexi-lambda>
Right, but adding a few numbers together on the GPU isn’t that much faster than doing it on the CPU, and the cost of sending the data to the GPU and back eclipses the tiny speedup you get.
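To be fair to the idea, the “equivalent parallelized form” is mechanical to write: a sum is a balanced reduction tree, and the two halves can run in parallel. A futures sketch (the cutoff is a made-up illustrative number; for anything as small as (+ 1 2 3 4 5) the sequential branch always wins, which is exactly the overhead point):

    (require racket/future)

    ;; Divide-and-conquer sum over a vector. Below the cutoff we fall
    ;; back to a plain loop, because spawning a future for a handful of
    ;; additions costs more than the additions themselves.
    (define cutoff 10000) ; illustrative, not tuned
    (define (psum v lo hi)
      (cond
        [(<= (- hi lo) cutoff)
         (for/sum ([i (in-range lo hi)]) (vector-ref v i))]
        [else
         (define mid (quotient (+ lo hi) 2))
         (define left (future (lambda () (psum v lo mid))))
         (+ (psum v mid hi) (touch left))]))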
<rain1>
maybe you could create a DSL that enabled things like that, and SIMD programming and stuff
<Duns_Scrotus>
That kind of thing isn’t easily parallelized in any language
<lf94>
Hm, ok then
<lexi-lambda>
You get speedups when you offload huge chunks of work to the GPU and minimize communication between the CPU/GPU.
<lexi-lambda>
But that really has to be baked into the program’s design with a fair amount of forethought.
<lf94>
so much for that idea (and clearly this has been beat to death by thousands before me :P)
<lf94>
so next: hardware interpreter
<lf94>
where is it
<Duns_Scrotus>
I remember hearing some heavy breathing about Haskell autoparallelization
<lf94>
B)
<lf94>
yes I remember that too
<lf94>
because immutability, etc
<Duns_Scrotus>
But it’s probably the usual Haskell lies
<lf94>
r7rs in hardware would be amazing
<lf94>
or is that dumb too
<Duns_Scrotus>
Compilers are better than hardware interpreters
<lexi-lambda>
Data-parallel Haskell, Haskell’s automatic vectorization support, is dead.
<lexi-lambda>
It’s literally gone in the most recent version of GHC; they finally killed it off.
<lf94>
Duns_Scrotus, I thought so
<lf94>
we'd just be creating a turing machine that x86 already can do 10000x better
<rain1>
hardware interpreter isn't a good idea IMO
<rain1>
if you mean something like the old lisp machines
<lf94>
yeah
<rain1>
it's better to have a general purpose CPU and compile many languages to it
<lf94>
I understand though
<lf94>
What is the difference between targeting lisp cpu instructions vs bytecode instructions vs x86 instructions
<lf94>
x86 is the fastest
<rain1>
it isn't lisp but you may be interested in the language Futhark
<lf94>
so no point
<lf94>
Yes I saw that too
<lf94>
I'll probably touch that eventually
<lf94>
So ultimately the most important thing all this boils down to is creating a simple standard
<lf94>
Which I guess r7rs does a good job at being
<lf94>
this makes me pretty happy to think about
<lf94>
Can you bake assembly instructions into your scheme
<lf94>
(myspecialasmfunction (+ 1 2 3 4))
<lf94>
(asm (stuff))
<lf94>
(define myspecialasmfunction (asm (stuff))) or whatever
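Racket has no inline (asm ...) form like the one sketched above, but the conventional escape hatch to hand-written native code is the C FFI: compile the assembly into a shared library and bind it. A minimal sketch with the real ffi/unsafe API, binding libm's sqrt as a stand-in (library name and version lookup are platform-dependent):

    #lang racket
    (require ffi/unsafe)

    ;; Any hand-optimized native routine would be bound the same way.
    (define libm (ffi-lib "libm" '("6" #f)))  ; lookup is platform-dependent
    (define c-sqrt (get-ffi-obj "sqrt" libm (_fun _double -> _double)))

    (c-sqrt 2.0) ; => 1.4142135623730951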
<lf94>
(Now I'm thinking how can we out-do C/turing machine oriented languages)
<lf94>
(without resorting to C)
<Duns_Scrotus>
Definitely not in an untyped garbage collected language
<lf94>
because of the gc?
<lf94>
untyped/typed shouldn't matter here
<lf94>
(types at compile time of course)
<Duns_Scrotus>
Why shouldn’t it
<lf94>
because types at compile time disappear
<lf94>
after compilation
<rain1>
a dynamically typed language like scheme has the problem that we tend to box and unbox integers across function call boundaries
<lf94>
so untyped makes no difference
<rain1>
the compiler can try to optimize that out at times but it won't always manage
<lf94>
we don't specify boxes?
<Duns_Scrotus>
Yeah they’re replaced by heap pointers with runtime type information
<rain1>
instead of just int we have to move around a tagged union
<lf94>
Is this because of set! ?
<lf94>
I feel like this could be avoided in a compiled scheme
<rain1>
it's not because of set!, this happens because in scheme you can put any type of value anywhere
<Duns_Scrotus>
There are ways to avoid boxing ints, but then they can’t avoid boxing floats
<lf94>
I avoid that even in js
<rain1>
(define (foo x) ...) ; <- x can be bound to a boolean, an integer, anything
<lf94>
very nice example.
<lf94>
mmmmm
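Racket itself shows both sides of this tradeoff: a generic vector holds tagged values (and each flonum is its own heap object), while racket/flonum's flvectors store the doubles unboxed, at the price of being flonum-only:

    (require racket/flonum)

    ;; Generic vector: each element is a tagged value; every flonum
    ;; here is a separate heap-allocated box.
    (define boxed (vector 1.0 2.0 3.0))

    ;; flvector: raw doubles stored inline, no per-element box, but
    ;; only flonums are allowed.
    (define unboxed (flvector 1.0 2.0 3.0))
    (fl+ (flvector-ref unboxed 0) (flvector-ref unboxed 1)) ; => 3.0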
<Duns_Scrotus>
You can do heroic type inference
<lf94>
yeah type inference would work here I guess?
<rain1>
in a statically typed language like ocaml or haskell you could avoid ever having to box integers
<lf94>
Unless something like + is also type agnostic
<rain1>
or C or rust
<lf94>
(define (thing x) (+ x "hello"))
<lf94>
ERROR on line 1: invalid type, expected Number: "hello"
<Duns_Scrotus>
How would you avoid boxing in those languages
<rain1>
it's just the nature of dynamic languages, we just accept a bit of slowdown in these respects for convenience
<lf94>
Nice, chibi scheme does know
<Duns_Scrotus>
They have polymorphism
<lf94>
Ok so + is Number only
<Duns_Scrotus>
I mean besides the obvious
<lf94>
So this can be inferred
<lf94>
But yea, you've shown pretty clearly the only way is to have an explicit type system.
<lf94>
and I guess enter: Racket
<lf94>
it is nice to follow this line of thought up to racket. :)
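That line of thought lands on Typed Racket specifically, where the bad call is rejected before + ever runs. Roughly (the error text is paraphrased from what the type checker prints):

    #lang typed/racket

    (: thing (-> Number Number))
    (define (thing x) (+ x 1))

    (thing 41)          ; => 42
    ;; (thing "hello")  ; rejected at compile time:
    ;; Type Checker: type mismatch
    ;;   expected: Number
    ;;   given: String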
<lf94>
Is racket's way of defining things very similar to r7rs too?
<lf94>
I want to diverge from r7rs as little as possible
<lf94>
question: what does the `define really return?
<lf94>
'define
<lf94>
nevermind the quote actually
<lf94>
From what I understand it returns...nothing
<lf94>
(begin (define a 1)) ; returns nothing
<lf94>
(begin (define a 1) a) ; returns 1
<lf94>
But how can this be? define clearly has effects here
<lf94>
I guess define has side effects...
<lf94>
modifying the global state
<lexi-lambda>
Definitions are not expressions in Racket, so they don’t return anything.
<lexi-lambda>
It would be like asking what `int x = 7;` returns in Java. The question doesn’t make sense because a variable declaration is part of the syntax of the language, and it doesn’t return anything.
<lexi-lambda>
You can see this for yourself by trying to put a definition in a context where a value must be produced. (list (define x 1)) is a syntax error.
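At a Racket REPL that looks roughly like this (error text paraphrased):

    > (define x 1)   ; a definition, not an expression: nothing is printed
    > x
    1
    > (list (define x 1))
    ; define: not allowed in an expression context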