(I know this is old news to folks who've been using lisp forever)
lexi-lambda, (+ 1 2 3 4 5 etc) could be optimized easily though, couldn't it?
The problem with parallelizing general-purpose programs is that virtually all general-purpose programs have many points that require synchronization unless carefully written in such a way as to avoid that, and in those contexts, the overhead of parallelism often outweighs the benefits.
So it’s not so much that running code in parallel is hard but that orchestrating the parallel work and subsequently collecting the results is expensive.
Put another way, it’s all the code around the parallel parts that takes the most time.
well i was thinking really simply: halt execution in the vm until gpu returns result....
of course if you run 2 vms, one for processing, one for ui, everything should be fine?
not sure how you'd share state between them efficiently
Indeed. In a stateful language, being able to offload arbitrary computations to a separate processor isn’t really feasible, before even getting into discussions about communication overhead.
And all general-purpose languages are stateful. But you could certainly have DSLs, even embedded DSLs, that are designed to take advantage of parallelism. There is a lot of research into such things.
I guess you could use network protocol/sockets for communication between a UI and processing code
The GPU can’t use anything like sockets without coordinating with the CPU, so that doesn’t really help you.
It’s hard to get around the problem that parallelism is pretty hard. There was a great deal of excitement around automatic parallelization, and parallelism in general, in the nineties, but the hype didn’t translate into results.
why does there have to be parallelization at all?
vs just sending large chunks of data to gpu for processing
and waiting for a result?
The whole point of doing computation on the GPU is to do it in parallel.
it is an extremely strong point
but to avoid complexity, why not my point?
doesnt gpu offer like SIMD on steroids
SIMD is about parallelism.
I thought it was just doing many operations at once
Yes, that’s parallelism.
Not concurrency! But it is parallelism.
it's parallel but isn't it also atomic?
(obviously, I am extremely naive here)
Yes, parallelism and concurrency are not the same thing.
But my earlier point about automatic vectorization having been found to be extremely hard is very relevant here.
so saying "any scheme use simd" is essentially the same issue
so...how do languages usually deal with using simd or gpus
and why is this hard to translate to scheme
Built in simd intrinsics
lack of money?
it's difficult to automate simd
Why can't we create a SRFI with support
Right. The programmer is forced to think about it and be explicit about it.
Special symbols
And it’s not in Schemes because they all use some one-word-per-value pointer-tagging scheme, so what’s the point
Sure, you could do that. But then you’re working in a pretty different language.
obviously it'd be done in a way that builds on top
not transform the lang
It’s hard to get around the fundamental problem that most programs naïvely written are not very parallel, and trying to parallelize small pieces ends up incurring way more communication overhead than you get in actual speedups.
it should be possible to turn one (+ 1 2 3 4 5) into an equivalent parallelized form and back
Right, but adding a few numbers together on the GPU isn’t that much faster than doing it on the CPU, and the cost of sending the data to the GPU and back eclipses the tiny speedup you get.
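(A rough sketch of what "an equivalent parallelized form" of (+ 1 2 3 4 5) might look like, assuming R7RS. The name `tree-sum` is made up for illustration, and nothing here actually runs in parallel: it only shows the regrouping a parallel reduction relies on. Regrouping like this is safe for exact numbers; floating-point sums can come out slightly different.)

```scheme
(import (scheme base) (scheme write))

;; Sum the elements of vec in the half-open range [lo, hi) by splitting
;; the range in half. The two halves do not depend on each other, so a
;; parallel runtime could evaluate them at the same time; here they are
;; simply evaluated one after the other.
(define (tree-sum vec lo hi)
  (cond ((= lo hi) 0)                              ; empty range
        ((= (+ lo 1) hi) (vector-ref vec lo))      ; single element
        (else
         (let ((mid (quotient (+ lo hi) 2)))
           (+ (tree-sum vec lo mid)
              (tree-sum vec mid hi))))))

(display (tree-sum (vector 1 2 3 4 5) 0 5))        ; prints 15, same as (+ 1 2 3 4 5)
(newline)
```

For five numbers the splitting and recombining is more work than the four additions it replaces, which is the overhead problem in miniature.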
maybe you could create a DSL that enabled things like that, and SIMD programming and stuff
That kind of thing isn’t easily parallelized in any language
Hm, ok then
You get speedups when you offload huge chunks of work to the GPU and minimize communication between the CPU/GPU.
But that really has to be baked into the program’s design with a fair amount of forethought.
so much for that idea (and clearly this has been beat to death by thousands before me :P)
so next: hardware interpreter
where is it
I remember hearing some heavy breathing about Haskell autoparallelization
yes I remember that too
because immutability, etc
But it’s probably the usual Haskell lies
r7rs in hardware would be amazing
or is that dumb too
Compilers are better than hardware interpreters
Data-parallel Haskell, Haskell’s automatic vectorization support, is dead.
It’s literally gone in the most recent version of GHC; they finally killed it off.
Duns_Scrotus, I thought so
we'd just be creating a turing machine that x86 already can do 10000x better
hardware interpreter isn't a good idea IMO
if you mean something like the old lisp machines
it's better to have a general purpose CPU and compile many languages to it
I understand though
What is the difference between targeting a lisp cpu instructions vs bytecode instructions vs x86 instructions
x86 is the fastest
it isn't lisp but you may be interested in the language Futhark
so no point
Yes I saw that too
I'll probably touch that eventually
So ultimately the most important thing all this boils down to is creating a simple standard
Which I guess r7rs does a good job at being
this makes me pretty happy to think about
Can you bake in assembly instructions in your scheme
(myspecialasmfunction (+ 1 2 3 4))
(asm (stuff))
(define myspecialasmfunction (asm (stuff))) or whatever
(Now I'm thinking how can we out-do C/turing machine oriented languages)
(without resorting to C)
Definitely not in an untyped garbage collected language
because of the gc?
untyped/typed shouldnt matter here
(types at compile time of course)
Why shouldn’t it
because types at compile time disappear
after compilation
a dynamically typed language like scheme has the problem that we tend to box and unbox integers across function call boundaries
so untyped makes no difference
the compiler can try to optimize that out at times but it wont always manage
we dont specify boxes?
Yeah they’re replaced by heap pointers with runtime type information
instead of just int we have to move around a tagged union
Is this because of set! ?
I feel like this could be avoided in a compiled scheme
it's not because of SET!, this happens because in scheme you can put any type of value anywhere
There are ways to avoid boxing ints, but then they can’t avoid boxing floats
I avoid that even in js
(define (foo x) .. <- x can be called with a boolean, integer, anything
very nice example.
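(To make the `foo` example concrete, here is a small sketch, assuming R7RS. The procedure `kind-of` is hypothetical. The point is only that one procedure can be handed any kind of value, so every value has to carry enough runtime information for predicates like these to work, and that information is what the tagging and boxing are for.)

```scheme
(import (scheme base) (scheme write))

;; One procedure, many possible argument types: the runtime has to be
;; able to ask each value what it is.
(define (kind-of x)
  (cond ((boolean? x) 'boolean)
        ((exact-integer? x) 'integer)
        ((number? x) 'other-number)
        ((string? x) 'string)
        (else 'something-else)))

(display (map kind-of (list #t 42 1.5 "hi")))  ; prints (boolean integer other-number string)
(newline)
```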
You can do heroic type inference
yeah type inference would work here I guess?
in a statically typed language like ocaml or haskell you could avoid ever having to box integers
Unless something like + is also type agnostic
or C or rust
(define (thing x) (+ x "hello"))
ERROR on line 1: invalid type, expected Number: "yes"
How would you avoid boxing in those languages
it's just the nature of dynamic languages, we just accept a bit of slowdown in these respects for convenience
Nice, chibi scheme does know
They have polymorphism
Ok so + is Number only
I mean besides the obvious
So this can be inferred
But yea, you've shown pretty clearly the only way is to have an explicit type system.
and I guess enter: Racket
it is nice to follow this line of thought up to racket. :)
Is racket's way of defining things very similar to r7rs too?
I want to diverge from r7rs as little as possible
question: what does the `define really return?
nevermind the quote actually
From what I understand it returns...nothing
(begin (define a 1)) ; returns nothing
(begin (define a 1) a) ; returns 1
But how can this be? define clearly has effects here
I guess define has side effects...
modifying the global state
Definitions are not expressions in Racket, so they don’t return anything.
It would be like asking what `int x = 7;` returns in Java. The question doesn’t make sense because a variable declaration is part of the syntax of the language, and it doesn’t return anything.
You can see this for yourself by trying to put a definition in a context where a value must be produced. (list (define x 1)) is a syntax error.
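(A short sketch of the difference, written as R7RS so it also matches the chibi examples above. The rejected form is left commented out so the rest still runs; the names `x`, `y` and `total` are only for illustration.)

```scheme
(import (scheme base) (scheme write))

(define x 1)                 ; a definition: binds x, produces no value to use

;; (list (define x 1))       ; rejected: a definition where an expression is required

(define total
  (let ()
    (define y 2)             ; internal definition at the start of a body
    (+ x y)))                ; the final expression is what supplies the value

(display (list x total))     ; prints (1 3)
(newline)
```

This is the same thing that happens in (begin (define a 1) a): the value comes from the trailing `a`, not from the `define`.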