azonenberg_work has quit [Ping timeout: 258 seconds]
<rqou>
why is windows ridiculously bad at everything?
<balrog>
rqou: what are you fighting with now
<rqou>
the usual: updating
<rqou>
it's always amazing how linux can update everything on the system and still work whereas windows can barely even get itself updated most of the time
<azonenberg>
Lol
<azonenberg>
yeah
<azonenberg>
the only time i ever reboot is when a major kernel bug gets patched
<azonenberg>
minor kernel stuff i normally dont bother rebooting for
<rqou>
anyways, after flipping my gpu setup a bunch my system is working again
<awygle>
god i hate this smoke
<rqou>
switching from nvidia in linux, amd in guest over to nvidia in guest, amd in linux
<rqou>
apparently amdgpu got debugged a whole bunch in the meantime
<azonenberg>
awygle: yeah its pretty bad
<rqou>
wait you guys have smoke too?
* azonenberg
looks at amazon tracking to see when his respirators will be here
<azonenberg>
awygle: the whole PNW is downwind of a bunch of massive fires in BC
<awygle>
THIS IS WHY I MOVED OUT OF CALIFORNIA (no it's not)
<rqou>
there's apparently a fire near here too that's been put out
<awygle>
at least my hair isn't covered in ash, so still better than my childhood near LA
<rqou>
anyways, my f*cking amazon order that i rather urgently need still isn't here
<azonenberg>
... arriving tomorrow by 20:00
<rqou>
should have been here yesterday
<azonenberg>
Welp, better than nothing i guess
<rqou>
i've had multiple problems with amazon recently
<awygle>
so what's everybody working on this evening?
<rqou>
sysadmin-ing
<azonenberg>
Debugging some PLL code, then prepping stuff for a camping trip later in the month
<azonenberg>
have to vacuum pack a bunch of trail snacks etc
<rqou>
oooh radv started working in the meantime as well
<awygle>
still getting thermal/timing/starshipraider setup going?
<rqou>
woo ANV works too
<rqou>
except it spams the console with a fixme
<azonenberg>
awygle: i've tabled that for a bit
<azonenberg>
have to assemble some more greenpak breakouts
<azonenberg>
Which are in my non-AC'd garage
<azonenberg>
i really dont want to go in there right now :p
* awygle
understands completely
* awygle
is jealous of your AC'd anything
<azonenberg>
I have no AC
<azonenberg>
But the garage is also uninsulated
<azonenberg>
vs the house
<awygle>
ah
<rqou>
hmm btw azonenberg remember me complaining about my laptop on battery at defcon?
<rqou>
if i plug it in and run higan, it runs at full speed
<azonenberg>
you probably have throttling set to powersave
<rqou>
because it boosts the cpu from ~1.6 ghz to ~2.9 ghz
<azonenberg>
vs performance
<rqou>
i don't recall configuring this or how to change it
<azonenberg>
IDK about whatever GUI you have
<azonenberg>
but /sys/devices/system/cpu/cpu*/cpufreq/
<azonenberg>
scaling_available_governors shows the schemes installed in the kernel
<azonenberg>
write the name of one to scaling_governor to set throttling for that core
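A minimal sketch of the sysfs interface azonenberg is describing, in Python; writing the governor needs root, and "performance" is just one of the names scaling_available_governors might list:

    import glob

    # show what the kernel offers, then switch every core to "performance"
    for cpu in sorted(glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq")):
        with open(cpu + "/scaling_available_governors") as f:
            print(cpu, "->", f.read().strip())
        with open(cpu + "/scaling_governor", "w") as f:
            f.write("performance")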
<rqou>
i'm quite impressed how well anv/radv work
<rqou>
now if only team green could get more open
<rqou>
whoops, never mind
<rqou>
just got a gpu hang
<awygle>
azonenberg: what do you typically use for massive CPU concurrency? MPI?
<rqou>
bash? :P
<rqou>
i've only written very trivial parallel programs :P
<azonenberg>
awygle: MPI is meant for closely coupled stuff
<awygle>
more like mosh :P
<azonenberg>
on a heterogeneous cluster
<azonenberg>
It can work but requires a bunch of setup
<awygle>
yeah i know ~nothing about cluster stuff
<rqou>
awygle: i used openmp thanks to 61c :P
<rqou>
does that count?
<azonenberg>
openmp is for single host SMP stuff
<azonenberg>
For distributed stuff, honestly, i prefer plain old sockets
<rqou>
for distributed, i've always done it with bash and scp :P :P
<awygle>
just roll your own? client-server?
<azonenberg>
have a daemon on each worker system (or each core, whatever is easier)
<azonenberg>
Yes
<awygle>
hm, k
<azonenberg>
Works much better than MPI for strange topologies
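A minimal sketch of the roll-your-own worker daemon being described, in Python; the one-command-per-line protocol and the port number are made up for illustration:

    import socketserver
    import subprocess

    class JobHandler(socketserver.StreamRequestHandler):
        def handle(self):
            # read one shell command per line from the dispatcher,
            # run it locally, and stream the output back
            for line in self.rfile:
                result = subprocess.run(line.decode().strip(), shell=True,
                                        capture_output=True, text=True)
                self.wfile.write(result.stdout.encode())

    # one of these daemons runs on every worker host (or per core);
    # the dispatcher just opens a TCP connection to each and sends jobs
    with socketserver.ThreadingTCPServer(("0.0.0.0", 5000), JobHandler) as srv:
        srv.serve_forever()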
<azonenberg>
But before we worry about specifics
<rqou>
nobody likes bash+scp? :P
<rqou>
babby's first distributed system?
<azonenberg>
figure out algorithms for CPU concurrency
<azonenberg>
in PAR
<rqou>
actually, i just remembered i've also used mapreduce
<azonenberg>
Once we have some papers picked out etc, we then look at specifics of implementation
<rqou>
awygle: did 61c have "mapreduce on ec2" when you took it?
<awygle>
i'm a "learn by doing" kind of guy tbh
<awygle>
so i've been doing some prototyping to make sure i actually understand these algorithms
<azonenberg>
awygle: sure but the choice of topology etc will depend in part on the algorithm we pick
<azonenberg>
and i want to start with a known one, then build out from there
<awygle>
rqou: we did a hadoop project. that was the one i did drunk.
<rqou>
oh right yeah hadoop
<awygle>
classmates: "How'd you do the hadoop project?" me: "I legitimately don't know"
<rqou>
lol
<rqou>
that sounds right even if you weren't drunk
<rqou>
was there cuda when you took it?
<awygle>
azonenberg: sure but that goes both ways, i'll have a better idea of what to look for in algorithms if i understand the basics of how we'd want it to scale
<azonenberg>
awygle: True
<azonenberg>
So my planned use cases are threefold
<awygle>
rqou: "was there cuda", yes. "was there cuda in 61c", no
<azonenberg>
first: daemons on every dev's workstation
<rqou>
ah
<azonenberg>
whenever someone does a build, you launch it on everyone's desktop
<azonenberg>
(within a corp office etc)
<azonenberg>
second: dedicated rack in the lab for doing PAR runs
<rqou>
when i took 61c there was a project involving cuda
<azonenberg>
third: EC2 instances or similar
<rqou>
it was duct-taped onto the "matrix multiply" project
<azonenberg>
Ideally, i want scalability from a single instance on a single workstation up to a few thousand cores
<rqou>
idk if they had that when you took it
<awygle>
rqou: fuck that project for real
<rqou>
lool
<azonenberg>
But even scaling to a few hundred with good linearity is waaay better than what e.g. vivado does
<rqou>
hilariously, the cuda implementation was significantly slower than the cpu implementation
<rqou>
and yes, this was expected
<awygle>
rqou: i got like a B+ on that project. a kid in my class got *hired by intel* after it.
<awygle>
people went fucking insane
<rqou>
also hilariously, before multi-core was added, the project ran faster on my laptop vs the hive workstations
<rqou>
because my laptop is haswell and has avx, and hive doesn't
<awygle>
azonenberg: i see. and i think i get why we've got different opinions re: GPUs
<awygle>
because "dude in his garage" (or in my case, "on an air matress" isn't on your list
<azonenberg>
Correct
<rqou>
meanwhile i was working on that project while i was busy with a whole bunch of other stuff, so i just hit the minimum performance requirement and turned it in :P
<azonenberg>
I want it to run down to a few cores on one desktop
<azonenberg>
so somebody without ten racks of servers or a lot of EC2 credit can use it
<azonenberg>
But the goal is to allow test-driven development and continuous integration flows for large HDL projects
<azonenberg>
Which means design iterations in seconds to minutes, not hours to days
<awygle>
i agree with that, but i think the best (metaphorical) power-delay product in that space is GPU support
<azonenberg>
We will still need scaling across multiple hosts, one GPU isn't enough for my plans
<awygle>
esp. if it can cover more than one use case, i.e. me with my laptop and somebody with EC2 G instances (it's G right?)
<rqou>
i really don't want to start with GPUs either because of just how painful debugging/optimizing/profiling GPU compute is
<azonenberg>
If we use GPUs host-side, that's fine within reason
<azonenberg>
And that too
<azonenberg>
If we design an algorithm for classical multithreading we can always push the compute-intensive stuff to GPUs if it seems optimal
<azonenberg>
My experience with GPUs is that they're devices for turning compute-bound problems into memory-bandwidth-bound problems :p
<awygle>
azonenberg: lol. not necessarily inaccurate
<rqou>
hmm, windows update lies and does something special when it hits 90% downloaded
<rqou>
also, my qemu IO is slow as shit
<rqou>
thanks btrfs
<azonenberg>
awygle: in particular, GPUs are great at data-parallel code with very little branching on a fairly small amount of data (or a large amount of data processed in a sliding window by many threads with spatial locality to each other) that does a lot of floating point math
<azonenberg>
Like machine vision or image processing or DSP in general
<rqou>
eh, integer math is "okay" now
<azonenberg>
For branchy integer stuff that jumps all over the place and is full of random access operations, like a PAR probably is
<azonenberg>
i'm skeptical the GPU will buy you much
<azonenberg>
you'll spend most of your time waiting on RAM fetches
<azonenberg>
But until we have an algorithm figured out, i can't do that analysis with anything more than a gut feeling
<awygle>
azonenberg: sure, gotcha
<rqou>
yeah, i personally would start with a non-parallel non-optimized implementation in something like python
<azonenberg>
I did a lot of CUDA stuff back in the day, i wrote the fastest CUDA password cracker for $1$ crypt(3) hashes (BSD-MD5) back in 2009
<rqou>
wait you used to do cuda?
<azonenberg>
this was back before GPUs even had L2 caches
<rqou>
that just seems very surprising to me
<awygle>
just to repay your "here's why i'm leaning CPU" with my "here's why i'm leaning GPU"
<azonenberg>
geforce 8000/9000 era, Tesla C1060
<awygle>
i have a little experience with CUDA (not a ton but more than "a million x86 cores")
<azonenberg>
so you had to use the shared memory as a software-managed cache, explicitly prefetching memory many clocks before you needed it to avoid latency
<azonenberg>
etc
<rqou>
i tried running cuda on a geforce 8500 (oem card) and noticed how slow it was
<awygle>
i have a couple decent GPUs
<awygle>
(and no EC2 credit currently)
<azonenberg>
I have what was once an $8000 GPU workstation sitting in my garage
<rqou>
wait how did you get access to a tesla?
<azonenberg>
now, the four tesla C1060s are probably worth $50 each and will likely become microscope food :p
<azonenberg>
rqou: consulting gig a while back, as part of my payment i got to keep the rig
<rqou>
damn
<azonenberg>
tl;dr company folded and i inherited some of the stuff
<awygle>
and most of the papers i've found are in CUDA (tbh academia seems to use "parallel" as a synonym for "runs on a GPU")
<azonenberg>
awygle: you're reading the wrong papers
<azonenberg>
HPC stuff does a LOT with zillions of CPU cores
<awygle>
seems likely
<azonenberg>
I have not seen any papers on parallel PAR doing such an architecture
<rqou>
anyways, my overkill gaming rig has a gtx1080 if you want me to test stuff
<rqou>
so dumb question: what is it about AMD's architecture that makes cryptocurrency miners prefer it so much more?
<azonenberg>
i have the paper floating around somewhere...
<azonenberg>
sec
<awygle>
i'll spend some time tonight on nonspecific HPC papers, like i said my background in that area is weak. not gonna make me an expert but at least i might pick up some of the language
<azonenberg>
rqou: apparently i had two cracker.pdf in different dirs
<azonenberg>
that were different papers...
<awygle>
(yes. i learned MIPS. kind of.)
<rqou>
if you took it with dan garcia then you also learned the joy of self-modifying code :P
<awygle>
na it was katz and... hennessy? patterson? one of those guys
<awygle>
probably patterson, that sounds right
* azonenberg
a bit jelly that you actually took a class from the authors of H&P
<awygle>
yeah i had no fukkin clue tbh
<rqou>
can anybody explain to me why windows has to "prepare" to install updates?
<rqou>
why can't it just install them like a normal OS?
<azonenberg>
Because windows has no concept of package management? :p
<rqou>
but it _does_
<rqou>
windows installer has _some_ idea
<awygle>
azonenberg: do you have a recommendation for a decent "overview" type paper in the HPC space?
<lain>
rqou: I'm probably wrong but I think the preparing to install step is related to system restore points or something
<azonenberg>
awygle: unfortunately no
<azonenberg>
i took a class on it back in 2009 and otherwise have just fooled around and tinkered
<azonenberg>
my experience is more with distributed stuff than supercomputing, although i did do some matrix multiplication homework on a BlueGene/L
<rqou>
lain: even better, wtf is "Getting Windows ready?"
<lain>
rqou: just a guess but windows does some trickery for fast boot where they basically save a minimal hibernation state on first boot after an update, so that may be it building the hibernation state
<rqou>
wow really?
<lain>
yeah
<rqou>
that's quite impressive
<lain>
I know they do that, just not sure if that's what "getting windows ready" is
<rqou>
that's even more advanced than a "prelinked kernel"
<rqou>
that a certain other vendor uses
<lain>
that's why my laptop boots in like two seconds haha
<lain>
you can bypass the image at boot time but I forget the incantation
<rqou>
now it's actually "Working on updates"
<lain>
haha
<rqou>
i wonder how much of this is windows being shitty and how much of this is btrfs being shitty
<rqou>
or how much of this is spinning rust being shitty because this vm isn't on an ssd
<lain>
the windows upgrade process is pretty slow, much more so on rust because it involves a crapton of small disk io ops
<rqou>
which is also slow hitting btrfs :P
<rqou>
so it's slow^3
<rqou>
hyperslow :P
<lain>
haha
<rqou>
alright, my amazon order finally arrived
<rqou>
i can single-click again!
<rqou>
can someone explain to me why windows can't install all updates at once rather than a batch at a time?
<rqou>
are DAGs really that complicated?
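For what it's worth, ordering a dependency DAG is a solved problem; a toy topological sort in Python, with made-up update names:

    from graphlib import TopologicalSorter

    # each update maps to the set of updates it depends on
    deps = {
        "kb-rollup":   {"kb-servicing-stack"},
        "kb-dotnet":   {"kb-servicing-stack"},
        "kb-defender": set(),
    }
    # one pass yields an install order for everything at once
    print(list(TopologicalSorter(deps).static_order()))
    # e.g. ['kb-servicing-stack', 'kb-defender', 'kb-rollup', 'kb-dotnet']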
<rqou>
whelp, firefox lost all my tabs
<rqou>
tab bankruptcy time
<lain>
lol
<lain>
Firefox is a dumpster fire in terms of session state
<awygle>
i wish i could get papers in audiobook form... would make my commute more useful
<awygle>
sadly even if i ever finish my James Marsters Text-to-Speech bot it will not handle diagrams
<rqou>
ugh, why is the android rom situation such a clusterf*ck?
<cyrozap>
The tl;dr of that being, unless you _really_ have a lot of data, running jobs on a single host can actually be a lot faster.
<cyrozap>
And the larger lesson is, in order to write really fast code, you have to understand the problem really well. i.e., "Is it I/O-bound?", "CPU-bound?", etc.
<rqou>
ugh this is really dumb
<cyrozap>
I got many lessons in this when I was optimizing a OpenCL miner for a particular cryptocurrency, and learned even more when I tried writing an FPGA miner.
<rqou>
why is so much android ROM manipulating software built for windows?
<rqou>
i just want to patch my kernel, why is this so hard?
<cyrozap>
For instance, it's often faster to tolerate bad results from the GPU if that makes the GPU code run faster, the CPU can perform a check on the results, and the bad results don't saturate the GPU<->CPU bus bandwidth. The relevance to cryptocurrency mining is that it's usually faster to just check whether the upper N bits of the hash are below the target, where N is the ALU data width. Then the CPU runs a check on the full hash returned from the GPU before submitting it to the network.
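A toy version of that quick-check/full-check split, in Python with made-up numbers (in a real miner the quick check runs on the GPU):

    import hashlib
    import os

    TARGET = 1 << 224   # hypothetical 256-bit difficulty target
    ALU_BITS = 32       # pretend the GPU compares only this many top bits

    def quick_check(h: bytes) -> bool:
        # GPU-side approximation: cheap, admits a few false positives,
        # never rejects a real solution
        return int.from_bytes(h[:ALU_BITS // 8], "big") <= TARGET >> (256 - ALU_BITS)

    def full_check(h: bytes) -> bool:
        # CPU-side: verify the whole hash before submitting to the network
        return int.from_bytes(h, "big") <= TARGET

    h = hashlib.sha256(os.urandom(80)).digest()
    if quick_check(h) and full_check(h):
        print("share found, submit it")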
<cyrozap>
Like, if you need random numbers, do you need _truly_ random numbers? Or will a PRNG work?
<cyrozap>
In this instance, by using a PRNG running on the GPU, the author was able to parallelize the particle generation and was only limited by the GPU speed, instead of being limited to the rate the CPU could send random numbers to the GPU.
<cyrozap>
btw, I'm writing all of this because I was once one of those people who thought you could make any algo faster by putting it on a GPU/FPGA :P
<cyrozap>
rqou: Have you tried _not_ using Windows ROM manipulation software? There're better programs for Linux.
<rqou>
well, i can't seem to find it
<rqou>
there just seem to be giant messes all over XDA
<cyrozap>
The kernel is usually stored as a separate partition, so it should only be a matter of taking your newer kernel, converting it to the Android image format, and then writing that to flash at the right offset.
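The usual shape of that flow, shelling out to the stock AOSP tools from Python; every file name and the base address below are hypothetical and device-specific:

    import subprocess

    # pack the rebuilt kernel plus the stock ramdisk into an Android boot image
    subprocess.run(["mkbootimg",
                    "--kernel",  "zImage-new",
                    "--ramdisk", "ramdisk.cpio.gz",
                    "--base",    "0x10000000",   # device-specific load address
                    "-o",        "boot-new.img"], check=True)
    # write it to the boot partition from the bootloader
    subprocess.run(["fastboot", "flash", "boot", "boot-new.img"], check=True)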
<rqou>
well, i haven't even gotten to that part yet
<rqou>
arrgh f*ck it i'm just going to "port" sunxi-debug into my kernel
<jn__>
can't you run an sshd as root or something?
<rqou>
no, because reasons
<rqou>
android sucks
<azonenberg_work>
rqou: so i just successfully untechmapped a trivial greenpak design (the Inverters test case)
<rqou>
congrats
<rqou>
i'm still fucking with my phone
<azonenberg_work>
no bitstream involved
<azonenberg_work>
I ran it through synth_greenpak4
<azonenberg_work>
then techmap with cells_sim.v
<azonenberg_work>
then optimized and write_verilog
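The flow azonenberg_work describes, as a yosys script invoked from Python; the file names are placeholders:

    import subprocess

    # synth to GP_* cells, expand them back out through the behavioral
    # models in cells_sim.v, clean up, and emit plain verilog again
    script = ("read_verilog inverters.v; synth_greenpak4; "
              "techmap -map cells_sim.v; opt_clean; "
              "write_verilog untechmapped.v")
    subprocess.run(["yosys", "-p", script], check=True)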
<rqou>
yeah, that's the procedure i was thinking too
<azonenberg_work>
next step is to go through a par'd bitstream and back, which means adding a bit more bitfile twiddling logic to handle a few special cases
<azonenberg_work>
as well as massively expanding cells_sim
<azonenberg_work>
i may actually split cells_sim up into two files
<azonenberg_work>
cells_sim_ams which has analog/mixed signal cells (not untechmapped, kept as black box IP, and not supported for simulation)
<azonenberg_work>
and cells_sim_digital which is fully behavioral verilog and supported for untechmapping
<rqou>
how are you planning to test the ams parts?
<azonenberg_work>
It'll just lift them up to a raw primitive instantiation
<azonenberg_work>
which i can check against the original HDL
<azonenberg_work>
in either case i have to do a lot more work on the counter sim model to make it useful for untechmapping
<rqou>
i was referring to the problem where i'm not aware of an open-source verilog-ams simulator
<azonenberg_work>
Oh
<azonenberg_work>
i wasnt going to simulate them
<azonenberg_work>
i was going to have them as empty blackbox modules
<azonenberg_work>
yosys needs a cells_sim as blackbox to synthesize properly vs giving 'undeclared module' errors
<rqou>
then how do you know it works?
<azonenberg_work>
know what, the untechmapping or what?
<azonenberg_work>
if you have a GP_ACMP in your bitstream yosys will spit out a GP_ACMP instance from the untechmap
<rqou>
how do you know the simulation models work?
<azonenberg_work>
There *are no simulation models* for the mixed signal stuff
<rqou>
or you don't really care at this point?
<azonenberg_work>
they are literally empty modules
<rqou>
i thought you were saying you were going to write some?
<azonenberg_work>
No
<azonenberg_work>
Right now if i untechmap with cells_sim
<azonenberg_work>
all AMS IP will disappear
<azonenberg_work>
b/c they are empty modules
<azonenberg_work>
So i'm going to move them into a separate file that is used for synthesis as module declarations
<azonenberg_work>
but not for untechmapping
<rqou>
ah ok
<azonenberg_work>
i.e. it will only untechmap hard IP with a pure behavioral model
<azonenberg_work>
cells_sim will be `include cells_ams, cells_digital
<azonenberg_work>
cells_ams will be empty modules
<azonenberg_work>
cells_digital will be behavioral verilog
<azonenberg_work>
make sense now?
<rqou>
yeah
<rqou>
i thought you were going to actually go and write cells_ams
<azonenberg_work>
No
<azonenberg_work>
Not for a while at least
<rqou>
aargh i tried porting "sunxi-debug" to my phone and selinux is getting in my way
<rqou>
i f*cking hate android
<rqou>
wow, how many binary blobs do you need here?
<rqou>
apparently repacking an android ramdisk is rocket science
<rqou>
apparently you need to be root to successfully pack/unpack a ramdisk
<rqou>
arrgh wtf this doesn't work either
<rqou>
azonenberg_work: at least one thing that's good is that it's hopefully somewhat harder to trivially sploit an android phone now
<rqou>
ok, something in the kernel breaks when i call commit_creds
<rqou>
ugh the free-electrons lxr is totally broken now
<rqou>
can't search for anything
<_awygle>
Lazy Internet fact-check - is it correct to say that the most recent Verilog standard was released in 2005, and that SystemVerilog has a 2005, a 2009, and a 2012 version? Or is 2009 just SV-05 plus Verilog-05?
<pie_>
rqou?
<rqou>
i'm not positive, but i believe systemverilog is now a superset of verilog
<pie_>
(sorry i just know youve read at least some of the spec :P)
<rqou>
if you can't find them on sci-hub, i'll send you a copy from behind the paywall
<_awygle>
I'm sure they're in SciHub I'm just still at work :-P
<_awygle>
Lattice Diamond is the least friendly environment I've ever worked with
<rqou>
i've never even installed diamond
<rqou>
first of all, diamond != icecube
<rqou>
i've also never used icecube, because why would i? :P
<_awygle>
I am not licensed to run two simulations at once >_<
<rqou>
heh, i've hit this problem before too :P
<_awygle>
I wonder if IT would let me install icarus
<rqou>
aaargh now i'm mad
<rqou>
the reason Magisk was bootlooping my phone has nothing to do with any of its tricks
<rqou>
it's just its stupid stupid attempts to patch out dm-verity
<rqou>
and FDE
<rqou>
neither of which i even want
<rqou>
now i can remove the backdoor from my kernel
<jn__>
what is Magisk?
<rqou>
yet another one of those superuser manager things
<rqou>
the old supersu one had some drama or something
<rqou>
also, this one has better safetynet tricking
<rqou>
whelp, wasted close to half a day on that
<_awygle>
rqou: what nexus did you have?
<rqou>
i previously had a nexus 6p and got hit by the infamous battery problem
_awygle has quit [Remote host closed the connection]
azonenberg_work has quit [Ping timeout: 240 seconds]