jemc changed the topic of #ponylang to: Welcome! Please check out our Code of Conduct => https://github.com/ponylang/ponyc/blob/master/CODE_OF_CONDUCT.md | Public IRC logs are available => http://irclog.whitequark.org/ponylang | Please consider participating in our mailing lists => https://pony.groups.io/g/pony
alxs has joined #ponylang
codec2 has quit [Read error: Connection reset by peer]
codec1 has quit [Read error: Connection reset by peer]
alxs has quit [Quit: Computer's gone to sleep. ZZZzzz…]
gokr has quit [Ping timeout: 248 seconds]
alxs has joined #ponylang
alxs has quit [Client Quit]
alxs has joined #ponylang
alxs has quit [Ping timeout: 256 seconds]
HarryHaaren has quit [Quit: bye all]
_whitelogger_ has joined #ponylang
DDR has joined #ponylang
jmiven has joined #ponylang
alxs has joined #ponylang
alxs has quit [Ping timeout: 246 seconds]
khan has joined #ponylang
jemc has quit [Ping timeout: 252 seconds]
jemc has joined #ponylang
jemc has quit [Ping timeout: 248 seconds]
khan has quit [Quit: khan]
khan has joined #ponylang
khan has quit [Quit: khan]
khan has joined #ponylang
khan has quit [Client Quit]
khan has joined #ponylang
jemc has joined #ponylang
jmiven has quit [Quit: co'o]
jmiven has joined #ponylang
_whitelogger has joined #ponylang
jemc has quit [Ping timeout: 264 seconds]
khan has quit [Quit: khan]
samuell has joined #ponylang
dipin has quit [Quit: dipin]
codec1 has joined #ponylang
gokr has joined #ponylang
gokr has quit [Ping timeout: 264 seconds]
user10032 has joined #ponylang
khan has joined #ponylang
khan has quit [Client Quit]
khan has joined #ponylang
dipin has joined #ponylang
jemc has joined #ponylang
jemc has quit [Ping timeout: 276 seconds]
gokr has joined #ponylang
khan has quit [Quit: khan]
khan has joined #ponylang
winksaville has joined #ponylang
khan has quit [Client Quit]
khan has joined #ponylang
<winksaville> @dipin, yt?
<dipin> @winksaville, yes... i'm here...
<dipin> what's up?
<winksaville> I've run pony-ring for 1 hour with --ponyminthreads 9999 and --ponynoblock
<winksaville> It did NOT hang
<dipin> yay! that's good to know that the disabling of dynamic scheduler scaling does indeed restore the correct behavior
<winksaville> It was built with build/debug-scheduler_scaling_pthreads/ponyc
<dipin> btw, you probably didn't notice yet, but i just opened a PR for the pthreads variant of dynamic scheduler scaling.. it also adds in some assertions that should help identify any incorrect behavior for the signals variant
<winksaville> What would you like me to try next?
<winksaville> I did see it, which is why I'm asking what you'd like me to try next
<dipin> Ah, okey.. Ummm... i guess the PR version would be the next step.. both with pthreads and signals... initially as debug.. and assuming the debug version works without any issues (a big assumption), then as release
<winksaville> k, what about the patch that tests atomics?
<dipin> keep in mind that the PR would not address "The third is where the no threads are suspended but there is still a hang (your last backtrace). My guess is this happens because there's an edge case around quiescence and thread waking/sleeping that I missed."
alxs has joined #ponylang
<dipin> I'll redo a variant of the atomics patch on top of the PR commit since at least the pthreads implementation would still hang no matter what due to the incorrect use of the pthread condition variables
<dipin> we can run the atomics patch on top of the PR commit if there are still hangs with the PR changes in place (except for the one i noted that the PR wouldn't address)
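For context on the condition-variable remark: the log does not say what exactly was wrong in the pthreads variant. One common way such code hangs is a lost wakeup, where the waker signals before the sleeper has started waiting. The sketch below shows the usual guard against that: a predicate checked in a loop while holding the mutex. All names here (sleeper_t, sleeper_init, sleeper_suspend, sleeper_wake) are hypothetical and are not the ponyc runtime's API. Returning the signal's status from the wake call only loosely mirrors the "ponyint_thread_wake return an error" idea discussed in this conversation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical per-thread sleep state; not taken from the ponyc sources.
typedef struct sleeper_t
{
  pthread_mutex_t mut;
  pthread_cond_t cond;
  bool wake_requested; // the predicate guarded by mut
} sleeper_t;

// Initialise the mutex, condition variable and predicate.
bool sleeper_init(sleeper_t* s)
{
  assert(s != NULL);
  s->wake_requested = false;

  if(pthread_mutex_init(&s->mut, NULL) != 0)
    return false;

  if(pthread_cond_init(&s->cond, NULL) != 0)
  {
    pthread_mutex_destroy(&s->mut);
    return false;
  }

  return true;
}

// Block until a wake has been requested. The while loop covers both
// spurious wakeups and a wake request that arrived before we got here.
void sleeper_suspend(sleeper_t* s)
{
  assert(s != NULL);
  pthread_mutex_lock(&s->mut);

  while(!s->wake_requested)
    pthread_cond_wait(&s->cond, &s->mut);

  s->wake_requested = false; // consume the request
  pthread_mutex_unlock(&s->mut);
}

// Request a wake. The predicate is set under the mutex, so the request
// is recorded even if nobody is waiting yet. Returns false if the
// signal call reported an error.
bool sleeper_wake(sleeper_t* s)
{
  assert(s != NULL);
  pthread_mutex_lock(&s->mut);
  s->wake_requested = true;
  int rc = pthread_cond_signal(&s->cond);
  pthread_mutex_unlock(&s->mut);
  return rc == 0;
}
```

With this shape, calling sleeper_wake before sleeper_suspend does not hang, because the suspend call sees the predicate already set and never waits.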
alxs has quit [Client Quit]
<winksaville> Right at the moment I've also got my corrected version of your "ponyint_thread_wake return an error" patch, what do you want to do with that?
<dipin> i just added the atomics patch for the PR code to the issue: https://github.com/ponylang/ponyc/issues/2451
<dipin> i included the "ponyint_thread_wake return an error" logic in the PR for the pthreads fix
<dipin> you can revert all changes and use the PR commit as a starting point
<dipin> it should be complete and include everything we've discussed/tried so far
<winksaville> To minimize screw ups on me patching things, how about you create a branch on your fork that I'll fetch and apply to my fork
<winksaville> And let's both sync to TOT of master so we're starting at the same point
<dipin> i updated to master right before i opened the PR from that branch
<winksaville> What is the SHA1 for your master?
<winksaville> I'm on b9554803b51e1426364e1144e6834def14934e5e right now
<dipin> the last commit i have from upstream ponyc is 28b67dd22e4a91cd736c69b7beaf9e7529c01428 by Sean from 16 hours ago
<winksaville> I've got to update my fork, doing that now
<dipin> okey dokey
<winksaville> k, I'm on 28b67dd22e4a91cd736c69b7beaf9e7529c01428 too
<dipin> the branch "fix_pthread_scheduler_scaling" in my fork is exactly 1 commit ahead of that
<winksaville> If we use my pony-ring for testing I won't need to cherry-pick my openssl PR, can you grab that and see that it compiles and hopefully fails for you?
<dipin> sure, i'll do that now
<winksaville> great, I'll pull your fix_pthread... and see how it goes on my side
Pyrrh has joined #ponylang
<dipin> i have it built and running.. waiting for it to hang (it takes a lot longer on my vm than it does for you).. will let you know
<winksaville> Jeez, I forget how to fetch/pull a branch
<winksaville> can you help me
<dipin> wait, nevermind.. it hung already
<dipin> yes,.. first add a remote: git remote add dhp https://github.com/dipinhora/ponyc
<dipin> then fetch the remote: git fetch dhp
<dipin> then you can reference any branch in that remote as dhp/branch... so you can switch to the pthread fix branch as: git checkout dhp/fix_pthread_scheduler_scaling
<dipin> at least, that's how i end up doing it for upstream... there's likely a better way
<winksaville> ok that seemed to work, I've cherry-picked that change into a branch off master I'm calling debug-2451-dhp-fix_pthread_scheduler_scaling
<winksaville> ok, what is the command line you've used for compiling ponyc
<dipin> okey, so, the hang i reproduced was off that master commit.. switching to my branch hasn't caused a hang for over 5 mins now
<dipin> make config=debug use=scheduler_scaling_pthreads
<winksaville> k, starting build
<dipin> once you have that built, can you try compiling builtin_test? (it should compile without any issues because it doesn't rely on openssl) ./build/debug-scheduler_scaling_pthreads/ponyc packages/builtin_test/
<winksaville> off master did you reproduce using pony-ring?
<dipin> yes, reproduced off that master commit 28b67dd22e4a91cd736c69b7beaf9e7529c01428
<winksaville> GREAT!!!!
<dipin> using `builtin_test` might be better than pony-ring because it runs very quickly compared to pony-ring (at least on my vm)
<winksaville> whoops, build failed, I needed default_pic=true for Arch
<dipin> ah, yes.... that's annoying
<winksaville> something the compiler should be able to figure out :)
<dipin> i'm still waiting for it to be able to figure out exactly what i intend and magically build it for me 8*P
<winksaville> someday, AI is going to do things like that
<dipin> sadly, i can never find it on the calendar so i can plan accordingly 8*/
<winksaville> Actually, I think one of the first uses of AI for programmers would be in testing, especially getting good coverage
<winksaville> build is done, going to try pony-ring
<dipin> k
<winksaville> started a looping test
koczurekk has joined #ponylang
<dipin> k.. let's see how far it gets
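The log does not show how the looping test was written. As a rough sketch of the idea, assuming a hang is detected simply by a run exceeding a time limit, a helper like the one below could run a test binary repeatedly and report the first run that does not finish. The names run_with_timeout and run_result_t are hypothetical, and the time limit is whatever the tester considers "long enough" for the program under test.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

typedef enum { RUN_OK, RUN_FAILED, RUN_HUNG } run_result_t;

// Run argv[0] with its arguments and wait up to timeout_secs for it to
// exit. A run that is still alive at the deadline is killed and reported
// as RUN_HUNG. (To inspect a hang in gdb you would leave it running instead.)
run_result_t run_with_timeout(char* const argv[], unsigned timeout_secs)
{
  assert(argv != NULL && argv[0] != NULL);

  pid_t pid = fork();
  if(pid < 0)
    return RUN_FAILED;

  if(pid == 0)
  {
    execvp(argv[0], argv);
    _exit(127); // only reached if exec failed
  }

  const struct timespec tick = {0, 50 * 1000 * 1000}; // poll every 50 ms
  unsigned long waited_ms = 0;

  for(;;)
  {
    int status = 0;
    pid_t r = waitpid(pid, &status, WNOHANG);

    if(r == pid)
    {
      if(WIFEXITED(status) && (WEXITSTATUS(status) == 0))
        return RUN_OK;
      return RUN_FAILED;
    }

    if(r < 0)
      return RUN_FAILED;

    if(waited_ms >= (unsigned long)timeout_secs * 1000)
    {
      kill(pid, SIGKILL);
      waitpid(pid, &status, 0); // reap the killed child
      return RUN_HUNG;
    }

    nanosleep(&tick, NULL);
    waited_ms += 50;
  }
}
```

A caller would invoke this in a loop and stop at the first RUN_HUNG. A quick-running program gets through many more iterations in the same wall-clock time, which raises the odds of hitting an intermittent race.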
HarryHaaren has joined #ponylang
khan has quit [Quit: khan]
khan has joined #ponylang
<winksaville> 5 min pretty good, considering it wouldn't make it for more than 20secs, i.e. just a few loops
<winksaville> before
<dipin> that's good
khan has quit [Client Quit]
khan has joined #ponylang
<winksaville> I'm going to stop it and see if playing with the parameters makes a difference, it looks like only a couple cpus are running 100% for any length of time
<dipin> i don't think the root cause is related to how much cpu is used but just how often it runs (i.e. race condition that only surfaces every once in a while)
<winksaville> Changed pass to 1,000,000, now it's running a long time, hasn't stopped yet but only 2 cpus are at 100%
<winksaville> Doesn't seem right
<dipin> it's why i think using "builtin_test" is good.. it allows for more runs because it finishes very quickly
<dipin> what doesn't seem right?
<winksaville> only 2 cpus are running full tilt
<winksaville> there are 1000 rings of 1000 nodes, so I think 6 of my 12 cpus should be maxed out
<winksaville> pony-ring runs quickly when pass is 10
<winksaville> ok now only one CPU is at 100%
<dipin> oh, well, that's because of the scheduler scaling
<dipin> it will shut down threads if they're not really being used
<winksaville> hmmm, there are 1000 rings each processing messages so there should be plenty of work to do
<dipin> if you have a lot of work but it's not parallel work, dynamic scheduler scaling will shut down the extra threads
<winksaville> I don't understand, there are 1000 different rings each with 1000 actors and each of the rings is passing a message from one node in the ring to the next.
<winksaville> seems a perfect parallel scenario
<winksaville> before the fix there would be 6 CPUs at 100%
<dipin> i would have to look at the code for the ring in more detail.. all i can say for sure is that the scheduler suspend code only kicks in when a scheduler thread would normally block (i.e. has no work to do or steal for a while)
<dipin> hmm. okey.. maybe i broke something then. 8*/
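To make the rule dipin describes concrete, here is a toy decision function. It is only a sketch of the behaviour as stated in this conversation: a scheduler thread is a candidate for suspension only when it has had nothing to run or steal for a while, and never below the configured minimum thread count. The type sched_pool_t, the function may_suspend, and the idle_threshold parameter are hypothetical and do not come from the ponyc runtime; how the real runtime handles a minimum larger than the number of threads is also an assumption here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical view of the scheduler pool; not the runtime's own types.
typedef struct sched_pool_t
{
  uint32_t active_threads; // scheduler threads currently running
  uint32_t min_threads;    // floor, in the spirit of --ponyminthreads
} sched_pool_t;

// Decide whether the calling scheduler thread may suspend itself.
bool may_suspend(const sched_pool_t* p, bool has_local_work,
  bool steal_succeeded, uint32_t idle_spins, uint32_t idle_threshold)
{
  assert(p != NULL);

  // Any runnable work, local or stolen, keeps the thread awake.
  if(has_local_work || steal_succeeded)
    return false;

  // Being idle only briefly is not enough ("for a while").
  if(idle_spins < idle_threshold)
    return false;

  // Never drop below the configured minimum. A very large minimum
  // (assumed here to behave like 9999 in the log) means no thread
  // ever suspends, i.e. scaling is effectively off.
  if(p->active_threads <= p->min_threads)
    return false;

  return true;
}
```

Under this model, a workload can contain a large number of actors and still leave most threads idle if, at any given moment, only a few actors actually have messages queued.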
<winksaville> let me use gdb and see what the state of affairs is
<winksaville> it just finished and started another run as I typed the above
<dipin> k
<winksaville> so it didn't hang, just didn't behave as I was expecting
<dipin> not hanging is a good starting point
<winksaville> yea, but we can accomplish that by setting minthreads to 9999 :)
<winksaville> and all 6 CPUs will be utilized
<dipin> also, when i was developing/testing the dynamic scheduler scaling logic, i relied on slfritchie's "examples/message-ubench/" program to test that it was working as expected (i.e. only scales down if there's not enough work)
<dipin> yes, the minthreads setting is there for folks who want maximum performance at the expense of wasted cpu cycles
<winksaville> what would you like me to try, I can revert the fix and double check that 6 of my 12 CPUs are used.
<winksaville> (I just pushed a change to pony-ring which allows you to pass the various parameters on the command line, you might want to pick that up)
<winksaville> I'm going to do a quick test with master without your change and verify CPU utilization
<dipin> can you try using the message-ubench program? let me dig up the arguments i used...
<winksaville> sure
<dipin> see this comment for details on how i tested with message-ubench during development: https://github.com/ponylang/ponyc/pull/2386#issuecomment-347705787
<dipin> let me know if you have any questions
<winksaville> k
<winksaville> So I tested master by itself, it's interesting
<dipin> well, i would expect so since master by itself has broken logic for how it wakes sleeping threads
khan has quit [Quit: khan]
khan has joined #ponylang
<winksaville> Here is something weird: with pass 10000 2 CPUs are 100%, but if pass is 1000000 only 1 CPU is at 100%
<winksaville> In any case, I lied: I'm not seeing 6 CPUs at 100%, but that would still be my expectation. And having 2 at 100% with pass 10,000 but only 1 at 100% with pass 1,000,000 obviously doesn't meet my expectation either
khan has quit [Client Quit]
khan has joined #ponylang
<winksaville> That's on master, let me try the same experiment with the fix.
<winksaville> dumb me, I hadn't recompiled with the master build :(
<winksaville> So on master with pass 10,000 and 1,000,000 2 CPUs are at 100%.
<winksaville> I compiled message-ubench on master and 6 CPUs are at 100%
<winksaville> I then recompiled with the fix and also 6 CPUs are at 100%, so that meets my expectation.
<dipin> k.. can you try message-ubench with the fix branch?
<dipin> k, that's good
<winksaville> So apparently passing messages between actors isn't considered work?
<winksaville> I've gtg for now, maybe back this evening, please add comments to issue 2451 with what you'd like me to try next.
<dipin> @winksaville message-ubench also only does message passing between actors so there's something else different between the two.. thanks for your help.. i'll stick a note in the issue for what i think would be worth trying next
koczurekk has quit [Quit: Leaving]
khan has quit [Quit: khan]
codec1 has quit [Read error: Connection reset by peer]
HarryHaaren has quit [Quit: bye all]
atk has quit [Quit: Well this is unexpected.]
atk has joined #ponylang