fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
orivej has quit [Ping timeout: 250 seconds]
khaled has quit [Quit: Konversation terminated!]
hpt has joined #systemtap
orivej has joined #systemtap
orivej has quit [Ping timeout: 256 seconds]
orivej has joined #systemtap
yogananth has joined #systemtap
khaled has joined #systemtap
hpt has quit [Ping timeout: 256 seconds]
tromey has joined #systemtap
yogananth has quit [Quit: Leaving]
khaled has quit [Remote host closed the connection]
khaled has joined #systemtap
khaled has quit [Quit: Konversation terminated!]
khaled has joined #systemtap
<lindi-> Amy1: for(i=0;i<Nbody;i++){ should be for(i=0;i<Nbody-1;i++){
<lindi-> Amy1: since you want i+1 to stay below Nbody
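The bound fix above can be illustrated with a minimal sketch. The original loop body is not shown in the log, so the function `adjacent_diff_sum` and its array `x` are hypothetical stand-ins for any loop that reads element `i + 1`:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example (not the code discussed in the channel):
 * sums the differences between adjacent elements. Because the body
 * reads x[i + 1], the last valid i is n - 2. Writing the condition as
 * i + 1 < n also avoids unsigned underflow when n is 0. */
double adjacent_diff_sum(const double *x, size_t n)
{
    assert(x != NULL);
    double sum = 0.0;
    for (size_t i = 0; i + 1 < n; i++) {
        sum += x[i + 1] - x[i];
    }
    return sum;
}
```

With `i < n` instead, the final iteration would read one element past the end of the array.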
<lindi-> Amy1: depending on the values for Nbody and Ndim you might benefit from chunking to keep the working set in the cache
<lindi-> Amy1: but without some benchmarks it's pretty hard to say
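A rough sketch of what chunking could look like, assuming the kernel is a pairwise loop over bodies (the log does not show it). The function `pair_sum_blocked`, the 1-D data, and the tile size `BLOCK` are all illustrative assumptions, and whether tiling helps would still need benchmarking:

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 256 /* hypothetical tile size; tune by benchmarking */

/* Hypothetical example: sum of (x[i] - x[j])^2 over all pairs i < j,
 * visited tile by tile so that each pair of tiles stays cache-resident
 * while it is being processed. */
double pair_sum_blocked(const double *x, size_t n)
{
    assert(x != NULL);
    double sum = 0.0;
    for (size_t ib = 0; ib < n; ib += BLOCK) {
        size_t iend = (ib + BLOCK < n) ? ib + BLOCK : n;
        for (size_t jb = ib; jb < n; jb += BLOCK) {
            size_t jend = (jb + BLOCK < n) ? jb + BLOCK : n;
            for (size_t i = ib; i < iend; i++) {
                /* In the diagonal tile, start at i + 1 so each pair
                 * is counted exactly once. */
                size_t jstart = (jb > i + 1) ? jb : i + 1;
                for (size_t j = jstart; j < jend; j++) {
                    double d = x[i] - x[j];
                    sum += d * d;
                }
            }
        }
    }
    return sum;
}
```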
<Amy1> lindi-: NBody is 4*1024, NDim is 3.
<lindi-> Amy1: in that case your memory access pattern is not very optimal
<Amy1> I think the function is clear.
<lindi-> Amy1: think about how the cache gets used
<Amy1> lindi-: ?
<lindi-> Amy1: do the memory accesses stay in the L1 cache?
<Amy1> I can exchange the i and l loops. That gives the accesses to pos and delta_pos better locality.
<Amy1> But the performance improves very little.
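The loop exchange mentioned above might look like the following sketch. The real layout of `pos` and `delta_pos` is not shown in the log; this assumes both are stored row-major as `[NDIM][nbody]`, and the function `add_delta` is a hypothetical stand-in for the actual update:

```c
#include <assert.h>
#include <stddef.h>

enum { NDIM = 3 };

/* Hypothetical example: pos and delta_pos are assumed to be flattened
 * [NDIM][nbody] arrays in row-major order. With the dimension index l
 * outermost and the body index i innermost, consecutive iterations
 * touch consecutive memory addresses. */
void add_delta(double *pos, const double *delta_pos, size_t nbody)
{
    assert(pos != NULL && delta_pos != NULL);
    for (size_t l = 0; l < NDIM; l++) {      /* outer: dimension */
        for (size_t i = 0; i < nbody; i++) { /* inner: contiguous bodies */
            pos[l * nbody + i] += delta_pos[l * nbody + i];
        }
    }
}
```

If the arrays were instead laid out as `[nbody][NDIM]`, the opposite loop order would be the contiguous one.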
<lindi-> Amy1: something like https://developers.redhat.com/blog/2014/03/10/determining-whether-an-application-has-poor-cache-performance-2/ might help you see if you get a lot of cache misses
<Amy1> lindi-: yeah, I used this.
<lindi-> Amy1: simple floating point operation should be quite fast
<lindi-> so this is probably all memory-bound
<Amy1> yeah, you are right.
<Amy1> But I have tried many ways to improve it and only gained a 1.2x speedup.
<Amy1> lindi-: do you have a better solution?
<lindi-> switching to a GPU might help?
<Amy1> No, I can only use a single core.
<Amy1> But you can use simd.
<lindi-> is this some homework exercise? ;)
<Amy1> No, I think the question is very interesting.
<Amy1> Just want to optimize it.
<lindi-> sure, simd might help
Amy1 has quit [Quit: WeeChat 2.2]
Amy1 has joined #systemtap
Amy1 has quit [Quit: WeeChat 2.2]
Amy1 has joined #systemtap
tromey has quit [Quit: ERC (IRC client for Emacs 28.0.50)]
drsmith has quit [Ping timeout: 258 seconds]
drsmith has joined #systemtap
sscox has quit [Ping timeout: 260 seconds]
sscox has joined #systemtap
orivej has quit [Ping timeout: 256 seconds]
orivej has joined #systemtap
orivej has quit [Ping timeout: 265 seconds]
khaled has quit [Quit: Konversation terminated!]