#systemtap on 2020-04-13 — irc logs at freenode.irclog.whitequark.org

2015-11-12 23:18 fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged

01:10 orivej has quit [Ping timeout: 250 seconds]

01:29 khaled has quit [Quit: Konversation terminated!]

03:02 hpt has joined #systemtap

03:24 orivej has joined #systemtap

03:58 orivej has quit [Ping timeout: 256 seconds]

06:32 orivej has joined #systemtap

07:08 yogananth has joined #systemtap

08:54 khaled has joined #systemtap

09:37 hpt has quit [Ping timeout: 256 seconds]

13:30 tromey has joined #systemtap

13:35 yogananth has quit [Quit: Leaving]

13:48 khaled has quit [Remote host closed the connection]

13:51 khaled has joined #systemtap

14:26 khaled has quit [Quit: Konversation terminated!]

14:32 khaled has joined #systemtap

15:08 <lindi-> Amy1: for(i=0;i<Nbody;i++){ should be for(i=0;i<Nbody-1;i++){

15:09 <lindi-> Amy1: since you want i+1 to stay below Nbody
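
A minimal sketch of the bound fix lindi- describes. The loop body under discussion is not shown in the log, so the function name `adjacent_deltas`, the `double` element type, and the difference computation are assumptions for illustration. Only the loop bound comes from the conversation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical loop body: store the difference between neighbouring
 * elements. It reads x[i + 1], so the last valid i is n - 2. */
void adjacent_deltas(const double *x, double *delta, size_t n)
{
    assert(x != NULL && delta != NULL);
    /* "i + 1 < n" is the same bound as "i < n - 1", but it does not
     * wrap around when n is 0 and the index type is unsigned. */
    for (size_t i = 0; i + 1 < n; i++) {
        delta[i] = x[i + 1] - x[i];
    }
}
```

With the original bound `i < n`, the final iteration would read `x[n]`, one element past the end of the array.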

15:12 <lindi-> Amy1: depending on the values for Nbody and Ndim you might benefit from chunking to keep the working set in the cache

15:13 <lindi-> Amy1: but without some benchmarks it's pretty hard to say

15:27 <Amy1> lindi-: Nbody is 4*1024, Ndim is 3.

15:28 <lindi-> Amy1: in that case your memory access pattern is not very optimal

15:28 <Amy1> I think the function is clear.

15:28 <lindi-> Amy1: think about how the cache gets used

15:28 <Amy1> lindi-: ?

15:29 <lindi-> Amy1: do the memory accesses stay in the L1 cache?
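
One rough way to approach that question is to estimate the working set from the sizes given above. The sketch below assumes the arrays hold `double` values and that each array has Nbody by Ndim elements. Neither is stated in the log, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes occupied by one nbody-by-ndim array of doubles (assumed type). */
size_t array_bytes(size_t nbody, size_t ndim)
{
    return nbody * ndim * sizeof(double);
}

/* Returns 1 if `narrays` such arrays together fit in `cache_bytes`. */
int fits_in_cache(size_t nbody, size_t ndim, size_t narrays,
                  size_t cache_bytes)
{
    assert(narrays > 0);
    return array_bytes(nbody, ndim) * narrays <= cache_bytes;
}
```

With Nbody = 4*1024 and Ndim = 3, one array of 8-byte doubles takes 4096 * 3 * 8 = 98304 bytes (96 KiB). That is larger than a 32 KiB L1 data cache, a common size that is assumed here rather than taken from the log.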

15:31 <Amy1> I can exchange i and l. That will give the accesses to pos and delta_pos better locality.

15:31 <Amy1> But the performance improves very little.
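
A sketch of what exchanging the i and l loops could look like. The actual loop body and array layout are not shown in the log. This example assumes a dimension-major layout (`pos[l * nbody + i]`) and a simple neighbour difference as the computation, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* i outer, l inner: each inner step jumps nbody elements in memory. */
void deltas_strided(const double *pos, double *delta_pos,
                    size_t nbody, size_t ndim)
{
    assert(pos != NULL && delta_pos != NULL);
    for (size_t i = 0; i + 1 < nbody; i++)
        for (size_t l = 0; l < ndim; l++)
            delta_pos[l * nbody + i] =
                pos[l * nbody + i + 1] - pos[l * nbody + i];
}

/* l outer, i inner: the inner loop walks consecutive addresses. */
void deltas_contiguous(const double *pos, double *delta_pos,
                       size_t nbody, size_t ndim)
{
    assert(pos != NULL && delta_pos != NULL);
    for (size_t l = 0; l < ndim; l++)
        for (size_t i = 0; i + 1 < nbody; i++)
            delta_pos[l * nbody + i] =
                pos[l * nbody + i + 1] - pos[l * nbody + i];
}
```

Both functions compute the same values and differ only in traversal order. Whether the interchange pays off depends on the real layout and on how much of the data already fits in cache, so measuring both is the only reliable check.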

15:31 <lindi-> Amy1: something like https://developers.redhat.com/blog/2014/03/10/determining-whether-an-application-has-poor-cache-performance-2/ might help you see if you get a lot of cache misses

15:33 <Amy1> lindi-: yeah, I used this.

15:33 <lindi-> Amy1: simple floating-point operations should be quite fast

15:34 <lindi-> so this is probably all memory-bound

15:34 <Amy1> yeah, you are right.

15:35 <Amy1> But I have tried many ways to improve it and only gained a 1.2x speedup.

15:35 <Amy1> lindi-: do you have a better solution?

15:35 <lindi-> switching to a GPU might help?

15:37 <Amy1> No, I can only use a single core.

15:37 <Amy1> But you can use SIMD.

15:37 <lindi-> is this some homework exercise? ;)

15:39 <Amy1> No, I think the question is very interesting.

15:39 <Amy1> Just want to optimize it.

15:42 <lindi-> sure, simd might help
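
A minimal sketch of a loop shape that compilers can usually turn into SIMD code on their own. It is not the code from the conversation: the function `scaled_add` and its arithmetic are hypothetical, chosen only to show unit-stride access and `restrict`-qualified pointers, which tell the compiler the arrays do not overlap.

```c
#include <assert.h>
#include <stddef.h>

/* out[i] = a[i] + scale * b[i] over contiguous arrays. The restrict
 * qualifiers promise that out, a and b do not alias each other. */
void scaled_add(double *restrict out, const double *restrict a,
                const double *restrict b, double scale, size_t n)
{
    assert(out != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + scale * b[i];
    }
}
```

Built with an optimization level such as `-O3` on GCC or Clang, a loop like this is a typical auto-vectorization candidate. If the code is memory-bound, as suggested earlier, the gain from SIMD alone may be limited.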

15:48 Amy1 has quit [Quit: WeeChat 2.2]

15:48 Amy1 has joined #systemtap

15:57 Amy1 has quit [Quit: WeeChat 2.2]

15:57 Amy1 has joined #systemtap

21:11 tromey has quit [Quit: ERC (IRC client for Emacs 28.0.50)]

21:40 drsmith has quit [Ping timeout: 258 seconds]

21:41 drsmith has joined #systemtap

22:02 sscox has quit [Ping timeout: 260 seconds]

22:21 sscox has joined #systemtap

23:06 orivej has quit [Ping timeout: 256 seconds]

23:07 orivej has joined #systemtap

23:28 orivej has quit [Ping timeout: 265 seconds]

23:37 khaled has quit [Quit: Konversation terminated!]