fche changed the topic of #systemtap to: http://sourceware.org/systemtap; email systemtap@sourceware.org if answers here not timely, conversations may be logged
lijunlong has joined #systemtap
<lijunlong> fche, stap gets the wrong vma address when the first two segments should be joined together.
<lijunlong> the patch is very simple: in the function stap_find_vma_map_info, we should use "addr >= entry->vm_start && addr <= entry->vm_end", so that the first two segments can be joined.
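The boundary condition proposed above can be sketched as follows. This is a minimal illustration, not the SystemTap runtime code: `struct vma_entry`, `entry_contains`, `entry_contains_or_abuts`, and `try_extend` are hypothetical names, and the sketch assumes `vm_end` is exclusive, as in /proc/<pid>/maps.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a VMA map entry; not the runtime's real struct. */
struct vma_entry {
    uint64_t vm_start; /* first address of the mapping (inclusive) */
    uint64_t vm_end;   /* one past the last address (exclusive) */
};

/* Strict containment: an address equal to vm_end is outside the entry. */
static bool entry_contains(const struct vma_entry *e, uint64_t addr)
{
    return addr >= e->vm_start && addr < e->vm_end;
}

/* Inclusive upper bound: also matches a mapping that starts exactly at vm_end,
 * which is what lets two adjacent segments be treated as one. */
static bool entry_contains_or_abuts(const struct vma_entry *e, uint64_t addr)
{
    return addr >= e->vm_start && addr <= e->vm_end;
}

/* Extend an existing entry when a new mapping starts inside it or right at
 * its end. Returns true when the new mapping was merged into the entry. */
static bool try_extend(struct vma_entry *e, uint64_t new_start, uint64_t new_end)
{
    if (!entry_contains_or_abuts(e, new_start))
        return false;
    if (new_end > e->vm_end)
        e->vm_end = new_end;
    return true;
}
```

With a strict `<` comparison, a segment beginning exactly at the previous segment's end would not match the existing entry and would end up as a separate entry; with `<=` it can be folded into it.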
khaled has quit [Quit: Konversation terminated!]
khaled has joined #systemtap
lijunlon1 has joined #systemtap
lijunlon1 has quit [Client Quit]
lijunlon1 has joined #systemtap
lijunlong has quit [Quit: Connection closed]
orivej has quit [Ping timeout: 252 seconds]
<fche> lijunlon1, ^^ kerneltoast
<kerneltoast> fche, yeah he's a colleague
<kerneltoast> sscox you should try this
<fche> (not sure why joining vs. having a new adjacent entry should make a difference, but haven't looked into it lately)
khaled has quit [Ping timeout: 246 seconds]
khaled has joined #systemtap
<lijunlon1> _stp_umodule_relocate finds the base address by task_group and _stp_module; only the first entry found is used.
<lijunlon1> stap_find_vma_map_info_user(tsk->group_leader, m, &vm_start, NULL, NULL)
<lijunlon1> the new adjacent entry is added by hlist_add_head_rcu, so it will be found first.
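Why insertion order matters for a first-match lookup can be shown with a plain singly linked list. This is a sketch under stated assumptions: `struct node`, `add_head`, `add_tail`, and `find_first` are hypothetical and only mimic the head/tail insertion behaviour discussed here; they are not the kernel's hlist API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list node: one recorded mapping of a module. */
struct node {
    int module_id;      /* which module the mapping belongs to */
    uint64_t vm_start;  /* start address of the mapping */
    struct node *next;
};

/* Insert at the head: the newest entry becomes the first one a walk sees. */
static void add_head(struct node **head, struct node *n)
{
    n->next = *head;
    *head = n;
}

/* Insert at the tail: the oldest entry stays first. */
static void add_tail(struct node **head, struct node *n)
{
    n->next = NULL;
    while (*head)
        head = &(*head)->next;
    *head = n;
}

/* Return the first entry for the module, like a lookup that stops at the
 * first match. */
static const struct node *find_first(const struct node *head, int module_id)
{
    for (; head; head = head->next)
        if (head->module_id == module_id)
            return head;
    return NULL;
}
```

If the lowest mapping of a module is recorded first and later segments are added at the head, `find_first` returns a later segment's start rather than the module's base; adding at the tail keeps the earliest entry in front.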
orivej has joined #systemtap
fdalleau_away is now known as fdalleau
fdalleau is now known as fdalleau_away
fdalleau_away is now known as fdalleau
<fche> ISTM _stp_umodule_relocate could avoid stopping at the first entry and do a more complete traversal ... so entry -extension- is just an optimization to reduce the number of entries, not something the result of the lookup depends on
<lijunlon1> if we do a more complete traversal, how do we know when to stop?
<fche> if we sort address ranges, as soon as we are past the current one of interest
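The stopping rule for sorted ranges can be sketched as below. This assumes the ranges are kept in an array sorted by ascending start address and do not overlap; `struct range` and `find_range` are hypothetical names, not part of the SystemTap runtime.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address range; end is exclusive. */
struct range {
    uint64_t start;
    uint64_t end;
};

/* Look up addr in ranges sorted by ascending start address.
 * Returns the index of the containing range, or -1 if there is none. */
static int find_range(const struct range *r, size_t n, uint64_t addr)
{
    for (size_t i = 0; i < n; i++) {
        if (r[i].start > addr)
            break;              /* every later range starts even higher */
        if (addr < r[i].end)
            return (int)i;      /* start <= addr < end */
    }
    return -1;
}
```

Because the array is sorted, the walk can end as soon as a range starts above the address of interest, so a full traversal is only needed for addresses beyond the last range.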
Deknos has left #systemtap [#systemtap]
<lijunlon1> I looked into the related code again, and you are right.
<fche> evergreen :)
<lijunlon1> so just changing hlist_add_head_rcu to hlist_add_tail_rcu should fix this issue.
<lijunlon1> But a complete regression test is needed
<lijunlon1> since we only get the first start address in stap_find_vma_map_info_user
<lijunlon1> there is no need to do a traversal
<lijunlon1> fche, I think we should keep only one entry, and always extend its end address if the module has already been added.
<lijunlon1> [root@fedora-lijunlong ljl]# cat /proc/931092/maps | grep "bin/php"
<lijunlon1> 5582fffd5000-5583000a5000 r--p 00000000 fd:00 25875616 /usr/bin/php
<lijunlon1> 5583001d5000-558300439000 r-xp 00200000 fd:00 25875616 /usr/bin/php
<lijunlon1> 5583005d5000-55830069a000 r--p 00600000 fd:00 25875616 /usr/bin/php
<lijunlon1> 55830094b000-5583009d5000 r--p 00776000 fd:00 25875616 /usr/bin/php
<lijunlon1> 5583009d5000-5583009d7000 rw-p 00800000 fd:00 25875616 /usr/bin/php
<lijunlon1> if we want to get the relative address, we should use addr - vm_start of the first map, which is 0x5582fffd5000
<lijunlon1> the relative loaded addresses are different from the offsets in the file, as the readelf output shows
<lijunlon1> readelf -l
<lijunlon1> Program Headers:
<lijunlon1>   Type  Offset             VirtAddr           PhysAddr
<lijunlon1>         FileSiz            MemSiz             Flags  Align
<lijunlon1>   LOAD  0x0000000000000000 0x0000000000000000 0x0000000000000000
<lijunlon1>         0x00000000000cf8a0 0x00000000000cf8a0 R      0x200000
<lijunlon1>   LOAD  0x0000000000200000 0x0000000000200000 0x0000000000200000
<lijunlon1>         0x0000000000263bd5 0x0000000000263bd5 R E    0x200000
<lijunlon1>   LOAD  0x0000000000600000 0x0000000000600000 0x0000000000600
<lijunlon1>         0x00000000000c4128 0x00000000000c4128 R      0x200000
<lijunlon1>   LOAD  0x0000000000776b50 0x0000000000976b50 0x0000000000976b50
<lijunlon1>         0x000000000008aa9e 0x00000000000a7e40 RW     0x200000
<lijunlon1> for now, _stp_kallsyms_lookup uses "rel_addr = addr - vm_start + sect_offset;" to get the relative address, which is wrong.
<lijunlon1> but most of the time the offset is equal to VirtAddr, so we rarely encounter errors, especially since most of the time we only look up function names.
<lijunlon1> for the last LOAD segment the offset is 0x776b50 and VirtAddr is 0x976b50, which are different.
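The difference between the two calculations can be worked through with the numbers quoted from /proc/931092/maps and readelf. This is a sketch, not the runtime's code: `rel_by_file_offset` and `rel_by_load_base` are hypothetical helpers, and it assumes that `sect_offset` in the quoted expression corresponds to the mapping's file offset.

```c
#include <stdint.h>

/* File-offset based calculation, in the shape of the quoted expression
 * "addr - vm_start + sect_offset" (sect_offset taken to be the file offset
 * of the mapping; that reading is an assumption). */
static uint64_t rel_by_file_offset(uint64_t addr, uint64_t vm_start,
                                   uint64_t file_offset)
{
    return addr - vm_start + file_offset;
}

/* Load-base calculation: subtract the start of the module's first mapping,
 * which gives an address comparable to the ELF VirtAddr column. */
static uint64_t rel_by_load_base(uint64_t addr, uint64_t load_base)
{
    return addr - load_base;
}
```

For address 0x55830094bb50 in the fourth mapping (start 0x55830094b000, file offset 0x776000), `rel_by_file_offset` gives 0x776b50, while `rel_by_load_base` with the first mapping's start 0x5582fffd5000 gives 0x976b50, the VirtAddr of the last LOAD segment. For the r-xp mapping at 0x5583001d5000 (file offset 0x200000) both give 0x200000, which fits the observation that the two usually agree.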
lijunlon1 has quit [Remote host closed the connection]
<fche> and this is with that hugepagesize binary right?
khaled has quit [Quit: Konversation terminated!]
orivej has quit [Ping timeout: 252 seconds]
fdalleau is now known as fdalleau_away
orivej has joined #systemtap
<agentzh> fche: i've forwarded your message to Junlong. he still seems to be fighting with his irssi setup...
khaled has joined #systemtap
orivej has quit [Ping timeout: 265 seconds]
tonyj has quit [Ping timeout: 240 seconds]
tonyj has joined #systemtap
tonyj has quit [Ping timeout: 252 seconds]
tonyj has joined #systemtap
<sscox> kerneltoast thanks; I'll give it a spin
orivej has joined #systemtap
lijunlong has joined #systemtap
<lijunlong> fche, yes. this is with that hugepagesize binary