Topic for #qi-hardware is now Copyleft hardware - http://qi-hardware.com | hardware hackers join here to discuss Ben NanoNote, atben / atusb 802.15.4 wireless, and other community driven hw projects | public logging at http://en.qi-hardware.com/irclogs
compcube has joined #qi-hardware
compcube has quit [Changing host]
compcube has joined #qi-hardware
LunaVorax has quit [Ping timeout: 246 seconds]
urandom__ has quit [Remote host closed the connection]
nikescar has quit [Ping timeout: 246 seconds]
Hoolxi has joined #qi-hardware
guanucoluis has joined #qi-hardware
Hoolxi has quit [Remote host closed the connection]
compcube has quit [Quit: Leaving]
Ayla has quit [Quit: dodo]
Jay7 has quit [Quit: Conversation terminated]
Jay7 has joined #qi-hardware
<DocScrutinizer05>
and sometimes you hear in coding style lessons to use for ( stream = SNDRV_PCM_STREAM_LAST; stream >(=)0; stream-- ) for optimization reasons (a compare against 0 is faster and shorter than a compare against a constant)
<DocScrutinizer05>
anyway this form has the benefit of being to the point about how to interpret the origin-0 convention and the upper limit
<DocScrutinizer05>
alternative:
<DocScrutinizer05>
for ( stream = --SNDRV_PCM_STREAM_AFTER_LAST; stream >(=)0; stream-- )
<DocScrutinizer05>
ooops
<DocScrutinizer05>
for ( stream = --SNDRV_PCM_STREAM_AFTER_LAST; stream >(=)0; --stream )
<DocScrutinizer05>
actually I learnt about pre- vs post-increment/decrement as well; one is better than the other, alas I already forgot which is which
<DocScrutinizer05>
performance-wise
<DocScrutinizer05>
I guess pre-*crement is better since it doesn't require creating a temporary variable
<DocScrutinizer05>
unless your compiler is slightly smarter than any assembler and recognizes that there's no need for any such var
<DocScrutinizer05>
jr@halebop:~> time for (( i=0; i<300000; i++ )); do :; done #-> real 0m1.807s ### jr@halebop:~> time for (( i=300000; i>0; i-- )); do :; done #-> real 0m1.694s
<DocScrutinizer05>
sure, it's shellscript
<DocScrutinizer05>
and then there's been that one CPU that didn't know decrements ;-P
<DocScrutinizer05>
for (( i=-300000; i<0; i++ ))
guanucoluis has quit [Remote host closed the connection]
jekhor has joined #qi-hardware
kristoffer has joined #qi-hardware
nikescar has joined #qi-hardware
Hoolxi has joined #qi-hardware
rejon has joined #qi-hardware
<wpwrak>
since the invention of SSA, you can pretty much stop worrying about temporary variables :)
<viric>
ssa?
<lindi->
static single assignment
<wpwrak>
yup :)
<viric>
what's that like?
<lindi->
viric: you only assign each variable once in the IR
<wpwrak>
your compiler basically creates a new temporary variable for each basic operation
<viric>
ah.
<viric>
and that's any good?
<wpwrak>
afterwards, it clears that up and removes everything it doesn't actually need
<wpwrak>
yes, it gives you very nice semantics for optimization
<viric>
ok
<larsc>
DocScrutinizer05: your coding style lessons are from the 90ies ;)
<DocScrutinizer05>
sure
<DocScrutinizer05>
though held 6 months ago
<larsc>
e.g. there is performance wise no difference between i++; and ++i; with a modern compiler
<mth>
there can be a performance difference if you're using C++ iterators, but not for integers
<DocScrutinizer05>
I think I already mentioned that?
<DocScrutinizer05>
ooh yes, I forgot to mention it is of course *very* dependent on the type of variable, and especially on C vs C++
<DocScrutinizer05>
in C++ you, err, the compiler might even have to instantiate a new object
<DocScrutinizer05>
maybe not nowadays anymore, or more seldom nowadays
<larsc>
mth: yes, looks like a bug
<DocScrutinizer05>
but this ENEA guy told us that as well
<DocScrutinizer05>
and it took him like 2 sentences, or 10 seconds, to handle the whole topic
<wpwrak>
larsc: early 90es ;-)
<DocScrutinizer05>
c'mon guys, a CPU is still faster checking a register for 0 than against a constant
<viric>
the ++ pre and post operators can be implemented completely differently, for objects.
<larsc>
DocScrutinizer05: depends on your arch
<DocScrutinizer05>
and that's sth a compiler usually can't auto-optimize
<mth>
larsc: will you fix it or shall I fix it?
<wpwrak>
depends. most compilers don't see beyond function calls, that's true. so they don't know whether there are side effects lurking.
<DocScrutinizer05>
I don't know any arch that wouldn't set flags on arithmetic operations like inc or whatever
<wpwrak>
but if there are no calls or all the functions are inline (or maybe static), then there's a good chance the compiler can rearrange that check if it thinks it makes a difference
<DocScrutinizer05>
so comparing against 0 is basically for free
<larsc>
mth: I'm lazy today
<mth>
ok, I'll make a patch then; can I include your name in a header?
<wpwrak>
i'm lazy too. so i won't look up one that doesn't set flags on inc/dec ;-)
<larsc>
mth: sure
<DocScrutinizer05>
that for sure was too much for a lazy weekend
<DocScrutinizer05>
;-D
<wpwrak>
but the point is that the compiler should be perfectly capable of changing the loop itself, provided the context isn't spread out too widely (in which case the optimization is irrelevant anyway)
<DocScrutinizer05>
the compiler can't optimize an inc loop to a dec loop, since that's way beyond what the compiler understands about the code's semantics
<larsc>
there are archs where you don't jump on flags, but the jmp or branch instruction does a compare
<larsc>
so it does not really matter whether inc/dec sets a flag or not
<DocScrutinizer05>
dbnz
<DocScrutinizer05>
yes, I know
<DocScrutinizer05>
I don't know an ibnz instruction
<DocScrutinizer05>
ibl
<DocScrutinizer05>
even
<DocScrutinizer05>
was it the 8080 or the Z80 that invented the decrement-and-branch-if-non-zero instruction?
<DocScrutinizer05>
this training lesson was targeted at embedded. The CPUs there are amazingly stupid little things sometimes. And you can still find those today
<DocScrutinizer05>
I strongly suspect a compiler will rather create a second variable it increments inside a djnz loop than compare that variable against a constant for loop termination
<DocScrutinizer05>
of course when you're using a dec loop terminating at zero, the compiler can use the djnz register
<DocScrutinizer05>
wpwrak: and I really wanna see the compiler smart enough to usually detect this case and convert for (i = 0; i < 10; i++) to for (i = 9; i >= 0; i--)
<DocScrutinizer05>
it needs a thorough understanding of everything that's done with i inside the loop, and of the side effects
<larsc>
DocScrutinizer05: if i for example is not used inside the loop the compiler will do it
<larsc>
if it actually is faster
<DocScrutinizer05>
yep, then it's safe
<DocScrutinizer05>
and that's the only safe case the compiler is 'smart' enough to detect
<DocScrutinizer05>
I'd say
<DocScrutinizer05>
anyway even bashscript is faster on comparing to 0 ;-D
<larsc>
if it can be sure that the loop is side effect free, I'd say it can also optimize other cases. E.g. summing up an array
<DocScrutinizer05>
and while I've been aware of that since 1980, I tend to forget it in daily coding
<DocScrutinizer05>
ensuring the absence of side effects isn't trivial
<DocScrutinizer05>
it's possible in 95% of cases, as long as you've got a 'clean' architecture
<DocScrutinizer05>
meaning no memory-mapped IO that the compiler doesn't know about, etc
<DocScrutinizer05>
even then in a multitasking environment there always might be differences
<wpwrak>
and you didn't even enable optimization ;-)
<DocScrutinizer05>
larsc: sure, but in one of the cases your results may be bogus
<wpwrak>
DocScrutinizer05: how so ?
<DocScrutinizer05>
well, not in this very example
<wpwrak>
qed :)
<DocScrutinizer05>
for (i = 0; i < 10; i++); if (i=0 || i=1) {while x[i]==0 {usleep 1000} } else { tmp += x[i] }; return tmp;
<DocScrutinizer05>
ooops
<DocScrutinizer05>
for (i = 0; i < 10; i++); if (i==0 || i==1) {while x[i]==0 {usleep 1000} } else { tmp += x[i] }; return tmp;
<wpwrak>
luckily, the loop optimization should be very simple here ;-)
<DocScrutinizer05>
sure, you shouldn't do that inside the loop anyway
<wpwrak>
do what ? ;-)
<wpwrak>
your loop is empty
<DocScrutinizer05>
eh?
<wpwrak>
well, the for loop
<DocScrutinizer05>
meh
<wpwrak>
and the while loop won't compile, so .. :)
<DocScrutinizer05>
I can spam the chan with another 5 to 10 typo-fixed versions
<DocScrutinizer05>
though it's a pita to edit in this IRC client
<wpwrak>
the main problem in your example is the usleep anyway, because there's no good way to tell the compiler that the side effect in question is a delay
<DocScrutinizer05>
err wut?
<wpwrak>
ah, it's an endless loop. i see.
<DocScrutinizer05>
the side effect in question is another process initializing the array with values and only _then_ setting the semaphore in x[0] and x[1]
<wpwrak>
there are a few "volatile" missing already for correctness ;-)
<DocScrutinizer05>
good point
<wpwrak>
and obviously, if you tell the compiler explicitly to avoid any optimizations, you shouldn't be too disappointed if it indeed doesn't optimize :)
<DocScrutinizer05>
though I'm not sure you need volatile for storage that can get changed by CPU only
<wpwrak>
you're confused :)
<DocScrutinizer05>
I'm just saying it mustn't optimize in that case
<wpwrak>
if you add the right volatiles, it won't optimize
<DocScrutinizer05>
since (it seems I should explain my code) when the loop reads the array from x[9] down to x[0], the test for the semaphore being set, and the usleep, will be done last, not first
<DocScrutinizer05>
hell, I'm not setting volatile on all global storage that might get accessed by other processes
<wpwrak>
"volatile" tells the compiler that the access sequence must not be changed
<wpwrak>
then your code is broken :)
<wpwrak>
of course, since function calls are usually a barrier too, you can get away with that in many cases. but sooner or later, you'll run into trouble.
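A hedged sketch of what the "right volatiles" could look like for the polling example discussed above (the names and the structure are illustrative, not the original code):

```c
#include <stddef.h>

/* Shared with an IRQ handler or another process; "volatile" tells the
 * compiler that every read must really happen and that their order
 * must be kept, so it may neither cache x[0] in a register nor hoist
 * the reads of x[2..9] above the semaphore check. */
volatile int x[10];

/* Sum x[2..9], but only after the producer has set the semaphore
 * words x[0] and x[1]. (Illustrative only: a real implementation
 * would use proper atomics or locks, since volatile alone provides
 * no atomicity or inter-process ordering guarantees.) */
int sum_when_ready(void)
{
    int tmp = 0;

    while (x[0] == 0 || x[1] == 0)
        ;  /* busy-wait until the producer signals readiness */

    for (size_t i = 2; i < 10; i++)
        tmp += (int)x[i];
    return tmp;
}
```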
<DocScrutinizer05>
any static var already implies that there may be (possibly concurrent) access to the storage from other locations in the code, no?
<DocScrutinizer05>
I mean, that's one of the reasons you use a static var
<wpwrak>
concurrency is never assumed unless you explicitly indicate it with "volatile"
<wpwrak>
what ?
<DocScrutinizer05>
err, yes I'm confused. Please wait until my coffee has had time to kick in
<wpwrak>
in your coding class, they shouldn't waste time with prehistoric optimization techniques and instead teach some basics ;-)
<DocScrutinizer05>
meh
<DocScrutinizer05>
haha
<DocScrutinizer05>
a *global* var (scope beyond local function) usually is defined that way because other functions may want to access it as well
<larsc>
yes, but C assumes that no two functions run at the same time
<DocScrutinizer05>
in the example, *x is "from outside"
<wpwrak>
larsc: ... unless one calls the other (possibly via intermediates)
<DocScrutinizer05>
larsc: C assumes what? No taskswitching allowed during execution of int foo(int *x) ?
<larsc>
DocScrutinizer05: exactly
<wpwrak>
anyway, gotta get things ready for today's barbecue. 24 C predicted for this chilly mid-winter day :)
<DocScrutinizer05>
now that sounds like absolute BS. How's preemptive multitasking going to work with that?
cladamw has joined #qi-hardware
<larsc>
DocScrutinizer05: Well if you run two processes, your two processes run in different contexts and they'll never see each other
<larsc>
but if you have two processes which share memory you need to explicitly protect that memory against concurrent access
<DocScrutinizer05>
ummpf, what if I've got an IRQ service callback function in the same module as int foo(int *x)
<larsc>
if that IRQ service routine also accesses your x you need to explicitly write your code to handle that case
<DocScrutinizer05>
I did, by defining x[0] x[1] as semaphore
<larsc>
well, C has no native semaphores
<DocScrutinizer05>
and I assumed the compiler better not convert this loop to a decrement loop for optimization
<larsc>
it won't. but that's because it does not know what happens inside of usleep
<lindi->
yeah usleep does a syscall so it is quite hard to optimize
<DocScrutinizer05>
I *might* kick out the usleep (though that's for sure rogue)
<lindi->
for library calls some optimizations are possible though, like converting printf with constant arguments to puts
<lindi->
or printf("%c", c) to putchar(c)
<DocScrutinizer05>
replace that by a simple return(-ETOOFAST) when x[0]==0
<DocScrutinizer05>
still a pretty nice race when foo() sums up bogus values for x[2..9] and then finds that meanwhile the semaphore in x[0] got set, so everything seems to be fine
<larsc>
gcc has extensions like __attribute__((pure)) which tells the compiler that the function has no side effects
<lindi->
larsc: llvm IR has readonly and readnone attributes that are similar
<DocScrutinizer05>
the whole point of my reasoning has been that the compiler has a hard time deciding whether there are side effects in a for-loop that might bite you when the compiler optimizes loops from increment to decrement
porchao has joined #qi-hardware
porchaso0 has quit [Ping timeout: 276 seconds]
<larsc>
lindi-: I think llvm, well clang, should also support __attribute__((pure))
<DocScrutinizer05>
take a simple memmove shifting each byte down one position. Do you think the compiler is smart enough to see it will fail miserably when you change the sequence? like x[8]=x[9]; x[7]=x[8]...
<lindi->
larsc: yeah, I was talking about how it is represented in the IR
<larsc>
DocScrutinizer05: it must see this
<DocScrutinizer05>
no sarcasm, I'm really wondering if compilers might be smart enough for that
<mth>
larsc: I'm testing the pwm changes and there is one annoying effect: the numbering changes
<mth>
JZ_GPIO_PWM2 becomes pwm 0 etc
<mth>
should we just accept that? request that the pwm core can handle mappings that don't start at 0? or register JZ_GPIO_PWM0 and JZ_GPIO_PWM1 and return EBUSY if someone tries to use them?
<mth>
the hardware does have 8 PWMs, it's just that timer 0 and 1 are in use by the kernel and therefore not available as PWMs
jluis has quit [Read error: Connection reset by peer]
<larsc>
mth: e.g. for jz4760 timer 0 and 1 are not used by the kernel, so I think it should be fine
<larsc>
I don't really get patch 1 of that series
<larsc>
but I might be missing something
<larsc>
Btw. I met with Thierry last week, seems to be a nice guy
<mth>
patch 1 wants to get rid of a header named "irq.h" in the current directory (for 4740 arch) since it gets included instead of something from the header search path
<mth>
I don't know what that something is
<mth>
maybe the MIPS version of irq.h?
<larsc>
but we include irq.h with "" instead of <>
<mth>
hmm, so it's the one that is supposed to be included
<mth>
the MIPS version is called asm/irq.h, so unless the "asm" path is in a search path as well, it cannot be masked by just "irq.h"