#milkymist on 2011-10-22 — irc logs at freenode.irclog.whitequark.org

08:58 <wpwrak> wolfspraul: you'll like this: my 20000+ cycles run finally corrupted flickernoise, so i had to stop it and set things up again manually (the script only auto-recovers the standby partition)

08:59 <wpwrak> on that occasion, i checked the other partitions. guess what i found ? the first block had gotten locked, while the next four were unlocked. my lock setup script unlocks the first five blocks.

09:01 <wpwrak> so either this script had made an error (seems unlikely, but who knows) and i didn't check the lock bits (i normally do this, but i don't know if i also did it in this case), or ...

09:01 <wpwrak> ... the bus noise has actually generated the block locking pattern ;-)

09:05 <wpwrak> since the vast majority of corruptions happens in the first block, it then took a long time until the next hit in an unprotected partition
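A minimal sketch of the lock-bit check wpwrak describes. It assumes the per-block lock states have already been read back from the chip; how that readback is done is not shown in the log and is left out here. The function name and the eight-block readback are hypothetical. It reports every block whose lock state differs from what the setup script intended, which is how a single stray lock on the first block would show up.

```python
def lock_mismatches(unlocked_by_script, observed_locked):
    """Return block indices whose lock state differs from the intended setup.

    unlocked_by_script: set of block indices the setup script unlocked
    observed_locked:    one boolean per block, True if read back as locked
    """
    mismatches = []
    for block, locked in enumerate(observed_locked):
        expected_locked = block not in unlocked_by_script
        if locked != expected_locked:
            mismatches.append(block)
    return mismatches


# Hypothetical readback: the script unlocked blocks 0-4, but block 0 reads locked.
intended = set(range(5))
readback = [True, False, False, False, False, True, True, True]
print(lock_mismatches(intended, readback))  # [0]
```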

09:50 <wolfspraul> cannot follow

09:50 <wolfspraul> so the 'locking' our latest script is doing is believed to be effective or not?

09:50 <wolfspraul> in your case it sounds like an unlocked block went to locked?

11:45 <roh> heyho

11:45 <roh> simple question out of wondering.. why not use a modern flash chip which doesnt unlock so easy?

12:01 <wpwrak> wolfspraul: 2x yes :) the locking seems to work very well. i.e., i've had not a single instance of corruption of a locked block. so the partitions we can lock are safe.

12:02 <wpwrak> or rather, reasonably safe, because of the other problem

12:03 <wpwrak> the new observation is that an unlocked block seems to have gone to locked. i suspect, this is because the "random" pattern written to the NOR just happened to contain a lock command

12:04 <wpwrak> now, an unlock command would have a similar probability of being randomly generated. i don't really know the probability. one sample just tells us that it's possible, but little else.
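To put a rough number on "one sample just tells us that it's possible, but little else", here is a sketch that assumes stray lock/unlock commands arrive as a Poisson process and treats about 20000 cycles as the exposure. The log only says "20000+" and does not say when during the run the lock appeared, so this is illustrative. With exactly one observed event, the exact 95% interval for the rate spans more than two orders of magnitude.

```python
import math


def poisson_interval_one_event(confidence=0.95):
    """Exact two-sided interval for a Poisson mean after observing exactly one event.

    For k = 1 the bounds satisfy:
      lower: P(X >= 1) = alpha/2  ->  1 - exp(-lam) = alpha/2
      upper: P(X <= 1) = alpha/2  ->  exp(-lam) * (1 + lam) = alpha/2
    """
    half_alpha = (1 - confidence) / 2
    lower = -math.log(1 - half_alpha)

    # exp(-lam) * (1 + lam) decreases monotonically for lam > 0, so bisect.
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if math.exp(-mid) * (1 + mid) > half_alpha:
            lo = mid
        else:
            hi = mid
    upper = (lo + hi) / 2
    return lower, upper


cycles = 20000  # rough exposure; the log only says "20000+"
lower, upper = poisson_interval_one_event()
print(f"per-cycle rate between {lower / cycles:.1e} and {upper / cycles:.1e}")
# per-cycle rate between 1.3e-06 and 2.8e-04
```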

12:07 <wpwrak> so there's a very low probability that murphy first concocts an unlock command (maybe with a 1-in-20000 probability), and then, in a later session, concocts a write to the unlocked block (maybe with a 1-in-500 probability)
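The compound scenario multiplies out as follows. This uses wpwrak's own rough guesses, which are hedged with "maybe" in the log and are not measurements, and it assumes the two events are independent.

```python
from fractions import Fraction

# Rough guesses from the log, not measured values.
P_RANDOM_UNLOCK = Fraction(1, 20000)  # noise happens to form an unlock command
P_HIT_UNLOCKED = Fraction(1, 500)     # a later stray write lands in that block


def joint_probability(p_unlock, p_write):
    """Probability that both events occur, assuming they are independent."""
    return p_unlock * p_write


p = joint_probability(P_RANDOM_UNLOCK, P_HIT_UNLOCKED)
print(p)      # 1/10000000
print(1 / p)  # 10000000, the expected number of trials per occurrence
```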

12:08 <wpwrak> i'm not sure we really need to worry about this scenario. i'd consider it a mere curiosity. there must be other things that fail more often in real-life use

12:10 <wolfspraul> roh: the answer is because of available resources and opportunity cost

12:10 <wolfspraul> lots of things can be improved on m1 (tons), but which first?

12:10 <wolfspraul> want to 'improve' the nor chip - go ahead ;-)

12:11 <roh> wolfspraul: i am just seeing that you guys use _loads_ of time to fix that problems

12:11 <wolfspraul> find a suitable replacement, do a design verification, make sure software supports both old and new, make sure updates can still work with 1 binary or picking the right one

12:11 <roh> eh. i mean the 'corruption issue'

12:11 <wolfspraul> not really. there is a specific bug, and Werner's approach is a little academic

12:11 <wpwrak> roh: they all lock/unlock with more or less the same ease. it's just that ours stays locked/unlocked, while the others go back to a hardwired state after the next reset :)
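A toy model of the difference wpwrak describes. It assumes nothing about the specific part beyond what he says: a lock bit that keeps its state across reset versus one that falls back to a hardwired default. The class is purely illustrative and is not a driver or any vendor's interface. It shows why a stray unlock formed by bus noise persists on the M1's chip but would be undone by the next reset on a chip with volatile locking.

```python
class LockBit:
    """Illustrative model of one NOR block's lock bit (hypothetical, not a driver)."""

    def __init__(self, volatile, default_locked=True):
        self.volatile = volatile
        self.default_locked = default_locked  # state a volatile bit returns to on reset
        self.locked = default_locked

    def stray_command(self, lock):
        # A lock or unlock command that bus noise happened to form.
        self.locked = lock

    def reset(self):
        # Volatile bits revert to the hardwired default; non-volatile bits keep
        # whatever was last programmed.
        if self.volatile:
            self.locked = self.default_locked


non_volatile = LockBit(volatile=False)
volatile = LockBit(volatile=True)
for bit in (non_volatile, volatile):
    bit.stray_command(lock=False)  # noise unlocks the block
    bit.reset()

print(non_volatile.locked, volatile.locked)  # False True
```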

12:12 <wolfspraul> nobody can say that a more 'modern' nor chip would not show the same or other bugs

12:12 <roh> wolfspraul: sure. but after all.. thats what i like about werner.. if you know something you know WHY you know and how sure ;)

12:12 <wolfspraul> for 'modern', the one we have is pretty much latest-gen, I believe

12:12 <wolfspraul> there are 'others', yes

12:12 <wolfspraul> and there is serial flash, and and and

12:12 <roh> sure

12:13 <wolfspraul> werner doesn't want to apply all fixes we have in mind and see whether we can still reproduce the bug

12:13 <wolfspraul> why?

12:13 <wolfspraul> because he is worried that he cannot reproduce the bug anymore...

12:13 <roh> i am not suggesting even changing the pinout... i was thinking about simply soldering a different chip

12:13 <wolfspraul> WITHOUT (god forbid) understanding why it went away :-)

12:13 <wolfspraul> which one?

12:14 <wolfspraul> I think with Werner's overkill methods the lifetime of the nor corruption bug is very limited by now

12:14 <wolfspraul> it's just a matter of starting the final attack

12:15 <wolfspraul> and if we can still reproduce the nor corruption even after applying all currently known fixes, well, then we have something new indeed

12:15 <wpwrak> a chip with a different locking scheme would probably be more robust, making the NOR corruption happen a lot less often, even if the root cause is never solved. but ... it would still have a number of vulnerabilities that actually matter

12:15 <roh> wpwrak: i see (not locking on reset) .. how did we get to that chip?

12:15 <wpwrak> dunno. i wasn't involved ;)

12:16 <roh> wpwrak: just curious... i havent seen such behaviour yet

12:17 <wpwrak> it seems to be the very "old school" kind of locking

12:17 <roh> or to be exact.. i dont understand it... all chips whose datasheets i've read so far go to lock when reset

12:18 <wpwrak> i have a little concern that we may wear out the lock bits if reflashing often via jtag. but that's again a "special interest" problem

12:18 <roh> wpwrak: sure. different issue

12:19 <roh> i only wore out nor one time, and that was a dbox1

12:19 <roh> it took years of multiple flashing per week/day

12:19 <wpwrak> the wear may be a little worse in this case

12:20 <wpwrak> you're thinking of O(updates)

12:20 <wpwrak> here, it's O(updates*update_size)

12:20 <roh> eek

12:20 <wpwrak> ;-)

12:21 <wpwrak> that's because urjtag does an unlock/lock cycle for each block, not taking into account that the unlock is global

12:21 <roh> meh. so we need to fix urjtag?

12:22 <wpwrak> now, i'm not 100% sure whether erasing an erased NOR bit really counts as a full cycle for wear, but since NOR life is generally specified in erase cycles, it may be prudent to assume so

12:23 <wpwrak> roh: it would at least not hurt
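The O(updates) versus O(updates*update_size) point can be sketched as a simple count. This assumes, as wpwrak says, that an unlock clears the lock bits of all blocks at once. It also assumes, as he suggests is prudent, that every such clear costs each lock bit one erase cycle even if the bit was already cleared. The function and the example numbers are hypothetical.

```python
def lock_bit_erase_cycles(updates, blocks_per_update, unlock_per_block):
    """Erase cycles seen by each lock bit, assuming every unlock is a global clear."""
    global_clears_per_update = blocks_per_update if unlock_per_block else 1
    return updates * global_clears_per_update


# Hypothetical: 1000 JTAG reflashes, each writing 64 blocks.
per_block = lock_bit_erase_cycles(1000, 64, unlock_per_block=True)
once = lock_bit_erase_cycles(1000, 64, unlock_per_block=False)
print(per_block, once)  # 64000 1000
```

Unlocking once per update instead of once per block is presumably what a fix in urjtag would aim for, bringing lock-bit wear back to O(updates).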

12:24 <wpwrak> wolfspraul: the nice thing about the "academic approach": you get to know all sorts of strange critters that live at the end of the bell curve. it's the kind of gremlins that cause those "unthinkable" accidents.

12:26 <wpwrak> as long as you don't run nuclear plants or equip airplanes, you may not worry about these so much. but then, there can be a shift in context that suddenly washes them towards the middle of the bell curve.

12:37 <wolfspraul> wpwrak: just trying to explain the status quo to roh in the shortest way possible :-)

12:37 <wolfspraul> a casual observer might think we are lost or stuck on the nor bug, but I totally don't see it like that...

12:38 <wolfspraul> it's more slowness by design :-)

12:38 <roh> wolfspraul: i dont think the shortest way is the best ;)

12:38 <wolfspraul> and of course, I agree, in the long run this approach is better, and we will surely reuse some of the tools we build today in the future

12:39 <roh> ack

12:44 <wpwrak> roh: i actually tried the fast approach at the beginning. had a set of very nice results just after two days. then i tried to do one last confirmation run. and well, that run completely disagreed with what i had found before ...

12:45 <roh> hrhr

14:29 <lekernel> any ideas to push some power at high frequencies through a transformer?

14:30 <lekernel> (e.g. single-transistor oscillators...)

14:30 <lekernel> it's for powering up a circuit working on the spinning part of a VCR cylinder

14:32 <wpwrak> maybe ask DocScrutinizer over at #qi-hardware

16:13 <larsc> what am i missing. if i write 'if (x) if (y)' ise generates 'not ((not ((not x) and y)) or y)' and if i write 'if (x && y)' it generates '(x && y)'

16:23 <lekernel> what do you mean, "ise generates" ?

16:23 <lekernel> what output are you examining?

16:27 <larsc> the ngr file
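An exhaustive truth table is a quick way to compare two small netlist expressions like the ones larsc quotes. This is a Python sketch, not an ISE tool. Taken literally as typed in the log, the expression for 'if (x) if (y)' is false for all four inputs, so it is probably missing some context from the .ngr file, such as the register or enable it feeds.

```python
from itertools import product


def ise_nested(x, y):
    # Expression as typed in the log for 'if (x) if (y)'.
    return not ((not ((not x) and y)) or y)


def ise_flat(x, y):
    # Expression for 'if (x && y)'.
    return x and y


def truth_table(f):
    """Evaluate f on all inputs in the order (F,F), (F,T), (T,F), (T,T)."""
    return [bool(f(x, y)) for x, y in product([False, True], repeat=2)]


print(truth_table(ise_nested))  # [False, False, False, False]
print(truth_table(ise_flat))    # [False, False, False, True]
```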