leiyin posted on 2014-10-17 16:12:06

C8051F360/1/2/3 Series MCU Bug

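Today I received a phone call and an email from Xinhualong (新华龙) saying that a bug has been found in the F360 and asking us to help with testing. The email reads as follows: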
As many of you know, we have been unable to ship the C8051F36x (100 MHz operation). After nearly 8 years of production, a bug was found. This bug is extremely rare and only happens under a very specific set of circumstances (see the attached Errata_Candidate.doc for details). I want to assure you that we are working hard to track down the root cause. Once the root cause is established, we can make a decision on how to proceed. I will continue to provide updates as I get new information.
 
Action:
We know that not all customers can wait until we track down and fix the root cause, so we have created an interim solution that will allow customers to keep shipping. The interim solution requires that you analyze the customer’s “HEX image”. Here are the steps.
 
1) Secure a copy of the customer’s “HEX image” (.hex, Intel HEX format) from the project directory. This is essentially a copy of the flash contents. It is a file that is created during compilation and is needed for the analysis.
2) Run the scan.py script (attached) in the same directory as the “HEX image”.
   a. To access the script, unzip the attached .zip file anywhere.
   b. Drop the hex files to be scanned into the directory with the script.
   c. Double-click the script (.py); if you have Python installed, it will run.
   d. If you don’t have Python, install it.
      i. Python 3 for Windows: https://www.python.org/ftp/python/3.4.2/python-3.4.2.msi
      ii. You can find Python 3 for Mac and Linux as well.
3) If the script passes, you will not have an issue. Notify your customer service rep and we will release product to that customer. All product is on marketing hold, so it is important to communicate that you analyzed the code. If the script fails, go to Step 4.
 
EXAMPLE: The script analyzed two unique files, example.hex and F360_QI_1775.hex.


 
Failure of the script does not mean that your customer’s design has a problem; it means that further analysis by Apps is required.
4) Get a copy of the customer’s design files and project.
5) Create a support ticket. Include a copy of the project and the output of the script.
6) If the customer refuses to send source code, submit a support ticket requesting help. Apps will need to get on the phone with the customer and walk them through the analysis.
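Steps 1 and 2 hinge on reading the Intel HEX image into a picture of flash. The actual scan.py is only attached, not reproduced in the thread, so the following is a minimal hypothetical sketch of such a loader (`load_intel_hex` is an illustrative name, not part of any Silicon Labs tool). It assumes Python 3.6 or later and handles only the common record types 00 (data), 01 (end of file), 02 (extended segment address) and 04 (extended linear address); any other record type is skipped.

```python
"""Minimal Intel HEX loader (hypothetical helper, not Silicon Labs' scan.py)."""


def load_intel_hex(text):
    """Parse Intel HEX text into a {address: byte} dict.

    Handles record types 00 (data), 01 (end of file), 02 (extended segment
    address) and 04 (extended linear address); other types are ignored.
    Raises ValueError on a malformed line or a checksum mismatch.
    """
    image = {}
    base = 0  # upper address bits set by type 02/04 records
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if not line.startswith(":"):
            raise ValueError(f"line {lineno}: missing ':' start code")
        record = bytes.fromhex(line[1:])
        # count + 2 address bytes + type + data + checksum
        if len(record) < 5 or len(record) != record[0] + 5:
            raise ValueError(f"line {lineno}: bad record length")
        # All bytes, checksum included, must sum to 0 modulo 256.
        if sum(record) & 0xFF != 0:
            raise ValueError(f"line {lineno}: checksum mismatch")
        count, rtype = record[0], record[3]
        offset = (record[1] << 8) | record[2]
        data = record[4:4 + count]
        if rtype == 0x00:
            for i, value in enumerate(data):
                image[base + offset + i] = value
        elif rtype == 0x01:
            break
        elif rtype == 0x02:
            base = ((data[0] << 8) | data[1]) << 4
        elif rtype == 0x04:
            base = ((data[0] << 8) | data[1]) << 16
    return image
```

Typical use would be `with open("example.hex") as f: image = load_intel_hex(f.read())`. A dict rather than a flat byte array keeps unprogrammed gaps distinguishable from real 0xFF bytes.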


The attached Errata_Candidate.doc reads as follows:
Errata for C8051F360/1/2/3/4/5 (100 MHz devices only; does not apply to C8051F366/7/8/9)
We have discovered a speed path issue in the CPU that causes an execution failure for the “CPL C” (Complement Carry bit) instruction under a narrow set of conditions involving an instruction order dependency. The probability of failure is increased at higher temperatures, lower power supply voltage, and higher system clock frequencies.
The failure mode is that if the Carry bit contains a ‘1’ prior to the execution phase of the “CPL C” opcode, the Carry bit will remain a ‘1’ after the execution phase of the opcode has completed. If the Carry bit contained a ‘0’ prior to the execution of the “CPL C” opcode, it will properly transition to a ‘1’ when the execution phase of the opcode has completed. This is illustrated in the following table:
                        Correct operation   Correct operation   Failure case
Initial state of C      1                   0                   1
Instruction             CPL C               CPL C               CPL C
Final state of C        0                   1                   1
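The table can be restated as a tiny behavioral model. This is purely illustrative: `cpl_c_result` is a hypothetical name, and `timing_fault` stands in for the rare speed-path condition (more likely at high temperature, low supply voltage and high clock frequency) that software cannot observe directly. The `after_bit_branch` flag encodes the instruction order dependency that the errata describes next.

```python
def cpl_c_result(carry, after_bit_branch, timing_fault):
    """Return the Carry bit after CPL C on an affected 100 MHz C8051F36x.

    carry            -- C before CPL C (0 or 1)
    after_bit_branch -- True if CPL C runs right after a JB/JNB/JBC
    timing_fault     -- hypothetical stand-in for the rare speed-path failure
    """
    if after_bit_branch and timing_fault and carry == 1:
        return 1          # failure case: C stays 1 instead of clearing
    return carry ^ 1      # correct operation: C is complemented
```

Note that the faulty path is indistinguishable from correct operation when C starts at 0, which is why only the ‘1’ starting state appears as a failure case in the table.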
The instruction order dependency is as follows: in the failure case, the CPL C opcode must be immediately preceded by a JB, JNB, or JBC opcode.
JB, JNB, and JBC are all conditional branch instructions (JB is “Jump if bit is set”, JNB is “Jump if bit is not set”, and JBC is “Jump if bit is set and clear bit”). Because the branches are conditional, they have both a “branch taken” condition and a “branch not taken” condition. Both conditions may exhibit the error, as long as the CPL C opcode executes immediately after the branch instruction has executed.
Recommendations:
The bug can be avoided by removing the instruction combination of a JB / JNB / JBC opcode followed by the CPL C opcode.
Silicon Labs has developed a hex file scanner that can be used to determine whether a code project contains the instruction sequence above. Instructions for using the scanner, as well as details regarding the scanner’s operation, can be found here:
<<insert TBD KB article link>>

In short: when the initial state of C is 1, the CPL C instruction may fail to take effect.

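The test procedure is also simple: download scan.zip, then download and install Python.

Put your program's project files (the .hex image) in the same directory as scan.py and run scan.py directly. If it shows OK, there is no problem.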


leiyin posted on 2014-10-17 16:13:24

The attachment is the scanner script.

johnlj posted on 2014-10-17 16:35:39

Such a serious problem, and it seems there is no fix yet.

leiyin posted on 2014-10-17 22:53:47

Test thoroughly!

modbus posted on 2014-10-18 23:16:37

When building products, the thing you fear most is a chip with a bug.

jarodzz posted on 2014-10-19 16:37:53

Timing issue!

Nuker posted on 2014-10-21 12:41:25

It is indeed a timing issue:
The probability of failure is increased at higher temperatures,
lower power supply voltage, and higher system clock frequencies.

yaho007 posted on 2014-10-21 16:09:05

Reading emails in English is stressful.