I. Maintenance Platform / Tool / Equipment Preparation Requirements
1. Platform requirements:
A repair workbench with a rubber mat (the workbench must be grounded), plus a grounded anti-static wrist strap.
2. Equipment requirements:
A constant-temperature soldering iron (350℃ - 380℃) with a pointed tip is used for soldering small SMD parts such as chip resistors and capacitors; a portable desoldering gun and a BGA repair station are used for chip / BGA removal and soldering; a Fluke 15B+ multimeter, with steel needles soldered to the probes and covered with heat-shrink tubing, is used for easier measurement; an oscilloscope and a network cable are also required.
3. Requirements for test tools:
An APW12 power supply (APW12_12V-15V_V1.2) with a power adapter cable is used to power the hash board. Use thick copper wires to connect the positive and negative poles of the power supply to the hash board; 4AWG copper wire no longer than 60cm is recommended. The test fixture of the V2.2010 control board is also required. Discharge resistors must be installed across the positive and negative poles of the tester; a cement resistor of 25 ohms rated at more than 100W is recommended.
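As a rough sanity check on the recommended discharge resistor, Ohm's law gives the steady-state current and power when the resistor sits directly across the supply rail. This is a minimal sketch that assumes the 12V-15V range implied by the APW12 model name; the function name is illustrative.

```python
def discharge_load(voltage_v: float, resistance_ohm: float) -> tuple[float, float]:
    """Return (current in A, power in W) for a resistor across a DC rail."""
    current_a = voltage_v / resistance_ohm        # I = V / R
    power_w = voltage_v ** 2 / resistance_ohm     # P = V^2 / R
    return current_a, power_w


for rail_v in (12.0, 15.0):
    current_a, power_w = discharge_load(rail_v, 25.0)
    print(f"{rail_v:.0f}V across 25 ohm: {current_a:.2f}A, {power_w:.2f}W")
```

At 15V the 25 ohm resistor dissipates about 9W continuously, so a rating above 100W leaves a wide thermal margin.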
4. Maintenance auxiliary materials / tool requirements:
Solder paste, flux, and board washing water with absolute alcohol (the board washing water is used to clean off solder residue after repair); thermal grease, applied to the chip surface after repair; a steel tinning stencil, desoldering wick, and solder balls (a ball diameter of 0.4mm is recommended). When replacing a chip, tin the chip pins first and then solder the chip to the hash board; apply thermal grease evenly to the chip surface before fastening the large heat sink. Also required: an RS232 / TTL port adapter board (3.3V) and a self-made short-circuit probe.
5. Common maintenance spare material requirements:
0402 resistors (0R, 51R, 10K, 4.7K); 0402 capacitors (0.1uF, 1uF)
II. Maintenance Requirements
1. Pay attention to the operation method when replacing the chip. After replacing any accessories, the PCB board shall have no obvious deformation. Check the replacement parts and the surrounding parts for open and short circuit issues.
2. The maintenance operators must have specific electronic knowledge, more than one year of maintenance experience, and be proficient in BGA / QFN / LGA package soldering technology.
3. After repair, the hash board must pass the test two or more times before it can be judged OK.
4. Check that the tools and test fixtures work normally, and confirm the test software parameters, test fixture version, etc. of the maintenance station.
5. When testing after a chip repair or replacement, run the chip test first and perform the function test only after the chip test passes. For the function test, the small heat sinks must be soldered properly; when installing the large heat sink, the chip surface must be evenly coated with thermal grease, and the cooling fan must run at full speed. When using the chassis for heat dissipation, place 2 hash boards in it at the same time to form an air duct. The single-sided test in production must also ensure that an air duct is formed (important).
6. When measuring the signal, use 4 cooling fans to assist heat dissipation, and the fans shall maintain full speed.
7. When powering on the hash board, first connect the negative copper wire of the power supply, then the positive copper wire, and finally plug in the signal cable. When disconnecting, reverse the order: first remove the signal cable, then the positive copper wire, and finally the negative copper wire. Not following this order can easily damage R233 and R232 (the symptom is that not all chips can be found). Before the pattern test, let the repaired hash board cool down; otherwise it may cause PNG (Pattern NG).
8. When fitting a new chip, print solder paste onto its pins so that the chip is pre-tinned before it is soldered to the PCBA.
III. Test Fixture Making and Matters Needing Attention
The supporting jig of the test fixture must provide adequate heat dissipation for the hash board and allow easy signal measurement.
1. When using the test fixture for the first time, update the FPGA of the test fixture control board with the 19 series test fixture SD card reflash program: unzip the package, copy it to the SD card, and insert the card into the test fixture card slot. Power on for about 1 minute and wait for the control board indicator to double-flash 3 times, which indicates that the update is complete. (If the update is skipped, a certain chip may be reported as faulty during the test.)
2. Make the SD test card as required: for the single-sided heat sink chip inspection, unzip the compressed package directly onto the SD card.
3. For the double-sided heat sink 8-times Pattern test, make the SD test card as required, as shown in the figure below.
1) Delete the original config file after unzipping.
2) For PT1, rename the original Config.ini - BHB42601-PT1 file to Config.ini; for PT2, rename the original Config.ini - BHB42601-PT2 file to Config.ini.
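The two steps above can be scripted when many SD cards have to be prepared. The sketch below is only an illustration: the file names are copied from the text and may differ in the actual package, and the function name is hypothetical. It copies the station file instead of renaming it, so the same unzipped folder can later be switched to the other station.

```python
import shutil
from pathlib import Path

# Assumed file names, copied from the text above; adjust to the actual package.
STATION_CONFIGS = {
    "PT1": "Config.ini - BHB42601-PT1",
    "PT2": "Config.ini - BHB42601-PT2",
}


def select_station_config(package_dir: str, station: str) -> Path:
    """Delete the original Config.ini and install the one for the given station."""
    root = Path(package_dir)
    source = root / STATION_CONFIGS[station]
    if not source.is_file():
        raise FileNotFoundError(f"missing station config: {source}")
    target = root / "Config.ini"
    target.unlink(missing_ok=True)    # step 1: remove the original config file
    shutil.copyfile(source, target)   # step 2: the station file becomes Config.ini
    return target
```

For example, `select_station_config("/path/to/unzipped/package", "PT2")` prepares the folder for the PT2 station; the path is a placeholder.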
IV. Principle Overview
1. Working structure of S19j Pro hash board:
The Antminer S19j Pro hash board consists of 126 BM1362 chips divided into 42 groups (domains) of 3 ICs each. The operating voltage of the BM1362 chip on the S19j Pro hash board is 0.32V. For the 42nd, 41st, 40th, 39th, 38th, 37th, and 36th groups (7 groups in total), the LDOs are powered by the 20V output of the boost circuit U238 and output 1.2V and 0.8V. The LDO of the 35th group is supplied by VDD 15V, and the voltage drops by 0.32V with each domain further back. As shown in the figure below.
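The figures in this paragraph can be cross-checked with a little arithmetic. The sketch below assumes the 42 domains are connected in series, each dropping the nominal 0.32V, and numbers the domains from 1 as in the paragraph above. The 0.30V and 0.34V limits in the range check are borrowed from the Phenomenon 2 troubleshooting notes in Section V and are used here only as an illustrative window; the function names and the sample readings are hypothetical.

```python
DOMAINS = 42            # voltage domains on the hash board (from the text)
CHIPS_PER_DOMAIN = 3    # BM1362 chips per domain (from the text)
NOMINAL_V = 0.32        # nominal volts per domain (from the text)


def stack_voltage(domains: int = DOMAINS, per_domain: float = NOMINAL_V) -> float:
    """Total voltage across all domains, assuming they are in series."""
    return domains * per_domain


def out_of_range(measured: list[float], low: float = 0.30, high: float = 0.34) -> list[int]:
    """Return the 1-based domain numbers whose voltage lies outside [low, high]."""
    return [n for n, volts in enumerate(measured, start=1) if volts < low or volts > high]


print(DOMAINS * CHIPS_PER_DOMAIN)    # total number of chips
print(f"{stack_voltage():.2f}")      # total series voltage in volts

readings = [NOMINAL_V] * DOMAINS
readings[9] = 0.27                   # hypothetical low reading on domain 10
print(out_of_range(readings))
```

The totals come out to 126 chips and 13.44V, which sits inside the 12V-15V range of the APW12 supply.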
2. S19j Pro hash board boost circuit:
The boost circuit steps the 15V from the power supply up to 20V, as shown in the figure below.
3. Signal flow of the S19j Pro chips:
1) CLK (XIN) signal flow: generated by the 25M oscillator Y1 and transmitted from chip 00 to chip 125; the voltage is 0.5V - 0.6V;
2) TX (CI, CO) signal flow: from pin 7 of the IO port (3.3V) to IC U4 for level conversion, and then transmitted from chip 01 to chip 126; the voltage is 0V when the IO cable is not inserted and 1.2V during operation;
3) RX (RI, RO) signal flow: from chip 125 to chip 00, returning through U2 to pin 8 of the signal cable terminal and then to the control board; the voltage is 0.3V when the IO signal cable is not inserted and 1.2V during operation;
4) BO (BI, BO) signal flow: from chip 00 to chip 125; the value measured with the Fluke 15B+ multimeter is 0V;
5) RST signal flow: from pin 3 of the IO port to IC U1 for level conversion, and then transmitted from chip 00 to chip 125; the voltage is 0V when no IO signal cable is inserted and the equipment is in standby, and 1.2V during operation;
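The nominal values in the list above can be kept in a small lookup table for quick comparison against multimeter readings. This is a sketch only: the 0.1V tolerance is an assumption (the text gives no tolerance), "idle" stands for the state with the IO cable not inserted, and the names are illustrative.

```python
# signal: {state: (low, high)} in volts, values taken from the list above.
# "any" is used where the text gives a single value without naming a state.
EXPECTED_V = {
    "CLK": {"any": (0.5, 0.6)},
    "TX": {"idle": (0.0, 0.0), "running": (1.2, 1.2)},
    "RX": {"idle": (0.3, 0.3), "running": (1.2, 1.2)},
    "BO": {"any": (0.0, 0.0)},
    "RST": {"idle": (0.0, 0.0), "running": (1.2, 1.2)},
}


def signal_ok(signal: str, measured_v: float, state: str = "running",
              tolerance_v: float = 0.1) -> bool:
    """Check a reading against the nominal range, widened by an assumed tolerance."""
    ranges = EXPECTED_V[signal]
    key = state if state in ranges else "any"   # KeyError if neither exists
    low, high = ranges[key]
    return low - tolerance_v <= measured_v <= high + tolerance_v


print(signal_ok("CLK", 0.55))           # within 0.5V - 0.6V
print(signal_ok("RX", 0.3, "idle"))     # cable not inserted
print(signal_ok("RST", 0.2))            # far below 1.2V during operation
```

A reading that fails this check is best compared with the same test point in the adjacent domain before any part is replaced.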
4. Whole miner architecture:
V. Common Faults and Troubleshooting Steps of the Hash Board
Phenomenon 1: The single hash board test detects 0 chips (PT1 / PT2 stations)
Step 1: Check the power output first.
Step 2: Check the voltage output in the voltage domain
The voltage of each voltage domain is about 0.32V. If the 15V supply is present, the domain voltages are generally present too. First measure the output at the power supply terminals of the hash board and check whether the MOS is short-circuited (measure the resistance between pins 1, 4, and 8). If the 15V supply is present but there is no domain voltage, continue checking.
Step 3: Check the PIC circuit
Measure whether the second pin of U6 has an output of about 3.3V. If it does, continue troubleshooting; if there is no 3.3V, check that the cable between the test fixture and the hash board is properly connected, and reprogram the PIC.
PIC programming steps:
1. Programming the PIC on the hash board. Program file: 20200101-PIC1704-BM1398- V89.hex
Programming tool: PICkit3. Pin 1 of the PICkit3 cable corresponds to pin 1 of J3 on the PCB, and pins 1, 2, 3, 4, 5, and 6 must all be connected.
2. Programming software:
Open MPLAB IPE and select the device PIC16F1704; click "power" to select the power supply mode, then click "operate". Step 1: select "file" and locate the HEX file to be programmed. Step 2: click "connect" and confirm that the connection is normal. Step 3: click the "program" button; after it completes, click "verify". A successful verification message confirms that the programming succeeded.
Step 4: Check the boost circuit output; C915 in the figure can measure 20V voltage.
Step 5: Check the output of each group of LDO 1.2V or PLL 0.8V
Step 6: Check the chip signal output (CLK / CI / RI / BO / RST)
Refer to the voltage ranges given in the signal flow description. If a measured voltage deviates significantly, compare it with the value measured in the adjacent group to make a judgment.
When EEPROM NG is displayed on the LCD screen of the test fixture, check whether U10 is soldered properly;
If ‘PIC sensor NG’ is displayed on the LCD screen of the test fixture and the measured temperature is abnormal, follow the steps below to troubleshoot:
1. Check whether the 4 components R217, R218, C22, and C23 are soldered abnormally, and check whether the pins of U5 are soldered properly;
2. Check whether the four temperature sensors and their matching resistors (U5, R216, R219, R220; U7, R221~R223; U8, R224~R226; U9, R229~R231) are soldered abnormally, and whether the 3.3V supply of the temperature sensors is normal. The locations of the temperature sensors are shown in the figure; the temperature sensors are all on the back of the PCB, and the resistors are on the front and back of the PCB. Also check the soldering quality between the chips next to the sensors and their small heat sinks. Deformation of the large heat sink will cause poor chip heat dissipation and affect the temperature difference.
Phenomenon 2: The single hash board test does not detect all chips (PT1 / PT2 stations)
1. LCD shows ASICNG: (0). First, measure the total domain voltage and confirm that the boost circuit 20V is normal. Then use the short-circuit probe to short the RO test point to the 1V2 test point between the first and second chips, and run the program to find the chips. Check the serial port log; if 0 chips are still found, it is one of the following situations:
1) Use the Fluke 15B+ multimeter to check that the voltages at the 1V2 and 0V8 test points are 1.2V and 0.8V respectively. If not, either the 1.2V or 0.8V LDO circuit of this domain is abnormal, or the two ASIC chips of this domain are not soldered well. In most cases this is caused by short-circuited 0.8V or 1.2V SMD filter capacitors (measure the resistance of the relevant filter capacitors on the front and back of the PCBA).
2) Check whether the circuits of U1 and U2 are abnormal, such as resistor soldering, etc.
3) Check whether the pins of the first chip are soldered well (during repair it has been found that pins can look tinned when viewed from the side, yet carry no tin at all once the chip is removed).
2. If one chip is found in step 1, the 1st chip and the circuit before it are good. Check the following chips in the same way. For example, short the 1V2 test point to the RO test point between the 38th and 39th chips. If the log finds 38 chips, the first 38 chips have no problem; if 0 chips are still found, check the 1V2 first; if it is normal, it means that there is a problem with the chips after 38. Continue narrowing down with the dichotomy (binary search) method until the faulty chip is found. For example, if the Nth chip is faulty, shorting 1V2 to RO between the (N-1)th and Nth chips finds N-1 chips, but shorting 1V2 to RO between the Nth and (N+1)th chips finds no chips at all.
3. LCD shows ASIC113 (reporting 113): the hash board can detect 114 chips at the 115200 baud rate, but only 113 chips at the 12M baud rate; one chip cannot be found at 12M.
Repair method: using the dichotomy method, short the 1V2 test point to the RO test point between the 38th and 39th chips with the short-circuit probe. If the log finds 38 chips, the first 38 chips have no problem. If shorting at the 47th chip makes the log report only 46 chips, the 47th chip cannot be detected; if visual inspection shows no problem, the 47th chip should generally be replaced.
4. LCD shows ASICNG: (X), where a certain chip is reported consistently. There are two situations:
1) The first case: the test time is the same as for a good board (usually the value of X does not change between tests). Test time means the time from pressing the start test button until ASICNG: X is displayed on the LCD. This is most likely caused by abnormal soldering of the CLK, CI, and BO resistors before and after the Xth chip, so focus on these 6 resistors. Less likely, the pins of one of the three chips X-1, X, and X+1 are soldered abnormally;
2) The second case: the test time is almost twice that of a good board (sometimes the value of X changes with each test, and sometimes X = 0). The log then usually contains the following information (sometimes the red number is not 13, depending on which seat the test fixture is connected to). During the test, the domain voltages of nearly all domains before the abnormal position are below 0.3V, and those of nearly all domains after it are above 0.34V. This is caused by a chip not being soldered well; usually 1.2V, 0.8V, RXT, or CLK is not soldered well. Measuring the domain voltages directly is recommended to locate the faulty domain. The 1V2 and RO short-circuit method from item 1 can also locate the abnormal position;
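The dichotomy procedure from items 2 and 3 above is an ordinary binary search over chip positions. The sketch below follows the Nth-chip example in item 2: shorting 1V2 to RO after chip k finds exactly k chips only when chips 1 to k are all good. It assumes a single break in the chain and that the full board has already failed the chip test; the function names and the simulated fault are hypothetical, not real log data.

```python
from typing import Callable


def locate_faulty_chip(chip_count: int, prefix_ok: Callable[[int], bool]) -> int:
    """Return the 1-based position N of the first chip that breaks the chain.

    prefix_ok(k) must report whether shorting 1V2 to RO between chip k and
    chip k+1 makes the log find exactly k chips.
    """
    good, bad = 0, chip_count       # chips 1..good respond; the fault is in good+1..bad
    while bad - good > 1:
        mid = (good + bad) // 2
        if prefix_ok(mid):
            good = mid              # the first `mid` chips are fine, look further along
        else:
            bad = mid               # the fault is at or before chip `mid`
    return bad


# Simulated board with a hypothetical fault at chip 47.
FAULT = 47
probes = []


def simulated_probe(k: int) -> bool:
    probes.append(k)                # record each short-circuit position tried
    return k < FAULT


print(locate_faulty_chip(126, simulated_probe))   # position of the faulty chip
print(len(probes))                                # number of probe placements used
```

On a 126-chip board this needs at most 7 probe placements, compared with up to 125 when working chip by chip from the first one.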
Phenomenon 3: Single hash board Pattern NG, indicating that the response nonce data is incomplete (PT2 station)
Pattern NG is caused by a significant difference between the characteristics of one chip and those of the other chips. So far, the cause found has been a damaged chip die, so simply replace the chip. According to the log information, the replacement rules are as follows:
If the appearance of the chip is not damaged, replace the chip with the lowest response rate in each domain. For example, the following figure shows one of the test logs; it can be seen from the log that the response rate of four chips asic      is low. Therefore, replace the 61st (62nd) chip.
PS: Note that the domain and asic numbers start from 0.
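The replacement rule and the zero-based numbering can be illustrated with a short sketch. It assumes the per-chip response rates have already been read out of the log into a list indexed by asic number, and that each domain holds 3 consecutively numbered chips; the log format itself is not shown in this guide, so the input data, the rates, and the function names are all hypothetical.

```python
CHIPS_PER_DOMAIN = 3   # chips per domain, from Section IV


def weakest_chip_per_domain(response_rates: list[float],
                            chips_per_domain: int = CHIPS_PER_DOMAIN) -> dict[int, int]:
    """Map each 0-based domain number to the 0-based asic number with the lowest rate."""
    weakest = {}
    for start in range(0, len(response_rates), chips_per_domain):
        group = response_rates[start:start + chips_per_domain]
        offset = min(range(len(group)), key=group.__getitem__)
        weakest[start // chips_per_domain] = start + offset
    return weakest


def ordinal_position(asic_number: int) -> int:
    """Convert a 0-based asic number from the log into a 1-based chip position."""
    return asic_number + 1


rates = [0.99] * 126
rates[61] = 0.80                          # hypothetical weak chip, asic number 61
candidates = weakest_chip_per_domain(rates)
print(candidates[20], ordinal_position(candidates[20]))
```

Here asic number 61 in domain 20 corresponds to the 62nd chip on the board. Where all chips in a domain report the same rate, the function simply returns the first one, so only domains with a clearly low rate point to a replacement candidate.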
Phenomenon 4: The chip test is OK, but the serial port output of the PT2 function test does not stop (it keeps running for a long time)
Repair method: during the PT2 test, watch the serial port log. When the serial port starts running for a long time, use the short-circuit probe to short RO to 1.2V, starting from the first chip. If the serial port stops its long run after the short, the first chip is OK. Continue in the same way until you find the chip at which the long-run failure persists after shorting. This is generally caused by a damaged chip, so simply replace it;
Phenomenon 5: The PT1 chip test is OK, but the PT2 function test always reports a certain chip NG;
Repair method: check the appearance and measure the chip capacitors and resistors in front of the reported chip. The cause is usually poor chip soldering, a damaged chip capacitor or resistor, or an abnormal resistance value;
VI. Problems Caused by Control Board Faults
1. The whole miner does not work
1) Check whether the voltages at several voltage output points are normal. U8 can be disconnected first if 3.3V is short - circuited. If it is still short - circuited, the CPU can be unplugged for measurement. For other voltage abnormalities, generally replace the corresponding converter IC;
2) If the voltage is normal, please check the welding status of the DDR/CPU;
3) Try to update the flash program with an SD card;
For a miner whose control board has been reflashed to start normally, the following two steps are required:
a. After the card recovery passes, the green LED indicator stays on; then restart the power;
b. After powering on again, wait 30s (the time needed to turn on OTP).
OTP (One Time Programmable) is a memory type of MCU: after the program is programmed into the IC, it cannot be changed or erased.
1) A sudden power failure during OTP, or a wait of less than 30s, will cause the control board to fail to turn on the OTP function. If the control board then cannot start (is not networked), U1 (the main control IC FBGA of the control board) must be replaced, and the removed U1 can no longer be used in the 19 series.
2) For a control board with the OTP function turned on, U1 cannot be used on other series of models;
2. The whole miner cannot find the IP
The IP most likely cannot be found because the control board is not operating normally; refer to item 1 for troubleshooting. Check the appearance and soldering of the network port, network transformer T1, and CPU.
3. The whole miner cannot be upgraded
Check the appearance and welding of the network port, network transformer T1, and CPU.
4. The whole miner fails to load the hash board or detects fewer chains
1) Check the cable connection status.
2) Check the parts of the control board corresponding to the chain.
3) Check the wave soldering quality of the plug-in pins and the resistors around the plug-in interface.
VII. Failure Phenomenon of the Whole Miner
IP cannot be detected, the number of fans is abnormal, and the chain is abnormal. If the test is abnormal, follow the monitoring interface and test log prompts for maintenance.
1. Fan display is abnormal
Check whether the fan is working normally, whether the connection with the control board is normal, and whether the control board is abnormal.
2. Missing chain
A missing chain means that 1 of the 3 hash boards is not detected. In most cases there is a problem with the connection between the hash board and the control board. Check the cable for an open circuit. If the connection is OK, run the single-board PT2 test on the hash board. If it passes, the problem is most likely on the control board. If it fails, repair the board using the PT2 repair method.
3. Abnormal temperature
This generally means the temperature is too high. The monitoring system requires the PCB temperature not to exceed 90 degrees; above 90 degrees, the miner raises an alarm and stops working normally. The usual cause is a high ambient temperature. Abnormal fan operation will also cause abnormal temperature.
4. Not all chips can be found (the miner boots and runs, but the hash rate is 2/3 or 1/3 of the normal value).
If not all chips are found, refer to the PT2 test for testing and repair.
5. After operating for a while, there is no hash rate
The connection to the mining pool has been interrupted; check the network;
6. Status of normal miner:
7. One hash board has a low hash rate
In this situation, log in to the miner's IP with the Putty software and observe whether the domain working voltage of this hash board and the NONCE return are normal. Then repair the board according to the Putty log.
How to use Putty:
tail -f /tmp/nonce.log (NONCE print command)
tail -f /tmp/adc.log (domain voltage print command)
The specific operations are as follows:
1) Open Putty, enter the IP of the miner in question, and click OPEN.
2) Enter the user name, password, and test command to check the NONCE response status and the state of the voltage domains. If the NONCE or the domain voltage is abnormal, measure and repair based on the abnormal chip printed in the log.
VIII. Other Matters Needing Attention
1. Routine inspection: First, visually inspect the hash board to be repaired for PCB deformation or scorching, and for parts with obvious burn marks, parts knocked out of position, or missing parts; any such defect must be dealt with first. Second, after the visual inspection passes, test the impedance of each voltage domain to detect any short circuit or open circuit; if one is found, it must be dealt with first. Finally, check whether the voltage of each domain is about 0.32V.
2. After the routine inspection passes (its short-circuit test is essential, to avoid burning the chip or other parts due to a short circuit at power-on), the chip test can be performed with the test fixture. The fault can be located according to the test fixture's result.
3. Based on the test fixture result, start from the chips near the faulty one and check the chip test points (CO / NRST / RO / XIN / BI) and voltages such as VDD0V8 and VDD1V2.
4. According to the signal flow, the RX signal is transmitted in reverse (from chip No.126 to chip No.1), while the other signals, CLK, CO, BO, and RST, are transmitted forward (1 - 126). The abnormal fault point can be found through the power supply sequence.
5. Once the faulty chip is located, re-solder it first. Add flux around the chip (preferably no-clean flux) and heat the solder joints of the chip pins until the solder melts, so that the chip pins and pads re-wet and draw in the solder, achieving the effect of re-tinning. If the fault remains after re-soldering, replace the BM1362 chip directly.
6. A repaired hash board can be judged good only after two or more passes on the hash board tester. For the first test, after replacing the parts, wait for the hash board to cool down and then test it with the hash board tester; after it passes, set it aside to cool. After a few minutes, once the hash board has cooled, test it a second time.