Basic Concepts and System Requirements of BMS Communication Protocols
In the Battery Management System (BMS), the communication protocol is the core link for realizing cell state data acquisition, control command transmission, and system collaborative operation. It not only determines the real-time performance and reliability of information interaction but also directly affects the system’s scalability and functional safety level.
As power batteries develop towards high energy density and high integration, the communication requirements within the BMS and between the BMS and external controllers (such as the VCU and charging stations) are becoming increasingly complex, requiring protocols to feature multi-node support, low-latency response, strong anti-interference capabilities, and flexible topological adaptability. This chapter starts from the essence of the communication architecture to analyze the core demands of the BMS on the communication system, laying a theoretical foundation for the subsequent technology selection and scenario-based design of mainstream protocols.
Technical Principles and Characteristics Analysis of Mainstream Communication Protocols
In the architecture design of a Battery Management System (BMS), the choice of communication protocol directly determines the system’s real-time performance, reliability, scalability, and power consumption performance. With power battery systems evolving towards high energy density, multi-module integration, and intelligent management, the communication subsystem must be able to support high-speed data interaction, low-latency response, and strong anti-interference capabilities.
Currently, the communication protocols widely adopted in the BMS field mainly include CAN (Controller Area Network), I2C (Inter-Integrated Circuit), and SPI (Serial Peripheral Interface). These three protocols each have unique characteristics and are suitable for data transmission requirements at different levels.
From the system-level main controller to the single-cell monitoring unit, and from short-distance intra-board communication to cross-module long-distance interconnection, different physical layer constraints and functional requirements have given rise to diverse protocol selection strategies. For example, the CAN protocol, with its powerful error handling mechanism and multi-node arbitration capability, has become the de facto standard for communication between the central controller and slave modules of an automotive BMS;
whereas I2C is widely used in small battery packs or sensor data acquisition scenarios due to its few pins and simple wiring; SPI, with its advantages of full-duplex operation and high bandwidth, demonstrates irreplaceable value in situations that require rapid reading of large amounts of sampling data.
This chapter will deeply analyze the technical implementation principles of these three mainstream communication protocols and comprehensively parse their working mechanisms and performance boundaries by combining the signal timing, electrical characteristics, and protocol overhead in actual applications.
By systematically breaking down the core structure, data transmission process, fault tolerance mechanism, and applicable scenarios of each protocol, this chapter will help engineers establish a clear cognitive framework for protocols and provide theoretical support for performance comparisons and scenario-based designs in subsequent chapters.
Architecture and Working Mechanism of CAN Bus Protocol
Since the CAN bus was proposed by Bosch in the 1980s, it has been widely applied in automotive electronics, industrial control, and embedded systems, playing a core role especially in the BMS field. The key to its ability to operate stably in complex electromagnetic environments lies in its unique non-destructive arbitration mechanism, broadcast communication model, and robust error detection capabilities. Understanding the underlying working mechanism of CAN is a prerequisite for building a highly reliable BMS communication network.
Data Frame Structure and Arbitration Mechanism
The CAN protocol defines multiple frame types, the most important of which is the Data Frame, used to carry actual application data. A standard CAN 2.0A data frame contains the following fields:
| Field | Length (bit) | Description |
| --- | --- | --- |
| Start of Frame (SOF) | 1 | Marks the beginning of a frame, dominant bit |
| Arbitration Field (Identifier + RTR) | 11 + 1 | Contains an 11-bit identifier ID and a remote transmission request bit |
| Control Field | 6 | Includes IDE, r0, and DLC (Data Length Code) |
| Data Field | 0~64 | Actual payload data, up to 8 bytes (Classic CAN) |
| CRC Field | 15 + 1 | Cyclic Redundancy Check value and delimiter |
| ACK Field | 1 + 1 | Acknowledgment slot and delimiter |
| End of Frame (EOF) | 7 | Marks the end of the frame |
// Example: CAN data frame structure definition (C struct)
#include <stdint.h>

typedef struct {
    uint32_t id;       // Identifier: 11-bit (standard) or 29-bit (extended)
    uint8_t  rtr;      // Remote transmission request flag (1 = remote frame)
    uint8_t  dlc;      // Data length code (0-8)
    uint8_t  data[8];  // Payload data
} can_frame_t;
Code Logic Analysis:
- `id` is the key field determining message priority. The smaller the value, the higher the priority.
- `rtr` indicates whether it is a Remote Transmission Request frame, which simply requests data without sending any.
- `dlc` explicitly specifies the number of valid data bytes, avoiding incorrect parsing of invalid content by the receiver.
- The entire structure is compact, making it suitable for memory management in embedded systems.
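To make the role of each field concrete, the following minimal sketch fills such a frame with cell voltages. The helper name `pack_cell_voltages`, the millivolt unit, and the big-endian byte order are illustrative assumptions, not part of any CAN standard or of a specific BMS.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t id;       /* 11-bit identifier in this sketch */
    uint8_t  rtr;      /* 0 = data frame */
    uint8_t  dlc;      /* number of valid payload bytes */
    uint8_t  data[8];  /* payload */
} can_frame_t;

/* Hypothetical helper: packs up to four cell voltages (in mV) into one frame,
 * two bytes per cell, most significant byte first. Returns 0 on success. */
static int pack_cell_voltages(can_frame_t *f, uint32_t id,
                              const uint16_t *mv, uint8_t count)
{
    if (f == NULL || mv == NULL || count > 4u) {
        return -1;                      /* more than 8 payload bytes would be needed */
    }
    memset(f, 0, sizeof(*f));
    f->id  = id & 0x7FFu;               /* keep only the 11 identifier bits */
    f->rtr = 0;
    f->dlc = (uint8_t)(count * 2u);     /* receiver reads exactly this many bytes */
    for (uint8_t i = 0; i < count; i++) {
        f->data[2u * i]      = (uint8_t)(mv[i] >> 8);
        f->data[2u * i + 1u] = (uint8_t)(mv[i] & 0xFFu);
    }
    return 0;
}
```

Because `dlc` is derived from the number of cells, a receiver never has to guess how many of the eight payload bytes carry valid data.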
Detailed Explanation of Arbitration Mechanism
CAN uses the Carrier Sense Multiple Access with Collision Detection and Arbitration on Message Priority (CSMA/CD+AMP or CSMA/ND) mechanism to solve the conflict of multiple nodes transmitting simultaneously. All nodes monitor the bus state before transmitting (and transmit if it is idle); when multiple nodes initiate communication simultaneously, the node that gains bus access is determined by comparing the identifier ID bit by bit.
Assume Node A transmits ID = 0x100 (binary: 001 0000 0000), and Node B transmits ID = 0x105 (binary: 001 0000 0101). They drive the bus bit by bit starting from the highest bit:
- The first 8 identifier bits (ID10 down to ID3) are identical, so neither node notices the other.
- At the 9th bit (ID2), A transmits 0 (dominant) and B transmits 1 (recessive).
- Because a dominant bit overrides a recessive bit, the bus level shows as 0. At this point, B detects that it transmitted a 1 but the bus is 0, indicating that a node with higher priority is communicating, so it automatically exits the transmission and switches to receive mode. The entire process requires no retransmission and does not waste bandwidth, achieving “non-destructive” arbitration.
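The bitwise comparison can be reproduced with a short simulation. This is only a sketch of the principle for two 11-bit identifiers; the type `arb_result_t` and the function `can_arbitrate` are names invented for illustration, and a real controller performs this in hardware.

```c
#include <stdint.h>

/* Result of a simulated arbitration between two 11-bit identifiers. */
typedef struct {
    uint16_t winner_id;    /* identifier that keeps the bus */
    int      decided_bit;  /* ID bit index (10..0) where the loser backed off, -1 if equal */
} arb_result_t;

static arb_result_t can_arbitrate(uint16_t id_a, uint16_t id_b)
{
    arb_result_t r = { id_a, -1 };
    for (int bit = 10; bit >= 0; bit--) {        /* most significant bit is sent first */
        unsigned a   = (id_a >> bit) & 1u;
        unsigned b   = (id_b >> bit) & 1u;
        unsigned bus = a & b;                    /* wired-AND: dominant 0 overrides recessive 1 */
        if (a != bus) {                          /* A sent recessive but reads dominant */
            r.winner_id = id_b;
            r.decided_bit = bit;
            return r;
        }
        if (b != bus) {                          /* B sent recessive but reads dominant */
            r.winner_id = id_a;
            r.decided_bit = bit;
            return r;
        }
    }
    return r;                                    /* identical identifiers: no decision */
}
```

For the identifiers 0x100 and 0x105, the function reports 0x100 as the winner with the decision taken at ID bit 2, the first position where the two values differ.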
This mechanism ensures that critical alarm information (such as overvoltage and overtemperature) can be assigned a low ID value to gain the highest priority, guaranteeing the system’s real-time response capability. For instance, in a BMS, a fault reporting frame can be set with ID = 0x100, while routine voltage sampling uses ID = 0x200, thereby ensuring that abnormal events are handled by the main control MCU at the earliest opportunity.
Additionally, CAN supports the Standard Format (11-bit ID) and the Extended Format (29-bit ID). The latter is suitable for more complex network topologies, but arbitration efficiency slightly decreases because the increased number of comparison bits lengthens arbitration time.
In summary, CAN’s data frame design balances simplicity and functionality, and its ID-based arbitration mechanism fundamentally solves the multi-master contention issue, providing an efficient and deterministic communication foundation for a distributed BMS.
Error Detection and Fault Tolerance Capability
In harsh automotive environments, electromagnetic interference, power fluctuations, and line aging can all lead to communication errors. Therefore, the CAN protocol has built-in multi-level error detection mechanisms to ensure data integrity.
The main error detection methods include:
| Detection Mechanism | Principle | Detectable Error Types |
| --- | --- | --- |
| Bit Error | The bus level read back by the transmitting node does not match its own output | Driver failure, noise interference |
| Stuff Error | Violation of bit stuffing rules (6 consecutive bits of the same polarity) | Synchronization loss, hardware anomaly |
| CRC Error | The CRC calculated by the receiver does not match the CRC in the frame | Data corruption during transmission |
| Form Error | Illegal level appears in fixed bits (e.g., EOF, ACK del) | Protocol violation |
| Ack Error | Transmitter does not detect an acknowledgment bit | Receiving node failure or bus disconnection |
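The stuff-error rule is easier to follow next to the transmit-side rule it checks: after five consecutive bits of the same polarity, the transmitter inserts one bit of the opposite polarity, so six identical bits in a row can only mean a fault. The sketch below models that insertion on an array holding one bit per byte; `can_stuff_bits` is a hypothetical name, and real controllers do this in hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Applies CAN-style bit stuffing to `in` (one bit per array element).
 * Returns the number of bits written to `out`, or 0 if `out` is too small. */
static size_t can_stuff_bits(const uint8_t *in, size_t n,
                             uint8_t *out, size_t out_cap)
{
    size_t  o    = 0;
    int     run  = 0;      /* length of the current run of identical bits */
    uint8_t last = 2;      /* 2 = no previous bit yet */

    for (size_t i = 0; i < n; i++) {
        uint8_t b = in[i] & 1u;
        if (o >= out_cap) return 0;
        out[o++] = b;
        run  = (b == last) ? run + 1 : 1;
        last = b;
        if (run == 5) {                 /* five equal bits: insert the complement */
            if (o >= out_cap) return 0;
            last = (uint8_t)(b ^ 1u);
            out[o++] = last;
            run = 1;                    /* the stuff bit starts a new run */
        }
    }
    return o;
}
```

A receiver removes the same bits again; if it ever sees a sixth identical bit where a stuff bit was expected, it raises a stuff error.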
Each CAN node maintains two internal error counters:
- Transmit Error Counter (TEC)
- Receive Error Counter (REC)
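These counters increase when errors are detected and decrease after successful transfers. When TEC or REC exceeds 127, the node enters the “Error Passive” state, in which it may only signal errors with passive (recessive) error flags and must wait an additional suspend period before starting a new transmission; if TEC exceeds 255, it enters the “Bus Off” state and disconnects from the bus to prevent it from affecting other nodes.
// Error state machine example (simplified version)
#include <stdint.h>

enum can_error_state {
    ERROR_ACTIVE,   // Normal operation
    ERROR_PASSIVE,  // Error passive
    BUS_OFF         // Bus off
};

// Platform-specific hooks, assumed to be provided elsewhere in the driver
void set_bus_state(enum can_error_state state);
void can_recover_from_busoff(void);

// Counters are passed as 16-bit values because TEC must exceed 255 to reach bus off
void can_check_error_status(uint16_t tec, uint16_t rec) {
    if (tec > 255) {
        set_bus_state(BUS_OFF);
        can_recover_from_busoff();  // Trigger recovery process
    } else if (tec > 127 || rec > 127) {
        set_bus_state(ERROR_PASSIVE);
    } else {
        set_bus_state(ERROR_ACTIVE);
    }
}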
Parameter Description and Logic Analysis
- `tec` and `rec` originate from hardware registers and reflect the cumulative number of errors.
- State transitions follow the ISO 11898 standard, ensuring behavioral consistency.
- `can_recover_from_busoff()` usually requires an external trigger or periodic attempts to resynchronize, reflecting a fault-tolerant design.
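To give a feel for how quickly a faulty transmitter is isolated, the sketch below applies the fault confinement rules of ISO 11898-1 as commonly summarized: each failed transmission adds 8 to the TEC, error passive begins above 127, and bus off begins above 255. The names `node_state_t`, `classify`, and `failures_until` are invented for this illustration, and the model deliberately ignores the decrements caused by successful frames.

```c
#include <stdint.h>

typedef enum { NODE_ERROR_ACTIVE, NODE_ERROR_PASSIVE, NODE_BUS_OFF } node_state_t;

/* Maps the two counters to a fault confinement state. */
static node_state_t classify(uint16_t tec, uint16_t rec)
{
    if (tec > 255u)               return NODE_BUS_OFF;
    if (tec > 127u || rec > 127u) return NODE_ERROR_PASSIVE;
    return NODE_ERROR_ACTIVE;
}

/* Counts how many consecutive failed transmissions (TEC += 8 each) it takes
 * to reach `target`, starting from TEC = 0 and with no successes in between. */
static unsigned failures_until(node_state_t target)
{
    uint16_t tec = 0;
    unsigned n   = 0;
    while (classify(tec, 0) < target) {
        tec += 8u;
        n++;
    }
    return n;
}
```

Under these assumptions a node becomes error passive after 16 consecutive failed transmissions and goes bus off after 32, which is why a persistently faulty slave board drops off the bus within a few milliseconds at typical bit rates.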
Moreover, the CAN physical layer utilizes differential signaling (CAN_H and CAN_L), providing good common-mode rejection and effectively resisting electromagnetic interference (EMI). Combined with terminating resistors (typically 120 Ω at each end of the bus) to match impedance, this reduces signal reflection and further improves communication stability.
In BMS applications, such fault-tolerant mechanisms are crucial. For instance, if a slave board suffers intermittent communication failure due to PCB moisture, the fault confinement rules move that node into the error-passive or bus-off state, so it backs off or temporarily goes offline rather than paralyzing the entire communication link. The main control MCU can assess node health via periodic heartbeat monitoring and initiate diagnostics or switch to a backup path.
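Heartbeat monitoring on the main controller can be as simple as remembering when each slave was last heard. The following is a minimal sketch under stated assumptions: the node count, the millisecond tick source, and all names (`heartbeat_monitor_t`, `heartbeat_feed`, and so on) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 8u   /* assumed upper bound on slave modules */

typedef struct {
    uint32_t last_seen_ms[MAX_NODES];  /* tick of the last frame from each node */
    uint32_t timeout_ms;               /* silence longer than this means "lost" */
} heartbeat_monitor_t;

static void heartbeat_init(heartbeat_monitor_t *m, uint32_t now_ms, uint32_t timeout_ms)
{
    for (uint32_t i = 0; i < MAX_NODES; i++) {
        m->last_seen_ms[i] = now_ms;
    }
    m->timeout_ms = timeout_ms;
}

/* Call whenever a frame from `node` is received. */
static void heartbeat_feed(heartbeat_monitor_t *m, uint8_t node, uint32_t now_ms)
{
    if (node < MAX_NODES) {
        m->last_seen_ms[node] = now_ms;
    }
}

/* Unsigned subtraction keeps the comparison valid across tick counter wrap-around. */
static bool heartbeat_is_alive(const heartbeat_monitor_t *m, uint8_t node, uint32_t now_ms)
{
    if (node >= MAX_NODES) {
        return false;
    }
    return (uint32_t)(now_ms - m->last_seen_ms[node]) <= m->timeout_ms;
}
```

A node that has gone bus off simply stops feeding its entry, so the main controller sees it time out and can start diagnostics.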
High Real-Time Performance and Multi-Node Communication Advantages
Another major advantage of the CAN protocol is its support for a multi-master architecture and broadcast communication, making it highly suitable for collaborative work between a Master and multiple Slaves in a BMS.
In a typical electric vehicle BMS architecture, the main control MCU polls the voltage, temperature, and SOC information of each battery module via the CAN bus, while simultaneously issuing equalization commands or configuration parameters. Because all nodes share the same bus, any node can transmit emergency alarm frames at any time without waiting for a polling cycle, immensely improving the system’s response speed.
For example, consider the following communication scenario:
| Node | ID Allocation | Function |
| --- | --- | --- |
| Main MCU | 0x100 | Main controller, receives data, issues commands |
| Slave_1 | 0x201 | Module 1, uploads voltage/temperature |
| Slave_2 | 0x202 | Module 2, uploads voltage/temperature |
| Fault_Monitor | 0x050 | Safety monitoring, highest priority |
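When a single-cell overvoltage is detected on Module 1, the alarm is sent with the high-priority safety identifier (ID=0x050) rather than with Slave_1's routine reporting ID (0x201), which would lose arbitration to most other traffic. Suppose the Main MCU is sending an equalization command to Slave_2 (ID=0x100) at the same time. Because 0x050 < 0x100, the alarm frame has the higher priority: it wins arbitration if both frames start together, and otherwise seizes the bus at the next bus idle period, achieving millisecond-level alarm response.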
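To quantify its real-time performance, the maximum delay can be approximated with the following simplified first-order estimate:
$$T_{max} \approx N_{higher} \times L_{frame} \times T_{slot}$$
Where:
- $T_{slot}$: Single bit time (e.g., at a baud rate of 500 kbps, $T_{slot} = 2\mu s$)
- $N_{higher}$: The number of active frames with a higher priority than the current frame
- $L_{frame}$: The average frame length in bits
Assuming there are 3 higher-priority periodic tasks in the system (such as safety monitoring, braking signals, etc.), and the average frame length is 128 bits, the worst-case delay is approximately:
$$T_{max} \approx 3 \times 128 \times 2\mu s = 768\mu s$$
This estimate ignores the blocking caused by one lower-priority frame that may already be on the bus, which adds at most one more frame time (256 μs here). Either way the delay stays on the order of 1 ms, which meets the real-time control requirements of most BMSs.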
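The estimate can be checked numerically. The sketch below assumes the simple model in which a frame waits for every higher-priority frame once, so the delay is the number of higher-priority frames times the frame length times the bit time; this model and the function names are assumptions for illustration, not a full schedulability analysis.

```c
#include <stdint.h>

/* Bit time in nanoseconds for a given bit rate in bit/s (0 if the rate is invalid). */
static uint32_t can_bit_time_ns(uint32_t bitrate_bps)
{
    return (bitrate_bps == 0u) ? 0u : 1000000000u / bitrate_bps;
}

/* First-order worst-case queuing delay in microseconds:
 * n_higher frames of frame_bits bits each must pass before ours is sent. */
static uint32_t can_worst_case_delay_us(uint32_t n_higher, uint32_t frame_bits,
                                        uint32_t bitrate_bps)
{
    uint64_t ns = (uint64_t)n_higher * frame_bits * can_bit_time_ns(bitrate_bps);
    return (uint32_t)(ns / 1000u);
}
```

With 3 higher-priority frames of 128 bits at 500 kbps, `can_worst_case_delay_us(3, 128, 500000)` evaluates to 768 µs; halving the bit rate doubles the figure.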
Furthermore, CAN supports flexible network topologies, connecting up to 110 nodes (limited by bus load), and implements selective reception through ID filtering to reduce CPU processing overhead. Modern BMS designs often utilize a layered CAN network, such as:
- High-speed CAN (500 kbps ~ 1 Mbps) for communication between the main controller and slave controllers.
- Low-speed fault-tolerant CAN (125 kbps) for vehicle body network interaction.
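Selective reception through ID filtering usually follows an identifier-plus-mask scheme: mask bits set to 1 must match, mask bits set to 0 are ignored. The sketch below shows the comparison in software for 11-bit identifiers; the struct and function names are hypothetical, and hardware filter banks are configured through controller-specific registers instead.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t id;    /* reference identifier */
    uint16_t mask;  /* 1 = this bit must match, 0 = don't care */
} can_filter_t;

/* Returns true when rx_id agrees with the filter on every masked bit. */
static bool can_filter_match(const can_filter_t *f, uint16_t rx_id)
{
    return (((rx_id ^ f->id) & f->mask) & 0x7FFu) == 0u;
}
```

For example, a filter with `id = 0x200` and `mask = 0x7F0` accepts the sixteen identifiers 0x200 to 0x20F, so the main controller receives all module reports in that range while frames such as 0x050 never reach the CPU through this filter.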
In conclusion, relying on its excellent real-time performance, multi-master support, and robust error handling mechanism, the CAN protocol has become the preferred solution for backbone communication in a BMS. The next section will turn to another lightweight protocol—I2C—to discuss its applicability in resource-constrained environments.
Implementation Principles and Applicable Scenarios of I2C Protocol
I2C (Inter-Integrated Circuit) is a serial communication protocol developed by Philips (now NXP) in the 1980s. Because it requires only two signal lines (SDA data line and SCL clock line) to achieve multi-device interconnection, it is widely used in sensor interfaces, EEPROM communication, and connections for small battery management ICs. In consumer electronics and small BMS modules, I2C occupies an important position due to advantages like pin savings and low hardware costs.
Master-Slave Architecture and Addressing Mechanism
I2C adopts a strict Master-Slave architecture, where communication and clock signal (SCL) generation are always initiated and controlled by the Master, and the Slave can only respond to requests. Multiple slave devices can be mounted on a single I2C bus, with each device possessing a unique 7-bit or 10-bit address.
The communication process is as follows:
- The Master sends a Start Condition: While SCL is high, SDA is pulled from high to low.
- Send the Slave address (7 bits) + Read/Write bit (R/W).
- The Slave returns an ACK (low-level acknowledgment).
- Data transmission (1 byte at a time, followed by an ACK).
- Stop Condition: While SCL is high, SDA is pulled from low to high.
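// I2C write operation example (using the STM32 HAL library)
// Assumes the device-specific HAL header is included and hi2c1 is initialized elsewhere
extern I2C_HandleTypeDef hi2c1;

HAL_StatusTypeDef write_to_slave(uint8_t dev_addr, uint8_t reg, uint8_t value) {
    uint8_t tx_data[2] = {reg, value};  // Register address first, then the data byte
    // HAL expects the 7-bit address shifted left by one; the timeout is 100 ms
    return HAL_I2C_Master_Transmit(&hi2c1, (uint16_t)(dev_addr << 1), tx_data, 2, 100);
}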
Parameter Description:
- `dev_addr`: 7-bit slave address (e.g., 0x55).
- `<< 1`: Left-shift by one bit, reserving the lowest bit for the R/W bit (0=write, 1=read).
- `tx_data`: Transmit the register address first, then the data.
- `HAL_I2C_Master_Transmit`: Blocking transmit function, with a timeout of 100ms.
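The address shift is a frequent source of off-by-one-bit bugs, so it can help to isolate it in one small function. This is a hardware-independent sketch; `i2c_address_byte` is a hypothetical helper, not part of the HAL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Builds the first byte sent after the start condition:
 * bits 7..1 carry the 7-bit slave address, bit 0 is R/W (0 = write, 1 = read). */
static uint8_t i2c_address_byte(uint8_t addr7, bool read)
{
    return (uint8_t)(((addr7 & 0x7Fu) << 1) | (read ? 1u : 0u));
}
```

For the example address 0x55, the byte on the wire is 0xAA for a write and 0xAB for a read.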
Address Conflicts and Solutions
A common issue is the presence of multiple chips of the same model on the same bus (e.g., multiple INA226 current sensors). Their factory default addresses are identical, easily causing conflicts. Solutions include:
- Using versions that support address pin configuration (e.g., modifying the address by connecting the ADDR pin to VCC/GND).
- Adding an I2C multiplexer (such as the PCA9548A) to achieve channel isolation.
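With a PCA9548A, the master selects a downstream channel by writing a single control byte in which each bit enables one of the eight channels. The sketch below only computes the values involved; the helper names are hypothetical, and the base address 0x70 assumes the A2..A0 pins are tied low.

```c
#include <stdint.h>

#define PCA9548A_BASE_ADDR 0x70u   /* 7-bit address with A2..A0 = 000 */

/* 7-bit multiplexer address for a given A2..A0 strap value (0..7). */
static uint8_t pca9548a_address(uint8_t a2a1a0)
{
    return (uint8_t)(PCA9548A_BASE_ADDR | (a2a1a0 & 0x07u));
}

/* Control byte that enables exactly one channel (0..7).
 * An out-of-range channel yields 0, which disables all channels. */
static uint8_t pca9548a_select(uint8_t channel)
{
    return (channel < 8u) ? (uint8_t)(1u << channel) : 0u;
}
```

After writing `pca9548a_select(3)` to the multiplexer, only the devices behind channel 3 see subsequent transfers, so several sensors with the same address can coexist on different channels.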
In addition, I2C supports a General Call Address (0x00), which can be used for global configuration or waking up all slave devices, but it must be used with caution to prevent accidental operations.
Transmission Rate and Bus Load Limits
I2C defines various rate modes:
| Mode | Maximum Bit Rate | Typical Application Scenarios |
| --- | --- | --- |
| Standard Mode | 100 kbps | General sensor communication |
| Fast Mode | 400 kbps | High-frequency sampling requirements |
| Fast Plus Mode | 1 Mbps | High-speed ADC/DAC |
| High-Speed Mode | 3.4 Mbps | Special dedicated devices |
However, the actually achievable rate is limited by bus capacitance. The I2C specification states that the maximum capacitive load is 400pF. Every centimeter of PCB trace introduces about 1–2pF of capacitance; combined with device input capacitance (typically 10pF), long-distance wiring easily exceeds the standard.
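Calculation formula:
$$C_{bus} = C_{trace} \times L_{trace} + \sum_{i} C_{dev,i}$$
where $C_{trace}$ is the trace capacitance per unit length, $L_{trace}$ is the total trace length, and $C_{dev,i}$ is the input capacitance of each device on the bus.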
If it exceeds 400pF, measures must be taken:
- Shorten the trace length.
- Use devices with lower capacitance.
- Strengthen the pull-up by lowering the pull-up resistor value, or use an active pull-up circuit or bus buffer.
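Recommended pull-up resistor value estimation (the upper bound follows from the rise-time limit in the I2C specification):
$$R_{pull-up(max)} = \frac{t_r}{0.8473 \times C_{load}}$$
where $t_r$ is the maximum allowed rise time (1000 ns in Standard Mode, 300 ns in Fast Mode).
For example, for the 400kHz Fast Mode with $C_{load}=200pF$, $R_{pull-up(max)} \approx 1.77k\Omega$, so a standard value of about $1.5k\Omega$ is a reasonable starting point, provided it stays above the minimum set by the devices' sink current.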
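The capacitance budget and the pull-up bounds can be computed together. The sketch below uses the relations given in the I2C-bus specification, $R_{p(max)} = t_r / (0.8473 \times C_b)$ and $R_{p(min)} = (V_{DD} - V_{OL(max)}) / I_{OL}$; the function names and the example figures (trace length, device count, 3.3 V supply) are assumptions for illustration.

```c
/* Total bus capacitance in farads: trace contribution plus device inputs. */
static double i2c_bus_capacitance(double trace_cm, double pf_per_cm,
                                  unsigned devices, double pf_per_device)
{
    return (trace_cm * pf_per_cm + (double)devices * pf_per_device) * 1e-12;
}

/* Largest pull-up (ohms) that still meets the rise-time limit t_rise_s. */
static double i2c_rp_max(double t_rise_s, double c_bus_f)
{
    return t_rise_s / (0.8473 * c_bus_f);
}

/* Smallest pull-up (ohms) the open-drain outputs can pull low:
 * vol_max is the highest valid low level, iol the guaranteed sink current. */
static double i2c_rp_min(double vdd, double vol_max, double iol)
{
    return (vdd - vol_max) / iol;
}
```

As a usage example, 20 cm of trace at 1.5 pF/cm with four 10 pF devices gives 70 pF, well inside the 400 pF limit. For a 200 pF bus in Fast Mode (300 ns rise time), `i2c_rp_max` gives roughly 1.77 kΩ, and with a 3.3 V supply, 0.4 V low level, and 3 mA sink current, `i2c_rp_min` gives roughly 0.97 kΩ, so the resistor should be chosen between those two values.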
Typical Applications in Low-Power Design
In portable device BMSs, I2C is frequently used to connect fuel gauges (such as MAX17048) and temperature sensors (such as TMP117) alongside other low-power peripherals. These devices support sleep mode and interrupt-triggered wake-up; paired with I2C’s low quiescent current characteristic, this significantly extends standby time.
Typical design:
- Use open-drain output + external pull-up (an optional NMOS switch can control the pull-up power supply).
- Turn off the SDA/SCL pull-ups during non-communication periods to enter a “Floating” state.
- The slave device notifies the main controller of events via the ALERT pin.
// Disable I2C pull-up before main controller goes to sleep
// PULLUP_EN_GPIO and PULLUP_EN_PIN are board-specific definitions for the pull-up supply switch
void enter_low_power_mode(void) {
    HAL_GPIO_WritePin(PULLUP_EN_GPIO, PULLUP_EN_PIN, GPIO_PIN_RESET);  // Disconnect pull-up
    __WFI();  // Wait for interrupt
}
Code Logic Analysis:
By using software to control the pull-up power supply, the I2C bus can be placed in a high-impedance state during idle periods, eliminating leakage current, which is particularly suitable for coin-cell battery-powered systems.
In conclusion, I2C plays an essential role in small BMSs thanks to its simplicity and low pin footprint, though attention must be paid to issues regarding rate, load, and address management.
Data Transmission Mechanism and Performance Characteristics of SPI Protocol
SPI (Serial Peripheral Interface) is a high-speed, full-duplex, synchronous serial communication protocol commonly used for connections between microcontrollers and high-speed peripherals like Flash memory, ADCs, and LCD screens. In a BMS, SPI is routinely employed for high-speed data exchange between the main control MCU and battery monitoring ASICs (such as LTC6811 and BQ76952).
Full-Duplex Synchronous Communication and Pin Configuration
SPI uses four signal lines:
- MOSI (Master Out Slave In)
- MISO (Master In Slave Out)
- SCLK (Serial Clock)
- CS/SS (Chip Select)
Communication entails the Master driving SCLK and controlling the CS pin to select a slave device. Data is transmitted synchronously on the clock edges, supporting full-duplex functionality—meaning the master and slave can transmit and receive simultaneously.
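// SPI read/write using STM32 LL library
// Assumes the device LL SPI header is included and SPI1 is already configured and enabled
uint8_t spi_transfer_byte(uint8_t tx_data) {
    while (!LL_SPI_IsActiveFlag_TXE(SPI1));   // Wait until the transmit buffer is empty
    LL_SPI_TransmitData8(SPI1, tx_data);      // Writing the byte starts the clock
    while (!LL_SPI_IsActiveFlag_RXNE(SPI1));  // Wait until a byte has been received
    return LL_SPI_ReceiveData8(SPI1);         // Return the byte shifted in on MISO
}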
Parameter Description:
- `LL_SPI_IsActiveFlag_TXE`: Checks if the transmit buffer is empty.
- `LL_SPI_TransmitData8`: Writes a byte to the DR register.
- Writing the byte automatically triggers clock pulses; MISO data is shifted in synchronously.
Advantages:
- Can reach speeds up to 50+ Mbps (depending on the MCU and peripheral capabilities).
- No address overhead, meaning low communication overhead.
- Supports DMA acceleration to reduce CPU load.
Disadvantages:
- Requires an extra CS pin for each additional slave device (or the use of a decoder).
- Does not support multi-master configurations.
- Lacks a built-in error detection mechanism (relies on application-layer checks).
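Because SPI itself carries no checksum, an application-layer check is typically appended to each transfer. The sketch below uses a generic CRC-8 (polynomial 0x07, initial value 0x00) purely as an illustration; actual monitoring ICs define their own check, so the polynomial, the frame layout with the CRC in the last byte, and the function names are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-8, polynomial x^8 + x^2 + x + 1 (0x07), initial value 0x00. */
static uint8_t crc8(const uint8_t *data, size_t len)
{
    uint8_t crc = 0x00;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x80u) ? (uint8_t)((crc << 1) ^ 0x07u)
                                : (uint8_t)(crc << 1);
        }
    }
    return crc;
}

/* Assumed layout: payload bytes followed by one CRC byte. */
static int frame_is_valid(const uint8_t *frame, size_t len)
{
    return len >= 2 && crc8(frame, len - 1) == frame[len - 1];
}
```

The sender appends `crc8(payload, n)` to the payload, and the receiver discards any frame for which `frame_is_valid` returns 0, then requests the data again.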
Flexible Control of Clock Polarity and Phase
SPI supports four operating modes, determined by CPOL (Clock Polarity) and CPHA (Clock Phase):
| Mode | CPOL | CPHA | Sampling Edge | Output Edge |
| --- | --- | --- | --- | --- |
| 0 | 0 | 0 | Rising edge | Falling edge |
| 1 | 0 | 1 | Falling edge | Rising edge |
| 2 | 1 | 0 | Falling edge | Rising edge |
| 3 | 1 | 1 | Rising edge | Falling edge |
For example, the LTC6811 uses Mode 3 (CPOL=1, CPHA=1), which demands matching configurations on the main controller; otherwise, it will not be able to communicate correctly.
// Configure SPI mode (LL library)
LL_SPI_InitTypeDef spi_init;
LL_SPI_StructInit(&spi_init);                   // Start from defaults so no field is left uninitialized
spi_init.Mode = LL_SPI_MODE_MASTER;
spi_init.ClockPolarity = LL_SPI_POLARITY_HIGH;  // CPOL = 1
spi_init.ClockPhase = LL_SPI_PHASE_2EDGE;       // CPHA = 1
LL_SPI_Init(SPI1, &spi_init);
Code Logic Analysis:
The SPI mode must strictly match the mode specified in the slave device’s manual; otherwise, data misalignment or communication failure will occur.
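The relationship between the two configuration bits, the mode number, and the sampling edge is compact enough to express in code. The sketch below is hardware-independent, and its type and function names are invented for illustration.

```c
#include <stdint.h>

typedef enum { EDGE_RISING, EDGE_FALLING } spi_edge_t;

/* Mode number as conventionally defined: CPOL is the high bit, CPHA the low bit. */
static uint8_t spi_mode_number(uint8_t cpol, uint8_t cpha)
{
    return (uint8_t)(((cpol & 1u) << 1) | (cpha & 1u));
}

/* Data is sampled on the rising edge when CPOL equals CPHA (modes 0 and 3),
 * and on the falling edge otherwise (modes 1 and 2). */
static spi_edge_t spi_sampling_edge(uint8_t cpol, uint8_t cpha)
{
    return ((cpol ^ cpha) & 1u) ? EDGE_FALLING : EDGE_RISING;
}
```

A quick consistency check during bring-up is to compute the mode number from the values actually written to the peripheral and compare it with the mode named in the slave's datasheet.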
Bottleneck Analysis of High-Speed Short-Distance Communication
Although SPI boasts extremely high speeds, it faces challenges in multi-board cascading within a BMS:
- Trace Length Limitations: High-speed signals are susceptible to reflections and crosstalk; lengths ≤10cm are recommended.
- CS Signal Delay: In a daisy chain topology, CS propagation delays can impact timing.
- EMI Issues: Unshielded traces may radiate noise.
Solutions:
- Use differential SPI (like LVDS) to extend transmission distances.
- Add signal conditioning chips (such as the TI SN65LVDSxx).
- Adopt a Daisy Chain layout to reduce pin footprint.
In a daisy chain architecture, multiple monitoring ICs share SCLK and CS while the data path is connected in series: each device's data output feeds the next device's data input, and the last device returns the data to the MCU's MISO. This significantly conserves MCU pin resources.
In summary, SPI is the ideal choice for high-performance BMS data acquisition, but it requires meticulous PCB layout and timing control design.