SPRACS0A May 2020 – November 2022 TMS320F280048-Q1 , TMS320F280048C-Q1 , TMS320F280049 , TMS320F280049-Q1 , TMS320F280049C , TMS320F280049C-Q1 , TMS320F28033 , TMS320F28033-Q1 , TMS320F28035 , TMS320F28035-EP , TMS320F28035-Q1 , TMS320F28053 , TMS320F28055 , TMS320F2806-Q1 , TMS320F28065 , TMS320F28069 , TMS320F28069-Q1 , TMS320F28069F , TMS320F28069F-Q1 , TMS320F28069M , TMS320F28069M-Q1 , TMS320F28075 , TMS320F28075-Q1 , TMS320F28076 , TMS320F28374D , TMS320F28374S , TMS320F28375D , TMS320F28375S , TMS320F28375S-Q1 , TMS320F28376D , TMS320F28376S , TMS320F28377D , TMS320F28377D-EP , TMS320F28377D-Q1 , TMS320F28377S , TMS320F28377S-Q1 , TMS320F28378D , TMS320F28378S , TMS320F28379D , TMS320F28379D-Q1 , TMS320F28379S , TMS320F28384D , TMS320F28384D-Q1 , TMS320F28384S , TMS320F28384S-Q1 , TMS320F28386D , TMS320F28386D-Q1 , TMS320F28386S , TMS320F28386S-Q1 , TMS320F28388D , TMS320F28388S , TMS320F28P650DH , TMS320F28P650DK , TMS320F28P650SH , TMS320F28P650SK , TMS320F28P659DH-Q1 , TMS320F28P659DK-Q1 , TMS320F28P659SH-Q1
Enabling extremely high-performance computation and efficient processing is critical for solving today’s complex real-time control problems. Real-time control systems are closed-loop systems with a tight time window in which to gather data, process it, and update the system to meet the performance objectives. TI’s Control Law Accelerator (CLA) is designed to execute real-time control algorithms in parallel with the C28x CPU, effectively doubling the computational performance of C2000 devices. This application report discusses some of the unique features of the CLA and demonstrates them using simple software examples. These stand-alone examples are available as part of C2000Ware and can be used to quickly explore and evaluate the capabilities of the CLA.
C2000 is a trademark of Texas Instruments.
All trademarks are the property of their respective owners.
The CLA is a fully programmable, independent 32-bit floating-point CPU designed for math-intensive computations, offering a significant boost to the performance of control algorithms. Unlike a traditional processor, which executes instructions and services interrupts, the CLA is a task-driven machine that supports up to eight user-defined tasks. In addition to computational capability, the CLA provides a unique combination of minimal latency and direct access to the key control peripherals. This makes the CLA ideal for implementing fast control loops, freeing up C28x bandwidth for additional control loops and for diagnostic and communication tasks. The following sections of this application report discuss these capabilities in detail and demonstrate them through simple software examples provided as part of the C2000Ware package [2]. For more details on the CLA architecture and instruction set, see [1], [3].
The examples discussed in this document can be found in C2000Ware v3.01.00.00 or later, located within the following directories after installation:
The discussed example projects are:
Most real-time control algorithms can be split into three main tasks: exciting the system, sampling the system, and controlling the system. Exciting the system involves updating the PWM registers, sampling the system involves reading the ADC result registers, and controlling the system involves the control-loop math. As an independent math processor, the CLA can also directly access the registers of the key peripherals used in control applications, such as EPWM, ADC, ECAP, EQEP, and CMPSS. The CLA can therefore perform sampling and actuation as well as the control computation, executing the entire control task without any C28x involvement.
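To make the excite/sample/control split concrete, here is a host-side C sketch of one control-task iteration. It is an illustration only, not code from the C2000Ware examples: the peripheral registers are simulated by a hypothetical `SimRegs` structure, and a simple proportional law with hypothetical gain and setpoint stands in for the control math. On a device, the CLA task would read the memory-mapped ADC result register and write the EPWM compare register directly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the peripheral registers the CLA task touches.
 * On a device these would be the memory-mapped ADC result register and
 * the EPWM CMPA register, accessed directly from the CLA. */
typedef struct {
    uint16_t adc_result;  /* latest 12-bit conversion result (0..4095) */
    uint16_t cmpa;        /* PWM compare value written by the task     */
} SimRegs;

#define PWM_PERIOD 500U     /* hypothetical PWM period in counts  */
#define SETPOINT   2048.0f  /* hypothetical target ADC reading    */
#define KP         0.0002f  /* hypothetical proportional gain     */

/* One iteration of a self-contained control task:
 * sample the ADC, compute a new duty, then actuate the PWM.
 * Returns the updated duty so the caller can keep it as state. */
static float control_task(SimRegs *regs, float duty)
{
    assert(regs != 0);

    /* Sample: read the latest conversion result */
    float measured = (float)regs->adc_result;

    /* Control: proportional correction, limited to 0.1..0.9 */
    duty += KP * (SETPOINT - measured);
    if (duty > 0.9f) duty = 0.9f;
    if (duty < 0.1f) duty = 0.1f;

    /* Excite: write the compare register, rounding to the nearest count */
    regs->cmpa = (uint16_t)(duty * PWM_PERIOD + 0.5f);
    return duty;
}
```

For example, starting from a duty of 0.5 with a reading 1000 counts below the setpoint, one call raises the duty to about 0.7 and writes a compare value of 350.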
The example “cla_ex4_pwm_control” shows how to control the PWM output directly from the CLA. The block diagram of this example is shown in Figure 2-1. EPWM1 is configured to generate complementary signals on both of its channels at a fixed frequency of 100 kHz, while EPWM4 is configured to trigger a periodic CLA control task at 10 kHz. CLA Task 1 implements very simple logic that varies the duty of the EPWM1 outputs: each iteration increases the duty by 0.1, and once it exceeds 0.9 it wraps back to 0.1, keeping it in the range 0.1-0.9. The code below shows how the existing C28x driverlib APIs (available as part of C2000Ware) can be used as-is within the CLA task to update the EPWM registers, so no CLA-specific driver code has to be written. A CLA task can access the key registers of other shared peripherals in the same way. Because CLA global variables cannot be initialized at their declaration in the .cla file, this example also shows a systematic way of initializing all the CLA global variables inside a dedicated CLA task (Task 8), which is triggered by C28x software during initialization.
__attribute__((interrupt)) void Cla1Task1 ( void )
{
    //
    // Uncomment this to debug the CLA while connected to the debugger
    //
    // __mdebugstop();

    //
    // Write to the COMPA register to realize a particular duty value.
    // duty (CLA global) and EPWM1_PERIOD are defined elsewhere in the
    // example; adding 0.5f before the cast rounds to the nearest count.
    //
    EPWM_setCounterCompareValue(EPWM1_BASE, EPWM_COUNTER_COMPARE_A,
                                (uint16_t)(duty * EPWM1_PERIOD + 0.5f));

    //
    // Update duty value and use the limiter: wrap back to 0.1 once
    // the duty exceeds 0.9
    //
    duty += 0.1f;
    duty = (duty > 0.9f) ? 0.1f : duty;

    //
    // Clear EPWM4 interrupt flag so that next interrupt can come in
    //
    EPWM_clearEventTriggerInterruptFlag(EPWM4_BASE);
}
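The duty arithmetic of `Cla1Task1` can be exercised on a host PC before it runs on the CLA. The sketch below mirrors that arithmetic with plain C functions; `EPWM1_PERIOD` is given a hypothetical value of 500 counts (the example's actual value is defined in its source), `task1_step()` and `duty_to_cmpa()` are hypothetical names, and `init_globals()` plays the role of the dedicated initialization task (Task 8) described above.

```c
#include <assert.h>
#include <stdint.h>

#define EPWM1_PERIOD 500U   /* hypothetical period; see the example source */

static float duty;          /* CLA global: no initializer at declaration */

/* Host stand-in for the initialization task (CLA Task 8) */
static void init_globals(void)
{
    duty = 0.1f;
}

/* Convert a duty fraction to a CMPA count, rounding to the nearest count */
static uint16_t duty_to_cmpa(float d)
{
    return (uint16_t)(d * EPWM1_PERIOD + 0.5f);
}

/* Host stand-in for one run of Cla1Task1: returns the CMPA value that
 * would be written, then advances duty by 0.1 and wraps it back to 0.1
 * once it exceeds 0.9 */
static uint16_t task1_step(void)
{
    uint16_t cmpa = duty_to_cmpa(duty);
    duty += 0.1f;
    duty = (duty > 0.9f) ? 0.1f : duty;
    return cmpa;
}
```

Calling `init_globals()` once and then `task1_step()` repeatedly produces compare values that start at 50 and never leave the 50 to 450 range for this hypothetical period.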
In any real-time control application, the sample-to-output delay, defined as the time that elapses between sensing, processing, and actuation, is an important system consideration. The low-latency architecture of the CLA reduces this delay while increasing overall system throughput. This is possible because the CLA is a task-driven rather than an interrupt-driven machine and does not use interrupts to synchronize with hardware. Instead, it supports up to eight independent tasks, each mapped to a hardware event such as a timer or data becoming available on an ADC. A task started on the CLA runs to completion without interruption or nesting, which eliminates the context-switching overhead typical of traditional interrupt-based processors. As a result, the CLA begins processing data with very little delay, which reduces the sample-to-output delay and enables faster system response. Figure 3-1 illustrates the differences between a task-driven machine (TDM) and an interrupt-driven machine (IDM).
The low task-start latency of the CLA can be combined with the early-interrupt feature of TI’s internal ADC to further reduce the sample-to-output delay. The ADC can be configured to generate an early interrupt pulse at the end of the sampling window, before the conversion completes. This early pulse can trigger a CLA task, allowing the CLA to read the result as soon as it is available in the ADC result register. This combination of just-in-time sampling and low CLA latency enables faster system response and higher-frequency control loops. The time available before the conversion completes can be used for any necessary pre-processing within the CLA task, as illustrated in Figure 3-2. The exact instruction at which the read should be placed to achieve a just-in-time read can be calculated from the CLA pipeline activity for an N-cycle ADC conversion. As shown in Figure 3-3, the N-2 instruction arrives in the R2 phase just in time to read the result register. For the standard 12-bit ADC configuration with a clock divider of 4, N is 42. To determine N for other ADC configurations, see the device-specific data sheet [4].
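The relationship between the ADC configuration, N, and the just-in-time read instruction can be written out as simple arithmetic. This sketch assumes that a 12-bit conversion lasts 10.5 ADCCLK cycles, which is consistent with the stated N = 42 at a divider of 4; that figure is an assumption here, and the actual value for each device and resolution must be taken from the data sheet [4]. Half-cycles keep the arithmetic in integers, and the function names are hypothetical.

```c
#include <assert.h>

/* Conversion length N in SYSCLK cycles.
 * adcclk_half_cycles: conversion time in ADCCLK half-cycles
 *   (ASSUMPTION: 21 half-cycles = 10.5 ADCCLK for 12-bit mode; check [4])
 * divider: integer SYSCLK-to-ADCCLK prescale, e.g. 4 */
static unsigned conversion_cycles(unsigned adcclk_half_cycles, unsigned divider)
{
    /* The product must correspond to a whole number of SYSCLK cycles */
    assert((adcclk_half_cycles * divider) % 2U == 0U);
    return adcclk_half_cycles * divider / 2U;
}

/* Instruction at which the ADC read reaches the R2 pipeline phase just
 * as the result latches: the N-2 instruction described in the text */
static unsigned jit_read_instruction(unsigned n)
{
    assert(n > 2U);
    return n - 2U;
}
```

With these assumptions, `conversion_cycles(21, 4)` gives N = 42, and the read belongs at instruction 40.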
The example “cla_ex5_adc_just_in_time” uses this concept to read the ADC data just in time, even at very high sampling frequencies. As depicted in Figure 3-4, EPWM1 is configured to generate a 1 MHz PWM output signal, which also triggers an ADC sample every cycle. The example also uses a feature newly added in TI’s Type 5 ADCs that delays the early interrupt pulse by a few cycles, as set by the programmed OFFSET value. ADCA is therefore configured to sample the input on Channel 0 and to generate the early interrupt at the end of the S/H window plus the offset cycles. This interrupt triggers the CLA control task, which updates the duty of the PWM output based on the ADC reading. The early interrupt and the low latency of the CLA let the application do any necessary pre-work, act on the ADC result as soon as it becomes available, and still finish updating the PWM output before the next interrupt arrives. Thus, all three steps (sampling, processing, and actuation) are completed within a single 1 MHz period. In the CLA task code below, a 3-point moving average filter simulates the processing for illustration, and some steps of the filter, marked as pre-processing, are placed before the ADC read to use the time available before the conversion completes.
//
// Pre-processing for implementing the moving average filter; takes 13 cycles.
// This illustrates how cycles can be used for pre-processing before the
// ADC result latches. The ADC interrupt offset needs to be programmed
// based on the cycles taken by the pre-processing code.
//
data_read_total = data_read + data_read_prev;
data_read_prev2 = data_read_prev;
data_read_prev = data_read;
//
// Reading ADC just-in-time
//
data_read = HWREGH(ADCARESULT_BASE + ADC_RESULTx_OFFSET_BASE + ADC_SOC_NUMBER0);
//
// "data_read_total" stores the cumulative sum of current and last 2 data elements
//
data_read_total += data_read;
//
// Taking average of 3 elements, normalizing for 12-bit and mapping to output duty
// linearly in the range 0.1-0.9
// duty = 0.1 + (0.9-0.1) * ((data_read_total / 3) / 2^12 )
//
duty = 0.1f + (data_read_total / (15360.0f));
//
// Writing to the COMPA register for realizing computed duty value
//
HWREGH(EPWM1_BASE + EPWM_O_CMPA + 0x1U) = (uint16_t)(duty * EPWM1_PERIOD + 0.5f);
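The duty mapping folds several constants into 15360, since (0.9 - 0.1) / (3 × 4096) = 0.8 / 12288 = 1/15360, so it is worth checking on a host. The sketch below reproduces the task's filter and mapping as a plain C function. `filter_and_map()` is a hypothetical name, the ADC result register read is replaced by a function argument, and the `data_read_prev2` bookkeeping is omitted because it does not affect the sum.

```c
#include <assert.h>
#include <stdint.h>

/* Filter state, mirroring the globals used in the CLA task above */
static float data_read, data_read_prev, data_read_total;

/* Host version of the task body: pre-processing, "read", finish the
 * 3-point sum, then map the average linearly to a 0.1..0.9 duty.
 * 'sample' stands in for the ADC result register. */
static float filter_and_map(uint16_t sample)
{
    assert(sample <= 4095U);   /* 12-bit result */

    /* Pre-processing: sum of the two previous samples, shift history */
    data_read_total = data_read + data_read_prev;
    data_read_prev  = data_read;

    /* Just-in-time read, then complete the 3-point sum */
    data_read = (float)sample;
    data_read_total += data_read;

    /* 0.1 + 0.8 * (total / 3) / 4096 == 0.1 + total / 15360 */
    return 0.1f + (data_read_total / 15360.0f);
}
```

Once three samples of mid-scale (2048) have been filtered, the returned duty is 0.5; three full-scale samples (4095) give about 0.8998, just under the 0.9 limit.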
The early interrupt OFFSET value of the ADC needs to be adjusted based on the cycles consumed by the pre-processing in order to read the ADC data just in time. This example uses an OFFSET value of 20, based on the calculation shown in the example header. The ADC programming sequence for this configuration is shown below. An actual use case may involve different pre-processing steps, so the interrupt OFFSET value needs to be programmed accordingly.
//
// Set the interrupt pulse position to the end of the acquisition window (early interrupt)
//
ADC_setInterruptPulseMode(ADCA_BASE, ADC_PULSE_END_OF_ACQ_WIN);
//
// Set interrupt offset delay as 20 cycles based on the calculation
// shown in example header
//
ADC_setInterruptCycleOffset(ADCA_BASE, 20);
The CLA also brings powerful 32-bit floating-point processing to C2000 devices and significantly speeds up the math functions commonly used in control algorithms. The CLA instruction set supports a floating-point multiply with a parallel add or subtract in a single cycle, as well as a single-cycle inverse square root estimate. To ease software development, a wide collection of commonly used floating-point math functions (some of which are listed in Table 4-1) is packaged into a single library called CLA Math, available as part of C2000Ware. This source code library includes C-callable assembly math functions optimized for the CLA architecture.
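A single-cycle estimate instruction gives only a coarse starting value; a full-precision inverse square root is normally obtained by refining that estimate with Newton-Raphson iterations. The host-side sketch below shows this general technique. It is not the CLA Math source: the bit-pattern initial estimate in the hypothetical `isqrt_estimate()` is only a stand-in for the hardware estimate instruction, and the iteration count is illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Crude initial estimate of 1/sqrt(x) from the float bit pattern.
 * Stand-in only: the CLA's hardware estimate works differently. */
static float isqrt_estimate(float x)
{
    uint32_t bits;
    float y;
    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfU - (bits >> 1);
    memcpy(&y, &bits, sizeof y);
    return y;
}

/* Refine an estimate y of 1/sqrt(x) with Newton-Raphson steps:
 * y <- y * (1.5 - 0.5 * x * y * y); each step roughly doubles the
 * number of correct bits. */
static float isqrt_nr(float x, int iterations)
{
    assert(x > 0.0f);
    float y = isqrt_estimate(x);
    for (int i = 0; i < iterations; i++) {
        y = y * (1.5f - 0.5f * x * y * y);
    }
    return y;
}
```

Two iterations are enough here to bring `isqrt_nr(4.0f, 2)` to 0.5 and `isqrt_nr(2.0f, 2)` to about 0.7071 within single-precision tolerance; the same multiply-add pattern maps well onto the CLA's parallel multiply and add.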
In addition to the basic math routines, TI provides the Digital Control Library (DCL, available as part of C2000Ware), which includes optimized implementations of standard control routines for the CLA; some of them are listed in Table 4-1. These C-callable assembly routines can be called within a CLA task to implement a digital controller on the CLA. Along with the library source code, examples show how to integrate the library into a project and use its math and control routines. These examples, found in the directories indicated in the introduction, can be used to explore and evaluate the compute capability of the CLA.
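To give a sense of the per-sample work a DCL controller call performs, here is a generic parallel-form PI update in plain C. It is a textbook sketch, not the DCL implementation: the `PiCtrl` structure and `pi_run()` function are hypothetical, and DCL's own structures, argument order, and anti-windup handling may differ, so consult the DCL documentation for the actual API.

```c
#include <assert.h>

/* Hypothetical parallel-form PI controller state; field names are
 * illustrative and do not match the DCL structures. */
typedef struct {
    float kp;    /* proportional gain          */
    float ki;    /* integral gain (per sample) */
    float i;     /* integrator state           */
    float umin;  /* output lower limit         */
    float umax;  /* output upper limit         */
} PiCtrl;

static float clampf(float v, float lo, float hi)
{
    return (v < lo) ? lo : (v > hi) ? hi : v;
}

/* One controller update per sample: u = kp*e + sum(ki*e).
 * The integrator is clamped to the output limits (simple anti-windup)
 * and the output is clamped as well. */
static float pi_run(PiCtrl *c, float ref, float fbk)
{
    assert(c != 0 && c->umin <= c->umax);
    float e = ref - fbk;
    c->i = clampf(c->i + c->ki * e, c->umin, c->umax);
    return clampf(c->kp * e + c->i, c->umin, c->umax);
}
```

With kp = 1 and ki = 0.1, a constant error of 1 produces outputs of 1.1, 1.2, and so on as the integrator accumulates, until the output limit is reached.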
| Library | Routine | Description | Cycles |
|---|---|---|---|
| CLA Math | CLAcos | Calculates cosine on CLA | 28 |
| CLA Math | CLAsin | Calculates sine on CLA | 28 |
| CLA Math | CLAacos | Calculates arc-cos on CLA | 24 |
| CLA Math | CLAasin | Calculates arc-sine on CLA | 22 |
| CLA Math | CLAatan | Calculates arc-tan on CLA | 41 |
| CLA Math | CLAlog10 | Calculates Log (base10) on CLA | 29 |
| CLA Math | CLAexp | Calculates exponential on CLA | 41 |
| CLA Math | CLAdiv | Calculates floating-point division on CLA | 13 |
| CLA Math | CLAisqrt | Calculates inverse square root on CLA | 14 |
| CLA Math | CLAsqrt | Calculates square root on CLA | 16 |
| DCL | DCL_runPID_L1 | Runs Ideal Form PID controller on CLA | 53 |
| DCL | DCL_runPID_L2 | Runs Parallel Form PID controller on CLA | 45 |
| DCL | DCL_runPI_L1 | Runs Ideal Form PI controller on CLA | 34 |
| DCL | DCL_runDF13_L1 | Runs the DF13 Full Compensator on CLA | 61 |
| DCL | DCL_runDF13_L2 | Runs the DF13 Immediate Compensator on CLA | 20 |
| DCL | DCL_runDF13_L3 | Runs the DF13 Partial Compensator on CLA | 58 |
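The cycle counts in Table 4-1 make it straightforward to check whether a set of routines fits in a control period. A minimal sketch, assuming the CLA is clocked at 100 MHz (an example value only; the actual clock depends on the device data sheet) and ignoring the rest of the task code: a sine, a cosine, and an inverse square root take 28 + 28 + 14 = 70 cycles, or 700 ns, which fits inside the 100 cycles available in a 1 MHz loop. The helper names are hypothetical.

```c
#include <assert.h>

/* Cycle counts copied from Table 4-1 */
enum { CYC_CLASIN = 28, CYC_CLACOS = 28, CYC_CLAISQRT = 14 };

/* Cycles available per control period.
 * ASSUMPTION: the CLA is clocked at clk_mhz; check the device data sheet. */
static unsigned budget_cycles(unsigned clk_mhz, unsigned loop_khz)
{
    assert(loop_khz > 0U);
    return clk_mhz * 1000U / loop_khz;
}

/* Execution time in nanoseconds for a number of CLA cycles */
static unsigned cycles_to_ns(unsigned cycles, unsigned clk_mhz)
{
    assert(clk_mhz > 0U);
    return cycles * 1000U / clk_mhz;
}
```

A task that also calls a DCL controller would add that routine's count from the same table before comparing against `budget_cycles()`.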