[LPC54114 Dual-Core Usage Guide Translation] + Task 3

feixiang20 · posted on 2017-6-8 17:57:05

Slave core handles miscellaneous non-computational tasks

In some cases, the slave core can satisfy requirements that would otherwise need a CPLD/FPGA
Most of these are GPIO-related operations
High-speed, accurate executor
GPIO operations
Running some hard real-time protocols
Mainly for various fieldbus specifications.

(Use with CAUTION) Use the M0+ to get lower energy consumption

The M0+ uA/MHz figure is lower than the M4's, but the M0+ usually takes longer to complete the same task, so power * time (energy) is not necessarily lower.
The M0+ has a weaker instruction set and a single bus master.
Be aware that the CPU is not the only power sink
IRC/FRO/PLL, regulator, flash, leakage: 700 uA+
Conditions under which the M0+ saves energy:
The clock frequency is high (typically 48 MHz+), or the CPU can't sleep due to strict performance constraints, such as a high IRQ rate, accurate timing, etc.
On LPC5410x, the M0+ draws as little as 55% of the M4's power when the CPU clock is > 48 MHz
The M0+ code does not involve math or M4-only features:
integer DIV (and MUL on LPC5410x, where it is 32 times slower than on the M4)
DSP, SIMD, and floating point.
Other M4 advantages: bit-field manipulation, bitmap-based allocator helpers (CLZ, RBIT), high-bandwidth data transfer, high-frequency IRQ handling (not for this purpose).
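The "power * time" point can be checked with a little arithmetic. The sketch below is only an illustration: the 0.55 power ratio is the LPC5410x figure quoted above, the time ratio uses the 48 MHz Gaussian blur timings reported later in this thread (40 ms on the M4, 54 ms on the M0+), and the function name and the fixed_share parameter (the fraction of total power that is not CPU power) are assumptions made up for this example.

```c
/* Relative energy of running a task on the M0+ instead of the M4.
 * Energy = power * time, so the result is (power ratio) * (time ratio).
 * A result below 1.0 means the M0+ run uses less energy.
 *
 * cpu_power_ratio: M0+ CPU power / M4 CPU power (0.55 in the text above)
 * fixed_share:     hypothetical fraction of the M4-run total power that is
 *                  NOT the CPU (clock sources, regulator, flash, leakage)
 * time_ratio:      M0+ run time / M4 run time for the same task
 */
double m0_energy_ratio(double cpu_power_ratio, double fixed_share, double time_ratio)
{
    /* Total M4-run power is normalised to 1.0. Only the CPU part shrinks
     * when the task moves to the M0+; the fixed part stays the same. */
    double m0_power = (1.0 - fixed_share) * cpu_power_ratio + fixed_share;
    return m0_power * time_ratio;
}
```

With no fixed overhead, m0_energy_ratio(0.55, 0.0, 54.0 / 40.0) comes out below 1, so the M0+ saves energy. If half of the total power were fixed overhead, m0_energy_ratio(0.55, 0.5, 54.0 / 40.0) would come out above 1, and the longer run time would cancel the saving.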


feixiang20 · posted on 2017-6-8 18:15:13

Last edited by feixiang20 on 2017-6-8 18:17

Special usage: "Load balance" between master and slave

It is like two people eating a plate of beans: each one keeps picking until the plate is empty.
Master and slave do the same repetitive operations, such as
Processing all pixels in an image
Doing matrix multiplication for multiple matrix pairs.
The master initializes the input data and sets up an "items remaining" count-down counter that shows how many operations are still unprocessed.
The master sends a message to the slave to notify it to share the processing load.

Load balance implementation

Both cores enter a processing loop, a "while (1)" loop, in which each core:
Locks the remaining counter with the H/W MUTEX,
If the counter is 0, unlocks and exits the loop
Otherwise subtracts a number "pick_count" from the counter and unlocks. pick_count >= 1.
Picks "pick_count" unprocessed operations and processes them; the M4 picks from first to last, the M0+ picks from last to first.
The slave also sets a "busy" flag during the processing loop and clears it after exiting.
The master, after it exits the processing loop, waits until the slave's busy flag is cleared.
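The locked counter and the two pick directions can be sketched in portable C as follows. This is a minimal sketch, not the guide's code: a C11 atomic_flag spinlock stands in for the H/W MUTEX so the example can run on a host, and all names (work_pool_t, picker_t, pick_chunk, work_loop) are hypothetical. It also clamps the last chunk when fewer than pick_count items remain, which the steps above do not spell out. The busy flag handshake is left out.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Host-side stand-in for the hardware mutex (assumption: on the real chip
 * the H/W MUTEX would be used here instead of a C11 spinlock). */
static atomic_flag g_lock = ATOMIC_FLAG_INIT;
static void lock(void)   { while (atomic_flag_test_and_set(&g_lock)) { } }
static void unlock(void) { atomic_flag_clear(&g_lock); }

typedef struct {
    uint32_t total;               /* number of work items, set by the master */
    volatile uint32_t remaining;  /* shared "items remaining" counter */
} work_pool_t;

typedef struct {
    int from_front;   /* 1: pick first to last (M4), 0: last to first (M0+) */
    uint32_t taken;   /* items this core has claimed so far */
} picker_t;

/* Claim up to pick_count items. Returns how many were claimed (0 means the
 * pool is empty) and stores the index of the first claimed item in *first. */
uint32_t pick_chunk(work_pool_t *pool, picker_t *me, uint32_t pick_count,
                    uint32_t *first)
{
    if (pick_count == 0) {
        pick_count = 1;           /* pick_count must be >= 1 */
    }
    lock();
    uint32_t n = pool->remaining;
    if (n > pick_count) {
        n = pick_count;           /* clamp the final, smaller chunk */
    }
    pool->remaining -= n;
    unlock();

    /* The two cores never overlap: the sum of both "taken" counts can never
     * exceed total, so the front and back ranges stay disjoint. */
    *first = me->from_front ? me->taken : pool->total - me->taken - n;
    me->taken += n;
    return n;
}

/* Processing loop run by each core with its own picker_t. */
void work_loop(work_pool_t *pool, picker_t *me, uint32_t pick_count,
               void (*process)(uint32_t index))
{
    for (;;) {
        uint32_t first;
        uint32_t n = pick_chunk(pool, me, pick_count, &first);
        if (n == 0) {
            break;                /* counter reached 0: exit the loop */
        }
        for (uint32_t i = 0; i < n; i++) {
            process(first + i);
        }
    }
}
```

Because each core only tracks how many items it has taken itself, a single shared counter is enough. A larger pick_count means fewer trips through the mutex, at the cost of a coarser split of the work near the end.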

Load balance case study: Gaussian blur for a 128x128 image

This demo shows how much time is saved when both the M4 and the M0+ process the pixels of one image, compared with using only the M4 or only the M0+ to process the same image. The M4 and the M0+ each pick unprocessed pixels to process by themselves (just like two people competing to eat from the same plate of beans).
In this demo, the equivalent processing power of the dual-core setup is about 154%-180% of a single M4, or about 230%-260% of a single M0+. (Optimization: M4 code O2, M0+ code O2)
If the M4 code runs from the same SRAM block as the M0+ code, both cores compete for the same RAM block, and both are slowed down to about 87.4% performance.

CPU at 12000kHz, Press the SW2 button to Start.
Running Gaussian blur with M4 only
154 ms elapsed!
Running Gaussian blur with both M4 and M0+
90 ms elapsed!
Running Gaussian blur with M0+ only
218 ms elapsed!
Test done
CPU at 48000kHz, Press the SW2 button to Start.
Running Gaussian blur with M4 only
40 ms elapsed!
Running Gaussian blur with both M4 and M0+
23 ms elapsed!
Running Gaussian blur with M0+ only
54 ms elapsed!
Test done
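The percentages quoted for the demo can be reproduced from the elapsed times in the log. The two helpers below are a small sketch with made-up names: the first turns a single-core time and the dual-core time into a relative throughput, and the second gives the best-case dual-core time under the assumption that both cores stay busy until the work runs out and do not slow each other down.

```c
/* Throughput of the dual-core run relative to one core alone, in percent. */
double speedup_percent(double single_ms, double dual_ms)
{
    return 100.0 * single_ms / dual_ms;
}

/* Best-case dual-core time: the processing rates (1/t) of the two cores
 * add up, so t_dual = t_m4 * t_m0 / (t_m4 + t_m0). */
double ideal_dual_ms(double m4_ms, double m0_ms)
{
    return m4_ms * m0_ms / (m4_ms + m0_ms);
}
```

At 12 MHz this gives 154 / 90, about 171% of a single M4, and 218 / 90, about 242% of a single M0+. At 48 MHz the figures are about 174% and 235%. Both pairs fall inside the ranges stated above. The best-case times are about 90.2 ms and 23.0 ms, which is very close to the measured 90 ms and 23 ms.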

小马哥-1650185 · posted on 2017-6-25 01:05:29

Last edited by 小马哥-1650185 on 2017-6-27 08:46

"For slave" actually got translated as "对于奴隶" ("for the slave", in the sense of an enslaved person)

小马哥-1650185 · posted on 2017-6-25 01:17:14

Last edited by 小马哥-1650185 on 2017-6-27 08:46

Use M0+ to get lower energy consumption

使用M0 +获得更低的能源消耗

Correction: 使用M0+ 来降低功耗 ("use the M0+ to reduce power consumption")

小马哥-1650185 · posted on 2017-6-25 01:23:54

Last edited by 小马哥-1650185 on 2017-6-27 08:47

M0+ uA/MHz is lower than M4, but usually cost longer time to complete same task, power * time (energy) is not must be lower.
M0 +尿酸/ MHz低于M4，但通常会花较长的时间来完成同样的任务，功率*时间（能量）是不是要低。

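...even "尿酸" (uric acid) has shown up.

Since when did microamps (uA) become uric acid?
And one last correction: 功率*时间（能量）不一定更低 ("power * time (energy) is not necessarily lower").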

党国特派员 · posted on 2017-6-25 17:44:02

This reads like machine translation. The translation is a complete mess...

小马哥-1650185 · posted on 2017-6-25 19:32:46

Last edited by 小马哥-1650185 on 2017-6-27 08:47

This reads like machine translation.

yangjiaxu · posted on 2017-11-29 19:26:33

I have been looking for this for so long, ahhhhh
