Running Examples on the APU Platform#
Author: Shawn
This article describes how to run brain simulations and deep learning networks on brain-inspired chips.
Brain Simulation#
Set the platform to apu. Here apu refers to the Lynxi family of chips, and the use_brainmodel option determines which kind of intermediate description file is used to run brain simulations on the apu.
from neurai.config import set_platform, set_simulate_status

# 'apu' targets the Lynxi chips; use_brainmodel=True selects the detailed
# per-neuron intermediate description file
set_platform(platform='apu')
set_simulate_status(use_brainmodel=True)
When ``use_brainmodel`` is ``True``, a detailed network description file (detailed network structure information at the granularity of individual neurons) is generated after the network is built; this enables custom connectivity such as user-defined connection rules, fine-grained weights, and fine-grained delays. When ``use_brainmodel`` is ``False``, the network parameter information is used to generate a network-construction description file (network structure information at the granularity of neuron clusters).
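For comparison, switching to the cluster-level description only requires flipping the flag before the network is built; a minimal sketch (the rest of this example keeps use_brainmodel=True):

from neurai.config import set_simulate_status

# Generate the network-construction description file at neuron-cluster granularity
set_simulate_status(use_brainmodel=False)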
This article builds a brain-simulation network containing two neurai.nn.neuron.exp_lif.ExpLIF neuron populations connected by neurai.nn.synapse.static_synapse.StaticSynapse synapses.
Because the Lynxi chips need global information about all neurons at brain-simulation runtime, the neurons must be created lazily: add the parallelize decorator to the setup method.
Also note that when simulating on the apu, if the network contains generators, the neuron clusters must be created before the generators (a sketch illustrating this ordering follows the network definition below).
from neurai import nn
from neurai.initializer import NormalIniter, UniformIniter
from neurai.parallelize import parallelize


class TwoNeurons(nn.SNet):

  @parallelize
  def setup(self):
    # Create two ExpLIF neuron clusters
    self.lif0 = nn.ExpLIF(I_e=400., v_init=UniformIniter(-80, -50))
    self.lif1 = nn.ExpLIF(size=1, v_init=NormalIniter(-70.0, 1))
    weight = 20.0
    delay = 10
    # Create the synapses
    self.syn1 = nn.StaticSynapse(
      pre=self.lif0, post=self.lif1, weight=weight, delay_step=delay, conn=nn.FixedTotalNum(1))
    self.syn2 = nn.StaticSynapse(pre=self.lif1, post=self.lif0, weight=weight, delay_step=delay)
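As noted above, generators must be created only after the neuron clusters. The sketch below illustrates that ordering; the generator class name nn.PoissonGenerator is hypothetical and stands in for whatever spike generator your NeurAI version provides:

class NeuronWithGenerator(nn.SNet):

  @parallelize
  def setup(self):
    # 1. Create the neuron cluster first ...
    self.lif = nn.ExpLIF(size=1, v_init=NormalIniter(-70.0, 1))
    # 2. ... then the generator (hypothetical class name)
    self.gen = nn.PoissonGenerator(size=1)
    # Drive the neuron cluster from the generator
    self.syn = nn.StaticSynapse(pre=self.gen, post=self.lif, weight=20.0, delay_step=10)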
Define the network's __call__ method, which calls the self.network instance and passes in the input, t, and monitor arguments:
class TwoNeuronNet(nn.Module):

  def setup(self):
    self.network = nn.SNetLayer(TwoNeurons)

  def __call__(self, input=None, t=0, monitor=None):
    _, mon = self.network(input=input, t=t, monitor=monitor)
    return mon
Initialize the network:
net = TwoNeuronNet()
params = net.init()
When net.init is called, it internally invokes the neurai.util.apu.save_ir method and saves the IR.
Create a monitor that records the spikes and membrane potential of ExpLIF_0 to memory:
from neurai.monitor import MonitorBS, MonitorConf, MemoryRecorder

monitorbs = MonitorBS(
  monitors=[MonitorConf("ExpLIF_0", "spike"),
            MonitorConf("ExpLIF_0", "v", slices=slice(0, 1))],
  recorder=MemoryRecorder())
For details on monitors, see Monitoring Neuron and Synapse Information.
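For example, to additionally record the membrane potential of the second cluster, append another MonitorConf entry; a minimal sketch, assuming the second ExpLIF population is auto-named ExpLIF_1 by the same scheme that produced ExpLIF_0:

monitorbs = MonitorBS(
  monitors=[MonitorConf("ExpLIF_0", "spike"),
            MonitorConf("ExpLIF_0", "v", slices=slice(0, 1)),
            MonitorConf("ExpLIF_1", "v")],  # assumed name of the second cluster
  recorder=MemoryRecorder())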
Run the network:
mon = net.run(params, t=1000.0, monitor=monitorbs)
When net.run is called, the network is represented as an IR (for the detailed IR format definition, see Network Structure Description File (IR)) and handed to Lyngor, whose engine then compiles and runs the network.
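The returned mon is indexed like a dictionary; the keys used by the plotting code further below are the timestamps plus one entry per monitored variable:

ts = mon["ts"]                 # timestamps of the recorded steps
spike = mon["ExpLIF_0.spike"]  # recorded arrays, keyed as "<population>.<variable>"
v = mon["ExpLIF_0.v"]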
The run log is as follows:
2023/12/08 10:07:16.987714 2633201472 1095 [lynHost.cpp:doRun:317] [node0] run 9991 timestep. (1|1000)
2023/12/08 10:07:16.988375 2633201472 1095 [lynHost.cpp:doRun:317] [node0] run 9992 timestep. (2|1000)
2023/12/08 10:07:16.989021 2633201472 1095 [lynHost.cpp:doRun:317] [node0] run 9993 timestep. (3|1000)
2023/12/08 10:07:16.989669 2633201472 1095 [lynHost.cpp:doRun:317] [node0] run 9994 timestep. (4|1000)
2023/12/08 10:07:16.990317 2633201472 1095 [lynHost.cpp:doRun:317] [node0] run 9995 timestep. (5|1000)
2023/12/08 10:07:16.990969 2633201472 1095 [lynHost.cpp:doRun:317] [node0] run 9996 timestep. (6|1000)
2023/12/08 10:07:16.991635 2633201472 1095 [lynHost.cpp:doRun:317] [node0] run 9997 timestep. (7|1000)
2023/12/08 10:07:16.992286 2633201472 1095 [lynHost.cpp:doRun:317] [node0] run 9998 timestep. (8|1000)
2023/12/08 10:07:16.992932 2633201472 1095 [lynHost.cpp:doRun:317] [node0] run 9999 timestep. (9|1000)
2023/12/08 10:07:16.993581 2633201472 1095 [lynHost.cpp:doRun:317] [node0] run 10000 timestep. (10|1000)
graph_runtime_chip_snn.py[line:151]-INFO: input start 1702001236.987068, run_start 1702001236.987068, run_end 1702001236.993709, input cost = 0.000238ms, run cost = 6.640911ms
util.py[line:267]-INFO: [run]-chip Total time running run(1999): 0.0067 seconds
util.py[line:267]-INFO: [run]-get_apu_out_node_chipkeep Total time running get_output_node_chipkeep(2000): 0.0020 seconds
The runtime statistics are as follows:
2023/12/08 10:07:18.615044 2633201472 1095 [lynHost.cpp:printNodeStat:965] ===============节点间同步等待耗时最多的节点 0===============
[lynHost.cpp:printNodeStat:966] [node0] 引擎: 单芯片[ 1792]神经元, 发放率[0.00000], 运行拍数[10000]拍, DoRun[10:07:05.852066 - 10:07:16.993702]
[lynHost.cpp:printNodeStat:967] [node0] COMM[], minDelay[10]
[lynHost.cpp:printNodeStat:968] [node0] 初始化阶段: 耗时[ 284253]us
[lynHost.cpp:printNodeStat:969] [node0] 停止阶段: 耗时[ 1618198]us
[lynHost.cpp:printNodeStat:970] [node0] 运行阶段: 耗时[ 6503198]us, 平均每拍[ 650]us
[lynHost.cpp:printNodeStat:971] [node0] 等待芯片运行: 耗时[ 5350125]us, 平均每拍[ 535]us
[lynHost.cpp:printNodeStat:972] [node0] 节点处理: 耗时[ 1022191]us, 平均每拍[ 102]us
[lynHost.cpp:printNodeStat:973] [node0] 并行: 耗时[ 1022176]us, 平均每拍[ 102]us
[lynHost.cpp:printNodeStat:974] [node0] 节点内处理: 耗时[ 819307]us, 平均每拍[ 81]us
[lynHost.cpp:printNodeStat:981] [node0] 节点间处理: 耗时[ 14]us, 平均每拍[ 0]us
[lynHost.cpp:printNodeStat:982] [node0] 脉冲搬迁(D2H): 耗时[ 12]us, 平均每拍[ 0]us
[lynHost.cpp:printNodeStat:983] [node0] *脉冲汇聚(Gather):耗时[ 0]us, 平均每拍[ 0]us
[lynHost.cpp:printNodeStat:984] [node0] *节点间通讯: 耗时[ 0]us, 平均每拍[ 0]us
[lynHost.cpp:printNodeStat:985] [node0] 同步: 耗时[ 0]us, 平均每拍[ 0]us
[lynHost.cpp:printNodeStat:986] [node0] 数据传输: 耗时[ 0]us, 平均每拍[ 0]us
[lynHost.cpp:printNodeStat:987] [node0] *脉冲散布(Scatter):耗时[ 0]us, 平均每拍[ 0]us
[lynHost.cpp:printNodeStat:988] [node0] 脉冲搬迁(H2D): 耗时[ 3]us, 平均每拍[ 0]us
[lynHost.cpp:printNodeStat:989] [node0] 平均每拍: 脉冲发放到其他节点的神经元数[ 0],本节点发送脉冲[ 0]
[lynHost.cpp:printNodeStat:991] [node0] 本节点接收脉冲[ 0],本节点处理脉冲[ 0]
[lynHost.cpp:printChipStat:1006] =============================处理时间最长的芯片 0 平均每拍[ 451]us=========================
[lynHost.cpp:printChipStat:1007] [node0-chip0] 平均每拍: 发往其他芯片的神经元数[ 0] 发往其他节点的神经元数[ 0]
[lynHost.cpp:printChipStat:1010] [node0-chip0] 片间发送脉冲[ 0] 片间接收脉冲[ 0] 节点间接收脉冲[ 0]
[lynHost.cpp:printChipStat:1014] [node0-chip0] ARM: 总耗时[ 210647]us, 平均每拍[ 21]us, 使用率[3.239%]
[lynHost.cpp:printChipStat:1015] [node0-chip0] 等待Host: 耗时[ 1850158]us, 平均每拍[ 185]us
[lynHost.cpp:printChipStat:1016] [node0-chip0] *读片外脉冲: 耗时[ 29817]us, 平均每拍[ 2]us
[lynHost.cpp:printChipStat:1017] [node0-chip0] *片外脉冲STDP更新: 耗时[ 0]us, 平均每拍[ 0]us
[lynHost.cpp:printChipStat:1018] [node0-chip0] *处理片外脉冲: 耗时[ 20449]us, 平均每拍[ 2]us
[lynHost.cpp:printChipStat:1019] [node0-chip0] *写脉冲包(串行): 耗时[ 2352]us, 平均每拍[ 0]us
[lynHost.cpp:printChipStat:1020] [node0-chip0] APU: 耗时[ 4307256]us, 平均每拍[ 430]us, 使用率[66.233%]
[lynHost.cpp:printChipStat:1021] [node0-chip0] *读脉冲并路由: 耗时[ 30246]us, 平均每拍[ 3]us
[lynHost.cpp:printChipStat:1022] [node0-chip0] 通知Host: 耗时[ 50723]us, 平均每拍[ 5]us
[lynHost.cpp:printChipStat:1023] [node0-chip0] sum1(vs 等待芯片运行): 平均每拍[ 629]us
[lynHost.cpp:printChipStat:1024] [node0-chip0] *片内脉冲STDP更新: 耗时[ 0]us, 平均每拍[ 0]us
[lynHost.cpp:printChipStat:1025] [node0-chip0] *处理片内脉冲: 耗时[ 127783]us, 平均每拍[ 12]us
[lynHost.cpp:printChipStat:1026] [node0-chip0] sum2(vs 节点处理): 平均每拍[ 12]us
[lynHost.cpp:printChipStat:1006] =============================处理时间最短的芯片 0 平均每拍[ 451]us=========================
[lynHost.cpp:printChipStat:1007] [node0-chip0] 平均每拍: 发往其他芯片的神经元数[ 0] 发往其他节点的神经元数[ 0]
[lynHost.cpp:printChipStat:1010] [node0-chip0] 片间发送脉冲[ 0] 片间接收脉冲[ 0] 节点间接收脉冲[ 0]
[lynHost.cpp:printChipStat:1014] [node0-chip0] ARM: 总耗时[ 210647]us, 平均每拍[ 21]us, 使用率[3.239%]
[lynHost.cpp:printChipStat:1015] [node0-chip0] 等待Host: 耗时[ 1850158]us, 平均每拍[ 185]us
[lynHost.cpp:printChipStat:1016] [node0-chip0] *读片外脉冲: 耗时[ 29817]us, 平均每拍[ 2]us
[lynHost.cpp:printChipStat:1017] [node0-chip0] *片外脉冲STDP更新: 耗时[ 0]us, 平均每拍[ 0]us
[lynHost.cpp:printChipStat:1018] [node0-chip0] *处理片外脉冲: 耗时[ 20449]us, 平均每拍[ 2]us
[lynHost.cpp:printChipStat:1019] [node0-chip0] *写脉冲包(串行): 耗时[ 2352]us, 平均每拍[ 0]us
[lynHost.cpp:printChipStat:1020] [node0-chip0] APU: 耗时[ 4307256]us, 平均每拍[ 430]us, 使用率[66.233%]
[lynHost.cpp:printChipStat:1021] [node0-chip0] *读脉冲并路由: 耗时[ 30246]us, 平均每拍[ 3]us
[lynHost.cpp:printChipStat:1022] [node0-chip0] 通知Host: 耗时[ 50723]us, 平均每拍[ 5]us
[lynHost.cpp:printChipStat:1023] [node0-chip0] sum1(vs 等待芯片运行): 平均每拍[ 629]us
[lynHost.cpp:printChipStat:1024] [node0-chip0] *片内脉冲STDP更新: 耗时[ 0]us, 平均每拍[ 0]us
[lynHost.cpp:printChipStat:1025] [node0-chip0] *处理片内脉冲: 耗时[ 127783]us, 平均每拍[ 12]us
[lynHost.cpp:printChipStat:1026] [node0-chip0] sum2(vs 节点处理): 平均每拍[ 12]us
2023/12/08 10:07:18.615853 2633201472 1095 [lynHost.cpp:showNodeInfo:934] ===========================================================
Plot the spikes and membrane potential:
import jax.numpy as jnp
from neurai.util import visualization

visualization.raster_plot(
  mon["ts"], jnp.asarray(mon['ExpLIF_0.spike']), show=True, save=True, title="ExpLIF_0.spike")
visualization.line_plot(
  mon["ts"],
  jnp.asarray(mon['ExpLIF_0.v'])[:, 0],
  xlabel="Time(ms)",
  ylabel="ExpLIF_0.V(mv)",
  show=True,
  save=True,
  title="ExpLIF_0.v")
Deep Learning#
The KA200 also supports deep learning tasks. Here is a simple example on MNIST. Compared with brain-simulation networks there are some minor differences in usage; the network is compiled via its XLA computation graph.
Set the platform to apu for the KA200:
from neurai.config import set_platform
set_platform(platform='apu')
Load the test dataset:
from neurai.setting import DATASETS_DIR
from neurai.datasets import MNIST, DataLoader
batch_size = 1
test_data = MNIST(DATASETS_DIR, download=True, train=False)
test_dataloader = DataLoader(dataset=test_data, batch_size=batch_size, drop_last=True)
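A quick sanity check on what the loader yields; this sketch assumes each batch is an (image, label) pair, which is how the inference loop below indexes sample:

sample = next(iter(test_dataloader))
print(sample[0].shape)  # image batch
print(sample[1].shape)  # labels (assumed to be the second element)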
Define the deep learning network model:
from neurai.nn.layer.activate import Relu
from neurai import nn


class MNISTModel(nn.Module):

  def setup(self):
    # Create the network layers
    self.fc1 = nn.Linear(50)
    self.fc2 = nn.Linear(60)
    self.fc3 = nn.Linear(50)
    self.relu = Relu()

  def __call__(self, input):
    fc1_out = self.relu(self.fc1(input=input.reshape(-1, 28 * 28 * 1)))
    fc2_out = self.relu(self.fc2(fc1_out))
    fc3_out = self.fc3(fc2_out)
    return fc3_out
Instantiate the network and obtain the weights:
import jax.numpy as jnp

model = MNISTModel()
params = model.init(input=jnp.ones([1, 28 * 28 * 1]))
When model.init is called, it returns the network's initialized weights. If the network has already been trained, the trained weights can be loaded directly, as sketched below.
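A minimal sketch of saving and restoring the weights with plain pickle; this is generic Python, not a NeurAI API, and NeurAI may provide its own checkpoint utilities that should be preferred if available:

import pickle

# Persist the parameter tree returned by model.init (or after training)
with open("mnist_params.pkl", "wb") as f:
  pickle.dump(params, f)

# Later: restore instead of re-initializing
with open("mnist_params.pkl", "rb") as f:
  params = pickle.load(f)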
Run network inference:
import numpy as np
import tqdm

nets_input = np.random.randn(1, 28 * 28 * 1).astype(np.float16)
train_loop = tqdm.tqdm(test_dataloader, desc="APU Test")
for sample in train_loop:
  sample_data = (sample[0].reshape(-1, 28 * 28 * 1).astype(np.float32))[0]
  predict_post = model.run(params, input=sample_data, annbuild_input=nets_input)
When model.run is called, you need to pass in the weights params, the input of the whole network input, and the input for building the ANN annbuild_input.
The steps above export the network's computation graph and compile it with Lyngor; the compiled artifacts can be found in the working directory, after which the program loads them and performs inference.
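To turn the raw outputs into a test-set accuracy, the inference loop can be extended as below. This is a sketch under two assumptions not shown in the original: sample[1] holds the label, and predict_post is a logits array whose argmax is the predicted class:

correct, total = 0, 0
for sample in tqdm.tqdm(test_dataloader, desc="APU Eval"):
  sample_data = (sample[0].reshape(-1, 28 * 28 * 1).astype(np.float32))[0]
  predict_post = model.run(params, input=sample_data, annbuild_input=nets_input)
  pred = int(np.argmax(np.asarray(predict_post).reshape(-1)))  # predicted class
  label = int(np.asarray(sample[1]).reshape(-1)[0])            # assumed label location
  correct += int(pred == label)
  total += 1
print(f"accuracy: {correct / total:.4f}")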
The run log is as follows:
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.796168, run_start 1702262026.796358, run_end 1702262026.796678, input cost = 0.190735ms, run cost = 0.319719ms
util.py[line:267]-INFO: [run]-chip Total time running run(29932): 0.0006 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.796955, run_start 1702262026.797131, run_end 1702262026.797454, input cost = 0.175714ms, run cost = 0.323296ms
util.py[line:267]-INFO: [run]-chip Total time running run(29935): 0.0006 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.797737, run_start 1702262026.797919, run_end 1702262026.798204, input cost = 0.181198ms, run cost = 0.285149ms
util.py[line:267]-INFO: [run]-chip Total time running run(29938): 0.0006 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.798485, run_start 1702262026.798597, run_end 1702262026.798872, input cost = 0.112057ms, run cost = 0.275612ms
util.py[line:267]-INFO: [run]-chip Total time running run(29941): 0.0005 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.799140, run_start 1702262026.799253, run_end 1702262026.799511, input cost = 0.112772ms, run cost = 0.258446ms
util.py[line:267]-INFO: [run]-chip Total time running run(29944): 0.0005 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.799774, run_start 1702262026.799887, run_end 1702262026.800156, input cost = 0.112534ms, run cost = 0.268698ms
util.py[line:267]-INFO: [run]-chip Total time running run(29947): 0.0005 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.800421, run_start 1702262026.800535, run_end 1702262026.800803, input cost = 0.114441ms, run cost = 0.268221ms
util.py[line:267]-INFO: [run]-chip Total time running run(29950): 0.0005 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.801093, run_start 1702262026.801209, run_end 1702262026.801481, input cost = 0.115871ms, run cost = 0.271797ms
util.py[line:267]-INFO: [run]-chip Total time running run(29953): 0.0005 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.801754, run_start 1702262026.801868, run_end 1702262026.802135, input cost = 0.114202ms, run cost = 0.266790ms
util.py[line:267]-INFO: [run]-chip Total time running run(29956): 0.0005 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.802421, run_start 1702262026.802536, run_end 1702262026.802800, input cost = 0.114679ms, run cost = 0.264168ms
util.py[line:267]-INFO: [run]-chip Total time running run(29959): 0.0005 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.803068, run_start 1702262026.803182, run_end 1702262026.803437, input cost = 0.114202ms, run cost = 0.254869ms
util.py[line:267]-INFO: [run]-chip Total time running run(29962): 0.0005 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.803702, run_start 1702262026.803813, run_end 1702262026.804079, input cost = 0.111341ms, run cost = 0.265837ms
util.py[line:267]-INFO: [run]-chip Total time running run(29965): 0.0005 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.804359, run_start 1702262026.804473, run_end 1702262026.804746, input cost = 0.114441ms, run cost = 0.272751ms
util.py[line:267]-INFO: [run]-chip Total time running run(29968): 0.0005 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.805013, run_start 1702262026.805128, run_end 1702262026.805392, input cost = 0.114441ms, run cost = 0.263929ms
util.py[line:267]-INFO: [run]-chip Total time running run(29971): 0.0005 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.805657, run_start 1702262026.805781, run_end 1702262026.806055, input cost = 0.123978ms, run cost = 0.274420ms
util.py[line:267]-INFO: [run]-chip Total time running run(29974): 0.0005 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.806339, run_start 1702262026.806456, run_end 1702262026.806731, input cost = 0.116825ms, run cost = 0.274897ms
util.py[line:267]-INFO: [run]-chip Total time running run(29977): 0.0005 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.807003, run_start 1702262026.807115, run_end 1702262026.807372, input cost = 0.112295ms, run cost = 0.257492ms
util.py[line:267]-INFO: [run]-chip Total time running run(29980): 0.0005 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.807665, run_start 1702262026.807851, run_end 1702262026.808112, input cost = 0.185966ms, run cost = 0.261307ms
util.py[line:267]-INFO: [run]-chip Total time running run(29983): 0.0006 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.808383, run_start 1702262026.808498, run_end 1702262026.808765, input cost = 0.115156ms, run cost = 0.267029ms
util.py[line:267]-INFO: [run]-chip Total time running run(29986): 0.0005 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.809034, run_start 1702262026.809148, run_end 1702262026.809450, input cost = 0.113964ms, run cost = 0.302076ms
util.py[line:267]-INFO: [run]-chip Total time running run(29989): 0.0005 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.809724, run_start 1702262026.809842, run_end 1702262026.810115, input cost = 0.118256ms, run cost = 0.272989ms
util.py[line:267]-INFO: [run]-chip Total time running run(29992): 0.0005 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.818538, run_start 1702262026.818726, run_end 1702262026.819072, input cost = 0.187635ms, run cost = 0.345945ms
util.py[line:267]-INFO: [run]-chip Total time running run(29995): 0.0006 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.819352, run_start 1702262026.819527, run_end 1702262026.819799, input cost = 0.174522ms, run cost = 0.272274ms
util.py[line:267]-INFO: [run]-chip Total time running run(29998): 0.0005 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.820090, run_start 1702262026.820207, run_end 1702262026.820475, input cost = 0.116825ms, run cost = 0.268221ms
util.py[line:267]-INFO: [run]-chip Total time running run(30001): 0.0005 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.820745, run_start 1702262026.820859, run_end 1702262026.821125, input cost = 0.114441ms, run cost = 0.265598ms
util.py[line:267]-INFO: [run]-chip Total time running run(30004): 0.0005 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.821391, run_start 1702262026.821503, run_end 1702262026.821776, input cost = 0.112295ms, run cost = 0.272512ms
util.py[line:267]-INFO: [run]-chip Total time running run(30007): 0.0005 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.822042, run_start 1702262026.822157, run_end 1702262026.822445, input cost = 0.115395ms, run cost = 0.287771ms
util.py[line:267]-INFO: [run]-chip Total time running run(30010): 0.0005 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.822714, run_start 1702262026.822826, run_end 1702262026.823088, input cost = 0.112057ms, run cost = 0.262260ms
util.py[line:267]-INFO: [run]-chip Total time running run(30013): 0.0005 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.823372, run_start 1702262026.823487, run_end 1702262026.823751, input cost = 0.114441ms, run cost = 0.264645ms
util.py[line:267]-INFO: [run]-chip Total time running run(30016): 0.0005 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.824018, run_start 1702262026.824131, run_end 1702262026.824396, input cost = 0.112057ms, run cost = 0.265598ms
util.py[line:267]-INFO: [run]-chip Total time running run(30019): 0.0005 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.824661, run_start 1702262026.824772, run_end 1702262026.825037, input cost = 0.110865ms, run cost = 0.264168ms
util.py[line:267]-INFO: [run]-chip Total time running run(30022): 0.0005 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.825307, run_start 1702262026.825418, run_end 1702262026.825686, input cost = 0.111818ms, run cost = 0.267267ms
util.py[line:267]-INFO: [run]-chip Total time running run(30025): 0.0005 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.825966, run_start 1702262026.826081, run_end 1702262026.826347, input cost = 0.114441ms, run cost = 0.266552ms
util.py[line:267]-INFO: [run]-chip Total time running run(30028): 0.0319 seconds
graph_runtime_chip.py[line:169]-INFO: input start 1702262026.858100, run_start 1702262026.858397, run_end 1702262026.858735, input cost = 0.296831ms, run cost = 0.337839ms
util.py[line:267]-INFO: [run]-chip Total time running run(30031): 0.0007 seconds
APU Test: 100%|██████████| 10000/10000 [00:45<00:00, 218.78it/s]