

Advanced PyTorch Model Training Techniques

Custom loss functions
Dynamic learning-rate adjustment

Typical case: the loss oscillates up and down.

[Image: image-2.png (not available)]

1. Custom loss functions

1. PyTorch already provides many common loss functions, but some less general ones are not built in, such as DiceLoss. (HuberLoss, which is often listed alongside it, is available as `torch.nn.HuberLoss` in recent PyTorch releases.)
2. If the loss still oscillates after you have adjusted the dataset or the hyperparameters, a specialized or custom loss function may work better for the specific model.

For example, DiceLoss is a loss function commonly used in medical image segmentation. It is built on the Dice coefficient:

$$
\mathrm{Dice}(X, Y) = \frac{2\,|X \cap Y|}{|X| + |Y|}, \qquad \mathrm{DiceLoss} = 1 - \mathrm{Dice}(X, Y)
$$

The Dice coefficient is a set-similarity measure, usually used to compute the similarity of two samples; its value lies in [0, 1]. |X ∩ Y| is the size of the intersection of X and Y, and |X| and |Y| are the numbers of elements in X and Y. The factor 2 in the numerator is there because the denominator counts the elements common to X and Y twice.

```python
import torch
import torch.nn.functional as F
import torch.nn as nn
from torch.optim.lr_scheduler import LambdaLR
from torch.optim.lr_scheduler import StepLR
import torchvision
from torch.utils.data import Dataset, DataLoader
from torchvision.transforms import transforms
import matplotlib.pyplot as plt
from torch.utils.tensorboard import SummaryWriter
import time
import numpy as np


# DiceLoss: the loss function of the V-Net medical image segmentation model
class DiceLoss(nn.Module):
    def __init__(self, weight=None, size_average=True):
        super(DiceLoss, self).__init__()

    def forward(self, inputs, targets, smooth=1):
        # Map raw logits to probabilities (F.sigmoid is deprecated)
        inputs = torch.sigmoid(inputs)
        # Flatten predictions and targets
        inputs = inputs.view(-1)
        targets = targets.view(-1)
        intersection = (inputs * targets).sum()
        # `smooth` avoids division by zero when both masks are empty
        dice_loss = 1 - (2. * intersection + smooth) / (inputs.sum() + targets.sum() + smooth)
        return dice_loss


# Custom multi-class loss function:
# cross_entropy + L2 regularization
class MyLoss(torch.nn.Module):
    def __init__(self, model=None, weight_decay=0.01):
        super(MyLoss, self).__init__()
        self.weight_decay = weight_decay
        # The loss module owns no parameters, so self.named_parameters() is
        # empty. The weights to regularize are taken from the model instead;
        # without a model the L2 term is zero.
        self.weights = []
        if model is not None:
            self.weights = [p for name, p in model.named_parameters() if 'weight' in name]

    def forward(self, inputs, targets):
        ce_loss = F.cross_entropy(inputs, targets)
        l2_loss = sum(torch.norm(p) for p in self.weights)
        loss = ce_loss + self.weight_decay * l2_loss
        return loss
```

Note: when a custom loss function involves mathematical operations, it is best to use the tensor operations provided by PyTorch throughout, so that autograd can compute the gradients.

```python
# Hyperparameters
# Batch size
batch_size = 16  # 32, 64, or 128 are also options
# Learning rate of the optimizer
lr = 1e-4
# Number of epochs to run
max_epochs = 2
# Option 2: create a `device` object, then call .to(device) on everything that should live on the GPU
device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")  # selects GPU 1

# Data loading
# CIFAR-10 is used as the example for building a Dataset
from torchvision import datasets

# `data_transform` applies transformations to the images, such as flipping,
# cropping, and normalization; define your own as needed
data_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

train_cifar_dataset = datasets.CIFAR10('cifar10', train=True, download=False, transform=data_transform)
test_cifar_dataset = datasets.CIFAR10('cifar10', train=False, download=False, transform=data_transform)

# Once the Dataset is built, DataLoader reads the data in batches
train_loader = torch.utils.data.DataLoader(train_cifar_dataset,
                                           batch_size=batch_size, num_workers=4,
                                           shuffle=True, drop_last=True)

test_loader = torch.utils.data.DataLoader(test_cifar_dataset,
                                          batch_size=batch_size, num_workers=4,
                                          shuffle=False)
```

```python
# Pretrained ResNet-50
Resnet50 = torchvision.models.resnet50(pretrained=True)
# Assigning to fc.out_features only changes the attribute, not the weight
# shape, so the final layer is replaced with a new 10-class head instead.
Resnet50.fc = nn.Linear(Resnet50.fc.in_features, 10)
print(Resnet50)
```

```
D:\Users\xulele\Anaconda3\lib\site-packages\torchvision\models\_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
D:\Users\xulele\Anaconda3\lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet50_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet50_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (2): Bottleneck(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (layer2): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (2): Bottleneck(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (3): Bottleneck(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (layer3): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (2): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (3): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (4): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (5): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (layer4): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (2): Bottleneck(
      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=2048, out_features=10, bias=True)
)
```

```python
# Training and validation

# Define the loss function and the optimizer
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
Resnet50 = Resnet50.to(device)
# Loss function: the custom loss defined above, regularizing the model's weights
criterion = MyLoss(Resnet50)
# Optimizer
optimizer = torch.optim.Adam(Resnet50.parameters(), lr=lr)
epoch = max_epochs
total_step = len(train_loader)
train_all_loss = []
test_all_loss = []

for i in range(epoch):
    Resnet50.train()
    train_total_loss = 0
    train_total_num = 0
    train_total_correct = 0

    for step, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)

        outputs = Resnet50(images)
        loss = criterion(outputs, labels)
        train_total_correct += (outputs.argmax(1) == labels).sum().item()

        # backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        train_total_num += labels.shape[0]
        train_total_loss += loss.item()
        # The printed value is the batch loss divided by the batch size
        print("Epoch [{}/{}], Iter [{}/{}], train_loss:{:4f}".format(
            i + 1, epoch, step + 1, total_step, loss.item() / labels.shape[0]))

    Resnet50.eval()
    test_total_loss = 0
    test_total_correct = 0
    test_total_num = 0
    # No gradients are needed during evaluation
    with torch.no_grad():
        for step, (images, labels) in enumerate(test_loader):
            images = images.to(device)
            labels = labels.to(device)

            outputs = Resnet50(images)
            loss = criterion(outputs, labels)
            test_total_correct += (outputs.argmax(1) == labels).sum().item()
            test_total_loss += loss.item()
            test_total_num += labels.shape[0]

    print("Epoch [{}/{}], train_loss:{:.4f}, train_acc:{:.4f}%, test_loss:{:.4f}, test_acc:{:.4f}%".format(
        i + 1, epoch,
        train_total_loss / train_total_num,
        train_total_correct / train_total_num * 100,
        test_total_loss / test_total_num,
        test_total_correct / test_total_num * 100))
    train_all_loss.append(np.round(train_total_loss / train_total_num, 4))
    test_all_loss.append(np.round(test_total_loss / test_total_num, 4))
```

Epoch [1/10], Iter [1/3125], train_loss:0.710159
Epoch [1/10], Iter [2/3125], train_loss:0.761919
Epoch [1/10], Iter [3/3125], train_loss:0.748266
Epoch [1/10], Iter [4/3125], train_loss:0.777146
Epoch [1/10], Iter [5/3125], train_loss:0.699766
Epoch [1/10], Iter [6/3125], train_loss:0.741773
Epoch [1/10], Iter [7/3125], train_loss:0.687201
Epoch [1/10], Iter [8/3125], train_loss:0.618017
Epoch [1/10], Iter [9/3125], train_loss:0.653016
Epoch [1/10], Iter [10/3125], train_loss:0.690120
Epoch [1/10], Iter [11/3125], train_loss:0.648009
Epoch [1/10], Iter [12/3125], train_loss:0.694650
Epoch [1/10], Iter [13/3125], train_loss:0.502452
Epoch [1/10], Iter [14/3125], train_loss:0.538519
Epoch [1/10], Iter [15/3125], train_loss:0.596250
Epoch [1/10], Iter [16/3125], train_loss:0.607648
Epoch [1/10], Iter [17/3125], train_loss:0.574751
Epoch [1/10], Iter [18/3125], train_loss:0.584658
Epoch [1/10], Iter [19/3125], train_loss:0.428719
Epoch [1/10], Iter [20/3125], train_loss:0.530868
Epoch [1/10], Iter [21/3125], train_loss:0.496522
Epoch [1/10], Iter [22/3125], train_loss:0.463315
Epoch [1/10], Iter [23/3125], train_loss:0.453258
Epoch [1/10], Iter [24/3125], train_loss:0.409726
Epoch [1/10], Iter [25/3125], train_loss:0.422388
Epoch [1/10], Iter [26/3125], train_loss:0.414946
Epoch [1/10], Iter [27/3125], train_loss:0.512142
Epoch [1/10], Iter [28/3125], train_loss:0.400936
Epoch [1/10], Iter [29/3125], train_loss:0.405139
Epoch [1/10], Iter [30/3125], train_loss:0.346599
Epoch [1/10], Iter [31/3125], train_loss:0.388829
Epoch [1/10], Iter [32/3125], train_loss:0.389818
Epoch [1/10], Iter [33/3125], train_loss:0.420276
Epoch [1/10], Iter [34/3125], train_loss:0.376930
Epoch [1/10], Iter [35/3125], train_loss:0.385421
Epoch [1/10], Iter [36/3125], train_loss:0.308666
Epoch [1/10], Iter [37/3125], train_loss:0.287350
Epoch [1/10], Iter [38/3125], train_loss:0.235770
Epoch [1/10], Iter [39/3125], train_loss:0.238073
Epoch [1/10], Iter [40/3125], train_loss:0.255732
Epoch [1/10], Iter [41/3125], train_loss:0.351971
Epoch [1/10], Iter [42/3125], train_loss:0.255061
Epoch [1/10], Iter [43/3125], train_loss:0.372930
Epoch [1/10], Iter [44/3125], train_loss:0.294059
Epoch [1/10], Iter [45/3125], train_loss:0.291519
Epoch [1/10], Iter [46/3125], train_loss:0.293720
Epoch [1/10], Iter [47/3125], train_loss:0.313904
Epoch [1/10], Iter [48/3125], train_loss:0.468409
Epoch [1/10], Iter [49/3125], train_loss:0.289942
Epoch [1/10], Iter [50/3125], train_loss:0.314422
Epoch [1/10], Iter [51/3125], train_loss:0.193365
Epoch [1/10], Iter [52/3125], train_loss:0.280942
Epoch [1/10], Iter [53/3125], train_loss:0.194293
Epoch [1/10], Iter [54/3125], train_loss:0.271868
Epoch [1/10], Iter [55/3125], train_loss:0.244220
Epoch [1/10], Iter [56/3125], train_loss:0.203591
Epoch [1/10], Iter [57/3125], train_loss:0.253909
Epoch [1/10], Iter [58/3125], train_loss:0.189856
Epoch [1/10], Iter [59/3125], train_loss:0.251850
Epoch [1/10], Iter [60/3125], train_loss:0.231074
Epoch [1/10], Iter [61/3125], train_loss:0.226731
Epoch [1/10], Iter [62/3125], train_loss:0.175667
Epoch [1/10], Iter [63/3125], train_loss:0.184940
Epoch [1/10], Iter [64/3125], train_loss:0.210438
Epoch [1/10], Iter [65/3125], train_loss:0.190574
Epoch [1/10], Iter [66/3125], train_loss:0.238683
Epoch [1/10], Iter [67/3125], train_loss:0.195508
Epoch [1/10], Iter [68/3125], train_loss:0.152640
Epoch [1/10], Iter [69/3125], train_loss:0.240555
Epoch [1/10], Iter [70/3125], train_loss:0.134351
Epoch [1/10], Iter [71/3125], train_loss:0.183020
Epoch [1/10], Iter [72/3125], train_loss:0.211488
Epoch [1/10], Iter [73/3125], train_loss:0.140310
Epoch [1/10], Iter [74/3125], train_loss:0.162346
Epoch [1/10], Iter [75/3125], train_loss:0.175559
Epoch [1/10], Iter [76/3125], train_loss:0.165264
Epoch [1/10], Iter [77/3125], train_loss:0.232803
Epoch [1/10], Iter [78/3125], train_loss:0.175323
Epoch [1/10], Iter [79/3125], train_loss:0.215453
Epoch [1/10], Iter [80/3125], train_loss:0.229922
Epoch [1/10], Iter [81/3125], train_loss:0.166971
Epoch [1/10], Iter [82/3125], train_loss:0.252459
Epoch [1/10], Iter [83/3125], train_loss:0.175405
Epoch [1/10], Iter [84/3125], train_loss:0.174851
Epoch [1/10], Iter [85/3125], train_loss:0.219277
Epoch [1/10], Iter [86/3125], train_loss:0.200698
Epoch [1/10], Iter [87/3125], train_loss:0.164529
Epoch [1/10], Iter [88/3125], train_loss:0.223835
Epoch [1/10], Iter [89/3125], train_loss:0.132322
Epoch [1/10], Iter [90/3125], train_loss:0.185210
Epoch [1/10], Iter [91/3125], train_loss:0.125042
Epoch [1/10], Iter [92/3125], train_loss:0.127481
Epoch [1/10], Iter [93/3125], train_loss:0.213097
Epoch [1/10], Iter [94/3125], train_loss:0.191506
Epoch [1/10], Iter [95/3125], train_loss:0.169901
Epoch [1/10], Iter [96/3125], train_loss:0.177843
Epoch [1/10], Iter [97/3125], train_loss:0.192217
Epoch [1/10], Iter [98/3125], train_loss:0.186991
Epoch [1/10], Iter [99/3125], train_loss:0.127605
Epoch [1/10], Iter [100/3125], train_loss:0.130038
Epoch [1/10], Iter [101/3125], train_loss:0.139159
Epoch [1/10], Iter [102/3125], train_loss:0.152760
Epoch [1/10], Iter [103/3125], train_loss:0.152227
Epoch [1/10], Iter [104/3125], train_loss:0.128511
Epoch [1/10], Iter [105/3125], train_loss:0.126772
Epoch [1/10], Iter [106/3125], train_loss:0.220105
Epoch [1/10], Iter [107/3125], train_loss:0.163889
Epoch [1/10], Iter [108/3125], train_loss:0.205263
Epoch [1/10], Iter [109/3125], train_loss:0.181927
Epoch [1/10], Iter [110/3125], train_loss:0.126500
Epoch [1/10], Iter [111/3125], train_loss:0.154556
Epoch [1/10], Iter [112/3125], train_loss:0.169978
Epoch [1/10], Iter [113/3125], train_loss:0.166387
Epoch [1/10], Iter [114/3125], train_loss:0.160409
Epoch [1/10], Iter [115/3125], train_loss:0.123102
Epoch [1/10], Iter [116/3125], train_loss:0.133461
Epoch [1/10], Iter [117/3125], train_loss:0.136813
Epoch [1/10], Iter [118/3125], train_loss:0.100353
Epoch [1/10], Iter [119/3125], train_loss:0.126170
Epoch [1/10], Iter [120/3125], train_loss:0.141422
Epoch [1/10], Iter [121/3125], train_loss:0.157280
Epoch [1/10], Iter [122/3125], train_loss:0.113595
Epoch [1/10], Iter [123/3125], train_loss:0.159074
Epoch [1/10], Iter [124/3125], train_loss:0.108684
Epoch [1/10], Iter [125/3125], train_loss:0.175729
Epoch [1/10], Iter [126/3125], train_loss:0.071910
Epoch [1/10], Iter [127/3125], train_loss:0.124298
Epoch [1/10], Iter [128/3125], train_loss:0.115980
Epoch [1/10], Iter [129/3125], train_loss:0.132223
Epoch [1/10], Iter [130/3125], train_loss:0.114184
Epoch [1/10], Iter [131/3125], train_loss:0.123914
Epoch [1/10], Iter [132/3125], train_loss:0.150845
Epoch [1/10], Iter [133/3125], train_loss:0.208639
Epoch [1/10], Iter [134/3125], train_loss:0.106705
Epoch [1/10], Iter [135/3125], train_loss:0.177262
Epoch [1/10], Iter [136/3125], train_loss:0.157350
Epoch [1/10], Iter [137/3125], train_loss:0.149479
Epoch [1/10], Iter [138/3125], train_loss:0.096941
Epoch [1/10], Iter [139/3125], train_loss:0.174548
Epoch [1/10], Iter [140/3125], train_loss:0.156214
Epoch [1/10], Iter [141/3125], train_loss:0.135187
Epoch [1/10], Iter [142/3125], train_loss:0.136901
Epoch [1/10], Iter [143/3125], train_loss:0.122161
Epoch [1/10], Iter [144/3125], train_loss:0.139143
Epoch [1/10], Iter [145/3125], train_loss:0.119795
Epoch [1/10], Iter [146/3125], train_loss:0.122523
Epoch [1/10], Iter [147/3125], train_loss:0.136952
Epoch [1/10], Iter [148/3125], train_loss:0.175852
Epoch [1/10], Iter [149/3125], train_loss:0.107031
Epoch [1/10], Iter [150/3125], train_loss:0.175130
Epoch [1/10], Iter [151/3125], train_loss:0.159306
Epoch [1/10], Iter [152/3125], train_loss:0.149552
Epoch [1/10], Iter [153/3125], train_loss:0.166173
Epoch [1/10], Iter [154/3125], train_loss:0.165044
Epoch [1/10], Iter [155/3125], train_loss:0.116875
Epoch [1/10], Iter [156/3125], train_loss:0.104037
Epoch [1/10], Iter [157/3125], train_loss:0.129057
Epoch [1/10], Iter [158/3125], train_loss:0.141920
Epoch [1/10], Iter [159/3125], train_loss:0.102720
Epoch [1/10], Iter [160/3125], train_loss:0.097012
Epoch [1/10], Iter [161/3125], train_loss:0.157148
Epoch [1/10], Iter [162/3125], train_loss:0.117710
Epoch [1/10], Iter [163/3125], train_loss:0.112908
Epoch [1/10], Iter [164/3125], train_loss:0.096563
Epoch [1/10], Iter [165/3125], train_loss:0.076501
Epoch [1/10], Iter [166/3125], train_loss:0.147476
Epoch [1/10], Iter [167/3125], train_loss:0.177934
Epoch [1/10], Iter [168/3125], train_loss:0.121549
Epoch [1/10], Iter [169/3125], train_loss:0.124102
Epoch [1/10], Iter [170/3125], train_loss:0.097225
Epoch [1/10], Iter [171/3125], train_loss:0.104199
Epoch [1/10], Iter [172/3125], train_loss:0.150368
Epoch [1/10], Iter [173/3125], train_loss:0.098011
Epoch [1/10], Iter [174/3125], train_loss:0.131318
Epoch [1/10], Iter [175/3125], train_loss:0.120925
Epoch [1/10], Iter [176/3125], train_loss:0.120460
Epoch [1/10], Iter [177/3125], train_loss:0.106729
Epoch [1/10], Iter [178/3125], train_loss:0.161727
Epoch [1/10], Iter [179/3125], train_loss:0.169705
Epoch [1/10], Iter [180/3125], train_loss:0.142939
Epoch [1/10], Iter [181/3125], train_loss:0.120374
Epoch [1/10], Iter [182/3125], train_loss:0.120579
Epoch [1/10], Iter [183/3125], train_loss:0.093452
Epoch [1/10], Iter [184/3125], train_loss:0.102697
Epoch [1/10], Iter [185/3125], train_loss:0.129010
Epoch [1/10], Iter [186/3125], train_loss:0.127772
Epoch [1/10], Iter [187/3125], train_loss:0.121482
Epoch [1/10], Iter [188/3125], train_loss:0.153874
Epoch [1/10], Iter [189/3125], train_loss:0.122253
Epoch [1/10], Iter [190/3125], train_loss:0.135232
Epoch [1/10], Iter [191/3125], train_loss:0.095962
Epoch [1/10], Iter [192/3125], train_loss:0.159813
Epoch [1/10], Iter [193/3125], train_loss:0.110215
Epoch [1/10], Iter [194/3125], train_loss:0.103142
Epoch [1/10], Iter [195/3125], train_loss:0.106792
Epoch [1/10], Iter [196/3125], train_loss:0.108262
Epoch [1/10], Iter [197/3125], train_loss:0.109841
Epoch [1/10], Iter [198/3125], train_loss:0.141134
Epoch [1/10], Iter [199/3125], train_loss:0.104478
Epoch [1/10], Iter [200/3125], train_loss:0.119154
Epoch [1/10], Iter [201/3125], train_loss:0.143389
Epoch [1/10], Iter [202/3125], train_loss:0.106533
Epoch [1/10], Iter [203/3125], train_loss:0.104834
Epoch [1/10], Iter [204/3125], train_loss:0.096285
Epoch [1/10], Iter [205/3125], train_loss:0.192590
Epoch [1/10], Iter [206/3125], train_loss:0.131787
Epoch [1/10], Iter [207/3125], train_loss:0.093841
Epoch [1/10], Iter [208/3125], train_loss:0.093261
Epoch [1/10], Iter [209/3125], train_loss:0.090215
Epoch [1/10], Iter [210/3125], train_loss:0.062551
Epoch [1/10], Iter [211/3125], train_loss:0.103201
Epoch [1/10], Iter [212/3125], train_loss:0.101281
Epoch [1/10], Iter [213/3125], train_loss:0.112832
Epoch [1/10], Iter [214/3125], train_loss:0.109726
Epoch [1/10], Iter [215/3125], train_loss:0.193847
Epoch [1/10], Iter [216/3125], train_loss:0.114712
Epoch [1/10], Iter [217/3125], train_loss:0.096408
Epoch [1/10], Iter [218/3125], train_loss:0.104277
Epoch [1/10], Iter [219/3125], train_loss:0.101230
Epoch [1/10], Iter [220/3125], train_loss:0.088779
Epoch [1/10], Iter [221/3125], train_loss:0.122967
Epoch [1/10], Iter [222/3125], train_loss:0.132155
Epoch [1/10], Iter [223/3125], train_loss:0.106906
Epoch [1/10], Iter [224/3125], train_loss:0.101865
Epoch [1/10], Iter [225/3125], train_loss:0.094080
Epoch [1/10], Iter [226/3125], train_loss:0.117470
Epoch [1/10], Iter [227/3125], train_loss:0.107198
Epoch [1/10], Iter [228/3125], train_loss:0.113856
Epoch [1/10], Iter [229/3125], train_loss:0.113308
Epoch [1/10], Iter [230/3125], train_loss:0.136503
Epoch [1/10], Iter [231/3125], train_loss:0.096320
Epoch [1/10], Iter [232/3125], train_loss:0.131607
Epoch [1/10], Iter [233/3125], train_loss:0.140338
Epoch [1/10], Iter [234/3125], train_loss:0.125807
Epoch [1/10], Iter [235/3125], train_loss:0.109107
Epoch [1/10], Iter [236/3125], train_loss:0.104653
Epoch [1/10], Iter [237/3125], train_loss:0.112867
Epoch [1/10], Iter [238/3125], train_loss:0.096239
Epoch [1/10], Iter [239/3125], train_loss:0.113070
Epoch [1/10], Iter [240/3125], train_loss:0.138504
Epoch [1/10], Iter [241/3125], train_loss:0.116264
Epoch [1/10], Iter [242/3125], train_loss:0.140497
Epoch [1/10], Iter [243/3125], train_loss:0.111269
Epoch [1/10], Iter [244/3125], train_loss:0.126607
Epoch [1/10], Iter [245/3125], train_loss:0.166210
Epoch [1/10], Iter [246/3125], train_loss:0.114601
Epoch [1/10], Iter [247/3125], train_loss:0.086945
Epoch [1/10], Iter [248/3125], train_loss:0.117582
Epoch [1/10], Iter [249/3125], train_loss:0.103387
Epoch [1/10], Iter [250/3125], train_loss:0.105529
Epoch [1/10], Iter [251/3125], train_loss:0.095726
Epoch [1/10], Iter [252/3125], train_loss:0.099371
Epoch [1/10], Iter [253/3125], train_loss:0.086019
Epoch [1/10], Iter [254/3125], train_loss:0.117785
Epoch [1/10], Iter [255/3125], train_loss:0.095674
Epoch [1/10], Iter [256/3125], train_loss:0.107202
Epoch [1/10], Iter [257/3125], train_loss:0.106855
Epoch [1/10], Iter [258/3125], train_loss:0.089076
Epoch [1/10], Iter [259/3125], train_loss:0.085481
Epoch [1/10], Iter [260/3125], train_loss:0.105372
Epoch [1/10], Iter [261/3125], train_loss:0.135841
Epoch [1/10], Iter [262/3125], train_loss:0.091050
Epoch [1/10], Iter [263/3125], train_loss:0.104396
Epoch [1/10], Iter [264/3125], train_loss:0.085995
Epoch [1/10], Iter [265/3125], train_loss:0.082015
Epoch [1/10], Iter [266/3125], train_loss:0.101983
Epoch [1/10], Iter [267/3125], train_loss:0.082330
Epoch [1/10], Iter [268/3125], train_loss:0.096020
Epoch [1/10], Iter [269/3125], train_loss:0.107438
Epoch [1/10], Iter [270/3125], train_loss:0.108927
Epoch [1/10], Iter [271/3125], train_loss:0.090110
Epoch [1/10], Iter [272/3125], train_loss:0.082612
Epoch [1/10], Iter [273/3125], train_loss:0.124343
Epoch [1/10], Iter [274/3125], train_loss:0.134607
Epoch [1/10], Iter [275/3125], train_loss:0.103530
Epoch [1/10], Iter [276/3125], train_loss:0.088286
Epoch [1/10], Iter [277/3125], train_loss:0.120471
Epoch [1/10], Iter [278/3125], train_loss:0.090534
Epoch [1/10], Iter [279/3125], train_loss:0.098560
Epoch [1/10], Iter [280/3125], train_loss:0.093890
Epoch [1/10], Iter [281/3125], train_loss:0.114845
Epoch [1/10], Iter [282/3125], train_loss:0.155583
Epoch [1/10], Iter [283/3125], train_loss:0.084580
Epoch [1/10], Iter [284/3125], train_loss:0.078266
Epoch [1/10], Iter [285/3125], train_loss:0.089209
Epoch [1/10], Iter [286/3125], train_loss:0.129949
Epoch [1/10], Iter [287/3125], train_loss:0.068909
Epoch [1/10], Iter [288/3125], train_loss:0.120867
Epoch [1/10], Iter [289/3125], train_loss:0.107639
Epoch [1/10], Iter [290/3125], train_loss:0.099353
Epoch [1/10], Iter [291/3125], train_loss:0.132016
Epoch [1/10], Iter [292/3125], train_loss:0.090960
Epoch [1/10], Iter [293/3125], train_loss:0.101058
Epoch [1/10], Iter [294/3125], train_loss:0.096238
Epoch [1/10], Iter [295/3125], train_loss:0.084716
Epoch [1/10], Iter [296/3125], train_loss:0.079769
Epoch [1/10], Iter [297/3125], train_loss:0.124798
Epoch [1/10], Iter [298/3125], train_loss:0.096835
Epoch [1/10], Iter [299/3125], train_loss:0.089952
Epoch [1/10], Iter [300/3125], train_loss:0.095460
Epoch [1/10], Iter [301/3125], train_loss:0.086470
Epoch [1/10], Iter [302/3125], train_loss:0.105848
Epoch [1/10], Iter [303/3125], train_loss:0.130099
Epoch [1/10], Iter [304/3125], train_loss:0.131335
Epoch [1/10], Iter [305/3125], train_loss:0.103911
Epoch [1/10], Iter [306/3125], train_loss:0.092839
Epoch [1/10], Iter [307/3125], train_loss:0.128423
Epoch [1/10], Iter [308/3125], train_loss:0.101717
Epoch [1/10], Iter [309/3125], train_loss:0.102042
Epoch [1/10], Iter [310/3125], train_loss:0.108195
Epoch [1/10], Iter [311/3125], train_loss:0.116109
Epoch [1/10], Iter [312/3125], train_loss:0.107782
Epoch [1/10], Iter [313/3125], train_loss:0.102813
Epoch [1/10], Iter [314/3125], train_loss:0.095960
Epoch [1/10], Iter [315/3125], train_loss:0.086566
Epoch [1/10], Iter [316/3125], train_loss:0.081492
Epoch [1/10], Iter [317/3125], train_loss:0.077582
Epoch [1/10], Iter [318/3125], train_loss:0.053461
Epoch [1/10], Iter [319/3125], train_loss:0.084671
Epoch [1/10], Iter [320/3125], train_loss:0.088476
Epoch [1/10], Iter [321/3125], train_loss:0.105547
Epoch [1/10], Iter [322/3125], train_loss:0.079457
Epoch [1/10], Iter [323/3125], train_loss:0.080500
Epoch [1/10], Iter [324/3125], train_loss:0.116692
Epoch [1/10], Iter [325/3125], train_loss:0.095060
Epoch [1/10], Iter [326/3125], train_loss:0.090416
Epoch [1/10], Iter [327/3125], train_loss:0.068069
Epoch [1/10], Iter [328/3125], train_loss:0.110763
Epoch [1/10], Iter [329/3125], train_loss:0.060889
Epoch [1/10], Iter [330/3125], train_loss:0.110807
Epoch [1/10], Iter [331/3125], train_loss:0.122002
Epoch [1/10], Iter [332/3125], train_loss:0.115815
Epoch [1/10], Iter [333/3125], train_loss:0.067004
Epoch [1/10], Iter [334/3125], train_loss:0.063815
Epoch [1/10], Iter [335/3125], train_loss:0.120017
Epoch [1/10], Iter [336/3125], train_loss:0.104086
Epoch [1/10], Iter [337/3125], train_loss:0.091577
Epoch [1/10], Iter [338/3125], train_loss:0.084077
Epoch [1/10], Iter [339/3125], train_loss:0.113410
Epoch [1/10], Iter [340/3125], train_loss:0.061866
Epoch [1/10], Iter [341/3125], train_loss:0.101881
Epoch [1/10], Iter [342/3125], train_loss:0.107144
Epoch [1/10], Iter [343/3125], train_loss:0.142906
Epoch [1/10], Iter [344/3125], train_loss:0.072013
Epoch [1/10], Iter [345/3125], train_loss:0.088949
Epoch [1/10], Iter [346/3125], train_loss:0.067578
Epoch [1/10], Iter [347/3125], train_loss:0.086871
Epoch [1/10], Iter [348/3125], train_loss:0.068842
Epoch [1/10], Iter [349/3125], train_loss:0.086257
Epoch [1/10], Iter [350/3125], train_loss:0.112828
Epoch [1/10], Iter [351/3125], train_loss:0.090362
Epoch [1/10], Iter [352/3125], train_loss:0.092230
Epoch [1/10], Iter [353/3125], train_loss:0.058990
Epoch [1/10], Iter [354/3125], train_loss:0.114826
Epoch [1/10], Iter [355/3125], train_loss:0.076303
Epoch [1/10], Iter [356/3125], train_loss:0.115605
Epoch [1/10], Iter [357/3125], train_loss:0.083856
Epoch [1/10], Iter [358/3125], train_loss:0.114196
Epoch [1/10], Iter [359/3125], train_loss:0.154424
Epoch [1/10], Iter [360/3125], train_loss:0.103248
Epoch [1/10], Iter [361/3125], train_loss:0.093536
Epoch [1/10], Iter [362/3125], train_loss:0.064217
Epoch [1/10], Iter [363/3125], train_loss:0.103777
Epoch [1/10], Iter [364/3125], train_loss:0.049145
Epoch [1/10], Iter [365/3125], train_loss:0.085676
Epoch [1/10], Iter [366/3125], train_loss:0.095860
Epoch [1/10], Iter [367/3125], train_loss:0.045282
Epoch [1/10], Iter [368/3125], train_loss:0.102015
Epoch [1/10], Iter [369/3125], train_loss:0.073394
Epoch [1/10], Iter [370/3125], train_loss:0.080284
Epoch [1/10], Iter [371/3125], train_loss:0.094347
Epoch [1/10], Iter [372/3125], train_loss:0.085500
Epoch [1/10], Iter [373/3125], train_loss:0.119371
Epoch [1/10], Iter [374/3125], train_loss:0.095046
Epoch [1/10], Iter [375/3125], train_loss:0.118757
Epoch [1/10], Iter [376/3125], train_loss:0.107976
Epoch [1/10], Iter [377/3125], train_loss:0.090448
Epoch [1/10], Iter [378/3125], train_loss:0.085898
Epoch [1/10], Iter [379/3125], train_loss:0.110092
Epoch [1/10], Iter [380/3125], train_loss:0.093738
Epoch [1/10], Iter [381/3125], train_loss:0.094126
Epoch [1/10], Iter [382/3125], train_loss:0.087205
Epoch [1/10], Iter [383/3125], train_loss:0.083657
Epoch [1/10], Iter [384/3125], train_loss:0.080641
Epoch [1/10], Iter [385/3125], train_loss:0.101648
Epoch [1/10], Iter [386/3125], train_loss:0.102539
Epoch [1/10], Iter [387/3125], train_loss:0.090064
Epoch [1/10], Iter [388/3125], train_loss:0.140402
Epoch [1/10], Iter [389/3125], train_loss:0.100177
Epoch [1/10], Iter [390/3125], train_loss:0.106683
Epoch [1/10], Iter [391/3125], train_loss:0.072911
Epoch [1/10], Iter [392/3125], train_loss:0.094680
Epoch [1/10], Iter [393/3125], train_loss:0.097260
Epoch [1/10], Iter [394/3125], train_loss:0.104942
Epoch [1/10], Iter [395/3125], train_loss:0.133387
Epoch [1/10], Iter [396/3125], train_loss:0.131581
Epoch [1/10], Iter [397/3125], train_loss:0.107176
Epoch [1/10], Iter [398/3125], train_loss:0.076420
Epoch [1/10], Iter [399/3125], train_loss:0.071057
Epoch [1/10], Iter [400/3125], train_loss:0.102585
Epoch [1/10], Iter [401/3125], train_loss:0.071347
Epoch [1/10], Iter [402/3125], train_loss:0.104381
Epoch [1/10], Iter [403/3125], train_loss:0.111743
Epoch [1/10], Iter [404/3125], train_loss:0.081141
Epoch [1/10], Iter [405/3125], train_loss:0.071977
Epoch [1/10], Iter [406/3125], train_loss:0.095490
Epoch [1/10], Iter [407/3125], train_loss:0.085300
Epoch [1/10], Iter [408/3125], train_loss:0.068072
Epoch [1/10], Iter [409/3125], train_loss:0.068445
Epoch [1/10], Iter [410/3125], train_loss:0.092671
Epoch [1/10], Iter [411/3125], train_loss:0.066765
Epoch [1/10], Iter [412/3125], train_loss:0.107009
Epoch [1/10], Iter [413/3125], train_loss:0.072693
Epoch [1/10], Iter [414/3125], train_loss:0.088150
Epoch [1/10], Iter [415/3125], train_loss:0.090847
Epoch [1/10], Iter [416/3125], train_loss:0.077029
Epoch [1/10], Iter [417/3125], train_loss:0.102404
Epoch [1/10], Iter [418/3125], train_loss:0.138703
Epoch [1/10], Iter [419/3125], train_loss:0.074720
Epoch [1/10], Iter [420/3125], train_loss:0.103256
Epoch [1/10], Iter [421/3125], train_loss:0.091416
Epoch [1/10], Iter [422/3125], train_loss:0.104568
Epoch [1/10], Iter [423/3125], train_loss:0.077688
Epoch [1/10], Iter [424/3125], train_loss:0.090047
Epoch [1/10], Iter [425/3125], train_loss:0.127545
Epoch [1/10], Iter [426/3125], train_loss:0.088344
Epoch [1/10], Iter [427/3125], train_loss:0.101759
Epoch [1/10], Iter [428/3125], train_loss:0.079185
Epoch [1/10], Iter [429/3125], train_loss:0.063097
Epoch [1/10], Iter [430/3125], train_loss:0.121180
Epoch [1/10], Iter [431/3125], train_loss:0.101340
Epoch [1/10], Iter [432/3125], train_loss:0.128714
Epoch [1/10], Iter [433/3125], train_loss:0.062577
Epoch [1/10], Iter [434/3125], train_loss:0.091420
Epoch [1/10], Iter [435/3125], train_loss:0.090504
Epoch [1/10], Iter [436/3125], train_loss:0.119372
Epoch [1/10], Iter [437/3125], train_loss:0.066290
Epoch [1/10], Iter [438/3125], train_loss:0.119662
Epoch [1/10], Iter [439/3125], train_loss:0.110264
Epoch [1/10], Iter [440/3125], train_loss:0.079450
Epoch [1/10], Iter [441/3125], train_loss:0.111833
Epoch [1/10], Iter [442/3125], train_loss:0.094980
Epoch [1/10], Iter [443/3125], train_loss:0.111621
Epoch [1/10], Iter [444/3125], train_loss:0.082750
Epoch [1/10], Iter [445/3125], train_loss:0.104502
Epoch [1/10], Iter [446/3125], train_loss:0.114041
Epoch [1/10], Iter [447/3125], train_loss:0.071238
Epoch [1/10], Iter [448/3125], train_loss:0.088294
Epoch [1/10], Iter [449/3125], train_loss:0.069142
Epoch [1/10], Iter
[450/3125], train_loss:0.129054 Epoch [1/10], Iter [451/3125], train_loss:0.091864 Epoch [1/10], Iter [452/3125], train_loss:0.080189 Epoch [1/10], Iter [453/3125], train_loss:0.060313 Epoch [1/10], Iter [454/3125], train_loss:0.129373 Epoch [1/10], Iter [455/3125], train_loss:0.073149 Epoch [1/10], Iter [456/3125], train_loss:0.073206 Epoch [1/10], Iter [457/3125], train_loss:0.088790 Epoch [1/10], Iter [458/3125], train_loss:0.066144 Epoch [1/10], Iter [459/3125], train_loss:0.103504 Epoch [1/10], Iter [460/3125], train_loss:0.060709 Epoch [1/10], Iter [461/3125], train_loss:0.108793 Epoch [1/10], Iter [462/3125], train_loss:0.093702 Epoch [1/10], Iter [463/3125], train_loss:0.116326 Epoch [1/10], Iter [464/3125], train_loss:0.104743 Epoch [1/10], Iter [465/3125], train_loss:0.082492 Epoch [1/10], Iter [466/3125], train_loss:0.092319 Epoch [1/10], Iter [467/3125], train_loss:0.065833 Epoch [1/10], Iter [468/3125], train_loss:0.051208 Epoch [1/10], Iter [469/3125], train_loss:0.093229 Epoch [1/10], Iter [470/3125], train_loss:0.095329 Epoch [1/10], Iter [471/3125], train_loss:0.099470 Epoch [1/10], Iter [472/3125], train_loss:0.072319 Epoch [1/10], Iter [473/3125], train_loss:0.062743 Epoch [1/10], Iter [474/3125], train_loss:0.108008 Epoch [1/10], Iter [475/3125], train_loss:0.046297 Epoch [1/10], Iter [476/3125], train_loss:0.077335 Epoch [1/10], Iter [477/3125], train_loss:0.088254 Epoch [1/10], Iter [478/3125], train_loss:0.101036 Epoch [1/10], Iter [479/3125], train_loss:0.083029 Epoch [1/10], Iter [480/3125], train_loss:0.097751 Epoch [1/10], Iter [481/3125], train_loss:0.096469 Epoch [1/10], Iter [482/3125], train_loss:0.087993 Epoch [1/10], Iter [483/3125], train_loss:0.099732 Epoch [1/10], Iter [484/3125], train_loss:0.073528 Epoch [1/10], Iter [485/3125], train_loss:0.101679 Epoch [1/10], Iter [486/3125], train_loss:0.100552 Epoch [1/10], Iter [487/3125], train_loss:0.087380 Epoch [1/10], Iter [488/3125], train_loss:0.121468 Epoch [1/10], Iter 
[489/3125], train_loss:0.097617 Epoch [1/10], Iter [490/3125], train_loss:0.104743 Epoch [1/10], Iter [491/3125], train_loss:0.078716 Epoch [1/10], Iter [492/3125], train_loss:0.098265 Epoch [1/10], Iter [493/3125], train_loss:0.082094 Epoch [1/10], Iter [494/3125], train_loss:0.087327 Epoch [1/10], Iter [495/3125], train_loss:0.069399 Epoch [1/10], Iter [496/3125], train_loss:0.066200 Epoch [1/10], Iter [497/3125], train_loss:0.068601 Epoch [1/10], Iter [498/3125], train_loss:0.126001 Epoch [1/10], Iter [499/3125], train_loss:0.085090 Epoch [1/10], Iter [500/3125], train_loss:0.109014 Epoch [1/10], Iter [501/3125], train_loss:0.106699 Epoch [1/10], Iter [502/3125], train_loss:0.082973 Epoch [1/10], Iter [503/3125], train_loss:0.095683 Epoch [1/10], Iter [504/3125], train_loss:0.113937 Epoch [1/10], Iter [505/3125], train_loss:0.032092 Epoch [1/10], Iter [506/3125], train_loss:0.071751 Epoch [1/10], Iter [507/3125], train_loss:0.082614 Epoch [1/10], Iter [508/3125], train_loss:0.076657 Epoch [1/10], Iter [509/3125], train_loss:0.078356 Epoch [1/10], Iter [510/3125], train_loss:0.109523 Epoch [1/10], Iter [511/3125], train_loss:0.108152 Epoch [1/10], Iter [512/3125], train_loss:0.092030 Epoch [1/10], Iter [513/3125], train_loss:0.115947 Epoch [1/10], Iter [514/3125], train_loss:0.108748 Epoch [1/10], Iter [515/3125], train_loss:0.091761 Epoch [1/10], Iter [516/3125], train_loss:0.073188 Epoch [1/10], Iter [517/3125], train_loss:0.120827 Epoch [1/10], Iter [518/3125], train_loss:0.067271 Epoch [1/10], Iter [519/3125], train_loss:0.050369 Epoch [1/10], Iter [520/3125], train_loss:0.070868 Epoch [1/10], Iter [521/3125], train_loss:0.113249 Epoch [1/10], Iter [522/3125], train_loss:0.090670 Epoch [1/10], Iter [523/3125], train_loss:0.104130 Epoch [1/10], Iter [524/3125], train_loss:0.095427 Epoch [1/10], Iter [525/3125], train_loss:0.141192 Epoch [1/10], Iter [526/3125], train_loss:0.076236 Epoch [1/10], Iter [527/3125], train_loss:0.117406 Epoch [1/10], Iter 
[528/3125], train_loss:0.114006 Epoch [1/10], Iter [529/3125], train_loss:0.066016 Epoch [1/10], Iter [530/3125], train_loss:0.093731 Epoch [1/10], Iter [531/3125], train_loss:0.072306 Epoch [1/10], Iter [532/3125], train_loss:0.074725 Epoch [1/10], Iter [533/3125], train_loss:0.090788 Epoch [1/10], Iter [534/3125], train_loss:0.071732 Epoch [1/10], Iter [535/3125], train_loss:0.083744 Epoch [1/10], Iter [536/3125], train_loss:0.066183 Epoch [1/10], Iter [537/3125], train_loss:0.116836 Epoch [1/10], Iter [538/3125], train_loss:0.086225 Epoch [1/10], Iter [539/3125], train_loss:0.097140 Epoch [1/10], Iter [540/3125], train_loss:0.076652 Epoch [1/10], Iter [541/3125], train_loss:0.058895 Epoch [1/10], Iter [542/3125], train_loss:0.068447 Epoch [1/10], Iter [543/3125], train_loss:0.071758 Epoch [1/10], Iter [544/3125], train_loss:0.055181 Epoch [1/10], Iter [545/3125], train_loss:0.058409 Epoch [1/10], Iter [546/3125], train_loss:0.101034 Epoch [1/10], Iter [547/3125], train_loss:0.078014 Epoch [1/10], Iter [548/3125], train_loss:0.101554 Epoch [1/10], Iter [549/3125], train_loss:0.099358 Epoch [1/10], Iter [550/3125], train_loss:0.086353 Epoch [1/10], Iter [551/3125], train_loss:0.087590 Epoch [1/10], Iter [552/3125], train_loss:0.050383 Epoch [1/10], Iter [553/3125], train_loss:0.100233 Epoch [1/10], Iter [554/3125], train_loss:0.095480 Epoch [1/10], Iter [555/3125], train_loss:0.093082 Epoch [1/10], Iter [556/3125], train_loss:0.077300 Epoch [1/10], Iter [557/3125], train_loss:0.097098 Epoch [1/10], Iter [558/3125], train_loss:0.108629 Epoch [1/10], Iter [559/3125], train_loss:0.080039 Epoch [1/10], Iter [560/3125], train_loss:0.086488 Epoch [1/10], Iter [561/3125], train_loss:0.105568 Epoch [1/10], Iter [562/3125], train_loss:0.079867 Epoch [1/10], Iter [563/3125], train_loss:0.094058 Epoch [1/10], Iter [564/3125], train_loss:0.071488 Epoch [1/10], Iter [565/3125], train_loss:0.068944 Epoch [1/10], Iter [566/3125], train_loss:0.107989 Epoch [1/10], Iter 
[567/3125], train_loss:0.072702 Epoch [1/10], Iter [568/3125], train_loss:0.092457 Epoch [1/10], Iter [569/3125], train_loss:0.116950 Epoch [1/10], Iter [570/3125], train_loss:0.057468 Epoch [1/10], Iter [571/3125], train_loss:0.067517 Epoch [1/10], Iter [572/3125], train_loss:0.069241 Epoch [1/10], Iter [573/3125], train_loss:0.112788 Epoch [1/10], Iter [574/3125], train_loss:0.135044 Epoch [1/10], Iter [575/3125], train_loss:0.139375 Epoch [1/10], Iter [576/3125], train_loss:0.083855 Epoch [1/10], Iter [577/3125], train_loss:0.111794 Epoch [1/10], Iter [578/3125], train_loss:0.087120 Epoch [1/10], Iter [579/3125], train_loss:0.089663 Epoch [1/10], Iter [580/3125], train_loss:0.074575 Epoch [1/10], Iter [581/3125], train_loss:0.064921 Epoch [1/10], Iter [582/3125], train_loss:0.192595 Epoch [1/10], Iter [583/3125], train_loss:0.107797 Epoch [1/10], Iter [584/3125], train_loss:0.077203 Epoch [1/10], Iter [585/3125], train_loss:0.123417 Epoch [1/10], Iter [586/3125], train_loss:0.082694 Epoch [1/10], Iter [587/3125], train_loss:0.075541 Epoch [1/10], Iter [588/3125], train_loss:0.097291 Epoch [1/10], Iter [589/3125], train_loss:0.052539 Epoch [1/10], Iter [590/3125], train_loss:0.066947 Epoch [1/10], Iter [591/3125], train_loss:0.061442 Epoch [1/10], Iter [592/3125], train_loss:0.066907 Epoch [1/10], Iter [593/3125], train_loss:0.059535 Epoch [1/10], Iter [594/3125], train_loss:0.074935 Epoch [1/10], Iter [595/3125], train_loss:0.084690 Epoch [1/10], Iter [596/3125], train_loss:0.063918 Epoch [1/10], Iter [597/3125], train_loss:0.063785 Epoch [1/10], Iter [598/3125], train_loss:0.108638 Epoch [1/10], Iter [599/3125], train_loss:0.086835 Epoch [1/10], Iter [600/3125], train_loss:0.098556 Epoch [1/10], Iter [601/3125], train_loss:0.075705 Epoch [1/10], Iter [602/3125], train_loss:0.059754 Epoch [1/10], Iter [603/3125], train_loss:0.054489 Epoch [1/10], Iter [604/3125], train_loss:0.073924 Epoch [1/10], Iter [605/3125], train_loss:0.094530 Epoch [1/10], Iter 
[606/3125], train_loss:0.053714 Epoch [1/10], Iter [607/3125], train_loss:0.090675 Epoch [1/10], Iter [608/3125], train_loss:0.078084 Epoch [1/10], Iter [609/3125], train_loss:0.066804 Epoch [1/10], Iter [610/3125], train_loss:0.100219 Epoch [1/10], Iter [611/3125], train_loss:0.075962 Epoch [1/10], Iter [612/3125], train_loss:0.070294 Epoch [1/10], Iter [613/3125], train_loss:0.071478 Epoch [1/10], Iter [614/3125], train_loss:0.096717 Epoch [1/10], Iter [615/3125], train_loss:0.086769 Epoch [1/10], Iter [616/3125], train_loss:0.104664 Epoch [1/10], Iter [617/3125], train_loss:0.072344 Epoch [1/10], Iter [618/3125], train_loss:0.074144 Epoch [1/10], Iter [619/3125], train_loss:0.084967 Epoch [1/10], Iter [620/3125], train_loss:0.095983 Epoch [1/10], Iter [621/3125], train_loss:0.068011 Epoch [1/10], Iter [622/3125], train_loss:0.051430 Epoch [1/10], Iter [623/3125], train_loss:0.072359 Epoch [1/10], Iter [624/3125], train_loss:0.051836 Epoch [1/10], Iter [625/3125], train_loss:0.103024 Epoch [1/10], Iter [626/3125], train_loss:0.088216 Epoch [1/10], Iter [627/3125], train_loss:0.061990 Epoch [1/10], Iter [628/3125], train_loss:0.107665 Epoch [1/10], Iter [629/3125], train_loss:0.076811 Epoch [1/10], Iter [630/3125], train_loss:0.123782 Epoch [1/10], Iter [631/3125], train_loss:0.094078 Epoch [1/10], Iter [632/3125], train_loss:0.059769 Epoch [1/10], Iter [633/3125], train_loss:0.066241 Epoch [1/10], Iter [634/3125], train_loss:0.071580 Epoch [1/10], Iter [635/3125], train_loss:0.076411 Epoch [1/10], Iter [636/3125], train_loss:0.110754 Epoch [1/10], Iter [637/3125], train_loss:0.065504 Epoch [1/10], Iter [638/3125], train_loss:0.083259 Epoch [1/10], Iter [639/3125], train_loss:0.107182 Epoch [1/10], Iter [640/3125], train_loss:0.060376 Epoch [1/10], Iter [641/3125], train_loss:0.077829 Epoch [1/10], Iter [642/3125], train_loss:0.100774 Epoch [1/10], Iter [643/3125], train_loss:0.087143 Epoch [1/10], Iter [644/3125], train_loss:0.060597 Epoch [1/10], Iter 
[645/3125], train_loss:0.101928 Epoch [1/10], Iter [646/3125], train_loss:0.092720 Epoch [1/10], Iter [647/3125], train_loss:0.081452 Epoch [1/10], Iter [648/3125], train_loss:0.097151 Epoch [1/10], Iter [649/3125], train_loss:0.070104 Epoch [1/10], Iter [650/3125], train_loss:0.094944 Epoch [1/10], Iter [651/3125], train_loss:0.056059 Epoch [1/10], Iter [652/3125], train_loss:0.065773 Epoch [1/10], Iter [653/3125], train_loss:0.087860 Epoch [1/10], Iter [654/3125], train_loss:0.088647 Epoch [1/10], Iter [655/3125], train_loss:0.074508 Epoch [1/10], Iter [656/3125], train_loss:0.078260 Epoch [1/10], Iter [657/3125], train_loss:0.068859 Epoch [1/10], Iter [658/3125], train_loss:0.080638 Epoch [1/10], Iter [659/3125], train_loss:0.101420 Epoch [1/10], Iter [660/3125], train_loss:0.084931 Epoch [1/10], Iter [661/3125], train_loss:0.066806 Epoch [1/10], Iter [662/3125], train_loss:0.105629 Epoch [1/10], Iter [663/3125], train_loss:0.084870 Epoch [1/10], Iter [664/3125], train_loss:0.071970 Epoch [1/10], Iter [665/3125], train_loss:0.087836 Epoch [1/10], Iter [666/3125], train_loss:0.100669 Epoch [1/10], Iter [667/3125], train_loss:0.077280 Epoch [1/10], Iter [668/3125], train_loss:0.116738 Epoch [1/10], Iter [669/3125], train_loss:0.061395 Epoch [1/10], Iter [670/3125], train_loss:0.090685 Epoch [1/10], Iter [671/3125], train_loss:0.080947 Epoch [1/10], Iter [672/3125], train_loss:0.095348 Epoch [1/10], Iter [673/3125], train_loss:0.092972 Epoch [1/10], Iter [674/3125], train_loss:0.107024 Epoch [1/10], Iter [675/3125], train_loss:0.084352 Epoch [1/10], Iter [676/3125], train_loss:0.059006 Epoch [1/10], Iter [677/3125], train_loss:0.092779 Epoch [1/10], Iter [678/3125], train_loss:0.077512 Epoch [1/10], Iter [679/3125], train_loss:0.096963 Epoch [1/10], Iter [680/3125], train_loss:0.096011 Epoch [1/10], Iter [681/3125], train_loss:0.079866 Epoch [1/10], Iter [682/3125], train_loss:0.075723 Epoch [1/10], Iter [683/3125], train_loss:0.085611 Epoch [1/10], Iter 
[684/3125], train_loss:0.123355 Epoch [1/10], Iter [685/3125], train_loss:0.069978 Epoch [1/10], Iter [686/3125], train_loss:0.077491 Epoch [1/10], Iter [687/3125], train_loss:0.055490 Epoch [1/10], Iter [688/3125], train_loss:0.067270 Epoch [1/10], Iter [689/3125], train_loss:0.114452 Epoch [1/10], Iter [690/3125], train_loss:0.079901 Epoch [1/10], Iter [691/3125], train_loss:0.090492 Epoch [1/10], Iter [692/3125], train_loss:0.072870 Epoch [1/10], Iter [693/3125], train_loss:0.065780 Epoch [1/10], Iter [694/3125], train_loss:0.078856 Epoch [1/10], Iter [695/3125], train_loss:0.062660 Epoch [1/10], Iter [696/3125], train_loss:0.094964 Epoch [1/10], Iter [697/3125], train_loss:0.085245 Epoch [1/10], Iter [698/3125], train_loss:0.096854 Epoch [1/10], Iter [699/3125], train_loss:0.056521 Epoch [1/10], Iter [700/3125], train_loss:0.064707 Epoch [1/10], Iter [701/3125], train_loss:0.102361 Epoch [1/10], Iter [702/3125], train_loss:0.083936 Epoch [1/10], Iter [703/3125], train_loss:0.071545 Epoch [1/10], Iter [704/3125], train_loss:0.056376 Epoch [1/10], Iter [705/3125], train_loss:0.075224 Epoch [1/10], Iter [706/3125], train_loss:0.088155 Epoch [1/10], Iter [707/3125], train_loss:0.075692 Epoch [1/10], Iter [708/3125], train_loss:0.077199 Epoch [1/10], Iter [709/3125], train_loss:0.069121 Epoch [1/10], Iter [710/3125], train_loss:0.077576 Epoch [1/10], Iter [711/3125], train_loss:0.069567 Epoch [1/10], Iter [712/3125], train_loss:0.075430 Epoch [1/10], Iter [713/3125], train_loss:0.070002 Epoch [1/10], Iter [714/3125], train_loss:0.083099 Epoch [1/10], Iter [715/3125], train_loss:0.129424 Epoch [1/10], Iter [716/3125], train_loss:0.076017 Epoch [1/10], Iter [717/3125], train_loss:0.093424 Epoch [1/10], Iter [718/3125], train_loss:0.046105 Epoch [1/10], Iter [719/3125], train_loss:0.103817 Epoch [1/10], Iter [720/3125], train_loss:0.063443 Epoch [1/10], Iter [721/3125], train_loss:0.068008 Epoch [1/10], Iter [722/3125], train_loss:0.080830 Epoch [1/10], Iter 
[723/3125], train_loss:0.063206 Epoch [1/10], Iter [724/3125], train_loss:0.046125 Epoch [1/10], Iter [725/3125], train_loss:0.098638 Epoch [1/10], Iter [726/3125], train_loss:0.059091 Epoch [1/10], Iter [727/3125], train_loss:0.104707 Epoch [1/10], Iter [728/3125], train_loss:0.060244 Epoch [1/10], Iter [729/3125], train_loss:0.056369 Epoch [1/10], Iter [730/3125], train_loss:0.066725 Epoch [1/10], Iter [731/3125], train_loss:0.078067 Epoch [1/10], Iter [732/3125], train_loss:0.074055 Epoch [1/10], Iter [733/3125], train_loss:0.035916 Epoch [1/10], Iter [734/3125], train_loss:0.066059 Epoch [1/10], Iter [735/3125], train_loss:0.118576 Epoch [1/10], Iter [736/3125], train_loss:0.095265 Epoch [1/10], Iter [737/3125], train_loss:0.085072 Epoch [1/10], Iter [738/3125], train_loss:0.076775 Epoch [1/10], Iter [739/3125], train_loss:0.077835 Epoch [1/10], Iter [740/3125], train_loss:0.071196 Epoch [1/10], Iter [741/3125], train_loss:0.068851 Epoch [1/10], Iter [742/3125], train_loss:0.041999 Epoch [1/10], Iter [743/3125], train_loss:0.074546 Epoch [1/10], Iter [744/3125], train_loss:0.098691 Epoch [1/10], Iter [745/3125], train_loss:0.100539 Epoch [1/10], Iter [746/3125], train_loss:0.079695 Epoch [1/10], Iter [747/3125], train_loss:0.078971 Epoch [1/10], Iter [748/3125], train_loss:0.081766 Epoch [1/10], Iter [749/3125], train_loss:0.089490 Epoch [1/10], Iter [750/3125], train_loss:0.077093 Epoch [1/10], Iter [751/3125], train_loss:0.077361 Epoch [1/10], Iter [752/3125], train_loss:0.114653 Epoch [1/10], Iter [753/3125], train_loss:0.047497 Epoch [1/10], Iter [754/3125], train_loss:0.121098 Epoch [1/10], Iter [755/3125], train_loss:0.070111 Epoch [1/10], Iter [756/3125], train_loss:0.069042 Epoch [1/10], Iter [757/3125], train_loss:0.073422 Epoch [1/10], Iter [758/3125], train_loss:0.070171 Epoch [1/10], Iter [759/3125], train_loss:0.104445 Epoch [1/10], Iter [760/3125], train_loss:0.075994 Epoch [1/10], Iter [761/3125], train_loss:0.057151 Epoch [1/10], Iter 
[762/3125], train_loss:0.086842 Epoch [1/10], Iter [763/3125], train_loss:0.050175 Epoch [1/10], Iter [764/3125], train_loss:0.114565 Epoch [1/10], Iter [765/3125], train_loss:0.088730 Epoch [1/10], Iter [766/3125], train_loss:0.084020 Epoch [1/10], Iter [767/3125], train_loss:0.055446 Epoch [1/10], Iter [768/3125], train_loss:0.073858 Epoch [1/10], Iter [769/3125], train_loss:0.076490 Epoch [1/10], Iter [770/3125], train_loss:0.117408 Epoch [1/10], Iter [771/3125], train_loss:0.074123 Epoch [1/10], Iter [772/3125], train_loss:0.091184 Epoch [1/10], Iter [773/3125], train_loss:0.101151 Epoch [1/10], Iter [774/3125], train_loss:0.069927 Epoch [1/10], Iter [775/3125], train_loss:0.078611 Epoch [1/10], Iter [776/3125], train_loss:0.076168 Epoch [1/10], Iter [777/3125], train_loss:0.098598 Epoch [1/10], Iter [778/3125], train_loss:0.080934 Epoch [1/10], Iter [779/3125], train_loss:0.065147 Epoch [1/10], Iter [780/3125], train_loss:0.092266 Epoch [1/10], Iter [781/3125], train_loss:0.088162 Epoch [1/10], Iter [782/3125], train_loss:0.048683 Epoch [1/10], Iter [783/3125], train_loss:0.068024 Epoch [1/10], Iter [784/3125], train_loss:0.061430 Epoch [1/10], Iter [785/3125], train_loss:0.084588 Epoch [1/10], Iter [786/3125], train_loss:0.055528 Epoch [1/10], Iter [787/3125], train_loss:0.069858 Epoch [1/10], Iter [788/3125], train_loss:0.066797 Epoch [1/10], Iter [789/3125], train_loss:0.055900 Epoch [1/10], Iter [790/3125], train_loss:0.081083 Epoch [1/10], Iter [791/3125], train_loss:0.104611 Epoch [1/10], Iter [792/3125], train_loss:0.069633 Epoch [1/10], Iter [793/3125], train_loss:0.076716 Epoch [1/10], Iter [794/3125], train_loss:0.058692 Epoch [1/10], Iter [795/3125], train_loss:0.071644 Epoch [1/10], Iter [796/3125], train_loss:0.075141 Epoch [1/10], Iter [797/3125], train_loss:0.057095 Epoch [1/10], Iter [798/3125], train_loss:0.091708 Epoch [1/10], Iter [799/3125], train_loss:0.082720 Epoch [1/10], Iter [800/3125], train_loss:0.082454 Epoch [1/10], Iter 
[801/3125], train_loss:0.062604 Epoch [1/10], Iter [802/3125], train_loss:0.064724 Epoch [1/10], Iter [803/3125], train_loss:0.070556 Epoch [1/10], Iter [804/3125], train_loss:0.062924 Epoch [1/10], Iter [805/3125], train_loss:0.068634 Epoch [1/10], Iter [806/3125], train_loss:0.125406 Epoch [1/10], Iter [807/3125], train_loss:0.105064 Epoch [1/10], Iter [808/3125], train_loss:0.094673 Epoch [1/10], Iter [809/3125], train_loss:0.058413 Epoch [1/10], Iter [810/3125], train_loss:0.068775 Epoch [1/10], Iter [811/3125], train_loss:0.082067 Epoch [1/10], Iter [812/3125], train_loss:0.069499 Epoch [1/10], Iter [813/3125], train_loss:0.046804 Epoch [1/10], Iter [814/3125], train_loss:0.052497 Epoch [1/10], Iter [815/3125], train_loss:0.039903 Epoch [1/10], Iter [816/3125], train_loss:0.075335 Epoch [1/10], Iter [817/3125], train_loss:0.118900 Epoch [1/10], Iter [818/3125], train_loss:0.095827 Epoch [1/10], Iter [819/3125], train_loss:0.080276 Epoch [1/10], Iter [820/3125], train_loss:0.078976 Epoch [1/10], Iter [821/3125], train_loss:0.067389 Epoch [1/10], Iter [822/3125], train_loss:0.039839 Epoch [1/10], Iter [823/3125], train_loss:0.084257 Epoch [1/10], Iter [824/3125], train_loss:0.086442 Epoch [1/10], Iter [825/3125], train_loss:0.067308 Epoch [1/10], Iter [826/3125], train_loss:0.065607 Epoch [1/10], Iter [827/3125], train_loss:0.076576 Epoch [1/10], Iter [828/3125], train_loss:0.059056 Epoch [1/10], Iter [829/3125], train_loss:0.045432 Epoch [1/10], Iter [830/3125], train_loss:0.097930 Epoch [1/10], Iter [831/3125], train_loss:0.029969 Epoch [1/10], Iter [832/3125], train_loss:0.089879 Epoch [1/10], Iter [833/3125], train_loss:0.065557 Epoch [1/10], Iter [834/3125], train_loss:0.055370 Epoch [1/10], Iter [835/3125], train_loss:0.078189 Epoch [1/10], Iter [836/3125], train_loss:0.078902 Epoch [1/10], Iter [837/3125], train_loss:0.049187 Epoch [1/10], Iter [838/3125], train_loss:0.073233 Epoch [1/10], Iter [839/3125], train_loss:0.042756 Epoch [1/10], Iter 
[840/3125], train_loss:0.095991 Epoch [1/10], Iter [841/3125], train_loss:0.054647 Epoch [1/10], Iter [842/3125], train_loss:0.090404 Epoch [1/10], Iter [843/3125], train_loss:0.084048 Epoch [1/10], Iter [844/3125], train_loss:0.042351 Epoch [1/10], Iter [845/3125], train_loss:0.110720 Epoch [1/10], Iter [846/3125], train_loss:0.058698 Epoch [1/10], Iter [847/3125], train_loss:0.065574 Epoch [1/10], Iter [848/3125], train_loss:0.103704 Epoch [1/10], Iter [849/3125], train_loss:0.092518 Epoch [1/10], Iter [850/3125], train_loss:0.105825 Epoch [1/10], Iter [851/3125], train_loss:0.092112 Epoch [1/10], Iter [852/3125], train_loss:0.060410 Epoch [1/10], Iter [853/3125], train_loss:0.053077 Epoch [1/10], Iter [854/3125], train_loss:0.096419 Epoch [1/10], Iter [855/3125], train_loss:0.070295 Epoch [1/10], Iter [856/3125], train_loss:0.038191 Epoch [1/10], Iter [857/3125], train_loss:0.067107 Epoch [1/10], Iter [858/3125], train_loss:0.068591 Epoch [1/10], Iter [859/3125], train_loss:0.118834 Epoch [1/10], Iter [860/3125], train_loss:0.057502 Epoch [1/10], Iter [861/3125], train_loss:0.112667 Epoch [1/10], Iter [862/3125], train_loss:0.068514 Epoch [1/10], Iter [863/3125], train_loss:0.078345 Epoch [1/10], Iter [864/3125], train_loss:0.086322 Epoch [1/10], Iter [865/3125], train_loss:0.060227 Epoch [1/10], Iter [866/3125], train_loss:0.069537 Epoch [1/10], Iter [867/3125], train_loss:0.051423 Epoch [1/10], Iter [868/3125], train_loss:0.065481 Epoch [1/10], Iter [869/3125], train_loss:0.078509 Epoch [1/10], Iter [870/3125], train_loss:0.087949 Epoch [1/10], Iter [871/3125], train_loss:0.089137 Epoch [1/10], Iter [872/3125], train_loss:0.097406 Epoch [1/10], Iter [873/3125], train_loss:0.058960 Epoch [1/10], Iter [874/3125], train_loss:0.058738 Epoch [1/10], Iter [875/3125], train_loss:0.061488 Epoch [1/10], Iter [876/3125], train_loss:0.066018 Epoch [1/10], Iter [877/3125], train_loss:0.074891 Epoch [1/10], Iter [878/3125], train_loss:0.086487 Epoch [1/10], Iter 
[879/3125], train_loss:0.036267 Epoch [1/10], Iter [880/3125], train_loss:0.052825 Epoch [1/10], Iter [881/3125], train_loss:0.086232 Epoch [1/10], Iter [882/3125], train_loss:0.067304 Epoch [1/10], Iter [883/3125], train_loss:0.090174 Epoch [1/10], Iter [884/3125], train_loss:0.074173 Epoch [1/10], Iter [885/3125], train_loss:0.103388 Epoch [1/10], Iter [886/3125], train_loss:0.063061 Epoch [1/10], Iter [887/3125], train_loss:0.111390 Epoch [1/10], Iter [888/3125], train_loss:0.082873 Epoch [1/10], Iter [889/3125], train_loss:0.067860 Epoch [1/10], Iter [890/3125], train_loss:0.069580 Epoch [1/10], Iter [891/3125], train_loss:0.071146 Epoch [1/10], Iter [892/3125], train_loss:0.046750 Epoch [1/10], Iter [893/3125], train_loss:0.069989 Epoch [1/10], Iter [894/3125], train_loss:0.054033 Epoch [1/10], Iter [895/3125], train_loss:0.091311 Epoch [1/10], Iter [896/3125], train_loss:0.089567 Epoch [1/10], Iter [897/3125], train_loss:0.082130 Epoch [1/10], Iter [898/3125], train_loss:0.115708 Epoch [1/10], Iter [899/3125], train_loss:0.099699 Epoch [1/10], Iter [900/3125], train_loss:0.084736 Epoch [1/10], Iter [901/3125], train_loss:0.099145 Epoch [1/10], Iter [902/3125], train_loss:0.096519 Epoch [1/10], Iter [903/3125], train_loss:0.070268 Epoch [1/10], Iter [904/3125], train_loss:0.048972 Epoch [1/10], Iter [905/3125], train_loss:0.055735 Epoch [1/10], Iter [906/3125], train_loss:0.092406 Epoch [1/10], Iter [907/3125], train_loss:0.094186 Epoch [1/10], Iter [908/3125], train_loss:0.058645 Epoch [1/10], Iter [909/3125], train_loss:0.059716 Epoch [1/10], Iter [910/3125], train_loss:0.066300 Epoch [1/10], Iter [911/3125], train_loss:0.055384 Epoch [1/10], Iter [912/3125], train_loss:0.063149 Epoch [1/10], Iter [913/3125], train_loss:0.078833 Epoch [1/10], Iter [914/3125], train_loss:0.047108 Epoch [1/10], Iter [915/3125], train_loss:0.095854 Epoch [1/10], Iter [916/3125], train_loss:0.067950 Epoch [1/10], Iter [917/3125], train_loss:0.089043 Epoch [1/10], Iter 
[918/3125], train_loss:0.091433 Epoch [1/10], Iter [919/3125], train_loss:0.071309 Epoch [1/10], Iter [920/3125], train_loss:0.064289 Epoch [1/10], Iter [921/3125], train_loss:0.075466 Epoch [1/10], Iter [922/3125], train_loss:0.041136 Epoch [1/10], Iter [923/3125], train_loss:0.069332 Epoch [1/10], Iter [924/3125], train_loss:0.103374 Epoch [1/10], Iter [925/3125], train_loss:0.048819 Epoch [1/10], Iter [926/3125], train_loss:0.102714 Epoch [1/10], Iter [927/3125], train_loss:0.059707 Epoch [1/10], Iter [928/3125], train_loss:0.103872 Epoch [1/10], Iter [929/3125], train_loss:0.071671 Epoch [1/10], Iter [930/3125], train_loss:0.043527 Epoch [1/10], Iter [931/3125], train_loss:0.101342 Epoch [1/10], Iter [932/3125], train_loss:0.090892 Epoch [1/10], Iter [933/3125], train_loss:0.084326 Epoch [1/10], Iter [934/3125], train_loss:0.085523 Epoch [1/10], Iter [935/3125], train_loss:0.104836 Epoch [1/10], Iter [936/3125], train_loss:0.071485 Epoch [1/10], Iter [937/3125], train_loss:0.075505 Epoch [1/10], Iter [938/3125], train_loss:0.055048 Epoch [1/10], Iter [939/3125], train_loss:0.052603 Epoch [1/10], Iter [940/3125], train_loss:0.052872 Epoch [1/10], Iter [941/3125], train_loss:0.046744 Epoch [1/10], Iter [942/3125], train_loss:0.084774 Epoch [1/10], Iter [943/3125], train_loss:0.089809 Epoch [1/10], Iter [944/3125], train_loss:0.077171 Epoch [1/10], Iter [945/3125], train_loss:0.053297 Epoch [1/10], Iter [946/3125], train_loss:0.048126 Epoch [1/10], Iter [947/3125], train_loss:0.069072 Epoch [1/10], Iter [948/3125], train_loss:0.081771 Epoch [1/10], Iter [949/3125], train_loss:0.086464 Epoch [1/10], Iter [950/3125], train_loss:0.078226 Epoch [1/10], Iter [951/3125], train_loss:0.070242 Epoch [1/10], Iter [952/3125], train_loss:0.065498 Epoch [1/10], Iter [953/3125], train_loss:0.057135 Epoch [1/10], Iter [954/3125], train_loss:0.087012 Epoch [1/10], Iter [955/3125], train_loss:0.087501 Epoch [1/10], Iter [956/3125], train_loss:0.076051 Epoch [1/10], Iter 
[957/3125], train_loss:0.093375 Epoch [1/10], Iter [958/3125], train_loss:0.098896 Epoch [1/10], Iter [959/3125], train_loss:0.094898 Epoch [1/10], Iter [960/3125], train_loss:0.051544 Epoch [1/10], Iter [961/3125], train_loss:0.112901 Epoch [1/10], Iter [962/3125], train_loss:0.064911 Epoch [1/10], Iter [963/3125], train_loss:0.127530 Epoch [1/10], Iter [964/3125], train_loss:0.060438 Epoch [1/10], Iter [965/3125], train_loss:0.073689 Epoch [1/10], Iter [966/3125], train_loss:0.058125 Epoch [1/10], Iter [967/3125], train_loss:0.076736 Epoch [1/10], Iter [968/3125], train_loss:0.076557 Epoch [1/10], Iter [969/3125], train_loss:0.064269 Epoch [1/10], Iter [970/3125], train_loss:0.078429 Epoch [1/10], Iter [971/3125], train_loss:0.053220 Epoch [1/10], Iter [972/3125], train_loss:0.059810 Epoch [1/10], Iter [973/3125], train_loss:0.061482 Epoch [1/10], Iter [974/3125], train_loss:0.059918 Epoch [1/10], Iter [975/3125], train_loss:0.095541 Epoch [1/10], Iter [976/3125], train_loss:0.066343 Epoch [1/10], Iter [977/3125], train_loss:0.063362 Epoch [1/10], Iter [978/3125], train_loss:0.049746 Epoch [1/10], Iter [979/3125], train_loss:0.076230 Epoch [1/10], Iter [980/3125], train_loss:0.085253 Epoch [1/10], Iter [981/3125], train_loss:0.055329 Epoch [1/10], Iter [982/3125], train_loss:0.073866 Epoch [1/10], Iter [983/3125], train_loss:0.090456 Epoch [1/10], Iter [984/3125], train_loss:0.065264 Epoch [1/10], Iter [985/3125], train_loss:0.094808 Epoch [1/10], Iter [986/3125], train_loss:0.083755 Epoch [1/10], Iter [987/3125], train_loss:0.100000 Epoch [1/10], Iter [988/3125], train_loss:0.044194 Epoch [1/10], Iter [989/3125], train_loss:0.089688 Epoch [1/10], Iter [990/3125], train_loss:0.061354 Epoch [1/10], Iter [991/3125], train_loss:0.072798 Epoch [1/10], Iter [992/3125], train_loss:0.055077 Epoch [1/10], Iter [993/3125], train_loss:0.066739 Epoch [1/10], Iter [994/3125], train_loss:0.085635 Epoch [1/10], Iter [995/3125], train_loss:0.062349 Epoch [1/10], Iter 
[996/3125], train_loss:0.055486 Epoch [1/10], Iter [997/3125], train_loss:0.061249 Epoch [1/10], Iter [998/3125], train_loss:0.046875 Epoch [1/10], Iter [999/3125], train_loss:0.078696 Epoch [1/10], Iter [1000/3125], train_loss:0.071514 Epoch [1/10], Iter [1001/3125], train_loss:0.084848 Epoch [1/10], Iter [1002/3125], train_loss:0.051532 Epoch [1/10], Iter [1003/3125], train_loss:0.084807 Epoch [1/10], Iter [1004/3125], train_loss:0.088694 Epoch [1/10], Iter [1005/3125], train_loss:0.081654 Epoch [1/10], Iter [1006/3125], train_loss:0.067032 Epoch [1/10], Iter [1007/3125], train_loss:0.124414 Epoch [1/10], Iter [1008/3125], train_loss:0.080349 Epoch [1/10], Iter [1009/3125], train_loss:0.036862 Epoch [1/10], Iter [1010/3125], train_loss:0.076840 Epoch [1/10], Iter [1011/3125], train_loss:0.042844 Epoch [1/10], Iter [1012/3125], train_loss:0.078605 Epoch [1/10], Iter [1013/3125], train_loss:0.044502 Epoch [1/10], Iter [1014/3125], train_loss:0.080783 Epoch [1/10], Iter [1015/3125], train_loss:0.071481 Epoch [1/10], Iter [1016/3125], train_loss:0.085543 Epoch [1/10], Iter [1017/3125], train_loss:0.107438 Epoch [1/10], Iter [1018/3125], train_loss:0.076212 Epoch [1/10], Iter [1019/3125], train_loss:0.078109 Epoch [1/10], Iter [1020/3125], train_loss:0.047839 Epoch [1/10], Iter [1021/3125], train_loss:0.090297 Epoch [1/10], Iter [1022/3125], train_loss:0.060652 Epoch [1/10], Iter [1023/3125], train_loss:0.107761 Epoch [1/10], Iter [1024/3125], train_loss:0.075100 Epoch [1/10], Iter [1025/3125], train_loss:0.065084 Epoch [1/10], Iter [1026/3125], train_loss:0.086126 Epoch [1/10], Iter [1027/3125], train_loss:0.076870 Epoch [1/10], Iter [1028/3125], train_loss:0.090435 Epoch [1/10], Iter [1029/3125], train_loss:0.071291 Epoch [1/10], Iter [1030/3125], train_loss:0.072460 Epoch [1/10], Iter [1031/3125], train_loss:0.065093 Epoch [1/10], Iter [1032/3125], train_loss:0.046128 Epoch [1/10], Iter [1033/3125], train_loss:0.081843 Epoch [1/10], Iter [1034/3125], 
train_loss:0.098334 Epoch [1/10], Iter [1035/3125], train_loss:0.044121 Epoch [1/10], Iter [1036/3125], train_loss:0.067291 Epoch [1/10], Iter [1037/3125], train_loss:0.055147 Epoch [1/10], Iter [1038/3125], train_loss:0.075272 Epoch [1/10], Iter [1039/3125], train_loss:0.097143 Epoch [1/10], Iter [1040/3125], train_loss:0.083308 Epoch [1/10], Iter [1041/3125], train_loss:0.083002 Epoch [1/10], Iter [1042/3125], train_loss:0.074888 Epoch [1/10], Iter [1043/3125], train_loss:0.097697 Epoch [1/10], Iter [1044/3125], train_loss:0.049311 Epoch [1/10], Iter [1045/3125], train_loss:0.081692 Epoch [1/10], Iter [1046/3125], train_loss:0.064942 Epoch [1/10], Iter [1047/3125], train_loss:0.044580 Epoch [1/10], Iter [1048/3125], train_loss:0.085176 Epoch [1/10], Iter [1049/3125], train_loss:0.063269 Epoch [1/10], Iter [1050/3125], train_loss:0.077601 Epoch [1/10], Iter [1051/3125], train_loss:0.105948 Epoch [1/10], Iter [1052/3125], train_loss:0.059415 Epoch [1/10], Iter [1053/3125], train_loss:0.094063 Epoch [1/10], Iter [1054/3125], train_loss:0.092959 Epoch [1/10], Iter [1055/3125], train_loss:0.092067 Epoch [1/10], Iter [1056/3125], train_loss:0.067009 Epoch [1/10], Iter [1057/3125], train_loss:0.098917 Epoch [1/10], Iter [1058/3125], train_loss:0.057587 Epoch [1/10], Iter [1059/3125], train_loss:0.130291 Epoch [1/10], Iter [1060/3125], train_loss:0.067882 Epoch [1/10], Iter [1061/3125], train_loss:0.060654 Epoch [1/10], Iter [1062/3125], train_loss:0.055052 Epoch [1/10], Iter [1063/3125], train_loss:0.113558 Epoch [1/10], Iter [1064/3125], train_loss:0.092149 Epoch [1/10], Iter [1065/3125], train_loss:0.080471 Epoch [1/10], Iter [1066/3125], train_loss:0.077791 Epoch [1/10], Iter [1067/3125], train_loss:0.064857 Epoch [1/10], Iter [1068/3125], train_loss:0.061791 Epoch [1/10], Iter [1069/3125], train_loss:0.092346 Epoch [1/10], Iter [1070/3125], train_loss:0.061829 Epoch [1/10], Iter [1071/3125], train_loss:0.052066 Epoch [1/10], Iter [1072/3125], train_loss:0.060261 
Epoch [1/10], Iter [1073/3125], train_loss:0.052576 Epoch [1/10], Iter [1074/3125], train_loss:0.091335 Epoch [1/10], Iter [1075/3125], train_loss:0.085970 Epoch [1/10], Iter [1076/3125], train_loss:0.051026 Epoch [1/10], Iter [1077/3125], train_loss:0.054480 Epoch [1/10], Iter [1078/3125], train_loss:0.076401 Epoch [1/10], Iter [1079/3125], train_loss:0.067915 Epoch [1/10], Iter [1080/3125], train_loss:0.080814 Epoch [1/10], Iter [1081/3125], train_loss:0.079265 Epoch [1/10], Iter [1082/3125], train_loss:0.064177 Epoch [1/10], Iter [1083/3125], train_loss:0.070294 Epoch [1/10], Iter [1084/3125], train_loss:0.076654 Epoch [1/10], Iter [1085/3125], train_loss:0.048900 Epoch [1/10], Iter [1086/3125], train_loss:0.080051 Epoch [1/10], Iter [1087/3125], train_loss:0.062221 Epoch [1/10], Iter [1088/3125], train_loss:0.053528 Epoch [1/10], Iter [1089/3125], train_loss:0.078500 Epoch [1/10], Iter [1090/3125], train_loss:0.054167 Epoch [1/10], Iter [1091/3125], train_loss:0.060830 Epoch [1/10], Iter [1092/3125], train_loss:0.070064 Epoch [1/10], Iter [1093/3125], train_loss:0.059513 Epoch [1/10], Iter [1094/3125], train_loss:0.064300 Epoch [1/10], Iter [1095/3125], train_loss:0.064953 Epoch [1/10], Iter [1096/3125], train_loss:0.098469 Epoch [1/10], Iter [1097/3125], train_loss:0.070608 Epoch [1/10], Iter [1098/3125], train_loss:0.063558 Epoch [1/10], Iter [1099/3125], train_loss:0.047807 Epoch [1/10], Iter [1100/3125], train_loss:0.040138 Epoch [1/10], Iter [1101/3125], train_loss:0.054244 Epoch [1/10], Iter [1102/3125], train_loss:0.094688 Epoch [1/10], Iter [1103/3125], train_loss:0.040553 Epoch [1/10], Iter [1104/3125], train_loss:0.054478 Epoch [1/10], Iter [1105/3125], train_loss:0.051893 Epoch [1/10], Iter [1106/3125], train_loss:0.063331 Epoch [1/10], Iter [1107/3125], train_loss:0.092488 Epoch [1/10], Iter [1108/3125], train_loss:0.079674 Epoch [1/10], Iter [1109/3125], train_loss:0.082050 Epoch [1/10], Iter [1110/3125], train_loss:0.053623 Epoch [1/10], Iter 
[1111/3125], train_loss:0.142942 Epoch [1/10], Iter [1112/3125], train_loss:0.071629 Epoch [1/10], Iter [1113/3125], train_loss:0.070982 Epoch [1/10], Iter [1114/3125], train_loss:0.096225 Epoch [1/10], Iter [1115/3125], train_loss:0.071539 Epoch [1/10], Iter [1116/3125], train_loss:0.058115 Epoch [1/10], Iter [1117/3125], train_loss:0.069117 Epoch [1/10], Iter [1118/3125], train_loss:0.048873 Epoch [1/10], Iter [1119/3125], train_loss:0.041571 Epoch [1/10], Iter [1120/3125], train_loss:0.062927 Epoch [1/10], Iter [1121/3125], train_loss:0.060754 Epoch [1/10], Iter [1122/3125], train_loss:0.072750 Epoch [1/10], Iter [1123/3125], train_loss:0.112615 Epoch [1/10], Iter [1124/3125], train_loss:0.051256 Epoch [1/10], Iter [1125/3125], train_loss:0.086577 Epoch [1/10], Iter [1126/3125], train_loss:0.058549 Epoch [1/10], Iter [1127/3125], train_loss:0.038518 Epoch [1/10], Iter [1128/3125], train_loss:0.080108 Epoch [1/10], Iter [1129/3125], train_loss:0.088471 Epoch [1/10], Iter [1130/3125], train_loss:0.062608 Epoch [1/10], Iter [1131/3125], train_loss:0.029030 Epoch [1/10], Iter [1132/3125], train_loss:0.102873 Epoch [1/10], Iter [1133/3125], train_loss:0.044108 Epoch [1/10], Iter [1134/3125], train_loss:0.062481 Epoch [1/10], Iter [1135/3125], train_loss:0.070823 Epoch [1/10], Iter [1136/3125], train_loss:0.056807 Epoch [1/10], Iter [1137/3125], train_loss:0.086398 Epoch [1/10], Iter [1138/3125], train_loss:0.070901 Epoch [1/10], Iter [1139/3125], train_loss:0.057244 Epoch [1/10], Iter [1140/3125], train_loss:0.084820 Epoch [1/10], Iter [1141/3125], train_loss:0.060651 Epoch [1/10], Iter [1142/3125], train_loss:0.050026 Epoch [1/10], Iter [1143/3125], train_loss:0.051782 Epoch [1/10], Iter [1144/3125], train_loss:0.078317 Epoch [1/10], Iter [1145/3125], train_loss:0.101919 Epoch [1/10], Iter [1146/3125], train_loss:0.066825 Epoch [1/10], Iter [1147/3125], train_loss:0.058590 Epoch [1/10], Iter [1148/3125], train_loss:0.065694 Epoch [1/10], Iter [1149/3125], 
train_loss:0.073218 Epoch [1/10], Iter [1150/3125], train_loss:0.055545 Epoch [1/10], Iter [1151/3125], train_loss:0.091100 Epoch [1/10], Iter [1152/3125], train_loss:0.064072 Epoch [1/10], Iter [1153/3125], train_loss:0.056346 Epoch [1/10], Iter [1154/3125], train_loss:0.051450 Epoch [1/10], Iter [1155/3125], train_loss:0.092154 Epoch [1/10], Iter [1156/3125], train_loss:0.042432 Epoch [1/10], Iter [1157/3125], train_loss:0.089265 Epoch [1/10], Iter [1158/3125], train_loss:0.060625 Epoch [1/10], Iter [1159/3125], train_loss:0.099431 Epoch [1/10], Iter [1160/3125], train_loss:0.083928 Epoch [1/10], Iter [1161/3125], train_loss:0.035615 Epoch [1/10], Iter [1162/3125], train_loss:0.085633 Epoch [1/10], Iter [1163/3125], train_loss:0.072629 Epoch [1/10], Iter [1164/3125], train_loss:0.025984 Epoch [1/10], Iter [1165/3125], train_loss:0.039261 Epoch [1/10], Iter [1166/3125], train_loss:0.069321 Epoch [1/10], Iter [1167/3125], train_loss:0.069004 Epoch [1/10], Iter [1168/3125], train_loss:0.089742 Epoch [1/10], Iter [1169/3125], train_loss:0.079844 Epoch [1/10], Iter [1170/3125], train_loss:0.072411 Epoch [1/10], Iter [1171/3125], train_loss:0.067221 Epoch [1/10], Iter [1172/3125], train_loss:0.042146 Epoch [1/10], Iter [1173/3125], train_loss:0.057201 Epoch [1/10], Iter [1174/3125], train_loss:0.080315 Epoch [1/10], Iter [1175/3125], train_loss:0.071066 Epoch [1/10], Iter [1176/3125], train_loss:0.052890 Epoch [1/10], Iter [1177/3125], train_loss:0.068389 Epoch [1/10], Iter [1178/3125], train_loss:0.064046 Epoch [1/10], Iter [1179/3125], train_loss:0.077891 Epoch [1/10], Iter [1180/3125], train_loss:0.048555 Epoch [1/10], Iter [1181/3125], train_loss:0.050501 Epoch [1/10], Iter [1182/3125], train_loss:0.048259 Epoch [1/10], Iter [1183/3125], train_loss:0.062327 Epoch [1/10], Iter [1184/3125], train_loss:0.109548 Epoch [1/10], Iter [1185/3125], train_loss:0.065658 Epoch [1/10], Iter [1186/3125], train_loss:0.093734 Epoch [1/10], Iter [1187/3125], train_loss:0.063664 
Epoch [1/10], Iter [1188/3125], train_loss:0.037065 Epoch [1/10], Iter [1189/3125], train_loss:0.057139 Epoch [1/10], Iter [1190/3125], train_loss:0.036839 Epoch [1/10], Iter [1191/3125], train_loss:0.067464 Epoch [1/10], Iter [1192/3125], train_loss:0.066957 Epoch [1/10], Iter [1193/3125], train_loss:0.084686 Epoch [1/10], Iter [1194/3125], train_loss:0.052129 Epoch [1/10], Iter [1195/3125], train_loss:0.088091 Epoch [1/10], Iter [1196/3125], train_loss:0.108515 Epoch [1/10], Iter [1197/3125], train_loss:0.066917 Epoch [1/10], Iter [1198/3125], train_loss:0.081250 Epoch [1/10], Iter [1199/3125], train_loss:0.060395 Epoch [1/10], Iter [1200/3125], train_loss:0.111344 Epoch [1/10], Iter [1201/3125], train_loss:0.067042 Epoch [1/10], Iter [1202/3125], train_loss:0.056118 Epoch [1/10], Iter [1203/3125], train_loss:0.100409 Epoch [1/10], Iter [1204/3125], train_loss:0.079419 Epoch [1/10], Iter [1205/3125], train_loss:0.044308 Epoch [1/10], Iter [1206/3125], train_loss:0.053429 Epoch [1/10], Iter [1207/3125], train_loss:0.045393 Epoch [1/10], Iter [1208/3125], train_loss:0.056517 Epoch [1/10], Iter [1209/3125], train_loss:0.051357 Epoch [1/10], Iter [1210/3125], train_loss:0.074712 Epoch [1/10], Iter [1211/3125], train_loss:0.067255 Epoch [1/10], Iter [1212/3125], train_loss:0.066072 Epoch [1/10], Iter [1213/3125], train_loss:0.036946 Epoch [1/10], Iter [1214/3125], train_loss:0.074870 Epoch [1/10], Iter [1215/3125], train_loss:0.095798 Epoch [1/10], Iter [1216/3125], train_loss:0.058114 Epoch [1/10], Iter [1217/3125], train_loss:0.067285 Epoch [1/10], Iter [1218/3125], train_loss:0.076193 Epoch [1/10], Iter [1219/3125], train_loss:0.069693 Epoch [1/10], Iter [1220/3125], train_loss:0.072604 Epoch [1/10], Iter [1221/3125], train_loss:0.064588 Epoch [1/10], Iter [1222/3125], train_loss:0.070116 Epoch [1/10], Iter [1223/3125], train_loss:0.078694 Epoch [1/10], Iter [1224/3125], train_loss:0.073832 Epoch [1/10], Iter [1225/3125], train_loss:0.057916 Epoch [1/10], Iter 
[1226/3125], train_loss:0.074006 Epoch [1/10], Iter [1227/3125], train_loss:0.094362 Epoch [1/10], Iter [1228/3125], train_loss:0.052954 Epoch [1/10], Iter [1229/3125], train_loss:0.066249 Epoch [1/10], Iter [1230/3125], train_loss:0.037475 Epoch [1/10], Iter [1231/3125], train_loss:0.037161 Epoch [1/10], Iter [1232/3125], train_loss:0.080392 Epoch [1/10], Iter [1233/3125], train_loss:0.064337 Epoch [1/10], Iter [1234/3125], train_loss:0.036732 Epoch [1/10], Iter [1235/3125], train_loss:0.080269 Epoch [1/10], Iter [1236/3125], train_loss:0.073352 Epoch [1/10], Iter [1237/3125], train_loss:0.071526 Epoch [1/10], Iter [1238/3125], train_loss:0.064553 Epoch [1/10], Iter [1239/3125], train_loss:0.094893 Epoch [1/10], Iter [1240/3125], train_loss:0.061000 Epoch [1/10], Iter [1241/3125], train_loss:0.069262 Epoch [1/10], Iter [1242/3125], train_loss:0.079779 Epoch [1/10], Iter [1243/3125], train_loss:0.066429 Epoch [1/10], Iter [1244/3125], train_loss:0.046146 Epoch [1/10], Iter [1245/3125], train_loss:0.054782 Epoch [1/10], Iter [1246/3125], train_loss:0.080050 Epoch [1/10], Iter [1247/3125], train_loss:0.081471 Epoch [1/10], Iter [1248/3125], train_loss:0.065746 Epoch [1/10], Iter [1249/3125], train_loss:0.037090 Epoch [1/10], Iter [1250/3125], train_loss:0.076876 Epoch [1/10], Iter [1251/3125], train_loss:0.051030 Epoch [1/10], Iter [1252/3125], train_loss:0.042274 Epoch [1/10], Iter [1253/3125], train_loss:0.068953 Epoch [1/10], Iter [1254/3125], train_loss:0.077853 Epoch [1/10], Iter [1255/3125], train_loss:0.078600 Epoch [1/10], Iter [1256/3125], train_loss:0.029034 Epoch [1/10], Iter [1257/3125], train_loss:0.067805 Epoch [1/10], Iter [1258/3125], train_loss:0.105204 Epoch [1/10], Iter [1259/3125], train_loss:0.044573 Epoch [1/10], Iter [1260/3125], train_loss:0.098438 Epoch [1/10], Iter [1261/3125], train_loss:0.044922 Epoch [1/10], Iter [1262/3125], train_loss:0.077494 Epoch [1/10], Iter [1263/3125], train_loss:0.068515 Epoch [1/10], Iter [1264/3125], 
train_loss:0.082361 Epoch [1/10], Iter [1265/3125], train_loss:0.065620 Epoch [1/10], Iter [1266/3125], train_loss:0.061101 Epoch [1/10], Iter [1267/3125], train_loss:0.072236 Epoch [1/10], Iter [1268/3125], train_loss:0.057902 Epoch [1/10], Iter [1269/3125], train_loss:0.078264 Epoch [1/10], Iter [1270/3125], train_loss:0.053628 Epoch [1/10], Iter [1271/3125], train_loss:0.076903 Epoch [1/10], Iter [1272/3125], train_loss:0.055117 Epoch [1/10], Iter [1273/3125], train_loss:0.122055 Epoch [1/10], Iter [1274/3125], train_loss:0.041958 Epoch [1/10], Iter [1275/3125], train_loss:0.110160 Epoch [1/10], Iter [1276/3125], train_loss:0.080354 Epoch [1/10], Iter [1277/3125], train_loss:0.036007 Epoch [1/10], Iter [1278/3125], train_loss:0.051821 Epoch [1/10], Iter [1279/3125], train_loss:0.103632 Epoch [1/10], Iter [1280/3125], train_loss:0.105166 Epoch [1/10], Iter [1281/3125], train_loss:0.068429 Epoch [1/10], Iter [1282/3125], train_loss:0.072354 Epoch [1/10], Iter [1283/3125], train_loss:0.058038 Epoch [1/10], Iter [1284/3125], train_loss:0.071881 Epoch [1/10], Iter [1285/3125], train_loss:0.033587 Epoch [1/10], Iter [1286/3125], train_loss:0.041231 Epoch [1/10], Iter [1287/3125], train_loss:0.072158 Epoch [1/10], Iter [1288/3125], train_loss:0.037460 Epoch [1/10], Iter [1289/3125], train_loss:0.052904 Epoch [1/10], Iter [1290/3125], train_loss:0.051290 Epoch [1/10], Iter [1291/3125], train_loss:0.076521 Epoch [1/10], Iter [1292/3125], train_loss:0.045308 Epoch [1/10], Iter [1293/3125], train_loss:0.077797 Epoch [1/10], Iter [1294/3125], train_loss:0.050401 Epoch [1/10], Iter [1295/3125], train_loss:0.054285 Epoch [1/10], Iter [1296/3125], train_loss:0.071456 Epoch [1/10], Iter [1297/3125], train_loss:0.069530 Epoch [1/10], Iter [1298/3125], train_loss:0.063551 Epoch [1/10], Iter [1299/3125], train_loss:0.060730 Epoch [1/10], Iter [1300/3125], train_loss:0.054880 Epoch [1/10], Iter [1301/3125], train_loss:0.049532 Epoch [1/10], Iter [1302/3125], train_loss:0.069171 
Epoch [1/10], Iter [1303/3125], train_loss:0.061904 Epoch [1/10], Iter [1304/3125], train_loss:0.047012 Epoch [1/10], Iter [1305/3125], train_loss:0.045866 Epoch [1/10], Iter [1306/3125], train_loss:0.042385 Epoch [1/10], Iter [1307/3125], train_loss:0.050176 Epoch [1/10], Iter [1308/3125], train_loss:0.082048 Epoch [1/10], Iter [1309/3125], train_loss:0.042563 Epoch [1/10], Iter [1310/3125], train_loss:0.078971 Epoch [1/10], Iter [1311/3125], train_loss:0.086524 Epoch [1/10], Iter [1312/3125], train_loss:0.056474 Epoch [1/10], Iter [1313/3125], train_loss:0.037732 Epoch [1/10], Iter [1314/3125], train_loss:0.078819 Epoch [1/10], Iter [1315/3125], train_loss:0.082700 Epoch [1/10], Iter [1316/3125], train_loss:0.092105 Epoch [1/10], Iter [1317/3125], train_loss:0.059939 Epoch [1/10], Iter [1318/3125], train_loss:0.073690 Epoch [1/10], Iter [1319/3125], train_loss:0.049467 Epoch [1/10], Iter [1320/3125], train_loss:0.086146 Epoch [1/10], Iter [1321/3125], train_loss:0.061879 Epoch [1/10], Iter [1322/3125], train_loss:0.093417 Epoch [1/10], Iter [1323/3125], train_loss:0.041446 Epoch [1/10], Iter [1324/3125], train_loss:0.055495 Epoch [1/10], Iter [1325/3125], train_loss:0.061338 Epoch [1/10], Iter [1326/3125], train_loss:0.057086 Epoch [1/10], Iter [1327/3125], train_loss:0.051174 Epoch [1/10], Iter [1328/3125], train_loss:0.054015 Epoch [1/10], Iter [1329/3125], train_loss:0.061765 Epoch [1/10], Iter [1330/3125], train_loss:0.066730 Epoch [1/10], Iter [1331/3125], train_loss:0.054490 Epoch [1/10], Iter [1332/3125], train_loss:0.057822 Epoch [1/10], Iter [1333/3125], train_loss:0.063132 Epoch [1/10], Iter [1334/3125], train_loss:0.069564 Epoch [1/10], Iter [1335/3125], train_loss:0.044150 Epoch [1/10], Iter [1336/3125], train_loss:0.080780 Epoch [1/10], Iter [1337/3125], train_loss:0.058406 Epoch [1/10], Iter [1338/3125], train_loss:0.049550 Epoch [1/10], Iter [1339/3125], train_loss:0.044474 Epoch [1/10], Iter [1340/3125], train_loss:0.055215 Epoch [1/10], Iter 
[1341/3125], train_loss:0.097746 Epoch [1/10], Iter [1342/3125], train_loss:0.071166 Epoch [1/10], Iter [1343/3125], train_loss:0.050535 Epoch [1/10], Iter [1344/3125], train_loss:0.065595 Epoch [1/10], Iter [1345/3125], train_loss:0.069312 Epoch [1/10], Iter [1346/3125], train_loss:0.068984 Epoch [1/10], Iter [1347/3125], train_loss:0.114133 Epoch [1/10], Iter [1348/3125], train_loss:0.053902 Epoch [1/10], Iter [1349/3125], train_loss:0.039486 Epoch [1/10], Iter [1350/3125], train_loss:0.077412 Epoch [1/10], Iter [1351/3125], train_loss:0.105866 Epoch [1/10], Iter [1352/3125], train_loss:0.036934 Epoch [1/10], Iter [1353/3125], train_loss:0.028790 Epoch [1/10], Iter [1354/3125], train_loss:0.044115 Epoch [1/10], Iter [1355/3125], train_loss:0.050180 Epoch [1/10], Iter [1356/3125], train_loss:0.035173 Epoch [1/10], Iter [1357/3125], train_loss:0.066359 Epoch [1/10], Iter [1358/3125], train_loss:0.061649 Epoch [1/10], Iter [1359/3125], train_loss:0.090383 Epoch [1/10], Iter [1360/3125], train_loss:0.094560 Epoch [1/10], Iter [1361/3125], train_loss:0.051187 Epoch [1/10], Iter [1362/3125], train_loss:0.051535 Epoch [1/10], Iter [1363/3125], train_loss:0.086489 Epoch [1/10], Iter [1364/3125], train_loss:0.064312 Epoch [1/10], Iter [1365/3125], train_loss:0.035589 Epoch [1/10], Iter [1366/3125], train_loss:0.074556 Epoch [1/10], Iter [1367/3125], train_loss:0.095972 Epoch [1/10], Iter [1368/3125], train_loss:0.079113 Epoch [1/10], Iter [1369/3125], train_loss:0.075476 Epoch [1/10], Iter [1370/3125], train_loss:0.055053 Epoch [1/10], Iter [1371/3125], train_loss:0.036419 Epoch [1/10], Iter [1372/3125], train_loss:0.082008 Epoch [1/10], Iter [1373/3125], train_loss:0.035035 Epoch [1/10], Iter [1374/3125], train_loss:0.061965 Epoch [1/10], Iter [1375/3125], train_loss:0.090616 Epoch [1/10], Iter [1376/3125], train_loss:0.071584 Epoch [1/10], Iter [1377/3125], train_loss:0.062969 Epoch [1/10], Iter [1378/3125], train_loss:0.049597 Epoch [1/10], Iter [1379/3125], 
train_loss:0.042371 Epoch [1/10], Iter [1380/3125], train_loss:0.058470 Epoch [1/10], Iter [1381/3125], train_loss:0.089132 Epoch [1/10], Iter [1382/3125], train_loss:0.042923 Epoch [1/10], Iter [1383/3125], train_loss:0.066922 Epoch [1/10], Iter [1384/3125], train_loss:0.055818 Epoch [1/10], Iter [1385/3125], train_loss:0.077349 Epoch [1/10], Iter [1386/3125], train_loss:0.034871 Epoch [1/10], Iter [1387/3125], train_loss:0.034735 Epoch [1/10], Iter [1388/3125], train_loss:0.041610 Epoch [1/10], Iter [1389/3125], train_loss:0.078672 Epoch [1/10], Iter [1390/3125], train_loss:0.079922 Epoch [1/10], Iter [1391/3125], train_loss:0.053695 Epoch [1/10], Iter [1392/3125], train_loss:0.094359 Epoch [1/10], Iter [1393/3125], train_loss:0.066231 Epoch [1/10], Iter [1394/3125], train_loss:0.053103 Epoch [1/10], Iter [1395/3125], train_loss:0.054961 Epoch [1/10], Iter [1396/3125], train_loss:0.069908 Epoch [1/10], Iter [1397/3125], train_loss:0.036498 Epoch [1/10], Iter [1398/3125], train_loss:0.070611 Epoch [1/10], Iter [1399/3125], train_loss:0.046233 Epoch [1/10], Iter [1400/3125], train_loss:0.045637 Epoch [1/10], Iter [1401/3125], train_loss:0.026635 Epoch [1/10], Iter [1402/3125], train_loss:0.051463 Epoch [1/10], Iter [1403/3125], train_loss:0.072863 Epoch [1/10], Iter [1404/3125], train_loss:0.039532 Epoch [1/10], Iter [1405/3125], train_loss:0.094029 Epoch [1/10], Iter [1406/3125], train_loss:0.107056 Epoch [1/10], Iter [1407/3125], train_loss:0.068884 Epoch [1/10], Iter [1408/3125], train_loss:0.045376 Epoch [1/10], Iter [1409/3125], train_loss:0.035768 Epoch [1/10], Iter [1410/3125], train_loss:0.058423 Epoch [1/10], Iter [1411/3125], train_loss:0.105580 Epoch [1/10], Iter [1412/3125], train_loss:0.059442 Epoch [1/10], Iter [1413/3125], train_loss:0.056727 Epoch [1/10], Iter [1414/3125], train_loss:0.046670 Epoch [1/10], Iter [1415/3125], train_loss:0.052132 Epoch [1/10], Iter [1416/3125], train_loss:0.086853 Epoch [1/10], Iter [1417/3125], train_loss:0.053923 
Epoch [1/10], Iter [1418/3125], train_loss:0.043211 Epoch [1/10], Iter [1419/3125], train_loss:0.042907 Epoch [1/10], Iter [1420/3125], train_loss:0.044250 Epoch [1/10], Iter [1421/3125], train_loss:0.084763 Epoch [1/10], Iter [1422/3125], train_loss:0.063013 Epoch [1/10], Iter [1423/3125], train_loss:0.031712 Epoch [1/10], Iter [1424/3125], train_loss:0.066372 Epoch [1/10], Iter [1425/3125], train_loss:0.079808 Epoch [1/10], Iter [1426/3125], train_loss:0.070664 Epoch [1/10], Iter [1427/3125], train_loss:0.042726 Epoch [1/10], Iter [1428/3125], train_loss:0.047623 Epoch [1/10], Iter [1429/3125], train_loss:0.054263 Epoch [1/10], Iter [1430/3125], train_loss:0.065956 Epoch [1/10], Iter [1431/3125], train_loss:0.067826 Epoch [1/10], Iter [1432/3125], train_loss:0.049903 Epoch [1/10], Iter [1433/3125], train_loss:0.058264 Epoch [1/10], Iter [1434/3125], train_loss:0.082112 Epoch [1/10], Iter [1435/3125], train_loss:0.048372 Epoch [1/10], Iter [1436/3125], train_loss:0.089613 Epoch [1/10], Iter [1437/3125], train_loss:0.070496 Epoch [1/10], Iter [1438/3125], train_loss:0.048467 Epoch [1/10], Iter [1439/3125], train_loss:0.048719 Epoch [1/10], Iter [1440/3125], train_loss:0.051029 Epoch [1/10], Iter [1441/3125], train_loss:0.066726 Epoch [1/10], Iter [1442/3125], train_loss:0.074743 Epoch [1/10], Iter [1443/3125], train_loss:0.062530 Epoch [1/10], Iter [1444/3125], train_loss:0.031921 Epoch [1/10], Iter [1445/3125], train_loss:0.082468 Epoch [1/10], Iter [1446/3125], train_loss:0.066029 Epoch [1/10], Iter [1447/3125], train_loss:0.079104 Epoch [1/10], Iter [1448/3125], train_loss:0.050547 Epoch [1/10], Iter [1449/3125], train_loss:0.070847 Epoch [1/10], Iter [1450/3125], train_loss:0.066685 Epoch [1/10], Iter [1451/3125], train_loss:0.062502 Epoch [1/10], Iter [1452/3125], train_loss:0.039792 Epoch [1/10], Iter [1453/3125], train_loss:0.074898 Epoch [1/10], Iter [1454/3125], train_loss:0.082731 Epoch [1/10], Iter [1455/3125], train_loss:0.051062 Epoch [1/10], Iter 
[1456/3125], train_loss:0.081949 Epoch [1/10], Iter [1457/3125], train_loss:0.048781 Epoch [1/10], Iter [1458/3125], train_loss:0.031672 Epoch [1/10], Iter [1459/3125], train_loss:0.081797 Epoch [1/10], Iter [1460/3125], train_loss:0.043624 Epoch [1/10], Iter [1461/3125], train_loss:0.042655 Epoch [1/10], Iter [1462/3125], train_loss:0.065425 Epoch [1/10], Iter [1463/3125], train_loss:0.051312 Epoch [1/10], Iter [1464/3125], train_loss:0.069975 Epoch [1/10], Iter [1465/3125], train_loss:0.054417 Epoch [1/10], Iter [1466/3125], train_loss:0.068450 Epoch [1/10], Iter [1467/3125], train_loss:0.055852 Epoch [1/10], Iter [1468/3125], train_loss:0.056495 Epoch [1/10], Iter [1469/3125], train_loss:0.048216 Epoch [1/10], Iter [1470/3125], train_loss:0.116062 Epoch [1/10], Iter [1471/3125], train_loss:0.076963 Epoch [1/10], Iter [1472/3125], train_loss:0.061780 Epoch [1/10], Iter [1473/3125], train_loss:0.057824 Epoch [1/10], Iter [1474/3125], train_loss:0.051863 Epoch [1/10], Iter [1475/3125], train_loss:0.064877 Epoch [1/10], Iter [1476/3125], train_loss:0.026023 Epoch [1/10], Iter [1477/3125], train_loss:0.071512 Epoch [1/10], Iter [1478/3125], train_loss:0.046893 Epoch [1/10], Iter [1479/3125], train_loss:0.086675 Epoch [1/10], Iter [1480/3125], train_loss:0.056367 Epoch [1/10], Iter [1481/3125], train_loss:0.086944 Epoch [1/10], Iter [1482/3125], train_loss:0.059426 Epoch [1/10], Iter [1483/3125], train_loss:0.062180 Epoch [1/10], Iter [1484/3125], train_loss:0.036093 Epoch [1/10], Iter [1485/3125], train_loss:0.053832 Epoch [1/10], Iter [1486/3125], train_loss:0.059764 Epoch [1/10], Iter [1487/3125], train_loss:0.069709 Epoch [1/10], Iter [1488/3125], train_loss:0.058866 Epoch [1/10], Iter [1489/3125], train_loss:0.042857 Epoch [1/10], Iter [1490/3125], train_loss:0.051318 Epoch [1/10], Iter [1491/3125], train_loss:0.046036 Epoch [1/10], Iter [1492/3125], train_loss:0.067652 Epoch [1/10], Iter [1493/3125], train_loss:0.068058 Epoch [1/10], Iter [1494/3125], 
train_loss:0.058382 Epoch [1/10], Iter [1495/3125], train_loss:0.071653 Epoch [1/10], Iter [1496/3125], train_loss:0.030701 Epoch [1/10], Iter [1497/3125], train_loss:0.085657 Epoch [1/10], Iter [1498/3125], train_loss:0.051193 Epoch [1/10], Iter [1499/3125], train_loss:0.047368 Epoch [1/10], Iter [1500/3125], train_loss:0.056843 Epoch [1/10], Iter [1501/3125], train_loss:0.077672 Epoch [1/10], Iter [1502/3125], train_loss:0.046002 Epoch [1/10], Iter [1503/3125], train_loss:0.050379 Epoch [1/10], Iter [1504/3125], train_loss:0.067272 Epoch [1/10], Iter [1505/3125], train_loss:0.039557 Epoch [1/10], Iter [1506/3125], train_loss:0.072687 Epoch [1/10], Iter [1507/3125], train_loss:0.049326 Epoch [1/10], Iter [1508/3125], train_loss:0.072209 Epoch [1/10], Iter [1509/3125], train_loss:0.092582 Epoch [1/10], Iter [1510/3125], train_loss:0.049500 Epoch [1/10], Iter [1511/3125], train_loss:0.037127 Epoch [1/10], Iter [1512/3125], train_loss:0.062338 Epoch [1/10], Iter [1513/3125], train_loss:0.047520 Epoch [1/10], Iter [1514/3125], train_loss:0.069938 Epoch [1/10], Iter [1515/3125], train_loss:0.058069 Epoch [1/10], Iter [1516/3125], train_loss:0.070114 Epoch [1/10], Iter [1517/3125], train_loss:0.071238 Epoch [1/10], Iter [1518/3125], train_loss:0.036374 Epoch [1/10], Iter [1519/3125], train_loss:0.067921 Epoch [1/10], Iter [1520/3125], train_loss:0.103123 Epoch [1/10], Iter [1521/3125], train_loss:0.084642 Epoch [1/10], Iter [1522/3125], train_loss:0.052527 Epoch [1/10], Iter [1523/3125], train_loss:0.060209 Epoch [1/10], Iter [1524/3125], train_loss:0.078986 Epoch [1/10], Iter [1525/3125], train_loss:0.055619 Epoch [1/10], Iter [1526/3125], train_loss:0.035694 Epoch [1/10], Iter [1527/3125], train_loss:0.067099 Epoch [1/10], Iter [1528/3125], train_loss:0.058410 Epoch [1/10], Iter [1529/3125], train_loss:0.073605 Epoch [1/10], Iter [1530/3125], train_loss:0.048546 Epoch [1/10], Iter [1531/3125], train_loss:0.059657 Epoch [1/10], Iter [1532/3125], train_loss:0.064168 
Epoch [1/10], Iter [1533/3125], train_loss:0.037178 Epoch [1/10], Iter [1534/3125], train_loss:0.053720 Epoch [1/10], Iter [1535/3125], train_loss:0.076513 Epoch [1/10], Iter [1536/3125], train_loss:0.058834 Epoch [1/10], Iter [1537/3125], train_loss:0.071573 Epoch [1/10], Iter [1538/3125], train_loss:0.060269 Epoch [1/10], Iter [1539/3125], train_loss:0.052749 Epoch [1/10], Iter [1540/3125], train_loss:0.037708 Epoch [1/10], Iter [1541/3125], train_loss:0.066439 Epoch [1/10], Iter [1542/3125], train_loss:0.090691 Epoch [1/10], Iter [1543/3125], train_loss:0.056245 Epoch [1/10], Iter [1544/3125], train_loss:0.055924 Epoch [1/10], Iter [1545/3125], train_loss:0.041803 Epoch [1/10], Iter [1546/3125], train_loss:0.048068 Epoch [1/10], Iter [1547/3125], train_loss:0.036092 Epoch [1/10], Iter [1548/3125], train_loss:0.043875 Epoch [1/10], Iter [1549/3125], train_loss:0.079322 Epoch [1/10], Iter [1550/3125], train_loss:0.039852 Epoch [1/10], Iter [1551/3125], train_loss:0.103905 Epoch [1/10], Iter [1552/3125], train_loss:0.091744 Epoch [1/10], Iter [1553/3125], train_loss:0.055681 Epoch [1/10], Iter [1554/3125], train_loss:0.092191 Epoch [1/10], Iter [1555/3125], train_loss:0.062235 Epoch [1/10], Iter [1556/3125], train_loss:0.057970 Epoch [1/10], Iter [1557/3125], train_loss:0.067547 Epoch [1/10], Iter [1558/3125], train_loss:0.055146 Epoch [1/10], Iter [1559/3125], train_loss:0.054776 Epoch [1/10], Iter [1560/3125], train_loss:0.027517 Epoch [1/10], Iter [1561/3125], train_loss:0.072663 Epoch [1/10], Iter [1562/3125], train_loss:0.058465 Epoch [1/10], Iter [1563/3125], train_loss:0.046655 Epoch [1/10], Iter [1564/3125], train_loss:0.119325 Epoch [1/10], Iter [1565/3125], train_loss:0.054731 Epoch [1/10], Iter [1566/3125], train_loss:0.081642 Epoch [1/10], Iter [1567/3125], train_loss:0.048881 Epoch [1/10], Iter [1568/3125], train_loss:0.058173 Epoch [1/10], Iter [1569/3125], train_loss:0.069358 Epoch [1/10], Iter [1570/3125], train_loss:0.061475 Epoch [1/10], Iter 
[1571/3125], train_loss:0.065325 Epoch [1/10], Iter [1572/3125], train_loss:0.070670 Epoch [1/10], Iter [1573/3125], train_loss:0.081902 Epoch [1/10], Iter [1574/3125], train_loss:0.049094 Epoch [1/10], Iter [1575/3125], train_loss:0.056214 Epoch [1/10], Iter [1576/3125], train_loss:0.069279 Epoch [1/10], Iter [1577/3125], train_loss:0.056715 Epoch [1/10], Iter [1578/3125], train_loss:0.099390 Epoch [1/10], Iter [1579/3125], train_loss:0.051443 Epoch [1/10], Iter [1580/3125], train_loss:0.066337 Epoch [1/10], Iter [1581/3125], train_loss:0.032681 Epoch [1/10], Iter [1582/3125], train_loss:0.036135 Epoch [1/10], Iter [1583/3125], train_loss:0.133781 Epoch [1/10], Iter [1584/3125], train_loss:0.039585 Epoch [1/10], Iter [1585/3125], train_loss:0.040581 Epoch [1/10], Iter [1586/3125], train_loss:0.045098 Epoch [1/10], Iter [1587/3125], train_loss:0.079372 Epoch [1/10], Iter [1588/3125], train_loss:0.083663 Epoch [1/10], Iter [1589/3125], train_loss:0.057084 Epoch [1/10], Iter [1590/3125], train_loss:0.070563 Epoch [1/10], Iter [1591/3125], train_loss:0.065010 Epoch [1/10], Iter [1592/3125], train_loss:0.047786 Epoch [1/10], Iter [1593/3125], train_loss:0.060590 Epoch [1/10], Iter [1594/3125], train_loss:0.081765 Epoch [1/10], Iter [1595/3125], train_loss:0.056855 Epoch [1/10], Iter [1596/3125], train_loss:0.039855 Epoch [1/10], Iter [1597/3125], train_loss:0.046420 Epoch [1/10], Iter [1598/3125], train_loss:0.043999 Epoch [1/10], Iter [1599/3125], train_loss:0.046221 Epoch [1/10], Iter [1600/3125], train_loss:0.064322 Epoch [1/10], Iter [1601/3125], train_loss:0.026215 Epoch [1/10], Iter [1602/3125], train_loss:0.035398 Epoch [1/10], Iter [1603/3125], train_loss:0.082975 Epoch [1/10], Iter [1604/3125], train_loss:0.069643 Epoch [1/10], Iter [1605/3125], train_loss:0.074299 Epoch [1/10], Iter [1606/3125], train_loss:0.036288 Epoch [1/10], Iter [1607/3125], train_loss:0.089655 Epoch [1/10], Iter [1608/3125], train_loss:0.052850 Epoch [1/10], Iter [1609/3125], 
train_loss:0.103227 Epoch [1/10], Iter [1610/3125], train_loss:0.021318 Epoch [1/10], Iter [1611/3125], train_loss:0.053062 Epoch [1/10], Iter [1612/3125], train_loss:0.064742 Epoch [1/10], Iter [1613/3125], train_loss:0.041883 Epoch [1/10], Iter [1614/3125], train_loss:0.046411 Epoch [1/10], Iter [1615/3125], train_loss:0.058942 Epoch [1/10], Iter [1616/3125], train_loss:0.044977 Epoch [1/10], Iter [1617/3125], train_loss:0.041410 Epoch [1/10], Iter [1618/3125], train_loss:0.084004 Epoch [1/10], Iter [1619/3125], train_loss:0.064973 Epoch [1/10], Iter [1620/3125], train_loss:0.083455 Epoch [1/10], Iter [1621/3125], train_loss:0.061671 Epoch [1/10], Iter [1622/3125], train_loss:0.040480 Epoch [1/10], Iter [1623/3125], train_loss:0.058023 Epoch [1/10], Iter [1624/3125], train_loss:0.059297 Epoch [1/10], Iter [1625/3125], train_loss:0.056020 Epoch [1/10], Iter [1626/3125], train_loss:0.070588 Epoch [1/10], Iter [1627/3125], train_loss:0.057357 Epoch [1/10], Iter [1628/3125], train_loss:0.056434 Epoch [1/10], Iter [1629/3125], train_loss:0.063109 Epoch [1/10], Iter [1630/3125], train_loss:0.088339 Epoch [1/10], Iter [1631/3125], train_loss:0.098464 Epoch [1/10], Iter [1632/3125], train_loss:0.085437 Epoch [1/10], Iter [1633/3125], train_loss:0.056909 Epoch [1/10], Iter [1634/3125], train_loss:0.044746 Epoch [1/10], Iter [1635/3125], train_loss:0.058112 Epoch [1/10], Iter [1636/3125], train_loss:0.051674 Epoch [1/10], Iter [1637/3125], train_loss:0.073020 Epoch [1/10], Iter [1638/3125], train_loss:0.054744 Epoch [1/10], Iter [1639/3125], train_loss:0.020978 Epoch [1/10], Iter [1640/3125], train_loss:0.040359 Epoch [1/10], Iter [1641/3125], train_loss:0.078304 Epoch [1/10], Iter [1642/3125], train_loss:0.042950 Epoch [1/10], Iter [1643/3125], train_loss:0.035843 Epoch [1/10], Iter [1644/3125], train_loss:0.075233 Epoch [1/10], Iter [1645/3125], train_loss:0.057683 Epoch [1/10], Iter [1646/3125], train_loss:0.058583 Epoch [1/10], Iter [1647/3125], train_loss:0.054886 
Epoch [1/10], Iter [1648/3125], train_loss:0.074777
Epoch [1/10], Iter [1649/3125], train_loss:0.035126
Epoch [1/10], Iter [1650/3125], train_loss:0.030282
Epoch [1/10], Iter [1651/3125], train_loss:0.065689
Epoch [1/10], Iter [1652/3125], train_loss:0.038346
Epoch [1/10], Iter [1653/3125], train_loss:0.077780
Epoch [1/10], Iter [1654/3125], train_loss:0.057102
Epoch [1/10], Iter [1655/3125], train_loss:0.054383
Epoch [1/10], Iter [1656/3125], train_loss:0.033800
Epoch [1/10], Iter [1657/3125], train_loss:0.047648
Epoch [1/10], Iter [1658/3125], train_loss:0.040589
Epoch [1/10], Iter [1659/3125], train_loss:0.057799
Epoch [1/10], Iter [1660/3125], train_loss:0.060077
Epoch [1/10], Iter [1661/3125], train_loss:0.045393
Epoch [1/10], Iter [1662/3125], train_loss:0.051922
Epoch [1/10], Iter [1663/3125], train_loss:0.122704
Epoch [1/10], Iter [1664/3125], train_loss:0.048353
Epoch [1/10], Iter [1665/3125], train_loss:0.021179
Epoch [1/10], Iter [1666/3125], train_loss:0.076526
Epoch [1/10], Iter [1667/3125], train_loss:0.079436
Epoch [1/10], Iter [1668/3125], train_loss:0.039214
Epoch [1/10], Iter [1669/3125], train_loss:0.042830
Epoch [1/10], Iter [1670/3125], train_loss:0.042728
Epoch [1/10], Iter [1671/3125], train_loss:0.048967
Epoch [1/10], Iter [1672/3125], train_loss:0.054698
Epoch [1/10], Iter [1673/3125], train_loss:0.041978
Epoch [1/10], Iter [1674/3125], train_loss:0.073049
Epoch [1/10], Iter [1675/3125], train_loss:0.037080
Epoch [1/10], Iter [1676/3125], train_loss:0.027289
Epoch [1/10], Iter [1677/3125], train_loss:0.060551
Epoch [1/10], Iter [1678/3125], train_loss:0.045196
Epoch [1/10], Iter [1679/3125], train_loss:0.080010
Epoch [1/10], Iter [1680/3125], train_loss:0.053764
Epoch [1/10], Iter [1681/3125], train_loss:0.073596
Epoch [1/10], Iter [1682/3125], train_loss:0.070110
Epoch [1/10], Iter [1683/3125], train_loss:0.047264
Epoch [1/10], Iter [1684/3125], train_loss:0.061473
Epoch [1/10], Iter [1685/3125], train_loss:0.041371
Epoch [1/10], Iter [1686/3125], train_loss:0.049107
Epoch [1/10], Iter [1687/3125], train_loss:0.051743
Epoch [1/10], Iter [1688/3125], train_loss:0.109640
Epoch [1/10], Iter [1689/3125], train_loss:0.048228
Epoch [1/10], Iter [1690/3125], train_loss:0.050521
Epoch [1/10], Iter [1691/3125], train_loss:0.079257
Epoch [1/10], Iter [1692/3125], train_loss:0.042919
Epoch [1/10], Iter [1693/3125], train_loss:0.058962
Epoch [1/10], Iter [1694/3125], train_loss:0.072977
Epoch [1/10], Iter [1695/3125], train_loss:0.029940
Epoch [1/10], Iter [1696/3125], train_loss:0.072861
Epoch [1/10], Iter [1697/3125], train_loss:0.075670
Epoch [1/10], Iter [1698/3125], train_loss:0.065588
Epoch [1/10], Iter [1699/3125], train_loss:0.067763
Epoch [1/10], Iter [1700/3125], train_loss:0.037320
Epoch [1/10], Iter [1701/3125], train_loss:0.084554
Epoch [1/10], Iter [1702/3125], train_loss:0.046403
Epoch [1/10], Iter [1703/3125], train_loss:0.040859
Epoch [1/10], Iter [1704/3125], train_loss:0.058458
Epoch [1/10], Iter [1705/3125], train_loss:0.066891
Epoch [1/10], Iter [1706/3125], train_loss:0.100955
Epoch [1/10], Iter [1707/3125], train_loss:0.062376
Epoch [1/10], Iter [1708/3125], train_loss:0.068730
Epoch [1/10], Iter [1709/3125], train_loss:0.038045
Epoch [1/10], Iter [1710/3125], train_loss:0.060304
Epoch [1/10], Iter [1711/3125], train_loss:0.046575
Epoch [1/10], Iter [1712/3125], train_loss:0.048462
Epoch [1/10], Iter [1713/3125], train_loss:0.072498
Epoch [1/10], Iter [1714/3125], train_loss:0.052895
Epoch [1/10], Iter [1715/3125], train_loss:0.065395
Epoch [1/10], Iter [1716/3125], train_loss:0.076119
Epoch [1/10], Iter [1717/3125], train_loss:0.084909
Epoch [1/10], Iter [1718/3125], train_loss:0.058882
Epoch [1/10], Iter [1719/3125], train_loss:0.064582
Epoch [1/10], Iter [1720/3125], train_loss:0.056367
Epoch [1/10], Iter [1721/3125], train_loss:0.059624
Epoch [1/10], Iter [1722/3125], train_loss:0.058548
Epoch [1/10], Iter [1723/3125], train_loss:0.071492
Epoch [1/10], Iter [1724/3125], train_loss:0.087462
Epoch [1/10], Iter [1725/3125], train_loss:0.038312
Epoch [1/10], Iter [1726/3125], train_loss:0.039811
Epoch [1/10], Iter [1727/3125], train_loss:0.047398
Epoch [1/10], Iter [1728/3125], train_loss:0.054377
Epoch [1/10], Iter [1729/3125], train_loss:0.061826
Epoch [1/10], Iter [1730/3125], train_loss:0.051879
Epoch [1/10], Iter [1731/3125], train_loss:0.105766
Epoch [1/10], Iter [1732/3125], train_loss:0.058592
Epoch [1/10], Iter [1733/3125], train_loss:0.058135
Epoch [1/10], Iter [1734/3125], train_loss:0.077106
Epoch [1/10], Iter [1735/3125], train_loss:0.053300
Epoch [1/10], Iter [1736/3125], train_loss:0.099648
Epoch [1/10], Iter [1737/3125], train_loss:0.038420
Epoch [1/10], Iter [1738/3125], train_loss:0.074359
Epoch [1/10], Iter [1739/3125], train_loss:0.075496
Epoch [1/10], Iter [1740/3125], train_loss:0.026707
Epoch [1/10], Iter [1741/3125], train_loss:0.051810
Epoch [1/10], Iter [1742/3125], train_loss:0.061063
Epoch [1/10], Iter [1743/3125], train_loss:0.070292
Epoch [1/10], Iter [1744/3125], train_loss:0.042350
Epoch [1/10], Iter [1745/3125], train_loss:0.059614
Epoch [1/10], Iter [1746/3125], train_loss:0.025684
Epoch [1/10], Iter [1747/3125], train_loss:0.044094
Epoch [1/10], Iter [1748/3125], train_loss:0.039633
Epoch [1/10], Iter [1749/3125], train_loss:0.061609
Epoch [1/10], Iter [1750/3125], train_loss:0.059462
Epoch [1/10], Iter [1751/3125], train_loss:0.085215
Epoch [1/10], Iter [1752/3125], train_loss:0.061459
Epoch [1/10], Iter [1753/3125], train_loss:0.051309
Epoch [1/10], Iter [1754/3125], train_loss:0.055947
Epoch [1/10], Iter [1755/3125], train_loss:0.082786
Epoch [1/10], Iter [1756/3125], train_loss:0.097624
Epoch [1/10], Iter [1757/3125], train_loss:0.061017
Epoch [1/10], Iter [1758/3125], train_loss:0.070072
Epoch [1/10], Iter [1759/3125], train_loss:0.075882
Epoch [1/10], Iter [1760/3125], train_loss:0.039222
Epoch [1/10], Iter [1761/3125], train_loss:0.071271
Epoch [1/10], Iter [1762/3125], train_loss:0.043728
Epoch [1/10], Iter [1763/3125], train_loss:0.060507
Epoch [1/10], Iter [1764/3125], train_loss:0.072506
Epoch [1/10], Iter [1765/3125], train_loss:0.056758
Epoch [1/10], Iter [1766/3125], train_loss:0.043773
Epoch [1/10], Iter [1767/3125], train_loss:0.053143
Epoch [1/10], Iter [1768/3125], train_loss:0.092098
Epoch [1/10], Iter [1769/3125], train_loss:0.027869
Epoch [1/10], Iter [1770/3125], train_loss:0.057473
Epoch [1/10], Iter [1771/3125], train_loss:0.060365
Epoch [1/10], Iter [1772/3125], train_loss:0.040789
Epoch [1/10], Iter [1773/3125], train_loss:0.064049
Epoch [1/10], Iter [1774/3125], train_loss:0.063056
Epoch [1/10], Iter [1775/3125], train_loss:0.051557
Epoch [1/10], Iter [1776/3125], train_loss:0.054645
Epoch [1/10], Iter [1777/3125], train_loss:0.039127
Epoch [1/10], Iter [1778/3125], train_loss:0.024407
Epoch [1/10], Iter [1779/3125], train_loss:0.052543
Epoch [1/10], Iter [1780/3125], train_loss:0.046873
Epoch [1/10], Iter [1781/3125], train_loss:0.041262
Epoch [1/10], Iter [1782/3125], train_loss:0.080122
Epoch [1/10], Iter [1783/3125], train_loss:0.050520
Epoch [1/10], Iter [1784/3125], train_loss:0.055967
Epoch [1/10], Iter [1785/3125], train_loss:0.035253
Epoch [1/10], Iter [1786/3125], train_loss:0.079063
Epoch [1/10], Iter [1787/3125], train_loss:0.074867
Epoch [1/10], Iter [1788/3125], train_loss:0.055334
Epoch [1/10], Iter [1789/3125], train_loss:0.057995
Epoch [1/10], Iter [1790/3125], train_loss:0.040717
Epoch [1/10], Iter [1791/3125], train_loss:0.077024
Epoch [1/10], Iter [1792/3125], train_loss:0.050221
Epoch [1/10], Iter [1793/3125], train_loss:0.094391
Epoch [1/10], Iter [1794/3125], train_loss:0.074695
Epoch [1/10], Iter [1795/3125], train_loss:0.058015
Epoch [1/10], Iter [1796/3125], train_loss:0.047358
Epoch [1/10], Iter [1797/3125], train_loss:0.065972
Epoch [1/10], Iter [1798/3125], train_loss:0.045176
Epoch [1/10], Iter [1799/3125], train_loss:0.038734
Epoch [1/10], Iter [1800/3125], train_loss:0.066014
Epoch [1/10], Iter [1801/3125], train_loss:0.046584
Epoch [1/10], Iter [1802/3125], train_loss:0.057352
Epoch [1/10], Iter [1803/3125], train_loss:0.036245
Epoch [1/10], Iter [1804/3125], train_loss:0.040863
Epoch [1/10], Iter [1805/3125], train_loss:0.120763
Epoch [1/10], Iter [1806/3125], train_loss:0.031612
Epoch [1/10], Iter [1807/3125], train_loss:0.073508
Epoch [1/10], Iter [1808/3125], train_loss:0.059417
Epoch [1/10], Iter [1809/3125], train_loss:0.072521
Epoch [1/10], Iter [1810/3125], train_loss:0.063052
Epoch [1/10], Iter [1811/3125], train_loss:0.059529
Epoch [1/10], Iter [1812/3125], train_loss:0.046363
Epoch [1/10], Iter [1813/3125], train_loss:0.073090
Epoch [1/10], Iter [1814/3125], train_loss:0.034225
Epoch [1/10], Iter [1815/3125], train_loss:0.085764
Epoch [1/10], Iter [1816/3125], train_loss:0.046848
Epoch [1/10], Iter [1817/3125], train_loss:0.059717
Epoch [1/10], Iter [1818/3125], train_loss:0.047675
Epoch [1/10], Iter [1819/3125], train_loss:0.084691
Epoch [1/10], Iter [1820/3125], train_loss:0.079962
Epoch [1/10], Iter [1821/3125], train_loss:0.089780
Epoch [1/10], Iter [1822/3125], train_loss:0.060596
Epoch [1/10], Iter [1823/3125], train_loss:0.049416
Epoch [1/10], Iter [1824/3125], train_loss:0.091829
Epoch [1/10], Iter [1825/3125], train_loss:0.086237
Epoch [1/10], Iter [1826/3125], train_loss:0.051125
Epoch [1/10], Iter [1827/3125], train_loss:0.097379
Epoch [1/10], Iter [1828/3125], train_loss:0.102906
Epoch [1/10], Iter [1829/3125], train_loss:0.080723
Epoch [1/10], Iter [1830/3125], train_loss:0.040206
Epoch [1/10], Iter [1831/3125], train_loss:0.059156
Epoch [1/10], Iter [1832/3125], train_loss:0.043076
Epoch [1/10], Iter [1833/3125], train_loss:0.029663
Epoch [1/10], Iter [1834/3125], train_loss:0.051820
Epoch [1/10], Iter [1835/3125], train_loss:0.068084
Epoch [1/10], Iter [1836/3125], train_loss:0.036504
Epoch [1/10], Iter [1837/3125], train_loss:0.048193
Epoch [1/10], Iter [1838/3125], train_loss:0.053339
Epoch [1/10], Iter [1839/3125], train_loss:0.051840
Epoch [1/10], Iter [1840/3125], train_loss:0.019614
Epoch [1/10], Iter [1841/3125], train_loss:0.055469
Epoch [1/10], Iter [1842/3125], train_loss:0.069309
Epoch [1/10], Iter [1843/3125], train_loss:0.077044
Epoch [1/10], Iter [1844/3125], train_loss:0.091119
Epoch [1/10], Iter [1845/3125], train_loss:0.056013
Epoch [1/10], Iter [1846/3125], train_loss:0.052507
Epoch [1/10], Iter [1847/3125], train_loss:0.079659
Epoch [1/10], Iter [1848/3125], train_loss:0.053403
Epoch [1/10], Iter [1849/3125], train_loss:0.077848
Epoch [1/10], Iter [1850/3125], train_loss:0.051112
Epoch [1/10], Iter [1851/3125], train_loss:0.046792
Epoch [1/10], Iter [1852/3125], train_loss:0.041306
Epoch [1/10], Iter [1853/3125], train_loss:0.043293
Epoch [1/10], Iter [1854/3125], train_loss:0.051519
Epoch [1/10], Iter [1855/3125], train_loss:0.055836
Epoch [1/10], Iter [1856/3125], train_loss:0.047736
Epoch [1/10], Iter [1857/3125], train_loss:0.069006
Epoch [1/10], Iter [1858/3125], train_loss:0.046833
Epoch [1/10], Iter [1859/3125], train_loss:0.112520
Epoch [1/10], Iter [1860/3125], train_loss:0.049536
Epoch [1/10], Iter [1861/3125], train_loss:0.054126
Epoch [1/10], Iter [1862/3125], train_loss:0.079082
Epoch [1/10], Iter [1863/3125], train_loss:0.046699
Epoch [1/10], Iter [1864/3125], train_loss:0.042452
Epoch [1/10], Iter [1865/3125], train_loss:0.050977
Epoch [1/10], Iter [1866/3125], train_loss:0.037490
Epoch [1/10], Iter [1867/3125], train_loss:0.044270
Epoch [1/10], Iter [1868/3125], train_loss:0.022775
Epoch [1/10], Iter [1869/3125], train_loss:0.048254
Epoch [1/10], Iter [1870/3125], train_loss:0.047147
Epoch [1/10], Iter [1871/3125], train_loss:0.064558
Epoch [1/10], Iter [1872/3125], train_loss:0.033295
Epoch [1/10], Iter [1873/3125], train_loss:0.037831
Epoch [1/10], Iter [1874/3125], train_loss:0.035450
Epoch [1/10], Iter [1875/3125], train_loss:0.120475
Epoch [1/10], Iter [1876/3125], train_loss:0.065689
Epoch [1/10], Iter [1877/3125], train_loss:0.051821
Epoch [1/10], Iter [1878/3125], train_loss:0.030954
Epoch [1/10], Iter [1879/3125], train_loss:0.055886
Epoch [1/10], Iter [1880/3125], train_loss:0.046567
Epoch [1/10], Iter [1881/3125], train_loss:0.054960
Epoch [1/10], Iter [1882/3125], train_loss:0.060007
Epoch [1/10], Iter [1883/3125], train_loss:0.042093
Epoch [1/10], Iter [1884/3125], train_loss:0.042883
Epoch [1/10], Iter [1885/3125], train_loss:0.072663
Epoch [1/10], Iter [1886/3125], train_loss:0.047739
Epoch [1/10], Iter [1887/3125], train_loss:0.072337
Epoch [1/10], Iter [1888/3125], train_loss:0.032112
Epoch [1/10], Iter [1889/3125], train_loss:0.063742
Epoch [1/10], Iter [1890/3125], train_loss:0.126797
Epoch [1/10], Iter [1891/3125], train_loss:0.060045
Epoch [1/10], Iter [1892/3125], train_loss:0.050613
Epoch [1/10], Iter [1893/3125], train_loss:0.018665
Epoch [1/10], Iter [1894/3125], train_loss:0.118631
Epoch [1/10], Iter [1895/3125], train_loss:0.072257
Epoch [1/10], Iter [1896/3125], train_loss:0.048342
Epoch [1/10], Iter [1897/3125], train_loss:0.053053
Epoch [1/10], Iter [1898/3125], train_loss:0.046766
Epoch [1/10], Iter [1899/3125], train_loss:0.041298
Epoch [1/10], Iter [1900/3125], train_loss:0.039161
Epoch [1/10], Iter [1901/3125], train_loss:0.052756
Epoch [1/10], Iter [1902/3125], train_loss:0.088474
Epoch [1/10], Iter [1903/3125], train_loss:0.054476
Epoch [1/10], Iter [1904/3125], train_loss:0.074824
Epoch [1/10], Iter [1905/3125], train_loss:0.038476
Epoch [1/10], Iter [1906/3125], train_loss:0.034390
Epoch [1/10], Iter [1907/3125], train_loss:0.031541
Epoch [1/10], Iter [1908/3125], train_loss:0.042509
Epoch [1/10], Iter [1909/3125], train_loss:0.048603
Epoch [1/10], Iter [1910/3125], train_loss:0.033619
Epoch [1/10], Iter [1911/3125], train_loss:0.088345
Epoch [1/10], Iter [1912/3125], train_loss:0.073088
Epoch [1/10], Iter [1913/3125], train_loss:0.053431
Epoch [1/10], Iter [1914/3125], train_loss:0.074593
Epoch [1/10], Iter [1915/3125], train_loss:0.067950
Epoch [1/10], Iter [1916/3125], train_loss:0.036191
Epoch [1/10], Iter [1917/3125], train_loss:0.057052
Epoch [1/10], Iter [1918/3125], train_loss:0.062682
Epoch [1/10], Iter [1919/3125], train_loss:0.073875
Epoch [1/10], Iter [1920/3125], train_loss:0.059812
Epoch [1/10], Iter [1921/3125], train_loss:0.049579
Epoch [1/10], Iter [1922/3125], train_loss:0.111791
Epoch [1/10], Iter [1923/3125], train_loss:0.076176
Epoch [1/10], Iter [1924/3125], train_loss:0.049307
Epoch [1/10], Iter [1925/3125], train_loss:0.037029
Epoch [1/10], Iter [1926/3125], train_loss:0.078327
Epoch [1/10], Iter [1927/3125], train_loss:0.073983
Epoch [1/10], Iter [1928/3125], train_loss:0.071034
Epoch [1/10], Iter [1929/3125], train_loss:0.072575
Epoch [1/10], Iter [1930/3125], train_loss:0.035677
Epoch [1/10], Iter [1931/3125], train_loss:0.078652
Epoch [1/10], Iter [1932/3125], train_loss:0.050624
Epoch [1/10], Iter [1933/3125], train_loss:0.061268
Epoch [1/10], Iter [1934/3125], train_loss:0.030012
Epoch [1/10], Iter [1935/3125], train_loss:0.064447
Epoch [1/10], Iter [1936/3125], train_loss:0.067326
Epoch [1/10], Iter [1937/3125], train_loss:0.047509
Epoch [1/10], Iter [1938/3125], train_loss:0.080461
Epoch [1/10], Iter [1939/3125], train_loss:0.065088
Epoch [1/10], Iter [1940/3125], train_loss:0.045047
Epoch [1/10], Iter [1941/3125], train_loss:0.048151
Epoch [1/10], Iter [1942/3125], train_loss:0.041551
Epoch [1/10], Iter [1943/3125], train_loss:0.062923
Epoch [1/10], Iter [1944/3125], train_loss:0.047921
Epoch [1/10], Iter [1945/3125], train_loss:0.055047
Epoch [1/10], Iter [1946/3125], train_loss:0.047319
Epoch [1/10], Iter [1947/3125], train_loss:0.079555
Epoch [1/10], Iter [1948/3125], train_loss:0.060398
Epoch [1/10], Iter [1949/3125], train_loss:0.024709
Epoch [1/10], Iter [1950/3125], train_loss:0.057181
Epoch [1/10], Iter [1951/3125], train_loss:0.073039
Epoch [1/10], Iter [1952/3125], train_loss:0.080788
Epoch [1/10], Iter [1953/3125], train_loss:0.027360
Epoch [1/10], Iter [1954/3125], train_loss:0.099107
Epoch [1/10], Iter [1955/3125], train_loss:0.039013
Epoch [1/10], Iter [1956/3125], train_loss:0.085083
Epoch [1/10], Iter [1957/3125], train_loss:0.061486
Epoch [1/10], Iter [1958/3125], train_loss:0.054446
Epoch [1/10], Iter [1959/3125], train_loss:0.069039
Epoch [1/10], Iter [1960/3125], train_loss:0.040418
Epoch [1/10], Iter [1961/3125], train_loss:0.073553
Epoch [1/10], Iter [1962/3125], train_loss:0.045772
Epoch [1/10], Iter [1963/3125], train_loss:0.060261
Epoch [1/10], Iter [1964/3125], train_loss:0.065421
Epoch [1/10], Iter [1965/3125], train_loss:0.076194
Epoch [1/10], Iter [1966/3125], train_loss:0.064436
Epoch [1/10], Iter [1967/3125], train_loss:0.076793
Epoch [1/10], Iter [1968/3125], train_loss:0.055979
Epoch [1/10], Iter [1969/3125], train_loss:0.029151
Epoch [1/10], Iter [1970/3125], train_loss:0.038949
Epoch [1/10], Iter [1971/3125], train_loss:0.041652
Epoch [1/10], Iter [1972/3125], train_loss:0.057385
Epoch [1/10], Iter [1973/3125], train_loss:0.063295
Epoch [1/10], Iter [1974/3125], train_loss:0.065931
Epoch [1/10], Iter [1975/3125], train_loss:0.063027
Epoch [1/10], Iter [1976/3125], train_loss:0.069438
Epoch [1/10], Iter [1977/3125], train_loss:0.043597
Epoch [1/10], Iter [1978/3125], train_loss:0.077617
Epoch [1/10], Iter [1979/3125], train_loss:0.075510
Epoch [1/10], Iter [1980/3125], train_loss:0.064318
Epoch [1/10], Iter [1981/3125], train_loss:0.057600
Epoch [1/10], Iter [1982/3125], train_loss:0.051950
Epoch [1/10], Iter [1983/3125], train_loss:0.060522
Epoch [1/10], Iter [1984/3125], train_loss:0.043160
Epoch [1/10], Iter [1985/3125], train_loss:0.046968
Epoch [1/10], Iter [1986/3125], train_loss:0.030345
Epoch [1/10], Iter [1987/3125], train_loss:0.067975
Epoch [1/10], Iter [1988/3125], train_loss:0.070917
Epoch [1/10], Iter [1989/3125], train_loss:0.050825
Epoch [1/10], Iter [1990/3125], train_loss:0.056659
Epoch [1/10], Iter [1991/3125], train_loss:0.075110
Epoch [1/10], Iter [1992/3125], train_loss:0.018620
Epoch [1/10], Iter [1993/3125], train_loss:0.086012
Epoch [1/10], Iter [1994/3125], train_loss:0.061522
Epoch [1/10], Iter [1995/3125], train_loss:0.115937
Epoch [1/10], Iter [1996/3125], train_loss:0.045985
Epoch [1/10], Iter [1997/3125], train_loss:0.053937
Epoch [1/10], Iter [1998/3125], train_loss:0.070547
Epoch [1/10], Iter [1999/3125], train_loss:0.042071
Epoch [1/10], Iter [2000/3125], train_loss:0.043023
Epoch [1/10], Iter [2001/3125], train_loss:0.081274
Epoch [1/10], Iter [2002/3125], train_loss:0.066850
Epoch [1/10], Iter [2003/3125], train_loss:0.033427
Epoch [1/10], Iter [2004/3125], train_loss:0.061561
Epoch [1/10], Iter [2005/3125], train_loss:0.062892
Epoch [1/10], Iter [2006/3125], train_loss:0.029832
Epoch [1/10], Iter [2007/3125], train_loss:0.084254
Epoch [1/10], Iter [2008/3125], train_loss:0.086006
Epoch [1/10], Iter [2009/3125], train_loss:0.075942
Epoch [1/10], Iter [2010/3125], train_loss:0.086731
Epoch [1/10], Iter [2011/3125], train_loss:0.061293
Epoch [1/10], Iter [2012/3125], train_loss:0.031159
Epoch [1/10], Iter [2013/3125], train_loss:0.094308
Epoch [1/10], Iter [2014/3125], train_loss:0.058767
Epoch [1/10], Iter [2015/3125], train_loss:0.042780
Epoch [1/10], Iter [2016/3125], train_loss:0.053814
Epoch [1/10], Iter [2017/3125], train_loss:0.044383
Epoch [1/10], Iter [2018/3125], train_loss:0.054721
Epoch [1/10], Iter [2019/3125], train_loss:0.037710
Epoch [1/10], Iter [2020/3125], train_loss:0.050791
Epoch [1/10], Iter [2021/3125], train_loss:0.088299
Epoch [1/10], Iter [2022/3125], train_loss:0.023384
Epoch [1/10], Iter [2023/3125], train_loss:0.059585
Epoch [1/10], Iter [2024/3125], train_loss:0.047600
Epoch [1/10], Iter [2025/3125], train_loss:0.050966
Epoch [1/10], Iter [2026/3125], train_loss:0.069498
Epoch [1/10], Iter [2027/3125], train_loss:0.059679
Epoch [1/10], Iter [2028/3125], train_loss:0.054175
Epoch [1/10], Iter [2029/3125], train_loss:0.048971
Epoch [1/10], Iter [2030/3125], train_loss:0.055469
Epoch [1/10], Iter [2031/3125], train_loss:0.042843
Epoch [1/10], Iter [2032/3125], train_loss:0.054261
Epoch [1/10], Iter [2033/3125], train_loss:0.034696
Epoch [1/10], Iter [2034/3125], train_loss:0.050647
Epoch [1/10], Iter [2035/3125], train_loss:0.075666
Epoch [1/10], Iter [2036/3125], train_loss:0.082343
Epoch [1/10], Iter [2037/3125], train_loss:0.050409
Epoch [1/10], Iter [2038/3125], train_loss:0.050441
Epoch [1/10], Iter [2039/3125], train_loss:0.068800
Epoch [1/10], Iter [2040/3125], train_loss:0.064183
Epoch [1/10], Iter [2041/3125], train_loss:0.033020
Epoch [1/10], Iter [2042/3125], train_loss:0.068810
Epoch [1/10], Iter [2043/3125], train_loss:0.036257
Epoch [1/10], Iter [2044/3125], train_loss:0.060899
Epoch [1/10], Iter [2045/3125], train_loss:0.061538
Epoch [1/10], Iter [2046/3125], train_loss:0.044145
Epoch [1/10], Iter [2047/3125], train_loss:0.039485
Epoch [1/10], Iter [2048/3125], train_loss:0.042501
Epoch [1/10], Iter [2049/3125], train_loss:0.063631
Epoch [1/10], Iter [2050/3125], train_loss:0.046520
Epoch [1/10], Iter [2051/3125], train_loss:0.055999
Epoch [1/10], Iter [2052/3125], train_loss:0.063847
Epoch [1/10], Iter [2053/3125], train_loss:0.069343
Epoch [1/10], Iter [2054/3125], train_loss:0.052924
Epoch [1/10], Iter [2055/3125], train_loss:0.036919
Epoch [1/10], Iter [2056/3125], train_loss:0.054971
Epoch [1/10], Iter [2057/3125], train_loss:0.048387
Epoch [1/10], Iter [2058/3125], train_loss:0.084165
Epoch [1/10], Iter [2059/3125], train_loss:0.044616
Epoch [1/10], Iter [2060/3125], train_loss:0.033628
Epoch [1/10], Iter [2061/3125], train_loss:0.027558
Epoch [1/10], Iter [2062/3125], train_loss:0.055136
Epoch [1/10], Iter [2063/3125], train_loss:0.062519
Epoch [1/10], Iter [2064/3125], train_loss:0.050408
Epoch [1/10], Iter [2065/3125], train_loss:0.033982
Epoch [1/10], Iter [2066/3125], train_loss:0.087878
Epoch [1/10], Iter [2067/3125], train_loss:0.044555
Epoch [1/10], Iter [2068/3125], train_loss:0.036030
Epoch [1/10], Iter [2069/3125], train_loss:0.047172
Epoch [1/10], Iter [2070/3125], train_loss:0.057118
Epoch [1/10], Iter [2071/3125], train_loss:0.050927
Epoch [1/10], Iter [2072/3125], train_loss:0.055021
Epoch [1/10], Iter [2073/3125], train_loss:0.042873
Epoch [1/10], Iter [2074/3125], train_loss:0.069662
Epoch [1/10], Iter [2075/3125], train_loss:0.086718
Epoch [1/10], Iter [2076/3125], train_loss:0.060907
Epoch [1/10], Iter [2077/3125], train_loss:0.055302
Epoch [1/10], Iter [2078/3125], train_loss:0.063130
Epoch [1/10], Iter [2079/3125], train_loss:0.041546
Epoch [1/10], Iter [2080/3125], train_loss:0.079889
Epoch [1/10], Iter [2081/3125], train_loss:0.059205
Epoch [1/10], Iter [2082/3125], train_loss:0.077855
Epoch [1/10], Iter [2083/3125], train_loss:0.040796
Epoch [1/10], Iter [2084/3125], train_loss:0.063951
Epoch [1/10], Iter [2085/3125], train_loss:0.060815
Epoch [1/10], Iter [2086/3125], train_loss:0.105773
Epoch [1/10], Iter [2087/3125], train_loss:0.055865
Epoch [1/10], Iter [2088/3125], train_loss:0.058389
Epoch [1/10], Iter [2089/3125], train_loss:0.085886
Epoch [1/10], Iter [2090/3125], train_loss:0.037964
Epoch [1/10], Iter [2091/3125], train_loss:0.037571
Epoch [1/10], Iter [2092/3125], train_loss:0.051286
Epoch [1/10], Iter [2093/3125], train_loss:0.072742
Epoch [1/10], Iter [2094/3125], train_loss:0.027918
Epoch [1/10], Iter [2095/3125], train_loss:0.064145
Epoch [1/10], Iter [2096/3125], train_loss:0.062825
Epoch [1/10], Iter [2097/3125], train_loss:0.047760
Epoch [1/10], Iter [2098/3125], train_loss:0.051347
Epoch [1/10], Iter [2099/3125], train_loss:0.066230
Epoch [1/10], Iter [2100/3125], train_loss:0.062902
Epoch [1/10], Iter [2101/3125], train_loss:0.047526
Epoch [1/10], Iter [2102/3125], train_loss:0.039127
Epoch [1/10], Iter [2103/3125], train_loss:0.046777
Epoch [1/10], Iter [2104/3125], train_loss:0.059681
Epoch [1/10], Iter [2105/3125], train_loss:0.061811
Epoch [1/10], Iter [2106/3125], train_loss:0.039108
Epoch [1/10], Iter [2107/3125], train_loss:0.075459
Epoch [1/10], Iter [2108/3125], train_loss:0.063627
Epoch [1/10], Iter [2109/3125], train_loss:0.035721
Epoch [1/10], Iter [2110/3125], train_loss:0.060149
Epoch [1/10], Iter [2111/3125], train_loss:0.067085
Epoch [1/10], Iter [2112/3125], train_loss:0.059505
Epoch [1/10], Iter [2113/3125], train_loss:0.056017
Epoch [1/10], Iter [2114/3125], train_loss:0.020455
Epoch [1/10], Iter [2115/3125], train_loss:0.081689
Epoch [1/10], Iter [2116/3125], train_loss:0.039513
Epoch [1/10], Iter [2117/3125], train_loss:0.048386
Epoch [1/10], Iter [2118/3125], train_loss:0.059267
Epoch [1/10], Iter [2119/3125], train_loss:0.082934
Epoch [1/10], Iter [2120/3125], train_loss:0.060041
Epoch [1/10], Iter [2121/3125], train_loss:0.061388
Epoch [1/10], Iter [2122/3125], train_loss:0.042897
Epoch [1/10], Iter [2123/3125], train_loss:0.045056
Epoch [1/10], Iter [2124/3125], train_loss:0.060849
Epoch [1/10], Iter [2125/3125], train_loss:0.049667
Epoch [1/10], Iter [2126/3125], train_loss:0.048343
Epoch [1/10], Iter [2127/3125], train_loss:0.068228
Epoch [1/10], Iter [2128/3125], train_loss:0.037251
Epoch [1/10], Iter [2129/3125], train_loss:0.027494
Epoch [1/10], Iter [2130/3125], train_loss:0.064851
Epoch [1/10], Iter [2131/3125], train_loss:0.044079
Epoch [1/10], Iter [2132/3125], train_loss:0.058055
Epoch [1/10], Iter [2133/3125], train_loss:0.028688
Epoch [1/10], Iter [2134/3125], train_loss:0.063009
Epoch [1/10], Iter [2135/3125], train_loss:0.049375
Epoch [1/10], Iter [2136/3125], train_loss:0.070779
Epoch [1/10], Iter [2137/3125], train_loss:0.061121
Epoch [1/10], Iter [2138/3125], train_loss:0.045141
Epoch [1/10], Iter [2139/3125], train_loss:0.032898
Epoch [1/10], Iter [2140/3125], train_loss:0.044351
Epoch [1/10], Iter [2141/3125], train_loss:0.056783
Epoch [1/10], Iter [2142/3125], train_loss:0.056133
Epoch [1/10], Iter [2143/3125], train_loss:0.088715
Epoch [1/10], Iter [2144/3125], train_loss:0.068217
Epoch [1/10], Iter [2145/3125], train_loss:0.043055
Epoch [1/10], Iter [2146/3125], train_loss:0.032986
Epoch [1/10], Iter [2147/3125], train_loss:0.041009
Epoch [1/10], Iter [2148/3125], train_loss:0.044360
Epoch [1/10], Iter [2149/3125], train_loss:0.065169
Epoch [1/10], Iter [2150/3125], train_loss:0.075291
Epoch [1/10], Iter [2151/3125], train_loss:0.050981
Epoch [1/10], Iter [2152/3125], train_loss:0.062930
Epoch [1/10], Iter [2153/3125], train_loss:0.058825
Epoch [1/10], Iter [2154/3125], train_loss:0.076227
Epoch [1/10], Iter [2155/3125], train_loss:0.083203
Epoch [1/10], Iter [2156/3125], train_loss:0.063778
Epoch [1/10], Iter [2157/3125], train_loss:0.045961
Epoch [1/10], Iter [2158/3125], train_loss:0.070411
Epoch [1/10], Iter [2159/3125], train_loss:0.064471
Epoch [1/10], Iter [2160/3125], train_loss:0.056950
Epoch [1/10], Iter [2161/3125], train_loss:0.074447
Epoch [1/10], Iter [2162/3125], train_loss:0.052749
Epoch [1/10], Iter [2163/3125], train_loss:0.057865
Epoch [1/10], Iter [2164/3125], train_loss:0.037370
Epoch [1/10], Iter [2165/3125], train_loss:0.103615
Epoch [1/10], Iter [2166/3125], train_loss:0.076190
Epoch [1/10], Iter [2167/3125], train_loss:0.044481
Epoch [1/10], Iter [2168/3125], train_loss:0.050516
Epoch [1/10], Iter [2169/3125], train_loss:0.036114
Epoch [1/10], Iter [2170/3125], train_loss:0.037495
Epoch [1/10], Iter [2171/3125], train_loss:0.058162
Epoch [1/10], Iter [2172/3125], train_loss:0.072126
Epoch [1/10], Iter [2173/3125], train_loss:0.058480
Epoch [1/10], Iter [2174/3125], train_loss:0.057047
Epoch [1/10], Iter [2175/3125], train_loss:0.058543
Epoch [1/10], Iter [2176/3125], train_loss:0.044135
Epoch [1/10], Iter [2177/3125], train_loss:0.021453
Epoch [1/10], Iter [2178/3125], train_loss:0.091287
Epoch [1/10], Iter [2179/3125], train_loss:0.030686
Epoch [1/10], Iter [2180/3125], train_loss:0.043142
Epoch [1/10], Iter [2181/3125], train_loss:0.061297
Epoch [1/10], Iter [2182/3125], train_loss:0.052431
Epoch [1/10], Iter [2183/3125], train_loss:0.064683
Epoch [1/10], Iter [2184/3125], train_loss:0.052090
Epoch [1/10], Iter [2185/3125], train_loss:0.059552
Epoch [1/10], Iter [2186/3125], train_loss:0.043549
Epoch [1/10], Iter [2187/3125], train_loss:0.039106
Epoch [1/10], Iter [2188/3125], train_loss:0.033696
Epoch [1/10], Iter [2189/3125], train_loss:0.059473
Epoch [1/10], Iter [2190/3125], train_loss:0.042966
Epoch [1/10], Iter [2191/3125], train_loss:0.038413
Epoch [1/10], Iter [2192/3125], train_loss:0.048166
Epoch [1/10], Iter [2193/3125], train_loss:0.062529
Epoch [1/10], Iter [2194/3125], train_loss:0.063281
Epoch [1/10], Iter [2195/3125], train_loss:0.068794
Epoch [1/10], Iter [2196/3125], train_loss:0.060039
Epoch [1/10], Iter [2197/3125], train_loss:0.059375
Epoch [1/10], Iter [2198/3125], train_loss:0.052642
Epoch [1/10], Iter [2199/3125], train_loss:0.046952
Epoch [1/10], Iter [2200/3125], train_loss:0.071861
Epoch [1/10], Iter [2201/3125], train_loss:0.044257
Epoch [1/10], Iter [2202/3125], train_loss:0.057232
Epoch [1/10], Iter [2203/3125], train_loss:0.039750
Epoch [1/10], Iter [2204/3125], train_loss:0.074284
Epoch [1/10], Iter [2205/3125], train_loss:0.029797
Epoch [1/10], Iter [2206/3125], train_loss:0.058231
Epoch [1/10], Iter [2207/3125], train_loss:0.066111
Epoch [1/10], Iter [2208/3125], train_loss:0.067477
Epoch [1/10], Iter [2209/3125], train_loss:0.065425
Epoch [1/10], Iter [2210/3125], train_loss:0.039687
Epoch [1/10], Iter [2211/3125], train_loss:0.054980
Epoch [1/10], Iter [2212/3125], train_loss:0.052664
Epoch [1/10], Iter [2213/3125], train_loss:0.065844
Epoch [1/10], Iter [2214/3125], train_loss:0.094000
Epoch [1/10], Iter [2215/3125], train_loss:0.053468
Epoch [1/10], Iter [2216/3125], train_loss:0.061695
Epoch [1/10], Iter [2217/3125], train_loss:0.067787
Epoch [1/10], Iter [2218/3125], train_loss:0.035557
Epoch [1/10], Iter [2219/3125], train_loss:0.054791
Epoch [1/10], Iter [2220/3125], train_loss:0.074102
Epoch [1/10], Iter [2221/3125], train_loss:0.053827
Epoch [1/10], Iter [2222/3125], train_loss:0.064904
Epoch [1/10], Iter [2223/3125], train_loss:0.048594
Epoch [1/10], Iter [2224/3125], train_loss:0.038459
Epoch [1/10], Iter [2225/3125], train_loss:0.033388
Epoch [1/10], Iter [2226/3125], train_loss:0.053181
Epoch [1/10], Iter [2227/3125], train_loss:0.070912
Epoch [1/10], Iter [2228/3125], train_loss:0.087150
Epoch [1/10], Iter [2229/3125], train_loss:0.043372
Epoch [1/10], Iter [2230/3125], train_loss:0.053783
Epoch [1/10], Iter [2231/3125], train_loss:0.040672
Epoch [1/10], Iter [2232/3125], train_loss:0.045534
Epoch [1/10], Iter [2233/3125], train_loss:0.040906
Epoch [1/10], Iter [2234/3125], train_loss:0.046060
Epoch [1/10], Iter [2235/3125], train_loss:0.073936
Epoch [1/10], Iter [2236/3125], train_loss:0.048040
Epoch [1/10], Iter [2237/3125], train_loss:0.044033
Epoch [1/10], Iter [2238/3125], train_loss:0.058578
Epoch [1/10], Iter [2239/3125], train_loss:0.046442
Epoch [1/10], Iter [2240/3125], train_loss:0.070717
Epoch [1/10], Iter [2241/3125], train_loss:0.057559
Epoch [1/10], Iter [2242/3125], train_loss:0.071514
Epoch [1/10], Iter [2243/3125], train_loss:0.072684
Epoch [1/10], Iter [2244/3125], train_loss:0.071098
Epoch [1/10], Iter [2245/3125], train_loss:0.029106
Epoch [1/10], Iter [2246/3125], train_loss:0.047889
Epoch [1/10], Iter [2247/3125], train_loss:0.074630
Epoch [1/10], Iter [2248/3125], train_loss:0.039345
Epoch [1/10], Iter [2249/3125], train_loss:0.076240
Epoch [1/10], Iter [2250/3125], train_loss:0.046938
Epoch [1/10], Iter [2251/3125], train_loss:0.051236
Epoch [1/10], Iter [2252/3125], train_loss:0.060951
Epoch [1/10], Iter [2253/3125], train_loss:0.072658
Epoch [1/10], Iter [2254/3125], train_loss:0.072621
Epoch [1/10], Iter [2255/3125], train_loss:0.071780
Epoch [1/10], Iter [2256/3125], train_loss:0.047900
Epoch [1/10], Iter [2257/3125], train_loss:0.083139
Epoch [1/10], Iter [2258/3125], train_loss:0.042750
Epoch [1/10], Iter [2259/3125], train_loss:0.030537
Epoch [1/10], Iter [2260/3125], train_loss:0.071231
Epoch [1/10], Iter [2261/3125], train_loss:0.058627
Epoch [1/10], Iter [2262/3125], train_loss:0.061551
Epoch [1/10], Iter [2263/3125], train_loss:0.057065
Epoch [1/10], Iter [2264/3125], train_loss:0.063427
Epoch [1/10], Iter [2265/3125], train_loss:0.052468
Epoch [1/10], Iter [2266/3125], train_loss:0.052080
Epoch [1/10], Iter [2267/3125], train_loss:0.033376
Epoch [1/10], Iter [2268/3125], train_loss:0.041073
Epoch [1/10], Iter [2269/3125], train_loss:0.065047
Epoch [1/10], Iter [2270/3125], train_loss:0.062026
Epoch [1/10], Iter [2271/3125], train_loss:0.109442
Epoch [1/10], Iter [2272/3125], train_loss:0.056198
Epoch [1/10], Iter [2273/3125], train_loss:0.063348
Epoch [1/10], Iter [2274/3125], train_loss:0.039659
Epoch [1/10], Iter [2275/3125], train_loss:0.062523
Epoch [1/10], Iter [2276/3125], train_loss:0.057241
Epoch [1/10], Iter [2277/3125], train_loss:0.026030
Epoch [1/10], Iter [2278/3125], train_loss:0.060936
Epoch [1/10], Iter [2279/3125], train_loss:0.037769
Epoch [1/10], Iter [2280/3125], train_loss:0.047071
Epoch [1/10], Iter [2281/3125], train_loss:0.067723
Epoch [1/10], Iter [2282/3125], train_loss:0.071875
Epoch [1/10], Iter [2283/3125], train_loss:0.049202
Epoch [1/10], Iter [2284/3125], train_loss:0.060309
Epoch [1/10], Iter [2285/3125], train_loss:0.068315
Epoch [1/10], Iter [2286/3125], train_loss:0.072877
Epoch [1/10], Iter [2287/3125], train_loss:0.063042
Epoch [1/10], Iter [2288/3125], train_loss:0.078719
Epoch [1/10], Iter [2289/3125], train_loss:0.026097
Epoch [1/10], Iter [2290/3125], train_loss:0.060497
Epoch [1/10], Iter [2291/3125], train_loss:0.078648
Epoch [1/10], Iter [2292/3125], train_loss:0.068681
Epoch [1/10], Iter [2293/3125], train_loss:0.044549
Epoch [1/10], Iter [2294/3125], train_loss:0.079612
Epoch [1/10], Iter [2295/3125], train_loss:0.036360
Epoch [1/10], Iter [2296/3125], train_loss:0.029000
Epoch [1/10], Iter [2297/3125], train_loss:0.055833
Epoch [1/10], Iter [2298/3125], train_loss:0.078257
Epoch [1/10], Iter [2299/3125], train_loss:0.064521
Epoch [1/10], Iter [2300/3125], train_loss:0.053077
Epoch [1/10], Iter [2301/3125], train_loss:0.061464
Epoch [1/10], Iter [2302/3125], train_loss:0.054382
Epoch [1/10], Iter [2303/3125], train_loss:0.029077
Epoch [1/10], Iter [2304/3125], train_loss:0.047081
Epoch [1/10], Iter [2305/3125], train_loss:0.034250
Epoch [1/10], Iter [2306/3125], train_loss:0.067229
Epoch [1/10], Iter [2307/3125], train_loss:0.038814
Epoch [1/10], Iter [2308/3125], train_loss:0.059177
Epoch [1/10], Iter [2309/3125], train_loss:0.029574
Epoch [1/10], Iter [2310/3125], train_loss:0.034070
Epoch [1/10], Iter [2311/3125], train_loss:0.077129
Epoch [1/10], Iter [2312/3125], train_loss:0.036397
Epoch [1/10], Iter [2313/3125], train_loss:0.065701
Epoch [1/10], Iter [2314/3125], train_loss:0.044045
Epoch [1/10], Iter [2315/3125], train_loss:0.078438
Epoch [1/10], Iter [2316/3125], train_loss:0.099388
Epoch [1/10], Iter [2317/3125], train_loss:0.053328
Epoch [1/10], Iter [2318/3125], train_loss:0.033426
Epoch [1/10], Iter [2319/3125], train_loss:0.045820
Epoch [1/10], Iter [2320/3125], train_loss:0.071173
Epoch [1/10], Iter [2321/3125], train_loss:0.058071
Epoch [1/10], Iter [2322/3125], train_loss:0.032791
Epoch [1/10], Iter [2323/3125], train_loss:0.049563
Epoch [1/10], Iter [2324/3125], train_loss:0.037852
Epoch [1/10], Iter [2325/3125], train_loss:0.071495
Epoch [1/10], Iter [2326/3125], train_loss:0.051821
Epoch [1/10], Iter [2327/3125], train_loss:0.049604
Epoch [1/10], Iter [2328/3125], train_loss:0.084093
Epoch [1/10], Iter [2329/3125], train_loss:0.050646
Epoch [1/10], Iter [2330/3125], train_loss:0.035999
Epoch [1/10], Iter [2331/3125], train_loss:0.079603
Epoch [1/10], Iter [2332/3125], train_loss:0.036003
Epoch [1/10], Iter [2333/3125], train_loss:0.029306
Epoch [1/10], Iter [2334/3125], train_loss:0.080034
Epoch [1/10], Iter [2335/3125], train_loss:0.056424
Epoch [1/10], Iter [2336/3125], train_loss:0.067404
Epoch [1/10], Iter [2337/3125], train_loss:0.048945
Epoch [1/10], Iter [2338/3125], train_loss:0.034922 Epoch [1/10], Iter [2339/3125], train_loss:0.060189 Epoch [1/10], Iter [2340/3125], train_loss:0.041691 Epoch [1/10], Iter [2341/3125], train_loss:0.076982 Epoch [1/10], Iter [2342/3125], train_loss:0.075437 Epoch [1/10], Iter [2343/3125], train_loss:0.056825 Epoch [1/10], Iter [2344/3125], train_loss:0.038702 Epoch [1/10], Iter [2345/3125], train_loss:0.048160 Epoch [1/10], Iter [2346/3125], train_loss:0.054957 Epoch [1/10], Iter [2347/3125], train_loss:0.073520 Epoch [1/10], Iter [2348/3125], train_loss:0.025029 Epoch [1/10], Iter [2349/3125], train_loss:0.078251 Epoch [1/10], Iter [2350/3125], train_loss:0.058632 Epoch [1/10], Iter [2351/3125], train_loss:0.027224 Epoch [1/10], Iter [2352/3125], train_loss:0.078937 Epoch [1/10], Iter [2353/3125], train_loss:0.047743 Epoch [1/10], Iter [2354/3125], train_loss:0.051082 Epoch [1/10], Iter [2355/3125], train_loss:0.079061 Epoch [1/10], Iter [2356/3125], train_loss:0.073499 Epoch [1/10], Iter [2357/3125], train_loss:0.043175 Epoch [1/10], Iter [2358/3125], train_loss:0.056764 Epoch [1/10], Iter [2359/3125], train_loss:0.019714 Epoch [1/10], Iter [2360/3125], train_loss:0.063975 Epoch [1/10], Iter [2361/3125], train_loss:0.051211 Epoch [1/10], Iter [2362/3125], train_loss:0.057849 Epoch [1/10], Iter [2363/3125], train_loss:0.069020 Epoch [1/10], Iter [2364/3125], train_loss:0.062727 Epoch [1/10], Iter [2365/3125], train_loss:0.038595 Epoch [1/10], Iter [2366/3125], train_loss:0.029429 Epoch [1/10], Iter [2367/3125], train_loss:0.039399 Epoch [1/10], Iter [2368/3125], train_loss:0.065248 Epoch [1/10], Iter [2369/3125], train_loss:0.031663 Epoch [1/10], Iter [2370/3125], train_loss:0.027714 Epoch [1/10], Iter [2371/3125], train_loss:0.041660 Epoch [1/10], Iter [2372/3125], train_loss:0.023911 Epoch [1/10], Iter [2373/3125], train_loss:0.043590 Epoch [1/10], Iter [2374/3125], train_loss:0.027625 Epoch [1/10], Iter [2375/3125], train_loss:0.027970 Epoch [1/10], Iter 
[2376/3125], train_loss:0.086231 Epoch [1/10], Iter [2377/3125], train_loss:0.030232 Epoch [1/10], Iter [2378/3125], train_loss:0.048442 Epoch [1/10], Iter [2379/3125], train_loss:0.037288 Epoch [1/10], Iter [2380/3125], train_loss:0.036998 Epoch [1/10], Iter [2381/3125], train_loss:0.062230 Epoch [1/10], Iter [2382/3125], train_loss:0.077990 Epoch [1/10], Iter [2383/3125], train_loss:0.037560 Epoch [1/10], Iter [2384/3125], train_loss:0.060333 Epoch [1/10], Iter [2385/3125], train_loss:0.067466 Epoch [1/10], Iter [2386/3125], train_loss:0.044783 Epoch [1/10], Iter [2387/3125], train_loss:0.061185 Epoch [1/10], Iter [2388/3125], train_loss:0.020483 Epoch [1/10], Iter [2389/3125], train_loss:0.040517 Epoch [1/10], Iter [2390/3125], train_loss:0.080889 Epoch [1/10], Iter [2391/3125], train_loss:0.078674 Epoch [1/10], Iter [2392/3125], train_loss:0.038500 Epoch [1/10], Iter [2393/3125], train_loss:0.043009 Epoch [1/10], Iter [2394/3125], train_loss:0.045287 Epoch [1/10], Iter [2395/3125], train_loss:0.052948 Epoch [1/10], Iter [2396/3125], train_loss:0.096492 Epoch [1/10], Iter [2397/3125], train_loss:0.084607 Epoch [1/10], Iter [2398/3125], train_loss:0.018984 Epoch [1/10], Iter [2399/3125], train_loss:0.058866 Epoch [1/10], Iter [2400/3125], train_loss:0.054521 Epoch [1/10], Iter [2401/3125], train_loss:0.035970 Epoch [1/10], Iter [2402/3125], train_loss:0.083726 Epoch [1/10], Iter [2403/3125], train_loss:0.040679 Epoch [1/10], Iter [2404/3125], train_loss:0.065046 Epoch [1/10], Iter [2405/3125], train_loss:0.094652 Epoch [1/10], Iter [2406/3125], train_loss:0.059551 Epoch [1/10], Iter [2407/3125], train_loss:0.065810 Epoch [1/10], Iter [2408/3125], train_loss:0.050208 Epoch [1/10], Iter [2409/3125], train_loss:0.066216 Epoch [1/10], Iter [2410/3125], train_loss:0.058400 Epoch [1/10], Iter [2411/3125], train_loss:0.053513 Epoch [1/10], Iter [2412/3125], train_loss:0.060500 Epoch [1/10], Iter [2413/3125], train_loss:0.044563 Epoch [1/10], Iter [2414/3125], 
train_loss:0.029764 Epoch [1/10], Iter [2415/3125], train_loss:0.047340 Epoch [1/10], Iter [2416/3125], train_loss:0.035138 Epoch [1/10], Iter [2417/3125], train_loss:0.071377 Epoch [1/10], Iter [2418/3125], train_loss:0.024064 Epoch [1/10], Iter [2419/3125], train_loss:0.042528 Epoch [1/10], Iter [2420/3125], train_loss:0.043153 Epoch [1/10], Iter [2421/3125], train_loss:0.030465 Epoch [1/10], Iter [2422/3125], train_loss:0.072440 Epoch [1/10], Iter [2423/3125], train_loss:0.055920 Epoch [1/10], Iter [2424/3125], train_loss:0.035570 Epoch [1/10], Iter [2425/3125], train_loss:0.056007 Epoch [1/10], Iter [2426/3125], train_loss:0.041977 Epoch [1/10], Iter [2427/3125], train_loss:0.063373 Epoch [1/10], Iter [2428/3125], train_loss:0.052605 Epoch [1/10], Iter [2429/3125], train_loss:0.036802 Epoch [1/10], Iter [2430/3125], train_loss:0.034278 Epoch [1/10], Iter [2431/3125], train_loss:0.052479 Epoch [1/10], Iter [2432/3125], train_loss:0.039629 Epoch [1/10], Iter [2433/3125], train_loss:0.060461 Epoch [1/10], Iter [2434/3125], train_loss:0.022422 Epoch [1/10], Iter [2435/3125], train_loss:0.058592 Epoch [1/10], Iter [2436/3125], train_loss:0.085719 Epoch [1/10], Iter [2437/3125], train_loss:0.055790 Epoch [1/10], Iter [2438/3125], train_loss:0.033942 Epoch [1/10], Iter [2439/3125], train_loss:0.074614 Epoch [1/10], Iter [2440/3125], train_loss:0.042400 Epoch [1/10], Iter [2441/3125], train_loss:0.066518 Epoch [1/10], Iter [2442/3125], train_loss:0.084506 Epoch [1/10], Iter [2443/3125], train_loss:0.045445 Epoch [1/10], Iter [2444/3125], train_loss:0.058341 Epoch [1/10], Iter [2445/3125], train_loss:0.050448 Epoch [1/10], Iter [2446/3125], train_loss:0.053517 Epoch [1/10], Iter [2447/3125], train_loss:0.061119 Epoch [1/10], Iter [2448/3125], train_loss:0.067219 Epoch [1/10], Iter [2449/3125], train_loss:0.038764 Epoch [1/10], Iter [2450/3125], train_loss:0.050990 Epoch [1/10], Iter [2451/3125], train_loss:0.068929 Epoch [1/10], Iter [2452/3125], train_loss:0.112174 
Epoch [1/10], Iter [2453/3125], train_loss:0.045488 Epoch [1/10], Iter [2454/3125], train_loss:0.034194 Epoch [1/10], Iter [2455/3125], train_loss:0.088972 Epoch [1/10], Iter [2456/3125], train_loss:0.044014 Epoch [1/10], Iter [2457/3125], train_loss:0.051432 Epoch [1/10], Iter [2458/3125], train_loss:0.038895 Epoch [1/10], Iter [2459/3125], train_loss:0.091389 Epoch [1/10], Iter [2460/3125], train_loss:0.067894 Epoch [1/10], Iter [2461/3125], train_loss:0.077940 Epoch [1/10], Iter [2462/3125], train_loss:0.035168 Epoch [1/10], Iter [2463/3125], train_loss:0.057799 Epoch [1/10], Iter [2464/3125], train_loss:0.039412 Epoch [1/10], Iter [2465/3125], train_loss:0.055779 Epoch [1/10], Iter [2466/3125], train_loss:0.039693 Epoch [1/10], Iter [2467/3125], train_loss:0.044370 Epoch [1/10], Iter [2468/3125], train_loss:0.072034 Epoch [1/10], Iter [2469/3125], train_loss:0.039117 Epoch [1/10], Iter [2470/3125], train_loss:0.041900 Epoch [1/10], Iter [2471/3125], train_loss:0.078160 Epoch [1/10], Iter [2472/3125], train_loss:0.043799 Epoch [1/10], Iter [2473/3125], train_loss:0.034027 Epoch [1/10], Iter [2474/3125], train_loss:0.033906 Epoch [1/10], Iter [2475/3125], train_loss:0.040556 Epoch [1/10], Iter [2476/3125], train_loss:0.076365 Epoch [1/10], Iter [2477/3125], train_loss:0.044474 Epoch [1/10], Iter [2478/3125], train_loss:0.050639 Epoch [1/10], Iter [2479/3125], train_loss:0.094295 Epoch [1/10], Iter [2480/3125], train_loss:0.049790 Epoch [1/10], Iter [2481/3125], train_loss:0.058790 Epoch [1/10], Iter [2482/3125], train_loss:0.063505 Epoch [1/10], Iter [2483/3125], train_loss:0.049205 Epoch [1/10], Iter [2484/3125], train_loss:0.056420 Epoch [1/10], Iter [2485/3125], train_loss:0.034539 Epoch [1/10], Iter [2486/3125], train_loss:0.060778 Epoch [1/10], Iter [2487/3125], train_loss:0.061710 Epoch [1/10], Iter [2488/3125], train_loss:0.059184 Epoch [1/10], Iter [2489/3125], train_loss:0.051106 Epoch [1/10], Iter [2490/3125], train_loss:0.055393 Epoch [1/10], Iter 
[2491/3125], train_loss:0.069071 Epoch [1/10], Iter [2492/3125], train_loss:0.038927 Epoch [1/10], Iter [2493/3125], train_loss:0.055511 Epoch [1/10], Iter [2494/3125], train_loss:0.030150 Epoch [1/10], Iter [2495/3125], train_loss:0.046406 Epoch [1/10], Iter [2496/3125], train_loss:0.050650 Epoch [1/10], Iter [2497/3125], train_loss:0.067050 Epoch [1/10], Iter [2498/3125], train_loss:0.065522 Epoch [1/10], Iter [2499/3125], train_loss:0.039835 Epoch [1/10], Iter [2500/3125], train_loss:0.037947 Epoch [1/10], Iter [2501/3125], train_loss:0.087482 Epoch [1/10], Iter [2502/3125], train_loss:0.049749 Epoch [1/10], Iter [2503/3125], train_loss:0.075907 Epoch [1/10], Iter [2504/3125], train_loss:0.048454 Epoch [1/10], Iter [2505/3125], train_loss:0.056744 Epoch [1/10], Iter [2506/3125], train_loss:0.063433 Epoch [1/10], Iter [2507/3125], train_loss:0.093217 Epoch [1/10], Iter [2508/3125], train_loss:0.060091 Epoch [1/10], Iter [2509/3125], train_loss:0.038879 Epoch [1/10], Iter [2510/3125], train_loss:0.073510 Epoch [1/10], Iter [2511/3125], train_loss:0.078042 Epoch [1/10], Iter [2512/3125], train_loss:0.018318 Epoch [1/10], Iter [2513/3125], train_loss:0.071369 Epoch [1/10], Iter [2514/3125], train_loss:0.055521 Epoch [1/10], Iter [2515/3125], train_loss:0.074205 Epoch [1/10], Iter [2516/3125], train_loss:0.034892 Epoch [1/10], Iter [2517/3125], train_loss:0.059679 Epoch [1/10], Iter [2518/3125], train_loss:0.044943 Epoch [1/10], Iter [2519/3125], train_loss:0.039163 Epoch [1/10], Iter [2520/3125], train_loss:0.033841 Epoch [1/10], Iter [2521/3125], train_loss:0.095452 Epoch [1/10], Iter [2522/3125], train_loss:0.052355 Epoch [1/10], Iter [2523/3125], train_loss:0.097691 Epoch [1/10], Iter [2524/3125], train_loss:0.043344 Epoch [1/10], Iter [2525/3125], train_loss:0.082170 Epoch [1/10], Iter [2526/3125], train_loss:0.037574 Epoch [1/10], Iter [2527/3125], train_loss:0.046212 Epoch [1/10], Iter [2528/3125], train_loss:0.028267 Epoch [1/10], Iter [2529/3125], 
train_loss:0.048699 Epoch [1/10], Iter [2530/3125], train_loss:0.089290 Epoch [1/10], Iter [2531/3125], train_loss:0.080898 Epoch [1/10], Iter [2532/3125], train_loss:0.040260 Epoch [1/10], Iter [2533/3125], train_loss:0.079006 Epoch [1/10], Iter [2534/3125], train_loss:0.044073 Epoch [1/10], Iter [2535/3125], train_loss:0.056003 Epoch [1/10], Iter [2536/3125], train_loss:0.049989 Epoch [1/10], Iter [2537/3125], train_loss:0.045744 Epoch [1/10], Iter [2538/3125], train_loss:0.049811 Epoch [1/10], Iter [2539/3125], train_loss:0.059298 Epoch [1/10], Iter [2540/3125], train_loss:0.041965 Epoch [1/10], Iter [2541/3125], train_loss:0.044184 Epoch [1/10], Iter [2542/3125], train_loss:0.070333 Epoch [1/10], Iter [2543/3125], train_loss:0.061322 Epoch [1/10], Iter [2544/3125], train_loss:0.033247 Epoch [1/10], Iter [2545/3125], train_loss:0.037805 Epoch [1/10], Iter [2546/3125], train_loss:0.031448 Epoch [1/10], Iter [2547/3125], train_loss:0.034567 Epoch [1/10], Iter [2548/3125], train_loss:0.053322 Epoch [1/10], Iter [2549/3125], train_loss:0.081269 Epoch [1/10], Iter [2550/3125], train_loss:0.078102 Epoch [1/10], Iter [2551/3125], train_loss:0.022630 Epoch [1/10], Iter [2552/3125], train_loss:0.032897 Epoch [1/10], Iter [2553/3125], train_loss:0.050063 Epoch [1/10], Iter [2554/3125], train_loss:0.053164 Epoch [1/10], Iter [2555/3125], train_loss:0.033120 Epoch [1/10], Iter [2556/3125], train_loss:0.046334 Epoch [1/10], Iter [2557/3125], train_loss:0.068456 Epoch [1/10], Iter [2558/3125], train_loss:0.070154 Epoch [1/10], Iter [2559/3125], train_loss:0.036025 Epoch [1/10], Iter [2560/3125], train_loss:0.070635 Epoch [1/10], Iter [2561/3125], train_loss:0.052198 Epoch [1/10], Iter [2562/3125], train_loss:0.043804 Epoch [1/10], Iter [2563/3125], train_loss:0.067197 Epoch [1/10], Iter [2564/3125], train_loss:0.080402 Epoch [1/10], Iter [2565/3125], train_loss:0.071421 Epoch [1/10], Iter [2566/3125], train_loss:0.044109 Epoch [1/10], Iter [2567/3125], train_loss:0.063801 
Epoch [1/10], Iter [2568/3125], train_loss:0.075022 Epoch [1/10], Iter [2569/3125], train_loss:0.030197 Epoch [1/10], Iter [2570/3125], train_loss:0.060289 Epoch [1/10], Iter [2571/3125], train_loss:0.041631 Epoch [1/10], Iter [2572/3125], train_loss:0.047699 Epoch [1/10], Iter [2573/3125], train_loss:0.028659 Epoch [1/10], Iter [2574/3125], train_loss:0.046188 Epoch [1/10], Iter [2575/3125], train_loss:0.031889 Epoch [1/10], Iter [2576/3125], train_loss:0.066076 Epoch [1/10], Iter [2577/3125], train_loss:0.062998 Epoch [1/10], Iter [2578/3125], train_loss:0.034345 Epoch [1/10], Iter [2579/3125], train_loss:0.045776 Epoch [1/10], Iter [2580/3125], train_loss:0.063058 Epoch [1/10], Iter [2581/3125], train_loss:0.049935 Epoch [1/10], Iter [2582/3125], train_loss:0.084482 Epoch [1/10], Iter [2583/3125], train_loss:0.057923 Epoch [1/10], Iter [2584/3125], train_loss:0.045246 Epoch [1/10], Iter [2585/3125], train_loss:0.058265 Epoch [1/10], Iter [2586/3125], train_loss:0.035428 Epoch [1/10], Iter [2587/3125], train_loss:0.042721 Epoch [1/10], Iter [2588/3125], train_loss:0.067164 Epoch [1/10], Iter [2589/3125], train_loss:0.045646 Epoch [1/10], Iter [2590/3125], train_loss:0.038400 Epoch [1/10], Iter [2591/3125], train_loss:0.038546 Epoch [1/10], Iter [2592/3125], train_loss:0.072927 Epoch [1/10], Iter [2593/3125], train_loss:0.030221 Epoch [1/10], Iter [2594/3125], train_loss:0.056022 Epoch [1/10], Iter [2595/3125], train_loss:0.056454 Epoch [1/10], Iter [2596/3125], train_loss:0.044413 Epoch [1/10], Iter [2597/3125], train_loss:0.031464 Epoch [1/10], Iter [2598/3125], train_loss:0.051813 Epoch [1/10], Iter [2599/3125], train_loss:0.077083 Epoch [1/10], Iter [2600/3125], train_loss:0.040987 Epoch [1/10], Iter [2601/3125], train_loss:0.037267 Epoch [1/10], Iter [2602/3125], train_loss:0.033299 Epoch [1/10], Iter [2603/3125], train_loss:0.049933 Epoch [1/10], Iter [2604/3125], train_loss:0.050345 Epoch [1/10], Iter [2605/3125], train_loss:0.068158 Epoch [1/10], Iter 
[2606/3125], train_loss:0.063846 Epoch [1/10], Iter [2607/3125], train_loss:0.057081 Epoch [1/10], Iter [2608/3125], train_loss:0.050321 Epoch [1/10], Iter [2609/3125], train_loss:0.084901 Epoch [1/10], Iter [2610/3125], train_loss:0.061853 Epoch [1/10], Iter [2611/3125], train_loss:0.059709 Epoch [1/10], Iter [2612/3125], train_loss:0.057150 Epoch [1/10], Iter [2613/3125], train_loss:0.034964 Epoch [1/10], Iter [2614/3125], train_loss:0.044947 Epoch [1/10], Iter [2615/3125], train_loss:0.089898 Epoch [1/10], Iter [2616/3125], train_loss:0.052279 Epoch [1/10], Iter [2617/3125], train_loss:0.065590 Epoch [1/10], Iter [2618/3125], train_loss:0.079470 Epoch [1/10], Iter [2619/3125], train_loss:0.064696 Epoch [1/10], Iter [2620/3125], train_loss:0.031827 Epoch [1/10], Iter [2621/3125], train_loss:0.057286 Epoch [1/10], Iter [2622/3125], train_loss:0.059908 Epoch [1/10], Iter [2623/3125], train_loss:0.050808 Epoch [1/10], Iter [2624/3125], train_loss:0.076302 Epoch [1/10], Iter [2625/3125], train_loss:0.054479 Epoch [1/10], Iter [2626/3125], train_loss:0.050685 Epoch [1/10], Iter [2627/3125], train_loss:0.057106 Epoch [1/10], Iter [2628/3125], train_loss:0.050811 Epoch [1/10], Iter [2629/3125], train_loss:0.025450 Epoch [1/10], Iter [2630/3125], train_loss:0.035107 Epoch [1/10], Iter [2631/3125], train_loss:0.037918 Epoch [1/10], Iter [2632/3125], train_loss:0.049256 Epoch [1/10], Iter [2633/3125], train_loss:0.062963 Epoch [1/10], Iter [2634/3125], train_loss:0.043879 Epoch [1/10], Iter [2635/3125], train_loss:0.043937 Epoch [1/10], Iter [2636/3125], train_loss:0.043007 Epoch [1/10], Iter [2637/3125], train_loss:0.033700 Epoch [1/10], Iter [2638/3125], train_loss:0.024870 Epoch [1/10], Iter [2639/3125], train_loss:0.039514 Epoch [1/10], Iter [2640/3125], train_loss:0.067759 Epoch [1/10], Iter [2641/3125], train_loss:0.062978 Epoch [1/10], Iter [2642/3125], train_loss:0.073482 Epoch [1/10], Iter [2643/3125], train_loss:0.051648 Epoch [1/10], Iter [2644/3125], 
train_loss:0.065120 Epoch [1/10], Iter [2645/3125], train_loss:0.023624 Epoch [1/10], Iter [2646/3125], train_loss:0.019855 Epoch [1/10], Iter [2647/3125], train_loss:0.106905 Epoch [1/10], Iter [2648/3125], train_loss:0.058358 Epoch [1/10], Iter [2649/3125], train_loss:0.072519 Epoch [1/10], Iter [2650/3125], train_loss:0.070563 Epoch [1/10], Iter [2651/3125], train_loss:0.073849 Epoch [1/10], Iter [2652/3125], train_loss:0.051423 Epoch [1/10], Iter [2653/3125], train_loss:0.041773 Epoch [1/10], Iter [2654/3125], train_loss:0.042694 Epoch [1/10], Iter [2655/3125], train_loss:0.041109 Epoch [1/10], Iter [2656/3125], train_loss:0.046723 Epoch [1/10], Iter [2657/3125], train_loss:0.032426 Epoch [1/10], Iter [2658/3125], train_loss:0.031085 Epoch [1/10], Iter [2659/3125], train_loss:0.071443 Epoch [1/10], Iter [2660/3125], train_loss:0.034657 Epoch [1/10], Iter [2661/3125], train_loss:0.064858 Epoch [1/10], Iter [2662/3125], train_loss:0.011753 Epoch [1/10], Iter [2663/3125], train_loss:0.056094 Epoch [1/10], Iter [2664/3125], train_loss:0.039091 Epoch [1/10], Iter [2665/3125], train_loss:0.067260 Epoch [1/10], Iter [2666/3125], train_loss:0.054605 Epoch [1/10], Iter [2667/3125], train_loss:0.073443 Epoch [1/10], Iter [2668/3125], train_loss:0.047724 Epoch [1/10], Iter [2669/3125], train_loss:0.061778 Epoch [1/10], Iter [2670/3125], train_loss:0.052013 Epoch [1/10], Iter [2671/3125], train_loss:0.040040 Epoch [1/10], Iter [2672/3125], train_loss:0.058101 Epoch [1/10], Iter [2673/3125], train_loss:0.058269 Epoch [1/10], Iter [2674/3125], train_loss:0.056329 Epoch [1/10], Iter [2675/3125], train_loss:0.074943 Epoch [1/10], Iter [2676/3125], train_loss:0.060055 Epoch [1/10], Iter [2677/3125], train_loss:0.066210 Epoch [1/10], Iter [2678/3125], train_loss:0.077830 Epoch [1/10], Iter [2679/3125], train_loss:0.069789 Epoch [1/10], Iter [2680/3125], train_loss:0.022511 Epoch [1/10], Iter [2681/3125], train_loss:0.074430 Epoch [1/10], Iter [2682/3125], train_loss:0.064221 
Epoch [1/10], Iter [2683/3125], train_loss:0.033731 Epoch [1/10], Iter [2684/3125], train_loss:0.057155 Epoch [1/10], Iter [2685/3125], train_loss:0.071050 Epoch [1/10], Iter [2686/3125], train_loss:0.031468 Epoch [1/10], Iter [2687/3125], train_loss:0.061247 Epoch [1/10], Iter [2688/3125], train_loss:0.033162 Epoch [1/10], Iter [2689/3125], train_loss:0.053674 Epoch [1/10], Iter [2690/3125], train_loss:0.052903 Epoch [1/10], Iter [2691/3125], train_loss:0.053036 Epoch [1/10], Iter [2692/3125], train_loss:0.031536 Epoch [1/10], Iter [2693/3125], train_loss:0.047191 Epoch [1/10], Iter [2694/3125], train_loss:0.053092 Epoch [1/10], Iter [2695/3125], train_loss:0.046388 Epoch [1/10], Iter [2696/3125], train_loss:0.081545 Epoch [1/10], Iter [2697/3125], train_loss:0.031258 Epoch [1/10], Iter [2698/3125], train_loss:0.065705 Epoch [1/10], Iter [2699/3125], train_loss:0.085829 Epoch [1/10], Iter [2700/3125], train_loss:0.036830 Epoch [1/10], Iter [2701/3125], train_loss:0.039658 Epoch [1/10], Iter [2702/3125], train_loss:0.034230 Epoch [1/10], Iter [2703/3125], train_loss:0.046603 Epoch [1/10], Iter [2704/3125], train_loss:0.062321 Epoch [1/10], Iter [2705/3125], train_loss:0.074843 Epoch [1/10], Iter [2706/3125], train_loss:0.064365 Epoch [1/10], Iter [2707/3125], train_loss:0.041580 Epoch [1/10], Iter [2708/3125], train_loss:0.042753 Epoch [1/10], Iter [2709/3125], train_loss:0.054325 Epoch [1/10], Iter [2710/3125], train_loss:0.029269 Epoch [1/10], Iter [2711/3125], train_loss:0.056201 Epoch [1/10], Iter [2712/3125], train_loss:0.032027 Epoch [1/10], Iter [2713/3125], train_loss:0.041384 Epoch [1/10], Iter [2714/3125], train_loss:0.042245 Epoch [1/10], Iter [2715/3125], train_loss:0.049180 Epoch [1/10], Iter [2716/3125], train_loss:0.071382 Epoch [1/10], Iter [2717/3125], train_loss:0.053056 Epoch [1/10], Iter [2718/3125], train_loss:0.076437 Epoch [1/10], Iter [2719/3125], train_loss:0.036449 Epoch [1/10], Iter [2720/3125], train_loss:0.037378 Epoch [1/10], Iter 
[2721/3125], train_loss:0.056445 Epoch [1/10], Iter [2722/3125], train_loss:0.070102 Epoch [1/10], Iter [2723/3125], train_loss:0.032661 Epoch [1/10], Iter [2724/3125], train_loss:0.045753 Epoch [1/10], Iter [2725/3125], train_loss:0.051136 Epoch [1/10], Iter [2726/3125], train_loss:0.048787 Epoch [1/10], Iter [2727/3125], train_loss:0.078822 Epoch [1/10], Iter [2728/3125], train_loss:0.053859 Epoch [1/10], Iter [2729/3125], train_loss:0.061877 Epoch [1/10], Iter [2730/3125], train_loss:0.068190 Epoch [1/10], Iter [2731/3125], train_loss:0.059085 Epoch [1/10], Iter [2732/3125], train_loss:0.041527 Epoch [1/10], Iter [2733/3125], train_loss:0.037386 Epoch [1/10], Iter [2734/3125], train_loss:0.045102 Epoch [1/10], Iter [2735/3125], train_loss:0.072924 Epoch [1/10], Iter [2736/3125], train_loss:0.024766 Epoch [1/10], Iter [2737/3125], train_loss:0.036317 Epoch [1/10], Iter [2738/3125], train_loss:0.060391 Epoch [1/10], Iter [2739/3125], train_loss:0.026071 Epoch [1/10], Iter [2740/3125], train_loss:0.045086 Epoch [1/10], Iter [2741/3125], train_loss:0.060746 Epoch [1/10], Iter [2742/3125], train_loss:0.037758 Epoch [1/10], Iter [2743/3125], train_loss:0.042991 Epoch [1/10], Iter [2744/3125], train_loss:0.057417 Epoch [1/10], Iter [2745/3125], train_loss:0.029067 Epoch [1/10], Iter [2746/3125], train_loss:0.095886 Epoch [1/10], Iter [2747/3125], train_loss:0.033592 Epoch [1/10], Iter [2748/3125], train_loss:0.043915 Epoch [1/10], Iter [2749/3125], train_loss:0.085850 Epoch [1/10], Iter [2750/3125], train_loss:0.066093 Epoch [1/10], Iter [2751/3125], train_loss:0.062001 Epoch [1/10], Iter [2752/3125], train_loss:0.069263 Epoch [1/10], Iter [2753/3125], train_loss:0.041522 Epoch [1/10], Iter [2754/3125], train_loss:0.056623 Epoch [1/10], Iter [2755/3125], train_loss:0.076867 Epoch [1/10], Iter [2756/3125], train_loss:0.063004 Epoch [1/10], Iter [2757/3125], train_loss:0.055485 Epoch [1/10], Iter [2758/3125], train_loss:0.066020 Epoch [1/10], Iter [2759/3125], 
train_loss:0.033939 Epoch [1/10], Iter [2760/3125], train_loss:0.032806 Epoch [1/10], Iter [2761/3125], train_loss:0.054655 Epoch [1/10], Iter [2762/3125], train_loss:0.050211 Epoch [1/10], Iter [2763/3125], train_loss:0.025504 Epoch [1/10], Iter [2764/3125], train_loss:0.052584 Epoch [1/10], Iter [2765/3125], train_loss:0.029184 Epoch [1/10], Iter [2766/3125], train_loss:0.020083 Epoch [1/10], Iter [2767/3125], train_loss:0.027875 Epoch [1/10], Iter [2768/3125], train_loss:0.024596 Epoch [1/10], Iter [2769/3125], train_loss:0.055002 Epoch [1/10], Iter [2770/3125], train_loss:0.055419 Epoch [1/10], Iter [2771/3125], train_loss:0.024973 Epoch [1/10], Iter [2772/3125], train_loss:0.086723 Epoch [1/10], Iter [2773/3125], train_loss:0.048133 Epoch [1/10], Iter [2774/3125], train_loss:0.046027 Epoch [1/10], Iter [2775/3125], train_loss:0.047695 Epoch [1/10], Iter [2776/3125], train_loss:0.037621 Epoch [1/10], Iter [2777/3125], train_loss:0.049847 Epoch [1/10], Iter [2778/3125], train_loss:0.050305 Epoch [1/10], Iter [2779/3125], train_loss:0.028408 Epoch [1/10], Iter [2780/3125], train_loss:0.057841 Epoch [1/10], Iter [2781/3125], train_loss:0.037195 Epoch [1/10], Iter [2782/3125], train_loss:0.046566 Epoch [1/10], Iter [2783/3125], train_loss:0.059322 Epoch [1/10], Iter [2784/3125], train_loss:0.089970 Epoch [1/10], Iter [2785/3125], train_loss:0.035622 Epoch [1/10], Iter [2786/3125], train_loss:0.036376 Epoch [1/10], Iter [2787/3125], train_loss:0.049406 Epoch [1/10], Iter [2788/3125], train_loss:0.027285 Epoch [1/10], Iter [2789/3125], train_loss:0.024182 Epoch [1/10], Iter [2790/3125], train_loss:0.058590 Epoch [1/10], Iter [2791/3125], train_loss:0.031623 Epoch [1/10], Iter [2792/3125], train_loss:0.064973 Epoch [1/10], Iter [2793/3125], train_loss:0.083880 Epoch [1/10], Iter [2794/3125], train_loss:0.063413 Epoch [1/10], Iter [2795/3125], train_loss:0.027198 Epoch [1/10], Iter [2796/3125], train_loss:0.065740 Epoch [1/10], Iter [2797/3125], train_loss:0.045814 
Epoch [1/10], Iter [2798/3125], train_loss:0.058582 Epoch [1/10], Iter [2799/3125], train_loss:0.037425 Epoch [1/10], Iter [2800/3125], train_loss:0.040245 Epoch [1/10], Iter [2801/3125], train_loss:0.069127 Epoch [1/10], Iter [2802/3125], train_loss:0.038190 Epoch [1/10], Iter [2803/3125], train_loss:0.076748 Epoch [1/10], Iter [2804/3125], train_loss:0.063528 Epoch [1/10], Iter [2805/3125], train_loss:0.050070 Epoch [1/10], Iter [2806/3125], train_loss:0.043468 Epoch [1/10], Iter [2807/3125], train_loss:0.037768 Epoch [1/10], Iter [2808/3125], train_loss:0.069925 Epoch [1/10], Iter [2809/3125], train_loss:0.027971 Epoch [1/10], Iter [2810/3125], train_loss:0.045305 Epoch [1/10], Iter [2811/3125], train_loss:0.072035 Epoch [1/10], Iter [2812/3125], train_loss:0.027901 Epoch [1/10], Iter [2813/3125], train_loss:0.055258 Epoch [1/10], Iter [2814/3125], train_loss:0.033380 Epoch [1/10], Iter [2815/3125], train_loss:0.035067 Epoch [1/10], Iter [2816/3125], train_loss:0.062196 Epoch [1/10], Iter [2817/3125], train_loss:0.031055 Epoch [1/10], Iter [2818/3125], train_loss:0.027535 Epoch [1/10], Iter [2819/3125], train_loss:0.074925 Epoch [1/10], Iter [2820/3125], train_loss:0.014863 Epoch [1/10], Iter [2821/3125], train_loss:0.040033 Epoch [1/10], Iter [2822/3125], train_loss:0.073055 Epoch [1/10], Iter [2823/3125], train_loss:0.044778 Epoch [1/10], Iter [2824/3125], train_loss:0.041350 Epoch [1/10], Iter [2825/3125], train_loss:0.045701 Epoch [1/10], Iter [2826/3125], train_loss:0.069052 Epoch [1/10], Iter [2827/3125], train_loss:0.070689 Epoch [1/10], Iter [2828/3125], train_loss:0.073792 Epoch [1/10], Iter [2829/3125], train_loss:0.027273 Epoch [1/10], Iter [2830/3125], train_loss:0.070355 Epoch [1/10], Iter [2831/3125], train_loss:0.050928 Epoch [1/10], Iter [2832/3125], train_loss:0.063157 Epoch [1/10], Iter [2833/3125], train_loss:0.052722 Epoch [1/10], Iter [2834/3125], train_loss:0.066621 Epoch [1/10], Iter [2835/3125], train_loss:0.049870 Epoch [1/10], Iter 
[2836/3125], train_loss:0.045198 Epoch [1/10], Iter [2837/3125], train_loss:0.047708 Epoch [1/10], Iter [2838/3125], train_loss:0.031084 Epoch [1/10], Iter [2839/3125], train_loss:0.054982 Epoch [1/10], Iter [2840/3125], train_loss:0.062080 Epoch [1/10], Iter [2841/3125], train_loss:0.052313 Epoch [1/10], Iter [2842/3125], train_loss:0.027638 Epoch [1/10], Iter [2843/3125], train_loss:0.069474 Epoch [1/10], Iter [2844/3125], train_loss:0.051465 Epoch [1/10], Iter [2845/3125], train_loss:0.047240 Epoch [1/10], Iter [2846/3125], train_loss:0.043358 Epoch [1/10], Iter [2847/3125], train_loss:0.046753 Epoch [1/10], Iter [2848/3125], train_loss:0.059748 Epoch [1/10], Iter [2849/3125], train_loss:0.032166 Epoch [1/10], Iter [2850/3125], train_loss:0.051633 Epoch [1/10], Iter [2851/3125], train_loss:0.032861 Epoch [1/10], Iter [2852/3125], train_loss:0.046734 Epoch [1/10], Iter [2853/3125], train_loss:0.031587 Epoch [1/10], Iter [2854/3125], train_loss:0.028285 Epoch [1/10], Iter [2855/3125], train_loss:0.063359 Epoch [1/10], Iter [2856/3125], train_loss:0.063512 Epoch [1/10], Iter [2857/3125], train_loss:0.048190 Epoch [1/10], Iter [2858/3125], train_loss:0.070683 Epoch [1/10], Iter [2859/3125], train_loss:0.016137 Epoch [1/10], Iter [2860/3125], train_loss:0.045513 Epoch [1/10], Iter [2861/3125], train_loss:0.033696 Epoch [1/10], Iter [2862/3125], train_loss:0.056089 Epoch [1/10], Iter [2863/3125], train_loss:0.040835 Epoch [1/10], Iter [2864/3125], train_loss:0.059301 Epoch [1/10], Iter [2865/3125], train_loss:0.065590 Epoch [1/10], Iter [2866/3125], train_loss:0.054262 Epoch [1/10], Iter [2867/3125], train_loss:0.032128 Epoch [1/10], Iter [2868/3125], train_loss:0.070486 Epoch [1/10], Iter [2869/3125], train_loss:0.050579 Epoch [1/10], Iter [2870/3125], train_loss:0.048929 Epoch [1/10], Iter [2871/3125], train_loss:0.059329 Epoch [1/10], Iter [2872/3125], train_loss:0.059987 Epoch [1/10], Iter [2873/3125], train_loss:0.038087 Epoch [1/10], Iter [2874/3125], 
train_loss:0.042215 Epoch [1/10], Iter [2875/3125], train_loss:0.037359 Epoch [1/10], Iter [2876/3125], train_loss:0.064945 Epoch [1/10], Iter [2877/3125], train_loss:0.032644 Epoch [1/10], Iter [2878/3125], train_loss:0.035471 Epoch [1/10], Iter [2879/3125], train_loss:0.054034 Epoch [1/10], Iter [2880/3125], train_loss:0.055840 Epoch [1/10], Iter [2881/3125], train_loss:0.040988 Epoch [1/10], Iter [2882/3125], train_loss:0.076851 Epoch [1/10], Iter [2883/3125], train_loss:0.084683 Epoch [1/10], Iter [2884/3125], train_loss:0.052963 Epoch [1/10], Iter [2885/3125], train_loss:0.033718 Epoch [1/10], Iter [2886/3125], train_loss:0.047949 Epoch [1/10], Iter [2887/3125], train_loss:0.066821 Epoch [1/10], Iter [2888/3125], train_loss:0.062198 Epoch [1/10], Iter [2889/3125], train_loss:0.064902 Epoch [1/10], Iter [2890/3125], train_loss:0.057373 Epoch [1/10], Iter [2891/3125], train_loss:0.048909 Epoch [1/10], Iter [2892/3125], train_loss:0.047169 Epoch [1/10], Iter [2893/3125], train_loss:0.037598 Epoch [1/10], Iter [2894/3125], train_loss:0.044367 Epoch [1/10], Iter [2895/3125], train_loss:0.059186 Epoch [1/10], Iter [2896/3125], train_loss:0.027673 Epoch [1/10], Iter [2897/3125], train_loss:0.046781 Epoch [1/10], Iter [2898/3125], train_loss:0.044963 Epoch [1/10], Iter [2899/3125], train_loss:0.053782 Epoch [1/10], Iter [2900/3125], train_loss:0.037537 Epoch [1/10], Iter [2901/3125], train_loss:0.043916 Epoch [1/10], Iter [2902/3125], train_loss:0.056527 Epoch [1/10], Iter [2903/3125], train_loss:0.025347 Epoch [1/10], Iter [2904/3125], train_loss:0.038642 Epoch [1/10], Iter [2905/3125], train_loss:0.066414 Epoch [1/10], Iter [2906/3125], train_loss:0.041623 Epoch [1/10], Iter [2907/3125], train_loss:0.050016 Epoch [1/10], Iter [2908/3125], train_loss:0.043550 Epoch [1/10], Iter [2909/3125], train_loss:0.039868 Epoch [1/10], Iter [2910/3125], train_loss:0.026067 Epoch [1/10], Iter [2911/3125], train_loss:0.045635 Epoch [1/10], Iter [2912/3125], train_loss:0.070421 
Epoch [1/10], Iter [2913/3125], train_loss:0.063436 Epoch [1/10], Iter [2914/3125], train_loss:0.049509 Epoch [1/10], Iter [2915/3125], train_loss:0.071456 Epoch [1/10], Iter [2916/3125], train_loss:0.029413 Epoch [1/10], Iter [2917/3125], train_loss:0.042938 Epoch [1/10], Iter [2918/3125], train_loss:0.060789 Epoch [1/10], Iter [2919/3125], train_loss:0.035195 Epoch [1/10], Iter [2920/3125], train_loss:0.049221 Epoch [1/10], Iter [2921/3125], train_loss:0.032330 Epoch [1/10], Iter [2922/3125], train_loss:0.037042 Epoch [1/10], Iter [2923/3125], train_loss:0.065629 Epoch [1/10], Iter [2924/3125], train_loss:0.022151 Epoch [1/10], Iter [2925/3125], train_loss:0.056095 Epoch [1/10], Iter [2926/3125], train_loss:0.034682 Epoch [1/10], Iter [2927/3125], train_loss:0.081066 Epoch [1/10], Iter [2928/3125], train_loss:0.038369 Epoch [1/10], Iter [2929/3125], train_loss:0.025391 Epoch [1/10], Iter [2930/3125], train_loss:0.043224 Epoch [1/10], Iter [2931/3125], train_loss:0.073949 Epoch [1/10], Iter [2932/3125], train_loss:0.062411 Epoch [1/10], Iter [2933/3125], train_loss:0.048195 Epoch [1/10], Iter [2934/3125], train_loss:0.041265 Epoch [1/10], Iter [2935/3125], train_loss:0.051641 Epoch [1/10], Iter [2936/3125], train_loss:0.051737 Epoch [1/10], Iter [2937/3125], train_loss:0.085035 Epoch [1/10], Iter [2938/3125], train_loss:0.041058 Epoch [1/10], Iter [2939/3125], train_loss:0.052639 Epoch [1/10], Iter [2940/3125], train_loss:0.067252 Epoch [1/10], Iter [2941/3125], train_loss:0.067398 Epoch [1/10], Iter [2942/3125], train_loss:0.035560 Epoch [1/10], Iter [2943/3125], train_loss:0.026009 Epoch [1/10], Iter [2944/3125], train_loss:0.028872 Epoch [1/10], Iter [2945/3125], train_loss:0.100868 Epoch [1/10], Iter [2946/3125], train_loss:0.073545 Epoch [1/10], Iter [2947/3125], train_loss:0.064018 Epoch [1/10], Iter [2948/3125], train_loss:0.038802 Epoch [1/10], Iter [2949/3125], train_loss:0.035678 Epoch [1/10], Iter [2950/3125], train_loss:0.057404 Epoch [1/10], Iter 
[2951/3125], train_loss:0.038700 Epoch [1/10], Iter [2952/3125], train_loss:0.066487 Epoch [1/10], Iter [2953/3125], train_loss:0.036224 Epoch [1/10], Iter [2954/3125], train_loss:0.049169 Epoch [1/10], Iter [2955/3125], train_loss:0.060712 Epoch [1/10], Iter [2956/3125], train_loss:0.054164 Epoch [1/10], Iter [2957/3125], train_loss:0.045852 Epoch [1/10], Iter [2958/3125], train_loss:0.046974 Epoch [1/10], Iter [2959/3125], train_loss:0.046566 Epoch [1/10], Iter [2960/3125], train_loss:0.029474 Epoch [1/10], Iter [2961/3125], train_loss:0.048267 Epoch [1/10], Iter [2962/3125], train_loss:0.093090 Epoch [1/10], Iter [2963/3125], train_loss:0.059621 Epoch [1/10], Iter [2964/3125], train_loss:0.053808 Epoch [1/10], Iter [2965/3125], train_loss:0.019410 Epoch [1/10], Iter [2966/3125], train_loss:0.080236 Epoch [1/10], Iter [2967/3125], train_loss:0.048073 Epoch [1/10], Iter [2968/3125], train_loss:0.045536 Epoch [1/10], Iter [2969/3125], train_loss:0.037549 Epoch [1/10], Iter [2970/3125], train_loss:0.077696 Epoch [1/10], Iter [2971/3125], train_loss:0.044552 Epoch [1/10], Iter [2972/3125], train_loss:0.028185 Epoch [1/10], Iter [2973/3125], train_loss:0.027866 Epoch [1/10], Iter [2974/3125], train_loss:0.047479 Epoch [1/10], Iter [2975/3125], train_loss:0.047819 Epoch [1/10], Iter [2976/3125], train_loss:0.040483 Epoch [1/10], Iter [2977/3125], train_loss:0.070177 Epoch [1/10], Iter [2978/3125], train_loss:0.021798 Epoch [1/10], Iter [2979/3125], train_loss:0.041524 Epoch [1/10], Iter [2980/3125], train_loss:0.038104 Epoch [1/10], Iter [2981/3125], train_loss:0.050260 Epoch [1/10], Iter [2982/3125], train_loss:0.047825 Epoch [1/10], Iter [2983/3125], train_loss:0.059096 Epoch [1/10], Iter [2984/3125], train_loss:0.036488 Epoch [1/10], Iter [2985/3125], train_loss:0.048905 Epoch [1/10], Iter [2986/3125], train_loss:0.092370 Epoch [1/10], Iter [2987/3125], train_loss:0.065375 Epoch [1/10], Iter [2988/3125], train_loss:0.050387 Epoch [1/10], Iter [2989/3125], 
train_loss:0.040478 Epoch [1/10], Iter [2990/3125], train_loss:0.070799 Epoch [1/10], Iter [2991/3125], train_loss:0.074366 Epoch [1/10], Iter [2992/3125], train_loss:0.035977 Epoch [1/10], Iter [2993/3125], train_loss:0.050263 Epoch [1/10], Iter [2994/3125], train_loss:0.038603 Epoch [1/10], Iter [2995/3125], train_loss:0.091508 Epoch [1/10], Iter [2996/3125], train_loss:0.041844 Epoch [1/10], Iter [2997/3125], train_loss:0.037022 Epoch [1/10], Iter [2998/3125], train_loss:0.035034 Epoch [1/10], Iter [2999/3125], train_loss:0.035311 Epoch [1/10], Iter [3000/3125], train_loss:0.027116 Epoch [1/10], Iter [3001/3125], train_loss:0.029279 Epoch [1/10], Iter [3002/3125], train_loss:0.033700 Epoch [1/10], Iter [3003/3125], train_loss:0.058413 Epoch [1/10], Iter [3004/3125], train_loss:0.023097 Epoch [1/10], Iter [3005/3125], train_loss:0.045443 Epoch [1/10], Iter [3006/3125], train_loss:0.029848 Epoch [1/10], Iter [3007/3125], train_loss:0.052713 Epoch [1/10], Iter [3008/3125], train_loss:0.035926 Epoch [1/10], Iter [3009/3125], train_loss:0.058838 Epoch [1/10], Iter [3010/3125], train_loss:0.056548 Epoch [1/10], Iter [3011/3125], train_loss:0.039738 Epoch [1/10], Iter [3012/3125], train_loss:0.053625 Epoch [1/10], Iter [3013/3125], train_loss:0.032034 Epoch [1/10], Iter [3014/3125], train_loss:0.099142 Epoch [1/10], Iter [3015/3125], train_loss:0.041366 Epoch [1/10], Iter [3016/3125], train_loss:0.041256 Epoch [1/10], Iter [3017/3125], train_loss:0.037890 Epoch [1/10], Iter [3018/3125], train_loss:0.051505 Epoch [1/10], Iter [3019/3125], train_loss:0.032262 Epoch [1/10], Iter [3020/3125], train_loss:0.108767 Epoch [1/10], Iter [3021/3125], train_loss:0.039950 Epoch [1/10], Iter [3022/3125], train_loss:0.074630 Epoch [1/10], Iter [3023/3125], train_loss:0.074800 Epoch [1/10], Iter [3024/3125], train_loss:0.068196 Epoch [1/10], Iter [3025/3125], train_loss:0.039287 Epoch [1/10], Iter [3026/3125], train_loss:0.052125 Epoch [1/10], Iter [3027/3125], train_loss:0.025400 
Epoch [1/10], Iter [3028/3125], train_loss:0.066438 Epoch [1/10], Iter [3029/3125], train_loss:0.038479 Epoch [1/10], Iter [3030/3125], train_loss:0.057109 Epoch [1/10], Iter [3031/3125], train_loss:0.034795 Epoch [1/10], Iter [3032/3125], train_loss:0.027901 Epoch [1/10], Iter [3033/3125], train_loss:0.050128 Epoch [1/10], Iter [3034/3125], train_loss:0.032854 Epoch [1/10], Iter [3035/3125], train_loss:0.053708 Epoch [1/10], Iter [3036/3125], train_loss:0.088014 Epoch [1/10], Iter [3037/3125], train_loss:0.075370 Epoch [1/10], Iter [3038/3125], train_loss:0.075677 Epoch [1/10], Iter [3039/3125], train_loss:0.063172 Epoch [1/10], Iter [3040/3125], train_loss:0.076501 Epoch [1/10], Iter [3041/3125], train_loss:0.058156 Epoch [1/10], Iter [3042/3125], train_loss:0.061623 Epoch [1/10], Iter [3043/3125], train_loss:0.066724 Epoch [1/10], Iter [3044/3125], train_loss:0.053383 Epoch [1/10], Iter [3045/3125], train_loss:0.050633 Epoch [1/10], Iter [3046/3125], train_loss:0.058951 Epoch [1/10], Iter [3047/3125], train_loss:0.042557 Epoch [1/10], Iter [3048/3125], train_loss:0.030441 Epoch [1/10], Iter [3049/3125], train_loss:0.024813 Epoch [1/10], Iter [3050/3125], train_loss:0.033426 Epoch [1/10], Iter [3051/3125], train_loss:0.055847 Epoch [1/10], Iter [3052/3125], train_loss:0.044011 Epoch [1/10], Iter [3053/3125], train_loss:0.027693 Epoch [1/10], Iter [3054/3125], train_loss:0.051109 Epoch [1/10], Iter [3055/3125], train_loss:0.040254 Epoch [1/10], Iter [3056/3125], train_loss:0.022783 Epoch [1/10], Iter [3057/3125], train_loss:0.052132 Epoch [1/10], Iter [3058/3125], train_loss:0.056355 Epoch [1/10], Iter [3059/3125], train_loss:0.058088 Epoch [1/10], Iter [3060/3125], train_loss:0.031884 Epoch [1/10], Iter [3061/3125], train_loss:0.049938 Epoch [1/10], Iter [3062/3125], train_loss:0.039419 Epoch [1/10], Iter [3063/3125], train_loss:0.083298 Epoch [1/10], Iter [3064/3125], train_loss:0.052872 Epoch [1/10], Iter [3065/3125], train_loss:0.035879 Epoch [1/10], Iter 
[3066/3125], train_loss:0.040194 Epoch [1/10], Iter [3067/3125], train_loss:0.053528 Epoch [1/10], Iter [3068/3125], train_loss:0.036000 Epoch [1/10], Iter [3069/3125], train_loss:0.039297 Epoch [1/10], Iter [3070/3125], train_loss:0.058124 Epoch [1/10], Iter [3071/3125], train_loss:0.032619 Epoch [1/10], Iter [3072/3125], train_loss:0.056250 Epoch [1/10], Iter [3073/3125], train_loss:0.053652 Epoch [1/10], Iter [3074/3125], train_loss:0.033999 Epoch [1/10], Iter [3075/3125], train_loss:0.041154 Epoch [1/10], Iter [3076/3125], train_loss:0.064491 Epoch [1/10], Iter [3077/3125], train_loss:0.051499 Epoch [1/10], Iter [3078/3125], train_loss:0.072850 Epoch [1/10], Iter [3079/3125], train_loss:0.074374 Epoch [1/10], Iter [3080/3125], train_loss:0.037571 Epoch [1/10], Iter [3081/3125], train_loss:0.043772 Epoch [1/10], Iter [3082/3125], train_loss:0.042835 Epoch [1/10], Iter [3083/3125], train_loss:0.049374 Epoch [1/10], Iter [3084/3125], train_loss:0.069075 Epoch [1/10], Iter [3085/3125], train_loss:0.028113 Epoch [1/10], Iter [3086/3125], train_loss:0.037884 Epoch [1/10], Iter [3087/3125], train_loss:0.050082 Epoch [1/10], Iter [3088/3125], train_loss:0.063452 Epoch [1/10], Iter [3089/3125], train_loss:0.053441 Epoch [1/10], Iter [3090/3125], train_loss:0.041038 Epoch [1/10], Iter [3091/3125], train_loss:0.059465 Epoch [1/10], Iter [3092/3125], train_loss:0.027648 Epoch [1/10], Iter [3093/3125], train_loss:0.034605 Epoch [1/10], Iter [3094/3125], train_loss:0.019859 Epoch [1/10], Iter [3095/3125], train_loss:0.031989 Epoch [1/10], Iter [3096/3125], train_loss:0.051489 Epoch [1/10], Iter [3097/3125], train_loss:0.056322 Epoch [1/10], Iter [3098/3125], train_loss:0.046863 Epoch [1/10], Iter [3099/3125], train_loss:0.047653 Epoch [1/10], Iter [3100/3125], train_loss:0.050260 Epoch [1/10], Iter [3101/3125], train_loss:0.080984 Epoch [1/10], Iter [3102/3125], train_loss:0.039387 Epoch [1/10], Iter [3103/3125], train_loss:0.029410 Epoch [1/10], Iter [3104/3125], 
train_loss:0.038941 Epoch [1/10], Iter [3105/3125], train_loss:0.043713 Epoch [1/10], Iter [3106/3125], train_loss:0.037539 Epoch [1/10], Iter [3107/3125], train_loss:0.025358 Epoch [1/10], Iter [3108/3125], train_loss:0.071836 Epoch [1/10], Iter [3109/3125], train_loss:0.056706 Epoch [1/10], Iter [3110/3125], train_loss:0.033099 Epoch [1/10], Iter [3111/3125], train_loss:0.037032 Epoch [1/10], Iter [3112/3125], train_loss:0.038965 Epoch [1/10], Iter [3113/3125], train_loss:0.041378 Epoch [1/10], Iter [3114/3125], train_loss:0.049832 Epoch [1/10], Iter [3115/3125], train_loss:0.044040 Epoch [1/10], Iter [3116/3125], train_loss:0.029385 Epoch [1/10], Iter [3117/3125], train_loss:0.059979 Epoch [1/10], Iter [3118/3125], train_loss:0.067147 Epoch [1/10], Iter [3119/3125], train_loss:0.057981 Epoch [1/10], Iter [3120/3125], train_loss:0.028045 Epoch [1/10], Iter [3121/3125], train_loss:0.042211 Epoch [1/10], Iter [3122/3125], train_loss:0.056431 Epoch [1/10], Iter [3123/3125], train_loss:0.044317 Epoch [1/10], Iter [3124/3125], train_loss:0.054007 Epoch [1/10], Iter [3125/3125], train_loss:0.042914

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_14844/2960384600.py in <module>
     40     test_total_correct = 0
     41     test_total_num = 0
---> 42     for iter,(images,labels) in enumerate(test_loader):
     43         images = images.to(device)
     44         labels = labels.to(device)

NameError: name 'test_loader' is not defined

2. Dynamically adjusting the learning rate

2.1 torch.optim.lr_scheduler

Problems with choosing a learning rate:

1. A learning rate that is too small greatly slows convergence and lengthens training time.
2. A learning rate that is too large can make the parameters oscillate back and forth around the optimum.

Both problems come from a learning-rate setting that does not meet the needs of model training. Solution: PyTorch provides schedulers.

The official torch.optim.lr_scheduler API provides the following dynamic learning-rate schedulers:

lr_scheduler.LambdaLR
lr_scheduler.MultiplicativeLR
lr_scheduler.StepLR
lr_scheduler.MultiStepLR
lr_scheduler.ExponentialLR
lr_scheduler.CosineAnnealingLR
lr_scheduler.ReduceLROnPlateau
lr_scheduler.CyclicLR
lr_scheduler.OneCycleLR
lr_scheduler.CosineAnnealingWarmRestarts

2.2 torch.optim.lr_scheduler.LambdaLR
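As a rough illustration of what this scheduler computes, the plain-Python sketch below mirrors the LambdaLR rule (learning rate in a given epoch = initial learning rate times a factor that depends on the epoch). It does not call PyTorch; the helper name lambda_lr_schedule is hypothetical, and the initial rate of 1e-4 and the factor 1 / (epoch + 1) are taken from the settings used elsewhere in this article.

```python
# Plain-Python sketch of the LambdaLR rule; no PyTorch call is made here.
# lambda_lr_schedule is a hypothetical helper, not part of any library.
def lambda_lr_schedule(init_lr, lr_lambda, num_epochs):
    """Return the learning rate for each epoch: init_lr * lr_lambda(epoch)."""
    return [init_lr * lr_lambda(epoch) for epoch in range(num_epochs)]


# Assumed settings: initial lr of 1e-4 and the factor 1 / (epoch + 1).
schedule = lambda_lr_schedule(1e-4, lambda epoch: 1 / (epoch + 1), 4)

for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch}: lr = {lr:.6e}")
```

The factor is always applied to the initial rate rather than to the previous epoch's rate, which is why the rate in epoch 3 here is a quarter of the initial rate.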
torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda, last_epoch=-1, verbose=False)

    # LambdaLR rule
    lr_lambda = f(epoch)
    new_lr = lr_lambda * init_lr

Idea: the initial learning rate is multiplied by a factor. Because the factor is always applied to the initial learning rate rather than to the previous one, it is usually written as a function of the epoch.

    # Pseudocode: a single lambda is applied to every parameter group.
    lambda1 = lambda epoch: 1 / (epoch + 1)
    scheduler = LambdaLR(optimizer, lr_lambda=lambda1)
    for epoch in range(100):
        train(...)
        validate(...)
        scheduler.step()

MultiplicativeLR

torch.optim.lr_scheduler.MultiplicativeLR(optimizer, lr_lambda, last_epoch=-1, verbose=False)

Unlike LambdaLR, this scheduler multiplies the previous learning rate by lr_lambda, so the lr_lambda function usually does not need to depend on the epoch.

    new_lr = lr_lambda * old_lr

2.2 Custom scheduler

What if none of the official dynamic learning-rate APIs meets our needs? We can write a custom function, adjust_learning_rate, that changes the lr value in each param_group:

1. None of the official APIs meets the requirement.
2. We implement the learning-rate adjustment ourselves in adjust_learning_rate.

    # Call the learning-rate function during training
    optimizer = torch.optim.SGD(model.parameters(), lr=args.lr, momentum=0.9)
    for epoch in range(10):
        train(...)
        validate(...)
        adjust_learning_rate(optimizer, epoch)

    # Step schedule: epochs are numbered from 0, so after every `size` (10) epochs
    # the learning rate is set to the initial rate times gamma (0.1) raised to the
    # number of completed segments.
    # learning_rate is the initial learning rate, read from the enclosing scope.
    def adjust_learning_rate(optim, epoch, size=10, gamma=0.1):
        if (epoch + 1) % size == 0:
            power = (epoch + 1) // size
            lr = learning_rate * np.power(gamma, power)
            for param_group in optim.param_groups:
                param_group["lr"] = lr

Code example: lr_scheduler.LambdaLR and adjust_learning_rate

    # Training and validation
    writer = SummaryWriter("../train_skills")
    # Define the loss function and optimizer
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    # Loss function
    criterion = nn.CrossEntropyLoss()
    # Optimizer
    optimizer = torch.optim.Adam(Resnet50.parameters(), lr=lr)

    # Custom scheduler
    scheduler_my = LambdaLR(optimizer, lr_lambda=lambda epoch: 1 / (epoch + 1), verbose=True)
    print("Initial learning rate:", optimizer.defaults["lr"])

    epoch = max_epochs
    Resnet50 = Resnet50.to(device)
    total_step = len(train_loader)
    train_all_loss = []
    test_all_loss = []

    for i in range(epoch):
        Resnet50.train()
        train_total_loss = 0
        train_total_num = 0
        train_total_correct = 0
        for iter, (images, labels) in enumerate(train_loader):
            images = images.to(device)
            labels = labels.to(device)
            outputs = Resnet50(images)
            loss = criterion(outputs, labels)
            train_total_correct += (outputs.argmax(1) == labels).sum().item()
            # backward pass
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            train_total_num += labels.shape[0]
            train_total_loss += loss.item()
            print("Epoch [{}/{}], Iter [{}/{}], train_loss:{:.6f}".format(
                i + 1, epoch, iter + 1, total_step, loss.item() / labels.shape[0]))

        # Log the learning rate used in this epoch (the original referenced an undefined name, optim)
        writer.add_scalar("lr", optimizer.param_groups[0]["lr"], i)
        # Report the current epoch index, not the total number of epochs
        print("Learning rate at epoch %d: %f" % (i + 1, optimizer.param_groups[0]["lr"]))
        scheduler_my.step()  # built-in scheduler
        # Custom lr adjustment (alternative to the scheduler above)
        # adjust_learning_rate(optimizer, i)

        Resnet50.eval()
        test_total_loss = 0
        test_total_correct = 0
        test_total_num = 0
        with torch.no_grad():  # no gradients are needed during evaluation
            for iter, (images, labels) in enumerate(test_loader):
                images = images.to(device)
                labels = labels.to(device)
                outputs = Resnet50(images)
                loss = criterion(outputs, labels)
                test_total_correct += (outputs.argmax(1) == labels).sum().item()
                test_total_loss += loss.item()
                test_total_num += labels.shape[0]
        print("Epoch [{}/{}], train_loss:{:.4f}, train_acc:{:.4f}%, test_loss:{:.4f}, test_acc:{:.4f}%".format(
            i + 1, epoch,
            train_total_loss / train_total_num, train_total_correct / train_total_num * 100,
            test_total_loss / test_total_num, test_total_correct / test_total_num * 100))
        train_all_loss.append(np.round(train_total_loss / train_total_num, 4))
        test_all_loss.append(np.round(test_total_loss / test_total_num, 4))
    writer.close()

Adjusting learning rate of group 0 to 1.0000e-04.
Initial learning rate: 0.0001
Epoch [1/2], Iter [1/3125], train_loss:0.777986 Epoch [1/2], Iter [2/3125], train_loss:0.662992 Epoch [1/2], Iter [3/3125], train_loss:0.767887 Epoch [1/2], Iter [4/3125], train_loss:0.748286 Epoch [1/2], Iter [5/3125], train_loss:0.686887 Epoch [1/2], Iter [6/3125], train_loss:0.675070 Epoch [1/2], Iter [7/3125], train_loss:0.655532 Epoch [1/2], Iter [8/3125], train_loss:0.713970 Epoch [1/2], Iter [9/3125], train_loss:0.675706 Epoch [1/2], Iter [10/3125], train_loss:0.665308 Epoch [1/2], Iter [11/3125], train_loss:0.670263 Epoch [1/2], Iter [12/3125], train_loss:0.597091 Epoch [1/2], Iter [13/3125], train_loss:0.541138 Epoch [1/2], Iter [14/3125], train_loss:0.471112 Epoch [1/2], Iter [15/3125], train_loss:0.570017 Epoch [1/2], Iter [16/3125], train_loss:0.569556 Epoch [1/2], Iter [17/3125], train_loss:0.552114 Epoch [1/2], Iter [18/3125], train_loss:0.569929 Epoch [1/2], Iter [19/3125], train_loss:0.524716 Epoch [1/2], Iter [20/3125], train_loss:0.522762 Epoch [1/2], Iter [21/3125], train_loss:0.499370 Epoch [1/2], Iter [22/3125], train_loss:0.459812 Epoch [1/2], Iter [23/3125], train_loss:0.407852 Epoch [1/2], Iter [24/3125], train_loss:0.472173 Epoch [1/2], Iter [25/3125], train_loss:0.370801 Epoch [1/2], Iter [26/3125], train_loss:0.459706 Epoch [1/2], Iter [27/3125], train_loss:0.403983 Epoch [1/2], Iter [28/3125], train_loss:0.372209 Epoch [1/2], Iter [29/3125], train_loss:0.357835 Epoch [1/2], Iter [30/3125], train_loss:0.501332 Epoch [1/2], Iter [31/3125], train_loss:0.354409 Epoch [1/2], Iter [32/3125], train_loss:0.352994 Epoch [1/2], Iter [33/3125], train_loss:0.359231 Epoch [1/2], Iter [34/3125], train_loss:0.378708 Epoch [1/2], Iter [35/3125], train_loss:0.445062 Epoch [1/2], Iter [36/3125], train_loss:0.345325 Epoch [1/2], Iter [37/3125], train_loss:0.290598 Epoch [1/2], Iter [38/3125], train_loss:0.355161 Epoch [1/2], Iter [39/3125], train_loss:0.295590 Epoch [1/2], Iter [40/3125], train_loss:0.269099 Epoch [1/2], Iter [41/3125], 
train_loss:0.339802 Epoch [1/2], Iter [42/3125], train_loss:0.251694 Epoch [1/2], Iter [43/3125], train_loss:0.328401 Epoch [1/2], Iter [44/3125], train_loss:0.257955 Epoch [1/2], Iter [45/3125], train_loss:0.325558 Epoch [1/2], Iter [46/3125], train_loss:0.342137 Epoch [1/2], Iter [47/3125], train_loss:0.259149 Epoch [1/2], Iter [48/3125], train_loss:0.249372 Epoch [1/2], Iter [49/3125], train_loss:0.257600 Epoch [1/2], Iter [50/3125], train_loss:0.289483 Epoch [1/2], Iter [51/3125], train_loss:0.301230 Epoch [1/2], Iter [52/3125], train_loss:0.217237 Epoch [1/2], Iter [53/3125], train_loss:0.279841 Epoch [1/2], Iter [54/3125], train_loss:0.261875 Epoch [1/2], Iter [55/3125], train_loss:0.216530 Epoch [1/2], Iter [56/3125], train_loss:0.279174 Epoch [1/2], Iter [57/3125], train_loss:0.188948 Epoch [1/2], Iter [58/3125], train_loss:0.207412 Epoch [1/2], Iter [59/3125], train_loss:0.239609 Epoch [1/2], Iter [60/3125], train_loss:0.195655 Epoch [1/2], Iter [61/3125], train_loss:0.196358 Epoch [1/2], Iter [62/3125], train_loss:0.264320 Epoch [1/2], Iter [63/3125], train_loss:0.193350 Epoch [1/2], Iter [64/3125], train_loss:0.165940 Epoch [1/2], Iter [65/3125], train_loss:0.267849 Epoch [1/2], Iter [66/3125], train_loss:0.221301 Epoch [1/2], Iter [67/3125], train_loss:0.269790 Epoch [1/2], Iter [68/3125], train_loss:0.227033 Epoch [1/2], Iter [69/3125], train_loss:0.156358 Epoch [1/2], Iter [70/3125], train_loss:0.210391 Epoch [1/2], Iter [71/3125], train_loss:0.251990 Epoch [1/2], Iter [72/3125], train_loss:0.177134 Epoch [1/2], Iter [73/3125], train_loss:0.155195 Epoch [1/2], Iter [74/3125], train_loss:0.251515 Epoch [1/2], Iter [75/3125], train_loss:0.159152 Epoch [1/2], Iter [76/3125], train_loss:0.166255 Epoch [1/2], Iter [77/3125], train_loss:0.115882 Epoch [1/2], Iter [78/3125], train_loss:0.175745 Epoch [1/2], Iter [79/3125], train_loss:0.138844 Epoch [1/2], Iter [80/3125], train_loss:0.176611 Epoch [1/2], Iter [81/3125], train_loss:0.161312 Epoch [1/2], Iter 
[82/3125], train_loss:0.148712 Epoch [1/2], Iter [83/3125], train_loss:0.207151 Epoch [1/2], Iter [84/3125], train_loss:0.111603 Epoch [1/2], Iter [85/3125], train_loss:0.107699 Epoch [1/2], Iter [86/3125], train_loss:0.162084 Epoch [1/2], Iter [87/3125], train_loss:0.199193 Epoch [1/2], Iter [88/3125], train_loss:0.138881 Epoch [1/2], Iter [89/3125], train_loss:0.161221 Epoch [1/2], Iter [90/3125], train_loss:0.149200 Epoch [1/2], Iter [91/3125], train_loss:0.151864 Epoch [1/2], Iter [92/3125], train_loss:0.201360 Epoch [1/2], Iter [93/3125], train_loss:0.169258 Epoch [1/2], Iter [94/3125], train_loss:0.149062 Epoch [1/2], Iter [95/3125], train_loss:0.149584 Epoch [1/2], Iter [96/3125], train_loss:0.145563 Epoch [1/2], Iter [97/3125], train_loss:0.126489 Epoch [1/2], Iter [98/3125], train_loss:0.139146 Epoch [1/2], Iter [99/3125], train_loss:0.138828 Epoch [1/2], Iter [100/3125], train_loss:0.133510 Epoch [1/2], Iter [101/3125], train_loss:0.137596 Epoch [1/2], Iter [102/3125], train_loss:0.130815 Epoch [1/2], Iter [103/3125], train_loss:0.156223 Epoch [1/2], Iter [104/3125], train_loss:0.101501 Epoch [1/2], Iter [105/3125], train_loss:0.119640 Epoch [1/2], Iter [106/3125], train_loss:0.145987 Epoch [1/2], Iter [107/3125], train_loss:0.182159 Epoch [1/2], Iter [108/3125], train_loss:0.134178 Epoch [1/2], Iter [109/3125], train_loss:0.125466 Epoch [1/2], Iter [110/3125], train_loss:0.136854 Epoch [1/2], Iter [111/3125], train_loss:0.114577 Epoch [1/2], Iter [112/3125], train_loss:0.176352 Epoch [1/2], Iter [113/3125], train_loss:0.114336 Epoch [1/2], Iter [114/3125], train_loss:0.132073 Epoch [1/2], Iter [115/3125], train_loss:0.132009 Epoch [1/2], Iter [116/3125], train_loss:0.138485 Epoch [1/2], Iter [117/3125], train_loss:0.131889 Epoch [1/2], Iter [118/3125], train_loss:0.127713 Epoch [1/2], Iter [119/3125], train_loss:0.136108 Epoch [1/2], Iter [120/3125], train_loss:0.099374 Epoch [1/2], Iter [121/3125], train_loss:0.177180 Epoch [1/2], Iter [122/3125], 
train_loss:0.133789 Epoch [1/2], Iter [123/3125], train_loss:0.108010 Epoch [1/2], Iter [124/3125], train_loss:0.124499 Epoch [1/2], Iter [125/3125], train_loss:0.145130 Epoch [1/2], Iter [126/3125], train_loss:0.139046 Epoch [1/2], Iter [127/3125], train_loss:0.162694 Epoch [1/2], Iter [128/3125], train_loss:0.106318 Epoch [1/2], Iter [129/3125], train_loss:0.136911 Epoch [1/2], Iter [130/3125], train_loss:0.161438 Epoch [1/2], Iter [131/3125], train_loss:0.116436 Epoch [1/2], Iter [132/3125], train_loss:0.145941 Epoch [1/2], Iter [133/3125], train_loss:0.114138 Epoch [1/2], Iter [134/3125], train_loss:0.167708 Epoch [1/2], Iter [135/3125], train_loss:0.137426 Epoch [1/2], Iter [136/3125], train_loss:0.181821 Epoch [1/2], Iter [137/3125], train_loss:0.126747 Epoch [1/2], Iter [138/3125], train_loss:0.161444 Epoch [1/2], Iter [139/3125], train_loss:0.137294 Epoch [1/2], Iter [140/3125], train_loss:0.140909 Epoch [1/2], Iter [141/3125], train_loss:0.127225 Epoch [1/2], Iter [142/3125], train_loss:0.086217 Epoch [1/2], Iter [143/3125], train_loss:0.125356 Epoch [1/2], Iter [144/3125], train_loss:0.152855 Epoch [1/2], Iter [145/3125], train_loss:0.182545 Epoch [1/2], Iter [146/3125], train_loss:0.076299 Epoch [1/2], Iter [147/3125], train_loss:0.154243 Epoch [1/2], Iter [148/3125], train_loss:0.101580 Epoch [1/2], Iter [149/3125], train_loss:0.136949 Epoch [1/2], Iter [150/3125], train_loss:0.137361 Epoch [1/2], Iter [151/3125], train_loss:0.119204 Epoch [1/2], Iter [152/3125], train_loss:0.126940 Epoch [1/2], Iter [153/3125], train_loss:0.127168 Epoch [1/2], Iter [154/3125], train_loss:0.132602 Epoch [1/2], Iter [155/3125], train_loss:0.112731 Epoch [1/2], Iter [156/3125], train_loss:0.128222 Epoch [1/2], Iter [157/3125], train_loss:0.112968 Epoch [1/2], Iter [158/3125], train_loss:0.106631 Epoch [1/2], Iter [159/3125], train_loss:0.131883 Epoch [1/2], Iter [160/3125], train_loss:0.105249 Epoch [1/2], Iter [161/3125], train_loss:0.148656 Epoch [1/2], Iter [162/3125], 
train_loss:0.115082 Epoch [1/2], Iter [163/3125], train_loss:0.099327 Epoch [1/2], Iter [164/3125], train_loss:0.131512 Epoch [1/2], Iter [165/3125], train_loss:0.121838 Epoch [1/2], Iter [166/3125], train_loss:0.122599 Epoch [1/2], Iter [167/3125], train_loss:0.108223 Epoch [1/2], Iter [168/3125], train_loss:0.157398 Epoch [1/2], Iter [169/3125], train_loss:0.112632 Epoch [1/2], Iter [170/3125], train_loss:0.092063 Epoch [1/2], Iter [171/3125], train_loss:0.092099 Epoch [1/2], Iter [172/3125], train_loss:0.143247 Epoch [1/2], Iter [173/3125], train_loss:0.107952 Epoch [1/2], Iter [174/3125], train_loss:0.150982 Epoch [1/2], Iter [175/3125], train_loss:0.154513 Epoch [1/2], Iter [176/3125], train_loss:0.122460 Epoch [1/2], Iter [177/3125], train_loss:0.130054 Epoch [1/2], Iter [178/3125], train_loss:0.075364 Epoch [1/2], Iter [179/3125], train_loss:0.092844 Epoch [1/2], Iter [180/3125], train_loss:0.131176 Epoch [1/2], Iter [181/3125], train_loss:0.089559 Epoch [1/2], Iter [182/3125], train_loss:0.137490 Epoch [1/2], Iter [183/3125], train_loss:0.148960 Epoch [1/2], Iter [184/3125], train_loss:0.088713 Epoch [1/2], Iter [185/3125], train_loss:0.098040 Epoch [1/2], Iter [186/3125], train_loss:0.159430 Epoch [1/2], Iter [187/3125], train_loss:0.091044 Epoch [1/2], Iter [188/3125], train_loss:0.108532 Epoch [1/2], Iter [189/3125], train_loss:0.089453 Epoch [1/2], Iter [190/3125], train_loss:0.112841 Epoch [1/2], Iter [191/3125], train_loss:0.150818 Epoch [1/2], Iter [192/3125], train_loss:0.112883 Epoch [1/2], Iter [193/3125], train_loss:0.124884 Epoch [1/2], Iter [194/3125], train_loss:0.107502 Epoch [1/2], Iter [195/3125], train_loss:0.099678 Epoch [1/2], Iter [196/3125], train_loss:0.183032 Epoch [1/2], Iter [197/3125], train_loss:0.111150 Epoch [1/2], Iter [198/3125], train_loss:0.136155 Epoch [1/2], Iter [199/3125], train_loss:0.113451 Epoch [1/2], Iter [200/3125], train_loss:0.144825 Epoch [1/2], Iter [201/3125], train_loss:0.133655 Epoch [1/2], Iter [202/3125], 
train_loss:0.111885 Epoch [1/2], Iter [203/3125], train_loss:0.111356 Epoch [1/2], Iter [204/3125], train_loss:0.107932 Epoch [1/2], Iter [205/3125], train_loss:0.143930 Epoch [1/2], Iter [206/3125], train_loss:0.097970 Epoch [1/2], Iter [207/3125], train_loss:0.088761 Epoch [1/2], Iter [208/3125], train_loss:0.131987 Epoch [1/2], Iter [209/3125], train_loss:0.135780 Epoch [1/2], Iter [210/3125], train_loss:0.096630 Epoch [1/2], Iter [211/3125], train_loss:0.128221 Epoch [1/2], Iter [212/3125], train_loss:0.155038 Epoch [1/2], Iter [213/3125], train_loss:0.099105 Epoch [1/2], Iter [214/3125], train_loss:0.111038 Epoch [1/2], Iter [215/3125], train_loss:0.142604 Epoch [1/2], Iter [216/3125], train_loss:0.145580 Epoch [1/2], Iter [217/3125], train_loss:0.111073 Epoch [1/2], Iter [218/3125], train_loss:0.128455 Epoch [1/2], Iter [219/3125], train_loss:0.096221 Epoch [1/2], Iter [220/3125], train_loss:0.086480 Epoch [1/2], Iter [221/3125], train_loss:0.115596 Epoch [1/2], Iter [222/3125], train_loss:0.093819 Epoch [1/2], Iter [223/3125], train_loss:0.068540 Epoch [1/2], Iter [224/3125], train_loss:0.105397 Epoch [1/2], Iter [225/3125], train_loss:0.081237 Epoch [1/2], Iter [226/3125], train_loss:0.127183 Epoch [1/2], Iter [227/3125], train_loss:0.133673 Epoch [1/2], Iter [228/3125], train_loss:0.102121 Epoch [1/2], Iter [229/3125], train_loss:0.124757 Epoch [1/2], Iter [230/3125], train_loss:0.124150 Epoch [1/2], Iter [231/3125], train_loss:0.109962 Epoch [1/2], Iter [232/3125], train_loss:0.121613 Epoch [1/2], Iter [233/3125], train_loss:0.122472 Epoch [1/2], Iter [234/3125], train_loss:0.093679 Epoch [1/2], Iter [235/3125], train_loss:0.104721 Epoch [1/2], Iter [236/3125], train_loss:0.102781 Epoch [1/2], Iter [237/3125], train_loss:0.093572 Epoch [1/2], Iter [238/3125], train_loss:0.094514 Epoch [1/2], Iter [239/3125], train_loss:0.099495 Epoch [1/2], Iter [240/3125], train_loss:0.106375 Epoch [1/2], Iter [241/3125], train_loss:0.111261 Epoch [1/2], Iter [242/3125], 
train_loss:0.089024 Epoch [1/2], Iter [243/3125], train_loss:0.107102 Epoch [1/2], Iter [244/3125], train_loss:0.098898 Epoch [1/2], Iter [245/3125], train_loss:0.105752 Epoch [1/2], Iter [246/3125], train_loss:0.098761 Epoch [1/2], Iter [247/3125], train_loss:0.110852 Epoch [1/2], Iter [248/3125], train_loss:0.110072 Epoch [1/2], Iter [249/3125], train_loss:0.106461 Epoch [1/2], Iter [250/3125], train_loss:0.123407 Epoch [1/2], Iter [251/3125], train_loss:0.092958 Epoch [1/2], Iter [252/3125], train_loss:0.111045 Epoch [1/2], Iter [253/3125], train_loss:0.129692 Epoch [1/2], Iter [254/3125], train_loss:0.096450 Epoch [1/2], Iter [255/3125], train_loss:0.084925 Epoch [1/2], Iter [256/3125], train_loss:0.141627 Epoch [1/2], Iter [257/3125], train_loss:0.088181 Epoch [1/2], Iter [258/3125], train_loss:0.110038 Epoch [1/2], Iter [259/3125], train_loss:0.132803 Epoch [1/2], Iter [260/3125], train_loss:0.098667 Epoch [1/2], Iter [261/3125], train_loss:0.085513 Epoch [1/2], Iter [262/3125], train_loss:0.121055 Epoch [1/2], Iter [263/3125], train_loss:0.099879 Epoch [1/2], Iter [264/3125], train_loss:0.149433 Epoch [1/2], Iter [265/3125], train_loss:0.116061 Epoch [1/2], Iter [266/3125], train_loss:0.090697 Epoch [1/2], Iter [267/3125], train_loss:0.087413 Epoch [1/2], Iter [268/3125], train_loss:0.146219 Epoch [1/2], Iter [269/3125], train_loss:0.097796 Epoch [1/2], Iter [270/3125], train_loss:0.088155 Epoch [1/2], Iter [271/3125], train_loss:0.107575 Epoch [1/2], Iter [272/3125], train_loss:0.101357 Epoch [1/2], Iter [273/3125], train_loss:0.090542 Epoch [1/2], Iter [274/3125], train_loss:0.092936 Epoch [1/2], Iter [275/3125], train_loss:0.107296 Epoch [1/2], Iter [276/3125], train_loss:0.078067 Epoch [1/2], Iter [277/3125], train_loss:0.099335 Epoch [1/2], Iter [278/3125], train_loss:0.118054 Epoch [1/2], Iter [279/3125], train_loss:0.098823 Epoch [1/2], Iter [280/3125], train_loss:0.100404 Epoch [1/2], Iter [281/3125], train_loss:0.116890 Epoch [1/2], Iter [282/3125], 
train_loss:0.083836 Epoch [1/2], Iter [283/3125], train_loss:0.134695 Epoch [1/2], Iter [284/3125], train_loss:0.092292 Epoch [1/2], Iter [285/3125], train_loss:0.089188 Epoch [1/2], Iter [286/3125], train_loss:0.103081 Epoch [1/2], Iter [287/3125], train_loss:0.127043 Epoch [1/2], Iter [288/3125], train_loss:0.116650 Epoch [1/2], Iter [289/3125], train_loss:0.121881 Epoch [1/2], Iter [290/3125], train_loss:0.186911 Epoch [1/2], Iter [291/3125], train_loss:0.126078 Epoch [1/2], Iter [292/3125], train_loss:0.091569 Epoch [1/2], Iter [293/3125], train_loss:0.079495 Epoch [1/2], Iter [294/3125], train_loss:0.099240 Epoch [1/2], Iter [295/3125], train_loss:0.118772 Epoch [1/2], Iter [296/3125], train_loss:0.093694 Epoch [1/2], Iter [297/3125], train_loss:0.108655 Epoch [1/2], Iter [298/3125], train_loss:0.095032 Epoch [1/2], Iter [299/3125], train_loss:0.111288 Epoch [1/2], Iter [300/3125], train_loss:0.098187 Epoch [1/2], Iter [301/3125], train_loss:0.097793 Epoch [1/2], Iter [302/3125], train_loss:0.096069 Epoch [1/2], Iter [303/3125], train_loss:0.098303 Epoch [1/2], Iter [304/3125], train_loss:0.053307 Epoch [1/2], Iter [305/3125], train_loss:0.089034 Epoch [1/2], Iter [306/3125], train_loss:0.079592 Epoch [1/2], Iter [307/3125], train_loss:0.127933 Epoch [1/2], Iter [308/3125], train_loss:0.098109 Epoch [1/2], Iter [309/3125], train_loss:0.064728 Epoch [1/2], Iter [310/3125], train_loss:0.173963 Epoch [1/2], Iter [311/3125], train_loss:0.076444 Epoch [1/2], Iter [312/3125], train_loss:0.104166 Epoch [1/2], Iter [313/3125], train_loss:0.098701 Epoch [1/2], Iter [314/3125], train_loss:0.080666 Epoch [1/2], Iter [315/3125], train_loss:0.114130 Epoch [1/2], Iter [316/3125], train_loss:0.077030 Epoch [1/2], Iter [317/3125], train_loss:0.118316 Epoch [1/2], Iter [318/3125], train_loss:0.057820 Epoch [1/2], Iter [319/3125], train_loss:0.126976 Epoch [1/2], Iter [320/3125], train_loss:0.071933 Epoch [1/2], Iter [321/3125], train_loss:0.090767 Epoch [1/2], Iter [322/3125], 
train_loss:0.090457 Epoch [1/2], Iter [323/3125], train_loss:0.105079 Epoch [1/2], Iter [324/3125], train_loss:0.101791 Epoch [1/2], Iter [325/3125], train_loss:0.106632 Epoch [1/2], Iter [326/3125], train_loss:0.087738 Epoch [1/2], Iter [327/3125], train_loss:0.082531 Epoch [1/2], Iter [328/3125], train_loss:0.123027 Epoch [1/2], Iter [329/3125], train_loss:0.089840 Epoch [1/2], Iter [330/3125], train_loss:0.123866 Epoch [1/2], Iter [331/3125], train_loss:0.139623 Epoch [1/2], Iter [332/3125], train_loss:0.097267 Epoch [1/2], Iter [333/3125], train_loss:0.087837 Epoch [1/2], Iter [334/3125], train_loss:0.079422 Epoch [1/2], Iter [335/3125], train_loss:0.085209 Epoch [1/2], Iter [336/3125], train_loss:0.147867 Epoch [1/2], Iter [337/3125], train_loss:0.149562 Epoch [1/2], Iter [338/3125], train_loss:0.107306 Epoch [1/2], Iter [339/3125], train_loss:0.114367 Epoch [1/2], Iter [340/3125], train_loss:0.075745 Epoch [1/2], Iter [341/3125], train_loss:0.081646 Epoch [1/2], Iter [342/3125], train_loss:0.114543 Epoch [1/2], Iter [343/3125], train_loss:0.107771 Epoch [1/2], Iter [344/3125], train_loss:0.091723 Epoch [1/2], Iter [345/3125], train_loss:0.085628 Epoch [1/2], Iter [346/3125], train_loss:0.069710 Epoch [1/2], Iter [347/3125], train_loss:0.080913 Epoch [1/2], Iter [348/3125], train_loss:0.078024 Epoch [1/2], Iter [349/3125], train_loss:0.132719 Epoch [1/2], Iter [350/3125], train_loss:0.119744 Epoch [1/2], Iter [351/3125], train_loss:0.116647 Epoch [1/2], Iter [352/3125], train_loss:0.109735 Epoch [1/2], Iter [353/3125], train_loss:0.081496 Epoch [1/2], Iter [354/3125], train_loss:0.073368 Epoch [1/2], Iter [355/3125], train_loss:0.111581 Epoch [1/2], Iter [356/3125], train_loss:0.075484 Epoch [1/2], Iter [357/3125], train_loss:0.072975 Epoch [1/2], Iter [358/3125], train_loss:0.062364 Epoch [1/2], Iter [359/3125], train_loss:0.076667 Epoch [1/2], Iter [360/3125], train_loss:0.080340 Epoch [1/2], Iter [361/3125], train_loss:0.063418 Epoch [1/2], Iter [362/3125], 
train_loss:0.061630 Epoch [1/2], Iter [363/3125], train_loss:0.062767 Epoch [1/2], Iter [364/3125], train_loss:0.084588 Epoch [1/2], Iter [365/3125], train_loss:0.105539 Epoch [1/2], Iter [366/3125], train_loss:0.071236 Epoch [1/2], Iter [367/3125], train_loss:0.087279 Epoch [1/2], Iter [368/3125], train_loss:0.076322 Epoch [1/2], Iter [369/3125], train_loss:0.116615 Epoch [1/2], Iter [370/3125], train_loss:0.100660 Epoch [1/2], Iter [371/3125], train_loss:0.099755 Epoch [1/2], Iter [372/3125], train_loss:0.114215 Epoch [1/2], Iter [373/3125], train_loss:0.112513 Epoch [1/2], Iter [374/3125], train_loss:0.101781 Epoch [1/2], Iter [375/3125], train_loss:0.067294 Epoch [1/2], Iter [376/3125], train_loss:0.098053 Epoch [1/2], Iter [377/3125], train_loss:0.107353 Epoch [1/2], Iter [378/3125], train_loss:0.081777 Epoch [1/2], Iter [379/3125], train_loss:0.080122 Epoch [1/2], Iter [380/3125], train_loss:0.107728 Epoch [1/2], Iter [381/3125], train_loss:0.095094 Epoch [1/2], Iter [382/3125], train_loss:0.083242 Epoch [1/2], Iter [383/3125], train_loss:0.102041 Epoch [1/2], Iter [384/3125], train_loss:0.072550 Epoch [1/2], Iter [385/3125], train_loss:0.088450 Epoch [1/2], Iter [386/3125], train_loss:0.092246 Epoch [1/2], Iter [387/3125], train_loss:0.105446 Epoch [1/2], Iter [388/3125], train_loss:0.127865 Epoch [1/2], Iter [389/3125], train_loss:0.072769 Epoch [1/2], Iter [390/3125], train_loss:0.073997 Epoch [1/2], Iter [391/3125], train_loss:0.066677 Epoch [1/2], Iter [392/3125], train_loss:0.102232 Epoch [1/2], Iter [393/3125], train_loss:0.117690 Epoch [1/2], Iter [394/3125], train_loss:0.084889 Epoch [1/2], Iter [395/3125], train_loss:0.103554 Epoch [1/2], Iter [396/3125], train_loss:0.073418 Epoch [1/2], Iter [397/3125], train_loss:0.096942 Epoch [1/2], Iter [398/3125], train_loss:0.089206 Epoch [1/2], Iter [399/3125], train_loss:0.126500 Epoch [1/2], Iter [400/3125], train_loss:0.119990 Epoch [1/2], Iter [401/3125], train_loss:0.065327 Epoch [1/2], Iter [402/3125], 
train_loss:0.127086 Epoch [1/2], Iter [403/3125], train_loss:0.089086 Epoch [1/2], Iter [404/3125], train_loss:0.088689 Epoch [1/2], Iter [405/3125], train_loss:0.118437 Epoch [1/2], Iter [406/3125], train_loss:0.111353 Epoch [1/2], Iter [407/3125], train_loss:0.128636 Epoch [1/2], Iter [408/3125], train_loss:0.104118 Epoch [1/2], Iter [409/3125], train_loss:0.090673 Epoch [1/2], Iter [410/3125], train_loss:0.125681 Epoch [1/2], Iter [411/3125], train_loss:0.115205 Epoch [1/2], Iter [412/3125], train_loss:0.077153 Epoch [1/2], Iter [413/3125], train_loss:0.094824 Epoch [1/2], Iter [414/3125], train_loss:0.098783 Epoch [1/2], Iter [415/3125], train_loss:0.087345 Epoch [1/2], Iter [416/3125], train_loss:0.097017 Epoch [1/2], Iter [417/3125], train_loss:0.096015 Epoch [1/2], Iter [418/3125], train_loss:0.075332 Epoch [1/2], Iter [419/3125], train_loss:0.084599 Epoch [1/2], Iter [420/3125], train_loss:0.111044 Epoch [1/2], Iter [421/3125], train_loss:0.093526 Epoch [1/2], Iter [422/3125], train_loss:0.063629 Epoch [1/2], Iter [423/3125], train_loss:0.067428 Epoch [1/2], Iter [424/3125], train_loss:0.079753 Epoch [1/2], Iter [425/3125], train_loss:0.135439 Epoch [1/2], Iter [426/3125], train_loss:0.112857 Epoch [1/2], Iter [427/3125], train_loss:0.074499 Epoch [1/2], Iter [428/3125], train_loss:0.052821 Epoch [1/2], Iter [429/3125], train_loss:0.075851 Epoch [1/2], Iter [430/3125], train_loss:0.104684 Epoch [1/2], Iter [431/3125], train_loss:0.102066 Epoch [1/2], Iter [432/3125], train_loss:0.083621 Epoch [1/2], Iter [433/3125], train_loss:0.064658 Epoch [1/2], Iter [434/3125], train_loss:0.111376 Epoch [1/2], Iter [435/3125], train_loss:0.055758 Epoch [1/2], Iter [436/3125], train_loss:0.128865 Epoch [1/2], Iter [437/3125], train_loss:0.100289 Epoch [1/2], Iter [438/3125], train_loss:0.084247 Epoch [1/2], Iter [439/3125], train_loss:0.073448 Epoch [1/2], Iter [440/3125], train_loss:0.080761 Epoch [1/2], Iter [441/3125], train_loss:0.119340 Epoch [1/2], Iter [442/3125], 
train_loss:0.173922 Epoch [1/2], Iter [443/3125], train_loss:0.067979 Epoch [1/2], Iter [444/3125], train_loss:0.080348 Epoch [1/2], Iter [445/3125], train_loss:0.132988 Epoch [1/2], Iter [446/3125], train_loss:0.069152 Epoch [1/2], Iter [447/3125], train_loss:0.084873 Epoch [1/2], Iter [448/3125], train_loss:0.088424 Epoch [1/2], Iter [449/3125], train_loss:0.094467 Epoch [1/2], Iter [450/3125], train_loss:0.111121 Epoch [1/2], Iter [451/3125], train_loss:0.067928 Epoch [1/2], Iter [452/3125], train_loss:0.065471 Epoch [1/2], Iter [453/3125], train_loss:0.075276 Epoch [1/2], Iter [454/3125], train_loss:0.076016 Epoch [1/2], Iter [455/3125], train_loss:0.088840 Epoch [1/2], Iter [456/3125], train_loss:0.061118 Epoch [1/2], Iter [457/3125], train_loss:0.079531 Epoch [1/2], Iter [458/3125], train_loss:0.122364 Epoch [1/2], Iter [459/3125], train_loss:0.100249 Epoch [1/2], Iter [460/3125], train_loss:0.073599 Epoch [1/2], Iter [461/3125], train_loss:0.084068 Epoch [1/2], Iter [462/3125], train_loss:0.056314 Epoch [1/2], Iter [463/3125], train_loss:0.079495 Epoch [1/2], Iter [464/3125], train_loss:0.076411 Epoch [1/2], Iter [465/3125], train_loss:0.130830 Epoch [1/2], Iter [466/3125], train_loss:0.086917 Epoch [1/2], Iter [467/3125], train_loss:0.093509 Epoch [1/2], Iter [468/3125], train_loss:0.084006 Epoch [1/2], Iter [469/3125], train_loss:0.070421 Epoch [1/2], Iter [470/3125], train_loss:0.107369 Epoch [1/2], Iter [471/3125], train_loss:0.065467 Epoch [1/2], Iter [472/3125], train_loss:0.069032 Epoch [1/2], Iter [473/3125], train_loss:0.073237 Epoch [1/2], Iter [474/3125], train_loss:0.151757 Epoch [1/2], Iter [475/3125], train_loss:0.097692 Epoch [1/2], Iter [476/3125], train_loss:0.100925 Epoch [1/2], Iter [477/3125], train_loss:0.091285 Epoch [1/2], Iter [478/3125], train_loss:0.103061 Epoch [1/2], Iter [479/3125], train_loss:0.064359 Epoch [1/2], Iter [480/3125], train_loss:0.082491 Epoch [1/2], Iter [481/3125], train_loss:0.057366 Epoch [1/2], Iter [482/3125], 
train_loss:0.092543 Epoch [1/2], Iter [483/3125], train_loss:0.067777 Epoch [1/2], Iter [484/3125], train_loss:0.067935 Epoch [1/2], Iter [485/3125], train_loss:0.105495 Epoch [1/2], Iter [486/3125], train_loss:0.136604 Epoch [1/2], Iter [487/3125], train_loss:0.092469 Epoch [1/2], Iter [488/3125], train_loss:0.082614 Epoch [1/2], Iter [489/3125], train_loss:0.122642 Epoch [1/2], Iter [490/3125], train_loss:0.064453 Epoch [1/2], Iter [491/3125], train_loss:0.127374 Epoch [1/2], Iter [492/3125], train_loss:0.090427 Epoch [1/2], Iter [493/3125], train_loss:0.076251 Epoch [1/2], Iter [494/3125], train_loss:0.061046 Epoch [1/2], Iter [495/3125], train_loss:0.103997 Epoch [1/2], Iter [496/3125], train_loss:0.109734 Epoch [1/2], Iter [497/3125], train_loss:0.070913 Epoch [1/2], Iter [498/3125], train_loss:0.069599 Epoch [1/2], Iter [499/3125], train_loss:0.078603 Epoch [1/2], Iter [500/3125], train_loss:0.133940 Epoch [1/2], Iter [501/3125], train_loss:0.072970 Epoch [1/2], Iter [502/3125], train_loss:0.075337 Epoch [1/2], Iter [503/3125], train_loss:0.094221 Epoch [1/2], Iter [504/3125], train_loss:0.091344 Epoch [1/2], Iter [505/3125], train_loss:0.085541 Epoch [1/2], Iter [506/3125], train_loss:0.089418 Epoch [1/2], Iter [507/3125], train_loss:0.066250 Epoch [1/2], Iter [508/3125], train_loss:0.112804 Epoch [1/2], Iter [509/3125], train_loss:0.084062 Epoch [1/2], Iter [510/3125], train_loss:0.087550 Epoch [1/2], Iter [511/3125], train_loss:0.073422 Epoch [1/2], Iter [512/3125], train_loss:0.089989 Epoch [1/2], Iter [513/3125], train_loss:0.056597 Epoch [1/2], Iter [514/3125], train_loss:0.084649 Epoch [1/2], Iter [515/3125], train_loss:0.095353 Epoch [1/2], Iter [516/3125], train_loss:0.057524 Epoch [1/2], Iter [517/3125], train_loss:0.086105 Epoch [1/2], Iter [518/3125], train_loss:0.100302 Epoch [1/2], Iter [519/3125], train_loss:0.085303 Epoch [1/2], Iter [520/3125], train_loss:0.097001 Epoch [1/2], Iter [521/3125], train_loss:0.078477 Epoch [1/2], Iter [522/3125], 
train_loss:0.118421 Epoch [1/2], Iter [523/3125], train_loss:0.094699 Epoch [1/2], Iter [524/3125], train_loss:0.081237 Epoch [1/2], Iter [525/3125], train_loss:0.082480 Epoch [1/2], Iter [526/3125], train_loss:0.082260 Epoch [1/2], Iter [527/3125], train_loss:0.088543 Epoch [1/2], Iter [528/3125], train_loss:0.072576 Epoch [1/2], Iter [529/3125], train_loss:0.095206 Epoch [1/2], Iter [530/3125], train_loss:0.076497 Epoch [1/2], Iter [531/3125], train_loss:0.051827 Epoch [1/2], Iter [532/3125], train_loss:0.051135 Epoch [1/2], Iter [533/3125], train_loss:0.088031 Epoch [1/2], Iter [534/3125], train_loss:0.111677 Epoch [1/2], Iter [535/3125], train_loss:0.070332 Epoch [1/2], Iter [536/3125], train_loss:0.084658 Epoch [1/2], Iter [537/3125], train_loss:0.099877 Epoch [1/2], Iter [538/3125], train_loss:0.083049 Epoch [1/2], Iter [539/3125], train_loss:0.080456 Epoch [1/2], Iter [540/3125], train_loss:0.060653 Epoch [1/2], Iter [541/3125], train_loss:0.126004 Epoch [1/2], Iter [542/3125], train_loss:0.089957 Epoch [1/2], Iter [543/3125], train_loss:0.097005 Epoch [1/2], Iter [544/3125], train_loss:0.098928 Epoch [1/2], Iter [545/3125], train_loss:0.050157 Epoch [1/2], Iter [546/3125], train_loss:0.068912 Epoch [1/2], Iter [547/3125], train_loss:0.105661 Epoch [1/2], Iter [548/3125], train_loss:0.063028 Epoch [1/2], Iter [549/3125], train_loss:0.101849 Epoch [1/2], Iter [550/3125], train_loss:0.087718 Epoch [1/2], Iter [551/3125], train_loss:0.085455 Epoch [1/2], Iter [552/3125], train_loss:0.101876 Epoch [1/2], Iter [553/3125], train_loss:0.069947 Epoch [1/2], Iter [554/3125], train_loss:0.082198 Epoch [1/2], Iter [555/3125], train_loss:0.078910 Epoch [1/2], Iter [556/3125], train_loss:0.071619 Epoch [1/2], Iter [557/3125], train_loss:0.091170 Epoch [1/2], Iter [558/3125], train_loss:0.073899 Epoch [1/2], Iter [559/3125], train_loss:0.097393 Epoch [1/2], Iter [560/3125], train_loss:0.059482 Epoch [1/2], Iter [561/3125], train_loss:0.086727 Epoch [1/2], Iter [562/3125], 
train_loss:0.067922 Epoch [1/2], Iter [563/3125], train_loss:0.049343 Epoch [1/2], Iter [564/3125], train_loss:0.079434 Epoch [1/2], Iter [565/3125], train_loss:0.082183 Epoch [1/2], Iter [566/3125], train_loss:0.093476 Epoch [1/2], Iter [567/3125], train_loss:0.078752 Epoch [1/2], Iter [568/3125], train_loss:0.091465 Epoch [1/2], Iter [569/3125], train_loss:0.089662 Epoch [1/2], Iter [570/3125], train_loss:0.080252 Epoch [1/2], Iter [571/3125], train_loss:0.068077 Epoch [1/2], Iter [572/3125], train_loss:0.061509 Epoch [1/2], Iter [573/3125], train_loss:0.085185 Epoch [1/2], Iter [574/3125], train_loss:0.079471 Epoch [1/2], Iter [575/3125], train_loss:0.053422 Epoch [1/2], Iter [576/3125], train_loss:0.077580 Epoch [1/2], Iter [577/3125], train_loss:0.097711 Epoch [1/2], Iter [578/3125], train_loss:0.088529 Epoch [1/2], Iter [579/3125], train_loss:0.078072 Epoch [1/2], Iter [580/3125], train_loss:0.066475 Epoch [1/2], Iter [581/3125], train_loss:0.100759 Epoch [1/2], Iter [582/3125], train_loss:0.059701 Epoch [1/2], Iter [583/3125], train_loss:0.109780 Epoch [1/2], Iter [584/3125], train_loss:0.091762 Epoch [1/2], Iter [585/3125], train_loss:0.092769 Epoch [1/2], Iter [586/3125], train_loss:0.087646 Epoch [1/2], Iter [587/3125], train_loss:0.077475 Epoch [1/2], Iter [588/3125], train_loss:0.082140 Epoch [1/2], Iter [589/3125], train_loss:0.064143 Epoch [1/2], Iter [590/3125], train_loss:0.118475 Epoch [1/2], Iter [591/3125], train_loss:0.061369 Epoch [1/2], Iter [592/3125], train_loss:0.103518 Epoch [1/2], Iter [593/3125], train_loss:0.109588 Epoch [1/2], Iter [594/3125], train_loss:0.075540 Epoch [1/2], Iter [595/3125], train_loss:0.066279 Epoch [1/2], Iter [596/3125], train_loss:0.084220 Epoch [1/2], Iter [597/3125], train_loss:0.093858 Epoch [1/2], Iter [598/3125], train_loss:0.064187 Epoch [1/2], Iter [599/3125], train_loss:0.066326 Epoch [1/2], Iter [600/3125], train_loss:0.081327 Epoch [1/2], Iter [601/3125], train_loss:0.083892 Epoch [1/2], Iter [602/3125], 
train_loss:0.072193 Epoch [1/2], Iter [603/3125], train_loss:0.070572 Epoch [1/2], Iter [604/3125], train_loss:0.099174 Epoch [1/2], Iter [605/3125], train_loss:0.073340 Epoch [1/2], Iter [606/3125], train_loss:0.075066 Epoch [1/2], Iter [607/3125], train_loss:0.089540 Epoch [1/2], Iter [608/3125], train_loss:0.087063 Epoch [1/2], Iter [609/3125], train_loss:0.067917 Epoch [1/2], Iter [610/3125], train_loss:0.078777 Epoch [1/2], Iter [611/3125], train_loss:0.073020 Epoch [1/2], Iter [612/3125], train_loss:0.053916 Epoch [1/2], Iter [613/3125], train_loss:0.099749 Epoch [1/2], Iter [614/3125], train_loss:0.076472 Epoch [1/2], Iter [615/3125], train_loss:0.092774 Epoch [1/2], Iter [616/3125], train_loss:0.072519 Epoch [1/2], Iter [617/3125], train_loss:0.115796 Epoch [1/2], Iter [618/3125], train_loss:0.111423 Epoch [1/2], Iter [619/3125], train_loss:0.035930 Epoch [1/2], Iter [620/3125], train_loss:0.053881 Epoch [1/2], Iter [621/3125], train_loss:0.121114 Epoch [1/2], Iter [622/3125], train_loss:0.121951 Epoch [1/2], Iter [623/3125], train_loss:0.073308 Epoch [1/2], Iter [624/3125], train_loss:0.048398 Epoch [1/2], Iter [625/3125], train_loss:0.107412 Epoch [1/2], Iter [626/3125], train_loss:0.068145 Epoch [1/2], Iter [627/3125], train_loss:0.077340 Epoch [1/2], Iter [628/3125], train_loss:0.085913 Epoch [1/2], Iter [629/3125], train_loss:0.085568 Epoch [1/2], Iter [630/3125], train_loss:0.075331 Epoch [1/2], Iter [631/3125], train_loss:0.063729 Epoch [1/2], Iter [632/3125], train_loss:0.096395 Epoch [1/2], Iter [633/3125], train_loss:0.091692 Epoch [1/2], Iter [634/3125], train_loss:0.087556 Epoch [1/2], Iter [635/3125], train_loss:0.128987 Epoch [1/2], Iter [636/3125], train_loss:0.078282 Epoch [1/2], Iter [637/3125], train_loss:0.072686 Epoch [1/2], Iter [638/3125], train_loss:0.101055 Epoch [1/2], Iter [639/3125], train_loss:0.088135 Epoch [1/2], Iter [640/3125], train_loss:0.076548 Epoch [1/2], Iter [641/3125], train_loss:0.074535 Epoch [1/2], Iter [642/3125], 
train_loss:0.133764 Epoch [1/2], Iter [643/3125], train_loss:0.081785 Epoch [1/2], Iter [644/3125], train_loss:0.081873 Epoch [1/2], Iter [645/3125], train_loss:0.052027 Epoch [1/2], Iter [646/3125], train_loss:0.065710 Epoch [1/2], Iter [647/3125], train_loss:0.066639 Epoch [1/2], Iter [648/3125], train_loss:0.077497 Epoch [1/2], Iter [649/3125], train_loss:0.071994 Epoch [1/2], Iter [650/3125], train_loss:0.077160 Epoch [1/2], Iter [651/3125], train_loss:0.088668 Epoch [1/2], Iter [652/3125], train_loss:0.091575 Epoch [1/2], Iter [653/3125], train_loss:0.063036 Epoch [1/2], Iter [654/3125], train_loss:0.077080 Epoch [1/2], Iter [655/3125], train_loss:0.120097 Epoch [1/2], Iter [656/3125], train_loss:0.057079 Epoch [1/2], Iter [657/3125], train_loss:0.078749 Epoch [1/2], Iter [658/3125], train_loss:0.080975 Epoch [1/2], Iter [659/3125], train_loss:0.084412 Epoch [1/2], Iter [660/3125], train_loss:0.081507 Epoch [1/2], Iter [661/3125], train_loss:0.106032 Epoch [1/2], Iter [662/3125], train_loss:0.044990 Epoch [1/2], Iter [663/3125], train_loss:0.071733 Epoch [1/2], Iter [664/3125], train_loss:0.068678 Epoch [1/2], Iter [665/3125], train_loss:0.060852 Epoch [1/2], Iter [666/3125], train_loss:0.061496 Epoch [1/2], Iter [667/3125], train_loss:0.099616 Epoch [1/2], Iter [668/3125], train_loss:0.043187 Epoch [1/2], Iter [669/3125], train_loss:0.042735 Epoch [1/2], Iter [670/3125], train_loss:0.063698 Epoch [1/2], Iter [671/3125], train_loss:0.054137 Epoch [1/2], Iter [672/3125], train_loss:0.122349 Epoch [1/2], Iter [673/3125], train_loss:0.045259 Epoch [1/2], Iter [674/3125], train_loss:0.096469 Epoch [1/2], Iter [675/3125], train_loss:0.058725 Epoch [1/2], Iter [676/3125], train_loss:0.092602 Epoch [1/2], Iter [677/3125], train_loss:0.066935 Epoch [1/2], Iter [678/3125], train_loss:0.077298 Epoch [1/2], Iter [679/3125], train_loss:0.110552 Epoch [1/2], Iter [680/3125], train_loss:0.048738 Epoch [1/2], Iter [681/3125], train_loss:0.096448 Epoch [1/2], Iter [682/3125], 
train_loss:0.110349 Epoch [1/2], Iter [683/3125], train_loss:0.119194 Epoch [1/2], Iter [684/3125], train_loss:0.078200 Epoch [1/2], Iter [685/3125], train_loss:0.090346 Epoch [1/2], Iter [686/3125], train_loss:0.067279 Epoch [1/2], Iter [687/3125], train_loss:0.056750 Epoch [1/2], Iter [688/3125], train_loss:0.103682 Epoch [1/2], Iter [689/3125], train_loss:0.070194 Epoch [1/2], Iter [690/3125], train_loss:0.077888 Epoch [1/2], Iter [691/3125], train_loss:0.089339 Epoch [1/2], Iter [692/3125], train_loss:0.069433 Epoch [1/2], Iter [693/3125], train_loss:0.062627 Epoch [1/2], Iter [694/3125], train_loss:0.088834 Epoch [1/2], Iter [695/3125], train_loss:0.057176 Epoch [1/2], Iter [696/3125], train_loss:0.062857 Epoch [1/2], Iter [697/3125], train_loss:0.107247 Epoch [1/2], Iter [698/3125], train_loss:0.075563 Epoch [1/2], Iter [699/3125], train_loss:0.075217 Epoch [1/2], Iter [700/3125], train_loss:0.073498 Epoch [1/2], Iter [701/3125], train_loss:0.084294 Epoch [1/2], Iter [702/3125], train_loss:0.055456 Epoch [1/2], Iter [703/3125], train_loss:0.101781 Epoch [1/2], Iter [704/3125], train_loss:0.102988 Epoch [1/2], Iter [705/3125], train_loss:0.090018 Epoch [1/2], Iter [706/3125], train_loss:0.071555 Epoch [1/2], Iter [707/3125], train_loss:0.066634 Epoch [1/2], Iter [708/3125], train_loss:0.075814 Epoch [1/2], Iter [709/3125], train_loss:0.077288 Epoch [1/2], Iter [710/3125], train_loss:0.104503 Epoch [1/2], Iter [711/3125], train_loss:0.067886 Epoch [1/2], Iter [712/3125], train_loss:0.079606 Epoch [1/2], Iter [713/3125], train_loss:0.071527 Epoch [1/2], Iter [714/3125], train_loss:0.085514 Epoch [1/2], Iter [715/3125], train_loss:0.057681 Epoch [1/2], Iter [716/3125], train_loss:0.078999 Epoch [1/2], Iter [717/3125], train_loss:0.071168 Epoch [1/2], Iter [718/3125], train_loss:0.089825 Epoch [1/2], Iter [719/3125], train_loss:0.045149 Epoch [1/2], Iter [720/3125], train_loss:0.084063 Epoch [1/2], Iter [721/3125], train_loss:0.066844 Epoch [1/2], Iter [722/3125], 
train_loss:0.111551 Epoch [1/2], Iter [723/3125], train_loss:0.090148 Epoch [1/2], Iter [724/3125], train_loss:0.088762 Epoch [1/2], Iter [725/3125], train_loss:0.053935 Epoch [1/2], Iter [726/3125], train_loss:0.097556 Epoch [1/2], Iter [727/3125], train_loss:0.057640 Epoch [1/2], Iter [728/3125], train_loss:0.099852 Epoch [1/2], Iter [729/3125], train_loss:0.072951 Epoch [1/2], Iter [730/3125], train_loss:0.086131 Epoch [1/2], Iter [731/3125], train_loss:0.076418 Epoch [1/2], Iter [732/3125], train_loss:0.093934 Epoch [1/2], Iter [733/3125], train_loss:0.086792 Epoch [1/2], Iter [734/3125], train_loss:0.076435 Epoch [1/2], Iter [735/3125], train_loss:0.098343 Epoch [1/2], Iter [736/3125], train_loss:0.064591 Epoch [1/2], Iter [737/3125], train_loss:0.136798 Epoch [1/2], Iter [738/3125], train_loss:0.086149 Epoch [1/2], Iter [739/3125], train_loss:0.071737 Epoch [1/2], Iter [740/3125], train_loss:0.064806 Epoch [1/2], Iter [741/3125], train_loss:0.080049 Epoch [1/2], Iter [742/3125], train_loss:0.096013 Epoch [1/2], Iter [743/3125], train_loss:0.060116 Epoch [1/2], Iter [744/3125], train_loss:0.067535 Epoch [1/2], Iter [745/3125], train_loss:0.093100 Epoch [1/2], Iter [746/3125], train_loss:0.072566 Epoch [1/2], Iter [747/3125], train_loss:0.103533 Epoch [1/2], Iter [748/3125], train_loss:0.083829 Epoch [1/2], Iter [749/3125], train_loss:0.058632 Epoch [1/2], Iter [750/3125], train_loss:0.063049 Epoch [1/2], Iter [751/3125], train_loss:0.072190 Epoch [1/2], Iter [752/3125], train_loss:0.081107 Epoch [1/2], Iter [753/3125], train_loss:0.073657 Epoch [1/2], Iter [754/3125], train_loss:0.063324 Epoch [1/2], Iter [755/3125], train_loss:0.061974 Epoch [1/2], Iter [756/3125], train_loss:0.064494 Epoch [1/2], Iter [757/3125], train_loss:0.077813 Epoch [1/2], Iter [758/3125], train_loss:0.070678 Epoch [1/2], Iter [759/3125], train_loss:0.062416 Epoch [1/2], Iter [760/3125], train_loss:0.062071 Epoch [1/2], Iter [761/3125], train_loss:0.030896 Epoch [1/2], Iter [762/3125], 
train_loss:0.054023 Epoch [1/2], Iter [763/3125], train_loss:0.123419 Epoch [1/2], Iter [764/3125], train_loss:0.080511 Epoch [1/2], Iter [765/3125], train_loss:0.088166 Epoch [1/2], Iter [766/3125], train_loss:0.044754 Epoch [1/2], Iter [767/3125], train_loss:0.065380 Epoch [1/2], Iter [768/3125], train_loss:0.062831 Epoch [1/2], Iter [769/3125], train_loss:0.082807 Epoch [1/2], Iter [770/3125], train_loss:0.106045 Epoch [1/2], Iter [771/3125], train_loss:0.039265 Epoch [1/2], Iter [772/3125], train_loss:0.040538 Epoch [1/2], Iter [773/3125], train_loss:0.064032 Epoch [1/2], Iter [774/3125], train_loss:0.098438 Epoch [1/2], Iter [775/3125], train_loss:0.044762 Epoch [1/2], Iter [776/3125], train_loss:0.059482 Epoch [1/2], Iter [777/3125], train_loss:0.071769 Epoch [1/2], Iter [778/3125], train_loss:0.081381 Epoch [1/2], Iter [779/3125], train_loss:0.077327 Epoch [1/2], Iter [780/3125], train_loss:0.062736 Epoch [1/2], Iter [781/3125], train_loss:0.093462 Epoch [1/2], Iter [782/3125], train_loss:0.072988 Epoch [1/2], Iter [783/3125], train_loss:0.060638 Epoch [1/2], Iter [784/3125], train_loss:0.093783 Epoch [1/2], Iter [785/3125], train_loss:0.071993 Epoch [1/2], Iter [786/3125], train_loss:0.100763 Epoch [1/2], Iter [787/3125], train_loss:0.072992 Epoch [1/2], Iter [788/3125], train_loss:0.092503 Epoch [1/2], Iter [789/3125], train_loss:0.087834 Epoch [1/2], Iter [790/3125], train_loss:0.112599 Epoch [1/2], Iter [791/3125], train_loss:0.078161 Epoch [1/2], Iter [792/3125], train_loss:0.080000 Epoch [1/2], Iter [793/3125], train_loss:0.043560 Epoch [1/2], Iter [794/3125], train_loss:0.080028 Epoch [1/2], Iter [795/3125], train_loss:0.104163 Epoch [1/2], Iter [796/3125], train_loss:0.064733 Epoch [1/2], Iter [797/3125], train_loss:0.051298 Epoch [1/2], Iter [798/3125], train_loss:0.069372 Epoch [1/2], Iter [799/3125], train_loss:0.044411 Epoch [1/2], Iter [800/3125], train_loss:0.071995 Epoch [1/2], Iter [801/3125], train_loss:0.058943 Epoch [1/2], Iter [802/3125], 
train_loss:0.075079 Epoch [1/2], Iter [803/3125], train_loss:0.065944 Epoch [1/2], Iter [804/3125], train_loss:0.054138 Epoch [1/2], Iter [805/3125], train_loss:0.061844 Epoch [1/2], Iter [806/3125], train_loss:0.075249 Epoch [1/2], Iter [807/3125], train_loss:0.090213 Epoch [1/2], Iter [808/3125], train_loss:0.106900 Epoch [1/2], Iter [809/3125], train_loss:0.087969 Epoch [1/2], Iter [810/3125], train_loss:0.082871 Epoch [1/2], Iter [811/3125], train_loss:0.083834 Epoch [1/2], Iter [812/3125], train_loss:0.067130 Epoch [1/2], Iter [813/3125], train_loss:0.081398 Epoch [1/2], Iter [814/3125], train_loss:0.075722 Epoch [1/2], Iter [815/3125], train_loss:0.102066 Epoch [1/2], Iter [816/3125], train_loss:0.095934 Epoch [1/2], Iter [817/3125], train_loss:0.073375 Epoch [1/2], Iter [818/3125], train_loss:0.114593 Epoch [1/2], Iter [819/3125], train_loss:0.080349 Epoch [1/2], Iter [820/3125], train_loss:0.093809 Epoch [1/2], Iter [821/3125], train_loss:0.057519 Epoch [1/2], Iter [822/3125], train_loss:0.060332 Epoch [1/2], Iter [823/3125], train_loss:0.069837 Epoch [1/2], Iter [824/3125], train_loss:0.081108 Epoch [1/2], Iter [825/3125], train_loss:0.064217 Epoch [1/2], Iter [826/3125], train_loss:0.077845 Epoch [1/2], Iter [827/3125], train_loss:0.062394 Epoch [1/2], Iter [828/3125], train_loss:0.078574 Epoch [1/2], Iter [829/3125], train_loss:0.077207 Epoch [1/2], Iter [830/3125], train_loss:0.052881 Epoch [1/2], Iter [831/3125], train_loss:0.105506 Epoch [1/2], Iter [832/3125], train_loss:0.085921 Epoch [1/2], Iter [833/3125], train_loss:0.062045 Epoch [1/2], Iter [834/3125], train_loss:0.078639 Epoch [1/2], Iter [835/3125], train_loss:0.091643 Epoch [1/2], Iter [836/3125], train_loss:0.070230 Epoch [1/2], Iter [837/3125], train_loss:0.061350 Epoch [1/2], Iter [838/3125], train_loss:0.100740 Epoch [1/2], Iter [839/3125], train_loss:0.085829 Epoch [1/2], Iter [840/3125], train_loss:0.060633 Epoch [1/2], Iter [841/3125], train_loss:0.071548 Epoch [1/2], Iter [842/3125], 
train_loss:0.083561 Epoch [1/2], Iter [843/3125], train_loss:0.066375 Epoch [1/2], Iter [844/3125], train_loss:0.100119 Epoch [1/2], Iter [845/3125], train_loss:0.088684 Epoch [1/2], Iter [846/3125], train_loss:0.055062 Epoch [1/2], Iter [847/3125], train_loss:0.074315 Epoch [1/2], Iter [848/3125], train_loss:0.069999 Epoch [1/2], Iter [849/3125], train_loss:0.035895 Epoch [1/2], Iter [850/3125], train_loss:0.037956 Epoch [1/2], Iter [851/3125], train_loss:0.100308 Epoch [1/2], Iter [852/3125], train_loss:0.067342 Epoch [1/2], Iter [853/3125], train_loss:0.100173 Epoch [1/2], Iter [854/3125], train_loss:0.095898 Epoch [1/2], Iter [855/3125], train_loss:0.037566 Epoch [1/2], Iter [856/3125], train_loss:0.109127 Epoch [1/2], Iter [857/3125], train_loss:0.086012 Epoch [1/2], Iter [858/3125], train_loss:0.042612 Epoch [1/2], Iter [859/3125], train_loss:0.095185 Epoch [1/2], Iter [860/3125], train_loss:0.041484 Epoch [1/2], Iter [861/3125], train_loss:0.077971 Epoch [1/2], Iter [862/3125], train_loss:0.077879 Epoch [1/2], Iter [863/3125], train_loss:0.074702 Epoch [1/2], Iter [864/3125], train_loss:0.065591 Epoch [1/2], Iter [865/3125], train_loss:0.044043 Epoch [1/2], Iter [866/3125], train_loss:0.086357 Epoch [1/2], Iter [867/3125], train_loss:0.076382 Epoch [1/2], Iter [868/3125], train_loss:0.126473 Epoch [1/2], Iter [869/3125], train_loss:0.111014 Epoch [1/2], Iter [870/3125], train_loss:0.053985 Epoch [1/2], Iter [871/3125], train_loss:0.066713 Epoch [1/2], Iter [872/3125], train_loss:0.092710 Epoch [1/2], Iter [873/3125], train_loss:0.072230 Epoch [1/2], Iter [874/3125], train_loss:0.072040 Epoch [1/2], Iter [875/3125], train_loss:0.128901 Epoch [1/2], Iter [876/3125], train_loss:0.094567 Epoch [1/2], Iter [877/3125], train_loss:0.068851 Epoch [1/2], Iter [878/3125], train_loss:0.124406 Epoch [1/2], Iter [879/3125], train_loss:0.060597 Epoch [1/2], Iter [880/3125], train_loss:0.053799 Epoch [1/2], Iter [881/3125], train_loss:0.089491 Epoch [1/2], Iter [882/3125], 
train_loss:0.056719 Epoch [1/2], Iter [883/3125], train_loss:0.076862 Epoch [1/2], Iter [884/3125], train_loss:0.068522 Epoch [1/2], Iter [885/3125], train_loss:0.104225 Epoch [1/2], Iter [886/3125], train_loss:0.082506 Epoch [1/2], Iter [887/3125], train_loss:0.052971 Epoch [1/2], Iter [888/3125], train_loss:0.059774 Epoch [1/2], Iter [889/3125], train_loss:0.086975 Epoch [1/2], Iter [890/3125], train_loss:0.056777 Epoch [1/2], Iter [891/3125], train_loss:0.087735 Epoch [1/2], Iter [892/3125], train_loss:0.070902 Epoch [1/2], Iter [893/3125], train_loss:0.111826 Epoch [1/2], Iter [894/3125], train_loss:0.059331 Epoch [1/2], Iter [895/3125], train_loss:0.094341 Epoch [1/2], Iter [896/3125], train_loss:0.051812 Epoch [1/2], Iter [897/3125], train_loss:0.112401 Epoch [1/2], Iter [898/3125], train_loss:0.061509 Epoch [1/2], Iter [899/3125], train_loss:0.064180 Epoch [1/2], Iter [900/3125], train_loss:0.038741 Epoch [1/2], Iter [901/3125], train_loss:0.053055 Epoch [1/2], Iter [902/3125], train_loss:0.054728 Epoch [1/2], Iter [903/3125], train_loss:0.078024 Epoch [1/2], Iter [904/3125], train_loss:0.044780 Epoch [1/2], Iter [905/3125], train_loss:0.089853 Epoch [1/2], Iter [906/3125], train_loss:0.101245 Epoch [1/2], Iter [907/3125], train_loss:0.052246 Epoch [1/2], Iter [908/3125], train_loss:0.071536 Epoch [1/2], Iter [909/3125], train_loss:0.075075 Epoch [1/2], Iter [910/3125], train_loss:0.074174 Epoch [1/2], Iter [911/3125], train_loss:0.072227 Epoch [1/2], Iter [912/3125], train_loss:0.101729 Epoch [1/2], Iter [913/3125], train_loss:0.071239 Epoch [1/2], Iter [914/3125], train_loss:0.101731 Epoch [1/2], Iter [915/3125], train_loss:0.066899 Epoch [1/2], Iter [916/3125], train_loss:0.042201 Epoch [1/2], Iter [917/3125], train_loss:0.057565 Epoch [1/2], Iter [918/3125], train_loss:0.043300 Epoch [1/2], Iter [919/3125], train_loss:0.101549 Epoch [1/2], Iter [920/3125], train_loss:0.080133 Epoch [1/2], Iter [921/3125], train_loss:0.088354 Epoch [1/2], Iter [922/3125], 
train_loss:0.079794 Epoch [1/2], Iter [923/3125], train_loss:0.082035 Epoch [1/2], Iter [924/3125], train_loss:0.043397 Epoch [1/2], Iter [925/3125], train_loss:0.101342 Epoch [1/2], Iter [926/3125], train_loss:0.070656 Epoch [1/2], Iter [927/3125], train_loss:0.068928 Epoch [1/2], Iter [928/3125], train_loss:0.086801 Epoch [1/2], Iter [929/3125], train_loss:0.059911 Epoch [1/2], Iter [930/3125], train_loss:0.079392 Epoch [1/2], Iter [931/3125], train_loss:0.083579 Epoch [1/2], Iter [932/3125], train_loss:0.051975 Epoch [1/2], Iter [933/3125], train_loss:0.083430 Epoch [1/2], Iter [934/3125], train_loss:0.066587 Epoch [1/2], Iter [935/3125], train_loss:0.087434 Epoch [1/2], Iter [936/3125], train_loss:0.087518 Epoch [1/2], Iter [937/3125], train_loss:0.075971 Epoch [1/2], Iter [938/3125], train_loss:0.060921 Epoch [1/2], Iter [939/3125], train_loss:0.059609 Epoch [1/2], Iter [940/3125], train_loss:0.053374 Epoch [1/2], Iter [941/3125], train_loss:0.059154 Epoch [1/2], Iter [942/3125], train_loss:0.037160 Epoch [1/2], Iter [943/3125], train_loss:0.094307 Epoch [1/2], Iter [944/3125], train_loss:0.069412 Epoch [1/2], Iter [945/3125], train_loss:0.093543 Epoch [1/2], Iter [946/3125], train_loss:0.057713 Epoch [1/2], Iter [947/3125], train_loss:0.050613 Epoch [1/2], Iter [948/3125], train_loss:0.101521 Epoch [1/2], Iter [949/3125], train_loss:0.099398 Epoch [1/2], Iter [950/3125], train_loss:0.098440 Epoch [1/2], Iter [951/3125], train_loss:0.036929 Epoch [1/2], Iter [952/3125], train_loss:0.062752 Epoch [1/2], Iter [953/3125], train_loss:0.048165 Epoch [1/2], Iter [954/3125], train_loss:0.075584 Epoch [1/2], Iter [955/3125], train_loss:0.080492 Epoch [1/2], Iter [956/3125], train_loss:0.087700 Epoch [1/2], Iter [957/3125], train_loss:0.043403 Epoch [1/2], Iter [958/3125], train_loss:0.069215 Epoch [1/2], Iter [959/3125], train_loss:0.044430 Epoch [1/2], Iter [960/3125], train_loss:0.066561 Epoch [1/2], Iter [961/3125], train_loss:0.106058 Epoch [1/2], Iter [962/3125], 
train_loss:0.066117 Epoch [1/2], Iter [963/3125], train_loss:0.075821 Epoch [1/2], Iter [964/3125], train_loss:0.076452 Epoch [1/2], Iter [965/3125], train_loss:0.068917 Epoch [1/2], Iter [966/3125], train_loss:0.073009 Epoch [1/2], Iter [967/3125], train_loss:0.066570 Epoch [1/2], Iter [968/3125], train_loss:0.078626 Epoch [1/2], Iter [969/3125], train_loss:0.071714 Epoch [1/2], Iter [970/3125], train_loss:0.073739 Epoch [1/2], Iter [971/3125], train_loss:0.036135 Epoch [1/2], Iter [972/3125], train_loss:0.077290 Epoch [1/2], Iter [973/3125], train_loss:0.108345 Epoch [1/2], Iter [974/3125], train_loss:0.085700 Epoch [1/2], Iter [975/3125], train_loss:0.081209 Epoch [1/2], Iter [976/3125], train_loss:0.034647 Epoch [1/2], Iter [977/3125], train_loss:0.056354 Epoch [1/2], Ite
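The per-iteration losses in the log above jump around from step to step, so the overall trend is easier to judge after smoothing. Below is a minimal sketch, assuming the log is available as plain text in the format printed above. The helper names `parse_losses` and `moving_average` and the window size are illustrative choices and do not come from the original training script.

```python
import re
from collections import deque

# Matches complete records such as
# "Epoch [1/2], Iter [403/3125], train_loss:0.089086".
# \s* tolerates the line breaks that fall inside records in the printed log.
RECORD = re.compile(
    r"Epoch \[(\d+)/\d+\],\s*Iter \[(\d+)/\d+\],\s*train_loss:(\d+\.\d+)"
)


def parse_losses(log_text):
    """Return (epoch, iteration, loss) tuples for every complete record."""
    return [(int(e), int(i), float(loss)) for e, i, loss in RECORD.findall(log_text)]


def moving_average(values, window=50):
    """Trailing mean over the last `window` values (fewer at the start)."""
    buf = deque(maxlen=window)
    total = 0.0
    smoothed = []
    for v in values:
        if len(buf) == window:
            total -= buf[0]  # this value is about to be evicted by append()
        buf.append(v)
        total += v
        smoothed.append(total / len(buf))
    return smoothed


# Three records copied from the log above, plus a truncated trailing record
# that the pattern deliberately skips.
SAMPLE = (
    "Epoch [1/2], Iter [403/3125], train_loss:0.089086 "
    "Epoch [1/2], Iter [404/3125], train_loss:0.088689 "
    "Epoch [1/2], Iter [405/3125], train_loss:0.118437 Epoch [1/2], Ite"
)

records = parse_losses(SAMPLE)
losses = [loss for _, _, loss in records]
for (epoch, it, loss), avg in zip(records, moving_average(losses, window=2)):
    print(f"epoch {epoch} iter {it}: loss={loss:.6f} smoothed={avg:.6f}")
```

If the smoothed curve keeps falling while the raw values oscillate, the oscillation is mostly batch-to-batch noise. A smoothed curve that is itself flat or rising points more toward the learning rate or the loss function as the thing to adjust.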