

news 2025/10/17 13:16:58

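SVM and decision tree basics, model evaluation, and model saving

Contents
I. SVM
    1.1 Common attributes and methods
II. Decision trees
    2.1 Common attributes and methods
    2.2 Visualizing a decision tree
    2.3 Reading a decision tree
3. Model evaluation
    3.1 Aspect one: evaluation metrics
    3.2 Aspect two: model performance at different data sizes
4. Saving and loading models
    4.1 Saving a model
    4.2 Loading a model

I. SVM

1.1 Common attributes and methods

predict: returns an array with the predicted class of each test sample.
predict_proba: returns an array with the probability that each test sample belongs to each class. For SVC this is only available when the model is created with probability=True.
decision_function: returns an array with the signed distance (up to scaling) from each test sample to the separating hyperplane of the corresponding class.
get_params: returns the current parameter values of the SVM estimator.
score: returns the accuracy of the predictions.
set_params: sets the parameters of the SVC estimator.
clf.n_support_: the number of support vectors for each class.
clf.support_: the indices of the support vectors in the training samples.
clf.support_vectors_: all support vectors.

II. Decision trees

2.1 Common attributes and methods

classes_: the class labels (single-output problem) or a list of arrays of class labels (multi-output problem).
feature_importances_: the feature importances.
max_features_: the inferred value of max_features.
n_classes_: the number of classes (single-output problem) or a list with the number of classes for each output (multi-output problem).
n_features_: the number of features seen during fit. Recent scikit-learn releases expose this as n_features_in_.
n_outputs_: the number of outputs seen during fit.
tree_: the underlying tree object.

Training (fitting): fit(train_x, train_y).
Prediction: predict(X) returns labels; predict_log_proba(X) and predict_proba(X) return (log-)probabilities. The probabilities of each sample sum to 1, and for a binary problem one usually takes predict_proba(X)[:, 1].
Scoring: score(test_x, test_y) returns the mean accuracy, which is equivalent to accuracy_score.
Parameters: get_params([deep]) returns the classifier's parameters and set_params(**params) sets them.

2.2 Visualizing a decision tree

    from sklearn.datasets import load_iris
    from sklearn import tree
    import matplotlib.pyplot as plt

    # Load the iris dataset
    iris = load_iris()

    # Create and fit the decision tree model
    model = tree.DecisionTreeClassifier(max_depth=2)
    model.fit(iris.data, iris.target)

    # Visualize the decision tree
    feature_names = iris.feature_names
    plt.figure(figsize=(12, 12))
    # class_names is passed as a plain list, which every plot_tree version accepts
    _ = tree.plot_tree(model, feature_names=feature_names, class_names=list(iris.target_names), filled=True, rounded=True)
    plt.show()

2.3 Reading a decision tree

Meaning of the fields in each node:

petal length (cm) <= 2.45: the split condition on a feature. For a condition such as petal width (cm) <= 0.75, samples with petal width (cm) <= 0.75 go to the left branch and all other samples go to the right branch.
gini: the Gini impurity of the node.
samples: the number of samples in the node.
value: the number of samples of each class. For example, [34, 32, 39] in the root node means 34 samples of Setosa, 32 of Versicolour and 39 of Virginica. These counts sum to 105, so the figure was evidently drawn from a subset of the data; fitting on all 150 samples as in 2.2 gives [50, 50, 50].
class: the class assigned to the node, which is the class with the most samples in value. In the root node above, Virginica has the most samples, so the node is labelled Virginica, and so on.

Each color stands for one class. The color gets darker as a node gets purer, which usually happens as the depth increases.

3. Model evaluation

3.1 Aspect one: evaluation metrics

Accuracy: the most commonly used metric for classification. It is the fraction of samples that the model predicts correctly.

Precision and recall: used to evaluate binary classifiers. Precision is the fraction of samples predicted as positive that are actually positive. Recall is the fraction of actually positive samples that are correctly predicted as positive.

F1 score: the harmonic mean of precision and recall, also used to evaluate binary classifiers.

    # Other metrics
    from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

    def accuracy_precision_recall_f1(y_true, y_pred):
        # 1. Accuracy
        accuracy = accuracy_score(y_true, y_pred)
        # 2. Precision and recall
        precision = precision_score(y_true, y_pred)
        recall = recall_score(y_true, y_pred)
        # 3. F1 score
        f1 = f1_score(y_true, y_pred)
        return [accuracy, precision, recall, f1]

    print(accuracy_precision_recall_f1(test_label_shulle_scaler, test_data_predict))

Confusion matrix: a two-dimensional matrix that summarizes the performance of a classifier. For a binary problem it splits the predictions into true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN), one per quadrant of the matrix.

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.metrics import confusion_matrix

    def draw_confusion_matrix(label_true, label_pred, label_name, normlize, title="Confusion Matrix", pdf_save_path=None, dpi=100):
        """
        :param label_true: true labels, e.g. [0,1,2,7,4,5,...]
        :param label_pred: predicted labels, e.g. [0,5,4,2,1,4,...]
        :param label_name: label names, e.g. ['cat','dog','flower',...]
        :param normlize: whether to show the entries as row percentages
        :param title: figure title
        :param pdf_save_path: where to save the figure, if at all; e.g. pdf_save_path='xxx.png' | 'xxx.pdf' | any other format supported by plt.savefig
        :param dpi: resolution of the saved file; papers usually require at least 300 dpi
        :return:

        example:
            draw_confusion_matrix(label_true=y_gt,
                                  label_pred=y_pred,
                                  label_name=["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"],
                                  normlize=True,
                                  title="Confusion Matrix on Fer2013",
                                  pdf_save_path="Confusion_Matrix_on_Fer2013.png",
                                  dpi=300)
        """
        cm1 = confusion_matrix(label_true, label_pred)  # raw counts
        cm = confusion_matrix(label_true, label_pred)
        print(cm)
        if normlize:
            row_sums = np.sum(cm, axis=1)
            cm = cm / row_sums[:, np.newaxis]
        # Rows are true labels (y axis), columns are predicted labels (x axis).
        # The original transposed cm but not the cell text, so colors and text disagreed off the diagonal.
        plt.imshow(cm, cmap='Blues')
        plt.title(title)
        plt.xlabel("Predicted label")
        plt.ylabel("True label")
        plt.yticks(range(len(label_name)), label_name)
        plt.xticks(range(len(label_name)), label_name, rotation=45)
        plt.tight_layout()
        plt.colorbar()
        for i in range(len(label_name)):
            for j in range(len(label_name)):
                color = (1, 1, 1) if i == j else (0, 0, 0)  # white text on the diagonal, black elsewhere
                value = float(format('%.1f' % (cm[i, j] * 100)))
                value1 = str(value) + '%\n' + str(cm1[i, j])
                # plt.text takes (x, y), i.e. (column, row)
                plt.text(j, i, value1, verticalalignment='center', horizontalalignment='center', color=color)
        plt.show()
        # if pdf_save_path is not None:
        #     plt.savefig(pdf_save_path, bbox_inches='tight', dpi=dpi)

    labels_name = ["Healthy", "Faulty"]
    test_data_predict = SVC_all.predict(test_data_shuffle_scaler)
    draw_confusion_matrix(label_true=test_label_shulle_scaler,
                          label_pred=test_data_predict,
                          label_name=labels_name,
                          normlize=True,
                          title="Confusion Matrix",
                          pdf_save_path="Confusion_Matrix.jpg",
                          dpi=300)

AUC and ROC curve: the ROC curve evaluates a binary classifier by plotting the true positive rate (TPR) on the vertical axis against the false positive rate (FPR) on the horizontal axis as the decision threshold varies. AUC is the area under the ROC curve and summarizes the overall performance of the model.

    # Function that plots the ROC curve
    def plot_roc_curve(y_true, y_score):
        """
        y_true: true labels
        y_score: predicted scores or probabilities. Do not pass predicted labels.
        """
        from sklearn.metrics import roc_curve
        import matplotlib.pyplot as plt
        fpr, tpr, threshold = roc_curve(y_true, y_score)
        # The x axis is the false positive rate, which is 1 - specificity, not specificity itself
        plt.xlabel('False Positive Rate (1 - specificity)')
        plt.ylabel('True Positive Rate (sensitivity)')
        plt.title('ROC curve')
        plt.plot(fpr, tpr, color='b', linewidth=0.8)
        plt.plot([0, 1], [0, 1], 'r--')
        plt.show()

    # print(np.sum(SVC_all.predict(test_data_shuffle_scaler)))
    test_data_score = SVC_all.decision_function(test_data_shuffle_scaler)
    plot_roc_curve(test_label_shulle_scaler, SVC_all.predict_proba(test_data_shuffle_scaler)[:, 1])
    plot_roc_curve(test_label_shulle_scaler, test_data_score)

    # Compute the AUC
    from sklearn.metrics import roc_auc_score
    print(roc_auc_score(test_label_shulle_scaler, SVC_all.predict_proba(test_data_shuffle_scaler)[:, 1]))

3.2 Aspect two: model performance at different data sizes

learning_curve() measures how a model's performance depends on the size of the training set. It returns training and cross-validation scores for a series of training-set sizes. Plotting them, with the training-set size on the horizontal axis and the score (accuracy, loss, etc.) on the vertical axis, shows whether the model underfits or overfits:

(1) Relationship between training error and cross-validation error. With a small training set the model may overfit: the training error is low and the cross-validation error is high. As the training set grows the model tends to generalize better and both errors level off.
(2) Response of the two errors to the training-set size. If the training and cross-validation errors are both high and close to each other, the model has high bias (underfitting). If the training error is low, the cross-validation error is clearly higher and the gap between them is large, the model has high variance (overfitting).

Parameters of learning_curve:

cv: int, cross-validation generator or iterable, optional. Determines the cross-validation splitting strategy. Possible inputs are:
    None: use the default 5-fold cross-validation.
    an integer: the number of folds.
    an object to be used as a cross-validation generator.
    an iterable yielding train/test splits.
  Here cv is a ShuffleSplit, which draws 10 random train/test splits with 50% of the data held out as the test set each time. It yields the index arrays of each split, so it is known which samples are train and which are test.
n_jobs: int, optional. The number of jobs to run in parallel. The default None means 1. On Windows, code that runs parallel jobs typically needs to be guarded by if __name__ == "__main__":.
train_sizes: array-like of shape (n_ticks,), dtype float or int. The relative or absolute numbers of training examples used to generate the learning curve. If the dtype is float, each value is a fraction of the maximum training-set size (which is determined by the chosen validation method) and must lie in (0, 1]; otherwise the values are absolute sizes. For classification, the number of samples usually has to be large enough to contain at least one sample from each class. The default is np.linspace(0.1, 1.0, 5), i.e. 10%, 32.5%, 55%, 77.5% and 100% of the training set. For example, with cv=5, 10% means that 10% of the training data is selected first and then cross-validated.

Parameters of ShuffleSplit:

n_splits: the number of re-shuffled splits, similar to the number of folds in KFold. The default is 10.
test_size: the fraction of all samples used as the test set, e.g. test_size=0.2 puts 20% of the data in the test set.
random_state: the random seed, so that the splits are the same on every run.

Outputs of learning_curve:

train_sizes_abs: the numbers of training samples actually used, e.g. [10, 100, 1000].
train_scores: the training scores, a matrix of shape (len(train_sizes_abs), number of cv splits). Each row holds the scores of the different splits for one training-set size.
test_scores: the same as train_scores, but for the test sets.

    import datetime
    from time import time

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.model_selection import learning_curve, ShuffleSplit
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier as DTC
    from sklearn.svm import SVC
    from sklearn.ensemble import RandomForestClassifier as RFC
    from sklearn.linear_model import LogisticRegression as LR

    def plot_learning_curve(estimator, title, X, y,
                            ax,           # subplot to draw on
                            ylim=None,    # range of the y axis
                            cv=None,      # cross-validation strategy
                            n_jobs=None   # number of parallel jobs
                            ):
        train_sizes, train_scores, test_scores = learning_curve(estimator, X, y, cv=cv, n_jobs=n_jobs)
        ax.set_title(title)
        if ylim is not None:
            # * unpacks any number of positional arguments and ** any number of keyword arguments:
            # def m(*args, **kwargs): print(args); print(kwargs)
            # m(1, 2, a=1, b=2)  # args: (1, 2), kwargs: {'a': 1, 'b': 2}
            ax.set_ylim(*ylim)
        ax.set_xlabel("Training examples")
        ax.set_ylabel("Score")
        ax.grid()  # show a background grid (optional)
        ax.plot(train_sizes, np.mean(train_scores, axis=1), 'o-', color="r", label="Training score")
        ax.plot(train_sizes, np.mean(test_scores, axis=1), 'o-', color="g", label="Test score")
        ax.legend(loc="best")
        return ax

    # y = y.astype(int)
    print(X.shape)
    print(y.shape)

    title = ["Naive_Bayes", "DecisionTree", "SVM_RBF_kernel", "RandomForest", "Logistic"]
    # model = [GaussianNB(), DTC(), SVC(gamma=0.001), RFC(n_estimators=50), LR(C=0.1, solver="lbfgs")]
    model = [GaussianNB(), DTC(), SVC(kernel="rbf"), RFC(n_estimators=50), LR(C=0.1, solver="liblinear")]
    cv = ShuffleSplit(n_splits=10, test_size=0.5, random_state=0)

    print("*" * 25)
    fig, axes = plt.subplots(1, 5, figsize=(30, 6))
    for ind, title_, estimator in zip(range(len(title)), title, model):
        times = time()
        plot_learning_curve(estimator, title_, X_scaler, y, ax=axes[ind], ylim=[0, 1.05], n_jobs=4, cv=cv)
        # Print the elapsed time for each model
        print("{}:{}".format(title_, datetime.datetime.fromtimestamp(time() - times).strftime("%M:%S:%f")))
    plt.show()
    print("*" * 25)
    for i in [*zip(range(len(title)), title, model)]:
        print(i)

4. Saving and loading models

4.1 Saving a model

    import os
    import joblib

    title = ["Naive_Bayes", "DecisionTree", "SVM_RBF_kernel", "RandomForest", "Logistic"]
    model = [GaussianNB(), DTC(), SVC(gamma=0.001), RFC(n_estimators=50), LR(C=0.1, solver="liblinear")]

    # The target directory has to exist before the files are written
    os.makedirs("model_save", exist_ok=True)
    for i_index, i in enumerate(model):
        i.fit(X, y)
        joblib_file = "model_save/" + title[i_index] + "_model.pkl"
        with open(joblib_file, "wb") as file:
            joblib.dump(i, file)
        print(i.score(X, y))

4.2 Loading a model

    import joblib

    title = ["Naive_Bayes", "DecisionTree", "SVM_RBF_kernel", "RandomForest", "Logistic"]
    for i in title:
        joblib_file = "model_save/" + i + "_model.pkl"
        with open(joblib_file, "rb") as file:
            model = joblib.load(file)
        print(i, ":", model.score(X, y))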
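
As a cross-check on the metrics of section 3.1 (for instance after reloading a saved model and scoring it again), accuracy, precision, recall and F1 can be computed by hand from the four confusion-matrix counts. The following is a minimal sketch in plain Python. The function name binary_metrics and the sample labels are hypothetical and only serve as illustration; for a binary problem with the positive class labelled 1, scikit-learn's accuracy_score, precision_score, recall_score and f1_score should report the same values.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from raw label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when nothing is predicted or labelled positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = faulty (positive class), 0 = healthy.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

With these labels there are 3 true positives, 1 false positive, 1 false negative and 5 true negatives, so accuracy is 8/10 while precision, recall and F1 are all 3/4. Accuracy is the highest of the four here because the negative class is the majority, which is one reason to look at precision and recall as well.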


http://www.tj-hxxt.cn/news/225594.html