Downloading Hugging Face models and datasets quickly and conveniently

Contents
Method 1: A CLI tool for downloading Hugging Face models and datasets with aria2/wget + git
  Features
  Usage
Method 2: Model download [personal usage notes]
  Preserving the directory structure
  Dataset download
  Limitations

Method 1: A CLI tool for downloading Hugging Face models and datasets with aria2/wget + git
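Source: https://gist.github.com/padeoe/697678ab8e528b85a2a7bddafea1fa4f. To use it, copy hfd.sh to your machine, then follow the reference commands below to download a dataset or a model.

Huggingface Model Downloader
The official huggingface-cli lacks multi-threaded download support, and hf_transfer has inadequate error handling. This command-line tool works around both by using wget or aria2 for LFS files and git clone for everything else.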
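To make the split between git and the download tool concrete: an LFS file can be fetched directly over HTTP from a `resolve` URL, and that URL is what gets handed to wget or aria2c. The sketch below shows how such a URL can be assembled. The function name `resolve_url` and its parameters are illustrative and not part of hfd, and the `main` revision is assumed.

```python
DEFAULT_ENDPOINT = "https://huggingface.co"


def resolve_url(repo_id, filename, dataset=False, endpoint=DEFAULT_ENDPOINT):
    """Build the direct-download URL for one file on the `main` revision."""
    # Dataset repos live under a "datasets/" prefix on the Hub.
    repo_path = f"datasets/{repo_id}" if dataset else repo_id
    return f"{endpoint.rstrip('/')}/{repo_path}/resolve/main/{filename}"


print(resolve_url("bigscience/bloom-560m", "flax_model.msgpack"))
# → https://huggingface.co/bigscience/bloom-560m/resolve/main/flax_model.msgpack
```

To go through a mirror, pass its base URL as `endpoint`, for example the value of the HF_ENDPOINT environment variable.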
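Features
- ⏯️ Resume from breakpoint: you can re-run the command or press Ctrl+C at any time.
- Multi-threaded download: uses multiple threads to speed up the download.
- File exclusion: use --exclude or --include to skip or select files, which saves time on models that ship duplicate formats (e.g. *.bin or *.safetensors).
- Authentication support: for gated models that require a Hugging Face login, authenticate with --hf_username and --hf_token.
- Mirror site support: set via the HF_ENDPOINT environment variable.
- Proxy support: set via the HTTPS_PROXY environment variable.
- Simple: depends only on git and aria2c/wget.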
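The --include and --exclude options take shell-style wildcard patterns. As a rough illustration of how such filtering behaves, here is a minimal sketch using Python's `fnmatch`. It assumes the pattern is matched against the full repo-relative path; the helper `select_files` and the file list are hypothetical.

```python
from fnmatch import fnmatchcase


def select_files(files, include=None, exclude=None):
    """Return the files that would still be downloaded under the given patterns."""
    selected = []
    for name in files:
        if include and not fnmatchcase(name, include):
            continue  # an include pattern was given and this file does not match it
        if exclude and fnmatchcase(name, exclude):
            continue  # this file matches the exclude pattern
        selected.append(name)
    return selected


# Hypothetical file list for illustration.
files = ["config.json", "model.safetensors", "pytorch_model.bin", "vae/diffusion.bin"]
print(select_files(files, exclude="*.safetensors"))
print(select_files(files, include="vae/*"))
```

On the command line, quote the pattern (for example `--exclude '*.safetensors'`) so that the shell does not expand it before the script sees it.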
Usage
First, download hfd.sh or clone this repository, then grant the script execute permission.

chmod a+x hfd.sh

For convenience, you can create an alias:

alias hfd="$PWD/hfd.sh"

Usage instructions:

$ ./hfd.sh -h
Usage:
  hfd <repo_id> [--include include_pattern] [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool aria2c|wget] [-x threads] [--dataset] [--local-dir path]

Description:
  Downloads a model or dataset from Hugging Face using the provided repo ID.

Parameters:
  repo_id        The Hugging Face repo ID in the format 'org/repo_name'.
  --include      (Optional) Flag to specify a string pattern to include files for downloading.
  --exclude      (Optional) Flag to specify a string pattern to exclude files from downloading.
  include/exclude_pattern The pattern to match against filenames, supports wildcard characters. e.g., '--exclude *.safetensor', '--include vae/*'.
  --hf_username  (Optional) Hugging Face username for authentication. **NOT EMAIL**.
  --hf_token     (Optional) Hugging Face token for authentication.
  --tool         (Optional) Download tool to use. Can be aria2c (default) or wget.
  -x             (Optional) Number of download threads for aria2c. Defaults to 4.
  --dataset      (Optional) Flag to indicate downloading a dataset.
  --local-dir    (Optional) Local directory path where the model or dataset will be stored.

Example:
  hfd bigscience/bloom-560m --exclude *.safetensors
  hfd meta-llama/Llama-2-7b --hf_username myuser --hf_token mytoken -x 4
  hfd lavita/medical-qa-shared-task-v1-toy --dataset

Download a model:
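hfd bigscience/bloom-560m

Download a model that requires login:
Get a Hugging Face token from https://huggingface.co/settings/tokens, then run

hfd meta-llama/Llama-2-7b --hf_username YOUR_HF_USERNAME_NOT_EMAIL --hf_token YOUR_HF_TOKEN

Download a model and exclude certain files (e.g. .safetensors):

hfd bigscience/bloom-560m --exclude *.safetensors

Download with aria2c and multiple threads (aria2c is the default tool; -x sets the thread count, which defaults to 4):

hfd bigscience/bloom-560m

Output: the file URLs are displayed during the download: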
$ hfd bigscience/bloom-560m --tool wget --exclude *.safetensors
...
Start Downloading lfs files, bash script:
wget -c https://huggingface.co/bigscience/bloom-560m/resolve/main/flax_model.msgpack
# wget -c https://huggingface.co/bigscience/bloom-560m/resolve/main/model.safetensors
wget -c https://huggingface.co/bigscience/bloom-560m/resolve/main/onnx/decoder_model.onnx
...

# Install packages
apt update
apt-get install aria2
apt-get install iftop
apt-get install git-lfs

# Reference command
bash /xxx/xxx/hfd.sh mmaaz60/ActivityNet-QA-Test-Videos --tool aria2c -x 16 --dataset --local-dir /xxx/xxx/ActivityNet

hfd.sh
#!/usr/bin/env bash
# Color definitions
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color

trap 'printf "${YELLOW}\nDownload interrupted. If you re-run the command, you can resume the download from the breakpoint.\n${NC}"; exit 1' INT

display_help() {
    cat << EOF
Usage:
  hfd <repo_id> [--include include_pattern] [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool aria2c|wget] [-x threads] [--dataset] [--local-dir path]

Description:
  Downloads a model or dataset from Hugging Face using the provided repo ID.

Parameters:
  repo_id        The Hugging Face repo ID in the format 'org/repo_name'.
  --include      (Optional) Flag to specify a string pattern to include files for downloading.
  --exclude      (Optional) Flag to specify a string pattern to exclude files from downloading.
  include/exclude_pattern The pattern to match against filenames, supports wildcard characters. e.g., '--exclude *.safetensor', '--include vae/*'.
  --hf_username  (Optional) Hugging Face username for authentication. **NOT EMAIL**.
  --hf_token     (Optional) Hugging Face token for authentication.
  --tool         (Optional) Download tool to use. Can be aria2c (default) or wget.
  -x             (Optional) Number of download threads for aria2c. Defaults to 4.
  --dataset      (Optional) Flag to indicate downloading a dataset.
  --local-dir    (Optional) Local directory path where the model or dataset will be stored.

Example:
  hfd bigscience/bloom-560m --exclude *.safetensors
  hfd meta-llama/Llama-2-7b --hf_username myuser --hf_token mytoken -x 4
  hfd lavita/medical-qa-shared-task-v1-toy --dataset
EOF
    exit 1
}

MODEL_ID=$1
shift

# Default values
TOOL="aria2c"
THREADS=4
HF_ENDPOINT=${HF_ENDPOINT:-"https://hf-mirror.com"}

while [[ $# -gt 0 ]]; do
    case $1 in
        --include) INCLUDE_PATTERN="$2"; shift 2 ;;
        --exclude) EXCLUDE_PATTERN="$2"; shift 2 ;;
        --hf_username) HF_USERNAME="$2"; shift 2 ;;
        --hf_token) HF_TOKEN="$2"; shift 2 ;;
        --tool) TOOL="$2"; shift 2 ;;
        -x) THREADS="$2"; shift 2 ;;
        --dataset) DATASET=1; shift ;;
        --local-dir) LOCAL_DIR="$2"; shift 2 ;;
        *) shift ;;
    esac
done

# Check if aria2, wget, curl, git, and git-lfs are installed
check_command() {
    if ! command -v $1 &>/dev/null; then
        echo -e "${RED}$1 is not installed. Please install it first.${NC}"
        exit 1
    fi
}

# Mark current repo safe when using shared file system like samba or nfs
ensure_ownership() {
    if git status 2>&1 | grep "fatal: detected dubious ownership in repository at" > /dev/null; then
        git config --global --add safe.directory "${PWD}"
        printf "${YELLOW}Detected dubious ownership in repository, mark ${PWD} safe using git, edit ~/.gitconfig if you want to reverse this.\n${NC}"
    fi
}

[[ "$TOOL" == "aria2c" ]] && check_command aria2c
[[ "$TOOL" == "wget" ]] && check_command wget
check_command curl; check_command git; check_command git-lfs

[[ -z "$MODEL_ID" || "$MODEL_ID" =~ ^-h ]] && display_help

if [[ -z "$LOCAL_DIR" ]]; then
    LOCAL_DIR="${MODEL_ID#*/}"
fi

if [[ "$DATASET" == 1 ]]; then
    MODEL_ID="datasets/$MODEL_ID"
fi
echo "Downloading to $LOCAL_DIR"

if [ -d "$LOCAL_DIR/.git" ]; then
    printf "${YELLOW}%s exists, Skip Clone.\n${NC}" "$LOCAL_DIR"
    cd "$LOCAL_DIR" && ensure_ownership && GIT_LFS_SKIP_SMUDGE=1 git pull || { printf "${RED}Git pull failed.${NC}\n"; exit 1; }
else
    REPO_URL="$HF_ENDPOINT/$MODEL_ID"
    GIT_REFS_URL="${REPO_URL}/info/refs?service=git-upload-pack"
    echo "Testing GIT_REFS_URL: $GIT_REFS_URL"
    response=$(curl -s -o /dev/null -w "%{http_code}" "$GIT_REFS_URL")
    if [ "$response" == "401" ] || [ "$response" == "403" ]; then
        if [[ -z "$HF_USERNAME" || -z "$HF_TOKEN" ]]; then
            printf "${RED}HTTP Status Code: $response.\nThe repository requires authentication, but --hf_username and --hf_token is not passed. Please get token from https://huggingface.co/settings/tokens.\nExiting.\n${NC}"
            exit 1
        fi
        REPO_URL="https://$HF_USERNAME:$HF_TOKEN@${HF_ENDPOINT#https://}/$MODEL_ID"
    elif [ "$response" != "200" ]; then
        printf "${RED}Unexpected HTTP Status Code: $response\n${NC}"
        printf "${YELLOW}Executing debug command: curl -v %s\nOutput:${NC}\n" "$GIT_REFS_URL"
        curl -v "$GIT_REFS_URL"; printf "\n${RED}Git clone failed.\n${NC}"; exit 1
    fi
    echo "GIT_LFS_SKIP_SMUDGE=1 git clone $REPO_URL $LOCAL_DIR"

    GIT_LFS_SKIP_SMUDGE=1 git clone $REPO_URL $LOCAL_DIR && cd "$LOCAL_DIR" || { printf "${RED}Git clone failed.\n${NC}"; exit 1; }

    ensure_ownership

    while IFS= read -r file; do
        truncate -s 0 "$file"
    done <<< $(git lfs ls-files | cut -d ' ' -f 3-)
fi

printf "\nStart Downloading lfs files, bash script:\ncd $LOCAL_DIR\n"
files=$(git lfs ls-files | cut -d ' ' -f 3-)
declare -a urls

while IFS= read -r file; do
    url="$HF_ENDPOINT/$MODEL_ID/resolve/main/$file"
    file_dir=$(dirname "$file")
    mkdir -p "$file_dir"
    if [[ "$TOOL" == "wget" ]]; then
        download_cmd="wget -c \"$url\" -O \"$file\""
        [[ -n "$HF_TOKEN" ]] && download_cmd="wget --header=\"Authorization: Bearer ${HF_TOKEN}\" -c \"$url\" -O \"$file\""
    else
        download_cmd="aria2c --console-log-level=error --file-allocation=none -x $THREADS -s $THREADS -k 1M -c \"$url\" -d \"$file_dir\" -o \"$(basename "$file")\""
        [[ -n "$HF_TOKEN" ]] && download_cmd="aria2c --header=\"Authorization: Bearer ${HF_TOKEN}\" --console-log-level=error --file-allocation=none -x $THREADS -s $THREADS -k 1M -c \"$url\" -d \"$file_dir\" -o \"$(basename "$file")\""
    fi
    [[ -n "$INCLUDE_PATTERN" && ! "$file" == $INCLUDE_PATTERN ]] && printf "# %s\n" "$download_cmd" && continue
    [[ -n "$EXCLUDE_PATTERN" && "$file" == $EXCLUDE_PATTERN ]] && printf "# %s\n" "$download_cmd" && continue
    printf "%s\n" "$download_cmd"
    urls+=("$url|$file")
done <<< "$files"

for url_file in "${urls[@]}"; do
    IFS='|' read -r url file <<< "$url_file"
    printf "${YELLOW}Start downloading ${file}.\n${NC}"
    file_dir=$(dirname "$file")
    if [[ "$TOOL" == "wget" ]]; then
        [[ -n "$HF_TOKEN" ]] && wget --header="Authorization: Bearer ${HF_TOKEN}" -c "$url" -O "$file" || wget -c "$url" -O "$file"
    else
        [[ -n "$HF_TOKEN" ]] && aria2c --header="Authorization: Bearer ${HF_TOKEN}" --console-log-level=error --file-allocation=none -x $THREADS -s $THREADS -k 1M -c "$url" -d "$file_dir" -o "$(basename "$file")" || aria2c --console-log-level=error --file-allocation=none -x $THREADS -s $THREADS -k 1M -c "$url" -d "$file_dir" -o "$(basename "$file")"
    fi
    [[ $? -eq 0 ]] && printf "Downloaded %s successfully.\n" "$url" || { printf "${RED}Failed to download %s.\n${NC}" "$url"; exit 1; }
done

printf "${GREEN}Download completed successfully.\n${NC}"
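The core of the script is the per-file download command. As a reading aid, here is a minimal Python sketch that assembles the same wget and aria2c invocations as argument lists. The function `build_download_cmd` is hypothetical and not part of hfd; the flags are the ones that appear in the script above.

```python
import os


def build_download_cmd(url, file, tool="aria2c", threads=4, token=None):
    """Return the argument list for downloading `url` to the relative path `file`."""
    header = [f"--header=Authorization: Bearer {token}"] if token else []
    if tool == "wget":
        # -c resumes a partial file, -O writes to the given path.
        return ["wget", *header, "-c", url, "-O", file]
    # bash `dirname` yields "." for a bare filename; mirror that here.
    file_dir = os.path.dirname(file) or "."
    return [
        "aria2c", *header,
        "--console-log-level=error", "--file-allocation=none",
        "-x", str(threads), "-s", str(threads), "-k", "1M",
        "-c", url, "-d", file_dir, "-o", os.path.basename(file),
    ]


url = "https://huggingface.co/bigscience/bloom-560m/resolve/main/onnx/decoder_model.onnx"
print(build_download_cmd(url, "onnx/decoder_model.onnx", tool="wget"))
print(build_download_cmd(url, "onnx/decoder_model.onnx", threads=8))
```

Returning a list rather than a single string lets the result be passed to `subprocess.run(cmd, check=True)` without shell quoting concerns.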
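Method 2: Model download [personal usage notes]
This code does not preserve the repository's directory structure; see the improved version below.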
import datetime
import os
import threading

from huggingface_hub import hf_hub_url
from huggingface_hub.hf_api import HfApi
from huggingface_hub.utils import filter_repo_objects


# Run a shell command and log its start and end times
def execCmd(cmd):
    print("Command %s started at %s" % (cmd, datetime.datetime.now()))
    os.system(cmd)
    print("Command %s finished at %s" % (cmd, datetime.datetime.now()))


if __name__ == '__main__':
    # Name of the HF repo to download
    repo_id = "Salesforce/blip2-opt-2.7b"
    # Local storage path
    save_path = './blip2-opt-2.7b'

    # Fetch the repo metadata
    _api = HfApi()
    repo_info = _api.repo_info(
        repo_id=repo_id,
        repo_type="model",
        revision='main',
        token=None,
    )

    # Collect the list of files in the repo
    filtered_repo_files = list(
        filter_repo_objects(
            items=[f.rfilename for f in repo_info.siblings],
            allow_patterns=None,
            ignore_patterns=None,
        )
    )

    cmds = []
    threads = []

    # Build the list of commands to run
    for file in filtered_repo_files:
        # Get the download URL
        url = hf_hub_url(repo_id=repo_id, filename=file)
        # Resumable download command
        cmds.append(f'wget -c {url} -P {save_path}')

    print(cmds)
    print("Program started at %s" % datetime.datetime.now())

    # One thread per command
    for cmd in cmds:
        th = threading.Thread(target=execCmd, args=(cmd,))
        th.start()
        threads.append(th)

    for th in threads:
        th.join()

    print("Program finished at %s" % datetime.datetime.now())

Preserving the directory structure
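The difference between the two versions comes down to which directory is passed to `wget -P`. With `-P save_path`, wget saves every file under its base name, so files from subdirectories all land in one flat folder. A minimal sketch of the path logic that avoids this follows; the helper `target_dir` and the nested filename are illustrative only.

```python
import os


def target_dir(save_path, filename):
    """Directory wget should write into so `filename` keeps its repo-relative path."""
    return os.path.dirname(os.path.join(save_path, filename))


# A top-level file goes straight into save_path.
print(target_dir("./blip2-opt-2.7b", "config.json"))
# A nested file (hypothetical name) goes into the matching subdirectory.
print(target_dir("./blip2-opt-2.7b", "onnx/decoder_model.onnx"))
```

The target directory has to exist before wget runs, which is why the script creates it with `Path(...).mkdir(parents=True, exist_ok=True)`.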
import datetime
import os
import threading
from pathlib import Path

from huggingface_hub import hf_hub_url
from huggingface_hub.hf_api import HfApi
from huggingface_hub.utils import filter_repo_objects


# Run a shell command and log its start and end times
def execCmd(cmd):
    print("Command %s started at %s" % (cmd, datetime.datetime.now()))
    os.system(cmd)
    print("Command %s finished at %s" % (cmd, datetime.datetime.now()))


if __name__ == '__main__':
    # Name of the HF repo to download
    repo_id = "Salesforce/blip2-opt-2.7b"
    # Local storage path
    save_path = './blip2-opt-2.7b'

    # Create the local save directory
    Path(save_path).mkdir(parents=True, exist_ok=True)

    # Fetch the repo metadata
    _api = HfApi()
    repo_info = _api.repo_info(
        repo_id=repo_id,
        repo_type="model",
        revision='main',
        token=None,
    )

    # Collect the list of files in the repo
    filtered_repo_files = list(
        filter_repo_objects(
            items=[f.rfilename for f in repo_info.siblings],
            allow_patterns=None,
            ignore_patterns=None,
        )
    )

    cmds = []
    threads = []

    # Build the list of commands to run
    for file in filtered_repo_files:
        # Get the download URL
        url = hf_hub_url(repo_id=repo_id, filename=file)

        # Create the matching subdirectory locally
        local_file = os.path.join(save_path, file)
        local_dir = os.path.dirname(local_file)
        Path(local_dir).mkdir(parents=True, exist_ok=True)

        # Resumable download command
        cmds.append(f'wget -c {url} -P {local_dir}')

    print(cmds)
    print("Program started at %s" % datetime.datetime.now())

    # One thread per command
    for cmd in cmds:
        th = threading.Thread(target=execCmd, args=(cmd,))
        th.start()
        threads.append(th)

    for th in threads:
        th.join()

    print("Program finished at %s" % datetime.datetime.now())

Dataset download
import datetime
import os
import threading
from pathlib import Path

from huggingface_hub import HfApi, hf_hub_url
from huggingface_hub.utils import filter_repo_objects


# Run a shell command and log its start and end times
def execCmd(cmd):
    print("Command %s started at %s" % (cmd, datetime.datetime.now()))
    os.system(cmd)
    print("Command %s finished at %s" % (cmd, datetime.datetime.now()))


if __name__ == '__main__':
    # ID of the dataset to download
    dataset_id = "openai/webtext"
    # Local storage path
    save_path = './webtext'

    # Create the local save directory
    Path(save_path).mkdir(parents=True, exist_ok=True)

    # Fetch the dataset metadata
    _api = HfApi()
    dataset_info = _api.dataset_info(
        repo_id=dataset_id,
        revision='main',
        token=None,
    )

    # Collect the list of files in the dataset repo
    filtered_dataset_files = list(
        filter_repo_objects(
            items=[f.rfilename for f in dataset_info.siblings],
            allow_patterns=None,
            ignore_patterns=None,
        )
    )

    cmds = []
    threads = []

    # Build the list of commands to run
    for file in filtered_dataset_files:
        # Get the download URL; repo_type must be "dataset" for dataset repos
        url = hf_hub_url(repo_id=dataset_id, filename=file, repo_type="dataset")

        # Create the matching subdirectory locally
        local_file = os.path.join(save_path, file)
        local_dir = os.path.dirname(local_file)
        Path(local_dir).mkdir(parents=True, exist_ok=True)

        # Resumable download command
        cmds.append(f'wget -c {url} -P {local_dir}')

    print(cmds)
    print("Program started at %s" % datetime.datetime.now())

    # One thread per command
    for cmd in cmds:
        th = threading.Thread(target=execCmd, args=(cmd,))
        th.start()
        threads.append(th)

    for th in threads:
        th.join()

    print("Program finished at %s" % datetime.datetime.now())

Limitations
Repositories that require authorization are not supported.
With many files, the script may start a large number of threads, since it launches one thread per file.
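Both limitations could plausibly be addressed with small changes. The sketch below is one possible approach, not something from the original scripts: it caps concurrency with a `ThreadPoolExecutor` and adds an optional bearer-token header to the wget command, in the same form hfd.sh uses. The names `build_wget_cmd` and `run_all` and the example.com URLs are placeholders.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_wget_cmd(url, local_dir, token=None):
    """Argument list for a resumable wget download into `local_dir`."""
    cmd = ["wget", "-c", url, "-P", local_dir]
    if token:
        # Gated repos: send the token as a bearer header.
        cmd.insert(1, f"--header=Authorization: Bearer {token}")
    return cmd


def run_all(cmds, max_workers=4, runner=subprocess.call):
    """Run the commands with at most `max_workers` at once; return results in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(runner, cmds))


# Dry run: a stand-in runner that reports success without executing anything.
cmds = [build_wget_cmd(f"https://example.com/file{i}.bin", "./out") for i in range(3)]
print(run_all(cmds, max_workers=2, runner=lambda cmd: 0))
# → [0, 0, 0]
```

With the default `runner`, each entry in the returned list is the exit code of the corresponding wget process, so any non-zero value marks a failed download.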