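# DAMO-YOLO Phone Detection Deployment Tutorial: Batch Deployment to a Hundred Devices with Ansible

## 1. Introduction

Picture this: you are the security lead for a large factory or warehouse, and you need to roll out phone detection at hundreds of camera monitoring points to stop employees from carrying phones into the production area. The traditional approach is to log in to each server in turn, install the environment, configure the model, and start the service. Just thinking about it is exhausting.

This article shows how to combine the DAMO-YOLO phone detection model with Ansible to deploy to a hundred devices in one go. It is not a theoretical discussion; it is a hands-on approach I have validated in several projects.

DAMO-YOLO is a lightweight object detection model from Alibaba DAMO Academy. The variant used here is tuned for phone detection, with an AP@0.5 of 88.8% and an inference time of 3.83 ms. The model is fast enough; the real challenge is deploying it at scale.

In this article you will learn:

- DAMO-YOLO deployment on a single server (the basics)
- Writing the Ansible deployment scripts (the core)
- Batch deployment to a hundred devices (advanced)
- Monitoring and maintenance after deployment (keeping it running)

Whether you are an operations engineer, an algorithm engineer, or a project lead, this approach can save you a lot of time.

## 2. Single-Machine Deployment: Building the DAMO-YOLO Service from Scratch

Before talking about batch deployment, make sure a single server runs correctly. Every automated deployment builds on this.

### 2.1 Environment Preparation and Quick Install

Log in to the target server. I recommend Ubuntu 20.04 or CentOS 7 and later. The full installation steps (shown for Ubuntu) are:

```bash
# 1. Update the system and install base dependencies
sudo apt-get update
sudo apt-get install -y python3-pip python3-dev git wget curl

# 2. Create the project directory
mkdir -p /opt/ai-models
cd /opt/ai-models

# 3. Clone the DAMO-YOLO project
git clone https://github.com/modelscope/modelscope.git
cd modelscope/examples/pytorch/object-detection/damo-yolo-phone

# 4. Install Python dependencies (versions treated as minimums; quote them so the shell does not interpret ">")
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip3 install "modelscope>=1.34.0" "gradio>=4.0.0" "opencv-python>=4.8.0" "easydict>=1.10"

# 5. Download the pretrained model
python3 -c "from modelscope import snapshot_download; snapshot_download('damo/cv_tinynas_object-detection_damoyolo_phone', cache_dir='/opt/ai-models')"
```

A few points to note:

- The model is downloaded automatically to `/opt/ai-models/iic/cv_tinynas_object-detection_damoyolo_phone/`.
- If your network is restricted, download the model files in advance and place them there manually.
- PyTorch 2.0 or later is recommended, matched to your CUDA version.

### 2.2 Starting and Verifying the Service

Once installation finishes, start the web service:

```bash
# Enter the project directory
cd /opt/ai-models/modelscope/examples/pytorch/object-detection/damo-yolo-phone

# Create the start script (quoted 'EOF' keeps $ expressions literal inside the script)
cat > start.sh << 'EOF'
#!/bin/bash
cd "$(dirname "$0")"
nohup python3 app.py > service.log 2>&1 &
echo $! > service.pid
echo "Service started, PID: $(cat service.pid)"
echo "URL: http://$(hostname -I | awk '{print $1}'):7860"
EOF
chmod +x start.sh

# Start the service
./start.sh

# Check the service status
sleep 3
curl -s http://localhost:7860 | grep -qi gradio && echo "Service started successfully" || echo "Service failed to start"
```

Now open `http://<your-server-IP>:7860` in a browser and you should see the Gradio web interface. Upload an image that contains a phone; the system should mark the phone's position and show the confidence score.

### 2.3 Python API Example

Besides the web interface, you will usually want to integrate detection into an existing system through an API:

```python
import os
import tempfile

import cv2
import numpy as np
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks


class PhoneDetector:
    def __init__(self, model_path="/opt/ai-models"):
        """Initialize the phone detector."""
        self.detector = pipeline(
            Tasks.domain_specific_object_detection,
            model="damo/cv_tinynas_object-detection_damoyolo_phone",
            cache_dir=model_path,
            trust_remote_code=True,
        )

    def detect_image(self, image_path):
        """Detect phones in a single image file."""
        result = self.detector(image_path)
        return self._parse_result(result)

    def detect_frame(self, frame):
        """Detect phones in a video frame (BGR numpy array)."""
        # Use a unique temporary file so concurrent calls do not overwrite each other
        fd, temp_path = tempfile.mkstemp(suffix=".jpg")
        os.close(fd)
        try:
            cv2.imwrite(temp_path, frame)
            result = self.detector(temp_path)
        finally:
            os.remove(temp_path)
        return self._parse_result(result)

    def _parse_result(self, result):
        """Convert the raw pipeline output into a list of detections."""
        if "scores" not in result:
            return []

        detections = []
        for i in range(len(result["scores"])):
            if result["scores"][i] > 0.5:  # confidence threshold
                detections.append({
                    "bbox": np.asarray(result["boxes"][i]).tolist(),
                    "score": float(result["scores"][i]),
                    "label": "phone",
                })
        return detections


# Usage example
if __name__ == "__main__":
    detector = PhoneDetector()

    # Detect phones in an image
    result = detector.detect_image("test_image.jpg")
    print(f"Detected {len(result)} phone(s)")

    # Draw the detection boxes on the image
    img = cv2.imread("test_image.jpg")
    for det in result:
        x1, y1, x2, y2 = map(int, det["bbox"])
        cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(img, f"phone: {det['score']:.2f}", (x1, y1 - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    cv2.imwrite("result.jpg", img)
```

This class wraps the basic detection functionality, so you can plug it straight into a monitoring system or a streaming pipeline.

## 3. Designing the Ansible Automation

With the single machine working, the real challenge begins: how do you replicate it quickly onto 100 servers? Doing it by hand is clearly unrealistic, and this is where Ansible comes in.

### 3.1 Setting Up the Ansible Environment

Ansible is an agentless automation tool. You install it only on the control node, and it manages all target nodes over SSH. Set up the control node as follows:

```bash
# Run on the control node (your own machine or a jump host)

# 1. Install Ansible
sudo apt-get install -y ansible sshpass

# 2. Create the project directory structure
mkdir -p ~/ansible-damo-yolo
cd ~/ansible-damo-yolo
mkdir -p inventories group_vars roles/damo-yolo/{tasks,files,templates,vars}

# 3. Create the inventory file
cat > inventories/production.ini << 'EOF'
[all:vars]
ansible_user=root
# SSH keys are recommended; the password here is only an example
ansible_ssh_pass=your_password
ansible_python_interpreter=/usr/bin/python3

[damo_servers]
server1 ansible_host=192.168.1.101
server2 ansible_host=192.168.1.102
server3 ansible_host=192.168.1.103
# ... add more servers here

[damo_servers:vars]
model_cache_dir=/opt/ai-models
service_port=7860
EOF

# 4. Test connectivity
ansible -i inventories/production.ini damo_servers -m ping
```

If every server answers `pong`, the connection works.

Important security note: in production, use SSH key authentication rather than passwords.

### 3.2 Writing the Ansible Deployment Role

Roles are the core of Ansible. We package the deployment logic into a reusable role. Create the role's task file:

```yaml
# ~/ansible-damo-yolo/roles/damo-yolo/tasks/main.yml
---
- name: Check system requirements
  fail:
    msg: "The operating system must be Ubuntu 20.04+ or CentOS 7+"
  when: >
    (ansible_distribution != "Ubuntu" or ansible_distribution_major_version|int < 20) and
    (ansible_distribution != "CentOS" or ansible_distribution_major_version|int < 7)

# Package names below are the Debian/Ubuntu ones; CentOS uses
# different names (for example python3-devel and mesa-libGL).
- name: Install system dependencies
  package:
    name:
      - python3-pip
      - python3-dev
      - git
      - wget
      - curl
      - libgl1-mesa-glx  # OpenCV dependency
    state: present

- name: Create directory structure
  file:
    path: "{{ item }}"
    state: directory
    owner: root
    group: root
    mode: "0755"
  loop:
    - "{{ model_cache_dir }}"
    - /opt/damo-yolo-service

- name: Clone project code
  git:
    repo: https://github.com/modelscope/modelscope.git
    dest: /opt/modelscope
    version: master
    force: yes

- name: Copy service files
  copy:
    src: "{{ role_path }}/files/"
    dest: /opt/damo-yolo-service
    owner: root
    group: root
    mode: "0755"

- name: Install Python dependencies
  pip:
    requirements: /opt/damo-yolo-service/requirements.txt
    executable: pip3

- name: Download model files (if not already present)
  shell: |
    python3 -c "from modelscope import snapshot_download; snapshot_download('damo/cv_tinynas_object-detection_damoyolo_phone', cache_dir='{{ model_cache_dir }}')"
  args:
    executable: /bin/bash
    # Skip the download when the model directory already exists
    creates: "{{ model_cache_dir }}/iic/cv_tinynas_object-detection_damoyolo_phone"

- name: Configure the system service
  template:
    src: damo-yolo.service.j2
    dest: /etc/systemd/system/damo-yolo.service
    owner: root
    group: root
    mode: "0644"
  notify: reload systemd

- name: Start the service
  systemd:
    name: damo-yolo
    state: started
    enabled: yes
    daemon_reload: yes

- name: Verify service status
  uri:
    url: "http://localhost:{{ service_port }}"
    timeout: 10
  register: service_check
  until: service_check.status == 200
  retries: 5
  delay: 3
```

### 3.3 Preparing the Role Files

Now create the files the role needs:

```bash
# 1. Create requirements.txt (versions treated as minimums)
cat > ~/ansible-damo-yolo/roles/damo-yolo/files/requirements.txt << 'EOF'
torch>=2.0.0
torchvision>=0.15.0
modelscope>=1.34.0
gradio>=4.0.0
opencv-python>=4.8.0
easydict>=1.10
numpy>=1.21.0
pillow>=9.0.0
EOF

# 2. Create the start script
cat > ~/ansible-damo-yolo/roles/damo-yolo/files/start.sh << 'EOF'
#!/bin/bash
cd /opt/damo-yolo-service
# Activate the virtualenv only if one exists; the role installs packages system-wide
if [ -f /opt/damo-yolo-service/venv/bin/activate ]; then
    source /opt/damo-yolo-service/venv/bin/activate
fi
# exec lets systemd supervise the Python process directly
exec python3 app.py
EOF
chmod +x ~/ansible-damo-yolo/roles/damo-yolo/files/start.sh

# 3. Create the main application file
cat > ~/ansible-damo-yolo/roles/damo-yolo/files/app.py << 'EOF'
import os
import time

import cv2
import gradio as gr
import numpy as np
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Initialize the detector
MODEL_PATH = os.getenv("MODEL_CACHE_DIR", "/opt/ai-models")
detector = pipeline(
    Tasks.domain_specific_object_detection,
    model="damo/cv_tinynas_object-detection_damoyolo_phone",
    cache_dir=MODEL_PATH,
    trust_remote_code=True,
)


def detect_phone(image):
    """Detect phones in an image."""
    if image is None:
        return None, "Please upload an image"

    # Convert the input to a numpy array
    img = image if isinstance(image, np.ndarray) else np.array(image)

    # Save a temporary file; Gradio delivers RGB while OpenCV writes BGR
    temp_path = "/tmp/detect_temp.jpg"
    cv2.imwrite(temp_path, cv2.cvtColor(img, cv2.COLOR_RGB2BGR))

    # Run detection
    start_time = time.time()
    result = detector(temp_path)
    inference_time = (time.time() - start_time) * 1000  # milliseconds

    # Parse the result
    detections = []
    if "scores" in result:
        for i in range(len(result["scores"])):
            if result["scores"][i] > 0.3:  # adjustable confidence threshold
                bbox = result["boxes"][i]
                detections.append({
                    "bbox": [int(bbox[0]), int(bbox[1]), int(bbox[2]), int(bbox[3])],
                    "score": float(result["scores"][i]),
                    "label": "phone",
                })

    # Draw the detection boxes
    output_img = img.copy()
    for det in detections:
        x1, y1, x2, y2 = det["bbox"]
        cv2.rectangle(output_img, (x1, y1), (x2, y2), (0, 255, 0), 3)
        label = f"{det['label']}: {det['score']:.2f}"
        cv2.putText(output_img, label, (x1, y1 - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)

    # Clean up the temporary file
    os.remove(temp_path)

    info = f"Detected {len(detections)} phone(s), inference time: {inference_time:.2f}ms"
    return output_img, info


# Build the Gradio interface
with gr.Blocks(title="DAMO-YOLO Phone Detection System") as demo:
    gr.Markdown("# DAMO-YOLO Real-Time Phone Detection System")
    gr.Markdown("Upload an image to detect the phones in it")

    with gr.Row():
        with gr.Column():
            input_image = gr.Image(label="Upload image", type="numpy")
            detect_btn = gr.Button("Detect", variant="primary")
        with gr.Column():
            output_image = gr.Image(label="Detection result")
            output_info = gr.Textbox(label="Detection info")

    # Example images (shown only if the files exist on this host)
    example_files = [
        "/opt/damo-yolo-service/examples/phone1.jpg",
        "/opt/damo-yolo-service/examples/phone2.jpg",
    ]
    examples = [[p] for p in example_files if os.path.exists(p)]
    if examples:
        gr.Examples(examples=examples, inputs=[input_image], label="Example images")

    detect_btn.click(
        fn=detect_phone,
        inputs=[input_image],
        outputs=[output_image, output_info],
    )

if __name__ == "__main__":
    demo.launch(
        server_name="0.0.0.0",
        server_port=int(os.getenv("SERVICE_PORT", "7860")),
        share=False,
    )
EOF

# 4. Create the systemd service template
cat > ~/ansible-damo-yolo/roles/damo-yolo/templates/damo-yolo.service.j2 << 'EOF'
[Unit]
Description=DAMO-YOLO Phone Detection Service
After=network.target

[Service]
Type=simple
User=root
WorkingDirectory=/opt/damo-yolo-service
Environment=MODEL_CACHE_DIR={{ model_cache_dir }}
Environment=SERVICE_PORT={{ service_port }}
ExecStart=/opt/damo-yolo-service/start.sh
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
SyslogIdentifier=damo-yolo

[Install]
WantedBy=multi-user.target
EOF
```

### 3.4 Creating the Main Deployment Playbook

Finally, create the main playbook that calls the role:

```yaml
# ~/ansible-damo-yolo/deploy.yml
---
- name: Deploy the DAMO-YOLO phone detection service
  hosts: damo_servers
  become: yes
  gather_facts: yes

  vars:
    model_cache_dir: "{{ hostvars[inventory_hostname].model_cache_dir | default('/opt/ai-models') }}"
    service_port: "{{ hostvars[inventory_hostname].service_port | default(7860) }}"

  handlers:
    - name: reload systemd
      systemd:
        daemon_reload: yes

  roles:
    - role: damo-yolo

  tasks:
    - name: Deployment finished notice
      debug:
        msg: |
          DAMO-YOLO service deployed
          URL: http://{{ ansible_host }}:{{ service_port }}
          Model path: {{ model_cache_dir }}
          Service status: systemctl status damo-yolo
```

## 4. Batch Deployment to a Hundred Devices in Practice

With the Ansible configuration complete, we can start deploying in bulk. Here are a few practical techniques.

### 4.1 Staged Deployment Strategy

Deploying to 100 servers in a single shot is too risky. I recommend deploying in stages:

```bash
# 1. Deploy to 3 test servers first
#    (list the hosts explicitly: a colon in --limit means "union", not a range)
ansible-playbook -i inventories/production.ini deploy.yml --limit 'server1,server2,server3'

# 2. Verify the test servers
for ip in 192.168.1.101 192.168.1.102 192.168.1.103; do
    echo "Testing server $ip:"
    curl -s "http://$ip:7860" | grep -qi gradio && echo "✓ service OK" || echo "✗ service error"
done

# 3. If the tests pass, deploy the remaining servers in batches
#    of 10 to limit network and resource pressure
for batch in {1..10}; do
    echo "Deploying batch $batch..."
    # Adjust this to match your own server list
    start=$((batch * 10 - 9))
    end=$((batch * 10))
    limit=$(seq -f "server%g" -s, "$start" "$end")
    ansible-playbook -i inventories/production.ini deploy.yml --limit "$limit"

    # Wait 2 minutes after each batch to let the systems settle
    sleep 120

    # Verify this batch
    echo "Verifying batch $batch..."
    # Add verification logic here
done
```

### 4.2 Tuning Parallel Deployment

By default Ansible runs each task on at most 5 hosts at a time (`forks = 5`) and waits for every host to finish a task before moving to the next one. For large rollouts we can raise the parallelism and let fast hosts run ahead:

```yaml
# ~/ansible-damo-yolo/deploy-parallel.yml
---
- name: Deploy the DAMO-YOLO service in parallel
  hosts: damo_servers
  become: yes
  gather_facts: yes
  serial: 10       # process 10 hosts per batch
  strategy: free   # free strategy: hosts that finish early keep going

  vars:
    model_cache_dir: /opt/ai-models
    service_port: 7860

  # The role notifies this handler, so it must be defined in this play too
  handlers:
    - name: reload systemd
      systemd:
        daemon_reload: yes

  roles:
    - role: damo-yolo

  tasks:
    - name: Quick health check
      uri:
        url: "http://{{ ansible_host }}:{{ service_port }}"
        timeout: 5
      register: health_check
      ignore_errors: yes

    - name: Record the deployment result
      debug:
        msg: "Server {{ ansible_host }} deployment {{ 'succeeded' if health_check.status | default(0) == 200 else 'failed' }}"
```

Run the parallel deployment:

```bash
# Use 20 parallel processes
ansible-playbook -i inventories/production.ini deploy-parallel.yml -f 20
```

Note that the effective parallelism is the smaller of `serial` and the fork count, so raise `serial` as well if you want all 20 forks to be used.

### 4.3 Deployment Monitoring and Log Collection

Real-time monitoring matters during a rollout. Create a monitoring script:

```bash
#!/bin/bash
# ~/ansible-damo-yolo/monitor-deploy.sh

LOG_FILE="deploy-$(date +%Y%m%d-%H%M%S).log"
SERVER_LIST="inventories/production.ini"

echo "=== Batch deployment monitoring started - $(date) ===" | tee -a "$LOG_FILE"

# 1. Pre-deployment checks
echo "=== Pre-deployment system check ===" | tee -a "$LOG_FILE"
ansible -i "$SERVER_LIST" damo_servers -m shell -a '
    echo "Hostname: $(hostname)";
    echo "Memory: $(free -h | awk "/Mem/ {print \$2}")";
    echo "Free disk: $(df -h / | tail -1 | awk "{print \$4}")";
    echo "Python version: $(python3 --version 2>/dev/null || echo "not installed")";
    echo "Listeners on port 7860: $(ss -tlnp | grep ":7860" | wc -l)"
' | tee -a "$LOG_FILE"

# 2. Run the deployment
echo -e "\n=== Deployment started ===" | tee -a "$LOG_FILE"
start_time=$(date +%s)
ansible-playbook -i "$SERVER_LIST" deploy.yml | tee -a "$LOG_FILE"
end_time=$(date +%s)
duration=$((end_time - start_time))

# 3. Post-deployment verification
echo -e "\n=== Post-deployment verification ===" | tee -a "$LOG_FILE"
success_count=0
total_count=0

# Read "name ip" pairs from inventory lines such as: server1 ansible_host=192.168.1.101
while read -r server ip; do
    total_count=$((total_count + 1))
    if curl -sf --max-time 10 -o /dev/null "http://$ip:7860"; then
        echo "✓ $server ($ip): service OK" | tee -a "$LOG_FILE"
        success_count=$((success_count + 1))
    else
        echo "✗ $server ($ip): service error" | tee -a "$LOG_FILE"
        # Try to collect error logs (stdin redirected so ansible does not consume the loop input)
        ansible -i "$SERVER_LIST" "$server" -m shell -a '
            echo "Service status:";
            systemctl status damo-yolo --no-pager -l;
            echo "Last 50 log lines:";
            journalctl -u damo-yolo -n 50 --no-pager
        ' < /dev/null | tee -a "$LOG_FILE"
    fi
done < <(awk '/ansible_host=/ {split($2, kv, "="); print $1, kv[2]}' "$SERVER_LIST")

# 4. Generate the deployment report
echo -e "\n=== Deployment report ===" | tee -a "$LOG_FILE"
echo "Deployment time: ${duration}s" | tee -a "$LOG_FILE"
echo "Total servers: ${total_count}" | tee -a "$LOG_FILE"
echo "Succeeded: ${success_count}" | tee -a "$LOG_FILE"
echo "Failed: $((total_count - success_count))" | tee -a "$LOG_FILE"
if [ "$total_count" -gt 0 ]; then
    echo "Success rate: $((success_count * 100 / total_count))%" | tee -a "$LOG_FILE"
fi

if [ "$success_count" -eq "$total_count" ]; then
    echo "All servers deployed successfully" | tee -a "$LOG_FILE"
else
    echo "⚠️ Some servers failed to deploy, please check the log" | tee -a "$LOG_FILE"
    exit 1
fi
```

## 5. Production Optimization and Maintenance

Finishing the deployment is only the beginning. A production environment also needs stability, monitoring, and a way to update.

### 5.1 Performance Tuning

DAMO-YOLO is already fast, but there is room for further tuning:

```python
# ~/ansible-damo-yolo/roles/damo-yolo/files/optimized_app.py
import numpy as np
import torch
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks


class OptimizedPhoneDetector:
    def __init__(self, model_path="/opt/ai-models"):
        """Optimized phone detector."""
        # Use GPU acceleration if available
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        print(f"Using device: {self.device}")

        # Specify the device when loading the model
        self.detector = pipeline(
            Tasks.domain_specific_object_detection,
            model="damo/cv_tinynas_object-detection_damoyolo_phone",
            cache_dir=model_path,
            trust_remote_code=True,
            device=self.device,
        )

        # Warm up the model
        self._warm_up()

    def _warm_up(self):
        """Warm up the model to avoid first-inference latency."""
        dummy_input = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
        _ = self.detect(dummy_input)
        print("Model warm-up complete")

    def detect(self, image):
        """Optimized detection method."""
        # Batching, caching and similar optimizations can be added here
        result = self.detector(image)
        return self._parse_result(result)

    def batch_detect(self, images):
        """Batch detection."""
        return [self.detect(img) for img in images]

    def _parse_result(self, result, threshold=0.5):
        """Same parsing as PhoneDetector in section 2.3."""
        if "scores" not in result:
            return []
        return [
            {"bbox": np.asarray(box).tolist(), "score": float(score), "label": "phone"}
            for box, score in zip(result["boxes"], result["scores"])
            if score > threshold
        ]


# Update the systemd service configuration to add performance parameters
# ~/ansible-damo-yolo/roles/damo-yolo/templates/damo-yolo-optimized.service.j2
```

### 5.2 Monitoring and Alerting

Deploy Prometheus + Grafana monitoring:

```yaml
# ~/ansible-damo-yolo/roles/damo-yolo/tasks/monitoring.yml
---
- name: Install the Node Exporter monitoring agent
  when: install_monitoring | default(false)
  block:
    - name: Download Node Exporter
      get_url:
        url: https://github.com/prometheus/node_exporter/releases/download/v1.6.0/node_exporter-1.6.0.linux-amd64.tar.gz
        dest: /tmp/node_exporter.tar.gz

    - name: Unpack and install
      unarchive:
        src: /tmp/node_exporter.tar.gz
        dest: /opt/
        remote_src: yes
        creates: /opt/node_exporter-1.6.0.linux-amd64/node_exporter

    - name: Create the system service
      copy:
        content: |
          [Unit]
          Description=Node Exporter
          After=network.target

          [Service]
          Type=simple
          User=root
          ExecStart=/opt/node_exporter-1.6.0.linux-amd64/node_exporter
          Restart=always

          [Install]
          WantedBy=multi-user.target
        dest: /etc/systemd/system/node-exporter.service

    - name: Start Node Exporter
      systemd:
        name: node-exporter
        state: started
        enabled: yes
        daemon_reload: yes

- name: Configure the application monitoring endpoint
  copy:
    content: |
      from prometheus_client import start_http_server, Counter, Gauge, Histogram
      import time

      # Define the monitoring metrics
      REQUESTS_TOTAL = Counter('damoyolo_requests_total', 'Total requests')
      REQUEST_DURATION = Histogram('damoyolo_request_duration_seconds', 'Request duration')
      DETECTIONS_TOTAL = Counter('damoyolo_detections_total', 'Total detections')
      ACTIVE_CONNECTIONS = Gauge('damoyolo_active_connections', 'Active connections')

      class MonitoringMiddleware:
          def __init__(self, app):
              self.app = app

          def __call__(self, environ, start_response):
              REQUESTS_TOTAL.inc()
              start_time = time.time()

              def monitoring_start_response(status, headers, exc_info=None):
                  duration = time.time() - start_time
                  REQUEST_DURATION.observe(duration)
                  return start_response(status, headers, exc_info)

              return self.app(environ, monitoring_start_response)

      # Start the metrics server on port 7861
      start_http_server(7861)
    dest: /opt/damo-yolo-service/monitoring.py
```

The middleware above follows the WSGI calling convention. Gradio serves its app over ASGI, so in practice it is simpler to import these metrics in `app.py` and update them directly inside `detect_phone`.

### 5.3 Automated Updates and Rollback

Create the update and rollback playbook:

```yaml
# ~/ansible-damo-yolo/update.yml
---
- name: Update the DAMO-YOLO service
  hosts: damo_servers
  become: yes
  serial: 5  # update 5 hosts at a time

  vars:
    backup_dir: "/opt/damo-yolo-backup/{{ ansible_date_time.date }}"

  tasks:
    - name: Create the backup directory
      file:
        path: "{{ backup_dir }}"
        state: directory

    # remote_src copies within the target host; synchronize in push mode
    # would read the source from the control node instead
    - name: Back up the current version
      copy:
        src: /opt/damo-yolo-service/
        dest: "{{ backup_dir }}/"
        remote_src: yes

    - name: Stop the current service
      systemd:
        name: damo-yolo
        state: stopped

    - name: Update the code
      git:
        repo: https://github.com/modelscope/modelscope.git
        dest: /opt/modelscope
        version: master
        force: yes

    - name: Update dependencies
      pip:
        requirements: /opt/damo-yolo-service/requirements.txt
        executable: pip3

    - name: Start the updated service
      systemd:
        name: damo-yolo
        state: started

    - name: Verify the update
      uri:
        url: http://localhost:7860
        timeout: 10
      register: update_check
      until: update_check.status == 200
      retries: 3
      delay: 5

    - name: Record the update result
      debug:
        msg: "Server {{ ansible_host }} update {{ 'succeeded' if update_check.status | default(0) == 200 else 'failed' }}"

- name: Roll back to the previous version
  hosts: damo_servers
  become: yes
  # "never" keeps this play out of a normal update run
  tags: [never, rollback]

  tasks:
    - name: Find existing backups
      find:
        paths: /opt/damo-yolo-backup
        patterns: "*"
        file_type: directory
      register: backups

    - name: Get the latest backup path
      set_fact:
        latest_backup: "{{ (backups.files | sort(attribute='mtime') | last).path }}"

    - name: Stop the service
      systemd:
        name: damo-yolo
        state: stopped

    - name: Restore the backup
      copy:
        src: "{{ latest_backup }}/"
        dest: /opt/damo-yolo-service/
        remote_src: yes

    - name: Start the service
      systemd:
        name: damo-yolo
        state: started
```

A normal run of `update.yml` performs only the update. To roll back, run the same playbook with `--tags rollback`.

## 6. Summary

With the complete approach in this article, you now have the full workflow for deploying the DAMO-YOLO phone detection system, from a single machine to a hundred devices. Let's recap the key points.

What each part of the stack contributes:

- DAMO-YOLO provides high-performance phone detection (88.8% AP@0.5, 3.83 ms inference).
- Ansible turns the deployment into infrastructure as code, making it repeatable and version-controlled.
- Systemd keeps the service running by restarting it automatically after a failure.
- The monitoring stack provides runtime visibility and alerting.

Deployment efficiency comparison:

- Manual deployment: 100 servers × 30 minutes each = 3000 minutes (50 hours)
- Ansible automation: 100 servers × 2 minutes each = 200 minutes (about 3.3 hours)
- Efficiency gain: 15×

Practical recommendations:

- Start small: deploy to 3 to 5 test machines first and validate the whole process.
- Deploy in batches of 10 to 20 machines to avoid network and resource bottlenecks.
- Set up monitoring right after deployment so problems are caught early.
- Establish an update process to keep the system secure and stable.
- Maintain documentation: record every configuration change and every fix.

Common problems:

- Port conflict: change the `service_port` variable.
- Slow model download: download the model files in advance and distribute them.
- Out of memory: reduce the batch size or use a lighter model.
- Service fails to start: check the logs with `journalctl -u damo-yolo -f`.

This approach is not limited to DAMO-YOLO phone detection. With suitable adjustments it can deploy any AI model service. The core idea of automated deployment is: write once, run anywhere; optimize once, keep benefiting.

In real projects I have used this approach to manage AI model deployments on more than 200 servers, cutting deployment time from weeks to hours while keeping deployments consistent and reliable. I hope this hands-on experience is useful to you.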
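
The efficiency comparison in the summary is plain arithmetic, and it can be handy to redo it with your own fleet size and per-host timings. The following is a minimal sketch under the same simplifying assumption the comparison makes (hosts are handled one after another); the function name `deployment_minutes` is hypothetical and not part of any tool used above.

```python
def deployment_minutes(hosts: int, minutes_per_host: float) -> float:
    """Total minutes when hosts are handled one after another (assumption)."""
    return hosts * minutes_per_host


# Figures taken from the comparison: 100 hosts, 30 min manual vs 2 min automated
manual = deployment_minutes(100, 30)
automated = deployment_minutes(100, 2)

print(f"manual:    {manual:.0f} min = {manual / 60:.1f} h")
print(f"automated: {automated:.0f} min = {automated / 60:.1f} h")
print(f"speed-up:  {manual / automated:.0f}x")
```

Because Ansible processes several hosts at once, real wall-clock time for the automated case is usually lower than this sequential estimate; treat the result as an upper bound for planning.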