first commit

2026-06-03 12:42:47 +08:00
commit ec23799148
339 changed files with 57120 additions and 0 deletions
--- a/rtdetrv2_pytorch/tools/README.md
+++ b/rtdetrv2_pytorch/tools/README.md
@@ -0,0 +1,124 @@
+### Getting Started: A Complete Workflow
+
+This guide provides a complete, step-by-step workflow from setting up the environment to training, exporting, and running inference with TensorRT.
+
+#### **1. Environment Setup with Docker (Recommended)**
+
+Using Docker is the recommended way to keep the dependencies and CUDA toolkit version consistent across machines, which avoids most "it works on my machine" issues. The NVIDIA driver itself still comes from the host.
+
+*   **Step 1.1: Build and Run the Container**
+
+    From the project's root directory, run `docker compose up`. This builds the image from the `Dockerfile` (`--build`) and starts the service in the background (`-d`).
+
+    ```bash
+    docker compose up --build -d
+    ```
+
+*   **Step 1.2: Verify the Container is Running**
+
+    Check that the container is up and running. Note its name (the `NAMES` column) for the next step.
+
+    ```bash
+    docker ps
+    ```
+
+---
+
+#### **2. Training & Evaluation (Using `docker attach`)**
+
+This method directly attaches your terminal to the container's main process. It's simple but requires careful handling to avoid terminating your session.
+
+*   **Step 2.1: Attach to the Container**
+
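+    Attach your terminal to the running container. This works when the container's main process is an interactive bash shell (in Compose, a service with `stdin_open: true` and `tty: true`); you will then land at its prompt.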
+
+    ```bash
+    docker attach <your_container_name>
+    ```
+
+*   **Step 2.2: Run the Training Command**
+
+    Now, *inside the attached shell*, run your training command. Set `--nproc_per_node` to the number of GPUs assigned to the container; `torchrun` starts that many worker processes, one per GPU. **Do not run it in the background (`&`)**.
+
+    ```bash
+    # Example for 4 GPUs assigned to the container
+    torchrun --nproc_per_node=4 --master_port=8989 \
+        tools/train.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml \
+        --amp
+    ```
+
+*   **Step 2.3: Detach from the Session (IMPORTANT!)**
+
+    With your training running, you can safely detach and leave it running.
+
+    **WARNING:** **DO NOT PRESS `Ctrl+C`**. It interrupts the foreground process, which kills the training run, and it stops the whole container if that process is the container's main process.
+
+    To safely detach, press the sequence: **`Ctrl+P`**, followed immediately by **`Ctrl+Q`**.
+
+    You will return to your local terminal, and the container will continue running the training in the background.
+
+*   **Step 2.4: Re-attach to Your Session**
+
+    To check on your training progress, simply run the `docker attach` command again. You will see the live output from your training command.
+
+    ```bash
+    docker attach <your_container_name>
+    ```
+    (Remember to detach with `Ctrl+P`, `Ctrl+Q` when you're done.)
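+
+Progress can also be checked without attaching at all, which removes the risk of an accidental `Ctrl+C`. The sketch below assumes the default `json-file` or `local` logging driver, where `docker logs` replays what the container's main process wrote to its terminal. The helper name `show_progress` and the `DOCKER` override (handy for a dry run) are illustrative and not part of this repository.
+
+```bash
+# Print the last N lines of a container's output without attaching to it.
+# Usage: show_progress <container_name> [lines]
+show_progress() {
+    # DOCKER may be overridden (e.g. DOCKER=echo) to preview the command.
+    "${DOCKER:-docker}" logs --tail "${2:-50}" "$1"
+}
+```
+
+For example, `show_progress <your_container_name> 100` prints the last 100 lines and returns to the prompt.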
+
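+The process count in Step 2.2 is hard-coded to 4. A minimal sketch for deriving it from the GPUs actually visible inside the container, assuming `nvidia-smi` is available there; the helper names `gpu_count` and `train_cmd` are hypothetical, and the command is only printed so it can be reviewed before it is run.
+
+```bash
+# Count GPUs from `nvidia-smi -L` output on stdin (one "GPU <n>: ..." line each).
+gpu_count() {
+    grep -c '^GPU [0-9]'
+}
+
+# Print the torchrun command for <num_gpus> processes and a given <config>.
+train_cmd() {
+    printf 'torchrun --nproc_per_node=%s --master_port=8989 tools/train.py -c %s --amp\n' "$1" "$2"
+}
+```
+
+Inside the container, `train_cmd "$(nvidia-smi -L | gpu_count)" configs/rtdetr/rtdetr_r50vd_6x_coco.yml` prints the command to paste into the attached shell.
+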
+---
+
+#### **3. Exporting & Inference**
+
+For tasks like exporting or running inference, which don't need to run for days, it's safer to use `docker exec` to open a new, separate shell.
+
+*   **Step 3.1: Open a New Shell in the Container**
+    ```bash
+    docker exec -it <your_container_name> bash
+    ```
+
+*   **Step 3.2: Run Export or Inference Commands**
+    Now, inside this new shell, run your commands.
+    ```bash
+    # Export to ONNX
+    python tools/export_onnx.py \
+        -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml \
+        -r path/to/trained_checkpoint.pth \
+        --check
+    ```
+    
+    ```bash
+    # Convert to TensorRT
+    bash tools/onnx2trt.sh /path/to/your/model.onnx
+    ```
+
+    ```bash
+    # Run TensorRT inference
+    python references/deploy/rtdetrv2_tensorrt.py \
+        --engine /path/to/your/model.trt \
+        --image /path/to/your/image.jpg \
+        --output /path/to/save/output.jpg \
+        --threshold 0.5
+    ```
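+
+When many images need to be processed, the inference command in Step 3.2 can be wrapped in a loop. This is a sketch under two assumptions that this README does not confirm: that the engine sits next to the ONNX file with a `.trt` extension, and that one output image per input is wanted. The helper names are hypothetical.
+
+```bash
+# Map /path/model.onnx to /path/model.trt (assumed naming, see above).
+engine_path_for() {
+    printf '%s\n' "${1%.onnx}.trt"
+}
+
+# Map an input image to a file of the same name inside <output_dir>.
+output_path_for() {
+    printf '%s/%s\n' "${2%/}" "$(basename "$1")"
+}
+
+# Run TensorRT inference on every .jpg in <image_dir>.
+# Usage: infer_dir <engine> <image_dir> <output_dir>
+infer_dir() {
+    mkdir -p "$3"
+    for img in "$2"/*.jpg; do
+        [ -e "$img" ] || continue   # skip the unexpanded pattern if no files match
+        python references/deploy/rtdetrv2_tensorrt.py \
+            --engine "$1" --image "$img" \
+            --output "$(output_path_for "$img" "$3")" --threshold 0.5
+    done
+}
+```
+
+A call such as `infer_dir "$(engine_path_for /path/to/your/model.onnx)" /path/to/images /path/to/results` would then process the whole directory.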
+
+### Utilities & Tips
+
+*   **Visualize training with TensorBoard:**
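+    *   Use TensorBoard's default port `6006`; it does not clash with the `torchrun` master port (`8989` in the training example).
+    *   Ensure port `6006` is published to the host in your `docker-compose.yml` (for example, a `ports` entry of `"6006:6006"`).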
+
+    ```bash
+    # Inside the container
+    tensorboard --logdir=path/to/summary/ --host=0.0.0.0 --port=6006
+    ```
+
+*   **Managing the Container Lifecycle:**
+    *   **To temporarily stop** the container without deleting it. Any training in progress is terminated, so it has to be restarted from a saved checkpoint afterwards:
+        ```bash
+        docker compose stop
+        ```
+        You can restart it later with `docker compose start`.
+
+    *   **To stop and remove** the container and its network (volumes are kept unless you add `--volumes`):
+        ```bash
+        docker compose down
+        ```
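+
+To confirm that port `6006` from the TensorBoard tip is actually published, the output of `docker port` can be inspected from the host. A small sketch; `host_port` is a hypothetical helper that extracts the host-side port number from that output.
+
+```bash
+# Read `docker port <container> 6006` output on stdin (lines such as
+# "0.0.0.0:6006") and print the first host port found.
+host_port() {
+    sed -n 's/.*:\([0-9][0-9]*\)$/\1/p' | head -n 1
+}
+```
+
+Running `docker port <your_container_name> 6006 | host_port` prints the host port to open in the browser, or nothing if the port is not published.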