Files
2026-06-03 12:42:47 +08:00
..
2026-06-03 12:42:47 +08:00
2026-06-03 12:42:47 +08:00
2026-06-03 12:42:47 +08:00
2026-06-03 12:42:47 +08:00
2026-06-03 12:42:47 +08:00
2026-06-03 12:42:47 +08:00

Getting Started: A Complete Workflow

This guide provides a complete, step-by-step workflow from setting up the environment to training, exporting, and running inference with TensorRT.

Using Docker is the recommended way to ensure all dependencies, drivers, and CUDA versions are perfectly aligned. This eliminates "it works on my machine" issues.

  • Step 1.1: Build and Run the Container

    From the project's root directory, run docker compose. This will build the image based on the Dockerfile and start the service in the background.

    docker compose up --build -d
    
  • Step 1.2: Verify the Container is Running

    Check that the container is up and running. Note its name for the next step.

    docker ps
    

2. Training & Evaluation (Using docker attach)

This method directly attaches your terminal to the container's main process. It's simple but requires careful handling to avoid terminating your session.

  • Step 2.1: Attach to the Container

    Attach your terminal to the running container. You will be dropped into a bash shell.

    docker attach <your_container_name>
    
  • Step 2.2: Run the Training Command

    Now, inside the attached shell, run your training command. torchrun will automatically use the GPUs assigned to the container. Do not run it in the background (&).

    # Example for 4 GPUs assigned to the container
    torchrun --nproc_per_node=4 --master-port=8989 \
        tools/train.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml \
        --amp
    
  • Step 2.3: Detach from the Session (IMPORTANT!)

    With your training running, you can safely detach and leave it running.

    WARNING: DO NOT PRESS Ctrl+C. This will kill the training process and potentially the entire container.

    To safely detach, press the sequence: Ctrl+P, followed immediately by Ctrl+Q.

    You will return to your local terminal, and the container will continue running the training in the background.

  • Step 2.4: Re-attach to Your Session

    To check on your training progress, simply run the docker attach command again. You will see the live output from your training command.

    docker attach <your_container_name>
    

    (Remember to detach with Ctrl+P, Ctrl+Q when you're done.)


3. Exporting & Inference

For tasks like exporting or running inference, which don't need to run for days, it's safer to use docker exec to open a new, separate shell.

  • Step 3.1: Open a New Shell in the Container

    docker exec -it <your_container_name> bash
    
  • Step 3.2: Run Export or Inference Commands Now, inside this new shell, run your commands.

    # Export to ONNX
    python tools/export_onnx.py \
        -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml \
        -r path/to/trained_checkpoint.pth \
        --check
    
    # Convert to TensorRT
    bash tools/onnx2trt.sh /path/to/your/model.onnx
    
    # RUN TRT Inference
    python references/deploy/rtdetrv2_tensorrt.py \
    --engine /path/to/your/model.trt \
    --image /path/to/your/image.jpg \
    --output /path/to/save/output.jpg \
    --threshold 0.5
    

Utilities & Tips

  • Visualize training with TensorBoard:

    • Use the standard port 6006 to avoid conflicts with training.
    • Ensure the port 6006 is exposed in your docker-compose.yml.
    # Inside the container
    tensorboard --logdir=path/to/summary/ --host=0.0.0.0 --port=6006
    
  • Managing the Container Lifecycle:

    • To temporarily stop the container without deleting it (e.g., to pause training and resume later):

      docker compose stop
      

      You can restart it later with docker compose start.

    • To stop and completely remove the container, network, and volumes:

      docker compose down