Getting Started: A Complete Workflow
This guide provides a complete, step-by-step workflow from setting up the environment to training, exporting, and running inference with TensorRT.
1. Environment Setup with Docker (Recommended)
Using Docker is the recommended way to ensure all dependencies, drivers, and CUDA versions are perfectly aligned. This eliminates "it works on my machine" issues.
- Step 1.1: Build and Run the Container
  From the project's root directory, run docker compose. This builds the image from the Dockerfile and starts the service in the background.

      docker compose up --build -d

- Step 1.2: Verify the Container is Running
  Check that the container is up and running. Note its name for the next step.

      docker ps
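If several containers are running, picking the right name out of the docker ps table by eye is error-prone. The sketch below filters the list by name instead. The helper name find_container and the pattern rtdetr used afterwards are illustrative assumptions; the actual container name depends on your docker-compose.yml.

```shell
# Print the names of running containers whose name contains the given pattern.
# "find_container" is a hypothetical helper, not part of the project.
find_container() {
    # --format with the {{.Names}} template prints one container name per line
    docker ps --format '{{.Names}}' | grep -- "$1"
}
```

For example, CONTAINER=$(find_container rtdetr | head -n 1) would store the first match for use in the later docker attach and docker exec steps.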
2. Training & Evaluation (Using docker attach)
This method directly attaches your terminal to the container's main process. It's simple but requires careful handling to avoid terminating your session.
- Step 2.1: Attach to the Container
  Attach your terminal to the running container. You will be dropped into a bash shell.

      docker attach <your_container_name>

- Step 2.2: Run the Training Command
  Now, inside the attached shell, run your training command. torchrun will automatically use the GPUs assigned to the container. Do not run it in the background (&).

      # Example for 4 GPUs assigned to the container
      torchrun --nproc_per_node=4 --master-port=8989 \
          tools/train.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml \
          --amp

- Step 2.3: Detach from the Session (IMPORTANT!)
  With your training running, you can safely detach and leave it running.

  WARNING: DO NOT PRESS Ctrl+C. This will kill the training process and potentially the entire container.

  To safely detach, press the sequence Ctrl+P, followed immediately by Ctrl+Q. You will return to your local terminal, and the container will continue running the training in the background.

- Step 2.4: Re-attach to Your Session
  To check on your training progress, simply run the docker attach command again. You will see the live output from your training command.

      docker attach <your_container_name>

  (Remember to detach with Ctrl+P, Ctrl+Q when you're done.)
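Two small helpers can make the steps above less fragile. This is a sketch under assumptions, not part of the project: visible_gpus and follow_training are hypothetical names, nvidia-smi is assumed to be available inside the container, and the container is assumed to use a logging driver that docker logs can read (the default one can). The first helper counts the GPUs visible in the container, which is the value Step 2.2 passes to --nproc_per_node. The second follows the training output from the host without attaching, so an accidental Ctrl+C cannot reach the training process.

```shell
# Count the GPUs visible in the current environment.
# "nvidia-smi -L" prints one line starting with "GPU <index>:" per device.
visible_gpus() {
    nvidia-smi -L | grep -c '^GPU '
}

# Follow a container's output read-only, starting from the last 50 lines.
# Ctrl+C here only stops the log viewer; the container keeps running.
follow_training() {
    docker logs --follow --tail 50 "$1"
}
```

Inside the container, torchrun --nproc_per_node="$(visible_gpus)" replaces the hard-coded 4. On the host, follow_training <your_container_name> is an alternative to re-attaching when you only want to read the output.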
3. Exporting & Inference
For tasks like exporting or running inference, which don't need to run for days, it's safer to use docker exec to open a new, separate shell.
- Step 3.1: Open a New Shell in the Container

      docker exec -it <your_container_name> bash

- Step 3.2: Run Export or Inference Commands
  Now, inside this new shell, run your commands.

      # Export to ONNX
      python tools/export_onnx.py \
          -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml \
          -r path/to/trained_checkpoint.pth \
          --check

      # Convert to TensorRT
      bash tools/onnx2trt.sh /path/to/your/model.onnx

      # Run TensorRT inference
      python references/deploy/rtdetrv2_tensorrt.py \
          --engine /path/to/your/model.trt \
          --image /path/to/your/image.jpg \
          --output /path/to/save/output.jpg \
          --threshold 0.5
Utilities & Tips
- Visualize training with TensorBoard:
  - Use the standard port 6006 to avoid conflicts with training.
  - Ensure port 6006 is published to the host (listed under ports) in your docker-compose.yml.

      # Inside the container
      tensorboard --logdir=path/to/summary/ --host=0.0.0.0 --port=6006
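- Managing the Container Lifecycle:
  - To temporarily stop the container without deleting it (e.g., to free the GPUs and come back later):

        docker compose stop

    You can restart it later with docker compose start. Stopping terminates any training process running in the container, so training has to be relaunched after the restart.
  - To stop and remove the container and its network:

        docker compose down

    Named volumes are kept by default; add --volumes (-v) to remove them as well.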
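For the TensorBoard tip above, a quick way to confirm that port 6006 is actually reachable from the host is to ask Docker for the mapping. The sketch below wraps docker port. The helper name tensorboard_address is a hypothetical choice, and the hint it prints assumes the mapping is configured in docker-compose.yml.

```shell
# Print the host address mapped to the container's port 6006,
# or fail with a hint when the port is not published.
# "tensorboard_address" is a hypothetical helper, not part of the project.
tensorboard_address() {
    docker port "$1" 6006 || {
        echo "port 6006 is not published for $1; check docker-compose.yml" >&2
        return 1
    }
}
```

Running tensorboard_address <your_container_name> on the host prints the address to open in a browser when the mapping exists.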