first commit

陈赣
2026-06-03 12:42:47 +08:00
commit ec23799148
339 changed files with 57120 additions and 0 deletions

@@ -0,0 +1,10 @@
FROM nvcr.io/nvidia/pytorch:25.06-py3
WORKDIR /workspace
COPY requirements.txt .
RUN pip install --upgrade pip && \
pip install -r requirements.txt
CMD ["/bin/bash"]

rtdetrv2_pytorch/README.md

@@ -0,0 +1,168 @@
## Quick start
<details >
<summary>Setup</summary>
```shell
pip install -r requirements.txt
```
The following table lists the corresponding `torch` and `torchvision` versions.
| `rtdetr` | `torch` | `torchvision` |
|---|---|---|
| `-` | `2.4` | `0.19` |
| `-` | `2.2` | `0.17` |
| `-` | `2.1` | `0.16` |
| `-` | `2.0` | `0.15` |
</details>
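The pairing in the table above can be checked before installing. The following is a minimal sketch: the helper name `expected_torchvision` is hypothetical, and the mapping only contains the four rows listed in the table.

```python
# torch -> torchvision pairs copied from the version table above.
TORCHVISION_FOR_TORCH = {
    "2.4": "0.19",
    "2.2": "0.17",
    "2.1": "0.16",
    "2.0": "0.15",
}


def expected_torchvision(torch_version: str):
    """Return the torchvision release listed for a torch version, or None."""
    # Keep only "major.minor": drop local suffixes such as "+cu121" and the patch number.
    major_minor = ".".join(torch_version.split("+")[0].split(".")[:2])
    return TORCHVISION_FOR_TORCH.get(major_minor)


if __name__ == "__main__":
    print(expected_torchvision("2.4.0+cu121"))
```

A `None` result only means the version is not in the table; it does not by itself mean the combination is unsupported.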
<details open>
<summary>Fig</summary>
<div align="center">
<img width="500" alt="image" src="https://github.com/user-attachments/assets/437877e9-1d4f-4d30-85e8-aafacfa0ec56">
</div>
</details>
## Model Zoo
### Base models
| Model | Dataset | Input Size | AP<sup>val</sup> | AP<sub>50</sub><sup>val</sup> | #Params(M) | FPS | config| checkpoint |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |:---: |
| **RT-DETRv2-S** | COCO | 640 | **48.1** <font color=green>(+1.6)</font> | **65.1** | 20 | 217 | [config](./configs/rtdetrv2/rtdetrv2_r18vd_120e_coco.yml) | [url](https://github.com/lyuwenyu/storage/releases/download/v0.2/rtdetrv2_r18vd_120e_coco_rerun_48.1.pth) |
| **RT-DETRv2-M**<sup>*</sup> | COCO | 640 | **49.9** <font color=green>(+1.0)</font> | **67.5** | 31 | 161 | [config](./configs/rtdetrv2/rtdetrv2_r34vd_120e_coco.yml) | [url](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r34vd_120e_coco_ema.pth) |
| **RT-DETRv2-M** | COCO | 640 | **51.9** <font color=green>(+0.6)</font> | **69.9** | 36 | 145 | [config](./configs/rtdetrv2/rtdetrv2_r50vd_m_7x_coco.yml) | [url](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r50vd_m_7x_coco_ema.pth) |
| **RT-DETRv2-L** | COCO | 640 | **53.4** <font color=green>(+0.3)</font> | **71.6** | 42 | 108 | [config](./configs/rtdetrv2/rtdetrv2_r50vd_6x_coco.yml) | [url](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r50vd_6x_coco_ema.pth) |
| **RT-DETRv2-X** | COCO | 640 | 54.3 | **72.8** <font color=green>(+0.1)</font> | 76 | 74 | [config](./configs/rtdetrv2/rtdetrv2_r101vd_6x_coco.yml) | [url](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r101vd_6x_coco_from_paddle.pth) |
<!-- rtdetrv2_hgnetv2_l | COCO | 640 | 52.9 | 71.5 | 32 | 114 | [url<sup>*</sup>](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_hgnetv2_l_6x_coco_from_paddle.pth)
rtdetrv2_hgnetv2_x | COCO | 640 | 54.7 | 72.9 | 67 | 74 | [url<sup>*</sup>](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_hgnetv2_x_6x_coco_from_paddle.pth)
rtdetrv2_hgnetv2_h | COCO | 640 | 56.3 | 74.8 | 123 | 40 | [url<sup>*</sup>](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_hgnetv2_h_6x_coco_from_paddle.pth)
rtdetrv2_18vd | COCO+Objects365 | 640 | 49.0 | 66.5 | 20 | 217 | [url<sup>*</sup>](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r18vd_5x_coco_objects365_from_paddle.pth)
rtdetrv2_r50vd | COCO+Objects365 | 640 | 55.2 | 73.4 | 42 | 108 | [url<sup>*</sup>](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r50vd_2x_coco_objects365_from_paddle.pth)
rtdetrv2_r101vd | COCO+Objects365 | 640 | 56.2 | 74.5 | 76 | 74 | [url<sup>*</sup>](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r101vd_2x_coco_objects365_from_paddle.pth)
-->
**Notes:**
- `AP` is evaluated on the *MSCOCO val2017* dataset.
- `FPS` is evaluated on a single T4 GPU with $batch\\_size = 1$, $fp16$, and $TensorRT>=8.5.1$.
- `COCO + Objects365` in the table means the model was fine-tuned on `COCO` starting from weights pretrained on `Objects365`.
### Models of discrete sampling
| Model | Sampling Method | AP<sup>val</sup> | AP<sub>50</sub><sup>val</sup> | config | checkpoint |
| :---: | :---: | :---: | :---: | :---: | :---: |
| **RT-DETRv2-S_dsp** | discrete_sampling | 47.4 | 64.8 <font color=red>(-0.1)</font> | [config](./configs/rtdetrv2/rtdetrv2_r18vd_dsp_3x_coco.yml) | [url](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r18vd_dsp_3x_coco.pth) |
| **RT-DETRv2-M**<sup>*</sup>**_dsp** | discrete_sampling | 49.2 | 67.1 <font color=red>(-0.4)</font> | [config](./configs/rtdetrv2/rtdetrv2_r34vd_dsp_1x_coco.yml) | [url](https://github.com/lyuwenyu/storage/releases/download/v0.1/rrtdetrv2_r34vd_dsp_1x_coco.pth) |
| **RT-DETRv2-M_dsp** | discrete_sampling | 51.4 | 69.7 <font color=red>(-0.2)</font> | [config](./configs/rtdetrv2/rtdetrv2_r50vd_m_dsp_3x_coco.yml) | [url](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r50vd_m_dsp_3x_coco.pth) |
| **RT-DETRv2-L_dsp** | discrete_sampling | 52.9 | 71.3 <font color=red>(-0.3)</font> | [config](./configs/rtdetrv2/rtdetrv2_r50vd_dsp_1x_coco.yml) | [url](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r50vd_dsp_1x_coco.pth) |
<!-- **rtdetrv2_r18vd_dsp1** | discrete_sampling | 21600 | 46.3 | 63.9 | [url](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r18vd_dsp1_1x_coco.pth) -->
<!-- rtdetrv2_r18vd_dsp1 | discrete_sampling | 21600 | 45.5 | 63.0 | 4.34 | [url](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r18vd_dsp1_120e_coco.pth) -->
<!-- 4.3 -->
**Notes:**
- The impact on inference speed depends on the specific device and software.
- `*_dsp*` models inherit the knowledge of the corresponding `*_sp*` models and adapt it to the `discrete_sampling` strategy. **You can use TensorRT 8.4 (or even older versions) to run inference with these models.**
<!-- - `grid_sampling` use `grid_sample` to sample attention map, `discrete_sampling` use `index_select` method to sample attention map. -->
### Ablation on sampling points
<!-- Flexible samping strategy in cross attenstion layer for devices that do **not** optimize (or not support) `grid_sampling` well. You can choose models based on specific scenarios and the trade-off between speed and accuracy. -->
| Model | Sampling Method | #Points | AP<sup>val</sup> | AP<sub>50</sub><sup>val</sup> | checkpoint |
| :---: | :---: | :---: | :---: | :---: | :---: |
| **rtdetrv2_r18vd_sp1** | grid_sampling | 21,600 | 47.3 | 64.3 <font color=red>(-0.6)</font> | [url](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r18vd_sp1_120e_coco.pth) |
| **rtdetrv2_r18vd_sp2** | grid_sampling | 43,200 | 47.7 | 64.7 <font color=red>(-0.2)</font> | [url](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r18vd_sp2_120e_coco.pth) |
| **rtdetrv2_r18vd_sp3** | grid_sampling | 64,800 | 47.8 | 64.8 <font color=red>(-0.1)</font> | [url](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r18vd_sp3_120e_coco.pth) |
| rtdetrv2_r18vd(_sp4) | grid_sampling | 86,400 | 47.9 | 64.9 | [url](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r18vd_120e_coco.pth) |
**Notes:**
- The impact on inference speed depends on the specific device and software.
- `#Points` is the total number of sampling points in the decoder for a single image at inference time.
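The `#Points` values in the ablation table are consistent with a simple product, sketched below. This is an inferred decomposition, not something the README states: it assumes 300 queries and 3 decoder layers (as in the r18vd configs), 8 attention heads in the decoder (an assumption; the configs in this commit only show `nhead: 8` for the encoder), and a `num_points` list with one entry per feature level.

```python
def total_sampling_points(num_queries, num_layers, num_heads, num_points):
    """Total sampled locations per image across all decoder layers.

    num_points is a per-level list such as [4, 4, 4]; every query in every
    head samples sum(num_points) locations in each decoder layer.
    """
    return num_queries * num_layers * num_heads * sum(num_points)


if __name__ == "__main__":
    for points in (1, 2, 3, 4):
        total = total_sampling_points(300, 3, 8, [points] * 3)
        print(f"sp{points}: {total:,}")
```

Under these assumptions the four settings reproduce the table's 21,600, 43,200, 64,800 and 86,400.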
## Usage
<details>
<summary> details </summary>
<!-- <summary>1. Training </summary> -->
1. Training
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=9909 --nproc_per_node=4 tools/train.py -c path/to/config --use-amp --seed=0 > log.txt 2>&1 &
```
<!-- <summary>2. Testing </summary> -->
2. Testing
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=9909 --nproc_per_node=4 tools/train.py -c path/to/config -r path/to/checkpoint --test-only
```
<!-- <summary>3. Tuning </summary> -->
3. Tuning
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=9909 --nproc_per_node=4 tools/train.py -c path/to/config -t path/to/checkpoint --use-amp --seed=0 > log.txt 2>&1 &
```
<!-- <summary>4. Export onnx </summary> -->
4. Export onnx
```shell
python tools/export_onnx.py -c path/to/config -r path/to/checkpoint --check
```
<!-- <summary>5. Export tensorrt </summary> -->
5. Export tensorrt
```shell
python tools/export_trt.py -i path/to/onnxfile
```
<!-- <summary>6. Inference </summary> -->
6. Inference
Supports torch, onnxruntime, tensorrt, and openvino; see details in *references/deploy*.
```shell
python references/deploy/rtdetrv2_onnxruntime.py --onnx-file=model.onnx --im-file=xxxx
python references/deploy/rtdetrv2_tensorrt.py --trt-file=model.trt --im-file=xxxx
python references/deploy/rtdetrv2_torch.py -c path/to/config -r path/to/checkpoint --im-file=xxx --device=cuda:0
```
</details>
## Citation
If you use `RTDETR` or `RTDETRv2` in your work, please use the following BibTeX entries:
<details>
<summary> bibtex </summary>
```latex
@misc{lv2023detrs,
title={DETRs Beat YOLOs on Real-time Object Detection},
author={Wenyu Lv and Shangliang Xu and Yian Zhao and Guanzhong Wang and Jinman Wei and Cheng Cui and Yuning Du and Qingqing Dang and Yi Liu},
year={2023},
eprint={2304.08069},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{lv2024rtdetrv2improvedbaselinebagoffreebies,
title={RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer},
author={Wenyu Lv and Yian Zhao and Qinyao Chang and Kui Huang and Guanzhong Wang and Yi Liu},
year={2024},
eprint={2407.17140},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2407.17140},
}
```
</details>

@@ -0,0 +1,48 @@
task: detection
evaluator:
  type: CocoEvaluator
  iou_types: ['bbox', ]
# num_classes: 365
# remap_mscoco_category: False
# num_classes: 91
# remap_mscoco_category: False
num_classes: 80
remap_mscoco_category: True
train_dataloader:
  type: DataLoader
  dataset:
    type: CocoDetection
    img_folder: ./dataset/coco/train2017/
    ann_file: ./dataset/coco/annotations/instances_train2017.json
    return_masks: False
    transforms:
      type: Compose
      ops: ~
  shuffle: True
  num_workers: 4
  drop_last: True
  collate_fn:
    type: BatchImageCollateFunction
val_dataloader:
  type: DataLoader
  dataset:
    type: CocoDetection
    img_folder: ./dataset/coco/val2017/
    ann_file: ./dataset/coco/annotations/instances_val2017.json
    return_masks: False
    transforms:
      type: Compose
      ops: ~
  shuffle: False
  num_workers: 4
  drop_last: False
  collate_fn:
    type: BatchImageCollateFunction
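The combination `num_classes: 80` with `remap_mscoco_category: True` suggests that the sparse COCO category ids (which run up to 90 with gaps) are mapped to contiguous labels. The sketch below shows one way such a remap can be built; the function name is hypothetical, the repository's own mapping is not shown in this commit view, and the example ids are just a small illustrative subset.

```python
def build_category_remap(category_ids):
    """Map sparse category ids to contiguous labels 0..N-1 and back."""
    sorted_ids = sorted(set(category_ids))
    id_to_label = {cat_id: label for label, cat_id in enumerate(sorted_ids)}
    label_to_id = {label: cat_id for cat_id, label in id_to_label.items()}
    return id_to_label, label_to_id


if __name__ == "__main__":
    # Illustrative subset only: COCO has no category with id 12,
    # so id 13 lands directly after id 11.
    id_to_label, label_to_id = build_category_remap([1, 2, 11, 13])
    print(id_to_label)
    print(label_to_id)
```

The inverse mapping matters at evaluation time, when predicted labels have to be turned back into dataset category ids.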

@@ -0,0 +1,40 @@
task: detection
evaluator:
  type: CocoEvaluator
  iou_types: ['bbox', ]
num_classes: 20
train_dataloader:
  type: DataLoader
  dataset:
    type: VOCDetection
    root: ./dataset/voc/
    ann_file: trainval.txt
    label_file: label_list.txt
    transforms:
      type: Compose
      ops: ~
  shuffle: True
  num_workers: 4
  drop_last: True
  collate_fn:
    type: BatchImageCollateFunction
val_dataloader:
  type: DataLoader
  dataset:
    type: VOCDetection
    root: ./dataset/voc/
    ann_file: test.txt
    label_file: label_list.txt
    transforms:
      type: Compose
      ops: ~
  shuffle: False
  num_workers: 4
  drop_last: False
  collate_fn:
    type: BatchImageCollateFunction

@@ -0,0 +1,31 @@
train_dataloader:
  dataset:
    return_masks: False
    transforms:
      ops:
        - {type: RandomPhotometricDistort, p: 0.5}
        - {type: RandomZoomOut, fill: 0}
        - {type: RandomIoUCrop, p: 0.8}
        - {type: SanitizeBoundingBoxes, min_size: 1}
        - {type: RandomHorizontalFlip}
        - {type: Resize, size: [640, 640], }
        - {type: SanitizeBoundingBoxes, min_size: 1}
        - {type: ConvertPILImage, dtype: 'float32', scale: True}
        - {type: ConvertBoxes, fmt: 'cxcywh', normalize: True}
  collate_fn:
    type: BatchImageCollateFunction
    scales: [480, 512, 544, 576, 608, 640, 640, 640, 672, 704, 736, 768, 800]
  shuffle: True
  num_workers: 4
  total_batch_size: 16
val_dataloader:
  dataset:
    transforms:
      ops:
        - {type: Resize, size: [640, 640]}
        - {type: ConvertPILImage, dtype: 'float32', scale: True}
  shuffle: False
  total_batch_size: 16
  num_workers: 8
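The last training op, `ConvertBoxes` with `fmt: 'cxcywh'` and `normalize: True`, implies that boxes end up as center/size coordinates scaled to the range 0 to 1. A minimal sketch of that conversion follows; it assumes the input boxes are absolute `(x_min, y_min, x_max, y_max)` pixels, and the function name is hypothetical rather than taken from the repository.

```python
def xyxy_to_normalized_cxcywh(box, image_width, image_height):
    """Convert an absolute corner box to a normalized (cx, cy, w, h) box."""
    x_min, y_min, x_max, y_max = box
    center_x = (x_min + x_max) / 2 / image_width
    center_y = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return (center_x, center_y, width, height)


if __name__ == "__main__":
    # A 320 x 160 box on a 640 x 640 image, matching the Resize op above.
    print(xyxy_to_normalized_cxcywh((160, 160, 480, 320), 640, 640))
```

Normalized coordinates stay valid when the collate step rescales a batch to another entry of `scales`, which is one plausible reason for normalizing before batching.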

@@ -0,0 +1,40 @@
use_ema: True
ema:
  type: ModelEMA
  decay: 0.9999
  warmups: 2000
epoches: 72
clip_max_norm: 0.1
optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*(?:norm|bn)).*$'
      lr: 0.00001
    -
      params: '^(?=.*backbone)(?=.*(?:norm|bn)).*$'
      weight_decay: 0.
      lr: 0.00001
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
      weight_decay: 0.
  lr: 0.0001
  betas: [0.9, 0.999]
  weight_decay: 0.0001
lr_scheduler:
  type: MultiStepLR
  milestones: [1000]
  gamma: 0.1
lr_warmup_scheduler:
  type: LinearWarmup
  warmup_duration: 2000
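The `params` entries in the optimizer config are regular expressions over parameter names. The sketch below shows how the three patterns partition a few names. The parameter names are hypothetical examples, and treating the first matching pattern as the winner is an assumption about how the trainer assigns groups; names that match nothing would fall back to the optimizer-level `lr` and `weight_decay`.

```python
import re

# Patterns copied from the optimizer config above, in file order.
PARAM_GROUP_PATTERNS = [
    r"^(?=.*backbone)(?!.*(?:norm|bn)).*$",   # backbone weights, not norm layers
    r"^(?=.*backbone)(?=.*(?:norm|bn)).*$",   # backbone norm layers
    r"^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$",  # enc/dec norm and bias
]


def group_index(param_name):
    """Return the index of the first matching pattern, or None for the default group."""
    for index, pattern in enumerate(PARAM_GROUP_PATTERNS):
        if re.match(pattern, param_name):
            return index
    return None


if __name__ == "__main__":
    # Hypothetical parameter names, for illustration only.
    for name in [
        "backbone.conv1.conv.weight",
        "backbone.conv1.norm.weight",
        "encoder.fpn_blocks.0.norm.weight",
        "decoder.query_pos_head.bias",
        "decoder.query_pos_head.weight",
    ]:
        print(name, "->", group_index(name))
```

The lookaheads `(?=...)` and `(?!...)` only test for substrings anywhere in the name, so the groups are defined by which keywords a name contains rather than by its exact module path.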

@@ -0,0 +1,79 @@
task: detection
model: RTDETR
criterion: RTDETRCriterion
postprocessor: RTDETRPostProcessor
use_focal_loss: True
eval_spatial_size: [640, 640] # h w
RTDETR:
  backbone: PResNet
  encoder: HybridEncoder
  decoder: RTDETRTransformer
PResNet:
  depth: 50
  variant: d
  freeze_at: 0
  return_idx: [1, 2, 3]
  num_stages: 4
  freeze_norm: True
  pretrained: True
HybridEncoder:
  in_channels: [512, 1024, 2048]
  feat_strides: [8, 16, 32]
  # intra
  hidden_dim: 256
  use_encoder_idx: [2]
  num_encoder_layers: 1
  nhead: 8
  dim_feedforward: 1024
  dropout: 0.
  enc_act: 'gelu'
  # cross
  expansion: 1.0
  depth_mult: 1
  act: 'silu'
  version: v1
RTDETRTransformer:
  feat_channels: [256, 256, 256]
  feat_strides: [8, 16, 32]
  hidden_dim: 256
  num_levels: 3
  num_layers: 6
  num_queries: 300
  num_denoising: 100
  label_noise_ratio: 0.5
  box_noise_scale: 1.0 # 1.0 0.4
  eval_idx: -1
RTDETRPostProcessor:
  num_top_queries: 300
RTDETRCriterion:
  weight_dict: {loss_vfl: 1, loss_bbox: 5, loss_giou: 2,}
  losses: ['vfl', 'boxes', ]
  alpha: 0.75
  gamma: 2.0
  matcher:
    type: HungarianMatcher
    weight_dict: {cost_class: 2, cost_bbox: 5, cost_giou: 2}
    alpha: 0.25
    gamma: 2.0
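To relate `eval_spatial_size: [640, 640]` to `feat_strides: [8, 16, 32]`, the sketch below computes the feature map size at each level and the total number of spatial positions. It assumes each stride divides the input size with plain floor division; how the model handles sizes that do not divide evenly is not covered here.

```python
def feature_map_sizes(spatial_size, strides):
    """Return the (height, width) of the feature map at each stride."""
    height, width = spatial_size
    return [(height // stride, width // stride) for stride in strides]


def total_positions(spatial_size, strides):
    """Total number of spatial positions summed over all feature levels."""
    return sum(h * w for h, w in feature_map_sizes(spatial_size, strides))


if __name__ == "__main__":
    print(feature_map_sizes([640, 640], [8, 16, 32]))
    print(total_positions([640, 640], [8, 16, 32]))
```

For a 640 x 640 input this gives 80 x 80, 40 x 40 and 20 x 20 maps. `use_encoder_idx: [2]` selects the last of these, so the intra-scale encoder layer sees the smallest map.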

@@ -0,0 +1,111 @@
# DETRs Beat YOLOs on Real-time Object Detection
## Introduction
This repository is the official PyTorch implementation of [*RTDETR*](https://arxiv.org/abs/2304.08069v1), and is compatible with [RT-DETR/rtdetr_pytorch](https://github.com/lyuwenyu/RT-DETR/tree/main). For the Paddle implementation, please refer to [RT-DETR/rtdetr_paddle](https://github.com/lyuwenyu/RT-DETR/tree/main). **If you are using rtdetr for the first time, it is highly recommended to use [rtdetrv2](../rtdetrv2/)**.
<details open>
<summary> Fig </summary>
<div align="center">
<img src="https://github.com/lyuwenyu/RT-DETR/assets/17582080/42636690-1ecf-4647-b075-842ecb9bc562" width=500>
</div>
</details>
<!--
<div align="center">
<img src="https://github.com/lyuwenyu/RT-DETR/assets/17582080/42636690-1ecf-4647-b075-842ecb9bc562" width=500>
</div> -->
## Model Zoo
| Model | Dataset | Input Size | AP<sup>val</sup> | AP<sub>50</sub><sup>val</sup> | #Params(M) | FPS | checkpoint |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| rtdetr_r18vd | COCO | 640 | 46.4 | 63.7 | 20 | 217 | [url<sup>*</sup>](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetr_r18vd_dec3_6x_coco_from_paddle.pth) |
| rtdetr_r34vd | COCO | 640 | 48.9 | 66.8 | 31 | 161 | [url<sup>*</sup>](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetr_r34vd_dec4_6x_coco_from_paddle.pth) |
| rtdetr_r50vd_m | COCO | 640 | 51.3 | 69.5 | 36 | 145 | [url<sup>*</sup>](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetr_r50vd_m_6x_coco_from_paddle.pth) |
| rtdetr_r50vd | COCO | 640 | 53.1 | 71.2 | 42 | 108 | [url<sup>*</sup>](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetr_r50vd_6x_coco_from_paddle.pth) |
| rtdetr_r101vd | COCO | 640 | 54.3 | 72.8 | 76 | 74 | [url<sup>*</sup>](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetr_r101vd_6x_coco_from_paddle.pth) |
| rtdetr_r18vd | COCO+Objects365 | 640 | 49.0 | 66.5 | 20 | 217 | [url<sup>*</sup>](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetr_r18vd_5x_coco_objects365_from_paddle.pth) |
| rtdetr_r50vd | COCO+Objects365 | 640 | 55.2 | 73.4 | 42 | 108 | [url<sup>*</sup>](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetr_r50vd_2x_coco_objects365_from_paddle.pth) |
| rtdetr_r101vd | COCO+Objects365 | 640 | 56.2 | 74.5 | 76 | 74 | [url<sup>*</sup>](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetr_r101vd_2x_coco_objects365_from_paddle.pth) |
<!-- rtdetr_r18vd | COCO | 640 | 46.5 | 63.6 | 20 | 217 | [url](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetr_r18vd_6x_coco.pth) -->
<!-- rtdetr_r18vd | Objects365 | 640 | 22.9 | 31.2| - | [url<sup>*</sup>](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetr_r18vd_5x_coco_objects365_from_paddle.pth)
rtdetr_r50vd | Objects365 | 640 | 35.1 | 46.2 | - | [url<sup>*</sup>](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetr_r50vd_2x_coco_objects365_from_paddle.pth)
rtdetr_r101vd | Objects365 | 640 | 36.8 | 48.3 | - | [url<sup>*</sup>](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetr_r101vd_2x_coco_objects365_from_paddle.pth) -->
Notes
<!-- - AP is evaluated on coco 2017 val dataset -->
<!-- RT-DETR was trained on COCO train2017 and evaluated on val2017. -->
- `COCO + Objects365` in the table means the model was fine-tuned on `COCO` starting from weights pretrained on `Objects365`.
- `FPS` is evaluated on a single T4 GPU with $batch\\_size = 1$ and $tensorrt\\_fp16$ mode.
- `url`<sup>`*`</sup> is the URL of the pretrained weights, converted from the Paddle model to save energy. *There may be slight differences between this table and the paper.*
## Usage
<details>
<summary> details </summary>
<!-- <summary>1. Training </summary> -->
1. Training
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=9909 --nproc_per_node=4 tools/train.py -c path/to/config > log.txt 2>&1 &
```
<!-- <summary>2. Testing </summary> -->
2. Testing
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=9909 --nproc_per_node=4 tools/train.py -c path/to/config -r path/to/checkpoint --test-only
```
<!-- <summary>3. Tuning </summary> -->
3. Tuning
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=9909 --nproc_per_node=4 tools/train.py -c path/to/config -t path/to/checkpoint > log.txt 2>&1 &
```
<!-- <summary>4. Export onnx </summary> -->
4. Export onnx
```shell
python tools/export_onnx.py -c path/to/config -r path/to/checkpoint --check
```
<!-- <summary>5. Inference </summary> -->
5. Inference
Supports torch, onnxruntime, tensorrt, and openvino; see details in *references/deploy*.
```shell
python references/deploy/rtdetrv2_onnx.py --onnx-file=model.onnx --im-file=xxxx
python references/deploy/rtdetrv2_tensorrt.py --trt-file=model.trt --im-file=xxxx
python references/deploy/rtdetrv2_torch.py -c path/to/config -r path/to/checkpoint --im-file=xxx --device=cuda:0
```
</details>
## Citation
If you use `RTDETR` in your work, please use the following BibTeX entries:
<details>
<summary> bibtex </summary>
```latex
@misc{lv2023detrs,
title={DETRs Beat YOLOs on Real-time Object Detection},
author={Wenyu Lv and Shangliang Xu and Yian Zhao and Guanzhong Wang and Jinman Wei and Cheng Cui and Yuning Du and Qingqing Dang and Yi Liu},
year={2023},
eprint={2304.08069},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@software{Lv_rtdetr_by_cvperception_2023,
author = {Lv, Wenyu},
license = {Apache-2.0},
month = oct,
title = {{rtdetr by cvperception}},
url = {https://github.com/lyuwenyu/cvperception/},
version = {0.0.1dev},
year = {2023}
}
```
</details>

@@ -0,0 +1,41 @@
__include__: [
  '../dataset/coco_detection.yml',
  '../runtime.yml',
  './include/dataloader.yml',
  './include/optimizer.yml',
  './include/rtdetr_r50vd.yml',
]
output_dir: ./output/rtdetr_r101vd_6x_coco
PResNet:
  depth: 101
HybridEncoder:
  # intra
  hidden_dim: 384
  dim_feedforward: 2048
RTDETRTransformer:
  feat_channels: [384, 384, 384]
optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*(?:norm|bn)).*$'
      lr: 0.000001
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
      weight_decay: 0.
  lr: 0.0001
  betas: [0.9, 0.999]
  weight_decay: 0.0001
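Per-model files such as this one list shared base files under `__include__` and then restate only the keys that differ. The sketch below illustrates one plausible merge rule: dictionaries are merged recursively, and any other value (including lists) is replaced. This is an assumption about the loader, whose code is not part of this view, and `merge_config` is a hypothetical helper.

```python
import copy


def merge_config(base, override):
    """Recursively merge override into a copy of base; non-dict values replace."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    # Values taken from the r50vd base file and the r101vd override above.
    base = {"PResNet": {"depth": 50, "variant": "d", "freeze_norm": True}}
    override = {
        "output_dir": "./output/rtdetr_r101vd_6x_coco",
        "PResNet": {"depth": 101},
    }
    print(merge_config(base, override))
```

Under this rule the override keeps `variant` and `freeze_norm` from the base file and changes only `depth`, which is why the per-model files can stay short.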

@@ -0,0 +1,48 @@
__include__: [
  '../dataset/coco_detection.yml',
  '../runtime.yml',
  './include/dataloader.yml',
  './include/optimizer.yml',
  './include/rtdetr_r50vd.yml',
]
output_dir: ./output/rtdetr_r18vd_6x_coco
PResNet:
  depth: 18
  freeze_at: -1
  freeze_norm: False
  pretrained: True
HybridEncoder:
  in_channels: [128, 256, 512]
  hidden_dim: 256
  expansion: 0.5
RTDETRTransformer:
  num_layers: 3
optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?=.*(?:norm|bn)).*$'
      weight_decay: 0.
      lr: 0.00001
    -
      params: '^(?=.*backbone)(?!.*(?:norm|bn)).*$'
      lr: 0.00001
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
      weight_decay: 0.
  lr: 0.0001
  betas: [0.9, 0.999]
  weight_decay: 0.0001

@@ -0,0 +1,48 @@
__include__: [
  '../dataset/coco_detection.yml',
  '../runtime.yml',
  './include/dataloader.yml',
  './include/optimizer.yml',
  './include/rtdetr_r50vd.yml',
]
output_dir: ./output/rtdetr_r34vd_6x_coco
PResNet:
  depth: 34
  freeze_at: -1
  freeze_norm: False
  pretrained: True
HybridEncoder:
  in_channels: [128, 256, 512]
  hidden_dim: 256
  expansion: 0.5
RTDETRTransformer:
  num_layers: 4
optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?=.*(?:norm|bn)).*$'
      weight_decay: 0.
      lr: 0.00001
    -
      params: '^(?=.*backbone)(?!.*(?:norm|bn)).*$'
      lr: 0.00001
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
      weight_decay: 0.
  lr: 0.0001
  betas: [0.9, 0.999]
  weight_decay: 0.0001

@@ -0,0 +1,14 @@
__include__: [
  '../dataset/coco_detection.yml',
  '../runtime.yml',
  './include/dataloader.yml',
  './include/optimizer.yml',
  './include/rtdetr_r50vd.yml',
]
output_dir: ./output/rtdetr_r50vd_6x_coco

@@ -0,0 +1,34 @@
__include__: [
  '../dataset/coco_detection.yml',
  '../runtime.yml',
  './include/dataloader.yml',
  './include/optimizer.yml',
  './include/rtdetr_r50vd.yml',
]
output_dir: ./output/rtdetr_r50vd_m_6x_coco
HybridEncoder:
  expansion: 0.5
RTDETRTransformer:
  eval_idx: 2 # use the 3rd decoder layer for evaluation
optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*(?:norm|bn)).*$'
      lr: 0.00001
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
      weight_decay: 0.
  lr: 0.0001
  betas: [0.9, 0.999]
  weight_decay: 0.0001

@@ -0,0 +1,38 @@
train_dataloader:
  dataset:
    transforms:
      ops:
        - {type: RandomPhotometricDistort, p: 0.5}
        - {type: RandomZoomOut, fill: 0}
        - {type: RandomIoUCrop, p: 0.8}
        - {type: SanitizeBoundingBoxes, min_size: 1}
        - {type: RandomHorizontalFlip}
        - {type: Resize, size: [640, 640], }
        - {type: SanitizeBoundingBoxes, min_size: 1}
        - {type: ConvertPILImage, dtype: 'float32', scale: True}
        - {type: ConvertBoxes, fmt: 'cxcywh', normalize: True}
      policy:
        name: stop_epoch
        epoch: 71 # epoch in [71, ~) stop `ops`
        ops: ['RandomPhotometricDistort', 'RandomZoomOut', 'RandomIoUCrop']
  collate_fn:
    type: BatchImageCollateFunction
    scales: [480, 512, 544, 576, 608, 640, 640, 640, 672, 704, 736, 768, 800]
    stop_epoch: 71 # epoch in [71, ~) stop `multiscales`
  shuffle: True
  total_batch_size: 16 # total batch size equals to 16 (4 * 4)
  num_workers: 4
val_dataloader:
  dataset:
    transforms:
      ops:
        - {type: Resize, size: [640, 640]}
        - {type: ConvertPILImage, dtype: 'float32', scale: True}
  shuffle: False
  total_batch_size: 32
  num_workers: 4
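The two `stop_epoch` settings in this dataloader config can be read as follows: from epoch 71 onward the three listed augmentations are skipped, and multi-scale batching stops as well. The sketch below follows the comments in the config; the function names are hypothetical, and falling back to a fixed 640 after the stop epoch is an assumption based on the `Resize` op.

```python
import random

# Values copied from the dataloader config above.
SCALES = [480, 512, 544, 576, 608, 640, 640, 640, 672, 704, 736, 768, 800]
TRAIN_OPS = [
    "RandomPhotometricDistort", "RandomZoomOut", "RandomIoUCrop",
    "SanitizeBoundingBoxes", "RandomHorizontalFlip", "Resize",
    "SanitizeBoundingBoxes", "ConvertPILImage", "ConvertBoxes",
]
STOPPED_OPS = ["RandomPhotometricDistort", "RandomZoomOut", "RandomIoUCrop"]
STOP_EPOCH = 71


def active_ops(epoch, stop_epoch=STOP_EPOCH):
    """Ops applied in a given epoch: the stopped ops are dropped from stop_epoch on."""
    if epoch < stop_epoch:
        return list(TRAIN_OPS)
    return [op for op in TRAIN_OPS if op not in STOPPED_OPS]


def image_size_for_batch(epoch, rng, stop_epoch=STOP_EPOCH, base_size=640):
    """Pick a random training scale per batch until stop_epoch, then use base_size."""
    if epoch >= stop_epoch:
        return base_size
    return rng.choice(SCALES)


if __name__ == "__main__":
    rng = random.Random(0)
    print(len(active_ops(70)), len(active_ops(71)))
    print(image_size_for_batch(10, rng), image_size_for_batch(71, rng))
```

Since 640 appears three times in `scales`, a uniform draw picks the evaluation resolution more often than any other size.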

@@ -0,0 +1,37 @@
use_amp: True
use_ema: True
ema:
  type: ModelEMA
  decay: 0.9999
  warmups: 2000
epoches: 72
clip_max_norm: 0.1
optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm).*$'
      lr: 0.00001
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
      weight_decay: 0.
  lr: 0.0001
  betas: [0.9, 0.999]
  weight_decay: 0.0001
lr_scheduler:
  type: MultiStepLR
  milestones: [1000]
  gamma: 0.1
lr_warmup_scheduler:
  type: LinearWarmup
  warmup_duration: 2000
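The scheduler settings can be sanity-checked with a little arithmetic. The sketch below reproduces the multiplier rule of PyTorch's `MultiStepLR` (gamma raised to the number of milestones already reached) and pairs it with a simple linear ramp. The ramp formula, and the idea that `warmup_duration` counts iterations while milestones count epochs, are assumptions; `LinearWarmup` itself is not shown in this view.

```python
from bisect import bisect_right


def multistep_factor(epoch, milestones, gamma):
    """LR multiplier after `epoch` epochs: gamma ** (number of milestones reached)."""
    return gamma ** bisect_right(sorted(milestones), epoch)


def linear_warmup_factor(step, warmup_duration):
    """Assumed linear ramp from 0 to 1 over warmup_duration iterations."""
    return min(1.0, step / warmup_duration)


if __name__ == "__main__":
    # With epoches: 72 and milestones: [1000], the decay step is never reached.
    print(multistep_factor(71, [1000], 0.1))
    print(linear_warmup_factor(1000, 2000))
```

Under these assumptions the learning rate ramps up over the first 2000 iterations and then stays constant for the whole run, because the only milestone lies far beyond the last epoch.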

@@ -0,0 +1,83 @@
task: detection
model: RTDETR
criterion: RTDETRCriterionv2
postprocessor: RTDETRPostProcessor
use_focal_loss: True
eval_spatial_size: [640, 640] # h w
RTDETR:
  backbone: PResNet
  encoder: HybridEncoder
  decoder: RTDETRTransformerv2
PResNet:
  depth: 50
  variant: d
  freeze_at: 0
  return_idx: [1, 2, 3]
  num_stages: 4
  freeze_norm: True
  pretrained: True
HybridEncoder:
  in_channels: [512, 1024, 2048]
  feat_strides: [8, 16, 32]
  # intra
  hidden_dim: 256
  use_encoder_idx: [2]
  num_encoder_layers: 1
  nhead: 8
  dim_feedforward: 1024
  dropout: 0.
  enc_act: 'gelu'
  # cross
  expansion: 1.0
  depth_mult: 1
  act: 'silu'
RTDETRTransformerv2:
  feat_channels: [256, 256, 256]
  feat_strides: [8, 16, 32]
  hidden_dim: 256
  num_levels: 3
  num_layers: 6
  num_queries: 300
  num_denoising: 100
  label_noise_ratio: 0.5
  box_noise_scale: 1.0 # 1.0 0.4
  eval_idx: -1
  # NEW
  num_points: [4, 4, 4] # [3,3,3] [2,2,2]
  cross_attn_method: default # default, discrete
  query_select_method: default # default, agnostic
RTDETRPostProcessor:
  num_top_queries: 300
RTDETRCriterionv2:
  weight_dict: {loss_vfl: 1, loss_bbox: 5, loss_giou: 2,}
  losses: ['vfl', 'boxes', ]
  alpha: 0.75
  gamma: 2.0
  matcher:
    type: HungarianMatcher
    weight_dict: {cost_class: 2, cost_bbox: 5, cost_giou: 2}
    alpha: 0.25
    gamma: 2.0
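The criterion's `weight_dict` reads as a set of coefficients for combining individual loss terms. The sketch below shows that weighted sum in isolation. It assumes every reported loss is multiplied by the weight with the same key and that keys without a weight are ignored; the loss values are made up for the example, and any auxiliary decoder losses are left out.

```python
# Weights copied from the criterion config above.
WEIGHT_DICT = {"loss_vfl": 1, "loss_bbox": 5, "loss_giou": 2}


def total_loss(loss_values, weight_dict=WEIGHT_DICT):
    """Weighted sum of the loss terms that have an entry in weight_dict."""
    return sum(
        weight_dict[name] * value
        for name, value in loss_values.items()
        if name in weight_dict
    )


if __name__ == "__main__":
    # Illustrative loss values only.
    losses = {"loss_vfl": 0.5, "loss_bbox": 0.1, "loss_giou": 0.25}
    print(total_loss(losses))
```

The matcher's own `weight_dict` uses the same idea with `cost_class`, `cost_bbox` and `cost_giou`, but it weights assignment costs rather than training losses.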

@@ -0,0 +1,50 @@
__include__: [
  '../dataset/coco_detection.yml',
  '../runtime.yml',
  './include/dataloader.yml',
  './include/optimizer.yml',
  './include/rtdetrv2_r50vd.yml',
]
output_dir: ./output/rtdetrv2_hgnetv2_h_6x_coco
RTDETR:
  backbone: HGNetv2
HGNetv2:
  name: 'H'
  return_idx: [1, 2, 3]
  freeze_at: 0
  freeze_norm: True
  pretrained: True
HybridEncoder:
  # intra
  hidden_dim: 512
  dim_feedforward: 2048
  num_encoder_layers: 2
RTDETRTransformerv2:
  feat_channels: [512, 512, 512]
optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*(?:norm|bn)).*$'
      lr: 0.000005
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
      weight_decay: 0.
  lr: 0.0001
  betas: [0.9, 0.999]
  weight_decay: 0.0001

@@ -0,0 +1,38 @@
__include__: [
  '../dataset/coco_detection.yml',
  '../runtime.yml',
  './include/dataloader.yml',
  './include/optimizer.yml',
  './include/rtdetrv2_r50vd.yml',
]
output_dir: ./output/rtdetrv2_hgnetv2_l_6x_coco
RTDETR:
  backbone: HGNetv2
HGNetv2:
  name: 'L'
  return_idx: [1, 2, 3]
  freeze_at: 0
  freeze_norm: True
  pretrained: True
optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*(?:norm|bn)).*$'
      lr: 0.000005
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
      weight_decay: 0.
  lr: 0.0001
  betas: [0.9, 0.999]
  weight_decay: 0.0001

@@ -0,0 +1,50 @@
__include__: [
  '../dataset/coco_detection.yml',
  '../runtime.yml',
  './include/dataloader.yml',
  './include/optimizer.yml',
  './include/rtdetrv2_r50vd.yml',
]
output_dir: ./output/rtdetrv2_hgnetv2_x_6x_coco
RTDETR:
  backbone: HGNetv2
HGNetv2:
  name: 'X'
  return_idx: [1, 2, 3]
  freeze_at: 0
  freeze_norm: True
  pretrained: True
HybridEncoder:
  # intra
  hidden_dim: 384
  dim_feedforward: 2048
RTDETRTransformerv2:
  feat_channels: [384, 384, 384]
optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*(?:norm|bn)).*$'
      lr: 0.000001
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
      weight_decay: 0.
  lr: 0.0001
  betas: [0.9, 0.999]
  weight_decay: 0.0001

@@ -0,0 +1,40 @@
__include__: [
  '../dataset/coco_detection.yml',
  '../runtime.yml',
  './include/dataloader.yml',
  './include/optimizer.yml',
  './include/rtdetrv2_r50vd.yml',
]
output_dir: ./output/rtdetrv2_r101vd_6x_coco
PResNet:
  depth: 101
HybridEncoder:
  # intra
  hidden_dim: 384
  dim_feedforward: 2048
RTDETRTransformerv2:
  feat_channels: [384, 384, 384]
optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*(?:norm|bn)).*$'
      lr: 0.000001
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
      weight_decay: 0.
  lr: 0.0001
  betas: [0.9, 0.999]
  weight_decay: 0.0001

@@ -0,0 +1,46 @@
__include__: [
  '../dataset/coco_detection.yml',
  '../runtime.yml',
  './include/dataloader.yml',
  './include/optimizer.yml',
  './include/rtdetrv2_r50vd.yml',
]
output_dir: ./output/rtdetrv2_r18vd_120e_coco
PResNet:
  depth: 18
  freeze_at: -1
  freeze_norm: False
  pretrained: True
HybridEncoder:
  in_channels: [128, 256, 512]
  hidden_dim: 256
  expansion: 0.5
RTDETRTransformerv2:
  num_layers: 3
epoches: 120
optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*(?:norm|bn)).*$'
      weight_decay: 0.
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 117
  collate_fn:
    scales: ~

@@ -0,0 +1,46 @@
__include__: [
  '../dataset/voc_detection.yml',
  '../runtime.yml',
  './include/dataloader.yml',
  './include/optimizer.yml',
  './include/rtdetrv2_r50vd.yml',
]
output_dir: ./output/rtdetrv2_r18vd_120e_voc
PResNet:
  depth: 18
  freeze_at: -1
  freeze_norm: False
  pretrained: True
HybridEncoder:
  in_channels: [128, 256, 512]
  hidden_dim: 256
  expansion: 0.5
RTDETRTransformerv2:
  num_layers: 3
epoches: 120
optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*(?:norm|bn)).*$'
      weight_decay: 0.
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 117
  collate_fn:
    scales: ~
  total_batch_size: 32

@@ -0,0 +1,49 @@
__include__: [
  '../dataset/coco_detection.yml',
  '../runtime.yml',
  './include/dataloader.yml',
  './include/optimizer.yml',
  './include/rtdetrv2_r50vd.yml',
]
tuning: https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r18vd_120e_coco.pth
output_dir: ./output/rtdetrv2_r18vd_dsp_3x_coco
PResNet:
  depth: 18
  freeze_at: -1
  freeze_norm: False
  pretrained: True
HybridEncoder:
  in_channels: [128, 256, 512]
  hidden_dim: 256
  expansion: 0.5
RTDETRTransformerv2:
  num_layers: 3
  num_points: [4, 4, 4]
  cross_attn_method: discrete
epoches: 36
optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*(?:norm|bn)).*$'
      weight_decay: 0.
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 33
  collate_fn:
    scales: ~

@@ -0,0 +1,47 @@
__include__: [
  '../dataset/coco_detection.yml',
  '../runtime.yml',
  './include/dataloader.yml',
  './include/optimizer.yml',
  './include/rtdetrv2_r50vd.yml',
]
output_dir: ./output/rtdetrv2_r18vd_sp1_120e_coco
PResNet:
  depth: 18
  freeze_at: -1
  freeze_norm: False
  pretrained: True
HybridEncoder:
  in_channels: [128, 256, 512]
  hidden_dim: 256
  expansion: 0.5
RTDETRTransformerv2:
  num_layers: 3
  num_points: [1, 1, 1]
epoches: 120
optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*(?:norm|bn)).*$'
      weight_decay: 0.
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 117
  collate_fn:
    scales: ~

@@ -0,0 +1,47 @@
__include__: [
  '../dataset/coco_detection.yml',
  '../runtime.yml',
  './include/dataloader.yml',
  './include/optimizer.yml',
  './include/rtdetrv2_r50vd.yml',
]
output_dir: ./output/rtdetrv2_r18vd_sp2_120e_coco
PResNet:
  depth: 18
  freeze_at: -1
  freeze_norm: False
  pretrained: True
HybridEncoder:
  in_channels: [128, 256, 512]
  hidden_dim: 256
  expansion: 0.5
RTDETRTransformerv2:
  num_layers: 3
  num_points: [2, 2, 2]
epoches: 120
optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*(?:norm|bn)).*$'
      weight_decay: 0.
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 117
  collate_fn:
    scales: ~

@@ -0,0 +1,47 @@
__include__: [
  '../dataset/coco_detection.yml',
  '../runtime.yml',
  './include/dataloader.yml',
  './include/optimizer.yml',
  './include/rtdetrv2_r50vd.yml',
]
output_dir: ./output/rtdetrv2_r18vd_sp3_120e_coco
PResNet:
  depth: 18
  freeze_at: -1
  freeze_norm: False
  pretrained: True
HybridEncoder:
  in_channels: [128, 256, 512]
  hidden_dim: 256
  expansion: 0.5
RTDETRTransformerv2:
  num_layers: 3
  num_points: [3, 3, 3]
epoches: 120
optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*(?:norm|bn)).*$'
      weight_decay: 0.
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 117
  collate_fn:
    scales: ~

@@ -0,0 +1,57 @@
__include__: [
  '../dataset/coco_detection.yml',
  '../runtime.yml',
  './include/dataloader.yml',
  './include/optimizer.yml',
  './include/rtdetrv2_r50vd.yml',
]
output_dir: ./output/rtdetrv2_r34vd_120e_coco
PResNet:
  depth: 34
  freeze_at: -1
  freeze_norm: False
  pretrained: True
HybridEncoder:
  in_channels: [128, 256, 512]
  hidden_dim: 256
  expansion: 0.5
RTDETRTransformerv2:
  num_layers: 4
epoches: 120
optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*(?:norm|bn)).*$'
      lr: 0.00005
    -
      params: '^(?=.*backbone)(?=.*(?:norm|bn)).*$'
      lr: 0.00005
      weight_decay: 0.
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
      weight_decay: 0.
  lr: 0.0001
  betas: [0.9, 0.999]
  weight_decay: 0.0001
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 117
  collate_fn:
    stop_epoch: 117

@@ -0,0 +1,59 @@
__include__: [
  '../dataset/coco_detection.yml',
  '../runtime.yml',
  './include/dataloader.yml',
  './include/optimizer.yml',
  './include/rtdetrv2_r50vd.yml',
]
tuning: https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r34vd_120e_coco_ema.pth
output_dir: ./output/rtdetrv2_r34vd_dsp_1x_coco
PResNet:
  depth: 34
  freeze_at: -1
  freeze_norm: False
  pretrained: True
HybridEncoder:
  in_channels: [128, 256, 512]
  hidden_dim: 256
  expansion: 0.5
RTDETRTransformerv2:
  num_layers: 4
  cross_attn_method: discrete
epoches: 12
optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*(?:norm|bn)).*$'
      lr: 0.00005
    -
      params: '^(?=.*backbone)(?=.*(?:norm|bn)).*$'
      lr: 0.00005
      weight_decay: 0.
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
      weight_decay: 0.
  lr: 0.0001
  betas: [0.9, 0.999]
  weight_decay: 0.0001
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 10
  collate_fn:
    stop_epoch: 10

@@ -0,0 +1,27 @@
__include__: [
  '../dataset/coco_detection.yml',
  '../runtime.yml',
  './include/dataloader.yml',
  './include/optimizer.yml',
  './include/rtdetrv2_r50vd.yml',
]
output_dir: ./output/rtdetrv2_r50vd_6x_coco
optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm).*$'
      lr: 0.00001
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
      weight_decay: 0.
  lr: 0.0001
  betas: [0.9, 0.999]
  weight_decay: 0.0001

@@ -0,0 +1,27 @@
__include__: [
  '../dataset/coco_detection.yml',
  '../runtime.yml',
  './include/dataloader.yml',
  './include/optimizer.yml',
  './include/rtdetrv2_r50vd.yml',
]
tuning: https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r50vd_6x_coco_ema.pth
output_dir: ./output/rtdetrv2_r50vd_dsp_1x_coco
RTDETRTransformerv2:
  cross_attn_method: discrete
epoches: 12
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 10
  collate_fn:
    stop_epoch: 10

@@ -0,0 +1,43 @@
__include__: [
  '../dataset/coco_detection.yml',
  '../runtime.yml',
  './include/dataloader.yml',
  './include/optimizer.yml',
  './include/rtdetrv2_r50vd.yml',
]
output_dir: ./output/rtdetrv2_r50vd_m_6x_coco
HybridEncoder:
  expansion: 0.5
RTDETRTransformerv2:
  eval_idx: 2 # use the 3rd decoder layer for evaluation
epoches: 84
optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm).*$'
      lr: 0.00001
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
      weight_decay: 0.
  lr: 0.0001
  betas: [0.9, 0.999]
  weight_decay: 0.0001
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 81
  collate_fn:
    stop_epoch: 81

@@ -0,0 +1,44 @@
__include__: [
  '../dataset/coco_detection.yml',
  '../runtime.yml',
  './include/dataloader.yml',
  './include/optimizer.yml',
  './include/rtdetrv2_r50vd.yml',
]
output_dir: ./output/rtdetrv2_r50vd_m_dsp_3x_coco
tuning: https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r50vd_m_7x_coco_ema.pth
HybridEncoder:
  expansion: 0.5
RTDETRTransformerv2:
  eval_idx: 2 # use the 3rd decoder layer for evaluation
  cross_attn_method: discrete
epoches: 36
optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm).*$'
      lr: 0.00001
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
      weight_decay: 0.
  lr: 0.0001
  betas: [0.9, 0.999]
  weight_decay: 0.0001
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 33
  collate_fn:
    stop_epoch: 33

@@ -0,0 +1,21 @@
print_freq: 100
output_dir: './logs'
checkpoint_freq: 1
sync_bn: True
find_unused_parameters: False
use_amp: False
scaler:
  type: GradScaler
  enabled: True
use_ema: False
ema:
  type: ModelEMA
  decay: 0.9999
  warmups: 2000
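
The `ema` block only takes effect when `use_ema` is enabled; `decay` and `warmups` then control how quickly the averaged weights track the live weights. The `ModelEMA` implementation is not shown here, so the following is only a sketch under the assumption that the effective decay ramps up exponentially over roughly `warmups` updates, a common EMA warmup schedule. The function names `ema_decay` and `ema_update` are hypothetical.

```python
import math


def ema_decay(step, decay=0.9999, warmups=2000):
    """Assumed warmup schedule: the effective decay grows from 0 toward `decay`."""
    return decay * (1.0 - math.exp(-step / warmups))


def ema_update(ema_value, value, step, decay=0.9999, warmups=2000):
    """One scalar EMA update; real code would apply this per tensor."""
    d = ema_decay(step, decay, warmups)
    return d * ema_value + (1.0 - d) * value


for step in (0, 2000, 20000):
    print(step, round(ema_decay(step), 4))
```

Under this assumption, early updates copy the live weights almost directly, and the average only becomes slow-moving after several multiples of `warmups` steps.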

@@ -0,0 +1,23 @@
services:
  tensorrt-container:
    build:
      context: .
      dockerfile: Dockerfile
    image: rtdetr-v2:25.06
    container_name: rtdetr-v2-trt
    ports:
      - "6006:6006" # tensorboard
    volumes:
      - ./:/workspace
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    working_dir: /workspace
    restart: unless-stopped
    stdin_open: true
    tty: true
    command: bash

@@ -0,0 +1,2 @@
# Deployment

@@ -0,0 +1,61 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch
import torchvision.transforms as T
import numpy as np
import onnxruntime as ort
from PIL import Image, ImageDraw
def draw(images, labels, boxes, scores, thrh = 0.6):
    for i, im in enumerate(images):
        draw = ImageDraw.Draw(im)
        scr = scores[i]
        lab = labels[i][scr > thrh]
        box = boxes[i][scr > thrh]
        # index the label by detection (j), not by image (i)
        for j, b in enumerate(box):
            draw.rectangle(list(b), outline='red',)
            draw.text((b[0], b[1]), text=str(lab[j].item()), fill='blue', )
        im.save(f'results_{i}.jpg')

def main(args, ):
    """main
    """
    sess = ort.InferenceSession(args.onnx_file)
    print(ort.get_device())
    im_pil = Image.open(args.im_file).convert('RGB')
    w, h = im_pil.size
    orig_size = torch.tensor([w, h])[None]
    transforms = T.Compose([
        T.Resize((640, 640)),
        T.ToTensor(),
    ])
    im_data = transforms(im_pil)[None]
    output = sess.run(
        # output_names=['labels', 'boxes', 'scores'],
        output_names=None,
        input_feed={'images': im_data.data.numpy(), "orig_target_sizes": orig_size.data.numpy()}
    )
    labels, boxes, scores = output
    draw([im_pil], labels, boxes, scores)

if __name__ == '__main__':
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument('--onnx-file', type=str, )
    parser.add_argument('--im-file', type=str, )
    # parser.add_argument('-d', '--device', type=str, default='cpu')
    args = parser.parse_args()
    main(args)
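
When drawing results, each detection's label, box, and score have to stay paired after the score threshold is applied. The sketch below shows that pairing in plain Python; the values are made up, and `filter_detections` is a hypothetical helper rather than part of this script.

```python
def filter_detections(labels, boxes, scores, threshold=0.6):
    """Keep (label, box, score) triples whose score exceeds the threshold."""
    return [
        (label, box, score)
        for label, box, score in zip(labels, boxes, scores)
        if score > threshold
    ]


# Made-up detections for a single image.
labels = [0, 16, 2]
boxes = [[10.0, 20.0, 110.0, 220.0], [5.0, 5.0, 50.0, 60.0], [0.0, 0.0, 8.0, 8.0]]
scores = [0.91, 0.42, 0.77]

kept = filter_detections(labels, boxes, scores)
for label, box, score in kept:
    print(label, box, score)
```

Iterating over the zipped triples avoids separate indices for images and detections, which is where label and box lookups tend to drift apart.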

@@ -0,0 +1,5 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
# please reference: https://github.com/guojin-yan/RT-DETR-OpenVINO

@@ -0,0 +1,258 @@
# Copyright 2023 lyuwenyu. All Rights Reserved.
# Copyright (c) 2025 Hitbee-dev. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# ==============================================================================
# NOTICE: This file has been heavily modified by [Hitbee-dev] from the original source.
# Modifications include restructuring for broader GPU architecture compatibility
# (including NVIDIA Blackwell), improved modularity, and enhanced testability.
# ==============================================================================
import time
import numpy as np
import torch
import tensorrt as trt
from collections import OrderedDict
from PIL import Image, ImageDraw, ImageFont
class TRTInference(object):
    """
    A high-level wrapper for TensorRT inference, designed for ease of use and flexibility.
    This class handles engine loading, context creation, and buffer allocation on first use.
    """
    def __init__(self, engine_path, device='cuda:0', verbose=False):
        """
        Initializes the TRTInference instance.
        Args:
            engine_path (str): Path to the serialized TensorRT engine file.
            device (str): The device to run inference on (e.g., 'cuda:0').
            verbose (bool): If True, enables verbose logging from the TensorRT logger.
        """
        self.engine_path = engine_path
        self.device = torch.device(device)
        self.logger = trt.Logger(trt.Logger.VERBOSE) if verbose else trt.Logger(trt.Logger.INFO)
        trt.init_libnvinfer_plugins(self.logger, '')
        self.runtime = trt.Runtime(self.logger)
        self.engine = self._load_engine(engine_path)
        self.context = self.engine.create_execution_context()
        self.input_names, self.output_names = self._get_io_names()
        self.buffers_allocated = False
        self.gpu_buffers = OrderedDict()
        self.binding_addrs = OrderedDict()
        print(f"[TRTInference] Initialized successfully. Engine: '{engine_path}'.")

    def _load_engine(self, path):
        """Loads a TensorRT engine from a file."""
        with open(path, 'rb') as f:
            engine = self.runtime.deserialize_cuda_engine(f.read())
        if engine is None:
            raise RuntimeError(f"Failed to load TensorRT engine from '{path}'.")
        return engine

    def _get_io_names(self):
        """Parses input and output tensor names from the engine."""
        input_names, output_names = [], []
        for i in range(self.engine.num_io_tensors):
            name = self.engine.get_tensor_name(i)
            if self.engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT:
                input_names.append(name)
            else:
                output_names.append(name)
        return input_names, output_names

    def _allocate_buffers(self, blob: dict):
        """
        Allocates GPU buffers for inputs and outputs based on the first inference request.
        Buffers are sized once from that request ("lazy allocation"), so later calls
        must use the same input shapes.
        """
        print("[TRTInference] First inference call detected. Allocating GPU buffers...")
        for name in self.input_names:
            tensor = blob[name]
            shape = tuple(tensor.shape)
            dtype = tensor.dtype
            self.context.set_input_shape(name, shape)
            self.gpu_buffers[name] = torch.empty(shape, dtype=dtype, device=self.device)
            self.binding_addrs[name] = self.gpu_buffers[name].data_ptr()
            print(f" - Input '{name}': allocated buffer with shape {shape}.")
        for name in self.output_names:
            shape = tuple(self.context.get_tensor_shape(name))
            dtype = trt.nptype(self.engine.get_tensor_dtype(name))
            torch_dtype = torch.from_numpy(np.array(0, dtype=dtype)).dtype
            self.gpu_buffers[name] = torch.empty(shape, dtype=torch_dtype, device=self.device)
            self.binding_addrs[name] = self.gpu_buffers[name].data_ptr()
            print(f" - Output '{name}': allocated buffer with shape {shape}.")
        self.buffers_allocated = True
        print("[TRTInference] GPU buffers allocated successfully.")

    def __call__(self, blob: dict):
        """
        Executes inference on the loaded TensorRT engine.
        Args:
            blob (dict): A dictionary mapping input tensor names to their corresponding
                torch.Tensor data on the GPU.
        Returns:
            dict: A dictionary mapping output tensor names to their corresponding
                torch.Tensor results on the GPU.
        """
        if not self.buffers_allocated:
            self._allocate_buffers(blob)
        for name in self.input_names:
            self.gpu_buffers[name].copy_(blob[name])
        self.context.execute_v2(bindings=list(self.binding_addrs.values()))
        return {name: self.gpu_buffers[name] for name in self.output_names}

# --- Visualization Utility Function ---
COCO_CLASSES = [
'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard',
'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase',
'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]
def visualize_detections(image_pil, boxes, scores, labels, class_names=COCO_CLASSES, threshold=0.5):
    """
    Draws bounding boxes on a PIL image. This function is a general-purpose utility.
    Args:
        image_pil (PIL.Image.Image): The image to draw on.
        boxes (torch.Tensor): A tensor of bounding boxes (shape: [N, 4]).
        scores (torch.Tensor): A tensor of confidence scores (shape: [N]).
        labels (torch.Tensor): A tensor of class labels (shape: [N]).
        class_names (list): A list of strings corresponding to class labels.
        threshold (float): The confidence threshold for displaying detections.
    Returns:
        PIL.Image.Image: The image with detections drawn on it.
    """
    img_draw = image_pil.copy()
    draw = ImageDraw.Draw(img_draw)
    # Ensure tensors are on CPU and converted to NumPy for processing
    boxes = boxes.cpu().numpy()
    scores = scores.cpu().numpy()
    labels = labels.cpu().numpy()
    count = 0
    for i in range(len(scores)):
        score = scores[i]
        if score < threshold:
            continue
        count += 1
        box = boxes[i]
        label_idx = int(labels[i])
        xmin, ymin, xmax, ymax = box
        class_name = class_names[label_idx] if label_idx < len(class_names) else f'CLS-{label_idx}'
        color = 'red' # Keep it simple or use a color map
        draw.rectangle(((xmin, ymin), (xmax, ymax)), outline=color, width=3)
        text = f"{class_name}: {score:.2f}"
        try:
            font = ImageFont.truetype("arial.ttf", 20)
        except IOError:
            font = ImageFont.load_default()
        text_bbox = draw.textbbox((xmin, ymin), text, font=font)
        draw.rectangle(text_bbox, fill=color)
        draw.text((xmin, ymin), text, fill="white", font=font)
    print(f" - Found {count} objects above threshold {threshold}.")
    return img_draw

if __name__ == '__main__':
    import argparse
    import torchvision.transforms as T
    import os
    parser = argparse.ArgumentParser(description="Test script for the TRTInference wrapper.")
    parser.add_argument('--engine', type=str, required=True, help="Path to the TensorRT engine file.")
    parser.add_argument('--image', type=str, required=True, help="Path to the input image file.")
    parser.add_argument('--output', type=str, default='output.jpg', help="Path to save the output image with detections.")
    parser.add_argument('--device', type=str, default='cuda:0', help="Device to run inference on.")
    parser.add_argument('--threshold', type=float, default=0.5, help="Confidence threshold for displaying detections.")
    args = parser.parse_args()
    if not torch.cuda.is_available():
        raise SystemExit("CUDA is not available. This script requires a GPU.")
    print("--- TRTInference Wrapper Test ---")
    print("\n1. Initializing TRTInference...")
    trt_model = TRTInference(args.engine, device=args.device)
    print("\n2. Preprocessing input image...")
    image_pil = Image.open(args.image).convert('RGB')
    w, h = image_pil.size
    transforms = T.Compose([
        T.Resize((640, 640)),
        T.ToTensor(),
    ])
    image_tensor = transforms(image_pil).unsqueeze(0).to(args.device)
    orig_size_tensor = torch.tensor([[w, h]], dtype=torch.int64, device=args.device)
    blob = {
        'images': image_tensor,
        'orig_target_sizes': orig_size_tensor
    }
    print(f" - Original image size: {w}x{h}")
    print(f" - Input tensor shape: {image_tensor.shape}")
    print("\n3. Running inference...")
    start_time = time.time()
    output_gpu = trt_model(blob)
    torch.cuda.synchronize()
    end_time = time.time()
    print(f"\n4. Inference complete in {(end_time - start_time) * 1000:.2f} ms.")
    print("\n5. Post-processing and saving output image...")
    output_labels = output_gpu['labels'][0]
    output_boxes = output_gpu['boxes'][0]
    output_scores = output_gpu['scores'][0]
    # Use the new, separate visualization function
    result_image = visualize_detections(
        image_pil,
        output_boxes,
        output_scores,
        output_labels,
        threshold=args.threshold
    )
    result_image.save(args.output)
    print(f" - Output image with detections saved to: {os.path.abspath(args.output)}")
    print("\n--- Test finished successfully ---")
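
The test script times a single call, and that first call also pays for buffer allocation. For a steadier latency figure, one option is to run a few warm-up calls and average over several timed iterations. The helper below is a generic sketch: `benchmark` and its parameters are hypothetical and not part of this file.

```python
import time


def benchmark(fn, warmup=3, iters=10, sync=None):
    """Return the mean latency of `fn` in milliseconds after warm-up calls.

    `sync` is an optional callable that blocks until queued device work is
    finished; it is called before starting and before stopping the timer.
    """
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) * 1000.0 / iters


# CPU-only demonstration with a stand-in workload.
calls = []
mean_ms = benchmark(lambda: calls.append(sum(range(1000))), warmup=2, iters=5)
print(f'{mean_ms:.4f} ms per call over 5 iterations')
```

With the wrapper above, a call such as `benchmark(lambda: trt_model(blob), sync=torch.cuda.synchronize)` would exclude the one-time allocation from the measurement.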

@@ -0,0 +1,84 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch
import torch.nn as nn
import torchvision.transforms as T
import numpy as np
from PIL import Image, ImageDraw
from src.core import YAMLConfig
def draw(images, labels, boxes, scores, thrh = 0.6):
    for i, im in enumerate(images):
        draw = ImageDraw.Draw(im)
        scr = scores[i]
        lab = labels[i][scr > thrh]
        box = boxes[i][scr > thrh]
        scrs = scores[i][scr > thrh]
        for j,b in enumerate(box):
            draw.rectangle(list(b), outline='red',)
            draw.text((b[0], b[1]), text=f"{lab[j].item()} {round(scrs[j].item(),2)}", fill='blue', )
        im.save(f'results_{i}.jpg')

def main(args, ):
    """main
    """
    cfg = YAMLConfig(args.config, resume=args.resume)
    if args.resume:
        checkpoint = torch.load(args.resume, map_location='cpu')
        if 'ema' in checkpoint:
            state = checkpoint['ema']['module']
        else:
            state = checkpoint['model']
    else:
        raise AttributeError('Only loading model.state_dict via --resume is supported for now.')
    # NOTE load train mode state -> convert to deploy mode
    cfg.model.load_state_dict(state)

    class Model(nn.Module):
        def __init__(self, ) -> None:
            super().__init__()
            self.model = cfg.model.deploy()
            self.postprocessor = cfg.postprocessor.deploy()

        def forward(self, images, orig_target_sizes):
            outputs = self.model(images)
            outputs = self.postprocessor(outputs, orig_target_sizes)
            return outputs

    model = Model().to(args.device)
    im_pil = Image.open(args.im_file).convert('RGB')
    w, h = im_pil.size
    orig_size = torch.tensor([w, h])[None].to(args.device)
    transforms = T.Compose([
        T.Resize((640, 640)),
        T.ToTensor(),
    ])
    im_data = transforms(im_pil)[None].to(args.device)
    output = model(im_data, orig_size)
    labels, boxes, scores = output
    draw([im_pil], labels, boxes, scores)

if __name__ == '__main__':
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument('-c', '--config', type=str, )
    parser.add_argument('-r', '--resume', type=str, )
    parser.add_argument('-f', '--im-file', type=str, )
    parser.add_argument('-d', '--device', type=str, default='cpu')
    args = parser.parse_args()
    main(args)

@@ -0,0 +1,9 @@
torch>=2.0.1
torchvision>=0.15.2
faster-coco-eval>=1.6.6
PyYAML
tensorboard
scipy
pycocotools
onnx
onnxruntime-gpu

@@ -0,0 +1,8 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
# for register purpose
from . import optim
from . import data
from . import nn
from . import zoo

@@ -0,0 +1,7 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
from .workspace import GLOBAL_CONFIG, register, create
from .yaml_utils import *
from ._config import BaseConfig
from .yaml_config import YAMLConfig

@@ -0,0 +1,290 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from torch.optim import Optimizer
from torch.optim.lr_scheduler import LRScheduler
from torch.cuda.amp.grad_scaler import GradScaler
from torch.utils.tensorboard import SummaryWriter
from pathlib import Path
from typing import Callable, List, Dict
__all__ = ['BaseConfig', ]
class BaseConfig(object):
    # TODO property
    def __init__(self) -> None:
        super().__init__()
        self.task :str = None
        # instance / function
        self._model :nn.Module = None
        self._postprocessor :nn.Module = None
        self._criterion :nn.Module = None
        self._optimizer :Optimizer = None
        self._lr_scheduler :LRScheduler = None
        self._lr_warmup_scheduler: LRScheduler = None
        self._train_dataloader :DataLoader = None
        self._val_dataloader :DataLoader = None
        self._ema :nn.Module = None
        self._scaler :GradScaler = None
        self._train_dataset :Dataset = None
        self._val_dataset :Dataset = None
        self._collate_fn :Callable = None
        self._evaluator :Callable[[nn.Module, DataLoader, str], ] = None
        self._writer: SummaryWriter = None
        # dataset
        self.num_workers :int = 0
        self.batch_size :int = None
        self._train_batch_size :int = None
        self._val_batch_size :int = None
        self._train_shuffle: bool = None
        self._val_shuffle: bool = None
        # runtime
        self.resume :str = None
        self.tuning :str = None
        self.epoches :int = None
        self.last_epoch :int = -1
        self.use_amp :bool = False
        self.use_ema :bool = False
        self.ema_decay :float = 0.9999
        self.ema_warmups: int = 2000
        self.sync_bn :bool = False
        self.clip_max_norm : float = 0.
        self.find_unused_parameters :bool = None
        self.seed :int = None
        self.print_freq :int = None
        self.checkpoint_freq :int = 1
        self.output_dir :str = None
        self.summary_dir :str = None
        self.device : str = ''

    @property
    def model(self, ) -> nn.Module:
        return self._model

    @model.setter
    def model(self, m):
        assert isinstance(m, nn.Module), f'{type(m)} != nn.Module, please check your model class'
        self._model = m

    @property
    def postprocessor(self, ) -> nn.Module:
        return self._postprocessor

    @postprocessor.setter
    def postprocessor(self, m):
        assert isinstance(m, nn.Module), f'{type(m)} != nn.Module, please check your model class'
        self._postprocessor = m

    @property
    def criterion(self, ) -> nn.Module:
        return self._criterion

    @criterion.setter
    def criterion(self, m):
        assert isinstance(m, nn.Module), f'{type(m)} != nn.Module, please check your model class'
        self._criterion = m

    @property
    def optimizer(self, ) -> Optimizer:
        return self._optimizer

    @optimizer.setter
    def optimizer(self, m):
        assert isinstance(m, Optimizer), f'{type(m)} != optim.Optimizer, please check your model class'
        self._optimizer = m

    @property
    def lr_scheduler(self, ) -> LRScheduler:
        return self._lr_scheduler

    @lr_scheduler.setter
    def lr_scheduler(self, m):
        assert isinstance(m, LRScheduler), f'{type(m)} != LRScheduler, please check your model class'
        self._lr_scheduler = m

    @property
    def lr_warmup_scheduler(self, ) -> LRScheduler:
        return self._lr_warmup_scheduler

    @lr_warmup_scheduler.setter
    def lr_warmup_scheduler(self, m):
        self._lr_warmup_scheduler = m

    @property
    def train_dataloader(self) -> DataLoader:
        if self._train_dataloader is None and self.train_dataset is not None:
            loader = DataLoader(self.train_dataset,
                                batch_size=self.train_batch_size,
                                num_workers=self.num_workers,
                                collate_fn=self.collate_fn,
                                shuffle=self.train_shuffle, )
            loader.shuffle = self.train_shuffle
            self._train_dataloader = loader
        return self._train_dataloader

    @train_dataloader.setter
    def train_dataloader(self, loader):
        self._train_dataloader = loader

    @property
    def val_dataloader(self) -> DataLoader:
        if self._val_dataloader is None and self.val_dataset is not None:
            loader = DataLoader(self.val_dataset,
                                batch_size=self.val_batch_size,
                                num_workers=self.num_workers,
                                drop_last=False,
                                collate_fn=self.collate_fn,
                                shuffle=self.val_shuffle)
            loader.shuffle = self.val_shuffle
            self._val_dataloader = loader
        return self._val_dataloader

    @val_dataloader.setter
    def val_dataloader(self, loader):
        self._val_dataloader = loader

    @property
    def ema(self, ) -> nn.Module:
        if self._ema is None and self.use_ema and self.model is not None:
            from ..optim import ModelEMA
            self._ema = ModelEMA(self.model, self.ema_decay, self.ema_warmups)
        return self._ema

    @ema.setter
    def ema(self, obj):
        self._ema = obj

    @property
    def scaler(self) -> GradScaler:
        if self._scaler is None and self.use_amp and torch.cuda.is_available():
            self._scaler = GradScaler()
        return self._scaler

    @scaler.setter
    def scaler(self, obj: GradScaler):
        self._scaler = obj

    @property
    def val_shuffle(self) -> bool:
        if self._val_shuffle is None:
            print('warning: set default val_shuffle=False')
            return False
        return self._val_shuffle

    @val_shuffle.setter
    def val_shuffle(self, shuffle):
        assert isinstance(shuffle, bool), 'shuffle must be bool'
        self._val_shuffle = shuffle

    @property
    def train_shuffle(self) -> bool:
        if self._train_shuffle is None:
            print('warning: set default train_shuffle=True')
            return True
        return self._train_shuffle

    @train_shuffle.setter
    def train_shuffle(self, shuffle):
        assert isinstance(shuffle, bool), 'shuffle must be bool'
        self._train_shuffle = shuffle

    @property
    def train_batch_size(self) -> int:
        if self._train_batch_size is None and isinstance(self.batch_size, int):
            print(f'warning: set train_batch_size=batch_size={self.batch_size}')
            return self.batch_size
        return self._train_batch_size

    @train_batch_size.setter
    def train_batch_size(self, batch_size):
        assert isinstance(batch_size, int), 'batch_size must be int'
        self._train_batch_size = batch_size

    @property
    def val_batch_size(self) -> int:
        if self._val_batch_size is None:
            print(f'warning: set val_batch_size=batch_size={self.batch_size}')
            return self.batch_size
        return self._val_batch_size

    @val_batch_size.setter
    def val_batch_size(self, batch_size):
        assert isinstance(batch_size, int), 'batch_size must be int'
        self._val_batch_size = batch_size

    @property
    def train_dataset(self) -> Dataset:
        return self._train_dataset

    @train_dataset.setter
    def train_dataset(self, dataset):
        assert isinstance(dataset, Dataset), f'{type(dataset)} must be Dataset'
        self._train_dataset = dataset

    @property
    def val_dataset(self) -> Dataset:
        return self._val_dataset

    @val_dataset.setter
    def val_dataset(self, dataset):
        assert isinstance(dataset, Dataset), f'{type(dataset)} must be Dataset'
        self._val_dataset = dataset

    @property
    def collate_fn(self) -> Callable:
        return self._collate_fn

    @collate_fn.setter
    def collate_fn(self, fn):
        assert isinstance(fn, Callable), f'{type(fn)} must be Callable'
        self._collate_fn = fn

    @property
    def evaluator(self) -> Callable:
        return self._evaluator

    @evaluator.setter
    def evaluator(self, fn):
        assert isinstance(fn, Callable), f'{type(fn)} must be Callable'
        self._evaluator = fn

    @property
    def writer(self) -> SummaryWriter:
        if self._writer is None:
            if self.summary_dir:
                self._writer = SummaryWriter(self.summary_dir)
            elif self.output_dir:
                self._writer = SummaryWriter(Path(self.output_dir) / 'summary')
        return self._writer

    @writer.setter
    def writer(self, m):
        assert isinstance(m, SummaryWriter), f'{type(m)} must be SummaryWriter'
        self._writer = m

    def __repr__(self, ):
        s = ''
        for k, v in self.__dict__.items():
            if not k.startswith('_'):
                s += f'{k}: {v}\n'
        return s

@@ -0,0 +1,179 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import inspect
import importlib
import functools
from collections import defaultdict
from typing import Any, Dict, Optional, List
GLOBAL_CONFIG = defaultdict(dict)
def register(dct :Any=GLOBAL_CONFIG, name=None, force=False):
    """
    dct:
        if dct is a dict, register foo into dct as a key-value pair
        if dct is a class, register foo as an attribute of that class
    force:
        whether to register even if the name is already taken.
    """
    def decorator(foo):
        register_name = foo.__name__ if name is None else name
        if not force:
            if inspect.isclass(dct):
                assert not hasattr(dct, foo.__name__), \
                    f'module {dct.__name__} has {foo.__name__}'
            else:
                assert foo.__name__ not in dct, \
                    f'{foo.__name__} has been already registered'
        if inspect.isfunction(foo):
            @functools.wraps(foo)
            def wrap_func(*args, **kwargs):
                return foo(*args, **kwargs)
            if isinstance(dct, dict):
                dct[foo.__name__] = wrap_func
            elif inspect.isclass(dct):
                setattr(dct, foo.__name__, wrap_func)
            else:
                raise AttributeError('')
            return wrap_func
        elif inspect.isclass(foo):
            dct[register_name] = extract_schema(foo)
        else:
            raise ValueError(f'Do not support {type(foo)} register')
        return foo
    return decorator

def extract_schema(module: type):
    """
    Build a config schema from the signature of `module.__init__`.
    Args:
        module (type): the class to inspect.
    Return:
        Dict: metadata keys (prefixed with `_`) plus one entry per constructor argument.
    """
    argspec = inspect.getfullargspec(module.__init__)
    arg_names = [arg for arg in argspec.args if arg != 'self']
    num_defaults = len(argspec.defaults) if argspec.defaults is not None else 0
    num_requires = len(arg_names) - num_defaults
    schema = dict()
    schema['_name'] = module.__name__
    schema['_pymodule'] = importlib.import_module(module.__module__)
    schema['_inject'] = getattr(module, '__inject__', [])
    schema['_share'] = getattr(module, '__share__', [])
    schema['_kwargs'] = {}
    for i, name in enumerate(arg_names):
        if name in schema['_share']:
            assert i >= num_requires, 'share config must have default value.'
            value = argspec.defaults[i - num_requires]
        elif i >= num_requires:
            value = argspec.defaults[i - num_requires]
        else:
            value = None
        schema[name] = value
        schema['_kwargs'][name] = value
    return schema

def create(type_or_name, global_cfg=GLOBAL_CONFIG, **kwargs):
    """
    Create an instance of a registered class (or return an already built object)
    from its type or registered name.
    """
    assert type(type_or_name) in (type, str), 'create should be modules or name.'
    name = type_or_name if isinstance(type_or_name, str) else type_or_name.__name__
    if name in global_cfg:
        if hasattr(global_cfg[name], '__dict__'):
            return global_cfg[name]
    else:
        raise ValueError('The module {} is not registered'.format(name))
    cfg = global_cfg[name]
    if isinstance(cfg, dict) and 'type' in cfg:
        _cfg: dict = global_cfg[cfg['type']]
        # clean args
        _keys = [k for k in _cfg.keys() if not k.startswith('_')]
        for _arg in _keys:
            del _cfg[_arg]
        _cfg.update(_cfg['_kwargs']) # restore default args
        _cfg.update(cfg) # load config args
        _cfg.update(kwargs) # TODO receive extra kwargs
        name = _cfg.pop('type') # pop extra key `type` (from cfg)
        return create(name, global_cfg)
    module = getattr(cfg['_pymodule'], name)
    module_kwargs = {}
    module_kwargs.update(cfg)
    # shared var
    for k in cfg['_share']:
        if k in global_cfg:
            module_kwargs[k] = global_cfg[k]
        else:
            module_kwargs[k] = cfg[k]
    # inject
    for k in cfg['_inject']:
        _k = cfg[k]
        if _k is None:
            continue
        if isinstance(_k, str):
            if _k not in global_cfg:
                raise ValueError(f'Missing inject config of {_k}.')
            _cfg = global_cfg[_k]
            if isinstance(_cfg, dict):
                module_kwargs[k] = create(_cfg['_name'], global_cfg)
            else:
                module_kwargs[k] = _cfg
        elif isinstance(_k, dict):
            if 'type' not in _k.keys():
                raise ValueError('Missing inject for `type` style.')
            _type = str(_k['type'])
            if _type not in global_cfg:
                raise ValueError(f'Missing {_type} in inspect stage.')
            # TODO
            _cfg: dict = global_cfg[_type]
            # clean args
            _keys = [k for k in _cfg.keys() if not k.startswith('_')]
            for _arg in _keys:
                del _cfg[_arg]
            _cfg.update(_cfg['_kwargs']) # restore default values
            _cfg.update(_k) # load config args
            name = _cfg.pop('type') # pop extra key (`type` from _k)
            module_kwargs[k] = create(name, global_cfg)
        else:
            raise ValueError(f'Inject does not support {_k}')
    # TODO hard code
    module_kwargs = {k: v for k, v in module_kwargs.items() if not k.startswith('_')}
    # TODO for **kwargs
    # extra_args = set(module_kwargs.keys()) - set(arg_names)
    # if len(extra_args) > 0:
    #     raise RuntimeError(f'Error: unknown args {extra_args} for {module}')
    return module(**module_kwargs)
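
The register/create pair above implements a name-based registry: classes are recorded together with their constructor defaults, and instances are later built from a name plus overrides. The sketch below is a deliberately reduced illustration of that idea, not the module's implementation; `REGISTRY`, `Encoder`, and the simplified `register` and `create` are hypothetical, and the share/inject handling is left out.

```python
import inspect

REGISTRY = {}


def register(cls):
    """Record the class and its constructor defaults under the class name."""
    params = inspect.signature(cls.__init__).parameters
    defaults = {
        name: (None if p.default is inspect.Parameter.empty else p.default)
        for name, p in params.items()
        if name != 'self'
    }
    REGISTRY[cls.__name__] = {'cls': cls, 'kwargs': defaults}
    return cls


def create(name, **overrides):
    """Instantiate a registered class, letting overrides replace the defaults."""
    if name not in REGISTRY:
        raise ValueError(f'The module {name} is not registered')
    entry = REGISTRY[name]
    return entry['cls'](**{**entry['kwargs'], **overrides})


@register
class Encoder:
    def __init__(self, hidden_dim=256, expansion=1.0):
        self.hidden_dim = hidden_dim
        self.expansion = expansion


enc = create('Encoder', expansion=0.5)
print(enc.hidden_dim, enc.expansion)
```

Keeping the defaults next to the class is what lets a YAML file name a component and override only the arguments it cares about.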

@@ -0,0 +1,172 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
import re
import copy
from ._config import BaseConfig
from .workspace import create
from .yaml_utils import load_config, merge_config, merge_dict
class YAMLConfig(BaseConfig):
def __init__(self, cfg_path: str, **kwargs) -> None:
super().__init__()
cfg = load_config(cfg_path)
cfg = merge_dict(cfg, kwargs)
self.yaml_cfg = copy.deepcopy(cfg)
for k in super().__dict__:
if not k.startswith('_') and k in cfg:
self.__dict__[k] = cfg[k]
@property
def global_cfg(self, ):
return merge_config(self.yaml_cfg, inplace=False, overwrite=False)
@property
def model(self, ) -> torch.nn.Module:
if self._model is None and 'model' in self.yaml_cfg:
self._model = create(self.yaml_cfg['model'], self.global_cfg)
return super().model
@property
def postprocessor(self, ) -> torch.nn.Module:
if self._postprocessor is None and 'postprocessor' in self.yaml_cfg:
self._postprocessor = create(self.yaml_cfg['postprocessor'], self.global_cfg)
return super().postprocessor
@property
def criterion(self, ) -> torch.nn.Module:
if self._criterion is None and 'criterion' in self.yaml_cfg:
self._criterion = create(self.yaml_cfg['criterion'], self.global_cfg)
return super().criterion
@property
def optimizer(self, ) -> optim.Optimizer:
if self._optimizer is None and 'optimizer' in self.yaml_cfg:
params = self.get_optim_params(self.yaml_cfg['optimizer'], self.model)
self._optimizer = create('optimizer', self.global_cfg, params=params)
return super().optimizer
@property
def lr_scheduler(self, ) -> optim.lr_scheduler.LRScheduler:
if self._lr_scheduler is None and 'lr_scheduler' in self.yaml_cfg:
self._lr_scheduler = create('lr_scheduler', self.global_cfg, optimizer=self.optimizer)
print(f'Initial lr: {self._lr_scheduler.get_last_lr()}')
return super().lr_scheduler
@property
def lr_warmup_scheduler(self, ) -> optim.lr_scheduler.LRScheduler:
if self._lr_warmup_scheduler is None and 'lr_warmup_scheduler' in self.yaml_cfg :
self._lr_warmup_scheduler = create('lr_warmup_scheduler', self.global_cfg, lr_scheduler=self.lr_scheduler)
return super().lr_warmup_scheduler
@property
def train_dataloader(self, ) -> DataLoader:
if self._train_dataloader is None and 'train_dataloader' in self.yaml_cfg:
self._train_dataloader = self.build_dataloader('train_dataloader')
return super().train_dataloader
@property
def val_dataloader(self, ) -> DataLoader:
if self._val_dataloader is None and 'val_dataloader' in self.yaml_cfg:
self._val_dataloader = self.build_dataloader('val_dataloader')
return super().val_dataloader
@property
def ema(self, ) -> torch.nn.Module:
if self._ema is None and self.yaml_cfg.get('use_ema', False):
self._ema = create('ema', self.global_cfg, model=self.model)
return super().ema
@property
def scaler(self, ):
if self._scaler is None and self.yaml_cfg.get('use_amp', False):
self._scaler = create('scaler', self.global_cfg)
return super().scaler
@property
def evaluator(self, ):
if self._evaluator is None and 'evaluator' in self.yaml_cfg:
if self.yaml_cfg['evaluator']['type'] == 'CocoEvaluator':
from ..data import get_coco_api_from_dataset
base_ds = get_coco_api_from_dataset(self.val_dataloader.dataset)
self._evaluator = create('evaluator', self.global_cfg, coco_gt=base_ds)
else:
raise NotImplementedError(f"{self.yaml_cfg['evaluator']['type']}")
return super().evaluator
@staticmethod
def get_optim_params(cfg: dict, model: nn.Module):
"""
E.g.:
^(?=.*a)(?=.*b).*$ means including a and b
^(?=.*(?:a|b)).*$ means including a or b
^(?=.*a)(?!.*b).*$ means including a, but not b
"""
assert 'type' in cfg, 'optimizer config must define `type`'
cfg = copy.deepcopy(cfg)
if 'params' not in cfg:
return model.parameters()
assert isinstance(cfg['params'], list), '`params` must be a list of parameter-group dicts'
param_groups = []
visited = []
for pg in cfg['params']:
pattern = pg['params']
params = {k: v for k, v in model.named_parameters() if v.requires_grad and len(re.findall(pattern, k)) > 0}
pg['params'] = params.values()
param_groups.append(pg)
visited.extend(list(params.keys()))
# print(params.keys())
names = [k for k, v in model.named_parameters() if v.requires_grad]
if len(visited) < len(names):
unseen = set(names) - set(visited)
params = {k: v for k, v in model.named_parameters() if v.requires_grad and k in unseen}
param_groups.append({'params': params.values()})
visited.extend(list(params.keys()))
# print(params.keys())
assert len(visited) == len(names), 'a parameter matched more than one group pattern'
return param_groups
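The lookahead patterns listed in the docstring of `get_optim_params` can be tried on plain parameter names without building a model. The following is a minimal sketch, not the repository's implementation: `group_names` and the sample names are hypothetical, and only the name-matching step is mirrored, with unmatched names collected into a final default group.

```python
import re


def group_names(names, patterns):
    """Group names by regex pattern; names matching no pattern form a last group."""
    groups = []
    visited = []
    for pattern in patterns:
        # Same test as the source: a name belongs to a group if findall is non-empty.
        matched = [n for n in names if len(re.findall(pattern, n)) > 0]
        groups.append(matched)
        visited.extend(matched)
    seen = set(visited)
    unseen = [n for n in names if n not in seen]
    if unseen:
        groups.append(unseen)
    return groups


# Hypothetical parameter names, for illustration only.
names = [
    "backbone.conv1.weight",
    "backbone.bn1.weight",
    "encoder.layer.norm.weight",
    "decoder.head.bias",
]
patterns = [
    r"^(?=.*backbone)(?!.*bn).*$",  # contains "backbone" but not "bn"
    r"^(?=.*(?:norm|bn)).*$",       # contains "norm" or "bn"
]
groups = group_names(names, patterns)
```

Patterns should be written so that each name matches at most one of them, because the final length check in `get_optim_params` fails when a parameter is counted twice.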
@staticmethod
def get_rank_batch_size(cfg):
"""Compute the per-rank batch size when `total_batch_size` is provided.
"""
assert ('total_batch_size' in cfg or 'batch_size' in cfg) \
and not ('total_batch_size' in cfg and 'batch_size' in cfg), \
'exactly one of `batch_size` or `total_batch_size` must be set'
total_batch_size = cfg.get('total_batch_size', None)
if total_batch_size is None:
bs = cfg.get('batch_size')
else:
from ..misc import dist_utils
assert total_batch_size % dist_utils.get_world_size() == 0, \
'total_batch_size should be divisible by world size'
bs = total_batch_size // dist_utils.get_world_size()
return bs
def build_dataloader(self, name: str):
bs = self.get_rank_batch_size(self.yaml_cfg[name])
global_cfg = self.global_cfg
if 'total_batch_size' in global_cfg[name]:
# pop unexpected key for dataloader init
_ = global_cfg[name].pop('total_batch_size')
print(f'building {name} with batch_size={bs}...')
loader = create(name, global_cfg, batch_size=bs)
loader.shuffle = self.yaml_cfg[name].get('shuffle', False)
return loader
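The batch-size rule used by `build_dataloader` can be checked in isolation. The sketch below is a hedged restatement of `get_rank_batch_size`, assuming the world size is passed in explicitly instead of being read from `dist_utils`; the name `rank_batch_size` is hypothetical and errors are raised as `ValueError` rather than through `assert`.

```python
def rank_batch_size(cfg, world_size):
    """Return the per-rank batch size from a dataloader config dict."""
    has_total = "total_batch_size" in cfg
    has_local = "batch_size" in cfg
    # Exactly one of the two keys may be present.
    if has_total == has_local:
        raise ValueError("exactly one of `batch_size` or `total_batch_size` must be set")
    if has_local:
        return cfg["batch_size"]
    total = cfg["total_batch_size"]
    if total % world_size != 0:
        raise ValueError("total_batch_size should be divisible by world size")
    return total // world_size


per_rank_from_total = rank_batch_size({"total_batch_size": 32}, world_size=4)
per_rank_from_local = rank_batch_size({"batch_size": 4}, world_size=8)
```

With `total_batch_size` the value is split evenly across ranks, while `batch_size` is used as given on every rank.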

@@ -0,0 +1,124 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import os
import copy
import yaml
from typing import Any, Dict, Optional, List
from .workspace import GLOBAL_CONFIG
__all__ = [
'load_config',
'merge_config',
'merge_dict',
'parse_cli',
]
INCLUDE_KEY = '__include__'
def load_config(file_path, cfg=None):
"""load config
"""
# avoid a shared mutable default: state must not leak between calls
if cfg is None:
cfg = {}
_, ext = os.path.splitext(file_path)
assert ext in ['.yml', '.yaml'], "only support yaml files"
with open(file_path) as f:
file_cfg = yaml.load(f, Loader=yaml.Loader)
if file_cfg is None:
return {}
if INCLUDE_KEY in file_cfg:
base_yamls = list(file_cfg[INCLUDE_KEY])
for base_yaml in base_yamls:
if base_yaml.startswith('~'):
base_yaml = os.path.expanduser(base_yaml)
if not base_yaml.startswith('/'):
base_yaml = os.path.join(os.path.dirname(file_path), base_yaml)
base_cfg = load_config(base_yaml, cfg)
merge_dict(cfg, base_cfg)
return merge_dict(cfg, file_cfg)
def merge_dict(dct, another_dct, inplace=True) -> Dict:
"""merge another_dct into dct
"""
def _merge(dct, another) -> Dict:
for k in another:
if (k in dct and isinstance(dct[k], dict) and isinstance(another[k], dict)):
_merge(dct[k], another[k])
else:
dct[k] = another[k]
return dct
if not inplace:
dct = copy.deepcopy(dct)
return _merge(dct, another_dct)
def dictify(s: str, v: Any) -> Dict:
if '.' not in s:
return {s: v}
key, rest = s.split('.', 1)
return {key: dictify(rest, v)}
def parse_cli(nargs: List[str]) -> Dict:
"""
parse command-line arguments
convert `a.c=3 b=10` to `{'a': {'c': 3}, 'b': 10}`
"""
cfg = {}
if nargs is None or len(nargs) == 0:
return cfg
for s in nargs:
s = s.strip()
k, v = s.split('=', 1)
d = dictify(k, yaml.load(v, Loader=yaml.Loader))
cfg = merge_dict(cfg, d)
return cfg
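The dotted-key expansion performed by `parse_cli` is easiest to see on a small input. This is a sketch under stated assumptions: values are parsed with `ast.literal_eval` instead of `yaml.load`, so YAML-only literals such as `true` stay plain strings here, and `parse_value` and `parse_overrides` are hypothetical names.

```python
import ast


def dictify(key, value):
    """Expand 'a.b.c' into nested dicts: {'a': {'b': {'c': value}}}."""
    if "." not in key:
        return {key: value}
    head, rest = key.split(".", 1)
    return {head: dictify(rest, value)}


def merge_dict(dst, src):
    """Recursively merge src into dst; non-dict values in src win."""
    for k, v in src.items():
        if isinstance(dst.get(k), dict) and isinstance(v, dict):
            merge_dict(dst[k], v)
        else:
            dst[k] = v
    return dst


def parse_value(text):
    """Stand-in for yaml.load: Python literals are parsed, anything else stays a string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def parse_overrides(args):
    cfg = {}
    for arg in args:
        key, raw = arg.strip().split("=", 1)
        merge_dict(cfg, dictify(key, parse_value(raw)))
    return cfg


overrides = parse_overrides(["a.c=3", "a.d=x", "b=10"])
```

Splitting on the first `=` only keeps values that themselves contain `=` intact, which matches `s.split('=', 1)` in the source.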
def merge_config(cfg, another_cfg=GLOBAL_CONFIG, inplace: bool=False, overwrite: bool=False):
"""
Merge another_cfg into cfg, return the merged config
Example:
cfg1 = load_config('./rtdetrv2_r18vd_6x_coco.yml')
cfg1 = merge_config(cfg1, inplace=True)
cfg2 = load_config('./rtdetr_r50vd_6x_coco.yml')
cfg2 = merge_config(cfg2, inplace=True)
model1 = create(cfg1['model'], cfg1)
model2 = create(cfg2['model'], cfg2)
"""
def _merge(dct, another):
for k in another:
if k not in dct:
dct[k] = another[k]
elif isinstance(dct[k], dict) and isinstance(another[k], dict):
_merge(dct[k], another[k])
elif overwrite:
dct[k] = another[k]
return dct
if not inplace:
cfg = copy.deepcopy(cfg)
return _merge(cfg, another_cfg)
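Unlike `merge_dict`, `merge_config` only fills in keys that are missing from `cfg` unless `overwrite=True`. The sketch below illustrates that difference on plain dicts. `fill_defaults` is a hypothetical name, and it deep-copies inserted values, which the source does not do.

```python
import copy


def fill_defaults(cfg, defaults, overwrite=False):
    """Return a copy of cfg with missing keys taken from defaults."""
    def _merge(dst, src):
        for k, v in src.items():
            if k not in dst:
                dst[k] = copy.deepcopy(v)      # missing key: take the default
            elif isinstance(dst[k], dict) and isinstance(v, dict):
                _merge(dst[k], v)              # both dicts: recurse
            elif overwrite:
                dst[k] = copy.deepcopy(v)      # conflict: default wins only on request
        return dst

    return _merge(copy.deepcopy(cfg), defaults)


cfg = {"model": {"depth": 18}}
defaults = {"model": {"depth": 50, "act": "relu"}, "epochs": 72}
merged = fill_defaults(cfg, defaults)
forced = fill_defaults(cfg, defaults, overwrite=True)
```

In `merged` the user value `depth: 18` survives, whereas in `forced` the default replaces it.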

@@ -0,0 +1,21 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
from .dataset import *
from .transforms import *
from .dataloader import *
from ._misc import convert_to_tv_tensor
# def set_epoch(self, epoch) -> None:
# self.epoch = epoch
# def _set_epoch_func(datasets):
# """Add `set_epoch` for datasets
# """
# from ..core import register
# for ds in datasets:
# register(ds)(set_epoch)
# _set_epoch_func([CIFAR10, VOCDetection, CocoDetection])

@@ -0,0 +1,55 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import importlib.metadata
from torch import Tensor
if importlib.metadata.version('torchvision') == '0.15.2':
import torchvision
torchvision.disable_beta_transforms_warning()
from torchvision.datapoints import BoundingBox as BoundingBoxes
from torchvision.datapoints import BoundingBoxFormat, Mask, Image, Video
from torchvision.transforms.v2 import SanitizeBoundingBox as SanitizeBoundingBoxes
_boxes_keys = ['format', 'spatial_size']
elif '0.17' > importlib.metadata.version('torchvision') >= '0.16':
import torchvision
torchvision.disable_beta_transforms_warning()
from torchvision.transforms.v2 import SanitizeBoundingBoxes
from torchvision.tv_tensors import (
BoundingBoxes, BoundingBoxFormat, Mask, Image, Video)
_boxes_keys = ['format', 'canvas_size']
elif importlib.metadata.version('torchvision') >= '0.17':
import torchvision
from torchvision.transforms.v2 import SanitizeBoundingBoxes
from torchvision.tv_tensors import (
BoundingBoxes, BoundingBoxFormat, Mask, Image, Video)
_boxes_keys = ['format', 'canvas_size']
else:
raise RuntimeError('Please make sure torchvision version >= 0.15.2')
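def convert_to_tv_tensor(tensor: Tensor, key: str, box_format='xyxy', spatial_size=None) -> Tensor:
"""
Args:
tensor (Tensor): input tensor
key (str): target type, either 'boxes' or 'masks'
box_format (str): box format name such as 'xyxy' (used only for 'boxes')
spatial_size: (height, width) of the image the boxes refer to (used only for 'boxes')
Return:
BoundingBoxes or Mask wrapping `tensor`
"""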
assert key in ('boxes', 'masks', ), "Only support 'boxes' and 'masks'"
if key == 'boxes':
box_format = getattr(BoundingBoxFormat, box_format.upper())
_kwargs = dict(zip(_boxes_keys, [box_format, spatial_size]))
return BoundingBoxes(tensor, **_kwargs)
if key == 'masks':
return Mask(tensor)
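The version checks above compare version strings directly, which depends on lexicographic order and on the exact string `'0.15.2'`. The following sketch shows one way to compare numeric release components instead. It is an illustration, not the repository's code: `release_tuple` and `box_size_key` are hypothetical names, and treating every `0.15.x` release like `0.15.2` is an assumption.

```python
import re


def release_tuple(version_text, width=2):
    """Leading numeric components of a version, e.g. '0.15.2+cu118' -> (0, 15)."""
    match = re.match(r"\d+(?:\.\d+)*", version_text)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_text!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    parts += [0] * max(0, width - len(parts))
    return tuple(parts[:width])


def box_size_key(version_text):
    """Name of the size keyword stored in `_boxes_keys[1]` for a torchvision version."""
    release = release_tuple(version_text)
    if release < (0, 15):
        raise RuntimeError("Please make sure torchvision version >= 0.15.2")
    # Assumption: all 0.15.x releases use the older `spatial_size` keyword.
    return "spatial_size" if release == (0, 15) else "canvas_size"


old_key = box_size_key("0.15.2+cu118")
new_key = box_size_key("0.19.0")
```

Tuple comparison orders `0.9` before `0.16`, while the string comparison `'0.9.0' >= '0.16'` evaluates to true. A local suffix such as `+cu118` is ignored by the regular expression.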

@@ -0,0 +1,107 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch
import torch.utils.data as data
import torch.nn.functional as F
from torch.utils.data import default_collate
import torchvision
torchvision.disable_beta_transforms_warning()
import torchvision.transforms.v2 as VT
from torchvision.transforms.v2 import functional as VF, InterpolationMode
import random
from functools import partial
from ..core import register
__all__ = [
'DataLoader',
'BaseCollateFunction',
'BatchImageCollateFunction',
'batch_image_collate_fn'
]
@register()
class DataLoader(data.DataLoader):
__inject__ = ['dataset', 'collate_fn']
def __repr__(self) -> str:
format_string = self.__class__.__name__ + "("
for n in ['dataset', 'batch_size', 'num_workers', 'drop_last', 'collate_fn']:
format_string += "\n"
format_string += " {0}: {1}".format(n, getattr(self, n))
format_string += "\n)"
return format_string
def set_epoch(self, epoch):
self._epoch = epoch
self.dataset.set_epoch(epoch)
self.collate_fn.set_epoch(epoch)
@property
def epoch(self):
return self._epoch if hasattr(self, '_epoch') else -1
@property
def shuffle(self):
return self._shuffle
@shuffle.setter
def shuffle(self, shuffle):
assert isinstance(shuffle, bool), 'shuffle must be a boolean'
self._shuffle = shuffle
@register()
def batch_image_collate_fn(items):
"""only batch image
"""
return torch.cat([x[0][None] for x in items], dim=0), [x[1] for x in items]
class BaseCollateFunction(object):
def set_epoch(self, epoch):
self._epoch = epoch
@property
def epoch(self):
return self._epoch if hasattr(self, '_epoch') else -1
def __call__(self, items):
raise NotImplementedError('')
@register()
class BatchImageCollateFunction(BaseCollateFunction):
def __init__(
self,
scales=None,
stop_epoch=None,
) -> None:
super().__init__()
self.scales = scales
self.stop_epoch = stop_epoch if stop_epoch is not None else 100000000
# self.interpolation = interpolation
def __call__(self, items):
images = torch.cat([x[0][None] for x in items], dim=0)
targets = [x[1] for x in items]
if self.scales is not None and self.epoch < self.stop_epoch:
# sz = random.choice(self.scales)
# sz = [sz] if isinstance(sz, int) else list(sz)
# VF.resize(inpt, sz, interpolation=self.interpolation)
sz = random.choice(self.scales)
images = F.interpolate(images, size=sz)
if 'masks' in targets[0]:
for tg in targets:
tg['masks'] = F.interpolate(tg['masks'], size=sz, mode='nearest')
raise NotImplementedError('')
return images, targets
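The multi-scale branch of `BatchImageCollateFunction` is gated by the current epoch. The sketch below isolates that decision without tensors, assuming the caller supplies the epoch. `pick_scale` is a hypothetical helper, and returning `None` stands for "keep the original size".

```python
import random


def pick_scale(scales, epoch, stop_epoch=None, rng=random):
    """Choose a random training scale, or None once multi-scale training has stopped."""
    # Same fallback as the source: no stop epoch means effectively never stop.
    limit = stop_epoch if stop_epoch is not None else 100000000
    if scales is None or epoch >= limit:
        return None
    return rng.choice(scales)


early = pick_scale([480, 640], epoch=0, stop_epoch=71, rng=random.Random(0))
late = pick_scale([480, 640], epoch=80, stop_epoch=71)
```

Because the scale is chosen once per call, every image in a batch is resized to the same size, which is what allows them to stay stacked in a single tensor.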

@@ -0,0 +1,16 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
# from ._dataset import DetDataset
from .cifar_dataset import CIFAR10
from .coco_dataset import (
CocoDetection,
mscoco_category2name,
mscoco_category2label,
mscoco_label2category,
)
from .coco_eval import CocoEvaluator
from .coco_utils import get_coco_api_from_dataset
from .voc_detection import VOCDetection
from .voc_eval import VOCEvaluator

@@ -0,0 +1,22 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch
import torch.utils.data as data
class DetDataset(data.Dataset):
def __getitem__(self, index):
img, target = self.load_item(index)
if self.transforms is not None:
img, target, _ = self.transforms(img, target, self)
return img, target
def load_item(self, index):
raise NotImplementedError("Please implement this function to return item before `transforms`.")
def set_epoch(self, epoch) -> None:
self._epoch = epoch
@property
def epoch(self):
return self._epoch if hasattr(self, '_epoch') else -1

@@ -0,0 +1,16 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torchvision
from typing import Optional, Callable
from ...core import register
@register()
class CIFAR10(torchvision.datasets.CIFAR10):
__inject__ = ['transform', 'target_transform']
def __init__(self, root: str, train: bool = True, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = False) -> None:
super().__init__(root, train, transform, target_transform, download)

@@ -0,0 +1,261 @@
"""
Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
Mostly copy-paste from https://github.com/pytorch/vision/blob/13b35ff/references/detection/coco_utils.py
Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch
from faster_coco_eval.utils.pytorch import FasterCocoDetection
import torchvision
from PIL import Image
from faster_coco_eval.core import mask as coco_mask
from ._dataset import DetDataset
from .._misc import convert_to_tv_tensor
from ...core import register
__all__ = ['CocoDetection']
torchvision.disable_beta_transforms_warning()
@register()
class CocoDetection(FasterCocoDetection, DetDataset):
__inject__ = ['transforms', ]
__share__ = ['remap_mscoco_category']
def __init__(self, img_folder, ann_file, transforms, return_masks=False, remap_mscoco_category=False):
super(FasterCocoDetection, self).__init__(img_folder, ann_file)
self._transforms = transforms
self.prepare = ConvertCocoPolysToMask(return_masks)
self.img_folder = img_folder
self.ann_file = ann_file
self.return_masks = return_masks
self.remap_mscoco_category = remap_mscoco_category
def __getitem__(self, idx):
img, target = self.load_item(idx)
if self._transforms is not None:
img, target, _ = self._transforms(img, target, self)
return img, target
def load_item(self, idx):
image, target = super(FasterCocoDetection, self).__getitem__(idx)
image_id = self.ids[idx]
target = {'image_id': image_id, 'annotations': target}
if self.remap_mscoco_category:
image, target = self.prepare(image, target, category2label=mscoco_category2label)
# image, target = self.prepare(image, target, category2label=self.category2label)
else:
image, target = self.prepare(image, target)
target['idx'] = torch.tensor([idx])
if 'boxes' in target:
target['boxes'] = convert_to_tv_tensor(target['boxes'], key='boxes', spatial_size=image.size[::-1])
if 'masks' in target:
target['masks'] = convert_to_tv_tensor(target['masks'], key='masks')
return image, target
def extra_repr(self) -> str:
s = f' img_folder: {self.img_folder}\n ann_file: {self.ann_file}\n'
s += f' return_masks: {self.return_masks}\n'
if hasattr(self, '_transforms') and self._transforms is not None:
s += f' transforms:\n {repr(self._transforms)}'
if hasattr(self, '_preset') and self._preset is not None:
s += f' preset:\n {repr(self._preset)}'
return s
@property
def categories(self, ):
return self.coco.dataset['categories']
@property
def category2name(self, ):
return {cat['id']: cat['name'] for cat in self.categories}
@property
def category2label(self, ):
return {cat['id']: i for i, cat in enumerate(self.categories)}
@property
def label2category(self, ):
return {i: cat['id'] for i, cat in enumerate(self.categories)}
def convert_coco_poly_to_mask(segmentations, height, width):
masks = []
for polygons in segmentations:
rles = coco_mask.frPyObjects(polygons, height, width)
mask = coco_mask.decode(rles)
if len(mask.shape) < 3:
mask = mask[..., None]
mask = torch.as_tensor(mask, dtype=torch.uint8)
mask = mask.any(dim=2)
masks.append(mask)
if masks:
masks = torch.stack(masks, dim=0)
else:
masks = torch.zeros((0, height, width), dtype=torch.uint8)
return masks
class ConvertCocoPolysToMask(object):
def __init__(self, return_masks=False):
self.return_masks = return_masks
def __call__(self, image: Image.Image, target, **kwargs):
w, h = image.size
image_id = target["image_id"]
image_id = torch.tensor([image_id])
anno = target["annotations"]
anno = [obj for obj in anno if 'iscrowd' not in obj or obj['iscrowd'] == 0]
boxes = [obj["bbox"] for obj in anno]
# guard against no boxes via resizing
boxes = torch.as_tensor(boxes, dtype=torch.float32).reshape(-1, 4)
boxes[:, 2:] += boxes[:, :2]
boxes[:, 0::2].clamp_(min=0, max=w)
boxes[:, 1::2].clamp_(min=0, max=h)
category2label = kwargs.get('category2label', None)
if category2label is not None:
labels = [category2label[obj["category_id"]] for obj in anno]
else:
labels = [obj["category_id"] for obj in anno]
labels = torch.tensor(labels, dtype=torch.int64)
if self.return_masks:
segmentations = [obj["segmentation"] for obj in anno]
masks = convert_coco_poly_to_mask(segmentations, h, w)
keypoints = None
if anno and "keypoints" in anno[0]:
keypoints = [obj["keypoints"] for obj in anno]
keypoints = torch.as_tensor(keypoints, dtype=torch.float32)
num_keypoints = keypoints.shape[0]
if num_keypoints:
keypoints = keypoints.view(num_keypoints, -1, 3)
keep = (boxes[:, 3] > boxes[:, 1]) & (boxes[:, 2] > boxes[:, 0])
boxes = boxes[keep]
labels = labels[keep]
if self.return_masks:
masks = masks[keep]
if keypoints is not None:
keypoints = keypoints[keep]
target = {}
target["boxes"] = boxes
target["labels"] = labels
if self.return_masks:
target["masks"] = masks
target["image_id"] = image_id
if keypoints is not None:
target["keypoints"] = keypoints
# for conversion to coco api
area = torch.tensor([obj["area"] for obj in anno])
iscrowd = torch.tensor([obj["iscrowd"] if "iscrowd" in obj else 0 for obj in anno])
target["area"] = area[keep]
target["iscrowd"] = iscrowd[keep]
target["orig_size"] = torch.as_tensor([int(w), int(h)])
# target["size"] = torch.as_tensor([int(w), int(h)])
return image, target
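The box handling in `ConvertCocoPolysToMask.__call__` converts COCO `[x, y, w, h]` boxes to corner form, clamps them to the image, and drops degenerate boxes. A tensor-free sketch of those three steps follows. `xywh_to_clamped_xyxy` is a hypothetical name and the boxes are made-up values.

```python
def xywh_to_clamped_xyxy(boxes, width, height):
    """Convert [x, y, w, h] boxes to clamped [x0, y0, x1, y1], dropping empty ones."""
    kept = []
    for x, y, w, h in boxes:
        # xywh -> xyxy, then clamp each coordinate to the image bounds.
        x0 = min(max(x, 0), width)
        y0 = min(max(y, 0), height)
        x1 = min(max(x + w, 0), width)
        y1 = min(max(y + h, 0), height)
        # Same keep rule as the source: strictly positive width and height.
        if x1 > x0 and y1 > y0:
            kept.append([x0, y0, x1, y1])
    return kept


boxes = [
    [10, 20, 30, 40],   # fully inside the image
    [90, 90, 50, 50],   # extends past the border and gets clamped
    [100, 0, 5, 5],     # starts at the right edge, empty after clamping
]
clamped = xywh_to_clamped_xyxy(boxes, width=100, height=100)
```

The source applies the same `keep` mask to `labels`, `area`, `iscrowd`, and the optional masks and keypoints, so all per-object fields stay aligned.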
mscoco_category2name = {
1: 'person',
2: 'bicycle',
3: 'car',
4: 'motorcycle',
5: 'airplane',
6: 'bus',
7: 'train',
8: 'truck',
9: 'boat',
10: 'traffic light',
11: 'fire hydrant',
13: 'stop sign',
14: 'parking meter',
15: 'bench',
16: 'bird',
17: 'cat',
18: 'dog',
19: 'horse',
20: 'sheep',
21: 'cow',
22: 'elephant',
23: 'bear',
24: 'zebra',
25: 'giraffe',
27: 'backpack',
28: 'umbrella',
31: 'handbag',
32: 'tie',
33: 'suitcase',
34: 'frisbee',
35: 'skis',
36: 'snowboard',
37: 'sports ball',
38: 'kite',
39: 'baseball bat',
40: 'baseball glove',
41: 'skateboard',
42: 'surfboard',
43: 'tennis racket',
44: 'bottle',
46: 'wine glass',
47: 'cup',
48: 'fork',
49: 'knife',
50: 'spoon',
51: 'bowl',
52: 'banana',
53: 'apple',
54: 'sandwich',
55: 'orange',
56: 'broccoli',
57: 'carrot',
58: 'hot dog',
59: 'pizza',
60: 'donut',
61: 'cake',
62: 'chair',
63: 'couch',
64: 'potted plant',
65: 'bed',
67: 'dining table',
70: 'toilet',
72: 'tv',
73: 'laptop',
74: 'mouse',
75: 'remote',
76: 'keyboard',
77: 'cell phone',
78: 'microwave',
79: 'oven',
80: 'toaster',
81: 'sink',
82: 'refrigerator',
84: 'book',
85: 'clock',
86: 'vase',
87: 'scissors',
88: 'teddy bear',
89: 'hair drier',
90: 'toothbrush'
}
mscoco_category2label = {k: i for i, k in enumerate(mscoco_category2name.keys())}
mscoco_label2category = {v: k for k, v in mscoco_category2label.items()}
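COCO category ids are sparse (80 categories with ids between 1 and 90), so the two mappings above translate between dataset ids and contiguous training labels. A small sketch on a hypothetical subset of ids shows the round trip. The variable names mirror the source, but the data is illustrative.

```python
# Hypothetical subset of sparse category ids, in dataset order.
category2name = {1: "person", 2: "bicycle", 3: "car", 13: "stop sign", 90: "toothbrush"}

# Contiguous labels follow the insertion order of the dict keys.
category2label = {cat_id: i for i, cat_id in enumerate(category2name.keys())}
label2category = {label: cat_id for cat_id, label in category2label.items()}

stop_sign_label = category2label[13]
restored_id = label2category[stop_sign_label]
```

Predictions made in label space need to be mapped back through `label2category` before being scored against annotations that use the original ids.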

@@ -0,0 +1,16 @@
"""
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
COCO evaluator that works in distributed mode.
Mostly copy-paste from https://github.com/pytorch/vision/blob/edfd5a7/references/detection/coco_eval.py
The difference is that there is less copy-pasting from pycocotools
in the end of the file, as python3 can suppress prints with contextlib
# MiXaiLL76 replacing pycocotools with faster-coco-eval for better performance and support.
"""
from ...core import register
from faster_coco_eval.utils.pytorch import FasterCocoEvaluator
@register()
class CocoEvaluator(FasterCocoEvaluator):
pass

@@ -0,0 +1,194 @@
"""
Copied and modified from https://github.com/pytorch/vision/blob/main/references/detection/coco_utils.py
Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch
import torch.utils.data
import torchvision
import torchvision.transforms.functional as TVF
from faster_coco_eval import COCO
import faster_coco_eval.core.mask as mask_util
def convert_coco_poly_to_mask(segmentations, height, width):
masks = []
for polygons in segmentations:
rles = mask_util.frPyObjects(polygons, height, width)
mask = mask_util.decode(rles)
if len(mask.shape) < 3:
mask = mask[..., None]
mask = torch.as_tensor(mask, dtype=torch.uint8)
mask = mask.any(dim=2)
masks.append(mask)
if masks:
masks = torch.stack(masks, dim=0)
else:
masks = torch.zeros((0, height, width), dtype=torch.uint8)
return masks
class ConvertCocoPolysToMask:
def __call__(self, image, target):
w, h = image.size
image_id = target["image_id"]
anno = target["annotations"]
anno = [obj for obj in anno if obj["iscrowd"] == 0]
boxes = [obj["bbox"] for obj in anno]
# guard against no boxes via resizing
boxes = torch.as_tensor(boxes, dtype=torch.float32).reshape(-1, 4)
boxes[:, 2:] += boxes[:, :2]
boxes[:, 0::2].clamp_(min=0, max=w)
boxes[:, 1::2].clamp_(min=0, max=h)
classes = [obj["category_id"] for obj in anno]
classes = torch.tensor(classes, dtype=torch.int64)
segmentations = [obj["segmentation"] for obj in anno]
masks = convert_coco_poly_to_mask(segmentations, h, w)
keypoints = None
if anno and "keypoints" in anno[0]:
keypoints = [obj["keypoints"] for obj in anno]
keypoints = torch.as_tensor(keypoints, dtype=torch.float32)
num_keypoints = keypoints.shape[0]
if num_keypoints:
keypoints = keypoints.view(num_keypoints, -1, 3)
keep = (boxes[:, 3] > boxes[:, 1]) & (boxes[:, 2] > boxes[:, 0])
boxes = boxes[keep]
classes = classes[keep]
masks = masks[keep]
if keypoints is not None:
keypoints = keypoints[keep]
target = {}
target["boxes"] = boxes
target["labels"] = classes
target["masks"] = masks
target["image_id"] = image_id
if keypoints is not None:
target["keypoints"] = keypoints
# for conversion to coco api
area = torch.tensor([obj["area"] for obj in anno])
iscrowd = torch.tensor([obj["iscrowd"] for obj in anno])
target["area"] = area
target["iscrowd"] = iscrowd
return image, target
def _coco_remove_images_without_annotations(dataset, cat_list=None):
def _has_only_empty_bbox(anno):
return all(any(o <= 1 for o in obj["bbox"][2:]) for obj in anno)
def _count_visible_keypoints(anno):
return sum(sum(1 for v in ann["keypoints"][2::3] if v > 0) for ann in anno)
min_keypoints_per_image = 10
def _has_valid_annotation(anno):
# if it's empty, there is no annotation
if len(anno) == 0:
return False
# if all boxes have close to zero area, there is no annotation
if _has_only_empty_bbox(anno):
return False
# keypoint tasks use slightly different criteria for deciding
# whether an annotation is valid
if "keypoints" not in anno[0]:
return True
# for keypoint detection tasks, an image is valid only if it contains
# at least min_keypoints_per_image visible keypoints
if _count_visible_keypoints(anno) >= min_keypoints_per_image:
return True
return False
ids = []
for ds_idx, img_id in enumerate(dataset.ids):
ann_ids = dataset.coco.getAnnIds(imgIds=img_id, iscrowd=None)
anno = dataset.coco.loadAnns(ann_ids)
if cat_list:
anno = [obj for obj in anno if obj["category_id"] in cat_list]
if _has_valid_annotation(anno):
ids.append(ds_idx)
dataset = torch.utils.data.Subset(dataset, ids)
return dataset
def convert_to_coco_api(ds):
coco_ds = COCO()
# annotation IDs need to start at 1, not 0, see torchvision issue #1530
ann_id = 1
dataset = {"images": [], "categories": [], "annotations": []}
categories = set()
for img_idx in range(len(ds)):
# find better way to get target
# targets = ds.get_annotations(img_idx)
# img, targets = ds[img_idx]
# TODO (by lyuwenyu), load image and targets before `transforms`
img, targets = ds.load_item(img_idx)
width, height = img.size
image_id = targets["image_id"].item()
img_dict = {}
img_dict["id"] = image_id
img_dict["width"] = width
img_dict["height"] = height
dataset["images"].append(img_dict)
bboxes = targets["boxes"].clone()
bboxes[:, 2:] -= bboxes[:, :2] # xyxy -> xywh
bboxes = bboxes.tolist()
labels = targets["labels"].tolist()
areas = targets["area"].tolist()
iscrowd = targets["iscrowd"].tolist()
if "masks" in targets:
masks = targets["masks"]
# make masks Fortran contiguous for coco_mask
masks = masks.permute(0, 2, 1).contiguous().permute(0, 2, 1)
if "keypoints" in targets:
keypoints = targets["keypoints"]
keypoints = keypoints.reshape(keypoints.shape[0], -1).tolist()
num_objs = len(bboxes)
for i in range(num_objs):
ann = {}
ann["image_id"] = image_id
ann["bbox"] = bboxes[i]
ann["category_id"] = labels[i]
categories.add(labels[i])
ann["area"] = areas[i]
ann["iscrowd"] = iscrowd[i]
ann["id"] = ann_id
if "masks" in targets:
ann["segmentation"] = mask_util.encode(masks[i].numpy())
if "keypoints" in targets:
ann["keypoints"] = keypoints[i]
ann["num_keypoints"] = sum(k != 0 for k in keypoints[i][2::3])
dataset["annotations"].append(ann)
ann_id += 1
dataset["categories"] = [{"id": i} for i in sorted(categories)]
coco_ds.dataset = dataset
coco_ds.createIndex()
return coco_ds
def get_coco_api_from_dataset(dataset):
# FIXME: This is... awful?
for _ in range(10):
if isinstance(dataset, torchvision.datasets.CocoDetection):
break
if isinstance(dataset, torch.utils.data.Subset):
dataset = dataset.dataset
if isinstance(dataset, torchvision.datasets.CocoDetection):
return dataset.coco
return convert_to_coco_api(dataset)
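Two details of `convert_to_coco_api` are worth isolating: boxes go back from `xyxy` to COCO `xywh`, and annotation ids start at 1. The sketch below shows both on plain lists. `to_coco_annotations` is a hypothetical helper, and computing `area` as width times height is an assumption, since the source copies `targets["area"]` instead.

```python
def to_coco_annotations(image_id, boxes_xyxy, labels, start_id=1):
    """Build COCO-style annotation dicts from corner-format boxes."""
    annotations = []
    for offset, ((x0, y0, x1, y1), label) in enumerate(zip(boxes_xyxy, labels)):
        w, h = x1 - x0, y1 - y0  # xyxy -> xywh
        annotations.append({
            "id": start_id + offset,  # ids start at 1, not 0
            "image_id": image_id,
            "bbox": [x0, y0, w, h],
            "category_id": label,
            "area": w * h,            # assumption: box area, not a stored value
            "iscrowd": 0,
        })
    return annotations


anns = to_coco_annotations(7, [[10, 20, 40, 60], [0, 0, 5, 5]], [3, 1])
categories = [{"id": i} for i in sorted({a["category_id"] for a in anns})]
```

When several images are converted, the next call would pass `start_id=anns[-1]["id"] + 1` so that ids stay unique across the dataset.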

@@ -0,0 +1,75 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch
import torchvision
import torchvision.transforms.functional as TVF
import os
from PIL import Image
from typing import Optional, Callable
try:
from defusedxml.ElementTree import parse as ET_parse
except ImportError:
from xml.etree.ElementTree import parse as ET_parse
from ._dataset import DetDataset
from .._misc import convert_to_tv_tensor
from ...core import register
@register()
class VOCDetection(torchvision.datasets.VOCDetection, DetDataset):
__inject__ = ['transforms', ]
def __init__(self, root: str, ann_file: str = "trainval.txt", label_file: str = "label_list.txt", transforms: Optional[Callable] = None):
with open(os.path.join(root, ann_file), 'r') as f:
lines = [x.strip() for x in f.readlines()]
lines = [x.split(' ') for x in lines]
self.images = [os.path.join(root, lin[0]) for lin in lines]
self.targets = [os.path.join(root, lin[1]) for lin in lines]
assert len(self.images) == len(self.targets)
with open(os.path.join(root, label_file), 'r') as f:
labels = f.readlines()
labels = [lab.strip() for lab in labels]
self.transforms = transforms
self.labels_map = {lab: i for i, lab in enumerate(labels)}
def __getitem__(self, index: int):
image, target = self.load_item(index)
if self.transforms is not None:
image, target, _ = self.transforms(image, target, self)
# target["orig_size"] = torch.tensor(TVF.get_image_size(image))
return image, target
def load_item(self, index: int):
image = Image.open(self.images[index]).convert("RGB")
target = self.parse_voc_xml(ET_parse(self.annotations[index]).getroot())
output = {}
output["image_id"] = torch.tensor([index])
for k in ['area', 'boxes', 'labels', 'iscrowd']:
output[k] = []
for blob in target['annotation']['object']:
box = [float(v) for v in blob['bndbox'].values()]
output["boxes"].append(box)
output["labels"].append(blob['name'])
output["area"].append((box[2] - box[0]) * (box[3] - box[1]))
output["iscrowd"].append(0)
w, h = image.size
boxes = torch.tensor(output["boxes"]) if len(output["boxes"]) > 0 else torch.zeros(0, 4)
output['boxes'] = convert_to_tv_tensor(boxes, 'boxes', box_format='xyxy', spatial_size=[h, w])
output['labels'] = torch.tensor([self.labels_map[lab] for lab in output["labels"]])
output['area'] = torch.tensor(output['area'])
output["iscrowd"] = torch.tensor(output["iscrowd"])
output["orig_size"] = torch.tensor([w, h])
return image, output
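`load_item` builds each box from `blob['bndbox'].values()`, which relies on the XML listing `xmin`, `ymin`, `xmax`, `ymax` in that order. The sketch below reads the four tags by name instead. It uses the standard-library parser on an inline, made-up annotation, and `read_objects` is a hypothetical helper. For untrusted files, the `defusedxml` import already used in the source is the safer choice.

```python
import xml.etree.ElementTree as ET

# Made-up annotation; the second object lists its bndbox tags out of order.
VOC_XML = """<annotation>
  <size><width>500</width><height>375</height></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><ymin>12</ymin><xmin>8</xmin><ymax>352</ymax><xmax>498</xmax></bndbox>
  </object>
</annotation>"""


def read_objects(xml_text, labels_map):
    """Return xyxy boxes and integer labels for every <object> element."""
    root = ET.fromstring(xml_text)
    boxes, labels = [], []
    for obj in root.iter("object"):
        bnd = obj.find("bndbox")
        # Look up each coordinate by tag name, so tag order does not matter.
        box = [float(bnd.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append(box)
        labels.append(labels_map[obj.findtext("name")])
    return boxes, labels


boxes, labels = read_objects(VOC_XML, {"person": 0, "dog": 1})
```

The second object would produce a wrong box if its values were taken in document order, which is the case this lookup guards against.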

@@ -0,0 +1,10 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch
import torchvision
class VOCEvaluator(object):
def __init__(self) -> None:
pass

@@ -0,0 +1,20 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
from ._transforms import (
EmptyTransform,
RandomPhotometricDistort,
RandomZoomOut,
RandomIoUCrop,
RandomHorizontalFlip,
Resize,
PadToSize,
SanitizeBoundingBoxes,
RandomCrop,
Normalize,
ConvertBoxes,
ConvertPILImage,
)
from .container import Compose
from .mosaic import Mosaic

@@ -0,0 +1,148 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch
import torch.nn as nn
import torchvision
torchvision.disable_beta_transforms_warning()
import torchvision.transforms.v2 as T
import torchvision.transforms.v2.functional as F
import PIL
import PIL.Image
from typing import Any, Dict, List, Optional
from .._misc import convert_to_tv_tensor, _boxes_keys
from .._misc import Image, Video, Mask, BoundingBoxes
from .._misc import SanitizeBoundingBoxes
from ...core import register
RandomPhotometricDistort = register()(T.RandomPhotometricDistort)
RandomZoomOut = register()(T.RandomZoomOut)
RandomHorizontalFlip = register()(T.RandomHorizontalFlip)
Resize = register()(T.Resize)
# ToImageTensor = register()(T.ToImageTensor)
# ConvertDtype = register()(T.ConvertDtype)
# PILToTensor = register()(T.PILToTensor)
SanitizeBoundingBoxes = register(name='SanitizeBoundingBoxes')(SanitizeBoundingBoxes)
RandomCrop = register()(T.RandomCrop)
Normalize = register()(T.Normalize)
@register()
class EmptyTransform(T.Transform):
def __init__(self, ) -> None:
super().__init__()
def forward(self, *inputs):
inputs = inputs if len(inputs) > 1 else inputs[0]
return inputs
@register()
class PadToSize(T.Pad):
_transformed_types = (
PIL.Image.Image,
Image,
Video,
Mask,
BoundingBoxes,
)
def _get_params(self, flat_inputs: List[Any]) -> Dict[str, Any]:
sp = F.get_spatial_size(flat_inputs[0])
h, w = self.size[1] - sp[0], self.size[0] - sp[1]
self.padding = [0, 0, w, h]
return dict(padding=self.padding)
def make_params(self, flat_inputs: List[Any]) -> Dict[str, Any]:
return self._get_params(flat_inputs)
def __init__(self, size, fill=0, padding_mode='constant') -> None:
if isinstance(size, int):
size = (size, size)
self.size = size
super().__init__(0, fill, padding_mode)
def _transform(self, inpt: Any, params: Dict[str, Any]) -> Any:
fill = self._fill[type(inpt)]
padding = params['padding']
return F.pad(inpt, padding=padding, fill=fill, padding_mode=self.padding_mode) # type: ignore[arg-type]
def transform(self, inpt: Any, params: Dict[str, Any]) -> Any:
return self._transform(inpt, params)
def __call__(self, *inputs: Any) -> Any:
outputs = super().forward(*inputs)
if len(outputs) > 1 and isinstance(outputs[1], dict):
outputs[1]['padding'] = torch.tensor(self.padding)
return outputs
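`PadToSize` pads only on the right and bottom, so box coordinates are unchanged. The padding arithmetic from `_get_params` can be sketched as a plain function. `bottom_right_padding` is a hypothetical name, and the explicit check for oversized inputs is an addition that the source does not perform.

```python
def bottom_right_padding(image_hw, target_wh):
    """Return [left, top, right, bottom] padding that grows an image to target_wh."""
    height, width = image_hw           # spatial size is reported as (H, W)
    target_w, target_h = target_wh     # `size` is stored as (W, H)
    pad_w, pad_h = target_w - width, target_h - height
    if pad_w < 0 or pad_h < 0:
        # Added guard: padding cannot shrink an image.
        raise ValueError("image is larger than the requested size")
    return [0, 0, pad_w, pad_h]


padding = bottom_right_padding(image_hw=(480, 600), target_wh=(640, 640))
```

Note the mixed conventions: the image size arrives as `(height, width)` while the target is `(width, height)`, which is why the source indexes `self.size[1] - sp[0]` and `self.size[0] - sp[1]`.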
@register()
class RandomIoUCrop(T.RandomIoUCrop):
def __init__(self, min_scale: float = 0.3, max_scale: float = 1, min_aspect_ratio: float = 0.5, max_aspect_ratio: float = 2, sampler_options: Optional[List[float]] = None, trials: int = 40, p: float = 1.0):
super().__init__(min_scale, max_scale, min_aspect_ratio, max_aspect_ratio, sampler_options, trials)
self.p = p
def __call__(self, *inputs: Any) -> Any:
if torch.rand(1) >= self.p:
return inputs if len(inputs) > 1 else inputs[0]
return super().forward(*inputs)
@register()
class ConvertBoxes(T.Transform):
_transformed_types = (
BoundingBoxes,
)
def __init__(self, fmt='', normalize=False) -> None:
super().__init__()
self.fmt = fmt
self.normalize = normalize
def _transform(self, inpt: Any, params: Dict[str, Any]) -> Any:
spatial_size = getattr(inpt, _boxes_keys[1])
if self.fmt:
in_fmt = inpt.format.value.lower()
inpt = torchvision.ops.box_convert(inpt, in_fmt=in_fmt, out_fmt=self.fmt.lower())
inpt = convert_to_tv_tensor(inpt, key='boxes', box_format=self.fmt.upper(), spatial_size=spatial_size)
if self.normalize:
inpt = inpt / torch.tensor(spatial_size[::-1]).tile(2)[None]
return inpt
def transform(self, inpt: Any, params: Dict[str, Any]) -> Any:
return self._transform(inpt, params)
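`ConvertBoxes` optionally converts the box format and then divides by the canvas size arranged as `[W, H, W, H]`. The sketch below works through the common `xyxy` to normalized `cxcywh` case on one box. `xyxy_to_normalized_cxcywh` is a hypothetical helper and the numbers are illustrative.

```python
def xyxy_to_normalized_cxcywh(box, canvas_hw):
    """Convert one [x0, y0, x1, y1] box to [cx, cy, w, h] scaled to the range 0..1."""
    x0, y0, x1, y1 = box
    height, width = canvas_hw
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    bw, bh = x1 - x0, y1 - y0
    # Horizontal values are divided by the width, vertical values by the height.
    return [cx / width, cy / height, bw / width, bh / height]


normalized = xyxy_to_normalized_cxcywh([100, 50, 300, 250], canvas_hw=(500, 400))
```

Reversing the `(H, W)` canvas size before tiling it, as the source does with `spatial_size[::-1]`, is what pairs x-coordinates with the width and y-coordinates with the height.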
@register()
class ConvertPILImage(T.Transform):
_transformed_types = (
PIL.Image.Image,
)
def __init__(self, dtype='float32', scale=True) -> None:
super().__init__()
self.dtype = dtype
self.scale = scale
def _transform(self, inpt: Any, params: Dict[str, Any]) -> Any:
inpt = F.pil_to_tensor(inpt)
if self.dtype == 'float32':
inpt = inpt.float()
if self.scale:
inpt = inpt / 255.
inpt = Image(inpt)
return inpt
def transform(self, inpt: Any, params: Dict[str, Any]) -> Any:
return self._transform(inpt, params)

@@ -0,0 +1,95 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch
import torch.nn as nn
import torchvision
torchvision.disable_beta_transforms_warning()
import torchvision.transforms.v2 as T
from typing import Any, Dict, List, Optional
from ._transforms import EmptyTransform
from ...core import register, GLOBAL_CONFIG
@register()
class Compose(T.Compose):
def __init__(self, ops, policy=None) -> None:
transforms = []
if ops is not None:
for op in ops:
if isinstance(op, dict):
name = op.pop('type')
transform = getattr(GLOBAL_CONFIG[name]['_pymodule'], GLOBAL_CONFIG[name]['_name'])(**op)
transforms.append(transform)
op['type'] = name
elif isinstance(op, nn.Module):
transforms.append(op)
else:
raise ValueError('each op must be a dict with a `type` key or an nn.Module')
else:
transforms =[EmptyTransform(), ]
super().__init__(transforms=transforms)
if policy is None:
policy = {'name': 'default'}
self.policy = policy
self.global_samples = 0
def forward(self, *inputs: Any) -> Any:
return self.get_forward(self.policy['name'])(*inputs)
def get_forward(self, name):
forwards = {
'default': self.default_forward,
'stop_epoch': self.stop_epoch_forward,
'stop_sample': self.stop_sample_forward,
}
return forwards[name]
def default_forward(self, *inputs: Any) -> Any:
sample = inputs if len(inputs) > 1 else inputs[0]
for transform in self.transforms:
sample = transform(sample)
return sample
def stop_epoch_forward(self, *inputs: Any):
sample = inputs if len(inputs) > 1 else inputs[0]
dataset = sample[-1]
cur_epoch = dataset.epoch
policy_ops = self.policy['ops']
policy_epoch = self.policy['epoch']
for transform in self.transforms:
if type(transform).__name__ in policy_ops and cur_epoch >= policy_epoch:
pass
else:
sample = transform(sample)
return sample
def stop_sample_forward(self, *inputs: Any):
sample = inputs if len(inputs) > 1 else inputs[0]
dataset = sample[-1]
cur_epoch = dataset.epoch
policy_ops = self.policy['ops']
policy_sample = self.policy['sample']
for transform in self.transforms:
if type(transform).__name__ in policy_ops and self.global_samples >= policy_sample:
pass
else:
sample = transform(sample)
self.global_samples += 1
return sample
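
As a rough illustration of the `stop_epoch` policy in `Compose`, the standalone sketch below mirrors the skip rule without torchvision. The names `run_pipeline`, `FakeFlip`, and `FakeResize` are hypothetical stand-ins and are not part of the repository.

```python
# Hypothetical stand-ins for transforms; only their class names matter to the policy.
class FakeFlip:
    def __call__(self, sample):
        return sample + ["flip"]


class FakeResize:
    def __call__(self, sample):
        return sample + ["resize"]


def run_pipeline(transforms, policy, cur_epoch):
    """Apply each transform unless the policy has switched it off.

    Mirrors the rule in stop_epoch_forward: a transform is skipped when its
    class name is listed in policy['ops'] and cur_epoch >= policy['epoch'].
    """
    sample = []
    for transform in transforms:
        stopped = (type(transform).__name__ in policy["ops"]
                   and cur_epoch >= policy["epoch"])
        if not stopped:
            sample = transform(sample)
    return sample


policy = {"name": "stop_epoch", "ops": ["FakeFlip"], "epoch": 3}
pipeline = [FakeFlip(), FakeResize()]
early = run_pipeline(pipeline, policy, cur_epoch=0)  # before the cut-off epoch
late = run_pipeline(pipeline, policy, cur_epoch=3)   # at the cut-off epoch
```

Under these assumptions, `early` records both transforms while `late` records only the resize, because the flip is listed in `policy['ops']` and the cut-off epoch has been reached.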

@@ -0,0 +1,169 @@
import torch
import torchvision.transforms.functional as F
from packaging import version
from typing import Optional, List
from torch import Tensor
# needed due to empty tensor bug in pytorch and torchvision 0.5
import torchvision
if version.parse(torchvision.__version__) < version.parse('0.7'):
from torchvision.ops import _new_empty_tensor
from torchvision.ops.misc import _output_size
def interpolate(input, size=None, scale_factor=None, mode="nearest", align_corners=None):
# type: (Tensor, Optional[List[int]], Optional[float], str, Optional[bool]) -> Tensor
"""
Equivalent to nn.functional.interpolate, but with support for empty batch sizes.
This will eventually be supported natively by PyTorch, and this
function can go away.
"""
if version.parse(torchvision.__version__) < version.parse('0.7'):
if input.numel() > 0:
return torch.nn.functional.interpolate(
input, size, scale_factor, mode, align_corners
)
output_shape = _output_size(2, input, size, scale_factor)
output_shape = list(input.shape[:-2]) + list(output_shape)
return _new_empty_tensor(input, output_shape)
else:
return torchvision.ops.misc.interpolate(input, size, scale_factor, mode, align_corners)
def crop(image, target, region):
cropped_image = F.crop(image, *region)
target = target.copy()
i, j, h, w = region
# should we do something wrt the original size?
target["size"] = torch.tensor([h, w])
fields = ["labels", "area", "iscrowd"]
if "boxes" in target:
boxes = target["boxes"]
max_size = torch.as_tensor([w, h], dtype=torch.float32)
cropped_boxes = boxes - torch.as_tensor([j, i, j, i])
cropped_boxes = torch.min(cropped_boxes.reshape(-1, 2, 2), max_size)
cropped_boxes = cropped_boxes.clamp(min=0)
area = (cropped_boxes[:, 1, :] - cropped_boxes[:, 0, :]).prod(dim=1)
target["boxes"] = cropped_boxes.reshape(-1, 4)
target["area"] = area
fields.append("boxes")
if "masks" in target:
# FIXME should we update the area here if there are no boxes?
target['masks'] = target['masks'][:, i:i + h, j:j + w]
fields.append("masks")
# remove elements whose boxes or masks have zero area
if "boxes" in target or "masks" in target:
# favor boxes selection when defining which elements to keep
# this is compatible with previous implementation
if "boxes" in target:
cropped_boxes = target['boxes'].reshape(-1, 2, 2)
keep = torch.all(cropped_boxes[:, 1, :] > cropped_boxes[:, 0, :], dim=1)
else:
keep = target['masks'].flatten(1).any(1)
for field in fields:
target[field] = target[field][keep]
return cropped_image, target
def hflip(image, target):
flipped_image = F.hflip(image)
w, h = image.size
target = target.copy()
if "boxes" in target:
boxes = target["boxes"]
boxes = boxes[:, [2, 1, 0, 3]] * torch.as_tensor([-1, 1, -1, 1]) + torch.as_tensor([w, 0, w, 0])
target["boxes"] = boxes
if "masks" in target:
target['masks'] = target['masks'].flip(-1)
return flipped_image, target
def resize(image, target, size, max_size=None):
# size can be min_size (scalar) or (w, h) tuple
def get_size_with_aspect_ratio(image_size, size, max_size=None):
w, h = image_size
if max_size is not None:
min_original_size = float(min((w, h)))
max_original_size = float(max((w, h)))
if max_original_size / min_original_size * size > max_size:
size = int(round(max_size * min_original_size / max_original_size))
if (w <= h and w == size) or (h <= w and h == size):
return (h, w)
if w < h:
ow = size
oh = int(size * h / w)
else:
oh = size
ow = int(size * w / h)
# r = min(size / min(h, w), max_size / max(h, w))
# ow = int(w * r)
# oh = int(h * r)
return (oh, ow)
def get_size(image_size, size, max_size=None):
if isinstance(size, (list, tuple)):
return size[::-1]
else:
return get_size_with_aspect_ratio(image_size, size, max_size)
size = get_size(image.size, size, max_size)
rescaled_image = F.resize(image, size)
if target is None:
return rescaled_image, None
ratios = tuple(float(s) / float(s_orig) for s, s_orig in zip(rescaled_image.size, image.size))
ratio_width, ratio_height = ratios
target = target.copy()
if "boxes" in target:
boxes = target["boxes"]
scaled_boxes = boxes * torch.as_tensor([ratio_width, ratio_height, ratio_width, ratio_height])
target["boxes"] = scaled_boxes
if "area" in target:
area = target["area"]
scaled_area = area * (ratio_width * ratio_height)
target["area"] = scaled_area
h, w = size
target["size"] = torch.tensor([h, w])
if "masks" in target:
target['masks'] = interpolate(
target['masks'][:, None].float(), size, mode="nearest")[:, 0] > 0.5
return rescaled_image, target
def pad(image, target, padding):
# assumes that we only pad on the bottom right corners
padded_image = F.pad(image, (0, 0, padding[0], padding[1]))
if target is None:
return padded_image, None
target = target.copy()
# should we do something wrt the original size?
target["size"] = torch.tensor(padded_image.size[::-1])
if "masks" in target:
target['masks'] = torch.nn.functional.pad(target['masks'], (0, padding[0], 0, padding[1]))
return padded_image, target
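
The aspect-ratio rule used by `resize` can be checked with plain integers. The sketch below restates `get_size_with_aspect_ratio` as a standalone function; the name `shorter_side_resize` is hypothetical and the function is only an illustration, not a replacement for the code above.

```python
def shorter_side_resize(w, h, size, max_size=None):
    """Return (out_h, out_w) so that the shorter side equals `size`.

    If scaling the shorter side to `size` would push the longer side past
    `max_size`, `size` is first reduced so the longer side fits.
    """
    if max_size is not None:
        short_side = float(min(w, h))
        long_side = float(max(w, h))
        if long_side / short_side * size > max_size:
            size = int(round(max_size * short_side / long_side))
    # Nothing to do when the shorter side already has the requested length.
    if (w <= h and w == size) or (h <= w and h == size):
        return (h, w)
    if w < h:
        return (int(size * h / w), size)
    return (size, int(size * w / h))


unclamped = shorter_side_resize(640, 480, 600, max_size=1000)
clamped = shorter_side_resize(640, 480, 600, max_size=700)
```

For a 640x480 image and `size=600`, the longer side would become 800. With `max_size=1000` that is allowed, so the result is `(600, 800)`; with `max_size=700` the target is reduced to 525, giving `(525, 700)`.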

@@ -0,0 +1,72 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch
import torchvision
torchvision.disable_beta_transforms_warning()
import torchvision.transforms.v2 as T
import torchvision.transforms.v2.functional as F
import random
from PIL import Image
from .._misc import convert_to_tv_tensor
from ...core import register
@register()
class Mosaic(T.Transform):
def __init__(self, size, max_size=None, ) -> None:
super().__init__()
self.resize = T.Resize(size=size, max_size=max_size)
self.crop = T.RandomCrop(size=max_size if max_size else size)
# TODO add arg `output_size` for affine
# self.random_perspective = T.RandomPerspective(distortion_scale=0.5, p=1., )
self.random_affine = T.RandomAffine(degrees=0, translate=(0.1, 0.1), scale=(0.5, 1.5), fill=114)
def forward(self, *inputs):
inputs = inputs if len(inputs) > 1 else inputs[0]
image, target, dataset = inputs
images = []
targets = []
indices = random.choices(range(len(dataset)), k=3)
for i in indices:
image, target = dataset.load_item(i)
image, target = self.resize(image, target)
images.append(image)
targets.append(target)
h, w = F.get_spatial_size(images[0])
offset = [[0, 0], [w, 0], [0, h], [w, h]]
image = Image.new(mode=images[0].mode, size=(w * 2, h * 2), color=0)
for i, im in enumerate(images):
image.paste(im, offset[i])
offset = torch.tensor([[0, 0], [w, 0], [0, h], [w, h]]).repeat(1, 2)
target = {}
for k in targets[0]:
if k == 'boxes':
v = [t[k] + offset[i] for i, t in enumerate(targets)]
else:
v = [t[k] for t in targets]
if isinstance(v[0], torch.Tensor):
v = torch.cat(v, dim=0)
target[k] = v
if 'boxes' in target:
# target['boxes'] = target['boxes'].clamp(0, 640 * 2 - 1)
w, h = image.size
target['boxes'] = convert_to_tv_tensor(target['boxes'], 'boxes', box_format='xyxy', spatial_size=[h, w])
if 'masks' in target:
target['masks'] = convert_to_tv_tensor(target['masks'], 'masks')
image, target = self.random_affine(image, target)
# image, target = self.resize(image, target)
image, target = self.crop(image, target)
return image, target, dataset
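
The box bookkeeping in `Mosaic.forward` amounts to shifting each image's `xyxy` boxes by the top-left corner of the quadrant the image is pasted into. A minimal sketch of that step, using plain tuples instead of tensors (the helper name `shift_boxes_to_quadrants` is hypothetical):

```python
def shift_boxes_to_quadrants(per_image_boxes, w, h):
    """Shift xyxy boxes of up to four images into a 2x2 mosaic canvas.

    Image i is pasted at offsets[i], so its boxes move by the same (dx, dy).
    """
    offsets = [(0, 0), (w, 0), (0, h), (w, h)]
    merged = []
    for (dx, dy), boxes in zip(offsets, per_image_boxes):
        for x1, y1, x2, y2 in boxes:
            merged.append((x1 + dx, y1 + dy, x2 + dx, y2 + dy))
    return merged


boxes_per_image = [
    [(10, 20, 30, 40)],  # top-left tile, unchanged
    [(0, 0, 5, 5)],      # top-right tile, shifted by w in x
    [(1, 2, 3, 4)],      # bottom-left tile, shifted by h in y
]
mosaic_boxes = shift_boxes_to_quadrants(boxes_per_image, w=100, h=80)
```

This corresponds to `t[k] + offset[i]` in the code above, where `offset` repeats each `(x, y)` pair so that both box corners move together.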

@@ -0,0 +1,2 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""

@@ -0,0 +1,7 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
from .logger import *
from .visualizer import *
from .dist_utils import setup_seed, setup_print
from .profiler_utils import stats

@@ -0,0 +1,103 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch
import torchvision
from torch import Tensor
from typing import List, Tuple
def generalized_box_iou(boxes1: Tensor, boxes2: Tensor) -> Tensor:
assert (boxes1[:, 2:] >= boxes1[:, :2]).all()
assert (boxes2[:, 2:] >= boxes2[:, :2]).all()
return torchvision.ops.generalized_box_iou(boxes1, boxes2)
# elementwise
def elementwise_box_iou(boxes1: Tensor, boxes2: Tensor) -> Tuple[Tensor, Tensor]:
"""
Args:
boxes1, [N, 4]
boxes2, [N, 4]
Returns:
iou, [N, ]
union, [N, ]
"""
area1 = torchvision.ops.box_area(boxes1) # [N, ]
area2 = torchvision.ops.box_area(boxes2) # [N, ]
lt = torch.max(boxes1[:, :2], boxes2[:, :2]) # [N, 2]
rb = torch.min(boxes1[:, 2:], boxes2[:, 2:]) # [N, 2]
wh = (rb - lt).clamp(min=0) # [N, 2]
inter = wh[:, 0] * wh[:, 1] # [N, ]
union = area1 + area2 - inter
iou = inter / union
return iou, union
def elementwise_generalized_box_iou(boxes1: Tensor, boxes2: Tensor) -> Tensor:
"""
Args:
boxes1, [N, 4] with [x1, y1, x2, y2]
boxes2, [N, 4] with [x1, y1, x2, y2]
Returns:
giou, [N, ]
"""
assert (boxes1[:, 2:] >= boxes1[:, :2]).all()
assert (boxes2[:, 2:] >= boxes2[:, :2]).all()
iou, union = elementwise_box_iou(boxes1, boxes2)
lt = torch.min(boxes1[:, :2], boxes2[:, :2]) # [N, 2]
rb = torch.max(boxes1[:, 2:], boxes2[:, 2:]) # [N, 2]
wh = (rb - lt).clamp(min=0) # [N, 2]
area = wh[:, 0] * wh[:, 1]
return iou - (area - union) / area
def check_point_inside_box(points: Tensor, boxes: Tensor, eps=1e-9) -> Tensor:
"""
Args:
points, [K, 2], (x, y)
boxes, [N, 4], (x1, y1, x2, y2)
Returns:
Tensor (bool), [K, N]
"""
x, y = [p.unsqueeze(-1) for p in points.unbind(-1)]
x1, y1, x2, y2 = [x.unsqueeze(0) for x in boxes.unbind(-1)]
l = x - x1
t = y - y1
r = x2 - x
b = y2 - y
ltrb = torch.stack([l, t, r, b], dim=-1)
mask = ltrb.min(dim=-1).values > eps
return mask
def point_box_distance(points: Tensor, boxes: Tensor) -> Tensor:
"""
Args:
boxes, [N, 4], (x1, y1, x2, y2)
points, [N, 2], (x, y)
Returns:
Tensor (N, 4), (l, t, r, b)
"""
x1y1, x2y2 = torch.split(boxes, 2, dim=-1)
lt = points - x1y1
rb = x2y2 - points
return torch.concat([lt, rb], dim=-1)
def point_distance_box(points: Tensor, distances: Tensor) -> Tensor:
"""
Args:
points (Tensor), [N, 2], (x, y)
distances (Tensor), [N, 4], (l, t, r, b)
Returns:
boxes (Tensor), (N, 4), (x1, y1, x2, y2)
"""
lt, rb = torch.split(distances, 2, dim=-1)
x1y1 = -lt + points
x2y2 = rb + points
boxes = torch.concat([x1y1, x2y2], dim=-1)
return boxes
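
To make the formula in `elementwise_generalized_box_iou` concrete, here is a scalar sketch for a single pair of `xyxy` boxes in plain Python. The function name `giou_pair` is hypothetical; the arithmetic follows the tensor code above (IoU minus the fraction of the enclosing box not covered by the union).

```python
def giou_pair(box_a, box_b):
    """Generalized IoU for two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection: clamp width and height at zero for disjoint boxes.
    inter_w = max(min(ax2, bx2) - max(ax1, bx1), 0)
    inter_h = max(min(ay2, by2) - max(ay1, by1), 0)
    inter = inter_w * inter_h
    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box that encloses both inputs.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return iou - (enclose - union) / enclose


overlapping = giou_pair((0, 0, 2, 2), (1, 1, 3, 3))  # 1/7 - 2/9
disjoint = giou_pair((0, 0, 1, 1), (2, 0, 3, 1))     # 0 - 1/3
```

For the overlapping pair the intersection is 1, the union is 7, and the enclosing box has area 9, so the result is 1/7 - 2/9 = -5/63. Identical boxes give exactly 1.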

@@ -0,0 +1,267 @@
"""
reference
- https://github.com/pytorch/vision/blob/main/references/detection/utils.py
- https://github.com/facebookresearch/detr/blob/master/util/misc.py#L406
Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import os
import random
import numpy as np
import atexit
import torch
import torch.nn as nn
import torch.distributed
import torch.backends.cudnn
from torch.nn.parallel import DataParallel as DP
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.utils.data import DistributedSampler
# from torch.utils.data.dataloader import DataLoader
from ..data import DataLoader
def setup_distributed(print_rank: int=0, print_method: str='builtin', seed: int=None, ):
"""
env setup
args:
print_rank,
print_method, (builtin, rich)
seed,
"""
try:
# https://pytorch.org/docs/stable/elastic/run.html
RANK = int(os.getenv('RANK', -1))
LOCAL_RANK = int(os.getenv('LOCAL_RANK', -1))
WORLD_SIZE = int(os.getenv('WORLD_SIZE', 1))
# torch.distributed.init_process_group(backend=backend, init_method='env://')
torch.distributed.init_process_group(init_method='env://')
torch.distributed.barrier()
rank = torch.distributed.get_rank()
torch.cuda.set_device(rank)
torch.cuda.empty_cache()
enabled_dist = True
print('Initialized distributed mode...')
except Exception:
enabled_dist = False
print('Not init distributed mode.')
setup_print(get_rank() == print_rank, method=print_method)
if seed is not None:
setup_seed(seed)
return enabled_dist
def setup_print(is_main, method='builtin'):
"""This function disables printing when not in master process
"""
import builtins as __builtin__
if method == 'builtin':
builtin_print = __builtin__.print
elif method == 'rich':
import rich
builtin_print = rich.print
else:
raise AttributeError(f"Unsupported print method: {method!r}")
def print(*args, **kwargs):
force = kwargs.pop('force', False)
if is_main or force:
builtin_print(*args, **kwargs)
__builtin__.print = print
def is_dist_available_and_initialized():
if not torch.distributed.is_available():
return False
if not torch.distributed.is_initialized():
return False
return True
@atexit.register
def cleanup():
"""cleanup distributed environment
"""
if is_dist_available_and_initialized():
torch.distributed.barrier()
torch.distributed.destroy_process_group()
def get_rank():
if not is_dist_available_and_initialized():
return 0
return torch.distributed.get_rank()
def get_world_size():
if not is_dist_available_and_initialized():
return 1
return torch.distributed.get_world_size()
def is_main_process():
return get_rank() == 0
def save_on_master(*args, **kwargs):
if is_main_process():
torch.save(*args, **kwargs)
def warp_model(
model: torch.nn.Module,
sync_bn: bool=False,
dist_mode: str='ddp',
find_unused_parameters: bool=False,
compile: bool=False,
compile_mode: str='reduce-overhead',
**kwargs
):
if is_dist_available_and_initialized():
rank = get_rank()
model = nn.SyncBatchNorm.convert_sync_batchnorm(model) if sync_bn else model
if dist_mode == 'dp':
model = DP(model, device_ids=[rank], output_device=rank)
elif dist_mode == 'ddp':
model = DDP(model, device_ids=[rank], output_device=rank, find_unused_parameters=find_unused_parameters)
else:
raise AttributeError(f"Unsupported dist_mode: {dist_mode!r}")
if compile:
model = torch.compile(model, mode=compile_mode)
return model
def de_model(model):
return de_parallel(de_complie(model))
def warp_loader(loader, shuffle=False):
if is_dist_available_and_initialized():
sampler = DistributedSampler(loader.dataset, shuffle=shuffle)
loader = DataLoader(loader.dataset,
loader.batch_size,
sampler=sampler,
drop_last=loader.drop_last,
collate_fn=loader.collate_fn,
pin_memory=loader.pin_memory,
num_workers=loader.num_workers, )
return loader
def is_parallel(model) -> bool:
# Returns True if model is of type DP or DDP
return type(model) in (torch.nn.parallel.DataParallel, torch.nn.parallel.DistributedDataParallel)
def de_parallel(model) -> nn.Module:
# De-parallelize a model: returns single-GPU model if model is of type DP or DDP
return model.module if is_parallel(model) else model
def reduce_dict(data, avg=True):
"""
Args:
data (dict): input values to reduce, {k: v, ...}
avg (bool): if True, average over the world size; otherwise sum
"""
world_size = get_world_size()
if world_size < 2:
return data
with torch.no_grad():
keys, values = [], []
for k in sorted(data.keys()):
keys.append(k)
values.append(data[k])
values = torch.stack(values, dim=0)
torch.distributed.all_reduce(values)
if avg is True:
values /= world_size
return {k: v for k, v in zip(keys, values)}
def all_gather(data):
"""
Run all_gather on arbitrary picklable data (not necessarily tensors)
Args:
data: any picklable object
Returns:
list[data]: list of data gathered from each rank
"""
world_size = get_world_size()
if world_size == 1:
return [data]
data_list = [None] * world_size
torch.distributed.all_gather_object(data_list, data)
return data_list
import time
def sync_time():
"""sync_time
"""
if torch.cuda.is_available():
torch.cuda.synchronize()
return time.time()
def setup_seed(seed: int, deterministic=False):
"""setup_seed for reproducibility
torch.manual_seed(3407) is all you need. https://arxiv.org/abs/2109.08203
"""
seed = seed + get_rank()
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if torch.cuda.is_available():
torch.cuda.manual_seed_all(seed)
# memory will be large when setting deterministic to True
if torch.backends.cudnn.is_available() and deterministic:
torch.backends.cudnn.deterministic = True
# for torch.compile
def check_compile():
import torch
import warnings
gpu_ok = False
if torch.cuda.is_available():
device_cap = torch.cuda.get_device_capability()
if device_cap in ((7, 0), (8, 0), (9, 0)):
gpu_ok = True
if not gpu_ok:
warnings.warn(
"GPU is not NVIDIA V100, A100, or H100. Speedup numbers may be lower "
"than expected."
)
return gpu_ok
def is_compile(model):
import torch._dynamo
return type(model) in (torch._dynamo.OptimizedModule, )
def de_complie(model):
return model._orig_mod if is_compile(model) else model
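
The rank-gated printing done by `setup_print` can be sketched without touching `builtins`. The version below is an assumption-laden illustration: `make_rank_print` and the `sink` argument are hypothetical names, and the function returns a wrapper instead of replacing the global `print`.

```python
import io


def make_rank_print(is_main, sink):
    """Return a print-like function that is silent on non-main ranks.

    Passing force=True overrides the gate, as in setup_print above.
    """
    def rank_print(*args, force=False, **kwargs):
        if is_main or force:
            print(*args, file=sink, **kwargs)
    return rank_print


buf = io.StringIO()
worker_print = make_rank_print(is_main=False, sink=buf)
worker_print("hidden on this rank")   # suppressed: not the main process
worker_print("shown", force=True)     # forced through the gate
output = buf.getvalue()
```

Returning a wrapper keeps the sketch side-effect free; the repository version patches `builtins.print` so that existing `print` calls are gated without modification.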

@@ -0,0 +1,70 @@
"""
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/util/lazy_loader.py
"""
import types
import importlib
class LazyLoader(types.ModuleType):
"""Lazily import a module, mainly to avoid pulling in large dependencies.
`paddle`, and `ffmpeg` are examples of modules that are large and not always
needed, and this allows them to only be loaded when they are used.
"""
# The lint error here is incorrect.
def __init__(self, local_name, parent_module_globals, name, warning=None):
self._local_name = local_name
self._parent_module_globals = parent_module_globals
self._warning = warning
# These members allow doctest to process this module member correctly without
# triggering self._load(). self._load() mutates parent_module_globals and
# triggers a "dict mutated during iteration" error from doctest.py.
# - for from_module()
self.__module__ = name.rsplit(".", 1)[0]
# - for is_routine()
self.__wrapped__ = None
super(LazyLoader, self).__init__(name)
def _load(self):
"""Load the module and insert it into the parent's globals."""
# Import the target module and insert it into the parent's namespace
module = importlib.import_module(self.__name__)
self._parent_module_globals[self._local_name] = module
# Emit a warning if one was specified
if self._warning:
# logging.warning(self._warning)
# Make sure to only warn once.
self._warning = None
# Update this object's dict so that if someone keeps a reference to the
# LazyLoader, lookups are efficient (__getattr__ is only called on lookups
# that fail).
self.__dict__.update(module.__dict__)
return module
def __getattr__(self, item):
module = self._load()
return getattr(module, item)
def __repr__(self):
# Be careful not to trigger _load here, since repr may be called in very
# sensitive places.
return f"<LazyLoader {self.__name__} as {self._local_name}>"
def __dir__(self):
module = self._load()
return dir(module)
# import paddle.nn as nn
# nn = LazyLoader("nn", globals(), "paddle.nn")
# class M(nn.Layer):
# def __init__(self) -> None:
# super().__init__()
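
The commented example above targets `paddle`. A runnable variant with a standard-library module may be easier to try out. The sketch below uses a trimmed-down, hypothetical `MiniLazyLoader` (no warning handling, no doctest attributes) and defers the import of `json` until the first attribute access.

```python
import importlib
import types


class MiniLazyLoader(types.ModuleType):
    """Trimmed-down lazy loader: import the real module on first attribute use."""

    def __init__(self, local_name, parent_globals, name):
        self._local_name = local_name
        self._parent_globals = parent_globals
        super().__init__(name)

    def _load(self):
        module = importlib.import_module(self.__name__)
        # Swap the placeholder for the real module in the caller's namespace.
        self._parent_globals[self._local_name] = module
        # Make later lookups on a kept reference cheap.
        self.__dict__.update(module.__dict__)
        return module

    def __getattr__(self, item):
        # Only called when normal lookup fails, i.e. before the module is loaded.
        return getattr(self._load(), item)


namespace = {}
namespace["js"] = MiniLazyLoader("js", namespace, "json")
placeholder = namespace["js"]
was_placeholder = isinstance(placeholder, MiniLazyLoader)
encoded = placeholder.dumps({"a": 1})  # first attribute access triggers the import
```

After the call to `dumps`, `namespace["js"]` refers to the real `json` module, while the kept `placeholder` reference continues to work through its updated `__dict__`.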

@@ -0,0 +1,239 @@
"""
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
https://github.com/facebookresearch/detr/blob/main/util/misc.py
Mostly copy-paste from torchvision references.
"""
import time
import pickle
import datetime
from collections import defaultdict, deque
from typing import Dict
import torch
import torch.distributed as tdist
from .dist_utils import is_dist_available_and_initialized, get_world_size
class SmoothedValue(object):
"""Track a series of values and provide access to smoothed values over a
window or the global series average.
"""
def __init__(self, window_size=20, fmt=None):
if fmt is None:
fmt = "{median:.4f} ({global_avg:.4f})"
self.deque = deque(maxlen=window_size)
self.total = 0.0
self.count = 0
self.fmt = fmt
def update(self, value, n=1):
self.deque.append(value)
self.count += n
self.total += value * n
def synchronize_between_processes(self):
"""
Warning: does not synchronize the deque!
"""
if not is_dist_available_and_initialized():
return
t = torch.tensor([self.count, self.total], dtype=torch.float64, device='cuda')
tdist.barrier()
tdist.all_reduce(t)
t = t.tolist()
self.count = int(t[0])
self.total = t[1]
@property
def median(self):
d = torch.tensor(list(self.deque))
return d.median().item()
@property
def avg(self):
d = torch.tensor(list(self.deque), dtype=torch.float32)
return d.mean().item()
@property
def global_avg(self):
return self.total / self.count
@property
def max(self):
return max(self.deque)
@property
def value(self):
return self.deque[-1]
def __str__(self):
return self.fmt.format(
median=self.median,
avg=self.avg,
global_avg=self.global_avg,
max=self.max,
value=self.value)
def all_gather(data):
"""
Run all_gather on arbitrary picklable data (not necessarily tensors)
Args:
data: any picklable object
Returns:
list[data]: list of data gathered from each rank
"""
world_size = get_world_size()
if world_size == 1:
return [data]
# serialized to a Tensor
buffer = pickle.dumps(data)
storage = torch.ByteStorage.from_buffer(buffer)
tensor = torch.ByteTensor(storage).to("cuda")
# obtain Tensor size of each rank
local_size = torch.tensor([tensor.numel()], device="cuda")
size_list = [torch.tensor([0], device="cuda") for _ in range(world_size)]
tdist.all_gather(size_list, local_size)
size_list = [int(size.item()) for size in size_list]
max_size = max(size_list)
# receiving Tensor from all ranks
# we pad the tensor because torch all_gather does not support
# gathering tensors of different shapes
tensor_list = []
for _ in size_list:
tensor_list.append(torch.empty((max_size,), dtype=torch.uint8, device="cuda"))
if local_size != max_size:
padding = torch.empty(size=(max_size - local_size,), dtype=torch.uint8, device="cuda")
tensor = torch.cat((tensor, padding), dim=0)
tdist.all_gather(tensor_list, tensor)
data_list = []
for size, tensor in zip(size_list, tensor_list):
buffer = tensor.cpu().numpy().tobytes()[:size]
data_list.append(pickle.loads(buffer))
return data_list
def reduce_dict(input_dict, average=True) -> Dict[str, torch.Tensor]:
"""
Args:
input_dict (dict): all the values will be reduced
average (bool): whether to do average or sum
Reduce the values in the dictionary from all processes so that all processes
have the averaged results. Returns a dict with the same fields as
input_dict, after reduction.
"""
world_size = get_world_size()
if world_size < 2:
return input_dict
with torch.no_grad():
names = []
values = []
# sort the keys so that they are consistent across processes
for k in sorted(input_dict.keys()):
names.append(k)
values.append(input_dict[k])
values = torch.stack(values, dim=0)
tdist.all_reduce(values)
if average:
values /= world_size
reduced_dict = {k: v for k, v in zip(names, values)}
return reduced_dict
class MetricLogger(object):
def __init__(self, delimiter="\t"):
self.meters = defaultdict(SmoothedValue)
self.delimiter = delimiter
def update(self, **kwargs):
for k, v in kwargs.items():
if isinstance(v, torch.Tensor):
v = v.item()
assert isinstance(v, (float, int))
self.meters[k].update(v)
def __getattr__(self, attr):
if attr in self.meters:
return self.meters[attr]
if attr in self.__dict__:
return self.__dict__[attr]
raise AttributeError("'{}' object has no attribute '{}'".format(
type(self).__name__, attr))
def __str__(self):
loss_str = []
for name, meter in self.meters.items():
loss_str.append(
"{}: {}".format(name, str(meter))
)
return self.delimiter.join(loss_str)
def synchronize_between_processes(self):
for meter in self.meters.values():
meter.synchronize_between_processes()
def add_meter(self, name, meter):
self.meters[name] = meter
def log_every(self, iterable, print_freq, header=None):
i = 0
if not header:
header = ''
start_time = time.time()
end = time.time()
iter_time = SmoothedValue(fmt='{avg:.4f}')
data_time = SmoothedValue(fmt='{avg:.4f}')
space_fmt = ':' + str(len(str(len(iterable)))) + 'd'
if torch.cuda.is_available():
log_msg = self.delimiter.join([
header,
'[{0' + space_fmt + '}/{1}]',
'eta: {eta}',
'{meters}',
'time: {time}',
'data: {data}',
'max mem: {memory:.0f}'
])
else:
log_msg = self.delimiter.join([
header,
'[{0' + space_fmt + '}/{1}]',
'eta: {eta}',
'{meters}',
'time: {time}',
'data: {data}'
])
MB = 1024.0 * 1024.0
for obj in iterable:
data_time.update(time.time() - end)
yield obj
iter_time.update(time.time() - end)
if i % print_freq == 0 or i == len(iterable) - 1:
eta_seconds = iter_time.global_avg * (len(iterable) - i)
eta_string = str(datetime.timedelta(seconds=int(eta_seconds)))
if torch.cuda.is_available():
print(log_msg.format(
i, len(iterable), eta=eta_string,
meters=str(self),
time=str(iter_time), data=str(data_time),
memory=torch.cuda.max_memory_allocated() / MB))
else:
print(log_msg.format(
i, len(iterable), eta=eta_string,
meters=str(self),
time=str(iter_time), data=str(data_time)))
i += 1
end = time.time()
total_time = time.time() - start_time
total_time_str = str(datetime.timedelta(seconds=int(total_time)))
print('{} Total time: {} ({:.4f} s / it)'.format(
header, total_time_str, total_time / len(iterable)))
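
The difference between the windowed statistics and the global average in `SmoothedValue` is easiest to see with a small window. The sketch below is a torch-free approximation; `WindowedMeter` is a hypothetical name, and it uses `statistics.median`, which agrees with `torch.median` for odd window lengths (for even lengths `torch.median` returns the lower of the two middle values).

```python
from collections import deque
from statistics import median


class WindowedMeter:
    """Track a windowed median alongside the average over all updates."""

    def __init__(self, window_size=3):
        self.window = deque(maxlen=window_size)  # old values fall out automatically
        self.total = 0.0
        self.count = 0

    def update(self, value, n=1):
        self.window.append(value)
        self.count += n
        self.total += value * n

    @property
    def median(self):
        return median(self.window)

    @property
    def global_avg(self):
        return self.total / self.count


meter = WindowedMeter(window_size=3)
for loss in [10.0, 4.0, 2.0, 1.0]:
    meter.update(loss)
```

After four updates the window holds only `[4.0, 2.0, 1.0]`, so the median is 2.0, while the global average still includes the first value and is 17 / 4 = 4.25. This is why the default format string reports both numbers.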

@@ -0,0 +1,65 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import re
import torch
import torch.nn as nn
from torch import Tensor
from typing import List
def stats(
model: nn.Module,
data: Tensor=None,
input_shape: List=[1, 3, 640, 640],
device: str='cpu',
verbose=False) -> dict:
is_training = model.training
model.train()
num_params = sum([p.numel() for p in model.parameters() if p.requires_grad])
model.eval()
model = model.to(device)
if data is None:
data = torch.rand(*input_shape, device=device)
def trace_handler(prof):
print(prof.key_averages().table(
sort_by="self_cuda_time_total", row_limit=-1))
num_active = 2
with torch.profiler.profile(
activities=[
torch.profiler.ProfilerActivity.CPU,
torch.profiler.ProfilerActivity.CUDA,
],
schedule=torch.profiler.schedule(
wait=1,
warmup=1,
active=num_active,
repeat=1
),
# on_trace_ready=trace_handler,
# on_trace_ready=torch.profiler.tensorboard_trace_handler('./log')
# with_modules=True,
with_flops=True,
) as p:
for _ in range(5):
_ = model(data)
p.step()
if is_training:
model.train()
info = p.key_averages().table(sort_by="self_cuda_time_total", row_limit=-1)
num_flops = sum([float(v.strip()) for v in re.findall(r'(\d+.?\d+ *\n)', info)]) / num_active
if verbose:
# print(info)
print(f'Total number of trainable parameters: {num_params}')
print(f'Total number of flops: {int(num_flops)}M with {input_shape}')
return {'n_parameters': num_params, 'n_flops': num_flops, 'info': info}

@@ -0,0 +1,34 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch
import torch.utils.data
import torchvision
torchvision.disable_beta_transforms_warning()
import PIL
__all__ = ['show_sample']
def show_sample(sample):
"""for coco dataset/dataloader
"""
import matplotlib.pyplot as plt
from torchvision.transforms.v2 import functional as F
from torchvision.utils import draw_bounding_boxes
image, target = sample
if isinstance(image, PIL.Image.Image):
image = F.to_image_tensor(image)
image = F.convert_dtype(image, torch.uint8)
annotated_image = draw_bounding_boxes(image, target["boxes"], colors="yellow", width=3)
fig, ax = plt.subplots()
ax.imshow(annotated_image.permute(1, 2, 0).numpy())
ax.set(xticklabels=[], yticklabels=[], xticks=[], yticks=[])
fig.tight_layout()
fig.show()
plt.show()

@@ -0,0 +1,17 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
from .arch import *
from .criterion import *
from .postprocessor import *
#
from .backbone import *
from .backbone import (
get_activation,
FrozenBatchNorm2d,
freeze_batch_norm2d,
)

@@ -0,0 +1,6 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
from .classification import Classification, ClassHead
from .yolo import YOLO

@@ -0,0 +1,45 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch
import torch.nn as nn
from ...core import register
__all__ = ['Classification', 'ClassHead']
@register()
class Classification(torch.nn.Module):
__inject__ = ['backbone', 'head']
def __init__(self, backbone: nn.Module, head: nn.Module=None):
super().__init__()
self.backbone = backbone
self.head = head
def forward(self, x):
x = self.backbone(x)
if self.head is not None:
x = self.head(x)
return x
@register()
class ClassHead(nn.Module):
def __init__(self, hidden_dim, num_classes):
super().__init__()
self.pool = nn.AdaptiveAvgPool2d(1)
self.proj = nn.Linear(hidden_dim, num_classes)
def forward(self, x):
x = x[0] if isinstance(x, (list, tuple)) else x
x = self.pool(x)
x = x.reshape(x.shape[0], -1)
x = self.proj(x)
return x

@@ -0,0 +1,33 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch
from ...core import register
__all__ = ['YOLO', ]
@register()
class YOLO(torch.nn.Module):
__inject__ = ['backbone', 'neck', 'head', ]
def __init__(self, backbone: torch.nn.Module, neck, head):
super().__init__()
self.backbone = backbone
self.neck = neck
self.head = head
def forward(self, x, **kwargs):
x = self.backbone(x)
x = self.neck(x)
x = self.head(x)
return x
def deploy(self, ):
self.eval()
for m in self.modules():
if m is not self and hasattr(m, 'deploy'):
m.deploy()
return self

@@ -0,0 +1,18 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
from .common import (
get_activation,
FrozenBatchNorm2d,
freeze_batch_norm2d,
)
from .presnet import PResNet
from .test_resnet import MResNet
from .timm_model import TimmModel
from .torchvision_model import TorchVisionModel
from .csp_resnet import CSPResNet
from .csp_darknet import CSPDarkNet, CSPPAN
from .hgnetv2 import HGNetv2

@@ -0,0 +1,97 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch
import torch.nn as nn
class FrozenBatchNorm2d(nn.Module):
"""copy and modified from https://github.com/facebookresearch/detr/blob/master/models/backbone.py
BatchNorm2d where the batch statistics and the affine parameters are fixed.
Copy-paste from torchvision.ops.misc with added eps before rsqrt,
without which any other models than torchvision.models.resnet[18,34,50,101]
produce nans.
"""
def __init__(self, num_features, eps=1e-5):
super(FrozenBatchNorm2d, self).__init__()
n = num_features
self.register_buffer("weight", torch.ones(n))
self.register_buffer("bias", torch.zeros(n))
self.register_buffer("running_mean", torch.zeros(n))
self.register_buffer("running_var", torch.ones(n))
self.eps = eps
self.num_features = n
def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict,
missing_keys, unexpected_keys, error_msgs):
num_batches_tracked_key = prefix + 'num_batches_tracked'
if num_batches_tracked_key in state_dict:
del state_dict[num_batches_tracked_key]
super(FrozenBatchNorm2d, self)._load_from_state_dict(
state_dict, prefix, local_metadata, strict,
missing_keys, unexpected_keys, error_msgs)
def forward(self, x):
# move reshapes to the beginning
# to make it fuser-friendly
w = self.weight.reshape(1, -1, 1, 1)
b = self.bias.reshape(1, -1, 1, 1)
rv = self.running_var.reshape(1, -1, 1, 1)
rm = self.running_mean.reshape(1, -1, 1, 1)
scale = w * (rv + self.eps).rsqrt()
bias = b - rm * scale
return x * scale + bias
def extra_repr(self):
return (
"{num_features}, eps={eps}".format(**self.__dict__)
)
def freeze_batch_norm2d(module: nn.Module) -> nn.Module:
if isinstance(module, nn.BatchNorm2d):
module = FrozenBatchNorm2d(module.num_features)
else:
for name, child in module.named_children():
_child = freeze_batch_norm2d(child)
if _child is not child:
setattr(module, name, _child)
return module
def get_activation(act: str, inplace: bool=True):
"""get activation
"""
if act is None:
return nn.Identity()
elif isinstance(act, nn.Module):
return act
act = act.lower()
if act == 'silu' or act == 'swish':
m = nn.SiLU()
elif act == 'relu':
m = nn.ReLU()
elif act == 'leaky_relu':
m = nn.LeakyReLU()
elif act == 'gelu':
m = nn.GELU()
elif act == 'hardsigmoid':
m = nn.Hardsigmoid()
else:
raise RuntimeError(f'Unsupported activation: {act!r}')
if hasattr(m, 'inplace'):
m.inplace = inplace
return m
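
`FrozenBatchNorm2d.forward` folds the four buffers into a single scale and bias before touching the input. A scalar sketch in plain Python shows that this is the same computation as the usual batch-norm expression; the function names here are hypothetical and the values are arbitrary.

```python
import math


def frozen_bn_folded(x, w, b, rm, rv, eps=1e-5):
    """Folded form used in forward: one multiply and one add per element."""
    scale = w / math.sqrt(rv + eps)  # same as w * (rv + eps).rsqrt()
    bias = b - rm * scale
    return x * scale + bias


def batch_norm_reference(x, w, b, rm, rv, eps=1e-5):
    """Textbook inference-time batch norm with fixed statistics."""
    return (x - rm) / math.sqrt(rv + eps) * w + b


args = dict(x=2.0, w=1.5, b=0.1, rm=0.5, rv=4.0)
folded = frozen_bn_folded(**args)
reference = batch_norm_reference(**args)
```

Expanding the folded form gives `x * scale + b - rm * scale = (x - rm) * scale + b`, which matches the reference up to floating-point rounding. The `eps` term keeps the square root away from zero when `running_var` is very small.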

@@ -0,0 +1,177 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch
import torch.nn as nn
import torch.nn.functional as F
import math
import warnings
from .common import get_activation
from ...core import register
def autopad(k, p=None):
if p is None:
p = k // 2 if isinstance(k, int) else [x // 2 for x in k]
return p
def make_divisible(c, d):
return math.ceil(c / d) * d
class Conv(nn.Module):
def __init__(self, cin, cout, k=1, s=1, p=None, g=1, act='silu') -> None:
super().__init__()
self.conv = nn.Conv2d(cin, cout, k, s, autopad(k, p), groups=g, bias=False)
self.bn = nn.BatchNorm2d(cout)
self.act = get_activation(act, inplace=True)
def forward(self, x):
return self.act(self.bn(self.conv(x)))
class Bottleneck(nn.Module):
# Standard bottleneck
def __init__(self, c1, c2, shortcut=True, g=1, e=0.5, act='silu'):
super().__init__()
c_ = int(c2 * e) # hidden channels
self.cv1 = Conv(c1, c_, 1, 1, act=act)
self.cv2 = Conv(c_, c2, 3, 1, g=g, act=act)
self.add = shortcut and c1 == c2
def forward(self, x):
return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))
class C3(nn.Module):
# CSP Bottleneck with 3 convolutions
def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5, act='silu'): # ch_in, ch_out, number, shortcut, groups, expansion
super().__init__()
c_ = int(c2 * e) # hidden channels
self.cv1 = Conv(c1, c_, 1, 1, act=act)
self.cv2 = Conv(c1, c_, 1, 1, act=act)
self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, e=1.0, act=act) for _ in range(n)))
self.cv3 = Conv(2 * c_, c2, 1, act=act)
def forward(self, x):
return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
class SPPF(nn.Module):
# Spatial Pyramid Pooling - Fast (SPPF) layer for YOLOv5 by Glenn Jocher
def __init__(self, c1, c2, k=5, act='silu'): # equivalent to SPP(k=(5, 9, 13))
super().__init__()
c_ = c1 // 2 # hidden channels
self.cv1 = Conv(c1, c_, 1, 1, act=act)
self.cv2 = Conv(c_ * 4, c2, 1, 1, act=act)
self.m = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
def forward(self, x):
x = self.cv1(x)
with warnings.catch_warnings():
warnings.simplefilter('ignore') # suppress torch 1.9.0 max_pool2d() warning
y1 = self.m(x)
y2 = self.m(y1)
return self.cv2(torch.cat([x, y1, y2, self.m(y2)], 1))
@register()
class CSPDarkNet(nn.Module):
__share__ = ['depth_multi', 'width_multi']
def __init__(self, in_channels=3, width_multi=1.0, depth_multi=1.0, return_idx=[2, 3, -1], act='silu', ) -> None:
super().__init__()
channels = [64, 128, 256, 512, 1024]
channels = [make_divisible(c * width_multi, 8) for c in channels]
depths = [3, 6, 9, 3]
depths = [max(round(d * depth_multi), 1) for d in depths]
self.layers = nn.ModuleList([Conv(in_channels, channels[0], 6, 2, 2, act=act)])
for i, (c, d) in enumerate(zip(channels, depths), 1):
layer = nn.Sequential(*[Conv(c, channels[i], 3, 2, act=act), C3(channels[i], channels[i], n=d, act=act)])
self.layers.append(layer)
self.layers.append(SPPF(channels[-1], channels[-1], k=5, act=act))
self.return_idx = return_idx
self.out_channels = [channels[i] for i in self.return_idx]
self.strides = [[2, 4, 8, 16, 32][i] for i in self.return_idx]
self.depths = depths
self.act = act
def forward(self, x):
outputs = []
for _, m in enumerate(self.layers):
x = m(x)
outputs.append(x)
return [outputs[i] for i in self.return_idx]
@register()
class CSPPAN(nn.Module):
"""
P5 ---> 1x1 ---------------------------------> concat --> c3 --> det
| up | conv /2
P4 ---> concat ---> c3 ---> 1x1 --> concat ---> c3 -----------> det
| up | conv /2
P3 -----------------------> concat ---> c3 ---------------------> det
"""
__share__ = ['depth_multi', ]
def __init__(self, in_channels=[256, 512, 1024], depth_multi=1., act='silu') -> None:
super().__init__()
depth = max(round(3 * depth_multi), 1)
self.out_channels = in_channels
self.fpn_stems = nn.ModuleList([Conv(cin, cout, 1, 1, act=act) for cin, cout in zip(in_channels[::-1], in_channels[::-1][1:])])
self.fpn_csps = nn.ModuleList([C3(cin, cout, depth, False, act=act) for cin, cout in zip(in_channels[::-1], in_channels[::-1][1:])])
self.pan_stems = nn.ModuleList([Conv(c, c, 3, 2, act=act) for c in in_channels[:-1]])
self.pan_csps = nn.ModuleList([C3(c, c, depth, False, act=act) for c in in_channels[1:]])
def forward(self, feats):
fpn_feats = []
for i, feat in enumerate(feats[::-1]):
if i == 0:
feat = self.fpn_stems[i](feat)
fpn_feats.append(feat)
else:
_feat = F.interpolate(fpn_feats[-1], scale_factor=2, mode='nearest')
feat = torch.concat([_feat, feat], dim=1)
feat = self.fpn_csps[i-1](feat)
if i < len(self.fpn_stems):
feat = self.fpn_stems[i](feat)
fpn_feats.append(feat)
pan_feats = []
for i, feat in enumerate(fpn_feats[::-1]):
if i == 0:
pan_feats.append(feat)
else:
_feat = self.pan_stems[i-1](pan_feats[-1])
feat = torch.concat([_feat, feat], dim=1)
feat = self.pan_csps[i-1](feat)
pan_feats.append(feat)
return pan_feats
if __name__ == '__main__':
data = torch.rand(1, 3, 320, 640)
width_multi = 0.75
depth_multi = 0.33
m = CSPDarkNet(3, width_multi=width_multi, depth_multi=depth_multi, act='silu')
outputs = m(data)
print([o.shape for o in outputs])
m = CSPPAN(in_channels=m.out_channels, depth_multi=depth_multi, act='silu')
outputs = m(outputs)
print([o.shape for o in outputs])
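
The channel and depth scaling in `CSPDarkNet.__init__` can be checked without building the network. The sketch below repeats the same arithmetic in plain Python for the multipliers used in the `__main__` block; `scaled_backbone_config` is a hypothetical helper introduced only for illustration.

```python
import math


def make_divisible(c, d):
    # Round c up to the nearest multiple of d, as in the module above.
    return math.ceil(c / d) * d


def scaled_backbone_config(width_multi, depth_multi, return_idx=(2, 3, -1)):
    """Mirror the channel/depth/stride bookkeeping of CSPDarkNet.__init__."""
    channels = [make_divisible(c * width_multi, 8) for c in (64, 128, 256, 512, 1024)]
    depths = [max(round(d * depth_multi), 1) for d in (3, 6, 9, 3)]
    strides = [2, 4, 8, 16, 32]
    return {
        'channels': channels,
        'depths': depths,
        'out_channels': [channels[i] for i in return_idx],
        'strides': [strides[i] for i in return_idx],
    }


cfg = scaled_backbone_config(width_multi=0.75, depth_multi=0.33)
print(cfg['out_channels'], cfg['strides'])
```

With `width_multi=0.75` the returned feature maps have 192, 384 and 768 channels at strides 8, 16 and 32, which is what `CSPPAN` receives as `in_channels`.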

@@ -0,0 +1,277 @@
"""
https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/ppdet/modeling/backbones/cspresnet.py
Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch
import torch.nn as nn
import torch.nn.functional as F
from collections import OrderedDict
from .common import get_activation
from ...core import register
__all__ = ['CSPResNet']
donwload_url = {
's': 'https://github.com/lyuwenyu/storage/releases/download/v0.1/CSPResNetb_s_pretrained_from_paddle.pth',
'm': 'https://github.com/lyuwenyu/storage/releases/download/v0.1/CSPResNetb_m_pretrained_from_paddle.pth',
'l': 'https://github.com/lyuwenyu/storage/releases/download/v0.1/CSPResNetb_l_pretrained_from_paddle.pth',
'x': 'https://github.com/lyuwenyu/storage/releases/download/v0.1/CSPResNetb_x_pretrained_from_paddle.pth',
}
class ConvBNLayer(nn.Module):
def __init__(self, ch_in, ch_out, filter_size=3, stride=1, groups=1, padding=0, act=None):
super().__init__()
self.conv = nn.Conv2d(ch_in, ch_out, filter_size, stride, padding, groups=groups, bias=False)
self.bn = nn.BatchNorm2d(ch_out)
self.act = get_activation(act)
def forward(self, x: torch.Tensor) -> torch.Tensor:
x = self.conv(x)
x = self.bn(x)
x = self.act(x)
return x
class RepVggBlock(nn.Module):
def __init__(self, ch_in, ch_out, act='relu', alpha: bool=False):
super().__init__()
self.ch_in = ch_in
self.ch_out = ch_out
self.conv1 = ConvBNLayer(
ch_in, ch_out, 3, stride=1, padding=1, act=None)
self.conv2 = ConvBNLayer(
ch_in, ch_out, 1, stride=1, padding=0, act=None)
self.act = get_activation(act)
if alpha:
self.alpha = nn.Parameter(torch.ones(1, ))
else:
self.alpha = None
def forward(self, x):
if hasattr(self, 'conv'):
y = self.conv(x)
else:
if self.alpha is not None:
y = self.conv1(x) + self.alpha * self.conv2(x)
else:
y = self.conv1(x) + self.conv2(x)
y = self.act(y)
return y
def convert_to_deploy(self):
if not hasattr(self, 'conv'):
self.conv = nn.Conv2d(self.ch_in, self.ch_out, 3, 1, padding=1)
kernel, bias = self.get_equivalent_kernel_bias()
self.conv.weight.data = kernel
self.conv.bias.data = bias
def get_equivalent_kernel_bias(self):
kernel3x3, bias3x3 = self._fuse_bn_tensor(self.conv1)
kernel1x1, bias1x1 = self._fuse_bn_tensor(self.conv2)
if self.alpha is not None:
return kernel3x3 + self.alpha * self._pad_1x1_to_3x3_tensor(
kernel1x1), bias3x3 + self.alpha * bias1x1
else:
return kernel3x3 + self._pad_1x1_to_3x3_tensor(
kernel1x1), bias3x3 + bias1x1
def _pad_1x1_to_3x3_tensor(self, kernel1x1):
if kernel1x1 is None:
return 0
else:
return F.pad(kernel1x1, [1, 1, 1, 1])
def _fuse_bn_tensor(self, branch: ConvBNLayer):
if branch is None:
return 0, 0
kernel = branch.conv.weight
# ConvBNLayer stores its BatchNorm2d as `bn`, not `norm`
running_mean = branch.bn.running_mean
running_var = branch.bn.running_var
gamma = branch.bn.weight
beta = branch.bn.bias
eps = branch.bn.eps
std = (running_var + eps).sqrt()
t = (gamma / std).reshape(-1, 1, 1, 1)
return kernel * t, beta - running_mean * gamma / std
class BasicBlock(nn.Module):
def __init__(self,
ch_in,
ch_out,
act='relu',
shortcut=True,
use_alpha=False):
super().__init__()
assert ch_in == ch_out
self.conv1 = ConvBNLayer(ch_in, ch_out, 3, stride=1, padding=1, act=act)
self.conv2 = RepVggBlock(ch_out, ch_out, act=act, alpha=use_alpha)
self.shortcut = shortcut
def forward(self, x):
y = self.conv1(x)
y = self.conv2(y)
if self.shortcut:
return x + y
else:
return y
class EffectiveSELayer(nn.Module):
""" Effective Squeeze-Excitation
From `CenterMask : Real-Time Anchor-Free Instance Segmentation` - https://arxiv.org/abs/1911.06667
"""
def __init__(self, channels, act='hardsigmoid'):
super(EffectiveSELayer, self).__init__()
self.fc = nn.Conv2d(channels, channels, kernel_size=1, padding=0)
self.act = get_activation(act)
def forward(self, x: torch.Tensor):
x_se = x.mean((2, 3), keepdim=True)
x_se = self.fc(x_se)
x_se = self.act(x_se)
return x * x_se
class CSPResStage(nn.Module):
def __init__(self,
block_fn,
ch_in,
ch_out,
n,
stride,
act='relu',
attn='eca',
use_alpha=False):
super().__init__()
ch_mid = (ch_in + ch_out) // 2
if stride == 2:
self.conv_down = ConvBNLayer(
ch_in, ch_mid, 3, stride=2, padding=1, act=act)
else:
self.conv_down = None
self.conv1 = ConvBNLayer(ch_mid, ch_mid // 2, 1, act=act)
self.conv2 = ConvBNLayer(ch_mid, ch_mid // 2, 1, act=act)
self.blocks = nn.Sequential(*[
block_fn(
ch_mid // 2,
ch_mid // 2,
act=act,
shortcut=True,
use_alpha=use_alpha) for i in range(n)
])
if attn:
self.attn = EffectiveSELayer(ch_mid, act='hardsigmoid')
else:
self.attn = None
self.conv3 = ConvBNLayer(ch_mid, ch_out, 1, act=act)
def forward(self, x):
if self.conv_down is not None:
x = self.conv_down(x)
y1 = self.conv1(x)
y2 = self.blocks(self.conv2(x))
y = torch.concat([y1, y2], dim=1)
if self.attn is not None:
y = self.attn(y)
y = self.conv3(y)
return y
@register()
class CSPResNet(nn.Module):
layers = [3, 6, 6, 3]
channels = [64, 128, 256, 512, 1024]
model_cfg = {
's': {'depth_mult': 0.33, 'width_mult': 0.50, },
'm': {'depth_mult': 0.67, 'width_mult': 0.75, },
'l': {'depth_mult': 1.00, 'width_mult': 1.00, },
'x': {'depth_mult': 1.33, 'width_mult': 1.25, },
}
def __init__(self,
name: str,
act='silu',
return_idx=[1, 2, 3],
use_large_stem=True,
use_alpha=False,
pretrained=False):
super().__init__()
depth_mult = self.model_cfg[name]['depth_mult']
width_mult = self.model_cfg[name]['width_mult']
channels = [max(round(c * width_mult), 1) for c in self.channels]
layers = [max(round(l * depth_mult), 1) for l in self.layers]
act = get_activation(act)
if use_large_stem:
self.stem = nn.Sequential(OrderedDict([
('conv1', ConvBNLayer(
3, channels[0] // 2, 3, stride=2, padding=1, act=act)),
('conv2', ConvBNLayer(
channels[0] // 2,
channels[0] // 2,
3,
stride=1,
padding=1,
act=act)), ('conv3', ConvBNLayer(
channels[0] // 2,
channels[0],
3,
stride=1,
padding=1,
act=act))]))
else:
self.stem = nn.Sequential(OrderedDict([
('conv1', ConvBNLayer(
3, channels[0] // 2, 3, stride=2, padding=1, act=act)),
('conv2', ConvBNLayer(
channels[0] // 2,
channels[0],
3,
stride=1,
padding=1,
act=act))]))
n = len(channels) - 1
self.stages = nn.Sequential(OrderedDict([(str(i), CSPResStage(
BasicBlock,
channels[i],
channels[i + 1],
layers[i],
2,
act=act,
use_alpha=use_alpha)) for i in range(n)]))
self._out_channels = channels[1:]
self._out_strides = [4 * 2**i for i in range(n)]
self.return_idx = return_idx
if pretrained:
if isinstance(pretrained, bool) or 'http' in pretrained:
state = torch.hub.load_state_dict_from_url(donwload_url[name], map_location='cpu')
else:
state = torch.load(pretrained, map_location='cpu')
self.load_state_dict(state)
print(f'Load CSPResNet_{name} state_dict')
def forward(self, x):
x = self.stem(x)
outs = []
for idx, stage in enumerate(self.stages):
x = stage(x)
if idx in self.return_idx:
outs.append(x)
return outs
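
`RepVggBlock.convert_to_deploy` relies on the fact that a 3x3 conv+BN branch plus a 1x1 conv+BN branch equals one 3x3 convolution with a bias. The following is a small pure-Python sketch of that identity for a single channel, without the optional `alpha` factor; every helper name here is hypothetical and the numeric values are arbitrary.

```python
import math


def conv_same(x, k):
    """Single-channel 2D cross-correlation, stride 1, zero padding that keeps the size."""
    r = len(k) // 2
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di, row in enumerate(k):
                for dj, kv in enumerate(row):
                    ii, jj = i + di - r, j + dj - r
                    if 0 <= ii < h and 0 <= jj < w:
                        out[i][j] += x[ii][jj] * kv
    return out


def batch_norm(z, gamma, beta, mean, var, eps=1e-5):
    std = math.sqrt(var + eps)
    return [[gamma * (v - mean) / std + beta for v in row] for row in z]


def fuse_conv_bn(k, gamma, beta, mean, var, eps=1e-5):
    """Fold BN statistics into the kernel and a bias term."""
    t = gamma / math.sqrt(var + eps)
    return [[kv * t for kv in row] for row in k], beta - mean * t


def pad_1x1_to_3x3(k1):
    # Place the 1x1 weight at the centre of a zero 3x3 kernel.
    return [[0.0, 0.0, 0.0], [0.0, k1[0][0], 0.0], [0.0, 0.0, 0.0]]


x = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0], [2.0, 1.0, -0.5]]
k3 = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.2, 0.1, -0.3]]
k1 = [[0.7]]
bn3 = dict(gamma=1.2, beta=0.1, mean=0.3, var=0.8)
bn1 = dict(gamma=0.9, beta=-0.2, mean=-0.1, var=1.5)

# Training-time form: two conv+BN branches, summed.
a = batch_norm(conv_same(x, k3), **bn3)
b = batch_norm(conv_same(x, k1), **bn1)
two_branch = [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

# Deploy-time form: one 3x3 kernel and one bias.
f3, bias3 = fuse_conv_bn(k3, **bn3)
f1, bias1 = fuse_conv_bn(k1, **bn1)
kernel = [[p + q for p, q in zip(r3, r1)] for r3, r1 in zip(f3, pad_1x1_to_3x3(f1))]
bias = bias3 + bias1
fused = [[v + bias for v in row] for row in conv_same(x, kernel)]

max_err = max(abs(p - q) for rf, rt in zip(fused, two_branch) for p, q in zip(rf, rt))
assert max_err < 1e-9
```

The identity holds only with frozen (inference-mode) BN statistics, so the conversion is meant for a trained model.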

View File

@@ -0,0 +1,428 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
https://github.com/PaddlePaddle/PaddleDetection/blob/develop/ppdet/modeling/backbones/hgnet_v2.py
"""
import torch
import torch.nn as nn
import torch.nn.init as init
import torch.nn.functional as F
from torch import Tensor
from typing import List, Tuple
from .common import FrozenBatchNorm2d
from ...core import register
__all__ = ['HGNetv2']
class LearnableAffineBlock(nn.Module):
def __init__(self, scale_value=1.0, bias_value=0.0):
super().__init__()
self.scale = nn.Parameter(torch.tensor([scale_value]))
self.bias = nn.Parameter(torch.tensor([bias_value]))
def forward(self, x: Tensor) -> Tensor:
return self.scale * x + self.bias
class ConvBNAct(nn.Module):
def __init__(self,
in_channels,
out_channels,
kernel_size=3,
stride=1,
padding=0,
groups=1,
use_act=True,
use_lab=False):
super().__init__()
self.use_act = use_act
self.use_lab = use_lab
if padding == 'same':
self.conv = nn.Sequential(
nn.ZeroPad2d([0, 1, 0, 1]),
nn.Conv2d(
in_channels,
out_channels,
kernel_size,
stride,
groups=groups,
bias=False
)
)
else:
self.conv = nn.Conv2d(
in_channels,
out_channels,
kernel_size,
stride,
padding=(kernel_size - 1) // 2,
groups=groups,
bias=False
)
self.bn = nn.BatchNorm2d(out_channels)
if self.use_act:
self.act = nn.ReLU()
if self.use_lab:
self.lab = LearnableAffineBlock()
def forward(self, x: Tensor) -> Tensor:
x = self.conv(x)
x = self.bn(x)
if self.use_act:
x = self.act(x)
if self.use_lab:
x = self.lab(x)
return x
class LightConvBNAct(nn.Module):
def __init__(self,
in_channels,
out_channels,
kernel_size,
stride,
groups=1,
use_lab=False):
super().__init__()
self.conv1 = ConvBNAct(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
use_act=False,
use_lab=use_lab
)
self.conv2 = ConvBNAct(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=kernel_size,
groups=out_channels,
use_act=True,
use_lab=use_lab
)
def forward(self, x: Tensor) -> Tensor:
x = self.conv1(x)
x = self.conv2(x)
return x
class StemBlock(nn.Module):
def __init__(self,
in_channels,
mid_channels,
out_channels,
use_lab=False):
super().__init__()
self.stem1 = ConvBNAct(
in_channels=in_channels,
out_channels=mid_channels,
kernel_size=3,
stride=2,
use_lab=use_lab
)
self.stem2a = ConvBNAct(
in_channels=mid_channels,
out_channels=mid_channels // 2,
kernel_size=2,
stride=1,
padding='same',
use_lab=use_lab
)
self.stem2b = ConvBNAct(
in_channels=mid_channels // 2,
out_channels=mid_channels,
kernel_size=2,
stride=1,
padding='same',
use_lab=use_lab
)
self.stem3 = ConvBNAct(
in_channels=mid_channels * 2,
out_channels=mid_channels,
kernel_size=3,
stride=2,
use_lab=use_lab
)
self.stem4 = ConvBNAct(
in_channels=mid_channels,
out_channels=out_channels,
kernel_size=1,
stride=1,
use_lab=use_lab
)
self.pool = nn.Sequential(
nn.ZeroPad2d([0, 1, 0, 1]),
nn.MaxPool2d(2, 1, ceil_mode=True)
)
def forward(self, x: Tensor) -> Tensor:
x = self.stem1(x)
x2 = self.stem2a(x)
x2 = self.stem2b(x2)
x1 = self.pool(x)
x = torch.concat([x1, x2], dim=1)
x = self.stem3(x)
x = self.stem4(x)
return x
class HG_Block(nn.Module):
def __init__(self,
in_channels,
mid_channels,
out_channels,
kernel_size=3,
layer_num=6,
identity=False,
light_block=True,
use_lab=False):
super().__init__()
self.identity = identity
self.layers = nn.ModuleList()
# pick the block class directly instead of resolving its name with eval()
block_cls = LightConvBNAct if light_block else ConvBNAct
for i in range(layer_num):
self.layers.append(
block_cls(in_channels=in_channels
if i == 0 else mid_channels,
out_channels=mid_channels,
stride=1,
kernel_size=kernel_size,
use_lab=use_lab))
# feature aggregation
total_channels = in_channels + layer_num * mid_channels
self.aggregation_squeeze_conv = ConvBNAct(
in_channels=total_channels,
out_channels=out_channels // 2,
kernel_size=1,
stride=1,
use_lab=use_lab)
self.aggregation_excitation_conv = ConvBNAct(
in_channels=out_channels // 2,
out_channels=out_channels,
kernel_size=1,
stride=1,
use_lab=use_lab)
def forward(self, x):
identity = x
output = []
output.append(x)
for layer in self.layers:
x = layer(x)
output.append(x)
x = torch.concat(output, dim=1)
x = self.aggregation_squeeze_conv(x)
x = self.aggregation_excitation_conv(x)
if self.identity:
x = x + identity
return x
class HG_Stage(nn.Module):
def __init__(self,
in_channels,
mid_channels,
out_channels,
block_num,
layer_num=6,
downsample=True,
light_block=True,
kernel_size=3,
use_lab=False):
super().__init__()
self.downsample = downsample
if downsample:
self.downsample = ConvBNAct(
in_channels=in_channels,
out_channels=in_channels,
kernel_size=3,
stride=2,
groups=in_channels,
use_act=False,
use_lab=use_lab)
blocks_list = []
for i in range(block_num):
blocks_list.append(
HG_Block(
in_channels=in_channels if i == 0 else out_channels,
mid_channels=mid_channels,
out_channels=out_channels,
kernel_size=kernel_size,
layer_num=layer_num,
identity=False if i == 0 else True,
light_block=light_block,
use_lab=use_lab))
self.blocks = nn.Sequential(*blocks_list)
def forward(self, x):
if self.downsample:
x = self.downsample(x)
x = self.blocks(x)
return x
@register()
class HGNetv2(nn.Module):
"""
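Args:
name: str. Key into `arch_configs` ('L', 'X' or 'H'); selects the stem channels, stage configuration and checkpoint URL.
use_lab: bool. Whether to use LearnableAffineBlock in the network.
return_idx: list. Indices of the stages whose outputs are returned.
freeze_at: int. If >= 0, freeze the stem and the first `freeze_at` stages; a negative value freezes nothing.
freeze_norm: bool. Whether to replace every BatchNorm2d with FrozenBatchNorm2d.
pretrained: bool or str. True, or a string containing 'http', loads the checkpoint at the configured `url`; any other string is loaded as a local path.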
Returns:
model: nn.Module.
"""
arch_configs = {
'L': {
'stem_channels': [3, 32, 48],
'stage_config': {
# in_channels, mid_channels, out_channels, num_blocks, downsample, light_block, kernel_size, layer_num
"stage1": [48, 48, 128, 1, False, False, 3, 6],
"stage2": [128, 96, 512, 1, True, False, 3, 6],
"stage3": [512, 192, 1024, 3, True, True, 5, 6],
"stage4": [1024, 384, 2048, 1, True, True, 5, 6],
},
'url': 'https://github.com/lyuwenyu/storage/releases/download/v0.1/PPHGNetV2_L_ssld_pretrained_from_paddle.pth',
},
'X': {
'stem_channels': [3, 32, 64],
'stage_config': {
# in_channels, mid_channels, out_channels, num_blocks, downsample, light_block, kernel_size, layer_num
"stage1": [64, 64, 128, 1, False, False, 3, 6],
"stage2": [128, 128, 512, 2, True, False, 3, 6],
"stage3": [512, 256, 1024, 5, True, True, 5, 6],
"stage4": [1024, 512, 2048, 2, True, True, 5, 6],
},
'url': 'https://github.com/lyuwenyu/storage/releases/download/v0.1/PPHGNetV2_X_ssld_pretrained_from_paddle.pth',
},
'H': {
'stem_channels': [3, 48, 96],
'stage_config': {
# in_channels, mid_channels, out_channels, num_blocks, downsample, light_block, kernel_size, layer_num
"stage1": [96, 96, 192, 2, False, False, 3, 6],
"stage2": [192, 192, 512, 3, True, False, 3, 6],
"stage3": [512, 384, 1024, 6, True, True, 5, 6],
"stage4": [1024, 768, 2048, 3, True, True, 5, 6],
},
'url': 'https://github.com/lyuwenyu/storage/releases/download/v0.1/PPHGNetV2_H_ssld_pretrained_from_paddle.pth',
}
}
def __init__(self,
name,
use_lab=False,
return_idx=[1, 2, 3],
freeze_at=-1,
freeze_norm=False,
pretrained=False):
super().__init__()
self.use_lab = use_lab
self.return_idx = return_idx
stem_channels = self.arch_configs[name]['stem_channels']
stage_config = self.arch_configs[name]['stage_config']
download_url = self.arch_configs[name]['url']
self._out_strides = [4, 8, 16, 32]
self._out_channels = [stage_config[k][2] for k in stage_config]
# stem
self.stem = StemBlock(
in_channels=stem_channels[0],
mid_channels=stem_channels[1],
out_channels=stem_channels[2],
use_lab=use_lab
)
# stages
self.stages = nn.ModuleList()
for i, k in enumerate(stage_config):
in_channels, mid_channels, out_channels, block_num, downsample, light_block, kernel_size, layer_num = stage_config[
k]
self.stages.append(
HG_Stage(
in_channels,
mid_channels,
out_channels,
block_num,
layer_num,
downsample,
light_block,
kernel_size,
use_lab))
self._init_weights()
if freeze_at >= 0:
self._freeze_parameters(self.stem)
for i in range(min(freeze_at, 4)):
self._freeze_parameters(self.stages[i])
if freeze_norm:
self._freeze_norm(self)
if pretrained:
if isinstance(pretrained, bool) or 'http' in pretrained:
state = torch.hub.load_state_dict_from_url(download_url, map_location='cpu')
else:
state = torch.load(pretrained, map_location='cpu')
self.load_state_dict(state)
print(f'Load HGNetv2_{name} state_dict')
def _init_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
init.kaiming_normal_(m.weight)
elif isinstance(m, (nn.BatchNorm2d)):
init.constant_(m.weight, 1)
init.constant_(m.bias, 0)
elif isinstance(m, nn.Linear):
init.constant_(m.bias, 0)
def _freeze_parameters(self, m: nn.Module):
for p in m.parameters():
p.requires_grad = False
def _freeze_norm(self, m: nn.Module):
if isinstance(m, nn.BatchNorm2d):
m = FrozenBatchNorm2d(m.num_features)
else:
for name, child in m.named_children():
_child = self._freeze_norm(child)
if _child is not child:
setattr(m, name, _child)
return m
def forward(self, x: Tensor) -> List[Tensor]:
x = self.stem(x)
outs = []
for idx, stage in enumerate(self.stages):
x = stage(x)
if idx in self.return_idx:
outs.append(x)
return outs
if __name__ == '__main__':
m = HGNetv2(name='X', pretrained=False, freeze_at=-1, freeze_norm=False)
data = torch.randn(1, 3, 640, 640)
output = m(data)
print([o.shape for o in output])
output[0].mean().backward()
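
The `_out_strides = [4, 8, 16, 32]` declared in `HGNetv2` can be verified from the layer hyperparameters alone. Below is a pure-Python sketch that tracks the spatial size through `StemBlock` and the four stages using the standard convolution output-size formula; `conv_out` and `hgnetv2_sizes` are hypothetical helpers, and the downsample flags are the ones listed in `arch_configs`.

```python
def conv_out(size, kernel, stride, pad_before, pad_after):
    """Output length of a conv/pool layer along one spatial axis."""
    return (size + pad_before + pad_after - kernel) // stride + 1


def hgnetv2_sizes(size, downsample=(False, True, True, True)):
    size = conv_out(size, 3, 2, 1, 1)      # stem1: 3x3, stride 2, padding (3 - 1) // 2
    branch = conv_out(size, 2, 1, 0, 1)    # stem2a: ZeroPad2d([0, 1, 0, 1]) + 2x2 conv
    branch = conv_out(branch, 2, 1, 0, 1)  # stem2b: same padding scheme
    pooled = conv_out(size, 2, 1, 0, 1)    # pool: ZeroPad2d([0, 1, 0, 1]) + MaxPool2d(2, 1)
    if branch != pooled:
        raise ValueError('stem branches must match before concatenation')
    size = conv_out(branch, 3, 2, 1, 1)    # stem3: 3x3, stride 2 (stem4 is 1x1, size unchanged)
    sizes = []
    for flag in downsample:
        if flag:
            size = conv_out(size, 3, 2, 1, 1)  # depthwise 3x3, stride 2
        sizes.append(size)
    return sizes


sizes = hgnetv2_sizes(640)
strides = [640 // s for s in sizes]
print(sizes, strides)
```

For a 640-pixel input the stage outputs are 160, 80, 40 and 20 pixels wide, so the default `return_idx=[1, 2, 3]` selects the stride 8, 16 and 32 feature maps.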

@@ -0,0 +1,245 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch
import torch.nn as nn
import torch.nn.functional as F
from collections import OrderedDict
from .common import get_activation, FrozenBatchNorm2d
from ...core import register
__all__ = ['PResNet']
ResNet_cfg = {
18: [2, 2, 2, 2],
34: [3, 4, 6, 3],
50: [3, 4, 6, 3],
101: [3, 4, 23, 3],
# 152: [3, 8, 36, 3],
}
donwload_url = {
18: 'https://github.com/lyuwenyu/storage/releases/download/v0.1/ResNet18_vd_pretrained_from_paddle.pth',
34: 'https://github.com/lyuwenyu/storage/releases/download/v0.1/ResNet34_vd_pretrained_from_paddle.pth',
50: 'https://github.com/lyuwenyu/storage/releases/download/v0.1/ResNet50_vd_ssld_v2_pretrained_from_paddle.pth',
101: 'https://github.com/lyuwenyu/storage/releases/download/v0.1/ResNet101_vd_ssld_pretrained_from_paddle.pth',
}
class ConvNormLayer(nn.Module):
def __init__(self, ch_in, ch_out, kernel_size, stride, padding=None, bias=False, act=None):
super().__init__()
self.conv = nn.Conv2d(
ch_in,
ch_out,
kernel_size,
stride,
padding=(kernel_size-1)//2 if padding is None else padding,
bias=bias)
self.norm = nn.BatchNorm2d(ch_out)
self.act = get_activation(act)
def forward(self, x):
return self.act(self.norm(self.conv(x)))
class BasicBlock(nn.Module):
expansion = 1
def __init__(self, ch_in, ch_out, stride, shortcut, act='relu', variant='b'):
super().__init__()
self.shortcut = shortcut
if not shortcut:
if variant == 'd' and stride == 2:
self.short = nn.Sequential(OrderedDict([
('pool', nn.AvgPool2d(2, 2, 0, ceil_mode=True)),
('conv', ConvNormLayer(ch_in, ch_out, 1, 1))
]))
else:
self.short = ConvNormLayer(ch_in, ch_out, 1, stride)
self.branch2a = ConvNormLayer(ch_in, ch_out, 3, stride, act=act)
self.branch2b = ConvNormLayer(ch_out, ch_out, 3, 1, act=None)
self.act = nn.Identity() if act is None else get_activation(act)
def forward(self, x):
out = self.branch2a(x)
out = self.branch2b(out)
if self.shortcut:
short = x
else:
short = self.short(x)
out = out + short
out = self.act(out)
return out
class BottleNeck(nn.Module):
expansion = 4
def __init__(self, ch_in, ch_out, stride, shortcut, act='relu', variant='b'):
super().__init__()
if variant == 'a':
stride1, stride2 = stride, 1
else:
stride1, stride2 = 1, stride
width = ch_out
self.branch2a = ConvNormLayer(ch_in, width, 1, stride1, act=act)
self.branch2b = ConvNormLayer(width, width, 3, stride2, act=act)
self.branch2c = ConvNormLayer(width, ch_out * self.expansion, 1, 1)
self.shortcut = shortcut
if not shortcut:
if variant == 'd' and stride == 2:
self.short = nn.Sequential(OrderedDict([
('pool', nn.AvgPool2d(2, 2, 0, ceil_mode=True)),
('conv', ConvNormLayer(ch_in, ch_out * self.expansion, 1, 1))
]))
else:
self.short = ConvNormLayer(ch_in, ch_out * self.expansion, 1, stride)
self.act = nn.Identity() if act is None else get_activation(act)
def forward(self, x):
out = self.branch2a(x)
out = self.branch2b(out)
out = self.branch2c(out)
if self.shortcut:
short = x
else:
short = self.short(x)
out = out + short
out = self.act(out)
return out
class Blocks(nn.Module):
def __init__(self, block, ch_in, ch_out, count, stage_num, act='relu', variant='b'):
super().__init__()
self.blocks = nn.ModuleList()
for i in range(count):
self.blocks.append(
block(
ch_in,
ch_out,
stride=2 if i == 0 and stage_num != 2 else 1,
shortcut=False if i == 0 else True,
variant=variant,
act=act)
)
if i == 0:
ch_in = ch_out * block.expansion
def forward(self, x):
out = x
for block in self.blocks:
out = block(out)
return out
@register()
class PResNet(nn.Module):
def __init__(
self,
depth,
variant='d',
num_stages=4,
return_idx=[0, 1, 2, 3],
act='relu',
freeze_at=-1,
freeze_norm=True,
pretrained=False):
super().__init__()
block_nums = ResNet_cfg[depth]
ch_in = 64
if variant in ['c', 'd']:
conv_def = [
[3, ch_in // 2, 3, 2, "conv1_1"],
[ch_in // 2, ch_in // 2, 3, 1, "conv1_2"],
[ch_in // 2, ch_in, 3, 1, "conv1_3"],
]
else:
conv_def = [[3, ch_in, 7, 2, "conv1_1"]]
self.conv1 = nn.Sequential(OrderedDict([
(name, ConvNormLayer(cin, cout, k, s, act=act)) for cin, cout, k, s, name in conv_def
]))
ch_out_list = [64, 128, 256, 512]
block = BottleNeck if depth >= 50 else BasicBlock
_out_channels = [block.expansion * v for v in ch_out_list]
_out_strides = [4, 8, 16, 32]
self.res_layers = nn.ModuleList()
for i in range(num_stages):
stage_num = i + 2
self.res_layers.append(
Blocks(block, ch_in, ch_out_list[i], block_nums[i], stage_num, act=act, variant=variant)
)
ch_in = _out_channels[i]
self.return_idx = return_idx
self.out_channels = [_out_channels[_i] for _i in return_idx]
self.out_strides = [_out_strides[_i] for _i in return_idx]
if freeze_at >= 0:
self._freeze_parameters(self.conv1)
for i in range(min(freeze_at, num_stages)):
self._freeze_parameters(self.res_layers[i])
if freeze_norm:
self._freeze_norm(self)
if pretrained:
if isinstance(pretrained, bool) or 'http' in pretrained:
state = torch.hub.load_state_dict_from_url(donwload_url[depth], map_location='cpu')
else:
state = torch.load(pretrained, map_location='cpu')
self.load_state_dict(state)
print(f'Load PResNet{depth} state_dict')
def _freeze_parameters(self, m: nn.Module):
for p in m.parameters():
p.requires_grad = False
def _freeze_norm(self, m: nn.Module):
if isinstance(m, nn.BatchNorm2d):
m = FrozenBatchNorm2d(m.num_features)
else:
for name, child in m.named_children():
_child = self._freeze_norm(child)
if _child is not child:
setattr(m, name, _child)
return m
def forward(self, x):
conv1 = self.conv1(x)
x = F.max_pool2d(conv1, kernel_size=3, stride=2, padding=1)
outs = []
for idx, stage in enumerate(self.res_layers):
x = stage(x)
if idx in self.return_idx:
outs.append(x)
return outs
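
The per-block channels and strides that `PResNet` and `Blocks` derive from `depth` can be laid out as a plain table. The sketch below reproduces that bookkeeping in pure Python; `presnet_plan` is a hypothetical helper, and each tuple is `(ch_in, block_output_channels, stride)`.

```python
RESNET_CFG = {
    18: [2, 2, 2, 2],
    34: [3, 4, 6, 3],
    50: [3, 4, 6, 3],
    101: [3, 4, 23, 3],
}


def presnet_plan(depth, num_stages=4):
    """List (ch_in, ch_out, stride) for every block, stage by stage."""
    expansion = 4 if depth >= 50 else 1  # BottleNeck vs BasicBlock
    ch_out_list = [64, 128, 256, 512]
    ch_in = 64
    plan = []
    for i in range(num_stages):
        stage_num = i + 2
        blocks = []
        for j in range(RESNET_CFG[depth][i]):
            # Only the first block of a stage downsamples, and stage 2 never does.
            stride = 2 if j == 0 and stage_num != 2 else 1
            blocks.append((ch_in, ch_out_list[i] * expansion, stride))
            if j == 0:
                ch_in = ch_out_list[i] * expansion
        plan.append(blocks)
    return plan


plan = presnet_plan(50)
out_channels = [stage[-1][1] for stage in plan]

# The stem (stride 2) and the max-pool (stride 2) give stride 4 before stage 2.
total, out_strides = 4, []
for stage in plan:
    for _, _, stride in stage:
        total *= stride
    out_strides.append(total)
print(out_channels, out_strides)
```

The first block of each stage is also the only one built with `shortcut=False`, because it is the only block whose input and output shapes differ.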

@@ -0,0 +1,81 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
from collections import OrderedDict
from ...core import register
class BasicBlock(nn.Module):
expansion = 1
def __init__(self, in_planes, planes, stride=1):
super(BasicBlock, self).__init__()
self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
self.bn1 = nn.BatchNorm2d(planes)
self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,stride=1, padding=1, bias=False)
self.bn2 = nn.BatchNorm2d(planes)
self.shortcut = nn.Sequential()
if stride != 1 or in_planes != self.expansion*planes:
self.shortcut = nn.Sequential(
nn.Conv2d(in_planes, self.expansion*planes,kernel_size=1, stride=stride, bias=False),
nn.BatchNorm2d(self.expansion*planes)
)
def forward(self, x):
out = F.relu(self.bn1(self.conv1(x)))
out = self.bn2(self.conv2(out))
out += self.shortcut(x)
out = F.relu(out)
return out
class _ResNet(nn.Module):
def __init__(self, block, num_blocks, num_classes=10):
super().__init__()
self.in_planes = 64
self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
self.bn1 = nn.BatchNorm2d(64)
self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
self.linear = nn.Linear(512 * block.expansion, num_classes)
def _make_layer(self, block, planes, num_blocks, stride):
strides = [stride] + [1]*(num_blocks-1)
layers = []
for stride in strides:
layers.append(block(self.in_planes, planes, stride))
self.in_planes = planes * block.expansion
return nn.Sequential(*layers)
def forward(self, x):
out = F.relu(self.bn1(self.conv1(x)))
out = self.layer1(out)
out = self.layer2(out)
out = self.layer3(out)
out = self.layer4(out)
out = F.avg_pool2d(out, 4)
out = out.view(out.size(0), -1)
out = self.linear(out)
return out
@register()
class MResNet(nn.Module):
def __init__(self, num_classes=10, num_blocks=[2, 2, 2, 2]) -> None:
super().__init__()
self.model = _ResNet(BasicBlock, num_blocks, num_classes)
def forward(self, x):
return self.model(x)

@@ -0,0 +1,70 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
https://towardsdatascience.com/getting-started-with-pytorch-image-models-timm-a-practitioners-guide-4e77b4bf9055#0583
"""
import torch
from torchvision.models.feature_extraction import get_graph_node_names, create_feature_extractor
from .utils import IntermediateLayerGetter
from ...core import register
@register()
class TimmModel(torch.nn.Module):
def __init__(self,
name,
return_layers,
pretrained=False,
exportable=True,
features_only=True,
**kwargs) -> None:
super().__init__()
import timm
model = timm.create_model(
name,
pretrained=pretrained,
exportable=exportable,
features_only=features_only,
**kwargs
)
# nodes, _ = get_graph_node_names(model)
# print(nodes)
# features = {'': ''}
# model = create_feature_extractor(model, return_nodes=features)
assert set(return_layers).issubset(model.feature_info.module_name()), \
f'return_layers should be a subset of {model.feature_info.module_name()}'
# self.model = model
self.model = IntermediateLayerGetter(model, return_layers)
return_idx = [model.feature_info.module_name().index(name) for name in return_layers]
self.strides = [model.feature_info.reduction()[i] for i in return_idx]
self.channels = [model.feature_info.channels()[i] for i in return_idx]
self.return_idx = return_idx
self.return_layers = return_layers
def forward(self, x: torch.Tensor):
outputs = self.model(x)
# outputs = [outputs[i] for i in self.return_idx]
return outputs
if __name__ == '__main__':
model = TimmModel(name='resnet34', return_layers=['layer2', 'layer3'])
data = torch.rand(1, 3, 640, 640)
outputs = model(data)
for output in outputs:
print(output.shape)
"""
model:
type: TimmModel
name: resnet34
return_layers: ['layer2', 'layer4']
"""

@@ -0,0 +1,49 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch
import torchvision
from ...core import register
from .utils import IntermediateLayerGetter
__all__ = ['TorchVisionModel']
@register()
class TorchVisionModel(torch.nn.Module):
def __init__(self, name, return_layers, weights=None, **kwargs) -> None:
super().__init__()
if weights is not None:
weights = getattr(torchvision.models.get_model_weights(name), weights)
model = torchvision.models.get_model(name, weights=weights, **kwargs)
# TODO hard code.
if hasattr(model, 'features'):
model = IntermediateLayerGetter(model.features, return_layers)
else:
model = IntermediateLayerGetter(model, return_layers)
self.model = model
def forward(self, x):
return self.model(x)
# TorchVisionModel('swin_t', return_layers=['5', '7'])
# TorchVisionModel('resnet34', return_layers=['layer2','layer3', 'layer4'])
"""
TorchVisionModel:
name: swin_t
return_layers: ['5', '7']
weights: DEFAULT
model:
type: TorchVisionModel
name: resnet34
return_layers: ['layer2','layer3', 'layer4']
weights: DEFAULT
"""

@@ -0,0 +1,55 @@
"""
https://github.com/pytorch/vision/blob/main/torchvision/models/_utils.py
Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
from collections import OrderedDict
from typing import Dict, List
import torch.nn as nn
class IntermediateLayerGetter(nn.ModuleDict):
"""
Module wrapper that returns intermediate layers from a model
It has a strong assumption that the modules have been registered
into the model in the same order as they are used.
This means that one should **not** reuse the same nn.Module
twice in the forward if you want this to work.
Additionally, it is only able to query submodules that are directly
assigned to the model. So if `model` is passed, `model.feature1` can
be returned, but not `model.feature1.layer2`.
"""
_version = 3
def __init__(self, model: nn.Module, return_layers: List[str]) -> None:
if not set(return_layers).issubset([name for name, _ in model.named_children()]):
raise ValueError("return_layers are not present in model. {}"\
.format([name for name, _ in model.named_children()]))
orig_return_layers = return_layers
return_layers = {str(k): str(k) for k in return_layers}
layers = OrderedDict()
for name, module in model.named_children():
layers[name] = module
if name in return_layers:
del return_layers[name]
if not return_layers:
break
super().__init__(layers)
self.return_layers = orig_return_layers
def forward(self, x):
outputs = []
for name, module in self.items():
x = module(x)
if name in self.return_layers:
outputs.append(x)
return outputs
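
The wrapper above keeps child modules only up to the last requested layer, then collects the outputs of the requested layers during the forward pass. A minimal pure-Python sketch of that control flow follows; the `collect_intermediate` helper and the arithmetic "stages" are hypothetical stand-ins for real `nn.Module` children, not part of the repository.

```python
from collections import OrderedDict
from typing import Callable, List


def collect_intermediate(children: "OrderedDict[str, Callable]", return_layers: List[str], x):
    """Run named stages in order and collect the outputs of the requested ones."""
    missing = set(return_layers) - set(children)
    if missing:
        raise ValueError(f"return_layers not present: {sorted(missing)}")

    # Keep stages only up to the last requested one; later stages are dropped.
    remaining = set(return_layers)
    kept = OrderedDict()
    for name, stage in children.items():
        kept[name] = stage
        remaining.discard(name)
        if not remaining:
            break

    outputs = []
    for name, stage in kept.items():
        x = stage(x)
        if name in return_layers:
            outputs.append(x)
    return outputs, list(kept)


# Hypothetical stages: each one just transforms a number.
stages = OrderedDict(
    conv1=lambda v: v + 1,
    layer1=lambda v: v * 2,
    layer2=lambda v: v * 3,
    layer3=lambda v: v - 4,
    fc=lambda v: v * 100,
)
outputs, kept_names = collect_intermediate(stages, ["layer1", "layer3"], 1)
print(outputs, kept_names)
```

Because `fc` comes after the last requested layer, it is never stored or executed, which is why the wrapper can be used to strip a classification head from a backbone.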


@@ -0,0 +1,10 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch.nn as nn
from ...core import register
from .det_criterion import DetCriterion
CrossEntropyLoss = register()(nn.CrossEntropyLoss)


@@ -0,0 +1,171 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch
import torch.nn.functional as F
import torch.distributed
import torchvision
from ...misc import box_ops
from ...misc import dist_utils
from ...core import register
@register()
class DetCriterion(torch.nn.Module):
"""Default Detection Criterion
"""
__share__ = ['num_classes']
__inject__ = ['matcher']
def __init__(self,
losses,
weight_dict,
num_classes=80,
alpha=0.75,
gamma=2.0,
box_fmt='cxcywh',
matcher=None):
"""
Args:
losses (list[str]): requested losses, supports ['boxes', 'giou', 'vfl', 'focal']
weight_dict (dict[str, float]): weight for each loss term; keys include
['loss_bbox', 'loss_giou', 'loss_vfl', 'loss_focal']
box_fmt (str): in box format, 'cxcywh' or 'xyxy'
matcher (Matcher): matcher used to match source to target
"""
super().__init__()
self.losses = losses
self.weight_dict = weight_dict
self.alpha = alpha
self.gamma = gamma
self.num_classes = num_classes
self.box_fmt = box_fmt
assert matcher is not None, 'DetCriterion requires a matcher'
self.matcher = matcher
def forward(self, outputs, targets, **kwargs):
"""
Args:
outputs: Dict[str, Tensor], with keys 'pred_boxes', 'pred_logits', 'meta'.
targets: List[Dict[str, Tensor]], len(targets) == batch_size.
kwargs: other information such as the current epoch id.
Return:
losses, Dict[str, Tensor]
"""
matched = self.matcher(outputs, targets)
values = matched['values']
indices = matched['indices']
num_boxes = self._get_positive_nums(indices)
# Compute all the requested losses
losses = {}
for loss in self.losses:
l_dict = self.get_loss(loss, outputs, targets, indices, num_boxes)
l_dict = {k: l_dict[k] * self.weight_dict[k] for k in l_dict if k in self.weight_dict}
losses.update(l_dict)
return losses
def _get_src_permutation_idx(self, indices):
# permute predictions following indices
batch_idx = torch.cat([torch.full_like(src, i) for i, (src, _) in enumerate(indices)])
src_idx = torch.cat([src for (src, _) in indices])
return batch_idx, src_idx
def _get_tgt_permutation_idx(self, indices):
# permute targets following indices
batch_idx = torch.cat([torch.full_like(tgt, i) for i, (_, tgt) in enumerate(indices)])
tgt_idx = torch.cat([tgt for (_, tgt) in indices])
return batch_idx, tgt_idx
def _get_positive_nums(self, indices):
# number of positive samples
num_pos = sum(len(i) for (i, _) in indices)
num_pos = torch.as_tensor([num_pos], dtype=torch.float32, device=indices[0][0].device)
if dist_utils.is_dist_available_and_initialized():
torch.distributed.all_reduce(num_pos)
num_pos = torch.clamp(num_pos / dist_utils.get_world_size(), min=1).item()
return num_pos
def loss_labels_focal(self, outputs, targets, indices, num_boxes):
assert 'pred_logits' in outputs
src_logits = outputs['pred_logits']
idx = self._get_src_permutation_idx(indices)
target_classes_o = torch.cat([t["labels"][j] for t, (_, j) in zip(targets, indices)])
target_classes = torch.full(src_logits.shape[:2], self.num_classes,
dtype=torch.int64, device=src_logits.device)
target_classes[idx] = target_classes_o
target = F.one_hot(target_classes, num_classes=self.num_classes + 1)[..., :-1].to(src_logits.dtype)
loss = torchvision.ops.sigmoid_focal_loss(src_logits, target, self.alpha, self.gamma, reduction='none')
loss = loss.sum() / num_boxes
return {'loss_focal': loss}
def loss_labels_vfl(self, outputs, targets, indices, num_boxes):
assert 'pred_boxes' in outputs
idx = self._get_src_permutation_idx(indices)
src_boxes = outputs['pred_boxes'][idx]
target_boxes = torch.cat([t['boxes'][j] for t, (_, j) in zip(targets, indices)], dim=0)
src_boxes = torchvision.ops.box_convert(src_boxes, in_fmt=self.box_fmt, out_fmt='xyxy')
target_boxes = torchvision.ops.box_convert(target_boxes, in_fmt=self.box_fmt, out_fmt='xyxy')
iou, _ = box_ops.elementwise_box_iou(src_boxes.detach(), target_boxes)
src_logits: torch.Tensor = outputs['pred_logits']
target_classes_o = torch.cat([t["labels"][j] for t, (_, j) in zip(targets, indices)])
target_classes = torch.full(src_logits.shape[:2], self.num_classes,
dtype=torch.int64, device=src_logits.device)
target_classes[idx] = target_classes_o
target = F.one_hot(target_classes, num_classes=self.num_classes + 1)[..., :-1]
target_score_o = torch.zeros_like(target_classes, dtype=src_logits.dtype)
target_score_o[idx] = iou.to(src_logits.dtype)
target_score = target_score_o.unsqueeze(-1) * target
src_score = F.sigmoid(src_logits.detach())
weight = self.alpha * src_score.pow(self.gamma) * (1 - target) + target_score
loss = F.binary_cross_entropy_with_logits(src_logits, target_score, weight=weight, reduction='none')
loss = loss.sum() / num_boxes
return {'loss_vfl': loss}
def loss_boxes(self, outputs, targets, indices, num_boxes):
assert 'pred_boxes' in outputs
idx = self._get_src_permutation_idx(indices)
src_boxes = outputs['pred_boxes'][idx]
target_boxes = torch.cat([t['boxes'][i] for t, (_, i) in zip(targets, indices)], dim=0)
losses = {}
loss_bbox = F.l1_loss(src_boxes, target_boxes, reduction='none')
losses['loss_bbox'] = loss_bbox.sum() / num_boxes
src_boxes = torchvision.ops.box_convert(src_boxes, in_fmt=self.box_fmt, out_fmt='xyxy')
target_boxes = torchvision.ops.box_convert(target_boxes, in_fmt=self.box_fmt, out_fmt='xyxy')
loss_giou = 1 - box_ops.elementwise_generalized_box_iou(src_boxes, target_boxes)
losses['loss_giou'] = loss_giou.sum() / num_boxes
return losses
def loss_boxes_giou(self, outputs, targets, indices, num_boxes):
assert 'pred_boxes' in outputs
idx = self._get_src_permutation_idx(indices)
src_boxes = outputs['pred_boxes'][idx]
target_boxes = torch.cat([t['boxes'][i] for t, (_, i) in zip(targets, indices)], dim=0)
losses = {}
src_boxes = torchvision.ops.box_convert(src_boxes, in_fmt=self.box_fmt, out_fmt='xyxy')
target_boxes = torchvision.ops.box_convert(target_boxes, in_fmt=self.box_fmt, out_fmt='xyxy')
loss_giou = 1 - box_ops.elementwise_generalized_box_iou(src_boxes, target_boxes)
losses['loss_giou'] = loss_giou.sum() / num_boxes
return losses
def get_loss(self, loss, outputs, targets, indices, num_boxes, **kwargs):
loss_map = {
'boxes': self.loss_boxes,
'giou': self.loss_boxes_giou,
'vfl': self.loss_labels_vfl,
'focal': self.loss_labels_focal,
}
assert loss in loss_map, f'do you really want to compute {loss} loss?'
return loss_map[loss](outputs, targets, indices, num_boxes, **kwargs)
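
The `loss_labels_vfl` method weights a binary cross-entropy term so that positives are supervised with their IoU as a soft target and negatives are down-weighted by `alpha * p**gamma`. A scalar sketch of one element of that loss, written in plain Python rather than tensors, is shown below; the helper names are hypothetical and the code only mirrors the arithmetic visible in the method.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def bce_with_logits(logit: float, target: float) -> float:
    # Numerically stable form: max(z, 0) - z * t + log(1 + exp(-|z|))
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def vfl_element(logit, is_target_class, iou, alpha=0.75, gamma=2.0):
    """Loss for one (query, class) entry, before summing and dividing by num_boxes."""
    target = 1.0 if is_target_class else 0.0
    target_score = iou * target  # soft label: IoU for the matched class, else 0
    weight = alpha * sigmoid(logit) ** gamma * (1.0 - target) + target_score
    return weight * bce_with_logits(logit, target_score)


print(vfl_element(2.0, True, 0.8))    # matched class with IoU 0.8
print(vfl_element(2.0, False, 0.0))   # confident false positive
print(vfl_element(-4.0, False, 0.0))  # easy negative
```

Note that a matched entry with IoU 0 gets zero weight, so poorly localized matches contribute little to the classification loss.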


@@ -0,0 +1,5 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
from .nms_postprocessor import DetNMSPostProcessor


@@ -0,0 +1,62 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch
import torchvision
from torch import Tensor
from enum import Enum
class BoxProcessFormat(Enum):
"""Box process format
Available formats are
* ``RESIZE``
* ``RESIZE_KEEP_RATIO``
* ``RESIZE_KEEP_RATIO_PADDING``
"""
RESIZE = 1
RESIZE_KEEP_RATIO = 2
RESIZE_KEEP_RATIO_PADDING = 3
def box_revert(
boxes: Tensor,
orig_sizes: Tensor=None,
eval_sizes: Tensor=None,
inpt_sizes: Tensor=None,
inpt_padding: Tensor=None,
normalized: bool=True,
in_fmt: str='cxcywh',
out_fmt: str='xyxy',
process_fmt=BoxProcessFormat.RESIZE,
) -> Tensor:
"""
Args:
boxes(Tensor), [N, :, 4], pred boxes laid out according to `in_fmt` ('cxcywh' or 'xyxy').
inpt_sizes(Tensor), [N, 2], (w, h). input sizes.
orig_sizes(Tensor), [N, 2], (w, h). origin sizes.
inpt_padding (Tensor), [N, 2], (w_pad, h_pad, ...).
(inpt_sizes + inpt_padding) == eval_sizes
"""
assert in_fmt in ('cxcywh', 'xyxy'), f'unsupported in_fmt: {in_fmt}'
if normalized and eval_sizes is not None:
boxes = boxes * eval_sizes.repeat(1, 2).unsqueeze(1)
if inpt_padding is not None:
if in_fmt == 'xyxy':
boxes -= inpt_padding[:, :2].repeat(1, 2).unsqueeze(1)
elif in_fmt == 'cxcywh':
# only the centre (cx, cy) shifts with padding, so subtract a [N, 1, 2] offset
boxes[..., :2] -= inpt_padding[:, :2].unsqueeze(1)
if orig_sizes is not None:
orig_sizes = orig_sizes.repeat(1, 2).unsqueeze(1)
if inpt_sizes is not None:
inpt_sizes = inpt_sizes.repeat(1, 2).unsqueeze(1)
boxes = boxes * (orig_sizes / inpt_sizes)
else:
boxes = boxes * orig_sizes
boxes = torchvision.ops.box_convert(boxes, in_fmt=in_fmt, out_fmt=out_fmt)
return boxes
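
For the simplest path through `box_revert` (normalized `cxcywh` boxes, no padding, only `orig_sizes` given), the function scales each box by the original image size and converts it to `xyxy`. A single-box sketch in plain Python, with a hypothetical `revert_box` helper, looks like this:

```python
def revert_box(box, orig_size):
    """Map one normalized (cx, cy, w, h) box to pixel (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    ow, oh = orig_size  # (width, height) of the original image
    cx, w = cx * ow, w * ow
    cy, h = cy * oh, h * oh
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


print(revert_box((0.5, 0.5, 0.25, 0.5), (640, 480)))
```

The tensor version does the same thing for a whole batch by repeating `(w, h)` into `(w, h, w, h)` and broadcasting over the query dimension.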


@@ -0,0 +1,81 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
__all__ = ['DetDETRPostProcessor']
from .box_revert import box_revert
from .box_revert import BoxProcessFormat
def mod(a, b):
out = a - a // b * b
return out
class DetDETRPostProcessor(nn.Module):
def __init__(
self,
num_classes=80,
use_focal_loss=True,
num_top_queries=300,
box_process_format=BoxProcessFormat.RESIZE,
) -> None:
super().__init__()
self.use_focal_loss = use_focal_loss
self.num_top_queries = num_top_queries
self.num_classes = int(num_classes)
self.box_process_format = box_process_format
self.deploy_mode = False
def extra_repr(self) -> str:
return f'use_focal_loss={self.use_focal_loss}, num_classes={self.num_classes}, num_top_queries={self.num_top_queries}'
def forward(self, outputs, **kwargs):
logits, boxes = outputs['pred_logits'], outputs['pred_boxes']
if self.use_focal_loss:
scores = F.sigmoid(logits)
scores, index = torch.topk(scores.flatten(1), self.num_top_queries, dim=-1)
labels = index % self.num_classes
# labels = mod(index, self.num_classes) # for tensorrt
index = index // self.num_classes
boxes = boxes.gather(dim=1, index=index.unsqueeze(-1).repeat(1, 1, boxes.shape[-1]))
else:
scores = F.softmax(logits, dim=-1)[:, :, :-1]
scores, labels = scores.max(dim=-1)
if scores.shape[1] > self.num_top_queries:
scores, index = torch.topk(scores, self.num_top_queries, dim=-1)
labels = torch.gather(labels, dim=1, index=index)
boxes = torch.gather(boxes, dim=1, index=index.unsqueeze(-1).tile(1, 1, boxes.shape[-1]))
if kwargs is not None:
boxes = box_revert(
boxes,
in_fmt='cxcywh',
out_fmt='xyxy',
process_fmt=self.box_process_format,
normalized=True,
**kwargs,
)
# TODO for onnx export
if self.deploy_mode:
return labels, boxes, scores
results = []
for lab, box, sco in zip(labels, boxes, scores):
result = dict(labels=lab, boxes=box, scores=sco)
results.append(result)
return results
def deploy(self, ):
self.eval()
self.deploy_mode = True
return self
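
In the focal-loss branch of `forward`, top-k is taken over the flattened `(query, class)` score grid, and each flat index is decoded with `%` and `//`. The plain-Python sketch below, with a hypothetical `topk_decode` helper, shows that decoding on a tiny example.

```python
def topk_decode(scores, k):
    """scores: list of per-query lists of per-class scores.

    Returns (score, query_index, class_index) for the k best entries.
    """
    num_classes = len(scores[0])
    flat = [s for query in scores for s in query]
    order = sorted(range(len(flat)), key=lambda i: flat[i], reverse=True)[:k]
    return [(flat[i], i // num_classes, i % num_classes) for i in order]


scores = [
    [0.10, 0.90, 0.20],  # query 0
    [0.80, 0.05, 0.70],  # query 1
]
top = topk_decode(scores, 3)
print(top)
```

Query 1 appears twice in the result with different labels, which is why the boxes are gathered by the decoded query index rather than assumed to be unique per query.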


@@ -0,0 +1,79 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch
import torch.nn.functional as F
import torch.distributed
import torchvision
from torch import Tensor
from ...core import register
from typing import Dict
__all__ = ['DetNMSPostProcessor', ]
@register()
class DetNMSPostProcessor(torch.nn.Module):
def __init__(self, \
iou_threshold=0.7,
score_threshold=0.01,
keep_topk=300,
box_fmt='cxcywh',
logit_fmt='sigmoid') -> None:
super().__init__()
self.iou_threshold = iou_threshold
self.score_threshold = score_threshold
self.keep_topk = keep_topk
self.box_fmt = box_fmt.lower()
self.logit_fmt = logit_fmt.lower()
self.logit_func = getattr(F, self.logit_fmt, None)
self.deploy_mode = False
def forward(self, outputs: Dict[str, Tensor], orig_target_sizes: Tensor):
logits, boxes = outputs['pred_logits'], outputs['pred_boxes']
pred_boxes = torchvision.ops.box_convert(boxes, in_fmt=self.box_fmt, out_fmt='xyxy')
pred_boxes *= orig_target_sizes.repeat(1, 2).unsqueeze(1)
values, pred_labels = torch.max(logits, dim=-1)
if self.logit_func:
pred_scores = self.logit_func(values)
else:
pred_scores = values
# TODO for onnx export
if self.deploy_mode:
blobs = {
'pred_labels': pred_labels,
'pred_boxes': pred_boxes,
'pred_scores': pred_scores
}
return blobs
results = []
for i in range(logits.shape[0]):
score_keep = pred_scores[i] > self.score_threshold
pred_box = pred_boxes[i][score_keep]
pred_label = pred_labels[i][score_keep]
pred_score = pred_scores[i][score_keep]
keep = torchvision.ops.batched_nms(pred_box, pred_score, pred_label, self.iou_threshold)
keep = keep[:self.keep_topk]
blob = {
'labels': pred_label[keep],
'boxes': pred_box[keep],
'scores': pred_score[keep],
}
results.append(blob)
return results
def deploy(self, ):
self.eval()
self.deploy_mode = True
return self
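
The non-deploy path filters by score, runs class-aware NMS, and keeps at most `keep_topk` detections per image. The sketch below reproduces that sequence in plain Python, assuming the usual greedy behavior where a box is dropped if it overlaps a higher-scoring kept box of the same class by more than the IoU threshold. The `iou` and `class_aware_nms` helpers are hypothetical and are not the torchvision implementation.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def class_aware_nms(boxes, scores, labels,
                    iou_threshold=0.7, score_threshold=0.01, keep_topk=300):
    # 1) score filter, 2) greedy suppression within each class, 3) top-k cut
    candidates = [i for i, s in enumerate(scores) if s > score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    keep = []
    for i in candidates:
        suppressed = any(
            labels[i] == labels[j] and iou(boxes[i], boxes[j]) > iou_threshold
            for j in keep
        )
        if not suppressed:
            keep.append(i)
    return keep[:keep_topk]


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (0, 0, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7, 0.005]
labels = [0, 0, 1, 0]
keep = class_aware_nms(boxes, scores, labels)
print(keep)
```

Box 1 is suppressed by box 0 (same class, IoU 0.81), box 2 survives because it has a different label, and box 3 falls below the score threshold.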


@@ -0,0 +1,7 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
from .ema import *
from .optim import *
from .amp import *
from .warmup import *


@@ -0,0 +1,12 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch.cuda.amp as amp
from ..core import register
__all__ = ['GradScaler']
GradScaler = register()(amp.grad_scaler.GradScaler)


@@ -0,0 +1,92 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch
import torch.nn as nn
import math
from copy import deepcopy
from ..core import register
from ..misc import dist_utils
__all__ = ['ModelEMA']
@register()
class ModelEMA(object):
"""
Model Exponential Moving Average from https://github.com/rwightman/pytorch-image-models
Keep a moving average of everything in the model state_dict (parameters and buffers).
This is intended to allow functionality like
https://www.tensorflow.org/api_docs/python/tf/train/ExponentialMovingAverage
A smoothed version of the weights is necessary for some training schemes to perform well.
This class is sensitive to where it is initialized in the sequence of model init,
GPU assignment and distributed training wrappers.
"""
def __init__(self, model: nn.Module, decay: float=0.9999, warmups: int=2000, ):
super().__init__()
self.module = deepcopy(dist_utils.de_parallel(model)).eval()
# if next(model.parameters()).device.type != 'cpu':
# self.module.half() # FP16 EMA
self.decay = decay
self.warmups = warmups
self.updates = 0 # number of EMA updates
self.decay_fn = lambda x: decay * (1 - math.exp(-x / warmups)) # decay exponential ramp (to help early epochs)
for p in self.module.parameters():
p.requires_grad_(False)
def update(self, model: nn.Module):
# Update EMA parameters
with torch.no_grad():
self.updates += 1
d = self.decay_fn(self.updates)
msd = dist_utils.de_parallel(model).state_dict()
for k, v in self.module.state_dict().items():
if v.dtype.is_floating_point:
v *= d
v += (1 - d) * msd[k].detach()
def to(self, *args, **kwargs):
self.module = self.module.to(*args, **kwargs)
return self
def state_dict(self, ):
return dict(module=self.module.state_dict(), updates=self.updates)
def load_state_dict(self, state, strict=True):
self.module.load_state_dict(state['module'], strict=strict)
if 'updates' in state:
self.updates = state['updates']
def forward(self, ):
raise RuntimeError('ema...')
def extra_repr(self) -> str:
return f'decay={self.decay}, warmups={self.warmups}'
class ExponentialMovingAverage(torch.optim.swa_utils.AveragedModel):
"""Maintains moving averages of model parameters using an exponential decay.
``ema_avg = decay * avg_model_param + (1 - decay) * model_param``
`torch.optim.swa_utils.AveragedModel <https://pytorch.org/docs/stable/optim.html#custom-averaging-strategies>`_
is used to compute the EMA.
"""
def __init__(self, model, decay, device="cpu", use_buffers=True):
self.decay_fn = lambda x: decay * (1 - math.exp(-x / 2000))
def ema_avg(avg_model_param, model_param, num_averaged):
decay = self.decay_fn(num_averaged)
return decay * avg_model_param + (1 - decay) * model_param
super().__init__(model, device, ema_avg, use_buffers=use_buffers)
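
Both EMA classes ramp the decay from 0 towards `decay` with `decay * (1 - exp(-x / warmups))`, so the averaged weights follow the live model closely during the first updates. A scalar sketch of the ramp and a single update, using hypothetical helper names, is shown below.

```python
import math


def ema_decay(step, decay=0.9999, warmups=2000):
    """Effective decay after `step` updates."""
    return decay * (1 - math.exp(-step / warmups))


def ema_update(avg, value, step, decay=0.9999, warmups=2000):
    """One EMA step on a scalar: avg <- d * avg + (1 - d) * value."""
    d = ema_decay(step, decay, warmups)
    return d * avg + (1 - d) * value


for step in (1, 2000, 20000):
    print(step, ema_decay(step))
```

At step 1 the effective decay is close to 0, so the average is almost entirely replaced by the current value; it only approaches `decay` after several multiples of `warmups` updates.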


@@ -0,0 +1,23 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch.optim as optim
import torch.optim.lr_scheduler as lr_scheduler
from ..core import register
__all__ = ['AdamW', 'SGD', 'Adam', 'MultiStepLR', 'CosineAnnealingLR', 'OneCycleLR', 'LambdaLR']
SGD = register()(optim.SGD)
Adam = register()(optim.Adam)
AdamW = register()(optim.AdamW)
MultiStepLR = register()(lr_scheduler.MultiStepLR)
CosineAnnealingLR = register()(lr_scheduler.CosineAnnealingLR)
OneCycleLR = register()(lr_scheduler.OneCycleLR)
LambdaLR = register()(lr_scheduler.LambdaLR)


@@ -0,0 +1,47 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
from torch.optim.lr_scheduler import LRScheduler
from ..core import register
class Warmup(object):
def __init__(self, lr_scheduler: LRScheduler, warmup_duration: int, last_step: int=-1) -> None:
self.lr_scheduler = lr_scheduler
self.warmup_end_values = [pg['lr'] for pg in lr_scheduler.optimizer.param_groups]
self.last_step = last_step
self.warmup_duration = warmup_duration
self.step()
def state_dict(self):
return {k: v for k, v in self.__dict__.items() if k != 'lr_scheduler'}
def load_state_dict(self, state_dict):
self.__dict__.update(state_dict)
def get_warmup_factor(self, step, **kwargs):
raise NotImplementedError
def step(self, ):
self.last_step += 1
if self.last_step >= self.warmup_duration:
return
factor = self.get_warmup_factor(self.last_step)
for i, pg in enumerate(self.lr_scheduler.optimizer.param_groups):
pg['lr'] = factor * self.warmup_end_values[i]
def finished(self, ):
if self.last_step >= self.warmup_duration:
return True
return False
@register()
class LinearWarmup(Warmup):
def __init__(self, lr_scheduler: LRScheduler, warmup_duration: int, last_step: int = -1) -> None:
super().__init__(lr_scheduler, warmup_duration, last_step)
def get_warmup_factor(self, step):
return min(1.0, (step + 1) / self.warmup_duration)
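
`LinearWarmup` scales each parameter group's learning rate by `(step + 1) / warmup_duration` until the warmup ends, after which it stops touching the rate. The sketch below traces the resulting schedule for a single group in plain Python; `linear_warmup_lrs` is a hypothetical helper and ignores any main scheduler that would also adjust the rate.

```python
def linear_warmup_lrs(base_lr, warmup_duration, num_steps):
    """Learning rate seen at each step when only the warmup is active."""
    lrs = []
    lr = base_lr
    for step in range(num_steps):
        if step < warmup_duration:
            lr = base_lr * min(1.0, (step + 1) / warmup_duration)
        # after warmup the rate is left unchanged
        lrs.append(lr)
    return lrs


print(linear_warmup_lrs(1.0, 4, 6))
```

The last warmup step already uses a factor of 1.0, so the hand-off to the regular scheduler happens at the full base rate.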


@@ -0,0 +1,15 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
from ._solver import BaseSolver
from .clas_solver import ClasSolver
from .det_solver import DetSolver
from typing import Dict
TASKS :Dict[str, BaseSolver] = {
'classification': ClasSolver,
'detection': DetSolver,
}


@@ -0,0 +1,191 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch
import torch.nn as nn
from datetime import datetime
from pathlib import Path
from typing import Dict
import atexit
from ..misc import dist_utils
from ..core import BaseConfig
def to(m: nn.Module, device: str):
if m is None:
return None
return m.to(device)
class BaseSolver(object):
def __init__(self, cfg: BaseConfig) -> None:
self.cfg = cfg
def _setup(self, ):
"""Avoid instantiating unnecessary classes
"""
cfg = self.cfg
if cfg.device:
device = torch.device(cfg.device)
else:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
self.model = cfg.model
# NOTE (lyuwenyu): must load_tuning_state before ema instance building
if self.cfg.tuning:
print(f'tuning checkpoint from {self.cfg.tuning}')
self.load_tuning_state(self.cfg.tuning)
self.model = dist_utils.warp_model(self.model.to(device), sync_bn=cfg.sync_bn, \
find_unused_parameters=cfg.find_unused_parameters)
self.criterion = to(cfg.criterion, device)
self.postprocessor = to(cfg.postprocessor, device)
self.ema = to(cfg.ema, device)
self.scaler = cfg.scaler
self.device = device
self.last_epoch = self.cfg.last_epoch
self.output_dir = Path(cfg.output_dir)
self.output_dir.mkdir(parents=True, exist_ok=True)
self.writer = cfg.writer
if self.writer:
atexit.register(self.writer.close)
if dist_utils.is_main_process():
self.writer.add_text('config', '{:s}'.format(cfg.__repr__()), 0)
def cleanup(self, ):
if self.writer:
self.writer.close()
def train(self, ):
self._setup()
self.optimizer = self.cfg.optimizer
self.lr_scheduler = self.cfg.lr_scheduler
self.lr_warmup_scheduler = self.cfg.lr_warmup_scheduler
self.train_dataloader = dist_utils.warp_loader(self.cfg.train_dataloader, \
shuffle=self.cfg.train_dataloader.shuffle)
self.val_dataloader = dist_utils.warp_loader(self.cfg.val_dataloader, \
shuffle=self.cfg.val_dataloader.shuffle)
self.evaluator = self.cfg.evaluator
# NOTE instantiating order
if self.cfg.resume:
print(f'Resume checkpoint from {self.cfg.resume}')
self.load_resume_state(self.cfg.resume)
def eval(self, ):
self._setup()
self.val_dataloader = dist_utils.warp_loader(self.cfg.val_dataloader, \
shuffle=self.cfg.val_dataloader.shuffle)
self.evaluator = self.cfg.evaluator
if self.cfg.resume:
print(f'Resume checkpoint from {self.cfg.resume}')
self.load_resume_state(self.cfg.resume)
def to(self, device):
for k, v in self.__dict__.items():
if hasattr(v, 'to'):
v.to(device)
def state_dict(self):
"""state dict, train/eval
"""
state = {}
state['date'] = datetime.now().isoformat()
# TODO for resume
state['last_epoch'] = self.last_epoch
for k, v in self.__dict__.items():
if hasattr(v, 'state_dict'):
v = dist_utils.de_parallel(v)
state[k] = v.state_dict()
return state
def load_state_dict(self, state):
"""load state dict, train/eval
"""
# TODO
if 'last_epoch' in state:
self.last_epoch = state['last_epoch']
print('Load last_epoch')
for k, v in self.__dict__.items():
if hasattr(v, 'load_state_dict') and k in state:
v = dist_utils.de_parallel(v)
v.load_state_dict(state[k])
print(f'Load {k}.state_dict')
if hasattr(v, 'load_state_dict') and k not in state:
print(f'Not load {k}.state_dict')
def load_resume_state(self, path: str):
"""load resume
"""
# for cuda:0 memory
if path.startswith('http'):
state = torch.hub.load_state_dict_from_url(path, map_location='cpu')
else:
state = torch.load(path, map_location='cpu')
self.load_state_dict(state)
def load_tuning_state(self, path: str,):
"""only load model weights for tuning and skip missing/mismatched keys
"""
if path.startswith('http'):
state = torch.hub.load_state_dict_from_url(path, map_location='cpu')
else:
state = torch.load(path, map_location='cpu')
module = dist_utils.de_parallel(self.model)
# TODO hard code
if 'ema' in state:
stat, infos = self._matched_state(module.state_dict(), state['ema']['module'])
else:
stat, infos = self._matched_state(module.state_dict(), state['model'])
module.load_state_dict(stat, strict=False)
print(f'Load model.state_dict, {infos}')
@staticmethod
def _matched_state(state: Dict[str, torch.Tensor], params: Dict[str, torch.Tensor]):
missed_list = []
unmatched_list = []
matched_state = {}
for k, v in state.items():
if k in params:
if v.shape == params[k].shape:
matched_state[k] = params[k]
else:
unmatched_list.append(k)
else:
missed_list.append(k)
return matched_state, {'missed': missed_list, 'unmatched': unmatched_list}
def fit(self, ):
raise NotImplementedError('')
def val(self, ):
raise NotImplementedError('')
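
`_matched_state` keeps only checkpoint entries whose name and shape both match the current model, which is what allows fine-tuning with a different number of classes. The sketch below applies the same rule to plain dictionaries of shapes instead of tensors; the `matched_state` function and the parameter names are hypothetical.

```python
def matched_state(model_shapes, ckpt_shapes):
    """Split model keys into matched, missed (absent), and unmatched (shape differs)."""
    matched, missed, unmatched = {}, [], []
    for name, shape in model_shapes.items():
        if name not in ckpt_shapes:
            missed.append(name)
        elif shape != ckpt_shapes[name]:
            unmatched.append(name)
        else:
            matched[name] = ckpt_shapes[name]
    return matched, {'missed': missed, 'unmatched': unmatched}


model_shapes = {
    'backbone.weight': (64, 3, 7, 7),
    'head.weight': (20, 256),   # new task with 20 classes
    'head.extra': (4,),         # layer that the checkpoint does not have
}
ckpt_shapes = {
    'backbone.weight': (64, 3, 7, 7),
    'head.weight': (80, 256),   # checkpoint trained with 80 classes
}
matched, infos = matched_state(model_shapes, ckpt_shapes)
print(matched, infos)
```

Only the backbone entry is loaded; the head is reported as unmatched and keeps its fresh initialization.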


@@ -0,0 +1,74 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import torch
import torch.nn as nn
from ..misc import (MetricLogger, SmoothedValue, reduce_dict)
def train_one_epoch(model: nn.Module, criterion: nn.Module, dataloader, optimizer, ema, epoch, device):
"""Train a classification model for one epoch and return the averaged meters.
"""
model.train()
metric_logger = MetricLogger(delimiter=" ")
metric_logger.add_meter('lr', SmoothedValue(window_size=1, fmt='{value:.6f}'))
print_freq = 100
header = 'Epoch: [{}]'.format(epoch)
for imgs, labels in metric_logger.log_every(dataloader, print_freq, header):
imgs = imgs.to(device)
labels = labels.to(device)
preds = model(imgs)
loss: torch.Tensor = criterion(preds, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
if ema is not None:
ema.update(model)
loss_reduced_values = {k: v.item() for k, v in reduce_dict({'loss': loss}).items()}
metric_logger.update(**loss_reduced_values)
metric_logger.update(lr=optimizer.param_groups[0]["lr"])
metric_logger.synchronize_between_processes()
print("Averaged stats:", metric_logger)
stats = {k: meter.global_avg for k, meter in metric_logger.meters.items()}
return stats
@torch.no_grad()
def evaluate(model, criterion, dataloader, device):
model.eval()
metric_logger = MetricLogger(delimiter=" ")
# metric_logger.add_meter('acc', SmoothedValue(window_size=1, fmt='{global_avg:.4f}'))
# metric_logger.add_meter('loss', SmoothedValue(window_size=1, fmt='{value:.2f}'))
metric_logger.add_meter('acc', SmoothedValue(window_size=1))
metric_logger.add_meter('loss', SmoothedValue(window_size=1))
header = 'Test:'
for imgs, labels in metric_logger.log_every(dataloader, 10, header):
imgs, labels = imgs.to(device), labels.to(device)
preds = model(imgs)
acc = (preds.argmax(dim=-1) == labels).sum() / preds.shape[0]
loss = criterion(preds, labels)
dict_reduced = reduce_dict({'acc': acc, 'loss': loss})
reduced_values = {k: v.item() for k, v in dict_reduced.items()}
metric_logger.update(**reduced_values)
metric_logger.synchronize_between_processes()
print("Averaged stats:", metric_logger)
stats = {k: meter.global_avg for k, meter in metric_logger.meters.items()}
return stats
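
The accuracy in `evaluate` is the fraction of samples in a batch whose arg-max prediction equals the label. A plain-Python sketch of that computation, with a hypothetical `top1_accuracy` helper, is:

```python
def top1_accuracy(logits, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    preds = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return correct / len(logits)


logits = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
labels = [1, 0, 0, 0]
print(top1_accuracy(logits, labels))
```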


@@ -0,0 +1,71 @@
"""Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import time
import json
import datetime
from pathlib import Path
import torch
import torch.nn as nn
from ..misc import dist_utils
from ._solver import BaseSolver
from .clas_engine import train_one_epoch, evaluate
class ClasSolver(BaseSolver):
def fit(self, ):
print("Start training")
self.train()
args = self.cfg
n_parameters = sum(p.numel() for p in self.model.parameters() if p.requires_grad)
print('Number of params:', n_parameters)
output_dir = Path(args.output_dir)
output_dir.mkdir(exist_ok=True)
start_time = time.time()
start_epoch = self.last_epoch + 1
for epoch in range(start_epoch, args.epoches):
if dist_utils.is_dist_available_and_initialized():
self.train_dataloader.sampler.set_epoch(epoch)
train_stats = train_one_epoch(self.model,
self.criterion,
self.train_dataloader,
self.optimizer,
self.ema,
epoch=epoch,
device=self.device)
self.lr_scheduler.step()
self.last_epoch += 1
if output_dir:
checkpoint_paths = [output_dir / 'checkpoint.pth']
# extra checkpoint every `checkpoint_freq` epochs
if (epoch + 1) % args.checkpoint_freq == 0:
checkpoint_paths.append(output_dir / f'checkpoint{epoch:04}.pth')
for checkpoint_path in checkpoint_paths:
dist_utils.save_on_master(self.state_dict(), checkpoint_path)
module = self.ema.module if self.ema else self.model
test_stats = evaluate(module, self.criterion, self.val_dataloader, self.device)
log_stats = {**{f'train_{k}': v for k, v in train_stats.items()},
**{f'test_{k}': v for k, v in test_stats.items()},
'epoch': epoch,
'n_parameters': n_parameters}
if output_dir and dist_utils.is_main_process():
with (output_dir / "log.txt").open("a") as f:
f.write(json.dumps(log_stats) + "\n")
total_time = time.time() - start_time
total_time_str = str(datetime.timedelta(seconds=int(total_time)))
print('Training time {}'.format(total_time_str))
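
The checkpoint logic in `fit` always overwrites a rolling `checkpoint.pth` and adds a numbered copy every `checkpoint_freq` epochs. The naming rule can be sketched on its own as follows; `checkpoint_names` is a hypothetical helper that returns file names only and does no I/O.

```python
def checkpoint_names(epoch, checkpoint_freq):
    """File names written at the end of a zero-based `epoch`."""
    names = ['checkpoint.pth']
    if (epoch + 1) % checkpoint_freq == 0:
        names.append(f'checkpoint{epoch:04}.pth')
    return names


print(checkpoint_names(3, 5))
print(checkpoint_names(9, 5))
```

The numbered file uses the zero-based epoch index, so with `checkpoint_freq=5` the first numbered checkpoint is `checkpoint0004.pth`.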


@@ -0,0 +1,157 @@
"""
Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
https://github.com/facebookresearch/detr/blob/main/engine.py
Copyright(c) 2023 lyuwenyu. All Rights Reserved.
"""
import sys
import math
from typing import Iterable
import torch
import torch.amp
from torch.utils.tensorboard import SummaryWriter
from torch.cuda.amp.grad_scaler import GradScaler
from ..optim import ModelEMA, Warmup
from ..data import CocoEvaluator
from ..misc import MetricLogger, SmoothedValue, dist_utils
def train_one_epoch(model: torch.nn.Module, criterion: torch.nn.Module,
data_loader: Iterable, optimizer: torch.optim.Optimizer,
device: torch.device, epoch: int, max_norm: float = 0, **kwargs):
model.train()
criterion.train()
metric_logger = MetricLogger(delimiter=" ")
metric_logger.add_meter('lr', SmoothedValue(window_size=1, fmt='{value:.6f}'))
header = 'Epoch: [{}]'.format(epoch)
print_freq = kwargs.get('print_freq', 10)
writer :SummaryWriter = kwargs.get('writer', None)
ema :ModelEMA = kwargs.get('ema', None)
scaler :GradScaler = kwargs.get('scaler', None)
lr_warmup_scheduler :Warmup = kwargs.get('lr_warmup_scheduler', None)
for i, (samples, targets) in enumerate(metric_logger.log_every(data_loader, print_freq, header)):
samples = samples.to(device)
targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
global_step = epoch * len(data_loader) + i
metas = dict(epoch=epoch, step=i, global_step=global_step)
if scaler is not None:
with torch.autocast(device_type=device.type, cache_enabled=True):
outputs = model(samples, targets=targets)
with torch.autocast(device_type=device.type, enabled=False):
loss_dict = criterion(outputs, targets, **metas)
loss = sum(loss_dict.values())
scaler.scale(loss).backward()
if max_norm > 0:
scaler.unscale_(optimizer)
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
scaler.step(optimizer)
scaler.update()
optimizer.zero_grad()
else:
outputs = model(samples, targets=targets)
loss_dict = criterion(outputs, targets, **metas)
loss : torch.Tensor = sum(loss_dict.values())
optimizer.zero_grad()
loss.backward()
if max_norm > 0:
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
optimizer.step()
# ema
if ema is not None:
ema.update(model)
if lr_warmup_scheduler is not None:
lr_warmup_scheduler.step()
loss_dict_reduced = dist_utils.reduce_dict(loss_dict)
loss_value = sum(loss_dict_reduced.values())
if not math.isfinite(loss_value):
print("Loss is {}, stopping training".format(loss_value))
print(loss_dict_reduced)
sys.exit(1)
metric_logger.update(loss=loss_value, **loss_dict_reduced)
metric_logger.update(lr=optimizer.param_groups[0]["lr"])
if writer and dist_utils.is_main_process():
writer.add_scalar('Loss/total', loss_value.item(), global_step)
for j, pg in enumerate(optimizer.param_groups):
writer.add_scalar(f'Lr/pg_{j}', pg['lr'], global_step)
for k, v in loss_dict_reduced.items():
writer.add_scalar(f'Loss/{k}', v.item(), global_step)
# gather the stats from all processes
metric_logger.synchronize_between_processes()
print("Averaged stats:", metric_logger)
return {k: meter.global_avg for k, meter in metric_logger.meters.items()}
@torch.no_grad()
def evaluate(model: torch.nn.Module, criterion: torch.nn.Module, postprocessor, data_loader, coco_evaluator: CocoEvaluator, device):
model.eval()
criterion.eval()
coco_evaluator.cleanup()
iou_types = coco_evaluator.iou_types
metric_logger = MetricLogger(delimiter=" ")
header = 'Test:'
for samples, targets in metric_logger.log_every(data_loader, 10, header):
samples = samples.to(device)
targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
outputs = model(samples)
# TODO (lyuwenyu), fix dataset converted using `convert_to_coco_api`?
orig_target_sizes = torch.stack([t["orig_size"] for t in targets], dim=0)
results = postprocessor(outputs, orig_target_sizes)
# if 'segm' in postprocessor.keys():
# target_sizes = torch.stack([t["size"] for t in targets], dim=0)
# results = postprocessor['segm'](results, outputs, orig_target_sizes, target_sizes)
res = {target['image_id'].item(): output for target, output in zip(targets, results)}
if coco_evaluator is not None:
coco_evaluator.update(res)
# gather the stats from all processes
metric_logger.synchronize_between_processes()
print("Averaged stats:", metric_logger)
if coco_evaluator is not None:
coco_evaluator.synchronize_between_processes()
# accumulate predictions from all images
if coco_evaluator is not None:
coco_evaluator.accumulate()
coco_evaluator.summarize()
stats = {}
# stats = {k: meter.global_avg for k, meter in metric_logger.meters.items()}
if coco_evaluator is not None:
if 'bbox' in iou_types:
stats['coco_eval_bbox'] = coco_evaluator.coco_eval['bbox'].stats.tolist()
if 'segm' in iou_types:
stats['coco_eval_masks'] = coco_evaluator.coco_eval['segm'].stats.tolist()
return stats, coco_evaluator
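
When `max_norm > 0`, the training loop clips gradients by their global L2 norm before the optimizer step. The sketch below shows that rescaling on a flat list of numbers, assuming the common formulation where gradients are multiplied by `max_norm / (total_norm + eps)` capped at 1. The `clip_grad_norm` helper is hypothetical and is not the PyTorch function itself.

```python
import math


def clip_grad_norm(grads, max_norm, eps=1e-6):
    """Rescale `grads` so their L2 norm is at most about `max_norm`."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    coef = min(1.0, max_norm / (total_norm + eps))
    return [g * coef for g in grads], total_norm


clipped, norm_before = clip_grad_norm([3.0, 4.0], max_norm=0.1)
norm_after = math.sqrt(sum(g * g for g in clipped))
print(norm_before, norm_after)
```

Gradients whose norm is already below `max_norm` pass through unchanged, so clipping only affects unusually large updates.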

Some files were not shown because too many files have changed in this diff.