first commit
This commit is contained in:
111
rtdetr_pytorch/README.md
Normal file
111
rtdetr_pytorch/README.md
Normal file
@@ -0,0 +1,111 @@
|
||||
## TODO
|
||||
<details>
|
||||
<summary> see details </summary>
|
||||
|
||||
- [x] Training
|
||||
- [x] Evaluation
|
||||
- [x] Export onnx
|
||||
- [x] Upload source code
|
||||
- [x] Upload weight convert from paddle, see [*links*](https://github.com/lyuwenyu/RT-DETR/issues/42)
|
||||
- [x] Align training details with the [*paddle version*](../rtdetr_paddle/)
|
||||
- [x] Tuning rtdetr based on [*pretrained weights*](https://github.com/lyuwenyu/RT-DETR/issues/42)
|
||||
|
||||
</details>
|
||||
|
||||
|
||||
## Model Zoo
|
||||
|
||||
| Model | Dataset | Input Size | AP<sup>val</sup> | AP<sub>50</sub><sup>val</sup> | #Params(M) | FPS | checkpoint |
|
||||
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|
||||
rtdetr_r18vd | COCO | 640 | 46.4 | 63.7 | 20 | 217 | [url<sup>*</sup>](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetr_r18vd_dec3_6x_coco_from_paddle.pth)
|
||||
rtdetr_r34vd | COCO | 640 | 48.9 | 66.8 | 31 | 161 | [url<sup>*</sup>](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetr_r34vd_dec4_6x_coco_from_paddle.pth)
|
||||
rtdetr_r50vd_m | COCO | 640 | 51.3 | 69.5 | 36 | 145 | [url<sup>*</sup>](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetr_r50vd_m_6x_coco_from_paddle.pth)
|
||||
rtdetr_r50vd | COCO | 640 | 53.1 | 71.2| 42 | 108 | [url<sup>*</sup>](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetr_r50vd_6x_coco_from_paddle.pth)
|
||||
rtdetr_r101vd | COCO | 640 | 54.3 | 72.8 | 76 | 74 | [url<sup>*</sup>](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetr_r101vd_6x_coco_from_paddle.pth)
|
||||
rtdetr_18vd | COCO+Objects365 | 640 | 49.0 | 66.5 | 20 | 217 | [url<sup>*</sup>](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetr_r18vd_5x_coco_objects365_from_paddle.pth)
|
||||
rtdetr_r50vd | COCO+Objects365 | 640 | 55.2 | 73.4 | 42 | 108 | [url<sup>*</sup>](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetr_r50vd_2x_coco_objects365_from_paddle.pth)
|
||||
rtdetr_r101vd | COCO+Objects365 | 640 | 56.2 | 74.5 | 76 | 74 | [url<sup>*</sup>](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetr_r101vd_2x_coco_objects365_from_paddle.pth)
|
||||
rtdetr_regnet | COCO | 640 | 51.6 | 69.6 | 38 | 67 | [url<sup>*</sup>](https://drive.google.com/file/d/1K2EXJgnaEUJcZCLULHrZ492EF4PdgVp9/view?usp=sharing)
|
||||
rtdetr_dla34 | COCO | 640 | 49.6 | 67.4 | 34 | 83 | [url<sup>*</sup>](https://drive.google.com/file/d/1_rVpl-jIelwy2LDT3E4vdM4KCLBcOtzZ/view?usp=sharing)
|
||||
|
||||
Notes
|
||||
- `COCO + Objects365` in the table means finetuned model on `COCO` using pretrained weights trained on `Objects365`.
|
||||
- `url`<sup>`*`</sup> is the url of pretrained weights convert from paddle model for save energy. *It may have slight differences between this table and paper*
|
||||
<!-- - `FPS` is evaluated on a single T4 GPU with $batch\\_size = 1$ and $tensorrt\\_fp16$ mode -->
|
||||
|
||||
## Quick start
|
||||
|
||||
<details>
|
||||
<summary>Install</summary>
|
||||
|
||||
```bash
|
||||
pip install -r requirements.txt
|
||||
```
|
||||
|
||||
</details>
|
||||
|
||||
|
||||
<details>
|
||||
<summary>Data</summary>
|
||||
|
||||
- Download and extract COCO 2017 train and val images.
|
||||
```
|
||||
path/to/coco/
|
||||
annotations/ # annotation json files
|
||||
train2017/ # train images
|
||||
val2017/ # val images
|
||||
```
|
||||
- Modify config [`img_folder`, `ann_file`](configs/dataset/coco_detection.yml)
|
||||
</details>
|
||||
|
||||
|
||||
|
||||
<details>
|
||||
<summary>Training & Evaluation</summary>
|
||||
|
||||
- Training on a Single GPU:
|
||||
|
||||
```shell
|
||||
# training on single-gpu
|
||||
export CUDA_VISIBLE_DEVICES=0
|
||||
python tools/train.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml
|
||||
```
|
||||
|
||||
- Training on Multiple GPUs:
|
||||
|
||||
```shell
|
||||
# train on multi-gpu
|
||||
export CUDA_VISIBLE_DEVICES=0,1,2,3
|
||||
torchrun --nproc_per_node=4 tools/train.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml
|
||||
```
|
||||
|
||||
- Evaluation on Multiple GPUs:
|
||||
|
||||
```shell
|
||||
# val on multi-gpu
|
||||
export CUDA_VISIBLE_DEVICES=0,1,2,3
|
||||
torchrun --nproc_per_node=4 tools/train.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml -r path/to/checkpoint --test-only
|
||||
```
|
||||
|
||||
</details>
|
||||
|
||||
|
||||
|
||||
<details>
|
||||
<summary>Export</summary>
|
||||
|
||||
```shell
|
||||
python tools/export_onnx.py -c configs/rtdetr/rtdetr_r18vd_6x_coco.yml -r path/to/checkpoint --check
|
||||
```
|
||||
</details>
|
||||
|
||||
|
||||
|
||||
|
||||
<details open>
|
||||
<summary>Train custom data</summary>
|
||||
|
||||
1. set `remap_mscoco_category: False`. This variable only works for ms-coco dataset. If you want to use `remap_mscoco_category` logic on your dataset, please modify variable [`mscoco_category2name`](https://github.com/lyuwenyu/RT-DETR/blob/main/rtdetr_pytorch/src/data/coco/coco_dataset.py#L154) based on your dataset.
|
||||
|
||||
2. add `-t path/to/checkpoint` (optinal) to tuning rtdetr based on pretrained checkpoint. see [training script details](./tools/README.md).
|
||||
</details>
|
||||
34
rtdetr_pytorch/configs/dataset/coco_detection.yml
Normal file
34
rtdetr_pytorch/configs/dataset/coco_detection.yml
Normal file
@@ -0,0 +1,34 @@
|
||||
task: detection
|
||||
|
||||
num_classes: 80
|
||||
remap_mscoco_category: True
|
||||
|
||||
train_dataloader:
|
||||
type: DataLoader
|
||||
dataset:
|
||||
type: CocoDetection
|
||||
img_folder: ./dataset/coco/train2017/
|
||||
ann_file: ./dataset/coco/annotations/instances_train2017.json
|
||||
transforms:
|
||||
type: Compose
|
||||
ops: ~
|
||||
shuffle: True
|
||||
batch_size: 8
|
||||
num_workers: 4
|
||||
drop_last: True
|
||||
|
||||
|
||||
val_dataloader:
|
||||
type: DataLoader
|
||||
dataset:
|
||||
type: CocoDetection
|
||||
img_folder: ./dataset/coco/val2017/
|
||||
ann_file: ./dataset/coco/annotations/instances_val2017.json
|
||||
transforms:
|
||||
type: Compose
|
||||
ops: ~
|
||||
|
||||
shuffle: False
|
||||
batch_size: 8
|
||||
num_workers: 4
|
||||
drop_last: False
|
||||
39
rtdetr_pytorch/configs/rtdetr/include/dataloader.yml
Normal file
39
rtdetr_pytorch/configs/rtdetr/include/dataloader.yml
Normal file
@@ -0,0 +1,39 @@
|
||||
# num_classes: 91
|
||||
# remap_mscoco_category: True
|
||||
|
||||
train_dataloader:
|
||||
dataset:
|
||||
return_masks: False
|
||||
transforms:
|
||||
ops:
|
||||
- {type: RandomPhotometricDistort, p: 0.5}
|
||||
- {type: RandomZoomOut, fill: 0}
|
||||
- {type: RandomIoUCrop, p: 0.8}
|
||||
- {type: SanitizeBoundingBox, min_size: 1}
|
||||
- {type: RandomHorizontalFlip}
|
||||
- {type: Resize, size: [640, 640], }
|
||||
# - {type: Resize, size: 639, max_size: 640}
|
||||
# - {type: PadToSize, spatial_size: 640}
|
||||
- {type: ToImageTensor}
|
||||
- {type: ConvertDtype}
|
||||
- {type: SanitizeBoundingBox, min_size: 1}
|
||||
- {type: ConvertBox, out_fmt: 'cxcywh', normalize: True}
|
||||
shuffle: True
|
||||
batch_size: 4
|
||||
num_workers: 4
|
||||
collate_fn: default_collate_fn
|
||||
|
||||
|
||||
val_dataloader:
|
||||
dataset:
|
||||
transforms:
|
||||
ops:
|
||||
# - {type: Resize, size: 639, max_size: 640}
|
||||
# - {type: PadToSize, spatial_size: 640}
|
||||
- {type: Resize, size: [640, 640]}
|
||||
- {type: ToImageTensor}
|
||||
- {type: ConvertDtype}
|
||||
shuffle: False
|
||||
batch_size: 8
|
||||
num_workers: 4
|
||||
collate_fn: default_collate_fn
|
||||
39
rtdetr_pytorch/configs/rtdetr/include/dataloader_regnet.yml
Normal file
39
rtdetr_pytorch/configs/rtdetr/include/dataloader_regnet.yml
Normal file
@@ -0,0 +1,39 @@
|
||||
# num_classes: 91
|
||||
# remap_mscoco_category: True
|
||||
|
||||
train_dataloader:
|
||||
dataset:
|
||||
return_masks: False
|
||||
transforms:
|
||||
ops:
|
||||
- {type: RandomPhotometricDistort, p: 0.5}
|
||||
- {type: RandomZoomOut, fill: 0}
|
||||
- {type: RandomIoUCrop, p: 0.8}
|
||||
- {type: SanitizeBoundingBox, min_size: 1}
|
||||
- {type: RandomHorizontalFlip}
|
||||
- {type: Resize, size: [640, 640], }
|
||||
# - {type: Resize, size: 639, max_size: 640}
|
||||
# - {type: PadToSize, spatial_size: 640}
|
||||
- {type: ToImageTensor}
|
||||
- {type: ConvertDtype}
|
||||
- {type: SanitizeBoundingBox, min_size: 1}
|
||||
- {type: ConvertBox, out_fmt: 'cxcywh', normalize: True}
|
||||
shuffle: True
|
||||
batch_size: 8
|
||||
num_workers: 2
|
||||
collate_fn: default_collate_fn
|
||||
|
||||
|
||||
val_dataloader:
|
||||
dataset:
|
||||
transforms:
|
||||
ops:
|
||||
# - {type: Resize, size: 639, max_size: 640}
|
||||
# - {type: PadToSize, spatial_size: 640}
|
||||
- {type: Resize, size: [640, 640]}
|
||||
- {type: ToImageTensor}
|
||||
- {type: ConvertDtype}
|
||||
shuffle: False
|
||||
batch_size: 8
|
||||
num_workers: 2
|
||||
collate_fn: default_collate_fn
|
||||
36
rtdetr_pytorch/configs/rtdetr/include/optimizer.yml
Normal file
36
rtdetr_pytorch/configs/rtdetr/include/optimizer.yml
Normal file
@@ -0,0 +1,36 @@
|
||||
|
||||
use_ema: True
|
||||
ema:
|
||||
type: ModelEMA
|
||||
decay: 0.9999
|
||||
warmups: 2000
|
||||
|
||||
|
||||
find_unused_parameters: True
|
||||
|
||||
epoches: 72
|
||||
clip_max_norm: 0.1
|
||||
|
||||
optimizer:
|
||||
type: AdamW
|
||||
params:
|
||||
-
|
||||
params: 'backbone'
|
||||
lr: 0.00001
|
||||
-
|
||||
params: '^(?=.*encoder(?=.*bias|.*norm.*weight)).*$'
|
||||
weight_decay: 0.
|
||||
-
|
||||
params: '^(?=.*decoder(?=.*bias|.*norm.*weight)).*$'
|
||||
weight_decay: 0.
|
||||
|
||||
lr: 0.0001
|
||||
betas: [0.9, 0.999]
|
||||
weight_decay: 0.0001
|
||||
|
||||
|
||||
lr_scheduler:
|
||||
type: MultiStepLR
|
||||
milestones: [1000]
|
||||
gamma: 0.1
|
||||
|
||||
33
rtdetr_pytorch/configs/rtdetr/include/optimizer_regnet.yml
Normal file
33
rtdetr_pytorch/configs/rtdetr/include/optimizer_regnet.yml
Normal file
@@ -0,0 +1,33 @@
|
||||
|
||||
use_ema: True
|
||||
ema:
|
||||
type: ModelEMA
|
||||
decay: 0.9999
|
||||
warmups: 2000
|
||||
|
||||
|
||||
find_unused_parameters: True
|
||||
|
||||
epoches: 72
|
||||
clip_max_norm: 0.1
|
||||
|
||||
optimizer:
|
||||
type: AdamW
|
||||
params:
|
||||
-
|
||||
params: '^(?=.*encoder(?=.*bias|.*norm.*weight)).*$'
|
||||
weight_decay: 0.
|
||||
-
|
||||
params: '^(?=.*decoder(?=.*bias|.*norm.*weight)).*$'
|
||||
weight_decay: 0.
|
||||
|
||||
lr: 0.0001
|
||||
betas: [0.9, 0.999]
|
||||
weight_decay: 0.0001
|
||||
|
||||
|
||||
lr_scheduler:
|
||||
type: MultiStepLR
|
||||
milestones: [1000]
|
||||
gamma: 0.1
|
||||
|
||||
78
rtdetr_pytorch/configs/rtdetr/include/rtdetr_dla34.yml
Normal file
78
rtdetr_pytorch/configs/rtdetr/include/rtdetr_dla34.yml
Normal file
@@ -0,0 +1,78 @@
|
||||
task: detection
|
||||
|
||||
model: RTDETR
|
||||
criterion: SetCriterion
|
||||
postprocessor: RTDETRPostProcessor
|
||||
|
||||
|
||||
RTDETR:
|
||||
backbone: DLANet
|
||||
encoder: HybridEncoder
|
||||
decoder: RTDETRTransformer
|
||||
multi_scale: [480, 512, 544, 576, 608, 640, 640, 640, 672, 704, 736, 768, 800]
|
||||
|
||||
DLANet:
|
||||
dla: dla34
|
||||
pretrained: True
|
||||
return_idx: [1, 2, 3]
|
||||
|
||||
|
||||
HybridEncoder:
|
||||
in_channels: [128, 256, 512]
|
||||
feat_strides: [8, 16, 32]
|
||||
|
||||
# intra
|
||||
hidden_dim: 256
|
||||
use_encoder_idx: [2]
|
||||
num_encoder_layers: 1
|
||||
nhead: 8
|
||||
dim_feedforward: 1024
|
||||
dropout: 0.
|
||||
enc_act: 'gelu'
|
||||
pe_temperature: 10000
|
||||
|
||||
# cross
|
||||
expansion: 1.0
|
||||
depth_mult: 1
|
||||
act: 'silu'
|
||||
|
||||
# eval
|
||||
eval_spatial_size: [640, 640]
|
||||
|
||||
|
||||
RTDETRTransformer:
|
||||
feat_channels: [256, 256, 256]
|
||||
feat_strides: [8, 16, 32]
|
||||
hidden_dim: 256
|
||||
num_levels: 3
|
||||
|
||||
num_queries: 300
|
||||
|
||||
num_decoder_layers: 6
|
||||
num_denoising: 100
|
||||
|
||||
eval_idx: -1
|
||||
eval_spatial_size: [640, 640]
|
||||
|
||||
|
||||
use_focal_loss: True
|
||||
|
||||
RTDETRPostProcessor:
|
||||
num_top_queries: 300
|
||||
|
||||
|
||||
SetCriterion:
|
||||
weight_dict: {loss_vfl: 1, loss_bbox: 5, loss_giou: 2,}
|
||||
losses: ['vfl', 'boxes', ]
|
||||
alpha: 0.75
|
||||
gamma: 2.0
|
||||
|
||||
matcher:
|
||||
type: HungarianMatcher
|
||||
weight_dict: {cost_class: 2, cost_bbox: 5, cost_giou: 2}
|
||||
# use_focal_loss: True
|
||||
alpha: 0.25
|
||||
gamma: 2.0
|
||||
|
||||
|
||||
|
||||
81
rtdetr_pytorch/configs/rtdetr/include/rtdetr_r50vd.yml
Normal file
81
rtdetr_pytorch/configs/rtdetr/include/rtdetr_r50vd.yml
Normal file
@@ -0,0 +1,81 @@
|
||||
task: detection
|
||||
|
||||
model: RTDETR
|
||||
criterion: SetCriterion
|
||||
postprocessor: RTDETRPostProcessor
|
||||
|
||||
|
||||
RTDETR:
|
||||
backbone: PResNet
|
||||
encoder: HybridEncoder
|
||||
decoder: RTDETRTransformer
|
||||
multi_scale: [480, 512, 544, 576, 608, 640, 640, 640, 672, 704, 736, 768, 800]
|
||||
|
||||
PResNet:
|
||||
depth: 50
|
||||
variant: d
|
||||
freeze_at: 0
|
||||
return_idx: [1, 2, 3]
|
||||
num_stages: 4
|
||||
freeze_norm: True
|
||||
pretrained: True
|
||||
|
||||
HybridEncoder:
|
||||
in_channels: [512, 1024, 2048]
|
||||
feat_strides: [8, 16, 32]
|
||||
|
||||
# intra
|
||||
hidden_dim: 256
|
||||
use_encoder_idx: [2]
|
||||
num_encoder_layers: 1
|
||||
nhead: 8
|
||||
dim_feedforward: 1024
|
||||
dropout: 0.
|
||||
enc_act: 'gelu'
|
||||
pe_temperature: 10000
|
||||
|
||||
# cross
|
||||
expansion: 1.0
|
||||
depth_mult: 1
|
||||
act: 'silu'
|
||||
|
||||
# eval
|
||||
eval_spatial_size: [640, 640]
|
||||
|
||||
|
||||
RTDETRTransformer:
|
||||
feat_channels: [256, 256, 256]
|
||||
feat_strides: [8, 16, 32]
|
||||
hidden_dim: 256
|
||||
num_levels: 3
|
||||
|
||||
num_queries: 300
|
||||
|
||||
num_decoder_layers: 6
|
||||
num_denoising: 100
|
||||
|
||||
eval_idx: -1
|
||||
eval_spatial_size: [640, 640]
|
||||
|
||||
|
||||
use_focal_loss: True
|
||||
|
||||
RTDETRPostProcessor:
|
||||
num_top_queries: 300
|
||||
|
||||
|
||||
SetCriterion:
|
||||
weight_dict: {loss_vfl: 1, loss_bbox: 5, loss_giou: 2,}
|
||||
losses: ['vfl', 'boxes', ]
|
||||
alpha: 0.75
|
||||
gamma: 2.0
|
||||
|
||||
matcher:
|
||||
type: HungarianMatcher
|
||||
weight_dict: {cost_class: 2, cost_bbox: 5, cost_giou: 2}
|
||||
# use_focal_loss: True
|
||||
alpha: 0.25
|
||||
gamma: 2.0
|
||||
|
||||
|
||||
|
||||
77
rtdetr_pytorch/configs/rtdetr/include/rtdetr_regnet.yml
Normal file
77
rtdetr_pytorch/configs/rtdetr/include/rtdetr_regnet.yml
Normal file
@@ -0,0 +1,77 @@
|
||||
task: detection
|
||||
|
||||
model: RTDETR
|
||||
criterion: SetCriterion
|
||||
postprocessor: RTDETRPostProcessor
|
||||
|
||||
|
||||
RTDETR:
|
||||
backbone: RegNet
|
||||
encoder: HybridEncoder
|
||||
decoder: RTDETRTransformer
|
||||
multi_scale: [480, 512, 544, 576, 608, 640, 640, 640, 672, 704, 736, 768, 800]
|
||||
|
||||
|
||||
RegNet:
|
||||
return_idx: [1, 2, 3]
|
||||
configuration: RegNetConfig()
|
||||
|
||||
HybridEncoder:
|
||||
in_channels: [192, 512, 1088]
|
||||
feat_strides: [8, 16, 32]
|
||||
|
||||
# intra
|
||||
hidden_dim: 256
|
||||
use_encoder_idx: [2]
|
||||
num_encoder_layers: 1
|
||||
nhead: 8
|
||||
dim_feedforward: 1024
|
||||
dropout: 0.
|
||||
enc_act: 'gelu'
|
||||
pe_temperature: 10000
|
||||
|
||||
# cross
|
||||
expansion: 1.0
|
||||
depth_mult: 1
|
||||
act: 'silu'
|
||||
|
||||
# eval
|
||||
eval_spatial_size: [640, 640]
|
||||
|
||||
|
||||
RTDETRTransformer:
|
||||
feat_channels: [256, 256, 256]
|
||||
feat_strides: [8, 16, 32]
|
||||
hidden_dim: 256
|
||||
num_levels: 3
|
||||
|
||||
num_queries: 300
|
||||
|
||||
num_decoder_layers: 6
|
||||
num_denoising: 100
|
||||
|
||||
eval_idx: -1
|
||||
eval_spatial_size: [640, 640]
|
||||
|
||||
|
||||
use_focal_loss: True
|
||||
|
||||
RTDETRPostProcessor:
|
||||
num_top_queries: 300
|
||||
|
||||
|
||||
SetCriterion:
|
||||
weight_dict: {loss_vfl: 1, loss_bbox: 5, loss_giou: 2,}
|
||||
losses: ['vfl', 'boxes', ]
|
||||
alpha: 0.75
|
||||
gamma: 2.0
|
||||
|
||||
matcher:
|
||||
type: HungarianMatcher
|
||||
weight_dict: {cost_class: 2, cost_bbox: 5, cost_giou: 2}
|
||||
# use_focal_loss: True
|
||||
alpha: 0.25
|
||||
gamma: 2.0
|
||||
|
||||
|
||||
|
||||
9
rtdetr_pytorch/configs/rtdetr/rtdetr_dla34_6x_coco.yml
Normal file
9
rtdetr_pytorch/configs/rtdetr/rtdetr_dla34_6x_coco.yml
Normal file
@@ -0,0 +1,9 @@
|
||||
__include__: [
|
||||
'../dataset/coco_detection.yml',
|
||||
'../runtime.yml',
|
||||
'./include/dataloader.yml',
|
||||
'./include/optimizer.yml',
|
||||
'./include/rtdetr_dla34.yml',
|
||||
]
|
||||
|
||||
output_dir: ./output/rtdetr_dla34_6x_coco
|
||||
28
rtdetr_pytorch/configs/rtdetr/rtdetr_r101vd_6x_coco.yml
Normal file
28
rtdetr_pytorch/configs/rtdetr/rtdetr_r101vd_6x_coco.yml
Normal file
@@ -0,0 +1,28 @@
|
||||
__include__: [
|
||||
'../dataset/coco_detection.yml',
|
||||
'../runtime.yml',
|
||||
'./include/dataloader.yml',
|
||||
'./include/optimizer.yml',
|
||||
'./include/rtdetr_r50vd.yml',
|
||||
]
|
||||
|
||||
PResNet:
|
||||
depth: 101
|
||||
|
||||
|
||||
HybridEncoder:
|
||||
# intra
|
||||
hidden_dim: 384
|
||||
dim_feedforward: 2048
|
||||
|
||||
|
||||
RTDETRTransformer:
|
||||
feat_channels: [384, 384, 384]
|
||||
|
||||
|
||||
optimizer:
|
||||
type: AdamW
|
||||
params:
|
||||
-
|
||||
params: 'backbone'
|
||||
lr: 0.000001
|
||||
49
rtdetr_pytorch/configs/rtdetr/rtdetr_r18vd_6x_coco.yml
Normal file
49
rtdetr_pytorch/configs/rtdetr/rtdetr_r18vd_6x_coco.yml
Normal file
@@ -0,0 +1,49 @@
|
||||
|
||||
__include__: [
|
||||
'../dataset/coco_detection.yml',
|
||||
'../runtime.yml',
|
||||
'./include/dataloader.yml',
|
||||
'./include/optimizer.yml',
|
||||
'./include/rtdetr_r50vd.yml',
|
||||
]
|
||||
|
||||
|
||||
output_dir: ./output/rtdetr_r18vd_6x_coco
|
||||
|
||||
PResNet:
|
||||
depth: 18
|
||||
freeze_at: -1
|
||||
freeze_norm: False
|
||||
pretrained: True
|
||||
|
||||
HybridEncoder:
|
||||
in_channels: [128, 256, 512]
|
||||
hidden_dim: 256
|
||||
expansion: 0.5
|
||||
|
||||
|
||||
RTDETRTransformer:
|
||||
eval_idx: -1
|
||||
num_decoder_layers: 3
|
||||
num_denoising: 100
|
||||
|
||||
|
||||
|
||||
optimizer:
|
||||
type: AdamW
|
||||
params:
|
||||
-
|
||||
params: '^(?=.*backbone)(?=.*norm).*$'
|
||||
lr: 0.00001
|
||||
weight_decay: 0.
|
||||
-
|
||||
params: '^(?=.*backbone)(?!.*norm).*$'
|
||||
lr: 0.00001
|
||||
-
|
||||
params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bias)).*$'
|
||||
weight_decay: 0.
|
||||
|
||||
lr: 0.0001
|
||||
betas: [0.9, 0.999]
|
||||
weight_decay: 0.0001
|
||||
|
||||
48
rtdetr_pytorch/configs/rtdetr/rtdetr_r34vd_6x_coco.yml
Normal file
48
rtdetr_pytorch/configs/rtdetr/rtdetr_r34vd_6x_coco.yml
Normal file
@@ -0,0 +1,48 @@
|
||||
|
||||
__include__: [
|
||||
'../dataset/coco_detection.yml',
|
||||
'../runtime.yml',
|
||||
'./include/dataloader.yml',
|
||||
'./include/optimizer.yml',
|
||||
'./include/rtdetr_r50vd.yml',
|
||||
]
|
||||
|
||||
|
||||
output_dir: ./output/rtdetr_r34vd_6x_coco
|
||||
|
||||
|
||||
PResNet:
|
||||
depth: 34
|
||||
freeze_at: -1
|
||||
freeze_norm: False
|
||||
pretrained: True
|
||||
|
||||
|
||||
HybridEncoder:
|
||||
in_channels: [128, 256, 512]
|
||||
hidden_dim: 256
|
||||
expansion: 0.5
|
||||
|
||||
|
||||
RTDETRTransformer:
|
||||
num_decoder_layers: 4
|
||||
|
||||
|
||||
|
||||
optimizer:
|
||||
type: AdamW
|
||||
params:
|
||||
-
|
||||
params: '^(?=.*backbone)(?=.*norm|bn).*$'
|
||||
weight_decay: 0.
|
||||
lr: 0.00001
|
||||
-
|
||||
params: '^(?=.*backbone)(?!.*norm|bn).*$'
|
||||
lr: 0.00001
|
||||
-
|
||||
params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
|
||||
weight_decay: 0.
|
||||
|
||||
lr: 0.0001
|
||||
betas: [0.9, 0.999]
|
||||
weight_decay: 0.0001
|
||||
9
rtdetr_pytorch/configs/rtdetr/rtdetr_r50vd_6x_coco.yml
Normal file
9
rtdetr_pytorch/configs/rtdetr/rtdetr_r50vd_6x_coco.yml
Normal file
@@ -0,0 +1,9 @@
|
||||
__include__: [
|
||||
'../dataset/coco_detection.yml',
|
||||
'../runtime.yml',
|
||||
'./include/dataloader.yml',
|
||||
'./include/optimizer.yml',
|
||||
'./include/rtdetr_r50vd.yml',
|
||||
]
|
||||
|
||||
output_dir: ./output/rtdetr_r50vd_6x_coco
|
||||
16
rtdetr_pytorch/configs/rtdetr/rtdetr_r50vd_m_6x_coco.yml
Normal file
16
rtdetr_pytorch/configs/rtdetr/rtdetr_r50vd_m_6x_coco.yml
Normal file
@@ -0,0 +1,16 @@
|
||||
__include__: [
|
||||
'../dataset/coco_detection.yml',
|
||||
'../runtime.yml',
|
||||
'./include/dataloader.yml',
|
||||
'./include/optimizer.yml',
|
||||
'./include/rtdetr_r50vd.yml',
|
||||
]
|
||||
|
||||
output_dir: ./output/rtdetr_r50vd_m_6x_coco
|
||||
|
||||
|
||||
HybridEncoder:
|
||||
expansion: 0.5
|
||||
|
||||
RTDETRTransformer:
|
||||
eval_idx: 2 # use 3th decoder layer to eval
|
||||
9
rtdetr_pytorch/configs/rtdetr/rtdetr_regnet_6x_coco.yml
Normal file
9
rtdetr_pytorch/configs/rtdetr/rtdetr_regnet_6x_coco.yml
Normal file
@@ -0,0 +1,9 @@
|
||||
__include__: [
|
||||
'../dataset/coco_detection.yml',
|
||||
'../runtime.yml',
|
||||
'./include/dataloader_regnet.yml',
|
||||
'./include/optimizer_regnet.yml',
|
||||
'./include/rtdetr_regnet.yml',
|
||||
]
|
||||
|
||||
output_dir: ./output/rtdetr_regnet_6x_coco
|
||||
17
rtdetr_pytorch/configs/runtime.yml
Normal file
17
rtdetr_pytorch/configs/runtime.yml
Normal file
@@ -0,0 +1,17 @@
|
||||
sync_bn: True
|
||||
find_unused_parameters: False
|
||||
|
||||
|
||||
use_amp: False
|
||||
|
||||
scaler:
|
||||
type: GradScaler
|
||||
enabled: True
|
||||
|
||||
|
||||
use_ema: False
|
||||
ema:
|
||||
type: ModelEMA
|
||||
decay: 0.9999
|
||||
warmups: 2000
|
||||
|
||||
8
rtdetr_pytorch/requirements.txt
Normal file
8
rtdetr_pytorch/requirements.txt
Normal file
@@ -0,0 +1,8 @@
|
||||
torch==2.0.1
|
||||
torchvision==0.15.2
|
||||
onnx==1.14.0
|
||||
onnxruntime==1.15.1
|
||||
pycocotools
|
||||
PyYAML
|
||||
scipy
|
||||
transformers
|
||||
5
rtdetr_pytorch/src/__init__.py
Normal file
5
rtdetr_pytorch/src/__init__.py
Normal file
@@ -0,0 +1,5 @@
|
||||
|
||||
from . import data
|
||||
from . import nn
|
||||
from . import optim
|
||||
from . import zoo
|
||||
7
rtdetr_pytorch/src/core/__init__.py
Normal file
7
rtdetr_pytorch/src/core/__init__.py
Normal file
@@ -0,0 +1,7 @@
|
||||
"""by lyuwenyu
|
||||
"""
|
||||
|
||||
# from .yaml_utils import register, create, load_config, merge_config, merge_dict
|
||||
from .yaml_utils import *
|
||||
from .config import BaseConfig
|
||||
from .yaml_config import YAMLConfig
|
||||
264
rtdetr_pytorch/src/core/config.py
Normal file
264
rtdetr_pytorch/src/core/config.py
Normal file
@@ -0,0 +1,264 @@
|
||||
"""by lyuwenyu
|
||||
"""
|
||||
|
||||
from pprint import pprint
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
from torch.utils.data import Dataset, DataLoader
|
||||
from torch.optim import Optimizer
|
||||
from torch.optim.lr_scheduler import LRScheduler
|
||||
from torch.cuda.amp.grad_scaler import GradScaler
|
||||
|
||||
from typing import Callable, List, Dict
|
||||
|
||||
|
||||
__all__ = ['BaseConfig', ]
|
||||
|
||||
|
||||
|
||||
class BaseConfig(object):
|
||||
# TODO property
|
||||
|
||||
|
||||
def __init__(self) -> None:
|
||||
super().__init__()
|
||||
|
||||
self.task :str = None
|
||||
|
||||
self._model :nn.Module = None
|
||||
self._postprocessor :nn.Module = None
|
||||
self._criterion :nn.Module = None
|
||||
self._optimizer :Optimizer = None
|
||||
self._lr_scheduler :LRScheduler = None
|
||||
self._train_dataloader :DataLoader = None
|
||||
self._val_dataloader :DataLoader = None
|
||||
self._ema :nn.Module = None
|
||||
self._scaler :GradScaler = None
|
||||
|
||||
self.train_dataset :Dataset = None
|
||||
self.val_dataset :Dataset = None
|
||||
self.num_workers :int = 0
|
||||
self.collate_fn :Callable = None
|
||||
|
||||
self.batch_size :int = None
|
||||
self._train_batch_size :int = None
|
||||
self._val_batch_size :int = None
|
||||
self._train_shuffle: bool = None
|
||||
self._val_shuffle: bool = None
|
||||
|
||||
self.evaluator :Callable[[nn.Module, DataLoader, str], ] = None
|
||||
|
||||
# runtime
|
||||
self.resume :str = None
|
||||
self.tuning :str = None
|
||||
|
||||
self.epoches :int = None
|
||||
self.last_epoch :int = -1
|
||||
self.end_epoch :int = None
|
||||
|
||||
self.use_amp :bool = False
|
||||
self.use_ema :bool = False
|
||||
self.sync_bn :bool = False
|
||||
self.clip_max_norm : float = None
|
||||
self.find_unused_parameters :bool = None
|
||||
# self.ema_decay: float = 0.9999
|
||||
# self.grad_clip_: Callable = None
|
||||
|
||||
self.log_dir :str = './logs/'
|
||||
self.log_step :int = 10
|
||||
self._output_dir :str = None
|
||||
self._print_freq :int = None
|
||||
self.checkpoint_step :int = 1
|
||||
|
||||
# self.device :str = torch.device('cpu')
|
||||
device = 'cuda' if torch.cuda.is_available() else 'cpu'
|
||||
self.device = torch.device(device)
|
||||
|
||||
|
||||
@property
|
||||
def model(self, ) -> nn.Module:
|
||||
return self._model
|
||||
|
||||
@model.setter
|
||||
def model(self, m):
|
||||
assert isinstance(m, nn.Module), f'{type(m)} != nn.Module, please check your model class'
|
||||
self._model = m
|
||||
|
||||
@property
|
||||
def postprocessor(self, ) -> nn.Module:
|
||||
return self._postprocessor
|
||||
|
||||
@postprocessor.setter
|
||||
def postprocessor(self, m):
|
||||
assert isinstance(m, nn.Module), f'{type(m)} != nn.Module, please check your model class'
|
||||
self._postprocessor = m
|
||||
|
||||
@property
|
||||
def criterion(self, ) -> nn.Module:
|
||||
return self._criterion
|
||||
|
||||
@criterion.setter
|
||||
def criterion(self, m):
|
||||
assert isinstance(m, nn.Module), f'{type(m)} != nn.Module, please check your model class'
|
||||
self._criterion = m
|
||||
|
||||
@property
|
||||
def optimizer(self, ) -> Optimizer:
|
||||
return self._optimizer
|
||||
|
||||
@optimizer.setter
|
||||
def optimizer(self, m):
|
||||
assert isinstance(m, Optimizer), f'{type(m)} != optim.Optimizer, please check your model class'
|
||||
self._optimizer = m
|
||||
|
||||
@property
|
||||
def lr_scheduler(self, ) -> LRScheduler:
|
||||
return self._lr_scheduler
|
||||
|
||||
@lr_scheduler.setter
|
||||
def lr_scheduler(self, m):
|
||||
assert isinstance(m, LRScheduler), f'{type(m)} != LRScheduler, please check your model class'
|
||||
self._lr_scheduler = m
|
||||
|
||||
|
||||
@property
|
||||
def train_dataloader(self):
|
||||
if self._train_dataloader is None and self.train_dataset is not None:
|
||||
loader = DataLoader(self.train_dataset,
|
||||
batch_size=self.train_batch_size,
|
||||
num_workers=self.num_workers,
|
||||
collate_fn=self.collate_fn,
|
||||
shuffle=self.train_shuffle, )
|
||||
loader.shuffle = self.train_shuffle
|
||||
self._train_dataloader = loader
|
||||
|
||||
return self._train_dataloader
|
||||
|
||||
@train_dataloader.setter
|
||||
def train_dataloader(self, loader):
|
||||
self._train_dataloader = loader
|
||||
|
||||
@property
|
||||
def val_dataloader(self):
|
||||
if self._val_dataloader is None and self.val_dataset is not None:
|
||||
loader = DataLoader(self.val_dataset,
|
||||
batch_size=self.val_batch_size,
|
||||
num_workers=self.num_workers,
|
||||
drop_last=False,
|
||||
collate_fn=self.collate_fn,
|
||||
shuffle=self.val_shuffle)
|
||||
loader.shuffle = self.val_shuffle
|
||||
self._val_dataloader = loader
|
||||
|
||||
return self._val_dataloader
|
||||
|
||||
@val_dataloader.setter
|
||||
def val_dataloader(self, loader):
|
||||
self._val_dataloader = loader
|
||||
|
||||
|
||||
# TODO method
|
||||
# @property
|
||||
# def ema(self, ) -> nn.Module:
|
||||
# if self._ema is None and self.use_ema and self.model is not None:
|
||||
# self._ema = ModelEMA(self.model, self.ema_decay)
|
||||
# return self._ema
|
||||
|
||||
@property
|
||||
def ema(self, ) -> nn.Module:
|
||||
return self._ema
|
||||
|
||||
@ema.setter
|
||||
def ema(self, obj):
|
||||
self._ema = obj
|
||||
|
||||
|
||||
@property
|
||||
def scaler(self) -> GradScaler:
|
||||
if self._scaler is None and self.use_amp and torch.cuda.is_available():
|
||||
self._scaler = GradScaler()
|
||||
return self._scaler
|
||||
|
||||
@scaler.setter
|
||||
def scaler(self, obj: GradScaler):
|
||||
self._scaler = obj
|
||||
|
||||
|
||||
@property
|
||||
def val_shuffle(self):
|
||||
if self._val_shuffle is None:
|
||||
print('warning: set default val_shuffle=False')
|
||||
return False
|
||||
return self._val_shuffle
|
||||
|
||||
@val_shuffle.setter
|
||||
def val_shuffle(self, shuffle):
|
||||
assert isinstance(shuffle, bool), 'shuffle must be bool'
|
||||
self._val_shuffle = shuffle
|
||||
|
||||
@property
|
||||
def train_shuffle(self):
|
||||
if self._train_shuffle is None:
|
||||
print('warning: set default train_shuffle=True')
|
||||
return True
|
||||
return self._train_shuffle
|
||||
|
||||
@train_shuffle.setter
|
||||
def train_shuffle(self, shuffle):
|
||||
assert isinstance(shuffle, bool), 'shuffle must be bool'
|
||||
self._train_shuffle = shuffle
|
||||
|
||||
|
||||
@property
|
||||
def train_batch_size(self):
|
||||
if self._train_batch_size is None and isinstance(self.batch_size, int):
|
||||
print(f'warning: set train_batch_size=batch_size={self.batch_size}')
|
||||
return self.batch_size
|
||||
return self._train_batch_size
|
||||
|
||||
@train_batch_size.setter
|
||||
def train_batch_size(self, batch_size):
|
||||
assert isinstance(batch_size, int), 'batch_size must be int'
|
||||
self._train_batch_size = batch_size
|
||||
|
||||
@property
|
||||
def val_batch_size(self):
|
||||
if self._val_batch_size is None:
|
||||
print(f'warning: set val_batch_size=batch_size={self.batch_size}')
|
||||
return self.batch_size
|
||||
return self._val_batch_size
|
||||
|
||||
@val_batch_size.setter
|
||||
def val_batch_size(self, batch_size):
|
||||
assert isinstance(batch_size, int), 'batch_size must be int'
|
||||
self._val_batch_size = batch_size
|
||||
|
||||
|
||||
@property
|
||||
def output_dir(self):
|
||||
if self._output_dir is None:
|
||||
return self.log_dir
|
||||
return self._output_dir
|
||||
|
||||
@output_dir.setter
|
||||
def output_dir(self, root):
|
||||
self._output_dir = root
|
||||
|
||||
@property
|
||||
def print_freq(self):
|
||||
if self._print_freq is None:
|
||||
# self._print_freq = self.log_step
|
||||
return self.log_step
|
||||
return self._print_freq
|
||||
|
||||
@print_freq.setter
|
||||
def print_freq(self, n):
|
||||
assert isinstance(n, int), 'print_freq must be int'
|
||||
self._print_freq = n
|
||||
|
||||
|
||||
# def __repr__(self) -> str:
|
||||
# pass
|
||||
|
||||
|
||||
|
||||
152
rtdetr_pytorch/src/core/yaml_config.py
Normal file
152
rtdetr_pytorch/src/core/yaml_config.py
Normal file
@@ -0,0 +1,152 @@
|
||||
"""by lyuwenyu
|
||||
"""
|
||||
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
|
||||
import re
|
||||
import copy
|
||||
|
||||
from .config import BaseConfig
|
||||
from .yaml_utils import load_config, merge_config, create, merge_dict
|
||||
|
||||
|
||||
class YAMLConfig(BaseConfig):
|
||||
def __init__(self, cfg_path: str, **kwargs) -> None:
|
||||
super().__init__()
|
||||
|
||||
cfg = load_config(cfg_path)
|
||||
merge_dict(cfg, kwargs)
|
||||
|
||||
# pprint(cfg)
|
||||
|
||||
self.yaml_cfg = cfg
|
||||
|
||||
self.log_step = cfg.get('log_step', 100)
|
||||
self.checkpoint_step = cfg.get('checkpoint_step', 1)
|
||||
self.epoches = cfg.get('epoches', -1)
|
||||
self.resume = cfg.get('resume', '')
|
||||
self.tuning = cfg.get('tuning', '')
|
||||
self.sync_bn = cfg.get('sync_bn', False)
|
||||
self.output_dir = cfg.get('output_dir', None)
|
||||
|
||||
self.use_ema = cfg.get('use_ema', False)
|
||||
self.use_amp = cfg.get('use_amp', False)
|
||||
self.autocast = cfg.get('autocast', dict())
|
||||
self.find_unused_parameters = cfg.get('find_unused_parameters', None)
|
||||
self.clip_max_norm = cfg.get('clip_max_norm', 0.)
|
||||
|
||||
|
||||
@property
|
||||
def model(self, ) -> torch.nn.Module:
|
||||
if self._model is None and 'model' in self.yaml_cfg:
|
||||
merge_config(self.yaml_cfg)
|
||||
self._model = create(self.yaml_cfg['model'])
|
||||
return self._model
|
||||
|
||||
@property
|
||||
def postprocessor(self, ) -> torch.nn.Module:
|
||||
if self._postprocessor is None and 'postprocessor' in self.yaml_cfg:
|
||||
merge_config(self.yaml_cfg)
|
||||
self._postprocessor = create(self.yaml_cfg['postprocessor'])
|
||||
return self._postprocessor
|
||||
|
||||
@property
|
||||
def criterion(self, ):
|
||||
if self._criterion is None and 'criterion' in self.yaml_cfg:
|
||||
merge_config(self.yaml_cfg)
|
||||
self._criterion = create(self.yaml_cfg['criterion'])
|
||||
return self._criterion
|
||||
|
||||
|
||||
@property
|
||||
def optimizer(self, ):
|
||||
if self._optimizer is None and 'optimizer' in self.yaml_cfg:
|
||||
merge_config(self.yaml_cfg)
|
||||
params = self.get_optim_params(self.yaml_cfg['optimizer'], self.model)
|
||||
self._optimizer = create('optimizer', params=params)
|
||||
|
||||
return self._optimizer
|
||||
|
||||
@property
|
||||
def lr_scheduler(self, ):
|
||||
if self._lr_scheduler is None and 'lr_scheduler' in self.yaml_cfg:
|
||||
merge_config(self.yaml_cfg)
|
||||
self._lr_scheduler = create('lr_scheduler', optimizer=self.optimizer)
|
||||
print('Initial lr: ', self._lr_scheduler.get_last_lr())
|
||||
|
||||
return self._lr_scheduler
|
||||
|
||||
@property
|
||||
def train_dataloader(self, ):
|
||||
if self._train_dataloader is None and 'train_dataloader' in self.yaml_cfg:
|
||||
merge_config(self.yaml_cfg)
|
||||
self._train_dataloader = create('train_dataloader')
|
||||
self._train_dataloader.shuffle = self.yaml_cfg['train_dataloader'].get('shuffle', False)
|
||||
|
||||
return self._train_dataloader
|
||||
|
||||
@property
|
||||
def val_dataloader(self, ):
|
||||
if self._val_dataloader is None and 'val_dataloader' in self.yaml_cfg:
|
||||
merge_config(self.yaml_cfg)
|
||||
self._val_dataloader = create('val_dataloader')
|
||||
self._val_dataloader.shuffle = self.yaml_cfg['val_dataloader'].get('shuffle', False)
|
||||
|
||||
return self._val_dataloader
|
||||
|
||||
|
||||
@property
|
||||
def ema(self, ):
|
||||
if self._ema is None and self.yaml_cfg.get('use_ema', False):
|
||||
merge_config(self.yaml_cfg)
|
||||
self._ema = create('ema', model=self.model)
|
||||
|
||||
return self._ema
|
||||
|
||||
|
||||
@property
|
||||
def scaler(self, ):
|
||||
if self._scaler is None and self.yaml_cfg.get('use_amp', False):
|
||||
merge_config(self.yaml_cfg)
|
||||
self._scaler = create('scaler')
|
||||
|
||||
return self._scaler
|
||||
|
||||
|
||||
@staticmethod
|
||||
def get_optim_params(cfg: dict, model: nn.Module):
|
||||
'''
|
||||
E.g.:
|
||||
^(?=.*a)(?=.*b).*$ means including a and b
|
||||
^((?!b.)*a((?!b).)*$ means including a but not b
|
||||
^((?!b|c).)*a((?!b|c).)*$ means including a but not (b | c)
|
||||
'''
|
||||
assert 'type' in cfg, ''
|
||||
cfg = copy.deepcopy(cfg)
|
||||
|
||||
if 'params' not in cfg:
|
||||
return model.parameters()
|
||||
|
||||
assert isinstance(cfg['params'], list), ''
|
||||
|
||||
param_groups = []
|
||||
visited = []
|
||||
for pg in cfg['params']:
|
||||
pattern = pg['params']
|
||||
params = {k: v for k, v in model.named_parameters() if v.requires_grad and len(re.findall(pattern, k)) > 0}
|
||||
pg['params'] = params.values()
|
||||
param_groups.append(pg)
|
||||
visited.extend(list(params.keys()))
|
||||
|
||||
names = [k for k, v in model.named_parameters() if v.requires_grad]
|
||||
|
||||
if len(visited) < len(names):
|
||||
unseen = set(names) - set(visited)
|
||||
params = {k: v for k, v in model.named_parameters() if v.requires_grad and k in unseen}
|
||||
param_groups.append({'params': params.values()})
|
||||
visited.extend(list(params.keys()))
|
||||
|
||||
assert len(visited) == len(names), ''
|
||||
|
||||
return param_groups
|
||||
208
rtdetr_pytorch/src/core/yaml_utils.py
Normal file
208
rtdetr_pytorch/src/core/yaml_utils.py
Normal file
@@ -0,0 +1,208 @@
|
||||
""""by lyuwenyu
|
||||
"""
|
||||
|
||||
import os
|
||||
import yaml
|
||||
import inspect
|
||||
import importlib
|
||||
|
||||
__all__ = ['GLOBAL_CONFIG', 'register', 'create', 'load_config', 'merge_config', 'merge_dict']
|
||||
|
||||
|
||||
GLOBAL_CONFIG = dict()
|
||||
INCLUDE_KEY = '__include__'
|
||||
|
||||
|
||||
def register(cls: type):
|
||||
'''
|
||||
Args:
|
||||
cls (type): Module class to be registered.
|
||||
'''
|
||||
if cls.__name__ in GLOBAL_CONFIG:
|
||||
raise ValueError('{} already registered'.format(cls.__name__))
|
||||
|
||||
if inspect.isfunction(cls):
|
||||
GLOBAL_CONFIG[cls.__name__] = cls
|
||||
|
||||
elif inspect.isclass(cls):
|
||||
GLOBAL_CONFIG[cls.__name__] = extract_schema(cls)
|
||||
|
||||
else:
|
||||
raise ValueError(f'register {cls}')
|
||||
|
||||
return cls
|
||||
|
||||
|
||||
def extract_schema(cls: type):
|
||||
'''
|
||||
Args:
|
||||
cls (type),
|
||||
Return:
|
||||
Dict,
|
||||
'''
|
||||
argspec = inspect.getfullargspec(cls.__init__)
|
||||
arg_names = [arg for arg in argspec.args if arg != 'self']
|
||||
num_defualts = len(argspec.defaults) if argspec.defaults is not None else 0
|
||||
num_requires = len(arg_names) - num_defualts
|
||||
|
||||
schame = dict()
|
||||
schame['_name'] = cls.__name__
|
||||
schame['_pymodule'] = importlib.import_module(cls.__module__)
|
||||
schame['_inject'] = getattr(cls, '__inject__', [])
|
||||
schame['_share'] = getattr(cls, '__share__', [])
|
||||
|
||||
for i, name in enumerate(arg_names):
|
||||
if name in schame['_share']:
|
||||
assert i >= num_requires, 'share config must have default value.'
|
||||
value = argspec.defaults[i - num_requires]
|
||||
|
||||
elif i >= num_requires:
|
||||
value = argspec.defaults[i - num_requires]
|
||||
|
||||
else:
|
||||
value = None
|
||||
|
||||
schame[name] = value
|
||||
|
||||
return schame
|
||||
|
||||
|
||||
|
||||
def create(type_or_name, **kwargs):
|
||||
'''
|
||||
'''
|
||||
assert type(type_or_name) in (type, str), 'create should be class or name.'
|
||||
|
||||
name = type_or_name if isinstance(type_or_name, str) else type_or_name.__name__
|
||||
|
||||
if name in GLOBAL_CONFIG:
|
||||
if hasattr(GLOBAL_CONFIG[name], '__dict__'):
|
||||
return GLOBAL_CONFIG[name]
|
||||
else:
|
||||
raise ValueError('The module {} is not registered'.format(name))
|
||||
|
||||
cfg = GLOBAL_CONFIG[name]
|
||||
|
||||
if isinstance(cfg, dict) and 'type' in cfg:
|
||||
_cfg: dict = GLOBAL_CONFIG[cfg['type']]
|
||||
_cfg.update(cfg) # update global cls default args
|
||||
_cfg.update(kwargs) # TODO
|
||||
name = _cfg.pop('type')
|
||||
|
||||
return create(name)
|
||||
|
||||
|
||||
cls = getattr(cfg['_pymodule'], name)
|
||||
argspec = inspect.getfullargspec(cls.__init__)
|
||||
arg_names = [arg for arg in argspec.args if arg != 'self']
|
||||
|
||||
cls_kwargs = {}
|
||||
cls_kwargs.update(cfg)
|
||||
|
||||
# shared var
|
||||
for k in cfg['_share']:
|
||||
if k in GLOBAL_CONFIG:
|
||||
cls_kwargs[k] = GLOBAL_CONFIG[k]
|
||||
else:
|
||||
cls_kwargs[k] = cfg[k]
|
||||
|
||||
# inject
|
||||
for k in cfg['_inject']:
|
||||
_k = cfg[k]
|
||||
|
||||
if _k is None:
|
||||
continue
|
||||
|
||||
if isinstance(_k, str):
|
||||
if _k not in GLOBAL_CONFIG:
|
||||
raise ValueError(f'Missing inject config of {_k}.')
|
||||
|
||||
_cfg = GLOBAL_CONFIG[_k]
|
||||
|
||||
if isinstance(_cfg, dict):
|
||||
cls_kwargs[k] = create(_cfg['_name'])
|
||||
else:
|
||||
cls_kwargs[k] = _cfg
|
||||
|
||||
elif isinstance(_k, dict):
|
||||
if 'type' not in _k.keys():
|
||||
raise ValueError(f'Missing inject for `type` style.')
|
||||
|
||||
_type = str(_k['type'])
|
||||
if _type not in GLOBAL_CONFIG:
|
||||
raise ValueError(f'Missing {_type} in inspect stage.')
|
||||
|
||||
# TODO modified inspace, maybe get wrong result for using `> 1`
|
||||
_cfg: dict = GLOBAL_CONFIG[_type]
|
||||
# _cfg_copy = copy.deepcopy(_cfg)
|
||||
_cfg.update(_k) # update
|
||||
cls_kwargs[k] = create(_type)
|
||||
# _cfg.update(_cfg_copy) # resume
|
||||
|
||||
else:
|
||||
raise ValueError(f'Inject does not support {_k}')
|
||||
|
||||
|
||||
cls_kwargs = {n: cls_kwargs[n] for n in arg_names}
|
||||
|
||||
return cls(**cls_kwargs)
|
||||
|
||||
|
||||
|
||||
def load_config(file_path, cfg=dict()):
|
||||
'''load config
|
||||
'''
|
||||
_, ext = os.path.splitext(file_path)
|
||||
assert ext in ['.yml', '.yaml'], "only support yaml files for now"
|
||||
|
||||
with open(file_path) as f:
|
||||
file_cfg = yaml.load(f, Loader=yaml.Loader)
|
||||
if file_cfg is None:
|
||||
return {}
|
||||
|
||||
if INCLUDE_KEY in file_cfg:
|
||||
base_yamls = list(file_cfg[INCLUDE_KEY])
|
||||
for base_yaml in base_yamls:
|
||||
if base_yaml.startswith('~'):
|
||||
base_yaml = os.path.expanduser(base_yaml)
|
||||
|
||||
if not base_yaml.startswith('/'):
|
||||
base_yaml = os.path.join(os.path.dirname(file_path), base_yaml)
|
||||
|
||||
with open(base_yaml) as f:
|
||||
base_cfg = load_config(base_yaml, cfg)
|
||||
merge_config(base_cfg, cfg)
|
||||
|
||||
return merge_config(file_cfg, cfg)
|
||||
|
||||
|
||||
|
||||
def merge_dict(dct, another_dct):
|
||||
'''merge another_dct into dct
|
||||
'''
|
||||
for k in another_dct:
|
||||
if (k in dct and isinstance(dct[k], dict) and isinstance(another_dct[k], dict)):
|
||||
merge_dict(dct[k], another_dct[k])
|
||||
else:
|
||||
dct[k] = another_dct[k]
|
||||
|
||||
return dct
|
||||
|
||||
|
||||
|
||||
def merge_config(config, another_cfg=None):
|
||||
"""
|
||||
Merge config into global config or another_cfg.
|
||||
|
||||
Args:
|
||||
config (dict): Config to be merged.
|
||||
|
||||
Returns: global config
|
||||
"""
|
||||
global GLOBAL_CONFIG
|
||||
dct = GLOBAL_CONFIG if another_cfg is None else another_cfg
|
||||
|
||||
return merge_dict(dct, config)
|
||||
|
||||
|
||||
|
||||
7
rtdetr_pytorch/src/data/__init__.py
Normal file
7
rtdetr_pytorch/src/data/__init__.py
Normal file
@@ -0,0 +1,7 @@
|
||||
|
||||
from .coco import *
|
||||
from .cifar10 import CIFAR10
|
||||
|
||||
from .dataloader import *
|
||||
from .transforms import *
|
||||
|
||||
14
rtdetr_pytorch/src/data/cifar10/__init__.py
Normal file
14
rtdetr_pytorch/src/data/cifar10/__init__.py
Normal file
@@ -0,0 +1,14 @@
|
||||
|
||||
import torchvision
|
||||
from typing import Optional, Callable
|
||||
|
||||
from src.core import register
|
||||
|
||||
|
||||
@register
|
||||
class CIFAR10(torchvision.datasets.CIFAR10):
|
||||
__inject__ = ['transform', 'target_transform']
|
||||
|
||||
def __init__(self, root: str, train: bool = True, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = False) -> None:
|
||||
super().__init__(root, train, transform, target_transform, download)
|
||||
|
||||
9
rtdetr_pytorch/src/data/coco/__init__.py
Normal file
9
rtdetr_pytorch/src/data/coco/__init__.py
Normal file
@@ -0,0 +1,9 @@
|
||||
from .coco_dataset import (
|
||||
CocoDetection,
|
||||
mscoco_category2label,
|
||||
mscoco_label2category,
|
||||
mscoco_category2name,
|
||||
)
|
||||
from .coco_eval import *
|
||||
|
||||
from .coco_utils import get_coco_api_from_dataset
|
||||
238
rtdetr_pytorch/src/data/coco/coco_dataset.py
Normal file
238
rtdetr_pytorch/src/data/coco/coco_dataset.py
Normal file
@@ -0,0 +1,238 @@
|
||||
"""
|
||||
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
|
||||
|
||||
COCO dataset which returns image_id for evaluation.
|
||||
Mostly copy-paste from https://github.com/pytorch/vision/blob/13b35ff/references/detection/coco_utils.py
|
||||
"""
|
||||
|
||||
import torch
|
||||
import torch.utils.data
|
||||
|
||||
import torchvision
|
||||
torchvision.disable_beta_transforms_warning()
|
||||
|
||||
from torchvision import datapoints
|
||||
|
||||
from pycocotools import mask as coco_mask
|
||||
|
||||
from src.core import register
|
||||
|
||||
__all__ = ['CocoDetection']
|
||||
|
||||
|
||||
@register
|
||||
class CocoDetection(torchvision.datasets.CocoDetection):
|
||||
__inject__ = ['transforms']
|
||||
__share__ = ['remap_mscoco_category']
|
||||
|
||||
def __init__(self, img_folder, ann_file, transforms, return_masks, remap_mscoco_category=False):
|
||||
super(CocoDetection, self).__init__(img_folder, ann_file)
|
||||
self._transforms = transforms
|
||||
self.prepare = ConvertCocoPolysToMask(return_masks, remap_mscoco_category)
|
||||
self.img_folder = img_folder
|
||||
self.ann_file = ann_file
|
||||
self.return_masks = return_masks
|
||||
self.remap_mscoco_category = remap_mscoco_category
|
||||
|
||||
def __getitem__(self, idx):
|
||||
img, target = super(CocoDetection, self).__getitem__(idx)
|
||||
image_id = self.ids[idx]
|
||||
target = {'image_id': image_id, 'annotations': target}
|
||||
img, target = self.prepare(img, target)
|
||||
|
||||
# ['boxes', 'masks', 'labels']:
|
||||
if 'boxes' in target:
|
||||
target['boxes'] = datapoints.BoundingBox(
|
||||
target['boxes'],
|
||||
format=datapoints.BoundingBoxFormat.XYXY,
|
||||
spatial_size=img.size[::-1]) # h w
|
||||
|
||||
if 'masks' in target:
|
||||
target['masks'] = datapoints.Mask(target['masks'])
|
||||
|
||||
if self._transforms is not None:
|
||||
img, target = self._transforms(img, target)
|
||||
|
||||
return img, target
|
||||
|
||||
def extra_repr(self) -> str:
|
||||
s = f' img_folder: {self.img_folder}\n ann_file: {self.ann_file}\n'
|
||||
s += f' return_masks: {self.return_masks}\n'
|
||||
if hasattr(self, '_transforms') and self._transforms is not None:
|
||||
s += f' transforms:\n {repr(self._transforms)}'
|
||||
|
||||
return s
|
||||
|
||||
|
||||
def convert_coco_poly_to_mask(segmentations, height, width):
|
||||
masks = []
|
||||
for polygons in segmentations:
|
||||
rles = coco_mask.frPyObjects(polygons, height, width)
|
||||
mask = coco_mask.decode(rles)
|
||||
if len(mask.shape) < 3:
|
||||
mask = mask[..., None]
|
||||
mask = torch.as_tensor(mask, dtype=torch.uint8)
|
||||
mask = mask.any(dim=2)
|
||||
masks.append(mask)
|
||||
if masks:
|
||||
masks = torch.stack(masks, dim=0)
|
||||
else:
|
||||
masks = torch.zeros((0, height, width), dtype=torch.uint8)
|
||||
return masks
|
||||
|
||||
|
||||
class ConvertCocoPolysToMask(object):
|
||||
def __init__(self, return_masks=False, remap_mscoco_category=False):
|
||||
self.return_masks = return_masks
|
||||
self.remap_mscoco_category = remap_mscoco_category
|
||||
|
||||
def __call__(self, image, target):
|
||||
w, h = image.size
|
||||
|
||||
image_id = target["image_id"]
|
||||
image_id = torch.tensor([image_id])
|
||||
|
||||
anno = target["annotations"]
|
||||
|
||||
anno = [obj for obj in anno if 'iscrowd' not in obj or obj['iscrowd'] == 0]
|
||||
|
||||
boxes = [obj["bbox"] for obj in anno]
|
||||
# guard against no boxes via resizing
|
||||
boxes = torch.as_tensor(boxes, dtype=torch.float32).reshape(-1, 4)
|
||||
boxes[:, 2:] += boxes[:, :2]
|
||||
boxes[:, 0::2].clamp_(min=0, max=w)
|
||||
boxes[:, 1::2].clamp_(min=0, max=h)
|
||||
|
||||
if self.remap_mscoco_category:
|
||||
classes = [mscoco_category2label[obj["category_id"]] for obj in anno]
|
||||
else:
|
||||
classes = [obj["category_id"] for obj in anno]
|
||||
|
||||
classes = torch.tensor(classes, dtype=torch.int64)
|
||||
|
||||
if self.return_masks:
|
||||
segmentations = [obj["segmentation"] for obj in anno]
|
||||
masks = convert_coco_poly_to_mask(segmentations, h, w)
|
||||
|
||||
keypoints = None
|
||||
if anno and "keypoints" in anno[0]:
|
||||
keypoints = [obj["keypoints"] for obj in anno]
|
||||
keypoints = torch.as_tensor(keypoints, dtype=torch.float32)
|
||||
num_keypoints = keypoints.shape[0]
|
||||
if num_keypoints:
|
||||
keypoints = keypoints.view(num_keypoints, -1, 3)
|
||||
|
||||
keep = (boxes[:, 3] > boxes[:, 1]) & (boxes[:, 2] > boxes[:, 0])
|
||||
boxes = boxes[keep]
|
||||
classes = classes[keep]
|
||||
if self.return_masks:
|
||||
masks = masks[keep]
|
||||
if keypoints is not None:
|
||||
keypoints = keypoints[keep]
|
||||
|
||||
target = {}
|
||||
target["boxes"] = boxes
|
||||
target["labels"] = classes
|
||||
if self.return_masks:
|
||||
target["masks"] = masks
|
||||
target["image_id"] = image_id
|
||||
if keypoints is not None:
|
||||
target["keypoints"] = keypoints
|
||||
|
||||
# for conversion to coco api
|
||||
area = torch.tensor([obj["area"] for obj in anno])
|
||||
iscrowd = torch.tensor([obj["iscrowd"] if "iscrowd" in obj else 0 for obj in anno])
|
||||
target["area"] = area[keep]
|
||||
target["iscrowd"] = iscrowd[keep]
|
||||
|
||||
target["orig_size"] = torch.as_tensor([int(w), int(h)])
|
||||
target["size"] = torch.as_tensor([int(w), int(h)])
|
||||
|
||||
return image, target
|
||||
|
||||
|
||||
mscoco_category2name = {
|
||||
1: 'person',
|
||||
2: 'bicycle',
|
||||
3: 'car',
|
||||
4: 'motorcycle',
|
||||
5: 'airplane',
|
||||
6: 'bus',
|
||||
7: 'train',
|
||||
8: 'truck',
|
||||
9: 'boat',
|
||||
10: 'traffic light',
|
||||
11: 'fire hydrant',
|
||||
13: 'stop sign',
|
||||
14: 'parking meter',
|
||||
15: 'bench',
|
||||
16: 'bird',
|
||||
17: 'cat',
|
||||
18: 'dog',
|
||||
19: 'horse',
|
||||
20: 'sheep',
|
||||
21: 'cow',
|
||||
22: 'elephant',
|
||||
23: 'bear',
|
||||
24: 'zebra',
|
||||
25: 'giraffe',
|
||||
27: 'backpack',
|
||||
28: 'umbrella',
|
||||
31: 'handbag',
|
||||
32: 'tie',
|
||||
33: 'suitcase',
|
||||
34: 'frisbee',
|
||||
35: 'skis',
|
||||
36: 'snowboard',
|
||||
37: 'sports ball',
|
||||
38: 'kite',
|
||||
39: 'baseball bat',
|
||||
40: 'baseball glove',
|
||||
41: 'skateboard',
|
||||
42: 'surfboard',
|
||||
43: 'tennis racket',
|
||||
44: 'bottle',
|
||||
46: 'wine glass',
|
||||
47: 'cup',
|
||||
48: 'fork',
|
||||
49: 'knife',
|
||||
50: 'spoon',
|
||||
51: 'bowl',
|
||||
52: 'banana',
|
||||
53: 'apple',
|
||||
54: 'sandwich',
|
||||
55: 'orange',
|
||||
56: 'broccoli',
|
||||
57: 'carrot',
|
||||
58: 'hot dog',
|
||||
59: 'pizza',
|
||||
60: 'donut',
|
||||
61: 'cake',
|
||||
62: 'chair',
|
||||
63: 'couch',
|
||||
64: 'potted plant',
|
||||
65: 'bed',
|
||||
67: 'dining table',
|
||||
70: 'toilet',
|
||||
72: 'tv',
|
||||
73: 'laptop',
|
||||
74: 'mouse',
|
||||
75: 'remote',
|
||||
76: 'keyboard',
|
||||
77: 'cell phone',
|
||||
78: 'microwave',
|
||||
79: 'oven',
|
||||
80: 'toaster',
|
||||
81: 'sink',
|
||||
82: 'refrigerator',
|
||||
84: 'book',
|
||||
85: 'clock',
|
||||
86: 'vase',
|
||||
87: 'scissors',
|
||||
88: 'teddy bear',
|
||||
89: 'hair drier',
|
||||
90: 'toothbrush'
|
||||
}
|
||||
|
||||
mscoco_category2label = {k: i for i, k in enumerate(mscoco_category2name.keys())}
|
||||
mscoco_label2category = {v: k for k, v in mscoco_category2label.items()}
|
||||
269
rtdetr_pytorch/src/data/coco/coco_eval.py
Normal file
269
rtdetr_pytorch/src/data/coco/coco_eval.py
Normal file
@@ -0,0 +1,269 @@
|
||||
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
|
||||
"""
|
||||
COCO evaluator that works in distributed mode.
|
||||
|
||||
Mostly copy-paste from https://github.com/pytorch/vision/blob/edfd5a7/references/detection/coco_eval.py
|
||||
The difference is that there is less copy-pasting from pycocotools
|
||||
in the end of the file, as python3 can suppress prints with contextlib
|
||||
"""
|
||||
import os
|
||||
import contextlib
|
||||
import copy
|
||||
import numpy as np
|
||||
import torch
|
||||
|
||||
from pycocotools.cocoeval import COCOeval
|
||||
from pycocotools.coco import COCO
|
||||
import pycocotools.mask as mask_util
|
||||
|
||||
from src.misc import dist
|
||||
|
||||
|
||||
__all__ = ['CocoEvaluator',]
|
||||
|
||||
|
||||
class CocoEvaluator(object):
|
||||
def __init__(self, coco_gt, iou_types):
|
||||
assert isinstance(iou_types, (list, tuple))
|
||||
coco_gt = copy.deepcopy(coco_gt)
|
||||
self.coco_gt = coco_gt
|
||||
|
||||
self.iou_types = iou_types
|
||||
self.coco_eval = {}
|
||||
for iou_type in iou_types:
|
||||
self.coco_eval[iou_type] = COCOeval(coco_gt, iouType=iou_type)
|
||||
|
||||
self.img_ids = []
|
||||
self.eval_imgs = {k: [] for k in iou_types}
|
||||
|
||||
def update(self, predictions):
|
||||
img_ids = list(np.unique(list(predictions.keys())))
|
||||
self.img_ids.extend(img_ids)
|
||||
|
||||
for iou_type in self.iou_types:
|
||||
results = self.prepare(predictions, iou_type)
|
||||
|
||||
# suppress pycocotools prints
|
||||
with open(os.devnull, 'w') as devnull:
|
||||
with contextlib.redirect_stdout(devnull):
|
||||
coco_dt = COCO.loadRes(self.coco_gt, results) if results else COCO()
|
||||
coco_eval = self.coco_eval[iou_type]
|
||||
|
||||
coco_eval.cocoDt = coco_dt
|
||||
coco_eval.params.imgIds = list(img_ids)
|
||||
img_ids, eval_imgs = evaluate(coco_eval)
|
||||
|
||||
self.eval_imgs[iou_type].append(eval_imgs)
|
||||
|
||||
def synchronize_between_processes(self):
|
||||
for iou_type in self.iou_types:
|
||||
self.eval_imgs[iou_type] = np.concatenate(self.eval_imgs[iou_type], 2)
|
||||
create_common_coco_eval(self.coco_eval[iou_type], self.img_ids, self.eval_imgs[iou_type])
|
||||
|
||||
def accumulate(self):
|
||||
for coco_eval in self.coco_eval.values():
|
||||
coco_eval.accumulate()
|
||||
|
||||
def summarize(self):
|
||||
for iou_type, coco_eval in self.coco_eval.items():
|
||||
print("IoU metric: {}".format(iou_type))
|
||||
coco_eval.summarize()
|
||||
|
||||
def prepare(self, predictions, iou_type):
|
||||
if iou_type == "bbox":
|
||||
return self.prepare_for_coco_detection(predictions)
|
||||
elif iou_type == "segm":
|
||||
return self.prepare_for_coco_segmentation(predictions)
|
||||
elif iou_type == "keypoints":
|
||||
return self.prepare_for_coco_keypoint(predictions)
|
||||
else:
|
||||
raise ValueError("Unknown iou type {}".format(iou_type))
|
||||
|
||||
def prepare_for_coco_detection(self, predictions):
|
||||
coco_results = []
|
||||
for original_id, prediction in predictions.items():
|
||||
if len(prediction) == 0:
|
||||
continue
|
||||
|
||||
boxes = prediction["boxes"]
|
||||
boxes = convert_to_xywh(boxes).tolist()
|
||||
scores = prediction["scores"].tolist()
|
||||
labels = prediction["labels"].tolist()
|
||||
|
||||
coco_results.extend(
|
||||
[
|
||||
{
|
||||
"image_id": original_id,
|
||||
"category_id": labels[k],
|
||||
"bbox": box,
|
||||
"score": scores[k],
|
||||
}
|
||||
for k, box in enumerate(boxes)
|
||||
]
|
||||
)
|
||||
return coco_results
|
||||
|
||||
def prepare_for_coco_segmentation(self, predictions):
|
||||
coco_results = []
|
||||
for original_id, prediction in predictions.items():
|
||||
if len(prediction) == 0:
|
||||
continue
|
||||
|
||||
scores = prediction["scores"]
|
||||
labels = prediction["labels"]
|
||||
masks = prediction["masks"]
|
||||
|
||||
masks = masks > 0.5
|
||||
|
||||
scores = prediction["scores"].tolist()
|
||||
labels = prediction["labels"].tolist()
|
||||
|
||||
rles = [
|
||||
mask_util.encode(np.array(mask[0, :, :, np.newaxis], dtype=np.uint8, order="F"))[0]
|
||||
for mask in masks
|
||||
]
|
||||
for rle in rles:
|
||||
rle["counts"] = rle["counts"].decode("utf-8")
|
||||
|
||||
coco_results.extend(
|
||||
[
|
||||
{
|
||||
"image_id": original_id,
|
||||
"category_id": labels[k],
|
||||
"segmentation": rle,
|
||||
"score": scores[k],
|
||||
}
|
||||
for k, rle in enumerate(rles)
|
||||
]
|
||||
)
|
||||
return coco_results
|
||||
|
||||
def prepare_for_coco_keypoint(self, predictions):
|
||||
coco_results = []
|
||||
for original_id, prediction in predictions.items():
|
||||
if len(prediction) == 0:
|
||||
continue
|
||||
|
||||
boxes = prediction["boxes"]
|
||||
boxes = convert_to_xywh(boxes).tolist()
|
||||
scores = prediction["scores"].tolist()
|
||||
labels = prediction["labels"].tolist()
|
||||
keypoints = prediction["keypoints"]
|
||||
keypoints = keypoints.flatten(start_dim=1).tolist()
|
||||
|
||||
coco_results.extend(
|
||||
[
|
||||
{
|
||||
"image_id": original_id,
|
||||
"category_id": labels[k],
|
||||
'keypoints': keypoint,
|
||||
"score": scores[k],
|
||||
}
|
||||
for k, keypoint in enumerate(keypoints)
|
||||
]
|
||||
)
|
||||
return coco_results
|
||||
|
||||
|
||||
def convert_to_xywh(boxes):
|
||||
xmin, ymin, xmax, ymax = boxes.unbind(1)
|
||||
return torch.stack((xmin, ymin, xmax - xmin, ymax - ymin), dim=1)
|
||||
|
||||
|
||||
def merge(img_ids, eval_imgs):
|
||||
all_img_ids = dist.all_gather(img_ids)
|
||||
all_eval_imgs = dist.all_gather(eval_imgs)
|
||||
|
||||
merged_img_ids = []
|
||||
for p in all_img_ids:
|
||||
merged_img_ids.extend(p)
|
||||
|
||||
merged_eval_imgs = []
|
||||
for p in all_eval_imgs:
|
||||
merged_eval_imgs.append(p)
|
||||
|
||||
merged_img_ids = np.array(merged_img_ids)
|
||||
merged_eval_imgs = np.concatenate(merged_eval_imgs, 2)
|
||||
|
||||
# keep only unique (and in sorted order) images
|
||||
merged_img_ids, idx = np.unique(merged_img_ids, return_index=True)
|
||||
merged_eval_imgs = merged_eval_imgs[..., idx]
|
||||
|
||||
return merged_img_ids, merged_eval_imgs
|
||||
|
||||
|
||||
def create_common_coco_eval(coco_eval, img_ids, eval_imgs):
|
||||
img_ids, eval_imgs = merge(img_ids, eval_imgs)
|
||||
img_ids = list(img_ids)
|
||||
eval_imgs = list(eval_imgs.flatten())
|
||||
|
||||
coco_eval.evalImgs = eval_imgs
|
||||
coco_eval.params.imgIds = img_ids
|
||||
coco_eval._paramsEval = copy.deepcopy(coco_eval.params)
|
||||
|
||||
|
||||
#################################################################
|
||||
# From pycocotools, just removed the prints and fixed
|
||||
# a Python3 bug about unicode not defined
|
||||
#################################################################
|
||||
|
||||
|
||||
# import io
|
||||
# from contextlib import redirect_stdout
|
||||
# def evaluate(imgs):
|
||||
# with redirect_stdout(io.StringIO()):
|
||||
# imgs.evaluate()
|
||||
# return imgs.params.imgIds, np.asarray(imgs.evalImgs).reshape(-1, len(imgs.params.areaRng), len(imgs.params.imgIds))
|
||||
|
||||
|
||||
def evaluate(self):
|
||||
'''
|
||||
Run per image evaluation on given images and store results (a list of dict) in self.evalImgs
|
||||
:return: None
|
||||
'''
|
||||
# tic = time.time()
|
||||
# print('Running per image evaluation...')
|
||||
p = self.params
|
||||
# add backward compatibility if useSegm is specified in params
|
||||
if p.useSegm is not None:
|
||||
p.iouType = 'segm' if p.useSegm == 1 else 'bbox'
|
||||
print('useSegm (deprecated) is not None. Running {} evaluation'.format(p.iouType))
|
||||
# print('Evaluate annotation type *{}*'.format(p.iouType))
|
||||
p.imgIds = list(np.unique(p.imgIds))
|
||||
if p.useCats:
|
||||
p.catIds = list(np.unique(p.catIds))
|
||||
p.maxDets = sorted(p.maxDets)
|
||||
self.params = p
|
||||
|
||||
self._prepare()
|
||||
# loop through images, area range, max detection number
|
||||
catIds = p.catIds if p.useCats else [-1]
|
||||
|
||||
if p.iouType == 'segm' or p.iouType == 'bbox':
|
||||
computeIoU = self.computeIoU
|
||||
elif p.iouType == 'keypoints':
|
||||
computeIoU = self.computeOks
|
||||
self.ious = {
|
||||
(imgId, catId): computeIoU(imgId, catId)
|
||||
for imgId in p.imgIds
|
||||
for catId in catIds}
|
||||
|
||||
evaluateImg = self.evaluateImg
|
||||
maxDet = p.maxDets[-1]
|
||||
evalImgs = [
|
||||
evaluateImg(imgId, catId, areaRng, maxDet)
|
||||
for catId in catIds
|
||||
for areaRng in p.areaRng
|
||||
for imgId in p.imgIds
|
||||
]
|
||||
# this is NOT in the pycocotools code, but could be done outside
|
||||
evalImgs = np.asarray(evalImgs).reshape(len(catIds), len(p.areaRng), len(p.imgIds))
|
||||
self._paramsEval = copy.deepcopy(self.params)
|
||||
# toc = time.time()
|
||||
# print('DONE (t={:0.2f}s).'.format(toc-tic))
|
||||
return p.imgIds, evalImgs
|
||||
|
||||
#################################################################
|
||||
# end of straight copy from pycocotools, just removing the prints
|
||||
#################################################################
|
||||
|
||||
184
rtdetr_pytorch/src/data/coco/coco_utils.py
Normal file
184
rtdetr_pytorch/src/data/coco/coco_utils.py
Normal file
@@ -0,0 +1,184 @@
|
||||
import os
|
||||
|
||||
import torch
|
||||
import torch.utils.data
|
||||
import torchvision
|
||||
from pycocotools import mask as coco_mask
|
||||
from pycocotools.coco import COCO
|
||||
|
||||
|
||||
def convert_coco_poly_to_mask(segmentations, height, width):
|
||||
masks = []
|
||||
for polygons in segmentations:
|
||||
rles = coco_mask.frPyObjects(polygons, height, width)
|
||||
mask = coco_mask.decode(rles)
|
||||
if len(mask.shape) < 3:
|
||||
mask = mask[..., None]
|
||||
mask = torch.as_tensor(mask, dtype=torch.uint8)
|
||||
mask = mask.any(dim=2)
|
||||
masks.append(mask)
|
||||
if masks:
|
||||
masks = torch.stack(masks, dim=0)
|
||||
else:
|
||||
masks = torch.zeros((0, height, width), dtype=torch.uint8)
|
||||
return masks
|
||||
|
||||
|
||||
class ConvertCocoPolysToMask:
|
||||
def __call__(self, image, target):
|
||||
w, h = image.size
|
||||
|
||||
image_id = target["image_id"]
|
||||
|
||||
anno = target["annotations"]
|
||||
|
||||
anno = [obj for obj in anno if obj["iscrowd"] == 0]
|
||||
|
||||
boxes = [obj["bbox"] for obj in anno]
|
||||
# guard against no boxes via resizing
|
||||
boxes = torch.as_tensor(boxes, dtype=torch.float32).reshape(-1, 4)
|
||||
boxes[:, 2:] += boxes[:, :2]
|
||||
boxes[:, 0::2].clamp_(min=0, max=w)
|
||||
boxes[:, 1::2].clamp_(min=0, max=h)
|
||||
|
||||
classes = [obj["category_id"] for obj in anno]
|
||||
classes = torch.tensor(classes, dtype=torch.int64)
|
||||
|
||||
segmentations = [obj["segmentation"] for obj in anno]
|
||||
masks = convert_coco_poly_to_mask(segmentations, h, w)
|
||||
|
||||
keypoints = None
|
||||
if anno and "keypoints" in anno[0]:
|
||||
keypoints = [obj["keypoints"] for obj in anno]
|
||||
keypoints = torch.as_tensor(keypoints, dtype=torch.float32)
|
||||
num_keypoints = keypoints.shape[0]
|
||||
if num_keypoints:
|
||||
keypoints = keypoints.view(num_keypoints, -1, 3)
|
||||
|
||||
keep = (boxes[:, 3] > boxes[:, 1]) & (boxes[:, 2] > boxes[:, 0])
|
||||
boxes = boxes[keep]
|
||||
classes = classes[keep]
|
||||
masks = masks[keep]
|
||||
if keypoints is not None:
|
||||
keypoints = keypoints[keep]
|
||||
|
||||
target = {}
|
||||
target["boxes"] = boxes
|
||||
target["labels"] = classes
|
||||
target["masks"] = masks
|
||||
target["image_id"] = image_id
|
||||
if keypoints is not None:
|
||||
target["keypoints"] = keypoints
|
||||
|
||||
# for conversion to coco api
|
||||
area = torch.tensor([obj["area"] for obj in anno])
|
||||
iscrowd = torch.tensor([obj["iscrowd"] for obj in anno])
|
||||
target["area"] = area
|
||||
target["iscrowd"] = iscrowd
|
||||
|
||||
return image, target
|
||||
|
||||
|
||||
def _coco_remove_images_without_annotations(dataset, cat_list=None):
|
||||
def _has_only_empty_bbox(anno):
|
||||
return all(any(o <= 1 for o in obj["bbox"][2:]) for obj in anno)
|
||||
|
||||
def _count_visible_keypoints(anno):
|
||||
return sum(sum(1 for v in ann["keypoints"][2::3] if v > 0) for ann in anno)
|
||||
|
||||
min_keypoints_per_image = 10
|
||||
|
||||
def _has_valid_annotation(anno):
|
||||
# if it's empty, there is no annotation
|
||||
if len(anno) == 0:
|
||||
return False
|
||||
# if all boxes have close to zero area, there is no annotation
|
||||
if _has_only_empty_bbox(anno):
|
||||
return False
|
||||
# keypoints task have a slight different criteria for considering
|
||||
# if an annotation is valid
|
||||
if "keypoints" not in anno[0]:
|
||||
return True
|
||||
# for keypoint detection tasks, only consider valid images those
|
||||
# containing at least min_keypoints_per_image
|
||||
if _count_visible_keypoints(anno) >= min_keypoints_per_image:
|
||||
return True
|
||||
return False
|
||||
|
||||
ids = []
|
||||
for ds_idx, img_id in enumerate(dataset.ids):
|
||||
ann_ids = dataset.coco.getAnnIds(imgIds=img_id, iscrowd=None)
|
||||
anno = dataset.coco.loadAnns(ann_ids)
|
||||
if cat_list:
|
||||
anno = [obj for obj in anno if obj["category_id"] in cat_list]
|
||||
if _has_valid_annotation(anno):
|
||||
ids.append(ds_idx)
|
||||
|
||||
dataset = torch.utils.data.Subset(dataset, ids)
|
||||
return dataset
|
||||
|
||||
|
||||
def convert_to_coco_api(ds):
|
||||
coco_ds = COCO()
|
||||
# annotation IDs need to start at 1, not 0, see torchvision issue #1530
|
||||
ann_id = 1
|
||||
dataset = {"images": [], "categories": [], "annotations": []}
|
||||
categories = set()
|
||||
for img_idx in range(len(ds)):
|
||||
# find better way to get target
|
||||
# targets = ds.get_annotations(img_idx)
|
||||
img, targets = ds[img_idx]
|
||||
image_id = targets["image_id"].item()
|
||||
img_dict = {}
|
||||
img_dict["id"] = image_id
|
||||
img_dict["height"] = img.shape[-2]
|
||||
img_dict["width"] = img.shape[-1]
|
||||
dataset["images"].append(img_dict)
|
||||
bboxes = targets["boxes"].clone()
|
||||
bboxes[:, 2:] -= bboxes[:, :2]
|
||||
bboxes = bboxes.tolist()
|
||||
labels = targets["labels"].tolist()
|
||||
areas = targets["area"].tolist()
|
||||
iscrowd = targets["iscrowd"].tolist()
|
||||
if "masks" in targets:
|
||||
masks = targets["masks"]
|
||||
# make masks Fortran contiguous for coco_mask
|
||||
masks = masks.permute(0, 2, 1).contiguous().permute(0, 2, 1)
|
||||
if "keypoints" in targets:
|
||||
keypoints = targets["keypoints"]
|
||||
keypoints = keypoints.reshape(keypoints.shape[0], -1).tolist()
|
||||
num_objs = len(bboxes)
|
||||
for i in range(num_objs):
|
||||
ann = {}
|
||||
ann["image_id"] = image_id
|
||||
ann["bbox"] = bboxes[i]
|
||||
ann["category_id"] = labels[i]
|
||||
categories.add(labels[i])
|
||||
ann["area"] = areas[i]
|
||||
ann["iscrowd"] = iscrowd[i]
|
||||
ann["id"] = ann_id
|
||||
if "masks" in targets:
|
||||
ann["segmentation"] = coco_mask.encode(masks[i].numpy())
|
||||
if "keypoints" in targets:
|
||||
ann["keypoints"] = keypoints[i]
|
||||
ann["num_keypoints"] = sum(k != 0 for k in keypoints[i][2::3])
|
||||
dataset["annotations"].append(ann)
|
||||
ann_id += 1
|
||||
dataset["categories"] = [{"id": i} for i in sorted(categories)]
|
||||
coco_ds.dataset = dataset
|
||||
coco_ds.createIndex()
|
||||
return coco_ds
|
||||
|
||||
|
||||
def get_coco_api_from_dataset(dataset):
|
||||
# FIXME: This is... awful?
|
||||
for _ in range(10):
|
||||
if isinstance(dataset, torchvision.datasets.CocoDetection):
|
||||
break
|
||||
if isinstance(dataset, torch.utils.data.Subset):
|
||||
dataset = dataset.dataset
|
||||
if isinstance(dataset, torchvision.datasets.CocoDetection):
|
||||
return dataset.coco
|
||||
return convert_to_coco_api(dataset)
|
||||
|
||||
|
||||
28
rtdetr_pytorch/src/data/dataloader.py
Normal file
28
rtdetr_pytorch/src/data/dataloader.py
Normal file
@@ -0,0 +1,28 @@
|
||||
import torch
|
||||
import torch.utils.data as data
|
||||
|
||||
from src.core import register
|
||||
|
||||
|
||||
__all__ = ['DataLoader']
|
||||
|
||||
|
||||
@register
|
||||
class DataLoader(data.DataLoader):
|
||||
__inject__ = ['dataset', 'collate_fn']
|
||||
|
||||
def __repr__(self) -> str:
|
||||
format_string = self.__class__.__name__ + "("
|
||||
for n in ['dataset', 'batch_size', 'num_workers', 'drop_last', 'collate_fn']:
|
||||
format_string += "\n"
|
||||
format_string += " {0}: {1}".format(n, getattr(self, n))
|
||||
format_string += "\n)"
|
||||
return format_string
|
||||
|
||||
|
||||
|
||||
@register
|
||||
def default_collate_fn(items):
|
||||
'''default collate_fn
|
||||
'''
|
||||
return torch.cat([x[0][None] for x in items], dim=0), [x[1] for x in items]
|
||||
169
rtdetr_pytorch/src/data/functional.py
Normal file
169
rtdetr_pytorch/src/data/functional.py
Normal file
@@ -0,0 +1,169 @@
|
||||
import torch
|
||||
import torchvision.transforms.functional as F
|
||||
|
||||
from packaging import version
|
||||
from typing import Optional, List
|
||||
from torch import Tensor
|
||||
|
||||
# needed due to empty tensor bug in pytorch and torchvision 0.5
|
||||
import torchvision
|
||||
if version.parse(torchvision.__version__) < version.parse('0.7'):
|
||||
from torchvision.ops import _new_empty_tensor
|
||||
from torchvision.ops.misc import _output_size
|
||||
|
||||
|
||||
def interpolate(input, size=None, scale_factor=None, mode="nearest", align_corners=None):
|
||||
# type: (Tensor, Optional[List[int]], Optional[float], str, Optional[bool]) -> Tensor
|
||||
"""
|
||||
Equivalent to nn.functional.interpolate, but with support for empty batch sizes.
|
||||
This will eventually be supported natively by PyTorch, and this
|
||||
class can go away.
|
||||
"""
|
||||
if version.parse(torchvision.__version__) < version.parse('0.7'):
|
||||
if input.numel() > 0:
|
||||
return torch.nn.functional.interpolate(
|
||||
input, size, scale_factor, mode, align_corners
|
||||
)
|
||||
|
||||
output_shape = _output_size(2, input, size, scale_factor)
|
||||
output_shape = list(input.shape[:-2]) + list(output_shape)
|
||||
return _new_empty_tensor(input, output_shape)
|
||||
else:
|
||||
return torchvision.ops.misc.interpolate(input, size, scale_factor, mode, align_corners)
|
||||
|
||||
|
||||
|
||||
def crop(image, target, region):
|
||||
cropped_image = F.crop(image, *region)
|
||||
|
||||
target = target.copy()
|
||||
i, j, h, w = region
|
||||
|
||||
# should we do something wrt the original size?
|
||||
target["size"] = torch.tensor([h, w])
|
||||
|
||||
fields = ["labels", "area", "iscrowd"]
|
||||
|
||||
if "boxes" in target:
|
||||
boxes = target["boxes"]
|
||||
max_size = torch.as_tensor([w, h], dtype=torch.float32)
|
||||
cropped_boxes = boxes - torch.as_tensor([j, i, j, i])
|
||||
cropped_boxes = torch.min(cropped_boxes.reshape(-1, 2, 2), max_size)
|
||||
cropped_boxes = cropped_boxes.clamp(min=0)
|
||||
area = (cropped_boxes[:, 1, :] - cropped_boxes[:, 0, :]).prod(dim=1)
|
||||
target["boxes"] = cropped_boxes.reshape(-1, 4)
|
||||
target["area"] = area
|
||||
fields.append("boxes")
|
||||
|
||||
if "masks" in target:
|
||||
# FIXME should we update the area here if there are no boxes?
|
||||
target['masks'] = target['masks'][:, i:i + h, j:j + w]
|
||||
fields.append("masks")
|
||||
|
||||
# remove elements for which the boxes or masks that have zero area
|
||||
if "boxes" in target or "masks" in target:
|
||||
# favor boxes selection when defining which elements to keep
|
||||
# this is compatible with previous implementation
|
||||
if "boxes" in target:
|
||||
cropped_boxes = target['boxes'].reshape(-1, 2, 2)
|
||||
keep = torch.all(cropped_boxes[:, 1, :] > cropped_boxes[:, 0, :], dim=1)
|
||||
else:
|
||||
keep = target['masks'].flatten(1).any(1)
|
||||
|
||||
for field in fields:
|
||||
target[field] = target[field][keep]
|
||||
|
||||
return cropped_image, target
|
||||
|
||||
|
||||
def hflip(image, target):
|
||||
flipped_image = F.hflip(image)
|
||||
|
||||
w, h = image.size
|
||||
|
||||
target = target.copy()
|
||||
if "boxes" in target:
|
||||
boxes = target["boxes"]
|
||||
boxes = boxes[:, [2, 1, 0, 3]] * torch.as_tensor([-1, 1, -1, 1]) + torch.as_tensor([w, 0, w, 0])
|
||||
target["boxes"] = boxes
|
||||
|
||||
if "masks" in target:
|
||||
target['masks'] = target['masks'].flip(-1)
|
||||
|
||||
return flipped_image, target
|
||||
|
||||
|
||||
def resize(image, target, size, max_size=None):
|
||||
# size can be min_size (scalar) or (w, h) tuple
|
||||
|
||||
def get_size_with_aspect_ratio(image_size, size, max_size=None):
|
||||
w, h = image_size
|
||||
if max_size is not None:
|
||||
min_original_size = float(min((w, h)))
|
||||
max_original_size = float(max((w, h)))
|
||||
if max_original_size / min_original_size * size > max_size:
|
||||
size = int(round(max_size * min_original_size / max_original_size))
|
||||
|
||||
if (w <= h and w == size) or (h <= w and h == size):
|
||||
return (h, w)
|
||||
|
||||
if w < h:
|
||||
ow = size
|
||||
oh = int(size * h / w)
|
||||
else:
|
||||
oh = size
|
||||
ow = int(size * w / h)
|
||||
|
||||
# r = min(size / min(h, w), max_size / max(h, w))
|
||||
# ow = int(w * r)
|
||||
# oh = int(h * r)
|
||||
|
||||
return (oh, ow)
|
||||
|
||||
def get_size(image_size, size, max_size=None):
|
||||
if isinstance(size, (list, tuple)):
|
||||
return size[::-1]
|
||||
else:
|
||||
return get_size_with_aspect_ratio(image_size, size, max_size)
|
||||
|
||||
size = get_size(image.size, size, max_size)
|
||||
rescaled_image = F.resize(image, size)
|
||||
|
||||
if target is None:
|
||||
return rescaled_image, None
|
||||
|
||||
ratios = tuple(float(s) / float(s_orig) for s, s_orig in zip(rescaled_image.size, image.size))
|
||||
ratio_width, ratio_height = ratios
|
||||
|
||||
target = target.copy()
|
||||
if "boxes" in target:
|
||||
boxes = target["boxes"]
|
||||
scaled_boxes = boxes * torch.as_tensor([ratio_width, ratio_height, ratio_width, ratio_height])
|
||||
target["boxes"] = scaled_boxes
|
||||
|
||||
if "area" in target:
|
||||
area = target["area"]
|
||||
scaled_area = area * (ratio_width * ratio_height)
|
||||
target["area"] = scaled_area
|
||||
|
||||
h, w = size
|
||||
target["size"] = torch.tensor([h, w])
|
||||
|
||||
if "masks" in target:
|
||||
target['masks'] = interpolate(
|
||||
target['masks'][:, None].float(), size, mode="nearest")[:, 0] > 0.5
|
||||
|
||||
return rescaled_image, target
|
||||
|
||||
|
||||
def pad(image, target, padding):
|
||||
# assumes that we only pad on the bottom right corners
|
||||
padded_image = F.pad(image, (0, 0, padding[0], padding[1]))
|
||||
if target is None:
|
||||
return padded_image, None
|
||||
target = target.copy()
|
||||
# should we do something wrt the original size?
|
||||
target["size"] = torch.tensor(padded_image.size[::-1])
|
||||
if "masks" in target:
|
||||
target['masks'] = torch.nn.functional.pad(target['masks'], (0, padding[0], 0, padding[1]))
|
||||
return padded_image, target
|
||||
150
rtdetr_pytorch/src/data/transforms.py
Normal file
150
rtdetr_pytorch/src/data/transforms.py
Normal file
@@ -0,0 +1,150 @@
|
||||
""""by lyuwenyu
|
||||
"""
|
||||
|
||||
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
|
||||
import torchvision
|
||||
torchvision.disable_beta_transforms_warning()
|
||||
from torchvision import datapoints
|
||||
|
||||
import torchvision.transforms.v2 as T
|
||||
import torchvision.transforms.v2.functional as F
|
||||
|
||||
from PIL import Image
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
from src.core import register, GLOBAL_CONFIG
|
||||
|
||||
|
||||
__all__ = ['Compose', ]
|
||||
|
||||
|
||||
RandomPhotometricDistort = register(T.RandomPhotometricDistort)
|
||||
RandomZoomOut = register(T.RandomZoomOut)
|
||||
# RandomIoUCrop = register(T.RandomIoUCrop)
|
||||
RandomHorizontalFlip = register(T.RandomHorizontalFlip)
|
||||
Resize = register(T.Resize)
|
||||
ToImageTensor = register(T.ToImageTensor)
|
||||
ConvertDtype = register(T.ConvertDtype)
|
||||
SanitizeBoundingBox = register(T.SanitizeBoundingBox)
|
||||
RandomCrop = register(T.RandomCrop)
|
||||
Normalize = register(T.Normalize)
|
||||
|
||||
|
||||
|
||||
@register
|
||||
class Compose(T.Compose):
|
||||
def __init__(self, ops) -> None:
|
||||
transforms = []
|
||||
if ops is not None:
|
||||
for op in ops:
|
||||
if isinstance(op, dict):
|
||||
name = op.pop('type')
|
||||
transfom = getattr(GLOBAL_CONFIG[name]['_pymodule'], name)(**op)
|
||||
transforms.append(transfom)
|
||||
# op['type'] = name
|
||||
elif isinstance(op, nn.Module):
|
||||
transforms.append(op)
|
||||
|
||||
else:
|
||||
raise ValueError('')
|
||||
else:
|
||||
transforms =[EmptyTransform(), ]
|
||||
|
||||
super().__init__(transforms=transforms)
|
||||
|
||||
|
||||
@register
|
||||
class EmptyTransform(T.Transform):
|
||||
def __init__(self, ) -> None:
|
||||
super().__init__()
|
||||
|
||||
def forward(self, *inputs):
|
||||
inputs = inputs if len(inputs) > 1 else inputs[0]
|
||||
return inputs
|
||||
|
||||
|
||||
@register
|
||||
class PadToSize(T.Pad):
|
||||
_transformed_types = (
|
||||
Image.Image,
|
||||
datapoints.Image,
|
||||
datapoints.Video,
|
||||
datapoints.Mask,
|
||||
datapoints.BoundingBox,
|
||||
)
|
||||
def _get_params(self, flat_inputs: List[Any]) -> Dict[str, Any]:
|
||||
sz = F.get_spatial_size(flat_inputs[0])
|
||||
h, w = self.spatial_size[0] - sz[0], self.spatial_size[1] - sz[1]
|
||||
self.padding = [0, 0, w, h]
|
||||
return dict(padding=self.padding)
|
||||
|
||||
def make_params(self, flat_inputs: List[Any]) -> Dict[str, Any]:
|
||||
return self._get_params(flat_inputs)
|
||||
|
||||
def __init__(self, spatial_size, fill=0, padding_mode='constant') -> None:
|
||||
if isinstance(spatial_size, int):
|
||||
spatial_size = (spatial_size, spatial_size)
|
||||
|
||||
self.spatial_size = spatial_size
|
||||
super().__init__(0, fill, padding_mode)
|
||||
|
||||
def _transform(self, inpt: Any, params: Dict[str, Any]) -> Any:
|
||||
fill = self._fill[type(inpt)]
|
||||
padding = params['padding']
|
||||
return F.pad(inpt, padding=padding, fill=fill, padding_mode=self.padding_mode) # type: ignore[arg-type]
|
||||
|
||||
def transform(self, inpt: Any, params: Dict[str, Any]) -> Any:
|
||||
return self._transform(inpt, params)
|
||||
|
||||
def __call__(self, *inputs: Any) -> Any:
|
||||
outputs = super().forward(*inputs)
|
||||
if len(outputs) > 1 and isinstance(outputs[1], dict):
|
||||
outputs[1]['padding'] = torch.tensor(self.padding)
|
||||
return outputs
|
||||
|
||||
|
||||
@register
|
||||
class RandomIoUCrop(T.RandomIoUCrop):
|
||||
def __init__(self, min_scale: float = 0.3, max_scale: float = 1, min_aspect_ratio: float = 0.5, max_aspect_ratio: float = 2, sampler_options: Optional[List[float]] = None, trials: int = 40, p: float = 1.0):
|
||||
super().__init__(min_scale, max_scale, min_aspect_ratio, max_aspect_ratio, sampler_options, trials)
|
||||
self.p = p
|
||||
|
||||
def __call__(self, *inputs: Any) -> Any:
|
||||
if torch.rand(1) >= self.p:
|
||||
return inputs if len(inputs) > 1 else inputs[0]
|
||||
|
||||
return super().forward(*inputs)
|
||||
|
||||
|
||||
@register
|
||||
class ConvertBox(T.Transform):
|
||||
_transformed_types = (
|
||||
datapoints.BoundingBox,
|
||||
)
|
||||
def __init__(self, out_fmt='', normalize=False) -> None:
|
||||
super().__init__()
|
||||
self.out_fmt = out_fmt
|
||||
self.normalize = normalize
|
||||
|
||||
self.data_fmt = {
|
||||
'xyxy': datapoints.BoundingBoxFormat.XYXY,
|
||||
'cxcywh': datapoints.BoundingBoxFormat.CXCYWH
|
||||
}
|
||||
|
||||
def _transform(self, inpt: Any, params: Dict[str, Any]) -> Any:
|
||||
if self.out_fmt:
|
||||
spatial_size = inpt.spatial_size
|
||||
in_fmt = inpt.format.value.lower()
|
||||
inpt = torchvision.ops.box_convert(inpt, in_fmt=in_fmt, out_fmt=self.out_fmt)
|
||||
inpt = datapoints.BoundingBox(inpt, format=self.data_fmt[self.out_fmt], spatial_size=spatial_size)
|
||||
|
||||
if self.normalize:
|
||||
inpt = inpt / torch.tensor(inpt.spatial_size[::-1]).tile(2)[None]
|
||||
|
||||
return inpt
|
||||
|
||||
def transform(self, inpt: Any, params: Dict[str, Any]) -> Any:
|
||||
return self._transform(inpt, params)
|
||||
3
rtdetr_pytorch/src/misc/__init__.py
Normal file
3
rtdetr_pytorch/src/misc/__init__.py
Normal file
@@ -0,0 +1,3 @@
|
||||
|
||||
from .logger import *
|
||||
from .visualizer import *
|
||||
189
rtdetr_pytorch/src/misc/dist.py
Normal file
189
rtdetr_pytorch/src/misc/dist.py
Normal file
@@ -0,0 +1,189 @@
|
||||
"""
|
||||
reference
|
||||
- https://github.com/pytorch/vision/blob/main/references/detection/utils.py
|
||||
- https://github.com/facebookresearch/detr/blob/master/util/misc.py#L406
|
||||
|
||||
by lyuwenyu
|
||||
"""
|
||||
|
||||
import random
|
||||
import numpy as np
|
||||
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
import torch.distributed
|
||||
import torch.distributed as tdist
|
||||
|
||||
from torch.nn.parallel import DistributedDataParallel as DDP
|
||||
|
||||
from torch.utils.data import DistributedSampler
|
||||
from torch.utils.data.dataloader import DataLoader
|
||||
|
||||
|
||||
def init_distributed():
|
||||
'''
|
||||
distributed setup
|
||||
args:
|
||||
backend (str), ('nccl', 'gloo')
|
||||
'''
|
||||
try:
|
||||
# # https://pytorch.org/docs/stable/elastic/run.html
|
||||
# LOCAL_RANK = int(os.getenv('LOCAL_RANK', -1))
|
||||
# RANK = int(os.getenv('RANK', -1))
|
||||
# WORLD_SIZE = int(os.getenv('WORLD_SIZE', 1))
|
||||
|
||||
tdist.init_process_group(init_method='env://', )
|
||||
torch.distributed.barrier()
|
||||
|
||||
rank = get_rank()
|
||||
device = torch.device(f'cuda:{rank}')
|
||||
torch.cuda.set_device(device)
|
||||
|
||||
setup_print(rank == 0)
|
||||
print('Initialized distributed mode...')
|
||||
|
||||
return True
|
||||
|
||||
except:
|
||||
print('Not init distributed mode.')
|
||||
return False
|
||||
|
||||
|
||||
def setup_print(is_main):
|
||||
'''This function disables printing when not in master process
|
||||
'''
|
||||
import builtins as __builtin__
|
||||
builtin_print = __builtin__.print
|
||||
|
||||
def print(*args, **kwargs):
|
||||
force = kwargs.pop('force', False)
|
||||
if is_main or force:
|
||||
builtin_print(*args, **kwargs)
|
||||
|
||||
__builtin__.print = print
|
||||
|
||||
|
||||
def is_dist_available_and_initialized():
|
||||
if not tdist.is_available():
|
||||
return False
|
||||
if not tdist.is_initialized():
|
||||
return False
|
||||
return True
|
||||
|
||||
|
||||
def get_rank():
|
||||
if not is_dist_available_and_initialized():
|
||||
return 0
|
||||
return tdist.get_rank()
|
||||
|
||||
|
||||
def get_world_size():
|
||||
if not is_dist_available_and_initialized():
|
||||
return 1
|
||||
return tdist.get_world_size()
|
||||
|
||||
|
||||
def is_main_process():
|
||||
return get_rank() == 0
|
||||
|
||||
|
||||
def save_on_master(*args, **kwargs):
|
||||
if is_main_process():
|
||||
torch.save(*args, **kwargs)
|
||||
|
||||
|
||||
|
||||
def warp_model(model, find_unused_parameters=False, sync_bn=False,):
|
||||
if is_dist_available_and_initialized():
|
||||
rank = get_rank()
|
||||
model = nn.SyncBatchNorm.convert_sync_batchnorm(model) if sync_bn else model
|
||||
model = DDP(model, device_ids=[rank], output_device=rank, find_unused_parameters=find_unused_parameters)
|
||||
return model
|
||||
|
||||
|
||||
def warp_loader(loader, shuffle=False):
|
||||
if is_dist_available_and_initialized():
|
||||
sampler = DistributedSampler(loader.dataset, shuffle=shuffle)
|
||||
loader = DataLoader(loader.dataset,
|
||||
loader.batch_size,
|
||||
sampler=sampler,
|
||||
drop_last=loader.drop_last,
|
||||
collate_fn=loader.collate_fn,
|
||||
pin_memory=loader.pin_memory,
|
||||
num_workers=loader.num_workers, )
|
||||
return loader
|
||||
|
||||
|
||||
|
||||
def is_parallel(model) -> bool:
|
||||
# Returns True if model is of type DP or DDP
|
||||
return type(model) in (torch.nn.parallel.DataParallel, torch.nn.parallel.DistributedDataParallel)
|
||||
|
||||
|
||||
def de_parallel(model) -> nn.Module:
|
||||
# De-parallelize a model: returns single-GPU model if model is of type DP or DDP
|
||||
return model.module if is_parallel(model) else model
|
||||
|
||||
|
||||
def reduce_dict(data, avg=True):
|
||||
'''
|
||||
Args
|
||||
data dict: input, {k: v, ...}
|
||||
avg bool: true
|
||||
'''
|
||||
world_size = get_world_size()
|
||||
if world_size < 2:
|
||||
return data
|
||||
|
||||
with torch.no_grad():
|
||||
keys, values = [], []
|
||||
for k in sorted(data.keys()):
|
||||
keys.append(k)
|
||||
values.append(data[k])
|
||||
|
||||
values = torch.stack(values, dim=0)
|
||||
tdist.all_reduce(values)
|
||||
|
||||
if avg is True:
|
||||
values /= world_size
|
||||
|
||||
_data = {k: v for k, v in zip(keys, values)}
|
||||
|
||||
return _data
|
||||
|
||||
|
||||
|
||||
def all_gather(data):
|
||||
"""
|
||||
Run all_gather on arbitrary picklable data (not necessarily tensors)
|
||||
Args:
|
||||
data: any picklable object
|
||||
Returns:
|
||||
list[data]: list of data gathered from each rank
|
||||
"""
|
||||
world_size = get_world_size()
|
||||
if world_size == 1:
|
||||
return [data]
|
||||
data_list = [None] * world_size
|
||||
tdist.all_gather_object(data_list, data)
|
||||
return data_list
|
||||
|
||||
|
||||
import time
|
||||
def sync_time():
|
||||
'''sync_time
|
||||
'''
|
||||
if torch.cuda.is_available():
|
||||
torch.cuda.synchronize()
|
||||
|
||||
return time.time()
|
||||
|
||||
|
||||
|
||||
def set_seed(seed):
|
||||
# fix the seed for reproducibility
|
||||
torch.manual_seed(seed)
|
||||
np.random.seed(seed)
|
||||
random.seed(seed)
|
||||
|
||||
|
||||
239
rtdetr_pytorch/src/misc/logger.py
Normal file
239
rtdetr_pytorch/src/misc/logger.py
Normal file
@@ -0,0 +1,239 @@
|
||||
"""
|
||||
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
|
||||
https://github.com/facebookresearch/detr/blob/main/util/misc.py
|
||||
Mostly copy-paste from torchvision references.
|
||||
"""
|
||||
|
||||
import time
|
||||
import pickle
|
||||
import datetime
|
||||
from collections import defaultdict, deque
|
||||
from typing import Dict
|
||||
|
||||
import torch
|
||||
import torch.distributed as tdist
|
||||
|
||||
from .dist import is_dist_available_and_initialized, get_world_size
|
||||
|
||||
|
||||
class SmoothedValue(object):
|
||||
"""Track a series of values and provide access to smoothed values over a
|
||||
window or the global series average.
|
||||
"""
|
||||
|
||||
def __init__(self, window_size=20, fmt=None):
|
||||
if fmt is None:
|
||||
fmt = "{median:.4f} ({global_avg:.4f})"
|
||||
self.deque = deque(maxlen=window_size)
|
||||
self.total = 0.0
|
||||
self.count = 0
|
||||
self.fmt = fmt
|
||||
|
||||
def update(self, value, n=1):
|
||||
self.deque.append(value)
|
||||
self.count += n
|
||||
self.total += value * n
|
||||
|
||||
def synchronize_between_processes(self):
|
||||
"""
|
||||
Warning: does not synchronize the deque!
|
||||
"""
|
||||
if not is_dist_available_and_initialized():
|
||||
return
|
||||
t = torch.tensor([self.count, self.total], dtype=torch.float64, device='cuda')
|
||||
tdist.barrier()
|
||||
tdist.all_reduce(t)
|
||||
t = t.tolist()
|
||||
self.count = int(t[0])
|
||||
self.total = t[1]
|
||||
|
||||
@property
|
||||
def median(self):
|
||||
d = torch.tensor(list(self.deque))
|
||||
return d.median().item()
|
||||
|
||||
@property
|
||||
def avg(self):
|
||||
d = torch.tensor(list(self.deque), dtype=torch.float32)
|
||||
return d.mean().item()
|
||||
|
||||
@property
|
||||
def global_avg(self):
|
||||
return self.total / self.count
|
||||
|
||||
@property
|
||||
def max(self):
|
||||
return max(self.deque)
|
||||
|
||||
@property
|
||||
def value(self):
|
||||
return self.deque[-1]
|
||||
|
||||
def __str__(self):
|
||||
return self.fmt.format(
|
||||
median=self.median,
|
||||
avg=self.avg,
|
||||
global_avg=self.global_avg,
|
||||
max=self.max,
|
||||
value=self.value)
|
||||
|
||||
|
||||
def all_gather(data):
|
||||
"""
|
||||
Run all_gather on arbitrary picklable data (not necessarily tensors)
|
||||
Args:
|
||||
data: any picklable object
|
||||
Returns:
|
||||
list[data]: list of data gathered from each rank
|
||||
"""
|
||||
world_size = get_world_size()
|
||||
if world_size == 1:
|
||||
return [data]
|
||||
|
||||
# serialized to a Tensor
|
||||
buffer = pickle.dumps(data)
|
||||
storage = torch.ByteStorage.from_buffer(buffer)
|
||||
tensor = torch.ByteTensor(storage).to("cuda")
|
||||
|
||||
# obtain Tensor size of each rank
|
||||
local_size = torch.tensor([tensor.numel()], device="cuda")
|
||||
size_list = [torch.tensor([0], device="cuda") for _ in range(world_size)]
|
||||
tdist.all_gather(size_list, local_size)
|
||||
size_list = [int(size.item()) for size in size_list]
|
||||
max_size = max(size_list)
|
||||
|
||||
# receiving Tensor from all ranks
|
||||
# we pad the tensor because torch all_gather does not support
|
||||
# gathering tensors of different shapes
|
||||
tensor_list = []
|
||||
for _ in size_list:
|
||||
tensor_list.append(torch.empty((max_size,), dtype=torch.uint8, device="cuda"))
|
||||
if local_size != max_size:
|
||||
padding = torch.empty(size=(max_size - local_size,), dtype=torch.uint8, device="cuda")
|
||||
tensor = torch.cat((tensor, padding), dim=0)
|
||||
tdist.all_gather(tensor_list, tensor)
|
||||
|
||||
data_list = []
|
||||
for size, tensor in zip(size_list, tensor_list):
|
||||
buffer = tensor.cpu().numpy().tobytes()[:size]
|
||||
data_list.append(pickle.loads(buffer))
|
||||
|
||||
return data_list
|
||||
|
||||
|
||||
def reduce_dict(input_dict, average=True) -> Dict[str, torch.Tensor]:
|
||||
"""
|
||||
Args:
|
||||
input_dict (dict): all the values will be reduced
|
||||
average (bool): whether to do average or sum
|
||||
Reduce the values in the dictionary from all processes so that all processes
|
||||
have the averaged results. Returns a dict with the same fields as
|
||||
input_dict, after reduction.
|
||||
"""
|
||||
world_size = get_world_size()
|
||||
if world_size < 2:
|
||||
return input_dict
|
||||
with torch.no_grad():
|
||||
names = []
|
||||
values = []
|
||||
# sort the keys so that they are consistent across processes
|
||||
for k in sorted(input_dict.keys()):
|
||||
names.append(k)
|
||||
values.append(input_dict[k])
|
||||
values = torch.stack(values, dim=0)
|
||||
tdist.all_reduce(values)
|
||||
if average:
|
||||
values /= world_size
|
||||
reduced_dict = {k: v for k, v in zip(names, values)}
|
||||
return reduced_dict
|
||||
|
||||
|
||||
class MetricLogger(object):
|
||||
def __init__(self, delimiter="\t"):
|
||||
self.meters = defaultdict(SmoothedValue)
|
||||
self.delimiter = delimiter
|
||||
|
||||
def update(self, **kwargs):
|
||||
for k, v in kwargs.items():
|
||||
if isinstance(v, torch.Tensor):
|
||||
v = v.item()
|
||||
assert isinstance(v, (float, int))
|
||||
self.meters[k].update(v)
|
||||
|
||||
def __getattr__(self, attr):
|
||||
if attr in self.meters:
|
||||
return self.meters[attr]
|
||||
if attr in self.__dict__:
|
||||
return self.__dict__[attr]
|
||||
raise AttributeError("'{}' object has no attribute '{}'".format(
|
||||
type(self).__name__, attr))
|
||||
|
||||
def __str__(self):
|
||||
loss_str = []
|
||||
for name, meter in self.meters.items():
|
||||
loss_str.append(
|
||||
"{}: {}".format(name, str(meter))
|
||||
)
|
||||
return self.delimiter.join(loss_str)
|
||||
|
||||
def synchronize_between_processes(self):
|
||||
for meter in self.meters.values():
|
||||
meter.synchronize_between_processes()
|
||||
|
||||
def add_meter(self, name, meter):
|
||||
self.meters[name] = meter
|
||||
|
||||
def log_every(self, iterable, print_freq, header=None):
|
||||
i = 0
|
||||
if not header:
|
||||
header = ''
|
||||
start_time = time.time()
|
||||
end = time.time()
|
||||
iter_time = SmoothedValue(fmt='{avg:.4f}')
|
||||
data_time = SmoothedValue(fmt='{avg:.4f}')
|
||||
space_fmt = ':' + str(len(str(len(iterable)))) + 'd'
|
||||
if torch.cuda.is_available():
|
||||
log_msg = self.delimiter.join([
|
||||
header,
|
||||
'[{0' + space_fmt + '}/{1}]',
|
||||
'eta: {eta}',
|
||||
'{meters}',
|
||||
'time: {time}',
|
||||
'data: {data}',
|
||||
'max mem: {memory:.0f}'
|
||||
])
|
||||
else:
|
||||
log_msg = self.delimiter.join([
|
||||
header,
|
||||
'[{0' + space_fmt + '}/{1}]',
|
||||
'eta: {eta}',
|
||||
'{meters}',
|
||||
'time: {time}',
|
||||
'data: {data}'
|
||||
])
|
||||
MB = 1024.0 * 1024.0
|
||||
for obj in iterable:
|
||||
data_time.update(time.time() - end)
|
||||
yield obj
|
||||
iter_time.update(time.time() - end)
|
||||
if i % print_freq == 0 or i == len(iterable) - 1:
|
||||
eta_seconds = iter_time.global_avg * (len(iterable) - i)
|
||||
eta_string = str(datetime.timedelta(seconds=int(eta_seconds)))
|
||||
if torch.cuda.is_available():
|
||||
print(log_msg.format(
|
||||
i, len(iterable), eta=eta_string,
|
||||
meters=str(self),
|
||||
time=str(iter_time), data=str(data_time),
|
||||
memory=torch.cuda.max_memory_allocated() / MB))
|
||||
else:
|
||||
print(log_msg.format(
|
||||
i, len(iterable), eta=eta_string,
|
||||
meters=str(self),
|
||||
time=str(iter_time), data=str(data_time)))
|
||||
i += 1
|
||||
end = time.time()
|
||||
total_time = time.time() - start_time
|
||||
total_time_str = str(datetime.timedelta(seconds=int(total_time)))
|
||||
print('{} Total time: {} ({:.4f} s / it)'.format(
|
||||
header, total_time_str, total_time / len(iterable)))
|
||||
|
||||
34
rtdetr_pytorch/src/misc/visualizer.py
Normal file
34
rtdetr_pytorch/src/misc/visualizer.py
Normal file
@@ -0,0 +1,34 @@
|
||||
""""by lyuwenyu
|
||||
"""
|
||||
|
||||
import torch
|
||||
import torch.utils.data
|
||||
|
||||
import torchvision
|
||||
torchvision.disable_beta_transforms_warning()
|
||||
|
||||
import PIL
|
||||
|
||||
__all__ = ['show_sample']
|
||||
|
||||
def show_sample(sample):
|
||||
"""for coco dataset/dataloader
|
||||
"""
|
||||
import matplotlib.pyplot as plt
|
||||
from torchvision.transforms.v2 import functional as F
|
||||
from torchvision.utils import draw_bounding_boxes
|
||||
|
||||
image, target = sample
|
||||
if isinstance(image, PIL.Image.Image):
|
||||
image = F.to_image_tensor(image)
|
||||
|
||||
image = F.convert_dtype(image, torch.uint8)
|
||||
annotated_image = draw_bounding_boxes(image, target["boxes"], colors="yellow", width=3)
|
||||
|
||||
fig, ax = plt.subplots()
|
||||
ax.imshow(annotated_image.permute(1, 2, 0).numpy())
|
||||
ax.set(xticklabels=[], yticklabels=[], xticks=[], yticks=[])
|
||||
fig.tight_layout()
|
||||
fig.show()
|
||||
plt.show()
|
||||
|
||||
7
rtdetr_pytorch/src/nn/__init__.py
Normal file
7
rtdetr_pytorch/src/nn/__init__.py
Normal file
@@ -0,0 +1,7 @@
|
||||
|
||||
from .arch import *
|
||||
from .criterion import *
|
||||
|
||||
#
|
||||
from .backbone import *
|
||||
|
||||
1
rtdetr_pytorch/src/nn/arch/__init__.py
Normal file
1
rtdetr_pytorch/src/nn/arch/__init__.py
Normal file
@@ -0,0 +1 @@
|
||||
from .classification import *
|
||||
41
rtdetr_pytorch/src/nn/arch/classification.py
Normal file
41
rtdetr_pytorch/src/nn/arch/classification.py
Normal file
@@ -0,0 +1,41 @@
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
|
||||
from src.core import register
|
||||
|
||||
|
||||
__all__ = ['Classification', 'ClassHead']
|
||||
|
||||
|
||||
@register
|
||||
class Classification(nn.Module):
|
||||
__inject__ = ['backbone', 'head']
|
||||
|
||||
def __init__(self, backbone: nn.Module, head: nn.Module=None):
|
||||
super().__init__()
|
||||
|
||||
self.backbone = backbone
|
||||
self.head = head
|
||||
|
||||
def forward(self, x):
|
||||
x = self.backbone(x)
|
||||
|
||||
if self.head is not None:
|
||||
x = self.head(x)
|
||||
|
||||
return x
|
||||
|
||||
|
||||
@register
|
||||
class ClassHead(nn.Module):
|
||||
def __init__(self, hidden_dim, num_classes):
|
||||
super().__init__()
|
||||
self.pool = nn.AdaptiveAvgPool2d(1)
|
||||
self.proj = nn.Linear(hidden_dim, num_classes)
|
||||
|
||||
def forward(self, x):
|
||||
x = x[0] if isinstance(x, (list, tuple)) else x
|
||||
x = self.pool(x)
|
||||
x = x.reshape(x.shape[0], -1)
|
||||
x = self.proj(x)
|
||||
return x
|
||||
6
rtdetr_pytorch/src/nn/backbone/__init__.py
Normal file
6
rtdetr_pytorch/src/nn/backbone/__init__.py
Normal file
@@ -0,0 +1,6 @@
|
||||
|
||||
from .presnet import *
|
||||
from .test_resnet import *
|
||||
from .regnet import *
|
||||
from .common import *
|
||||
from .dla import *
|
||||
102
rtdetr_pytorch/src/nn/backbone/common.py
Normal file
102
rtdetr_pytorch/src/nn/backbone/common.py
Normal file
@@ -0,0 +1,102 @@
|
||||
'''by lyuwenyu
|
||||
'''
|
||||
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
|
||||
|
||||
|
||||
class ConvNormLayer(nn.Module):
|
||||
def __init__(self, ch_in, ch_out, kernel_size, stride, padding=None, bias=False, act=None):
|
||||
super().__init__()
|
||||
self.conv = nn.Conv2d(
|
||||
ch_in,
|
||||
ch_out,
|
||||
kernel_size,
|
||||
stride,
|
||||
padding=(kernel_size-1)//2 if padding is None else padding,
|
||||
bias=bias)
|
||||
self.norm = nn.BatchNorm2d(ch_out)
|
||||
self.act = nn.Identity() if act is None else get_activation(act)
|
||||
|
||||
def forward(self, x):
|
||||
return self.act(self.norm(self.conv(x)))
|
||||
|
||||
|
||||
class FrozenBatchNorm2d(nn.Module):
|
||||
"""copy and modified from https://github.com/facebookresearch/detr/blob/master/models/backbone.py
|
||||
BatchNorm2d where the batch statistics and the affine parameters are fixed.
|
||||
Copy-paste from torchvision.misc.ops with added eps before rqsrt,
|
||||
without which any other models than torchvision.models.resnet[18,34,50,101]
|
||||
produce nans.
|
||||
"""
|
||||
def __init__(self, num_features, eps=1e-5):
|
||||
super(FrozenBatchNorm2d, self).__init__()
|
||||
n = num_features
|
||||
self.register_buffer("weight", torch.ones(n))
|
||||
self.register_buffer("bias", torch.zeros(n))
|
||||
self.register_buffer("running_mean", torch.zeros(n))
|
||||
self.register_buffer("running_var", torch.ones(n))
|
||||
self.eps = eps
|
||||
self.num_features = n
|
||||
|
||||
def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict,
|
||||
missing_keys, unexpected_keys, error_msgs):
|
||||
num_batches_tracked_key = prefix + 'num_batches_tracked'
|
||||
if num_batches_tracked_key in state_dict:
|
||||
del state_dict[num_batches_tracked_key]
|
||||
|
||||
super(FrozenBatchNorm2d, self)._load_from_state_dict(
|
||||
state_dict, prefix, local_metadata, strict,
|
||||
missing_keys, unexpected_keys, error_msgs)
|
||||
|
||||
def forward(self, x):
|
||||
# move reshapes to the beginning
|
||||
# to make it fuser-friendly
|
||||
w = self.weight.reshape(1, -1, 1, 1)
|
||||
b = self.bias.reshape(1, -1, 1, 1)
|
||||
rv = self.running_var.reshape(1, -1, 1, 1)
|
||||
rm = self.running_mean.reshape(1, -1, 1, 1)
|
||||
scale = w * (rv + self.eps).rsqrt()
|
||||
bias = b - rm * scale
|
||||
return x * scale + bias
|
||||
|
||||
def extra_repr(self):
|
||||
return (
|
||||
"{num_features}, eps={eps}".format(**self.__dict__)
|
||||
)
|
||||
|
||||
|
||||
def get_activation(act: str, inpace: bool=True):
|
||||
'''get activation
|
||||
'''
|
||||
act = act.lower()
|
||||
|
||||
if act == 'silu':
|
||||
m = nn.SiLU()
|
||||
|
||||
elif act == 'relu':
|
||||
m = nn.ReLU()
|
||||
|
||||
elif act == 'leaky_relu':
|
||||
m = nn.LeakyReLU()
|
||||
|
||||
elif act == 'silu':
|
||||
m = nn.SiLU()
|
||||
|
||||
elif act == 'gelu':
|
||||
m = nn.GELU()
|
||||
|
||||
elif act is None:
|
||||
m = nn.Identity()
|
||||
|
||||
elif isinstance(act, nn.Module):
|
||||
m = act
|
||||
|
||||
else:
|
||||
raise RuntimeError('')
|
||||
|
||||
if hasattr(m, 'inplace'):
|
||||
m.inplace = inpace
|
||||
|
||||
return m
|
||||
452
rtdetr_pytorch/src/nn/backbone/dla.py
Normal file
452
rtdetr_pytorch/src/nn/backbone/dla.py
Normal file
@@ -0,0 +1,452 @@
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import math
|
||||
import logging
|
||||
from os.path import join
|
||||
|
||||
import torch
|
||||
from torch import nn
|
||||
import torch.utils.model_zoo as model_zoo
|
||||
# from mmdet.models.builder import BACKBONES
|
||||
from src.core import register
|
||||
|
||||
|
||||
BN_MOMENTUM = 0.1
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
def get_model_url(data='imagenet', name='dla34', hash='ba72cf86'):
|
||||
return join('http://dl.yf.io/dla/models', data, '{}-{}.pth'.format(name, hash))
|
||||
|
||||
|
||||
def conv3x3(in_planes, out_planes, stride=1):
|
||||
"3x3 convolution with padding"
|
||||
return nn.Conv2d(
|
||||
in_planes, out_planes, kernel_size=3, stride=stride, padding=1, bias=False
|
||||
)
|
||||
|
||||
|
||||
class BasicBlock(nn.Module):
|
||||
def __init__(self, inplanes, planes, stride=1, dilation=1):
|
||||
super(BasicBlock, self).__init__()
|
||||
self.conv1 = nn.Conv2d(
|
||||
inplanes,
|
||||
planes,
|
||||
kernel_size=3,
|
||||
stride=stride,
|
||||
padding=dilation,
|
||||
bias=False,
|
||||
dilation=dilation,
|
||||
)
|
||||
self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
|
||||
self.relu = nn.ReLU(inplace=True)
|
||||
self.conv2 = nn.Conv2d(
|
||||
planes,
|
||||
planes,
|
||||
kernel_size=3,
|
||||
stride=1,
|
||||
padding=dilation,
|
||||
bias=False,
|
||||
dilation=dilation,
|
||||
)
|
||||
self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
|
||||
self.stride = stride
|
||||
|
||||
def forward(self, x, residual=None):
|
||||
if residual is None:
|
||||
residual = x
|
||||
|
||||
out = self.conv1(x)
|
||||
out = self.bn1(out)
|
||||
out = self.relu(out)
|
||||
|
||||
out = self.conv2(out)
|
||||
out = self.bn2(out)
|
||||
|
||||
out += residual
|
||||
out = self.relu(out)
|
||||
|
||||
return out
|
||||
|
||||
|
||||
class Bottleneck(nn.Module):
|
||||
expansion = 2
|
||||
|
||||
def __init__(self, inplanes, planes, stride=1, dilation=1):
|
||||
super(Bottleneck, self).__init__()
|
||||
expansion = Bottleneck.expansion
|
||||
bottle_planes = planes // expansion
|
||||
self.conv1 = nn.Conv2d(inplanes, bottle_planes, kernel_size=1, bias=False)
|
||||
self.bn1 = nn.BatchNorm2d(bottle_planes, momentum=BN_MOMENTUM)
|
||||
self.conv2 = nn.Conv2d(
|
||||
bottle_planes,
|
||||
bottle_planes,
|
||||
kernel_size=3,
|
||||
stride=stride,
|
||||
padding=dilation,
|
||||
bias=False,
|
||||
dilation=dilation,
|
||||
)
|
||||
self.bn2 = nn.BatchNorm2d(bottle_planes, momentum=BN_MOMENTUM)
|
||||
self.conv3 = nn.Conv2d(bottle_planes, planes, kernel_size=1, bias=False)
|
||||
self.bn3 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
|
||||
self.relu = nn.ReLU(inplace=True)
|
||||
self.stride = stride
|
||||
|
||||
def forward(self, x, residual=None):
|
||||
if residual is None:
|
||||
residual = x
|
||||
|
||||
out = self.conv1(x)
|
||||
out = self.bn1(out)
|
||||
out = self.relu(out)
|
||||
|
||||
out = self.conv2(out)
|
||||
out = self.bn2(out)
|
||||
out = self.relu(out)
|
||||
|
||||
out = self.conv3(out)
|
||||
out = self.bn3(out)
|
||||
|
||||
out += residual
|
||||
out = self.relu(out)
|
||||
|
||||
return out
|
||||
|
||||
|
||||
class BottleneckX(nn.Module):
|
||||
expansion = 2
|
||||
cardinality = 32
|
||||
|
||||
def __init__(self, inplanes, planes, stride=1, dilation=1):
|
||||
super(BottleneckX, self).__init__()
|
||||
cardinality = BottleneckX.cardinality
|
||||
# dim = int(math.floor(planes * (BottleneckV5.expansion / 64.0)))
|
||||
# bottle_planes = dim * cardinality
|
||||
bottle_planes = planes * cardinality // 32
|
||||
self.conv1 = nn.Conv2d(inplanes, bottle_planes, kernel_size=1, bias=False)
|
||||
self.bn1 = nn.BatchNorm2d(bottle_planes, momentum=BN_MOMENTUM)
|
||||
self.conv2 = nn.Conv2d(
|
||||
bottle_planes,
|
||||
bottle_planes,
|
||||
kernel_size=3,
|
||||
stride=stride,
|
||||
padding=dilation,
|
||||
bias=False,
|
||||
dilation=dilation,
|
||||
groups=cardinality,
|
||||
)
|
||||
self.bn2 = nn.BatchNorm2d(bottle_planes, momentum=BN_MOMENTUM)
|
||||
self.conv3 = nn.Conv2d(bottle_planes, planes, kernel_size=1, bias=False)
|
||||
self.bn3 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
|
||||
self.relu = nn.ReLU(inplace=True)
|
||||
self.stride = stride
|
||||
|
||||
def forward(self, x, residual=None):
|
||||
if residual is None:
|
||||
residual = x
|
||||
|
||||
out = self.conv1(x)
|
||||
out = self.bn1(out)
|
||||
out = self.relu(out)
|
||||
|
||||
out = self.conv2(out)
|
||||
out = self.bn2(out)
|
||||
out = self.relu(out)
|
||||
|
||||
out = self.conv3(out)
|
||||
out = self.bn3(out)
|
||||
|
||||
out += residual
|
||||
out = self.relu(out)
|
||||
|
||||
return out
|
||||
|
||||
|
||||
class Root(nn.Module):
|
||||
def __init__(self, in_channels, out_channels, kernel_size, residual):
|
||||
super(Root, self).__init__()
|
||||
self.conv = nn.Conv2d(
|
||||
in_channels,
|
||||
out_channels,
|
||||
1,
|
||||
stride=1,
|
||||
bias=False,
|
||||
padding=(kernel_size - 1) // 2,
|
||||
)
|
||||
self.bn = nn.BatchNorm2d(out_channels, momentum=BN_MOMENTUM)
|
||||
self.relu = nn.ReLU(inplace=True)
|
||||
self.residual = residual
|
||||
|
||||
def forward(self, *x):
|
||||
children = x
|
||||
x = self.conv(torch.cat(x, 1))
|
||||
x = self.bn(x)
|
||||
if self.residual:
|
||||
x += children[0]
|
||||
x = self.relu(x)
|
||||
|
||||
return x
|
||||
|
||||
|
||||
class Tree(nn.Module):
|
||||
def __init__(
|
||||
self,
|
||||
levels,
|
||||
block,
|
||||
in_channels,
|
||||
out_channels,
|
||||
stride=1,
|
||||
level_root=False,
|
||||
root_dim=0,
|
||||
root_kernel_size=1,
|
||||
dilation=1,
|
||||
root_residual=False,
|
||||
):
|
||||
super(Tree, self).__init__()
|
||||
if root_dim == 0:
|
||||
root_dim = 2 * out_channels
|
||||
if level_root:
|
||||
root_dim += in_channels
|
||||
if levels == 1:
|
||||
self.tree1 = block(in_channels, out_channels, stride, dilation=dilation)
|
||||
self.tree2 = block(out_channels, out_channels, 1, dilation=dilation)
|
||||
else:
|
||||
self.tree1 = Tree(
|
||||
levels - 1,
|
||||
block,
|
||||
in_channels,
|
||||
out_channels,
|
||||
stride,
|
||||
root_dim=0,
|
||||
root_kernel_size=root_kernel_size,
|
||||
dilation=dilation,
|
||||
root_residual=root_residual,
|
||||
)
|
||||
self.tree2 = Tree(
|
||||
levels - 1,
|
||||
block,
|
||||
out_channels,
|
||||
out_channels,
|
||||
root_dim=root_dim + out_channels,
|
||||
root_kernel_size=root_kernel_size,
|
||||
dilation=dilation,
|
||||
root_residual=root_residual,
|
||||
)
|
||||
if levels == 1:
|
||||
self.root = Root(root_dim, out_channels, root_kernel_size, root_residual)
|
||||
self.level_root = level_root
|
||||
self.root_dim = root_dim
|
||||
self.downsample = None
|
||||
self.project = None
|
||||
self.levels = levels
|
||||
if stride > 1:
|
||||
self.downsample = nn.MaxPool2d(stride, stride=stride)
|
||||
if levels == 1 and in_channels != out_channels:
|
||||
self.project = nn.Sequential(
|
||||
nn.Conv2d(
|
||||
in_channels, out_channels, kernel_size=1, stride=1, bias=False
|
||||
),
|
||||
nn.BatchNorm2d(out_channels, momentum=BN_MOMENTUM),
|
||||
)
|
||||
|
||||
def forward(self, x, residual=None, children=None):
|
||||
children = [] if children is None else children
|
||||
bottom = self.downsample(x) if self.downsample else x
|
||||
residual = self.project(bottom) if self.project else bottom
|
||||
if self.level_root:
|
||||
children.append(bottom)
|
||||
x1 = self.tree1(x, residual)
|
||||
if self.levels == 1:
|
||||
x2 = self.tree2(x1)
|
||||
x = self.root(x2, x1, *children)
|
||||
else:
|
||||
children.append(x1)
|
||||
x = self.tree2(x1, children=children)
|
||||
return x
|
||||
|
||||
|
||||
class DLA(nn.Module):
|
||||
def __init__(
|
||||
self,
|
||||
levels,
|
||||
channels,
|
||||
num_classes=1000,
|
||||
block=BasicBlock,
|
||||
out_indices=(2, 3, 4, 5),
|
||||
residual_root=False,
|
||||
linear_root=False,
|
||||
):
|
||||
super(DLA, self).__init__()
|
||||
self.channels = channels
|
||||
self.num_classes = num_classes
|
||||
self.out_indices = out_indices
|
||||
self.base_layer = nn.Sequential(
|
||||
nn.Conv2d(3, channels[0], kernel_size=7, stride=1, padding=3, bias=False),
|
||||
nn.BatchNorm2d(channels[0], momentum=BN_MOMENTUM),
|
||||
nn.ReLU(inplace=True),
|
||||
)
|
||||
self.level0 = self._make_conv_level(channels[0], channels[0], levels[0])
|
||||
self.level1 = self._make_conv_level(
|
||||
channels[0], channels[1], levels[1], stride=2
|
||||
)
|
||||
self.level2 = Tree(
|
||||
levels[2],
|
||||
block,
|
||||
channels[1],
|
||||
channels[2],
|
||||
2,
|
||||
level_root=False,
|
||||
root_residual=residual_root,
|
||||
)
|
||||
self.level3 = Tree(
|
||||
levels[3],
|
||||
block,
|
||||
channels[2],
|
||||
channels[3],
|
||||
2,
|
||||
level_root=True,
|
||||
root_residual=residual_root,
|
||||
)
|
||||
self.level4 = Tree(
|
||||
levels[4],
|
||||
block,
|
||||
channels[3],
|
||||
channels[4],
|
||||
2,
|
||||
level_root=True,
|
||||
root_residual=residual_root,
|
||||
)
|
||||
self.level5 = Tree(
|
||||
levels[5],
|
||||
block,
|
||||
channels[4],
|
||||
channels[5],
|
||||
2,
|
||||
level_root=True,
|
||||
root_residual=residual_root,
|
||||
)
|
||||
|
||||
# for m in self.modules():
|
||||
# if isinstance(m, nn.Conv2d):
|
||||
# n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
|
||||
# m.weight.data.normal_(0, math.sqrt(2. / n))
|
||||
# elif isinstance(m, nn.BatchNorm2d):
|
||||
# m.weight.data.fill_(1)
|
||||
# m.bias.data.zero_()
|
||||
|
||||
def _make_level(self, block, inplanes, planes, blocks, stride=1):
|
||||
downsample = None
|
||||
if stride != 1 or inplanes != planes:
|
||||
downsample = nn.Sequential(
|
||||
nn.MaxPool2d(stride, stride=stride),
|
||||
nn.Conv2d(inplanes, planes, kernel_size=1, stride=1, bias=False),
|
||||
nn.BatchNorm2d(planes, momentum=BN_MOMENTUM),
|
||||
)
|
||||
|
||||
layers = []
|
||||
layers.append(block(inplanes, planes, stride, downsample=downsample))
|
||||
for i in range(1, blocks):
|
||||
layers.append(block(inplanes, planes))
|
||||
|
||||
return nn.Sequential(*layers)
|
||||
|
||||
def _make_conv_level(self, inplanes, planes, convs, stride=1, dilation=1):
|
||||
modules = []
|
||||
for i in range(convs):
|
||||
modules.extend(
|
||||
[
|
||||
nn.Conv2d(
|
||||
inplanes,
|
||||
planes,
|
||||
kernel_size=3,
|
||||
stride=stride if i == 0 else 1,
|
||||
padding=dilation,
|
||||
bias=False,
|
||||
dilation=dilation,
|
||||
),
|
||||
nn.BatchNorm2d(planes, momentum=BN_MOMENTUM),
|
||||
nn.ReLU(inplace=True),
|
||||
]
|
||||
)
|
||||
inplanes = planes
|
||||
return nn.Sequential(*modules)
|
||||
|
||||
def forward(self, x):
|
||||
y = []
|
||||
x = self.base_layer(x)
|
||||
for i in range(6):
|
||||
x = getattr(self, 'level{}'.format(i))(x)
|
||||
if i in self.out_indices:
|
||||
y.append(x)
|
||||
return y
|
||||
|
||||
def load_pretrained_model(self, data='imagenet', name='dla34', hash='ba72cf86'):
|
||||
# fc = self.fc
|
||||
if name.endswith('.pth'):
|
||||
model_weights = torch.load(data + name)
|
||||
else:
|
||||
model_url = get_model_url(data, name, hash)
|
||||
model_weights = model_zoo.load_url(model_url)
|
||||
self.load_state_dict(model_weights, strict=False)
|
||||
# self.fc = fc
|
||||
|
||||
|
||||
def dla34(pretrained=True, levels=None, in_channels=None, **kwargs): # DLA-34
|
||||
model = DLA(levels=levels, channels=in_channels, block=BasicBlock, **kwargs)
|
||||
if pretrained:
|
||||
model.load_pretrained_model(data='imagenet', name='dla34', hash='ba72cf86')
|
||||
return model
|
||||
|
||||
@register
|
||||
class DLANet(nn.Module):
|
||||
def __init__(
|
||||
self,
|
||||
dla='dla34',
|
||||
pretrained=True,
|
||||
levels=[1, 1, 1, 2, 2, 1],
|
||||
in_channels=[16, 32, 64, 128, 256, 512],
|
||||
return_index = [1, 2, 3],
|
||||
cfg=None,
|
||||
):
|
||||
super(DLANet, self).__init__()
|
||||
self.cfg = cfg
|
||||
self.in_channels = in_channels
|
||||
|
||||
self.model = eval(dla)(
|
||||
pretrained=pretrained, levels=levels, in_channels=in_channels
|
||||
)
|
||||
self.return_index = return_index
|
||||
def forward(self, x):
|
||||
x = self.model(x)
|
||||
max_list = max(self.return_index)
|
||||
min_list = min(self.return_index)
|
||||
return x[min_list:max_list+1]
|
||||
|
||||
|
||||
class Identity(nn.Module):
|
||||
def __init__(self):
|
||||
super(Identity, self).__init__()
|
||||
|
||||
def forward(self, x):
|
||||
return x
|
||||
|
||||
|
||||
def fill_fc_weights(layers):
|
||||
for m in layers.modules():
|
||||
if isinstance(m, nn.Conv2d):
|
||||
if m.bias is not None:
|
||||
nn.init.constant_(m.bias, 0)
|
||||
|
||||
|
||||
def fill_up_weights(up):
|
||||
w = up.weight.data
|
||||
f = math.ceil(w.size(2) / 2)
|
||||
c = (2 * f - 1 - f % 2) / (2.0 * f)
|
||||
for i in range(w.size(2)):
|
||||
for j in range(w.size(3)):
|
||||
w[0, 0, i, j] = (1 - math.fabs(i / f - c)) * (1 - math.fabs(j / f - c))
|
||||
for c in range(1, w.size(0)):
|
||||
w[c, 0, :, :] = w[0, 0, :, :]
|
||||
225
rtdetr_pytorch/src/nn/backbone/presnet.py
Normal file
225
rtdetr_pytorch/src/nn/backbone/presnet.py
Normal file
@@ -0,0 +1,225 @@
|
||||
'''by lyuwenyu
|
||||
'''
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
import torch.nn.functional as F
|
||||
|
||||
from collections import OrderedDict
|
||||
|
||||
from .common import get_activation, ConvNormLayer, FrozenBatchNorm2d
|
||||
|
||||
from src.core import register
|
||||
|
||||
|
||||
__all__ = ['PResNet']
|
||||
|
||||
|
||||
ResNet_cfg = {
|
||||
18: [2, 2, 2, 2],
|
||||
34: [3, 4, 6, 3],
|
||||
50: [3, 4, 6, 3],
|
||||
101: [3, 4, 23, 3],
|
||||
# 152: [3, 8, 36, 3],
|
||||
}
|
||||
|
||||
|
||||
donwload_url = {
|
||||
18: 'https://github.com/lyuwenyu/storage/releases/download/v0.1/ResNet18_vd_pretrained_from_paddle.pth',
|
||||
34: 'https://github.com/lyuwenyu/storage/releases/download/v0.1/ResNet34_vd_pretrained_from_paddle.pth',
|
||||
50: 'https://github.com/lyuwenyu/storage/releases/download/v0.1/ResNet50_vd_ssld_v2_pretrained_from_paddle.pth',
|
||||
101: 'https://github.com/lyuwenyu/storage/releases/download/v0.1/ResNet101_vd_ssld_pretrained_from_paddle.pth',
|
||||
}
|
||||
|
||||
|
||||
class BasicBlock(nn.Module):
|
||||
expansion = 1
|
||||
|
||||
def __init__(self, ch_in, ch_out, stride, shortcut, act='relu', variant='b'):
|
||||
super().__init__()
|
||||
|
||||
self.shortcut = shortcut
|
||||
|
||||
if not shortcut:
|
||||
if variant == 'd' and stride == 2:
|
||||
self.short = nn.Sequential(OrderedDict([
|
||||
('pool', nn.AvgPool2d(2, 2, 0, ceil_mode=True)),
|
||||
('conv', ConvNormLayer(ch_in, ch_out, 1, 1))
|
||||
]))
|
||||
else:
|
||||
self.short = ConvNormLayer(ch_in, ch_out, 1, stride)
|
||||
|
||||
self.branch2a = ConvNormLayer(ch_in, ch_out, 3, stride, act=act)
|
||||
self.branch2b = ConvNormLayer(ch_out, ch_out, 3, 1, act=None)
|
||||
self.act = nn.Identity() if act is None else get_activation(act)
|
||||
|
||||
|
||||
def forward(self, x):
|
||||
out = self.branch2a(x)
|
||||
out = self.branch2b(out)
|
||||
if self.shortcut:
|
||||
short = x
|
||||
else:
|
||||
short = self.short(x)
|
||||
|
||||
out = out + short
|
||||
out = self.act(out)
|
||||
|
||||
return out
|
||||
|
||||
|
||||
class BottleNeck(nn.Module):
|
||||
expansion = 4
|
||||
|
||||
def __init__(self, ch_in, ch_out, stride, shortcut, act='relu', variant='b'):
|
||||
super().__init__()
|
||||
|
||||
if variant == 'a':
|
||||
stride1, stride2 = stride, 1
|
||||
else:
|
||||
stride1, stride2 = 1, stride
|
||||
|
||||
width = ch_out
|
||||
|
||||
self.branch2a = ConvNormLayer(ch_in, width, 1, stride1, act=act)
|
||||
self.branch2b = ConvNormLayer(width, width, 3, stride2, act=act)
|
||||
self.branch2c = ConvNormLayer(width, ch_out * self.expansion, 1, 1)
|
||||
|
||||
self.shortcut = shortcut
|
||||
if not shortcut:
|
||||
if variant == 'd' and stride == 2:
|
||||
self.short = nn.Sequential(OrderedDict([
|
||||
('pool', nn.AvgPool2d(2, 2, 0, ceil_mode=True)),
|
||||
('conv', ConvNormLayer(ch_in, ch_out * self.expansion, 1, 1))
|
||||
]))
|
||||
else:
|
||||
self.short = ConvNormLayer(ch_in, ch_out * self.expansion, 1, stride)
|
||||
|
||||
self.act = nn.Identity() if act is None else get_activation(act)
|
||||
|
||||
def forward(self, x):
|
||||
out = self.branch2a(x)
|
||||
out = self.branch2b(out)
|
||||
out = self.branch2c(out)
|
||||
|
||||
if self.shortcut:
|
||||
short = x
|
||||
else:
|
||||
short = self.short(x)
|
||||
|
||||
out = out + short
|
||||
out = self.act(out)
|
||||
|
||||
return out
|
||||
|
||||
|
||||
class Blocks(nn.Module):
|
||||
def __init__(self, block, ch_in, ch_out, count, stage_num, act='relu', variant='b'):
|
||||
super().__init__()
|
||||
|
||||
self.blocks = nn.ModuleList()
|
||||
for i in range(count):
|
||||
self.blocks.append(
|
||||
block(
|
||||
ch_in,
|
||||
ch_out,
|
||||
stride=2 if i == 0 and stage_num != 2 else 1,
|
||||
shortcut=False if i == 0 else True,
|
||||
variant=variant,
|
||||
act=act)
|
||||
)
|
||||
|
||||
if i == 0:
|
||||
ch_in = ch_out * block.expansion
|
||||
|
||||
def forward(self, x):
|
||||
out = x
|
||||
for block in self.blocks:
|
||||
out = block(out)
|
||||
return out
|
||||
|
||||
|
||||
@register
|
||||
class PResNet(nn.Module):
|
||||
def __init__(
|
||||
self,
|
||||
depth,
|
||||
variant='d',
|
||||
num_stages=4,
|
||||
return_idx=[0, 1, 2, 3],
|
||||
act='relu',
|
||||
freeze_at=-1,
|
||||
freeze_norm=True,
|
||||
pretrained=False):
|
||||
super().__init__()
|
||||
|
||||
block_nums = ResNet_cfg[depth]
|
||||
ch_in = 64
|
||||
if variant in ['c', 'd']:
|
||||
conv_def = [
|
||||
[3, ch_in // 2, 3, 2, "conv1_1"],
|
||||
[ch_in // 2, ch_in // 2, 3, 1, "conv1_2"],
|
||||
[ch_in // 2, ch_in, 3, 1, "conv1_3"],
|
||||
]
|
||||
else:
|
||||
conv_def = [[3, ch_in, 7, 2, "conv1_1"]]
|
||||
|
||||
self.conv1 = nn.Sequential(OrderedDict([
|
||||
(_name, ConvNormLayer(c_in, c_out, k, s, act=act)) for c_in, c_out, k, s, _name in conv_def
|
||||
]))
|
||||
|
||||
ch_out_list = [64, 128, 256, 512]
|
||||
block = BottleNeck if depth >= 50 else BasicBlock
|
||||
|
||||
_out_channels = [block.expansion * v for v in ch_out_list]
|
||||
_out_strides = [4, 8, 16, 32]
|
||||
|
||||
self.res_layers = nn.ModuleList()
|
||||
for i in range(num_stages):
|
||||
stage_num = i + 2
|
||||
self.res_layers.append(
|
||||
Blocks(block, ch_in, ch_out_list[i], block_nums[i], stage_num, act=act, variant=variant)
|
||||
)
|
||||
ch_in = _out_channels[i]
|
||||
|
||||
self.return_idx = return_idx
|
||||
self.out_channels = [_out_channels[_i] for _i in return_idx]
|
||||
self.out_strides = [_out_strides[_i] for _i in return_idx]
|
||||
|
||||
if freeze_at >= 0:
|
||||
self._freeze_parameters(self.conv1)
|
||||
for i in range(min(freeze_at, num_stages)):
|
||||
self._freeze_parameters(self.res_layers[i])
|
||||
|
||||
if freeze_norm:
|
||||
self._freeze_norm(self)
|
||||
|
||||
if pretrained:
|
||||
state = torch.hub.load_state_dict_from_url(donwload_url[depth])
|
||||
self.load_state_dict(state)
|
||||
print(f'Load PResNet{depth} state_dict')
|
||||
|
||||
def _freeze_parameters(self, m: nn.Module):
|
||||
for p in m.parameters():
|
||||
p.requires_grad = False
|
||||
|
||||
def _freeze_norm(self, m: nn.Module):
|
||||
if isinstance(m, nn.BatchNorm2d):
|
||||
m = FrozenBatchNorm2d(m.num_features)
|
||||
else:
|
||||
for name, child in m.named_children():
|
||||
_child = self._freeze_norm(child)
|
||||
if _child is not child:
|
||||
setattr(m, name, _child)
|
||||
return m
|
||||
|
||||
def forward(self, x):
|
||||
conv1 = self.conv1(x)
|
||||
x = F.max_pool2d(conv1, kernel_size=3, stride=2, padding=1)
|
||||
outs = []
|
||||
for idx, stage in enumerate(self.res_layers):
|
||||
x = stage(x)
|
||||
if idx in self.return_idx:
|
||||
outs.append(x)
|
||||
return outs
|
||||
|
||||
|
||||
23
rtdetr_pytorch/src/nn/backbone/regnet.py
Normal file
23
rtdetr_pytorch/src/nn/backbone/regnet.py
Normal file
@@ -0,0 +1,23 @@
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
from transformers import RegNetModel
|
||||
|
||||
|
||||
from src.core import register
|
||||
|
||||
__all__ = ['RegNet']
|
||||
|
||||
@register
|
||||
class RegNet(nn.Module):
|
||||
def __init__(self, configuration, return_idx=[0, 1, 2, 3]):
|
||||
super(RegNet, self).__init__()
|
||||
self.model = RegNetModel.from_pretrained("facebook/regnet-y-040")
|
||||
self.return_idx = return_idx
|
||||
|
||||
|
||||
def forward(self, x):
|
||||
|
||||
outputs = self.model(x, output_hidden_states = True)
|
||||
x = outputs.hidden_states[2:5]
|
||||
|
||||
return x
|
||||
81
rtdetr_pytorch/src/nn/backbone/test_resnet.py
Normal file
81
rtdetr_pytorch/src/nn/backbone/test_resnet.py
Normal file
@@ -0,0 +1,81 @@
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
import torch.nn.functional as F
|
||||
|
||||
from collections import OrderedDict
|
||||
|
||||
|
||||
from src.core import register
|
||||
|
||||
|
||||
class BasicBlock(nn.Module):
|
||||
expansion = 1
|
||||
|
||||
def __init__(self, in_planes, planes, stride=1):
|
||||
super(BasicBlock, self).__init__()
|
||||
|
||||
self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
|
||||
self.bn1 = nn.BatchNorm2d(planes)
|
||||
|
||||
self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,stride=1, padding=1, bias=False)
|
||||
self.bn2 = nn.BatchNorm2d(planes)
|
||||
|
||||
self.shortcut = nn.Sequential()
|
||||
if stride != 1 or in_planes != self.expansion*planes:
|
||||
self.shortcut = nn.Sequential(
|
||||
nn.Conv2d(in_planes, self.expansion*planes,kernel_size=1, stride=stride, bias=False),
|
||||
nn.BatchNorm2d(self.expansion*planes)
|
||||
)
|
||||
def forward(self, x):
|
||||
out = F.relu(self.bn1(self.conv1(x)))
|
||||
out = self.bn2(self.conv2(out))
|
||||
out += self.shortcut(x)
|
||||
out = F.relu(out)
|
||||
return out
|
||||
|
||||
|
||||
|
||||
class _ResNet(nn.Module):
|
||||
def __init__(self, block, num_blocks, num_classes=10):
|
||||
super().__init__()
|
||||
self.in_planes = 64
|
||||
|
||||
self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
|
||||
self.bn1 = nn.BatchNorm2d(64)
|
||||
|
||||
self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
|
||||
self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
|
||||
self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
|
||||
self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
|
||||
|
||||
self.linear = nn.Linear(512 * block.expansion, num_classes)
|
||||
|
||||
def _make_layer(self, block, planes, num_blocks, stride):
|
||||
strides = [stride] + [1]*(num_blocks-1)
|
||||
layers = []
|
||||
for stride in strides:
|
||||
layers.append(block(self.in_planes, planes, stride))
|
||||
self.in_planes = planes * block.expansion
|
||||
return nn.Sequential(*layers)
|
||||
|
||||
def forward(self, x):
|
||||
out = F.relu(self.bn1(self.conv1(x)))
|
||||
out = self.layer1(out)
|
||||
out = self.layer2(out)
|
||||
out = self.layer3(out)
|
||||
out = self.layer4(out)
|
||||
out = F.avg_pool2d(out, 4)
|
||||
out = out.view(out.size(0), -1)
|
||||
out = self.linear(out)
|
||||
return out
|
||||
|
||||
|
||||
@register
|
||||
class MResNet(nn.Module):
|
||||
def __init__(self, num_classes=10, num_blocks=[2, 2, 2, 2]) -> None:
|
||||
super().__init__()
|
||||
self.model = _ResNet(BasicBlock, num_blocks, num_classes)
|
||||
|
||||
def forward(self, x):
|
||||
return self.model(x)
|
||||
|
||||
58
rtdetr_pytorch/src/nn/backbone/utils.py
Normal file
58
rtdetr_pytorch/src/nn/backbone/utils.py
Normal file
@@ -0,0 +1,58 @@
|
||||
"""
|
||||
https://github.com/pytorch/vision/blob/main/torchvision/models/_utils.py
|
||||
|
||||
by lyuwenyu
|
||||
"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from typing import Dict, List
|
||||
|
||||
|
||||
import torch.nn as nn
|
||||
|
||||
|
||||
class IntermediateLayerGetter(nn.ModuleDict):
|
||||
"""
|
||||
Module wrapper that returns intermediate layers from a model
|
||||
|
||||
It has a strong assumption that the modules have been registered
|
||||
into the model in the same order as they are used.
|
||||
This means that one should **not** reuse the same nn.Module
|
||||
twice in the forward if you want this to work.
|
||||
|
||||
Additionally, it is only able to query submodules that are directly
|
||||
assigned to the model. So if `model` is passed, `model.feature1` can
|
||||
be returned, but not `model.feature1.layer2`.
|
||||
"""
|
||||
|
||||
_version = 3
|
||||
|
||||
def __init__(self, model: nn.Module, return_layers: List[str]) -> None:
|
||||
if not set(return_layers).issubset([name for name, _ in model.named_children()]):
|
||||
raise ValueError("return_layers are not present in model. {}"\
|
||||
.format([name for name, _ in model.named_children()]))
|
||||
orig_return_layers = return_layers
|
||||
return_layers = {str(k): str(k) for k in return_layers}
|
||||
layers = OrderedDict()
|
||||
for name, module in model.named_children():
|
||||
layers[name] = module
|
||||
if name in return_layers:
|
||||
del return_layers[name]
|
||||
if not return_layers:
|
||||
break
|
||||
|
||||
super().__init__(layers)
|
||||
self.return_layers = orig_return_layers
|
||||
|
||||
def forward(self, x):
|
||||
# out = OrderedDict()
|
||||
outputs = []
|
||||
for name, module in self.items():
|
||||
x = module(x)
|
||||
if name in self.return_layers:
|
||||
# out_name = self.return_layers[name]
|
||||
# out[out_name] = x
|
||||
outputs.append(x)
|
||||
|
||||
return outputs
|
||||
|
||||
6
rtdetr_pytorch/src/nn/criterion/__init__.py
Normal file
6
rtdetr_pytorch/src/nn/criterion/__init__.py
Normal file
@@ -0,0 +1,6 @@
|
||||
|
||||
import torch.nn as nn
|
||||
from src.core import register
|
||||
|
||||
CrossEntropyLoss = register(nn.CrossEntropyLoss)
|
||||
|
||||
20
rtdetr_pytorch/src/nn/criterion/utils.py
Normal file
20
rtdetr_pytorch/src/nn/criterion/utils.py
Normal file
@@ -0,0 +1,20 @@
|
||||
import torch
|
||||
import torchvision
|
||||
|
||||
|
||||
|
||||
def format_target(targets):
|
||||
'''
|
||||
Args:
|
||||
targets (List[Dict]),
|
||||
Return:
|
||||
tensor (Tensor), [im_id, label, bbox,]
|
||||
'''
|
||||
outputs = []
|
||||
for i, tgt in enumerate(targets):
|
||||
boxes = torchvision.ops.box_convert(tgt['boxes'], in_fmt='xyxy', out_fmt='cxcywh')
|
||||
labels = tgt['labels'].reshape(-1, 1)
|
||||
im_ids = torch.ones_like(labels) * i
|
||||
outputs.append(torch.cat([im_ids, labels, boxes], dim=1))
|
||||
|
||||
return torch.cat(outputs, dim=0)
|
||||
4
rtdetr_pytorch/src/optim/__init__.py
Normal file
4
rtdetr_pytorch/src/optim/__init__.py
Normal file
@@ -0,0 +1,4 @@
|
||||
|
||||
from .ema import *
|
||||
from .optim import *
|
||||
from .amp import *
|
||||
12
rtdetr_pytorch/src/optim/amp.py
Normal file
12
rtdetr_pytorch/src/optim/amp.py
Normal file
@@ -0,0 +1,12 @@
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
import torch.cuda.amp as amp
|
||||
|
||||
|
||||
from src.core import register
|
||||
import src.misc.dist as dist
|
||||
|
||||
|
||||
__all__ = ['GradScaler']
|
||||
|
||||
GradScaler = register(amp.grad_scaler.GradScaler)
|
||||
115
rtdetr_pytorch/src/optim/ema.py
Normal file
115
rtdetr_pytorch/src/optim/ema.py
Normal file
@@ -0,0 +1,115 @@
|
||||
"""
|
||||
reference:
|
||||
https://github.com/ultralytics/yolov5/blob/master/utils/torch_utils.py#L404
|
||||
|
||||
by lyuwenyu
|
||||
"""
|
||||
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
|
||||
import math
|
||||
from copy import deepcopy
|
||||
|
||||
|
||||
|
||||
from src.core import register
|
||||
import src.misc.dist as dist
|
||||
|
||||
|
||||
__all__ = ['ModelEMA']
|
||||
|
||||
|
||||
|
||||
@register
|
||||
class ModelEMA(object):
|
||||
""" Model Exponential Moving Average from https://github.com/rwightman/pytorch-image-models
|
||||
Keep a moving average of everything in the model state_dict (parameters and buffers).
|
||||
This is intended to allow functionality like
|
||||
https://www.tensorflow.org/api_docs/python/tf/train/ExponentialMovingAverage
|
||||
A smoothed version of the weights is necessary for some training schemes to perform well.
|
||||
This class is sensitive where it is initialized in the sequence of model init,
|
||||
GPU assignment and distributed training wrappers.
|
||||
"""
|
||||
def __init__(self, model: nn.Module, decay: float=0.9999, warmups: int=2000):
|
||||
super().__init__()
|
||||
|
||||
# Create EMA
|
||||
self.module = deepcopy(dist.de_parallel(model)).eval() # FP32 EMA
|
||||
|
||||
# if next(model.parameters()).device.type != 'cpu':
|
||||
# self.module.half() # FP16 EMA
|
||||
|
||||
self.decay = decay
|
||||
self.warmups = warmups
|
||||
self.updates = 0 # number of EMA updates
|
||||
# self.filter_no_grad = filter_no_grad
|
||||
self.decay_fn = lambda x: decay * (1 - math.exp(-x / warmups)) # decay exponential ramp (to help early epochs)
|
||||
|
||||
for p in self.module.parameters():
|
||||
p.requires_grad_(False)
|
||||
|
||||
def update(self, model: nn.Module):
|
||||
# Update EMA parameters
|
||||
with torch.no_grad():
|
||||
self.updates += 1
|
||||
d = self.decay_fn(self.updates)
|
||||
|
||||
msd = dist.de_parallel(model).state_dict()
|
||||
for k, v in self.module.state_dict().items():
|
||||
if v.dtype.is_floating_point:
|
||||
v *= d
|
||||
v += (1 - d) * msd[k].detach()
|
||||
|
||||
def to(self, *args, **kwargs):
|
||||
self.module = self.module.to(*args, **kwargs)
|
||||
return self
|
||||
|
||||
def update_attr(self, model, include=(), exclude=('process_group', 'reducer')):
|
||||
# Update EMA attributes
|
||||
self.copy_attr(self.module, model, include, exclude)
|
||||
|
||||
@staticmethod
|
||||
def copy_attr(a, b, include=(), exclude=()):
|
||||
# Copy attributes from b to a, options to only include [...] and to exclude [...]
|
||||
for k, v in b.__dict__.items():
|
||||
if (len(include) and k not in include) or k.startswith('_') or k in exclude:
|
||||
continue
|
||||
else:
|
||||
setattr(a, k, v)
|
||||
|
||||
def state_dict(self, ):
|
||||
return dict(module=self.module.state_dict(), updates=self.updates, warmups=self.warmups)
|
||||
|
||||
def load_state_dict(self, state):
|
||||
self.module.load_state_dict(state['module'])
|
||||
if 'updates' in state:
|
||||
self.updates = state['updates']
|
||||
|
||||
def forwad(self, ):
|
||||
raise RuntimeError('ema...')
|
||||
|
||||
def extra_repr(self) -> str:
|
||||
return f'decay={self.decay}, warmups={self.warmups}'
|
||||
|
||||
|
||||
|
||||
|
||||
class ExponentialMovingAverage(torch.optim.swa_utils.AveragedModel):
|
||||
"""Maintains moving averages of model parameters using an exponential decay.
|
||||
``ema_avg = decay * avg_model_param + (1 - decay) * model_param``
|
||||
`torch.optim.swa_utils.AveragedModel <https://pytorch.org/docs/stable/optim.html#custom-averaging-strategies>`_
|
||||
is used to compute the EMA.
|
||||
"""
|
||||
def __init__(self, model, decay, device="cpu", use_buffers=True):
|
||||
|
||||
self.decay_fn = lambda x: decay * (1 - math.exp(-x / 2000))
|
||||
|
||||
def ema_avg(avg_model_param, model_param, num_averaged):
|
||||
decay = self.decay_fn(num_averaged)
|
||||
return decay * avg_model_param + (1 - decay) * model_param
|
||||
|
||||
super().__init__(model, device, ema_avg, use_buffers=use_buffers)
|
||||
|
||||
|
||||
|
||||
22
rtdetr_pytorch/src/optim/optim.py
Normal file
22
rtdetr_pytorch/src/optim/optim.py
Normal file
@@ -0,0 +1,22 @@
|
||||
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
import torch.optim as optim
|
||||
import torch.optim.lr_scheduler as lr_scheduler
|
||||
|
||||
from src.core import register
|
||||
|
||||
|
||||
__all__ = ['AdamW', 'SGD', 'Adam', 'MultiStepLR', 'CosineAnnealingLR', 'OneCycleLR', 'LambdaLR']
|
||||
|
||||
|
||||
|
||||
SGD = register(optim.SGD)
|
||||
Adam = register(optim.Adam)
|
||||
AdamW = register(optim.AdamW)
|
||||
|
||||
|
||||
MultiStepLR = register(lr_scheduler.MultiStepLR)
|
||||
CosineAnnealingLR = register(lr_scheduler.CosineAnnealingLR)
|
||||
OneCycleLR = register(lr_scheduler.OneCycleLR)
|
||||
LambdaLR = register(lr_scheduler.LambdaLR)
|
||||
12
rtdetr_pytorch/src/solver/__init__.py
Normal file
12
rtdetr_pytorch/src/solver/__init__.py
Normal file
@@ -0,0 +1,12 @@
|
||||
"""by lyuwenyu
|
||||
"""
|
||||
|
||||
from .solver import BaseSolver
|
||||
from .det_solver import DetSolver
|
||||
|
||||
|
||||
from typing import Dict
|
||||
|
||||
TASKS :Dict[str, BaseSolver] = {
|
||||
'detection': DetSolver,
|
||||
}
|
||||
190
rtdetr_pytorch/src/solver/det_engine.py
Normal file
190
rtdetr_pytorch/src/solver/det_engine.py
Normal file
@@ -0,0 +1,190 @@
|
||||
"""
|
||||
Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
|
||||
https://github.com/facebookresearch/detr/blob/main/engine.py
|
||||
|
||||
by lyuwenyu
|
||||
"""
|
||||
|
||||
import math
|
||||
import os
|
||||
import sys
|
||||
import pathlib
|
||||
from typing import Iterable
|
||||
|
||||
import torch
|
||||
import torch.amp
|
||||
|
||||
from src.data import CocoEvaluator
|
||||
from src.misc import (MetricLogger, SmoothedValue, reduce_dict)
|
||||
|
||||
|
||||
def train_one_epoch(model: torch.nn.Module, criterion: torch.nn.Module,
|
||||
data_loader: Iterable, optimizer: torch.optim.Optimizer,
|
||||
device: torch.device, epoch: int, max_norm: float = 0, **kwargs):
|
||||
model.train()
|
||||
criterion.train()
|
||||
metric_logger = MetricLogger(delimiter=" ")
|
||||
metric_logger.add_meter('lr', SmoothedValue(window_size=1, fmt='{value:.6f}'))
|
||||
# metric_logger.add_meter('class_error', SmoothedValue(window_size=1, fmt='{value:.2f}'))
|
||||
header = 'Epoch: [{}]'.format(epoch)
|
||||
print_freq = kwargs.get('print_freq', 10)
|
||||
|
||||
ema = kwargs.get('ema', None)
|
||||
scaler = kwargs.get('scaler', None)
|
||||
|
||||
for samples, targets in metric_logger.log_every(data_loader, print_freq, header):
|
||||
samples = samples.to(device)
|
||||
targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
|
||||
|
||||
if scaler is not None:
|
||||
with torch.autocast(device_type=str(device), cache_enabled=True):
|
||||
outputs = model(samples, targets)
|
||||
|
||||
with torch.autocast(device_type=str(device), enabled=False):
|
||||
loss_dict = criterion(outputs, targets)
|
||||
|
||||
loss = sum(loss_dict.values())
|
||||
scaler.scale(loss).backward()
|
||||
|
||||
if max_norm > 0:
|
||||
scaler.unscale_(optimizer)
|
||||
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
|
||||
|
||||
scaler.step(optimizer)
|
||||
scaler.update()
|
||||
optimizer.zero_grad()
|
||||
|
||||
else:
|
||||
outputs = model(samples, targets)
|
||||
loss_dict = criterion(outputs, targets)
|
||||
|
||||
loss = sum(loss_dict.values())
|
||||
optimizer.zero_grad()
|
||||
loss.backward()
|
||||
|
||||
if max_norm > 0:
|
||||
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
|
||||
|
||||
optimizer.step()
|
||||
|
||||
# ema
|
||||
if ema is not None:
|
||||
ema.update(model)
|
||||
|
||||
loss_dict_reduced = reduce_dict(loss_dict)
|
||||
loss_value = sum(loss_dict_reduced.values())
|
||||
|
||||
if not math.isfinite(loss_value):
|
||||
print("Loss is {}, stopping training".format(loss_value))
|
||||
print(loss_dict_reduced)
|
||||
sys.exit(1)
|
||||
|
||||
metric_logger.update(loss=loss_value, **loss_dict_reduced)
|
||||
metric_logger.update(lr=optimizer.param_groups[0]["lr"])
|
||||
|
||||
# gather the stats from all processes
|
||||
metric_logger.synchronize_between_processes()
|
||||
print("Averaged stats:", metric_logger)
|
||||
return {k: meter.global_avg for k, meter in metric_logger.meters.items()}
|
||||
|
||||
|
||||
|
||||
@torch.no_grad()
|
||||
def evaluate(model: torch.nn.Module, criterion: torch.nn.Module, postprocessors, data_loader, base_ds, device, output_dir):
|
||||
model.eval()
|
||||
criterion.eval()
|
||||
|
||||
metric_logger = MetricLogger(delimiter=" ")
|
||||
# metric_logger.add_meter('class_error', SmoothedValue(window_size=1, fmt='{value:.2f}'))
|
||||
header = 'Test:'
|
||||
|
||||
# iou_types = tuple(k for k in ('segm', 'bbox') if k in postprocessors.keys())
|
||||
iou_types = postprocessors.iou_types
|
||||
coco_evaluator = CocoEvaluator(base_ds, iou_types)
|
||||
# coco_evaluator.coco_eval[iou_types[0]].params.iouThrs = [0, 0.1, 0.5, 0.75]
|
||||
|
||||
panoptic_evaluator = None
|
||||
# if 'panoptic' in postprocessors.keys():
|
||||
# panoptic_evaluator = PanopticEvaluator(
|
||||
# data_loader.dataset.ann_file,
|
||||
# data_loader.dataset.ann_folder,
|
||||
# output_dir=os.path.join(output_dir, "panoptic_eval"),
|
||||
# )
|
||||
|
||||
for samples, targets in metric_logger.log_every(data_loader, 10, header):
|
||||
samples = samples.to(device)
|
||||
targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
|
||||
|
||||
# with torch.autocast(device_type=str(device)):
|
||||
# outputs = model(samples)
|
||||
|
||||
outputs = model(samples)
|
||||
|
||||
# loss_dict = criterion(outputs, targets)
|
||||
# weight_dict = criterion.weight_dict
|
||||
# # reduce losses over all GPUs for logging purposes
|
||||
# loss_dict_reduced = reduce_dict(loss_dict)
|
||||
# loss_dict_reduced_scaled = {k: v * weight_dict[k]
|
||||
# for k, v in loss_dict_reduced.items() if k in weight_dict}
|
||||
# loss_dict_reduced_unscaled = {f'{k}_unscaled': v
|
||||
# for k, v in loss_dict_reduced.items()}
|
||||
# metric_logger.update(loss=sum(loss_dict_reduced_scaled.values()),
|
||||
# **loss_dict_reduced_scaled,
|
||||
# **loss_dict_reduced_unscaled)
|
||||
# metric_logger.update(class_error=loss_dict_reduced['class_error'])
|
||||
|
||||
orig_target_sizes = torch.stack([t["orig_size"] for t in targets], dim=0)
|
||||
results = postprocessors(outputs, orig_target_sizes)
|
||||
# results = postprocessors(outputs, targets)
|
||||
|
||||
# if 'segm' in postprocessors.keys():
|
||||
# target_sizes = torch.stack([t["size"] for t in targets], dim=0)
|
||||
# results = postprocessors['segm'](results, outputs, orig_target_sizes, target_sizes)
|
||||
|
||||
res = {target['image_id'].item(): output for target, output in zip(targets, results)}
|
||||
if coco_evaluator is not None:
|
||||
coco_evaluator.update(res)
|
||||
|
||||
# if panoptic_evaluator is not None:
|
||||
# res_pano = postprocessors["panoptic"](outputs, target_sizes, orig_target_sizes)
|
||||
# for i, target in enumerate(targets):
|
||||
# image_id = target["image_id"].item()
|
||||
# file_name = f"{image_id:012d}.png"
|
||||
# res_pano[i]["image_id"] = image_id
|
||||
# res_pano[i]["file_name"] = file_name
|
||||
# panoptic_evaluator.update(res_pano)
|
||||
|
||||
# gather the stats from all processes
|
||||
metric_logger.synchronize_between_processes()
|
||||
print("Averaged stats:", metric_logger)
|
||||
if coco_evaluator is not None:
|
||||
coco_evaluator.synchronize_between_processes()
|
||||
if panoptic_evaluator is not None:
|
||||
panoptic_evaluator.synchronize_between_processes()
|
||||
|
||||
# accumulate predictions from all images
|
||||
if coco_evaluator is not None:
|
||||
coco_evaluator.accumulate()
|
||||
coco_evaluator.summarize()
|
||||
|
||||
# panoptic_res = None
|
||||
# if panoptic_evaluator is not None:
|
||||
# panoptic_res = panoptic_evaluator.summarize()
|
||||
|
||||
stats = {}
|
||||
# stats = {k: meter.global_avg for k, meter in metric_logger.meters.items()}
|
||||
if coco_evaluator is not None:
|
||||
if 'bbox' in iou_types:
|
||||
stats['coco_eval_bbox'] = coco_evaluator.coco_eval['bbox'].stats.tolist()
|
||||
if 'segm' in iou_types:
|
||||
stats['coco_eval_masks'] = coco_evaluator.coco_eval['segm'].stats.tolist()
|
||||
|
||||
# if panoptic_res is not None:
|
||||
# stats['PQ_all'] = panoptic_res["All"]
|
||||
# stats['PQ_th'] = panoptic_res["Things"]
|
||||
# stats['PQ_st'] = panoptic_res["Stuff"]
|
||||
|
||||
return stats, coco_evaluator
|
||||
|
||||
|
||||
|
||||
104
rtdetr_pytorch/src/solver/det_solver.py
Normal file
104
rtdetr_pytorch/src/solver/det_solver.py
Normal file
@@ -0,0 +1,104 @@
|
||||
'''
|
||||
by lyuwenyu
|
||||
'''
|
||||
import time
|
||||
import json
|
||||
import datetime
|
||||
|
||||
import torch
|
||||
|
||||
from src.misc import dist
|
||||
from src.data import get_coco_api_from_dataset
|
||||
|
||||
from .solver import BaseSolver
|
||||
from .det_engine import train_one_epoch, evaluate
|
||||
|
||||
|
||||
class DetSolver(BaseSolver):
|
||||
|
||||
def fit(self, ):
|
||||
print("Start training")
|
||||
self.train()
|
||||
|
||||
args = self.cfg
|
||||
|
||||
n_parameters = sum(p.numel() for p in self.model.parameters() if p.requires_grad)
|
||||
print('number of params:', n_parameters)
|
||||
|
||||
base_ds = get_coco_api_from_dataset(self.val_dataloader.dataset)
|
||||
# best_stat = {'coco_eval_bbox': 0, 'coco_eval_masks': 0, 'epoch': -1, }
|
||||
best_stat = {'epoch': -1, }
|
||||
|
||||
start_time = time.time()
|
||||
for epoch in range(self.last_epoch + 1, args.epoches):
|
||||
if dist.is_dist_available_and_initialized():
|
||||
self.train_dataloader.sampler.set_epoch(epoch)
|
||||
|
||||
train_stats = train_one_epoch(
|
||||
self.model, self.criterion, self.train_dataloader, self.optimizer, self.device, epoch,
|
||||
args.clip_max_norm, print_freq=args.log_step, ema=self.ema, scaler=self.scaler)
|
||||
|
||||
self.lr_scheduler.step()
|
||||
|
||||
if self.output_dir:
|
||||
checkpoint_paths = [self.output_dir / 'checkpoint.pth']
|
||||
# extra checkpoint before LR drop and every 100 epochs
|
||||
if (epoch + 1) % args.checkpoint_step == 0:
|
||||
checkpoint_paths.append(self.output_dir / f'checkpoint{epoch:04}.pth')
|
||||
for checkpoint_path in checkpoint_paths:
|
||||
dist.save_on_master(self.state_dict(epoch), checkpoint_path)
|
||||
|
||||
module = self.ema.module if self.ema else self.model
|
||||
test_stats, coco_evaluator = evaluate(
|
||||
module, self.criterion, self.postprocessor, self.val_dataloader, base_ds, self.device, self.output_dir
|
||||
)
|
||||
|
||||
# TODO
|
||||
for k in test_stats.keys():
|
||||
if k in best_stat:
|
||||
best_stat['epoch'] = epoch if test_stats[k][0] > best_stat[k] else best_stat['epoch']
|
||||
best_stat[k] = max(best_stat[k], test_stats[k][0])
|
||||
else:
|
||||
best_stat['epoch'] = epoch
|
||||
best_stat[k] = test_stats[k][0]
|
||||
print('best_stat: ', best_stat)
|
||||
|
||||
|
||||
log_stats = {**{f'train_{k}': v for k, v in train_stats.items()},
|
||||
**{f'test_{k}': v for k, v in test_stats.items()},
|
||||
'epoch': epoch,
|
||||
'n_parameters': n_parameters}
|
||||
|
||||
if self.output_dir and dist.is_main_process():
|
||||
with (self.output_dir / "log.txt").open("a") as f:
|
||||
f.write(json.dumps(log_stats) + "\n")
|
||||
|
||||
# for evaluation logs
|
||||
if coco_evaluator is not None:
|
||||
(self.output_dir / 'eval').mkdir(exist_ok=True)
|
||||
if "bbox" in coco_evaluator.coco_eval:
|
||||
filenames = ['latest.pth']
|
||||
if epoch % 50 == 0:
|
||||
filenames.append(f'{epoch:03}.pth')
|
||||
for name in filenames:
|
||||
torch.save(coco_evaluator.coco_eval["bbox"].eval,
|
||||
self.output_dir / "eval" / name)
|
||||
|
||||
total_time = time.time() - start_time
|
||||
total_time_str = str(datetime.timedelta(seconds=int(total_time)))
|
||||
print('Training time {}'.format(total_time_str))
|
||||
|
||||
|
||||
def val(self, ):
|
||||
self.eval()
|
||||
|
||||
base_ds = get_coco_api_from_dataset(self.val_dataloader.dataset)
|
||||
|
||||
module = self.ema.module if self.ema else self.model
|
||||
test_stats, coco_evaluator = evaluate(module, self.criterion, self.postprocessor,
|
||||
self.val_dataloader, base_ds, self.device, self.output_dir)
|
||||
|
||||
if self.output_dir:
|
||||
dist.save_on_master(coco_evaluator.coco_eval["bbox"].eval, self.output_dir / "eval.pth")
|
||||
|
||||
return
|
||||
182
rtdetr_pytorch/src/solver/solver.py
Normal file
182
rtdetr_pytorch/src/solver/solver.py
Normal file
@@ -0,0 +1,182 @@
|
||||
"""by lyuwenyu
|
||||
"""
|
||||
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
from typing import Dict
|
||||
|
||||
from src.misc import dist
|
||||
from src.core import BaseConfig
|
||||
|
||||
|
||||
class BaseSolver(object):
|
||||
def __init__(self, cfg: BaseConfig) -> None:
|
||||
|
||||
self.cfg = cfg
|
||||
|
||||
def setup(self, ):
|
||||
'''Avoid instantiating unnecessary classes
|
||||
'''
|
||||
cfg = self.cfg
|
||||
device = cfg.device
|
||||
self.device = device
|
||||
self.last_epoch = cfg.last_epoch
|
||||
|
||||
self.model = dist.warp_model(cfg.model.to(device), cfg.find_unused_parameters, cfg.sync_bn)
|
||||
self.criterion = cfg.criterion.to(device)
|
||||
self.postprocessor = cfg.postprocessor
|
||||
|
||||
# NOTE (lvwenyu): should load_tuning_state before ema instance building
|
||||
if self.cfg.tuning:
|
||||
print(f'Tuning checkpoint from {self.cfg.tuning}')
|
||||
self.load_tuning_state(self.cfg.tuning)
|
||||
|
||||
self.scaler = cfg.scaler
|
||||
self.ema = cfg.ema.to(device) if cfg.ema is not None else None
|
||||
|
||||
self.output_dir = Path(cfg.output_dir)
|
||||
self.output_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
|
||||
def train(self, ):
|
||||
self.setup()
|
||||
self.optimizer = self.cfg.optimizer
|
||||
self.lr_scheduler = self.cfg.lr_scheduler
|
||||
|
||||
# NOTE instantiating order
|
||||
if self.cfg.resume:
|
||||
print(f'Resume checkpoint from {self.cfg.resume}')
|
||||
self.resume(self.cfg.resume)
|
||||
|
||||
self.train_dataloader = dist.warp_loader(self.cfg.train_dataloader, \
|
||||
shuffle=self.cfg.train_dataloader.shuffle)
|
||||
self.val_dataloader = dist.warp_loader(self.cfg.val_dataloader, \
|
||||
shuffle=self.cfg.val_dataloader.shuffle)
|
||||
|
||||
|
||||
def eval(self, ):
|
||||
self.setup()
|
||||
self.val_dataloader = dist.warp_loader(self.cfg.val_dataloader, \
|
||||
shuffle=self.cfg.val_dataloader.shuffle)
|
||||
|
||||
if self.cfg.resume:
|
||||
print(f'resume from {self.cfg.resume}')
|
||||
self.resume(self.cfg.resume)
|
||||
|
||||
|
||||
def state_dict(self, last_epoch):
|
||||
'''state dict
|
||||
'''
|
||||
state = {}
|
||||
state['model'] = dist.de_parallel(self.model).state_dict()
|
||||
state['date'] = datetime.now().isoformat()
|
||||
|
||||
# TODO
|
||||
state['last_epoch'] = last_epoch
|
||||
|
||||
if self.optimizer is not None:
|
||||
state['optimizer'] = self.optimizer.state_dict()
|
||||
|
||||
if self.lr_scheduler is not None:
|
||||
state['lr_scheduler'] = self.lr_scheduler.state_dict()
|
||||
# state['last_epoch'] = self.lr_scheduler.last_epoch
|
||||
|
||||
if self.ema is not None:
|
||||
state['ema'] = self.ema.state_dict()
|
||||
|
||||
if self.scaler is not None:
|
||||
state['scaler'] = self.scaler.state_dict()
|
||||
|
||||
return state
|
||||
|
||||
|
||||
def load_state_dict(self, state):
|
||||
'''load state dict
|
||||
'''
|
||||
# TODO
|
||||
if getattr(self, 'last_epoch', None) and 'last_epoch' in state:
|
||||
self.last_epoch = state['last_epoch']
|
||||
print('Loading last_epoch')
|
||||
|
||||
if getattr(self, 'model', None) and 'model' in state:
|
||||
if dist.is_parallel(self.model):
|
||||
self.model.module.load_state_dict(state['model'])
|
||||
else:
|
||||
self.model.load_state_dict(state['model'])
|
||||
print('Loading model.state_dict')
|
||||
|
||||
if getattr(self, 'ema', None) and 'ema' in state:
|
||||
self.ema.load_state_dict(state['ema'])
|
||||
print('Loading ema.state_dict')
|
||||
|
||||
if getattr(self, 'optimizer', None) and 'optimizer' in state:
|
||||
self.optimizer.load_state_dict(state['optimizer'])
|
||||
print('Loading optimizer.state_dict')
|
||||
|
||||
if getattr(self, 'lr_scheduler', None) and 'lr_scheduler' in state:
|
||||
self.lr_scheduler.load_state_dict(state['lr_scheduler'])
|
||||
print('Loading lr_scheduler.state_dict')
|
||||
|
||||
if getattr(self, 'scaler', None) and 'scaler' in state:
|
||||
self.scaler.load_state_dict(state['scaler'])
|
||||
print('Loading scaler.state_dict')
|
||||
|
||||
|
||||
def save(self, path):
|
||||
'''save state
|
||||
'''
|
||||
state = self.state_dict()
|
||||
dist.save_on_master(state, path)
|
||||
|
||||
|
||||
def resume(self, path):
|
||||
'''load resume
|
||||
'''
|
||||
# for cuda:0 memory
|
||||
state = torch.load(path, map_location='cpu')
|
||||
self.load_state_dict(state)
|
||||
|
||||
def load_tuning_state(self, path,):
|
||||
"""only load model for tuning and skip missed/dismatched keys
|
||||
"""
|
||||
if 'http' in path:
|
||||
state = torch.hub.load_state_dict_from_url(path, map_location='cpu')
|
||||
else:
|
||||
state = torch.load(path, map_location='cpu')
|
||||
|
||||
module = dist.de_parallel(self.model)
|
||||
|
||||
# TODO hard code
|
||||
if 'ema' in state:
|
||||
stat, infos = self._matched_state(module.state_dict(), state['ema']['module'])
|
||||
else:
|
||||
stat, infos = self._matched_state(module.state_dict(), state['model'])
|
||||
|
||||
module.load_state_dict(stat, strict=False)
|
||||
print(f'Load model.state_dict, {infos}')
|
||||
|
||||
@staticmethod
|
||||
def _matched_state(state: Dict[str, torch.Tensor], params: Dict[str, torch.Tensor]):
|
||||
missed_list = []
|
||||
unmatched_list = []
|
||||
matched_state = {}
|
||||
for k, v in state.items():
|
||||
if k in params:
|
||||
if v.shape == params[k].shape:
|
||||
matched_state[k] = params[k]
|
||||
else:
|
||||
unmatched_list.append(k)
|
||||
else:
|
||||
missed_list.append(k)
|
||||
|
||||
return matched_state, {'missed': missed_list, 'unmatched': unmatched_list}
|
||||
|
||||
|
||||
def fit(self, ):
|
||||
raise NotImplementedError('')
|
||||
|
||||
def val(self, ):
|
||||
raise NotImplementedError('')
|
||||
2
rtdetr_pytorch/src/zoo/__init__.py
Normal file
2
rtdetr_pytorch/src/zoo/__init__.py
Normal file
@@ -0,0 +1,2 @@
|
||||
|
||||
from .rtdetr import *
|
||||
12
rtdetr_pytorch/src/zoo/rtdetr/__init__.py
Normal file
12
rtdetr_pytorch/src/zoo/rtdetr/__init__.py
Normal file
@@ -0,0 +1,12 @@
|
||||
"""by lyuwenyu
|
||||
"""
|
||||
|
||||
|
||||
from .rtdetr import *
|
||||
|
||||
from .hybrid_encoder import *
|
||||
from .rtdetr_decoder import *
|
||||
from .rtdetr_postprocessor import *
|
||||
from .rtdetr_criterion import *
|
||||
|
||||
from .matcher import *
|
||||
89
rtdetr_pytorch/src/zoo/rtdetr/box_ops.py
Normal file
89
rtdetr_pytorch/src/zoo/rtdetr/box_ops.py
Normal file
@@ -0,0 +1,89 @@
|
||||
'''
|
||||
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
|
||||
https://github.com/facebookresearch/detr/blob/main/util/box_ops.py
|
||||
'''
|
||||
|
||||
import torch
|
||||
from torchvision.ops.boxes import box_area
|
||||
|
||||
|
||||
def box_cxcywh_to_xyxy(x):
|
||||
x_c, y_c, w, h = x.unbind(-1)
|
||||
b = [(x_c - 0.5 * w), (y_c - 0.5 * h),
|
||||
(x_c + 0.5 * w), (y_c + 0.5 * h)]
|
||||
return torch.stack(b, dim=-1)
|
||||
|
||||
|
||||
def box_xyxy_to_cxcywh(x):
|
||||
x0, y0, x1, y1 = x.unbind(-1)
|
||||
b = [(x0 + x1) / 2, (y0 + y1) / 2,
|
||||
(x1 - x0), (y1 - y0)]
|
||||
return torch.stack(b, dim=-1)
|
||||
|
||||
|
||||
# modified from torchvision to also return the union
|
||||
def box_iou(boxes1, boxes2):
|
||||
area1 = box_area(boxes1)
|
||||
area2 = box_area(boxes2)
|
||||
|
||||
lt = torch.max(boxes1[:, None, :2], boxes2[:, :2]) # [N,M,2]
|
||||
rb = torch.min(boxes1[:, None, 2:], boxes2[:, 2:]) # [N,M,2]
|
||||
|
||||
wh = (rb - lt).clamp(min=0) # [N,M,2]
|
||||
inter = wh[:, :, 0] * wh[:, :, 1] # [N,M]
|
||||
|
||||
union = area1[:, None] + area2 - inter
|
||||
|
||||
iou = inter / union
|
||||
return iou, union
|
||||
|
||||
|
||||
def generalized_box_iou(boxes1, boxes2):
|
||||
"""
|
||||
Generalized IoU from https://giou.stanford.edu/
|
||||
|
||||
The boxes should be in [x0, y0, x1, y1] format
|
||||
|
||||
Returns a [N, M] pairwise matrix, where N = len(boxes1)
|
||||
and M = len(boxes2)
|
||||
"""
|
||||
# degenerate boxes gives inf / nan results
|
||||
# so do an early check
|
||||
assert (boxes1[:, 2:] >= boxes1[:, :2]).all()
|
||||
assert (boxes2[:, 2:] >= boxes2[:, :2]).all()
|
||||
iou, union = box_iou(boxes1, boxes2)
|
||||
|
||||
lt = torch.min(boxes1[:, None, :2], boxes2[:, :2])
|
||||
rb = torch.max(boxes1[:, None, 2:], boxes2[:, 2:])
|
||||
|
||||
wh = (rb - lt).clamp(min=0) # [N,M,2]
|
||||
area = wh[:, :, 0] * wh[:, :, 1]
|
||||
|
||||
return iou - (area - union) / area
|
||||
|
||||
|
||||
def masks_to_boxes(masks):
|
||||
"""Compute the bounding boxes around the provided masks
|
||||
|
||||
The masks should be in format [N, H, W] where N is the number of masks, (H, W) are the spatial dimensions.
|
||||
|
||||
Returns a [N, 4] tensors, with the boxes in xyxy format
|
||||
"""
|
||||
if masks.numel() == 0:
|
||||
return torch.zeros((0, 4), device=masks.device)
|
||||
|
||||
h, w = masks.shape[-2:]
|
||||
|
||||
y = torch.arange(0, h, dtype=torch.float)
|
||||
x = torch.arange(0, w, dtype=torch.float)
|
||||
y, x = torch.meshgrid(y, x)
|
||||
|
||||
x_mask = (masks * x.unsqueeze(0))
|
||||
x_max = x_mask.flatten(1).max(-1)[0]
|
||||
x_min = x_mask.masked_fill(~(masks.bool()), 1e8).flatten(1).min(-1)[0]
|
||||
|
||||
y_mask = (masks * y.unsqueeze(0))
|
||||
y_max = y_mask.flatten(1).max(-1)[0]
|
||||
y_min = y_mask.masked_fill(~(masks.bool()), 1e8).flatten(1).min(-1)[0]
|
||||
|
||||
return torch.stack([x_min, y_min, x_max, y_max], 1)
|
||||
125
rtdetr_pytorch/src/zoo/rtdetr/denoising.py
Normal file
125
rtdetr_pytorch/src/zoo/rtdetr/denoising.py
Normal file
@@ -0,0 +1,125 @@
|
||||
"""by lyuwenyu
|
||||
"""
|
||||
|
||||
import torch
|
||||
|
||||
from .utils import inverse_sigmoid
|
||||
from .box_ops import box_cxcywh_to_xyxy, box_xyxy_to_cxcywh
|
||||
|
||||
|
||||
|
||||
def get_contrastive_denoising_training_group(targets,
|
||||
num_classes,
|
||||
num_queries,
|
||||
class_embed,
|
||||
num_denoising=100,
|
||||
label_noise_ratio=0.5,
|
||||
box_noise_scale=1.0,):
|
||||
"""cnd"""
|
||||
if num_denoising <= 0:
|
||||
return None, None, None, None
|
||||
|
||||
num_gts = [len(t['labels']) for t in targets]
|
||||
device = targets[0]['labels'].device
|
||||
|
||||
max_gt_num = max(num_gts)
|
||||
if max_gt_num == 0:
|
||||
return None, None, None, None
|
||||
|
||||
num_group = num_denoising // max_gt_num
|
||||
num_group = 1 if num_group == 0 else num_group
|
||||
# pad gt to max_num of a batch
|
||||
bs = len(num_gts)
|
||||
|
||||
input_query_class = torch.full([bs, max_gt_num], num_classes, dtype=torch.int32, device=device)
|
||||
input_query_bbox = torch.zeros([bs, max_gt_num, 4], device=device)
|
||||
pad_gt_mask = torch.zeros([bs, max_gt_num], dtype=torch.bool, device=device)
|
||||
|
||||
for i in range(bs):
|
||||
num_gt = num_gts[i]
|
||||
if num_gt > 0:
|
||||
input_query_class[i, :num_gt] = targets[i]['labels']
|
||||
input_query_bbox[i, :num_gt] = targets[i]['boxes']
|
||||
pad_gt_mask[i, :num_gt] = 1
|
||||
# each group has positive and negative queries.
|
||||
input_query_class = input_query_class.tile([1, 2 * num_group])
|
||||
input_query_bbox = input_query_bbox.tile([1, 2 * num_group, 1])
|
||||
pad_gt_mask = pad_gt_mask.tile([1, 2 * num_group])
|
||||
# positive and negative mask
|
||||
negative_gt_mask = torch.zeros([bs, max_gt_num * 2, 1], device=device)
|
||||
negative_gt_mask[:, max_gt_num:] = 1
|
||||
negative_gt_mask = negative_gt_mask.tile([1, num_group, 1])
|
||||
positive_gt_mask = 1 - negative_gt_mask
|
||||
# contrastive denoising training positive index
|
||||
positive_gt_mask = positive_gt_mask.squeeze(-1) * pad_gt_mask
|
||||
dn_positive_idx = torch.nonzero(positive_gt_mask)[:, 1]
|
||||
dn_positive_idx = torch.split(dn_positive_idx, [n * num_group for n in num_gts])
|
||||
# total denoising queries
|
||||
num_denoising = int(max_gt_num * 2 * num_group)
|
||||
|
||||
if label_noise_ratio > 0:
|
||||
mask = torch.rand_like(input_query_class, dtype=torch.float) < (label_noise_ratio * 0.5)
|
||||
# randomly put a new one here
|
||||
new_label = torch.randint_like(mask, 0, num_classes, dtype=input_query_class.dtype)
|
||||
input_query_class = torch.where(mask & pad_gt_mask, new_label, input_query_class)
|
||||
|
||||
# if label_noise_ratio > 0:
|
||||
# input_query_class = input_query_class.flatten()
|
||||
# pad_gt_mask = pad_gt_mask.flatten()
|
||||
# # half of bbox prob
|
||||
# # mask = torch.rand(input_query_class.shape, device=device) < (label_noise_ratio * 0.5)
|
||||
# mask = torch.rand_like(input_query_class) < (label_noise_ratio * 0.5)
|
||||
# chosen_idx = torch.nonzero(mask * pad_gt_mask).squeeze(-1)
|
||||
# # randomly put a new one here
|
||||
# new_label = torch.randint_like(chosen_idx, 0, num_classes, dtype=input_query_class.dtype)
|
||||
# # input_query_class.scatter_(dim=0, index=chosen_idx, value=new_label)
|
||||
# input_query_class[chosen_idx] = new_label
|
||||
# input_query_class = input_query_class.reshape(bs, num_denoising)
|
||||
# pad_gt_mask = pad_gt_mask.reshape(bs, num_denoising)
|
||||
|
||||
if box_noise_scale > 0:
|
||||
known_bbox = box_cxcywh_to_xyxy(input_query_bbox)
|
||||
diff = torch.tile(input_query_bbox[..., 2:] * 0.5, [1, 1, 2]) * box_noise_scale
|
||||
rand_sign = torch.randint_like(input_query_bbox, 0, 2) * 2.0 - 1.0
|
||||
rand_part = torch.rand_like(input_query_bbox)
|
||||
rand_part = (rand_part + 1.0) * negative_gt_mask + rand_part * (1 - negative_gt_mask)
|
||||
rand_part *= rand_sign
|
||||
known_bbox += rand_part * diff
|
||||
known_bbox.clip_(min=0.0, max=1.0)
|
||||
input_query_bbox = box_xyxy_to_cxcywh(known_bbox)
|
||||
input_query_bbox = inverse_sigmoid(input_query_bbox)
|
||||
|
||||
# class_embed = torch.concat([class_embed, torch.zeros([1, class_embed.shape[-1]], device=device)])
|
||||
# input_query_class = torch.gather(
|
||||
# class_embed, input_query_class.flatten(),
|
||||
# axis=0).reshape(bs, num_denoising, -1)
|
||||
# input_query_class = class_embed(input_query_class.flatten()).reshape(bs, num_denoising, -1)
|
||||
input_query_class = class_embed(input_query_class)
|
||||
|
||||
tgt_size = num_denoising + num_queries
|
||||
# attn_mask = torch.ones([tgt_size, tgt_size], device=device) < 0
|
||||
attn_mask = torch.full([tgt_size, tgt_size], False, dtype=torch.bool, device=device)
|
||||
# match query cannot see the reconstruction
|
||||
attn_mask[num_denoising:, :num_denoising] = True
|
||||
|
||||
# reconstruct cannot see each other
|
||||
for i in range(num_group):
|
||||
if i == 0:
|
||||
attn_mask[max_gt_num * 2 * i: max_gt_num * 2 * (i + 1), max_gt_num * 2 * (i + 1): num_denoising] = True
|
||||
if i == num_group - 1:
|
||||
attn_mask[max_gt_num * 2 * i: max_gt_num * 2 * (i + 1), :max_gt_num * i * 2] = True
|
||||
else:
|
||||
attn_mask[max_gt_num * 2 * i: max_gt_num * 2 * (i + 1), max_gt_num * 2 * (i + 1): num_denoising] = True
|
||||
attn_mask[max_gt_num * 2 * i: max_gt_num * 2 * (i + 1), :max_gt_num * 2 * i] = True
|
||||
|
||||
dn_meta = {
|
||||
"dn_positive_idx": dn_positive_idx,
|
||||
"dn_num_group": num_group,
|
||||
"dn_num_split": [num_denoising, num_queries]
|
||||
}
|
||||
|
||||
# print(input_query_class.shape) # torch.Size([4, 196, 256])
|
||||
# print(input_query_bbox.shape) # torch.Size([4, 196, 4])
|
||||
# print(attn_mask.shape) # torch.Size([496, 496])
|
||||
|
||||
return input_query_class, input_query_bbox, attn_mask, dn_meta
|
||||
322
rtdetr_pytorch/src/zoo/rtdetr/hybrid_encoder.py
Normal file
322
rtdetr_pytorch/src/zoo/rtdetr/hybrid_encoder.py
Normal file
@@ -0,0 +1,322 @@
|
||||
'''by lyuwenyu
|
||||
'''
|
||||
|
||||
import copy
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
import torch.nn.functional as F
|
||||
|
||||
from .utils import get_activation
|
||||
|
||||
from src.core import register
|
||||
|
||||
|
||||
__all__ = ['HybridEncoder']
|
||||
|
||||
|
||||
|
||||
class ConvNormLayer(nn.Module):
|
||||
def __init__(self, ch_in, ch_out, kernel_size, stride, padding=None, bias=False, act=None):
|
||||
super().__init__()
|
||||
self.conv = nn.Conv2d(
|
||||
ch_in,
|
||||
ch_out,
|
||||
kernel_size,
|
||||
stride,
|
||||
padding=(kernel_size-1)//2 if padding is None else padding,
|
||||
bias=bias)
|
||||
self.norm = nn.BatchNorm2d(ch_out)
|
||||
self.act = nn.Identity() if act is None else get_activation(act)
|
||||
|
||||
def forward(self, x):
|
||||
return self.act(self.norm(self.conv(x)))
|
||||
|
||||
|
||||
class RepVggBlock(nn.Module):
|
||||
def __init__(self, ch_in, ch_out, act='relu'):
|
||||
super().__init__()
|
||||
self.ch_in = ch_in
|
||||
self.ch_out = ch_out
|
||||
self.conv1 = ConvNormLayer(ch_in, ch_out, 3, 1, padding=1, act=None)
|
||||
self.conv2 = ConvNormLayer(ch_in, ch_out, 1, 1, padding=0, act=None)
|
||||
self.act = nn.Identity() if act is None else get_activation(act)
|
||||
|
||||
def forward(self, x):
|
||||
if hasattr(self, 'conv'):
|
||||
y = self.conv(x)
|
||||
else:
|
||||
y = self.conv1(x) + self.conv2(x)
|
||||
|
||||
return self.act(y)
|
||||
|
||||
def convert_to_deploy(self):
|
||||
if not hasattr(self, 'conv'):
|
||||
self.conv = nn.Conv2d(self.ch_in, self.ch_out, 3, 1, padding=1)
|
||||
|
||||
kernel, bias = self.get_equivalent_kernel_bias()
|
||||
self.conv.weight.data = kernel
|
||||
self.conv.bias.data = bias
|
||||
# self.__delattr__('conv1')
|
||||
# self.__delattr__('conv2')
|
||||
|
||||
def get_equivalent_kernel_bias(self):
|
||||
kernel3x3, bias3x3 = self._fuse_bn_tensor(self.conv1)
|
||||
kernel1x1, bias1x1 = self._fuse_bn_tensor(self.conv2)
|
||||
|
||||
return kernel3x3 + self._pad_1x1_to_3x3_tensor(kernel1x1), bias3x3 + bias1x1
|
||||
|
||||
def _pad_1x1_to_3x3_tensor(self, kernel1x1):
|
||||
if kernel1x1 is None:
|
||||
return 0
|
||||
else:
|
||||
return F.pad(kernel1x1, [1, 1, 1, 1])
|
||||
|
||||
def _fuse_bn_tensor(self, branch: ConvNormLayer):
|
||||
if branch is None:
|
||||
return 0, 0
|
||||
kernel = branch.conv.weight
|
||||
running_mean = branch.norm.running_mean
|
||||
running_var = branch.norm.running_var
|
||||
gamma = branch.norm.weight
|
||||
beta = branch.norm.bias
|
||||
eps = branch.norm.eps
|
||||
std = (running_var + eps).sqrt()
|
||||
t = (gamma / std).reshape(-1, 1, 1, 1)
|
||||
return kernel * t, beta - running_mean * gamma / std
|
||||
|
||||
|
||||
class CSPRepLayer(nn.Module):
|
||||
def __init__(self,
|
||||
in_channels,
|
||||
out_channels,
|
||||
num_blocks=3,
|
||||
expansion=1.0,
|
||||
bias=None,
|
||||
act="silu"):
|
||||
super(CSPRepLayer, self).__init__()
|
||||
hidden_channels = int(out_channels * expansion)
|
||||
self.conv1 = ConvNormLayer(in_channels, hidden_channels, 1, 1, bias=bias, act=act)
|
||||
self.conv2 = ConvNormLayer(in_channels, hidden_channels, 1, 1, bias=bias, act=act)
|
||||
self.bottlenecks = nn.Sequential(*[
|
||||
RepVggBlock(hidden_channels, hidden_channels, act=act) for _ in range(num_blocks)
|
||||
])
|
||||
if hidden_channels != out_channels:
|
||||
self.conv3 = ConvNormLayer(hidden_channels, out_channels, 1, 1, bias=bias, act=act)
|
||||
else:
|
||||
self.conv3 = nn.Identity()
|
||||
|
||||
def forward(self, x):
|
||||
x_1 = self.conv1(x)
|
||||
x_1 = self.bottlenecks(x_1)
|
||||
x_2 = self.conv2(x)
|
||||
return self.conv3(x_1 + x_2)
|
||||
|
||||
|
||||
# transformer
|
||||
class TransformerEncoderLayer(nn.Module):
|
||||
def __init__(self,
|
||||
d_model,
|
||||
nhead,
|
||||
dim_feedforward=2048,
|
||||
dropout=0.1,
|
||||
activation="relu",
|
||||
normalize_before=False):
|
||||
super().__init__()
|
||||
self.normalize_before = normalize_before
|
||||
|
||||
self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout, batch_first=True)
|
||||
|
||||
self.linear1 = nn.Linear(d_model, dim_feedforward)
|
||||
self.dropout = nn.Dropout(dropout)
|
||||
self.linear2 = nn.Linear(dim_feedforward, d_model)
|
||||
|
||||
self.norm1 = nn.LayerNorm(d_model)
|
||||
self.norm2 = nn.LayerNorm(d_model)
|
||||
self.dropout1 = nn.Dropout(dropout)
|
||||
self.dropout2 = nn.Dropout(dropout)
|
||||
self.activation = get_activation(activation)
|
||||
|
||||
@staticmethod
|
||||
def with_pos_embed(tensor, pos_embed):
|
||||
return tensor if pos_embed is None else tensor + pos_embed
|
||||
|
||||
def forward(self, src, src_mask=None, pos_embed=None) -> torch.Tensor:
|
||||
residual = src
|
||||
if self.normalize_before:
|
||||
src = self.norm1(src)
|
||||
q = k = self.with_pos_embed(src, pos_embed)
|
||||
src, _ = self.self_attn(q, k, value=src, attn_mask=src_mask)
|
||||
|
||||
src = residual + self.dropout1(src)
|
||||
if not self.normalize_before:
|
||||
src = self.norm1(src)
|
||||
|
||||
residual = src
|
||||
if self.normalize_before:
|
||||
src = self.norm2(src)
|
||||
src = self.linear2(self.dropout(self.activation(self.linear1(src))))
|
||||
src = residual + self.dropout2(src)
|
||||
if not self.normalize_before:
|
||||
src = self.norm2(src)
|
||||
return src
|
||||
|
||||
|
||||
class TransformerEncoder(nn.Module):
|
||||
def __init__(self, encoder_layer, num_layers, norm=None):
|
||||
super(TransformerEncoder, self).__init__()
|
||||
self.layers = nn.ModuleList([copy.deepcopy(encoder_layer) for _ in range(num_layers)])
|
||||
self.num_layers = num_layers
|
||||
self.norm = norm
|
||||
|
||||
def forward(self, src, src_mask=None, pos_embed=None) -> torch.Tensor:
|
||||
output = src
|
||||
for layer in self.layers:
|
||||
output = layer(output, src_mask=src_mask, pos_embed=pos_embed)
|
||||
|
||||
if self.norm is not None:
|
||||
output = self.norm(output)
|
||||
|
||||
return output
|
||||
|
||||
|
||||
@register
|
||||
class HybridEncoder(nn.Module):
|
||||
def __init__(self,
|
||||
in_channels=[512, 1024, 2048],
|
||||
feat_strides=[8, 16, 32],
|
||||
hidden_dim=256,
|
||||
nhead=8,
|
||||
dim_feedforward = 1024,
|
||||
dropout=0.0,
|
||||
enc_act='gelu',
|
||||
use_encoder_idx=[2],
|
||||
num_encoder_layers=1,
|
||||
pe_temperature=10000,
|
||||
expansion=1.0,
|
||||
depth_mult=1.0,
|
||||
act='silu',
|
||||
eval_spatial_size=None):
|
||||
super().__init__()
|
||||
self.in_channels = in_channels
|
||||
self.feat_strides = feat_strides
|
||||
self.hidden_dim = hidden_dim
|
||||
self.use_encoder_idx = use_encoder_idx
|
||||
self.num_encoder_layers = num_encoder_layers
|
||||
self.pe_temperature = pe_temperature
|
||||
self.eval_spatial_size = eval_spatial_size
|
||||
|
||||
self.out_channels = [hidden_dim for _ in range(len(in_channels))]
|
||||
self.out_strides = feat_strides
|
||||
|
||||
# channel projection
|
||||
self.input_proj = nn.ModuleList()
|
||||
for in_channel in in_channels:
|
||||
self.input_proj.append(
|
||||
nn.Sequential(
|
||||
nn.Conv2d(in_channel, hidden_dim, kernel_size=1, bias=False),
|
||||
nn.BatchNorm2d(hidden_dim)
|
||||
)
|
||||
)
|
||||
|
||||
# encoder transformer
|
||||
encoder_layer = TransformerEncoderLayer(
|
||||
hidden_dim,
|
||||
nhead=nhead,
|
||||
dim_feedforward=dim_feedforward,
|
||||
dropout=dropout,
|
||||
activation=enc_act)
|
||||
|
||||
self.encoder = nn.ModuleList([
|
||||
TransformerEncoder(copy.deepcopy(encoder_layer), num_encoder_layers) for _ in range(len(use_encoder_idx))
|
||||
])
|
||||
|
||||
# top-down fpn
|
||||
self.lateral_convs = nn.ModuleList()
|
||||
self.fpn_blocks = nn.ModuleList()
|
||||
for _ in range(len(in_channels) - 1, 0, -1):
|
||||
self.lateral_convs.append(ConvNormLayer(hidden_dim, hidden_dim, 1, 1, act=act))
|
||||
self.fpn_blocks.append(
|
||||
CSPRepLayer(hidden_dim * 2, hidden_dim, round(3 * depth_mult), act=act, expansion=expansion)
|
||||
)
|
||||
|
||||
# bottom-up pan
|
||||
self.downsample_convs = nn.ModuleList()
|
||||
self.pan_blocks = nn.ModuleList()
|
||||
for _ in range(len(in_channels) - 1):
|
||||
self.downsample_convs.append(
|
||||
ConvNormLayer(hidden_dim, hidden_dim, 3, 2, act=act)
|
||||
)
|
||||
self.pan_blocks.append(
|
||||
CSPRepLayer(hidden_dim * 2, hidden_dim, round(3 * depth_mult), act=act, expansion=expansion)
|
||||
)
|
||||
|
||||
self._reset_parameters()
|
||||
|
||||
def _reset_parameters(self):
|
||||
if self.eval_spatial_size:
|
||||
for idx in self.use_encoder_idx:
|
||||
stride = self.feat_strides[idx]
|
||||
pos_embed = self.build_2d_sincos_position_embedding(
|
||||
self.eval_spatial_size[1] // stride, self.eval_spatial_size[0] // stride,
|
||||
self.hidden_dim, self.pe_temperature)
|
||||
setattr(self, f'pos_embed{idx}', pos_embed)
|
||||
# self.register_buffer(f'pos_embed{idx}', pos_embed)
|
||||
|
||||
@staticmethod
|
||||
def build_2d_sincos_position_embedding(w, h, embed_dim=256, temperature=10000.):
|
||||
'''
|
||||
'''
|
||||
grid_w = torch.arange(int(w), dtype=torch.float32)
|
||||
grid_h = torch.arange(int(h), dtype=torch.float32)
|
||||
grid_w, grid_h = torch.meshgrid(grid_w, grid_h, indexing='ij')
|
||||
assert embed_dim % 4 == 0, \
|
||||
'Embed dimension must be divisible by 4 for 2D sin-cos position embedding'
|
||||
pos_dim = embed_dim // 4
|
||||
omega = torch.arange(pos_dim, dtype=torch.float32) / pos_dim
|
||||
omega = 1. / (temperature ** omega)
|
||||
|
||||
out_w = grid_w.flatten()[..., None] @ omega[None]
|
||||
out_h = grid_h.flatten()[..., None] @ omega[None]
|
||||
|
||||
return torch.concat([out_w.sin(), out_w.cos(), out_h.sin(), out_h.cos()], dim=1)[None, :, :]
|
||||
|
||||
def forward(self, feats):
|
||||
assert len(feats) == len(self.in_channels)
|
||||
proj_feats = [self.input_proj[i](feat) for i, feat in enumerate(feats)]
|
||||
|
||||
# encoder
|
||||
if self.num_encoder_layers > 0:
|
||||
for i, enc_ind in enumerate(self.use_encoder_idx):
|
||||
h, w = proj_feats[enc_ind].shape[2:]
|
||||
# flatten [B, C, H, W] to [B, HxW, C]
|
||||
src_flatten = proj_feats[enc_ind].flatten(2).permute(0, 2, 1)
|
||||
if self.training or self.eval_spatial_size is None:
|
||||
pos_embed = self.build_2d_sincos_position_embedding(
|
||||
w, h, self.hidden_dim, self.pe_temperature).to(src_flatten.device)
|
||||
else:
|
||||
pos_embed = getattr(self, f'pos_embed{enc_ind}', None).to(src_flatten.device)
|
||||
|
||||
memory = self.encoder[i](src_flatten, pos_embed=pos_embed)
|
||||
proj_feats[enc_ind] = memory.permute(0, 2, 1).reshape(-1, self.hidden_dim, h, w).contiguous()
|
||||
# print([x.is_contiguous() for x in proj_feats ])
|
||||
|
||||
# broadcasting and fusion
|
||||
inner_outs = [proj_feats[-1]]
|
||||
for idx in range(len(self.in_channels) - 1, 0, -1):
|
||||
feat_high = inner_outs[0]
|
||||
feat_low = proj_feats[idx - 1]
|
||||
feat_high = self.lateral_convs[len(self.in_channels) - 1 - idx](feat_high)
|
||||
inner_outs[0] = feat_high
|
||||
upsample_feat = F.interpolate(feat_high, scale_factor=2., mode='nearest')
|
||||
inner_out = self.fpn_blocks[len(self.in_channels)-1-idx](torch.concat([upsample_feat, feat_low], dim=1))
|
||||
inner_outs.insert(0, inner_out)
|
||||
|
||||
outs = [inner_outs[0]]
|
||||
for idx in range(len(self.in_channels) - 1):
|
||||
feat_low = outs[-1]
|
||||
feat_high = inner_outs[idx + 1]
|
||||
downsample_feat = self.downsample_convs[idx](feat_low)
|
||||
out = self.pan_blocks[idx](torch.concat([downsample_feat, feat_high], dim=1))
|
||||
outs.append(out)
|
||||
|
||||
return outs
|
||||
108
rtdetr_pytorch/src/zoo/rtdetr/matcher.py
Normal file
108
rtdetr_pytorch/src/zoo/rtdetr/matcher.py
Normal file
@@ -0,0 +1,108 @@
|
||||
"""
|
||||
Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
|
||||
Modules to compute the matching cost and solve the corresponding LSAP.
|
||||
|
||||
by lyuwenyu
|
||||
"""
|
||||
|
||||
import torch
|
||||
import torch.nn.functional as F
|
||||
|
||||
from scipy.optimize import linear_sum_assignment
|
||||
from torch import nn
|
||||
|
||||
from .box_ops import box_cxcywh_to_xyxy, generalized_box_iou
|
||||
|
||||
from src.core import register
|
||||
|
||||
|
||||
@register
|
||||
class HungarianMatcher(nn.Module):
|
||||
"""This class computes an assignment between the targets and the predictions of the network
|
||||
|
||||
For efficiency reasons, the targets don't include the no_object. Because of this, in general,
|
||||
there are more predictions than targets. In this case, we do a 1-to-1 matching of the best predictions,
|
||||
while the others are un-matched (and thus treated as non-objects).
|
||||
"""
|
||||
|
||||
__share__ = ['use_focal_loss', ]
|
||||
|
||||
def __init__(self, weight_dict, use_focal_loss=False, alpha=0.25, gamma=2.0):
|
||||
"""Creates the matcher
|
||||
|
||||
Params:
|
||||
cost_class: This is the relative weight of the classification error in the matching cost
|
||||
cost_bbox: This is the relative weight of the L1 error of the bounding box coordinates in the matching cost
|
||||
cost_giou: This is the relative weight of the giou loss of the bounding box in the matching cost
|
||||
"""
|
||||
super().__init__()
|
||||
self.cost_class = weight_dict['cost_class']
|
||||
self.cost_bbox = weight_dict['cost_bbox']
|
||||
self.cost_giou = weight_dict['cost_giou']
|
||||
|
||||
self.use_focal_loss = use_focal_loss
|
||||
self.alpha = alpha
|
||||
self.gamma = gamma
|
||||
|
||||
assert self.cost_class != 0 or self.cost_bbox != 0 or self.cost_giou != 0, "all costs cant be 0"
|
||||
|
||||
@torch.no_grad()
|
||||
def forward(self, outputs, targets):
|
||||
""" Performs the matching
|
||||
|
||||
Params:
|
||||
outputs: This is a dict that contains at least these entries:
|
||||
"pred_logits": Tensor of dim [batch_size, num_queries, num_classes] with the classification logits
|
||||
"pred_boxes": Tensor of dim [batch_size, num_queries, 4] with the predicted box coordinates
|
||||
|
||||
targets: This is a list of targets (len(targets) = batch_size), where each target is a dict containing:
|
||||
"labels": Tensor of dim [num_target_boxes] (where num_target_boxes is the number of ground-truth
|
||||
objects in the target) containing the class labels
|
||||
"boxes": Tensor of dim [num_target_boxes, 4] containing the target box coordinates
|
||||
|
||||
Returns:
|
||||
A list of size batch_size, containing tuples of (index_i, index_j) where:
|
||||
- index_i is the indices of the selected predictions (in order)
|
||||
- index_j is the indices of the corresponding selected targets (in order)
|
||||
For each batch element, it holds:
|
||||
len(index_i) = len(index_j) = min(num_queries, num_target_boxes)
|
||||
"""
|
||||
bs, num_queries = outputs["pred_logits"].shape[:2]
|
||||
|
||||
# We flatten to compute the cost matrices in a batch
|
||||
if self.use_focal_loss:
|
||||
out_prob = F.sigmoid(outputs["pred_logits"].flatten(0, 1))
|
||||
else:
|
||||
out_prob = outputs["pred_logits"].flatten(0, 1).softmax(-1) # [batch_size * num_queries, num_classes]
|
||||
|
||||
out_bbox = outputs["pred_boxes"].flatten(0, 1) # [batch_size * num_queries, 4]
|
||||
|
||||
# Also concat the target labels and boxes
|
||||
tgt_ids = torch.cat([v["labels"] for v in targets])
|
||||
tgt_bbox = torch.cat([v["boxes"] for v in targets])
|
||||
|
||||
# Compute the classification cost. Contrary to the loss, we don't use the NLL,
|
||||
# but approximate it in 1 - proba[target class].
|
||||
# The 1 is a constant that doesn't change the matching, it can be ommitted.
|
||||
if self.use_focal_loss:
|
||||
out_prob = out_prob[:, tgt_ids]
|
||||
neg_cost_class = (1 - self.alpha) * (out_prob**self.gamma) * (-(1 - out_prob + 1e-8).log())
|
||||
pos_cost_class = self.alpha * ((1 - out_prob)**self.gamma) * (-(out_prob + 1e-8).log())
|
||||
cost_class = pos_cost_class - neg_cost_class
|
||||
else:
|
||||
cost_class = -out_prob[:, tgt_ids]
|
||||
|
||||
# Compute the L1 cost between boxes
|
||||
cost_bbox = torch.cdist(out_bbox, tgt_bbox, p=1)
|
||||
|
||||
# Compute the giou cost betwen boxes
|
||||
cost_giou = -generalized_box_iou(box_cxcywh_to_xyxy(out_bbox), box_cxcywh_to_xyxy(tgt_bbox))
|
||||
|
||||
# Final cost matrix
|
||||
C = self.cost_bbox * cost_bbox + self.cost_class * cost_class + self.cost_giou * cost_giou
|
||||
C = C.view(bs, num_queries, -1).cpu()
|
||||
|
||||
sizes = [len(v["boxes"]) for v in targets]
|
||||
indices = [linear_sum_assignment(c[i]) for i, c in enumerate(C.split(sizes, -1))]
|
||||
|
||||
return [(torch.as_tensor(i, dtype=torch.int64), torch.as_tensor(j, dtype=torch.int64)) for i, j in indices]
|
||||
44
rtdetr_pytorch/src/zoo/rtdetr/rtdetr.py
Normal file
44
rtdetr_pytorch/src/zoo/rtdetr/rtdetr.py
Normal file
@@ -0,0 +1,44 @@
|
||||
"""by lyuwenyu
|
||||
"""
|
||||
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
import torch.nn.functional as F
|
||||
|
||||
import random
|
||||
import numpy as np
|
||||
|
||||
from src.core import register
|
||||
|
||||
|
||||
__all__ = ['RTDETR', ]
|
||||
|
||||
|
||||
@register
|
||||
class RTDETR(nn.Module):
|
||||
__inject__ = ['backbone', 'encoder', 'decoder', ]
|
||||
|
||||
def __init__(self, backbone: nn.Module, encoder, decoder, multi_scale=None):
|
||||
super().__init__()
|
||||
self.backbone = backbone
|
||||
self.decoder = decoder
|
||||
self.encoder = encoder
|
||||
self.multi_scale = multi_scale
|
||||
|
||||
def forward(self, x, targets=None):
|
||||
if self.multi_scale and self.training:
|
||||
sz = np.random.choice(self.multi_scale)
|
||||
x = F.interpolate(x, size=[sz, sz])
|
||||
|
||||
x = self.backbone(x)
|
||||
x = self.encoder(x)
|
||||
x = self.decoder(x, targets)
|
||||
|
||||
return x
|
||||
|
||||
def deploy(self, ):
|
||||
self.eval()
|
||||
for m in self.modules():
|
||||
if hasattr(m, 'convert_to_deploy'):
|
||||
m.convert_to_deploy()
|
||||
return self
|
||||
341
rtdetr_pytorch/src/zoo/rtdetr/rtdetr_criterion.py
Normal file
341
rtdetr_pytorch/src/zoo/rtdetr/rtdetr_criterion.py
Normal file
@@ -0,0 +1,341 @@
|
||||
"""
|
||||
reference:
|
||||
https://github.com/facebookresearch/detr/blob/main/models/detr.py
|
||||
|
||||
by lyuwenyu
|
||||
"""
|
||||
|
||||
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
import torch.nn.functional as F
|
||||
import torchvision
|
||||
|
||||
# from torchvision.ops import box_convert, generalized_box_iou
|
||||
from .box_ops import box_cxcywh_to_xyxy, box_iou, generalized_box_iou
|
||||
|
||||
from src.misc.dist import get_world_size, is_dist_available_and_initialized
|
||||
from src.core import register
|
||||
|
||||
|
||||
|
||||
@register
|
||||
class SetCriterion(nn.Module):
|
||||
""" This class computes the loss for DETR.
|
||||
The process happens in two steps:
|
||||
1) we compute hungarian assignment between ground truth boxes and the outputs of the model
|
||||
2) we supervise each pair of matched ground-truth / prediction (supervise class and box)
|
||||
"""
|
||||
__share__ = ['num_classes', ]
|
||||
__inject__ = ['matcher', ]
|
||||
|
||||
def __init__(self, matcher, weight_dict, losses, alpha=0.2, gamma=2.0, eos_coef=1e-4, num_classes=80):
|
||||
""" Create the criterion.
|
||||
Parameters:
|
||||
num_classes: number of object categories, omitting the special no-object category
|
||||
matcher: module able to compute a matching between targets and proposals
|
||||
weight_dict: dict containing as key the names of the losses and as values their relative weight.
|
||||
eos_coef: relative classification weight applied to the no-object category
|
||||
losses: list of all the losses to be applied. See get_loss for list of available losses.
|
||||
"""
|
||||
super().__init__()
|
||||
self.num_classes = num_classes
|
||||
self.matcher = matcher
|
||||
self.weight_dict = weight_dict
|
||||
self.losses = losses
|
||||
|
||||
empty_weight = torch.ones(self.num_classes + 1)
|
||||
empty_weight[-1] = eos_coef
|
||||
self.register_buffer('empty_weight', empty_weight)
|
||||
|
||||
self.alpha = alpha
|
||||
self.gamma = gamma
|
||||
|
||||
|
||||
def loss_labels(self, outputs, targets, indices, num_boxes, log=True):
|
||||
"""Classification loss (NLL)
|
||||
targets dicts must contain the key "labels" containing a tensor of dim [nb_target_boxes]
|
||||
"""
|
||||
assert 'pred_logits' in outputs
|
||||
src_logits = outputs['pred_logits']
|
||||
|
||||
idx = self._get_src_permutation_idx(indices)
|
||||
target_classes_o = torch.cat([t["labels"][J] for t, (_, J) in zip(targets, indices)])
|
||||
target_classes = torch.full(src_logits.shape[:2], self.num_classes,
|
||||
dtype=torch.int64, device=src_logits.device)
|
||||
target_classes[idx] = target_classes_o
|
||||
|
||||
loss_ce = F.cross_entropy(src_logits.transpose(1, 2), target_classes, self.empty_weight)
|
||||
losses = {'loss_ce': loss_ce}
|
||||
|
||||
if log:
|
||||
# TODO this should probably be a separate loss, not hacked in this one here
|
||||
losses['class_error'] = 100 - accuracy(src_logits[idx], target_classes_o)[0]
|
||||
return losses
|
||||
|
||||
def loss_labels_bce(self, outputs, targets, indices, num_boxes, log=True):
|
||||
src_logits = outputs['pred_logits']
|
||||
idx = self._get_src_permutation_idx(indices)
|
||||
target_classes_o = torch.cat([t["labels"][J] for t, (_, J) in zip(targets, indices)])
|
||||
target_classes = torch.full(src_logits.shape[:2], self.num_classes,
|
||||
dtype=torch.int64, device=src_logits.device)
|
||||
target_classes[idx] = target_classes_o
|
||||
|
||||
target = F.one_hot(target_classes, num_classes=self.num_classes + 1)[..., :-1]
|
||||
loss = F.binary_cross_entropy_with_logits(src_logits, target * 1., reduction='none')
|
||||
loss = loss.mean(1).sum() * src_logits.shape[1] / num_boxes
|
||||
return {'loss_bce': loss}
|
||||
|
||||
def loss_labels_focal(self, outputs, targets, indices, num_boxes, log=True):
|
||||
assert 'pred_logits' in outputs
|
||||
src_logits = outputs['pred_logits']
|
||||
|
||||
idx = self._get_src_permutation_idx(indices)
|
||||
target_classes_o = torch.cat([t["labels"][J] for t, (_, J) in zip(targets, indices)])
|
||||
target_classes = torch.full(src_logits.shape[:2], self.num_classes,
|
||||
dtype=torch.int64, device=src_logits.device)
|
||||
target_classes[idx] = target_classes_o
|
||||
|
||||
target = F.one_hot(target_classes, num_classes=self.num_classes+1)[..., :-1]
|
||||
# ce_loss = F.binary_cross_entropy_with_logits(src_logits, target * 1., reduction="none")
|
||||
# prob = F.sigmoid(src_logits) # TODO .detach()
|
||||
# p_t = prob * target + (1 - prob) * (1 - target)
|
||||
# alpha_t = self.alpha * target + (1 - self.alpha) * (1 - target)
|
||||
# loss = alpha_t * ce_loss * ((1 - p_t) ** self.gamma)
|
||||
# loss = loss.mean(1).sum() * src_logits.shape[1] / num_boxes
|
||||
loss = torchvision.ops.sigmoid_focal_loss(src_logits, target, self.alpha, self.gamma, reduction='none')
|
||||
loss = loss.mean(1).sum() * src_logits.shape[1] / num_boxes
|
||||
|
||||
return {'loss_focal': loss}
|
||||
|
||||
def loss_labels_vfl(self, outputs, targets, indices, num_boxes, log=True):
|
||||
assert 'pred_boxes' in outputs
|
||||
idx = self._get_src_permutation_idx(indices)
|
||||
|
||||
src_boxes = outputs['pred_boxes'][idx]
|
||||
target_boxes = torch.cat([t['boxes'][i] for t, (_, i) in zip(targets, indices)], dim=0)
|
||||
ious, _ = box_iou(box_cxcywh_to_xyxy(src_boxes), box_cxcywh_to_xyxy(target_boxes))
|
||||
ious = torch.diag(ious).detach()
|
||||
|
||||
src_logits = outputs['pred_logits']
|
||||
target_classes_o = torch.cat([t["labels"][J] for t, (_, J) in zip(targets, indices)])
|
||||
target_classes = torch.full(src_logits.shape[:2], self.num_classes,
|
||||
dtype=torch.int64, device=src_logits.device)
|
||||
target_classes[idx] = target_classes_o
|
||||
target = F.one_hot(target_classes, num_classes=self.num_classes + 1)[..., :-1]
|
||||
|
||||
target_score_o = torch.zeros_like(target_classes, dtype=src_logits.dtype)
|
||||
target_score_o[idx] = ious.to(target_score_o.dtype)
|
||||
target_score = target_score_o.unsqueeze(-1) * target
|
||||
|
||||
pred_score = F.sigmoid(src_logits).detach()
|
||||
weight = self.alpha * pred_score.pow(self.gamma) * (1 - target) + target_score
|
||||
|
||||
loss = F.binary_cross_entropy_with_logits(src_logits, target_score, weight=weight, reduction='none')
|
||||
loss = loss.mean(1).sum() * src_logits.shape[1] / num_boxes
|
||||
return {'loss_vfl': loss}
|
||||
|
||||
@torch.no_grad()
|
||||
def loss_cardinality(self, outputs, targets, indices, num_boxes):
|
||||
""" Compute the cardinality error, ie the absolute error in the number of predicted non-empty boxes
|
||||
This is not really a loss, it is intended for logging purposes only. It doesn't propagate gradients
|
||||
"""
|
||||
pred_logits = outputs['pred_logits']
|
||||
device = pred_logits.device
|
||||
tgt_lengths = torch.as_tensor([len(v["labels"]) for v in targets], device=device)
|
||||
# Count the number of predictions that are NOT "no-object" (which is the last class)
|
||||
card_pred = (pred_logits.argmax(-1) != pred_logits.shape[-1] - 1).sum(1)
|
||||
card_err = F.l1_loss(card_pred.float(), tgt_lengths.float())
|
||||
losses = {'cardinality_error': card_err}
|
||||
return losses
|
||||
|
||||
def loss_boxes(self, outputs, targets, indices, num_boxes):
|
||||
"""Compute the losses related to the bounding boxes, the L1 regression loss and the GIoU loss
|
||||
targets dicts must contain the key "boxes" containing a tensor of dim [nb_target_boxes, 4]
|
||||
The target boxes are expected in format (center_x, center_y, w, h), normalized by the image size.
|
||||
"""
|
||||
assert 'pred_boxes' in outputs
|
||||
idx = self._get_src_permutation_idx(indices)
|
||||
src_boxes = outputs['pred_boxes'][idx]
|
||||
target_boxes = torch.cat([t['boxes'][i] for t, (_, i) in zip(targets, indices)], dim=0)
|
||||
|
||||
losses = {}
|
||||
|
||||
loss_bbox = F.l1_loss(src_boxes, target_boxes, reduction='none')
|
||||
losses['loss_bbox'] = loss_bbox.sum() / num_boxes
|
||||
|
||||
loss_giou = 1 - torch.diag(generalized_box_iou(
|
||||
box_cxcywh_to_xyxy(src_boxes),
|
||||
box_cxcywh_to_xyxy(target_boxes)))
|
||||
losses['loss_giou'] = loss_giou.sum() / num_boxes
|
||||
return losses
|
||||
|
||||
def loss_masks(self, outputs, targets, indices, num_boxes):
|
||||
"""Compute the losses related to the masks: the focal loss and the dice loss.
|
||||
targets dicts must contain the key "masks" containing a tensor of dim [nb_target_boxes, h, w]
|
||||
"""
|
||||
assert "pred_masks" in outputs
|
||||
|
||||
src_idx = self._get_src_permutation_idx(indices)
|
||||
tgt_idx = self._get_tgt_permutation_idx(indices)
|
||||
src_masks = outputs["pred_masks"]
|
||||
src_masks = src_masks[src_idx]
|
||||
masks = [t["masks"] for t in targets]
|
||||
# TODO use valid to mask invalid areas due to padding in loss
|
||||
target_masks, valid = nested_tensor_from_tensor_list(masks).decompose()
|
||||
target_masks = target_masks.to(src_masks)
|
||||
target_masks = target_masks[tgt_idx]
|
||||
|
||||
# upsample predictions to the target size
|
||||
src_masks = interpolate(src_masks[:, None], size=target_masks.shape[-2:],
|
||||
mode="bilinear", align_corners=False)
|
||||
src_masks = src_masks[:, 0].flatten(1)
|
||||
|
||||
target_masks = target_masks.flatten(1)
|
||||
target_masks = target_masks.view(src_masks.shape)
|
||||
losses = {
|
||||
"loss_mask": sigmoid_focal_loss(src_masks, target_masks, num_boxes),
|
||||
"loss_dice": dice_loss(src_masks, target_masks, num_boxes),
|
||||
}
|
||||
return losses
|
||||
|
||||
def _get_src_permutation_idx(self, indices):
|
||||
# permute predictions following indices
|
||||
batch_idx = torch.cat([torch.full_like(src, i) for i, (src, _) in enumerate(indices)])
|
||||
src_idx = torch.cat([src for (src, _) in indices])
|
||||
return batch_idx, src_idx
|
||||
|
||||
def _get_tgt_permutation_idx(self, indices):
|
||||
# permute targets following indices
|
||||
batch_idx = torch.cat([torch.full_like(tgt, i) for i, (_, tgt) in enumerate(indices)])
|
||||
tgt_idx = torch.cat([tgt for (_, tgt) in indices])
|
||||
return batch_idx, tgt_idx
|
||||
|
||||
def get_loss(self, loss, outputs, targets, indices, num_boxes, **kwargs):
|
||||
loss_map = {
|
||||
'labels': self.loss_labels,
|
||||
'cardinality': self.loss_cardinality,
|
||||
'boxes': self.loss_boxes,
|
||||
'masks': self.loss_masks,
|
||||
|
||||
'bce': self.loss_labels_bce,
|
||||
'focal': self.loss_labels_focal,
|
||||
'vfl': self.loss_labels_vfl,
|
||||
}
|
||||
assert loss in loss_map, f'do you really want to compute {loss} loss?'
|
||||
return loss_map[loss](outputs, targets, indices, num_boxes, **kwargs)
|
||||
|
||||
def forward(self, outputs, targets):
|
||||
""" This performs the loss computation.
|
||||
Parameters:
|
||||
outputs: dict of tensors, see the output specification of the model for the format
|
||||
targets: list of dicts, such that len(targets) == batch_size.
|
||||
The expected keys in each dict depends on the losses applied, see each loss' doc
|
||||
"""
|
||||
outputs_without_aux = {k: v for k, v in outputs.items() if 'aux' not in k}
|
||||
|
||||
# Retrieve the matching between the outputs of the last layer and the targets
|
||||
indices = self.matcher(outputs_without_aux, targets)
|
||||
|
||||
# Compute the average number of target boxes accross all nodes, for normalization purposes
|
||||
num_boxes = sum(len(t["labels"]) for t in targets)
|
||||
num_boxes = torch.as_tensor([num_boxes], dtype=torch.float, device=next(iter(outputs.values())).device)
|
||||
if is_dist_available_and_initialized():
|
||||
torch.distributed.all_reduce(num_boxes)
|
||||
num_boxes = torch.clamp(num_boxes / get_world_size(), min=1).item()
|
||||
|
||||
# Compute all the requested losses
|
||||
losses = {}
|
||||
for loss in self.losses:
|
||||
l_dict = self.get_loss(loss, outputs, targets, indices, num_boxes)
|
||||
l_dict = {k: l_dict[k] * self.weight_dict[k] for k in l_dict if k in self.weight_dict}
|
||||
losses.update(l_dict)
|
||||
|
||||
# In case of auxiliary losses, we repeat this process with the output of each intermediate layer.
|
||||
if 'aux_outputs' in outputs:
|
||||
for i, aux_outputs in enumerate(outputs['aux_outputs']):
|
||||
indices = self.matcher(aux_outputs, targets)
|
||||
for loss in self.losses:
|
||||
if loss == 'masks':
|
||||
# Intermediate masks losses are too costly to compute, we ignore them.
|
||||
continue
|
||||
kwargs = {}
|
||||
if loss == 'labels':
|
||||
# Logging is enabled only for the last layer
|
||||
kwargs = {'log': False}
|
||||
|
||||
l_dict = self.get_loss(loss, aux_outputs, targets, indices, num_boxes, **kwargs)
|
||||
l_dict = {k: l_dict[k] * self.weight_dict[k] for k in l_dict if k in self.weight_dict}
|
||||
l_dict = {k + f'_aux_{i}': v for k, v in l_dict.items()}
|
||||
losses.update(l_dict)
|
||||
|
||||
# In case of cdn auxiliary losses. For rtdetr
|
||||
if 'dn_aux_outputs' in outputs:
|
||||
assert 'dn_meta' in outputs, ''
|
||||
indices = self.get_cdn_matched_indices(outputs['dn_meta'], targets)
|
||||
num_boxes = num_boxes * outputs['dn_meta']['dn_num_group']
|
||||
|
||||
for i, aux_outputs in enumerate(outputs['dn_aux_outputs']):
|
||||
# indices = self.matcher(aux_outputs, targets)
|
||||
for loss in self.losses:
|
||||
if loss == 'masks':
|
||||
# Intermediate masks losses are too costly to compute, we ignore them.
|
||||
continue
|
||||
kwargs = {}
|
||||
if loss == 'labels':
|
||||
# Logging is enabled only for the last layer
|
||||
kwargs = {'log': False}
|
||||
|
||||
l_dict = self.get_loss(loss, aux_outputs, targets, indices, num_boxes, **kwargs)
|
||||
l_dict = {k: l_dict[k] * self.weight_dict[k] for k in l_dict if k in self.weight_dict}
|
||||
l_dict = {k + f'_dn_{i}': v for k, v in l_dict.items()}
|
||||
losses.update(l_dict)
|
||||
|
||||
return losses
|
||||
|
||||
@staticmethod
|
||||
def get_cdn_matched_indices(dn_meta, targets):
|
||||
'''get_cdn_matched_indices
|
||||
'''
|
||||
dn_positive_idx, dn_num_group = dn_meta["dn_positive_idx"], dn_meta["dn_num_group"]
|
||||
num_gts = [len(t['labels']) for t in targets]
|
||||
device = targets[0]['labels'].device
|
||||
|
||||
dn_match_indices = []
|
||||
for i, num_gt in enumerate(num_gts):
|
||||
if num_gt > 0:
|
||||
gt_idx = torch.arange(num_gt, dtype=torch.int64, device=device)
|
||||
gt_idx = gt_idx.tile(dn_num_group)
|
||||
assert len(dn_positive_idx[i]) == len(gt_idx)
|
||||
dn_match_indices.append((dn_positive_idx[i], gt_idx))
|
||||
else:
|
||||
dn_match_indices.append((torch.zeros(0, dtype=torch.int64, device=device), \
|
||||
torch.zeros(0, dtype=torch.int64, device=device)))
|
||||
|
||||
return dn_match_indices
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
@torch.no_grad()
|
||||
def accuracy(output, target, topk=(1,)):
|
||||
"""Computes the precision@k for the specified values of k"""
|
||||
if target.numel() == 0:
|
||||
return [torch.zeros([], device=output.device)]
|
||||
maxk = max(topk)
|
||||
batch_size = target.size(0)
|
||||
|
||||
_, pred = output.topk(maxk, 1, True, True)
|
||||
pred = pred.t()
|
||||
correct = pred.eq(target.view(1, -1).expand_as(pred))
|
||||
|
||||
res = []
|
||||
for k in topk:
|
||||
correct_k = correct[:k].view(-1).float().sum(0)
|
||||
res.append(correct_k.mul_(100.0 / batch_size))
|
||||
return res
|
||||
|
||||
|
||||
|
||||
|
||||
574
rtdetr_pytorch/src/zoo/rtdetr/rtdetr_decoder.py
Normal file
574
rtdetr_pytorch/src/zoo/rtdetr/rtdetr_decoder.py
Normal file
@@ -0,0 +1,574 @@
|
||||
"""by lyuwenyu
|
||||
"""
|
||||
|
||||
import math
|
||||
import copy
|
||||
from collections import OrderedDict
|
||||
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
import torch.nn.functional as F
|
||||
import torch.nn.init as init
|
||||
|
||||
from .denoising import get_contrastive_denoising_training_group
|
||||
from .utils import deformable_attention_core_func, get_activation, inverse_sigmoid
|
||||
from .utils import bias_init_with_prob
|
||||
|
||||
|
||||
from src.core import register
|
||||
|
||||
|
||||
__all__ = ['RTDETRTransformer']
|
||||
|
||||
|
||||
|
||||
class MLP(nn.Module):
|
||||
def __init__(self, input_dim, hidden_dim, output_dim, num_layers, act='relu'):
|
||||
super().__init__()
|
||||
self.num_layers = num_layers
|
||||
h = [hidden_dim] * (num_layers - 1)
|
||||
self.layers = nn.ModuleList(nn.Linear(n, k) for n, k in zip([input_dim] + h, h + [output_dim]))
|
||||
self.act = nn.Identity() if act is None else get_activation(act)
|
||||
|
||||
def forward(self, x):
|
||||
for i, layer in enumerate(self.layers):
|
||||
x = self.act(layer(x)) if i < self.num_layers - 1 else layer(x)
|
||||
return x
|
||||
|
||||
|
||||
|
||||
class MSDeformableAttention(nn.Module):
|
||||
def __init__(self, embed_dim=256, num_heads=8, num_levels=4, num_points=4,):
|
||||
"""
|
||||
Multi-Scale Deformable Attention Module
|
||||
"""
|
||||
super(MSDeformableAttention, self).__init__()
|
||||
self.embed_dim = embed_dim
|
||||
self.num_heads = num_heads
|
||||
self.num_levels = num_levels
|
||||
self.num_points = num_points
|
||||
self.total_points = num_heads * num_levels * num_points
|
||||
|
||||
self.head_dim = embed_dim // num_heads
|
||||
assert self.head_dim * num_heads == self.embed_dim, "embed_dim must be divisible by num_heads"
|
||||
|
||||
self.sampling_offsets = nn.Linear(embed_dim, self.total_points * 2,)
|
||||
self.attention_weights = nn.Linear(embed_dim, self.total_points)
|
||||
self.value_proj = nn.Linear(embed_dim, embed_dim)
|
||||
self.output_proj = nn.Linear(embed_dim, embed_dim)
|
||||
|
||||
self.ms_deformable_attn_core = deformable_attention_core_func
|
||||
|
||||
self._reset_parameters()
|
||||
|
||||
|
||||
def _reset_parameters(self):
|
||||
# sampling_offsets
|
||||
init.constant_(self.sampling_offsets.weight, 0)
|
||||
thetas = torch.arange(self.num_heads, dtype=torch.float32) * (2.0 * math.pi / self.num_heads)
|
||||
grid_init = torch.stack([thetas.cos(), thetas.sin()], -1)
|
||||
grid_init = grid_init / grid_init.abs().max(-1, keepdim=True).values
|
||||
grid_init = grid_init.reshape(self.num_heads, 1, 1, 2).tile([1, self.num_levels, self.num_points, 1])
|
||||
scaling = torch.arange(1, self.num_points + 1, dtype=torch.float32).reshape(1, 1, -1, 1)
|
||||
grid_init *= scaling
|
||||
self.sampling_offsets.bias.data[...] = grid_init.flatten()
|
||||
|
||||
# attention_weights
|
||||
init.constant_(self.attention_weights.weight, 0)
|
||||
init.constant_(self.attention_weights.bias, 0)
|
||||
|
||||
# proj
|
||||
init.xavier_uniform_(self.value_proj.weight)
|
||||
init.constant_(self.value_proj.bias, 0)
|
||||
init.xavier_uniform_(self.output_proj.weight)
|
||||
init.constant_(self.output_proj.bias, 0)
|
||||
|
||||
|
||||
def forward(self,
|
||||
query,
|
||||
reference_points,
|
||||
value,
|
||||
value_spatial_shapes,
|
||||
value_mask=None):
|
||||
"""
|
||||
Args:
|
||||
query (Tensor): [bs, query_length, C]
|
||||
reference_points (Tensor): [bs, query_length, n_levels, 2], range in [0, 1], top-left (0,0),
|
||||
bottom-right (1, 1), including padding area
|
||||
value (Tensor): [bs, value_length, C]
|
||||
value_spatial_shapes (List): [n_levels, 2], [(H_0, W_0), (H_1, W_1), ..., (H_{L-1}, W_{L-1})]
|
||||
value_level_start_index (List): [n_levels], [0, H_0*W_0, H_0*W_0+H_1*W_1, ...]
|
||||
value_mask (Tensor): [bs, value_length], True for non-padding elements, False for padding elements
|
||||
|
||||
Returns:
|
||||
output (Tensor): [bs, Length_{query}, C]
|
||||
"""
|
||||
bs, Len_q = query.shape[:2]
|
||||
Len_v = value.shape[1]
|
||||
|
||||
value = self.value_proj(value)
|
||||
if value_mask is not None:
|
||||
value_mask = value_mask.astype(value.dtype).unsqueeze(-1)
|
||||
value *= value_mask
|
||||
value = value.reshape(bs, Len_v, self.num_heads, self.head_dim)
|
||||
|
||||
sampling_offsets = self.sampling_offsets(query).reshape(
|
||||
bs, Len_q, self.num_heads, self.num_levels, self.num_points, 2)
|
||||
attention_weights = self.attention_weights(query).reshape(
|
||||
bs, Len_q, self.num_heads, self.num_levels * self.num_points)
|
||||
attention_weights = F.softmax(attention_weights, dim=-1).reshape(
|
||||
bs, Len_q, self.num_heads, self.num_levels, self.num_points)
|
||||
|
||||
if reference_points.shape[-1] == 2:
|
||||
offset_normalizer = torch.tensor(value_spatial_shapes)
|
||||
offset_normalizer = offset_normalizer.flip([1]).reshape(
|
||||
1, 1, 1, self.num_levels, 1, 2)
|
||||
sampling_locations = reference_points.reshape(
|
||||
bs, Len_q, 1, self.num_levels, 1, 2
|
||||
) + sampling_offsets / offset_normalizer
|
||||
elif reference_points.shape[-1] == 4:
|
||||
sampling_locations = (
|
||||
reference_points[:, :, None, :, None, :2] + sampling_offsets /
|
||||
self.num_points * reference_points[:, :, None, :, None, 2:] * 0.5)
|
||||
else:
|
||||
raise ValueError(
|
||||
"Last dim of reference_points must be 2 or 4, but get {} instead.".
|
||||
format(reference_points.shape[-1]))
|
||||
|
||||
output = self.ms_deformable_attn_core(value, value_spatial_shapes, sampling_locations, attention_weights)
|
||||
|
||||
output = self.output_proj(output)
|
||||
|
||||
return output
|
||||
|
||||
|
||||
class TransformerDecoderLayer(nn.Module):
|
||||
def __init__(self,
|
||||
d_model=256,
|
||||
n_head=8,
|
||||
dim_feedforward=1024,
|
||||
dropout=0.,
|
||||
activation="relu",
|
||||
n_levels=4,
|
||||
n_points=4,):
|
||||
super(TransformerDecoderLayer, self).__init__()
|
||||
|
||||
# self attention
|
||||
self.self_attn = nn.MultiheadAttention(d_model, n_head, dropout=dropout, batch_first=True)
|
||||
self.dropout1 = nn.Dropout(dropout)
|
||||
self.norm1 = nn.LayerNorm(d_model)
|
||||
|
||||
# cross attention
|
||||
self.cross_attn = MSDeformableAttention(d_model, n_head, n_levels, n_points)
|
||||
self.dropout2 = nn.Dropout(dropout)
|
||||
self.norm2 = nn.LayerNorm(d_model)
|
||||
|
||||
# ffn
|
||||
self.linear1 = nn.Linear(d_model, dim_feedforward)
|
||||
self.activation = getattr(F, activation)
|
||||
self.dropout3 = nn.Dropout(dropout)
|
||||
self.linear2 = nn.Linear(dim_feedforward, d_model)
|
||||
self.dropout4 = nn.Dropout(dropout)
|
||||
self.norm3 = nn.LayerNorm(d_model)
|
||||
|
||||
# self._reset_parameters()
|
||||
|
||||
# def _reset_parameters(self):
|
||||
# linear_init_(self.linear1)
|
||||
# linear_init_(self.linear2)
|
||||
# xavier_uniform_(self.linear1.weight)
|
||||
# xavier_uniform_(self.linear2.weight)
|
||||
|
||||
def with_pos_embed(self, tensor, pos):
|
||||
return tensor if pos is None else tensor + pos
|
||||
|
||||
def forward_ffn(self, tgt):
|
||||
return self.linear2(self.dropout3(self.activation(self.linear1(tgt))))
|
||||
|
||||
def forward(self,
|
||||
tgt,
|
||||
reference_points,
|
||||
memory,
|
||||
memory_spatial_shapes,
|
||||
memory_level_start_index,
|
||||
attn_mask=None,
|
||||
memory_mask=None,
|
||||
query_pos_embed=None):
|
||||
# self attention
|
||||
q = k = self.with_pos_embed(tgt, query_pos_embed)
|
||||
|
||||
# if attn_mask is not None:
|
||||
# attn_mask = torch.where(
|
||||
# attn_mask.to(torch.bool),
|
||||
# torch.zeros_like(attn_mask),
|
||||
# torch.full_like(attn_mask, float('-inf'), dtype=tgt.dtype))
|
||||
|
||||
tgt2, _ = self.self_attn(q, k, value=tgt, attn_mask=attn_mask)
|
||||
tgt = tgt + self.dropout1(tgt2)
|
||||
tgt = self.norm1(tgt)
|
||||
|
||||
# cross attention
|
||||
tgt2 = self.cross_attn(\
|
||||
self.with_pos_embed(tgt, query_pos_embed),
|
||||
reference_points,
|
||||
memory,
|
||||
memory_spatial_shapes,
|
||||
memory_mask)
|
||||
tgt = tgt + self.dropout2(tgt2)
|
||||
tgt = self.norm2(tgt)
|
||||
|
||||
# ffn
|
||||
tgt2 = self.forward_ffn(tgt)
|
||||
tgt = tgt + self.dropout4(tgt2)
|
||||
tgt = self.norm3(tgt.clamp(min=-65504, max=65504))
|
||||
|
||||
return tgt
|
||||
|
||||
|
||||
class TransformerDecoder(nn.Module):
|
||||
def __init__(self, hidden_dim, decoder_layer, num_layers, eval_idx=-1):
|
||||
super(TransformerDecoder, self).__init__()
|
||||
self.layers = nn.ModuleList([copy.deepcopy(decoder_layer) for _ in range(num_layers)])
|
||||
self.hidden_dim = hidden_dim
|
||||
self.num_layers = num_layers
|
||||
self.eval_idx = eval_idx if eval_idx >= 0 else num_layers + eval_idx
|
||||
|
||||
def forward(self,
|
||||
tgt,
|
||||
ref_points_unact,
|
||||
memory,
|
||||
memory_spatial_shapes,
|
||||
memory_level_start_index,
|
||||
bbox_head,
|
||||
score_head,
|
||||
query_pos_head,
|
||||
attn_mask=None,
|
||||
memory_mask=None):
|
||||
output = tgt
|
||||
dec_out_bboxes = []
|
||||
dec_out_logits = []
|
||||
ref_points_detach = F.sigmoid(ref_points_unact)
|
||||
|
||||
for i, layer in enumerate(self.layers):
|
||||
ref_points_input = ref_points_detach.unsqueeze(2)
|
||||
query_pos_embed = query_pos_head(ref_points_detach)
|
||||
|
||||
output = layer(output, ref_points_input, memory,
|
||||
memory_spatial_shapes, memory_level_start_index,
|
||||
attn_mask, memory_mask, query_pos_embed)
|
||||
|
||||
inter_ref_bbox = F.sigmoid(bbox_head[i](output) + inverse_sigmoid(ref_points_detach))
|
||||
|
||||
if self.training:
|
||||
dec_out_logits.append(score_head[i](output))
|
||||
if i == 0:
|
||||
dec_out_bboxes.append(inter_ref_bbox)
|
||||
else:
|
||||
dec_out_bboxes.append(F.sigmoid(bbox_head[i](output) + inverse_sigmoid(ref_points)))
|
||||
|
||||
elif i == self.eval_idx:
|
||||
dec_out_logits.append(score_head[i](output))
|
||||
dec_out_bboxes.append(inter_ref_bbox)
|
||||
break
|
||||
|
||||
ref_points = inter_ref_bbox
|
||||
ref_points_detach = inter_ref_bbox.detach(
|
||||
) if self.training else inter_ref_bbox
|
||||
|
||||
return torch.stack(dec_out_bboxes), torch.stack(dec_out_logits)
|
||||
|
||||
|
||||
@register
|
||||
class RTDETRTransformer(nn.Module):
|
||||
__share__ = ['num_classes']
|
||||
def __init__(self,
|
||||
num_classes=80,
|
||||
hidden_dim=256,
|
||||
num_queries=300,
|
||||
position_embed_type='sine',
|
||||
feat_channels=[512, 1024, 2048],
|
||||
feat_strides=[8, 16, 32],
|
||||
num_levels=3,
|
||||
num_decoder_points=4,
|
||||
nhead=8,
|
||||
num_decoder_layers=6,
|
||||
dim_feedforward=1024,
|
||||
dropout=0.,
|
||||
activation="relu",
|
||||
num_denoising=100,
|
||||
label_noise_ratio=0.5,
|
||||
box_noise_scale=1.0,
|
||||
learnt_init_query=False,
|
||||
eval_spatial_size=None,
|
||||
eval_idx=-1,
|
||||
eps=1e-2,
|
||||
aux_loss=True):
|
||||
|
||||
super(RTDETRTransformer, self).__init__()
|
||||
assert position_embed_type in ['sine', 'learned'], \
|
||||
f'ValueError: position_embed_type not supported {position_embed_type}!'
|
||||
assert len(feat_channels) <= num_levels
|
||||
assert len(feat_strides) == len(feat_channels)
|
||||
for _ in range(num_levels - len(feat_strides)):
|
||||
feat_strides.append(feat_strides[-1] * 2)
|
||||
|
||||
self.hidden_dim = hidden_dim
|
||||
self.nhead = nhead
|
||||
self.feat_strides = feat_strides
|
||||
self.num_levels = num_levels
|
||||
self.num_classes = num_classes
|
||||
self.num_queries = num_queries
|
||||
self.eps = eps
|
||||
self.num_decoder_layers = num_decoder_layers
|
||||
self.eval_spatial_size = eval_spatial_size
|
||||
self.aux_loss = aux_loss
|
||||
|
||||
# backbone feature projection
|
||||
self._build_input_proj_layer(feat_channels)
|
||||
|
||||
# Transformer module
|
||||
decoder_layer = TransformerDecoderLayer(hidden_dim, nhead, dim_feedforward, dropout, activation, num_levels, num_decoder_points)
|
||||
self.decoder = TransformerDecoder(hidden_dim, decoder_layer, num_decoder_layers, eval_idx)
|
||||
|
||||
self.num_denoising = num_denoising
|
||||
self.label_noise_ratio = label_noise_ratio
|
||||
self.box_noise_scale = box_noise_scale
|
||||
# denoising part
|
||||
if num_denoising > 0:
|
||||
# self.denoising_class_embed = nn.Embedding(num_classes, hidden_dim, padding_idx=num_classes-1) # TODO for load paddle weights
|
||||
self.denoising_class_embed = nn.Embedding(num_classes+1, hidden_dim, padding_idx=num_classes)
|
||||
|
||||
# decoder embedding
|
||||
self.learnt_init_query = learnt_init_query
|
||||
if learnt_init_query:
|
||||
self.tgt_embed = nn.Embedding(num_queries, hidden_dim)
|
||||
self.query_pos_head = MLP(4, 2 * hidden_dim, hidden_dim, num_layers=2)
|
||||
|
||||
# encoder head
|
||||
self.enc_output = nn.Sequential(
|
||||
nn.Linear(hidden_dim, hidden_dim),
|
||||
nn.LayerNorm(hidden_dim,)
|
||||
)
|
||||
self.enc_score_head = nn.Linear(hidden_dim, num_classes)
|
||||
self.enc_bbox_head = MLP(hidden_dim, hidden_dim, 4, num_layers=3)
|
||||
|
||||
# decoder head
|
||||
self.dec_score_head = nn.ModuleList([
|
||||
nn.Linear(hidden_dim, num_classes)
|
||||
for _ in range(num_decoder_layers)
|
||||
])
|
||||
self.dec_bbox_head = nn.ModuleList([
|
||||
MLP(hidden_dim, hidden_dim, 4, num_layers=3)
|
||||
for _ in range(num_decoder_layers)
|
||||
])
|
||||
|
||||
# init encoder output anchors and valid_mask
|
||||
if self.eval_spatial_size:
|
||||
self.anchors, self.valid_mask = self._generate_anchors()
|
||||
|
||||
self._reset_parameters()
|
||||
|
||||
def _reset_parameters(self):
|
||||
bias = bias_init_with_prob(0.01)
|
||||
|
||||
init.constant_(self.enc_score_head.bias, bias)
|
||||
init.constant_(self.enc_bbox_head.layers[-1].weight, 0)
|
||||
init.constant_(self.enc_bbox_head.layers[-1].bias, 0)
|
||||
|
||||
for cls_, reg_ in zip(self.dec_score_head, self.dec_bbox_head):
|
||||
init.constant_(cls_.bias, bias)
|
||||
init.constant_(reg_.layers[-1].weight, 0)
|
||||
init.constant_(reg_.layers[-1].bias, 0)
|
||||
|
||||
# linear_init_(self.enc_output[0])
|
||||
init.xavier_uniform_(self.enc_output[0].weight)
|
||||
if self.learnt_init_query:
|
||||
init.xavier_uniform_(self.tgt_embed.weight)
|
||||
init.xavier_uniform_(self.query_pos_head.layers[0].weight)
|
||||
init.xavier_uniform_(self.query_pos_head.layers[1].weight)
|
||||
|
||||
|
||||
def _build_input_proj_layer(self, feat_channels):
|
||||
self.input_proj = nn.ModuleList()
|
||||
for in_channels in feat_channels:
|
||||
self.input_proj.append(
|
||||
nn.Sequential(OrderedDict([
|
||||
('conv', nn.Conv2d(in_channels, self.hidden_dim, 1, bias=False)),
|
||||
('norm', nn.BatchNorm2d(self.hidden_dim,))])
|
||||
)
|
||||
)
|
||||
|
||||
in_channels = feat_channels[-1]
|
||||
|
||||
for _ in range(self.num_levels - len(feat_channels)):
|
||||
self.input_proj.append(
|
||||
nn.Sequential(OrderedDict([
|
||||
('conv', nn.Conv2d(in_channels, self.hidden_dim, 3, 2, padding=1, bias=False)),
|
||||
('norm', nn.BatchNorm2d(self.hidden_dim))])
|
||||
)
|
||||
)
|
||||
in_channels = self.hidden_dim
|
||||
|
||||
def _get_encoder_input(self, feats):
|
||||
# get projection features
|
||||
proj_feats = [self.input_proj[i](feat) for i, feat in enumerate(feats)]
|
||||
if self.num_levels > len(proj_feats):
|
||||
len_srcs = len(proj_feats)
|
||||
for i in range(len_srcs, self.num_levels):
|
||||
if i == len_srcs:
|
||||
proj_feats.append(self.input_proj[i](feats[-1]))
|
||||
else:
|
||||
proj_feats.append(self.input_proj[i](proj_feats[-1]))
|
||||
|
||||
# get encoder inputs
|
||||
feat_flatten = []
|
||||
spatial_shapes = []
|
||||
level_start_index = [0, ]
|
||||
for i, feat in enumerate(proj_feats):
|
||||
_, _, h, w = feat.shape
|
||||
# [b, c, h, w] -> [b, h*w, c]
|
||||
feat_flatten.append(feat.flatten(2).permute(0, 2, 1))
|
||||
# [num_levels, 2]
|
||||
spatial_shapes.append([h, w])
|
||||
# [l], start index of each level
|
||||
level_start_index.append(h * w + level_start_index[-1])
|
||||
|
||||
# [b, l, c]
|
||||
feat_flatten = torch.concat(feat_flatten, 1)
|
||||
level_start_index.pop()
|
||||
return (feat_flatten, spatial_shapes, level_start_index)
|
||||
|
||||
def _generate_anchors(self,
|
||||
spatial_shapes=None,
|
||||
grid_size=0.05,
|
||||
dtype=torch.float32,
|
||||
device='cpu'):
|
||||
if spatial_shapes is None:
|
||||
spatial_shapes = [[int(self.eval_spatial_size[0] / s), int(self.eval_spatial_size[1] / s)]
|
||||
for s in self.feat_strides
|
||||
]
|
||||
anchors = []
|
||||
for lvl, (h, w) in enumerate(spatial_shapes):
|
||||
grid_y, grid_x = torch.meshgrid(\
|
||||
torch.arange(end=h, dtype=dtype), \
|
||||
torch.arange(end=w, dtype=dtype), indexing='ij')
|
||||
grid_xy = torch.stack([grid_x, grid_y], -1)
|
||||
valid_WH = torch.tensor([w, h]).to(dtype)
|
||||
grid_xy = (grid_xy.unsqueeze(0) + 0.5) / valid_WH
|
||||
wh = torch.ones_like(grid_xy) * grid_size * (2.0 ** lvl)
|
||||
anchors.append(torch.concat([grid_xy, wh], -1).reshape(-1, h * w, 4))
|
||||
|
||||
anchors = torch.concat(anchors, 1).to(device)
|
||||
valid_mask = ((anchors > self.eps) * (anchors < 1 - self.eps)).all(-1, keepdim=True)
|
||||
anchors = torch.log(anchors / (1 - anchors))
|
||||
# anchors = torch.where(valid_mask, anchors, float('inf'))
|
||||
# anchors[valid_mask] = torch.inf # valid_mask [1, 8400, 1]
|
||||
anchors = torch.where(valid_mask, anchors, torch.inf)
|
||||
|
||||
return anchors, valid_mask
|
||||
|
||||
|
||||
def _get_decoder_input(self,
|
||||
memory,
|
||||
spatial_shapes,
|
||||
denoising_class=None,
|
||||
denoising_bbox_unact=None):
|
||||
bs, _, _ = memory.shape
|
||||
# prepare input for decoder
|
||||
if self.training or self.eval_spatial_size is None:
|
||||
anchors, valid_mask = self._generate_anchors(spatial_shapes, device=memory.device)
|
||||
else:
|
||||
anchors, valid_mask = self.anchors.to(memory.device), self.valid_mask.to(memory.device)
|
||||
|
||||
# memory = torch.where(valid_mask, memory, 0)
|
||||
memory = valid_mask.to(memory.dtype) * memory # TODO fix type error for onnx export
|
||||
|
||||
output_memory = self.enc_output(memory)
|
||||
|
||||
enc_outputs_class = self.enc_score_head(output_memory)
|
||||
enc_outputs_coord_unact = self.enc_bbox_head(output_memory) + anchors
|
||||
|
||||
_, topk_ind = torch.topk(enc_outputs_class.max(-1).values, self.num_queries, dim=1)
|
||||
|
||||
reference_points_unact = enc_outputs_coord_unact.gather(dim=1, \
|
||||
index=topk_ind.unsqueeze(-1).repeat(1, 1, enc_outputs_coord_unact.shape[-1]))
|
||||
|
||||
enc_topk_bboxes = F.sigmoid(reference_points_unact)
|
||||
if denoising_bbox_unact is not None:
|
||||
reference_points_unact = torch.concat(
|
||||
[denoising_bbox_unact, reference_points_unact], 1)
|
||||
|
||||
enc_topk_logits = enc_outputs_class.gather(dim=1, \
|
||||
index=topk_ind.unsqueeze(-1).repeat(1, 1, enc_outputs_class.shape[-1]))
|
||||
|
||||
# extract region features
|
||||
if self.learnt_init_query:
|
||||
target = self.tgt_embed.weight.unsqueeze(0).tile([bs, 1, 1])
|
||||
else:
|
||||
target = output_memory.gather(dim=1, \
|
||||
index=topk_ind.unsqueeze(-1).repeat(1, 1, output_memory.shape[-1]))
|
||||
target = target.detach()
|
||||
|
||||
if denoising_class is not None:
|
||||
target = torch.concat([denoising_class, target], 1)
|
||||
|
||||
return target, reference_points_unact.detach(), enc_topk_bboxes, enc_topk_logits
|
||||
|
||||
|
||||
def forward(self, feats, targets=None):
|
||||
|
||||
# input projection and embedding
|
||||
(memory, spatial_shapes, level_start_index) = self._get_encoder_input(feats)
|
||||
|
||||
# prepare denoising training
|
||||
if self.training and self.num_denoising > 0:
|
||||
denoising_class, denoising_bbox_unact, attn_mask, dn_meta = \
|
||||
get_contrastive_denoising_training_group(targets, \
|
||||
self.num_classes,
|
||||
self.num_queries,
|
||||
self.denoising_class_embed,
|
||||
num_denoising=self.num_denoising,
|
||||
label_noise_ratio=self.label_noise_ratio,
|
||||
box_noise_scale=self.box_noise_scale, )
|
||||
else:
|
||||
denoising_class, denoising_bbox_unact, attn_mask, dn_meta = None, None, None, None
|
||||
|
||||
target, init_ref_points_unact, enc_topk_bboxes, enc_topk_logits = \
|
||||
self._get_decoder_input(memory, spatial_shapes, denoising_class, denoising_bbox_unact)
|
||||
|
||||
# decoder
|
||||
out_bboxes, out_logits = self.decoder(
|
||||
target,
|
||||
init_ref_points_unact,
|
||||
memory,
|
||||
spatial_shapes,
|
||||
level_start_index,
|
||||
self.dec_bbox_head,
|
||||
self.dec_score_head,
|
||||
self.query_pos_head,
|
||||
attn_mask=attn_mask)
|
||||
|
||||
if self.training and dn_meta is not None:
|
||||
dn_out_bboxes, out_bboxes = torch.split(out_bboxes, dn_meta['dn_num_split'], dim=2)
|
||||
dn_out_logits, out_logits = torch.split(out_logits, dn_meta['dn_num_split'], dim=2)
|
||||
|
||||
out = {'pred_logits': out_logits[-1], 'pred_boxes': out_bboxes[-1]}
|
||||
|
||||
if self.training and self.aux_loss:
|
||||
out['aux_outputs'] = self._set_aux_loss(out_logits[:-1], out_bboxes[:-1])
|
||||
out['aux_outputs'].extend(self._set_aux_loss([enc_topk_logits], [enc_topk_bboxes]))
|
||||
|
||||
if self.training and dn_meta is not None:
|
||||
out['dn_aux_outputs'] = self._set_aux_loss(dn_out_logits, dn_out_bboxes)
|
||||
out['dn_meta'] = dn_meta
|
||||
|
||||
return out
|
||||
|
||||
|
||||
@torch.jit.unused
|
||||
def _set_aux_loss(self, outputs_class, outputs_coord):
|
||||
# this is a workaround to make torchscript happy, as torchscript
|
||||
# doesn't support dictionary with non-homogeneous values, such
|
||||
# as a dict having both a Tensor and a list.
|
||||
return [{'pred_logits': a, 'pred_boxes': b}
|
||||
for a, b in zip(outputs_class, outputs_coord)]
|
||||
81
rtdetr_pytorch/src/zoo/rtdetr/rtdetr_postprocessor.py
Normal file
81
rtdetr_pytorch/src/zoo/rtdetr/rtdetr_postprocessor.py
Normal file
@@ -0,0 +1,81 @@
|
||||
"""by lyuwenyu
|
||||
"""
|
||||
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
import torch.nn.functional as F
|
||||
|
||||
import torchvision
|
||||
|
||||
from src.core import register
|
||||
|
||||
|
||||
__all__ = ['RTDETRPostProcessor']
|
||||
|
||||
|
||||
@register
|
||||
class RTDETRPostProcessor(nn.Module):
|
||||
__share__ = ['num_classes', 'use_focal_loss', 'num_top_queries', 'remap_mscoco_category']
|
||||
|
||||
def __init__(self, num_classes=80, use_focal_loss=True, num_top_queries=300, remap_mscoco_category=False) -> None:
|
||||
super().__init__()
|
||||
self.use_focal_loss = use_focal_loss
|
||||
self.num_top_queries = num_top_queries
|
||||
self.num_classes = num_classes
|
||||
self.remap_mscoco_category = remap_mscoco_category
|
||||
self.deploy_mode = False
|
||||
|
||||
def extra_repr(self) -> str:
|
||||
return f'use_focal_loss={self.use_focal_loss}, num_classes={self.num_classes}, num_top_queries={self.num_top_queries}'
|
||||
|
||||
# def forward(self, outputs, orig_target_sizes):
|
||||
def forward(self, outputs, orig_target_sizes):
|
||||
|
||||
logits, boxes = outputs['pred_logits'], outputs['pred_boxes']
|
||||
# orig_target_sizes = torch.stack([t["orig_size"] for t in targets], dim=0)
|
||||
|
||||
bbox_pred = torchvision.ops.box_convert(boxes, in_fmt='cxcywh', out_fmt='xyxy')
|
||||
bbox_pred *= orig_target_sizes.repeat(1, 2).unsqueeze(1)
|
||||
|
||||
if self.use_focal_loss:
|
||||
scores = F.sigmoid(logits)
|
||||
scores, index = torch.topk(scores.flatten(1), self.num_top_queries, axis=-1)
|
||||
labels = index % self.num_classes
|
||||
index = index // self.num_classes
|
||||
boxes = bbox_pred.gather(dim=1, index=index.unsqueeze(-1).repeat(1, 1, bbox_pred.shape[-1]))
|
||||
|
||||
else:
|
||||
scores = F.softmax(logits)[:, :, :-1]
|
||||
scores, labels = scores.max(dim=-1)
|
||||
boxes = bbox_pred
|
||||
if scores.shape[1] > self.num_top_queries:
|
||||
scores, index = torch.topk(scores, self.num_top_queries, dim=-1)
|
||||
labels = torch.gather(labels, dim=1, index=index)
|
||||
boxes = torch.gather(boxes, dim=1, index=index.unsqueeze(-1).tile(1, 1, boxes.shape[-1]))
|
||||
|
||||
# TODO for onnx export
|
||||
if self.deploy_mode:
|
||||
return labels, boxes, scores
|
||||
|
||||
# TODO
|
||||
if self.remap_mscoco_category:
|
||||
from ...data.coco import mscoco_label2category
|
||||
labels = torch.tensor([mscoco_label2category[int(x.item())] for x in labels.flatten()])\
|
||||
.to(boxes.device).reshape(labels.shape)
|
||||
|
||||
results = []
|
||||
for lab, box, sco in zip(labels, boxes, scores):
|
||||
result = dict(labels=lab, boxes=box, scores=sco)
|
||||
results.append(result)
|
||||
|
||||
return results
|
||||
|
||||
|
||||
def deploy(self, ):
|
||||
self.eval()
|
||||
self.deploy_mode = True
|
||||
return self
|
||||
|
||||
@property
|
||||
def iou_types(self, ):
|
||||
return ('bbox', )
|
||||
101
rtdetr_pytorch/src/zoo/rtdetr/utils.py
Normal file
101
rtdetr_pytorch/src/zoo/rtdetr/utils.py
Normal file
@@ -0,0 +1,101 @@
|
||||
"""by lyuwenyu
|
||||
"""
|
||||
|
||||
import math
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
import torch.nn.functional as F
|
||||
|
||||
|
||||
def inverse_sigmoid(x: torch.Tensor, eps: float=1e-5) -> torch.Tensor:
|
||||
x = x.clip(min=0., max=1.)
|
||||
return torch.log(x.clip(min=eps) / (1 - x).clip(min=eps))
|
||||
|
||||
|
||||
def deformable_attention_core_func(value, value_spatial_shapes, sampling_locations, attention_weights):
|
||||
"""
|
||||
Args:
|
||||
value (Tensor): [bs, value_length, n_head, c]
|
||||
value_spatial_shapes (Tensor|List): [n_levels, 2]
|
||||
value_level_start_index (Tensor|List): [n_levels]
|
||||
sampling_locations (Tensor): [bs, query_length, n_head, n_levels, n_points, 2]
|
||||
attention_weights (Tensor): [bs, query_length, n_head, n_levels, n_points]
|
||||
|
||||
Returns:
|
||||
output (Tensor): [bs, Length_{query}, C]
|
||||
"""
|
||||
bs, _, n_head, c = value.shape
|
||||
_, Len_q, _, n_levels, n_points, _ = sampling_locations.shape
|
||||
|
||||
split_shape = [h * w for h, w in value_spatial_shapes]
|
||||
value_list = value.split(split_shape, dim=1)
|
||||
sampling_grids = 2 * sampling_locations - 1
|
||||
sampling_value_list = []
|
||||
for level, (h, w) in enumerate(value_spatial_shapes):
|
||||
# N_, H_*W_, M_, D_ -> N_, H_*W_, M_*D_ -> N_, M_*D_, H_*W_ -> N_*M_, D_, H_, W_
|
||||
value_l_ = value_list[level].flatten(2).permute(
|
||||
0, 2, 1).reshape(bs * n_head, c, h, w)
|
||||
# N_, Lq_, M_, P_, 2 -> N_, M_, Lq_, P_, 2 -> N_*M_, Lq_, P_, 2
|
||||
sampling_grid_l_ = sampling_grids[:, :, :, level].permute(
|
||||
0, 2, 1, 3, 4).flatten(0, 1)
|
||||
# N_*M_, D_, Lq_, P_
|
||||
sampling_value_l_ = F.grid_sample(
|
||||
value_l_,
|
||||
sampling_grid_l_,
|
||||
mode='bilinear',
|
||||
padding_mode='zeros',
|
||||
align_corners=False)
|
||||
sampling_value_list.append(sampling_value_l_)
|
||||
# (N_, Lq_, M_, L_, P_) -> (N_, M_, Lq_, L_, P_) -> (N_*M_, 1, Lq_, L_*P_)
|
||||
attention_weights = attention_weights.permute(0, 2, 1, 3, 4).reshape(
|
||||
bs * n_head, 1, Len_q, n_levels * n_points)
|
||||
output = (torch.stack(
|
||||
sampling_value_list, dim=-2).flatten(-2) *
|
||||
attention_weights).sum(-1).reshape(bs, n_head * c, Len_q)
|
||||
|
||||
return output.permute(0, 2, 1)
|
||||
|
||||
|
||||
import math
|
||||
def bias_init_with_prob(prior_prob=0.01):
|
||||
"""initialize conv/fc bias value according to a given probability value."""
|
||||
bias_init = float(-math.log((1 - prior_prob) / prior_prob))
|
||||
return bias_init
|
||||
|
||||
|
||||
|
||||
def get_activation(act: str, inpace: bool=True):
|
||||
'''get activation
|
||||
'''
|
||||
act = act.lower()
|
||||
|
||||
if act == 'silu':
|
||||
m = nn.SiLU()
|
||||
|
||||
elif act == 'relu':
|
||||
m = nn.ReLU()
|
||||
|
||||
elif act == 'leaky_relu':
|
||||
m = nn.LeakyReLU()
|
||||
|
||||
elif act == 'silu':
|
||||
m = nn.SiLU()
|
||||
|
||||
elif act == 'gelu':
|
||||
m = nn.GELU()
|
||||
|
||||
elif act is None:
|
||||
m = nn.Identity()
|
||||
|
||||
elif isinstance(act, nn.Module):
|
||||
m = act
|
||||
|
||||
else:
|
||||
raise RuntimeError('')
|
||||
|
||||
if hasattr(m, 'inplace'):
|
||||
m.inplace = inpace
|
||||
|
||||
return m
|
||||
|
||||
|
||||
24
rtdetr_pytorch/tools/README.md
Normal file
24
rtdetr_pytorch/tools/README.md
Normal file
@@ -0,0 +1,24 @@
|
||||
|
||||
|
||||
Train/test script examples
|
||||
- `CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master-port=8989 tools/train.py -c path/to/config &> train.log 2>&1 &`
|
||||
- `-r path/to/checkpoint`
|
||||
- `--amp`
|
||||
- `--test-only`
|
||||
|
||||
|
||||
Tuning script examples
|
||||
- `torchrun --master_port=8844 --nproc_per_node=4 tools/train.py -c configs/rtdetr/rtdetr_r18vd_6x_coco.yml -t https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetr_r18vd_5x_coco_objects365_from_paddle.pth`
|
||||
|
||||
|
||||
Export script examples
|
||||
- `python tools/export_onnx.py -c path/to/config -r path/to/checkpoint --check`
|
||||
|
||||
|
||||
GPU do not release memory
|
||||
- `ps aux | grep "tools/train.py" | awk '{print $2}' | xargs kill -9`
|
||||
|
||||
|
||||
Save all logs
|
||||
- Appending `&> train.log 2>&1 &` or `&> train.log 2>&1`
|
||||
|
||||
147
rtdetr_pytorch/tools/export_onnx.py
Normal file
147
rtdetr_pytorch/tools/export_onnx.py
Normal file
@@ -0,0 +1,147 @@
|
||||
"""by lyuwenyu
|
||||
"""
|
||||
|
||||
import os
|
||||
import sys
|
||||
sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath(__file__)), '..'))
|
||||
|
||||
import argparse
|
||||
import numpy as np
|
||||
|
||||
from src.core import YAMLConfig
|
||||
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
|
||||
|
||||
def main(args, ):
|
||||
"""main
|
||||
"""
|
||||
cfg = YAMLConfig(args.config, resume=args.resume)
|
||||
|
||||
if args.resume:
|
||||
checkpoint = torch.load(args.resume, map_location='cpu')
|
||||
if 'ema' in checkpoint:
|
||||
state = checkpoint['ema']['module']
|
||||
else:
|
||||
state = checkpoint['model']
|
||||
else:
|
||||
raise AttributeError('only support resume to load model.state_dict by now.')
|
||||
|
||||
# NOTE load train mode state -> convert to deploy mode
|
||||
cfg.model.load_state_dict(state)
|
||||
|
||||
class Model(nn.Module):
|
||||
def __init__(self, ) -> None:
|
||||
super().__init__()
|
||||
self.model = cfg.model.deploy()
|
||||
self.postprocessor = cfg.postprocessor.deploy()
|
||||
print(self.postprocessor.deploy_mode)
|
||||
|
||||
def forward(self, images, orig_target_sizes):
|
||||
outputs = self.model(images)
|
||||
return self.postprocessor(outputs, orig_target_sizes)
|
||||
|
||||
|
||||
model = Model()
|
||||
|
||||
dynamic_axes = {
|
||||
'images': {0: 'N', },
|
||||
'orig_target_sizes': {0: 'N'}
|
||||
}
|
||||
|
||||
data = torch.rand(1, 3, 640, 640)
|
||||
size = torch.tensor([[640, 640]])
|
||||
|
||||
torch.onnx.export(
|
||||
model,
|
||||
(data, size),
|
||||
args.file_name,
|
||||
input_names=['images', 'orig_target_sizes'],
|
||||
output_names=['labels', 'boxes', 'scores'],
|
||||
dynamic_axes=dynamic_axes,
|
||||
opset_version=16,
|
||||
verbose=False
|
||||
)
|
||||
|
||||
|
||||
if args.check:
|
||||
import onnx
|
||||
onnx_model = onnx.load(args.file_name)
|
||||
onnx.checker.check_model(onnx_model)
|
||||
print('Check export onnx model done...')
|
||||
|
||||
|
||||
if args.simplify:
|
||||
import onnxsim
|
||||
dynamic = True
|
||||
input_shapes = {'images': data.shape, 'orig_target_sizes': size.shape} if dynamic else None
|
||||
onnx_model_simplify, check = onnxsim.simplify(args.file_name, input_shapes=input_shapes, dynamic_input_shape=dynamic)
|
||||
onnx.save(onnx_model_simplify, args.file_name)
|
||||
print(f'Simplify onnx model {check}...')
|
||||
|
||||
|
||||
# import onnxruntime as ort
|
||||
# from PIL import Image, ImageDraw, ImageFont
|
||||
# from torchvision.transforms import ToTensor
|
||||
# from src.data.coco.coco_dataset import mscoco_category2name, mscoco_category2label, mscoco_label2category
|
||||
|
||||
# # print(onnx.helper.printable_graph(mm.graph))
|
||||
|
||||
# # Load the original image without resizing
|
||||
# original_im = Image.open('./hongkong.jpg').convert('RGB')
|
||||
# original_size = original_im.size
|
||||
|
||||
# # Resize the image for model input
|
||||
# im = original_im.resize((640, 640))
|
||||
# im_data = ToTensor()(im)[None]
|
||||
# print(im_data.shape)
|
||||
|
||||
# sess = ort.InferenceSession(args.file_name)
|
||||
# output = sess.run(
|
||||
# # output_names=['labels', 'boxes', 'scores'],
|
||||
# output_names=None,
|
||||
# input_feed={'images': im_data.data.numpy(), "orig_target_sizes": size.data.numpy()}
|
||||
# )
|
||||
|
||||
# # print(type(output))
|
||||
# # print([out.shape for out in output])
|
||||
|
||||
# labels, boxes, scores = output
|
||||
|
||||
# draw = ImageDraw.Draw(original_im) # Draw on the original image
|
||||
# thrh = 0.6
|
||||
|
||||
# for i in range(im_data.shape[0]):
|
||||
|
||||
# scr = scores[i]
|
||||
# lab = labels[i][scr > thrh]
|
||||
# box = boxes[i][scr > thrh]
|
||||
|
||||
# print(i, sum(scr > thrh))
|
||||
|
||||
# for b, l in zip(box, lab):
|
||||
# # Scale the bounding boxes back to the original image size
|
||||
# b = [coord * original_size[j % 2] / 640 for j, coord in enumerate(b)]
|
||||
# # Get the category name from the label
|
||||
# category_name = mscoco_category2name[mscoco_label2category[l]]
|
||||
# draw.rectangle(list(b), outline='red', width=2)
|
||||
# font = ImageFont.truetype("Arial.ttf", 15)
|
||||
# draw.text((b[0], b[1]), text=category_name, fill='yellow', font=font)
|
||||
|
||||
# # Save the original image with bounding boxes
|
||||
# original_im.save('test.jpg')
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.add_argument('--config', '-c', type=str, )
|
||||
parser.add_argument('--resume', '-r', type=str, )
|
||||
parser.add_argument('--file-name', '-f', type=str, default='model.onnx')
|
||||
parser.add_argument('--check', action='store_true', default=False,)
|
||||
parser.add_argument('--simplify', action='store_true', default=False,)
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
main(args)
|
||||
203
rtdetr_pytorch/tools/infer.py
Normal file
203
rtdetr_pytorch/tools/infer.py
Normal file
@@ -0,0 +1,203 @@
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
import torchvision.transforms as T
|
||||
from torch.cuda.amp import autocast
|
||||
import numpy as np
|
||||
from PIL import Image, ImageDraw, ImageFont
|
||||
import os
|
||||
import sys
|
||||
sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath(__file__)), '..'))
|
||||
import argparse
|
||||
import src.misc.dist as dist
|
||||
from src.core import YAMLConfig
|
||||
from src.solver import TASKS
|
||||
import numpy as np
|
||||
|
||||
def postprocess(labels, boxes, scores, iou_threshold=0.55):
|
||||
def calculate_iou(box1, box2):
|
||||
x1, y1, x2, y2 = box1
|
||||
x3, y3, x4, y4 = box2
|
||||
xi1 = max(x1, x3)
|
||||
yi1 = max(y1, y3)
|
||||
xi2 = min(x2, x4)
|
||||
yi2 = min(y2, y4)
|
||||
inter_width = max(0, xi2 - xi1)
|
||||
inter_height = max(0, yi2 - yi1)
|
||||
inter_area = inter_width * inter_height
|
||||
box1_area = (x2 - x1) * (y2 - y1)
|
||||
box2_area = (x4 - x3) * (y4 - y3)
|
||||
union_area = box1_area + box2_area - inter_area
|
||||
iou = inter_area / union_area if union_area != 0 else 0
|
||||
return iou
|
||||
merged_labels = []
|
||||
merged_boxes = []
|
||||
merged_scores = []
|
||||
used_indices = set()
|
||||
for i in range(len(boxes)):
|
||||
if i in used_indices:
|
||||
continue
|
||||
current_box = boxes[i]
|
||||
current_label = labels[i]
|
||||
current_score = scores[i]
|
||||
boxes_to_merge = [current_box]
|
||||
scores_to_merge = [current_score]
|
||||
used_indices.add(i)
|
||||
for j in range(i + 1, len(boxes)):
|
||||
if j in used_indices:
|
||||
continue
|
||||
if labels[j] != current_label:
|
||||
continue
|
||||
other_box = boxes[j]
|
||||
iou = calculate_iou(current_box, other_box)
|
||||
if iou >= iou_threshold:
|
||||
boxes_to_merge.append(other_box.tolist())
|
||||
scores_to_merge.append(scores[j])
|
||||
used_indices.add(j)
|
||||
xs = np.concatenate([[box[0], box[2]] for box in boxes_to_merge])
|
||||
ys = np.concatenate([[box[1], box[3]] for box in boxes_to_merge])
|
||||
merged_box = [np.min(xs), np.min(ys), np.max(xs), np.max(ys)]
|
||||
merged_score = max(scores_to_merge)
|
||||
merged_boxes.append(merged_box)
|
||||
merged_labels.append(current_label)
|
||||
merged_scores.append(merged_score)
|
||||
return [np.array(merged_labels)], [np.array(merged_boxes)], [np.array(merged_scores)]
|
||||
def slice_image(image, slice_height, slice_width, overlap_ratio):
|
||||
img_width, img_height = image.size
|
||||
|
||||
slices = []
|
||||
coordinates = []
|
||||
step_x = int(slice_width * (1 - overlap_ratio))
|
||||
step_y = int(slice_height * (1 - overlap_ratio))
|
||||
|
||||
for y in range(0, img_height, step_y):
|
||||
for x in range(0, img_width, step_x):
|
||||
box = (x, y, min(x + slice_width, img_width), min(y + slice_height, img_height))
|
||||
slice_img = image.crop(box)
|
||||
slices.append(slice_img)
|
||||
coordinates.append((x, y))
|
||||
return slices, coordinates
|
||||
def merge_predictions(predictions, slice_coordinates, orig_image_size, slice_width, slice_height, threshold=0.30):
|
||||
merged_labels = []
|
||||
merged_boxes = []
|
||||
merged_scores = []
|
||||
orig_height, orig_width = orig_image_size
|
||||
for i, (label, boxes, scores) in enumerate(predictions):
|
||||
x_shift, y_shift = slice_coordinates[i]
|
||||
scores = np.array(scores).reshape(-1)
|
||||
valid_indices = scores > threshold
|
||||
valid_labels = np.array(label).reshape(-1)[valid_indices]
|
||||
valid_boxes = np.array(boxes).reshape(-1, 4)[valid_indices]
|
||||
valid_scores = scores[valid_indices]
|
||||
for j, box in enumerate(valid_boxes):
|
||||
box[0] = np.clip(box[0] + x_shift, 0, orig_width)
|
||||
box[1] = np.clip(box[1] + y_shift, 0, orig_height)
|
||||
box[2] = np.clip(box[2] + x_shift, 0, orig_width)
|
||||
box[3] = np.clip(box[3] + y_shift, 0, orig_height)
|
||||
valid_boxes[j] = box
|
||||
merged_labels.extend(valid_labels)
|
||||
merged_boxes.extend(valid_boxes)
|
||||
merged_scores.extend(valid_scores)
|
||||
return np.array(merged_labels), np.array(merged_boxes), np.array(merged_scores)
|
||||
def draw(images, labels, boxes, scores, thrh = 0.6, path = ""):
|
||||
for i, im in enumerate(images):
|
||||
draw = ImageDraw.Draw(im)
|
||||
scr = scores[i]
|
||||
lab = labels[i][scr > thrh]
|
||||
box = boxes[i][scr > thrh]
|
||||
scrs = scores[i][scr > thrh]
|
||||
for j,b in enumerate(box):
|
||||
draw.rectangle(list(b), outline='red',)
|
||||
draw.text((b[0], b[1]), text=f"label: {lab[j].item()} {round(scrs[j].item(),2)}", font=ImageFont.load_default(), fill='blue')
|
||||
if path == "":
|
||||
im.save(f'results_{i}.jpg')
|
||||
else:
|
||||
im.save(path)
|
||||
|
||||
def main(args, ):
|
||||
"""main
|
||||
"""
|
||||
cfg = YAMLConfig(args.config, resume=args.resume)
|
||||
if args.resume:
|
||||
checkpoint = torch.load(args.resume, map_location='cpu')
|
||||
if 'ema' in checkpoint:
|
||||
state = checkpoint['ema']['module']
|
||||
else:
|
||||
state = checkpoint['model']
|
||||
else:
|
||||
raise AttributeError('Only support resume to load model.state_dict by now.')
|
||||
# NOTE load train mode state -> convert to deploy mode
|
||||
cfg.model.load_state_dict(state)
|
||||
class Model(nn.Module):
|
||||
def __init__(self, ) -> None:
|
||||
super().__init__()
|
||||
self.model = cfg.model.deploy()
|
||||
self.postprocessor = cfg.postprocessor.deploy()
|
||||
|
||||
def forward(self, images, orig_target_sizes):
|
||||
outputs = self.model(images)
|
||||
outputs = self.postprocessor(outputs, orig_target_sizes)
|
||||
return outputs
|
||||
|
||||
model = Model().to(args.device)
|
||||
im_pil = Image.open(args.im_file).convert('RGB')
|
||||
w, h = im_pil.size
|
||||
orig_size = torch.tensor([w, h])[None].to(args.device)
|
||||
|
||||
transforms = T.Compose([
|
||||
T.Resize((640, 640)),
|
||||
T.ToTensor(),
|
||||
])
|
||||
im_data = transforms(im_pil)[None].to(args.device)
|
||||
if args.sliced:
|
||||
num_boxes = args.numberofboxes
|
||||
|
||||
aspect_ratio = w / h
|
||||
num_cols = int(np.sqrt(num_boxes * aspect_ratio))
|
||||
num_rows = int(num_boxes / num_cols)
|
||||
slice_height = h // num_rows
|
||||
slice_width = w // num_cols
|
||||
overlap_ratio = 0.2
|
||||
slices, coordinates = slice_image(im_pil, slice_height, slice_width, overlap_ratio)
|
||||
predictions = []
|
||||
for i, slice_img in enumerate(slices):
|
||||
slice_tensor = transforms(slice_img)[None].to(args.device)
|
||||
with autocast(): # Use AMP for each slice
|
||||
output = model(slice_tensor, torch.tensor([[slice_img.size[0], slice_img.size[1]]]).to(args.device))
|
||||
torch.cuda.empty_cache()
|
||||
labels, boxes, scores = output
|
||||
|
||||
labels = labels.cpu().detach().numpy()
|
||||
boxes = boxes.cpu().detach().numpy()
|
||||
scores = scores.cpu().detach().numpy()
|
||||
predictions.append((labels, boxes, scores))
|
||||
|
||||
merged_labels, merged_boxes, merged_scores = merge_predictions(predictions, coordinates, (h, w), slice_width, slice_height)
|
||||
labels, boxes, scores = postprocess(merged_labels, merged_boxes, merged_scores)
|
||||
else:
|
||||
output = model(im_data, orig_size)
|
||||
labels, boxes, scores = output
|
||||
|
||||
draw([im_pil], labels, boxes, scores, 0.6)
|
||||
|
||||
if __name__ == '__main__':
|
||||
import argparse
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.add_argument('-c', '--config', type=str, )
|
||||
parser.add_argument('-r', '--resume', type=str, )
|
||||
parser.add_argument('-f', '--im-file', type=str, )
|
||||
parser.add_argument('-s', '--sliced', type=bool, default=False)
|
||||
parser.add_argument('-d', '--device', type=str, default='cpu')
|
||||
parser.add_argument('-nc', '--numberofboxes', type=int, default=25)
|
||||
args = parser.parse_args()
|
||||
main(args)
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
50
rtdetr_pytorch/tools/train.py
Normal file
50
rtdetr_pytorch/tools/train.py
Normal file
@@ -0,0 +1,50 @@
|
||||
"""by lyuwenyu
|
||||
"""
|
||||
|
||||
import os
|
||||
import sys
|
||||
sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath(__file__)), '..'))
|
||||
import argparse
|
||||
|
||||
import src.misc.dist as dist
|
||||
from src.core import YAMLConfig
|
||||
from src.solver import TASKS
|
||||
|
||||
|
||||
def main(args, ) -> None:
|
||||
'''main
|
||||
'''
|
||||
dist.init_distributed()
|
||||
if args.seed is not None:
|
||||
dist.set_seed(args.seed)
|
||||
|
||||
assert not all([args.tuning, args.resume]), \
|
||||
'Only support from_scrach or resume or tuning at one time'
|
||||
|
||||
cfg = YAMLConfig(
|
||||
args.config,
|
||||
resume=args.resume,
|
||||
use_amp=args.amp,
|
||||
tuning=args.tuning
|
||||
)
|
||||
|
||||
solver = TASKS[cfg.yaml_cfg['task']](cfg)
|
||||
|
||||
if args.test_only:
|
||||
solver.val()
|
||||
else:
|
||||
solver.fit()
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.add_argument('--config', '-c', type=str, )
|
||||
parser.add_argument('--resume', '-r', type=str, )
|
||||
parser.add_argument('--tuning', '-t', type=str, )
|
||||
parser.add_argument('--test-only', action='store_true', default=False,)
|
||||
parser.add_argument('--amp', action='store_true', default=False,)
|
||||
parser.add_argument('--seed', type=int, help='seed',)
|
||||
args = parser.parse_args()
|
||||
|
||||
main(args)
|
||||
Reference in New Issue
Block a user