|
--- |
|
library_name: transformers |
|
license: apache-2.0 |
|
datasets: |
|
- detection-datasets/coco |
|
language: |
|
- en |
|
pipeline_tag: object-detection |
|
--- |
|
|
|
# Relation DETR model with ResNet-50 backbone |
|
|
|
## Model Details |
|
|
|
The model is not yet available in the `transformers` library. We are working on integrating Relation-DETR and will update this card as soon as the integration is released.
|
|
|
### Model Description |
|
|
|
![image/png](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F66939171e3a813f3bb10e804%2FkNzBZZ2SFq6Wgk2ki_c5t.png%3C%2Fspan%3E)%3C!-- HTML_TAG_END --> |
|
|
|
> This paper presents a general scheme for enhancing the convergence and performance of DETR (DEtection TRansformer).
> We investigate the slow convergence problem in transformers from a new perspective, suggesting that it arises from
> the self-attention that introduces no structural bias over inputs. To address this issue, we explore incorporating
> position relation prior as attention bias to augment object detection, following the verification of its statistical
> significance using a proposed quantitative macroscopic correlation (MC) metric. Our approach, termed Relation-DETR,
> introduces an encoder to construct position relation embeddings for progressive attention refinement, which further
> extends the traditional streaming pipeline of DETR into a contrastive relation pipeline to address the conflicts
> between non-duplicate predictions and positive supervision. Extensive experiments on both generic and task-specific
> datasets demonstrate the effectiveness of our approach. Under the same configurations, Relation-DETR achieves a
> significant improvement (+2.0% AP compared to DINO), state-of-the-art performance (51.7% AP for 1x and 52.1% AP
> for 2x settings), and a remarkably faster convergence speed (over 40% AP with only 2 training epochs) than existing
> DETR detectors on COCO val2017. Moreover, the proposed relation encoder serves as a universal plug-in-and-play component,
> bringing clear improvements for theoretically any DETR-like methods. Furthermore, we introduce a class-agnostic detection
> dataset, SA-Det-100k. The experimental results on the dataset illustrate that the proposed explicit position relation
> achieves a clear improvement of 1.3% AP, highlighting its potential towards universal object detection.
> The code and dataset are available at [this https URL](https://github.com/xiuqhou/Relation-DETR).
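The key idea in the abstract, turning pairwise box geometry into an attention bias, can be illustrated with a small self-contained sketch. This is not the authors' implementation; the function name, the choice of geometric features, and the projection layer below are assumptions for illustration only:

```python
import torch
import torch.nn as nn


def box_relation_bias(boxes: torch.Tensor, num_heads: int = 8, eps: float = 1e-5) -> torch.Tensor:
    """Toy pairwise position-relation bias from (cx, cy, w, h) boxes.

    Returns a (num_heads, N, N) tensor that could be added to attention logits;
    the actual Relation-DETR relation encoder is more elaborate.
    """
    cx, cy, w, h = boxes.unbind(-1)
    # Log-scaled pairwise geometry, roughly translation- and scale-invariant.
    dx = torch.log1p((cx[:, None] - cx[None, :]).abs() / (w[:, None] + eps))
    dy = torch.log1p((cy[:, None] - cy[None, :]).abs() / (h[:, None] + eps))
    dw = torch.log(w[:, None] / (w[None, :] + eps) + eps)
    dh = torch.log(h[:, None] / (h[None, :] + eps) + eps)
    rel = torch.stack([dx, dy, dw, dh], dim=-1)   # (N, N, 4)
    head_proj = nn.Linear(4, num_heads)           # illustrative projection only
    return head_proj(rel).permute(2, 0, 1)        # (num_heads, N, N)


boxes = torch.rand(5, 4)         # five random (cx, cy, w, h) boxes
bias = box_relation_bias(boxes)
print(bias.shape)                # torch.Size([8, 5, 5])
```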
|
|
|
- **Developed by:** Xiuquan Hou
|
- **Shared by:** Xiuquan Hou |
|
- **Model type:** Relation DETR |
|
- **License:** Apache-2.0 |
|
|
|
### Model Sources |
|
|
|
<!-- Provide the basic links for the model. --> |
|
|
|
- **Repository:** [https://github.com/xiuqhou/Relation-DETR](https://github.com/xiuqhou/Relation-DETR) |
|
- **Paper:** [Relation DETR: Exploring Explicit Position Relation Prior for Object Detection](https://arxiv.org/abs/2407.11699) |
|
<!-- - **Demo [optional]:** [More Information Needed] --> |
|
|
|
## How to Get Started with the Model |
|
|
|
Use the code below to get started with the model. |
|
|
|
```python |
|
import torch |
|
import requests |
|
|
|
from PIL import Image |
|
from transformers import RelationDetrForObjectDetection, RelationDetrImageProcessor |
|
|
|
url = 'http://images.cocodataset.org/val2017/000000039769.jpg' |
|
image = Image.open(requests.get(url, stream=True).raw) |
|
|
|
# NOTE: the checkpoint below is a placeholder (RT-DETR weights); swap in the
# Relation-DETR ResNet-50 checkpoint once it is available on the Hub.
image_processor = RelationDetrImageProcessor.from_pretrained("PekingU/rtdetr_r50vd")
model = RelationDetrForObjectDetection.from_pretrained("PekingU/rtdetr_r50vd")
|
|
|
inputs = image_processor(images=image, return_tensors="pt") |
|
|
|
with torch.no_grad(): |
|
outputs = model(**inputs) |
|
|
|
results = image_processor.post_process_object_detection(outputs, target_sizes=torch.tensor([image.size[::-1]]), threshold=0.3) |
|
|
|
for result in results: |
|
for score, label_id, box in zip(result["scores"], result["labels"], result["boxes"]): |
|
score, label = score.item(), label_id.item() |
|
box = [round(i, 2) for i in box.tolist()] |
|
print(f"{model.config.id2label[label]}: {score:.2f} {box}") |
|
``` |
|
|
|
This should output:
|
|
|
```text
|
cat: 0.96 [343.8, 24.9, 639.52, 371.71] |
|
cat: 0.95 [12.6, 54.34, 316.37, 471.86] |
|
remote: 0.95 [40.09, 73.49, 175.52, 118.06] |
|
remote: 0.90 [333.09, 76.71, 369.77, 187.4] |
|
couch: 0.90 [0.44, 0.53, 640.44, 475.54] |
|
``` |
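If you want to inspect the detections visually, the post-processed results can be drawn back onto the image. This optional snippet reuses `results`, `image`, and `model` from the example above:

```python
from PIL import ImageDraw

draw = ImageDraw.Draw(image)
for score, label_id, box in zip(results[0]["scores"], results[0]["labels"], results[0]["boxes"]):
    box = box.tolist()
    label = model.config.id2label[label_id.item()]
    draw.rectangle(box, outline="red", width=2)
    draw.text((box[0], box[1]), f"{label}: {score.item():.2f}", fill="red")

image.save("detections.jpg")
```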
|
|
|
## Training Details |
|
|
|
The Relation DEtection TRansformer (Relation DETR) model is trained on [COCO 2017 object detection](https://cocodataset.org/#download) (118k annotated images) for 12 epochs (the 1x schedule).
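The COCO dataset referenced in this card's metadata can be loaded with the `datasets` library if you want to reproduce the preprocessing. A minimal sketch, assuming the `detection-datasets/coco` schema (split and column names may need adjusting) and reusing `image_processor` from the example above:

```python
from datasets import load_dataset

# Stream a few validation samples instead of downloading the full dataset.
coco = load_dataset("detection-datasets/coco", split="val", streaming=True)
sample = next(iter(coco))

inputs = image_processor(images=sample["image"], return_tensors="pt")
print(inputs["pixel_values"].shape)
```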
|
|
|
## Evaluation |
|
|
|
| Model                      | Backbone                   | Epochs | mAP   | AP<sub>50</sub> | AP<sub>75</sub> | AP<sub>S</sub> | AP<sub>M</sub> | AP<sub>L</sub> |
| -------------------------- | -------------------------- | :----: | :---: | :-------------: | :-------------: | :------------: | :------------: | :------------: |
| Relation DETR              | ResNet50                   | 12     | 51.7  | 69.1            | 56.3            | 36.1           | 55.6           | 66.1           |
| Relation DETR              | Swin-L<sub>(IN-22K)</sub>  | 12     | 57.8  | 76.1            | 62.9            | 41.2           | 62.1           | 74.4           |
| Relation DETR              | ResNet50                   | 24     | 52.1  | 69.7            | 56.6            | 36.1           | 56.0           | 66.5           |
| Relation DETR              | Swin-L<sub>(IN-22K)</sub>  | 24     | 58.1  | 76.4            | 63.5            | 41.8           | 63.0           | 73.5           |
| Relation-DETR<sup>†</sup>  | Focal-L<sub>(IN-22K)</sub> | 4+24   | 63.5  | 80.8            | 69.1            | 47.2           | 66.9           | 77.0           |
|
|
|
† denotes a model fine-tuned on COCO after pretraining on Objects365.
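The table reports standard COCO metrics (mAP averaged over IoU thresholds 0.5:0.95, plus AP at fixed IoUs and per object size). A minimal sketch of scoring predictions against ground truth with `torchmetrics` (requires `torchmetrics` and `pycocotools`; this is not the authors' evaluation script, and the boxes and labels below are made up for illustration):

```python
import torch
from torchmetrics.detection import MeanAveragePrecision

metric = MeanAveragePrecision(box_format="xyxy")

# One toy image; in practice, accumulate predictions over all of COCO val2017.
preds = [{
    "boxes": torch.tensor([[12.6, 54.3, 316.4, 471.9]]),
    "scores": torch.tensor([0.95]),
    "labels": torch.tensor([17]),   # label ids only need to match the targets
}]
targets = [{
    "boxes": torch.tensor([[10.0, 50.0, 320.0, 475.0]]),
    "labels": torch.tensor([17]),
}]

metric.update(preds, targets)
print(metric.compute()["map"])
```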
|
|
|
## Model Architecture and Objective |
|
|
|
![image/png](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F66939171e3a813f3bb10e804%2FUMtLjkxrwoDikUBlgj-Fc.png%3C%2Fspan%3E)%3C!-- HTML_TAG_END --> |
|
|
|
![image/png](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F66939171e3a813f3bb10e804%2FMBbCM-zQGgUjKUmwB0yje.png%3C%2Fspan%3E)%3C!-- HTML_TAG_END --> |
|
|
|
## Citation and BibTeX |
|
|
|
```bibtex
|
@misc{hou2024relationdetrexploringexplicit, |
|
title={Relation DETR: Exploring Explicit Position Relation Prior for Object Detection}, |
|
author={Xiuquan Hou and Meiqin Liu and Senlin Zhang and Ping Wei and Badong Chen and Xuguang Lan}, |
|
year={2024}, |
|
eprint={2407.11699}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CV}, |
|
url={https://arxiv.org/abs/2407.11699}, |
|
} |
|
``` |
|
|
|
## Model Card Authors |
|
|
|
[xiuqhou](https://huggingface.co/xiuqhou) |
|
|