DICEPTION: A Generalist Diffusion Model for Vision Perception

Points

Category Name

For non-human image inputs, the pose results may have issues. Same when perform semantic segmentation with categories that are not in COCO.

The results of semantic segmentation may be unstable because:

We only trained on COCO, whose quality and quantity are insufficient to meet the requirements.

Semantic segmentation is more complex than other tasks, as it requires accurately learning the relationship between semantics and objects.

However, we are still able to produce some high-quality semantic segmentation results, strongly demonstrating the potential of our approach.

Input Image

You can click on the image to select points prompt. At most 5 point.

Results

Examples

Input Image	Task	Points	Category Name

Pages: