DICEPTION: A Generalist Diffusion Model for Vision Perception

One single model solves multiple perception tasks, producing impressive results!

Due to the GPU quota limit, if an error occurs, please wait for 5 minutes before retrying.

badge-github-stars

Task

For non-human image inputs, the pose results may have issues. Same when perform semantic segmentation with categories that are not in COCO.

The results of semantic segmentation may be unstable because:

  • We only trained on COCO, whose quality and quantity are insufficient to meet the requirements.
  • Semantic segmentation is more complex than other tasks, as it requires accurately learning the relationship between semantics and objects.

However, we are still able to produce some high-quality semantic segmentation results, strongly demonstrating the potential of our approach.

You can click on the image to select points prompt. At most 5 point.

Examples
Input Image Task Points Category Name
Pages: