Data Generation

Important

Synthetic dataset generation is currently unavailable. To train instance segmentation models, please capture real images and create a real dataset instead.

The Data Generation section of AI Hub serves as a crucial bridge between raw information and the structured, labeled datasets required to build effective machine learning models. Whether through human-powered annotation or algorithmic generation, it is designed to make the data preparation phase of the machine learning lifecycle more efficient, scalable, and controllable.

Synthetic Dataset Generation

Synthetic datasets are generated algorithmically: an object mesh model is simulated in a scene alongside distractor objects, under varying lighting conditions and object positions. Images are captured from multiple angles and annotated automatically. This process is particularly useful for training models when real-world data is scarce, expensive to obtain, or raises privacy concerns.
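To make the idea of varying viewpoints, lighting, and distractors concrete, the following is a minimal sketch of how scene parameters could be randomized per image. It is not AI Hub's implementation: the `SceneSample` class, the `sample_scene` function, and all value ranges are hypothetical and chosen only for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class SceneSample:
    """Hypothetical container for one randomized scene configuration."""
    camera_position: tuple  # (x, y, z), with the target object at the origin
    light_intensity: float  # relative intensity; range is an assumption
    num_distractors: int    # count of distractor objects; range is an assumption


def sample_scene(rng, radius=1.0):
    """Draw one random scene: a camera on the upper hemisphere around the
    object, a light intensity, and a distractor count."""
    # Uniform height plus uniform azimuth gives a uniform point on the hemisphere.
    z = rng.uniform(0.0, 1.0)
    azimuth = rng.uniform(0.0, 2.0 * math.pi)
    r_xy = math.sqrt(1.0 - z * z)
    camera = (
        radius * r_xy * math.cos(azimuth),
        radius * r_xy * math.sin(azimuth),
        radius * z,
    )
    return SceneSample(camera, rng.uniform(0.2, 1.0), rng.randint(0, 5))


rng = random.Random(0)  # fixed seed so the sampled scenes are reproducible
samples = [sample_scene(rng) for _ in range(4)]
```

Each sample would then be handed to a renderer, which is outside the scope of this sketch.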

Important

The generated dataset contains both Neura COCO-style instance annotations and Neura-style 6D pose annotations, so it can be used to train both object detection and pose estimation models.
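As a rough illustration of how such annotations might be consumed, the sketch below groups COCO-style instance annotations by image. The top-level keys (`images`, `categories`, `annotations`) and the `[x, y, width, height]` bounding-box layout follow the public COCO convention. The exact Neura schema is not described here, so the `pose` key with its `rotation` and `translation` fields, the sample values, and the `load_instances` helper are all assumptions.

```python
import json
from collections import defaultdict

# Tiny in-memory stand-in for an annotation file. The "pose" entry is a
# hypothetical layout (3x3 rotation matrix plus translation vector), not the
# documented Neura format.
COCO_JSON = """
{
  "images": [{"id": 1, "file_name": "000001.jpg", "width": 640, "height": 480}],
  "categories": [{"id": 1, "name": "gear"}],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1,
     "bbox": [100.0, 120.0, 50.0, 40.0], "area": 2000.0, "iscrowd": 0,
     "pose": {"rotation": [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
              "translation": [0.0, 0.0, 0.5]}}
  ]
}
"""


def load_instances(text):
    """Return {image_id: [instance, ...]} with labels and corner-format boxes."""
    data = json.loads(text)
    names = {c["id"]: c["name"] for c in data["categories"]}
    per_image = defaultdict(list)
    for ann in data["annotations"]:
        x, y, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
        per_image[ann["image_id"]].append({
            "label": names[ann["category_id"]],
            "box_xyxy": (x, y, x + w, y + h),
            "pose": ann.get("pose"),  # None when no pose is attached
        })
    return dict(per_image)


instances = load_instances(COCO_JSON)
```

A detection pipeline would use only `label` and `box_xyxy`, while a pose estimation pipeline would additionally read the pose entry.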

../_images/ai-hub-generate-syn-dataset.jpg

Synthetic Dataset Generation - Basic Parameters

../_images/ai-hub-syn-dataset-params.jpg

Synthetic Dataset Generation - Advanced Parameters

Real Dataset Generation

Real datasets are created from actual camera feeds or image captures, providing authentic data that reflects real-world conditions, variations, and challenges. This process involves collecting images or videos from physical environments and manually annotating them with tools such as Neura CVAT to create the ground-truth labels necessary for model training.

Note

For detailed instructions on creating real datasets, refer to: