Portfolio

CARLA Semantic Segmentation Challenge

Introduction


In digital image processing and computer vision, image segmentation is the process of partitioning a digital image into multiple image segments, also known as image regions or image objects (sets of pixels). (Wikipedia)

Segmentation models are useful for a variety of tasks, including:

  1. Autonomous Driving (AD)

    Image segmentation in autonomous driving helps classify every pixel of a scene, allowing vehicles to understand their environment in detail and make safer decisions on the road.

  2. Medical Imaging

    Image segmentation in medical imaging allows for precise delineation of structures like tumors, blood vessels, or organs. This granularity helps in accurate diagnosis, treatment planning, and disease monitoring.

  3. Agriculture

    In agriculture, image segmentation can detect and differentiate between crops and weeds. This facilitates targeted herbicide application, ensuring healthier crop growth while minimizing chemical usage.

Goal


Our goal is to achieve real-time semantic segmentation in the CARLA simulation environment using the diverse data collected from it (e.g., heavy rain, fine dust, sunny weather).

The techniques utilized in this project:

  • Pretrained (a minimal loading sketch follows this list)
    1. DeepLabv3 (backbone: resnet_50)
    2. DeepLabv3 (backbone: resnet_50 with Conditional Random Field)
    3. DeepLabv3 (backbone: mobilenet_v3_large)
  • Scratch
    1. U-Net
    2. Mask R-CNN
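
A minimal sketch of how the pretrained DeepLabv3 variants could be loaded and adapted to our 12 classes. The head replacement and the torchvision >= 0.13 `weights` API are assumptions about the setup, not code taken from the project:

    import torch.nn as nn
    from torchvision.models.segmentation import (
        deeplabv3_resnet50,
        deeplabv3_mobilenet_v3_large,
    )

    NUM_CLASSES = 12  # our remapped class count

    # Load pretrained weights, then swap the classifier's final 1x1 conv
    # so the model predicts 12 classes instead of the default 21.
    model = deeplabv3_resnet50(weights="DEFAULT")
    model.classifier[4] = nn.Conv2d(256, NUM_CLASSES, kernel_size=1)

    # Same idea for the lighter backbone aimed at the real-time target.
    model_mobile = deeplabv3_mobilenet_v3_large(weights="DEFAULT")
    model_mobile.classifier[4] = nn.Conv2d(256, NUM_CLASSES, kernel_size=1)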
Dataset Information


    We performed segmentation using data collected from the 10 towns provided by default in CARLA, under three environmental conditions: sunny, rainy, and dusty.

    The image data has a roughly 1:2 aspect ratio, and we tested the generalization performance of segmentation by resizing images to a maximum size of 216 × 512.
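
    A minimal sketch of the kind of resizing this implies (the exact pipeline is our assumption): images are resized with bilinear interpolation, while masks use nearest-neighbor so the integer label IDs are not corrupted by interpolation.

        from PIL import Image
        from torchvision.transforms import InterpolationMode
        from torchvision.transforms import functional as F

        TARGET_SIZE = [216, 512]  # (height, width)

        def resize_pair(image: Image.Image, mask: Image.Image):
            """Resize an image/mask pair; nearest-neighbor keeps mask IDs intact."""
            image = F.resize(image, TARGET_SIZE, interpolation=InterpolationMode.BILINEAR)
            mask = F.resize(mask, TARGET_SIZE, interpolation=InterpolationMode.NEAREST)
            return image, mask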

    Class Definition

    We remapped the class IDs (0–28) obtained from the official CARLA documentation into 12 classes.

    
        # Original CARLA class IDs (0-28), already collapsed by name.
        original_class = {
            0: "None",          1: "Roads",         2: "Sidewalks",
            3: "Buildings",     4: "Other",         5: "Other",
            6: "Poles",         7: "TrafficLight",  8: "TrafficSigns",
            9: "Vegetation",    10: "Roads",        11: "None",
            12: "Pedestrians",  13: "Vehicles",     14: "Vehicles",
            15: "Vehicles",     16: "Vehicles",     17: "Vehicles",
            18: "Vehicles",     19: "Vehicles",     20: "Other",
            21: "Other",        22: "Other",        23: "Other",
            24: "RoadLines",    25: "Sidewalks",    26: "Other",
            27: "Other",        28: "Other",
        }

        # Target scheme: 12 classes after merging.
        remap_class = {
            0: "None",          1: "Roads",         2: "Sidewalks",
            3: "Buildings",     4: "Other",         5: "Poles",
            6: "TrafficLight",  7: "TrafficSigns",  8: "Vegetation",
            9: "Pedestrians",   10: "Vehicles",     11: "RoadLines",
        }
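
    Below is a minimal sketch of how a label mask could be remapped with these dicts; the numpy lookup-table approach is our illustration, not code from the project.

        import numpy as np

        # Invert the target dict: class name -> new ID.
        name_to_new_id = {name: idx for idx, name in remap_class.items()}

        # Lookup table: original ID (0-28) -> remapped ID (0-11).
        lut = np.array(
            [name_to_new_id[original_class[i]] for i in range(len(original_class))],
            dtype=np.uint8,
        )

        def remap_mask(mask: np.ndarray) -> np.ndarray:
            """Vectorized remap of an original-ID mask to the 12-class scheme."""
            return lut[mask]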

    You can view a video of the original classes and the remapped masks below.

    Several factors in the images influence segmentation training:

    1. Image Resolution: High-resolution images provide finer detail but increase training time and memory requirements; reducing resolution simplifies computation but may lose detail.
    2. Image Quality: Poor-quality images (e.g., heavy noise or low contrast) can hurt segmentation accuracy.
    3. Image Augmentation: Techniques such as rotation, scaling, flipping, and brightness adjustment help the model generalize across varied scenarios; over-augmenting, however, can distort the training distribution and degrade performance.
    4. Class Imbalance: If certain pixel classes vastly outnumber others, segmentation accuracy for the minority classes may degrade (see the weighting sketch after this list).
    5. Annotation Quality: The quality of the ground-truth segmentation masks plays a major role in training; inaccurate masks lower achievable accuracy.
    6. Channel Information: Multi-channel inputs (e.g., RGB, infrared, depth) provide additional information that can improve segmentation accuracy.
    7. Variability and Diversity: A diverse training set (varying lighting, angles, backgrounds, object sizes, etc.) helps the model generalize to real-world scenarios.
    8. Contextual Information: Scene context helps predict the position of specific objects or structures, which is especially important in larger images.
    9. Spatial Dependencies: Modeling the spatial dependencies between pixels (e.g., with a CRF) can yield more accurate segmentation.
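
    As a concrete illustration of mitigating class imbalance (item 4), one common option is an inverse-frequency weighted cross-entropy loss. This is a minimal sketch under that assumption; `all_train_masks` is a hypothetical stack of integer label masks, not a variable from the project:

        import torch
        import torch.nn as nn

        NUM_CLASSES = 12

        def class_weights_from_masks(masks: torch.Tensor,
                                     num_classes: int = NUM_CLASSES) -> torch.Tensor:
            """Inverse-frequency class weights from integer label masks."""
            counts = torch.bincount(masks.flatten(), minlength=num_classes).float()
            # Rare classes get larger weights; clamp avoids division by zero.
            return counts.sum() / (num_classes * counts.clamp(min=1))

        # weights = class_weights_from_masks(all_train_masks)  # hypothetical input
        # criterion = nn.CrossEntropyLoss(weight=weights)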

    Conclusion