First things first – the Microsoft COCO (Common Objects in Context) dataset supports only 80 classes of everyday objects, such as person, dog and cat.
List of objects supported by COCO: https://gist.github.com/AruniRC/7b3dadd004da04c80198557db5da4bd
What if you want to detect an object of your own that is not on this list? This is where labelling, or annotating, comes in.
Bounding boxes: Bounding boxes are the most commonly used type of annotation in computer vision. They are rectangles that mark the location of the target object, and each one can be defined by the x and y coordinates of its upper-left corner together with the x and y coordinates of its lower-right corner. Bounding boxes are generally used in object detection and localization tasks.
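The two-corner description above maps directly onto the label files used for training. As a minimal sketch, assuming a Darknet-style YOLO training setup, which stores one text line per box as the class index followed by the box centre, width and height, all normalized by the image size, the conversion could look like this. The function names, the class index and the example box are hypothetical.

```python
def corners_to_yolo(x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a corner-format box (in pixels) to the normalized
    (x_center, y_center, width, height) format used by YOLO label files."""
    if not (0 <= x_min < x_max <= img_w and 0 <= y_min < y_max <= img_h):
        raise ValueError("box must lie inside the image, with x_min < x_max and y_min < y_max")
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return x_center, y_center, width, height


def yolo_label_line(class_id, box, img_w, img_h):
    """Format one line of a YOLO .txt label file:
    '<class_id> <x_center> <y_center> <width> <height>'."""
    xc, yc, w, h = corners_to_yolo(*box, img_w, img_h)
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


# Hypothetical example: class 0 is your custom object, in a 640x480 image.
print(yolo_label_line(0, (100, 120, 300, 360), 640, 480))  # 0 0.312500 0.500000 0.312500 0.500000
```

Normalizing by the image size keeps the labels valid when the network resizes images to its own input resolution during training.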
Data labelling requires a lot of manual work. If you can find a good open dataset for your project that is already labelled, LUCK IS ON YOUR SIDE! Usually, though, this is not the case, and you have to draw bounding boxes around your objects yourself before you can build a prototype.
YOLO v3: Better, not Faster, Stronger
The title plays on the YOLO v2 paper, whose name made YOLO sound like a milk-based health drink for kids rather than an object detection algorithm: it was called “YOLO9000: Better, Faster, Stronger”. YOLO v3, however, is not faster; the current version clocks about 30 FPS. This has to do with the increased complexity of the underlying architecture, called Darknet.
Architecture of YOLOv3
I know this is a little theoretical and confusing, so let me make it simple for you: just as a car has an engine, YOLOv3 has a backbone, known as Darknet53, with 53 convolutional layers (https://en.wikipedia.org/wiki/Convolutional_neural_network#Convolutional_layers).
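To make those layers a little more concrete, here is a small, hypothetical sketch that rebuilds the Darknet53 layer layout as listed in the layer table of the YOLOv3 paper: a first 3x3 convolution, then five stages, each starting with a stride-2 3x3 convolution and followed by 1, 2, 8, 8 and 4 residual blocks made of a 1x1 and a 3x3 convolution. It only describes the layers as (filters, kernel size, stride) tuples and tracks the feature-map size; it is not a trainable network, and the function names are made up for illustration.

```python
# Hypothetical sketch of the Darknet53 layer table from the YOLOv3 paper.
# Each entry is (filters, kernel_size, stride), with "same" padding (kernel_size // 2).

def darknet53_conv_layers():
    layers = [(32, 3, 1)]  # first 3x3 convolution
    # (filters of the downsampling conv, number of residual blocks) per stage
    stages = [(64, 1), (128, 2), (256, 8), (512, 8), (1024, 4)]
    for filters, num_blocks in stages:
        layers.append((filters, 3, 2))  # 3x3 stride-2 conv halves the resolution
        for _ in range(num_blocks):
            # residual block: a 1x1 conv halves the channels, a 3x3 conv restores them;
            # the shortcut that adds the block's input to its output has no weights
            layers.append((filters // 2, 1, 1))
            layers.append((filters, 3, 1))
    return layers


def output_size(size, layers):
    """Spatial size of the feature map after all layers, for a square input."""
    for _, kernel, stride in layers:
        pad = kernel // 2
        size = (size + 2 * pad - kernel) // stride + 1
    return size


conv_layers = darknet53_conv_layers()
print(len(conv_layers))               # 52 convolutional layers in the table
print(len(conv_layers) + 1)           # plus the final connected layer: 53
print(output_size(416, conv_layers))  # a 416x416 input ends as a 13x13 feature map
```

Counted this way, the table lists 52 convolutions, and the final connected (classification) layer brings the total to 53. When Darknet53 serves as the YOLOv3 backbone, that classification head is dropped and the convolutional feature maps feed the detection layers instead.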
The idea is that more convolutional layers generally let the network learn richer features and reach better accuracy, at the cost of more computation per image, which is why YOLOv3 runs slower than its predecessor.
Now that you have seen the architecture in the image above, in the next part we will see how to detect an object in real time.