<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>dhruv's space (Posts about object-detection)</title><link>https://dhruvs.space/</link><description></description><atom:link href="https://dhruvs.space/categories/object-detection.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><copyright>Contents © 2020 &lt;a href="mailto:dhruvt93@gmail.com"&gt;Dhruv Thakur&lt;/a&gt; </copyright><lastBuildDate>Fri, 14 Feb 2020 22:35:30 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>Understanding Object Detection Part 4: More Anchors!</title><link>https://dhruvs.space/posts/understanding-object-detection-part-4/</link><dc:creator>Dhruv Thakur</dc:creator><description>&lt;div class="cell border-box-sizing text_cell rendered"&gt;&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;This post is fourth in a series on object detection. The other posts can be found &lt;a href="https://dhruvs.space/posts/understanding-object-detection-part-1/"&gt;here&lt;/a&gt;, &lt;a href="https://dhruvs.space/posts/understanding-object-detection-part-2/"&gt;here&lt;/a&gt;, and &lt;a href="https://dhruvs.space/posts/understanding-object-detection-part-3/"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The last post covered the use of anchor boxes for detecting multiple objects in an image. I ended that one with a model that was doing fine at detecting the presence of various objects, but its predicted bounding boxes could not properly localize objects with non-square shapes. This post will detail techniques for further improving that baseline model.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://dhruvs.space/posts/understanding-object-detection-part-4/"&gt;Read more…&lt;/a&gt; (17 min remaining to read)&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;</description><category>convnets</category><category>object-detection</category><category>single-shot-detector</category><guid>https://dhruvs.space/posts/understanding-object-detection-part-4/</guid><pubDate>Sat, 05 Jan 2019 11:35:21 GMT</pubDate></item><item><title>Understanding Object Detection Part 3: Single Shot Detector</title><link>https://dhruvs.space/posts/understanding-object-detection-part-3/</link><dc:creator>Dhruv Thakur</dc:creator><description>&lt;div class="cell border-box-sizing text_cell rendered"&gt;&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;This post is third in a series on object detection. The other posts can be found &lt;a href="https://dhruvs.space/posts/understanding-object-detection-part-1/"&gt;here&lt;/a&gt;, &lt;a href="https://dhruvs.space/posts/understanding-object-detection-part-2/"&gt;here&lt;/a&gt;, and &lt;a href="https://dhruvs.space/posts/understanding-object-detection-part-4/"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This post will detail a technique for classifying and localizing multiple objects in an image using a single deep neural network. Going from single object detection to multiple object detection is a fairly hard problem, so this is going to be a long post.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://dhruvs.space/posts/understanding-object-detection-part-3/"&gt;Read more…&lt;/a&gt; (30 min remaining to read)&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;</description><category>convnets</category><category>object-detection</category><category>single-shot-detector</category><guid>https://dhruvs.space/posts/understanding-object-detection-part-3/</guid><pubDate>Thu, 03 Jan 2019 07:32:21 GMT</pubDate></item><item><title>Understanding Object Detection Part 2: Single Object Detection</title><link>https://dhruvs.space/posts/understanding-object-detection-part-2/</link><dc:creator>Dhruv Thakur</dc:creator><description>&lt;div class="cell border-box-sizing text_cell rendered"&gt;&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;This post is second in a series on object detection. The other posts can be found &lt;a href="https://dhruvs.space/posts/understanding-object-detection-part-1/"&gt;here&lt;/a&gt;, &lt;a href="https://dhruvs.space/posts/understanding-object-detection-part-3/"&gt;here&lt;/a&gt;, and &lt;a href="https://dhruvs.space/posts/understanding-object-detection-part-4/"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This is a direct continuation of the &lt;a href="https://dhruvs.space/posts/understanding-object-detection-part-1/"&gt;last&lt;/a&gt; post, where I explored the basics of object detection. In particular, I learnt that a convnet can be used for localization by using appropriate output activations and a suitable loss function. I built two separate models for classification and localization respectively and used them on the Pascal VOC dataset.&lt;/p&gt;
&lt;p&gt;This post will detail stage 3 of single object detection, i.e., classifying and localizing the largest object in an image with a single network.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://dhruvs.space/posts/understanding-object-detection-part-2/"&gt;Read more…&lt;/a&gt; (12 min remaining to read)&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;</description><category>convnets</category><category>object-detection</category><guid>https://dhruvs.space/posts/understanding-object-detection-part-2/</guid><pubDate>Thu, 27 Dec 2018 13:43:21 GMT</pubDate></item><item><title>Understanding Object Detection Part 1: The Basics</title><link>https://dhruvs.space/posts/understanding-object-detection-part-1/</link><dc:creator>Dhruv Thakur</dc:creator><description>&lt;div class="cell border-box-sizing text_cell rendered"&gt;&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;This post is first in a series on object detection. The succeeding posts can be found &lt;a href="https://dhruvs.space/posts/understanding-object-detection-part-2/"&gt;here&lt;/a&gt;, &lt;a href="https://dhruvs.space/posts/understanding-object-detection-part-3/"&gt;here&lt;/a&gt;, and &lt;a href="https://dhruvs.space/posts/understanding-object-detection-part-4/"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;One of the primary takeaways for me after learning the basics of object detection was that the very backbone of the convnet architecture used for classification can also be utilised for localization. Intuitively, it does make sense, as convnets tend to preserve spatial information present in the input images. I saw some of that in action (detailed &lt;a href="https://dhruvs.space/posts/grad-cam-heatmaps-along-resnet-34/"&gt;here&lt;/a&gt; and &lt;a href="https://github.com/dhth/grad-cam-visualizer"&gt;here&lt;/a&gt;) while generating localization maps from activations of the last convolutional layer of a Resnet-34, which was my first realization that a convnet really does take into consideration the spatial arrangement of pixels while coming up with a class score.&lt;/p&gt;
&lt;p&gt;Without knowing that bit of information, object detection does seem like a hard problem to solve! Accurate detection of multiple kinds of similar-looking objects remains a tough problem even today, but building a basic detector is not overly complex. Or at least, the concepts behind it are fairly straightforward (I get to say that thanks to the hard work of numerous researchers).&lt;/p&gt;
&lt;p&gt;&lt;a href="https://dhruvs.space/posts/understanding-object-detection-part-1/"&gt;Read more…&lt;/a&gt; (15 min remaining to read)&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;</description><category>convnets</category><category>object-detection</category><guid>https://dhruvs.space/posts/understanding-object-detection-part-1/</guid><pubDate>Wed, 26 Dec 2018 11:39:21 GMT</pubDate></item></channel></rss>