Segmentation-aware convolution filters are invariant to backgrounds. We achieve this in three steps: (i) compute segmentation cues for each pixel (i.e., “embeddings”), (ii) create a foreground mask for each patch, and (iii) combine the masks with convolution, so that the filters only process the local foreground in each image patch.
For prerequisites, refer to DeepLabV2. Our setup follows theirs almost exactly.
Once you have the prerequisites, simply run `make all -j4` from within `caffe/` to compile the code with 4 cores.
To learn embeddings, use:

- `Convolution` layers to create dense embeddings,
- `Im2dist` to compute dense distance comparisons in an embedding map,
- `Im2parity` to compute dense label comparisons in a label map,
- `DistLoss` (with parameters `alpha` and `beta`) to set up a contrastive side loss on the distances.

See `scripts/segaware/config/embs` for a full example; a minimal sketch follows.
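Here is a rough prototxt sketch of that wiring, assuming hypothetical layer/blob names and a guessed parameter block for `DistLoss` (the real configuration lives in `scripts/segaware/config/embs`):

```
# Hypothetical sketch: an embedding branch with a contrastive side loss.
# Names and the dist_loss_param block are assumptions, not the repo's exact config.
layer {
  name: "emb"
  type: "Convolution"          # dense embeddings on top of some features
  bottom: "feat"
  top: "emb"
  convolution_param { num_output: 64 kernel_size: 3 pad: 1 }
}
layer {
  name: "emb_dist"
  type: "Im2dist"              # dense pairwise distances in the embedding map
  bottom: "emb"                # (patch-size parameters omitted here)
  top: "emb_dist"
}
layer {
  name: "label_parity"
  type: "Im2parity"            # dense same/different-label comparisons
  bottom: "label"
  top: "label_parity"
}
layer {
  name: "dist_loss"
  type: "DistLoss"             # contrastive loss: same-label pairs should be
  bottom: "emb_dist"           # within alpha; different-label pairs beyond beta
  bottom: "label_parity"
  top: "dist_loss"
  dist_loss_param { alpha: 0.5 beta: 2.0 }  # block name and values assumed
}
```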
To set up a segmentation-aware convolution, use:

- `Im2col` on the input, to arrange pixel/feature patches into columns,
- `Im2dist` on the embeddings, to get their distances into columns,
- `Exp` on the distances, with `scale: -1`, to get them into [0,1],
- `Tile` on the exponentiated distances, with a factor equal to the depth (i.e., channels) of the original convolution features,
- `Eltwise` to multiply the `Tile` result with the `Im2col` result,
- `Convolution` with `bottom_is_im2col: true` to matrix-multiply the convolution weights with the `Eltwise` output.

See `scripts/segaware/config/vgg` for an example in which every convolution layer in the VGG16 architecture is made segmentation-aware; a single masked convolution is sketched below.
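Concretely, one masked convolution might be wired roughly as follows. The names, kernel size (3x3), and channel count (256) are illustrative, and the placement of the `bottom_is_im2col` flag is assumed; `scripts/segaware/config/vgg` has the authoritative configs.

```
# Hypothetical sketch: one segmentation-aware convolution (3x3, 256 channels in).
layer {
  name: "feat_cols"
  type: "Im2col"               # arrange 3x3 patches of the input into columns
  bottom: "feat"
  top: "feat_cols"
  convolution_param { kernel_size: 3 pad: 1 }
}
layer {
  name: "emb_dist"
  type: "Im2dist"              # distance from each pixel to its 3x3 neighbors
  bottom: "emb"                # (patch-size parameters omitted here)
  top: "emb_dist"
}
layer {
  name: "mask"
  type: "Exp"                  # exp(-d) maps distances into (0, 1]
  bottom: "emb_dist"
  top: "mask"
  exp_param { scale: -1 }
}
layer {
  name: "mask_tiled"
  type: "Tile"                 # repeat the mask once per input channel
  bottom: "mask"
  top: "mask_tiled"
  tile_param { axis: 1 tiles: 256 }
}
layer {
  name: "masked_cols"
  type: "Eltwise"              # apply the foreground mask to the columns
  bottom: "feat_cols"
  bottom: "mask_tiled"
  top: "masked_cols"
  eltwise_param { operation: PROD }
}
layer {
  name: "conv"
  type: "Convolution"          # weights times masked columns, as a matrix multiply
  bottom: "masked_cols"
  top: "conv"
  convolution_param {
    num_output: 256
    kernel_size: 3
    pad: 1
    bottom_is_im2col: true     # flag from this fork; exact placement assumed
  }
}
```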
To add a segmentation-aware CRF to the end of a network, use the `NormConvMeanfield` layer. As input, give it two copies of the unary potentials (produced by a `Split` layer), some embeddings, and a meshgrid-like input (produced by a `DummyData` layer with `data_filler { type: "xy" }`). See `scripts/segaware/config/res` for an example in which a segmentation-aware CRF is added to a ResNet architecture; a rough sketch follows.
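A sketch of that wiring, with assumed bottom ordering, blob names, and meshgrid shape (`scripts/segaware/config/res` is the authoritative version):

```
# Hypothetical sketch: attaching the segmentation-aware CRF.
layer {
  name: "unary_split"
  type: "Split"                # two copies of the unary potentials
  bottom: "unary"
  top: "unary1"
  top: "unary2"
}
layer {
  name: "xy"
  type: "DummyData"            # meshgrid-like input: an x map and a y map
  top: "xy"
  dummy_data_param {
    shape { dim: 1 dim: 2 dim: 65 dim: 65 }  # shape assumed
    data_filler { type: "xy" }
  }
}
layer {
  name: "crf"
  type: "NormConvMeanfield"    # segmentation-aware mean-field CRF
  bottom: "unary1"             # bottom ordering assumed
  bottom: "unary2"
  bottom: "emb"
  bottom: "xy"
  top: "crf_out"
}
```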
To test a trained model:

- Put the trained model in `scripts/segaware/model/res/`.
- From `scripts`, run `./test_res.sh`. This will produce `.mat` files in `scripts/segaware/features/res/voc_test/mycrf/`.
- From `scripts`, run `./gen_preds.sh`. This will produce colorized `.png` results in `scripts/segaware/results/res/voc_test/mycrf/none/results/VOC2012/Segmentation/comp6_test_cls`. An example input-output pair is shown below:

If you run these steps for the validation set, you can then run `./eval.sh` to evaluate your results on the PASCAL VOC validation set. If you change the model, you may want to run `./edit_env.sh` to update the evaluation instructions.
If you use this code, please cite:

```
@inproceedings{harley_segaware,
  title = {Segmentation-Aware Convolutional Networks Using Local Attention Masks},
  author = {Adam W. Harley and Konstantinos G. Derpanis and Iasonas Kokkinos},
  booktitle = {IEEE International Conference on Computer Vision (ICCV)},
  year = {2017},
}
```
Feel free to open issues here! Also, I'm pretty good with email: aharley@cmu.edu