Merge pull request #27952 from DDmytro:ecc/template-mask-rework

Add optional template mask for findTransformECC #27952

Supersedes #22997

**Summary**

Add optional template mask support to findTransformECC so that only pixels valid in both the template and the image are used in ECC. Backward compatibility is preserved: existing findTransformECC signatures are unchanged, and a new function, findTransformECCWithMask, adds the templateMask parameter.

**Motivation**

- Real-world frames often contain moving foreground artifacts (e.g., a football over a static field). Masking the object in one frame only is insufficient because its position changes independently of the background. Since we don’t know the warp a priori, we can’t back-project a single mask across frames. The correct approach is to supply both masks and take their intersection.
- Templates may include uninformative/low-texture or noisy regions, or partial overlaps with other objects. Excluding such regions from the alignment improves robustness and convergence.

This PR completes and replaces https://github.com/opencv/opencv/pull/22997

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
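To make the "valid in both" rule from the Summary concrete, here is a minimal sketch in plain C++ (no OpenCV) of a zero-mean normalized correlation evaluated only where both masks are non-zero. The function name `maskedCorrelation` and the flat-vector image layout are assumptions made for illustration, not part of the patch. The sketch also assumes the input image and its mask have already been warped into the template frame.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an OpenCV function): correlation coefficient of two
// equally sized images, using only pixels where BOTH masks are non-zero.
double maskedCorrelation(const std::vector<double>& templ,
                         const std::vector<double>& image,
                         const std::vector<std::uint8_t>& templMask,
                         const std::vector<std::uint8_t>& imageMask)
{
    const std::size_t n = templ.size();

    // First pass: means over the intersection of the two masks.
    double sumT = 0.0, sumI = 0.0;
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (templMask[i] && imageMask[i]) {
            sumT += templ[i];
            sumI += image[i];
            ++count;
        }
    }
    if (count == 0)
        return 0.0;  // no jointly valid pixels
    const double meanT = sumT / static_cast<double>(count);
    const double meanI = sumI / static_cast<double>(count);

    // Second pass: zero-mean dot product and norms over the same pixels.
    double dot = 0.0, normT = 0.0, normI = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        if (templMask[i] && imageMask[i]) {
            const double dt = templ[i] - meanT;
            const double di = image[i] - meanI;
            dot += dt * di;
            normT += dt * dt;
            normI += di * di;
        }
    }
    if (normT == 0.0 || normI == 0.0)
        return 0.0;  // constant region, correlation undefined
    return dot / std::sqrt(normT * normI);
}
```

With a single outlier pixel in the template (for example, a foreground object), masking that pixel in the template mask restores a perfect correlation on the remaining background, while the unmasked score is dominated by the outlier.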
2026-01-15 12:15:17 +00:00 · 2025-11-21 10:50:29 +02:00
parent 2cc5b69fd1
commit 6231b080ff
3 changed files with 137 additions and 25 deletions
--- a/modules/video/include/opencv2/video/tracking.hpp
+++ b/modules/video/include/opencv2/video/tracking.hpp
@@ -374,6 +374,51 @@ double findTransformECC(InputArray templateImage, InputArray inputImage,
    TermCriteria criteria = TermCriteria(TermCriteria::COUNT+TermCriteria::EPS, 50, 0.001),
    InputArray inputMask = noArray());

+/** @brief Finds the geometric transform (warp) between two images in terms of the ECC criterion @cite EP08
+using validity masks for both the template and the input images.
+
+This function extends findTransformECC() by adding a mask for the template image.
+The Enhanced Correlation Coefficient is evaluated only over pixels that are valid in both images:
+on each iteration inputMask is warped into the template frame and combined with templateMask, and
+only the intersection of these masks contributes to the objective function.
+
+@param templateImage 1 or 3 channel template image; CV_8U, CV_16U, CV_32F, CV_64F type.
+@param inputImage input image which should be warped with the final warpMatrix in
+order to provide an image similar to templateImage, same type as templateImage.
+@param templateMask single-channel 8-bit mask for templateImage indicating valid pixels
+to be used in the alignment. Must have the same size as templateImage.
+@param inputMask single-channel 8-bit mask for inputImage indicating valid pixels
+before warping. Must have the same size as inputImage.
+@param warpMatrix floating-point \f$2\times 3\f$ or \f$3\times 3\f$ mapping matrix (warp).
+@param motionType parameter, specifying the type of motion:
+ -   **MOTION_TRANSLATION** sets a translational motion model; warpMatrix is \f$2\times 3\f$ with
+     the first \f$2\times 2\f$ part being the identity matrix and the remaining two parameters being
+     estimated.
+ -   **MOTION_EUCLIDEAN** sets a Euclidean (rigid) transformation as motion model; three
+     parameters are estimated; warpMatrix is \f$2\times 3\f$.
+ -   **MOTION_AFFINE** sets an affine motion model (DEFAULT); six parameters are estimated;
+     warpMatrix is \f$2\times 3\f$.
+ -   **MOTION_HOMOGRAPHY** sets a homography as a motion model; eight parameters are
+     estimated; warpMatrix is \f$3\times 3\f$.
+@param criteria parameter, specifying the termination criteria of the ECC algorithm;
+criteria.epsilon defines the threshold of the increment in the correlation coefficient between two
+iterations (a negative criteria.epsilon makes criteria.maxcount the only termination criterion).
+Default values are shown in the declaration above.
+@param gaussFiltSize size of the Gaussian blur filter used for smoothing images and masks
+before computing the alignment (DEFAULT: 5).
+
+@sa
+findTransformECC, computeECC, estimateAffine2D, estimateAffinePartial2D, findHomography
+*/
+CV_EXPORTS_W double findTransformECCWithMask( InputArray templateImage,
+                                 InputArray inputImage,
+                                 InputArray templateMask,
+                                 InputArray inputMask,
+                                 InputOutputArray warpMatrix,
+                                 int motionType = MOTION_AFFINE,
+                                 TermCriteria criteria = TermCriteria(TermCriteria::COUNT + TermCriteria::EPS, 50, 1e-6),
+                                 int gaussFiltSize = 5 );
+
 /** @example samples/cpp/kalman.cpp
 An example using the standard Kalman filter
 */
--- a/modules/video/src/ecc.cpp
+++ b/modules/video/src/ecc.cpp
@@ -333,8 +333,15 @@ double cv::computeECC(InputArray templateImage, InputArray inputImage, InputArra
    return templateImage_zeromean.dot(inputImage_zeromean) / (templateImagenorm * inputImagenorm);
 }

-double cv::findTransformECC(InputArray templateImage, InputArray inputImage, InputOutputArray warpMatrix,
-                            int motionType, TermCriteria criteria, InputArray inputMask, int gaussFiltSize) {
+
+double cv::findTransformECCWithMask( InputArray templateImage,
+                                 InputArray inputImage,
+                                 InputArray templateMask,
+                                 InputArray inputMask,
+                                 InputOutputArray warpMatrix,
+                                 int motionType,
+                                 TermCriteria criteria,
+                                 int gaussFiltSize) {
    Mat src = templateImage.getMat();  // template image
    Mat dst = inputImage.getMat();     // input image (to be warped)
    Mat map = warpMatrix.getMat();     // warp (transformation)
@@ -416,7 +423,7 @@ double cv::findTransformECC(InputArray templateImage, InputArray inputImage, Inp
    Ycoord.release();

    const int channels = src.channels();
-    int type = CV_MAKETYPE(CV_32F, channels);  // used separately, when it needs to be explicit
+    int type = CV_MAKETYPE(CV_32F, channels);

    std::vector<cv::Mat> XgridCh(channels, Xgrid);
    cv::merge(XgridCh, Xgrid);
@@ -430,27 +437,10 @@ double cv::findTransformECC(InputArray templateImage, InputArray inputImage, Inp
    Mat imageWarped = Mat(hs, ws, type);    // to store the warped zero-mean input image
    Mat imageMask = Mat(hs, ws, CV_8U);     // to store the final mask

-    Mat inputMaskMat = inputMask.getMat();
-    // to use it for mask warping
-    Mat preMask;
-    if (inputMask.empty())
-        preMask = Mat::ones(hd, wd, CV_8U);
-    else
-        threshold(inputMask, preMask, 0, 1, THRESH_BINARY);
-
    // Gaussian filtering is optional
    src.convertTo(templateFloat, templateFloat.type());
    GaussianBlur(templateFloat, templateFloat, Size(gaussFiltSize, gaussFiltSize), 0, 0);

-    Mat preMaskFloat;
-    preMask.convertTo(preMaskFloat, type);
-    GaussianBlur(preMaskFloat, preMaskFloat, Size(gaussFiltSize, gaussFiltSize), 0, 0);
-    // Change threshold.
-    preMaskFloat *= (0.5 / 0.95);
-    // Rounding conversion.
-    preMaskFloat.convertTo(preMask, preMask.type());
-    preMask.convertTo(preMaskFloat, preMaskFloat.type());
-
    dst.convertTo(imageFloat, imageFloat.type());
    GaussianBlur(imageFloat, imageFloat, Size(gaussFiltSize, gaussFiltSize), 0, 0);

@@ -466,12 +456,48 @@ double cv::findTransformECC(InputArray templateImage, InputArray inputImage, Inp
    filter2D(imageFloat, gradientX, -1, dx);
    filter2D(imageFloat, gradientY, -1, dx.t());

-    cv::Mat preMaskFloatNCh;
-    std::vector<cv::Mat> maskChannels(gradientX.channels(), preMaskFloat);
-    cv::merge(maskChannels, preMaskFloatNCh);
+    // To use in mask warping
+    Mat templtMask;
+    if(templateMask.empty())
+    {
+        templtMask = Mat::ones(hs, ws, CV_8U);
+    }
+    else
+    {
+        threshold(templateMask, templtMask, 0, 1, THRESH_BINARY);
+        templtMask.convertTo(templtMask, CV_32F);
+        GaussianBlur(templtMask, templtMask, Size(gaussFiltSize, gaussFiltSize), 0, 0);
+        templtMask *= (0.5/0.95);
+        templtMask.convertTo(templtMask, CV_8U);
+    }

-    gradientX = gradientX.mul(preMaskFloatNCh);
-    gradientY = gradientY.mul(preMaskFloatNCh);
+    //to use it for mask warping
+    Mat preMask;
+    if(inputMask.empty())
+    {
+        preMask = Mat::ones(hd, wd, CV_8U);
+    }
+    else
+    {
+        Mat preMaskFloat;
+        threshold(inputMask, preMask, 0, 1, THRESH_BINARY);
+
+        preMask.convertTo(preMaskFloat, CV_32F);
+        GaussianBlur(preMaskFloat, preMaskFloat, Size(gaussFiltSize, gaussFiltSize), 0, 0);
+        // Change threshold.
+        preMaskFloat *= (0.5/0.95);
+        // Rounding conversion.
+        preMaskFloat.convertTo(preMask, CV_8U);
+
+        // If there's no template mask, we can apply image masks to gradients only once.
+        // Otherwise, we'll need to combine the template and image masks at each iteration.
+        if (templateMask.empty())
+        {
+            cv::Mat zeroMask = (preMask == 0);
+            gradientX.setTo(0, zeroMask);
+            gradientY.setTo(0, zeroMask);
+        }
+    }

    // matrices needed for solving linear equation system for maximizing ECC
    Mat jacobian = Mat(hs, ws * numberOfParameters, type);
@@ -505,6 +531,15 @@ double cv::findTransformECC(InputArray templateImage, InputArray inputImage, Inp
            warpPerspective(preMask, imageMask, map, imageMask.size(), maskFlags);
        }

+        if (!templateMask.empty())
+        {
+            cv::bitwise_and(imageMask, templtMask, imageMask);
+
+            cv::Mat zeroMask = (imageMask == 0);
+            gradientXWarped.setTo(0, zeroMask);
+            gradientYWarped.setTo(0, zeroMask);
+        }
+
        Scalar imgMean, imgStd, tmpMean, tmpStd;
        meanStdDev(imageWarped, imgMean, imgStd, imageMask);
        meanStdDev(templateFloat, tmpMean, tmpStd, imageMask);
@@ -576,6 +611,18 @@ double cv::findTransformECC(InputArray templateImage, InputArray inputImage, Inp
    return rho;
 }

+double cv::findTransformECC(InputArray templateImage,
+                            InputArray inputImage,
+                            InputOutputArray warpMatrix,
+                            int motionType,
+                            TermCriteria criteria,
+                            InputArray inputMask,
+                            int gaussFiltSize
+                            ) {
+    return findTransformECCWithMask(templateImage, inputImage, noArray(), inputMask,
+            warpMatrix, motionType, criteria, gaussFiltSize);
+}
+
 double cv::findTransformECC(InputArray templateImage, InputArray inputImage, InputOutputArray warpMatrix,
                            int motionType, TermCriteria criteria, InputArray inputMask) {
    // Use default value of 5 for gaussFiltSize to maintain backward compatibility.
--- a/modules/video/test/test_ecc.cpp
+++ b/modules/video/test/test_ecc.cpp
@@ -342,6 +342,26 @@ bool CV_ECC_Test_Mask::test(const Mat testImg) {
        // Test with non-default gaussian blur.
        findTransformECC(warpedImage, testImg, mapTranslation, 0, criteria, mask, 1);

+        if (!checkMap(mapTranslation, translationGround))
+            return false;
+
+        // Test with template mask.
+        Mat_<unsigned char> warpedMask = Mat_<unsigned char>::ones(warpedImage.rows, warpedImage.cols);
+        for (int i=warpedImage.rows*1/3; i<warpedImage.rows*2/3; i++) {
+            for (int j=warpedImage.cols*1/3; j<warpedImage.cols*2/3; j++) {
+                warpedMask(i, j) = 0;
+            }
+        }
+
+        findTransformECCWithMask(warpedImage, testImg, warpedMask, mask, mapTranslation, 0,
+                    TermCriteria(TermCriteria::COUNT+TermCriteria::EPS, ECC_iterations, ECC_epsilon));
+
+        if (!checkMap(mapTranslation, translationGround))
+            return false;
+
+        // Test with non-default gaussian blur.
+        findTransformECCWithMask(warpedImage, testImg, warpedMask,  mask, mapTranslation, 0, criteria, 1);
+
        if (!checkMap(mapTranslation, translationGround))
            return false;
    }
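
The mask pre-processing in ecc.cpp above (threshold to 0/1, blur, scale by 0.5/0.95, rounding conversion to 8-bit) keeps a pixel only when its blurred mask value is above 0.95, which shrinks the valid region away from masked-out pixels. The following 1-D sketch in plain C++ illustrates that effect. The function name `tightenMask1D`, the 5-tap kernel used as a stand-in for the Gaussian blur with gaussFiltSize = 5, and the replicated border are all assumptions for illustration and may differ from what OpenCV does internally.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical 1-D illustration (not OpenCV code) of the
// threshold -> blur -> scale -> round sequence applied to the masks.
std::vector<std::uint8_t> tightenMask1D(const std::vector<std::uint8_t>& mask)
{
    // Assumed 5-tap smoothing kernel; the weights sum to 1.
    static const double kernel[5] = {0.0625, 0.25, 0.375, 0.25, 0.0625};
    const int n = static_cast<int>(mask.size());
    std::vector<std::uint8_t> out(mask.size(), 0);

    for (int i = 0; i < n; ++i) {
        double blurred = 0.0;
        for (int k = -2; k <= 2; ++k) {
            // Replicate the border pixels (an assumption made for simplicity).
            const int j = std::min(std::max(i + k, 0), n - 1);
            // Binarize on the fly: any non-zero mask value counts as 1.
            blurred += kernel[k + 2] * (mask[j] ? 1.0 : 0.0);
        }
        // Scaling by 0.5/0.95 moves the rounding threshold from 0.5 to 0.95.
        const double scaled = blurred * (0.5 / 0.95);
        out[i] = static_cast<std::uint8_t>(std::nearbyint(scaled));
    }
    return out;
}
```

In this sketch a single masked-out sample removes its neighbours within two samples on each side, because even the smallest kernel weight (0.0625) pulls the blurred value down to 0.9375, below the 0.95 threshold.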