Speckle Reduction in Matrix-Log Domain for Synthetic Aperture Radar Imaging

Synthetic aperture radar (SAR) images are widely used for Earth observation to complement optical imaging. By combining information on the polarization and the phase shift of the radar echoes, SAR images offer high sensitivity to the geometry and materials that compose a scene. This information richness comes with a drawback inherent to all coherent imaging modalities: a strong signal-dependent noise called "speckle." This paper addresses the mathematical issues of performing speckle reduction in a transformed domain: the matrix-log domain. Rather than directly estimating noiseless covariance matrices, recasting the denoising problem in terms of the matrix-log of the covariance matrices stabilizes noise fluctuations and makes it possible to apply off-the-shelf denoising algorithms. We refine the method MuLoG by replacing heuristic procedures with exact expressions and improving the estimation strategy. This corrects a bias of the original method and should facilitate and encourage the adaptation of general-purpose processing methods to SAR imaging.


Introduction
SAR imaging is a key technology in airborne and satellite remote sensing. This active imaging technique, based on time-of-flight measurement and coherent processing (the so-called aperture synthesis), has a day-and-night capability and can produce images through clouds [36]. Beyond intensity images, many SAR systems offer polarimetric information, i.e., they measure how the polarization of the wave is affected by the back-scattering mechanisms occurring when the electromagnetic radar wave interacts with the illuminated scene. Backscattered radar waves collected from slightly different points of view can be combined, in a process called interferometry, to perform 3-D reconstructions or to recover very small displacements. In contrast to conventional optical imaging, SAR imaging gives access to both the amplitude and the phase of the backscattered wave. Interferometry uses this phase information to relate a phase shift observed between two SAR images to a change in the optical path (i.e., the path length traveled by the wave in its round-trip from the SAR antenna to the scene and back). Figure 1a illustrates the geometry of SAR imaging. Depending on the nature of the information available at each pixel, several names are used to describe SAR images:
- SAR denotes single-channel images: each pixel k contains the complex amplitude v_k ∈ ℂ backscattered by the scene. When no interferometric processing involving other single-channel SAR images is to be performed, the phase of v_k can be discarded and only the amplitude |v_k| or the intensity |v_k|² is considered;
- PolSAR denotes multi-channel polarimetric images: each pixel k contains a vector v_k = (v_k¹, v_k², v_k³) ∈ ℂ³ of 3 complex amplitudes backscattered by the scene under different polarizations (horizontally or vertically linearly polarized components at emission or reception, see Fig. 1a);
- InSAR denotes multi-channel interferometric images: each pixel k contains a vector v_k ∈ ℂ² formed by the 2 complex amplitudes backscattered by the scene at the two antenna positions; multi-baseline and tomographic SAR are extensions of SAR interferometry to more than two images;
- PolInSAR denotes multi-channel polarimetric and interferometric images: each pixel k contains a vector of 6 complex amplitudes v_k ∈ ℂ⁶ and corresponds to the combination of polarimetric and interferometric information.
Polarimetric images are generally displayed in false colors using the Pauli polarimetric basis, by combining the complex amplitudes collected under the different polarimetric configurations (the red channel corresponds to ½|v_k¹ − v_k³|², the green channel to 2|v_k²|² and the blue channel to ½|v_k¹ + v_k³|²), see [36]. Figure 2 shows airborne images of the same area obtained with each SAR modality (image credit: ONERA). Different structures are visible in the images: cultivated fields, roads (appearing as dark lines in SAR images due to the low reflectivity of such smooth surfaces), trees and several farm buildings (in the bottom left quarter of the image). The SAR illumination comes from the left-hand side of the images; shadows are thus visible on the right of all elevated elements (in particular, trees). A striking peculiarity of SAR images is the strong noise observed in all SAR modalities. This noise is unavoidable because it originates from the coherent illumination that is essential to the synthetic aperture processing. The SAR antenna collects at each time sample several echoes that interfere with each other, see Fig. 1b, c.
The resulting complex amplitude v_k ∈ ℂ^D is distributed, under the well-established speckle model due to Joseph Goodman [23], as a circular complex Gaussian distribution:
$$p(v_k \mid \Sigma_k) = \frac{1}{\pi^D\,|\Sigma_k|}\,\exp\!\left(-\,v_k^{*}\,\Sigma_k^{-1}\,v_k\right),$$
where * denotes the conjugate transpose and the complex-valued covariance matrix Σ_k carries all the information about the backscattering process: in SAR imaging (D = 1), Σ_k corresponds to the reflectivity at pixel k; in PolSAR, Σ_k ∈ ℂ^{3×3} characterizes the reflectivity in each polarimetric channel (diagonal of Σ_k) and the scattering mechanism (matrix Σ_k can be decomposed into a sum of matrices corresponding to elementary phenomena such as surface scattering, dihedral scattering and volume scattering); in InSAR, Σ_k ∈ ℂ^{2×2}, the diagonal elements correspond to reflectivities, while the off-diagonal element indicates the phase shift arg(Σ_k^{1,2}) from one antenna to the other (due to the difference in path length) and the coherence |Σ_k^{1,2}|/√(Σ_k^{1,1} Σ_k^{2,2}) (i.e., the remaining correlation between the complex amplitudes v_k¹ and v_k²: this correlation drops when the two images are captured at dates that are farther apart or when the scene has evolved between the two acquisitions, e.g., due to vegetation growth). Since the covariance matrix Σ_k contains all the useful information, it has to be estimated from the scattering vector v_k to characterize the radar properties of the scene at each pixel. This is classically done by computing the sample covariance C_k inside a small window centered at pixel k:
$$C_k = \frac{1}{L} \sum_{\ell \in N_k} v_\ell\, v_\ell^{*},$$
where N_k is the set of pixel indices within the window centered at pixel k and L = Card(N_k) is the number of pixels in the window. If the speckle is spatially independent and all pixels in the window follow a distribution with a common covariance matrix Σ_k, the samples v_ℓ are independent and identically distributed.
When L ≥ D, the sample covariance matrix is then distributed according to a complex Wishart distribution, C_k ∼ W(Σ_k, L), and its multi-variate probability density function is given by [22]
$$p(C_k \mid \Sigma_k) = \frac{L^{LD}\,|C_k|^{L-D}}{\Gamma_D(L)\,|\Sigma_k|^{L}}\,\exp\!\left(-L\,\mathrm{tr}\!\left(\Sigma_k^{-1} C_k\right)\right),
\quad\text{where}\quad \Gamma_D(L) = \pi^{D(D-1)/2} \prod_{k=1}^{D} \Gamma(L-k+1).$$
In the case of single-channel SAR images (D = 1), C_k corresponds to an intensity I_k and Σ_k is the pixel reflectivity R_k. The SAR intensity is then distributed according to a gamma distribution:
$$p(I_k \mid R_k) = \frac{L^L\, I_k^{L-1}}{\Gamma(L)\, R_k^{L}}\,\exp\!\left(-\frac{L\, I_k}{R_k}\right).$$
The speckle in SAR intensity images is known to be a multiplicative noise in the sense that Var[I_k] = R_k²/L, so that the standard deviation of the speckle fluctuations is proportional to the pixel reflectivity, and the speckle-corrupted intensity I_k may be modeled using a generative model of the form I_k = R_k S_k, with S_k a random variable distributed according to a gamma distribution with a mean equal to 1 and a variance equal to 1/L. In the multi-variate case (D > 1), the generative model becomes C_k = Σ_k^{1/2} S_k (Σ_k^{1/2})*, with S_k ∼ W(Id_D, L) [14]. The variance of C_k^{i,j} (the element of matrix C_k at row i and column j) is equal to (1/L) Σ_k^{i,i} Σ_k^{j,j}, i.e., the standard deviation of C_k^{i,j} is proportional to the geometric mean of the reflectivities in channels i and j and is thus also signal-dependent.
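The multiplicative model in the single-channel case can be checked with a minimal simulation (plain NumPy; the looks, reflectivity and sample count below are our own toy values): a gamma variable S with shape L and scale 1/L has unit mean and variance 1/L, so I = R·S reproduces E[I] = R and Var[I] = R²/L.

```python
import numpy as np

rng = np.random.default_rng(0)
L, R, n = 4, 2.5, 1_000_000   # looks, true reflectivity, number of samples (ours)

# multiplicative speckle: I = R * S with S ~ Gamma(shape=L, scale=1/L),
# so that E[S] = 1 and Var[S] = 1/L
S = rng.gamma(shape=L, scale=1.0 / L, size=n)
I = R * S

print(I.mean())  # ≈ R
print(I.var())   # ≈ R**2 / L: the speckle std grows with the reflectivity
```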
[Caption of Fig. 2] Synthetic aperture radar imaging offers rich information on a scene but suffers from speckle. The combination of images acquired with slightly different incidence angles (InSAR) or various polarization states (PolSAR) leads to a D-dimensional complex-valued vector per pixel. Reduction of the speckle fluctuations requires appropriate statistical modeling. This paper is devoted to the mathematical analysis of a generic approach for speckle reduction based on matrix-log decompositions. The images shown were obtained with the X-band airborne imaging system SETHI of the French aerospace lab ONERA [2] (after our pre-processing to achieve a trade-off between sidelobe attenuation and speckle decorrelation, the pixel size is ≈ 70 cm × 70 cm, and the area shown is ≈ 300 m × 370 m). The Gaussian denoiser used in the despeckling algorithm is BM3D [10].

The ubiquity of speckle noise in SAR images, the multiplicative nature of the fluctuations and the heavy-tailed behavior of the Wishart distribution have fueled numerous developments of specific image processing methods to reduce speckle. The vast majority of these works considered single-channel SAR images, and adapted techniques based on pixel selection (range selection with Lee's σ-filter [30], window selection [32] or region growing [47]), variational techniques (total variation minimization [1,19,44], curvelets [20]), patch-based methods derived from the non-local means [7,15-17,38], or more recently deep-learning-based methods (using supervised [8], semi-supervised [12] or self-supervised [34] learning strategies). The adaptation of a novel image denoising technique to the specificities of SAR imaging is a thorough process that includes replacing steps to account for the statistics of speckle and the nature of SAR images, where many bright structures reach intensities several orders of magnitude larger than the surrounding area.
This slows down the transfer of successful denoising techniques to the field of SAR imaging. In order to circumvent this adaptation process, we recently proposed a generic framework [14], named MuLoG, derived from the general "plug-and-play ADMM" strategy [6,48], which is related to a wider family of approaches using denoisers to regularize inverse problems [40]. When an image restoration problem is stated in the form of a variational problem and then solved using the alternating directions method of multipliers (ADMM, see for example [3]) or proximal-splitting techniques [9], the step of the algorithm that improves the fidelity to the prior model corresponds to the denoising of an image corrupted by additive white Gaussian noise. The key idea of "plug-and-play ADMM" is then to replace this step by an off-the-shelf Gaussian denoiser. The flexibility of MuLoG with respect to various SAR modalities (see the despeckling results obtained with MuLoG in Fig. 2) and its ability to benefit from the latest developments in additive Gaussian denoising make the method very useful for SAR applications (e.g., multi-temporal filtering [53], deformation analysis [21], height retrieval [52] or despeckling using pre-trained neural networks [13]). The original MuLoG algorithm in [14] is, however, based on approximations that can lead to estimation biases. This paper starts with a brief summary of the MuLoG framework in Sect. 2. We then perform a rigorous analysis of the optimization problem involved and establish exact closed-form expressions for the first and second directional derivatives of the matrix exponential mapping. We discuss the important problem of the initialization and regularization of the covariance matrices, in particular in the rank-deficient case L < D. We introduce several modifications and show that they suppress the bias of the original method.
Beyond their use in MuLoG's generic framework, these mathematical developments can benefit other variational methods for the restoration or segmentation of multi-channel SAR images, as well as hybrid methods that combine deep learning and an explicit statistical model of speckle by algorithm unrolling [35]. For easier reproducibility, the source code of the algorithms described in this paper is made available at https://www.charles-deledalle.fr/pages/mulog.php under an open source license.

An Overview of the MuLoG Framework
In order to give a self-contained presentation of our developments, we recall in this section the principle of MuLoG, as first introduced in [14]. MuLoG's approach to multi-channel SAR despeckling is built around two key ideas:
- a nonlinear transform that decomposes the field of D × D noisy covariance matrices {C_k}_{k=1..n} into D² real-valued images with n pixels; this transform approximately stabilizes speckle fluctuations and decorrelates the channels so that each can be denoised independently;
- an implicit regularization using a plug-and-play ADMM iterative scheme, where the proximal operator associated to the prior term is replaced by an off-the-shelf Gaussian denoiser.
The nonlinear transform is defined in three steps: (i) a matrix-log is applied to map each speckled covariance matrix C_k to a Hermitian matrix with approximately stabilized variance; (ii) the real and imaginary parts of these Hermitian matrices are separated, forming D² real-valued channels; (iii) an affine transform, identical for all pixel locations k, whitens these channels. The noisy covariance matrix C_k ∈ ℂ^{D×D} is then re-parameterized by y_k ∈ ℝ^{D²}, and similarly the covariance matrix of interest Σ_k is defined through the real-valued vector x_k ∈ ℝ^{D²}:
$$C_k = \exp\!\big(\Omega(y_k)\big), \qquad \Sigma_k = \exp\!\big(\Omega(x_k)\big), \qquad \Omega(x) = \mathcal{K}(W \Theta x + b),$$
where the exponential corresponds to a matrix exponential and the affine invertible mapping Ω : ℝ^{D²} → ℂ^{D×D} is built from K, the linear operator that transforms a vector of D² reals into a D × D Hermitian matrix, W ∈ ℝ^{D²×D²} a (whitening) unitary matrix, Θ ∈ ℝ^{D²×D²} a diagonal positive definite matrix (used to weight each channel) and b ∈ ℝ^{D²} a real vector (for centering). To compute the real-valued decomposition x of a covariance matrix Σ, the inverse transform can be applied: x = Ω⁻¹(log Σ) = Θ⁻¹ W* (K⁻¹(log Σ) − b). A principal component analysis is used to compute the matrix W and the vector b: b is the empirical mean of the vectors α_k = K⁻¹(log C_k) formed by the D² real values extracted from the log-transformed covariances, and the columns of W are unit-norm eigenvectors of the Gram matrix of the centered vectors α_k − b. The i-th diagonal element of Θ corresponds to the noise standard deviation of the i-th channel, estimated via the median absolute deviation (MAD) estimator. Figure 3 illustrates the channels obtained with the transform Ω⁻¹. Fluctuations due to the speckle noise have a variance that is approximately stabilized in the channels of y. Due to the whitening with matrix W, most of the information is captured by the first channels (associated with the largest eigenvalues) and the signal-to-noise ratio decreases with the channel number. In the last channels, some denoising artifacts can be noticed (e.g., bottom right image of Fig. 3). These artifacts have a negligible impact because of the small contribution of these channels to the recomposed image.
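Step (ii) above can be sketched in code. The exact operator K used by MuLoG is not reproduced here; the mapping below is one natural isometric choice (our assumption): the diagonal entries, then √2 times the real and imaginary parts of the upper triangle.

```python
import numpy as np

def herm_to_real(M):
    """One possible K^{-1}: map a D x D Hermitian matrix to D^2 reals.
    The sqrt(2) factor makes the map an isometry for the Frobenius norm."""
    D = M.shape[0]
    iu = np.triu_indices(D, k=1)
    return np.concatenate([M.diagonal().real,
                           np.sqrt(2) * M[iu].real,
                           np.sqrt(2) * M[iu].imag])

def real_to_herm(x, D):
    """Inverse map (K): rebuild the Hermitian matrix from its D^2 real coordinates."""
    M = np.zeros((D, D), dtype=complex)
    np.fill_diagonal(M, x[:D])
    iu = np.triu_indices(D, k=1)
    m = len(iu[0])
    off = (x[D:D + m] + 1j * x[D + m:]) / np.sqrt(2)
    M[iu] = off
    M[iu[1], iu[0]] = off.conj()
    return M

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
M0 = (A + A.conj().T) / 2                # a random Hermitian matrix
x0 = herm_to_real(M0)
print(np.allclose(real_to_herm(x0, 3), M0))          # round trip holds
print(np.isclose(x0 @ x0, np.linalg.norm(M0) ** 2))  # isometry holds
```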
The neg-log-likelihood of the re-parameterized covariance matrix can be derived from the Wishart distribution given in Eq. (3):
$$-\log p(y_k \mid x_k) = L\,\mathrm{tr}\!\left(e^{-\Omega(x_k)}\, e^{\Omega(y_k)}\right) + L\,\mathrm{tr}\big(\Omega(x_k)\big) + \mathrm{Cst},$$
where x_k and y_k are the D²-dimensional real-valued vectors corresponding, respectively, to the re-parameterization of the noiseless and noisy covariance matrices, while Cst is a constant independent of x_k. MuLoG is detailed in Algorithm 1. In practice, a fixed number of steps is generally used to stop the main loop (typically 6 steps provide a good trade-off between computational cost and restoration quality). Parameter β is increased within the loop, line 11, to ensure convergence (we use the adaptive update rule given in Eq. (15) of paper [6], which consists of increasing β whenever the norm of the primal and dual residuals decreases too slowly between two successive steps). Convergence of plug-and-play ADMM schemes has been studied in several works, e.g., [6,25,43,45], under various assumptions on the denoiser (non-expansiveness, boundedness), the data fidelity (convexity) or the evolution of parameter β. In our case, the data fidelity is non-convex and the convergence comes from the continuity of the gradient, which implies that the gradient is Lipschitz on any compact subset and that the conditions in [6] are fulfilled provided that the denoiser is bounded. Adaptations of the loss function or normalization procedures have been proposed for denoisers based on deep neural networks in [43,45] to ensure that the denoiser is firmly non-expansive, which also ensures convergence [6]. Several steps playing an important role in the MuLoG algorithm are revisited in this paper:
- the minimization technique used to compute, at each pixel k, the D²-dimensional vector x̂_k, line 10: this paper develops in Sect. 3 a new algorithm based on exact derivatives to improve this crucial step;
- the initial estimate used for x̂_k, line 4, computed from the matrices {C_k^(init)}_{k=1,…,n} defined line 1: an adequate initialization speeds up the convergence, as discussed in Sect. 4.1;
- the regularization of the noisy covariance matrices, set line 2, that impacts the minimization problem line 10: regularization strategies can avoid badly behaved cost functions that are hard to minimize but can also lead to estimation biases, see Sect. 4.2;
- the choice of the denoiser used line 8: as illustrated in the discussion (Sect. 6), each denoiser suffers from specific artifacts.
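The overall plug-and-play structure of Algorithm 1 can be sketched generically as follows. This is our simplified reading, not MuLoG's exact pseudo-code: sign and update conventions vary between ADMM formulations, and the penalty-increase rule below is a crude stand-in for the adaptive rule of [6].

```python
import numpy as np

def plug_and_play_admm(y, prox_data, denoiser, sigma, beta=1.0, n_iter=6):
    """Generic plug-and-play ADMM loop (our simplified reading):
    - prior step: an off-the-shelf Gaussian denoiser replaces the prior's prox,
    - data step:  the proximal operator of the data-fidelity term,
    - dual step:  scaled dual ascent, with a slowly increasing penalty beta."""
    x = y.copy()
    d = np.zeros_like(y)
    for _ in range(n_iter):
        z = denoiser(x + d, sigma / np.sqrt(beta))
        x = prox_data(y, z - d, beta)
        d = d + x - z
        beta *= 1.2
    return x

# toy check: quadratic data term, identity "denoiser" -> the fixed point is y
y = np.array([1.0, -2.0, 3.0])
prox = lambda y_, u, b: (y_ + b * u) / (1 + b)   # prox of 0.5 * ||x - y||^2
out = plug_and_play_admm(y, prox, lambda v, s: v, sigma=1.0)
print(out)  # ≈ [1, -2, 3]
```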

Improved Computation of the Data-Fidelity Proximal Operator
Line 10 of Algorithm 1 involves solving n independent D²-dimensional minimization problems of an objective function of the form
$$F(x) = L\,\mathrm{tr}\!\left(e^{-\Omega(x)}\, e^{\Omega(y)}\right) + L\,\mathrm{tr}\big(\Omega(x)\big) + \frac{\beta}{2}\,\|x - u\|_2^2. \tag{10}$$
This corresponds to computing the data-fidelity proximal operator [9]:
$$\mathrm{prox}_{\mathrm{data}}(u) = \mathop{\mathrm{arg\,min}}_{x \in \mathbb{R}^{D^2}} F(x), \tag{11}$$
where x, y and u are vectors in ℝ^{D²}. We recall that y corresponds to the noisy log-channels defined in Eq. (5) (Eqs. (7) and (8)) shown in the first column of Fig. 3, while u = ẑ_k + d̂_k in line 10 of Algorithm 1.

[Algorithm 1: MuLoG algorithm. Input: a bidimensional field of speckle-corrupted covariance matrices C, and a denoiser function f_σ : ℝⁿ → ℝⁿ that removes additive white Gaussian noise with standard deviation σ in n-pixel single-channel images. Output: a bidimensional field of estimated covariance matrices.]

Efficient minimization methods require the computation of the gradient g = ∇F(x) of F at x. As shown in [14] and recalled in Appendix A for the sake of completeness, this gradient is given by
$$g = L\,\Psi^{*}\!\Big(\mathrm{Id} - \partial \exp\big(-\Omega(x)\big)^{*}\big[e^{\Omega(y)}\big]\Big) + \beta\,(x - u),$$
where the linear operator Ψ is defined by Ψ(·) = K(WΘ ·), and ∂exp(−Ω(x))*[e^{Ω(y)}] is the adjoint of the directional derivative of the matrix exponential in the direction e^{Ω(y)}, taken at −Ω(x). We recall that the directional derivative of a differentiable function f at X in the direction A is defined as
$$\partial f(X)[A] = \lim_{\varepsilon \to 0} \frac{f(X + \varepsilon A) - f(X)}{\varepsilon}.$$
The directional derivative is a linear mapping with respect to the direction A. The computation of the gradient thus requires computing the adjoint of such a linear mapping for the matrix exponential.
In the original derivation of the MuLoG algorithm [14], we used an integral expression for the adjoint of the directional derivative of the matrix exponential,
$$\partial \exp(X)^{*}[A] = \int_0^1 e^{uX} A\, e^{(1-u)X}\, du, \tag{15}$$
to derive an approximation based on a Riemann sum:
$$\partial \exp(X)^{*}[A] \approx \frac{1}{Q} \sum_{q=1}^{Q} e^{u_q X} A\, e^{(1-u_q)X}, \qquad u_q = (q - 1/2)/Q.$$
In practice, we were using only Q = 1 rectangle to get a fast-enough approximation. In the next paragraph, we derive a closed-form expression of the gradient to avoid this approximation, and then we obtain the closed-form expression of the second-order directional derivative of F in order to obtain an improved minimization method for the computation of prox_data in Eq. (11).
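The accuracy of this midpoint Riemann sum can be checked against SciPy's Fréchet-derivative routine (for symmetric matrices the Fréchet derivative of `expm` coincides with the directional derivative used here; the test matrices below are arbitrary):

```python
import numpy as np
from scipy.linalg import expm, expm_frechet

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 3)); X = (X + X.T) / 2   # arbitrary symmetric matrix
A = rng.normal(size=(3, 3)); A = (A + A.T) / 2   # arbitrary direction

def riemann_dexp(X, A, Q):
    """Midpoint Riemann sum for d exp(X)[A] = int_0^1 e^{uX} A e^{(1-u)X} du."""
    us = [(q - 0.5) / Q for q in range(1, Q + 1)]
    return sum(expm(u * X) @ A @ expm((1 - u) * X) for u in us) / Q

exact = expm_frechet(X, A, compute_expm=False)
errs = {Q: np.linalg.norm(riemann_dexp(X, A, Q) - exact) / np.linalg.norm(exact)
        for Q in (1, 4, 16)}
print(errs)  # the midpoint-rule error shrinks roughly as 1/Q^2
```

With Q = 1 (the single rectangle used in the original MuLoG) the approximation reduces to e^{X/2} A e^{X/2}, which is fast but noticeably inaccurate for matrices with spread-out eigenvalues.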

Closed-Form Expression of the Gradient
We leverage studies from [5,18,31] regarding the derivatives of matrix spectral functions. As shown in "Appendix B," these lead to a closed-form expression for the directional derivative of the matrix exponential, as stated in the next proposition.

Proposition 1 Let Σ = E diag(λ) E* be a Hermitian matrix with distinct eigenvalues, where E is a unitary matrix of eigenvectors and λ the vector of corresponding eigenvalues. Then, for any D × D Hermitian matrix A, denoting Ā = E* A E,
$$\partial \exp(\Sigma)[A] = E\,\big(G \odot \bar{A}\big)\,E^{*},$$
where ⊙ is the element-wise (a.k.a. Hadamard) product and, for all 1 ≤ i, j ≤ D, we have defined
$$G_{i,j} = \begin{cases} e^{\lambda_i} & \text{if } i = j,\\[2pt] \dfrac{e^{\lambda_i} - e^{\lambda_j}}{\lambda_i - \lambda_j} & \text{otherwise.} \end{cases} \tag{19}$$

[Algorithm 2: Exact evaluation of the gradient ∇F. Input: vectors x, y, u ∈ ℝ^{D²} and the affine map Ω. Step 1: compute the eigenvalue decomposition of −Ω(x).]

Note that Proposition 1 assumes the eigenvalues of Σ to be distinct. In practice, we observe instabilities when λ_i and λ_j are close to each other. To solve this numerical issue, we use that, for λ_i > λ_j,
$$e^{\lambda_j} \le \frac{e^{\lambda_i} - e^{\lambda_j}}{\lambda_i - \lambda_j} \le e^{\lambda_i},$$
since exp is convex and increasing. Then, whenever an off-diagonal element G_{i,j} is out of this constraint, we project its value onto the feasible range [e^{λ_j}, e^{λ_i}]. Equation (19) shows that lim_{λ_j → λ_i} G_{i,j} = e^{λ_i}. In the case of duplicate eigenvalues, we use the continuous expansion obtained by replacing the condition i = j with λ_i = λ_j. We checked numerically that this continuous expansion is working for matrices with repeated eigenvalues.
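Proposition 1 is easy to verify numerically. The sketch below is our own implementation of the divided-difference formula (with the continuous expansion on near-duplicate eigenvalues), compared against a central finite difference of the matrix exponential:

```python
import numpy as np
from scipy.linalg import expm

def dexp(S, A):
    """Directional derivative of the matrix exponential at a Hermitian S,
    via first divided differences of exp on the eigenvalues (Proposition 1)."""
    lam, E = np.linalg.eigh(S)
    li, lj = np.meshgrid(lam, lam, indexing="ij")
    with np.errstate(divide="ignore", invalid="ignore"):
        G = (np.exp(li) - np.exp(lj)) / (li - lj)
    close = np.isclose(li, lj)         # diagonal and near-duplicate eigenvalues
    G[close] = np.exp(li[close])       # continuous expansion: G -> e^{lambda_i}
    Abar = E.conj().T @ A @ E
    return E @ (G * Abar) @ E.conj().T

rng = np.random.default_rng(2)
Z = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
S = (Z + Z.conj().T) / 2               # random Hermitian "Sigma"
Z = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = (Z + Z.conj().T) / 2               # random Hermitian direction

eps = 1e-6
fd = (expm(S + eps * A) - expm(S - eps * A)) / (2 * eps)  # central finite difference
rel = np.linalg.norm(dexp(S, A) - fd) / np.linalg.norm(fd)
print(rel)  # tiny relative error: the closed form matches finite differences
```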
Proposition 1 gives us an exact closed-form formula for the directional derivative. The next corollary shows that this formula is also valid for its adjoint (see proof in "Appendix C").

Corollary 1 Let Σ be a Hermitian matrix with distinct eigenvalues. The Jacobian of the matrix exponential at Σ is a self-adjoint operator: ∂exp(Σ)* = ∂exp(Σ).
Based on the closed-form expressions provided by Proposition 1 and Corollary 1, we define Algorithm 2 for an exact evaluation of the gradient of F.

A Refined Quasi-Newton Scheme
The computation of the proximal operator requires the minimization of the function F(x). This can be performed using quasi-Newton iterations
$$x \leftarrow x - \hat{H}^{-1} g,$$
where Ĥ is an approximation of the Hessian H of F at x, i.e., the real symmetric matrix defined by H_{i,j} = ∂²F(x)/∂x_i∂x_j. While in [14] a heuristic was used to define a diagonal approximation of the matrix H, we consider here the approximation
$$\hat{H} = \frac{\gamma}{\|g\|_2^2}\,\mathrm{Id},$$
where γ corresponds to the exact second derivative of F at x in the direction d of the gradient g.
As proved in Appendix D, this second-order derivative admits a closed-form expression in terms of the adjoint Ψ* of the linear operator Ψ(·) = K(WΘ ·) and the matrix dot product defined, for any matrices X and Y, as ⟨X, Y⟩ = tr(X* Y). This choice for approximating the Hessian leads to a quasi-Newton step that is exact when restricted to the direction of the gradient. As F has some regions of non-convexity, we consider in practice half the absolute value of the scalar product in (24) in order to avoid some local minima and to ensure that the quasi-Newton scheme follows a descent direction. Note that, as with the original formulation of MuLoG, we recover the exact Newton method in the mono-channel case (D = 1).
The computation of γ thus requires computing the second-order directional derivative of the matrix exponential. We recall that the second-order directional derivative of a function f at X in the direction A is defined as
$$\frac{\partial^2 f(X)}{\partial X^2}[A] = \lim_{\varepsilon \to 0} \frac{\partial f(X + \varepsilon A)[A] - \partial f(X)[A]}{\varepsilon}.$$

[Algorithm 3: Iterative method to compute prox_data. Input: a noisy vector y ∈ ℝ^{D²}, a vector u ∈ ℝ^{D²} and the affine map Ω.]

The closed-form expression for the second-order directional derivative of the matrix exponential is given in Proposition 2 (see proof in "Appendix E"). As in Proposition 1, Proposition 2 assumes the eigenvalues to be distinct. We checked on numerical simulations that the result remains valid for Hermitian matrices with duplicate eigenvalues by simply defining G_{i,j} in Proposition 1 using the condition λ_i = λ_j instead of i = j. Algorithm 3 details the quasi-Newton optimization scheme used to solve the minimization problem (11). The algorithm starts with an initial value for x, line 1, obtained by approximating the data likelihood with a Gaussian (i.e., a Wiener filter).
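The principle of a quasi-Newton step that is exact along the gradient direction can be illustrated on a toy problem (a hypothetical quadratic objective, not MuLoG's F; on a quadratic this step reduces to steepest descent with exact line search):

```python
import numpy as np

def curvature_step(x, grad, hess_vec):
    """One quasi-Newton step that is exact along the gradient direction:
    g = grad F(x), gamma = g^T H g, and x <- x - (||g||^2 / gamma) g.
    We take |gamma| as a crude guard against negative curvature (the paper
    uses half the absolute value; that exact factor is not reproduced here)."""
    g = grad(x)
    gamma = abs(g @ hess_vec(x, g))
    if gamma == 0.0:
        return x                         # stationary point reached
    return x - (g @ g / gamma) * g

# hypothetical quadratic test problem F(x) = 0.5 x^T Q x - b^T x
rng = np.random.default_rng(3)
M = rng.normal(size=(5, 5))
Q = M @ M.T + np.eye(5)                  # symmetric positive definite
b = rng.normal(size=5)
x = np.zeros(5)
for _ in range(500):
    x = curvature_step(x, lambda v: Q @ v - b, lambda v, d: Q @ d)
print(np.linalg.norm(Q @ x - b))  # essentially 0: the iterates solve Qx = b
```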

Numerical Validation
In order to validate the correctness of the derived closed-form expressions of the gradient and of the second directional derivative, we leverage that, for g = ∇F(x) and γ = d* (∂²F(x)/∂x²) d, the following two identities regarding the first and second directional derivatives hold true:
$$g^{*} u = \lim_{\varepsilon \to 0} \frac{F(x + \varepsilon u) - F(x)}{\varepsilon}
\quad\text{and}\quad
\gamma = \lim_{\varepsilon \to 0} \frac{d^{*}\big(\nabla F(x + \varepsilon d) - \nabla F(x)\big)}{\varepsilon}.$$
During the iterations of the proposed quasi-Newton scheme, we evaluate the left-hand sides of these equalities by running Algorithms 2 and 3 to compute g and γ. Independently, we evaluate the right-hand sides of these equalities by finite differences with a small nonzero value of ε. The direction u was chosen as a fixed white standardized normal vector. Figure 4 shows the evolution of these four quantities during the iterations of the proposed quasi-Newton scheme for an arbitrary choice of the image y, of the initialization and of the constants L and β. In addition, the evolution of F(x) and ||g||₂² = ||∇F(x)||₂² is also provided. On the one hand, we can notice that the proposed first and second directional derivatives are very close to the ones estimated by finite differences, which shows the validity of our formulas. On the other hand, the objective F(x) is decreasing and its gradient converges to 0, showing that the obtained stationary point is likely to be a local minimum. Furthermore, one can notice that the second directional derivative varies by less than 20%, showing that the loss F is nearly quadratic in the vicinity of its minimizers.

Initialization and Regularization of Rank-Deficient Matrix Fields
Multi-channel SAR images with n pixels are provided in the form of a bidimensional field of either n scattering vectors v_k ∈ ℂ^D (single-look data) or n covariance matrices C_k ∈ ℂ^{D×D} (multi-look data), see Fig. 1. The statistical distribution of speckle is defined with respect to full-rank covariance matrices Σ_k. The initial guess for these matrices must be positive definite, which requires an adapted strategy. We discuss different strategies for this initialization in Paragraph 4.1. Then, we describe in Paragraph 4.2 how the noisy covariance matrices C_k can be regularized so that the neg-log-likelihood is better behaved. We show that, in contrast to the original MuLoG method [14], different regularizations should be used for the initialization of the covariance matrices (line 1 of Algorithm 1) and for the computation of the noisy covariance matrices used to define the data-fidelity proximal operator (line 2 of Algorithm 1).

Initialization
For the sake of completeness, we recall in this paragraph the regularization approach used in the original MuLoG method, which we still use for the initialization of MuLoG. Under the assumption that the radar properties of the scene vary slowly with the spatial location, a spatial averaging can be used to estimate a first guess of the covariance matrix. At pixel k, the so-called radar coherence between channels i and j of the D-channel SAR image is defined by
$$\rho_{k,i,j} = \frac{\big|[\Sigma_k]_{i,j}\big|}{\sqrt{[\Sigma_k]_{i,i}\,[\Sigma_k]_{j,j}}},$$
i.e., the correlation coefficient between the two channels. This coherence is notably over-estimated by the sample covariance estimator when L is small, see [46]. In the extreme case of L = 1, C_k = v_k v_k* and the coherence equals 1, see Fig. 5. Some amount of smoothing is necessary to obtain more meaningful coherence values and to guarantee that the covariance matrix C_k^(init) be positive definite. The coherence is estimated by weighted averaging with Gaussian weights:
$$\hat{\rho}_{k,i,j} = \frac{\Big|\sum_{\ell} w_{k,\ell}\,[C_\ell]_{i,j}\Big|}{\sqrt{\sum_{\ell} w_{k,\ell}\,[C_\ell]_{i,i}\;\sum_{\ell} w_{k,\ell}\,[C_\ell]_{j,j}}},$$
where the weights w_{k,ℓ} are defined, based on the spatial distance between pixels k and ℓ, using a Gaussian kernel with spatial variance τ/2π. The spatial parameter τ is chosen in order to achieve a trade-off between bias reduction and resolution loss. Figure 5 shows coherence images ρ̂ of a single-look interferometric pair (L = 1 and D = 2) for different values of τ. We use the empirical rule τ = D/min(L, D) to increase the filtering strength with the dimensionality of SAR images and reduce it for multi-look images.

[Caption of Fig. 5] Estimation of the interferometric coherence by local Gaussian filtering, for different filtering strengths τ. In the absence of spatial filtering, the coherence is equal to 1 everywhere. All images are displayed using the same gray-level scale, with a zero coherence displayed in black and a unit coherence displayed in white. The area shown corresponds to the images of Fig. 2.

The estimated coherence values ρ̂_{k,i,j} are then used to define a first estimate C̃_k of the covariance matrix at pixel k. On-diagonal entries are unchanged: ∀i, [C̃_k]_{i,i} = [C_k]_{i,i}; for off-diagonal entries, the coherence becomes equal to ρ̂_{k,i,j} while the phase is preserved. In practice, the coherence ρ̂_{k,i,j} is smaller than 1 for all values of τ larger than 0. In order to guarantee that the covariance matrix C̃_k actually is positive definite, the off-diagonal entries can additionally be shrunk toward 0 by a small factor (by multiplication by a factor 0.99, for example). The filtering step applied in equations (31) and (32) largely improves the conditioning of the covariance matrix C̃_k, which helps performing the principal component analysis required to define the transform Ω (by producing a more stable analysis). In contrast to intensity data, for which E{[C_k]_{i,j}} = [Σ_k]_{i,j} for all channels i and j, the average of log-transformed data is known to suffer from a systematic bias [49] that can be quantified on the diagonal elements, for all 1 ≤ i ≤ D, by
$$\mathbb{E}\big\{\log [C_k]_{i,i}\big\} - \log [\Sigma_k]_{i,i} = \psi(L) - \log L,$$
where ψ is the di-gamma function. MuLoG does not estimate log Σ by averaging, but by iterating lines 8 to 11 in Algorithm 1; the sequence of these iterations leads to an estimate that does not suffer from this fixed bias.

[Algorithm 4: Initial estimation of the covariances. Input: a bidimensional field of speckle-corrupted covariance matrices C. Output: the initial guess C^(init) (to be used line 1 of Algorithm 1).]
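The Gaussian-weighted coherence estimator can be sketched as follows (our implementation; we read the kernel's spatial variance τ/2π as a Gaussian standard deviation of √(τ/2π), and the synthetic single-look pair below has a true coherence of 0.7):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def estimate_coherence(v1, v2, tau):
    """Coherence between two single-look complex channels, estimated by
    Gaussian-weighted local averaging (kernel std sqrt(tau / (2*pi)))."""
    s = np.sqrt(tau / (2 * np.pi))
    smooth = lambda a: gaussian_filter(a.real, s) + 1j * gaussian_filter(a.imag, s)
    num = np.abs(smooth(v1 * v2.conj()))
    den = np.sqrt(smooth(np.abs(v1) ** 2).real * smooth(np.abs(v2) ** 2).real)
    return num / den

# synthetic single-look interferometric pair with true coherence 0.7
rng = np.random.default_rng(4)
shape = (128, 128)
cg = lambda: (rng.normal(size=shape) + 1j * rng.normal(size=shape)) / np.sqrt(2)
a, b = cg(), cg()
v1, v2 = a, 0.7 * a + np.sqrt(1 - 0.7 ** 2) * b
rho_hat = estimate_coherence(v1, v2, tau=16.0)
print(rho_hat.mean())  # close to 0.7, slightly over-estimated for small tau
```

Without smoothing (τ → 0) the estimate degenerates to 1 everywhere, which is the L = 1 pathology described above.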

Regularization
The previous paragraph described the strategy to build an initial guess C_k^(init) of the covariance matrix at pixel k. This guess serves as a starting point for the iterative estimation procedure conducted by Algorithm 4. At line 2 of Algorithm 4, a regularized version C_k^(reg) of the covariance matrix is defined; it is used to compute the vector y (line 5) and then to define the data-fidelity proximal operator (line 10). Although a rank-deficient matrix C_k such as v_k v_k* could possibly be used to define the proximal operator (by replacing exp(Ω(y)) with this rank-deficient matrix in the definition of F(x) in equation (10)), rank-deficient or ill-conditioned covariance matrices lead to cost functions F(x) that are harder to minimize. On the other hand, we show in the sequel that the spatial smoothing strategy used in Algorithm 4 to build our initial guess C_k^(init) should not be used to compute C_k^(reg) (as done in the original MuLoG algorithm [14]), since this may lead to significant biases.
In order to control the condition number of the covariance matrices C_k^(reg), we adjust the eigenvalues by applying an affine map that rescales them from the range [λ_min, λ_max] to the range [λ_max/c̃, λ_max], with c̃ = min(c, λ_max/λ_min). This transformation ensures that the resulting covariance matrix has a condition number at most equal to c. Moreover, the largest eigenvalue λ_max is left unchanged and the ordering of the eigenvalues is preserved by this strictly increasing mapping (provided that c > 1). This latter property seems beneficial to limit the bias introduced by the covariance regularization scheme. If c is larger than the actual condition number of C_k, the affine map reduces to the identity. The computation of the regularized covariance matrices is summarized in Algorithm 5 (eigenvalue decomposition, affine mapping of the eigenvalues, recomposition with the modified eigenvalues). We use in the following the value c = 10³.

[Algorithm 5: Regularization of covariance matrices. Input: a covariance matrix C_k ∈ ℂ^{D×D} and a condition number c. Output: a regularized matrix C_k^(reg) ∈ ℂ^{D×D}.]
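The eigenvalue remapping of Algorithm 5 can be sketched as follows (our implementation of the affine rescaling described above; the handling of non-positive smallest eigenvalues is our own choice):

```python
import numpy as np

def regularize_covariance(C, c=1e3):
    """Cap the condition number of a Hermitian PSD matrix at c by an affine,
    strictly increasing remapping of its eigenvalues onto
    [lam_max / c_eff, lam_max]; the largest eigenvalue is left unchanged."""
    lam, E = np.linalg.eigh(C)
    lam_min, lam_max = lam[0], lam[-1]
    if lam_min > 0 and lam_max / lam_min <= c:
        return C                                   # already well conditioned
    c_eff = c if lam_min <= 0 else min(c, lam_max / lam_min)
    a = (lam_max - lam_max / c_eff) / (lam_max - lam_min)
    new_lam = a * (lam - lam_max) + lam_max        # affine map, fixes lam_max
    return (E * new_lam) @ E.conj().T

# rank-deficient single-look case: C = v v* has eigenvalues (0, 0, ||v||^2)
rng = np.random.default_rng(5)
v = rng.normal(size=3) + 1j * rng.normal(size=3)
C = np.outer(v, v.conj())
w = np.linalg.eigvalsh(regularize_covariance(C))
print(w[-1] / w[0])  # about 1000: the condition number is capped at c
```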

Simulated Data
In a first experiment, we compare the impact of the modifications introduced in Sects. 3 and 4 with respect to the original MuLoG algorithm [14] on a simulated PolSAR image generated from optical satellite images, by building at each pixel index k a covariance matrix Σ_k from R_k, G_k and B_k, the red-green-blue (RGB) channels of the optical image, with the polarimetric channels of the covariance matrix organized in the order HH, HV and VV. This way, the optical image coincides with the RGB representation of Σ when displayed in false colors in the Pauli basis, as described page 3. This model considers that channel HV is decorrelated from channels HH and VV, while channels HH and VV have a correlation of 1/√2 ≈ 0.71. Given such a ground-truth image Σ, we next simulated noisy versions C with L looks by random sampling, at each pixel index k,
$$C_k = \Sigma_k^{1/2}\, Z_k\, \big(\Sigma_k^{1/2}\big)^{*}, \qquad Z_k = \frac{1}{L} \sum_{l=1}^{L} v_k^{(l)} \big(v_k^{(l)}\big)^{*},$$
where Σ_k^{1/2} denotes the Hermitian square root of Σ_k and the v_k^{(l)} are independent complex random vectors with real and imaginary parts drawn according to a Gaussian white noise with standard deviation 1/√2. By construction, this gives Z_k ∼ W(Id_D, L) and C_k ∼ W(Σ, L).

[Caption of Fig. 6] The residuals Ẑ should ideally be as close as possible to the actual noise component Z; we report both the restoration PSNR (−10 log₁₀ ||Σ̂ − Σ||₂²) and the residuals PSNR (−10 log₁₀ ||Ẑ − Z||₂²) for each estimator. The ground truth is not displayed but is extremely close to C when L = 100 (last row).
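The sampling scheme can be sketched directly (with a small hypothetical ground-truth Σ, not the one built from the optical image):

```python
import numpy as np

rng = np.random.default_rng(6)
D, L, n = 3, 8, 20000

# hypothetical ground-truth covariance (ours, for illustration only)
Sigma = np.array([[2.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0],
                  [1.0, 0.0, 2.0]], dtype=complex)
lam, E = np.linalg.eigh(Sigma)
Sigma_half = (E * np.sqrt(lam)) @ E.conj().T        # Hermitian square root

# v ~ circular complex Gaussian with unit covariance (real/imag std 1/sqrt(2))
v = (rng.normal(size=(n, L, D)) + 1j * rng.normal(size=(n, L, D))) / np.sqrt(2)
Z = np.einsum("nld,nle->nde", v, v.conj()) / L      # normalized Wishart, E[Z] = Id
C = Sigma_half @ Z @ Sigma_half                     # C_k = Sigma^{1/2} Z_k Sigma^{1/2}

print(np.linalg.norm(C.mean(axis=0) - Sigma))       # small: E[C_k] = Sigma
```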

Evaluation with Simulations
Using the procedure described in the previous paragraph, we simulated PolSAR data from a 2003 SPOT satellite image of Bratislava (Slovakia)². We suggest performing first a visual comparison of the estimates Σ̂ obtained, respectively, by the original version of MuLoG and by the modified version introduced in this paper on noisy versions C. Should the estimation be perfect, Σ̂ would exactly equal Σ, and Ẑ would perfectly match the white speckle component Z: the residuals would be signal-independent. Comparing Ẑ to Z is thus an efficient way to assess the bias/variance trade-off achieved by the estimator. Areas of Ẑ that appear less noisy than Z indicate an under-smoothing. If Ẑ contains structures not present in Z, this indicates an over-smoothing. Wherever Ẑ has a different color than Z, this is a sign of bias.
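For completeness, one natural way to compute the residual speckle component at a pixel is to whiten the observed sample covariance by the estimate; the helper name `whitened_residual` and the definition Ẑ = Σ̂^{-1/2} C Σ̂^{-1/2} are assumptions consistent with Z k ∼ W(Id D, L), not necessarily the paper's exact formulation:

```python
import numpy as np

def whitened_residual(C, Sigma_hat):
    """Residual speckle Z_hat at one pixel: whiten the observed sample
    covariance C by the estimate Sigma_hat, so that a perfect estimate
    turns the residual back into white speckle Z ~ W(Id_D, L).

    Assumed definition, for illustration only."""
    lam, E = np.linalg.eigh(Sigma_hat)
    inv_sqrt = (E * (1.0 / np.sqrt(lam))) @ E.conj().T  # Sigma_hat^{-1/2}
    return inv_sqrt @ C @ inv_sqrt
```

With this convention, a perfect estimate (Σ̂ = Σ applied to noise-free data C = Σ) yields the identity matrix, i.e., a signal-independent residual.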
Results are provided in Fig. 6. The estimated images Σ̂ obtained with the original version of MuLoG and the modified version introduced in this paper are visually very close. As expected, the quality of the restored images improves with the number of looks, along with the signal-to-noise ratio in the speckle-corrupted images. The residuals Ẑ clearly indicate a bias with the original version of MuLoG. This bias is smaller with respect to the speckle variance when the number of looks L is lower. This is reflected by the PSNR values given in Fig. 6: the difference between the original and the modified versions of MuLoG is more pronounced for large values of L. When the number of looks L is larger, the speckle fluctuations are smaller and the PSNR of the speckled image is higher (see Fig. 6a). Unsurprisingly, the PSNR of the restored images gradually increases with the PSNR of the speckled image for both versions of MuLoG (see Fig. 6b, c). Computing the PSNR of the residuals Ẑ of the original MuLoG algorithm shows that the residuals are affected by an error that becomes increasingly large with respect to the level of fluctuations of the residuals, due to the presence of a bias. The modified MuLoG algorithm introduced in this paper, on the other hand, leads to residuals closer and closer to the true residuals as L increases. The residuals Ẑ obtained with the modified MuLoG algorithm are comparable to the true residuals Z: no geometrical structure from the image is noticeable in the residuals, which indicates that contrasted features were not removed from the image by the despeckling processing. The levels of fluctuations in Ẑ and Z seem similar. In order to perform a more quantitative comparison of the residuals, we report in Fig. 7 the symmetrical Kullback-Leibler divergence (KLD) between the distributions associated with Σ and Σ̂ for numbers of looks ranging from L = 1 to 100.
The KLD between the distributions W(Σ k; L) and W(Σ̂ k; L), averaged over the pixel index k, vanishes for a perfect match, while a larger value indicates a discrepancy. The divergence increases with the number of looks, which is expected because the KLD is a measure of divergence relative to the signal-to-noise ratio: the larger the signal-to-noise ratio, the more discriminative the KLD. We observe that, for all values of L, the modified version of MuLoG leads to residuals closer to the theoretical distribution of speckle residuals. This is in agreement with the behavior observed in Fig. 6, where the bias is seen to become prominent for large numbers of looks with the original version of MuLoG. We checked, by comparing the average running time on several images, that the modifications introduced in this paper do not slow down MuLoG: a slight speedup was even observed in our experiments. Figures 8, 9 and 11 compare the restoration performance of the original MuLoG algorithm and of the modified version introduced in this paper on PolSAR images from 3 different sensors (AIRSAR from NASA, PISAR from JAXA and SETHI from ONERA). The equivalent numbers of looks, estimated by maximum likelihood, are respectively equal to 2.7, 1.7 and 1 for each image. From the 1-look SETHI image, we build a 4-look image by spatial averaging and downsampling by a factor of 2 in the horizontal and vertical directions.
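Assuming the standard closed form of the Kullback-Leibler divergence between complex Wishart distributions sharing the same number of looks L (the log-determinant terms then cancel in the symmetrized version), the reported divergence can be sketched at one pixel as:

```python
import numpy as np

def symmetric_kld_wishart(S1, S2, L):
    """Symmetrized Kullback-Leibler divergence between the complex
    Wishart distributions W(S1, L) and W(S2, L) of D x D matrices.

    Assumes the standard complex Wishart KL expression; for equal L
    the log-determinant terms cancel, leaving
        sKL = L * ( tr(S2^-1 S1) + tr(S1^-1 S2) - 2 D ).
    """
    D = S1.shape[0]
    t12 = np.trace(np.linalg.solve(S2, S1)).real
    t21 = np.trace(np.linalg.solve(S1, S2)).real
    return L * (t12 + t21 - 2 * D)
```

Note that for a fixed mismatch between the two covariances, the divergence grows linearly with L, which is consistent with the observation that the KLD increases with the number of looks.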

Evaluation with Real Data
As previously observed on simulated data, while the results of the two versions of the algorithm are similar when the number of looks is small, a bias is visible in the residuals with the original MuLoG algorithm for larger equivalent numbers of looks. Figure 10 compares the computation time as a function of the number of pixels of the original MuLoG algorithm and the modified version introduced in this paper (the denoising step is performed for both algorithms by BM3D). Computation times were averaged over 20 executions for both algorithms. Each algorithm runs the same fixed number of ADMM iterations and quasi-Newton iterations. As expected, the computation time grows linearly with the number of pixels. Despite the refined computations of the first-order and second-order derivatives introduced in this paper, the computation time per iteration is not increased: there is even a slight acceleration of the method (Fig. 11).

Discussion
MuLoG is a generic framework that offers the possibility of a straightforward application of denoisers developed for additive white Gaussian noise (i.e., optical imaging). It suppresses the need for a time-consuming adaptation of these algorithms to the specifics of SAR imagery. Beyond a much faster transfer of state-of-the-art denoising methods, it makes it easier to run several denoisers in parallel and compare the restored images obtained by each. Figure 12 illustrates restoration results obtained with 6 different denoising techniques plugged into MuLoG: conventional Gaussian denoisers (total variation minimization [42], the block-matching collaborative 3D filtering algorithm BM3D [10], and the dual-domain image denoising method DDID [28]) and more recent learning-based techniques (a fast algorithm based on a model of image patches as a mixture of Gaussians, FEPLL [37], and two deep neural networks: DnCNN [50] and IRNCNN [51]). The patch-based despeckling method NL-SAR [17] is also applied to serve as a reference computed without using the MuLoG framework. The images produced with MuLoG have a quality that is on par with NL-SAR, with some denoising algorithms better at restoring textures, others at smoothing homogeneous areas or at preserving the continuity of the roads (dark linear structures).
Method-specific artifacts can be identified in the results: the total-variation denoiser [42] suppresses oscillating patterns and creates artificial edges in smoothly varying regions; BM3D [10], based on wavelet transforms, is very good at restoring oscillating patterns but may also produce oscillating artifacts in some low signal-to-noise ratio areas; DDID [28] creates oscillating artifacts and some ripples around edges; FEPLL [37] tends to suppress low signal-to-noise ratio oscillating patterns; DnCNN [50] produces smooth images with some point-like artifacts and a suppression of the low signal-to-noise ratio oscillating patterns; IRNCNN [51] better restores some of the oscillating patterns but introduces many artifacts in the form of fake edges in homogeneous areas. We believe that the possibility of including deep learning techniques within MuLoG is particularly useful for multi-dimensional SAR imaging, for which the supervised training of dedicated networks is very difficult to achieve due to the lack of ground-truth data and the dimensionality of the patterns (spatial patterns that extend in the D² real-valued dimensions of the covariance matrices).
One may wonder to what extent Gaussian denoisers developed to remove white Gaussian noise behave satisfactorily when handling the non-Gaussian fluctuations that appear throughout the iterations in MuLoG's transformed domain.
A possible explanation of the success of the plug-in ADMM approach, illustrated in Fig. 12 with several denoisers including deep neural networks trained specifically for Gaussian noise, is that throughout the iterations a trade-off between data fidelity and regularity is found by repeatedly applying the Gaussian denoiser to project onto the manifold of smooth images. This interpretation indicates that the Gaussian denoiser plays a role somewhat different from its original one. Given the specific nature of both the non-Gaussian fluctuations that appear throughout MuLoG's iterations and the specificity of SAR scenes, learning dedicated denoisers (for example, in the form of deep neural networks) is a natural way to further improve the despeckling algorithm. Since the distribution of fluctuations varies from one iteration to the next, specific denoisers should be used at each iteration: this corresponds to the "algorithm unrolling" approach [35] discussed below with other possible extensions of our work. In the channels shown in Fig. 3, one can observe linear structures, edges and textures that are also common to natural images. It is therefore not unreasonable to think that a denoiser suitable for natural images can also be applied to the log-transformed channels. Figure 12 supports this: very different Gaussian denoisers can be applied within the MuLoG framework and provide decent restoration results. By using a denoiser specific to the content of SAR scenes, the restoration performance can be slightly improved, but a recent study [13] has shown that, for the restoration of intensity SAR images with MuLoG, a deep neural network trained on natural images behaves almost as well as the same network trained specifically on SAR images.
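The plug-in ADMM principle described above can be sketched generically; the step sizes, the inner gradient loop, and the `grad_datafit` and `denoiser` callbacks below are illustrative placeholders, not MuLoG's actual algorithm:

```python
import numpy as np

def pnp_admm(y, grad_datafit, denoiser, beta=1.0, n_iter=10):
    """Minimal plug-in ADMM sketch: alternate an (approximate)
    data-fidelity proximal step with a black-box Gaussian denoiser
    that plays the role of the proximal operator of the prior.

    Generic illustration of the scheme, not MuLoG's Algorithm."""
    x = y.copy()
    z = y.copy()
    u = np.zeros_like(y)
    step = 0.1 / beta
    for _ in range(n_iter):
        # x-update: a few gradient steps approximate the prox of the data term
        for _ in range(5):
            x = x - step * (grad_datafit(x) + beta * (x - z + u))
        # z-update: the Gaussian denoiser regularizes the shifted variable
        z = denoiser(x + u)
        # dual update
        u = u + x - z
    return z
```

In MuLoG, y would be the log-transformed channels and grad_datafit the gradient of the Wishart data-fidelity term (Algorithm 2); here they are left abstract.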
While single-channel speckle-free SAR images can be obtained by averaging long time series of images and subsequently used for the supervised training of despeckling networks, this approach is hardly feasible in SAR polarimetry or interferometry due to the lack of data. Progress in self-supervised training strategies for remote sensing data restoration is needed to enable the learning of more specific networks [11,39]. This paper introduced several modifications to the MuLoG algorithm [14] based on the closed-form derivation of the first and second-order derivatives of the data-fidelity proximal operator. These mathematical developments can benefit other algorithms beyond MuLoG. Since the introduction of the plug-in-ADMM strategy in [48], numerous works have considered extensions to proximal algorithms [33], half-quadratic splitting [51] or formulations that include a Gaussian denoiser as an implicit regularizer [41]. The application of these approaches to multi-channel SAR data requires the evaluation of the data-fidelity proximal operator described in Sect. 3, or at least the evaluation of the gradient of the data-fidelity term (Algorithm 2). Another rapidly growing body of work considers the end-to-end training of unrolled algorithms, where deep neural networks are trained to remove noise and artifacts while gradient-descent steps derived from the data-fidelity term are interlaced between two applications of neural networks [35]. The computation of the exact gradient by Algorithm 2 is then essential to apply these techniques to multi-channel SAR images. Finally, several recent works [24,26,27,29] consider Markov chain Monte Carlo methods to sample images according to a posterior distribution. These approaches combine a gradient step according to the data-fidelity term, a regularization step computed with a black-box denoiser (similarly to the "regularization by denoising" (RED) approach [41]), and a random perturbation (Langevin dynamics).
The exact gradient of the data-fidelity term given in Algorithm 2 is once again necessary for the application to SAR imaging.
Proof (Proof of eq. (12)) Applying the chain rule to eq. (10) leads to the following decomposition. Using that the following identity holds for any matrices A and B concludes the proof.

B Proof of Proposition 1
where ⊙ denotes the element-wise (a.k.a. Hadamard) product and, for all 1 ≤ i, j ≤ D, we have defined G ij = (e^λi − e^λj)/(λi − λj) if i ≠ j and G ii = e^λi. Proof Let us start by recalling the following lemma, whose proof can be found in [5,18,31].
where Ā = E*AE and J is the skew-symmetric matrix defined above. Recall that exp Ω = E diag(e^λ)E*. From Lemma 1, by applying the chain rule, we have the decomposition above. We have, for i ≠ j, the stated expression of G ij. For i = j, since J ii = 0, we conclude the proof.
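Numerically, the structure E[G ⊙ Ā]E* of the Jacobian can be checked against finite differences; the sketch below assumes the Daleckii-Krein form of G (divided differences of the exponential over the eigenvalues, with e^λi on the diagonal), which matches Proposition 1:

```python
import numpy as np

def dexp(Omega, A):
    """Jacobian of the matrix exponential at a Hermitian matrix Omega,
    applied to a Hermitian direction A: returns E [G . (E* A E)] E*,
    where G holds the divided differences of exp over the eigenvalues
    (Daleckii-Krein form, assumed here to match Proposition 1)."""
    lam, E = np.linalg.eigh(Omega)
    D = len(lam)
    expl = np.exp(lam)
    G = np.empty((D, D))
    for i in range(D):
        for j in range(D):
            if np.isclose(lam[i], lam[j]):
                G[i, j] = expl[i]   # diagonal limit of the divided difference
            else:
                G[i, j] = (expl[i] - expl[j]) / (lam[i] - lam[j])
    A_bar = E.conj().T @ A @ E
    return E @ (G * A_bar) @ E.conj().T
```

Since G is real and symmetric, this operator is self-adjoint for the matrix dot product, which is exactly the content of Corollary 1.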

C Proof of Corollary 1
Corollary 1 Let Ω be a Hermitian matrix with distinct eigenvalues. The Jacobian of the matrix exponential is a self-adjoint operator. Proof We need to prove that, for any two D × D Hermitian matrices A and B, we have ⟨d expΩ(A), B⟩ = ⟨A, d expΩ(B)⟩ for the matrix dot product ⟨X, Y⟩ = tr[XY*]. According to Proposition 1, this amounts to showing ⟨E[G ⊙ Ā]E*, B⟩ = ⟨A, E[G ⊙ B̄]E*⟩, where E and G are defined from Ω as in Proposition 1. Denoting Ā = E*AE and B̄ = E*BE, this can be recast as ⟨G ⊙ Ā, B̄⟩ = ⟨Ā, G ⊙ B̄⟩. Expanding the left-hand side allows us to conclude the proof, using that G is real-valued: ⟨G ⊙ Ā, B̄⟩ = Σ_{i,j} G ij Ā ij B̄*ij = Σ_{i,j} Ā ij (G ij B̄ ij)* = ⟨Ā, G ⊙ B̄⟩.

D Hessian of the Objective F
The second-order derivative used in the approximation of the Hessian given in (23) takes the form (24). Proof (Proof of eq. (24)) Applying the chain rule to eq. (12) leads to (59). It follows that, and thus, as ‖d‖₂ = 1, it follows that, where, for all 1 ≤ i, j ≤ D, we have the quantities defined above. In order to apply the chain rule on E[G ⊙ Ā]E*, let us first rewrite J and G of Lemma 1 and Proposition 1 as above. We are now equipped to apply the chain rule to E[G ⊙ Ā]E* in the direction B, which leads us to the expressions above. We have, for all 1 ≤ i ≤ D and 1 ≤ j ≤ D, the stated identities. Hence, we get • Assume i ≠ j. We have the expression above. For k ≠ i and k ≠ j, we have the stated relation. Similarly, we have J jk (G ik − G ij) = J ij (G jk − G ik). Hence, we get the following. It follows that • Now assume that i = j. We have [F₃] ii = G ii Ā ii B̄ ii. It follows that, which concludes the proof.