Title: Effective supervised multiple‐feature learning for fused radar and optical data classification
Authors: Danya Karimi, Kazem Rangzan, and Mostafa Kabolizadeh (Department of Remote Sensing and GIS, Faculty of Earth Sciences, Shahid Chamran University of Ahvaz, Ahvaz, Iran); Gholamreza Akbarizadeh (corresponding author, [email protected]; Department of Electrical Engineering, Faculty of Engineering, Shahid Chamran University of Ahvaz, Ahvaz, Iran)
Venue: IET Radar, Sonar & Navigation, Volume 11, Issue 5, pp. 768-777. First published: 01 May 2017. https://doi.org/10.1049/iet-rsn.2016.0346
Abstract: In multi-sensor data fusion based on multiple features, the high dimensionality of the feature space increases runtime and computational complexity. The present study proposes a new algorithm based on the combination of random subspaces (RSs) with linear discriminant analysis and sparse regularisation (LDASR), namely RS–LDASR, for feature-space dimensionality reduction and supervised feature selection and learning. The use of RSs effectively addresses the problems of high dimensionality and a high feature-to-instance ratio. Extracting multiple features from the images raises the possibility of correlation between features, which reduces classification accuracy. In this study, after the construction of several RSs, supervised feature selection and learning based on LDASR were applied with very high accuracy. Classification and image fusion for remote sensing data analysis were tested by implementing feature-based fusion on two pairs of fused synthetic aperture radar and optical data. Four feature matrices were constructed using attribute profiles (APs), multi-APs (MAPs), non-negative matrix factorisation (NMF), and textural features.
Support vector machine and rotation forest were applied as the base classifiers. The results show that RS–LDASR significantly improved the classification accuracy based on NMF plus texture features, and even on NMF features alone.

1 Introduction
Multi-sensor data fusion, which combines images from two or more sensors, is of interest for coping with the increasing variety of remote sensing data [1]. Important applications of data fusion include building height extraction in urban areas [2], change detection [3], flood detection [4], estimation of impervious surfaces [5], and thematic mapping [6]. Multi-sensor data fusion combines the complementary information in the data to obtain a product that contains more useful information than the original images. There are three ways of fusing satellite data: pixel-based, feature-based, and decision-based methods [5]. The pixel-based approach is unsuitable for synthetic aperture radar (SAR) and optical data fusion because of speckle noise in radar data. Feature-based methods are more efficient for fusing radar and optical data and have been employed in the present study [5]. SAR sensors use long wavelengths and can image at all hours of the day under different weather conditions. The sensitivity of SAR to geometric configurations and soil moisture makes these images important sources of complementary information for optical data. Sea target detection [7], detection of inland open water surfaces [8], tropical cyclone observation [9], and soil parameter retrieval [10] are examples of SAR imaging applications. In environmental and socio-economic studies, pixels are classified according to their similarity [5]. Researchers have recently proven that morphological profiles (MPs) [11] and morphological attribute profiles (APs) [12] are capable tools for the extraction and modelling of spatial information in remote sensing data and contain useful information for distinguishing classes. MPs and APs were first introduced for modelling the spatial information of panchromatic, multispectral, and hyperspectral images [13]. Extended MPs were then introduced for hyperspectral images [14], and extended APs were introduced by Dalla Mura et al. [15] and Ghamisi et al. [16]. Multilevel MAPs have not yet been applied to the classification of fused radar and optical data. Other types of image features, such as textural features, also contain valuable information for image classification. Textural features such as the grey-level co-occurrence matrix (GLCM) [17] and the wavelet transform [18] are highly effective for accurate classification. Non-negative matrix factorisation (NMF) [19] provides significant structural–spatial local information about the images. The present study first applied random subspaces (RSs) for feature-space dimensionality reduction and feature-to-instance ratio reduction of fused SAR and optical data. Next, it decreased feature correlation and selected the optimal features using a combination of linear discriminant analysis and sparse regularisation (LDASR) based on the L2,p-norm for supervised feature selection and learning in remote sensing images. Unlike the method used by Masaeli et al. [20], the SR method based on the L2,p-norm avoids the trivial all-zero solution for the transformation matrix while imposing sparsity on it. SR forces the transformation matrix to have more rows of zero values; as a result, more nearly optimal features are selected.
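To make this row-sparsity mechanism concrete, the short numpy sketch below (an illustration with hypothetical names, not code from the paper) evaluates the L2,p penalty on the rows of a transformation matrix B and ranks features by their row norms; rows driven towards zero correspond to features that are discarded:

```python
import numpy as np

def l2p_penalty(B, p=0.5):
    """||B||_{2,p}^p: sum of the p-th powers of the row-wise L2 norms (0 < p < 1 here)."""
    return np.sum(np.linalg.norm(B, axis=1) ** p)

def rank_features(B):
    """Rank features by the L2 norm of the corresponding row of B, descending."""
    return np.argsort(np.linalg.norm(B, axis=1))[::-1]

rng = np.random.default_rng(0)
B = rng.normal(size=(10, 3))     # d = 10 features projected to l = 3 dimensions
B[[2, 5, 7]] *= 1e-6             # rows pushed towards zero by the sparse regulariser
print(l2p_penalty(B, p=0.5))     # penalty dominated by the surviving rows
print(rank_features(B)[:5])      # the informative features rank first
```

For 0 < p < 1 the penalty is steeper near zero than the L1-norm, so small rows are driven exactly to zero; this is the intuition behind the sparser solutions of the Lp-norm noted in [37].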
The novel RS–LDASR approach, a combination of RS and LDASR, was successfully applied for dimensionality reduction and supervised feature learning. It is noteworthy that RS–LDASR utilises filter-type feature selection, which has been proven to have lower computational complexity than other feature selection approaches [21]. Moreover, because it operates on low-dimensional feature subspaces, the proposed algorithm has lower time and computational costs than the three comparison feature selection methods (see Section 4.3). Beyond these advantages, classification based on RS–LDASR is an ensemble method, and many studies have shown that an ensemble strategy significantly improves classification accuracy [22]. The support vector machine (SVM) and rotation forest (RoF) [23] methods were employed for classification. Section 2 introduces the image features. Section 3 explains the RS–LDASR algorithm. Section 4 discusses the results, introduces the study area and data set, and compares RS–LDASR with current state-of-the-art methods. Section 5 presents the conclusions.

2 Features
2.1 Attribute profiles
In APs, the connected components, which are sets of pixels that have the same grey level, are successively filtered based on predefined criteria. The operators used in extracting APs are thinning and thickening [24]. Fig. 1a shows the AP feature vector as a stack of 2n + 1 images (n images from the thickening profiles, followed by the original image, and n images from the thinning profiles). To create MAPs, several APs can be calculated based on different image characteristics, and the output feature vector is created from all the APs. Fig. 1b shows the architecture of the MAPs.
Fig. 1: Spatial feature vector. (a) AP feature vector; (b) MAP vector.

2.2 Textural features
A common source of texture features is the GLCM, which records how frequently a grey level occurs in a given linear spatial relationship with other grey levels within a specific region of an image [17]. In this paper, the GLCM features of contrast, energy, correlation, and homogeneity were extracted. The use of a combination of textural features can improve the classification accuracy [25]. In addition to the GLCM, the wavelet transform can also be used to extract textural features and is capable of evaluating signals and images on different scales [26]. The energy of the first (approximation) component of the wavelet transform carries spectral information, while the energies of the other components carry textural information [27]. Fig. 2 shows the outputs of a level-1 wavelet transformation.
Fig. 2: One-level wavelet transformation. (a) Original image; (b) approximation component; (c) diagonal component; (d) horizontal component; (e) vertical component.

2.3 Non-negative matrix factorisation
The matrix factorisation method separates the image matrix into two non-negative matrices [28]. Suppose that X is an image matrix. The goal of NMF is to extract non-negative features in the form of a matrix W satisfying

(1) $X \approx WG$

where G is the weight matrix and W is the base matrix [19, 28]. The base vectors are non-negative features and contain important structural–spatial information [29, 30]. Fig. 3 shows the proposed methodology for dimensionality reduction and supervised feature learning.
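As a concrete companion to Section 2, the sketch below extracts the three feature families for a single grey-scale band in Python. It is a hedged approximation rather than the authors' pipeline: true APs need attribute thinning and thickening on a tree representation [24], so a simple profile built from openings and closings by reconstruction stands in for the 2n + 1 stack; the GLCM property names follow scikit-image (>= 0.19, graycomatrix/graycoprops), the wavelet part uses PyWavelets, and the NMF part uses scikit-learn. All helper names are ours.

```python
import numpy as np
import pywt                                          # wavelet transform
from skimage.feature import graycomatrix, graycoprops
from skimage.morphology import disk, erosion, dilation, reconstruction
from sklearn.decomposition import NMF

def morphological_profile(img, sizes=(2, 4, 8)):
    """Stand-in for an AP: n closings, the original image, n openings (2n + 1 images)."""
    closings = [reconstruction(dilation(img, disk(s)), img, method='erosion')
                for s in reversed(sizes)]
    openings = [reconstruction(erosion(img, disk(s)), img, method='dilation')
                for s in sizes]
    return np.stack(closings + [img] + openings)

def glcm_features(img_u8):
    """Contrast, energy, correlation, and homogeneity from a 1-pixel-offset GLCM."""
    glcm = graycomatrix(img_u8, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    return [graycoprops(glcm, prop)[0, 0]
            for prop in ('contrast', 'energy', 'correlation', 'homogeneity')]

def wavelet_energies(img):
    """Energies of the approximation (spectral) and detail (textural) components."""
    cA, (cH, cV, cD) = pywt.dwt2(img, 'haar')        # one-level transform, as in Fig. 2
    return [np.mean(c ** 2) for c in (cA, cH, cV, cD)]

def nmf_basis(X, k=10):
    """Non-negative base matrix W with X ~ WG (X must be non-negative)."""
    model = NMF(n_components=k, init='nndsvda', max_iter=500)
    W = model.fit_transform(X)                       # base matrix: structural-spatial info
    return W, model.components_                      # components_ plays the role of G
```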
Fig. 3: Scheme of the proposed methodology for dimensionality reduction and supervised feature learning.

3 Classification ensemble based on RS–LDASR
Using large-scale matrices in which all of the data are fed to the classifier increases computation time and complexity and can even raise the error rate. Under such circumstances, part of the data set can be used instead; however, the classifier may then be trained on an inadequate data subset, resulting in low classification accuracy. The best solution is to use various subsets of the original data set to train the classifier. This idea of RSs was proposed by Xia et al. [31] using the extreme learning machine (ELM) classifier. The classification ensemble based on RSs can be summarised as follows: (i) generate a subset of N features from the entire feature set, F times; (ii) apply these features to the classifier and obtain F classification results; and (iii) produce the final classification map by combining the F results using the majority voting rule [31]. In contrast to Xia et al., in this paper the RSs are learned using LDASR. This supervised method is repeated using different RSs, and the resulting newly learned data sets are used as inputs to the classifier. The classification results are then combined using the majority voting rule. RoF and SVM classifiers were used for classification. Note that applying LDASR to remote sensing data and combining RSs with LDASR (RS–LDASR) together constitute a novel algorithm for supervised feature selection and learning.
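A minimal sketch of steps (i)-(iii), assuming integer class labels, a generic feature matrix, and an SVM as the base classifier H (the LDASR learning step applied to each subspace is detailed in Section 3.2; all names here are ours):

```python
import numpy as np
from sklearn.svm import SVC

def rs_ensemble_predict(X_train, y_train, X_test, N=20, F=10, seed=0):
    """(i) draw F random subspaces of N features, (ii) classify with each,
    (iii) fuse the F label vectors with the majority voting rule."""
    rng = np.random.default_rng(seed)
    d = X_train.shape[1]
    votes = []
    for _ in range(F):
        idx = rng.choice(d, size=min(N, d), replace=False)   # one random subspace
        clf = SVC().fit(X_train[:, idx], y_train)            # base classifier H_i
        votes.append(clf.predict(X_test[:, idx]))
    votes = np.stack(votes)                                   # F x n_test label matrix
    # majority vote per test pixel (labels assumed to be small non-negative integers)
    return np.array([np.bincount(col).argmax() for col in votes.T])
```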
3.1 LDA sparse regularisation
Feature selection and feature learning are usually performed separately, but in the method proposed by Tao et al. [21] both processes are performed simultaneously. In this method, LDASR imposes sparsity on the transformation matrix through the L2,1-norm regulariser to carry out feature selection, and the algorithm extends to the L2,p-norm, which gives better sparsity results for 0 < p < 1 [21]. Feature selection methods [32] are of three types: (i) filter methods [33]; (ii) wrapper methods [33]; and (iii) embedded methods [34]. In the method proposed by Tao et al. [21] for supervised feature selection, filter methods were used. Compared with filter methods, wrapper and embedded methods have very high computational costs. Therefore, filter methods, which have lower computational complexity, were used in this paper for feature selection and dimensionality reduction. Because they rely only on the intrinsic characteristics of the data, without a learning mechanism, filter methods have proven powerful and have been used in much research [21]. The goal of supervised feature selection is to find the most discriminating features for separating the classes [35, 36]. Tao et al. proved the existence of a trivial solution for linear discriminative feature selection (LDFS) [20]. On the basis of their new formulation, the trivial solution can be avoided while making the transformed vectors uncorrelated. In their method, the L2,1-norm is used because it can be minimised with a simple algorithm and is capable of feature selection and removal of redundant features (as in the LDFS algorithm) while also avoiding the trivial solution.

3.2 Discriminative feature selection using the L2,p-norm
Masaeli et al. [20] performed supervised feature selection and learning based on the L∞,1-norm. The main drawback of this algorithm is that it has a trivial all-zero solution; the algorithm loses its ability to select features when it reaches this solution. The algorithm can be expressed as

(2) $B^{*} = \arg\min_{B}\; \mathrm{tr}\big((B^{T} E_{b} B)^{-1} B^{T} E_{w} B\big) + \gamma \sum_{i=1}^{d} \lVert b^{i} \rVert_{\infty}$

where B* is the output matrix, B ∈ R^{d×l} is the transformation matrix, E_w is the within-class scatter matrix, E_b is the inter-class scatter matrix, γ > 0 is the regularisation parameter controlling the sparsity of the rows of B, T denotes the matrix transpose, b^i is the ith row (transformation vector) of B, d is the initial data dimension, and l is the reduced dimension. Tao et al. [21] used the L2,p-norm to solve this problem. The L2,p-norm forces the transformation matrix to have more rows of zero values by imposing sparsity on the matrix. Research has shown that the Lp-norm (0 < p < 1) provides sparser solutions than the L1-norm [37]. The equation for supervised feature selection and learning is

(3) $B^{*} = \arg\min_{B^{T} E_{t} B = I}\; \mathrm{tr}(B^{T} E_{w} B) + \gamma \lVert B \rVert_{2,p}^{p}, \qquad \lVert B \rVert_{2,p}^{p} = \sum_{i=1}^{d} \lVert b^{i} \rVert_{2}^{p}$

where E_t = E_w + E_b is the total scatter matrix and the constraint makes the transformed vectors uncorrelated. If p = 1, (3) becomes

(4) $B^{*} = \arg\min_{B^{T} E_{t} B = I}\; \mathrm{tr}(B^{T} E_{w} B) + \gamma \lVert B \rVert_{2,1}$

On achieving the optimal transformation matrix B*, the features can be arranged according to $\lVert b^{i} \rVert_{2}$ in descending order, where b^i is the ith row of B*, so that the features with the highest values are selected as the optimal features. Generally, the closer the value of p is to 0, the better the approximation of the formulation for feature selection [21]. Tao et al. proposed a simple formulation for both the convex (1 ≤ p ≤ 2) and non-convex (0 < p < 1) regularised cases. For convenience, the regulariser γ‖B‖_{2,p}^p is replaced by the weighted trace

(5) $\gamma\, \mathrm{tr}(B^{T} D B) = \gamma \sum_{i=1}^{d} d_{ii}\, \lVert b^{i} \rVert_{2}^{2}$

where D is a diagonal matrix with its ith diagonal element given in (6):

(6) $d_{ii} = \dfrac{p}{2\, \lVert b^{i} \rVert_{2}^{\,2-p}}$

If D is held constant, (3) can be rewritten as

(7) $B^{*} = \arg\min_{B^{T} E_{t} B = I}\; \mathrm{tr}(B^{T} E_{w} B) + \gamma\, \mathrm{tr}(B^{T} D B)$

The problem in (3) can therefore be solved by solving (8), in which D is obtained by (6):

(8) $\min_{B^{T} E_{t} B = I}\; \mathrm{tr}\big(B^{T} (E_{w} + \gamma D) B\big)$

Solving the problem in (8) amounts to finding the l vectors associated with the minimum values of λ in (9), where E_t is the total scatter matrix:

(9) $(E_{w} + \gamma D)\, b = \lambda\, E_{t}\, b$

RS–LDASR can be summarised as Algorithm 1.

Algorithm 1. RS–LDASR (training phase)
Inputs: the labelled training samples; F, the number of classifiers; H, the base classifier; C, the ensemble; N, the number of features in a subspace; O, the feature set; and the parameters γ, l, p.
Output: ensemble C.
for i = 1 to F
1. Randomly select a feature set from O without replacement to form a new training set composed of N features;
2. Compute the scatter matrices E_t and E_b;
3. Set k = 0 (iteration counter) and initialise D_k ∈ R^{d×d} as the identity matrix;
repeat
4. Solve the generalised eigenproblem $(E_{w} + \gamma D_{k})\, b = \lambda\, E_{t}\, b$;
5. Set B_{k+1} = [b_1, b_2, …, b_l], where b_1, b_2, …, b_l are the eigenvectors associated with the l smallest eigenvalues;
6. Calculate the diagonal matrix D_{k+1}, whose ith diagonal element is $p / (2 \lVert b^{i} \rVert_{2}^{2-p})$;
7. k = k + 1;
until convergence
8. Train classifier H_i using the newly learned training set;
9. Add the classifier to the current ensemble C;
end for
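Under this reading of (6)-(9), the inner loop of Algorithm 1 for one subspace can be sketched as follows (an illustrative implementation with our own names; the small ridge added to E_t for numerical stability is our addition):

```python
import numpy as np
from scipy.linalg import eigh

def ldasr(X, y, l=5, gamma=1.0, p=0.5, n_iter=20, eps=1e-8):
    """Return the transformation B (d x l) and a descending feature ranking."""
    n, d = X.shape
    mu = X.mean(axis=0)
    Ew = np.zeros((d, d))                              # within-class scatter
    Eb = np.zeros((d, d))                              # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Ew += (Xc - mc).T @ (Xc - mc)
        Eb += len(Xc) * np.outer(mc - mu, mc - mu)
    Et = Ew + Eb                                       # total scatter
    D = np.eye(d)                                      # D_0 = identity, as in Algorithm 1
    for _ in range(n_iter):
        # generalised eigenproblem (9): (E_w + gamma D) b = lambda E_t b
        vals, vecs = eigh(Ew + gamma * D, Et + eps * np.eye(d))
        B = vecs[:, :l]                                # eigenvectors of the l smallest eigenvalues
        rn = np.linalg.norm(B, axis=1) + eps           # row norms ||b^i||_2
        D = np.diag(p / (2.0 * rn ** (2.0 - p)))       # reweighting, eq. (6)
    ranking = np.argsort(np.linalg.norm(B, axis=1))[::-1]
    return B, ranking

# X @ B gives the newly learned training set used in step 8 of Algorithm 1.
```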
4 Results
The proposed methods were evaluated on two pairs of real radar and optical images from the Sentinel and advanced land observation satellite (ALOS) missions. RoF and SVM were used as the classification algorithms. The numbers of testing and training samples and the locations of the sample data are displayed in Table 1 and Fig. 4, respectively. Four feature combinations were constructed, and the effects of RS–LDASR on dimensionality reduction, supervised feature selection and learning, and the classification results were evaluated.

Table 1. Number of testing and training samples (sample pixels per image)
Number | Class name | Train | Test
1 | city | 512 | 1024
2 | crop | 512 | 1024
3 | coastal | 512 | 1024
4 | river | 512 | 512

Fig. 4: Location of selected sample areas.

4.1 Study area and data set
The study area is located south of the city of Ahvaz in southwestern Iran, between 31°14′4″ and 31°14′57″N latitude and 48°37′36″ and 48°39′17″E longitude. It comprises four classes of land cover: coast, crops, river, and city area. A Sentinel-1 radar image (spatial resolution: 10 × 12 m²; date of imaging: 29 November 2015; C-band; vertical-vertical (VV) polarisation) and a Sentinel-2 optical image (spatial resolution: 10 × 10 m²; date taken: 30 November 2015; 13 optical bands) were used as the first pair of fused images. Since the fusion of multi-sensor images exploits the benefits of both sensors and mitigates the disadvantages of the original data, the images were fused using MAPs along with other features. Sentinel-1 is a two-satellite constellation with the prime objective of land and ocean monitoring; the goal of the mission is to provide C-band SAR data continuity following the retirement of ERS-2 and the end of the Envisat mission. One geometrically corrected S1A_IW_GRDH Sentinel-1 radar image, acquired in interferometric wide swath imaging mode by the C-band synthetic aperture radar (SAR-C) instrument, was chosen for this paper. An enhanced Lee filter was used to reduce speckle noise, and the image was then resampled. The Sentinel-2 image was also geometrically corrected, and atmospheric correction was carried out for this image alone. The full Sentinel-2 mission comprises twin polar-orbiting satellites in the same orbit, phased at 180° to each other. The mission monitors variability in land surface conditions, and its wide swath and high revisit time (10 days at the equator with one satellite, and 5 days with two satellites under cloud-free conditions, which results in 2–3 days at mid-latitudes) support monitoring of changes to vegetation within the growing season. The coverage limits are between latitudes 56° south and 84° north. The Level-1C Sentinel-2 data product was chosen for this paper. Of the 13 Sentinel-2 bands, four (blue, green, red, and short-wave infrared) were used for fusion with the single Sentinel-1 radar image; these bands contain the most important information about Earth targets. Before feature extraction, the images were co-registered. Both images are 134 × 229 pixels in size. A scene of ALOS phased array type L-band synthetic aperture radar (PALSAR) (acquired 06/2010; spatial resolution 12.5 m; L-band; horizontal-horizontal (HH) and horizontal-vertical (HV) polarisations) and a scene of ALOS advanced visible and near infrared radiometer type-2 (AVNIR-2) (acquired 06/2010; spatial resolution 10 × 10 m²; four optical bands) were used as the second pair of fused images. The preprocessing of this pair was the same as for the first pair of fused images. The ALOS images are 117 × 189 pixels in size. Fig. 5 shows the study area and its land cover map.
Fig. 5: Study area and its land cover map. (a) Sentinel-2 optical image of the study area; (b) land use map.

4.2 Application of RS–LDASR to the fused Sentinel images
The first feature combination joined the NMF and textural features: the non-negative matrices were extracted, the wavelet transform and GLCM were applied to obtain the textural features, and the two feature types were then used to construct the feature matrix.
After extracting several RSs from this feature combination, the subspaces were used to train the classifiers with and without LDASR. The results were combined using the majority voting rule and showed that, for both RoF and SVM, RS–LDASR improved the classification accuracy significantly. When RS–LDASR was applied to the combination of NMF and texture features, SVM was the best classifier with 100% accuracy; the accuracy of RoF was 64%. The second feature matrix consisted of NMF features alone. As with the previous feature matrix, after extracting several feature subspaces, the NMF features were used to train the classifiers with and without applying LDASR. The results showed significant improvement for both classifiers when RS–LDASR was used: the classification accuracy of SVM based on RS–LDASR was 91% and that of RoF was 70%. The reason for this significant difference between the RoF and SVM results lies in the nature of the RoF algorithm. The RoF classification method was recently proposed as a classification ensemble approach [23]. In this method, a classification ensemble is created using independent decision trees built on different sets of extracted features. What is notable in this classification method is the simultaneous application of feature extraction and reconstruction of the feature set for each classifier. To do this, the feature space is randomly divided into N cuts, each of which includes G features. Principal component analysis (PCA) is then applied to each cut and, using all the principal components, a new set of G linear features is constructed for each cut. New training data are formed in each cut from these G features, and each decision tree is trained on them. Through several iterations of this process, several classification results are created, and the final result is obtained by integrating them with the majority voting rule [23]. Therefore, in the case of RoF, the feature space is converted to a new space that differs from the output of the RS–LDASR algorithm. In the case of the SVM classifier, the feature matrix learned by RS–LDASR is used directly as input to train the classifier, with no further transformation, so the classification accuracies of the two classifiers differ. Notably, when RS–LDASR improves the classification accuracy, the accuracy of SVM is much higher than that of RoF. This demonstrates the usefulness and importance of the features selected by RS–LDASR, which, unlike in the RoF classifier, were used directly in the SVM with no transformation phase.
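For comparison with RS–LDASR's direct use of the learned features, a simplified sketch of the RoF idea just described is given below (not the authors' exact implementation; it assumes more training samples than features per cut, skips the per-cut bootstrap of the original RoF [23], and uses our own names):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

def rotation_forest_predict(X_train, y_train, X_test, n_trees=10, n_cuts=4, seed=0):
    """Simplified rotation forest: PCA on random feature cuts, trees, majority vote."""
    rng = np.random.default_rng(seed)
    d = X_train.shape[1]
    votes = []
    for t in range(n_trees):
        cuts = np.array_split(rng.permutation(d), n_cuts)   # N random cuts of ~G features
        R = np.zeros((d, d))                                # block-diagonal rotation matrix
        for cut in cuts:
            # keep all principal components of this cut (assumes n_samples >= len(cut))
            pca = PCA().fit(X_train[:, cut])
            R[np.ix_(cut, cut)] = pca.components_.T
        tree = DecisionTreeClassifier(random_state=t).fit(X_train @ R, y_train)
        votes.append(tree.predict(X_test @ R))
    votes = np.stack(votes)                                 # n_trees x n_test labels
    return np.array([np.bincount(col).argmax() for col in votes.T])  # majority vote
```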
The third feature combination joined MAPs and textural features. MAPs have not previously been used for the classification of fused radar and optical data, so the current study provided an opportunity to evaluate the capability of these features for this task. To construct this feature combination, APs for the area and diagonal characteristics were extracted, and the wavelet transform and GLCM were used for texture feature extraction; all extracted features were used to construct the training and testing feature matrices. To train the classifiers, several RSs were selected from the full feature matrix by the method described by Xia et al. [31], and the subspaces were used to train the classifiers with and without implementing LDASR. Comparison of the results shows that, for this feature combination, using the RSs directly to train the classifiers improved the classification accuracy, whereas applying LDASR to the RSs decreased it. With this feature combination, RoF using RSs was the best classifier with 100% accuracy; the accuracy of SVM was 57%. The final feature combination consisted of APs (based on area), NMF, and textural features. The results show that using the RSs directly to train the classifiers improved their accuracy without the need for feature learning using LDASR. Table 2 shows the overall classification accuracy and Fig. 6 shows the resulting maps.

Table 2. Results of classification methods based on RSs and RS–LDASR (overall accuracy, %)
Feature set | SVM, unlearned | SVM, learned | RoF, unlearned | RoF, learned
NMF features | 24 | 91 | 24 | 70
NMF and texture features | 29 | 100 | 57 | 64
MAPs and texture features | 57 | 29 | 100 | 79
APs, NMF, and texture features | 64 | 14 | 100 | 71

Fig. 6: Classification based on (a) RoF using RSs of the APs, NMF, and textural feature matrix; (b) SVM using RSs of the APs, NMF, and textural feature matrix; (c) RoF using RSs of the MAPs and textural feature matrix; (d) SVM using RSs of the MAPs and textural feature matrix; (e) RoF using the unlearned NMF and textural feature matrix; (f) SVM using the unlearned NMF and textural feature matrix; (g) RoF using the NMF and textural feature matrix learned by RS–LDASR; (h) SVM using the NMF and textural feature matrix learned by RS–LDASR.

Xia et al. [31] noted that spatial features such as APs and MAPs are used as inputs to classifiers without any preprocessing. This explains why applying LDASR to the feature combinations containing APs decreased the classification accuracy: the high level of spectral–spatial information in APs and MAPs can distinguish between classes with high accuracy without the need for learning or other preprocessing phases. However, because of the high dimensionality of these features, especially in the case of image fusion, RS is helpful for feature-space dimensionality reduction and, based on the results, can improve the classification accuracy. For other features, such as NMF plus textural features and NMF features alone, feature learning with RS–LDASR improved the classification accuracy significantly. It can be concluded that the combination of APs, NMF, and textural features based on RSs produced the best RoF classification accuracy, and the combination of NMF and texture features based on RS–LDASR obtained the best SVM classification accuracy, compared with the other feature combinations for fused radar and optical data. Therefore, feature learning is unnecessary only when spatial features are used; for other types of features, feature selection and learning are necessary to improve the classification accuracy. RS–LDASR significantly improved the classification accuracy of the NMF plus texture feature combination and even of the NMF features alone. Fig. 7 compares the classification results of the learned and unlearned feature combinations.
Fig. 7: Classification results of the fused Sentinel images using (a) NMF plus texture features; (b) NMF features.

4.3 Comparison of RS–LDASR with other feature selection methods on the Sentinel and ALOS fused images
To evaluate the applicability of RS–LDASR for dimensionality reduction and supervised feature selection and learning, the results of the proposed algorithm were compared with those of the following three filter-type feature selection methods: (i) ReliefF, which evaluates features based on their ability to distinguish close instances [38]; (ii) the Laplacian score, which is based on the observation that pieces of data from the same class are often similar, can be used in a supervised or unsupervised fashion, and evaluates features by their Laplacian score, i.e. their power in locality preservation [39]; and (iii) the multi-cluster feature selection (MCFS) method, which can also be performed in either a supervised or unsupervised fashion and selects features such that the multi-cluster structure of the data is best preserved [40]. Table 3 and Fig. 8 compare the classification accuracies of the four feature selection methods. RS–LDASR improved the classification accuracy significantly for NMF plus textural features and for NMF features compared with the other methods: it increased the RoF classification accuracy from 57 to 64% and the SVM classification accuracy from 29 to 100% using the combination of NMF and texture features, and improved the RoF classification accuracy from 24 to 70% and the SVM classification accuracy from 24 to 91% using NMF features. Figs. 8a and b show that RS–LDASR obtained the best results, especially with SVM. In general, RS–LDASR was the best method for the classification of fused SAR and optical data using NMF and textural features, and even using NMF features alone.

Table 3. Classification accuracy (%) of different feature selection methods using the Sentinel fused images
Feature set | ReliefF [38] (RoF / SVM) | Laplacian score [39] (RoF / SVM) | MCFS [40] (RoF / SVM) | RS–LDASR (RoF / SVM)
NMF plus texture | 14 / 0 | 57 / 29 | 14 / 57 | 64 / 100
NMF | 48 / 46 | 20 / 20 | 53 / 14 | 70 / 91

Fig. 8: Comparison of classification accuracy. (a) On the basis of
Publication Year: 2016
Publication Date: 2016-11-29
Language: en
Type: article
Indexed In: ['crossref']
Cited By Count: 49