Moving objects classification via category-wise two-dimensional principal component analysis

Classifying moving objects in video sequences has been extensively studied, yet it remains an open problem. In this paper, we propose to solve the moving object classification problem via an extended version of two-dimensional principal component analysis (2DPCA), named category-wise 2DPCA (CW2DPCA). A key component of CW2DPCA is to independently construct optimal projection matrices from object-specific training datasets and produce category-wise feature spaces, wherein each feature space uniquely captures the invariant characteristics of the underlying intra-category samples. Consequently, CW2DPCA enables early separation among the different object categories and, at the same time, extracts effective discriminative features for representing both the training datasets and the test object samples in the classification model, which is a nearest neighbor classifier. For ease of exposition, we consider human/vehicle classification, although the proposed CW2DPCA-based classification framework can easily be generalized to multiple object classification. The experimental results demonstrate the effectiveness of CW2DPCA features in discriminating between humans and vehicles in two publicly available video datasets.

The publisher wishes to inform readers that the article "Moving objects classification via category-wise two-dimensional principal component analysis" was originally published by the previous publisher of Applied Computing and Informatics and the pagination of this article has been subsequently changed. There has been no change to the content of the article. This change was necessary for the journal to transition from the previous publisher to the new one. The publisher sincerely apologises for any inconvenience caused. To access and cite this article, please use Alsaqre, F., Almathkour, O. (2020), "Moving objects classification via category-wise two-dimensional principal component analysis", New England Journal of Entrepreneurship, Vol. ahead-of-print No. ahead-of-print. https://10.1016/j.aci.2019.02.001. The original publication date for this paper was 19/02/2019.


Introduction
Moving object classification (MOC) in video sequences is an active research area owing to its potential to extend the capabilities of a wide range of vision-based systems, including video surveillance, traffic monitoring and analysis, and security applications. It aims to correctly assign moving objects in dynamic scenes to their respective categories and, in turn, extract detailed information that helps in understanding object behaviors and assessing events within the observed areas of interest. Despite its importance, MOC remains one of the challenging topics in computer vision, especially for outdoor vision-based systems. The primary obstacles to object classification stem from unavoidable factors such as the uncontrollability of outdoor conditions and the complexity of background scenes. Furthermore, the moving objects that often appear in the field of view include humans, vehicles, motorcycles, and bicycles. As these objects move, their appearances and motions can vary drastically, bringing further difficulties to the classification task.
There are various existing methods for classifying moving objects in video sequences. One of the important problems in these methods is differentiating dynamic objects from background scenes, and there are basically two approaches to it. One is to apply a class-specific detector (e.g., for humans [1] or cars [2]) at each frame location. However, besides their limited generalizability, such detectors are computationally intensive and inadequate for low-resolution video sequences. The other is to perform object segmentation prior to the classification process. Rather than applying class-specific detectors, the vast majority of MOC methods assume a stationary camera and hence benefit from background modelling techniques, mainly adaptive background subtraction (BS) [3] and the Gaussian mixture model (GMM) [4], to segment multiple objects regardless of their types. Note that these techniques are still not fully satisfactory in terms of performance and accuracy, but so far no appropriate alternatives are available [5]. In this work, we follow the latter approach; more specifically, we segment moving objects by means of adaptive BS.
As in all prior relevant works, once moving objects are segmented, the standard classification scheme consists of two steps. First, feature extraction is performed to identify descriptors/signatures that properly characterize both the individual object and the predefined class to which it belongs. Second, each object is assigned to its most likely category by applying one or more classification models. Naturally, existing works vary greatly in terms of targeted moving objects, types and numbers of exploited features, and employed classification models. Besides, the amounts and distributions of processed video sequences differ considerably. As such, there are substantial differences among the reported classification performances, impeding meaningful quantitative comparison.
In general, available MOC methods can be grouped according to the different features by which the moving objects are described. Regularly used features, individually or in combination, include shape, motion, and texture features.
In shape-based methods, object geometrical properties (dispersedness, silhouette, aspect ratio, area, etc.) are utilized as crucial features for classification. An early example is the work of Lipton et al. [6], which uses dispersedness as a classification metric to discriminate between humans and vehicles. Silhouette-based classification is reported in [7], where the current silhouette is matched against a set of prelabeled silhouettes using a distance function. Lin and Wei [8] used the height/width ratio to determine the object category according to predefined thresholds. Often, a single shape feature does not suffice to classify different object types; for instance, dispersedness may cause a human group to be misclassified as a vehicle or vice versa. A straightforward alternative is to use a mixture of shape features. For example, Collins et al. [3] adopted dispersedness, aspect ratio, area, and zoom factor to train a neural network (NN) classifier for categorizing moving objects. The effectiveness of various shape features in conjunction with NN, support vector machine (SVM), and support vector data description (SVDD) classifiers was investigated by Hota et al. [9]. However, notwithstanding their simplicity and ease of implementation, shape-based methods are unable to accommodate the diverse variabilities in object appearance.
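The following Python/NumPy sketch illustrates two of the shape features mentioned above for a binary object mask. It computes dispersedness in the Lipton et al. form, perimeter squared divided by area, and the bounding-box height/width ratio. The function names are hypothetical, and so is the perimeter estimate, which counts boundary pixels. Neither is taken from the cited works.

```python
import numpy as np


def dispersedness(mask: np.ndarray) -> float:
    """Perimeter**2 / area of a binary blob.

    Assumption: the perimeter is approximated by the number of foreground
    pixels that have at least one 4-connected background neighbour.
    """
    m = np.asarray(mask, dtype=bool)
    area = int(m.sum())
    if area == 0:
        return 0.0
    p = np.pad(m, 1, constant_values=False)
    # A pixel is interior if its up, down, left and right neighbours are all foreground.
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    perimeter = int((m & ~interior).sum())
    return perimeter ** 2 / area


def height_width_ratio(mask: np.ndarray) -> float:
    """Height/width ratio of the blob's bounding box."""
    m = np.asarray(mask, dtype=bool)
    rows = np.flatnonzero(m.any(axis=1))
    cols = np.flatnonzero(m.any(axis=0))
    return (rows[-1] - rows[0] + 1) / (cols[-1] - cols[0] + 1)


# A compact 10x10 blob versus an elongated 2x50 blob with the same area.
square = np.zeros((20, 20), dtype=bool)
square[5:15, 5:15] = True
strip = np.zeros((10, 60), dtype=bool)
strip[4:6, 5:55] = True
print(dispersedness(square), dispersedness(strip))  # 12.96 100.0
```

Both blobs cover 100 pixels, yet the elongated one is far more dispersed. This is the kind of difference such metrics rely on.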
Motion-based methods use temporal information to characterize either the entire object or local distinctive patterns. An interesting motion cue is the repetitive movement exhibited by non-rigid articulated objects. In this regard, Lipton [10] considered residual flow as a measure of both the rigidity and the periodicity of dynamic objects. Cutler and Davis [11] detected and characterized periodic object movement via self-similarity-based time-frequency analysis. In [12], Javed and Shah established the recurrent motion image (RMI) to encode the recurrent motion of object parts based on silhouette changes recovered in consecutive frames. Later, Yogameena et al. [13] and Landabaso et al. [14] followed a similar approach, but replaced the silhouette with a star skeleton and a blob, respectively. Conspicuously, these methods rely heavily on repetitive motion and therefore cannot be applied when dynamic objects perform complex and/or unconstrained movements. Other methods classify events as involving humans or vehicles using classification models (e.g., an AdaBoost network [15], a Bayesian classifier [16]) built on training data containing labeled trajectories. This sort of classification, however, is applicable only to situations where the moving objects tend to generate specific trajectory information.
A group of methods attempts to solve the classification problem by exploiting objects' geometrical properties and motion information in a complementary manner. Of these, Zhou and Aggarwal [17] showed that the variance of motion direction yields good performance in classifying humans and vehicles, whereas the variance of shape compactness discriminates well between a human and a human group; in doing so, they used a K-nearest neighbor (KNN) classifier. Zhaoxiang et al. [18] introduced an unsupervised framework to classify humans, vehicles, and bicycles, using a 5D feature vector formed from shape and motion descriptors (size, compactness, area, velocity, and parameterized angle) coupled with K-means clustering and decision-level fusion. Bose and Grimson [19] distinguished between humans and vehicles using a discriminative SVM with a soft margin and a Gaussian kernel. They used the mutual information between candidate features and a labeled dataset as a scoring criterion to select the informative features, and grouped them into scene-invariant features (orientation, variation in area, and percentage occupancy) and scene-specific features (image coordinates, motion direction, speed, area, and aspect ratio). However, several drawbacks, mainly attributed to the instability of object features across different scenes, limit the application of this group of methods [20].
Texture features have also been exploited for MOC due to their ability to encode various types of visual information within the object region. Zhang et al. [21] proposed to apply the AdaBoost learning algorithm with multi-block local binary patterns and error-correcting output codes to categorize moving objects. In the work by Liang and Juang [22], local shape features and histograms of oriented gradients (HOGs) are adopted to train a hierarchical SVM classifier for differentiating between humans, cars, motorcycles, and bicycles from their side-view imagery. For more realistic scenarios, texture features are also combined with shape and/or motion features. In this vein, Miller et al. [23] classified humans and vehicles with a linear SVM model using a 9D feature vector containing eight dimensions of edge histograms and one dimension of aspect ratio. Longbin et al. [24] also tackled the human/vehicle classification problem. In their work, object size, location, and velocity are combined with the differences between the object's HOGs calculated in consecutive frames, and classification is posed as a maximum a posteriori (MAP) problem. Gurwicz et al. [25] considered a broad range of features such as luminance asymmetry, 2D moments, cumulants, and morphological properties. They employed five classification techniques (SVM, NN, Bayesian network, decision tree, and KNN) to classify body organs, humans, human groups, bags, and clutter. Civelek and Yazici [26] combined speeded-up robust features (SURF) and shape features (aspect ratio, blob ratio, dispersedness, and compactness) in cascade mode to classify humans, human groups, and vehicles via a KNN classifier. Nevertheless, one persistent drawback of texture-based methods and their combinations is that they require extensive training datasets, which are often impractical to collect.
Further, common to all of them is the assumption that both training datasets and testing exemplars are gathered from the same video data, which limits their generalization to new video sequences and classes.
Quite clearly, none of the existing MOC methods can be regarded as prevailing or entirely satisfactory. It is therefore worth pursuing a different strategy through which better solutions to the MOC problem can be attained. Towards this end, we propose to leverage the invariant characteristics of each object type to remedy as many of these deficiencies as possible. An effective way to capture invariant object characteristics is principal component analysis (PCA) [27], which is extensively exploited in face recognition to generate Eigenfaces as compact representations of face images in a lower-dimensional space. Inspired by the Eigenfaces technique, a few attempts have been made to classify single-category objects such as pedestrians [28] and vehicles [29]. In this paper, we revisit and extend the application of PCA to multiple object classification, specifically its 2D version (2DPCA) [30], which, as the name implies, deals directly with 2D images instead of 1D vectorized images. Despite its simplicity, 2DPCA yields a significant improvement in classification performance over traditional PCA, since it effectively preserves the structural relations among dataset samples in a much lower-dimensional representation and allows more spatial information to be included in the produced features. Most importantly, our application of 2DPCA in MOC differs significantly from its standard application in face recognition, in which 2DPCA is used to map the whole face dataset (the complete set of labeled face images belonging to certain individuals/classes) from the original space into a feature space. In contrast, we utilize 2DPCA in a more generalized manner by applying it independently to object-specific training datasets to generate category-wise feature spaces, such that each feature space uniquely captures the invariant characteristics of the underlying object category.
That is, provided each training dataset covers a sufficient range of object appearance conditions against a uniform background, the retained features convey most of the energy of the training samples along with some useful local information. Consequently, category-wise 2DPCA (CW2DPCA) not only enables early separation between the different object categories, but also provides effective discriminative representations of both the training datasets and the test samples to the classification model. In practice, the resulting classification framework exhibits strong resistance to variability in object appearance and is inherently insensitive to object movement. Note that the presented framework is explicitly designed to classify moving objects without exploiting tracking information. Also note that, while our MOC method can be used to solve almost any multiclass classification problem, the formulations and experiments presented here are confined to human/vehicle classification.
The remainder of this paper is structured as follows. Section 2 reviews the application of PCA and 2DPCA in face recognition. Details of the proposed MOC method are provided in Section 3. Experimental results are presented in Section 4; and finally, Section 5 concludes the paper.

Review of PCA and 2DPCA
In this section, we will briefly describe the procedures in PCA-based and 2DPCA-based face recognition methods.

Principal component analysis
The standard procedure in PCA-based face recognition methods is to represent 2D face images as 1D vectors in image space and project these vectors onto a small set of principal components (PCs) to extract the most expressive features while pruning irrelevant information. These PCs are the leading eigenvectors of the covariance matrix of the input images.
Let $\mathcal{A} = \{A_i\}_{i=1}^{N}$, $A_i \in \mathbb{R}^{p\times q}$, be a training set of $N$ face images exclusively partitioned into classes of individuals. Let $\Lambda = \{a_i\}_{i=1}^{N}$, $a_i \in \mathbb{R}^{pq}$, be the set obtained by vectorizing each image of $\mathcal{A}$. Let $P = [a_1 - \bar{a}, \ldots, a_N - \bar{a}] \in \mathbb{R}^{pq\times N}$, where $\bar{a}$ denotes the mean vector of $\Lambda$. The covariance matrix of $\Lambda$ is then defined as $C = PP^{T} \in \mathbb{R}^{pq\times pq}$. The image vector space is high-dimensional, so computing $C$ directly is very difficult. PCA is therefore very often solved via the eigen-decomposition of the much smaller matrix $L = P^{T}P \in \mathbb{R}^{N\times N}$ [31]. Suppose that $E = [e_1, \ldots, e_d] \in \mathbb{R}^{N\times d}$ holds the eigenvectors of $L$ corresponding to its $d$ largest eigenvalues. Then $X = PE \in \mathbb{R}^{pq\times d}$ gives the PCs of $\Lambda$. This works because $Le_i = \lambda_i e_i$ implies $C(Pe_i) = P(P^{T}P)e_i = \lambda_i(Pe_i)$, so each $Pe_i$ is an eigenvector of $C$ with the same eigenvalue. In other words, the columns of $X = [x_1, \ldots, x_d]$ are the eigenfaces spanning a $d$-dimensional subspace (facespace) of the image vector space. Once the facespace is established, the training images and a given test image $a \in \mathbb{R}^{pq}$ are projected onto this subspace. This produces the weight vectors $Z_\Lambda = X^{T}P \in \mathbb{R}^{d\times N}$ and $Z_a = X^{T}(a - \bar{a}) \in \mathbb{R}^{d}$, respectively. By measuring the similarity between $Z_a$ and each column of $Z_\Lambda$, $a$ can be assigned to its relevant class.
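The $N \times N$ eigen-decomposition trick described above can be sketched in NumPy as follows, assuming the training images are stacked in an array of shape (N, p, q). The function names are hypothetical. As an assumption not stated in the text, the columns of $X = PE$ are rescaled to unit length, which is common practice. $d$ must not exceed the rank of $P$ (at most $N - 1$), so that no zero-norm column arises.

```python
import numpy as np


def fit_eigenfaces(images: np.ndarray, d: int):
    """PCA via the N x N matrix L = P^T P instead of the pq x pq covariance C."""
    n = images.shape[0]
    vectors = images.reshape(n, -1).T.astype(float)    # pq x N, columns a_i
    a_bar = vectors.mean(axis=1, keepdims=True)         # mean vector
    P = vectors - a_bar                                  # centred data, pq x N
    L = P.T @ P                                          # N x N
    eigvals, eigvecs = np.linalg.eigh(L)                 # ascending eigenvalues
    E = eigvecs[:, np.argsort(eigvals)[::-1][:d]]        # d leading eigenvectors, N x d
    X = P @ E                                            # eigenfaces, pq x d
    X /= np.linalg.norm(X, axis=0, keepdims=True)        # unit-length columns (assumption)
    Z = X.T @ P                                          # training weights Z_Lambda, d x N
    return a_bar, X, Z


def classify_pca(image: np.ndarray, a_bar, X, Z, labels):
    """Nearest neighbour in facespace using Euclidean distance."""
    z = X.T @ (image.reshape(-1, 1).astype(float) - a_bar)   # Z_a, d x 1
    distances = np.linalg.norm(Z - z, axis=0)
    return labels[int(np.argmin(distances))]
```

Because $L$ is only $N \times N$, the eigenproblem stays small even though each vectorized image has $pq$ entries.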

2D principal component analysis
As opposed to PCA, 2DPCA directly transforms a training set of 2D face images into a set of training feature matrices without vectorization. Essentially, 2DPCA seeks an optimal projection matrix whose column vectors are the optimal projection axes that maximize the total scatter of the projected images. It has been proven that these axes are the principal eigenvectors of the image scatter matrix corresponding to the largest eigenvalues [30]. In particular, consider the set $\mathcal{A}$, and let $\bar{A}$ be the mean image of all training samples. The total image scatter matrix is $S = \frac{1}{N}\sum_{i=1}^{N}(A_i - \bar{A})^{T}(A_i - \bar{A}) \in \mathbb{R}^{q\times q}$. Applying eigen-decomposition to $S$, the eigenvectors associated with its $k$ largest eigenvalues form the optimal projection matrix $U_{\mathrm{opt}} = [u_1, \ldots, u_k] \in \mathbb{R}^{q\times k}$. The 2DPCA transformation is then applied to each training image, resulting in a set of training feature matrices $Y_i = A_i U_{\mathrm{opt}} \in \mathbb{R}^{p\times k}$, $i = 1, \ldots, N$. For a given test image $T \in \mathbb{R}^{p\times q}$, its feature matrix $Y_T = T U_{\mathrm{opt}} \in \mathbb{R}^{p\times k}$ is compared with each training feature matrix. $T$ is ascribed to the class whose training samples yield the highest similarity.
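Here is a minimal NumPy sketch of 2DPCA as summarized above, assuming images of shape (N, p, q). The function names are hypothetical. The scatter matrix uses the $1/N$ normalization, which does not affect its eigenvectors. As an assumption, the Frobenius distance between feature matrices stands in for the similarity measure.

```python
import numpy as np


def fit_2dpca(images: np.ndarray, k: int):
    """Return the mean image, U_opt (q x k) and training features Y_i = A_i U_opt."""
    A = np.asarray(images, dtype=float)                 # N x p x q
    A_bar = A.mean(axis=0)
    D = A - A_bar
    # S = (1/N) * sum_i (A_i - A_bar)^T (A_i - A_bar), a q x q matrix.
    S = np.einsum("nij,nik->jk", D, D) / A.shape[0]
    eigvals, eigvecs = np.linalg.eigh(S)                # S is symmetric
    U_opt = eigvecs[:, np.argsort(eigvals)[::-1][:k]]   # k leading eigenvectors
    Y = A @ U_opt                                       # N x p x k
    return A_bar, U_opt, Y


def classify_2dpca(T: np.ndarray, U_opt, Y, labels):
    """Assign T the label of the training feature matrix closest to Y_T = T U_opt."""
    Y_T = np.asarray(T, dtype=float) @ U_opt            # p x k
    distances = np.linalg.norm((Y - Y_T).reshape(len(Y), -1), axis=1)  # Frobenius
    return labels[int(np.argmin(distances))]
```

The eigenproblem here involves only a $q \times q$ matrix (the number of image columns). The vectorized PCA above works with a $pq \times pq$ covariance, which is why the 2DPCA decomposition is cheap by comparison.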

The proposed method
The framework of the proposed MOC method is illustrated in Figure 1. Given a video sequence, object segmentation is first conducted to separate foreground objects from the background of each observed frame using adaptive BS. Next, feature extraction is carried out using CW2DPCA. In the training phase, since our focus here is on classifying two object categories (humans and vehicles), two disjoint training datasets are used. Each comprises a relatively small number of object images of one category with diverse appearance conditions and a uniform background. 2DPCA is then applied to each training dataset, yielding two category-wise optimal projection matrices. It follows that two sets of training feature matrices are derived, each capturing the invariant characteristics of its respective object category. During the testing phase, two test feature matrices are generated for each test object segment, one by each optimal projection matrix. This representation reduces the subsequent classification to computing the minimum distance between each test feature matrix and its corresponding set of training feature matrices (i.e., those obtained with the same optimal projection matrix). The test object image is then assigned to the category yielding the smaller of the two distances. To this end, a nearest neighbor classifier based on Euclidean distance is employed.

Dynamic objects segmentation
Despite the existence of many sophisticated segmentation techniques [32,33], BS still occupies, by far, a prime position in this context because of its algorithmic simplicity and low computational cost. On the other hand, BS is susceptible to environmental conditions and subtle variations in the background scene. It also fails when the background geometry is modified, for example, when a moving object becomes static or vice versa. To circumvent these issues, a common practice is to employ an adaptation mechanism that learns the background scene over time [34]. Thus, the segmentation procedure entails background modelling and maintenance as well as foreground object detection. With this in mind, we tackle the segmentation problem by means of adaptive BS.
To do so, we first construct a reference background model by averaging a set of successive background frames free of dynamic objects. We then exploit the concept of exponential forgetting [35] to recursively update the background scene as follows:
$$B_n = \alpha F_n + (1-\alpha) B_{n-1}, \tag{1}$$
where $\alpha \in [0, 1]$ is a learning constant, typically set to 0.05, and $B_n$ and $F_n$ are the background model and the observed frame at time instant $n$, respectively. In order to reveal object-like regions, the difference map between $F_n$ and $B_n$ is calculated and thresholded, resulting in a binary image
$$BI_n(x,y) = \begin{cases} 1, & |F_n(x,y) - B_n(x,y)| > \Theta_n,\\ 0, & \text{otherwise}.\end{cases}$$
Here, $\Theta_n$ is a dynamic threshold, empirically defined as $\Theta_n = 20 + 2.5\,\sigma(|F_n - B_n|)$, where $\sigma$ stands for the standard deviation. The $BI_n$ image is further processed by standard binary morphological operations to filter out small noise and alleviate possible false detections. This yields the final mask of moving objects $BF_n$; examples of $F_n$ and $BF_n$ images are given in Figure 2. After that, the foreground objects are separated by pixel-wise multiplication of $F_n$ and $BF_n$. As with most segmentation techniques, we extract a set of segments corresponding to the bounding boxes of the moving objects. Since different objects yield segments of different sizes, these segments are normalized to the height and width of the training samples. Therefore, at each $n$, the segmentation result is a set $O_n = \{O_{nr}\}_{r=1}^{w}$, $O_{nr} \in \mathbb{R}^{p\times q}$, composed of $w$ equal-size object segments with a uniform background. When no object exists in the scene, $O_n = \emptyset$.
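The segmentation steps above can be sketched with NumPy for grayscale frames. This is an illustrative reading of the text rather than the authors' code, and the function names are hypothetical. The sketch also makes several simplifications:

- The morphological clean-up is omitted; a routine such as `scipy.ndimage.binary_opening` could fill that role.
- Bounding boxes are passed in rather than found by connected-component labelling.
- Segments are resized with nearest-neighbour sampling, which the paper does not specify.

```python
import numpy as np


def init_background(frames) -> np.ndarray:
    """Reference background: average of object-free frames."""
    return np.mean(np.asarray(frames, dtype=float), axis=0)


def update_and_detect(frame, B_prev, alpha=0.05):
    """Eq. (1) update followed by thresholding with the dynamic threshold Theta_n."""
    F = np.asarray(frame, dtype=float)
    B = alpha * F + (1.0 - alpha) * B_prev         # B_n = alpha F_n + (1 - alpha) B_{n-1}
    diff = np.abs(F - B)                           # difference map |F_n - B_n|
    theta = 20.0 + 2.5 * diff.std()                # Theta_n = 20 + 2.5 sigma(|F_n - B_n|)
    return B, diff > theta                         # new background B_n, binary image BI_n


def normalize_segment(frame, mask, box, size=(50, 30), bg=255.0):
    """Crop box (r0, r1, c0, c1), set background pixels to bg, resize to size (p, q)."""
    r0, r1, c0, c1 = box
    crop = np.where(mask[r0:r1, c0:c1], frame[r0:r1, c0:c1], bg)
    h, w = crop.shape
    rows = np.arange(size[0]) * h // size[0]       # nearest-neighbour row indices
    cols = np.arange(size[1]) * w // size[1]       # nearest-neighbour column indices
    return crop[np.ix_(rows, cols)]


background = init_background([np.full((40, 40), 100.0)] * 5)
frame = np.full((40, 40), 100.0)
frame[10:20, 10:20] = 200.0                        # a bright synthetic "object"
background, BI = update_and_detect(frame, background)
segment = normalize_segment(frame, BI, (5, 25, 5, 25))
print(int(BI.sum()), segment.shape)                # 100 (50, 30)
```

The synthetic frame simply checks the pipeline shape: the 10 × 10 bright square exceeds the dynamic threshold, and the cropped segment is resampled to the 50 × 30 training size with a white background.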

Category-Wise 2DPCA
Classical 2DPCA, by nature, deals with single-object-type classification (e.g., faces). There, the training samples are partitioned into a number of pattern classes that all contribute equally to the computation of the total scatter matrix. Unfortunately, this is not the case in the multiple-object scenario. Directly applying 2DPCA to a training dataset containing samples of more than one object type leads to potential ambiguity in the definition of the total scatter, since different object types have different structural and spatial properties. Instead, we introduce CW2DPCA for multiple object classification, in which the scatter matrices are defined from the category perspective. This clearly quantifies the typical correlation between intra-category samples and, in turn, ensures that each object category is uniquely characterized by its own invariant features. Therefore, the main purpose of CW2DPCA is to transform each object-specific dataset from image space to feature space, such that the information preserved in the transformed space is discriminative while being separable prior to the classification task. Without loss of generality, we address a simplified two-object classification problem. Specifically, given a segmented object in a video sequence, the goal is to coarsely categorize it as either a human or a vehicle. We hence use two disjoint training datasets, one constructed from human images and the other from vehicle images, to learn an optimal projection for each object separately. Let $\mathcal{H} = \{H_i\}_{i=1}^{N}$ and $\mathcal{V} = \{V_i\}_{i=1}^{N}$, $H_i, V_i \in \mathbb{R}^{p\times q}$, be the human and vehicle training datasets, respectively, where $N$ is the number of samples in each dataset. The objective here is to obtain two category-wise optimal projection matrices, each of which maximizes the scatter of its respective intra-category samples. The scatter matrices $S_H$ and $S_V$ of $\mathcal{H}$ and $\mathcal{V}$ are therefore computed individually as
$$S_H = \frac{1}{N}\sum_{i=1}^{N}(H_i - \bar{H})^{T}(H_i - \bar{H}), \qquad S_V = \frac{1}{N}\sum_{i=1}^{N}(V_i - \bar{V})^{T}(V_i - \bar{V}).$$

[Figure: Resulting moving object masks after performing adaptive BS and morphological operations.]

Here, $\bar{H}$ and $\bar{V}$ denote the mean images of $\mathcal{H}$ and $\mathcal{V}$, respectively, and $S_H, S_V \in \mathbb{R}^{q\times q}$. By computing the eigen-decomposition of each scatter matrix, we form two optimal projection matrices $U^{H}_{\mathrm{opt}} = [h_1, \ldots, h_k] \in \mathbb{R}^{q\times k}$ and $U^{V}_{\mathrm{opt}} = [v_1, \ldots, v_k] \in \mathbb{R}^{q\times k}$. Here, $\{h_i \mid i = 1, \ldots, k\}$ and $\{v_i \mid i = 1, \ldots, k\}$ are the first $k$ dominant eigenvectors of $S_H$ and $S_V$, respectively. Accordingly, we derive two distinct sets of feature matrices, $\mathcal{Y}_H$ and $\mathcal{Y}_V$, for the training images of $\mathcal{H}$ and $\mathcal{V}$, respectively:
$$\mathcal{Y}_H = \{Y^{H}_i = H_i U^{H}_{\mathrm{opt}}\}_{i=1}^{N}, \qquad \mathcal{Y}_V = \{Y^{V}_i = V_i U^{V}_{\mathrm{opt}}\}_{i=1}^{N}.$$
Since we have defined two feature spaces (one per category), it becomes natural to base the classification task on a simple premise: the feature representation of a test sample from a certain category lies close only to that of the training data from the same category. To proceed, we first map the test object image onto each feature space and then determine the object membership using a nearest neighbor classifier, as detailed in the next subsection. That is, a test object segment $O_{nr}$ is projected onto $U^{H}_{\mathrm{opt}}$ and $U^{V}_{\mathrm{opt}}$ to obtain its feature matrices $O^{H}_{nr} = O_{nr} U^{H}_{\mathrm{opt}}$ and $O^{V}_{nr} = O_{nr} U^{V}_{\mathrm{opt}}$, respectively. Thus, the evidence for $O_{nr}$ belonging to either of the two categories is simply the highest similarity, within each feature space, between the feature matrix of $O_{nr}$ and the set of training feature matrices. To this end, we make use of a nearest neighbor classifier based on Euclidean distance in feature space.
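The category-wise training step amounts to one independent 2DPCA per object-specific dataset. In the hypothetical NumPy sketch below, each dataset is an array of shape (N, p, q), and the dictionary keys are illustrative category names. Neither the function names nor the random arrays come from the paper; the arrays are synthetic stand-ins sized like the 200-sample, 50 × 30 training sets.

```python
import numpy as np


def category_projection(samples: np.ndarray, k: int):
    """2DPCA on one object-specific dataset (e.g. H or V).

    Returns U_opt (q x k) and the training feature matrices (N x p x k).
    """
    X = np.asarray(samples, dtype=float)
    D = X - X.mean(axis=0)                                # subtract the category mean image
    S = np.einsum("nij,nik->jk", D, D) / X.shape[0]       # S_H or S_V, q x q
    eigvals, eigvecs = np.linalg.eigh(S)
    U_opt = eigvecs[:, np.argsort(eigvals)[::-1][:k]]     # k dominant eigenvectors
    return U_opt, X @ U_opt                               # Y_i = X_i U_opt


def fit_cw2dpca(datasets: dict, k: int) -> dict:
    """One feature space per category; each scatter matrix sees only its own category."""
    return {name: category_projection(data, k) for name, data in datasets.items()}


rng = np.random.default_rng(0)
model = fit_cw2dpca({"human": rng.random((200, 50, 30)),
                     "vehicle": rng.random((200, 50, 30))}, k=15)
U_H, Y_H = model["human"]
print(U_H.shape, Y_H.shape)    # (30, 15) (200, 50, 15)
```

Keeping the scatter matrices separate is the essential design choice. Pooling both datasets into a single $S$ would recover classical 2DPCA with one shared feature space.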

Dynamic objects classification
More formally, we first compute the minimum distance between $O^{H}_{nr}$ and $\mathcal{Y}_H$ as well as between $O^{V}_{nr}$ and $\mathcal{Y}_V$:
$$D^{H}_{nr} = \min_{1\le i\le N} \lVert O^{H}_{nr} - Y^{H}_i \rVert, \qquad D^{V}_{nr} = \min_{1\le i\le N} \lVert O^{V}_{nr} - Y^{V}_i \rVert,$$
where $\lVert\cdot\rVert$ refers to the Euclidean norm. Then, with $D_{nr} = \min(D^{H}_{nr}, D^{V}_{nr})$, the category whose distance attains $D_{nr}$ is the category membership of $O_{nr}$.
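The decision rule above can be sketched as follows. This is a hypothetical NumPy illustration, and it rests on a few assumptions:

- The matrix "Euclidean norm" is taken to be the Frobenius norm.
- The per-category 2DPCA helper is repeated so that the block runs on its own.
- The random arrays are synthetic stand-ins, not the paper's datasets.

```python
import numpy as np


def category_projection(samples, k):
    """2DPCA on one category: returns U_opt (q x k) and training features (N x p x k)."""
    X = np.asarray(samples, dtype=float)
    D = X - X.mean(axis=0)
    S = np.einsum("nij,nik->jk", D, D) / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(S)
    U = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    return U, X @ U


def classify_segment(segment, model):
    """Return (category, D_nr): the category whose feature space gives the minimum distance."""
    best_name, best_dist = None, np.inf
    for name, (U, Y) in model.items():
        O = np.asarray(segment, dtype=float) @ U                       # O^H_nr or O^V_nr
        dists = np.linalg.norm((Y - O).reshape(len(Y), -1), axis=1)    # Frobenius distances
        D = float(dists.min())                                          # D^H_nr or D^V_nr
        if D < best_dist:
            best_name, best_dist = name, D
    return best_name, best_dist


rng = np.random.default_rng(0)
H = rng.random((20, 50, 30))           # synthetic stand-in for the human dataset
V = rng.random((20, 50, 30)) + 5.0     # synthetic stand-in for the vehicle dataset
model = {"human": category_projection(H, 5), "vehicle": category_projection(V, 5)}
print(classify_segment(V[7], model)[0])   # vehicle
```

Each test segment is projected twice, once per feature space, and is compared only against training features produced by the same projection matrix. This is the property that keeps the two categories separated before the final decision.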

Experimental results
To evaluate the effectiveness of the proposed MOC method, we conducted experiments on two publicly available video datasets: PETS2000 and PETS2001 [36]. It is important to point out that the results of the presented CW2DPCA method can only be meaningfully compared with those of category-wise PCA (CWPCA) methods, which, however, do not yet exist in the literature. For a fair comparison, we compared our method with an extended version of the PCA-based vehicle classification framework introduced in [29]. That framework was originally designed to classify vehicles at a finer level and employed only a set of training vehicle images to generate the PCs (eigenvehicles). We extended it to CWPCA by following the same procedure as in CW2DPCA, except that the formulation is applied to vectorized versions of the input images.
Before giving the classification results, we first introduce the training datasets used to construct the category-wise feature spaces, and then illustrate the role of feature extraction in both CW2DPCA and CWPCA.

Training datasets
In order to define the bases of the category-wise feature spaces, we constructed two separate training datasets, one comprising human samples and the other vehicle samples. Each contains 200 samples spanning a sufficient range of object appearance conditions. The human and vehicle samples were manually segmented from the images of the Penn-Fudan database [37] and the Graz-02 dataset [38], respectively. The background intensity and size of each sample are set to 255 and 50 × 30 pixels, respectively. Figure 3 shows five samples from each dataset.

The role of feature extraction
We applied CW2DPCA to transform input images into feature matrices. According to the CW2DPCA formulation, the scatter matrices $S_H$ and $S_V$ are of size 30 × 30, so it is quite easy to form the optimal projection matrices $U^{H}_{\mathrm{opt}}$ and $U^{V}_{\mathrm{opt}}$ (and hence the feature matrices). For instance, when $k = 15$, both $U^{H}_{\mathrm{opt}}, U^{V}_{\mathrm{opt}} \in \mathbb{R}^{30\times 15}$. The two sets of training feature matrices then take the forms $\mathcal{Y}_H = \{Y^{H}_i\}_{i=1}^{200}$ and $\mathcal{Y}_V = \{Y^{V}_i\}_{i=1}^{200}$, where $Y^{H}_i, Y^{V}_i \in \mathbb{R}^{50\times 15}$, and the projected test feature matrices satisfy $O^{H}_{nr}, O^{V}_{nr} \in \mathbb{R}^{50\times 15}$. Although the feature matrices are relatively compact, they convey most of the energy of the original images [30] and preserve some local details that may be useful in distinguishing between different objects. That is to say, these feature matrices provide compact and meaningful descriptions of the content of the input images for classification. As evidence of this, Figure 4 depicts the reconstructed images of the first sample in each row of Figure 3 for $k = 1, 3, 6, 9, 12$, and 15. One can observe that the first few principal eigenvectors are sufficient to produce a good approximation of the original samples. For comparison, Figure 4 also depicts the images reconstructed by CWPCA (eigenhumans and eigenvehicles) with the number of PCs $d$ set to 10, 20, 30, 40, 50, and 60. As expected, CWPCA yields much lower reconstruction quality than CW2DPCA.
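The sketch below makes the reconstruction experiment concrete. It assumes the 2DPCA reconstruction rule $\tilde{A} = Y U^{T} = A U U^{T}$, as in [30], with $U$ the learned projection matrix. It reports the relative reconstruction error of one sample for the $k$ values used in Figure 4, plus $k = q = 30$. The data are random 50 × 30 placeholders rather than the Penn-Fudan or Graz-02 samples, so only the qualitative trend carries over, not any number. The function names are hypothetical.

```python
import numpy as np


def projection_matrix(samples, k):
    """k leading eigenvectors of the q x q image scatter matrix."""
    X = np.asarray(samples, dtype=float)
    D = X - X.mean(axis=0)
    S = np.einsum("nij,nik->jk", D, D) / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(S)
    return eigvecs[:, np.argsort(eigvals)[::-1][:k]]


def reconstruct(image, U):
    """A_tilde = (A U) U^T: map the p x k feature matrix back to image space."""
    return (image @ U) @ U.T


rng = np.random.default_rng(0)
data = rng.random((200, 50, 30))           # placeholder for one 200-sample training set
errors = []
for k in (1, 3, 6, 9, 12, 15, 30):
    U = projection_matrix(data, k)
    A_tilde = reconstruct(data[0], U)
    errors.append(float(np.linalg.norm(data[0] - A_tilde) / np.linalg.norm(data[0])))
print([round(e, 3) for e in errors])       # non-increasing in k; ~0 at k = q = 30
```

Because the leading $k$ eigenvectors are nested as $k$ grows, the error can only decrease. It vanishes at $k = q$, where $UU^{T}$ is the identity.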
Experiments on PETS2000 and PETS2001 video datasets
PETS2000 and PETS2001 are surveillance video sequences containing objects with diverse appearance conditions and motion patterns. These sequences also have some challenging factors such as illumination variations, complex backgrounds, and background modifications. PETS2000 consists of 1452 frames with a resolution of 480 × 640 pixels (height × width), containing only humans and vehicles. PETS2001 is composed of 3064 frames of size 576 × 768 pixels, but we considered only the first 2550 frames, in which the moving objects are solely humans and vehicles. In our experiments, the original frames of PETS2000 and PETS2001 are converted to gray scale and resized to 240 × 320 pixels. Again, CW2DPCA and CWPCA are used for feature extraction to distinguish between humans and vehicles. Note that the number of principal eigenvectors/components to be retained is generally user-defined. We therefore varied it incrementally from 1 to 15, so that 15 test runs were performed for each method on each video sequence.
In the segmentation procedure, we used the first 30 and 170 frames to infer the background scenes of PETS2000 and PETS2001, respectively, which were then updated using (1). It is worth mentioning that invalid object segments are excluded from subsequent classification. These segments mostly correspond to poorly or erroneously segmented objects and to dynamically occluded objects. Unfortunately, such segmentation results are almost inevitable without user interaction, which is beyond the scope of this paper. As a result, at the end of each test run, the total number of valid segmented objects within PETS2000 is 2028, of which 1302 are humans and 726 vehicles. Within PETS2001 it is 2764, of which 1983 are humans and 781 vehicles. Notice, furthermore, that the background intensity and size of each segment are set identically to those of the training samples.
In the following, we demonstrate the performance of CW2DPCA and compare it with CWPCA. Figures 5 and 6 display some examples of humans and vehicles in PETS2000 and PETS2001 that were correctly classified by CW2DPCA. Figures 7 and 8 show the classification accuracies for the moving objects within PETS2000 and PETS2001, respectively, when using CW2DPCA and CWPCA. As observed in Figure 7, the lowest classification accuracies produced by CW2DPCA are 84.10% for humans, 91.18% for vehicles, and 86.83% for total objects (humans and vehicles), at $k = 1$, 2, and 1, respectively. Figure 8 also shows that the lowest classification accuracies yielded by CW2DPCA are 78.67% for humans, 91.55% for vehicles, and 82.31% for total objects, all at $k = 1$. Both results indicate that CW2DPCA has a strong ability to classify moving objects even with a few principal eigenvectors. Furthermore, Figure 7 shows that the proposed method reached its highest classification accuracies of 92.70% for humans, 96.69% for vehicles, and 93.78% for total objects at $k = 5$, 8, and 7, respectively. Also, according to Figure 8, the highest classification accuracies achieved with CW2DPCA are 94.35% for humans, 97.95% for vehicles, and 94.79% for total objects at $k = 5$, 6, and 5, respectively. These results further affirm the effectiveness of the proposed method for moving object classification in these challenging video sequences.
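The PETS2001 minima for humans, vehicles, and total objects all occur at $k = 1$. The total figure can therefore be cross-checked as a count-weighted mean of the per-category accuracies, using the 1983 human and 782 vehicle segments... more precisely, the 1983 human and 781 vehicle segments reported above. The helper below is only an arithmetic illustration, and its name is hypothetical.

```python
def overall_accuracy(per_class: dict, counts: dict) -> float:
    """Count-weighted mean of per-category accuracies (in percent)."""
    total = sum(counts.values())
    return sum(per_class[c] * counts[c] for c in counts) / total


pets2001_k1 = overall_accuracy({"human": 78.67, "vehicle": 91.55},
                               {"human": 1983, "vehicle": 781})
print(f"{pets2001_k1:.2f}")   # 82.31
```

The result agrees with the reported 82.31%. The same check cannot be applied to the PETS2000 minima, because those occur at different values of $k$.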
As also noted in Figures 7 and 8, CW2DPCA consistently outperforms CWPCA for each moving object category. In Table 1, we report the performance of the CW2DPCA and CWPCA methods in terms of average classification accuracy. The results in Table 1 show that the average classification accuracies of CW2DPCA exceed those of CWPCA by 10% to 14%.
The presented method outperforms CWPCA, mostly owing to the efficient representation of the original images by CW2DPCA. Not surprisingly, however, the results of the two methods share some general trends. In particular, the classification accuracies of both methods tend to increase as the number of principal eigenvectors/components increases. As intuitively expected, the classification accuracies for vehicles are always higher than those for humans. This is fundamentally because humans are non-rigid, highly deformable objects that often appear relatively small within video frames, so the segmentation results may not capture their structures accurately. Further, apart from segmentation errors, misclassifications occur when moving objects appear too small to provide sufficient features and, to a lesser degree, when their appearances are not well covered by the training datasets.
Finally, we evaluated the computational efficiency of the CW2DPCA and CWPCA methods using unoptimized MATLAB code running on a laptop with an Intel Core i3 2.26 GHz CPU and 4 GB of RAM. Table 2 provides the runtime of the individual phases of each method in some of the conducted experiments, namely when both $k$ and $d$ are set to 1, 5, 10, and 15. The training times of both methods are very short, since the sample size and dimension of the training datasets are relatively small. Even so, CWPCA training times are slightly longer than those of CW2DPCA. As for segmentation and testing times, CWPCA also takes slightly longer than CW2DPCA. Table 2 also clearly shows that the computational time of each individual phase increases with the number of principal eigenvectors/components. Moreover, both CW2DPCA and CWPCA achieve 8 to 10 fps in their basic unoptimized implementations.

Conclusions
In this paper, we have proposed a CW2DPCA-based framework for classifying dynamic objects in video sequences. The basic idea of CW2DPCA is to construct category-wise optimal projection matrices from object-specific training datasets and then derive a feature space for each object category. As a result, CW2DPCA ensures early separation between different object categories and, at the same time, produces compact and discriminative features to characterize the training datasets and test object samples. Unlike other methods, our classification framework is able to accommodate variability in object appearance by virtue of CW2DPCA, and it is inherently insensitive to objects' motion patterns. The experimental results on two challenging video sequences confirm the performance of the presented framework. Although we have addressed human/vehicle classification in this paper, CW2DPCA can be straightforwardly extended to multiple object classification.