2) Matrix Factorization

The matrix factorization approach is an alternative representation of pLSA. The word-frequency matrix that defines the data set is a very large and sparse matrix: it has one row per document d and one column for each of the k distinct words that appear in the corpus. The matrix is sparse because each document uses only a small fraction of the vocabulary, depending on its topic. Dimensionality reduction is therefore natural for the word-frequency matrix, since most of its entries are zero and carry no specific information. It can be achieved by approximating the co-occurrence matrix (denoted F) as the product of two low-rank (thinner) matrices P and R:

F ≈ F̂ = P·R

If P has dimension X×Y and R has dimension Y×Z, with Y ≪ X, Z, this achieves a dimensionality reduction, and the matrices P and R also reveal some of the latent structure of the data.

pLSA performs exactly such a matrix factorization of the joint distribution P(d, w):

F = P·Q·R

where:
P consists of the document probabilities P(d|z);
Q is a diagonal matrix of the topic prior probabilities P(z);
R consists of the word probabilities P(w|z).

These matrices represent probability distributions and are therefore non-negative and normalized.

pLSA: procedural view

The accuracy of pLSA's results has made it a common tool in practice; examples of its use include topic detection, tracking word usage in a corpus, and image classification models. Scene classification with pLSA has two main phases: training and testing.

Fig 3. The complete design of the pLSA formulation, showing its main phases: image training, BOW f...
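The factorization F = P·Q·R above can be illustrated with a small sketch in NumPy. The sizes (4 documents, 6 words, 2 topics) and the random parameters are hypothetical, chosen only to show that the product of properly normalized P, Q, and R yields a valid non-negative joint distribution P(d, w), as the text states:

```python
import numpy as np

# Hypothetical toy sizes: 4 documents, 6 words, 2 latent topics.
rng = np.random.default_rng(0)
n_docs, n_words, n_topics = 4, 6, 2

# Random, properly normalized pLSA parameters.
P = rng.random((n_docs, n_topics))    # P[d, z] plays the role of P(d|z)
P /= P.sum(axis=0, keepdims=True)     # each topic's document distribution sums to 1
R = rng.random((n_topics, n_words))   # R[z, w] plays the role of P(w|z)
R /= R.sum(axis=1, keepdims=True)     # each topic's word distribution sums to 1
q = rng.random(n_topics)
q /= q.sum()                          # topic priors P(z)
Q = np.diag(q)

# F[d, w] = sum_z P(z) P(d|z) P(w|z): a low-rank, non-negative factorization.
F = P @ Q @ R
assert np.isclose(F.sum(), 1.0)       # F is a valid joint distribution
assert (F >= 0).all()                 # non-negative, as required of probabilities
```

In a real pLSA fit, P, Q, and R would be estimated by EM from the observed word-frequency matrix rather than drawn at random; the sketch only demonstrates the structure of the decomposition.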