Distributed Representations of Wi-Fi Fingerprints from Non-Contextual Text-Embedding Techniques with Applications in Crowdsourcing Zone-Level Localization

Authors

  • Chotipon Pakdeethammasakul, Master’s Degree Program in Industrial Engineering, Faculty of Engineering, Chiang Mai University, Chiang Mai 50200, Thailand
  • Nirand Pisutha-Arnond, Department of Industrial Engineering, Faculty of Engineering, Chiang Mai University, Chiang Mai 50200, Thailand (ORCID: https://orcid.org/0000-0001-6101-3217)

DOI:

https://doi.org/10.48048/tis.2023.6739

Keywords:

Wi-Fi fingerprint, Indoor localization, Classification, Word embeddings

Abstract

Over the past decade, indoor localization systems have gained increasing attention and found widespread application in commercial and research environments. In particular, Wi-Fi fingerprint-based systems offer a lower-cost alternative to counterparts such as Bluetooth, ultra-wideband (UWB), and radio frequency identification (RFID) technologies, owing to the ubiquity of Wi-Fi access points (WAPs) in most buildings. However, the main disadvantage of fingerprint-based systems is the intensive survey effort required during system initialization and maintenance. This work explores a way to alleviate this limitation through a crowdsourcing approach to zone-level localization. Instead of relying only on labelled fingerprint data from trained surveyors, this approach also draws on the more readily attainable unlabelled fingerprint data collected by participating volunteers. The unlabelled data is used to augment the survey data in a process called pseudo-labelling, forming a more comprehensive training dataset for subsequent localization tasks. This semi-supervised approach keeps survey effort minimal during system initialization and maintenance. To enable such a solution, this work introduces a novel approach that employs non-contextual word-embedding techniques to construct distributed vector representations of fingerprint data. These representations address three challenges: (a) high memory requirements in the downstream tasks caused by the high-dimensional, non-distributed vector representations produced by the “standard” vector transformation; (b) the inclusion of an arbitrary value representing missing WAPs, which can affect the performance of the downstream localization tasks in a non-transparent manner; and, most importantly, (c) poor pseudo-labelling and semi-supervised zone-prediction performance caused by poor data separability in the feature space.
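The following is a rough illustration of challenges (a) and (b). It sketches one common form of the “standard” transformation, in which every WAP known in the building gets a fixed slot and undetected WAPs are filled with a placeholder. This is not the authors' code. The placeholder value of 100, the WAP identifiers, and the dataset sizes are hypothetical.

```python
# Minimal sketch of the "standard" fingerprint-to-vector transformation.
# Assumption: undetected WAPs are filled with an arbitrary placeholder (100 here).
MISSING_RSSI = 100  # hypothetical placeholder; the choice is arbitrary


def standard_vector(fingerprint, wap_ids, missing=MISSING_RSSI):
    """Map a {wap_id: rssi_dbm} scan onto a fixed-length vector over all known WAPs.

    Every WAP in the building gets one slot, so the vector length equals the
    number of WAPs no matter how few were actually detected in this scan.
    """
    return [fingerprint.get(wap, missing) for wap in wap_ids]


def dense_matrix_bytes(n_fingerprints, n_dims, bytes_per_value=8):
    """Raw storage for a dense float64 matrix (ignores container overhead)."""
    return n_fingerprints * n_dims * bytes_per_value


if __name__ == "__main__":
    wap_ids = ["WAP001", "WAP002", "WAP003", "WAP004", "WAP005"]
    scan = {"WAP002": -67, "WAP005": -81}  # only 2 of 5 WAPs detected
    print(standard_vector(scan, wap_ids))  # [100, -67, 100, 100, -81]

    # Hypothetical sizes: 20,000 scans over 500 WAPs vs. 50-dimensional vectors.
    print(dense_matrix_bytes(20_000, 500))  # 80000000
    print(dense_matrix_bytes(20_000, 50))   # 8000000
```

In this toy scan the placeholder fills three of the five slots. A downstream model sees it exactly as it would see a measured signal level, which shows how the arbitrary value could influence results in a non-transparent way. The storage for the dense matrix also grows with the total number of WAPs rather than with the chosen representation size.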
Choosing non-contextual text-embedding techniques, as opposed to their contextual counterparts, lowers the computational requirements of model training and distributed-representation generation. The model architectures are simpler (no deep learning), and no pre-trained model is needed to generate the distributed representations. To this end, we considered non-contextual word-embedding techniques commonly used in natural language processing, such as Word2Vec, GloVe, and Doc2Vec, for the distributed-representation transformation. We compared the resulting downstream performance with that of well-recognized dimensionality-reduction techniques such as PCA, Isomap, and UMAP. The results show that the Word2Vec and GloVe transformations outperform the other transformations in terms of the separability of fingerprint representations, pseudo-labelling performance, and semi-supervised zone-prediction accuracy. Given also their promising robustness against potential data inhomogeneity, the Word2Vec and GloVe transformations are recommended for constructing vector representations of fingerprints in crowdsourcing zone-level localization.
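The sketch below shows one way a fingerprint might be treated as text for a non-contextual embedding model. It assumes that each detected WAP becomes a token and that tokens are ordered from strongest to weakest RSSI. This tokenization is hypothetical, since the abstract does not specify one. Because undetected WAPs simply do not appear, no placeholder value is needed. A fingerprint vector is then formed by averaging the token vectors, a common way to pool Word2Vec or GloVe word vectors into a single document vector. Doc2Vec instead learns a per-document vector directly.

The `word_vectors` lookup stands in for a trained model. With gensim 4.x, for instance, the `wv` attribute of a trained `Word2Vec` model could play this role. The function names and toy vectors here are made up for illustration.

```python
from typing import List, Mapping, Optional, Sequence


def fingerprint_to_sentence(fingerprint: Mapping[str, float],
                            min_rssi: Optional[float] = None) -> List[str]:
    """Turn a {wap_id: rssi_dbm} scan into a 'sentence' of WAP tokens.

    Hypothetical scheme: tokens are ordered strongest-first (ties broken by ID),
    optionally dropping WAPs weaker than min_rssi. Undetected WAPs never appear,
    so no placeholder value is needed.
    """
    kept = [(wap, rssi) for wap, rssi in fingerprint.items()
            if min_rssi is None or rssi >= min_rssi]
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return [wap for wap, _ in kept]


def mean_pool(tokens: Sequence[str],
              word_vectors: Mapping[str, Sequence[float]]) -> Optional[List[float]]:
    """Average the vectors of tokens the model knows; None if it knows none."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


if __name__ == "__main__":
    scan = {"WAP002": -67.0, "WAP005": -81.0, "WAP009": -55.0}
    sentence = fingerprint_to_sentence(scan)
    print(sentence)  # ['WAP009', 'WAP002', 'WAP005']

    # Made-up 2-D vectors standing in for a trained Word2Vec/GloVe model.
    toy_vectors = {"WAP009": [0.5, 1.0], "WAP002": [1.5, 0.0], "WAP005": [1.0, 2.0]}
    print(mean_pool(sentence, toy_vectors))  # [1.0, 1.0]
```

Training would feed many such sentences to the embedding model. This sketch covers only the data preparation and pooling around that step. The resulting vector length is set by the embedding dimension, not by the number of WAPs in the building.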

HIGHLIGHTS

  • This work introduces a novel approach that employs non-contextual word-embedding techniques to construct distributed vector representations of Wi-Fi fingerprint data, facilitating pseudo-labelling and semi-supervised zone-prediction tasks in crowdsourcing zone-level localization
  • The benefits of employing word-embedding techniques are (a) lower memory requirements in the downstream tasks due to distributed vector representations; (b) no need for an arbitrary value representing missing WAPs, which can otherwise affect the performance of the downstream localization tasks in a non-transparent manner; and (c) improved pseudo-labelling and semi-supervised zone-prediction performance due to improved data separability in the feature space
  • The benefit of employing non-contextual techniques, as opposed to their contextual counterparts, is lower computational requirements for model training and distributed-representation generation, owing to simpler model architectures (no deep learning) and no need for a pre-trained model during distributed-representation generation
  • The results show that Word2Vec and GloVe transformations outperform other types of transformations in terms of separability in fingerprint representations, pseudo-labelling performance, and semi-supervised zone-prediction accuracy
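The next sketch makes the pseudo-labelling step concrete. It performs one round of confidence-thresholded pseudo-labelling on fingerprint vectors, such as pooled embeddings. The k-nearest-neighbour vote, Euclidean distance, `k`, and the `min_share` acceptance threshold are all hypothetical choices, because the abstract does not say which classifier or acceptance rule was used. A distance-based vote is only as reliable as the separation between zones in the feature space, which is why representation separability matters for pseudo-labelling.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Labelled = Tuple[Sequence[float], str]  # (fingerprint vector, zone label)


def knn_vote(x: Sequence[float], labelled: Sequence[Labelled], k: int) -> Tuple[str, float]:
    """Majority zone among the k nearest labelled vectors and that zone's vote share."""
    nearest = sorted(labelled, key=lambda item: math.dist(x, item[0]))[:k]
    zone, count = Counter(label for _, label in nearest).most_common(1)[0]
    return zone, count / len(nearest)


def pseudo_label(labelled: Sequence[Labelled], unlabelled: Sequence[Sequence[float]],
                 k: int = 3, min_share: float = 2 / 3
                 ) -> Tuple[List[Labelled], List[Sequence[float]]]:
    """One round of confidence-thresholded pseudo-labelling (hypothetical kNN rule).

    Returns the augmented training set (survey data followed by accepted
    pseudo-labelled scans) and the unlabelled vectors that were not accepted.
    """
    accepted: List[Labelled] = []
    rejected: List[Sequence[float]] = []
    for x in unlabelled:
        zone, share = knn_vote(x, labelled, k)
        if share >= min_share:  # keep only confident votes
            accepted.append((x, zone))
        else:
            rejected.append(x)
    return list(labelled) + accepted, rejected


if __name__ == "__main__":
    survey = [([0.0, 0.0], "Z1"), ([0.0, 1.0], "Z1"), ([1.0, 0.0], "Z1"),
              ([4.0, 0.0], "Z2"), ([4.0, 1.0], "Z2"), ([5.0, 0.0], "Z2")]
    crowd = [[0.2, 0.2], [4.5, 0.2], [2.2, 0.0]]  # the last scan sits between zones
    train, left_out = pseudo_label(survey, crowd, k=3, min_share=1.0)
    print(len(train), left_out)  # 8 [[2.2, 0.0]]
```

The augmented set could then train the final zone classifier, and rejected scans could be revisited in a later round. A stricter `min_share` trades fewer pseudo-labels for fewer mislabelled ones.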

Published

2023-08-28

How to Cite

Pakdeethammasakul, C., & Pisutha-Arnond, N. (2023). Distributed Representations of Wi-Fi Fingerprints from Non-Contextual Text-Embedding Techniques with Applications in Crowdsourcing Zone-Level Localization. Trends in Sciences, 20(11), 6739. https://doi.org/10.48048/tis.2023.6739