Abstract:
This study presents an AI-driven Neural Compiler for Augmented Reality (AR) AI Assistants that leverages deep learning and swarm intelligence methods to deliver efficient real-time performance. The system uses a dual-input framework, combining grayscale facial image data with a labelled vocabulary dataset, to support multi-modal user input. By incorporating Convolutional Neural Networks (CNNs) for facial image recognition and Long Short-Term Memory (LSTM) networks for sequential language processing, the compiler responds to both visual and textual commands. Facial landmark detection is performed with MediaPipe for accurate facial tracking and expression analysis. Particle Swarm Optimization (PSO) tunes hyperparameters and neural weights to improve convergence and accuracy. A key architectural innovation is the compiler's ability to operate offline on edge AR devices, minimizing latency and preserving privacy. The training pipeline incorporates data augmentation, real-time feedback correction, and confidence thresholding to improve system stability and reduce misclassification. The model also supports context-aware gesture detection, making it especially well-suited for accessibility-oriented AR applications. Experimental evaluations on a stacked facial image corpus and an AR vocabulary corpus demonstrate the model's high prediction and classification accuracy: the system achieves 97.6% accuracy for facial image recognition and 94.3% for vocabulary command recognition. These findings highlight its potential as a foundational component for next-generation AR-based smart assistants, particularly in real-time, low-resource environments.