Acoustic signal processing for hearing aids aims at optimal speech intelligibility within constantly changing acoustic scenes and soundscapes, which requires processing parameters to be adjusted in real time. This work introduces a system that continuously recognizes acoustic environments using Artificial Intelligence (AI) in the form of a deep Convolutional Neural Network (CNN), with a focus on real-time implementation. Inspired by VGGNet-16, the CNN architecture was modified into a multi-label, multi-output model that predicts combinations of scene and soundscape labels simultaneously while sharing a common feature extractor. For training, we acquired a custom dataset of 23.8 hours of high-quality binaural audio comprising five classes per label, each clearly distinguishable by human listeners. Using a manual grid search, we optimized three models in different complexity domains, enabling a trade-off between accuracy and throughput. The CNNs were then post-training quantized to 8 bits, achieving an overall accuracy of 99.07% in the best case. After reducing the number of Multiply-Accumulate (MAC) operations by a factor of 154 and the parameter count by a factor of 18, the classifier still detected scenes and soundscapes with an acceptable accuracy of 94.82%. This compressed model allows real-time inference at the edge on discrete low-cost hardware clocked at 10 MHz, performing one inference per second.
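
A minimal sketch of such a shared-trunk, two-head model and its 8-bit post-training quantization, assuming a TensorFlow/Keras toolchain; the layer sizes, input shape, and calibration data below are illustrative assumptions, not the configuration used in this work:

import tensorflow as tf
from tensorflow.keras import layers, models

NUM_SCENE_CLASSES = 5       # five scene classes, per the abstract
NUM_SOUNDSCAPE_CLASSES = 5  # five soundscape classes, per the abstract

def build_model(input_shape=(64, 64, 1)):  # spectrogram patch size is assumed
    """VGG-style shared feature extractor feeding two softmax heads."""
    inputs = layers.Input(shape=input_shape)

    # Shared VGG-like convolutional trunk (deliberately small here)
    x = inputs
    for filters in (16, 32, 64):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    x = layers.Dense(128, activation="relu")(x)

    # Two heads predict scene and soundscape labels from the same features
    scene = layers.Dense(NUM_SCENE_CLASSES, activation="softmax",
                         name="scene")(x)
    soundscape = layers.Dense(NUM_SOUNDSCAPE_CLASSES, activation="softmax",
                              name="soundscape")(x)

    model = models.Model(inputs, [scene, soundscape])
    model.compile(optimizer="adam",
                  loss={"scene": "categorical_crossentropy",
                        "soundscape": "categorical_crossentropy"},
                  metrics=["accuracy"])
    return model

def representative_data():
    """Calibration samples for quantization; random data stands in here."""
    for _ in range(100):
        yield [tf.random.normal((1, 64, 64, 1))]

model = build_model()
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()  # full-integer 8-bit model

Because both heads read from the same flattened feature vector, the convolutional trunk is computed only once per inference, which is what keeps the MAC count low enough for edge deployment.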