Drowsiness Classification for Internal Driving Situation Awareness on Mobile Platform

A sleeping driver is potentially more likely to cause an accident than a speeding one, since the driver is the victim of sleepiness. Automobile industry researchers, including manufacturers, seek to address this issue with technical solutions that can prevent such situations. This paper proposes an implementation of a lightweight method to detect driver sleepiness using facial landmarks and head pose estimation, based on neural network methodologies, on a mobile device. We try to improve accuracy by using face images that the camera detects and passes to a CNN to identify sleepiness. First, a sleepiness detection process based on behavioral facial landmarks is applied. Then, an integrated head pose estimation technique strengthens the system's reliability. Preliminary test results demonstrate that, with real-time capability, more than 86% identification accuracy can be reached in several real-world scenarios for all classes, including with glasses, without glasses, and with light and dark backgrounds. This work aims to classify drowsiness and to warn and inform drivers, helping them avoid falling asleep at the wheel. The integrated CNN-based method is used to create a highly accurate and simple-to-use real-time driver drowsiness monitoring framework for embedded devices and Android phones.
Keywords: Drowsiness Detection, Classification Stages, Facial Landmark, Behaviour System, Android Implementation

I. MOTIVATION
Drowsiness and fatigue are key factors that endanger road safety, causing serious accidents, fatalities, and economic losses. Traffic safety is becoming more and more relevant for both drivers and the whole community. Forecasting the degree of safety would provide valuable information to drivers and administrators, thus helping to prevent collisions. Increased drowsiness degrades the quality of driving. Many serious road accidents arise from the lack of alertness caused by the unconscious transition from alertness to sleep. According to the National Highway Traffic Safety Administration (NHTSA), drowsy driving accounts for nearly 100,000 accidents per year in the US, with 1,550 fatalities, 71,001 injuries, and $12.51 billion in damages [1]. Another study reports that the US government and companies spend an estimated $60.4 billion a year on drowsy-driving injuries, and that consumers lose about $16.4 billion through property loss, medical claims, lost time, and reduced daily productivity because of road accidents [2]. The German Council of Traffic Safety (DVR) argues that one in every four road traffic deaths is caused by drivers' momentary drowsiness [3]. According to the National Sleep Foundation (NSF), in 2010, 54 percent of young drivers drove while feeling sleepy, and 28 percent fell asleep [4]. Driver exhaustion may have many causes, including lack of sleep, very long journeys, tiredness, alcohol intake, and psychological factors. Previous transport systems are therefore not adequate to cope with these road hazards, and most deadly accidents could be avoided by incorporating automatic fatigue warning systems into cars. A drowsiness warning system continuously analyses the driver's level of concentration and alerts the driver to any significant road safety danger before it arises.
Researchers have used the driver's behavioral or physiological changes to detect sleepiness. They have also used various techniques that compare the vehicle's reactions with the driver's actions. Every approach has its specific strengths, qualities, and disadvantages, which determine how practical and efficient it is. Behavioral measures rely on visual information about the driver and are strongly influenced by lighting conditions, the quality of the measurement system, and other external factors. If lighting conditions can be overcome, this technique can provide a cheap, accessible solution to drivers. Physiological changes include pulse rate variation, brain waves, and the electrical signals of body muscles. While such measures may theoretically provide an accurate indicator of fatigue, they are highly affected by artifacts.
This sleepiness level can only be measured by experimentation, and an inherent trade-off occurs between speed and prediction accuracy. For example, if the time window is too small, the system may pick up "noise" and produce an excessive number of false positives. In this paper, we propose a behavioral method of sleepiness detection that uses a mobile phone's front camera and passes facial landmarks to a CNN for drowsiness classification. The contribution of this work is a lightweight alternative to heavier classification models.

II. RESEARCH FOCUS
Sleepiness detection applications are primarily based on safety devices for cars and on monitoring the driver's fatigue condition. Driver alert control systems involve two main tasks: detection and classification. This study gives a brief overview of current and effective ways of detecting and classifying driver drowsiness. These are classified as subjective, behavioral, vehicular, and physiological measures. In recent years, researchers have performed simulations and implemented multiple techniques to investigate driver drowsiness in real-time scenarios [5]. This section reviews these most widely used approaches:

A. Subjective Measure
This technique evaluates the level of drowsiness based on the driver's own assessment. Several scales have been used to convert this rating into a measure of driver drowsiness, such as the Epworth Sleepiness Scale (ESS), the Karolinska Sleepiness Scale (KSS), and the Stanford Sleepiness Scale [6]. During these tests, subjects are given short questionnaires and asked to rate their condition according to the suggested scale. The self-rating is generally repeated during the test, either at a fixed time interval or when conditions change. This type of rating makes it possible to determine participants' awareness of their own level of alertness. Researchers have determined that KSS ratings between 5 and 9 are prevalent for significant lane departures, long eye-blink durations, and drowsiness-related physiological signals [6]. However, the subjective rating does not coincide with vehicle-based, physiological, and behavioral measures. Observer-rated sleepiness (ORS) is another technique used in driver fatigue studies.

B. Vehicle Measure
In this technique, the driver's drowsiness is measured by analyzing different vehicle control signals, such as steering wheel movement (SWM), the standard deviation of lane position (SDLP), vehicle speed, brake pedal use, and gear shifting [7,8]. In many cases, vehicle measurements are estimated in a simulated environment by adding sensors to vehicle components. The measured data is examined continuously, and any variance that exceeds a defined threshold suggests a substantially elevated risk that the driver is drowsy. The SDLP in particular is a measure from which the degree of the driver's sleepiness can be assessed [9].
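The thresholding idea described above can be illustrated with a minimal sketch. It assumes lane position is sampled as a lateral offset from the lane centre in metres, and it slides a fixed-size window over the samples. The function name `sdlp_flags`, the window size, and the threshold of 0.25 m are hypothetical choices for illustration, not values taken from the cited studies.

```python
from statistics import pstdev


def sdlp_flags(lane_positions, window, threshold):
    """Return one flag per full sliding window.

    A flag is True when the standard deviation of lane position (SDLP)
    inside that window exceeds the threshold.
    """
    if window < 2:
        raise ValueError("window must cover at least two samples")
    flags = []
    for start in range(len(lane_positions) - window + 1):
        segment = lane_positions[start:start + window]
        flags.append(pstdev(segment) > threshold)
    return flags


# Hypothetical lateral offsets in metres (illustrative data only).
steady = [0.00, 0.02, -0.01, 0.01, 0.00, -0.02]
weaving = [0.0, 0.4, -0.5, 0.6, -0.4, 0.5]

print(sdlp_flags(steady, window=4, threshold=0.25))
print(sdlp_flags(weaving, window=4, threshold=0.25))
```

In practice the window length and threshold would have to be calibrated per vehicle and road type, which is one reason these measures are often studied in simulators first.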

C. Behavioral Measure
This technique employs image processing approaches to recognize changes in the driver's behavior, such as facial expression analysis, eye tracking, blinking, template matching, yawn and mouth analysis, and head pose movement [10]. A driver's behavior patterns change once the person feels drowsy. Behavioral techniques capture visual data of the driver in real time, analyze it, and assess the driver's status based on how much the driver's actions have changed. Among the computer vision methods for assessing driver drowsiness, researchers have focused primarily on blinks and on the percentage of eyelid closure (PERCLOS) [11][12][13][14]. In addition to eye closure, some researchers have considered other facial gestures such as eyebrow raising [15], yawning [16], and the orientation of the head or eye position [17,18].
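As a rough illustration of PERCLOS, the sketch below computes the fraction of frames in a window during which the eye counts as closed. It assumes a per-frame eye-openness score between 0 (closed) and 1 (fully open) is already available. The function name `perclos` and the 0.2 closure threshold are hypothetical and would need calibration in a real system.

```python
def perclos(openness_per_frame, closed_threshold=0.2):
    """Fraction of frames in which the eye is considered closed.

    openness_per_frame: per-frame eye-openness scores in [0, 1].
    closed_threshold: assumed cut-off below which a frame counts as closed.
    """
    if not openness_per_frame:
        raise ValueError("at least one frame is required")
    closed = sum(1 for value in openness_per_frame if value < closed_threshold)
    return closed / len(openness_per_frame)


# Illustrative window of eight frames: the eye is closed in two of them.
window = [0.9, 0.8, 0.1, 0.05, 0.7, 0.9, 0.85, 0.8]
print(perclos(window))
```

A high PERCLOS value over a window of several seconds or minutes is typically read as a sign of drowsiness, whereas a single short dip corresponds to an ordinary blink.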

D. Physiological Measure
This method measures the human body's physiological processes, such as brainwave patterns (electroencephalogram, EEG), heart activity (electrocardiogram, ECG), the electrical signals of muscle cells (electromyogram, EMG), or eye movement (electrooculography, EOG) [19][20][21]. Electrodes are attached to various parts of the body, and the electrical signals are measured and analyzed to assess the driver's drowsiness condition. Physiological signals are weak and can easily be distorted by noise. Researchers have used various methods to pre-process the raw data and eliminate the noise. Patel M. et al. [22] used filtering and thresholding techniques to eliminate noise from ECG input data. After pre-processing, the output data is analyzed in the frequency domain using the Fast Fourier Transform (FFT), and essential classification features are extracted. Similar experiments were conducted on EEG data by Fu-Chang et al. [23] to assess a driver's drowsiness.

A. System Architecture
Numerous convolutional neural networks (CNNs) have attained very high accuracy. Since there are many adjustable parameters, training is the costliest step and requires a large quantity of training data. There are no specialized approaches for determining the best CNN architecture for a given task. Several libraries are available for CNN training and evaluation. Figure 1 consists of two parts: the first is the system application architecture, and the second is the software architecture.
Figure 1: System Architecture
The Dlib library is used together with OpenCV's Haar cascades to detect the face and its landmarks. Paul Viola and Michael Jones [24] presented the cascade method, which has since been refined over the years by contributors to the open-source OpenCV library. The approach is trained on many images, each labeled as positive or negative, to identify an object. In this work, that object is a face. As a result, the algorithm can detect a face in a new picture fed into it.

B. Software Architecture
Applications form the topmost layer of the Android architecture. Native and third-party apps such as contacts, mail, entertainment, library, timer, sports, and others are installed only at this layer. The application layer runs inside the Android runtime and uses the classes and services of the application framework. The framework also provides the user interface and application resources and offers a standard abstraction for hardware access. It essentially offers services and classes that help in building applications. The software framework includes mobile communications, internet connectivity, a person's uses, NFC facility, approach links, and others that we use to develop applications based on our needs (see Figure 2). From the application layer, we use the camera as our input.

C. Hardware Architecture
The hardware abstraction layer (HAL) provides standard interfaces that expose the device's hardware capabilities to the higher-level Java API framework, as shown in Figure 3. The HAL comprises several library modules, each of which implements an interface for a specific hardware component, such as the camera or a USB connection. When a framework API makes a call to access the device hardware, the Android system loads the library module for that hardware component.

A. Extraction of Eye Features from Detected Face
As an initial step, the detected driver's face is cropped (see Figure 4), and then the area in which the eyes are likely to be located is cropped. Each eye is represented by six (x, y) coordinates, starting at the left corner of the eye and working clockwise around the rest of the eye. There is a relationship between the width and the height of these coordinates, known as the eye aspect ratio (EAR). When the eye is open, the EAR is nearly constant, but it rapidly falls toward zero when a blink occurs. Figure 5 presents a graph of the EAR over time for a video clip [25]. As we can see, the EAR is constant, then drops rapidly to near 0, then rapidly increases again, indicating that a single blink has occurred.
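The text does not give the formula, so the following is a minimal sketch that assumes the commonly used formulation of the eye aspect ratio: the sum of the two vertical landmark distances divided by twice the horizontal distance. The landmark order p1 to p6 follows the description above (left corner first, then clockwise). The function name `eye_aspect_ratio` and the sample coordinates are hypothetical.

```python
from math import dist


def eye_aspect_ratio(eye):
    """Compute the EAR from six (x, y) eye landmarks.

    Assumed order: p1 left corner, p2 and p3 upper lid, p4 right corner,
    p5 and p6 lower lid (clockwise in image coordinates).
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = dist(p2, p6) + dist(p3, p5)   # two lid-to-lid distances
    horizontal = dist(p1, p4)                # corner-to-corner distance
    if horizontal == 0:
        raise ValueError("eye corners coincide; cannot compute EAR")
    return vertical / (2.0 * horizontal)


# Illustrative landmark sets (pixel coordinates, y grows downward).
open_eye = [(0, 0), (1, -1), (3, -1), (4, 0), (3, 1), (1, 1)]
closed_eye = [(0, 0), (1, 0), (3, 0), (4, 0), (3, 0), (1, 0)]

print(eye_aspect_ratio(open_eye))    # 0.5
print(eye_aspect_ratio(closed_eye))  # 0.0
```

Because the ratio normalizes by the eye width, it stays roughly stable when the driver moves closer to or farther from the camera, which makes a simple threshold on the EAR usable for blink detection.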

B. Head Pose Estimation
Head pose detection on images involves several mathematical derivations for translating points to 3D space and finding the rotation and translation vectors using cv2.solvePnP. We require six facial landmarks, i.e., the tip of the nose, the chin, the extreme left and right corners of the mouth, the left corner of the left eye, and the right corner of the right eye. We take generic 3D coordinates for these facial landmarks and attempt to estimate the rotation and translation vectors at the nose tip. For a reliable estimate, we also need the intrinsic camera parameters, such as the focal length, the optical center, and the radial distortion parameters. To simplify the task, we approximate the first two and assume that there is no radial distortion. After obtaining the necessary vectors, we can project the 3D points onto a 2D surface, which is the output image. We require the 2D (x, y) positions of some points. In the case of a face, we may select the corners of the eyes, the corners of the lips, the tip of the nose, and other points. We obtain these points from Dlib and work with them accordingly. In principle, we would also need the 3D location of each of these points for the specific person, but this is not needed in practice. We only need the 3D positions of a few points in an arbitrary reference frame.
Figure 6: Head Pose Estimation
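The camera approximation and the projection step described above can be sketched in plain Python, without OpenCV. The sketch assumes a pinhole camera whose focal length is approximated by the image width in pixels, whose optical center is the image center, and which has no lens distortion. It also assumes the rotation is given as a 3x3 matrix (cv2.solvePnP returns a rotation vector, which cv2.Rodrigues converts to a matrix). The helper names are hypothetical.

```python
def approximate_camera_matrix(image_width, image_height):
    """Approximate intrinsics: focal length ~ image width, center ~ image center."""
    focal_length = float(image_width)          # assumption, in pixels
    cx, cy = image_width / 2.0, image_height / 2.0
    return [[focal_length, 0.0, cx],
            [0.0, focal_length, cy],
            [0.0, 0.0, 1.0]]


def project_point(point_3d, rotation, translation, camera_matrix):
    """Project one 3D model point into the image (pinhole model, no distortion)."""
    # Camera coordinates: X_cam = R * X_model + t
    xc, yc, zc = (
        sum(rotation[r][c] * point_3d[c] for c in range(3)) + translation[r]
        for r in range(3)
    )
    if zc <= 0:
        raise ValueError("point is behind the camera")
    fx, fy = camera_matrix[0][0], camera_matrix[1][1]
    cx, cy = camera_matrix[0][2], camera_matrix[1][2]
    return (fx * xc / zc + cx, fy * yc / zc + cy)


# Illustrative pose: head facing the camera, 1000 model units away.
K = approximate_camera_matrix(640, 480)
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 1000.0]

print(project_point((0.0, 0.0, 0.0), identity, t, K))    # (320.0, 240.0)
print(project_point((100.0, 0.0, 0.0), identity, t, K))  # (384.0, 240.0)
```

Projecting a point placed some distance in front of the nose tip with the estimated pose is a common way to draw a line that visualizes where the head is pointing.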
Through Dlib, we also know the 2D facial feature points. We can therefore look at the distance between the projected 3D points and the 2D facial features. When the estimated pose is perfect, the 3D points projected onto the image plane line up almost exactly with the 2D facial features. When the pose estimate is inaccurate, we can quantify this with the re-projection error, i.e., the sum of squared distances between the projected 3D points and the 2D facial points.
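The re-projection error described above can be written down directly. The sketch below assumes both point sets are lists of (x, y) pixel coordinates in the same landmark order; the function name `reprojection_error` is a hypothetical label for illustration.

```python
def reprojection_error(projected_points, observed_points):
    """Sum of squared distances between projected 3D points and 2D landmarks."""
    if len(projected_points) != len(observed_points):
        raise ValueError("both point lists must have the same length")
    return sum(
        (px - ox) ** 2 + (py - oy) ** 2
        for (px, py), (ox, oy) in zip(projected_points, observed_points)
    )


# Illustrative values: a perfect pose gives zero error.
observed = [(320.0, 240.0), (384.0, 240.0)]
print(reprojection_error(observed, observed))                              # 0.0
print(reprojection_error([(320.0, 240.0), (387.0, 244.0)], observed))      # 25.0
```

A pose estimator searches for the rotation and translation that make this quantity as small as possible.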
The OpenCV function solvePnP implements several pose estimation algorithms, which are selected using the flags parameter. The default flag, SOLVEPNP_ITERATIVE, is essentially the DLT solution followed by Levenberg-Marquardt optimization.
The program used to collect the images and submit them for processing is an integral part of the entire process. The Android mobile application takes pictures of the driver, and the Dlib library analyses each picture. The image data is passed through the Java Native Interface (JNI) between the Java-based native Android framework and the Dlib library, which is written in C++. Upon receiving the image data, the Dlib library pre-processes it and extracts the facial landmarks, which are submitted to the CNN model described above. This input runs through the neural network, and the algorithm determines whether the driver is sleepy. The results for the images are displayed in the application in real time. If the driver is found to be drowsy, the application can issue alerts via audio messages. The implementation uses Java, HTML, and the Android Native Development Kit (NDK). As shown in Figure 8, drowsiness is detected based on the facial features, and Figure 9 depicts drowsiness detection based on the head pose. The figures also show the detection accuracy obtained. Based on the detection, the driver's state is classified as Active, Sleepy, or Drowsy, and the user can manually select the delay between detection and the alarm: 0.5, 1.5, or 2.5 seconds.
The input is a video, i.e., a sequence of images. With more than 10 subjects tested, 40 videos are used for training and 20 for evaluation. These videos consist of sleepy and non-sleepy scenarios. Every video is converted into images, which are then processed by the defined framework. The algorithms were validated with respect to sleepiness detection time and average speed (see Table 1).
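The selectable alarm delay can be illustrated with a minimal sketch. The paper does not describe how the timer is implemented, so this assumes the alarm fires only after the per-frame classification has stayed drowsy continuously for the selected delay, and that any non-drowsy frame resets the timer. The class name `DrowsinessAlarm` and its method are hypothetical.

```python
class DrowsinessAlarm:
    """Raise an alarm once the drowsy state has persisted for delay_seconds."""

    def __init__(self, delay_seconds):
        self.delay_seconds = delay_seconds
        self._drowsy_since = None  # timestamp of the first drowsy frame in a run

    def update(self, timestamp, is_drowsy):
        """Feed one frame result; return True when the alarm should sound."""
        if not is_drowsy:
            self._drowsy_since = None  # assumption: an alert frame resets the timer
            return False
        if self._drowsy_since is None:
            self._drowsy_since = timestamp
        return timestamp - self._drowsy_since >= self.delay_seconds


# Illustrative frame stream with the 1.5 s delay option.
alarm = DrowsinessAlarm(delay_seconds=1.5)
for ts, drowsy in [(0.0, True), (1.0, True), (1.5, True), (1.6, False), (1.7, True)]:
    print(ts, alarm.update(ts, drowsy))
```

A shorter delay reacts faster but is more likely to fire on ordinary blinks or brief glances, which reflects the trade-off between speed and false positives noted earlier in the paper.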
Evaluation and testing show that the eyes are a significant element in classifying sleepiness in any situation. In photos where the eyes are partly blocked by glasses, classification accuracy drops by a couple of percent. Illumination of the subject's face also influences performance: when it is improved, the facial features appear more clearly. Photos that lose visibility due to uneven illumination or darkness are a significant factor, as the CNN model's error rate rises by 3 percent. Even after training on a considerable amount of data, false positives are still possible. According to the experiment, we have an error rate of less than 0.4%. Table 2 gives a performance overview for different environments and cases. We tested on Android devices, namely the Samsung S8/S8+, Xiaomi Redmi 9, and Q-mobile Noir; installing the application requires at least 30 MB of free space.

VI. CONCLUSION & FUTURE ASPECTS
This paper proposed an enhanced sleepiness detection approach that relies on CNN-based computer vision. The key goal is a system that detects and classifies sleepiness, is lightweight enough for handheld devices, and achieves high accuracy. The algorithm extracts facial features from videos taken on the Android platform and forwards them to a trained CNN model that detects sleepy driving behavior and categorizes it. We reached an overall accuracy of 86% across all classes. We also added a feature to overcome the lighting issue, which helps in getting better results even in the dark. In addition, to strengthen the process, we integrated it with head pose estimation (HPE). The outcomes of the facial-feature-based and HPE-based sleepiness identification approaches are combined to make the final decision. In the future, this technology could be built into in-car dashboard cameras to inform drivers in time and reduce road accidents due to drowsiness. A more extensive data set and further training should improve detection performance and are left as future work.