Smart vision-based analysis and error deduction of human pose to reduce musculoskeletal disorders in construction

Purpose – This pragmatic research paper aims to unravel the smart vision-based method (SVBM), an AI program to correlate the computer vision (recorded and live videos using mobile and embedded cameras) that aids in manual lifting human pose deduction, analysis and training in the construction sector.
Design/methodology/approach – Using a pragmatic approach combined with the literature review, this study discusses the SVBM. The research method includes a literature review followed by a pragmatic approach and lab validation of the acquired data. Adopting the practical approach, the authors of this article developed an SVBM, an AI program to correlate computer vision (recorded and live videos using mobile and embedded cameras).
Findings – Results show that SVBM observes the relevant events without additional attachments to the human body and compares them with the standard axis to identify abnormal postures using mobile and other cameras. Angles of critical nodal points are projected through human pose detection and calculating body part movement angles using a novel software program and mobile application. The SVBM demonstrates its ability to capture and analyse data in real-time and offline using videos recorded earlier and is validated for program coding and results repeatability.
Research limitations/implications – Literature review methodology limitations include not keeping in phase with the most updated field knowledge. This limitation is offset by choosing the range for literature review within the last two decades. This literature review may not have captured all published articles because of the restriction of database access and because the search was based only on English. Also, the authors may have omitted fruitful articles hiding in a less popular journal. These limitations are acknowledged. The critical limitation is that the trust, privacy and psychological issues are not addressed in SVBM, which is recognised. However, the benefits of SVBM naturally offset this limitation to being adopted practically.


Introduction
Physically demanding jobs such as those in construction have greater exposure to high-risk work environments and the highest number of work injuries compared to other New Zealand industries over recent years (ACC, 2023). Data from New Zealand's primary workplace health and safety regulator, WorkSafe, from June 2021 to May 2022 showed that those working in the manufacturing industry had the highest number of injuries resulting in more than a week away from work (5,775 ACC injury claims in total). The most common injury is "muscle stress due to lifting, carrying or putting down objects", also known as manual handling (WorkSafe, 2022). These injuries can lead to Musculoskeletal Disorders (MSD), which affect a person's muscles, nerves, tendons, joints, cartilage and spinal discs and are commonly known as trauma, back pain and arthritis (USBJI, n.d.). Until the late 1990s, these disorders were widely believed to occur in older people. However, the United States Centers for Disease Control and Prevention's (CDC) National Institute for Occupational Safety and Health (NIOSH) released evidence of Work-Related Musculoskeletal Disorders (WMSD) in 1997.
According to the CDC (2020), WMSD differs from regular MSD when (1) the work environment and performance of work contribute significantly to the condition; and (2) the condition worsens or persists longer due to work conditions.
WMSD is due to lifting heavy objects and performing repetitive forceful tasks (CDC, 2020). WMSD is evident in WorkSafe's Outcomes Dashboard presented in December 2019, in which a survey from 2004 to 2006 showed "repetitive tasks" to be the highest risk factor for WMSD, affecting nearly 70% of the general New Zealand workforce and 79% of Māori. "Lifting" was the fourth highest cause of work-related injury, affecting nearly 40% of the general workforce and almost 55% of Māori (WorkSafe NZ, 2019a, b).
According to NIOSH's equation for calculating the Recommended Weight Limit (RWL), seven factors are critical to manual lifting (Choi et al., 2012; Singh et al., 2014; VelocityEHS, 2020).
(2) Horizontal distance (HD): How far is the load from the body when lifting?
(3) Vertical distance (VD): To what height is the load lifted (such as lifting from the floor)?
(4) Travelling distance (TD): How far does the load need to be lifted?
(5) Frequency of lift (FL): How often is the load lifted?
(6) Asymmetric turns (AT): What angle does the lifter's body take when lifting (posture while lifting)?
(7) Coupling grip (CG): What is the quality of hand grip on the load?
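As a hedged illustration of how these factors combine, the sketch below computes the RWL with the metric form of the revised NIOSH lifting equation, RWL = LC × HM × VM × DM × AM × FM × CM. The function and parameter names are hypothetical and are not taken from this paper. The frequency and coupling multipliers come from lookup tables in the NIOSH manual, so they are assumed here to be supplied by the caller rather than computed.

```python
def recommended_weight_limit(h_cm, v_cm, d_cm, a_deg, fm=1.0, cm=1.0):
    """Revised NIOSH lifting equation (metric form); returns the RWL in kg.

    h_cm  : horizontal distance of the hands from the body (cm)
    v_cm  : vertical height of the hands at the origin of the lift (cm)
    d_cm  : vertical travel distance of the load (cm)
    a_deg : asymmetry (twist) angle in degrees
    fm, cm: frequency and coupling multipliers, assumed to be read from
            the NIOSH tables by the caller (illustrative defaults of 1.0)
    """
    load_constant = 23.0                          # kg
    hm = min(1.0, 25.0 / max(h_cm, 25.0))         # horizontal multiplier
    vm = 1.0 - 0.003 * abs(v_cm - 75.0)           # vertical multiplier
    dm = min(1.0, 0.82 + 4.5 / max(d_cm, 25.0))   # distance multiplier
    am = 1.0 - 0.0032 * a_deg                     # asymmetric multiplier
    return load_constant * hm * vm * dm * am * fm * cm


def lifting_index(load_kg, rwl_kg):
    """Ratio of the actual load to the RWL."""
    return load_kg / rwl_kg


# Example: load held 50 cm from the body with a 30 degree twist.
rwl = recommended_weight_limit(h_cm=50, v_cm=75, d_cm=25, a_deg=30)
print(round(rwl, 3), round(lifting_index(10.0, rwl), 3))
```

A lifting index above 1 is conventionally read as an elevated risk, which is why the horizontal distance and twist angle matter so much for a fixed load.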

SASBE
Government agencies such as WorkSafe New Zealand regulate workplace health and safety practices and minimise risks by assessing data and updating policies. However, institutional and academic research and innovation within the technology space have seen the rise of wearable technology (WT), the use of electronics worn on the body (Yasar, 2022), which provides a more personalised approach to reducing these injuries.
Adding technology to the workplace can improve productivity and efficiency and can bring health and safety benefits such as improved safety performance (Safety Champion, 2021; Karakhan et al., 2019). However, when adopting new technology, WorkSafe NZ (2019a, b) states that it is essential for employers to consider the health and safety risks of the technology itself, as the responsibility falls on the employer if a worker is injured using the technology.
Their guidelines are as follows (WorkSafe NZ, 2019a, b):
(1) Consider whether the new technology is fit for its purpose.
(2) Check that the manufacturer/designer has considered the health and safety impact of the technology itself.
(3) Check that technology is proven and reliable.
(4) Consider whether the new technology adds additional health and safety risks or alters any health and safety risks.
Examples of technology currently being used in manufacturing are automation and robotics, augmented/virtual reality, software and mobile apps, IIoT sensors and wearables (Getac, 2021). These technologies add a physical element to the working environment and human bodies. Though the lifting process itself is easy, an added physical component is uncomfortable for workers, whose working and typical poses change over time, which may lead to WMSD in the long term. But when done correctly, technology can improve a workplace's health and safety performance (Karakhan et al., 2019). However, there is a current knowledge gap on technologies that assist in manual lifting and reduce WMSD without adding physical elements to the human body. This paper revolves around the research question:
RQ1. How can smart vision-based analysis and error deduction of human pose, a technology that adds no physical elements to the human body, be adopted to assist in manual lifting and reduce WMSD?
The research objective was to develop and demonstrate the concept and use of the novel smart vision-based method (SVBM) for analysis and error deduction of human pose to reduce musculoskeletal disorders in construction. This paper aims to highlight the concept and the use of the novel smart vision-based analysis and error deduction of human pose to reduce musculoskeletal disorders in construction.

Literature review
SMART technologies aid in data collection, training and physical work, with the aim of reducing WMSD. Over time, training methods were developed to correct the pose during manual lifting. Digital technologies were subsequently deployed to enhance pose deduction and training. Employees get advice and training on the correct pose and actions during manual material handling at the working site. These trainings are based on physiotherapy principles and use minimal technology. However, most research in the past decade proposes getting the best results using SMART technologies.
Automation and robotics are estimated to reduce workplace physical and psychological injury by 11% by 2030 (Horton et al., 2018). Since 2009, the use of robotics worldwide has increased rapidly due to declining costs and improving capacity and ability (Horton et al., 2018). Along with robotic machines, drones have been used to monitor workplaces and minimise health and safety risks by accessing hazardous locations such as tunnels, mines and storage tanks, either to monitor or to collect samples (Horton et al., 2018; Chubb, n.d.). They help increase health and safety by replacing or assisting workers in dangerous tasks or by relieving workers of the boredom of repetitive tasks (Horton et al., 2018).
Augmented reality (AR) overlays digital information on the real world using devices such as smart glasses or mobile phones (Getac, 2021). Virtual reality (VR) is a passive or interactive computer-generated simulation where the user puts on a VR headset (Getac, 2021). VR allows workers to build their knowledge and practise awareness to reduce incidents in the workplace (Strivr, n.d.). In contrast, AR allows that information and knowledge to be shown in real time and place (Daniels and Dustin, 2022). VR and AR have been used predominantly in training and educating workers in dangerous tasks in a completely safe space (Getac, 2021; Chubb, n.d.).
Software and mobile apps are among the most used technology systems in the workplace and the most accessible, since most workers have mobile devices (Safety Champion, 2021; Chubb, n.d.). Software and mobile apps began with connecting workers across the workplace, whether onsite or offsite, reporting health and safety hazards, and accessing real-time data (Safety Champion, 2021; Chubb, n.d.; Schulz, 2021). App technology benefits from the devices already built into mobile phones. The app features include the following (Schulz, 2021):
(1) Linking location data using QR codes scanned by the device's camera.
(2) Improving health and safety incident reporting by capturing via a camera or using voice-to-text to relay information.
(3) Using the camera's advanced motion-capture technology to make manual ergonomic assessments to reduce musculoskeletal disorders.
(4) Providing workers with accessible training and resources on hand.
(5) Giving workers more accountability in managing health and safety risks.
Further, the different technologies connect amongst themselves. The Industrial Internet of Things (IIoT) is a sensor network that connects and communicates with computers and software, improving efficiency, automating processes and adding AI within the workplace (Ordr, n.d.).
While IIoT shares many of the same definitions as the Internet of Things (IoT), IIoT specialises in the manufacturing and industrial sector and includes technologies such as machine learning, big data, sensor data, automation and machine-to-machine communication (Kumar et al., 2019). Examples of how IoT sensors are used include (Ordr, n.d.; Eshghi, 2022; Kumar and Iyer, 2019):
(1) Remote management: Being able to manage machines and workers from afar.
(2) Predictive maintenance: IIoT temperature and vibration sensors can monitor conditions to alert when a machine is close to expiring, acting outside its normal parameters, or needing maintenance.
(3) Remote monitoring: Especially used in production facilities where workplaces can monitor variables such as time, input and power consumption for some machines.
(4) Asset tracking: Using sensors such as GPS and RFID tags, workplaces can track and trace inventory, assets and supplies.
(5) Safe work environment: Particularly within facilities dealing with chemicals, IIoT air quality sensors can provide reassurance or alert when the air quality changes.

Furthermore, wherever manual tasks are unavoidable, essential technologies such as "wearables" blend with humans to help ease the work. Wearable technology (WT) is "an electronic device designed to be worn on the user's body" (Yasar, 2022). The recent development of biomechanical wearable technology aims to assess performance during tasks and movements to help health and safety professionals, ergonomists and workers prevent and identify potential health and safety risks. Poitras et al. (2019) describe current workplace assessments (such as questionnaires) as subjective, unlike WT, which gives a more personal approach. WT is included in workplace research because industry workers have the same movements, user performance and injury prevention goals as sports athletes (McDevitt et al., 2022).
Exoskeletons are the most prominent biomechanical WT device today. More than 7,000 units were sold in manufacturing alone in 2018, with an estimated growth rate of more than 50% between 2019 and 2024 (Esko Bionics, 2020). McDevitt et al. (2022) define exoskeletons as "wearable machine devices that augment human performance, primarily for heavy lifting tasks". They were introduced for military use in 1965, but since the late 1990s, exoskeletons' workplace use has increased significantly. While most countries' health and safety policies encourage redesigning the workplace with an ergonomic approach, this is impossible in temporary workplaces. Exoskeletons help compensate for situations like this while also improving the quality of work (Esko Bionics, 2020). Exoskeletons use robotic technology to provide postural support while following the user's movements without misalignment or resistance. Exoskeletons reduce the mechanical energy needed to complete tasks (which helps reduce fatigue) while improving both the range of motion and muscle fatigue or activation (McDevitt et al., 2022). Using exoskeletons reduces stress on the shoulder muscle by 30%; the shoulder is the muscle most commonly impacted by injuries and takes the longest to heal and return to full function (Esko Bionics, 2020). Exoskeletons also support older workers in handling physically demanding tasks (Okpala et al., 2022). Exoskeletons can be either powered or passive and are currently used in three main ways (AmTrust, n.d.):
(1) Back-assist: exoskeletons support the lumbar spine while lifting.

Smart vision-based analysis
(2) Shoulder and arm assist: exoskeletons support sustained overhead work.
(3) Leg-assist: exoskeletons support the ankle, knee and hip joints while carrying a load.
Large companies such as Toyota, Ford and Boeing have all adopted exoskeletons into the workplace, reporting positive worker feedback, less exertion, less discomfort and reduced injuries (Zelik, 2021). Ford Motor Company, which adopted the technology in 2011, has seen an 83% reduction in injuries among those who use the exoskeletons (Esko Bionics, 2020). Rexbionics has developed exoskeletons for those with walking disabilities. However, they are still not used in NZ industries. The main barriers to exoskeletons are cost and usability, as only one person can use a unit at a time (McDevitt et al., 2022). Eleven existing companies produce exoskeletons that fit the upper body, semi-full body and whole body and that aid in picking, carrying, bending and lifting, prolonged standing, extended arm and repeated motion (Okpala et al., 2022).
While exoskeletons assist a person in a task, inertial measurement unit (IMU) devices monitor performance. IMU devices consist of accelerometers, gyroscopes and magnetometers that monitor a worker's task/posture for performance analysis, exposure to risk analysis, and to help with task redesign (McDevitt et al., 2022). These are used within sports, but due to their minimal size, they are beginning to see an increase in use within the manufacturing workplace as they provide real-time worker monitoring to help identify potential risks on the job. The types of IMU sensors include the following:
(1) Accelerometers: quantify and monitor dynamic linear acceleration, used to monitor biomechanical parameters of human movement.
(2) Gyroscopes: monitor the angular rate of change to measure axial rotation and provide valuable positioning measures.
Research has shown that pressure sensors can also be combined with IMUs to measure fatigue and imbalance. Antwi-Afari and Li (2018) used IMU sensors to track balance loss through the pressure sensors used in the insole of the worker's shoe. The results showed differences in gait (a person's pattern of walking) during balance situations. This combination can also detect issues such as asymmetries or specific limb movements indicating fatigue (McDevitt et al., 2022). Other research includes Akhmad et al. (2020), who created a device using nine IMUs to replicate NIOSH's Lifting Equation. While the team said more work was needed, the study provided evidence that this could be possible.
Much literature research has been done on IMUs within a laboratory setting, but organisations such as DorsaVi (n.d.) provide a commercial WT solution using IMUs called ViSafe (Gleadhill, 2019). While there are many benefits to wearable technology, many challenges prevent it from being widely used within society. Kalia (2017) describes six significant challenges for WT and how they affect the user:
(1) Battery life: Because WT devices are relatively small, the battery needs to be small. And with many WT devices worn constantly throughout the day, the battery drains quickly.
(2) Ergonomics: User comfort is paramount in WT, as in textile clothing. Some may find discomfort in having a device strapped around them for long periods, or find the device's material uncomfortable, mainly since most WT devices include a rigid component to house the electronics, accompanied by fabric straps. Some WT devices can also heat up over time.

(3) Differentiating and providing value: People do not see the value of having WT compared to other electronic devices, so encouraging its use is challenging.
(4) Sealing: Protecting WT devices from water and sweat is crucial, as both can corrode metal components.
(5) Miniaturization and integration: With WT getting smaller and smaller, it is challenging to reduce the size of components such as radios/antennas, making it more difficult to have a strong signal.
(6) Safety, security, privacy: Most safety concerns come from using lithium batteries within WT devices, their proximity to the body and potential radiation emissions. WT devices are also potentially hackable, threatening security and privacy.
Despite detailed research and statistics showing how wearable technology can reduce injuries related to work-related musculoskeletal disorders (WMSD), there is hesitation in the industry to adopt the technology. McDevitt et al. (2022) and Navarra (2022) discuss the trust in and reluctance to use this new technology. Though WT provides accurate data while not injuring the workers themselves, the discomfort, privacy issues and constant monitoring concern users (Navarra, 2022; Kalia, 2017).
In laboratories, visual object tracking uses inertial measurement unit (IMU) sensors for pose deduction (Wei et al., 2021). In the past, construction workers' weight gain detection and recognition was performed using a single wearable IMU (Chen et al., 2021). Similarly, nine nodal points were tracked using multiple wearable radio-frequency identification (RFID) sensors to monitor human poses; notably, this system focuses on hand positions (Lee et al., 2019). Among sensors and devices, depth sensors offer a portable, accurate, low-cost means of capturing and reconstructing human pose (Taddei et al., 2014). In large-scale workplaces, data capture and environmental conditions were considered factors affecting the accuracy of the output (Pang et al., 2021). For example, the dynamic nature of construction work needs constant material shifting; hence, manual lifting occurs more frequently and in different surroundings. To overcome this barrier, researchers have recently used computer vision to analyse human pose errors. One computer vision method is 3D Mocap, which aligns digital video images using region segmentation of similar pixels based on pre-defined image frames to calculate human pose errors (Rogez et al., 2008). In another work, different pose outline measurements utilise augmented reality (AR) to gather human postural errors; this method does not use any sensors attached to the human body (Hellsten et al., 2021). However, the calculations are inaccurate as they only compare body outlines and do not specify the movement errors of individual human nodal points (parts). The main disadvantage is that the method relies only on standing frames and cannot be used for other positions.
Human pose estimation is closely related to analysing human motion from images and video (Poppe, 2010). Numerous studies on vision-based systems have been undertaken in the new millennium (Moeslund et al., 2006). Hellsten et al. (2021) state "that most promising techniques from a physiotherapy point of view are 3D marker-less pose estimation based on a single view as these can perform advanced motion analysis of the human body while only requiring a single camera and a computing device". Lan et al.'s (2022) review of 153 articles concludes that vision-based systems have been widely applied to action analysis, human-computer interface, gaming, sports analysis, motion capture and computer-generated imagery. Kulkarni et al. (2023) discuss 49 papers on offline computer vision and machine learning algorithms, such as feed-forward neural networks, convolutional neural networks (CNN), OpenPose and MediaPipe, with the exception of one based on fall deduction from a single live surveillance camera. Through their review, Lan et al. (2022) identify that a gap still exists in vision systems for the analysis of human poses, considering the wide diversity of the human body. Most of these works are indoors, using high-quality cameras and images, and are yet to be adopted in real-life situations (Hellsten et al., 2021; Lan et al., 2022).
Researchers have used computer vision to deal with musculoskeletal disorders since the 1990s. For example, Wang et al. (1996) initially analysed lower back issues using computer vision and a superimposed biomechanical model to identify stress points. Mehrizi et al. (2018) proposed a modified algorithm based on the Twin Gaussian Process (TGP) to extract the 3D pose from each frame of videos captured by two lab cameras, in order to develop and validate a computer vision-based marker-less motion capture method to assess lifting tasks and reduce work-related musculoskeletal disorders (WMSD). Snyder et al. (2021) suggested analysing a lifting dataset captured by IMU sensors using 2D vision and CNN. However, for real-world use, they suggest minimising the number of sensors, which would significantly advance practicality by reducing cost and eliminating the awkward placement of several sensors. Jung et al. (2022) developed a computer vision-based lifting task recognition method using CNN and OpenPose with 17 nodal points. Earlier, Huang and Nguyen (2019) used multiple cameras and OpenPose to develop 2D and 3D skeleton movement tracking. But OpenPose can detect persons in an image only if the nose or the neck keypoint is not occluded, and it uses fewer nodal points.
The construction industry has also embraced vision-based technologies. Liu et al. (2017) use a convolutional neural network (CNN) to estimate human pose on sequential images from construction sites for unsafe behaviour monitoring, ergonomic analysis and productivity estimation. Roberts et al. (2020) used 317 annotated offline RGB video feeds of bricklaying and plastering operations to estimate and track body joint poses in each frame. However, the result display is potentially cluttered, and the study did not consider carrying movements or perform an ergonomics assessment. Luo et al. (2020) proposed a methodological framework to track construction equipment's location, pose and movement to avoid potential collisions and other accidents and achieve safer onsite conditions. However, they state that there are limited studies that automatically estimate the full body pose (Luo et al., 2020). The survey also revealed that smart vision-based analysis and error deduction of human pose to reduce musculoskeletal disorders in construction during manual lifting is yet to be developed.
Ultimately, the vision-based human pose estimation (HPE) approach is still lab-based and needs to be implemented for applications in the real world (Lan et al., 2023). The existing methods of vision-based HPE are offline and based on lightweight neural networks of manual and heuristic design. Implementing these state-of-the-art neural networks in mobile or embedded devices incurs enormous computational costs and is yet to be operationalised (Lan et al., 2023). This literature survey identified a lack of real-time human pose deduction using mobile or embedded devices that can be used on construction sites. The current lab-based methods use multiple cameras and up to 17 nodal points (Xu et al., 2023). The survey also revealed that existing computer vision-based applications do not consider the combination of angles, neckline and torso line for manual handling pose deduction and analysis in an actual construction work environment. Thus, it is crucial to design real-time neural networks and vision-based systems for efficient human pose estimation using a single-camera mobile application that is cost-effective and accurate. This paper proposes the real-time smart vision-based method (SVBM), an AI program to correlate the computer vision (recorded and live videos using mobile and embedded cameras) that aids in manual lifting human pose deduction (using 33 nodal points), analysis (combination of nodal points, angles, neckline and torso line) and real-time training in the construction sector.

The method
This research is based on the pragmatic approach, which evaluates theories or beliefs in terms of the success of their practical application and seeks a realistic solution (Smith, 1978). This differs from the qualitative paradigm (which relies on objectivism and positivism) and the quantitative paradigm (which depends on deduction and confirmation) in the sense that the outputs are proven for their practicality (Maarouf, 2019). Though the pragmatic approach does not support the assumptions made in the quantitative and qualitative techniques, it is the most common philosophical justification for practical research outputs (Maarouf, 2019). This pragmatic research aims to develop a SMART vision-based analysis and error deduction of human pose technology that adds no physical elements to the human body, analyses the manual lifting pose, reduces WMSD and can be adopted practically. The method adopted is shown in Figure 1, and each step is explained in the following subsections.

Convolutional neural network and BlazePose
The convolutional neural network (CNN) for image recognition and object detection is a key architecture that has revolutionised the object detection domain and is the backbone architecture of human pose estimation (Kulkarni et al., 2023). Researchers have used AI-based CNN and advanced computer algorithms that work with vision tracking to conduct human three-dimensional (3D) pose estimation. The most used software platforms for this purpose are OpenPose and BlazePose. When calculating the motion of human body parts, the video images are segmented into multiple single photos in 3D pose reconstruction platforms such as OpenPose (Pang et al., 2021). In some studies using OpenPose, additional algorithms were needed to calculate the pose reasonably. For example, Corin (2021) additionally used a triangulation algorithm to calculate the limb joint kinematics from the videos. The challenge with OpenPose is that it requires camera calibration and takes more time to deliver outputs when videos from two or more cameras are analysed. Another work related to visual object tracking estimates 3D posture using three artificial neural networks within two different positions (Aghazadeh et al., 2020). Though these results satisfied the required efficiency, hand locations needed to be input manually for pose estimation.
BlazePose (a high-fidelity human pose tracking solution within the MediaPipe Pose software framework) is another CNN for human pose tracking developed by Google. It detects 33 nodal points of the human body, more than other topologies such as COCO, BlazeFace and BlazePalm. The more closely spaced the key points are, the more faithfully the human pose can be simulated; BlazePose offers 33 nodal points that are vital and closer together compared to other platforms. The nodal points are shown in Figure 2. The BlazePose platform is better than OpenPose and supports mobile and laptop platforms (Bazarevsky et al., 2020). However, this platform does not consider the neck posture and backline of the human body, which is a disadvantage.
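For orientation, the 33 BlazePose nodal points can be written out as a simple lookup table. The sketch below lists the landmark names in the index order published for MediaPipe Pose; the variable names are hypothetical, and Figure 2 remains the authoritative numbering for this paper.

```python
# The 33 BlazePose landmark names in index order (MediaPipe Pose topology).
BLAZEPOSE_LANDMARKS = [
    "nose", "left_eye_inner", "left_eye", "left_eye_outer",
    "right_eye_inner", "right_eye", "right_eye_outer",
    "left_ear", "right_ear", "mouth_left", "mouth_right",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_pinky", "right_pinky",
    "left_index", "right_index", "left_thumb", "right_thumb",
    "left_hip", "right_hip", "left_knee", "right_knee",
    "left_ankle", "right_ankle", "left_heel", "right_heel",
    "left_foot_index", "right_foot_index",
]

# Reverse lookup from landmark name to index (illustrative helper).
LANDMARK_INDEX = {name: i for i, name in enumerate(BLAZEPOSE_LANDMARKS)}

print(len(BLAZEPOSE_LANDMARKS), LANDMARK_INDEX["left_shoulder"], LANDMARK_INDEX["left_hip"])
```

The list contains no dedicated neck or spine point, which is consistent with the limitation noted above: a neckline or torso line has to be derived from the shoulder, hip and head landmarks.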
This research used a mobile camera and a mobile-based application to capture pose errors and quantify the angles of 33 nodal points to help experts and workers analyse and correct mistakes. The primary disadvantages of existing vision-based analysis are that it does not consider the neck posture and backline of the human body, uses advanced laboratory-based cameras to capture data, and is confined to lab-based studies. To assess the neck posture and backline of the human body, this research, as its novel contribution, combined BlazePose with the OpenCV platform to calculate the head, neck and shoulder positions in a given video frame. OpenCV (Kuehne, 2011) is an image and video processing and vision recognition platform. OpenCV was used for video capturing, storing, camera calibration and geometric measurement data transfer for pose processing. Mainly, the research sought to use pose detection to obtain more accurate data. The data acquired and calculated include the following:
(1) Visual human pose and movement.
(2) Identification of nodal points and landmarks.
(3) The location of critical human body nodal points.
(4) The angle between the nodal points in a particular frame.
(5) The angle between the neckline, torso line and the human body axis in a particular frame.
(6) The angle and movement of nodal points over a period.
(7) The distance between nodal points.
(8) The hip-shoulder length change over a period.

Data acquisition
The data acquisition for manual handling was based on multiple box lifts captured on mobile cameras; the related videos were recorded, coded and stored. The size of the recordings was not limited, and different actions were captured. The footage was separated into frames to analyse results under various conditions. A volunteer participant mimicked the construction sector's lifting task, which was captured in 18 video clips using a Samsung S7 Edge phone at a resolution of 1220 × 960 pixels. The camera was placed on the rear-left side at 135° from the sagittal plane of the box. The videos were stored on an HP laptop with an 11th Gen Intel(R) Core(TM) i7-1185G7 CPU @ 3.00 GHz and 32 gigabytes of RAM, with the Matplotlib platform installed. The participant was asked to stand in front of a box weighing 10 kilograms and finish the lifting tasks without moving their feet (refer to Figure 3). The participant chose the initial distance between themselves and the box. Additionally, each lifting sequence ended with a twist of 0°, 30° or 60° to the right-hand side of the starting position (Yamauchi and Iwamoto, 2010). Each lifting event was repeated twice.

Identification of nodal points and landmark
OpenCV was used to detect a person within the captured video frame using a heatmap, which helps to isolate the human from other objects. The isolated pose is then superimposed with the nodal points of BlazePose. Next, for the landmarks, a relative position is used to determine the body parts, i.e. x- and y-axis deduction that gives an actual value, which is calculated using OpenCV. In the next step, the landmarks are used to calculate the angular motion of the human body parts and provide an output on the mobile device in real-time. This method tracked 33 human body nodal points (Figure 2), rendering landmarks and background segmentation. The landmarks help to identify the location of the human body within the video image, and background segmentation helps to isolate the human body from other objects in the work environment. Figure 4 shows the human landmark detection and angle calculation process flow.
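The step from a relative landmark position to an actual x and y value can be sketched as follows. MediaPipe Pose reports landmark x and y normalised to the range 0 to 1 by the frame width and height, so multiplying by the frame size yields pixel coordinates. The helper names below are hypothetical and this is only a minimal sketch, not the authors' implementation; the frame size in the example is the recording resolution stated in the data acquisition subsection.

```python
import math


def to_pixels(norm_x, norm_y, frame_width, frame_height):
    """Convert a normalised landmark (0..1) to integer pixel coordinates.

    Values slightly outside 0..1 (a landmark predicted just off-frame)
    are clamped to the frame border.
    """
    px = min(max(int(norm_x * frame_width), 0), frame_width - 1)
    py = min(max(int(norm_y * frame_height), 0), frame_height - 1)
    return px, py


def pixel_distance(p, q):
    """Euclidean distance in pixels between two (x, y) points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


# Example with a 1220 x 960 frame.
shoulder = to_pixels(0.50, 0.25, 1220, 960)
hip = to_pixels(0.50, 0.60, 1220, 960)
print(shoulder, hip, pixel_distance(shoulder, hip))
```

The same distance helper covers items such as the distance between nodal points and the hip-shoulder length, which can then be compared across frames.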

Pose estimation using nodal points
The next step is pose estimation from the video frames or photos. This novel method of combining BlazePose and OpenCV explores the consistency of three main features: converting images to raw blue, green and red (RGB) samples based on a heatmap, 3D pose detection, and angle calculation. The novel method for human pose deduction using a video frame or photo involves three models that work in conjunction: (1) a detector using a heat-mapping principle on images captured using cameras, (2) the location of the human body associated with a region of interest (ROI), and (3) the angle of the given nodal point of the human body.
The workflow is shown in Figure 5.

The angle between the nodal points in a frame
The next step is the initial angle calculation. The camera alignment used to detect the view of the human pose is used to measure the angle between the nodal points. The digital architecture (workflow) is given below in Figure 6. Similarly, the angle between the left shoulder and left elbow is $\theta = \tan^{-1}\left(\frac{y_{11} - y_{15}}{x_{11} - x_{15}}\right)$. Further, to find the angle between the left shoulder and left wrist, keeping the left elbow as the pivotal point, the angle between the left shoulder and left elbow and the angle between the left elbow and left wrist were added.
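The arctangent calculation above can be sketched in a few lines. This is an illustrative sketch only: it swaps the plain arctangent of the ratio for `math.atan2`, which gives the same value when the x difference is positive but also handles vertical segments, and the function name `segment_angle` and the sample coordinates are hypothetical.

```python
import math


def segment_angle(p_from, p_to):
    """Angle in degrees of the line p_from -> p_to, measured from the x-axis."""
    dx = p_to[0] - p_from[0]
    dy = p_to[1] - p_from[1]
    # atan2 avoids a division by zero when the two points share the same x value.
    return math.degrees(math.atan2(dy, dx))


# Hypothetical pixel coordinates of two nodal points.
# Note: in image coordinates the y value grows downward.
print(segment_angle((100, 100), (200, 200)))
```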

The angle between the neck and torso lines and the human body axis in a frame
The angle between the human shoulders and hip is challenging to detect using BlazePose and OpenCV because these provide no practical neckline in anatomical detection. Therefore, this research used multiple methods to generate the angles of the neck and hip area. For the neckline, the shoulder nodal points were used as the pivotal point. Similarly, the torso line connects the hip and the shoulder, where the hip is considered the pivotal point. The inclination angle is then used to determine whether the person is bending beyond a threshold angle.
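One way to sketch the inclination check is to measure the angle between a body line and the vertical axis through its pivotal point using the dot product of two vectors. The sketch below is an assumption-laden illustration, not the SVBM code: the function names, the sample coordinates and the 20-degree threshold are all hypothetical, and image coordinates are assumed, so "up" is the negative y direction.

```python
import math


def inclination_angle(pivot, end):
    """Angle in degrees between the line pivot -> end and the vertical through pivot."""
    # Vector from the pivotal point (e.g. hip) to the end point (e.g. shoulder).
    vx, vy = end[0] - pivot[0], end[1] - pivot[1]
    length = math.hypot(vx, vy)
    if length == 0:
        raise ValueError("pivot and end point coincide")
    # Dot product with the unit vector pointing straight up, (0, -1) in image coordinates.
    cos_theta = (vx * 0.0 + vy * -1.0) / length
    # Guard against rounding slightly outside the valid arccos domain.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


def exceeds_threshold(pivot, end, threshold_deg):
    """True when the body line leans further from vertical than the threshold."""
    return inclination_angle(pivot, end) > threshold_deg


# Hypothetical hip and shoulder pixel positions and a hypothetical 20-degree threshold.
hip, shoulder = (100, 200), (200, 100)
print(inclination_angle(hip, shoulder), exceeds_threshold(hip, shoulder, 20.0))
```

The same function serves the neckline by passing the shoulder as the pivot and the eye as the end point.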
Taking the neckline as a base, the points are $P_1(x_1, y_1)$ (shoulder), $P_2(x_2, y_2)$ (eye), and $P_3(x_3, y_3)$ (any point on the vertical axis passing through $P_1$). The vector approach was used to find the inner angle of the three points. The angle between the two vectors $\vec{P_{12}}$ and $\vec{P_{13}}$ is given by

$$\theta = \arccos\left(\frac{\vec{P_{12}} \cdot \vec{P_{13}}}{|\vec{P_{12}}|\,|\vec{P_{13}}|}\right)$$

Solving this expression gives $\theta$.

The angle and movement of nodal points over a given period
Next, the angle and movement of nodal points at a given time are calculated using the segregation of video frames and analysis of pixels. Following the IBM description of the CNN architecture (IBM, n.d.), three-dimensional data were used for the image classification and object recognition tasks, as shown in Figure 9. The distance, angle, and distance-angle relation between frames give the relative contraction or stretch over time. Here, $x_i^D(t-1)$ is the position of an agent, the agent is defined by $i$, $D$ is the search space dimension, and $t$ is the iteration time of the process. The prediction of the pose location was enhanced with these equations. As part of the implementation, the pooling layer reduces the size of the feature maps produced, which is a necessary part of extracting the human body pose detection features. The pooling window size and the stride are hyperparameters that can be adjusted to change the size of the output feature map. The input can also be zero-padded to maintain the exact size of the input feature map.
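The effect of the pooling window, stride and zero-padding on the feature map size can be illustrated with the standard output-size formula. This is a minimal sketch under stated assumptions: the source does not give the layer sizes used, so the 224-wide feature map and the function name `pooled_size` are hypothetical.

```python
def pooled_size(input_size, window, stride, padding=0):
    """Output length along one dimension after a pooling layer."""
    padded = input_size + 2 * padding
    if window > padded:
        raise ValueError("pooling window is larger than the padded input")
    # Standard formula: floor((n + 2p - k) / s) + 1
    return (padded - window) // stride + 1


# Hypothetical 224-wide feature map.
print(pooled_size(224, window=2, stride=2))             # halves the width
print(pooled_size(224, window=3, stride=1, padding=1))  # padding keeps the width
```

A larger stride shrinks the output faster, while a stride of 1 with suitable padding leaves the size unchanged.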
Equations (3) and (4) work as search agents to enhance the measuring point accuracy for the prediction used to calculate the angles of the detection locations. Then the difference between the initial angle and the angle at a given time (video frame) was used to calculate the bending movement of a particular nodal point (joint or human position), which was subsequently used to assess the relative pose and associated errors.
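The bending movement described above, taken as the difference between the initial angle and the angle in each later frame, can be sketched as follows. The per-frame torso angles are made-up illustrative values, not measurements from the study, and the function name is hypothetical.

```python
def bending_movement(angles_deg):
    """Change in angle of one nodal point in each frame, relative to the first frame."""
    if not angles_deg:
        return []
    initial = angles_deg[0]
    return [angle - initial for angle in angles_deg]


# Hypothetical torso inclination angles (degrees) across five frames of one lift.
torso_angles = [5.0, 12.0, 31.0, 48.0, 20.0]
movement = bending_movement(torso_angles)
largest = max(movement, key=abs)
print(movement, largest)
```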

The distance between nodal points over a given period
Another major factor in determining the extent of spine bending during manual lifting is the change in the distance between the shoulder and hip nodal points. The distance is used to measure the offset between two points. The fixed nodal points were the hip, eyes, and shoulder, as these points are always more or less symmetric about the central axis of the human body. With this assumption, the alignment features are incorporated as follows. Let the initial video frame be F1 and the last frame after the completion of the lift be Fn. From Figure 8, the nodal points of the left shoulder and the hip are P12 and P23, respectively.

The hip-shoulder length change over a given period
To calculate the change between the initial and final left shoulder-hip distances, first calculate the left shoulder-to-hip distance in F1, then calculate the left shoulder-to-hip distance in Fn. The difference between the two gives the change in the hip-shoulder length.
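The two distance steps and their difference can be sketched as below, assuming the straight-line (Euclidean) distance between the shoulder and hip pixel positions in each frame. The coordinates, function names and the sign convention (final minus initial) are hypothetical choices for illustration.

```python
import math


def point_distance(p, q):
    """Euclidean distance between two nodal points given as (x, y) pixels."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def hip_shoulder_change(shoulder_f1, hip_f1, shoulder_fn, hip_fn):
    """Change in shoulder-hip distance from frame F1 to frame Fn (final minus initial)."""
    initial = point_distance(shoulder_f1, hip_f1)
    final = point_distance(shoulder_fn, hip_fn)
    return final - initial


# Hypothetical pixel positions: upright in F1, bent forward in Fn.
change = hip_shoulder_change((0, 0), (0, 100), (40, 70), (0, 100))
print(change)
```

A negative value indicates that the shoulder and hip appear closer together in the final frame. Because the result is in pixels, comparisons are only meaningful between frames captured at the same camera distance.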

Smart vision-based analysis
Since the measurements are based on an individual's real work-life video, the results are customised to the individual. No specific methodology was employed for training the model since the research proposed capturing real-life experience; the model was advised to lift and turn at his own comfort. Though the study used a male model, owing to the use of BlazePose and CNN, the dataset's size, diversity, and representativeness can be equated to all genders and sizes when captured as full-body visuals. The calculated angles and human poses were validated using REBA (Rapid Entire Body Assessment). This evaluation method considers human body postures, movements, and actions. The images in Figures 3-12 were captured using the mobile camera, and the 33 nodal points and angles shown are those projected in real time with the aid of AI and mobile applications. Using mobile cameras and the application helps capture and display angles and other data in real time, in natural construction environments, and supports instant pose correction and training. The application can display angles to 1° (the AI program can be altered to be more precise if required) and isolate the backgrounds to capture the human pose. These features aid in capturing human poses in natural construction work environments without pre-settings.

Results
The visual and quantitative results from the experimental captured data and the HMDB data set are given in this section.

SVBM accuracy and low light intensity test
Figure 11 shows the capability of the SVBM to display angle variation to a minimum accuracy of 1° (refer to the foot). Figure 12 shows the ability of the SVBM to process video frames and images with low light intensity, using heatmap and segmentation.
Figure 12 also displays the SVBM's ability to isolate the backgrounds and process the angle of different nodal points.

SVBM validation using HMDB dataset
Figures 13 and 14 show the SVBM's ability to process recorded videos. The images are from video clips of the HMDB dataset. The HMDB dataset is an extensive, publicly available human motion capture dataset for human motion analysis, recognition, and understanding research. It contains over 3,600 video clips of human actions, with more than 50 action categories, such as walking, running, jumping, and dancing. The videos were captured in various settings, such as indoor and outdoor scenes, and were recorded with multiple cameras to capture different viewpoints. The HMDB dataset has been widely used as a benchmark for evaluating the performance of human action recognition algorithms, and many state-of-the-art methods have been developed using this dataset. Each video clip in the HMDB dataset is labelled with the action category it represents. The dataset also includes information about each video's camera viewpoint, frame rate, and resolution.

SVBM repeatability test
Figure 15 shows the human body angle variations in the experimental data at a fixed time of repeated lifting. The time was set at 3 s from the start of the manual handling, and the angles of torso inclination, neck, elbows, knees and ankles were plotted. Such analysis helps understand the pose variation at a given point during repetitive lifting. Figure 16 shows the study of human body angle variations at a fixed time of repeated lifting using HMDB data.
The SVBM was validated for accuracy of angle, repeatability of angle results and ability to show the angle over a period. The accuracy of the calculated angle was 1°. The repeatability was tested by running the program 32 times on a single video clip that lasted 9 s. The video was recorded at 30 frames/s and had 293 frames. Four nodal points and two line angles (left hip, right hip, left knee, right knee, torso inclination, and neck inclination) were plotted for each frame of the run. The results were identical for each frame on every run. This meant that the program could precisely return results to an accuracy of 1°. The program could plot nodal points and inclination lines for each frame, demonstrating its ability to acquire data over time. Figure 17 shows the angle movement of a single video clip.
(1) The accuracy score is a performance metric that measures the overall accuracy of a classification model. It is calculated as the proportion of correct predictions (true positives and negatives) out of the total number of predictions.
(2) The precision score is a performance metric that measures the proportion of true positive predictions among all positive predictions. It is calculated as the number of true positives divided by the sum of true and false positives.
The accuracy, precision and recall scores were plotted against the true positive rate (TPR) and false positive rate (FPR).
(1) TPR, also known as sensitivity, is the proportion of actual positive cases correctly identified as positive by a classification model. It is calculated as the number of true positives divided by the sum of true positives and false negatives.
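The scores defined above follow directly from the four confusion-matrix counts. The sketch below uses the standard definitions, including the usual false positive rate (false positives divided by the sum of false positives and true negatives), which the text names but does not define. The counts are invented for illustration and are not results from this study.

```python
def classification_scores(tp, fp, tn, fn):
    """Accuracy, precision, TPR (recall) and FPR from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Return 0.0 instead of dividing by zero when a class is absent.
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "tpr": ratio(tp, tp + fn),  # sensitivity / recall
        "fpr": ratio(fp, fp + tn),
    }


# Hypothetical counts for illustration only.
scores = classification_scores(tp=40, fp=10, tn=45, fn=5)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```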

Discussion
WMSD in workers caused by repetitive tasks and lifting heavy objects is a significant health and safety concern across all countries reviewed (New Zealand, Australia, the United States, China and the United Kingdom). The primary responsibility for employee care lies with the employer, with health and safety regulators providing guidelines on mitigating injury from manual handling. However, Poitras et al. (2019) state that the current techniques used for risk assessment, such as questionnaires and guideline handbooks, are subjective. Subjective analysis gives varied and generalised results over time. Since manual lifting-related WMSD is individualistic, the risk assessment must be individual, objective, and reliable (Singh et al., 2014). WT, such as exoskeletons and IMUs, offers employees and workers personalised quantitative data and assistance to reduce the risk of injury from lifting heavy objects. Despite research and statistics showing the benefits of WT, there is still a lack of adoption within the industry due to inconvenience, trust, and privacy concerns. This research aimed to eliminate the inconvenience caused by attachments to the body using SVBM. Considering the potential benefits of SVBM and the advancement of digital security and reliability, trust and privacy concerns will reduce over time. However, trust and privacy concerns currently remain at large. With the SVBM, there is potentially a constant watch on the individual. However, the potential benefits of SVBM could offset this concern. Unlike WT, SVBM does not require attachments to the body, thus reducing technology-related H&S concerns to a greater extent. Since the SVBM is compatible with a mobile phone application, workers can video-record their own manual lifting and do self-assessments, which is impossible with manual training, WT and lab-based technologies.
SVBM offers quantitative data that is close to that of WT. However, lab-based experiments generate more accurate data. The prime disadvantage of lab-based experiments is that they are not conducted in a work environment, which is primarily dynamic in the construction sector. The SVBM, using a heat map and segmentation, offers quantitative data in real work environments comparable to lab-based results. Unlike WT and lab-based technologies, which use close-to-body sensors, the vision can be captured using mobile and installed cameras from close and long ranges. Unlike the lab-based vision technologies that use multiple cameras installed at specific angles, the SVBM offers analysis and results from a single mobile or installed camera. The flexibility of placing the camera at any angle is available with SVBM, unlike lab-based technologies. However, similar to lab-based technologies, the results are provided both live and after post-processing.
Though numerous articles combine computer vision with human pose estimation, the literature survey identified a lack of real-time human pose deduction using single-camera mobile or embedded devices that can be used on construction sites. The SVBM is a real-time neural network vision-based system for efficient human pose estimation that uses a single-camera mobile application and is cost-effective and accurate. The proposed SVBM application can be adopted on mobile and camera-embedded devices, which can be used at workplaces for real-time human pose analysis. SVBM, an AI program, correlates the angles, neckline and torso line using computer vision (recorded and live videos using mobile and embedded cameras), which aids manual lifting human pose deduction, analysis, and training in the construction sector. Unlike other vision systems that use high-quality images, SVBM can analyse low-intensity images and display angles to an accuracy of 1° in real time. The accuracy of the angle can be improved if required. Existing computer vision-based analysis uses up to 17 nodal points to calculate the human pose.

In contrast, SVBM gathers data on 33 critical nodal points of the human body in real work situations and calculates the body part angles with respect to the x-axis and y-axis and the difference in angles over a period of time. This provides greater analysis accuracy and reliability. The survey also revealed that existing computer vision-based applications do not consider the combination of angles, neckline, and torso line for manual handling pose deduction and analysis in an actual construction work environment. In SVBM, video capturing can be done in most work environments; however, the view of the worker must not be blocked. WT could be disturbed at times due to the type of sensors used. For example, infrared sensors need a clear transmission path, and RFID has a limited range over which data can be transmitted. The SVBM, which isolates the human through frame segmentation and heatmaps, can handle most work-related backgrounds. However, it also has a range limitation; the data interpretation is more accurate when the range is closer. Previously, the advanced motion-capture technology of mobile cameras was used to make ergonomic assessments to reduce musculoskeletal disorders (Schulz, 2021) through post-lab processing. This research is novel because it provides quantitative data capturing, real-time analysis, and visual capturing in actual work environments. This research demonstrated that, by combining the BlazePose and OpenCV platforms, more nodal points can be added.
Further, using heat maps, segmentation and model calculations, the stretch and contraction of the spinal cord can be deduced and added in future. The ease of data capturing using a mobile allows actual data to be compared frequently for long-term effect analysis. The SVBM aids frequent real-life recording through mobile or camera-embedded devices and record keeping that can be used for training and treatment. Furthermore, SVBM video analysis can be used for monitoring recovery, and it can monitor vulnerable workplace jobs and affected people in real environments. The SVBM considers all aspects of NIOSH's equation except the weight lifted. According to NIOSH's equation for calculating the Recommended Weight Limit (RWL), seven factors are critical to manual lifting (Choi et al., 2012; Singh et al., 2014; VelocityEHS, 2020). Further, like WT and lab-based technologies, the SVBM does not consider psychological factors (Khalaf et al., 2021) or the operator and environmental variables stated by Drury and Pfeil (1975), as these factors are qualitative and subjective.
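For reference, the seven factors of the NIOSH equation mentioned above can be combined as in the sketch below. It assumes the metric form of the revised NIOSH lifting equation (a 23 kg load constant multiplied by horizontal, vertical, distance, asymmetry, frequency and coupling multipliers). The frequency and coupling multipliers normally come from lookup tables, so they are passed in directly here, and all input values are hypothetical rather than measurements from this study.

```python
def recommended_weight_limit(h_cm, v_cm, d_cm, a_deg, fm=1.0, cm=1.0):
    """Recommended Weight Limit (kg) from the revised NIOSH lifting equation (metric form)."""
    if h_cm <= 0 or d_cm <= 0:
        raise ValueError("horizontal and travel distances must be positive")
    load_constant = 23.0                   # LC, in kg
    hm = min(1.0, 25.0 / h_cm)             # horizontal multiplier
    vm = 1.0 - 0.003 * abs(v_cm - 75.0)    # vertical multiplier
    dm = min(1.0, 0.82 + 4.5 / d_cm)       # distance multiplier
    am = 1.0 - 0.0032 * a_deg              # asymmetry multiplier
    # fm (frequency) and cm (coupling) are table values supplied by the caller.
    return load_constant * hm * vm * dm * am * fm * cm


# Hypothetical lift geometry, evaluated at three hypothetical asymmetry angles.
for twist in (0, 30, 60):
    print(twist, round(recommended_weight_limit(25, 75, 25, twist), 2))
```

The horizontal, vertical, travel-distance and asymmetry inputs are geometric quantities of the kind a vision-based system can measure, whereas the load itself has to be supplied separately.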

Conclusion
WMSD in workers caused by repetitive tasks and lifting heavy objects is a significant health and safety concern in New Zealand, Australia, the United States, China and the United Kingdom. The primary health and safety responsibility lies with the employer, with regulators providing guidelines to mitigate injury from manual handling. Researchers in the past have used various techniques for risk assessment, such as questionnaires and providing training and guideline handbooks, that are subjective. In the recent digital era, WT, such as exoskeletons and IMUs, offers employees and workers personalised quantitative data and assistance to reduce the risk of injury from lifting heavy objects. To a reasonable extent, WT is used for training and pose correction during manual lifting in laboratories and workplaces. Irrespective of its benefits, WT lacks adoption within the industry due to attachment to the body, trust, and privacy. Attachments hinder operations, and the human pose must be adjusted due to the extensions.
Moreover, with attachments, the workers might find it difficult to work all day. In this research, using the novel SVBM and with the ease of using the application on mobile devices, the authors established that attachment to the body could be redundant for training and pose correction during manual lifting. This reduces the health and safety risks of attachments of the WT. Further, the SVBM research highlights using commonly available computer vision-based systems such as mobile cameras and AI-based applications.

SASBE
Furthermore, the SVBM's real-time and offline analysis capabilities, low-intensity vision compatibility, and background isolation method are also discussed in this article, ensuring its use in natural work-life environments. SVBM gathers data on 33 critical nodal points of the human body in real work situations and calculates the body part angles with respect to the x-axis and y-axis and the difference in angles over a period of time. The novelty of including the combination of angles, neckline, and torso line for manual handling pose deduction and analysis in an actual construction work environment, which existing computer vision-based applications do not consider, helps in real-time analysis that is more accurate and reliable. The offline analysis additionally yields a change in hip-shoulder distance that can be used to calculate the arc of the spinal cord. Since the measurements are based on an individual's real work-life video, the results are customised to that individual. This would help measure the performance of the individual over a period and provide information on the change in pose pattern over the long run, which can be used for diagnosis, training, and prescribing recovery.
In this paper, we have demonstrated the feasibility of SVBM for worker pose detection and measuring operations live in real work-life situations using single-camera mobile and embedded devices. The SVBM can provide individualistic data useful for analysing individuals' health due to repetitive tasks. The practical uses of SVBM include training, pose estimation, pose variation analysis, and posture analysis concerning actual work environments in real time. The theoretical implications include mimicking the human pose and lab-based analysis without attaching sensors that naturally alter the working poses. This would help researchers develop more accurate data and theoretical models close to actuals. The critical limitation, as with WT, is that trust, privacy and psychological issues are not addressed in SVBM, which is acknowledged. However, the benefits of SVBM naturally offset this limitation for practical adoption. Future research could focus on adding more nodal points to the spinal cord to get a direct output on overstretch or contraction.
Concluding, SVBM has the advantage of capturing the required data without interrupting the everyday working styles in natural work-life settings. With no wearable item, workers can perform activities and capture data. Instant analysis results can be obtained through mobile applications. SVBM also supports the analysis of recorded clips, and high-resolution and high-zoom cameras can capture video from a distance. Detailed analysis is possible offline.
refer to 350 articles between 2000 and 2006 that brief the initial vision-based works in this field. Researchers used a variety of methods for vision-based human pose deduction. For example, Jain et al. (2015) used red, green, and blue colour components for each pixel, motion features and a convolutional network architecture to deduce human body pose in video. Xu et al. (2023) used 17 body points and multiple surveillance cameras for offline abnormal human-posture recognition analysis. Dang et al. (2019) comprehensively surveyed sensor- or vision-based human activity recognition. Their survey of 64 papers identified single- and multi-person pipeline analysis, heat map analysis and CNN analysis for various applications using 5-17 critical nodal human body points. Zheng et al.'s (2020) survey of 309 articles acknowledges offline action recognition, prediction, detection, and tracking as key outputs of vision-based systems that are handicapped by video resolution, body deformation compared to standard models and the number of parameters. Hellsten et al. (2021) discuss the literature on the potential of computer vision-based marker-less human motion analysis for rehabilitation.
Figure 1. The method

Figure 2. The 33 pose landmarks of the human body (adopted from MediaPipe Pose)

Figure 3. The dataset of the human pose of lifting one and two boxes

Figure 4. The process flow of human landmark detection

Figure 5. The workflow of the calculation of the angle

Figure 7. The angle using three nodal points

Let the left shoulder in frame F1 be (x1, y1) and in frame Fn be (xn, yn). Then, the left shoulder movement distance is C = √(F1P12(x1) -

Figure 8. The measurement of neckline and torso inclination

Figure 9. The CNN classifier for human body pose detection and angle calculation

Figure 10. Captured video frames, real-time superimposed nodal points and offline analysis

Figure 14. Analysis of HMDB dataset 2

Figure 16. The analysis of human body angles of the HMDB dataset

Figure 17. Nodal points movement