Search results

1 – 10 of over 2000
Open Access
Article
Publication date: 25 January 2024

Atef Gharbi

The purpose of the paper is to propose and demonstrate a novel approach for addressing the challenges of path planning and obstacle avoidance in the context of mobile robots (MR)…

Abstract

Purpose

The purpose of the paper is to propose and demonstrate a novel approach for addressing the challenges of path planning and obstacle avoidance in the context of mobile robots (MR). The specific objectives outlined in the paper are: introducing a new methodology, dynamic reward-enhanced Q-learning (DRQL), that combines Q-learning with dynamic reward to improve the efficiency of path planning and obstacle avoidance; enhancing the navigation of MR through unfamiliar environments by reducing blind exploration and accelerating convergence to optimal solutions; and demonstrating through simulation results that DRQL outperforms existing approaches, converging to an optimal action strategy more efficiently, in less time, and improving path exploration with fewer steps and higher average rewards.

Design/methodology/approach

The design adopted in this paper to achieve its purposes involves the following key components: (1) Combination of Q-learning and dynamic reward: the paper’s design integrates Q-learning, a popular reinforcement learning technique, with dynamic reward mechanisms. This combination forms the foundation of the approach. Q-learning is used to learn and update the robot’s action-value function, while dynamic rewards are introduced to guide the robot’s actions effectively. (2) Data accumulation during navigation: when an MR navigates through an unfamiliar environment, it accumulates experience data. This data collection is a crucial part of the design, as it enables the robot to learn from its interactions with the environment. (3) Dynamic reward integration: dynamic reward mechanisms are integrated into the Q-learning process. These mechanisms provide feedback to the robot based on its actions, guiding it to make decisions that lead to better outcomes. Dynamic rewards help reduce blind exploration, which can be time-consuming and inefficient, and promote faster convergence to optimal solutions. (4) Simulation-based evaluation: to assess the effectiveness of the proposed approach, the design includes a simulation-based evaluation. This evaluation uses simulated environments and scenarios to test the performance of the DRQL method. (5) Performance metrics: the design incorporates performance metrics to measure the success of the approach. These metrics likely include measures of convergence speed, exploration efficiency, the number of steps taken and the average rewards obtained during the robot’s navigation.
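To make the design concrete, the following is a minimal sketch of the general idea: a standard tabular Q-learning update driven by a dense, distance-based shaping reward rather than a sparse goal-only reward. The grid world, reward values and hyperparameters are illustrative assumptions, not the paper's actual DRQL formulation.

# Toy Q-learning with a dynamic, distance-based shaping reward (all values assumed).
import random

GRID = 5                                        # 5 x 5 grid world
GOAL = (4, 4)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]    # right, left, down, up
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1              # learning rate, discount, exploration

Q = {}                                          # (state, action) -> value

def dist(s):                                    # Manhattan distance to the goal
    return abs(s[0] - GOAL[0]) + abs(s[1] - GOAL[1])

def dynamic_reward(s, nxt):
    """Dense shaping: reward moves that bring the robot closer to the goal."""
    if nxt == GOAL:
        return 100.0
    return 1.0 if dist(nxt) < dist(s) else -1.0

def step(s, a):
    dx, dy = ACTIONS[a]
    nxt = (min(max(s[0] + dx, 0), GRID - 1), min(max(s[1] + dy, 0), GRID - 1))
    return nxt, dynamic_reward(s, nxt)

for episode in range(300):
    s = (0, 0)
    while s != GOAL:
        if random.random() < EPS:               # epsilon-greedy action choice
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda b: Q.get((s, b), 0.0))
        nxt, r = step(s, a)
        best = max(Q.get((nxt, b), 0.0) for b in range(len(ACTIONS)))
        old = Q.get((s, a), 0.0)
        Q[(s, a)] = old + ALPHA * (r + GAMMA * best - old)  # Q-learning update
        s = nxt

Swapping dynamic_reward for a goal-only reward recovers plain Q-learning, which typically needs many more episodes of blind exploration before useful values propagate back from the goal.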

Findings

The findings of the paper can be summarized as follows: (1) Efficient path planning and obstacle avoidance: the paper’s proposed approach, DRQL, leads to more efficient path planning and obstacle avoidance for MR. This is achieved through the combination of Q-learning and dynamic reward mechanisms, which guide the robot’s actions effectively. (2) Faster convergence to optimal solutions: DRQL accelerates the convergence of the MR to optimal action strategies. Dynamic rewards reduce the need for blind exploration, which typically consumes time, resulting in quicker attainment of optimal solutions. (3) Reduced exploration time: the integration of dynamic reward mechanisms significantly reduces the time required for exploration during navigation. This reduction in exploration time contributes to more efficient and quicker path planning. (4) Improved path exploration: the simulation results indicate that the DRQL method leads to improved path exploration in unknown environments. The robot takes fewer steps to reach its destination, which is a crucial indicator of efficiency. (5) Higher average rewards: the paper’s findings reveal that MR using DRQL receive higher average rewards during their navigation, suggesting that the proposed approach results in better decision-making and more successful navigation.

Originality/value

The paper’s originality stems from its unique combination of Q-learning and dynamic rewards, its focus on efficiency and speed in MR navigation and its ability to enhance path exploration and average rewards. These original contributions have the potential to advance the field of mobile robotics by addressing critical challenges in path planning and obstacle avoidance.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Article
Publication date: 16 June 2023

Chirag Suresh Sakhare, Sayan Chakraborty, Sarada Prasad Sarmah and Vijay Singh

Original equipment manufacturers and other manufacturing companies rely on the delivery performance of their upstream suppliers to maintain a steady production process. However…

Abstract

Purpose

Original equipment manufacturers and other manufacturing companies rely on the delivery performance of their upstream suppliers to maintain a steady production process. However, supplier capacity uncertainty and delayed deliveries often pose a major concern for manufacturers trying to carry out their production plans on the desired schedules. The purpose of this paper is to develop a decision model that can improve the delivery performance of suppliers by minimising fluctuations in the supply quantity and the delivery time, thus maximising the performance of the supply chain.

Design/methodology/approach

The authors studied a single-manufacturer, single-supplier supply chain considering the supplier’s uncertain capacity allocation and uncertain time of delivery. Mathematical models are developed to capture the expected profits of the manufacturer and the supplier under this uncertain allocation and delivery behaviour. A reward–penalty mechanism is proposed to minimise fluctuations in the supplier’s delivery quantity and time of delivery. Further, an order-fulfilment heuristic based on delivery probability is developed to modify the order quantity, maximising the probability of a successful delivery from the supplier.
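To illustrate how such a reward–penalty contract can be expressed, the sketch below computes a supplier's payoff with a bonus for on-time, in-full delivery and penalties that grow with the quantity shortfall and the delay. The functional form and every parameter value are assumptions; the paper's actual mechanism is not given in the abstract.

# Minimal sketch of a reward-penalty contract on supplier delivery (assumed form).

def supplier_payoff(q_ordered, q_delivered, t_promised, t_actual,
                    unit_price=10.0, reward_rate=0.5, qty_penalty=2.0,
                    delay_penalty=1.5):
    """Supplier revenue plus a reward for on-time, in-full delivery,
    minus penalties that grow with quantity shortfall and lateness."""
    revenue = unit_price * q_delivered
    shortfall = max(q_ordered - q_delivered, 0)
    delay = max(t_actual - t_promised, 0)
    if shortfall == 0 and delay == 0:
        return revenue + reward_rate * q_ordered      # bonus for consistency
    return revenue - qty_penalty * shortfall - delay_penalty * delay

# Example: ordering 100 units due on day 10.
print(supplier_payoff(100, 100, 10, 10))   # full, on time  -> 1050.0
print(supplier_payoff(100, 90, 10, 12))    # short and late -> 877.0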

Findings

Analytical results reveal that the proposed reward–penalty mechanism improves the supplier’s delivery consistency. This consistent delivery performance helps the manufacturer maintain a steady production schedule and a high market share. The modified ordering schedule developed using the proposed probability-based heuristic improves the success probability of deliveries from the supplier.

Practical implications

Practitioners can benefit from the findings of this study to comprehend how contracts and ordering policy can improve the supplier delivery performance in a manufacturing supply chain.

Originality/value

This paper improves the supplier delivery performance considering both the uncertain capacity allocation and uncertain time of delivery.

Details

International Journal of Quality & Reliability Management, vol. 41 no. 1
Type: Research Article
ISSN: 0265-671X

Article
Publication date: 16 January 2024

Ji Fang, Vincent C.S. Lee and Haiyan Wang

This paper explores optimal service resource management strategy, a continuous challenge for health information service to enhance service performance, optimise service resource…

Abstract

Purpose

This paper explores optimal service resource management strategy, a continuous challenge for health information service to enhance service performance, optimise service resource utilisation and deliver interactive health information service.

Design/methodology/approach

An adaptive optimal service resource management strategy was developed considering a value co-creation model in health information service, with a focus on collaboration and interaction with users. A deep reinforcement learning algorithm was embedded in the Internet of Things (IoT)-based health information service system (I-HISS) to allocate service resources by controlling service provision and service adaptation based on user engagement behaviour. Simulation experiments were conducted to evaluate the significance of the proposed algorithm under different user reactions to the health information service.
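The abstract does not specify the algorithm, but the flavour of such a deep-RL resource controller can be sketched as follows: a small network maps a user-engagement state to one of two service actions and is updated with a plain temporal-difference rule. The state and action definitions and the DQN-style update are assumptions, not the paper's I-HISS method.

# Sketch of a deep-RL service resource controller (state/action layout assumed).
import torch
import torch.nn as nn

N_STATE = 4    # e.g. engagement level, session length, resource load, feedback (assumed)
N_ACTION = 2   # 0 = provide service resources, 1 = adapt the service (assumed)

qnet = nn.Sequential(nn.Linear(N_STATE, 32), nn.ReLU(), nn.Linear(32, N_ACTION))
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)
gamma = 0.99

def td_update(s, a, r, s_next):
    """One temporal-difference step: move Q(s, a) toward r + gamma * max Q(s', .)."""
    q = qnet(s)[a]
    with torch.no_grad():
        target = r + gamma * qnet(s_next).max()
    loss = (q - target) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()

# One illustrative transition: action "provide", positive revenue-plus-benefit reward.
td_update(torch.rand(N_STATE), 0, 1.0, torch.rand(N_STATE))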

Findings

The results indicate that the proposed service resource management strategy, considering user co-creation in the service delivery process, improved both the service provider’s business revenue and users' individual benefits.

Practical implications

The findings may facilitate the design and implementation of health information services that can achieve a high user service experience with low service operation costs.

Originality/value

This study is amongst the first to propose a service resource management model in I-HISS, considering the value co-creation of the user in the service-dominant logic. The novel artificial intelligence algorithm is developed using the deep reinforcement learning method to learn the adaptive service resource management strategy. The results emphasise user engagement in the health information service process.

Details

Industrial Management & Data Systems, vol. 124 no. 3
Type: Research Article
ISSN: 0263-5577

Article
Publication date: 12 April 2024

Youwei Li and Jian Qu

The purpose of this research is to achieve multi-task autonomous driving by adjusting the network architecture of the model. Meanwhile, after achieving multi-task autonomous…

Abstract

Purpose

The purpose of this research is to achieve multi-task autonomous driving by adjusting the network architecture of the model. Meanwhile, after achieving multi-task autonomous driving, the authors found that the trained neural network model performs poorly in untrained scenarios. Therefore, the authors proposed to improve the transfer efficiency of the model for new scenarios through transfer learning.

Design/methodology/approach

First, the authors achieved multi-task autonomous driving by training a model combining a convolutional neural network with differently structured long short-term memory (LSTM) layers. Second, the authors achieved fast transfer of neural network models to new scenarios through cross-model transfer learning. Finally, the authors combined data collection and data labeling to improve the efficiency of deep learning. Furthermore, the authors verified that the model has good robustness through light and shadow tests.
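A rough sketch of the kind of architecture described, per-frame convolutional features fed into a bidirectional LSTM with separate heads for steering, acceleration and sign recognition, might look as follows. All layer sizes, the input resolution and the output heads are assumptions; the authors' UniBiCLSTM is not specified in the abstract.

# Sketch of a CNN + bidirectional-LSTM multi-task driving network (sizes assumed).
import torch
import torch.nn as nn

class CnnBiLstmDriver(nn.Module):
    def __init__(self, n_classes=3, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(                      # per-frame feature extractor
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())     # -> 32-d feature per frame
        self.lstm = nn.LSTM(32, hidden, batch_first=True, bidirectional=True)
        self.steer = nn.Linear(2 * hidden, 1)          # steering angle (regression)
        self.speed = nn.Linear(2 * hidden, 1)          # acceleration command
        self.sign = nn.Linear(2 * hidden, n_classes)   # e.g. left/right/no sign

    def forward(self, frames):                         # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)
        last = out[:, -1]                              # features at the last frame
        return self.steer(last), self.speed(last), self.sign(last)

model = CnnBiLstmDriver()
s, v, c = model(torch.rand(2, 8, 3, 64, 64))           # 2 clips of 8 camera frames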

Findings

This research achieved road tracking, real-time acceleration–deceleration, obstacle avoidance and left/right sign recognition. The model proposed by the authors (UniBiCLSTM) outperforms the existing models tested with model cars in terms of autonomous driving performance. Furthermore, the CMTL-UniBiCL-RL model trained by the authors through cross-model transfer learning improves the efficiency of model adaptation to new scenarios. Meanwhile, this research proposed an automatic data annotation method, which can save a quarter of the time required for deep learning.

Originality/value

This research provides novel solutions for achieving multi-task autonomous driving and for transferring neural network models to new scenarios. The experiments were carried out with a single camera, an embedded chip and a scale model car, which is expected to simplify the hardware required for autonomous driving.

Details

Data Technologies and Applications, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 1 April 2024

Tao Pang, Wenwen Xiao, Yilin Liu, Tao Wang, Jie Liu and Mingke Gao

This paper aims to study the agent learning from expert demonstration data while incorporating reinforcement learning (RL), which enables the agent to break through the…

Abstract

Purpose

This paper aims to study agent learning from expert demonstration data while incorporating reinforcement learning (RL), which enables the agent to break through the limitations of the expert demonstration data and reduces the dimensionality of the agent’s exploration space to speed up the training convergence rate.

Design/methodology/approach

First, a decay weight function is set in the objective function of the agent’s training to combine the two types of methods, so that both RL and imitation learning (IL) guide the agent's behavior when the policy is updated. Second, this study designs a coupling utilization method for the demonstration trajectory and the training experience, so that samples from both sources can be combined during the agent’s learning process, improving the data utilization rate and the agent’s learning speed.
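The decay-weight idea can be shown in a few lines: early in training the objective is dominated by the imitation learning term, and the weight decays so that RL gradually takes over. The exponential form and the decay rate are assumptions; the paper's exact function is not given in the abstract.

# Sketch of a decaying IL/RL mixing weight in the training objective (form assumed).
import math

def decay_weight(step, rate=1e-4):
    """w(t) in [0, 1]: 1 at the start (pure imitation), decaying toward 0."""
    return math.exp(-rate * step)

def combined_loss(il_loss, rl_loss, step):
    w = decay_weight(step)
    return w * il_loss + (1.0 - w) * rl_loss

print(combined_loss(0.8, 1.2, step=0))       # 0.8   : all imitation at the start
print(combined_loss(0.8, 1.2, step=50_000))  # ~1.197: almost pure RL late in training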

Findings

The method is superior to other algorithms in terms of convergence speed and decision stability; it avoids training from scratch on reward values alone and breaks through the restrictions imposed by the demonstration data.

Originality/value

The agent can adapt to dynamic scenes through exploration and trial-and-error mechanisms based on the experience of demonstrating trajectories. The demonstration data set used in IL and the experience samples obtained in the process of RL are coupled and used to improve the data utilization efficiency and the generalization ability of the agent.

Details

International Journal of Web Information Systems, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 15 January 2024

Stephen E. Lanivich, Curt Moore and Nancy McIntyre

This study investigates how attention deficit/hyperactivity disorder (ADHD) in entrepreneurs functions through coping schema to affect entrepreneurship-related cognitions. It is…

Abstract

Purpose

This study investigates how attention deficit/hyperactivity disorder (ADHD) in entrepreneurs functions through coping schema to affect entrepreneurship-related cognitions. It is proposed that the resource-induced coping heuristic (RICH) bridges the conceptual gap between pathological cognitive executive control/reward attributes and cognitive resources, specifically entrepreneurial alertness, cognitive adaptability and entrepreneurial intent.

Design/methodology/approach

With data from 581 entrepreneurs, this study utilizes partial least squares structural equation modeling for analysis. Additionally, a two-stage hierarchical component modeling approach was used to estimate latent variable scores for higher-order constructs.
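Mediation of this kind is commonly assessed through a bootstrapped indirect effect a * b. The toy illustration below runs ordinary least squares on synthetic data; it is not the study's PLS-SEM model, and all effect sizes are invented for demonstration.

# Bootstrap of an indirect effect (ADHD -> RICH -> alertness) on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n = 581                                     # sample size matching the study
adhd = rng.normal(size=n)
rich = 0.4 * adhd + rng.normal(size=n)      # mediator (synthetic effect size)
alert = 0.5 * rich + rng.normal(size=n)     # outcome (synthetic effect size)

def indirect(idx):
    """a from M ~ X, b from Y ~ X + M; the indirect effect is a * b."""
    X = np.column_stack([np.ones(n), adhd[idx]])
    a = np.linalg.lstsq(X, rich[idx], rcond=None)[0][1]
    Z = np.column_stack([np.ones(n), adhd[idx], rich[idx]])
    b = np.linalg.lstsq(Z, alert[idx], rcond=None)[0][2]
    return a * b

boot = [indirect(rng.integers(0, n, n)) for _ in range(2000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"bootstrapped indirect effect, 95% CI: [{lo:.3f}, {hi:.3f}]")  # excludes 0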

Findings

Findings indicate the RICH mediates the relationships ADHD has with alertness, cognitive adaptability and entrepreneurial intent.

Originality/value

The RICH is introduced as a mechanism to explain how ADHD indirectly influences entrepreneurial alertness, cognitive adaptability and entrepreneurial intent.

Details

International Journal of Entrepreneurial Behavior & Research, vol. 30 no. 4
Type: Research Article
ISSN: 1355-2554

Article
Publication date: 15 April 2024

Xiaona Wang, Jiahao Chen and Hong Qiao

Limited by the types of sensors, the state information available for musculoskeletal robots with highly redundant, nonlinear muscles is often incomplete, which makes the control…

Abstract

Purpose

Limited by the types of sensors available, the state information for musculoskeletal robots with highly redundant, nonlinear muscles is often incomplete, which creates a bottleneck for control. The aim of this paper is to design a method that improves the motion performance of musculoskeletal robots in partially observable scenarios, and to leverage ontology knowledge to enhance the algorithm’s adaptability to musculoskeletal robots that have undergone changes.

Design/methodology/approach

A memory and attention-based reinforcement learning method is proposed for musculoskeletal robots with prior knowledge of muscle synergies. First, to deal with the partially observed states available to musculoskeletal robots, a memory and attention-based network architecture is proposed for inferring more sufficient and intrinsic states. Second, inspired by the muscle synergy hypothesis in neuroscience, prior knowledge of a musculoskeletal robot’s muscle synergies is embedded in the network structure and the reward shaping.
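A minimal sketch of such a memory and attention-based state encoder follows: a GRU accumulates the history of partial observations and self-attention weighs the stored steps before a latent state is produced for the actor and critic. The 13-dimensional observation matches the harder case in the findings; all other sizes, the plain self-attention layout and the omission of the muscle-synergy priors are assumptions.

# Sketch of a memory + attention encoder for partially observed states (sizes assumed).
import torch
import torch.nn as nn

class MemoryAttentionEncoder(nn.Module):
    def __init__(self, obs_dim=13, hidden=64, n_heads=4):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden, batch_first=True)   # memory over time
        self.attn = nn.MultiheadAttention(hidden, n_heads, batch_first=True)
        self.head = nn.Linear(hidden, hidden)                  # inferred latent state

    def forward(self, obs_seq):              # obs_seq: (B, T, obs_dim)
        mem, _ = self.gru(obs_seq)           # (B, T, hidden)
        ctx, _ = self.attn(mem, mem, mem)    # self-attention over the memory
        return torch.tanh(self.head(ctx[:, -1]))   # state estimate at the last step

enc = MemoryAttentionEncoder()
z = enc(torch.rand(4, 20, 13))               # 20 partial observations of 13 dims

The latent state z would then feed a TD3-style actor-critic as in the paper's comparisons; the synergy-based reward shaping is left out of this sketch.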

Findings

Based on systematic validation, it is found that the proposed method demonstrates superiority over the traditional twin delayed deep deterministic policy gradients (TD3) algorithm. A musculoskeletal robot with highly redundant, nonlinear muscles is adopted to implement goal-directed tasks. In the case of 21-dimensional states, the learning efficiency and accuracy are significantly improved compared with the traditional TD3 algorithm; in the case of 13-dimensional states without velocities and information from the end effector, the traditional TD3 is unable to complete the reaching tasks, while the proposed method breaks through this bottleneck problem.

Originality/value

In this paper, a novel memory and attention-based reinforcement learning method with prior knowledge of muscle synergies is proposed for musculoskeletal robots to deal with partially observable scenarios. Compared with the existing methods, the proposed method effectively improves the performance. Furthermore, this paper promotes the fusion of neuroscience and robotics.

Details

Robotic Intelligence and Automation, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2754-6969

Article
Publication date: 19 March 2024

Mingke Gao, Zhenyu Zhang, Jinyuan Zhang, Shihao Tang, Han Zhang and Tao Pang

Because of the various advantages of reinforcement learning (RL), this study uses RL to train unmanned aerial vehicles to perform two tasks: target search and…

Abstract

Purpose

Because of the various advantages of reinforcement learning (RL), this study uses RL to train unmanned aerial vehicles (UAVs) to perform two tasks: target search and cooperative obstacle avoidance.

Design/methodology/approach

This study draws inspiration from the recurrent state-space model and recurrent models (RPM) to propose a simpler yet highly effective model, the unmanned aerial vehicle prediction model (UAVPM). The main objective is to assist in training the UAV representation model with a recurrent neural network, using the soft actor-critic algorithm.

Findings

This study proposes a generalized actor-critic framework consisting of three modules: representation, policy and value. This architecture serves as the foundation for training UAVPM, which is designed to aid in training the recurrent representation using a transition model, a reward recovery model and an observation recovery model. Unlike traditional approaches that rely solely on reward signals, RPM incorporates temporal information and allows the inclusion of extra knowledge or information from virtual training environments. This study designs UAV target search and UAV cooperative obstacle avoidance tasks, and the algorithm outperforms the baselines in both environments.
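The three auxiliary models named here (transition, reward recovery and observation recovery) can be sketched as heads on a recurrent encoder trained with a joint loss. The GRU encoder, all dimensions and the equal loss weighting are assumptions; the actual UAVPM is not specified in the abstract.

# Sketch of recurrent-representation training with three recovery heads (form assumed).
import torch
import torch.nn as nn

class UAVPMSketch(nn.Module):
    def __init__(self, obs_dim=16, act_dim=4, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + act_dim, hidden, batch_first=True)
        self.transition = nn.Linear(hidden, hidden)   # predict the next latent
        self.reward_head = nn.Linear(hidden, 1)       # recover the reward
        self.obs_head = nn.Linear(hidden, obs_dim)    # recover the observation

    def loss(self, obs, act, rew):          # obs: (B,T,obs_dim), act: (B,T,act_dim)
        h, _ = self.rnn(torch.cat([obs, act], dim=-1))
        pred_next = self.transition(h[:, :-1])        # predict h_{t+1} from h_t
        l_trans = ((pred_next - h[:, 1:].detach()) ** 2).mean()
        l_rew = ((self.reward_head(h).squeeze(-1) - rew) ** 2).mean()
        l_obs = ((self.obs_head(h) - obs) ** 2).mean()
        return l_trans + l_rew + l_obs                # equal weights (assumed)

m = UAVPMSketch()
print(m.loss(torch.rand(2, 10, 16), torch.rand(2, 10, 4), torch.rand(2, 10)))

Consistent with the originality note below, such a model would shape the representation only during training and be dropped at inference.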

Originality/value

It is important to note that UAVPM does not play a role in the inference phase. This means that the representation model and policy remain independent of UAVPM. Consequently, this study can introduce additional “cheating” information from virtual training environments to guide the UAV representation without concerns about its real-world existence. By leveraging historical information more effectively, this study enhances UAVs’ decision-making abilities, thus improving the performance of both tasks at hand.

Details

International Journal of Web Information Systems, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 3 January 2024

Miao Ye, Lin Qiang Huang, Xiao Li Wang, Yong Wang, Qiu Xiang Jiang and Hong Bing Qiu

A cross-domain intelligent software-defined network (SDN) routing method based on a proposed multiagent deep reinforcement learning (MDRL) method is developed.

Abstract

Purpose

A cross-domain intelligent software-defined network (SDN) routing method based on a proposed multiagent deep reinforcement learning (MDRL) method is developed.

Design/methodology/approach

First, the network is divided into multiple subdomains managed by multiple local controllers, and the state information of each subdomain is flexibly obtained by the designed SDN multithreaded network measurement mechanism. Then, a cooperative communication module is designed to realize message transmission and message synchronization between the root and local controllers, and socket technology is used to ensure the reliability and stability of message transmission between multiple controllers to acquire global network state information in real time. Finally, after the optimal intradomain and interdomain routing paths are adaptively generated by the agents in the root and local controllers, a network traffic state prediction mechanism is designed to improve awareness of the cross-domain intelligent routing method and enable the generation of the optimal routing paths in the global network in real time.
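The cooperative-communication component can be illustrated with a bare-bones exchange in which a local controller pushes its subdomain state to the root controller over a TCP socket as JSON. The port, message schema and field names are assumptions; the paper's protocol is not given in the abstract.

# Toy root/local controller exchange over a TCP socket (schema assumed).
import json
import socket
import threading
import time

def root_controller(port=9000):
    """Root controller: accept one subdomain state report from a local controller."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen()
    conn, _ = srv.accept()
    state = json.loads(conn.recv(4096).decode())
    print("root received from domain", state["domain"], ":", state["link_delay_ms"])
    conn.close()
    srv.close()

def local_controller(port=9000):
    """Local controller: measure the subdomain and push its state to the root."""
    time.sleep(0.2)                      # crude wait for the server (demo only)
    cli = socket.create_connection(("127.0.0.1", port))
    report = {"domain": "A", "link_delay_ms": [3, 7, 2], "loss_rate": 0.01}
    cli.sendall(json.dumps(report).encode())
    cli.close()

t = threading.Thread(target=root_controller)
t.start()
local_controller()
t.join()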

Findings

Experimental results show that the proposed cross-domain intelligent routing method can significantly improve the network throughput and reduce the network delay and packet loss rate compared to those of the Dijkstra and open shortest path first (OSPF) routing methods.

Originality/value

Message transmission and message synchronization for multicontroller interdomain routing in SDN have long adaptation times and slow convergence speeds, coupled with the shortcomings of traditional interdomain routing methods, such as cumbersome configuration and inflexible acquisition of network state information. These drawbacks make it difficult to obtain global state information about the network, so the optimal routing decision cannot be made in real time, affecting network performance. The proposed MDRL-based cross-domain intelligent SDN routing method addresses these problems, as described and evaluated above.

Details

International Journal of Intelligent Computing and Cybernetics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1756-378X

Content available
Article
Publication date: 31 January 2023

Fabio Parisi, Valentino Sangiorgio, Nicola Parisi, Agostino M. Mangini, Maria Pia Fanti and Jose M. Adam

Most 3D printing machines do not comply with the requirements of on-site, large-scale multi-story building construction. This paper aims to propose the conceptualization of…

Abstract

Purpose

Most 3D printing machines do not comply with the requirements of on-site, large-scale multi-story building construction. This paper aims to propose the conceptualization of a tower crane (TC)-based 3D printing system controlled by artificial intelligence (AI) as a first step towards large-scale 3D printing for multi-story buildings. It also aims to overcome the most important limitation of additive manufacturing in the construction industry (the build volume) by exploiting the most important machine used in the field: the TC. Finally, it assesses the feasibility of the technology by investigating the accuracy reached in the printing process.

Design/methodology/approach

The research is composed of three main steps: firstly, the TC-based 3D printing concept is defined by proposing an aero-pendulum extruder stabilized by propellers to control the trajectory during the extrusion process; secondly, an AI-based system is defined to control both the crane and the extruder toolpath by exploiting a deep reinforcement learning (DRL) control approach; thirdly, the proposed framework is validated by simulating the dynamical system and analysing its performance.
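To caricature the stabilization problem in one dimension, the sketch below treats the suspended extruder as a damped pendulum driven by propeller thrust toward the ideal trajectory. The dynamics, gains and the PD rule standing in for the trained DRL agent are all assumptions.

# Toy aero-pendulum extruder stabilization loop (dynamics and gains assumed).
import math

DT, G, LENGTH = 0.01, 9.81, 2.0        # time step, gravity, cable length (assumed)

def pendulum_step(theta, omega, thrust):
    """Euler step of a damped pendulum with propeller thrust as control torque."""
    alpha = -(G / LENGTH) * math.sin(theta) - 0.1 * omega + thrust
    omega += alpha * DT
    theta += omega * DT
    return theta, omega

def policy(theta, omega):
    """Stand-in for the trained DRL agent: a PD rule pushing theta to zero."""
    return -8.0 * theta - 2.0 * omega

theta, omega = 0.3, 0.0                # initial swing of the extruder (rad)
for step in range(2000):
    theta, omega = pendulum_step(theta, omega, policy(theta, omega))
print(f"final deviation: {abs(theta):.4f} rad")   # settles near the trajectory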

Findings

The TC-based 3D printer can be effectively used for additive manufacturing in the construction industry, and both the TC and its extruder can be properly controlled by an AI-based control system. The paper shows the effectiveness of the AI-controlled aero-pendulum extruder through simulation and validation: the AI-based control system reaches an acceptable tolerance with respect to the ideal trajectory, compared with the tolerance of the system without stabilization.

Originality/value

In related literature, scientific investigations concerning the use of crane systems for 3D printing and AI-based systems for control are completely missing. To the best of the authors’ knowledge, the proposed research demonstrates for the first time the effectiveness of this technology conceptualized and controlled with an intelligent DRL agent.

Practical implications

The results provide the first step towards the development of a new additive manufacturing system for multi-storey constructions exploiting the TC-based 3D printing. The demonstration of the conceptualization feasibility and the control system opens up new possibilities to activate experimental research for companies and research centres.

Details

Construction Innovation, vol. 24 no. 1
Type: Research Article
ISSN: 1471-4175
