Search results

1 – 10 of 151
Article
Publication date: 18 October 2019

Shuhuan Wen, Xueheng Hu, Zhen Li, Hak Keung Lam, Fuchun Sun and Bin Fang

This paper aims to propose a novel active SLAM framework to realize obstacle avoidance and autonomous navigation in indoor environments.


Abstract

Purpose

This paper aims to propose a novel active SLAM framework to realize obstacle avoidance and autonomous navigation in indoor environments.

Design/methodology/approach

The improved fuzzy optimized Q-learning (FOQL) algorithm is used to solve the robot's obstacle avoidance problem in the environment. To reduce the motion deviation of the robot, a fractional-order controller is designed. The localization of the robot is based on the FastSLAM algorithm.
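
To make the reinforcement learning component concrete, the following is a minimal sketch of tabular Q-learning for obstacle avoidance. It is not the authors' FOQL algorithm: the fuzzy optimization is only approximated by the hypothetical fuzzify() rule that coarsens range readings, and the thresholds, actions and learning constants are assumptions.

```python
import random
from collections import defaultdict

# Illustrative sketch only: plain tabular Q-learning for obstacle avoidance.
# The paper's FOQL additionally uses fuzzy logic to shape states and rewards;
# fuzzify() below is a hypothetical stand-in for that step.

ACTIONS = ["forward", "turn_left", "turn_right"]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

Q = defaultdict(float)  # maps (state, action) -> estimated value

def fuzzify(left_dist, front_dist, right_dist):
    """Map raw range readings (m) to coarse linguistic labels."""
    label = lambda d: "near" if d < 0.5 else "far"
    return (label(left_dist), label(front_dist), label(right_dist))

def choose_action(state):
    if random.random() < EPSILON:                        # explore
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])     # exploit

def update(state, action, reward, next_state):
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```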

Findings

Simulation results of avoiding obstacles using the traditional Q-learning algorithm, the optimized Q-learning algorithm and the FOQL algorithm are compared. The simulation results show that the improved FOQL algorithm has a faster learning speed than the other two algorithms. To verify the simulation results, the FOQL algorithm is implemented on a NAO robot, and the experimental results demonstrate that the improved fuzzy optimized Q-learning obstacle avoidance algorithm is feasible and effective.

Originality/value

The improved fuzzy optimized Q-learning (FOQL) algorithm is used to solve the robot's obstacle avoidance problem in the environment. To reduce the motion deviation of the robot, a fractional-order controller is designed. To verify the simulation results, the FOQL algorithm is implemented on a NAO robot, and the experimental results demonstrate that the improved fuzzy optimized Q-learning obstacle avoidance algorithm is feasible and effective.

Details

Industrial Robot: the international journal of robotics research and application, vol. 47 no. 6
Type: Research Article
ISSN: 0143-991X

Keywords

Open Access
Article
Publication date: 25 January 2024

Atef Gharbi

The purpose of the paper is to propose and demonstrate a novel approach for addressing the challenges of path planning and obstacle avoidance in the context of mobile robots (MR)…

Abstract

Purpose

The purpose of the paper is to propose and demonstrate a novel approach for addressing the challenges of path planning and obstacle avoidance in the context of mobile robots (MR). The specific objectives outlined in the paper include: introducing a new methodology that combines Q-learning with dynamic rewards to improve the efficiency of path planning and obstacle avoidance; enhancing the navigation of MR through unfamiliar environments by reducing blind exploration and accelerating convergence to optimal solutions; and demonstrating through simulation results that the proposed method, dynamic reward-enhanced Q-learning (DRQL), outperforms existing approaches by converging to an optimal action strategy more efficiently, requiring less time and improving path exploration with fewer steps and higher average rewards.

Design/methodology/approach

The design adopted in this paper to achieve its purposes involves the following key components: (1) Combination of Q-learning and dynamic reward: the paper’s design integrates Q-learning, a popular reinforcement learning technique, with dynamic reward mechanisms. This combination forms the foundation of the approach. Q-learning is used to learn and update the robot’s action-value function, while dynamic rewards are introduced to guide the robot’s actions effectively. (2) Data accumulation during navigation: when an MR navigates through an unfamiliar environment, it accumulates experience data. This data collection is a crucial part of the design, as it enables the robot to learn from its interactions with the environment. (3) Dynamic reward integration: dynamic reward mechanisms are integrated into the Q-learning process. These mechanisms provide feedback to the robot based on its actions, guiding it to make decisions that lead to better outcomes. Dynamic rewards help reduce blind exploration, which can be time-consuming and inefficient, and promote faster convergence to optimal solutions. (4) Simulation-based evaluation: to assess the effectiveness of the proposed approach, the design includes a simulation-based evaluation. This evaluation uses simulated environments and scenarios to test the performance of the DRQL method. (5) Performance metrics: the design incorporates performance metrics to measure the success of the approach. These metrics likely include measures of convergence speed, exploration efficiency, the number of steps taken and the average rewards obtained during the robot’s navigation.
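
The abstract does not give the exact dynamic-reward definition, so the following is a hedged sketch of how Q-learning with a goal-distance-shaped reward might look on a simple grid world. The grid size, obstacle set, step penalty and shaping term are all assumptions made for illustration, not the paper's formulation.

```python
import math
import random
from collections import defaultdict

# Hedged sketch: tabular Q-learning with a dynamic, goal-distance-shaped reward
# on a 10x10 grid. Constants and the reward shaping below are assumptions.

ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
GOAL, OBSTACLES = (9, 9), {(3, 3), (3, 4), (6, 7)}

Q = defaultdict(float)

def dynamic_reward(prev_pos, pos):
    """Reward moves that reduce Euclidean distance to the goal."""
    if pos in OBSTACLES:
        return -10.0
    if pos == GOAL:
        return 100.0
    return math.dist(prev_pos, GOAL) - math.dist(pos, GOAL) - 0.1  # step penalty

def step(pos, action):
    dx, dy = ACTIONS[action]
    return (min(max(pos[0] + dx, 0), 9), min(max(pos[1] + dy, 0), 9))

def q_update(s, a, r, s_next):
    best_next = max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
```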

Findings

The findings of the paper can be summarized as follows: (1) Efficient path planning and obstacle avoidance: the paper’s proposed approach, DRQL, leads to more efficient path planning and obstacle avoidance for MR. This is achieved through the combination of Q-learning and dynamic reward mechanisms, which guide the robot’s actions effectively. (2) Faster convergence to optimal solutions: DRQL accelerates the convergence of the MR to optimal action strategies. Dynamic rewards help reduce the need for blind exploration, which typically consumes time, resulting in quicker attainment of optimal solutions. (3) Reduced exploration time: the integration of dynamic reward mechanisms significantly reduces the time required for exploration during navigation. This reduction in exploration time contributes to more efficient and quicker path planning. (4) Improved path exploration: the results from the simulations indicate that the DRQL method leads to improved path exploration in unknown environments. The robot takes fewer steps to reach its destination, which is a crucial indicator of efficiency. (5) Higher average rewards: the paper’s findings reveal that MR using DRQL receive higher average rewards during their navigation. This suggests that the proposed approach results in better decision-making and more successful navigation.

Originality/value

The paper’s originality stems from its unique combination of Q-learning and dynamic rewards, its focus on efficiency and speed in MR navigation and its ability to enhance path exploration and average rewards. These original contributions have the potential to advance the field of mobile robotics by addressing critical challenges in path planning and obstacle avoidance.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Keywords

Article
Publication date: 10 February 2023

Chenchen Hua, Zhigeng Fang, Yanhua Zhang, Shujun Nan, Shuang Wu, Xirui Qiu, Lu Zhao and Shuyu Xiao

This paper aims to implement quality of service (QoS) dynamic optimization for the integrated satellite-terrestrial network (STN) of the fifth-generation Inmarsat system (Inmarsat-5).

Abstract

Purpose

This paper aims to implement quality of service (QoS) dynamic optimization for the integrated satellite-terrestrial network (STN) of the fifth-generation Inmarsat system (Inmarsat-5).

Design/methodology/approach

The structure and operational logic of the Inmarsat-5 STN are introduced to build the graphic evaluation and review technique (GERT) model. Thus, the equivalent network QoS metrics can be derived from the analytical algorithm of GERT. The center–point mixed possibility functions of average delay and delay variation are constructed considering users' experiences. Then, the grey clustering evaluation of link QoS is obtained and combined with the two-stage decision model to give suitable rewards to the agent of the GERT-Q-learning model, which realizes an intelligent optimization mechanism under real-time monitoring data.
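
As a very rough illustration of the reward-driven loop only, the sketch below has a Q-learning agent receive a reward from a QoS grading function. The GERT network model and grey clustering evaluation are not reproduced; qos_grade(), the action names and all thresholds are invented stand-ins.

```python
import random
from collections import defaultdict

# Rough sketch of the reward/update loop only. The paper's GERT model and grey
# clustering are replaced by a hypothetical qos_grade() over (average delay,
# delay variation); action names and thresholds are assumptions.

ACTIONS = ["keep_config", "reallocate_bandwidth", "switch_gateway"]
ALPHA, GAMMA, EPSILON = 0.2, 0.9, 0.1
Q = defaultdict(float)

def qos_grade(avg_delay_ms, delay_variation_ms):
    """Stand-in for grey clustering: higher grade means better perceived QoS."""
    if avg_delay_ms < 300 and delay_variation_ms < 30:
        return 1.0
    if avg_delay_ms < 600 and delay_variation_ms < 60:
        return 0.5
    return 0.0

def select_action(state):
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def learn(state, action, avg_delay_ms, delay_variation_ms, next_state):
    reward = qos_grade(avg_delay_ms, delay_variation_ms)
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```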

Findings

A case study based on five time periods of monitoring data verifies the adaptability of the proposed method. On the one hand, grey clustering based on possibility function enables a more effective measurement of link QoS from the users' perspective. On the other hand, the method comparison intuitively shows that the proposed method performs better.

Originality/value

With the development trend of integrated communication, STN has become an important research object in satellite communications. This paper establishes a modular and extensible optimization framework whose loose coupling structure and flexibility facilitate management and development. The grey-clustering-based GERT-Q-Learning model has the potential to maximize design and application benefits of STN throughout its life cycle.

Details

Grey Systems: Theory and Application, vol. 13 no. 3
Type: Research Article
ISSN: 2043-9377

Keywords

Content available
Article
Publication date: 15 November 2022

Matthew Powers and Brian O'Flynn

Rapid sensitivity analysis and near-optimal decision-making in contested environments are valuable requirements when providing military logistics support. Port of debarkation…

Abstract

Purpose

Rapid sensitivity analysis and near-optimal decision-making in contested environments are valuable requirements when providing military logistics support. Port of debarkation denial motivates maneuver from strategic operational locations, further complicating logistics support. Simulations enable rapid concept design, experiment and testing that meet these complicated logistic support demands. However, simulation model analyses are time consuming as output data complexity grows with simulation input. This paper proposes a methodology that leverages the benefits of simulation-based insight and the computational speed of approximate dynamic programming (ADP).

Design/methodology/approach

This paper describes a simulated contested logistics environment and demonstrates how output data informs the parameters required for the ADP dialect of reinforcement learning (aka Q-learning). Q-learning output includes a near-optimal policy that prescribes decisions for each state modeled in the simulation. This paper's methods conform to DoD simulation modeling practices complemented with AI-enabled decision-making.
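
One way to read "simulation output informs the Q-learning parameters" is an offline fit of a Q-table from logged transitions, from which a near-optimal policy is then read off per state. The sketch below assumes a hypothetical log format of (state, action, reward, next_state) tuples; it is not the paper's contested-logistics model.

```python
from collections import defaultdict

# Sketch under assumptions: Q-learning applied offline to transitions logged from
# a simulation run. The record format and any logistics-specific state encoding
# are hypothetical, not taken from the paper.

ALPHA, GAMMA = 0.1, 0.95
Q = defaultdict(float)

def fit_from_simulation_log(transitions, actions, sweeps=50):
    """transitions: list of (state, action, reward, next_state) tuples."""
    for _ in range(sweeps):
        for s, a, r, s_next in transitions:
            best_next = max(Q[(s_next, b)] for b in actions)
            Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def greedy_policy(states, actions):
    """Derive the near-optimal policy prescribed for each modeled state."""
    return {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}
```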

Findings

This study demonstrates simulation output data as a means of state-space reduction to mitigate the curse of dimensionality, since massive amounts of simulation output data quickly become unwieldy. This work demonstrates how Q-learning parameters reflect simulation inputs so that simulation model behavior can be compared against near-optimal policies.

Originality/value

Fast computation is attractive for sensitivity analysis while divorcing evaluation from scenario-based limitations. The United States military is eager to embrace emerging AI analytic techniques to inform decision-making but is hesitant to abandon simulation modeling. This paper proposes Q-learning as an aid to overcome cognitive limitations in a way that satisfies the desire to wield AI-enabled decision-making combined with modeling and simulation.

Details

Journal of Defense Analytics and Logistics, vol. 6 no. 2
Type: Research Article
ISSN: 2399-6439

Keywords

Article
Publication date: 11 July 2023

Yuze Shang, Fei Liu, Ping Qin, Zhizhong Guo and Zhe Li

The goal of this research is to develop a dynamic step path planning algorithm based on the rapidly exploring random tree (RRT) algorithm that combines Q-learning with the…

Abstract

Purpose

The goal of this research is to develop a dynamic step path planning algorithm based on the rapidly exploring random tree (RRT) algorithm that combines Q-learning with the Gaussian distribution of obstacles. A route for autonomous vehicles may be swiftly created using this algorithm.

Design/methodology/approach

The path planning issue is divided into three key steps by the authors. First, the tree expansion is sped up by the dynamic step size using a combination of Q-learning and the Gaussian distribution of obstacles. The invalid nodes are then removed from the initially created pathways using bidirectional pruning. B-splines are then employed to smooth the predicted pathways.
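
The sketch below illustrates only the dynamic-step idea: a small Q-table picks a step size from the local obstacle density before each RRT extension. The Gaussian obstacle model, bidirectional pruning and B-spline smoothing stages are omitted, and the density bins, step options and reward are assumptions rather than the authors' QGD-RRT.

```python
import math
import random
from collections import defaultdict

# Simplified sketch of a learned dynamic step size inside an RRT extension.
# All constants, bins and the reward are illustrative assumptions.

STEP_OPTIONS = [0.5, 1.0, 2.0]          # candidate step sizes (m)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
Q = defaultdict(float)

def density_bin(n_obstacles_nearby):
    return "high" if n_obstacles_nearby >= 5 else "low"

def choose_step(state):
    if random.random() < EPSILON:
        return random.choice(STEP_OPTIONS)
    return max(STEP_OPTIONS, key=lambda s: Q[(state, s)])

def extend(tree, sample, n_obstacles_nearby):
    """One RRT extension toward a sampled point with a learned step size."""
    nearest = min(tree, key=lambda n: math.dist(n, sample))
    state = density_bin(n_obstacles_nearby)
    step = choose_step(state)
    d = math.dist(nearest, sample)
    t = min(1.0, step / d) if d > 0 else 0.0
    new_node = (nearest[0] + t * (sample[0] - nearest[0]),
                nearest[1] + t * (sample[1] - nearest[1]))
    tree.append(new_node)
    return state, step, new_node

def reward_step(state, step, collision_free, next_state):
    r = 1.0 if collision_free else -1.0
    best_next = max(Q[(next_state, s)] for s in STEP_OPTIONS)
    Q[(state, step)] += ALPHA * (r + GAMMA * best_next - Q[(state, step)])
```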

Findings

The algorithm is validated using simulations on straight and curved highways, respectively. The results show that the approach can provide a smooth, safe route that complies with vehicle motion laws.

Originality/value

An improved RRT algorithm based on Q-learning and obstacle Gaussian distribution (QGD-RRT) is proposed for the path planning of self-driving vehicles. Unlike previous methods, the authors use Q-learning to steer the tree's development direction. After that, the step size is dynamically altered following the density of the obstacle distribution to produce the initial path rapidly and cut down on planning time even further. In the aim to provide a smooth and secure path that complies with the vehicle kinematic and dynamical restrictions, the path is lastly optimized using an enhanced bidirectional pruning technique.

Details

Engineering Computations, vol. 40 no. 5
Type: Research Article
ISSN: 0264-4401

Keywords

Article
Publication date: 28 March 2008

Daniel Lockery and James F. Peters

The purpose of this paper is to report upon research into developing a biologically inspired target‐tracking system (TTS) capable of acquiring quality images of a known target…

Abstract

Purpose

The purpose of this paper is to report upon research into developing a biologically inspired target‐tracking system (TTS) capable of acquiring quality images of a known target type for a robotic inspection application.

Design/methodology/approach

The approach used in the design of the TTS hearkens back to the work on adaptive learning by Oliver Selfridge and Chris J.C.H. Watkins, and to the work on the classification of objects by Zdzislaw Pawlak during the 1980s, applied here in an approximation space-based form of feedback during learning. Also during the 1980s, it was Ewa Orlowska who called attention to the importance of approximation spaces as a formal counterpart of perception. This insight by Orlowska has been important in working toward a new form of adaptive learning useful in controlling the behaviour of machines to accomplish system goals. The adaptive learning algorithms presented in this paper are strictly temporal difference methods, including Q-learning, Sarsa and the actor-critic method. Learning itself is considered episodic. During each episode, the equivalent of a Tinbergen-like ethogram is constructed. Such an ethogram provides a basis for the construction of an approximation space at the end of each episode. The combination of episodic ethograms and approximation spaces provides an extremely effective means of feedback useful in guiding learning during the lifetime of a robotic system such as the TTS reported in this paper.
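
For readers unfamiliar with the temporal difference methods named above, the standard update rules are sketched below; the paper's approximation-space and ethogram-based feedback layered on top of them is not reproduced.

```python
# Minimal illustration of two temporal-difference updates named in the abstract.

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    # Off-policy: bootstrap on the best available next action.
    target = r + gamma * max(Q.get((s_next, b), 0.0) for b in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    # On-policy: bootstrap on the action actually taken next.
    target = r + gamma * Q.get((s_next, a_next), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
```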

Findings

It was discovered that even though the adaptive learning methods were computationally more expensive than the classical algorithm implementations, they proved to be more effective in a number of cases, especially in noisy environments.

Originality/value

The novelty associated with this work is the introduction of an approach to adaptive learning carried out within the framework of ethology-based approximation spaces to provide performance feedback during the learning process.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 1 no. 1
Type: Research Article
ISSN: 1756-378X

Keywords

Article
Publication date: 8 September 2022

Amir Hosein Keyhanipour and Farhad Oroumchian

User feedback inferred from the user's search-time behavior could improve the learning to rank (L2R) algorithms. Click models (CMs) present probabilistic frameworks for describing…

Abstract

Purpose

User feedback inferred from the user's search-time behavior could improve the learning to rank (L2R) algorithms. Click models (CMs) present probabilistic frameworks for describing and predicting the user's clicks during search sessions. Most of these CMs are based on common assumptions such as Attractiveness, Examination and User Satisfaction. CMs usually consider the Attractiveness and Examination as pre- and post-estimators of the actual relevance. They also assume that User Satisfaction is a function of the actual relevance. This paper extends the authors' previous work by building a reinforcement learning (RL) model to predict the relevance. The Attractiveness, Examination and User Satisfaction are estimated using a limited number of the features of the utilized benchmark data set and then they are incorporated in the construction of an RL agent. The proposed RL model learns to predict the relevance label of documents with respect to a given query more effectively than the baseline RL models for those data sets.

Design/methodology/approach

In this paper, User Satisfaction is used as an indication of the relevance level of a query to a document. User Satisfaction itself is estimated through Attractiveness and Examination, and in turn, Attractiveness and Examination are calculated by the random forest algorithm. In this process, only a small subset of top information retrieval (IR) features are used, which are selected based on their mean average precision and normalized discounted cumulative gain values. Based on the authors' observations, the multiplication of the Attractiveness and Examination values of a given query–document pair closely approximates the User Satisfaction and hence the relevance level. Besides, an RL model is designed in such a way that the current state of the RL agent is determined by discretization of the estimated Attractiveness and Examination values. In this way, each query–document pair would be mapped into a specific state based on its Attractiveness and Examination values. Then, based on the reward function, the RL agent would try to choose an action (relevance label) which maximizes the received reward in its current state. Using temporal difference (TD) learning algorithms, such as Q-learning and SARSA, the learning agent gradually learns to identify an appropriate relevance label in each state. The reward that is used in the RL agent is proportional to the difference between the User Satisfaction and the selected action.
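
Following the abstract's description, a compact sketch of the state/action/reward mapping might look as follows: the state is the discretized (Attractiveness, Examination) pair, the actions are relevance labels, and the reward grows as the chosen label approaches the estimated User Satisfaction (approximated here as Attractiveness multiplied by Examination). The bin count, label range and scaling are assumptions, and the random forest estimation of Attractiveness and Examination is not shown.

```python
import random
from collections import defaultdict

# Sketch of the SeaRank-style state/action/reward mapping described in the
# abstract. Bin counts, the label range and the reward scale are assumptions.

LABELS = [0, 1, 2, 3, 4]        # candidate relevance labels
N_BINS = 10
ALPHA, EPSILON = 0.1, 0.1       # each query-document pair is treated as one step
Q = defaultdict(float)

def to_state(attractiveness, examination):
    """Discretize estimated Attractiveness/Examination (both in [0, 1])."""
    bin_of = lambda x: min(int(x * N_BINS), N_BINS - 1)
    return (bin_of(attractiveness), bin_of(examination))

def choose_label(state):
    if random.random() < EPSILON:
        return random.choice(LABELS)
    return max(LABELS, key=lambda l: Q[(state, l)])

def learn(attractiveness, examination):
    state = to_state(attractiveness, examination)
    label = choose_label(state)
    satisfaction = attractiveness * examination            # proxy for relevance
    reward = -abs(satisfaction * (len(LABELS) - 1) - label)
    Q[(state, label)] += ALPHA * (reward - Q[(state, label)])
    return label
```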

Findings

Experimental results on MSLR-WEB10K and WCL2R benchmark data sets demonstrate that the proposed algorithm, named as SeaRank, outperforms baseline algorithms. Improvement is more noticeable in top-ranked results, which usually receive more attention from users.

Originality/value

This research provides a mapping from IR features to the CM features and thereafter utilizes these newly generated features to build an RL model. This RL model is proposed with the definition of the states, actions and reward function. By applying TD learning algorithms, such as the Q-learning and SARSA, within several learning episodes, the RL agent would be able to learn how to choose the most appropriate relevance label for a given pair of query–document.

Details

Data Technologies and Applications, vol. 57 no. 4
Type: Research Article
ISSN: 2514-9288

Keywords

Open Access
Article
Publication date: 18 July 2022

Youakim Badr

In this research, the authors demonstrate the advantage of reinforcement learning (RL) based intrusion detection systems (IDS) to solve very complex problems (e.g. selecting input…


Abstract

Purpose

In this research, the authors demonstrate the advantage of reinforcement learning (RL) based intrusion detection systems (IDS) for solving very complex problems (e.g. selecting input features, considering scarce resources and constraints) that cannot be solved by classical machine learning. The authors include a comparative study that builds intrusion detection based on statistical machine learning and representational learning, using the knowledge discovery in databases (KDD) Cup99 and Installation Support Center of Expertise (ISCX) 2012 datasets.

Design/methodology/approach

The methodology applies a data analytics approach, consisting of data exploration and machine learning model training and evaluation. To build a network-based intrusion detection system, the authors apply dueling double deep Q-networks architecture enabled with costly features, k-nearest neighbors (K-NN), support-vector machines (SVM) and convolution neural networks (CNN).
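
To illustrate the dueling Q-network idea named above, the following is a minimal PyTorch sketch of the dueling head (Q = V + A - mean A). The "costly features" mechanism, double-DQN target updates and the datasets' preprocessing are omitted, and the layer sizes, feature count and action set are assumptions.

```python
import torch
import torch.nn as nn

# Sketch of a dueling Q-network head; sizes are illustrative only.
class DuelingQNetwork(nn.Module):
    def __init__(self, n_features: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # state-value stream V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # advantage stream A(s, a)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.shared(x)
        v = self.value(h)
        a = self.advantage(h)
        # Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return v + a - a.mean(dim=1, keepdim=True)

# Example: score a batch of 41 preprocessed flow features against two actions
# (e.g. "flag benign" / "flag attack"); the feature count is an assumption.
net = DuelingQNetwork(n_features=41, n_actions=2)
q_values = net(torch.randn(8, 41))
```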

Findings

Machine learning-based intrusion detection systems are trained on historical datasets, which leads to model drift and a lack of generalization, whereas RL is trained with data collected through interactions. RL is bound to learn from its interactions with a stochastic environment in the absence of a training dataset, whereas supervised learning simply learns from collected data and requires less computational resources.

Research limitations/implications

All machine learning models have achieved high accuracy values and performance. One potential reason is that both datasets are simulated, and not realistic. It was not clear whether a validation was ever performed to show that data were collected from real network traffic.

Practical implications

The study provides guidelines to implement IDS with classical supervised learning, deep learning and RL.

Originality/value

The research applied the dueling double deep Q-networks architecture enabled with costly features to build network-based intrusion detection from network traffic. This research presents a comparative study of reinforcement learning-based intrusion detection with counterparts built with statistical and representational machine learning.

Article
Publication date: 8 August 2016

Chethan Upendra Chithapuram, Aswani Kumar Cherukuri and Yogananda V. Jeppu

The purpose of this paper is to develop a new guidance scheme for aerial vehicles based on artificial intelligence. The new guidance scheme must be able to intercept maneuvering…

Abstract

Purpose

The purpose of this paper is to develop a new guidance scheme for aerial vehicles based on artificial intelligence. The new guidance scheme must be able to intercept maneuvering targets with higher probability and precision compared to existing algorithms.

Design/methodology/approach

A simulation setup of the aerial vehicle guidance problem is developed. A reinforcement learning technique known as Q-learning is used to develop a new guidance scheme. Several simulation experiments are conducted to train the new guidance scheme. Orthogonal arrays are used to define the training experiments to achieve faster convergence. A well-known guidance scheme known as proportional navigation guidance (PNG) is used as a base model for training. The new guidance scheme is compared for performance against standard guidance schemes like PNG and augmented proportional navigation guidance schemes in the presence of sensor noise and computational delays.
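
The abstract does not give the guidance scheme's state, action or reward definitions, so the following is only a speculative sketch of how a Q-learning guidance loop might be set up with PNG as a reference. The discretization, acceleration commands and PNG-anchored reward are all assumptions, and the orthogonal-array experiment design is not shown.

```python
import random
from collections import defaultdict

# Speculative sketch: a Q-learning guidance loop with classical PNG as reference.
# State, action and reward definitions below are assumptions, not the paper's.

ACCEL_COMMANDS = [-40.0, -20.0, 0.0, 20.0, 40.0]   # lateral acceleration (m/s^2)
NAV_CONSTANT = 3.0
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
Q = defaultdict(float)

def png_command(los_rate, closing_velocity):
    """Classical PNG: a_c = N * Vc * d(lambda)/dt."""
    return NAV_CONSTANT * closing_velocity * los_rate

def to_state(los_rate, closing_velocity):
    return (round(los_rate, 2), round(closing_velocity / 100.0))

def choose_command(state):
    if random.random() < EPSILON:
        return random.choice(ACCEL_COMMANDS)
    return max(ACCEL_COMMANDS, key=lambda a: Q[(state, a)])

def learn(state, command, los_rate, closing_velocity, next_state):
    # Reward commands close to the PNG baseline (a plausible shaping choice only).
    reward = -abs(command - png_command(los_rate, closing_velocity))
    best_next = max(Q[(next_state, a)] for a in ACCEL_COMMANDS)
    Q[(state, command)] += ALPHA * (reward + GAMMA * best_next - Q[(state, command)])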

Findings

A new guidance scheme for aerial vehicles is developed using the Q-learning technique. This new guidance scheme achieves smaller miss distances and a higher probability of intercept than standard guidance schemes.

Research limitations/implications

The research uses simulation models to develop the new guidance scheme. The new guidance scheme is also evaluated in the simulation environment. The new guidance scheme performs better than standard existing guidance schemes.

Practical implications

The new guidance scheme can be used in various aerial guidance applications to reach a dynamically moving target in three-dimensional space.

Originality/value

The research paper proposes a completely new guidance scheme based on Q-learning whose performance is better than standard guidance schemes.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 9 no. 3
Type: Research Article
ISSN: 1756-378X

Keywords

Article
Publication date: 14 February 2022

Kai Xiong, Chunling Wei and Peng Zhou

This paper aims to improve the performance of the autonomous optical navigation using relativistic perturbation of starlight, which is a promising technique for future space…

Abstract

Purpose

This paper aims to improve the performance of autonomous optical navigation using the relativistic perturbation of starlight, which is a promising technique for future space missions. By measuring the change in inter-star angle due to stellar aberration and the gravitational deflection of light with space-based optical instruments, the position and velocity vectors of the spacecraft can be estimated iteratively.

Design/methodology/approach

To enhance the navigation performance, an integrated optical navigation (ION) method based on the fusion of both the inter-star angle and the inter-satellite line-of-sight measurements is presented. A Q-learning extended Kalman filter (QLEKF) is designed to optimize the state estimate.
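
The abstract does not describe how the Q-learning layer interacts with the extended Kalman filter, so the sketch below shows just one plausible reading: an agent that selects a process-noise scale from the size of the measurement innovation and is rewarded by lower estimation error. Every state, action and reward choice here is an assumption for illustration, not the paper's QLEKF design.

```python
import random
from collections import defaultdict

# Speculative sketch: Q-learning tuning an EKF's process-noise scale.
# All bins, scales and the reward are assumptions.

NOISE_SCALES = [0.5, 1.0, 2.0]   # multipliers on the nominal process-noise covariance
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
Qtable = defaultdict(float)

def innovation_bin(normalized_innovation):
    if normalized_innovation < 1.0:
        return "small"
    if normalized_innovation < 3.0:
        return "medium"
    return "large"

def choose_scale(state):
    if random.random() < EPSILON:
        return random.choice(NOISE_SCALES)
    return max(NOISE_SCALES, key=lambda s: Qtable[(state, s)])

def learn(state, scale, estimation_error, next_state):
    reward = -estimation_error            # smaller error -> higher reward
    best_next = max(Qtable[(next_state, s)] for s in NOISE_SCALES)
    Qtable[(state, scale)] += ALPHA * (reward + GAMMA * best_next - Qtable[(state, scale)])
```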

Findings

Simulations illustrate that the integrated optical navigation outperforms the existing method using only inter-star angle measurement. Moreover, the QLEKF is superior to the traditional extended Kalman filter in navigation accuracy.

Originality/value

A novel ION method is presented, and an effective QLEKF algorithm is designed for information fusion.

Details

Aircraft Engineering and Aerospace Technology, vol. 94 no. 6
Type: Research Article
ISSN: 1748-8842

Keywords
