Implementation of new hybrid lightweight cryptosystem

Embedded systems, Internet of Things (IoT) and mobile computing devices are used in various domains, which include public-private infrastructure, industrial installations and critical environments. Generally, the information handled by these devices is private and critical. Therefore, it must be appropriately secured from different attacks and hackers. Lightweight cryptography is an emerging field which investigates the implementation of cryptographic primitives and algorithms for resource constrained devices. In this paper, a new compact hybrid lightweight encryption technique is proposed. The proposed technique uses the fastest bit permutation instruction, PERMS, with the S-box of the PRESENT block cipher for non-linearity. An arbitrary n-bit permutation is performed using the PERMS instruction in less than log(n) instructions. This new hybrid system has been analyzed for software performance on Advanced RISC Machine (ARM) and Intel processors, whereas the Cadence tool is used to analyze the hardware performance. The result of the proposed technique is improved by a factor of eight as compared to the PRESENT-GRP hybrid block cipher. Moreover, the bit permutation properties of the PERMS instruction result in a very good avalanche effect and a compact implementation in both hardware and software environments.


Introduction
The increasing use of mobile computing devices in the field of Information and Communication Technology (ICT) has raised concerns about security. Lightweight cryptography has gained momentum from various cipher proposals such as PRESENT [15], CLEFIA [16], KATAN [19], HIGHT [18], SIMON/SPECK [20], Fantomas [21], KLEIN [22] and many others. Lightweight cryptography aims to offer a sufficient security level with an optimum use of resources [11][12][13][14], namely area, battery, CPU, memory and power. Among these, power consumption is one of the crucial factors that lightweight ciphers need to address. Power consumption depends strongly on Gate Equivalents (GEs) and CPU cycles: GEs are the focus of hardware implementations and CPU cycles are the focus of software implementations of a lightweight cipher. However, security properties should not be compromised for the sake of GEs and CPU cycles. Radio-frequency identification (RFID) tags normally have 1000-10000 GEs, of which only 300-2100 GEs are allotted for security purposes [23]. Researchers adopt different approaches to develop a lightweight cipher, such as modifying an existing cipher, optimizing an existing cipher or developing an entirely new cipher. The third approach was explored in 2007 [15], where an entirely new cipher, known as the PRESENT block cipher, was designed. The PRESENT block cipher remains an inspiration for many lightweight cryptography researchers, who have worked to improve it further, as discussed in Section 3. Many other entirely new ciphers have been designed, such as TEA [27], LED, ZORRO [29], Hummingbird [30], KATAN and KTANTAN [19], Halka [31], TWINE [32], RECTANGLE [28], GOST [33], PRINT [34], PUFFIN [35], Fantomas [36], Midori [37], Twofish [39] and mCrypton [38]. Block ciphers that use permutation are summarized in Table 1.
Based on the algorithm structure, block ciphers are classified into Substitution Permutation Network (SPN), Feistel network, stream and Lai-Massey structures.
We introduce a new hybrid lightweight block cipher which supports portable and secure software as well as hardware implementation of the PRESENT block cipher. The permutation layer of the PRESENT block cipher is modified and implemented with the PERMS bit permutation instruction to improve the performance. The next section studies the properties and security aspects of bit permutation instructions such as SWPERM [5], GRP [4], PERMS [1] and OMFLIP [7,9] in detail. Compared to the other bit permutation instructions, PERMS is efficient in terms of cryptographic properties, CPU cycles and total gate count. The PERMS instruction is complex in nature, which makes it more suitable for a cryptographic environment. It is most appropriate for cryptographic functions such as encryption and hash techniques, especially in applications where continuous encryption-decryption operations are required. The linear and differential cryptanalysis properties of the PERMS instruction are elaborated in Section 6.
The main aim of this paper is to present the results of a compact hybrid cipher with ample security for resource constrained devices. The proposed block cipher is implemented and tested on ARM and Intel processors. The experiments are carried out on ARM Cortex-M and the more powerful Cortex-A series processors. The performance parameters of the proposed cipher are compared with other existing ciphers such as PRESENT, CLEFIA and PRESENT-GRP [2]. We have obtained better results, which are shown in Section 5.

Bit permutations
Let B be an arbitrary bit string of length n, written as $B = (B_{n-1}, B_{n-2}, \ldots, B_1, B_0)_2$, and let P be a sequence of the form $(P_{n-1}, P_{n-2}, \ldots, P_2, P_1, P_0)$ comprising a random permutation of the integers from 0 to $n-1$.
The permutation of B with P is given by $(B_{P_{n-1}}, B_{P_{n-2}}, \ldots, B_{P_2}, B_{P_1}, B_{P_0})_2$. Many existing Instruction Set Architectures (ISA) provide limited support for performing such arbitrary bit permutations compared to regular permutations. The alternative ways to perform bit permutation are discussed in the following subsections.
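As a point of reference for the definition above, the following is a minimal sketch of a bit-by-bit permutation in C. It assumes a string of at most 64 bits held in one word and the convention that output bit i is input bit P_i; the function name permute_bits is hypothetical and is not taken from the paper.

```c
#include <assert.h>
#include <stdint.h>

/* Reference (unoptimized) bit permutation of an n-bit string, n <= 64.
 * Convention assumed here: bit i of the result is bit p[i] of b. */
uint64_t permute_bits(uint64_t b, const uint8_t *p, unsigned n)
{
    assert(n >= 1 && n <= 64);
    uint64_t r = 0;
    for (unsigned i = 0; i < n; i++) {
        assert(p[i] < n);                 /* every index must be in range */
        r |= ((b >> p[i]) & 1u) << i;     /* copy source bit p[i] to position i */
    }
    return r;
}
```

This loop costs one extract, shift and OR per bit, which is the baseline that the faster methods discussed in this section try to improve on.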

Logical operations
In this method, different logical operations are combined to perform the final bit permutation. Each logical operation is used as follows:
1. AND operation: To extract the required bits, selected by a mask, from the n bits.
2. Shift operation: To shift bits to their new position.
3. OR operation: To combine with previously permuted bits.
This technique requires many operations, which increases the number of instructions and the memory required [3].
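The three steps can be sketched in C on a simple fixed permutation that swaps every pair of adjacent bits. The helper names move_bits and swap_adjacent_bits are hypothetical and chosen only for illustration. An arbitrary permutation would need one mask and shift step for every group of bits that moves by the same distance, which is why the instruction count grows.

```c
#include <assert.h>
#include <stdint.h>

/* One step of the method: AND with a mask to extract bits, then shift them
 * left (shift > 0) or right (shift < 0) to their new position. */
uint64_t move_bits(uint64_t x, uint64_t mask, int shift)
{
    assert(shift > -64 && shift < 64);
    uint64_t picked = x & mask;                               /* AND   */
    return shift >= 0 ? picked << shift : picked >> -shift;   /* shift */
}

/* Swap every pair of adjacent bits: even bits move up, odd bits move down,
 * and the two partial results are combined with OR. */
uint64_t swap_adjacent_bits(uint64_t x)
{
    return move_bits(x, 0x5555555555555555ULL, 1)      /* bits 0, 2, 4, ... */
         | move_bits(x, 0xAAAAAAAAAAAAAAAAULL, -1);    /* bits 1, 3, 5, ... */
}
```

For example, swap_adjacent_bits(0xB4) gives 0x78, since the bit pairs 10 11 01 00 become 01 11 10 00.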

Lookup table
In this method, the input bit stream is partitioned into multiple sections. The bits in each section are then permuted simultaneously with the help of a lookup table. Finally, the results of all sections are combined to produce the final permutation. The required instruction count depends solely on the number of sections formed. Fewer sections need fewer instructions, but increase the memory requirement.

Apart from these two basic methods, bit permutation can be accelerated with the help of certain instructions like BFLY-IBFLY [6], PPERM-PPERM3R, CROSS, GRP, OMFLIP and SWPERM-SIEVE. All these instructions are compared in Table 2 against various parameters such as the number of instructions required, memory requirement, CPU cycles, time complexity and support for mappings. Among these instructions, GRP has been used [2] to build a hybrid lightweight encryption scheme. After a detailed study, however, we found PERMS to be a better instruction. GRP performs a 128-bit arbitrary permutation using the 64-bit instruction set in 16 instructions. Among these 16 instructions, two are Shift Right Pair (SHRP) instructions, which are available only on IA-64 and PA-RISC processors. Other processors do not have SHRP in their instruction set, so on them each SHRP has to be replaced by 4 instructions. Therefore, GRP uses a total of 14 + 2 × 4 = 22 instructions on other processors, whereas PERMS requires only 18 instructions to perform a 128-bit permutation using 64-bit instructions on any processor.

A permutation with repetitions is referred to as a 'mapping': a bit of the input can be replicated and can appear at multiple locations in the output. Currently, none of the GRP, CROSS and OMFLIP permutation instructions supports mappings; only the PPERM and PERMS instructions do. This motivates the use of PERMS for lightweight cryptography.
Table 2 shows a detailed comparison of different bit permutation instructions.
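The lookup-table method described in this subsection can be sketched as follows for a 16-bit input split into two 8-bit sections. The sizes, the names build_tables and permute_lookup, and the convention that output bit i is input bit p[i] are all assumptions made for illustration, not details from the paper.

```c
#include <assert.h>
#include <stdint.h>

#define N_BITS   16   /* width of the bit string (assumed for this sketch) */
#define SECTIONS 2    /* two 8-bit sections */

static uint16_t table[SECTIONS][256];

/* Precompute, for every section and every byte value, the output word that
 * this section contributes. Output bit i takes input bit p[i]. */
void build_tables(const uint8_t p[N_BITS])
{
    for (unsigned s = 0; s < SECTIONS; s++) {
        for (unsigned v = 0; v < 256; v++) {
            uint16_t out = 0;
            for (unsigned i = 0; i < N_BITS; i++) {
                assert(p[i] < N_BITS);
                /* source bit lies in section s and is set in byte value v */
                if ((unsigned)(p[i] / 8) == s && ((v >> (p[i] % 8)) & 1u))
                    out |= (uint16_t)(1u << i);
            }
            table[s][v] = out;
        }
    }
}

/* Permute x with one lookup per section, combined with OR. */
uint16_t permute_lookup(uint16_t x)
{
    return (uint16_t)(table[0][x & 0xFFu] | table[1][x >> 8]);
}
```

With these sizes the tables occupy 2 × 256 × 2 bytes = 1 KiB and a permutation costs two lookups. Using four 4-bit sections instead would shrink the tables to 4 × 16 × 2 = 128 bytes at the price of four lookups, which illustrates the trade-off between instruction count and memory noted above.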

Lightweight cryptography
Many devices have become pervasive computing devices with built-in embedded computing power. The wide adoption of IoT devices has increased the importance of lightweight cryptography, which has been an active research area for the last two decades. For hardware oriented lightweight cryptography techniques, the performance parameters are area, GEs and power consumption. Software based cryptography techniques, in contrast, address memory usage, CPU usage and energy constraints. Standard algorithms like AES [24], DES [25], T-DES [26] and SHA-1 have proven their security very well and are therefore used extensively in many standalone pervasive computing devices as well as mobile devices. However, standard algorithms consume system resources such as memory and CPU cycles at a very high level, which makes them unsuitable for resource constrained devices. Hence, there is a need for lightweight ciphers. In the last two decades, many researchers have come up with their own lightweight ciphers, against which different attacks have been demonstrated [11]. PRESENT and CLEFIA are two algorithms accepted as International Organization for Standardization (ISO) lightweight cryptography standards in ISO/IEC 29192-2:2012. CLEFIA was developed by Sony in 2007 and was targeted at digital rights management. PRESENT was developed by Orange Labs (France), Ruhr University Bochum (Germany) and the Technical University of Denmark in 2007. It is best known for its compact size and is used as a benchmark by many researchers.
Various modifications have been carried out to improve the hardware as well as software implementation results of the PRESENT cipher. Poschmann has implemented the PRESENT cipher on different processors ranging from 4-bit to 64-bit [44]. PRESENT-GRP [2] is a hybrid design that replaces the permutation layer of the original PRESENT cipher by GRP. The PRESENT cipher and twelve other block ciphers have been implemented and optimized on three different platforms: 8-bit ATmega, 16-bit MSP430 and 32-bit ARM Cortex-M3 [13]. Benadjila et al. speed up PRESENT and many other block ciphers through table based, vector instruction and bit sliced implementations on Intel x86 architectures [45]. They conclude that bit sliced implementations might not be useful when the amount of data to be enciphered at a time is small. Compatibility between the server and the client is another issue that arises in bit sliced implementations. In another work, Tiago et al. [46] use a bit sliced implementation and a masking technique to prevent side channel attacks. A bit sliced implementation can run in constant time, which helps to protect against timing attacks. They modify the PRESENT cipher in two ways: first, the permutation P is decomposed into P0 and P1, applied in alternate rounds; second, the S-box is implemented in bit sliced form. However, the permutation P is applied to some of the round keys, and their implementation is restricted to a key size of 80 bits only.
The prime objective of this research work is to improve the hardware performance of the PRESENT cipher. Although our aim is to improve the hardware implementation by reducing the GEs, we have also achieved satisfactory results for the software implementation. The proposed cipher, PRESENT-PERMS, has no specific processor restriction of the kind imposed by the GRP instruction or by bit sliced implementations. Bit sliced or vector instruction based implementations are supported only by certain higher end processors.
As shown in Table 1, many lightweight ciphers use permutation as their linear layer. Among these, PRESENT and CLEFIA are selected for discussion and implementation because they are ISO/IEC standards and both algorithms have been tested thoroughly against various attacks. The PRESENT and CLEFIA lightweight ciphers have shown strong resistance against linear and differential cryptanalysis [15][16][17]. The PRESENT cipher can be optimized with the PERMS permutation instruction in a software environment. The detailed implementation and analysis of the proposed cipher is presented in the next section.

Proposed hybrid cipher implementation
The block diagram of the proposed lightweight hybrid cipher (PRESENT-PERMS) is shown in Figure 1. A detailed study of bit permutation instructions has been carried out, and it has been found that the PERMS instruction has an advantage over the other instructions. In particular, PERMS is superior to GRP in terms of CPU cycles. The ciphers PRESENT, CLEFIA, PRESENT-GRP and the proposed hybrid cipher (PRESENT-PERMS) are implemented in 'C' language and tested on a 32-bit ARM processor. The S-box of the PRESENT cipher gives a good number of active S-boxes and a favourable maximal bias for the linear approximation, which makes it resistant against linear and differential cryptanalysis.

Implementation of PERMS instruction
The PERMS instruction uses two different algorithms: one to calculate the control bits and one to perform the arbitrary permutation.

'C' language code to generate control bits

The arrays input1[ ] and input2[ ] are used as inputs and the array control_bits[ ] holds the control bits. Array input1[ ] holds a monotonically increasing sequence of integers of size n and array input2[ ] is the given permutation array.

'C' language code to perform arbitrary permutation

To perform the arbitrary permutation, the control bits array control_bits[ ] and the sorted array pp[ ] are used as inputs, and the 'C' code generates the final permutation array as output. It is clear from this algorithm that the PERMS instruction requires only four instructions whereas GRP requires six instructions. The CPU cycle counts for both instructions implemented in 'C' language are given in Table 3. Table 4 shows the parameters used for the implementation of the PRESENT-PERMS hybrid cipher for 64 and 128 bits: block size, key size and number of rounds.

Table 3. CPU cycles required by GRP and PERMS
                                                                    GRP    PERMS
No. of CPU cycles to perform 64-bit permutation with control bits   245    180

From Table 5, it can be noticed that PRESENT-PERMS requires the lowest GEs and energy per bit for the encryption operation as compared to the other ciphers. Many devices or applications use encryption alone more often than encryption-decryption. For encryption-decryption operations, the GEs of PRESENT-PERMS are lower than those of CLEFIA, AES and PRESENT-GRP. Comparative results of the proposed hybrid cipher PRESENT-PERMS and other ciphers are shown in Table 6.

Software based evaluation
Permutation is a main building block for SPN based ciphers. Figure 2 shows a comparison of different permutations. It is observed that PERMS takes the least number of CPU cycles for a 128-bit permutation. PERMS, OMFLIP and GRP themselves act as P-Boxes (Permutation-Boxes) in the algorithms. The software implementation is carried out on ARM Cortex-M3, ARM Cortex-A15 and Intel i5 processors. CPU cycles are measured on the ARM processors by accessing the performance monitor control register. The GCC compiler is used for compilation with the O3 optimization level.

Table 5. Comparative results of the proposed hybrid cipher and other ciphers for the STM 90 nm.
Table 6. Differential properties of PERMS.
The CPU cycles needed for the encryption operation alone and for the combined encryption-decryption operations are measured. The CPU cycles required by PRESENT-PERMS and the other ciphers are shown in Figure 3. PRESENT-PERMS needs the lowest number of CPU cycles for both encryption and encryption-decryption operations. For each cipher algorithm, the lowest measured CPU cycle count is taken as the final result. From Figure 2 it can be noticed that PRESENT-PERMS requires the least CPU cycles whereas CLEFIA needs the most.

Security analysis
Cryptanalysis is a vital step for any lightweight block cipher. It helps to understand the relationship among the plain text, key and cipher text, and it aims to find some details about the key, the cipher text or both. The most popular attacks on block ciphers are differential and linear cryptanalysis, described in depth in [40] and [41] respectively. Since their inception, significant research has been carried out to show their relationship and to find better solutions to thwart them [42]. Matsui's branch and bound search algorithm [23] is one of the most powerful and classic methods for obtaining a security bound with respect to differential and linear attacks. In differential cryptanalysis, two plain texts $x_1$ and $x_2$ are selected with some difference $\Delta D$, measured by the XOR operation. These two plain texts are converted into cipher texts, and the difference between the two cipher texts is denoted as $\Delta C$. The pair $(\Delta D, \Delta C)$ is referred to as a differential characteristic. A useful characteristic is expected to occur with a probability larger than the average. This section analyzes the differential and linear cryptanalysis properties of the bit permutation instruction PERMS. Any permutation operation can be described as $R = P \;\mathrm{Opr}\; Q$, where Opr is the bit permutation operation (here, PERMS), P holds the bits to be permuted according to Q, and R is the output of the bit permutation operation.
For bit permutation, three forms of differential characteristics are most useful [43]:
Type A: $(e_s, 0) \rightarrow e_t$
Type B: $(0, e_t) \rightarrow \Delta$
Type C: $(e_s, e_t) \rightarrow \Delta$
A differential characteristic of the bit permutation operation PERMS is described by a triplet $(\Delta P, \Delta Q) \rightarrow \Delta R$ together with the probability p that the triplet holds when the inputs are selected at random. Here $e_s$ denotes the n-bit word that has all bits zero except for a single one bit at position s. Type A describes how a single bit at position s is moved when $Q_1 = Q_2$ is randomly selected; the probability p states how likely the bit at position s in P is moved to bit t in R. For Types B and C, the diffusion effect is compared by calculating the Hamming weight of $\Delta R$. A bigger Hamming weight results in a stronger avalanche effect. Table 7 shows the differential characteristics of PERMS.
$E(|\Delta|)$ denotes the expected value of the variation when the input sequence is random. From Table 7, it is clear that PERMS achieves the required avalanche effect.
A linear approximation of the permutation $R = P \;\mathrm{Opr}\; Q$ is a triplet $(\Gamma_p, \Gamma_q, \Gamma_r)$, where each $\Gamma$ is a binary vector of the same length as P. The approximation holds with probability Pr over arbitrary inputs, and the linear approximation bias is $|\mathrm{Pr} - 1/2|$. Two types of linear approximations are used to evaluate the PERMS instruction; the results are given in Table 7, in which b indicates the bias.
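To make the bias definition concrete, the following short derivation considers the simplest approximation, which relates input bit s of P to output bit t of R. It is a sketch under two assumptions: the input bits are independent and uniformly random, and bit s lands on position t with probability 1/n, which is the equal-probability behaviour stated for PERMS in this section.

```latex
\begin{align*}
\Pr[P_s = R_t]
  &= \underbrace{\tfrac{1}{n}\cdot 1}_{\text{bit } s \text{ lands on } t}
   + \underbrace{\left(1-\tfrac{1}{n}\right)\cdot\tfrac{1}{2}}_{\text{another bit lands on } t}
   = \tfrac{1}{2} + \tfrac{1}{2n}, \\
b &= \left|\Pr[P_s = R_t] - \tfrac{1}{2}\right| = \tfrac{1}{2n}.
\end{align*}
```

For a 64-bit word this gives a bias of 1/128, and the bias shrinks further as n grows.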
In PERMS based bit permutation, an input bit can move to any position with equal probability 1/n, so the bias is 1/(2n) for all s and t. The PRESENT cipher achieves very good security through its compact 4-bit S-boxes, which can be implemented with few GEs and low power consumption [15]. The properties of the PRESENT S-boxes provide the expected avalanche effect and a sufficient number of active S-boxes. The PRESENT cipher has strong linear and differential characteristics that allow it to withstand linear and differential cryptanalysis attacks. It is also strongly defended against other attacks such as algebraic attacks, structural attacks and key schedule attacks [15].
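The differential behaviour of a 4-bit S-box can be checked exhaustively. The sketch below builds the difference distribution table (DDT) for the S-box values published in the PRESENT specification [15]; the function names build_ddt and max_ddt_entry are hypothetical. Each entry counts how many inputs x turn an input difference dx into an output difference dy.

```c
#include <assert.h>
#include <stdint.h>

/* PRESENT 4-bit S-box as published in the cipher specification. */
static const uint8_t PRESENT_SBOX[16] = {
    0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
    0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2
};

/* ddt[dx][dy] = number of inputs x with S(x) XOR S(x XOR dx) == dy. */
void build_ddt(unsigned ddt[16][16])
{
    for (unsigned dx = 0; dx < 16; dx++)
        for (unsigned dy = 0; dy < 16; dy++)
            ddt[dx][dy] = 0;
    for (unsigned dx = 0; dx < 16; dx++)
        for (unsigned x = 0; x < 16; x++)
            ddt[dx][PRESENT_SBOX[x] ^ PRESENT_SBOX[x ^ dx]]++;
}

/* Largest DDT entry over all non-zero input differences. */
unsigned max_ddt_entry(void)
{
    unsigned ddt[16][16];
    unsigned best = 0;
    build_ddt(ddt);
    assert(ddt[0][0] == 16);   /* a zero difference always maps to zero */
    for (unsigned dx = 1; dx < 16; dx++)
        for (unsigned dy = 0; dy < 16; dy++)
            if (ddt[dx][dy] > best)
                best = ddt[dx][dy];
    return best;
}
```

A largest entry of 4 out of 16 corresponds to a maximum differential probability of 1/4 for a single S-box, which is the basis for counting active S-boxes when bounding the probability of a multi-round characteristic.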

Conclusions
The PERMS bit permutation instruction performs arbitrary permutations in less than log(n) steps, fewer than all other bit permutation instructions. PERMS takes fewer CPU cycles and GEs, which makes it faster and more area efficient than GRP. Along with speed, PERMS also provides good security properties, which makes it a good candidate for lightweight cryptography. However, any block cipher needs both linear and non-linear layers. Therefore, the PRESENT cipher, which is an ISO standard and a proven cipher, has been selected for the hybrid design. In this hybrid cipher, PERMS is used as the permutation layer and the PRESENT S-box is used as the non-linear layer. The PERMS instruction not only has good differential and linear cryptanalysis properties but is also capable of preventing brute-force attacks. Furthermore, software performance is evaluated on popular ARM and Intel processors. The PRESENT-PERMS implementation requires far fewer CPU cycles than other standard lightweight algorithms. To test the proposed cipher for hardware performance, area and energy measurements are carried out with the Cadence tool. The proposed cipher is also tested with a vector implementation on both ARM and Intel processors to speed up the S-box implementation, which additionally provides timing attack protection. This hybrid block cipher should prove very useful for the lightweight cryptography community.