Implementation of AES Algorithm Using Verilog

Cryptography has become essential for data security and integrity as e-commerce and internet applications have grown. However, it is often given low priority because of the extra memory and other resources its implementation requires. The main aim of this work is to implement Advanced Encryption Standard (AES) encryption using Verilog. Cryptographic algorithms are used to protect electronic data. The AES block cipher can encrypt and decrypt digital information, and it can be implemented with key lengths of 128, 192 or 256 bits. The delay associated with each round of encryption can be reduced by a parallel AES design.


INTRODUCTION
In this technical era, telecommunication has grown drastically. Through the internet, anybody can access data stored on a computer in any corner of the world. Many activities such as e-commerce and data sharing take place over the internet, so data authentication and secure communication have become very important. Cryptography plays an important role here. Nowadays many data encryption algorithms are available. Digital information can be encrypted and decrypted with a block cipher using cryptographic keys of 128, 192 and 256 bits [1].
The main objective of this project is to implement the AES algorithm in Verilog and to obtain an optimized circuit in terms of clock frequency, path delay, and the time required to generate keys and decode the data [2]. Cryptography enables secure communication in the presence of third parties. It deals mainly with analysing protocols that overcome the influence of third parties on information security. Modern cryptography draws on disciplines such as mathematics, computer science and electrical engineering [3].

Problem statement
Considering the history of communication, it is not surprising that security has taken a back seat, because implementing security mechanisms in distributed applications creates extra overhead such as more memory, handshaking and more CPU time for calculating keys [4].
The main aim of this work is to address this problem with Verilog code using Xilinx tools. In real-time applications, software takes a lot of time to execute the same code again and again [5], whereas dedicated hardware for a repeating operation reduces the execution time. This work therefore provides a hardware solution to the above-mentioned real-time problem.

Methodology
The methodology used in this system is Verilog code. Xilinx provides analog and digital platforms to support both analog and digital circuit design. Note that any encryption algorithm works in a digital environment, and all the blocks in the system handle digital data.

Advanced Encryption Standard (AES)
AES is a cryptographic algorithm that performs both encryption and decryption. The input information is known as plaintext and its encrypted form is called ciphertext. Ciphertext contains the plaintext information, but not in a form readable by humans. The encryption procedure varies from one algorithm to another. Without the key, the ciphertext cannot be decrypted. AES has proven to be more efficient than its predecessors. AES is mainly used in voice communication, network applications, virtual private networks and the Secure Sockets Layer (SSL) [6][7].

The AES Algorithm
The AES algorithm supports three key lengths: 128, 192 and 256 bits. Each key length has a corresponding number of rounds Nr, determined by the key length of Nk words. The block size of the state array, Nb, is constant for all key lengths. The state has 4 words, and each word has 4 bytes.
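The relationship between the key length, Nk, Nb and Nr can be checked with a few lines of Python. This is only an illustrative software sketch, not part of the Verilog design: the function name aes_parameters is hypothetical, and the rule Nr = Nk + 6 is taken from the AES standard (FIPS 197) rather than from the text above.

```python
NB = 4  # state size in 32-bit words, constant for every key length

def aes_parameters(key_bits):
    """Return (Nk, Nr, total round-key words) for a given AES key length."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES key length must be 128, 192 or 256 bits")
    nk = key_bits // 32            # key length in 32-bit words
    nr = nk + 6                    # number of rounds: 10, 12 or 14
    return nk, nr, NB * (nr + 1)   # the key schedule produces Nb(Nr+1) words

for bits in (128, 192, 256):
    print(bits, aes_parameters(bits))
```

For the 128-bit case used in this work, the sketch gives Nk = 4, Nr = 10 and 44 round-key words.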

Encryption Process:
Both the encryption and decryption processes consist of a sequence of transformations. The number of rounds depends on the length of the key used. For a 128-bit key, the state array requires 10 rounds of iteration to produce the ciphertext (Nr=10). Each of the first Nr-1 rounds applies 4 transformations, i.e. SubByte(), ShiftRow(), MixColumn() and AddRoundKey(); the final round omits MixColumn().
SubByte() is the only nonlinear transformation in the entire process. It operates on each byte independently, replacing each byte with its S-box value. The S-box is generated by taking the multiplicative inverse in the finite field GF(2^8) with irreducible polynomial m(x) = x^8+x^4+x^3+x+1 and then applying an affine transformation over GF(2).
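The S-box construction described above can be reproduced in software to produce reference values for a ROM or look-up table. The following is a minimal sketch, assuming the affine constant 0x63 from FIPS 197; the helper names xtime, gf_mul and build_sbox are hypothetical and do not correspond to modules in the Verilog design.

```python
def xtime(a):
    """Multiply by x in GF(2^8), reducing modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a

def gf_mul(a, b):
    """Multiply two field elements by shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result

def build_sbox():
    """Multiplicative inverse followed by the affine transformation."""
    sbox = []
    for x in range(256):
        # Brute-force inverse; 0 has no inverse and is mapped to 0.
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = inv
        for k in range(1, 5):  # XOR with the inverse rotated left by 1..4 bits
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()
print(hex(SBOX[0x00]), hex(SBOX[0x53]))  # 0x63 0xed
```

The brute-force inverse search is chosen for clarity; a hardware design would normally store the resulting 256 values directly.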
The ShiftRow() transformation cyclically shifts the bytes in each row of the state array, with a different offset for each row. The decryption process uses the same operation with different rotation offsets, as shown in figure 2.
The MixColumn() transformation operates on the state array column by column, treating each column as a 4-term polynomial. Each column is represented as a polynomial over GF(2^8) and multiplied modulo x^4+1 by the fixed polynomial a(x) = {03}x^3+{01}x^2+{01}x+{02}.
In the AddRoundKey() transformation, the round key is XORed with the output of the MixColumn block. Each round key consists of Nb words from the key expansion unit, and these Nb words are added to the columns of the state array. The decryption process uses the same operation.
In the key expansion unit, each round key is generated from the previous round key as a 4-word array, using a constant that changes each round and a series of S-box substitutions on a 32-bit word of the key. The key scheduling unit generates a total of Nb(Nr+1) words, as shown in figure 3.
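The ShiftRow(), MixColumn() and AddRoundKey() transformations can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the Verilog implementation: the state is held as a flat list of 16 bytes in the column-major order of FIPS 197 (byte r + 4c is row r, column c), and the function names are hypothetical.

```python
def xtime(a):
    """Multiply by x in GF(2^8), reducing modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a

def shift_rows(state):
    """Rotate row r of the state left by r byte positions."""
    return [state[(i + 4 * (i % 4)) % 16] for i in range(16)]

def mix_columns(state):
    """Multiply each column by {03}x^3 + {01}x^2 + {01}x + {02} modulo x^4 + 1."""
    out = []
    for c in range(4):
        a = state[4 * c:4 * c + 4]
        for r in range(4):
            two = xtime(a[r])                                  # {02} * a[r]
            three = xtime(a[(r + 1) % 4]) ^ a[(r + 1) % 4]     # {03} * a[r+1]
            out.append(two ^ three ^ a[(r + 2) % 4] ^ a[(r + 3) % 4])
    return out

def add_round_key(state, round_key):
    """XOR the 16-byte round key into the state."""
    return [s ^ k for s, k in zip(state, round_key)]

column = [0xDB, 0x13, 0x53, 0x45]
print([hex(b) for b in mix_columns(column * 4)[:4]])  # ['0x8e', '0x4d', '0xa1', '0xbc']
```

Because AddRoundKey() is a plain XOR, applying it twice with the same round key restores the original state, which is why the same block serves both encryption and decryption.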

Decryption Process:
Decryption follows the same process in reverse, taking ciphertext as input and producing plaintext as output. It also consists of 10 rounds, and each round has 4 steps similar to the encryption process. AddRoundKey() is the same for both encryption and decryption, and the remaining steps are the inverses of their encryption counterparts, i.e. InvSubByte(), InvShiftRow() and InvMixColumn().
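Two of the inverse steps can be sketched in Python as a software cross-check. The example assumes the column-major 16-byte state layout of FIPS 197 and the inverse MixColumn coefficients {0e}, {0b}, {0d}, {09} from that standard; the function names are hypothetical. The inverse S-box is omitted here because it is simply the S-box table read in the opposite direction.

```python
def xtime(a):
    """Multiply by x in GF(2^8), reducing modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a

def gf_mul(a, b):
    """Multiply two field elements by shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result

def inv_shift_rows(state):
    """Rotate row r of the state right by r byte positions."""
    return [state[(i - 4 * (i % 4)) % 16] for i in range(16)]

def inv_mix_columns(state):
    """Multiply each column by the inverse of the MixColumn polynomial."""
    out = []
    for c in range(4):
        a = state[4 * c:4 * c + 4]
        for r in range(4):
            out.append(gf_mul(a[r], 0x0E) ^ gf_mul(a[(r + 1) % 4], 0x0B)
                       ^ gf_mul(a[(r + 2) % 4], 0x0D) ^ gf_mul(a[(r + 3) % 4], 0x09))
    return out

mixed = [0x8E, 0x4D, 0xA1, 0xBC]
print([hex(b) for b in inv_mix_columns(mixed * 4)[:4]])  # ['0xdb', '0x13', '0x53', '0x45']
```

The larger coefficients are the reason the inverse MixColumn block usually costs more logic than the forward one.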

Verilog Implementation
Verilog HDL is used because it makes it easier to explore different design options and offers the flexibility to move between environments. The implementation is pure Verilog, so it can easily be implemented on other devices without changing the design code. We mainly used three tools to implement the code: Notepad++, Questasim, and the Xilinx Synthesis and Simulation Tools (ISE 14.7). The goal of the design implementation is speed optimization while keeping the other constraints to a minimum. We have implemented the CBC mode of the AES Rijndael algorithm.
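The chaining performed in CBC mode can be illustrated as follows. This is a sketch only: the block function below is a toy stand-in (XOR with the key followed by a byte rotation), not AES, and is used purely to show how each ciphertext block feeds into the next. All names are hypothetical, and the input length is assumed to be a multiple of 16 bytes, so no padding is shown.

```python
BLOCK = 16  # AES block size in bytes

def toy_encrypt_block(block, key):
    """Stand-in for the AES core: XOR with the key, then rotate left one byte."""
    x = bytes(b ^ k for b, k in zip(block, key))
    return x[1:] + x[:1]

def toy_decrypt_block(block, key):
    """Inverse of toy_encrypt_block: rotate right one byte, then XOR with the key."""
    x = block[-1:] + block[:-1]
    return bytes(b ^ k for b, k in zip(x, key))

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_encrypt(plaintext, key, iv):
    """Each plaintext block is XORed with the previous ciphertext block first."""
    prev, out = iv, b""
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_encrypt_block(xor_bytes(plaintext[i:i + BLOCK], prev), key)
        out += prev
    return out

def cbc_decrypt(ciphertext, key, iv):
    """Each decrypted block is XORed with the previous ciphertext block."""
    prev, out = iv, b""
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out += xor_bytes(toy_decrypt_block(block, key), prev)
        prev = block
    return out

key, iv = bytes(range(16)), bytes(16)
message = b"A" * 32                      # two identical plaintext blocks
ciphertext = cbc_encrypt(message, key, iv)
print(cbc_decrypt(ciphertext, key, iv) == message)
```

In hardware, the same chaining amounts to one extra 128-bit XOR and a register holding the previous ciphertext block in front of the cipher core.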

Tool Details
The editor used for writing the design code is Notepad++. Questasim 10.0 is used for simulating, debugging and optimizing the design code. Xilinx ISE 14.7 is used for synthesizing the design for the Zed (Zynq Evaluation and Development) Board, as shown in figure 4. The implementation results reported here are based on Questasim 10.0 simulations.

Encryption Pre Round
A simple bitwise XOR operation is performed in the pre-round operation. Because the architecture is fully parallel, the output of this stage is registered.

Encryption Inner Rounds
There are 10 rounds in the 128-bit AES algorithm. Each inner round includes 4 submodules: the SubByte(), ShiftRow(), MixColumn() and AddRoundKey() transformations. The inner rounds comprise 9 rounds; the remaining round is implemented as the last round. If each module is instantiated 10 times to implement the 10 rounds, the overall area requirement increases 10 times, and on the Zed Board (XC7Z020) the resource utilization exceeds 100% for each of encryption and decryption. To overcome this problem, we reuse the same modules as many times as required. A state machine at the top level [middleround()] uses the same module each time and passes the registered output to the next state as input. With this approach, the IOB utilization is reduced to 5% and the slice utilization to 43%, as shown in figures 5 and 6.
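The idea of reusing one round module under control of a state machine can be modelled in software as a single round function called in a loop, with the loop counter playing the role of the state. The following AES-128 reference model is a sketch, not the authors' Verilog: all names are hypothetical, the state uses the column-major byte order of FIPS 197, and the test vector is the one given in FIPS 197 Appendix C.1. Such a model is mainly useful for generating expected values for a simulation testbench.

```python
def xtime(a):
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a

def gf_mul(a, b):
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result

def build_sbox():
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = inv
        for k in range(1, 5):
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()

def expand_key(key):
    """Return the 11 round keys (16 bytes each) for a 128-bit key."""
    w = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 1
    for i in range(4, 44):
        t = list(w[i - 1])
        if i % 4 == 0:
            t = [SBOX[b] for b in t[1:] + t[:1]]  # rotate, then substitute
            t[0] ^= rcon
            rcon = xtime(rcon)
        w.append([a ^ b for a, b in zip(w[i - 4], t)])
    return [sum(w[4 * r:4 * r + 4], []) for r in range(11)]

def round_step(state, round_key, last):
    """The single shared round datapath; MixColumn is bypassed in the last round."""
    state = [SBOX[b] for b in state]
    state = [state[(i + 4 * (i % 4)) % 16] for i in range(16)]
    if not last:
        mixed = []
        for c in range(4):
            a = state[4 * c:4 * c + 4]
            for r in range(4):
                mixed.append(gf_mul(a[r], 2) ^ gf_mul(a[(r + 1) % 4], 3)
                             ^ a[(r + 2) % 4] ^ a[(r + 3) % 4])
        state = mixed
    return [s ^ k for s, k in zip(state, round_key)]

def encrypt_block(block, key):
    keys = expand_key(key)
    state = [b ^ k for b, k in zip(block, keys[0])]   # pre-round XOR
    for rnd in range(1, 11):                          # same function reused 10 times
        state = round_step(state, keys[rnd], last=(rnd == 10))
    return bytes(state)

pt = bytes.fromhex("00112233445566778899aabbccddeeff")
key = bytes.fromhex("000102030405060708090a0b0c0d0e0f")
print(encrypt_block(pt, key).hex())  # 69c4e0d86a7b0430d8cdb78070b4c55a
```

The trade-off is the same as in the hardware: the loop needs one pass per round instead of producing a result from a fully unrolled chain, in exchange for a single copy of the round logic.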

Unit of Key Expansion
All round keys are generated from the original input round key. Encryption starts from the original key, whereas decryption starts from the last group of the generated keys, i.e. the expanded keys. The Key Expansion module includes the submodules rotate word() Transformation, SubWord() Transformation, Round Constant XORing() and key round module().
SubWord() Transformation (key s-box): it is the same as the SubByte() Transformation module, except that it processes 32 bits.
Round Constant() Transformation (key rcon()): predefined 32-bit round constants over GF(2^8) are fixed for each round. A 4-bit round number and the 32-bit output of the SubWord() Transformation are taken as inputs. The value corresponding to the round number is fetched from ROM and XORed with the output of keysbox().
AddRound() Transformation (key round()): the key round output is formed by XORing the previous round key with the output of key rcon(), as shown in figure 7.
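The key expansion steps for a 128-bit key can be sketched in Python as follows. The function names are hypothetical and only loosely mirror the submodules named above; the round constants are generated by repeated doubling in GF(2^8) rather than read from a ROM, and the example key and expected words are taken from FIPS 197 Appendix A.1.

```python
def xtime(a):
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a

def gf_mul(a, b):
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result

def build_sbox():
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = inv
        for k in range(1, 5):
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()

def rot_word(word):
    """Cyclically rotate a 4-byte word left by one byte."""
    return word[1:] + word[:1]

def sub_word(word):
    """Apply the S-box to each of the 4 bytes of a word."""
    return [SBOX[b] for b in word]

def expand_key(key):
    """Return the Nb(Nr+1) = 44 words of the AES-128 key schedule."""
    w = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 0x01
    for i in range(4, 44):
        temp = list(w[i - 1])
        if i % 4 == 0:                      # first word of each round key
            temp = sub_word(rot_word(temp))
            temp[0] ^= rcon                 # round constant affects the first byte only
            rcon = xtime(rcon)              # 01, 02, 04, ..., 80, 1b, 36
        w.append([a ^ b for a, b in zip(w[i - 4], temp)])
    return w

words = expand_key(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(bytes(words[4]).hex(), bytes(words[43]).hex())  # a0fafe17 b6630ca6
```

Words 40 to 43 form the last round key, which is the group of keys that the decryption path starts from.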

Decryption
The Inverse Cipher applies the Cipher transformations in reverse order. The transformations used in the Inverse Cipher are InvShiftRow(), InvSubByte(), InvMixColumn() and AddRoundKey().
As decryption is the inverse of encryption, its operations are performed in the reverse order of encryption. The last round of encryption becomes the first round of the decryption process, and the expanded key generated in Key Expansion() is fed back instead of the cipher key, as shown in figures 8 and 9.

Test Case 2 - Encryption: The AES module inputs are driven for encryption and the expected outputs are obtained. All the sequences below are in hexadecimal.

Plaintext: 11111111111111111111111111111111
CipherKey: 3C4FCF098815F7ABA5D2AE2816157E2B

Test Case 3 - Decryption: The AES module inputs are driven for decryption and the expected outputs are obtained. All the sequences below are in hexadecimal.

Synthesis report
Overall implementation of parallel AES

Conclusion
The delay associated with each round of encryption can be reduced by a parallel design of AES encryption, which operates at a higher frequency than a non-pipelined, non-parallel design. This type of design suits time-critical encryption applications that require high message throughput. The hardware implementation of AES is faster than a software implementation, with benefits such as secure key handling during encryption and higher throughput.
Since the encryption time has been reduced, the work can be extended to strengthen security against more severe attacks.
There is further scope to optimize the utilization of resources. The implementation can be improved to use the resources more efficiently and to increase the maximum clock frequency. The key length can be reduced, maintaining the same security, in order to optimize the resource utilization. A few gaps have been covered, but a lot can still be done to achieve data security along with the optimization of resources.