The growth in the volume of data has reduced the use of traditional storage and processing methods. Relational databases (RDBMS) are poorly suited to today's largely unstructured or semi-structured data, so technologies such as Hadoop, which can store data of many types, are becoming essential. Hadoop stores raw data in HDFS, and unstructured data is processed only when it is needed. Blockchain technologies also store data efficiently, in blocks, and because they use hashing techniques, the data they hold is well protected. Hash functions produce an output of fixed length (Baygin, N., et al. 2019).
RDBMS stands for Relational Database Management System. In an RDBMS, data are stored in tables made up of rows and columns, and the values are mostly numbers and text.
Table 1. Sample Table in RDBMS
Big data refers to very large volumes of data of many different types, e.g., text, numbers, audio, video, social-media content, blogs and surveys. Big data technologies are useful when data has great variety, volume and velocity; in such scenarios, traditional methods struggle to manage it. In a Hadoop environment, data sets are distributed, and the file system used is HDFS (Hadoop Distributed File System).
Storage of data is handled by HDFS, which makes replicas of the data and distributes them. The master node coordinates parallel processing of the data using MapReduce, while the data itself resides on the slave nodes.
HDFS stands for Hadoop Distributed File System. It can store data ranging from gigabytes to terabytes. Data is divided into blocks that are spread across many nodes. To make sure that data is not lost, replicas of each block are kept on different servers, so even if one server fails, the data can be fetched from another.
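The splitting and replication described above can be illustrated with a small in-memory simulation. This is a simplified sketch, not how HDFS is implemented: the node names, the tiny block size and the round-robin placement are assumptions for illustration (real HDFS uses a default block size of 128 MB, a default replication factor of 3, and a rack-aware placement policy managed by the NameNode).

```python
def place_blocks(data: bytes, block_size: int, nodes: list, replication: int = 3):
    # Split the file into fixed-size blocks and store each block on `replication` distinct nodes
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    storage = {node: {} for node in nodes}
    for block_id, block in enumerate(blocks):
        for r in range(replication):
            # Simple round-robin placement (an assumption; real HDFS is rack-aware)
            node = nodes[(block_id + r) % len(nodes)]
            storage[node][block_id] = block
    return storage, len(blocks)


def read_file(storage: dict, num_blocks: int, failed=frozenset()) -> bytes:
    # Reassemble the file, reading each block from any replica on a node that is still up
    parts = []
    for block_id in range(num_blocks):
        replica = next(
            (held[block_id] for node, held in storage.items()
             if node not in failed and block_id in held),
            None,
        )
        if replica is None:
            raise IOError(f"all replicas of block {block_id} are lost")
        parts.append(replica)
    return b"".join(parts)


if __name__ == "__main__":
    nodes = ["datanode1", "datanode2", "datanode3", "datanode4"]  # hypothetical node names
    data = b"example file contents " * 4
    storage, n = place_blocks(data, block_size=16, nodes=nodes)
    # With 3 replicas on 4 nodes, every block survives the failure of any two nodes
    print(read_file(storage, n, failed={"datanode2", "datanode3"}) == data)
```

Because each block lives on three of the four nodes, losing two nodes still leaves at least one copy of every block, which is the property the paragraph relies on.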
HDFS is suited to files that are too large to be managed in a traditional way, and it runs on low-cost commodity hardware. However, access latency is high: it takes time to retrieve the first piece of data.
The NameNode acts as the master. It manages the metadata of all files, and because multiple clients can work in parallel, this single machine coordinates all the tasks that need to be done.
The DataNode stores the data and retrieves it when needed.
MapReduce is the module that enables parallel processing, so processing speed is high. Partial results from many servers are finally combined to generate the output. All forms of data can be stored using Hadoop, e.g., data from the web, social networks, images and video (Acharjya, D. P., & Ahmed, K. 2016).
The Mapper class takes the input, tokenizes it and maps it to key-value pairs, which are then sorted by key. The Reducer class uses the Mapper's output as its input: it groups the pairs with matching keys and reduces each group to a single result.
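The Mapper/Reducer flow can be sketched as a single-process word count. This only mimics the data flow of the map, sort and reduce phases in plain Python; it does not use the Hadoop API, and the function names `mapper`, `shuffle_and_sort`, `reducer` and `map_reduce` are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def mapper(line: str):
    # Map phase: tokenize one input line and emit (word, 1) pairs
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(pairs):
    # Group all values by key and return the groups in sorted key order
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reducer(key: str, values: list):
    # Reduce phase: combine all values that share the same key
    return key, sum(values)


def map_reduce(lines):
    mapped = chain.from_iterable(mapper(line) for line in lines)
    return [reducer(key, values) for key, values in shuffle_and_sort(mapped)]


if __name__ == "__main__":
    documents = ["big data needs big storage", "hadoop stores big data"]
    print(map_reduce(documents))
```

In a real cluster, each mapper runs on the node holding its input block, and the framework moves pairs with the same key to the same reducer; here all phases run in one process.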
Fig 2. Big Data Technologies
Characteristics of Big Data
The three main characteristics of big data are:
1.Volume (Mukherjee, S., & Shaw, R. 2016).
Potential insights can be achieved only from large volumes of useful data. In recent times, most data has been unstructured or semi-structured, and it amounts to petabytes. File systems such as HDFS help in storing such large volumes of data.
2.Variety (Narayanan, U., et al. 2017)
RDBMS are good at handling structured data, but storing and processing unstructured data with them becomes tedious. Data forms include text, audio, video, social-media data, surveys, email messages, newspaper articles and blog posts.
3.Velocity
Data is generated at high speed, and it must be processed at high speed too. Because much of big data is real-time data, it often has to be processed as it arrives.
A blockchain consists of many blocks, maintained by a network of nodes. Data resides in the blocks, and a nonce and a hash value are created for each block. The hash value makes sure that the data inside the block cannot be altered unnoticed. Blockchain follows a decentralized approach (Zheng, Z et al. 2017).
Blockchains are used in applications such as IoT, healthcare, government, finance, cryptocurrency exchanges and music-royalty tracking.
Types of Blockchain
1.Public Blockchain: Anyone can participate. When a new block is created, any node can check it for tampering and add it to its copy of the blockchain.
2.Private Blockchain: The blockchain belongs to a single organization. Only members of that organization can check new blocks and add them.
Encryption and Decryption
Encryption algorithms are used to keep data safe on a network. An encryption algorithm converts plain text into cipher text, which cannot be read by a third party. Decryption converts the cipher text back into the original plain text.
Sample Encryption Algorithms:
E.g., Substitution cipher. Each character of the plain text is replaced by some other character, producing the encrypted text.
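A Caesar cipher is one simple instance of substitution: every letter is replaced by the letter a fixed number of places later in the alphabet. The sketch below assumes lowercase English text and leaves other characters unchanged; the shift value acts as the key.

```python
import string


def caesar_encrypt(plain_text: str, shift: int) -> str:
    # Replace each lowercase letter with the letter `shift` places later (z wraps to a)
    alphabet = string.ascii_lowercase
    k = shift % 26
    shifted = alphabet[k:] + alphabet[:k]
    return plain_text.translate(str.maketrans(alphabet, shifted))


def caesar_decrypt(cipher_text: str, shift: int) -> str:
    # Decrypt by shifting in the opposite direction with the same key
    return caesar_encrypt(cipher_text, -shift)


if __name__ == "__main__":
    print(caesar_encrypt("hello", 3))  # khoor
```

With only 25 useful keys, a Caesar cipher is trivially broken by trying every shift, which is why it serves only to illustrate the idea of substitution.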
E.g., Columnar Transposition. The plain text is written horizontally, row by row, and read vertically, column by column, to produce the cipher text.
Cipher Text: hlaeoilh
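The write-by-rows, read-by-columns procedure can be sketched as follows. This is a keyless variant with an assumed number of columns; the classic columnar transposition also reorders the columns according to a keyword, which is omitted here.

```python
def columnar_encrypt(plain_text: str, num_columns: int) -> str:
    # Write the text row by row into a grid with `num_columns` columns,
    # then read it column by column to form the cipher text
    rows = [plain_text[i:i + num_columns] for i in range(0, len(plain_text), num_columns)]
    return "".join(
        "".join(row[col] for row in rows if col < len(row))
        for col in range(num_columns)
    )


def columnar_decrypt(cipher_text: str, num_columns: int) -> str:
    # Refill the grid column by column, then read it back row by row
    num_rows = -(-len(cipher_text) // num_columns)  # ceiling division
    full_columns = len(cipher_text) % num_columns or num_columns  # columns reaching the last row
    grid = [[""] * num_columns for _ in range(num_rows)]
    pos = 0
    for col in range(num_columns):
        height = num_rows if col < full_columns else num_rows - 1
        for row in range(height):
            grid[row][col] = cipher_text[pos]
            pos += 1
    return "".join("".join(row) for row in grid)


if __name__ == "__main__":
    secret = columnar_encrypt("wearediscovered", 3)
    print(secret, columnar_decrypt(secret, 3))
```

The letters themselves are unchanged; only their positions move, which is what distinguishes transposition from substitution.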
Two main forms of Encryption
1.Symmetric Encryption: This is the most commonly used form. Only one key is involved: the same key is used to encrypt and to decrypt the data, so the sender and receiver share that key.
E.g., for Symmetric Encryption:
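A minimal toy sketch of the single-shared-key idea, using only the Python standard library: a keystream is derived by hashing the key, a per-message nonce and a counter with SHA-256, then XORed with the data. This construction and the function names are assumptions for illustration only; it provides no integrity protection, and real systems use a vetted cipher such as AES (e.g., AES-GCM) from a cryptography library.

```python
import hashlib
import secrets


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: SHA-256(key || nonce || counter) blocks concatenated
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_bytes(data: bytes, stream: bytes) -> bytes:
    return bytes(d ^ s for d, s in zip(data, stream))


def encrypt(key: bytes, plain_text: bytes):
    # A fresh random nonce per message keeps the keystream from being reused
    nonce = secrets.token_bytes(16)
    return nonce, xor_bytes(plain_text, keystream(key, nonce, len(plain_text)))


def decrypt(key: bytes, nonce: bytes, cipher_text: bytes) -> bytes:
    # XOR with the same keystream undoes encryption: the SAME key is used in both directions
    return xor_bytes(cipher_text, keystream(key, nonce, len(cipher_text)))


if __name__ == "__main__":
    shared_key = secrets.token_bytes(32)  # known to both sender and receiver
    nonce, cipher_text = encrypt(shared_key, b"transfer 100 to B")
    print(decrypt(shared_key, nonce, cipher_text))
```

The key point is symmetry: whoever holds `shared_key` can both encrypt and decrypt, so the key itself must be exchanged securely beforehand.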
2.Asymmetric Encryption: Also known as public key encryption, it uses a key pair: a public key and a private key. The public key is used for encryption and the private key for decryption.
Because the private key is kept secret, unauthorized access is minimized: an attacker who does not know the private key cannot decrypt the data.
E.g., for Asymmetric Encryption:
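Textbook RSA with tiny primes shows the key-pair idea concretely. The primes p = 61 and q = 53 are the classic teaching values and are assumptions for illustration; this is deliberately insecure, since real RSA uses moduli of 2048 bits or more together with a padding scheme such as OAEP.

```python
def make_toy_rsa_keys():
    # Tiny primes for illustration only
    p, q = 61, 53
    n = p * q                 # modulus, shared by both keys (3233)
    phi = (p - 1) * (q - 1)   # Euler's totient of n (3120)
    e = 17                    # public exponent, coprime with phi
    d = pow(e, -1, phi)       # private exponent: modular inverse of e (Python 3.8+)
    return (e, n), (d, n)


def rsa_encrypt(message: int, public_key) -> int:
    # Anyone with the public key can encrypt
    e, n = public_key
    return pow(message, e, n)


def rsa_decrypt(cipher: int, private_key) -> int:
    # Only the holder of the private key can decrypt
    d, n = private_key
    return pow(cipher, d, n)


if __name__ == "__main__":
    public_key, private_key = make_toy_rsa_keys()
    c = rsa_encrypt(65, public_key)
    print(c, rsa_decrypt(c, private_key))  # 2790 65
```

Encrypting the message 65 gives 2790, and only the private exponent d = 2753 turns 2790 back into 65.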
A blockchain is simply a chain of blocks that hold data (Zheng, Z et al. 2017), e.g., a block with details such as From, To and Amount. Each block also contains the hash value of the previous block, so if someone modifies a block, the change can be identified from the next block. This makes it difficult for an intruder to tamper with a block. In addition, hashing algorithms such as SHA-256 are strong enough to further enhance security.
For example, if an attacker changes the data of Block 10, the hash value of Block 10 changes, but Block 11 still stores the old hash value of Block 10 as its previous hash value. This mismatch reveals the tampering, which makes the method strong.
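The Block 10 / Block 11 example can be reproduced with a short sketch in which each block stores the SHA-256 hash of its predecessor. The block fields and the JSON serialization are assumptions for illustration, not a specific blockchain's format.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    # Hash the block's full contents (including prev_hash) with SHA-256
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def build_chain(transactions):
    chain = []
    prev = "0" * 64  # placeholder previous hash for the first (genesis) block
    for tx in transactions:
        block = {"index": len(chain), "data": tx, "prev_hash": prev}
        chain.append(block)
        prev = block_hash(block)
    return chain


def find_tampering(chain):
    # Return the index of the first block whose stored hash in the next block no longer matches
    for i in range(len(chain) - 1):
        if chain[i + 1]["prev_hash"] != block_hash(chain[i]):
            return i
    return None


if __name__ == "__main__":
    chain = build_chain([{"from": "A", "to": "B", "amount": i} for i in range(12)])
    chain[10]["data"]["amount"] = 999   # attacker edits Block 10
    print(find_tampering(chain))        # 10, detected via Block 11
```

Note that a change to the newest block has no successor to expose it, so in practice nodes also compare the latest block hash with the rest of the network.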
Proof of work
Today's machines are so powerful that an attacker might modify a block and recompute the affected hash values in a fraction of a second. The proof-of-work concept was introduced to prevent this: a block is accepted only after a computationally expensive puzzle has been solved for it. For example, proof of work might take a few minutes per block.
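A minimal proof-of-work sketch: search for a nonce such that the SHA-256 hash of the block data plus the nonce starts with a given number of hexadecimal zeros. The string concatenation and the leading-zeros rule are simplifying assumptions; Bitcoin, for instance, compares the hash numerically against a target value.

```python
import hashlib


def mine(block_data: str, difficulty: int):
    # Try nonces until SHA-256(block_data + nonce) begins with `difficulty` hex zeros
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest
        nonce += 1


def verify(block_data: str, nonce: int, difficulty: int) -> bool:
    # Checking a proof costs one hash, while finding it took many attempts
    digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


if __name__ == "__main__":
    nonce, digest = mine("Block 10: A -> B, 100", difficulty=4)
    print(nonce, digest)
```

Each additional hex zero multiplies the expected number of attempts by 16, so the network can tune how long mining takes, while verification stays instant. An attacker who edits a block must redo this work for that block and every block after it.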
When a new block is created, it is broadcast to all nodes in the network. The nodes check the block for any tampering and then add it to their copies of the blockchain.
Security Enhancements using Hashing Technique
Hashing techniques help enhance security. Hashing differs from encryption: encrypted data can be decrypted to get back the original text, but hashing is one-way, and hash values are instead used to verify that data has not been altered.
SHA (Secure Hash Algorithm) is a widely used family of hash functions. Its output size can be, for example, 256 bits (SHA-256) or 512 bits (SHA-512).
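Python's standard `hashlib` module can show both properties mentioned above: the output length is fixed by the algorithm regardless of input size, and recomputing a hash detects altered data. The helper names `digest_bits` and `is_unaltered` are hypothetical.

```python
import hashlib


def digest_bits(algorithm: str, message: bytes) -> int:
    # Output size in bits is fixed by the algorithm, whatever the input length
    return hashlib.new(algorithm, message).digest_size * 8


def is_unaltered(data: bytes, expected_hex: str) -> bool:
    # Integrity check: recompute the hash and compare it with the stored one
    return hashlib.sha256(data).hexdigest() == expected_hex


if __name__ == "__main__":
    for algorithm in ("sha256", "sha512"):
        for message in (b"a", b"a much longer message " * 1000):
            print(algorithm, len(message), "bytes in ->", digest_bits(algorithm, message), "bits out")
    fingerprint = hashlib.sha256(b"From=A;To=B;Amount=100").hexdigest()
    print(is_unaltered(b"From=A;To=B;Amount=900", fingerprint))  # altered data -> False
```

Unlike decryption, there is no function that turns `fingerprint` back into the original bytes; the only check possible is to hash candidate data again and compare.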
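STEP 1: Pre-processing. Represent the message in binary, append a single 1 bit, then append zeros and finally a 64-bit representation of the original message length, so that the total length is a multiple of 512 bits.
STEP 2: Initialize the 8 hash values, which are loaded into the 8 working buffers a to h.
STEP 3: Initialize the 64 round constants k, each 32 bits long.
STEP 4: Divide the entire message into n chunks of 512 bits.
STEP 5: Expand each 512-bit chunk into 64 words of 32 bits and perform 64 rounds of operations on it, one word per round. The results of the 1st round act as input for the next round, and so on.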
Fig 3. Single round of SHA 256 Algorithm
STEP 6: After the 64th round, the buffers a to h are added to the 8 hash values. Once all chunks have been processed, the 8 hash values together form the final hash value of 256 bits.
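The steps above can be followed in a compact pure-Python SHA-256, checked against the standard library's `hashlib.sha256`. It is a teaching sketch, far slower than `hashlib`, and it computes the constants instead of listing them: the initial hash values are the first 32 fractional bits of the square roots of the first 8 primes, and the 64 round constants are the first 32 fractional bits of the cube roots of the first 64 primes, as the SHA-256 standard (FIPS 180-4) defines them.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF  # keep every value to 32 bits


def first_primes(count):
    primes, candidate = [], 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def icbrt(n):
    # Integer cube root (floor) by Newton's method, starting above the true root
    x = 1 << ((n.bit_length() + 2) // 3)
    while True:
        y = (2 * x + n // (x * x)) // 3
        if y >= x:
            return x
        x = y


PRIMES = first_primes(64)
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]  # STEP 2: 8 initial hash values
K = [icbrt(p << 96) & MASK for p in PRIMES]           # STEP 3: 64 round constants


def rotr(x, n):
    # Rotate a 32-bit word right by n bits
    return ((x >> n) | (x << (32 - n))) & MASK


def pad(message: bytes) -> bytes:
    # STEP 1: append a 1 bit (0x80 byte), zero bytes, then the 64-bit big-endian bit length
    bit_length = len(message) * 8
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    return padded + struct.pack(">Q", bit_length)


def sha256(message: bytes) -> str:
    h = list(H0)
    data = pad(message)
    for offset in range(0, len(data), 64):  # STEP 4: one 512-bit chunk at a time
        w = list(struct.unpack(">16I", data[offset:offset + 64]))
        for i in range(16, 64):  # expand the chunk into 64 message-schedule words
            s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
            s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
            w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
        a, b, c, d, e, f, g, hh = h  # load the buffers a..h
        for i in range(64):  # STEP 5: 64 rounds, each feeding the next
            big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
            choose = (e & f) ^ (~e & g)
            t1 = (hh + big_s1 + choose + K[i] + w[i]) & MASK
            big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
            majority = (a & b) ^ (a & c) ^ (b & c)
            t2 = (big_s0 + majority) & MASK
            hh, g, f, e = g, f, e, (d + t1) & MASK
            d, c, b, a = c, b, a, (t1 + t2) & MASK
        # STEP 6: add the buffers back into the running hash values
        h = [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e, f, g, hh))]
    return "".join(f"{x:08x}" for x in h)


if __name__ == "__main__":
    print(sha256(b"abc") == hashlib.sha256(b"abc").hexdigest())
```

Padding always adds at least 65 bits, so a 56-byte message already needs two 512-bit chunks; that is why the test inputs include lengths around that boundary.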
The 64 rounds make the hashing algorithm strong. However, the computing power available today lets an attacker recompute hash values very quickly, e.g., after tampering with a block.
As an additional security measure, blockchains use the proof-of-work technique. And because the hash value of the previous block is stored in the current block, tampering can be detected.
Fig 4. Hash Value for an input.
A salt value increases the uniqueness of hash outputs. A notable property of hashing algorithms is that the same input always produces the same output. Although hashing algorithms are strong, computing power today is strong too, so a salt value can be used to make things even more difficult for an attacker.
Because a different salt value is added to each input, identical inputs produce different hash values. This helps defend against attacks such as precomputed hash-table (rainbow table) attacks, dictionary attacks and brute-force attacks, since an attacker can no longer crack many identical hashes at once.
E.g., consider a scenario with four users, two of whom choose the same password.
Password of A: Kitty
Password of B: 12345
Password of C: Kitty
Password of D: x123
Applying the SHA-256 algorithm gives the following output:
Hashed Value for A: 67731ff58137eb39713ae30eba33c54c8c1d5418e081428ca815e4e733d64f6d
Hashed Value for B:
Hashed Value for C: 67731ff58137eb39713ae30eba33c54c8c1d5418e081428ca815e4e733d64f6d
Hashed Value for D:
The hash values of A and C are identical because their inputs are the same. This can help an attacker trace the original data: if the attacker learns A's password, he immediately knows C's password too. The salt method can be used to avoid such scenarios.
The salt value can either be appended or prepended to the password, i.e., Password + Salt value or Salt value + Password.
For example, with the salt value 123, prepending it to the password gives 123kitty.
And the respective hash value would be: ac80cf42139999bd5968ded4acacc421089f65ef5049fd471e52d895cf558db1
We must also make sure that the salt is cryptographically strong, i.e., generated by a cryptographically secure random number generator and unique for each password.
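A minimal sketch of salted password hashing with the standard library, prepending the salt as in the example above. The 16-byte salt length and the helper names `hash_password` and `check_password` are assumptions. Production systems generally prefer a deliberately slow key-derivation function such as `hashlib.pbkdf2_hmac` or `hashlib.scrypt` over a single SHA-256 pass.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, str]:
    # Draw a fresh 16-byte salt from a cryptographically secure source unless one is supplied
    if salt is None:
        salt = secrets.token_bytes(16)
    # Prepend the salt to the password (Salt value + Password) and hash with SHA-256
    digest = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    return salt, digest


def check_password(password: str, salt: bytes, stored_digest: str) -> bool:
    # Recompute with the stored salt and compare in constant time to avoid timing leaks
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored_digest)


if __name__ == "__main__":
    # Without a salt, users A and C (same password) get the same hash
    print(hashlib.sha256(b"Kitty").hexdigest() == hashlib.sha256(b"Kitty").hexdigest())
    # With independent random salts, their stored hashes differ
    salt_a, hash_a = hash_password("Kitty")
    salt_c, hash_c = hash_password("Kitty")
    print(hash_a == hash_c)
    print(check_password("Kitty", salt_a, hash_a))
```

The salt is stored next to the hash; it is not secret, its job is only to make each stored hash unique so precomputed tables and cross-user comparisons stop working.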
The result of applying a hash function to two blocks in a blockchain whose contents are the same is shown below.
Fig 5. Hashed value for blocks without salt value
The result after appending a salt value to the data of each block is shown below.
Fig 6. After adding salt value to the block contents, where the block contents are the same.
Big data and blockchain technologies help handle large volumes of data and keep the data secure. Hashing algorithms such as SHA-256 are used to keep data safe, but hashing always produces the same output for the same input, which can act as a clue to attackers. Adding a salt to the input of the hash function solves this problem, making the data even more secure.
Various attacks on blockchains remain possible, such as routing attacks and 51% attacks, and solutions to handle them can be investigated. In a 51% attack, an entity that controls more than half of the network's mining (hashing) power can take control of the network.
Conflict of interest
The authors declare no potential conflict of interest regarding the publication of this work. In addition, the ethical issues, including plagiarism, informed consent, misconduct, data fabrication and/or falsification, double publication and/or submission, and redundancy, have been completely observed by the authors.
The author(s) received no financial support for the research, authorship, and/or publication of this article.