Sunday 26 July 2015

Towards Energy Efficient Big Data Gathering in Densely Distributed Sensor Networks

INTRODUCTION:- Recent developments in various areas of Information and Communication Technology (ICT) have contributed to an explosive growth in the amount of data. According to a report published by IBM, 90 percent of the data in the world was generated in the last couple of years. In recent years the big data concept has emerged widely, and it is currently attracting much attention from government, industry, and academia. As shown in Fig. 1, big data comprises high-volume, high-velocity, and high-variety information assets, which are difficult to collect, store, and process using the available technologies.

     Variety indicates that the data has highly varied structures (e.g., data generated by a wide range of sources such as Machine-to-Machine (M2M), Radio Frequency Identification (RFID), and sensors), while velocity refers to high-speed processing and analysis (e.g., fast database transactions, click-streaming, and so forth). Although currently used services (e.g., social networks, network switches, cloud storage, and so forth) already generate a large volume of big data, it is anticipated that more and more data will be generated by sensors and RFID devices such as motion sensors, accelerometers, atmospheric sensors, thermometric sensors, and so on. In fact, according to a report by ORACLE, the volume of data generated by sensors and RFID devices is expected to reach the order of petabytes. As shown in Fig. 1, sensors and RFID devices are responsible for generating big data in large volume and in wide variety.

AIM:- To propose an effective solution that reduces energy consumption in sensor networks and uses the sink node’s mobility to facilitate data gathering. To this end, a new mobile sink routing and data gathering method is proposed, based on network clustering with a modified Expectation Maximization (EM) technique.
Synopsis:- A mobile wireless sensor network can simply be defined as a wireless sensor network (WSN) in which the sensor nodes are mobile. Mobile sensor networks are a smaller, emerging field of research in contrast to their well-established predecessor.
     Mobile sensor networks are much more versatile than static sensor networks, as they can be deployed in any scenario and cope with rapid topology changes. Commonly, the nodes consist of a radio transceiver and a microcontroller powered by a battery, together with some kind of sensor for detecting heat, light, humidity, temperature, etc.
     In this section, we first outline the clustering problem in WSNs using a mobile sink and the challenges in solving this problem. After that, we introduce the considered network model and give an overview of the EM algorithm for clustering. Based on the EM algorithm, we propose our clustering method and the procedure to gather data using the proposed method.
     Twitter is a well-known social and micro-blogging website that allows millions of users to interact over different types of communities, topics, and tweeting trends. The big data generated on Twitter daily, and its impact on social networking, has motivated the application of data mining (analysis) to extract necessary information from tweets. In this paper, we assess the impact of tweets using spectral clustering.
Existing System:- Sensor network systems have recently been used in many situations and provide various kinds of information. Although they play an important role in our lives, their performance is not sufficient for real-time data collection. We discuss the requirements for next-generation data gathering.
     Although sensor networks have provided essential services, they have shortcomings in coverage and mobility. Many existing sensor network systems rely on wired or wireless ground infrastructure to collect data from sensor terminals. However, the coverage of these networks is limited, and creating new infrastructure for remote areas is difficult for both economic and physical reasons.
Drawbacks Of Existing System:- The network is divided into several subnetworks because of the limited wireless communication range. For example, sensors deployed in a building may not be able to communicate with sensors distributed in neighboring buildings. Therefore, the limited communication range poses a challenge for collecting data from all sensor nodes.
     Wireless transmission consumes the energy of the sensors. Even though the amount of data generated by an individual sensor is not significant, each sensor consumes a lot of energy relaying the data generated by surrounding sensors.
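The relay burden can be illustrated with a small sketch. It assumes a hypothetical connected topology, one packet per sensor, shortest-hop forwarding, and a unit cost per transmission; none of these details come from the original system, and the node names are made up for illustration.

```python
from collections import deque

def hop_counts(adjacency, sink):
    """Breadth-first search: minimum hop count from every node to the sink."""
    hops = {sink: 0}
    queue = deque([sink])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops

def transmissions_per_node(adjacency, sink):
    """Count the packets each sensor transmits when every sensor sends one
    packet to the sink along a shortest-hop path (its own plus relayed ones).
    Assumes the network is connected."""
    hops = hop_counts(adjacency, sink)
    sent = {node: 0 for node in adjacency if node != sink}
    for source in sent:
        node = source
        while node != sink:
            sent[node] += 1
            # Forward to a neighbor that is one hop closer to the sink.
            node = next(n for n in adjacency[node] if hops[n] == hops[node] - 1)
    return sent

# Hypothetical chain of four sensors: sink - a - b - c - d
chain = {
    "sink": ["a"],
    "a": ["sink", "b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
}
load = transmissions_per_node(chain, "sink")
print(load)  # {'a': 4, 'b': 3, 'c': 2, 'd': 1}
```

In this toy chain the sensor next to the sink transmits four packets although it generated only one, which is the imbalance the paragraph above describes.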
Proposed System:- The main motivation is to examine the effect of data request messages as the number of clusters increases. Based on a common data gathering model for densely distributed WSNs, we demonstrate that the number of data request messages has a noticeable impact on the energy consumption of the sensor nodes. As the connectivity of the nodes increases, this impact grows.
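A rough way to see why connectivity matters is to count how many times a data request is received. The sketch below assumes a simplified flooding model in which every node broadcasts the request exactly once and all of its neighbors hear it; the two four-node topologies are hypothetical.

```python
def request_receptions(adjacency):
    """Total number of data request receptions when every node broadcasts
    the request once and each broadcast is heard by all its neighbors."""
    return sum(len(neighbors) for neighbors in adjacency.values())

# Hypothetical sparse network: four nodes in a ring (two neighbors each).
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

# Hypothetical dense network: the same four nodes, fully connected.
full = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}

print(request_receptions(ring))  # 8
print(request_receptions(full))  # 12
```

Under this model the number of receptions equals the sum of the node degrees, so denser networks spend more energy on request messages for the same number of nodes.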
     The mobile sink is responsible for collecting the data from the nodes in each cluster. Delay is one of the main problems of using a mobile sink in WSNs. To shorten this delay, we implemented the Expectation Maximization (EM) algorithm.
Modules:- 
Cluster Creation:- WSNs are autonomous systems consisting of mobile hosts connected by multi-hop wireless links. In each cluster, a cluster head (CH) is elected according to its weight, which is computed by combining a set of system parameters (e.g., mobility). Sensor nodes store sensed information until the mobile sink approaches the cluster centroid.
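A weight-based election of this kind could look like the following sketch. Only mobility is named in the text above; the residual energy parameter, the weighting coefficients, and the rule that the lowest weight wins are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Node:
    node_id: str
    mobility: float         # average speed; lower means a more stable node
    residual_energy: float  # fraction of battery left, 0..1 (assumed parameter)

def combined_weight(node, w_mobility=0.7, w_energy=0.3):
    """Combine the parameters into one weight (coefficients are assumed).
    A slow node with a full battery gets a low weight."""
    return w_mobility * node.mobility + w_energy * (1.0 - node.residual_energy)

def elect_cluster_head(nodes):
    """Elect the node with the lowest combined weight as cluster head."""
    return min(nodes, key=combined_weight)

# Hypothetical cluster of three nodes.
cluster = [
    Node("n1", mobility=2.0, residual_energy=0.9),
    Node("n2", mobility=0.5, residual_energy=0.6),
    Node("n3", mobility=0.4, residual_energy=0.2),
]
head = elect_cluster_head(cluster)
print(head.node_id)  # n2
```

Here n3 moves the least but has little energy left, so n2 ends up with the lowest combined weight.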
Twitter Data Generation:- Twitter, a highly popular platform for information exchange, can be used as a data-mining source to help address the aforementioned challenges; its data is collected by the sensor nodes. Specifically, using a large data set of harvested tweets, sensor nodes connect to the sink to transfer the data set to the HDFS system.
     The REST APIs provide programmatic access to read and write Twitter data. The REST API also identifies Twitter applications, and responses are available in JSON.
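Because responses arrive as JSON, the fields used later for analytics can be pulled out with a standard JSON parser. The sketch below works on a hand-written sample payload and makes no network request; the sample values are invented, and the field names follow the tweet object layout of the REST API as an assumption.

```python
import json

# Hand-written sample payload (not real data).
sample_response = """
{"text": "Sensor data is big data #WSN #BigData",
 "retweet_count": 12,
 "user": {"location": "London"},
 "entities": {"hashtags": [{"text": "WSN"}, {"text": "BigData"}]}}
"""

def summarize_tweet(raw_json):
    """Extract hashtags, user location, and retweet count from one tweet."""
    tweet = json.loads(raw_json)
    hashtags = tweet.get("entities", {}).get("hashtags", [])
    return {
        "hashtags": [tag["text"] for tag in hashtags],
        "location": tweet.get("user", {}).get("location"),
        "retweet_count": tweet.get("retweet_count", 0),
    }

summary = summarize_tweet(sample_response)
print(summary)
```

Missing fields fall back to an empty list, None, or zero, so a partially filled tweet does not raise an error.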
EM computation:- When the sink node arrives at a cluster centroid, it sends a data request message to the cluster head to invoke data transmission from the sensor nodes. The nodes that receive the data request message send their data to the sink node and broadcast the data request message to their neighboring nodes using multi-hop traversal. Clustering can be based on probability models to cover missing values. This has led to the development of clustering methods such as Expectation Maximization (EM), which is based on the principle of maximum likelihood estimation with unobserved variables in finite mixture models.
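To make the EM idea concrete, here is a minimal sketch of plain EM for a mixture of isotropic 2D Gaussians applied to node positions. It is not the modified EM variant used in the project; the initialization (evenly spaced points from the input list), the iteration count, and the sample coordinates are all assumptions.

```python
import math

def em_cluster(points, k, iterations=30):
    """Fit k isotropic 2D Gaussians to node positions with EM.
    Returns (centroids, labels)."""
    n = len(points)
    means = [points[i * n // k] for i in range(k)]  # simplistic initial guess
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = [[1.0 / k] * k for _ in points]
    for _ in range(iterations):
        # E-step: responsibility of each cluster for each node,
        # computed in the log domain to avoid numerical underflow.
        for i, (x, y) in enumerate(points):
            logs = []
            for j in range(k):
                d2 = (x - means[j][0]) ** 2 + (y - means[j][1]) ** 2
                logs.append(math.log(weights[j]) - d2 / (2 * variances[j])
                            - math.log(2 * math.pi * variances[j]))
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp[i] = [e / total for e in exps]
        # M-step: re-estimate centroid, variance, and weight of each cluster.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # empty cluster: keep its previous parameters
            mx = sum(r[j] * p[0] for r, p in zip(resp, points)) / nj
            my = sum(r[j] * p[1] for r, p in zip(resp, points)) / nj
            spread = sum(r[j] * ((p[0] - mx) ** 2 + (p[1] - my) ** 2)
                         for r, p in zip(resp, points))
            means[j] = (mx, my)
            variances[j] = max(spread / (2 * nj), 1e-6)
            weights[j] = nj / n
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, labels

# Hypothetical node positions forming two groups.
positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
             (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
centroids, labels = em_cluster(positions, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The returned centroids are the points a mobile sink would visit, and the labels say which cluster each node belongs to.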

Data collection:- The mobile sink patrols every cluster centroid and collects the data from the nodes in the cluster. The sensor data is then transferred to the HDFS system with less energy consumption. Spectral clustering is then performed for data analytics based on hashtag, location, and retweet count.
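The order in which the sink visits the centroids is not specified above. As one possible illustration, the sketch below uses a simple nearest-neighbor heuristic, which is an assumption and not necessarily the route planning used in the project; the coordinates are hypothetical.

```python
import math

def patrol_order(start, centroids):
    """Visit order for the mobile sink: always move to the closest
    centroid that has not been visited yet (nearest-neighbor heuristic)."""
    remaining = list(centroids)
    route, position = [], start
    while remaining:
        nearest = min(remaining, key=lambda c: math.dist(position, c))
        remaining.remove(nearest)
        route.append(nearest)
        position = nearest
    return route

def route_length(start, route):
    """Total distance travelled from the start through every stop."""
    stops = [start] + route
    return sum(math.dist(a, b) for a, b in zip(stops, stops[1:]))

# Hypothetical cluster centroids and sink starting position.
centroids = [(5.0, 5.0), (1.0, 0.0), (1.0, 4.0)]
route = patrol_order((0.0, 0.0), centroids)
print(route)  # [(1.0, 0.0), (1.0, 4.0), (5.0, 5.0)]
```

The heuristic is fast but does not guarantee the shortest tour; a shorter patrol route means less waiting time before each cluster's data is collected.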

Motivation:- The following challenges and benefits motivate us to develop this product.

Challenges:-
We first outline the clustering problem in WSNs using a mobile sink and the challenges in solving this problem. After that, we introduce the considered network model and give an overview of the EM algorithm for clustering. Based on the EM algorithm, we propose our clustering method and the procedure to gather data using the proposed method.

Architecture Diagram:- 

     Our implementation suggests that energy-efficient big data gathering in such networks is, indeed, necessary. Whereas conventional mobile sink schemes can reduce the energy consumption of the sensor nodes, they lead to a number of challenges, such as determining the sink node’s trajectory and forming clusters prior to data collection. To address these challenges, we proposed a mobile-sink-based data collection method by introducing a new clustering method that uses a modified Expectation Maximization technique.

IMPLEMENTATION:-

Implementation literally means to put into effect or to carry out. The system implementation phase of the software deals with the translation of the design specifications into source code. The ultimate goal of the implementation is to write the source code and the internal documentation so that they can be verified easily. The code and documentation should be written in a manner that eases debugging, testing, and modification. System flowcharts, sample output, sample runs on packages, etc. are part of the implementation.
     Various types of bugs were discovered while debugging the modules. These ranged from logical errors to failure on account of various processing cases.

System Implementation:-

A post-implementation review is an evaluation of the extent to which the system accomplishes its stated objectives and of whether actual project costs exceed initial estimates.
After the system is implemented and conversion is complete, a review should be conducted to determine whether the system is meeting expectations and where improvements are needed. A post-implementation review measures the system’s performance against predetermined requirements. It determines how well the system continues to meet performance specifications. It also provides information to determine whether major re-design or modification is required.
The following things are taken into consideration when the project is developed:-
•  Adaptation
•  Prevention/Integrity
•  Enhancement
•  Correction
Adaptation/Enhancement:-
In this project, a high-performance data synchronization server for mobile devices is proposed. For the mobile application system, the data sets (e.g., contacts, music, video, images) are usually stored in both the mobile device and the system database. After several operations on the mobile system, the data sets on the mobile device and in the system database may no longer be identical.
Prevention/Integrity:-
Security has been a major aspect of the prevailing system and is considered the primary key to the success of any project. The software developed here, Mobile-Sync, has been fully secured by giving each tester their own access. We know that:
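          Integrity = [1 - (threat * (1 - security))]
          where threat is the probability that an attack occurs and security is the probability that the attack is repelled.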
     Every measure is employed to secure the system from any type of threat. Every effort has been made to maintain integrity and accuracy.
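As a worked example of the integrity metric in its commonly cited form, integrity = 1 - threat * (1 - security), the sketch below plugs in illustrative probabilities. The values 0.25 and 0.95 are assumptions chosen for the example, not measurements from this project.

```python
def integrity(threat, security):
    """Integrity = 1 - threat * (1 - security).
    threat:   probability that an attack occurs (0..1)
    security: probability that the attack is repelled (0..1)"""
    if not (0.0 <= threat <= 1.0 and 0.0 <= security <= 1.0):
        raise ValueError("threat and security must be probabilities in [0, 1]")
    return 1.0 - threat * (1.0 - security)

# Illustrative values only.
value = integrity(threat=0.25, security=0.95)
print(round(value, 4))  # 0.9875
```

With a 25 percent chance of attack and a 95 percent chance of repelling it, the integrity comes out close to 0.99.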
Correction:-
The project is corrective to its end, and all validation has been incorporated into the software developed so that no further corrective action can be thought of.
NOTE:-
The software has been developed keeping in mind the requirements of the Share Investors to share application. One of the most important factors in developing any application is experience. Due to lack of experience, we might have overlooked some things that should have been taken into consideration.

Maintenance:-

         Maintenance activities involve making enhancements to software products, adapting products to new environments, and correcting problems. Adaptation of software to a new environment may involve moving the software to a different machine. Problem correction involves modification and revalidation of software to correct errors.

        Maintenance activities consume a large portion of the total life-cycle budget. Software maintenance accounts for 70 percent of total software life-cycle costs. Of the maintenance budget, 60 percent goes to enhancement and 20 percent each to adaptation and correction. The primary product attributes that contribute to software maintainability are clarity, modularity, and good internal documentation of the source code, as well as appropriate supporting documents.
Documentation:-
Documentation is a method of communication. A satisfactory documentation of the system should be objective, factual and complete. Thus format, length, volume or complexity does not determine its adequacy. In documentation, there are no uniform standards that are applicable to all system projects.
     Each module was documented by embedding comments in the executable portion of the code. To enhance the readability of the comments, indentation, parentheses, blank lines, spaces, and proper alignment of loops were used around the blocks of comments. Care was also taken to use descriptive names for tables, fields, modules, forms, etc. Proper indentation, parentheses, blank lines, and spaces were also used during coding to enhance the readability of the code.

REFERENCES:-
[1]      IBM, “Four vendor views on big Data and big data analytics: IBM,” http://www.01.ibm.com/software/in/data/bigdata/, Jan. 2012.
[2]      A. Divyakant, B. Philip, et al., “Challenges and opportunities with Big Data,” 2012, a community white paper developed by leading researchers across the United States. [Online].
[3]      S. Sagiroglu and D. Sinanc, “Big data: A review,” in International Conference on Collaboration Technologies and Systems (CTS), 2013.
[4]      D. Baum and CIO Information
[5]      L. Ramaswamy, V. Lawson, and S. Gogineni, “Towards a quality-centric big Data architecture for federated sensor services,” in IEEE International Congress on Big Data (BigData Congress), 2013.
[6]      C.-C. Lin, M.-J. Chiu, C.-C. Hsiao, R.G. Lee, and Y.-S. Tsai, “Wireless health care service system for elderly with dementia,” IEEE Transactions on Information Technology in Biomedicine, vol. 10, no. 4, pp. 696-704, 2006.
[7]      P. Ross, “Managing care through the air [remote health monitoring],” IEEE Spectrum, vol. 41, no. 12, pp. 26-31, 2004.