Development of a climatological data management system whose objective was to provide processing, storage, query, and analysis of data from various sources of direct or indirect measurement, including data collected by remote sensors, as well as to provide advanced functionality for customers who need calculations based on environmental measurements or estimates, such as operating flows at hydroelectric plants, load capacity on power transmission lines, and crop productivity, among others.
A complete, self-contained system that uses state-of-the-art technology to deliver large amounts of information in the shortest possible time. To achieve this goal, the system was fully planned and structured, from data ingestion through data and metadata storage and complementary processing (summaries, climatologies, dependent-variable calculations, quality control) to a web interface for visualization and analysis of the information. The system also offers a comprehensive RESTful API capable of exposing all of its information for integration with other systems.
Data ingestion is built on the message-queue concept: RabbitMQ handles message traffic quickly and reliably, keeps communication between applications asynchronous, reduces coupling between them, distributes alerts and notifications, and controls the work queues running in the background.
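As an illustration of this ingestion pattern, the sketch below shows a minimal producer and background worker using the pika Python client for RabbitMQ; the queue name, payload fields, and the persistence step are hypothetical placeholders for the real pipeline.

```python
# Minimal sketch of the message-queue ingestion pattern with RabbitMQ (pika).
# Queue name, payload fields and the persistence step are illustrative only.
import json
import pika

def publish_observation(payload: dict) -> None:
    """Producer: push a raw observation onto a durable work queue."""
    connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="raw_observations", durable=True)
    channel.basic_publish(
        exchange="",
        routing_key="raw_observations",
        body=json.dumps(payload),
        properties=pika.BasicProperties(delivery_mode=2),  # persist the message
    )
    connection.close()

def run_worker() -> None:
    """Consumer: process queued messages asynchronously in the background."""
    connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="raw_observations", durable=True)
    channel.basic_qos(prefetch_count=1)  # one unacknowledged message per worker

    def on_message(ch, method, properties, body):
        observation = json.loads(body)
        # store_observation(observation)  # hypothetical persistence step
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue="raw_observations", on_message_callback=on_message)
    channel.start_consuming()
```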
In the data storage layer, a non-relational Cassandra cluster was used, capable of serving a large number of online users simultaneously; its main advantages are high scalability and availability. To store the metadata, which requires relational structure between the pieces of information, PostgreSQL with the PostGIS extension was used.
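The snippet below is a minimal sketch of how the time-series store can be accessed with the DataStax Python driver for Cassandra; the keyspace, table, and column names are assumptions made only for illustration.

```python
# Sketch of writing and reading a time-series observation in Cassandra.
# Keyspace, table and column names are assumptions for illustration.
from datetime import datetime
from cassandra.cluster import Cluster

cluster = Cluster(["cassandra-node1", "cassandra-node2"])  # contact points
session = cluster.connect("climatology")                   # hypothetical keyspace

insert = session.prepare(
    "INSERT INTO observations (station_id, observed_at, variable, value) "
    "VALUES (?, ?, ?, ?)"
)
session.execute(insert, ("A001", datetime.utcnow(), "air_temperature", 23.4))

rows = session.execute(
    "SELECT observed_at, value FROM observations "
    "WHERE station_id = %s AND variable = %s LIMIT 10",
    ("A001", "air_temperature"),
)
for row in rows:
    print(row.observed_at, row.value)
```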
Data processing is performed on top of an Apache Spark cluster, which extends the MapReduce programming model popularized by Apache Hadoop in order to process large data sets in a parallel and distributed manner. Some of this information is precomputed and stored in the Cassandra database, while other computations run in real time and are made available to the web layer through a RESTful API developed in Scala.
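As a sketch of such a preprocessing job, the PySpark example below computes daily means per station; the keyspace, table names, and the use of the Spark-Cassandra connector are assumptions standing in for the actual jobs.

```python
# Minimal PySpark sketch of a preprocessing job: daily means per station.
# Keyspace/table names and the Spark-Cassandra connector usage are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-summaries").getOrCreate()

# Hypothetical read through the Spark-Cassandra connector.
observations = (
    spark.read.format("org.apache.spark.sql.cassandra")
    .options(keyspace="climatology", table="observations")
    .load()
)

daily_means = (
    observations
    .withColumn("day", F.to_date("observed_at"))
    .groupBy("station_id", "variable", "day")
    .agg(F.avg("value").alias("mean_value"), F.count("value").alias("n_obs"))
)

# Hypothetical write-back of the precomputed summaries.
(
    daily_means.write.format("org.apache.spark.sql.cassandra")
    .options(keyspace="climatology", table="daily_summaries")
    .mode("append")
    .save()
)
```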
The web layer and its metadata API were developed entirely in Python with the Django framework, which enables building pages with greater agility, speed, and elegance, and with less code.
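The fragment below is a minimal sketch of this metadata layer in Django: a station model exposed through a JSON view. The model fields, URL, and view name are illustrative and do not reflect the actual schema.

```python
# Minimal Django sketch of the metadata layer: a station model and a JSON view.
# Model fields, URL and view names are illustrative, not the actual schema.
from django.db import models
from django.http import JsonResponse
from django.urls import path

class Station(models.Model):
    code = models.CharField(max_length=20, unique=True)
    name = models.CharField(max_length=100)
    latitude = models.FloatField()
    longitude = models.FloatField()

def station_list(request):
    """Return station metadata as JSON for the web layer / metadata API."""
    stations = Station.objects.values("code", "name", "latitude", "longitude")
    return JsonResponse(list(stations), safe=False)

urlpatterns = [
    path("api/stations/", station_list, name="station-list"),
]
```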
The system also has an interface with map servers and, through OGC (Open Geospatial Consortium) standards, is able to integrate layers served by tools such as THREDDS, GeoServer, and MapServer into its map products.
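As one way to illustrate this kind of OGC integration, the sketch below consumes a WMS endpoint (such as one published by GeoServer or THREDDS) using OWSLib; note that OWSLib is not part of the listed stack, and the service URL and layer name are placeholders.

```python
# Sketch of consuming an OGC WMS endpoint (e.g. GeoServer or THREDDS) with OWSLib.
# OWSLib is not part of the listed stack; the URL and layer name are placeholders.
from owslib.wms import WebMapService

wms = WebMapService("http://example.org/geoserver/wms", version="1.1.1")
print(list(wms.contents))  # layers advertised by the server

img = wms.getmap(
    layers=["climatology:precipitation"],   # hypothetical layer name
    srs="EPSG:4326",
    bbox=(-75.0, -35.0, -30.0, 6.0),        # illustrative lon/lat bounding box
    size=(800, 600),
    format="image/png",
    transparent=True,
)
with open("precipitation.png", "wb") as f:
    f.write(img.read())
```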
Python, HTML5, CSS3, JavaScript, Ajax, Django, AdminLTE, Vue.js, Bootstrap, Highcharts, OpenLayers, D3.js, jQuery, map and data servers (GeoServer and THREDDS), RESTful API, Spark (PySpark), Python data-science libraries (pandas, NumPy, SciPy, Matplotlib, Anaconda), Swagger used in conjunction with the API, query interfaces for the OSIsoft PI System, ZooKeeper, Jupyter, PEP8, Sphinx documentation, Docker, Jenkins, PostgreSQL, Cassandra, MongoDB.