Onur's blog

Big Data Use cases

Zabbix installation

Zabbix Server MySQL Installation on CentOS 7 Without Rebooting the Server

First of all, Zabbix Server needs MySQL Community Server. To install MySQL, you need either a user with sudo privileges or the root user (root is recommended). MySQL Community Server version 8 was used. To download it, the command below was used. To install it, If you see the output below when […]

Kafka REST to Google BigQuery Data Pipeline Use Case

In this use case, we will create a data pipeline that starts from the REST producer, consumes data from the topic using Kafka Connect, and inserts the data into a Google BigQuery table. In other words, this approach streams data to BigQuery. It is important to note that Google charges for streaming to BigQuery. This […]
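The first hop of such a pipeline, a REST producer writing to a topic, could look roughly like the sketch below. It assumes the REST endpoint is a Confluent REST Proxy (v2 API) at a hypothetical `http://localhost:8082`, that the topic is called `bigquery-events`, and that Java 11+ is available for `java.net.http`; none of these come from the post. The request is only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class RestProducerRequest {
    // Assumption: REST Proxy address; adjust to your environment.
    static final String REST_PROXY_URL = "http://localhost:8082";
    // Content type used by the Confluent REST Proxy v2 API for JSON records.
    static final String JSON_V2 = "application/vnd.kafka.json.v2+json";

    // Wraps one JSON value in the {"records":[{"value":...}]} envelope.
    static String recordsBody(String jsonValue) {
        return "{\"records\":[{\"value\":" + jsonValue + "}]}";
    }

    // Builds (but does not send) a POST request that produces one record to the topic.
    static HttpRequest produceRequest(String topic, String jsonValue) {
        return HttpRequest.newBuilder(URI.create(REST_PROXY_URL + "/topics/" + topic))
                .header("Content-Type", JSON_V2)
                .POST(HttpRequest.BodyPublishers.ofString(recordsBody(jsonValue)))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical topic name and payload, for illustration only.
        HttpRequest request = produceRequest("bigquery-events", "{\"id\":1,\"name\":\"sample\"}");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(recordsBody("{\"id\":1,\"name\":\"sample\"}"));
    }
}
```

Sending the request with `HttpClient.newHttpClient().send(...)` would place the record on the topic, where a Kafka Connect sink could pick it up.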

Amazon SQS

Amazon Simple Queue Service (SQS) Use Cases

Introduction: This post discusses Amazon Simple Queue Service (SQS) through use cases, to build understanding and hands-on experience. When learning big data, it is very important to get your hands dirty; otherwise, your efforts will be like writing on water. Amazon Simple Queue Service (Amazon SQS) […]

Spark Streaming from Kafka to HBase Use Case

Let’s assume that our data is stored on a Kafka cluster and should be moved to another storage layer, which will be HBase in this case. A few transformations need to be made before the data is moved. These steps can be depicted as in the architecture below. Data could only be collected using the Spark […]

Kafka Kerberos Configuration on secured Cloudera Cluster

Introduction: Apache Kafka is an open-source distributed streaming platform developed by LinkedIn and donated to the Apache Software Foundation. It is robust, scales horizontally, and has a flexible architecture. Kafka ingests data and stores it for a limited time, and this data can be used by multiple teams in a company. This can create a vulnerable situation in […]

Apache Kafka Streams DSL Stateless Transformations

State is not needed when records are processed one at a time (for example, instant arithmetic calculations). I examine all stateless transformations with samples. The processing logic was written in Java 8. The dependency below is used to build the logic with the API. 1. Branch (or split): The branch transformation is used when a source topic is split into different downstream child topics. KStream –> […]

Apache Kafka Streams DSL Stateful Transformations

Stateful transformations use a state store to process input records and create output from them. Aggregations, joins, and windowing operations need the state stores of the previous stream processors (tasks) to accumulate the final status of the elements. In this post, the Streams DSL stateful transformations are examined with samples. The sample logic is developed using Java […]

Accessing from JDBC to Kerberized Hive

Normally, a JDBC connection URL with a username and password pair would be enough to access a database. Accessing a Kerberized Hive, however, imposes some limitations. AuthMech, which is used as a parameter in the connection URL, defines 5 different connection types: 0 for No Authentication, 1 for Kerberos, 2 for User Name, 3 for User Name And Password, 6 for Hadoop Delegation […]
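As a rough illustration of the Kerberos case (AuthMech=1), the sketch below assembles a connection URL. It assumes the Cloudera Hive JDBC driver, whose Kerberos properties are `KrbRealm`, `KrbHostFQDN`, and `KrbServiceName`; the host, port, database, and realm values are hypothetical placeholders and do not come from the post.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class KerberizedHiveUrl {
    // AuthMech code for Kerberos, as listed in the excerpt above.
    static final int AUTH_MECH_KERBEROS = 1;

    // Builds a jdbc:hive2 URL with Kerberos properties appended as ";key=value" pairs.
    static String buildUrl(String host, int port, String database,
                           String realm, String hostFqdn, String serviceName) {
        // LinkedHashMap keeps the properties in insertion order.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("AuthMech", String.valueOf(AUTH_MECH_KERBEROS));
        props.put("KrbRealm", realm);
        props.put("KrbHostFQDN", hostFqdn);
        props.put("KrbServiceName", serviceName);

        String suffix = props.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(";"));
        return "jdbc:hive2://" + host + ":" + port + "/" + database + ";" + suffix;
    }

    public static void main(String[] args) {
        // Hypothetical cluster values, for illustration only.
        System.out.println(buildUrl("hive.example.com", 10000, "default",
                "EXAMPLE.COM", "hive.example.com", "hive"));
    }
}
```

The resulting string would be passed to `DriverManager.getConnection(url)` after a Kerberos ticket has been obtained; no username or password appears in the URL in this mode.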