Skip to main content

Understanding WSO2 Stream Processor — Part 1

Streaming analytics has been one of the trending topics in the software industry for some time. With the production of billions of events through various sources, analyzing these events provides the competitive advantage for any business. The process of streaming analytics can be divided into 3 main sections.
  1. * Collect — Collecting events from various sources
  2. * Analyze — Analyzing the events and deriving meaningful insights
  3. * Act — Take action on the results
WSO2 Stream Processor (WSO2 SP) is an intuitive approach to stream processing. It provides the necessary capabilities to process events and derive meaningful insights with its state of the art “Siddhi” stream processing runtime. The below figure showcases how WSO2 SP acts as a stream processing engine for various events.
Source: https://docs.wso2.com/display/SP410
With the WSO2 SP, events generated from various sources like devices, sensors, applications and services can be received. The received events are processed in real time using the streaming SQL language “Siddhi”. Once the results are derived, those results can be published through APIs, alerts or visualizations so that business users can act on them accordingly.
Users of WSO2 SP need to understand a set of basic concepts around the product. Let’s identify the main components which a user needs to interact with.
WSO2 Stream processor comes with built-in components to configure, run and monitor the product. Here are the main components.
  • * WSO2 SP runtime (worker) — Executes the realtime processing logic which is implemented using Siddhi streaming SQL
  • * Editor — Allows users (developers) to implement their logic using Siddhi streaming SQL and debug, deploy and run their implementations similar to an IDE
  • * Business Rules — Allows business users to change the processing logic by simply modifying few values stored in a simple form
  • * Job Manager — Allows to deploy and manage siddhi applications across multiple worker nodes
  • * Portal — Provides ability to visualize the results generated from processing logic which was implemented
  • * Status Dashboard — Monitor multiple worker nodes in a cluster and showcases the information about those nodes and the siddhi applications which are deployed
In addition to the above components, the diagram includes
  • * Source — Devices, Apps, Services which generates events
  • * Sink — Results of the processing logic are passed into various sinks like APIs, dashboards, notifications
With these components, users can implement plethora of use cases around streaming analytics and/or stream processing whatever you called it. The next thing you need to understand about WSO2 SP is the “Siddhi” streaming SQL language and its high level concepts. Let’s take a look at those concepts as well.
Figure: Siddhi high level concepts in a nutshell
The above figure depicts the concepts which needs to be understood by WSO2 SP users. Except the source and sink which we have looked through in the previous section, all the other concepts are new. Let’s have a look at these concepts one by one.
  • * Event — Actual data coming from sources which are formatted according to the schema
  • * Schema — Define the format of the data which is coming with events
  • * Stream — A running (continuous) set of incoming events are considered as a stream
  • * Window — Is a set of events which are selected based on number of events (length) or a time period (duration)
  • * Partition — Is a set of events which are selected based on a specific condition of data (e.g. events with same “name” field)
  • * Table — Is a static set of events which are selected based on a defined schema and can be stored in a data store
  • * Query — Is the processing logic which uses streams, tables, windows, partitions to derive meaningful data out of the incoming data events
  • * Store — Is a table stored in a persistent database for later consumption through queries for further processing or to take actions (visualizations)
  • * Aggregation — Is a function (pre-defined) applied on events and produce outputs for further processing or as final results
  • * Triggers — Are used to inject events according to a given schema so that processing logic executes periodically through these events
Now we have a basic understanding about WSO2 SP and its main concepts. Let’s try to do a real streaming analysis using the product. Before doing that, we need to understand the main building block of WSO2 SP runtime which is a “Siddhi Application”. It is the place where users configure WSO2 SP runtime to make it happen.
Figure: Siddhi application compoents
Within a Siddhi application, we have 3 main sections.
  • * Source definition — This is the place to define incoming event sources and their schemas. Users can configure different transport protocols, messaging formats, etc.
  • * Sink definition — This section defines the place to emit the results of the processing. Users can choose to store the events in tables, output to log files, etc.
  • * Processing Logic — This section implements the actual business logic for data processing using the Siddhi streaming SQL language
Now you have a basic understanding about WSO2 SP and it’s main concepts. The next thing you can do is to make your hands dirty by trying out few examples with it. The tutorials section of the documentation is a good point to start things off.

Comments

Post a Comment

Popular posts from this blog

WSO2 ESB tuning performance with threads

I have written several blog posts explaining the internal behavior of the ESB and the threads created inside ESB. With this post, I am talking about the effect of threads in the WSO2 ESB and how to tune up threads for optimal performance. You can refer [1] and [2] to understand the threads created within the ESB. [1] http://soatutorials.blogspot.com/2015/05/understanding-threads-created-in-wso2.html [2] http://wso2.com/library/articles/2012/03/importance-performance-wso2-esb-handles-nonobvious/ Within this blog post, I am discussing about the "worker threads" which are used for processing the data within the WSO2 ESB. There are 2 types of worker threads created when you start sending the requests to the server 1) Server Worker/Client Worker Threads 2) Mediator Worker (Synapse-Worker) Threads Server Worker/Client Worker Threads These set of threads will be used to process all the requests/responses coming to the ESB server. ServerWorker Threads will be used to pr

How puppet works in your IT infrstructure

What is Puppet? Puppet is IT automation software that helps system administrators manage infrastructure throughout its lifecycle, from provisioning and configuration to orchestration and reporting. Using Puppet, you can easily automate repetitive tasks, quickly deploy critical applications, and proactively manage change, scaling from 10s of servers to 1000s, on-premise or in the cloud. How the puppet works? It works like this..Puppet agent is a daemon that runs on all the client servers(the servers where you require some configuration, or the servers which are going to be managed using puppet.) All the clients which are to be managed will have puppet agent installed on them, and are called nodes in puppet. Puppet Master: This machine contains all the configuration for different hosts. Puppet master will run as a daemon on this master server. Puppet Agent: This is the daemon that will run on all the servers, which are to be managed using p

How to configure timeouts in WSO2 ESB to get rid of client timeout errors

WSO2 ESB has defined some configuration parameters which controls the timeout of a particular request which is going out of ESB. In a particular  scneario, your client sends a request to ESB, and then ESB sends a request to another endpoint to serve the request. CLIENT->WSO2 ESB->BACKEND The reason for clients getting timeout is that ESB timeout is larger than client's timeout. This can be solved by either increasing the timeout at client side or by decreasing the timeout in ESB side. In any of the case, you can control the timeout in ESB using the below properties. 1) Global timeout defined in synapse.properties (ESB_HOME\repository\conf\) file. This will decide the maximum time that a callback is waiting in the ESB for a response for a particular request. If ESB does not get any response from Back End, it will drop the message and clears out the call back. This is a global level parameter which affects all the endpoints configured in ESB. synapse.global_timeout_inte