Perform big data analysis

Big data analytics perform batch analysis and processing on stored data, such as data in a feature layer or in cloud big data stores like Amazon S3 and Azure Blob Storage. Big data analytics are typically used for summarizing observations, performing pattern analysis, and detecting incidents. The available analysis tools fall into five groups:

  • Analyze patterns
  • Enrich data
  • Find locations
  • Manage data
  • Summarize data

Examples of big data analysis

  • As an environmental scientist, you can identify times and locations of high ozone levels across the country in a dataset of millions of static sensor records.
  • As a retail analyst, you can process millions of anonymous cell phone locations within a designated time range to determine the number of potential consumers within a certain distance of store locations.
  • As a GIS analyst, you can run a recurring big data analytic that checks a data source for new features every five minutes and sends a notification if certain attribute or spatial conditions are met.

Components of a big data analytic

There are three components of a big data analytic:

  • Sources:
    • A data source is used to load static or near real-time data into a big data analytic. There are many data source types available. For more information about sources and available source types, see What is a data source?
    • There can be multiple data sources in an analytic.
  • Tools:
    • Tools process or analyze the data loaded from sources.
    • There can be multiple tools in a big data analytic.
    • Tools can be connected to each other so that the output of one tool becomes the input of the next tool.
  • Outputs:
    • An output defines what should be done with the results of the big data analytic processing.
    • There are many output options available, including storing features to a new or existing feature layer, writing features to a cloud layer in Amazon S3 or Azure Blob Storage, and more. For more information, see What is an output? and Fundamentals of analytic outputs.
    • The result of a tool or source can be sent to multiple outputs.
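
The relationship between the three components can be sketched in a few lines of Python. This is only an illustrative model, not the Analytics for IoT API: the `Analytic` class, the `sensor_source`, `high_ozone`, and `count_records` functions, and the sample records are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable

Record = dict  # hypothetical stand-in for a feature or record


@dataclass
class Analytic:
    """Hypothetical model: sources feed a chain of tools, and results fan out to outputs."""
    sources: list[Callable[[], Iterable[Record]]]
    tools: list[Callable[[list[Record]], list[Record]]] = field(default_factory=list)
    outputs: list[Callable[[list[Record]], None]] = field(default_factory=list)

    def run(self) -> list[Record]:
        # Load records from every configured source.
        records = [r for source in self.sources for r in source()]
        # Chain the tools: the output of one tool is the input of the next.
        for tool in self.tools:
            records = tool(records)
        # Send the same result to every configured output.
        for output in self.outputs:
            output(records)
        return records


def sensor_source():
    # Made-up sample data for illustration only.
    return [{"id": 1, "ozone": 0.04}, {"id": 2, "ozone": 0.09}, {"id": 3, "ozone": 0.11}]


def high_ozone(records):
    return [r for r in records if r["ozone"] > 0.07]


def count_records(records):
    return [{"count": len(records)}]


stored, emailed = [], []  # two outputs receiving the same result
analytic = Analytic(
    sources=[sensor_source],
    tools=[high_ozone, count_records],
    outputs=[stored.extend, emailed.extend],
)
result = analytic.run()
print(result)
```

In this sketch, the two tools are chained and the single result is delivered to both outputs, mirroring the rules listed above.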

Working with outputs

When a real-time or big data analytic is run, it will generate one or more outputs. Depending upon the type of output(s) configured, there are several ways you can access and interact with those outputs in the Analytics for IoT application.

ArcGIS feature layer and stream layer outputs

When a real-time or big data analytic generates a feature layer or stream layer output, there are many ways to interact with the output layers in Analytics for IoT. Note that these methods are not available until the analytic has been run.

Access feature layer and stream layer outputs in the analytic

In the editing view of an analytic that has been run and has successfully generated the outputs, use the action button (in Workflow view) or right-click a node (in Model view) to see additional options. From there, you can click links to view item details, open a layer in a map viewer or scene viewer, or delete the layer (feature layers only).

Additionally, you can click the action button in the top right of the analytic editing interface to view the analytic item details or add all output feature layers to a map at the same time.

Access feature layer and stream layer outputs from the Layers page

All feature layers, map image layers, and stream layers created by real-time or big data analytics appear on the Layers page of the Analytics for IoT application. From the Layers page, you can view the layer in a map viewer, view the item details, edit the aggregation and symbolization settings of a map image layer, or open the REST endpoint of the service.

Amazon S3 and Azure Blob Storage outputs

Big data analytics can write output features to Amazon S3 or Azure Blob Storage cloud storage. Once the big data analytic finishes, the data is available at the configured cloud path. If you do not see the expected output, check the analytic logs.

All other outputs

Other output types for big data analytics include Email and Kafka. With these outputs, Analytics for IoT forms a connection with the defined output and sends the output records accordingly.

Running a big data analytic (schedule)

Big data analytics can be configured to run in one of two ways: Runs once or Scheduled.

When you adjust the run schedule, remember to click Apply and then save your analytic.

Runs once

Big data analytics configured to run once run only when a user starts the analytic. The analytic performs the processing and analysis as defined and then reverts to a stopped state once complete. This differs from feeds, real-time analytics, and scheduled big data analytics, which all continue to run once started. Runs once is the default option for big data analytics.

Scheduled

A big data analytic can also be scheduled to run at a user-defined day and time. It can run periodically (for example, every 5 minutes) or at a recurring time (for example, daily at 4 a.m.).
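
The difference between the two scheduling styles can be illustrated with a small Python sketch. It is not how Analytics for IoT computes its schedule; the function names `next_periodic_run` and `next_daily_run` and the sample timestamp are assumptions made for this example.

```python
from datetime import datetime, time, timedelta


def next_periodic_run(last_run: datetime, every: timedelta) -> datetime:
    """Periodic schedule: the next run is a fixed interval after the last one."""
    return last_run + every


def next_daily_run(now: datetime, at: time) -> datetime:
    """Recurring schedule: the next run is the next occurrence of a clock time."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        # Today's slot has already passed, so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


now = datetime(2024, 1, 1, 9, 30)  # hypothetical current time
periodic = next_periodic_run(now, timedelta(minutes=5))
daily = next_daily_run(now, time(4, 0))
print(periodic)  # 2024-01-01 09:35:00
print(daily)     # 2024-01-02 04:00:00
```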

When a big data analytic is configured to run on a schedule, it remains started after you start it until you stop it. Unlike a real-time analytic, a started scheduled big data analytic consumes resources only while it is performing the analysis. For example, if a big data analytic is scheduled to run every hour and the analysis takes four minutes to complete, the analytic consumes resources for only those four minutes each hour.
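
The arithmetic behind the hourly example can be worked through in Python. This is a back-of-the-envelope sketch that assumes every run takes the same time and runs never overlap; the function name `active_minutes_per_day` is hypothetical.

```python
def active_minutes_per_day(interval_minutes: float, run_minutes: float) -> float:
    """Total minutes per day a scheduled analytic spends processing.

    Assumes a constant run time and no overlapping runs.
    """
    runs_per_day = 24 * 60 / interval_minutes
    return runs_per_day * run_minutes


active = active_minutes_per_day(interval_minutes=60, run_minutes=4)
share = active / (24 * 60)
print(active)           # 96.0
print(f"{share:.1%}")   # 6.7%
```

Under these assumptions, the hourly analytic is active for 96 minutes a day, or roughly 6.7 percent of the time.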

For more information on how to configure and schedule big data analytics, see Schedule recurring big data analysis.

Perform near-real-time analysis

Scheduled big data analytics can be used to perform near-real-time analysis, in which the big data analytic processes only the features added to a feature layer since its last run. For more information on use cases and options for configuring near-real-time analysis, see Perform near-real-time analysis.
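
The idea of processing only the features added since the last run can be sketched as a simple timestamp filter. This is an illustration of the concept, not the product's implementation; the `created` field, the `features_since` function, and the sample records are hypothetical.

```python
from datetime import datetime


def features_since(features, last_run):
    """Return only the features created after the previous run."""
    return [f for f in features if f["created"] > last_run]


# Made-up sample features; "created" is an assumed timestamp field.
features = [
    {"id": 1, "created": datetime(2024, 1, 1, 9, 0)},
    {"id": 2, "created": datetime(2024, 1, 1, 9, 4)},
    {"id": 3, "created": datetime(2024, 1, 1, 9, 7)},
]
last_run = datetime(2024, 1, 1, 9, 5)

new_features = features_since(features, last_run)
new_ids = [f["id"] for f in new_features]
print(new_ids)  # [3]

# Advance the marker so the next run skips what was just processed.
if new_features:
    last_run = max(f["created"] for f in new_features)
```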

Generate up-to-date informational products

Alternatively, scheduled big data analytics can be utilized to generate up-to-date informational products at a user-defined interval. For more information and examples of use cases and options for such workflows, see Generate up-to-date informational products.

Run settings

You can adjust the Run settings of a big data analytic. These settings control the resources that your Analytics for IoT deployment allocates to the analytic for processing. Remember to save your analytic after changing the run settings.

Generally speaking, the more resources provided to an analytic, the faster it completes processing and generates results. When working with larger datasets or complex analysis, it is good practice, and at times essential, to increase the resource allocation available to an analytic.

Conversely, if you have a simple analytic with few features that runs successfully with the Medium (default) plan, consider decreasing the run settings resource allocation to a Small plan. This will allow you to run more feeds, real-time analytics, and big data analytics in your Analytics for IoT deployment.

Considerations and limitations

Big data analytics are optimized for working with high volumes of data and summarizing patterns and trends, which typically results in fewer output features or records than input features. Big data analytics are not optimized for loading or writing massive volumes of features in a single run. Writing tens of millions of features or more with a big data analytic may result in longer run times than anticipated. The recommended practice is to use big data analytics for summarization and analysis rather than for copying data.
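
To illustrate why summarization produces far fewer output records than inputs, the following Python sketch counts points per square grid cell. It is a generic aggregation example, not one of the product's tools; `summarize_by_cell`, the cell size, and the sample coordinates are assumptions.

```python
from collections import Counter


def summarize_by_cell(points, cell_size):
    """Count points per square grid cell, returning one record per occupied cell."""
    counts = Counter()
    for x, y in points:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return dict(counts)


# Made-up sample coordinates for illustration only.
points = [(0.2, 0.3), (0.8, 0.1), (1.5, 0.4), (1.7, 0.9), (1.1, 1.2)]
summary = summarize_by_cell(points, cell_size=1.0)
print(summary)  # {(0, 0): 2, (1, 0): 2, (1, 1): 1}
```

Five input points reduce to three summary records here; with millions of inputs, the reduction is usually far larger.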