Centralized SQL Server Agent Job Monitoring with SSIS and Splunk

29 May 2026
KingswaySoft Team

Centralizing SQL Server Agent job monitoring can become highly challenging when jobs are distributed across multiple database servers. While SQL Server provides built-in history tables within the msdb database, reviewing job executions typically requires connecting to each database server individually. This fragmented approach makes it difficult for operations teams to gain a unified view of job health or analyze execution trends across various environments. By integrating SQL Server Integration Services (SSIS) with Splunk, you can automatically collect job execution metadata from across your SQL Server landscape and stream it into a centralized platform for real-time monitoring and analysis.

In this blog post, we will demonstrate how to retrieve SQL Server Agent job execution metadata from an SQL Server instance and effectively transmit that information to Splunk using its HTTP Event Collector (HEC) API. The job metadata is extracted using the Premium SQL Server Source component (part of our SSIS Integration Toolkit offering), and the load is managed by our REST Destination component with Splunk selected as the destination service. We have included a sample package (link here) that you can download to follow along with this guide.

Control Flow Design Overview

Within your SSIS package, the architecture consists of two overarching processes. The first performs a query on the Microsoft SQL Server system tables and loads the data into Splunk. Afterward, an update action is executed on a control table to capture the latest checkpoint, ensuring only incremental values are fetched during subsequent package executions.

Control Flow Overview

When looking closely at the data flow task, the design may look complex at first glance, but it can be broken down into three straightforward steps. First, we use the Premium SQL Server Source component to execute a custom query and return our telemetry data. We then multicast the data into two separate data paths; this allows us to perform an aggregation in parallel and store the maximum synced value into a user variable. While the Dummy Destination is not strictly required, it is used to complete all paths of your Data Flow to ensure consistent runtime completion. The left side of the data flow handles formatting and writing the event payloads to Splunk.

Dataflow Overview

Taking a closer look at the Premium SQL Server Source component, configuring the Data Source to use a Custom Command allows us to run specialized queries directly against system tables.

Premium SQL Server Source Custom Command include duration cast

Our query targets a few main system tables while performing joins to enrich the metadata:

  • sysjobhistory: This table stores the execution history of SQL Server Agent jobs. We retrieve data from several critical columns:
    • instance_id: A unique identifier for every job execution. This system value auto-increments by one, making it ideal for managing incremental data loads. System jobs are also included here.
    • job_id: The unique GUID of the job. While helpful behind the scenes, it is not easily human-readable, so it is used in a join to pull the friendly job name.
    • run_status: An integer representing the outcome status of the job execution (e.g., Success, Failure, Cancelled).
    • run_duration: How long the job took to execute, natively stored in an HHMMSS integer format.
    • run_date / run_time: Natively, these are separated into two awkward integer fields rather than a unified timestamp.
    • message: Provides descriptive output or error messages detailing the outcome of the scheduled job step.
  • sysjobs: This table stores job-level metadata. The primary field we require from this table is the readable name field.
  • sysjobsteps: This table stores the individual steps within each job. Because we are looking specifically for SSIS-based executions in this scenario, we can inspect the subsystem field to ensure only jobs containing 'SSIS' steps are captured.
  • job_history_checkpoint: This is a custom control table created in the environment to track the latest job instance_id synced from each database server.
    • source_server: The server name acting as the primary key to easily track monitoring checkpoints across a multi-server environment.
    • last_instance_id: Represents the highest processed instance_id, which is updated at the end of the package run using our aggregated maximum value.

Additional Query Logic and Functions:

  • msdb.dbo.agent_datetime(run_date, run_time): This built-in function converts the awkward integer formats of run_date (YYYYMMDD) and run_time (HHMMSS) into a unified run_timestamp field. Because ISO 8601 is the preferred timestamp format for Splunk ingestion, this function handles that preparation seamlessly.
  • Lines 7 to 11: This math formula converts the standard HHMMSS duration integer into total seconds. Sending the runtime duration as an absolute number of seconds to Splunk significantly improves calculation clarity when creating dashboards or alerts.
  • Line 17 (jh.step_id = 0): This filter ensures we only pull job-level summary results, as any step_id greater than 0 indicates individual inner step metrics.
  • Lines 18 to 22: This utilizes an EXISTS clause against sysjobsteps to ensure the parent job contains at least one step executing an SSIS subsystem. This filters out non-essential system or maintenance tasks (such as pure TSQL, CmdExec, or PowerShell jobs) that might clutter your monitoring dashboards.
  • Lines 24 to 28: This subquery references our control table to pull the last_instance_id matching our dynamic Project Parameter (@[$Project::source_server]). We filter the source records to ensure we only load data where the instance_id is greater than our last recorded checkpoint. The ISNULL function defaults this to 0 if a server is being monitored for the first time.

The following text block contains the exact script shown in the custom command screenshot above:

SELECT
    j.name AS job_name,
    jh.instance_id,
    jh.job_id,
    jh.run_status,
    msdb.dbo.agent_datetime(jh.run_date, jh.run_time) AS run_timestamp,
    (
        (jh.run_duration / 10000) * 3600 +
        ((jh.run_duration % 10000) / 100) * 60 +
        (jh.run_duration % 100)
    ) AS duration_sec,
    jh.message
FROM msdb.dbo.sysjobhistory jh
INNER JOIN msdb.dbo.sysjobs j
    ON jh.job_id = j.job_id
WHERE
    jh.step_id = 0
    AND EXISTS (
        SELECT 1
        FROM msdb.dbo.sysjobsteps js
        WHERE js.job_id = j.job_id
          AND js.subsystem = 'SSIS'
    )
    AND jh.instance_id > ISNULL((
        SELECT last_instance_id
        FROM TestDB.dbo.job_history_checkpoint
        WHERE source_server = '@[$Project::source_server]'
    ), 0)
ORDER BY jh.instance_id;

In one of the multicast output paths, we isolate the largest instance_id value from the current batch of records. This is easily achieved by routing rows through our Premium Aggregate component and applying the Maximum operation on the column.

Premium Aggregate Max Value

Downstream, we map this value using our Premium Derived Column component, configuring it to write directly to an SSIS user variable. This persists the maximum ID value so that it can be passed over to the subsequent control flow task once the data flow completes.

Derived Column WriteValueToVariable

Configuring the Splunk Destination

The primary path of the data flow streams the prepared telemetry records directly into Splunk. Our REST Destination component is configured to interface with Splunk's HTTP Event Collector (HEC) endpoint, which is the optimized ingestion path for streaming application event data. SSIS effectively acts as the data producer - packaging each row of SQL Server Agent metadata into a structured JSON event payload before securely transmitting it over HTTPS. Inside the destination component, we target Indexes and select the appropriate index action targeting our specific repository (in this example, we created an index named ssis_telemetry).

Splunk Destination General Page

On the Columns page, clicking the Manage Index Fields button allows you to quickly populate the destination schema mapping based on upstream columns. You can manually add, edit, or remove fields here. Because Splunk indexes accept dynamic schemas, you simply need to map your data into clean, lowercase field names suitable for downstream searching.

Manage Index Fields

As illustrated in the syntax snippet below, each event payload sent to the HEC contains atomic details regarding a single SQL Server Agent job execution. These structured fields make it simple to filter, aggregate, and report on job health once the dataset populates Splunk.

{
  "instance_id": 12096,
  "job_id": "7b94076b-15a0-4b6e-a877-5a1201fcd5aa",
  "job_name": "HSSync",
  "message": "The job succeeded....",
  "run_duration": 45,
  "run_status": 1,
  "run_timestamp": "2026-03-20T09:10:00-04:00"
}

Finally, once the Data Flow completes successfully, the concluding step in the Control Flow uses an Execute SQL task to update our checkpoint table with the max_instance variable value. The source_server parameter is supplied dynamically via an SSIS Project Parameter, ensuring the package remains completely reusable across multiple server environments. This architecture scales easily: as you add more SQL Servers to your ecosystem, the job_history_checkpoint table maintains an isolated track of each distinct environment. You can schedule and run this single package loop as many times as needed by simply altering the environment variable input.

Premium SQL Server Command Task

Once this orchestration is scheduled and your telemetry begins piping into Splunk, Operations Teams can build rich, real-time dashboards to visualize execution metrics and automate alerting. For instance, a dedicated monitoring dashboard can graph job run durations over time, surface unexpected failures in an interactive grid, or isolate long-running anomalies before they impact business lines. Consolidating this data into Splunk eliminates the tedious overhead of manually logging into separate instances to investigate job history. Instead, infrastructure health is exposed transparently inside a central glass pane.

Splunk Dashboard View

Conclusion

Monitoring distributed SQL Server Agent jobs across a massive database footprint can quickly lead to visibility gaps without centralized telemetry tracking. By combining the Premium SQL Server Source component to pull incremental job metadata with our flexible REST-based destination components to stream events into Splunk’s HEC API, you can easily establish robust operational visibility. This solution enables engineering teams to identify failures faster, spot efficiency trends over time, and protect production environments. We hope this guide helps streamline your enterprise SSIS integration journeys!

Archive

May 2026 3 April 2026 3 March 2026 2 February 2026 2 January 2026 2 December 2025 2 November 2025 2 October 2025 2 September 2025 2 August 2025 2 July 2025 2 June 2025 1 May 2025 2 April 2025 3 March 2025 1 February 2025 1 January 2025 2 December 2024 1 November 2024 3 October 2024 1 September 2024 1 August 2024 2 July 2024 1 June 2024 1 May 2024 1 April 2024 2 March 2024 2 February 2024 2 January 2024 2 December 2023 1 November 2023 1 October 2023 2 August 2023 1 July 2023 2 June 2023 1 May 2023 2 April 2023 1 March 2023 1 February 2023 1 January 2023 2 December 2022 1 November 2022 2 October 2022 2 September 2022 2 August 2022 2 July 2022 3 June 2022 2 May 2022 2 April 2022 3 March 2022 2 February 2022 1 January 2022 2 December 2021 1 October 2021 1 September 2021 2 August 2021 2 July 2021 2 June 2021 1 May 2021 1 April 2021 2 March 2021 2 February 2021 2 January 2021 2 December 2020 2 November 2020 4 October 2020 1 September 2020 3 August 2020 2 July 2020 1 June 2020 2 May 2020 1 April 2020 1 March 2020 1 February 2020 1 January 2020 1 December 2019 1 November 2019 1 October 2019 1 May 2019 1 February 2019 1 December 2018 2 November 2018 1 October 2018 4 September 2018 1 August 2018 1 July 2018 1 June 2018 3 April 2018 3 March 2018 3 February 2018 3 January 2018 2 December 2017 1 April 2017 1 March 2017 7 December 2016 1 November 2016 2 October 2016 1 September 2016 4 August 2016 1 June 2016 1 May 2016 3 April 2016 1 August 2015 1 April 2015 10 August 2014 1 July 2014 1 June 2014 2 May 2014 2 February 2014 1 January 2014 2 October 2013 1 September 2013 2 August 2013 2 June 2013 5 May 2013 2 March 2013 1 February 2013 1 January 2013 1 December 2012 2 November 2012 2 September 2012 2 July 2012 1 May 2012 3 April 2012 2 March 2012 2 January 2012 1

Tags