ETL (Extract, Transfer and Load) is a well-known data integration process. There is an overwhelming number of tools that you can use (one of which is SSIS) and it can be difficult to choose between them. What exactly is SSIS, and how can it help your company perform ETL better than you ever have before? This article will explain the major features of SSIS, demonstrate the pros and cons of implementing it, and advise as to when you might be better off with a different ETL tool.
Table of Contents
- What is SSIS?
- History of SSIS
- Most Important Features of SSIS
- Pros and Cons of SSIS
- What is ETL?
- Is SSIS a Good ETL Tool for You?
What is SSIS?
SSIS stands for SQL Server Integration Service. Developed by Microsoft, SSIS performs data integration, transformation, and migration tasks as a component of the Microsoft SQL Server database. The architecture has four basic parts: The object model, runtime, data flow, and service.
The object model is the overarching structure of Integration Services. SSIS is the latest version of the service with a rewritten object model, and it includes various APIs so that developers can program and extend packages.
The runtime engine component in SSIS manages workflows of packages and also manages transactions. Such components as containers, tasks, and events are runtime executables.
The SSIS data flow engine reads and then manages the movement and transformation of data from various sources. It includes objects called data flow components. These may be sources, destinations, or transformations. Developers can program the data flow engine to automate the configuration of the components or create custom components.
The Integration Services Service functions with SQL Server Management Studio to monitor packages stored in the SSIS Package Store. You do not need it for the most basic functionality of SSIS, but you do in order to list and monitor packages.
History of SSIS
DTS (Data Transformation Services) was the legacy solution that Microsoft produced prior to SSIS. With SQL Server 2005, the Microsoft team renamed the product and virtually replaced it by updating it to the latest technology. The 2008 SQL Server was next, with new sources and many other updates.
SSIS 2012 brought such improvements as easier package configuration and better storage. SSIS 2014 and 2016 only added small differences such as individual package and entire project deployment capabilities, new sources, and enhanced support.
That leads us directly to the most important features of the current SSIS build.
Most Important Features of SSIS
SSIS has a host of features necessary to help you with effective and comprehensive management of data.
1. Built-In Data Source and Destination Connectors
The latest version of SSIS added compatibility with the most modern data sources through its built-in connectors. It does this by using connection managers. Users can set the properties of these managers to describe the desired physical connection. SSIS connects to:
- Test, XML, and Excel workbooks
- Relational databases containing reference data
- SQL Server for maintenance and transfer tasks, temporary work tables, etc.
- Analysis Services databases and projects to enable access to data mining models, dimensions and more
- WMI (Windows Management Instrumentation), SMO (SQL Server Management Objects), messaging queues, and mail servers.
2. Tasks and Transformations
SSIS has many tasks and transformations built-in as well, making the software even easier to use. They aggregate, merge, modify, and distribute data. SSIS includes several different types of transformations including BI (Business Intelligence) Transformations, Row Transformations, Rowset Transformations, Split and Join Transformations, Auditing Transformations, and Custom Transformations.
BI Transformations include a wide variety of individual transformations for the purpose of cleaning data or mining text, among other things. Row Transformations are for the purposes of creating columns and updating column values. Similarly, Rowset Transformations create and manage rowsets.
Split and Join Transformations distribute rows to various outputs, perform lookup operations, copy transformation inputs, and also join inputs into a single output. Auditing Transformations audit info. Finally, developers can write their own custom transformations.
3. Fuzzy Grouping and Lookup Transformations
These transformations deserve their own section. The Fuzzy Grouping Transformation cleans data by identifying near-duplicate rows and selecting a canonical row that you can then use for the standardization of the data.
This transformation relies on selecting input columns for the identification of duplicates and then selecting either fuzzy or exact matches. Exact matching searches only for identical rows with exactly the same values. Fuzzy matching will also include those with near-equal values, based on a similarity score set by the user.
The Fuzzy Lookup Transformation is also involved in data cleansing through standardization, correction of data, and providing missing values. The software uses it to locate near-exact matches of records in a reference table. It is an alternative to the Lookup Transformation, which is only capable of finding perfect matches.
4. Basic Data Profiling Tools
SSIS provides the Data Profiling task and also the Data Profile Viewer as basic data profiling tools. You can use the first as a component of an Integration Services package in order to profile data stored in the server and check data quality. It does this by computing profiles to help the user learn about the data source. It does not work with any third-party source. It only functions with the SQL Server.
The Data Profile Viewer is a stand-alone tool intended for use after the Data Profiling task in order to review the output. Additionally, it supports drill-down capability to enable users to understand any data quality concerns named in the output.
Pros and Cons of SSIS
Here are the pros of the tool from companies today that use SSIS.
- In-depth documentation
- Graphical interfaces make it fairly easy to visualize the flow of data
- Connects to many different sources
- Easy management of projects and packages
- Highly customizable
- Permits developers to reuse scripts across multiple projects
- Easy to deploy and configure
As you might expect, SSIS also comes with a large number of disadvantages.
- Not always efficient for use with JSON
- Often confusing and hard to learn for new users
- Difficult configuration of connection managers at times
- Super clunky UI
- Limited Excel connections
- It can be hard to find third-party tools that support SSIS
- Requires developer experience and complex coding
This last con can be a real problem for businesses that have limited developers or engineers or seek to reduce their costs and resources spent. If you are currently using SSIS and need a low-code, modern ETL tool, see this list of the top 7 ETL tools for 2022.
What is ETL?
ETL, as mentioned, is the process of integrating data from different sources and loading it into a data warehouse or other data destination. It involves the three steps included in its name:
- Extract. Extracting data from sources of any type starts the process. Since each source may have a different data file format, the software may need to convert some into a common format.
- Transform. The software applies various standards to transform the data. In this step, the user filters, validates and cleans the data.
- Load. Finally, the software loads the data into a database, data warehouse, or other destination.
SSIS was previously one of the most common tools for this purpose, however, recently it has been considered one of the more technical and clunky legacy tools.
Is SSIS a Good ETL Tool for You?
According to many users, SSIS is a great tool for developers and advanced engineers. Users have commented on TrustRadius that it is the "best buddy for skilled SQL developers. However, is SSIS the ideal option for non-developers?
SSIS is ideal for developers and companies with large, complex data volumes. Developers working with SSIS typically use Visual Studio and write many lines of complex code with a big margin for error. However, for non-developers or developers that want to use their time efficiently with low-code tools, there are alternatives that may be better depending on your needs.
Consider, for instance, the Integrate.io ETL solution. Integrate.io offers a low-code, simplified drag-and-drop interface for performing ETL with a minimum of employee hours. Using our elastic and scalable cloud platform, users with zero coding experience can create data pipelines and perform various data-related processes. Additionally, we offer the capability for developers to code where necessary to obtain any necessary functionality.
Integrate.io allows you to schedule jobs, monitor the status and progress of current jobs, and sample data outsets. With full support and excellent documentation, Integrate.io is the leading choice for many companies. Contact us to learn more about how we can help you with all of your data integration needs.