You don't know what you don't know! Even if you think you know, is your knowledge current? It's time to stop guessing and get your eyes on the data!
The data profiling process consists of the following steps: Discover, Collect, Analyze, Document, Act, Track.
DISCOVER
Discovery consists of meetings and working sessions with knowledgeable subject matter experts from both the business and IT. During discovery, these resources should identify the following attributes:
- Data Element
- The Data Object that the Data Element belongs to
- Source System (Where is the data)
- Volume (Record Count)
- Technical Details (Tables / Field)
- Business Contact (Subject Matter Expert or Owner)
- IT Contact (Subject Matter Expert or Owner)
- Business Scenario Variants
- Business Rules
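One way to keep the discovery attributes consistent from one working session to the next is to capture them in a small record type. The sketch below is only an illustration: the class name, field names, and sample values are assumptions, not part of any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class DiscoveryRecord:
    """One row of a discovery inventory (hypothetical field names)."""
    data_element: str
    data_object: str
    source_system: str
    record_count: int
    table: str
    field_name: str
    business_contact: str
    it_contact: str
    scenario_variants: list[str] = field(default_factory=list)
    business_rules: list[str] = field(default_factory=list)

    def missing_attributes(self) -> list[str]:
        """Return the names of attributes that are still blank."""
        return [name for name, value in vars(self).items()
                if value in ("", None, [])]


# Illustrative values only; no business contact has been identified yet.
record = DiscoveryRecord(
    data_element="Customer Postal Code",
    data_object="Customer",
    source_system="ERP",
    record_count=120000,
    table="CUSTOMER_ADDRESS",
    field_name="POSTAL_CODE",
    business_contact="",
    it_contact="J. Doe",
    business_rules=["Required for domestic customers"],
)
print(record.missing_attributes())
```

A helper like `missing_attributes` makes it easy to see which elements still need a follow-up session before collection starts.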
COLLECT
Extract the data from the source system into Info Steward, SQL Server or an Excel Spreadsheet. The data must be in one of these tools to facilitate the data analysis.
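As a rough sketch of the collect step, the example below loads a CSV extract into a staging table. It assumes the source system can export CSV, and it uses SQLite from the Python standard library purely as a stand-in for SQL Server. The function name, table name, and sample rows are hypothetical.

```python
import csv
import io
import sqlite3


def load_extract(conn: sqlite3.Connection, table: str, csv_text: str) -> int:
    """Load a CSV extract into a new staging table; every column is TEXT."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    rows = list(reader)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return len(rows)


# Illustrative extract; a real one would come from the source system.
sample_extract = "customer_id,postal_code\n1,30301\n2,\n3,30301\n"

conn = sqlite3.connect(":memory:")
loaded = load_extract(conn, "customer_extract", sample_extract)
blank_codes = conn.execute(
    "SELECT COUNT(*) FROM customer_extract WHERE postal_code = ''"
).fetchone()[0]
print(loaded, blank_codes)
```

Loading every column as text keeps the raw values intact, so data type problems surface during analysis instead of being hidden by the load.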
ANALYZE
In this step we will look at the form, fit and function of the data as well as validate any assumptions captured during discovery.
Data Driven Analysis
- Data type
- Consistency
- Distribution of values
- Min / Max
- Frequency
- Uniqueness
- Occurrence of Null values
- String Patterns
- Duplicates
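Most of the data driven checks above can be computed with a few lines of code once the data has been collected. The following is a minimal sketch in plain Python rather than the output of any profiling tool. The function names and sample values are assumptions, blank strings are counted as nulls, and the min / max calculation assumes the column's values are mutually comparable.

```python
from collections import Counter
import re


def string_pattern(value: str) -> str:
    """Reduce a string to its shape: digits become 9, letters become A."""
    pattern = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", pattern)


def profile_column(values: list) -> dict:
    """Compute basic data driven profiling metrics for one column."""
    # Assumption: None and empty strings both count as null.
    non_null = [v for v in values if v is not None and v != ""]
    frequency = Counter(non_null)
    return {
        "record_count": len(values),
        "null_count": len(values) - len(non_null),
        "data_types": sorted({type(v).__name__ for v in non_null}),
        "distinct_count": len(frequency),
        "is_unique": len(frequency) == len(non_null),
        "duplicates": sorted(v for v, n in frequency.items() if n > 1),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "top_values": frequency.most_common(3),
        "patterns": Counter(
            string_pattern(str(v)) for v in non_null
        ).most_common(),
    }


# Illustrative postal code column with a short value, nulls, and a duplicate.
postal_codes = ["30301", "30301", "3030", None, "K1A 0B1", ""]
profile = profile_column(postal_codes)
for metric, result in profile.items():
    print(f"{metric}: {result}")
```

The pattern counts are often the quickest way to spot inconsistency: here the shape `9999` stands out against the dominant `99999`.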
Business Driven Analysis
- Timely
- Accurate
- ‘fit’ for purpose
- Accessible
- Useable
DOCUMENT
Artifacts regarding each data element must be captured and memorialized for future consumption and reference.
At the very least capture these attributes in a spreadsheet. If possible, consider a world-class tool like DATUM's Information Value Management Software.
ACT
At times, data analysis will lead to a cleansing effort. Profiling outputs provide the opportunity to ‘clean’ problematic data in advance of the project, which benefits the current business models while also preparing the data for the project activities.
TRACK
A cleansing tracker will need to be created and maintained. The tracker should list every cleansing request along with its due date, the person responsible, and the business impact. This process will need to be managed, with escalations as appropriate.
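A tracker like this can start as a spreadsheet, but the same structure is easy to sketch in code. The example below assumes a simple escalation rule, namely that any open request past its due date is flagged. The class, function, and sample requests are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CleansingRequest:
    """One entry in the cleansing tracker (hypothetical field names)."""
    description: str
    due_date: date
    owner: str
    business_impact: str
    closed: bool = False


def needs_escalation(requests: list[CleansingRequest],
                     today: date) -> list[CleansingRequest]:
    """Return open requests whose due date has already passed."""
    return [r for r in requests if not r.closed and r.due_date < today]


# Illustrative tracker contents.
tracker = [
    CleansingRequest("Fix 4-digit postal codes", date(2024, 3, 1),
                     "A. Smith", "Returned mail"),
    CleansingRequest("Merge duplicate customers", date(2024, 6, 1),
                     "B. Jones", "Incorrect credit limits"),
    CleansingRequest("Populate missing tax IDs", date(2024, 2, 1),
                     "C. Lee", "Invoice rejection", closed=True),
]

overdue = needs_escalation(tracker, today=date(2024, 4, 15))
for request in overdue:
    print(f"ESCALATE: {request.description} (owner: {request.owner})")
```

Passing `today` as a parameter rather than reading the system clock keeps the check repeatable when the tracker is reviewed after the fact.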