TCS SSIS Interview Questions
1) I have an Excel sheet and I want to send the good data to one table and the bad data to another table. How can I do this?
Use the error output with the 'Redirect row' option: good rows flow to the main OLE DB Destination and failing rows are redirected to a second destination.
2) What is a magic table in SQL Server?
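Magic tables are the inserted and deleted pseudo-tables that SQL Server exposes inside triggers: inserted holds the new row images and deleted holds the old ones. A minimal sketch (the Employee and EmployeeAudit tables are hypothetical, for illustration only):

-- Hypothetical tables
CREATE TABLE dbo.Employee (EmpID INT PRIMARY KEY, Salary MONEY);
CREATE TABLE dbo.EmployeeAudit (EmpID INT, OldSalary MONEY, NewSalary MONEY, ChangedAt DATETIME);
GO
CREATE TRIGGER trg_Employee_Update ON dbo.Employee
AFTER UPDATE
AS
BEGIN
    -- "Magic tables": deleted = rows before the update, inserted = rows after
    INSERT INTO dbo.EmployeeAudit (EmpID, OldSalary, NewSalary, ChangedAt)
    SELECT d.EmpID, d.Salary, i.Salary, GETDATE()
    FROM deleted AS d
    JOIN inserted AS i ON i.EmpID = d.EmpID;
END;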
3) What are PIVOT and UNPIVOT in SQL Server?
PIVOT converts rows into columns, and UNPIVOT converts columns into rows.
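A minimal T-SQL sketch (the Sales and SalesByQuarter tables and their columns are hypothetical):

-- PIVOT: one row per Year, quarters spread across columns
SELECT [Year], [1] AS Q1, [2] AS Q2, [3] AS Q3, [4] AS Q4
FROM (SELECT [Year], [Quarter], Amount FROM dbo.Sales) AS src
PIVOT (SUM(Amount) FOR [Quarter] IN ([1], [2], [3], [4])) AS p;

-- UNPIVOT: quarter columns back into rows
SELECT [Year], [Quarter], Amount
FROM dbo.SalesByQuarter
UNPIVOT (Amount FOR [Quarter] IN (Q1, Q2, Q3, Q4)) AS u;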
4) How do you trace errors in SSIS?
Enable SSIS logging and use the logged events (for example OnError) to trace errors.
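If the SQL Server log provider is configured, error events can also be queried straight from the log table (dbo.sysssislog in SQL Server 2008 and later); a sketch:

-- Inspect package errors captured by the SSIS SQL Server log provider
SELECT source, event, starttime, message
FROM dbo.sysssislog
WHERE event = 'OnError'
ORDER BY starttime DESC;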
TCS SSRS Interview Questions
Please email at etldeveloper.k@gmail.com
TCS SSAS Interview Questions
Please email at etldeveloper.k@gmail.com
Accenture SSIS Interview Questions
Interview Questions faced in Accenture
1. Tell me briefly about your current project.
Explain your current project: the domain, the data sources, the packages you built, and your role in them.
2. How do you get data, and from what source types?
CSV and Excel files, received through FTP and moved with the File System Task in SSIS.
3. What is a text qualifier, and where can we use it?
The SSIS Flat File Connection Manager has a Text Qualifier property. For example, if the values are wrapped in double quotes, enter the double-quote character (") as the qualifier; the quotes are then stripped from the input strings, which helps avoid garbage data.
4. How do you eliminate unwanted text from a text file source?
5. What should I do to speed up the loading of millions of records from a source file?
SSIS offers several settings and best practices, described below.
1. Avoid asynchronous transformations (such as the Sort transformation) wherever possible. The Sort transformation requires all incoming rows to arrive before it starts processing. Instead of using it, get sorted rows from the data source with an ORDER BY clause, as in the sketch below.
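For example, instead of sorting inside the package, the OLE DB Source query can return ordered rows (table and column names are illustrative); if downstream components such as Merge Join need sorted input, also mark the source output as sorted (IsSorted and SortKeyPosition in the Advanced Editor):

-- Source query for the OLE DB Source: sorting is pushed down to the database engine
SELECT CustomerID, OrderID, OrderDate, TotalDue
FROM dbo.Orders
ORDER BY CustomerID, OrderID;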
2. Pulling High Volumes of Data
Drop all non-clustered indexes (and the clustered index, if one exists) on the destination table, load the data, and then recreate the clustered and non-clustered indexes; see the sketch below.
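The pattern is typically implemented with two Execute SQL Tasks around the Data Flow Task (index and table names here are illustrative):

-- Execute SQL Task before the load: drop the indexes
DROP INDEX IX_FactSales_DateKey ON dbo.FactSales;
-- (drop the clustered index as well if the table has one and the load allows it)

-- Execute SQL Task after the load: recreate them
CREATE CLUSTERED INDEX CIX_FactSales ON dbo.FactSales (SalesKey);
CREATE NONCLUSTERED INDEX IX_FactSales_DateKey ON dbo.FactSales (DateKey);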
3. Avoid SELECT *
The Data Flow Task uses a buffer-oriented architecture for data transfer and transformation. When data moves from source to destination, it first comes into a buffer, the required transformations are done in the buffer itself, and the result is then written to the destination.
The size of a buffer depends on the estimated row size, which is the sum of the maximum sizes of all columns in the row. So the more columns in a row, the fewer rows fit in a buffer; hence select only those columns that are required at the destination.
Even if we need all the columns from the source, we should name the columns explicitly in the SELECT statement; otherwise the source takes another round trip to gather metadata about the columns when SELECT * is used. A small example follows.
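For instance, in the OLE DB Source query (column names are illustrative):

-- Avoid: SELECT * FROM dbo.Customer
-- Prefer: only the columns the destination actually needs, named explicitly
SELECT CustomerID, FirstName, LastName, Email
FROM dbo.Customer;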
4. Effect of OLEDB Destination Settings
There are a couple of settings on the OLE DB Destination that can impact the performance of data transfer, as listed below.
Data Access Mode – This setting provides the 'fast load' option, which internally uses a BULK INSERT statement for uploading data into the destination table instead of a simple INSERT statement for each single row, as is the case with the other options. So unless you have a reason to change it, keep the default of fast load. If you select the 'fast load' option, there are a couple of other settings you can use, as discussed below.
Keep Identity – By default this setting is unchecked, which means the destination table (if it has an identity column) will create identity values on its own. If you check this setting, the dataflow engine will ensure that the source identity values are preserved and the same values are inserted into the destination table.
Keep Nulls – Again, by default this setting is unchecked, which means the default value will be inserted (if a default constraint is defined on the target column) whenever a NULL value comes from the source for that column. If you check this option, the default constraint on the destination column is ignored and the NULL from the source column is preserved and inserted into the destination.
Table Lock – By default this setting is checked, and the recommendation is to leave it checked unless the same table is being used by some other process at the same time. It specifies that a table lock will be acquired on the destination table instead of multiple row-level locks, which could escalate into locking problems.
Check Constraints – Again, by default this setting is checked, and the recommendation is to uncheck it if you are sure the incoming data will not violate the constraints of the destination table. This setting specifies that the dataflow pipeline engine will validate the incoming data against the constraints of the target table; unchecking it will improve the performance of the data load.
#5 - Effect of Rows Per Batch and Maximum Insert Commit Size Settings:
Rows per batch: The default value for this setting is -1, which means all incoming rows are treated as a single batch. You can change this default behaviour and break the incoming rows into multiple batches; the only allowed value is a positive integer, which specifies the maximum number of rows in a batch.
Maximum insert commit size: The default value for this setting is 2147483647 (the largest value for a 4-byte integer), which means all incoming rows are committed once, on successful completion. You can specify a positive value to indicate that a commit will be issued after that many records. Changing the default does put overhead on the dataflow engine, because it must commit several times, but at the same time it relieves the pressure on the transaction log and tempdb, especially during high-volume data transfers.
These two settings are very important for the health of tempdb and the transaction log. For example, if you leave 'Maximum insert commit size' at its default, the transaction log and tempdb keep growing during the extraction process, and when you are transferring a high volume of data they can soon run out of space, causing the extraction to fail. So it is recommended to set these values to an optimum level based on your environment. The BULK INSERT sketch below shows the T-SQL equivalents of these options.
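These destination settings roughly correspond to hints on a T-SQL BULK INSERT statement; a hedged sketch (file path and table name are hypothetical):

BULK INSERT dbo.FactSales
FROM 'C:\staging\factsales.csv'
WITH (
    KEEPIDENTITY,             -- like 'Keep Identity'
    KEEPNULLS,                -- like 'Keep Nulls'
    TABLOCK,                  -- like 'Table Lock'
    CHECK_CONSTRAINTS,        -- like 'Check Constraints'
    ROWS_PER_BATCH = 100000,  -- like 'Rows per batch'
    BATCHSIZE = 100000,       -- similar in spirit to 'Maximum insert commit size'
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
);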
#7 - DefaultBufferSize and DefaultBufferMaxRows:
The execution tree creates buffers for storing incoming rows and performing transformations.
The number of buffers created depends on how many rows fit into a buffer, and how many rows fit into a buffer depends on a few other factors. The first consideration is the estimated row size, which is the sum of the maximum sizes of all the columns in the incoming records. The second is the DefaultBufferMaxSize property of the Data Flow Task, which specifies the default maximum size of a buffer; its default value is 10 MB, and its upper and lower boundaries are constrained by two internal SSIS properties, MaxBufferSize (100 MB) and MinBufferSize (64 KB), so a buffer can be as small as 64 KB and as large as 100 MB. The third factor is DefaultBufferMaxRows, again a property of the Data Flow Task, which specifies the default number of rows in a buffer; its default value is 10,000.
If the estimated buffer size exceeds DefaultBufferMaxSize, the engine reduces the number of rows in the buffer. For better buffer performance you can do two things.
First you can remove unwanted columns from the source and set data type in each column appropriately, especially if your source is flat file. This will enable you to accommodate as many rows as possible in the buffer.
Second, if your system has sufficient memory available, you can tune these properties to have a small number of large buffers, which could improve performance. Beware: if you change the values of these properties to the point where page spooling (see Best Practices #8) begins, it will adversely impact performance. So before you set a value for these properties, test thoroughly in your environment and then set the values appropriately.
#8 - How DelayValidation property can help you
SSIS uses two types of validation.
First is package validation (early validation) which validates the package and all its components before starting the execution of the package.
Second, SSIS uses component validation (late validation), which validates the components of the package once execution has started.
Let's consider a scenario where the first component of the package creates an object i.e. a temporary table, which is being referenced by the second component of the package. During package validation, the first component has not yet executed, so no object has been created causing a package validation failure when validating the second component. SSIS will throw a validation exception and will not start the package execution. So how will you get this package running in this common scenario?
To help you in this scenario, every component has a DelayValidation property (default = FALSE). If you set it to TRUE, early validation is skipped and the component is validated only at the component level (late validation), that is, during package execution. A sketch of the scenario follows.
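A hedged sketch of that scenario (object names are illustrative): the first Execute SQL Task creates a global temporary table, and the Data Flow Task that reads from it has DelayValidation set to TRUE so its metadata is validated only when it actually runs.

-- Execute SQL Task (runs first): create the object referenced later in the package
CREATE TABLE ##StagingOrders (OrderID INT, OrderDate DATETIME, TotalDue MONEY);

-- OLE DB Source query inside the Data Flow Task (DelayValidation = TRUE on the task):
SELECT OrderID, OrderDate, TotalDue FROM ##StagingOrders;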
9. Better performance with parallel execution
10. When to use event logging and when to avoid it
11. Monitoring SSIS performance with performance counters
Launch Performance Monitor:
1. Start -> All Programs -> Administrative Tools -> Performance
2. Load the SSIS-related counters: in the Performance Object, select SQL Server:SSIS Pipeline and SQL Server:SSIS Service.
SSIS provides a set of performance counters. Among them, the following are helpful when you tune or debug your package:
- Buffers in use
- Flat buffers in use
- Private buffers in use
- Buffers spooled
- Rows read
- Rows written
“Buffers in use”, “Flat buffers in use” and “Private buffers in use” are useful to discover leaks. During package execution we will see these counters fluctuating, but once the package finishes execution their values should return to what they were before the execution; otherwise, buffers are being leaked.
“Buffers spooled” has an initial value of 0. When it goes above 0, it indicates that the engine has started memory swapping. In a case like this, set the Data Flow Task properties BLOBTempStoragePath and BufferTempStoragePath appropriately for maximal I/O bandwidth.
Buffers Spooled: the number of buffers currently written to disk. If the data flow engine runs low on physical memory, buffers not currently used are written to disk and then reloaded when needed.
“Rows read” and “Rows written” show how many rows the entire Data Flow has processed.
12. FastParse property
The Fast Parse option in SSIS can be used for very fast loading of flat file data. It speeds up parsing of integer, date and time types if the conversion does not have to be locale-sensitive. This option is set on a per-column basis using the Advanced Editor for the Flat File source.
6. What is the difference between the OLE DB Destination and the SQL Server Destination?
Ans: The OLE DB Destination loads records in batches and can load to a local or remote SQL Server, whereas the SQL Server Destination loads all the records in one go and can only be used when the package runs on the destination server itself.
7. What is delay validation? (See the DelayValidation discussion under question 5 above.)
8. How do you use logging in your projects?
Enable logging from the SSIS menu (SSIS -> Logging), then choose a log provider and the events to capture.
9. How many types of logging are there?
SSIS provides five log providers: text file, SQL Server, Windows Event Log, XML file, and SQL Server Profiler.
10. How can you send a particular log detail to someone? How would you do that?
Pull the relevant log entries and email them using the Send Mail Task.
11. Millions of records are being uploaded to the server. If a problem occurs while loading the data, how can you handle it without reloading the whole data set again and again?
Use checkpoints: on restart, the package resumes from the task that failed instead of starting over.
12. Why do you go for event handlers? Which event handler would you use, and can you give a practical example?
Event handlers (such as OnError or OnTaskFailed) are used to trace errors and debug the package, for example to send a notification or write custom log entries when a task fails.
13. Have you used the Sort transformation? Why did you use it, or why did you avoid it?
14. How many types of transformations are there? What are they?
15. What is the difference between Merge Join and Lookup? Which is better, and why not Merge Join?
Accenture SSRS Interview Questions
Please email at etldeveloper.k@gmail.com
Accenture SSAS Interview Questions
Please email at etldeveloper.k@gmail.com
- Few more companies (WIP)
- Deloitte
- Capgemini
- Polaris
- Rolta
- Symphony
- Quinnox
- Datamatics