What is Smart Scan ?
The data search and retrieval processing can be offloaded to the Exadata Storage Servers. This feature is called Smart Scan. Using this Smart Scan, Oracle Database can optimize the performance of operations that perform table and index scans by performing the scans inside Exadata Storage Server, rather than transporting all the data to the database server.
Smart Scan capabilities includes :-
1) Predicate Filtering
2) Column filtering
3) Join Processing
What is predicate filtering and what all conditional operators are supported by predicate filtering ?
Exadata Storage Server enables predicate filtering for table scans. Rather than returning all the rows for the database to evaluate, Exadata Storage Server returns only the rows that match the filter condition.
In addition, many common SQL functions can be evaluated by Exadata Storage Server during predicate filtering.
The definitive list of which functions are offloadable for your particular version is contained in V$SQLFN_METADATA.
SQL> select * from v$sqlfn_metadata where offloadable = 'YES';
A list of conditional operators that are supported by predicate filtering include =, !=, <, >, <=, >=, IS [NOT] NULL, LIKE, [NOT] BETWEEN, [NOT] IN, EXISTS, IS OF type, NOT, AND, OR.
What is Bloom Filter ?
Offloaded joins are accomplished by creating what is called a bloom filter.
A Bloom filter, conceived by Burton Howard Bloom in 1970, is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. The properties of a Bloom filter make is a very efficient way of determining which values are not in a set. This is very useful for processing join conditions where a significant proportion of the data does not fulfill the join criteria.
Oracle Database 10g Release 2 first used Bloom filters to optimize parallel join operations. When two tables are joined via a hash join, the first table (typically the smaller table) is scanned and the rows that satisfy the WHERE clause predicates (for that table) are used to create a hash table. During the hash table creation, a Bloom filter bit string is also created based on the join column. The bit string is then sent as an additional predicate to the second table scan. After the WHERE clause predicates relating to the second table are applied, the resulting rows are tested using the Bloom filter. Any rows rejected by the Bloom filter must fail the join criteria and are discarded. Any rows that match using the Bloom filter are sent to the hash join.
With Exadata, the Bloom filter is passed to the storage servers as an additional predicate. Processing the Bloom filter inside Exadata Storage Server can reduce the amount of data transported to the database server to process a join, which in turn can speed up query performance.
Will bloom filter work with all type of joins?
No, bloom filter works only with hash joins.
Does Smart Scan work with encrypted and compressed data ?
Yes, smart scan works with encrypted and compressed data.
Exadata Storage Server performs Smart Scans on encrypted tablespaces and encrypted columns. For encrypted tablespaces, Exadata Storage Server can decrypt blocks and return the decrypted blocks to Oracle Database, or it can perform row and column filtering on encrypted data. Significant CPU savings can be made within the database server by offloading the CPU-intensive decryption task to Exadata cells.
Smart Scan works in conjunction with Exadata Hybrid Columnar Compression so that column projection and row filtering can be executed along with decompression at the storage level to save CPU cycles on the database servers.
What are prerequisites for Smart Scan to occur ?
There are 4 basic requirements that must be met for Smart Scans to occur :
1) There must be a full scan of an object; that is, full table scans, fast full index scans and fast full bitmap index scans.
2)The scan must use Oracle’s Direct Path Read mechanism. Direct path reads are generally used by Oracle when reading directly into PGA memory (as opposed to into the buffer cache).
- Direct-path reads are automatically used for parallel queries
- Direct-path reads may be used for serial queries
o Not used by default for serial small table scans
o Use _serial_direct_read=TRUE to force direct path reads
3) Each segment being scanned must be on a disk group that is completely stored on Exadata cells. The disk group must also have the disk group attribute settings as below :-
'compatible.rdbms' = 11.2.0.0.0' (or later)
'compatible.asm' = 11.2.0.0.0' (or later)
' cell.smart_scan_capable' = TRUE
4) The CELL_OFFLOAD_PROCESSING initialization parameter enables or disables Smart Scan.
The default value of the parameter is TRUE, meaning that Smart Scan is enabled by default. If it is set to FALSE, Smart Scan is disabled and the database uses Exadata storage to serve data blocks similar to traditional storage.
What are the situations that prevent Smart Scan to happen?
Scan on a clustered table
Scan on an index-organized table
Fast full scan on a compressed index
Fast full scan on a reverse key indexes
The table has row-level dependency tracking enabled
The ORA_ROWSCN pseudocolumn is being fetched
The optimizer wants the scan to return rows in ROWID order
The command is CREATE INDEX using NOSORT
A LOB or LONG column is being selected or queried
A SELECT .. VERSIONS flashback query is being executed
To evaluate a predicate based on a virtual column
More than 255 columns are referenced in the query
The data is encrypted and cell-based decryption is disabled
If table has CACHE property
If _serial_direct_read is turned off (NEVER)
If the partitioned object's size is less than _small_table_threshold.
Offloading is not done on serial DMLs.
Serial Direct read is not applied for these cases (and thus no smart scan) :
- sql from a table function
- sql from dbms_sql
- sql from plsql trigger
Smart scan is not enabled for the sql in plsql when plsql package is called by 'CALL plsql‘
Does not work on serial queries issued from shared servers
The SQL in question must not be quarantined.
High CPU usage on the Storage Cells
What are Storage Indexes ?
Storage Indexes is a very useful Exadata feature which is transparent to database and are maintained automatically.
They are not indexes that are stored in the database like Oracle’s traditional B-Tree or bitmapped indexes. They are not capable of identifying a set of records that has a certain value in a given column. Rather, they are a feature of the storage server software that is designed to eliminate disk I/O.
They work by storing minimum and maximum column values for disk storage units, which are 1 Megabyte (MB) by default and are called region indexes.
Because SQL predicates are passed to the storage servers when Smart Scans are performed, the storage software can check the predicates against the Storage Index metadata (maximum and minimum values) before doing the requested I/O. Any storage region that cannot possibly have a matching row is skipped.
Since the indexes are in-memory on the cell server, if a cell server is restarted the SIs are lost and must be rebuilt.
They are generally created during the first smart scan that references a given column after a storage server has been restarted.
They can also be created when a table is created via a CREATE TABLE AS SELECT statement, or during other direct-path loads.
Under what conditions Storage Index would be used ?
In order for a storage index to be used, a query must include or make use of all the following :
1. Storage Indexes can only be used with statements that do Smart Scans. The main requirements are that the optimizer must choose a full scan and that the I/O must be done via the direct path read mechanism.
2. In order for a statement to use a Storage Index, there must be a WHERE clause with at least one predicate.
3. Storage Indexes can be used with the following set of operators: =, <, >, BETWEEN, >=, <=, IN, IS NULL, IS NOT NULL
What conditions prevent the use of Storage Indexes ?
1. Storage Indexes are not created on CLOBs.
2. Storage Indexes do not work with predicates that use the != comparison operator.
3. Storage Indexes do not work on predicates that use the % wildcard.
4. Storage Indexes are created and maintained for eight-columns per table.
5. Customer had set either of the following init.ora parameter
"_smu_debug_mode=134217728"
or alternatively
"_enable_minscn_cr"=false
What are the statistics related to Storage Indexes ?
There is only one database statistic related to storage indexes. The statistic, 'Cell Physical IO Bytes
Saved by Storage Index', keeps track of the accumulated I/O that has been avoided by the use of Storage Indexes.
Since the statistic is cumulative, it must be checked before and after a given SQL statement in order to
determine whether Storage Indexes were used on that particular statement.
Run the following query to check the statistics for Storage Indexes :
select name, value from v$sysstat where name like '%storage%';
How to diagnose Smart Scan causing wrong results ?
Refer Note 1260804.1 - Exadata: How to diagnose smart scan and wrong results.
The data search and retrieval processing can be offloaded to the Exadata Storage Servers. This feature is called Smart Scan. Using this Smart Scan, Oracle Database can optimize the performance of operations that perform table and index scans by performing the scans inside Exadata Storage Server, rather than transporting all the data to the database server.
Smart Scan capabilities includes :-
1) Predicate Filtering
2) Column filtering
3) Join Processing
What is predicate filtering and what all conditional operators are supported by predicate filtering ?
Exadata Storage Server enables predicate filtering for table scans. Rather than returning all the rows for the database to evaluate, Exadata Storage Server returns only the rows that match the filter condition.
In addition, many common SQL functions can be evaluated by Exadata Storage Server during predicate filtering.
The definitive list of which functions are offloadable for your particular version is contained in V$SQLFN_METADATA.
SQL> select * from v$sqlfn_metadata where offloadable = 'YES';
A list of conditional operators that are supported by predicate filtering include =, !=, <, >, <=, >=, IS [NOT] NULL, LIKE, [NOT] BETWEEN, [NOT] IN, EXISTS, IS OF type, NOT, AND, OR.
What is Bloom Filter ?
Offloaded joins are accomplished by creating what is called a bloom filter.
A Bloom filter, conceived by Burton Howard Bloom in 1970, is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. The properties of a Bloom filter make is a very efficient way of determining which values are not in a set. This is very useful for processing join conditions where a significant proportion of the data does not fulfill the join criteria.
Oracle Database 10g Release 2 first used Bloom filters to optimize parallel join operations. When two tables are joined via a hash join, the first table (typically the smaller table) is scanned and the rows that satisfy the WHERE clause predicates (for that table) are used to create a hash table. During the hash table creation, a Bloom filter bit string is also created based on the join column. The bit string is then sent as an additional predicate to the second table scan. After the WHERE clause predicates relating to the second table are applied, the resulting rows are tested using the Bloom filter. Any rows rejected by the Bloom filter must fail the join criteria and are discarded. Any rows that match using the Bloom filter are sent to the hash join.
With Exadata, the Bloom filter is passed to the storage servers as an additional predicate. Processing the Bloom filter inside Exadata Storage Server can reduce the amount of data transported to the database server to process a join, which in turn can speed up query performance.
Will bloom filter work with all type of joins?
No, bloom filter works only with hash joins.
Does Smart Scan work with encrypted and compressed data ?
Yes, smart scan works with encrypted and compressed data.
Exadata Storage Server performs Smart Scans on encrypted tablespaces and encrypted columns. For encrypted tablespaces, Exadata Storage Server can decrypt blocks and return the decrypted blocks to Oracle Database, or it can perform row and column filtering on encrypted data. Significant CPU savings can be made within the database server by offloading the CPU-intensive decryption task to Exadata cells.
Smart Scan works in conjunction with Exadata Hybrid Columnar Compression so that column projection and row filtering can be executed along with decompression at the storage level to save CPU cycles on the database servers.
What are prerequisites for Smart Scan to occur ?
There are 4 basic requirements that must be met for Smart Scans to occur :
1) There must be a full scan of an object; that is, full table scans, fast full index scans and fast full bitmap index scans.
2)The scan must use Oracle’s Direct Path Read mechanism. Direct path reads are generally used by Oracle when reading directly into PGA memory (as opposed to into the buffer cache).
- Direct-path reads are automatically used for parallel queries
- Direct-path reads may be used for serial queries
o Not used by default for serial small table scans
o Use _serial_direct_read=TRUE to force direct path reads
3) Each segment being scanned must be on a disk group that is completely stored on Exadata cells. The disk group must also have the disk group attribute settings as below :-
'compatible.rdbms' = 11.2.0.0.0' (or later)
'compatible.asm' = 11.2.0.0.0' (or later)
' cell.smart_scan_capable' = TRUE
4) The CELL_OFFLOAD_PROCESSING initialization parameter enables or disables Smart Scan.
The default value of the parameter is TRUE, meaning that Smart Scan is enabled by default. If it is set to FALSE, Smart Scan is disabled and the database uses Exadata storage to serve data blocks similar to traditional storage.
What are the situations that prevent Smart Scan to happen?
Scan on a clustered table
Scan on an index-organized table
Fast full scan on a compressed index
Fast full scan on a reverse key indexes
The table has row-level dependency tracking enabled
The ORA_ROWSCN pseudocolumn is being fetched
The optimizer wants the scan to return rows in ROWID order
The command is CREATE INDEX using NOSORT
A LOB or LONG column is being selected or queried
A SELECT .. VERSIONS flashback query is being executed
To evaluate a predicate based on a virtual column
More than 255 columns are referenced in the query
The data is encrypted and cell-based decryption is disabled
If table has CACHE property
If _serial_direct_read is turned off (NEVER)
If the partitioned object's size is less than _small_table_threshold.
Offloading is not done on serial DMLs.
Serial Direct read is not applied for these cases (and thus no smart scan) :
- sql from a table function
- sql from dbms_sql
- sql from plsql trigger
Smart scan is not enabled for the sql in plsql when plsql package is called by 'CALL plsql‘
Does not work on serial queries issued from shared servers
The SQL in question must not be quarantined.
High CPU usage on the Storage Cells
What are Storage Indexes ?
Storage Indexes is a very useful Exadata feature which is transparent to database and are maintained automatically.
They are not indexes that are stored in the database like Oracle’s traditional B-Tree or bitmapped indexes. They are not capable of identifying a set of records that has a certain value in a given column. Rather, they are a feature of the storage server software that is designed to eliminate disk I/O.
They work by storing minimum and maximum column values for disk storage units, which are 1 Megabyte (MB) by default and are called region indexes.
Because SQL predicates are passed to the storage servers when Smart Scans are performed, the storage software can check the predicates against the Storage Index metadata (maximum and minimum values) before doing the requested I/O. Any storage region that cannot possibly have a matching row is skipped.
Since the indexes are in-memory on the cell server, if a cell server is restarted the SIs are lost and must be rebuilt.
They are generally created during the first smart scan that references a given column after a storage server has been restarted.
They can also be created when a table is created via a CREATE TABLE AS SELECT statement, or during other direct-path loads.
Under what conditions Storage Index would be used ?
In order for a storage index to be used, a query must include or make use of all the following :
1. Storage Indexes can only be used with statements that do Smart Scans. The main requirements are that the optimizer must choose a full scan and that the I/O must be done via the direct path read mechanism.
2. In order for a statement to use a Storage Index, there must be a WHERE clause with at least one predicate.
3. Storage Indexes can be used with the following set of operators: =, <, >, BETWEEN, >=, <=, IN, IS NULL, IS NOT NULL
What conditions prevent the use of Storage Indexes ?
1. Storage Indexes are not created on CLOBs.
2. Storage Indexes do not work with predicates that use the != comparison operator.
3. Storage Indexes do not work on predicates that use the % wildcard.
4. Storage Indexes are created and maintained for eight-columns per table.
5. Customer had set either of the following init.ora parameter
"_smu_debug_mode=134217728"
or alternatively
"_enable_minscn_cr"=false
What are the statistics related to Storage Indexes ?
There is only one database statistic related to storage indexes. The statistic, 'Cell Physical IO Bytes
Saved by Storage Index', keeps track of the accumulated I/O that has been avoided by the use of Storage Indexes.
Since the statistic is cumulative, it must be checked before and after a given SQL statement in order to
determine whether Storage Indexes were used on that particular statement.
Run the following query to check the statistics for Storage Indexes :
select name, value from v$sysstat where name like '%storage%';
How to diagnose Smart Scan causing wrong results ?
Refer Note 1260804.1 - Exadata: How to diagnose smart scan and wrong results.