Tuesday, December 15, 2009

How to create CSV file in SQL*Plus

As I remember, this is one of the questions asked every time during my Oracle DBA classes:
"How do I save data in CSV or another text format?"
The answer is simple: go to Shivmohan Purohit's Oracle blog and check the post SQL*Plus output.


Monday, December 14, 2009

Oracle RAC One Node – part 2

In this post I want to share with you some scenarios involving RAC One Node and client behaviour. In this part I'm using a 10g SQL*Plus client, so there is no FAN notification; the next part will cover a FAN-enabled client. The main difference between using and not using FAN is that a FAN client gets feedback from the Oracle cluster (Grid Infrastructure in 11gR2, Oracle Clusterware in previous releases) and has information about the current cluster status. A non-FAN client has to wait for a TCP/IP timeout on the current session before switching to the other node. More information about client failover and cluster configuration can be found here.

My server and client configuration was presented in part 1 (including the entry from TNSNAMES.ORA), so here is only a short recap:
  • Two nodes – RAC1 and RAC2 with SCAN configured
  • SQL*Plus 10g Client


I want to present three scenarios:

  1. One node in the cluster is running

  2. Two nodes are running – the current instance crashes

  3. Fixing the configuration


One node in the cluster is running

Due to a failure or maintenance, only one node in the cluster is running. The RAC One Node service starts on that node and is open for all sessions. Regardless of which node was current previously, the instance is opened on the node which is running now. The current status can be checked via raconestatus:

[oracle@rac1 ~]$ raconestatus

RAC One Node databases on this cluster:


Database UP Fix Required Current Server Candidate Server Names
======== == ============ ============================== ========================================
testone Y N rac1 rac1 rac2


Available Free Servers:

As we can see, there is no available server to move our instance to. If we start the Omotion process, we will hit an error.


[oracle@rac1 ~]$ Omotion

RAC One Node databases on this cluster:

# Database Server Fix Required
=== ======== ============================== ============
[1] testone rac1 N

Enter number of the database to migrate [1]:

Specify maximum time in minutes for migration to complete (max 30) [30]: 2

Available Target Server(s) :
# Server Available
=== ================== =========
ERROR: Cannot find any available servers in this cluster.

Exiting...

[oracle@rac1 ~]$

If node RAC1 is shut down, all sessions will be terminated, just as in a single-node configuration. Whichever server is started first after both nodes are down will become the current one.

Two nodes are running – the current instance crashes
Both servers are up and the instance is running on server RAC1. The current cluster configuration can be checked via the crsctl command:



[oracle@rac1 grid]$ $GRID_HOME/bin/crsctl status resource -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.LISTENER.lsnr
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.asm
ONLINE ONLINE rac1 Started
ONLINE ONLINE rac2 Started
ora.eons
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.gsd
OFFLINE OFFLINE rac1
OFFLINE OFFLINE rac2
ora.net1.network
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.ons
ONLINE ONLINE rac1
ONLINE ONLINE rac2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE rac1
ora.oc4j
1 OFFLINE OFFLINE
ora.rac1.vip
1 ONLINE ONLINE rac1
ora.rac2.vip
1 ONLINE ONLINE rac2
ora.scan1.vip
1 ONLINE ONLINE rac1
ora.testone.db
1 ONLINE ONLINE rac1 Open
ora.testone.serviceone.svc
1 ONLINE ONLINE rac1
[oracle@rac2 grid]$


The session is connected to server RAC1:


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

SQL> select * from test;

ID
----------
1
2
3
4
5
6

6 rows selected.

SQL> insert into test values (7);

1 row created.

SQL> commit;

Commit complete.

SQL> insert into test values (8);

1 row created.

SQL> select instance_name, host_name from v$instance;

INSTANCE_NAME HOST_NAME
---------------- ----------------------------------------------------------------
testone_2 rac1

SQL>

Active Query
In the first scenario I repeat a SELECT command in SQL*Plus and at the same time turn off server RAC1. In that case the active statement is cancelled and the client then tries to reconnect.


SQL> select to_char(sysdate,'hh24:mi') from dual;

TO_CH
-----
09:55

SQL> select to_char(sysdate,'hh24:mi') from dual;
select to_char(sysdate,'hh24:mi') from dual
*
ERROR at line 1:
ORA-03135: connection lost contact


SQL> select to_char(sysdate,'hh24:mi') from dual;
ERROR:
ORA-03114: not connected to ORACLE

As you can see, the command has been cancelled and the session has been terminated. There is no other instance up and running yet, so there is no target to fail the session over to. In the client network trace file you can find the following sequence of errors:

  • ORA-12547 – TNS: lost contact

  • ORA-12151 – bad packets received

  • ORA-03135 – connection lost contact


After that the client tries to reconnect to the other nodes, as in a normal RAC configuration. The client calls the SCAN address to get another VIP address, but the cluster is still in the same state – the reconfiguration has only just started – so the cluster returns rac1-vip once again, as it is the only VIP configured for the database service.

Keep in mind that in this case the current, uncommitted transaction will be rolled back.


Idle session
When a session is idle, a failover takes place when the first SQL query is executed.
See the example:

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

SQL> select to_char(sysdate,'hh24:mi') from dual;

TO_CH
-----
15:30


SQL> select instance_name, host_name from v$instance;

INSTANCE_NAME HOST_NAME
---------------- ----------------------------------------------------------------
testone_2 rac1


SQL> select to_char(sysdate,'hh24:mi') from dual;

TO_CH
-----
15:38

SQL>
SQL> select instance_name, host_name from v$instance;

INSTANCE_NAME HOST_NAME
---------------- ----------------------------------------------------------------
testone_1 rac2

SQL>

As you can see, the connection has been migrated between servers without any error. This is because before the last query a new instance was already up and running. If we take a look into the client network trace file, we will see the following errors:
  • ORA-12151
  • ORA-03113
After that the client reconnects to the other node. The client calls the SCAN address to get another VIP address and the cluster returns the new VIP (rac2-vip), as the database service has been successfully migrated to the other node.
Keep in mind that the current, uncommitted transaction has to be rolled back after the session migration.
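Seen from the client side, both the active-query and idle-session cases reduce to the same generic pattern: catch the disconnect error, reconnect (via SCAN), accept that the open transaction is gone, and re-run the statement. Below is a minimal, Oracle-agnostic Python sketch of that pattern – the connection class is a made-up stand-in, not a real driver:

```python
class LostContact(Exception):
    """Stand-in for ORA-03135 / ORA-03113 style disconnect errors."""

class FlakyConnection:
    """Illustrative connection that fails once, then works, simulating
    the instance becoming available again on the other node."""
    def __init__(self):
        self.failures_left = 1

    def execute(self, sql):
        if self.failures_left:
            self.failures_left -= 1
            raise LostContact("connection lost contact")
        return "ok"

def run_with_retry(connect, sql, attempts=3):
    """Run a statement; on a lost connection, reconnect and retry.
    The open transaction is rolled back on disconnect, so this is
    only safe for idempotent work."""
    conn = connect()
    for _ in range(attempts):
        try:
            return conn.execute(sql)
        except LostContact:
            conn = connect()   # reconnect (via the SCAN address in the RAC case)
    raise LostContact("gave up after %d attempts" % attempts)

conn_holder = FlakyConnection()
result = run_with_retry(lambda: conn_holder, "select 1 from dual")
# first execute fails, the retry succeeds
```

A FAN-enabled client (covered in the next part) effectively short-circuits the waiting inside this loop, because the cluster pushes the state change instead of the client discovering it via a TCP/IP timeout.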


Fixing the configuration

After an unexpected crash it is possible that the RAC One Node configuration needs to be fixed.
I have figured out two possible scenarios:

  • After a node crash and migration of the instance and service, the crashed node has been rebooted and there are two running instances

    [oracle@rac1 grid]$ raconestatus

    RAC One Node databases on this cluster:


    Database UP Fix Required Current Server Candidate Server Names
    ======== == ============ ============================== ========================================
    testone Y Y rac1 rac2 rac1 rac2


    Available Free Servers:

    [oracle@rac1 grid]$

    Fixing the configuration will close one of the instances (on the rebooted node) and clean up the configuration.

    [oracle@rac1 rac1]$ raconefix

    RAC One Node databases on this cluster:

    # Database Server Fix Required
    === ======== ============================== ============
    [1] testone rac1 Y

    Enter number of the database to fix [1]:

    If database is up, it will be checked and cleaned after a previous fail over.


  • A very similar situation to the previous point, but only one instance is working and there is a “mess” in the RAC service configuration. The Oracle RAC One Node fix will clean up the configuration.


At the end I want to mention that I have hit some strange behaviour related to migrating the instance between servers and starting and stopping servers. I'm not sure if it is a bug or not, but after a whole-cluster restart the instances were swapped across.
The first configuration was:

  • RAC1 – instance name : testone_2
  • RAC2 – instance name : testone_1
Yes, I know the naming was inconsistent on my part, so Oracle decided to fix it. No, not at the beginning, but after a week of testing and restarts. Now it looks like this:

  • RAC1 – instance name : testone_1
  • RAC2 – instance name : testone_2

Unfortunately, after that I was unable to play with RAC One Node any more and I had to clear the whole configuration and create it from scratch.

The next part will be about FAN-enabled clients and more about transactions.

Monday, December 7, 2009

Oracle RAC One Node – part 1

This post is about the installation and configuration of Oracle RAC 11gR2 in a One Node configuration on VMware. At the beginning I want to describe RAC One Node a little bit more – it is a new possible configuration, with a special licence and price, which is very similar to a failover cluster configuration. During normal work only one instance is up and running – like in a failover cluster; the difference is in the migration process. The following steps are performed during a migration:
  1. A second instance is started on the target node
  2. All sessions are migrated to the target instance – TAF has to be enabled in the client configuration
  3. The source instance is shut down in transactional mode
  4. After a timeout the source instance is shut down in abort mode
The installation and configuration is based on Linux CentOS 5.3 and Oracle 11g 11.2.0.1

If you want to skip the Grid Infrastructure configuration tips, click here

Oracle Grid Infrastructure
Oracle Grid Infrastructure has to be installed and configured on all nodes belonging to the Oracle RAC (this is a requirement for both RAC configurations – typical RAC and RAC One Node). In previous releases Oracle Grid Infrastructure was called Oracle Clusterware. In 11gR2 the name has been changed to Grid Infrastructure and a lot of changes have been made. The most important are:
  • Oracle ASM is part of Grid Infrastructure, not part of the Oracle Database
  • The voting and cluster configuration files can use an ASM disk or a cluster file system
  • Raw or block devices for the voting and cluster configuration files are not supported at installation time
  • More RAM is required

The last change made me sad, as I have only 4 GB of RAM on my laptop, so I can allocate about 3 GB for the two VMs. But why not try? After a few tests I found a working configuration for both nodes.

RAC1 – node number 1
  • 1.5 GB of RAM allocated for VM
  • 2.0 GB of swap
  • 1 CPU
  • 10 GB of free space for Oracle Homes (both infrastructure and database)

RAC2 – node number 2
  • 1.0 GB of RAM allocated for VM
  • 2.0 GB of swap
  • 1 CPU
  • 10 GB of free space for Oracle Homes (both infrastructure and database)

Yes, there is a difference between the memory sizes of the nodes – start the OUI on node number 1, the one with more memory allocated.

Common configuration:

/etc/hosts
10.10.20.129    rac1.localdomain rac1
192.168.226.10 rac1-priv.localdomain rac1-priv
10.10.20.130 rac2.localdomain rac2
192.168.226.20 rac2-priv.localdomain rac2-priv
10.10.20.200 rac-cluster
10.10.20.210 rac1-vip.localdomain rac1-vip
10.10.20.220 rac2-vip.localdomain rac2-vip
RAC1-VIP and RAC2-VIP have to be assigned to the public network and must not be configured (up) before the installation. There is an additional entry for rac-cluster, which is the SCAN address; it has to be in the same network as both VIP interfaces and must likewise not be configured beforehand.
The SCAN address is a single entry point for the cluster – for more information see the RAC documentation.

Now it is time to start the OUI and install the Grid Infrastructure. Below are some installation tips:
  • Installation option – choose Install and configure Grid Infrastructure for a Cluster
  • Installation type – choose Typical installation
  • SCAN name – type rac-cluster
  • Add both nodes – RAC1 and RAC2 with the proper VIPs – note that there is no need for a private name anymore
  • Test SSH connectivity between the nodes and click Setup if there are any problems
  • Choose the 10.10.20.x subnet as public and the 192.168.226.x subnet as private
  • Specify the Oracle Base and software location (which is the Oracle Home for Grid Infrastructure) – note that the Oracle Home for Grid has to be in a different location than the Oracle Base
  • Select a disk for the ASM disk group – it will be used for the voting and configuration files too – if required, change the Discovery Path to the correct value (e.g. /dev/sdc* for block devices or ORCL:* if you are using ASMLib)
  • If there are any problems with kernel settings or missing packages, solve them before the installation starts – you can ignore the memory, swap and NTP checks, but you have to have at least the memory sizes specified above.
After about 15 minutes it is time for the last step – execute root.sh to configure and start the cluster infrastructure. Run root.sh on the node with more memory first.


Oracle Database Oracle Home

Just perform a standard installation of Oracle 11gR2 binaries without database creation.

RAC One Node patch

This is the best time to install the patch which adds One Node support to our database Oracle Home. Why? Because it has to be installed when the DB is down, so installing it before database creation saves any further actions. The patch is RACONENODE_p9004119_112010_LINUX.zip and can be found on the Oracle Support pages.

Database creation

The most important thing is to create the database on one node only. On the first screen a RAC database has to be chosen, and then only one node (e.g. rac1) has to be selected.
The next important thing is the storage for the database. In our example all database files will be placed in the ASM disk group – the same one used to keep the Grid Infrastructure cluster configuration.
All other configuration settings have no impact on the RAC One Node configuration.

Service configuration

A new service has to be added to support the RAC One Node configuration. This service will be used in our client configuration and will be the entry point to our database.

srvctl add service -d testone -s serviceone -r testone1
where:
  • testone – the database name
  • serviceone – the service name
  • testone1 – the instance name created in the previous step

RAC One node configuration

When the database and service are up, it is time to start the RAC One Node configuration.
To do that, raconeinit has to be started.


Candidate Databases on this cluster:
# Database RAC One Node Fix Required
=== ======== ============ ============
[1] testone NO N/A
Enter the database to initialize [1]:
Database testone is now running on server rac1
Candidate servers that may be used for this DB: rac2

Enter the names of additional candidate servers where this DB may run (space delimited): rac2

Please wait, this may take a few minutes to finish.
Database configuration modified.

After that command the new configuration should be in place. The current status can be checked with the raconestatus command:


RAC One Node databases on this cluster:

Database UP Fix Required Current Server Candidate Server Names
======== == ============ ============================== ========================================
testone Y N rac1 rac1 rac2

Available Free Servers:

RAC One node operations

The main RAC One Node operation is moving an instance between nodes. That operation can be done using the Omotion tool. Here is an example of an Omotion execution:

RAC One Node databases on this cluster:

# Database Server Fix Required
=== ======== ============================== ============
[1] testone rac1 N
Enter number of the database to migrate [1]:
Specify maximum time in minutes for migration to complete (max 30) [30]: 5
Available Target Server(s) :
# Server Available
=== ================== =========
[1] rac2 Y
Enter number of the target node [1]:

Omotion Started...
Starting target instance on rac2...
Migrating sessions...
Stopping source instance on rac1...
Omotion Completed...

=== Current Status ===
Database testone is running on node rac2

In that example the database instance has been moved from node rac1 to node rac2. The instance on rac1 has been closed in transactional mode. In that scenario there were no remaining sessions on the rac1 instance. When there are outstanding sessions/transactions on the source node (in this case rac1), Omotion will shut down that instance in transactional mode and then, after the timeout, shut it down in abort mode – see the example below.

RAC One Node databases on this cluster:

# Database Server Fix Required
=== ======== ============================== ============
[1] testone rac1 N

Enter number of the database to migrate [1]:
Specify maximum time in minutes for migration to complete (max 30) [30]: 5
Available Target Server(s) :
# Server Available
=== ================== =========
[1] rac2 Y

Enter number of the target node [1]:
Omotion Started...
Starting target instance on rac2...
Migrating sessions...
Stopping source instance on rac1...
Timeout exceeded, aborting instance...
Omotion Completed...

=== Current Status ===
Database testone is running on node rac2
Client configuration
How does it look from the client perspective?
  1. TAF not configured – the session has to be reconnected after instance migration
  2. TAF configured in the client TNS – only the current transaction has to be rolled back.
TAF example:
tnsnames.ora

testone =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac-cluster)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = testone)
      (FAILOVER_MODE =
        (TYPE = select)
        (METHOD = basic)
      )
    )
  )

Take a look at the address - there is no VIP any more - now the SCAN name has to be entered
in the TNS alias and resolved via DNS or the hosts file, as must all the RAC VIPs (rac1-vip and rac2-vip).
SQL*Plus test


sqlplus system@testone
SQL> select instance_name, host_name from v$instance;

INSTANCE_NAME HOST_NAME
---------------- ----------------------------------------------------------------
testone_1 rac2


SQL> select * from test;

ID
----------
1
2
3
4
5
6
7

7 rows selected.

SQL> select instance_name, host_name from v$instance;

INSTANCE_NAME HOST_NAME
---------------- ----------------------------------------------------------------
testone_1 rac2


Omotion has been started.

SQL> select instance_name, host_name from v$instance;
select instance_name, host_name from v$instance
*
ERROR at line 1:
ORA-25402: transaction must roll back


SQL> rollback;

Rollback complete.

SQL> select instance_name, host_name from v$instance;

INSTANCE_NAME HOST_NAME
---------------- ----------------------------------------------------------------
testone_2 rac1


SQL> select * from test;

ID
----------
1
2
3
4
5
6

6 rows selected.

SQL>

After the migration there were two changes - the instance name and the host name have changed - it's not like a typical failover cluster,
where the instance is migrated from one host to the other. In RAC One Node a new instance
is started on the second node, and during the migration time this configuration works
as a typical RAC.

This is the end of part one - the next part, with more tests and operations, is coming soon.


Thursday, December 3, 2009

Oracle RAC 11R2 and VMWare

Hi,

Last week I spent installing and configuring Oracle RAC 11gR2 on VMware.
That is nothing special, but I wanted to do it on my laptop with Windows XP and 4 GB of RAM.
Until now that was enough RAM to install Oracle RAC 9i, 10g and 11gR1, but with 11gR2 it is more complicated.

First of all, Oracle requires 1.5 GB for Grid Infrastructure, or 2.5 GB for Grid Infrastructure
and a database. If you multiply that by two - it is 8 GB as a minimum ;) Hmmm, we have only half of that.

My first attempt was to install it with my original settings - 1 GB RAM and 1 GB swap for each VM.
It hung during the root.sh script at the end of the installation.

After that I decided to increase the memory of one VM and add swap to both. The new configuration was:

on node 1 - 1.5 GB RAM and 2 GB swap,
on node 2 - 1.0 GB RAM and 2 GB swap.


That was it. The node with more RAM was used to start the OUI and was first in the cluster.

After the successful installation of Grid Infrastructure I decided to decrease the SGA size of the ASM instances to the required minimum (136 MB) and then started the database installation and configuration. I installed the Oracle binaries and then started DBCA manually to create a database - of course with the SGA set to the required minimum (about 370 MB).
After 25 minutes I had a working environment.

Next I installed the patch for RAC One Node - a new option for running Oracle in a failover cluster.
A new post about that should appear in a few days.

regards,
Marcin

Wednesday, November 4, 2009

Have to read and remember

Hi,

This is my third post today (all very short), but I have to clean up my workspace and close all the tabs in Firefox.
As I always have problems with bookmarks, here is a list of interesting documents which I have read and kept open for some time, to add them to the blog.

Jonathan Lewis - old but very important posts about indexes.
James Morle's blog - new stuff is coming
Oracle CBO development group blog - some new papers, e.g. Upgrade from 10g to 11g - CBO changes
Uwe Hesse's blog - is count(col) better than count(*)?
Miladin Modrakovic's blog - a new Oradebug feature in 11gR2

Update:
I forgot about Richard Foote's blog - two entries - Open World Highlights
and other entries about index rebuilding - I'm on Richard's side and I prefer to go out on a Sunday afternoon ;)

VMWare and CentOS - 2 virtual CPU

If you want to run a virtual machine with 2 virtual CPUs and CentOS 5 (RedHat 5) as the guest OS, just read that note. Without the special settings CentOS doesn't boot properly and hangs on the UDEV service.
I have tried it and now I can simulate 2 CPUs in my VMware Player and test some of Jonathan Lewis' stuff about index explosion.

Naming Convention and Oracle Forms

I was asked to create a test/dev environment to migrate an Oracle Forms application from Forms 6i (Windows based) to Forms 10g on a Linux box. There were about 30 files, so the whole operation looked very easy – just copy the files and set up a Forms service.
I was really surprised when I tested the main form – it didn't start at all – just a blank screen in the browser.
I compiled it and all related PL/SQL libraries once again in Forms Builder and there were no errors. So I decided to compile the same form and libraries on the Linux box. Five of the six libraries compiled without any errors, but one produced an error message. Very strange for me – another library had not been found. I left it for a while and tried to compile the form itself. The same kind of error appeared – some PL/SQL functions had not been found. But why? There were PLX and PLL files with the “missing” PL/SQL procedures and functions. I used the “strace” tool and compiled the forms again. In the “strace” output file I found references to the missing files – OK, not missing, but with slightly different names – “p” instead of “P”.
And this is the point of this post – the developers didn't use any naming convention for file names and references to those files. Windows is not case sensitive, so whether you call a form “Form1” or “FORM1” it is still the same file – but not on a case-sensitive OS like Linux or other UNIX-based systems. I have seen a lot of forms which call other forms using case-insensitive names – don't do that, because in case of a migration you will have a lot of additional work just to change form names inside the code.
Forms Builder adds its part too – if you attach a library, it uses the file name as the reference to the library, so it is very important to keep the file name case consistent.
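A quick way to find such mis-cased references before a migration is to compare the referenced names against the actual file names case-insensitively. A small illustrative Python sketch (the file and form names are made up):

```python
def find_case_mismatches(referenced, actual_files):
    """Return (referenced_name, actual_name) pairs where a file exists
    only under a differently-cased name - fine on Windows, broken on Linux."""
    exact = set(actual_files)
    by_lower = {name.lower(): name for name in actual_files}
    mismatches = []
    for ref in referenced:
        if ref not in exact and ref.lower() in by_lower:
            mismatches.append((ref, by_lower[ref.lower()]))
    return mismatches

# Example: a form attaches "Plib1.pll" but the file on disk is "plib1.pll".
refs = ["Plib1.pll", "FORM1.fmb", "utils.pll"]
files = ["plib1.pll", "form1.fmb", "utils.pll"]
print(find_case_mismatches(refs, files))
# -> [('Plib1.pll', 'plib1.pll'), ('FORM1.fmb', 'form1.fmb')]
```

Running something like this against the referenced names extracted from the FMB/PLL sources would have flagged the "p" versus "P" problem before strace was needed.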

Friday, October 2, 2009

What can happen if you have "\p" in your ORACLE_HOME

Hi. Today I hit a very strange error when I was trying to install Grid Control 10.2.0.5 on Windows. This installation is split into two parts: installation of Grid Control 10.2.0.1 and then a patch to 10.2.0.5. The first part went without any problems, but I had made a mistake: I set the beginning of all ORACLE_HOMEs to “e:\oracle\product\10.2.0\”. Grid Control itself creates three different ORACLE_HOMEs (one each for the DB, OMS and agent) using the entered path as a prefix. The whole installation of 10.2.0.1 was performed without any problems and I started the upgrade to 10.2.0.5. During the configuration phase the OC4J deployment finished with an error. I started to investigate and looked into the OUI log file C:\Program Files\Oracle\Inventory\logs\installAction<...>.log

Can't find unicode character property definition via main->r or r.pl at unicode/Is/r.pl line
oracle.ias.sysmgmt.repository.plugin.advanced.apache.parser.ParserException: parsing e:\oracle\product\10.2.0\oms10g\Apache\Apache\conf\ssl.conf:


A similar error is described in Metalink note 563468.1, but it is related to Oracle Application Server and the OPatch utility. The cause of the error is the ORACLE_HOME value: Perl interprets the string "\p" in the Oracle home path as a regular expression escape. It is a little bit funny, as Oracle recommends an ORACLE_HOME looking like ‘\oracle\product\’.
I didn't find any patch related to that error, so I just installed Grid Control using a path without the “\p” string.
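The failure mode is easy to reproduce outside of Perl, too. As an illustration (not the OUI code itself), Python's re module rejects the same fragment of the path, because "\p" is not a valid escape sequence there; Perl instead tries to read it as a Unicode property escape, which matches the "Can't find unicode character property" message in the log above:

```python
import re

# The "\p" fragment of the Oracle home, used as a regex pattern the way
# the installer's scripts effectively used the full path.
fragment = r"oracle\product"

try:
    re.compile(fragment)
    compiled = True
except re.error:
    compiled = False  # "\p" is rejected as a bad escape

# Escaping the string first turns it into a safe literal pattern.
safe = re.compile(re.escape(fragment))
match = safe.search(r"e:\oracle\product\10.2.0\oms10g")
```

The general lesson: any path that ends up inside a regular expression has to be escaped first, whatever the language.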

Wednesday, September 23, 2009

Trace access errors in Oracle.

There are several possible causes of object access errors in an application:
  • An application upgrade
  • Deploying new security rules
  • Hardcoded schema names
The common Oracle errors for those situations are:


ORA-00942: table or view does not exist
ORA-01031: insufficient privileges


Depending on the application, those errors can appear in application log files or only on the user's screen. Sometimes it is very difficult to find the root cause of that kind of error – especially with an out-of-the-box application without proper logging and tracing capabilities.
The Oracle logging interface is very powerful and allows a system administrator to set up logging of the above errors into the Oracle alert.log file and into session trace files. To enable that functionality, the trace level for the error has to be increased – this can be done at the system or session level. Of course, for an application the system level is more useful.
The following steps have to be performed at the system or session level to trace an ORA-00942 error.

- for all sessions


alter system set events '942 trace name errorstack level 1';

- for the current session

alter session set events '942 trace name errorstack level 1';

A test user executes the following SQL statement:

SQL> select username from dba_users;
select username from dba_users
ERROR at line 1:
ORA-00942: table or view does not exist

Results in alert.log and trace files:


Tue Sep 22 11:31:58 2009
Errors in file /oracle/app/diag/rdbms/pioro/pioro/trace/pioro_ora_23409.trc:
ORA-00942: table or view does not exist

Trace file details
*** 2009-09-22 11:31:58.607
*** SESSION ID:(152.1313) 2009-09-22 11:31:58.607
*** CLIENT ID:() 2009-09-22 11:31:58.607
*** SERVICE NAME:(SYS$USERS) 2009-09-22 11:31:58.607
*** MODULE NAME:(SQL*Plus) 2009-09-22 11:31:58.607
*** ACTION NAME:() 2009-09-22 11:31:58.607

----- Error Stack Dump -----
ORA-00942: table or view does not exist
----- Current SQL Statement for this session (sql_id=chvsmttqjzjkn) -----
select username from dba_users


Using the above information from the trace file, a DBA can figure out where the problem is and find a proper way to solve it.
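With many sessions the trace directory fills up quickly, so it can help to pull the failing statement out of each trace file programmatically. A hedged sketch – the marker lines are assumed from the excerpt above, and real trace formats may vary between versions:

```python
def failing_sql_from_trace(trace_text):
    """Extract the statement following the 'Current SQL Statement' marker
    in an errorstack trace dump."""
    sql = []
    capturing = False
    for line in trace_text.splitlines():
        if "Current SQL Statement for this session" in line:
            capturing = True
            continue
        if capturing:
            if line.startswith("-----"):   # next dump section begins
                break
            sql.append(line)
    return "\n".join(sql).strip()

# Sample input taken from the trace excerpt above.
trace = """\
----- Error Stack Dump -----
ORA-00942: table or view does not exist
----- Current SQL Statement for this session (sql_id=chvsmttqjzjkn) -----
select username from dba_users
"""
print(failing_sql_from_trace(trace))
# -> select username from dba_users
```

Looped over all files matching the instance's trace directory, this gives a quick inventory of which statements are hitting ORA-00942.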

regards
Marcin

Sunday, August 30, 2009

SwingBench

The first time I saw that tool was a few years ago and I was really surprised how useful it was. Now I have come back to it, and the new version (release 2.3) is, in my opinion, very mature and can be used for many different performance tests.
Let me present the main features:
  • A few different benchmark tests
    • OLTP Order Entry PL/SQL version
    • OLTP Order Entry Java version
    • OLTP Calling Circle
    • Stress test
    • DSS Sales History
    • Custom


  • Possible outputs
    • Transactions per minute
    • Transaction response time
    • DML operations per minute


  • Many configuration options, e.g.:
    • Number of concurrent users
    • Min and max delay time between transactions
    • Logon delays
    • Using of connection pooling


  • Distributed client coordinator
All features are described on Dominic Giles' (the author's) homepage, in the documentation section. That document describes version 2.2; the newest version is 2.3, and some configuration options have been moved from the XML into the GUI.

There are a lot of possible configurations and tests covered by SwingBench. By combining customer-defined transactions in the tool with application knowledge, SwingBench can be used as a small brother of Oracle Real Application Testing – of course, it requires much more work to define a workload, but it is free.

How to use it? It is very simple – please take a look at my quick start guide.
  1. Download a package
  2. Unzip into target directory – ex. c:\SwingBench
  3. Edit the SwingBenchenv.bat file – only the paths below should be changed

    REM Set the following to reflect the root directory of your Java installation

    set JAVAHOME=C:\Program Files\Java\jre1.6.0_07

    REM Set the following to the directory where you installed swingbench

    set SWINGHOME=C:\swingbench

    REM Set the following to the location of your TimesTen install (optional)

    set TTHOME=C:\TimesTen\tt70

    REM If you don't have the ORACLE_HOME environment variable set uncomment the following and change it to reflect your environment

    set ORACLE_HOME=C:\oracle\product\10.2.0\db_1

  4. Go to C:\swingbench\winbin and start oewizard.bat - configuration wizard



  5. Click Next and choose the create schema option



  6. Enter a server IP, database name and sys password



  7. Enter the schema name and tablespace location - if you have an EE edition you can click the Use Partitioning option



  8. Choose a size of demo schema using sliders



  9. Confirm all settings and click Finish



  10. The wizard will display an additional window with the wizard log



  11. Click OK on confirmation screen



  12. Click Cancel in main Wizard Window



  13. You can now start SwingBench - c:\swingbench\winbin\swingbench.bat



    And change the ConnectionString to the proper one for your environment

  14. Click the Start button and enjoy the performance test




This is all for the first time. Now I'm trying to find the fastest way to capture
a workload on an existing database and transform it into SwingBench tests.

I hope this helps some people to test the performance of existing and future installations.

regards,
Marcin

Wednesday, August 19, 2009

Slow network connection in 11g

Today I found a thread on the Oracle forum about slow connections and recalled that I had solved this in the past.
I have now found a few notes on Metalink (e.g. 803838.1) which confirm my investigation that Oracle 11g uses DNS to resolve names instead of the /etc/hosts file on a Unix box. If your server is not registered in DNS, or if the DNS server specified in /etc/resolv.conf does not exist, you can have connection problems.

The fastest way to solve it on Linux is to remove the "nameserver" entry from the /etc/resolv.conf file if you don't need DNS on the server; otherwise you have to make sure that your Oracle server is registered in DNS.

On Solaris a solution is a little bit different - change line in /etc/nsswitch.conf into

ipnodes: files [NOTFOUND=continue] dns

and do not delete line starting with hosts.

regards,
Marcin

Thursday, August 13, 2009

Application Server - Forms session monitoring

Hi,

The Oracle Forms server has some built-in logging tools which allow a DBA to trace performance or collect debug data. In the Oracle documentation you can find an example of how to use these tools.
If you check the URL you will notice that all examples are related to the default Forms environment

http://yourserver/forms/frmservlet?serverURL=/forms/lservlet/perf

But how do you use it if you have a few different environments deployed on the server (e.g. a few applications) and you are using a URL which includes an environment name?

http://yourserver/forms/frmservlet?config=myconfig.env

You have to realize (I have spent some time on this more than once, so I decided to write this post)
that "serverURL" and "config" are parameters in your URL, and according to the HTTP standard for the GET method we can combine parameters using the "&" character.

http://yourserver/forms/frmservlet?config=myconfig.env&serverURL=/forms/lservlet/perf
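As a quick sanity check, the rule above (GET parameters joined with "&") can be sketched in a few lines of Python; the host name and parameter values are just the placeholders from the example URLs:

```python
from urllib.parse import urlencode

# Placeholder values taken from the example URLs above
params = {"config": "myconfig.env", "serverURL": "/forms/lservlet/perf"}

# urlencode joins key=value pairs with "&" as the HTTP GET convention requires;
# safe="/" keeps the slashes in serverURL unescaped
url = "http://yourserver/forms/frmservlet?" + urlencode(params, safe="/")
print(url)
# http://yourserver/forms/frmservlet?config=myconfig.env&serverURL=/forms/lservlet/perf
```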

That's all folks, and I will try to remember it using my blog as a memory :-)


regards,
Marcin

Oracle indexes

Hello,

Jonathan Lewis has just started a series of posts about indexes. This is one of those things with a "must read" flag. Please find some examples:
- Index fragmentation
- Index explosion part 1
- Index explosion part 2


regards,
Marcin

Monday, August 10, 2009

ASM datafile operation in NOMOUNT

Hi

Several times I have been asked to manipulate data files placed on ASM. In 11g there is a very nice feature: Oracle added a cp command to the asmcmd tool, but in 10g we have to use a database instance to do all the work.
One very common task is copying a data file from one ASM disk group to another. It can be done via the RMAN COPY command, but that command adds an entry to the RMAN repository. There is another method, and I found it in the Oracle documentation (yes, yes, RTFM). The CONVERT command can be used not only to copy a data file from ASM to a file system and vice versa, but also to copy between ASM disk groups.

Example

RMAN> convert datafile '+DATA/PIORO/DATAFILE/ORABPEL.260.692892311' format '+DATA/PIORO/copyofORABPEL';
Starting backup at 10-AUG-09
using channel ORA_DISK_1
channel ORA_DISK_1: starting datafile conversion
input filename=+DATA/pioro/datafile/orabpel.260.692892311
converted datafile=+DATA/pioro/copyofORABPEL
channel ORA_DISK_1: datafile conversion complete, elapsed time: 00:00:25
Finished backup at 10-AUG-09


ASM to file system

RMAN> convert datafile '+DATA/PIORO/DATAFILE/ORABPEL.260.692892311' format '/home/oracle/copyofORABPEL';
Starting backup at 10-AUG-09
using channel ORA_DISK_1
channel ORA_DISK_1: starting datafile conversion
input filename=+DATA/pioro/datafile/orabpel.260.692892311
converted datafile=/home/oracle/copyofORABPEL
channel ORA_DISK_1: datafile conversion complete, elapsed time: 00:00:26
Finished backup at 10-AUG-09

Keep in mind that this command doesn’t add any information to the control file, so if you want to rename a file you have to maintain it manually.

This command works in NOMOUNT state too, but an additional parameter is required

[oracle@piorovm ~]$ rman target /
Recovery Manager: Release 10.2.0.4.0 - Production on Mon Aug 10 22:15:44 2009
Copyright (c) 1982, 2007, Oracle. All rights reserved.
connected to target database: pioro10 (not mounted)

RMAN> convert datafile '+DATA/PIORO/DATAFILE/ORABPEL.260.692892311' from platform 'Linux IA (32-bit)' format '/home/oracle/copyofORABPEL';
Starting backup at 10-AUG-09
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=154 devtype=DISK
channel ORA_DISK_1: starting datafile conversion
input filename=+DATA/pioro/datafile/orabpel.260.692892311
converted datafile=/home/oracle/copyofORABPEL
channel ORA_DISK_1: datafile conversion complete, elapsed time: 00:00:26
Finished backup at 10-AUG-09

If you need to copy an archive log (e.g. from the flash recovery area) you can use the RMAN COPY command; this works only in MOUNT state.

RMAN> copy archivelog '+data/PIORO/ARCHIVELOG/2009_07_28/thread_1_seq_58.266.693419455' to '/home/oracle/arch_1_58';
Starting backup at 10-AUG-09
using channel ORA_DISK_1
channel ORA_DISK_1: starting archive copy
input archive log thread=1 sequence=58 recid=1 stamp=694559795
output filename=/home/oracle/arch_1_58 recid=3 stamp=694560324
channel ORA_DISK_1: archivelog copy complete, elapsed time: 00:00:07
Finished backup at 10-AUG-09

But it can be a dangerous command, because an additional entry is created in the control file, and if you delete this file from the file system without telling RMAN, an archive log backup could fail. To avoid this, always run CROSSCHECK ARCHIVELOG ALL before an archive log backup, or use the "skip inaccessible" option.
That's it for today, but I have some additional ideas to check.


Marcin

Monday, August 3, 2009

andLinux and Oracle

Hi,

Please find my short paper about the installation of andLinux and Oracle 10g.
It can be useful when you want to have a Linux version of Oracle on your Windows machine without any virtualization tool.
andLinux is an Ubuntu-based distribution and is not officially supported, but it works fine. There was a problem with the paths to "basename" and "awk" during linking (an error related to the nnfyboot file).

regards,
Marcin

Monday, July 27, 2009

Free version of ASH

Hi,

If you have a performance problem and don't have a license to use Active Session History,
you can use a free version of it. It doesn't use direct SGA access, but it was designed by Kyle Hailey,
one of the OEM Performance tab designers.

When I read about direct access to X$ tables using the oradebug utility on
Miladin Modrakovic's blog, I started to wonder if these two technologies could be combined.
Unfortunately, when I tested direct access using oradebug, CPU usage was unexpectedly high on my VM machine, and in that case direct access can disrupt our observations.

regards,
Marcin

Wednesday, July 22, 2009

RMAN backup from active database

Hello,
During my tests with ASM I broke my database. I couldn’t restore the ASM structure and had to recreate the database. Because it was a test environment I had no backup of it, but I remembered that fortunately I had a standby database. It should be easy to duplicate the standby and recreate my test DB once again.
I decided to use a new RMAN feature and restore the database directly from the standby. I created a script and started RMAN.

run {
set newname for datafile 1 to '+DATA';
set newname for datafile 2 to '+DATA';
set newname for datafile 3 to '+DATA';
set newname for datafile 4 to '+DATA';
set newname for datafile 5 to '+DATA';
set newname for datafile 6 to '+DATA';
set newname for tempfile 1 to '+DATA';
duplicate target database to PIORO from active database;
}

rman target sys/oracle@piorost auxiliary sys/oracle@pioro cmdfile=dup_back.txt msglog=dup_back.log

Imagine my surprise when I found the following message in the log file.

Starting backup at 22-JUL-09
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=87 device type=DISK
channel ORA_DISK_1: starting datafile copy
input datafile file number=00002 name=/oracle/oradata/pioro/sysaux01.dbf
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 07/22/2009 10:01:58
ORA-19504: failed to create file "+DATA/pioro/datafile/sysaux01.211
channel ORA_DISK_1 disabled, job failed on it will be run on another channel
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 07/22/2009 10:01:58
RMAN-03015: error occurred in stored script Memory Script

RMAN-03009: failure of backup command on ORA_DISK_1 channel at 07/22/2009 10:01:58
ORA-19504: failed to create file "+DATA/pioro/datafile/sysaux01.211

Recovery Manager complete.


It was very strange, because this configuration had been running without any problems before my tests.


After a few tries with changing the script (adding db_file_name_convert, changing the new names) I decided to check Metalink. Unfortunately, note 787753.1 states that this is a bug in 11.1.0.7 which will be fixed in 11.2! My test case was a little bit different from the Metalink example, because my standby DB was placed on a file system, but the end result was identical. I had to back up the standby to disk and then restore it on my test machine.


After that I hit another strange issue: because I had restored the control file from the standby database, it contained standby logs and the online redo logs had the wrong paths. But it should be easy, shouldn’t it? Just add new online logs on ASM and drop the old ones.


SQL> select group#, member from v$logfile;

1 /oracle/oradata/pioro/redo01.log
2 /oracle/oradata/pioro/redo02.log
3 /oracle/oradata/pioro/redo03.log
4 /oracle/oradata/pioro/standby01.log
5 /oracle/oradata/pioro/standby02.log
6 /oracle/oradata/pioro/standby03.log

SQL> alter database add logfile ('+DATA') size 50M;
alter database add logfile ('+DATA') size 50M
*
ERROR at line 1:
ORA-19528: redo logs being cleared may need access to files


Hmmm, another nice issue. Yes, the logs were being cleared due to the resetlogs option, but the DB is open now and I’m adding a new redo file, not dropping one.
Let’s try to drop something


SQL> alter database drop logfile group 6;
alter database drop logfile group 6
*
ERROR at line 1:
ORA-19528: redo logs being cleared may need access to files


I restarted the DB, but it didn’t help. In my opinion it could be a problem with the control file, so I decided to force the DB to recreate the control file using the command: alter database backup controlfile to trace.
That was the solution: after that I could drop and add log file groups.


SQL> alter database add logfile ('+DATA') size 50M;
Database altered.

SQL> alter database drop logfile group 2;
Database altered.


I’m not sure if a similar problem with redo logs appears in previous versions of Oracle, but the good old rule of not installing the first release of an Oracle DB on production systems still seems valid, and it gives me more safety.


regards,
Marcin

Thursday, July 2, 2009

New performance papers from Method R

Cary Millsap has published a new paper about software performance: Fundamentals of Software Performance Quick Reference Card. Check out his other brilliant paper too: Making Friends with the Oracle Database.

regards,
Marcin

Tuesday, June 30, 2009

ASM and BBED

As I mentioned in my previous post, I did some research with ASM and the Oracle block editor.
Please find my short paper about it. This version doesn't include a table protected by Oracle Vault.

If you have any remarks please don't hesitate to drop me a mail.

regards,
Marcin

Wednesday, June 24, 2009

High CPU after DB Console start in 11g

Hi,

If you have a problem with Oracle sessions generating a high CPU load in 11g after you start DB Console, check whether any of these sessions is executing the following query:

SELECT EXECUTION_ID, STATUS, STATUS_DETAIL FROM MGMT_JOB_EXEC_SUMMARY
WHERE JOB_ID = :B3 AND
TARGET_LIST_INDEX = :B2 AND EXPECTED_START_TIME = :B1

If yes, you can fix it using these steps:

  1. stop DB Console

    emctl stop dbconsole

  2. connect as sysman using SQL*Plus and execute both procedures

    SQL> execute MGMT_PAF_UTL.STOP_DAEMON;
    SQL> execute MGMT_PAF_UTL.START_DAEMON;

  3. Start DB Console

    emctl start dbconsole

This is reported as bug number 7182962.

Marcin

Strange DataGuard behaviour

I have been testing the new features of 11g Data Guard. During the tests I changed the database protection mode to Maximum Protection
and after that made some tests with startup and shutdown of both databases.
As expected, when the primary database was online I got the following error on the console and a warning in the alert.log

SQL> shutdown immediate
ORA-01154: database busy. Open, close, mount, and dismount not allowed now

Alert.log
Attempt to shut down Standby Database
Standby Database operating in NO DATA LOSS mode
Detected primary database alive, shutdown primary first, shutdown aborted

That’s the expected result. But in the next test case, when I had shut down the primary database, I still got the same error!

SQL> shutdown immediate
ORA-01154: database busy. Open, close, mount, and dismount not allowed now

I reran the command disabling the recovery process and checked that recovery mode was disabled

SQL>ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
Database altered.
SQL> SELECT RECOVERY_MODE FROM V$ARCHIVE_DEST_STATUS;

RECOVERY_MODE
-----------------------
IDLE
IDLE
IDLE
IDLE
IDLE
IDLE
IDLE
IDLE
IDLE
IDLE
IDLE

11 rows selected.

Hmmm .... of course I can shut down that database using abort, but I don’t want to.
Let’s check what else can prevent the database from shutting down: a little check of the running processes and EUREKA, the Data Guard broker is still running. So:

SQL> alter system set dg_broker_start=FALSE scope=memory;
System altered.

SQL> shutdown immediate
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
SQL>

Uffff .....
The other question is: why? If the primary database was down, the standby database should shut down without any further conditions.

regards,
Marcin

Wednesday, June 17, 2009

ASM and RedHat 5

I’m really angry at myself, because I spent another 30 minutes fighting a problem which I have already solved several times. Oracle ASMLib looks for disks using the /dev/dm-* devices, which are not created by default in Red Hat 5. But if you manually map any device as an Oracle ASM disk, it will work until the next reboot.

So if your ASM disks disappear after a reboot in Red Hat 5, check if you have done the following:

  • Comment out the following line in /etc/udev/rules.d/90-dm.rules

    #KERNEL=="dm-[0-9]*", ACTION=="add", OPTIONS+="ignore_device"

    so that the /dev/dm-* devices are created after a system reboot


  • Check in /etc/sysconfig/oracleasm that the disks are not excluded

    # ORACLEASM_ENABLED: 'true' means to load the driver on boot.
    ORACLEASM_ENABLED=true
    # ORACLEASM_UID: Default user owning the /dev/oracleasm mount point.
    ORACLEASM_UID=oracle
    # ORACLEASM_GID: Default group owning the /dev/oracleasm mount point.
    ORACLEASM_GID=oinstall
    # ORACLEASM_SCANBOOT: 'true' means scan for ASM disks on boot.
    ORACLEASM_SCANBOOT=true
    # ORACLEASM_SCANORDER: Matching patterns to order disk scanning
    ORACLEASM_SCANORDER=""
    # ORACLEASM_SCANEXCLUDE: Matching patterns to exclude disks from scan
    ORACLEASM_SCANEXCLUDE=""

  • Check the file privileges


After that you should see your ASM disks.

Regards,
Marcin

Tuesday, June 16, 2009

BBED and Oracle Vault

Hi,

I have always wondered how well Oracle Vault prevents unauthorized access to your data by a DBA. To be honest, I didn’t dig into the Oracle code looking for security holes, but when I recalled the Block Editor I decided to try that tool.

And it works: I was able to read data from a data file placed on a file system without any problem. OK, there was some work to do, like finding the block number and so on, but it wasn’t really hard.

After that I tried to use BBED on an ASM-based file, but without luck. However, I’m not a person who gives up so easily. I was thinking about an RMAN copy, but it requires additional disk space and we can deal only with a copy of the running DB. I searched the internet and found amazing stuff about ASM on Luca Canali's web site. I read it and found a way to read data from a running DB: it is possible to read a block from ASM into a file system, edit it, and copy it back into the running DB. Oracle Vault, as expected, didn’t recognize that the block had been edited.

So is there a way to protect the data? In my opinion data encryption is a good choice, at least until someone finds a way to decrypt it ;)

A working example should be here in a few days.

regards,
Marcin

Monday, June 8, 2009

BBED - Oracle Block Editor

Hi,

I just found this paper about the Oracle Block Editor (BBED) in my old ebook directory.
It can be very useful, but very dangerous too. Please read the document very carefully
and never try it on a production database.

Here is a link to that paper.

regards,
Marcin

Thursday, May 28, 2009

New blogs

Hi,

I have found some interesting blogs:

Tanel Poder Oracle blog
Miladin Modrakovic Oracle blog

Take a look if you have time.

Marcin

Saturday, May 16, 2009

A short story about Oracle CBO and data – part 3 - 11g

I decided to check the same query on 11g.
And there is an improvement: I gathered statistics and the query runs without any problems.
So an upgrade to 11g can be a step in the right direction.
The main thing is that 11g uses a new multicolumn join selectivity algorithm.
I have to do some more research on that. Right now I want to show only the
difference, using my query as an example.



Trace file from 10.2.0.4


Using multi-column join key sanity check for table TAB8
Revised join sel:6.1576e-04 = 0.0021556 * (1/1624.00) * (1/0.0021556)
Join Card: 126.39 = outer (373.19) * inner (550.00) * sel (6.1576e-04)
Join Card - Rounded: 126 Computed: 126.39


Trace file from 11.1.0.7


ColGroup cardinality sanity check: ndv for TAB8[H] = 550 TAB7[S] = 1
Join selectivity using 1 ColGroups: 0.002156 (sel1 = 0.000000, sel2 = 0.000000)
Join Card: 442.439495 = = outer (373.189655) * inner (550.000000) * sel (0.002156)
Join Card - Rounded: 442 Computed: 442.44


Please compare line 3 in both of them. Did you notice that the computed join cardinality in 11g is almost 3.5 times bigger?
In 10g the join selectivity was based on the join columns' selectivity. And in 11g?
I don't know yet.
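The two Join Card lines can be reproduced with plain arithmetic. This is only a sketch of the printed numbers, not of the CBO internals, and note that the selectivity shown as 0.002156 in the 11g trace is rounded; 0.0021556 reproduces the trace output:

```python
# Join Card = outer cardinality * inner cardinality * join selectivity
def join_card(outer, inner, sel):
    return outer * inner * sel

# 10g: the multi-column sanity check revises the selectivity
sel_10g = 0.0021556 * (1 / 1624.00) * (1 / 0.0021556)   # = 1/1624 = 6.1576e-04
card_10g = join_card(373.19, 550.00, sel_10g)

# 11g: the column-group join selectivity is used directly
card_11g = join_card(373.189655, 550.000000, 0.0021556)

print(round(card_10g, 2), round(card_11g, 2))           # ~126.39 vs ~442.44
print(round(card_11g / card_10g, 1))                    # the ~3.5x difference
```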

update 18.05.2009
I shouldn't write anything during the weekend ;)
In 11g there is a new feature called multicolumn (column group) statistics,
and this is now the source of cardinality in multicolumn joins.
See the example on the Pythian blog



ps.
Thanks to Alex Gorbatchev I have new syntax highlighting.

regards,
Marcin

Monday, May 11, 2009

Do we still need DBA ? A short story about Oracle CBO and data. – part 2

Here is a deeper investigation of potential cardinality errors in the CBO.
At this point I want to thank Jonathan Lewis for his brilliant book about Cost Based Optimization
in Oracle. That was, and still is, the basis of all my investigations.
I have a query (described in part 1) which caused me a lot of problems with the temporary tablespace.

I used the 10053 event to generate details about the CBO decisions for that particular query. Here I want to show only a part of that 5 MB trace file. According to the file, join order number 295 was chosen as the best join order.

...
Final - All Rows Plan: Best join order: 295
...

Here is a part of the 10053 file

Join order[295]: TAB1_MV[P]#0 TAB2_MV[E]#1 TAB3_MV[S]#13 TAB4_MV[H]#11 TAB5_MV[D]#9 TAB6_MV[AC]#12 TAB7_MV[SR]#7 SP_MV[R]#19 ... rest of tables

***************
Now joining: SP_MV[R]#19
***************
....
Join Card: 0.00 = outer (11.29) * inner (30335.98) * sel (1.8863e-09)
Join Card - Rounded: 1 Computed: 0.00

There is no information about joining the previous tables TAB1 to TAB7, but the join cardinality for those 7 tables has been calculated as 11.29. I want to know how it was calculated. The CBO can reuse calculations from other join orders, so I have to find where it was done.
Here is some information about the tables. This is the Single Table Cardinality, which is used in the join calculations.

TAB1_MV[P]#0 Card: Original: 1 Rounded: 1 Computed: 1.00 Non Adjusted: 1.00
TAB2_MV[E]#1 Card: Original: 3 Rounded: 3 Computed: 3.00 Non Adjusted: 3.00
TAB3_MV[S]#13 Card: Original: 21645 Rounded: 21645 Computed: 21645.00 Non Adjusted: 21645.00
TAB4_MV[H]#11 Card: Original: 550 Rounded: 550 Computed: 550.00 Non Adjusted: 550.00
TAB5_MV[D]#9 Card: Original: 117 Rounded: 117 Computed: 117.00 Non Adjusted: 117.00
TAB6_MV[AC]#12 Card: Original: 199128 Rounded: 2797 Computed: 2796.88 Non Adjusted: 2796.88
TAB7_MV[SR]#7 Card: Original: 23 Rounded: 23 Computed: 23.00 Non Adjusted: 23.00

As you can see, the computed cardinality can differ from num_rows, so the CBO recalculates the number of rows. Here are the detailed table stats

Table Stats::
Table: TAB6_MV Alias: AC
#Rows: 199128 #Blks: 2688 AvgRowLen: 171.00
Column (#16): TAB6_CR_ID
AvgLen: 2.00 NDV: 6262 Nulls: 151566 Density: 8.4931e-05 Min: 1000 Max: 70516
Histogram: HtBal #Bkts: 254 UncompBkts: 254 EndPtVals: 140
Column (#17): TAB6_CR_NUMBER
AvgLen: 2.00 NDV: 5 Nulls: 150840 Density: 7.1446e-04 Min: 1 Max: 33
Histogram: Freq #Bkts: 5 UncompBkts: 48288 EndPtVals: 5
Column (#2): TAB6_ID
AvgLen: 5.00 NDV: 14584 Nulls: 0 Density: 6.1554e-05 Min: 607 Max: 70943
Histogram: HtBal #Bkts: 254 UncompBkts: 254 EndPtVals: 230
Column (#25): TAB_CR_SRCE_CD
AvgLen: 2.00 NDV: 2 Nulls: 150840 Density: 0.0016153
Histogram: Freq #Bkts: 2 UncompBkts: 48288 EndPtVals: 2

There are NULL values in 3 of the 4 columns, and because Oracle doesn’t use NULL values in a join condition, it tries to adjust the number of rows accordingly.
For a single column with NULL values the calculated table cardinality is equal to

Calculated cardinality = num_rows - number of nulls

# of rows   # of nulls   # of not null   # not null / # of rows
199128      151566       47562           0.238851
199128      150840       48288           0.242497
199128      0            199128          1
199128      150840       48288           0.242497


If there is more than one column with NULL values, some more calculations take place.

Cal cardinality (col 1 and 2) = Cal cardinality (col 1) * (number of not null / num_rows) for col 2

And so on

Cal cardinality (col 1, 2 and 3) = Cal cardinality (col 1 and 2) * (number of not null / num_rows) for col 3

So in our example

Cal card = 47562 * 0.242497 * 1 * 0.242497 = 2796.88
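The whole adjustment can be replayed in a few lines; this is just a sketch of the arithmetic, using the NULL counts from the 10053 trace above:

```python
# NULL counts of the four TAB6_MV join columns, taken from the trace above
num_rows = 199128
nulls = [151566, 150840, 0, 150840]

# Start with the not-NULL rows of the first column, then scale by the
# not-null fraction of each remaining column
card = num_rows - nulls[0]                 # 47562
for n in nulls[1:]:
    card *= (num_rows - n) / num_rows

print(round(card, 2))                      # ~2796.88, matching the trace
```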

Did you read part number 1? The above calculations are based on the number of rows and the number of NULLs; there is no information about the data values. Neither the number of distinct values nor the information from the histograms was used.
So let's let the CBO do its work

Join order[294]: TAB1_MV[P]#0 TAB2_MV[E]#1 TAB3_MV[S]#13 TAB4_MV[H]#11 TAB5_MV[D]#9 TAB6_MV[AC]#12 TAB7_MV[SR]#7 ...
***************
Now joining: TAB2_MV[E]#1
***************
...
Join Card: 1.00 = outer (1.00) * inner (3.00) * sel (0.33333)
Join Card - Rounded: 1 Computed: 1.00
...
***************
Now joining: TAB3_MV[S]#13
***************
...
Join Card: 373.19 = outer (1.00) * inner (21645.00) * sel (0.017241)
Join Card - Rounded: 373 Computed: 373.19
...
***************
Now joining: TAB4_MV[H]#11
***************
...
Join Card: 126.39 = outer (373.19) * inner (550.00) * sel (6.1576e-04)
Join Card - Rounded: 126 Computed: 126.39
...
***************
Now joining: TAB5_MV[D]#9
***************
...
Join Card: 126.39 = outer (126.39) * inner (117.00) * sel (0.008547)
Join Card - Rounded: 126 Computed: 126.39
...
***************
Now joining: TAB6_MV[AC]#12
***************
...
Join Card: 11.29 = outer (126.39) * inner (2796.88) * sel (3.1939e-05)
Join Card - Rounded: 11 Computed: 11.29
...
***************
Now joining: TAB7_MV[SR]#7
***************
...
Join Card: 11.29 = outer (11.29) * inner (23.00) * sel (0.043478)
Join Card - Rounded: 11 Computed: 11.29
...

As you can see, the calculated output cardinality is 11.29.
I made a small experiment and cut out of the SQL query the part related only to those 7 tables.

select count(*) FROM TAB4_MV H, TAB3_MV S,TAB1_MV P,TAB2_MV E,TAB5_MV D,TAB6_MV AC,TAB7_MV SR
WHERE S.TAB3_TC = H.TAB4_TC AND S.TAB3_CON_ID = H.TAB4_ID
AND S.TAB3_CON_NUMBER = H.TAB4_CON_NUMBER
AND H.TAB4_DETAIL_PAY_CODE = D.TAB5_DETAIL_CODE
AND S.TAB3_CON_ID = AC.TAB6_CR_ID
AND S.TAB3_CON_NUMBER = AC.TAB6_CR_NUMBER
AND AC.TAB6_CR_SRCE_CODE = SR.TAB7_CODE
AND S.TAB3_CON_ID = P.TAB1_ID
AND P.TAB1_ECAT_CODE = E.TAB2_CODE;

Here is the result of that query

COUNT(*)
---------
141354138

Comparing that to the estimated 11.29 rows, you can see that the real join cardinality is more than 12 million times greater than calculated.
OK, so let's execute the full SQL query. The best join order is 295 (see the calculation at the beginning of this post). As you can see, the cardinality is now set to 1, but the result of the query

select count(*) FROM SP_MV R, TAB4_MV H,TAB3_MV S,TAB1_MV P,TAB2_MV E,TAB5_MV D,
TAB6_MV AC,TAB7_MV SR
WHERE R.SP_ENTITY_IND = 'P'
AND R.SP_CHANGE_IND IS NULL
AND R.SID = S.TAB3_STU_ID
AND S.TAB3_TC = H.TAB4_TC
AND S.TAB3_CON_ID = H.TAB4_ID
AND S.TAB3_CON_NUMBER = H.TAB4_CON_NUMBER
AND H.TAB4_DETAIL_PAY_CODE = D.TAB5_DETAIL_CODE
AND S.TAB3_CON_ID = AC.TAB6_CR_ID
AND S.TAB3_CON_NUMBER = AC.TAB6_CR_NUMBER
AND R.SID = AC.TAB6_ID
AND AC.TAB6_CR_SRCE_CODE = SR.TAB7_CODE
AND S.TAB3_CON_ID = P.TAB1_ID
AND P.TAB1_ECAT_CODE = E.TAB2_CODE;

is

ORA-01652: unable to extend temp segment by 64 in tablespace TEMP

This is not what the customer expects as the result of a report. I found out that when the cardinality of a table is set via dbms_stats.set_table_stats, it is not changed in the Single Table Cardinality section. I have to investigate more deeply why, but for now this is my solution to the problem.

See you in part 3.

regards,
Marcin

Thursday, May 7, 2009

Do we still need DBA ? A short story about Oracle CBO and data. – part 1

What is a Cost Based Optimizer ?
It is a mathematical model which tries to estimate, using calculations, which execution plan will be best for a particular SQL statement. All the calculations are based on object (table and index) statistics, instance statistics and some built-in rules. Step by step the Oracle CBO calculates the cost of a query (join by join, filter by filter), but one very important part is missing from the calculations. What is missing?
The data itself. The CBO (in normal mode) doesn't care about the data; of course there can be histograms or column selectivity, but in general it has no overview of the actual row values.

Why am I writing about this? I created 14 databases for 14 branches of the same institution. There is one source system which is replicated to BI databases, and in each database materialized views based on the same query are generated.
For 12 databases it worked perfectly well, but for 2 it didn't work at all. During the creation or refresh phase the following error appeared:

ORA-01652: unable to extend temp segment by 64 in tablespace TEMP



My first try was to increase the TEMP tablespace from 2 to 4 GB, but it didn't help.
OK, let's resize once again: I set up autoextend for that file and set the maximum file size to 20 GB. That should be enough; those tables have no more than 100,000 rows. But no,
after 15 minutes still

ORA-01652: unable to extend temp segment by 64 in tablespace TEMP

This query joined about 12 tables. Nine of them were really small and had good filter conditions; the number of rows from those 9 joins was really small, between 20 and 500. But these tables were joined to 2 big tables which have thousands of rows. The join condition between them was quite selective, so the overall result of the query was limited to hundreds or thousands of rows.
I started reviewing the query to find a join which generates a lot of rows that have to be saved in temporary space; after a few tries I had one!
Two tables were joined on two columns, and unfortunately it was not a unique join.

In table A there were four pairs of values which met the join condition

Col1   Col2   COUNT(*)
6001   3      8811
6001   4      292
6001   2      8246
6001   1      3292

In table B I got the same number of pairs

Col1   Col2   COUNT(*)
6001   3      2700
6001   4      75
6001   2      10495
6001   1      6813

So if we multiply the row counts for every matching pair, the join generates about 130 million rows.
Of course the design was a little bit different: there are more tables and joins which limit the number of rows returned to the user, but the CBO decided to apply those joins and filters afterwards.
So yes, at this stage I can say we still need DBAs.
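The "about 130 mln" figure is easy to verify; this sketch multiplies the row counts of each matching pair from the two tables above and sums them up:

```python
# Rows per (Col1, Col2) pair in table A and table B, from the counts above
table_a = {(6001, 3): 8811, (6001, 4): 292, (6001, 2): 8246, (6001, 1): 3292}
table_b = {(6001, 3): 2700, (6001, 4): 75, (6001, 2): 10495, (6001, 1): 6813}

# A non-unique equi-join on (Col1, Col2) produces, for each matching pair,
# the product of the row counts on both sides
join_rows = sum(rows * table_b[pair] for pair, rows in table_a.items())
print(join_rows)   # 132781766 -> about 130 million rows
```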

The next investigation will come in the near future.