Parallel Operations in Oracle 11g
Parallel processing is using more resources, specifically CPU but also memory and I/O, to finish an operation faster. An operation might be doing an inventory in a warehouse. One person can do the inventory, walking row by row adding up the inventory. A faster way would be to hire a team of people. Each person would take one or two rows and when everyone is finished, the results can be tallied. The final answer will be the same either way (as long as everyone can count) but the team approach is faster. It also uses more resources (people).
An Oracle query, running in parallel, works a lot like that. Pieces of the query are distributed across the source data doing their own part to gather an answer. At the end, the pieces are brought back together and merged to get a final answer.
How fast you finish the inventory depends on many factors. How well can the team count compared to the single individual (quality of resources available)? How many people will be on the team (quantity of resources)? How well laid out is the warehouse (design of the database)? Do the resources have to wander all over the place finding stuff to count (design of the query)? Do individual items need to be counted or can we count pallets and boxes (pre-aggregation and materialized views)? There are many factors to the performance as well as the appropriateness of using a team versus an individual and for using parallelization. As an example, if everything is pre-aggregated in materialized views and you are returning a very small number of results, using a parallelized query doesn’t make sense. That would be like sending in a team to count the contents of a single box. Not very appropriate.
Parallelization can also hurt you. If you try to parallelize more than your resources can support, you can actually slow down overall performance. If you are in contention for IO, adding CPUs (parallel processes) won’t help. You can also lower overall system response by parallelizing too much. How much is too much? That depends on your system, your data, and your needs.
Something I hear very frequently, and unfortunately, usually from DBAs, is that to parallelize, you must use partitioning. I had this argument recently with a DBA. He was of the opinion that you can only parallelize across partitions while it is easy to prove that you can parallelize within a single segment.
I don’t know if this myth was true at some time but it is not true now. While if you do partition, you can get some really great performance on large data sets, you do not need to partition to use parallel operations. In some cases, partitioning + parallel query = go fast button. That’s only SOME cases and there are repercussions on the system even then.
A parallel process in Oracle is made up of several components. There is a master process that starts sessions to do work and coordinates the results from those. This is called the Query Coordinator (QC). The QC is like the MCP in Tron but not evil. Mostly.
The QC spawns slave processes that do the work for it. Wow, that really sounds like the MCP doesn’t it? These slaves are parallel servers. Usually, the QC spawns an even number of parallel servers. It spawns a reader (called the producer) and a user (called the consumer). The producer gets the data and the consumer does something with the data. This partnership is called a slave set.
If you think back to our warehouse scenario, we now have a pointy haired boss (QC) telling his peons (parallel slaves) what to do. What’s changed is that in a team of 10, we would have 5 people going row by row counting items (producer) and then giving that information to their assign partner (consumer). The conversation would go something like this:
Producer: I have a can of dinty moore stew.
Consumer: Dinty Moore Stew 1
Producer: I have a can of dinty moore stew.
Consumer: Dinty moore stew 2
Producer: I have a can of Campbell’s beef stew.
Consumer: Ewww. Dinty Moore stew 2, Campbell’s beef stew 1
When all the partnerships get finished they go find the QC (who is probably taking a smoke break or something) and give this data to it.
Since we get an even number of parallel servers per process, when we tell Oracle how far to go with the parallelism, called our Degree of Parallel, we get twice the number of processes. So, if we want to parallelize a query across 4 processes, we would give Oracle a degree of parallel of 4. We would have 8 parallel servers.
From here, the internals of parallelization can quickly get complex. Throw in RAC and some of the more esoteric features of Oracle and you can get a headache. Oracle provides some excellent documentation on this topic.
Getting the QC fired up and motivated to make your life better, as well as all the coordination with parallel servers adds overhead to processing. When you are trying to make a 10 minute query run faster, a little overhead is no big deal. When you are trying to return sub-second response times, any overhead is bad.
Parallel processing also requires more resources per operation that serial processing does. Make sure you have hardware to support parallel. If you have a single dual core CPU and 4GB of RAM, parallel will break you. If you have 16 CPUs and 64GB of RAM on an ultrafast SAN, parallel can be sweet.
Parallel does not fix a poorly designed database or application. If it runs like crap serially, it will run like crap in parallel, it’ll just use more resources doing it.
Having said that, you might be wondering, when do you want to use parallel processing? Not normally for online, real-time transactions processing (OLTP). OLTP is usually dealing with smaller data sets, smaller data volumes and key based joins. OLTP is usually the place you’re looking for the sub-second response times and overhead is bad. If you do any batch processing or reporting in your OLTP system, say when a customer wants to see all orders for the last year, you might want to do parallelization there.
Data warehousing and BI (OLAP and analytical type processing) can usually benefit greatly from parallelization. You’re dealing with large data sets, you sometimes are crunching large amounts into smaller consolidated data sets and the time it takes to return isn’t usually as critical. You are also usually dealing with larger machines that have more CPU, more memory and larger data pipes. All of this makes it ideal for parallelization.
If you have some very large queries that are not performing to expectations or requirements, you can try parallelizing. Benchmark first, parallelize and benchmark again. Get an improvement? Cool but you’re still not done. Now benchmark before and after, under a full load, with as many concurrent users as normal running normal processing. Still better? Nobody broke? Cool. Now you can start testing.
If you are using partitioning in your warehouse or OLAP database, the combination of partitions and parallel can be a huge performance boost. If you are joining tables partitioned on the same key, partition wise equijoins, the parallelization gets a huge boost. This eliminates the need for a particular data redistribution step and can get the results back even faster.
If you have many users running many operations concurrently, you have many large batch operations, or you have many users allowed to write ad-hoc queries, you will need to be careful with parallelization. You can seriously degrade your system by letting things get out of hand. Adaptive parallelism is a feature to help control this. Oracle will throttle the degree of parallelism down when things start going bad but I prefer not to trust a production instance to this myself.
Oracle requires a few steps to set up parallelization in the database. Nothing huge, just need to think about some parameters. To keep this short, I will just send you to the documentation (the link above).
You can set the degree of parallel at various object levels but I have found it best to set my degree of parallel at the query or operation level. You do this with the parallel hint select /*+ parallel(foo, 4) */ sum(bar) from foo; or you can force parallel using an alter session. If you want to use parallel operations on DML (insert, update, delete), you need to alter session enable parallel dml.
Some last thoughts on parallel processing.
There are at least a half dozen other features of the database, not to mention general performance improvement techniques, that you should look at before parallelizing. Be sure you need what parallel gives you before using it.Parallel processing is a very specific solution to a very specific type of problem. Think of parallelism as applying muscle to fix a problem. It’s not always the best solution and if it’s the muscle is in the forehead, well…
Want to know how fast parallel can make your system? Do you know how fast it is now? No? Do that first. How fast do you need it to be? Don’t know? Do that next.
Are you meeting all of your SLAs? Yes? Then why do you want to parallelize?
Read the Oracle documentation. It has plenty of best practices and tips for parallel. Believe them. Test them, but believe them.
If you see any oopsies or typos, please let me know.
You can follow any responses to this entry through the RSS 2.0 feed. Both comments and pings are currently closed.