wallacel -- Thanks for the question regarding "design the table", version 8.1.6
You Asked
Tom: when doing the database design , i have the following question. let's say we have emp table which have empid,empname,deptno, as well as empPicture, empGreeting, empFamilyPict. since we don't use empPicture , empGreeting( which is avi), empFamilypicture a lot, do you think it make sense to create another to store that? so instead of have one table emp(empid number, empname varchar2(100), deptno number, empPicture blob, empGreeting blob, empFamilyPict blob). we have two, one is emp(empid number, empname varchar2(100), deptno number) another is empBlob(empid number, empPicture blob, empGreeting blob, empFamilyPict blob). or latter on i change it to 4 table , like 1.emp(empid number,empname varchar2(100),deptno number) 2.empPict(empid number,empPicture blob); 3.empGreeting(empid number,empGreeting blob) 4.empFpict(empid number,empFamilyPict blob). does that make sense? which one is better? one table? two table? 4(multiple table)? Regards,
and we said...
1 table. With blobs, which are typically stored out of line, you already physically have the data elsewhere. Logically they belong together -- keep them together.
| Reviews | |
|---|---|
maximum no of columns per table July 4, 2002 - 3pm Central time zone Reviewer: nvssk from India If an entity is having say 100 fields, Is it suggested to create one single table? or split into multiple tables based on the most frequently accesses columns? Do we need to consider data block size? If we keep them in one single table what is the impact on performance? If an entity is having LONG and/or CLOB columns also, what all should be considered for creating tables ? Do we need to decide on storage parameters for each transaction table ? Followup July 4, 2002 - 3pm Central time zone: if an entity has 100 attributes, it has 100 attributes -- it'll be a single table. hopefully the average row (or max row actually) fits on a block -- but that is true whether you have 10, 100 or more columns A full scan will take longer -- but you should be asking "what is the performance impact if we split them out". Well, you'll have two (or N) primary key indexes to maintain, not just 1. You'll have n*(m+1) additional IO's for every row retrieve (where n = number of tables you choose, m = height of your index). It could really impact performance badly. do no use longs, period. when you use clobs, they are moved out of the table into their own segment when they exceed 4000 bytes, the work is done for you, no need to factor them out special. I size objects by small, medium, large and extra large, using a locally managed tablespace with uniform extents. Alternatively, you can use a LMT with system manged extents and not even worry about it yourself. Thank you very mcuh. Hi Tom, Thanks for answering the above questions. If a table D depends on table C ( D cant exist without C) and table C depends on table B, and table B depends on table A. What is the best way to design the keys so that it results in better performance ? Can we composite primary keys for the above scenario ? what are the other alternatives ? Do we need to take any diversion, if table A have composite primary key in the above example? Followup July 4, 2002 - 5pm Central time zone: You can certainly use composite keys if you choose. The smaller the key, the faster the compare -- however, you have to outweigh the costs of using a surrogate key and maintaining two unique indexes if you decide to 'shorten' the key by using a single ID field populated via a sequence for example. (there are tradeoffs). I tend to go with natural keys if the natural key is 1 to 3 columns (and NEVER changes), surrogate keys (id field populated via sequence) otherwise. How about using clusters? July 4, 2002 - 11pm Central time zone Reviewer: Ian Matyssik from Nihon Hello,
I understand that relative information better be in the same table. But wouldn't it be better
if we will separate our LOBs into different table, I say that because LOBs are usially big and some
times it takes time to store them them or update them. During that time I understand that the ROW
will be locked and other information will be impossible to update. I hope you understand what I am
trying to say. So I was thinking if it would be any better to use clusters as a PK on all related
tables (separated for logical or organizational reasons) . Could you please advice on that. Followup July 5, 2002 - 10am Central time zone: As I said before: ... when you use clobs, they are moved out of the table into their own segment when they exceed 4000 bytes, the work is done for you, no need to factor them out special ....... It doesn't matter that lobs are big, they are not stored in the table segment. If someone is updating the LOB, yes the row will be locked -- but that is the way it SHOULD BE. I believe this is one table, with one row per thing. LOBS do not take an inordinate amount of time to modify -- the row is protected from other updates, but reads of the data are not blocked. I do not view this as an issue. I would not introduce the complexity you describe, no. Should LOBs be stored in different Tablespace July 5, 2002 - 10am Central time zone Reviewer: Jerry from NY USA LOB Should be stored in different Tablespace , I think that is what Oracle recommends . If that will be the case then there would be need for two tables . Followup July 5, 2002 - 11am Central time zone: NO, you are missing the point of how lobs are stored.
they are stored in 3 pieces
o in the table, there is a lob locator (pointer)
o in the same tablespace as the table, there is a lob index (more pointers to all of the chunks)
o in another segment which may be in another tablespace (or not, upto you), are the lob chunks.
When you create a table with a LOB, there are 3 segments (objects) created -- the table itself, the
lob index (which will be in the same tablespace) and the lob segment (which you control)
We already "broke" all of this stuff out physically. There is NEVER a need for two tables here.
Look at the create table statement for a lob as shown by exp/imp:
ops$tkyte@ORA817DEV.US.ORACLE.COM> create table t ( x int, y clob );
Table created.
ops$tkyte@ORA817DEV.US.ORACLE.COM> !exp userid=/ tables=t
Export: Release 8.1.7.3.0 - Production on Fri Jul 5 11:12:50 2002
(c) Copyright 2000 Oracle Corporation. All rights reserved.
Connected to: Oracle8i Enterprise Edition Release 8.1.7.3.0 - Production
With the Partitioning option
JServer Release 8.1.7.3.0 - Production
Export done in WE8ISO8859P1 character set and WE8ISO8859P1 NCHAR character set
About to export specified tables via Conventional Path ...
. . exporting table T 0 rows exported
Export terminated successfully without warnings.
ops$tkyte@ORA817DEV.US.ORACLE.COM> !imp userid=/ full=y indexfile=t.sql
Import: Release 8.1.7.3.0 - Production on Fri Jul 5 11:13:02 2002
(c) Copyright 2000 Oracle Corporation. All rights reserved.
Connected to: Oracle8i Enterprise Edition Release 8.1.7.3.0 - Production
With the Partitioning option
JServer Release 8.1.7.3.0 - Production
Export file created by EXPORT:V08.01.07 via conventional path
import done in WE8ISO8859P1 character set and WE8ISO8859P1 NCHAR character set
. . skipping table "T"
Import terminated successfully without warnings.
ops$tkyte@ORA817DEV.US.ORACLE.COM> !cat t.sql
REM CREATE TABLE "OPS$TKYTE"."T" ("X" NUMBER(*,0), "Y" CLOB) PCTFREE 10
REM PCTUSED 40 INITRANS 1 MAXTRANS 255 NOLOGGING STORAGE(INITIAL 524288
REM FREELISTS 1 FREELIST GROUPS 1) TABLESPACE "USERS" LOB ("Y") STORE AS
REM (TABLESPACE "USERS" ENABLE STORAGE IN ROW CHUNK 8192 PCTVERSION 10
REM NOCACHE NOLOGGING STORAGE(INITIAL 524288 FREELISTS 1 FREELIST GROUPS
REM 1)) ;
REM ... 0 rows
You already have all of the control you need over the location of the lob, one table -- period. You are the Real Oracle GURU. July 5, 2002 - 12pm Central time zone Reviewer: A reader Wonderful and Thanks for the detailed explanation , Once again you are the best in the business. Easier understanding of table design July 6, 2002 - 1pm Central time zone Reviewer: nvssk from India Hi Tom, How can we achieve, easier developer understanding of the table design, without one explaining the table design document / ER Model. The problem reported is difficulty in understanding the keys and relationship. When an application is going to be developed in Java/JSP in bottom-up approach( table design first), Other than ER modeling, what specific features OO can be used in oracle ? what are the performance impacts ? Followup July 6, 2002 - 1pm Central time zone: #1) telepathy I suppose? #2) I would get database people and teach them Java/JSP. Or I would get database people -- let them do the database logic in finely tuned stored procedures and have the java/jsp guys just call the plsql code (NO SQL in java)... (of course, the java jsp programmers will forget to use bind variables and refuse to use prepared/callable statements more then once but hey -- at least the sql would run good) If you are setting out to write a high performance database application, write a database application. Don't write a "cool java/jsp" that happens to use a database -- the database is going to be where you succeed or fail, don't work AROUND it, with with it. Network data transfer July 8, 2002 - 7am Central time zone Reviewer: nvssk from India Hi Tom, If the table contains say 100 columns,and you access the whole object, Wont it be lot of data on network,especially when its a web based application. What is the performance impact due to this heavy data transfer ?Or Does it matter at all? Can it be solved by creating separate views( of the required columns) for the usage based on the requirement and pickup the data only from the views. Is there any impact on the joins? Thanks for your help. Followup July 8, 2002 - 8am Central time zone: don't access the whole object then -- don't be lazy, don't code "select *" (which in my humble opinion is a BUG in your code if you do! The fastest way to a broken program is to code: select * <<<<==== use of "*" should be limited to sqlplus testing insert into t values ( <<<==== didn't list the columns insert into t select ... <<<<===== ditto a simple drop and re-create with the columns in a different order or the addition or deletion of a column in the table BREAKS your application) SELECT only that data which you actually NEED. Don't use "*" and this entire conversation is rendered MOOT. Excuse the stupid question. Will Oracle not read the entire data block where the row is stored? What is the difference in the way Oracle read the data for a select * and for select col1, col2..etc? Followup July 8, 2002 - 7pm Central time zone: It isn't a stupid question -- it is actually more complex then you probably anticipate and takes a
couple of points to fully answer
o first, there is the "BUG" inherit in the "select *" code. A simple addition or deletion of a
column will generally "break" an application. One of the glaring exceptions to this is PLSQL with
implicit cursors.
for x in (select * from t)
will not break but is unnecessarily inefficient.
o second, there are different paths the optimizer can take to answer a query and it is very
dependent on the columns you select. For example:
big_table@ORA817DEV.US.ORACLE.COM> select object_name from big_table where rownum < 100000;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=212 Card=1459904 Bytes=36497600)
1 0 COUNT (STOPKEY)
2 1 INDEX (FAST FULL SCAN) OF 'OBJECT_NAME_IDX' (NON-UNIQUE) (Cost=212 C
ard=1459904 Bytes=36497600)
Statistics
----------------------------------------------------------
0 recursive calls
45 db block gets
583 consistent gets
0 physical reads
0 redo size
3228183 bytes sent via SQL*Net to client
2534 bytes received via SQL*Net from client
21 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
99999 rows processed
big_table@ORA817DEV.US.ORACLE.COM> select * from big_table where rownum < 100000;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=577 Card=1459904 Bytes=150370112)
1 0 COUNT (STOPKEY)
2 1 TABLE ACCESS (FULL) OF 'BIG_TABLE' (Cost=577 Card=1459904 Bytes=1503
70112)
Statistics
----------------------------------------------------------
0 recursive calls
18 db block gets
1397 consistent gets
0 physical reads
0 redo size
11972872 bytes sent via SQL*Net to client
2534 bytes received via SQL*Net from client
21 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
99999 rows processed
big_table@ORA817DEV.US.ORACLE.COM>
Look at the huge differences in consistent gets (blocks processed). This is due entirely to the
one query being answered via an index and the other full scanning. If you include columns in your
select list that FORCE us to goto the base table -- when you don't need to -- you are forcing a
less optimal query plan on us
o third, look at the difference in data transfered!. Almost 4x as much -- that could be huge.
o and the last thing I'll mention will be chained rows... The impact from this is subtle. If
you have chained rows -- and you do a select *, we'll be forced to construct the entire row (which
will take additional random IO's to find the "end" block for your row - we have to piece it
together). Now, if you select only the columns you need AND these columns are on the head
rowpiece, we'll never assemble the entire row -- we do just what we need to do to get your data.
So, yes, Oracle reads an entire block -- but by selecting only the columns you need you can:
o avoid reading that block in the first place (skipping the table access by rowid step in an index
range scan for example)
o avoid reading multiple blocks to construct the entire row
o open up better query plans, more query plans
o have less buggy code -- code that is much more resilient to change
o transmit lots less data over the network
and so on.... Great answer. Thanks Tom. So if Oracle reads the entire datablock (when the data is not available from an index) where the row is stored, why are you against limiting a table to the frequently accessed attributes of an entity? If an entity has 100 attributes and only 20 are accessed frequently, is it not a waste to read 100 columns when you actually need only 10-15 of them? Sorry if I am beating a dead horse. Also can you please tell more about the select * bug? Thanks. Followup July 9, 2002 - 12pm Central time zone: We read a block -- if you have 20 columns or 100 columns we read the block. If you only need 20 of them, and we need to read the block, we'll still read the block. We'll only process 20 of them. The select * bug is that if you have a program that does a select *, the * is resolved at runtime. If the number, order or types of the columns CHANGE over time -- your program breaks. If you selected ONLY those columns of interest -- only those changes that really truly affect you -- in fact affect you. We can ADD columns and you do not break. If we drop a column and you don't select that column -- you don't break -- if you did a SELECT * however, you would have in your FETCH command a place for this missing column and you would break. Parent child relationships July 11, 2002 - 6pm Central time zone Reviewer: nvssk from India Lets say, There are tables P, C1, C2, C3, C4 and P is the parent of C1 ..C4. If the application need to show C1,C2,C3,C4 togather for each row in P, what is the best way to model it? i.e, P1 C11 C12 C21 C22 C31 C32,etc. 1. If you manage as Parent child, one need to write UNION between queries i.e, select p.p1,c1.c11 from P,C1 where P.pc1=C.pc1 UNION select p.p1,c2.c21 from P,C1 where P.pc1=C.pc1 Is this an efficient way? OR 2.Can we have a table C in between P and C1,C2,C3,C4 where P maps to C (1:n) and C maps C1(1-1),C maps to C2(1-1) etc. This way we can get all the required data in one query, no need of unions. But, we will not be able to put referential constrains as C stored all primary keys of C1,C2,C3,C4 in one column. What is the right approach in both of the above ? Followup July 12, 2002 - 8am Central time zone: If P has a 1-1 relationship with 3 of the C* tables and either a 1-1 or 1-m (one to many) with 1 of the C* tables, you can write it as: select * from p, c1, c2, c3, c4 where p.key = c1.fkey and p.key = c2.fkey and p.key = c3.fkey and p.key = c4.fkey if any of the relationships are OPTIONAL, add the outer join (+) to the query. If more then 1 of the child tables is 1-m with P, then you cannot do it with a join (since if C1 has two rows that match with P and C2 has 3 rows -- you'll end up with 6 rows -- a cartesian production). You'll want to use a UNION ALL -- better then UNION Data Model September 13, 2002 - 12pm Central time zone Reviewer: Mark from CA Tom, I have the following requirements and could you please tell me whether I can achive this using the Logical model. It is a technical Membership info Design. 1.One business Type can have multiple comanies under it. 2.One company Must Belong to only on Business Type. 3.One company can have multiple employees 4.we are interested in only the employees with Tech Membership and they have to enroll every year.(we need the Historical Info also). 5.Each company Appoints multiple officers. 6.Each Business Type will have a Director who is selected from the Officers table. 7.Director for each business type should be from that business type only. I have designed the tables for the above requirements but I am having the problems in enforcing the Relation between Business Type and Company and the requirement 7 at the same time. Table1: Business Type Columns Bus_Type (Primary Key) Table2: Company Columns Company Code (Primary Key) Bus_Type (Foreign Key (table1)) Table3: Member Columns Member Id PK Enrollment Year PK Company Code PK (FK to Table 2) Table4 Officer Table Officer Id PK (FK to table3) Year PK (FK to Table3) Company Code PK (FK to Table3) Officer Role (Non Key Column) Directors Bus_Type PK (FK to Table1) Year PK (FK to Table4) Director Id (FK to Table4 column Officer Id) Directors Company (FK to Table4 Column Company Code) But In the above model I can not enforce the Requirement 7 which is Director of the bus type Must belong to that bussiness Type. Could you please tell me how to fit all the above requirements in One logical model. One or two tables? January 10, 2003 - 10am Central time zone Reviewer: John from New York Get back to how to design tables:
I have a system where we need to generate alerts. The alert will be stored in an alert table,
and end user will see them based on queries. For each alert, we need some basic information, such
as alert code, and create date. We also need to store the detailed information for the event that
causes the alert, and that detailed information varies depending on the alert. In another words,
different kinds of detailed information need to be captured for different alert. I am now
considering three options, but not sure what is the best way to handle this.
Alerts are generated based on ALERT_DEFINITION table:
|alert_definition|-------<|alert_record|
(1 to M relationship)
The 3 options I am considering are:
1. Create one ALERT_RECORD table, and create columns that will be needed to capture different types
of alert detail information. For columns that don't apply to a particular type of alert, the value
will be null.
The problem is that the information is not stored in a logical way. For each row (alert) in
the table, there will be lots of columns that are not applicable to it, and therefore have a value
of NULL. There may well be confusions down the road as to which column apply to which type of
alert.
2. Create one ALERT_RECORD table for each alert type. Information will be stored in a logical way,
but I will end up with 20 plus tables just for alert detail information. That will also require
extra Java programming work to present those information.
3. Create one ALERT_RECORD table, but only use it to store the basic alert information (alert code,
and create date). Create another table called ALERT_RECORD_DETAIL to store the detailed
information. The alert details will be stored as Display_heading and display_value. For alert
detail like this:
display1 display2 display3
value1 value2 value3
will be stored in ALERT_RECORD_DETAIL as 3 records:
display_heading display_value
-------------- --------------
display1 value1
display2 value2
display3 value3
and I probably need another column to store the display order of the heading.
This design will require lots of parsing in the Java code to present the information.
|alert_definition|-------<|alert_record|--------<|alert_rec_detail|
Is there any other better way to handle this situation? Thanks.
Followup January 10, 2003 - 10am Central time zone: No version info. Ok, assuming you are using the latest software: option four create table alert_record( alert_code int, alert_date date, alert_info XMLTYPE ) or, in earlier releases option four create table alert_record( alert_code int, alert_date date, alert_info clob ) if you don't need to have the data structured. You can use Oracle text (intermedia) to index the xmltype or clob for searching if that is needed. Thanks. February 4, 2003 - 3pm Central time zone Reviewer: John from New York We are trying to design our database to accomendate Oracle 8.1.7 and up. So I guess the XMLTYPE option is out. The alert detail information I need to store and display later to users are: product_id, packing_list, acknowledge_date, etc. Those information have to be stored in a structured way. I don't know how can I display them if I use XMLTYPE or clob? Followup February 4, 2003 - 4pm Central time zone: original question was: ... I have a system where we need to generate alerts. The alert will be stored in an alert table, and end user will see them based on queries. For each alert, we need some basic information, such as alert code, and create date. We also need to store the detailed information for the event that causes the alert, and that detailed information varies depending on the alert. In another words, different kinds of detailed information need to be captured for different alert. I am now considering three options, but not sure what is the best way to handle this. 1. Create one ALERT_RECORD table, and create columns that will be needed to capture different types of alert detail information. ... 2. Create one ALERT_RECORD table for each alert type.... 3. Create one ALERT_RECORD table, (then a table with variable name/value pairs)..... #1 is the easiest #2 can be thought of as a variation on #1 -- think "views" #3 would be avoided by me if at all possible. People are going to want to query "where product_id = 55 and acknowledge date between A and B or packing_list = C". Try doing that with variable name/values So, I guess I vote for #1 with the #2 variation on a theme (views) to remove the ambiguity of who needs what data... store as segname May 30, 2003 - 11am Central time zone Reviewer: Mark Rynbeek from Cambridge, UK Hi Tom,
thanks for the CLOB info I can find here. I have a question regarding the "store as segname" option
for CLOBS.
We have to design a table with biological data, proteins. These proteins have a sequence of
amino-acids, represented by a long letter string. On average, they're not really long (only .05% is
longer than 4000 characters) but still you need the CLOB.
There seems to be some consensus here that CLOB data should be in its own tablespace, physically
but not logically. So you get something like :
CREATE TABLE protein
( x INTEGER
, Y VARCHAR2(50)
, z CLOB
)
TABLESPACE t1
lob (z)
STORE AS SEGNAME (TABLESPACE t2
PCTVERSION 0
CACHE READS
NOLOGGING
)
;
Does that make sense to you? What would be the benefit - can you tune t2 differently than t1
because you only store CLOBS in there?
I would think, regarding the percentage that most data would fit in the row itself anyway that we
should not bother about all this (but I'm lazy by nature)
Cheers,
Mark. Followup May 30, 2003 - 12pm Central time zone: you'd have to ask the people wanting the separate tablespace "why". the goal is to achieve even IO over all devices. If segrating this way helps that, that is good. That nologging is a bit dangerous no? do you not care about these proteins? (there is a word I never thought I would use myself in a sentence here ;) Question answered elsewhere May 30, 2003 - 12pm Central time zone Reviewer: Mark Rynbeek from still in Cambridge hi Tom, regarding my previous question I think you already answered something likewise on http://asktom.oracle.com/pls/ask/f?p=4950:8:234994012547885035::NO::F4950_P8_DISPLAYID,F4950_P8_CRIT ERIA:386618744661, Sorry about that, so never mind my question. Your website is just too big ! Tom: Can you give me your opinion on the following: I have a survey form that I want to put on the web for users. The paper form has a lot questions and checkboxes. A user can check all checkboxes if he wants. For example, one question can have Languages Spoken? English French Spanish Chinese Japanese Other _________ Here is my question. If you are designing a table to store all this info is it better to have one table with columns like: create table survey as (survey_id, language1, language2, language3, language4, language5 language_other ); or You would do 3 tables as (one main table and one lookup table and one child data table that can have multiple records for each language: create table survey as ( survey_id, language ) create table language as ( lang_Code, language) create table survey_language as ( survey_id, lang_code, lang_other) 2. The second question is that I have a lot of these questions so if I design it in one table it will be huge. If I break them into lookup tables and child tables like optoin two, would it be better to put all code like languages, colors, etc. into one tables of codes or you would break it up to several lookup tables one for colors and one for languages etc. 3. Can you tell me the advantage/disadvantages of each approach from oracle stand point of view? Followup July 22, 2003 - 7am Central time zone: depends -- do you know for 100% sure how many lanaguages there will be. surveys are an example where the "funky data model" as I call it -- where you have this very generic model -- might make sense. It allows very flexible storage -- it stinks for querying - you might have to denormalize for quering later -- so you would have columns spanish, english, french (not lang1, lang2, lang3). so you can query "select count(*) from t where survey_id = 55 and english = 1 and spanish = 1" to find all surveys filled out by people that speak spanish and english. Tom: Do I understand that you are in favor of option 1 for one big table for the whole survey where i define one column for each checkbox value. The table may end up with more than 100 columns. I was planning to create two other tables, one is a lookup table for all languages, and other is where i store the survey_id and language code. The query would then be a two table join or three table join. Is there a way oracle recommends doing survey kind of design? 2. ALso, if you have several lookup tables would you recommend combining all of it into one (code,description) or create one table for each category. Followup July 22, 2003 - 8pm Central time zone: 1) do it in the funky data model and then DENORMALIZE if possible for reporting. 2) a table per lookup type. another table design question August 11, 2003 - 12pm Central time zone Reviewer: A reader We have a situation where there are different types of data (type being a categorization of a trow in a table) - let us say these are "type1", type2.(in our case there are 10 or so such types) Now each of these have same set of information stored in a table (same columns - col1, col2, ....col20) Is it better to store them all in one table (with a type column in addition to the above columns ) or is it better to separate them into 10 different tables (with col1, col2, col3 etc..) Most of the times these tables need not be queried together in a join. What are the pros/cons of creating multiple tables of the form type1_table (col1, col2, ...), type2_table (col1, col2, ...) etc.. versus one table single_table( type, col1, col2...) Thanx! Followup August 11, 2003 - 2pm Central time zone: if they all store the same set of information -- then your "type" column is nothing more then yet another attribute -- no more or less relevant then col1, col2, col3 so, unless there is more then meets the eye here -- one table. multi user problem October 14, 2003 - 6am Central time zone Reviewer: A reader Hi Tom For report query i am using temporary table.I am taking required information from different table then inserting into a temporary table then report generate.If anyone want to use this report then error message.so my question is 1)what is useful process to generate report in my mentioned criteria. 2)If my system is to generate serial number.what procedure should i take? say i have to generate serial number for different person.in my form the operator fillup the whole screen then he click in the generate button. Everything is for multi user. with regards, Followup October 14, 2003 - 7am Central time zone: use a real global temporary table no concurrency issues whatsoever read about them in the sql reference guide "create global temporary ...." Column allocation and performance issues. October 21, 2003 - 11pm Central time zone Reviewer: Alvin from Philippines I'm in a system where table columns are defined as Create table T ( c1 varchar2(2056) , c2 varchar2(2056) ); when the data that's actually stored are never more than 100 characters. 1. Is there any performace issue on such allocations ? Or columns being varchar2 will be internally optimized by oracle. 2. How do i measure the impact of such allocations as compared defining the table as such Create table T2 ( c1 varchar2(200) , c2 varchar2(200) ); Followup October 22, 2003 - 6am Central time zone: 1) i wrote about this in my new book "Effective Oracle By Design". Some of the things I said about
such nonsense was:
...
In addition to using the proper base datatype such as number, date, or string, you should also use
the most specific type you can. For example, use VARCHAR2(30) for a field that is up to 30
characters in length; do not use VARCHAR2(4000).
Ask Tom
"I work with a modelers group. My modeler would like to define every VARCHAR2 field with the
maximum length, which means that a table with 20 VARCHAR2 fields will all be defined with a maximum
of 2000 or 4000 bytes. I tried to talk to him about the reasons we identify data with correct
lengths and names in order to understand what we have in our database. He told me that it doesn't
matter, since Oracle just stores the length, etc., and there is no overhead. I don't believe this
is true, but have been jumping between so many databases that I cannot find a document on the
internals of Oracle storage. Can you help me out here with this question?"
My gut response was, "This is your data modeler, my goodness!" They are the ones who are supposed
to be telling you that it is vital to use the appropriate length when defining fields! That is
their job. Let's just forget about things like storage for a minute, why don't we ask him:
. What is going to happen when users pull this up in a query tool that formats each field based on
the width of the column in the database? They'll see one column and need to scroll way over to see
the second, the third, and so on.
. Say the code prepares a query that selects ten columns that are VARCHAR2. The developers, for
performance, would like to array fetch (very important). They would like to array fetch say 100
rows (very typical). So, you have 4,000 ´ 10 ´ 100 = almost 4MB of RAM the developers must
allocate! Now, consider if that were ten VARCHAR2(80) fields (it's probably much smaller than
that). That's about 78KB. Ask the data modeler how much RAM he is willing to kick in for this
system.
. Now, the developers start to build a data-entry screen to put data into the database. Wow, that
code field can be 4,000 characters long and that first name can be 4,000 characters long. How the
heck is anyone going to know what sort of data can really go in there?
Tell your data modeler group members that they need to consider the length as a constraint. Just as
they use primary and foreign keys, they should use the proper and correct length on fields. You can
always expand a field via a command like alter table t modify c varchar2(bigger_number). There is
no reason to use the maximum length everywhere. It will hurt the applications you develop, because
they will mistakenly allocate many megabytes of RAM. Just think of the array fetch example with an
application server. Now, it's not just 4MB; it's 4MB ´ number of connections. You are talking some
real memory here for a single query, and you'll be doing a lot of them at the same time.
End Ask Tom
A CHAR(2000) will consume 2,000 bytes of storage whether you put in the letter a, the string `hello
world', or 2,000 characters. A CHAR is always blank-padded. Additionally, are you thinking about
using an occasional index in your system? If so, beware of storage-related problems. Consider the
following:
tkyte@ORA817.US.ORACLE.COM> create table t ( a varchar2(4000), b varchar2(4000));
Table created.
tkyte@ORA817.US.ORACLE.COM> create index t_idx on t(a);
create index t_idx on t(a)
*
ERROR at line 1:
ORA-01450: maximum key length (3218) exceeded
I've noticed that 9i seems to have magically improved upon this CM
NOTE: In Oracle9i, the maximum key length is larger - but the restriction still exists. For
example, an index on T(a,b) would raise: ORA-01450: maximum key length (6398) exceeded in Oracle9i.
My system has an 8KB block size. I would need to use at least a 16KB block size to index a single
column, but even then, if I tried to create a concatenated index on T(A,B), it would fail there!
The same holds true for your numbers and the new Oracle9i TIMESTAMP datatypes: When appropriate,
use scales and precisions on those fields in order to better define your data integrity and to give
applications that much more information about the data itself.
In short, never be tempted to use anything other than a DATE or TIMESTAMP datatype to hold a date
value, and never be tempted to use a VARCHAR2 to hold a number. Use the appropriate and correct
type for each to ensure maximum performance, and more important, to protect your data integrity.
2) it won't affect raw performance, the costs of doing such a thing are found elsewhere. Order of column in a table October 22, 2003 - 6am Central time zone Reviewer: Ashiq Shamsuden A from Trichy, Tamil Nadu, India Hi Tom,
When you create a table , how you design it ? For example if I've table and it's going to have
100 columns and the table will have all kinds of datatypes. It'll be having composite primary
key(of three columns).
Total of 100 columns ,the datatype what i'm going to have is
44 --> varchar2's
20 --> date's
35 --> number's
1 --> clob
How to order the columns for the creation , so I should get benefit by design properly.
Followup October 22, 2003 - 7am Central time zone: put the most frequently accessed columns "first"
put the columns most likely to be null "last"
but most importantly -- don't over analyze it. it won't make a tremendous amount of difference at
the end of the day.
I just always put the primary keys first (convention) and the follow with the columns as they "make
sense". when I think of a person for example:
person( unique_id primary key,
first_name,
last_name,
middle_initial,
address,
.....
the data just follows in the "natural order" i see it in my head. Tom, I know one third party application that has all the key columns exactly at the middle of the column list, in a table. The application is supposed to work with any database. How would you rate this? Thanks Followup October 22, 2003 - 12pm Central time zone: confusing but not a big deal I have both your books and i just read the first chapter of the new book. Followup on my earlier question. How would oracle store varchar2(2000) and varchar2(30) in the block buffer cache ? won't the former entail additional overhead of any kind ? My developers are a tad greedy on allocations and i want to show them the effects of such (if any). Followup October 23, 2003 - 8am Central time zone: they will not be stored any differently, no. the effects are all in the client, in the loss of data integrity, in the loss of meaning, in the sloppiness of the implementation. cool info on column ordering thanks October 23, 2003 - 10am Central time zone Reviewer: Kevin Meade from Norwich CT, USA Thanks Tom, may I offer my 2 cents; I would note regarding column ordering these clarifications: 1) if the reader has read this thread, you have explained why primary key comes first (some modelers also suggest foreign keys come next (eh..)), 2) also you have explained why "most used" columns come second, 3) I would mention that "likely to be null" columns at the end of a row is as I recall because null columns at the end of a row take no space on the row but null columns between columns with values take 1 byte (ooh boy jumping at the space savings). As always though you hit the nail on the head with "don't over analyze, do what makes sense". I can't say I every paid much attention to most of this as there are always more important fish to fry. Perfection is never the goal (ooh, did I say never?). My I also suggest, there is always alphabetical order (hmmm...). Followup October 23, 2003 - 1pm Central time zone: i remember once working as a coder -- doing some Ada programming. the "senior reviewers" wanted me to code all subroutines in alphabetical order, that last comment reminded me of that. I hated that! (i made them really made by using "renames" in ada to give the appearance of compliance without really reordering any code -- i just added AA_ AB_ AC_ to the front of each routine and then renamed AA_proc1 to proc1 :) clob clarification October 23, 2003 - 11am Central time zone Reviewer: Rob from Pittsburgh, PA Tom: When you say: when you use clobs, they are moved out of the table into their own segment when they exceed 4000 bytes Does this mean that all clobs from each row are moved, or just the ones where the the clob in a particular row exceeds 4000 bytes. Thanks, Rob Followup October 23, 2003 - 1pm Central time zone: just the ones that are bigger then 4000 Storing Real Numbers and Pseudo-Numbers October 23, 2003 - 12pm Central time zone Reviewer: Dave from Colorado Quote: "... never be tempted to use a VARCHAR2 to hold a number" When storing such entities as Zip codes, Social Security Numbers, credit card numbers, telephone numbers etc., I have always used Char(), or VarChar2(), because my view was that these are not "real" numbers. They are codes for which all of the characters happen to be numeric, and any operation that the system is going to perform will be character-based. For example, formatting a telephone number as (999) 999-9999, or viewing the last four of an SSN. We're certainly never going to perform any algebraic op's on them, like adding one zip code to another, and we can easily apply check constraints to ensure that ... Translate(SSN,'0123456789','0000000000')='000000000' ... for example. What's your opinion, Tom? Does this seem like a valid distinction, "real" and "pseudo" numbers? Followup October 23, 2003 - 1pm Central time zone: they are not numbers as you said. zip codes in particular are "postal codes" in general and are alpha numeric in reality. so, since they are not numbers, store in a string -- you'll never treat them as numbers so it's ok. Length of primary key October 31, 2003 - 3am Central time zone Reviewer: Kunjan Mehta from Bangalore, India Hi Tom, We are using surrogate keys as the primary keys. We are using oracle sequences to generate surrogate keys. I believe, we can generate upto Number(27) through the sequences. is it correct? If not, is there any limit on the largest number being generated by sequence? What is it? Is there any desired maximum length for the primary key? (e.g. it should not be more than 19 digits or something like that)? We are desinging a customized ERP. Is it advisable to use a single sequence for all the surrogate keys across the entire application? (approx 1000 tables in total) or should each table have one dedicated sequence or each logical group (say tables for Accounts, tables for materials etc.) should use a dedicated sequences each? (we're using 9iR1 as database.) Followup October 31, 2003 - 7am Central time zone: ops$tkyte@ORA920LAP> select rpad('9',27,'9')/60/60/24/366/1000 from dual;
RPAD('9',27,'9')/60/60/24/366/1000
----------------------------------
3.1623E+16
27 9's is a big big number. that shows it would take 3.16e16 YEARS to exhaust the sequence
generating 1000 numbers a second
the number of digits is not going to have much bearing on anything. it'll be just fine to use a
single number column as big as it gets.
You should not have a single sequence. see
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:2985886242221 not exhausting sequence, but what about latches October 31, 2003 - 8am Central time zone Reviewer: freek from Belgium Tom, Using 1 sequence for an entire application wouldn't exhaust the sequence, but wouldn't it be still better to use 1 sequence per table / column to reduce waits? Followup October 31, 2003 - 8am Central time zone: you know what, i didn't read the entire thing. thanks -- i'll finish my answer now :) What about Table APIs November 3, 2003 - 7am Central time zone Reviewer: A P Clarke from London, England Towards the top of this thread you said: >> SELECT only that data which you actually NEED. This is my preferred approach (got burned by SELECT * a long while back). Howvever, there are a number of respected practioners out there who recommend using Table APIs i.e. PL/SQL encapsulation packages that act as the sole method of interacting with a table. The advantages are security and (allegedly) performance - no multiple versions of the same SELECT statement knocking around, just one "tuned" one. However, it seems to me that the SLCT() functions in Table APIs - such as the ones generated by Oracle Designer - must always select every columns, unless we code a SLCT() function for every single permutation of column, which I think is impractical. I have survived a long time without using Table APIs but I got doubts when I read Steven Feuerstein's PL/SQL Best Practices book. What do you think? Can table APIs be reconciled with "only SELECT what you need"? Cheers, APC P.S. I'm not trying to set up some PL/SQL version of "Clash of the Titans" here :P Followup November 3, 2003 - 8am Central time zone: well, table API's to me are somewhat weak in the first place -- or just have special use cases (simple CRUD | |
maximum no of columns per table July 4, 2002 - 3pm Central time zone
Thanks July 4, 2002 - 3pm Central time zone
Should LOBs be stored in different Tablespace July 5, 2002 - 10am Central time zone