"design the table", version 8.1.6

wallacel -- Thanks for the question regarding "design the table", version 8.1.6

Submitted on 14-Feb-2001 19:27 Central time zone
Last updated 16-Feb-2010 11:12

You Asked

Tom:

when doing the database design , i have the following question.

let's say we have emp table which have empid,empname,deptno,
as well as empPicture, empGreeting, empFamilyPict.
since we don't use empPicture , empGreeting( which is avi),
empFamilypicture a lot, do you think it make sense to create 
another to store that?

so instead of have one table 
emp(empid number,
empname varchar2(100),
deptno number,
empPicture blob, 
empGreeting blob, 
empFamilyPict blob).


we have two, one is
emp(empid number,
empname varchar2(100),
deptno number)

another is 
empBlob(empid number,
empPicture blob, 
empGreeting blob, 
empFamilyPict blob).


or latter on i change it to 4 table , like

1.emp(empid number,empname varchar2(100),deptno number)
2.empPict(empid number,empPicture blob);
3.empGreeting(empid number,empGreeting blob)
4.empFpict(empid number,empFamilyPict blob).

 
does that make sense? which one is better?
one table? two table? 4(multiple table)?
Regards,


 

and we said...

1 table.

With blobs, which are typically stored out of line, you already physically have the data 
elsewhere.  Logically they belong together -- keep them together.

 

Reviews
4 stars maximum no of columns per table July 4, 2002 - 3pm Central time zone
Reviewer: nvssk from India
If an entity is having say 100 fields, Is it suggested to create one single table? or split into 
multiple tables based on the most frequently accesses columns? Do we need to consider data block 
size? 

If we keep them in one single table what is the impact on performance?

If an entity is having LONG and/or CLOB columns also, what all should be considered for creating 
tables ?

Do we need to decide on storage parameters for each transaction table ? 


Followup July 4, 2002 - 3pm Central time zone:

if an entity has 100 attributes, it has 100 attributes -- it'll be a single table.  hopefully the 
average row (or max row actually) fits on a block -- but that is true whether you have 10, 100 or 
more columns

A full scan will take longer -- but you should be asking "what is the performance impact if we 
split them out".  Well, you'll have two (or N) primary key indexes to maintain, not just 1.  You'll 
have n*(m+1) additional IO's for every row retrieve (where n = number of tables you choose, m = 
height of your index).  It could really impact performance badly.

do no use longs, period.

when you use clobs, they are moved out of the table into their own segment when they exceed 4000 
bytes, the work is done for you, no need to factor them out special.

I size objects by small, medium, large and extra large, using a locally managed tablespace with 
uniform extents.  Alternatively, you can use a LMT with system manged extents and not even worry 
about it yourself. 
5 stars Thanks July 4, 2002 - 3pm Central time zone
Reviewer: nvssk from India
Thank you very mcuh. 


5 stars Primary key July 4, 2002 - 4pm Central time zone
Reviewer: nvssk from India
Hi Tom,
Thanks for answering the above questions.

 If a table D depends on table C ( D cant exist without C)
and table C depends on table B, 
and table B depends on table A.

What is the best way to design the keys so that it results in better performance ? Can we composite 
primary keys for the above scenario ? what are the other alternatives ?

Do we need to take any diversion, if table A have composite primary key  in the above example?


 


Followup July 4, 2002 - 5pm Central time zone:

You can certainly use composite keys if you choose. 

The smaller the key, the faster the compare -- however, you have to outweigh the costs of using a 
surrogate key and maintaining two unique indexes if you decide to 'shorten' the key by using a 
single ID field populated via a sequence for example. (there are tradeoffs).

I tend to go with natural keys if the natural key is 1 to 3 columns (and NEVER changes), surrogate 
keys (id field populated via sequence) otherwise. 
5 stars How about using clusters? July 4, 2002 - 11pm Central time zone
Reviewer: Ian Matyssik from Nihon
Hello,
     I understand that relative information better be in the same table. But wouldn't it be better 
if we will separate our LOBs into different table, I say that because LOBs are usially big and some 
times it takes time to store them them or update them. During that time I understand that the ROW 
will be locked and other information will be impossible to update. I hope you understand what I am 
trying to say. So I was thinking if it would be any better to use clusters as a PK on all related 
tables (separated for logical or organizational reasons) . Could you please advice on that. 


Followup July 5, 2002 - 10am Central time zone:

As I said before:

...
when you use clobs, they are moved out of the table into their own segment when 
they exceed 4000 bytes, the work is done for you, no need to factor them out 
special
.......


It doesn't matter that lobs are big, they are not stored in the table segment.

If someone is updating the LOB, yes the row will be locked -- but that is the way it SHOULD BE.  

I believe this is one table, with one row per thing.  LOBS do not take an inordinate amount of time 
to modify -- the row is protected from other updates, but reads of the data are not blocked.  I do 
not view this as an issue.

I would not introduce the complexity you describe, no.
 
3 stars Should LOBs be stored in different Tablespace July 5, 2002 - 10am Central time zone
Reviewer: Jerry from NY USA
LOB Should be stored in different Tablespace , I think that is what Oracle recommends . If that 
will be the case then there would be need for two tables .


 


Followup July 5, 2002 - 11am Central time zone:

NO, you are missing the point of how lobs are stored.

they are stored in 3 pieces

o in the table, there is a lob locator (pointer)
o in the same tablespace as the table, there is a lob index (more pointers to all of the chunks)

o in another segment which may be in another tablespace (or not, upto you), are the lob chunks.

When you create a table with a LOB, there are 3 segments (objects) created -- the table itself, the 
lob index (which will be in the same tablespace) and the lob segment (which you control)


We already "broke" all of this stuff out physically.  There is NEVER a need for two tables here.  

Look at the create table statement for a lob as shown by exp/imp:

ops$tkyte@ORA817DEV.US.ORACLE.COM> create table t ( x int, y clob );
Table created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> !exp userid=/ tables=t
Export: Release 8.1.7.3.0 - Production on Fri Jul 5 11:12:50 2002
(c) Copyright 2000 Oracle Corporation.  All rights reserved.
Connected to: Oracle8i Enterprise Edition Release 8.1.7.3.0 - Production
With the Partitioning option
JServer Release 8.1.7.3.0 - Production
Export done in WE8ISO8859P1 character set and WE8ISO8859P1 NCHAR character set
About to export specified tables via Conventional Path ...
. . exporting table                              T          0 rows exported
Export terminated successfully without warnings.

ops$tkyte@ORA817DEV.US.ORACLE.COM> !imp userid=/ full=y indexfile=t.sql
Import: Release 8.1.7.3.0 - Production on Fri Jul 5 11:13:02 2002
(c) Copyright 2000 Oracle Corporation.  All rights reserved.
Connected to: Oracle8i Enterprise Edition Release 8.1.7.3.0 - Production
With the Partitioning option
JServer Release 8.1.7.3.0 - Production
Export file created by EXPORT:V08.01.07 via conventional path
import done in WE8ISO8859P1 character set and WE8ISO8859P1 NCHAR character set
. . skipping table "T"                                

Import terminated successfully without warnings.

ops$tkyte@ORA817DEV.US.ORACLE.COM> !cat t.sql

REM  CREATE TABLE "OPS$TKYTE"."T" ("X" NUMBER(*,0), "Y" CLOB) PCTFREE 10 
REM  PCTUSED 40 INITRANS 1 MAXTRANS 255 NOLOGGING STORAGE(INITIAL 524288 
REM  FREELISTS 1 FREELIST GROUPS 1) TABLESPACE "USERS" LOB ("Y") STORE AS 
REM  (TABLESPACE "USERS" ENABLE STORAGE IN ROW CHUNK 8192 PCTVERSION 10 
REM  NOCACHE NOLOGGING STORAGE(INITIAL 524288 FREELISTS 1 FREELIST GROUPS 
REM  1)) ;
REM  ... 0 rows


You already have all of the control you need over the location of the lob, one table -- period. 
3 stars You are the Real Oracle GURU. July 5, 2002 - 12pm Central time zone
Reviewer: A reader
Wonderful and Thanks for the detailed explanation , Once again you are the best in the business. 


5 stars Easier understanding of table design July 6, 2002 - 1pm Central time zone
Reviewer: nvssk from India
Hi Tom,
How can we achieve, easier developer understanding of the table design, without one explaining the 
table design document / ER Model.
The problem reported is difficulty in understanding the keys and relationship.

When an application is going to be developed in Java/JSP in bottom-up approach( table design 
first), Other than ER modeling, what specific features OO can be used in  oracle ? what are the 
performance impacts ?

 


Followup July 6, 2002 - 1pm Central time zone:

#1) telepathy I suppose?  

#2) I would get database people and teach them Java/JSP.  Or I would get database people -- let 
them do the database logic in finely tuned stored procedures and have the java/jsp guys just call 
the plsql code (NO SQL in java)...  (of course, the java jsp programmers will forget to use bind 
variables and refuse to use prepared/callable statements more then once but hey -- at least the sql 
would run good)

If you are setting out to write a high performance database application, write a database 
application.  Don't write a "cool java/jsp" that happens to use a database -- the database is going 
to be where you succeed or fail, don't work AROUND it, with with it. 
5 stars Network data transfer July 8, 2002 - 7am Central time zone
Reviewer: nvssk from India
Hi Tom,
If the table contains say 100 columns,and you access the whole object, Wont it be lot of data on 
network,especially when its a web based application.

What is the performance impact due to this heavy data transfer ?Or Does it matter at all?

Can it be solved by creating separate views( of the required columns) for the usage based on the 
requirement and pickup the data only from the views.

Is there any impact on the joins?
Thanks for your help.
 


Followup July 8, 2002 - 8am Central time zone:

don't access the whole object then -- don't be lazy, don't code "select *" (which in my humble 
opinion is a BUG in your code if you do!  The fastest way to a broken program is to code:

select *    <<<<==== use of "*" should be limited to sqlplus testing

insert into t values (   <<<==== didn't list the columns
insert into t select ...   <<<<===== ditto


a simple drop and re-create with the columns in a different order or the addition or deletion of a 
column in the table BREAKS your application)


SELECT only that data which you actually NEED.  Don't use "*" and this entire conversation is 
rendered MOOT.
 
4 stars July 8, 2002 - 3pm Central time zone
Reviewer: A reader
Excuse the stupid question. Will Oracle not read the entire data block where the row is stored? 
What is the difference in the way Oracle read the data for a select * and for select col1, 
col2..etc?  


Followup July 8, 2002 - 7pm Central time zone:

It isn't a stupid question -- it is actually more complex then you probably anticipate and takes a 
couple of points to fully answer

o first, there is the "BUG" inherit in the "select *" code. A simple addition or deletion of a 
column will generally "break" an application. One of the glaring exceptions to this is PLSQL with 
implicit cursors.  

   for x in (select * from t)

will not break but is unnecessarily inefficient.  

o second, there are different paths the optimizer can take to answer a query and it is very 
dependent on the columns you select.  For example:

big_table@ORA817DEV.US.ORACLE.COM> select object_name from big_table where rownum < 100000;

Execution Plan
----------------------------------------------------------
   0  SELECT STATEMENT Optimizer=CHOOSE (Cost=212 Card=1459904 Bytes=36497600)
   1    0  COUNT (STOPKEY)
   2    1    INDEX (FAST FULL SCAN) OF 'OBJECT_NAME_IDX' (NON-UNIQUE) (Cost=212 C
ard=1459904 Bytes=36497600)

Statistics
----------------------------------------------------------
          0  recursive calls
         45  db block gets
        583  consistent gets
          0  physical reads
          0  redo size
    3228183  bytes sent via SQL*Net to client
       2534  bytes received via SQL*Net from client
         21  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
      99999  rows processed

big_table@ORA817DEV.US.ORACLE.COM> select * from big_table where rownum < 100000;

Execution Plan
----------------------------------------------------------
   0 SELECT STATEMENT Optimizer=CHOOSE (Cost=577 Card=1459904 Bytes=150370112)
   1    0  COUNT (STOPKEY)
   2    1    TABLE ACCESS (FULL) OF 'BIG_TABLE' (Cost=577 Card=1459904 Bytes=1503
70112)

Statistics
----------------------------------------------------------
          0  recursive calls
         18  db block gets
       1397  consistent gets
          0  physical reads
          0  redo size
   11972872  bytes sent via SQL*Net to client
       2534  bytes received via SQL*Net from client
         21  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
      99999  rows processed

big_table@ORA817DEV.US.ORACLE.COM> 


Look at the huge differences in consistent gets (blocks processed).  This is due entirely to the 
one query being answered via an index and the other full scanning.  If you include columns in your 
select list that FORCE us to goto the base table -- when you don't need to -- you are forcing a 
less optimal query plan on us

o third, look at the difference in data transfered!.  Almost 4x as much -- that could be huge.

o and the last thing I'll mention will be chained rows...  The impact from this is subtle.  If 
you have chained rows -- and you do a select *, we'll be forced to construct the entire row (which 
will take additional random IO's to find the "end" block for your row - we have to piece it 
together).  Now, if you select only the columns you need AND these columns are on the head 
rowpiece, we'll never assemble the entire row -- we do just what we need to do to get your data.


So, yes, Oracle reads an entire block -- but by selecting only the columns you need you can:

o avoid reading that block in the first place (skipping the table access by rowid step in an index 
range scan for example)

o avoid reading multiple blocks to construct the entire row

o open up better query plans, more query plans

o have less buggy code -- code that is much more resilient to change

o transmit lots less data over the network

and so on.... 
4 stars July 9, 2002 - 12pm Central time zone
Reviewer: Anil Sekhar from New York
Great answer. Thanks Tom.

So if Oracle reads the entire datablock (when the data is not available from an index) where the 
row is stored, why are you against limiting a table to the frequently accessed attributes of an 
entity? If an entity has 100 attributes and only 20 are accessed frequently, is it not a waste to 
read 100 columns when you actually need only 10-15 of them? Sorry if I am beating a dead horse.

Also can you please tell more about the select * bug?

Thanks.

 


Followup July 9, 2002 - 12pm Central time zone:

We read a block -- if you have 20 columns or 100 columns we read the block.

If you only need 20 of them, and we need to read the block, we'll still read the block.  We'll only 
process 20 of them.

The select * bug is that if you have a program that does a select *, the * is resolved at runtime.  
If the number, order or types of the columns CHANGE over time -- your program breaks.  If you 
selected ONLY those columns of interest -- only those changes that really truly affect you -- in 
fact affect you.  We can ADD columns and you do not break.  If we drop a column and you don't 
select that column -- you don't break -- if you did a SELECT * however, you would have in your 
FETCH command a place for this missing column and you would break.   
5 stars Parent child relationships July 11, 2002 - 6pm Central time zone
Reviewer: nvssk from India
Lets say, There are tables P, C1, C2, C3, C4 and P is the parent of C1 ..C4.

If the application need to show C1,C2,C3,C4 togather for each row in P, what is the best way to 
model it?
i.e,
P1
  C11
  C12
  C21
  C22
  C31
  C32,etc.

1. If you manage as Parent child, one need to write UNION between queries i.e,
select p.p1,c1.c11
from P,C1
where P.pc1=C.pc1
UNION
select p.p1,c2.c21
from P,C1
where P.pc1=C.pc1

Is this an efficient way? 
OR
2.Can we have a table C in between P and C1,C2,C3,C4
where P maps to C (1:n) and C maps C1(1-1),C maps to C2(1-1) etc.
This way we can get all the required data in one query, no need of unions. But, we will not be able 
to put referential constrains as C stored all primary keys of C1,C2,C3,C4 in one column.

What is the right approach in both of the above ? 


Followup July 12, 2002 - 8am Central time zone:

If P has a 1-1 relationship with 3 of the C* tables and either a 1-1 or 1-m (one to many) with 1 of 
the C* tables, you can write it as:

select * 
  from p, c1, c2, c3, c4
 where p.key = c1.fkey 
   and p.key = c2.fkey
   and p.key = c3.fkey
   and p.key = c4.fkey

if any of the relationships are OPTIONAL, add the outer join (+) to the query.

If more then 1 of the child tables is 1-m with P, then you cannot do it with a join (since if C1 
has two rows that match with P and C2 has 3 rows -- you'll end up with 6 rows -- a cartesian 
production).  You'll want to use a UNION ALL -- better then UNION

 
4 stars Data Model September 13, 2002 - 12pm Central time zone
Reviewer: Mark from CA
Tom,
I have the following requirements and could you please tell me whether I can achive this using the 
Logical model.
It is a technical Membership info Design.

1.One business Type can have multiple comanies under it.
2.One company Must Belong to only on Business Type.
3.One company can have multiple employees
4.we are interested in only the employees with Tech Membership and they have to enroll every 
year.(we need the Historical Info also).
5.Each company Appoints multiple officers.
6.Each Business Type will have a Director who is selected from the Officers table.
7.Director for each business type should be from that business type only.
I have designed the tables for the above requirements but I am having the problems in enforcing the 
Relation between Business Type and Company and the requirement 7 at the same time.
Table1:
Business Type
Columns
Bus_Type (Primary Key)

Table2:
Company
Columns
Company Code (Primary Key)
Bus_Type (Foreign Key (table1))

Table3:
Member
Columns
Member Id        PK
Enrollment Year  PK
Company Code     PK (FK to Table 2)

Table4
Officer Table
Officer Id   PK (FK to table3)
Year         PK (FK to Table3)
Company Code PK  (FK to Table3)
Officer Role (Non Key Column)

Directors
Bus_Type PK (FK to Table1)
Year     PK (FK to Table4)
Director Id  (FK to Table4 column Officer Id)
Directors Company (FK to Table4 Column Company Code)

But In the above model I can not enforce the Requirement 7
which is Director of the bus type Must belong to that bussiness Type.

Could you please tell me how to fit all the above requirements in One logical model.
 


5 stars One or two tables? January 10, 2003 - 10am Central time zone
Reviewer: John from New York
Get back to how to design tables:
    I have a system where we need to generate alerts.  The alert will be stored in an alert table, 
and end user will see them based on queries.  For each alert, we need some basic information, such 
as alert code, and create date.  We also need to store the detailed information for the event that 
causes the alert, and that detailed information varies depending on the alert.  In another words, 
different kinds of detailed information need to be captured for different alert.  I am now 
considering three options, but  not sure what is the best way to handle this.
Alerts are generated based on ALERT_DEFINITION table:

|alert_definition|-------<|alert_record|
(1 to M relationship)

The 3 options I am considering are:

1. Create one ALERT_RECORD table, and create columns that will be needed to capture different types 
of alert detail information.  For columns that don't apply to a particular type of alert, the value 
will be null.
    The problem is that the information is not stored in a logical way.  For each row (alert) in 
the table, there will be lots of columns that are not applicable to it, and therefore have a value 
of NULL.  There may well be confusions down the road as to which column apply to which type of 
alert.

2. Create one ALERT_RECORD table for each alert type.  Information will be stored in a logical way, 
but I will end up with 20 plus tables just for alert detail information.  That will also require 
extra Java programming work to present those information.

3. Create one ALERT_RECORD table, but only use it to store the basic alert information (alert code, 
and create date).  Create another table called ALERT_RECORD_DETAIL to store the detailed 
information.  The alert details will be stored as Display_heading and display_value.  For alert 
detail like this:

display1          display2          display3
value1            value2            value3

will be stored in ALERT_RECORD_DETAIL as 3 records:

display_heading      display_value
--------------       --------------
display1             value1 
display2             value2
display3             value3

and I probably need another column to store the display order of the heading.

This design will require lots of parsing in the Java code to present the information.

|alert_definition|-------<|alert_record|--------<|alert_rec_detail|

Is there any other better way to handle this situation?  Thanks.
 


Followup January 10, 2003 - 10am Central time zone:

No version info.  Ok, assuming you are using the latest software:


option four
create table alert_record( alert_code int, alert_date date, alert_info XMLTYPE )



or, in earlier releases

option four
create table alert_record( alert_code int, alert_date date, alert_info clob )


if you don't need to have the data structured.  You can use Oracle text (intermedia) to index the 
xmltype or clob for searching if that is needed. 
5 stars January 29, 2003 - 3am Central time zone
Reviewer: Mirjana from Barcelona


4 stars Thanks. February 4, 2003 - 3pm Central time zone
Reviewer: John from New York
We are trying to design our database to accomendate Oracle 8.1.7 and up.  So I guess the XMLTYPE 
option is out.

The alert detail information I need to store and display later to users are: product_id, 
packing_list, acknowledge_date, etc.  Those information have to be stored in a structured way.  I 
don't know how can I display them if I use XMLTYPE or clob?
 


Followup February 4, 2003 - 4pm Central time zone:

original question was:

...
  I have a system where we need to generate alerts.  The alert will be stored 
in an alert table, and end user will see them based on queries.  For each alert, 
we need some basic information, such as alert code, and create date.  We also 
need to store the detailed information for the event that causes the alert, and 
that detailed information varies depending on the alert.  In another words, 
different kinds of detailed information need to be captured for different alert. 
 I am now considering three options, but  not sure what is the best way to 
handle this.


1. Create one ALERT_RECORD table, and create columns that will be needed to 
capture different types of alert detail information. ...

2. Create one ALERT_RECORD table for each alert type....

3. Create one ALERT_RECORD table, (then a table with variable name/value pairs).....


#1 is the easiest

#2 can be thought of as a variation on #1 -- think "views"

#3 would be avoided by me if at all possible.  People are going to want to query "where product_id 
= 55 and acknowledge date between A and B or packing_list = C".  Try doing that with variable 
name/values


So, I guess I vote for #1 with the #2 variation on a theme (views) to remove the ambiguity of who 
needs what data...


 
4 stars store as segname May 30, 2003 - 11am Central time zone
Reviewer: Mark Rynbeek from Cambridge, UK
Hi Tom,

thanks for the CLOB info I can find here. I have a question regarding the "store as segname" option 
for CLOBS.

We have to design a table with biological data, proteins. These proteins have a sequence of 
amino-acids, represented by a long letter string. On average, they're not really long (only .05% is 
longer than 4000 characters) but still you need the CLOB.

There seems to be some consensus here that CLOB data should be in its own tablespace, physically 
but not logically. So you get something like : 

CREATE TABLE protein
(         x                     INTEGER
        , Y                     VARCHAR2(50)
        , z                     CLOB
)
TABLESPACE t1
lob (z) 
   STORE AS SEGNAME (TABLESPACE t2
                     PCTVERSION 0
                     CACHE READS
                     NOLOGGING
                    )
;


Does that make sense to you? What would be the benefit - can you tune t2 differently than t1 
because you only store CLOBS in there?

I would think, regarding the percentage that most data would fit in the row itself anyway that we 
should not bother about all this (but I'm lazy by nature)

Cheers,
Mark. 


Followup May 30, 2003 - 12pm Central time zone:

you'd have to ask the people wanting the separate tablespace "why".

the goal is to achieve even IO over all devices.  If segrating this way helps that, that is good.


That nologging is a bit dangerous no?  do you not care about these proteins?  (there is a word I 
never thought I would use myself in a sentence here ;) 
4 stars Question answered elsewhere May 30, 2003 - 12pm Central time zone
Reviewer: Mark Rynbeek from still in Cambridge
hi Tom, 
regarding my previous question I think you already answered something likewise on
http://asktom.oracle.com/pls/ask/f?p=4950:8:234994012547885035::NO::F4950_P8_DISPLAYID,F4950_P8_CRIT
ERIA:386618744661,
Sorry about that, so never mind my question. 
Your website is just too big ! 


5 stars Survey Design July 21, 2003 - 4pm Central time zone
Reviewer: mo
Tom:

Can you give me your opinion on the following:  I have a survey form that I want to put on the web 
for users.  The paper form has a lot questions and checkboxes.  A user can check all checkboxes if 
he wants. For example, one question can have 
Languages Spoken?
  English
  French
  Spanish
  Chinese
  Japanese
  Other _________

Here is my question.  If you are designing a table to store all this info is it better to have one 
table with columns like:
create table survey as 
(survey_id,
language1,
language2,
language3,
language4,
language5
language_other );

or

You would do 3 tables as (one main table and one lookup table and one child data table that can 
have multiple records for each language:

create table survey as (
survey_id,
language )

create table language as (
lang_Code,
language)

create table survey_language as (
survey_id,
lang_code,
lang_other)

2.  The second question is that I have a lot of these questions so if I design it in one table it 
will be huge.  If I break them into lookup tables and child tables like optoin two, would it be 
better to put all code like languages, colors, etc.  into one tables of codes or you would break it 
up to several lookup tables one for colors and one for languages etc.

3. Can you tell me the advantage/disadvantages of each approach from oracle stand point of view?



   


Followup July 22, 2003 - 7am Central time zone:

depends -- do you know for 100% sure how many lanaguages there will be.

surveys are an example where the "funky data model" as I call it -- where you have this very 
generic model -- might make sense.  


It allows very flexible storage -- it stinks for querying - you might have to denormalize for 
quering later -- so you would have columns spanish, english, french (not lang1, lang2, lang3).  so 
you can query "select count(*) from t where survey_id = 55 and english = 1 and spanish = 1" to find 
all surveys filled out by people that speak spanish and english. 
5 stars table design July 22, 2003 - 4pm Central time zone
Reviewer: mo
Tom:

Do I understand that you are in favor of option 1 for one big table for the whole survey where i 
define one column for each checkbox value.  The table may end up with more than 100 columns.

I was planning to create two other tables, one is a lookup table for all languages, and other is 
where i store the survey_id and language code.  The query would then be a two table join or three 
table join.  Is there a way oracle recommends doing survey kind of design?

2.  ALso, if you have several lookup tables would you recommend combining all of it into one 
(code,description) or create one table for each category. 


Followup July 22, 2003 - 8pm Central time zone:

1) do it in the funky data model and then DENORMALIZE if possible for reporting.

2) a table per lookup type. 
5 stars another table design question August 11, 2003 - 12pm Central time zone
Reviewer: A reader
We have a situation where there are different types of data
(type being a categorization of a trow in a table) - let us
say these are "type1", type2.(in our case there are 10 
or so such types)

Now each of these have same set of information stored
in a table (same columns - col1, col2, ....col20)

Is it better to store them all in one table (with a type
column in addition to the above columns ) or is it
better to separate them into 10 different tables (with
col1, col2, col3 etc..)

Most of the times these tables need not be queried together
in a join.

What are the pros/cons of creating multiple tables of
the form
type1_table (col1, col2, ...),
type2_table (col1, col2, ...) etc..
versus
one table
single_table( type, col1, col2...) 

Thanx!

 


Followup August 11, 2003 - 2pm Central time zone:

if they all store the same set of information -- then your "type" column is nothing more then yet 
another attribute -- no more or less relevant then col1, col2, col3


so, unless there is more then meets the eye here -- one table. 
5 stars thanx! August 11, 2003 - 2pm Central time zone
Reviewer: A reader


5 stars multi user problem October 14, 2003 - 6am Central time zone
Reviewer: A reader
Hi Tom

For report query i am using temporary table.I am taking required information from different table 
then inserting into a temporary table then report generate.If anyone want to use this report then 
error message.so my question is

1)what is useful process to generate report in my mentioned criteria.
2)If my system is to generate serial number.what procedure should i take?
say i have to generate serial number for different person.in my form the operator fillup the whole 
screen then he click in the generate button.

Everything is for multi user.

with regards, 


Followup October 14, 2003 - 7am Central time zone:

use a real global temporary table

no concurrency issues whatsoever

read about them in the sql reference guide "create global temporary ...." 
5 stars Column allocation and performance issues. October 21, 2003 - 11pm Central time zone
Reviewer: Alvin from Philippines
I'm in a system where table columns are defined as

Create table T
( c1 varchar2(2056)
, c2 varchar2(2056)
);

when the data that's actually stored are never more than 100 characters.

1. Is there any performace issue on such allocations ? Or columns being varchar2 will be internally 
optimized by oracle.
2. How do i measure the impact of such allocations as compared defining the table as such

Create table T2
( c1 varchar2(200)
, c2 varchar2(200)
);
 


Followup October 22, 2003 - 6am Central time zone:

1) i wrote about this in my new book "Effective Oracle By Design".  Some of the things I said about 
such nonsense was:

...
In addition to using the proper base datatype such as number, date, or string, you should also use 
the most specific type you can. For example, use VARCHAR2(30) for a field that is up to 30 
characters in length; do not use VARCHAR2(4000). 

Ask Tom

"I work with a modelers group. My modeler would like to define every VARCHAR2 field with the 
maximum length, which means that a table with 20 VARCHAR2 fields will all be defined with a maximum 
of 2000 or 4000 bytes. I tried to talk to him about the reasons we identify data with correct 
lengths and names in order to understand what we have in our database. He told me that it doesn't 
matter, since Oracle just stores the length, etc., and there is no overhead. I don't believe this 
is true, but have been jumping between so many databases that I cannot find a document on the 
internals of Oracle storage. Can you help me out here with this question?"

My gut response was, "This is your data modeler, my goodness!" They are the ones who are supposed 
to be telling you that it is vital to use the appropriate length when defining fields! That is 
their job. Let's just forget about things like storage for a minute, why don't we ask him:

. What is going to happen when users pull this up in a query tool that formats each field based on 
the width of the column in the database? They'll see one column and need to scroll way over to see 
the second, the third, and so on.

. Say the code prepares a query that selects ten columns that are VARCHAR2. The developers, for 
performance, would like to array fetch (very important). They would like to array fetch say 100 
rows (very typical). So, you have 4,000 ´ 10 ´ 100 = almost 4MB of RAM the developers must 
allocate! Now, consider if that were ten VARCHAR2(80) fields (it's probably much smaller than 
that). That's about 78KB. Ask the data modeler how much RAM he is willing to kick in for this 
system.

. Now, the developers start to build a data-entry screen to put data into the database. Wow, that 
code field can be 4,000 characters long and that first name can be 4,000 characters long. How the 
heck is anyone going to know what sort of data can really go in there?

Tell your data modeler group members that they need to consider the length as a constraint. Just as 
they use primary and foreign keys, they should use the proper and correct length on fields. You can 
always expand a field via a command like alter table t modify c varchar2(bigger_number). There is 
no reason to use the maximum length everywhere. It will hurt the applications you develop, because 
they will mistakenly allocate many megabytes of RAM. Just think of the array fetch example with an 
application server. Now, it's not just 4MB; it's 4MB ´ number of connections. You are talking some 
real memory here for a single query, and you'll be doing a lot of them at the same time.

End Ask Tom

A CHAR(2000) will consume 2,000 bytes of storage whether you put in the letter a, the string `hello 
world', or 2,000 characters. A CHAR is always blank-padded. Additionally, are you thinking about 
using an occasional index in your system? If so, beware of storage-related problems. Consider the 
following:

tkyte@ORA817.US.ORACLE.COM> create table t ( a varchar2(4000), b varchar2(4000));
Table created.

tkyte@ORA817.US.ORACLE.COM> create index t_idx on t(a);
create index t_idx on t(a)
                      *
ERROR at line 1:
ORA-01450: maximum key length (3218) exceeded
I've noticed that 9i seems to have magically improved upon this CM
NOTE: In Oracle9i, the maximum key length is larger - but the restriction still exists.  For 
example, an index on T(a,b) would raise: ORA-01450: maximum key length (6398) exceeded in Oracle9i. 


My system has an 8KB block size. I would need to use at least a 16KB block size to index a single 
column, but even then, if I tried to create a concatenated index on T(A,B), it would fail there! 

The same holds true for your numbers and the new Oracle9i TIMESTAMP datatypes: When appropriate, 
use scales and precisions on those fields in order to better define your data integrity and to give 
applications that much more information about the data itself.

In short, never be tempted to use anything other than a DATE or TIMESTAMP datatype to hold a date 
value, and never be tempted to use a VARCHAR2 to hold a number. Use the appropriate and correct 
type for each to ensure maximum performance, and more important, to protect your data integrity.


2) it won't affect raw performance, the costs of doing such a thing are found elsewhere. 
5 stars Order of column in a table October 22, 2003 - 6am Central time zone
Reviewer: Ashiq Shamsuden A from Trichy, Tamil Nadu, India
Hi Tom,

  When you create a table , how you design it ? For example if I've table and it's going to have  
100 columns and the table will have all kinds of datatypes. It'll be having composite primary 
key(of three columns).

 Total of 100 columns ,the datatype what i'm going to have is 

  44 --> varchar2's
  20 --> date's
  35 --> number's
  1  --> clob 

  How to order the columns for the creation , so I should get benefit by design properly.

    


Followup October 22, 2003 - 7am Central time zone:

put the most frequently accessed columns "first"

put the columns most likely to be null "last"

but most importantly -- don't over analyze it.  it won't make a tremendous amount of difference at 
the end of the day.

I just always put the primary keys first (convention) and the follow with the columns as they "make 
sense".  when I think of a person for example:

person( unique_id primary key,
        first_name,
        last_name,
        middle_initial,
        address,
        .....

the data just follows in the "natural order" i see it in my head.   
5 stars Middle? October 22, 2003 - 8am Central time zone
Reviewer: A reader
Tom,
I know one third party application that has all the key columns exactly at the middle of the column 
list, in a table. The application is supposed to work with any database. How would you rate this?
Thanks 


Followup October 22, 2003 - 12pm Central time zone:

confusing but not a big deal 
5 stars Thanks. October 23, 2003 - 1am Central time zone
Reviewer: Alvin
I have both your books and i just read the first chapter of the new book.

Followup on my earlier question.

How would oracle store varchar2(2000) and varchar2(30) in the block buffer cache ? won't the former 
entail additional overhead of any kind ?

My developers are a tad greedy on allocations and i want to show them the effects of such (if any). 


Followup October 23, 2003 - 8am Central time zone:

they will not be stored any differently, no.

the effects are all in the client, in the loss of data integrity, in the loss of meaning, in the 
sloppiness of the implementation. 
5 stars cool info on column ordering thanks October 23, 2003 - 10am Central time zone
Reviewer: Kevin Meade from Norwich CT, USA
Thanks Tom, may I offer my 2 cents;  I would note regarding column ordering these clarifications: 
1) if the reader has read this thread, you have explained why primary key comes first (some 
modelers also suggest foreign keys come next (eh..)), 2) also you have explained why "most used" 
columns come second, 3) I would mention that "likely to be null" columns at the end of a row is as 
I recall because null columns at the end of a row take no space on the row but null columns between 
columns with values take 1 byte (ooh boy jumping at the space savings).

As always though you hit the nail on the head with "don't over analyze, do what makes sense".  I 
can't say I every paid much attention to most of this as there are always more important fish to 
fry.  Perfection is never the goal (ooh, did I say never?).

My I also suggest, there is always alphabetical order (hmmm...). 


Followup October 23, 2003 - 1pm Central time zone:

i remember once working as a coder -- doing some Ada programming.

the "senior reviewers" wanted me to code all subroutines in alphabetical order, that last comment 
reminded me of that.  I hated that!  (i made them really made by using "renames" in ada to give the 
appearance of compliance without really reordering any code -- i just added AA_ AB_ AC_ to the 
front of each routine and then renamed AA_proc1 to proc1 :) 
4 stars clob clarification October 23, 2003 - 11am Central time zone
Reviewer: Rob from Pittsburgh, PA
Tom:

When you say:

when you use clobs, they are moved out of the table into their own segment when 
they exceed 4000 bytes

Does this mean that all clobs from each row are moved, or just the ones where the the clob in a 
particular row exceeds 4000 bytes.  

Thanks,

Rob 


Followup October 23, 2003 - 1pm Central time zone:

just the ones that are bigger then 4000 
5 stars Storing Real Numbers and Pseudo-Numbers October 23, 2003 - 12pm Central time zone
Reviewer: Dave from Colorado
Quote: "... never be tempted to use a VARCHAR2 to hold a 
number"

When storing such entities as Zip codes, Social Security Numbers, credit card numbers, telephone 
numbers etc., I have always used Char(), or VarChar2(), because my view was that these are not 
"real" numbers.

They are codes for which all of the characters happen to be numeric, and any operation that the 
system is going to perform will be character-based. For example, formatting a telephone number as 
(999) 999-9999, or viewing the last four of an SSN. We're certainly never going to perform any 
algebraic op's on them, like adding one zip code to another, and we can easily apply check 
constraints to ensure that ...

Translate(SSN,'0123456789','0000000000')='000000000'

... for example.

What's your opinion, Tom? Does this seem like a valid distinction, "real" and "pseudo" numbers? 


Followup October 23, 2003 - 1pm Central time zone:

they are not numbers as you said.  zip codes in particular are "postal codes" in general and are 
alpha numeric in reality.

so, since they are not numbers, store in a string -- you'll never treat them as numbers so it's ok. 
3 stars Length of primary key October 31, 2003 - 3am Central time zone
Reviewer: Kunjan Mehta from Bangalore, India
Hi Tom,

We are using surrogate keys as the primary keys. We are using oracle sequences to generate 
surrogate keys.

I believe, we can generate upto Number(27) through the sequences. is it correct? If not, is there 
any limit on the largest number being generated by sequence? What is it?

Is there any desired maximum length for the primary key? (e.g. it should not be more than 19 digits 
or something like that)?

We are desinging a customized ERP. Is it advisable to use a single sequence for all the surrogate 
keys across the entire application? (approx 1000 tables in total) or should each table have one 
dedicated sequence or each logical group (say tables for Accounts, tables for materials etc.) 
should use a dedicated sequences each?

(we're using 9iR1 as database.) 


Followup October 31, 2003 - 7am Central time zone:

ops$tkyte@ORA920LAP> select rpad('9',27,'9')/60/60/24/366/1000 from dual;

RPAD('9',27,'9')/60/60/24/366/1000
----------------------------------
                        3.1623E+16


27 9's is a big big number.  that shows it would take 3.16e16 YEARS to exhaust the sequence 
generating 1000 numbers a second


the number of digits is not going to have much bearing on anything.  it'll be just fine to use a 
single number column as big as it gets.

You should not have a single sequence.  see
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:2985886242221
3 stars not exhausting sequence, but what about latches October 31, 2003 - 8am Central time zone
Reviewer: freek from Belgium
Tom,

Using 1 sequence for an entire application wouldn't exhaust the sequence, but wouldn't it be still 
better to use 1 sequence per table / column to reduce waits? 


Followup October 31, 2003 - 8am Central time zone:

you know what, i didn't read the entire thing.  thanks -- i'll finish my answer now :) 
4 stars What about Table APIs November 3, 2003 - 7am Central time zone
Reviewer: A P Clarke from London, England
Towards the top of this thread you said:

>> SELECT only that data which you actually NEED.  

This is my preferred approach (got burned by SELECT * a long while back).  Howvever, there are a 
number of respected practioners out there who recommend using Table APIs i.e. PL/SQL encapsulation 
packages that act as the sole method of interacting with a table.  The advantages are security and 
(allegedly) performance - no multiple versions of the same SELECT statement knocking around, just 
one "tuned" one.

However, it seems to me that the SLCT() functions in Table APIs - such as the ones generated by 
Oracle Designer - must always select every columns, unless we code a SLCT()  function for every 
single permutation of column, which I think is impractical.

I have survived a long time without using Table APIs but I got doubts when I read Steven 
Feuerstein's PL/SQL Best Practices book.  What do you think?  Can table APIs be reconciled with 
"only SELECT what you need"?

Cheers, APC

P.S.  I'm not trying to set up some PL/SQL version of "Clash of the Titans" here :P

   


Followup November 3, 2003 - 8am Central time zone:

well, table API's to me are somewhat weak in the first place -- or just have special use cases 
(simple CRUD