TERADATA SAS 数据加载学习笔记

利用SAS往TD数据库插入大数据量并不是一个好主意

一、If you would like to see how many rows are assigned to each AMP


SELECT hashamp(hashbucket(hashrow(Your_Primary_Index_Columns))) as “AMP”
, count(*) as “Row Count”
FROM Your_Table_Name
GROUP BY AMP
ORDER BY AMP;


   HASHROW(COLUMN)     ------->  Teradata SQL. HASHROW (column(s))--------->Returns the row hash value of a given sequence 返回给定序列的 哈 希 值
  HASHBUCKET (hashrow)----> The grouping for the specific hash value
  HASHAMP (hashbucket)----> The AMP that owns the hash bucket


二、 option multistmt = yes / no

 This option determines whether the INSERT statements should be passed to Teradata one at a time or in batch
 此参数选项决定了insert 语句每次提交给TERADATA 是一句提交还是批次提交。

 适用环境,当你忘可变临时表插入数据时,(可变临时表的数据字典放在内存中)
 multiload fastload 不能用于数据导入临时表

三、 option dbcommit lets you tell the database how often to commit changes to the database data
  此参数选项决定了数据库插入多少记录提交一次

 
DBCOMMIT=n    
If you set this to 0 it commits at the end of the job. This can cause problems with rowhash locking.
You will need to experiment with this value to determine the best value for your situation. More on this below.    
 

 

 

四、Teradata has an optimized form. of INSERT called Fast Path INSERT … SELECT. This is very useful if you want to move data from one Teradata table to another Teradata table

    teradata 有一种优化方法,利用 insert ....select ,当从TD的一个表移到两外一张表时,这种方式非常有效

 这种方式是完全并行的,并且充分利用块写入

 

 

五、OPTIONS DBIDIRECTEXEC 告诉SAS 把以下语句提交到TD数据库执行

OPTIONS SASTRACE=‟,,,d‟ SASTRACELOC=SASLOG NOSTSUFFIX;
LIBNAME mytera TERADATA USER=myuserid PASSWORD=mypassword
SERVER=terasrv DATABASE=PRODDB;
OPTIONS DBIDIRECTEXEC;
PROC SQL;
CREATE TABLE mytera.new_transaction_file AS
SELECT * from mytera.transaction_file;
QUIT;


OPTIONS SASTRACE=‟,,,d‟ SASTRACELOC=SASLOG NOSTSUFFIX;
PROC SQL;
CONNECT TO TERADATA (USER=username PASSWORD=password SERVER=myserver);
EXECUTE (INSERT INTO TESTDB.NEW_TABLE
SELECT * FROM TESTDB.OLD_TABLE) BY TERADATA;
QUIT;


六、FAST LOAD

FastLoad lives to put large amounts of data into an empty Teradata table as quickly as possible. FastLoad is both a Teradata utility and a protocol. The SAS/ACCESS Interface to Teradata can use the FastLoad protocol to quickly load data into empty Teradata tables.
In order to load data using FastLoad:
· The target table must be empty.
· The target table must have no secondary indexes, join indexes, or hash indexes defined on it.
· The target table must have no triggers defined on it.
· The target table must have no standard referential integrity or batch referential integrity defined on it (Soft Referential Integrity is allowed).
· Duplicate rows cannot be loaded.

   1、目标表必须是空表
   2、没有辅助索引,关联索引,哈希索引
   3、 没有触发器
   4、没有标准参照完整性,没有批处理参照完整性
   5、 重复的行是不能被加载的

 七、MULTILOAD utility

MultiLoad is designed to put large amounts of data into a non-empty Teradata table as quickly as possible. Like FastLoad, MultiLoad is both a Teradata batch utility and a protocol. The SAS/ACCESS Interface to Teradata can use the MultiLoad protocol to quickly load data into Teradata tables that already contain data. Unlike FastLoad, the MultiLoad protocol allows you to loadduplicate rows into multiset tables. We will refer to the table into which data is to be loaded as the target table.
In order to load data using MultiLoad:
· The target table must have no unique secondary, join, or hash indexes defined on it.
· The target table must have no triggers defined on it.
· The target table must have no standard referential integrity or batch referential integrity defined on it (Soft Referential Integrity is allowed).
· The MultiLoad input file must have data to qualify all columns defined in the Primary Index of the target table

multiload 是针对目标表已经有数据的时候,大批量数据导入
1、 无辅助索引,连接索引,哈希索引
2、 无触发器
3、 参照完整性约束
4、 主索引定义的所有列

.LOGTABLE scores_mload;
.LOGON teraserver/userid,password;
.BEGIN IMPORT MLOAD TABLES scores;
.Layout Record_Layout;
.FIELD in_IDNUM 1 CHAR(11);
.FIELD in_Rec_Type 13 CHAR(1);
.FIELD in_Name 15 CHAR(30);
.FIELD in_Points 46 CHAR(11);
.DML LABEL Add_Points;
UPDATE scores set Points = (Points + :in_Points)
WHERE IDNUM = :in_IDNUM;
.DML LABEL Sub_Points;
UPDATE scores set Points = (Points - :in_Points)
WHERE IDNUM = :in_IDNUM;
.DML LABEL Add_Row;
INSERT INTO scores VALUES (:in_IDNUM, :in_Name, :in_Points);
.IMPORT INFILE MultiLoadsample.dat
LAYOUT Record_Layout
FORMAT TEXT
APPLY Add_Points WHERE in_Rec_Type = 'A'
APPLY Sub_Points WHERE in_Rec_Type = 'S'
APPLY Add_Row WHERE in_Rec_Type = 'I';
.END MLOAD;
.LOGOFF;

 

 

25页


 

请使用浏览器的分享功能分享到微信等