I've been at "Enzee Universe", the worldwide Netezza conference, and here are my top 3 takeaways from each of the sessions I attended on day 1.
What's a Netezza?
A Netezza is a database appliance. That means it's a refrigerator-sized box that has about 100 disk drives in it, with many CPUs and a high-speed network inside. It's optimized for data warehousing, so when you add data to a table, the data automatically gets divided over the 100 disk drives. This dramatically lowers the time database operations take: each disk holds only about 1/100th of the table and the disks are scanned in parallel, so a scan can run roughly 100 times faster.
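To make the "divide the table over the disks" idea concrete, here is a minimal Python sketch. It is not how the appliance is implemented: the CRC32 hash, the `order_id` key, and the function names are my own assumptions, chosen only to show rows being spread across 100 slices and each slice being scanned on its own.

```python
import zlib

NUM_DISKS = 100  # roughly the drive count mentioned above


def disk_for(key: str, num_disks: int = NUM_DISKS) -> int:
    """Pick a disk by hashing the row's key (illustrative hash, not Netezza's)."""
    return zlib.crc32(key.encode("utf-8")) % num_disks


def distribute(rows, key_field, num_disks=NUM_DISKS):
    """Split rows into one slice per disk."""
    slices = [[] for _ in range(num_disks)]
    for row in rows:
        slices[disk_for(str(row[key_field]), num_disks)].append(row)
    return slices


def parallel_count(slices, predicate):
    """Each disk scans only its own slice; the partial counts are then summed."""
    return sum(sum(1 for row in s if predicate(row)) for s in slices)


# Hypothetical table of 10,000 orders.
rows = [{"order_id": i, "amount": i % 50} for i in range(10_000)]
slices = distribute(rows, "order_id")
largest = max(len(s) for s in slices)
matches = parallel_count(slices, lambda r: r["amount"] > 40)
print(f"largest slice holds {largest} of {len(rows)} rows; {matches} rows matched")
```

The point of the sketch is that no single disk ever has to read the whole table. The loop here runs the slices one after another, whereas the appliance would work on them at the same time.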
"Best Practices With Netezza" by David Birmingham (author of "Netezza Underground")
1) Make floats into ints. Ints compress on the Netezza and thus are more efficient.
2) When making performance changes, consider all 3 components of the ETL/Data Warehouse/BI stack. Don't optimize one at the expense of another.
3) Use CTAS (Create Table As Select) often. Breaking the work into intermediate steps (making 'work tables') is good practice.
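Tips 1 and 3 can be tried out together in a small sketch. I'm using Python's built-in sqlite3 module as a stand-in because it also supports CREATE TABLE ... AS SELECT; this is not Netezza's SQL dialect, and the table and column names (`sales_raw`, `work_sales_cents`, `work_region_totals`) are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_raw (sale_id INTEGER, region TEXT, price REAL)")
conn.executemany(
    "INSERT INTO sales_raw VALUES (?, ?, ?)",
    [(1, "east", 19.99), (2, "east", 5.01), (3, "west", 100.00)],
)

# Work table 1: CTAS that turns the float price into integer cents (tip 1).
conn.execute(
    """
    CREATE TABLE work_sales_cents AS
    SELECT sale_id, region, CAST(ROUND(price * 100) AS INTEGER) AS price_cents
    FROM sales_raw
    """
)

# Work table 2: CTAS that aggregates the first work table (tip 3).
conn.execute(
    """
    CREATE TABLE work_region_totals AS
    SELECT region, SUM(price_cents) AS total_cents
    FROM work_sales_cents
    GROUP BY region
    """
)

totals = dict(conn.execute("SELECT region, total_cents FROM work_region_totals"))
print(totals)
```

Each step lands in its own work table, so you can inspect the intermediate result before building the next one on top of it.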
"Netezza 101" by Ed Patterson
1) The Netezza has a 10GB internal network, and can currently hold up to 7 Petabytes of storage.
2) The SCSI drivers allow up to 110 MB/second of data transfer, but since data compression is transparent and automatic (right next to the disk), effective rates are really more like 440 MB/second per disk.
3) The Netezza can currently load data at a rate of about 1 TB an hour.
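A quick back-of-the-envelope check of those numbers, as a Python sketch. The two per-disk rates and the load rate come from the session; the compression ratio is only what those two rates imply, and I'm assuming decimal units (1 TB = 1,000,000 MB).

```python
RAW_MB_PER_SEC = 110        # per-disk transfer rate quoted in the session
EFFECTIVE_MB_PER_SEC = 440  # per-disk effective rate quoted in the session

# Compression ratio implied by the two quoted rates.
implied_compression = EFFECTIVE_MB_PER_SEC / RAW_MB_PER_SEC

# 1 TB per hour expressed in MB per second (assuming 1 TB = 1,000,000 MB).
load_mb_per_sec = 1_000_000 / 3600

print(f"implied compression ratio: {implied_compression:.0f}x")
print(f"1 TB/hour is about {load_mb_per_sec:.0f} MB/second")
```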
Featured keynote by Donald Feinberg, Distinguished Analyst at Gartner
1) Memory is getting much bigger, now over 1 TB on a server. The future of data warehousing may be in-memory, not on SSD or Flash as some expect.
2) Memory is more expensive to buy than disk, but it requires only 1% of the electricity. This will become an important economic consideration.
3) There is a bright future in Predictive Analytics.
That's it! I hope you found something of interest in all that. If so, tune in again for another list from Day 2.
'Til then,
Happy Coding!