Wednesday, November 10, 2010

How To : installing and testing greenplum single node edition on Centos 5.5

Download the single node edition, Here I use the version 4.0 for redhat /centos , greenplum-db-4.0.0.4-build-1-RHEL5-x86_64.bin

check OS version,

[root@localhost Desktop]# cat /etc/redhat-release
CentOS release 5.5 (Final)


if you run the installer on unmatched version of centos, will get an error, “Installer will only install on RedHat/CentOS x86_64”
here run Uname –m, It will tell the hardware spec, 386 or x86_64.


[root@localhost ~]# uname -m
x86_64

run the installer directly.


[root@localhost Desktop]# ./greenplum-db-4.0.0.4-build-1-RHEL5-x86_64.bin
make sure you have the execute permission. If not, run
chmod 722 greenplum-db-4.0.0.4-build-1-RHEL5-x86_64.bin


read and accept the agreement,
by default the bits will be put into folder /usr/local/greenplum-version.  Enter yes to accept the default settings. like folder path, once done. you will get the message like

image

source /usr/local/greenplum-db-4.0.0.4/greenplum_path.sh

Create a dedicated user to run the process , and folders to hold segments and maser data. given user the folder access permission.


Here we will create a user named gp
Create a maser folder /data/master
Create 4 segment folders /data/seg1, /data/seg2,/data/seg3/,/data/seg4
Change ownership folder of /data/* to gp

image 
Copy one instance from the sample config and change the setting like folder path , listen port.

cd /usr/local/greenplum-db-4.0.0.4/docs/cli_help
[root@localhost cli_help]# cp gp_init_singlenode_example /home/gp/
[root@localhost cli_help]# cp single_hostlist_example /home/gp/
[root@localhost cli_help]#

Switch to user gp, go to home foder, then change two files we just copied here.

gp_init_singlenode_example content change.

MACHINE_LIST_FILE=./single_hostlist_example

declare -a DATA_DIRECTORY=(/data/seg1 /data/seg2 /data/seg3 /data/seg4)
#four segment folders

MASTER_DIRECTORY=/data/master
#master diferctory

Initialize the system. (create folders and start up several postgres processes.)

[gp@localhost ~]$ gpssh-exkeys -f single_hostlist_example

gpinitsystem -c gp_init_singlenode_example
image

once done, when you run ps -aux|grep /data/, you will see 5 postgresq processes here, one master , four segment instances.

image


Now the single Node cluster is ready, we can use the psql to connect to the instance and have some fun.

Create one test database
CreateDB androidtest

psql (8.2.14)
Type "help" for help.

androidtestdb=# Create table public.test( id int primary key, state varchar(30) not null);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "test_pkey" for table "test"
CREATE TABLE
androidtestdb=# \d+ public.test;
                   Table "public.test"
Column |         Type          | Modifiers | Description
--------+-----------------------+-----------+-------------
id     | integer               | not null  |
state  | character varying(30) | not null  |
Indexes:
    "test_pkey" PRIMARY KEY, btree (id)
Has OIDs: no
Distributed by: (id)


Now we just created one table test which is disturbed by id. which means if we put more data, the data will be distributed and dispatched to different segments by column iD.  let’s run a simple sql.

insert into public.test values (1,'AL') ,(2,'AK') ,(3,'AS') ,(4,'AZ') ,(5,'AR') ,(6,'CA') ,(7,'CO') ,(8,'CT') ,(9,'DE') ,(10,'DC') ,(11,'FM') ,(12,'FL') ,(13,'GA') ,(14,'GU') ,(15,'HI') ,(16,'ID') ,(17,'IL') ,(18,'IN') ,(19,'IA') ,(20,'KS') ,(21,'KY') ,(22,'LA') ,(23,'ME') ,(24,'MH') ,(25,'MD') ,(26,'MA') ,(27,'MI') ,(28,'MN') ,(29,'MS') ,(30,'MO') ,(31,'MT') ,(32,'NE') ,(33,'NV') ,(34,'nh') ,(35,'NJ') ,(36,'NM') ,(37,'NY') ,(38,'NC') ,(39,'ND') ,(40,'MP') ,(41,'OH') ,(42,'OK') ,(43,'OR') ,(44,'PW') ,(45,'PA') ,(46,'PR') ,(47,'RI') ,(48,'SC') ,(49,'SD') ,(50,'TN') ,(51,'TX') ,(52,'UT') ,(53,'VT') ,(54,'VI') ,(55,'VA') ,(56,'WA') ,(57,'WV') ,(58,'WI') ,(59,'WY') ;
INSERT 0 59


How can we tell whether the data are distributed across segments evenly.
in each table, there is one system column called gp_segment_id

select gp_segment_id, id, state from public.test;

gp_segment_id | id | state
---------------+----+-------
             3 |  2 | AK
             3 |  6 | CA
             3 | 10 | DC
             3 | 14 | GU
             3 | 18 | IN
             3 | 22 | LA
             3 | 26 | MA


you can run a basic group query

androidtestdb=# select gp_segment_id, count(*) from public.test group by gp_segment_id;
gp_segment_id | count
---------------+-------
             3 |    15
             0 |    15
             1 |    14
             2 |    15

   data is evenly distributed across four segments.

also you can run a analyze to tell the execution plan. which will also tell you the segments scan are handled in a parallel way.

androidtestdb=# EXPLAIN  select * from public.test;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Gather Motion 4:1  (slice1; segments: 4)  (cost=0.00..4.59 rows=15 width=7)
   ->  Seq Scan on test  (cost=0.00..4.59 rows=15 width=7)
(2 rows)


EXPLAIN analyze  select * from public.test;
                                                   QUERY PLAN                             
----------------------------------------------------------------------------------------------------------------
Gather Motion 4:1  (slice1; segments: 4)  (cost=0.00..4.59 rows=15 width=7)
   Rows out:  59 rows at destination with 0.515 ms to first row, 1.215 ms to end.
   ->  Seq Scan on test  (cost=0.00..4.59 rows=15 width=7)
         Rows out:  Avg 14.8 rows x 4 workers.  Max 15 rows (seg0) with 0.026 ms to first row, 0.029 ms to end.
Slice statistics:
   (slice0)    Executor memory: 139K bytes.
   (slice1)    Executor memory: 155K bytes avg x 4 workers, 155K bytes max (seg0).
Total runtime: 2.038 ms
(8 rows)




1 comment:

Anonymous said...

Really very helpful page ...

 
Locations of visitors to this page