# Tables
These statements are used to create and modify data tables in HEAVY.AI.
## Nomenclature Constraints
Table names must use the NAME format, described in [regex](https://en.wikipedia.org/wiki/Regular_expression) notation as:
```
[A-Za-z_][A-Za-z0-9\$_]*
```
Table and column names can include quotes, spaces, and the underscore character. Other special characters are permitted only if the name of the table or column is enclosed in double quotes (`" "`).
* Spaces and special characters other than underscore (\_) cannot be used in Heavy Immerse.
* Column and table names enclosed in double quotes cannot be used in Heavy Immerse.
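For example, a name that matches the NAME pattern needs no quoting, while a name containing spaces or other special characters must be double-quoted (the table names here are hypothetical):
```sql
-- Matches the NAME pattern: no quoting needed
CREATE TABLE trip_data (id INTEGER);

-- Contains a space and a hyphen: must be enclosed in double quotes
CREATE TABLE "trip data-2024" (id INTEGER);
```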
## CREATE TABLE
```sql
CREATE [TEMPORARY] TABLE [IF NOT EXISTS] <table>
  (<column> <type> [NOT NULL] [DEFAULT <value>] [ENCODING <encoding>(<size>)], ...)
  [WITH (<property> = value, ...)];
```
Create a table named `<table>`, specifying `<columns>` and table properties.
### Supported Datatypes
| Datatype | Size (bytes) | Notes |
| ------------------------- | ------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `BIGINT` | 8 | Minimum value: `-9,223,372,036,854,775,807`; maximum value: `9,223,372,036,854,775,807`. |
| `BOOLEAN` | 1 | TRUE: `'true'`, `'1'`, `'t'`. FALSE: `'false'`, `'0'`, `'f'`. Text values are not case-sensitive. |
| `DATE`**\*** | 4 | Same as `DATE ENCODING DAYS(32)`. |
| `DATE ENCODING DAYS(32)` | 4 | Range in years: `+/-5,883,517` around epoch. Maximum date January 1, 5885487 (approximately). Minimum value: `-2,147,483,648`; maximum value: `2,147,483,647`. Supported formats when using `COPY FROM`: `mm/dd/yyyy`, `dd-mmm-yy`, `yyyy-mm-dd`, `dd/mmm/yyyy`. |
| `DATE ENCODING DAYS(16)`  | 2            | Range in days: `-32,768` to `32,767`. Range in years: `+/-90` around epoch (April 14, 1880 through September 9, 2059). Minimum value: `-2,831,155,200`; maximum value: `2,831,068,800`. Supported formats when using `COPY FROM`: `mm/dd/yyyy`, `dd-mmm-yy`, `yyyy-mm-dd`, `dd/mmm/yyyy`. |
| `DATE ENCODING FIXED(32)` | 4 | In DDL statements defaults to `DATE ENCODING DAYS(16)`. Deprecated. |
| `DATE ENCODING FIXED(16)` | 2 | In DDL statements defaults to `DATE ENCODING DAYS(16)`. Deprecated. |
| `DECIMAL`                 | 2, 4, or 8   | Takes precision and scale parameters: `DECIMAL(precision, scale)`. Size depends on precision: up to 4, 2 bytes; 5 to 9, 4 bytes; 10 to 18 (maximum), 8 bytes. Scale must be less than precision. |
| `DOUBLE`                  | 8            | Variable precision. Minimum value: `-1.79e308`; maximum value: `1.79e308`. |
| `FLOAT`                   | 4            | Variable precision. Minimum value: `-3.4e38`; maximum value: `3.4e38`. |
| `INTEGER` | 4 | Minimum value: `-2,147,483,647`; maximum value: `2,147,483,647`. |
| `SMALLINT` | 2 | Minimum value: `-32,767`; maximum value: `32,767`. |
| `TEXT ENCODING DICT`      | 4            | Maximum cardinality: 2 billion distinct string values. |
| `TEXT ENCODING NONE`      | Variable     | Size of the string + 6 bytes. |
| `TIME` | 8 | Minimum value: `00:00:00`; maximum value: `23:59:59`. |
| `TIMESTAMP`               | 8            | Linux timestamp from `-30610224000` (1/1/1000 00:00:00.000) through `29379542399` (12/31/2900 23:59:59.999). Can also be inserted and stored in human-readable format: `YYYY-MM-DD HH:MM:SS` or `YYYY-MM-DDTHH:MM:SS` (the `T` is dropped when the field is populated). |
| `TINYINT` | 1 | Minimum value: `-127`; maximum value: `127`. |
**\*** In OmniSci release 4.4.0 and higher, you can use existing 8-byte `DATE` columns, but you can create only 4-byte `DATE` columns (default) and 2-byte `DATE` columns (see [`DATE ENCODING FIXED(16)`](https://docs.omnisci.com/latest/5_datatypes.html#date_encoding_16)).
For more information, see [Datatypes and Fixed Encoding](/sql/data-definition-ddl/datatypes-and-fixed-encoding).
For geospatial datatypes, see [Geospatial Primitives](/sql/data-definition-ddl/datatypes-and-fixed-encoding#geospatial-datatypes).
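To make the date encodings concrete, here is a minimal sketch (the table and column names are hypothetical); a plain `DATE` column uses 4 bytes, while `DATE ENCODING DAYS(16)` halves storage for dates within roughly 90 years of the epoch:
```sql
CREATE TABLE events (
  event_day DATE,                     -- 4 bytes; equivalent to DATE ENCODING DAYS(32)
  birth_day DATE ENCODING DAYS(16));  -- 2 bytes; range is +/-90 years around epoch
```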
### **Examples**
Create a table named `tweets` and specify the columns, including type, in the table.
```sql
CREATE TABLE IF NOT EXISTS tweets (
tweet_id BIGINT NOT NULL,
tweet_time TIMESTAMP NOT NULL ENCODING FIXED(32),
lat FLOAT,
lon FLOAT,
sender_id BIGINT NOT NULL,
sender_name TEXT NOT NULL ENCODING DICT,
location TEXT ENCODING DICT,
source TEXT ENCODING DICT,
reply_to_user_id BIGINT,
reply_to_tweet_id BIGINT,
lang TEXT ENCODING DICT,
followers INT,
followees INT,
tweet_count INT,
join_time TIMESTAMP ENCODING FIXED(32),
tweet_text TEXT,
state TEXT ENCODING DICT,
county TEXT ENCODING DICT,
place_name TEXT,
state_abbr TEXT ENCODING DICT,
county_state TEXT ENCODING DICT,
origin TEXT ENCODING DICT,
  phone_numbers BIGINT);
```
Create a table named `delta` and assign the default value `'San Francisco'` to the column `city`.
```sql
CREATE TABLE delta (
id INTEGER NOT NULL,
name TEXT NOT NULL,
city TEXT NOT NULL DEFAULT 'San Francisco' ENCODING DICT(16));
```
Default values currently have the following limitations:
* Only literals can be used for column DEFAULT values; expressions are not supported.
* You cannot define a DEFAULT value for a shard key. For example, the following does not parse: `CREATE TABLE tbl (id INTEGER NOT NULL DEFAULT 0, name TEXT, shard key (id)) with (shard_count = 2);`
* For arrays, use the following syntax: `ARRAY[A, B, C, ..., N]`; see the sketch after this list.
  The syntax `{A, B, C, ..., N}` is not supported.
* Some literals, like NUMERIC and GEO types, are not checked at parse time. As a result, you can define and create a table with a malformed literal as a default value, but an error is thrown when you try to insert a row that uses that default.
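A minimal sketch of the `ARRAY[...]` default syntax described above, assuming a hypothetical `sensors` table with a `TEXT[]` array column:
```sql
CREATE TABLE sensors (
  id INTEGER NOT NULL,
  tags TEXT[] DEFAULT ARRAY['new', 'unverified']);  -- literal array default
```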
### Supported Encoding
| Encoding       | Description |
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |
| `DICT` | Dictionary encoding on string columns (default for `TEXT` columns). Limit of 2 billion unique string values. |
| `FIXED` (bits) | Fixed length encoding of integer or timestamp columns. See [Datatypes and Fixed Encoding](/sql/data-definition-ddl/datatypes-and-fixed-encoding). |
| `NONE` | No encoding. Valid only on `TEXT` columns. No Dictionary is created. Aggregate operations are not possible on this column type. |
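A brief sketch contrasting the two text encodings (the table and column names are hypothetical): dictionary encoding suits low-cardinality columns used in filters and aggregations, while `NONE` suits free-form text.
```sql
CREATE TABLE logs (
  severity TEXT ENCODING DICT(8),  -- low cardinality; default dictionary width is 32 bits
  message  TEXT ENCODING NONE);    -- free-form text; no dictionary, no aggregates
```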
### WITH Clause Properties
| Property | Description |
| --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `fragment_size`       | Number of rows per fragment, the unit of the table used for query processing. Default: 32 million rows; this rarely needs to be changed. |
| `max_rollback_epochs` | Limit the number of epochs a table can be rolled back to. Limiting the number of epochs helps to limit the amount of on-disk data and prevent unmanaged data growth. It can also increase system startup speed, especially on systems where data is added in small batches or singleton inserts. Default: 3. Example: `CREATE TABLE test_table (a INT) WITH (max_rollback_epochs = 50);` |
| `max_rows`            | Used primarily for streaming datasets to limit the number of rows in a table, to avoid running out of memory or impeding performance. When the `max_rows` limit is reached, the oldest fragment is removed. When populating a table from a file, make sure that your row count is below the `max_rows` setting; if you attempt to load more rows at one time than `max_rows` defines, the records up to the `max_rows` limit are removed, leaving only the additional rows. Default: 2^62. In a distributed system, the maximum number of rows is calculated as `max_rows * leaf_count`; in a sharded distributed system, as `max_rows * shard_count`. |
| `page_size`           | Size of an I/O page, in bytes. Default: 1 MB; this rarely needs to be changed. |
| `partitions`          | Partition strategy option: `SHARDED` (partition the table using sharding) or `REPLICATED` (partition the table using replication). |
| `shard_count` | Number of shards to create, typically equal to the number of GPUs across which the data table is distributed. |
| `sort_column` | Name of the column on which to sort during bulk import. |
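A hedged sketch combining several of these properties; the table, columns, and specific values are hypothetical:
```sql
CREATE TABLE telemetry (
  device_id BIGINT,
  reading DOUBLE,
  recorded_at TIMESTAMP)
WITH (fragment_size = 16000000,     -- halve the default fragment size
      max_rows = 1000000000,        -- evict oldest fragment beyond 1 billion rows
      sort_column = 'recorded_at'); -- sort on this column during bulk import
```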
### Sharding
Sharding partitions a database table across multiple servers so each server has a part of the table with the same columns but with different rows. Partitioning is based on a sharding key defined when you create the table.
Without sharding, the dimension tables involved in a join are replicated and sent to each GPU, which is not feasible for dimension tables with many rows. Specifying a shard key makes it possible for the query to execute efficiently on large dimension tables.
Currently, specifying a shard key is useful only for joins:
* If two tables specify a shard key with the same type and the same number of shards, a join on that key only sends a part of the dimension table column data to each GPU.
* For multi-node installs, the dimension table does not need to be replicated and the join executes locally on each leaf.
#### **Constraints**
* A shard key must specify a single column to shard on. There is no support for sharding by a combination of keys.
* One shard key can be specified for a table.
* Data are partitioned according to the shard key and the number of shards (`shard_count`).
* A value in the column specified as a shard key is always sent to the same partition.
* The number of shards should be equal to the number of GPUs in the cluster.
* Sharding is allowed on the following column types:
* DATE
* INT
* TEXT ENCODING DICT
* TIME
* TIMESTAMP
* Tables must share the dictionary for the column to be involved in sharded joins. If the dictionary is not specified as shared, the join does not take advantage of sharding. Dictionaries are reference-counted and only dropped when the last reference drops.
#### **Recommendations**
* Set `shard_count` to the number of GPUs you eventually want to distribute the data table across.
* Referenced tables must also be `shard_count`-aligned.
* Sharding should be minimized because it can introduce load skew across resources, compared to when sharding is not used.
**Examples**
Basic sharding:
```sql
CREATE TABLE customers(
accountId text,
name text,
SHARD KEY (accountId))
WITH (shard_count = 4);
```
Sharding with shared dictionary:
```sql
CREATE TABLE transactions(
accountId text,
action text,
SHARD KEY (accountId),
SHARED DICTIONARY (accountId) REFERENCES customers(accountId))
WITH (shard_count = 4);
```
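Because `customers` and `transactions` are both sharded on `accountId` and share a dictionary, a join on that key executes locally on each shard. A purely illustrative sketch of such a query:
```sql
-- Join on the shard key; each GPU joins only its local shards
SELECT c.name, COUNT(*) AS txn_count
FROM transactions t
JOIN customers c ON t.accountId = c.accountId
GROUP BY c.name;
```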
### Temporary Tables
Using the TEMPORARY argument creates a table that persists only while the server is live. Temporary tables are useful for storing intermediate result sets that you access more than once.
Adding or dropping a column from a temporary table is not supported.
#### **Example**
```sql
CREATE TEMPORARY TABLE customers(
accountId TEXT,
name TEXT,
  timeCreated TIMESTAMP);
```
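As a usage sketch, you might stage an intermediate result set in the temporary table and then query it repeatedly; the `raw_events` source table here is hypothetical:
```sql
-- Populate the temporary table once, then reuse it in later queries
INSERT INTO customers
SELECT accountId, name, MIN(eventTime)
FROM raw_events
GROUP BY accountId, name;
```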
## CREATE TABLE AS SELECT
```sql
CREATE TABLE [IF NOT EXISTS] <table> AS (<SELECT statement>) [WITH (<property> = value, ...)];
```