# Importing Geospatial Data

If there is a potential for duplicate entries and you want to avoid loading duplicate rows, see the Troubleshooting page.

## Importing Geospatial Data Using Heavy Immerse

You can use Heavy Immerse to import geospatial data into HeavyDB. Supported formats include:

* Keyhole Markup Language (`.kml`)
* GeoJSON (`.geojson`)
* Shapefiles (`.shp`)
* FlatGeobuf (`.fgb`)

Shapefiles include four mandatory files: `.shp`, `.shx`, `.dbf`, and `.prj`. If you do not import the `.prj` file, the coordinate system will be incorrect and you cannot render the shapes on a map.

To import geospatial definition data:

1. Open Heavy Immerse.
2. Click **Data Manager**.
3. Click **Import Data**.
4. Choose whether to import from a local file or an Amazon S3 instance. For details on importing from Amazon S3, see [Importing Data from Amazon S3](/loading-and-exporting-data/supported-data-sources/immerse-datamanager#importing-data-from-amazon-s3).
5. Click the large `+` icon to select files for upload, or drag and drop the files to the **Data Importer** screen. When importing shapefiles, upload all required file types at the same time. If you upload them separately, Heavy Immerse issues an error message.
6. Wait for the uploads to complete (indicated by green checkmarks on the file icons), then click **Preview**.
7. On the **Data Preview** screen:
   * Edit the column headers (if needed).
   * Enter a name for the table in the field at the bottom of the screen.
   * If you are loading the data files into a distributed system, verify under **Import Settings** that the **Replicate Table** checkbox is selected.
   * Click **Import Data**.
8. On the **Successfully Imported Table** screen, verify the rows and columns that compose your data table.

## Importing Well-Known Text

You can import spatial representations in [Well-known Text (WKT)](https://en.wikipedia.org/wiki/Well-known_text) format.
WKT is a text markup language for representing vector geometry objects on a map, spatial reference systems of spatial objects, and transformations between spatial reference systems. When representing longitude and latitude in HEAVY.AI geospatial primitives, the first coordinate is assumed to be longitude by default.

### WKT Data Supported in Geospatial Columns

You can use `heavysql` to define tables with columns that store WKT geospatial objects.

```
heavysql> \d geo
CREATE TABLE geo (
p POINT,
l LINESTRING,
poly POLYGON)
```

### Insert

You can use `heavysql` to insert data as WKT string values.

```
heavysql> INSERT INTO geo values('POINT(20 20)', 'LINESTRING(40 0, 40 40)', 'POLYGON(( 0 0, 40 0, 40 40, 0 40, 0 0 ))');
```

### Importing Delimited Files

You can insert data from CSV/TSV files containing WKT strings. HEAVY.AI supports Latin-1 ASCII format and UTF-8. If you want to load data with another encoding (for example, UTF-16), convert the data to UTF-8 before loading it to HEAVY.AI.

```
> cat geo.csv
"p", "l", "poly"
"POINT(1 1)", "LINESTRING( 2 0, 2 2)", "POLYGON(( 1 0, 0 1, 1 1 ))"
"POINT(2 2)", "LINESTRING( 4 0, 4 4)", "POLYGON(( 2 0, 0 2, 2 2 ))"
"POINT(3 3)", "LINESTRING( 6 0, 6 6)", "POLYGON(( 3 0, 0 3, 3 3 ))"
"POINT(4 4)", "LINESTRING( 8 0, 8 8)", "POLYGON(( 4 0, 0 4, 4 4 ))"

heavysql> COPY geo FROM 'geo.csv';
Result
Loaded: 4 recs, Rejected: 0 recs in 0.356000 secs
```

You can use your own custom delimiter in your data files.

```
> cat geo1.csv
"p", "l", "poly"
POINT(5 5); LINESTRING(10 0, 10 10); POLYGON(( 5 0, 0 5, 5 5 ))

heavysql> COPY geo FROM 'geo1.csv' WITH (delimiter=';', quoted='false');
Result
Loaded: 1 recs, Rejected: 0 recs in 0.148000 secs
```

## Importing Legacy CSV/TSV Files

### Storing Geo Data

You can import CSV and TSV files for tables that store longitude and latitude as either:

* Separate consecutive scalar columns
* A POINT field

If the data is stored as a POINT, you can use spatial functions like `ST_Distance` and `ST_Contains`.
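If you generate delimited files of WKT values programmatically, the main point to get right is coordinate order: longitude first, then latitude. The following Python sketch (illustrative only; the helper names and sample values are not part of HEAVY.AI) writes a small quoted CSV in the same shape as `geo.csv` above:

```python
import csv
import io

def point_wkt(lon, lat):
    # HEAVY.AI reads the first coordinate as longitude by default.
    return f"POINT({lon} {lat})"

def linestring_wkt(coords):
    # coords is a sequence of (lon, lat) pairs.
    inner = ", ".join(f"{lon} {lat}" for lon, lat in coords)
    return f"LINESTRING({inner})"

buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
writer.writerow(["p", "l"])
for i in range(1, 3):
    writer.writerow([point_wkt(i, i),
                     linestring_wkt([(2 * i, 0), (2 * i, 2 * i)])])

print(buf.getvalue())
```

A file produced this way can then be loaded with `COPY geo FROM 'file.csv';` as shown above.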
When location data are stored as a POINT column, they are displayed as such when querying the table:

```
select * from destination_points;
name|pt
Just Fishing Around|POINT (-85.499999999727588 44.6929999755849)
Moonlight Cove Waterfront|POINT (-85.5046011346879 44.6758447935227)
```

HEAVY.AI accepts data with any SRID, or with no SRID. HEAVY.AI supports SRID 4326 ([WGS 84](https://en.wikipedia.org/wiki/World_Geodetic_System#WGS84)), and allows projections from SRID 4326 to SRID 900913 (Google Web Mercator). Geometries declared with SRID 4326 are compressed by default, and can be rendered and used to calculate geodesic distance. Geometries declared with any other SRID, or no SRID, are treated as planar geometries; the SRIDs are ignored. If two geometries are used in one operation (for example, in `ST_Distance`), the SRID values must match.

### Importing the Data

If you are using heavysql, create the table in HEAVY.AI with the POINT field defined as follows:

```
CREATE TABLE new_geo (p GEOMETRY(POINT,4326))
```

Then, import the file using `COPY FROM` in heavysql. By default, the two columns are consumed as longitude `x` and then latitude `y`. If the order of the coordinates in the CSV file is reversed, load the data using the WITH option `lonlat='false'`:

```
heavysql> COPY new_geo FROM 'legacy_geo.csv' WITH (lonlat='false');
```

The lon/lat pair in the source file does not have to be at the beginning or end of the target table; other fields can exist on either side of it. If the imported coordinates are not in SRID 4326 (for example, 2263), you can transform them to 4326 on the fly:

```
heavysql> COPY new_geo FROM 'legacy_geo_2263.csv' WITH (source_srid=2263, lonlat='false');
```

### Importing CSV, TSV, and TXT Files in Immerse

In Immerse, you define the table when loading the data instead of predefining it before import. Immerse supports appending data to a table by loading one or more files.
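As an alternative to `lonlat='false'`, you can swap the columns before import so the file matches HEAVY.AI's default longitude-first order. A minimal Python sketch (the input rows below are hypothetical lat-first data, as many legacy exports produce):

```python
import csv
import io

# Hypothetical legacy rows with latitude first.
legacy_rows = [
    ("44.6929999755849", "-85.499999999727588"),
    ("44.6758447935227", "-85.5046011346879"),
]

buf = io.StringIO()
writer = csv.writer(buf)
for lat, lon in legacy_rows:
    # Emit longitude first, matching HEAVY.AI's default lon/lat order.
    writer.writerow([lon, lat])

print(buf.getvalue())
```

The rewritten file can then be imported with a plain `COPY FROM`, with no `lonlat` option needed.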
Longitude and latitude can be imported as separate columns.

## Importing Geospatial Files

You can create geo tables by importing specific geo file formats. HEAVY.AI supports the following types:

* ESRI shapefile (`.shp` and associated files)
* GeoJSON (`.geojson` or `.json`)
* KML (`.kml` or `.kmz`)
* ESRI file geodatabase (`.gdb`)

An ESRI file geodatabase can have multiple layers, and importing it results in the creation of one table for each layer in the file. This behavior differs from that of importing shapefiles, GeoJSON, or KML files, which results in a single table. See [Importing an ESRI File Geodatabase](/loading-and-exporting-data/supported-data-sources/import-geo#importing_esri_file_geodatabase) for more information.

You import geo files using the `COPY FROM` command with the `geo` option:

```
heavysql> COPY states FROM 'states.shp' WITH (geo='true');
heavysql> COPY zipcodes FROM 'zipcodes.geojson' WITH (geo='true');
heavysql> COPY cell_towers FROM 'cell_towers.kml' WITH (geo='true');
```

The geo file import process automatically creates the table by detecting the column names and types explicitly described in the geo file header. It then creates a single geo column (always called `heavyai_geo`) of one of the supported types (`POINT`, `MULTIPOINT`, `LINESTRING`, `MULTILINESTRING`, `POLYGON`, or `MULTIPOLYGON`).

In Release 6.2 and higher, polygon render metadata assignment is disabled by default. This data is no longer required by the new polygon rendering algorithm introduced in Release 6.0. The new default results in significantly faster import of polygon tables, particularly high-cardinality tables. If you revert to the legacy polygon rendering algorithm, polygons from tables imported in Release 6.2 may not render correctly; those tables must be re-imported after setting the server configuration flag `enable-assign-render-groups` to `true`.
The legacy polygon rendering algorithm and polygon render metadata server config will be removed completely in an upcoming release.

Due to the prevalence of mixed `POLYGON`/`MULTIPOLYGON` geo files (and CSVs), if HEAVY.AI detects a `POLYGON` type geo file, HEAVY.AI creates a `MULTIPOLYGON` column and imports the data as single polygons.

If the table does not already exist, it is created automatically. If the table already exists and the data in the geo file has exactly the same column structure, the new file is appended to the existing table; this enables import of large geo data sets split across multiple files. The new file is rejected if it does not have the same column structure.

By default, geo data is stored as `GEOMETRY`. You can also create tables with coordinates in SRID 3857 or SRID 900913 (Google Web Mercator). Importing data from shapefiles using SRID 3857 or 900913 is supported; importing data from delimited files into tables with these SRIDs is not supported at this time.

To explicitly store in other formats, use the following `WITH` options in addition to `geo='true'`:

Compression used:

* `COMPRESSED(32)` - 50% compression (default)
* `None` - No compression

Spatial reference identifier (SRID) type:

* `4326` - EPSG:4326 (default)
* `900913` - Google Web Mercator
* `3857` - EPSG:3857

For example, the following explicitly sets the default values for encoding and SRID:

```
geo_coords_encoding='COMPRESSED(32)'
geo_coords_srid=4326
```

Rendering of geo `LINESTRING`, `MULTILINESTRING`, `POLYGON`, and `MULTIPOLYGON` is possible only with data stored in the default lon/lat WGS84 (SRID 4326) format, although the type and encoding are flexible. Unless compression is explicitly disabled (`NONE`), all SRID 4326 geometries are compressed. For more information, see [WGS84 Coordinate Compression](/loading-and-exporting-data/supported-data-sources/import-geo#wgs84).

Note that rendering of geo `MULTIPOINT` is not yet supported.
### Importing an ESRI File Geodatabase

An ESRI file geodatabase (`.gdb`) provides a method of storing GIS information in one large file that can have one or more "layers," with each layer containing disparate but related data. The data in each layer can be of different types. Importing a `.gdb` file results in the creation of one table for each layer in the file.

You import an ESRI file geodatabase the same way that you import other geo file formats, using the `COPY FROM` command with the `geo` option:

```
heavysql> COPY counties FROM 'counties.gdb' WITH (geo='true');
```

The layers in the file are scanned and defined by name and contents. Contents are classified as `EMPTY`, `GEO`, `NON_GEO`, or `UNSUPPORTED_GEO`:

* `EMPTY` layers are skipped because they contain no useful data.
* `GEO` layers contain one or more geo columns of a supported type (`POINT`, `MULTIPOINT`, `LINESTRING`, `MULTILINESTRING`, `POLYGON`, `MULTIPOLYGON`) and one or more regular columns, and can be imported to a single table in the same way as the other geo file formats.
* `NON_GEO` layers contain no geo columns and one or more regular columns, and can be imported to a regular table. Although the data comes from a geo file, data in this layer does not result in a geo table.
* `UNSUPPORTED_GEO` layers contain geo columns of a type not currently supported (for example, `GEOMETRYCOLLECTION`). These layers are skipped because they cannot be imported completely.

A single `COPY FROM` command can result in multiple tables, one for each layer in the file. The table names are automatically generated by appending the layer name to the provided table name. For example, consider the geodatabase file `mydata.gdb`, which contains two importable layers named `A` and `B`. Running `COPY FROM` creates two tables, `mydata_A` and `mydata_B`, with the data from layers `A` and `B`, respectively.
If the geodatabase file contains only one layer, the layer name is not appended.

You can load one specific layer from the geodatabase file by using the `geo_layer_name` option:

```
COPY mydata FROM 'mydata.gdb' WITH (geo='true', geo_layer_name='A');
```

This loads only layer `A`, if it is importable. The resulting table is called `mydata`, and the layer name is not appended. Use this import method if you want to set a different name for each table.

If the layer name from the geodatabase file would result in an illegal table name when appended, the name is sanitized by removing any illegal characters.

### Importing Geo Files from Archives or Non-Local Storage

You can import geo files directly from archive files (for example, `.zip`, `.tar`, `.tgz`, `.tar.gz`) without unpacking the archive. You can directly import individual geo files compressed with Zip or GZip (GeoJSON and KML only). The server opens the archive header and loads the first candidate file it finds (`.shp`, `.geojson`, `.json`, `.kml`), along with any associated files (in the case of an ESRI shapefile, the associated files must be siblings of the first).

```
$ unzip -l states.zip
Archive:  states.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2018-02-13 11:09   states/
   446116  2017-11-06 12:15   states/cb_2014_us_state_20m.shp
     8434  2017-11-06 12:15   states/cb_2014_us_state_20m.dbf
        9  2017-11-06 12:15   states/cb_2014_us_state_20m.cpg
      165  2017-11-06 12:15   states/cb_2014_us_state_20m.prj
      516  2017-11-06 12:15   states/cb_2014_us_state_20m.shx
---------                     -------
   491525                     6 files

heavysql> COPY states FROM 'states.zip' WITH (geo='true');
heavysql> COPY zipcodes FROM 'zipcodes.geojson.gz' WITH (geo='true');
heavysql> COPY zipcodes FROM 'zipcodes.geojson.zip' WITH (geo='true');
heavysql> COPY cell_towers FROM 'cell_towers.kml.gz' WITH (geo='true');
```

You can import geo files or archives directly from an Amazon S3 bucket.
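The per-layer naming convention described above can be sketched in a few lines of Python. This is an illustration of the documented behavior, not HEAVY.AI code; in particular, the sanitization rule (stripping every character not legal in a table name) is an assumption:

```python
import re

def layer_table_names(base, layer_names):
    """Return the table name(s) produced when importing a .gdb file."""
    # A single-layer file keeps the provided table name unchanged.
    if len(layer_names) == 1:
        return [base]
    # Otherwise, one table per layer, with the sanitized layer name appended.
    # Assumed rule: drop any character that is not alphanumeric or underscore.
    return [f"{base}_{re.sub(r'[^A-Za-z0-9_]', '', layer)}"
            for layer in layer_names]

print(layer_table_names("mydata", ["A", "B"]))       # multiple layers
print(layer_table_names("mydata", ["only_layer"]))   # single layer
```

For `mydata.gdb` with layers `A` and `B`, this yields `mydata_A` and `mydata_B`, matching the example above.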
```
heavysql> COPY states FROM 's3://mybucket/myfolder/states.shp' WITH (geo='true');
heavysql> COPY states FROM 's3://mybucket/myfolder/states.zip' WITH (geo='true');
heavysql> COPY zipcodes FROM 's3://mybucket/myfolder/zipcodes.geojson.gz' WITH (geo='true');
heavysql> COPY zipcodes FROM 's3://mybucket/myfolder/zipcodes.geojson.zip' WITH (geo='true');
```

You can provide Amazon S3 credentials, if required, by setting variables in the environment of the `heavysql` process:

```
AWS_REGION=us-west-1
AWS_ACCESS_KEY_ID=********************
AWS_SECRET_ACCESS_KEY=****************************************
```

You can also provide your credentials explicitly in the `COPY FROM` command:

```
heavysql> COPY states FROM 's3://mybucket/myfolder/states.zip' WITH (geo='true', s3_region='us-west-1', s3_access_key='********************', s3_secret_key='****************************************');
```

You can import geo files or archives directly from an HTTP/HTTPS website:

```
heavysql> COPY states FROM 'http://www.mysite.com/myfolder/states.zip' WITH (geo='true');
```

## WGS84 Coordinate Compression

You can extend a column type specification to include spatial reference (SRID) and compression mode information. Geospatial objects declared with SRID 4326 are compressed 50% by default with `ENCODING COMPRESSED(32)`. In the following definition of table geo2, the columns poly2 and mpoly2 are compressed.
```
CREATE TABLE geo2 (
  p2 GEOMETRY(POINT, 4326) ENCODING NONE,
  l2 GEOMETRY(LINESTRING, 900913),
  poly2 GEOMETRY(POLYGON, 4326),
  mpoly2 GEOMETRY(MULTIPOLYGON, 4326) ENCODING COMPRESSED(32));
```

`COMPRESSED(32)` compression maps lon/lat degree ranges to 32-bit integers, providing a smaller memory footprint and faster query execution. The effect on precision is small, approximately 4 inches at the equator. You can disable compression by explicitly choosing `ENCODING NONE`.
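The 50% figure follows directly from the per-coordinate storage sizes: an uncompressed coordinate is a 64-bit double, while `COMPRESSED(32)` stores a 32-bit integer per coordinate. A back-of-envelope check in Python (illustrative arithmetic only, not HEAVY.AI's internal layout):

```python
import struct

# A lon/lat pair stored as two 64-bit doubles (uncompressed).
uncompressed = len(struct.pack("dd", -122.4194, 37.7749))
# The same pair stored as two 32-bit integers (COMPRESSED(32)).
compressed = len(struct.pack("ii", 0, 0))

print(uncompressed, compressed, compressed / uncompressed)
```

Halving the bytes per coordinate is what yields the smaller memory footprint noted above.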