Querying block model data
Introduction
When querying a block model, multiple endpoints need to be called to complete the workflow:
Initial query request
- This starts the block model query workflow.
- This is done via the Start a block model data query endpoint. This initial request is where the server validates that the request body is well formed and references an existing block model with valid columns.
- The POST request body contains details on the return data format, which columns to select, and the spatial bounds of the query.
- Unlike a block model update, the initial request to start the query is all that is needed for the job to be created and eventually completed by the server.
- A successful response from the server contains a reflection of the query criteria made in the initial request, as well as a `job_url`, which can be polled to check the status of the query job.
The following is an example request body for making this initial request.

```json
{
  "bbox": {
    "i_minmax": { "min": 2, "max": 15 },
    "j_minmax": { "min": 2, "max": 21 },
    "k_minmax": { "min": 2, "max": 27 }
  },
  "columns": ["*"],
  "geometry_columns": "indices",
  "output_options": {
    "file_format": "csv",
    "column_headers": "name",
    "exclude_null_rows": true
  },
  "version_uuid": "11111111-2222-3333-4444-555555555555"
}
```
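The request body above can be assembled programmatically. Below is a minimal sketch in Python; the helper name `build_query_body` is an illustration, not part of the API, and the actual HTTP call to the Start a block model data query endpoint is omitted.

```python
import json

def build_query_body(version_uuid, columns=("*",), bbox=None):
    """Assemble the request body for the initial query request.

    Hypothetical helper: field names follow the documented payload,
    but this function is not part of the BMS API itself.
    """
    body = {
        "columns": list(columns),
        "geometry_columns": "indices",
        "output_options": {
            "file_format": "csv",
            "column_headers": "name",
            "exclude_null_rows": True,
        },
        "version_uuid": version_uuid,
    }
    if bbox is not None:  # bbox is optional; omitting it queries the whole model
        body["bbox"] = bbox
    return body

body = build_query_body(
    "11111111-2222-3333-4444-555555555555",
    bbox={"i_minmax": {"min": 2, "max": 15},
          "j_minmax": {"min": 2, "max": 21},
          "k_minmax": {"min": 2, "max": 27}},
)
print(json.dumps(body, indent=2))
```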
Definitions of the payload fields are as follows:
- `bbox` (Optional)
- `columns` (Required)
- `geometry_columns` (Optional)
- `output_options` (Optional)
- `version_uuid` (Optional)
bbox (Optional)
The `bbox` field refers to the bounding box to be used for the query, and can be defined in terms of either IJK or XYZ. If `bbox` is not provided, the query automatically defaults to encompassing the entire block model.
IJK bounding box
When using an IJK (block index) bounding box, it should be of the following form.

```json
{
  "i_minmax": { "min": "int", "max": "int" },
  "j_minmax": { "min": "int", "max": "int" },
  "k_minmax": { "min": "int", "max": "int" }
}
```
- An IJK bounding box must be within the index extents of the model, and must make a non-empty selection of blocks to be considered valid by the server.
- The corners of the bounding box, defined by `min_corner = (i_min, j_min, k_min)` and `max_corner = (i_max, j_max, k_max)`, must satisfy `min_corner <= max_corner` to result in a non-empty selection of blocks.
- The index extents of a block model are defined in terms of the `n_blocks` of the model, `dx`, `dy`, and `dz`, and refer to the range of valid integer values an I, J, or K index can have. For example, if we have `dx = 10`, `dy = 5`, and `dz = 7` for some block model, then the valid ranges of values for the `i, j, k` min/maxes are as follows:
  - `0 <= i_min < dx`, or `0 <= i_min <= 10 - 1`
  - `0 <= j_min < dy`, or `0 <= j_min <= 5 - 1`
  - `0 <= k_min < dz`, or `0 <= k_min <= 7 - 1`
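The in-bounds rule can be sketched as a small validation function. This is an illustration of the rule, not server code; `dx`, `dy`, `dz` are the block counts along each axis, as in the example above.

```python
# Minimal sketch of the IJK bounds rule: each index range must satisfy
# min <= max and lie within [0, n_blocks - 1] on its axis.
def ijk_bbox_is_valid(bbox, dx, dy, dz):
    """True if the bbox selects a non-empty, in-bounds range of blocks."""
    for axis, n_blocks in (("i_minmax", dx), ("j_minmax", dy), ("k_minmax", dz)):
        lo, hi = bbox[axis]["min"], bbox[axis]["max"]
        # min_corner <= max_corner and both indices within [0, n_blocks - 1]
        if not (0 <= lo <= hi <= n_blocks - 1):
            return False
    return True
```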
XYZ bounding box
When using an XYZ (centroid coordinate) bounding box, it should be of the following form.

```json
{
  "x_minmax": { "min": "float", "max": "float" },
  "y_minmax": { "min": "float", "max": "float" },
  "z_minmax": { "min": "float", "max": "float" }
}
```
- An XYZ bounding box should be model aligned. If the model is rotated, then the XYZ bounding box should also be rotated.
- An XYZ bounding box must have `min < max`, and at least one centroid must be contained by the bounding box.
- An XYZ bounding box can exceed the bounds of the model, as centroids are selected based on whether they are contained by the bounding box.
- For more details on XYZ bounding boxes, see XYZ bounding boxes and rotations.
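The centroid-selection rule above can be sketched as a simple containment test. This assumes an unrotated, model-aligned box; rotation handling is out of scope here, and the function is illustrative only.

```python
# A block is selected when its centroid lies inside the XYZ bounding box.
def centroid_in_bbox(centroid, bbox):
    """True if the (x, y, z) centroid is contained by the bounding box."""
    x, y, z = centroid
    return (bbox["x_minmax"]["min"] <= x <= bbox["x_minmax"]["max"]
            and bbox["y_minmax"]["min"] <= y <= bbox["y_minmax"]["max"]
            and bbox["z_minmax"]["min"] <= z <= bbox["z_minmax"]["max"])
```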
columns (Required)
- The `columns` field denotes a list of string values for extra columns of the block model to be included in the output file.
- The `columns` field supports selecting columns by either their title or ID, and can also include a wildcard (`"*"`) placeholder, which expands to all user columns, ordered alphabetically by title in a case-insensitive manner.
- Note that the wildcard does not cover the system column `version_id`. To include `version_id` in the output file, you must explicitly specify it in the `columns` field alongside the wildcard.
- The order of columns in the output file will match the order in the `columns` field.
- Columns that are part of the initial "geometry" columns will be ignored if specified.
For example, given a model that has user columns `A` (with a column ID of `d718abe4-56a5-4e27-ad51-813e69eb8aac`), `B`, and `C`, the table below shows the output file columns for different model types and parameters.

| Model Type | `geometry_columns` field | `columns` field | Output File Columns |
|---|---|---|---|
| Regular | `"indices"` | `["d718abe4-56a5-4e27-ad51-813e69eb8aac", "B"]` | `i`, `j`, `k`, `A`, `B` |
| Regular | `"coordinates"` | `[]` | `x`, `y`, `z` |
| Fully sub-blocked | `"indices"` | `["*", "version_id"]` | `i`, `j`, `k`, `sidx`, `A`, `B`, `C`, `version_id` |
| Flexible | `"coordinates"` | `["*", "start_si"]` | `x`, `y`, `z`, `dx`, `dy`, `dz`, `A`, `B`, `C`, `start_si` |
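The wildcard expansion described above can be sketched as follows. This is an illustration of the documented ordering rule, not the server's implementation.

```python
# "*" expands to all user columns sorted alphabetically by title
# (case-insensitive); the system column version_id is never pulled in
# by the wildcard and must be requested explicitly.
def expand_columns(requested, user_column_titles):
    out = []
    for col in requested:
        if col == "*":
            out.extend(sorted(user_column_titles, key=str.lower))
        else:
            out.append(col)
    return out
```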
geometry_columns (Optional)
If `geometry_columns` is set to `"indices"` (the default), then the first columns within the output file will be `i`, `j`, `k`, followed by the sub-block index columns (if applicable).
If `geometry_columns` is set to `"coordinates"`, then the output file will contain the columns `x`, `y`, `z` for regular block models and `x`, `y`, `z`, `dx`, `dy`, `dz` for sub-blocked block models. These columns are referred to as the "geometry" columns.
output_options (Optional)
The `output_options` field determines the format of the file output by BMS, and each file type has additional properties that can be set by the user. `output_options` defaults in the server to the following.

```json
{
  "file_format": "parquet",
  "column_headers": "name",
  "exclude_null_rows": true
}
```

The `exclude_null_rows` field specifies whether rows whose queried user columns are all null should be excluded from the output file.
- Set `exclude_null_rows` to `true` to exclude rows with entirely null values for the specified user columns from the output, as outlined in the columns section.
- Set `exclude_null_rows` to `false` to include every block within the specified `bbox` in the output file, including rows containing only null values.
By default, `exclude_null_rows` is enabled (`true`).
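The exclude_null_rows behaviour can be sketched with a small filter. This is an illustration of the documented rule under the assumption that only the queried user columns (not geometry columns) are considered.

```python
# A row is dropped only when every queried user column is null.
def keep_row(row, user_columns):
    return any(row.get(c) is not None for c in user_columns)

rows = [
    {"i": 0, "j": 0, "k": 0, "A": 1.5, "B": None},
    {"i": 0, "j": 0, "k": 1, "A": None, "B": None},  # all user columns null
]
filtered = [r for r in rows if keep_row(r, ["A", "B"])]  # second row excluded
```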
parquet
When setting the `file_format` field to `parquet`, there is an optional field `column_headers`, which can be set to either `name` or `id` (or omitted).

```
{
  "file_format": "parquet",
  "column_headers": "string" | null,
  "exclude_null_rows": "boolean" | null
}
```

The queried file will be output in the `.parquet` file format, which is written using the following parameters:
- Row group size of 100,000.
- Compressed using Zstd.
- Parquet data page version set to `1.0` and Parquet version set to `2.6`. More details are available in the Apache Arrow documentation.
- Each of the file schema fields (or column headers) will have their `name` set to the column ID or the column title, depending on whether `column_headers` is set to `"id"` or `"name"` respectively.
CSV
When `file_format` is set to `csv`, we have the following fields.

```
{
  "file_format": "csv",
  "column_headers": "string" | null,
  "delimiter": "string" | null,
  "exclude_null_rows": "boolean" | null
}
```

`column_headers` behaves exactly the same way as in `file_format: parquet`, and sets the header row values of the CSV to the selected option.
The `delimiter` field defaults to `","`, but can be set to any single-character string.
version_uuid (Optional)
A version UUID can be specified, which will result in the query running against that specific version of the model. Be mindful when using this: only columns that exist at the specified version can be queried, and when using the wildcard `"*"`, results can differ significantly from version to version.
Polling the job URL
A successful initial request will result in the following response from the server.

```
{
  "bm_uuid": "string",
  "version_id": "Integer specifying the version sequence number of the version queried",
  "version_uuid": "Same as in initial request if specified, otherwise will be version UUID of latest",
  "bbox": {"Same as in initial request"},
  "mapping": {
    "columns": [
      {
        "col_id": "string",
        "title": "string",
        "data_type": "string"
      } // one entry for each of the index columns, followed by the selected columns
    ]
  },
  "columns": ["columns queried in initial request"],
  "job_url": "https://example.seequent.com/blockmodel/orgs/{org_uuid}/workspaces/{workspace_uuid}/block-models/{bm_uuid}/jobs/{job_uuid}"
}
```
The field of the response body we care about here is `job_url`, which can be polled via a `GET` request to retrieve the current status of the query job. The server's response is largely determined by the value of the `job_status` field, so each status is described individually below.
Job status: QUEUED or PROCESSING
- `QUEUED`: The query job is currently queued and waiting to be picked up by the compute service for processing.
- `PROCESSING`: A compute node is currently processing the query.

```
{
  "job_status": "QUEUED" | "PROCESSING"
}
```
Job status: COMPLETE
The query job has finished and the file is ready to be downloaded. The response will also contain the field `payload`, which contains a SAS-signed blob URL pointing to the file containing the queried blocks/columns of the block model.

```json
{
  "job_status": "COMPLETE",
  "payload": {
    "download_url": "https://example.seequent.com/blocksync/{org_id}/query/{criteria_hash}.{file_format}?{additional_blob_query_params}"
  }
}
```
Job status: FAILED
This means that the query job failed during processing. If this happens, a support ticket should be raised, as no change on the user's side will result in a different outcome. The response will also contain the field `payload`, which contains an error message from the compute service explaining why the query failed.

```json
{
  "job_status": "FAILED",
  "payload": {
    "status": 500,
    "title": "Internal Service Error",
    "detail": "A service error occurred when processing this job",
    "type": "https://seequent.com/error-codes/block-model-service/compute-service-error"
  }
}
```
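A typical client polls the job URL until it reaches a terminal status. The sketch below assumes a `get_json` callable that performs an authenticated `GET` on the job URL and returns the parsed JSON body; that helper is an assumption here, not part of the API.

```python
import time

def poll_query_job(job_url, get_json, interval_seconds=5, max_attempts=60):
    """Poll job_url until COMPLETE (return download_url) or FAILED (raise)."""
    for _ in range(max_attempts):
        job = get_json(job_url)
        status = job["job_status"]
        if status == "COMPLETE":
            return job["payload"]["download_url"]
        if status == "FAILED":
            raise RuntimeError(job["payload"]["detail"])
        time.sleep(interval_seconds)  # QUEUED or PROCESSING: wait, then retry
    raise TimeoutError("query job did not finish within the polling budget")
```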
Downloading the query file
When the `download_url` is retrieved from the payload of a complete job, the file can be downloaded using either a `GET` request to the download URL or the Azure SDK.
The download URL generated each time the completed `job_url` is polled has a 30-minute time-to-live (TTL), so you must start the download within 30 minutes. The download itself can take longer than 30 minutes as long as the connection remains open.
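For the plain `GET` option, a minimal sketch using only the Python standard library looks like this; reading in chunks avoids holding the whole file in memory for large models.

```python
from urllib.request import urlopen

def download_query_file(download_url, out_path, chunk_size=1 << 20):
    """Stream the file at download_url to out_path in 1 MiB chunks."""
    with urlopen(download_url) as resp, open(out_path, "wb") as f:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            f.write(chunk)
```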
Azure blob SDK Python example
Downloading using the Azure SDK for Python can be easier than dealing with file streams directly. For quick reference, the example below uses the SDK function download_blob_from_url.

```python
from azure.storage.blob import download_blob_from_url

file_extension = 'csv'  # or 'parquet', matching the requested file_format
download_file_path = f'queried_block_model.{file_extension}'

# Stream the blob at download_url straight into a local file.
with open(download_file_path, 'wb') as f:
    download_blob_from_url(download_url, output=f)
```
Additional functionality
Once a query has been carried out by the server, BMS will cache the resultant file in blob storage indefinitely (subject to change). This file can be retrieved again in two ways:
- Make another query request with the exact same body and URL.
- Save the job URL from the query for later use. Polling it again will return a new blob download URL.