CSV Uploads

Import data from a CSV directly into Sift

Credentials

Before starting this section, ensure you have retrieved your API key and the appropriate Sift URL for your provisioned environment. Instructions on obtaining the API key and URL are available in the authentication section of the documentation.

Import CSVs with cURL

The configuration is a JSON object in the following format:

{
  "csv_config": { ... }, // Required. See below.
}

To import data from a CSV:

  • POST the configuration to /api/v1/data-imports:upload. If the request is valid, the endpoint will return an upload URL.
  • POST the data file to the upload URL. When the request finishes, the data is fully uploaded and will start ingesting.

Example

$ curl "$SIFT_API_HOST/api/v1/data-imports:upload" \
  --data-binary @my-upload-config.json \
  -H "authorization: Bearer $SIFT_API_KEY"
 
{"uploadUrl":"http://$SIFT_REST_URL/api/v1/data-imports:upload/6e62f0ba-b1ed-491b-88bf-89037d5b017a"}
 
$ curl "$SIFT_API_HOST/api/v1/data-imports:upload/6e62f0ba-b1ed-491b-88bf-89037d5b017a" \
  --data-binary @my-data.csv \
  -H "authorization: Bearer $SIFT_API_KEY"
OK

GZIP'd files are supported via the "content-encoding: gzip" request header.
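
As a rough illustration of the gzip option, the Python sketch below compresses CSV bytes and builds (but does not send) the second request of the upload flow. The function name build_gzip_upload_request, the URL, and the API key are placeholders invented for this example; in practice the URL is whatever the first request returned as uploadUrl.

```python
import gzip
import urllib.request


def build_gzip_upload_request(
    upload_url: str, api_key: str, csv_bytes: bytes
) -> urllib.request.Request:
    """Build a POST request that carries a gzip-compressed CSV body."""
    compressed = gzip.compress(csv_bytes)
    return urllib.request.Request(
        upload_url,
        data=compressed,
        method="POST",
        headers={
            "authorization": f"Bearer {api_key}",
            # Tells the server that the request body is gzip-compressed.
            "content-encoding": "gzip",
        },
    )


request = build_gzip_upload_request(
    "https://sift.example.com/api/v1/data-imports:upload/placeholder-id",  # placeholder
    "placeholder-api-key",  # placeholder
    b"timestamp,velocity\n2024-10-07 17:00:09.982126,0.98\n",
)
# Sending is left to the caller, for example urllib.request.urlopen(request).
```

Building the request separately from sending it makes the headers easy to inspect before any data leaves the machine.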

Data Import from URL

The configuration is a JSON object in the following format:

{
  "url": string, // Required. The URL of the data file.
                 // HTTP and S3 are supported.
  "csv_config": { ... }, // Required. See below.
}

To import data from a URL:

  • POST the configuration to /api/v1/data-imports:url. If the request is valid, the endpoint will return a 200 response code and begin ingesting the data.
  • GZIP'd files are supported when the server hosting the file returns the "content-encoding: gzip" response header.
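
A minimal sketch of assembling the request body for a URL import in Python follows. The helper name build_url_import_config and the example URL are hypothetical, and the csv_config shown is abbreviated; a real one needs the fields described under "CSV Configuration".

```python
import json


def build_url_import_config(file_url: str, csv_config: dict) -> str:
    """Serialize the JSON body to POST to /api/v1/data-imports:url."""
    if not file_url:
        raise ValueError("url is required")
    return json.dumps({"url": file_url, "csv_config": csv_config}, indent=2)


body = build_url_import_config(
    "https://example.com/data/sample_asset.csv",  # placeholder URL
    {"asset_name": "my_asset"},  # abbreviated placeholder configuration
)
```

The resulting string can be saved to a file such as my-url-config.json and passed to curl with --data-binary.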

Example

$ curl "$SIFT_API_HOST/api/v1/data-imports:url" \
    --data-binary @my-url-config.json \
    -H "authorization: Bearer $SIFT_API_KEY"
OK

CSV Upload Status

The 200 response from the upload only signals that the ingestion request was submitted successfully. It does not indicate whether the data was ingested successfully.

To check the status of an upload, poll one of the following endpoints.

General status:

/api/v1/data-imports

Specific upload status:

/api/v1/data-imports/<id>

Examples:

General:

curl -s "$SIFT_API_HOST/api/v1/data-imports" -H "authorization: Bearer $SIFT_API_KEY"

Specific:

curl -s "$SIFT_API_HOST/api/v1/data-imports/$UPLOAD_ID" -H "authorization: Bearer $SIFT_API_KEY"
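
Polling can be sketched in Python as a small loop. This is only an outline under stated assumptions: fetch_status stands in for a real GET to the specific status endpoint, and the status strings used here ("PENDING", "IN_PROGRESS", "SUCCEEDED", "FAILED") are hypothetical, since this page does not describe the response body.

```python
import time
from typing import Callable, Iterable


def wait_for_import(
    fetch_status: Callable[[], str],
    terminal_statuses: Iterable[str],
    interval_seconds: float = 2.0,
    max_attempts: int = 30,
) -> str:
    """Call fetch_status until it returns a terminal status or attempts run out."""
    terminal = set(terminal_statuses)
    for attempt in range(max_attempts):
        status = fetch_status()
        if status in terminal:
            return status
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    raise TimeoutError(f"import not finished after {max_attempts} polls")


# Stand-in for real HTTP calls; the status values are hypothetical.
fake_responses = iter(["PENDING", "IN_PROGRESS", "SUCCEEDED"])
final_status = wait_for_import(
    lambda: next(fake_responses),
    terminal_statuses={"SUCCEEDED", "FAILED"},
    interval_seconds=0.0,
)
```

Passing the fetch function in as a parameter keeps the loop independent of any particular HTTP library and makes it easy to try out without a network connection.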

CSV Configuration

The CSV configuration is a JSON object in the following format:

{
  "asset_name": string,     // Required. Your asset name.
  "run_name": string,       // Optional. The name of the run to create for this data.
  "run_id": string,         // Optional. The id of the run to add this data to.
                            // If set, "run_name" is ignored.
  "first_data_row": number, // The first row to start reading as data. Can be used to skip header rows.
                            // Note: The first row in the file is 1.
  "time_column": {
    "format": string,        // Required. See "Time Formats" below for options.
    "column_number": number, // The (1-indexed) column number of the column
                             // containing the time data.
    "relative_start_time": string // Required if "format" is a relative time format.
                                  // Must be in RFC3339 format.
                                  // Will be added to all relative times to
                                  // generate an absolute timestamp.
  },
  "data_columns": {
    // Map index is the 1-indexed column number of the data column.
    "2": {
      "name": string,        // Required. The name of the channel.
      "data_type": string,   // Required. See "Data Types" below for options.
      "component": string,   // Optional. The channel component.
      "units": string,       // Optional. Channel units. Defaults to the empty unit.
      "description": string, // Optional. Description of the channel.
      "enum_types": [        // Optional. Only valid if data_type is "CHANNEL_DATA_TYPE_ENUM"
        {
          "key": number,     // The raw enum value
          "name": string     // The display value for the enum value
        },
        ...
      ],
      "bit_field_elements": [  // Optional. Only valid if data_type is "CHANNEL_DATA_TYPE_BIT_FIELD"
        {
          "index": number,    // Starting index of the bit_field
          "name": string,     // Name of the bit field element.
          "bit_count": number // Number of bits in the element
        },
        ...
      ]
    },
  },
  ...
}

Columns not specified in the configuration are not ingested.
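
To make the schema concrete, here is a sketch of a filled-in configuration built in Python for a hypothetical three-column file (a timestamp column followed by velocity and pressure). The asset name, run name, channel names, and units are placeholders chosen for illustration.

```python
import json

csv_config = {
    "asset_name": "my_asset",  # placeholder
    "run_name": "my_run",      # placeholder
    "first_data_row": 2,       # row 1 holds the header, so data starts at row 2
    "time_column": {
        "format": "TIME_FORMAT_ABSOLUTE_DATETIME",
        "column_number": 1,
    },
    "data_columns": {
        # Keys are 1-indexed column numbers, written as strings.
        "2": {
            "name": "velocity",
            "data_type": "CHANNEL_DATA_TYPE_DOUBLE",
            "units": "m/s",
        },
        "3": {
            "name": "pressure",
            "data_type": "CHANNEL_DATA_TYPE_DOUBLE",
        },
    },
}

# Wrap the configuration the way the upload endpoint expects it.
upload_config = json.dumps({"csv_config": csv_config}, indent=2)
```

Generating the JSON from a dictionary avoids hand-editing mistakes such as missing commas or unbalanced quotes.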

Time Formats

You can specify the following time formats:

Absolute

  • TIME_FORMAT_ABSOLUTE_RFC3339: Example: 2023-01-02T15:04:05Z
  • TIME_FORMAT_ABSOLUTE_DATETIME: Example: 2023-01-02 15:04:05
  • TIME_FORMAT_ABSOLUTE_UNIX_SECONDS: Seconds since the Unix epoch.
  • TIME_FORMAT_ABSOLUTE_UNIX_MILLISECONDS: Milliseconds since the Unix epoch.
  • TIME_FORMAT_ABSOLUTE_UNIX_MICROSECONDS: Microseconds since the Unix epoch.
  • TIME_FORMAT_ABSOLUTE_UNIX_NANOSECONDS: Nanoseconds since the Unix epoch.

Relative

  • TIME_FORMAT_RELATIVE_NANOSECONDS
  • TIME_FORMAT_RELATIVE_MICROSECONDS
  • TIME_FORMAT_RELATIVE_MILLISECONDS
  • TIME_FORMAT_RELATIVE_SECONDS
  • TIME_FORMAT_RELATIVE_MINUTES
  • TIME_FORMAT_RELATIVE_HOURS
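
With a relative format, each value in the time column is added to "relative_start_time" to produce an absolute timestamp. The Python sketch below illustrates that arithmetic only; it is not Sift's implementation, the helper name to_absolute is made up for the example, and Python's datetime keeps microsecond precision, so nanosecond values are rounded here.

```python
from datetime import datetime, timedelta, timezone

# Nanoseconds per unit for each relative format.
NANOS_PER_UNIT = {
    "TIME_FORMAT_RELATIVE_NANOSECONDS": 1,
    "TIME_FORMAT_RELATIVE_MICROSECONDS": 1_000,
    "TIME_FORMAT_RELATIVE_MILLISECONDS": 1_000_000,
    "TIME_FORMAT_RELATIVE_SECONDS": 1_000_000_000,
    "TIME_FORMAT_RELATIVE_MINUTES": 60 * 1_000_000_000,
    "TIME_FORMAT_RELATIVE_HOURS": 3_600 * 1_000_000_000,
}


def to_absolute(relative_start_time: str, value: float, time_format: str) -> datetime:
    """Add a relative time value to an RFC3339 start time."""
    # datetime.fromisoformat on older Python versions does not accept "Z".
    start = datetime.fromisoformat(relative_start_time.replace("Z", "+00:00"))
    nanos = round(value * NANOS_PER_UNIT[time_format])
    return start + timedelta(microseconds=nanos / 1_000)


# 90 minutes after the start time.
absolute = to_absolute("2023-01-02T15:04:05Z", 90, "TIME_FORMAT_RELATIVE_MINUTES")
```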

Data Types

You can specify the following data types:

  • CHANNEL_DATA_TYPE_DOUBLE: A double precision floating point number.
  • CHANNEL_DATA_TYPE_FLOAT: A single precision floating point number.
  • CHANNEL_DATA_TYPE_STRING: A string.
  • CHANNEL_DATA_TYPE_BOOL: A boolean.
  • CHANNEL_DATA_TYPE_INT_32: A 32-bit signed integer.
  • CHANNEL_DATA_TYPE_INT_64: A 64-bit signed integer.
  • CHANNEL_DATA_TYPE_UINT_32: A 32-bit unsigned integer.
  • CHANNEL_DATA_TYPE_UINT_64: A 64-bit unsigned integer.
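  • CHANNEL_DATA_TYPE_ENUM: An enumeration. Map raw values to display names with "enum_types".
  • CHANNEL_DATA_TYPE_BIT_FIELD: A bit field. Describe its elements with "bit_field_elements".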
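
As an illustration of how "index" and "bit_count" in "bit_field_elements" could partition a raw value, the Python sketch below decodes an integer into named elements. It assumes that "index" counts from the least significant bit, which this page does not state, so treat the bit ordering as an assumption. The element names are invented for the example.

```python
from typing import Dict, List


def decode_bit_field(raw: int, elements: List[dict]) -> Dict[str, int]:
    """Extract each named element from a raw bit field value.

    Assumes "index" is the offset of the element's lowest bit, counted
    from the least significant bit of the raw value.
    """
    decoded = {}
    for element in elements:
        mask = (1 << element["bit_count"]) - 1
        decoded[element["name"]] = (raw >> element["index"]) & mask
    return decoded


elements = [
    {"index": 0, "name": "armed", "bit_count": 1},       # hypothetical element
    {"index": 1, "name": "mode", "bit_count": 3},        # hypothetical element
    {"index": 4, "name": "fault_code", "bit_count": 4},  # hypothetical element
]
decoded = decode_bit_field(0b10100101, elements)
```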

Embedded Configuration

You can also supply the configuration for each column as a JSON-encoded object in the CSV file itself. Each column must encode its configuration object separately as a CSV string. The configuration can be in either the first or second row of the data.

The objects for the time column and data column correspond to the time_column and data_columns objects from the CSV configuration.

For example, the time column object will look like this:

{
  "format": "TIME_FORMAT_ABSOLUTE_RFC3339"
}

And each data column object will look like this:

{
  "name": "my column name",
  "data_type": "CHANNEL_DATA_TYPE_DOUBLE"
}
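
Because each configuration object contains quotes (and often commas), it has to be escaped as a CSV string. One way to produce such a file is to let Python's csv module handle the quoting, as in the sketch below. The channel names and data values are placeholders, and the configuration row is written as the first row.

```python
import csv
import io
import json

# One configuration object per column, in column order.
columns = [
    {"format": "TIME_FORMAT_ABSOLUTE_DATETIME"},
    {"name": "velocity", "data_type": "CHANNEL_DATA_TYPE_DOUBLE"},
    {"name": "pressure", "data_type": "CHANNEL_DATA_TYPE_DOUBLE"},
]
rows = [
    ["2024-10-07 17:00:09.982126", 0.9869788584872923, 0.4321820341919653],
    ["2024-10-07 17:00:10.002126", 0.5701255316316417, 0.5914707762677202],
]

buffer = io.StringIO()
writer = csv.writer(buffer)
# csv.writer quotes each JSON string and doubles the embedded quotes.
writer.writerow([json.dumps(column) for column in columns])
writer.writerows(rows)
csv_text = buffer.getvalue()
```

Writing csv_text to a file yields a CSV whose first row carries the per-column configuration and whose remaining rows carry the data.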

Import Data from CSV files using sift-stack-py

Getting started with sift-stack-py

Be sure to visit the Python Quickstart section of the documentation to see how to get started with the Sift Python client library before continuing.

In this example, the goal is to upload a CSV file called sample_asset.csv with the following format:

timestamp,velocity,pressure
2024-10-07 17:00:09.982126,0.9869788584872923,0.4321820341919653
2024-10-07 17:00:10.002126,0.5701255316316417,0.5914707762677202
2024-10-07 17:00:10.022126,0.49446373422349477,0.3195179734137701

Using sift-stack-py, we can upload the CSV as is:

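import os

from sift_py.data_import.csv import CsvUploadService

# Placeholder inputs: the environment variable names and the asset name
# below are examples only; substitute the values for your environment.
sift_uri = os.environ["SIFT_API_HOST"]
apikey = os.environ["SIFT_API_KEY"]
asset_name = "sample_asset"

# Instantiate csv upload service
csv_upload_service = CsvUploadService({
    "uri": sift_uri,
    "apikey": apikey,
})

# Upload the CSV to Sift
import_service = csv_upload_service.simple_upload(
    asset_name,
    "sample_asset.csv",
)

# Wait until the upload is complete and store the uploaded file's metadata
# into the 'uploaded_file' variable.
uploaded_file = import_service.wait_until_complete()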

For more comprehensive examples you can visit the Sift public repository.
