Overview
Sift enables users to import data either programmatically or directly through the UI. The Sift UI supports several file formats natively, while the Sift Python client library (sift-stack-py) extends support to additional formats. The table below summarizes the supported file formats and the import methods available for each.

Supported formats
| Format | Supported in Sift UI | Supported in sift-stack-py |
|---|---|---|
| CSV | Yes | Yes |
| Parquet | Yes | Yes |
| TDMS | Yes | Yes |
| Chapter 10 | Yes | Yes |
| Rosbags | Coming Soon | Yes |
Instructions
- CSV
- Parquet
- TDMS
- Chapter 10
- Rosbags
- UI
- REST API
- CSV URL
- Python client
To upload a CSV file:
- In Sift, click .
- Click Import data.
- Select a CSV file.
- Asset: Associate the CSV file with an Asset:
  - Existing Asset: In the Asset list, select the Asset to associate with the CSV file.
  - New Asset: Click New Asset, then enter a name for the new Asset in the Asset box.
- Run: Associate the CSV file with a Run:
  - New Run: Optional: In the Run box, edit the name for the new Run.
  - Existing Run: In the Run list, select a Run to associate with the CSV file.
- In the First data row box, enter the row number where time-series data begins.
- In the Timestamp column list, select the column containing timestamps. Auto-detected but editable.
- In the Timestamp format list, select the format of the timestamp column.
- Optional: Edit the Channel configuration settings.
- Click Upload.
To upload a CSV file from your local environment using the REST API:
Create a CSV upload

Send a request to the upload endpoint with a configuration that describes how Sift should interpret the CSV file.

```shell
curl "$SIFT_API_HOST/api/v2/data-imports:upload" \
  -H "authorization: Bearer $SIFT_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "csv_config": {
      "asset_name": "mars-weather",
      "run_name": "mars-test-run",
      "first_data_row": 2,
      "time_column": {
        "column_number": 1,
        "format": "TIME_FORMAT_ABSOLUTE_RFC3339"
      },
      "data_columns": {
        "2": {"name": "mars_date_time", "data_type": "CHANNEL_DATA_TYPE_STRING"},
        "3": {"name": "sol_number", "data_type": "CHANNEL_DATA_TYPE_STRING"},
        "4": {"name": "max_ground_temp(C)", "data_type": "CHANNEL_DATA_TYPE_DOUBLE", "units": "Celsius"},
        "5": {"name": "min_ground_temp(C)", "data_type": "CHANNEL_DATA_TYPE_DOUBLE", "units": "Celsius"},
        "6": {"name": "max_air_temp(C)", "data_type": "CHANNEL_DATA_TYPE_DOUBLE", "units": "Celsius"},
        "7": {"name": "min_air_temp(C)", "data_type": "CHANNEL_DATA_TYPE_DOUBLE", "units": "Celsius"},
        "8": {"name": "mean_pressure(Pa)", "data_type": "CHANNEL_DATA_TYPE_DOUBLE", "units": "Pa"},
        "9": {"name": "sunrise", "data_type": "CHANNEL_DATA_TYPE_STRING"},
        "10": {"name": "sunset", "data_type": "CHANNEL_DATA_TYPE_STRING"},
        "11": {"name": "UV_Radiation", "data_type": "CHANNEL_DATA_TYPE_STRING"},
        "12": {"name": "weather", "data_type": "CHANNEL_DATA_TYPE_STRING"}
      }
    }
  }'
```
Upload the CSV file

The response returns an uploadUrl and a dataImportId. Use the returned uploadUrl to upload your CSV file.

```shell
curl "<UPLOAD_URL>" \
  -H "authorization: Bearer $SIFT_API_KEY" \
  --data-binary @mars-weather.csv
```
Optional: Verify the upload

Check the ingestion status using the dataImportId.

```shell
curl "$SIFT_API_HOST/api/v2/data-imports/<IMPORT_ID>" \
  -H "authorization: Bearer $SIFT_API_KEY"
```
To import a CSV file from a hosted location using the REST API:
Provide the CSV URL and configuration

Send a request to the CSV URL import endpoint with the location of the CSV file and a configuration describing how Sift should interpret the file. After the request succeeds, Sift retrieves the CSV file from the specified URL and begins ingestion.

```shell
curl -X POST "YOUR_REST_API_URL_HERE/api/v2/data-imports:url" \
  -H "Authorization: Bearer YOUR_API_KEY_HERE" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "YOUR_CSV_FILE_URL_HERE",
    "csvConfig": {
      "assetName": "fl_demo_asset",
      "runName": "fl_demo_run",
      "firstDataRow": 2,
      "timeColumn": {
        "columnNumber": 1,
        "format": "TIME_FORMAT_ABSOLUTE_RFC3339"
      },
      "dataColumns": {
        "2": {"name": "mars_date_time", "dataType": "CHANNEL_DATA_TYPE_STRING"},
        "3": {"name": "sol_number", "dataType": "CHANNEL_DATA_TYPE_STRING"},
        "4": {"name": "max_ground_temp(C)", "dataType": "CHANNEL_DATA_TYPE_DOUBLE", "units": "Celsius"},
        "5": {"name": "min_ground_temp(C)", "dataType": "CHANNEL_DATA_TYPE_DOUBLE", "units": "Celsius"},
        "6": {"name": "max_air_temp(C)", "dataType": "CHANNEL_DATA_TYPE_DOUBLE", "units": "Celsius"},
        "7": {"name": "min_air_temp(C)", "dataType": "CHANNEL_DATA_TYPE_DOUBLE", "units": "Celsius"},
        "8": {"name": "mean_pressure(Pa)", "dataType": "CHANNEL_DATA_TYPE_DOUBLE", "units": "Pa"},
        "9": {"name": "sunrise", "dataType": "CHANNEL_DATA_TYPE_STRING"},
        "10": {"name": "sunset", "dataType": "CHANNEL_DATA_TYPE_STRING"},
        "11": {"name": "UV_Radiation", "dataType": "CHANNEL_DATA_TYPE_STRING"},
        "12": {"name": "weather", "dataType": "CHANNEL_DATA_TYPE_STRING"}
      }
    }
  }'
```
Optional: Verify the import

You can check the ingestion status using the returned dataImportId.

```shell
curl "$SIFT_API_HOST/api/v2/data-imports/<IMPORT_ID>" \
  -H "authorization: Bearer $SIFT_API_KEY"
```
To upload a CSV file using the Python client:
Install the Python client

```shell
pip install sift-stack-py
```
Initialize the Sift client

Create a client using your Sift API key and API endpoint.

```python
from sift_client import SiftClient

client = SiftClient(
    api_key="YOUR_API_KEY_HERE",
    rest_url="YOUR_REST_API_URL_HERE",
)
```
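With the client configured, the upload itself can be driven through the sift-stack-py data-import services. The sketch below assumes sift_py provides a CsvUploadService with a simple_upload method, mirroring the Parquet and TDMS services shown later on this page; verify the exact names against your installed version of the library.

```python
# Assumed import path, mirroring sift_py.data_import.parquet / .tdms below:
# from sift_py.data_import.csv import CsvUploadService

def upload_csv(service, asset_name: str, csv_path: str):
    """Upload a CSV via a sift_py upload service and block until ingestion finishes.

    'service' is expected to expose simple_upload() returning an object with
    wait_until_complete(), as the other sift_py data-import services do.
    """
    import_service = service.simple_upload(asset_name, csv_path)
    return import_service.wait_until_complete()

# Usage (assumed):
# service = CsvUploadService({"uri": sift_uri, "apikey": apikey})
# uploaded_file = upload_csv(service, "my_asset", "sample_data.csv")
```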
Sift supports Parquet files with a flat schema, where each telemetry Channel is represented as an individual column. To learn more, see Parquet: Format.
- UI
- REST API
- Python client
- URL
- On the Sift home page, click Import Data.
- Select a Parquet file.
- Select the file layout: flat dataset or channel per row.
- Configure the upload settings.
- Select Upload.
cURL lets you interact with the Sift REST API directly. This is useful if you want to test ingestion quickly from the command line, or if you are in an environment where installing the Python client isn’t practical. The process involves three steps: create a JSON config, request an upload URL, and then upload your Parquet file.
Before making the request, create a JSON file (for example my-upload-config.json) that describes how Sift should interpret your Parquet file (see Parquet ingestion configuration for details):

```json
{
  "parquet_config": { ... }
}
```
Send the configuration JSON (my-upload-config.json) to the REST API by POSTing to /api/v2/data-imports:upload. If the request is valid, Sift returns a temporary upload URL.

```shell
curl "$SIFT_API_HOST/api/v2/data-imports:upload" \
  --data-binary @my-upload-config.json \
  -H "authorization: Bearer $SIFT_API_KEY"
```
The response contains the temporary upload URL:

```json
{"uploadUrl":"http://$SIFT_REST_URL/api/v2/data-imports:upload/<UPLOAD_ID>"}
```
Use the uploadUrl returned in the previous step to upload your Parquet file.

```shell
curl "$SIFT_API_HOST/api/v2/data-imports/<UPLOAD_ID>" \
  --data-binary @my-data.parquet \
  -H "authorization: Bearer $SIFT_API_KEY"
```
GZIP: To upload a GZIP-compressed file, add the header -H "content-encoding: gzip".

Use the sift-stack-py client library to upload a Parquet file directly into Sift. This method interacts with the gRPC API.

Upload Parquet file
```python
from sift_py.data_import.parquet import ParquetUploadService
from sift_py.data_import.time_format import TimeFormatType

# Initialize the Parquet upload service
parquet_upload_service = ParquetUploadService({
    "uri": sift_uri,
    "apikey": apikey,
})

# Ingest the Parquet file into Sift
import_service = parquet_upload_service.flat_dataset_upload(
    asset_name,
    "sample_data.parquet",
    time_path="timestamp",
    time_format=TimeFormatType.ABSOLUTE_UNIX_NANOSECONDS,
)

# Wait for ingestion to finish and save the resulting file metadata in 'uploaded_file'
uploaded_file = import_service.wait_until_complete()
```
Time: time_path must be a timestamp column, and time_format must be one of the supported formats (see Time formats).

Examples: For additional usage patterns, see the Parquet ingestion examples in the Sift public repository.
Instead of uploading a local file, you can tell Sift to fetch and ingest a Parquet file directly from a remote location. Both HTTP and S3 sources are supported. This method requires a configuration JSON that specifies the file URL and how to interpret the Parquet data.
Create a JSON file (for example my-url-config.json) with the remote file URL and a Parquet configuration (see Parquet ingestion configuration for details):

```json
{
  "url": string,
  "parquet_config": { ... }
}
```
Start ingestion by POSTing the configuration to /api/v2/data-imports:url. If the request is valid, the endpoint returns a 200 response code, and Sift fetches the file from the given URL and begins ingestion.

```shell
curl "$SIFT_API_HOST/api/v2/data-imports:url" \
  --data-binary @my-url-config.json \
  -H "authorization: Bearer $SIFT_API_KEY"
```
GZIP: If the remote file is GZIP-compressed, make sure the server sets the content-encoding: gzip header.

Parquet ingestion configuration

The JSON configuration below defines how Sift should interpret and ingest your Parquet file. It is required when ingesting with cURL or via URL ingestion.
{
"url": string, // Required only for URL ingestion. The remote file location (HTTP or S3).
"parquet_config": { // Required. Defines how to ingest the Parquet file.
"assetName": string, // Required. Name of the Asset to attach this data to.
"runName": string, // Optional. Name of the Run to create for this data.
"runId": string, // Optional. ID of the Run to add this data to.
// If set, "runName" is ignored.
"flatDataset": { // Required. ParquetFlatDatasetConfig object.
"timeColumn": {
"path": string, // Required. Path to the time column.
"format": string, // Required. See the "Time formats" section for options.
"relativeStartTime": string // Optional. RFC3339 format. Used for relative time formats.
},
"dataColumns": [
{
"path": string, // Required. Path to the data column. See the "Nested columns"
// section for how to specify the path for nested columns.
"channelConfig": {
"name": string, // Required. Name of the Channel.
"dataType": string, // Required. See the "Data types" section for options.
"units": string, // Optional. Channel units (defaults to empty).
"description": string, // Optional. Channel description.
"enumTypes": [ // Optional. Only valid if dataType is ENUM.
{
"key": number, // Raw enum value.
"name": string // Display value for the enum.
}
],
"bitFieldElements": [ // Optional. Only valid if dataType is BIT_FIELD.
{
"index": number, // Starting index of the bit field.
"name": string, // Name of the bit field element.
"bitCount": number // Number of bits in the element.
}
]
}
}
]
},
"footerOffset": number, // Required. Byte position where the Parquet footer starts.
"footerLength": number, // Required. Length of the Parquet footer in bytes.
"complexTypesImportMode": string // Optional. See the "Complex types" section for options.
}
}
Columns: Columns not specified in the configuration are not ingested.
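For orientation, a minimal filled-in configuration following the schema above might look like this; the asset, run, and column names are illustrative, and the footer values must be computed from your own file (see Parquet footer):

```json
{
  "parquet_config": {
    "assetName": "my_asset",
    "runName": "my_run",
    "flatDataset": {
      "timeColumn": {
        "path": "timestamp",
        "format": "TIME_FORMAT_ABSOLUTE_RFC3339"
      },
      "dataColumns": [
        {
          "path": "velocity",
          "channelConfig": {
            "name": "velocity",
            "dataType": "CHANNEL_DATA_TYPE_DOUBLE",
            "units": "m/s"
          }
        }
      ]
    },
    "footerOffset": 123456,
    "footerLength": 789
  }
}
```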
Parquet upload status

When you submit a Parquet ingestion request, a successful response (HTTP 200/OK) only confirms that the request was accepted, not that the file has been fully ingested. To verify ingestion, use the data-import endpoints to check the upload status:

```shell
curl -s "$SIFT_API_HOST/api/v2/data-imports" -H "authorization: Bearer $SIFT_API_KEY"
```
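Because ingestion is asynchronous, scripts often poll the data-import endpoint until it reports a terminal state. The sketch below keeps the HTTP call abstract: fetch_status stands in for the GET request above, and the exact status strings are assumptions to check against the API reference.

```python
import time


def wait_for_import(fetch_status, poll_seconds: float = 2.0, max_polls: int = 150) -> dict:
    """Poll fetch_status() until the import reaches a terminal state.

    fetch_status() should return the parsed JSON for one data import,
    e.g. the body of GET /api/v2/data-imports/<IMPORT_ID>.
    """
    # Assumed terminal values of the 'status' field; verify against the API.
    terminal = {"DATA_IMPORT_STATUS_SUCCEEDED", "DATA_IMPORT_STATUS_FAILED"}
    for _ in range(max_polls):
        info = fetch_status()
        if info.get("status") in terminal:
            return info
        time.sleep(poll_seconds)
    raise TimeoutError("data import did not reach a terminal state in time")
```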
Time formats
When ingesting Parquet data, you must specify how timestamps are interpreted. These formats apply both to the timeColumn.format field in the JSON configuration (for cURL and URL ingestion) and to the time_format parameter when using the Python client.

Supported formats fall into two groups:

Absolute (timestamps tied to a fixed point in time)

Relative (timestamps relative to a start time)
Data types
Data types are specified per Channel in the channelConfig.dataType field of your JSON configuration. You can use the following options:

Complex types

Parquet files may contain lists or maps. The complexTypesImportMode field in the Parquet configuration (JSON file) controls how these are ingested. The table below lists the available options.

Default value: If the complexTypesImportMode field is not included in the Parquet configuration JSON file, Sift uses PARQUET_COMPLEX_TYPES_IMPORT_MODE_BOTH by default.

Nested columns

To specify a nested column in your Parquet configuration, use the | character as a separator in the path field. For example, if your Parquet file contains a struct column named location with a nested field lat, set the path to "location|lat". This tells Sift to ingest the nested lat field inside the location struct.

Parquet configuration detection

If you do not need to customize how your data is imported, you can use the detect-config endpoint to automatically generate a Parquet configuration. This endpoint analyzes your Parquet file footer and returns a configuration object with most fields pre-filled.
You will only need to manually set the timestamp information (time_column) and the footer information (footer_offset and footer_length) before uploading. To use this endpoint:

- Send a POST request with your Parquet file footer to the /api/v0/data-imports:detect-config endpoint (note: this is a v0 endpoint).
- Review the response, which includes a suggested parquetConfig object generated from the file's footer.
- Edit the returned configuration to specify the correct time column (timeColumn) and the footer values (footerOffset and footerLength).
```python
# The following example shows how to extract the footer bytes from a Parquet file and submit
# them to the `detect-config` endpoint to generate a configuration.
import base64
import json
import os

import requests

file_path = "sample_asset.parquet"

with open(file_path, "rb") as f:
    # See the "Parquet footer" section for how to calculate this.
    footer_len, footer_offset = extract_footer_information(file_path)
    f.seek(-(footer_len + 8), os.SEEK_END)
    footer_bytes = f.read(footer_len)

# Encode footer for detect-config
encoded_data = base64.b64encode(footer_bytes).decode("utf-8")
request_data = json.dumps({
    "data": encoded_data,
    "type": "DATA_TYPE_KEY_PARQUET_FLATDATASET",
})

# POST footer to detect-config endpoint
detect_config_url = f"{sift_uri}/api/v0/data-imports:detect-config"
headers = {"authorization": f"Bearer {apikey}"}
response = requests.post(detect_config_url, data=request_data, headers=headers)
response.raise_for_status()

config_info = response.json()
parquet_config = config_info["parquetConfig"]

# Update detected config with required fields
parquet_config["assetName"] = asset_name
parquet_config["flatDataset"]["timeColumn"]["path"] = "timestamp"
parquet_config["flatDataset"]["timeColumn"]["format"] = "TIME_FORMAT_ABSOLUTE_DATETIME"
parquet_config["footerOffset"] = footer_offset
parquet_config["footerLength"] = footer_len

# Use the updated parquet_config in your upload request to complete ingestion
# ...
```
Parquet footer
The Parquet footer is a metadata block at the end of every Parquet file. It contains schema information, row group metadata, and other details required for reading the file. When uploading Parquet files to Sift, you must specify the footerOffset (the byte position where the footer starts) and footerLength (the size of the footer in bytes) in your configuration.

```
[ row groups ... ][ footer (metadata) ][ footer length (4 bytes) ][ "PAR1" (4 bytes) ]
```

To fill in footerOffset and footerLength in your configuration JSON, you need to extract them from the Parquet file itself. The following example shows one common method:
```python
import os
import struct

file_path = "example.parquet"

with open(file_path, "rb") as f:
    # Get file size
    f.seek(0, os.SEEK_END)
    file_size = f.tell()

    # Read last 8 bytes: footer length + magic
    f.seek(-8, os.SEEK_END)
    footer_info = f.read(8)

# First 4 bytes = footer length (little-endian uint32)
footer_len = struct.unpack("<I", footer_info[:4])[0]

# Last 4 bytes = magic "PAR1"
magic = footer_info[4:]
if magic != b"PAR1":
    raise ValueError("Not a valid Parquet file: missing PAR1 magic bytes")

# Footer offset = file size - footer_len - 8 (length + magic)
footer_offset = file_size - footer_len - 8
```
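The same logic can be packaged as the extract_footer_information helper referenced in the detect-config example earlier. This is a sketch; only the helper's name and return order are taken from that example.

```python
import os
import struct


def extract_footer_information(file_path: str) -> tuple:
    """Return (footer_len, footer_offset) for a Parquet file.

    Reads the trailing 8 bytes: a little-endian uint32 footer length
    followed by the 4-byte "PAR1" magic.
    """
    with open(file_path, "rb") as f:
        f.seek(0, os.SEEK_END)
        file_size = f.tell()
        f.seek(-8, os.SEEK_END)
        footer_info = f.read(8)

    if footer_info[4:] != b"PAR1":
        raise ValueError("Not a valid Parquet file: missing PAR1 magic bytes")

    footer_len = struct.unpack("<I", footer_info[:4])[0]
    footer_offset = file_size - footer_len - 8
    return footer_len, footer_offset
```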
- Credentials
- Before starting this section, ensure you have retrieved your API key and the appropriate Sift URL for your provisioned environment.
- For steps on obtaining the API key and URL, see Create an API key.
- Getting started with sift-stack-py
- Be sure to visit the Python Quickstart section of the documentation to see how to get started with the Sift Python client library before continuing.
This example uses a sample TDMS file, sample_data.tdms, which can be found on GitHub.

To import data from TDMS files using sift-stack-py:

```python
from sift_py.data_import.tdms import TdmsUploadService

# Instantiate a TDMS upload service
tdms_upload_service = TdmsUploadService({
    "uri": sift_uri,
    "apikey": apikey,
})

# Upload the TDMS file
import_service = tdms_upload_service.upload(
    "sample_data.tdms",
    asset_name,
    group_into_components=True,
)

# Wait until the upload is complete and store the uploaded file's metadata
# in the 'uploaded_file' variable.
uploaded_file = import_service.wait_until_complete()
```
Chapter 10 ingestion requires an implementation of BaseCh10File to parse your Chapter 10 file. Please reach out to the Sift team for a specific implementation.

First, create or obtain a parser implementation:

```python
from typing import Dict

from sift_py.data_import.ch10 import BaseCh10File


class MyCh10Parser(BaseCh10File):
    """Implements BaseCh10File."""

    def __init__(self, path):
        self.file = open(path, "rb")
        self.csv_config_data_columns = None

    def initialize_csv_data_columns(self):
        self.csv_config_data_columns = self.process_ch10_computer_f1_packet()

    def process_ch10_computer_f1_packet(self) -> Dict[int, dict]:
        # Processes the first Computer F1 packet
        # and returns the measurements as a dict.
        ...

    def process_ch10_pcm_packet(self) -> str:
        # Processes the data packets and returns
        # a CSV row.
        ...

    def __next__(self) -> str:
        # On each iteration, return data for the CSV file.
        # (Determining end_of_file is parser-specific.)
        if end_of_file:
            raise StopIteration()
        return self.process_ch10_pcm_packet()
```
Then upload the parsed file with the Chapter 10 upload service:

```python
from sift_py.data_import.ch10 import Ch10UploadService

# Instantiate a Chapter 10 upload service
ch10_upload_service = Ch10UploadService({
    "uri": sift_uri,
    "apikey": apikey,
})

ch10_file = MyCh10Parser("sample_data.ch10")

# Upload the Chapter 10 file
import_service = ch10_upload_service.upload(
    ch10_file,
    asset_name,
)

# Wait until the upload is complete and store the uploaded file's metadata
# in the 'uploaded_file' variable.
uploaded_file = import_service.wait_until_complete()
```
This example uses the sample_data rosbag and the std_msgs message definitions referenced in the Sift documentation. Uploading this Rosbag file takes only a few steps:

```python
from rosbags.typesys import Stores  # message-definition store from the 'rosbags' package

from sift_py.data_import.rosbags import RosbagsUploadService

# Instantiate a rosbag upload service.
ros2_upload_service = RosbagsUploadService({
    "uri": sift_uri,
    "apikey": apikey,
})

# Upload the rosbag file, specifying the path to the relevant message definitions.
import_service = ros2_upload_service.upload(
    "data/sample_data",
    ["data/std_msgs"],
    Stores.ROS2_HUMBLE,
    asset_name,
)

# Wait until the upload is complete and store the uploaded file's metadata
# in the 'uploaded_file' variable.
uploaded_file = import_service.wait_until_complete()
```
Video topics in a rosbag can be extracted during upload by registering a per-topic handler that pipes frames to ffmpeg:

```python
import ffmpeg  # ffmpeg-python bindings

from rosbags.typesys import Stores  # message-definition store from the 'rosbags' package
from sift_py.data_import.rosbags import RosbagsUploadService

# 'width', 'height', and 'output_video_filename' describe your video stream.
video_processor = (
    ffmpeg.input("pipe:", s=f"{width}x{height}", framerate=30)
    .output(output_video_filename, pix_fmt="yuv420p")
    .run_async(pipe_stdin=True)
)

# Callback handler to write video frames.
def write_video_frame_handler(topic, timestamp, msg):
    video_processor.stdin.write(msg.data)

# Instantiate a rosbag upload service.
ros2_upload_service = RosbagsUploadService({
    "uri": sift_uri,
    "apikey": apikey,
})

# Upload the rosbag file, specifying the path to the relevant message definitions
# and the handler for the video topic.
import_service = ros2_upload_service.upload(
    "data/sample_data",
    ["data/std_msgs"],
    Stores.ROS2_HUMBLE,
    asset_name,
    handlers={"/my/video/topic": write_video_frame_handler},
)

video_processor.stdin.close()
video_processor.wait()

# Wait until the upload is complete and store the uploaded file's metadata
# in the 'uploaded_file' variable.
uploaded_file = import_service.wait_until_complete()
```