Bulk Data Import API

The Import API allows you to import historical user data from other platforms into Userpilot. With this API, you can migrate data related to user and company identification, page views, and custom events. This enables a seamless transition and continuity of insights, providing a comprehensive view of user interactions.

This feature is available on the Enterprise plan or as an add-on to the Growth plan. To add it to your account, contact support@userpilot.co.

Use Cases

Data Migration: Migrate historical data from other analytics platforms to Userpilot, ensuring no loss of valuable insights.
Retrospective Analysis: Analyze historical user behavior and feature engagement by importing past data on user actions, page views, and events.

Tips:

If you're migrating from another provider, complete the import process before installing the SDK. This ensures historical data is ready for analysis once Userpilot starts tracking live events.

Authorization

The Userpilot API authenticates requests with an API key. You can find your API key on the Environment Page.

Your API key carries many privileges, so be sure to keep it secure! Do not share your secret API keys in publicly accessible areas.

Authentication Method:

Include your API key in the Authorization header:

Authorization: Token {{API_KEY}}

Note:

All API requests must be made over HTTPS.
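
As an illustration, the header above could be built in Python roughly as follows. The helper name auth_headers and the environment variable USERPILOT_API_KEY are assumptions made for this sketch; they are not part of the Userpilot API.

```python
import os


def auth_headers(api_key=None):
    """Build the Authorization header in the `Token {{API_KEY}}` form.

    Falls back to the USERPILOT_API_KEY environment variable (a hypothetical
    name) so the key does not have to be hard-coded in source files.
    """
    key = api_key or os.environ.get("USERPILOT_API_KEY", "")
    if not key:
        raise ValueError("API key is missing")
    return {"Authorization": f"Token {key}"}
```

Reading the key from the environment keeps it out of version control, in line with the advice above about not exposing secret keys.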

Rate Limits

The import endpoint allows only one import job per application token at a time. If you attempt to create another import job while one is already in progress, the API returns a 409 Conflict error.
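
One way to cope with this limit is to wait and retry while the API keeps answering 409. The sketch below assumes a caller-supplied function start_import (hypothetical) that performs the POST and returns a (status_code, parsed_body) tuple; the attempt count and wait time are arbitrary illustration values.

```python
import time


def start_import_when_free(start_import, max_attempts=5, wait_seconds=30, sleep=time.sleep):
    """Call `start_import` until it stops returning 409 Conflict.

    `start_import` is a hypothetical caller-supplied function that sends the
    import request and returns (status_code, parsed_json_body).
    """
    for attempt in range(1, max_attempts + 1):
        status, body = start_import()
        if status != 409:
            return status, body
        if attempt < max_attempts:
            sleep(wait_seconds)  # another job is still running; wait, then retry
    raise RuntimeError("an import job is still in progress after all attempts")
```

Passing sleep as a parameter makes the helper easy to exercise without real delays.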

Endpoints

HTTP Endpoints

For most users, the HTTP API endpoint URL is https://analytex.userpilot.io. Use it in place of {{ENDPOINT}} in the examples.

However, if you are on Enterprise or EU hosting, refer to the Environment Page in the application to retrieve your dedicated endpoint.

1. Request Data Import

Endpoint:

POST {{ENDPOINT}}/v1/imports

Description:

Initiates a data import job. Accepts a file in CSV or NDJSON format containing historical data.

Headers

Header	Value
`Content-Type`	`multipart/form-data`
`Authorization`	`Token {{API_KEY}}`

Request Parameters:

Parameter	Type	Required	Description
file	File	Yes	The CSV or NDJSON file containing the data to be imported. See the event schema for more details.

Data Format: The file must contain a list of events, one per row (CSV) or line (NDJSON). Each event has one of the following event types:

Identify User (identify_user): User data update.
Identify Company (identify_company): Company data update.
Page View (page_view): User page view event.
Track Event (track): Custom user event.

Example Import File

NDJSON File Content

{"event_type": "identify_user", "user_id": "user_123", "inserted_at": "2024-07-28T08:55:35.874555", "metadata": { "name": "John Doe", "email": "john.doe@example.com", "location": "Dublin, Ireland" },"user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"}
{"event_type": "identify_company","company_id": "company_123","inserted_at": "2024-07-28T08:55:35.874555","metadata": {"name": "Acme Labs Inc.","industry": "Software","size": "200-500","location": "San Francisco, CA"}}

CSV File Content

event_type,event_name,source,user_id,company_id,hostname,pathname,country_code,screen_width,screen_height,operating_system,browser,browser_language,user_agent,device_type,inserted_at,metadata
identify_user,,web-client,User-7320-001,Company-607-001,campfire-example.userpilot.io,/profile/settings/,DE,1086,696,Mac,Edge,de-DE,Edge - Mozilla/5.0 (Mac) AppleWebKit/537.36,Mobile,2023-12-26 14:36:15.453703,"{""created_at"": ""1694172975"", ""name"": ""User-Alice"", ""prop_0"": ""value_743"", ""prop_1"": ""value_927""}"
page_view,,web-client,User-9491-002,Company-973-002,campfire-example.userpilot.io,/home/,DE,832,809,iOS,Chrome,fr-FR,Chrome - Mozilla/5.0 (iOS) AppleWebKit/537.36,Mobile,2024-04-13 14:36:15.453788,"{}"
track,created_project,web-client,User-6267-003,Company-584-003,campfire-example.userpilot.io,/profile/settings/,PS,1114,704,iOS,Safari,en-US,Safari - Mozilla/5.0 (iOS) AppleWebKit/537.36,Mobile,2024-05-05 14:36:15.453813,"{""created_at"": ""1730810175"", ""name"": ""User-Bob"", ""prop_0"": ""value_17""}"
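
The two formats above can be produced from the same in-memory events. The following is a minimal sketch, assuming events are held as Python dictionaries; the function names are hypothetical. Note how the metadata object is JSON-encoded into a single CSV cell, which is what produces the doubled quotes seen in the CSV example.

```python
import csv
import io
import json

# Column order copied from the example CSV header above.
COLUMNS = [
    "event_type", "event_name", "source", "user_id", "company_id",
    "hostname", "pathname", "country_code", "screen_width", "screen_height",
    "operating_system", "browser", "browser_language", "user_agent",
    "device_type", "inserted_at", "metadata",
]


def events_to_csv(events):
    """Serialize event dicts to CSV text, JSON-encoding the metadata column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for event in events:
        row = {column: event.get(column, "") for column in COLUMNS}
        row["metadata"] = json.dumps(event.get("metadata", {}))
        writer.writerow(row)
    return buffer.getvalue()


def events_to_ndjson(events):
    """Serialize event dicts to NDJSON: one JSON object per line."""
    return "".join(json.dumps(event) + "\n" for event in events)
```

Letting the csv module handle quoting avoids hand-escaping the embedded JSON.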

Uploading the File

curl -X POST "{{ENDPOINT}}/v1/imports" \
-H "Authorization: Token {{API_KEY}}" \
-F "file=@/path/to/import_data.ndjson"
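
For reference, a rough Python equivalent of the curl command, using only the standard library. This is a sketch under assumptions: the function name build_import_request is hypothetical, and the per-part Content-Type of application/octet-stream is a guess at a reasonable default rather than a documented requirement.

```python
import urllib.request
import uuid


def build_import_request(endpoint, api_key, filename, content):
    """Build (but do not send) the multipart POST for /v1/imports.

    `content` is the raw bytes of the CSV or NDJSON file.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",  # assumed part type
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/v1/imports",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send the request, pass the returned object to urllib.request.urlopen and parse the JSON response body.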

Response:

On Success

{
    "end_time": null,
    "file_size": 2234,
    "filename": "2024-10-28.ndjson",
    "job_id": "imports:jobs:NX-51f4acf7:f681073b-21ac-4276-9af5-c662dc05fb25",
    "links": "/v1/background_jobs/imports:jobs:NX-51f4acf7:f681073b-21ac-4276-9af5-c662dc05fb25",
    "start_time": "2024-11-28T10:34:11.376662",
    "status": "queued",
    "total_rows": 0
}

On Error

{
    "errors": [
        {
            "details": "There is already an export job in progress, you can only have one export job in progress at a time.",
            "error": "Conflict",
            "error_code": "409",
            "message": "The request could not be completed due to a conflict with the current state of the target resource."
        }
    ]
}

2. Get All Import Job Statuses

Endpoint:

GET {{ENDPOINT}}/v1/background_jobs

Description:

Fetches the status of all import jobs, allowing users to monitor the progress of multiple imports at once.

Headers

Header	Value
`Authorization`	`Token {{API_KEY}}`

Request Parameters:

None.

Example Request:

curl -X GET "{{ENDPOINT}}/v1/background_jobs" \
-H "Authorization: Token {{API_KEY}}"

Response:

Empty List:

[]

List of Jobs:

[
    {
        "end_time": null,
        "file_size": 534060000,
        "filename": "2024-09-02-500mb.ndjson",
        "job_id": "imports:jobs:NX-51f4acf7:f681073b-21ac-4276-9af5-c662dc05fb25",
        "links": "/v1/background_jobs/imports:jobs:NX-51f4acf7:f681073b-21ac-4276-9af5-c662dc05fb25",
        "start_time": "2024-11-04T15:17:40.836382",
        "status": "completed",
        "total_rows": 828000
    },
    {
        "end_time": null,
        "file_size": 3949,
        "filename": "2024-10-28.ndjson",
        "job_id": "imports:jobs:NX-51f4acf7:ca6859b7-cc9d-4ab3-801d-294625bd3b2b",
        "links": "/v1/background_jobs/imports:jobs:NX-51f4acf7:ca6859b7-cc9d-4ab3-801d-294625bd3b2b",
        "reason_for_failure": [
            "Invalid type(:none_negative_integer) for page_sessions with value \"-1\"..."
        ],
        "start_time": "2024-11-04T12:21:14.316002",
        "status": "failed",
        "total_rows": 0
    },
    {
        "elapsed_time": 0,
        "end_time": "2024-11-25T14:25:50.234413Z",
        "file_size": 6104,
        "filename": "2024-10-28.ndjson",
        "job_id": "imports:jobs:NX-51f4acf7:b5217b82-71f1-4e48-b858-2a6bdab32e95",
        "links": "/v1/background_jobs/imports:jobs:NX-51f4acf7:b5217b82-71f1-4e48-b858-2a6bdab32e95",
        "reason_for_failure": [
            "Unsupported event type: page_views, allowed types: page_view, track, identify_user, identify_company"
        ],
        "start_time": "2024-11-25T14:25:50.229108",
        "status": "failed",
        "total_rows": 0
    }
]

Each job object may include:

job_id: Unique identifier for the job.
status: Job status. One of queued, validating, processing, pending_refresh, completed, or failed.
start_time: Time when the job was started.
end_time: Time when the job ended, if applicable.
elapsed_time: Total processing time in seconds.
total_rows: Number of rows processed.
reason_for_failure: Error details for failed jobs.
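
A small sketch of how the job list might be summarized on the client side. The helper names are hypothetical, and treating completed and failed as the only final states is an assumption based on the status names above.

```python
def failed_jobs(jobs):
    """Return (job_id, reasons) pairs for every job whose status is "failed"."""
    return [
        (job["job_id"], job.get("reason_for_failure", []))
        for job in jobs
        if job.get("status") == "failed"
    ]


def active_jobs(jobs):
    """Return jobs that have not reached "completed" or "failed" (assumed final)."""
    return [job for job in jobs if job.get("status") not in ("completed", "failed")]
```

Because only one import can run at a time, an empty active_jobs result is a reasonable signal that a new import can be started.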

3. Get Job Status by ID

Endpoint:

GET {{ENDPOINT}}/v1/background_jobs/:job_id

Description:

Fetches the status of a specific import job using its job_id, useful for tracking a single import job.

Headers

Header	Value
`Authorization`	`Token {{API_KEY}}`

Path Parameters

Parameter	Type	Required	Description
`job_id`	String	Yes	The unique identifier of the import job

Example Request:

curl -X GET "{{ENDPOINT}}/v1/background_jobs/:job_id" \
-H "Authorization: Token {{API_KEY}}"

Response:

On Success:

{
    "elapsed_time": 46,
    "end_time": "2024-11-05T13:34:32.244508",
    "file_size": 534060000,
    "filename": "2024-09-02-500mb.ndjson",
    "job_id": "imports:jobs:NX-51f4acf7:f681073b-21ac-4276-9af5-c662dc05fb25",
    "links": "/v1/background_jobs/imports:jobs:NX-51f4acf7:f681073b-21ac-4276-9af5-c662dc05fb25",
    "start_time": "2024-11-05T13:33:46.080762",
    "status": "completed",
    "total_rows": 828000
}

On Error:

{
    "errors": [
        {
            "details": null,
            "error": "Resource not found",
            "error_code": "404",
            "message": "Job not found"
        }
    ]
}
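
Tracking a single job usually means polling this endpoint until the job finishes. The sketch below assumes a caller-supplied function fetch_job (hypothetical) that performs the GET and returns the parsed JSON body, and it assumes completed and failed are the final statuses. The polling interval and limit are illustration values.

```python
import time

TERMINAL_STATUSES = {"completed", "failed"}  # assumed to be the final states


def error_messages(body):
    """Flatten the `errors` envelope into readable strings."""
    return [
        f'{err.get("error_code")} {err.get("error")}: {err.get("message")}'
        for err in body.get("errors", [])
    ]


def wait_for_job(fetch_job, poll_seconds=10, max_polls=60, sleep=time.sleep):
    """Poll `fetch_job` until the job reaches a terminal status.

    `fetch_job` is a hypothetical caller-supplied function that requests the
    job status and returns the parsed JSON body.
    """
    for _ in range(max_polls):
        body = fetch_job()
        if "errors" in body:
            raise RuntimeError("; ".join(error_messages(body)))
        if body.get("status") in TERMINAL_STATUSES:
            return body
        sleep(poll_seconds)
    raise TimeoutError("job did not finish within the polling budget")
```

A failed job is returned rather than raised, so the caller can inspect reason_for_failure.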

Event Schema Documentation

Each record in the import file must conform to one of the following event schemas. The supported event types are identify_user, identify_company, page_view, and track.

1. Identify User

Identifies or updates the attributes of a user.

Field	Type	Required	Description
`event_type`	String	Yes	Must be `identify_user`.
`user_id`	String	Yes	Unique identifier for the user.
`metadata`	Object	Yes	Key-value pairs of user attributes.
`source`	String	Yes	Indicates the origin of the event data.
`inserted_at`	String (ISO)	Yes	Timestamp of when the data was recorded.

Example Payload:

{
  "event_type": "identify_user",
  "event_name": "",
  "source": "web-client",
  "user_id": "user_id",
  "company_id": "company_id",
  "hostname": "example.com",
  "pathname": "/users/list",
  "country_code": "UK",
  "screen_width": 859,
  "screen_height": 746,
  "operating_system": "Mac",
  "browser": "Chrome",
  "browser_language": "en-US",
  "user_agent": "Chrome - Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36",
  "device_type": "Desktop",
  "inserted_at": "2024-07-28 08:55:35.874555",
  "metadata": {
    "name": "John Doe",
    "email": "john.doe@example.com",
    "job_title": "Solution Architect",
    "department": "Engineering",
    "squad": "CORE Squad",
    "branch_location": "Dublin, Ireland",
    "sign_up_date": "2023-11-01 14:35:36.173103",
    "total_logins": 28,
    "days_since_last_login": 7
  }
}

2. Identify Company

Identifies or updates the attributes of a company.

Field	Type	Required	Description
`event_type`	String	Yes	Must be `identify_company`.
`company_id`	String	Yes	Unique identifier for the company.
`source`	String	Yes	Indicates the origin of the event data.
`metadata`	Object	Yes	Key-value pairs of company attributes.
`inserted_at`	String (ISO)	Yes	Timestamp of when the data was recorded.

Example Payload:

{
  "event_type": "identify_company",
  "source": "web-client",
  "company_id": "company_id",
  "inserted_at": "2024-07-28 08:55:35.874555",
  "metadata": {
    "name": "Acme Labs Inc.",
    "industry": "Software Solutions",
    "branches_count": 3,
    "Headquarter": "Dublin, Ireland"
  }
}

3. Page View

Logs a page view by a user.

Field	Type	Required	Description
`event_type`	String	Yes	Must be `page_view`.
`user_id`	String	Yes	Unique identifier for the user.
`hostname`	String	Yes	Hostname of the page (e.g., `example.com`).
`pathname`	String	Yes	Pathname of the page (e.g., `/dashboard`).
`source`	String	Yes	Indicates the origin of the event data.
`inserted_at`	String (ISO)	Yes	Timestamp of when the page view occurred.

Example Payload:

{
  "event_type": "page_view",
  "event_name": "",
  "source": "web-client",
  "user_id": "user_id_10",
  "company_id": "company_id_60",
  "hostname": "example.com",
  "pathname": "/dashboard/",
  "country_code": "US",
  "screen_width": 859,
  "screen_height": 746,
  "operating_system": "",
  "browser": "",
  "browser_language": "en-US",
  "user_agent": "Chrome - Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36",
  "device_type": "Desktop",
  "inserted_at": "2024-07-28 08:55:34.229738"
}

4. Track Event

Tracks a custom user action or event.

Field	Type	Required	Description
`event_type`	String	Yes	Must be `track`.
`user_id`	String	Yes	Unique identifier for the user.
`event_name`	String	Yes	Name of the event (e.g., `button_click`).
`source`	String	Yes	Indicates the origin of the event data.
`metadata`	Object	No	Key-value pairs of event-specific attributes.
`inserted_at`	String (ISO)	Yes	Timestamp of when the event occurred.

Example Payload:

{
  "event_type": "track",
  "event_name": "account_upgraded",
  "source": "web-client",
  "user_id": "user_id_123",
  "company_id": "",
  "hostname": "example.com",
  "pathname": "/subscription/plans",
  "country_code": "UK",
  "screen_width": 859,
  "screen_height": 746,
  "operating_system": "Mac",
  "browser": "Chrome",
  "browser_language": "en-US",
  "user_agent": "Chrome - Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36",
  "device_type": "Desktop",
  "inserted_at": "2024-12-01T08:55:43.220396",
  "metadata": {
    "old_plan": "Basic",
    "new_plan": "Premium",
    "upgrade_date": "2024-12-01T08:55:43.220396"
  }
}
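
Before uploading, it can help to check each record against the required fields in the four tables above. This is a local pre-check sketch, not the server's validation logic: the names REQUIRED_FIELDS and missing_fields are hypothetical, and the check follows the tables literally, so the import service may be more lenient in places.

```python
# Required fields per event type, transcribed from the schema tables above.
REQUIRED_FIELDS = {
    "identify_user": ["event_type", "user_id", "metadata", "source", "inserted_at"],
    "identify_company": ["event_type", "company_id", "source", "metadata", "inserted_at"],
    "page_view": ["event_type", "user_id", "hostname", "pathname", "source", "inserted_at"],
    "track": ["event_type", "user_id", "event_name", "source", "inserted_at"],
}


def missing_fields(event):
    """Return the required fields that are absent or empty in `event`.

    Raises ValueError for an unsupported event_type.
    """
    event_type = event.get("event_type")
    if event_type not in REQUIRED_FIELDS:
        raise ValueError(f"unsupported event type: {event_type!r}")
    return [
        field
        for field in REQUIRED_FIELDS[event_type]
        if event.get(field) in (None, "")  # an empty metadata object {} still counts as present
    ]
```

Running this over every record before upload surfaces problems such as a misspelled event type without waiting for a failed job.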

Row Validation Rules for HTTP Import File

When preparing your HTTP import file, ensure that each row adheres to the following validation rules:

Default Metadata:
- If an event does not include the metadata key, it is automatically assigned an empty object ({}).
identify_company Event Rules:
- For events with event_type set to identify_company, only the following keys are processed:
  - company_id
  - source
  - inserted_at
  - metadata
- Any additional keys are ignored.
URL Parsing:
- If the entry has a url key, the hostname and pathname fields are overridden with the values parsed from the url.
Auto Properties:
- Fields like browser and operating_system are automatically derived from the user_agent property if it is included in your data, so you do not need to define these attributes explicitly.
- The following auto-properties are optional but must be valid if provided:
  - browser
  - browser_language
  - country_code
  - device_type
  - hostname
  - operating_system
  - screen_height
  - screen_width
  - user_agent
Allowed Values for source:
The source field must contain one of the following predefined values:
- web-client
  - Events come directly from a web client SDK, such as a browser or a front-end application.
  - Special Requirements:
    - Must include hostname, pathname, and user_agent.
    - You can provide a url field instead of hostname and pathname. The import process will automatically parse the url to extract and populate the hostname and pathname fields.
  - Typical Use Case:
    - Capturing user interactions, page views, or custom events directly from a website.
- backend-http
  - Events captured from backend systems via direct HTTP integration.
  - Typical Use Case:
    - Sending events such as user or company updates from server-side applications.
- backend-hubspot
  - Events imported from HubSpot integration.
  - Typical Use Case:
    - Synchronizing CRM data like user and company details from HubSpot.
- backend-salesforce
  - Events imported from Salesforce integration.
  - Typical Use Case:
    - Importing CRM data, such as company and user profiles, from Salesforce.
- backend-segment
  - Events forwarded through Segment integration.
  - Typical Use Case:
    - Leveraging Segment as an event orchestration platform to ingest data into the system.
- backend-http-import
  - Events imported through the HTTP Bulk Import API.
  - Typical Use Case:
    - Bulk importing historical data for analysis or migration.
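
The rules above can be mirrored locally to catch bad rows before upload. The sketch below is an approximation under assumptions: normalize_row is a hypothetical helper, the url is split with Python's urlsplit (the service's own parser may differ), event_type is kept on company rows so they stay typed, and the web-client field requirements are not applied to identify_company rows because only the four listed keys are processed for them.

```python
from urllib.parse import urlsplit

ALLOWED_SOURCES = {
    "web-client", "backend-http", "backend-hubspot",
    "backend-salesforce", "backend-segment", "backend-http-import",
}
# Keys processed for identify_company, plus event_type so the row stays typed.
COMPANY_KEYS = {"event_type", "company_id", "source", "inserted_at", "metadata"}


def normalize_row(row):
    """Apply the documented row rules locally and return a cleaned copy."""
    event = dict(row)
    event.setdefault("metadata", {})  # default metadata is an empty object
    if event.get("url"):  # url overrides hostname and pathname
        parts = urlsplit(event["url"])
        event["hostname"] = parts.hostname
        event["pathname"] = parts.path
    source = event.get("source")
    if source not in ALLOWED_SOURCES:
        raise ValueError(f"invalid source: {source!r}")
    if event.get("event_type") == "identify_company":
        # additional keys are ignored by the import, so drop them here too
        return {key: value for key, value in event.items() if key in COMPANY_KEYS}
    if source == "web-client":
        for field in ("hostname", "pathname", "user_agent"):
            if not event.get(field):
                raise ValueError(f"web-client events must include {field}")
    return event
```

The function returns a new dictionary and leaves the input row untouched, so it can be applied while streaming through a source file.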

User & Company Accumulative Properties Best Practices

When importing historical data, it is crucial to ensure the consistency and accuracy of user and company data in Userpilot. Follow these guidelines to avoid potential issues in reporting or data analysis:

Sequential Property Building
Begin the import process with a subset of properties and gradually build up the profile until all properties are included. This ensures a smooth transition and avoids overwriting or omitting critical data.
Consistent Property Lists
Avoid submitting inconsistent sets of properties for the same user or company in sequential identify_user or identify_company events. For example:
- Incorrect:
  - First import: { "name": "John Doe", "email": "john@example.com" }
  - Second import: { "email": "john@example.com" }
- This approach may result in incomplete profiles or incorrect data in reporting tools.
Complete Property Submissions
As a best practice, always include the full list of properties for each identify event. Populate known attributes and leave the remaining attributes as null or empty values. This ensures that:
- Userpilot retains an accurate and up-to-date profile for every user or company.
- The reporting and analytics pipeline functions as expected.
Avoid Overwriting with Partial Data
Submitting partial data in subsequent identify calls can cause critical attributes to be overwritten with null values, leading to inaccurate reporting and inconsistent profiles.

Example Approach

Suppose you have the following user attributes over time:

Attribute	First Identify	Second Identify	Third Identify
`name`	`"John Doe"`	`"John Doe"`	`"John Doe"`
`email`	`"john@example.com"`	`"john@example.com"`	`"john@example.com"`
`location`	`null`	`"Dublin"`	`"Dublin"`
`job_title`	`null`	`null`	`"Architect"`

Best Practice: Gradually build up the user profile by adding properties in each step until the full profile is complete. This ensures the merge process maintains data consistency.
Incorrect Practice: Reducing the set of properties in subsequent calls. This may result in missing or overwritten data.
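
The table above can be generated programmatically so that every identify event carries the full property list. This is a minimal sketch with a hypothetical helper, build_identify_events; the default source value is simply one of the allowed values and is an assumption, not a recommendation from the API.

```python
def build_identify_events(user_id, snapshots, all_properties, source="backend-http-import"):
    """Build identify_user events that always carry the full property list.

    `snapshots` is a list of (inserted_at, newly_known_properties) tuples in
    chronological order. Values learned earlier are carried forward, and
    properties that are still unknown are sent as None (JSON null).
    """
    known = {name: None for name in all_properties}
    events = []
    for inserted_at, new_values in snapshots:
        known.update(new_values)
        events.append({
            "event_type": "identify_user",
            "user_id": user_id,
            "source": source,
            "inserted_at": inserted_at,
            "metadata": dict(known),  # copy so later updates do not leak back
        })
    return events
```

Each event's metadata has the same set of keys, so a later event never drops a property that an earlier one set.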

Why This Matters

Userpilot's data pipeline relies on the accuracy and completeness of historical imports. The merge tree logic aggregates data based on the most recent updates. Inconsistent or partial imports can disrupt this process, leading to incorrect data in reporting tools.

By adhering to these best practices, you ensure reliable and comprehensive user and company profiles in Userpilot, maintaining the integrity of your analytics and insights.

By following this documentation and adhering to the best practices outlined, you can seamlessly import historical data into Userpilot and maintain accurate, consistent, and up-to-date user and company profiles. Proper preparation and attention to detail during the import process will ensure reliable reporting and insights for your application.

If you have any questions or encounter any issues, please don’t hesitate to reach out to our support team. We’re here to help!

📧 Contact Support: support@userpilot.co
