Bulk Data Import API
The Import API allows you to import historical user data from other platforms into Userpilot. With this API, you can migrate data related to user and company identification, page views, and custom events. This enables a seamless transition and continuity of insights, providing a comprehensive view of user interactions.
This feature is available on the Enterprise plan or as an add-on to the Growth plan. To add it to your account, please contact support@userpilot.co.
Use Cases
- Data Migration: Migrate historical data from other analytics platforms to Userpilot, ensuring no loss of valuable insights.
- Retrospective Analysis: Analyze historical user behavior and feature engagement by importing past data on user actions, page views, and events.
Tips:
If you're migrating from another provider, complete the import process before installing the SDK. This ensures historical data is ready for analysis once Userpilot starts tracking live events.
Authorization
The Userpilot API uses an API key to authenticate requests. You can find your API key on the Environment Page.
Your API key carries many privileges, so be sure to keep it secure! Do not share your secret API keys in publicly accessible areas.
Authentication Method:
Include your API key in the Authorization header:
Authorization: Token {{API_KEY}}
Note:
All API requests must be made over HTTPS.
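As a minimal sketch, the header can be built in Python as follows. The environment variable name USERPILOT_API_KEY and the helper name auth_headers are assumptions made for this example; they are not defined by Userpilot.

```python
import os


def auth_headers(api_key: str) -> dict:
    """Return the Authorization header in the `Token <key>` form."""
    if not api_key:
        raise ValueError("API key is empty")
    return {"Authorization": f"Token {api_key}"}


# USERPILOT_API_KEY is an assumed variable name, not one defined by Userpilot.
api_key = os.environ.get("USERPILOT_API_KEY", "")
headers = auth_headers(api_key) if api_key else None
```

Reading the key from the environment keeps it out of source control, in line with the advice above about not exposing secret keys.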
Rate Limits
The Request Data Import endpoint is limited to one import job per application token at a time. If you attempt to create another import job while one is already in progress, the API returns a 409 Conflict error.
Endpoints
HTTP Endpoints
For most users, the HTTP API endpoint URL is https://analytex.userpilot.io, as the examples show.
However, if you are on Enterprise or EU hosting, refer to the Environment Page in the application to retrieve your dedicated endpoint.
1. Request Data Import
Endpoint:
POST {{ENDPOINT}}/v1/imports
Description:
Initiates a data import job. Accepts a file in CSV or NDJSON format containing historical data.
Headers
Header | Value |
---|---|
Content-Type | multipart/form-data |
Authorization | Token {{API_KEY}} |
Request Parameters:
Parameter | Type | Required | Description |
---|---|---|---|
file | File | Yes | The CSV or NDJSON file containing the data to be imported. See the event schema for more details. |
Data Format: The imported data should conform to a list of events with the following event types:
- Identify User: User data update.
- Identify Company: Company data update.
- Page View: User page view event.
- Track Event: Custom user event.
Example Import File
NDJSON File Content
{"event_type": "identify_user", "user_id": "user_123", "inserted_at": "2024-07-28T08:55:35.874555", "metadata": { "name": "John Doe", "email": "john.doe@example.com", "location": "Dublin, Ireland" },"user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"}
{"event_type": "identify_company","company_id": "company_123","inserted_at": "2024-07-28T08:55:35.874555","metadata": {"name": "Acme Labs Inc.","industry": "Software","size": "200-500","location": "San Francisco, CA"}}
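A file in this shape could be produced with a sketch like the one below, assuming the events are already available as Python dictionaries. The helper name to_ndjson is hypothetical, and the source value used here is one of the allowed values listed later in this document.

```python
import json


def to_ndjson(events: list) -> str:
    """Serialize events as NDJSON: one JSON object per line, newline-terminated."""
    return "".join(json.dumps(event) + "\n" for event in events)


events = [
    {
        "event_type": "identify_user",
        "source": "backend-http-import",
        "user_id": "user_123",
        "inserted_at": "2024-07-28T08:55:35.874555",
        "metadata": {"name": "John Doe", "email": "john.doe@example.com"},
    },
    {
        "event_type": "identify_company",
        "source": "backend-http-import",
        "company_id": "company_123",
        "inserted_at": "2024-07-28T08:55:35.874555",
        "metadata": {"name": "Acme Labs Inc.", "industry": "Software"},
    },
]
ndjson_text = to_ndjson(events)
```

Because json.dumps escapes newlines inside string values, each event stays on a single line, which is what the NDJSON format requires.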
CSV File content
event_type,event_name,source,user_id,company_id,hostname,pathname,country_code,screen_width,screen_height,operating_system,browser,browser_language,user_agent,device_type,inserted_at,metadata
identify_user,,web-client,User-7320-001,Company-607-001,campfire-example.userpilot.io,/profile/settings/,DE,1086,696,Mac,Edge,de-DE,Edge - Mozilla/5.0 (Mac) AppleWebKit/537.36,Mobile,2023-12-26 14:36:15.453703,"{""created_at"": ""1694172975"", ""name"": ""User-Alice"", ""prop_0"": ""value_743"", ""prop_1"": ""value_927""}"
page_view,,web-client,User-9491-002,Company-973-002,campfire-example.userpilot.io,/home/,DE,832,809,iOS,Chrome,fr-FR,Chrome - Mozilla/5.0 (iOS) AppleWebKit/537.36,Mobile,2024-04-13 14:36:15.453788,"{}"
track,created_project,web-client,User-6267-003,Company-584-003,campfire-example.userpilot.io,/profile/settings/,PS,1114,704,iOS,Safari,en-US,Safari - Mozilla/5.0 (iOS) AppleWebKit/537.36,Mobile,2024-05-05 14:36:15.453813,"{""created_at"": ""1730810175"", ""name"": ""User-Bob"", ""prop_0"": ""value_17""}"
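The metadata column holds a JSON object inside a quoted CSV field, which is easy to get wrong by hand. The following sketch uses Python's csv module to handle the quoting; the helper name to_csv is hypothetical, and the column order simply mirrors the header row of the example above.

```python
import csv
import io
import json

# Column order copied from the example file's header row.
COLUMNS = [
    "event_type", "event_name", "source", "user_id", "company_id",
    "hostname", "pathname", "country_code", "screen_width", "screen_height",
    "operating_system", "browser", "browser_language", "user_agent",
    "device_type", "inserted_at", "metadata",
]


def to_csv(events: list) -> str:
    """Write events as CSV, encoding metadata as a JSON string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, restval="",
        extrasaction="ignore", lineterminator="\n",
    )
    writer.writeheader()
    for event in events:
        row = dict(event)
        # The csv module doubles the inner quotes when it quotes the field.
        row["metadata"] = json.dumps(event.get("metadata", {}))
        writer.writerow(row)
    return buffer.getvalue()
```

Columns missing from an event are written as empty fields, matching the empty event_name cells in the example rows.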
Uploading the File
curl -X POST "{{ENDPOINT}}/v1/imports" \
  -H "Authorization: Token {{API_KEY}}" \
  -F "file=@/path/to/import_data.ndjson"
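For callers who prefer Python to curl, the same multipart upload might be assembled with the standard library alone, roughly as sketched here. The function name build_import_request is hypothetical, and the application/octet-stream part type is an assumption; only the file field name, the path, and the Authorization header come from the documentation above.

```python
import urllib.request
import uuid


def build_import_request(endpoint: str, api_key: str,
                         filename: str, content: bytes) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST to /v1/imports."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        # Assumed part type; the documentation does not specify one.
        b"Content-Type: application/octet-stream\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url=f"{endpoint.rstrip('/')}/v1/imports",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

The returned request can then be sent with urllib.request.urlopen; building and sending are kept separate so the request can be inspected first.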
Response:
On Success
{ "end_time": null, "file_size": 2234, "filename": "2024-10-28.ndjson", "job_id": "e2695de9-2ff6-4517-8c33-c2933f0435fb", "links": "/v1/imports/jobs/e2695de9-2ff6-4517-8c33-c2933f0435fb", "start_time": "2024-11-28T10:34:11.376662", "status": "queued", "total_rows": 0 }
On Error
{ "errors": [ { "details": "There is already an export job in progress, you can only have one export job in progress at a time.", "error": "Conflict", "error_code": "409", "message": "The request could not be completed due to a conflict with the current state of the target resource." } ] }
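A client may want to tell the one-job-at-a-time conflict apart from other failures. The sketch below assumes the error body always follows the shape shown above; the function name conflict_details is hypothetical.

```python
import json
from typing import Optional


def conflict_details(status_code: int, body: str) -> Optional[str]:
    """Return the server's explanation for a 409 response, or None otherwise."""
    if status_code != 409:
        return None
    for error in json.loads(body).get("errors", []):
        # error_code is a string in the documented response, not a number.
        if error.get("error_code") == "409":
            return error.get("details") or error.get("message")
    return "Conflict"
```

When this returns a message, a reasonable reaction is to wait for the running job to finish before submitting the next file.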
2. Get All Import Job Statuses
Endpoint:
GET {{ENDPOINT}}/v1/imports/jobs
Description:
Fetches the status of all import jobs, allowing users to monitor the progress of multiple imports at once.
Headers
Header | Value |
---|---|
Authorization | Token {{API_KEY}} |
Request Parameters:
None.
Example Request:
curl -X GET "{{ENDPOINT}}/v1/imports/jobs" \
  -H "Authorization: Token {{API_KEY}}"
Response:
Empty List:
[]
List of Jobs:
[ { "end_time": null, "file_size": 534060000, "filename": "2024-09-02-500mb.ndjson", "job_id": "3ba9a432-b0f8-4dbc-97a7-3c55047d2c5d", "links": "/v1/imports/jobs/3ba9a432-b0f8-4dbc-97a7-3c55047d2c5d", "start_time": "2024-11-04T15:17:40.836382", "status": "completed", "total_rows": 828000 }, { "end_time": null, "file_size": 3949, "filename": "2024-10-28.ndjson", "job_id": "ca6859b7-cc9d-4ab3-801d-294625bd3b2b", "links": "/v1/imports/jobs/ca6859b7-cc9d-4ab3-801d-294625bd3b2b", "reason_for_failure": [ "Invalid type(:none_negative_integer) for page_sessions with value \"-1\"..." ], "start_time": "2024-11-04T12:21:14.316002", "status": "failed", "total_rows": 0 }, { "elapsed_time": 0, "end_time": "2024-11-25T14:25:50.234413Z", "file_size": 6104, "filename": "2024-10-28.ndjson", "job_id": "b5217b82-71f1-4e48-b858-2a6bdab32e95", "links": "/v1/imports/jobs/b5217b82-71f1-4e48-b858-2a6bdab32e95", "reason_for_failure": [ "Unsupported event type: page_views, allowed types: page_view, track, identify_user, identify_company" ], "start_time": "2024-11-25T14:25:50.229108", "status": "failed", "total_rows": 0 } ]
Each job object may include:
- job_id: Unique identifier for the job.
- status: Job status (queued, validating, processing, pending_refresh, completed, or failed).
- start_time: Time when the job was started.
- end_time: Time when the job ended, if applicable.
- elapsed_time: Total processing time in seconds.
- total_rows: Number of rows processed.
- reason_for_failure: Error details for failed jobs.
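Given a parsed job list like the one above, a short sketch such as the following can summarize it. The function name summarize_jobs is hypothetical, and the code relies only on the fields listed here.

```python
from collections import Counter


def summarize_jobs(jobs: list) -> tuple:
    """Return (count of jobs per status, failure reasons keyed by job_id)."""
    counts = Counter(job.get("status", "unknown") for job in jobs)
    failures = {
        job["job_id"]: job.get("reason_for_failure", [])
        for job in jobs
        if job.get("status") == "failed"
    }
    return dict(counts), failures
```

Collecting reason_for_failure per job makes it easier to see which file needs fixing before it is uploaded again.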
3. Get Job Status by ID
Endpoint:
GET {{ENDPOINT}}/api/import/status/:job_id
Description:
Fetches the status of a specific import job using its job_id, useful for tracking a single import job.
Headers
Header | Value |
---|---|
Authorization | Token {{API_KEY}} |
Path Parameters
Parameter | Type | Required | Description |
---|---|---|---|
job_id | String | Yes | The unique identifier of the import job |
Example Request:
curl -X GET "{{ENDPOINT}}/api/import/status/:job_id" \
  -H "Authorization: Token {{API_KEY}}"
Response:
On Success:
{ "elapsed_time": 46, "end_time": "2024-11-05T13:34:32.244508", "file_size": 534060000, "filename": "2024-09-02-500mb.ndjson", "job_id": "5247f455-76cd-4728-92b2-7237119f7160", "links": "/v1/imports/jobs/5247f455-76cd-4728-92b2-7237119f7160", "start_time": "2024-11-05T13:33:46.080762", "status": "completed", "total_rows": 828000 }
On Error:
{ "errors": [ { "details": null, "error": "Resource not found", "error_code": "404", "message": "Job not found" } ] }
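Since imports run asynchronously, a caller typically polls this endpoint until the job finishes. The sketch below keeps the HTTP call out of the loop: fetch_job is a hypothetical callable that performs the status request and returns the parsed JSON. Treating completed and failed as the only terminal statuses is an assumption drawn from the status list, not a documented guarantee.

```python
import time
from typing import Callable

# Assumption: these two statuses are final; all others mean work is ongoing.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    fetch_job: Callable[[], dict],
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_job until the job reaches a terminal status, then return it."""
    for attempt in range(max_attempts):
        job = fetch_job()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"Job still running after {max_attempts} checks")
```

Passing sleep as a parameter lets the loop be exercised in tests without real delays; the interval and attempt limit are arbitrary defaults to tune for your file sizes.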
Event Schema Documentation
Each record in the import file must conform to one of the following event schemas. The supported event types are identify_user, identify_company, page_view, and track.
1. Identify User
Identifies or updates the attributes of a user.
Field | Type | Required | Description |
---|---|---|---|
event_type | String | Yes | Must be identify_user. |
user_id | String | Yes | Unique identifier for the user. |
metadata | Object | Yes | Key-value pairs of user attributes. |
source | String | Yes | Indicates the origin of the event data. |
inserted_at | String (ISO) | Yes | Timestamp of when the data was recorded. |
Example Payload:
{ "event_type": "identify_user", "event_name": "", "source": "web-client", "user_id": "user_id", "company_id": "company_id", "hostname": "example.com", "pathname": "/users/list", "country_code": "UK", "screen_width": 859, "screen_height": 746, "operating_system": "Mac", "browser": "Chrome", "browser_language": "en-US", "user_agent": "Chrome - Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36", "device_type": "Desktop", "inserted_at": "2024-07-28 08:55:35.874555", "metadata": { "name": "John Doe", "email": "john.doe@example.com", "job_title": "Solution Architect", "department": "Engineering", "squad": "CORE Squad", "branch_location": "Dublin, Ireland", "sign_up_date": "2023-11-01 14:35:36.173103", "total_logins": 28, "days_since_last_login": 7 } }
2. Identify Company
Identifies or updates the attributes of a company.
Field | Type | Required | Description |
---|---|---|---|
event_type | String | Yes | Must be identify_company. |
company_id | String | Yes | Unique identifier for the company. |
source | String | Yes | Indicates the origin of the event data. |
metadata | Object | Yes | Key-value pairs of company attributes. |
inserted_at | String (ISO) | Yes | Timestamp of when the data was recorded. |
Example Payload:
{ "event_type": "identify_company", "source": "web-client", "company_id": "company_id", "inserted_at": "2024-07-28 08:55:35.874555", "metadata": { "name": "Acme Labs Inc.", "industry": "Software Solutions", "branches_count": 3, "Headquarter": "Dublin, Ireland" } }
3. Page View
Logs a page view by a user.
Field | Type | Required | Description |
---|---|---|---|
event_type | String | Yes | Must be page_view. |
user_id | String | Yes | Unique identifier for the user. |
hostname | String | Yes | Hostname of the page (e.g., example.com). |
pathname | String | Yes | Pathname of the page (e.g., /dashboard). |
source | String | Yes | Indicates the origin of the event data. |
inserted_at | String (ISO) | Yes | Timestamp of when the page view occurred. |
Example Payload:
{ "event_type": "page_view", "event_name": "", "source": "web-client", "user_id": "user_id_10", "company_id": "company_id_60", "hostname": "example.com", "pathname": "/dashboard/", "country_code": "US", "screen_width": 859, "screen_height": 746, "operating_system": "", "browser": "", "browser_language": "en-US", "user_agent": "Chrome - Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36", "device_type": "Desktop", "inserted_at": "2024-07-28 08:55:34.229738" }
4. Track Event
Tracks a custom user action or event.
Field | Type | Required | Description |
---|---|---|---|
event_type | String | Yes | Must be track. |
user_id | String | Yes | Unique identifier for the user. |
event_name | String | Yes | Name of the event (e.g., button_click). |
source | String | Yes | Indicates the origin of the event data. |
metadata | Object | No | Key-value pairs of event-specific attributes. |
inserted_at | String (ISO) | Yes | Timestamp of when the event occurred. |
Example Payload:
{ "event_type": "track", "event_name": "account_upgraded", "source": "web-client", "user_id": "user_id_123", "company_id": "", "hostname": "example.com", "pathname": "/subscription/plans", "country_code": "UK", "screen_width": 859, "screen_height": 746, "operating_system": "Mac", "browser": "Chrome", "browser_language": "en-US", "user_agent": "Chrome - Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36", "device_type": "Desktop", "inserted_at": "2024-12-01T08:55:43.220396", "metadata": { "old_plan": "Basic", "new_plan": "Premium", "upgrade_date": "2024-12-01T08:55:43.220396" } }
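Checking records against the four schema tables before uploading can catch failures early. The sketch below encodes only the fields marked as required in those tables; the function name missing_fields is hypothetical, and the server's actual validation may be stricter than this local check.

```python
# Required fields per event type, transcribed from the schema tables.
REQUIRED_FIELDS = {
    "identify_user": ("user_id", "metadata", "source", "inserted_at"),
    "identify_company": ("company_id", "source", "metadata", "inserted_at"),
    "page_view": ("user_id", "hostname", "pathname", "source", "inserted_at"),
    "track": ("user_id", "event_name", "source", "inserted_at"),
}


def missing_fields(event: dict) -> list:
    """Return the required fields that are absent or empty in an event."""
    event_type = event.get("event_type")
    if event_type not in REQUIRED_FIELDS:
        raise ValueError(f"Unsupported event type: {event_type!r}")
    return [
        field
        for field in REQUIRED_FIELDS[event_type]
        if event.get(field) in (None, "")
    ]
```

An empty metadata object counts as present here, since only None and the empty string are treated as missing.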
Row Validation Rules for HTTP Import File
When preparing your HTTP import file, ensure that each row adheres to the following validation rules:
- Default Metadata: If an event does not include the metadata key, it will be automatically assigned an empty object ({}).
- identify_company Event Rules:
  - For events with event_type set to identify_company, only the following keys are processed:
    - company_id
    - source
    - inserted_at
    - metadata
  - Any additional keys will be ignored.
- URL Parsing:
  - If the entry has a url key present, the hostname and pathname fields will be overridden based on the parsed values of the url.
- Fields like browser and operating_system are automatically derived from the user_agent property if included in your data. This eliminates the need to define these attributes explicitly.
- Auto Properties:
  - The following auto-properties are optional but must be valid if provided:
    - browser
    - browser_language
    - country_code
    - device_type
    - hostname
    - operating_system
    - screen_height
    - screen_width
    - user_agent
Allowed Values for source:
The source field must contain one of the following predefined values:
web-client
- Events come directly from a web client SDK, such as a browser or a front-end application.
- Special Requirements:
  - Must include hostname, pathname, and user_agent.
  - You can provide a url field instead of hostname and pathname. The import process will automatically parse the url to extract and populate the hostname and pathname fields.
- Typical Use Case:
  - Capturing user interactions, page views, or custom events directly from a website.
backend-http
- Events captured from backend systems via direct HTTP integration.
- Typical Use Case:
  - Sending events such as user or company updates from server-side applications.
backend-hubspot
- Events imported from HubSpot integration.
- Typical Use Case:
  - Synchronizing CRM data like user and company details from HubSpot.
backend-salesforce
- Events imported from Salesforce integration.
- Typical Use Case:
  - Importing CRM data, such as company and user profiles, from Salesforce.
backend-segment
- Events forwarded through Segment integration.
- Typical Use Case:
  - Leveraging Segment as an event orchestration platform to ingest data into the system.
backend-http-import
- Events imported through the HTTP Bulk Import API.
- Typical Use Case:
  - Bulk importing historical data for analysis or migration.
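The row rules above can be approximated locally to preview what the importer is likely to do with a row. This is a client-side sketch, not Userpilot's implementation: the function names are hypothetical, Python's urlsplit may parse edge cases differently from the server, and skipping the web-client field check for identify_company rows is an assumption based on the earlier example payload.

```python
from urllib.parse import urlsplit

ALLOWED_SOURCES = {
    "web-client", "backend-http", "backend-hubspot",
    "backend-salesforce", "backend-segment", "backend-http-import",
}
# event_type is kept so the row stays identifiable; the other four keys
# are the ones the rules list as processed for identify_company.
COMPANY_KEYS = ("event_type", "company_id", "source", "inserted_at", "metadata")


def normalize_row(row: dict) -> dict:
    """Approximate the documented row rules locally before upload."""
    out = dict(row)
    out.setdefault("metadata", {})  # default metadata rule
    if out.get("url"):  # url overrides hostname and pathname
        parts = urlsplit(out["url"])
        out["hostname"] = parts.hostname or ""
        out["pathname"] = parts.path
    if out.get("event_type") == "identify_company":
        out = {key: out[key] for key in COMPANY_KEYS if key in out}
    return out


def source_problems(row: dict) -> list:
    """List source-related problems; an empty list means none were found."""
    source = row.get("source")
    if source not in ALLOWED_SOURCES:
        return [f"unknown source: {source!r}"]
    if source == "web-client" and row.get("event_type") != "identify_company":
        needed = ("hostname", "pathname", "user_agent")
        return [f"missing {name}" for name in needed if not row.get(name)]
    return []
```

Running normalize_row before source_problems means a row that supplies only a url still passes the web-client check, mirroring the special requirement described above.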
User & Company Accumulative Properties Best Practices
When importing historical data, it is crucial to ensure the consistency and accuracy of user and company data in Userpilot. Follow these guidelines to avoid potential issues in reporting or data analysis:
Sequential Property Building
Begin the import process with a subset of properties and gradually build up the profile until all properties are included. This ensures a smooth transition and avoids overwriting or omitting critical data.
Consistent Property Lists
Avoid submitting inconsistent sets of properties for the same user or company in sequential identify_user or identify_company events. For example:
- Incorrect:
  - First import: { "name": "John Doe", "email": "john@example.com" }
  - Second import: { "email": "john@example.com" }
This approach may result in incomplete profiles or incorrect data in reporting tools.
Complete Property Submissions
As a best practice, always include the full list of properties for each identify event. Populate known attributes and leave the remaining attributes as null or empty values. This ensures that:
- Userpilot retains an accurate and up-to-date profile for every user or company.
- The reporting and analytics pipeline functions as expected.
Avoid Overwriting with Partial Data
Submitting partial data in subsequent identify calls can cause critical attributes to be overwritten with null values, leading to inaccurate reporting and inconsistent profiles.
Example Approach
Suppose you have the following user attributes over time:
Attribute | First Identify | Second Identify | Third Identify |
---|---|---|---|
name | "John Doe" | "John Doe" | "John Doe" |
email | "john@example.com" | "john@example.com" | "john@example.com" |
location | null | "Dublin" | "Dublin" |
job_title | null | null | "Architect" |
- Best Practice: Gradually build up the user profile by adding properties in each step until the full profile is complete. This ensures the merge process maintains data consistency.
- Incorrect Practice: Reducing the set of properties in subsequent calls. This may result in missing or overwritten data.
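One way to produce identify payloads that follow this pattern is to carry known values forward and pad the rest with null, as in the sketch below. The function name fill_properties is hypothetical, and the approach assumes the updates are supplied in chronological order.

```python
def fill_properties(updates: list, all_keys: list) -> list:
    """Expand partial updates into full property sets, oldest first.

    Each result lists every key in all_keys: values seen so far are
    carried forward and values not yet known stay None (null in JSON).
    """
    known = {key: None for key in all_keys}
    filled = []
    for update in updates:
        # Ignore explicit None so a partial update never erases a known value.
        known.update({k: v for k, v in update.items() if v is not None})
        filled.append(dict(known))
    return filled


partial_updates = [
    {"name": "John Doe", "email": "john@example.com"},
    {"location": "Dublin"},
    {"job_title": "Architect"},
]
full_metadata = fill_properties(
    partial_updates, ["name", "email", "location", "job_title"]
)
```

Each entry of full_metadata corresponds to one column of the table above and can be used as the metadata of the matching identify event.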
Why This Matters
Userpilot's data pipeline relies on the accuracy and completeness of historical imports. The merge tree logic aggregates data based on the most recent updates. Inconsistent or partial imports can disrupt this process, leading to incorrect data in reporting tools.
By adhering to these best practices, you ensure reliable and comprehensive user and company profiles in Userpilot, maintaining the integrity of your analytics and insights.
By following this documentation and adhering to the best practices outlined, you can seamlessly import historical data into Userpilot and maintain accurate, consistent, and up-to-date user and company profiles. Proper preparation and attention to detail during the import process will ensure reliable reporting and insights for your application.
If you have any questions or encounter any issues, please don’t hesitate to reach out to our support team. We’re here to help!
📧 Contact Support: support@userpilot.co