Page cover

πŸ¦”PostHog Integration

Query Your Product Analytics with SQL and APIs

Overview

The PostHog integration allows you to query your PostHog analytics data using multiple approaches: HogQL (PostHog's SQL dialect), REST API calls, and the PostHog Query API. This integration makes it easy to analyze product analytics data alongside other databases in your data stack.

Quick Start

1. Get Your PostHog Credentials

# From your PostHog instance:
# 1. Get your Project API Key from Project Settings
# 2. Note your Project ID 
# 3. Get your PostHog instance URL (e.g., https://app.posthog.com)

2. Connect to PostHog

{
  "connection": {
    "host": "https://us.posthog.com",
    "port": "443", 
    "password": "YOUR_PROJECT_API_KEY",
    "database": "YOUR_PROJECT_ID"
  }
}

3. Run Your First Query

-- HogQL query to get recent events
SELECT event, count() as event_count
FROM events 
WHERE timestamp > now() - interval 7 day
GROUP BY event
ORDER BY event_count DESC
LIMIT 10;

-- REST API call to get events
GET /api/projects/:project_id/events?limit=10

-- Query API call for insights
QUERY {"kind": "EventsQuery", "select": ["*"], "limit": 10}

Authentication Setup

PostHog Cloud

  1. Get API Key

    • Go to your PostHog project settings

    • Navigate to "Project API Key"

    • Copy your Project API Key

  2. Find Project ID

    • In PostHog, go to Project Settings

    • Copy the Project ID (numeric value)

Self-Hosted PostHog

  1. Get Your Instance URL

    # Your self-hosted PostHog URL
    https://posthog.yourcompany.com
  2. Configure Connection

    {
      "type": "posthog",
      "host": "posthog.yourcompany.com",
      "port": "443",
      "username": "phc_your_project_api_key_here", 
      "database": "12345"
    }

Query Types

The PostHog integration supports three types of queries:

PostHog's SQL dialect for querying event data:

SELECT event, properties.$browser, count()
FROM events 
WHERE timestamp > '2024-01-01'
  AND event = 'page_view'
GROUP BY event, properties.$browser
ORDER BY count() DESC;

2. REST API Calls

Direct API endpoint calls:

# Get events
GET /api/projects/:project_id/events

# Get insights  
GET /api/projects/:project_id/insights

# Get persons
GET /api/projects/:project_id/persons

3. Query API Calls

Structured query requests:

QUERY {
  "kind": "EventsQuery",
  "select": ["event", "timestamp", "properties"],
  "where": ["event", "=", "page_view"],
  "limit": 100
}

HogQL Reference

HogQL is PostHog's SQL dialect optimized for event analytics.

Basic Syntax

SELECT columns
FROM events | persons | sessions | recordings
WHERE conditions
GROUP BY columns
ORDER BY columns
LIMIT number

Key Tables

Table
Description

events

All tracked events with properties

persons

User/person records with properties

sessions

Session-level data

recordings

Session recording metadata

Common Columns

Events Table

-- Core event fields
SELECT 
  event,           -- Event name
  timestamp,       -- When the event occurred
  distinct_id,     -- User identifier
  uuid,           -- Unique event ID
  properties,     -- Event properties (JSON)
  person_id      -- Person UUID
FROM events;

Event Properties

-- Access event properties
SELECT 
  event,
  properties.$browser,        -- Browser name
  properties.$current_url,    -- Current URL
  properties.$lib,           -- Library used
  properties.$os,            -- Operating system
  properties.custom_property -- Custom properties
FROM events;

Person Properties

-- Access person properties  
SELECT
  distinct_id,
  properties.email,
  properties.name,
  properties.$initial_browser,
  properties.$initial_current_url
FROM persons;

Date/Time Functions

-- Date filtering
WHERE timestamp > '2024-01-01'
WHERE timestamp >= now() - interval 7 day
WHERE timestamp between '2024-01-01' and '2024-01-31'

-- Date formatting
SELECT 
  toDate(timestamp) as date,
  toStartOfWeek(timestamp) as week,
  toStartOfMonth(timestamp) as month
FROM events;

Aggregation Functions

-- Counting
SELECT count() as total_events
SELECT count(distinct distinct_id) as unique_users
SELECT countIf(event = 'purchase') as purchases

-- Statistical functions
SELECT 
  avg(properties.session_duration) as avg_duration,
  median(properties.session_duration) as median_duration,
  percentile(properties.session_duration, 0.95) as p95_duration
FROM events;

REST API Usage

Event Queries

# Get recent events
GET /api/projects/:project_id/events?limit=100

# Filter events by name
GET /api/projects/:project_id/events?event=page_view

# Filter by date range
GET /api/projects/:project_id/events?after=2024-01-01&before=2024-01-31

Insights API

# Get existing insights
GET /api/projects/:project_id/insights

# Get specific insight
GET /api/projects/:project_id/insights/123

# Create new insight
POST /api/projects/:project_id/insights
{
  "name": "Page Views This Week",
  "filters": {
    "events": [{"id": "page_view"}],
    "date_from": "-7d"
  }
}

Persons API

# Get persons
GET /api/projects/:project_id/persons

# Search persons
GET /api/projects/:project_id/persons?search=user@example.com

# Get person by ID
GET /api/projects/:project_id/persons/distinct_id

Query API Usage

The Query API provides structured querying capabilities.

Events Query

QUERY {
  "kind": "EventsQuery",
  "select": [
    "event",
    "timestamp", 
    "properties.$browser",
    "properties.custom_field"
  ],
  "where": [
    "and",
    ["event", "=", "page_view"],
    ["timestamp", ">", "2024-01-01"]
  ],
  "orderBy": ["timestamp desc"],
  "limit": 100
}
QUERY {
  "kind": "TrendsQuery",
  "series": [
    {
      "event": "page_view",
      "name": "Page Views"
    }
  ],
  "dateRange": {
    "date_from": "-30d",
    "date_to": null
  },
  "interval": "day"
}

Funnel Query

QUERY {
  "kind": "FunnelsQuery", 
  "series": [
    {"event": "page_view", "name": "Page View"},
    {"event": "sign_up", "name": "Sign Up"},
    {"event": "purchase", "name": "Purchase"}
  ],
  "dateRange": {
    "date_from": "-30d"
  }
}

Query Examples

Event Analytics

-- Most popular events
SELECT event, count() as event_count
FROM events 
WHERE timestamp > now() - interval 30 day
GROUP BY event
ORDER BY event_count DESC
LIMIT 20;

-- Events by hour of day
SELECT 
  toHour(timestamp) as hour,
  count() as events
FROM events
WHERE timestamp > now() - interval 7 day
GROUP BY hour
ORDER BY hour;

-- Top pages viewed
SELECT 
  properties.$current_url as page,
  count() as page_views,
  count(distinct distinct_id) as unique_visitors
FROM events
WHERE event = 'page_view'
  AND timestamp > now() - interval 7 day
GROUP BY properties.$current_url
ORDER BY page_views DESC
LIMIT 10;

User Analytics

-- Daily active users
SELECT 
  toDate(timestamp) as date,
  count(distinct distinct_id) as daily_active_users
FROM events
WHERE timestamp > now() - interval 30 day
GROUP BY date
ORDER BY date;

-- User retention by signup date
SELECT 
  toStartOfWeek(first_seen) as signup_week,
  count(distinct distinct_id) as signed_up,
  countIf(last_seen > first_seen + interval 7 day) as retained_week1,
  countIf(last_seen > first_seen + interval 30 day) as retained_month1
FROM (
  SELECT 
    distinct_id,
    min(timestamp) as first_seen,
    max(timestamp) as last_seen
  FROM events
  GROUP BY distinct_id
)
GROUP BY signup_week
ORDER BY signup_week;

Product Analytics

-- Feature usage analysis
SELECT 
  properties.feature_name,
  count() as usage_count,
  count(distinct distinct_id) as unique_users
FROM events
WHERE event = 'feature_used'
  AND timestamp > now() - interval 30 day
GROUP BY properties.feature_name
ORDER BY usage_count DESC;

-- Session duration analysis
SELECT 
  properties.$browser,
  avg(properties.session_duration) as avg_duration,
  median(properties.session_duration) as median_duration,
  count(distinct distinct_id) as users
FROM events
WHERE event = 'session_end'
  AND timestamp > now() - interval 7 day
GROUP BY properties.$browser
ORDER BY avg_duration DESC;

-- Conversion funnel
SELECT 
  step,
  count(distinct user_id) as users,
  count(distinct user_id) * 100.0 / first_value(count(distinct user_id)) OVER (ORDER BY step) as conversion_rate
FROM (
  SELECT distinct_id as user_id, 1 as step FROM events WHERE event = 'page_view' AND timestamp > now() - interval 30 day
  UNION ALL
  SELECT distinct_id as user_id, 2 as step FROM events WHERE event = 'sign_up' AND timestamp > now() - interval 30 day  
  UNION ALL
  SELECT distinct_id as user_id, 3 as step FROM events WHERE event = 'purchase' AND timestamp > now() - interval 30 day
)
GROUP BY step
ORDER BY step;

Performance Analytics

-- Page load times
SELECT 
  properties.$current_url as page,
  avg(properties.load_time) as avg_load_time,
  percentile(properties.load_time, 0.95) as p95_load_time,
  count() as page_loads
FROM events
WHERE event = 'page_load'
  AND timestamp > now() - interval 7 day
  AND properties.load_time > 0
GROUP BY properties.$current_url
ORDER BY avg_load_time DESC
LIMIT 20;

-- Error tracking
SELECT 
  properties.error_type,
  properties.error_message,
  count() as error_count,
  count(distinct distinct_id) as affected_users
FROM events
WHERE event = 'error'
  AND timestamp > now() - interval 24 hour
GROUP BY properties.error_type, properties.error_message
ORDER BY error_count DESC;

Marketing Analytics

-- Traffic sources
SELECT 
  properties.$referring_domain as source,
  count() as visits,
  count(distinct distinct_id) as unique_visitors,
  countIf(event = 'sign_up') as conversions
FROM events
WHERE event = 'page_view'
  AND timestamp > now() - interval 30 day
GROUP BY properties.$referring_domain
ORDER BY visits DESC;

-- Campaign performance
SELECT 
  properties.utm_campaign,
  properties.utm_source,
  properties.utm_medium,
  count(distinct distinct_id) as visitors,
  countIf(event = 'purchase') as purchases,
  sum(properties.revenue) as total_revenue
FROM events
WHERE timestamp > now() - interval 30 day
  AND properties.utm_campaign IS NOT NULL
GROUP BY properties.utm_campaign, properties.utm_source, properties.utm_medium
ORDER BY total_revenue DESC;

REST API Examples

# Get events for specific user
GET /api/projects/:project_id/events?distinct_id=user123

# Get insights with filters
GET /api/projects/:project_id/insights?filter=events&events=[{"id":"page_view"}]

# Export events as CSV
GET /api/projects/:project_id/events/export?format=csv&after=2024-01-01

# Get cohorts
GET /api/projects/:project_id/cohorts

# Get feature flags
GET /api/projects/:project_id/feature_flags

Query API Examples

// User path analysis
QUERY {
  "kind": "PathsQuery",
  "dateRange": {"date_from": "-7d"},
  "pathsFilter": {
    "include_event_types": ["$pageview"],
    "start_point": "/home"
  }
}

// Session recording query
QUERY {
  "kind": "SessionsTimelineQuery", 
  "dateRange": {"date_from": "-24h"},
  "events": [
    {"event": "error", "properties": {"error_type": "javascript"}}
  ]
}

// Property analysis
QUERY {
  "kind": "EventsQuery",
  "select": [
    "properties.$browser",
    "count()"
  ],
  "where": [
    "and",
    ["event", "=", "page_view"],
    ["timestamp", ">", "-7d"]
  ],
  "groupBy": ["properties.$browser"]
}

Best Practices

1. Query Performance

-- βœ… Good: Use specific date ranges
WHERE timestamp > now() - interval 7 day

-- βœ… Good: Filter early and be specific
WHERE event = 'page_view' 
  AND timestamp > '2024-01-01'
  AND properties.$current_url LIKE '/product%'

-- βœ… Good: Limit large result sets
LIMIT 1000

-- ❌ Avoid: Querying all historical data
WHERE timestamp > '2020-01-01'  -- Too broad

2. Property Access

-- βœ… Good: Use specific property names
SELECT properties.$browser, properties.custom_field

-- βœ… Good: Check for property existence
WHERE properties.email IS NOT NULL

-- βœ… Good: Use proper property types
WHERE toFloat64(properties.price) > 100

3. Aggregations

-- βœ… Good: Use appropriate aggregation functions
SELECT 
  count() as total_events,
  count(distinct distinct_id) as unique_users,
  avg(properties.session_duration) as avg_duration

-- βœ… Good: Group by meaningful dimensions
GROUP BY event, properties.$browser, toDate(timestamp)

4. API Usage

# βœ… Good: Use pagination for large datasets
GET /api/projects/:project_id/events?limit=100&offset=100

# βœ… Good: Filter at the API level
GET /api/projects/:project_id/events?event=page_view&after=2024-01-01

# βœ… Good: Use appropriate content types
Content-Type: application/json
Authorization: Bearer your_api_key

5. Error Handling

-- βœ… Good: Handle potential null values
WHERE properties.email IS NOT NULL AND properties.email != ''

-- βœ… Good: Use safe type conversions
WHERE toFloat64OrNull(properties.price) > 0

-- βœ… Good: Validate data before aggregating
WHERE timestamp > '2020-01-01' AND timestamp < now()

Troubleshooting

Common Errors

1. "Invalid API Key"

{
  "error": "HTTP 401: {\"type\":\"authentication_error\",\"code\":\"invalid_api_key\"}"
}

Solution: Verify your Project API Key is correct and has proper permissions.

2. "Project not found"

{
  "error": "HTTP 404: Project not found"
}

Solution: Check that your Project ID is correct and accessible with your API key.

3. "HogQL syntax error"

-- ❌ Error: Invalid HogQL syntax
SELECT * FROM events WHERE timestamp > invalid_date

-- βœ… Fix: Use proper date format
SELECT * FROM events WHERE timestamp > '2024-01-01'

4. "Property not found"

-- ❌ Error: Property doesn't exist
SELECT properties.nonexistent_field FROM events

-- βœ… Fix: Check property exists first
SELECT properties.nonexistent_field FROM events 
WHERE properties.nonexistent_field IS NOT NULL

Debugging Tips

  1. Start Simple: Begin with basic queries and add complexity gradually

  2. Check Date Ranges: Ensure your date filters return data

  3. Validate Properties: Use DESCRIBE events to see available properties

  4. Test with Limits: Use LIMIT 10 for initial testing

  5. Check API Responses: Look at raw JSON responses for debugging

Performance Optimization

  1. Use Specific Date Ranges to reduce data scanned

  2. Filter Early with WHERE clauses before aggregation

  3. Limit Results with LIMIT clauses

  4. Index on Properties that you query frequently

  5. Use Sampling for large datasets when approximate results are acceptable

Connection Issues

# Test connection
curl -H "Authorization: Bearer your_api_key" \
  https://app.posthog.com/api/projects/your_project_id/

# Verify API key format
# Should start with 'phc_' for project API keys
phc_abcdef1234567890...

# Check project ID is numeric
12345  # βœ… Correct
project_name  # ❌ Incorrect

Advanced Usage

Custom Event Properties

-- Track custom business metrics
SELECT 
  properties.plan_type,
  count(distinct distinct_id) as customers,
  sum(toFloat64(properties.monthly_revenue)) as total_revenue,
  avg(toFloat64(properties.monthly_revenue)) as avg_revenue_per_customer
FROM events
WHERE event = 'subscription_created'
  AND timestamp > now() - interval 30 day
GROUP BY properties.plan_type
ORDER BY total_revenue DESC;

Cohort Analysis

-- Monthly cohort retention
WITH cohorts AS (
  SELECT 
    distinct_id,
    toStartOfMonth(min(timestamp)) as cohort_month
  FROM events
  WHERE event = 'sign_up'
  GROUP BY distinct_id
),
monthly_activity AS (
  SELECT 
    distinct_id,
    toStartOfMonth(timestamp) as activity_month
  FROM events
  WHERE event = 'page_view'
  GROUP BY distinct_id, activity_month
)
SELECT 
  cohort_month,
  count(distinct c.distinct_id) as cohort_size,
  count(distinct ma.distinct_id) as retained_users,
  count(distinct ma.distinct_id) * 100.0 / count(distinct c.distinct_id) as retention_rate
FROM cohorts c
LEFT JOIN monthly_activity ma ON c.distinct_id = ma.distinct_id 
  AND ma.activity_month = cohort_month + interval 1 month
GROUP BY cohort_month
ORDER BY cohort_month;

Real-time Dashboards

-- Live event stream (last 5 minutes)
SELECT 
  timestamp,
  event,
  distinct_id,
  properties.$current_url
FROM events
WHERE timestamp > now() - interval 5 minute
ORDER BY timestamp DESC
LIMIT 100;

-- Current active users (last 5 minutes)
SELECT count(distinct distinct_id) as current_active_users
FROM events
WHERE timestamp > now() - interval 5 minute;

Getting Help

Start analyzing your product data with SyneHQ today! πŸ¦”βœ¨

Last updated