How do I join user profiles? Documentation
One of the first questions we get when our customers start querying all of their data is: how do I join all this data together? For example, let’s say you’d like to know whether support interactions in Zendesk increase revenue in Stripe, or what percentage of users opened your email campaign and then visited your website or mobile app. The key to answering these advanced questions is tying your data together across these sources. To do that, you need a common user identifier.
What is the user ID problem?
Each SaaS tool you use has its own way of identifying users with a unique primary key, and you will find each of these different IDs across different collections of tables in your database. So when you want to match Joe Schmo, who entered a ticket in Zendesk and also clicked through a campaign in Mailchimp, it starts to get tricky.
For example, Stripe keeps track of users with a customer_id, Segment requires a user_id, and Marketo uses email to uniquely identify each person.
To join effectively across these sources, you need to understand how each ID maps to the others. The best way to do this is to create a common identifier across tools.
Use a common identifier when possible
When you install a new tool (or use Segment to install all of them at once), you need to choose what you will put in the ID field. There are lots of options for this: emails, Twitter handles, usernames, and more.
However, we suggest using the same ID you generate from your production database when you create a new user. Database IDs never change, so they are more reliable than emails and usernames that users can switch at their leisure. If you use this same database ID across as many tools as possible, it will be easier to join identities down the road. (In MongoDB, it would look something like this: 507f191e810c19729de860ea.)
analytics.identify('1e810c197e', { // that's the user ID from the database
  name: 'Bill Lumbergh',
  email: '[email protected]' // also includes email
});
Though we wish you could use a database ID for everything, some tools force you to identify users with an email. Therefore, you should make sure to send email along to all of your other tools, so you can join on that trait as a fallback.
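As a minimal sketch of that advice, the helper below builds the arguments for an identify call from a database record, always using the database ID as the user ID and passing email along as a trait. The helper name buildIdentifyArgs and the record fields (id, name, email) are assumptions for illustration, not part of any Segment library. Lowercasing and trimming the email is also our own assumption: string comparisons in SQL are often case-sensitive, so a normalized email is more likely to match when you use it as a fallback join key.

```javascript
// Hypothetical helper: builds the arguments for analytics.identify from a
// database user record. The field names (id, name, email) are assumptions.
function buildIdentifyArgs(dbUser) {
  if (!dbUser || !dbUser.id) {
    throw new Error('A database ID is required to identify a user');
  }
  var traits = { name: dbUser.name };
  if (dbUser.email) {
    // Normalize the email so it compares equal across tools when it is
    // used as a fallback join key.
    traits.email = String(dbUser.email).trim().toLowerCase();
  }
  return [String(dbUser.id), traits];
}

var args = buildIdentifyArgs({
  id: '1e810c197e',
  name: 'Bill Lumbergh',
  email: ' Bill@Example.com '
});
// In the browser you would then call: analytics.identify(args[0], args[1]);
```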
For Segment Destination Users
Integrating as many tools as possible through Segment will make your joins down the road a little easier. When you use Segment to identify users, we’ll send the same ID and traits out to all the destinations you turn on in our interface. (More about Segment destinations.)
A few of our destination partners accept an external ID, where they will insert the same Segment user ID. Then you can join tables in one swoop. For example, Zendesk saves the Segment User ID as external_id, making a Segment-Zendesk join look like this:
SELECT zendesk.external_id, users.user_id
FROM zendesk.tickets zendesk
JOIN segment.users users
ON zendesk.external_id = users.user_id
Here’s a look at the Segment destinations that store the Segment User ID:
Tool | Corresponding Trait | Corresponding Sources Table |
Zendesk | external_id | zendesk.tickets.external_id |
Mailchimp | unique_email_id | mailchimp.lists.unique_email_id |
Intercom | user_id | intercom.users.user_id |
How to merge identities
Whether you’re using Segment or not, we suggest creating a master user identities table that maps IDs for each of your sources.
This table will cut down on the number of joins you have to do because some IDs may only exist in one out of many tables related to a source.
Here’s a sample query to create a master user identities table:
CREATE TABLE user_identities AS (
  select
    segment.id as segment_id,
    segment.email as email,
    zendesk.id as zendesk_id,
    stripe.id as stripe_id,
    salesforce.id as salesforce_id,
    intercom.id as intercom_id
  from segment.users segment
  -- Zendesk
  left join zendesk.users zendesk on
    ( zendesk.external_id = segment.id -- if enabled through Segment
    or zendesk.email = segment.email ) -- fallback if not enabled through Segment
  -- Stripe
  left join stripe.customers stripe on
    stripe.email = segment.email
  -- Salesforce
  left join salesforce.leads salesforce on
    salesforce.email = segment.email
  -- Intercom
  left join intercom.users intercom on
    ( intercom.user_id = segment.id -- if enabled through Segment
    or intercom.email = segment.email ) -- fallback if not enabled through Segment
  group by 1,2,3,4,5,6
)
You’ll spit out a user table that looks something like this:
segment_id | email | zendesk_id | stripe_id | salesforce_id | intercom_id |
mYhgYcRBC7 | [email protected] | 1303028105 | cus_6ll4iGAO7X8u7L | 00Q31000014XGRcEAO | 55c8923f67b8d6524600037f |
mYhgYcRBC7 | [email protected] | 1303028105 | cus_6ll3xVVSLIZomI | 00Q31000014XGRcEAO | 55c8923f67b8d6524600037f |
7adt7XG27c | [email protected] | 1472230319 | cus_6u2ZcW3uC8VwZa | 00Q31000014sKCqEAM | 5626dfed2e028608710000ce |
QZnP7cViH1 | [email protected] | 1486907299 | cus_6yrv9bwLgXN78s | 00Q31000015G7kIEAS | 55f6a142bd531ec6930005fa |
While creating this table in SQL is a good strategy, we’d be remiss not to point out a few drawbacks to this approach. First, you need to run this nightly or at some regular interval. And, if you have a large user base, it might take a while to run. That said, it’s probably still worth it.
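To make the matching rule concrete, here is a minimal in-memory sketch in JavaScript of the same idea: match on the Segment ID where the tool stores it, and fall back to email otherwise. The function names (findMatch, buildUserIdentities) and the shape of the input arrays are assumptions for illustration. Unlike the SQL left joins, which can return several rows for one Segment user, this sketch keeps only the first match per tool.

```javascript
// Hypothetical helper: finds the record in toolUsers that belongs to
// segmentUser. It tries the stored Segment ID first (when idField is given)
// and falls back to email. Returns null when nothing matches.
function findMatch(segmentUser, toolUsers, idField) {
  if (idField) {
    var byId = toolUsers.find(function (u) {
      return u[idField] != null && u[idField] === segmentUser.id;
    });
    if (byId) return byId;
  }
  var byEmail = toolUsers.find(function (u) {
    return u.email != null && u.email === segmentUser.email;
  });
  return byEmail || null;
}

// Hypothetical helper: builds one identity row per Segment user.
// Zendesk is matched on external_id with an email fallback; Stripe is
// matched on email only, mirroring the joins in the sample query.
function buildUserIdentities(segmentUsers, zendeskUsers, stripeCustomers) {
  return segmentUsers.map(function (s) {
    var zendesk = findMatch(s, zendeskUsers, 'external_id');
    var stripe = findMatch(s, stripeCustomers, null);
    return {
      segment_id: s.id,
      email: s.email,
      zendesk_id: zendesk ? zendesk.id : null,
      stripe_id: stripe ? stripe.id : null
    };
  });
}
```

A tool with no match yields null for its ID, which corresponds to the NULL a left join would produce.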
How to run a query with your joined data
So what can you do once you have all of your IDs mapped? Answer some pretty nifty questions, that is. Here are just a few SQL examples addressing questions that incorporate more than one source of customer data.
Segment + Zendesk
-- Which referral source is sending us the most tickets?
SELECT segment.referral_source,
COUNT(zendesk.ticket_id) AS count_of_tickets
FROM zendesk.tickets zendesk
LEFT JOIN segment.users segment
ON zendesk.external_id = segment.user_id
GROUP BY 1
ORDER BY 2 DESC
Stripe + Zendesk
-- How many tickets do we receive across each pricing tier?
SELECT stripe.plan_name AS plan_name,
COUNT(zendesk.ticket_id) AS count_of_tickets
-- Start with Zendesk
FROM zendesk.tickets zendesk
-- Merge Users
LEFT JOIN user_identities users
ON zendesk.id = users.zendesk_id
-- Add Stripe
LEFT JOIN stripe.charges stripe
ON users.stripe_id = stripe.customer_id
-- Group by plan name, from most tickets to least
GROUP BY 1
ORDER BY 2 DESC
Advanced Tips
An alternative to the lookup user table in SQL would be writing a script to grab user IDs across your third-party tools and dump them into your database.
You’d have to ping the API of each tool with something like an email, and ask it to return the key or ID for the corresponding user in that tool.
A sample script, to run as a nightly cron job, would look something like this:
var request = require('superagent'); // https://www.npmjs.com/package/superagent

var username = '<your-username>';
var password = '<your-password>';
var host = 'https://segment.zendesk.com/api/v2/';

/**
 * Gets the user object in Zendesk by email address.
 *
 * @param {String} email
 * @param {Function} fn
 */
function getUserIds(email, fn) {
  request
    // Encode the email so characters like "+" survive in the query string.
    .get(host + 'users/search.json?query=' + encodeURIComponent(email))
    .auth(username, password)
    .end(fn);
}

/**
 * Get the first Zendesk user that matches '[email protected]'
 */
getUserIds('[email protected]', function(err, res) {
  if (err) return console.error(err);
  // res.body.users will be an Array
  // res.body.users[0].id will return the id of the first user
});
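The callback in the script stops at the comments describing the response. As a minimal sketch of the next step, the helpers below pull the first user's ID out of a response body shaped like { users: [...] } and turn it into a row you could write to your lookup table. The names firstUserId and toIdentityRow are hypothetical, and the row shape (email, zendesk_id) is an assumption chosen to line up with the user identities table.

```javascript
// Hypothetical helper: returns the id of the first user in a search
// response body ({ users: [...] }), or null when no user was found.
function firstUserId(body) {
  var users = body && Array.isArray(body.users) ? body.users : [];
  return users.length > 0 ? users[0].id : null;
}

// Hypothetical helper: shapes one lookup row keyed by email.
function toIdentityRow(email, body) {
  return { email: email, zendesk_id: firstUserId(body) };
}
```

Inside the callback you could then call toIdentityRow(email, res.body) and insert the result into your database. Returning null for a missing user keeps the row in the table, so you can tell "looked up, not found" apart from "not looked up yet".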
If you have any questions, or see anywhere we can improve our documentation, please let us know!