  • 5.15 Technologies

How to collect Email data from Microsoft Graph API with Python

Microsoft Exchange Online is a central communication and collaboration hub for many organizations and forms an integral part of their operations. It holds a wealth of data that can inform decision-making, but gathering and analyzing that data can be a challenge. The Microsoft Graph API is particularly helpful here: integrating with it gives your organization programmatic access to that data and much deeper insight into how it operates.


In this blog, we will walk through how to connect to, authenticate with, collect data from, and process data from the Microsoft Graph API using Python. The Microsoft Graph API exposes data about email, calendars, Microsoft Teams, and online collaboration across your organization, and viewing that data can change your perspective on how your teams operate. With it, you can build knowledge graphs and identify patterns that may reduce costs or improve the value of your investment. Below, you'll find demonstrations and prototypes to kick-start your exploration.


How to Connect to the Microsoft Graph API

To connect to the Microsoft Graph API in Python, we need to define a few variables and import MSAL (the Microsoft Authentication Library), which is installed from PyPI as the msal package.


We will need five variables to connect to Microsoft Graph API:

  • Client ID: The public identifier of the application registration

  • Client Secret: A confidential value the application uses to prove its identity

  • Tenant ID: The unique identifier of your Azure Active Directory (Microsoft Entra ID) instance

  • Authority: The Microsoft login URL for your tenant, which includes your Tenant ID

  • Scope: This defines the permissions that the token will have

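import msal

client_id = '**************'
client_secret = '**************'
tenant_id = '**************'

authority = f'https://login.microsoftonline.com/{tenant_id}'
# The client credentials flow uses the .default scope, which grants the
# application permissions (such as Mail.Read) that an admin has consented to
scope = ['https://graph.microsoft.com/.default']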
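Hardcoding credentials in a script makes them easy to leak. A minimal sketch of one alternative, assuming you store the values in environment variables, is shown below. The variable names GRAPH_CLIENT_ID, GRAPH_CLIENT_SECRET, and GRAPH_TENANT_ID and the helper load_graph_config are hypothetical; adapt them to your deployment.

```python
import os

# Hypothetical environment variable names; use whatever your deployment defines.
ENV_NAMES = {
    "client_id": "GRAPH_CLIENT_ID",
    "client_secret": "GRAPH_CLIENT_SECRET",
    "tenant_id": "GRAPH_TENANT_ID",
}


def load_graph_config(env=None):
    """Build the connection variables from environment variables.

    Returns a dict with client_id, client_secret, tenant_id, authority and
    scope. Raises KeyError naming the first missing variable.
    """
    env = os.environ if env is None else env
    config = {}
    for key, var_name in ENV_NAMES.items():
        if var_name not in env:
            raise KeyError(f"Missing environment variable: {var_name}")
        config[key] = env[var_name]
    # The authority is the login URL for your tenant.
    config["authority"] = f"https://login.microsoftonline.com/{config['tenant_id']}"
    # App-only (client credentials) tokens request the .default scope.
    config["scope"] = ["https://graph.microsoft.com/.default"]
    return config
```

Calling load_graph_config() with no arguments reads os.environ; the returned values map onto the connection variables listed above.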

Authentication with the Microsoft Graph API

Now that we have defined all the necessary variables, we can generate an access token for the Microsoft Graph API. We will do this by creating an instance of MSAL's ConfidentialClientApplication class. Its constructor takes three inputs:

  • Client ID

  • Authority

  • Client credential (client secret)

Once we've created the application object, we can call its acquire_token_for_client method to obtain an access token, passing in the scope that we defined earlier.


See a code example below:

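def get_token(c_id, c_secret, auth, scopes):
    # Build a confidential client app for the client credentials flow
    app = msal.ConfidentialClientApplication(
        c_id, authority=auth,
        client_credential=c_secret,
    )
    # Request an app-only token; token endpoint errors come back in the dict
    result = app.acquire_token_for_client(scopes=scopes)
    if 'access_token' in result:
        return result['access_token']
    # On failure the dict holds 'error' and 'error_description' instead
    print(result.get('error'), result.get('error_description'))
    return "Authentication Failed"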
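MSAL's acquire_token_for_client returns a Python dictionary rather than raising an exception when the token endpoint rejects a request: a successful result contains an access_token key, while a failed one carries error and usually error_description. The sketch below, built around a hypothetical helper named extract_token, turns a failure into an exception with a readable message instead of returning a sentinel string. It only inspects the dictionary, so it runs without MSAL installed.

```python
def extract_token(result):
    """Return the access token from an MSAL token-response dict.

    On success the dict contains "access_token"; on failure it contains
    "error" and usually "error_description" instead.
    """
    if "access_token" in result:
        return result["access_token"]
    error = result.get("error", "unknown_error")
    description = result.get("error_description", "no description returned")
    # Raising keeps a failure string from ever reaching an Authorization header
    raise RuntimeError(f"Token request failed: {error}: {description}")
```

Usage might look like token = extract_token(app.acquire_token_for_client(scopes=["https://graph.microsoft.com/.default"])). An exception stops the script at the point of failure, whereas a sentinel string has to be checked explicitly by every caller.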

Now that we have our access token, let's get started collecting data.


Gathering Email Data

In this example, we are going to collect email data. We will need three more Python libraries, imported below. json ships with Python, while requests and pandas need to be installed separately (for example, with pip).

import requests
import json
import pandas as pd

The first step to collecting email data is to define our endpoint URL. Microsoft Graph API v1.0 URLs share the same base URL, https://graph.microsoft.com/v1.0/. We then append the endpoint for listing the messages in a mail folder, which you'll see below.

# Endpoint pattern: GET /users/{id | userPrincipalName}/mailFolders/{id}/messages
user_id = '***************'
url = f'https://graph.microsoft.com/v1.0/users/{user_id}/mailFolders/inbox/messages?$top=999&$select=sender,subject,toRecipients'

We also set two query parameters, $top and $select. By default, this endpoint returns only 10 emails per request, so in this example we set $top to 999, meaning a single request returns up to 999 emails. If the folder holds more, the response includes an @odata.nextLink URL pointing to the next page. $select tells the API which fields to return; here, each email comes back with only its sender, subject, and recipients.
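Rather than hand-writing the query string, the URL can be assembled from its parts. The sketch below uses Python's urllib.parse to build an equivalent inbox URL. The function name messages_url is hypothetical, and percent-encoding the user ID (while leaving @ intact) is an assumption about how you want to handle user principal names that contain characters such as #.

```python
from urllib.parse import quote, urlencode

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def messages_url(user_id, folder="inbox", top=999,
                 select=("sender", "subject", "toRecipients")):
    """Build a Graph URL that lists messages in one mail folder.

    $top sets the page size; $select limits which fields come back.
    """
    params = {"$top": top, "$select": ",".join(select)}
    # Keep "$" and "," readable instead of percent-encoding them.
    query = urlencode(params, safe="$,")
    # Encode characters such as "#" in the user ID, but keep "@" readable.
    return f"{GRAPH_BASE}/users/{quote(user_id, safe='@')}/mailFolders/{folder}/messages?{query}"
```

For example, messages_url("user@contoso.com") produces the same URL as the f-string above with that user ID filled in, and messages_url(user_id, top=50, select=("subject",)) requests a smaller page with only subjects.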


Now we'll retrieve a token using the function we wrote earlier, define the request headers, and make the request with the Python Requests library. In the next step, we will load the response as JSON, which makes it easier to parse.

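token = get_token(client_id, client_secret, authority, scope)
if token == "Authentication Failed":
    # Stop here instead of sending a request with an invalid token
    raise SystemExit('Invalid Token. Abort Process')
headers = {"Authorization": f"Bearer {token}"}
emails = requests.get(url, headers=headers)
# Raise an exception for HTTP errors such as 401 or 403
emails.raise_for_status()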
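A single request returns at most one page of results. When more messages remain, Graph adds an @odata.nextLink URL to the response, and requesting that URL returns the next page. The loop below follows those links until none is left. The name fetch_all_pages is hypothetical, the HTTP function is passed in (for example, requests.get) so the loop can be exercised without network access, and the 30-second timeout is an arbitrary choice.

```python
def fetch_all_pages(url, headers, get):
    """Collect the "value" items from every page of a Graph list response.

    `get` is any callable with the requests.get signature, such as
    requests.get or a requests.Session's get method.
    """
    items = []
    while url:
        response = get(url, headers=headers, timeout=30)
        response.raise_for_status()  # stop on HTTP errors such as 401 or 403
        page = response.json()
        items.extend(page.get("value", []))
        # The nextLink URL already carries the original query parameters;
        # it is absent on the last page, which ends the loop.
        url = page.get("@odata.nextLink")
    return items
```

With the variables above, messages = fetch_all_pages(url, headers, requests.get) returns every message in the folder as a list, which can be iterated in place of email_json['value'] in the processing step.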

Processing the Data

Now that we have requested and received data, the next step is to process the data. In this example, we will perform basic processing and output the result in a CSV file. All the email data we are after is within the key “value” of the “email_json” variable. In the code segment below, you can see we have a “for loop” that will iterate through each email in the “value” key.

email_json = json.loads(emails.text)
email_data = []
for email in email_json['value']:
    subject = email['subject']
    sender_name = email['sender']['emailAddress']['name']
    sender_email = email['sender']['emailAddress']['address']
    recipients = []
    for recipient in email['toRecipients']:
        recip_name = recipient['emailAddress']['name']
        recip_email = recipient['emailAddress']['address']
        recipients.append([recip_name, recip_email])
    email_data.append([subject, sender_name, sender_email, recipients])

df_emails = pd.DataFrame(data=email_data, columns=['Subject', 'SenderName', 'SenderEmail', 'Recipients'])

df_emails.to_csv("Email_Data.csv", index=False)

For each email, we capture the subject, sender name, sender email, and recipients. A nested “for loop” iterates through the recipients and adds each one to the recipients list. Once an email's fields are stored in Python variables, we append them to the email_data list. Finally, we convert the list of processed emails to a Pandas DataFrame and save that DataFrame as a CSV file.
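The nested loop writes each recipient list into the CSV as a Python list literal, and indexing with square brackets raises an error if a message lacks a field such as its sender. One alternative, sketched below with hypothetical helpers named flatten_email and _address, reads fields with .get() and joins the recipients into a single readable string.

```python
def _address(entry):
    """Return the emailAddress dict of a sender or recipient entry, or {}."""
    return (entry or {}).get("emailAddress") or {}


def flatten_email(message):
    """Turn one Graph message dict into a flat row for a CSV file.

    Fields are read with .get(), so a message missing the sender or
    recipients still produces a row (with None or an empty string).
    """
    sender = _address(message.get("sender"))
    recipients = []
    for recipient in message.get("toRecipients") or []:
        address = _address(recipient)
        recipients.append(f"{address.get('name', '')} <{address.get('address', '')}>")
    return {
        "Subject": message.get("subject"),
        "SenderName": sender.get("name"),
        "SenderEmail": sender.get("address"),
        # One readable string such as "Name <address>; Name <address>"
        "Recipients": "; ".join(recipients),
    }
```

Because each row is a dictionary, pd.DataFrame([flatten_email(m) for m in email_json['value']]) builds the same four columns without listing them separately.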


Full Code Snippet:

import json

import msal
import pandas as pd
import requests

client_id = '**************'
client_secret = '**************'
tenant_id = '**************'

authority = f'https://login.microsoftonline.com/{tenant_id}'
# App-only tokens use the .default scope (application permissions such as Mail.Read)
scope = ['https://graph.microsoft.com/.default']


def get_token(c_id, c_secret, auth, scopes):
    # Build a confidential client app and request an app-only token
    app = msal.ConfidentialClientApplication(
        c_id, authority=auth,
        client_credential=c_secret,
    )
    result = app.acquire_token_for_client(scopes=scopes)
    if 'access_token' in result:
        return result['access_token']
    # Token endpoint errors come back in the dict, not as exceptions
    print(result.get('error'), result.get('error_description'))
    return "Authentication Failed"


# Endpoint pattern: GET /users/{id | userPrincipalName}/mailFolders/{id}/messages
user_id = '***************'
url = f'https://graph.microsoft.com/v1.0/users/{user_id}/mailFolders/inbox/messages?$top=999&$select=sender,subject,toRecipients'

token = get_token(client_id, client_secret, authority, scope)
if token == "Authentication Failed":
    raise SystemExit('Invalid Token. Abort Process')
headers = {"Authorization": f"Bearer {token}"}
emails = requests.get(url, headers=headers)
emails.raise_for_status()

email_json = json.loads(emails.text)
email_data = []
for email in email_json['value']:
    subject = email['subject']
    sender_name = email['sender']['emailAddress']['name']
    sender_email = email['sender']['emailAddress']['address']
    recipients = []
    for recipient in email['toRecipients']:
        recip_name = recipient['emailAddress']['name']
        recip_email = recipient['emailAddress']['address']
        recipients.append([recip_name, recip_email])
    email_data.append([subject, sender_name, sender_email, recipients])

df_emails = pd.DataFrame(data=email_data, columns=['Subject', 'SenderName', 'SenderEmail', 'Recipients'])
df_emails.to_csv("Email_Data.csv", index=False)
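As a possible next step toward the knowledge graphs mentioned in the introduction, the same message data can be reduced to a weighted sender-to-recipient edge list. The helper sender_recipient_edges below is a hypothetical sketch: it lowercases addresses so differently capitalized copies of one address count as one node, considers only To recipients (the $select in this example does not request CC or BCC), and skips messages without a sender address.

```python
from collections import Counter


def sender_recipient_edges(messages):
    """Count how often each sender emailed each recipient.

    Returns a Counter keyed by (sender_address, recipient_address),
    a simple weighted edge list that graph tools can load.
    """
    edges = Counter()
    for message in messages:
        sender = ((message.get("sender") or {}).get("emailAddress") or {}).get("address")
        if not sender:
            continue  # skip messages with no sender address
        for recipient in message.get("toRecipients") or []:
            address = (recipient.get("emailAddress") or {}).get("address")
            if address:
                edges[(sender.lower(), address.lower())] += 1
    return edges
```

Passing email_json['value'] returns the counts, and pd.DataFrame([(s, r, n) for (s, r), n in edges.items()], columns=['Sender', 'Recipient', 'Count']) turns them into a table that can be saved alongside Email_Data.csv.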

Conclusion

This blog is meant to serve as an introduction to the Microsoft Graph API. Because the same API covers email, calendars, Microsoft Teams, and more, it can support both personal and organization-wide analytics, and the email example above is only one of many possible use cases.


With these building blocks in place, you can start exploring the data in your own environment.



Thank you for taking the time to review this article. Reach out to 5.15 if you'd like an analysis of your environment.

