Microsoft OneDrive Reader

Note

This adapter is in preview and is available on Striim Developer only. See Striim Developer for more information.

Microsoft OneDrive is a cloud-based storage solution that allows users to securely store, sync, and share files across devices, enabling seamless collaboration and easy access to documents anytime, anywhere. It integrates with Microsoft 365 applications, enhancing productivity through real-time file sharing and version control.

Striim's Microsoft OneDrive Reader exposes OneDrive objects (such as drives, files, and folders) as relational tables, so that data can be moved in real time from OneDrive to a relational database or data warehouse for analytics and reporting.

Feature summary

Feature

Supported?

Notes

Objects

Standard objects

Custom objects

Authentication

Basic authentication

Username and password

OAuth authentication

Manual configuration based

Custom authentication methods

Not all methods may be supported

Operations

Automated mode

Initial load

Pull-based incremental load

Push-based incremental load

Automated pipeline

Governance

Connection profile

Sherlock AI

Sentinel AI

Schema handling

Initial schema creation

Works with supported targets

Schema evolution

Setup

Wizard template

Flow Designer

Striim TQL

Runtime

Resilience/recovery

Parallel execution

Metrics

Standard metrics

Supported authentication method

The Microsoft OneDrive Reader supports OAuth authentication. Creating a connection requires creating an OAuth application in Azure AD, obtaining the OAuth credentials, exchanging them for an access token and refresh token, and configuring the connection properties in Striim.

To create an Azure AD application:

  1. In the Azure portal, in the left navigation pane, select Azure Active Directory > App registrations.

  2. Choose New Registration.

  3. Enter a name for the application.

  4. Specify the types of accounts this application should support:

    • For private use applications, select Accounts in this organization directory only.

    • For distributed applications, select one of the multi-tenant options.

  5. Set the redirect URI to http://localhost:33333 (the default). To use a different port, replace 33333 with the desired port.

  6. To register the new application, choose Register. An application management screen displays. Record the Application (client) ID and Directory (tenant) ID values for later use. (You will use the Application (client) ID value to set the Client ID property, and the Directory (tenant) ID value to set the Azure Tenant property.)

  7. Navigate to Certificates & Secrets. Select New Client Secret for this application and specify the desired duration. After the client secret is saved, the Azure App Registration displays the key value. This value is displayed only once, so record it for future use. (You will use it to set the Client Secret property.)

  8. Select the Microsoft Graph API, and then add the delegated permission Files.ReadWrite.All or Files.Read.All. Click Grant admin consent for the new permissions to take effect.

  9. If you added permissions that require admin consent, you can grant them from the current tenant on the API Permissions page.

To obtain the access token and refresh token:

  1. To obtain the authorization code, open the following URL in a web browser:

    GET https://login.live.com/oauth20_authorize.srf?client_id={client_id}&scope={scope}
    &response_type=code&redirect_uri={redirect_uri}

  2. Upon successful authentication and authorization of your application, the web browser is redirected to your redirect URI, with the authorization code appended as a query parameter.

    {redirect_uri}?code={authorization_code}

  3. After you have received the code value, you can redeem it for a set of tokens that allow you to authenticate with the OneDrive API. To redeem the code, make the following request:

    POST https://login.live.com/oauth20_token.srf
    Content-Type: application/x-www-form-urlencoded
    
    client_id={client_id}&redirect_uri={redirect_uri}&client_secret={client_secret}
    &code={code}&grant_type=authorization_code

  4. If the call is successful, the response to the POST request contains a JSON string with several properties, including access_token, token_type, and refresh_token (the latter only if you requested the wl.offline_access scope).

    {
      "token_type":"bearer",
      "expires_in": 3600,
      "scope":"wl.basic onedrive.readwrite",
      "access_token":"{access_token}",
      "refresh_token":"{refresh_token}"
    }
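
The four steps above can be sketched in Python using only the standard library. This is a minimal illustration, not Striim's implementation: the function names are hypothetical, the endpoints are the ones shown in the requests above, and sending the POST (for example with urllib.request.urlopen) is left to the caller.

```python
import json
import time
import urllib.request
from urllib.parse import urlencode, urlparse, parse_qs

# Endpoints copied from the requests shown in the steps above.
AUTHORIZE_URL = "https://login.live.com/oauth20_authorize.srf"
TOKEN_URL = "https://login.live.com/oauth20_token.srf"


def build_authorize_url(client_id, scope, redirect_uri):
    """Step 1: the URL to open in a browser to request an authorization code."""
    query = urlencode({
        "client_id": client_id,
        "scope": scope,
        "response_type": "code",
        "redirect_uri": redirect_uri,
    })
    return f"{AUTHORIZE_URL}?{query}"


def extract_code(redirected_url):
    """Step 2: read the `code` query parameter from the URL the browser landed on."""
    params = parse_qs(urlparse(redirected_url).query)
    if "code" not in params:
        raise ValueError("redirect URL has no 'code' parameter")
    return params["code"][0]


def build_token_request(client_id, client_secret, redirect_uri, code):
    """Step 3: a form-encoded POST that redeems the code for tokens (not sent here)."""
    form = urlencode({
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "client_secret": client_secret,
        "code": code,
        "grant_type": "authorization_code",
    })
    return urllib.request.Request(
        TOKEN_URL,
        data=form.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def parse_token_response(body, now=None):
    """Step 4: pick out the tokens and turn expires_in into an absolute time."""
    data = json.loads(body)
    now = time.time() if now is None else now
    return {
        "access_token": data["access_token"],
        # Absent unless the wl.offline_access scope was requested.
        "refresh_token": data.get("refresh_token"),
        "expires_at": now + data["expires_in"],
    }
```

The access_token and refresh_token values returned by parse_token_response are the ones to enter in the Access token and Refresh token properties.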

Supported objects

The following are the supported objects for reading from Microsoft OneDrive:

  • Drives

  • Files

  • FileVersions

  • Folders

  • Permissions

  • SharedResources

  • Users
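
The object names above are what the Tables and Exclude Tables properties (described in the properties section) select from. As a rough illustration of how a semicolon-delimited list with the % wildcard might resolve against these names, here is a Python sketch. The function name select_objects is hypothetical, and the matching rules (case-sensitive, % matching any run of characters, Exclude Tables also semicolon-delimited) are assumptions, not documented Striim behavior.

```python
import re

# The supported objects listed above.
SUPPORTED_OBJECTS = [
    "Drives", "Files", "FileVersions", "Folders",
    "Permissions", "SharedResources", "Users",
]


def _split(spec):
    """Split a semicolon-delimited list, ignoring empty entries."""
    return [part.strip() for part in spec.split(";") if part.strip()]


def _to_regex(pattern):
    """Translate a pattern where % matches any run of characters (assumed)."""
    literal_parts = (re.escape(part) for part in pattern.split("%"))
    return re.compile(".*".join(literal_parts))


def select_objects(tables, exclude_tables=""):
    """Return the supported objects matched by `tables` minus `exclude_tables`."""
    include = [_to_regex(p) for p in _split(tables)]
    exclude = [_to_regex(p) for p in _split(exclude_tables)]
    return [
        name for name in SUPPORTED_OBJECTS
        if any(rx.fullmatch(name) for rx in include)
        and not any(rx.fullmatch(name) for rx in exclude)
    ]
```

Under these assumptions, select_objects("F%", "Folders") keeps Files and FileVersions and drops Folders, which mirrors the documented use of Exclude Tables as an exception list for a wildcard in Tables.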

Microsoft OneDrive Reader properties

Property

Type

Default value

Notes

Client ID

String

The client ID assigned when you register your application with an OAuth authorization server.

Client secret

Password

The client secret assigned when you register your application with an OAuth authorization server.

Access token

Password

The access token for connecting using OAuth. The OAuth access token is retrieved from the OAuth server as part of the authentication process. It has a server-dependent timeout and can be reused between requests.

Refresh token

Password

The OAuth refresh token for the corresponding OAuth access token, used to refresh the OAuth access token when using OAuth authentication.

Azure tenant

String

The Microsoft Online tenant being used to access data. For instance, contoso.onmicrosoft.com.

Alternatively, specify the tenant ID. This value is the directory ID shown in the Azure portal under Azure Active Directory > Properties.

Connection pool size

Integer

20

Specifies the maximum number of active connections.

Exclude tables

String

A list of tables excluded from read operations. Typically used to create a list of exceptions when the Tables property includes wildcards. Misconfiguration of the Tables and Exclude Tables properties can cause "Invalid table names" errors.

Incremental load marker

String

The incremental load marker is a unique incremental column in each object used for incremental load. When no marker is specified, tables are resynced at each polling interval.

Specify the name of the column that contains the start position value. This column must meet the following criteria:

  • It should have an integer or timestamp data type (for example, a creation timestamp or an employee ID).

  • The values must be unique and continuously increasing to ensure proper incremental reading.

Migrate schema

Boolean

False

Only available in Initial Load or Automated mode. Set to True to enable initial schema migration, which propagates the object schema from the source to the target.

Mode

Select list:

  • Automated mode

  • Initial load

  • Incremental load

Automated

Automated mode applies incremental updates to objects that support incremental load and performs full resyncs for objects that do not support incremental load.

Polling interval

Integer

5m

Specifies an interval as an integer followed by a unit. Supported units are days (d), hours (h), minutes (m), or seconds (s). The reader polls the source at the specified interval.

Refresh token

Password

An OAuth 2.0 refresh token. Use the value generated while creating the token.

Start Position

String

%=-1

Value of the incremental load marker that defines the initial reading position.

Tables

String

A semicolon-delimited (;) list of objects to read from the source. Supports the % wildcard. Misconfiguration of the Tables and Exclude Tables properties can cause "Invalid table names" errors. Do not modify this property when recovery is enabled for the application.

Thread pool count

Integer

10

The number of threads that run in parallel. A value of zero specifies single-threaded operation.

When the thread pool count is higher than the connection pool size, large data ingestion operations can cause the app to halt. Performance is best with one thread for each table being synced, so as a best practice, increase the connection pool size to match the number of threads in use.
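
Two of the rules in the table above lend themselves to a quick pre-flight check: the polling interval format (an integer followed by d, h, m, or s) and the relationship between thread pool count and connection pool size. The Python sketch below is illustrative only. The function names are hypothetical, and the check reflects the guidance in this section rather than any validation Striim itself performs.

```python
import re

# Seconds per supported unit: days, hours, minutes, seconds.
_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def parse_polling_interval(value):
    """Convert an interval such as '5m' into a number of seconds."""
    match = re.fullmatch(r"(\d+)([dhms])", value.strip())
    if match is None:
        raise ValueError(f"invalid polling interval: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


def pool_size_warning(thread_pool_count, connection_pool_size):
    """Return a warning if there are more threads than connections, else None."""
    if thread_pool_count > connection_pool_size:
        return (
            f"thread pool count ({thread_pool_count}) exceeds connection pool "
            f"size ({connection_pool_size}); consider raising the connection "
            f"pool size to at least {thread_pool_count}"
        )
    return None
```

With the default values from the table (polling interval 5m, thread pool count 10, connection pool size 20), parse_polling_interval returns 300 seconds and pool_size_warning returns None.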