Microsoft Excel Online Reader

Note

This adapter is in preview and is available on Striim Developer only. See Striim Developer for more information.

Excel Online is the web-based version of Microsoft Excel, part of the Microsoft Office suite of tools, which allows users to create, edit, and share spreadsheets directly in a web browser. It provides a lightweight and collaborative way to work on Excel files without requiring the desktop application.

The Microsoft Excel Online Reader connects with the Excel Online platform and reads supported objects.

Feature summary

Feature

Supported?

Notes

Objects

Standard objects

Custom objects

Authentication

Basic authentication

Username and password

OAuth authentication

Manual configuration based

Custom authentication methods

Not all methods may be supported

Operations

Automated mode

Initial load

Pull-based incremental load

Push-based incremental load

Automated pipeline

Governance

Connection profile

Sherlock AI

Sentinel AI

Schema handling

Initial schema creation

Works with supported targets

Schema evolution

Setup

Wizard template

Flow Designer

Striim TQL

Runtime

Resilience/recovery

Parallel execution

Metrics

Standard metrics

Supported authentication method

Microsoft Excel Online supports user-based authentication using Azure AD, which leverages OAuth 2.0 for authentication and authorization. Creating a connection involves registering an application, obtaining client credentials, retrieving an authorization code via the redirect URL, exchanging the authorization code for an access token and refresh token, and configuring connection properties in Striim.

To register the application in the Azure portal:

  1. In the left navigation pane, select Azure Active Directory > App registrations.

  2. Click New registration.

  3. Enter a name for the application.

  4. Specify the types of accounts this application should support:

    • For private use applications, select Accounts in this organizational directory only.

    • For distributed applications, select one of the multi-tenant options.

    Note

    If you select Accounts in this organizational directory only (default), when you establish a connection with the Microsoft Excel Online Reader you must set the Azure Tenant to the ID of the Azure AD Tenant. Otherwise, the authentication attempt fails.

  5. Set the redirect URI to http://localhost:33333 (the default). To use a different port, specify that port in the redirect URI and set CallbackURL to the exact reply URL you defined.

  6. To register the new application, choose Register. An application management screen displays. Record the Application (client) ID and Directory (tenant) ID values for later use. (You will use the Application (client) ID value to set the OAuth Client ID parameter, and the Directory (tenant) ID value to set the Azure Tenant parameter.)

  7. Navigate to Certificates & Secrets. Select New Client Secret for this application and specify the desired duration. After the client secret is saved, the Azure App Registration displays the key value. This value is displayed only once, so record it for future use. (You will use it to set the OAuth Client Secret.)

  8. Add the following application permissions: Sites.Read.All, Files.Read, Files.Read.All, Files.Read.Selected, Files.ReadWrite, Files.ReadWrite.All, Files.ReadWrite.AppFolder, Files.ReadWrite.Selected, and offline_access.

  9. If you have specified the use of permissions that require admin consent (such as the Application Permissions), you can grant them from the current tenant on the API Permissions page.

To grant admin consent:

  1. Have an admin log in to the Azure portal at https://portal.azure.com.

  2. Navigate to App Registrations and find the custom Azure AD application you created.

  3. Under API Permissions, click Grant Consent and follow the wizard.

This gives your application permissions on the tenant under which it was created.
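
The authorization code and token exchange steps described in this section can be sketched as follows. This is a minimal illustration, not Striim's implementation: it assumes the Microsoft identity platform v2.0 endpoints under https://login.microsoftonline.com, and the function names and placeholder values are hypothetical. The code only builds the requests; it does not send them.

```python
from urllib.parse import urlencode

# Assumption: the Microsoft identity platform v2.0 endpoints are used.
AUTHORITY = "https://login.microsoftonline.com"


def build_authorize_url(tenant, client_id, redirect_uri, scopes):
    """Return the URL a user opens in a browser to obtain an authorization code."""
    query = urlencode({
        "client_id": client_id,
        "response_type": "code",
        "redirect_uri": redirect_uri,  # must match the registered redirect URI exactly
        "response_mode": "query",
        "scope": " ".join(scopes),
    })
    return f"{AUTHORITY}/{tenant}/oauth2/v2.0/authorize?{query}"


def build_token_request(tenant, client_id, client_secret, redirect_uri, code, scopes):
    """Return (url, form_fields) for the POST that exchanges the code for tokens."""
    url = f"{AUTHORITY}/{tenant}/oauth2/v2.0/token"
    form = {
        "grant_type": "authorization_code",
        "client_id": client_id,
        "client_secret": client_secret,
        "redirect_uri": redirect_uri,
        "code": code,  # the "code" query parameter returned to the redirect URI
        "scope": " ".join(scopes),
    }
    return url, form
```

Opening the authorize URL in a browser and signing in redirects to the redirect URI with a code query parameter. Posting the form fields to the token URL (as application/x-www-form-urlencoded) returns a JSON response; when offline_access is among the requested scopes, that response includes a refresh_token, which is the value the Refresh token property expects.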

Supported objects

The following are the supported objects for reading from Microsoft Excel Online:

  • Drives

  • SharePointSites

  • SharedDocuments

  • Workbooks

  • Worksheets

Microsoft Excel Online Reader properties

Property

Type

Default value

Notes

Azure tenant

String

The Microsoft Online tenant being used to access data. For instance, example.onmicrosoft.com.

Alternatively, specify the tenant ID. This value is the directory ID in the Azure Portal > Azure Active Directory > Properties.

OAuth Client ID

String

Client ID of the private app registered in the Active Directory of the Microsoft platform.

OAuth Client secret

Password

Client secret of the private app registered in the Active Directory of the Microsoft platform.

Drive

String

Specifies the ID of the drive. A list of all drives is available from the Drives view. This property takes precedence over SharepointURL.

This means that if both SharepointURL and Drive are specified, a schema is identified only for the drive specified by Drive, and tables are identified only from the worksheets in workbooks in that drive.

Workbook

String

Specifies the name or ID of the workbook. A list of all workbooks is available from the Workbooks view.

Include SharePoint sites

Boolean

Whether to retrieve drives for all SharePoint sites when querying the Drives view. If true, the provider retrieves all site IDs recursively and issues a separate call for each site to get its drives.

Setting this property to true may decrease performance for the Drives view.

Show shared documents

Boolean

Whether to show shared documents. If set to true, shared documents are listed alongside user-owned workbooks as database tables.

The authenticated user must have been granted direct access to the files, or the files must have been explicitly shared with that user.

Connection pool size

Integer

20

Specifies the maximum number of active connections.

Exclude tables

String

A list of tables excluded from read operations. Typically used to create a list of exceptions when the Tables property includes wildcards. Misconfiguration of the Tables and Exclude Tables properties can cause "Invalid table names" errors.

Incremental load marker

String

The incremental load marker is a unique incremental column in each object used for incremental load. When no marker is specified, tables are resynced at each polling interval.

Specify the name of the column that contains the start position value. This column must meet the following criteria:

  • It should have an integer or timestamp data type (for example, a creation timestamp or an employee ID).

  • The values must be unique and continuously increasing to ensure proper incremental reading.
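
As a rough illustration of the marker criteria above, the following sketch checks whether a column's values are unique and increasing. It interprets "continuously increasing" as strictly increasing in read order, which is an assumption; the function name is hypothetical and not part of Striim.

```python
def is_valid_marker(values):
    """Return True when every value is greater than the one before it.

    Strictly increasing values are necessarily unique, so one pass covers
    both criteria. Works for integers and for datetime objects alike.
    """
    return all(earlier < later for earlier, later in zip(values, values[1:]))
```

A column of employee IDs such as 101, 102, 105 passes this check; a column with a repeated or out-of-order value does not, and would be a poor choice for the marker.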

Migrate schema

Boolean

False

Only available in Initial Load or Automated mode. Set to True to enable initial schema migration, which propagates the object schema from the source to the target.

Mode

Select list:

  • Automated mode

  • Initial load

  • Incremental load

Automated

Automated mode applies incremental updates to objects that support incremental load and performs full resyncs for objects that do not support incremental load.

Polling interval

Integer

5m

Specifies an interval as an integer followed by a unit. Supported units are days (d), hours (h), minutes (m), or seconds (s). The reader polls the source at the specified interval.
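
To make the interval format concrete, here is a small sketch that converts a value such as 5m into seconds. It is an illustration of the format described above, not the reader's own parser; the function name is hypothetical, and it assumes a single integer followed by exactly one unit letter.

```python
import re

# Seconds per supported unit: days, hours, minutes, seconds.
_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def interval_to_seconds(text):
    """Convert an interval like '5m' or '2h' to a number of seconds."""
    match = re.fullmatch(r"(\d+)([dhms])", text.strip())
    if match is None:
        raise ValueError(f"not a valid interval: {text!r}")
    count, unit = match.groups()
    return int(count) * _UNIT_SECONDS[unit]
```

Under these assumptions, the default of 5m corresponds to polling every 300 seconds.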

Refresh token

Password

An OAuth 2.0 refresh token. Use the value generated while creating the token.

Start Position

String

%=-1

Value of the incremental load marker that defines the initial reading position.

Tables

String

A semicolon-delimited (;) list of objects to read from the source. Supports the % wildcard. Misconfiguration of the Tables and Exclude Tables properties can cause "Invalid table names" errors. Do not modify this property when recovery is enabled for the application.
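
The interaction between Tables and Exclude Tables can be sketched as a simple filter. This is only an approximation of the behavior described here: it assumes that % matches any sequence of characters (as in SQL LIKE), that Exclude Tables uses the same semicolon-delimited format, and that matching is case-sensitive. None of these details is confirmed by this page, and the function names are hypothetical.

```python
import re


def _to_regex(pattern):
    """Translate a pattern with % wildcards into a compiled regular expression."""
    literal_parts = [re.escape(part) for part in pattern.split("%")]
    return re.compile(".*".join(literal_parts))


def _parse(patterns):
    """Split a semicolon-delimited list, ignoring empty entries."""
    return [_to_regex(p.strip()) for p in patterns.split(";") if p.strip()]


def select_tables(available, tables, exclude_tables=""):
    """Return the names matched by tables and not matched by exclude_tables."""
    include = _parse(tables)
    exclude = _parse(exclude_tables)
    return [
        name for name in available
        if any(rx.fullmatch(name) for rx in include)
        and not any(rx.fullmatch(name) for rx in exclude)
    ]
```

For example, with available objects Sales_2023, Sales_2024, and Budget, a Tables value of Sales% combined with an Exclude Tables value of Sales_2023 would select only Sales_2024 under these assumptions.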

Thread pool count

Integer

10

The number of threads that run in parallel. A value of zero specifies single-threaded operation.

When the thread pool count is higher than the connection pool size, large data ingestion operations can cause the app to halt. Because the best performance is achieved with one thread for each table being synced, increase the connection pool size to match the number of threads in use.
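
The relationship between the two pool settings can be expressed as a simple pre-deployment check. This is a hypothetical helper for illustration only, not a Striim API; it merely flags the configuration that the note above warns against.

```python
def check_pool_sizes(thread_pool_count, connection_pool_size):
    """Return a warning message when threads outnumber connections, else None."""
    if thread_pool_count > connection_pool_size:
        return (
            f"Thread pool count ({thread_pool_count}) exceeds connection pool "
            f"size ({connection_pool_size}); increase the connection pool size."
        )
    return None
```

With the defaults listed in the table (thread pool count 10, connection pool size 20), this check returns no warning.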