Spotfire Copilot: Implementation Guide, Architecture, and Training

Why Spotfire Copilot Matters for Data & AI Teams

Organizations are under pressure to accelerate insights, govern AI responsibly, and scale analytics without expanding headcount. Spotfire Copilot adds a natural-language assistant directly inside Spotfire, enabling business users, analysts, and developers to ask questions, generate visuals, and even build data functions faster, all within the guardrails your enterprise needs.

What you’ll learn in this webinar:

Spotfire Copilot overview and real-world demos
Technical architecture: data loaders, vector DB, LLM options, and the orchestrator
Implementation steps: installation, configuration, and token management
Governance & security considerations for regulated environments
Training paths for operators, business users, and developers
Advanced energy analytics use case: drilling reports with RAG and graph DB

What Is Spotfire Copilot? (Features & Value)

Natural-language Q&A over Spotfire DXPs: find trends, outliers, and metrics quickly.
Autocharts: “natural language in → visualizations out” across 10+ native chart types.
Data function generation: describe what you want, and Copilot scaffolds the script and its inputs/outputs.
Documentation search: ask product or API questions without leaving your DXP.
Custom corpus (RAG): index your manuals, SOPs, and PDFs for private Q&A.

Outcome: faster analysis, reduced developer toil, and more self-serve analytics for business users.

Spotfire Copilot Architecture: How It Works

A production-ready Copilot environment pairs the front end (the Copilot add-ins inside Spotfire) with four back-end components:

Data Loaders (containers): ingest Spotfire docs, DXPs, and your own PDFs/SOPs.
Vector DB: stores and serves high-dimensional embeddings for retrieval-augmented generation (RAG).
LLM(s): your choice of Azure OpenAI/OpenAI, Claude (AWS), Gemini (GCP), or self-hosted models via Cohere/Ollama.
Orchestrator: coordinates loaders, vector DB, and LLMs; handles workflows and short-lived tokens.
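To make the retrieval flow concrete, here is a minimal Python sketch of how the back-end roles fit together. It is not Spotfire Copilot's implementation: the names `embed`, `InMemoryVectorStore`, and `build_prompt` are hypothetical, the "embedding" is a toy word-count vector rather than a real embedding model, and the vector DB is an in-memory list.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class InMemoryVectorStore:
    """Hypothetical stand-in for the vector DB: a list of (chunk, vector) pairs."""
    items: list = field(default_factory=list)

    def add(self, chunk: str) -> None:
        # Data loader role: embed a chunk of source text and store it.
        self.items.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 3) -> list:
        # Retrieval role: return the k chunks most similar to the query.
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(store: InMemoryVectorStore, question: str, k: int = 3) -> str:
    # Orchestrator role: retrieve context and assemble the grounded prompt for the LLM.
    context = "\n".join(f"- {chunk}" for chunk in store.search(question, k))
    return f"Answer using only the context below.\n{context}\n\nQuestion: {question}"


store = InMemoryVectorStore()
store.add("Bar charts compare values across categories.")
store.add("Line charts show trends over time.")
print(store.search("show a trend over time", k=1))
```

In a real deployment, `embed` would call the embedding model for your chosen provider, `search` would be a query against the vector DB, and the string returned by `build_prompt` is what the orchestrator would send to the LLM.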

Deployment patterns:

Fully on-prem Spotfire Copilot: all components inside your network perimeter.
Hybrid cloud: keep loaders/orchestrator on-prem; use managed vector DB and LLM APIs for scale.
Multi-cloud: distribute components across cloud providers for resilience and lower latency.

Implementation Checklist: From Pilot to Production

Select LLM(s) based on data sensitivity, latency, cost, and allowed providers.
Choose a vector database (managed or self-hosted) aligned to your infra standards.
Stand up containers (Docker/Kubernetes) for loaders and orchestrators.
Connect sources (DXPs, documentation, data catalogues) via loaders.
Configure authentication & short-lived tokens; establish rotation policies.
Deploy web player add-ins and enable Copilot for the right user groups.
Validation and hardening: prompt-attack tests, hallucination tests, access-boundary checks, and audit-log review.
Promote Dev → Test → Prod with change controls and runtime monitoring.

Pro tip: treat Copilot as a co-pilot, not the pilot. Keep humans in the loop, especially for data functions and governance workflows.

Governance, Security, and Access Controls

RBAC alignment: Copilot respects Spotfire user roles and node/region boundaries.
Corpus scoping: separate in-scope and out-of-scope data; isolate by environment when needed.
Auditability: log Q&A, data access, and Copilot actions to your SIEM or action logs.
PII/PHI controls: apply masking/pseudonymization before indexing; enforce least privilege.
Token handling: short-lived orchestrator tokens + centralized secrets management.
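Three of these controls (masking PII before indexing, least-privilege corpus scoping, and audit records) can be sketched in a few lines of Python. The regex patterns, the `Chunk` structure, the role names, and the audit fields are hypothetical simplifications; regulated environments would use vetted PII detection and the roles defined in Spotfire itself.

```python
import json
import re
import time
from dataclasses import dataclass

# Hypothetical, deliberately simple PII patterns; use a vetted detector in production.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@(?:[\w-]+\.)+[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def mask_pii(text: str) -> str:
    """Replace each PII match with a placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


@dataclass(frozen=True)
class Chunk:
    text: str
    allowed_roles: frozenset  # roles permitted to retrieve this chunk


def index_document(text: str, allowed_roles: set) -> Chunk:
    # Mask before indexing so raw PII never reaches the vector DB.
    return Chunk(mask_pii(text), frozenset(allowed_roles))


def scoped_chunks(chunks: list, user_roles: set) -> list:
    # Least privilege: drop chunks the user's roles cannot see before any ranking.
    return [c for c in chunks if c.allowed_roles & user_roles]


def audit_record(user: str, question: str, returned: list) -> str:
    # One JSON line per answer, suitable for shipping to a SIEM.
    return json.dumps({"ts": time.time(), "user": user,
                       "question": question, "chunks_returned": len(returned)})
```

Filtering before retrieval, rather than after the LLM answers, means out-of-scope text never enters the prompt at all.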

Demo Highlights: Building Faster with Spotfire Copilot

Data function in minutes: the presenters created a function to fetch, smooth, and pivot external time-series data; Copilot generated the script and its inputs/outputs end to end.
"What should I look at?": Copilot summarized metadata across data tables to guide initial exploration.
Autocharts: natural-language prompts generated the appropriate visualization without manual configuration.
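The session did not show the generated script, so the following is only a hypothetical Python illustration of the logic such a data function might contain: smoothing each series with a trailing moving average and pivoting from long to wide format. The fetch step is stubbed out (rows are passed in directly), and the function names, series names, and window size are assumptions.

```python
from collections import defaultdict


def moving_average(values: list, window: int = 3) -> list:
    """Trailing moving average; the first points average over what is available."""
    out = []
    for i in range(len(values)):
        recent = values[max(0, i - window + 1): i + 1]
        out.append(sum(recent) / len(recent))
    return out


def smooth_and_pivot(rows, window: int = 3) -> dict:
    """rows: (timestamp, series, value) tuples in long format, as a fetch step might return.

    Returns a wide table {timestamp: {series: smoothed_value}}, sorted by timestamp.
    """
    by_series = defaultdict(list)
    for ts, series, value in sorted(rows):  # sort so each series is in time order
        by_series[series].append((ts, value))

    wide = defaultdict(dict)
    for series, points in by_series.items():
        smoothed = moving_average([v for _, v in points], window)
        for (ts, _), value in zip(points, smoothed):
            wide[ts][series] = value
    return {ts: wide[ts] for ts in sorted(wide)}


rows = [
    ("2024-01-01", "pressure", 100.0), ("2024-01-02", "pressure", 110.0),
    ("2024-01-03", "pressure", 90.0), ("2024-01-01", "rate", 5.0),
    ("2024-01-02", "rate", 7.0),
]
print(smooth_and_pivot(rows, window=2))
```

Smoothing happens per series before the pivot, so a missing timestamp in one series (here `rate` on 2024-01-03) does not distort the others; it simply leaves that cell absent in the wide table.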

Advanced Use Case: Daily Drilling Reports with Graph DB + RAG

Energy teams manage thousands of PDF drilling reports across wells and years. Vector search alone can lose the relationships between entities such as wells, formations, and events. In our approach:

Chunk PDFs intelligently and index them into a graph DB (e.g., Amazon Neptune) to preserve entities and relationships, optionally alongside an OpenSearch vector DB.
Use agents/workflows to extract formations, events, weather, issues, and KPIs with high accuracy and fewer hallucinations.
Serve answers inside Spotfire via Copilot for interactive Q&A against unstructured operational history.
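The graph-plus-retrieval idea can be sketched with an in-memory graph in Python. This stands in for Amazon Neptune and the extraction agents described above: entities are found with simple, hypothetical regex patterns rather than an LLM agent, chunking splits on blank lines, and `ReportGraph` is an illustrative name, not a Neptune or Spotfire API. The sketch shows how a question about one well can pull in chunks from other reports through a shared formation, a link that pure similarity search may miss.

```python
import re
from collections import defaultdict

# Hypothetical extraction patterns; the approach above uses agents/LLMs instead.
ENTITY_PATTERNS = [
    re.compile(r"\bWell [A-Z]-\d+\b"),         # e.g. "Well A-12"
    re.compile(r"\b[A-Z][a-z]+ Formation\b"),  # e.g. "Montney Formation"
]


class ReportGraph:
    """In-memory stand-in for a graph DB of report chunks and entities."""

    def __init__(self):
        self.chunks = {}                  # chunk id -> chunk text
        self.mentions = defaultdict(set)  # entity -> ids of chunks that mention it
        self.related = defaultdict(set)   # entity -> entities mentioned alongside it

    def add_report(self, report_id: str, text: str) -> None:
        # Chunk on blank lines so each paragraph of the report stays intact.
        paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
        for i, para in enumerate(paragraphs):
            chunk_id = f"{report_id}#{i}"
            self.chunks[chunk_id] = para
            found = {m for pattern in ENTITY_PATTERNS for m in pattern.findall(para)}
            for entity in found:
                self.mentions[entity].add(chunk_id)
                self.related[entity] |= found - {entity}

    def context_for(self, entity: str, hops: int = 1) -> list:
        """Chunks mentioning the entity, expanded through co-mentioned entities."""
        seen, frontier = {entity}, {entity}
        for _ in range(hops):
            frontier = {r for e in frontier for r in self.related.get(e, ())} - seen
            seen |= frontier
        ids = sorted({cid for e in seen for cid in self.mentions.get(e, ())})
        return [self.chunks[cid] for cid in ids]


g = ReportGraph()
g.add_report("DDR-001", "Well A-12 entered the Montney Formation.\n\nWeather delay of 4 hours.")
g.add_report("DDR-002", "Lost circulation in the Montney Formation at Well B-7.")
print(g.context_for("Well A-12", hops=1))
```

The retrieved chunks would then be passed to the LLM as context, in the same way as vector-retrieved chunks.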

Why it matters: faster incident review, trend discovery, and report generation, without manually sifting through PDFs.

Training Paths: Operations, Business Users, Developers

Operations (Admins): installation, orchestration, token/auth lifecycle, monitoring.
Business Users: effective prompting, Q&A across DXPs, adding contextual docs.
Developers: data function generation, complex visual workflows, advanced prompting for reproducible outputs.

What’s New & What’s Next

Recent and upcoming improvements highlighted in the session:

Explain Visual now uses aggregated plot data (not raw data) to boost accuracy.
Markdown-formatted responses for readability.
Increased instrumentation & action logs for admin insight.
Smaller container images and short-lived tokens for smoother ops.
Exploration into agentic workflows (planner/supervisor + domain agents) for reliability and cost control.
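The Explain Visual change can be illustrated by aggregating rows the way a bar chart does before anything is sent to the LLM. This is a hypothetical Python sketch, not Spotfire's implementation; the function names, the JSON payload shape, and the `sum`/`avg`/`count` options are assumptions.

```python
import json
from collections import defaultdict

# Hypothetical aggregation options, mirroring common chart aggregations.
AGGREGATIONS = {
    "sum": sum,
    "avg": lambda values: sum(values) / len(values),
    "count": len,
}


def aggregate_for_visual(rows, category_key: str, value_key: str, agg: str = "sum") -> dict:
    """Collapse raw rows into one value per category, like the bars of a bar chart."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[category_key]].append(row[value_key])
    return {cat: AGGREGATIONS[agg](vals) for cat, vals in sorted(groups.items())}


def explain_payload(rows, category_key: str, value_key: str, agg: str = "sum") -> str:
    """Compact JSON describing the plotted points, for an LLM prompt."""
    points = aggregate_for_visual(rows, category_key, value_key, agg)
    return json.dumps({"x": category_key, "y": f"{agg}({value_key})", "points": points})


rows = [
    {"region": "North", "sales": 10},
    {"region": "South", "sales": 5},
    {"region": "North", "sales": 7},
]
print(explain_payload(rows, "region", "sales"))
# {"x": "region", "y": "sum(sales)", "points": {"North": 17, "South": 5}}
```

A payload like this holds one value per mark on the chart, so the model describes what the user actually sees, and the prompt stays small no matter how many rows sit behind the visual.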

FAQ: Spotfire Copilot for the Enterprise

Can we restrict Copilot access by user or environment?
Yes. Align Copilot with Spotfire roles, nodes, and regions, or isolate it in a separate environment to segregate in-scope from out-of-scope data.

Do we need the public cloud?
No. You can deploy fully on-prem, run a hybrid, or go multi-cloud depending on policy, latency, and cost.

Which LLM is best?
It depends on data sensitivity, prompt cost, speed, and approved providers. Many teams standardize on Azure OpenAI, test Claude/Gemini, and evaluate self-hosted models for private workloads.

How do we prevent hallucinations?
Use curated corpora, graph + vector retrieval, prompt templates, and strict RBAC. Keep a human in the loop for critical outputs.

Speakers:

Amanda Summers — Director of Client Engagement, Cadeon
Kelly Blair — Certified Trainer & Data Specialist, Cadeon
Vaibhav Gedigeri — Principal Data Scientist, Spotfire

Learn more on our training page at https://cadeon.com/training-services/
