Selling Brief — Hadoop Analytics Platform AWS Migration (Professional Services)
Holiday-Seasonal Analytics Platform Migration —
Specialty Jewelry Retailer (World's Largest Jewelry Retailer)
2022  ·  Amazon Web Services  ·  Specialty Retail  ·  Hadoop + Solr + Oracle + Informatica + Mulesoft → AWS  ·  SOW Executed
AWS  ·  Holiday-Seasonal Sizing  ·  Hadoop / EMR  ·  Apache Solr  ·  Oracle RDS  ·  Informatica ETL  ·  Mulesoft ESB  ·  ActiveMQ / AmazonMQ  ·  3-Scenario TCO Model  ·  Specialty Retail
5
Platform Stacks Migrated
Hadoop, Solr, Oracle, Informatica ETL, Mulesoft ESB — all scoped and priced together in a single ProServ SOW
150
Peak Season Days/Year
Oct–Feb (Holiday Rush) + Back to School — Hadoop workers sized at r5.8xlarge/256G to handle seasonal jewelry shopping data volumes
11
PROD Hadoop Workers
r5.8xlarge / 256G RAM / 32 vCPUs / 1.7TB IO2 20K IOPS each — compute-intensive analytics workload
SOW
Executed — ProServ
Signed as AWS Migration Professional Services engagement — targeted data platform migration, not full estate
~$500K
Automation Opportunity
Model identified $25K/mo ($500K+/yr) in additional savings through DEV/UAT/PROD automation, calculated as 25% of peak usage cost
3
Pricing Scenarios
Like-for-Like (inventory), Rightsized (usage-based), and Peak Usage (holiday) — each modeled for PROD, DEV, and TEST/UAT

Specialty Jewelry Retailer is the world's largest jewelry retailer — operating jewelry brand A, jewelry brand B, jewelry brand C, H.Samuel, Ernest Jones, and a portfolio of banner brands with thousands of retail locations across North America and the UK. Their business is one of the most seasonally concentrated in retail: the Oct–Feb window (Holiday Rush through Valentine's Day) represents the majority of their annual revenue, which means their analytics and data platform must be sized for peak retail season, not average load. The engagement was a targeted AWS Professional Services migration for their analytics stack — Hadoop, Solr, Oracle, Informatica ETL, and Mulesoft — not a general estate assessment. This made the pricing exercise uniquely demanding: you can't use average utilization for cluster sizing; you have to model peak compute capacity, peak IO throughput, and peak data volumes simultaneously.

The three-scenario TCO model (Like-for-Like from inventory, Rightsized from actual usage, and Peak Usage for holiday season) reflects exactly this complexity. The Hadoop worker nodes are r5.8xlarge instances — 256GB RAM, 32 vCPUs, IO2 with 20K IOPS — because the data processing workload during holiday shopping season requires both compute headroom and sustained disk throughput. The model also included an important architectural decision: should Mulesoft's ActiveMQ be lifted and shifted to EC2, or should Specialty Jewelry Retailer migrate to Amazon's native managed messaging service (AmazonMQ)? We modeled both options in the same workbook and flagged the trade-off explicitly.

Use this when selling to: Specialty retailers, luxury goods companies, and any retail or e-commerce organization with a Hadoop/Spark analytics platform and strong seasonal demand patterns. Also strong for any client evaluating a targeted analytics platform migration separate from their general estate, or clients at the "should we use managed AWS services vs. lift-and-shift" decision point.

Distributed Analytics
Apache Hadoop + AWS EMR
PROD: EC2 · DEV: EC2 · TEST: EC2 · EMR as managed option
The core analytics engine. PROD cluster: 3 master nodes (m5.8xlarge, 128G, IO2 10K IOPS) + 11 worker nodes (r5.8xlarge, 256G, IO2 20K IOPS, 1.7TB each). DEV/TEST: 3 master + 5 worker nodes. AWS EMR modeled as the managed alternative to EC2 Hadoop — cost included in PROD add-on costs ($2,759/mo).
m5.8xlarge (masters) · r5.8xlarge (workers) · AWS EMR option
Search Indexing
Apache Solr
PROD: 4 × EC2 · DEV: 2 × EC2 · TEST: 2 × EC2
Product and catalog search indexing layer — feeds the retail search experience. PROD: 4 × m6i.8xlarge (128G, 32 vCPUs, 1.7TB GP3 each). High-memory instances reflect Solr's index caching requirements. DEV/TEST use same instance type with reduced storage.
m6i.8xlarge · GP3 storage
Relational Database
Oracle → AWS RDS
PROD: RDS MultiAZ · DEV: RDS Single-AZ
Oracle relational workloads migrated to RDS. PROD: db.m5.4xlarge MultiAZ (64G, 16 vCPUs, 2.5TB IO1 50K IOPS) — high-IOPS provisioned storage for transactional throughput. On-Demand $12,094/mo; 3yr RI $10,568/mo + $10,512 upfront. DEV: db.m4.16xlarge Single-AZ.
db.m5.4xlarge MultiAZ · IO1 50K IOPS
ETL / Data Integration
Informatica PowerCenter
PROD: 4 × EC2 · DEV: 1 × EC2 · TEST: 4 × EC2
Data integration platform for ETL pipelines feeding the Hadoop cluster and downstream systems. PROD: 4 × r6i.4xlarge (128G, 16 vCPUs, 2TB GP3) — memory-optimized for large data set transformations. On-Demand $841/mo per node; 3yr RI $439/mo per node.
r6i.4xlarge · GP3 2TB
ESB / API Integration
Mulesoft (ESB)
PROD: 6 × EC2 · DEV: 2 × EC2 · TEST: 2 × EC2
Enterprise Service Bus for API and system integration. PROD: 6 × m6i.xlarge (16G, 4 vCPUs, 90GB GP3). On-Demand $191/mo each; 3yr RI $115/mo each — relatively low cost per node; value is in the integration fabric, not raw compute.
m6i.xlarge
Message Brokering
ActiveMQ / AmazonMQ
PROD: 6 × EC2 ActiveMQ · OR managed AmazonMQ
Message brokering for asynchronous communication across the analytics stack. Two options modeled: lift-and-shift EC2 ActiveMQ (6 × m6i.xlarge, $191/mo each on-demand) vs. Amazon MQ managed service (2 brokers w/ Active Standby, mq.m5.large — 4TB/mo outbound assumed). One option to be selected; both priced in the workbook.
EC2 m6i.xlarge · AmazonMQ mq.m5.large

Additional PROD services: AWS WAF ($9,312/mo, 20 Web ACLs × 20 rules each), AWS Transit Gateway (data transfer), IBM Tivoli monitoring (1 × m6i.large). DEV/TEST include parallel stacks of all five platforms plus the messaging layer. Total modeled nodes across all environments: 70+ instances.
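The Oracle RDS pricing quoted above pairs a lower monthly rate with an upfront payment, so the On-Demand vs. 3yr RI comparison is easiest to read as a total over the term. The sketch below is a worked check using only the three Oracle RDS PROD figures from this brief; the 36-month horizon and the helper name `term_cost` are assumptions made for illustration, not part of the workbook.

```python
# Oracle RDS PROD prices as quoted in this brief (USD).
ON_DEMAND_MONTHLY = 12_094
RI_MONTHLY = 10_568
RI_UPFRONT = 10_512
TERM_MONTHS = 36  # assumption: compare over the full 3-year RI term


def term_cost(monthly: float, upfront: float = 0.0, months: int = TERM_MONTHS) -> float:
    """Total spend over the term: one-time upfront plus recurring monthly charges."""
    return upfront + monthly * months


on_demand_total = term_cost(ON_DEMAND_MONTHLY)
ri_total = term_cost(RI_MONTHLY, RI_UPFRONT)
savings = on_demand_total - ri_total

# Months until the monthly discount has paid back the upfront fee.
break_even_months = RI_UPFRONT / (ON_DEMAND_MONTHLY - RI_MONTHLY)

print(f"On-Demand, 36 months: ${on_demand_total:,.0f}")
print(f"3yr RI, 36 months:    ${ri_total:,.0f}")
print(f"Savings over term:    ${savings:,.0f}")
print(f"Break-even:           {break_even_months:.1f} months")
```

On these figures the reservation pays back its upfront fee in roughly seven months, which is the kind of framing a buyer usually needs before committing to a three-year term.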

Scenario 1
Like-for-Like (Inventory)
AWS instance sized to match the current provisioned spec — same vCPU count, same RAM tier, same disk profile. Uses what's configured on-prem/in Advisory today, regardless of actual utilization. This is the highest-cost scenario: it preserves all current over-provisioning but ensures zero functional risk at cutover. Instance selection: RISC discovery inventory as the basis; matched to closest AWS instance family.
Scenario 2 — Recommended Baseline
Rightsized (Usage-Based)
AWS instance sized to actual observed CPU, memory, network, and disk I/O utilization from CloudScape / RISC discovery. Instances are downsized where utilization shows headroom. Instance Matching & Rightsized annotation applied to each node in the workbook. Recommended for steady-state planning — but must be paired with the Peak model for seasonal-capacity-constrained workloads like Hadoop workers during holiday.
Scenario 3 — Capacity Constraint
Peak Usage (Holiday Seasonal)
Cost modeled at full capacity utilization for 150 peak days/year: Oct–Feb Holiday Rush + Back to School. Peak hours modeled at 730.5 hrs/month equivalent (24×7). Hadoop workers at r5.8xlarge (256G) — peak sizing reflects max data processing volume during holiday shopping season, not average. Includes "add-back" calculation for peak hour demand on top of the baseline monthly cost. Automation opportunity: 25% reduction at peak = $25K/month or $500K+/year savings.
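The difference between Scenario 1 and Scenario 2 comes down to which numbers feed the instance match: the provisioned spec, or observed utilization plus headroom. The sketch below illustrates that idea only. The catalog is a small subset of real EC2 shapes, while the utilization figures, the 20% headroom, and the function names are hypothetical and are not taken from the workbook or from the RISC / CloudScape tooling.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    ram_gib: int


# Small illustrative subset of EC2 shapes; a real model would use a full price list.
CATALOG = [
    InstanceType("r5.2xlarge", 8, 64),
    InstanceType("m5.4xlarge", 16, 64),
    InstanceType("r5.4xlarge", 16, 128),
    InstanceType("m5.8xlarge", 32, 128),
    InstanceType("r5.8xlarge", 32, 256),
]


def smallest_fit(vcpus_needed: int, ram_needed_gib: int) -> InstanceType:
    """Smallest catalog entry that covers both the vCPU and RAM requirement."""
    fits = [i for i in CATALOG if i.vcpus >= vcpus_needed and i.ram_gib >= ram_needed_gib]
    if not fits:
        raise ValueError("no instance in the catalog is large enough")
    return min(fits, key=lambda i: (i.vcpus, i.ram_gib))


def like_for_like(prov_vcpus: int, prov_ram_gib: int) -> InstanceType:
    """Scenario 1: match the provisioned spec, ignoring utilization."""
    return smallest_fit(prov_vcpus, prov_ram_gib)


def rightsized(prov_vcpus: int, prov_ram_gib: int,
               cpu_util: float, mem_util: float, headroom: float = 0.20) -> InstanceType:
    """Scenario 2: match observed utilization plus a safety margin (hypothetical 20%)."""
    vcpus = math.ceil(prov_vcpus * cpu_util * (1 + headroom))
    ram = math.ceil(prov_ram_gib * mem_util * (1 + headroom))
    return smallest_fit(vcpus, ram)


# Hypothetical server: 32 vCPU / 256 GiB provisioned, 35% CPU and 40% memory observed.
print(like_for_like(32, 256).name)
print(rightsized(32, 256, cpu_util=0.35, mem_util=0.40).name)
```

The same function could be fed holiday-season observations instead of steady-state ones, which is why a node that looks oversized in Scenario 2 can still be the right size in Scenario 3.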
| Stack | Nodes | AWS Instance | On-Demand /mo | 3yr RI /mo | Notes |
|---|---|---|---|---|---|
| Hadoop Workers | 11 | r5.8xlarge (256G / 32 vCPU / 1.7TB IO2) | $106,170 | $96,975 | Peak-season sizing; IO2 20K IOPS per node |
| Hadoop Masters | 3 | m5.8xlarge (128G / 32 vCPU / 1.2TB IO2) | $8,201 | $6,292 | IO2 10K IOPS per node |
| Oracle RDS | 1 (MultiAZ) | db.m5.4xlarge (64G / 16 vCPU / 2.5TB IO1) | $12,094 | $10,568 + $10,512 upfront | IO1 50K IOPS; 2nd node covered by MultiAZ pricing |
| Solr | 4 | m6i.8xlarge (128G / 32 vCPU / 1.7TB GP3) | $5,422 | $2,971 | Search index caching requires high-memory tier |
| AWS EMR | Managed | Managed Hadoop Service | $2,759 | $2,759 | Managed alternative to EC2 Hadoop; no RI option |
| Informatica ETL | 4 | r6i.4xlarge (128G / 16 vCPU / 2TB GP3) | $3,364 | $1,756 | Memory-optimized for large ETL transformations |
| Mulesoft ESB | 6 | m6i.xlarge (16G / 4 vCPU / 90GB GP3) | $1,147 | $688 | 3yr No Upfront RI |
| ActiveMQ | 6 | m6i.xlarge (16G / 4 vCPU / 90GB GP3) | $1,147 | $688 | OR AmazonMQ managed broker; both priced; select one |
| AWS WAF | Service | 20 ACLs × 20 rules/ACL | $9,312 | $9,312 | Web Application Firewall; no RI; $370 one-time setup |
| Tivoli Monitoring | 1 | m6i.large (8G / 2 vCPU / 64GB GP3) | $119 | $81 | IBM Tivoli monitoring node |
| PROD Subtotal (ex-TGW) | | | ~$149,735/mo | ~$131,130/mo | Transit Gateway data transfer additional (variable by volume) |
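As a quick sanity check, the On-Demand column of the PROD table above can be summed to show how the subtotal is built and how much of it the Hadoop workers account for. The figures are copied from the table; the dictionary layout and variable names are only an illustrative sketch, not the workbook's structure.

```python
# PROD On-Demand monthly costs (USD), copied from the table in this brief.
PROD_ON_DEMAND = {
    "Hadoop Workers": 106_170,
    "Hadoop Masters": 8_201,
    "Oracle RDS": 12_094,
    "Solr": 5_422,
    "AWS EMR": 2_759,
    "Informatica ETL": 3_364,
    "Mulesoft ESB": 1_147,
    "ActiveMQ": 1_147,
    "AWS WAF": 9_312,
    "Tivoli Monitoring": 119,
}

subtotal = sum(PROD_ON_DEMAND.values())
hadoop_worker_share = PROD_ON_DEMAND["Hadoop Workers"] / subtotal

print(f"PROD On-Demand subtotal (ex-TGW): ${subtotal:,}/mo")
print(f"Hadoop workers share of subtotal: {hadoop_worker_share:.1%}")

# Largest line items first, to show where optimization effort pays off.
for stack, cost in sorted(PROD_ON_DEMAND.items(), key=lambda kv: kv[1], reverse=True)[:3]:
    print(f"{stack:<18} ${cost:>9,}")
```

The Hadoop workers make up roughly 70% of the On-Demand subtotal, which is why the seasonal sizing discussion focuses on that one row.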

The retailer's analytics platform cannot be sized for average load; it must be sized for peak. The Oct–Feb window (Holiday Rush: Christmas, Valentine's Day) plus Back to School represents the high-demand period for jewelry retail. In the TCO model, 150 days/year are designated as peak days. During peak, Hadoop workers process significantly higher data volumes: more transactions, more customer behavioral data, more inventory signals, more search queries feeding the Solr layer. The Hadoop workers are r5.8xlarge instances (256G RAM, IO2 20K IOPS, 1.7TB storage), not because average utilization requires this, but because the peak workload does.

This creates an interesting cost optimization opportunity: cloud elasticity means you don't have to keep 11 r5.8xlarge workers running 24/7 year-round. The model explicitly identified a $25K/month or $500K+/year savings from automating DEV/UAT/PROD scaling — spinning down or downsizing non-production environments outside peak windows, and implementing auto-scaling for the production Hadoop cluster during off-peak periods. This is the direct counter-argument to "the cloud is more expensive than on-prem" for seasonal workloads: on-prem is always sized for peak; cloud can be sized for average and burst for peak.

The three-scenario pricing model makes this tangible: Like-for-Like (always-on peak sizing) vs. Rightsized (average utilization baseline) vs. Peak Usage (holiday throughput). The delta between the Rightsized and Peak scenarios represents the cloud elasticity dividend — the cost you'd incur on-prem every day but only need to pay in cloud during the ~150 peak days.
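The elasticity dividend described above can be written as simple arithmetic: blend a peak monthly rate over the 150 peak days with an off-peak rate over the remaining 215, and compare against paying the peak rate all year. In the sketch below, the 150 peak days and the $106,170/mo Hadoop worker figure come from this brief; the off-peak rate (50% of peak) is a placeholder assumption chosen only to make the example run, and the function names are hypothetical.

```python
PEAK_DAYS = 150       # from the TCO model
DAYS_PER_YEAR = 365


def annual_cost(peak_monthly: float, offpeak_monthly: float,
                peak_days: int = PEAK_DAYS) -> float:
    """Annual cost when the peak rate applies only on peak days."""
    peak_share = peak_days / DAYS_PER_YEAR
    blended_monthly = peak_monthly * peak_share + offpeak_monthly * (1 - peak_share)
    return 12 * blended_monthly


def elasticity_dividend(peak_monthly: float, offpeak_monthly: float,
                        peak_days: int = PEAK_DAYS) -> float:
    """Annual saving versus running at the peak rate every day of the year."""
    always_on = annual_cost(peak_monthly, peak_monthly, peak_days)
    elastic = annual_cost(peak_monthly, offpeak_monthly, peak_days)
    return always_on - elastic


hadoop_workers_peak = 106_170                    # On-Demand /mo from this brief
assumed_offpeak = 0.5 * hadoop_workers_peak      # placeholder, not a workbook figure

print(f"Always-on: ${annual_cost(hadoop_workers_peak, hadoop_workers_peak):,.0f}/yr")
print(f"Elastic:   ${annual_cost(hadoop_workers_peak, assumed_offpeak):,.0f}/yr")
print(f"Dividend:  ${elasticity_dividend(hadoop_workers_peak, assumed_offpeak):,.0f}/yr")
```

Swapping in the Rightsized scenario's monthly figure for `assumed_offpeak` would turn this from an illustration into the comparison the workbook supports.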

Option 1 — Lift & Shift
EC2 ActiveMQ (6 × m6i.xlarge)
Run the existing Apache ActiveMQ broker configuration on EC2 instances. Same configuration as on-prem — minimal migration effort for the messaging layer, preserves any custom broker configuration, plugins, and client compatibility. PROD: 6 × m6i.xlarge at $191/mo on-demand each ($1,147/mo total). DEV and TEST: equivalent m5.large nodes at $74/mo each. Operationally more work: patching, broker management, HA configuration are all on the customer's team.
Option 2 — Cloud-Native
Amazon MQ (2 Brokers, Active Standby)
Amazon's managed ActiveMQ-compatible service. It supports the same protocols as on-prem ActiveMQ (AMQP, STOMP, OpenWire, MQTT), so existing client code should need little more than new broker endpoints and credentials. Fully managed: patching, HA, and backups are handled by AWS. Pricing model: 2 × mq.m5.large brokers with Active Standby configuration. Data transfer costed at 4TB/month outbound. Higher per-node cost than EC2 ActiveMQ, but it removes most of the broker operational overhead from the customer's team. The note in the workbook explicitly flagged: "Either ActiveMQ or AmazonMQ will be used — final calculator should remove one of the highlighted sets." Both were priced so the client can make the trade-off decision.
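For OpenWire clients, the practical change in Option 2 is usually the connection string: an Amazon MQ active/standby pair exposes two endpoints, and clients list both in an ActiveMQ failover transport URI. The sketch below only builds that string. The broker hostnames are placeholders, and the helper name `failover_uri` is hypothetical; 61617 is the OpenWire-over-SSL port Amazon MQ uses.

```python
from typing import Sequence


def failover_uri(broker_hosts: Sequence[str], port: int = 61617, scheme: str = "ssl") -> str:
    """Build an ActiveMQ failover transport URI listing every broker endpoint."""
    if not broker_hosts:
        raise ValueError("at least one broker host is required")
    endpoints = ",".join(f"{scheme}://{host}:{port}" for host in broker_hosts)
    return f"failover:({endpoints})"


# Placeholder hostnames for an active/standby pair; real ones come from the broker console.
amazon_mq_hosts = [
    "b-EXAMPLE-1.mq.us-east-1.amazonaws.com",
    "b-EXAMPLE-2.mq.us-east-1.amazonaws.com",
]

print(failover_uri(amazon_mq_hosts))
```

The client reconnects to whichever endpoint is active, so a standby promotion does not require an application change.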
Automation Opportunity
$500K+/year
Model explicitly identified $25K/month in savings from DEV/UAT/PROD automation — auto-scaling Hadoop cluster for off-peak, scheduled spin-down of non-production stacks, and dynamic Solr index tier management. This was not a vague estimate: the model calculated it at 25% of peak usage cost. Implementing this automation is a natural next engagement after the migration lands.
General Estate Migration
Full Retail Estate
The Specialty Jewelry Retailer SOW was scoped to the analytics/data platform — Hadoop, Solr, Oracle, Informatica, Mulesoft. The broader Specialty Jewelry Retailer retail technology estate (point-of-sale, inventory management, store systems, e-commerce platform) was not in scope. A successful analytics platform migration builds the Advisory relationship and credibility for a broader estate assessment and migration engagement.
Managed AWS Services
Post-Migration Ops
At a PROD infrastructure run rate of ~$130K–$150K/month, an Advisory Managed AWS layer (FinOps cost governance, automated rightsizing alerts, security operations, patch management) has strong ROI and fits the $35K/mo Service Blocks tier or custom pricing based on actual AWS spend. The holiday-season cost spike makes FinOps and auto-scaling automation particularly valuable: continuous rightsizing during off-peak periods translates directly into lower monthly spend.

Retail Analytics Platform Migrations, Seasonal Workloads, and Targeted Scope Migrations

"We need to migrate our Hadoop cluster and data analytics platform to AWS, but our business is highly seasonal. We're worried about over-provisioning for peak and wasting money in off-peak."
This is exactly the Specialty Jewelry Retailer engagement. World's largest jewelry retailer — Kay, jewelry brand B, jewelry brand C — with a Hadoop analytics platform running on large EC2-equivalent nodes sized for holiday season. The Oct–Feb window (Christmas, Valentine's Day) drives the majority of their jewelry revenue, and the Hadoop workers have to handle that data volume at peak. The TCO model was built with three scenarios: Like-for-Like from inventory (always provisioned at peak), Rightsized from actual usage (average load), and a Peak Usage model that specifically calculates the Oct–Feb cost delta. The cloud's answer to seasonality isn't "accept the cost" — it's automation. We modeled a $25K/month or $500K+/year savings from auto-scaling the cluster during off-peak periods. That number came directly from the rightsizing analysis: the difference between what you'd pay year-round at peak sizing vs. what you'd pay if you could scale down during the non-peak 215 days. Cloud elasticity converts a fixed capital cost into a variable operational cost — and for a jewelry retailer, that seasonal variability is extremely valuable.
"We're running Hadoop on-prem and evaluating whether to migrate to EC2 or move to Amazon EMR as a managed service."
We ran this exact evaluation for Specialty Jewelry Retailer. The short answer: both paths were priced in the same workbook, and the decision comes down to operational maturity and data platform complexity. For Specialty Jewelry Retailer, we modeled: EC2 Hadoop (3 master + 11 worker nodes, fully self-managed) at specific instance types with IO2 storage for the throughput requirements; and Amazon EMR as the managed alternative, priced at about $2,759/month for the managed service layer on top of whatever EC2 instance type you choose. EMR handles cluster bootstrapping, software version management, scaling, and integration with S3 as the storage layer — significantly reducing operational overhead. For a retailer whose IT team's core competency is retail analytics, not Hadoop infrastructure management, EMR is usually the right answer. The trade-off is flexibility: EC2 Hadoop gives you full control over the Hadoop distribution, plugins, and cluster configuration; EMR constrains you to AWS-managed distributions and configurations. If you have custom Hadoop modifications or non-standard HDFS configurations, EMR requires a migration effort within the migration. For standard HDP or CDH workloads, EMR is almost always the right long-term answer.
"We have a mixed data platform — Hadoop, Oracle, Informatica, Mulesoft — and no one vendor seems to understand the full stack. Can Advisory scope a migration for a heterogeneous platform like this?"
The Specialty Jewelry Retailer engagement was precisely this. Five distinct technology stacks (Hadoop cluster, Apache Solr search indexing, Oracle database migrated to RDS MultiAZ, Informatica PowerCenter ETL, and Mulesoft ESB) were all in scope simultaneously in a single Professional Services SOW, along with the messaging evaluation (ActiveMQ vs. AmazonMQ) and the security layer (AWS WAF). The TCO workbook had a tab for each environment (PROD, DEV, TEST), with each technology priced independently against the right AWS service for that component. Oracle went to RDS MultiAZ with IO1 50K IOPS provisioned storage rather than EC2 Oracle, because RDS provides the HA and backup automation that the current on-prem setup likely required engineering effort to maintain. Informatica stayed on EC2 (r6i.4xlarge memory-optimized) because Informatica doesn't have a managed AWS equivalent and the lift-and-shift path is straightforward. Mulesoft was priced on EC2, and its ActiveMQ messaging layer was priced both on EC2 and as a managed service (AmazonMQ). The architecture decision for each stack was based on what the specific technology needed, not on a blanket "everything to EC2" or "everything to managed services" approach.
Client: Specialty Jewelry Retailer Ltd., the world's largest specialty jewelry retailer, operating the jewelry brand A, jewelry brand B, jewelry brand C, Peoples, H.Samuel, and Ernest Jones brands. Corporate domain: signetomni.com. Publicly traded (NYSE: SIG).
Engagement: Hadoop Analytics Platform AWS Migration, Professional Services.
SOW: "Specialty Jewelry Retailer Hadoop AWS Migration Proserv - EXECUTED" (2022).
Scope: Targeted data/analytics platform migration (Hadoop, Solr, Oracle, Informatica, Mulesoft, ActiveMQ/AmazonMQ). NOT a full estate migration.
Deliverables: 3-scenario TCO model (Like-for-Like, Rightsized, Peak Usage), multiple versions through v1.22; Server Inventory Application Mapping; Asset Error remediation; RISC-based IaaS cloud pricing; CloudScape export (cloud-cost-assets-2022-06-21).
AWS regions: US Virginia (primary).
External references: Use "Major Specialty Jewelry Retailer" or "Specialty Retail / Specialty Consumer" unless otherwise confirmed with account team. Do not reference specific brand names (Kay, jewelry brand B, jewelry brand C) in external materials.