How Amazon tests their products
At PlayerZero, we're constantly evaluating best practices from the market leaders in tech, specifically the MAMAA companies (Meta Platforms, Amazon, Apple, Microsoft, and Alphabet), to help set guiding principles for delivering end-to-end value. Note: this group of companies used to be referred to as FAANG, but the acronym has been updated to reflect current market players and rebrands among the constituent companies.
When you work at one of these behemoths, you have access to endless amounts of resources intended to support product managers & developers.
Now let’s say you’re not a PM or engineer at a MAMAA company, and you’re thinking to yourself… “I wonder what it would be like to have that kind of infrastructure and support in my day-to-day work, and what impact it would have on my productivity and general enjoyment of my job.”
To answer this question, let’s take a closer look at one particular example company to shed light on the resources available to you when you work at a big tech company, and how PlayerZero is matching it step-for-step with a streamlined data-first testing experience.
Amazon was originally created to optimize the book-buying process and has since grown into a juggernaut that has single-handedly transformed several consumer and B2B industries, from shopping, delivery, and supply chain to cloud computing and storage.
In fact, Amazon has gotten so big that a single checkout page can trigger over 10,000 backend service calls. This, obviously, places a lot of importance on consistently and thoroughly testing new code. One line of rogue code can cripple a system (cough, cough, Facebook…). And when you consider that Amazon.com sees roughly 197 million users worldwide, you can only imagine the constant stress being placed on the system.
But for Amazon, it goes deeper than tools and cadences for writing and maintaining tests; it extends to the construction of the team you work on, the expectations set by your team leader, and the systemic attitude toward unidentified errors that make it through to production. Let’s take a closer look at how Amazon builds and runs teams and controls the quality of its code.
The Amazon testing methodology
The sheer size of a company like Amazon requires an effort to control as many variables as possible, or things can get out of hand, quickly. Just like their business model and its focus on total A->Z experience oversight, Amazon tries to control each leg of the developer journey, allowing them to have confidence in the code that they commit.
So how do they do this? Well first off, they build their own internal tools. When you ask a veteran Amazon engineer what their thoughts are on some of the leading testing tools in the software industry, there’s a high likelihood that they would have no idea what you’re talking about.
This is by design. There are no third party integrations or hooks into outside platforms at Amazon. It’s an in-house, customized machine.
But it wasn’t always this way. Early on, Amazon made efforts to integrate industry tools into its workflow. Remedy, for example, was used for ticketing, but Amazon simply outgrew it, both internally and externally. Those tools weren’t as agile as Amazon needed to be; if a new direction was called for, being hamstrung by the pace of third parties was simply not something they could let happen.
The same idea applies to quality monitoring; it’s a completely in-house experience. What’s interesting about Amazon, however, is that testing is mostly carried out by the developers themselves, while QA divisions focus on building infrastructure and performing high-level functional testing. This runs counter to what you see at most other growing companies, which maintain separate subdivisions for writing code and for testing it.
At Amazon, automation is key to success. The more automation set in place, the higher the iteration speed. Testing new code early and often is crucial for reaching quality, faster, but only when the bumper guards are in place to allow developers to do it within a safe environment.
As a part of Amazon’s end-to-end test experience, developers are expected to continually test everything, whether that be integration testing, cross browser testing, regression testing, etc. Every scenario should be tested by the developer, but again not without the support automations built into their workflow by QA engineers. It is essential to highlight here that while QA engineers are not testing the majority of the code written, they are creating a foundation for engineers to write fast and with confidence.
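To make this concrete, here's a minimal sketch of the kind of regression test a developer would be expected to write alongside their code. The `apply_discount` function is entirely hypothetical, standing in for any piece of business logic; nothing here is Amazon's actual tooling:

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical business logic under test."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

# Regression tests pin down current behavior so a future change
# can't silently break it; a CI hook would run these on every commit.
def test_apply_discount():
    assert apply_discount(100.0, 25) == 75.0   # standard case
    assert apply_discount(19.99, 0) == 19.99   # no-op discount
```

The value of the QA-built automation described above is that tests like these run on every commit without the developer having to think about the plumbing.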
Now, is this a perfect system? Probably not, as developers are still tasked with writing tests on top of the automations, and that still takes a significant amount of time (a process that can take up to 2-3 weeks for each new feature). But, as we mentioned before, it allows devs to iterate fast and control their own quality and delivery timelines, which is a huge win in a massive organization like Amazon.
After new code has passed through Amazon's rigorous pre-production testing process, there should be no bugs, right? Wrong; bugs still slip through. And that is one of the main reasons canary testing exists.
Canary testing has become increasingly popular among big tech companies; it allows product teams to release to a small subset of the user population, identifying potential bugs or product missteps before they can affect the entire user base. It's a technique Amazon deploys for its new builds.
Think about it like this: imagine taking a new build and targeting it at a small slice of users, then integrating tools like Sentry and Google Analytics to track the interactions. Harnessing monitoring and usage analytics through a single mechanism not only brings quality to the forefront, it brings user-experience optimization with it.
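As a rough sketch of the routing half of that idea (the names and the 5% figure are illustrative assumptions, not Amazon's actual mechanism), a canary gate can be as simple as deterministic hash bucketing:

```python
import hashlib

CANARY_PERCENT = 5  # illustrative: send 5% of traffic to the new build

def in_canary(user_id: str, percent: int = CANARY_PERCENT) -> bool:
    """Deterministically bucket a user so they always see the same build."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < percent

def choose_build(user_id: str) -> str:
    return "canary" if in_canary(user_id) else "stable"
```

Because the bucketing is deterministic, a user who lands in the canary keeps landing in it, which keeps their monitoring and analytics data coherent across sessions.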
Again, it's another checks-and-balances system that delivers confidence and protection against uncaught errors.
So what if a bug STILL persists? What happens then? Well, that depends on the bug.
Amazon is known for its hierarchy of incident severity. The levels are as follows:
Level 5 - Not a huge deal, let’s get it fixed.
Level 4 - A slightly bigger deal, let’s prioritize this.
Level 3 - Ok, now we’re seeing an impact. Get it fixed today.
Level 2 - If it's not fixed in 30 minutes, it's moving up the chain.
Level 1 - The CEO gets alerted; all hands on deck. (Most of the time, it's because someone did something silly, like deleting a data table.)
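A hedged sketch of how a hierarchy like this might be encoded in tooling (the names and thresholds below simply paraphrase the list above; Amazon's internal systems are obviously far more sophisticated):

```python
from enum import IntEnum

class Severity(IntEnum):
    SEV1 = 1  # CEO alerted, all hands on deck
    SEV2 = 2  # escalates up the chain if unresolved in 30 minutes
    SEV3 = 3  # visible impact: fix today
    SEV4 = 4  # slightly bigger deal: prioritize
    SEV5 = 5  # not a huge deal: fix when convenient

# Minutes an incident may stay open before escalating (None = no auto-escalation).
ESCALATION_MINUTES = {
    Severity.SEV1: 0,        # escalate immediately
    Severity.SEV2: 30,
    Severity.SEV3: 24 * 60,  # "today"
    Severity.SEV4: None,
    Severity.SEV5: None,
}

def should_escalate(sev: Severity, minutes_open: int) -> bool:
    limit = ESCALATION_MINUTES[sev]
    return limit is not None and minutes_open >= limit
```

The point of making the thresholds explicit is that escalation becomes automatic rather than a judgment call made under pressure.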
If a Level 1 or 2 incident squeaks through the cracks, cue the red flashing lights and sirens: Houston, we have a problem. A few years ago, when Jeff Wilke, then CEO of Amazon Worldwide Consumer, ordered an item on Amazon.com but had it delivered from Fresh instead, it was a bad day for the team. You never want one of the highest-ranking members of one of the world's biggest organizations to come across a bug in casual product use rather than through the stacked ecosystem built to catch exactly that kind of problem.
Outside these glaring high-impact issues, code is expected to be clean at the unit level (level 4 or 5) as well. In the case that something does slip through the cracks, the developer is called into a meeting with their manager to discuss the problem in depth. Sometimes, depending on the manager, they will be assigned a COE (Correction of Error):
In response to the error, and to limit future occurrences, the developer is asked to walk through 3-5 “whys” for how it happened, identify the lessons learned, and clearly pinpoint the root cause. It's not just “which line of code was the problem?”; it's “here's the systemic issue that led to this occurring, and every step that led up to it.”
Managers at Amazon care most about monitoring the functionality that will affect the most users (i.e., prod-level, user-facing incidents), but nonetheless expect bug-free code upon release.
While all Amazon teams are unique in their construction and use of internal collaboration tools, they usually follow the two-pizza team structure. What does this mean? Well, if you can't feed the entire team with two pizzas (8-12 people), it's likely too big. And for big eaters like myself? Well, maybe there's some flexibility, but you get the point.
The ability to know your team and keep open, honest lines of communication is of paramount importance in an organization this big. You have a job, and it's expected that you will do it to the best of your ability; you will be held accountable for your work. The larger the team, the easier it is to find grey areas in deliverables and get lost in the weeds.
Velocity vs Lead Time
Amazon gives PMs and engineers the autonomy to execute and maintain the quality of their application - they are expected to do it all: write, manage, test, etc.
An interesting measure of impact and success used by managers at Amazon is velocity vs. lead time. Developers are given the power to set their own deadlines, as long as they can clearly articulate how fast they can build and how much time they need. If they can bring in (or build) a tool that helps write integration tests, for example, lead time goes down and velocity goes up. It's a big-picture approach to optimizing workflow and achieving more consistently successful outcomes.
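In back-of-the-envelope terms, the relationship the metric captures looks like this (all numbers below are made up purely for illustration, not Amazon figures):

```python
def lead_time_days(coding_days: float, testing_days: float) -> float:
    """Total time from starting a feature to shipping it."""
    return coding_days + testing_days

def velocity(features_shipped: int, total_days: float) -> float:
    """Features shipped per working day."""
    return features_shipped / total_days

# Before: hand-written integration tests add ~10 days per feature.
before = lead_time_days(coding_days=5, testing_days=10)  # 15 days per feature
# After: a test-generation tool cuts testing to ~2 days.
after = lead_time_days(coding_days=5, testing_days=2)    # 7 days per feature

# Shipping 4 features: cutting lead time raises velocity.
v_before = velocity(4, 4 * before)
v_after = velocity(4, 4 * after)
```

A developer who can quantify their lead time this way can set (and defend) their own deadlines, which is exactly the autonomy the metric is meant to support.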
Amazon puts a lot of faith in their developers to execute in a way that is best for the entire team, and they measure whether the goal was achieved, not how they got there (output over input).
How PlayerZero helps you test like an Amazon engineer
As a startup that's reimagining incident detection and management, we take every opportunity to learn about other companies and benchmark ourselves against them in meaningful ways. We constantly ask ourselves: What works, and what doesn’t? What are the best practices for success that we might be able to build into our own product? When we look at Amazon, we see a few core tenets that are built into our product experience:
Push early and often - quality code is driven by consistent feedback and iterations
- With PlayerZero - connect all the moving pieces, from backend to frontend to mobile, to see where things are breaking and what the knock-on effect is. Since we ingest all your various streams of data, you never have to guess where to iterate or debug next.
Collaboration levers are essential for solving problems.
- With PlayerZero - share issue reports directly with your developers and/or link them directly to a Datadog, Sentry, or Bugsnag issue. Dive into devtools and leave comments to guide your developers and exchange context.
Prioritize the high impact bugs, and the rest will follow.
- With PlayerZero - monitor the flows in your application that translate to actual dollars, and don't waste time prioritizing fixes for non-mission-critical sections of your product. Avoid the time suck and tangibly move your product experience forward for your users.
Monitor the impact on real people
- With PlayerZero - at the end of the day, customers are king. PlayerZero empowers you to get closer to your users with a full de-anonymized list of affected users for each and every issue.
We took what MAMAA (formerly FAANG) companies do for testing and built it into an end-to-end experience that a single product manager or developer can run in 5 minutes. Sign up and give it a try today!