Real-time Point-of-Sale Analytics With a Data Lakehouse

Retail is facing supply chain disruptions, from reduced product supply to diminished warehouse capacity, coupled with rapidly shifting consumer demands for a seamless experience in the new normal. In this blog, we’ll address the need for real-time data in retail, and how to overcome the challenges of moving real-time streaming of point-of-sale data at... The post Real-time Point-of-Sale Analytics With a Data Lakehouse appeared first on Databricks.

Source post →

Scaling LinkedIn's Hadoop YARN cluster beyond 10,000 nodes

Co-authors: Keqiu Hu, Jonathan Hung, Haibo Chen, and Sriram Rao. At LinkedIn, we use Hadoop as our backbone for big data analytics and machine learning. With an exponentially growing data volume, and the company heavily investing in machine learning and data science, we have been doubling our cluster size year over year to match the compute workload growth. Our largest cluster now has ~10,000 nodes, one of the largest (if not the largest) Hadoop clusters on the planet. Scaling Hadoop YARN has emerged as one of the most challenging tasks for our infrastructure over the years. In this […]

Source post →

Introducing LinNét: Using Rich Image and Text data to Categorize Products at Scale

In this post, we’ll discuss how we evolved and modernized our product categorization model, increasing our leaf precision by 8% while doubling our coverage. We’ll dive into the challenges of solving this problem at scale and the technical trade-offs we made along the way.

Source post →

Using Twin Neural Networks to Train Catalog Item Embeddings

Understanding the contents of a large digital catalog is a significant challenge for online businesses, but this challenge can be addressed using self-supervised neural network models. Product discovery in particular becomes difficult when a digital catalog gets to a size that is too large to manually label or analyze. For DoorDash, having a deep understanding ... The post Using Twin Neural Networks to Train Catalog Item Embeddings appeared first on DoorDash Engineering Blog.

Source post →

GPU-Accelerated Deep Learning Can Spot Signs of Early Alzheimer’s With 99% Accuracy

Speedy diagnoses are critical, especially when a loved one seems to be slowly losing their cognitive abilities. Researchers from the Kaunas University of Technology in Lithuania report they’ve developed a deep learning-based method able to predict the possible onset of Alzheimer’s disease from brain images with an accuracy of over 99 percent. The impact of ... The post GPU-Accelerated Deep Learning Can Spot Signs of Early Alzheimer’s With 99% Accuracy appeared first on The Official NVIDIA Blog.

Source post →

What Your More Experienced Programming Partner Learns from You

At Atomic, we program in pairs a lot. It’s how we teach each other and learn from one another. Pairing is how we build confidence that we’re building something actually great, not just something that makes sense to me when I’m under-caffeinated. It’s also core to the way that we run our business. We like […] The post What Your More Experienced Programming Partner Learns from You appeared first on Atomic Spin.

Source post →

Peek-a-boo! Sometimes No Data is the Answer

Sometimes experiments are run not to improve specific KPIs, but to disprove novelty effects or the null hypothesis. The post walks through the thought process, with examples that clarify what the authors mean by “no-data”.

Source post →

Chaos Experimentation, an open-source framework built on top of Envoy Proxy

Services are bound to degrade. It’s a matter of when, not if. In a distributed system where there are many interdependent microservices, it is increasingly difficult to know what will happen when a service is unavailable, latency goes up, or the success rate drops. Usually, companies find out the hard way when it happens in production and it affects their customers. This is where Chaos Engineering helps us.

Source post →

The Architecture of Uber’s API gateway

API gateways have become an integral part of microservices architecture in recent years. An API gateway provides a single point of entry for all our apps and provides an interface to access data, logic, or functionality from back-end microservices. It also … The post The Architecture of Uber’s API gateway appeared first on Uber Engineering Blog.

Source post →

Sharing learnings about our image cropping algorithm

Twitter shares a technical analysis of its assessment for potential bias in its image cropping algorithm as part of its efforts to be more transparent around how it uses machine learning to improve pe

Source post →

Threading at the Speed of Light

This two-part series by NYT is interesting. Part one talks about how to build consensus around changing best practices in tech; it is mostly about aligning internal teams around new changes. In part two, the authors go in depth with examples and constraints.

Source post →

Running Border Gateway Protocol in large-scale data centers

What the research is: A first-of-its-kind study that details the scalable design, software implementation, and operations of Facebook’s data center routing design, based on Border Gateway Protocol (BGP). BGP was originally designed to interconnect autonomous internet service providers (ISPs) on the global internet. Highly scalable and widely acknowledged as an attractive choice for routing, BGP [...]

Source post →

Algorithm-Assisted Inventory Curation

Personalizing fashion at scale requires that we build an inventory whose size and complexity are as great as that of our client base. To support our inventory expansion and our broader supply chain management, Stitch Fix has developed a suite of algorithms to act as a new inventory recommender system.

Source post →

Engineering Career Series: Using structured interviews to improve equity

For years, Yelp continued to use an interview process that was created when we were a 50-200 person Engineering organization, with only a handful of interviewers: each interviewer wrote their own interview questions; a few senior leaders gave overall hire/no hire decisions for every panel; and interviewers received ad hoc feedback from senior leaders when it seemed like they were too tough or too easy in their interviews. A few things went well: there was a strong sense of personal responsibility for both leaders and interviewers; turnaround time for offer approvals was quick; and Yelp values could be preserved by senior...

Source post →

A Heuristic for Multiple Times Speed-up of Model Training

The first question that comes to mind is: how? The answer is simple: reduce the number of datapoints. The more interesting question, however, is how the datapoints are reduced.

Source post →