My Stack After Burning Through a Dozen Tools
Most “best tools for data science” lists are written by someone who watched a YouTube tutorial, got a Jupyter notebook running, and called it research. I know because I wrote one of those lists in 2022 and I’m embarrassed by it now. The tools I actually depend on today look nothing like what I’d have recommended back then — and the gap between “I tried this” and “I use this at 2am when a pipeline breaks” is enormous. What changed my thinking wasn’t reading more blog posts. It was shipping a production ML pipeline to a client, watching it eat 8GB of RAM on a CSV parse, and rebuilding the whole stack from scratch over three weeks.
Here’s what I actually need daily: fast iteration without restarting the kernel every time I change a function signature, reproducible environments that behave identically on my MacBook and the Ubuntu EC2 I deploy to, and data pipelines that don’t silently corrupt or crash when files hit 3–5GB — which happens more than anyone admits. I also need tooling that doesn’t require a 40-minute setup ritual every time I start a new project. That last requirement killed more tools than anything else. Papermill looked great until I realized every parameterized notebook run required me to babysit a config file that drifted from the actual code within a week.
Two tools I dropped since 2022: Conda (replaced with a tighter combination of pyenv + pip-tools for most projects, and Docker for anything client-facing), and standard Jupyter notebooks for anything beyond quick EDA. Conda’s environment solves felt like paying a 90-second tax on every single command. The conda env export output is also famously unreliable across platforms — the --from-history flag helps but it doesn’t capture transitive deps properly. I’ve had “works on my machine” disasters with Conda environments more than once, and that’s the one thing I can’t tolerate.
What replaced them tells you more than any feature list. For environments, I now pin dependencies with pip-tools and a two-file setup:
# requirements.in — what I actually want
pandas>=2.0
scikit-learn
polars
# then run:
pip-compile requirements.in
# outputs requirements.txt with every transitive dep pinned and hashed
That requirements.txt is deterministic and portable. I check it into git. Junior devs on my team can run pip install -r requirements.txt and get exactly what I have. No resolver drama, no platform-specific build failures hiding behind a conda install that silently grabbed a different binary. The gotcha nobody mentions: pip-compile respects your current Python version when resolving, so if you compile on 3.11 and deploy on 3.10, you can still hit conflicts. Always compile inside a pyenv shell that matches your target environment.
For broader tooling context — if you’re running a small operation where the data science stack is just one piece of a larger software and ops puzzle — the Essential SaaS Tools for Small Business in 2026 guide is worth a read. It covers the ecosystem around dev tooling that most purely technical guides ignore, like where your notebooks actually live in a business context. But the five tools below are the ones I’d rebuild my setup around if I had to start over today — chosen specifically because they survive contact with real data, real deadlines, and the specific hell of a 4GB CSV that someone sent you in an email.
1. Marimo — JupyterLab Replacement I Didn’t Expect to Love
Why I Switched Away from Jupyter (And What It Cost Me)
I hit the same bug twice before I took it seriously. A notebook I’d been using for a client churn model was producing different outputs depending on which cells got re-run during exploration. The model trained fine, the results looked reasonable, but the data preprocessing steps had drifted from what was actually serialized. Jupyter’s cell execution model makes this trivially easy to do — you edit a cell mid-session, run it out of order, and your kernel state no longer matches what the file describes. The second time this happened in a production context, I went looking for an alternative and found Marimo.
The core thing Marimo does differently is reactive execution. Every cell that depends on a variable you changed gets re-run automatically, like a spreadsheet. You can’t run cells out of order because Marimo builds a dependency graph and enforces it. If cell B uses a variable defined in cell A, Marimo knows that — and if you delete cell A, cell B immediately errors rather than silently using a stale value from memory. This sounds annoying until you realize it’s exactly what you wanted your notebooks to do all along.
Installation and the File Format Shift
Getting started is fast:
pip install marimo
marimo edit notebook.py
That .py extension is not a typo. Marimo notebooks are valid Python scripts. Every cell becomes a function internally. This means you can run a Marimo notebook directly from the terminal with python notebook.py or import functions from it like any other module. No more nbconvert, no more stripping output before committing, no more JSON merge conflicts that make your tech lead want to cry.
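To make that concrete: a saved Marimo notebook is just a module of decorated functions. The exact boilerplate it generates shifts between versions, and the filenames and variable names below are placeholders, but the shape looks roughly like this — a sketch, not a copy of a real generated file:
import marimo

app = marimo.App()


@app.cell
def _():
    import polars as pl
    return (pl,)


@app.cell
def _(pl):
    # The `pl` parameter is how this cell declares its dependency on the cell above
    df = pl.read_csv("sales.csv")
    return (df,)


if __name__ == "__main__":
    app.run()
Each cell declares what it consumes through its parameters and what it defines through its return value, which is how the dependency graph from the previous section gets built.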
The thing that actually surprised me the most wasn’t the reactive execution — it was opening a pull request with a Marimo notebook change and being able to read the diff like normal code. A Jupyter .ipynb file is a JSON blob with cell outputs embedded. One changed line in your analysis produces a diff spanning hundreds of lines of serialized output. A Marimo diff looks like a Python diff. That alone made my code review process meaningfully better.
The Honest Trade-offs
The ecosystem gap is real and I won’t sugarcoat it. If you’ve spent months building a Jupyter workflow with ipywidgets, custom nbextensions, or Voilà for dashboarding, you’re going to feel the friction immediately. Marimo has its own UI component system — sliders, dropdowns, tables — and they work well, but they don’t map 1:1 to ipywidgets. Some things I used regularly (interactive matplotlib backends, specific JupyterLab panels) either don’t exist yet or require rethinking how you build interactivity. This isn’t a dealbreaker but budget time for it.
When You Should Actually Stay on JupyterLab
- Large team already on nbconvert pipelines: If your organization runs Jupyter notebooks as part of automated report generation via nbconvert or Papermill, migrating to Marimo means rebuilding that infrastructure. The cost is real and the business case has to be there.
- Heavy ipywidgets dashboards: If you’ve shipped internal tools built on Voilà + ipywidgets, Marimo’s component system is a rewrite, not a migration.
- Institutional JupyterHub setups: If your data team lives inside a managed JupyterHub, switching one person to Marimo creates a split environment that causes its own headaches.
Use Marimo if you’re working solo or on a small team, you’ve been burned by reproducibility issues, and you want notebooks that behave like software rather than interactive scratchpads. The productivity gain from readable git history and reactive execution is genuine — but only if the migration cost is proportional to your team size and existing tooling investment.
2. Polars — I Moved Off Pandas for Anything Over 500MB
A .groupby().apply() on a 2GB CSV file. Pandas. Four minutes. I watched the memory climb past 14GB on a 16GB machine and genuinely wasn’t sure it would finish. I ran the same operation in Polars’ lazy API and it completed in 18 seconds — same laptop, same data, no config tweaks. That was the moment I stopped treating Polars as a curiosity and started treating pandas as the legacy option.
Installation is a non-event, which matters more than people admit. No Rust toolchain to install, no C compiler headaches, no fighting with conda environments:
pip install polars
Wheels ship prebuilt for Windows, Mac, and Linux. You get it in the same time it takes to install requests. Then you immediately want to use pl.scan_csv() instead of pl.read_csv() — this is the lazy API entry point and it changes everything about how you think about the pipeline:
import polars as pl
# Don't do this for large files
# df = pl.read_csv("big_file.csv")
# Do this instead — nothing executes yet
df = (
    pl.scan_csv("big_file.csv")
    .filter(pl.col("revenue") > 1000)
    .group_by("region")
    .agg(pl.col("revenue").sum())
    .collect()  # execution happens HERE
)
The query optimizer sees your full chain before touching the disk. It pushes filters down to the scan (predicate pushdown) and drops columns you never reference (projection pushdown), all automatically. For out-of-core processing on files that exceed your RAM, add this before your pipeline runs:
pl.Config.set_streaming_chunk_size(50_000)
That tells Polars to process 50,000 rows per chunk in streaming mode. Combine it with scan_csv() and you can process files larger than your available memory without spinning up a Spark cluster.
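A sketch of the full pattern, with one caveat: the streaming switch has moved around between Polars releases (the versions I’ve used take streaming=True on .collect(); newer ones expose an engine argument instead), so check the docs for whatever version you’ve pinned:
import polars as pl

pl.Config.set_streaming_chunk_size(50_000)

lazy = (
    pl.scan_csv("big_file.csv")
    .filter(pl.col("revenue") > 1000)
    .group_by("region")
    .agg(pl.col("revenue").sum())
)

print(lazy.explain())              # prints the optimized plan before anything touches disk
df = lazy.collect(streaming=True)  # request the streaming engine at collect time (version-dependent spelling)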
The API Is Not a Pandas Drop-In — Stop Trying to Map It
The thing that caught me off guard was assuming I could just swap method names. df['col'] does return a Series, so that part feels familiar. But the moment you reach for .loc[] or .assign(), you’re in the wrong mental model. The Polars way uses .filter() and .with_columns() chains:
# pandas way (don't port this directly)
df = df[df['sales'] > 500]
df = df.assign(margin=df['profit'] / df['revenue'])
# polars way
df = (
    df
    .filter(pl.col("sales") > 500)
    .with_columns(
        (pl.col("profit") / pl.col("revenue")).alias("margin")
    )
)
The chain approach is genuinely cleaner once it clicks, but there’s a real gotcha hiding in it: debugging a seven-step chain where something produces nulls in step four is painful. You can’t just print mid-chain without breaking it. My actual advice while you’re learning — use .collect() at each step and print the intermediate frames. Yes, it defeats the lazy optimization temporarily. That’s fine while you’re learning the shape of the data:
step1 = pl.scan_csv("data.csv").filter(pl.col("amount") > 0)
print(step1.collect().head()) # check here
step2 = step1.with_columns(pl.col("amount").log().alias("log_amount"))
print(step2.collect().head()) # check here too
Once you trust the chain, remove the intermediate .collect() calls and let the optimizer work. But fighting a black-box chain on day two of using Polars will just make you hate it.
When Pandas Still Wins
- GeoPandas pipelines: Polars has no native geometry dtype. If your workflow touches shapefiles, spatial joins, or coordinate projections, you’re on pandas. Full stop.
- Junior teams: The pandas mental model is what most data science courses teach. If your team is new to data engineering, forcing a Polars migration adds a learning curve that may not pay off on 200MB files that run fine in 30 seconds anyway.
- Existing codebases: If you have 40,000 lines of pandas code with heavy use of MultiIndex and custom .apply() functions, a rewrite will take longer than the runtime savings justify — unless those slowdowns are genuinely costing you.
The honest split in my own work: anything exploratory on a small dataset, I still open pandas because I know every method by reflex. Anything over 500MB going into a repeatable pipeline, I reach for Polars and don’t look back.
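When an analysis straddles that split, hopping between the two is one call in each direction; both conversions assume pandas and pyarrow are installed alongside Polars, and the filename here is a placeholder:
import polars as pl

pl_df = pl.read_csv("data.csv")
pd_df = pl_df.to_pandas()          # Polars -> pandas, e.g. to hand off to GeoPandas or an old notebook
pl_again = pl.from_pandas(pd_df)   # pandas -> Polars when the data outgrows comfort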
3. Ruff — Replaced Flake8, Black, and isort With One Binary
My pre-commit setup before Ruff looked like this: flake8 for linting, black for formatting, isort for import ordering, and pylint for deeper static analysis. On a medium-sized data science repo — roughly 15k lines of Python with a bunch of notebooks converted to scripts — that hook took 8 seconds every single commit. Eight seconds doesn’t sound catastrophic until you’re committing frequently during exploratory work and you start hitting --no-verify out of habit. That’s when your pre-commit setup has failed.
Ruff runs all of that — flake8 rules, black-compatible formatting, isort-equivalent import sorting — in under a second on the same codebase. It’s written in Rust, which is why. Install it and wire it up:
pip install ruff
Then drop this into your pyproject.toml:
[tool.ruff]
line-length = 88
[tool.ruff.lint]
select = ["E", "F", "I", "N"]
That config covers pycodestyle errors (E), pyflakes checks (F), isort-style import ordering (I), and PEP 8 naming conventions (N). The line-length = 88 matches Black’s default, so if you’re migrating a codebase that was already Black-formatted, nothing explodes on day one.
The --fix flag is the part most people sleep on. Run ruff check . --fix and it will actually rewrite your files — fix unused imports, reorder imports, fix whitespace violations. I was nervous about this the first time and ran it on a branch with git diff open. Every change it made was correct. For the rule categories it touches (I, E, some F), it’s mechanical and safe. It won’t touch your logic. Use it.
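To make the “mechanical and safe” claim concrete, here’s the kind of file it cleans up. The module is a made-up example; the rule codes in the comments are from the select list above:
# messy.py, before running: ruff check messy.py --fix
import sys                  # F401: imported but never used, --fix deletes the line
import pandas as pd
import os                   # F401 again, and the whole block trips I001 (unsorted imports)


def loadCSV(path):          # N802: function name should be lowercase (flagged, not auto-fixed)
    return pd.read_csv(path)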
Here’s something I’d recommend doing exactly once: run ruff check . --select ALL on your existing codebase. It will surface everything you’ve been quietly ignoring — missing docstrings, missing type annotations, complexity violations, security patterns. It’s humbling. Then selectively disable the stuff you’re not ready to fix yet:
[tool.ruff.lint]
select = ["E", "F", "I", "N"]
ignore = ["D100", "D101", "D102", "ANN001", "ANN201"]
D rules are docstring enforcement and ANN rules are annotation checks — both useful eventually, both brutal if you turn them on cold in an existing project. Disable them explicitly rather than leaving them unselected so you’re making a conscious choice, not an accidental one.
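For a sense of what those two families complain about, here’s a toy function before and after. The names are made up, and exactly which D and ANN codes fire depends on what you enable:
import polars as pl


def scale_rows(df, factor):    # ANN001 on each argument, ANN201 on the return type, D103 for the missing docstring
    return df.height * factor


def scale_rows_typed(df: pl.DataFrame, factor: float) -> float:
    """Scale the row count by a factor (toy example)."""
    return df.height * factor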
Honest limitation: pylint’s class design rules — things like detecting when a class has too many responsibilities, spotting attribute access patterns that suggest a design smell — don’t have Ruff equivalents yet. I still run pylint in CI on pull requests. I just don’t run it in pre-commit anymore because the speed penalty isn’t worth it for something that runs on every save. Ruff in pre-commit, pylint in CI is the split that’s worked for me. One more thing: stick to 0.4.x or above. Earlier versions had inconsistent rule coverage that made migrating from flake8 feel unreliable. The 0.4+ releases are stable enough that I’d trust them on a production data science codebase without double-checking every rule against the original tool.
4. DVC — Version Control for the Data and Models, Not Just the Code
The problem isn’t your code — it’s the 8GB file that disappeared
Here’s the situation that pushed me toward DVC: I did a git checkout to a branch from six weeks ago to reproduce a model result, and the training data it relied on was gone. Overwritten by a preprocessing script someone ran. No trace. The git history was perfect — every notebook commit, every config tweak — but the actual data was just missing. DVC exists to close exactly this gap. It version-controls pointers to your data, not the data itself, and stores the actual bytes in a remote (S3, GCS, Azure, or even a plain SSH server).
Getting it running
Install with your remote backend baked in:
pip install dvc[s3]
# or
pip install dvc[gcs]
pip install dvc[azure]
Then inside your existing git repo:
dvc init
dvc add data/raw/train.parquet
That second command does two things: it writes the file’s hash into data/raw/train.parquet.dvc, and it adds the actual parquet to .gitignore. You commit the .dvc file to git. The blob goes to your remote. Later, any teammate (or your future self on a new machine) runs dvc pull and gets the exact bytes back. This is the core loop and it genuinely works.
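Related, though not part of the core loop: when a script needs a single tracked artifact and a full dvc pull feels heavy, DVC’s Python API can fetch it directly. A minimal sketch, assuming the remote is already configured and the path matches what was dvc added:
import io

import dvc.api
import polars as pl

# Fetches the exact bytes the current commit's .dvc pointer refers to
raw = dvc.api.read("data/raw/train.parquet", mode="rb")
df = pl.read_parquet(io.BytesIO(raw))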
The pipeline config that makes DVC more than just large-file storage
Where DVC earns its keep beyond “S3 but tracked” is dvc.yaml. A real stage looks like this:
stages:
  featurize:
    cmd: python src/featurize.py
    deps:
      - data/raw/train.parquet
      - src/featurize.py
    outs:
      - data/processed/features.parquet
  train:
    cmd: python src/train.py
    deps:
      - data/processed/features.parquet
      - src/train.py
      - params.yaml
    outs:
      - models/classifier.pkl
    metrics:
      - metrics/eval.json
Run dvc repro and it hashes every dep. If featurize.py hasn’t changed and neither has the input parquet, it skips featurization entirely and jumps straight to training. This sounds obvious but it saves real time on pipelines with slow preprocessing steps — I’ve had runs where DVC skipped 40 minutes of feature work because only the model architecture file changed.
The experiment tracking pattern most people miss
The params.yaml + dvc exp run combo is genuinely underrated. Put your hyperparameters in params.yaml:
train:
  lr: 0.01
  epochs: 50
  batch_size: 128
Reference them from your training script via dvc.api.params_show() or just load the YAML directly. Then run experiments like this:
dvc exp run --set-param train.lr=0.001
dvc exp run --set-param train.lr=0.0001 --set-param train.batch_size=64
Each run gets its own entry in dvc exp show, tracked against the metrics you log. You get experiment branching without spinning up an MLflow server, without a separate database, without convincing your team to adopt another tool. For solo projects or small teams that don’t want the MLflow operational overhead, this covers 80% of what you actually need.
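On the script side, pulling those values in is a couple of lines; a sketch assuming params.yaml sits at the repo root as shown above:
import dvc.api

params = dvc.api.params_show()      # reads params.yaml (and any params declared in dvc.yaml)
lr = params["train"]["lr"]
epochs = params["train"]["epochs"]
batch_size = params["train"]["batch_size"]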
The remote config gotcha that bites every new team member
I’ve onboarded three people to DVC-backed repos and the same thing trips everyone up: credentials. The solution is to always use the --local flag for secrets:
dvc remote add myremote s3://your-bucket/dvc-cache
dvc remote modify --local myremote access_key_id YOUR_KEY
dvc remote modify --local myremote secret_access_key YOUR_SECRET
The --local flag writes to .dvc/config.local, which is gitignored by default. The base .dvc/config only stores the bucket URL — nothing sensitive. Document this explicitly in your project README with the exact commands. Without it, someone will either hardcode keys into .dvc/config and push them, or they’ll get cryptic S3 auth errors and spend an hour debugging. I’ve seen both happen.
Honest skip criteria
If your raw data never leaves a data warehouse — everything lives in Snowflake or BigQuery and your pipeline starts with a SQL query, not a local file — DVC adds friction without much return. You’re not versioning the warehouse tables, you’re versioning the query that hits them, and git handles that fine. DVC shines when you’re actually moving files around: downloading raw datasets, generating intermediate artifacts, serializing trained models to disk. If that’s not your workflow, skip it and don’t feel guilty about it.
5. uv — Package and Environment Management That Actually Works
I switched from conda to uv six months ago and I’m not going back
Conda was the default for years in data science, and I get why — it handles non-Python dependencies like CUDA libraries and manages environments reasonably well. But the solve times alone drove me out. A fresh conda create with a typical data science stack would sit at “Solving environment…” for 4-7 minutes. Add a channel conflict between conda-forge and defaults, and you’re debugging dependency hell for an afternoon. The base environment bloat is real too — a stock Anaconda install dumps 3GB on your disk before you’ve written a single line of code. Miniconda helps, but the solver problem doesn’t go away.
Plain venv + pip is the other option people reach for, and it’s fine until you need reproducibility. There’s no lockfile. pip freeze > requirements.txt is not a lockfile — it captures your current state but doesn’t encode a full dependency graph with hashes. On a fresh CI runner, installing 40 packages from a requirements.txt takes 2-3 minutes because pip hits PyPI fresh every time and resolves dependencies on the fly. That’s not a minor inconvenience; that compounds across every PR.
Installing uv takes one command and produces a single binary with zero Python dependency:
curl -LsSf https://astral.sh/uv/install.sh | sh
That’s it. No bootstrapping pip, no worrying about which Python version is active. Starting a new data science project looks like this:
uv init myproject
cd myproject
uv add pandas polars marimo dvc ruff
This generates a pyproject.toml and a uv.lock file. The lockfile is the key piece — it’s a full dependency graph with exact versions and hashes, designed to be committed to version control. Cross-platform reproducibility is handled. Every teammate, every CI runner, gets the same environment.
The command I run every morning when pulling from a shared repo is just:
uv sync
That’s it. It reads the lockfile, creates a .venv if it doesn’t exist, and installs exactly the pinned versions. If nothing changed in the lockfile since yesterday, it’s near-instant because uv caches wheels aggressively. On a 40-package data science environment (pandas, polars, scikit-learn, matplotlib, seaborn, dvc, ruff, and friends), I timed pip install from scratch at around 90 seconds and uv sync from the lockfile at around 8 seconds. Check your own numbers — YMMV depending on your hardware and network — but the order-of-magnitude difference held across every machine I’ve tested.
Honest gotcha before you commit fully: packages with unusual C extensions can occasionally fail uv’s wheel resolution. numpy and scipy are fine, they have wheels for everything. But niche geospatial packages — some builds of pyogrio, certain fiona versions — sometimes choke. The fallback is:
uv add fiona --no-binary-package fiona
This forces a source build, which requires your system to have the relevant C libraries. It’s the same problem you’d hit with pip anyway, so it’s not really a regression — but don’t expect magic on packages that were already painful. For the 95% of a standard data science stack, uv just works, and works fast enough that I’ve stopped thinking about environment management as a chore.
When to Pick What — Match the Tool to the Situation
Solo Work vs. Team Projects vs. Teaching — The Actual Decision Tree
The mistake I see constantly is people picking tools based on what’s trending on Twitter instead of matching the tool to the actual situation. I’ve done it myself — spent two days configuring a full DVC pipeline for a dataset I was the only person touching. Don’t do that. Here’s how I actually think through the choice.
Solo Exploration on Small Datasets
Your full stack is Marimo + Polars (eager mode) + uv and you genuinely don’t need anything else. Here’s the setup that takes under five minutes:
curl -LsSf https://astral.sh/uv/install.sh | sh
uv init my-analysis
cd my-analysis
uv add marimo polars
uv run marimo edit analysis.py
That’s it. You have a reactive notebook, a fast DataFrame library, and a reproducible environment — all without touching conda, pip, or virtualenv manually. The thing that caught me off guard with Polars in eager mode is how readable the errors are compared to pandas. When you chain .filter() on a wrong column type, it tells you exactly what went wrong and where. In pandas you’re often reading a stack trace that starts three abstraction layers deep.
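A trivial example of what I mean; the dataframe and the typo are made up, but the failure mode is representative:
import polars as pl

df = pl.DataFrame({"sales": [120, 340, 95]})

# Misspelled column: Polars raises a ColumnNotFoundError that names "slaes" directly,
# instead of a KeyError buried several frames deep in indexing machinery
df.filter(pl.col("slaes") > 100)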
Marimo’s killer feature for solo work is that your notebook is a valid .py file. You can run it as a script with uv run analysis.py and it executes top-to-bottom without cell ordering headaches. No more “works on my machine in cell order 3-1-5-2.”
Team Projects with Shared Data
The moment a second person touches the repo, you need DVC. Add it the same day, not after someone asks “which version of the dataset did you train on?” and nobody can answer.
uv add dvc dvc-s3 # or dvc-gs for GCS
dvc init
dvc remote add -d myremote s3://your-bucket/dvc-store
dvc add data/raw/customers.parquet
git add data/raw/customers.parquet.dvc .gitignore
git commit -m "track raw dataset with dvc"
dvc push
The .dvc files go into git. The actual data goes to S3 or GCS. Your teammates run dvc pull and get the exact binary you used. DVC itself is open source, and with a self-managed remote you pay only for the cloud storage you already have. The gotcha: DVC doesn’t handle very large numbers of small files well (thousands of CSVs in one dvc add call gets slow). If that’s your situation, zip them first or track the directory as a single unit with dvc add data/raw/.
Any Shared Codebase: Ruff in Pre-commit, Day One
I cannot stress this enough — add Ruff to pre-commit hooks before you write your second commit, not after six months of style drift where half the team uses double quotes and half uses single quotes and nobody agrees on line length. Retrofitting a linter onto an existing codebase means one massive noisy PR that touches every file and breaks every open branch. Add it early and the cost is near zero.
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.4
    hooks:
      - id: ruff
        args: [--fix]
      - id: ruff-format
uv add --dev pre-commit
pre-commit install
Ruff replaces flake8, isort, pyupgrade, and black in a single binary. It runs on a medium-sized data science repo in under a second. The honest trade-off: Ruff’s autofix occasionally makes a change that surprises you (especially with --fix on import sorting), so review the diff before committing the first time you run it on an existing codebase.
Teaching or Onboarding Juniors
Use JupyterLab. I know Marimo is better for production work, but when someone is learning Python data science for the first time, you want the tool that has ten years of Stack Overflow answers, tutorials, and course materials behind it. The Polars learning curve can wait — teach pandas first, or teach concepts with pandas and let them migrate later. What you should enforce from day one is uv for environment management.
uv init student-project
uv add jupyterlab pandas matplotlib
uv run jupyter lab
This gets them out of the “it works on my laptop but not yours” trap immediately, without them needing to understand why. Once the habit is formed — uv add instead of pip install, uv run to launch things — moving to more advanced tooling later is easy because the mental model is already right.
Notebook Environment Comparison
| Criterion | Marimo | JupyterLab | VS Code Notebooks |
|---|---|---|---|
| Reproducibility | Enforced by design — cells are pure functions, no hidden state | Depends entirely on you running cells in order | Same problem as Jupyter — kernel state persists |
| Git-friendliness | Plain .py files — diffs are clean, PRs are readable | JSON .ipynb files — diffs are a nightmare without nbstripout | Also .ipynb — same diff problem, same workaround needed |
| Extension Ecosystem | Young — built-in UI components are solid, third-party sparse | Mature — extensions for everything including LSP, git, SQL | Best in class — full VS Code extension library available |
| Learning Curve | Low UI friction, but reactive model trips up pandas users used to mutation | Near zero — every data science tutorial uses it | Low if you already use VS Code; adds cognitive load if you don’t |
| My Pick For | Solo analysis, anything that ships to production | Teaching, quick exploration, legacy teams | Teams already on VS Code who want everything in one window |
One honest note on VS Code Notebooks: the IntelliSense and debugger integration are genuinely better than JupyterLab’s. If your team is already living in VS Code for everything else, the notebook experience there is good enough that switching to a dedicated tool adds friction with minimal upside — unless reproducibility is a hard requirement, in which case Marimo is the only real answer on this list.
What I Dropped and Why
MLflow was the first to go. I set it up properly — tracking server, artifact store backed by S3, Postgres for the backend — and then spent three weeks maintaining the infra instead of doing ML work. The model registry is genuinely great, but DVC experiments cover the comparison and reproducibility side without requiring me to run a server. The command I actually use now is dvc exp run --set-param train.lr=0.01 and dvc exp show to compare runs in the terminal. No server, no migrations, no “why is my tracking URI wrong in the container” debugging at 11pm. If you’re on a team that already has MLflow deployed and maintained by someone else, keep it. If you’re the one who’d have to maintain it, skip it.
Poetry was my dependency manager from 2021 through early 2024. I switched to uv after clocking a poetry lock on a moderately complex project at around 45 seconds. The same resolution with uv lock runs in under 3 seconds on identical hardware. That’s not a benchmark I made up — run it yourself on any project with transitive ML dependencies. The gotcha I hit with uv early on: it doesn’t automatically read your pyproject.toml scripts block the same way Poetry does, so if you’ve got custom poetry run shortcuts defined, you’ll need to map those to uv run explicitly or define them differently. The migration is worth it. The speed difference is embarrassing for Poetry.
# What replacing a poetry workflow looks like
# Before:
poetry add torch --group dev
poetry run python train.py
# After:
uv add torch --dev
uv run python train.py
# Lock file generation that doesn't make you wait:
uv lock # ~2-3s vs poetry lock's 30-60s on ML projects
Pylance didn’t get dropped — I still have it enabled for autocomplete and type inference in VS Code, and it’s genuinely good at that job. But I stopped relying on it as a linter. Ruff caught real problems Pylance’s linting passed over: unused imports in __init__.py files, shadowed builtins I’d been ignoring, and some B-category flake8 rules around mutable default arguments that had caused actual bugs. The config is minimal:
# pyproject.toml
[tool.ruff]
line-length = 88
[tool.ruff.lint]
select = ["E", "F", "B", "I"]
[tool.ruff.lint.per-file-ignores]
"notebooks/*.py" = ["E402", "F401"]
That notebooks ignore rule is non-obvious but essential — if you export notebooks to .py files, Ruff will flag every cell-order import as an error without it.
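And since the mutable-default-argument bugs got a mention above: that’s flake8-bugbear’s B006, part of the “B” family in that select list. The function here is a made-up illustration:
def add_row(row, cache=[]):    # B006: the default list is created once and shared by every call
    cache.append(row)
    return cache


add_row("a")    # ["a"]
add_row("b")    # ["a", "b"], state from the first call leaked into the second


def add_row_fixed(row, cache=None):
    if cache is None:
        cache = []
    cache.append(row)
    return cache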
Conda I genuinely haven’t touched since 2022. The reason I stayed on it so long was CUDA — getting the right cudatoolkit version paired with the right PyTorch build used to require Conda or manual wheel hunting. uv changed this with explicit index configuration. You point it at PyTorch’s own index for the CUDA build you need, and it handles the rest:
# pyproject.toml
[tool.uv.sources]
torch = { index = "pytorch-cu121" }
[[tool.uv.index]]
name = "pytorch-cu121"
url = "https://download.pytorch.org/whl/cu121"
explicit = true
The explicit = true flag is the part people miss — without it, uv will check that index for every package, not just torch, which slows resolution and occasionally pulls the wrong build. With it set correctly, you get CUDA-enabled PyTorch in a plain virtual environment with no Conda layer anywhere in the stack. The environment.yml files and conda activate muscle memory took about two weeks to stop reaching for.
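A quick sanity check I run after the first uv sync with that config, just to confirm the CUDA build actually landed (the +cu121 suffix depends on which index you pointed at):
import torch

print(torch.__version__)           # expect something like 2.x.x+cu121 from that index
print(torch.version.cuda)          # the CUDA toolkit version the wheel was built against
print(torch.cuda.is_available())   # True only if a compatible driver and GPU are present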