Foundations of Computer Vision (2024)

https://visionbook.mit.edu

253•tzury•7mo ago

Comments

pantulis•7mo ago

There is a very interesting section in the book, "On Research, Writing and Speaking", which includes gems like:

“This sounds like hard work.” Yes. It’s no longer about being smart. By now, everyone around you is smart. In graduate school, it’s the hard workers who pull ahead.

bonoboTP•7mo ago

That's definitely insightful. Everyone reaches a level where coasting on smarts is no longer sufficient.

Many reach this realization when starting university, but some can still coast okay in college since the material to learn is well defined and upper bounded. A PhD is not really upper bounded. There's no set number of papers to read per week like in a college course. There's no "this won't be part of the exam". Anything is fair game. The returns on being smarter never flatten out; there's simply no ceiling. You can always do more, read more to keep up with the literature firehose, improve your experiments, your method, etc.

You also need soft skills and a network. You need to keep your finger on the pulse of the community by going to conferences and getting to know people, grabbing coffee or going out to dinner with them. You also need to be self-driven instead of waiting for instructions as you did in college. You need to be just the right amount of skeptical and critical regarding existing methods to be able to come up with new things, while also being understood, accepted, and seen as relevant and exciting by the community.

You also need to manage your time, set your own deadlines, and maintain a routine without the external sync given by university lectures and exams. All this basically has no upper limit and even the expectations are vaguely defined. You face rejections, maybe for the first time, despite having done thorough work, because the reviewers don't see enough novelty or it doesn't slot neatly into what is in fashion at the moment.

My point is that a PhD can push everyone to meet their mental limits. It can be frustrating and it's a notoriously hard period of time for many PhD students. Of course if your only goal is to graduate to get the doctorate, there are possible strategies to "coast", but those who go for the academic path often expect to achieve more than the bare minimum, especially if they managed to coast with good results in college.

VladVladikoff•7mo ago

In third year of undergrad it felt like I couldn’t even keep up with the class despite my hard work. Granted, this was an engineering program which had an average entrance mark from high school of 90%, and had 75% of the students drop out by 2nd year because it was so hard.

bonoboTP•7mo ago

The capabilities needed for a PhD are quite different from regular engineering college though. You can mostly pull through college using cognitive horsepower and hard work directed at obvious and explicitly defined things. Grad school (PhD) is much more about nebulous skills (in addition to hard work and smartness), taste, "reading the room" of the research community, networking, acting according to unwritten and unspoken conventions and "just knowing", branding, self-marketing, confidence etc.

jb3689•7mo ago

I really wish someone had shared something like this with me in graduate school. It took me too long to learn how to be a successful grad student. In fact, I honestly didn't even learn it until I was done with school.

bonoboTP•7mo ago

For anyone beginning a PhD for some reason (there can be some good reasons; there are many bad reasons), this will save so much time: https://maxwellforbes.com/posts/your-paper-is-an-ad/

la_fayette•7mo ago

Unbelievable that this book is freely available! Thanks to the authors, publishers or whoever.

bonoboTP•7mo ago

The machine learning, computer vision and robotics communities are really great at publishing their books online for free access. You can get the absolute top textbooks of these fields for free online. Quite a contrast to other fields where profs kinda require you to buy the latest edition for hundreds of dollars in the US. Not to mention that this gives everyone around the world, including in poorer countries, access to the best resources. Many also share their course materials and videos online.

walterlw•7mo ago

Very true and joining in on the thanks. Did you find a way to download it as a pdf though? I believe it is essential to be able to add notes and references when reading any learning material.

AdieuToLogic•7mo ago

Another great book in this field is:

  Computer Vision, Fifth Edition
  E.R. Davies
  Academic Press
  ISBN-13  978-0128092842

bonoboTP•7mo ago

The other main one is Szeliski's Computer Vision 2nd Ed from 2022 https://szeliski.org/Book/

Forsyth & Ponce is also good but somewhat old by now. And for 3d, the classic is still Hartley & Zisserman's Multiple View Geometry.

vincenthwt•7mo ago

Can anyone recommend a good book on Machine Vision? I believe the foundation of effective machine vision, and even computer vision, lies in selecting the right camera, optics, and lighting. High-quality images are essential because poor input leads to poor output.

ack_inc•7mo ago

Hi, could you mention a use-case or two where these things made a real difference?

jeffreygoesto•7mo ago

Any serious production inspection.

bonoboTP•7mo ago

The term "machine vision" is mainly used in highly controlled, narrow industrial applications, think factory assembly lines, steel inspection, monitoring for cracks in materials, shape or size classification of items, etc. The task is usually very well defined, and the same thing needs to be repeated under essentially the same conditions over and over again with high reliability.

But many other things exist outside the "glue some GPT4o vision api stuff together for a mobile app to pitch to VCs" space. Like inspecting and servicing airplanes (Airbus has vision engineers who make tools for internal use, you don't have datasets of a billion images for that). There are also things like 3D motion capture of animals, such as mice or even insects like flies, which requires very precise calibration and proper optical setups. Or estimating the meat yield of pigs and cows on farms from multi-view images combined with weight measurements. There are medical things, like cell counting, 3D reconstruction of facial geometry for plastic surgery, dentistry applications, and a million other things other than chatting with ChatGPT about images or classifying cats vs dogs or drawing bounding boxes of people in a smartphone video.

richard___•7mo ago

Your disdain for LLMs is unfounded.

bonoboTP•7mo ago

I use LLMs daily for coding. They are great. They are not a replacement for reading a book like the one linked here, or understanding image formation, lenses etc. Many people seem to imagine that all this stuff is now obsolete and all you need to do is wire up some standard APIs, ask an LLM to glue the JSON and that's all there is to being a computer vision engineer nowadays. Maybe even pros will self-denigratingly say that, but after a bit of chatting it will be obvious they have plenty of background knowledge beyond prompting vision language models.

So it's not disdain, I'm simply trying to broaden the horizon for those who only know about computer vision from OpenAI announcements, tech news and FOMO social media influencers.

vincenthwt•7mo ago

Thank you for your thoughtful comment! I completely agree.

It’s great to see someone emphasize the importance of mastering the fundamentals—like calibration, optics, and lighting—rather than just chasing trendy topics like LLM or deep learning. Your examples are a great reminder of the depth and diversity in machine vision.

bonoboTP•7mo ago

Thanks for the LLM response. Not sure if you meant to be clever here.

vincenthwt•7mo ago

Your clever remark highlights poor emotional intelligence and weak communication skills. Sarcasm might have its place in casual conversation, but in professional discussions, it signals insecurity and a lack of respect—neither of which contribute to meaningful dialogue.

Your disdain for LLMs is equally puzzling. Are you seriously suggesting I shouldn’t use tools to improve my grammar and delivery simply because they don’t align with your engineering view? Ironically, LLM-based tools likely support your own work—whether through coding assistance, debugging, or other tasks—even if you choose not to acknowledge it.

By the way, I used an LLM to craft this reply too—who doesn’t?

bonoboTP•7mo ago

Most don't use LLMs, and I'm telling you, many people are going to be pissed if they figure out that you're writing to them through LLMs. Maybe you find this reaction strange, but it's at least good to know in advance and not be surprised.

vincenthwt•7mo ago

You claim that 'most people' will be upset—are you their appointed spokesperson, or is this just your personal assumption? What I find strange is that I complimented and thanked you for your thoughts on machine vision, yet you responded with hostility. Is this how you communicate in real life too?

If 'most people' are upset about others using LLMs to improve their written communication, maybe they should reflect on why they hold such outdated views—or consider that the person replying might not be a native English speaker. Are platforms like Hacker News meant only for native English speakers?

Warning: The statement above was written by an LLM, so don’t be surprised—I’m letting you know in advance.

vincenthwt•7mo ago

Here are two examples where the right camera, optics, and lighting make a huge difference:

Semiconductor Wafer Inspection: Detecting tiny defects like scratches or edge chips requires high-resolution cameras, precision optics, and specific lighting (e.g., darkfield) to highlight defects on reflective surfaces. Poor choices here can easily miss critical flaws.

Food Packaging Quality Control: Ensuring labels, seals, and packaging are error-free relies on the right camera and lighting. For instance, polarized lighting reduces glare on shiny surfaces, helping detect issues that might otherwise go unnoticed.

hananova•7mo ago

The "Writing this book" section accidentally implies that LLMs were used for two thirds of the manuscript.

I think they probably mean that LLMs just gave them a lot more to write about, but I think it would be a good idea to clarify.

oytis•7mo ago

I am not reading it like this - in fact ChatGPT was the first thing out there that would be able to assist them in writing, and less than a third of this book was written after release of ChatGPT. To me it just looks like marking important events in ML/AI field on the graph.

oytis•7mo ago

Can someone working in the field comment on how relevant the content still is? A lot of ML including CV seems (from the outside at least) to be completely disrupted by the developments of the last two years.

bonoboTP•7mo ago

Very relevant. None of the recent techniques are truly revolutionary. It's all based on these same foundations. I'd say it would be good to read even older books. There are lots of real, profitable computer vision applications built on classic methods like Hough transforms, Canny edges, SIFT, Harris corners, etc. You should be familiar with these if you want to come across as a serious professional as opposed to a hype boy vibe coder who can just rattle off buzzwords and glue APIs without fundamental understanding.
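
To give a rough feel for how little machinery these classic methods need, here is a minimal sketch of Sobel gradient-magnitude edge detection in plain Python. This is only the gradient step that detectors like Canny build on, not Canny itself (no smoothing, non-maximum suppression or hysteresis). The function names, the toy image and the threshold value are all made up for illustration.

```python
import math

# 3x3 Sobel kernels for horizontal (KX) and vertical (KY) intensity change
KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img):
    """Gradient magnitude of a 2D list of numbers; border pixels stay 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    p = img[y + dy - 1][x + dx - 1]
                    gx += KX[dy][dx] * p
                    gy += KY[dy][dx] * p
            out[y][x] = math.hypot(gx, gy)
    return out


def edge_mask(img, threshold):
    """Mark pixels whose gradient magnitude reaches the (hypothetical) threshold."""
    return [[1 if m >= threshold else 0 for m in row]
            for row in sobel_magnitude(img)]


# Toy 5x6 image: dark left half, bright right half, so one vertical edge
image = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
mask = edge_mask(image, threshold=500)
for row in mask:
    print(row)
```

On this toy image only the two interior columns straddling the dark-to-bright step get marked. In practice you would use a library implementation, but the point stands that this runs fine on a CPU with no learned weights.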

walterlw•7mo ago

There are still a lot of problems to be solved using "classical" computer vision, especially in systems where you don't have easy access to GPU acceleration. I am a practitioner doing simultaneous localization and mapping (SLAM) on compute-restricted platforms, so I'm definitely going to read the Structure from Motion chapter.

Greamy•7mo ago

It still is super relevant. Most computer vision done outside academia is still based on older stuff, or classical computer vision algorithms. You don't really get many chances to use the latest models and techniques because, more often than not, they are not that relevant, or are only for extremely specific cases, or you just don't need something that complex.

aanet•7mo ago

Thanks for posting this.

Is there a Computer Vision course based on this book? Any videos, etc? Thanks!
