HACKER Q&A
📣 zachrip

What problem are you close to solving and how can we help?


Please don't list things that just need more bodies - I'm specifically looking for intellectual blockers that can be answered in this thread.


  👤 jordwest Accepted Answer ✓
I want to bring back old school distributed forum communities but modernise them in a way that respects attention and isn’t a notification factory.

Mastodon is a pretty inspirational project, but the Twitter influence shows. I miss the long-form writing that was encouraged before our attention spans were eroded.

Not at all close to solving it, but it’s been on my mind for a long time. Would love to hear if there are others like me out there and what you imagine such a community to look like.


👤 davehcker
I have posted this here before: hexafarms.com. I am trying to use ML to discover the optimal phenotype for growing plants in vertical indoor farms, in order to (a) get the highest-quality produce and (b) lower the cost of producing leafy greens, medicinal plants, etc. within cities themselves.

Basically, every leafy green (and herb, and even mushroom) can grow in a range of climatic conditions (phenotype, roughly), i.e. temperature, humidity, water, CO2 level, pH, light (spectrum, duration and intensity), etc. As you might have seen, there is a rise in indoor vertical farms around the world, but the truth is that 50% of them are not even profitable. My startup wants to discover the optimal parameters for each plant grown in our indoor vertical farm, and eventually I would let our AI system control everything (something like AlphaGo, but for growing plant X: lettuce, kale, chard, ...). Think of it as reinforcement learning with live plants! I am betting that our startup will discover these 'plant recipes' and figure out the optimal parameters for the produce we grow. The goal is that cities can then grow food more cheaply, securely and sustainably than with the current 'outsourced' approach of farming in the countryside or far-away lands.

So now I have secured some funding to be able to start working on optimizations, but I realized that *hardware* startups are such a different kind of beast (I am a good software product dev, though, I think). Honestly, if anyone with experience in hardware-related startups (or in the kind of venture I am in) wants to meet me and advise me, I would take it any day. As the one running the whole show, it's hard for me to handle market segmentation, tech dev, the team, the next round of funding, the European tech landscape, etc. I foresee so many ways our decisions could kill my startup; all I need is advice from someone qualified/experienced enough. My email: david[at]hexafarms.com


👤 Rodeoclash
I'm not sure if this is in the spirit of the thread, but I've been working on a way to allow reviews of gameplay in video games. In short, you upload a video of yourself playing the game, and someone who's an expert can review it.

I currently have a UI with the comments down the side of the screen which looks like this:

https://www.volt.school/videos/c980297a-417b-416f-947b-58a70...

This is good because you can easily:

- See all the comments
- Navigate between them
- See replies, etc.

However, it has a huge problem: you have to balance watching the video with reading the comments.

I also have an alternative UI I've been working on which only shows one comment at a time:

https://www.volt.school/videos-v2/c980297a-417b-416f-947b-58...

However, the downside of this is that you can't see all the comments at once. I'm not a UI/UX designer AT ALL, so I'd really appreciate some pointers on how to think about making this better! The original post says "close to solving"; I think I am pretty close, but it's still not quite right, and while I'm not out of ideas yet, I'd appreciate feedback if the solution is obvious to someone else.


👤 peripitea
Don't have anything to add right now but I like the idea of this thread and would support it becoming a regular thing.

👤 k1rcher
We are having atrocious READ/WRITE latency with our PG database (the API layer is Django REST Framework). The problem table consists of multiple JSON blob fields holding quite a bit of data; I am convinced these need to be broken out into their own relational tables. Is this a sound solution? I believe it is the deserialization of the large nested JSON blobs in these fields that is causing the latency.

Note: this database architecture was created by a contractor. There is no indexing and there are no relations in the current schema, just a single “Videos” table with all metadata stored as Postgres JSON-type blobs.

EDIT: rebuilding the schema from the ground up with 5-6 GB of data in the production database (not much, but still at the production level) is a hard sell, but I think it is necessary, as we will be scaling enormously very soon. When I say rebuild, I mean a proper relational table layout with indexing, FKs, etc.

EDIT2: to further comment on the current table architecture, we have 3-4 other tables with minimal fields (3-4 boolean/char fields) that are relationally linked back to the Videos table via a char field ‘video_id’, which is unique on the Videos table. Again, not a proper foreign key, so no indexing.
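
For what it's worth, here is a minimal sketch of the kind of normalized layout I mean, as Django models (field names are made up; ideally the real metadata keys become typed columns):

  # Sketch only: hypothetical Django models for a normalized layout.
  # The point is real columns, a real foreign key, and indexes,
  # instead of one JSON blob per video.
  from django.db import models

  class Video(models.Model):
      video_id = models.CharField(max_length=64, unique=True)  # unique => indexed
      title = models.CharField(max_length=255)
      created_at = models.DateTimeField(auto_now_add=True)

  class VideoAttribute(models.Model):
      # One row per metadata entry instead of a nested blob.
      video = models.ForeignKey(Video, on_delete=models.CASCADE,
                                related_name="attributes")
      key = models.CharField(max_length=128, db_index=True)
      value = models.TextField()

In the meantime, if the columns are the plain json type, switching them to jsonb and selecting only the keys each endpoint needs (rather than whole blobs) may buy some time, since jsonb supports GIN indexing.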


👤 yosito
This may already be solved, but one of the last pieces remaining in my quest to be Google-free is an interoperable way to sync map bookmarks (and routes, etc.) between different open-source mapping apps. I can manually import/export KMZ files from Organic Maps and OsmAnd and store them in a directory synced between devices with Nextcloud, but there's no automatic way to keep them updated in the apps, and so far I haven't found a great desktop app for managing them either. The holy grail would be to also have them sync in the background to my Garmin Fenix, but I'm not aware of a way to sync POIs to a Garmin watch in the background.

Related: I'd love to have an Android app with a shortcut that allows me to quickly translate Google Maps links into coordinates, OSM links or other map links. There is a browser extension that does this on desktop, so if anyone is looking for a low-hanging-fruit idea for an Android app, this might be a fun one (if I don't get around to it first).
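
The core of the translation is tiny, for what it's worth; a toy sketch (the regex only handles the common @lat,lng URL shape, and the output formats are just examples):

  # Toy sketch: pull lat/lon out of a Google Maps URL and emit
  # equivalent links. Real URLs come in more shapes than this regex.
  import re

  def translate(url: str):
      m = re.search(r"@(-?\d+\.\d+),(-?\d+\.\d+)", url)
      if not m:
          return None
      lat, lon = m.groups()
      return {
          "coords": f"{lat},{lon}",
          "geo_uri": f"geo:{lat},{lon}",
          "osm": f"https://www.openstreetmap.org/?mlat={lat}&mlon={lon}",
      }

  print(translate("https://www.google.com/maps/@52.5200,13.4050,15z"))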


👤 stavros
I built a community that aims to keep FOSS projects alive. It's meant to solve the chicken-and-egg problem by having as many people and projects as possible sign up, so that any interested developer can automatically get commit permissions to any project.

It's called Code Shelter:

https://www.codeshelter.co/

It's been stalled for a while, so I don't know how viable it is, but I'd appreciate any help.


👤 palmtree3000
JSON diffing.

I haven't found any implementations I'd consider good. The problem as I see it is that there are tree-based algorithms like https://webspace.science.uu.nl/~swier004/publications/2019-i... and array-based algorithms (like, well, text diffing), but nothing good that does both. The tree-based approach struggles because there's no obvious way to choose the "good" tree encoding of an array.

I've currently settled on flattening into an array containing tokens like String(...) or ArrayStart, and using an array-based diffing algorithm on those, but it seems like one could do better.
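
For concreteness, a toy version of that flattening, with difflib standing in for the array differ (the token scheme here is made up on the spot):

  # Toy sketch: flatten JSON into a token array, then diff the arrays.
  import difflib

  def flatten(value):
      # Yield hashable tokens for a JSON-like value.
      if isinstance(value, dict):
          yield ("ObjectStart",)
          for key in sorted(value):
              yield ("Key", key)
              yield from flatten(value[key])
          yield ("ObjectEnd",)
      elif isinstance(value, list):
          yield ("ArrayStart",)
          for item in value:
              yield from flatten(item)
          yield ("ArrayEnd",)
      else:
          yield ("Scalar", repr(value))

  a = list(flatten({"users": [{"name": "ann"}, {"name": "bob"}]}))
  b = list(flatten({"users": [{"name": "bob"}]}))
  for op in difflib.SequenceMatcher(a=a, b=b).get_opcodes():
      print(op)  # ('equal'/'delete'/..., i1, i2, j1, j2) spans over tokens

The known wart shows up immediately: a diff over tokens can produce spans that don't correspond to any well-formed subtree, which is exactly the tree-vs-array tension described above.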


👤 dzjin
I want to improve parts of online professional networking, specifically to be more about self-mentoring/shared learning, as opposed to sales connections.

This is ever more important with the rise of remote hiring and remote work, and the isolation/depersonalization they bring to newcomers to the industry.

There's also an "evil" momentum in remote hiring -- some companies _need_ asynchronous interviews to support their scaling and operations, and the general perception is that it's impersonal and dehumanizing.

This made me think that if we preemptively answered interview questions publicly, it'd empower job seekers to have a better profile and push back against a dehumanizing step, while allowing non-job-seekers to share the lessons that were important to them.

I've been getting decent feedback on my attempt at a solution, HumblePage (https://humble.page), but the reality is that there's a mental hurdle to putting your honest thoughts out there.


👤 kroltan
Economically sustainable and ethical monetization of user-generated-content games.

The closest well-known example of this kind of game nowadays is Roblox, but I'm thinking of things more like Mario Maker, or the older generation of Atmosphir/GameGlobe-likes.

Unlike "modding platforms" or simulators/sandboxes/platforms such as Flight Simulator, VRChat or ARMA, these games' content are created by regular players with no technical skill, which means the game needs to provide both the raw assets from which to build the content, as well as the tool to build that content.

Previous titles tried the premium model (Mario Maker), user-created microtransactions (Roblox) and plain old freemium (Atmosphir and GameGlobe).

I suspect Mario Maker only works because of the immense weight and presence of the franchise.

Roblox's user-created microtransactions (in addition to first-party ones) seem to be working, but they create strange incentives for creators, which I personally feel taints a lot of the games on the platform. (The user-generated content basically tends to become the equivalent of shovelware.)

GameGlobe failed miserably by applying the microtransaction model to creator assets, which meant that to make cool content, creators had to pay as well as spend lots of their time actually building the thing. As a result, most levels actually published ended up using the same default bunch of free assets and limited mechanics.

Atmosphir is a bit closer to me, so I see more nuance in its demise, but long story short: they restricted microtransactions to player customization only, and it didn't seem to be enough to cover the cost of developing the whole game/toolset. They eventually added microtransactions to unlock some player mechanics, which meant that some levels were not functional without owning a specific item.

---

In short, the only things such a game can effectively monetize are the game itself (premium model) and purely cosmetic content for players. Therefore, to make cosmetics worth buying, the game needs to be primarily multiplayer, which implies a lot more investment in creator tooling UX as well as in the infrastructure itself. But this also restricts the possibilities for the base game somewhat.


👤 chubot
These are statistics/math problems that two medical professionals I'm seeing are working on, not my own work, but they got me curious. FWIW, I worked in "data science" as a software engineer for many years and did some machine learning, so I have some adjacent experience, but I could use help conceptualizing the problems.

Does anyone know of any books or surveys about statistics and medicine, or specifically the mechanics of the human body?

- One therapist takes measurements of, say, your arm motion and makes inferences about the motion of other muscles. He does this very intuitively but wants to encode the knowledge in software.

- The other one has an oral appliance that has a few parameters that need to be adjusted for different geometries of the mouth and airway.

The problems aren't that well posed, which is why I'm looking for pointers to related materials rather than specific solutions (although any ideas are welcome). I appreciate replies here, and my e-mail is in my profile. I asked a former colleague with a Ph.D. in biostats and he didn't really know. Although I guess biostats is often specifically related to genetics? Or epidemiology?

I guess the other thing this makes me think of is software for Invisalign or LASIK, etc. In fact, I remember that a former co-worker 10 years ago had actually worked on LASIK control software. If anyone has pointers to knowledge in these areas, I'm interested.


👤 udev
I am blocked on finding a good (defined below) way to determine whether product description A and product description B refer to the same product.

Imagine that a product description is an n-dimensional vector like:

  ( manufacturerName, modelName, width, height, length, color, ...)
Now imagine you have a file with m such vectors (where m is in the millions), and that not all fields in the vectors contain reliable info (typos, missing info, plain wrong values, etc.).

What is a good way to determine which product descriptions refer to the same product?

Is this even a good approach? What is the state of the art? Are there simpler ways?

Here is what I mean by good:

  - robust to typos, missing info, wrong info
  - efficient since both m and n are large
  - updateable (e.g. if classification has been done and 10k new descriptions are added, how to efficiently update and avoid full recomputation)
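
The best lead I've found so far is that this problem is usually called record linkage or entity resolution, and that blocking (only comparing records that share a cheap key) is the standard answer to the efficiency and update requirements. A stdlib-only sketch of the shape, with field names and the threshold invented:

  # Sketch: blocking + pairwise fuzzy matching, stdlib only. Real
  # systems use trained similarity models or embeddings + ANN search.
  import difflib
  from collections import defaultdict

  def blocking_key(rec):
      # Cheap key so we only compare records that could plausibly
      # match; robust variants use phonetic codes or several keys.
      return (rec["manufacturerName"][:4].lower(), rec["modelName"][:3].lower())

  def similarity(a, b):
      fields = ["manufacturerName", "modelName", "color"]
      scores = [difflib.SequenceMatcher(None, a[f], b[f]).ratio() for f in fields]
      return sum(scores) / len(scores)

  def candidate_pairs(records, threshold=0.85):
      blocks = defaultdict(list)
      for rec in records:
          blocks[blocking_key(rec)].append(rec)
      for block in blocks.values():
          for i, a in enumerate(block):
              for b in block[i + 1:]:
                  if similarity(a, b) >= threshold:
                      yield a, b

New descriptions would then only be compared within their own block, which keeps updates incremental instead of requiring full recomputation.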

👤 alecmgo
I want to make technical recruiters better at their job.

Many sourcers and recruiters don't have a technical background and find it very difficult to hire software engineers, especially in the current labor market, which is very tight.

I'm starting off simple: writing recruiting guides from a software engineer's perspective that are easy to understand.

Are there other ways we can make technical recruiters better?


👤 contingencies
Frustrated by the degree of manual programming in production metal machining; the industry runs largely on inertia. I would like to resolve this by applying standard optimization algorithms to a set of known machining strategies plus machine, work-holding, material, part and tool inputs. I have already analyzed the problem space to some extent and will be touring a huge production facility next week to better understand best-in-class processes from large established players.

I need someone to either wrap existing simulation algorithms (from any CAM system) or write enough of one to make this feasible; the solution space is extremely multivariate but well understood and well documented, and 2.5D machining is a tractable starting subset. You can get as intellectual as you like in the solution, but remember: perfect is the enemy of done. The value is huge, and I'm happy to split equity on a new entity if a workable solution for the easier subset of parts emerges in the next few weeks.

👤 pgroves
How can I make PNG encoding much faster? I'm working with large medical images, and after a bit of work we can do all the needed processing in under a second (numpy/scipy methods). But the encoding to PNG then takes 9-15 seconds. As a result, we have to pre-render all possible configurations and put them on S3, because we can't do the processing on demand in a web request.

Is there a way to use multiple threads or a GPU to encode PNGs? I haven't been able to find anything. The images are 3500x3500 px and compress from roughly 50 MB to 15 MB with maximum compression (so don't say to use lower compression).
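
The snag, as far as I can tell, is that a single PNG is one sequential zlib stream, so in-file parallelism is rare. The workaround I keep coming back to is encoding independent tiles in parallel processes and serving the image tiled, the way map viewers do; a rough sketch assuming Pillow and numpy:

  # Sketch: tile-parallel PNG encoding. Each tile is an independent
  # PNG, so the tiles encode concurrently across processes.
  from concurrent.futures import ProcessPoolExecutor
  from io import BytesIO

  import numpy as np
  from PIL import Image

  def encode_tile(tile: np.ndarray) -> bytes:
      buf = BytesIO()
      Image.fromarray(tile).save(buf, format="PNG", compress_level=9)
      return buf.getvalue()

  def encode_tiled(img: np.ndarray, tile: int = 875) -> list:
      tiles = [img[y:y + tile, x:x + tile]
               for y in range(0, img.shape[0], tile)
               for x in range(0, img.shape[1], tile)]
      with ProcessPoolExecutor() as pool:
          return list(pool.map(encode_tile, tiles))

  if __name__ == "__main__":
      fake = np.zeros((3500, 3500), dtype=np.uint8)  # stand-in for a real image
      pngs = encode_tiled(fake)                      # 16 tiles, encoded in parallel

A different angle would be swapping the encoder itself (libraries such as pyvips or fpng target exactly this kind of encode time), though I haven't benchmarked them on images this size.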


👤 cygned
I'm trying to find an agile project management tool that works for us. We run on what many would call Scrum (it’s not actually Scrum).

We are on JIRA now, and it’s … JIRA. We have tried basically every other tool, including Excel (yes, that is somewhat possible).

My general problem is that the tools are slow, planning is cumbersome, visibility is limited, and reporting for clients is often even more limited.

Heck, I’d even write my own tool if I knew it would help others, but I am concerned it’s too close to what we already have for anyone to actually migrate.

You could help me by sharing your thoughts!


👤 zanek
I'm working on a different type of compression (for all file types). I am able to get into the 10-20% range, but the compression is often too slow, or at other times doesn't complete (I've been working on this for years). My personal website: http://danclark.org

I'm also working on a conversational search engine (using NLP) at http://supersmart.ai


👤 pedrokost
We are experiencing very high CPU load caused by tinc [0], which we use to ensure all communication between cloud VMs is encrypted. This is primarily affecting the highest-traffic VMs, including the one hosting the master DB.

I am starting to consider alternative tools such as WireGuard to reduce the load, but I am concerned about adding too much complexity. Tinc's mesh network makes setup and maintenance easy. The WireGuard ecosystem seems to be growing very quickly, and it's possible to find tools that aim to simplify its deployment, but it's hard to see which of these tools are here to stay and which will be replaced in a few months.

What is the best practice, in 2021, to ensure all communication between cloud VMs (even in a private network) is encrypted?

[0] https://www.tinc-vpn.org/


👤 ID1452319
There is a juxtaposition in the UK job market. We have millions of people working in low-paid precarious jobs in retail, food service, warehousing etc. while simultaneously companies complain that they cannot recruit into highly-paid, skilled roles due to a lack of candidates.

Given that you can study Introduction to Computer Science from Harvard University online, for free, and in your own time, it seems like the barriers to building skills are lower than ever.

However, many people are put off or intimidated by the idea of studying such a course. My solution to this is some kind of mentoring, either 1-to-1 or, more likely, in small groups. However, this is very resource-intensive, which makes my idea hard to scale. I'd be very interested to hear how others might approach this, both the mentoring and the underlying encouragement to study.


👤 dbancajas
How do you find the motivation/energy to do a long-term creative project while holding a full-time job plus other responsibilities?

👤 michaelbuckbee
I could use some help with heuristics for machine learning: how much data do I need to make a workable model, and what framework/approach makes more sense given my ultimate goals?

Here's an example: there are a lot of ML tutorials on image identification. Say you have a series of images: picture one might have an apple and a pear in it; picture two might have an apple, an orange, and a banana in it.

Where I'm struggling is putting this into my domain. I have 100k images and, from those, around 1k distinct labels (individual images can have between 1 and 7 of these labels), with between 100 and 13,000 example images per label.

Is that enough data? Should I just start working on gathering more? Is this a bad fit for an ML solution?
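
For context, here's the shape I think my setup has in PyTorch terms (a sketch with illustrative sizes): one independent sigmoid per label with binary cross-entropy, since images carry several labels, on top of a pretrained backbone rather than training from scratch.

  # Sketch: multi-label image classification. One yes/no per label
  # (sigmoid + BCE) instead of a single softmax. Sizes are made up.
  import torch
  import torch.nn as nn
  from torchvision import models

  NUM_LABELS = 1000  # ~1k distinct labels

  model = models.resnet50(pretrained=True)        # transfer learning
  model.fc = nn.Linear(model.fc.in_features, NUM_LABELS)

  criterion = nn.BCEWithLogitsLoss()              # not CrossEntropyLoss

  images = torch.randn(8, 3, 224, 224)            # dummy batch
  targets = torch.zeros(8, NUM_LABELS)            # multi-hot label vectors
  targets[:, :3] = 1.0                            # e.g. 3 labels present

  loss = criterion(model(images), targets)
  loss.backward()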


👤 erikerikson
How do we scale social accountability and knowing?

This is expected to enable us to solve distributed coordination problems. It should also facilitate richer, more meaningful relationships between people.

Expected outcomes include increased thriving and economic productivity.

[edit: consider the limit on how many people you can know and the relationship between how deeply you come into relationship with that population and the size of that number]


👤 TeeMassive
A way to preserve and link factual data sets.

Most references to Wikipedia are dead links.

Many legacy media outlets will stealth-edit articles or outright delete them.

Original media files can be lost, and after strange eons their authenticity can no longer be asserted.

It will soon be impossible to distinguish deep fakes from actual, original and genuine media.

Some regimes such as Maoist China wanted to rewrite their past from scratch and erased all historical artifacts from their territory.

There is strong pressure to create an Orwellian doublespeak and erase certain words entirely from speech, books and records. With e-books now the norm, it has become a legitimate question whether the books are still the same as when the author published them.

Collaborative content websites have shown that they are not immune to large, organized, subversive influence operations.

I have set my mind to multiple solutions (I even bought a catchy-sounding *.org domain name!). Obviously it will have to be distributed in order to build consensus, and thus it will have to rely on hashes. But hashes alone are meaningless, so some form of information will have to come along with them; that information is in turn authenticated against other hashes. I was thinking that the authentication value would come from individually recognized signatories, forming a mesh of statements of record. For example, you might not trust your government, but you might trust your grandparents and your old neighbors, who all agree that there was a statue on the corner of the street; they all link to each other, and maybe to hashes of pictures and 3D scans. Future generations can then confirm those links against other functional URIs.

Something like blockchain technology seems an obvious choice, but I have no experience with that (for now). There is also the problem that it needs to be easily usable, so a bit of centralization is needed (catchy domain name, yay!), although anyone could set up their own service for specialized subjects.
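
To sketch what one statement in that mesh might look like (illustrative only; Ed25519 signing via the Python "cryptography" package, all fields invented):

  # Sketch: a content-addressed, signed statement that links to
  # evidence hashes and to parent statements.
  import hashlib, json
  from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

  def sha256(data: bytes) -> str:
      return hashlib.sha256(data).hexdigest()

  key = Ed25519PrivateKey.generate()        # one signatory's identity

  statement = {
      "claim": "There was a statue on the corner of Main St, 1970-1995",
      "evidence": [sha256(b"...bytes of the statue photo...")],
      "parents": [],                        # hashes of earlier statements
  }
  payload = json.dumps(statement, sort_keys=True).encode()
  record = {
      "statement": statement,
      "hash": sha256(payload),              # what later statements link to
      "signature": key.sign(payload).hex(),
  }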

Thoughts?


👤 cookiengineer
I'm working on a prototype that uses compositional game theory [1] and adapts it to reliably predict the order complexity of functors and their differences between states.

A huge bonus would be if the order difference could be represented in a graph, so that tessellation or other approaches like a hypercube representation could be used for quick estimations. (That's what I'm aiming for right now.)

If successful, the next step would be to integrate it into my web browser so that I can try out whether the equilibrium works as expected on some niche topics or forums.

[1] https://arxiv.org/abs/1603.04641


👤 marcosdumay
Yeah, this week I've restarted work on my scanning tunneling microscope, which I've been failing to make work for years... The current design is a standard pair of long metal bars with the piezoelectric component on one end, with two screws, and a third screw on the other end.

My problem is that no matter how I design the thing, either the screws offer too little precision, so I can't help but crush the tip into the sample every time, or too little travel distance, so I can't help but crush the tip into the sample when adjusting the coarser screws near the tip. This is the kind of thing that looks like a non-problem on the web, because everybody just ignores this step.


👤 the-dude
We are working on a totally new way to do cold fusion; our only problem is getting enough new fuel into the reactor without disturbing the running process.

Any help would be greatly appreciated.


👤 habibur
A search engine that prioritizes ad-free, tracker-free sites.

Of course Google can't do it. But this is ripe for someone to step in.


👤 snerual
Stateful, exactly-once event processing without the operational capacity to run a proper Flink cluster. This thing needs to be dead simple, pragmatic, and cheap/simple to operate and update. The only stateful part of our infra at the moment is a PG database.

We are going to start work on this in a few weeks, so I'm looking for insights/shortcuts/existing projects that will make our lives easier.

The goal is to process events from students during exams (max 2,500 students/exam = ~100k-150k events) and generate notifications for teachers. No fancy ML/AI, just logic. Latency of at most 1 minute.

Our current plan is to let a worker pool lock onto exams (PG lock) and, every few seconds, pull new events for those exams where (time > last pull & time < now - 10s). All the notifications that are generated are committed together with a serialized state of the state machine and the ID of the last processed event. Events would just be stored in PG.

This solution is meant to be simple, to be implemented in a really short timeframe, and to serve as a case study for a more "proper", large-scale architecture later on.

Any tips, tricks or past experiences are much appreciated. Also, if you think our current plan sucks, please let me know.
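
To make the plan concrete, here's one worker iteration as a psycopg2 sketch (table and column names invented). FOR UPDATE SKIP LOCKED lets the pool share exams without colliding, and committing notifications, state and last event ID in one transaction is what makes reprocessing after a crash safe:

  # Sketch: claim an exam, fold new events through the state machine,
  # commit everything atomically.
  import psycopg2

  conn = psycopg2.connect("dbname=exams")

  def step(state, payload):
      # Stand-in for the real state machine: (new_state, notifications).
      return state, []

  def process_one_exam():
      with conn:                    # one transaction for the whole unit
          with conn.cursor() as cur:
              cur.execute("""
                  SELECT exam_id, state, last_event_id
                  FROM exam_processors
                  LIMIT 1
                  FOR UPDATE SKIP LOCKED
              """)
              row = cur.fetchone()
              if row is None:
                  return            # every exam is claimed by another worker
              exam_id, state, last_event_id = row
              cur.execute("""
                  SELECT id, payload FROM events
                  WHERE exam_id = %s AND id > %s
                  ORDER BY id
              """, (exam_id, last_event_id))
              for event_id, payload in cur.fetchall():
                  state, notifications = step(state, payload)
                  for body in notifications:
                      cur.execute(
                          "INSERT INTO notifications (exam_id, body) VALUES (%s, %s)",
                          (exam_id, body))
                  last_event_id = event_id
              cur.execute("""
                  UPDATE exam_processors
                  SET state = %s, last_event_id = %s
                  WHERE exam_id = %s
              """, (state, last_event_id, exam_id))

The effect is effectively-once output: a worker that dies mid-batch leaves the row unchanged, and the next worker replays from the last committed event ID.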


👤 sriram_malhar
I'm not sure I'm close to solving it, but I have an approach that I'd like some feedback on.

I have a corpus of text in many Indian languages which I'd like to index and search. The twist is that I'd like to support searches in English. The problem is that there are many phonetic transliterations of the same word (e.g. the Hindi word for law can be written as either "qanoon" or "kanun"), and traditional spelling-correction methods don't work because of the excessive edit distance.

My approach is this: use some sequence-to-sequence ML technique (LSTM, GRU, ..., attention) to map a query in English to the most probable transliteration, and then use that to look it up with a standard document indexing toolkit like Lucene. (I can put together a training dataset of English transliterations of sentences paired with their original text.)

The problem is that I'd like the corpus, the index and the model to all be on a mobile device. I suspect that the above method won't straightforwardly fit on a mobile (for a few gigs of corpus text) and that the inference time may be long. Is this assumption wrong?

How would you solve the problem? Would TinyML be a better approach for the inferencing part?
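
One alternative I've wondered about, before reaching for seq2seq: normalize transliterations at both index time and query time with a phonetic rule table, so variant spellings collapse to one key and nothing model-sized has to ship to the phone. A toy sketch (the rules here are invented and far from complete):

  # Toy sketch: phonetic normalization so variant transliterations
  # collapse to the same key ("qanoon" and "kanun" -> "kanun").
  import re

  RULES = [
      (r"q", "k"),
      (r"oo", "u"),
      (r"ee", "i"),
      (r"aa", "a"),
      (r"(.)\1", r"\1"),   # collapse doubled letters
  ]

  def normalize(token: str) -> str:
      token = token.lower()
      for pattern, repl in RULES:
          token = re.sub(pattern, repl, token)
      return token

  assert normalize("qanoon") == normalize("kanun") == "kanun"

Lucene-style toolkits let you plug this in as a custom analyzer, so the same function runs over both documents and queries.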


👤 dzink
Working to enable users of https://www.DreamList.com to record audio of any length and see it transcribed, ideally at the same time as the recording, while the recording is also saved. The goal is for grandparents to save stories for loved ones and not worry about the quality of the transcription: just talk. When the recording is saved, the transcription can be redone or tweaked later if needed, but the memory is not lost.

DreamList is web (and soon native apps), so WebRTC connected to a cloud transcription service is my first instinct, but there are benefits to the native iOS APIs as well, especially being able to keep sharing stories while listening to other streams on iOS (families talking and digging into stories together). What architecture/transcription approaches would you suggest? Any gotchas you've seen with similar problems (accuracy given accents, whether to train our own transcription on gathered data, etc.)?
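
The shape I have in mind for the cloud route looks roughly like this, sketched against Google Cloud Speech-to-Text's streaming API as one example vendor (chunks is a placeholder for whatever the WebRTC pipe delivers; the same bytes would be persisted separately so the memory survives even if transcription hiccups):

  # Sketch: stream audio to a cloud recognizer and yield interim
  # transcripts as they arrive. Assumes google-cloud-speech 2.x.
  from google.cloud import speech

  client = speech.SpeechClient()

  streaming_config = speech.StreamingRecognitionConfig(
      config=speech.RecognitionConfig(
          encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
          sample_rate_hertz=16000,
          language_code="en-US",
      ),
      interim_results=True,  # show text while the speaker is mid-story
  )

  def requests(chunks):
      # First request carries the config; the rest carry audio bytes.
      yield speech.StreamingRecognizeRequest(streaming_config=streaming_config)
      for chunk in chunks:
          yield speech.StreamingRecognizeRequest(audio_content=chunk)

  def transcribe(chunks):
      for response in client.streaming_recognize(requests=requests(chunks)):
          for result in response.results:
              yield result.alternatives[0].transcript, result.is_final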

👤 nickdothutton
I too miss my 1990s forum experience. This feeling, and a particularly frustrating few minutes spent on LinkedIn, prompted me to write something about it.

I discuss some intellectual problems and solutions.

https://blog.eutopian.io/building-a-better-linkedin/


👤 huksley
How do you visually implement a process in an app? I.e., how do you guide users through a complex process they need to complete in the app to achieve success?

The process might span different media (write an email, do something in the app, check Twitter, etc.) and different activities across multiple days. How do you make sure users know what they should do next? A checklist? Emails? Slack? A wizard?


👤 red0point
I'm trying to re-/sell cheap bulk object storage by renting cheap dedicated servers (e.g. Hetzner), connecting them over 10GbE, and putting them into a big Ceph cluster.

My problem is how to bill people properly for the object storage they consume. Do you do it retrospectively and take the fraud risk? Are there any pre-existing platforms that do Ceph billing?
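
The metering half, at least, can probably be bootstrapped from radosgw's own usage log; a rough sketch via the CLI (the Admin Ops HTTP API is the cleaner long-term route, and this assumes rgw_enable_usage_log is on):

  # Sketch: pull per-user usage from radosgw's usage log and reduce
  # it to something invoiceable.
  import json
  import subprocess

  def usage_for(uid: str, start: str, end: str) -> dict:
      out = subprocess.check_output([
          "radosgw-admin", "usage", "show",
          f"--uid={uid}", f"--start-date={start}", f"--end-date={end}",
      ])
      return json.loads(out)

  report = usage_for("customer-42", "2021-11-01", "2021-12-01")
  print(report.get("summary", []))  # per-category ops/bytes for the period

For the fraud side, one common dodge is prepaid credit (or a card pre-authorization) instead of purely retrospective invoicing, so exposure is capped at what's already been paid.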


👤 desertraven
I find it hard to find communities for my ever-changing niche interests.

I’m working on community discussion boards which exist at the intersection of interests.

Eg. Mountain Biking/New Zealand, Propulsion/Submarine, Machine Learning/Aquaponics/Tomato, etc.

The search terms for interests are supplied from Wikipedia articles which avoids duplicate interests and allows for any level of granularity.

I find that keyword functionality in search engines has degraded to the point that finding good content for niche interests is difficult. I'm hoping that with this system I can view historical (and current) discussions around my many niche-interest combos.

I've got the foundation done; I just need some feedback/advice on whether I'm reinventing the wheel here, or whether others share this problem.


👤 gpa
The problem: correct string matching at scale. I am aware of fuzzy string matching; the problem is that two strings can be >90% similar even if the difference is, for example, one digit in the year of manufacture. My current solution is to represent the two strings as similarly as I can, based on the available information, by transforming (wrangling) the data to match as closely as possible, and then applying constraints on make, model and year (which must be identical). It works pretty well, but I am looking for a more interactive (human-in-the-loop) solution.
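
Concretely, the constraint gate looks something like this (rapidfuzz assumed for the fuzzy part; field names invented):

  # Sketch: hard constraints first, fuzzy similarity second. A
  # one-digit year difference fails outright regardless of how
  # similar the rest of the string is.
  from rapidfuzz import fuzz

  def match(a: dict, b: dict, threshold: float = 90.0) -> bool:
      if (a["make"], a["model"], a["year"]) != (b["make"], b["model"], b["year"]):
          return False
      return fuzz.token_sort_ratio(a["text"], b["text"]) >= threshold

For the human-in-the-loop part, the Python dedupe library is built around exactly this: active learning that asks a human to label the borderline pairs.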

👤 yasserf
I'm facing an issue where I store small binary data blobs in a Postgres column in order to benefit from delete cascades.

I'm considering moving the binary data into S3 and then doing the sync layer on the server (meaning the front end requests the data from the backend and gets it back as a JSON object with base64 values).

Doing this manually via code isn't impossible, just API-intensive, so I'm wondering if this is a solved issue for anyone.

The why: the JSON blobs are recordings of words and sentences that can be copied between articles.
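
If I do go that way, the S3 half is small (boto3 sketch; bucket and key names made up, and presigned URLs would replace shipping base64 through the API):

  # Sketch: store blobs under a key derived from the row; hand the
  # client a presigned URL instead of base64 through the backend.
  import boto3

  s3 = boto3.client("s3")
  BUCKET = "my-recordings"  # hypothetical bucket

  def put_blob(recording_id: str, data: bytes) -> None:
      s3.put_object(Bucket=BUCKET, Key=f"recordings/{recording_id}", Body=data)

  def get_url(recording_id: str, expires: int = 3600) -> str:
      return s3.generate_presigned_url(
          "get_object",
          Params={"Bucket": BUCKET, "Key": f"recordings/{recording_id}"},
          ExpiresIn=expires,
      )

The delete cascade is the part S3 won't replicate for free; it needs an explicit cleanup step (or a lifecycle rule) when the parent row goes away.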


👤 CRConrad
I've been thinking for a while about building a tool to generate ETL/ELT jobs for data warehousing. Yes, there are lots of such tools already, but I've become frustrated in one way or another with all the ones I've used so far (mainly with their bloated size, clunky "repositories" and inscrutable runtime engines). This "while" that I've been thinking is stretching out; the other day I stumbled across my earliest vague musings on the subject and noted that they are already a couple of years old, so I'm beginning to think it's time to stop thinking and start building...

For various reasons -- mainly familiarity, the FOSS ecosystem, and cross-platform compatibility -- I'm going to try to implement this in Free Pascal / Lazarus. There is one kind of component I'm definitely going to need, and if there were a ready-made one I could use instead of building it from scratch, it would save me a lot of time and effort. I've looked around online, but so far haven't found "the perfect one". So, my question is:

Can anyone recommend a good FOSS graphical SQL query builder component -- i.e., one which presents tables and their columns to the end user so they can specify joins and filters by clicking and dragging, etc. -- for Lazarus (or, in a pinch, Delphi, to port to FP/L)?


👤 pollomarzo
I'm looking for a way to integrate a React app with an existing Vue... thing. I don't really need any communication between the two; just displaying it would be fine. My issue is: the Vue code just throws in