i believe using a vectorDB to store the files, then only search for relevant portions would work to get around the input context limit.
https://python.langchain.com/docs/modules/chains/additional/vector_db_text_generation
Could break it down into multiple queries, probably using something like the chain of thoughts to break up different parts. seems like what the community is doing.
falcon-40b-code seems interesting. the encoder can be used standalone to run that vector search, before piping that output to the decoder.
i believe using a vectorDB to store the files, then only search for relevant portions would work to get around the input context limit.
https://python.langchain.com/docs/modules/chains/additional/vector_db_text_generation
Could break it down into multiple queries, probably using something like the chain of thoughts to break up different parts. seems like what the community is doing.
falcon-40b-code seems interesting. the encoder can be used standalone to run that vector search, before piping that output to the decoder.